Anthropogenic aerosol forcing – insights from multiple estimates from aerosol-climate models with reduced complexity

This study assesses the change in anthropogenic aerosol forcing from the mid-1970s to the mid-2000s. Both decades had similar global-mean anthropogenic aerosol optical depths but substantially different global distributions. For both years, we quantify (i) the forcing spread due to model-internal variability and (ii) the forcing spread among models. Our assessment is based on new ensembles of atmosphere-only simulations with five state-of-the-art Earth system models. Four of these models will be used in the sixth Coupled Model Intercomparison Project (CMIP6; Eyring et al., 2016). Here, the complexity of the anthropogenic aerosol has been reduced in the participating models. In all our simulations, we prescribe the same patterns of the anthropogenic aerosol optical properties and associated effects on the cloud droplet number concentration. We calculate the instantaneous radiative forcing (RF) and the effective radiative forcing (ERF). Their difference defines the net contribution from rapid adjustments. Our simulations show a model spread in ERF from −0.4 to −0.9 W m−2. The standard deviation in annual ERF is 0.3 W m−2, based on 180 individual estimates from each participating model. This result implies that identifying the model spread in ERF due to systematic differences requires averaging over a sufficiently large number of years.


Introduction
Despite decades of research on the radiative forcing of anthropogenic aerosol, quantifying the present-day magnitude and reconstructing the historical change of the forcing remain challenging. Figure 1 shows the anthropogenic aerosol optical depth for the mid-1970s and mid-2000s that we use in this study (Fiedler et al., 2017; Stevens et al., 2017). The anthropogenic aerosol pollution in the mid-1970s was larger in Europe and North America than in eastern Asia, whereas the opposite is the case in the mid-2000s. In addition to these regional changes in aerosol pollution, differences in the surface albedo, insolation and cloud regimes between the aerosol transport regions of the Pacific and continental Europe may result in temporal changes in the global effective radiative forcing (ERF). Based on a single state-of-the-art climate model, the long-term and global ERF does not change despite the substantial spatial changes in anthropogenic aerosol optical depth (τ_a) between the mid-1970s and mid-2000s (Fiedler et al., 2017). Internal model variability, however, strongly affects annual estimates of the global-mean effective radiative forcing.
Published by Copernicus Publications on behalf of the European Geosciences Union.
In light of model uncertainties (e.g. Kinne et al., 2006; Quaas et al., 2009; Lohmann and Ferrachat, 2010; Lacagnina et al., 2015; Koffi et al., 2016), the use of a single model does not necessarily represent the full spectrum of possible anthropogenic aerosol forcings. In the present study, we therefore revisit the question of Fiedler et al. (2017): "Does the substantial spatial change of the anthropogenic aerosol between the mid-1970s and mid-2000s affect the global magnitude of ERF?". Our analysis is based on ensembles of simulations from five global aerosol-climate models, all using identical anthropogenic aerosol perturbations of reduced complexity. In this context, we additionally ask the following: "What is the relative contribution of internal model variability to the ERF spread?". We document the model diversity for the pre-industrial aerosol as well as cloud characteristics and the surface albedo that are relevant to the ERF of anthropogenic aerosol. Such model differences have previously been identified for other climate models (e.g. Stier et al., 2007; Nam et al., 2012; Fiedler et al., 2016; Crueger et al., 2018).
Previously, a reduction in the model complexity has been accomplished by prescribing idealized aerosol radiative properties, e.g. within the framework of Aerosol Comparisons between Observations and Models (AeroCom; e.g. Randles et al., 2013; Stier et al., 2013). Here, we prescribe observationally constrained optical properties of anthropogenic aerosol and an associated effect on the cloud droplet number concentration with the simple plume parameterization (MACv2-SP; Fiedler et al., 2017; Stevens et al., 2017) but keep the full model diversity in all other aspects. The approach eliminates uncertainties in process modelling of anthropogenic aerosol such that our study represents uncertainties associated with other processes influencing the radiative forcing. In other words, by using MACv2-SP in the participating models, the model inter-comparison allows us to investigate those sources of uncertainty that remain if we pretend to know the spatial distribution of anthropogenic aerosol. This work can be seen as a pilot study for the Radiative Forcing Model Inter-comparison Project (RFMIP; Pincus et al., 2016), endorsed by the sixth Coupled Model Intercomparison Project (CMIP6; Eyring et al., 2016), using the same experiment set-up with MACv2-SP.
Throughout our model inter-comparison, we consider the effect of model-internal variability on estimates of ERF. We do so by producing equally sized ensembles of simulations for all participating models. Model-internal variability in this context is defined as the year-to-year changes in model parameters associated with inter-annual variations of the meteorological state. The results of the climate models are compared with satellite data and a stand-alone radiative transfer model. The following section introduces the models and the experiment strategy in more detail, followed by our discussion of the results in Sect. 3 and conclusions in Sect. 4.

Participating models
This work uses five Earth system models and one stand-alone radiative transfer code. The participating climate models, which are run here in an atmosphere-only mode, are the atmosphere component ECHAM6.3 of the Earth system model MPI-ESM1.2 (Mauritsen et al., 2019) of the Max Planck Institute for Meteorology (MPI-M), ECHAM6.3-HAM2.3 from ETH Zürich (Neubauer et al., 2019; Tegen et al., 2019), EC-Earth (e.g. Hazeleger et al., 2010; Döscher et al., 2019) run at the Royal Netherlands Meteorological Institute, NorESM (Bentsen et al., 2013; Iversen et al., 2013; Kirkevåg et al., 2013) run at the Finnish Meteorological Institute and HadGEM3 (Walters et al., 2017) developed at the UK Met Office. All models except ECHAM6.3 can treat aerosols and their interaction with meteorological processes with complex process-based parameterization schemes linking aerosols to radiation and clouds. In this study, all physics packages except the parameterization of anthropogenic aerosols are model-dependent, e.g. the treatment of the pre-industrial aerosols and clouds differs. Appendix A summarizes differences in radiation, cloud and aerosol physics packages of the participating models.
In the present study, we prescribe the distributions of anthropogenic aerosols in all models, following the MACv2-SP approach (Fiedler et al., 2017; Stevens et al., 2017). MACv2-SP mimics the spatio-temporal distribution and wavelength dependence of the optical properties of anthropogenic aerosols as well as a change in the cloud droplet number concentration (N) to induce radiative effects associated with the physical processes of aerosol-radiation interactions (F_ari) and aerosol-cloud interactions (F_aci) in a consistent manner. To do so, MACv2-SP uses analytical functions for approximating the monthly distribution of the present-day anthropogenic aerosol optical depth and the vertical profile of the aerosol extinction from the updated MPI-M aerosol climatology (MACv2; Kinne et al., 2013, 2019). Figure 1 shows the annual mean patterns of the anthropogenic aerosol optical depth (τ_a) and the fractional increase in the cloud droplet number concentration (η_N) relative to the pre-industrial level of 1850 from MACv2-SP. By design, MACv2-SP does not simulate sub-monthly variability in anthropogenic aerosol. Absorption of anthropogenic aerosol is prescribed with a mid-visible single scattering albedo of 0.93 for industrial plumes and 0.87 for plumes with seasonally active biomass burning. The anthropogenic aerosols are assumed to be small in size, with an Ångström parameter of 2.
The use of the optical properties from MACv2-SP yields a consistent description of F_ari, including both direct radiative and semi-direct effects, across the models. All models account for the first indirect or Twomey effect by multiplying their cloud droplet number concentrations, calculated for pre-industrial aerosol conditions, by η_N prior to the radiative transfer calculation. Since η_N is larger than 1 in the presence of anthropogenic aerosols, the effective radius of cloud droplets is reduced, which enhances the cloud reflectivity of short-wave radiation. Note that η_N is only available for regions with τ_a > 0 (see Fig. 1). In addition, the EC-Earth model includes a second indirect or cloud-lifetime effect by using the modified cloud droplet number concentrations in the cloud microphysics scheme (Döscher et al., 2019).
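The scaling of the cloud droplet number concentration described above can be illustrated with a minimal sketch. It assumes a fixed liquid water content, so that the droplet effective radius scales with the droplet number to the power of −1/3, which is a common textbook approximation and not necessarily the formulation used in any of the participating models. The function name and all sample values are hypothetical.

```python
import numpy as np

def perturbed_droplet_properties(n_pi, r_eff_pi, eta_n):
    """Apply a fractional droplet-number increase eta_N to pre-industrial values.

    Assumption (not from the paper): liquid water content is held constant,
    so the effective radius scales as N**(-1/3).
    """
    n_pi = np.asarray(n_pi, dtype=float)
    r_eff_pi = np.asarray(r_eff_pi, dtype=float)
    eta_n = np.asarray(eta_n, dtype=float)
    n_new = n_pi * eta_n                          # droplet number after scaling
    r_eff_new = r_eff_pi * eta_n ** (-1.0 / 3.0)  # smaller droplets for eta_N > 1
    return n_new, r_eff_new

# Hypothetical grid cells: one without (eta_N = 1) and one with anthropogenic aerosol
n_new, r_new = perturbed_droplet_properties([80.0, 80.0], [10.0, 10.0], [1.0, 1.2])
print(n_new, r_new)
```

In the cell with η_N = 1 nothing changes, whereas η_N > 1 raises the droplet number and reduces the effective radius, which is the direction of change that makes clouds more reflective.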
We neither prescribe the same natural aerosol nor interfere with any model components other than by prescribing the optical properties of anthropogenic aerosols and η_N. For instance, the pre-industrial aerosol optical depth (τ_p) depends on the model (Figs. 2 and 3), which only affects F_ari and not F_aci, as the prescription of η_N is identical in the participating models. Regional differences in τ_p occur primarily over oceans and deserts, where observations are typically sparse. It is noteworthy that ECHAM-HAM runs with interactive parameterizations for dust and sea-salt aerosol, resulting in different spatio-temporal variability in τ_p (Fig. 3), while in ECHAM the monthly climatology from MACv1 is prescribed. In the interactive parameterizations, the natural aerosol emissions, transport and deposition rely on meteorological processes that are difficult to represent in coarse-resolution climate models; e.g. desert-dust emissions strongly depend on the model representation of near-surface winds (e.g. Fiedler et al., 2016) such that constraining the desert-dust burden remains challenging in aerosol modelling (e.g. Räisänen et al., 2013; Evan et al., 2014; Huneeus et al., 2016). The aerosol-climate models also contain some anthropogenic aerosol in τ_p, but the majority of the pre-industrial aerosol optical depth is of natural origin. For instance, the 1850s global-mean τ_p in NorESM is 0.096, to which anthropogenic fossil-fuel aerosols contribute 0.002. For comparison, the global-mean τ_a prescribed here is 0.029 for 2005.
In addition to the complex climate models listed above, we use the offline radiative-transfer model of Kinne et al. (2013) for an assessment of the instantaneous radiative forcing. This model has 8 solar and 12 infrared bands and reads monthly maps of the atmospheric and surface properties. These are, for instance, monthly means for the cloud properties from the International Satellite Cloud Climatology Project (ISCCP) and the surface albedo from the satellite product MODIS-SSM/I described in Kinne et al. (2013). The radiative-transfer calculation considers nine different sun elevations and eight randomly chosen combinations of cloud heights and overlap. The aerosol column properties at 550 nm are defined by the MPI-M Aerosol Climatology (MAC). The aerosol vertical distribution and the fine-mode anthropogenic fraction of aerosol optical depth for the mid-2000s are derived from global models participating in AeroCom (e.g. Myhre et al., 2013). We calculate the radiation transfer with both MAC version one (MACv1; Kinne et al., 2013) and two (MACv2; Kinne, 2019). The latter considers more recent observational data, e.g. from the Maritime Aerosol Network (MAN; Smirnov et al., 2009), and a smaller anthropogenic aerosol fraction. MACv2 is also based on more recent emission data relative to 1850 (Lamarque et al., 2010), while MACv1 used emission data relative to 1750 (Dentener et al., 2006). The two climatologies therefore make different assumptions on the pre-industrial background, shown in Fig. 3. The temporal scaling of anthropogenic aerosol optical depth in MACv1 and MACv2 is from the same transient ECHAM simulation (Stier et al., 2006). The parameterization form of the Twomey effect for MACv1 and MACv2 is identical to MACv2-SP here, but the assumptions for τ_p and τ_a differ.

Experiment strategy
All climate model simulations are carried out with the atmosphere-only configurations using prescribed monthly mean sea-surface temperatures and sea ice. Table 1 summarizes the major characteristics of the model simulations.
The modelling groups were free to set up all model components other than MACv2-SP and choose their own boundary and initialization data. Specifically, the modelling groups use their own representation of pre-industrial aerosol for 1850 such that the present work includes both models with prescribed monthly climatologies and interactive parameterization schemes for natural aerosol species (Appendix A). Moreover, the physical parameterizations of radiation and clouds are different across the models (Appendix A). Motivated by the effect of natural variability on ERF estimates in ECHAM (Fiedler et al., 2017), each model was run to produce a number of simulation ensembles: a reference ensemble consisting of six simulations with only pre-industrial aerosols representative of 1850 and two additional ensembles consisting of three simulations each with aerosols representative of 1975 and 2005. For each model, we perform a total of 12 experiments for the years 2000-2010. These are six experiments with τ_p for the year 1850, three experiments with τ_p and anthropogenic aerosol from MACv2-SP for the year 1975, and three experiments with τ_p and anthropogenic aerosol from MACv2-SP for the year 2005. The six pre-industrial simulations serve as the reference for the experiments with anthropogenic aerosol and therefore efficiently increase the number of forcing estimates for anthropogenic aerosol. The first year of each run is considered to be a spin-up period and is excluded from the analysis. The period was chosen to account for variability in the boundary conditions.
The instantaneous radiative forcing (RF) of anthropogenic aerosols in clear- and all-sky conditions is estimated from double radiation calls in the models having this functionality, namely ECHAM, ECHAM-HAM and NorESM. Aerosol radiative effects predominantly occur for short-wave radiation. We therefore calculate the atmospheric transfer of short-wave radiation once with and once without the contribution from anthropogenic aerosols to the aerosol optical properties and their effect on the cloud droplet number concentration. For each model, this gives us in total 30 annual estimates of RF for each of the two τ_a patterns shown in Fig. 1, which is sufficient to estimate the mean RF and can be directly compared to the offline radiation-transfer calculations. We calculate RF at the top of the atmosphere (TOA) and at the surface (SFC) and list the global means in Table 2.
The ERF is calculated as the difference in the short-wave radiative flux at the top of the atmosphere between the simulations with and without anthropogenic aerosols. To illustrate the effect of year-to-year variability, we calculate annual ERF estimates for each of the 10 simulation years. Combining the six pre-industrial experiments with each of the three experiments with additional anthropogenic aerosol thus yields 6 × 3 annual ERF estimates for each year of the simulation, i.e. 180 annual estimates per model and τ_a pattern in total. We calculate the standard deviation from these 180 annual ERF values and use it as a measure of the natural variability in ERF internal to the models. The means of these 180 values are used for identifying systematic model differences in ERF. It was shown in an earlier study using ECHAM (Fiedler et al., 2017) that the combination of ensemble size and simulation length adopted here is sufficient for precisely estimating the ERF of a model. For comparison, the RFMIP protocol recommends a 30-year average for diagnosing the ERF of a model (Pincus et al., 2016). Finally, we calculate the net contribution of rapid adjustments (ADJs) to ERF by subtracting RF from ERF for each model. Our rapid adjustments are associated with atmospheric temperature changes, i.e. semi-direct effects, except for EC-Earth, which also accounts for adjustments in cloud microphysics. A discussion of the rapid adjustments and the choice for the Twomey effect in ECHAM is given by Fiedler et al. (2017).
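The pairing of experiments that yields 180 annual ERF estimates can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of the study: the flux arrays are synthetic random numbers standing in for global-mean top-of-atmosphere short-wave fluxes, and all variable names and the RF value are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years = 10          # analysed years per simulation (spin-up year already removed)
n_ref, n_pert = 6, 3  # pre-industrial and anthropogenic-aerosol ensemble sizes

# Synthetic global-mean TOA net short-wave fluxes in W m-2, shape (member, year)
flux_ref = 240.0 + rng.normal(0.0, 0.2, size=(n_ref, n_years))
flux_pert = 239.4 + rng.normal(0.0, 0.2, size=(n_pert, n_years))

# Pair every perturbed member with every reference member, year by year
erf = flux_pert[:, None, :] - flux_ref[None, :, :]  # shape (3, 6, 10)
erf_annual = erf.reshape(-1)                        # 3 * 6 * 10 = 180 values

erf_mean = erf_annual.mean()      # used to identify systematic model differences
erf_std = erf_annual.std(ddof=1)  # measure of model-internal variability

# Net rapid adjustments as ERF minus RF; the RF value here is hypothetical
rf_mean = -0.65
adj = erf_mean - rf_mean
print(erf_annual.size, erf_mean, erf_std, adj)
```

Because each simulation enters several pairs, the 180 values are not statistically independent, so the sample size should not be read as 180 independent years.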

Spread in present-day ERF
We characterize the spread in the short-wave effective radiative forcing (ERF) at the top of the atmosphere in our model ensemble for the present day (mid-2000s). To do so, we first calculate the multi-model mean as a reference value. The all-sky top-of-atmosphere ERF for the entire multi-model, multi-member ensemble is −0.59 W m−2, with an inter-annual standard deviation of 0.3 W m−2, corresponding to a relative variability of roughly 50 %. The inter-annual variability in ERF is illustrated by Gaussian distributions fitted to the frequency histogram in Fig. 4a. The entire range in annual ERFs from the models including inter-annual variability is −1.5 to +0.5 W m−2.
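To illustrate what an inter-annual standard deviation of 0.3 W m−2 implies for averaging, the sketch below computes the relative variability from the two values quoted above and the standard error of an n-year mean. It assumes that annual ERF values are statistically independent, which is a simplification made here for illustration and not a result of this study; the function name is hypothetical.

```python
import math

erf_mean = -0.59  # W m-2, multi-model mean quoted in the text
erf_std = 0.3     # W m-2, inter-annual standard deviation quoted in the text

# Relative year-to-year variability, roughly 50 %
relative_variability = erf_std / abs(erf_mean)

def standard_error(std, n_years):
    """Standard error of an n-year mean, assuming independent annual values."""
    return std / math.sqrt(n_years)

for n in (1, 10, 30):
    print(n, round(standard_error(erf_std, n), 3))
```

Under this assumption, the uncertainty of the mean shrinks with the square root of the number of years, which is why single-year estimates are poorly suited to separating models whose long-term ERFs differ by only a few tenths of a W m−2.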
The all-sky ERFs are 10 %-50 % less negative than the clear-sky ERFs in all models except EC-Earth, because clouds mask the ERF of low-level anthropogenic aerosol (Table 2). That masking by clouds is most pronounced in HadGEM3. In EC-Earth, the all-sky ERF is more negative than in clear-sky conditions because EC-Earth includes cloud-lifetime effects of anthropogenic aerosols, thus simulating a stronger F_aci than all other participating models. The long-term averaged ERFs of ECHAM and ECHAM-HAM are similar, despite ECHAM using a prescribed climatology of τ_p and ECHAM-HAM simulating τ_p interactively (Sect. 2.1). This similarity suggests that the sub-monthly variability in natural aerosol does not substantially affect the mean ERF of anthropogenic aerosol as long as F_aci is treated consistently in the two models. Using different parameterizations for F_aci can change this result because of non-linear processes. The magnitude of F_aci, however, remains uncertain (Bellouin et al., 2019). One source of this uncertainty is the poor quantitative understanding of the pre-industrial aerosols (e.g. Carslaw et al., 2013).
The multi-model spread in the ensemble mean all-sky ERF of individual models is rather small, with a range of −0.40 to −0.9 W m−2, compared to the internal variability in the entire multi-model ensemble (Fig. 4a). This multi-model spread corresponds to a range of deviations from the multi-model mean of just −0.31 to +0.19 W m−2 and is even smaller when the ERF of EC-Earth, which includes cloud-lifetime effects, is excluded. One could expect less model diversity in all-sky ERF from our study than from previous inter-comparison projects (e.g. Myhre et al., 2013; Shindell et al., 2013) because we prescribe the same aerosol optical properties and the associated change in cloud droplet numbers. However, our model diversity in clear-sky ERF is smaller than for our all-sky ERF (Table 2). This points to the influence of model differences in representing clouds (Appendix B) on the all-sky ERF. Our results therefore indicate that model differences in meteorological parameters contribute to the model diversity in all-sky ERF. This is also the case for the ERF uncertainty in a complex aerosol-climate model (Regayre et al., 2018).
The large inter-annual variability implies that it is essential to estimate ERF of individual models from a sufficiently large number of simulated years to quantify model differences in ERF. Otherwise, the modelled ERF estimates may not be representative of the long-term average. This could be done either from sufficiently long simulations with annually repeating aerosol or a sufficiently large ensemble of simulations with transient changes. Given the similar year-to-year variability in ERF in the models, the confidence estimates from ECHAM (Fiedler et al., 2017)

Regional contributions to ERF
The distributions of ERF for 2005 are shown as ensemble averages in Fig. 5 and are shown for each model in Fig. 6. Eastern Asia is the largest contributor to globally averaged ERF, as expected from the regional maximum in τ_a prescribed there (Fig. 5b). The mean pattern of regional contributions to ERF is in general similar in the models, but differences in its magnitude and detectability appear in some regions. For example, the contributions to the global ERF modelled over central Africa range from positive to negative, averaging to a small value in this region (Fig. 5).
Another interesting example of where regional contributions to globally averaged ERF differ is the North Atlantic. In this region, the variability in the multi-model ensemble is relatively large, 3-6 W m−2 (Fig. 4b), but the small multi-model mean radiative effects are nevertheless detectable (Fig. 5), although ECHAM and the Hadley Centre Global Environment Model (HadGEM) by themselves have regional signals over the North Atlantic that are not statistically significant.
Taken together, the size of year-to-year variability and regional model differences in contributions to the global ERF imply that an ensemble of simulations with more than one model, as done here, is needed for constraining the radiative effect of anthropogenic aerosol regionally. The spread in modelled regional contributions to ERF is typically smaller than the differences associated with natural variability in the model ensemble (Fig. 4b-c). Irrespective of whether we compute the regional standard deviations for the aerosol pattern of the mid-1970s or the mid-2000s, the pattern and strength of the regional natural variability in contributions to ERF are robust (not shown). In regions where the anthropogenic aerosol burden was relatively large in 2005, like eastern Asia, the models disagree on the magnitude of the regional contributions to ERF (Fig. 4c), which means that even for a relatively large anthropogenic aerosol optical depth, natural variability in the atmosphere remains a hurdle against constraining the regional radiative effect.

Contributions from RF and adjustments
The modelled ERF is decomposed into the contributions of rapid adjustments and RF by diagnosing the latter from double calls to the radiation scheme in the models with this functionality (Fig. 5). The RF is considerably less variable from year to year than ERF. Moreover, RF clearly dominates the ERF magnitude in all models that use η_N in the radiation transfer calculation (Table 2). Remember that these models consider F_aci from the Twomey effect only. The net contribution of rapid adjustments to the global-mean ERF ranges from 0.03 W m−2 in NorESM to 0.2 W m−2 in ECHAM-HAM at the TOA and acts to weaken the forcing magnitude. The positive net contribution from adjustments is consistent with buffering of perturbations by atmospheric processes.
We compare the climate model estimates of RF with the results of the offline radiation-transfer calculations described in Sect. 2.1. The offline estimates of the all-sky RF with MACv2-SP (Offline-v1-SP and Offline-v2-SP) are in close agreement with the RF of the climate models that represent F_aci in the form of the Twomey effect. This agreement is remarkable, since the aerosol-climate models and the offline model differ in many aspects, including again the representation of clouds (see Appendix B).

Uncertainties in RF
The offline radiation-transfer model is used to assess the role of uncertainty in τ_p and τ_a in total RF uncertainty. The aerosol classification of MACv2 (Offline-v2) is used as an alternative representation to MACv1 (Offline-v1). MACv2 classifies more ambiguous cases of fine-mode aerosol as anthropogenic than MACv2-SP. These cases primarily occur in remote uninhabited regions such as the Southern Ocean and the Sahara. These regions are poorly captured by the ground-based observation network, so there the MACv2 product primarily uses global model results for separating anthropogenic from natural aerosols. Classifying additional fine-mode aerosol as anthropogenic increases the all-sky RF to −1.1 W m−2, which primarily arises due to stronger F_aci in MACv2. Ambiguous aerosol classifications, which occur especially in regions with a generally low aerosol burden, and poor observational coverage are therefore causes of uncertainty in present-day RF, with the RF becoming more negative with increasing τ_a. An even more negative RF is obtained from the offline model, namely an all-sky RF of −1.4 W m−2, when both a larger anthropogenic fraction and the lower background burden of 1750 from MACv1 (Offline-v1) are used. Note that the clear-sky RFs from the offline estimates and the climate models are in good agreement such that most of the uncertainty stems from the uncertain magnitude of F_aci. This underlines again the importance of the aerosol background for quantifying the cloudy-sky contribution to all-sky RF in agreement with previous studies (Carslaw et al., 2013; Fiedler et al., 2017). Quantitative changes in the natural aerosol burden between the pre-industrial and present-day eras remain poorly constrained. Since the aerosol of 1750 or 1850 has not been observed, using the present-day natural aerosol as a background could yield a better comparability of observational and model estimates in future inter-comparison studies. By prescribing both the same natural and anthropogenic aerosol across different models, differences in the radiative effects of the aerosol can be attributed to model errors in representing meteorological processes and radiative transfer.

Impact of spatial change of pollution
Although the global-mean τ_a is similar for 1975 and 2005, the anthropogenic pollution covers very different regions, with the largest maxima in Europe and the US during the mid-1970s and in eastern Asia during the mid-2000s. The regional differences in clouds, insolation and surface albedo can contribute to changes in radiative effects that can result in a different global ERF. For instance, Figs. B1-B3 show the spatial patterns of cloud properties and the surface albedo, illustrating both the regional differences and the model diversity for their representation (see Appendix B). The different spatial distributions of τ_a clearly change the pattern of the radiative forcing (Fig. 7). As expected, the maxima in regional contributions to RF and ERF occur over Europe and the US in the mid-1970s and over eastern Asia for the mid-2000s. Despite those regional differences in radiative effects and the inter-model spread in ensemble-averaged global-mean RF and ERF, the spatial pattern of τ_a has little impact on the global-mean RF and ERF in each of the participating models. The model ensemble mean changes from −0.54 W m−2 for the mid-1970s to −0.59 W m−2 for the mid-2000s. The mean monthly contributions to RF are also similar for both τ_a patterns, irrespective of which model we choose (not shown).
The ensemble-averaged change in ERF is small relative to the natural inter-annual variability in modelled ERFs (Fig. 8). Indeed, contrasting 1-year estimates from the two τ_a patterns results in a large spread in ERF changes ranging from decreases to increases in ERF with τ_a patterns (Fig. 8c, d). This result is in agreement with previous findings based on ECHAM only (Fiedler et al., 2017). The result underlines again the importance of using a large number of simulated years for determining changes in ERF from free-running climate models. Moreover, it provides evidence that the global-mean ERF does not strongly depend on the regional distribution of anthropogenic aerosol in the Northern Hemisphere.
Atmos. Chem. Phys., 19, 2019 www.atmos-chem-phys.net/19/6821/2019/
The cloudy- and clear-sky contributions to the all-sky efficiency of the ERF, in other words the ratio of ERF to τ_a, help to explain why the two τ_a patterns yield similar ERFs. All-sky efficiency is the sum of contributions from cloudy and clear-sky conditions:
$$\frac{\mathrm{ERF}}{\tau_a} = f\,\frac{\mathrm{ERF}_{\mathrm{cloudy}}}{\tau_a} + (1 - f)\,\frac{\mathrm{ERF}_{\mathrm{clear}}}{\tau_a}, \quad (1)$$
where f is the total cloud fraction, and ERF_cloudy and ERF_clear are the ERF in cloudy and clear-sky conditions, respectively.
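A minimal sketch of this decomposition is given below. It assumes that the all-sky efficiency is the cloud-fraction-weighted sum of the cloudy-sky and clear-sky ERF, each divided by τ_a, and it returns NaN where τ_a is not positive because the efficiency is undefined there. The function name and the sample numbers are hypothetical and are not values from the study.

```python
import numpy as np

def all_sky_efficiency(f, erf_cloudy, erf_clear, tau_a):
    """Return (all-sky, cloudy-sky term, clear-sky term) of the ERF efficiency.

    Assumed decomposition: ERF / tau_a = f * ERF_cloudy / tau_a
                                       + (1 - f) * ERF_clear / tau_a
    """
    f = np.asarray(f, dtype=float)
    erf_cloudy = np.asarray(erf_cloudy, dtype=float)
    erf_clear = np.asarray(erf_clear, dtype=float)
    tau = np.asarray(tau_a, dtype=float)
    # Efficiency is undefined without anthropogenic aerosol: mask tau_a <= 0
    safe_tau = np.where(tau > 0, tau, np.nan)
    cloudy_term = f * erf_cloudy / safe_tau
    clear_term = (1.0 - f) * erf_clear / safe_tau
    return cloudy_term + clear_term, cloudy_term, clear_term

# Two hypothetical grid cells: one inside a plume, one without anthropogenic aerosol
total, cloudy, clear = all_sky_efficiency(
    f=[0.5, 0.5], erf_cloudy=[-0.8, -0.8], erf_clear=[-0.4, -0.4], tau_a=[0.04, 0.0]
)
print(total, cloudy, clear)
```

Keeping the two terms separate makes it possible to see whether a regional change in efficiency comes from the cloudy-sky or the clear-sky part.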
Figure 9 shows the regional distribution from the multi-model ensemble average of the terms of Eq. (1). The all-sky efficiency often increases with increasing distance to major pollution sources because of the decreasing background aerosol, up to −100 W m−2 per unit of τ_a. These all-sky efficiencies are primarily explained by the cloudy-sky contributions. Large efficiencies occur typically in remote areas, including some regions at the edges of τ_a plumes (Fig. 9). No clear saturation of F_aci is evident at all edges of the τ_a plumes. Also, the spatial distribution of both the all- and cloudy-sky efficiency is rather inhomogeneous. The inhomogeneity contrasts with the clear-sky efficiency, which has much smaller spatial variability.
Averaged globally, all-sky forcing efficiencies for the two aerosol patterns are similar at −26 W m−2 per unit of τ_a. The regional all-sky ERF efficiencies, however, change between the mid-1970s and mid-2000s (Fig. 9). This change is almost exclusively explained by the cloudy-sky contribution to the ERF efficiency, reflecting the regional change in
Of all models, NorESM and EC-Earth have the strongest ERF efficiencies around −30 and −40 W m−2 per unit of τ_a, respectively; i.e. the same aerosol perturbation in these two models is much more efficient in inducing effective radiative effects than in the other models, consistent with the more negative ERFs (Fig. 8). In EC-Earth, the more negative ERF also arises from perturbing the cloud microphysics with η_N. In NorESM, the more negative ERF arises from a strong negative RF and a small net contribution from adjustments.

Conclusions
We assess the radiative effects of anthropogenic aerosol in ensembles of simulations from five state-of-the-art aerosol-climate models, prescribing identical anthropogenic aerosol properties of reduced complexity. Each of the participating models uses annually repeating patterns of anthropogenic aerosol for obtaining 180 years of radiative forcing estimates. The multi-model multi-ensemble present-day all-sky short-wave effective radiative forcing (ERF) at the top of the atmosphere is −0.59 W m−2. The year-to-year standard deviations of around 0.3 W m−2 in the models imply a typical year-to-year variability of 50 %, reflecting a strong contribution of model-internal variability to ERF. We therefore recommend caution for the use of ERF estimates based on single years, as in the standard AeroCom protocol with varying reference years. These are likely affected by model-internal variability such that an apparent ERF spread is not associated with systematic model differences alone. Indeed, such studies have shown a substantial spread in ERF estimates (e.g. Shindell et al., 2013), comparable to the magnitude of the model-internal variability quantified in the present work.
We further recommend that model-based assessments of ERF in the future ensure the elimination of the effects of internal variability, either by averaging over longer time periods from single transient climate simulations or by averaging across several ensemble members for shorter time periods. For instance, the protocol of RFMIP requests 30-year averages for estimating the present-day ERF and three-member ensembles with 10-year averages for diagnosing decadal changes in ERF (Pincus et al., 2016). The precision of the estimate can be tested by using confidence estimates (e.g. Fiedler et al., 2017). Note that natural variability is equally an issue in observations. Ensembles of simulations should therefore be used for constraining ERF with the historical record of observations. The inter-annual variability in ERF, and hence the number of years needed to estimate ERF, could be different in nudged model simulations (Zhang et al., 2014). However, nudging a model simulation with reanalysis data can change the climatology and interfere with the rapid adjustments. The resulting ERFs from a nudged simulation are therefore likely different when compared with free-running model simulations. The interference of nudging with adjustments deserves closer attention in future research.
In our study, we obtain an ERF spread of −0.9 to −0.4 W m−2, associated with systematic model differences (Fig. 10). This estimate is not affected by model-internal variability, is based on identical anthropogenic aerosol optical properties and makes use of a consistent perturbation of the cloud droplet number concentrations associated with anthropogenic aerosol. The model with the most negative ERF also accounts for changes in cloud microphysics associated with anthropogenic aerosol, whereas the other participating models account for the Twomey effect only. Based on our model spread, we conclude that models with a strongly negative ERF have particularly strong contributions from anthropogenic aerosol effects on clouds.
Our results highlight that the participating models consistently show little change in the global ERF of anthropogenic aerosol between the mid-1970s and mid-2000s, despite the substantially different locations of the anthropogenic pollution maxima and the model diversity in ERF magnitude relative to the pre-industrial period. Model-internal variability, however, produces ERF changes of different signs and magnitudes between the two periods. This result gives further evidence that model-internal variability has not been sufficiently considered in past model studies estimating the ERF difference associated with the mid-1970s to mid-2000s change in anthropogenic aerosol, as previously suggested based on ECHAM alone (Fiedler et al., 2017). The small change in global ERF stems from similar global forcing efficiencies of anthropogenic aerosol in the two periods. These are primarily explained by globally compensating differences in regional cloudy-sky contributions to the ERF efficiency. Assuming stronger aerosol-cloud interactions can cause a larger change in ERF from the mid-1970s to mid-2000s, based on simulations with ECHAM (Fiedler et al., 2017). The forcing from aerosol-cloud interactions is a subject of ongoing discussion and research (Bellouin et al., 2019). Given our multi-model spread in absolute ERF relative to the pre-industrial period, inter-comparing the relative ERF changes between observable periods might provide a better test of a model's ability to represent transient climate changes. Our future work will focus on inter-comparing modelled ERF changes associated with other aerosol patterns. One such endeavour is the use of MACv2-SP in model simulations in the framework of CMIP6 (e.g. Pincus et al., 2016; Fiedler et al., 2019).
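The link between forcing efficiency and ERF used in the argument above can be written compactly. The following is a sketch that assumes the global efficiency E is the ratio of the global-mean ERF to the global-mean anthropogenic aerosol optical depth, in W m−2 per unit optical depth; the symbol E and the period subscripts are notation introduced here for illustration.

```latex
% Assumed definition of the global forcing efficiency E.
\begin{equation*}
  E = \frac{\overline{\mathrm{ERF}}}{\overline{\tau}_{a}}
\end{equation*}
% Difference in global ERF between the two periods in terms of E.
\begin{equation*}
  \overline{\mathrm{ERF}}_{1975} - \overline{\mathrm{ERF}}_{2005}
  = E_{1975}\,\overline{\tau}_{a,1975} - E_{2005}\,\overline{\tau}_{a,2005}
\end{equation*}
```

Because the two periods have similar global-mean anthropogenic aerosol optical depths, similar global efficiencies imply a small difference in global ERF, even when regional efficiencies differ.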
Data availability. The model data of this study will be available on the AeroCom community's data server. Additionally, the model data are archived by the Max Planck Institute for Meteorology and can be made accessible by contacting publications@mpimet.mpg.de.
(Stevens et al., 2017; Fiedler et al., 2017).
The global aerosol-climate model ECHAM6.3-HAM2.3 is an updated version of the model described by Tegen et al. (2019) and Neubauer et al. (2019). Revisions made in ECHAM6.3-HAM2.3 relate to the atmospheric model and the description of sea-salt emissions, which have been made dependent on the sea-surface temperature. The model uses ECHAM6.3 but is coupled to the aerosol module HAM (Stier et al., 2005; Zhang et al., 2012). An important difference in the atmospheric components is that ECHAM6.3 uses a single-moment cloud microphysics parameterization, while ECHAM6.3-HAM2.3 has a two-moment stratiform cloud scheme (Lohmann and Hoose, 2009) for representing the activation of aerosols as cloud condensation nuclei and ice nuclei in mixed-phase clouds. Emission schemes for sea salt (Long et al., 2011; Sofiev et al., 2011), desert dust (Tegen et al., 2002; Cheng et al., 2008) and oceanic dimethyl sulfide (DMS; Nightingale et al., 2000) are run online. Emissions of all other aerosol species are prescribed from external input files (Stier et al., 2005; Lamarque et al., 2010). In the configuration used in this study, we prescribe the pre-industrial background of aerosol components from HAM that are not simulated online. These, in combination with the online-computed natural aerosol emissions, are the only aerosols seen by the two-moment cloud microphysics parameterization in this study.
EC-Earth (Hazeleger et al., 2010; Döscher et al., 2019) uses the Integrated Forecasting System (IFS) of the European Centre for Medium-Range Weather Forecasts (ECMWF) as its atmosphere component. The latest generation of the model, EC-Earth3, is based on the ECMWF seasonal prediction system 4 with the IFS cycle 36r4. The radiation scheme is based on the rapid radiative transfer model (Mlawer and Clough, 1998; Iacono et al., 2008) with 14 bands in the short-wave spectrum and 16 bands in the long-wave spectrum and uses the Monte Carlo independent column approximation (McICA) approach (Pincus and Morcrette, 2003). Many new features have been added to IFS by the EC-Earth consortium. The pre-industrial tropospheric aerosol climatology that is used in combination with MACv2-SP has been constructed from a simulation with the TM5 aerosol-chemistry model (Huijnen et al., 2010; van Noije et al., 2014), driven by meteorological data from ERA-Interim for the early 1980s. This simulation used CMIP6 emissions of aerosol and precursor gases for 1850 and provides the monthly mean aerosol mass and number concentrations as well as the aerosol optical properties. Stratospheric aerosols are prescribed using the CMIP6 dataset of radiative properties. Aerosol-cloud interactions are implemented only for liquid-phase, stratiform clouds. The cloud droplet number concentration, N, is diagnosed using the activation scheme of Abdul-Razzak and Ghan (2000) and is modified here by η_N from MACv2-SP. Cloud microphysics depends on N through autoconversion of cloud droplets to rain. The model used in this study is EC-Earth version 3.2.3. It is close to the CMIP6 version described by Döscher et al. (2019) but does not include the latest revisions that were introduced after the simulations for this study were started. Most relevant to this study is that in the CMIP6 version, the pre-industrial aerosol climatology has been updated by changing the parameterization of the production of sea spray in the underlying TM5 model. Specifically, the whitecap coverage has been made dependent on sea-surface temperature, while its power-law dependence on the 10 m wind speed has been changed from the W10 expression proposed by Salisbury et al. (2013) to the expression proposed by Monahan and Muircheartaigh (1980). The main effect of this revision is an increase in aerosol and cloud droplet number concentrations over the Southern Ocean.
Simulations with the Hadley Centre Global Environment Model (HadGEM) use a modified version of the HadGEM3 Global Atmosphere 7.0 climate model configuration (Walters et al., 2017). HadGEM3 normally uses the Global Model of Aerosol Processes (GLOMAP; Mann et al., 2010) to simulate aerosol mass and number and interactions of aerosols with radiation, clouds and atmospheric chemistry. That scheme is replaced here with prescriptions of the three-dimensional distributions of aerosol extinction and absorption coefficients averaged over HadGEM's six short-wave and nine long-wave wavebands, waveband-averaged aerosol asymmetry, and N. Those prescriptions are made of three components. First, pre-industrial aerosol and N distributions are taken from a HadGEM3-GLOMAP simulation using CMIP6 emission datasets for the year 1850. Second, stratospheric aerosols are taken from the CMIP6 climatologies for the year 1850. Prescribed N values are used in the calculation of cloud albedo (Jones et al., 2001) and autoconversion rates (Khairoutdinov and Kogan, 2000), although the latter do not see the MACv2-SP N scalings, ensuring that anthropogenic aerosols do not exert a secondary indirect effect in the present study. HadGEM3 uses the prognostic cloud fraction and prognostic condensate scheme (PC2; Wilson et al., 2008) that simulates the mass-mixing ratios of water vapour, cloud liquid and ice, as well as the fractional cover of liquid, ice and mixed-phase clouds.
The Norwegian Earth System Model (NorESM; Bentsen et al., 2013; Iversen et al., 2013; Kirkevåg et al., 2013) uses the atmospheric component of the Oslo version of the Community Atmosphere Model (CAM4-Oslo), which differs from the original CAM4 (Neale et al., 2013) through the modified treatment of aerosols and their interaction with clouds (Kirkevåg et al., 2013). The model has a finite-volume dynamical core and the original version 4 of the Community Land Model (CLM4) of CCSM4 (Lawrence et al., 2011). NorESM uses the CAM-RT radiation scheme by Collins et al. (2006). Like ECHAM-HAM and ECHAM, NorESM sets all background aerosol emissions to pre-industrial levels representative of 1850. These background conditions include sulfate from tropospheric volcanoes and from DMS as well as organic matter from land and ocean biogenic processes, mineral dust and sea salt. Sea-salt emissions are parameterized as a function of wind speed and temperature (Struthers et al., 2011), while other pre-industrial aerosol emissions are prescribed following Kirkevåg et al. (2013). These are, in the case of NorESM, sulfate, organic matter and BC aerosols originating from fossil fuel emissions and biomass burning (Lamarque et al., 2010).

Appendix B: Model diversity in cloud properties and surface albedo
The model diversity in RF and ERF is larger when cloudy skies are considered. We therefore assess the model diversity in cloud properties and compare the model climatologies calculated from the simulations for the mid-2000s against observational climatologies from satellite products, listed in Table B1. The observational products provide an orientation for realistic values, although satellite retrievals also have caveats (e.g. Grosvenor et al., 2018). Moreover, we document the surface albedos used here for illustrating both the regional differences and the model diversity.

B1 Macroscopic cloud properties
We first assess the cloud short-wave radiative effect at the top of the atmosphere (F_cld), i.e. the cloud effect on the planetary albedo. The multi-annual global-mean F_cld for 2001-2010 from the Clouds and the Earth's Radiant Energy System (CERES) Ed. 4 is −45.8 W m−2, i.e. less negative than in most models (Table B2). This behaviour indicates a tendency of the models to have clouds that are too reflective, consistent with other model evaluations (Nam et al., 2012; Crueger et al., 2018; Lohmann and Neubauer, 2019). The spatial patterns of modelled F_cld are generally similar, but regionally the differences can be more distinct (Fig. B1).
To better characterize the model diversity in clouds, we compare the simulated total cloud cover (f) and liquid water path (l_cld) to satellite climatologies from the ISCCP and MAC-LWP, respectively (Table B1). Most models underestimate both f and l_cld over the oceans compared to the satellite retrievals, but having too few clouds does not necessarily imply too small an amount of liquid or vice versa (Table B2). The spatial patterns (Fig. B1) show a tendency of the models to underestimate f in the stratocumulus decks in the southeastern regions of the Pacific and Atlantic Ocean, where aerosol-cloud interactions are thought to be important. The models, however, disagree on the values for f and l_cld in those regions. Moreover, the models show a large diversity in l_cld in the extratropical storm tracks. NorESM shows the largest maximum l_cld, exceeding 200 g m−2. Our findings for l_cld are consistent with a similar regional comparison between HadGEM and CAM (Malavelle et al., 2017), the latter of which has an atmospheric component similar to that of NorESM (see Appendix A).

B2 Cloud microphysical properties
The reported differences in macroscopic cloud properties among the models raise the question of how different the cloud droplet number concentrations (N) are. We find that the models show large diversity in the pattern of N for present-day conditions, as shown in Fig. B2. Note that we show the mean in-cloud droplet number concentration, which means that regions without clouds are not included when averaging N. It is noteworthy that in the models, N is calculated for stratiform cloud types but can additionally include detrained droplets from anvils of deep convection. The spatial pattern of N in ECHAM is not shown due to the simplistic treatment in the model. ECHAM employs statically prescribed values for N, which are constant with height below the 800 hPa level and decrease exponentially aloft. The near-surface values in ECHAM are N = 80 cm−3 over ocean and N = 180 cm−3 elsewhere (not shown) and are multiplied by η_N from MACv2-SP, as in the other models.
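The prescribed droplet number in ECHAM described above can be illustrated with a short sketch. The near-surface values and the 800 hPa level are taken from the text; the functional form of the exponential decrease and the e-folding scale `decay_scale_hpa` are hypothetical choices made only for illustration, since the text does not specify them. The function name is likewise hypothetical.

```python
import math

N_OCEAN = 80.0          # cm-3, near-surface value over ocean (from the text)
N_LAND = 180.0          # cm-3, near-surface value elsewhere (from the text)
P_CONSTANT_TOP = 800.0  # hPa, N is constant at pressures at or above this level


def prescribed_n(pressure_hpa, over_ocean, eta_n=1.0, decay_scale_hpa=200.0):
    """In-cloud droplet number concentration (cm-3) at a pressure level.

    decay_scale_hpa is a HYPOTHETICAL e-folding scale; the text only states
    that N decreases exponentially above the 800 hPa level.
    """
    n_surface = N_OCEAN if over_ocean else N_LAND
    if pressure_hpa >= P_CONSTANT_TOP:
        n_background = n_surface
    else:
        depth = P_CONSTANT_TOP - pressure_hpa
        n_background = n_surface * math.exp(-depth / decay_scale_hpa)
    # Scaling by eta_N from MACv2-SP for the anthropogenic increase in N.
    return eta_n * n_background


if __name__ == "__main__":
    for p in (1000.0, 800.0, 600.0, 400.0):
        print(p, round(prescribed_n(p, over_ocean=True, eta_n=1.1), 1))
```

In this picture the anthropogenic perturbation enters only through the multiplicative factor η_N, so the vertical shape of the prescribed profile is unchanged between pre-industrial and polluted conditions.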
Compared to the satellite product, the models typically underestimate N, for example, in the stratocumulus decks, where f is also underestimated. How much of the quantitative difference between the models and the satellite product is due to differences in the methods for diagnosing N in the satellite retrievals and the models remains an open question, but it is unlikely that the methods alone explain the diversity in the patterns of N. It is interesting that, despite these quantitative differences in N, the spatial pattern of F_cld compares reasonably well to observations (Fig. B1), which might be a consequence of compensating differences from tuning the radiation balance at the top of the atmosphere. For instance, the behaviour of NorESM points to too much short-wave reflectivity by clouds that are too thick, which overcompensates for the missing reflection due to underestimated cloud cover.

B3 Surface albedo
An additional influence on the radiative forcing of anthropogenic aerosol is the surface reflectivity for short-wave radiation. We therefore document the surface albedo for short-wave radiation from the participating models and the satellite product used in the offline radiative-transfer calculations of this study. In the global mean, the models and the satellite product are very similar, with a surface albedo of 14 %-16 %. However, the spatial distributions in Fig. B3 indicate differences. The typical difference between less reflective ocean surfaces compared to land regions is apparent. Moreover, the analysis reveals diversity in the regional surface albedos of the participating models, typically related to areas affected by snow cover. Since such diversity in the surface albedo has previously been reported for aerosol-climate models, with implications for the aerosol radiative forcing (e.g. Stier et al., 2007), further efforts are needed to constrain the surface albedo in climate models.

Table B1 (recoverable entries): ISCCP, International Satellite Cloud Climatology Project, 1983-2009 (Rossow and Schiffer, 1999), f (%); MAC-LWP, Multi-sensor Advanced Climatology Liquid Water Path, 2000-2016 (Elsaesser et al., 2016, 2017), l_cld (g m−2); climatology of Kinne et al. (2013), α_s (%).

Table B2. Global-mean statistics for clouds, aerosols and surface albedo. The numbers given for l_cld and N are averages over ocean regions, consistent with the satellite data availability (Figs. B1 and B2). Details on the satellite products are listed in Table B1. Recoverable column headers: F_cld (W m−2), f (%), l_cld (g m−2), N (cm−3).

Atmos. Chem. Phys., 19, 6821-6841, 2019, www.atmos-chem-phys.net/19/6821/2019/

Horizon 2020 research and innovation programme with grant agreement no. 724602 as well as by the Alexander von Humboldt Foundation. Joonas Merikanto acknowledges the Academy of Finland for funding (no. 287440). We acknowledge the usage of the DKRZ supercomputer for running the simulations with ECHAM6.3. ECHAM6.3-HAM2.3 simulations were performed through a grant from the Swiss National Supercomputing Centre (CSCS) under project ID 652. We also acknowledge the usage of satellite data from the following providers. CERES data were obtained from the NASA Langley Research Center ordering tool (http://ceres.larc.nasa.gov/, last access: 5 May 2019), ISCCP data were obtained from the International Satellite Cloud Climatology Project website (https://isccp.giss.nasa.gov, last access: 5 May 2019) maintained by the ISCCP research group at the NASA Goddard Institute for Space Studies, MAC-LWP data (Elsaesser et al., 2016) were acquired as part of the activities of NASA's Science Mission Directorate and were archived and distributed by the Goddard Earth Sciences (GES) Data and Information Services Center (DISC, https://disc.gsfc.nasa.gov, last access: 5 May 2019), and the cloud droplet number concentration climatology was provided by the Vanderbilt University Institutional Repository (https://ir.vanderbilt.edu/handle/1803/8374, last access: 5 May 2019). We thank Akos Horvath for providing information on MAC-LWP.
Financial support. This research has been supported by the Seventh Framework Programme (BACCHUS grant no. 603445).
The article processing charges for this open-access publication were covered by the Max Planck Society.
Review statement. This paper was edited by Hinrich Grothe and reviewed by three anonymous referees.

Figure 1. Mean anthropogenic aerosol optical depth (τ_a; shaded) and fractional increase in cloud droplet number (η_N; contours) associated with anthropogenic aerosol. Shown are annual means of τ_a at 550 nm and η_N for the (a) mid-1970s and (b) mid-2000s from MACv2-SP, which prescribes annually repeating monthly maps of τ_a in the participating models. Note the non-linear scale.

Figure 2. Mean pre-industrial aerosol optical depth (τ_p). Shown are annual means of τ_p of the radiation band at around 550 nm for each model.

Figure 3. Annual cycle of the global-mean aerosol optical depth at 550 nm. Shown are monthly means of τ_p (colours) from the models and τ_a (black) for the mid-1970s (dashed) and mid-2000s (solid) from MACv2-SP.

Figure 4. Variability in annual ERF estimates for the mid-2000s. (a) shows Gaussian distributions of annual ERF estimates for the present day from individual model ensembles (colours) and the entire multi-model, multi-member ensemble (black). The bars are the frequency histogram of 1-year ERF estimates from all models, and the legend indicates the means and standard deviations of the ERF estimates. (b) shows the regional standard deviation of annual contributions to ERF from the entire multi-model, multi-member ensemble as a measure of the inter-annual variability inherent in the model ensemble. (c) shows the range in the long-term averaged ERFs of the models as a measure of the spread in ERF associated with model differences. ERF is for the short-wave (SW) spectrum at the top of the atmosphere (TOA) for all-sky conditions.

Figure 5. Multi-model, multi-member ensemble mean of the anthropogenic aerosol radiative effects for the mid-2000s. Shown are the (a) instantaneous and (b) effective radiative forcing as well as (c) the net contribution from rapid adjustments for the SW spectrum at the TOA in all-sky conditions. Hatching in (b, c) indicates non-significant values at a 10 % significance level. The numbers in the lower left corner are the spatial averages. The ensemble-mean RF is averaged over three climate models, the ensemble-mean ERF is averaged over five climate models and the ensemble-mean adjustment is their difference.

Figure 6. Multi-member ensemble mean of effective radiative effects of anthropogenic aerosol for the mid-2000s. Shown is the effective radiative forcing for the SW spectrum at the TOA in all-sky conditions for each model. Hatching indicates non-significant values at a 10 % significance level.

Figure 7. Multi-model, multi-member ensemble mean of the anthropogenic aerosol radiative effects for the mid-1970s, as in Fig. 5, but with the anthropogenic aerosol pattern of the mid-1970s.

Figure 8. Anthropogenic aerosol forcing of the mid-1970s against the mid-2000s. Shown are the (a, b) instantaneous and (c, d) effective radiative forcing for the SW spectrum at the TOA from the pollution of the mid-1970s against the mid-2000s for (a, c) clear- and (b, d) all-sky conditions. Thick asterisks are the ensemble means. Blue dots in (c, d) are the model averages of individual years, representing the year-to-year variability internal to the model ensemble.

Figure 9. Anthropogenic aerosol effective radiative forcing efficiencies (in W m−2 per unit optical depth) for (a, d) all-sky, (b, e) clear-sky and (c, f) cloudy-sky conditions. (a-c) show efficiencies for mid-2000s anthropogenic aerosols. (d-f) show differences made by using the pattern for the mid-1970s.
η_N from the mid-1970s to mid-2000s. The strong change in the cloudy-sky contribution is in strong contrast to the relatively minor changes in the clear-sky contributions. Differences in regional efficiencies of anthropogenic aerosol effects on clouds thus become balanced in the global mean and result in similar global ERFs for the mid-1970s and mid-2000s.

Figure 10. Summary of model spread in anthropogenic aerosol forcing for the mid-2000s. Shown are the instantaneous (RF) and effective radiative forcing (ERF) of aerosol-radiation and aerosol-cloud interactions for the short-wave spectrum at the top of the atmosphere for clear- and all-sky conditions from Table 2. The RF from the offline radiation-transfer calculations considers additional uncertainty sources and is shown as separate bars. Refer to Sect. 2.1 for details.

Figure B1. Multi-member ensemble means of cloud characteristics for the mid-2000s compared to climatologies derived from satellite observations (Table B1). Shown are the mean SW cloud radiative effect at the TOA, F_cld (left column); total cloud cover, f (middle column); and liquid water path, l_cld (right column) from the satellite products (top row) and the models (rows beneath). Areas without available data are shaded white.

Figure B2. In-cloud droplet number concentration for the mid-2000s. Shown are the annually and vertically averaged in-cloud droplet number concentrations (N) from the aerosol-climate models and from the MODIS satellite product by Bennartz and Rausch (2017). Areas without available data are shaded white.

Figure B3. Surface albedo for short-wave radiation for the mid-2000s. Shown is the mean surface albedo for short-wave radiation (α_s) from the models and the satellite product from Kinne et al. (2013).

Table 1. Model experimental set-up.

Table 2. Ensemble averages of the short-wave instantaneous radiative forcing (RF) and effective radiative forcing (ERF), and net contribution from rapid adjustments (ADJs) at the surface (SFC) and the top of the atmosphere (TOA) for all sky (clear sky) in W m−2 for the period 1850 to 2005. The first block shows aerosol-climate models with MACv2-SP, and the second block shows estimates of the offline radiative-transfer model.
are a reasonable approximation for the whole ensemble of models in the present study.

Table B1. Gridded climatologies of satellite retrievals used for model evaluation.