Sensitivity of the interannual variability of mineral aerosol simulations to meteorological forcing dataset

Interannual variability in desert dust is widely observed and simulated, yet the sensitivity of these desert dust simulations to a particular meteorological dataset, as well as a particular model construction, is not well known. Here we use version 4 of the Community Atmospheric Model (CAM4) with the Community Earth System Model (CESM) to simulate dust forced by three different reanalysis meteorological datasets for the period 1990–2005. We then contrast the results of these simulations with dust simulated using online winds dynamically generated from sea surface temperatures, as well as with simulations conducted using other modeling frameworks but the same meteorological forcings, in order to determine the sensitivity of climate model output to the specific reanalysis dataset used. For the seven cases considered in our study, the different model configurations are able to simulate the annual mean of the global dust cycle, seasonality and interannual variability approximately equally well (or poorly) at the limited observational sites available. Overall, aerosol dust-source strength has remained fairly constant during the time period from 1990 to 2005, although there is strong seasonal and some interannual variability simulated in the models and seen in the observations over this time period. Model interannual variability comparisons to observations, as well as comparisons between models, suggest that interannual variability in dust is still difficult to simulate accurately, with averaged correlation coefficients of 0.1 to 0.6. Because of the large variability, at least 1 year of observations at most sites are needed to correctly observe the mean, but in some regions, particularly the remote oceans of the Southern Hemisphere, where interannual variability may be larger than in the Northern Hemisphere, 2–3 years of data are likely to be needed.

observed and simulated, yet the sensitivity of these desert dust simulations to a particular meteorological dataset, as well as a particular model construction, is not well known.Here we use version 4 of the Community Atmospheric Model (CAM4) with the Community Earth System Model (CESM) to simulate dust forced by three different reanalysis meteorological datasets for the period 1990-2005.We then contrast the results of these simulations with dust simulated using online winds dynamically generated from sea surface temperatures, as well as with simulations conducted using other modeling frameworks but the same meteorological forcings, in order to determine the sensitivity of climate model output to the specific reanalysis dataset used.For the seven cases considered in our study, the different model configurations are able to simulate the annual mean of the global dust cycle, seasonality and interannual variability approximately equally well (or poorly) at the limited observational sites available.Overall, aerosol dust-source strength has remained fairly constant during the time period from 1990 to 2005, although there is strong seasonal and some interannual variability simulated in the models and seen in the observations over this time period.Model interannual variability comparisons to observations, as well as comparisons between models, suggest that interannual variability in dust is still difficult to simulate accurately, with averaged correlation coefficients of 0.1 to 0.6.Because of the large variability, at least 1 year of observations at most sites are needed to correctly observe the mean, but in some regions, particularly the remote oceans of the Southern Hemisphere, where interannual variability may be larger than in the Northern Hemisphere, 2-3 years of data are likely to be needed.
Dust is widely variable in space and time, with 4-fold fluctuations in surface concentration observed across a large region on decadal timescales (Prospero and Lamb, 2003), and globally equally large fluctuations on glacial-interglacial timescales (Kohfeld and Harrison, 2003).The importance of interannual variability of dust in changing or modifying precipitation and temperatures (Yoshioka et al., 2007;Evan et al., 2009;Mahowald et al., 2010) and biogeochemistry (Aumont et al., 2008;Doney et al., 2009) has previously been simulated and shown to potentially be large.Previous model intercomparison studies have shown that the models have some skill in simulations of the annual mean and seasonal cycle (Huneeus et al., 2011), and some studies have sought to consider the causes of interannual variability in dust, such as changes in precipitation, winds, surface roughness or land use (e.g., Tegen and Miller, 1998;Mahowald et al., 2002Mahowald et al., , 2003;;Miller and Tegen, 1998b;Ginoux et al., 2004;Cowie et al., 2013;Ridley et al., 2014).
It is well established in the dust literature that meteorology and surface conditions play central roles in driving changes in dust emissions, primarily from changes in precipitation, winds, surface roughness or vegetation cover on daily to interannual to geological timescales (e.g., Westphal et al., 1987;Petit et al., 1999;Marticorena and Bergametti, 1996;Mahowald et al., 2002;Prospero and Lamb, 2003;Engelstadter and Washington, 2007;Engelstaedter et al., 2003;Roe, 2008;McGee et al., 2010;Knippertz and Todd, 2012;Cowie et al., 2013;Yu et al., 2015), and we do not seek to repeat or review that work here.Rather, we address the question of how sensitive our simulation of interannual variability is to the meteorology or, alternatively, the modeling framework used.While previous modeling studies have evaluated the annual mean and seasonal cycle coherence across dust models (Huneeus et al., 2011) or contrasted specific models (Luo et al., 2003), prior studies exploring the ability of models to simulate interannual variability and the role of different meteorological mechanisms have used only one model (e.g., Miller and Tegen, 1998b;Mahowald et al., 2002Mahowald et al., , 2003;;Ginoux et al., 2004;Ridley et al., 2014).Thus an open question remains as to how robust our simulations of inter-annual variability (IAV) are, and how sensitive they are to the meteorological dataset vs. the modeling framework.Characterizing how well IAV is simulated in models allows a better understanding of how much we should trust model output in studies directed at understanding the role of dust IAV in contributing to IAV in total aerosol optical depth (AOD) variability (e.g., Streets et al., 2009) or in dust impacts on ocean biogeochemistry IAV (e.g., Doney et al., 2009).A related question is how much observational data do we need in order to correctly characterize the mean dust amount, based on how much interannual variability we think exists in different locations.Note that the global model simulations used here may miss small-scale features that may be important for IAV, such as dust devils or moist convective events (e.g., Renno et al., 2000;Marsham et al., 2011).
Here in this study, we use three different reanalysis meteorological datasets, online dynamic winds and different modeling frameworks to try to understand how robust interannual variability in simulated dust is across 1990-2005.Our emphasis is on conducting sensitivity studies comparing the importance of different model meteorological datasets and frameworks, but we do include some comparison to observations.While we focus on interannual variability, we will contrast that to seasonal variability, which has been commonly evaluated in models and previous intermodel comparisons (Huneeus et al., 2011).The period between 1990 and 2005 was chosen for this study because it has more available observational and reanalysis data than other years, but it must be noted that this time range does not have as much variability as previous periods (e.g., dry 1980s vs. wet 1960s in the Sahel region; Prospero and Lamb, 2003).Model results are compared to limited available in situ concentration, deposition and AOD data, in order to evaluate the models' ability to simulate the spatial and temporal variability observed in the dust cycle.In order to simplify the paper, we will focus on IAV in surface concentrations, and provide information on how deposition and AOD contrast with surface concentration variability.Section 2 describes the methods used in the study, including a brief description of the models, data and comparison metrics.Section 3 describes the results of the study, starting with comparison to observations, comparison between different model simulations, and the implication for observational needs.

Model description
Several models are used in this study, all of which include prognostic dust.The atmospheric component of an Earth system model, the Community Atmospheric Model (CAM4) of the Community Earth System Model (CESM) (Neale et al., 2013;Hurrell et al., 2013) is capable being forced either by online-calculated dynamical winds or by reanalysis datasets and is the model used for the bulk of the analysis to test sensitivity to which reanalysis winds are used (Sects.2.1.1 and 2.1.2).To contrast with this model, results from other models, including the CAM5 version of the same model, and two chemical transport models, driven by reanalyses, are also used (Model of Atmospheric Transport and Chemistry, or MATCH, and GEOS-chem) (Sect.2.1.3).More details are described below, along with a summary of the model simulations (Table 1).

CAM4 dust model
For the bulk of the analysis in this study, simulations using the CESM were conducted focusing on the atmospheric  Y 1990Y -2008Y Albani et al. (2014) ) model, CAM4 (Neale et al., 2013;Hurrell et al., 2013).This model is capable of simulations based on prognostic dynamic meteorology, as conducted for long climate simulations, or simulations forced to follow specific meteorological events, which allows comparison to specific observational data or field campaigns (Neale et al., 2013).The CESM model includes four main global climate model (GCM) components: atmosphere (in this case, CAM4), land, ocean, and sea ice, all linked by a flux coupler.However, only the land and atmosphere components were prognostic for these simulations, with prescribed ocean and sea ice being used instead.
Dust is entrained into the atmosphere when strong winds occur in dry, unvegetated regions with easily erodible soils (Marticorena and Bergametti, 1995), using the Dust Entrainment and Deposition module (Zender et al., 2003a).There is a dust source when the leaf area index (LAI) is sufficiently low ( < 0.3 m 2 m −2 ) and the soil moisture modifies the threshold wind velocity, as described in more detail in previous studies (Zender et al., 2003a;Mahowald et al., 2006).For most of the simulations used here, the CAM4 is used with the bulk aerosol module (BAM), This module includes four size bins: 0.1 to 1.0 µm, 1.0 to 2.5, 2.5 to 5.0, and 5.0 to 10.0 µm in diameter (Zender et al., 2003a;Mahowald et al., 2006), with the source size distributions described in Albani et al. (2014), following Kok (2011).The mass fraction of dust contained in each of the four bins is allowed to change over time as aerosols are transported and deposited out of the atmosphere (Zender et al., 2003a;Mahowald et al., 2006).Transport occurs through the CAM4 tracer advection scheme (Neale et al., 2013).Dry deposition includes both gravitational and turbulent settling (Seinfeld and Pandis, 1998;Zender et al., 2003a), while wet deposition includes both convective and stratiform precipitation, incorporating prescribed solubility and parameterized scavenging coefficients (Mahowald et al., 2006;Albani et al., 2014).Emission over different soil types is parameterized by a geomorphic soil erodibility coefficient (Zender et al., 2003b), following the preferential source ideas of Ginoux et al. (2001).Finally, regional soil erodibility is optimized using the methodology described in Albani et al. (2014), by applying scale factors to existing soil erodibility parameters for macro regions, in or-der to best match available data (Albani et al., 2014).For example, in most versions of the model the tuning reduced dustsource strength over Central Asia and the Atacama Desert, and increased dust-source strength over Argentina (Albani et al., 2014), which becomes important when analyzing dust IAV, as discussed in the results sections (Sect.3).

Meteorological forcing datasets
In the CAM model, the reanalysis forcing can be used to nudge the model close to specific weather patterns so that events can be simulated (Lamarque et al., 2011) using a similar methodology to one used elsewhere (e.g., MATCH; Rasch et al., 1997;Mahowald et al., 1997).In this procedure the horizontal wind components, air temperature, surface temperature, surface pressure, sensible and latent heat flux, and wind stress are read into the model simulation from the input meteorological dataset.These fields are subsequently used to internally generate (using the existing CAM4 parameterizations) the variables necessary for (1) calculating subgridscale transport including boundary-layer transport and convective transport; (2) the variables necessary for specifying the hydrological cycle, including cloud and water vapor distributions and rainfall (see Lamarque et al., 2011 for more details;developed in Rasch et al., 1997;Mahowald et al., 1997).While this approach has the advantage of increased versatility with the input reanalysis dataset, there are inconsistencies between the model and reanalyses or observations, which leads to the model being nudged back towards the reanalyses.These inconsistencies mean the model has an anomalous source of energy or water.The approach used in MATCH and CAM was developed to minimize these inconsistencies (e.g., Mahowald et al., 1995Mahowald et al., , 1997)).Previous studies have shown that the MATCH-CAM framework can reproduce the reanalysis precipitation to a very high degree at the sub-daily to monthly to annual timescales (Mahowald et al., 1997;Mahowald, 1996).
Three different reanalysis meteorological datasets were used to simulate dust entrainment, transport and removal in the CAM4 simulations: MERRA (Modern Era-Retrospective Analysis for Research and Applications version 1; Rienecker et al., 2011) Prediction)-NCAR (National Center for Atmospheric Research) 50-year reanalysis (Kistler et al., 2001), and ECMWF (European Center for Medium-Range Weather Forecasts) ERA-Interim (Dee et al., 2011).Within the meteorological literature there are many studies contrasting these datasets to available observations, and showing the errors in the reanalyses, especially the moisture transports and precipitation (Trenberth and Guillemot, 1998;Trenberth et al., 2000Trenberth et al., , 2011;;Trenberth and Fasullo, 2013).Reanalyses should not themselves be considered observations, but are the closest representation we have to observed meteorology, which can drive chemical transport models, and thus they represent an important resource.Here in this paper we supplement the meteorological analysis of different reanalysis datasets by contrasting how they impact dust emission, transport and deposition.
A fourth simulation was also conducted using AMIPtype protocol (Atmospheric Model Intercomparison Project; Gates et al., 1999).For AMIP simulations, the monthly mean sea surface temperatures are used to force the model's online meteorology, but no atmospheric fields are used to constrain the model, in contrast to the reanalysis-driven simulations described above.Because in this case there is no inconsistency between the atmospheric model and the reanalysis, which can result in sources or sinks of water or energy, it is often considered a more robust way to simulate water vapor and thus chemistry (e.g., Hess and Mahowald, 2008;Trenberth and Guillemot, 1998;Trenberth et al., 2000).However, AMIP simulations cannot simulate exact weather events, but only interannual variability.
All the CAM4 simulations were conducted using a ∼ 2 • × 2 • horizontal resolution.Not all of the input reanalysis data were available in the format required for CAM4 for the entire satellite era, so most of the analysis was conducted over a time period of 15 years when all the input data were accessible.There are parts of the analysis for which data outside this time period was used, but this is always indicated in the results.A summary of the model simulations and time periods of data availability are shown (Table 1).The first year of each simulation was neglected to allow for spinup.
For two of the model simulations used here -CAM4 (MERRA) and CAM4 (NCEP) -additional aerosol species were available for analysis -sea salts, black carbon (BC), organic carbon (OC), and sulfate (SO 4 ) -and we use these to contrast the correlations between dust aerosols and other aerosols for a sensitivity study, referenced in Sect.3.3.

Other dust models
In this paper, we also include sensitivity studies, where additional models are used, to contrast the importance of meteorological datasets with model construction.We briefly describe here the other models used in the study, with an emphasis on contrasting the differences in the model construction, but refer the interested reader to the specific model descriptions elsewhere.
An AMIP-style simulation was also conducted using the CAM5 model for comparison with the CAM4 AMIP simulation.While the dust generation module in the CAM5 is identical to the CAM4, there are significant differences in the physics (Hurrell et al., 2013), as well as the aerosol formulation, which has implications for dust (Albani et al., 2014).The CAM5 model includes new planetary boundary layer, radiation, and moist convective parameterizations (Hurrell et al., 2013), in addition to a modal aerosol module (Liu et al., 2012;Ma et al., 2013).Similar to CAM4, the dust simulations in the CAM5 were evaluated and tuned as in Albani et al. (2014).CAM5 assumes all aerosols are internally mixed (all aerosols in the same size are assumed to be mixed together for radiative forcing calculations), in contrast to CAM4, where aerosols are assumed to be externally mixed.CAM5 also allows aerosol indirect cloud effects to be calculated (e.g., Wang et al., 2011), although these are not evaluated here.Like the CAM4 simulations, CAM5 calculations were performed using a ∼ 2 • × 2 • horizontal resolution.
Dust simulations using the GEOS-chem model (version v9-01-03; http://www.geos-chem.org/)from Ridley et al. (2014) were also considered here (Table 1).The model includes similar processes as the CAM4 model, including externally mixed aerosols, but the GEOS-chem model is forced here only by MERRA-1 winds -referred to as GCHEM (MERRA) here.GCHEM (MERRA) uses the DEAD dust module (Zender et al., 2003a), and employs the dust-source function derived from (Ginoux et al., 2001).More details on the dust simulations from the GCHEM (MERRA) model can be found in Ridley et al. (2013Ridley et al. ( , 2014)).Notice that the version used here for these comparisons does not include the source function derived from Koven and Fung (2008) or the vegetation phenology and interannual variability, as included in the study focusing on African emissions (Ridley et al., 2013(Ridley et al., , 2014)).The horizontal resolutions of the model simulations were all 2 • × 2.5 • , and were interpolated onto the CAM grid for analysis in the paper.This model uses a similar dust entrainment scheme as CAM, but has different size distribution and deposition mechanisms, as well as different boundary layer and moist convection physics.
It should also be noted that there is a difference between how the GCHEM (MERRA) model and CAM models incorporate reanalysis meteorology.The GCHEM (MERRA) model reads in all meteorological parameters from the reanalysis datasets, including turbulent and moist convective mixing, as well as the hydrology.This has the advantage, which the chemical transport model does not have, to rederive the hydrological cycle or mixing, which can be difficult (e.g., Mahowald et al., 1995;Rasch et al., 1997).On the other hand, it makes the model less flexible, as it can only be operated using reanalysis datasets with mixing parameters output at the right frequency, in contrast to the MATCH or CAM framework (e.g., Rasch et al., 1997;Mahowald et al., 1997).This means that although both the CAM4 (MERRA) and GCHEM (MERRA) models are forced by MERRA winds, the surface winds and transport may still be slightly different.
Finally, simulations using the MATCH (Rasch et al., 1997) with NCEP reanalysis data (Mahowald et al., 1997;Kalnay et al., 1996) are included.These simulations use a similar entrainment and deposition scheme (Zender et al., 2003a), with a simple wet removal scavenging coefficient (Luo et al., 2003), and have been extensively compared against observations (Luo et al., 2003(Luo et al., , 2004;;Mahowald et al., 2003).In contrast to the CAM models discussed earlier, there is no vegetation phenology included.Instead, the preferential source term from Ginoux et al. (2001) was used.The horizontal resolution of the MATCH (NCEP) model used here is 1.8 • × 1.8 • , and the results were interpolated to the CAM grid before comparison to the other models.
Note that only the GCHEM (MERRA) model simulations are independently developed: the others all come from the same group, and previous studies have suggested that the group developing climate models matters substantially in their behavior (Knutti et al., 2013).There are, however, mean differences even between the CAM simulations, such as in dust vertical distributions and transport in relation to vertical mixing (Albani et al., 2014), and the strength of the Sahel source (e.g., Scanza et al., 2015).

Observational data description
For completeness, we compare standard annual means from each simulation to available data (e.g., Ginoux et al., 2001;Huneeus et al., 2011) in order to show that the mean dust cycle is reasonable in our models.In this study, we use the annual-mean surface concentration, AOD and deposition compilations from Albani et al. (2014).
For the variability studies, data were chosen for overlap with model runs and availability of longer datasets to evaluate interannual variability.In situ observations from the University of Miami network (Prospero and Nees, 1986;Prospero et al., 1996;Arimoto et al., 1990Arimoto et al., , 1997) ) were used here, as compiled in Luo et al. (2003) and Mahowald et al. (2003) and updated at some sites (Prospero and Lamb, 2003).In addition, in situ observations from three sites in northern Africa from the AMMA (African Monsoon Multidisciplinary Analysis) campaign were included (Marticorena et al., 2010).Details on the individual sites and locations are included in Table 2, and are shown in Figs.S1, S3 and S5 in the Supplement.
Sun-photometer-derived AOD is included from the AERONET (Aerosol Robotic Network) database (Holben et al., 1998).Only sites where more than 50 % of the modeled AOD was from dust are included in this comparison (as filtered in Fig. 1 using in Mahowald et al., 2007), and only sites with more than 18 months of data are used here to estimate variability (Table 2).Because both sea salt and dust occur in the coarse mode, we cannot use remote sensing measure-ments of the coarse vs. fine mode to identify dust-dominated stations.
Two sites in the Southern Hemisphere are included (bottom of Table 2): surface concentration data from Rio Gallegos (Zihan, 2016) and deposition data from Kerguelen (Heimburger et al., 2012).Because of the limited datasets in the Southern Hemisphere, we will consider these sites separately in Sect.3.3.Although there are no AERONET data using these Northern Hemisphere criteria available in the Southern Hemisphere, if we focus on the coarse mode, there is one station which is dominated by dust (i.e., far from the coasts, where sea salts would make up a large percentage of the total aerosol load): Tinga Tingana in southeastern Australia.Located in a region of predominately westerly winds, Tinga Tingana lies downwind of the central Australian desert, allowing desert dust to feature prominently in its aerosol load.Tinga Tingana also has a data record beginning in September of 2002, which overlaps our chosen time period by more than 3 years, making this the best Southern Hemisphere site to evaluate our modeled AOD variability.
For evaluation of the models' precipitation, we use the CPC Merged Analysis of Precipitation (CMAP) (http: //www.esrl.noaa.gov/psd/data/gridded/data.cmap.html)(Xie and Arkin, 1997).This is a combination of in situ and satellite observations, as well as models, to present the best estimate of precipitation.

Analysis methods
For comparison of the variability in the modeled and observational values, we define variability similarly to previous studies (Mahowald et al., 2003): where σ is the standard deviation of the modeled and observed values, and µ is the mean.
Here, we focus on IAV, so that the annual means are used in Eq. ( 1) for that calculation.However, in order to contrast with the variability in the seasonal cycle, we will also report the strength of the seasonal cycle in terms of variability, where the values included are the climatological monthly means.
Models and observational time series of in situ concentrations and AOD (Table 2) were also correlated, using rank correlations to assess the ability of the models to simulate variability (similar to Mahowald et al., 2003).Rank correlations are used in order to reduce the importance of individual, extremely high data points, which can dominate regular correlations (Wilks, 2006).Similar to the variability, most of the analysis in the paper focuses on of the annual means, which test the ability of the models to simulate interannual variability, but some information about seasonal correlation is also provided, in order to provide context and comparison to previous studies.We also analyze the observations and model output for trends, by calculating the least-squares fit slope, as well as the standard deviation in the slope.
In order to understand how similar model results are, we also calculate the variability at each grid box and compare different models at both the interannual and seasonal timescale.We also correlate the time series of models at individual grid boxes across model simulations, again on the two different timescales.
To show the sensitivity to meteorology, we correlate the three CAM4 reanalysis simulations (CAM4-RE) which will give us three different correlation coefficients (CAM4-MERRA vs. CAM4-NCEP; CAM4-MERRA vs. CAM4-ERAI; and CAM4-NCEP vs. CAM4-ERAI) and then average at each grid point the three different correlation coefficients to find the average correlation.Similar results are conducted using the AMIP simulations (CAM4-AMIP vs. CAM5-AMIP), and for the model simulations using the exact same meteorology (CAM4-MERRA vs. GCHEM-MERRA and CAM4-NCEP vs. MATCH-NCEP).
Finally, we use the model values to estimate the number of monthly mean observations required to correctly estimate the climatological annual mean value over 1990-2005.To do this, we assume that we would like to have a 95 % chance to be within 1 standard deviation of the climatological mean.1000 Monte Carlo simulations were conducted, and each time we chose randomly from the modeled monthly mean values at each grid point, and for each number of observations (between 1 and 50) we calculated the percentage of the time that the mean is within 1 standard deviation of the climatological mean of the 1990-2005 simulation.At every grid box, the number of observations that would meet the 95 % criteria is then calculated, providing an estimate of the number of months of observations required.
Note that the modeled monthly mean values are not at all Gaussian distributed, and thus normal methods for determining the number of observations would not work (e.g., Wilks, 2006).Thus, for this analysis, we use rank correlations, which work with non-gaussian data.To be consistent with the climate model community (Taylor, 2001;Gleckler et al., 2008), for mean and standard deviation analysis described above, we use these standard metrics, despite the fact that our datasets do have not Gaussian distribution, which will lead to some errors in our results.3 Results and discussion

Comparison of model and observational variability
The annual mean distribution of the model simulations included here are evaluated elsewhere in more detail, since many of these model results were previously published (Luo et al., 2003;Huneeus et al., 2011;Albani et al., 2014;Ridley et al., 2013Ridley et al., , 2014)), but for completeness we repeat comparisons of annual mean surface concentration, AOD and deposition between available observations and the model simulations in the online Supplement (Table S3; Figs.S1-S6).Concentrations vary over several orders of magnitude spatially, and the models are able to simulate these variations (Figs.S1  and S2).In addition, the models can be shown to be mostly accurate in simulating the observed dust AOD and deposition (Figs.S3-S6).Most of the model versions presented here do an equally good job when compared against the observations (Table 3).Note that the CAM4 and CAM5 simulations were tuned against these same observations (Albani et al., 2014), while the MATCH (NCEP) and GCHEM (MERRA) models were previously compared to similar observational syntheses (Luo et al., 2003;Huneeus et al., 2011;Ridley et al., 2013Ridley et al., , 2014)).The mean source flux and globally averaged AOD of the three CAM4-RE are 2400 ± 26 % Tg yr −1 and 0.026 ± 30 %, respectively, while for all the models included here the mean emission flux and AOD are 2400 ± 26 and 0.025 ± 40 %, respectively.These ranges are similar to previous studies (Huneeus et al., 2011;Reddy et al., 2005).Note that using a similar model (CAM4-RE) and similar methodology to constrain the dust AOD, based on a combination of surface concentration, deposition and AOD in dust regions (Albani et al., 2014) obtains an uncertainty just due to meteorology of 30 %.A more recent estimate, based more extensively on remote sensing data with limited information from four different models, but without using deposition or surface concentration data, finds a higher AOD of 0.033 ± 0.006 than found here, and a much smaller error estimate (Ridley et al., Observational data from AERONET stations (citations listed in Table 2).The annual means are divided by the long-term mean to allow comparison with seasonal variability, since they are similarly normalized (Fig. S8).A map (e) with station locations for concentration (blue; Fig. 1), AOD (red, Fig. 2) and Southern Hemisphere analysis (green; Fig. 5).

2016
).Thus there are large differences in the deduced AOD and uncertainties depending on assumptions about how to include different data, as well as the details of the models and methodology used.Since this study is focused on an inter-comparison of different model simulations, rather than an evaluation of a specific model, we conduct limited comparison to observations.We focus on the highest quality data, coming from in situ concentrations and sun photometry data in dust-dominated regions (e.g., Prospero and Lamb, 2003;Holben et al., 1998; Table 2), and ignore satellite-based measurements (e.g., Torres et al., 2002;Evan et al., 2006) and dust visibility data (Mahowald et al., 2007) which are more difficult to interpret, both because they are not always only dust aerosols, and because they can have larger errors and thus be more difficult to compare for interannual variability (Torres et al., 2002;Evan et al., 2006;Mahowald et al., 2007).Some previous studies have included comparison of these in situ and sun photometry data in terms of interannual variability for specific model simulation evaluation (e.g., Mahowald et al., 2003;Ridley et al., 2014).The vertical distribution of the aerosol can also vary depending on the meteorology used (e.g., Albani et al., 2014), which may introduce some additional variability and discrepancy for in situ ground-based measurements.
Focusing on the amount of IAV, the models tend to simulate values between 0.1 and 0.8 for the variability (standard deviation of annual means divided by climatological annual mean; Sect.2.3) at the observing sites, with largest values found at Mace Head in the models, but Bermuda in the observations.The models tend to simulate more IAV in AOD than in concentration (Fig. 1 vs. Fig.2; Fig. 3a vs. Fig.3b), while the observations suggest similar amounts of variability (Fig. 1 vs. Fig.2; Fig. 3a vs. Fig.3b).Both the models and the observations suggest much more variability (> 2-fold) in the seasonal cycle than in the IAV (contrast Fig. 1 vs. Fig.S7 or Fig. 2 vs. Fig.S8; Table S1 vs. Table S2 in the Supplement; or Fig. 3c and d) at many sites (Banizoumbou, Barbados, Bermuda, Zinzana, Miami, Midway, Dalandzadgad and Sedé Boqer), and only slightly larger (1-2-fold) at the other sites.
Next we evaluate the ability of the models to simulate the high and low annual means using rank correlations.Most of the models do not have statistically significant correlation coefficients (Fig. 3e and f; Table S3).The models do much worse at simulating IAV than the seasonal cycle (contrast Figs. 1 and 2 with Figs.S7 and S8; Fig. 3e and f with Fig. 3g  and h), since at most stations, the models have a statistically significant correlation for the seasonal cycle and simulate a similar amount of variability over the seasonal cycle.For the seasonal cycle, the exceptions are at Izaña and Ilorin for all of the models, and the CAM4 (AMIP) simulation, which is not statistically significantly correlated at most of the stations (Table S3; Fig. 3e and f).For the CAM4, forcing with only sea surface temperatures (SSTs) substantially degrades the ability of the dust model to simulate the seasonal cycle, while in CAM5, the seasonal cycle is better simulated.
In model intercomparisons it has been observed that the model mean often does a better job than the individual model simulations (e.g., Flato et al., 2013).We evaluate this in the case of dust using the average of the CAM4 reanalysis models -CAM4 (MERRA), CAM4 (NCEP) and CAM4 (ERAI).At the observational sites considered here, we do not see a large increase in the correlation coefficients for the model average vs. individual models for either the IAV (Table S3) or the seasonal cycle (Table S4).
New in this section is the evaluation of the relative ability of reanalysis-driven models vs. sea surface temperature forced models -CAM4 reanalysis vs. CAM4 (AMIP)which suggests a degradation in the ability of the models driven only by sea surface temperature to simulate both seasonal and interannual variability, although this is dependent on which model version (CAM4 vs. CAM5; Tables S3  and S4; Fig. 3).This correlation potentially provides insight into how much of the variability in dust is driven by sea surface temperatures.There are also significant differences between models driven by the same meteorology (e.g., CAM4 (MERRA) vs. GCHEM (MERRA) and CAM4 (NCEP) and MATCH (NCEP), highlighting the importance of model for-  mulation as well as meteorology.We will return to these points in later sections.

Comparison to trends in observations in the North Atlantic
Recent studies have highlighted the importance of fluctuations in rainfall in the Sahel for driving interannual variability and decadal scale variability in North Atlantic dust concentrations as seen in Barbados (e.g., Prospero and Lamb, 2003), although the importance of land use, winds, surface roughness and vegetation changes have been noted as well (e.g., Marticorena and Bergametti, 1996;M'bourou et al., 1997;Mahowald et al., 2002Mahowald et al., , 2007;;Cowie et al., 2013).Since 2000, it has been noted that the Sahel precipitation no longer anti-correlates with dust at Barbados, suggesting a different mechanism may have become important.(Prospero, 2006;Mahowald et al., 2009).Ridley et al. (2014) proposes the hypothesis that the observed decrease in dust from 1982 to 2008 at Barbados is controlled by source wind strength over source regions in northern Africa.For this argument, they use model evaluation with the GCHEM (MERRA) model (a different version of which is also included here), as well as analysis of ERAI and NCEP reanalysis winds and other observations (Ridley et al., 2014).Indeed, station data in the northern African region, especially the Sahel, support the idea that winds decreased in this region between the late 1970s and 2003 (Mahowald et al., 2007), and there is also an observed widespread decrease in surface winds across many land regions (McVicar et al., 2012).Of course, there are many issues with the observation of surface winds due to small-scale effects of buildings or topography (e.g., discussed in McVicar et al., 2012), so it is unclear how robust trends in observed surface winds are.Note that data correlations between visibility and winds suggest that both precipitation (Prospero and Lamb, 2003) and winds (Engelstaedter and Washington, 2007) are important for changes in dust near the source regions of northern Africa (Mahowald et al., 2007).Importantly, Cowie et al. (2013) argue that the trends in surface winds and dust could be from changes in vegetation through the mechanism of surface roughness, which would also link these changes to precipitation.Note that because model-calculated surface roughness was not archived in the models, we cannot test the Cowie et al. (2013) hypothesis directly in this study.
Here we can consider whether the hypothesis put forward in Ridley et al. (2014), that the decrease in winds in the surface region is responsible for the observed annually averaged decrease in surface concentration at Barbados, is consistent with the simulated trends in the multiple models included in this analysis.For this part of the paper, we use the full time period of our models, although for some models only some of the 1982-2008 time period is available (Table 1).For simplicity we consider only annual averages.As shown in Ridley et al. ( 2014), there is a statistically significant downward trend in the data at Barbados (Table 4).All the model versions considered here simulate a downward trend in the data at Barbados, although for some models this is not a statistically significant trend -CAM4 (NCEP); CAM5 (AMIP).Only one model simulates the slope within 1 standard deviation of the observed value: the CAM4 (MERRA) (Table 4).The GCHEM (MERRA) overpredicts the magnitude of the negative slope as shown by Ridley et al. (2014), while the other model versions underpredict the magnitude of the slope (Table 4).
Only some of the models see a statistically significant decrease in source strength in northern Africa (Table 4), and some models predict an increase over this time period.However, a look at the trends in surface concentration across the Only slopes that are larger in magnitude than 1 sigma from the regression are plotted.Positive slopes imply increasing concentrations.models show that all the models see a decrease in surface concentrations that extends from the Sahel area of northern Africa across the tropical North Atlantic to Barbados (Fig. 4), supporting the idea that the source strength is decreased over the time period 1990-2005.While individual models might simulate downward or upward trends elsewhere, this is the only region that sees a consistent model signal across this time period (Fig. 4).If we focus on the Sahel (western) area of northern Africa, indeed, most of the models simulate a decrease in the source (Table 4); exceptions are CAM4 (ERAI) and CAM5 (AMIP).Since the visibility data in northern Africa also suggest a decrease across this time period in the western Sahel (Mahowald et al., 2007), these support the idea that the decrease in the source is the cause of the decrease in Barbados surface concentrations.In the CAM4 models, the strongest correlation in IAV of the source occurs with surface winds (Table S5), and indeed in all the models there is a de-crease in surface winds over this time period over the source regions, as seen in Ridley et al. (2014) S6).

(Table
There is also a correlation between Sahel precipitation and Barbados concentrations (Prospero and Lamb, 2003), or precipitation and visibility in the Sahel (e.g., Mbourou et al., 1997;Mahowald et al., 2007), so the other driver could be precipitation.In some of the CAM4 model simulations, the IAV of the source strength does feature significant correlations with both LAI and precipitation, but in general those same cases feature even stronger correlations with surface winds (Table S6).Although the quality of the surface wind data precludes us from evaluating the reanalysis for surface winds, as discussed in McVicar et al. (2012), we can evaluate the precipitation, which is commonly done (e.g., Trenberth and Guillemot, 1998).When we do, we see that the reanalysis precipitation datasets are not capturing either interannual variability in precipitation or the slope in the pre-   cipitation, compared to the CMAP precipitation compilation (more details in Sect.2.2) (Table 4).Since precipitation and winds, perhaps due to gustiness in moist convection (Engelstaedter and Washington, 2007) or due to surface roughness changes from vegetation (Cowie et al., 2013), are likely related to each other, an error in the IAV of precipitation may be indicative of an error in the IAV of winds.Overall, the model simulations conducted here support the hypothesis of Ridley et al. (2014), although the quality of the reanalysis data forcing our simulations does not allow us to be conclusive about our results.

Southern Hemisphere variability
Most of the available dust observations come from the Northern Hemisphere (Table 2).Here we consider three sets of data from the Southern Hemisphere, one of surface concentrations in Rio Gallegos (Argentina/Patagonia, 52 • S, 69 • W) (Zihan, 2016) and one of the deposition at Kerguelen Island (49 • S, 70 • E), which is likely to be influenced by both South American and South African dust sources (such as the Patagonian and Namib deserts; Heimburger et al., 2012; Fig. 5).Finally, there is one AERONET station with data in the coarse mode, which is likely to be dominated by dust, since it is far from the coast and downwind of the Australian desert (Tinga Tingana, 29 • S, 140 • E) (Fig. S9); Notice that there are very few data at the first two observational sites, which means these results need to be interpreted carefully, especially for IAV, since 1.5 years of data are measured (Fig. 5).The surface concentration data at Rio Gallegos suggest an IAV variability of 0.08 (Table 5), which is at the lower edge, but in the same range as the observations in the Northern Hemisphere (values between 0.06 and 0.4; Fig. 3a; Table S1).The models, however, tend to predict too large a variability at this site (between 0.2 and 2).It is unclear why the models overpredict the variability, but it may be due to issues with the actual location of the sources in the model compared to the observations, or the strength of the north-south gradient in concentrations in the models vs. observations (e.g., Gaiero, 2007;Gassó et al., 2010).In addition, the way the tuning in Albani et al. ( 2014) was conducted will increase the interannual variability of the dust cycle in the CAM4 and CAM5 simulations in the Southern Hemisphere (Table 5).These model simulations had too small an emission value from some regions and too large from others, and thus the model source strengths were tuned in order to broadly match observations (Albani et al., 2014).In particular, in Argentina, the dustsource strength had to be tuned up very strongly in order to obtain a climatological dust that matched observations.This means that there were very few grid boxes and time periods with active sources, increasing the temporal variability in the model.If we had instead changed the wind threshold in our model formulation in order to tune the source strength (e.g., Tegen and Miller, 1998), we presumably would not have increased our variability as much, highlighting the importance of details in the model formulation for model results.The CAM4-ERAI and CAM4-AMIP simulations especially overpredict variability.Note, however, that some of the CAM4 models (CAM4 (MERRA) and CAM4 (NCEP)) have similar variability as the non-CAM4 models (GCHEM (MERRA) and MATCH (NCEP)), especially at Kerguelen and Tinga Tingana, suggesting that some of this variability may also be associated with the meteorological dataset.We will explore the ramifications for variability predictions in Sects.3.4 and 3.5.Further downstream of the sources, the amount of variability in deposition observed at Kerguelen is 0.10, which is overpredicted in most of the models (Table 5), while at Tinga Tingana, the IAV variability is 0.42 in the observations, and tends to be smaller in the models.
At both Rio Gallegos and Tinga Tingana, the ratio of the IAV variability to the seasonal variability is 1.4 and 1.5, which are on the lower side of the observations in the Northern Hemisphere (Fig. 3c and d; Table S1).The models are able to simulate the reduced fraction of variability due to the seasonal cycle at these sites (Table 5).Because of the limited data and length of data, we cannot be sure, but the observations presented here are consistent with a stronger role of interannual variability, compared to seasonal variability, in dust sources in the Southern Hemisphere than in the Northern Hemisphere, as simulated by the models.Additional longterm data in the Southern Hemisphere would allow more testing of this model result.

Spatial analysis of model simulations of variability and correlations
Consistent with previous studies (e.g., Tegen and Miller, 1998;Mahowald et al., 2003Mahowald et al., , 2011)), the largest variability in IAV (standard deviation over the mean, described in Eq. 1, Sect.2.3) in modeled dust concentrations occurs not in the main dust-source areas or outflow regions of the North Atlantic, but rather adjacent to these regions, where intermittent dust events occur (Fig. 6a)(for example, in the North Atlantic, north of the main transport pathways).Some of the highest IAV variability occurs over ocean regions, especially in the Southern Hemisphere.There is much more seasonal variability than IAV variability in most locations in the Northern Hemisphere (Fig. 6b), with smaller ratios of seasonal to IAV variability over the southwest US, North Atlantic to Northern Europe, and across large parts of the Southern Hemisphere (Fig. 6b).The deposition and AOD IAV variability have similar patterns to the concentration, with a slightly higher variability in deposition and slightly lower AOD (Fig. 6a vs. Fig.S10a and c), and similar importance of the seasonal cycle (Fig. 6b vs. Fig.S10b and d).As discussed in Sect.3.3, for the CAM4 and CAM5 model simulations, some of this enhanced Southern Hemisphere variability could be due to the tuning of the source areas, because the dust sources were not consistently active enough (Albani et al., 2014).If we consider only the models in which the Southern Hemisphere dust sources did not have to be increased -GCHEM (MERRA) and MATCH (NCEP) -then the monthly mean variability in the South Atlantic is similar to the North Atlantic (Fig. 6c), but there still tends to be a smaller proportion of the variability from seasonal variability, and thus a more important role for interannual variability, in the South Atlantic, Indian or Pacific oceans than in the Northern Hemisphere oceans (Fig. 6d).Some of the limited observational data supports strong interannual variability in the Southern Hemisphere, although not as strong as some of the model versions (Sect.3.3).
Next we consider how similar the temporal variability is in the model simulations covering the same time period.If two model simulations are temporally correlated, it implies the timing of the monthly mean variability in the models is similar.Of course, to obtain the fraction of the variability that is similar, the correlation coefficient needs to be squared (if we assume a Gaussian distribution in the model output), which means that even a statistically significant (at 95 %) high correlation of 0.8 (for 16 different years) only implies that 60 % of the variability is similar.However, the correlation is a useful way to consider how similar the simulations are in their variability.The correlations between the model simulations for the surface concentration suggest that the models simulate similar IAV variability over some of the globe, but over most of the globe, there is no statistically significant correlation (Fig. 7).The strongest correlations occur in the model simulations with reanalysis-driven simulations (Fig. 7a, b, d and e), and the simulations with time-varying SSTs (AMIP) were more different (Fig. 7c  and f).This suggests that sea surface temperature forcings are not the only important driver for the dust cycle.Notice that simulations with the same model but different winds (CAM4 (MERRA) vs. CAM4 (NCEP); Fig. 7a) had similar correlation coefficients to using different winds and model framework (e.g., CAM4 (MERRA) vs. MATCH (NCEP); Fig. 7e) or different models with the same winds (e.g., CAM4 (MERRA) vs. GCHEM (MERRA); Fig. 7d).This is made more clear when we average the correlations across CAM4 reanalysis models and compare to simulations using different model frameworks driven with the same meteorological data (Fig. 8a vs. Fig.8b), which show similar patterns of correlations.This suggests that both model framework and winds 0.3 0.6 0.9 1.2 1.5 1.8 2.1 2.4 2.7 3.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 contribute to variability, and perhaps in a similar, but not additive magnitude.
We can explore the drivers of interannual variability by focusing on the average correlation between the CAM4 simulations driven by reanalysis (CAM4-RE average) and the North Atlantic Oscillation (NAO) and the El Niño-Southern Oscillation (ENSO) climate indices (Fig. 8c an d).These show correlations similar to previous studies (e.g., Moulin et al., 1997;Mahowald et al., 2003), and suggest some correlation between NAO and ENSO.Because we are using a relatively short time period (16 years), these signals do not show as statistically significantly as if we use a longer time period, but the same models (e.g., MATCH in Mahowald et al., 2003, included 22 years) have shown a similar pattern in previous studies.There is much more coherence in the simulated variability from the reanalysis winds, than seen in the NAO or ENSO (Fig. 8a vs. Fig.8c and d).This is also consistent with the lower correlation between the SST-driven model simulations (AMIP-style) (Fig. 8e).There are much higher correlations between model results when we use reanalysis winds compared to forcing with only sea-surface temperatures, indicating the value of using reanalysis datasets to obtain more robust results.Notice that there is a much stronger correlation between models if we consider the seasonal cycle (Fig. 8f), indicating the difficulty the models have with simulating IAV compared with the seasonal cycle.This is consistent with previous studies showing similar simulation of seasonal cycle across many models (e.g., Huneeus et al., 2011).
Surprisingly in some remote ocean regions, the models are simulating similar interannual variability (notably, parts of the Atlantic, Pacific and Indian Oceans), although this could be spurious due to the short time period considered (Fig. 8).There are some statistically significant trends in the model simulations over the 1990-2005 time period, as seen in Barbados (Sect.3.2) and the nearby North Atlantic and some parts of northern Africa (Fig. 4), which may be responsible for some of this coherence.The patterns and magnitude of correlation coefficients is similar for deposition and AOD in the models (Fig. S11).
A comparison of other aerosols in two of the CAM4reanalysis-based simulations available here (Fig. S12) is consistent with the idea that dust is more highly variable.For other aerosol types, especially BC, OC and SO 4 , which include in this study no IAV in the sources, there is some correlation between the models driven by different meteorology far from the sources as well as close (Fig. S12).We will next discuss regional averages to understand how similarly ocean basin averages are simulated, to see if these IAV correlations in some regions are large enough to provide coherent basin estimates.

Modeled temporal trends in different regions
As discussed in the cases of the Tropical North Atlantic and South America (Sects.3.2 and 3.3), as well as in previous studies (Tegen and Miller, 1998), some of the variability in dust comes from the source regions.Looking across the model simulations considered here, we see strong IAV in many of the source regions, with the strongest IAV in the smallest source regions, as seen previously (Tegen and Miller, 1998;Mahowald et al., 2003) (Fig. 9).Only the western Sahel source (Sect.3.2) is simulated to have a statistically significant trend in all the model simulations (Table 4, also seen in Figs. 4 and 9).There are strong increases in some of the model simulations of the Australian source, consistent with the observed increase in drought over this time period (Cai et al., 2014), although we do not know of dust observations verifying this.For example, visibility data extend- ing through 2003 do not support an increase in dust-source strength in Australia (Mahowald et al., 2007).Previous studies have shown that there are decreases in dust from South America over the 1990-2005 period in some models (Doney et al., 2009), but these are not shown in all the model ver-sions or supported by robust observational data (Doney et al., 2009).Yu et al. (2015) show evidence for strong decadal variability in the Saudi Arabian source over a different time period than considered here, with an increase between 2000 and 2015, arguing that a drought in the Fertile Crescent is   S8) in order to focus on interannual variability.The yellow highlighted area is the area encompassed by the five reanalysis-based simulations -CAM4 (MERRA), CAM4 (NCEP), CAM4 (ERAI), GCHEM (MERRA) and MATCH (NCEP).
responsible for the increase after 2000.The model simulations included here show a lower dust source during 2000-2005 (Fig. 9), consistent with their results, and correlation between precipitation and dust source in these simulations (Table 7).Considering the 1990-2005 period, on a global average, some models simulate an increase in the global dust source, other models simulate a decrease, suggesting no clear trends from the modeling of the global dust cycle over this time period (Fig. 9).
If we consider regional averages, there are moderate correlations in time (0.4-0.8) in the annual mean source strength, except for the Sahel region (0.13), suggesting they simulate similar interannual variability (Table 6; Fig. 9).Although the emphasis of this paper is on the temporal variability, there are significant differences in the climatological mean source strengths in the models used here (Table 6), highlighting the uncertainties in dust sources.The larger sources (such as northern Africa and the Sahel) have mean source strengths varying by 25 % while the smaller sources (such as North and South America) vary up to 160 %, just due to differences in the meteorology, using the same CAM4 model and observational constraints (Albani et al., 2014).
Focusing on the drivers of the CAM4-modeled variation in sources suggests that LAI has the strongest correlation Table 6.Regional sources of dust.For each source region, the averaged correlation across time between annual mean source strengths for the CAM-RE cases is shown in the second column.The following columns show the climatological mean source strength (Tg yr −1 ) for the mean of the three CAM4-RE simulations and the mean of the seven simulations included in this study.The ± % standard deviation is also shown, and represents the standard deviations across the models included in the averaging.The regions are defined as follows: Australia: 35 to 25  as well as speculation that past climate variability in winds from East Asia is sensitive to synoptic-scale wind events (e.g., Roe, 2008;McGee et al., 2010).Soil moisture has the highest correlation for North America.Notice that IAV in LAI is likely to be a strong function of soil moisture, and precipitation (Table 8), and LAI may cause changes in surface roughness and therefore winds (Cowie et al., 2013), so these variables are all likely to be related.There are trends across this relatively short time period of a few of the source regions and variables, averaged across all the models (Table S7), but longer records tend to suggest oscillation in dust-related variables: for example, the downward trend in dust in the Sahel from the 1980s followed a strong upward trend between the 1960s and the 1980s (Prospero and Lamb, 2003), and indeed there may have been even longer trends in dust from northern Africa (Mulitza et al., 2010).Thus, trends in the short time period considered here  may not necessarily be representative of the longer-term trends.
The current generation of Earth system models have a great deal of difficulty in simulating not only precipitation (as discussed in Sect.3.2) but also LAI (e.g., Sitch et al., 2015;Mahowald et al., 2016), and thus this strong dependence on difficult-to-simulate variables may decrease our ability to simulate interannual variability.Note that using satelliteretrieved vegetation (e.g., Zhu et al., 2013), as done in some models (e.g., Ridley et al., 2014), may remove model-derived uncertainties, but the satellite-retrieved vegetation has large uncertainties, especially in regions with low LAI, and does not do a good job of detecting brown vegetation, which is very important in resisting dust entrainment (e.g., Okin et al., 2001).This suggests that simulation of the surface conditions (e.g., vegetation, soil moisture, surface winds) in the source regions is likely to limit our ability to accurately simulate IAV in dust.
Downwind of the source regions, there is some coherence in the simulated variability in the surface concentrations as well (Table 8), especially in the North Atlantic, North Pacific, Equatorial Atlantic, South Pacific and South Indian Ocean (with correlation coefficients between different CAM4-RE simulations averaging between 0.46 to 075), but in other regions there is less coherence.The sources which dominate the surface concentration, deposition and AOD in different downwind regions were not diagnosed in this study, but were previously shown for the mean in related model simulations (see Albani et al., 2014; Fig. S1), and suggest source-receptor-type relationships consistent with previous studies (e.g., Tanaka and Chiba, 2006;Mahowald, 2007).Those studies suggest that, as expected, northern African sources dominate the North Atlantic dust burden (or conditions), and East Asian and Central Asian sources dominate the North Pacific.The Central Asian sources are important for the North Indian Ocean.The Southern Hemisphere sources tend to dominate the regions just downwind of the sources (Fig. S1, Albani et al., 2014).These results are consistent with the available source provenance data, which were used to 'tune' the CAM4 simulations (see Albani et al., 2014 for more details).
For the downwind regions, we can consider the importance of the climate indices in the time series (Fig. 10; Table 8).If we examine the regional correlation coefficients of dust surface concentration to the NAO, there is very little correlation across this 16-year period for the North Atlantic (Table 8 and Fig. 8c).Previously studies showed a larger contribution, which might be due to the short time series used here (Moulin et al., 1997) (Mahowald et al., 2003;Ginoux et al., 2004).The largest magnitude correlations are seen in regions remote from the NAO (the Antarctic, for example, with an anti-correlation of −0.42), which may be due to spurious correlations (Table 8).If we consider ENSO (Table 8), we see that the strongest correlations occur in the ocean basins where this oscillation dominates the physics (also seen in Fig. 8d).The North Pacific had the highest correlation coefficient (0.62), followed by the North Atlantic (0.45) and Equatorial Pacific (0.42).These results, for the shorter term oscillation of ENSO in contrast to the decadal oscillations from NAO, are more similar to previous results (e.g., Mahowald et al., 2003).
For some applications (e.g., IAV in iron fluxes on biogeochemistry or IAV in dust contributions to AOD), deposition and AOD are more important (Streets et al., 2009;Doney et al., 2009), and thus we briefly consider the time series of these model fields (Figs.S13 and S14).The mean deposition from the different model simulations can vary widely (Table S9), as seen in the source strengths (Table 6) and previous studies (e.g., Huneeus et al., 2011).Downwind of the large source regions (e.g., North Atlantic, Central North Atlantic, Equatorial Atlantic and Northern and Equatorial Indian Ocean) the standard deviations between CAM4-RE climatology mean are the lowest (7-25 %), and they are largest in the remote ocean regions, like the Southern Ocean (100 %) (Table S9).The standard deviation between models is slightly larger if all models are included (Table S9).Similarly, for the AOD in the ocean basins, the difference in model simulations is smallest close to the source regions (20 %) and largest in the remote ocean regions (100 %) (Table S10).
Here we emphasize the temporal variability, and the comparison of the CAM4-RE model simulations suggest in many basins there are consistent signals, with correlations in net deposition above 0.3 in most basins, with the exception of North Indian and South Atlantic (Table S9; Figs.S13  and S14).Interestingly, AOD is consistent (r > 0.3) in different basins, with the North Central Pacific and North Atlantic Equatorial Pacific having the lowest correlations.This highlights the disconnect between AOD and deposition, which can make diagnosing deposition variability from AOD difficult, as noted previously (e.g., Mahowald et al., 2003).Note even in the basin with the most consistent simulations (0.68 in the North Atlantic), the variability in deposition, simulated similarly in the models, represents only 50 % of the variability (if we assume for simplicity Gaussian distributions, which is not true of these values, suggesting even less of the variability is similarly simulated).

Implication of modeled variability for sampling
The large variability in dust implies that it may be difficult to constrain the dust cycle from limited observations.While we have satellite data, we can only use that data to constrain dust when it is the dominant aerosol, which occurs only in limited regions just downwind of major sources like northern Africa (Mahowald et al., 2007).Over much of the ocean, we only have individual daily-averaged values from cruise data (Baker et al., 2006;Buck et al., 2006;Sholkovitz et al., 2012).Previous model studies have shown that over much of the ocean, modeled daily-averaged dust concentrations will tend to underestimate the annual average, and only be within a factor of 10 to 2 of the true modeled average (Mahowald et al., 2008).Here we consider how many monthly means are required to obtain an estimate of the annual mean   that is within 1 standard deviation of the true model annual mean value (Fig. 11).Over most of the globe, the number of monthly mean observations is 8-12, or almost a full seasonal cycle, as expected.But in regions with large variability (Fig. 6), longer time periods of more than 2 full years are required to obtain a mean and standard deviation including the true mean (Fig. 11).Note that here we assume that there is no trend or significant change in the dust cycle.Characterizations of changes in dust show that there are interesting trends over the longer term, which requires longer records (Prospero and Lamb, 2003;Ridley et al., 2014).

Conclusions
Simulations of annual mean variability in 7 different model simulations are compared to better understand how robust the variability is in models for the period 1990-2005.Although the emphasis of this paper was not on evaluation of specific models, the models were compared with in situ concentration (Prospero and Lamb, 2003;Marticorena et al., 2010) and AERONET AOD (Holben et al., 1998) observations.The models considered here were 4 versions of the CAM4, the GCHEM (MERRA), MATCH (NCEP) and CAM5-AMIP (Table 1) (Albani et al., 2014;Luo et al., 2003;Ridley et al., 2014).Here we ignore the possible dust sources from land use and land cover change, which is hypothesized to represent 25 % of dust sources currently (Ginoux et al., 2012).The model simulations do roughly similarly well (or poorly) compared to observations when driven by reanalysis meteorology, but less well when driven by sea surface temperatures with meteorology being prognostically calculated, implying that using reanalyzed meteorology does improve dust simulations (Fig. 3).The models' ability to simulate the observations is strongest for the seasonal cycle, and the models are less able to simulate the interannual variability, similar to previous studies (Mahowald et al., 2003;Ginoux et al., 2004) (Fig. 3).Surface concentration and deposition have similar distributions of variability, while AOD tends to have less variability (Fig. 3).There is more variability, especially interannual variability, in parts of the Southern Hemisphere (Fig. 6).Some of this was artificially (potentially) enhanced in the simulations considered here because of the way that the CAM4 and CAM5 were tuned in the Southern Hemisphere.But the very limited observations at some stations in the Southern Hemisphere suggest there could a larger fraction of interannual variability than seasonal variability compared with the Northern Hemisphere.This should be tested with more long-term stations in the Southern Hemisphere.
Model simulations of interannual variability is sensitive to both meteorology as well as model construction, and thus drawing firm conclusions about how best to capture observed variability based on only one model is likely to be difficult.Our results that model construction, as well as meteorology, is important is consistent with general circulation model studies which suggest that modeling groups tend to have models which behave similarly (Knutti et al., 2013).These studies also complement reanalysis-based studies of the energy and water cycle, showing that issues remain with the reanalysis datasets (Trenberth et al., 2011;Trenberth and Fasullo, 2013).
Here we considered the hypothesis from Ridley et al. (2014) that Barbados surface concentrations were decreasing over the period 1980-2008 due to a decrease in winds in northern Africa and thus a decreasing source, which follows previous studies in highlighting the importance of winds on short (Engelstaedter and Washington, 2007;Sun et al., 2001) and long timescales in some source regions (Roe, 2008;McGee et al., 2010).Consistent with that hypothesis, most of the model versions considered here can simulate a decreasing concentration at Barbados, and the models trace this back to a decrease in dust source in the western Sahel region of northern Africa, linked to a decrease in surface winds.This is consistent with the meteorological station data in this region, which show both a correlation between dust sources and wind, as well as a decrease in winds over this time period (Mahowald et al., 2007).Basic meteorological principles and previous studies (e.g., Engelstaedter and Washing-ton, 2007) suggest associations between winds and precipitation, and the reanalysis models still cannot do a good job simulating mean or variability in precipitation (Trenberth et al., 2011;Trenberth and Fasullo, 2013), suggesting that it will be difficult to improve the dust source, transport and deposition variability without improvements in the reanalyses.In addition, other studies have noted the relationship between surface roughness changes due to vegetation, driving wind changes, and thus changes in source strength (Cowie et al., 2013), which could not be tested here.Of course, the time period considered here is relatively short, so it is unclear whether other drivers might be more important on longer timescales (e.g., Prospero and Lamb, 2003;Mulitza et al., 2010;Mahowald et al., 2010).
Because of the strong variability in dust, model simulations suggest that observations need to be made for around 1 year in many regions, but in remote regions, especially in the Southern Hemisphere, observations need to be made for more than 2 years in order to sample the modeled variability and correctly capture the annual mean concentrations (Fig. 11).Of course, this assumes there is no long-term variability.Long-term records of dust concentrations represent some of our most important data records to characterize variability in desert dust (e.g., Prospero and Lamb, 2003), and more long-term datasets should be collected.

Data availability
For access to the data used in this analysis, please email the corresponding author.
The Supplement related to this article is available online at doi:10.5194/acp-17-3253-2017-supplement.
Competing interests.The authors declare that they have no conflict of interest.

Figure 2 .
Figure 2. Time series of annual average AOD for model simulations (based on dust only), compared with AERONET observations in Table2for: (a) Bahrain, (b) Dalandzadgad, (c) Ilorin and (d) Sedé Boqer for each of the different model versions (colors are the same as in Fig.1).Observational data from AERONET stations (citations listed in Table2).The annual means are divided by the long-term mean to allow comparison with seasonal variability, since they are similarly normalized (Fig.S8).A map (e) with station locations for concentration (blue; Fig.1), AOD (red, Fig.2) and Southern Hemisphere analysis (green; Fig.5).

Figure 3 .
Figure 3. Model and observed IAV variability for (a) the concentration and (b) AOD from the observation sites listed in Table 2 and shown in Figs. 1 and 2. IAV variability is defined as the standard deviation over the mean using annual means.The ratio of the IAV variability to the seasonal cycle variability (12-month climatology) is shown for (c) concentration and (d) AOD.Model results are in color, while observations are in black.Correlation coefficients for IAV annual mean time series for (e) concentration and (f) AOD between the models and observations at the stations.The correlation coefficients between model and observed values for the seasonal cycle (g, h) for concentration and AOD.Observations are described in Table 2. Concentration stations are abbreviated: Ban: Banizoumbou; Bar: Barbados; Ber: Bermuda; Cin: Zinzana; Iza: Izaña; Mac: Mace Head; Mbo: Mbour; Mia: Miami; Mid: Midway.AOD stations abbreviated: Bah: Bahrain; Dal: Dalandzadgad; Ilo: Ilorin and Sed: Sedé Boqer.

Figure 4 .
Figure 4. Slope of the trend in the relative concentrations (linear regression of concentration onto time) of the modeled annual mean surface concentrations (normalized by the mean) in units of fraction change per decade for (a) CAM4 (MERRA), (b) CAM4 (NCEP), (c) CAM4 (ERAI), (d) CAM4 (AMIP), (e) GCHEM (MERRA), (f) MATCH (NCEP) and (g) CAM5 (AMIP), and the mean slope across the models (h).Only slopes that are larger in magnitude than 1 sigma from the regression are plotted.Positive slopes imply increasing concentrations.

Figure 6 .
Figure 6.Spatial plot of the modeled IAV variability in the model simulations at each grid box -CAM4-RE is the mean of CAM4 (MERRA), CAM4 (NCEP) and CAM4 (ERAI) -where variability is unitless and is the standard deviation divided by the mean of the annual mean between 1990 and 2005 for (a) surface concentration.The ratio of the seasonal variability over the IAV variability (calculated using the 12 climatological monthly means) is shown in the right-hand panel for (b) surface concentration.Similar diagnostics for the non-CAM models -GCHEM(MERRA) and MATCH (NCEP) -are shown in the bottom panel for (a) IAV in variability and (b) ratio of seasonal over IAV variability for concentration.

Figure 7 .
Figure 7. Spatial plot of the temporal rank correlation of the annual mean modeled surface concentration in the CAM4 (MERRA) case compared to each of the other model simulations: (a) CAM4 (NCEP), (b) CAM4 (ERAI), (c) CAM4 (AMIP), (d) GCHEM (MERRA), (e) MATCH (NCEP) and (f) CAM5 (AMIP).At each grid box the 16-year time series are correlated between the two model versions, and the color indicates the value of the correlation.

Figure 8 .
Figure 8. Spatial plot of the average temporal rank correlation of the annual mean modeled values.The correlation is calculated between the CAM4-reanalysis models (CAM4 (MERRA), CAM4 (NCEP) and CAM4 (ERAI), and averaged; left-hand column (a) and the models driven by the same meteorology -average of CAM4 (MERRA) vs. GCHEM (MERRA) and CAM4 (NCEP) vs. MATCH (NDEP) -(b) for surface concentration.At each grid box the 16-year time series are correlated between the models, and the color indicates the value of the correlation.The temporal correlation of the climate index time series and the modeled annual mean concentration is shown in (c) NAO and (d) ENSO, where the values are the average of the correlations in the CAM4-Renalysis models.The temporal correlation of the annual mean concentration between the CAM4 (AMIP) and CAM5 (AMIP) simulations is shown (e).The temporal correlation for the seasonal cycle is shown (f), which represents the average temporal correlation for the three CAM4-reanalysis models, using the climatological mean for each of the 12 months.

Figure 11 .
Figure 11.Monte-Carlo-based estimation of the number of monthly mean observations required to obtain a 95 % chance of obtaining a mean and standard deviation consistent with the true model mean between 1990-2005 at each grid box, based on model simulations using the CAM4 (MERRA) model.More details on methods in Sect.2.3.

Table 1 .
Description of model simulations considered here.

Table 2 .
Description of observational sites included here.Locations are plotted in the maps (Figs.S1, S3 and S5).

Table 3 .
Annual average spatial comparison to observations for different cases (described in Table1 and Methods).Correlations which are statistical significant at the 95 percentile are in bold.

Table 4 .
Slope of the normalized annual mean values from 1982 to 2008 (or time period available, shown in Table1) for the western Sahel (13 to 22 • N and −20 to 13 • E) and northern Africa (0 to 35 • N and −20 to 40 • E) and model (Fig.4) (statistically significant values are in bold, standard deviation of slope in parenthesis for Barbados surf.conc.slope) in the first four columns.Values are normalized by dividing by the mean, so that slopes represent relative change per year.The last column is the correlation of interannual variability in precipitation in each model compared to observations.

Table 5 .
Variability in Southern Hemisphere.Values for the IAV variability (annual average standard deviation divided by mean) and the ratio of the variability from the seasonal cycle over the IAV (for the surface concentration in the model cases and data from Rio Gallegos; deposition data from Kerguelen; and coarse mode AOD from Tinga Tingana (locations listed in Table1)).
Sun et al., 2001),E; East Asia: 35 to 50 • N, 70 to 112 • E; Middle East: 10 to 45 • N, 40 to 70 • E; northern Africa: 10 to 35 • N, 40 • W to 40 • E; Sahel (western): 13 to 22 • N, 40 • W to 13 • E; South Africa: 35 to 20 • S, 15 to 40 • E; South America (Argentina): 55 to 35 • S, 285 to 310 • E. Africa and the western Sahel (Table7).It is reassuring that the model winds have high correlations with IAV source strength in the model for East Asia, for there is a strong correlation in the current climate observations between winds and sources in this region (e.g.,Sun et al., 2001),

Table 8 .
Surface concentration over ocean basins.For each ocean region, the averaged correlation across time between annual mean deposition fluxes for the CAM4-RE cases is shown in the second column.The third column shows the annual mean correlation with NAO, while the third column shows the annual mean correlation with the El Niño-Southern Oscillation climate index.