Some implications of sampling choices on comparisons between satellite and model aerosol optical depth fields

The comparison of satellite and model aerosol optical depth (AOD) fields provides useful information on the strengths and weaknesses of both. However, the sampling of satellites and models is very different, and some subjective decisions about data selection and aggregation must be made in order to perform such comparisons. This work examines some implications of these decisions, using GlobAerosol AOD retrievals at 550 nm from Advanced Along-Track Scanning Radiometer (AATSR) measurements, and aerosol fields from the GEOS-Chem chemistry transport model. It is recommended to sample the model only where the satellite flies over on a particular day; neglecting this can cause regional differences in model AOD of up to 0.1 on monthly and annual timescales. The comparison is observed to depend strongly upon thresholds for sparsity of satellite retrievals in the model grid cells. Requiring at least 25% coverage of the model grid cell by satellite data decreases the observed difference between the two by approximately half over land. In both model and satellite data there is an anticorrelation between the proportion p of a model grid cell covered by satellite retrievals and the AOD. This is attributed to small p typically occurring due to high cloud cover and lower AODs being found in large clear-sky regions. Daily median AATSR AODs were found to be closer to GEOS-Chem AODs than daily means (with the root mean squared difference being approximately 0.05 smaller). This is due to the decreased sensitivity of medians to outliers such as cloud-contaminated retrievals, or aerosol point sources not included in the model.


Introduction
Aerosol direct and indirect radiative effects are among the least certain contributions to radiative forcing (Forster et al., 2007; Stevens and Feingold, 2009). Uncertainties arise in the aerosol optical depth (AOD), composition, and the mechanisms and strengths of the interaction of aerosol with other elements of the climate system (such as clouds). Satellites have an important role in assessing the global aerosol burden, both directly and as evaluation tools for models. If a model simulates aerosol loading well, given real-world meteorology and emissions, then this provides confidence in the results of model experiments designed to examine the response of the climate system to changes in aerosol burden. Intercomparisons between different aerosol models, such as those undertaken by the AeroCom project, have found diversity with respect to aerosol composition and AOD, life cycles, and consequently radiative forcing.
The aerosol remote sensing datasets with the longest time series are those derived from Advanced Very-High Resolution Radiometer (AVHRR) data (Mishchenko and Geogdzhayev, 2007; Zhao et al., 2008), although they are only reliable over the ocean. Improved spectral and/or directional sampling by the Moderate Resolution Imaging Spectroradiometers (MODIS), Along-Track Scanning Radiometers (ATSRs) and Multiangle Imaging SpectroRadiometer (MISR) enables accurate aerosol retrieval over land and ocean (Martonchik et al., 1998, 2009; Grey et al., 2006; Remer et al., 2008; Thomas et al., 2009, 2010). These represent the only instrument series to date which provide global tropospheric aerosol records for more than a decade. Previous model comparison studies using satellite data have often used simple monthly or annual satellite aggregates (Kinne et al., 2003; Stier et al., 2005; Liu et al., 2006; Lee and Adams, 2010), as existing sensors are unable to provide daily global coverage due to limited swath widths, cloud cover, and availability of daytime measurements (in polar regions for the winter hemisphere). In such studies, the satellites are compared with spatially and temporally complete model fields. The inconsistent spatial and temporal sampling between model and satellite datasets means that a direct comparison is impossible, so aggregation of the datasets to a common spatial and temporal grid is necessary. Some subjective choices regarding the method of comparison must therefore be made. This work illustrates some implications of these decisions, which are largely absent from previous discussions, using GlobAerosol retrievals of AOD at 550 nm performed using Advanced ATSR (AATSR) measurements, and aerosol fields from the GEOS-Chem chemistry transport model (CTM), both for the year 2004.

Satellite and model datasets
In this work the GEOS-Chem CTM (v8-01-04), run at 5° (longitude) by 4° (latitude) resolution for the whole of 2004, is used. A general description of this model simulation is given by Park et al. (2006). The model is sampled from 09:00 a.m.-12:00 p.m. local time (LT), corresponding to the overpass time (about 10:00 a.m. LT at the Equator) of the AATSR instrument. The daily sampling of the model near the satellite overpass time removes the influence of diurnal variability from the analysis. For each day in 2004, the model provides the 550 nm AOD for six tracer aerosol species (organic carbon, black carbon, sulphate, mineral dust, and two sea salt modes) on 30 vertical levels; the total AOD is obtained by summing over all tracers and levels. Further references to AOD indicate AOD at 550 nm.
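The summation over tracers and levels described above can be sketched as follows. This is only an illustration: the array layout (species, level, latitude, longitude), the 46 by 72 grid dimensions, and the function name `total_aod` are assumptions made for the example, and the input values are random numbers rather than model output.

```python
import numpy as np

def total_aod(aod_by_species_level):
    """Sum per-species, per-level AOD into a total column AOD.

    Assumed (hypothetical) layout: axis 0 = tracer species,
    axis 1 = vertical level, remaining axes = horizontal grid.
    """
    return np.asarray(aod_by_species_level, dtype=float).sum(axis=(0, 1))

# Synthetic stand-in for one day of model output:
# 6 species, 30 levels, and an assumed 46 x 72 (lat x lon) grid.
rng = np.random.default_rng(0)
aod_layers = rng.uniform(0.0, 1e-3, size=(6, 30, 46, 72))
aod_total = total_aod(aod_layers)
```

Because AOD is additive along the vertical path and across non-interacting species, a plain sum is sufficient; no weighting by layer thickness is needed once the per-layer optical depths are available.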
Aerosol retrievals from AATSR have been performed as part of the European Space Agency (ESA) Data User Element (DUE) GlobAerosol project. GlobAerosol data and user guides can be freely accessed from http://www.globaerosol.info. The AATSR retrievals were performed using the Oxford-Rutherford Appleton Laboratory Aerosol and Clouds (ORAC) retrieval scheme, described as applied in GlobAerosol by Thomas et al. (2009). A preliminary validation is presented by Poulsen et al. (2007). Daily files (containing all quality-controlled retrievals from a given day, on the retrieval resolution of a 10 km sinusoidal grid) are used here.

Selection of model data
Recently, Levy et al. (2009) used MODIS data to examine the effect of different spatial and temporal weighting schemes on monthly AOD fields and found that, dependent on the method used, regional and global AOD estimates could differ by 30% or more. As aerosol events such as dust storms or plumes from fires are frequently episodic in nature, it is desirable to minimise any mismatch in temporal sampling by comparing data at the highest possible temporal resolution. Therefore the data used in this study are initially aggregated to provide 366 daily fields for comparison (one for each of the 366 days during the year 2004), which may then be combined to create monthly, seasonal or annual fields with temporally-consistent spatial sampling.
The annual mean AOD created by averaging all GEOS-Chem daily fields is shown in Fig. 1a; the annual mean from averaging only those days where there are any coincident AATSR data within a given grid cell is shown in Fig. 1b. In these figures, the annual mean is shown calculated as the simple mean of the 366 days of model data. The figures reveal that neglecting daily coincidence of sampling causes differences in AOD of the order of 0.05-0.1 in the annual mean over regions including Eurasia, northern Africa, the Amazon, East Asia and Northern Canada (Fig. 1c). Aside from the storm tracks, the change is generally negligible over the ocean. On an annual scale, Pearson's correlation coefficient r between the two images is 0.99 and the root mean square (RMS) difference 0.02 (with absolute differences larger for higher AODs). Calculation for a set of monthly means sampled in the same way gives, overall, r =0.97 and an RMS difference of 0.04. If the model and satellite datasets were perfectly consistent representations of the aerosol field, this indicates the expected maximum level of agreement given restrictions imposed by the overpass of AATSR.
If the aerosol loading in a given location is partitioned between some low background value punctuated by infrequent events of high AOD, satellites will tend to provide a lower estimate of AOD if these events are not coincident with the satellite overpass. As a result, the largest differences are found in those locations where the day-to-day variability of GEOS-Chem AOD is large (Fig. 2) and the number of days where there are coincident AATSR retrievals is small (Fig. 3). Increasing the averaging period will reduce the random error resulting from incomplete satellite sampling, but not any systematic differences. Equatorwards of 45° there are typically up to 150 days with AATSR retrievals; this is due to the limited AATSR repeat cycle. However, in many regions (Equatorial oceans, the West coasts of southern Africa and South America, China, Amazonia and Central Africa) there are significantly fewer due to persistent high cloud cover. In these locations use of a satellite sensor with a wider swath (such as MODIS) may be of limited additional use because, despite the increased potential number of observations throughout the year, the cloud cover issues remain the same. Polewards of 45° there are far fewer days with data; this arises in part due to cloudiness and in part due to the Sun being too low in the sky to perform retrievals for part of the year. In particular for annual means, the change in coverage will bias any composite because the sampling is seasonally incomplete. The few regions in Fig. 1 where the "any-data" AOD is higher than the "all-days" AOD are largely found in these near-polar regions, implying a higher model AOD in the summer months when the satellite retrievals are possible. In summary, daily sampling of the model along the satellite orbital track is required to avoid regional biases in AOD stemming from the variable nature of atmospheric aerosols and the limited sampling of satellite sensors.
Although wide-swath instruments will ameliorate the problem and narrow-swath instruments exacerbate it, this will be an issue for all similar imaging radiometers.
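The difference between the "all-days" and "any-data" model means, and the correlation and RMS statistics used to compare them, can be illustrated with a minimal sketch. The function names, the (day, lat, lon) array layout and the synthetic values below are assumptions made for the example; they are not taken from the processing code used in this work.

```python
import numpy as np

def all_days_and_coincident_means(model_daily, sat_has_data):
    """Return (all-days mean, coincident-days mean) of model AOD per cell.

    model_daily  : array (day, lat, lon) of daily model AOD
    sat_has_data : boolean array of the same shape, True where the
                   satellite gave at least one retrieval in that cell
                   on that day
    """
    model_daily = np.asarray(model_daily, dtype=float)
    sat_has_data = np.asarray(sat_has_data, dtype=bool)
    all_days = model_daily.mean(axis=0)
    n_days = sat_has_data.sum(axis=0)
    summed = np.where(sat_has_data, model_daily, 0.0).sum(axis=0)
    # Cells never observed by the satellite are set to NaN.
    coincident = np.where(n_days > 0, summed / np.maximum(n_days, 1), np.nan)
    return all_days, coincident

def pearson_r_and_rms(a, b):
    """Pearson r and RMS difference over cells where both fields are finite."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    ok = np.isfinite(a) & np.isfinite(b)
    r = np.corrcoef(a[ok], b[ok])[0, 1]
    rms = np.sqrt(np.mean((a[ok] - b[ok]) ** 2))
    return r, rms

# Synthetic case: 4 days, 1 x 2 grid. Cell 0 has one high-AOD event on a
# day without satellite data; cell 1 is never observed.
model = np.array([[[0.1, 0.2]], [[0.1, 0.2]], [[0.9, 0.2]], [[0.1, 0.2]]])
seen = np.array([[[True, False]], [[True, False]],
                 [[False, False]], [[True, False]]])
all_days, coincident = all_days_and_coincident_means(model, seen)
```

In this toy case the episodic event raises the all-days mean of the first cell to 0.3, whereas the coincident mean stays at the background value of 0.1, which is the kind of mismatch discussed above.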

Selection of satellite data
In dealing with satellite data, decisions must be made as to: the grid to aggregate to; the choice of retrievals to aggregate; and the weight given to each retrieval. It makes sense to aggregate to the finest common grid (i.e. that of GEOS-Chem) to minimise errors arising from spatial mismatch (although, given incomplete sampling by satellites, this may result in gaps in the averaged data).

[Figure: number of days with GlobAerosol data.]
Ideally all successful retrievals will be included. Cloud contamination can result in significant biases in AOD from satellite radiometers, and so, dependent on the application of the data, it may be desirable to remove any retrievals suspected of this contamination from further analysis. Two potential sources of error are the misidentification of cloudy pixels as clear, particularly around cloud edges, and retrieval errors caused by the neglect of 3-D radiative transfer effects ("cloud adjacency effects") in the retrieval forward model. Such contamination typically leads to the AOD being overestimated, although AOD can also be underestimated in areas affected by cloud shadows, and the retrieved spectral dependence of AOD can be highly altered (Zhang and Reid, 2006; Wen et al., 2007; Koren et al., 2008; Marshak et al., 2008; Twohy et al., 2009; Várnai and Marshak, 2009, and others). Cloud contamination is a complicated issue for satellite aerosol retrieval algorithms, and the size of errors in retrieved data will depend on factors such as the spectral and spatial resolution and signal-to-noise ratio of the sensor in question.
Quality control metrics are applied to remove retrievals suspected of cloud contamination in generation of the GlobAerosol daily products used here (Poulsen et al., 2007). These include the requirement over land that at least 50% of the (approximately 1 km×1 km) instrument pixels in the (10 km×10 km) retrieval pixel were flagged as cloud-free (because the cloud flag is known to miss some clouds over land; it is worth emphasising that only the instrument pixels flagged as cloud-free are averaged to perform the retrieval).
Further restrictions of the cloud-free proportion of the retrieval pixel may be investigated to minimise residual cloud contamination. However, requiring that the area of the retrieval pixel contains solely instrument pixels flagged as cloud-free is undesirable. Firstly, this would reduce the data volume. Secondly, by biasing towards clear-sky parts of the model grid cells, the sample of selected retrievals may be unrepresentative of the true aerosol loading. As well as an aerosol indirect effect whereby increased AOD is linked to an increase in cloudiness, high humidity may increase both cloudiness and, due to aerosol swelling, AOD (Koren et al., 2007; Quaas et al., 2009, 2010; Stevens and Feingold, 2009). An informed decision may be reached by comparing histograms of AOD with different thresholds of scene cloudiness. These histograms would be expected to show an unphysical peak at high AOD corresponding to contaminated scenes, decreasing in size as the threshold for cloud-free scenes becomes more severe.
With thresholds of increasing severity (from no restriction, to requiring retrieval pixels be completely cloud-free), histograms for AATSR AOD constructed from individual retrievals (Fig. 4a and b) show almost no change in shape over land, and a shift to lower AOD of order 0.025 over sea. These histograms are truncated for clarity as most retrievals are on the lower end of the permitted range (0.01<AOD<2). There is no obvious secondary cloud-contaminated "hump" in the distribution. This indicates that any remaining cloud contamination is unlikely to be significantly reduced through adoption of a stricter cloudiness threshold. However, when retrievals corresponding to each of these cloudiness thresholds are averaged to the GEOS-Chem grid the histograms become more distinct (Fig. 4c and d). In particular, the "completely-clear" histogram is biased towards lower AODs, particularly over land. A possible explanation for this is the link between increased humidity, AOD and cloud coverage. This second row of histograms is created only from those grid cells containing retrievals corresponding to the strictest cloudiness threshold (all instrumental pixels clear) to maintain consistency of spatial sampling between histograms. The difference between the top and bottom parts of Fig. 4 arises because clouds are not evenly spread throughout the retrieval scenes (such that on a model grid scale, broken cloud fields are likely to occur near to each other).
Requiring that at least 50% of the instrumental pixels in the retrieval pixel be flagged as clear decreases the total number of available retrievals by approximately half. Requiring fully-clear scenes decreases this by a further factor of 2. Because there is no clearly-evident cloud-contamination issue in Fig. 4, and to maximise data volume and representivity of coverage, no additional constraint on the maximum cloudiness of the retrieval scene is imposed over land. Over sea, a constraint of at least 25% cloud-free is adopted to decrease potential cloud-contamination while retaining reasonable data volume. Any decision on a threshold to use is necessarily subjective; however, adoption of stricter constraints does not significantly affect the following conclusions (except for the case of requiring that all retrievals considered are 100% cloud-clear, due to the ensuing clear-sky bias).
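The histogram comparison used to inform the cloudiness threshold can be sketched as below. This is an assumed, simplified form: the per-retrieval `clear_fraction` input, the function name and the bin edges are all hypothetical, and the four sample retrievals are invented purely to show the mechanics.

```python
import numpy as np

def aod_histograms_by_clear_fraction(aod, clear_fraction, thresholds, bins):
    """Normalised AOD histograms for several minimum cloud-free fractions.

    aod            : 1-D array of retrieved AODs
    clear_fraction : 1-D array, fraction of instrument pixels flagged
                     cloud-free within each retrieval pixel (0 to 1)
    thresholds     : iterable of minimum clear fractions to test
    bins           : histogram bin edges in AOD
    Returns a dict mapping each threshold to relative frequencies.
    """
    aod = np.asarray(aod, dtype=float)
    clear_fraction = np.asarray(clear_fraction, dtype=float)
    result = {}
    for t in thresholds:
        counts, _ = np.histogram(aod[clear_fraction >= t], bins=bins)
        total = counts.sum()
        result[t] = counts / total if total else counts.astype(float)
    return result

# Invented retrievals: one high-AOD value in a mostly cloudy scene.
hists = aod_histograms_by_clear_fraction(
    aod=[0.05, 0.15, 0.15, 1.5],
    clear_fraction=[1.0, 0.6, 0.3, 0.1],
    thresholds=(0.0, 0.5, 1.0),
    bins=[0.0, 0.1, 0.2, 2.0],
)
```

If residual cloud contamination were important, the relative frequency in the highest AOD bin would be expected to shrink as the threshold is tightened; a shape that barely changes, as reported for Fig. 4a and b, suggests that a stricter threshold would gain little.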
The next decision is the weight to assign to each retrieval falling in a given model grid cell. ORAC provides an estimate of the uncertainty on each retrieved parameter, for each retrieval. These uncertainties are derived as part of the Optimal Estimation methodology used in ORAC, through propagation of uncertainties in the measurements, forward model, and a priori data into the retrieved state. Some weighted average using these is desirable. As noted by Levy et al. (2009), the choice of weights is significant. Over land, the uncertainty on AOD for a well-retrieved scene is generally proportional to the AOD (with the proportionality constant depending on factors such as surface type); over ocean the uncertainty is comparatively independent of AOD. The weighting system should allot "good" retrievals of different AODs equal weights. Therefore, in land-dominated grid cells, weights corresponding to the square of the reciprocal of the relative uncertainty on each retrieval are used. Over ocean, weights correspond simply to the square of the reciprocal of the absolute uncertainty. A relative weighting over ocean would bias towards high AODs, and an absolute weighting over land would bias towards low AODs. For over 90% of grid cells containing GlobAerosol data, at least 90% of the retrievals in those grid cells are of the same surface type (land or sea), so the number of mixed cells is small. However, the broad conclusions about sampling in this work are not significantly affected by the choice of weights.
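The two weighting schemes can be written compactly. The sketch below is one reading of the description above, with weights of (AOD/σ)² in land-dominated cells and 1/σ² over ocean, where σ is the retrieval uncertainty on AOD; the function name and the example values are assumptions for illustration.

```python
import numpy as np

def weighted_cell_aod(aod, sigma, is_land):
    """Weighted mean AOD of the retrievals falling in one model grid cell.

    aod     : retrieved AODs (positive)
    sigma   : absolute 1-sigma uncertainties on those AODs
    is_land : True  -> weight = 1 / (relative uncertainty)^2
              False -> weight = 1 / (absolute uncertainty)^2
    """
    aod = np.asarray(aod, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    if is_land:
        weights = (aod / sigma) ** 2
    else:
        weights = 1.0 / sigma ** 2
    return float(np.sum(weights * aod) / np.sum(weights))

# Two invented land retrievals, both with 10% relative uncertainty.
aod = [0.1, 0.4]
sigma = [0.01, 0.04]
relative_weighted = weighted_cell_aod(aod, sigma, is_land=True)
absolute_weighted = weighted_cell_aod(aod, sigma, is_land=False)
```

With the relative weighting both retrievals count equally and the cell value is 0.25; applying the absolute weighting to the same land retrievals pulls the result towards the lower AOD (about 0.12), which is the low bias that the text warns of.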

Choice of grid cells to include
With the AATSR aerosol retrieval performed on a 10 km sinusoidal grid, on a daily timescale there are approximately 2.47×10^3 cos θ (where θ is the latitude) potential retrievals within a GEOS-Chem model grid cell. In practice there will be substantially fewer, due to the aforementioned coverage issues. Although the retrieval grid is much finer than the model resolution, incomplete sampling could mean that the AATSR averaged AOD is unrepresentative of the wider region. Therefore it is sensible to apply some sparsity threshold below which the satellite data are insufficiently numerous for a meaningful comparison with the model. This is defined in terms of p, the proportion of the area of a GEOS-Chem grid cell containing AATSR retrievals, such that p=0 represents a GEOS-Chem grid cell without any colocated AATSR data, while p=1 denotes a grid cell containing 2.47×10^3 cos θ retrievals. Figure 5 shows, for those GEOS-Chem daily grid cells where p>0, the cumulative frequency distribution of the proportion of the area of the grid cells filled by the retrievals. From this it can be seen that even with a modest threshold of p>0.05, 50% of the AATSR data over land (60% over sea) must be discarded. For p>0.25 approximately 80% must be rejected over land (90% over sea), and for p>0.5 the figure is 90% over land (over 95% over sea). Higher cloudiness over ocean than over land means that generally less of the area of grid cells is filled. Clearly, the adoption of a threshold to restrict to well-sampled regions has severe implications for the volume of data remaining. The impact on retrievals is shown in Fig. 6. Quite different behaviour is observed over land than sea. Over land, the average of the daily mean AATSR AODs decreases linearly as p increases, from approximately 0.4 for p=0 to 0.1 for p=1.
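The coverage proportion p follows directly from the retrieval count and the cell latitude. In the sketch below the factor 2.47×10^3 is taken from the text; the consistency check using roughly 111.2 km per degree is an assumption about how that figure arises, and the function names are hypothetical.

```python
import math

RETRIEVALS_PER_CELL_AT_EQUATOR = 2.47e3  # value quoted in the text

def potential_retrievals(lat_deg):
    """Maximum number of 10 km retrievals in a 5 x 4 degree cell centred
    at latitude lat_deg (cell area shrinks as cos(latitude))."""
    return RETRIEVALS_PER_CELL_AT_EQUATOR * math.cos(math.radians(lat_deg))

def coverage_fraction(n_retrievals, lat_deg):
    """Proportion p of the model grid cell covered by retrievals (0 to 1).
    Intended for cell-centre latitudes away from exactly +/-90 degrees."""
    return min(1.0, n_retrievals / potential_retrievals(lat_deg))

def passes_threshold(n_retrievals, lat_deg, p_min):
    """True if the cell is well enough sampled to enter the comparison."""
    return coverage_fraction(n_retrievals, lat_deg) > p_min

# Rough consistency check (assumes ~111.2 km per degree and 100 km^2
# per retrieval pixel): about 2.47e3 pixels per equatorial cell.
KM_PER_DEGREE = 111.2
approx_equatorial_count = (5 * KM_PER_DEGREE) * (4 * KM_PER_DEGREE) / (10 * 10)

p_example = coverage_fraction(300, 40.0)
```

For example, 300 retrievals in a cell centred at 40° latitude give p of roughly 0.16, so that cell would pass a 5% threshold but fail a 25% one.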
This may reflect the removal of poorly-sampled grid boxes (where the few retrievals are likely due to cloud contamination) as well as a lower AOD in large clear-sky regions where it is possible for p to be large (due to limited cloud cover). A similar although smaller decrease is observed in the GEOS-Chem data, which supports this. The AATSR AODs are generally larger than GEOS-Chem. The root mean square (RMS) difference between the datasets also decreases with p. The mean standard deviation of AATSR data within the GEOS-Chem grid cells shows a similar decrease with increasing p, indicating that well-sampled grid boxes are more homogeneous. This can also be an indirect indication of cloud contamination of the retrievals in grid cells with small p. The mean AATSR AOD for each bin in p constructed from the grid-cell median AODs (as opposed to means) is also shown. This too shows a decrease with p, although the absolute AOD is lower and closer to the GEOS-Chem values. This is not surprising because, in calculating the median, the influence of outliers (such as cloud contamination or point aerosol sources not included in the model) is mitigated. The fact that the grid-cell median AATSR AOD is smaller than the mean implies that most of these outliers are positive, consistent with this hypothesis. Unfortunately, it is not possible to easily attribute these cases to cloud contamination or point sources, although Fig. 4 suggests that residual cloud contamination may be small. For both lines constructed from AATSR data, the RMS difference between AATSR and GEOS-Chem decreases by approximately half for p=0.25 as compared to p=0. The RMS difference for daily median AATSR AODs is approximately 0.05 smaller than that calculated for daily means for most values of p.
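A small sketch may help to show why the grid-cell median is less affected by positive outliers than the mean, and how an RMS difference per bin of p (in the spirit of Fig. 6) could be computed. The function names, bin edges and sample values are assumptions for illustration only.

```python
import numpy as np

def cell_mean_and_median(retrievals):
    """Mean and median of the retrievals within one model grid cell."""
    r = np.asarray(retrievals, dtype=float)
    return float(r.mean()), float(np.median(r))

def binned_rms_difference(p, sat, model, edges):
    """RMS satellite-minus-model AOD difference in bins of coverage p.

    p, sat, model : 1-D arrays with one entry per daily grid cell
    edges         : increasing bin edges in p, e.g. [0, 0.25, 0.5, 1]
    Returns one RMS value per bin (NaN for empty bins).
    """
    p, sat, model = (np.asarray(x, dtype=float) for x in (p, sat, model))
    edges = np.asarray(edges, dtype=float)
    n_bins = len(edges) - 1
    # p equal to the last edge is assigned to the last bin.
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    rms = np.full(n_bins, np.nan)
    for k in range(n_bins):
        diff = (sat - model)[idx == k]
        if diff.size:
            rms[k] = np.sqrt(np.mean(diff ** 2))
    return rms

# One invented cell with a single high-AOD outlier.
mean_aod, median_aod = cell_mean_and_median([0.10, 0.12, 0.11, 0.09, 1.50])

# Invented daily cells, binned into p < 0.5 and p >= 0.5.
rms_by_bin = binned_rms_difference(
    p=[0.05, 0.1, 0.6, 1.0],
    sat=[0.5, 0.3, 0.2, 0.1],
    model=[0.1, 0.1, 0.1, 0.1],
    edges=[0.0, 0.5, 1.0],
)
```

In the invented cell the single outlier raises the mean to 0.384 while the median stays at 0.11, mirroring the behaviour described above for cloud-contaminated retrievals or unmodelled point sources.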
Over ocean, neither the AOD in either dataset nor the AATSR standard deviation shows much dependence on p. Additionally, the grid-cell mean and median AATSR AODs are very similar. This is consistent with residual cloud contamination being small over the ocean, and the general homogeneity of marine aerosol on model grid scales (due to a lack of point sources). Figures 7-9 show the effects of different sparsity thresholds on p on comparisons of global annual mean AOD. In each panel, the GEOS-Chem model is only sampled when and where the GlobAerosol data meet the sparsity threshold. For a Gaussian distribution of AOD within a grid cell, increasing the sampling would decrease the random error on the comparison (so correlation would strengthen, but the mean AOD would remain the same within the noise). Larger changes in mean AOD for different sparsity thresholds would indicate heterogeneity within the grid cell. Over ocean, there is little dependence of the comparison on threshold (again illustrating the comparative homogeneity of marine aerosol), although Pearson's correlation coefficient r increases, which may reflect more consistent sampling of wind-driven aerosol events. Over land, increasing the threshold generally decreases the mean AOD in both datasets, particularly GlobAerosol (where decreases of 0.1 or more are common). This is due to increasing the proportion of clear-sky (likely lower AOD) regions, and removing infrequent but strong elevated point sources of AOD from both datasets, and is consistent with Fig. 6. The magnitude of the change in AOD for different thresholds on p calculated for AATSR mean and median fields is similar, which suggests that the change includes a shift of the distribution of AOD to lower values rather than only the decrease of the influence of high-AOD outliers (which would not alter the median so much).
Correlations strengthen over both land and ocean with stricter thresholds (Figs. 8 and 9), although in many cases they lose statistical significance due to the decreased number of samples. These are almost identical for the AATSR mean (Fig. 8) and median (Fig. 9) fields. With a threshold of p>0.25 the model-satellite difference decreases by approximately half over land as compared to no restriction in p, as seen in Fig. 6; Fig. 8 shows that this is generally true for all land regions. This is also true for annual fields generated from the AATSR daily median AODs (Fig. 9). The largest regional differences between the annual mean model and satellite AODs are generally found for deserts (the Sahara, Arabian and Iranian), Eastern China (industrial aerosol and transported dust), and tropical rainforests (the Amazon and Central Africa). For the first two cases AATSR retrieves lower AODs than seen in the model, and for the third higher. Over bright desert surfaces it is likely that AATSR underestimates the AOD due to poor contrast between the surface and atmosphere. Additionally, strong dust events (both over source regions and transported dust) are likely to be erroneously identified as cloud by the ATSR cloud flag, which was originally designed to filter out any strong atmospheric contaminants (Závody et al., 2000; Birks, 2004). Over rainforests possible reasons for satellite AOD being higher include small cumulus clouds being undetected, aerosol swelling in high humidity, or model biomass burning emissions being too low. Other satellite datasets face similar issues to these (Brennan et al., 2005; Koren et al., 2007). However, a detailed examination of reasons for these regional differences is out of the scope of this paper.
The above discussion focusses only on p as a measure of the absolute spatial coverage of a model grid cell. An additional factor to consider would be the extent of clustering of retrievals within a grid cell (such that the satellite spatial sampling may not be uniform). The available satellite AOD retrievals may be clustered if the satellite swath passes over only a portion of the grid cell, or if some other factor (such as cloudiness, Sun-glint, or retrieval failure over particular  terrain types) limits the coverage. For a model grid cell with high sub-grid aerosol heterogeneity, for the same value of p it would be expected that satellite and model AOD would be in better agreement if the available satellite retrievals were randomly distributed around the grid cell than if they were clustered in one part of it. This may be important for regions with strong aerosol point sources (such as biomass burning or desert dust), although less significant for homogeneous regions such as the open ocean. Additionally, the sub-grid heterogeneity will be dependent on the grid size of the particular model in question. For this reason, p was taken as the metric for sampling adequacy in this work, and the impact of aerosol clustering not directly considered. This source of model-satellite difference would diminish as the spatial resolution of models improves.

Summary and recommendations
Comparisons between model and satellite AOD fields have been shown to be sensitive to choices made about how to sample the datasets, and this has not always been addressed in previous works. The same considerations will also be important when comparing two satellite aerosol datasets. Except in the case of high cloud cover, the difficulties will become ameliorated when either the model resolution becomes finer, or the instrument's swath wider. Two recommendations are made. First, for temporal aggregates (such as weekly, monthly or longer means) the model should only be sampled at those regions overflown by the satellite on a particular day. Failure to do so can lead to regional differences in AOD of up to 0.1 in some areas. This is particularly important for annual composites due to the seasonally-varying maximum latitude at which aerosol retrievals can be performed from visible radiometers.
The second related recommendation is that some threshold based on the proportion p of the model grid cell covered by satellite retrievals be set to determine which grid cells to consider in a comparison. Over ocean a 5% threshold was found to strengthen correlations by approximately 0.1 as compared to no threshold, although little change in annual mean AOD was observed. Over land, a 25% threshold strengthened correlations by a similar amount, and decreased the difference in annual mean AOD by about 50%. However, ensuring that the satellite data cover a large proportion of a model grid cell before including it in the comparison is undesirable, as this reduces the data volume and biases towards low-AOD regions, meaning derived fields are less representative of true global aerosol fields. Irrespective of threshold, using satellite daily medians instead of daily means results in a closer match between the datasets over land, with more similar typical AODs and an RMS difference smaller by approximately 0.05. This is due to the median being less sensitive to outliers caused both by cloud-contaminated retrievals (typically retrieved as high AODs) and by aerosol point sources (which may not be included in the model), although it is difficult to attribute the relative importance of these two factors. This therefore allows a comparison of typical background aerosol fields; although AODs derived from mean and median daily AATSR data were offset, the correlations with GEOS-Chem data are similar. A compromise must be reached between p (spatial consistency of sampling), the data volume, and the representivity of the resulting aerosol fields (due to the anticorrelation between p and AOD). The exact thresholds used will depend on the characteristics of the model and satellite datasets and the scientific focus of the comparison. However, it is suggested that analyses of the type presented in Fig. 6 be performed to test the sensitivity of the comparison to these sampling decisions, which can be significant. These conclusions are expected to broadly hold for other models and satellite datasets with similar resolutions. When the most like-for-like comparisons are made, remaining differences may be attributed with more confidence to issues within the models (such as emissions, transport or washout processes, or structural limitations stemming from the coarse spatial resolution) and satellite data (such as cloud screening, surface reflectance or aerosol model difficulties) themselves.

Fig. 7. Comparison between model and satellite annual mean AOD as a function of data sparsity threshold. The left column shows annual fields constructed from GlobAerosol AATSR daily mean fields, the middle column from GlobAerosol AATSR daily median fields, and the right column GEOS-Chem daily AOD fields. The first three rows indicate fields where GEOS-Chem grid cells are at least 25% filled by AATSR data (a, b, c); at least 5% filled (d, e, f); and contain any GlobAerosol data (g, h, i). The bottom row shows (j, k, l) the difference in AOD between the 25% and any-data thresholds within each dataset.

Fig. 8. Comparison between GlobAerosol AATSR and GEOS-Chem annual mean AOD fields as a function of data sparsity threshold. The AATSR data are constructed from daily mean fields (left column of Fig. 7). From left to right, the columns indicate fields where GEOS-Chem grid cells are at least 25% filled by GlobAerosol data; at least 5% filled; and contain any GlobAerosol data. The first row shows (a, b, c) the (satellite-model) difference in annual mean AOD for the three thresholds, and the second row (d, e, f) Pearson's correlation coefficient r where significant at the 90% level.