
Abstract. Aerosol retrievals from multiple spaceborne sensors, including MODIS (on Terra and Aqua), MISR, OMI, POLDER, CALIOP, and SeaWiFS – altogether, a total of 11 different aerosol products – were comparatively analyzed using data collocated with ground-based aerosol observations from the Aerosol Robotic Network (AERONET) stations within the Multi-sensor Aerosol Products Sampling System (MAPSS, http://giovanni.gsfc.nasa.gov/mapss/ and http://giovanni.gsfc.nasa.gov/aerostat/). The analysis was performed by comparing quality-screened satellite aerosol optical depth or thickness (AOD or AOT) retrievals during 2006–2010 to available collocated AERONET measurements globally, regionally, and seasonally, and deriving a number of statistical measures of accuracy. We used a robust statistical approach to detect and remove possible outliers in the collocated data that can bias the results of the analysis. Overall, the proportion of outliers in each of the quality-screened AOD products was within 7%. Squared correlation coefficient (R²) values of the satellite AOD retrievals relative to AERONET exceeded 0.8 for many of the analyzed products, while root mean square error (RMSE) values for most of the AOD products were within 0.15 over land and 0.07 over ocean. We have been able to generate global maps showing regions where the different products present advantages over the others, as well as the relative performance of each product over different land cover types. It was observed that while MODIS, MISR, and SeaWiFS provide accurate retrievals over most of the land cover types, multi-angle capabilities make MISR the only sensor to retrieve reliable AOD over barren and snow/ice surfaces.
Likewise, active sensing enables CALIOP to retrieve aerosol properties over bright-surface closed shrublands more accurately than the other sensors, while POLDER, which is the only one of the sensors capable of measuring polarized aerosols, outperforms other sensors in certain smoke-dominated regions, including broadleaf evergreens in Brazil and South-East Asia.


Introduction
Remote sensing of aerosols from space has been a subject of extensive research, with multiple sensors retrieving global aerosol properties on a daily or weekly basis. During the past decade, the retrievals of atmospheric aerosol parameters have been available from a multitude of spaceborne sensors (Yu et al., 2006). The diverse algorithms used for these retrievals operate on different types of the remotely sensed signals and rely on different assumptions about the underlying physical phenomena. Significant effort has been made by the various aerosol algorithm teams to progressively refine these assumptions, from algorithm version to version, in order to derive and provide the most accurate products possible. However, despite these efforts, measurements of identical aerosol parameters from different sensors, including the most common observable and widely used aerosol optical depth or thickness (AOD or AOT or τa) parameter, often disagree with each other for a variety of reasons, including differences in the underlying surface properties at different locations, intrinsic sensor observation characteristics, and retrieval approaches. Therefore, it has become necessary to consistently analyze the available aerosol products wherever possible in order to establish where and under what circumstances each of these products provides the greatest accuracy.
The unique attributes of a particular sensor may be advantageous for aerosol retrievals, depending on the parameter(s) being retrieved, especially under favorable atmospheric conditions. However, aerosol retrieval accuracy can also be affected by numerous other factors, including the retrieval algorithm's assumptions and parameterizations, the instrument characteristics (intrinsic design, calibration, and time-dependent degradation), the measurement configurations (solar and view geometry), the atmospheric conditions (cloudiness, aerosol mixing, layer height, and humidity), the surface background (vegetated, bare, snow-covered, inundated, or simply just dark or bright land surface or ocean), and others (Kokhanovsky et al., 2007).
Since the accuracy of aerosol retrieval from a sensor may be affected positively or negatively by these factors and conditions in different ways and to varying degrees, a synergetic use of similar aerosol parameters across the sensors is non-trivial, and the data synergy research is instead focused on combining orthogonal (i.e., non-conflicting) aerosol measurements. For example, the aerosol layer height information from the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) has been used to enhance aerosol retrievals from other sensors (Oo and Holz, 2011; Torres et al., 2012; Zhang et al., 2011), while the geometry information from the Advanced Along Track Scanning Radiometer (AATSR) was used to initialize the Moderate Resolution Imaging Spectroradiometer (MODIS) bidirectional reflection distribution function (BRDF) in order to derive AATSR AOD (Guo et al., 2009).
To better characterize the differences and uncertainties that exist between the aerosol retrievals from different sensors, several studies compared a limited number of sensors. For example, AOD retrievals from MODIS were separately compared to retrievals from the Multi-angle Imaging Spectroradiometer (MISR) (Kahn et al., 2007, 2011; Mishchenko et al., 2010; Zhang and Reid, 2010), the POLarization and Directionality of the Earth's Reflectances (POLDER) sensor (Gérard et al., 2005), and CALIOP (Kittaka et al., 2011; Redemann et al., 2012). A larger set of sensors was intercompared using a synthetic benchmark (Kokhanovsky et al., 2010), and also based on detailed analysis of data from limited geographical regions (Cheng et al., 2012; Yu et al., 2012). In addition, a set of 9 aerosol products was evaluated over ocean and coastal AERONET (Aerosol Robotic Network) sites during the period of 1997-2000, highlighting regions of high retrieval agreement and disagreement (Myhre et al., 2005). However, all the satellite data used in that study had already undergone post-retrieval spatiotemporal aggregation at 1 × 1 degree grid resolution on a monthly mean basis (so-called Level 3 products) before they were used in the comparisons.
Finally, a recent study compared AERONET retrievals with a set of 5 spaceborne aerosol products archived at the ICARE Data and Services Center, including POLDER, MODIS-Aqua (Dark Target retrievals), MERIS, SEVIRI, and CALIOP (Bréon et al., 2011). Although that study was based on a collocation framework similar to the one used in the current study, our study focuses on a different set of sensors that provides a more extensive set of over-land spaceborne aerosol products. Furthermore, the presented study is based on the analysis of the spatiotemporally averaged and outlier-screened data, whereas that of Bréon et al. (2011) is predominantly based on the analysis of individually collocated spaceborne and ground-based data points that are the closest in space and time, which would correspond to the central values in our collocated data subsets (we report a similar analysis in the Supplement to this paper).
In this work, 11 retrieval-scale (Level 2) aerosol products from multiple spaceborne sensors are intercompared during the recent "golden" period of 2006-2010 (see Fig. 1), when as many as seven major sensors were in operation and measuring aerosols concurrently. Specifically, we focus on aerosol products retrieved over land and ocean from MODIS on Terra and Aqua, MISR on Terra, the Ozone Monitoring Instrument (OMI) on Aura, POLDER on PARASOL, CALIOP on CALIPSO, and the Sea-viewing Wide Field of view Sensor (SeaWiFS) aboard the SeaStar spacecraft. At the time of this study (January 2013), all of the studied sensors were still active, with the exception of SeaWiFS, whose operation ended in December 2010. The analysis is based on the collocation of the satellite data products using the Multi-sensor Aerosol Products Sampling System (MAPSS) framework (Petrenko et al., 2012), which samples these satellite products relatively uniformly over the global AERosol Robotic NETwork (AERONET) of sun photometers and other important ground-based stations both over land and ocean.
The details of the MAPSS sampling approach are explained in Sect. 2, while the relevant characteristics of the aerosol data products from the different sensors and the corresponding data quality screening techniques are described in Sect. 3 and Sect. 4. Section 5 describes a novel statistical approach for detecting and removing possible data outliers that can exist in the collocated data and, as a result, bias the statistical analysis of these data. Section 6 presents the detailed analysis of the compared aerosol products, while Sect. 7 examines the accuracy of these products based on land cover type. Conclusions are presented in Sect. 8.

Sampling method
The different aerosol-measuring sensors have different spatial resolutions; some have square-shaped footprints while others have rectangular pixel shapes. The nominal ground pixel sizes of the analyzed aerosol products at nadir are summarized in Table 1, and these sizes become progressively larger away from nadir. To ensure a uniform and fair sampling of the aerosol products for cross-evaluation with AERONET and for comparison with one another, we used the framework of the Multi-sensor Aerosol Products Sampling System (MAPSS), which was originally developed for validation and analysis of MODIS aerosol products (Ichoku et al., 2003, 2005; Remer, 2002) and later expanded to support aerosol products retrieved by other spaceborne sensors (Petrenko et al., 2012). MAPSS subsets the aerosol products by extracting pixels covering approximately the same area on the ground centered over AERONET sun photometer measurement sites and over certain other point locations that are not addressed in this study.
Table 1. Ground-based and spaceborne atmospheric aerosol products analyzed in the study. In the product designation titles, "O" at the end of the title of a product signifies ocean retrievals, "L" signifies land retrievals, "DT" signifies land retrievals using the MODIS Dark Target algorithm, and "DB" signifies land retrievals using the MODIS Deep Blue algorithm. The AERONET AOD retrievals were interpolated to the studied wavelengths of the spaceborne sensors. The indicated local equatorial crossing times (LT) are based on the original orbital designs, and can change during the lifetimes of the satellites. The SeaWiFS mission ended in December 2010. * While 388 nm was the main observational wavelength used in the study for OMI, 500 nm was used where the collocated AERONET data were not available in the UV range.
Fig. 1. Periods of operation of major past and current aerosol-measuring satellite sensors. The pair of dotted vertical lines marks the "golden" period (between the start of CALIOP in July 2006 and the end of SeaWiFS in December 2010) when as many as seven of these sensors were measuring aerosols concurrently. The golden period was used as the base for the studies reported in the rest of this paper.
Assuming an imaginary circle of 55 km diameter whose center coincides with each AERONET station, all spaceborne aerosol product pixels falling within the circle are extracted. An aerosol pixel is regarded as being within the circle if the coordinates of the pixel center fall within 27.5 km from the coordinates of the circle center, where the distance between the coordinates of the two points is determined using the Haversine formula (Sinnott, 1984). Based on the nominal spatial resolution of the sensors in Table 1, the approximate maximum number of pixels within the 55 km diameter sample space at nadir for the different sensors is as follows: MODIS -25, MISR -9, OMI -8, . The actual number of pixels within the sampling circle decreases for the aerosol retrievals away from the nadir of the satellite scene, and can be further reduced in the presence of clouds or other factors preventing retrieval of aerosol parameters. Based on the extracted sample, statistics of each aerosol parameter retrieved within the sampling areas are calculated and include mean, median, standard deviation, as well as the value of the central point in the sample, i.e., the pixel in the spaceborne subset that is the closest (i.e., whose center has the smallest distance) to the ground station, or the individual data point in the ground-based subset that is measured the closest in time to the overpass of the satellite.
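The spatial sampling step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the MAPSS implementation: the function names, the `(lat, lon, aod)` tuple layout, the use of `None` for a missing retrieval, and the Earth radius constant are all assumptions made for the example.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sample_station(pixels, station_lat, station_lon, radius_km=27.5):
    """Summarize pixels whose centers lie within radius_km of a station.

    pixels: iterable of (lat, lon, aod); aod is None when no retrieval exists.
    Returns a dict of subset statistics, or None if nothing usable is found.
    """
    in_circle = []
    for lat, lon, aod in pixels:
        d = haversine_km(lat, lon, station_lat, station_lon)
        if d <= radius_km:
            in_circle.append((d, aod))
    if not in_circle:
        return None
    # Central value: the pixel closest to the station, even if it is missing.
    central = min(in_circle, key=lambda p: p[0])[1]
    values = [aod for _, aod in in_circle if aod is not None]
    if not values:
        return None
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        "central": central,
    }
```

In this sketch the central value is taken from the nearest pixel regardless of whether it holds a retrieval, so a subset can have a valid mean while its central value is missing.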
6780 M. Petrenko and C. Ichoku: Analysis of aerosol measurements from multiple satellite sensors
In this paper, results are reported based on the analysis of the mean values. Although not reported in this paper because of space considerations, a similar analysis was performed based on the central values and is reported in the Supplement to this paper. It is appropriate to use the mean values in this paper, so as to maintain the uniform sampling criterion across the different sensors and their respective retrieval pixel sizes to facilitate a fair intercomparison. On the other hand, an analysis based on the central pixel values such as that reported in the Supplement can provide further details on the effect of difference in sampling aerosol products from individual sensors, as well as more accurately characterize the performance of the sensors in the presence of a strong point source of pollutant particles. Additionally, it should be noted that since the mean value of a sample can be computed even if its central value is missing, the reported analysis of the central values is based on a somewhat reduced volume of the collocated data points when compared to the reported analysis based on the mean values.
To collocate AERONET data in time and space with the satellite data, AERONET measurements acquired within ±30 min of each satellite sensor overpass are also extracted and the corresponding statistics are derived. Additionally, for the convenience of aerosol data intercomparison and validation, AERONET AODs are interpolated to the wavelengths of spaceborne sensors in Table 1 based on the established wavelength dependence of AOD (Eck et al., 1999). It is pertinent to note that this interpolation process might introduce an additional source of uncertainty when intercomparing the aerosol products. Also, because of the wavelength dependence of AOD, the difference in the compared wavelengths of the spaceborne products should be considered when intercomparing the relative performance of the products. Furthermore, although many AERONET stations provide observations in the range of 340-1020 nm, certain stations report AOD in the range of 440-1200 nm. For such stations that have no measurements in the UV region, we have evaluated OMI AOD at 500 nm instead of AOD at 388 nm, in order to avoid additional extrapolation biases.
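As a rough illustration of the temporal matching and wavelength interpolation described above, the sketch below selects ground observations within ±30 min of an overpass and interpolates AOD with a two-wavelength Ångström power law. The power law is a simplifying assumption made for this example and may differ from the exact spectral fit the authors applied; the function names and the `(time, aod)` data layout are likewise hypothetical.

```python
import math
from datetime import timedelta


def select_in_window(observations, overpass, half_window=timedelta(minutes=30)):
    """Keep (time, aod) observations within half_window of the overpass time."""
    return [(t, aod) for t, aod in observations
            if abs(t - overpass) <= half_window]


def angstrom_exponent(aod1, wl1, aod2, wl2):
    """Angstrom exponent from AOD at two wavelengths (same units for wl1, wl2)."""
    return -math.log(aod1 / aod2) / math.log(wl1 / wl2)


def interpolate_aod(aod1, wl1, aod2, wl2, target_wl):
    """Interpolate AOD to target_wl assuming AOD ~ wavelength ** (-alpha)."""
    alpha = angstrom_exponent(aod1, wl1, aod2, wl2)
    return aod1 * (target_wl / wl1) ** (-alpha)
```

For instance, `interpolate_aod(0.4, 440, 0.2, 675, 550)` would estimate AOD at 550 nm from measurements at 440 and 675 nm. Because the result depends on the assumed spectral shape, it carries the additional uncertainty noted in the text, and extrapolating outside the measured range (as would be needed for 388 nm at stations without UV channels) is less reliable than interpolating within it.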
Each AERONET station has a different period of operation, and the quantity of available AOD data points is not uniform across all stations; while many stations are still active, certain stations were active in the past and only for a short period of time. The overall availability of the collocated data during the analysis period of 2006-06-07 to 2010-12-11 is shown in Fig. 2, where for the purposes of this study the stations are classified as land-only, ocean-only, or land-and-ocean. This classification is based on analyzing collocated data of separate aerosol retrievals over land and ocean from the MODIS, SeaWiFS, and POLDER sensors and identifying stations that have AOD data points from the land datasets, ocean datasets, or both; note that the MISR, OMI, and CALIOP sensors provide only joint land-and-ocean datasets.

Aerosol products
The key properties of the 11 analyzed aerosol products are summarized in Table 1, while the original science dataset (SDS) names of the spaceborne aerosol products are outlined in the first column of Table 2, except for the POLDER products that do not have an established SDS product naming convention. The sampled satellite data products are derived directly from the retrieval level aerosol products (Level 2) that represent the highest available spatial resolution for each product/sensor combination and are free of aggregation artifacts that can be present in data at Level 3 (Zhang and Reid, 2010).
Of the 11 sampled products, 3 are combined land-and-ocean products, 6 land-only products, and 4 ocean-only products. Furthermore, 6 aerosol products are retrieved from the twin MODIS-Terra and MODIS-Aqua sensors using the same set of 3 algorithms: the ocean algorithm is used for the retrievals over oceans and other large bodies of water; the land Dark Target (DT) algorithm is used over vegetated regions and other dark surfaces; and the land Deep Blue (DB) algorithm is used for deserts and barren lands (Hsu et al., 2004). Although the results between the two MODIS sensors are expected to be very close, they might still differ due to the different times of scene observation during the day and other factors summarized in Ichoku et al. (2005) and Remer et al. (2008).
The remainder of this section provides a brief description of the analyzed products and highlights some of the unique aerosol properties reported in these products. A more detailed overview can be found in the theoretical and validation works of the respective science teams of the products as cited below, while a general comparative overview of multiple products and retrieval algorithms is given in Kokhanovsky et al. (2007), Lee et al. (2009), and Yu et al. (2006).
AERONET (http://aeronet.gsfc.nasa.gov) sun photometers measure aerosol properties using ground-based observations of solar direct and diffuse irradiances, where the nominal accuracy of AOD measurements is within the range of 0.01-0.02. In this work, the AERONET product used is the aerosol optical depth or thickness (AOD or AOT), which is retrieved from the AERONET direct measurements of solar irradiance. Since AERONET measurements are made from the ground looking up, they present a distinct advantage over spaceborne retrievals in that they are not affected by uncertainties associated with the effects of surface properties as much as satellite measurements are (Dubovik et al., 2002; Holben et al., 1998, 2001). Furthermore, the Level 2.0 AERONET data used in this work are carefully calibrated, cloud screened, and quality assured (Smirnov et al., 2000) and therefore are especially suitable for use as the reference standard against which the satellite aerosol remote-sensing data are evaluated.
Table 2. Studied aerosol datasets, the matching data quality (QA) datasets, and the corresponding QA data screening criteria. Where provided, numbers in parentheses in the middle column indicate the base-1 layer index, base-0 bit number, and number of bits extracted from this QA dataset. For MODIS, MISR, OMI, and SeaWiFS, the QA values are integer numbers between 0 and 3, where for MODIS and SeaWiFS larger numbers indicate a better retrieval quality, and for OMI and MISR the opposite is true. For POLDER, QA is a real number between 0 (worst) and 1 (best). For CALIOP, a column is accepted only if all layers found in this column meet all listed QA conditions. The listed extinction QC values indicate retrievals that are unconstrained, constrained, have a reduced lidar ratio, or detected an opaque aerosol layer. CAD score and layer type and subtype flags indicate retrievals that classified a layer with a high confidence as containing aerosol and were able to determine the aerosol type. IAB condition is set to prevent the retrieval anomaly of overcorrecting the attenuation of overlying layers (Kittaka et al., 2011).
The MODIS (http://modis.gsfc.nasa.gov) aerosol product (MOD04 and MYD04) comprises the column aerosol optical thickness and other physical properties of aerosols retrieved globally over land and ocean (Hsu et al., 2004; Ichoku et al., 2005; Remer, 2002; Remer et al., 2005). MODIS has a swath of 2300 km.
The MISR (http://www-misr.jpl.nasa.gov) aerosol product (MIL2ASAE) features aerosol retrievals based on observations from 9 independent camera angles. Though limited to a swath of 563 km, its multiple viewing angles allow MISR to measure certain aerosol properties that are not available from the other instruments (e.g., aerosol particle size). Furthermore, MISR multiple cameras enable retrievals under conditions that are unfavorable to single-view (e.g., nadir) instruments, such as over bright surfaces or sun glint, where the other instruments are unable to make reliable retrievals in the visible wavelengths (Kahn, 2005;Kahn et al., 2010;Martonchik et al., 2009).
The OMI (http://www.knmi.nl/omi/research/instrument/index.php) aerosol product (OMAERUV) measures the near-UV (near ultraviolet) aerosol absorption and extinction optical depth, as well as single scattering albedo, among other aerosol properties (Torres et al., 1998, 2007). In addition to offering a generous swath of 2800 km, OMI is capable of retrieving absorption optical depth in partially cloudy conditions that usually pose a challenge to other aerosol instruments.
The POLDER onboard PARASOL (http://www.icare.univ-lille1.fr/parasol) aerosol land product (P3L2TLGC) and aerosol ocean product (P3L2TOGC) are derived from measuring spectral, directional, and polarized properties of reflected solar radiation. POLDER also has a swath of 2800 km, and one of the main features of the instrument is its utilization of polarization properties of the measured radiation for retrieving anthropogenic aerosol optical depth (Bréon et al., 2002; Buriez et al., 2002; Deuzé et al., 1999, 2001; Herman et al., 1997). It is important to note that the POLDER operational algorithm retrieves AOD at 2 wavelengths (670 and 865 nm) over ocean and only at 1 wavelength (865 nm) over land. Furthermore, over land the POLDER algorithm retrieves only AOD that corresponds to polarized particles, i.e., mainly fine-mode particles originating from anthropogenic activities. However, since there are no specific recommendations on the regions suitable for using the polarized AOD retrievals either as a proxy for total AOD or as pure fine-fraction AOD, it is beneficial to explore this dataset in relation to other available AOD products in order to identify geographical regions where the polarized AOD from POLDER can potentially be treated as total AOD.

Fig. 2. Distribution of AERONET stations used in the study. Green, red, and yellow colors indicate stations that can be classified as land-only (233 sites), ocean-only (11 sites), or both land-and-ocean (149 sites), respectively. The classification was established based on data availability in separate over-land and over-ocean datasets in MODIS, SeaWiFS, and POLDER aerosol products. Gray color indicates stations that do not have any collocated data for the studied period of time.
The SeaWiFS (http://disc.sci.gsfc.nasa.gov/dust/) aerosol product (SWDB) uses the Deep Blue algorithm to derive aerosol optical thickness and Ångström exponent. SeaWiFS likewise has an orbital ground-coverage swath of 2800 km, and the key features of this product are the retrievals of aerosol properties over both bright desert and vegetated surfaces, avoidance of sun glint that improves aerosol retrievals over ocean, and a highly precise calibration of the SeaWiFS sensor (Hsu et al., 2004, 2012).
The CALIOP (http://www-calipso.larc.nasa.gov) aerosol product (05kmALay) represents atmospheric curtain slices portraying the vertical distribution of aerosols and clouds in the atmosphere, including the density and certain properties of individual aerosol layers (Omar et al., 2009; Winker et al., 2007). Since CALIOP is an active lidar sensor, it can provide both daytime and nighttime retrievals within a narrow swath of about 70 m. Although the lack of the daytime background solar illumination makes nighttime CALIOP retrievals more accurate, they are not used in this study because they cannot be intercompared with the AERONET retrievals, which are available only during the daytime.
Since each of the foregoing datasets has a few versions because of the periodic revisions and updates of their retrieval algorithms over time, the data versions that were current at the time of writing this paper (January 2013) were sampled, although the study has been designed in a highly flexible way to enable rapid re-analysis as the new versions become available. The respective data versions used in this paper are AERONET AOD (Version 2), Terra and Aqua MODIS (Collection 051), MISR (Version 002), OMI (Version 003), POLDER (Versions L and K), SeaWiFS (Version 004), and CALIOP (Version 3-01). Therefore, all of the illustrations and analyses shown in this paper are based on these data versions for the respective aerosol sensors.

Data quality screening
While the AERONET Level-2 data are manually inspected to be free of retrieval defects and anomalies (Smirnov et al., 2000), such an approach is not feasible for the voluminous spaceborne data. Instead, all Level-2 aerosol products analyzed in this paper assign to AOD pixels one or more quality assurance (QA) flags that indicate a degree of "confidence" of the retrieval algorithms in their results. For MODIS and SeaWiFS, aerosol QA flags are integer numbers ranging from 0 to 3, with 3 representing the highest quality. For MISR and OMI data, the reverse is the case (i.e., 0 is the highest quality). Finally, for POLDER and CALIOP, QA data are a combination of one or more flags, most of which are real numbers ranging between 0 and 1, where 1 indicates the highest quality. By means of these QA flags, certain aerosol retrievals are identified as "bad quality" and are considered to be not trustworthy enough for certain analyses. Therefore, users of these aerosol products have been advised to choose data corresponding to a range of QA values that is most appropriate for their specific needs.
To establish similar yet valid QA thresholds for the analyzed products, we consulted science teams of the analyzed products as well as data product validation results reported by these teams and other research groups. Based on this inquiry, we chose the acceptable QA values as described in Table 2. For the majority of the products, the thresholds are set based on selecting a limited subset of the possible QA values. An important exception is the POLDER aerosol products, where the QA flags are expressed as real numbers between 0 ("bad") and 1 ("excellent"). Since there are no formal recommendations on the acceptable range of these flag values, we have adopted thresholds suggested for the "quality of inversion" flag in Bréon et al. (2011), specifically 0.5 for land retrievals and 0.2 for ocean retrievals. It is also important to note that since the primary designation of this flag is to indicate the success of the retrieval algorithm, this flag does not always reflect the actual quality of the retrieved aerosol parameters, especially under certain less than favorable conditions (Fig. 3).
The original MAPSS framework was designed to facilitate data analysis experiments based on different values of QA flags. For this, MAPSS extracts QA flags over the sampling area and computes the statistical mode for integer QA flags and mean for real QA flags. These statistical modes of the integer QA flags and means of the real QA flags provide a single number for the quality assessment of each sample set, and can be used to screen the corresponding subset statistics while providing a convenient alternative compared to screening individual pixels (e.g., see Remer et al., 2008). However, it was observed that this approach has an unequal impact on the statistical properties of the different aerosol products (Petrenko et al., 2012).
As an example, consider Fig. 4, where the global collocated subset mean AOD values from OMI and Terra MODIS Deep Blue (TMODIS DB) are compared to the corresponding subset mean AOD values from AERONET. It can be observed that while filtering the mean TMODIS DB AOD values by the mode of QA flags improves the R 2 and RMSE statistics, when compared to computing the mean values based on individually screened TMODIS DB AOD pixels, this filtering significantly changes the distribution of the collocated data. Specifically, compared to screening individual pixels, QA mode filtering removes 50 % more of the collocated data points and degrades the slope of the fitted regression line as a result of removing certain high-biased points. The opposite behavior can be observed in the collocated OMI AOD and AERONET AOD datasets, where screening by QA mode degrades R 2 of the collocated data when compared to screening individual pixels, although RMSE is still improved and the slope of the fitted regression line remains the same, since both screening approaches produce approximately the same number of the OMI subset data points.
This observation indicates a certain inhomogeneity in the uncertainties that are present in the aerosol products, as in some cases high biases in individual pixels might overwhelm the statistics derived from the sample set. Therefore, to avoid such biases and ensure a fair comparison between the analyzed aerosol products, the rest of this study is based on the QA "pre-filtering" approach, where individual pixels in a spatial sample are screened by their QA values before computing the statistics of this sample. This approach also closely models a typical use of the spaceborne aerosol data, where data users screen each pixel individually and do not consider QA values of its neighboring pixels. The impact of the described QA screening approach on data quantity can be observed in Table 3, which provides the sizes of the analyzed datasets before and after the screening. It is noticeable that, depending on the product, the impact is quite different, with the two MODIS ocean AOD datasets and the MISR AOD dataset retaining almost all of their available data points, whereas the two MODIS DT datasets retained only one-fourth of the complete collocated datasets.
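The difference between the two screening strategies for integer QA flags can be illustrated with a small sketch. The function names, the `(aod, qa)` pixel layout, and the example values are hypothetical and chosen only to show the mechanics; they do not reproduce the MAPSS code or any real retrievals.

```python
import statistics


def mean_prefiltered(pixels, accepted_qa):
    """Screen each (aod, qa) pixel by its own QA flag, then average the rest."""
    kept = [aod for aod, qa in pixels if qa in accepted_qa]
    return statistics.fmean(kept) if kept else None


def mean_mode_filtered(pixels, accepted_qa):
    """Average all pixels, but keep the sample only if the QA mode is accepted."""
    if not pixels:
        return None
    qa_mode = statistics.mode([qa for _, qa in pixels])
    if qa_mode not in accepted_qa:
        return None
    return statistics.fmean([aod for aod, _ in pixels])


# Illustrative sample: three high-quality pixels and one low-quality,
# high-biased pixel (QA scale assumed to be 0-3 with 3 the best).
sample = [(0.10, 3), (0.12, 3), (0.11, 3), (0.90, 0)]
prefiltered = mean_prefiltered(sample, {3})
mode_filtered = mean_mode_filtered(sample, {3})
```

With these made-up numbers, pre-filtering averages only the three accepted pixels, whereas mode filtering accepts the whole sample and lets the single high-biased pixel pull the mean upward. For real-valued QA flags such as POLDER's, the analogous per-pixel test would be a threshold comparison rather than set membership.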
It is important, however, to keep in mind that the QA values reported by the retrieval algorithms are to a large degree subjective to these algorithms and do not always reflect the actual quality of the retrievals. For example, in the absence of a proper aerosol or surface model, an algorithm can in certain cases use a wrong model to retrieve aerosol properties and mistakenly assign this retrieval a "good" QA flag (e.g., Kahn et al., 2010). Furthermore, a retrieval algorithm used might not have enough skill or even the possibility to recognize correctly certain conditions that are unfavorable for aerosol retrieval, e.g., sub-pixel cloud contamination in OMI retrievals (Torres et al., 1998). In yet another situation, an aerosol scene can be observed in only a portion of the available observation modes of a sensor, e.g., in only a few of the available observation directions in POLDER (Herman et al., 1997), which can lead to more confident yet less reliable results. The opposite case can also be true where an algorithm correctly retrieves aerosol properties but is not confident about the retrieval. As an example, consider Fig. 5, which explores how QA screening degrades the statistics of OMI AOD and Aqua MODIS Deep Blue AOD datasets when compared to AERONET AOD over Djougou, Benin, as a result of assigning a "bad" QA flag to sufficiently "good" retrievals.

Possible data outliers
Under rare circumstances, aerosol retrievals from spaceborne observations can produce aerosol properties that do not reflect the actual physical state of aerosol in the atmosphere. Some of the reasons for such retrievals were discussed in the previous section and might include, but are not limited to, such factors as the lack of a proper aerosol model, incorrect assumptions about boundary conditions, cloud contamination, and several other factors. In Fig. 6, the possible abnormal retrievals can be visually identified by observing points that have a minimal data density and lie abnormally far from the fitted regression lines. Even though an actual fraction of such data points in a complete collocated dataset can be relatively minor, the extreme deviations of such points from the overall trend might significantly bias and misrepresent the overall statistics of the data. Therefore, when computing the overall statistics and inter-comparing the aerosol products, such data points should be treated as possible outliers and analyzed separately from the rest of the data.
Table 3. Statistics of the studied aerosol datasets based on all AERONET stations during the period of 2006-06-07 and 2010- . "N tot" indicates the total number of the collocated spaceborne AOD - AERONET AOD data points, while "N filt" indicates the number of data points after filtering (screening) the spaceborne data by QA as described in Sect. 4 and Table 2. "N out" is the total number of the possible data outliers determined as explained in Sect. 5. The last 8 columns present the statistics on the collocated data based on regression fits also plotted in Fig. 6. Please see the Supplement for a breakdown of the listed statistics based on nominal ranges of AOD loading.
In order to identify and separate the possible data outliers, we analyzed AOD residuals, i.e., the difference between spaceborne AOD and AERONET AOD observations, using the modified Z-score test (Iglewicz and Hoaglin, 1993; National Institute of Standards and Technology, 2012). This test is designed for testing data for multiple outliers in approximately normal datasets and works by finding data points that differ from the median value by more than 5 median absolute deviations. Unlike the standard deviation used in the traditional Z-score test, the median absolute deviation in the modified Z-score test is calculated based on the median of the data and is less sensitive to extreme values. When applied to the collocated AOD data, this test removes those spaceborne retrievals that grossly overestimate or underestimate ground-based observations as compared to the median retrieval error at a specific AERONET location. This is especially useful since many spaceborne retrieval algorithms tend either to underestimate certain high-AOD events because of a pre-set maximum AOD threshold, or to overestimate AOD in the presence of clouds as well as under very low-AOD conditions. However, it should be noted that the test may not remove possible outliers that have a relatively small error.
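The outlier test described above can be sketched as follows. This is only an illustration under stated assumptions, not the processing code used in the study: the function names are hypothetical, the scaling constant 0.6745 and the cutoff of 3.5 are the values commonly quoted for the Iglewicz and Hoaglin test (a cutoff of 3.5 corresponds to roughly 5 median absolute deviations), and the input is assumed to be the collocated pairs for a single AERONET location.

```python
from statistics import median


def modified_z_scores(residuals):
    """Modified Z-scores: 0.6745 * (x - median) / MAD (assumed standard form)."""
    med = median(residuals)
    # Median absolute deviation around the median of the residuals.
    mad = median(abs(x - med) for x in residuals)
    if mad == 0:
        raise ValueError("MAD is zero; the modified Z-score is undefined")
    return [0.6745 * (x - med) / mad for x in residuals]


def split_outliers(satellite_aod, aeronet_aod, threshold=3.5):
    """Return (kept, flagged) index lists for one hypothetical site.

    Residuals are satellite AOD minus AERONET AOD; a point is flagged as a
    possible outlier when the absolute modified Z-score exceeds the threshold.
    """
    residuals = [s - a for s, a in zip(satellite_aod, aeronet_aod)]
    scores = modified_z_scores(residuals)
    kept = [i for i, m in enumerate(scores) if abs(m) <= threshold]
    flagged = [i for i, m in enumerate(scores) if abs(m) > threshold]
    return kept, flagged
```

Because both the center and the spread are medians, a single gross overestimate changes the cutoff very little, which is the property the text relies on.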
It is pertinent to note that even though AOD data are known to follow the lognormal distribution, the AOD residuals of the analyzed products follow an approximately normal distribution as shown in Fig. 7 and Fig. 8, with the exception of the POLDER products that mostly underestimate AOD, because their retrievals focus on anthropogenic fine-mode aerosols, and thus represent only the negative portion of the distribution. In the figures, it can be seen that the distributions have long tails, strongly indicating the presence of outliers. Furthermore, it can be observed that the slopes of the fitted lines are different from the slope of the 1 : 1 line. This indicates that the standard deviation of the analyzed residuals is different from 1, showing that these data do not follow the standard normal distribution, although this difference does not affect the test since the modified Z-score test normalizes residuals by the median absolute deviation of the data.

The "POLDER3 O (extended)" histogram is based on those pixels in ocean retrievals, where the retrieval algorithm considered the sensor viewing geometry conditions to be especially "favorable" and produced a set of additional aerosol parameters, such as spherical large-mode AOD, refractive index of fine mode, and others. The quality flag values are binned into 0.01 intervals, and the red lines indicate the 0.5 (land) and 0.2 (ocean) QA thresholds used in this study. Please note that even though certain retrievals can have a very high value of inversion QA (e.g., QA > 0.9 in ocean retrievals), if they were retrieved under less than favorable conditions, they may not necessarily be high-quality aerosol data, as there are almost no extended ocean retrievals with QA > 0.9.
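The normality check discussed above (sorted residuals plotted against standard normal quantiles, with a straight line fitted through them) can be illustrated with a short sketch. The function name and the plotting positions (i + 0.5)/n are assumptions chosen for illustration; the source does not state which convention was used for Fig. 8.

```python
from statistics import NormalDist, mean


def qq_line(residuals):
    """Fit a least-squares line to a normal quantile-quantile plot.

    Returns (slope, intercept). For approximately normal residuals the slope
    estimates their standard deviation and the intercept their mean, so a
    slope different from 1 means the data are not standard normal.
    """
    ordered = sorted(residuals)
    n = len(ordered)
    # Theoretical standard normal quantiles at assumed plotting positions.
    theo = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    tm, om = mean(theo), mean(ordered)
    sxy = sum((t - tm) * (o - om) for t, o in zip(theo, ordered))
    sxx = sum((t - tm) ** 2 for t in theo)
    slope = sxy / sxx
    intercept = om - slope * tm
    return slope, intercept
```

Points that bend away from this line at the ends of the plot correspond to the long tails mentioned in the text.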
The overall effect of removing the possible outliers can be observed in the bottom-right sub-plots of Fig. 6, Fig. 7, and Fig. 8, as highlighted by the green frames, showing that 575 (4.4 %) outliers are removed from the SeaWiFS ocean AOD dataset. The total numbers of the removed outliers are provided in Table 3 and do not exceed 7 % of the total QA-screened data for any of the datasets when considering the all-season data. The global distribution of the possible data outliers is depicted in Fig. 9 and generally corresponds to the outlier locations reported by the science teams of the aerosol products, e.g., outliers around the coastal areas where significant subpixel surface variations, shallow waters, sediments, and complex marine/inland aerosol mixtures complicate the retrievals, and also data outliers associated with uncertain retrievals by the MODIS and MISR algorithms in the Amazon Basin and near the Sahara desert (Kahn et al., 2010), although a more detailed study is needed to determine the specific factors that lead to these outliers and their spatiotemporal distributions. Since the Z-score test requires the use of a reference dataset (i.e., AERONET in our case), it cannot be directly applied to remove outliers in spaceborne data in an independent and systematic fashion. However, the results of this study could possibly be used to develop appropriate mitigation measures in the retrieval algorithms or to design specific data screening strategies for each of the products.
In the remainder of this paper, the reported results are based on the QA-screened data with the outliers removed.

Analysis
The overall data distribution for the analyzed spaceborne aerosol products is presented in Fig. 6, whereas the detailed linear regression fit statistics (Fox, 1997) for the products based on the treatment of the possible data outliers and the nominal delimiters of the four boreal seasons, namely, spring (March-May), summer (June-August), autumn (September-November), and winter (December to February), are listed in Table 3. The statistics are presented based on seasonal time frames rather than monthly or shorter time periods because there may not be sufficient coincident data for a scatterplot over such shorter time periods, due to the infrequency of satellite aerosol retrieval caused by cloud cover and other issues. Fortunately, many climatic events that are relevant to aerosol emission, transport, and distribution are often roughly aligned with these seasons.
The second column of Table 3 (N_filt) outlines the total volume of the collocated quality-filtered data available for each of the sensors depending on the boreal season. Although sensor swath width (Sect. 3) and data quality (Sect. 4) are among the main factors that determine the available volume of the data (e.g., MODIS has approximately 4 times the swath width and 4 times the data volume of MISR), it can be seen that the seasonal changes in retrieval conditions also have a very considerable impact on the data. Thus, summer retrievals can have 2-4 times as many collocated data points as winter retrievals. The relative data volume differences between the studied data products should be carefully considered when interpreting the statistics discussed in the remainder of this paper.

AERONET AOD data are shown on the x axes, while AODs measured by spaceborne sensors are on the y axes. Density plots bin data into 0.1 AOD (0.05 AOD in magnified insets) intervals, where the color of each bin indicates the percentage of all data points that fall into this bin. Left column displays the original unfiltered data with all QA values. Middle column displays the data pre-filtered by QA, where individual pixels in each data sample were filtered based on their QA values before calculating the mean value of the sample. Right column shows the data post-filtered by QA, where the mean of each sample was calculated based on all pixels in the sample; after this, the whole sample was rejected if at least half of the pixels in the sample had QA values below the specified threshold. Note that OMI data have somewhat better properties when pre-filtered, while Terra MODIS-Deep Blue data are in a better agreement with AERONET AOD when post-filtered. The insets (0-0.5 AOD range magnified) are intended to enhance the visualization of the linear regression fits near the origin.
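The difference between the pre-filtering and post-filtering schemes described in the figure caption above can be made concrete with a small sketch. The representation of a sample as a list of (AOD, QA) pixel pairs and the function names are hypothetical; the actual MAPSS data structures are not described in this section.

```python
from statistics import mean


def prefiltered_mean(pixels, qa_min):
    """Pre-filtering: drop pixels with QA below qa_min, then average the rest.

    `pixels` is a list of (aod, qa) pairs for one collocated sample.
    Returns None when no pixel passes the QA threshold.
    """
    good = [aod for aod, qa in pixels if qa >= qa_min]
    return mean(good) if good else None


def postfiltered_mean(pixels, qa_min):
    """Post-filtering: average all pixels, then reject the whole sample
    if at least half of its pixels have QA below qa_min."""
    if not pixels:
        return None
    n_bad = sum(1 for _, qa in pixels if qa < qa_min)
    if 2 * n_bad >= len(pixels):
        return None
    return mean(aod for aod, _ in pixels)
```

For a sample with two good pixels and one low-QA pixel, pre-filtering averages only the two good pixels, whereas post-filtering keeps the sample and averages all three, which is one way the two schemes can yield different agreement with AERONET.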
In the presented statistics of Table 3, the slope value indicates by how much the satellite retrieval for the parameter under consideration is relatively underestimated or overestimated across different magnitudes, depending on whether the slope value is less than or greater than unity. The offset parameter indicates the extent to which the satellite retrieval is biased. The squared linear correlation coefficient (R²) indicates how consistent the parameter retrieval is across its magnitude range, that is, how tightly the points are aligned along the fitted regression line. Finally, the root mean square error (RMSE) indicates the accuracy of the retrievals measured as the average error in the spaceborne retrievals as compared to the ground-based AERONET retrievals.
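The four statistics described above can be computed as in the following sketch. It assumes an ordinary least-squares fit of satellite AOD against AERONET AOD and an RMSE taken directly between the two AOD series; the function name is hypothetical and the study's exact fitting procedure may differ.

```python
from math import sqrt
from statistics import mean


def fit_statistics(aeronet_aod, satellite_aod):
    """Return (slope, offset, r2, rmse) for collocated AOD pairs.

    slope, offset: least-squares line satellite = slope * aeronet + offset
    r2: squared linear correlation coefficient
    rmse: root mean square of (satellite - aeronet)
    """
    xm, ym = mean(aeronet_aod), mean(satellite_aod)
    sxx = sum((x - xm) ** 2 for x in aeronet_aod)
    syy = sum((y - ym) ** 2 for y in satellite_aod)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(aeronet_aod, satellite_aod))
    slope = sxy / sxx
    offset = ym - slope * xm
    r2 = sxy ** 2 / (sxx * syy)
    rmse = sqrt(mean((y - x) ** 2 for x, y in zip(aeronet_aod, satellite_aod)))
    return slope, offset, r2, rmse
```

A retrieval that follows a slope of 0.8 with an offset of 0.05 exactly would have R² = 1 and still a nonzero RMSE, which illustrates why the two measures can rank products differently.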
In Table 3, it can be seen that all MODIS, MISR, and SeaWiFS aerosol products correlate with AERONET observations with R² ≥ 0.6. Furthermore, MISR, SeaWiFS land, and MODIS Dark Target products have R² ≥ 0.7, and MODIS ocean products have R² ≥ 0.8. Also, once the possible outliers are removed, the SeaWiFS, MISR, and MODIS Dark Target products reach R² ≈ 0.8, while the MODIS Deep Blue products have R² ≈ 0.7. All the best-performing land products have RMSE values of about 0.15 (measured in the

Fig. 5. Impact of QA screening on the statistical properties of AOD retrieved by the different sensors over Djougou, Benin. The top part of the figure shows scatterplots of 2 yr of data that are unfiltered (left) or pre-filtered (right) by QA. It can be observed that while filtering improved the properties of certain datasets, it degraded the properties of the others, particularly Aqua MODIS Deep Blue and OMI. This effect can be partially explained by observing that the retrieval algorithms can mistakenly assign bad QA to pixels with good retrievals, as demonstrated in a high-AOD event in the bottom part of the picture. In this figure, the magenta line indicates daily means of AERONET AOD at 440 nm, while bar heights reflect the number of all-QA (top half of the figure) and best-QA (bottom half) data pixels in each spaceborne sample, and error bars represent the mean relative accuracy of each sample computed based on its pixels. As an example, consider AMODIS DB (turquoise) retrieval on 13 December. Even though the mean AOD based on 22 pixels in this retrieval was within 10 % of the corresponding AERONET AOD, all these 22 pixels were marked as having a bad QA.

Fig. 6. Regression fits of AERONET AOD (x axes) to AOD measured by spaceborne sensors (y axes). Satellite data were pre-screened by QA as explained in Sect. 4. Density plots bin data into 0.1 AOD (0.05 AOD in magnified insets) intervals, where the color of each bin indicates the percentage of all data points that fall into this bin. Density plot in the bottom right corner (green frame) demonstrates the results of the possible data outlier detection and removal procedure described in Sect. 5, compared to the bottom middle plot. The insets (0-0.5 AOD range magnified) are intended to enhance the visualization of the linear regression fits near the origin.

Fig. 8. Normality of the difference between spaceborne AOD and AERONET AOD. In each plot, points closely following the blue fitted line indicate the data that are approximately normally distributed. Curvatures around the center of the straight line represent the departure from the normality and indicate a presence of possible outliers, particularly at the tails of the distributions. The difference in the slope and offset of the fitted blue line from the gray 1 : 1 line indicates a deviation from the standard location (i.e., mean = 0) and scale (i.e., standard deviation = 1) of the normal distribution. Satellite data were pre-screened by QA as explained in Sect. 4. Plot in the green frame demonstrates the results of the possible data outlier detection and removal procedure described in Sect. 5.

Table 3. Stations with less than 1 % from the total number of outliers are not shown. The statistical technique for detection and removal of the possible data outliers is described in Sect. 5.

Fig. 10. Seasonal dependence of squared linear fit correlation coefficient (R²) and root mean square error (RMSE) statistics between the collocated spaceborne and ground-based (AERONET) observations of AOD, based on the data in Table 3.

For most of the sensors in Table 3 and Fig. 5, the slope of the fitted regression line is below 1.0 and the intercept is slightly above 0. This can be explained by the limitations of the spaceborne retrieval algorithms that tend to (1) overestimate low-AOD events when the AOD signal is very weak and almost indiscernible from the surface signal, resulting in a portion of the surface signal being mistaken for an AOD signal; and (2) underestimate high-AOD events because of the very weak surface signal, where a portion of the AOD signal might be mistaken for a surface signal. Furthermore, certain algorithms have pre-set limits on the highest possible retrieved value of AOD (e.g., 3.0 in MISR), which may further affect the reported statistics. Finally, certain sensors have peculiar features that impact the overall characteristics of their data. Among such features are sensitivity to sub-pixel cloud contamination in OMI retrievals that leads to an overestimation of AOD (Torres et al., 1998), sensitivity to fine particles in POLDER land retrievals that leads to underestimation of AOD in coarse-mode-dominated regions (Herman et al., 1997), and also frequent underestimation of AOD by daytime CALIOP retrievals. Since without a ground reference it is nearly impossible to recognize an underestimation of high-AOD events or overestimation of low-AOD events in the original Level 2 spaceborne data, even when considering the associated QA information, it is especially important to explore the behavior of each product across the complete range of AOD values. Figure 10 charts the seasonal dependence of R² and RMSE of the spaceborne products based on the data in Table 3.
While all of the products demonstrate high seasonal variations in the statistical parameters, the OMI, CALIOP, POLDER, and MODIS Deep Blue products are the most sensitive to the seasonal changes in the retrieval conditions, perhaps because of the uncertainties associated with cloud screening, although collocating spaceborne observations with AERONET introduces a certain bias towards cloud-free scenes because of the comprehensive AERONET cloud screening procedures (Smirnov et al., 2000). Furthermore, it can be seen that while removing the data outliers reduces the RMSE and sensitivity to the seasonal changes in the analyzed products, this reduction is not significant, indicating that the retrieval errors reflected by the RMSE of these products likely stem from the regular retrievals rather than the anomalous retrievals.
The accuracy of the spaceborne aerosol products might vary with the location of the retrieval and, depending on the location, some products might be significantly more accurate than others. The spatial dependence of the accuracy of the analyzed products is explored in Fig. 11 and Fig. 12, where it can be observed that no single sensor provides the best retrievals at all sites. Additionally, as indicated by the smaller relative sizes of certain markers in Fig. 11 and Fig. 12, although some locations might be covered by highly accurate spaceborne retrievals from certain sensors, if such sensors offer limited coverage and data availability, their accuracy advantage may ultimately produce only limited impact, highlighting the auxiliary but still important role of the less precise but more spatially extensive products. Furthermore, as depicted by the lighter shading in Fig. 11 (e.g., southern Australia) and also in the histogram of R² inset in this figure, some sites are not covered by high-correlation (i.e., R² ≥ 0.75) retrievals at all or have no collocated retrievals from the most accurate of the products.
Moreover, it can be observed that the best-performing aerosol products differ between Fig. 11 and Fig. 12, and the products providing the best RMSE are oftentimes those with the lower R². Therefore, when choosing an aerosol product for a specific analysis goal and at a specific region, it is necessary to consider a balance between a variety of seasonal, statistical, and spatial factors.
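The per-site selection behind such maps can be sketched as below: for one AERONET site, pick the product with the highest R² and, separately, the product with the lowest RMSE. The dictionary layout, the `min_points` parameter, and the function name are hypothetical and serve only to show why the two selections need not coincide.

```python
def best_products(site_stats, min_points=1):
    """Select the best product at one site by R2 and by RMSE.

    `site_stats` maps a product name to a tuple (r2, rmse, n_points).
    Products with fewer than `min_points` collocated points are ignored.
    Returns (best_by_r2, best_by_rmse), or (None, None) if nothing qualifies.
    """
    eligible = {p: s for p, s in site_stats.items() if s[2] >= min_points}
    if not eligible:
        return None, None
    best_by_r2 = max(eligible, key=lambda p: eligible[p][0])
    best_by_rmse = min(eligible, key=lambda p: eligible[p][1])
    return best_by_r2, best_by_rmse
```

Keeping the number of collocated points alongside each statistic makes it possible to discount products whose apparent advantage rests on very few samples, in line with the caution about limited coverage above.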

Accuracy of aerosol data products based on land cover type
Aerosol properties are derived from satellite observations based on a set of assumptions about the type and the optical properties of the underlying terrestrial surfaces. Therefore, it can be beneficial to compare the accuracy of the considered aerosol data products based on the land cover types of the sites over which the data subsets were extracted. As a reference for land cover types and their spatial extent, we used the global dataset that is based on the International Geosphere-Biosphere Programme (IGBP) classification scheme and is available from the suite of MODIS products (Friedl et al., 2002). For each land cover type, we identified coincident AERONET stations and averaged their corresponding statistical results from Sect. 6. Tables 4 and 5 list the results of this aggregation, while Fig. 13 and Fig. 14 outline these results on a geographical map. Generally, these aggregated results corroborate the findings of Sect. 6, and the aerosol products from MODIS and MISR sensors produce the most accurate results for the majority of the land cover types, although there are some peculiarities that should be discussed in greater detail in order to understand better the best areas of application of the analyzed aerosol products.

Fig. 11. Spaceborne datasets with the best correlation (R²) of the retrieved AOD with the AOD measured by each individual inland (top) and coastal or island-based (bottom) AERONET site. The intensity of marker shading indicates the degree of correlation. Marker shape indicates the range of root mean square error (RMSE) associated with the displayed best R². Finally, marker size corresponds to the number of collocated data points used to compute the displayed statistics. Histograms in the bottom insets highlight the distribution of these statistics over all sites based on bins of 0.05 AOD. The statistics were calculated based on the data that were pre-filtered by QA and screened of outliers as described in Sects. 4 and 5.
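The aggregation by land cover type described in this section can be sketched as a simple grouping step. An unweighted mean over stations is assumed here, since the text only says that station results were averaged; the function name, station names, and data layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_by_land_cover(station_stat, station_land_cover):
    """Average a per-station statistic (e.g., R2) within each land cover type.

    `station_stat` maps station name -> statistic value for one product.
    `station_land_cover` maps station name -> IGBP land cover label.
    Stations without a land cover label are skipped.
    """
    groups = defaultdict(list)
    for station, value in station_stat.items():
        cover = station_land_cover.get(station)
        if cover is not None:
            groups[cover].append(value)
    return {cover: mean(values) for cover, values in groups.items()}
```

Land cover types with no coincident station simply do not appear in the result, which matches the empty cells noted in the table captions.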
Specifically, IGBP water surface locations include 31 AERONET stations out of 160 stations with collocated ocean retrievals identified in Fig. 2. At these 31 locations, MODIS, MISR, and SeaWiFS demonstrate the best results with R² ≈ 0.7. Furthermore, the POLDER ocean dataset has a good RMSE = 0.07 (Deuzé et al., 1999) that is comparable to the best performing sensors in this region, although it has a somewhat lower squared correlation coefficient value of R² = 0.62; note that these statistics are different from the POLDER ocean statistics in Fig. 6, which analyzes a more complete set of AERONET stations. It is interesting to note that the correlation between AERONET and Aqua MODIS AOD with R² = 0.8 is higher than the correlation between AERONET and Terra MODIS AOD with R² = 0.72. A detailed inspection of the data showed that this difference stems from several AERONET sites with relatively small numbers of collocated data points (N < 35) and an average AOD below 0.2. Under such low-AOD conditions, the MODIS ocean algorithm has difficulty in retrieving the precise AOD values and, as a result, is subject to an increased rate of errors Remer, 2002).

Fig. 12. Spaceborne datasets with the best root mean square error (RMSE) of the retrieved AOD to the AOD measured by each individual inland (top) and coastal or island-based (bottom) AERONET site. The symbols used are the same as the symbols in Fig. 11. The statistics were calculated based on the data that were pre-filtered by QA and screened of outliers as described in Sects. 4 and 5.
Evergreen broadleaf forest regions provide conditions that are favorable for retrieving AOD, and multiple sensors demonstrate high correlation with AERONET, including MODIS Dark Target with R² = 0.84, MISR with R² = 0.90, and POLDER with R² = 0.98. However, since these regions are also susceptible to complex smoke events (e.g., Ji Parana, Brazil), sometimes combined with dust and pollution events (e.g., Anmyon, South Korea; Hong Kong, China), most of the sensors demonstrate a rather poor RMSE. The important exception is the POLDER dataset that has RMSE = 0.07, possibly because POLDER is especially sensitive to small particles produced by biomass burning and anthropogenic pollution sources (Fan et al., 2008), thereby retrieving fairly accurate AOD values at Ji Parana, Brazil, and Lulin, Taiwan. It should also be noted that together with deciduous broadleaf forests and savannas, evergreen broadleaf forest is one of the three land cover types where POLDER demonstrates good results with R² ≈ 0.98, indicating the advantage of polarization measurements for aerosol retrievals over these regions.

Fig. 13. Land cover type dependence of squared linear fit correlation coefficient (R²) between the collocated spaceborne and ground-based (AERONET) observations of AOD. Areas corresponding to each IGBP land cover type (bottom right inset) are colored based on the average of the data from those AERONET sites that reside in these areas. The statistics were calculated based on data that were pre-filtered by QA and screened of outliers as described in Sects. 4 and 5.
For mixed forests, MODIS Dark Target products provide the highest retrieval accuracy with R² = 0.78 for Terra and 0.82 for Aqua, while MISR data are somewhat less accurate with R² = 0.72 as a result of underestimating high AODs during summertime biomass burning events (Kahn et al., 2010), although RMSE = 0.04 of MISR is better than RMSE = 0.05 of Terra MODIS and RMSE = 0.06 of Aqua MODIS. Sufficiently reliable aerosol data are also retrieved by SeaWiFS with R² = 0.64, by POLDER with R² = 0.62, and by CALIOP with R² = 0.61.
For closed shrubland, MISR with R² = 0.90, CALIOP with R² = 0.88, and MODIS Deep Blue with R² = 0.74 for Atmos. Chem. Phys., 13, 6777-6805 (Hsu et al., 2006). It should be noted that closed shrubland is the only area where CALIOP produces some of the best retrievals, possibly indicating the advantage of active aerosol sensing over this bright-surface region.

Fig. 15. Land cover type dependence of bias between the collocated spaceborne and ground-based (AERONET) observations of AOD. Areas corresponding to each IGBP land cover type (bottom right inset) are colored based on the average of the data from those AERONET sites that reside in these areas. The statistics were calculated based on the data that were pre-filtered by QA and screened of outliers as described in Sects. 4 and 5.
Over wooded savannas, both Dark Target and Deep Blue products from MODIS show very good results with R² values between 0.80 and 0.90. MISR with R² = 0.79 and SeaWiFS with R² = 0.77 produce lower but still reasonably good results. The reduced performance of MISR in this region can be explained by the lack of region-specific aerosol mixtures in its retrieval algorithm, a situation that is expected to be improved in future revisions of the product.
It should also be noted that this region enables one of the two highest correlations between OMI and AERONET observations, probably as a result of favorable cloud-free conditions in sub-Saharan Africa (Ahn et al., 2008; Torres et al., 2007).
Open shrublands are very dry and sparsely vegetated regions that are characterized by bright surfaces. Such regions present a great challenge for remote retrieval of aerosol properties, and none of the analyzed products exceeded a correlation coefficient of 0.7. Among the best-performing products are MODIS Dark Target with R² = 0.68 for Terra and R² = 0.61 for Aqua, MISR with R² = 0.65, and MODIS Deep Blue with R² = 0.52 for Terra and R² = 0.61 for Aqua, as well as CALIOP with R² = 0.59. Similar to open shrublands, grasslands were challenging to all of the sensors, where Terra MODIS Deep Blue, Aqua MODIS Dark Target, and MISR demonstrated the best results with R² values between 0.65 and 0.67. Even more challenging were snow and ice and also barren or sparsely vegetated areas, where MISR was the only highly accurate aerosol product with R² = 0.78 for both land cover types, thanks to its multi-angle measurement capabilities that allow retrieving aerosol properties over bright surfaces and enable the advanced cloud and ice detection capabilities.

Fig. 16. Land cover type dependence of variance between the collocated spaceborne and ground-based (AERONET) observations of AOD. Areas corresponding to each IGBP land cover type (bottom right inset) are colored based on the average of the data from those AERONET sites that reside in these areas. The statistics were calculated based on the data that were pre-filtered by QA and screened of outliers as described in Sects. 4 and 5.

Table 4. Linear fit correlation coefficient (R²) between the collocated spaceborne and ground-based observations of AOD estimated at the stations that coincide with different IGBP land cover types. Empty cells indicate no collocated data available from a specific sensor over a specific land cover type. No AERONET stations are available at the areas occupied by deciduous needleleaf forest. The statistics were calculated based on the data that were pre-filtered by QA and screened of outliers as described in Sects. 4 and 5. A graphical representation of this table is in Fig. 13.

Table 7. Variance between the collocated spaceborne and ground-based observations of AOD estimated at the stations that coincide with different IGBP land cover types. Empty cells indicate no collocated data available from a specific sensor over a specific land cover type. No AERONET stations are available at the areas occupied by deciduous needleleaf forest. The statistics were calculated based on the data that were pre-filtered by QA and screened of outliers as described in Sects. 4 and 5. A graphical representation of this table is in Fig. 14.

Conclusions
In this paper, we analyzed and intercompared 11 spaceborne aerosol products from the MODIS, MISR, OMI, SeaWiFS, POLDER, and CALIOP sensors, which were sampled fairly uniformly based on the MAPSS framework that was used to collocate these spaceborne observations with ground-based AERONET observations between 2006-06-07 and 2010-12-11, when all the sensors were operational. Based on this analysis, for each of the AERONET stations, we identified the products providing the best correlation coefficient (R²) and root mean square error (RMSE). It was found that no single product provides the best retrieval over all sites, and certain sites are not covered by accurate retrievals at all. Furthermore, it was observed that a product providing the best R² at a certain location does not always provide the best RMSE at the same location. Therefore, to facilitate the multivariate analysis that is necessary when choosing the most suitable spaceborne aerosol product at a specific region, we plan to develop an interactive tool that would allow exploration of the multi-sensor collocated data on an interactive map. Further, a statistical approach based on the modified Z-score test has been used to identify possible data outliers in the collocated datasets automatically. The reported analysis shows that even though such atypical data points constitute a relatively minor portion (2-7 %) of the analyzed datasets, they can significantly bias the results of the statistical analysis. For this reason, it is suggested that such data points be set aside when analyzing collocated datasets and inspected separately, in order to develop appropriate mitigation measures in the retrieval algorithms or to design specific data screening strategies that could be used to identify outliers in spaceborne datasets independently and systematically.
Finally, we assessed the accuracy of the spaceborne aerosol products based on their spatial distribution relative to different surface types derived from MODIS using the IGBP land cover classification scheme. This analysis identified sensors that retrieve the most accurate aerosol properties over each of the defined land cover types and highlighted the differences that exist between the sensors, providing an advantage or disadvantage in retrieving AOD over a particular land cover type. Notably, some of the land cover types, including open shrublands and grasslands, had only moderately accurate retrievals, indicating the need for improved spaceborne aerosol remote sensing instrumentation/approaches and/or retrieval algorithms.

Supplementary material related to this article is available online at: http://www.atmos-chem-phys.net/13/6777/2013/acp-13-6777-2013-supplement.pdf.
Acknowledgements. Support for the development of this project has been provided by NASA HQ under grant number NNX08AN39A through the ROSES 2007 ACCESS Program based on a proposal entitled "Integrated validation, intercomparison, and analysis of aerosol products from multiple satellites" and also under grant NNH10ZDA001N-ESDRERR through the ROSES 2009 Earth System Data Records Uncertainty Analysis Program based on a proposal titled "Coherent uncertainty analysis of aerosol data products from multiple satellites". We thank the science and support teams of MODIS, MISR, OMI, POLDER, CALIOP, SeaWiFS, and AERONET for retrieving and making available their respective aerosol products, as well as for providing assistance during the development of MAPSS sampling for these products. Specifically, we are grateful to certain individual members of the aerosol product teams for their insight and willingness to provide us answers to various questions related to their respective products, namely: AERONET (Brent Holben, Thomas Eck, Oleg Dubovik, Alexander Smirnov,  We also give special thanks to the PIs of the global AERONET sites and their staff for establishing and maintaining these sites. Finally, we would like to honor the memory of our colleague, Gregory Leptoukh, who passed away suddenly in January 2012, as we had a long-term collaboration with him that resulted in the implementation of the MAPSS framework on the GIOVANNI data analysis system, and he was part of the initial discussions of the ideas that led to this study. as well as the relative performance of each product over different landcover types. It was 27 observed that while MODIS, MISR, and SeaWiFS provide accurate retrievals over most of 28 the landcover types, multi-angle capabilities make MISR the only sensor to retrieve reliable 29 AOD over barren and snow / ice surfaces. 
Likewise, active sensing enables CALIOP to 30 retrieve aerosol properties over bright-surface shrublands more accurately than the other 1 sensors, while POLDER, which is the only one of the sensors capable of measuring polarized 2 aerosols, outperforms other sensors in certain smoke-dominated regions, including broadleaf 3 evergreens in Brazil and South-East Asia. 4 5 1 Introduction 6 Remote sensing of aerosols from space has been a subject of extensive research, with multiple 7 sensors retrieving global aerosol properties on a daily or weekly basis. During the past 8 decade, the retrievals of atmospheric aerosol parameters have been available from a multitude 9 of spaceborne sensors Yu et al., 2006). The diverse algorithms used for 10 these retrievals operate on different types of the remotely-sensed signals and rely on different 11 assumptions about the underlying physical phenomena. Significant effort has been made by 12 the various aerosol algorithm teams to progressively refine these assumptions, from algorithm 13 version to version, in order to derive and provide the most accurate products possible. 14 However despite these efforts, measurements of identical aerosol parameters from different 15 sensors, including the most common observable and widely used aerosol optical depth or 16 thickness (AOD or AOT or τ a ) parameter, often disagree with each other due to a variety of 17 reasons including differences in the underlying surface properties at different locations, 18 intrinsic sensor observation characteristics and retrieval approaches (Li et al., 2009). 19 Therefore, it has become necessary to consistently analyze the available aerosol products 20 wherever possible in order to establish the geographical locations where and under what 21 circumstances each of these products provide the greatest accuracy. 
The unique attributes of a particular sensor may be advantageous for aerosol retrievals, depending on the parameter(s) being retrieved, especially under favorable atmospheric conditions. However, aerosol retrieval accuracy can also be affected by numerous other factors, including the retrieval algorithm's assumptions and parameterizations, the instrument characteristics (intrinsic design, calibration, and time-dependent degradation), the measurement configurations (solar and view geometry), the atmospheric conditions (cloudiness, aerosol mixing, layer height, and humidity), the surface background (vegetated, bare, snow-covered, inundated, or simply dark or bright land surface or ocean), and others (Kokhanovsky et al., 2007).

Since the accuracy of aerosol retrieval from a sensor may be affected positively or negatively by these factors and conditions in different ways and to varying degrees, a synergetic use of similar aerosol parameters across the sensors is non-trivial, and data synergy research is instead focused on combining orthogonal (i.e., non-conflicting) aerosol measurements. For example, the aerosol layer height information from the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) has been used to enhance aerosol retrievals from other sensors (Oo and Holz, 2011; Torres et al., 2012; Zhang et al., 2011), while the geometry information from the Advanced Along Track Scanning Radiometer (AATSR) was used to initialize the Moderate Resolution Imaging Spectroradiometer (MODIS) Bi-Directional Reflection Distribution Function (BRDF) in order to derive AATSR AOD (Guo et al., 2009).

To better characterize the differences and uncertainties that exist between the aerosol retrievals from different sensors, several studies compared a limited number of sensors; e.g., AOD retrievals from MODIS were separately compared to retrievals from the Multi-angle Imaging Spectroradiometer (MISR) (Kahn et al., 2007, 2011; Mishchenko et al., 2010; Zhang and Reid, 2010), the POLarization and Directionality of the Earth's Reflectances (POLDER) sensor (Gérard et al., 2005), and CALIOP (Kittaka et al., 2011; Redemann et al., 2011). A larger set of sensors was intercompared using a synthetic benchmark (Kokhanovsky et al., 2010), and also based on a detailed analysis of limited geographical regions (Cheng et al., 2012; Yu et al., 2012). In addition, a set of 9 aerosol products was evaluated over ocean and coastal AERONET sites during the period of 1997-2000, highlighting regions of high retrieval agreement and disagreement (Myhre et al., 2005). However, all the satellite data used in that study had already undergone post-retrieval spatio-temporal aggregation at 1×1 degree grid resolution on a monthly mean basis (so-called Level 3 products) before they were used in the comparisons.

In this work, eleven retrieval-scale (Level 2) aerosol products from multiple spaceborne sensors are intercompared during the recent 'golden' period of 2006-2010 (see Fig. 1), when as many as seven major sensors were in operation and measuring aerosols concurrently.

The details of the MAPSS sampling approach are explained in Sect. 2, while the relevant characteristics of the aerosol data products from the different sensors and the corresponding data quality screening techniques are described in Sect. 3 and Sect. 4.
Section 5 describes a novel statistical approach for detecting and removing possible data outliers that can exist in the collocated data and, as a result, bias the statistical analysis of these data. Section 6 presents the detailed analysis of the compared aerosol products, while Sect. 7 examines the accuracy of these products based on land cover type. Conclusions are presented in Sect. 8.

2 Sampling Method

The different aerosol-measuring sensors have different spatial resolutions; some have square-shaped footprints while others have rectangular pixel shapes. The nominal ground pixel sizes of the analyzed aerosol products at nadir are summarized in Table 1; these sizes become progressively larger away from nadir. To ensure a uniform and fair sampling of the aerosol products for cross-evaluation with AERONET and for comparison with one another, we used the framework of the Multi-sensor Aerosol Products Sampling System (MAPSS), which was originally developed for validation and analysis of MODIS aerosol products (Ichoku et al., 2003, 2005; Levy et al., 2010; Remer, 2002) and later expanded to support aerosol products retrieved by other spaceborne sensors (Petrenko et al., 2012). MAPSS subsets the aerosol products by extracting pixels covering approximately the same area on the ground centered over AERONET sun photometer measurement sites and over certain other point locations that are not addressed in this study.

Assuming an imaginary circle of 55-km diameter whose center coincides with each AERONET station, all spaceborne aerosol product pixels falling within the circle are extracted. An aerosol pixel is regarded as being within the circle if the coordinates of the pixel center fall within 27.5 km of the coordinates of the circle center, where the distance between the two points is determined using the Haversine formula (Sinnott, 1984).
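The pixel selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MAPSS implementation: the function names, the `(lat, lon, aod)` tuple layout, and the mean Earth radius of 6371 km are all choices made here for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not specified in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pixels_in_circle(station, pixels, radius_km=27.5):
    """Keep pixels whose centers lie within radius_km of the station.

    station: (lat, lon) of the ground site (hypothetical layout)
    pixels:  iterable of (lat, lon, aod) tuples (hypothetical layout)
    """
    lat0, lon0 = station
    return [p for p in pixels
            if haversine_km(lat0, lon0, p[0], p[1]) <= radius_km]
```

For a station on the equator, a pixel centered 0.1 degrees of longitude away (about 11 km) is kept, while one centered 0.25 degrees away (about 27.8 km) falls just outside the 27.5-km radius.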
Based on the nominal spatial resolution of the sensors in Table 1, the approximate maximum number of pixels within the 55-km diameter sample space at nadir for the different sensors is as follows: MODIS, 25; MISR, 9; OMI, 8; and SeaWiFS, 16. The actual number of pixels within the sampling circle decreases for aerosol retrievals away from the nadir of the satellite scene, and can be further reduced in the presence of clouds or other factors preventing retrieval of aerosol parameters. Based on the extracted sample, statistics of each aerosol parameter retrieved within the sampling areas are calculated and include the mean, median, and standard deviation, as well as the value of the central pixel over the ground station. In this paper, results are reported based on the analysis of the mean values; a similar analysis based on the central values is not included here because of space considerations and is reported in the digital supplement to this paper. It is appropriate to use the mean values in this paper, so as to maintain a uniform sampling criterion across the different sensors and facilitate a fair intercomparison.

... Table 1 based on the established wavelength dependence of AOD (Eck et al., 1999). It is pertinent to note that this interpolation (and particularly extrapolation) process might introduce an additional source of uncertainty when intercomparing the aerosol products, especially for certain stations where AERONET AOD observations in the range of 440-1200 nm have to be extrapolated by 52 nm to match OMI AOD at 388 nm.

Each AERONET station has a different period of operation, and the quantity of available AOD data points is not uniform across all stations; while many stations are still active, certain stations were active in the past and only for a short period of time.
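The wavelength matching discussed in this section can be illustrated with a simple two-wavelength Ångström power law. This is a simplified stand-in chosen for the example and is not necessarily the exact scheme used in the study; the function names and the sample AOD values are hypothetical.

```python
import math


def angstrom_exponent(aod1, wl1, aod2, wl2):
    """Angstrom exponent from AOD at two wavelengths (same units for wl1, wl2)."""
    return -math.log(aod1 / aod2) / math.log(wl1 / wl2)


def aod_at_wavelength(aod1, wl1, aod2, wl2, target_wl):
    """Estimate AOD at target_wl assuming AOD ~ wavelength ** (-alpha).

    Works for interpolation (target between wl1 and wl2) and for
    extrapolation (target outside that range), where the power-law
    assumption is less reliable.
    """
    alpha = angstrom_exponent(aod1, wl1, aod2, wl2)
    return aod1 * (target_wl / wl1) ** (-alpha)


# Hypothetical AERONET values at 440 nm and 880 nm, matched to 550 nm
# (interpolation) and to 388 nm (a 52-nm extrapolation below 440 nm).
aod_550 = aod_at_wavelength(0.4, 440.0, 0.2, 880.0, 550.0)
aod_388 = aod_at_wavelength(0.4, 440.0, 0.2, 880.0, 388.0)
```

Because the extrapolated value depends entirely on the spectral slope measured at longer wavelengths, any curvature in the true spectral dependence translates directly into error at 388 nm, which is the additional uncertainty noted in the text.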
The overall availability of the collocated data during the analysis period of 2006-06-07 to 2010-12-11 is shown in Fig. 2, where for the purposes of this study the stations are classified as land-only, ocean-only, or land-and-ocean. This classification is based on analyzing collocated data of separate aerosol retrievals over land and ocean from the MODIS, SeaWiFS, and POLDER sensors and identifying stations that have AOD data points from the land datasets, ocean datasets, or both; note that the MISR, OMI, and CALIOP sensors provide only joint land-and-ocean datasets.

3 Aerosol Products

The key properties of the 11 analyzed aerosol products are summarized in Table 1, while the original science data set (SDS) names of the spaceborne aerosol products are listed in the first column of Table 2, except for the POLDER products, which do not have an established SDS naming convention. The sampled satellite data products are derived directly from the retrieval-level aerosol products (Level 2), which represent the highest available spatial resolution for each product/sensor combination and are free of aggregation artifacts that can be present in Level 3 data (Zhang and Reid, 2010).

Of the 11 sampled products, 3 are combined land-and-ocean products, 6 are land-only products, and 4 are ocean-only products. Furthermore, 6 aerosol products are retrieved from the twin MODIS-Terra and MODIS-Aqua sensors using the same set of 3 algorithms: the ocean algorithm is used for the retrievals over oceans and other large bodies of water, the land Dark Target (DT) algorithm is used over vegetated regions and other dark surfaces (Remer et al., 2005), and the land Deep Blue (DB) algorithm is used for deserts and barren lands (Hsu et al., 2004).
Although the results from the two MODIS sensors are expected to be very close, they might still differ due to the different times of scene observation during the day and other factors summarized in Remer et al. (2008).

The remainder of this section provides a brief description of the analyzed products and highlights some of the unique aerosol properties reported in these products. A more detailed overview can be found in the theoretical and validation works of the respective science teams of the products as cited below, while a general comparative overview of multiple products and retrieval algorithms is given in Kokhanovsky et al. (2007), Lee et al. (2009), Li et al. (2009), and Yu et al. (2006).

AERONET (http://aeronet.gsfc.nasa.gov) sun photometers measure aerosol properties using ground-based observations of solar direct and diffuse irradiances. In this work, the AERONET product used is the aerosol optical depth or thickness (AOD or AOT), which is retrieved from the AERONET direct measurements of solar irradiance. Since AERONET measurements are made from the ground looking up, they present a distinct advantage over spaceborne retrievals in that they are not affected by uncertainties associated with the effects of surface properties as much as satellite measurements are (Dubovik et al., 2002; Holben et al., 1998, 2001). Furthermore, the Level 2.0 AERONET data used in this work are carefully calibrated, cloud screened, and quality assured (Smirnov et al., 2000) and therefore are especially suitable for use as the reference standard against which the satellite aerosol remote-sensing data are evaluated.

The MODIS (http://modis.gsfc.nasa.gov) aerosol product (MOD04 and MYD04) comprises the column aerosol optical thickness and other physical properties of aerosols retrieved globally over land and ocean (Hsu et al., 2004; Ichoku et al., 2005; Levy et al., 2010; Remer, 2002; Remer et al., 2005).
The MISR (http://www-misr.jpl.nasa.gov) aerosol product (MIL2ASAE) features aerosol retrievals based on observations from 9 independent camera angles. Multiple viewing angles allow MISR to measure certain aerosol properties that are not available from the other instruments (e.g., aerosol particle size). Furthermore, the multiple MISR cameras enable retrievals under conditions that are unfavorable to single-view (e.g., nadir) instruments, such as over bright surfaces or sun glint, where the other instruments are unable to make reliable retrievals in the visible wavelengths (Kahn, 2005; Kahn et al., 2010a; Martonchik et al., 2009).

The OMI (http://www.knmi.nl/omi/research/instrument/index.php) aerosol product (OMAERUV) reports the near-UV (near ultraviolet) aerosol absorption and extinction optical depth, as well as single scattering albedo, among other aerosol properties (Torres et al., 1998, 2007). Moreover, OMI is capable of retrieving absorption optical depth in partially cloudy conditions that usually pose a challenge to other aerosol instruments.

The POLDER onboard PARASOL (http://www.icare.univ-lille1.fr/parasol) aerosol land product (P3L2TLGC) and aerosol ocean product (P3L2TOGC) are derived from measurements of the spectral, directional, and polarized properties of reflected solar radiation. One of the main features of the POLDER instrument is its use of the polarization properties of the measured radiation for retrieving anthropogenic aerosol optical depth (Bréon et al., 2002; Buriez et al., 2002; Deuzé et al., 1999, 2001; Herman et al., 1997). It is important to note that the POLDER operational algorithm retrieves AOD at 2 wavelengths (670 and 865 nm) over ocean and only at 1 wavelength (865 nm) over land.
Furthermore, over land, the POLDER algorithm retrieves only AOD that corresponds to polarized particles, i.e., mainly fine mode particles originating from anthropogenic activities.

The CALIOP (http://www-calipso.larc.nasa.gov) aerosol product (05kmALay) represents daytime and nighttime atmospheric curtain slices portraying the vertical distribution of aerosols and clouds in the atmosphere, including the density and certain properties of individual aerosol layers (Omar et al., 2009; Winker et al., 2007).

The SeaWiFS (http://disc.sci.gsfc.nasa.gov/dust/) aerosol product (SWDB) uses the Deep Blue algorithm to derive aerosol optical thickness and Ångström exponent. The key features of this product are the retrievals of aerosol properties over both bright desert and vegetated surfaces, avoidance of sun glint that improves aerosol retrievals over ocean, and a highly precise calibration of the SeaWiFS sensor (Hsu et al., 2004, 2012).

Since each of the foregoing data sets has a few versions because of the periodic revisions and updates of their retrieval algorithms over time, the data versions that were current at the time of writing this paper (January 2013) were sampled, although the study has been designed in a highly flexible way to enable rapid re-analysis as new versions become available. The ...

While the AERONET Level-2 data are manually inspected to be free of retrieval defects and anomalies (Smirnov et al., 2000), such an approach is not feasible for the voluminous spaceborne data. Instead, all Level-2 aerosol products analyzed in this paper assign to AOD pixels one or more quality assurance (QA) flags that indicate the degree of 'confidence' of the retrieval algorithms in their results. For MODIS and SeaWiFS, aerosol QA flags are integer numbers ranging from 0 to 3, with 3 representing the highest quality. For MISR and OMI data, the reverse is the case (i.e., 0 is the highest quality).
Finally, for POLDER and CALIOP, QA data are a combination of one or more flags, most of which are real numbers ranging between 0 and 1, where 1 indicates the highest quality. By means of these QA flags, certain aerosol retrievals are identified as 'bad quality' and are considered not trustworthy enough for certain analyses. Therefore, users of these aerosol products have been advised to choose data corresponding to a range of QA values that is most appropriate for their specific needs.

To establish similar yet valid QA thresholds for the analyzed products, we consulted the science teams of the analyzed products as well as data product validation results reported by these teams and other research groups. Based on this inquiry, we chose the acceptable QA values as described in Table 2. For the majority of the products, the thresholds are set by selecting a limited subset of the possible QA values. An important exception is the POLDER aerosol products, where the QA flags are expressed as real numbers between 0 ('bad retrieval') and 1 ('excellent retrieval'). Since there are no formal recommendations on the acceptable range of the flag values, we empirically set the threshold to ≥0.7, which selects data of a reasonable quality yet discards a minimal number of data pixels (see Fig. 3).

The ... Specifically, compared to screening individual pixels, QA mode filtering removes 50% more of the collocated data points and degrades the slope of the fitted regression line as a result of removing certain high-biased points. The opposite behavior can be observed in the collocated OMI AOD and AERONET AOD datasets, where screening by QA mode degrades R² of the collocated data when compared to screening individual pixels, although RMSE is still improved and the slope of the fitted regression line remains the same, since both screening approaches produce approximately the same number of OMI subset data points.
This observation indicates a certain inhomogeneity in the uncertainties that are present in the aerosol products, as in some cases high biases in individual pixels might overwhelm the statistics derived from the sample set. Therefore, to avoid such biases and ensure a fair comparison between the analyzed aerosol products, the rest of this study is based on the QA 'pre-filtering' approach, where individual pixels in a spatial sample are screened by their QA values before computing the statistics of this sample. This approach also closely models a typical use of the spaceborne aerosol data, where data users screen each pixel individually and do not consider QA values of its neighboring pixels. The impact of the described QA screening approach on data quantity can be seen in Table 3, which provides the sizes of the analyzed datasets before and after the screening. The impact differs considerably between products: the two MODIS ocean AOD datasets and the MISR AOD dataset retain almost all of their available data, whereas the two MODIS DT datasets retain only one-fourth of the complete collocated datasets.

It is important, however, to keep in mind that the QA values reported by the retrieval algorithms are to a large degree subjective to these algorithms and do not always reflect the actual quality of the retrievals. For example, in the absence of a proper aerosol or surface model, an algorithm can in certain cases use a wrong model to retrieve aerosol properties and mistakenly assign this retrieval a 'good' QA flag (e.g., Kahn et al., 2010a). Furthermore, a retrieval algorithm might not have enough skill or even the possibility to correctly recognize certain conditions that are unfavorable for aerosol retrieval, e.g., sub-pixel cloud contamination in OMI retrievals (Torres et al., 1998).
In yet another situation, an aerosol scene can be observed in only a portion of the available observation modes of a sensor, e.g., in only a few of the available observation directions in POLDER (Herman et al., 1997), which can lead to more confident yet less reliable results. The opposite case can also be true, where an algorithm correctly retrieves aerosol properties but is not confident about the retrieval. As an example, consider Fig. 5, which explores how QA screening degrades the statistics of OMI AOD and Aqua MODIS Deep Blue AOD datasets when compared to AERONET AOD over Djougou, Benin, as a result of assigning a 'bad' QA flag to sufficiently 'good' retrievals.
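The QA 'pre-filtering' approach adopted in this section, in which each pixel is screened by its own QA value before the sample statistics are computed, can be sketched as follows. The function name and the two acceptance rules shown are illustrative assumptions: the actual accepted QA values per product are those of Table 2, which is not reproduced here, and only the POLDER threshold of ≥0.7 is taken from the text.

```python
from statistics import mean


def prefiltered_mean(aod_values, qa_values, is_acceptable):
    """Mean AOD of a spatial sample after per-pixel QA screening.

    aod_values, qa_values: parallel sequences for the pixels in one sample
    is_acceptable: callable mapping a QA value to True (keep) or False (drop)
    Returns None when no pixel passes the screening.
    """
    kept = [aod for aod, qa in zip(aod_values, qa_values) if is_acceptable(qa)]
    return mean(kept) if kept else None


# Hypothetical rule for an integer 0-3 flag where 3 is the highest quality.
highest_integer_qa = lambda qa: qa == 3
# Rule for a real-valued 0-1 flag, using the threshold stated for POLDER.
polder_qa = lambda qa: qa >= 0.7

sample_mean = prefiltered_mean([0.1, 0.5, 0.3], [3, 0, 3], highest_integer_qa)
polder_mean = prefiltered_mean([0.2, 0.4], [0.9, 0.5], polder_qa)
```

Passing the acceptance rule as a callable keeps one code path for products whose flags run in opposite directions (e.g., 3 is best for MODIS, 0 is best for MISR) or are real-valued.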

5 Possible Data Outliers
Under rare circumstances, aerosol retrievals from spaceborne observations can produce aerosol properties that do not reflect the actual physical state of aerosol in the atmosphere. Some of the reasons for such retrievals were discussed in the previous section and include, but are not limited to, the lack of a proper aerosol model, incorrect assumptions about boundary conditions, and cloud contamination. In Fig. 6, the possible abnormal retrievals can be visually identified as points that have a minimal data density and lie abnormally far from the fitted regression lines. Even though the actual fraction of such data points in a complete collocated data set can be relatively minor, the extreme deviations of such points from the overall trend might significantly bias and misrepresent the overall statistics of the data. Therefore, when computing the overall statistics and inter-comparing the aerosol products, such data points should be treated as possible outliers and analyzed separately from the rest of the data.

In order to identify and separate the possible data outliers, we analyzed AOD residuals, i.e., the differences between spaceborne AOD and AERONET AOD observations, using the Modified Z-Score test (Iglewicz and Hoaglin, 1993; National Institute of Standards and Technology, 2012). This test is designed for detecting multiple outliers in approximately normal data sets and works by finding data points that differ from the mean value by more than 5 median absolute deviations. Unlike the standard deviation used in the traditional Z-Score test, the median absolute deviation in the Modified Z-Score test is calculated based on the median of the data and is less sensitive to extreme values.
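A minimal sketch of this outlier screening is given below. It follows the commonly published form of the Modified Z-Score, which centers the residuals on their median and scales by 0.6745/MAD; whether the cutoff of 5 in this study is applied to that scaled score or to raw multiples of the MAD is not stated in the text, so the default threshold here is an assumption, as are the function names.

```python
import statistics


def modified_z_scores(values):
    """Modified Z-Scores: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the score is undefined for this sample")
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(satellite_aod, aeronet_aod, threshold=5.0):
    """Flag collocated pairs whose AOD residual is a possible outlier.

    Residuals are satellite minus AERONET; a pair is flagged when the
    absolute Modified Z-Score of its residual exceeds the threshold.
    """
    residuals = [s - a for s, a in zip(satellite_aod, aeronet_aod)]
    return [abs(z) > threshold for z in modified_z_scores(residuals)]


# Hypothetical collocated sample: the last satellite value is far too high.
aeronet = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
satellite = [0.11, 0.18, 0.30, 0.43, 0.49, 1.60]
flags = flag_outliers(satellite, aeronet)
```

In this hypothetical sample only the last pair is flagged; the flagged pairs would then be set aside and inspected separately rather than discarded silently.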
It is pertinent to note that even though AOD data are known to follow the lognormal distribution (O'Neill et al., 2000), the AOD residuals of the analyzed products follow an approximately normal distribution as shown in Fig. 7 and Fig. 8, with the exception of the POLDER products that mostly underestimate AOD, because their retrievals focus on anthropogenic aerosols, and thus represent only the negative portion of the distribution. In the figures, it can be seen that the distributions have long tails, strongly indicating the presence of outliers. Furthermore, it can be observed that the slopes of the fitted lines are different from the slope of the 1:1 line. This indicates that the standard deviation of the analyzed residuals is different from 1, showing that these data do not follow the standard normal distribution, although this difference does not affect the test since the Modified Z-Score test normalizes residuals by the median absolute deviation of the data.

The overall effect of removing the possible outliers can be observed in the bottom-right sub-plots of Fig. 6, Fig. 7, and Fig. 8, as highlighted by the green frames, showing that 926 (6.9%) outliers are removed from the SeaWiFS Ocean AOD dataset. The total numbers of the removed outliers are provided in Table 3 and do not exceed 12% of the total QA-screened data for any of the datasets when considering the all-season data. The global distribution of the possible data outliers is depicted in Fig.
9 and generally corresponds to the outlier locations reported by the science teams of the aerosol products, e.g., outliers around coastal areas, where significant subpixel surface variations, shallow waters, sediments, and complex marine/inland aerosol mixtures complicate the retrievals, and outliers associated with uncertain retrievals by the MODIS and MISR algorithms in the Amazon basin and near the Sahara desert (Kahn et al., 2010a). However, a more detailed study is needed to determine the specific factors that lead to these outliers and their spatio-temporal distributions, in order to develop appropriate mitigation measures in the retrieval algorithms for each of the products. In the remainder of this paper, the reported results are based on the QA-screened data with the outliers removed.

6 Analysis

The overall data distribution for the analyzed spaceborne aerosol products is presented in Fig. 6, whereas the detailed linear regression fit statistics (Fox, 1997) for the products, based on the treatment of the possible data outliers and the nominal delimiters of the four boreal seasons, namely spring (March-May), summer (June-August), autumn (September-November), and winter (December-February), are listed in Table 3. The statistics are presented for seasonal timeframes rather than monthly or shorter time periods because there may not be sufficient coincident data for a scatter plot over such shorter time periods, due to the infrequency of satellite aerosol retrieval caused by cloud cover and other issues. Fortunately, many climatic events that are relevant to aerosol emission, transport, and distribution are often roughly aligned with these seasons.
In the presented statistics, the slope value indicates by how much the satellite retrieval of the parameter under consideration is relatively underestimated or overestimated across different magnitudes, depending on whether the slope value is less than or greater than unity. The offset parameter indicates the extent to which the satellite retrieval is biased. The squared linear correlation coefficient (R²) indicates how consistent the parameter retrieval is across its magnitude range, that is, how tightly the points are aligned close to the 1-to-1 line. Finally, the root mean square error (RMSE) indicates the accuracy of the retrievals measured as the average error in the spaceborne retrievals as compared to the ground-based AERONET retrievals.

In Table 3, it can be seen that all MODIS, MISR, and SeaWiFS aerosol products correlate to AERONET observations with R²≥0.6. Furthermore, MISR, SeaWiFS Land, and MODIS Dark Target products have R²≥0.7 and MODIS Ocean products have R²≥0.8. Also, once the ... Table 3. While all of the products demonstrate high seasonal variations in the statistical parameters, the OMI, CALIOP, POLDER, SeaWiFS, and MODIS Deep Blue products are the most sensitive to the seasonal changes in the retrieval conditions, perhaps because of the uncertainties associated with cloud screening, although collocating spaceborne observations with AERONET introduces a certain bias towards cloud-free scenes because of the comprehensive AERONET cloud screening procedures (Smirnov et al., 2000). Furthermore, it can be seen that while removing the data outliers greatly reduces the RMSE and removes sensitivity to the seasonal changes in CALIOP, POLDER, and SeaWiFS, the sensitivity remains the same for OMI and MODIS Deep Blue, indicating that the retrieval errors reflected by the RMSE of these products likely stem from the regular retrievals rather than the anomalous retrievals.
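The four fit statistics used in this section (slope, offset, R², and RMSE) can be computed from a set of collocated pairs as sketched below. This is an ordinary least-squares illustration with hypothetical names and data, not the code behind Table 3; following the description above, the RMSE is taken directly between satellite and AERONET values rather than about the fitted line.

```python
import math


def fit_statistics(aeronet_aod, satellite_aod):
    """Slope, offset, R^2 of satellite vs. AERONET AOD, and RMSE."""
    n = len(aeronet_aod)
    mean_x = sum(aeronet_aod) / n
    mean_y = sum(satellite_aod) / n
    sxx = sum((x - mean_x) ** 2 for x in aeronet_aod)
    syy = sum((y - mean_y) ** 2 for y in satellite_aod)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(aeronet_aod, satellite_aod))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    # Error of the satellite retrieval relative to the AERONET reference.
    rmse = math.sqrt(sum((y - x) ** 2
                         for x, y in zip(aeronet_aod, satellite_aod)) / n)
    return {"slope": slope, "offset": offset, "r2": r_squared, "rmse": rmse}


# Hypothetical sample: the satellite product is biased high by 0.05.
stats = fit_statistics([0.1, 0.2, 0.3, 0.4], [0.15, 0.25, 0.35, 0.45])
```

The hypothetical sample shows why the statistics have to be read together: a constant bias leaves the slope at 1 and R² at 1 while the offset and the RMSE both equal 0.05.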
The accuracy of the spaceborne aerosol products might vary with the location of the retrieval and, depending on the location, some products might be significantly more accurate than others. The spatial dependence of the accuracy of the analyzed products is explored in Fig. 11 and Fig. 12, where it can be observed that no single sensor provides the best retrievals at all sites. Furthermore, as indicated by the lighter shading in Fig. 11 (e.g., Southern Australia) and also in the histogram of R² inset in this figure, some sites are not covered by high-correlation (i.e., R²≥0.75) retrievals at all or have no collocated retrievals from the most accurate of the products.

Furthermore, it can be observed that the best-performing aerosol products differ between Fig. 11 and Fig. 12, and the products providing the best RMSE are oftentimes those with the lower R². Therefore, when choosing an aerosol product for a specific analysis goal and a specific region, it is necessary to consider a balance between a variety of seasonal, statistical, and spatial factors.

7 Accuracy of aerosol data products based on land cover type

Aerosol properties are derived from satellite observations based on a set of assumptions about the type and the optical properties of the underlying terrestrial surfaces. Therefore, it can be beneficial to compare the accuracy of the considered aerosol data products based on the land cover types of the sites over which the data subsets were extracted. As a reference for land cover types and their spatial extent, we used the global data set that is based on the International Geosphere-Biosphere Programme (IGBP) classification scheme and is available from the suite of MODIS products (Friedl et al., 2002). For each land cover type, we identified coincident AERONET stations and averaged their corresponding statistical results from Section 6.
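The per-land-cover averaging described above can be sketched as a simple grouping step. The sketch assumes an unweighted average over stations, which the text does not specify, and the station identifiers, land cover labels, and statistic values are hypothetical.

```python
from collections import defaultdict


def average_by_land_cover(station_stats, station_cover):
    """Average per-station (R^2, RMSE) pairs within each land cover type.

    station_stats: dict mapping station id -> (r2, rmse) for one product
    station_cover: dict mapping station id -> land cover label
    Stations without a land cover label are skipped.
    """
    grouped = defaultdict(list)
    for station, stats in station_stats.items():
        cover = station_cover.get(station)
        if cover is not None:
            grouped[cover].append(stats)
    return {
        cover: (
            sum(r2 for r2, _ in rows) / len(rows),
            sum(rmse for _, rmse in rows) / len(rows),
        )
        for cover, rows in grouped.items()
    }


# Hypothetical inputs for a single aerosol product.
stats_by_station = {"site_a": (0.8, 0.10), "site_b": (0.6, 0.20),
                    "site_c": (0.9, 0.05)}
cover_by_station = {"site_a": "savannas", "site_b": "savannas",
                    "site_c": "mixed forests"}
by_cover = average_by_land_cover(stats_by_station, cover_by_station)
```

An unweighted average gives a station with few collocated points the same influence as a data-rich one, which is worth keeping in mind when a land cover type is represented by only one or two sites.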
Tables 4 and 5 ... and SeaWiFS demonstrate the best results with R²≈0.7. Furthermore, the POLDER Ocean data set has a good RMSE=0.08 (Deuzé et al., 1999) that is comparable to the best performing sensors in this region, although it has a relatively low squared correlation coefficient value of R²=0.55; note that these statistics are different from the POLDER Ocean statistics in Fig. 6, which analyzes a more complete set of AERONET stations. It is interesting to note that the correlation between AERONET and Aqua MODIS AOD with R²=0.8 is higher than the correlation between AERONET and Terra MODIS AOD with R²=0.74. A detailed inspection of the data showed that this difference stems from several AERONET sites with relatively small numbers of collocated data points (N<35) and an average AOD below 0.2. Under such low-AOD conditions, the MODIS Ocean algorithm has difficulty in retrieving precise AOD values and, as a result, is subject to an increased rate of errors (Remer, 2002).

Evergreen broadleaf forest regions provide conditions that are favorable for retrieving AOD, and multiple sensors demonstrate high correlation with AERONET, including MODIS Dark Target with R²=0.85, MISR with R²=0.89, SeaWiFS with R²=0.94, and POLDER with R²=0.7. However, since these regions are also susceptible to complex smoke events (e.g., Ji Parana, Brazil), sometimes combined with dust and pollution events (e.g., Anmyon, S. Korea, and Hong Kong, China), most of the sensors demonstrate a rather poor RMSE. The important exception is the POLDER dataset, which has RMSE=0.07, possibly because POLDER is especially sensitive to small particles produced by biomass burning and anthropogenic pollution sources (Fan et al., 2008), thereby retrieving fairly accurate AOD values at Ji Parana and Lulin, Taiwan.
It should also be noted that, together with deciduous broadleaf forests and savannas, evergreen broadleaf forest is one of the 3 land cover types where POLDER demonstrates very good results with R²≈0.7, indicating the advantage of polarization measurements for aerosol retrievals over these regions.

For mixed forests, MODIS Dark Target products provide the highest retrieval accuracy with R²=0.78 for Terra and 0.82 for Aqua, while MISR data are somewhat less accurate with R²=0.7 as a result of underestimating high AODs during summertime biomass burning events (Kahn et al., 2010b), although the MISR RMSE of 0.04 is almost a factor of two better than the RMSE of 0.08 for Terra MODIS and 0.07 for Aqua MODIS. Sufficiently reliable aerosol data are also retrieved by SeaWiFS with R²=0.69 and by POLDER with R²=0.65.

For closed shrubland, CALIOP with R²=0.88, MISR with R²=0.81, and MODIS Deep Blue with R²=0.74 for Terra and R²=0.85 for Aqua produce the best results. Although MODIS Deep Blue shows a better performance than MODIS Dark Target for this land cover type, the Deep Blue products are retrieved only over a single Lake Argyle AERONET site in northern Australia, whereas Dark Target products are retrieved over 7 sites and have a significantly larger number of data points. Likewise, the best result demonstrated by CALIOP also originates exclusively from the Lake Argyle retrievals.
The difference of 0.1 in R² between MODIS Terra Deep Blue and MODIS Aqua Deep Blue can be partly explained by the difference in the data availability of these two data sets, as MODIS Terra Deep Blue at the time of this work is available only through 2007; this effect can also be observed for several other land cover types, where MODIS Aqua Deep Blue tends to have a lower correlation to AERONET and produces results that are closer to the results of SeaWiFS, probably because the latter is also based on the Deep Blue retrieval algorithm (Hsu et al., 2006).

Over wooded savannas, both the Dark Target and Deep Blue products from MODIS, as well as SeaWiFS, produce very good results with R²≈0.85. MISR with R²=0.63 and OMI with R²=0.66 produce lower, but still reasonable results. The reduced performance of MISR in this region can be explained by the lack of region-specific aerosol mixtures in its retrieval algorithm, a situation that is expected to be improved in future revisions of the product (Kahn et al., 2009). It should also be noted that this region yields the highest correlation between OMI and AERONET observations, probably as a result of favorable cloud-free conditions in sub-Saharan Africa (Ahn et al., 2008; Torres et al., 2007).

Open shrublands are very dry and sparsely vegetated regions that are characterized by bright surfaces. Such regions present a great challenge for remote retrieval of aerosol properties, and none of the analyzed products exceeded a correlation coefficient of 0.7. Among the best-performing products, CALIOP produced the best results with R²=0.68, closely followed by MODIS Dark Target ...

... MISR was the only highly accurate aerosol product with R²=0.83 for snow/ice and R²=0.78 for barren lands, thanks to its multi-angle measurement capabilities that allow retrieving aerosol properties over bright surfaces and enable advanced cloud and ice detection capabilities.
8 Conclusions

In this paper, we analyzed and intercompared 11 spaceborne aerosol products from the MODIS, MISR, OMI, SeaWiFS, POLDER, and CALIOP sensors. The products were sampled fairly uniformly using the MAPSS framework, which was used to collocate these spaceborne observations with ground-based AERONET observations over the period from 2006-06-07 to 2010-12-11, when all the sensors were operational. Based on this analysis, for each of the AERONET stations, we identified the products providing the best squared correlation coefficient (R²) and the best root mean square error (RMSE). It was found that no single product provides the best retrieval over all sites, and certain sites are not covered by accurate retrievals at all. Furthermore, it was observed that a product providing the best R² at a certain location does not always provide the best RMSE at the same location. Therefore, to facilitate the multivariate analysis that is necessary when choosing the most suitable spaceborne aerosol product for a specific region, we plan to develop an interactive tool that would allow exploration of the multi-sensor collocated data on an interactive map.

Further, a statistical approach based on the Modified Z-Score test has been used to automatically identify possible outliers in the collocated data sets. The reported analysis shows that even though such atypical data points constitute a relatively minor portion (3% to 12%) of the analyzed data sets, they can significantly bias the results of the statistical analysis. For this reason, it is suggested that such data points be set aside when analyzing collocated data sets and inspected separately.

Finally, we assessed the accuracy of the spaceborne aerosol products based on the IGBP land cover classification scheme.
This analysis identified the sensors that retrieve the most accurate aerosol properties over each of the defined land cover types and highlighted the differences between the sensors that provide an advantage or disadvantage in retrieving AOD over areas of a particular land cover type. Notably, some of the land cover types, including open shrublands and grasslands, had only moderately accurate retrievals, indicating the need for improved spaceborne aerosol remote sensing instrumentation/approaches and/or [...]

[...] "Coherent uncertainty analysis of aerosol data products from multiple satellites". We thank the science and support teams of MODIS, MISR, OMI, POLDER, CALIOP, SeaWiFS, and AERONET for retrieving and making available their respective aerosol products, as well as for providing assistance during the development of MAPSS sampling for these products. Specifically, we are grateful to certain individual members of the aerosol product teams for their insight and willingness to answer various questions related to their respective products, namely: AERONET (Brent Holben, Thomas Eck, Oleg Dubovik, Alexander [...] Hsu, Andrew Sayer), and POLDER (Didier Tanre, Jacques Descloitres, Fabrice Ducos). We also give special thanks to the PIs of the global AERONET sites and their staff for establishing and maintaining these sites. Finally, we would like to honor the memory of our colleague, Gregory Leptoukh, who passed away suddenly in January 2012. Our long-term collaboration with him resulted in the implementation of the MAPSS framework on the GIOVANNI data analysis system, and he was part of the initial discussions of the ideas that led to this study.

Table 1. Ground-based and spaceborne atmospheric aerosol products analyzed in the study.
In the product designation titles, 'O' at the end of a product title signifies ocean retrievals, 'L' land retrievals, 'DT' land retrievals using the MODIS Dark Target algorithm, and 'DB' land retrievals using the MODIS Deep Blue algorithm. The AERONET AOD retrievals were interpolated or extrapolated to the studied wavelengths of the spaceborne sensors. The indicated local equatorial crossing times are based on the original orbital designs and can change during the lifetimes of the satellites. The SeaWiFS mission ended in [...]

Table 2. Studied aerosol data sets, the matching data quality (QA) data sets, and the corresponding QA data screening criteria. Where provided, numbers in parentheses in the middle column indicate the base-1 layer index, base-0 bit number, and number of bits extracted from this QA data set. For MODIS, MISR, OMI, and SeaWiFS, the QA values are integers between 0 and 3, where for MODIS and SeaWiFS larger numbers indicate a better retrieval quality and for OMI and MISR the opposite is true. For POLDER, QA is a real number between 0 (worst) and 1 (best). For CALIOP, the QA condition is applied to all layers found in a column; the whole column is rejected if at least one layer fails the test. The listed extinction QC values indicate retrievals that are unconstrained, constrained, have a reduced lidar ratio, or detected an opaque aerosol layer. CAD score and layer type and subtype flags indicate retrievals that classified a layer with high confidence as containing aerosol and were able to determine the aerosol type. The IAB condition is set to prevent the retrieval anomaly of overcorrecting the attenuation of overlying layers (Kittaka et al., 2011).

Aerosol Optical Thickness at 670 nm; Quality index for the inversion; QA ≥ 0.7

Table 3. Statistics of the studied aerosol data sets based on all AERONET stations during the period from 2006-06-07 to 2010-12-11.
'Ntot' indicates the total number of collocated spaceborne AOD and AERONET AOD data points, while 'Nfilt' indicates the number of data points after filtering (screening) the spaceborne data by QA as described in Section 4 and Table 2. 'Nout' is the total number of possible data outliers determined as explained in Section 5. The last 8 columns present the statistics on the collocated data based on the regression fits also plotted in Fig. 6.

Table 4. Linear fit correlation coefficient (R²) between the collocated spaceborne and ground-based observations of AOD estimated at the stations that coincide with different IGBP land cover types. Empty cells indicate that no collocated data are available from a specific sensor over a specific land cover type. No AERONET stations are available in areas occupied by deciduous needleleaf forest. The statistics were calculated based on data that were pre-filtered by QA and screened of outliers as described in Sections 4 and 5. A graphical representation of this table is in Figure 13.

Table 5. Root mean square error (RMSE) between the collocated spaceborne and ground-based observations of AOD estimated at the stations that coincide with different IGBP land cover types. Empty cells indicate that no collocated data are available from a specific sensor over a specific land cover type. No AERONET stations are available in areas occupied by deciduous needleleaf forest. The statistics were calculated based on data that were pre-filtered by QA and screened of outliers as described in Sections 4 and 5. A graphical representation of this table is in Figure 14.

Figure 11. Spaceborne datasets with the best correlation (R²) of the retrieved AOD to the AOD measured by inland (top) and coastal or island-based (bottom) AERONET sites. The intensity of marker shading indicates the degree of correlation.
Marker shape indicates the range of root mean square error (RMSE) associated with the displayed best R². Finally, marker size corresponds to the number of collocated data points used to compute the displayed statistics. Histograms in the bottom insets highlight the distribution of these statistics over all sites based on bins of 0.05 AOD. The statistics were calculated based on data that were pre-filtered by QA and screened of outliers as described in Sections 4 and 5.

Figure 12. Spaceborne datasets with the best root mean square error (RMSE) of the retrieved AOD to the AOD measured by inland (top) and coastal or island-based (bottom) AERONET sites. The symbols used are the same as the symbols in Figure 7. The statistics were calculated based on data that were pre-filtered by QA and screened of outliers as described in Sections 4 and 5.

Figure 14. Land cover type dependence of root mean square error (RMSE) between the collocated spaceborne and ground-based (AERONET) observations of AOD. Areas corresponding to each IGBP land cover type (bottom right inset) are colored based on the average of the data from those AERONET sites that reside in these areas. The statistics were calculated based on data that were pre-filtered by QA and screened of outliers as described in Sections 4 and 5.
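The outlier screening referred to in the captions above is based on the Modified Z-Score test. A minimal sketch of that test follows, using its common formulation (median and median absolute deviation, scaled by 0.6745, with a conventional cutoff of 3.5). Two points are assumptions rather than details stated in this section: that the test is applied to the differences between satellite and AERONET AOD, and that the cutoff is 3.5. The function names are hypothetical.

```python
import numpy as np


def modified_z_scores(x):
    """Modified Z-Scores: 0.6745 * (x - median) / MAD."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        # Degenerate case (at least half the values equal the median):
        # the score is undefined, so this sketch flags nothing.
        return np.zeros_like(x)
    return 0.6745 * (x - med) / mad


def split_outliers(sat, ref, threshold=3.5):
    """Return boolean masks (keep, outlier) for collocated AOD pairs.

    Assumption: the test is applied to the differences sat - ref, and
    3.5 is the conventional cutoff, not a value confirmed by the paper.
    """
    diff = np.asarray(sat, dtype=float) - np.asarray(ref, dtype=float)
    is_outlier = np.abs(modified_z_scores(diff)) > threshold
    return ~is_outlier, is_outlier
```

Returning both masks, rather than dropping the flagged points, follows the suggestion in the conclusions that such points be set aside and inspected separately.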