Inter-comparison of three AATSR Level 2 ( L 2 ) AOD products over China

The Advanced Along-Track Scanning Radiometer (AATSR) aboard on ENVISAT is used to observe the Earth by dual-view. The AATSR data can be used to retrieve aerosol optical depth (AOD) over both land and ocean, which is an 15 important merit in the characterization of aerosol properties. In recent years, aerosol retrieval algorithms both over land and ocean have been developed, taking advantages of the feature of dual-view which can help eliminate contribution of Earth’s surface to top of atmosphere (TOA) reflectance. Aerosol_cci project as a part of Climate Change Initiative (CCI) provides users three AOD retrieval algorithms for AATSR data, including the Swansea algorithm (SU), the ATSR-2ATSR dual view aerosol retrieval algorithm (ADV) and the Oxford-RAL Retrieval of Aerosol and Cloud algorithm (ORAC). The Validation 20 team of Aerosol-CCI project has validated AOD (both Level 2 and Level 3 products) and AE (Level 2 product only) against the AERONET data in a round robin evaluation using validation tool of AeroCOM (Aerosol Comparison between Observations and Models) project. For the purpose of evaluating different performances of these three algorithms on calculating AODs over mainland China, we introduce ground-based data from the CARSNET (the China Aerosol Remote Sensing Network) which is designed for aerosol observation in China. Because China is vast in territory and of great differences 25 in surface, the combination of the AEROENT and the CATRNET data can validate L2 AOD products more comprehensively. The validation results show different performances of these products in 2007, 2008 and 2010. The SU algorithm has very good performance over sites with different surface conditions in mainland China from March to October, but it underestimates AOD slightly with varying mean bias error (MBE) from 0.05 to 0.10 over surface of barren or sparsely vegetation in western China. The ADV product has same precision with high correlation coefficient (CC) larger than 0.90 over most of sites and same error 30 distribution as the SU product. The main limits of ADV algorithm are underestimation and applicability, especially it occurs obvious underestimation over sites of Datong, Lanzhou and Urmuchi where the dominated land cover is grassland with MBE larger than 0.2 and the main source of aerosol is coal combustion and dust. The ORAC algorithm has the ability of retrieving Atmos. Chem. Phys. Discuss., doi:10.5194/acp-2016-195, 2016 Manuscript under review for journal Atmos. Chem. Phys. Published: 16 March 2016 c © Author(s) 2016. CC-BY 3.0 License.


Introduction
The Pan-Eurasian Experiment (PEEX) is a multidisciplinary, multiscale and multicomponent research, research infrastructure and capacity-building program (Kulmala et al. 2015).One of the strategically most important task of PEEX is to filling the observational gap in atmospheric in-situ data in the Siberian and Far East regions and start the process towards standardized and harmonized data procedures (Kulmala et al. 2011).Aerosols play a major role in Earth's climate system, including intervening in the radiation budget and cloud processes, and affecting air quality and human health (Remer et al., 2005;Samet et al., 2000;Tzanis and Varotsos, 2008;Kokhanovsky and de Leeuw, 2009).The particles suspended in the troposphere scatter solar radiation back to cool the atmosphere or absorb solar radiation, which warms the atmosphere, causing changes in the net effect of aerosols.These particles could also affect the formation and microphysical properties of clouds as cloud condensation nuclei (Andreae and Rosenfeld, 2008).The source of aerosols could be anthropogenic or natural (Varotsos et al. 2012).
Particles from different sources are mixed into aerosol masses to influence AOD, reduce visibility (Kinne et al., 2003;Varotsos 2005;Remer et al., 2005) and cause spatial and temporal variability of AOD; therefore, the largest uncertainties in the estimation of radiative forcing are introduced by aerosols (IPCC, 2013).
Over the past 35 years, different types of satellites have been used to obtain atmospheric information, especially aerosol properties (Griggs, 1979;Kokhanovsky and de Leeuw, 2009).Remote sensing provides a means to obtain global and longterm observations of aerosols, especially in the widest ocean and remote regions where ground-based stations cannot be constructed.In addition, polar-orbiting satellites and geostationary satellites obtain daily global images, which helps to capture changes in aerosol patterns and properties (Prins et al., 1998;Torres et al., 2002).There are, however, many difficulties in observing aerosols by satellites because depending on the surface properties, the contribution to the signal received by the satellite can vary drastically; aerosol components and concentrations are constantly varying, and their sources cannot be precisely determined (Levy et al., 2007).
The Advanced Along-Track Scanning Radiometer (AATSR) aboard ENVISAT is used to observe the Earth in dual-view, of which one is nadir direction and the other is forward direction with a viewing angle of 55° from nadir view.The AATSR was designed to have seven spectral channels at wavelength of 0.55, 0.67, 0.87, 1.63, 10.7 and 12 μm.The nadir spatial resolution is 1 km 1 km with a swath width of 512 pixels.Furthermore, the AATSR instrument equipped two calibration targets, blackbody calibration target for thermal channels and opal visible calibration target for visible and near-Infrared channels, aiming to implement self-calibration.The data from AATSR can be used to retrieve AOD both over land and ocean, which is important for the characterization of aerosol properties (Adhikary et al., 2008).In recent years, some aerosol retrieval algorithms have been established both over land and ocean, taking advantage of the features of dual-view, which can help eliminate the contribution of surface to top of atmosphere (TOA) reflectance.Aerosol_CCI, as part of the Climate Change Initiative (CCI) (http://www.esa-aerosol-cci.org/),provides users with three algorithms for AATSR data, including the Swansea algorithm (SU) (Bevan et al. 2012), the ATSR-2/AATSR dual-view aerosol retrieval algorithm (ADV) (Kolmonen et al. 2015) and the Oxford-RAL Retrieval of Aerosol and Cloud algorithm (ORAC) (Thomas et al. 2009).The aim of this work is to evaluate different performances of these algorithms in calculating AOD over different regions of China in 2007China in , 2008China in and 2010. .A ground-based sun-photometer has been used to take sun and sky measurements directly (Holben et al., 1998).The Aerosol Robotic NETwork (AERONET) has constructed hundreds of sites all over the world as of 2015.These stations, operated by the American National Aeronautics and Space Administration (NASA), are operational worldwide, providing multi-spectral channel validation data for satellite-retrieved data to complete synthetic measurements on a global scale.
The China Aerosol Remote Sensing Network (CARSNET) is a ground-based aerosol monitoring system that uses CE-318 sunphotometers, similar to AERONET, and has constructed 37 sites throughout China (Che et al., 2009).It has been validated that CARSNET AOD measurements are approximately 0.03, 0.01, 0.01 and 0.01 larger than measurements of AERONET at the 1020, 870, 670 and 440 nm channels, respectively (Che et al., 2009).In this paper, we combine two aerosol observation datasets from AERONET and CARSNET as reference data to validate these three AATSR AOD products over China more comprehensively.
The basic method for assessment is to compare the retrieval results with data (AOD mainly) obtained by AERONET/CARSNET.However, this direct comparison of retrieval results with AERONET data is limited due to different cloud removalcloud screening processes (de Leeuw et al., 2013), and such a limitation could influence the validation reliability to some extent.To make the validation more reliable, comparison of the retrieval results with high quality data from MODIS or MISR is also one effective method for validation (Kahn et al., 2009).However, AERONET or other ground-based networks provide accurate measurements without the influence of land surface reflection (Holben et al., 1998), which means that comparison of retrieved AOD with ground-based measurements is the basic method.The AATSR L2 products provided by Aerosol_CCI have been validated by the validation team via a round robin (RR) test (de Leeuw et al., 2013).On this basis, we focused on assessing the performances of AATSR aerosol L2 products in mainland China by comparing the retrieval results with AERONET and CARSNET data.

Reference data and validation statistics
AOD is the most important parameter in terms of aerosol properties and is different from other retrieved parameters under the project of Aerosol_CCI.The Aerosol_CCI project adopts three aerosol retrieval algorithms for ATSR-2/AATSR instrument, including Swansea algorithm (SU) (Bevan et al. 2012), the ATSR-2/AATSR dual view aerosol retrieval algorithm (ADV) (Kolmonen et al. 2015) and the Oxford-RAL Retrieval of Aerosol and Cloud algorithm (ORAC) (Thomas et al. 2009b).All of these three algorithms have ability in retrieval of aerosol properties both over land and ocean.ADV algorithm was originally developed for retrieving AOD properties over land at wavelength of 0.555, 0.659 and 1.61 μm (Veefkind et al. 1998).The main advantage of ADV is the introduction of k-ratio approach to eliminate contribution of reflection to TOA reflectance, which uses the ratio of the reflectance measured in the forward and nadir views (Flowerdew and Haigh, 1995).The ORAC algorithm is designed to retrieve AOD properties at each of four AATSR short-waves channels both over land and ocean, including AOD, effective radius and surface reflectance.The build of the forward model used in ORAC algorithm is based on radiative transfer code -DISORT.A parameterized model of surface reflectance distribution is used in retrieval and combines with the AATSR dual-view to make up shortage of the need of a priori of reflectance (North et al. 1999).An iterative optimization method is employed to determinate AOD, aerosol type and surface reflectance.AATSR L2 data (see Tab. 1) are daily products with a spatial resolution of 10 10 , and contain a quality flag or a level of confidence for each pixel (de Leeuw et al., 2013).Compared to the Level 3 (L3) product with a spatial resolution of 1° 1°, daily L2 data have higher spatial resolution, which helps to capture greater detail of aerosol properties and is further explored in our follow-up study.It has been demonstrated that the ground-based observation data from the AERONET have the ability and precision to be used as reference data when users validate AOD (Holben et al., 1998).There are eight AERONET sites in mainland China providing AOD measurement data are available on the website.However, most of these sites are distributed in the eastern China coastal area, as shown in Fig. 1, which, however, does not meet the requirements of comprehensively validating the aerosol properties over all of China.Substantial hazardous aerosol pollution affects most regions of northern (Li, 2014) and eastern China in winter, and heavy dust aerosols from the Taklimakan desert in western China can be transported long distances to eastern 5 China, even to Japan (Takahashi, 2011), resulting in regional differences.

6
The measurements from another network, the CARSNET, equipped with calibrated CE-318 instruments, have the same accuracy as AERONET.The CARSNET has more sites than the AERONET in mainland China, and the spatial distribution of the CARSNET sites is distributed more evenly.Therefore, for the purpose of assessing different performances of these three AATSR L2 AOD products, we selected ground-based measurements from both of these two networks as reference data.
The AERONET provides AOD data at three data quality levels: Level 1.0 (unscreened), Level 1.5 (cloud-screened), and Level 2.0 (L2) (cloud-screened and quality-assured) (http://aeronet.gsfc.nasa.gov/new_web/index.html).Here, we selected AERONET L2 data that are screened and quality-assured.Because both the AERONET and CARSNET data are AATSR products without band-effective wavelengths, we interpolated the ground-based data to the 550-nm wavelength.The AOD of the L2 datasets were compared with AERONET&CARSNET observation data using scatter plots and linear-regression of the data.The comparisons were made for collocated satellite and ground-based observations (Ichoku et al., 2002), i.e., AOD pixels were selected within a spatial extent of +/−50 25 km of ground-based stations and a time range of +/− 30 min of the AATSR overpass from the ground-based measurements.At least 5 five AATSR AOD retrievals and two2 AERONET/CARSNET observations are required in each collocation (Levy et al., 2010).
We conducted collocations according to year (2007, 2008 and 2010) and dataset (ADV, ORAC and SU).In total, 20 groundbased observation sites, including 12 AERONET sites and 8 CARSNET sites, were in the Chinese territory in 2007, of which 6 six AERONET and 8 eight CARSNET inland sites were selected.For 2008, we selected 8 AERONET and 24 CARSNET inland sites, for a total of 32 sites, ignoring the island sites and those near the shoreline.For 2010, only 6 six CARSNET sites are available for us, and a total of 14 inland sites were selected with eight8 AERONET inland sites (see Table 2).

Statistical Metrics
Collocated pairs are analysed using statistical methods.Bias describes the average difference between satellite retrievals and ground AOD.Then, to determine how well the satellite data match the ground-based observation data, the relationship between them is explored.Some basic statistics are shown on the scatter plot, including the root mean square error (RMSE): where τ , represents the ground-based observation data and τ represents the satellite retrievals.Mean satellite-retrieved AOD (MSA) and mean AERONET&CARSNET AOD (MAA) represent the central tendency of the data.Relative mean bias (RMB) is used to determine under-or overestimation of the AOD retrievals; it is the ratio of MSA to MAA: Mean bias error (MBE) is the mean difference between the satellite retrievals and AATSR AODs, and the mean absolute error (MAE) is the absolute value of the mean bias error.Together with RMB, the MBE and MAE are used to determine the magnitude of the difference between the two datasets.

KAPPA Statistics
In the scatter plot of the collocated pairs, the retrieved data and the corresponding collocated ground-based observation data could be considered as two arrays, and the main purpose of KAPPA is to explore how these two arrays match each other.For retrieval of aerosol properties, the performances of most algorithms decrease in effectiveness with increasing AOD, i.e., difficulties in retrieving AOD will be increased as AOD increases.The KAPPA coefficient was originally proposed as a descriptive statistic indicating the degree of beyond-chance agreement between two ratings per subject in a dichotomous form (Bloch and Kraaemer, 1989).KAPPA coefficients with various forms also could be used to measure the accuracy of thematic classifications (Rosenfield and Fitzpatrick-Lins, 1986).KAPPA is, in short, a measure of "true" agreement (Cohen, 1960).The pairs collocated by matching ground-based data with AATSR L2 AOD data could be regarded as two different arrays so that we introduced the KAPPA coefficient to assess agreement between these two arrays.Based on the concept of the KAPPA coefficient proposed by Cohen (1960), an appropriate modification with a two-category nominal scale is shown in Table 3. | is greater than 0.2 (according to EE4), then it is marked as "lowly relevancefar from truth", and if not, it is marked as "highly relevanceclose to truth".For the conventional formula of calculating the KAPPA coefficient: where P o is the proportion of observed agreement and P is the proportion of chance agreement. (4) (5) Algorithms for AATSR AOD retrieval used to underestimate AOD over different regions in China include the ADV ORAC and SU algorithms.On this basis, the agreement between ground-based observation data and satellite retrievals is assessed based on the ADV and SU algorithms (Che et al., 2015).The main aim of this new KAPPA coefficient is to evaluate the comprehensive performance of these algorithms.Its function is to represent not only the degree of underestimation but also the level of agreement between different datasets.

Validation results and analysis
We collected different validation reference data of AERONET and CASNET in 2007, 2008and 2010.Only 14 ground-based 5 observation sites are available in 2007, of which some are located close to each other.Most are located in different provinces; however, the total number of sites is small and the space distribution is not uniform.Therefore, the number of matches is relatively small for all of the algorithms.More AERONET/CARSNET data are available in 2008, with a total of 32 sites including 8 AERONET sites and 24 CARSNET sites.There are 14 AERO&CARS sites providing data for validation in 2010.The focus of this paper is to determine the differences between the ADV, ORAC, and SU L2 AOD products (see Tab. 4 10 and Figure 1).Therefore, we calculated statistics and analysed the validation results separately by year (see Tab. 4 and Figure 1).

The ORAC algorithm
The ORAC algorithm performed well for 2007, achieving a KAPPA coefficient of 0.474.However, the distribution of matches is dispersed implying high RMSE (0.206).In terms of the degree of fitness, its performance is not effective.However, there is no obvious trend of underestimation or overestimation, and accidental errors influence the accuracy of the ORAC algorithm.
The MSA of the ORAC is 0.324.ORAC has the most matches of the three algorithms.Different from 2008, no obvious underestimation occurs in the results of 2007 and 2010.For 2008, the RMB is 0.829, suggesting a slight underestimation trend.
The applicability of ORAC is high, with MSA of 0.271.The collocated pairs are relatively dispersed, influencing the RMSE.
For 2010, the same dispersion of points in the scatter plot and low KAPPA coefficient are observed.Overall, the ORAC algorithm tends to retrieve AODs unstably for either high AODs or low AODs and with slight underestimation in 2007.The results of 2008 and 2010 share common features, indicating that accidental error is larger than systematic error.

The SU algorithm
The SU algorithm performed well for all three years, achieving KAPPA coefficients of 0.409, 0.484 and 0.520, respectively.
The RMBs are 0.816, 0.713 and 0.720 for 2007, 2008 and 2010, respectively, demonstrating the underestimation of the SU product.The applicability of SU is high, with MSA of 0.293 for 2008.The most obvious feature of the SU algorithm is its stability in retrieving AOD for different years or different regions (Fig. 4).The MSA ranges from 0.270 for 2010 to 0.330 for 2007, and the KAPPA coefficient is from 0.520 to 0.409, which suggests that the SU algorithm performed better in retrieving low AODs.The SU algorithm has the best performance in terms of AOD retrieval, as it has the highest KAPPA coefficient (0.520).Overall, the SU algorithm can be applied to retrieve AOD in different ranges with high precision.Factors influencing the performance of the SU algorithm include small systematic error and even smaller accidental error.

Uncertainty analysis based on aerosol loading
In the previous section, we validated all three AOD products over mainland China in 2007, 2008 and 2010, discovering that all three products tend to exhibit underestimation to some extent.For the purpose of ascertaining the causes of the underestimation, in this section, we focus on analysing the AOD uncertainties leading to differences between retrieved AODs and ground-based AODs in special conditions.Collocated pairs are divided into three groups according to aerosol loading, including light loading (τ 0.15), heavy loading (τ 0,4), and moderate loading (Levy et al., 2010).It is obvious that the AOD bias increases with increasing AOD for all three products.These products have one feature in common, that is, the AOD bias tends to be negative, which indicates that the underestimation becomes more significant with increasing aerosol loading.
The ADV and SU algorithms perform well in estimating AOD, i.e., with little underestimation (lower MBEs of -0.04 and -0.02 respectively as shown in Tab.5), when aerosol loading is low (light loading) (Fig. 2).indicates that the satellite-retrieved AOD is larger than the ground-based observed AOD, this "outlier" tends to be caused by a residual cloud.The ground-based network measures AOD from only one point; however, the satellite-retrieved AODs in each collocated pairs are an average of 25 pixels.Any one of these 25 pixels with a cloud residual will lead to an increased AOD in a collocated pair.Therefore, we conclude that the "outliers" above the box are possibly caused by a cloud residual.
From this view, there are 6, 6 and 2 bubbles above the each box for the ADV product for light, moderate and heavy aerosol loading respectively.However, these bubbles are not "possible outlier" due to the MUs and Std_Ss are relative small shown in Fig. 2. Similarly, the bubbles from SU product above each box are not "possible outliers".For the ORAC product, most of bubbles above each box are "possible outliers" due to larger Std_S (>0.2).Most of "possible outliers" are concentrated in light (13 bubbles) and moderate (14 bubbles) aerosol loading regions as shown in Tab. 5, influencing ORAC's performance on estimating AOD.The bubbles below the box are different from those above the box.Most of them are only below the boxes of moderate and heavy aerosol loading, indicating that all these algorithms have limitations of underestimation on estimating AOD in moderate and heavy aerosol loading regions, especially when AOD loading increase.
We make these groups because aerosols exhibit different behaviours with different loading conditions.In general, the bias or uncertainty of satellite-retrieved AOD will increase with increasing AOD or aerosol loading.As discussed above, all of these algorithms underestimate AOD at different levels; similarly, it is worth noting that underestimation becomes more severe with increasing AOD or aerosol loading.Additionally, we make comparison of AOD bias, which is retrieval errors observed, with AOD uncertainty in AOD retrieval for each pixel from AATSR L2 dataset.AOD retrieval error observed (AOD bias) and AOD uncertainty in retrieval are different as evaluating merits.The range of SU AOD uncertainty is from 0.025 to 0.3, smaller than others, even in heavy aerosol loading region.Most of bubbles of ADV product in Fig. 3. are from 0 to 0.4 of AOD uncertainty.The AOD bias and uncertainty are small in light aerosol loading and moderate for ADV and SU products as shown in Fig. 3.For ORAC product, there is no obvious regularity between AOD bias with AOD uncertainty in three aerosol loading regions especially those bubbles with high Std_S.Therefore, we analyse different validation results of each site to study how these factors work (see Table 6).The area and colours of bubbles represent Std_S and AOD respectively.

Inter-comparison of algorithms site by site
In this section, we select five representative AERONET&CARSNET sites with more than 30 successful matches in 2007, Linan is located at 30.3°N, 119.73°E, northwest of Zhejiang province.A total of 80% of the 50 km 50 km surrounding area is covered by green vegetation, and the other 20% is covered with urban land.The ADV and ORAC algorithm underestimated AOD, with MBE = 0.13 and 0.12 in 2010, respectively.The SU performed well in Linan, with slight underestimation.The underestimation of the ADV algorithm is more severe than that of SU and ORAC.Although the ORAC algorithm has the most matches in Linan, its performance was unstable, which means that the level of underestimation was different in different years.continental climate with four clearly distinctive seasons.The dominant land cover is grassland, covering approximately 95% of the spatial extent of the 50 km 50 km area from the MODIS MCD12C1 land cover data.A total of 30% of the surface is arid and semi-arid areas, which can be a source of dust aerosols.SU performs well in retrieving AOD over SACOL, with a low RMSE (0.072).The accidental error in the retrievals using the ORAC algorithm is obvious, leading to a high RMSE 5 (0.170).However, as discussed above, the ADV algorithm severely underestimated AOD in SACOL.The ADV algorithm tended to severely underestimate the AOD of different ranges, except for a small number of high quality matches.The matches of the SU product are of high quality for the three years.The ORAC has collocated matches in January, February, November and December (winter time), unlike the ADV and SU products.However, the accuracy of ORAC in winter is highly uncertain, as shown in Fig. 5. 10 Shangdianzi is situated at 40.15°N, 94.68°E, with complex land cover of approximately 45% cropland, 30% mixed forest, 18% closed shrub land, 5% grassland, 1% water and 1% evergreen needle leaf forest.The SU algorithm has high precision of AOD calculation over this site from March to October, when most of the land cover is green.The ADV algorithm also performs well in calculating AOD over these three sites, with slight underestimation.The performance of the ORAC algorithm in Shangdianzi is unstable, with strong agreement with ground-based data from March to October and severe underestimation in winter, as 5 shown in Fig. 6. surface is covered with urban land according to the MCD12C1 data of a 50 km 50 km area.The performances of these three algorithms are at the same high quality level.However, the ADV algorithm still underestimated AOD at a level of MBE = 0.12 in 2007 and 0.10 in 2008.To guarantee statistical reliability, there must be more than 30 collocated pairs at one site.The determination of the surface cover at each site is based on the proportion (> 80% for one land type) of each land cover type from the MCD12C1 data at a spatial extent of 50 km 50 km.If no land cover type accounts for a proportion larger than 80% at a given site, it will be 10 identified as mixed; then, we select two or more (sum > 80%) land types with the largest proportions as the main land cover.
As the data volume is too low to infer the year-to-year variability of performance at these sites, the analysis gives some useful information but it is important not to over-interpret results from a small selection of data points.20

Analysis of algorithm performances in western China
Because sufficient ground-based data in western China are lacking for the AERONET measurements, only data from CARSNET sites are used in 2008.We selected six CARSNET sites located in western China in which there are more than 25 matches.
Urumchi, situated at 43.78°N, 87.62°E, serves as the provincial capital of Xinjiang Uyghur Autonomous Region and is the most remote city in China in terms of distance to any sea.The dominant land cover at the spatial extent of 50 km 50 km is grassland, which accounts for approximately 85%.The ADV, ORAC and SU algorithms all severely underestimated AOD, with MBE = 0.22, 0.12 and 0.17, respectively.The MBE is lowest mainly because of the "outlier" in April, which decreases the MBE.Ejina is situated at 41.95°N, 101.07°E, and its main land cover is barren ground (84%).The performances of ORAC and SU are at the same high quality level, with MBEs of 0.02 and 0.09, respectively.Another reason why we chose this site is that there are no matches of ADV products successfully collocated with ground-based data.Based on Fig. 10, the ORAC algorithm has strong applicability in Ejina and high accuracy in retrieving AOD.The SU algorithm also performed well.This demonstrates that another limitation of the ADV algorithm is its applicability in calculating AOD in China.Dunhuang is situated at 40.15°N, 94.68°E and is surrounded by barren ground (85%).The same situation is true for Ejina, which causes slight underestimation at each point but high R and low RMSE for the ORAC algorithm (Figure 11).The performance of the SU algorithm was not as good as that of the ORAC because of its underestimation with MBE = 0.10.The limits of underestimation and applicability of the ADV were more obvious at this site, as it only had 6 matches and showed severe underestimation with MBE = 0.17.Tazhong is situated at 39°N, 83.67°E and is surrounded by barren or sparsely vegetated

22
The prevailing climatic pattern in western China is a temperate continental climate with four distinct seasons and less precipitation in winter and spring.In conclusion, compared to eastern China, the applicability of the ADV algorithm is not strong, and the underestimation is more severe.In the four selected sites in western China, the performance of the ORAC algorithm is best, even though severe underestimation occurs at some sites.The accuracy of the SU algorithm is not as high as the ORAC product, with more severe underestimation and lower applicability.

Inter-comparison
In conclusion, the SU algorithm performs well in calculating AOD over different land covers from March to October.Slight underestimation occurs over barren ground or sparse vegetation at different times, and there are no obvious features in terms of precision in the time series over grasslands.For complex land surfaces where the dominant land cover is vegetation, the SU algorithm is extremely effective in estimating AOD.In the last section, we draw a conclusion that the SU algorithm underestimates AOD over mainland China in 2008 probably because the dominant land cover in western China is barren or sparse vegetation, over which the SU algorithm underestimates AOD more severely.
The ADV algorithm underestimates AOD over all selected sites, leading to all selected MBEs being larger than 0. We make such categories for the purpose of assessing the contribution of different surfaces to AOD estimation.Only XiangHe of 2008 belongs to Class 1, and Linan, Shangdianzi, and SACOL are classified into Class 2. Only Urumchi is in Class 3. Note that even though Lanzhou and Datong were not selected due their location, they should be classified in Class 4.
Overall, the ADV algorithm underestimates AODs at all sites but at different levels, as demonstrated by the above categories.
Serious underestimation occurs over the sites in Class 3 and Class 4 in western China, where the dominant land cover is a mixing of urban area and a large portion of grasslands.For the sites in Class 2, differences exist between Beijing and SACOL.
SACOL is similar to the sites in Class 3 and Class 4, the main land cover of which is grassland.Over the sites in Class 1, the algorithm performs well with high R and low MBE, but there are no common features in terms of surface conditions.
The ORAC product collocates most pairs of all of these products.Most collocated pairs of the SU product and ADV product occur in March to October, but the collocated pairs of the ORAC product occur during each month over some sites in 2008.
Because more matches suggest greater errors for the determination of the "outlier" contribution to the overall performance of the ORAC algorithm, we introduce the ratio of the individual difference to average the differences for each site: where DR<1 indicates a "relatively good" match, 3 >DR >1 indicates a "relatively poor" match, and DR >3 is an "outlier" (see Table 7).
There are no obvious "possible outliers" in Ejina shown in  The ORAC product has the largest coverage at the expense of accuracy, especially in the presence of "outliers", and only the ORAC product has collocated validation pairs over some sites during each month in all three years.The ORAC algorithm underestimates AODs over Ejina, Tazhong and Dunhuang, but the "possible outliers" reduce the differences between the CARSNET data and the ORAC product.Xilinhot, Urumchi and SACOL share the same main land cover of grassland.The problem is that the underestimations over these sites are not at the same level.
It is worth noting that the ORAC algorithm has the ability to calculate high AOD; however, most of the AODs have DRs larger than 3, indicating that the estimation of high AOD is unstable and has large error, reducing the overall precision.

Fig. 1 .
Fig. 1.The distribution of selected AERONET&CARSNET sites in mainland China in 2007, 2008 and 2010.The blue and red points represent AERONET and CARNET sites, respectively.
Obviously, when only using | |, the absolute value of the difference between ground-based data and AATSR AOD data in each collocation pair, as an assessment standard for different AODs, is insufficient and lacks persuasion.Therefore, the combination of | | and | |/ , i.e., the ratio of | | to the value of the reference data in each collocation pair, used in the KAPPA coefficient will account for this shortage and provide a new statistic for assessing the agreement between two arrays, taking advantage of the KAPPA coefficient.
For 2010, the lowest RMS (0.089) with the lowest accidental error of the three years.However, the KAPPA coefficient is 0.180, also the lowest of the three years.The most obvious feature of the ADV algorithm is underestimation with the highest MSA is 0.250 in 2007, and the lowest is 0.173 in 2010.The ADV algorithm can retrieve low AOD values with high accuracy.This "ability" is systematic for either high AODs or low AODs.This also limits the range of application of the ADV algorithm, especially in calculating AODs in high value ranges.

Fig. 1
Fig. 1 Scatter plots of AATSR ADV, ORAC and SU L2 AOD products with ground-based data in China for three year of 2007, 2008 and 2010.The black solid line represents 1-1 line.The magenta points are means for specific range of AERO&CARS AOD and the magenta lines are of retrievals at certain range.The areas and colours are determined by means of uncertainty (MU) dataset in AATSR L2 products and standard deviation of retrievals (Std_S) in collocation frame of respectively.

Fig. 2 .
Fig. 2. Scatter plot of AERONET&CARSNET AODs with ADV AOD bias or uncertainties in China in 2007, 2008 and 2010.The areas and colours of bubbles represent MU and Std_S sampling area of respectively.Colours represent different groups: blue denotes light loading, green denotes moderate loading, and red denotes heavy loading.Each group has one box, the bottom and top borders of which represent and , respectively, containing 96% of scattered points from each group.The centre line of each box represents the MBE of each group.
analysis of individual ground measurement sites For the purpose of further evaluating the different performances of these three algorithms in estimating AOD over mainland China, we validate these products on a site-by-site basis.It is significant to explore the roles of different factors in estimating AOD.There are several factors that may have impacts on AOD calculation, including land cover, aerosol type, elevation, etc.

Fig. 3 .
Fig. 3. Scatter plots of ADV, ORAC and SU AOD uncertainty with AOD bias over China for three years of 2007, 2008 and 2010.
2008 and 2010 to guarantee an appropriate statistical sample size.These selected sites are located in different regions where the land cover and climatic pattern are different and representative of mainland China.Two AERONET sites and three CARSNET sites were selected, including SACOL and XiangHe from AERONET, and Linan, Shangdianzi and Xilinhot from CARSNET.Most matches of ADV and SU products collocated with ground-based data occurred in March to October in 2007, 2008 and 2010, as shown in Fig. 4 to Fig. 8.The matches of the ORAC product were distributed in each month over most sites.

Fig. 10 .
Most of the DRs are in the range of 0 to 3, only two DRs are larger than 3, and the maximum (overestimation) is 5.112.The retrieved AOD in March is a possible "outlier" because it is overestimated, whereas most are underestimated.Another two sites dominated by barren or sparsely vegetated land cover are Dunhang (approximately 85%) and Tazhong (100%).The conditions in Tazhong are complex, and there is no obvious relationship between the CARSNET data and the ORAC AODs.Most of the DRs are less than 3, and a total of eight DRs are larger than 3.The DR in February is an "outlier" because the varying tendencies are different between the ORAC product and the ground-based data, indicative of overestimation.

Fig. 13
Fig. 13 Scatter plots of AATSR ADV, ORAC and SU L2 AOD products with ground-based data in China in the spring, summer, 10

Fig. 15
Fig. 15 Comparisons of Ångström exponent of ORAC and SU products.The area and colours of bubbles represent AOD uncertainty and AOD values, respectively.

Table 3 . Design of the KAPPA coefficient.
To estimate the KAPPA coefficient, one needs to determine which pairs are "true" or which pairs are "relevant".However, if only given matched collocation pairs, we cannot determine which pair is relevant.Therefore, the design of criterion 1 and criterion 2 needs to be reasonable and fit for the purpose of validation.
For criterion 1, if | | is greater than the mean of | |, then it is marked as "lowly relevancefar from truth", and if not, it is marked as "highly relevanceclose to truth".Here, the bias was assessed from the first quartile to the third quartile for eliminating possible "outliers".The | | only indicates the absolute error of the retrieved AOD, and it still needs another statistic for criterion 2, i.e., | |/ , which indicates the relative error of AOD retrieval.For criterion 2, if | |/

Table 4 . Main statistics of the validation results.
For 2007, the RMS error is 0.095 and the RMB is 0.704, which reflects the tendency of underestimation.This type of underestimation is more severe with increasing AOD.Low dispersion and slight underestimation make the KAPPA coefficient high (0.473), demonstrating that the ADV algorithm performs well in calculating the AOD over China in 2007.The ADV algorithm is appropriate for the retrieval of low AODs, especially for those less than 1.0; thus, the MSA for 2007 is 0.244.For 2008, the lower RMB (0.621) suggests more severe underestimation, and higher RSE (0.130) indicate lower accuracy.Similar with 2007, the MSA of the ADV is 0.211.Therefore, the KAAPA coefficient, which measures the overall performance, is 0.329, lower than that of 2007.