Results from the 4th WMO Filter Radiometer Comparison for aerosol optical depth measurements

This study presents the results of the 4th Filter Radiometer Comparison (FRC-IV), held in Davos, Switzerland, between September 28 and October 16, 2015. Thirty filter radiometers and spectroradiometers from 12 countries participated.


Introduction
Growing recognition of the role of atmospheric aerosols in the determination and modification of the Earth's radiation budget and hydrological cycle through their direct and indirect effects has led to a steady increase of scientific interest in aerosol physical, chemical and optical properties over the last decades (Augustine et al., 2008; Lohmann and Feichter, 2005; Nyeki et al., 2012; Wehrli, 2008). The main parameter describing the column-integrated optical activity of aerosols is their optical depth, which can be derived from ground-based measurements of the attenuation of sunlight, but also through modeling of the scattered radiation observed from space (Chylek et al., 2003; Shaw et al., 1973; Toledano et al., 2011). Aerosol optical depth (AOD) is the single most comprehensive variable to assess the aerosol load of the atmosphere and represents the least common denominator by which ground-based observations, satellite retrievals, and global modeling of aerosol properties are compared, providing a holistic approach for an all-around understanding and quantification of the AOD uncertainties (Heintzenberg et al., 1996; Andrews et al., 2017). This significance is illustrated by the fact that AOD is one of the core measurements at different solar elevation angles (or optical air masses) throughout a day under very stable atmospheric conditions and pristine skies, and plotting the logarithm of these voltages against the relative air mass. The determination of V0(λ) values by the Langley method is the main current practice for the calibration of spectral radiometers used in AOD observations. In addition, other in situ calibration methods (Nakajima et al., 1996; Campanelli et al., 2004; 2007) have been proposed. According to the Beer-Lambert-Bouguer law, the ordinate intercept yields the logarithm of the zero-air-mass photometer voltage V0(λ), provided that the turbidity of the atmosphere remains constant during the measurements (Dirmhirn et al., 1993).
Langley extrapolation relies on the assumption of a stable optical depth during the period of measurements. Standard least-squares fitting techniques are applicable only under the additional assumption of normally distributed optical depth fluctuations. However, certain systematic variations of the AOD can induce unnoticed systematic errors in the calibration constant (Shaw, 1976), which may lead to a significant day-to-day scatter. Langley extrapolations are therefore rarely successful at most observation sites and are usually performed at high-altitude sites or at places where an additional independent assessment of AOD variations is available. Although the stability of optical interference filters has improved considerably over the last 20 years, periodic re-calibrations of filter radiometers are still needed to keep AOD uncertainties within specified limits.
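The Langley procedure described above reduces to a straight-line fit of ln V against the air mass m. The following is a minimal Python/NumPy sketch on synthetic data, assuming an ordinary least-squares fit (the approach whose limitations are noted above); the function name `langley_fit` and all numerical values are hypothetical. Note that the fitted slope is the total optical depth (Rayleigh, gas absorption and aerosol), not AOD alone. The second case shows how a slow, systematic drift of optical depth produces a plot that still looks linear while biasing V0.

```python
import numpy as np


def langley_fit(airmass, voltage):
    """Ordinary least-squares Langley fit of ln(V) = ln(V0) - tau * m.

    Returns (V0, tau), where tau is the total optical depth assumed
    constant over the fitted period.
    """
    m = np.asarray(airmass, dtype=float)
    ln_v = np.log(np.asarray(voltage, dtype=float))
    slope, intercept = np.polyfit(m, ln_v, 1)  # degree-1 fit returns [slope, intercept]
    return float(np.exp(intercept)), float(-slope)


# Hypothetical half-day: true V0 = 1.25 V, constant total optical depth 0.15.
m = np.linspace(2.0, 5.0, 31)
v0_stable, tau_stable = langley_fit(m, 1.25 * np.exp(-0.15 * m))

# Same half-day, but the optical depth drifts slowly with air mass
# (0.150 at m = 2 rising to 0.156 at m = 5): the plot still looks linear,
# yet the extrapolated V0 is biased high by roughly 2%.
tau_drift = 0.15 + 0.002 * (m - 2.0)
v0_drift, _ = langley_fit(m, 1.25 * np.exp(-tau_drift * m))

print(f"stable: V0={v0_stable:.4f}, tau={tau_stable:.4f}")
print(f"drifting: V0={v0_drift:.4f}")
```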

Surface-based global networks of AOD measurements, such as the AErosol RObotic NETwork (AERONET) (Holben et al., 1998), the Global Atmosphere Watch Precision Filter Radiometer network (GAW-PFR) (McArthur et al., 2003; Wehrli, 2005), the SKYradiometer NETwork (SKYNET) (Aoki et al., 2006; Kim et al., 2008), the Bureau of Meteorology AOD Australian network (BoM) (Mitchell and Forgan, 2003), and the National Oceanic and Atmospheric Administration/Earth System Research Laboratory's (NOAA/ESRL) SURFRAD network (Augustine et al., 2000) and NOAA/ESRL Global Baseline Observatories (Dutton et al., 1994; NOAA/ESRL, 2003) are used to measure spectral AODs at various locations worldwide. Several AOD intercomparison campaigns involving different instrument types from some of the above networks have taken place as short-term intensive field campaigns and have proven to be a successful method of relating the standards and methodologies of one network to another (Aoki et al., 2006; Kim et al., 2005; McArthur et al., 2003; Mitchell and Forgan, 2003; Schmid et al., 1999). Among the previous AOD comparison studies, the 1st, 2nd and 3rd filter radiometer comparisons (FRC-I, -II and -III) were held for 2 weeks in September-October of 2000, 2005 and 2010, respectively. FRC-II and -III were based on AOD results derived from simultaneous measurements by each participant according to their standard protocol and evaluated with their preferred algorithms, including cloud screening. Recommendations by WMO experts (WMO, 2005) were implemented as of FRC-II. A large number of radiometers were present during both FRC-II (14 from 9 countries) and FRC-III (17 from 10 countries).

Atmos. Chem. Phys. Discuss., https://doi.org/10.5194/acp-2017-1105 Manuscript under review for journal Atmos. Chem. Phys. Discussion started: 6 December 2017 © Author(s) 2017. CC BY 4.0 License.
The main conclusions were that: i) most of the ground-based AOD measuring instruments were able to achieve comparable results to within ≈±0.005, ii) the algorithms used for calibration and evaluation contributed a significant fraction of the observed dispersion in AOD measurements, and iii) measurements of the Ångström exponent (AE) for the wavelength pair 500/862 nm were questionable when AOD < 0.1. In this study, we present the results of the 4th FRC (FRC-IV), in which 30 instruments from 12 countries, belonging to the above-mentioned global or national networks, participated. Section 2 presents the instrumentation, the location of measurements and the analytical methodology used. Section 3 describes the intercomparison results, while the conclusions in Section 4 examine AOD calculation methods and the assumptions involved, and set the framework within which network homogeneity can be achieved through the standardization of instrumentation and procedures, combined with a multi-faceted data quality control/quality assurance system. The whole activity aims to homogenize/harmonize AOD measurements on a global scale. The comparison protocol was formulated according to the WMO recommendations (WMO, 2003; 2005).

2 Instruments, location and AOD retrieval

Intercomparison location
The World Optical depth Research and Calibration Center (WORCC) was established at Davos in 1996 and was assigned the mission by WMO to develop stable instrumentation and improved methods for the calibration and observation of AOD. These new developments were demonstrated in a global pilot network (Wehrli, 2008). Toward this goal, and concurrently with the 12th International Pyrheliometer Comparisons (IPC-XII), FRC-IV was held. Instruments belonging to different global AOD networks were invited to participate. The comparison took place on the premises of PMOD/WRC from September 28 to October 16, 2015. Thirty filter radiometers and spectroradiometers from 12 countries participated in this campaign. PMOD/WRC (46° 49' N, 9° 51' E, 1590 m above sea level) is situated at the edge of the small town of Davos in the eastern part of Switzerland. The valley of Davos is oriented northeast-southwest, and in fall the horizon limits solar observations to zenith angles smaller than about 78° (from about 7:15 to 16:15 CET). Average sunshine duration in September and October is 173 and 156 hours, respectively, while the long-term average AOD is ~0.06 at 500 nm (Nyeki et al., 2012).

Figure 1: Average AOD at 500 nm measured by the WORCC triad during 5 days with cloud-free sky conditions. The symbols represent 1-minute measurements.
During FRC-IV, there were five days (September 28-30, October 1 and 12) with mostly sunshine and only a very limited presence of clouds. Measurements from these days have been used to compare the participating instruments. During the five intercomparison days, AOD varied from 0.02 up to 0.12 at 500 nm, which can be considered normal values for the area. Figure 1 shows the AOD variability during the intercomparison days, as measured by the WORCC triad, which is defined as the mean of three well-maintained PFR instruments. Before the start of the campaign, the PFR triad was intercompared with three additional PFR instruments that had performed measurements for a period of 9 months at Izaña, Tenerife, Spain (2 instruments) and Mauna Loa, Hawaii, USA (1 instrument). The calibration of these instruments was based on the Langley calibration technique. During five cloudless days in August-September 2015, the three Langley-calibrated instruments were compared with the three PFR triad instruments. The AOD differences for all instruments ranged from 0.2% up to 0.5%, or up to 0.0005 in AOD, at all wavelengths.

Participating instruments
Filter radiometers have been used in meteorology for at least 40 years to measure atmospheric haze or turbidity. Modern sun photometers use dielectric interference filters and silicon photodetectors, resembling the filter radiometers used in metrology.
The precision filter radiometer (PFR; Wehrli, 2000) was designed with an emphasis on radiometric stability, and a small number of instruments were built for a trial network of AOD measurement sites (Wehrli, 2005).
Thirty instruments from 12 countries participated in FRC-IV, representing the most widely used instrument types for AOD retrieval. The participating filter radiometers were either of the direct-pointing type, e.g., classic sun photometers, including sky-scanning radiometers used in direct-sun mode, or hemispherical rotating shadow-band radiometers. These included the following (see Table 1 for further details):
a. Nine (9) instruments were of the PFR type (manufactured by PMOD/WRC) used in the GAW AOD network (Wehrli, 2005). The PFR is a classic sun photometer with 4 independent channels, a field of view (FoV) of 2.5° and a temperature controller.
b. Two (2) radiometers were of the Carter-Scott SP02 type (Mitchell and Forgan, 2003), which is similar to the PFR but has a wider FoV of 5° and no temperature controller.
c. Three (3) CIMEL CE318 sun and sky scanning radiometers as used by AERONET (Holben et al., 1998), two of them of the CE318-T model, the new standard AERONET instrument with improved performance and the capability to perform lunar observations (Barreto et al., 2016). These instruments have a narrow FoV of 1.2° and sequentially measure the sun at 9 wavelengths within a few seconds. No temperature control is used.
d. Four (4) MFRSR rotating shadow-band radiometers (Harrison et al., 1994) with a hemispheric FoV. These measure global horizontal and diffuse horizontal irradiance (GHI and DHI) in 5 aerosol channels; the difference between GHI and DHI, divided by the cosine of the solar zenith angle, provides the calculated direct-beam spectral irradiance. The temperature is held near 40 °C. The effective FoV is the largest of any of the instruments in this study, at ~6.5°.

e. Three (3) Precision Solar Radiometers (PSR), which are direct-sun-pointing spectroradiometers able to measure the spectrum from 300 to 1000 nm with a wavelength increment of 0.7 nm. Their FoV and FWHM are 1.5° and 1.5 to 6 nm, respectively. These are manufactured by PMOD/WRC and are temperature controlled.

Historically, instrument comparisons have consisted of bringing a number of instruments together at a single location for a period of several days to several weeks (e.g., Schmid et al., 1999). These types of comparisons are essential to moving forward the frontiers of instrument and metrology science. However, there may be little or no relation between the results of these intensive comparisons and the results from the same instruments when placed in an operational network setting. The comparison reported here provides insight into the quality of data output by instruments when cared for following the operational protocols designed by the various data centers responsible for the routine handling of the measurements. Therefore, the results of this comparison should provide an understanding of both the comparability between different networks and the overall data quality of the participating networks. Beyond this comparison's results, however, conclusions on the homogeneity of the different networks depend on each network's actions towards the standardization of calibration and instrumentation, and towards the use of standard operating procedures (SOPs), including data quality control and quality assurance protocols.
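For the MFRSRs described in item d above, the direct-beam irradiance is not measured but derived from the two hemispheric measurements. A minimal sketch of that step, assuming an ideal cosine response and perfect shadowing (the real processing also applies angular-response corrections, which are omitted here); the function name and input values are hypothetical:

```python
import math


def direct_normal_irradiance(ghi, dhi, solar_zenith_deg):
    """Direct-normal irradiance from a shadow-band radiometer.

    (GHI - DHI) is the direct beam projected onto a horizontal surface;
    dividing by cos(SZA) converts it to the irradiance on a surface
    normal to the beam. Units follow the inputs (e.g. W m-2 nm-1).
    """
    mu = math.cos(math.radians(solar_zenith_deg))
    if mu <= 0.0:
        raise ValueError("sun at or below the horizon")
    return (ghi - dhi) / mu


# Hypothetical 500 nm channel readings at a solar zenith angle of 60 degrees.
print(direct_normal_irradiance(ghi=0.90, dhi=0.30, solar_zenith_deg=60.0))
```

Because the direct beam is obtained as a small difference of two larger signals, any error in the diffuse measurement or in the angular correction propagates directly into the derived AOD, which is consistent with the larger scatter reported for the MFRs later in this section.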
Given the differences in instrumentation characteristics, calibration strategies (Walker et al., 1987) and processing algorithms used by different networks, the effective equivalence of AOD observations needs to be estimated through Intensive Observation Periods (Schmid et al., 1999) or extensive field comparisons (McArthur et al., 2003; Mitchell and Forgan, 2003) of co-located instruments representing different networks. Since AOD is defined as the negative natural logarithm of the transmission, normalized to the vertical path length (m = 1) through the atmosphere, its error is proportional to the relative error in calibration and inversely proportional to the relative air mass m of the slant path. The current GAW specification (WMO, 2005) calls for an AOD uncertainty of ±(0.005 + 0.01/m), thus requiring a calibration uncertainty of 1%. This specification is similar to the uncertainty required for satellite AOD retrievals of 0.015 over land and 0.010 over the ocean in order to make a meaningful statement concerning the aerosol climate effect (Chylek et al., 2003).
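The error propagation stated above follows from τ = (ln V0 - ln V)/m - τ_other, so a relative calibration error ΔV0/V0 produces an AOD error of (ΔV0/V0)/m. The sketch below evaluates this together with the GAW limit ±(0.005 + 0.010/m). It uses the plane-parallel approximation m ≈ 1/cos(SZA), an assumption made here for simplicity (operational codes use more refined air-mass formulas), and the function names are hypothetical.

```python
import math


def wmo_aod_limit(airmass):
    """GAW/WMO AOD traceability limit, +/-(0.005 + 0.010/m)."""
    return 0.005 + 0.010 / airmass


def aod_error_from_calibration(rel_v0_error, airmass):
    """First-order AOD error caused by a relative error in V0."""
    return rel_v0_error / airmass


def plane_parallel_airmass(solar_zenith_deg):
    """Simple 1/cos(SZA) air mass; adequate only well away from the horizon."""
    return 1.0 / math.cos(math.radians(solar_zenith_deg))


# A 1% calibration error stays inside the limit at every air mass, because
# 0.01/m < 0.005 + 0.01/m; the margin is the constant 0.005 term.
for sza in (50.0, 65.0, 78.0):  # 78 deg is roughly the Davos horizon limit in fall
    m = plane_parallel_airmass(sza)
    print(f"SZA={sza:4.1f}  m={m:4.2f}  limit={wmo_aod_limit(m):.4f}  "
          f"1% cal error={aod_error_from_calibration(0.01, m):.4f}")
```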
Measurements of solar irradiance were nominally taken every full minute by the participants' data acquisition systems, typically yielding 500 observations per cloudless day. Actual sampling/averaging rates ranged from 15 seconds to 1 minute depending on the instrument. Simultaneous measurements were defined as those falling within a timing window of ±30 seconds around each full CET minute. The raw measurements were evaluated by each participant according to their preferred algorithms, including cloud screening, and submitted for comparison.
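The ±30 s synchronization rule above can be sketched as follows: each sample is assigned to its nearest full minute, the sample closest to that minute is kept, and only minutes reported by every instrument are compared. This is a hypothetical illustration (function and instrument names are invented, and timestamps are assumed to be naive CET datetimes), not the actual FRC-IV comparison software.

```python
from datetime import datetime, timedelta

HALF_WINDOW = timedelta(seconds=30)


def nearest_full_minute(t):
    """Round a timestamp to the nearest full minute (ties go up)."""
    floor = t.replace(second=0, microsecond=0)
    return floor + timedelta(minutes=1) if t - floor >= HALF_WINDOW else floor


def synchronize(series):
    """series: {instrument: [(timestamp, aod), ...]}.

    Returns {minute: {instrument: aod}} for minutes covered by all instruments.
    """
    per_instrument = {}
    for name, samples in series.items():
        best = {}  # minute -> (offset from that minute, aod)
        for t, aod in samples:
            minute = nearest_full_minute(t)
            offset = abs(t - minute)
            if offset <= HALF_WINDOW and (minute not in best or offset < best[minute][0]):
                best[minute] = (offset, aod)
        per_instrument[name] = {mnt: aod for mnt, (_, aod) in best.items()}
    common = set.intersection(*(set(d) for d in per_instrument.values()))
    return {mnt: {name: d[mnt] for name, d in per_instrument.items()} for mnt in sorted(common)}


day = datetime(2015, 9, 28)
series = {
    "triad": [(day.replace(hour=10, minute=0), 0.050), (day.replace(hour=10, minute=1), 0.051)],
    "instrument_x": [(day.replace(hour=9, minute=59, second=50), 0.052),
                     (day.replace(hour=10, minute=1, second=20), 0.049),
                     (day.replace(hour=10, minute=2, second=40), 0.048)],
}
for minute, values in synchronize(series).items():
    print(minute.strftime("%H:%M"), values)
```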
The set of measurements covered wavelengths between 340 nm and 2200 nm. Channels at 368±3 nm, 412±3 nm, 500±3 nm and 865±5 nm were defined as the AOD intercomparison wavelengths. The number of instruments that submitted AOD retrievals for each of these wavelengths is summarized in Table 2. Ångström exponents were derived from optical depths at 500 and 865 nm (29 instruments). Atmospheric pressure, precipitable water, relative humidity and temperature readings were made available to all participants by the MeteoSwiss weather station located at PMOD/WRC, at a 10-minute resolution. Total ozone column measured with a double Brewer spectroradiometer at PMOD/WRC was available as well. This common auxiliary database was provided to all participants in order to avoid AOD discrepancies introduced by uncertainties in the above-mentioned parameters.
Several of the participating radiometers were calibrated at various sites within a few months prior to FRC-IV.
CIMEL instruments: The CIMEL sun photometers (#627 and #917) were calibrated by the Langley plot method at the high-altitude station Izaña following the AERONET protocols for master instruments (Holben et al., 1998), just before the campaign (August 2015). V0 values were calculated as the average of five different Langley calibrations in June 2015 (mean AOD at 500 nm < 0.016 during these days), following the criteria based on the coefficient of variation (CV) determined in Holben et al. (1998). This criterion requires CV < 0.5% for the VIS and IR spectral bands and < 1% for the UV wavelengths. The permanent CIMEL at Davos (#354) was calibrated by comparison with an AERONET master instrument in June 2015, following the AERONET standard procedure for field instruments.
MFRSR instruments: SURFRAD network MFRSRs are calibrated on site using a robust estimate of V0 from Langley plots based on one month or more of data in representative conditions (Augustine et al., 2003). MFR_US_2 and MFR_US_3 were calibrated using only the data from FRC-IV; MFR_US_1 and MFR_DE_1 also used the FRC-IV data for calibration, following slightly modified procedures to determine V0 because of the short duration of the campaign.
SP02 instruments: The Australian Bureau of Meteorology SP02s were taken from a high-frequency clear-sun Australian station (Longreach), where they had been calibrated in situ for 2 years using the methods described in Mitchell, Forgan and Campbell (2017), prior to participating in the FRC.
PSR instruments: The PSR instruments were absolutely calibrated at the PMOD/WRC laboratory during the campaign. In order to retrieve the AOD, an absolute extraterrestrial solar spectrum is used.
Microtops instrument: The instrument was calibrated by direct comparison with a calibrated CIMEL/AERONET instrument from June to August, 2015.
COFOVO instruments: The four instruments are calibrated through direct comparison with the National Renewable Energy Laboratory (USA) secondary reference spectroradiometer (Tatsiankou et al., 2013). AOD is retrieved by matching absolute irradiances at the six measuring wavelengths with a radiative transfer model.
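The coefficient-of-variation criterion applied to the CIMEL Langley calibrations above can be expressed in a few lines. This sketch assumes the CV is the sample standard deviation of the individual Langley V0 values divided by their mean, and uses the thresholds quoted above (0.5% for VIS/IR, 1% for UV); the function name and V0 values are hypothetical.

```python
import statistics

CV_LIMIT_PERCENT = {"UV": 1.0, "VIS": 0.5, "IR": 0.5}


def averaged_v0(langley_v0s, band):
    """Mean V0 of several Langley calibrations plus a CV-based acceptance flag."""
    mean_v0 = statistics.fmean(langley_v0s)
    cv_percent = 100.0 * statistics.stdev(langley_v0s) / mean_v0
    return mean_v0, cv_percent, cv_percent < CV_LIMIT_PERCENT[band]


# Five hypothetical Langley V0 values (arbitrary units) for a VIS channel.
mean_v0, cv, accepted = averaged_v0([1.000, 1.004, 0.997, 1.002, 0.999], "VIS")
print(f"V0={mean_v0:.4f}, CV={cv:.2f}%, accepted={accepted}")
```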
During the intercomparison, AOD data delivered by the operators of the participating radiometers were evaluated using common comparison software. The comparison was based on AOD results only, as each operator/group used the algorithm normally applied for the routine operation of their radiometer. The comparison principles were based on the WMO recommendations (WMO, 2003; 2005). During FRC-IV, weather conditions allowed over 1000 measurements to be made for most instruments on 5 days, so that the above-mentioned recommendations were fulfilled.

AOD differences
This comparison is based on AOD values provided by the individual instrument operators, compared against the triad. Figure 2 shows an example of this comparison on a diurnal plot, with the instruments grouped by type and compared against the PFR triad. Most of the PFRs showed the best performance, with absolute AOD differences from the triad ranging from zero to 0.01 in all cases and at all wavelengths. As the wavelength increases, the differences decrease, approaching zero, except for some positive outliers for PFR_SE_N35, which were caused by non-synchronous measurements (timing) during particular periods. Results for the three CIMEL (CIM) instruments are almost identical to those of the PFR at 500 and 862 nm (Fig. 2a), while a slight underestimation of the order of 0.01 and 0.005 was found at the shorter wavelengths of 368 and 412 nm, respectively. It should be noted that the CIMEL AOD at 412 and 368 nm has been linearly interpolated using the CIMEL AOD at 340, 380 and 440 nm and the AEs derived from these three wavelengths, so part of the difference can be explained by interpolation-related uncertainties. The POM sky radiometers do not measure AOD at 368 and 412 nm. However, results comparable to those of the CIM and PFR were obtained at 862 nm, with a slight underestimation at 500 nm (Fig. 2b), well within the WMO limits, that was not related to air mass. This demonstrates the high quality of the reference instruments belonging to the GAW-PFR, AERONET and SKYNET networks. The two SPOs, which are similar to the PFRs but have a wider field of view and no temperature controller, showed good agreement with the triad. SPO_AU_1 showed excellent median differences (Fig. 2c). For SPO_US_1, AOD was overestimated on one of the five days of measurements at 500 nm and on one of the five days at 862 nm, with excellent agreement on the other days and excellent agreement on all days at 500 nm.
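The CIMEL values at 368 and 412 nm were obtained by interpolation from the 340, 380 and 440 nm channels using the Ångström exponent. One common way to do this, assumed here rather than taken from the AERONET code, is a least-squares fit of ln τ against ln λ, whose slope is -α; the function name is hypothetical and the sketch uses synthetic AODs that follow an exact power law.

```python
import numpy as np


def interpolate_aod(wavelengths_nm, aods, target_nm):
    """Interpolate AOD to target_nm with a log-log (Angstrom) fit.

    Returns (aod_at_target, angstrom_exponent).
    """
    slope, intercept = np.polyfit(np.log(wavelengths_nm), np.log(aods), 1)
    return float(np.exp(intercept + slope * np.log(target_nm))), float(-slope)


# Synthetic AODs following tau = 0.06 * (lambda / 500)**-1.2 (hypothetical).
lams = np.array([340.0, 380.0, 440.0])
taus = 0.06 * (lams / 500.0) ** -1.2
for target in (368.0, 412.0):
    aod, alpha = interpolate_aod(lams, taus, target)
    print(f"{target:.0f} nm: AOD={aod:.4f} (alpha={alpha:.2f})")
```

When real spectra deviate from a power law (curvature in ln τ versus ln λ), the fitted line no longer passes through the true values, which is one source of the interpolation-related uncertainty mentioned above.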
The overestimates were likely the result of the four FoVs of the SPO not being optimally aligned. During the shipment of SPO_US_1 to Davos, the diopter was damaged. It was manually adjusted to its position during the FRC without the benefit of the detailed alignment process that is usually followed to minimize the misalignment of the four independent barrels of the sun photometer. At 368 nm, small calibration-related AOD differences were observed for SPO_AU_1 compared to the triad. The four MFR instruments showed good agreement for the medians compared to the PFR triad; however, they exhibit larger scatter than the sun-pointing instruments, resulting in lower precision. McArthur et al. (2003) had previously reported that the MFR-derived AOD does not quite meet the accuracy of the sun-pointing instruments under clean atmospheric conditions. MFR_DE showed an AOD overestimation in various instances, with results outside the WMO-defined AOD limits (Fig. 2d). This small overestimation of the MFR_DE instrument compared to the PFR triad could be due to uncertainties introduced while correcting for its angular response, the calibration procedure, or incomplete blocking of the diffuser by the shadow band.
The MFRSRs that are part of the SURFRAD network (MFR_US_2 and MFR_US_3) give a median AOD at 500 nm that is in very good agreement with the PFR triad, and as good as or better than that of some of the other sun-pointing instruments, e.g., CIMEL and POM; these two MFRSRs slightly underestimate the AOD at 865 nm, but remain within the WMO-defined limits. Again, the medians of these two MFRs are comparable to those of the better sun-pointing instruments, but with larger scatter. These two MFRs are representative of the SURFRAD network, which follows network protocols for calibration and alignment and frequent characterizations of the spectral and angular responses (Augustine et al., 2003; Michalsky et al., 2001). Again, this highlights the high quality of the instruments that represent larger networks (GAW-PFR, AERONET, SKYNET and SURFRAD).

PFR AOD comparisons showed that median differences were well within ±0.005, with the 10th to 90th percentiles also well within the ±0.01 AOD limits, at all wavelengths. Similar results were found for CIMEL AODs at 500 and 862 nm. POM AOD medians showed a small underestimation of about 0.005 at 500 nm and very good agreement at 865 nm. The medians of the MFR AOD differences were within 0.01 except for MFR_DE_1 at 500 nm. The three PSR instruments are the only ones that provide high-spectral-resolution AOD measurements; the comparisons highlighted the accuracy of their medians at the longer wavelengths (500 and 862 nm), with a tendency toward overestimated outliers, and a 0 to 0.02 discrepancy between the PSRs at the shorter wavelengths. Overall, PSR_006 showed the best results of the three PSRs. The SIM instruments showed excellent agreement at 500 and 865 nm, an overestimation of 0.01 to 0.03, and higher scatter than the other instruments. However, given the instrument retrieval methodology (a radiative transfer model with direct irradiances as inputs is used to calculate AOD), the results can be considered very good.
Finally, the hand-held Microtops instrument overestimated AOD at the two shorter wavelengths, while the scatter of the differences was 0.01 to 0.04 for the 10th to 90th percentiles. The blue lines in Figure 3 are defined at the -0.009 and 0.009 AOD limits. This is an average of the air-mass-dependent WMO limit, which ranged from 0.006 to 0.012 during the campaign period. CIMEL AOD at 368 and 412 nm has been interpolated using the CIMEL AOD at 340, 380 and 440 nm and the Ångström exponents derived from these three wavelengths. Overall, the FRC-IV intercomparison results are comparable with the results found by Forgan (2003), Mitchell et al. (2017) and Kim et al. (2008) under low aerosol loading conditions. The magnitude of the instrument discrepancies could be partly due to the inherently different spectral responses and detector fields of view of each instrument under varying aerosol loadings (Kim et al., 2005). The above results indicate that the pointing instruments provide data of comparable quality. On an observation-by-observation basis, the direct-pointing instruments appear to maintain differences lower than 0.01 at nearly all wavelengths in clear, stable conditions, i.e., equal to or lower than the AOD uncertainty. Advances in the following aspects may improve agreement to the 0.005 level (see Section 3.3): i) instrument pointing, ii) better determination of the effects of Rayleigh scattering, ozone and other absorbers on the calculation of AOD, and iii) better instrument characterization, especially calibration of the radiometers. Significant improvements in AOD precision and instrument accuracy were obtained upon application of cloud-screening algorithms.

Concerning additional statistics, we used Taylor diagrams (Taylor, 2001) to evaluate the performance of all instruments at the four measuring wavelengths (Figure 4). Correlation coefficients (CC) between the triad and all other instruments were better than 0.9 for all instruments and wavelengths, with the exception of three instruments at 865 nm only. In the case of the CIMEL, PFR and POM, CCs were higher than 0.98 in all cases. The normalized standard deviation in Fig. 4 describes the AOD variability measured by each instrument relative to that of the reference (triad). Most of these ratios were well within the 0.8 to 1 range, with the exception of a single PFR instrument, which provided data for only one comparison day.
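The three quantities summarized in a Taylor diagram can be computed directly from synchronized AOD pairs. The sketch below is a minimal NumPy version, assuming population (ddof = 0) standard deviations; with that choice the centered RMS difference E' obeys E'² = σf² + σr² - 2σfσrR exactly, which is the law-of-cosines relation the diagram is built on (Taylor, 2001). The function name and input arrays are hypothetical.

```python
import numpy as np


def taylor_statistics(test, ref):
    """Return (correlation, normalized std, normalized centered RMS difference)."""
    test = np.asarray(test, dtype=float)
    ref = np.asarray(ref, dtype=float)
    r = np.corrcoef(test, ref)[0, 1]
    sd_test, sd_ref = test.std(), ref.std()  # population standard deviations
    centered_rms = np.sqrt(np.mean(((test - test.mean()) - (ref - ref.mean())) ** 2))
    return float(r), float(sd_test / sd_ref), float(centered_rms / sd_ref)


# Hypothetical 500 nm AODs: triad reference and one instrument with a small offset and noise.
triad = np.array([0.030, 0.042, 0.055, 0.061, 0.048, 0.037])
instrument = np.array([0.033, 0.044, 0.059, 0.062, 0.052, 0.039])
r, sd_ratio, crmsd = taylor_statistics(instrument, triad)
print(f"R={r:.3f}, normalized std={sd_ratio:.2f}, normalized centered RMSD={crmsd:.2f}")
```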
Overall, statistics at 368, 412 and 500 nm showed excellent agreement for all instruments, while at 865 nm the scatter of the instruments within the Taylor diagram space is higher. However, the agreement can still be considered quite good, as seen when examining Table 2). When considering 95% of measurements, the best results correspond to the 500 nm wavelength, followed by 865 nm. A main finding is that the shorter the wavelength, the lower the reliability, accompanied by a lower percentage of participating instruments. For a lower percentage of measurements (horizontal axis), the 865 nm wavelength reaches 100% of participating instruments, which decreases to 83% when 95% of the data are required to lie within the WMO limits. At the shortest studied wavelength (368 nm), 12 out of 17 instruments were within the WMO criterion, while the remaining five had less than 70% of their comparison data within the WMO limits.
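The statistic discussed above, i.e. the fraction of each instrument's synchronized data lying within the air-mass-dependent WMO limit, and the share of instruments that reach a required fraction, can be sketched as follows. This assumes the limit ±(0.005 + 0.010/m) quoted earlier; the function names and data are hypothetical.

```python
import numpy as np


def fraction_within_wmo(aod_instrument, aod_reference, airmass):
    """Fraction of synchronized points with |dAOD| <= 0.005 + 0.010/m."""
    diff = np.abs(np.asarray(aod_instrument, float) - np.asarray(aod_reference, float))
    limit = 0.005 + 0.010 / np.asarray(airmass, float)
    return float(np.mean(diff <= limit))


def share_of_instruments(fractions, required=0.95):
    """Share of instruments with at least `required` of their data inside the limit."""
    return float(np.mean(np.asarray(fractions, float) >= required))


# Hypothetical example: four synchronized points at m = 2 (limit 0.010).
ref = [0.050, 0.060, 0.055, 0.058]
inst = [0.052, 0.075, 0.051, 0.058]
f = fraction_within_wmo(inst, ref, [2.0, 2.0, 2.0, 2.0])
print(f, share_of_instruments([f, 0.97, 0.99, 0.80]))
```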
The difference in the AE between all participating instruments and the triad is shown in Figure 6. We used only the 500/865 nm channels to calculate the AEs in order to apply the same calculation principles to all instruments.
Figure 6: Difference in the Ångström exponent between each instrument and the WORCC triad. The boxes represent the 10th and 90th percentiles, while the black lines represent the minimum and maximum values of the distribution excluding the outliers. Outliers (gray dots) are values lying beyond the 10th and 90th percentiles by four times the width of the distribution at a 10% level. Box colors are only used to differentiate between instruments.
Under low aerosol conditions, a small relative bias in the AOD determination at 500 and 865 nm can theoretically lead to large deviations in the calculated AEs. For example, for AODs of about 0.05 and 0.02 at 500 and 865 nm, respectively, AOD differences of 0.01 and 0.005, respectively, can lead to AE differences of up to ~1. This was observed during FRC-IV, and Figure 6 shows that for such low AOD conditions, AEs can differ substantially. For most instruments, the median difference from the AE calculated by the triad is less than 0.5, but the 10th-to-90th-percentile range is about 0.5 for the PFRs and close to 1 for all other instruments, with the exception of the Microtops instrument retrievals, which showed a very large variability in AE difference.
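The sensitivity example above can be checked with the two-wavelength Ångström exponent, α = -ln(τ500/τ865)/ln(500/865). The sketch below perturbs the stated AODs (0.05 and 0.02) by ±0.01 and ±0.005 in all four sign combinations; with these inputs the largest change in α is about 0.86, i.e. of the order of 1 as stated. The function name is hypothetical.

```python
import itertools
import math


def angstrom_exponent(tau_500, tau_865, lam1=500.0, lam2=865.0):
    """Two-wavelength Angstrom exponent, alpha = -ln(tau1/tau2) / ln(lam1/lam2)."""
    return -math.log(tau_500 / tau_865) / math.log(lam1 / lam2)


base = angstrom_exponent(0.05, 0.02)
deviations = [
    abs(angstrom_exponent(0.05 + s1 * 0.01, 0.02 + s2 * 0.005) - base)
    for s1, s2 in itertools.product((-1, 1), repeat=2)
]
print(f"alpha = {base:.2f}, largest deviation = {max(deviations):.2f}")
```

The largest deviations occur when the two AOD errors have opposite signs, because the ratio τ500/τ865 then changes the most.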

Cloud flagging
The FRC campaign was a unique opportunity to compare the different cloud-screening algorithms used by each instrument/group. McArthur et al. (2003) reported instrument/network-related cloud-flagging differences using measurements from a three-month campaign. The use of different algorithms can lead to significant differences, and the selection of the threshold values used to filter the retrievals can lead to large deviations. For our comparison, we used one instrument of each of the main types (PFR, POM, SPO, MFR and CIMEL) and compared the number of available retrievals. More specifically, we chose to examine the instrument of each type with the largest dataset over these five days.
The cloud detection algorithms used for the above-mentioned instruments can be summarized as follows:
CIMEL: The AERONET operational cloud-screening algorithm, described by Smirnov et al. (2000), was used. It consists of temporal filtering in several steps, from minute-scale checks (triplet stability, with AOD variation < 0.02) to hourly and diurnal checks, which impose restrictions on the second derivative of AOD with time as well as on the standard deviation of AOD within the day.
PFR: Three different criteria are used (Wehrli, 2008): a. under clear skies, the derivative of the instrument signal with respect to air mass is always negative (Harrison et al., 1994); for air masses < 2, where a cloud influence on the noon side of a perturbation cannot be easily detected, the derivative is compared with the estimate for a clear Rayleigh atmosphere, and data are flagged as cloudy if the rate of change is twice as large (objective method); b. a test for optically 'thick' clouds with AOD > 2 is performed; c. the Smirnov et al. (2000) triplet criterion is applied by calculating AOD and examining the signal variability over three consecutive minutes (triplet method).
POM: The Smirnov et al. (2000) algorithm was implemented in the SUNRAD code used for the POM instruments (Estellés et al., 2012), with two main differences related to instrument characteristics. First, the minimum signal threshold is set to 5.0 × 10^-7 A. Second, triplets are built a posteriori from 1-minute data instead of the 30-second data used for the CIMEL. A further check, introduced in the current version of the processing software, removes isolated AOD data points: a given AOD point is flagged if both the previous and the next AOD values were already flagged in the standard application of the cloud-screening algorithm. A window of 15 sequential samples is used to examine the AOD time series at each wavelength; if the AOD at any wavelength is rejected by the algorithm at a given sample time, the AOD is deemed to be affected by clouds at all wavelengths.
MFR: The technique used for MFRs is described in Michalsky et al. (2010). A coarse filter is applied to ten minutes of data, examining differences first from one 20-s sample to the next and then over the entire 10-min interval. This is followed by a second, similar filter, but using allowances for variability that are scaled to the approximate value of the AOD. If the 10-min span passes both tests, the test is repeated after advancing one 20-s sample. Duplicate points resulting from processing all of the data are discarded.
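Several building blocks of the algorithms described above can be sketched compactly: a triplet-range test, the POM isolated-point rule together with the all-wavelength rejection rule, and a sliding 10-min window in the spirit of the MFR filter. This is a simplified illustration, not the operational AERONET, SUNRAD or SURFRAD code: the sliding-window form of the triplet test, the function names and all MFR thresholds (`coarse_step`, `coarse_span`, `fine_frac`) are hypothetical placeholders.

```python
def triplet_flags(aod, max_range=0.02):
    """Flag every sample in a sliding triplet whose AOD range exceeds max_range."""
    flagged = [False] * len(aod)
    for i in range(len(aod) - 2):
        triplet = aod[i:i + 3]
        if max(triplet) - min(triplet) > max_range:
            flagged[i:i + 3] = [True] * 3
    return flagged


def flag_isolated(flags):
    """Also flag a point whose previous and next points were already flagged."""
    out = list(flags)
    for i in range(1, len(flags) - 1):
        if flags[i - 1] and flags[i + 1]:
            out[i] = True
    return out


def combine_wavelengths(flags_by_wavelength):
    """A sample rejected at any wavelength is treated as cloudy at all wavelengths."""
    return [any(column) for column in zip(*flags_by_wavelength.values())]


def mfr_window_screen(aod, window=30, coarse_step=0.02, coarse_span=0.05, fine_frac=0.2):
    """Indices of 20-s samples that belong to at least one accepted 10-min window.

    window=30 samples corresponds to 10 min of 20-s data; thresholds are
    hypothetical. The second test scales its tolerance with the window-mean AOD.
    """
    accepted = set()  # a set discards duplicates from overlapping windows
    for start in range(len(aod) - window + 1):
        w = aod[start:start + window]
        max_step = max(abs(b - a) for a, b in zip(w, w[1:]))
        span = max(w) - min(w)
        if max_step > coarse_step or span > coarse_span:
            continue  # fails the coarse filter
        tolerance = fine_frac * sum(w) / len(w)
        if max_step > tolerance or span > tolerance:
            continue  # fails the AOD-scaled filter
        accepted.update(range(start, start + window))
    return sorted(accepted)


print(triplet_flags([0.05, 0.051, 0.052, 0.09, 0.05, 0.05, 0.05]))
print(flag_isolated([True, False, True, False, False]))
print(combine_wavelengths({"500": [False, True, False], "865": [False, False, True]}))
series = [0.05] * 40
series[35] = 0.30  # a hypothetical cloud spike
print(mfr_window_screen(series)[-1])
```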
We used the tool developed by Heberle et al. (2015) to visualize the coincidence of the data sets from the instruments that provided 1-minute AOD (SPO, MFR, PFR and POM) by plotting Venn diagrams (Figure 7). CIMEL instruments were not included because of their lower AOD temporal resolution. All four instruments simultaneously detected cloudless conditions in only 25% of the common measurements. The SPO has the most values that do not coincide with those of other instruments (4.9% alone, and 18% in common with only one other instrument) and the POM the fewest (0.1% and 0.8%, respectively). When considering measurements identified as cloudless by at least three of the four instruments, the SPO has the largest number of coincident measurements (69.9%), followed by the PFR (69.2%), MFR (59.9%) and POM (36.3%).
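The region sizes of such a Venn diagram reduce to counting, for each minute, how many instruments report it as cloudless. The sketch below is not the Heberle et al. (2015) tool; it assumes each instrument's cloud-screened output is a set of minute indices, and normalizing by the union of all instruments' cloudless minutes is an assumption about the denominator. The toy minute sets are invented for illustration and are not FRC-IV data.

```python
from collections import Counter
from typing import Dict, Set


def coincidence_shares(cloudless: Dict[str, Set[int]]) -> Dict[str, Dict[int, float]]:
    """For each instrument, the share of all candidate minutes (union over
    instruments) at which it reports a cloudless AOD together with exactly
    k instruments in total (k = 1 means it is the only one)."""
    union: Set[int] = set().union(*cloudless.values())
    # Number of instruments reporting each minute as cloudless.
    multiplicity = Counter(minute for minutes in cloudless.values() for minute in minutes)
    n_instruments = len(cloudless)
    shares: Dict[str, Dict[int, float]] = {}
    for name, minutes in cloudless.items():
        counts = Counter(multiplicity[minute] for minute in minutes)
        shares[name] = {k: counts[k] / len(union) for k in range(1, n_instruments + 1)}
    return shares


toy = {"SPO": {1, 2, 3, 4}, "PFR": {1, 2, 3}, "MFR": {1, 2}, "POM": {1, 5}}
s = coincidence_shares(toy)
print(s["SPO"][1])                # SPO alone cloudless → 0.2
print(s["POM"][4])                # all four agree → 0.2
print(s["SPO"][3] + s["SPO"][4])  # SPO minutes shared by >= 3 instruments → 0.4
```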
The POM has the smallest dataset, retrieving AOD for only 40% of all possible measurements (minutes in which at least one instrument provided a cloudless AOD). To investigate measurements in which only one instrument provides cloud-free minute data while all other instruments are cloud-flagged (for example, in Figure 7 the SPO has 96 such minutes out of a possible 1944 comparison minutes), we calculated an artificial AOD time series. It was constructed by spline-interpolating the mean AOD of the remaining three instruments (excluding the CIMEL, which has a lower temporal measurement frequency than the other instruments) to the times at which the fourth instrument (the SPO in this example) provides cloud-free data. The difference between the SPO retrieval and this mean AOD at 500 nm (AOD500) was found to be 20.5%. On the one hand, a 20.5% increase of AOD over one or a few minutes could be a reason for rejection (cloud-flagging) by all algorithms except that of the SPO. On the other hand, a difference of 0.006 in optical depth could be considered the limit of any attempt to separate aerosol from very thin cloud attenuation.
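The artificial reference series can be sketched as below. The text does not name the spline type, so a natural cubic spline is assumed and written in plain Python; expressing the 20.5% as a mean relative difference with respect to the interpolated series is also an assumption. The function names are hypothetical and the data are toy values.

```python
import bisect
from typing import Callable, Dict, List, Sequence


def natural_cubic_spline(x: Sequence[float], y: Sequence[float]) -> Callable[[float], float]:
    """Natural cubic spline through (x, y); x strictly increasing, len >= 2."""
    n = len(x) - 1
    h = [x[i + 1] - x[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives, zero at both ends (natural spline)
    if n >= 2:
        # Tridiagonal system for m[1..n-1], solved with the Thomas algorithm.
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for k in range(1, n - 1):
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def evaluate(t: float) -> float:
        # Interval index, clamped so that points outside the range are extrapolated.
        j = min(max(bisect.bisect_right(x, t) - 1, 0), n - 1)
        a, b = x[j + 1] - t, t - x[j]
        return ((m[j] * a ** 3 + m[j + 1] * b ** 3) / (6.0 * h[j])
                + (y[j] / h[j] - m[j] * h[j] / 6.0) * a
                + (y[j + 1] / h[j] - m[j + 1] * h[j] / 6.0) * b)

    return evaluate


def artificial_aod(others: List[Dict[int, float]], target_minutes: List[int]) -> Dict[int, float]:
    """Mean AOD of the other instruments at minutes where all of them report
    data, spline-interpolated to the minutes reported by the target instrument."""
    common = sorted(set.intersection(*(set(o) for o in others)))
    mean_aod = [sum(o[t] for o in others) / len(others) for t in common]
    spline = natural_cubic_spline(common, mean_aod)
    return {t: spline(t) for t in target_minutes}


def mean_relative_difference(target: Dict[int, float], reference: Dict[int, float]) -> float:
    """Mean of (target - reference) / reference over the reference minutes, in %."""
    diffs = [(target[t] - reference[t]) / reference[t] for t in reference]
    return 100.0 * sum(diffs) / len(diffs)


# Toy data: three instruments with the same, linearly rising AOD, all missing minute 3.
others = [{t: 0.05 + 0.001 * t for t in (0, 1, 2, 4, 5)} for _ in range(3)]
ref = artificial_aod(others, [3])
print(round(ref[3], 6))                                       # → 0.053
print(round(mean_relative_difference({3: 0.0636}, ref), 3))  # → 20.0
```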
In Table 3, we calculated a score for each instrument by dividing its number of available retrievals by the total of 1944 possible comparison cases (minutes in which at least one of the instruments provided a cloud-free AOD value). For the CIMEL, whose measurements are not made every minute, we counted all recordings in the raw data and divided the number of cloud-screened data points by this count, so its score is not directly comparable with those of the other instruments. The POM instruments obtained the lowest score in the cloud screening application, mainly because of the stringent isolation check added to the adapted Smirnov et al. (2000) algorithm.

Figure 8 (partial caption): The black line is the mean AOD from the PFR, MFR and SPO for data points when all three instruments provided data. The gray vertical lines represent periods where the PFR, MFR and SPO provided data but the POM characterized them as "cloudy".

Figure 8 shows AOD measurements at 500 nm during one single day for all instruments whose cloud-flagging algorithms were tested. As seen in Table 3, the POM instrument cloud-flags a number of minutes that all other instruments/algorithms accept as cloudless. Such instances appear in Figure 8 as gray areas and represent periods when the PFR, SPO and MFR instruments all provide AOD (i.e., they do not "detect" any clouds) while the POM does not.
Despite the small instrument-to-instrument differences, the evolution of the AOD during these periods (gray areas), also described by the mean or artificial AOD, indicates that they cannot be considered cloud-affected. Thus, the POM algorithm is probably too strict compared with the others. In addition, sporadic high SPO AOD values after 14:00 (at times when no other instrument provides cloudless data) show that under these conditions the SPO cloud-flagging algorithm was less precise.

AOD retrieval differences
For the present intercomparison, no common procedures were used for the removal of gas-phase constituents or Rayleigh scattering; cloud screening, solar position, timing and calibration methodology were at the discretion of the network operators. Datasets from each sun photometer network were corrected for these factors independently. Figure 9 identifies some of the possible discrepancies that may result from the treatment of NO2, ozone, Rayleigh scattering, other trace gases and H2O in the atmosphere (Thome et al., 1992) at 500 and 870 nm during the 4th FRC. […] compared. Furthermore, the respective algorithms for the calculation of the solar zenith angle and air mass at any given time (as provided by the responsible scientists of each instrument) were employed. NO2 absorption was considered only for the POM (fixed vertical column density of 0.218 DU for mid-latitude summer; method and cross-sections from Gueymard, 1995b and Gueymard, 2001) and CIMEL instruments (SCIAMACHY monthly climatology; cross-sections from Burrows et al., 1998), and only for AOD retrieval at 500 nm. Ozone absorption was taken into account by all instruments at 500 nm, but was not accounted for by the CIMEL at 870 nm. Different ozone amounts (measured value of 314 DU for the PFR and SPO; fixed value of 300 DU for the POM; OMI climatology for the CIMEL) and cross-sections (Gueymard, 1995 for the PFR; Gueymard, 1995b and Gueymard, 2001 for the POM; Burrows et al., 1999 for the CIMEL; a custom set of ozone coefficients for the SPO) were adopted. The Rayleigh scattering coefficients of Bodhaine et al. (1999) are used by all networks except the SPO, which uses those of Bucholtz (1995).
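How these choices propagate into the AOD can be illustrated with the Beer-Lambert-Bouguer decomposition: the total optical depth from the measured and zero-air-mass signals, minus the Rayleigh and gas absorption terms. This is a simplified sketch, not any network's algorithm: it assumes V0 is already corrected to the actual Earth-Sun distance and uses a single air mass for all components, and the ozone absorption coefficient is an input whose example value (0.03 per atm-cm) is an illustrative order of magnitude near 500 nm, not a value from the text.

```python
import math


def total_optical_depth(v: float, v0: float, airmass: float) -> float:
    """Beer-Lambert-Bouguer: V = V0 * exp(-tau * m)  =>  tau = ln(V0 / V) / m."""
    return math.log(v0 / v) / airmass


def aerosol_optical_depth(
    tau_total: float,
    tau_rayleigh: float,
    ozone_du: float,
    ozone_coeff: float,  # ozone absorption coefficient in (atm-cm)^-1, assumed input
    tau_no2: float = 0.0,
    tau_h2o: float = 0.0,
) -> float:
    """Subtract the non-aerosol components (1000 DU = 1 atm-cm)."""
    tau_o3 = ozone_coeff * ozone_du / 1000.0
    return tau_total - tau_rayleigh - tau_o3 - tau_no2 - tau_h2o


k_o3 = 0.03  # illustrative only
tau = total_optical_depth(v=math.exp(-0.6), v0=1.0, airmass=2.0)  # 0.3
aod_314 = aerosol_optical_depth(tau, 0.14, 314.0, k_o3)  # measured ozone column
aod_300 = aerosol_optical_depth(tau, 0.14, 300.0, k_o3)  # fixed ozone column
print(round(aod_300 - aod_314, 5))  # → 0.00042
```

With these assumed numbers, the 14 DU difference between the measured and fixed ozone columns shifts the AOD by only about 0.0004.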
Pressure was measured (845.7 hPa) by the PFR control box, while for the POM it was fixed and corrected for altitude z (840 hPa) using the formula:

$$p = 1013.25\,\exp(-0.0001184\,z) \qquad (1)$$

Finally, water vapour is taken into account only by the POM instruments, using a fixed value for the summer season that is additionally corrected for altitude with the following formula based on Gueymard (1995) data:

$$w = 2.9816\,\exp(-0.552\,z)$$

where w is the precipitable water in cm and z is the altitude in km. The method for deriving the corresponding H2O optical depth is also adopted from Gueymard (2001). Results of this comparison exercise are shown in Figure 9.

The analyzed factors result in discrepancies of comparable magnitude at 500 nm, although differences in the corrections for Rayleigh scattering and water vapour have a slightly larger effect. At 870 nm, the larger discrepancies can be ascribed to different parametrizations of ozone absorption and Rayleigh scattering. For the MFR instrument, the effective wavelength of the "500-nm" filter is about 495.8 nm, which explains its higher Rayleigh optical depth and lower ozone optical depth. The deviations between algorithms can be of either sign and can partially compensate each other in AOD calculations. Finally, NO2-related differences were 0.002 to 0.004 at 500 nm, at a location (Davos) with very low NO2 columnar concentrations. The error in the (vertical) AOD resulting from differences between the algorithms (obtained by dividing the differences in the slant optical depths by the air mass factor) did not exceed 0.005 for the selected day. This value is far below the traceability threshold and can thus be considered negligible.
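The altitude corrections above, together with the pressure scaling of the Rayleigh optical depth, can be sketched as follows. Assumptions: z in Eq. (1) is taken in metres (inferred, since z ≈ 1590 m then reproduces the 840 hPa quoted for the POM), the 1590 m altitude is an illustrative value, and the Rayleigh term uses the widely quoted Bodhaine et al. (1999) fit for 1013.25 hPa scaled linearly with pressure. The function names are hypothetical.

```python
import math


def station_pressure_hpa(z_m: float) -> float:
    """Eq. (1): p = 1013.25 * exp(-0.0001184 * z), z assumed in metres."""
    return 1013.25 * math.exp(-0.0001184 * z_m)


def precipitable_water_cm(z_km: float) -> float:
    """w = 2.9816 * exp(-0.552 * z), z in km, w in cm."""
    return 2.9816 * math.exp(-0.552 * z_km)


def rayleigh_od_bodhaine(wavelength_um: float, pressure_hpa: float) -> float:
    """Bodhaine et al. (1999) fit at 1013.25 hPa, scaled with surface pressure."""
    l2 = wavelength_um ** 2
    tau_std = 0.0021520 * (1.0455996 - 341.29061 / l2 - 0.90230850 * l2) / (
        1.0 + 0.0027059889 / l2 - 85.968563 * l2
    )
    return tau_std * pressure_hpa / 1013.25


z = 1590.0  # illustrative site altitude in metres
print(round(station_pressure_hpa(z), 1))           # → 839.4
print(round(precipitable_water_cm(z / 1000.0), 2))  # → 1.24
# Measured (845.7 hPa) versus fixed (840 hPa) pressure at 500 nm:
d = rayleigh_od_bodhaine(0.5, 845.7) - rayleigh_od_bodhaine(0.5, 840.0)
print(round(d, 4))                                  # → 0.0008
```

Under these assumptions the pressure choice alone shifts the 500 nm Rayleigh optical depth by about 0.0008.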

Summary and conclusions
Results from the FRC-IV have been presented in this study. Based on the number of instruments and on the participation of reference sun photometers/instruments from various global AOD networks, the campaign can be considered a successful experiment in assessing the current status of AOD measurement accuracy and precision. The WMO recommendations for AOD comparisons were adopted for the present campaign, and the WORCC PFR triad was used as the reference.
The absolute differences of all instruments relative to the reference triad have been reported and evaluated against the WMO criterion: "95% of the measured data has to be within ±(0.005 + 0.010/m)", where m is the air mass. At least 24 of 29 instruments achieved this goal at 500 and 865 nm, while 12 of 17 and 13 of 21 achieved it at 368 and 412 nm, respectively.
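The traceability test amounts to counting how many AOD differences fall inside an air-mass-dependent band. A minimal sketch, assuming paired differences (instrument minus reference triad) with their air masses; the limit is written as a + b/m with the commonly cited WMO values a = 0.005 and b = 0.010, and the function names and toy data are hypothetical.

```python
from typing import Sequence


def wmo_limit(airmass: float, a: float = 0.005, b: float = 0.010) -> float:
    """Allowed absolute AOD difference a + b/m at air mass m."""
    return a + b / airmass


def fraction_within_limits(diffs: Sequence[float], airmasses: Sequence[float]) -> float:
    """Share of differences whose magnitude lies inside the air-mass-dependent limit."""
    inside = sum(1 for d, m in zip(diffs, airmasses) if abs(d) <= wmo_limit(m))
    return inside / len(diffs)


def meets_wmo_criterion(diffs: Sequence[float], airmasses: Sequence[float],
                        required: float = 0.95) -> bool:
    return fraction_within_limits(diffs, airmasses) >= required


m = [2.0] * 20                   # limit = 0.005 + 0.010/2 = 0.010
ok = [0.0] * 19 + [0.02]         # 19/20 = 95% inside
bad = [0.0] * 18 + [0.02, 0.02]  # 18/20 = 90% inside
print(meets_wmo_criterion(ok, m), meets_wmo_criterion(bad, m))  # → True False
```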
The statistics from the Taylor diagram analysis revealed the overall accuracy and homogeneity of the instruments. In particular, the majority of instruments gave correlation coefficients (CCs) > 0.98 and a normalized standard deviation in the range 0.75 to 1 relative to the triad, at all wavelengths. The similarity of results and the high accuracy of the PFR, CIM and POM instruments demonstrate a promising framework for achieving network homogeneity of AOD measurements in the near future. The PSR spectroradiometers and the SIM and SPO filter radiometers also had CCs above 0.96 under all conditions. Ångström exponent (AE) calculations using the 500 nm/865 nm wavelength pair showed relatively large differences among instruments. This was largely related to the sensitivity of this parameter under low-AOD conditions: AOD differences of about 0.01 at 500 nm, which can easily be attributed to instrument calibration uncertainties, can considerably affect such calculations when AOD is low. Hence, this campaign reaffirms that for mean AOD500 < 0.1, the calculation of AE becomes highly uncertain.
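The sensitivity argument can be made concrete with the standard two-wavelength definition α = −ln(τ₁/τ₂) / ln(λ₁/λ₂). In this minimal sketch the AOD values are invented and the 0.01 offset mirrors the calibration-level difference mentioned above; the function names are hypothetical.

```python
import math


def angstrom_exponent(tau1: float, tau2: float,
                      wl1_nm: float = 500.0, wl2_nm: float = 865.0) -> float:
    """alpha = -ln(tau1 / tau2) / ln(wl1 / wl2) for a wavelength pair."""
    return -math.log(tau1 / tau2) / math.log(wl1_nm / wl2_nm)


def ae_shift(tau500: float, tau865: float, delta500: float = 0.01) -> float:
    """Change in AE when the 500 nm AOD is offset by delta500 (e.g. calibration)."""
    return angstrom_exponent(tau500 + delta500, tau865) - angstrom_exponent(tau500, tau865)


# Same spectral shape (AE of about 0.93), different aerosol loads:
print(round(ae_shift(0.05, 0.03), 2))  # low AOD  → 0.33
print(round(ae_shift(0.50, 0.30), 2))  # high AOD → 0.04
```

The same 0.01 offset changes the AE almost ten times more at AOD500 = 0.05 than at AOD500 = 0.5.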
Investigating the sources of differences among instruments, we compared all parameters included in the AOD retrieval algorithms as provided by the different participating institutes. All individual differences (Rayleigh, NO2, ozone and water vapour optical depths, and air mass calculations) amounted to less than 0.01 in AOD at 500 and 865 nm.
Different cloud-flagging algorithms can affect the AOD datasets as different instruments/networks use different techniques.
During a day with sporadic high- and mid-level clouds (deliberately chosen as a "difficult" case for such algorithms), differences between the cloud-flagging algorithms reduced the AOD datasets available for comparison between two instruments to 40% to 90% of the maximum number of cloudless data points calculated by all instruments, depending on the instrument pair. In general, when long-term series are used to determine the aerosol climatology at a given location, overly conservative cloud screening could eliminate local high-AOD events, while screening that fails to eliminate cloud contamination will introduce biases linked mainly to cirrus clouds. Both approaches will have an impact on the aerosol climatology and on calculated AOD trends.

In comparison with earlier FRCs (I to III), the FRC reported here saw an increase in both the number of instruments (30 in total) and the number of participating international institutes (from 12 countries). In addition, analysis at four different wavelengths was performed for the first time. The participating CIMEL/AERONET, PFR/GAW, POM/SKYNET and SPO sun photometers showed very good agreement in comparison with earlier intercomparisons. As the AOD differences arising from the retrieval algorithms were quite small, the comparisons within this instrument group are considered very successful, with differences in most cases well within the calibration and overall instrument AOD uncertainties. The remaining instruments also showed reasonable agreement, with few exceptions. The MFR instruments experienced additional uncertainties related to their diffuser-based measurements. The SIM instruments also performed quite well, considering their radiative-transfer-based processing algorithm. In addition, the PSR instruments, which retrieve spectral AOD, also performed well, especially at the two longer wavelengths.
Finally, Microtops AOD data were in most cases in reasonable agreement with the reference triad, but additional technical issues, such as hand-held sun pointing and the shorter integration time of the direct sun measurement (compared with other instruments), led to increased scatter in the results.
Instrument technical features such as differences in field of view did not play an important role in FRC-IV under the low aerosol load conditions encountered. To quantify such effects, intercomparison campaigns have to be organized under moderate to high AOD conditions, when forward-scattered and circumsolar radiation can play an important role for instruments with different field-of-view entrance optics.
The results of the FRC-IV, which included a large variety of AOD measuring instrumentation via the participation of reference instruments from AERONET Europe, SKYNET, GAW-PFR, SURFRAD and the Australian aerosol network, could be considered as a starting point for global AOD homogeneity initiatives. The ultimate objective is a unified AOD product to be used for long term aerosol and radiative forcing studies, case studies involving accurate AOD retrievals, and satellite validation related activities.