Multi-model evaluation of short-lived pollutant distributions over East Asia during summer 2008

The ability of seven state-of-the-art chemistry-aerosol models to reproduce distributions of tropospheric ozone and its precursors, as well as aerosols, over eastern Asia in summer 2008 is evaluated. The study focuses on the performance of models used to assess impacts of pollutants on climate and air quality as part of the EU ECLIPSE project. Models, run using the same ECLIPSE emissions, are compared over different spatial scales to in-situ surface, vertical profile and satellite data. Several rather clear biases are found between model results and observations, including overestimation of ozone at rural locations downwind of the main emission regions in China as well as downwind over the Pacific. Several models produce too much ozone over polluted regions, which is then transported downwind. Analysis points to different factors related to the ability of models to simulate VOC-limited regimes over polluted regions and NOx-limited regimes downwind. This may also be linked to biases compared to satellite NO2, indicating overestimation of NO2 over and to the north of the North China Plain emission region. On the other hand, model NO2 is too low to the south and east of this region and over Korea/Japan. Overestimation

Introduction
Short-lived pollutants (SLPs), defined here as tropospheric ozone and aerosols, including black carbon (BC), are the focus of several important efforts by the scientific community due to their potential role in emerging strategies aiming to mitigate global climate change and improve air quality (Shindell et al., 2012; Anenberg et al., 2012). Due to their relatively short lifetimes (e.g., the aerosol lifetime in the troposphere is about one week; Pruppacher and Klett, 1997), the impact of reductions in SLP (as well as ozone precursor) emissions on near-term reductions in the rate of climate warming has been examined in several recent studies (Ramanathan and Carmichael, 2008; Jackson, 2009; Penner et al., 2010; Shoemaker et al., 2013; Smith and Mizrahi, 2013; Rogelj et al., 2014).
Ozone is a reactive species impacting both climate and air quality. In the troposphere, it is produced photochemically from the oxidation of carbon monoxide (CO) and volatile organic compounds (VOCs) by OH radicals in the presence of nitrogen oxides (NOx). Methane is also an important ozone precursor. Tropospheric ozone also has natural sources, such as the flux from the stratosphere. Due to photochemical loss, it has a lifetime in the lower troposphere of a few weeks (Stevenson et al., 2006).
It is also removed by dry deposition to the surface. Radiative forcing due to tropospheric ozone over the industrial era is estimated to be 0.40 ± 0.20 W m−2 (Myhre et al., 2013b).

Atmospheric aerosol plays a major role in the Earth's radiative balance by scattering (McCormick and Ludwig, 1967) and absorbing solar radiation (Haywood and Shine, 1995). Aerosols also affect the formation, lifetime and albedo of clouds (Albrecht, 1989; Twomey, 1977; Ackerman et al., 2000), causing indirect effects on the radiative balance. According to recent estimates, since pre-industrial times atmospheric aerosols emitted by anthropogenic and natural sources (e.g., heating, transportation, biomass burning and dust) have modified the radiative balance through the aerosol direct effect by -0.35 ± 0.50 W m−2, whereas the total (direct and indirect) effects, which include cloud adjustments due to aerosols, have modified the Earth's radiative balance by -0.9 (-1.9 to -0.1) W m−2 (Myhre et al., 2013b). This total results from a negative forcing by most aerosols and a positive contribution from BC absorption of solar radiation (Haywood and Shine, 1995). BC is the carbonaceous component of soot, resulting from incomplete combustion. In a recent extensive study, Bond et al. (2013) estimated the direct radiative forcing of BC from fossil fuel and biofuel emissions for the industrial era to be 0.51 W m−2, whereas Myhre et al. (2013b) reported a positive forcing of 0.40 (0.05-0.80) W m−2. BC emissions are almost entirely anthropogenic, and 90% of BC emissions are due to diesel engines, industry, residential burning and open burning (Bond et al., 2013).

The impact of SLPs on air quality occurs at both the local and regional scales, with episodes of high concentrations of pollutants, notably ozone and aerosol particulate matter (PM), having serious effects on human health and leading to premature deaths (Nawahda et al., 2012). For example, Anenberg et al. (2012) estimated that air pollution caused around a quarter of a million deaths from lung cancer worldwide in 2010. The air quality impacts of PM, which is classified as carcinogenic to humans, depend not only on the total mass concentration of PM but also on particle size. Aerosol composition also appears to play a role, with individual components such as BC having an impact on, for example, cardiovascular mortality, but further quantification is still required (Janssen et al., 2012). Aerosols also pose a serious problem by reducing visibility, sometimes dramatically in the case of China (e.g., Zhao et al., 2011; Chen et al., 2012; Li et al., 2013).
East Asia is a key region targeted by SLP mitigation strategies due to recent rapid increases in precursor emissions (Streets et al., 2003; Richter et al., 2005; Klimont et al., 2013; Wang et al., 2014; Klimont et al., 2016), which contribute to regional and global radiative forcing, severe episodes of air pollution and other environmental impacts (Ma et al., 2010). Exceedances of national ambient air quality standards occur in many cities (Shao et al., 2006), especially in eastern China (Wang et al., 2011b; Yang et al., 2011; Chan and Yao, 2008; Ma et al., 2012; Boynard et al., 2014), over the North China Plain (NCP, including Beijing and Tianjin), the Yangtze River Delta (YRD, including Shanghai) and the Pearl River Delta (PRD, including Hong Kong and Guangzhou). In this context, the European Union (EU) Evaluating the CLimate and air quality ImPacts of Short-livEd pollutants (ECLIPSE) project developed new emission inventories for present-day global SLP emissions, as well as future scenarios designed to benefit both air quality and climate, with a focus on Asia and Europe (see Stohl et al. (2015) for a discussion of the ECLIPSE rationale and a summary of results).

The ECLIPSE inventory was developed for methane, aerosols, ozone and their precursors, including, in particular, improvements over China and India, where several sources, such as brick-making kilns, were updated and previously unaccounted-for sources, such as wick lamps, diesel generators and high-emitting vehicles, were included. These emissions were used to perform a detailed analysis of climate metrics for different emission sectors, regions (including China and India) and seasons using state-of-the-art Earth System Models (ESMs). The results were used as a basis for refining a mitigation scenario by including additional measures with beneficial air quality and short-term (20-year) climate impacts. Compared to the ECLIPSE current legislation scenario, which takes into account current and planned legislation for emission reduction, the ECLIPSE mitigation scenario, which includes these additional measures (e.g., on gas flaring, diesel engines, and coal/biomass stoves), would reduce global anthropogenic methane and BC emissions by 50 and 80%, respectively, by 2050. It is estimated that, in the decade 2041-2050, the mitigation scenario would result in 0.2 K less surface warming globally (Stohl et al., 2015) and, at the same time, extend life expectancy in China, for example, by 1.8 months in 2030.
An important component of ECLIPSE is the so-called reality check to evaluate model performance over pollutant source regions (Europe (Stohl et al., 2015) and China/Asia, the focus of this paper) and receptor regions (the Arctic; Eckhardt et al., 2015). In these evaluations, the ECLIPSE models were run with the same present-day ECLIPSE emission inventory (ECLIPSEv4a) for 2008 and 2009. Note that the same global models were used to estimate sector/regional emission responses and, in a subset of cases, to predict future atmospheric composition and associated impacts on climate and air quality using the ECLIPSE emission scenarios. Due to their coarse spatial resolution, the ECLIPSE global chemistry-climate models may not be the most suitable tools to assess air quality impacts; however, they are the tools used to evaluate climate and air quality impacts together. To address this point, a regional model is also included in the evaluation, and one of the global chemical-transport models is run at relatively high horizontal resolution (50 km) compared to the other global models. The ECLIPSE evaluation over Europe showed that many models underestimate CO and overestimate ozone, whilst modeled AOD was reproduced reasonably well (Stohl et al., 2015). Over the Arctic, models often underestimate both BC and sulfate aerosols due to problems with emissions (e.g., fires), vertical redistribution, transport and loss processes such as wet deposition.

Here, we present results from the evaluation of the ECLIPSE models over East Asia. As noted above, this region was targeted because of its still high pollution levels and climate impacts, and because SLP mitigation options are being actively considered there. It is also a region where significant uncertainties surround model estimates of radiative forcing. For example, Kinne et al. (2006) showed important underestimation of observed AOD by multiple models over East Asia in summer and pointed out that uncertainties in the direct radiative forcing could be larger than inter-model differences in AOD suggest. Even in the recent AeroCom model comparison, inter-model variation in radiative forcing is largest in this region (Myhre et al., 2013a). Samset et al. (2014) suggested that BC direct radiative forcing is overestimated by about 25% downwind of Asian emissions in the upper troposphere over the Pacific, based on overestimation of modeled BC compared to aircraft observations.

The ECLIPSE model evaluation over East Asia focuses on the summer period (August and September 2008). This was motivated by the availability of intensive observations from the CAREBEIJING 2008 measurement campaign (Huang et al., 2010; Zhang et al., 2014) and by the fact that severe ozone pollution episodes occur over the NCP at this time of year, even though the maximum generally occurs earlier, in late spring, for trace gases (Naja and Akimoto, 2004; Li et al., 2007; He et al., 2008; Safieddine et al., 2013) and aerosols (Cao et al., 2004; Sun et al., 2004; Yang et al., 2005; Huang et al., 2006). During the summer months, the monsoon circulation brings cleaner air from the Pacific Ocean into southern and eastern Asia, reducing pollutant concentrations (Lin et al., 2009; Kim et al., 2007). However, high pollution episodes with enhanced aerosol concentrations and decreasing visibility still occur in coastal regions, because increases in relative humidity increase aerosol sizes (Flowers et al., 2010). The monsoon flux also transports high ozone concentrations inland from coastal city emissions (He et al., 2008).

In order to assess model performance over East Asia for air quality as well as climate, we use a variety of datasets covering the urban, regional and continental scales. Ozone, aerosol and precursor data at surface sites in urban and rural locations are used, together with CAREBEIJING aircraft data collected in the lower troposphere south of Beijing, to evaluate model performance in terms of local and regional pollution from major emission regions. Continental-scale horizontal and vertical transport of ozone and aerosols, important for radiative impacts, is assessed downwind of the main emission regions using ground-based aerosol lidar data, as well as satellite aerosol lidar and tropospheric ozone, CO and NO2 column data. We examine the effects of the emission reductions implemented in the Beijing region during the 2008 Olympic period on atmospheric composition using a regional model, in order to assess potential influences on the model results compared to data collected in the Beijing region.
The emissions, models and datasets used to assess model performance, as well as the meteorological situation during summer 2008, are described in Sect. 2. The evaluation of simulated ozone and its precursors on local, regional and continental scales is presented in Sect. 3. This includes comparison with Infrared Atmospheric Sounding Interferometer (IASI) CO and ozone data, Global Ozone Monitoring Experiment-2 (GOME-2) NO2 data, surface/aircraft data collected in the vicinity of Beijing, and surface trace gas data collected at downwind sites in Korea and Japan. Comparisons of modeled and observed trace gas correlations are used to draw conclusions about whether model discrepancies are due to emissions, chemical processing (VOC- or NOx-limited ozone production) and/or transport. Comparisons between observed and modeled aerosol optical properties, as well as available surface/aircraft data on aerosol chemical composition, are discussed in Sect. 4. This includes comparison with Moderate Resolution Imaging Spectroradiometer (MODIS) AOD, Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) and ground-based lidar attenuated backscatter profiles, and aerosol surface composition. Conclusions are given in Sect. 5.

Models, evaluation datasets and meteorological conditions
In this section, the global and regional models involved in this study are presented, together with the different measurement datasets used to evaluate their performance, including satellite data, ground-based and airborne measurements. The meteorological conditions over East Asia during summer 2008 are also discussed, including possible biases in model transport patterns.

Model descriptions and emission dataset
The main model characteristics are listed in Table 1. Models were run with ECLIPSEv4a present-day anthropogenic emissions, including agricultural waste burning, for the year 2008 (Klimont et al., 2016). Whilst ECLIPSEv4a emissions are annual averages for most sectors, a seasonal cycle is applied to the domestic sector (Streets et al., 2003). Emission reductions associated with the mitigation strategies during the Olympic period, mentioned in Sect. 1, are not taken into account in the ECLIPSE anthropogenic emissions. Wildfire emissions were taken from GFED 3.1 (van der Werf et al., 2010) and aircraft and shipping emissions from the RCP 6.0 scenario (Lee et al., 2009; Buhaug et al., 2009), whereas biogenic emissions were prescribed individually by each model (Table 1). Dust, sea salt and dimethyl sulfide (DMS) emissions were also model-dependent. WRF-Chem provides online dust and sea-salt emissions, but only the latter are used in the ECLIPSE simulations due to an overestimation of dust loads, as reported by Saide et al. (2012). The main dust sources in East Asia are located in dry regions of China and Mongolia, north of the Himalayas (Taklamakan, Gobi and Gurbantunggut deserts). Most dust events occur in spring (Huang et al., 2013), whilst in summer, due to the Asian summer monsoon flux, rather little dust is transported to coastal areas (Kim et al., 2007). Thus, neglecting this source in the WRF-Chem summertime simulations is not expected to introduce a large bias in modeled aerosol loads. Global model simulations were conducted for 2008 with a one- or two-year spin-up (depending on the model), whereas the regional WRF-Chem simulation covered August and September 2008 with a 10-day spin-up, using initial and boundary chemical conditions from the MOZART-4 model (Emmons et al., 2010).
Models were run at horizontal resolutions ranging from around 50 km to 250 km, with 26 to 60 vertical levels.

Satellite observations
Several satellite datasets have been used in this evaluation, since they provide useful information about continental-scale spatial distributions of pollutants and their precursors. The IASI sensor mounted onboard the MetOp-A platform has provided data since June 2007. It is a nadir-looking Fourier transform spectrometer operating in the thermal infrared spectral range (645-2760 cm−1) (Clerbaux et al., 2009) that can detect several trace gases, including ozone and CO. The MetOp-A orbit is sun-synchronous, and IASI provides complete coverage of the Earth's surface every day. However, clouds may affect the signal and lead to errors in the retrieved data. The software used for the retrieval of CO and ozone global distributions is the Fast Optimal Retrievals on Layers for IASI (FORLI; Hurtmans et al., 2012). GOME-2 (Munro et al., 2000), also onboard the MetOp-A satellite, is a nadir-looking spectrometer covering the spectral range between 240 and 790 nm at 0.2-0.4 nm resolution.
With its large swath of 1920 km, GOME-2 provides near-global daily coverage. The GOME-2 sensor uses the Differential Optical Absorption Spectroscopy (DOAS) technique to observe the atmosphere, and tropospheric NO2 columns are retrieved using the algorithm developed by Boersma et al. (2004). Space-borne AOD observations are collected by the MODIS instruments onboard two satellites, Aqua and Terra, which fly in opposing orbits, providing global coverage of the Earth every 1-2 days. The MODIS level 3 products used in this study are described by Hubanks et al. (2008). Vertical distributions of aerosols and clouds are probed with the CALIOP instrument mounted on the CALIPSO satellite, part of the A-train satellite constellation. CALIOP is a two-wavelength (532 and 1064 nm) polarisation-sensitive lidar, as described by Winker et al. (2007).

Ground-based data
Surface data collected at urban, rural and remote sites in China, Japan and Korea from SNU/EANET (Seoul National University / Acid Deposition Monitoring Network in East Asia) and Peking University (PKU) stations were used in this study. Site locations are shown in Fig. 1 and station coordinates are given in Table 2. During the CAREBEIJING 2008 campaign, for example, SLP concentrations were measured at the PKU air quality observatory in Beijing, which can be considered a typical urban environment. Instrumentation deployed at the observatory measured ozone, NOx (defined as the sum of NO and NO2) and CO (Chou et al., 2011), as well as particulate matter (PM2.5 and PM10), organic carbon (OC) and BC (Huang et al., 2010). The Gosan observatory (Kim et al., 2005) is a long-term observatory located on Jeju Island, South Korea, measuring OC, BC, aerosol number size distributions (Flowers et al., 2010), and NOx, sulphur dioxide (SO2), CO and ozone concentrations. It is not influenced by local pollutant emissions and samples air masses transported downwind of continental Asia (e.g., Kim et al., 2005). Models were also compared to vertical aerosol backscatter signals measured by the ground-based lidar network of the Japanese National Institute for Environmental Studies (NIES) (Shimizu et al., 2004; Sugimoto et al., 2008), which covers Japan with 10 lidars calibrated using a similar procedure. Backscatter data are available online on an hourly basis, allowing a robust evaluation of the models.

Airborne data
As part of the CAREBEIJING 2008 campaign, 12 scientific flights were performed over the area south of Beijing in Hebei province. Flights followed linear routes at altitudes in the range 500-2100 m in order to sample both the boundary layer and the free troposphere. The instrumentation on board the aircraft is described by Zhang et al. (2014) and included ozone, CO, SO2 and NOx instruments. The flight tracks are shown in blue in Fig. 1. One goal of the CAREBEIJING 2008 campaign, which also included surface measurements, was to examine the effects of additional local emission mitigation from June to September 2008 in Beijing municipality (Wang et al., 2010), which appears to have been significant locally. Wang and Xie (2009) and Zhou et al. (2010) observed reductions of 19 to 57% in CO and 28 to 52% in PM10 on-road emissions in Beijing, whereas Wang et al. (2009) reported a decrease of 21% in summer 2008 CO observations, compared to 2006 and 2007, at a site 100 km from the centre of Beijing, and concluded, based on a model analysis, that ozone concentrations were reduced by 2-10 ppbv over the NCP region during the mitigation period. In contrast, Worden et al. (2012) deduced only an 11% reduction in CO emissions over Beijing based on analysis of satellite data. We address this point by using the WRF-Chem model to run a sensitivity test with lower emissions (Sect. 3.6).

Meteorological context
The majority of ECLIPSE models were driven by or nudged to various meteorological analyses from ECMWF (European Centre for Medium-Range Weather Forecasts), with only WRF-Chem being nudged to NCEP (National Centers for Environmental Prediction) FNL (final) analyses. To illustrate average transport patterns during August and September 2008, surface relative humidity and winds over the region are shown in Fig. 1. At this time of year, the flow over the southern part of East Asia is influenced by the Asian summer monsoon, with dominant synoptic winds blowing from the south-east linked to the anticyclonic circulation over the Pacific Ocean to the east. This also leads to high relative humidity over the southern part of the region.
ECMWF and NCEP wind fields are rather similar, suggesting that differences in large-scale transport patterns are not the main cause of the differences in trace gases and aerosols discussed in later sections. The NorESM climate model is an exception, since it was forced with sea surface temperatures for 2008 rather than nudged to meteorology. It was included in this evaluation to provide consistency with companion studies where this model was used to estimate present-day and future emission impacts on air quality and climate (e.g., Stohl et al., 2015). This model has a monsoon circulation which penetrates further over East Asia than in the ECMWF and NCEP analyses, resulting in higher surface relative humidities over this region.
This may affect ozone photochemistry as well as aerosol formation (Sect. 3.6 and Sect. 4.3).

Interpretation of differences in modeled trace gas distributions
In this section, modeled ozone and precursors (CO, NO2) are evaluated at local, regional and continental scales to examine model performance on scales relevant for regional air quality and regional/global climate impacts. First, large-scale spatial distributions of modeled ozone, CO and NO2 are compared to IASI and GOME-2 satellite data. Lower tropospheric ozone is evaluated against 0-6 km IASI columns, and 0-20 km IASI columns are used to assess whether differences in downward transport from the stratosphere could be influencing modeled ozone over East Asia. IASI CO and GOME-2 NO2 are used to evaluate performance over and downwind of emission regions. Second, to evaluate modeled ozone and its precursors on a regional scale, results are compared to surface measurements from various Chinese and Korean stations, as well as to vertical distributions observed by aircraft in the lower troposphere during CAREBEIJING. Further analysis of trace gas ratios and ozone diurnal cycles is used to provide insights into whether modeled discrepancies are due to deficiencies in emissions, photochemical processing or transport in the models (Sect. 3.6). We also examine, using one model (WRF-Chem), the potential impact of emission reductions over Beijing during the study period.

IASI ozone columns
Day- and night-time observations of IASI ozone are used to evaluate the models. Due to the variation of the IASI sensor sensitivity with altitude, modeled ozone profiles need to be smoothed using the following equation:

$$X_{\mathrm{smooth}} = \mathbf{AK}\,X_{\mathrm{model}} + \left(\mathbf{I} - \mathbf{AK}\right) X_{\mathrm{apriori}} \qquad (1)$$

where AK is the averaging kernel matrix, I is the identity matrix, and X_smooth, X_model and X_apriori are the smoothed, modeled and a priori ozone profiles, respectively. AK and X_apriori are obtained when inverting the measured signal. The 0-20 km column is obtained by summing the smoothed profile over all layers between 0 and 20 km. AK is a 40 × 40 matrix, and when it is multiplied by X_model, every layer has an influence on the 39 other layers. Here, the 0-20 km column excludes the maximum ozone concentrations in the stratosphere. Nevertheless, an overestimation of the stratospheric ozone maximum by a model could lead to an overestimation in other layers, providing an indication of the amount of ozone transported from the stratosphere to the troposphere. It should be noted that the WRF-Chem ozone profiles were completed with climatological ozone profiles between 20 and 40 km, because the convolution with the averaging kernel requires a complete vertical profile, whereas the model only extends up to 20 km. IASI data are averaged on a 1° × 1° grid and model results were regridded to this grid. Given that the IASI sensor is not particularly sensitive to near-surface trace gas concentrations (Boynard et al., 2009), we focus here on the layer between the ground and 6 km. This tropospheric layer can be considered to be less influenced by the stratosphere and is therefore a good indicator of ozone produced over and downwind of Asian emission regions (Boynard et al., 2009).
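As a rough illustration of the smoothing in Eq. (1), X_smooth = AK X_model + (I − AK) X_apriori, the sketch below applies an averaging kernel to a model profile and then sums the smoothed layers into 0-6 km and 0-20 km partial columns. It assumes, hypothetically, that profiles are given as partial columns on 40 layers of 1 km each (0-40 km), consistent with the 40 × 40 AK mentioned above; the kernel and profiles are toy values, and the function names `smooth_profile` and `partial_column` are illustrative, not part of FORLI or any IASI product.

```python
import numpy as np


def smooth_profile(ak, x_model, x_apriori):
    """Apply Eq. (1): X_smooth = AK X_model + (I - AK) X_apriori.

    Assumes profiles are partial columns on the same n layers as the
    n x n averaging kernel (a hypothetical 40-layer, 1 km grid here).
    """
    ak = np.asarray(ak, dtype=float)
    x_model = np.asarray(x_model, dtype=float)
    x_apriori = np.asarray(x_apriori, dtype=float)
    n = x_apriori.size
    if ak.shape != (n, n) or x_model.shape != (n,):
        raise ValueError("AK must be n x n and both profiles must have n layers")
    identity = np.eye(n)
    # Each smoothed layer mixes contributions from all model layers via AK,
    # plus the a priori weighted by (I - AK).
    return ak @ x_model + (identity - ak) @ x_apriori


def partial_column(x_smooth, layer_top_km, top_km):
    """Sum smoothed partial columns for layers whose top lies at or below top_km."""
    x_smooth = np.asarray(x_smooth, dtype=float)
    layer_top_km = np.asarray(layer_top_km, dtype=float)
    return float(x_smooth[layer_top_km <= top_km].sum())


if __name__ == "__main__":
    n_layers = 40
    layer_tops = np.arange(1, n_layers + 1)        # hypothetical 1 km layers, 0-40 km
    rng = np.random.default_rng(0)
    ak = 0.05 * rng.random((n_layers, n_layers))   # toy averaging kernel
    x_apriori = np.full(n_layers, 1.0e16)          # toy a priori partial columns
    x_model = 1.2 * x_apriori                      # toy model profile, 20% higher
    x_smooth = smooth_profile(ak, x_model, x_apriori)
    print("0-6 km column:", partial_column(x_smooth, layer_tops, 6))
    print("0-20 km column:", partial_column(x_smooth, layer_tops, 20))
```

Rewriting Eq. (1) as X_apriori + AK (X_model − X_apriori) makes the behaviour explicit: where the kernel is close to zero, as near the surface for IASI, the smoothed profile falls back to the a priori rather than to the model.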
The IASI 0-6 km columns in Fig. 2 highlight large ozone concentrations over the eastern coast of China, covering the NCP and YRD regions north of 30° N (5-6 × 10^17 molec cm−2). At lower latitudes, and particularly over the PRD region, ozone concentrations are lower (3-4 × 10^17 molec cm−2). The two northerly regions are known for their high emissions of ozone precursors, but over the PRD region, as seen in Fig. 1 and discussed in Sect. 1 (Safieddine et al., 2015), the monsoon flux increases ozone destruction due to higher humidities, as well as transporting pollution northward. High ozone concentrations are also observed over Korea, the Sea of Japan and the north-eastern part of the evaluation domain, which can be attributed to transport of ozone and its precursors from China, Korea and Japan (Naja and Akimoto, 2004). The ECLIPSE models have too much ozone over these regions compared to IASI. This is confirmed by further statistical analysis for this region (delimited in black in Fig. 2), provided in Table 4, with, for example, a model mean NME of 24%. Ozone is also overestimated further downwind over the Pacific Ocean compared to IASI in many models. Modeled tropospheric ozone columns are also too high south of 30° N, even though concentrations are much lower in this region, which may indicate that simulated relative humidities are too low.
Higher modeled ozone in East Asia is not due to a general overestimation of the stratospheric ozone flux, since models show good agreement with 0-20 km IASI ozone columns, with high correlation coefficients (R > 0.93, except for WRF-Chem with R = 0.80) and low NME (< 20%), as indicated in Table 3. Other reasons for these discrepancies are discussed in Sect. 3.6.
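The model-satellite comparison above is summarized with NME and correlation coefficients (R). A minimal sketch of how such statistics can be computed over paired grid cells, assuming the commonly used definition NME = 100 × Σ|M − O| / ΣO and Pearson's R; the paper's exact formulas are not restated here, and the helper name `evaluation_stats` and the toy column values are hypothetical.

```python
import numpy as np


def evaluation_stats(model, obs):
    """Return normalized mean error (%) and Pearson R for paired model/observation values.

    Assumes NME = 100 * sum(|M - O|) / sum(O); pairs with a NaN in either
    array (e.g. cloud-screened satellite cells) are dropped first.
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = np.isfinite(model) & np.isfinite(obs)
    m, o = model[valid], obs[valid]
    if m.size < 2:
        raise ValueError("need at least two valid pairs")
    nme = 100.0 * np.abs(m - o).sum() / o.sum()
    r = np.corrcoef(m, o)[0, 1]
    return {"NME": float(nme), "R": float(r)}


if __name__ == "__main__":
    obs = np.array([5.0e17, 5.5e17, 6.0e17, np.nan])   # toy 0-6 km columns
    mod = np.array([6.0e17, 6.6e17, 7.5e17, 6.8e17])   # toy model columns
    print(evaluation_stats(mod, obs))
```

NME measures only the magnitude of the error, so a high R with a sizeable NME, as for the 0-6 km ozone columns, is compatible with models that capture the spatial pattern but are systematically too high.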

IASI CO columns
In a similar manner to the IASI ozone columns, IASI CO columns are smoothed using Equation 1 (George et al., 2015).
ECLIPSE models are compared to average August 2008 IASI total CO columns in Fig. 3. In general, models underestimate CO over the Chinese emission regions and over the Pacific downwind of Japan. Underestimation of CO over eastern Asia has already been pointed out in previous studies and suggested as a cause of the general underestimation of CO in the Northern Hemisphere (Shindell et al., 2006). Improvements to simulated CO in winter have been noted following the introduction of a seasonal cycle in domestic combustion emissions (Stein et al., 2014), which is also taken into account in this study, albeit not in the same manner. However, this cannot explain the underestimation in the summer months shown here over Chinese emission regions, nor the apparent CO overestimation over India (Sect. 3.6).

Tropospheric NO 2 columns
NO2 is a short-lived species, produced largely through rapid interconversion with NO emitted by anthropogenic activities, that can be observed spectroscopically. NO2 photolysis is the primary source of tropospheric ozone. Investigating seen over the delimited emission area. In terms of spatial patterns, we note that the models systematically underestimate NO2 over the southern/eastern part of the NCP region, as well as over Korea and Japan, possibly pointing to an underestimation of emissions over these regions. They tend to overestimate NO2 over and to the north of the Beijing region.

Surface trace gas concentrations
As well as evaluating the models on regional/continental scales, we also evaluate the results against surface data, where air quality issues are important. Daily average surface mixing ratios of ozone and its precursors, as well as SO2 (an important anthropogenic aerosol precursor), for August and September 2008 are compared with ground-based observations at eight sites (SNU/EANET and PKU stations) shown in Fig. 1. The first three stations (Beijing, Incheon and Seoul) are urban stations, whereas the last five (Gosan, Kunsan, Kangwha, Mokpo and Taean) are at rural locations. Therefore, we evaluate models not only at polluted locations but also at sites downwind of major emission regions or in regions where pollution levels are lower. We note that the observations at PKU may have been influenced by the mitigation strategies put in place during the study period, although we do not find very large differences between the measurements at PKU and those at Incheon and Seoul.
Fig. 5 shows box and whisker plots of modeled and observed NO2 mixing ratios at these sites. There is significant variability in modeled NO2 compared to the observations at polluted and rural sites. This could be caused by differences in model vertical resolution near the surface, although no correlation was found between the height of the first model layers and pollutant concentrations. While HadGEM and TM4-ECPL are able to reproduce the magnitude of NO2 surface concentrations at both urban and rural sites, EMEP and WRF-Chem show better agreement with measured rural concentrations and tend to overestimate NO2 in urban areas. OsloCTM2 has difficulties reproducing concentrations at both types of site, and NorESM slightly underestimates surface NO2 concentrations in urban areas. Several models (NorESM, OsloCTM2 and TM4-ECPL) underestimate CO at urban locations, whereas HadGEM overestimates CO, as shown in Fig. 5. In general, all models underestimate observed CO at rural stations, confirming the discrepancies found compared to IASI CO data. With regard to ozone (Fig. 5), higher mixing ratios are observed at rural stations than at polluted urban sites, due to less ozone titration and a switch to photochemical ozone production downwind of source regions. This gradient between urban and rural locations is reproduced by EMEP and WRF-Chem, whereas TM4-ECPL, OsloCTM2 and NorESM simulate rather constant but excessive ozone at both urban and rural sites. Ozone in the HadGEM model is too low at urban sites. Reasons for these discrepancies are discussed further in Sect. 3.6. Comparison with observed SO2 mixing ratios shows that models tend to overestimate concentrations at both urban and rural locations (also discussed in Sect. 4.2).
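The surface comparison uses daily averages summarized as box and whisker plots. The sketch below shows one way such statistics could be derived from an hourly station or model time series for August and September 2008. The 18-hour completeness threshold, the min/max whisker convention and the function names are assumptions for illustration only, since the text does not specify them, and the NO2 series is synthetic.

```python
import numpy as np


def daily_means(hourly, hours_per_day=24, min_valid_hours=18):
    """Average an hourly series into daily means.

    Days with fewer than min_valid_hours finite values (an assumed
    completeness threshold) become NaN; a trailing partial day is dropped.
    """
    hourly = np.asarray(hourly, dtype=float)
    n_days = hourly.size // hours_per_day
    days = hourly[: n_days * hours_per_day].reshape(n_days, hours_per_day)
    counts = np.isfinite(days).sum(axis=1)
    sums = np.nansum(days, axis=1)
    # Avoid division by zero for empty days; those are masked to NaN anyway.
    means = sums / np.maximum(counts, 1)
    return np.where(counts >= min_valid_hours, means, np.nan)


def box_whisker_stats(values):
    """Five-number summary, with whiskers assumed to span min to max."""
    v = np.asarray(values, dtype=float)
    v = v[np.isfinite(v)]
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    return {"min": float(v.min()), "q1": float(q1), "median": float(median),
            "q3": float(q3), "max": float(v.max())}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hours = 61 * 24                                          # August + September 2008
    no2_hourly = 20.0 + 5.0 * rng.standard_normal(hours)     # synthetic NO2 series (ppbv)
    print(box_whisker_stats(daily_means(no2_hourly)))
```

Applying the same functions to modeled and observed series at each site gives directly comparable boxes; if a model only provides 3-hourly output, `hours_per_day` and `min_valid_hours` would need adjusting.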

Trace gas vertical distributions
Modeled vertical distributions for NO2, CO, ozone and SO2 (hourly or 3-hourly profiles depending on the model, averaged Certain models, in particular OsloCTM2 and NorESM, also underestimate CO between 500 and 1000 m, where observed CO reached 400 ppbv. This underestimation is consistent with the surface comparisons in Fig. 5 and with IASI CO tropospheric columns. Comparison with airborne ozone vertical profiles shows that EMEP, TM4-ECPL and WRF-Chem are able to capture the high concentrations observed below about 750 m, whereas other models tend to underestimate ozone below this altitude.

Discussion
The comparisons presented in the previous sub-sections show that model performance varies considerably. In this section we examine in more detail possible reasons for the discrepancies described earlier, making use of the various observations in a more synergistic manner, for example by examining observed and modeled trace gas ratios and ozone diurnal cycles.
Discrepancies may be due to differences in model resolution, in transport processes such as boundary layer exchange, in photochemical processing, or in loss by deposition. The sensitivity of modeled pollutants to reduced emissions over the Beijing region is investigated using WRF-Chem in order to assess the impact of additional emission mitigation during the study period.
Deviations of observed trace gas ratios from emitted ratios can be used to determine the extent of chemical or dynamical processing that has taken place (Wang et al., 2005). In our study, observed ratios (CO:NOx and SO2:NOx) at urban sites deviate from the emitted ratios. For example, at the Beijing site the CO:NOx emission ratio is 10.7 ppbv ppbv−1, compared to an observed ratio of 16.7 ppbv ppbv−1. This indicates stronger processing of NOx than of CO, which is not surprising given their different chemical lifetimes (a few hours compared to several weeks). It may also point to more active mixing with cleaner air masses that are lower in NOx relative to CO. CO also has significant secondary sources from VOC oxidation. Modeled ratios are generally less scattered than the observations and lie either close to the emitted ratios or between the emitted and observed ratios. For models lying close to the emitted ratios (e.g., TM4-ECPL, NorESM, WRF-Chem, OsloCTM2), this points to a lack of chemical processing, particularly of NOx, which may have implications for modeled ozone, as discussed hereafter. In the case of SO2, models generally overestimate concentrations at polluted sites (and at many rural sites). Over Beijing, observed SO2:NOx ratios (0.12 ppbv ppbv−1) are lower than emission ratios (1.5 ppbv ppbv−1).
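The CO:NOx argument can be made concrete with a simple calculation. Assuming, for illustration only, that CO behaves as a conserved tracer (no chemical loss, no secondary production, and the same dilution as NOx), the ratio of emitted to observed CO:NOx gives the fraction of emitted NOx remaining, and first-order NOx loss with an assumed lifetime converts it into an implied processing time. Only the Beijing ratios come from the text; the 6 h NOx lifetime and the function names are hypothetical.

```python
import math

def nox_fraction_remaining(emitted_ratio, observed_ratio):
    """Fraction of emitted NOx left, assuming CO is conserved (CO:NOx in ppbv/ppbv)."""
    return emitted_ratio / observed_ratio

def processing_time_hours(emitted_ratio, observed_ratio, nox_lifetime_h):
    """Age implied by first-order NOx loss with a fixed (assumed) lifetime."""
    return nox_lifetime_h * math.log(observed_ratio / emitted_ratio)

frac = nox_fraction_remaining(10.7, 16.7)       # Beijing ratios from the text
age = processing_time_hours(10.7, 16.7, 6.0)    # 6 h lifetime is hypothetical
print(f"NOx remaining: {frac:.2f}, implied processing time: {age:.1f} h")
```

With the Beijing values, about 64% of the emitted NOx would remain. Because mixing with cleaner air and secondary CO from VOC oxidation also raise the observed CO:NOx ratio, the implied processing time should be read as an upper bound under these assumptions.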
Models (WRF-Chem, TM4-ECPL, NorESM, EMEP) lie between the observed and emitted ratios. A possible cause is that SO2 emissions from power plants, which are located outside urban areas such as Beijing, are placed in coarse model grid cells that include both urban and rural areas, thereby mixing emissions from a variety of sources. This may explain why there is better agreement with the higher SO2 observed near the surface during the CAREBEIJING flights south of Beijing than with observed concentrations at the Beijing urban site. Model overestimation in Beijing may also be due to emission reductions associated with the Beijing Olympics (see later discussion), although most models show the same overestimation of observed SO2 at Incheon and Seoul.
Significant variability is seen in the comparison between modeled and observed CO at polluted locations. Models (OsloCTM2, TM4-ECPL, and NorESM) that significantly underestimate CO at most polluted sites also overestimate ozone, suggesting more active photochemistry in these models than in reality. The opposite is true for HadGEM, which has very low ozone and significantly overestimates CO, pointing to excessive ozone titration by high NOx levels, associated with low OH and weak CO chemical loss in this model. This is also illustrated in Fig. 7, which compares averaged simulated ozone for the available models with observed diurnal cycles of ozone in Beijing. The observations show a clear early afternoon maximum, even if levels were slightly lower, on average, during August 2008 than in other years (Zhang et al., 2014). Model variability is large, with several models overestimating the daytime maximum (EMEP, NorESM, WRF-Chem, TM4-ECPL). HadGEM has a very flat diurnal cycle with no daytime maximum, consistent with an underestimation of photochemical activity in this model. As noted above, the ECLIPSE models also tend to overestimate ozone at rural sites as well as downwind over the Pacific. This may be due to a variety of factors, including excessive photochemical production of ozone (or insufficient NOx titration) over polluted regions and/or during transport downwind. Ozone:NOz ratios, where NOz = NOy − NOx and NOy is total reactive odd nitrogen, can be used to determine whether a region is under a VOC- or NOx-limited regime, a ratio of less than 25 indicating a VOC-limited regime (Tie et al., 2013). Analysis of data collected at the PKU site during CAREBEIJING in August 2008 showed that, during a high ozone pollution episode (peaks around 150 ppbv), ozone production was VOC-limited until late morning and was followed by additional NOx-limited production in the early afternoon (Chou et al., 2011), whereas VOC-limited ozone production prevailed during periods with lower observed ozone. Previous studies have noted that major emission regions in China are generally under VOC-limited regimes (Wang et al., 2011b). Here, NOy (defined as NOx + HNO3 + PAN) was available from three models, allowing us to examine their average behavior. WRF-Chem, which agrees well with surface observations at polluted and rural sites, is largely under a VOC-limited regime (ratio less than 25) over the main emission regions. However, as can be seen in Fig. 7, WRF-Chem overpredicts daytime ozone and has very low nighttime ozone. The latter is due to high NOx at night, brought about by a lack of processing and possibly insufficient boundary layer mixing of NOx emissions in this model, as also suggested by the analysis of CO:NOx ratios. TM4-ECPL is also under a VOC-limited regime, but this model overestimates ozone at urban and rural surface sites. In this model, NO:NO2 ratios (lowest model level) are a factor of 2 higher than observed ratios (less than 0.5) in Beijing. As suggested earlier, this indicates insufficient conversion of NO emissions to NO2, leading to a lack of ozone titration, which may be linked to the VOC chemistry shifting the NO:NO2 balance and resulting in ozone rather than NOz (e.g., HNO3) formation. In contrast, the NorESM model, which also overestimates ozone at all surface sites (e.g., Fig. 7), is in a NOx-limited regime over polluted areas. This model has too much daytime NO2 compared to surface observations, for example in Beijing (not shown). This leads to too much photochemical ozone production over emission regions, which is transported downwind over Korea/Japan and the Pacific (surface sites and IASI ozone). This may also be linked to the simulation of the monsoon inflow over East Asia, which penetrates too far north over the North China Plain (NCP) in this model, leading to dilution of emissions with less polluted air masses.
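The ozone:NOz diagnostic can be expressed as a small classification function. This sketch assumes, as for the three models used here, that NOy is approximated as NOx + HNO3 + PAN, and it applies the threshold of 25 (Tie et al., 2013) to a single set of mixing ratios. The function names and the example values are hypothetical.

```python
def noy_ppbv(nox, hno3, pan):
    """NOy approximated as NOx + HNO3 + PAN (all in ppbv)."""
    return nox + hno3 + pan

def ozone_regime(o3, nox, hno3, pan, threshold=25.0):
    """Return 'VOC-limited' if O3:NOz < threshold, else 'NOx-limited'."""
    noz = noy_ppbv(nox, hno3, pan) - nox  # NOz = NOy - NOx
    if noz <= 0.0:
        raise ValueError("NOz must be positive to form the O3:NOz ratio")
    return "VOC-limited" if o3 / noz < threshold else "NOx-limited"

# Hypothetical mixing ratios (ppbv), for illustration only.
print(ozone_regime(o3=60.0, nox=20.0, hno3=3.0, pan=1.0))  # O3:NOz = 15
print(ozone_regime(o3=60.0, nox=5.0, hno3=1.5, pan=0.5))   # O3:NOz = 30
```

Applying the test to instantaneous values is only illustrative; the analysis above considers the average behavior of each model over the main emission regions.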
The ozone discrepancies discussed above may also be due to errors in the ECLIPSE emissions. While this is difficult to diagnose explicitly, model evaluation against satellite GOME-2 data, representing NO2 over wider spatial scales, provides some consistent insights. Models tend to underestimate NO2 over the southern and eastern parts of the main NCP emission region in China, consistent with the evaluation against CAREBEIJING aircraft data. On the other hand, background NO2 is generally overestimated over the Chinese coastal region, around and to the north of Beijing, which may contribute to the overestimation of ozone downwind of the main emission regions. These spatially distributed discrepancies occur across a region with strong concentration gradients, leading to both over- and underestimations at surface sites. A more systematic underestimation of NO2 over Korea and Japan is found in the models compared to the GOME-2 data, suggesting that emissions over these regions may be underestimated.
The ECLIPSE models also systematically underestimate CO downwind, both compared to surface data over Korea and Japan and compared to IASI CO data over Japan and downwind over the north-western Pacific Ocean. Whilst inclusion of additional seasonality in the ECLIPSE emissions (already included for domestic combustion) might improve agreement in winter and spring (Stein et al., 2014), this is unlikely to explain these summertime differences. Low model CO appears to be linked to the clear overestimation of modeled ozone at rural sites and compared to IASI 0-6 km column data. Excessive ozone (and hence OH, which is produced by ozone photolysis) leading to too much destruction of CO suggests that modeled CO lifetimes may be too short. This hypothesis is consistent with the findings of Monks et al. (2015), who concluded that, in models run with the same emissions, differences in OH (chemical schemes) are a more likely cause of the systematic CO underestimation in the Northern Hemisphere and the Arctic than differences in vertical transport. Indeed, we find that surface August mean modeled OH (not shown) is higher in the NorESM model (due to the penetration of the monsoon flux) than, for example, in TM4-ECPL and WRF-Chem over the main Chinese emission regions. In this case, excessive water vapor may also be contributing to high OH. In contrast, excessive modeled CO over the central Pacific, where concentrations are low, may be due to the position of the Pacific anticyclone in the meteorological analyses used by the majority of models. A southward shift in the position of the anticyclone, possibly as a result of transport that is too zonal, could produce this pattern of negative (positive) biases over the north (south) Pacific. This may also explain the low modeled CO in the Arctic noted by Monks et al. (2015).
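The link between OH and CO lifetime can be illustrated with first-order loss of CO by reaction with OH, with lifetime 1/(k[OH]). This sketch assumes an approximate near-surface rate constant of 2.4 × 10−13 cm3 molecule−1 s−1 and hypothetical OH concentrations; it ignores secondary CO production and only shows how higher modeled OH shortens the CO lifetime and reduces the CO reaching downwind regions. The function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 86400.0
K_CO_OH = 2.4e-13  # cm3 molecule-1 s-1; approximate value near 1 atm (assumption)

def co_lifetime_days(oh_molec_cm3, k=K_CO_OH):
    """CO lifetime against OH, tau = 1 / (k [OH]), in days."""
    return 1.0 / (k * oh_molec_cm3) / SECONDS_PER_DAY

def fraction_remaining(transport_days, lifetime_days):
    """Fraction of CO left after transport, assuming only first-order loss."""
    return math.exp(-transport_days / lifetime_days)

# Hypothetical OH levels (molecules cm-3) and a 5-day transport time.
for oh in (1.0e6, 2.0e6):
    tau = co_lifetime_days(oh)
    print(f"[OH]={oh:.0e}: lifetime {tau:.0f} d, "
          f"fraction left after 5 d {fraction_remaining(5.0, tau):.2f}")
```

Doubling OH halves the CO lifetime, so a model with too much OH over the source regions exports less CO downwind, in line with the hypothesis above.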
To assess the possible impact of emission mitigation measures in Beijing during the period analyzed in this study, the WRF-Chem model was run for 2 weeks (1 to 15 August 2008) with reduced pollutant emissions from the transport, industrial, and solvent use sectors, following the mitigation strategy during the Olympics described in Wang et al. (2010). For example, emissions of all species in the transport sector were reduced by 75% in Beijing and by 20% in the surrounding area (200 km around Beijing), corresponding to eight model grid cells around Beijing in this model. Emissions from the industrial sector and from solvent use were reduced by 50% in the same region. Most pollutant concentrations are reduced, resulting, for example, in lower CO (by about 30 ppbv) locally in and around Beijing in the emission reduction run compared to the base run. This results in ozone reductions of up to 6-7 ppbv in the region of Beijing. Based on these results, it appears that these reduction measures cannot explain the discrepancies between the models and the observations discussed earlier.
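The sector- and zone-dependent reductions of the WRF-Chem sensitivity run can be written as a lookup of scaling factors. This sketch assumes that "the same region" for industry and solvents covers both Beijing and its 200 km surroundings; the zone labels, sector keys, function name, and emission values are hypothetical and do not describe the actual WRF-Chem emission preprocessing.

```python
# Fractional reductions by sector and zone, following the description in the text.
# Assumption: industry and solvent reductions apply in both zones.
REDUCTIONS = {
    "transport": {"beijing": 0.75, "surroundings_200km": 0.20},
    "industry": {"beijing": 0.50, "surroundings_200km": 0.50},
    "solvents": {"beijing": 0.50, "surroundings_200km": 0.50},
}

def reduced_emission(base_emission, sector, zone):
    """Scale a base emission; sectors or zones not listed are left unchanged."""
    fraction = REDUCTIONS.get(sector, {}).get(zone, 0.0)
    return base_emission * (1.0 - fraction)

# Hypothetical base emissions (arbitrary units) in one Beijing grid cell.
cell = {"transport": 100.0, "industry": 80.0, "solvents": 20.0, "residential": 50.0}
reduced = {sector: reduced_emission(e, sector, "beijing") for sector, e in cell.items()}
print(reduced)
```

Applying the same factor to every emitted species in a sector reproduces the "all species" reduction described for transport.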
Overall, the evaluation of the ECLIPSE trace gas distributions points to excessive ozone production in many models. Potential causes vary between models and are linked to model treatments of NOx/NOy partitioning and VOC chemistry, as well as to physical factors, as noted for the NorESM model. This leads to a systematic overestimation of ozone downwind of the main Chinese emission regions, coupled with a general underestimation of CO concentrations in the same outflow regions. Comparison with a combination of satellite data and surface data at rural sites enables more robust conclusions to be drawn, whereas comparisons at urban sites are less conclusive due to large variability in model results and the difficulty for global models to reproduce fine-scale variations. This overestimation of ozone has implications for the ability of models to correctly assess regional air quality and climate impacts. Anthropogenic ozone forcing is sensitive to the altitude distribution of ozone perturbations from different emissions (e.g., Stevenson et al., 2013).
4 Interpretation of differences in modeled aerosol distributions
In this section, model results are evaluated against satellite observations from the MODIS and CALIOP instruments, which measure AOD and attenuated backscatter, respectively. MODIS AOD allows a comparison of the total aerosol load integrated over the atmospheric column, whereas CALIOP signals are used to evaluate vertical aerosol distributions. Simulated aerosols are also compared to observations (BC, sulphate, and OC) at the Beijing and Gosan ground-based stations, as well as with vertical profiles from aerosol lidar observations at 10 stations in the Japanese NIES network (blue open triangles in Fig. 1). In Sect. 4.3 we discuss reasons for differences between modeled and observed aerosol distributions.

Aerosol optical depth
AOD is determined as the aerosol extinction coefficient integrated over the whole atmospheric column. Since the aerosol extinction coefficient is mostly linked to the aerosol surface area distribution (and to a lesser extent to the aerosol complex refractive index), large values of AOD can be observed in cases of high concentrations of fine mode aerosol particles, e.g., pollution over cities (Wang et al., 2011a). MODIS AOD fields at 550 nm were retrieved from daily observations averaged over August and …
In order to further investigate model skill at simulating AOD, two specific regions with high AOD values are selected within the domain; they are indicated in the top left panel of Fig. 8. The first region is located over northern India and is well known for the significant accumulation of pollutants at this time of the year, driven by the Indian monsoon. This accumulation is due to large local emissions and to dominant southerly winds that transport pollution up to the Himalayas, which act as a natural barrier (Lawrence and Lelieveld, 2010). The second region encompasses the main emission areas in eastern China, including several megacities such as Beijing, Shanghai, and Hong Kong. Whilst this region is influenced in spring by dust episodes from the north-eastern Asian deserts (e.g., Huang et al., 2013), the monsoon flux inhibits such events in summer. … 6 for the two regions and in Table 7 for the entire domain.
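Since AOD is the column integral of the extinction coefficient, a modeled extinction profile can be converted to AOD by vertical integration. This minimal sketch uses the trapezoidal rule on model levels; the function name and the exponential test profile (surface extinction of 0.2 km−1 with a 1.5 km scale height) are hypothetical.

```python
import numpy as np

def column_aod(z_m, extinction_per_m):
    """AOD = integral of the extinction coefficient over altitude (trapezoidal rule).

    z_m: level altitudes in metres (increasing); extinction_per_m: m-1 at each level.
    """
    z = np.asarray(z_m, dtype=float)
    ext = np.asarray(extinction_per_m, dtype=float)
    return float(np.sum(0.5 * (ext[1:] + ext[:-1]) * np.diff(z)))

# Hypothetical profile: 0.2 km-1 at the surface, 1.5 km scale height.
z = np.linspace(0.0, 15000.0, 3001)
ext = 2.0e-4 * np.exp(-z / 1500.0)
print(round(column_aod(z, ext), 3))
```

For an exponential profile the integral is close to the surface extinction times the scale height, here about 0.3, which shows why AOD is dominated by aerosols in the lowest few kilometres.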

Aerosol backscatter coefficient
Evaluation against satellite observations
In this section, the ECLIPSE models are evaluated against vertical distributions of attenuated backscatter at 532 nm from CALIOP, averaged over a 3° × 5° grid over Asia for August and September 2008. As indicated in Table 1, most of the models calculated the aerosol extinction coefficient (α) rather than the aerosol backscatter coefficient (β). Although CALIPSO level 3 data from the operational algorithm include α, important uncertainties are associated with these retrievals. This is because α retrievals rely on inversion of the lidar signals, which requires knowledge of the so-called lidar ratio $S = \alpha_{\mathrm{aer}}/\beta_{\mathrm{aer}}$, which depends on the aerosol type. Omar et al. (2010) showed that a low signal-to-noise ratio (SNR) can lead to misclassification and to a failure to identify aerosol layers, especially close to the surface. Liu et al. (2009) noted cloud contamination in backscatter and α profiles, whereas Young and Vaughan (2009) pointed out potentially erroneous assumptions about the lidar ratio S used in α retrievals.
Finally, Winker et al. (2009) highlighted calibration coefficient biases in the daytime attenuated backscatter profiles.
To verify possible aerosol misclassification, an alternative product based on the CALIOP level 1 data, presented by Ancellet et al. (2014), is used. This product is based on level 1 backscatter signals filtered for clouds using CALIPSO level 2 cloud masks. In this retrieval, three brightness temperatures (8, 10, and 12 µm) measured by the Imaging Infrared Radiometer (IIR) on CALIPSO, the cloud layer depolarization ratio, and the color ratio are used as additional requirements. The final product described in Ancellet et al. (2014) is unitless and is called the apparent (or attenuated) scattering ratio (R_app). This product is not affected by errors associated with the lidar signal inversion. To allow a fair comparison, model results must be converted to R_app using Eq. (2), in which β and α are each described as the sum of molecular (β_mol and α_mol) and aerosol (β_aer and α_aer) contributions, representing the backscatter and extinction due to air molecules and aerosols, respectively, and z_ref is an altitude where only the molecular signal is observed. For some models, only α is provided. In this case, R_app is calculated with β_aer replaced by BER × α_aer, where BER is the backscatter-to-extinction ratio ($\mathrm{BER} = \beta_{\mathrm{aer}}/\alpha_{\mathrm{aer}}$), fixed to 0.02 sr−1, a common value observed over East Asia (Cattrall et al., 2005; Xie et al., 2008; Chiang et al., 2008). BER is the only assumption made in the R_app calculation.
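One common way to write the conversion for a downward-looking (spaceborne) lidar, normalized at an aerosol-free reference altitude $z_{\mathrm{ref}}$ above the layer, is

$R_{\mathrm{app}}(z) = \frac{\beta_{\mathrm{mol}}(z)+\beta_{\mathrm{aer}}(z)}{\beta_{\mathrm{mol}}(z)}\exp\left(-2\int_{z}^{z_{\mathrm{ref}}}\alpha_{\mathrm{aer}}(z')\,dz'\right)$,

in which molecular attenuation cancels between the signal and the molecular reference. The Python sketch below implements this form and, when only α is available, sets β_aer = BER × α_aer with BER = 0.02 sr−1 as in the text. It is an assumption that this matches the exact expressions of Ancellet et al. (2014), and the function and argument names are hypothetical.

```python
import numpy as np

def apparent_scattering_ratio(z_m, beta_mol, alpha_aer, beta_aer=None, ber=0.02):
    """R_app profile for a downward-looking lidar (assumed form, see lead-in).

    z_m: increasing altitudes (m); the top level is taken as z_ref (aerosol-free).
    beta_mol, beta_aer: backscatter (m-1 sr-1); alpha_aer: extinction (m-1).
    If beta_aer is None, it is derived as ber * alpha_aer (ber in sr-1).
    """
    z = np.asarray(z_m, dtype=float)
    beta_mol = np.asarray(beta_mol, dtype=float)
    alpha_aer = np.asarray(alpha_aer, dtype=float)
    if beta_aer is None:
        beta_aer = ber * alpha_aer
    beta_aer = np.asarray(beta_aer, dtype=float)
    # Aerosol optical depth of each layer between consecutive levels (trapezoid).
    layer_tau = 0.5 * (alpha_aer[1:] + alpha_aer[:-1]) * np.diff(z)
    # Optical depth from each level up to z_ref (reverse cumulative sum).
    tau_above = np.append(np.cumsum(layer_tau[::-1])[::-1], 0.0)
    return (beta_mol + beta_aer) / beta_mol * np.exp(-2.0 * tau_above)

# Hypothetical profile: constant aerosol extinction below a 2 km reference level.
z = [0.0, 1000.0, 2000.0]
r_app = apparent_scattering_ratio(z, beta_mol=[1e-6] * 3, alpha_aer=[1e-4] * 3)
print(r_app)
```

For an upward-looking ground-based lidar, such as those of the NIES network, the transmission term would instead be integrated from the instrument up to z.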
The distribution pattern of CALIOP-derived R_app between 0 and 2 km (Fig. 10a) highlights three major features over Asia, consistent with the MODIS observations presented in Fig. 8: enhancements associated with polluted regions over eastern China and northern India, where anthropogenic emissions are significant, and background pollution over the desert region north-west of the Himalayas. The ECLIPSE models reproduce the location of anthropogenic air masses over eastern China but underestimate their magnitude by 5-50%. One exception is EMEP, which overestimates the backscatter signal by more than 50%.
Only OsloCTM2 and EMEP simulate the observed pattern over northern India, albeit with a slight overestimation (20-40%).
The signal over the Tibetan plateau desert, which is mostly due to dust particles, is not simulated by the models. This suggests that all models lack a source of crustal aerosols from soil erosion in this particular region, whose complex orography is not well represented in models run at coarse resolution. Between 2 and 4 km (Fig. 10b), CALIOP detected elevated aerosols over the Tibetan plateau and eastern China. Again, none of the models are able to reproduce the signal over the desert region. However, the models (except NorESM and HadGEM) are able to capture a higher R_app over eastern China. Whereas most models overestimate the observed signal there, WRF-Chem slightly underpredicts it (by 10%). Two models (WRF-Chem and ECHAM6-HAM2) simulate aerosols in this altitude range over northern India, which probably correspond to those detected by CALIOP between 0 and 2 km. Above 4 km (not shown), CALIOP only observes a significant signal over the Tibetan plateau region north of the Himalayas, whereas WRF-Chem and ECHAM6-HAM2 simulate a backscatter signal over northern India, and all the models, with the exception of NorESM and HadGEM, simulate some aerosols in the middle troposphere over eastern China.

Evaluation against ground-based observations
Good model skill in simulating aerosol vertical distributions is essential for reliable estimates of aerosol radiative forcing (Boucher et al., 2013) and for the assessment of air quality impacts. Whilst the evaluation against CALIOP is made with data averaged over large grid boxes due to the scarcity of satellite overpasses, simulated optical properties are also compared with aerosol lidar measurements collected at sites in the Japanese NIES network. In the NIES lidar network, full overlap between the laser beam and the telescope field of view is achieved above 500 m. However, the data are corrected using an empirically determined geometrical factor, so that the lowest height of useful data is around 150 m. In order to allow a fair comparison, the simulated extinction and observed backscatter are converted to R_app, as described in Sect. 4.1.2, starting from an altitude of 150 m. All the NIES stations are at urban locations, and the R_app profiles calculated from observations reveal large aerosol loads in the boundary layer and up to 3 km at certain stations.
The mean R_app profiles at each station are shown in Fig. 11. In general, ECHAM6-HAM2, NorESM, and HadGEM underestimate R_app between the surface and 2 km, whereas average profiles (shape and intensity) from EMEP, OsloCTM2, and WRF-Chem are in fair agreement with the observations over the same altitude range. Above 2 km, the models adequately simulate R_app values. These results are not always consistent with the model comparison against MODIS and CALIOP observations. For example, the EMEP model overestimates the space-borne backscatter signal over Japan (Fig. 10a and 10b), whereas it agrees with the ground-based lidar observations. ECHAM6-HAM2 and NorESM also show ambiguous results: both models overestimate the backscatter signal observed by CALIOP, whereas they underestimate the signals measured by the ground-based lidars. These discrepancies may be due to (i) CALIOP uncertainties at lower altitudes, particularly because aerosol products are only retrieved when clouds are not present above aerosol layers, (ii) low model resolution, making it difficult for models to capture lidar profiles obtained in urban areas (NIES), and (iii) complex topography, not well resolved by global models. In addition, CALIOP signals are averaged over urban, rural, and background regions, whereas all the NIES sites are urban.
A more quantitative parameter providing information about the profile shape is the mean altitude Z_mean of the NIES lidar profiles, calculated for each profile with a weighting function over the n vertical levels between 150 m and 8 km. Z_mean is overestimated by the models. A modeled overestimation of this quantity is caused by an underestimation of low-altitude signals and/or an overestimation of signals at higher altitudes. EMEP simulates Z_mean with an error of less than 50 m, whereas ECHAM6-HAM2, OsloCTM2, and WRF-Chem errors are around 0.5 km. Finally, the NorESM mean error is 1.5 km, giving a rather high Z_mean value, not only because this model has relatively high values of the scattering coefficient in the upper troposphere (the comparison is limited to 8 km), but also because it strongly underestimates aerosols in the boundary layer. This results in an overestimation of the mean height of the aerosol layer(s). Such biases are likely to be due to coarse model resolutions that are unable to adequately describe variations between rural, maritime, and urban areas, leading to an underestimation of the high backscatter values usually observed in the boundary layer in urban areas where the NIES lidars operate.
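How a change in profile shape moves Z_mean can be illustrated with a signal-weighted mean altitude. The exact weighting function used for the NIES profiles is not given here, so this sketch assumes, hypothetically, that each level between 150 m and 8 km is weighted by its R_app value; the function name and profiles are also hypothetical.

```python
import numpy as np

def mean_altitude(z_m, r_app, z_min=150.0, z_max=8000.0):
    """R_app-weighted mean altitude over levels in [z_min, z_max] (assumed weights)."""
    z = np.asarray(z_m, dtype=float)
    r = np.asarray(r_app, dtype=float)
    keep = (z >= z_min) & (z <= z_max)
    return float(np.sum(z[keep] * r[keep]) / np.sum(r[keep]))

levels = [500.0, 1000.0, 2000.0, 4000.0]
observed_like = [4.0, 3.0, 1.5, 1.0]  # strong boundary-layer signal (hypothetical)
model_like = [2.0, 2.0, 1.5, 1.2]     # weaker low-level, stronger upper signal (hypothetical)
print(mean_altitude(levels, observed_like), mean_altitude(levels, model_like))
```

Both a weaker boundary-layer signal and a stronger signal aloft raise Z_mean, which is why both effects contribute to the NorESM error discussed above.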

Aerosol composition
In this section, modeled aerosol components that are important for estimating anthropogenic radiative forcing (BC, OC, and sulphate) and PM for air quality are compared with in-situ ground-based observations at Beijing and Gosan, as shown in Fig. 12. As noted earlier, pollutant concentrations in Beijing are mainly influenced by local emissions, whereas pollution at Gosan is transported from the Asian continent (Kim et al., 2007). In addition, anthropogenic emissions in the Beijing area were reduced during summer 2008, with local impacts on observed pollutant levels (Wang et al., 2009, 2010). The effect of emission reductions in the Beijing area is discussed in Sect. 4.3. BC originates from primary emissions due to incomplete combustion, whereas OC and sulphate are emitted from primary sources or formed as secondary products following oxidation of precursors. BC concentrations are mainly influenced by emissions and deposition and are less influenced by chemical processing. On the other hand, OC and sulphate aerosols are more hydrophilic and may react with gaseous species and interact with cloud droplets. Their mass therefore evolves as a function of gas condensation at their surface, in addition to primary emissions, oxidation, and wet/dry deposition.
In models with aerosol schemes that consider internal mixing of aerosols (HadGEM, NorESM, WRF-Chem), particles containing BC can change from a more hydrophobic to a more hydrophilic state. Mean observed concentrations of BC, OC, and sulphate during August and September 2008 are 2.0, 15.4, and 12.1 µg m−3, respectively, at the Beijing site, compared to 0.18, 1.3, and 7.4 µg m−3 at Gosan, the latter being consistent with observations from this site reported by Sun et al. (2004).
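The contrast between the polluted and downwind sites can be quantified from the mean concentrations quoted above. The short calculation below gives Beijing-to-Gosan ratios of roughly 11 for BC, 12 for OC, and 1.6 for sulphate, i.e., sulphate varies much less between the two sites than BC and OC do. The variable names are illustrative.

```python
# Mean observed concentrations (µg m-3), August-September 2008, as quoted in the text.
beijing = {"BC": 2.0, "OC": 15.4, "sulphate": 12.1}
gosan = {"BC": 0.18, "OC": 1.3, "sulphate": 7.4}

ratios = {species: beijing[species] / gosan[species] for species in beijing}
for species, ratio in ratios.items():
    print(f"{species}: Beijing/Gosan = {ratio:.1f}")
```

These observed ratios are the gradient against which the modeled transition from polluted to downwind concentrations is judged below.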
OsloCTM2 simulates the transition from high to low concentrations between polluted and downwind locations well, whereas NorESM, which simulates polluted concentrations reasonably well, has very high BC downwind at Gosan. ECHAM6-HAM2, TM4-ECPL, and WRF-Chem capture observed OC concentrations fairly well in Beijing but underestimate them at Gosan.
NorESM and OsloCTM2 have very low OC (by a factor of about 3) over polluted Beijing, and OsloCTM2 also underpredicts OC downwind at Gosan, whereas EMEP largely overestimates OC at Gosan. These discrepancies, whilst based on comparisons with rather limited data, support global model evaluations (e.g., Tsigaridis et al., 2014) showing that models have problems simulating OC, which has implications for estimates of radiative forcing from these aerosols. This also applies to radiative forcing estimates for sulphate aerosol, which, as shown in Fig. 12, is one of the most abundant aerosol components measured at the sites considered here (the fraction of organics is also high). At the polluted Beijing site, observed sulphate concentrations are largely overpredicted by ECHAM6-HAM2, EMEP, and TM4-ECPL. This is linked to the overestimation of SO2, as shown in Fig. 5. ECHAM6-HAM2 also overestimates sulphate at Gosan, in contrast to EMEP, whose results are lower than observed, suggesting that sulphate may be lost too fast in this model. HadGEM, OsloCTM2, and WRF-Chem capture the concentrations reasonably well at both sites. Given that HadGEM overestimates SO2, this suggests insufficient SO2 loss processes in this model.

Discussion
MODIS-derived AODs over the Asian region in summer 2008 showed that, not surprisingly, elevated AODs are observed over larger cities in eastern China and over northern India, where local pollutants accumulate significantly close to the Himalayas due to dominant southerly winds driven by the summer monsoon. The agreement between modeled and observed AODs is generally better over eastern China (where observed AOD is 0.45 ± 0.15) than over northern India (where observed AOD is about 0.4 ± 0.1). However, there is considerable variability among the modeled AODs. Several reasons might explain such discrepancies. Different treatments of aerosol emissions within the models can influence simulated AODs, especially assumptions about the size distribution of emitted particles. All aerosols, including dust and sea salt particles, contribute to observed AOD, and biases in simulated aerosol concentrations and sizes affect AOD estimates. Aerosol removal processes (i.e., dry and wet deposition) also play an important role in determining modeled aerosol loading and the resulting AODs. In this study, ECHAM6-HAM2 shows a very strong underestimation, not only in regions where high AODs are reported but also in more remote regions where background aerosols dominate total mass concentrations. Because all the models use the same emissions, this suggests that aerosol lifetimes in this model are too short, probably due to a strongly overestimated deposition efficiency. HadGEM and EMEP overestimate AODs over eastern China, but not elsewhere. This suggests that this deficiency is not due to problems in horizontal advection but rather to a lack of deposition in the atmospheric boundary layer. The HadGEM overestimation can also be ascribed to a large overestimation of sulphate aerosols aloft, linked to a strong overestimation of SO2 (Fig. 6). In addition, EMEP-derived AODs are strongly underestimated in northern India. To a lesser extent, NorESM and TM4-ECPL also underestimate the aerosol loading over the same area. This indicates that the accumulation of pollution due to the Indian monsoon is not adequately represented in these models. This is particularly visible for NorESM, which has weaker winds associated with the Indian monsoon (Fig. 1). WRF-Chem simulates AODs in agreement with observations over India but underestimates them over China, owing partly to a missing source of dust from dry regions of China in this model. The comparison of model results with MODIS aerosol products also showed that the model mean performs very well, reproducing the main features of aerosol pollution in East Asia as well as aerosol abundances over the main source regions. Using results from an ensemble of models to answer air-quality-related questions or to study aerosol radiative effects is therefore recommended. The strong underestimation of aerosol loadings over northern India by the ECLIPSE models is in agreement with the work of Gadhavi et al. (2015), who showed that BC concentrations are strongly underestimated in southern India even when aerosol removal processes in one model were completely switched off. In our study, the fact that observed AODs in northern India are larger than those simulated by most ECLIPSE models suggests that the emissions of BC and of precursors of other aerosols are underestimated for India in the ECLIPSE emission data set. This could be related to the rapid recent growth of emissions in India (Klimont et al., 2013), which may be underestimated in the inventories, as well as to problems capturing the true spatial distribution of emissions in India. For example, higher emissions from kerosene lamps were identified (Lam et al., 2012) and included in the next-generation ECLIPSEv5 dataset, increasing the BC estimate for India by about 25% (Klimont et al., 2016).
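Why a multi-model mean can outperform most individual models is easy to see when model errors have opposite signs. In this sketch the observed value is the eastern China mean AOD of 0.45 quoted above, while the four model values are purely hypothetical and are not ECLIPSE results.

```python
import numpy as np

observed_aod = 0.45  # eastern China mean AOD from the text
model_aod = np.array([0.30, 0.55, 0.60, 0.40])  # hypothetical model values

individual_abs_errors = np.abs(model_aod - observed_aod)
ensemble_mean = model_aod.mean()
print("mean individual |error|:", individual_abs_errors.mean())
print("ensemble mean:", ensemble_mean, "|error|:", abs(ensemble_mean - observed_aod))
```

The cancellation only helps when biases are not shared by all models; a bias common to most of the ensemble, such as the underestimation over northern India discussed above, is not removed by averaging.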
The analysis of aerosol vertical distributions using CALIOP retrievals highlighted an underestimation in the lowest layers and an overestimation in more elevated layers for most models, except EMEP. These results suggest that models overestimate the transport of aerosol pollution into the free troposphere, linked to deficiencies in model treatments of boundary layer exchange or convection, or to too much vertical diffusion. Loss by wet scavenging, especially in the boundary layer, may also be insufficient. These findings are confirmed by the comparison with the NIES lidar network above 2 km, which is representative of aerosols transported downwind at altitude: these observations indicate an overestimation of the mean altitude of aerosol layers, suggesting an underestimation of aerosol deposition efficiency in the mid-troposphere during transport from urban areas in China to sites located downwind. This is in agreement with previous work noting model overestimation of aerosol concentrations in the free (upper) troposphere (Koffi et al., 2012; Samset et al., 2014). For instance, Samset et al. (2014) evaluated model simulations performed over longer periods against aircraft measurements and found that the models systematically overpredicted BC concentrations in the remote upper troposphere over the Pacific Ocean. They concluded that the BC lifetime in the models is too long. As mentioned in Sect. 3.6, CO lifetimes appear to be too short, whereas aerosol lifetimes appear to be too long. This suggests a clear influence of wet deposition, rather than chemical processing, on aerosols transported downwind from East Asia in the lower troposphere. We also note that, compared to CALIOP data, certain models (NorESM and, to a lesser extent, EMEP and OsloCTM2) simulate high amounts of aerosols below 2 km in the north-east of the domain (south of the Kamchatka peninsula). This is due to elevated sulphate concentrations simulated in these models (not shown).
Concerning EMEP, the overestimation of the backscatter signal is found throughout the tropospheric column, supporting the suggested overestimation of aerosol lifetimes linked to an underestimation of wet deposition. Because lidar measurements are particularly sensitive to the presence of scattering aerosols, the overestimation of lidar signals in elevated layers and their underestimation in the planetary boundary layer point towards an overestimation of the aerosol scattering effect at altitude and an underestimation within the planetary boundary layer over the Asian region. In terms of aerosol-radiation interactions, the vertical profile affects absorbing aerosols such as black carbon in a complex way, which makes it important to simulate aerosol vertical profiles correctly (Samset and Myhre, 2015). Furthermore, the simulation of excessive aerosols between 2 and 4 km may trigger the spurious activation of aerosols as cloud condensation nuclei (CCN), leading to erroneous cloud droplet formation. This would also contribute to an overestimation of the aerosol cooling effect due to low-level clouds over the Asian continent.
ECLIPSE models generally represent only interactions with liquid clouds. The simulated aerosol layers with enhanced concentrations are too high and may end up above low-level clouds, which suggests that aerosol-cloud interactions may be underestimated. Correct simulation of aerosol vertical profiles is therefore critical for aerosol-cloud interactions.
At the surface, the model-mean overestimates BC in Beijing, although the individual model results diverge strongly. With the exception of two models (OsloCTM2, NorESM) that underpredict OC concentrations, the model-mean agrees rather well with the observations. These results are an improvement compared to the study of Tsigaridis et al. (2014). Results from the WRF-Chem simulation with reduced emissions due to additional mitigation measures in the Beijing area (Gao et al., 2011) were discussed earlier. They show that the measures taken for the Olympic Games lead to small reductions in surface BC, OC and sulphate concentrations of 0.3, 1 and 1 µg m−3, respectively. This cannot explain the discrepancies between model results and observations, especially the overestimation of surface BC and sulphate concentrations in Beijing. The models generally overestimate surface concentrations close to the anthropogenic sources but agree well downwind of the sources. Consequently, this pattern suggests an overestimation of emissions close to local sources in Beijing. It also confirms that two features are mostly due to overestimated vertical transport or insufficient deposition: the generally good agreement with MODIS AOD, and the overestimation of CALIPSO attenuated backscatter coefficients above the planetary boundary layer. Several studies highlight the potential role of a poor representation of secondary OA production during transport in explaining the underestimation of organic aerosols close to the surface (Tsigaridis et al., 2014). This is not found here, based on the comparison with Gosan data, and would not be consistent with the overestimation of aerosols detected at altitude. Consequently, poor representation of secondary OA production in models during transport does not appear to be a dominant factor over Asia in summer.
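A rough budget using only the Beijing BC values quoted in this section illustrates why the Olympic mitigation cannot close the gap. The sketch treats the model-mean over-prediction factor of 3, relative to the observed mean of 2 µg m−3, as exact. This is a simplification for illustration only.

```python
# Beijing, August-September 2008: values quoted in the text (µg m-3).
observed_bc = 2.0              # observed mean surface BC
model_mean_factor = 3.0        # model mean over-predicts by about a factor of 3
mitigation_reduction_bc = 0.3  # WRF-Chem BC reduction from the Olympic measures

modelled_bc = observed_bc * model_mean_factor  # implied model-mean BC
excess_bc = modelled_bc - observed_bc          # model excess over observations
explained_fraction = mitigation_reduction_bc / excess_bc

print(f"model excess: {excess_bc:.1f} µg m-3")
print(f"fraction explained by mitigation: {explained_fraction:.3f}")
```

Under these assumptions, the simulated mitigation effect accounts for well under a tenth of the implied model excess.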

Summary
This study evaluates the ability of chemistry-aerosol/chemistry-climate models to simulate distributions of short-lived pollutants over Asia during summer 2008. All models were run with the same 2008 ECLIPSE anthropogenic emissions and the same biomass burning dataset. Models were, in general, nudged with meteorological analyses for the study period. Model performance is evaluated using a variety of datasets in order to assess models in different environments and over different spatial and vertical scales. We note again that these models have been used to estimate the air quality and climate impacts of short-lived pollutants for present-day and future scenarios (Stohl et al., 2015). To examine ozone (and its precursors) and aerosols over major emission regions, model results were compared with surface observations at polluted and rural locations, with aircraft trace gas data collected south of Beijing, and with satellite data. Vertically resolved aerosol lidar data collected downwind over Japan, together with satellite data, were used to assess model behaviour on regional and continental scales, both in the lower troposphere and over the total atmospheric column. Assessing model performance over different scales is important for radiative forcing and air quality estimates.
Models show systematic positive biases in ozone, especially at rural surface locations and compared to satellite data downwind of major Chinese emission regions. This is linked to the general underestimation of CO over and downwind of emission regions, most likely due to excessive destruction by OH, suggesting that CO lifetimes are too short. The causes of ozone discrepancies vary between models. They are linked to the ability of models to simulate VOC- and NOx-limited regimes in polluted and less polluted environments. This may also be linked to inter-model spatial variability in NO2 compared to surface NO2 data and satellite NO2 column data. The satellite comparison indicates a possible underestimation of NOx emissions over Korea and Japan. It also suggests an under- (over-)estimation of emissions to the south/east (west) of the Chinese NCP emission region. These findings point to the need for adequate model resolution to improve simulated responses to emissions when moving from ozone titration to ozone production regimes within large polluted conurbations, their surroundings and downwind. Overestimation of Asian ozone and its transport downwind implies that radiative forcing from this source may be overestimated. Sensitivity analyses based on one model suggest that emission mitigation over Beijing cannot explain these discrepancies.
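The link between excessive OH and CO lifetimes that are too short follows from the first-order loss rate, τ_CO = 1/(k[OH]). The sketch below assumes an approximate near-surface rate constant of 2.4 × 10−13 cm3 molecule−1 s−1 and an illustrative OH concentration of 10^6 molecules cm−3. Neither value is taken from the ECLIPSE models.

```python
K_CO_OH = 2.4e-13  # cm3 molecule-1 s-1, approximate near-surface value (assumption)
SECONDS_PER_DAY = 86400.0


def co_lifetime_days(oh_molec_cm3, k=K_CO_OH):
    """Chemical lifetime of CO against OH, tau = 1 / (k [OH]), in days."""
    if oh_molec_cm3 <= 0:
        raise ValueError("OH concentration must be positive")
    return 1.0 / (k * oh_molec_cm3) / SECONDS_PER_DAY


reference = co_lifetime_days(1.0e6)  # illustrative OH level (assumption)
enhanced = co_lifetime_days(1.2e6)   # 20 % more OH

print(f"reference lifetime: {reference:.1f} days")
print(f"with 20 % more OH: {enhanced:.1f} days")
```

Because τ scales as 1/[OH], a 20 % excess in OH shortens the CO lifetime by about 17 %. This would tend to lower simulated CO over and downwind of the sources.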
Satellite-derived AOD is reproduced quite well by the models over China, although surface BC and sulphate are overestimated in urban China in summer 2008. The effect of the short-term mitigation measures taken during the Olympic Games in summer 2008 is too weak to explain the differences between the models and observations. Our results instead point to an overestimation of emissions close to the surface in urban areas, particularly for SO2. A potential reason is that the spatial distribution of power plant emissions in China changed dramatically over the last decade (Liu et al., 2015). This change was not taken into account in the emission inventory used in this study. ECLIPSE models strongly underestimate aerosol loadings over northern India, suggesting that emissions of BC and other aerosol precursors are underestimated in the ECLIPSEv4a dataset. Improvements, such as higher emissions from kerosene lamps, have subsequently been included in a later version (ECLIPSEv5). Model deficiencies in representing pollution accumulation due to the Indian monsoon may also play a role. The underestimation of scattering aerosols in the boundary layer, associated with an overestimation in the free troposphere, can be ascribed to two main factors. These are an overestimation of the vertical transport of aerosols into the free troposphere and/or insufficient aerosol deposition in the boundary layer. Both factors contribute to overestimated aerosol residence times in models.
In summary, the ECLIPSE model evaluation highlights significant differences between the models and observations, even when models are run using the same emissions over East Asia.

(polluting vehicles, chemical and power plants) in the Beijing municipality were mitigated between 30 June and 20 September 2008 in the context of the Beijing Olympic and Paralympic Games (see the detailed mitigation plan in Wang et al., 2010).
Figure 2 shows August 2008 average IASI 0–6 km ozone columns and the model columns smoothed using Equation 1. Statistical parameters (correlation coefficient R, normalized mean bias NMB, normalized mean error NME and root mean square error RMSE) based on Fig. 2 are given in Table
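The statistical parameters listed above can be computed from paired model and observed values, for example co-located grid-cell columns. The sketch uses common textbook definitions: Pearson R, and NMB and NME normalized by the sum of the observations. These definitions are assumed rather than quoted from this study, which may define them slightly differently.

```python
import math


def evaluation_stats(model, obs):
    """Return R, NMB (%), NME (%) and RMSE for paired model/observation values."""
    if len(model) != len(obs) or len(obs) < 2:
        raise ValueError("need at least two paired values of equal length")
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    # Pearson correlation coefficient
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    if var_m == 0 or var_o == 0:
        raise ValueError("correlation undefined for constant series")
    r = cov / math.sqrt(var_m * var_o)
    # Normalized mean bias and error, as percentages of the summed observations
    sum_o = sum(obs)
    nmb = 100.0 * sum(m - o for m, o in zip(model, obs)) / sum_o
    nme = 100.0 * sum(abs(m - o) for m, o in zip(model, obs)) / sum_o
    # Root mean square error, in the units of the input
    rmse = math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / n)
    return {"R": r, "NMB": nmb, "NME": nme, "RMSE": rmse}


stats = evaluation_stats([2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 4.0])
print(stats)
```

In this toy case, the model is offset by a constant from the observations. The result is therefore a perfect correlation combined with a positive bias, showing that R alone does not reveal systematic offsets.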

Modeled NO2 provides insights into discrepancies between simulated and observed ozone. Here, tropospheric NO2 columns observed by GOME-2 are compared with the model results. Column retrievals do not include corrections for aerosol scattering, the effect of which is estimated by Boersma et al. (2004) to be less than 10 %. Figure 4 shows monthly mean observed tropospheric NO2 columns for August and September 2008, averaged on a regular 1° × 1° grid. It also shows the absolute differences between the simulated and the observed tropospheric NO2 columns. Absolute differences are shown instead of the tropospheric columns to highlight significant biases in remote regions. Since NO2 has a lifetime of only about 1–2 days in the lower troposphere, the highest concentrations are observed close to emission areas, i.e., around Beijing and the main cities (Shanghai, Hong Kong, Seoul, Tokyo). HadGEM, WRF-Chem and NorESM overestimate, and OsloCTM2 and TM4-ECPL underestimate, NO2 columns with NMBs of 53, 45, 29, 64, 51, 40 and 38 %, respectively. The same biases are

over the measurement period) are compared in Fig. 6 with observations from the CAREBEIJING 2008 airborne campaign, collected south of the main urban center of Beijing. Observed data are averages over the 12 flights performed between 28 August and 25 September 2008, binned by altitude between 200 and 2200 m. They provide useful information about pollutant concentrations in the boundary layer (BL) and lower free troposphere. Three flight routes covering the area 38–40° N and 114–118° E, from Tianjin to Shijiazhuang, were flown repeatedly. Zhang et al. (2014) showed that the flights sampled polluted air masses originating from urban areas south of the flight locations. They also suggested a limited influence of the emission mitigation measures applied in Beijing municipality at this time. Observed ozone precursors are elevated up to about 1.5 km, showing that the entire boundary layer was influenced by pollution. In general, concentrations are lower than those observed at the urban surface sites, although maximum ozone concentrations of more than 100 ppbv were observed in certain air masses. Model results were extracted along the flight paths, corresponding to 2 or 3 model pixels (depending on the model), using hourly (or 3-hourly) output. This allows a fairer evaluation against the observations, especially since trace gases have important diurnal cycles. The model results are averages over fairly large spatial scales. Nevertheless, such a comparison provides useful insights into the vertical distribution of pollutants simulated by the models over a region that is more representative of the less polluted background. Observed NO2 is underestimated by the models at all altitudes (except by HadGEM). This result is consistent with the satellite comparison in Fig. 4, where several models underestimate tropospheric NO2 columns south of Beijing.
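Binning the flight data by altitude between 200 and 2200 m can be sketched as follows. The same function can be applied to model values already extracted along the flight path, so that observed and simulated profiles share the same bins. The 200 m bin width and the function name are assumptions for illustration and do not describe the campaign's actual processing.

```python
import math


def bin_by_altitude(alt_m, values, bottom=200.0, top=2200.0, width=200.0):
    """Average values in altitude bins [bottom, top) of the given width (m).

    Samples outside [bottom, top) or with NaN values are ignored;
    empty bins are returned as NaN. The default bin width is hypothetical.
    """
    n_bins = int(round((top - bottom) / width))
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for z, v in zip(alt_m, values):
        if math.isnan(v) or not (bottom <= z < top):
            continue
        i = int((z - bottom) // width)
        sums[i] += v
        counts[i] += 1
    centres = [bottom + (i + 0.5) * width for i in range(n_bins)]
    means = [s / c if c else math.nan for s, c in zip(sums, counts)]
    return centres, means


# Made-up samples (altitude in m, mixing ratio in arbitrary units).
centres, means = bin_by_altitude([250, 350, 450, 2100, 100, 2300],
                                 [10, 20, 30, 40, 99, 99])
print(list(zip(centres, means)))
```

Samples below 200 m or above 2200 m are discarded, and bins without data are kept as NaN rather than zero so that they do not bias the profile.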
September 2008 and taking into account missing observations, primarily due to the presence of clouds within the column. Days with missing observations were removed from the model results at the corresponding locations. The model results are interpolated in two dimensions onto the 1° × 1° MODIS grid. Figure 8 (top panels) shows average maps of observed and simulated AOD at 550 nm. In general, the models correctly represent the main features of the spatial AOD distribution, including the large values over the NCP area and northern India. However, AOD is not reproduced equally accurately by all models, especially over these two regions. Absolute differences between the models and MODIS are also shown in Fig. 8 (bottom panels). HadGEM and, to a lesser extent, NorESM and TM4-ECPL overestimate background AOD values. In addition, HadGEM and EMEP overestimate AOD over the NCP, whereas ECHAM6-HAM2, OsloCTM2 and WRF-Chem underestimate it.
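Removing days without valid MODIS retrievals from the model series before averaging keeps the observed and simulated means comparable. The following is a minimal sketch for a single grid cell. It assumes that missing observations are stored as NaN, which is a hypothetical convention.

```python
import math


def cosampled_mean(model_daily, obs_daily):
    """Mean model and observed values over days where both are valid.

    Days with a missing observation (NaN, e.g. cloudy scenes) are dropped
    from both series so that the two means cover exactly the same days.
    """
    pairs = [(m, o) for m, o in zip(model_daily, obs_daily)
             if not math.isnan(o) and not math.isnan(m)]
    if not pairs:
        return math.nan, math.nan
    n = len(pairs)
    return sum(m for m, _ in pairs) / n, sum(o for _, o in pairs) / n


nan = math.nan
model_aod = [0.5, 1.3, 0.7, 0.3]  # made-up daily model AOD
modis_aod = [0.4, nan, 0.6, nan]  # made-up daily MODIS AOD with cloudy days
print(cosampled_mean(model_aod, modis_aod))
```

Averaging all model days here would give 0.7 instead of 0.6, a difference introduced purely by sampling rather than by model error.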

Fig. 9 compares MODIS and model mean/percentile AOD over these regions, and over the rest of the domain, during August and September 2008. In terms of AOD variability, the agreement is generally better over eastern China than over northern India. Over eastern China, NorESM and TM4-ECPL capture the observed variability with a deviation of less than 10 %. HadGEM and EMEP somewhat overestimate this variability, whereas ECHAM6-HAM2 and WRF-Chem underestimate it. Over northern India, HadGEM, OsloCTM2 and WRF-Chem reproduce the observed variability with deviations from the MODIS observations of less than 25 %, whereas the other models underestimate it. The corresponding statistical parameters are summarized in Table 6 for the two regions and in Table 7 for the entire domain.
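The box-plot quantities in Fig. 9 and a simple measure of agreement in variability can be sketched as follows. The 1.5 × IQR whisker rule and the use of standard deviations to express the deviation in variability are assumptions, since the definitions used here are not stated.

```python
import statistics


def box_stats(values):
    """Mean, median, quartiles and whisker ends (assumed 1.5 x IQR rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),   # most extreme values not flagged as outliers
        "whisker_high": max(inside),
    }


def variability_deviation(model, obs):
    """Relative difference (%) between model and observed standard deviations."""
    sd_obs = statistics.stdev(obs)
    return 100.0 * (statistics.stdev(model) - sd_obs) / sd_obs


box = box_stats([1, 2, 3, 4, 5, 6, 7, 100])  # made-up AOD-like sample (x 10)
print(box)
print(variability_deviation([2.0, 6.0], [1.0, 3.0]))
```

In the made-up sample, the value 100 lies outside the whiskers and would be drawn as an outlier, while it still pulls the mean well above the median.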

Observed BC concentrations are quite low (2 ± 1 µg m−3) in August–September 2008, but the model-mean over-predicts concentrations by a factor of 3. Only one model (OsloCTM2) agrees well with the observations at this site. Three models (WRF-Chem, HadGEM, OsloCTM2) simulate surface sulphate concentrations over Beijing reasonably well, with an average value of 10 ± 2 µg m−3. The other models strongly overestimate surface sulphate concentrations, by a factor of 3, driven by the general overestimation of SO2 concentrations over Beijing. Good agreement with sulphate data in HadGEM and OsloCTM2 may be fortuitous. The evaluation of trace gases suggested excessive oxidizing chemistry in OsloCTM2 and insufficient oxidizing chemistry in HadGEM. Measured OC concentrations are of the order of 14 ± 6 µg m−3 over Beijing.

error NME) based on spatial variations for model simulations of 0–20 km ozone column over Asia in August 2008 compared to the IASI ozone-FORLI observations.

Figure 1. Map of the Asian region showing mean surface relative humidity (%) and surface wind speed (m/s) and direction (°) for August and September 2008. Data are from ECMWF (top left panel), NCEP (National Centers for Environmental Prediction) FNL (final, top right panel) and NorESM (bottom left panel). The bottom right panel shows NOx emissions over East Asia. The WRF-Chem domain (dashed line) and the satellite data comparison domain (thick black line) are shown in the top right panel. The ground-based stations (cyan square, green circles, blue triangles) and the CAREBEIJING flight tracks (blue lines) used in this study are shown in the bottom right panel.

Figure 2. Average 0–6 km ozone columns (molec cm−2) over Asia in August 2008 observed by the IASI satellite (left panel) and simulated by the models (model names are given in each relevant panel). The black polygon delimits the region discussed in detail in the text.

Figure 3. Average total CO columns (molec cm−2) over Asia in August 2008 observed by the IASI satellite (left panel) and relative differences between columns observed by IASI and simulated by the models (model names are given in each relevant panel).

Figure 4 .Figure 5 .
Figure 4. Mean tropospheric NO2 columns in molec cm−2 (between the ground and the tropopause height given in the GOME-2 product) in August and September 2008 over Asia as observed by the GOME-2 satellite, and absolute differences between GOME-2 and model simulations (model names are given in the panels). Model-mean tropospheric columns are also presented in the bottom left panel. The white square denotes the emission region discussed in the text.

Figure 6. Mean vertical profiles of ozone, NO2, CO and SO2 observed over China during the CAREBEIJING 2008 airborne campaign and simulated by the ECLIPSE models.

Figure 7. Ozone diurnal cycle as observed and as simulated by the ECLIPSE models in Beijing, averaged over August and September 2008. Whiskers represent two standard deviations.

Figure 8. (8 top panels) Mean AODs observed by MODIS and simulated by the ECLIPSE models for August and September 2008, and (8 bottom panels) absolute differences between the simulated and the MODIS AODs (model names are given in each relevant panel). White polygons mark out the two regions discussed in the text.

Figure 9. Box plots showing the mean AOD (circle), median (central line), 25th and 75th percentiles (box edges) and the extreme data not considered as outliers (whiskers) during August and September 2008. Values are shown as observed by MODIS and simulated by the ECLIPSE models over northern India, eastern China and the rest of the domain.

Figure 10a. Comparison of Rapp over Asia in August and September 2008, averaged over a 0–2 km layer, derived from CALIOP data and simulated by the ECLIPSE models. White boxes indicate missing observations due to ground elevation.

Figure 10b. Comparison of Rapp over Asia in August and September 2008, averaged over a 2–4 km layer, derived from CALIOP data and simulated by the ECLIPSE models. White boxes indicate missing observations due to ground elevation.

Figure 11. Comparison of the mean (grey dots), median (black line), and 25th and 75th percentiles (grey area) of Rapp profiles observed at 10 NIES aerosol lidar stations over Japan (locations shown in Fig. 1) with the mean Rapp (colored lines) simulated by the ECLIPSE models.
Tsigaridis et al. (2014) reported a systematic under-prediction of organic aerosols (OA) near the surface as well as large model divergence in the middle and upper troposphere. They attributed these discrepancies to missing or underestimated OA sources, removal parameterizations and uncertainties in the temperature-dependent partitioning of secondary OA in the models. At the rural downwind site Gosan, in South Korea, model agreement with BC, OC and sulphate data is generally much better. Good agreement is found between the model-mean and the measurements. The exception is BC, but this is due to NorESM, which strongly overestimates surface concentrations at this location. The BC/SO4 ratio observed at Beijing is almost constant (∼ 0.2). Enhanced values occur episodically (02–03 August, 16–17 August, 01–02 September), when the ratio can reach 1. All models reproduce this ratio reasonably well (not shown), but two models (HadGEM, EMEP) show large variations between 0.1 and 6 (mean value of ∼ 2). At Gosan, the observed BC/SO4 ratio is lower (∼ 0.1), underlining that it is a rural site more remote from local sources. Models reproduce this ratio quite well, except for overestimated ratios in EMEP (∼ 0.4) and TM4-ECPL (∼ 0.2). Such discrepancies may affect model responses to emission perturbations and thus radiative forcing.
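Daily BC/SO4 ratios and episodic enhancements like those listed for Beijing can be identified with a short sketch. The episode criterion, a ratio more than twice the median, is a hypothetical choice for illustration and not the criterion used in this study. The dates and concentrations below are invented.

```python
import statistics


def flag_ratio_episodes(dates, bc, so4, factor=2.0):
    """Daily BC/SO4 ratios, their median, and dates exceeding factor x median.

    Days with non-positive sulphate are skipped to avoid division by zero.
    The factor-of-two threshold is a hypothetical episode criterion.
    """
    valid = [(d, b / s) for d, b, s in zip(dates, bc, so4) if s > 0]
    if not valid:
        raise ValueError("no days with positive sulphate")
    baseline = statistics.median(r for _, r in valid)
    episodes = [d for d, r in valid if r > factor * baseline]
    return [r for _, r in valid], baseline, episodes


dates = ["2008-08-01", "2008-08-02", "2008-08-03", "2008-08-04", "2008-08-05"]
bc = [2.0, 2.0, 9.0, 2.0, 2.0]          # made-up daily BC (µg m-3)
so4 = [10.0, 10.0, 10.0, 10.0, 10.0]    # made-up daily sulphate (µg m-3)
ratios, baseline, episodes = flag_ratio_episodes(dates, bc, so4)
print(baseline, episodes)
```

Using the median as the baseline keeps it close to the typical ratio even when a few episodic days have much higher values.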

Nevertheless, an important finding is that the global Earth System Models show a similar level of performance to the global Chemistry Transport Models. This is encouraging, since Earth System Models aim to include chemistry-climate feedbacks and are used to determine both climate and air quality impacts. Somewhat better general agreement is found for trace gas constituents than for aerosols, for which agreement is very variable. For both trace gases and aerosols, models have difficulties reproducing horizontal gradients between urban and rural (downwind) locations, as well as vertical distributions. Improved model resolution is needed, together with improved understanding and model treatments of processes affecting pollutant lifetimes. Model evaluations using a variety of observations are required so that different aspects of model behaviour can be tested. Results from this study suggest that significant uncertainties still exist in chemistry-aerosol model simulations. This has implications for the use of such models in assessing the radiative effects of short-lived climate forcers on climate and their impacts on regional/global air quality.

sures on emission reductions during the 2008 Olympic Games in Beijing, China, Atmospheric Environment, 44, 285–293, doi:10.1016/j.atmosenv.2009.10.040, 2010.

Table 1. ECLIPSE model description including meteorological fields used to nudge simulations (where applicable), spatial resolution, aerosol schemes and biogenic emissions. Trace gases, aerosol species and optical parameters output by the models are provided together with references for the different models. Institutes responsible for each model are indicated with the indices of the author affiliations.

Table 2. Coordinates of stations used in this study and available parameters.

Table 3. Statistical parameters (correlation coefficient R, normalized mean bias NMB, root mean square error RMSE, and normalized mean