Air quality and climate change, Topic 3 of the Model Inter-Comparison Study for Asia Phase III (MICS-Asia III) – Part 1: Overview and model evaluation

Topic 3 of the Model Inter-Comparison Study for Asia (MICS-Asia) Phase III examines how online coupled air quality models perform in simulating high aerosol pollution in the North China Plain region during wintertime haze events and evaluates the importance of aerosol radiative and microphysical feedbacks. A comprehensive overview of the MICS-Asia III Topic 3 study design, including descriptions of participating models and model inputs, the experimental designs, and results of model evaluation, is presented. Six modeling groups from China, Korea and the United States submitted results from seven applications of online coupled chemistry–meteorology models. Results are compared to meteorology and air quality measurements, including data from the Campaign on Atmospheric Aerosol Research Network of China (CARE-China) and the Acid Deposition Monitoring Network in East Asia (EANET). The correlation coefficients between the multi-model ensemble mean and the CARE-China observed near-surface air pollutants range from 0.51 to 0.94 (0.51 for ozone and 0.94 for PM2.5) for January 2010. However, large discrepancies exist between simulated aerosol chemical compositions from different models. The coefficient of variation (SD divided by the mean) can reach above 1.3 for sulfate in Beijing and above 1.6 for nitrate and organic aerosols in coastal regions, indicating that these compositions are less consistent among models. During clean periods, simulated aerosol optical depths (AODs) from different models are similar, but peak values differ during severe haze events, which can be explained by the differences in simulated inorganic aerosol concentrations and the hygroscopic growth efficiency (affected by varied relative humidity).
These differences in composition and AOD suggest that future models can be improved by including new heterogeneous or aqueous pathways for sulfate and nitrate formation under hazy conditions, a secondary organic aerosol (SOA) formation chemical mechanism with new volatile organic compound (VOC) precursors, yield data and approaches, and a more detailed evaluation of the dependence of aerosol optical properties on size distribution and mixing state. It was also found that using the ensemble mean of the models produced the best prediction skill. While this has been shown for other conditions (for example, the prediction of high-ozone events in the US (McKeen et al., 2005)), this is to our knowledge the first time it has been shown for heavy haze events.

Published by Copernicus Publications on behalf of the European Geosciences Union.

Introduction
Air pollution in Asia, particularly in China and India, has been an increasingly important research topic and has attracted enormous media coverage because about 60 % of the world's population lives in this region and is exposed to extremely unhealthy air. It is estimated that outdoor air pollution brings about 3.3 million premature deaths per year worldwide, with most deaths occurring in Asia (Lelieveld et al., 2015). In addition, the impacts of regional and intercontinental transport of Asian pollutants on air quality and climate change have been frequently reported (Akimoto, 2003; Menon et al., 2002; Ramanathan and Carmichael, 2008). Chemical transport models have been developed and applied to study various air pollution issues in Asia. For example, an Eulerian regional-scale acid deposition and photochemical oxidant model was developed in the United States (Carmichael and Peters, 1984; Carmichael et al., 1986, 1991) and applied to study long-range transport of sulfur oxides (SOx), dust and ozone production in East Asia (Carmichael et al., 1998; Xiao et al., 1997). A nested urban and regional-scale air quality prediction modeling system was developed and applied to investigate ozone pollution in Taiwan (Wang et al., 2001). Although important advances have taken place in air quality modeling, large uncertainties still remain, which are related to inaccurate and/or incomplete emission inventories, poorly represented initial and boundary conditions, and missing or poorly parameterized physical and chemical processes (Carmichael et al., 2008a).
Furthermore, many models used to study air quality in Asia were developed in other regions (e.g., the USA and Europe), and the assumptions and parameterizations included in these models may not be applicable to the Asian environment. In order to develop a common understanding of model performance and uncertainties in Asia, and to further develop the models for Asian applications, a model intercomparison study was initiated, i.e., the Model Inter-Comparison Study for Asia Phase I (MICS-Asia I), in 1998 during the Workshop on the Transport of Air Pollutants in Asia held in Austria. The focus of MICS-Asia Phase I was to study long-range transport and the deposition of sulfur within Asia in support of ongoing acid deposition studies. Eight long-range transport models from six institutes in Korea, Japan, Denmark, the USA and Sweden participated in MICS-Asia I. Multi-model results of sulfur dioxide (SO2) and sulfate concentrations and wet-deposition amounts in January and May 1993 were compared with surface observations in East Asia (Carmichael et al., 2002). Source-receptor relationships and how model structure and parameters affect model performance were also discussed during this phase (Carmichael et al., 2002). In 2003, MICS-Asia Phase II was initiated to include more species, including nitrogen compounds, ozone and aerosols. The study period was expanded to cover two different years and three different seasons, and global inflow to the study domain was also considered (Carmichael et al., 2008b). Nine modeling groups from Korea, Hong Kong, Japan, the USA, Sweden and France participated in this phase. Seven topics (i.e., ozone and related precursors, aerosols, acid deposition, global inflow of pollutants and precursors to Asia, model sensitivities to aerosol parameterization, analysis of emission fields, and detailed analyses of individual models) were discussed and published in a special issue of Atmospheric Environment (Carmichael et al., 2008b).
In 2010, MICS-Asia Phase III was launched, and three topics for this phase were decided during the first and second Workshop on Atmospheric Modeling in East Asia. Phase III aims to evaluate strengths and weaknesses of current air quality models and provide techniques to reduce uncertainty in Asia (Topic 1), to develop a reliable anthropogenic emission inventory in Asia (Topic 2), and to evaluate aerosol-weather-climate interactions (Topic 3). Various multi-scale models participated in this phase, and the study periods range from year to month depending on study topics. This phase benefits from the Acid Deposition Monitoring Network in East Asia (EANET) measurements, in addition to new observations related to atmospheric chemistry in this region. A detailed overview of the MICS-Asia Phase III, including the descriptions of different research topics and participating models, will be published in a companion paper. An important advance in this phase is the inclusion of multiple online coupled chemistry-meteorology models to investigate aerosol-weather-climate interactions, which is the target of Topic 3. Online coupled models play important roles in air quality, meteorology and climate applications, but many important research questions remain (Baklanov et al., 2017).
The influences of aerosols on meteorology (e.g., radiation, temperature, boundary layer heights and winds) and PM2.5 concentrations have been examined in previous studies using different online coupled models (Forkel et al., 2015; Gao et al., 2016a, b, 2017a, b; Han et al., 2012, 2013; Makar et al., 2015a, b; San Jose et al., 2015; Tao et al., 2015, 2016; Wang et al., 2014; Zhang et al., 2010). In general, there are two ways of online coupling: online integrated coupling (meteorology and chemistry are simulated using the same model grid, and one main time step is used to integrate) and online access coupling (meteorology and chemistry are independent, but data are exchanged on a regular basis) (Baklanov et al., 2014). These two different coupling methods can lead to uncertainties in the results of aerosol-weather-climate interactions. Even using the same coupling method, different parameterizations in different online models cause uncertainties as well. Thus, it is important to intercompare how different online models simulate aerosol-weather-climate interactions, particularly in the heavily polluted Asian region. Other ongoing related modeling frameworks include the Task Force on Hemispheric Transport of Air Pollution (TF HTAP) and the Air Quality Model Evaluation International Initiative (AQMEII). The TF HTAP was initiated to improve knowledge of the intercontinental or hemispheric transport and formation of air pollution and its impacts on climate, ecosystems and human health (Galmarini et al., 2017; Huang et al., 2017). The AQMEII project specifically focuses on regional modeling domains over Europe and North America (Galmarini et al., 2017), within which aerosol-meteorology interactions were studied (Forkel et al., 2015; Makar et al., 2015a, b; San Jose et al., 2015).
This paper overviews the MICS-Asia III Topic 3, serving as the main repository of the information linked to Topic 3 simulations and comparisons. Specifically, this paper aims to archive the information of the participating models, how the experiments are designed, and the results of model evaluation. The results of the MICS-Asia Topic 3 experiments looking at the direct and indirect effects during heavy haze events will be published in a companion paper, part II. In Sect. 2, we provide the intercomparison framework of Topic 3, including the participating models, emissions, boundary conditions, observational data and analysis methodology. Section 3 presents comparisons and discussions focused on the results related to the meteorological and air pollution conditions during the January 2010 heavy haze episode.

Intercomparison framework
In northern China, severe aerosol pollution frequently happens and attracts enormous interest from both the public and the scientific community (Cheng et al., 2016; Gao et al., 2015, 2016a-c). Two winter months in which severe haze episodes happened in northern China were selected as the study periods for Topic 3. During these two months, the maximum hourly PM2.5 concentration in urban Beijing reached ∼ 500 µg m−3 and 1000 µg m−3, respectively. Compared to the China Grade 1 24 h PM2.5 standard (35 µg m−3), daily mean PM2.5 concentrations in urban Beijing exceeded this standard on 20 days and 27 days within these two months. The dramatically high aerosol loadings during these two hazy months substantially affected radiation transfer and provide a good opportunity to study the aerosol effects on weather, air quality and climate. In this study, the participants were required to use common emissions to simulate air quality during these two months and submit requested model variables. The emissions were placed on a publicly accessible website. Six modeling groups submitted results for Topic 3. In this section, we briefly describe these models and their configurations, introduce the emission inventories (including anthropogenic, biogenic, biomass burning, air and ship, and volcano emissions) and the observational datasets, and present the analysis methodology.
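The exceedance counts quoted above compare daily mean PM2.5 with the 35 µg m−3 24 h standard. As a minimal sketch of that bookkeeping (not the authors' processing code), the function below groups hourly values by day and counts the days whose mean is above the threshold. The function name, the record layout and the handling of missing hours are illustrative assumptions.

```python
from collections import defaultdict

# 24 h PM2.5 standard quoted in the text (µg m-3).
GRADE1_DAILY_PM25 = 35.0


def count_exceedance_days(hourly_records, threshold=GRADE1_DAILY_PM25):
    """Count days whose daily mean PM2.5 exceeds `threshold`.

    `hourly_records` is an iterable of (day_label, pm25) pairs; the layout
    is a hypothetical one chosen for this sketch. Missing hours are passed
    as None and are skipped when forming the daily mean.
    """
    by_day = defaultdict(list)
    for day, value in hourly_records:
        if value is not None:
            by_day[day].append(value)
    return sum(
        1
        for values in by_day.values()
        if values and sum(values) / len(values) > threshold
    )
```

A real evaluation would likely also require a minimum number of valid hours per day before accepting a daily mean; that rule is not specified in the text and is left out here.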

Participating models
Table 1 summarizes the characteristics of the participating models. These models include one application of the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem; Fast et al., 2006; Grell et al., 2005) by Pusan National University (PNU) (M1); one application of the WRF-Chem model by the University of Iowa (UIOWA) (M2); two applications (two domains: 45 and 15 km horizontal resolutions) of the National Aeronautics and Space Administration (NASA) Unified WRF (NU-WRF; Peters-Lidard et al., 2015; Tao et al., 2013) model by the Universities Space Research Association (USRA) and NASA's Goddard Space Flight Center (M3 and M4); one application of the Regional Integrated Environment Modeling System with Chemistry (RIEMS-Chem; Han et al., 2010) 1) and the set model top pressures range from 100 to 20 mb (Table 1).
2. Gas phase chemistry: at PNU (M1), the RACM-ESRL (Regional Atmospheric Chemistry Mechanism – Stockwell et al., 1997; Earth System Research Laboratory – Kim et al., 2009) gas phase chemistry was used. RACM was developed based on the Regional Acid Deposition Model (RADM2) to simulate regional atmospheric chemistry (Stockwell et al., 1990) (including 237 reactions) and the rate coefficients were updated in the RACM ESRL version (Kim et al., 2009) (Zaveri et al., 2008). The MOSAIC version used in M2 includes some aqueous reactions but no secondary organic aerosol (SOA) formation. At NASA, the GOCART (Goddard Chemistry Aerosol Radiation Transport) aerosol model (Chin et al., 2002) was coupled to RADM2 gas phase chemistry, and incorporated into the NU-WRF model (M3 and M4) to simulate major tropospheric aerosol species, including sulfate, BC, OC, dust and sea salt. In this aerosol model, 10 % of organic compounds from the volatile organic compound (VOC) emission inventory were assumed to be converted to SOA (Chin et al., 2002). Aerosols in RIEMS-Chem include sulfate, nitrate, ammonium, BC, OC, SOA, five bins of soil dust and five bins of sea salt (Han et al., 2012; Li and Han, 2016). ISORROPIA (Nenes et al., 1998) was coupled to RIEMS-Chem to treat the thermodynamic equilibrium process and to simulate inorganic aerosols. SOA production from primary anthropogenic and biogenic VOCs is calculated using a bulk aerosol yield method according to Lack et al. (2004). RegCCMS also used ISORROPIA to calculate inorganic aerosols (Wang et al., 2010). For the implementation of aerosol effects, sulfate radiative properties were treated following Kiehl and Briegleb (1993), OC aerosols are assumed to have the same properties as sulfate, and the wavelength-dependent radiative properties of BC follow Jacobson (2001). The AE6 aerosol (the sixth-generation CMAQ aerosol module; Carlton et al., 2010) (Kalnay et al., 1996).
5. Soil dust: M1, M6 and M7 did not include soil dust calculation. M3 and M4 used the GOCART dust module (Ginoux et al., 2001), and M2 used a GOCART version that was modified by AFWA (Air Force Weather Agency). M5 used a dust module that is described in Han et al. (2004).
6. Mixing state: M6 assumes external mixing, while other models use internal mixing treatments for major aerosol compositions.
Many previous studies have underscored that the choice of gas phase mechanism and aerosol models is of great importance for simulating air pollutants (Knote et al., 2015). The different gas phase chemistry and aerosol modules used in the participating models are expected to yield notable differences in performances, which are shown later in Sect. 3.

Emissions
The accuracy of air quality modeling results greatly depends on the quality and reliability of the emission inventory. Accordingly, a new Asian emission inventory was developed for MICS-Asia III by integrating state-of-the-art national or regional inventories to support this model intercomparison study (Li et al., 2017). This is the major theme of MICS-Asia III Topic 2. These emissions, along with biogenic emissions, biomass burning emissions, emissions from air and ship transport, volcano emissions, and dust emissions, were used. This section provides some basic descriptions of these emissions.

Anthropogenic emissions
The state-of-the-art anthropogenic emission inventory for Asia (MIX) was developed by incorporating five inventories, including the REAS (Regional Emission inventory in ASia) inventory for Asia developed at the Japan National Institute for Environmental Studies (NIES), the Multi-resolution Emission Inventory for China (MEIC) developed at Tsinghua University, the high-resolution ammonia (NH3) emission inventory in China developed at Peking University, the Indian emission inventory developed at Argonne National Laboratory in the United States and the CAPSS (Clean Air Policy Support System) Korean emission inventory developed at Konkuk University (Li et al., 2017). This MIX inventory includes emissions for 10 species, namely SO2, nitrogen oxides (NOx), carbon monoxide (CO), non-methane volatile organic compounds (NMVOCs), NH3, PM10, PM2.5, BC, OC and carbon dioxide (CO2). NMVOC emissions were provided with CB-05 (Carbon Bond chemical mechanism 05) and SAPRC99 speciation datasets. Speciation mapping of NMVOC emissions for groups using other gas phase chemical mechanisms, such as CBMZ, RADM2 and CBM4, used the speciation framework documented in Li et al. (2014). Emissions of these species were prepared for the years 2008 and 2010 with a monthly temporal resolution and 0.25° spatial resolution. Weekly or diurnal profiles were also provided. Five sectors were considered, namely industry, power generation, residential sources, transportation and agriculture. Figure 2 shows the spatial maps of these 10 species for January 2010. Emissions of most of these species exhibit similar spatial patterns, with enhanced values in east China and lower values in north and south India. Emissions of NH3 display a different spatial distribution, with pronounced values in India and lower values in northern China (Fig. 2). A more detailed description of this emission inventory is found in Li et al. (2017).
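Because the inventory is monthly and comes with weekly or diurnal profiles, a model needs some rule to turn a monthly total into an hourly emission rate. The sketch below shows one common way to do this, assuming profile weights that are normalised to a mean of 1. The function names and the example weights are hypothetical and are not taken from the MIX documentation.

```python
def normalise(weights):
    """Scale a list of profile weights so that their mean is exactly 1."""
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]


def hourly_emission(monthly_total, days_in_month, weekly_factor, diurnal_factor):
    """Return the emission for one hour from a monthly total.

    `weekly_factor` is the normalised weight of the day of the week and
    `diurnal_factor` is the normalised weight of the hour of the day
    (both assumed to have a mean of 1 over their cycle).
    """
    mean_hourly = monthly_total / (days_in_month * 24)
    return mean_hourly * weekly_factor * diurnal_factor


# Hypothetical diurnal profile: night hours at weight 1, day hours at weight 3.
diurnal = normalise([1.0] * 12 + [3.0] * 12)
one_day = [hourly_emission(744.0, 31, 1.0, f) for f in diurnal]
```

With a mean-1 diurnal profile, the 24 hourly values of a day sum to the flat daily amount. A weekly profile conserves the monthly total only approximately, because a month rarely contains a whole number of weeks.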

Biogenic emissions
Terrestrial ecosystems generate various chemical species, including volatile and semi-volatile compounds, which play important roles in atmospheric chemistry and are the largest contributor to the global annual flux of reactive VOCs (Guenther et al., 2006). For MICS-Asia III, hourly biogenic emissions were provided for the entire year of 2010 using the Model of Emissions of Gases and Aerosols from Nature (MEGAN) version 2.04 (Guenther et al., 2006). The variables that drive MEGAN include land cover information (plant functional type, leaf area index) and weather conditions, which include solar transmission, air temperature, humidity, wind speed and soil moisture. In the preparation of MEGAN biogenic emissions, land cover information was taken from the NASA MODIS (Moderate Resolution Imaging Spectroradiometer) products, and weather conditions were calculated using the WRF simulations. Figure S1 in the Supplement shows biogenic emissions of some selected species (isoprene and HCHO) for January 2010. High biogenic emissions are found in South Asia during winter, including India, southern China and Southeast Asia, where solar radiation, air temperature and vegetation cover are higher than in northern regions. As shown in Table 1, M1 and M5 used prescribed biogenic VOC emissions; other models except M6 used internal calculation.

Biomass burning emissions
Biomass burning is a strong contributor to air pollutants, and extensive biomass burning in Asia, particularly Southeast Asia, exerts a great influence on air quality (Streets et al., 2003). For MICS-Asia III, biomass burning emissions were processed by re-gridding the Global Fire Emissions Database version 3 (GFEDv3; Randerson et al., 2015) (0.5° by 0.5°). GFED fire emissions are estimated through combining satellite-detected fire activity and vegetation productivity information. Carbon, dry matter, CO2, CO, CH4, hydrogen, nitrous oxide, NOx, NMHC (non-methane hydrocarbon), OC, BC, PM2.5, total particulate matter and SO2 emissions are estimated with monthly temporal resolution. Figure S2 shows the gridded biomass burning emissions for January 2010. Biomass burning activity was highest in Cambodia and some areas of Myanmar and northern Thailand (Fig. S2), and the peak emission season is spring. Although it has been concluded that biomass burning could significantly contribute to aerosol concentrations in China, the contribution is limited for the Topic 3 study since the region on which it focuses is northern China, where biomass burning emissions are negligible during winter (Gao et al., 2016a).

Volcanic SO 2 emissions
Volcanoes are important sources of various sulfur and halogen compounds, which play crucial roles in tropospheric and stratospheric chemistry. It is estimated that SO2 emitted from volcanoes accounts for about 9 % of the total worldwide annual SO2 flux (Stoiber et al., 1987). The Asia-Pacific region is one of the most geologically unstable regions in the world, where many active volcanoes are located. During MICS-Asia Phase II, the volcano SO2 emissions had already been provided for chemical transport models (Carmichael et al., 2008b). Volcano SO2 emissions were provided, with a daily temporal resolution. In January, some volcanoes in Japan are very active, such as Miyake-jima (139.53° E, 34.08° N; 775 m a.s.l.) and Sakurajima (130.65° E, 31.59° N; 1117 m a.s.l.).

Air and ship emissions
Fuel burning in aircraft and ship engines produces greenhouse gases and air pollutants. The shipping and aircraft emissions used are based on the HTAPv2 emission inventory (0.1° by 0.1°) for the year 2010 (Janssens-Maenhout et al., 2015), provided on an annual basis. Aircraft emissions include three parts: landing and takeoff (LTO), climbing and descent (CDS), and cruise (CRS). Aircraft emission hot spots are mostly located in Japan and in Beijing, the Yangtze River Delta (YRD) and the Pearl River Delta (PRD) in China (Fig. S3). The East China Sea and the waters around Japan and Singapore exhibit high shipping emissions due to active shipping transportation (Fig. S3). It is estimated that international shipping contributes about 10 % to the global SO2 emissions and together with aviation contributes more than 10 % to global NOx emissions (Janssens-Maenhout et al., 2015).

Dust emissions
In M2, the AFWA version of the GOCART dust model was used. It calculates the saltation flux (Q) as a function of friction velocity (u*) and threshold friction velocity (u*t), where C is a tunable empirical constant, ρ0 is air density and g is gravitational acceleration. The bulk vertical dust flux is estimated by F = αQE (Marticorena and Bergametti, 1995), in which α is the sandblasting efficiency and E is the dust erodibility factor. The erodibility factor data are included in the model geography dataset. In M3 and M4, the dust emissions are estimated using the GOCART dust model (Ginoux et al., 2001) and are determined by soil texture, moisture and surface wind speed. The drier the soil and the stronger the wind, the higher the dust emissions over the regions where the erodibility factor is not 0. In M5, soil dust emissions were estimated by the approach from Han et al. (2004), in which C0 is a constant (1.4 × 10−15), Ri is the reduction factor and fi is the fractional coverage of the i type of vegetation in a model grid (considering that vegetation cover can reduce dust emissions). u* and u*t are the friction and threshold friction velocities. RH and RH t are the relative humidity and threshold relative humidity near the surface. The total dust emission flux is apportioned to each size bin based on field measurements of the vertical dust flux size distributions in Chinese deserts.
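The saltation-flux expression for M2 is not reproduced in this text. As a hedged reconstruction only, the form usually attributed to Marticorena and Bergametti (1995), written with the symbols defined in the paragraph above, is sketched below. It is an assumption that the article uses exactly this form, so it should be checked against the published equation before use. The expression from Han et al. (2004) used in M5 is not reconstructed here.

```latex
% Assumed standard form of the saltation flux; valid for u_* > u_{*t}, and Q = 0 otherwise.
Q = C \, \frac{\rho_0}{g} \, u_*^{3}
    \left( 1 + \frac{u_{*t}}{u_*} \right)
    \left( 1 - \frac{u_{*t}^{2}}{u_*^{2}} \right),
\qquad
F = \alpha \, Q \, E
```

In this form the flux vanishes as u* approaches u*t and grows roughly with the cube of the friction velocity well above the threshold.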

Boundary conditions
To predict more realistic spatial and temporal variations in air pollutants, boundary conditions from global chemical transport models are necessary to drive regional chemical transport models (Carmichael et al., 2008b). Simulations of two global chemical transport models (i.e., GEOS-Chem (The Goddard Earth Observing System Model-Chemistry) and MOZART (Model for OZone And Related chemical Tracers)) were used as boundary conditions for MICS-Asia III. GEOS-Chem was developed in the USA to simulate tropospheric chemistry driven by assimilated meteorology (Bey et al., 2001). The National Center for Atmospheric Research (NCAR) also provides global simulations of atmospheric chemistry (MOZART model) and an interface to convert them to WRF-Chem boundary conditions (Emmons et al., 2010), and NASA provides global aerosol distributions using the global GOCART chemistry model (Chin et al., 2002). GEOS-Chem was run with a 2.5° × 2° resolution and 47 vertical layers. The MOZART-4 simulations were configured at the horizontal resolution of 2.8° × 2.8° and with 28 vertical levels. NASA GOCART was configured at the same resolution as GEOS-5 meteorology (1.25° × 1°). As listed in Table 1, M1 used climatological data from the NOAA Aeronomy Lab Regional Oxidant Model (NALROM), while M2 used boundary conditions from the MOZART-4 (provided from the NCAR website). M3 and M4 used MOZART-4 as boundary conditions for gases and used GOCART as boundary conditions for aerosols. M6 also used fixed climatology boundary conditions, and M5 and M7 used GEOS-Chem outputs as boundary conditions. The spatial distributions of near-surface concentrations of major gases and aerosols from both GEOS-Chem and MOZART are shown in Fig. S4. Even if the same global chemistry model is used as boundary conditions, the treatments of inputs might differ in detail, which might lead to dissimilarities. In MICS-Asia II, Holloway et al. (2008) discussed the impacts of uncertainties in global models on regional air quality simulations.

Observation data
Historically, the lack of reliable air quality measurements in Asia has been an obstacle in understanding air quality and constraining air quality modeling in Asia. Beginning with MICS-Asia II, observational data from EANET have been used to evaluate model performance. EANET was launched in 1998 to address acid deposition problems in East Asia, following the model of the Cooperative Program for Monitoring and Evaluation of the Long-range Transmission of Air pollutants in Europe (EMEP). As of 2010, there were 54 wet-deposition sites and 46 dry-deposition sites in the 13 participating countries. Quality assurance and quality control measures were implemented at the national levels and in the Inter-laboratory Comparison Project schemes to guarantee a high-quality dataset. EANET supported the current activities of MICS-Asia III and provided measurements in 2010 to all modeling groups. More information about the EANET dataset can be found at http://www.eanet.asia/.
In addition to the EANET data, measurements of air pollutants and aerosol optical depth (AOD) collected at the Campaign on Atmospheric Aerosol Research network of China (CARE-China) (Xin et al., 2015) were also used. Previous successful networks in Europe and the United States underscored the importance of building comprehensive observational networks of aerosols in China to reach a better understanding of the physical, chemical and optical properties of atmospheric aerosols across China. As the first comprehensive attempt in China, CARE-China was launched in 2011 by the Chinese Academy of Sciences (CAS) (Xin et al., 2015). Before launching this campaign, CAS had already been measuring air pollutants and AOD at some CARE-China sites.
Table 2 summarizes the locations and characteristics of the CARE-China measurements for January 2010. Air quality measurements include concentrations of PM2.5, PM10, SO2, NO2, NO, CO and O3.
In addition, AOD from the Aerosol Robotic Network (AERONET) (https://aeronet.gsfc.nasa.gov/) and the operational meteorological measurements (near-surface temperature, humidity, wind speed and downward shortwave radiation) in China and atmospheric sounding data in Beijing were used. AERONET provides a long-term, continuous, readily accessible and globally distributed database of spectral AOD, inversion products and precipitable water. AOD data are calculated for three quality levels: Level 1.0 (unscreened), Level 1.5 (cloud screened) and Level 2.0 (cloud screened and quality assured) (Holben et al., 1998). The locations and characteristics of the AERONET measurements are also summarized in Table 2. In situ measurements of meteorological data from standard stations in China are operated by the China Meteorological Administration (CMA), and daily, monthly and annual data are accessible to the public (http://data.cma.cn/en). The locations of all observational sites used are marked in Figs. S5-S7.
The meteorology measurements (locations are shown in Fig. S5) were averaged and compared with model results in pairs. The radiation measurements were averaged and compared against model results in northern China and southern China (locations are shown in Fig. S6), separately. The CARE-China, AERONET and EANET measurements (locations are shown in Figs. S6 and S7) were compared against model results site by site, and model ensemble mean values were calculated by averaging all model results.
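The ensemble mean described above is an unweighted average over models, and the abstract measures inter-model spread with the coefficient of variation (SD divided by the mean). The sketch below illustrates both quantities for a single site. It is not the authors' code, and the use of the population SD rather than the sample SD is an assumption, since the text does not say which was used.

```python
import statistics


def ensemble_mean_series(series_by_model):
    """Average equally long time series from several models, step by step."""
    return [statistics.fmean(step) for step in zip(*series_by_model)]


def coefficient_of_variation(model_values):
    """Return the SD across models divided by the multi-model mean.

    The population SD (pstdev) is used here as an assumption; switch to
    statistics.stdev for the sample SD. Returns NaN when the mean is 0.
    """
    mean = statistics.fmean(model_values)
    if mean == 0:
        return float("nan")
    return statistics.pstdev(model_values) / mean
```

For example, `coefficient_of_variation([1.0, 2.0, 3.0])` is about 0.41, so values above 1.3 or 1.6, as reported for sulfate, nitrate and organic aerosols, indicate a spread larger than the multi-model mean itself.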

Analysis methodology
All groups participating in Topic 3 were requested to simulate meteorology, air quality, radiative forcing and effects of aerosols over the Beijing-Tianjin-Hebei region of east China during two periods: January 2010 and January 2013. Each group was requested to submit the following fields from their simulations.
1. Hourly mean meteorology: a. air temperature and water vapor mixing ratio at 2 m above the ground (T2, Q2), wind speed at 10 m above the ground (WS10) and shortwave radiation flux (W m−2) at the surface; b. the above variables (except shortwave radiation flux) at 1 and 3 km above the ground.
2. Hourly mean concentrations: a. SO2, NOx, CO, O3, PM2.5, PM10 and sulfate, nitrate, ammonium, BC, OC and dust in PM2.5; b. the above variables at 1 and 3 km above the ground.
3. Hourly mean AOD, aerosol direct radiative forcings at the surface, top of the atmosphere (TOA) and inside the atmosphere (single scattering albedo is an option for participants).
4. Hourly mean integrated liquid water and cloud optical depth.
5. Changes in T2, Q2, WS10 and PM 2.5 concentrations at the surface due to both direct and indirect aerosol effects.
We calculated multiple model evaluation metrics, including correlation coefficient (r), root mean square error (RMSE), mean bias error (MBE), normalized mean bias (NMB), mean fractional bias (MFB) and mean fractional error (MFE). The equations for these metrics are presented in the Supplement.
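The exact equations are given in the Supplement, which is not reproduced here. As a hedged illustration, the sketch below computes the six metrics using their commonly used definitions, with NMB, MFB and MFE expressed in percent. The function name, the dictionary layout and the treatment of missing values are assumptions made for this example.

```python
import math


def evaluation_metrics(model, obs):
    """Compute r, RMSE, MBE, NMB, MFB and MFE for paired model/observation values.

    Pairs with a missing (None) value are dropped. NMB, MFB and MFE are
    returned in percent. MFB and MFE assume model + obs is never zero,
    which holds for positive quantities such as concentrations.
    """
    pairs = [(m, o) for m, o in zip(model, obs) if m is not None and o is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid model-observation pairs")
    m_mean = sum(m for m, _ in pairs) / n
    o_mean = sum(o for _, o in pairs) / n
    cov = sum((m - m_mean) * (o - o_mean) for m, o in pairs)
    var_m = sum((m - m_mean) ** 2 for m, _ in pairs)
    var_o = sum((o - o_mean) ** 2 for _, o in pairs)
    diffs = [m - o for m, o in pairs]
    return {
        "r": cov / math.sqrt(var_m * var_o),  # Pearson correlation
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "MBE": sum(diffs) / n,
        "NMB": 100.0 * sum(diffs) / sum(o for _, o in pairs),
        "MFB": 100.0 * 2.0 / n * sum((m - o) / (m + o) for m, o in pairs),
        "MFE": 100.0 * 2.0 / n * sum(abs(m - o) / (m + o) for m, o in pairs),
    }
```

A model that doubles every observation, for instance `evaluation_metrics([2, 4, 6], [1, 2, 3])`, gives r = 1 but NMB = 100 %, which shows why correlation alone is not enough to judge performance.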

Results and discussions
Winter haze events frequently happen in east China, which is partially due to the stagnant weather conditions in winter. Here we present general descriptions of the meteorological conditions during January 2010 using the NCEP/NCAR reanalysis products. Figure S8 displays the monthly mean T2, WS10 and total precipitation. Near-surface wind speeds were very weak in the eastern and central China regions, and there was no significant precipitation in northern China (Fig. S8). During winters, northern China burns coal for heating, generating more emissions of air pollutants. Under stagnant weather conditions, haze episodes are easily triggered.
High concentrations of aerosols during January provide a great opportunity to study aerosol-radiation-weather interactions.
In this section, we present some major features of model performances in meteorological and chemical variables for the January 2010 period. Detailed analyses of aerosol feedbacks and radiative forcing are presented in MICS-Asia III companion papers. Heavy haze occurred over broad regions of east China in January 2010. The plots of observed meteorological variables and PM2.5 in Beijing show the general situation (Fig. 3). Elevated PM2.5 occurred during three periods separated in time by roughly 1 week (8, 16 and 26 January). The major event occurred during 15-21 January. The events occurred during periods of low wind speeds and increasing temperature and relative humidity. The high PM2.5 concentrations during 15-21 January also greatly reduced the downward shortwave radiation. Below we evaluate how well the models predict these features.

Evaluation of meteorological variables
Air quality is affected not only by emissions but also by meteorological conditions. Meteorology affects air quality through altering emissions, chemical reactions, transport, turbulent mixing and deposition processes (Gao et al., 2016c). Thus, it is important to assess how well these participating models reproduced meteorological variables. The predicted T2, Q2, WS10 and daily maximum downward shortwave radiation (SWDOWN) were evaluated against near-surface observations at the CMA sites.
Figure 4a-c show the comparisons between simulated and observed daily mean T2, Q2 and WS10 averaged over stations in east China (locations are shown in Fig. S5) during January 2010, along with the multi-model ensemble mean and observational SD. The calculated correlation coefficients between models and observations are also shown in Fig. 4 and other calculated model evaluation metrics are summarized in Table 3. In general, the simulated magnitudes and temporal variations in T2 and Q2 show a high order of consistency with observations, with correlation coefficients ranging from 0.88 to 1. For T2, models tend to have a cool bias; M1 and M2 have the lowest RMSE (0.64 and 0.68), lowest MBE (−0.19 and −0.60) and lowest NMB (−0.07 and −0.22 %) values (Table 3). For Q2, most models tend to slightly overestimate values; M1 and M2 have the best performance, with the lowest RMSE (0.14 and 0.10), lowest MBE (0.02 and −0.01), and lowest NMB (0.84 and −0.55 %) values (Table 3).
Simulated wind speeds exhibit a larger diversity of results. All models tend to overestimate WS10, with MBE ranging from 0.15 to 2.37 m s⁻¹. Overestimating wind speeds under low wind conditions is a common problem of current weather forecasting models, and many factors, including errors in terrain data and reanalysis data, relatively low horizontal and vertical model resolutions, as well as a poorly parameterized urban surface effect, contribute to these overestimations. From the calculated RMSE, MBE and NMB listed in Table 3, M2, M5 and M7 show better skill in capturing WS10. In addition, the multi-model ensemble mean shows the lowest RMSE for Q2 and also better skill than most models for T2 and WS10. The correlation coefficients between the multi-model ensemble mean and observations are 0.99, 0.99 and 0.98 for T2, Q2 and WS10, respectively.
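The multi-model ensemble mean referred to above can be illustrated with a short sketch. It assumes an unweighted, point-by-point average of the submitted time series; the function name `ensemble_mean` and the model labels in the usage note are hypothetical.

```python
def ensemble_mean(series_by_model):
    """Average equally long time series point by point across models.

    Assumes an unweighted mean; every model contributes equally.
    """
    series = list(series_by_model.values())
    if not series:
        raise ValueError("at least one model series is required")
    length = len(series[0])
    if any(len(s) != length for s in series):
        raise ValueError("all model series must have the same length")
    # zip(*series) groups the values of all models at each time step
    return [sum(values) / len(series) for values in zip(*series)]
```

For instance, `ensemble_mean({"A": [1.0, 2.0], "B": [3.0, 6.0]})` returns `[2.0, 4.0]`. When individual models carry biases of opposite sign, the averaging partially cancels them, which is one plausible reason an ensemble mean can score better than most of its members.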
The accuracy of radiation predictions is of great significance in evaluating aerosol-radiation-weather interactions. We evaluated simulated daily maximum SWDOWN averaged over sites in northern China and southern China separately in January 2010 against observations. The locations of the radiation sites are shown in Fig. S6. As shown in Fig. 4d, over stations in northern China, all models except M6 and M7 reproduce daily maximum SWDOWN well, with correlation coefficients ranging from 0.72 to 0.94. The poor per- S9). The slightly higher daily maximum SWDOWN from M7 than other models is due to the deactivation of aerosol-radiation interactions in the presented M7 simulation. SWDOWN decreases under conditions of high PM, as shown for example on 9 and 15-21 January. This is one of the important reasons for coupled air quality and meteorology modeling. It is worth noting that most models predict higher daily maximum SWDOWN compared to observations when severe haze happened in the North China Plain (16-19 January 2010), indicating that aerosol effects on radiation might be underestimated. Clouds also alter radiation. To exclude the impacts of clouds on the radiation shown here, we calculated the radiation reduction ratio due to clouds using radiation predictions for clear-sky and cloudy conditions from M2 (shown in Fig. S10). During the severe haze period (16-19 January 2010), the averaged reduction fraction is 5.9 % in northern China and 4.2 % in southern China. Thus, the relatively lower radiation during this period (Fig. 4d) is mainly caused by aerosols, but the lowest radiation on 20 January was caused by clouds (Figs. 4d and S10). Over southern China sites (Fig. 4e), M6 and M7 show a better consistency with observations than over northern China sites.
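The radiation reduction ratio due to clouds mentioned above can be sketched as follows. The exact formula is not given in the text, so this assumes the ratio is the clear-sky minus all-sky (cloudy) surface shortwave flux, divided by the clear-sky flux; the function name and the flux values in the usage note are hypothetical.

```python
def cloud_reduction_fraction(sw_clear_sky, sw_all_sky):
    """Fraction of the clear-sky surface shortwave flux removed by clouds.

    Assumed definition: (clear-sky flux - all-sky flux) / clear-sky flux.
    Both fluxes must use the same units (for example W m-2).
    """
    if sw_clear_sky <= 0:
        raise ValueError("clear-sky flux must be positive")
    return (sw_clear_sky - sw_all_sky) / sw_clear_sky
```

With illustrative fluxes of 500 W m⁻² (clear sky) and 470.5 W m⁻² (with clouds), the function returns 0.059, i.e. a reduction of 5.9 %. A value this small leaves most of an observed drop in SWDOWN to be explained by aerosols.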
According to the calculated RMSE listed in Table 3, M3 and the multi-model ensemble mean exhibit relatively better performance in capturing the observed time series of daily maximum SWDOWN in both northern China and southern China.
The above comparisons show that T2 and Q2 were reproduced well by the participating models, but wind speeds were overestimated by all models. Emery et al. (2001) proposed that excellent model performance would be classified as wind speed RMSE smaller than 2 m s⁻¹ and wind speed bias smaller than 0.5 m s⁻¹. Based on the calculated RMSE and MBE of WS10 shown in Table 3, RMSE values from all models match the proposed RMSE threshold but MBE values are higher than 0.5 m s⁻¹. The vertical distributions of temperature, water vapor mixing ratio and wind speeds were also validated against atmospheric sounding data in Beijing at 1 and 3 km (Fig. S11, averaged at 00:00 and 12:00 UTC) (http://weather.uwyo.edu/upperair/sounding.html). The magnitudes of temperature, water vapor mixing ratio and wind speeds from different models are generally consistent with each other at 1 and 3 km, but variations are larger near the surface.
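The wind speed benchmark of Emery et al. (2001) quoted above amounts to two threshold tests, sketched below. Treating the bias criterion as a limit on the absolute value of MBE is an assumption made here, and the function name is hypothetical.

```python
RMSE_LIMIT = 2.0  # m s-1, wind speed RMSE threshold quoted in the text
BIAS_LIMIT = 0.5  # m s-1, wind speed bias threshold quoted in the text


def meets_wind_benchmark(rmse, mbe):
    """Check wind speed statistics against the two quoted thresholds.

    Assumption: the bias criterion applies to the magnitude of MBE.
    """
    return {"rmse_ok": rmse < RMSE_LIMIT, "bias_ok": abs(mbe) < BIAS_LIMIT}
```

As an illustration, `meets_wind_benchmark(1.5, 2.37)` passes the RMSE test but fails the bias test. The RMSE of 1.5 m s⁻¹ is a made-up value; 2.37 m s⁻¹ is the upper end of the MBE range reported for WS10.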

Evaluation of air pollutants
Figure 5 displays the daily averaged predicted and observed SO2, NOx, CO, O3, PM2.5 and PM10 concentrations at the Beijing station, along with the observational SD (locations are shown in Fig. S7). Comparisons for the Tianjin, Shijiazhuang and Xianghe sites are shown in Figs. S12-S14. M6 only provided SO2 and NOx concentrations, so it is not shown in the plots of CO, O3, PM2.5 and PM10. The observed and predicted primary gaseous pollutants, PM2.5 and PM10 show the same monthly variations with elevated values at roughly weekly intervals, with the largest event occurring during 15-21 January. For example, as shown in the comparisons of SO2 concentration, the temporal variations are reproduced well by all the models, but peak values are overestimated or underestimated by some models. Based on the calculated MBE values shown in Table 4, all models except M2 tend to underestimate SO2 at the CARE-China sites. M1 shows the highest correlation (0.90) with SO2 observations at the Beijing site, and most other models show similarly good correlations. The multi-model ensemble mean shows a better agreement with observations with a higher correlation of 0.92, and it falls within the range shown by the SD error bar. In general, the predictions for NOx capture the main features in the observations, with slightly less skill than for the SO2 prediction. The calculated correlation coefficients for NOx from different models are close to each other, ranging from 0.63 to 0.88. M2 and M5 predict higher NOx concentrations than observations and other models (MBE in Table 4). All models overestimate NOx concentration in Shijiazhuang (Fig. S14), suggesting that NOx emissions in Shijiazhuang might be overestimated in the MIX emission inventory. All models produce similar CO predictions.
PM2.5 concentrations are well modeled, with high correlation coefficients ranging from 0.87 to 0.90 in Beijing, from 0.83 to 0.93 in Tianjin and from 0.74 to 0.91 in Xianghe. The correlation coefficient of the multi-model ensemble mean for PM2.5 reaches 0.94 (Table 4), better than any individual model. The participating models reproduce PM10 variations less well than PM2.5 variations. M1 and M2 overestimate PM10 concentrations, and other models underestimate PM10 concentrations (MBE in Table 4). These biases are probably related to different treatments of primary aerosols and anthropogenic dust in the models. In winter in the North China Plain, soil dust generally contributes about 10 % to PM2.5 concentrations (He et al., 2014), but there is also primary PM from anthropogenic activity, such as power plants, traffic and construction. The primary particles are mostly in coarse mode, which might contribute to PM10 concentrations, but this is highly uncertain compared with other anthropogenic emission sectors.
The models showed the poorest skill in predicting ozone. The models differ in their performance in simulating ozone concentrations, and the correlation coefficients between models and observations can reach negative values (Fig. S12). M3 and M4 tend to overestimate ozone concentrations, M2 slightly overestimates it, and M1, M5 and M7 slightly underestimate it (MBE in Table 4). According to the calculated RMSE in Table 4, M1 and M7 show relatively better performance in modeling ozone variations. Although WRF-Chem and NU-WRF models were applied at three institutions, different gas phase chemistry schemes were used, which leads to the diversity among predicted ozone concentrations. The impacts of gas phase chemical mechanisms on ozone simulations have been investigated in Knote et al. (2015). The overestimations of ozone concentrations from M3 and M4 primarily occur during nighttime, implying that the titration of ozone by NOx is underestimated. Forkel et al. (2015) reported that the RADM2 solver in WRF-Chem has the problem of underestimating ozone titration in areas with high NO emissions, and this is the solver applied in M3 and M4.
Figure 6 shows the comparisons between modeled and observed ground level daily averaged concentrations of SO2, NOx, O3 and PM10 during January 2010 at the Rishiri site in Japan from EANET. The locations of EANET sites are marked in Fig. S7. Comparisons at other EANET sites are shown in Figs. S15-S18. The models are able to predict the major features in the observations. For example, low values of most pollutants are observed (and predicted) during the first half of the month, followed by elevated values, which peak on 21 January. For SO2, most models show similar capability in producing the temporal variations in observations with slight underestimation (MBE in Table 5). According to the calculated RMSE averaged over all the EANET sites, M2 and the multi-model ensemble mean performed the best. For NOx, the multi-model ensemble mean shows a lower RMSE than any individual model (Table 5). Similar to the comparisons over CARE-China sites, large discrepancies exist in ozone predictions, but the model ensemble mean still shows the lowest RMSE for ozone predictions. PM10 concentrations are largely underestimated by M1 (largest negative MBE: −21.03 µg m⁻³) and overestimated by M5 (highest positive MBE: 3.77 µg m⁻³) (Table 5), which could be related to the differences in sea-salt treatments. Spatial distributions of the monthly near-surface concentrations of SO2, NOx, O3 and CO for January 2010 from all participating models are shown in Fig. S19. The aerosol spatial distributions are discussed in the following section.

PM2.5 and PM2.5 chemical composition distribution
Due to different implementations of chemical reactions in the models, predicted PM2.5 chemical compositions from participating models differ greatly. Figures 7 and 8 show the predicted monthly mean concentrations of sulfate, nitrate, ammonium, BC and OC in PM2.5 from all the participating models for January 2010. M1, M2, M3, M4 and M7 all predict quite low sulfate concentrations in east China but with considerably enhanced sulfate in southwestern China and western India. M5 and M6 show similar spatial patterns of sulfate except that M6 produces higher concentrations. The chemical production of sulfate is mainly from gas phase oxidation of SO2 by OH radicals and aqueous-phase pathways in cloud water. In cloud water, dissolved SO2 can be oxidized by O3, H2O2, Fe(III), Mn(II) and NO2 (Seinfeld and Pandis, 2016). Most chemical transport models have included the above gas phase oxidation of SO2 by OH and the oxidation of dissolved SO2 by O3 and H2O2 in the aqueous phase. Under hazy conditions, radiation is largely reduced due to aerosol dimming effects, and sulfate formation from gas phase and aqueous-phase oxidation processes is slowed down, which tends to reduce sulfate concentration. However, field observations exhibit an increase in sulfate concentration during haze episodes (Zheng et al., 2015). Cheng et al. (2016) proposed that the reactive nitrogen chemistry in aerosol water could contribute significantly to the sulfate increase due to enhanced sulfate production rates of NO2 reaction pathways under high aerosol pH and elevated NO2 concentrations in the North China Plain (NCP). Wang et al. (2016) also pointed out that the aqueous oxidation of SO2 by NO2 is key to efficient sulfate formation on fine aerosols with high relative humidity and NH3 neutralization or under cloudy conditions. In addition, Zheng et al. (2015) suggested that heterogeneous chemistry on primary aerosols could play an important role in sulfate production and increase simulated sulfate during haze episodes. X. Huang et al. (2014) found that including natural and anthropogenic mineral aerosols can enhance sulfate production through aqueous-phase oxidation of dissolved SO2 by O3, NO2, H2O2 and transition metals. Gao et al. (2016b), Wang et al. (2014) and Zhang et al. (2015) also emphasized the importance of multiphase oxidation in winter sulfate production. However, these processes are currently not incorporated into the participating models for this study, which might be responsible for the apparent underpredictions of sulfate concentrations (Fig. 9). M5 incorporated heterogeneous chemical reactions on aerosol surfaces (Li and Han, 2010), which enhances total sulfate production.
M1 and M5 predict relatively small nitrate and ammonium concentrations, while M2, M6 and M7 produce similar magnitudes and spatial patterns of nitrate. Nitrate formation involves both daytime and nighttime chemistry. During the daytime, NO2 can be oxidized by OH to form nitric acid (HNO3) and by ozone to form NO3. HNO3 is easily removed by dry or wet deposition, but NO3 is easily photolyzed back to NO2. During nighttime, NO3 is the major oxidant, which oxidizes NO2 to form dinitrogen pentoxide (N2O5). The homogeneous reaction of N2O5 with water vapor is possible but very slow, while the heterogeneous uptake of N2O5 onto aerosol particles has been identified as a major sink of N2O5 and an important contributor to particulate nitrate (Kim et al., 2014). The MOSAIC aerosol module (Zaveri et al., 2008) coupled with CBMZ gas phase chemistry in WRF-Chem already includes the heterogeneous uptake of N2O5 since version v3.5.1 (Archer-Nicholls et al., 2014), which is the version used by M2, leading to the high production of nitrate. An et al. (2013) incorporated photoexcited nitrogen dioxide molecules, heterogeneous reactions on aerosol surfaces and direct nitrous acid (HONO) emissions into the WRF-Chem model and found that these additional HONO sources could improve simulations of HONO and nitrate in northern China. M7 also predicts high nitrate concentrations (N2O5 and NO2 gases react with liquid water; Zheng et al., 2015), and the predicted lower nitrate concentrations from other models are probably due to missing aqueous-phase and heterogeneous chemistry or the implementations of a different gas phase oxidation in these models. Many studies have been conducted regarding sulfate formation issues. Nitrate also accounts for a large mass fraction in PM2.5 during winter haze events in northern China, yet less attention has been paid to fully understanding its formation. Future studies should examine in more detail how different processes contribute to high nitrate concentrations. M3 and M4 do not include an explicit nitrate and ammonium treatment, but ammonium is implicitly considered in the total PM2.5 mass estimate.
The predicted ammonium concentrations are associated with the amounts of sulfate and nitrate, as shown by their similar spatial distribution to sulfate and nitrate. NH3 neutralizes H2SO4 and HNO3 to form aerosol, so its amount can affect the formation of sulfate, nitrate and ammonium. Since the same emission inventory was used, the amount of ammonia available for neutralizing will not vary greatly among these models. Thus, the rates of H2SO4 and HNO3 production determine the amounts of ammonium. For example, the produced ammonium concentrations are small in M1, similar to its predicted sulfate and nitrate concentrations. High ammonium concentrations are predicted from M6, due to high production of nitrate and sulfate (Fig. 7).
The spatial distributions and magnitudes of predicted BC from all participating models are similar to each other as BC is a primary pollutant and not impacted by chemical reactions. The concentrations of BC in the atmosphere are mainly influenced by planetary boundary layer (PBL) mixing and diffusion, aging, deposition (dry deposition and wet scavenging) and advection processes. Predicted BC concentrations from M2 and M7 are higher than those from other models, which might be caused by the treatment of aging and deposition (dry deposition and wet scavenging) processes. For example, in the GOCART aerosol model (M3 and M4), 80 % of BC is assumed to be hydrophobic and then undergoes aging to become hydrophilic with an e-folding time of 1.2 days. Hydrophilic aerosols are subject to wet deposition. In other models such as M2 and M7, however, BC is assumed to be hydrophobic and there is thus less wet removal.
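The GOCART aging assumption described above can be written as a simple exponential decay. The sketch below tracks only the conversion from hydrophobic to hydrophilic BC and deliberately ignores emission, transport and deposition; the function name and default arguments are illustrative, with the 80 % initial fraction and 1.2-day e-folding time taken from the text.

```python
import math


def hydrophobic_bc_fraction(days_since_emission,
                            initial_fraction=0.8,
                            efolding_days=1.2):
    """Fraction of emitted BC that is still hydrophobic after a given time.

    Assumes first-order aging only (no emission, transport or removal).
    """
    if days_since_emission < 0:
        raise ValueError("time since emission must be non-negative")
    return initial_fraction * math.exp(-days_since_emission / efolding_days)
```

After one e-folding time (1.2 days) the hydrophobic fraction has dropped from 0.8 to roughly 0.29, so most BC in such a scheme becomes available for wet scavenging within a few days. A model that keeps BC hydrophobic has no comparable wet removal pathway.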
The disparity among predicted OC concentrations is mainly associated with the different treatments of SOA production, given that the primary organic carbon (POC) prediction is generally consistent among models using the same emission inventory. The predicted OC concentrations from M1, M2 and M7 are close to each other. M1 uses SORGAM to simulate SOA, but M2 and M6 did not include any SOA formation mechanism. The similar magnitudes of OC from M1 suggest that SORGAM in M1 does not produce appreciable amounts of SOA, which is consistent with the findings in Gao et al. (2016a). Although SOA formation was implemented in M5, the production is relatively weak compared to M3 and M4. In the atmosphere, SOA is mainly formed from the condensation of semi-volatile VOCs, which are the products of the oxidation of primary VOCs. An empirical two-product model (Odum et al., 1996) is often used to simulate SOA formation, but this method was reported to significantly underestimate measured SOA mass concentrations (Heald et al., 2008). Later, the volatility basis-set (VBS) approach (Donahue et al., 2006) was developed to represent the wide range of the volatility of organic compounds and complex processes. It was found that the VBS approach was able to increase SOA production and was able to reduce observation-simulation biases in many regions with high emissions (Tsimpidi et al., 2010) including east China (Han et al., 2016). It was also suggested that primary organic aerosols (POAs) are semi-volatile and can evaporate to become SOA precursors (Kanakidou et al., 2005). In M5, the SOA production is calculated using a bulk yield method (Lack et al., 2004), which uses yields that represent the maximum amount of SOA able to be produced from a unit of reacted VOCs. However, the SOA concentration is highly dependent on the yield data. During haze episodes, photochemistry is reduced due to the aerosol dimming effect; thus, aqueous reaction processes on aerosol water and cloud or fog water could become much more important in producing SOA. R. Huang et al. (2014) also suggested that low temperature does not significantly reduce SOA formation rates of biomass burning emissions.
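For reference, the two-product model of Odum et al. (1996) mentioned above is usually written as the following yield expression. The symbols are not defined in this paper and are introduced here for illustration only: Y is the SOA mass yield, ΔM_o the organic aerosol mass formed, ΔROG the mass of reactive organic gas consumed, M_o the absorbing organic aerosol mass concentration, and α_i and K_om,i the mass-based stoichiometric coefficient and gas-particle partitioning coefficient of product i.

```latex
Y = \frac{\Delta M_o}{\Delta \mathrm{ROG}}
  = M_o \sum_{i=1}^{2} \frac{\alpha_i \, K_{om,i}}{1 + K_{om,i} \, M_o}
```

In this form the yield grows with M_o and approaches α_1 + α_2 at large organic aerosol loading, which shows how strongly the result depends on the fitted α_i and K_om,i values.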
However, most models oversimplified SOA formation. In M3 and M4, SOA was treated by assuming that 10 % of VOCs from a terrestrial source are converted to OC (Chin et al., 2002), and these models produced high OC concentrations, with a major contribution from SOA. The 10 % yield rate could be unrealistically high during hazy days because solar radiation was much reduced. Zhao et al. (2016) comprehensively assessed the effect of organic aerosol aging and intermediate-volatility emissions on organic aerosol (OA) formation and confirmed their significant roles. All these results suggest that more complicated SOA schemes are needed to improve organic aerosol simulations during haze events.
The different predictions of PM2.5 chemical components lead to differences in PM2.5 and PM10 concentrations for January 2010, which are shown in the last row of Fig. 8. Although spatial distributions of PM2.5 from these models are similar, the underlying causes are different. M2, M3 and M5 simulated higher PM2.5 levels in the deserts of west China, which result from wind-blown dust. M1 and M7 failed to produce high PM2.5 concentrations in the deserts of west China, due to the omission of dust emissions. The spatial distributions of predicted wind-blown dust from M5 are slightly different from M2 and M3, with lower concentrations over the Gobi desert (in west Inner Mongolia) (PM10 in Fig. 8). M2 and M3 used similar GOCART dust emission schemes based on wind speeds and erodible areas, while M5 further considered the dust reduction by vegetation cover, which could partially explain the relatively lower wind-blown dust predictions from M5. The enhanced PM2.5 concentrations in central China from M2 and M7 are caused by large nitrate production, as shown in Fig. 7.
The differences in the predictions of aerosol composition discussed above can be seen clearly in the comparisons at the Beijing site during the 13-23 January period when a haze event occurred in the NCP (Fig. 9). Most models failed to produce the observed high sulfate concentrations. Only the sulfate predictions from M5 are close to the observed high values. M2 and M7 predict reasonable nitrate concentrations. M3 and M4 overpredict OC during the haze period, but other models underpredict OC concentrations. Figures 10 and 11 show the ensemble mean monthly averaged near-surface PM2.5 and PM2.5 composition, along with the spatial distribution of the coefficient of variation. The coefficient of variation (CV) is defined as the SD divided by the average (Carmichael et al., 2008b), and larger values indicate lower consistency among models. The mean concentrations of PM2.5 and of its chemical components are high in the Sichuan Basin and east China. High CV values are shown in northern China for sulfate and in most areas for nitrate and OC. The diversity in predictions of these species is caused by the complexity of secondary formation and different model treatments, which have been discussed above. Higher consistency is shown for modeled BC, with CV values less than 0.3 in most areas (Fig. 10h). The CV values for PM2.5 are also low in the northern China region, which is consistent with the good performance of PM2.5 predictions shown in the above comparisons. However, the CV values can reach above 1.6 in northwestern Chinese regions, partially due to discrepancies in dust predictions.
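The coefficient of variation defined above can be computed per grid cell from the values predicted by the individual models. The sketch below uses the population SD, which is an assumption (the text does not say whether the population or sample SD was used), and the function name is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """SD divided by the mean of the model predictions at one location.

    Assumes the population SD; a sample SD would give slightly larger values.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / mean
```

For the illustrative set `[2, 4, 4, 4, 5, 5, 7, 9]` (mean 5, SD 2) the function returns 0.4. A CV above 1, as reported for sulfate, nitrate and OC in some regions, means the spread among models exceeds the ensemble mean itself.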

Evaluation of AOD
We used the AOD measurements from the AERONET and CARE-China networks to evaluate how participating models perform in simulating AOD. The submitted AOD data from all models except M6 were at 550 nm, and AOD predictions from M6 were at 495 nm. We used the Ångström exponent relation (Schuster et al., 2006) to convert the M6 AOD from 495 to 550 nm, and all the AERONET and CARE-China AOD data were likewise converted to 550 nm. The locations of the AERONET and CARE-China AOD measurement sites are marked in Fig. S6. Daytime mean AOD values are calculated in a pairwise manner and the comparisons and performance statistics are shown in Figs. 12 and 13 and Table 6. On some days, data are missing because AOD cannot be retrieved under serious pollution and cloudy conditions (Gao et al., 2016a). On days with data, the variations in AOD are captured well by all models. However, large disparities exist among models in the simulated peak AOD values (factor of 2) during the severe haze episode of 15-20 January 2010 (Figs. 12 and 13). M2 consistently simulated the highest AOD values among models, followed by M5 and M7, while M6 simulated the lowest.
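The wavelength conversion described above follows the Ångström power law, in which AOD scales as wavelength to the power of minus the Ångström exponent. A minimal sketch is given below; the function name is hypothetical and the exponent must be supplied by the user, since the value applied in this study is not stated in the text.

```python
def convert_aod(aod, wavelength_from_nm, wavelength_to_nm, angstrom_exponent):
    """Convert AOD between two wavelengths with the Angstrom power law.

    tau(to) = tau(from) * (to / from) ** (-alpha), where alpha is the
    Angstrom exponent (an input here, not a value taken from the paper).
    """
    if wavelength_from_nm <= 0 or wavelength_to_nm <= 0:
        raise ValueError("wavelengths must be positive")
    ratio = wavelength_to_nm / wavelength_from_nm
    return aod * ratio ** (-angstrom_exponent)
```

With an illustrative exponent of 1, an AOD of 1.0 at 495 nm becomes 0.9 at 550 nm. For fine-mode pollution aerosol the exponent is typically positive, so converting from 495 to 550 nm lowers the AOD.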
In M1 and M7, particle size distribution is described by a lognormal function with a geometric mean radius and a geometric SD based on the OPAC (Optical Properties of Aerosols and Clouds) database (Hess et al., 1998). In M3 and M4, sulfate, BC and OC are parameterized in bulk mode, and a sectional scheme is used for sea-salt and dust aerosols. M2 uses an eight-bin sectional aerosol scheme with size sections ranging from 39 nm to 10 µm. The refractive indices of the different aerosol components in the models are mainly taken from d'Almeida et al. (1991) or the OPAC database. All models except M6 use a kappa (κ) parameterization to describe aerosol hygroscopic growth (Petters and Kreidenweis, 2007), in which the hygroscopicity κ values vary widely among different aerosol chemical components. For example, κ = 0 for black carbon and κ > 0.6 for inorganic aerosols. M6 uses a different hygroscopic growth scheme following Kiehl and Briegleb (1993). WRF-Chem models assume internal mixing among aerosols within each mode (or size bin) and external mixing between modes (or size bins); M5 assumes that inorganic and carbonaceous aerosols are internally mixed but externally mixed with soil dust and sea salt. M6 uses an external mixture assumption among aerosols except for hydrophilic BC, which is internally mixed with other aerosols in a core-shell way.
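To show why κ matters for AOD, the diameter growth factor implied by the κ parameterization of Petters and Kreidenweis (2007) can be sketched as below. The sketch neglects the Kelvin (curvature) term and equates water activity with fractional relative humidity, which are simplifying assumptions; how each participating model applies κ in its optics code is not described here, and the function name is hypothetical.

```python
def growth_factor(kappa, relative_humidity):
    """Wet-to-dry diameter ratio from kappa-Koehler theory.

    Assumptions: Kelvin effect neglected, water activity equals the
    fractional relative humidity (0 <= RH < 1).
    """
    if not 0.0 <= relative_humidity < 1.0:
        raise ValueError("relative humidity must be in [0, 1)")
    water_activity = relative_humidity
    wet_to_dry_volume = 1.0 + kappa * water_activity / (1.0 - water_activity)
    return wet_to_dry_volume ** (1.0 / 3.0)
```

At 90 % relative humidity, κ = 0 (black carbon) gives no growth, whereas κ = 0.6 gives a growth factor of about 1.86, i.e. a particle volume 6.4 times the dry volume. Because the growth steepens sharply toward saturation, modest differences in simulated RH or in inorganic mass can translate into large differences in simulated AOD.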
As shown in Fig. 9, the observed total inorganic aerosol concentration in Beijing on 19 January 2010 was about 130 µg m⁻³ with sulfate concentrations higher than 50 µg m⁻³ and nitrate concentrations over 60 µg m⁻³. However, all models except M5 largely underestimated sulfate concentrations. Most models except M2 underpredicted nitrate concentrations. The predicted concentration of inorganic aerosols (the sum of sulfate, nitrate and ammonium) from M2 (175 µg m⁻³) is higher than the observations and the other models (Fig. 9), which can partially explain why M2 simulated the largest AOD. The largest simulated AOD by M2 could also be related to different vertical distributions of aerosols. M6 simulated a similar level of inorganic aerosols as M2, but the simulated AOD is lower than other models, which could be caused by weaker hygroscopicity from a different scheme (Kiehl and Briegleb, 1993) and/or lower simulated RH (see Fig. S20). Although M3 and M4 largely overpredict OC concentrations, the mass extinction coefficient of OC is smaller than that of inorganic aerosols. M1 predicts about 3 times larger BC concentrations than the observations. Although the mass extinction coefficient of BC is larger than that of inorganic aerosols, the mass concentrations and hygroscopicity of BC are smaller than those of inorganic aerosols, leading to relatively lower AOD from the M1 simulation. M5 and M7 show high consistency in the simulated AOD due to similar levels of predicted inorganic aerosol concentrations (80 to 90 µg m⁻³) and similar hygroscopic growth assumptions.
As listed in Table 1, internal mixing is assumed by all the participating models except M6 for major aerosol compositions. Curci et al. (2015) discussed the impacts of mixing state on simulated AOD and found that the external mixing state assumption significantly increases simulated AOD. M6 uses external mixing but shows relatively lower AOD mainly due to its omission of other aerosol species such as dust and sea salt. In general, the magnitudes of simulated inorganic aerosol concentrations and the hygroscopic growth efficiency (affected by varied RH) can explain the simulated variations and magnitudes of AOD in Beijing during the severe haze event, given that most models use a similar lognormal size distribution and internal mixing assumptions.
Table 6 shows the statistics for AOD simulations at the northern China sites and at all sites. In the NCP region, R ranges from 0.36 to 0.74 for all the models. It is noteworthy that R values at the sites in the NCP are larger than those at all sites, indicating the larger reliability of model inputs (emissions) and meteorological simulations in northern China. In terms of magnitudes, all models tend to underpredict AOD, with an NMB of −2.7 to −71 % in the NCP, and larger biases (NMB of −21 to −75 %) at all sites. It is interesting to note that using a finer grid size (M4) can produce a slightly smaller NMB compared with the same model using a larger grid size (M3). The effect of grid resolution will be the topic of a future paper.

Summary
The MICS-Asia Phase III Topic 3 examines how current online coupled air quality models perform in reproducing extreme aerosol pollution episodes in northern China and how high aerosol loadings during these episodes interact with radiation and weather. A new anthropogenic emission inventory was developed for this phase (Li et al., 2017), and this inventory along with biogenic, biomass burning, air and ship, volcano, and dust emissions was used by all the modeling groups. All modeling groups were required to submit results based on the analysis methodology that is documented in this paper.
This paper focused on the evaluation of the predictions of meteorological parameters and the predictions of aerosol mass, composition and optical depth. These factors play important roles in feedbacks impacting weather and climate through radiative and microphysical processes. Comparisons against daily meteorological variables demonstrated that all models could capture the observed near-surface temperature and water vapor mixing ratio, but near-surface wind speeds were overestimated by all models to varying degrees. The observed daily maximum downward shortwave radiation and particularly low values during haze days were represented in the participating models. Comparisons with measurements of air pollutants, including SO2, NOx, CO, O3, PM2.5, and PM10, from the CARE-China and EANET networks showed that the main features of the accumulation of air pollutants are generally represented in the current generation of online coupled air quality models. The observed variations in AOD from both the CARE-China and AERONET networks were also reproduced well by the participating models. Differences were found between simulated air pollutants, particularly ozone. While wintertime ozone levels are typically low (below 40 ppb) as photochemical pathways are slow, the models captured the synoptic variability but differed in the absolute magnitudes of near-surface concentrations. Large differences among the models were found in the predicted PM2.5 chemical compositions, especially secondary inorganics and organic carbon. During winter haze events, the production from gas phase chemistry is inhibited, and the inclusion of other aerosol formation pathways (such as aqueous-phase chemistry) in some models leads to the large differences between simulated concentrations of secondary inorganic aerosols. In addition, different SOA treatments also lead to large discrepancies between simulated OC concentrations. Differences in the simulated variations and magnitudes of AOD in Beijing during the January 2010 haze episodes could be explained by the differences in simulated inorganic aerosol concentrations and the hygroscopic growth efficiency (affected by varied RH).
Results from this intercomparison demonstrate that there remain important issues with current coupled models in predicting winter haze episodes. Low wind speeds play an important role in haze episodes. Current models can predict the low wind-speed-high-haze relationship but overestimate the low wind speeds. This contributed to the underestimation of PM2.5. The models also underestimate the production of secondary inorganic aerosols. There is currently a great deal of research focused on inorganic aerosol production under winter haze conditions and new pathways need to be included in the models to improve prediction skills. Furthermore, current models have various treatments of SOA production, leading to large differences in SOA predictions during winter haze episodes.
However, it was also found that using the ensemble mean of the models produced the best prediction skill. While this has been shown for other conditions (for example, the prediction of high-ozone events in the US; Mckeen et al., 2004), this is to our knowledge the first time it has been shown for heavy haze events. The uncertainties in predictions of aerosol composition concentrations and optical depth will impact estimates of the aerosol direct and indirect effects during haze events (Gao et al., 2017a-c). The results of the MICS-Asia Topic 3 experiments looking at the direct and indirect effects during these heavy haze events are the subject of companion papers.
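The way an ensemble mean can outperform its individual members when their errors partly cancel can be illustrated with a minimal Python sketch. It computes the mean bias error (MBE), root mean square error (RMSE) and correlation coefficient for each member and for the ensemble mean. The model names and all concentration values below are hypothetical and chosen only for illustration; they are not results from MICS-Asia III, and the sketch is not the evaluation code used in this study.

```python
import math
from statistics import mean


def mbe(sim, obs):
    """Mean bias error: average of (simulated - observed)."""
    return mean(s - o for s, o in zip(sim, obs))


def rmse(sim, obs):
    """Root mean square error between simulated and observed series."""
    return math.sqrt(mean((s - o) ** 2 for s, o in zip(sim, obs)))


def pearson_r(sim, obs):
    """Pearson correlation coefficient between two equally long series."""
    ms, mo = mean(sim), mean(obs)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs))
    var_s = sum((s - ms) ** 2 for s in sim)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)


def ensemble_mean(members):
    """Average the member series time step by time step."""
    return [mean(values) for values in zip(*members.values())]


# Hypothetical daily PM2.5 values (µg m-3); NOT data from the study.
obs = [40.0, 80.0, 150.0, 220.0, 90.0, 30.0]
members = {
    "model_a": [55.0, 100.0, 170.0, 250.0, 110.0, 45.0],  # biased high
    "model_b": [25.0, 60.0, 120.0, 180.0, 70.0, 20.0],    # biased low
    "model_c": [45.0, 70.0, 160.0, 200.0, 95.0, 35.0],    # mixed errors
}
ensemble = ensemble_mean(members)

# Collect the three statistics for every member and for the ensemble mean.
stats = {
    name: (mbe(sim, obs), rmse(sim, obs), pearson_r(sim, obs))
    for name, sim in {**members, "ensemble": ensemble}.items()
}

for name, (bias, err, r) in stats.items():
    print(f"{name:9s} MBE={bias:7.2f} RMSE={err:6.2f} r={r:.3f}")
```

In this contrived case the high and low biases offset each other, so the ensemble mean has the lowest RMSE. That outcome is not guaranteed in general: if all members share the same bias, averaging does not remove it.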
Competing interests. The authors declare that they have no conflict of interest.

Special issue statement. This article is part of the special issue "Global and regional assessment of intercontinental transport of air pollution: results from HTAP, AQMEII and MICS". It is not associated with a conference.

Figure 3. Observed near-surface daily meteorological variables and PM2.5 concentrations in Beijing for January 2010.

Figure 4. Comparisons between simulated and observed near-surface temperature (a), water vapor mixing ratio (b) and wind speeds (c) (T2, Q2 and WS10) and downward shortwave radiation in northern China (d) and southern China (e) (spatial daily values are averaged over measurements shown in S4 and S5; the error bars show the SD of values over the measurement sites).

Figure 5. Comparisons between simulated and observed daily air pollutants (SO2, NOx, CO, O3, PM2.5 and PM10) at the Beijing CARE-China site.

Figure 6. Comparisons between simulated and observed daily air pollutants (SO2, NOx, O3 and PM10) at the Rishiri EANET sites.

Figure 7. Simulated monthly concentrations of major PM2.5 components (µg m⁻³) for January 2010 from all participating models.
sion v3.5.1 (Archer-Nicholls et al., 2014), which is the version used by M2, leading to the high production of nitrate. An et al. (2013) incorporated photoexcited nitrogen dioxide molecules, heterogeneous reactions on aerosol surfaces and direct nitrous acid (HONO) emissions into the WRF-Chem model and found that these additional HONO sources could improve simulations of HONO and nitrate in northern China. M7 also predicts high nitrate concentrations (N2O5 and NO2 gases react with liquid water; Zheng et al., 2015), and the predicted lower nitrate concentrations from other models are probably due to missing aqueous-phase and heterogeneous

Figure 8. Simulated monthly concentrations of PM2.5 and major PM2.5 components (µg m⁻³) for January 2010 from all participating models.

Figure 9. Observed and simulated daily mean concentrations of major PM2.5 chemical components at the urban Beijing site.

Figure 10. The ensemble mean monthly averaged near-surface distributions of PM2.5 compositions for January 2010 (sulfate (a), nitrate (c), ammonium (e), BC (g), and OC (i)), along with the spatial distribution of the coefficient of variation (b, d, f, h, and j, SD divided by the average).

Figure 11. The ensemble mean monthly averaged near-surface distributions of PM2.5 for January 2010 (a), along with the spatial distribution of the coefficient of variation (b, SD divided by the average).
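The coefficient of variation mapped in Figs. 10 and 11 is the SD across models divided by the multi-model average at each grid cell. A minimal Python sketch of that calculation follows. The grid-cell names and concentrations are hypothetical, and the use of the population SD is an assumption, since the captions do not state whether a population or sample SD was used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """SD across models divided by the multi-model mean for one grid cell.

    Uses the population SD (an assumption). Returns NaN when the mean is
    zero, because the ratio is undefined there.
    """
    avg = mean(values)
    if avg == 0:
        return float("nan")
    return pstdev(values) / avg


# Hypothetical monthly mean sulfate (µg m-3) from four models in two
# grid cells; NOT values from the study.
cells = {
    "cell_1": [2.0, 4.0, 6.0, 8.0],  # models disagree
    "cell_2": [5.0, 5.0, 5.0, 5.0],  # models agree exactly
}

cv = {name: coefficient_of_variation(vals) for name, vals in cells.items()}

for name, value in cv.items():
    print(f"{name}: CV = {value:.3f}")
```

A larger value indicates that the models are less consistent with each other relative to their mean, which is how the maps in Figs. 10 and 11 are read.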

Table 1. Participating models in Topic 3.

Table 4. Performance statistics of air pollutants at the CARE-China sites (RMSE and MBE units: ppbv for gases and µg m⁻³ for PM).

Table 5. Performance statistics of air pollutants at the EANET sites (RMSE and MBE units: ppbv for gases and µg m⁻³ for PM).

Table 6. Performance statistics of AOD.