Evaluation of modeling NO2 concentrations driven by satellite-derived and bottom-up emission inventories using in situ measurements over China

Chemical transport models together with emission inventories are widely used to simulate NO2 concentrations over China, but validation of the simulations with in situ measurements has been extremely limited. Here we use ground measurements obtained from the air quality monitoring network recently developed by the Ministry of Environmental Protection of China to validate modeling surface NO2 concentrations from the CHIMERE regional chemical transport model driven by the satellite-derived DECSO and the bottom-up MIX emission inventories. We applied a correction factor to the observations to account for the interferences of other oxidized nitrogen compounds (NOz), based on the modeled ratio of NO2 to NOz. The model accurately reproduces the spatial variability in NO2 from in situ measurements, with a spatial correlation coefficient of over 0.7 for simulations based on both inventories. A negative and positive bias is found for the simulation with the DECSO (slope= 0.74 and 0.64 for the daily mean and daytime only) and the MIX (slope= 1.3 and 1.1) inventories, respectively, suggesting an underestimation and overestimation of NOx emissions from corresponding inventories. The bias between observed and modeled concentrations is reduced, with the slope dropping from 1.3 to 1.0 when the spatial distribution of NOx emissions in the DECSO inventory is applied as the spatial proxy for the MIX inventory, which suggests an improvement of the distribution of emissions between urban and suburban or rural areas in the DECSO inventory compared to that used in the bottom-up inventory. A rough estimate indicates that the observed concentrations, from sites predominantly placed in the populated urban areas, may be 10-40 % higher than the corresponding model grid cell mean. This reduces the estimate of the negative bias of the DECSO-based simulation to the range of -30 to 0 % on average and more firmly establishes that the MIX inventory is biased high over major cities. 
The performance of the model is comparable over seasons, with a slightly worse spatial correlation in summer due to the difficulties in resolving the more active NOx photochemistry and larger concentration gradients in summer by the model. In addition, the model well captures the daytime diurnal cycle but shows more significant disagreement between simulations and measurements during nighttime, which likely produces a positive model bias of about 15 % in the daily mean concentrations. This is most likely related to the uncertainty in vertical mixing in the model at night.

global NOx (NO + NO2) emissions by a factor of 3-6 since preindustrial times (Prather et al., 2001). China is one of the largest contributors to NOx emissions worldwide, accounting for 18 % of global NOx emissions according to EDGAR v4.2 (European Commission (EC): Joint Research Centre (JRC)/Netherlands Environmental Assessment Agency (PBL), 2011), as a consequence of the large energy consumption driven by its rapidly growing economy. A good understanding of NO2 levels and of their temporal and spatial variations is urgently needed to help solve the serious environmental problems, particularly poor air quality, caused by these emissions.
Chemical transport models (CTMs) are widely used to predict concentrations of gas-phase pollutants, including NO2, and of particulate matter. They are powerful tools for understanding regional air pollution issues, assessing emission control scenarios (Kiesewetter et al., 2014), and analyzing trans-boundary transport. Modeled NO2 concentrations have been evaluated extensively against ground-based measurements (Pay et al., 2012), satellite observations (Huijnen et al., 2010), and airborne observations for regional (Stern et al., 2008) and urban-scale (Terrenoire et al., 2015) air quality simulations. These intercomparisons show quite good model performance but still point to uncertainties in the meteorological input data (Bessagnet et al., 2016), the modeling of NOx chemistry (Valin et al., 2011), and particularly the emission inventories (Mues et al., 2014).
Emission inventories are a necessary input to CTMs and are recognized as one of the most important sources of uncertainty. Traditional bottom-up emissions are calculated by aggregating information from diverse sources, such as fuel statistics and measured emission factors. The large uncertainties in energy statistics (Guan et al., 2012) and the application of non-Chinese emission factors propagate into uncertainties in bottom-up inventories for China (Zhao et al., 2011). The lack of bottom-up inventories for the most recent years introduces additional biases in model simulations, because inventories quickly become outdated when emissions change rapidly (Liu et al., 2016a). NO2 columns detected from space provide additional constraints that yield a satellite-derived NOx emission inventory. Initially, NOx emissions were estimated from satellite observations together with CTMs at coarse resolution, assuming a linear relationship between NO2 columns and NOx emissions and ignoring pollution transport. More sophisticated techniques such as the Kalman filter (Napelenok et al., 2008) and four-dimensional variational data assimilation (4D-Var) (Kurokawa et al., 2009) were later introduced to take pollution transport into account. In addition, CTM-independent methods have been developed for point sources (Liu et al., 2016b). Uncertainties in NO2 column retrievals, in particular over China with its high aerosol loadings (Ma et al., 2013), together with uncertainties in the estimation method result in errors in satellite-derived inventories.
The modeling of NO2 concentrations over China has been evaluated with space- and ground-based observations. Reported validation studies have focused on evaluating tropospheric NO2 column densities simulated by CTMs driven by bottom-up emission inventories using satellite measurements. Differences between the simulated NO2 column densities and observations of the Global Ozone Monitoring Experiment (GOME) (Ma et al., 2006; Uno et al., 2007), the Scanning Imaging Absorption Spectrometer for Atmospheric CHartographY (SCIAMACHY) (Shi et al., 2008), and the Ozone Monitoring Instrument (OMI) have been attributed to uncertainties in the magnitude and spatial distribution of bottom-up emissions. Because routine monitoring data were absent, surface NO2 concentrations were generally validated for limited time periods using a limited set of measurement stations (e.g., three large cities in , one or two sites in Wang et al., 2007, and the city of Nanjing in Ding et al., 2015). Alternatively, satellite-derived inventories were compared directly to bottom-up inventories, which revealed considerable disparity (Lin et al., 2012; Ding et al., 2017a).
Measurements obtained from the recently developed air quality monitoring network in China (Zhang and Cao, 2015) provide the means to evaluate the quality of NO2 modeling. We evaluate the surface NO2 concentrations simulated by a CTM driven by both satellite-derived and bottom-up inventories with this newly established dataset. To our knowledge, this is the first time that modeled NO2 concentrations over China have been evaluated with in situ measurements throughout the country, while an intercomparison for simulations with satellite-derived and bottom-up inventories is performed simultaneously. We structure the paper as follows. In Sect. 2.1 and 2.2 the CTM and emission inventories adopted in this study are described, respectively. The introduction of the in situ measurements from the air quality monitoring network in China and the correction for interference of in situ NO2 data are given in Sect. 2.3 and 2.4, respectively. Annual mean simulated surface NO2 values are compared with the corrected in situ measurements in Sect. 3.1. Further analyses focusing on seasonality and diurnal cycle are provided in Sect. 3.2 and 3.3, respectively. Section 4 presents a summary of the major findings in this paper.

CHIMERE model
We used the CHIMERE regional chemical transport model in this study, which is designed to produce daily forecasts of tropospheric trace gas and aerosol pollutants and to make long-term simulations at a range of spatial scales (Menut et al., 2013). We use the CHIMERE model v2013b over East Asia (18 to 50° N and 102 to 132° E) with a resolution of 0.25°, following the configuration in Ding et al. (2015). The CHIMERE simulation was driven by operational meteorological data from the European Centre for Medium-Range Weather Forecasts (ECMWF) with a horizontal resolution of 0.25°. Atmospheric variables were simulated in eight layers from the surface to 500 hPa. Tropospheric photochemistry is represented using the reduced MELCHIOR chemical mechanism (Derognat et al., 2003), which includes about 120 reactions and 44 gaseous species. An aerosol module accounting for both inorganic and organic species of primary or secondary origin is included according to Bessagnet et al. (2004). Boundary conditions for the model domain were derived from monthly mean climatology based on the second-generation Model for OZone And Related chemical Tracers (MOZART) (Horowitz et al., 2003) for gases, the Laboratoire de Météorologie Dynamique Zoom - Interaction avec la Chimie et les Aérosols (LMDz-INCA; Folberth et al., 2006) for nitrate and ammonium, and the Georgia Tech/Goddard Global Ozone Chemistry Aerosol Radiation and Transport (GOCART; Ginoux et al., 2001) for other aerosols. By default, NOx emissions are speciated as 9.2 % NO2, 0.8 % HONO, and 90 % NO in the CHIMERE model (Menut et al., 2013), following the Generation of European Emission Data for Episodes (GENEMIS) recommendations (Friedrich, 2000; Kurtenbach et al., 2001; Aumont et al., 2003). Open-access satellite-derived and bottom-up inventories that provide up-to-date emissions over East Asia were selected to drive the model in this study; they are detailed in Sect. 2.2.
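As a small illustration of the default NOx speciation quoted above, the following Python sketch splits a total NOx emission into its NO, NO2, and HONO shares. The names `NOX_SPECIATION` and `speciate_nox` are hypothetical and are not part of CHIMERE; the text does not state whether the percentages apply on a mass or molar basis, so the sketch simply applies them to whatever unit the input uses.

```python
# Default split of NOx emissions quoted in the text (Menut et al., 2013).
# The dictionary and function names are illustrative, not CHIMERE identifiers.
NOX_SPECIATION = {"NO": 0.90, "NO2": 0.092, "HONO": 0.008}


def speciate_nox(nox_emission):
    """Split a total NOx emission into NO, NO2 and HONO shares.

    The input unit is carried through unchanged (assumption: the
    fractions apply directly to the quantity supplied).
    """
    return {species: nox_emission * fraction
            for species, fraction in NOX_SPECIATION.items()}


# Example with an arbitrary total of 100 units of NOx.
shares = speciate_nox(100.0)
```

The three shares sum to the input total, so the split conserves the emitted NOx.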

Emission inventory
The satellite-derived NOx emissions were estimated by the algorithm DECSO (Daily Emission estimates Constrained by Satellite Observations) v5 using an extended Kalman filter (Mijling and van der A, 2012; Ding et al., 2015, 2017b). DECSO uses one forward model run of a CTM to calculate the response of NO2 concentrations to both local and nonlocal NOx emissions. Daily OMI NO2 observations retrieved with the DOMINO version 2 algorithm are used as a constraint to update emissions. The DECSO emission data are available at www.globemission.eu (last access: 20 March 2018).
The bottom-up anthropogenic NOx emissions were taken from the MIX inventory (Li et al., 2017a), a mosaic Asian anthropogenic emission inventory developed under the international collaboration framework of the Model Inter-Comparison Study for Asia (MICS-Asia) and the Task Force on Hemispheric Transport of Air Pollution (TF HTAP). The MIX inventory was developed for the years 2008 and 2010 by integrating state-of-the-art regional emission inventories for all major anthropogenic sources in 29 countries and regions over Asia. The emissions of China integrated in the MIX inventory are derived from the Multi-resolution Emission Inventory for China (MEIC: http://www.meicmodel.org, last access: 20 December 2017) compiled by Tsinghua University. The anthropogenic emissions together with the biogenic emissions, which were computed automatically in the CHIMERE model using the global MEGAN (Model of Emissions of Gases and Aerosols from Nature) model (Guenther et al., 2006), were adopted as the bottom-up inventory. We refer to this combination as the MIX inventory for brevity hereinafter. Note that monthly emissions for all inventories above were provided at a spatial resolution of 0.25° × 0.25°. Both inventories show comparable spatial distributions at national and regional scales but differ between urban and rural areas (see Sect. 3.1). The strength of the MIX inventory is that it includes detailed source-category information (e.g., power plant and transportation sectors) for emissions, which is useful for driving atmospheric models and designing emission mitigation policies but is not included in DECSO. The advantage of the DECSO inventory is that emissions are updated promptly (as soon as the satellite observations are available), whereas bottom-up inventories usually lag behind by a few years and are outdated by the time they become available. In addition, the spatial information in DECSO is based on OMI NO2 observations, while MIX relies on spatial proxies such as gross domestic product (GDP) to allocate emissions because of the lack of data. An in-depth comparison between the inventories has been described by Ding et al. (2017).
We focused on 2015 as the most recent year with available DECSO emission estimates and in situ measurements, but we used the MIX inventory for 2010 because the inventory for 2015 is not yet available. However, the use of the 2010 MIX inventory without scaling is not expected to introduce significant bias, as both the bottom-up inventory MEIC (Liu et al., 2016a) and the satellite-derived inventory DECSO report similar NOx emissions for 2010 and 2015. NOx emissions of China grew rapidly during 2010-2012, peaked around 2012, and declined sharply during 2013-2015 (Liu et al., 2016a). As a result, the inventory for 2010 is comparable to that for 2015, even though there is a 5-year lag. Figure 1 compares DECSO NOx emissions for 2015 (a) and 2010 (b), which are consistent in both total amount (21.5 vs. 21.6 Tg) and spatial distribution (r = 0.83). Figure 1 further displays the spatial distribution of the MIX NOx emissions for 2010 (c). These emissions are significantly higher (39 %) than those of the DECSO inventory when averaged over the model domain.
An air quality simulation using the CHIMERE model was conducted for the full year 2015. Pollutant concentrations including NO2 were simulated based on the 2015 DECSO and the 2010 MIX NOx inventories, respectively. Note that the 2010 MIX inventory for other species was used together with both NOx inventories. Because the emission sectors used in the DECSO and the MIX inventories are inconsistent with those in SNAP (Selected Nomenclature for Air Pollution) 97, which is used internally in the CHIMERE model, we adopted the sector mapping table discussed in Ding et al. (2015). The concentration in the lowest model layer (from the ground up to 20 m) was used for validation against surface NO2 observations in this study. Figure 2 illustrates the annual mean surface NO2 simulation using both inventories. Large enhancements are found over industrial regions, in particular northern China, the North China Plain, and the Yangtze River Delta. The model run based on the MIX inventory (Fig. 2b) shows overall larger concentrations than that based on the DECSO inventory (Fig. 2a).

Ground-level in situ measurements
The real-time hourly concentrations of NO2 and other major air pollutants are continuously recorded by the Ministry of Environmental Protection (MEP) in China and are publicly accessible from the year 2013 onwards (Zhang and Cao, 2015). We obtained the hourly in situ measurements from a total of 1413 air quality monitoring sites of the MEP network for 323 major cities over the model domain. The majority of those monitoring sites have been placed in the city center and are named urban assessing stations in the official document (MEP, 2013). These are meant to evaluate the overall level and trend of air quality for areas with the highest concentrations and highest population exposure. The placement criteria of urban assessing stations laid down in the legislation (MEP, 2013) ensure that the measurements are representative of urban areas. Stations are required to be well distributed within the developed area of the city and not too close to stationary emission sources (50 m) or roads (10-100 m depending on the traffic flow). The minimum number of monitoring sites required per city depends on both the urban population and city size, i.e., at least one station for an area of ∼ 50 km2 (Table 1). In addition, for areas where the concentration exceeds grade II of the national ambient air quality standard (i.e., an annual mean NO2 concentration of 40 µg m−3; MEP, 2012), the minimum required number of monitoring sites is increased by 50 %. MEP also operates other types of measurement sites, including regional and background stations to assess the background air pollution levels and pollutant transport, and source impact and traffic stations close to emission sources. However, only megacities like Beijing and Guangzhou operate such non-urban stations. The fact that urban observations dominate should be kept in mind when comparing the observations with the model results.
The horizontal resolution of the model is limited to 0.25°, which causes representativeness errors (biases) when measurements from city stations are compared with the simulated mean of a grid box that can also include rural areas. Note that only measurements for dates with 24 h of valid measurements (larger than 0) are used for the following analysis in this study. Figure 3a displays the heterogeneous spatial distribution of monitoring sites at the scale of the model grid cell. The over 1000 monitoring sites are allocated to a total of 594 grid cells based on their geolocations. The sites belonging to the grid cells with one, two, and three sites account for 17, 21, and 22 % of the total, respectively (Fig. 3b). We calculated the averaged distance between monitoring sites by averaging the individual pairwise distances for every two stations in the same grid cell. Because most monitoring sites are urban stations and are clustered in the city areas, which are often much smaller than the area of a grid cell (∼ 600 km2), the averaged distance is rather small, with an average of 3.6 km for all grid cells as shown in Fig. 3c. For megacities with significantly larger built-up areas and thus more monitoring sites, the distribution of sites is more homogeneous over the grid cell and results in a larger distance between stations. The average distance increases from 5 km for grid cells with only one pair of stations to 11 km for those with over eight stations. In our analysis, we excluded in situ measurements from cities with unexpected discrepancies between urban and suburban stations. Because only large cities potentially place monitoring sites outside urban areas, as a result of the rapid expansion of built-up areas, we classified stations as urban or suburban by visually inspecting satellite imagery from Google Earth for large cities with over four stations. We calculated the annual mean NO2 of each station.
When the NO2 concentration of urban stations is less than that of suburban stations, the measurements behave differently than expected. The cities (four in total) detected to have unexpected measurements are labeled as "unselected" and discarded from the validation dataset. Note that suburban stations that present higher NO2 levels than urban stations but are close to large emission sources, e.g., industrial parks and airports, are understandable and thus are not excluded from the database. Figure 4 presents the daily average surface NO2 abundance for the city of Xi'an. Only the dates with 24 h of valid measurements (larger than 0) are used for the time series illustration here. The expected enhancement in winter highlighted by both urban (red line) and suburban (blue line) stations has not been detected for the urban station with lower annual mean NO2 abundance than suburban stations (black line), which provides further support for excluding the Xi'an measurements from the model evaluation.
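The allocation of sites to grid cells and the averaged pairwise station distance used in this section can be sketched in Python as follows. This is a minimal illustration and not the authors' code: it assumes that the 0.25° grid is aligned with the south-west corner of the model domain (102° E, 18° N), uses the haversine great-circle distance with a mean Earth radius of 6371 km, and takes made-up site coordinates. All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

GRID_DEG = 0.25            # model resolution from the text
LON0, LAT0 = 102.0, 18.0   # domain corner from the text; grid alignment is assumed
EARTH_RADIUS_KM = 6371.0   # mean Earth radius (assumption for the haversine formula)


def grid_cell(lon, lat):
    """Return the (column, row) index of the grid cell containing a site."""
    return (math.floor((lon - LON0) / GRID_DEG),
            math.floor((lat - LAT0) / GRID_DEG))


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_pairwise_distance(sites):
    """Average distance over all station pairs; None if fewer than two sites."""
    pairs = list(combinations(sites, 2))
    if not pairs:
        return None
    return sum(haversine_km(*a, *b) for a, b in pairs) / len(pairs)


def distances_by_cell(sites):
    """Group (lon, lat) sites by grid cell and average the pairwise distances."""
    cells = defaultdict(list)
    for lon, lat in sites:
        cells[grid_cell(lon, lat)].append((lon, lat))
    return {cell: mean_pairwise_distance(members) for cell, members in cells.items()}


# Made-up coordinates: three sites share one cell, the fourth is alone in another.
sites = [(116.40, 39.90), (116.45, 39.90), (116.40, 39.95), (121.47, 31.23)]
result = distances_by_cell(sites)
```

Grid cells holding a single station have no station pair, so the sketch returns `None` for them rather than a distance of zero.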

Correction factor
NO2 concentrations are measured using commercial chemiluminescence analyzers (Zhang and Cao, 2015), which are subject to a systematic overestimation of ambient NO2 concentrations (Steinbacher et al., 2007). NO2 is catalytically transformed into NO by a molybdenum converter and subsequently measured with chemiluminescence. However, other reactive oxidized nitrogen compounds (NOz) such as peroxyacetyl nitrate (PAN) and nitric acid are also partly converted to NO, resulting in an overestimation of the measured NO2.
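As a rough illustration of how such an interference correction can be applied, the Python sketch below uses the form of the correction factor given by Lamsal et al. (2008), CF = NO2 / (NO2 + ΣAN + 0.95 PAN + 0.35 HNO3). The weights 0.95 and 0.35 are taken from that reference as we understand it and should be checked against it; the function names and the example concentrations are hypothetical, and all inputs are assumed to be modeled concentrations in the same mixing-ratio unit.

```python
def correction_factor(no2, alkyl_nitrates, pan, hno3):
    """Fraction of the molybdenum-converter signal attributable to NO2.

    Inputs are modeled concentrations in one common unit (e.g. ppb).
    The weights 0.95 (PAN) and 0.35 (HNO3) are assumed from
    Lamsal et al. (2008).
    """
    denominator = no2 + alkyl_nitrates + 0.95 * pan + 0.35 * hno3
    if denominator <= 0:
        raise ValueError("concentrations must give a positive denominator")
    return no2 / denominator


def corrected_no2(measured_no2, cf):
    """Scale a chemiluminescence NO2 reading by the correction factor."""
    return measured_no2 * cf


# Made-up modeled concentrations (ppb) and a made-up measurement (ug m-3).
cf = correction_factor(no2=10.0, alkyl_nitrates=0.5, pan=1.0, hno3=2.0)
corrected = corrected_no2(40.0, cf)
```

Because the factor is a ratio, it is dimensionless and can be applied to measurements reported in a different unit than the modeled concentrations.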
We applied a correction factor proposed by Lamsal et al. (2008) to account for the interferences of other oxidized nitrogen compounds, based on the modeled ratio of NO2 to NOz. The correction factor (CF) was calculated from the local chemical concentrations as follows:

$$\mathrm{CF} = \frac{\mathrm{NO_2}}{\mathrm{NO_2} + \sum \mathrm{AN} + 0.95\,\mathrm{PAN} + 0.35\,\mathrm{HNO_3}},$$

where ΣAN is the sum of all alkyl nitrate concentrations. Figure 5 shows the seasonal means of the correction factors determined with concentrations of the interfering species predicted by the CHIMERE model driven by the DECSO inventory. Consistent with the findings in Europe (Huijnen et al., 2010) and the US (Lamsal et al., 2008), the correction factor is largest (closest to the ideal value of 1.0) over polluted urban regions, where NOx is a larger fraction of total oxidized nitrogen compounds. The correction factor tends to be closer to unity in winter, when the NOx photochemistry is slower and thus NOx has a larger relative contribution to total oxidized nitrogen compounds. The correction factor derived from simulations with the MIX inventory (not shown) shows a similar pattern to Fig. 5, but with a larger number of values close to 1 related to the larger emissions. Hourly correction factors for each day of the year and for each individual station have been applied to the in situ measurements. It is difficult to quantify the accuracy of the correction factors, as collocated measurements of other oxidized nitrogen compounds are not publicly available. We used the standard deviation of the daily means of the correction factors within a season as a measure of their uncertainty. The average standard deviation over all sites is 10 %, which is comparable to the uncertainty level pointed out by McLinden et al. (2014).

Results and discussion

Annual intercomparison
We compare the modeled surface NO2 with the corrected in situ measurements throughout China. In general, the spatial distribution of annual mean NO2 concentrations from the CHIMERE model simulations is well in line with that from in situ measurements, with a correlation coefficient of over 0.7. However, the modeled NO2 is biased compared to ground measurements. The differences of annual mean NO2 concentrations between simulations and measurements are given in Fig. 6. The CHIMERE simulations with the DECSO inventory show considerably lower NO2 concentrations than the in situ measurements, with a negative difference for nearly 90 % of all grid cells. Conversely, the CHIMERE simulations with the MIX inventory are generally higher than the in situ measurements for grid cells corresponding to large cities: a positive bias is found for 70 % of the grid cells with over four monitoring sites.
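The statistics used in this comparison (spatial correlation coefficient, regression slope, and root-mean-square error) can be computed as in the following Python sketch. It is only an illustration under stated assumptions: the slope is taken from an ordinary least-squares fit of the modeled on the measured grid-cell means with an intercept, which the text does not specify, and the function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(model, obs):
    """Return (Pearson r, OLS slope of model on obs, RMSE).

    `model` and `obs` are equally long sequences of annual mean NO2
    per grid cell. The regression form is an assumption.
    """
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_model = sum(model) / n
    sxx = sum((o - mean_obs) ** 2 for o in obs)
    syy = sum((m - mean_model) ** 2 for m in model)
    sxy = sum((o - mean_obs) * (m - mean_model) for o, m in zip(obs, model))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    rmse = math.sqrt(sum((m - o) ** 2 for o, m in zip(obs, model)) / n)
    return r, slope, rmse


# Made-up values: the model is a constant 26 % lower than the measurements.
obs = [10.0, 20.0, 30.0, 40.0]
model = [0.74 * value for value in obs]
r, slope, rmse = evaluate(model, obs)
```

In this constructed case the correlation is perfect while the slope is 0.74, which shows that a high spatial correlation and a substantial bias can coexist.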
Figure 6. [...] and (c) the corrected DECSO inventory are subtracted from the corrected in situ measurements to derive the differences. The mean of the differences is further subtracted from the differences to derive the normalized differences. The size of the symbols denotes the number of stations located in the same model grid cell. The color of the symbols denotes the difference of NO2 concentrations. Grid cells with densely located stations are labeled with "L". The outline of circles corresponding to "main sample" (see Table 2) is highlighted in black.

Grid cells are classified into five categories, i.e., mountainous, northern, < four stations, densely located, and main sample, and the corresponding scatter plots of corrected measurements against simulations are shown in Fig. 7. We define a grid cell as "mountainous" where the average elevation is higher than 1000 m and the standard deviation of elevations is over 15 % of the mean, based on the topographic data from the 30 arcsec global land topography "GTOPO30" archived by the US Geological Survey (available at https://lta.cr.usgs.gov/GTOPO30, last access: 10 July 2017, rescaled to 0.05°). The grid cells north of 45° N are classified as "northern". The grid cells with fewer than four measurement stations are classified as "< four stations". The grid cells with only densely located stations (see definition later in this section) are classified as "densely located". Note that the priority of the categories mountainous, northern, < four stations, and densely located runs from high to low when we perform the classification in this study. For instance, grid cells that meet the criteria of both mountainous and northern are classified as mountainous. The remaining grid cells are classified as "main sample". The results for the daytime period (08:00-19:00 LT) are displayed separately. The correlation coefficient, regression slope, and root-mean-square error for the individual categories compared to measurements are given in Table 2.
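The priority-ordered classification described above can be written as a short Python function. This is a sketch under stated assumptions rather than the authors' implementation: the function name and arguments are hypothetical, the latitude is assumed to be that of the grid-cell center, and the "densely located" flag is assumed to have been determined beforehand from the station distances.

```python
def classify_cell(mean_elev_m, std_elev_m, lat_deg, n_stations, densely_located):
    """Assign a grid cell to one category, checking criteria in priority order.

    mean_elev_m, std_elev_m: mean and standard deviation of elevation (m)
    lat_deg: latitude of the cell (assumed: cell center), degrees north
    n_stations: number of monitoring stations in the cell
    densely_located: True if the cell holds only densely located stations
    """
    if mean_elev_m > 1000 and std_elev_m > 0.15 * mean_elev_m:
        return "mountainous"
    if lat_deg > 45:
        return "northern"
    if n_stations < 4:
        return "< four stations"
    if densely_located:
        return "densely located"
    return "main sample"


# A cell meeting both the mountainous and northern criteria is "mountainous".
example = classify_cell(1500.0, 300.0, 46.0, 2, False)
```

Ordering the checks from highest to lowest priority means each cell receives exactly one label, so the categories never overlap in the statistics.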
Significant regional differences are found. The small slope over mountainous regions could be related to the limited ability of the model to resolve cities in valleys. Furthermore, we may expect difficulties for the model in describing NO2 concentrations over complex terrain. For mountainous regions, the lower slope may also be related to the large uncertainties in the meteorological parameters associated with the difficulties in resolving small-scale orography in the ECMWF model (Beljaars et al., 2004). Errors in meteorological parameters, such as mixing height and temperature (Hongisto, 2005) and wind fields (Minguzzi et al., 2005), can introduce biases in air quality simulations. In addition, the accuracy of the DECSO algorithm relies heavily on appropriate wind fields because DECSO performs trajectory analysis to account for NOx transport away from the source when calculating the sensitivity of concentrations to emissions (Mijling and van der A, 2012). In this way, uncertainties in meteorological parameters are amplified in the DECSO inventory, resulting in a worse agreement for mountainous areas (r = 0.51) compared to the MIX inventory (r = 0.77).
The CHIMERE model accurately reproduces the spatial variability in NO2 for northern grid cells, with a high correlation coefficient of 0.92 and 0.81 for the simulations with the DECSO and the MIX inventories, respectively, but with a large negative bias. The bias could be related to model uncertainties in NOx sinks at high latitudes (Ding et al., 2017b), as indicated by sensitivity studies of modeled NO2 columns to errors in chemical parameters associated with NOx sinks (Lin et al., 2012; Stavrakou et al., 2013). Additionally, the bias is particularly significant for the simulations with the DECSO inventory, showing a slope of merely 0.20. This could be further explained by the general underestimation of NOx emissions caused by the bias in NO2 tropospheric columns of DOMINO v2 for this area (Ding et al., 2017b), partly due to a bias in the calculation of the air mass factor for retrievals at large solar zenith angles by the radiative transfer model (Lorente et al., 2017) and possibly biases in the estimated stratospheric background. Figure 8 depicts the ratio of the simulated annual mean surface NO2 concentrations to the corrected in situ measurements, sorted by the number of stations located in the same grid cell from small to large. It is interesting to note that the ratio is small for grid cells with fewer than four stations but increases with the number of stations from four to nine, ranging from 0.6 to 1.0 and 0.9 to 1.8 for the simulations with the DECSO and the MIX inventories, respectively. The trend in the ratio suggests that the representativeness of in situ measurements for the average NO2 levels of a grid cell improves with increasing numbers of stations or city size. For grid cells with fewer than four stations, the model with its limited spatial resolution cannot be expected to accurately resolve the spatial gradient of pollutants towards the city center in relatively small urban areas.
Similarly, the in situ measurements for stations located close together are expected to be less representative of the grid cell mean than those from homogeneously distributed stations. This is in agreement with the tendency that grid cells with lower average station distances (< 10 km) tend to show lower ratios (< 0.5) in Fig. 8, in particular for grid cells with a larger number of stations. We select the grid cells with over four stations but with average distances lower than the 10th percentile in Fig. 3c and name them densely located. We analyze the performance of the model in those grid cells with only densely clustered stations (labeled with "L" in Fig. 6). Not surprisingly, the simulations for those grid cells show larger discrepancies compared to the measurements, and the correlation coefficient of simulations with the DECSO inventory drops to a rather low value of 0.55 (Table 2).
We exclude grid cells in the special categories discussed above (i.e., mountainous, northern, fewer than four stations, and densely located stations) to draw conclusions on the ability of the model to reproduce the measurements. Statistical values (correlation, slope, root-mean-square error) for the remaining grid cells (main sample) are given along with the plots in Fig. 7. Slopes of 0.74 and 1.3 are found for the simulations with the DECSO and the MIX inventories, respectively. As mentioned before, the majority of stations are located in urban, populated, and polluted areas, and the model resolution of 0.25° is not sufficient to represent the existing NO2 gradients. We may therefore expect a negative representativity offset in the modeled surface concentration, even for the main sample obtained after data screening (Irie et al., 2012; Lin et al., 2014). We select grid cells with over four stations, which potentially place one or two stations in background areas, to give a rough estimate of the offset. The background stations are defined as stations located far away from urban areas on the basis of a visual inspection of satellite imagery from Google Earth. Not surprisingly, the measurement from the background station, which is expected to better represent the grid cell mean, is smaller than the average of the measurements from all stations located in the same grid cell. The ratios of annual mean measurements from the background stations to the mean of the corresponding measurements from all stations range from 0.64 to 0.86, with an average of 0.74. That is, the average negative representativity offset may reach 25 %. The ratio is closer to 1 in winter (0.83 on average) due to the reduced spatial gradients in NO2 caused by a longer NOx lifetime, which will be discussed in detail in Sect. 3.2.
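The offset estimate above amounts to a simple ratio per grid cell. The following minimal sketch shows the arithmetic for one cell; the function name, the station identifiers, and the concentration values are hypothetical and are not taken from the MEP data.

```python
def representativity_ratio(station_means, background_ids):
    """Ratio of the background-station mean to the all-station mean.

    station_means maps a station id to its annual mean NO2 concentration
    (any consistent unit); background_ids lists the stations classified
    as background. A ratio below 1 means that the all-station average
    overstates the level seen at the background stations.
    """
    if not station_means or not background_ids:
        raise ValueError("need at least one station and one background station")
    all_mean = sum(station_means.values()) / len(station_means)
    background = [station_means[s] for s in background_ids]
    return (sum(background) / len(background)) / all_mean


# Hypothetical grid cell with five stations, one of them in a background area.
example_cell = {"s1": 50.0, "s2": 45.0, "s3": 40.0, "s4": 35.0, "bg": 30.0}
ratio = representativity_ratio(example_cell, ["bg"])
offset_percent = (1.0 - ratio) * 100.0
```

In this made-up cell the all-station mean is 40 and the background value is 30, so the ratio is 0.75 and the offset is 25 %, which is of the same order as the average reported above.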
Thus, the slopes of 0.74 and 1.3 for DECSO and MIX actually indicate a slightly negative and more significantly positive bias, respectively.
A positive bias may indicate an overestimation of NOx emissions or errors in the spatial downscaling of the bottom-up emission totals, although biases from the description of the chemistry, transport, and removal processes in the model cannot be ruled out. The overestimation of the MIX results over large cities is consistent with previous findings that regional inventories like MIX have large positive biases in urban areas. The reason for the positive biases will be discussed in detail later in this section. Uncertainties in the DECSO results may be attributed to biases in the OMI tropospheric NO2 column densities or to representation errors introduced by the projection of the CTM onto the measured NO2 satellite footprint (Ding et al., 2017b). OMI NO2 observations have been reported to be systematically smaller than those from ground-based measurements (e.g., MAX-DOAS) over polluted regions (Shaiganfar et al., 2011; Ma et al., 2013; Ialongo et al., 2016) due to their different spatial representativeness (Irie et al., 2012; Lin et al., 2014) and uncertainties in the NO2 vertical column retrieval, including the shielding effect of aerosols (Shaiganfar et al., 2011) and the varying observation geometry (Vasilkov et al., 2017). In addition, the fact that the adopted model resolution is not sufficient to accurately model nonlinear effects in the NO2 loss rate may contribute to the negative bias (Valin et al., 2011).
Regional bottom-up inventories tend to have large positive biases in urban areas. Those inventories usually downscale local emissions from regional totals (provincial totals are used in the MEIC/MIX inventory for China) and distribute them to grid cells using spatial proxies (e.g., population density and GDP). However, the spatial proxies may not match the locations of the individual emitting sources, especially for industrial plants located far away from urban centers, which tend to have a larger population density and GDP. Such a decoupling results in an overestimation of emissions over urban areas, which has been demonstrated by comparing proxy-based regional inventories with high-resolution urban inventories developed from the extensive use of information on individual emitting sources.
In order to better compare the spatial distributions of the two inventories and to identify the sensitivity of model performance to the spatial distribution of emissions, we further evaluate the impact of the spatial distribution of emissions on simulated NO2 by applying the same spatial proxy for NOx emissions in both inventories. We scale the total amount of emissions of the 2015 DECSO inventory over the domain adopted in this study to that of the 2010 MIX inventory but keep the DECSO spatial distribution (hereinafter referred to as the corrected DECSO inventory; see Fig. 2). We then compare the modeled NO2 using the corrected DECSO inventory with in situ measurements in Fig. 6c. Many high values in the MIX simulation are not reproduced by the simulation with the corrected DECSO inventory. We further assess the simulation results with the corrected DECSO inventory in Fig. 7c. The simulation with the MIX inventory tends to cluster the pollutants more over urban areas than that with the corrected DECSO inventory, indicating that the modeled NO2 is sensitive to the spatial distribution of emissions. The large bias in the MIX inventory is reduced significantly, with the slope decreasing from 1.3 to 1.0, which suggests an improvement of the distribution of emissions between urban and suburban or rural areas.
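The construction of the corrected DECSO inventory can be illustrated with a short sketch: every grid cell is multiplied by one common factor so that the domain total matches the target total while the relative spatial pattern is unchanged. The function name and the small grids below are hypothetical; the actual inventories and their handling in CHIMERE are not reproduced here.

```python
def rescale_to_total(source_grid, target_total):
    """Scale a gridded emission field so that its sum equals target_total.

    source_grid is a list of rows of non-negative emissions. A single
    multiplicative factor is applied to all cells, so the ratio between
    any two cells (the spatial distribution) is preserved.
    """
    source_total = sum(sum(row) for row in source_grid)
    if source_total <= 0:
        raise ValueError("source grid must have a positive total")
    factor = target_total / source_total
    return [[value * factor for value in row] for row in source_grid]


# Hypothetical 2 x 2 fields in arbitrary emission units.
decso_like = [[1.0, 3.0], [2.0, 4.0]]   # spatial pattern to keep
mix_like = [[8.0, 2.0], [6.0, 4.0]]     # provides the domain total only
mix_total = sum(sum(row) for row in mix_like)
corrected = rescale_to_total(decso_like, mix_total)
```

Here the total of the first field is doubled from 10 to 20, and the cell with the highest emissions remains the same cell after scaling.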
Note that, due to the lack of a 2015 inventory, the use of the 2010 MIX emissions for other species, including SO2, CO, and non-methane volatile organic compounds (NMVOCs), in both the MIX and the DECSO simulations may introduce uncertainties in simulating NO2. The anthropogenic emissions of SO2 and CO for China have been reported to decrease by 2 and 5 %, respectively, and those of NMVOCs to increase by 21 % from 2010 to 2015 (Li et al., 2017b). In gas-phase chemistry, the principal sink of NOx is oxidation to HNO3. The influence of the growth in NMVOCs on the oxidizing power of the atmosphere is partially compensated for by the reduction in CO, as CO and hydrocarbons play similar roles in depleting oxidants following the HOx-NOx-CO-hydrocarbon chemical mechanisms (Jacob, 2000). Additionally, SO2 influences NO2 concentrations by forming aerosols, the concentrations of which have an impact on photolysis rates and thus on the photochemical reaction rates associated with NOx (Mailler et al., 2017). However, the emission changes are rather small compared to the uncertainties in bottom-up estimates and are even smaller than the discrepancies among estimates from different bottom-up inventories. Thus we believe the uncertainties arising from the use of the 2010 inventory are not significant. A sensitivity analysis would be needed to further quantify the influence of emissions of other species on simulated NO2.
Figure 9 compares the monthly mean NO2 concentrations simulated by the CHIMERE model using the two inventories with the in situ measurements. The spatial correlation between the modeled NO2 concentrations and the in situ measurements shows a weak dependence on season and is slightly worse in summer (July). The correlation coefficients range from 0.64 (July) to 0.73 (January) and from 0.80 (July) to 0.83 (January) for the simulations with the DECSO and the MIX inventories, respectively.
A possible explanation for the somewhat higher correlation in January is the smaller model error in winter than in summer, as indicated by previous findings in both China (Lin et al., 2012) and Europe (Huijnen et al., 2010). This may be related to the difficulties in resolving the more active NOx photochemistry in summer by the model. For instance, the model with a horizontal resolution of 0.25° is not able to fully resolve the spatial gradients of NO2 close to strong emission sources, but such an impact is smaller in winter than in other seasons, as the NO2 gradients in a grid cell are smeared out due to the longer NOx lifetime in winter.

Seasonality
The seasonal difference is pronounced when comparing the magnitude of the NO2 concentrations in Fig. 9. In general, a smaller ratio between modeled NO2 concentrations and in situ measurements is detected in winter. The ratio reaches its lowest values in January, which is consistent with the general underestimation of simulations in winter as indicated by Lin et al. (2012). For simulations with the DECSO inventory, the more significant deviation of the ratio from unity in winter might also be due to systematic biases in the OMI NO2 observations during winter. Biases in OMI NO2 column densities over polluted regions are introduced by the high aerosol loading, most of which consists of scattering aerosols in China, as aerosols are not explicitly considered in the cloud retrieval or the air mass factor calculation in the operational NO2 product (Castellanos et al., 2015; Chimot et al., 2016; Wang et al., 2017). The effect of aerosols may be more significant in winter due to the higher aerosol concentrations and larger solar zenith angle (Ma et al., 2013). Additionally, the DECSO algorithm is more vulnerable to biased observations as a result of the smaller number of useful observations in wintertime because of the filtering of snow-covered regions. Conversely, the simulations with the MIX inventory show ratios ranging from 1.09 to 1.22 in the second half of the year. This may signal an overestimation of total emissions, as pointed out in Sect. 3.1. In addition, the assumptions used in the MIX inventory for the distribution of monthly emissions over the year may also contribute to the bias. For example, higher power and industrial emissions are assumed in the second half of the year due to larger industrial production, and thus power generation, to meet the annual total production target (Li et al., 2017a).
Figure 10 presents the diurnal variability in hourly-averaged surface NO2 concentrations.
The simulations with both inventories and the in situ measurements exhibit a broadly similar daily variation (r = 0.81). The distinct peaks in NO2 concentrations in the morning hours (around 08:00 LT) and in the evening (around 20:00 LT) detected by the measurements are well captured by the model and can be attributed to increasing (traffic) emissions in the rush hours, as indicated by the Selected Nomenclature for sources of Air Pollution Prototype (SNAP) diurnal profiles of emissions (Menut et al., 2012) adopted in the CHIMERE model (grey line). Both simulations and measurements show a drop in NO2 concentrations during daytime with the same timing and amplitude, related to the varying chemical loss rate of NO2 driven by NOx photochemistry. However, the disagreement between simulated and measured values is larger at night, which may point to problems regarding the treatment of boundary layer mixing. The NO2 concentrations simulated by the model do not reproduce the observed temporal pattern at night but remain constantly high, probably because of unrealistically low boundary layer heights and too little vertical turbulence in the model (Bessagnet et al., 2016). This is consistent with the earlier evaluation of the diurnal cycle of trace gases as modeled by CHIMERE in Lampe et al. (2009).
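A comparison of diurnal cycles such as the one above can be sketched in two steps: average the hourly values into a 24-point profile, then correlate the modeled and measured profiles. The sketch below assumes that the reported correlation refers to such hourly-mean profiles, which the text does not state explicitly; the function names and input layout are hypothetical.

```python
from math import sqrt


def hourly_profile(records):
    """Average (hour, value) pairs into a 24-element diurnal profile.

    hour is an integer local hour in 0..23. Hours without any data are
    returned as None so that they can be skipped in later comparisons.
    """
    sums = [0.0] * 24
    counts = [0] * 24
    for hour, value in records:
        sums[hour] += value
        counts[hour] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def pearson_r(x, y):
    """Pearson correlation of two equally long series, skipping None entries."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)
```

Applying `pearson_r` to `hourly_profile(model_records)` and `hourly_profile(measured_records)` gives one correlation value per simulation; restricting both profiles to a subset of hours before the call yields the corresponding value for that part of the day.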

Diurnal cycle
We separately evaluated the model performance for the daytime period (08:00-19:00), when the pattern of diurnal variations simulated by the CHIMERE model is closer to that observed by the in situ measurements. Not surprisingly, a larger negative bias (slope of 0.64) is obtained for the simulation with the DECSO inventory compared to the surface observations, while the slope for the simulation with the MIX inventory is reduced significantly to a value of 1.1 (Table 2), because the model tends to overestimate NO2 concentrations at night. Note that the slope close to unity for the simulation with the MIX inventory during daytime does not necessarily imply a perfect emission inventory, but still indicates a potential overestimation because we expect a slope smaller than 1 (in the range of 0.64-0.86; see Sect. 3.1) when comparing model simulations with in situ

Conclusions
In this work we evaluated the surface NO2 concentrations from the CHIMERE CTM, driven by both satellite-derived and bottom-up emission inventories, using the measurements from the ground-based air quality monitoring network of MEP. To our knowledge, this is the first validation of modeled NO2 with this widespread in situ network, which recently became available. Our study demonstrates the capabilities of CTMs such as CHIMERE, combined with satellite observations, to simulate NO2 concentrations at the surface over China. MEP in situ measurements can serve as a useful dataset for evaluating model simulations, but a careful selection of measurements and a scaling correction are necessary to represent the averaged NO2 level over the area of a grid cell. Measurements with unexpectedly lower annual mean NO2 concentrations at urban stations compared to those at suburban stations have been discarded from the final analysis.
The model accurately reproduces the spatial variability in annual mean NO2 from in situ measurements over China, with a spatial correlation coefficient of over 0.7. In situ measurements used in this study are expected to have a positive bias when compared to model simulations due to a combination of the preferential placement of monitors in polluted locations and the limited ability of the model resolution to resolve large NO2 gradients over urban areas. The estimated bias is 25 % (ranging between 10 % and 40 %), as indicated by the ratios of annual mean measurements from the background stations, which are expected to better represent the grid cell mean, to the mean of the corresponding measurements from all stations for selected grid cells with over four stations. The bias is especially pronounced for grid cells with too few stations (fewer than four in this study) or stations located close together. Negative biases have been widely detected for mountainous and northern regions, which are most likely related to the representativeness issue discussed above, but model uncertainties in meteorological parameters and NOx sinks will also play a role. For other regions, a negative and a positive difference have been found for the simulations with the DECSO (slope = 0.74) and the MIX (slope = 1.3) inventories, respectively, suggesting an underestimation and overestimation of NOx emissions in the corresponding inventories. The bias between observed and modeled concentrations was reduced significantly, with the slope decreasing from 1.3 to 1.0, when the spatial distribution of NOx emissions in the DECSO inventory was applied as the spatial proxy for the MIX inventory. The reduced bias suggests an improvement of the distribution of emissions between urban and suburban or rural areas in the DECSO inventory compared to that used in the bottom-up inventory, which sheds light on addressing the spatial errors in bottom-up inventories.
Conversely, we also show that the correlation coefficient of the simulated NO2 concentrations versus the in situ measurements is slightly higher in the MIX-based simulations compared to the DECSO simulations. However, this does not necessarily contradict the findings that the spatial distribution of NOx emissions is more reasonable in DECSO, considering the difference in correlation coefficient is minor but the bias in the MIX-based simulations is significant. Nevertheless, the good performance of the satellite-derived emission inventory, in particular the spatial distribution of emissions, has been confirmed by the widespread in situ measurements over China for the first time in this study. The magnitude of satellite-derived emissions shows a slightly negative bias by taking the negative representativity offset of in situ measurements into account, which is attributed to biases in the OMI tropospheric NO2 column densities, or representation errors introduced by the projection of the CTM onto the measured NO2 satellite footprint. In addition, satellite-derived NOx emissions succeed in detecting the emission trend for the period of 2010-2015, which is consistent with that in bottom-up emissions (Liu et al., 2016a;.
The performance of the model is comparable over seasons, with a slightly better spatial correlation in winter. This is in line with previous findings of a lower model uncertainty in winter due to the difficulties in resolving the more active NOx photochemistry and larger concentration gradients in summer by the model. In addition, the daytime diurnal cycle has been well captured by the model. However, the disagreement between simulations and measurements is in general larger during nighttime, which is most likely related to the uncertainty in vertical mixing in the model. This nighttime issue causes an estimated bias of about +15 % in the daily mean NO2 concentrations.
Note that the validation performed in this study is focused on urban areas, which may introduce a systematic bias into the conclusions, as discussed above. In the future, an analysis focusing on rural areas is expected to give a more complete picture of the performance of CTMs with inventories. In addition, an in-depth comparison of multiple models with different chemistry schemes (e.g., Huijnen et al., 2010) is required to quantify the influence of chemical mechanisms on simulated NO2. In order to support model validation, the introduction of additional background stations, as well as the provision of detailed information about the stations, including classification and height, would be very valuable.
Data availability. Measurements from the ground-based air quality monitoring network of MEP were obtained from www.pm25.in (last access: 1 March 2017). The CHIMERE model outputs are available upon request from the corresponding author.
Competing interests. The authors declare that they have no conflict of interest.