Evaluation of ACCMIP ozone simulations using a multi-constituent chemical reanalysis

The Atmospheric Chemistry Climate Model Intercomparison Project (ACCMIP) ensemble ozone simulations for the present day are evaluated using a state-of-the-art multi-constituent atmospheric chemical reanalysis that ingests multiple satellite data, including the Tropospheric Emission Spectrometer (TES), the Microwave Limb Sounder (MLS), the Ozone Monitoring Instrument (OMI), and the Measurements of Pollution in the Troposphere (MOPITT). Validation of the chemical reanalysis against global ozonesondes shows good agreement throughout the free troposphere and lower stratosphere for both seasonal and year-to-year variations, with an annual mean bias of less than 0.9 ppb in the middle and upper troposphere at the tropics and mid-latitudes. The model evaluation using the reanalysis reveals that the ensemble mean overestimates ozone in the northern extratropics by 6–11 ppb while underestimating by up to 18 ppb in the southern tropics over the Atlantic in the lower troposphere. Most models underestimate the spatial variability of the annual mean concentration in the extratropics of both hemispheres in the lower troposphere. The ensemble mean also overestimates the seasonal amplitude by 25–70 % in the northern extratropics and overestimates the inter-hemispheric gradient by about 30 % in the lower and middle troposphere. These differences are less evident with the current sonde network, which is shown to provide biased regional and monthly ozone statistics, especially in the tropics. These systematic biases have implications for ozone radiative forcing and the response of chemistry to climate that can be further quantified as the satellite observational record extends to multiple decades.


Introduction
Tropospheric ozone is one of the most important air pollutants and the third most important greenhouse gas in the atmosphere (Forster et al., 2007; HTAP, 2010; Myhre et al., 2013; Stevenson et al., 2013), while also playing a crucial role in the tropospheric oxidative capacity through production of hydroxyl radicals (OH) by photolysis in the presence of water vapor (Logan et al., 1981; Thompson, 1992). Global tropospheric ozone is formed by secondary photochemical production from ozone precursors, including hydrocarbons and carbon monoxide (CO), in the presence of nitrogen oxides (NOx), and is modulated by additional processes including in-situ chemical loss, deposition to the ground surface, and inflow from the stratosphere. These ozone precursors are largely controlled by anthropogenic and natural emissions, e.g., mobile, industrial, lightning, and biomass burning sources.
Representation of tropospheric ozone in chemical transport models (CTMs) and chemistry climate models (CCMs) is important
Atmos. Chem. Phys. Discuss., doi:10.5194/acp-2016-1043, 2016. Manuscript under review for journal Atmos. Chem. Phys.
considers the fundamental chemical cycle of Ox-NOx-HOx-CH4-CO along with oxidation of non-methane volatile organic compounds (NMVOCs) to properly represent ozone chemistry in the troposphere. Its stratospheric chemistry simulates chlorine and bromine containing compounds, CFCs, HFCs, OCS, N2O, and the formation of polar stratospheric clouds (PSCs) and associated heterogeneous reactions on their surfaces. MIROC-Chem has a T42 horizontal resolution (2.8°) with 32 vertical levels from the surface to 4.4 hPa. It is coupled to the atmospheric general circulation model MIROC-AGCM version 4 (Watanabe et al., 2011). The simulated meteorological fields were nudged toward the six-hourly ERA-Interim (Dee et al., 2011) to reproduce past meteorological fields.
The a priori values for surface emissions of NOx and CO were obtained from bottom-up emission inventories. Anthropogenic NOx and CO emissions were obtained from the Emission Database for Global Atmospheric Research (EDGAR) version 4.2 (EC-JRC, 2011). Emissions from biomass burning were based on the monthly Global Fire Emissions Database (GFED) version 3.1 (van der Werf et al., 2010). Emissions from soils were based on the monthly mean Global Emissions Inventory Activity (GEIA) (Graedel et al., 1993). Lightning NOx (LNOx) sources in MIROC-Chem were calculated based on the relationship between lightning activity and cloud top height (Price and Rind, 1992) and using the convection scheme of MIROC-AGCM. For black carbon (BC), organic carbon (OC), and other precursor gases, surface and aircraft emissions are specified from the emission scenarios for the Greenhouse Gas and Air Pollution Interactions and Synergies (GAINS) model developed by the International Institute for Applied Systems Analysis (IIASA) (Klimont et al., 2009; Akimoto et al., 2015).

Data assimilation method
Data assimilation used here is based on an ensemble Kalman filter (EnKF) approach (Hunt et al., 2007). The EnKF uses an ensemble forecast to estimate the background error covariance matrix and generates an analysis ensemble mean and covariance that satisfy the Kalman filter equations for linear models. In the forecast step, a background ensemble, $x_i^b$ ($i = 1, ..., k$), is obtained from the evolution of an ensemble model forecast, where $x$ represents the model variable, $b$ is the background state, and $k$ is the ensemble size (i.e., 32 in this study). The background ensemble is then converted into the observation space, $y_i^b = H(x_i^b)$, using the observation operator $H$, which is composed of a spatial interpolation operator and a satellite retrieval operator; the latter can be derived from an a priori profile and an averaging kernel of individual measurements (e.g., Eskes and Boersma, 2003; Jones et al., 2003). Using the covariance matrices of observation and background error as estimated from ensemble model forecasts, the data assimilation determines the relative weights given to the observation and the background, and then transforms a background ensemble into an analysis ensemble, $x_i^a$ ($i = 1, ..., k$). The new background error covariance is obtained from an ensemble forecast with the updated analysis ensemble.
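To make the forecast-to-analysis transformation concrete, the following is a minimal sketch of a global ensemble transform update in the spirit of Hunt et al. (2007). It is not the reanalysis code: it omits localization and inflation, and the function name `etkf_analysis`, the toy dimensions, and the linear observation operator `h_mat` are assumptions made only for illustration.

```python
import numpy as np

def etkf_analysis(xb_ens, y_obs, obs_operator, r_cov):
    """Transform a background ensemble (n x k) into an analysis ensemble (n x k)."""
    k = xb_ens.shape[1]
    xb_mean = xb_ens.mean(axis=1, keepdims=True)
    xb_pert = xb_ens - xb_mean                      # background perturbations
    # Map every member into observation space: y_i^b = H(x_i^b)
    yb_ens = np.column_stack([obs_operator(xb_ens[:, i]) for i in range(k)])
    yb_mean = yb_ens.mean(axis=1, keepdims=True)
    yb_pert = yb_ens - yb_mean
    c = yb_pert.T @ np.linalg.inv(r_cov)            # k x m
    # Analysis error covariance in ensemble space
    pa_tilde = np.linalg.inv((k - 1) * np.eye(k) + c @ yb_pert)
    pa_tilde = 0.5 * (pa_tilde + pa_tilde.T)        # enforce symmetry numerically
    # Weights for the mean update (relative weight of observation vs. background)
    w_mean = pa_tilde @ c @ (y_obs.reshape(-1, 1) - yb_mean)
    # Symmetric square root gives the weights for the analysis perturbations
    evals, evecs = np.linalg.eigh((k - 1) * pa_tilde)
    w_pert = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    return xb_mean + xb_pert @ (w_mean + w_pert)

# Toy example (all values hypothetical): 4 state variables, 2 observations, 32 members
rng = np.random.default_rng(0)
h_mat = rng.normal(size=(2, 4))                     # assumed linear observation operator
r_cov = np.diag([0.5, 0.8])                         # assumed observation error covariance
xb_ens = rng.normal(size=(4, 32))
y_obs = np.array([1.0, -0.5])
xa_ens = etkf_analysis(xb_ens, y_obs, lambda x: h_mat @ x, r_cov)
```

For a linear observation operator, the mean and covariance of `xa_ens` coincide with the Kalman filter solution computed from the sample background covariance, which is the sense in which the analysis ensemble "satisfies the Kalman filter equations for linear models".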
In the data assimilation analysis, a covariance localization is applied to neglect the covariance among unrelated or weakly related variables, which has the effect of removing the influence of spurious correlations resulting from the limited ensemble size. The localization is also applied to avoid the influence of remote observations that may cause sampling errors. The state vector includes several emission sources (surface emissions of NOx and CO, and LNOx sources) as well as the concentrations of 35 chemical species. The emission estimation is based on a state augmentation technique, in which the background error correlations determine the relationship between the concentrations and emissions of related species for each grid point.
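As an illustration of how remote observations can be down-weighted, the sketch below applies a distance-dependent Gaussian taper with a hard cutoff. The taper shape, the length scale, and the cutoff are assumptions chosen for the example; the localization settings of the reanalysis are not given in this section.

```python
import numpy as np

def gaussian_localization(distance_km, length_scale_km, cutoff_km):
    """Weight in [0, 1] that decays with the observation-to-grid-point distance."""
    distance_km = np.asarray(distance_km, dtype=float)
    weights = np.exp(-0.5 * (distance_km / length_scale_km) ** 2)
    # Observations beyond the cutoff are ignored entirely
    return np.where(distance_km <= cutoff_km, weights, 0.0)

# Hypothetical distances between one grid point and four observations
w = gaussian_localization([0.0, 500.0, 1000.0, 5000.0],
                          length_scale_km=500.0, cutoff_km=1500.0)
```

In an ensemble transform filter, such weights are typically used to inflate the observation error of distant observations, so that their influence on the local analysis fades smoothly to zero.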
Because of the simultaneous assimilation of multiple-species data and because of the simultaneous optimization of the concentrations and emission fields, the global distribution of various species, including OH, is modified considerably in our system. This propagates the observational information between various species and modulates the chemical lifetimes of many species (Miyazaki et al., 2012b, 2015, 2016).

Assimilated measurements
Assimilated observations were obtained from multiple satellite measurements. Tropospheric NO2 column retrievals used are the version-2 DOMINO data product for the Ozone Monitoring Instrument (OMI) (Boersma et al., 2011) and version 2.3 TM4NO2A data products for the Scanning Imaging Absorption Spectrometer for Atmospheric Cartography (SCIAMACHY) and Global Ozone Monitoring Experiment-2 (GOME-2) (Boersma et al., 2004), obtained through the TEMIS website (www.temis.nl).
The TES ozone data and observation operators used are version 5 level 2 nadir data obtained from the global survey mode (Bowman et al., 2006; Herman and Kulawik, 2013). This data set consists of 16 daily orbits with a spatial resolution of 5-8 km along the orbit track. The Microwave Limb Sounder (MLS) data used are the version 4.2 ozone and HNO3 level 2 products (Livesey et al., 2011). We used data for pressures of less than 215 hPa for ozone and 150 hPa for HNO3. The Measurements of Pollution in the Troposphere (MOPITT) CO data used are version 6 level 2 TIR products (Deeter et al., 2013).
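The satellite retrieval part of the observation operator can be sketched as follows, using the standard linear form in which the model profile is smoothed by the averaging kernel around the a priori profile. This is an illustrative assumption rather than the operational operator: it covers only vertical interpolation, works in linear rather than logarithmic mixing ratio, and all function and variable names are hypothetical.

```python
import numpy as np

def apply_averaging_kernel(model_profile, prior_profile, averaging_kernel):
    """Smooth a model profile as the retrieval would see it: x_a + A (x - x_a)."""
    return prior_profile + averaging_kernel @ (model_profile - prior_profile)

def observation_operator(model_pressure, model_profile,
                         retrieval_pressure, prior_profile, averaging_kernel):
    """Interpolate the model profile to the retrieval levels, then apply the kernel."""
    order = np.argsort(model_pressure)              # np.interp needs increasing x
    on_retrieval_grid = np.interp(np.log(retrieval_pressure),
                                  np.log(model_pressure[order]),
                                  model_profile[order])
    return apply_averaging_kernel(on_retrieval_grid, prior_profile, averaging_kernel)

# Hypothetical profile (ppb) on four model levels (hPa) and two retrieval levels
model_pressure = np.array([1000.0, 800.0, 500.0, 200.0])
model_profile = np.array([40.0, 45.0, 55.0, 120.0])
retrieval_pressure = np.array([800.0, 500.0])
prior_profile = np.array([50.0, 50.0])
y_full = observation_operator(model_pressure, model_profile,
                              retrieval_pressure, prior_profile, np.eye(2))
y_none = observation_operator(model_pressure, model_profile,
                              retrieval_pressure, prior_profile, np.zeros((2, 2)))
```

With an identity kernel the operator returns the interpolated model profile, and with a zero kernel it returns the a priori profile, which brackets the range of measurement sensitivity.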
Different models vary greatly in complexity. The number of calculated chemical species varies from 16 to 120. Photolysis rates are computed with offline or online methods, depending on the model. Many models include a full representation of stratospheric ozone chemistry and the heterogeneous chemistry of polar stratospheric clouds, but several models specify stratospheric ozone. Methane concentration is prescribed for the surface or over the whole atmosphere in many models. Ozone precursor emissions from anthropogenic and biomass burning sources were taken from those compiled by Lamarque et al. (2010), and the same emissions were used in all the models. Natural emission sources, such as isoprene emissions and lightning and soil NOx sources, were not specified and were accounted for differently between models. There is a large range in soil NOx emissions, from 2.7 to 9.3 Tg N yr−1, and in LNOx sources, from 1.2 to 9.7 Tg N yr−1, for the 2000 conditions. The range of natural emissions is a significant source of model-to-model ozone differences (Young et al., 2013). A complete description of the models along with the experiment design can be found in Lamarque et al. (2013).
Published: 23 December 2016. © Author(s) 2016. CC-BY 3.0 License.

Ozonesonde data
Ozonesonde observations were taken from the World Ozone and Ultraviolet Radiation Data Center (WOUDC) database (available at http://www.woudc.org). All available data from the WOUDC database are used for the validation. The accuracy of the ozonesonde measurement is about ±5 % in the troposphere (Smit and Kley, 1998).
To compare ozonesonde measurements with the data assimilation and ACCMIP models, all ozonesonde profiles have been interpolated to a common vertical pressure grid, with a bin of 25 hPa. The reanalysis and model fields were linearly interpolated to the time and location of each measurement using the two-hourly output data, with a bin of 25 hPa, and then compared with the measurements. The averaged profile is computed globally and for five latitudinal bands, SH high latitudes (55

Consistency between chemical reanalysis and ozonesonde observations
Miyazaki et al. (2015) validated an older version of the reanalysis (http://www.jamstec.go.jp/res/ress/kmiyazaki/reanalysis/) and showed good agreement with independent observations such as ozonesonde and aircraft measurements on regional and global scales and for both seasonal and year-to-year variations from the lower troposphere to the lower stratosphere for the 2005-2012 period. The mean bias against the ozonesonde measurements in the older dataset is -3.9 ppb at the NH high latitudes, -0.9 ppb at the NH mid-latitudes, 2.8 ppb in the tropics, -1.0 ppb at the SH mid-latitudes, and -1.7 ppb at the SH high latitudes between 850 and 500 hPa (Miyazaki et al., 2015). A major update from the system used in Miyazaki et al. (2015) to the system used in this study is the replacement of the forecast model CHASER (Sudo et al., 2002) with MIROC-Chem (Watanabe et al., 2011), which caused substantial changes in the a priori field and thus in the data assimilation results of various species. In addition, we attempt to optimize the surface NOx emission diurnal variability using data assimilation of OMI, SCIAMACHY, and GOME-2 retrievals in the updated system (Miyazaki et al., 2016). Since the updated reanalysis ozone fields used in this study have not yet been validated in any publication, we first present the evaluation results of the chemical reanalysis using global ozonesonde observations for 2005-2009.
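Before such a comparison, each profile is placed on the common 25 hPa pressure grid mentioned in the ozonesonde data description. A minimal sketch of one way to do this is given below; it averages all samples falling in each 25 hPa layer, and the grid limits (100 to 1000 hPa) and the function name `bin_profile` are assumptions for illustration.

```python
import numpy as np

def bin_profile(pressure_hpa, ozone_ppb, bin_width=25.0, p_top=100.0, p_bottom=1000.0):
    """Average a profile into fixed-width pressure bins; empty bins are NaN."""
    pressure_hpa = np.asarray(pressure_hpa, dtype=float)
    ozone_ppb = np.asarray(ozone_ppb, dtype=float)
    edges = np.arange(p_top, p_bottom + bin_width, bin_width)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Bin b holds samples with edges[b] <= p < edges[b + 1]
    idx = np.digitize(pressure_hpa, edges) - 1
    binned = np.full(centers.size, np.nan)
    for b in range(centers.size):
        in_bin = idx == b
        if in_bin.any():
            binned[b] = ozone_ppb[in_bin].mean()
    return centers, binned

# Hypothetical sonde samples: pressure (hPa) and ozone (ppb)
centers, binned = bin_profile([990.0, 980.0, 760.0, 510.0], [30.0, 40.0, 50.0, 60.0])
```

Applying the same binning to the reanalysis and model profiles sampled at the sonde time and location keeps the vertical resolution of all three datasets consistent.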
Figs. 1 and 2 compare the reanalysis and the global ozonesonde observations, and the comparison result is summarized in Table 1. In order to confirm improvements in the reanalysis, results from a model simulation without any chemical data assimilation (i.e., a control run) are also shown. The control run shows systematic biases, such as positive biases in the upper troposphere and lower stratosphere (UTLS) throughout the globe and negative biases in the lower and middle troposphere in the extratropics of both hemispheres. The positive bias in the UTLS is larger in the Southern Hemisphere (SH) than in the Northern Hemisphere (NH). The a priori systematic bias in this study is larger than that in our previous study (Miyazaki et al., 2015) in the UTLS, because of different model settings. However, the reanalysis fields were less sensitive to the a priori profiles in the UTLS than in the lower and middle troposphere because of strong constraints by MLS measurements and the long chemical lifetime of ozone in the UTLS.
The reanalysis shows improved agreement with the ozonesonde observations over the globe for the entire troposphere. The data assimilation removed most of the positive bias in the UTLS throughout the year and reduced the negative bias in the lower and middle troposphere in the extratropics. At NH mid and high latitudes in the lower and middle troposphere, the data assimilation reduced the annual mean negative bias of the model by 45-90 %, which is attributed to the reduced bias in boreal spring-summer. The mean bias in the new dataset is smaller than that in the older dataset for most cases (e.g., from -3.9 to -2.9 ppb at the NH high latitudes, -0.9 to -0.1 ppb at the NH mid-latitudes, and -1.0 to -0.1 ppb at the SH mid-latitudes between 850 and 500 hPa). The simultaneous optimization of concentrations and emissions played an important role in improving the lower tropospheric ozone analysis, associated with the pronounced ozone production caused by NOx increases, as demonstrated by Miyazaki et al. (2015). This advantage increases the ability of the chemical reanalysis to evaluate the simulated tropospheric ozone profiles, including the lower tropospheric ozone concentrations. Root-mean-square errors (RMSEs) are also reduced above the middle troposphere. The tropospheric concentrations show distinct seasonal and year-to-year variations, for which the temporal correlation is increased by the data assimilation globally, except at high latitudes in the lower troposphere (Table 1).
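The three statistics used in this validation (mean bias, RMSE, and temporal correlation) can be computed from co-located time series as in the sketch below. The function name and the sample values are hypothetical; the sketch assumes the reanalysis has already been sampled at the ozonesonde times and locations and averaged to a common pressure level.

```python
import numpy as np

def validation_stats(analysis, sonde):
    """Return (mean bias, RMSE, temporal correlation) of analysis vs. ozonesonde."""
    analysis = np.asarray(analysis, dtype=float)
    sonde = np.asarray(sonde, dtype=float)
    diff = analysis - sonde
    bias = diff.mean()
    rmse = np.sqrt((diff ** 2).mean())
    corr = np.corrcoef(analysis, sonde)[0, 1]
    return bias, rmse, corr

# Hypothetical monthly means (ppb); the analysis is uniformly 2 ppb too low
sonde_series = np.array([42.0, 48.0, 55.0, 60.0, 52.0, 45.0])
bias, rmse, corr = validation_stats(sonde_series - 2.0, sonde_series)
```

A constant offset such as this one produces a nonzero bias and RMSE but a perfect temporal correlation, which is why the three measures are reported together.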

Global distribution
We use the global chemical reanalysis to evaluate the global ozone profiles in ACCMIP simulations. Fig. 3 compares the global distribution of the annual mean ozone concentration between the reanalysis and the ensemble mean of the ACCMIP models.
The average over the multiple models can be expected to improve the robustness of the model simulation results, because some parts of the model errors may cancel each other out. As summarized in Table 2, the global spatial distributions are similar between the reanalysis and the ensemble mean, with a spatial correlation (r) greater than 0.94 from the lower troposphere to the lower stratosphere, except for the NH extratropical middle troposphere (r = 0.57). Both the reanalysis and the multi-model mean reveal distinct inter-hemispheric differences, associated with a stronger downwelling across the tropopause and stronger emission sources of ozone precursors in the NH. The wave-1 pattern in the zonal ozone distribution in the tropics, with a minimum over the Pacific Ocean and a maximum over the Atlantic (Thompson et al., 2003; Bowman et al., 2009; Ziemke et al., 2011), is also found in both the reanalysis and the multi-model mean.
Large errors between the reanalysis and the multi-model mean in the troposphere are found in the NH extratropics and SH tropics (right panel in Fig. 3). The multi-model mean overestimates the zonal and annual mean concentrations by 6-11 ppb at 800 hPa and by 2-9 ppb at 500 hPa in the NH extratropics. The overestimation is larger over the oceans than over land at the NH mid-latitudes at 800 hPa. Both the mean RMSE and bias are larger at 800 hPa than at 500 hPa in the NH extratropics, whereas they are larger at 500 hPa in the NH tropics (Table 2). In the SH tropics, the multi-model mean underestimates the concentration over the eastern Pacific by up to 9 ppb, over the Atlantic by up to 18 ppb, and over the Indian Ocean by up to 8 ppb at 500 hPa. These negative biases are larger in the middle troposphere than in the lower troposphere for most places and also for the zonal means in the SH tropics (-15 % in the middle troposphere and -10 % in the lower troposphere) (Table 2).
The positive bias in the NH and negative bias in the SH were commonly reported using OMI/MLS tropospheric ozone column measurements (Young et al., 2013). At 200 hPa, the multi-model mean underestimates the zonal mean concentration by 20-30 ppb at high latitudes in both hemispheres, with a larger error in the SH than in the NH (Table 2). Fig. 4 shows the Taylor diagram of the ACCMIP models against the reanalysis for three latitudinal bands for three levels.
The relevant statistics at 500 hPa are summarized in Table 3, for which the tropics is separated into two hemispheres. In the NH extratropics at 800 hPa, most models reproduce the spatial distribution (r = 0.8-0.95), while underestimating the spatial standard deviation (SD) by up to 50 %. Three exceptional models (1, 7, 8) show relatively poor agreement (r = 0.45-0.6 and SD underestimations by 50-60 %). At 500 hPa, there is a large diversity in the agreement. Only a few models (2, 4, 9, 11) show close agreement with the reanalysis (r > 0.8, SD error < 20 %). Notably, two models (12, 15) reveal excessive spatial variability (SD error > 80 %), and five models (1, 6, 7, 8, 12) reveal small spatial correlation (r < 0.15). The regional mean bias is largely positive (> 10 ppb) in several models (7, 8, 12) (Table 3). In this region, ozone distributions are modified by various processes, including vertical transport by convection and along conveyor belts, inflow from the stratosphere, long-range transport, and photochemical production (e.g., Lelieveld and Dentener, 2000; Oltmans et al., 2006; Sudo and Akimoto, 2007; Jonson et al., 2010). The evaluation results indicate that these processes are represented differently among models. At 200 hPa, all the models simulate the spatial distribution well (r > 0.95), whereas the spatial variability differs between the models (SD error ranges from -50 % to +30 %). There is relatively large variation in the stratospheric concentration, which results in the diversity in the UTLS.
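The quantities plotted on a Taylor diagram and quoted above (spatial correlation r, SD error, and regional mean bias) can be sketched as follows. The cosine-latitude area weighting and the function name `taylor_stats` are assumptions for illustration; the text does not specify how the spatial statistics were weighted.

```python
import numpy as np

def taylor_stats(model, reference, lat_deg):
    """Spatial correlation, SD error (%), and mean bias for (lat, lon) fields."""
    weights = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones(reference.shape)
    weights = weights / weights.sum()
    model_mean = (weights * model).sum()
    ref_mean = (weights * reference).sum()
    model_anom = model - model_mean
    ref_anom = reference - ref_mean
    model_sd = np.sqrt((weights * model_anom ** 2).sum())
    ref_sd = np.sqrt((weights * ref_anom ** 2).sum())
    corr = (weights * model_anom * ref_anom).sum() / (model_sd * ref_sd)
    sd_error_pct = 100.0 * (model_sd / ref_sd - 1.0)
    return corr, sd_error_pct, model_mean - ref_mean

# Hypothetical fields: the "model" has the right pattern but half the variability
rng = np.random.default_rng(1)
lat = np.linspace(30.0, 87.5, 24)
reference = 50.0 + rng.normal(scale=5.0, size=(lat.size, 36))
corr, sd_err, mean_bias = taylor_stats(0.5 * reference, reference, lat)
```

The example shows how a model can have a high spatial correlation while still underestimating the spatial SD, the combination reported for most models in the NH extratropics at 800 hPa.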
In the tropics, the spatial correlation is greater than 0.8 at all levels for most models (except for 12, 15), as they capture the wave-1 structure. When dividing the tropics into two hemispheres (Table 3), only a few models (4, 12) reveal low spatial correlation (r < 0.8) for the SH tropics (30° S-EQ) at 500 hPa. The spatial correlation in the tropics is lower at 500 hPa than at 800 hPa for most models. The SD error is less than 40 % for all the models at 800 and 500 hPa, while most models overestimate the spatial variability at 800 hPa by up to 30 %. The mean bias is negative for most models at 500 hPa in the tropics in both hemispheres, with larger negative biases in the SH tropics (Table 3). Young et al. (2013) noted that correlations between the biases for the NH and SH tropical tropospheric columns are strong. Similarly, our analysis using the reanalysis reveals a high correlation (0.91) between the NH and SH tropical biases at 500 hPa, suggesting that similar processes are producing the model biases in the tropical middle troposphere in both hemispheres. At 200 hPa in the tropics, the SD error differs among models, which could primarily be associated with the different representations of convective transport and ozone production by LNOx sources (e.g., Lelieveld and Crutzen, 2007; Wu et al., 2007).
In the SH extratropics at 800 hPa, most models reproduce the spatial distribution (r > 0.9), while underestimating the SD by 15-70 %, except for model 15. The model performance is similar between 800 hPa and 500 hPa, with a smaller SD error at 500 hPa for most models. These high spatial correlations may be related to a lack of local precursor emissions in the SH.
At 500 hPa, a majority of the models underestimate the mean concentration (Table 3), with large negative biases (< -8 ppb) in several models (1, 2, 12, 14). At 200 hPa, the SD error varies from -80 % to +65 %. The large diversity at 200 hPa may be related to the different representation of the tropopause and stratosphere-troposphere exchange (STE) among models. The poor agreement in model 8 is attributed to too-high concentrations at SH mid-latitudes and too-low concentrations at SH high latitudes in austral spring.

Seasonal variation
Fig. 5 compares the seasonal variation of zonal mean ozone concentration between the ACCMIP models, the reanalysis, and ozonesonde observations. The comparison between the reanalysis concentrations from the ozonesonde sampling (black dashed line) and the ozonesonde observations (blue solid line) shows that the reanalysis is in close agreement with the ozonesonde observations over the globe, as described in Sec. 3. However, in the NH extratropics at 800 hPa, the reanalysis concentration is too low from boreal spring to summer by up to 4 ppb, which leads to an underestimation of the seasonal amplitude. In the NH tropics at 500 hPa, the reanalysis overestimates the concentration except in April. In the SH tropics at 500 and 800 hPa, the reanalysis slightly overestimates the concentrations throughout the year by up to 5 ppb. In the SH extratropics at 800 hPa, the reanalysis concentration is too low by up to 5 ppb from austral autumn to winter. The reanalysis concentration and seasonal variation differ largely between the complete sampling (black bold line, where the concentrations were averaged over all grid points) and the ozonesonde sampling (black dashed line) for the globe. The impact of using the reanalysis instead of the ozonesonde network in characterizing the ozone seasonal variation is discussed in Section 5.
The global ozone concentrations averaged over all grid points are compared between the ACCMIP models and the reanalysis (black solid line vs. red solid line for the multi-model mean and thin colored lines for individual models). There is considerable interannual variability in both the reanalysis and the ACCMIP models. We confirmed that the ACCMIP ensemble mean is mostly within the standard deviation (i.e., year-to-year variation) of the reanalysis (not shown). In the NH extratropics, the multi-model mean overestimates the monthly mean concentrations by 6-9 ppb at 800 hPa and by 3-6.5 ppb at 500 hPa. The multi-model mean reproduces the seasonal variation, whereas there is large diversity among the models. The increase from winter to spring differs among models at 500 hPa, which is probably associated with different representations of downwelling from the stratosphere. Fig. 6 compares the seasonal amplitude. Most models overestimate the seasonal amplitude in the NH lower and middle troposphere, with a mean overestimation of 50-70 % at 800 hPa and 25-40 % at 500 hPa at NH high latitudes.
At 200 hPa, the multi-model annual mean concentration is in good agreement with that of the reanalysis, whereas the seasonal amplitude is underestimated by most models at NH high latitudes, with a mean underestimation of 15-25 %.
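The percentage errors in seasonal amplitude quoted above can be illustrated with a short calculation. The sketch assumes that the amplitude is the difference between the maximum and minimum of the 12 monthly means, which is a common definition but is not stated explicitly in the text; the monthly values are hypothetical.

```python
import numpy as np

def seasonal_amplitude(monthly_mean):
    """Peak-to-trough amplitude of a 12-month climatology (assumed definition)."""
    monthly_mean = np.asarray(monthly_mean, dtype=float)
    return monthly_mean.max() - monthly_mean.min()

def amplitude_error_pct(model_monthly, reference_monthly):
    """Relative error of the model seasonal amplitude against the reference, in %."""
    return 100.0 * (seasonal_amplitude(model_monthly)
                    / seasonal_amplitude(reference_monthly) - 1.0)

# Hypothetical climatologies (ppb): the model cycle is 1.5 times too strong
months = np.arange(12)
reference_monthly = 40.0 + 10.0 * np.cos(2.0 * np.pi * months / 12.0)
model_monthly = 45.0 + 15.0 * np.cos(2.0 * np.pi * months / 12.0)
amp_err = amplitude_error_pct(model_monthly, reference_monthly)
```

Here the model overestimates the amplitude by 50 % even though its annual mean is only 5 ppb too high, which shows that the mean bias and the amplitude error are separate diagnostics.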
In the NH tropics at 500 hPa, the multi-model mean underestimates the concentration by 1-4 ppb throughout the year, which can be attributed to the anomalously low concentrations in several models. There is a large diversity among the models in this region. In the SH subtropics, the multi-model mean is lower by up to 5 ppb at 800 hPa and by up to 11 ppb at 500 hPa, with the largest errors occurring in austral spring. A majority of models overestimate the seasonal amplitude in the NH subtropics at 800 hPa (by about 10-40 %), whereas they mostly underestimate the amplitude in the SH tropics at 800 and 500 hPa. In the tropical upper troposphere in both hemispheres, a few models reveal anomalously high or low concentrations. Both the ozonesondes and reanalysis reveal a sharp increase in ozone between March and April in the NH subtropics, which is not captured in the multi-model mean.
In the SH extratropics, the multi-model mean and the reanalysis are in good agreement at 800 hPa, whereas the multi-model mean largely underestimates the peak concentration in austral winter-spring at 500 hPa (by up to 7 ppb) and 200 hPa (by up to 35 ppb).
The large diversity among the models and the large underestimation in the multi-model mean at 500 hPa in spring could be

Inter-hemispheric gradient
Fig. 7 compares the inter-hemispheric gradient (NH/SH ratio) of the annual mean ozone concentration. We calculated the gradient across the equator; however, we recognize that a more careful definition of the boundary between the two hemispheres would be required to isolate air masses originating from each hemisphere (e.g., Hamilton et al., 2008). The gradient is similar between the ozonesonde observations (blue solid line) and the reanalysis concentration from the ozonesonde sampling (black dashed line) throughout the troposphere. In these estimates, the NH mean concentration is higher than the SH mean by 60-70 % in the lower troposphere, by 30-40 % in the middle troposphere, and by 55-60 % around 200 hPa. Near the surface, the reanalysis slightly overestimates the NH/SH ratio, mainly because of overestimated concentrations at the NH mid-latitudes.
By taking a complete sampling in the reanalysis (i.e., averaging over all model grid points for each hemisphere) (black solid line), the NH/SH ratio becomes smaller by about 25-30 %, 7-10 %, and 15-25 % in the lower troposphere, the middle troposphere, and around 200 hPa, respectively, compared to the average at the ozonesonde sampling sites (black dashed line).
The difference arises because many ozonesonde stations are located near large cities at NH mid-latitudes and therefore tend to observe higher ozone concentrations than the hemispheric average. Around 200 hPa, the difference could also be attributed to the presence of atmospheric stationary waves and the Asian monsoon circulation in the NH, which result in substantial spatial ozone variations in the UTLS (e.g., Wirth, 1993; Park et al., 2008) (cf. Fig. 3). The annual mean NH/SH ratios based on the global reanalysis field estimated at the surface, 800 hPa, 500 hPa, and 200 hPa are 1.36, 1.42, 1.30, and 1.35, respectively.
Most models overestimate the NH/SH ratio compared with the reanalysis, with a mean overestimation (black solid line vs. red solid line) of 34 % at the surface and 22-30 % in the free troposphere, attributable to both too-high concentrations in the NH extratropics and too-low concentrations in the SH subtropics in most models (cf. Figs. 3 and 5). The multi-model mean reveals annual mean NH/SH ratios of 1.71, 1.73, 1.54, and 1.49 at the surface, 800 hPa, 500 hPa, and 200 hPa, respectively. The large systematic error in the NH/SH ratio suggests that, for instance, the radiative heating distribution in chemistry-climate simulations is largely uncertain in most models, and such comprehensive information for different altitudes in the troposphere cannot be obtained using any individual measurements, as is further discussed in Section 6.3.
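For reference, a complete-sampling NH/SH ratio at one pressure level can be computed from a gridded annual mean field as in the sketch below. The cosine-latitude area weighting, the split at the equator, and the function name `nh_sh_ratio` are assumptions for illustration, and the field values are hypothetical.

```python
import numpy as np

def nh_sh_ratio(field, lat_deg):
    """Ratio of area-weighted NH and SH means for a (lat, lon) field."""
    zonal_mean = field.mean(axis=1)
    weights = np.cos(np.deg2rad(lat_deg))
    north = lat_deg > 0.0
    south = lat_deg < 0.0
    nh_mean = np.average(zonal_mean[north], weights=weights[north])
    sh_mean = np.average(zonal_mean[south], weights=weights[south])
    return nh_mean / sh_mean

# Hypothetical field (ppb): uniform 54 ppb in the NH and 40 ppb in the SH
lat = np.arange(-87.5, 90.0, 5.0)
field = np.where(lat[:, None] > 0.0, 54.0, 40.0) * np.ones((lat.size, 72))
ratio = nh_sh_ratio(field, lat)
```

Area weighting matters here because an unweighted average over a regular latitude-longitude grid would over-represent the high latitudes of each hemisphere.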

Impact of sampling on model evaluation
As presented in the previous section, the chemical reanalysis provides comprehensive information on global ozone distributions for the entire troposphere, which is useful for validating global model performance. It was also demonstrated that the inter-hemispheric gradient of ozone obtained with the ozonesonde sampling and with the complete sampling differed, and that the model-reanalysis difference strongly depended on the choice of the sampling method. As ozonesonde networks have been the primary basis for CCM evaluation (e.g., Stevenson et al., 2006; Huijnen et al., 2010; Young et al., 2013), the implications of this sampling bias need to be quantified. The current ozonesonde network does not cover the entire globe and is not homogeneously distributed between the hemispheres, ocean and land, and urban and rural areas. Also, the sampling interval of ozonesonde observations is typically a week or longer, which does not reflect the influence of diurnal and day-to-day variations. Model errors are also expected to vary greatly in time and space at various scales. Therefore, the implications of model differences at ozonesonde locations for regional and seasonal processes are uncertain. This section evaluates how the assessed model performance for simulated regional ozone fields changes when the complete-sampling chemical reanalysis fields are used instead of the existing ozonesonde network.
Sampling bias is an error in a computed quantity that arises due to unrepresentative (i.e., insufficient or inhomogeneous) sampling, which induces spurious features in the average estimates (e.g., Aghedo et al., 2011; Foelsche et al., 2011; Toohey et al., 2013; Sofieva et al., 2014). Sampling bias may occur when the atmospheric state within the time-space domain over which the average is calculated is not uniformly sampled. In regions where variability is dominated by short-term variations, limited sampling may lead to a random sampling error. The primary technique for sampling bias estimation is to subsample model or reanalysis fields based on the sampling patterns of the measurements and then to quantify differences between the mean fields based on the measurement sampling and those derived from the complete fields. Sampling bias is not necessarily negligible, even for satellite measurements (Aghedo et al., 2011; Toohey et al., 2013; Sofieva et al., 2014).
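The subsampling technique described above can be sketched in a few lines: the same field is averaged once over every grid point and time, and once only at the measurement times and locations, and the difference is the sampling bias. The array layout, the index-based description of the launches, and the function name `sampling_bias` are assumptions for illustration.

```python
import numpy as np

def sampling_bias(field, lat_deg, sample_idx):
    """Mean at the sampled (time, lat, lon) indices minus the complete regional mean."""
    weights = np.cos(np.deg2rad(lat_deg))[None, :, None] * np.ones(field.shape)
    complete_mean = np.average(field, weights=weights)
    t, j, i = np.array(sample_idx).T
    sampled_mean = field[t, j, i].mean()
    return sampled_mean - complete_mean

# Hypothetical region: 12 months x 4 latitudes x 6 longitudes, one polluted cell
lat = np.array([-45.0, -15.0, 15.0, 45.0])
field = np.full((12, 4, 6), 50.0)
field[:, 3, 2] = 80.0                       # grid cell containing the only station
launches = [(0, 3, 2), (6, 3, 2)]           # two launches, both from that cell
bias_polluted = sampling_bias(field, lat, launches)
bias_uniform = sampling_bias(np.full((12, 4, 6), 50.0), lat, launches)
```

The sampled mean is biased high when the station sits in an unrepresentative cell, whereas a spatially uniform field gives no sampling bias regardless of where the station is located.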
To estimate sampling biases of the ozonesonde network in the ACCMIP model evaluation, two evaluation results of mean model bias are compared using the chemical reanalysis. The first evaluation was conducted based on the complete sampling; the second evaluation used the ozonesonde sampling (in both space and time) that is based on the compilation by Tilmes et al. (2012). By using the two-hourly reanalysis fields, we can address possible biases due to the limited model sampling (i.e., monthly model outputs were used). Note that the relatively coarse resolution of the reanalysis may lead to an underestimation of the sampling bias in the model evaluation, because the variability of a sampled field depends on the resolution of the measurement. Tilmes et al. (2012) stated that regional aggregates of individual ozonesonde measurements with similar characteristics are more representative for larger regions; however, this may not mean that evaluation results using the compiled data generate model errors that are representative of the actual monthly mean for a surrounding area. The model evaluation results are shown for the 11 regions illustrated in Fig. 8 and summarized in Table 3. Japan was excluded from the evaluation because data from only one station were available for the reanalysis period. The 11 areas surrounding the ozonesonde stations were considered for complete atmospheric sampling (rectangles in Fig. 8), for which small margins were considered around the stations to prevent overestimation of the ozonesonde network limitation. It was confirmed that the discrepancy between the two evaluations generally increases with the size of the area. In contrast, for the SH mid- and high latitudes, the defined areas cover the entire range of longitudes, because of generally less variability in the SH than in the NH.
The realism of the reanalysis fields is important for reasonable estimates of the true sampling bias of the real atmosphere. As discussed in Section 3, there is good agreement in the evaluated model performance using the reanalysis and the ozonesonde measurements at the ozonesonde sampling, except for the lower troposphere. This result supports the use of the reanalysis data at the ozonesonde locations. The performance of the ACCMIP models as compared with the ozonesonde measurements is mostly consistent with that shown by Young et al. (2013), although the ozonesonde data periods differ: 1997-2011 was used by Young et al. (2013)

Mean error and its distribution
The model evaluation results for the two cases differ greatly for many regions, as shown in Fig. 9 and summarized in Table 5. For the NH polar regions, Tilmes et al. (2012) stated that separating the regions into eastern and western sectors reduces the variability in ozone within each region, because long-range transport of pollution from low and mid-latitudes into high latitudes shows longitudinal variations in the NH (e.g., Stohl, 2006). Comparisons further suggest that, except for the UTLS in winter (December-February (DJF)), the evaluated model performance using the ozonesonde measurements is representative of the surrounding regional and seasonal mean model performance. For the two NH polar regions at 200 hPa in DJF, the validation based on the ozonesonde sampling reveals a large negative sampling bias in the model bias as compared with regional and monthly means. Large negative model biases against the ozonesonde observations have been reported by Young et al. (2013) for 250 hPa, whereas results from this study suggest that these errors are larger than the regionally and seasonally representative model bias. At 500 hPa, the ozonesonde network reveals a negative sampling bias for the NH polar east in DJF. Thus, the positive bias reported in Young et al. (2013) for the NH polar east at 500 hPa may be lower than the regionally and seasonally representative model bias. The large discrepancy between the two estimates in the UTLS model performance can be attributed to the large variability of ozone distribution and associated model errors on a regional and seasonal scale.
For Canada, large differences (>30 %) exist between the two evaluations in the lower troposphere, for the UTLS in DJF, and for the middle troposphere in MAM. The ozonesonde measurements reveal a large negative sampling bias in the model evaluation in DJF at 200 hPa (-4 % in the complete sampling and -25 % in the ozonesonde sampling), while they reveal a negative sampling bias (by about 50 %) at 500 hPa in MAM. Similar differences between the two evaluations are found for Western Europe at 500 hPa and at 200 hPa in DJF. These results suggest that, for instance, the positive bias for Western Europe estimated by Young et al. (2013) may be lower than the regionally and seasonally representative model bias, even for such a small area. The smaller discrepancy between the two estimates for Western Europe as compared with Canada for most cases could be associated with the better coverage of the ozonesonde measurements for Western Europe. Even for the small area of the eastern United States, the two validations differ largely in the UTLS (e.g., -9 % in the ozonesonde sampling and +6 % in the complete sampling at 200 hPa in MAM) and at 500 hPa in MAM, June-August (JJA), and September-November (SON). In the NH subtropics, the two evaluations disagree largely in the middle and upper troposphere in JJA and SON.
The tropical stations were separated into three sub-regions: Western Pacific and East Indian Ocean, equatorial America, and the Atlantic Ocean and Africa. These regions reflect the different dominant tropical processes, including biomass burning and lightning over the Atlantic and Africa. The large variability of tropical ozone and its associated model error, together with the sparse ozonesonde network in these regions, results in large discrepancies between the two evaluations in the tropical regions. At 500 hPa, the ozonesonde measurements reveal a large (by 40-50 %) negative sampling bias in March-May (MAM) and a positive sampling bias in DJF over the Western Pacific and East Indian Ocean, whereas they show a large negative sampling bias (by 110 %) in MAM over the equatorial Americas. The probability distribution function (PDF) estimated using monthly mean reanalysis and model fields also differs largely between the two samplings (Fig. 10). Over the Western Pacific and East Indian Ocean in SON at 500 hPa, the multi-model mean shows a sharp peak around 54-58 ppb, in contrast to the broad distribution seen in the reanalysis with two peaks around 65 ppb and 35-45 ppb for the complete sampling (left bottom panel in Fig. 10). This information is useful to characterize model errors and for process-oriented model validation. On the other hand, the validation based on the ozonesonde sampling (left top panel) does not show any clear pattern and does not support model evaluation. Note that the influence of inter-annual variability was not considered in the analysis because the monthly climatological data were used by averaging over ten years for the models and five years for the reanalysis.
Although the variability of ozone is generally smaller in the SH than in the NH because of smaller local precursor emissions, large sampling biases exist even at SH mid- and high latitudes due to the sparse ozonesonde network. In the SH mid-latitudes, for example, the sign of the evaluated bias is opposite between the two cases at 200 hPa in DJF (-2.8 ppb in the complete sampling and +25.1 ppb in the ozonesonde sampling). In the SH high latitudes, evaluation results differ largely throughout the year in the middle troposphere. Based on the complete sampling, the ozone PDF is broadly distributed with a peak around 38 ppb at 500 hPa in SON at the SH high latitudes (right bottom panel in Fig. 10), while the multi-model mean underestimates high concentrations (>47 ppb) and shows a sharp peak of about 35 ppb. The PDF generated by the ozonesonde sampling does not provide strong information on the distribution of ozone (right top panel). These results highlight the advantage of using the reanalysis data for evaluating regionally and seasonally representative model performance, and for characterizing these distributions.
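The PDF comparison discussed above can be illustrated with a short sketch. It assumes that the monthly mean ozone values (in ppb) for a region have been collected into an array and that the model and reanalysis PDFs are built on the same bin edges; the function names and the bin width are hypothetical choices, not those used for Fig. 10.

```python
import numpy as np

def ozone_pdf(values, bin_edges):
    """Normalized histogram (probability density per ppb) of ozone values."""
    density, _ = np.histogram(np.ravel(values), bins=bin_edges, density=True)
    return density

def pdf_peak(values, bin_edges):
    """Center (ppb) of the most populated bin, i.e. the location of the PDF peak."""
    density = ozone_pdf(values, bin_edges)
    k = int(np.argmax(density))
    return 0.5 * (bin_edges[k] + bin_edges[k + 1])
```

Applying both functions to the complete sampling and to the ozonesonde sampling, with identical bin edges, would show whether the sparse sampling resolves the peak location and the width of the distribution.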

Seasonal variation
The seasonal cycle of tropospheric ozone is determined by various factors such as local photochemical production and atmospheric transport (e.g., Monks, 2000). Carslaw (2005), Bloomer et al. (2010), and Parrish et al. (2013) found multi-decadal changes in the amplitude and phase of the seasonal cycle at NH mid-latitudes. It was suggested that these changes can be attributed to changes in atmospheric transport patterns combined with spatial and temporal changes in emissions. CTMs have been used to explore the causal mechanisms; however, they failed to simulate several important features of the observed seasonal cycles (e.g., Ziemke et al., 2006; Stevenson et al., 2006; Parrish et al., 2014; Young et al., 2013). Accurate validation of the seasonal cycle is thus important for evaluating general model performance.
Table 6 compares the relative error in the seasonal amplitude obtained from the multi-model mean with that of the reanalysis for the complete and ozonesonde samplings. The evaluation based on the ozonesonde sampling results in a larger overestimation of the seasonal amplitude in the NH lower troposphere for most cases (+13.4 to +63.4 % in the sonde sampling and -19.0 to +40.2 % in the complete sampling). The large discrepancies can be attributed to large spatial variability in the seasonal variations of ozone and its model errors within each defined region, and also to the existence of short-term variability that is not completely captured by the ozonesonde sampling. For the Eastern US and Western Europe at 800 hPa, the sign of the bias is opposite between the two estimates. In contrast, at 200 hPa in the NH, results from the two evaluations are similar, suggesting spatial homogeneity in the seasonal cycle and its model errors within each region in the NH.
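As a worked illustration of the quantity compared in Table 6, the seasonal amplitude is the difference between the maximum and minimum monthly mean concentrations, and the error is expressed in percent. The sketch below assumes that the percentage is taken relative to the reanalysis amplitude, which is our reading of the table and not stated explicitly; the function names are hypothetical.

```python
import numpy as np

def seasonal_amplitude(monthly_mean):
    """Peak-to-peak difference of the 12 monthly mean concentrations (ppb)."""
    monthly_mean = np.asarray(monthly_mean, dtype=float)
    return monthly_mean.max() - monthly_mean.min()

def amplitude_relative_error(model_monthly, reanalysis_monthly):
    """Model-minus-reanalysis amplitude error in % of the reanalysis amplitude (assumed normalization)."""
    a_model = seasonal_amplitude(model_monthly)
    a_rean = seasonal_amplitude(reanalysis_monthly)
    return 100.0 * (a_model - a_rean) / a_rean
```

For example, a model amplitude of 30 ppb against a reanalysis amplitude of 20 ppb gives an error of +50 %.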
In the tropics, the estimated errors of the seasonal amplitude differ largely between the two samplings throughout the troposphere, suggesting that information obtained from the sparse ozonesonde network cannot be applied to characterize regional model errors in the seasonal cycle, even within the small defined area. Because of the large spatial variability, detailed validations using the chemical reanalysis (e.g., for each grid point) would be helpful. Also, in the SH high latitudes, large disagreements in the seasonal amplitude exist at 800 and 200 hPa.
Atmos. Chem. Phys. Discuss., doi:10.5194/acp-2016-1043, 2016. Manuscript under review for journal Atmos. Chem. Phys. Published: 23 December 2016. © Author(s) 2016. CC-BY 3.0 License.

Reanalysis uncertainty
Although the reanalysis dataset provides comprehensive information for global model evaluations, its performance still needs to be improved, especially for the lower troposphere, as also discussed by Miyazaki et al. (2015). Performance can be improved by ingesting more datasets, including meteorological sounders such as IASI (Clerbaux et al., 2009), AIRS (Chahine et al., 2006), and CrIS (Glumb et al., 2002). Application of a bias correction procedure for multiple measurements, which is common in numerical weather prediction (e.g., Dee, 2005), is needed to improve reanalysis accuracy. Recently developed retrievals with high sensitivity to the lower troposphere (e.g., Deeter et al., 2013; Fu et al., 2016) and the optimization of additional precursor emissions would be helpful to improve analysis of the lower troposphere. The relatively coarse resolution of the model could cause large differences between the simulated and observed concentrations at urban sites and may degrade the reanalysis.
The statistical information obtained from the reanalysis and the multi-model simulations can be used to suggest further developments for the models and observations. The analysis ensemble spread from EnKF can be regarded as uncertainty information about the analysis mean fields, indicating requirements for additional observational constraints. As shown in Fig. 11 (left panels), the relative reanalysis uncertainty is large over the tropical areas of the oceans at 800 hPa (>20 %), over the Southern Ocean at 500 hPa (10-20 %), and over the tropics of the Pacific Ocean and the Antarctic at 200 hPa (>16 %). Conversely, the reanalysis uncertainty is small from the tropics to mid-latitudes in both hemispheres at 500 hPa (<11 %). These variations may be related to changes in observation errors and the number of assimilated measurements, as well as to model errors.
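The relative uncertainty described above can be sketched as the ensemble standard deviation divided by the ensemble mean at each grid point. This is a minimal sketch under assumptions: `ensemble` is a hypothetical array with the ensemble members along the first axis, and the unbiased (N-1) estimator of the spread is our choice.

```python
import numpy as np

def relative_spread(ensemble):
    """Ensemble spread in % of the ensemble mean; members lie along axis 0."""
    ensemble = np.asarray(ensemble, dtype=float)
    mean = ensemble.mean(axis=0)
    spread = ensemble.std(axis=0, ddof=1)  # assumed unbiased (N-1) estimator
    return 100.0 * spread / mean
```

The same function applies to any set of fields stacked along the first axis, for example a set of model climatologies on a common grid.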

Model uncertainty
The variability across the ensemble models (i.e., ensemble spread) identifies where the models are most consistent or uncertain (center panels in Fig. 11). As discussed by Young et al. (2013), the relative spread among the ACCMIP models is large over the tropical areas of the oceans in the lower and middle troposphere, a reflection of the important differences among the models in various processes such as convective processes, lightning sources, and biogenic emission sources with related chemistry. The large relative spread (>20 %) at the NH mid-latitudes and in the SH at 200 hPa may be associated with the different representations of the tropopause and STE among models. In contrast, the relative spread is small around 20-40° N at 500 hPa (<10 %).
The simultaneous enhancement of the analysis uncertainty (cf. Section 6.1), together with the model spread, indicates low robustness of the validation results for some tropical regions over the oceans in the lower troposphere, and over the tropics in the Pacific Ocean as well as the Antarctic at 200 hPa. On the other hand, the ACCMIP model standard deviation with respect to the reanalysis could be used to identify the averaged uncertainty of ACCMIP models (right panels in Fig. 11). The standard deviation is large at NH high latitudes and over the tropical ocean areas at 800 hPa, over the SH tropics at 500 hPa, and in the SH extratropics at 200 hPa (>25 %).
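One possible reading of the standard deviation with respect to the reanalysis is the root-mean-square departure of the individual models from the reanalysis field, expressed in percent of the reanalysis concentration. The sketch below follows that reading; it is an assumption and not necessarily the exact definition used for Fig. 11, and the function name is hypothetical.

```python
import numpy as np

def deviation_from_reanalysis(models, reanalysis):
    """RMS departure of the models (axis 0) from the reanalysis, in % of the reanalysis."""
    models = np.asarray(models, dtype=float)
    reanalysis = np.asarray(reanalysis, dtype=float)
    rms = np.sqrt(np.mean((models - reanalysis) ** 2, axis=0))
    return 100.0 * rms / reanalysis
```

Unlike the spread about the multi-model mean, this measure also grows when all models share a common bias relative to the reanalysis.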

Implications for model improvements and climate studies
Numerous studies have identified decadal-scale changes in global tropospheric ozone using observations, such as the shift in the seasonal cycle at NH mid-latitudes and trends observed over many regions (e.g., Parrish et al., 2014; Cooper et al., 2014). A long record of the reanalysis allows detailed structures in simulated inter-annual and long-term variations to be evaluated in association with changes in human activities and natural processes. It is noted that the influence of ENSO was not well simulated in ACCMIP due to a decadal-averaged SST boundary condition, which limits the evaluation of inter-annual variations and could lead to bias in the comparisons between the ACCMIP models and the reanalysis.
Process-oriented validations using the reanalysis would be useful for understanding the uncertainty in simulated ozone fields and associated mechanisms. The ACCMIP models reveal large variations in short-lived species such as OH and ozone precursors (Naik et al., 2013; Voulgarakis et al., 2013), whereas information obtained from direct in-situ measurements cannot be applied for investigating global distributions because of the limited coverage of the measurements and the large spatial variability of concentrations. Validation of various species using the chemical reanalysis product can be used to identify potential sources of error in the simulated ozone fields. Meanwhile, the global monthly products of precursor emissions from the chemical reanalysis calculations (Miyazaki et al., 2012a, 2014, 2016) can be used to validate emission inventories and LNOx source parameterizations used in model simulations. As changes in tropospheric ozone burden associated with different future scenarios show a broadly linear relation to changes in NOx emissions (Stevenson et al., 2006), evaluations using up-to-date estimated emissions (Miyazaki et al., 2016) may prove useful to partly validate emissions for each scenario.
The performance of the simulated radiative forcing is largely influenced by the representation of ozone in model simulations (Bowman et al., 2013; Shindell et al., 2013; Stevenson et al., 2013). Bowman et al. (2013) suggested that overestimation of the outgoing longwave radiation (OLR) in the tropical seas of the east Atlantic Ocean and over Southern Africa is associated with model ozone errors, a persistent feature in all ACCMIP models, which was also found in this study using the reanalysis. Validation of short-lived species is also important for evaluating the radiative forcing because simulated OH fields influence simulated climates through, for instance, their influence on methane (Voulgarakis et al., 2013). Thus, detailed information on model errors in ozone and other short-lived species could be used to improve estimates of radiative forcing in climate studies. Meanwhile, model biases for present-day ozone may be correlated with biases in other time periods. Young et al. (2013) showed that ACCMIP models with high present-day ozone burdens also had high burdens for the other periods of time, including the preindustrial period.
Thus, the validation of present-day ozone fields using the reanalysis has the potential to evaluate preindustrial to present-day ozone radiative forcing. The reanalysis product provides comprehensive and unique information on the weaknesses of the individual models and the multi-model mean. We found that the ACCMIP multi-model mean overestimates ozone concentration in the NH extratropics throughout the troposphere (by 6-11 ppb at 800 hPa and by 2-9 ppb at 500 hPa for the zonal and annual mean concentration), and underestimates it in the SH tropics in the lower and middle troposphere by about 9 ppb over the eastern Pacific, by up to 18 ppb over the Atlantic, and by up to 8 ppb over the Indian Ocean. Most models underestimate the spatial variability of the annual mean concentration in the NH extratropics at 800 hPa (by up to 50 %) and in the SH extratropics at 800 and 500 hPa (by up to 70 %). The multi-model mean overestimates the seasonal amplitude in the NH by 50-70 % in the lower troposphere and by 25-40 % in the middle troposphere, whereas the seasonal amplitude is underestimated by 15-25 % at 200 hPa in the NH extratropics. The seasonal amplitude in the NH extratropics shows great diversity among models. The NH/SH ratio is overestimated by 22-30 % in the free troposphere in the multi-model mean; this can be attributed to both a high concentration bias in the NH and a low concentration bias in the SH in most models.
We quantified the ozonesonde network sampling bias and showed how the reanalysis can help extend the range of that network as a kind of "transfer standard". For instance, the ozonesonde sampling bias in the evaluated model bias is largely negative (positive) in MAM (in DJF) by 40-50 % over the Western Pacific and East Indian Ocean and largely negative by 110 % in MAM over the equatorial Americas at 500 hPa. Although the spatial and temporal variability is generally smaller in the SH than in the NH, the ozonesonde sampling bias cannot be neglected when capturing the regionally and monthly representative model errors, even in the SH. The evaluation of the seasonal cycle of tropospheric ozone is also largely limited by the ozonesonde sampling bias.
The evaluation based on the ozonesonde sampling introduces a larger overestimation of the seasonal amplitude than that based on the complete sampling for most of the surrounding areas in the NH lower troposphere, whereas the two estimates are largely different for the entire tropical regions. Therefore, the reanalysis data offer an advantage for evaluating the actual regionally and seasonally representative model performance required for model improvements. However, the network provides critical independent validation of the reanalysis, which can provide a much broader spatial constraint on chemistry-climate model performance.
The proposed model validation approach provides regionally and temporally representative model performance; this could ensure more accurate predictions for the chemistry-climate system. In future studies, validation of multiple species concentrations and precursor emissions from reanalysis would be useful in identifying error sources in model simulations. In particular, the response of tropospheric composition to changing emissions over decadal time scales is still not captured in CCMs relative to a few remote sites (Parrish et al., 2014). Recent increases in emissions from China have been linked to changes in tropospheric ozone concentrations (Verstraeten et al., 2015). Over the next decade, a new constellation of low-Earth-orbiting sounders (e.g., IASI, AIRS, CrIS, Sentinel-5p (TROPOMI), and Sentinel-5) and geostationary satellites (Sentinel-4, GEMS, and TEMPO) will provide even more detailed knowledge of ozone and its precursors (Bowman, 2013). Assimilating these datasets into a decadal chemical reanalysis will be a more direct means of quantifying the response of atmospheric composition to emissions at climate-relevant time scales, which should be a more direct test of chemistry-climate change scenarios. We also plan to apply the proposed evaluation approach to a more recent model inter-comparison project, the Chemistry-Climate Model Initiative (CCMI).
Acknowledgements. We acknowledge the use of data products from the NASA AURA and EOS Terra satellite missions. We also acknowledge the free use of tropospheric NO2 column data from the SCIAMACHY, GOME-2, and OMI sensors from www.temis.nl. This work was

Table 5. Median of the ACCMIP models minus reanalysis at 500 hPa (in % relative to the reanalysis concentrations). Results are shown for the regional average (Regional) and at the ozonesonde sampling (Sonde). Relative differences between the two estimates larger than 30 % are shown in bold. Columns: DJF, MAM, JJA, SON, each with Regional and Sonde values.

Table 6. ACCMIP multi-model mean minus reanalysis comparisons of the seasonal amplitude of regional mean ozone concentration (in %) for the regional average (Regional) and at the ozonesonde sampling (Sonde). The seasonal amplitude is estimated as a difference between maximum and minimum monthly mean concentrations.
attributed to the differing influence of stratospheric air. The seasonal amplitude is overestimated at 800 and 200 hPa by most models at SH high latitudes.
We conducted a ten-year tropospheric chemistry reanalysis by assimilating multiple chemical species from the OMI, MLS, TES, MOPITT, SCIAMACHY, and GOME-2 to provide a gridded, chemically consistent estimate of concentrations and precursor emissions. This study explores the potential of atmospheric chemical reanalysis to evaluate global tropospheric ozone of multi-model chemistry-climate model simulations.

Figure 6. Seasonal amplitude (peak-to-peak difference) estimated from the reanalysis (black solid line) and ACCMIP models (thin colored lines). The ±1σ deviation among ACCMIP models (i.e., model spread) is shown in pink. The seasonal amplitude derived from the multi-model mean fields (red solid line) and the multi-model mean of the seasonal amplitude from each model (red dashed line) are also shown. From top to bottom, results are shown for 200 hPa, 500 hPa, and 800 hPa.

Table 1. Chemical reanalysis (or control run in brackets) minus ozonesonde comparisons of mean ozone concentrations in 2005-2009. RMSE is the root-mean-square error. Units of bias and RMSE are ppb. T-Corr is the temporal correlation.

Table 2. ACCMIP model mean minus reanalysis comparisons of the mean ozone concentrations. Units of bias and RMSE are ppb. S-Corr is

Table 3. ACCMIP models minus reanalysis comparisons of the mean ozone concentrations at 500 hPa. Units of bias are ppb.