Evaluation of the MACC operational forecast system – potential and challenges of global near-real-time modelling with respect to reactive gases in the troposphere



Introduction
The impact of reactive gases on climate, human health and the environment has gained increasing public and scientific interest in the last decade (Bell et al., 2006; Cape, 2008; Mohnen et al., 1993; Seinfeld and Pandis, 2006; Selin et al., 2009), as air pollutants such as carbon monoxide (CO), nitrogen oxides (NO x ) and ozone (O 3 ) are known to have acute and chronic effects on human health, ranging from minor upper respiratory irritation to chronic respiratory and heart disease, lung cancer, acute respiratory infections in children and chronic bronchitis in adults (Bell et al., 2006; Kampa and Castanas, 2006). Tropospheric ozone, even in small concentrations, is also known to cause plant damage by reducing plant primary productivity as well as crop yields (e.g. Ashmore, 2005). It also contributes to global warming through direct and indirect radiative forcing (Forster et al., 2007; Sitch et al., 2007). Pollution events can be caused by local sources and processes but are also influenced by continental and intercontinental transport of air masses. Global models can provide the transport patterns of air masses and deliver the boundary conditions for regional models, facilitating the forecast and investigation of air pollutants.
The European Union (EU)-funded research project Monitoring Atmospheric Composition and Climate (MACC), a series of European projects (MACC to MACC-III), provides the preparatory work that will form the basis of the European Union's Copernicus Atmosphere Monitoring Service (CAMS). This service was established by the EU to provide a range of products of societal and environmental value with the aim of helping European governments respond to climate change and air quality problems (more information about this service can be found on the CAMS website, http://www.copernicus.eu/main/atmosphere-monitoring). The MACC project provides reanalyses, monitoring products of key atmospheric constituents (e.g. Inness et al., 2013), as well as operational daily forecasting of greenhouse gases, aerosols and reactive gases (Benedetti et al., 2011; Stein et al., 2012) on a global and on a European scale, and derived products such as solar radiation. An important aim of the MACC system is to describe the occurrence, magnitude and transport pathways of disruptive events, e.g. volcanoes (Flemming and Inness, 2013), major fires (Huijnen et al., 2012; Kaiser et al., 2012) and dust storms (Cuevas et al., 2015). The product catalogue can be found on the MACC website: http://copernicus-atmosphere.eu. For the generation of atmospheric products, state-of-the-art atmospheric modelling is combined with assimilated satellite data (Hollingsworth et al., 2008; Inness et al., 2013, 2015; more general information about data assimilation can be found in, e.g., Ballabrera-Poy et al., 2009, or Kalnay, 2003). Within the MACC project there is a dedicated validation activity to provide up-to-date information on the quality of the reanalysis, daily analyses and forecasts. Validation reports are updated regularly and are available on the MACC websites.
The MACC global near-real-time (NRT) production model for reactive gases and aerosol has operated with data assimilation from September 2009 onwards, providing boundary conditions for the MACC regional air quality (RAQ) products and other downstream users. The model simulations also provide input for the stratospheric ozone analyses delivered in near-real-time by the MACC stratospheric ozone system (Lefever et al., 2014).
In this paper we investigate the potential and challenges of near-real-time modelling with the MACC analysis system between 2009 and 2012. We concentrate on this period because of the availability of validated independent observations, namely surface observations from the Global Atmosphere Watch (GAW) Programme and the European Monitoring and Evaluation Programme (EMEP), as well as total column/tropospheric column satellite data from the MOPITT (Measurement Of Pollution In The Troposphere), SCIAMACHY (SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY) and GOME-2 (Global Ozone Monitoring Experiment-2) sensors. In particular, we study the model's ability to reproduce the seasonality and absolute values of CO and NO 2 in the troposphere as well as NO 2 , O 3 and CO at the surface. The impact of changes in model version, data assimilation and emission inventories on the model performance is examined and discussed. The paper is structured in the following way: Sect. 2 contains a description of the model and the validation data sets as well as the applied validation metrics. Section 3 presents the validation results for CO, NO 2 and O 3 . Section 4 provides the discussion and Sect. 5 the conclusions of the paper.

The MACC model system in the 2009-2012 period
The MACC global products for reactive gases consist of a reanalysis performed for the years 2003-2012 (Inness et al., 2013) and the near-real-time analysis and forecast, largely based on the same assimilation and forecasting system, but targeting different user groups (operational air quality forecasting and regional climate modelling, respectively). The Model for OZone And Related chemical Tracers (MOZART) chemical transport model (CTM) is coupled to the integrated forecast system (IFS) of the European Centre for Medium-Range Weather Forecasts (ECMWF); together they form the MOZART-IFS model system (Flemming et al., 2009; Stein et al., 2012). An alternative analysis system has been set up based on the global chemistry transport model version 5 (TM5; see also Huijnen et al., 2010). Details of the MOZART version used in the MACC global products can be found in Kinnison et al. (2007) and Stein et al. (2011, 2012). In our simulation, the IFS and the MOZART model run in parallel and exchange several two- and three-dimensional fields every model hour using the Ocean Atmosphere Sea Ice Soil version 4 (OASIS4) coupling software (Valcke and Redler, 2006), thereby producing three-dimensional IFS fields for O 3 , CO, SO 2 , NO x , HCHO, sea salt aerosol, desert dust, black carbon, organic matter, and total aerosol. The IFS provides meteorological data to MOZART. Data assimilation and transport of the MACC species take place in the IFS, while the whole chemical reaction system is calculated in the MOZART model.
The MACC_osuite (operational suite) is the global near-real-time MACC model production run for aerosol and reactive gases. Here, we have investigated only the MACC analysis. In contrast to the reanalysis, the MACC_osuite is a near-real-time run, which implies that it is run only once, in near-real-time, and may thus contain inconsistencies in, e.g., the assimilated data. The MACC_osuite was based on the IFS cycle CY36R1 with an IFS model resolution of approximately 100 km by 100 km at 60 levels (T159L60) from September 2009 to July 2012. The gas-phase chemistry module in this cycle is based on MOZART version 3.0 (Kinnison et al., 2007). The model has been upgraded, following updates of the ECMWF meteorological model and MACC-specific updates, i.e. in chemical data assimilation and with respect to the chemical model itself. Thus, from July 2012 onwards, the MACC_osuite has run with a new IFS cycle (version CY37R3), with an IFS model resolution of approximately 80 km at 60 levels (T255L60), and with an upgrade to MOZART version 3.5 (Kinnison et al., 2007; Emmons et al., 2011; Stein et al., 2013). This includes, amongst others, updated velocity fields for the dry deposition of O 3 over ice, as described in Stein et al. (2013). A detailed documentation of system changes can be found at http://atmosphere.copernicus.eu/user-support/operational-info.

Emission inventories and assimilated data sets
In the MACC_osuite, anthropogenic emissions are based on emissions from the EU project REanalysis of the TRopospheric chemical composition over the past 40 years (RETRO), merged with updated emissions for East Asia from the Regional Emission inventory in ASia (REAS) (Schultz et al., 2007), referred to in the following as RETRO-REAS. The inventory has a horizontal resolution of 0.5° in latitude and longitude and a monthly temporal resolution. Biogenic emissions are taken from the Global Emissions InitiAtive (GEIA). Fire emissions are based on a climatology derived from the Global Fire Emissions Database version 2 (GFEDv2; van der Werf et al., 2006) until April 2010, when fire emissions change to global fire assimilation system (GFAS) emissions (Kaiser et al., 2012). Between January and October 2011 there was a fire emission reading error in the model: instead of adjusting emissions to the appropriate month, the same set of emissions was read throughout this period.
After the model upgrade to the new cycle version CY37R3 in July 2012, the emission inventories changed from the merged RETRO-REAS and GEIA inventories, used in the previous cycle, to the MACCity (MACC/CityZEN EU projects) anthropogenic and biogenic emissions (Granier et al., 2011) and (climatological) Model of Emissions of Gases and Aerosols from Nature version 2 (MEGAN-v2; see Guenther et al., 2006) emission inventories. Wintertime anthropogenic CO emissions are scaled up over Europe and North America (see Stein et al., 2014). Near-real-time fire emissions are taken from GFASv1.0 (Kaiser et al., 2012), for both gas-phase species and aerosol.
In the MACC_osuite, the initial conditions for some of the chemical species are provided by data assimilation of atmospheric composition observations from satellites (see Benedetti et al., 2008; Inness et al., 2009, 2013; Massart et al., 2014). Table 1 lists the assimilated data products. From September 2009 to June 2012, O 3 total columns from the microwave limb sounder (MLS) and solar backscatter ultraviolet (SBUV-2) instruments are assimilated, as well as ozone monitoring instrument (OMI) and SCIAMACHY total columns (the latter only until March 2012, when the European Space Agency lost contact with the ENVIronmental SATellite, ENVISAT). The CO total columns are assimilated from the Infrared Atmospheric Sounding Interferometer (IASI) sensor and aerosol total optical depth is assimilated from the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument. After the model cycle update in July 2012, data assimilation also includes OMI tropospheric columns of NO 2 and SO 2 , as well as CO MOPITT total columns.
Tables 1 and 2 summarise the data assimilation and set-up of the MACC_osuite.

Validation data and methodology
In this study, we have generally used the same evaluation data sets as during the MACC near-real-time validation exercise. This implies some discontinuities in the evaluations, e.g. the substitution of SCIAMACHY data with GOME-2 data after the loss of the Envisat sensor or an exclusion of MOPITT satellite data after the start of its assimilation into the model. Because databases are continuously updated and complemented, a validation data set has to be selected and fixed at some point. The comparatively small inconsistencies between our data sets are considered to have a negligible impact on the overall evaluation results.

GAW surface O 3 , CO and NO 2 observations
The GAW programme of the World Meteorological Organisation (WMO) has been established to provide reliable long-term observations of the chemical composition and physical properties of the atmosphere, which are relevant for understanding atmospheric chemistry and climate change (WMO, 2013). The GAW tropospheric O 3 measurements are performed in a way that is suited to the detection of long-term regional and global changes. Furthermore, the GAW measurement programme focusses on observations that are regionally representative, free from the influence of significant local pollution sources and suited for the validation of global chemistry climate models (WMO, 2007). Detailed information on GAW and GAW-related O 3 , CO and NO 2 measurements can be found in WMO (2010, 2011, 2013) and Penkett (2011).
Hourly O 3 , CO and NO 2 data have been downloaded from the WMO/GAW World Data Centre for Greenhouse Gases (WDCGG) for the period between September 2009 and December 2012 (the download was carried out in July 2013). Our validation includes 6 stations with surface observations for NO 2 , 29 stations for CO and 50 stations with surface observations for O 3 . Table 3 lists the geographic coordinates and altitudes of the individual stations. As GAW is a long-term data network, the data in the database are provided with a temporal delay of approximately 2 years. As the data in the database become sparse towards the end of the validation period, near-real-time observations, as used in the MACC project for near-real-time validation and presented on the MACC website, have been included to complement the validation data sets. For the detection of long-term trends and year-to-year variability, the data quality objectives (DQOs) for CO in GAW measurements are set to a maximum uncertainty of ±2 ppb for marine boundary layer sites and ±5 ppb for continental sites that are influenced by regional pollution; the DQOs are ±1 ppb for ozone (WMO, 2012, 2013) and 0.08 ppb for NO 2 (WMO, 2011).
For the validation with GAW station data, 6-hourly values (00:00, 06:00, 12:00, 18:00 UTC) of the analysis mode have been extracted from the model and matched with hourly observational GAW station data. Model mixing ratios at the stations' locations have been linearly interpolated from the model data in the horizontal. In the vertical, modelled gas mixing ratios have been extracted at the model level that is closest to the GAW station's altitude. Validation scores (see Sect. 2.3) have been calculated for each station between the 6-hourly model analysis data and the corresponding observational data for the entire period (September 2009-December 2012) and as monthly averages.
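The station matching described above can be sketched as follows. This is a minimal illustration, not the MACC validation code: the function names, the toy grid and the use of level altitudes (rather than pressures) to pick the closest level are all assumptions made here.

```python
import numpy as np

def interpolate_to_station(field, lats, lons, lat0, lon0):
    """Bilinear (linear in lat and lon) interpolation to one station.

    Assumes `lats` and `lons` are strictly ascending 1-D grid coordinates
    and that the station lies inside the grid.
    """
    field = np.asarray(field, dtype=float)
    # Lower-left corner of the grid cell that contains the station.
    i = int(np.clip(np.searchsorted(lats, lat0) - 1, 0, len(lats) - 2))
    j = int(np.clip(np.searchsorted(lons, lon0) - 1, 0, len(lons) - 2))
    ty = (lat0 - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon0 - lons[j]) / (lons[j + 1] - lons[j])
    south = (1 - tx) * field[i, j] + tx * field[i, j + 1]
    north = (1 - tx) * field[i + 1, j] + tx * field[i + 1, j + 1]
    return float((1 - ty) * south + ty * north)

def closest_level(level_altitudes_m, station_altitude_m):
    """Index of the model level whose altitude is nearest to the station."""
    diffs = np.abs(np.asarray(level_altitudes_m, dtype=float) - station_altitude_m)
    return int(np.argmin(diffs))

# Hypothetical toy data: a 2 x 2 patch of surface O3 (ppb) around a station.
lats = np.array([46.0, 47.0])
lons = np.array([7.0, 8.0])
o3_ppb = np.array([[40.0, 44.0], [48.0, 52.0]])
station_value = interpolate_to_station(o3_ppb, lats, lons, 46.5, 7.5)
level_index = closest_level([10.0, 120.0, 600.0, 1500.0, 3400.0], 3580.0)
```

In practice the level would be selected first and the horizontal interpolation applied to that level's field at each 6-hourly analysis time.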

EMEP surface O 3 observations
The EMEP is a scientifically based and policy driven programme under the Convention on Long-Range Transboundary Air Pollution (CLRTAP) for international co-operation to solve transboundary air pollution problems. Measurements of air quality in Europe have been carried out under the EMEP programme since 1977.
A detailed description of the EMEP measurement programme can be found in Tørseth et al. (2012). The surface hourly ozone data between September 2009 and December 2012 have been downloaded from the EMEP data web page (http://www.nilu.no/projects/ccc/emepdata.html). For the validation, only stations meeting the 75 % availability threshold per day and per month are taken into account. The precision is close to 1.5 ppb for a 10 s measurement. More information about the ozone data quality, calibration and maintenance procedures can be found in Aas et al. (2000).
For comparison with EMEP data, 3-hourly model values (00:00, 03:00, 06:00, 09:00, 12:00, 15:00, 18:00, 21:00 UTC) of the analysis mode have been chosen. We used this data set to test the dependency of the biases on the time of day, treating daytime and night-time separately. Gas mixing ratios have been extracted from the model and matched with hourly observational surface ozone data at 124 EMEP stations in the same way as for the GAW station data. The EMEP surface ozone values and the interpolated surface modelled values are compared on a monthly basis for the latitude bands of 30-40° N (southern Europe), 40-50° N (central Europe) and 50-70° N (northern Europe). For the identification of differences in the MACC_osuite performance between daytime and night-time, the MACC_osuite simulations and the EMEP observations for the three latitude bands have been additionally separated into daytime (12:00-15:00 local time, LT) and night-time (00:00-03:00 LT) intervals.
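One possible reading of the 75 % availability criterion is sketched below. The text does not spell out how the daily and monthly thresholds are combined, so the rule used here (a day counts if at least 75 % of its hourly values exist, a month counts if at least 75 % of its days count) is an assumption, as are the function name and the array layout.

```python
import numpy as np

def month_is_valid(hourly, threshold=0.75):
    """Check data availability for one station and one month.

    `hourly` has shape (n_days, 24); NaN marks a missing hourly value.
    """
    hourly = np.asarray(hourly, dtype=float)
    # Fraction of available hours for each day.
    day_fraction = np.mean(~np.isnan(hourly), axis=1)
    valid_days = day_fraction >= threshold
    # The month passes if enough of its days pass.
    return bool(np.mean(valid_days) >= threshold)

# Hypothetical 30-day month with complete data.
complete = np.full((30, 24), 35.0)

# Same month with 10 days lost entirely: only 20 of 30 days remain.
gappy = complete.copy()
gappy[:10, :] = np.nan

# Each day loses 6 of 24 hours, i.e. exactly 75 % of the hours remain.
borderline = complete.copy()
borderline[:, :6] = np.nan
```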
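The grouping into latitude bands and day/night intervals could look like the sketch below. Two assumptions are made that the text does not state: local time is approximated as mean solar time (UTC plus longitude divided by 15), and the shared band edges at 40° N and 50° N are assigned to the more northerly band. All names are illustrative.

```python
def local_hour(utc_hour, lon_deg):
    """Approximate local (mean solar) time in hours, 0 <= LT < 24."""
    return (utc_hour + lon_deg / 15.0) % 24.0

def latitude_band(lat_deg):
    """Map a station latitude to one of the three European bands."""
    if 30.0 <= lat_deg < 40.0:
        return "southern Europe"
    if 40.0 <= lat_deg < 50.0:
        return "central Europe"
    if 50.0 <= lat_deg <= 70.0:
        return "northern Europe"
    return None  # station outside the evaluated bands

def day_night_label(utc_hour, lon_deg):
    """Label a model time step as daytime, night-time or neither."""
    lt = local_hour(utc_hour, lon_deg)
    if 12.0 <= lt <= 15.0:
        return "day"
    if 0.0 <= lt <= 3.0:
        return "night"
    return None  # time step falls outside both intervals

# A station at 15 deg E: 12:00 UTC is about 13:00 LT, 00:00 UTC about 01:00 LT.
labels = [day_night_label(h, 15.0) for h in (0, 3, 6, 9, 12, 15, 18, 21)]
```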

MOPITT CO total column retrievals
The MOPITT instrument is mounted on board the NASA EOS Terra satellite and provides CO distributions at the global scale (Deeter et al., 2004, 2013). Following the recommendation in the users' guide (www.acd.ucar.edu/mopitt/v5_users_guide_beta.pdf), the MOPITT data are averaged by taking into account their relative errors provided by the observation quality index (OQI). Also, to achieve better data quality, we use only daytime CO data, since retrieval sensitivity is greater for daytime than for night-time overpasses. A further description of the V5 data is presented in Deeter et al. (2013) and Worden et al. (2014).
For the validation, the model CO profiles (X) are transformed by applying the MOPITT averaging kernels (A) and the a priori CO profile (X a ) according to the following equation (Rodgers, 2000) to derive the smoothed profiles X * appropriate for comparison with MOPITT data:
$$X^{*} = X_{a} + A\,(X - X_{a})$$
Details on the method of calculation can be found in Deeter et al. (2004) and Rodgers (2000). The averaging kernels indicate the sensitivity of the MOPITT measurement and retrieval system to the true CO profile, with the remainder of the information set by the a priori profile and retrieval constraints (Emmons, 2009; Deeter et al., 2010). The CO data X * (derived using the above equation) have the same vertical resolution and a priori dependence as the MOPITT retrievals and have been used to calculate averaging kernel smoothed model CO total columns, which are compared to the MOPITT CO total columns. For the validation, eight regions are defined (see Fig. 1): Europe, Alaska, Siberia, North Africa, southern Africa, South Asia, East Asia and the United States. The model update in July 2012 includes an integration of MOPITT CO total columns in the model's data assimilation system. With this, the MOPITT validation data have lost their independence for the rest of the validation period, and MOPITT validation data have thus only been used until June 2012 for validation purposes.
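A minimal sketch of the averaging kernel smoothing is given below, assuming the standard Rodgers (2000) form X* = X_a + A (X − X_a) with the model profile already interpolated to the retrieval levels. Whether the profiles are expressed as mixing ratios or their logarithms depends on the retrieval product and is not addressed here; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def smooth_profile(x_model, x_apriori, avg_kernel):
    """Apply an averaging kernel matrix to a model profile.

    Implements x* = x_a + A (x - x_a). All inputs must be on the same
    vertical grid; `avg_kernel` has shape (n_levels, n_levels).
    """
    x = np.asarray(x_model, dtype=float)
    x_a = np.asarray(x_apriori, dtype=float)
    A = np.asarray(avg_kernel, dtype=float)
    return x_a + A @ (x - x_a)

# Hypothetical three-level CO profiles in ppb.
x_model = np.array([150.0, 110.0, 80.0])
x_apriori = np.array([120.0, 100.0, 90.0])

# Limiting cases: a perfect instrument (A = I) returns the model profile,
# an insensitive one (A = 0) returns the a priori profile.
perfect = smooth_profile(x_model, x_apriori, np.eye(3))
blind = smooth_profile(x_model, x_apriori, np.zeros((3, 3)))
```

The two limiting cases illustrate why the smoothed model profile, rather than the raw one, is the appropriate quantity to compare with the retrieval.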
In this study, the tropospheric NO 2 column data set described in Hilboll et al. (2013a) has been used. The measured radiances are analysed using differential optical absorption spectroscopy (DOAS) (Platt and Stutz, 2008) in the 425-450 nm wavelength window (Richter and Burrows, 2002; Richter et al., 2011). The influence of stratospheric NO 2 air masses has been accounted for using the algorithm detailed by Hilboll et al. (2013b), using stratospheric NO 2 fields from the Bremen 3D chemistry and transport model (B3dCTM; see also Sinnhuber et al., 2003a, b; Winkler et al., 2008). Tropospheric air mass factors have been calculated with the radiative transfer model SCIATRAN 2.0 (Rozanov et al., 2005). Only measurements with Fast REtrieval Scheme for Cloud from Oxygen A band (FRESCO+) algorithm (Wang et al., 2008) [...]
A few processing steps are applied to the MACC_osuite data to account for differences with the satellite data such as observation time. First, tropospheric NO 2 vertical column densities (VCDs) are calculated from the model data by vertical integration from the ground up to the height of the tropopause. The latter is derived from the climatological tropopause pressure of the National Centers for Environmental Prediction (NCEP) reanalysis (Kalnay et al., 1996), shown in Fig. 1 of Santer et al. (2003). Second, simulations are interpolated linearly in time to the SCIAMACHY Equator crossing time (roughly 10:00 LT). This most likely leads to some minor overestimation of model NO 2 VCDs compared to GOME-2 data, as the Equator crossing time for GOME-2 is about 09:30 LT. Moreover, only model data for which corresponding satellite observations exist are considered. For the validation, the same regions have been used as for MOPITT (Fig. 1), except for Siberia and Alaska. In contrast to comparisons of MOPITT and model data of CO, no averaging kernels were applied to the model NO 2 data.
Satellite observations of tropospheric NO 2 columns have relatively large uncertainties, mainly linked to errors in the stratospheric correction method, i.e. in stratospheric NO 2 columns (important over clean regions and at high latitudes in winter and spring) and to uncertainties in air mass factors (mainly over polluted regions) (e.g. Boersma et al., 2004; Richter et al., 2005). The uncertainty varies with geolocation and time but in first approximation can be separated into an absolute error of 5 × 10 14 molec cm −2 and a relative error of about 30 %. As some of the contributions to this uncertainty can have systematic causes (e.g. a systematic error in the assumed aerosol load can lead to seemingly random errors in the retrieved NO 2 columns due to the complexities of atmospheric radiative transfer, i.e. relative positions of absorber and aerosol layers), averaging over longer time periods does not reduce the errors as much as one would expect for purely random errors. Over polluted regions, the uncertainty from random noise in the spectra is small in comparison to other error sources, in particular for monthly averages.
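The two model processing steps (vertical integration up to the tropopause and linear interpolation in time) might be sketched as follows. The hydrostatic conversion from pressure thickness to air column, the layer-mean mixing ratios and all names are assumptions of this sketch; the source does not describe the integration scheme in detail.

```python
import numpy as np

N_A = 6.02214076e23   # Avogadro constant, 1/mol
G = 9.80665           # standard gravity, m/s^2
M_AIR = 0.0289644     # molar mass of dry air, kg/mol

def tropospheric_column(vmr, p_interfaces_pa, p_tropopause_pa):
    """Tropospheric vertical column density in molec cm^-2.

    `vmr` holds layer-mean mixing ratios (mol/mol), ordered from the
    surface upwards; `p_interfaces_pa` holds the len(vmr) + 1 interface
    pressures in Pa, decreasing with height.
    """
    vmr = np.asarray(vmr, dtype=float)
    p = np.asarray(p_interfaces_pa, dtype=float)
    # Clip interfaces at the tropopause so stratospheric layers get zero
    # thickness and the layer containing the tropopause is cut there.
    p_clipped = np.maximum(p, p_tropopause_pa)
    dp = p_clipped[:-1] - p_clipped[1:]
    air_molec_per_m2 = dp * N_A / (G * M_AIR)  # hydrostatic air column
    return float(np.sum(vmr * air_molec_per_m2) * 1e-4)  # m^-2 -> cm^-2

def interpolate_to_overpass(model_hours, model_columns, overpass_hour):
    """Linear interpolation in time; all hours in one time reference."""
    return float(np.interp(overpass_hour, model_hours, model_columns))

# Hypothetical three-layer column with 1 ppb NO2 everywhere.
p_if = [100000.0, 80000.0, 20000.0, 10000.0]
vcd = tropospheric_column([1e-9, 1e-9, 1e-9], p_if, 20000.0)
vcd_at_10 = interpolate_to_overpass([6.0, 12.0], [2.0e15, 5.0e15], 10.0)
```

Because the overpass time is given in local time, model output times in UTC would first need converting to the same reference for each grid cell.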

Validation metrics
A comprehensive model validation requires the selection of validation metrics that provide complementary aspects of model performance. The following metrics have been used in the validation:
Modified normalised mean bias (MNMB)
$$\mathrm{MNMB} = \frac{2}{N}\sum_{i=1}^{N}\frac{f_{i} - o_{i}}{f_{i} + o_{i}}$$
Root mean square error (RMSE)
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(f_{i} - o_{i}\right)^{2}}$$
Correlation coefficient (R)
$$R = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(f_{i} - \bar{f}\right)\left(o_{i} - \bar{o}\right)}{\sigma_{f}\,\sigma_{o}}$$
where N is the number of observations, f the modelled analysis and o the observed values, f̄ and ō are the mean values of the analysis and observed values and σ f and σ o the corresponding standard deviations. The modified normalised mean bias is a normalisation based on the mean of the observed and forecast value (e.g. Elguindi et al., 2010). It ranges between −2 and 2 and when multiplied by 100 %, it can be interpreted as a percentage bias.
We chose to use the modified normalised mean bias (MNMB) in our evaluations because verifying chemical species concentration values significantly differs from verifying standard meteorological fields. For example, spatial or temporal variations can be much greater and the differences between model and observed values ("model errors") are frequently much larger in magnitude. Most importantly, typical concentrations can vary quite widely between different pollutant types (e.g. O 3 and CO) and regions (e.g. Europe vs. Antarctica), and a given bias or error value can have a quite different significance. It is useful, therefore, to consider bias and error metrics that are normalised with respect to observed concentrations and hence can provide a consistent scale regardless of pollutant type (see e.g. Elguindi et al., 2010, or Savage et al., 2013). Moreover, the MNMB is robust to outliers and converges to the normal bias for biases approaching zero, while taking into account the representativeness issue when comparing coarse-resolved global models versus site-specific station observations. Though GAW stations prove regionally representative in general, the experience is that local effects cannot always be ruled out reliably in long worldwide data sets, because each of the different species has its individual scale of transport and chemical processes, which in one case may exceed and in the other case fall below the model resolution. Referencing to the model/observation mean again constitutes a pragmatic solution to avoid misleading bias tendencies, particularly in sensitive regions with sparse data coverage. Within MACC, the MNMB is used as an important standard score. It is used in the MACC quarterly validation reports and it appears in many recent publications, e.g. Cuevas et al. (2015), Eskes et al. (2015), Sheel et al. (2014).
The MNMB varies symmetrically with respect to under- and overestimation. However, when calculated over longer time periods, a balance in model error, with model over- and underestimation compensating each other, can lead to a small MNMB for the overall period. For this reason, it is important to additionally consider an absolute measure, such as the root mean square error (RMSE). However, it has to be noted that the RMSE is strongly influenced by larger values and outliers, due to squaring. The correlation coefficient R can vary between 1 (perfect correlation) and −1 (perfect negative correlation) and is an important measure for checking the linearity between model and observations.
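The three scores can be computed as in the sketch below. It assumes the usual definitions, MNMB = (2/N) Σ (f − o)/(f + o), RMSE = sqrt((1/N) Σ (f − o)²) and R as the covariance divided by the product of the (population) standard deviations, which match the symbols defined above; the function names and sample values are illustrative only.

```python
import numpy as np

def modified_normalised_mean_bias(f, o):
    """MNMB, bounded by -2 and 2 for non-negative concentrations."""
    f, o = np.asarray(f, dtype=float), np.asarray(o, dtype=float)
    return float(2.0 * np.mean((f - o) / (f + o)))

def root_mean_square_error(f, o):
    """RMSE in the units of the input data."""
    f, o = np.asarray(f, dtype=float), np.asarray(o, dtype=float)
    return float(np.sqrt(np.mean((f - o) ** 2)))

def correlation_coefficient(f, o):
    """Pearson correlation using 1/N normalisation throughout."""
    f, o = np.asarray(f, dtype=float), np.asarray(o, dtype=float)
    covariance = np.mean((f - f.mean()) * (o - o.mean()))
    return float(covariance / (f.std() * o.std()))

# Hypothetical model and observed O3 mixing ratios (ppb) at matching times.
model = np.array([30.0, 45.0, 50.0, 38.0])
obs = np.array([28.0, 40.0, 55.0, 35.0])

scores = {
    "MNMB_percent": 100.0 * modified_normalised_mean_bias(model, obs),
    "RMSE_ppb": root_mean_square_error(model, obs),
    "R": correlation_coefficient(model, obs),
}
```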
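A small worked example may help to show why the MNMB needs to be read together with the RMSE. The numbers are invented: a model that overestimates by 20 ppb in one season and underestimates by 20 ppb in another has an overall MNMB of zero, while the RMSE still reports the 20 ppb error. The definitions assumed are MNMB = (2/N) Σ (f − o)/(f + o) and RMSE = sqrt((1/N) Σ (f − o)²).

```python
import numpy as np

def mnmb(f, o):
    f, o = np.asarray(f, dtype=float), np.asarray(o, dtype=float)
    return float(2.0 * np.mean((f - o) / (f + o)))

def rmse(f, o):
    f, o = np.asarray(f, dtype=float), np.asarray(o, dtype=float)
    return float(np.sqrt(np.mean((f - o) ** 2)))

# Invented seasonal means (ppb): summer overestimate, winter underestimate.
model = [60.0, 40.0]
obs = [40.0, 60.0]

overall_mnmb = mnmb(model, obs)   # the two seasonal biases cancel
overall_rmse = rmse(model, obs)   # the error magnitude is retained

# Symmetry: swapping model and observation only flips the sign.
over = mnmb([60.0], [40.0])
under = mnmb([40.0], [60.0])
```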

Validation of ozone
The evaluation of the MACC_osuite run with O 3 from GAW surface observations (described in Sect. 2.2.1) demonstrates good agreement in absolute values and seasonality for most regions. Figure 2 shows maps with MNMB (see Sect. 2.3) evaluations for 50 GAW stations globally (top) and in Europe (bottom). Figure 3 presents selected time series plots representing the results for high latitudes, low latitudes and Europe. Large negative MNMBs over the whole period September 2009 to December 2012 (−30 to −82 %) are observed for stations located in Antarctica (Neumayer (NEU), South Pole (SPO), Syowa (SYO) and Concordia (CON)), where O 3 surface mixing ratios are strongly underestimated by the model. For stations located at high latitudes in the Northern Hemisphere (Barrow (BAR), Alaska, and Sum- [...] 3 show that an underestimation seen in these regions appears to be remedied and model performance improved with an updated dry deposition parameterisation over ice, which has been introduced with the new model cycle in July 2012 (see Sect. 2.1).
Large positive MNMBs (up to 50 to 70 %, Fig. 2) are observed for stations that are located in or near cities and thus exposed to regional sources of contamination (Iskrba (ISK), Slovenia; Tsukuba (TSU), Japan; Cairo (CAI), Egypt). In tropical and subtropical regions, O 3 surface mixing ratios are systematically overestimated (by about 20 % on average) during the evaluation period. The time series plots for tropical and subtropical stations (e.g. for Ragged Point (RAG), Barbados, and Cape Verde Observatory, Cape Verde (CVO), Fig. 3) reveal a slight systematic positive offset throughout the year, however with high correlation coefficients (0.6 on average).
For GAW stations in Europe, the evaluation of the MACC_osuite for the whole period shows MNMBs between −80 and 67 %. Large biases appear only for two GAW stations located in Europe: Rigi (RIG), Switzerland (−80 %), located near mountainous terrain and ISK, Slovenia (67 %).
For the rest of the stations MNMBs lie between 22 and −30 %. RMSEs (see Sect. 2.3) range between 7 and 35 ppb (15 ppb on average). Again, results for ISK and RIG show the largest errors. All other stations show RMSEs between 7 and 20 ppb. Correlation coefficients here range between 0.1 and 0.7 (with 0.5 on average). Table 4 summarises the results for all stations individually.
Monthly MNMBs (see Fig. 4) show a seasonally varying bias, with positive MNMBs occurring during the northern summer months (with global average ranging between 5 and 29 % during the months June and October), and negative MNMBs during the northern winter months (between −2 and −33 % during the months December to March). These deviations partly cancel each other out in MNMB for the whole evaluation period. For the RMSEs (Fig. 5) maximum values also occur during the northern summer months with the global average ranging between 11 and 16 ppb for June to September. The smallest errors appear during the northern hemispheric winter months (global average falling between 8 and 10 ppb for December and January). The correlation does not show a distinct seasonal behaviour (see Fig. 6).
The time series plots in Fig. 3 show that the seasonal cycle of O 3 mixing ratios, with maximum concentrations during the summer months and minimum values during winter, could be well reproduced by the model for European stations (e.g. Monte Cimone (MCI), Italy; Kosetice (KOS), Czech Republic; and Kovk (KOV), Slovenia), although there is some overestimation in summer resulting mostly from observed minimum concentrations that are not captured correctly by the MACC_osuite (KOS, Czech Republic, and KOV, Slovenia).
The validation with EMEP surface ozone observations (described in Sect. 2.2.2) in three different regions in Europe for the period September 2009 to December 2012 likewise confirms the behaviour of the model to overestimate O 3 mixing ratios during the warm period and underestimate O 3 concentrations during the cold period of the year (see Fig. 7). The mostly positive bias (May-November) is between −9 and 56 % for northern Europe and central Europe and between 8 and 48 % for southern Europe. Negative MNMBs appear, in accordance with GAW validation results, during the winter-spring period (December-April) ranging between −48 and −7 % for EMEP stations in northern Europe (exception: December 2012 with 25 %), between −1 and −39 % in central Europe (exception: December 2012 with 31 %), whereas in southern Europe, deviations are smaller and remain mostly positive (between −8 and 9 %) in winter (exception: December 2012 with 37 %). The different behaviour for December 2012 likely results from the limited availability of observations towards the end of the validation period. The separate evaluation of day and night-time O 3 mixing ratios (Fig. 8) shows that for northern Europe night-time biases exceed daytime biases during all seasons. For central Europe and southern Europe night-time biases are larger (negative MNMBs) during cold periods (December-April), whereas during warm periods (May-November) larger biases (positive MNMBs) appear during daytime.

Validation of carbon monoxide
The validation of the MACC_osuite with surface observations of 29 GAW stations (described in Sect. 2.2.1) shows that over the whole period September 2009 to December 2012, CO mixing ratios could be reproduced with an average MNMB of −10 %. The MNMBs for all stations range between −50 and +30 %. Results are listed in Table 5; a selection of time series plots shows the results for stations in Europe, Asia and Canada in Fig. 9. MNMBs exceeding ±30 % appear for stations that are either located in or near cities and thus exposed to regional sources of contamination (KOS, Czech Republic) or are located in or near complex mountainous terrain (RIG, Switzerland, and Moussala (BEO), Bulgaria) which is not resolved by the topography of the global model. The RMSEs fall between 12 and 143 ppb (on average 48 ppb) for all stations during the validation period, but for only four stations (RIG, KOS, Payerne (PAY), Switzerland and BEO, all located in Europe) do the RMSEs exceed 70 ppb. Correlation coefficients from the comparison with GAW station data calculated over the whole time period range between 0 and 0.8 (on average 0.4), with only four stations showing values smaller than 0.2 (RIG, BEO, East Trout Lake (ETL) and Lac la Biche (LAC); the latter two located in Canada).
Considering the global monthly MNMBs and RMSEs, it can be seen that during the northern hemispheric summer months, June to September, both are small (absolute differences less than 5 %); see Figs. 10 and 11. Negative MNMBs (up to −35 %) and larger RMSEs (up to 72 ppb) appear during the northern hemispheric winter months, November to March, when anthropogenic emissions are at their highest level, especially for the US, northern latitudes and Europe. Monthly correlation coefficients are between 0.1 and 0.5 and do not show a distinct seasonal behaviour (see Fig. 12); the low values of 0.1 during the period January to October 2011 result from the reading error in the fire emissions (see Sect. "Emission inventories and assimilated data sets"). The generally only moderate correlation coefficient is related to mismatches in the strong short-term variability seen in both the model and the measurements.
The time series plots for stations in Europe, Asia and Canada in Fig. 9 demonstrate that the annual CO cycle could to a large degree be reproduced correctly by the model with maximum values occurring during the winter period and minimum values appearing during the summer season. However, the model shows a negative offset during the winter period. Seasonal air mass transport patterns that lead to regular annual re-occurring CO variations could be reproduced for GAW stations in East Asia: the time series plots for Yonagunijima (YON) and Minamitorishima (MNM) station, Japan (Fig. 9), show that the drop of CO, associated with the air mass change from continental to cleaner marine air masses after the onset of the monsoon season during the early summer months, is captured by the MACC_osuite. Deterioration in all scores is visible during December 2010 in the time series plots of several stations (e.g. Jungfraujoch (JFJ), and Sonnblick (SBL), Fig. 9). This is likely a result of changes in the processing of the L2 IASI data and a temporary blacklisting of IASI data (to avoid model failure) in the assimilation.
The comparison with MOPITT satellite CO total columns between October 2009 and June 2012 (described in Sect. 2.2.3) shows a good qualitative agreement of spatial patterns and seasonality; see Table 6. The MNMBs for eight regions are shown in Fig. 13 and range between −22 and 14 %. The seasonality of the satellite observations is captured well by the MACC_osuite over Asia and Africa, with MNMBs between −6 and 9 % (North Africa), −12 and 8 % (southern Africa), −11 and 12 % (East Asia) and −3 and 14 % (South Asia). The largest negative MNMBs appear during the winter periods, especially from December 2010 to May 2011 and from September 2011 to April 2012, for Alaska and Siberia and for the US and Europe (MNMBs up to −22 %), which coincides with large differences between MOPITT and IASI satellite data (see Fig. 14). On the global scale the average difference between the IASI and MOPITT total columns is less than 10 % (George et al., 2009), and there is a close agreement of MOPITT and IASI for South Asia and Africa (see Fig. 14). However, larger differences between MOPITT and IASI data appear during the northern winter months over Alaska, Siberia, Europe and the US, which result in lower CO concentrations in the model, due to the assimilation of IASI CO data in the MACC_osuite. The differences between MOPITT and IASI data can be mainly explained by the use of different a priori assumptions in the IASI and MOPITT retrieval algorithms (George et al., 2015). The Fast Optimal Retrievals on Layers for IASI (FORLI; Hurtmans et al., 2012) software uses a single a priori CO profile (with an associated variance-covariance matrix), whereas the MOPITT retrieval algorithm uses a variable a priori, depending on time and location. George et al. (2015) showed that differences above Europe and the US in January and December (for a 6-year study) decrease by a factor of 2 when comparing IASI with a modified MOPITT product using the IASI single a priori. Between January and October 2011 there has also been a reading error in the fire emissions that contributes to larger MNMBs during this period (see Sect. "Emission inventories and assimilated data sets").

is too short. The model simulates larger NO2 VCD maxima over central Africa, which mainly originate from wild fires. It remains unclear if GFEDv2/GFAS fire emissions are too high here or if NO2 fire plumes closer to the ground cannot be seen by SCIAMACHY due to light scattering by biomass burning aerosols (Leitão et al., 2010). In the Northern Hemisphere, background values of NO2 VCD over the ocean are lower in the simulations than in the satellite data. The same is true for the South Atlantic Ocean to the west of Africa (see Fig. 15). This might suggest a model underestimation of NO2 export from continental sources towards the ocean or too rapid conversion of NO2 into its reservoirs. However, as the NO2 columns over the oceans are close to the uncertainties in the satellite data, care needs to be taken when interpreting these differences.

Table 7 summarises the statistical values derived over the whole time period. High anthropogenic emissions occur over the US, Europe, South Asia and East Asia compared to other regions on the globe (e.g. Richter et al., 2005). In principle, the MACC_osuite catches the pattern of satellite NO2 VCD over these regions. However, the model tends to underestimate NO2 VCDs throughout the whole time period investigated here. The negative bias is most pronounced over East Asia, with a modelled mean NO2 VCD for September 2009 to December 2012 about 3.8 × 10^15 molec cm^−2 lower than that derived from satellite measurements (see Table 7).

Validation of tropospheric nitrogen dioxide
Considering monthly values, the MACC_osuite strongly underestimates magnitude and seasonal variation of satellite NO2 VCD over East Asia (MNMBs between about −40 and −110 % and RMSE between 1 × 10^15 and 14 × 10^15 molec cm^−2 throughout the whole time period). A change in the modelled NO2 values is apparent in July 2012, when the emission inventories changed and the agreement with the satellite data improved for South and East Asia but deteriorated for the US and Europe. This results in a drop of MNMBs (Fig. 18) for Europe and the US, with values approaching −70 % by the end of 2012. Nevertheless, correlations between daily satellite and model data derived for the whole time period (see Table 7) are high for East Asia (0.8), South Asia (0.8) and Europe (0.8), and lower, but still rather high, for the US (0.6).
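To put MNMB values such as −40 and −110 % into perspective, the following worked conversion may help. It assumes the commonly used definition of the MNMB and, as a simplification that is not made in the text, a constant model-to-observation ratio r = f/o for all N samples; the symbols f, o and r are introduced here for illustration only.

```latex
\[
\mathrm{MNMB} = \frac{2}{N}\sum_{i=1}^{N}\frac{f_i - o_i}{f_i + o_i}
\quad\xrightarrow{\;f_i = r\,o_i\;}\quad
\mathrm{MNMB} = 2\,\frac{r-1}{r+1}
\quad\Longrightarrow\quad
r = \frac{2 + \mathrm{MNMB}}{2 - \mathrm{MNMB}}
\]
\[
\mathrm{MNMB} = -0.4:\; r = \frac{1.6}{2.4} \approx 0.67,
\qquad
\mathrm{MNMB} = -1.1:\; r = \frac{0.9}{3.1} \approx 0.29
\]
```

Under this simplification, an MNMB of −40 % corresponds to modelled columns of roughly two-thirds of the observed values, and −110 % to less than one-third. The MNMB is bounded by ±200 %, so values beyond −100 % are possible and indicate a very strong underestimation.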
The North African and southern African regions are strongly affected by biomass burning (Schreier et al., 2014). Magnitude and seasonality of daily and monthly tropospheric NO2 VCDs (Figs. 16 and 17, respectively) are rather well represented by the model, apart from January to October 2011, due to difficulties in reading fire emissions for this time period (see Sect. "Emission inventories and assimilated data sets"). The latter results in large absolute values of the MNMB (Fig. 18) and large RMSEs (Fig. 19) between January and October 2011 compared to the rest of the time period. As for other regions investigated in this section, mean values of simulated daily tropospheric NO2 VCDs over North Africa and southern Africa between September 2009 and December 2012 tend to be lower than the corresponding satellite mean values (see Table 7). The correlation between daily model and satellite data over the whole time period is about 0.6 for southern Africa and 0.5 for North Africa. Whether this difference in model performance for the African regions is due to meteorology, chemistry or emissions needs to be investigated, but this is outside the scope of this paper.
The evaluation of modelled NO 2 with GAW surface data for six European stations accordingly shows that NO 2 is gen-

Discussion
The validation of global O3 mixing ratios with GAW observations at the surface level showed that the MACC_osuite could generally reproduce the observed annual cycle of ozone mixing ratios. Model validation with surface data shows global average monthly MNMBs between −30 and 30 % (GAW) and for Europe between −50 and 60 % (EMEP). For stations located in the northern mid-latitudes, the evaluation reveals a seasonally dependent bias, with an underestimation of the observed O3 mixing ratios by the MACC_osuite during the winter season and an overestimation during the summer months. The validation of daytime versus night-time concentrations for northern and central Europe shows larger negative MNMBs in the winter months during night-time than daytime (Fig. 8), so that the negative 3). The overestimation of O3 mixing ratios during the summer months is a well-known issue and has been described by various model validation studies (e.g. Brunner et al., 2003; Schaap et al., 2008; Ordoñez et al., 2010; Val Martin et al., 2014). Inadequate ozone precursor concentrations and aerosol-induced radiative effects (photolysis) have been frequently identified as being the main factors. The time series plots in Fig. 3 6). There is close agreement of modelled CO total columns and satellite observations for Africa and South Asia throughout the evaluation period. However, there is a negative offset compared to the observational CO data over Europe and North America. The largest deviations occur during the winter season, when the observed CO concentrations are at their highest level. The evaluation with GAW surface CO data accordingly shows a wintertime negative bias of up to −35 % in magnitude at the surface for stations in Europe and the US. A general underestimation of CO from global models in the Northern Hemisphere has been described by various authors (e.g. Shindell et al., 2006; Naik et al., 2013). According to Stein et al. (2014), this underestimation likely results from a combination of errors in the dry deposition parameterisation and certain limitations in the current emission inventories. The latter include too low anthropogenic CO emissions from traffic or other combustion processes and missing anthropogenic emissions of volatile organic compounds (VOCs) in the inventories, together with an insufficiently established seasonality in the emissions. An additional reason for the apparent underestimation of emissions in MACCity may be an exaggerated downward trend in the RCP8.5 (Representative Concentration Pathways) scenario in North America and Europe between 2000 and 2010, as this scenario was used to extrapolate the MACCity emissions from their benchmark year, i.e. 2000. For CO, uncertainties in the evaluation also include differences in the retrieved CO total columns between IASI and MOPITT. These vary with region, with IASI showing lower CO concentrations in several regions (Alaska, Siberia, Europe and the US) during the northern winter months, which possibly contribute to the deviations observed between the modelled data and MOPITT satellite data, as only IASI data have been assimilated in the model. The differences can primarily be explained by the use of different a priori assumptions in the IASI and MOPITT retrieval algorithms (George et al., 2015). On a global scale, however, the average difference between the IASI and MOPITT total columns is less than 10 % (George et al., 2009).
Modelled NO2 tropospheric columns agree well with satellite observations over the US, South Asia and North Africa. However, there is also a negative offset for NO2 over East Asia and Europe. For the latter, these findings are supported by the evaluation with GAW surface data. Again, the largest deviations occur during the winter season. The quality of the emission inventory is even more crucial for short-lived reactive species such as NO2, where model results depend to a large extent on emission inventories incorporated in the simulations. This is highlighted by the deterioration of agreement between model results and satellite data for the US in July 2012, when anthropogenic emissions were changed from RETRO-REAS to MACCity. This change led to an increasing negative bias in NO2 over Europe and North America and to an improvement for South and East Asia (see Fig. 18). A deterioration in MNMBs associated with the fire emissions is visible between January and October 2011 over regions with heavy fire activity (Africa and East Asia), and goes back to a temporary error in the model regarding the reading of fire emissions (see Figs. 16 to 18). Particular challenges for an operational forecast system are regions with rapid changes in emissions such as China, where emission inventories need to be extrapolated to analysis times of the MACC_osuite to obtain reasonable trends. The latter is done as emission inventories usually refer to times prior to MACC_osuite analysis times. A large underestimation of NO2 in China, especially in winter, has been reported for other CTMs in previous publications (He et al., 2007; Itahashi et al., 2014). The latter has been linked to an underestimation of NOx and VOC emissions, unresolved seasonality in the emissions and expected non-linearity of NOx chemistry. The change in validation data sets from SCIAMACHY to GOME-2 has been shown to have negligible impact on the validation results and conclusions.

Conclusion
The MACC_osuite is the global near-real-time MACC model analysis run for aerosol and reactive gases. The model has been evaluated with surface observations and satellite data concerning its ability to simulate reactive gases in the troposphere. Results showed that the model proved capable of a realistic reproduction of the observed annual cycle for CO, NO2 and O3 mixing ratios at the surface, albeit with seasonally dependent biases. For ozone, these seasonal biases likely result from difficulties in the simulation of vertical mixing at night and deficiencies in the model's dry deposition parameterisation. For CO, a negative offset in the model during the winter season is attributed to limitations in the emission inventories together with an insufficiently established seasonality in the emissions.
The NO2 total columns derived from satellite sensors and surface NO2 observed by European GAW stations could be reproduced reasonably well over most of the evaluated regions, but showed a negative offset compared to the observational data, especially over Europe and East Asia. It has become clear that the emission inventories play a crucial role in the quality of model results and remain a challenge for near-real-time modelling, especially over regions with rapid changes in emissions. Inconsistencies in the assimilated satellite data and fire emissions showed only a temporary impact on the quality of model results. The implementation of a model update improved the results, especially at high latitudes (surface ozone) and over South and East Asia.
The MACC NRT forecast system is constantly evolving. A promising step in model development is the online integration of modules for atmospheric chemistry in the IFS, currently being tested for implementation in the MACC_osuite (Flemming et al., 2015). In contrast to the coupled model configuration as used in this paper, the online integration in the Composition IFS (C-IFS) provides major advantages; apart from an enhanced computational efficiency, C-IFS promises an optimisation of the implementation of feedback processes between gas-phase/aerosol chemical processes and atmospheric composition and meteorology, which is expected to improve the modelling results for reactive gases. Additionally, C-IFS will be available in combination with different CTMs (MOZART and TM5), which will help to explain whether deviations between model and observations go back to deficiencies in the chemistry scheme of a model.

Figure 2. Modified normalised mean biases (MNMBs) [%] derived from the evaluation of the MACC_osuite with GAW O3 surface observations during the period September 2009 to December 2012 globally (top), and for Europe (bottom). Blue colours represent large negative values and red/brown colours represent large positive values.

Figure 4. Modified normalised mean bias (MNMB) in % derived from the evaluation of the MACC_osuite with GAW O3 surface observations during the period September 2009 to December 2012 (black line: global average of 50 GAW stations; multi-coloured lines: individual station results; see legend to the right).

Figure 5. Root mean square error (RMSE) in ppb derived from the evaluation of the MACC_osuite with GAW O3 surface observations during the period September 2009 to December 2012 (black line: global average of 50 GAW stations; multi-coloured lines: individual station results; see legend to the right).

Figure 6. Correlation coefficient (R), derived from the evaluation of the MACC_osuite with GAW O3 surface observations during the period September 2009 to December 2012 (black line: global average of 50 GAW stations; multi-coloured lines: individual station results; see legend to the right).

Figure 7. Modified normalised mean biases (MNMBs in %) derived from the evaluation of the MACC_osuite with EMEP O3 surface observations in three different parts of Europe (blue: northern Europe, orange: central Europe, red: southern Europe) during the period September 2009 to December 2012.

Figure 8. Modified normalised mean biases (MNMBs in %) derived from the evaluation of the MACC_osuite with EMEP O3 surface observations during daytime (yellow colour) and night-time (blue colour) over northern Europe (a), central Europe (b) and southern Europe (c) during the period September 2009 to December 2012.

Figure 10. Modified normalised mean bias (MNMB) in % derived from the evaluation of the MACC_osuite with GAW CO surface observations over the period September 2009 to December 2012 (black line: global average of 29 GAW stations; multi-coloured lines: individual station results; see legend to the right).

Figure 11. Root mean square error (RMSE) in ppb derived from the evaluation of the MACC_osuite with GAW CO surface observations over the period September 2009 to December 2012 (black line: global average of 29 GAW stations; multi-coloured lines: individual station results; see legend to the right).

Figure 15 Figure 12 .
Figure 15 shows global maps of daily tropospheric NO2 VCD averaged from September 2009 to March 2012. Overall, the spatial distribution and magnitude of tropospheric NO2 observed by SCIAMACHY are well reproduced by the model. This indicates that emission patterns and NOx photochemistry are reasonably well represented by the model. However, the model underestimates tropospheric NO2 VCDs over industrial areas in Europe, East China, Russia and Southeast Africa compared to satellite data. This could imply that anthropogenic emissions from RETRO-REAS are too low in these regions, or that the lifetime in the model

Figure 13. Monthly average of modified normalised mean biases (MNMBs in %) derived from the comparison of the MACC_osuite with MOPITT CO total columns for eight different regions during the period September 2009 to June 2012 (see legend on the right).

Figure 14. Time series plots of MOPITT CO total columns (black line) compared to IASI CO total columns (black dashed line) and the MACC_osuite CO total columns (red line) for eight different regions (defined in Fig. 1) during the period September 2009 to June 2012. Top: Siberia (left), Alaska (right); second row: United States (left), Europe (right); third row: South Asia (left), East Asia (right); bottom: southern Africa (left), North Africa (right).

Figure 15. Long-term average of daily tropospheric NO2 VCD [10^15 molec cm^−2] from September 2009 to March 2012 for (left) MACC_osuite simulations and (right) SCIAMACHY satellite observations. Blue colours represent relatively low values; red/brown colours represent relatively high values.
Fig. 20. As is observed for the satellite VCDs, NO2 surface concentrations decrease in the model with the introduction of the updated model version and emission inventories. For stations located in complex terrain (e.g. Rigi, Fig. 20), results improve after the model update, likely also due to the higher model resolution. Monthly values of MNMB, RMSE and correlation coefficient are shown in Figs. 21 to 23.
, however, demonstrate that the minimum concentrations in particular are not captured by the model during summer. Possible explanations include a general underestimation of NO titration, which especially applies to stations with urban surroundings and strong sub-grid-scale emissions (e.g. TSU, Fig. 3), including the global model's difficulty in resolving NO titration in urban plumes. It also seems likely that dry deposition at wet surfaces, in combination with the large surface sink gradient due to nocturnal stability, cannot be resolved with the model's relatively coarse vertical resolution. In regions such as central and southern Europe (Fig. 8), where daytime biases exceed night-time biases, the overestimation of O3 might be related to an underestimation of daytime dry deposition velocities. Val Martin et al. (2014) described a reduction of the summertime model bias for surface ozone after the implementation of adjustments in stomatal resistances in the MOZART model's dry deposition parameterisation. The MACC_osuite model realistically reproduces CO total columns over most of the evaluated regions, with monthly MNMBs falling between 10 and −20 % (Table

Figure 18. Modified normalised mean bias [%] for monthly means of daily tropospheric NO2 VCD averaged over different regions (see Fig. 1 for latitudinal and longitudinal boundaries) derived from the MACC_osuite simulations and satellite observations (SCIAMACHY up to March 2012, GOME-2 from April 2012 to December 2012). Top: United States (left), Europe (right); second row: South Asia (left), East Asia (right); bottom: southern Africa (left), North Africa (right). Values have been calculated separately for each month.

Figure 21. Modified normalised mean bias (MNMB) in % derived from the evaluation of the MACC_osuite with GAW NO2 surface observations over the period September 2009 to December 2012 (black line: global average of six GAW stations; multi-coloured lines: individual station results; see legend to the right).

Figure 23. Correlation coefficient (R), derived from the evaluation of the MACC_osuite with GAW NO2 surface observations over the period September 2009 to December 2012 (black line: global average of six GAW stations; multi-coloured lines: individual station results; see legend to the right).

Table 1. List of assimilated data in the MACC_osuite.

Table 2. Description of the set-up of the MACC_osuite between September 2009 and December 2012. Details on the assimilated data are provided in Table 1. A description of the emissions is given in Sect. "Emission inventories and assimilated data sets" in the text.

Table 3. List of GAW and EMEP stations used in the evaluation (GAW listed by label, EMEP listed by region: northern Europe, NE; central Europe, CE; and southern Europe, SE). The letters by the station name indicate the type of gas: a = O3, b = CO, c = NO2. Positive latitude values refer to the Northern Hemisphere, negative latitude values to the Southern Hemisphere.
Table columns: Station, Label/region, Programme, Lat [°], Long [°], Alt [m a.s.l.].

global coverage within 3 days. The data used in this study correspond to CO total columns from version 5 (V5) of the MOPITT thermal infrared (TIR) level 3 product. This product is available via the following web server: http://www2.acd.ucar.edu/mopitt/products. Validation of the MOPITT V5 product against in situ CO observations shows a mean bias of 0.06 × 10^18 molecules cm^−2

cloud fractions of less than 20 % are used. Tropospheric NO2 vertical column density (VCD) from the MACC_osuite is compared to tropospheric NO2 VCD from GOME-2 and SCIAMACHY. As the European Space Agency lost contact with Envisat in April 2012, GOME-2 data are used for model validation from 1 April 2012 onwards, while SCIAMACHY data are used for the remaining time period (September 2009 to March 2012). Satellite observations are gridded to the horizontal model resolution, i.e. 1.875° for IFS cycle CY36R1 (September 2009-June 2012) and 1.125° for cycle CY37R3 (July-December 2012).
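The gridding step mentioned above can be illustrated with a short sketch. It assumes simple box averaging of individual satellite pixels onto a regular latitude-longitude grid; the actual gridding procedure and grid definition used for the MACC_osuite comparison are not described here, and the function name `grid_mean` and the sample values are hypothetical.

```python
def grid_mean(lats, lons, values, res_deg):
    """Average scattered observations onto a regular lat/lon grid.

    Assumption: a regular grid with cells of res_deg x res_deg degrees,
    rows starting at 90 S and columns starting at 0 E. Cells without any
    observation are returned as None.
    """
    nlat = round(180.0 / res_deg)
    nlon = round(360.0 / res_deg)
    total = [[0.0] * nlon for _ in range(nlat)]
    count = [[0] * nlon for _ in range(nlat)]
    for lat, lon, val in zip(lats, lons, values):
        # Row and column index of the cell containing this observation;
        # min() keeps lat = 90 and rounding edge cases inside the grid.
        i = min(int((lat + 90.0) / res_deg), nlat - 1)
        j = min(int((lon % 360.0) / res_deg), nlon - 1)
        total[i][j] += val
        count[i][j] += 1
    return [
        [total[i][j] / count[i][j] if count[i][j] else None for j in range(nlon)]
        for i in range(nlat)
    ]


# Hypothetical NO2 columns in 10^15 molec cm^-2 (illustrative only).
grid = grid_mean(
    lats=[0.5, 0.6, -45.0],
    lons=[10.0, 10.2, 200.0],
    values=[2.0, 4.0, 7.0],
    res_deg=1.875,
)
print(len(grid), len(grid[0]))  # grid dimensions at 1.875 degrees
```

Averaging the observations to the model grid before computing statistics ensures that model and satellite values represent comparable spatial scales; a finer grid (1.125°) leaves more cells without observations on a given day.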

Table 4. Modified normalised mean bias (MNMB) [%], correlation coefficient (R), and root mean square error (RMSE) [ppb] derived from the evaluation of the MACC_osuite with Global Atmosphere Watch (GAW) O3 surface observations during the period September 2009 to December 2012. The conventional station names are listed in Table 3.

Table 5. Modified normalised mean bias (MNMB) [%], correlation coefficient (R), and root mean square error (RMSE) [ppb] derived from the evaluation of the MACC_osuite with Global Atmosphere Watch (GAW) CO surface observations during the period September 2009 to December 2012. The conventional station names are listed in Table 3.

Table 6. Modified normalised mean bias (MNMB) [%] derived from CO satellite observations (MOPITT) and the MACC_osuite simulations of CO total columns from October 2009 until June 2012 averaged over different regions.

Table 7. Statistics derived from satellite observations (SCIAMACHY from September 2009 until March 2012, GOME-2 from April 2012 to December 2012) and the MACC_osuite simulations of daily tropospheric NO2 VCD [10^15 molec cm^−2] averaged over different regions for September 2009 to December 2012.

Table 8. Modified normalised mean bias (MNMB) [%], correlation coefficient (R), and root mean square error (RMSE) [ppb] derived from the evaluation of the MACC_osuite with Global Atmosphere Watch (GAW) NO2 surface observations during the period September 2009 to December 2012. The conventional station names are listed in Table 3.

Table 8). The annual cycle of NO2, with maximum concentrations during the winter period, is in principle captured by the model, as shown in the time series plots in