Multi-species inversion and IAGOS airborne data for a better constraint of continental scale fluxes

The present paper presents a synthetic experiment aiming to evaluate the effects of exploiting correlations between different trace gases in an atmospheric inversion. We quantitatively described the capability of the modeling framework to reproduce observations, the performance of the inversion scheme in reducing the uncertainty of the different trace gases, and the benefits of multi-species inversions compared to corresponding single-species inversions. We also describe a method to rescale different prior uncertainty covariance matrices so that the corresponding posterior uncertainties are actually comparable.

Abstract. Airborne measurements of CO2, CO, and CH4 proposed in the context of IAGOS (In-service Aircraft for a Global Observing System) will provide profiles from take-off and landing of airliners in the vicinity of major metropolitan areas, useful for constraining sources and sinks. A proposed improvement of the top-down method to constrain sources and sinks is the use of a multi-species inversion. Different species such as CO2 and CO have partially overlapping emission patterns for given fuel-combustion-related sectors, and thus share part of the uncertainties, both related to the a priori knowledge of emissions and to model-data mismatch error. We use a regional modeling framework consisting of the Lagrangian particle dispersion model STILT (Stochastic Time-Inverted Lagrangian Transport), combined with the high-resolution (10 km x 10 km) EDGARv4.3 (Emission Database for Global Atmospheric Research) emission inventory, differentiated by emission sector and fuel type for CO2, CO, and CH4, and combined with the VPRM (Vegetation Photosynthesis and Respiration Model) for biospheric fluxes of CO2. Applying the modeling framework to synthetic IAGOS profile observations, we evaluate the benefits of using correlations between different species' uncertainties on the performance of the atmospheric inversion. The available IAGOS CO observations are used to validate the modeling framework. Prior uncertainty values are conservatively assumed to be 20% for CO2 and 50% for CO and CH4, while those for GEE (Gross Ecosystem Exchange) and respiration are derived from existing literature. Uncertainty reduction for different species is evaluated on a domain encircling 50% of the profile observations' surface influence over Europe. We found that our modeling framework reproduces the CO observations with an average correlation of 0.56, but simulates lower mixing ratios by a factor of 2.8, reflecting a low bias in the emission inventory. Mean uncertainty reduction achieved for CO2 fossil fuel emissions is roughly 37%; for photosynthesis and respiration flux it is 41% and 45%, respectively. For CO and CH4 the uncertainty reduction is roughly 63% and 67%, respectively. Considering correlation between different species, posterior uncertainty can be reduced by up to 23%; such reduction depends on the assumed error structure of the prior and on the considered timeframe. The study suggests a significant uncertainty constraint on regional emissions using multi-species inversions of IAGOS in situ observations.

Introduction
Climate predictions are currently hampered by excessive uncertainties. A symptom of this is that intercomparisons of different models show important differences in their predictions, as shown in Friedlingstein (2006). This makes it difficult to assess the best environmental policies to implement. As is widely recognized at an international level, there is a need for a reduction in anthropogenic emissions (IPCC, 2014). This, however, implies the necessity of emissions monitoring to verify whether emission-reduction policies are successful. Because most biogenic fluxes in Europe are influenced by human activities, understanding and managing these biogenic fluxes must also be a component of any policy to reduce anthropogenic emissions.
An important tool to estimate carbon budgets by teasing apart sources and sinks in a given spatial domain is the atmospheric Bayesian inversion. Atmospheric inversions combine prior knowledge from emission inventories with atmospheric observations acting as a top-down constraint to produce better posterior knowledge. An important metric of the effectiveness of an atmospheric inversion is the uncertainty reduction, defined as the difference between prior and posterior uncertainty normalized by the prior uncertainty. The vast majority of published papers on atmospheric inversions investigate the budget of a single species, usually a long-lived greenhouse gas like CO2 (e.g. Rödenbeck, 2003) or CH4 (e.g. Hein, 1997; Bousquet, 2006), but the technique can also be applied to active species like CO (Bergamaschi, 2000). Note that carbon dioxide is a special case, as atmospheric CO2 mixing ratios result from a combination of strong anthropogenic sources with strong sources and sinks from biospheric processes, calling for a separation of anthropogenic from biospheric fluxes.
One way to achieve such a separation is to measure CO alongside CO2, and use CO as a proxy for CO2 anthropogenic emissions. Palmer (2006) used CO2-CO correlations to improve an inversion of data from the TRACE-P aircraft mission in March-April 2001, while Wang (2009) employed a similar method using satellite data, obtaining a reduction in CO2 flux inversion error.
So far the lion's share of the studies investigating atmospheric inversions make use of both continuous in situ and flask measurements from ground-based observational networks of tall towers (e.g. Kadygrov, 2015; Sasakawa, 2010). However, as profiles collected from an aircraft easily exceed the height of towers, airborne data may also prove an interesting option for this application. This alternative was tested in some recent studies that made use of aircraft profiles alone or in combination with other data sources (e.g. Brioude, 2013; Gourdji, 2013). Methods to maximise the cost-effectiveness of airborne data are the use of unmanned aircraft (drones) and commercial airliners. The latter, in particular, allows for collecting data on a regular basis without requiring a particularly small or light sensor. The most important projects making use of commercial airliners are CONTRAIL (Comprehensive Observation Network for Trace Gases) (Machida, 2008), and MOZAIC/IAGOS (Measurements of Ozone and water vapor by in-service AIrbus aircraft / In-service Aircraft for a Global Observing System) (Marenco, 1998; Petzold, 2015). Both projects have been running for more than two decades and have produced extensive datasets that have proven to be important in the fields of atmospheric modeling and satellite calibration and validation. Regarding carbonaceous species, CONTRAIL has so far been collecting CO2 mixing ratio measurements, while IAGOS was focused on CO. In the coming years the IAGOS fleet will simultaneously provide CO, CO2 and CH4 atmospheric concentration measurements (Filges, 2015), enabling the use of multi-species synergy in modeling applications. This synergy follows from the fact that the collocated measurements share the same atmospheric transport and have partially correlated emission uncertainties.
This paper is focused on investigating the benefits on uncertainty reduction of such a multi-species inversion in comparison with a single-species inversion. To attain this goal, we set up a synthetic experiment utilizing the measurement times and locations collected from the IAGOS projects in the year 2011. The present paper is intended to pave the way for future studies making use of multi-species IAGOS datasets when they become available. A receptor-oriented framework was set up to derive flux interactions between the atmosphere and the biosphere using IAGOS data. The modeling framework is composed of a Lagrangian Particle Dispersion Model (LPDM, specifically the STILT model), a diagnostic biosphere-atmosphere exchange model (the VPRM model), gridded emission inventories, global tracer transport model output that provides the tracer boundary conditions for the regional domain, and a Bayesian inversion scheme. The present work is based on the modeling framework used in Boschetti (2015) and builds upon that by adding other species and using a formal Bayesian inversion. A multi-species inversion was carried out in order to exploit the correlations in uncertainties between CO2, CO, and CH4, specifically in their respective uncertainties in a priori anthropogenic emissions and in model representation error. The aim of this multi-species inversion is to provide better estimates of anthropogenic emissions, and, in the case of CO2, to better separate the biospheric from anthropogenic contributions. This paper is structured as follows: a short description of the different components of the modeling framework is given in Sect. 2; in Sect. 3 we present and discuss our results; Sect. 4 gives the conclusions.
Atmos. Chem. Phys. Discuss., doi:10.5194/acp-2017-69, 2017. Manuscript under review for journal Atmos. Chem. Phys. Discussion started: 1 March 2017. © Author(s) 2017. CC-BY 3.0 License.

Modeling framework
Before describing the different models composing the modeling framework, we introduce some specific terminology to reduce ambiguity in Sect. 2.1.1-2.1.6. Quantities that can be observed are termed species, or trace gases, corresponding in this case to total CO2, CO and CH4. These three species are simulated using five modeled species, namely CO2 from fossil fuels, CO2 related to GEE (Gross Ecosystem Exchange) and to respiration, CO, and CH4. Modeled species related to anthropogenic emissions are modeled as the sum of contributions from different emission sectors (Table 1) and fuel types (Table 2); as a further factor of discrimination, both anthropogenic and biospheric contributions are split into monthly contributions. Simulated fluxes specific to different modeled species, emission sectors, fuel types and months of the year are called flux categories. In this Material and Methods section, a brief description of the different models that make up the modeling framework is given. For more detailed information, see Boschetti (2015).

Vertical profile input data
In this study the modeled profiles have the same structure as those collected by the IAGOS fleet of commercial airliners. More precisely, the spatial and temporal coordinates of the observations are used as input for the modeling framework, whereas the observed atmospheric mixing ratios of CO and the meteorological parameters themselves play a role in calibrating the modeling framework.
Central to this work is the concept of the Mixed Layer (ML), the lower part of the troposphere in which trace gases are well mixed due to turbulent convection on a time scale of an hour or less, and in which the effect of regional surface-atmosphere fluxes is the strongest. As input to the inversion we use the enhancement of the species' mixing ratio within the mixed layer relative to that in the free troposphere (FT), similar to the approach described in Boschetti (2015). This mixed layer enhancement best reflects the influence of regional fluxes. To compute this, we take the average mixing ratio within the mixed layer and subtract the value taken at 2 km above the mixed layer top ($z_i$), i.e. well within the free troposphere. The mixed layer height $z_i$ is a very important parameter in atmospheric modeling, and accounts for most of the transport uncertainty in the vertical domain. In fact, since trace gases are well mixed within the mixed layer, the mixing ratio resulting from a given amount of trace gas in the ML depends strongly on its depth $z_i$. More precisely, even if the model has correctly reproduced the amount of trace gas in the real mixed layer, if the modeled $z_i$ is lower (higher) than the actual one, then the simulated ML mixing ratio will be higher (lower) than it should be. In the present study, modeled $z_i$ values are corrected according to Boschetti (2015, Sect. 2.2.1).
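The mixed layer enhancement described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `interp` and `ml_enhancement`, the use of linear interpolation to obtain the value at 2 km above $z_i$, and the sample profile are all assumptions.

```python
def interp(z, zs, xs):
    """Linearly interpolate xs(zs) at height z; zs must be increasing."""
    if z <= zs[0]:
        return xs[0]
    for (z0, x0), (z1, x1) in zip(zip(zs, xs), zip(zs[1:], xs[1:])):
        if z0 <= z <= z1:
            return x0 + (x1 - x0) * (z - z0) / (z1 - z0)
    return xs[-1]  # above the highest sample: hold the last value


def ml_enhancement(heights_m, mixing_ratios, zi_m, ft_offset_m=2000.0):
    """Mean mixing ratio below zi minus the value at zi + ft_offset_m."""
    in_ml = [x for z, x in zip(heights_m, mixing_ratios) if z <= zi_m]
    if not in_ml:
        raise ValueError("no samples inside the mixed layer")
    ml_mean = sum(in_ml) / len(in_ml)
    ft_value = interp(zi_m + ft_offset_m, heights_m, mixing_ratios)
    return ml_mean - ft_value


# Hypothetical CO profile (ppb) with a mixed layer top at 1000 m.
heights = [100.0, 500.0, 900.0, 1500.0, 3000.0, 4000.0]
co_ppb = [160.0, 150.0, 140.0, 110.0, 100.0, 100.0]
enhancement = ml_enhancement(heights, co_ppb, zi_m=1000.0)
```

In this toy profile the three samples below 1000 m average 150 ppb and the value at 3000 m is 100 ppb, so the enhancement is 50 ppb. An error in $z_i$ changes which samples enter the average, which is why the correction of modeled $z_i$ matters.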

Transport-flux coupling
The modeling framework is composed of a regional transport model (STILT), the EDGAR (Emission Database for Global Atmospheric Research) emission inventory to model anthropogenic emissions, VPRM (Vegetation Photosynthesis and Respiration Model) to model emissions from the biosphere, and output from global transport models for lateral boundary conditions for the different modeled species. The expressions 'anthropogenic emissions' and 'fossil fuel emissions' are considered synonymous in this paper and are used to indicate the sum of fossil fuel and biofuel emissions, without including contributions from LULUCF (Land Use, Land-Use Change and Forestry).
For regional transport we make use of the LPDM STILT (Stochastic Time-Inverted Lagrangian Transport) to derive the sensitivity of the atmospheric mixing ratio measurement to upstream surface-atmosphere fluxes, the so-called "footprints". For each measurement location and time (also called receptor point), a single footprint is derived; this is then matrix-multiplied with an emission map from an emission inventory, resulting in a simulated mixing ratio corresponding to the regional contribution at the measurement location. A detailed description of STILT is given in . We use STILT coupled with emission models for both anthropogenic (EDGAR) and biosphere (VPRM) fluxes on a regional domain that covers most of Europe (33° to 72° N, -15° to 35° E) with a spatial resolution of 1/8 degree for latitude and 1/12 degree for longitude, roughly corresponding to 10 km. As lateral boundary condition for CO mixing ratios the MACC reanalysis (Inness, 2013, downloaded from http://www.ecmwf.int) was used, whereas for CO2 and CH4 we use output from the Jena CarboScope (Rödenbeck, 2003; CO2 data available from www.bgc-jena.mpg.de/CarboScope/), which is based on forward simulations of global-inversion optimized fluxes with the TM3 transport model (Heimann and Körner, 2003). TM3 fields have lower resolution, but they are chosen for their consistency with measurements from the ground-based network. In addition, spatial resolution is of relatively minor importance for the contribution from the lateral boundary, as it is far away from the measurement locations.
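The footprint-emission coupling described above amounts to a weighted sum of surface fluxes. The sketch below illustrates this on a toy grid; the function name `regional_signal`, the array layout (time, y, x) and the units are assumptions for illustration and do not reflect the STILT code itself.

```python
import numpy as np


def regional_signal(footprint, flux):
    """Sum footprint * flux over all grid cells and time steps."""
    footprint = np.asarray(footprint, dtype=float)
    flux = np.asarray(flux, dtype=float)
    if footprint.shape != flux.shape:
        raise ValueError("footprint and flux must share one grid")
    return float(np.sum(footprint * flux))


# Toy 2 x 2 grid with 3 hourly steps. Hypothetical units:
# footprint in ppm per (umol m-2 s-1), flux in umol m-2 s-1.
footprint = np.full((3, 2, 2), 0.01)
flux = np.zeros((3, 2, 2))
flux[:, 0, 0] = 5.0  # a single emitting cell, constant in time
signal_ppm = regional_signal(footprint, flux)
```

Only cells that both emit and are "seen" by the receptor contribute, so here the signal is 3 x 0.01 x 5 = 0.15 ppm.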
For fossil fuel emissions we use a model based on the EDGAR emission inventory, modified following the same approach taken for COFFEE (CO2 release and Oxygen uptake from Fossil Fuel Emission Estimate) (Steinbach, 2011; Vardag, 2015). More precisely, to obtain hourly resolved emissions from the original EDGAR annual fluxes for different emission categories we add specific temporal activity factors (Denier van der Gon, 2011) to account for differences in emissions due to seasonal, weekly and daily cycles. In addition, the different emission categories are further split into contributions from different fuel types using British Petroleum's Statistical Review of World Energy 2014 (BP, 2014). The World Energy Outlook from IEA was not chosen as an alternative source of information, as the report from BP is available earlier (April vs. November of the following year). This allows for taking into account changes in emissions between different years. Such an emission model provides hourly resolved fluxes for each fossil fuel flux category with a spatial resolution of roughly 10 km on our regional European domain.
To also take into account the contribution from the biosphere we use the Vegetation Photosynthesis and Respiration Model (VPRM). VPRM simulates realistic patterns at small spatial (10 km x 10 km) and temporal (hourly) scales and is used here to provide the a priori fluxes for biosphere-atmosphere exchange of CO2. This model is described in detail in Mahadevan (2008).
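The temporal disaggregation of annual inventory fluxes described above can be sketched as a product of activity factors. The multiplicative form, the function name `hourly_flux` and all factor values below are assumptions for illustration; the actual factors come from Denier van der Gon (2011).

```python
def hourly_flux(annual_mean_flux, month_factor, weekday_factor, hour_factor):
    """Scale an annual-mean flux by seasonal, weekly and diurnal factors."""
    return annual_mean_flux * month_factor * weekday_factor * hour_factor


# Hypothetical diurnal profile: low at night, high during the day.
# The factors average to 1 so the daily mean is preserved.
hour_factors = [0.6] * 6 + [1.4] * 12 + [0.6] * 6

annual_mean = 2.0      # hypothetical flux, arbitrary units
month_factor = 1.1     # hypothetical winter enhancement
weekday_factor = 0.9   # hypothetical weekend reduction

day = [hourly_flux(annual_mean, month_factor, weekday_factor, f)
       for f in hour_factors]
daily_mean = sum(day) / len(day)
```

Because each set of factors is normalized to a mean of one over its cycle, the annual total of the inventory is conserved while the hourly values vary.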

STILT transport is driven by meteorological fields from the ECMWF IFS (12 hour forecasts twice daily at 3-hourly temporal resolution), which have a spatial resolution of 0.25 degree with 61 vertical levels. In the following, we will refer to the STILT/EDGAR/VPRM/MACC/TM3 combination of transport, simulated fluxes and advected boundary conditions as merely 'STILT' for simplicity.

Bayesian inversion
Atmospheric inversions provide an estimate of the distribution of sources and sinks over the domain's surface from available concentration measurements ("top-down" approach). This can be formalized in the following linear relation:

$$\mathbf{y} = \mathbf{K}\boldsymbol{\lambda} + \boldsymbol{\varepsilon} \tag{1}$$

where the vector $\mathbf{y}$ contains the $n$ observations, and $\mathbf{K}$ is the Jacobian matrix that relates the observations to the state vector $\boldsymbol{\lambda}$. In the present study the focus is on surface-atmosphere gas exchanges due to both biospheric processes and anthropogenic emissions. The observations are therefore trace gas mixing ratios at different times and locations, $\mathbf{K}$ is the product of a transport operator $\mathbf{H}$, which maps flux sensitivities at different times and locations, with a set of gridded fluxes $\mathbf{F}$ for the categories of interest, while the state vector $\boldsymbol{\lambda}$ contains the $m$ scaling factors for the flux categories of interest. $\mathbf{H}$ has $n$ rows and a number of columns equal to $h = N_x N_y N_t N_s$, these being respectively the number of pixels in the emission model along the x and y axes, the number of (hourly) simulations in the whole year of interest and the number of state vector elements, resulting in a huge matrix. As the matrix $\mathbf{F}$ describes the different simulated gridded fluxes, it is comparably large and has $h$ rows and $m$ columns. By considering $\mathbf{K}$ as the result of the product of these two large matrices, it is possible to limit its dimensions to only $n$ rows and $m$ columns; this simplifies the critical task of relating observations to simulated fluxes of the categories of interest. The state vector accounts for specific emission sectors (Table 1) and fuel types (Table 2) for each one of the three modeled species from the EDGAR emission model, plus gross fluxes (gross ecosystem exchange and respiration). Rather than storing $\mathbf{H}$ and $\mathbf{F}$ as separate matrices, their product is directly computed within the STILT code. The random error $\boldsymbol{\varepsilon}$ accounts for measurement error related to uncertainty in the observation and for model-data mismatch resulting from model uncertainty.
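The dimensional argument above (a large transport operator H times a large flux matrix F collapsing into a small Jacobian K) can be illustrated on toy matrices. The sizes and random values below are placeholders, and forming H and F explicitly is done only for illustration; the text states that the product is computed directly within the STILT code.

```python
import numpy as np

rng = np.random.default_rng(0)

n_obs, n_cells, n_cat = 4, 6, 2   # toy sizes standing in for n, h and m

H = rng.random((n_obs, n_cells))  # footprint sensitivities, n x h
F = rng.random((n_cells, n_cat))  # prior gridded fluxes per category, h x m

K = H @ F                         # Jacobian, only n x m

lam_prior = np.ones(n_cat)        # prior scaling factors are all one
y_prior = K @ lam_prior           # prior simulated signal at the receptors
```

With all scaling factors equal to one, `K @ lam_prior` reproduces the signal obtained by multiplying the footprints with the total prior flux, which is a convenient consistency check.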
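Bayesian inversion combines observations (IAGOS profiles) with a priori information (scaling factors and their a priori uncertainties) to reconstruct the most probable state vector. Optimum posterior estimates of the scaling factors are obtained by minimizing the following cost function $J$ (Rodgers, 2000):

$$J(\boldsymbol{\lambda}) = (\mathbf{y} - \mathbf{K}\boldsymbol{\lambda})^\mathrm{T}\, \mathbf{S}_\varepsilon^{-1}\, (\mathbf{y} - \mathbf{K}\boldsymbol{\lambda}) + (\boldsymbol{\lambda} - \boldsymbol{\lambda}_\mathrm{prior})^\mathrm{T}\, \mathbf{S}_\mathrm{prior}^{-1}\, (\boldsymbol{\lambda} - \boldsymbol{\lambda}_\mathrm{prior}) \tag{2}$$

Here the first and the second term are the observational constraint and the prior constraint term, respectively. The prior scaling factors for the fluxes of the different tracers are set equal to one. $\mathbf{S}_\varepsilon$ is the error covariance matrix for the mismatch between simulated and observed mole fractions (model-data mismatch) and accounts for instrumental uncertainty, uncertainty related to the transport model, and other sources of uncertainty like boundary conditions and flux aggregation not accounted for through the state vector adjustment. $\mathbf{S}_\mathrm{prior}$ is the error covariance matrix for the prior scaling factors; its implementation requires a different approach for biospheric and anthropogenic fluxes. The detailed error structure for model-data mismatch and prior uncertainty is described in Sect. 2.1.4. Minimizing the cost function results in an optimal posterior estimate of the state vector $\boldsymbol{\lambda}$ that is consistent with both the measurements and the prior model estimates:

$$\boldsymbol{\lambda}_\mathrm{post} = \boldsymbol{\lambda}_\mathrm{prior} + \left(\mathbf{K}^\mathrm{T} \mathbf{S}_\varepsilon^{-1} \mathbf{K} + \mathbf{S}_\mathrm{prior}^{-1}\right)^{-1} \mathbf{K}^\mathrm{T} \mathbf{S}_\varepsilon^{-1} \left(\mathbf{y} - \mathbf{K}\boldsymbol{\lambda}_\mathrm{prior}\right) \tag{3}$$

The error covariance matrix of the optimal posterior state (the posterior uncertainty) is given by:

$$\mathbf{S}_\mathrm{post} = \left(\mathbf{K}^\mathrm{T} \mathbf{S}_\varepsilon^{-1} \mathbf{K} + \mathbf{S}_\mathrm{prior}^{-1}\right)^{-1} \tag{4}$$

Note that this quantity depends on neither the prior fluxes nor the measured mixing ratios, but only on their respective uncertainties and on the transport matrix $\mathbf{K}$.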
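A minimal numerical sketch of this inversion step is given below, using the standard linear Gaussian solution from Rodgers (2000). The function name `bayesian_update` and the toy matrices are assumptions; explicit matrix inverses are used for readability, whereas a production code would more likely solve the linear systems.

```python
import numpy as np


def bayesian_update(K, y, lam_prior, S_eps, S_prior):
    """Return posterior state and covariance for y = K lam + eps."""
    S_eps_inv = np.linalg.inv(S_eps)
    S_prior_inv = np.linalg.inv(S_prior)
    # Posterior covariance: uses only K and the two error covariances.
    S_post = np.linalg.inv(K.T @ S_eps_inv @ K + S_prior_inv)
    # Posterior state: prior plus gain times the prior residual.
    lam_post = lam_prior + S_post @ K.T @ S_eps_inv @ (y - K @ lam_prior)
    return lam_post, S_post


# Toy problem: 3 pseudo-observations, 2 scaling factors.
K = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
lam_prior = np.ones(2)
S_eps = np.eye(3)
S_prior = 0.25 * np.eye(2)          # 50% prior uncertainty on each factor
y = K @ np.array([1.2, 0.8])        # synthetic "truth" without noise

lam_post, S_post = bayesian_update(K, y, lam_prior, S_eps, S_prior)
_, S_post_other = bayesian_update(K, 2.0 * y, lam_prior, S_eps, S_prior)
```

Running the update with different observation values (`2.0 * y`) leaves the posterior covariance unchanged, which is the property that makes synthetic experiments on uncertainty reduction possible without real measurements.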
The targeted quantities of this study are the emissions over a specific area at a specific time scale (e.g. month); those quantities can be derived from the prior and posterior state through a spatiotemporal aggregation operator $\mathbf{A}$ that allows for the conversion of scaling factors into physically representative quantities. As a spatial aggregation scale we chose a domain encircling the 50% influence in the cumulated footprint for the receptor points in the ML for the year 2011 (Fig. 1). The prior and posterior uncertainty of these targeted quantities ($\sigma_\mathrm{prior}$ and $\sigma_\mathrm{post}$) is obtained by applying the aggregation operator to the respective uncertainty covariances:

$$\sigma_\mathrm{prior}^2 = \mathbf{A}\, \mathbf{S}_\mathrm{prior}\, \mathbf{A}^\mathrm{T}, \qquad \sigma_\mathrm{post}^2 = \mathbf{A}\, \mathbf{S}_\mathrm{post}\, \mathbf{A}^\mathrm{T} \tag{5}$$

Different versions of the aggregation operator were created for this: emission categories are aggregated according to different fuel types (coal, oil, gas, bio, waste, other) and according to emission sectors (energy, transport, industry, buildings, agriculture, waste, fossil fuel fires).
To quantitatively assess the information provided by the inversion, the reduction of uncertainty in the posterior compared to the prior estimate is a useful measure. The uncertainty reduction UR is defined as:

$$\mathrm{UR} = 1 - \frac{\sigma_\mathrm{post}}{\sigma_\mathrm{prior}} \tag{6}$$

The uncertainty reduction ranges from 0 (posterior as large as the prior uncertainty) to 1 (posterior negligible compared to the prior uncertainty).
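The aggregation and uncertainty reduction steps can be sketched as below, assuming the aggregated variance is the quadratic form of the aggregation operator with the covariance matrix and that UR is the prior-normalized difference between prior and posterior uncertainty, as stated in the Introduction. Function names and numbers are illustrative.

```python
import numpy as np


def aggregated_sigma(A, S):
    """Standard deviation of the aggregated quantity A @ lam."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    return float(np.sqrt((A @ S @ A.T)[0, 0]))


def uncertainty_reduction(A, S_prior, S_post):
    """1 - sigma_post / sigma_prior for the aggregated quantity."""
    return 1.0 - aggregated_sigma(A, S_post) / aggregated_sigma(A, S_prior)


# Hypothetical two-category example with unit aggregation weights.
A = np.array([1.0, 1.0])
S_prior = np.diag([4.0, 9.0])
S_post = np.diag([1.0, 2.25])
ur = uncertainty_reduction(A, S_prior, S_post)
```

Here every posterior variance is a quarter of its prior value, so the aggregated standard deviation halves and UR is 0.5.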

Prior error structure
As in this study a multi-species inversion with CO, CO2 and CH4 is envisioned, we have the chance to exploit the correlations in the uncertainties of the different trace gases related to both a priori fluxes and model-data mismatch. This is particularly true for CO and CO2 because they share a larger part of the emission sources, which implies correlations in the respective uncertainties. In the multi-species inversion, such information is stored in the areas of the error covariance matrices that describe covariance between different modeled species (off-diagonal 'blocks' in Fig. 2b for $\mathbf{S}_\mathrm{prior}$ and Fig. 3b for $\mathbf{S}_\varepsilon$). In the single-species inversions, said covariance is set to zero, corresponding to a situation where the different species are completely independent of one another. Conversely, the measurement uncertainty is stored in the main diagonal of $\mathbf{S}_\varepsilon$ (Fig. 3d).
We used a single-year (2011) dataset restricted to the vertical profiles centered at the Frankfurt airport, and restricted to daytime during well-mixed atmospheric conditions (10:30 to 17:30 CET). The dataset contains 1098 pseudo-observations, 366 for each of the three observable species, whereas the state vector contains the scaling factors for 2604 flux categories, each equal to one in the prior.
The prior error covariance matrix $\mathbf{S}_\mathrm{prior}$ is constructed from two components (Eq. 7), where $\mathbf{C}_\mathrm{prior}$ is the prior error correlation matrix (Fig. 2a) and $\boldsymbol{\rho}_\mathrm{prior}$ is a prior rescaling matrix described in Sect. 2.1.6 (Fig. 4a). First we describe how $\mathbf{C}_\mathrm{prior}$ is generated. The prior error correlation matrix is a square matrix of order 2604, reflecting the length of the state vector, and results from the product of three components (Fig. 2b, 2c and 2d) accounting for correlations between flux categories according to the modeled species, emission sectors and fuel types, respectively. In four different instances, a correlation of 0.7 is applied:
1. Between different anthropogenic modeled species
2. Between GEE and respiration
3. Between different emission sectors
4. Between different fuel types
Such a correlation implies that the explained variance for each constraint, everything else being equal, is roughly 50% (0.7 squared equals 0.49), with the rest remaining independent. In addition, the correlation between fossil-fuel-related and biosphere-related scaling factors is zero, and the same holds for fluxes of different months, indicating complete independence from one another.
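One way to picture the three-component construction is sketched below for a handful of anthropogenic flux categories within a single month. The assumption that "product" means an element-wise product of three label-based matrices, the helper `component` and the four example categories are all illustrative choices, not the authors' implementation; biospheric categories and the zero correlation across months are left out.

```python
import numpy as np

RHO = 0.7  # correlation applied between differing labels (value from the text)


def component(labels, rho=RHO):
    """Matrix with 1 where two categories share a label and rho elsewhere."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return np.where(same, 1.0, rho)


# Hypothetical flux categories: (modeled species, emission sector, fuel type).
categories = [
    ("CO2", "energy", "coal"),
    ("CO2", "energy", "gas"),
    ("CO", "energy", "coal"),
    ("CO", "transport", "oil"),
]
species, sectors, fuels = zip(*categories)

# Element-wise product of the species, sector and fuel-type components.
C_prior = component(species) * component(sectors) * component(fuels)
```

Two categories that differ in one label are correlated at 0.7, those differing in two labels at 0.49, and those differing in all three at 0.343. The element-wise product of positive semidefinite matrices is again positive semidefinite, so the result remains a valid correlation matrix.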

Prior error scaling
After having specified the prior error correlation matrix $\mathbf{C}_\mathrm{prior}$, we now describe how we rescale it to obtain $\mathbf{S}_\mathrm{prior}$; for this task we rewrite Eq. (7) in block form, where each $\mathbf{C}_{ij}$ is a subset ('block') of the fossil fuel part of $\mathbf{C}_\mathrm{prior}$ as shown in Fig. 2, and each $\rho_i$ is a rescaling factor defined in terms of $\mathbf{A}'$ and $\varepsilon_i$, where $\mathbf{A}'$ is the aggregation operator for annual fluxes over the full domain, and $\varepsilon_i$ is the corresponding relative prior uncertainty, assuming the values specified in Table 3 for different cases. Case 1 is considered as the default case, with prior uncertainty values conservatively assumed to be 20% for CO2 and 50% for CO and CH4. Conversely, $\mathbf{C}_\mathrm{bio}$ covers the biosphere part of $\mathbf{C}_\mathrm{prior}$, and for $\rho_\mathrm{bio}$ we use a prior uncertainty of 0.54 GtC y$^{-1}$, as derived in Panagiotis (2016) for the VPRM model. The biospheric part of the prior error covariance matrix assumes no correlation with the fossil fuel species.
The posterior of each Bayesian inversion depends on its specific prior. As the multi- and single-species inversions have different prior uncertainty structures, the uncertainty reduction for targeted quantities cannot be directly compared (Eq. (4)). To be able to compare the two inversions, we require that the a priori aggregated uncertainty of the targeted quantities remains the same, and distribute it differently each time; the prior rescaling matrix $\boldsymbol{\rho}_\mathrm{prior}$ is needed for this task. The benefits were tested for observations taken in different months and for three different error structures in the prior uncertainty. As a priori aggregated uncertainty we use a percentage of the aggregated modeled emissions for fossil fuels across the whole year.
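The idea of keeping the aggregated prior uncertainty fixed while changing the correlation structure can be sketched with a single scalar rescaling factor. This is a simplified stand-in for the species-specific factors used in the paper; the function `rescale_to_target`, the emission values and the 20% target are assumptions for illustration.

```python
import numpy as np


def rescale_to_target(C, a, sigma_target):
    """Scale correlation matrix C so that sqrt(a C a^T) equals sigma_target."""
    a = np.asarray(a, dtype=float)
    sigma_unscaled = np.sqrt(a @ C @ a)
    rho = sigma_target / sigma_unscaled
    return rho**2 * C


# Hypothetical annual emissions of two flux categories (arbitrary units).
a = np.array([3.0, 1.0])
sigma_target = 0.2 * a.sum()   # 20% of the aggregated emission

C_corr = np.array([[1.0, 0.7], [0.7, 1.0]])  # correlated error structure
C_indep = np.eye(2)                          # independent error structure

S_corr = rescale_to_target(C_corr, a, sigma_target)
S_indep = rescale_to_target(C_indep, a, sigma_target)

sigma_corr = float(np.sqrt(a @ S_corr @ a))
sigma_indep = float(np.sqrt(a @ S_indep @ a))
```

Both covariance matrices yield the same aggregated prior uncertainty (0.8 in these units), but the correlated case assigns smaller variances to the individual categories. Starting from the same aggregated prior is what allows posterior uncertainties of the two set-ups to be compared.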

Model-data mismatch error structure
In an atmospheric inversion, the model-data mismatch from every uncertainty source (such as measurement uncertainty, transport model uncertainty, spatial representation error due to limited model resolution, and boundary condition inaccuracies) needs to be taken into account. In our inversion scheme, we parameterize both the transport model uncertainty 20 and the measurement uncertainty, with the latter playing a minor role. The model-data mismatch covariance matrix (S ε ) is constructed according to the following equation:

where C_s accounts for correlations between different observed species (Fig. 3b), C_t accounts for the temporal correlation (Fig. 3c), ε_tran is the total transport error, and the measurement uncertainty term accounts for all of the non-transport-related errors, such as spatial representation error and lateral boundary conditions (Fig. 3d).
The assumed measurement uncertainty is 1 ppm for CO₂, 20 ppb for CO and 20 ppb for CH₄, while ε_tran is time dependent and assumed to be proportional to the modeled enhancement due to regional fluxes. The assumed measurement uncertainty is higher than the expected instrument precision because it also includes the uncertainties related to spatial representation and lateral boundary conditions. ε_tran is composed of components in the vertical and horizontal domain, where both the horizontal transport error ε_tran_h and the vertical transport error ε_tran_v are characterized as percentage errors.
More precisely, ε_tran_h is assumed to be 50%, while ε_tran_v is a profile-specific relative error related to the modeled z_i. The vertical transport error accounts for the difference in vertical resolution between the transport model and the IAGOS profiles, and its mean value is about 10%. For the horizontal component, an uncertainty of 50% is a conservative estimate based on Lin and Gerbig (2005), where the horizontal transport error is found to be 5.9 ppm for CO₂. This, combined with about 10 ppm of drawdown in the mixed layer relative to the free troposphere, gives an error of roughly 50% in the regional flux signal. The vertical component is much smaller in relative terms because the simulated mixing ratios are already corrected for the mismatch between modeled and observed z_i.
In the multi-species inversion, the transport error correlation across species is 0.7 (Fig. 3b), while in the single-species inversion it is set to zero. The temporal correlation is assumed to decay exponentially with a time constant of 12 hours.
The between-species correlation for model-data mismatch related to transport uncertainty reflects the fact that species are partially co-emitted and share the same atmospheric transport (and its related uncertainty).
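As an illustration, a covariance matrix with these ingredients could be assembled as sketched below. The exact form of the construction is not reproduced here, so the combination used (a Kronecker product of the species and temporal correlation matrices, scaled by the transport error, plus a diagonal measurement term) is an assumption, as are the function name, the argument names and the species-major ordering of the observation vector.

```python
import numpy as np

def mismatch_covariance(times_h, eps_tran, eps_meas, r_species=0.7, tau_h=12.0):
    """Assemble a toy model-data mismatch covariance matrix.

    times_h   : (nt,) observation times in hours
    eps_tran  : (ns, nt) transport error per species and time (absolute units)
    eps_meas  : (ns,) measurement uncertainty per species
    r_species : correlation of the transport error between species
    tau_h     : time constant of the exponential temporal decay, in hours
    """
    ns, nt = eps_tran.shape
    # Species correlation: ones on the diagonal, r_species elsewhere.
    C_s = np.full((ns, ns), r_species)
    np.fill_diagonal(C_s, 1.0)
    # Temporal correlation decaying exponentially with time separation.
    dt = np.abs(times_h[:, None] - times_h[None, :])
    C_t = np.exp(-dt / tau_h)
    # Combined correlation; rows and columns are ordered species-major.
    corr = np.kron(C_s, C_t)
    e = eps_tran.ravel()                 # same species-major ordering as corr
    S_eps = corr * np.outer(e, e)        # scale correlations to covariances
    S_eps += np.diag(np.repeat(eps_meas**2, nt))   # uncorrelated measurement term
    return S_eps

times = np.array([0.0, 12.0, 24.0])
eps_tran = np.array([[1.0, 1.0, 1.0],       # CO2 (ppm), toy values
                     [10.0, 10.0, 10.0],    # CO (ppb), toy values
                     [20.0, 20.0, 20.0]])   # CH4 (ppb), toy values
eps_meas = np.array([1.0, 20.0, 20.0])      # 1 ppm, 20 ppb, 20 ppb
S_multi = mismatch_covariance(times, eps_tran, eps_meas)                   # multi-species
S_single = mismatch_covariance(times, eps_tran, eps_meas, r_species=0.0)   # single-species
```

Setting `r_species=0.0` removes all cross-species blocks, which corresponds to the single-species configuration described above.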

Pseudo-data generation
As explained in the introduction, in situ measurements are not available for all three trace gases of interest, but only for CO. For this reason this paper aims to evaluate the benefits of a multi-species inversion over a corresponding single-species one by performing a synthetic experiment, using pseudo-observations derived by perturbation of the model outputs based on a priori state vector values. More precisely, the pseudo-observation vector is obtained by matrix multiplication between the Jacobian matrix K and what we assume to be the true state vector. The true state vector itself is obtained as the sum of the prior state vector (all values equal to one) and a random realization of the prior error, truncated to avoid negative state vector values. This ensures that the difference between the true and prior state vector has the same error correlation structure as described by the prior error covariance matrix.
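The procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the prior error is drawn from a multivariate normal distribution with the prior error covariance, and truncation is implemented by clipping at zero. The function name, the toy Jacobian and the toy covariance are hypothetical.

```python
import numpy as np

def make_pseudo_obs(K, S_prior, rng):
    """Generate a 'true' state vector and the matching pseudo-observations.

    K       : (m, n) Jacobian mapping scaling factors to mixing ratio enhancements
    S_prior : (n, n) prior error covariance of the scaling factors
    rng     : numpy random Generator
    """
    n = S_prior.shape[0]
    x_prior = np.ones(n)                                  # prior scaling factors are all one
    err = rng.multivariate_normal(np.zeros(n), S_prior)   # random realization of the prior error
    x_true = np.clip(x_prior + err, 0.0, None)            # truncate to avoid negative values (assumed: clip at 0)
    y_pseudo = K @ x_true                                 # pseudo-observation vector
    return x_true, y_pseudo

rng = np.random.default_rng(0)
K_toy = np.abs(rng.normal(size=(5, 3)))    # toy Jacobian
S_toy = 0.25 * np.eye(3)                   # toy prior covariance (50% uncertainty)
x_true, y_pseudo = make_pseudo_obs(K_toy, S_toy, rng)
```

Note that clipping slightly distorts the error statistics whenever a draw would fall below zero, so the match with the prior error covariance is only approximate for large prior uncertainties.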

Results and Discussion
Before evaluating the performance of the inversion scheme in reducing the uncertainty of the state space, a closer look at the ability of the modeling framework to reproduce the enhancements is necessary. Unfortunately, this can be done only for CO, as actual measurements are not available for the other species. Figure 5 shows the mean daily enhancement of the three fossil fuel species for both model outputs and observations. A feature common to the three trace gases is that lower values tend to occur during summer due to better mixing of the atmosphere. Conversely, enhancement values tend to be higher during winter, reflecting the more stratified atmosphere of the cold months.
In Fig. 5 the modeled CO was multiplied by a factor of 2.8, corresponding to the mean ratio between observed and modeled CO enhancements, similar to what was found in Boschetti (2015). Mixing ratio values are highly variable, but the model usually manages to reproduce the associated spikes; the squared correlation coefficient between observed and modeled CO enhancements is 0.56, and the standard deviation of the residuals between the corrected model and the observations is 87 ppb. The median of the mixing ratio enhancement for the three trace gases is 30.0 ppb for CO, 53.7 ppb for CH₄ and 3.0 ppm for CO₂.
For CO₂ this seasonal difference is enhanced due to the simultaneous presence of both anthropogenic and biogenic fluxes. During summer, values are slightly negative due to strong photosynthetic uptake by actively growing vegetation, combined with deeper vertical mixing. Negative values arise in 31% of the cases, predominantly during the warmer months, implying that during the growing period uptake by photosynthesis dominates over release from combustion and respiration. Both CO and CH₄ show higher values during winter due to the shallow mixed layer usually associated with cold temperatures, and lower values during summer, as higher temperatures cause the mixed layer to reach higher altitudes; differences related to seasonal domestic heating and transportation may also play a role. In addition, the enhancement for both species is occasionally negative, most likely owing to advection of polluted air masses in the free troposphere (FT). An alternative explanation is that strong winds at lower heights can disperse the emissions in the boundary layer and create a situation in which the mixing ratio in the FT is higher than in the mixed layer (ML).
With respect to the prior error covariance matrix, the posterior error covariance shows lower values (Fig. 6), corresponding to an uncertainty reduction of 22%. Figures 7 and 8 show a priori, a posteriori, and "true" fluxes related to different aggregated fuel types and to different emission categories as described in Tables 1 and 2 for the months of July and December. Figure 8 also shows the biospheric contribution (as absolute values) scaled down by a factor of 10. As is to be expected, the biospheric contributions show strong differences according to the seasonal cycle, while anthropogenic emissions remain rather stable. However, it is worth pointing out that while the fossil fuel prior is similar for both months, the assumed truth can be rather different due to the random assignment of the prior error realization. In most cases, the posterior adapts and is therefore closer to the truth than the prior; the posterior uncertainty is also visibly reduced, as expected. Regarding the different tracers, CO₂ and CO show a somewhat similar pattern, indicating a partial overlap in dominating emission categories, while CH₄ is dominated by different contributions in both fuel types and emission categories.
Dominant fuels for CO₂ are coal, gas and oil, whose prior fluxes (pseudo data) have a magnitude of 6-11 megatons of carbon per year (MtC y⁻¹) in July and 8-14 MtC y⁻¹ in December, while CO is dominated by a 0.19 MtC y⁻¹ flux from biofuels during winter and secondary contributions during summer from oil and biofuels with a magnitude of 0.06-0.08 MtC y⁻¹.
Regarding CH₄, the single dominant contribution is from "Other" fuels, responsible for 0.16-0.24 MtC y⁻¹ of emissions.
"Other" fuel types include emissions from the non-metallic minerals industry (e.g. cement, lime), agricultural waste burning, metal industry processing, the chemical and solvent industry, solid waste disposal in landfills, wastewater treatment, manure management in agriculture, rice cultivation in agriculture, and agricultural soil emissions. For CO₂, the dominant contribution from these "Other" fuels in the European domain is from the non-metallic mineral industry (1.13 MtC y⁻¹); for CO and CH₄, the largest share of the "Other" fuels emission is from the metal industry (0.03 MtC y⁻¹) and agricultural waste burning (0.06-0.11 MtC y⁻¹).

The most important emission sectors for CO₂ are energy, industry, transport and buildings, each contributing 7-10 MtC y⁻¹ in July and 6-14 MtC y⁻¹ in December, while CO is dominated by a 0.19 MtC y⁻¹ flux from buildings during winter, with secondary contributions from industry and transport with a magnitude of 0.04 MtC y⁻¹ and 0.05 MtC y⁻¹ respectively in both of the analyzed months. CH₄ is dominated by a 0.15 MtC y⁻¹ flux from agriculture in July, with secondary contributions from waste and energy with a magnitude of roughly 0.06-0.08 MtC y⁻¹ in both July and December. The contribution from biospheric primary production is about 100 MtC y⁻¹ in July, which drops to almost zero in December, while respiration values are 50 MtC y⁻¹ in July and roughly 150 MtC y⁻¹ in December.
As a further assessment of the inversion performance, we tested the ability of the inversion scheme to capture the truth compared with a perturbed version of the prior. To do so, we calculated for each simulated species the overall bias for the whole year, relative to the truth, of the prior, the posterior and the perturbed prior. It is clear from Table 4 that while the overall bias between posterior and truth is lower than the prior-truth bias, the bias between perturbed prior and truth is much higher, implying that the performance of the inversion is not an artifact of the pseudo-data generation.
Before investigating the benefits of correlations between different tracers, it is meaningful to evaluate the uncertainty reduction in the monthly budgets for all five modeled species (Fig. 9, based on the targeted spatial domain in Fig. 1). The first thing to note is that for all five species the posterior uncertainty is lower than the prior one, as it should be. In addition, the prior uncertainty varies through the year, reflecting the modulation in emission fluxes obtained by adding activity factors to describe the seasonal, weekly and daily cycles.
The mean uncertainty reduction of the monthly values is 38% for fossil fuel emissions of CO₂, 41% for GEE, roughly 45% for respiration, 64% for CO and about 67% for CH₄. It is worth pointing out that such values are higher than the mean uncertainty reduction in the scaling factors (22%); this happens because the most representative emission sectors are those influencing the observations the most and thus are also the most constrained.
In addition, note that in this case the posterior uncertainties for single- and multi-species inversions are similar for the modeled species, with the exception of the CO₂ anthropogenic contributions. To generalize this last result, we tested the benefit of a multi-species inversion for the different cases of prior uncertainty values shown in Table 3. As an indicator for the benefit of including correlation between different species, we use the ratio between the posterior uncertainty of the multi-species inversion and the posterior uncertainty of the corresponding single-species inversion. A value of one means that there is no benefit in adding an inter-species correlation to the inversion, while values greater than one mean that a multi-species inversion is even less constrained than a single-species one. We expect this indicator to be less than one, meaning that inter-species correlations actually improve the constraining power of the inversion. As before, we consider here the uncertainties of the retrieved budgets for the 50% footprint, where the surface influence is strongest (Fig. 1). Values of this uncertainty ratio for the different trace gases as a function of month are shown in Fig. 10 for the different cases listed in Table 3.
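The indicator can be illustrated with a minimal sketch, assuming the standard linear Gaussian expression for the posterior error covariance, S_post = (Kᵀ S_ε⁻¹ K + S_prior⁻¹)⁻¹, and a weight vector w that aggregates the state vector into a budget. The function names and toy matrices are hypothetical, and whether the paper's implementation uses exactly this form is an assumption.

```python
import numpy as np

def posterior_cov(K, S_prior, S_eps):
    """Posterior error covariance of a linear Gaussian inversion."""
    A = K.T @ np.linalg.solve(S_eps, K) + np.linalg.inv(S_prior)
    return np.linalg.inv(A)

def aggregated_sigma(S, w):
    """1-sigma uncertainty of the budget w . x for covariance S."""
    return float(np.sqrt(w @ S @ w))

def uncertainty_ratio(K, prior_multi, eps_multi, prior_single, eps_single, w):
    """Posterior uncertainty of the multi-species inversion divided by that of
    the single-species inversion (values below one indicate a benefit)."""
    s_multi = aggregated_sigma(posterior_cov(K, prior_multi, eps_multi), w)
    s_single = aggregated_sigma(posterior_cov(K, prior_single, eps_single), w)
    return s_multi / s_single

# Toy set-up: two state elements, three observations.
K_toy = np.array([[1.0, 0.5], [0.2, 1.0], [0.7, 0.7]])
S_prior_toy = np.array([[0.04, 0.0], [0.0, 0.25]])
S_eps_toy = 0.1 * np.eye(3)
w_toy = np.ones(2)

S_post_toy = posterior_cov(K_toy, S_prior_toy, S_eps_toy)
# Relative uncertainty reduction of the aggregated budget.
reduction = 1.0 - aggregated_sigma(S_post_toy, w_toy) / aggregated_sigma(S_prior_toy, w_toy)
```

In an actual comparison the multi- and single-species covariance matrices would differ in their off-diagonal (inter-species) blocks while sharing the same aggregated prior uncertainty, so that the ratio isolates the effect of the correlations.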

All of the species experience a reduction in the posterior uncertainty ratio due to the addition of inter-species correlation; this reduction is up to 20% for fossil fuel CO₂ and up to 10% for the other species. In addition, anthropogenic CO₂ is more sensitive to the prior relative error values than CO and CH₄. As the uncertainty of GEE and respiration is not modified, they show little to no variation between the different cases (Fig. 10). The benefit of the multi- over the single-species inversion depends on the prior uncertainty values (differences between Cases 1-3), with the largest difference for fossil fuel emissions of CO₂. Interestingly, for Case 2, with reduced prior uncertainty for fossil fuel CO₂ emissions, the benefit nearly doubles over the default case (Case 1). Also reducing the prior uncertainties of CO and CH₄ emissions (Case 3) more or less compensates for this increase in benefit. Note that the assumed prior uncertainties for the default case (Case 1) are quite conservative; therefore, lower uncertainties were chosen for Cases 2 and 3. While the absolute benefit of adding inter-species correlation is modest, it is worth pointing out that such an improvement comes with only slightly greater computational effort than multiple independent single-species inversions.
In order to assess the contribution of inter-species correlation in the prior uncertainty vs. that of model-data mismatch uncertainty, Fig. 11 also shows the resulting posterior uncertainty ratios for Case 1 (Table 3) when only the prior or only the model-data mismatch correlation is included. For the anthropogenic component of CO₂, the greatest constraint is given by the prior correlation, while for GEE, respiration, and CH₄ the strongest contribution is from the model-data mismatch correlation. In the case of CO, the inter-species correlations for different components are dominant for different months of the year.
Palmer (2006) (in the following referred to as P06) studied the importance of inter-species correlation to improve inverse analysis using airborne data from the TRACE-P mission conducted in March/April 2001 over the western region of the Pacific Ocean. P06 derived a prior error correlation lower than 0.2 by analysing the uncertainty of emission factors from an Asia-specific emission inventory (Streets, 2003), which is significantly smaller than the correlation of 0.7 assumed in the present study. P06 deemed a CO₂-CO prior correlation greater than 0.5 to be unrealistic for the emissions in Asia, which is mostly associated with the uncertainty in emission factors for CO of 67% for fossil fuel emissions and 240% for biofuel emissions in China (P06 Table 1). However, for the European region used in the present paper we argue that values around 0.7 are appropriate. The resulting uncertainty in the CO₂-CO ratio, diagnosed from the prior error covariance matrix used in this study, is about 50% for both biofuel and fossil fuel emissions in Europe, which we regard as reasonable. To compare results from P06 with those from the present study, ratios of posterior uncertainties resulting from inversions using correlations between CO₂ and CO of 0.7 in the prior uncertainties to those using no correlations have been extracted from Fig. 7 in P06 and are also shown as orange diamonds in Fig. 11. For anthropogenic CO₂, the value derived from P06 is higher than in our study (i.e. a smaller benefit), while the two values are very similar for CO. Similarly, posterior uncertainty ratios using model-data mismatch correlations of 0.7 between CO₂ and CO are derived from Fig. 8 of P06 and are shown as red diamonds. In this case, the value derived from P06 is slightly lower than in our study for anthropogenic CO₂, while the two are again very similar for CO.
From this comparison we can see that the estimates of the benefit of including inter-species correlation in atmospheric inversions in P06 and in this paper are of the same order of magnitude for anthropogenic CO₂ and almost identical for CO, suggesting a general consistency of results.

Conclusions
This paper presents a synthetic experiment aiming to evaluate the effects of exploiting correlations between different trace gases in an atmospheric inversion. We quantitatively described the capability of the modeling framework to reproduce observations, the performance of the inversion scheme in reducing the uncertainty of the different trace gases, and the benefits of multi-species inversions compared to corresponding single-species inversions. We also described a method to rescale different prior uncertainty covariance matrices so that the corresponding posterior uncertainties are actually comparable.
Where possible, we compared model outputs with available observations. Such a comparison, possible only for CO, showed a good degree of agreement between the model and observations, with an overall correlation of roughly 0.75; modeled values for CO enhancement underestimate the observed ones by a factor of roughly 2.8, compatible with what was found in Boschetti (2015). The posterior uncertainty is lower than the prior for all five simulated species. The mean uncertainty reduction for CO₂ emissions from fossil fuels is roughly 38%, for GEE it is around 41%, and for respiration it is roughly 44%. For CO and CH₄ the uncertainty reduction is about 63% and 67% respectively. Finally, we described quantitatively the benefit of using multi-species inversions by exploiting correlations between different chemical species. Considering correlations between different trace gases can reduce the posterior uncertainty by up to about 20% for monthly fluxes. These benefits are, however, dependent on the error structure of the prior uncertainty.
The present paper paves the way for future studies using simultaneous measurements of different trace gases. This will be especially important in the context of the upcoming routine measurements of CO₂, CO, and CH₄ vertical profiles within IAGOS. As IAGOS makes use of commercial airliners, such profiles will be collected in the vicinity of major international airports, and hence in the vicinity of major metropolitan areas, where many different human activities take place simultaneously. In such a context, any improvement in the constraint of atmospheric inversions will be particularly useful. A possible improvement in this analysis would be to evaluate the effects of different correlation factors specific to different pairs of anthropogenic species, fuels and emission sectors.

Acknowledgements
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013)
Figure caption (partially recovered): ... Table 3. Note that CO₂ refers to fossil fuel emissions only, and RESP and GEE refer to the biospheric fluxes.