Atmospheric CO 2 inversions on the mesoscale using data-driven prior uncertainties : quantification of the European terrestrial CO 2 fluxes

Optimized biogenic carbon fluxes for Europe were estimated from high-resolution regional-scale inversions, utilizing atmospheric CO2 measurements at 16 stations for the year 2007. Additional sensitivity tests with different datadriven error structures were performed. As the atmospheric network is rather sparse and consequently contains large spatial gaps, we use a priori biospheric fluxes to further constrain the inversions. The biospheric fluxes were simulated by the Vegetation Photosynthesis and Respiration Model (VPRM) at a resolution of 0.1 and optimized against eddy covariance data. Overall we estimate an a priori uncertainty of 0.54 GtC yr−1 related to the poor spatial representation between the biospheric model and the ecosystem sites. The sink estimated from the atmospheric inversions for the area of Europe (as represented in the model domain) ranges between 0.23 and 0.38 GtC yr−1 (0.39 and 0.71 GtC yr−1 up-scaled to geographical Europe). This is within the range of posterior flux uncertainty estimates of previous studies using groundbased observations.

1 Introduction Global and regional atmospheric inversions assimilate atmospheric CO 2 measurements made by a global network for 2 decades to infer terrestrial carbon fluxes using surface in situ or flask measurements of CO 2 dry mole fractions (Tans et al., 1989;Enting and Mansbridge, 1989;Conway et al., 1994;Fan et al., 1998;Rödenbeck et al., 2003).The opti-mization of CO 2 biospheric fluxes for the European domain has been of high interest in previous studies using either pseudo or real data (Peters et al., 2010;Carouge et al., 2010a, b;Rivier et al., 2010;Broquet et al., 2011Broquet et al., , 2013;;Peylin et al., 2013).Retrieved fluxes from most of the inversions are obtained from global systems at coarse resolution; hence, the spatial and temporal flux variability on finer scales cannot be resolved.Large uncertainties in the flux retrievals are introduced due to the coarse resolution of the transport models used and due to the network sparseness (Peters et al., 2010).
Apart from ground-based observations, satellite measurements have also been recently used in atmospheric inversions to infer terrestrial fluxes (Basu et al., 2013;Deng et al., 2014;Chevallier et al., 2014).The advantage of using spaceborne measurements lies in the high density of the observations, providing the opportunity to constrain regions not seen by the ground network.However, satellite-based inversions significantly differ from ground-based inversions, reporting a larger annual uptake for Europe.A characteristic example is the estimated European uptake in the study by Reuter et al. (2014).They calculated an uptake of 1.02 GtC yr −1 , which triggered an ongoing debate on whether those estimates are data driven or they lack robustness due to deficiencies in the satellite observations and in the inverse modeling (Feng et al., 2016).
One of the largest sources of uncertainty in inversions is the atmospheric transport uncertainty.Modeled dry mole fractions are biased due particularly to uncertainties in vertical mixing near the surface (Gurney et al., 2003;Gerbig et al., 2008;Houweling et al., 2010).As a consequence, posterior Published by Copernicus Publications on behalf of the European Geosciences Union.P. Kountouris et al.: Quantification of the European terrestrial CO 2 fluxes flux estimates are also biased because biases in concentrations due to transport model errors are translated into biases in fluxes through the optimization procedure.Propagation of uncertainties in winds (Lin and Gerbig, 2005) and in mixing heights (Gerbig et al., 2008) for summer months with active vegetation resulted in uncertainties in simulated dry air mole fractions of 5.9 and 3.5 ppm, respectively.
The current study uses the same inversion system as in the technical note in the study of Kountouris et al. (2018) (hereafter referred to as Ko16) in which the inversion system and its setup were assessed based on pseudo data.As a next step, we apply the modeling framework to real CO 2 atmospheric observations.Our main objectives are to investigate the potential to infer flux estimates for Europe with reduced uncertainties and to estimate biospheric fluxes at high spatial resolution and for a full year.We use a spatial flux resolution of 0.25 • × 0.25 • to couple fluxes with the atmospheric transport model, and the state space allows the optimization of 3-hourly net ecosystem exchange (NEE) corrections to the prior NEE fluxes at a nominal spatial resolution of 0.5 • × 0.5 • .A data-driven error structure that was tested in the Ko16 study is implemented consistent with model-data flux mismatches (Kountouris et al., 2015).Further, different error structures are used and assessed, also including a spatial error structure with a hyperbolic correlation shape as suggested by Chevallier et al. (2012).Since spatial autocorrelations have been found to be very short, the annual aggregated uncertainty over the European domain is smaller than traditionally assumed (see also Ko16).The error inflation necessity and implementation was addressed in Ko16 either by inflating the error covariance or, more formally, by introducing a bias term.However, the hyperbolic correlation shape suggested by Chevallier et al. (2012) has a stronger impact from larger distances compared to the exponential shape, leading to an aggregated uncertainty that does not require being inflated.We also perform a number of sensitivity tests to account for misrepresentation of the fossil fuel signal and also for transport uncertainties due to vertical mixing.
This paper is structured as follows: Sect. 2 describes the inversion system, the network, and station data that are used, and it details the assumed error structure.Section 3 shows the results of the goodness of fit, and the retrieved fluxes.The data fitting and the reliability of the posterior fluxes are extensively discussed in Sect. 4.

Two-step inversion
Real-data inversions require a nested inversion scheme since observations also contain contributions from regions outside of the domain of interest (DoI).As in Ko16, the Jena inversion system (Rödenbeck, 2005), including the two-step nesting scheme (Rödenbeck et al., 2009;Trusilova et al., 2010), was used.This scheme allows the combination of regional and global inversions within a consistent system.Here we only provide a brief description as details are given in Rödenbeck et al. (2009) and Trusilova et al. (2010).The atmospheric transport models TM3 (step 1; Heimann and Körner, 2003) and STILT (step 2; Lin et al., 2003) were used for transport at the global and regional domain, respectively.For the global runs, TM3 was used at a spatial resolution of 4 • latitude × 5 • longitude, driven by meteorological fields from the ERA-Interim reanalysis produced by ECMWF (Dee et al., 2011).The transport matrix for the regional inversions was identical to the one used for the synthetic data study in Ko16.
In the first step, a global inversion is performed using the global transport model.The outcome is an optimized flux field on a coarser scale for the full period (FP) and the global domain.Then two forward runs are performed.The first run uses the global transport model over the FP, computing the modeled mixing ratios c mod1 .The second run again initializes the global transport model but only within the regional DoI.This can be regarded as a regional simulation, but with coarse resolution, yielding modeled mixing ratios c mod2 .Then the "remaining mixing ratio" is calculated for all the observing sites inside the DoI: where c ini is the initial condition that corresponds to a well mixed atmosphere with a given initial tracer mixing ratio.
In step two, the high-resolution transport model is used for the regional inversion within the DoI, where all fluxes are represented at fine resolution.For this inversion the vector containing the measured mixing ratios c meas is replaced by the remaining mixing ratios c remain .The optimized fluxes from this step are the high-resolution fluxes of interest.

Atmospheric network and data
For step 1 we used the same station network as in version s04_v3.6 of the Jena CarboScope CO 2 inversion (http:// www.bgc-jena.mpg.de/CarboScope/?ID=s04_v3.6), with 64 stations globally.For step 2 (regional inversion) continuous and flask measurements from 16 stations within Europe were used as described in Ko16 (see also Table 1).Of those 16 stations, 7 are already included in the step 1 inversion.All valid values provided were used, except those paired flask measurements that differed by more than 0.34 ppm, which were omitted.Measurements from the continuous stations were aggregated to hourly values where needed.Nighttime and daytime observations were selected depending on the type of station (Ko16).As all institutions report mixing ratio values traceable to the WMO (World Meteorological Organization) calibration scale, we expect compatibility between the different datasets (also see Rödenbeck et al., 2006).
In this study we use the site HEI (Heidelberg), which is traditionally not used for European CO 2 flux inversions since it Table 1.Information on the stations used for the regional inversions.Same network applied for the synthetic data inversions and the real-data inversions in Kountouris et al. (2018).In the first column the term "type" stands for continuous (C) or flask (F) data.Under "Data origin" WDCGG means World Data Centre for Greenhouse Gases.  is considered too local (Broquet et al., 2013;Rödenbeck et al., 2009;Rivier et al., 2010).The Heidelberg region is considered to be one of the most polluted regions in Germany (Fiedler et al., 2005) and therefore could bias the flux estimates.Moreover, the WES (Westerland) site contains long periods with no data.This could potentially affect posterior flux estimates since extended data gaps can lead to jumps in the presence of biases.Thus, we evaluate the performance and the sensitivity of the European flux estimates to the network configuration by also performing an inversion (referred to as nBV14; see Table 2) excluding HEI and WES.

A priori information and uncertainties
A set of inversion cases differing in the prior information, the error structure and the station configuration was realized (see overview in Table 2).Prior information derived from both biosphere models (VPRM and GBIOME-BGCv1) is used to investigate the impact of the prior fields to the posterior flux estimates.Furthermore, an ensemble of inversions using different error structures is used to investigate the impact on the posterior flux estimates and uncertainties.
Similarly to the synthetic inversion (Ko16), the modeldata mismatch uncertainties are the same as in the Ko16 study (see also Fig. 2 therein).Further, we use the base case nBV (no bias VPRM as prior, B1 in Ko16), which inflates P. Kountouris et al.: Quantification of the European terrestrial CO 2 fluxes the prior uncertainty by upscaling the error covariance matrix, and case BVR (bias VPRM as prior respiration as shape; S1 in Ko16), which includes a bias term.In the base case the VPRM model provides the prior flux field, and exponentially decaying correlations are assumed.The bias component in the BVR scenario will always have a correction with the same sign for all grid cells as it just scales a predefined flux field.In the BVR case it follows the shape of the annually averaged respiration flux, in the BVN case that of the a priori net biogenic flux, and in the BVRT case again that of the annually averaged respiration flux, but with monthly temporal resolution of the bias term to allow for some temporal flexibility.The nBB inversion refers to the scenario in which GBIOME-BGCv1 was used as a priori information instead of VPRM, and the error structure does not contain a bias term.With this case we can evaluate how sensitive the posterior flux estimates are with respect to the prior information that has been used.We also examine a spatial error structure based on a hyperbolic (instead of an exponential) spatial correlation shape as suggested in Chevallier et al. (2012), which we will refer to as the nBVH scenario.
Note that in most of the inversions performed, VPRM fluxes were used as prior information.Those fluxes are already optimized using EC measurements; therefore, evaluation of the posterior flux estimates against EC data on a local scale could result in posterior fluxes that are limited or even not further constrained (since they are already optimized).In contrast, posterior fluxes produced with BIOME-BGC used as prior are expected to show significantly larger corrections compared to the prior estimations and are therefore used for evaluation against EC data.Nevertheless, in most cases we use VPRM as prior in order to keep our estimates as data driven as possible through the overall optimization procedure: on a local scale by using EC data and on a regional scale using the atmospheric dry mole fractions.
As in the synthetic experiment (Ko16) the temporal decorrelation time was set to 31 days.In Kountouris et al. (2015), model-data comparisons representative on a site scale (around 1 km) showed spatial correlation lengths of 40 km whilst model-model comparisons representative at 50 km resolution identified a correlation scale of 370 km.Also considering that the state space has a resolution of 50 km, the spatial decorrelation length was chosen to be approximately 100 km (66 km in the meridional direction and 130 km in the zonal direction).In the prior error covariance, diagonal elements of 2.27 µmol m −2 s −1 were assumed, consistent with the model-data flux mismatches as calculated in Kountouris et al. (2015).Propagating this spatiotemporal error structure yields a domain-integrated uncertainty (E ST ) of 0.15 GtC yr −1 .Note that this is substantially smaller than for the synthetic experiment due to the much shorter spatial correlation length scales.A total annual, domain-integrated uncertainty E tot of 0.3 GtC yr −1 was assumed, which corresponds to twice the standard deviation of annual terrestrial flux estimates for 2007 between ter-restrial biosphere models taken from the global carbon atlas (http://www.globalcarbonatlas.org).This is also consistent with the prior uncertainty (for Europe) assumed for the global inversions performed by the Jena inversion system.For those inversions in which the additional bias term was considered (BVR, BVN and BVRT scenarios), their error E BT was calculated using For the nBVH scenario using hyperbolic correlations similar to Chevallier et al. (2012) 1 1+d , a characteristic value d (lag distance) was used such that the correlation drops after around 60 km to 1/e of its initial value, consistent with the hyperbolic fit to the model-data flux residual autocorrelation in Kountouris et al. (2015).For this case no additional bias term was needed, as the spatially and temporally aggregated uncertainty was found to be 0.32 GtC yr −1 , which is very close to the uncertainty assumed for the inversions (0.3 GtC yr −1 ).
Furthermore, we include ocean fluxes from Mikaloff-Fletcher et al. ( 2007) and anthropogenic emissions from the EDGAR v4.1 inventory scaled at national level for individual years according to the BP (British Petroleum) statistical review of world energy (BP, 2012) following Steinbach et al. (2011).Anthropogenic emissions are considered to be perfectly known (with no prior uncertainty), as one typically assumes that there is more a priori knowledge regarding the anthropogenic emissions as compared to biogenic fluxes.As the inversion cannot distinguish between biogenic and anthropogenic signals, any errors in the a priori anthropogenic emissions will be included as corrections to the NEE flux.

Diagnostics and aggregation of fluxes
Similar to Ko16, we use the χ 2 c metric to evaluate the goodness of fit for each station (Eq.3): where c t is the model-data mismatch in dry mole fractions for a given observation time t, n the number of observations and σ t the assumed uncertainty.Further, we also make use of the reduced χ 2 r (Eq.4), where J min is the cost function at its minimum: For more details about the chi square metric the reader is referred to the Ko16 study.The optimized fluxes are derived at 0.25 • spatial and daily temporal resolution from the inversion system.We postprocess the fluxes by aggregating them spatially on country and domain-wide scales and temporally on monthly and annual scales.Flux comparisons with other studies require that both fluxes refer to the same geographical region.Typically studies refer to TransCom regions with a European domain that expands more into the Eurasian region.To scale our results to the TransCom EU region, we calculated the area ratio between the TransCom EU region and our European domain.This ratio (about 1.69) was used to scale our posterior estimates and the corresponding uncertainties assuming linearity in the variances (presented in Fig. 8).

Simulated CO 2 and goodness of fit
Figure 1 presents a comparison of observed and modeled daily averages of the nighttime (hours 23:00, 00:00, 01:00, 02:00, 03:00, 04:00 UTC) CO 2 dry air mole fractions for the Schauinsland station (SCH), a mountain station, for the year 2007.The prior estimates (gray line) as derived from a forward model run using VPRM flux fields are systematically lower than the observations (black line) with the most divergent values occurring during the growing season.A similar pattern was found for the other atmospheric stations.Posterior CO 2 time series from all the inversions are in much closer agreement with the observations.Table 3 summarizes the statistics between the modeled and the observed CO 2 dry mole fractions for all stations based on daily averages using the respective sampling times (see also Ko16) for mountain (nighttime) and other stations (daytime).Of note is that the real-data inversions include errors due to the modeling of transport, which is not the case in the synthetic experiment in Ko16 as the same transport model was used for forward and inversion runs.Standard deviations of the posterior residuals (observed minus modeled) show an average decrease of 59 % for all inversion setups and for all stations compared to the prior residuals.Correlations between prior and observed mole fractions as well as posterior and observed mole fractions (also Table 3) were likewise increased on average from 0.48 to 0.93.Of note is that nBV and nBB, which use an inflated prior error covariance for the spatiotemporal component, show larger improvement relative to the prior in RMSD and some limited improvement in correlation coefficient, compared to those inversions in which a bias component was included (BVR, BVN, BVRT). Figure 2 visually summarizes the goodness of fit in a Taylor diagram for cases nBV and BVR, presenting prior and posterior estimates of the correlation and the normalized standard deviation between the modeled and observed CO 2 dry mole fraction time series.It is obvious that the additional flexibility of nBV in the spatiotemporal flux distribution results in a better reproduction of the concentration variability.The same picture emerges when comparing the nBV and nBB inversions to nBVH (see Table 3).Although all these cases assume no explicit bias term in the error structure, the larger correlations from areas farther away for the nBVH case with a hyperbolic correlation cause a reduced number of effective degrees of freedom, which results in larger residuals in posterior-observed mole fractions (Table 3) comparable to those of the BVR case.
Calculating the goodness of fit using the station-specific χ 2 c values from Eq. ( 3), most of the sites (Table 3) show val- ues around 1, indicating that the misfits are inside the 1σ site specific uncertainty.For the CBW, HEI, JFJ and KAS sites, values above 1 regardless of the error structure were found, with the most extreme value being 5.17 for the HEI site in the nBVH inversion.This could suggest that for a polluted site like HEI larger uncertainties should be considered.The reduced χ 2 r values regarding the overall model performance (Eq.4) for all inversion setups are found to be close to 1, with χ 2 values of 1.08 (nBV), 1.16 (nBB), 1.17 (BVR), 1.17 (BVN), 1.19 (BVRT), 0.89 (nBV14) and 1.25 (nBVH), suggesting that the assumed prior uncertainty describes the actual uncertainties well.

Posterior flux estimates on different scales
The annually integrated spatial flux distribution is presented in Fig. 3 for all the different inversion settings.Differences between the results based on the two general error structures (with and without the bias term) were observed mainly in central and western Europe (longitudes less than 20 • E), where the network provides a strong constraint.This difference is characterized by stronger spatial flux variability for the general nBV case, with multiple transitions between carbon sources and sinks on regional scales.The same picture emerges for the western part of Europe.In contrast, all the inversions including a bias component (BVR, BVN, BVRT) yield a more homogeneous flux distribution with a somewhat finer structure in the flux retrievals (e.g., France and the northeastern part of Europe).Comparisons between BVR, BVN and BVRT flux distributions do not show any significant difference.Almost the same picture emerges when comparing the nBV and nBV14 cases, indicating that excluding the two stations does not have a very strong influence on our annual flux estimates.However, spatial differences were observed for the areas close to the two sites.The most impor-tant one applies for the area near the HEI station where we observed a transition from source to net carbon sink when excluding the corresponding site.The choice of the prior does only have a small impact on the mean flux as can be seen by comparing posterior fluxes from nBV and nBB despite the significant differences in the flux innovations (Fig. 3).All innovations show that positive fluxes were added mainly in central Europe and more intensively for the cases for which no bias term was used.The positive flux corrections are to be expected since prior fluxes from VPRM show a strong European sink of 0.96 GtC yr −1 , which is most likely to be unrealistic.Overall the results suggest that the general error structure matters, i.e., whether or not to include a bias term, but how the bias is implemented is of less importance for the retrieved flux patterns.One would expect that the flux distribution from the nBVH case would follow the general flux structure from the inversions without the bias term.Interestingly, the distribution is similar to the one obtained from the inversions with the bias term (cases BVR, BVN and BVRT).This shows that inversions assuming correlations with a strong contribution from the far field have similar characteristics as inversions that assume a flat bias term.
Figure 3 shows the spatially aggregated posterior flux estimates for the full domain with the corresponding uncertainties integrated on monthly and annual temporal scales.The same prior uncertainty was used for cases nBV and nBB, although they differ in prior flux field.Posterior estimates from nBV (blue line/shading) and nBB (green line/shading) inversions presented in Fig. 5a do not show any significant difference on monthly and annual scales despite the large difference in prior fluxes.We observe that the maximum uptake occurs slightly earlier for the nBB case.Monthly fluxes from the nBVH inversion also show the same temporal evolution.We do not observe any significant difference in monthly fluxes for the BVR (red line/shading) and BVN (violet line/shading) inversions (Fig. 5b).Both cases are comparable to the nBV and nBB cases on monthly and annual scales.A slightly different picture emerges from the BVRT inversion, in which the bias term allowed for more degrees of freedom for monthly corrections.The resulting seasonal cycle is somewhat smaller, with reduced summer carbon uptake.Inversions that included the bias term yielded smaller posterior uncertainties on both temporal scales, which is expected as the spatiotemporal component of the uncertainty was not inflated as was the case for the nBV scenario.Flux retrievals from the reduced network (sensitivity case nBV14) show a slightly deeper sink, but the differences from the base case nBV are insignificant (i.e., clearly within the posterior uncertainties).
All of the inversions suggest Europe to be a carbon sink, with a range of −0.23 ± 0.13 to −0.38 ± 0.17 GtC yr −1 for the BVRT and nBV inversions, respectively.The mean annual posterior flux estimate for Europe averaged over different inversions amounts to −0.32 GtC yr −1 .
Posterior monthly flux estimates on smaller spatial scales (country level) are shown in Fig. 6.Areas that are not well constrained by the current network show some divergence in the posterior flux estimates, although not significant considering the uncertainty range.For example, Germany, which is better constrained, shows a limited spread of the posterior fluxes, with an annually averaged standard deviation between the different posterior flux estimates being 0.9 MtC yr −1 , while the United Kingdom (which is less well constrained) shows a slightly larger spread of the posterior estimates, with an annually averaged standard deviation of 2 MtC yr −1 .Note that the posterior uncertainties are smaller by about 36 % for the BVR case, which is related to the smaller prior uncertainties on monthly timescales (see also Sect.3.2 in Ko16).

Validation against eddy covariance measurements
As shown in Broquet et al. (2013) and in Ko16, eddy covariance (EC) measurements in principle have the potential for quantitative evaluation of the retrieved fluxes from the inversions.Here we used posterior flux estimates from the nBB inversion for evaluation against EC measurements, as the prior flux fields in nBB (GBIOME-BGCv1) were not optimized using EC measurements.Gap-filled data were downloaded from the European Fluxes Database Cluster (http: //www.europe-fluxdata.eu).A modified flux site network compared to the one reported in Kountouris et al. (2015) was used.Specifically we omitted sites that were not used for the VPRM optimization (CH-Fru, CH-Lae, CH-Oe1, ES-LMa, FR-Avi, FR-Mau, IT-Cas, IT-LMa, IT-Ro2, NL-Dij, NL-Lut, SE-Sk1, SK-Tat) as well as sites that were not available as gap-filled data (CH-Dav, ES-Agu, FR-Aur).Further, some more sites were added both for the VPRM optimization and for the flux comparisons (CZ-wet, DK-Sor, HU-Bug, IT-Non, NL-Ca1, PL-wet, RU-Fyo, UK-PL3).Monthly-averaged fluxes were extracted, with weights for each vegetation class that compensate for the asymmetry between the number of flux towers per vegetation type and the fraction of land area covered by the specific vegetation type, similar to Ko16.
The analysis of the monthly prior biospheric fluxes in Fig. 7 reveals significant differences between observed and prior fluxes from the inversion.The GBIOME-BGCv1 model systematically overestimates the observed fluxes throughout the year.The retrieved fluxes from the inversion (dark green line) are closer to the observed fluxes, with a stronger uptake compared to the prior during spring and summertime.The timing of the peak uptake is shifted to 1 month earlier in comparison to the observations.The mean absolute bias (averaged absolute differences between prior minus observed fluxes and posterior minus observed fluxes) is significantly reduced by 52 % from 0.84 to 0.40 gCm −2 day −1 .The standard deviation of the residuals is reduced by around 24 %, from 0.68 for the prior to 0.40 gCm −2 day −1 for the posterior residuals.Splitting the sites into two main categories, the first only with crops and the second with non-crop sites, revealed differences on how well those sites can be represented.Clearly the best matches were found for the non-crop sites, with a reduction in the mean absolute bias of 51 %, whilst for the crop sites it is limited to 38 %.

Discussion
We performed a series of atmospheric CO 2 inversions based on atmospheric data taken from 16 European stations for 2007.Different data-driven error structures in the prior error covariance were assessed, and optimized biospheric fluxes were retrieved and post-processed on various temporal and spatial scales for further evaluation.In this part we discuss the fitting performance of the inversion system, and we detail the comparisons between our flux estimates on grid, national and continental scales against EC data and reported flux estimates from previous studies.Finally, we discuss how sensitive flux retrievals are in the presence of erroneous representation of the fossil fuel fluxes and the site selection.

Goodness of fit
Site-specific misfits show a reasonable fit to the atmospheric data.Nevertheless, in four cases (CBW, HEI, JFJ and KAS) site-specific χ 2 c values were found to be larger than 1 (see also Table 3), indicating that either the model-data mismatch errors were chosen too small or the spatiotemporal resolution of the flux model was too coarse compared to the biosphere fluxes and therefore small-scale variations are not resolved (Rödenbeck et al., 2003).In fact this seems to be the case for the JFJ and KAS sites as those are high-altitude sites with steep cliffs.In such a complex terrain the atmospheric circulation is hard to be simulated from the transport models.Regarding CBW and in particular HEI, they are polluted sites and it would be reasonable to assume larger model-data mismatch uncertainty since the model is too coarse to resolve the fossil fuel emission patterns.One could argue that using higher spatial resolution to couple fossil fuel fluxes with transport models might reduce the model-data mismatch uncertainties and hence improve posterior fluxes.To investigate this, we performed a forward run at coarser (0.25 • ) and higher (1/12 • latitude × 1/8 • longitude) spatial resolution using only the fossil fuel emissions.As we use a Lagrangian transport model, fluxes at higher resolution than that of the meteorological fields can be used such that the simulated fossil fuel signals contain more spatially detailed information (Lin et al., 2003).The derived concentration signal was subtracted from the observations and subsequently an atmospheric inversion was performed.We report no significant differences between the retrieved fluxes, indicating that simply increasing the spatial resolution to about 10 km is not enough to correctly represent the fossil fuel distribution.
A common approach in atmospheric inversion studies to evaluate the defined uncertainties is to examine the reduced χ 2 r values.However, this might not always be a sufficient metric (Michalak et al., 2005).The reduced χ 2 r values in our study (between 1.08 and 1.25) are larger than those found by Tolk et al. (2011), who found values between 0.34 and 0.78 for their pixel-based inversion, indicating a more conservative choice for their model-data mismatch errors.Even lower values were reported in the study by Peylin et al. (2005), with values ranging from 0.01 up to 0.6 depending on the assumed correlations.χ 2 values from Zhang et al. (2015) were within a range of 1 to 4, but were modified by inflating the error covariances through an iterative procedure, resulting in χ 2 r values comparable to ours.In conclusion, the χ 2 r values give confidence that the assumed prior uncertainties are well defined.

Validation against eddy flux measurements
On the local scale the inversion shows the ability to capture the observed flux variability on a monthly scale, as shown for the nBB case (see Fig. 7).The residuals between posterior model and EC flux data for monthly-and site-averaged fluxes show a range of misfits not exceeding 1.04 gCm −2 day −1 , which is comparable with Broquet et al. (2013), who found misfits of up to 1.5 gCm −2 day −1 using 6 years of data (2002)(2003)(2004)(2005)(2006)(2007).Of note is that the estimated carbon uptake agrees well with the estimated uptake for 2007 in Broquet et al. (2013) (within the uncertainty range).However, in contrast to the synthetic inversion of Ko16, the real-data inversion showed a larger monthly-averaged posterior bias equal to 0.40 gCm −2 day −1 compared to the −0.04 gCm −2 day −1 for the synthetic case.The poorer performance in terms of bias compared to the synthetic case is presumably mainly caused by the representation error.In the synthetic inversion we created a true flux field at the same spatial resolution as the posterior flux estimates and sampled this true flux distribution at the specific EC measurement location.This does not include any spatial representation error of the EC measurements (footprint about 1 km) with respect to the spatial resolution of 25 km at which the fluxes are used within the inversion.A further cause for this poorer performance is related to the transport error, as in the synthetic case the same transport was used to create the synthetic observations and to perform the inversion, while in the real-data inversions the observed atmospheric mole fractions are a result of real transport, which can only be approximated with the transport model used for the inversion.
Differences between posterior flux retrievals and observed NEE fluxes at the EC stations are clearly driven by the crop sites.The good agreement between posterior inverse flux estimates and fluxes measured with the eddy covariance technique at non-crop sites can be attributed to the relatively stable, within the year, land condition.Contrastingly, crop areas are subject to human activities throughout the year.Soil enrichment with organic fertilizers, irrigation and harvesting can severely influence the carbon balance of the local ecosystem.Thus, the poor performance between inverse estimates and EC flux measurements at crop sites can be linked to the extensive anthropogenic influence on those areas.Further, another difficulty that is common for all the ecosystems is that atmospheric concentrations implicitly contain more components than just the NEE signal, e.g., fire emissions.Such emissions are captured in the atmospheric observations (representative scale of hundreds of kilometers) but might not be captured from the EC flux measurements, which have a very short representative scale of around 1 km.
Posterior fluxes showed a shift by 1 month earlier (in May), for the maximum carbon uptake (see also Fig. 7).An initial hypothesis that this might be driven from sites that are difficult to simulate, such as those located in mountain regions, cannot be justified.Specifically, mountain sites were excluded in an additional sensitivity analyses, but the temporal shift remains.However, looking into the error of the difference between two months suggests that the flux difference between May and June is not significant.The error of the difference was calculated using a Monte Carlo experiment.Fluxes were averaged over the stations and the monthly Table 4. Results from jackknife delete-1 statistics for VPRM estimated domain-wide NEE for different vegetation classes and for all of the land area.The uncertainty in NEE from all land area was derived assuming independence in the vegetation-class-specific uncertainties.Note the strong asymmetry between the fraction of land area covered by the different vegetation classes and the number of eddy covariance sites used, indicating over-and underrepresentation: for example, 8 crop sites represent 51 % of the land area, while 15 grassland sites represent 5.6 % of the land area of Europe.differences were calculated.Then we used the standard deviation of the differences over the ensemble members to describe the month-to-month uncertainty.

Mismatch in bottom-up and top-down methods
Of note is the strong flux correction when using a priori fluxes from VPRM with an uptake of 0.96 GtC yr −1 compared to 0.3 GtC yr −1 after the inversion.The large correction of about 0.66 GtC yr −1 corresponds to roughly twice the prior uncertainty.We note that VPRM is a diagnostic model that uses simple light use efficiency and respiration equations and MODIS indices, with parameters optimized to match hourly observations of NEE fluxes (Mahadevan et al., 2008).It does not account for land management and land use changes (i.e., crop harvest, deforestation); thus, it will estimate a strong sink even for lands that have been harvested, with the respiration fluxes resulting from the use of the harvest (e.g., as food) not included.Those so-called lateral carbon fluxes, which are modeled by the atmospheric inversion, account for approximately 0.165 GtC yr −1 of the prior-posterior flux difference (Ciais et al., 2008).The rest of the difference of about 0.5 GtC yr −1 might be related to local characteristics of EC sites, which VPRM is not able to represent.Spatial variations in NEE from VPRM are driven by those of EVI (enhanced vegetation index), which is used at a spatial resolution of 1 km.For example, a crop field with typical dimensions of 100-200 m surrounded by other fields with different crop rotation (and differing phenology) are hard to represent with 1 km resolution EVI (even with the highest possible resolution of 250 m for MODIS re-flectances).To quantitatively assess the impact of this representation error in combination with the selection of sites used for the VPRM optimization, the annual domain-wide C budget from VPRM was recalculated after omitting one site per vegetation type at a time and optimizing the VPRM parameters (jackknife delete-1 method).Detailed results are shown in Table 4.The derived jackknife standard error amounted to 0.54 GtC yr −1 , with a dominant contribution from the cropland vegetation class (0.50 GtC yr −1 ).This uncertainty can fully explain the mismatch between the a priori and the posterior fluxes, and it emphasizes the importance of site selection and site representativeness in upscaling local EC measurements to larger regions.
The estimated uncertainty for VPRM fluxes based on jackknifing is larger than the prior uncertainties assumed for the atmospheric inversions.Hence, one could argue that the prior fluxes using VPRM (which indicate a too strong sink) combined with a too small prior uncertainty in the inversion lead to erroneous flux estimates.However, the optimized biogenic fluxes from all inversions converge at the annual and domain-integrated scale.A particular example is that of the nBB inversion.Even though the GBIOME-BGCv1 fluxes differ greatly from those produced by VPRM, this inversion is fully in line with the results from the rest of the inversions, indicating that the optimized flux estimates are not biased by the a priori flux fields but instead are driven by the atmospheric data.

Sensitivity to anthropogenic emissions
Another source of biospheric flux misrepresentation is the fossil fuel inventories.As mentioned in Sect.2.3 we do not allow for corrections in anthropogenic emissions, as they are assumed to be better known than the terrestrial fluxes.An overestimation or underestimation in anthropogenic emissions will thus lead to a stronger or weaker biospheric sink in atmospheric inversions.The anthropogenic emissions we use are 0.32 GtC yr −1 (27 %) lower for the EU-12 countries compared to those used by Rivier et al. (2010) (1.2 GtC yr −1 ).Peylin et al. (2011) estimate the difference between national totals for the different emission inventories to be around 10 %.In a study by Ciais et al. (2009) uncertainties of total fossil fuel CO 2 emissions in the EU-25 member states were estimated to be 19 % based on four different emission inventories.For the EU-25 countries, EDGAR emissions were found to be 12 % larger than the mean of the GAINS (Greenhouse Gas and Air Pollution Interactions and Synergies), UNFCCC (United Nations Framework Convention on Climate Change) and CDIAC (Carbon Dioxide Information Analysis Center) inventories (Ciais et al., 2009, Table 2).Sensitivity tests with increased prior fossil fuel emissions showed that the added fossil fuel increases the estimated uptake by almost 50 % relative to the added anthropogenic emissions.Taking an extreme scenario in which the fossil fuel emissions are increased by 17 % or 0.3 GtC yr −1 (result-ing in 1.77 GtC yr −1 compared to 1.47 GtC yr −1 total emissions for the EU domain; see Fig. 1 in Kountouris et al., 2018, for the extent of the EU domain), we estimate a European carbon sink for the nBV setup of −0.51 ± 0.17 GtC yr −1 compared to −0.38 ± 0.17 GtC yr −1 for the standard nBV case.Thus, the additional assumed fossil fuel emissions increased the estimated uptake by 0.13 GtC yr −1 , which is about 44 % of the added anthropogenic emissions.The fact that the resulting increase in the biospheric sink does not fully correspond to the increase in assumed emissions is likely a result of the sparse network, in which emissions from regions further away from the measurement sites are not fully registered in the simulated mole fractions.
In this study we assumed that anthropogenic emissions are perfectly known (which is a traditional assumption in atmospheric inversions), although this is not the case.As a result of not allowing for a correction in the fossil fuel component, this correction will be added to the correction of the biogenic signal.In this paragraph we already discussed how uncertain fossil fuel emissions may be.Further, we estimated how the uncertainty in the fossil fuel component impacts the carbon flux estimates; the magnitude but also spatial and temporal flux distributions may be significantly erroneous.For better future carbon flux estimations, fossil fuel optimization seems to be necessary.However, that would require 14 C tracer measurements, which are currently not available.

Sensitivity to site selection
Uncertainties in vertical mixing and especially in the nocturnal boundary layer (Gerbig et al., 2008) should be carefully addressed as they might lead to erroneous estimations of the carbon uptake.Typically, in atmospheric inversions the model-data mismatch error (measurement error covariance) also accounts for uncertainties due to the transport (i.e., wrong representation of the nocturnal boundary layer).The set of network stations includes seven mountain stations, for which we use nighttime observations (daytime for nonmountain stations) as these measurements are considered to be representative for the free troposphere.Errors can be introduced if the measurement height assumed in the transport model is within the modeled nocturnal stable boundary layer while in the real world it is not, which would lead to an overestimation in the simulated CO 2 signals from respiration or vice versa.In the inversion this would be compensated for by introducing stronger uptake fluxes to match the observed CO 2 time series.In order to investigate whether our results are influenced by the use of mountain stations, we performed an additional inversion using the nBV error structure, but excluding all these stations.The resulting sink in Europe was found to be −0.41 ± 0.17 GtC yr −1 , which is fully in line with the nBV inversion using all sites, suggesting that our estimates are not biased due to misrepresentation of the mountain stations, at least on annual and domain-wide aggregation scales.
However, the spatial flux distribution seems to depend on the site selection and in particular on the mountain sites used in a given inversion.Ambiguous carbon fluxes, e.g., carbon sinks over nonproductive areas like the Alps, Great Britain and western Czech Republic, as well as carbon sources over cultivated lands like western France, Poland and Ukraine, were derived from the inversions (Fig. 3). Figure 4 presents the annual spatial flux distribution by using a network of stations with no mountain sites (MS0 case) and using an error structure that does not contain a bias term.This sensitivity test is equivalent to the nBB case in which we also used the GBIOME-BGC model as prior.Subsequently, we plot the flux distribution by adding one mountain site at a time (cases 1-7 in which the number denotes how many mountain sites are being used).The add-one mountain site sequence is as follows: CMN, OXK, PTR, JFJ, KAS, SIL, PUY.For the MS0 case, we observe that in the region around the Alps, and the neighboring countries, the sink is smaller compared to the other inversions.The OXK and the KAS sites seem to be responsible for the sink over the Czech Republic.The KAS site also seems to be the driver for the high carbon flux sources around Poland, Ukraine and the Black Sea coasts.
Using an error structure that allows for a bias term as the one in BVR case seems to moderate the spatial flux misrepresentation.Comparing the subplots nBV (without bias term) and BVR (with bias term) in Fig. 3, we see that the abovementioned highly productive regions (according to the simulation) show somewhat weaker sinks for the BVR case compared to the nBV case (indicated by the less bluish contours).Subsequently, regions that appear to be strong carbon sources (in the nBV case) show a weaker flux signal when the bias term is used (BVR).
Although this study uses as much information as possible, in terms of the available atmospheric observations, large areas, e.g., western France, the whole eastern European part, are still poorly or not constrained at all from the atmospheric network.Hence, the spatial flux distribution in those areas is prone to large uncertainties.

Retrieved fluxes and comparison to previous inverse estimates
The retrieved spatially resolved fluxes showed a sensitivity in their spatial patterns to the a priori error structure, specifically to the inclusion of a bias component, as indicated by differences between the nBV and BVR cases.Such differences were not identified in the synthetic experiment in Ko16; however, in Ko16 a much larger spatial correlation length scale was assumed.In the synthetic inversions the long correlation length (766 km in the zonal direction and 411 km in the meridional direction) drastically reduces the effective number of degrees of freedom, forcing the fluxes to be smoothly corrected, regardless of the use of the bias component.In the real-data inversions the shorter correlation length (around 100 km), combined with the required larger error in-flation (compared to the synthetic inversions) for the nBV and nBB cases, increases the effective number of degrees of freedom.By using a bias component (BVR, BVN, BVRT cases) or by using the hyperbolic correlation shape (nBVH) with stronger large-scale correlation, instead of inflating the spatiotemporal error component, fluxes remain less flexible on a grid scale.
Our knowledge regarding annual CO 2 flux estimates for Europe is still highly uncertain, in part due to the limited number of regional inversions focusing on this domain.Flux estimates from previous studies, mainly global inversions, show a wide range (Fig. 8).We estimated an annual European carbon sink (ranging between −0.23 ± 0.13 and −0.38 ± 0.17 GtC yr −1 for the different inversion scenarios, Fig. 5d), which is however representative for a smaller European region compared to the TransCom European region typically used in other studies.The upscaled flux estimates (see also Sect.2.4) for the TransCom EU region have a range of −0.39 to −0.71 GtC yr −1 .Ciais et al. (2000) estimated a European sink of −0.3 ± 0.8 GtC yr −1 for the target period 1985-1995.However, in contrast to our study they used a global system and a gap-filling algorithm since 42 % of the observational data were missing.A recent study from Peylin et al. (2013) computed the mean European sink for the period 1998-2001 to be −0.44 ± 0.45 GtC yr −1 by utilizing 11 different global inversion systems.Gurney et al. (2004) also performed global inversions and found the mean European annual fluxes for the 1992-1996 period to be −0.98 ± 0.4 GtC yr −1 , which is larger compared to our estimations.Moreover, our results for the mean net monthly fluxes over Europe agreed very well with Rivier et al. (2010), who estimated for the 1998-2001 time frame using five different transport models in their inversion that the maximum seasonal uptake occurs in July and lies between −10 and −80 gCm −2 month −1 , while our results show maximum uptake in June with a range of −33 to −37 gCm −2 month −1 for the different inversion cases.We note that the annual flux differences between our flux estimates and those from other studies may also be caused due to the interannual flux variability.Nevertheless, this should not be expected to critically drive those differences since posterior uncertainties were found to be larger than interannual variations (Broquet et al., 2013), making the significance of the variations questionable.
A recent study from Reuter et al. (2014) based on inversions using satellite observations estimated the carbon budget for the TransCom European region.For the year 2007 the sink was found to be −1.1 ± 0.30 GtC yr −1 , much larger compared to most other inversion estimates using ground observations.However, Feng et al. (2016) tried to investigate why atmospheric inversions using satellite observations show an elevated European uptake through a series of sensitivity tests.They linked the increased uptake when using satellite measurements to potential observation biases and to the emission spatial patterns.Further, Feng et al. (2016)  highlighted that the large European uptake is related to up to 60-90 % from systematically higher modeled CO 2 fluxes transported into Europe from regions outside of the domain.While this looks to be a problem related to column measurements, this is not the case in our study since ground observations were used.In addition, we use the two-step inversion scheme, which limits the influence from the far field as we calculate the concentration signal from outside the domain and subtract that from the observations.Whilst the flux uncertainties outside the domain are not propagated, they can still be expressed as uncertainties in the observation space.However, if biases were introduced from the global inversion to the fluxes outside of the domain, then regional flux estimations may differ.
On a national scale we can compare our results to those for the Netherlands obtained by Meesters et al. (2012), who estimated the annual national carbon sink to be about −0.017 ± 0.004 GtC yr −1 .Our estimations are very close, with a range of −0.012 ± 0.004 GtC yr −1 (BVR inversion) to −0.014 ± 0.005 for the nBB inversion.Of note is that the carbon budget estimates for the Netherlands agree remarkably well despite the substantial differences between the two studies: Meesters et al. (2012) used an inversion scheme that solves for scaling factors of the gross prior fluxes.Spatial correlations of 100 km were assumed but only for photosynthetic fluxes within the same land use class.In addition, the domain of interest (Netherlands) has a stronger constraint as four stations located within the domain were used, while our inversion only uses one station (CBW), with the rest of the stations being at least 360 km away (WES).Both studies assume approximately the same fossil fuel emissions (0.051 GtC yr −1 vs. 0.053 GtC yr −1 in Meesters et al., 2012).

Conclusions
An inverse modeling framework, based on the system described in Ko16, and using real atmospheric data from 16 stations in Europe was deployed to infer biospheric carbon fluxes.Different prior error structures were assumed to investigate how sensitive posterior fluxes are.The results are validated and compared on different temporal and spatial scales.Satisfactory agreement was found when posterior inverse flux estimates were compared against eddy covariance observations on a local scale, as well as against previous studies on national and continental scales, which gives us confidence for our carbon flux estimations.We calculated a sink for the European continent with amounts ranging from −0.23 ± 0.13 to −0.38 ± 0.17 GtC yr −1 depending on the assumed prior error structure.
A special effort was also made to avoid potential biased flux estimations due to site selection (i.e., heavily polluted sites, or sites that are within the nocturnal boundary layer such as mountain stations) by performing inversions using different network configurations.We did not observe any significant impact for domain-wide aggregated fluxes, at least for monthly and annual scales.However, changes in spatial flux patterns on a pixel scale should be expected, when network configuration is then changed.Further, we also studied how sensitive biospheric carbon fluxes are, when incorrect fossil fuel emissions are assumed.We found that due to the network sparseness the fossil fuel emissions are not fully captured in the simulated mole fractions, which may bias the flux estimates.
What do we learn or what should we expect then from the top-down approach?The current analysis, including the technical note in Ko16, suggests that aggregated fluxes on monthly (temporally) and country (spatially) scales can be successfully retrieved from the inversion system.However, retrieving spatially resolved fluxes on finer scales is still rather challenging.A lack of observations for extended European regions, complexity of the terrain, especially in mountainous regions, and the absence of fossil fuel measurements that would otherwise allow the separation of fossil fuel signals from biospheric signals in observed CO 2 time series are the current problems that regional inversions are facing.Whilst ICOS (Integrated Carbon Observing System) will introduce more stations in the European continent, inversions should use all the available information; this could be achieved by assimilating multiple data streams like continuous and flask measurements in combination with satellitederived information, aiming to constrain the European continent as tightly as possible.Further, new stations should also aim to measure combustion tracers.It would be of great help in future inversion systems to be able to update the anthropogenic emission maps and subsequently to more accurately compute the biogenic signal.

Figure 1 .
Figure 1.Daily nighttime (23:00-04:00 UTC) averages for prior, true and posterior CO 2 dry mole fraction time series for the Schauinsland site for the real-data inversion.Time starts at 1 January 2007.

Figure 2 .
Figure 2. Taylor diagram for modeled and observed time series of CO 2 dry mole fractions.Prior (black), observed (green, the perfect match of modeled and observed time series) and the different inversion cases (nBV is blue; BVR is red) are displayed.Different symbols denote different atmospheric stations.The normalized SD was calculated as the ratio of the SD of the modeled time series to the SD of observations.Gray semicircles show contours of the SD of the model error.

Figure 3 .
Figure 3. Annual biogenic flux spatial distribution (a) and flux innovations (posterior minus prior, b) as estimated from the different inversions for the real-data case.Units are in gC yr −1 m −2 .

Figure 4 .
Figure 4.As Fig. 3, but only for the nBB inversion case.The numbers denote the number of mountain sites used in the inversions, e.g., MS0: no mountain site.Units are in gC yr −1 m −2 .

Figure 5 .
Figure 5. Monthly and annual (d) biosphere fluxes integrated over the domain.Panel (a) shows the nBV, nBB and nBVH cases, panel (b) shows the BVR and BVN cases, and panel (c) shows the BVRT and nBV14 cases.Note that all inversions share the same annual prior uncertainty but monthly prior uncertainties differ.Units are in GtC month −1 and GtC yr −1 for monthly and annual fluxes, respectively.

Figure 6 .
Figure 6.Temporal evolution of prior and posterior monthly NEE for selected European countries.

Figure 7 .
Figure 7. Temporal evolution of monthly NEE (gCm −2 day −1 ) averaged over all EC sites (a), excluding crop sites (b) and using only crop sites (c).Uncertainties (error of the mean monthly NEE) are indicated by the shaded areas.

Figure 8 .
Figure 8. Annual European biogenic CO 2 fluxes in GtC yr −1 for the different inversions and comparison to previous studies.Fluxes are upscaled to the TransCom EU domain.Labels of the references are as follows: Ci: Ciais et al. (2000); Gu: Gurney et al. (2004); Ri: Rivier et al. (2010); Pe: Peylin et al. (2013); Re: Reuter et al. (2014).Periods for the inverted fluxes are given below the flux estimates.

Table 2 .
Overview of the inversion scenarios."Shape" describes the internal structure of the bias component (proportional to respiration R or to net ecosystem exchange, NEE), and "Time vary" indicates whether the bias component also has temporal variations or not.The fifth column "Prior" represents the terrestrial model used as prior, and "Correlation shape" describes the functional form used for the spatial prior uncertainty correlation, either exponential (E) or hyperbolic (H).The last column indicates whether the full or the reduced station network was assumed.

Table 3 .
RMSD (first column in ppm) and correlation coefficients (second column) between observations and prior-posterior CO 2 dry mole fractions for daily daytime or nighttime averaged values and for each station.The third column shows χ 2 c , the normalized dry mole fraction mismatch per degree of freedom for 7-day averaged residuals as a measure of how well the data were fitted.The format for each station is as follows: RMSD | r 2 | χ 2 .