Inverse modelling of European CH 4 emissions during 2006–2012 using different inverse models and reassessed atmospheric observations

P. Bergamaschi et al.: Inverse modelling of European CH 4 emissions
Abstract. We present inverse modelling (top down) estimates of European methane (CH 4 ) emissions for 2006-2012 based on a new quality-controlled and harmonised in situ data set from 18 European atmospheric monitoring stations. We applied an ensemble of seven inverse models and performed four inversion experiments, investigating the impact of different sets of stations and the use of a priori information on emissions.
The inverse models infer total CH 4 emissions of 26.8 (20.2-29.7) Tg CH 4 yr −1 (mean, 10th and 90th percentiles from all inversions) for the EU-28 for 2006-2012 from the four inversion experiments. For comparison, total anthropogenic CH 4 emissions reported to UNFCCC (bottom up, based on statistical data and emissions factors) amount to only 21.3 Tg CH 4 yr −1 (2006) to 18.8 Tg CH 4 yr −1 (2012). A potential explanation for the higher range of top-down estimates compared to bottom-up inventories could be the contribution from natural sources, such as peatlands, wetlands, and wet soils. Based on seven different wetland inventories from the Wetland and Wetland CH 4 Inter-comparison of Models Project (WETCHIMP), total wetland emissions of 4.3 (2.3-8.2) Tg CH 4 yr −1 from the EU-28 are estimated. The hypothesis of significant natural emissions is supported by the finding that several inverse models yield significant seasonal cycles of derived CH 4 emissions with maxima in summer, while anthropogenic CH 4 emissions are assumed to have much lower seasonal variability. Taking into account the wetland emissions from the WETCHIMP ensemble, the top-down estimates are broadly consistent with the sum of anthropogenic and natural bottom-up inventories. However, the contribution of natural sources and their regional distribution remain rather uncertain.
Furthermore, we investigate potential biases in the inverse models by comparison with regular aircraft profiles at four European sites and with vertical profiles obtained during the Infrastructure for Measurement of the European Carbon Cycle (IMECC) aircraft campaign. We present a novel approach to estimate the biases in the derived emissions, based on the comparison of simulated and measured enhancements of CH 4 compared to the background, integrated over the entire boundary layer and over the lower troposphere. The estimated average regional biases range between −40 and 20 % at the aircraft profile sites in France, Hungary and Poland.

Introduction
Atmospheric methane (CH 4 ) is the second most important long-lived anthropogenic greenhouse gas (GHG) after carbon dioxide (CO 2 ) and contributed ∼ 17 % to the direct anthropogenic radiative forcing of all long-lived GHGs in 2016, relative to 1750 (NOAA Annual Greenhouse Gas Index, AGGI; Butler and Montzka, 2017). The globally averaged tropospheric CH 4 mole fraction reached a new high of 1842.7 ± 0.5 ppb in 2016 (global average from marine surface sites; Dlugokencky, 2017), more than 2.5 times the pre-industrial level (WMO, 2016b). The increase in atmospheric CH 4 has been monitored by direct atmospheric measurements since the late 1970s (Blake and Rowland, 1988; Cunnold et al., 2002; Dlugokencky et al., 1994, 2011). Atmospheric growth rates were large in the 1980s, decreased in the 1990s and were close to zero from 1999 onwards. Since 2007, atmospheric CH 4 has increased again significantly (Dlugokencky et al., 2009; Nisbet et al., 2014; Rigby et al., 2008), at an average growth rate of 5.7 ± 1.1 ppb yr −1 during 2007-2013 and at a further increased rate of 10.0 ± 2.5 ppb yr −1 during 2014-2016 (Dlugokencky, 2017).
While the global net balance (global sources minus global sinks) of CH 4 is well defined by the atmospheric measurements of in situ CH 4 mole fractions at global background stations, the attribution of the observed spatial and temporal variability to specific sources and regions remains very challenging (Houweling et al., 2017; Kirschke et al., 2013; Saunois et al., 2016). Global inverse models are widely used to estimate emissions of CH 4 at global/continental scale, using mainly high-accuracy surface measurements at remote stations (e.g. Bergamaschi et al., 2013; Bousquet et al., 2006; Mikaloff Fletcher et al., 2004a, b; Saunois et al., 2016). In addition, satellite retrievals of GHGs have also been used in a number of studies. In particular, near-IR retrievals from SCIAMACHY and GOSAT providing column average mole fractions (XCH 4 ) have been demonstrated to provide additional information on the emissions at regional scales (Alexe et al., 2015; Bergamaschi et al., 2009; Wecht et al., 2014). However, current satellite retrievals may still have biases and their use in atmospheric models is at present limited by the shortcomings of models in realistically simulating the stratosphere, especially at higher latitudes (Alexe et al., 2015; Locatelli et al., 2015). Furthermore, integration over the entire column implies that the signal from the CH 4 variability in the planetary boundary layer (which is directly related to the regional emissions) is reduced in the retrieved XCH 4 .
In contrast, in situ measurements at regional surface monitoring stations can directly monitor the atmospheric mole fractions within the boundary layer, providing strong constraints on regional emissions. These regional monitoring stations have been set up over the past years, especially in the United States (Andrews et al., 2014) and Europe (e.g. Levin et al., 1999; Lopez et al., 2015; Popa et al., 2010; Schmidt et al., 2014; Vermeulen et al., 2011). The measurements from these stations were used in a number of inverse modelling studies to estimate emissions at regional and national scales (Bergamaschi et al., 2010; Ganesan et al., 2015; Henne et al., 2016; Kort et al., 2008; Manning et al., 2011; Miller et al., 2013). A specific objective of these studies is the verification of bottom-up emission inventories reported under the United Nations Framework Convention on Climate Change (UNFCCC), which are based on statistical activity data and measured or estimated emission factors (IPCC, 2006). For many CH 4 source sectors (e.g. fossil fuels, waste, agriculture), emission factors exhibit large spatial, temporal, and site-to-site variability (e.g. Brandt et al., 2014), which inherently limits the capability of bottom-up approaches to provide accurate total emissions. Particular challenges are the representation of high emitters or super emitters in bottom-up inventories (Zavala-Araiza et al., 2015) but also of minor source categories (e.g. abandoned coal mines or landfill sites), which, if not properly accounted for, may result in incorrect inventories. Independent verification using atmospheric measurements and inverse modelling is therefore considered essential to ensuring the environmental integrity of reported emissions (Levin et al., 2011; National Academy of Science, 2010; Nisbet and Weiss, 2010; Weiss and Prinn, 2011) and has been suggested for use in the envisaged transparency framework under the Paris Agreement (WMO, 2016a).
Inverse modelling (top down) is a mass-balance approach, providing information from the integrated emissions from all sources. However, the quality of the derived emissions critically depends on the quality and density of measurements and the quality of the atmospheric models used. In particular, when aiming at verification of bottom-up inventories, thorough validation of inverse models and realistic uncertainty estimates of the top-down emissions are essential. Bergamaschi et al. (2015) showed that the range of the derived total CH 4 emissions from north-western and eastern Europe using four different inverse modelling systems was considerably larger than the uncertainty estimates of the individual models. While the latter typically use Bayes' theorem to calculate the reduction of assumed a priori emission uncertainties by assimilating measurements (propagating estimated observation and model errors to the estimated emissions), an ensemble of inverse models may provide more realistic overall uncertainty estimates, since estimates of model errors are often based on strongly simplified assumptions and do not represent the total uncertainty. Furthermore, validation of the inverse models against independent observations not used in the inversion is important to assess the quality of the inversions.
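The Bayesian reduction of a priori uncertainty described above can be illustrated with a minimal scalar sketch. It assumes a single emission total with a Gaussian prior and a single Gaussian "observation" of that same total; real inversions use an atmospheric transport model to link emissions to mole fractions, which is omitted here. The function name and all numbers are hypothetical and are not taken from any of the models in this study.

```python
def bayesian_update(prior, prior_sigma, obs, obs_sigma):
    """Combine a Gaussian prior and a Gaussian observation of the same scalar.

    Returns the posterior mean and posterior 1-sigma uncertainty.
    Assumption: the observation constrains the emission total directly.
    """
    w_prior = 1.0 / prior_sigma ** 2   # precision (inverse variance) of the prior
    w_obs = 1.0 / obs_sigma ** 2       # precision of the observation
    post_var = 1.0 / (w_prior + w_obs)
    post_mean = post_var * (w_prior * prior + w_obs * obs)
    return post_mean, post_var ** 0.5


# Hypothetical values in Tg CH4 per year, chosen for illustration only.
prior, prior_sigma = 20.0, 6.0
obs, obs_sigma = 26.0, 3.0

post, post_sigma = bayesian_update(prior, prior_sigma, obs, obs_sigma)
uncertainty_reduction = 1.0 - post_sigma / prior_sigma

print(f"posterior: {post:.1f} +/- {post_sigma:.2f}")
print(f"uncertainty reduction: {uncertainty_reduction:.0%}")
```

The posterior uncertainty in such a calculation depends only on the assumed prior and observation errors, which is one way to see why an ensemble of independent inverse models may give a more realistic picture of the total uncertainty than any single formal error estimate.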
Here, we present a new analysis, estimating European CH 4 emissions over the time period 2006-2012 using seven different inverse models. We apply a new, quality-controlled, and harmonised data set of in situ measurements from 18 European atmospheric monitoring stations generated within the European FP7 project InGOS (Integrated non-CO 2 Greenhouse gas Observing System). The InGOS data set is complemented by measurements from additional European and global discrete air sampling sites. Compared to the previous paper by Bergamaschi et al. (2015), which analysed 2006-2007 emissions, this study extends the target period (2006-2012), takes advantage of the larger and more stringently quality-controlled observational data set, and includes additional inverse models. Furthermore, we present a more comprehensive validation of model results using an extended set of aircraft observations, aiming at a more quantitative assessment of the overall errors. Finally, we examine in more detail the potential contribution of natural emissions (such as peatlands, wetlands, or wet soils) using seven different wetland inventories from the Wetland and Wetland CH 4 Intercomparison of Models Project (WETCHIMP) (Wania et al., 2013).

Atmospheric measurements
The European monitoring stations used in this study are compiled in Table 1 and their locations are shown in Fig. 1. The core data set is from 18 stations with in situ CH 4 measurements. These measurements have been rigorously quality-controlled within the InGOS project. The quality control includes regular measurements of target gases that monitor instrument performance and long-term stability (Hammer et al., 2013; Lopez et al., 2015; Schmidt et al., 2014; WMO, 1993). The instrument precision has been evaluated as a 24 h moving 1σ standard deviation of bracketing working standards (denoted "working standard repeatability"). A suite of other quality measures, error contributions, and uncertainty in non-linearity corrections, potentially causing systematic biases between stations, have been investigated (Vermeulen, 2016). However, they have not been used in the inversions. The in situ measurements are reported as hourly average dry-air mole fractions (in units of nmol mol −1 , abbreviated as ppb), including the standard deviation of all individual measurements within 1 h.
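As an illustration of the "working standard repeatability" defined above, the following sketch computes a 24 h moving 1σ standard deviation over a series of working-standard measurements. The centred window, the function name, and the sample values are assumptions made for this example; the exact windowing used within InGOS is not specified here.

```python
from statistics import stdev


def moving_repeatability(times_h, values_ppb, window_h=24.0):
    """1-sigma standard deviation of working-standard values in a moving window.

    times_h:    measurement times in hours
    values_ppb: working-standard mole fractions in ppb
    Assumption: the window is centred on each measurement time.
    """
    half = window_h / 2.0
    result = []
    for t in times_h:
        in_window = [v for tt, v in zip(times_h, values_ppb) if abs(tt - t) <= half]
        # A sample standard deviation needs at least two values.
        result.append(stdev(in_window) if len(in_window) >= 2 else float("nan"))
    return result


# Hypothetical working-standard series (hours, ppb).
times = [0.0, 12.0, 24.0]
values = [1900.0, 1901.0, 1902.0]
repeatability = moving_repeatability(times, values)
print(repeatability)
```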
At most stations, the measurements have been performed using gas chromatography (GC) systems equipped with flame ionisation detectors (FID). At the station Pallas (PAL), a GC-FID was applied until January 2009 and then replaced by a cavity ring-down spectrometer (CRDS). CRDS measurements (which are superior in precision compared to GC-FID) also started at other measurement sites, but here we used the GC measurements wherever available for the sake of time series consistency, while CRDS measurements were included for quality control and error assessment.
The InGOS measurements are calibrated against the NOAA-2004 standard scale (which is equivalent to the World Meteorological Organization Global Atmosphere Watch WMO-CH4-X2004 CH 4 mole fraction scale) (Dlugokencky et al., 2005), except the InGOS measurements at Mace Head (MHD), for which the Tohoku University (TU) CH 4 standard scale has been used (Aoki et al., 1992; Prinn et al., 2000). The two calibration scales are in close agreement. Based on parallel measurements by NOAA and Advanced Global Atmospheric Gases Experiment (AGAGE) at five globally distributed stations over more than 20 years, an average difference of 0.3 ± 1.2 ppb between the two scales has been found. This difference is not considered significant, and therefore no scale correction has been applied. In this study, we use the InGOS "release 2014" data set.
Table 1. European monitoring stations used in this study: s.h. is the sampling height (m) above ground; ST specifies the sampling type (I is in situ measurements; D is discrete air sample measurements). The last four columns indicate the use of the corresponding station data set in the inversions S1-S4 (see Sect. 3.1).
Six InGOS stations are equipped with tall towers, with uppermost sampling heights of 96-300 m above the surface, eight sites are surface stations (at low altitudes) with sampling heights of 6-60 m, and four sites are mountain stations (at altitudes between 1205 m and 3575 m a.s.l.).
The in situ measurements at the InGOS stations are complemented by discrete air samples from the NOAA Earth System Research Laboratory (ESRL) global cooperative air sampling network at 11 European sites (and additional global NOAA sites used for the global inverse models) (Dlugokencky et al., 1994, 2009) and at five sites from the French RAMCES (Réseau Atmosphérique de Mesure des Composés à Effet de Serre) network. The discrete air measurements are taken from samples which are usually collected weekly.
For validation of the inverse models, we use CH 4 measurements of discrete air samples from four European aircraft profile sites at Griffin, Scotland (GRI), Orléans, France (ORL), Hegyhátsál, Hungary (HNG) and Białystok, Poland (BIK) (see Fig. 1). The analyses of the samples from GRI, ORL and HNG were performed at the Laboratoire des Sciences du Climat et de l'Environnement (LSCE) with the same GC used for RAMCES sites. The samples from BIK were analysed at the Max Planck Institute for Biogeochemistry (MPI).
Furthermore, we use airborne in situ measurements from a campaign over Europe, which was performed in September/October 2009 as part of the Infrastructure for Measurement of the European Carbon Cycle (IMECC) project (Geibel et al., 2012). All measurements of the discrete air samples (from the NOAA and RAMCES surface sites and LSCE and MPI aircraft profile sites) and from the IMECC aircraft campaign are calibrated against the WMO-CH4-X2004 scale.

Inversions
Four inversions were performed, investigating the impact of different sets of stations and the use of a priori information on emissions (see Table 2). Inversion S1 covers 2006-2012 using a base set of observations (including only stations with maximum data gaps of 1 year), while inversions S2, S3, and S4 were performed for the years 2010-2012 and include additional stations, for which not all data are available before 2010. In S1, S2, and S3 the InGOS data set is used along with the discrete air samples from NOAA and RAMCES surface sites, while in S4 only the InGOS data are used. The exact sets of stations applied in the different inversion experiments are indicated in Table 1. Inversions S1, S2, and S4 use a priori information of CH 4 emissions from gridded inventories. For the anthropogenic CH 4 emissions, the EDGARv4.2FT-InGOS inventory is used, which integrates information on major point sources from the European Pollutant Release and Transfer Register (E-PRTR) into the EDGARv4.2FastTrack CH 4 inventory (http://edgar.jrc.ec.europa.eu/overview.php?v=ingos) (Janssens-Maenhout et al., 2014). Since EDGARv4.2FT-InGOS only covers the period 2000-2010, the inventory of 2010 has also been applied as a priori for 2011 and 2012. For the natural CH 4 emissions from wetlands, most models used the wetland inventory of J. Kaplan (Bergamaschi et al., 2007) as a priori, except TM5-CTE, which applied LPX-Bern v1.0 instead. Inversion S3 was performed without using detailed bottom-up inventories as a priori, in order to analyse the constraints of observed atmospheric CH 4 on emissions independent of a priori information (using a homogeneous distribution of emissions over land and over the ocean, respectively, as starting point for the inversions in a similar manner as in Bergamaschi et al., 2015; for further details see Sect. S1 of the Supplement).
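The rule for choosing the a priori inventory year stated above (the 2010 inventory is reused for 2011 and 2012) can be written as a small helper. This is only a sketch of that rule; the function name and its error handling are hypothetical and do not correspond to the code of any of the inverse modelling systems.

```python
FIRST_INVENTORY_YEAR = 2000  # first year covered by EDGARv4.2FT-InGOS (from the text)
LAST_INVENTORY_YEAR = 2010   # last year covered by EDGARv4.2FT-InGOS (from the text)


def prior_inventory_year(simulation_year):
    """Return the inventory year used as a priori for a given simulation year.

    Years after the last inventory year fall back to the last available year.
    """
    if simulation_year < FIRST_INVENTORY_YEAR:
        raise ValueError("no inventory available before the first inventory year")
    return min(simulation_year, LAST_INVENTORY_YEAR)


# Inversion S1 covers 2006-2012.
mapping = {year: prior_inventory_year(year) for year in range(2006, 2013)}
print(mapping)
```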

Atmospheric models
The atmospheric models used in this study are listed in Table 3. The models include global Eulerian models with a zoom over Europe (TM5-4DVAR, TM5-CTE, LMDZ), regional Eulerian models (CHIMERE), and Lagrangian dispersion models (STILT, NAME, COMET). The horizontal resolutions over Europe are ∼ 1.0-1.2° (longitude) × ∼ 0.8-1.0° (latitude) for the global models (zoom) and ∼ 0.17-0.56° (longitude) × ∼ 0.17-0.5° (latitude) for the regional models. Most models are driven by meteorological fields from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalysis (Dee et al., 2011). In the case of STILT, the operational ECMWF analyses were used, while for NAME meteorological analyses of the Met Office Unified Model (UM) were employed. The regional models use boundary conditions (background CH 4 mole fractions) from inversions of the global models (STILT from TM3, COMET from TM5-4DVAR, CHIMERE from LMDZ) or estimate the boundary conditions in the inversions (NAME) using baseline observations at Mace Head as a priori estimates.
In the case of NAME and CHIMERE, the boundary conditions are further optimised in the inversion.
The inverse modelling systems applied in this study use different inversion techniques. TM5-4DVAR, LMDZ, and TM3-STILT use 4DVAR variational techniques, which allow optimisation of emissions of individual grid cells. These 4DVAR techniques employ an adjoint model in order to iteratively minimise the cost function using a quasi-Newton (Gilbert and Lemaréchal, 1989) or conjugate gradient (Rödenbeck, 2005) algorithm. The NAME model applies a simulated annealing technique, a probabilistic technique for approximating the global minimum of the cost function. In CHIMERE and COMET, the inversions are performed analytically after reducing the number of parameters to be optimised by aggregating individual grid cells before the inversion. TM5-CTE applies an ensemble Kalman filter (EnKF) (Evensen, 2003), with a fixed-lag smoother (Peters et al., 2005). All models used the same observational data set described in Sect. 2 (except the stations ZEP and ICE, which are outside the domain of some regional models, and except the mountain stations JFJ, PDM, and KAS, which were not used in the NAME inversions). For the stations with in situ measurements in the boundary layer, most models only assimilated measurements in the early afternoon (between 12:00 and 15:00 LT) and for mountain stations only nighttime measurements were assimilated (between 00:00 and 03:00 LT). However, NAME and COMET used observations at all times. The different models have different approaches to estimate the uncertainties of the observations (including the measurement and model uncertainties), which determine the weighting of the individual observations in the inversions. In general, the estimated model uncertainties depend on the type of station and for some models (TM5-4DVAR and NAME) also on the specific synoptic situation.
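The local-time selection of observations described above can be sketched as a simple filter. The two windows are taken from the text; treating the end of each window as exclusive, the station-type labels, and the function name are assumptions made for this illustration.

```python
from datetime import datetime

# Assimilation windows in local time (hours), as stated in the text.
# Assumption: the start hour is inclusive and the end hour is exclusive.
WINDOWS = {
    "boundary_layer": (12, 15),  # early afternoon
    "mountain": (0, 3),          # nighttime
}


def is_assimilated(local_time, station_type):
    """Return True if an observation at local_time falls in the station's window."""
    start, end = WINDOWS[station_type]
    return start <= local_time.hour < end


# Hypothetical hourly observation times at one station of each type.
afternoon = datetime(2010, 7, 1, 13, 0)
night = datetime(2010, 7, 1, 2, 0)

print(is_assimilated(afternoon, "boundary_layer"))  # afternoon value at a boundary-layer site
print(is_assimilated(night, "mountain"))            # nighttime value at a mountain site
print(is_assimilated(afternoon, "mountain"))        # afternoon value at a mountain site
```

Models that use observations at all times (NAME and COMET in this study) would simply skip such a filter.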
The individual inverse modelling systems are described in more detail in the Supplement (Sect. S1).

Results and discussion
European CH 4 emissions
Figure 2 shows the maps of the European CH 4 emissions (average 2010-2012) derived from the seven inverse models for inversion S4. The corresponding maps for inversions S1-S3 (available from five models) are shown in the Supplement (Figs. S1-S3). In S1, S2, and S4, which are guided by the a priori information from the emission inventories, the a posteriori spatial distributions are usually close to the prior patterns on smaller scales (determined by the chosen spatial correlation scale lengths). The NAME inversion groups together grid cells for which the observational constraints are weak; i.e. it averages over increasingly larger areas at larger distances from the observations. Consequently, in the NAME inversion the "fine structure" of the a priori inventories disappears in areas which are not well constrained (e.g. Spain). Apart from this specific feature of the NAME model, some further differences in the spatial patterns derived by the different models are apparent. One example is the relatively high emissions derived by the COMET model in north-western Poland and north-eastern Germany. Such differences on smaller spatial scales are probably partly due to differences in model transport and different weighting of the observations (i.e. different assumptions of model-data mismatch errors) but may also reflect, to some extent, noise in the inverse modelling systems.
Comparing inversions S1, S2, and S4 shows overall very similar spatial patterns for all inverse models, indicating only moderate differences in the observational constraints of the three different sets of stations. In particular, addition of NOAA and RAMCES discrete air samples (inversion S2 vs. S4) results in only minor differences in the derived emissions. When the larger set of InGOS stations (S2 vs. S1) is used, most models yield higher CH 4 emissions from northern Italy. This is most likely mainly due to the observations from Ispra (IPR), at the north-western edge of the Po Valley, while this area is not well constrained in S1.
The information content of the observations is further examined in inversion S3, which does not use detailed emission inventories (Fig. S3), similarly to a previous sensitivity experiment in Bergamaschi et al. (2015). In particular, TM5-4DVAR and TM3-STILT yield similar spatial distributions with elevated CH 4 emissions from the BENELUX area and north-western Germany, from the coastal area of north-western France, Ireland, the UK, and the Po Valley. Most of these patterns are also visible in inversion S3 of NAME but with more variability on smaller scales (while TM5-4DVAR and TM3-STILT show much smoother distributions). These regional hotspots are broadly consistent with the bottom-up inventories, which illustrates the principal capability of inverse modelling to derive emissions that are independent of detailed a priori inventories in the vicinity of observations. LMDZ and TM5-CTE also show elevated emissions over western and central Europe but, in contrast to the other three inverse models, no regional hotspots. For TM5-CTE this is related to the applied inversion technique (adjusting emissions uniformly over large predefined regions), which effectively limits the number of degrees of freedom and does not allow retrieval of regional hotspots if such patterns are not a priori present within the predefined regions. For LMDZ, the lack of regional hotspots is probably related to the specific settings for this scenario, with a spatial correlation scale length of 500 km, significantly larger than in TM5-4DVAR (50 km) and TM3-STILT (60 km).
Filled blue circles are the locations of the InGOS measurement stations. Upper-left panel shows a priori CH 4 emissions (as applied in TM5-4DVAR at 1° × 1° resolution, while regional models use a higher resolution for the a priori emissions). Dates are mm/yyyy.
Figure 3a displays the annual total European CH 4 emissions derived by the models for 2006-2012 in inversion S1, and for 2010-2012 in S2-S4. The figure shows the total emissions from all EU-28 countries and separately the emissions from northern Europe (Norway, Sweden, Finland, Baltic countries, and Denmark), western Europe (UK, Ireland, Netherlands, Belgium, Luxembourg, France, Germany, Switzerland, and Austria), eastern Europe (Poland, Czech Republic, Slovakia, and Hungary), and southern Europe (Portugal, Spain, Italy, Slovenia, Croatia, Greece, Romania, and Bulgaria).
The non-EU-28 countries Norway and Switzerland are included here in northern Europe and western Europe, respectively, but not in the EU-28. Six of the seven models yield considerably higher total CH 4 emissions from the EU-28 compared to the anthropogenic CH 4 emissions reported to UNFCCC (submission 2016), while NAME is very close to the UNFCCC emissions. This behaviour is also apparent for the European subregions western, eastern and southern Europe, while for northern Europe (where natural CH 4 emissions play a large role) NAME also yields higher total CH 4 emissions compared to UNFCCC (except for S3 in 2011 and 2012). Figure 3a also shows the results from the previous study of Bergamaschi et al. (2015), which used four inverse models (previous versions of those applied in this study) and a set of 10 European stations with continuous measurements (complemented by discrete air samples) to estimate CH 4 emissions in 2006-2007. For TM5-4DVAR, TM3-STILT, and LMDZ the results are relatively similar (within ∼ 10 % for EU-28) to this study, while the CH 4 emissions from NAME were ∼ 20 % lower (EU-28). Despite the significantly larger number of European monitoring stations in the present study, however, we emphasise that the available stations do not cover the whole EU-28 area very well. Consequently, the emissions especially from southern Europe remain poorly constrained.
For comparison of total emissions derived by the inverse models and anthropogenic emissions from emission inventories it is essential to account for natural emissions, especially from wetlands, peatlands, and wet soils. For an estimate of these emissions and their uncertainties, we use an ensemble of seven wetland inventories from the Wetland and Wetland CH 4 Inter-comparison of Models Project (WETCHIMP) (Wania et al., 2013) (the spatial distribution of European CH 4 emissions from the different individual WETCHIMP inventories is shown in Fig. S4). Figure 3a shows the mean, median, minimum, and maximum CH 4 emissions from this ensemble for EU-28 and the different European subregions. These quantities are evaluated after integrating over the corresponding areas, using the multi-annual mean (1993-2004) of the WETCHIMP inventories. For northern Europe, in particular, the estimated wetland emissions are high (2.5 (1.7-4.3) Tg CH 4 yr −1 , mean, minimum, maximum) and exceed the anthropogenic CH 4 emissions (UNFCCC: 1.3 Tg CH 4 yr −1 ; mean 2006-2012). Substantial wetland emissions are also estimated for western Europe (1.6 (0.4-3.1) Tg CH 4 yr −1 ), but wetland emissions are also non-negligible for eastern Europe (0.3 (0.03-0.9) Tg CH 4 yr −1 ) and southern Europe (0.6 (0.01-1.1) Tg CH 4 yr −1 ), especially when considering the upper range of these estimates. For EU-28, wetland emissions of 4.3 (2.3-8.2) Tg CH 4 yr −1 are estimated, corresponding to 22 % (11-41 %) of reported anthropogenic CH 4 emissions.
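The relative contribution of wetland emissions quoted above can be approximately reproduced with a short calculation. The wetland values are those given in the text; the anthropogenic reference value is an assumption of this sketch, taken as the midpoint of the UNFCCC totals for 2006 and 2012 quoted in the abstract, because the exact 2006-2012 mean is not stated here. The results therefore agree only approximately with the percentages in the text.

```python
# WETCHIMP wetland emissions for EU-28 in Tg CH4 per year (from the text).
wetland = {"mean": 4.3, "min": 2.3, "max": 8.2}

# UNFCCC anthropogenic totals in Tg CH4 per year (from the abstract).
anthropogenic_2006 = 21.3
anthropogenic_2012 = 18.8

# Assumption: use the midpoint of the 2006 and 2012 values as a stand-in
# for the 2006-2012 mean of reported anthropogenic emissions.
anthropogenic_ref = (anthropogenic_2006 + anthropogenic_2012) / 2.0

# Wetland emissions as a percentage of the anthropogenic reference value.
share = {key: 100.0 * value / anthropogenic_ref for key, value in wetland.items()}

for key, value in share.items():
    print(f"{key}: {value:.1f} % of anthropogenic emissions")
```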
Taking into account the estimates of the WETCHIMP ensemble brings the results of the six inverse models that derive high emissions into the upper uncertainty range of the sum of anthropogenic emissions (reported to UNFCCC) and wetland emissions, while the emissions derived by NAME fall in the lower range (Fig. 3b). This analysis suggests broad consistency between bottom-up and top-down emission estimates, albeit with a clear tendency (6 of 7 models) towards the upper range of the bottom-up inventories for the total CH 4 emissions from the EU-28. This behaviour is also apparent for western and southern Europe, while for eastern Europe several models are close to or above the upper uncertainty bound (NAME is very close to the mean), and for northern Europe several models are in the lower range (or below the lower uncertainty bound) of the combined UNFCCC and WETCHIMP inventory.
Critical to the assessment of consistency between the different approaches is the analysis of their uncertainties. Inverse models usually propagate estimated observation and model errors to the estimated emissions. However, in particular, the model errors are generally based on simplified assumptions. Furthermore, the error estimates of the inverse models usually take only random errors into account and are based on the assumption that observation and model errors are unbiased. Estimated 2σ uncertainties for EU-28 top-down emissions range between ∼ 7 and ∼ 33 % (except for inversion S3 of NAME, for which uncertainties are larger than 50 %). For the subregions northern Europe and southern Europe, which are poorly constrained by measurements, the model estimates of the relative uncertainties are significantly larger, ranging between ∼ 20 and more than ∼ 100 %.
The (2σ) uncertainties of the UNFCCC inventories shown in Fig. 3a are based on the uncertainties of major CH 4 source categories reported by the countries in their national inventory reports. To calculate the uncertainties of total emissions per country (or group of countries), the reported uncertainties per category were aggregated. We note, however, that uncertainties reported for the same category by different countries exhibit large differences (e.g. for coal between 9 and 300 %, for oil and natural gas between 5 and 460 %, for enteric fermentation between 7 and 50 %, for manure management between 5 and 100 %, and for solid waste disposal between 22 and 126 %), with the lower uncertainty estimates appearing unrealistically low. Furthermore, the estimates of the total uncertainties consider only the major categories (EU-28: 93 % of reported emissions) and do not take into account potential additional emissions (and their uncertainties) that are not covered by the inventories. Figure 3a also includes the anthropogenic CH 4 emissions from EDGARv4.2FT-InGOS (for 2006, which are at the upper uncertainty bound of the UNFCCC inventories for EU-28. The difference between UNFCCC and EDGAR is mainly due to significant differences in CH 4 emissions from fossil fuels (coal, oil, and natural gas), which, however, might be overestimated in some cases in EDGAR.
For wetlands, very large differences between the different inventories of the WETCHIMP ensemble are apparent regarding the spatial emission distribution (see Fig. S4) and the magnitude of the emissions, illustrating the very high uncertainties in the current estimates. Comparing the different wetland inventories, a striking pattern is visible for LPJ-WHyMe, with very high CH 4 emissions for the British Isles. The climate of this region has mild winters that allow simulated wetland CH 4 emissions to continue year-round, yielding high annual emission intensity for LPJ-WHyMe.
In the previous analysis of Bergamaschi et al. (2015) the contribution from natural sources in western and eastern Europe was considered to be very small, based on the wetland inventory of J. Kaplan (Bergamaschi et al., 2007). However, that inventory is close to the lower estimates of the WETCHIMP ensemble. Unfortunately, direct comparisons of CH 4 emissions simulated by the different wetland inventories with local or regional CH 4 flux measurements in European wetland areas are lacking. Therefore, no conclusions can be drawn as to which of the inventories is most realistic.
To further investigate the contribution of wetland emissions we analyse the seasonal variations. Figure 4 illustrates that four inverse models (TM5-4DVAR, TM5-CTE, TM3-STILT, and LMDZ) calculate pronounced seasonal variations in total emissions. For EU-28 the derived seasonality is largely consistent with the seasonality of the wetland emissions from the WETCHIMP ensemble (regarding both the amplitude and the phase, with maxima in summer). For northern Europe, the seasonal variations derived by the four inverse models are somewhat smaller compared to the mean of the WETCHIMP ensemble, while for western and eastern Europe they are somewhat larger but still broadly within the minimum-maximum range of the WETCHIMP inventories. For southern Europe, the seasonality of the four inverse models is more irregular, and the maximum emissions for the wetland ensemble show a clear peak in winter, which, however, is not apparent in the mean or median of the ensemble. This is probably due to the important role of precipitation for the wetland emissions in southern Europe, while for temperate and boreal regions the seasonal variation of wetland emissions is mainly driven by temperature (e.g. Christensen et al., 2003; Hodson et al., 2011). In contrast to the four models discussed above, NAME derives much smaller seasonal variations, and for western Europe, eastern Europe, and EU-28 with the opposite phase (small maximum in winter). Only for northern Europe does NAME also estimate maximum emissions in summer; however, the amplitude is much smaller compared to the other models and the WETCHIMP wetland inventories.
Figure 4. Same as Fig. 3a but including seasonal variation of CH 4 emissions derived from the inversions (S1 only; 3-monthly running mean, coloured solid lines), and seasonal variation of wetland CH 4 emissions from the WETCHIMP ensemble of seven models (mean, blue solid line; median, blue dashed line; minimum-maximum range, light-blue range; 3-monthly running mean).
One contribution to the smaller amplitude is that NAME provides only 3-monthly emissions (compared to monthly resolution of the other four inverse models), but the lower temporal resolution of NAME clearly only explains a smaller part of the different seasonal cycles. Figure S5 shows that also in inversion S3 (which is not using any detailed a priori inventory nor any a priori seasonal cycle) significant seasonal cycles of CH 4 emissions are derived by TM5-4DVAR, TM3-STILT, LMDZ, and TM5-CTE, which demonstrates that the derived seasonal cycles are mainly driven by the observations, and not by the a priori cycle.
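As a rough illustration of why coarser temporal resolution can explain only part of a reduced seasonal amplitude, the sketch below applies a centred 3-month running mean to an idealised sinusoidal annual cycle. The cycle is a made-up placeholder, not data from this study, the function name `running_mean_3` is hypothetical, and a running mean is only a stand-in for the 3-monthly emission resolution of NAME.

```python
import numpy as np

def running_mean_3(monthly):
    """Centred 3-month running mean of a 12-month cycle (wraps December to January)."""
    x = np.asarray(monthly, dtype=float)
    return (np.roll(x, 1) + x + np.roll(x, -1)) / 3.0

# Idealised annual cycle with a summer maximum (arbitrary units, illustrative only)
months = np.arange(12)
monthly = 2.0 + 1.0 * np.cos(2.0 * np.pi * (months - 6) / 12.0)

smoothed = running_mean_3(monthly)
amplitude_ratio = (smoothed.max() - smoothed.min()) / (monthly.max() - monthly.min())
print(f"amplitude retained after 3-month averaging: {amplitude_ratio:.3f}")
```

For a pure annual sinusoid the retained fraction is (1 + 2 cos 30°)/3, so the averaging removes only about 9 % of the amplitude.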
Apart from the different behaviour of NAME, the finding that four inverse models derive seasonal cycles that are broadly consistent with the seasonal cycles calculated by the WETCHIMP ensemble supports a significant contribution of wetlands to the total CH 4 emissions. Commonly, anthropogenic CH 4 emissions are assumed to have no significant seasonal variations, except CH 4 emissions from rice and biomass burning (which, however, play only a minor role in Europe). Unfortunately, only very limited information is available about potential seasonal variations of anthropogenic CH 4 sources (other than rice and biomass burning). Ulyatt et al. (2010) reported significant seasonal variations of CH 4 emissions from dairy cows, mainly related to the lactation periods of cows. VanderZaag et al. (2014), estimating total CH 4 emissions from two dairy farms, found higher CH 4 emissions in autumn compared to spring, mainly due to varying CH 4 emissions from manure management. Besides agricultural CH 4 sources, CH 4 from landfills (Spokas et al., 2011) and waste water may also exhibit seasonal variations, while only small seasonal variations were found for natural gas distribution systems (McKain et al., 2015; Wennberg et al., 2012; Wong et al., 2016; and further references therein).
Quantitative estimates of potential seasonal variations of anthropogenic sources cannot be made due to the limited number of studies, but the relative variability of the total anthropogenic sources is expected to be much smaller compared to wetlands.
Model simulations and bottom-up inventories for individual countries (or group of countries) are shown in the Supplement (Fig. S6), illustrating further that wetland emissions are important, particularly in northern European countries but may also contribute significantly in many other countries.
Finally, we analyse the trends in CH 4 emissions (Fig. S7). Anthropogenic CH 4 emissions reported to UNFCCC for EU-28 show a trend of −0.44 ± 0.02 Tg CH 4 yr −2 during 2006-2012. All five inversions that are available for this period (inversion S1) also derive negative CH 4 emission trends, ranging between −0.19 and −0.58 Tg CH 4 yr −2. The uncertainties given for the trends of the individual inversions (and the reported CH 4 emissions), however, include only the uncertainty of the linear regression (i.e. reflecting the scatter of the annual values around the linear trend) but do not take into account the uncertainties of the annual mean values and the error correlations between different years. In particular, the latter remain very difficult to estimate, which currently limits clear conclusions about the significance of the trends.
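A trend uncertainty that reflects only the scatter of annual values around the fitted line corresponds to the standard error of an ordinary least-squares slope. The following is a minimal sketch of that calculation; the annual values are made-up placeholders rather than results from this study, and the function name `linear_trend` is hypothetical.

```python
import numpy as np

def linear_trend(years, values):
    """Return the OLS slope and its standard error (scatter around the line only)."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    n = years.size
    x = years - years.mean()
    slope = np.sum(x * (values - values.mean())) / np.sum(x ** 2)
    intercept = values.mean() - slope * years.mean()
    residuals = values - (intercept + slope * years)
    # Residual variance with n - 2 degrees of freedom
    s2 = np.sum(residuals ** 2) / (n - 2)
    slope_se = np.sqrt(s2 / np.sum(x ** 2))
    return slope, slope_se

# Placeholder annual emissions (Tg CH4 per year), illustrative only
years = np.arange(2006, 2013)
emissions = np.array([21.0, 20.7, 20.1, 19.9, 19.4, 19.2, 18.6])
slope, slope_se = linear_trend(years, emissions)
print(f"trend = {slope:.2f} +/- {slope_se:.2f} Tg CH4 yr^-2")
```

This standard error ignores the uncertainty of each annual value and any error correlation between years, which is exactly the limitation noted above.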

Evaluation of inverse models
First we evaluate the performance of model simulations at the atmospheric monitoring stations. Figure S8 shows the correlation coefficients, bias, root mean square (rms) difference, and the ratio between modelled and observed standard deviation for inversion S4, including stations that were assimilated and stations that were used for validation only. For the evaluation of the statistics for the in situ measurements, we use only early afternoon data (between 12:00 and 15:00 LT). Averaging over all stations, the correlation coefficients are between 0.65 and 0.79 for six models, and 0.5 for COMET. The ranking of models in terms of correlation coefficients is closely reflected in the achieved average rms values, ranging between 33 and 70 ppb (with models with higher correlation coefficients typically achieving lower average rms). At several tall towers a clear tendency of decreasing rms with increasing sampling height is visible, demonstrating the benefit of higher sampling heights, which allow more representative measurements that are less affected by local sources and that can be better reproduced by the models.
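The four station statistics named above can be sketched as follows. This is an illustration under stated assumptions rather than the code used for Fig. S8: the function names are hypothetical, the rms difference is taken here to include the mean bias, and the early-afternoon window is assumed to include both end points.

```python
import numpy as np

def early_afternoon_mask(hours_local):
    """True for samples between 12:00 and 15:00 local time (end points included, an assumption)."""
    h = np.asarray(hours_local, dtype=float)
    return (h >= 12.0) & (h <= 15.0)

def station_statistics(model, obs):
    """Correlation, bias, rms difference and standard-deviation ratio (model vs. observations)."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    diff = model - obs
    return {
        "correlation": float(np.corrcoef(model, obs)[0, 1]),
        "bias": float(diff.mean()),
        "rms": float(np.sqrt(np.mean(diff ** 2))),  # includes the mean bias
        "std_ratio": float(model.std() / obs.std()),
    }

# Placeholder mole fractions in ppb and local sampling hours, illustrative only
hours = np.array([9.0, 12.5, 13.0, 14.5, 18.0])
obs = np.array([1950.0, 1905.0, 1898.0, 1910.0, 1940.0])
mod = np.array([1930.0, 1900.0, 1902.0, 1915.0, 1925.0])
keep = early_afternoon_mask(hours)
stats = station_statistics(mod[keep], obs[keep])
print(stats)
```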
While the evaluation of the model simulations at the monitoring stations provides a measure of the quality of the inversions and the atmospheric transport models applied (e.g. with the correlation coefficients describing how much of the observed variability can be explained by the models), the analysis of the station statistics cannot quantify how realistic the derived emissions are but gives only some qualitative indications about potential biases of the emissions. The inverse models optimise model emissions to achieve an optimal agreement between simulated and observed atmospheric CH 4 mole fractions (taking into account the a priori constraints). This implies that potential biases of the model (or the observations) may be compensated in the inversions by introducing biases in the derived emissions. In particular, vertical mixing of the models is very critical in this context. For example, too strong vertical mixing of the transport models may be compensated in the inversion by enhancing the model emissions (i.e. deriving model emissions that are higher than real emissions) such that a good agreement between simulated and observed mole fractions at the surface can still be achieved. An important diagnostic that can be used to identify such potential systematic errors is the analysis of vertical profiles (including the boundary layer and the free troposphere). For this purpose we compare our model simulations with regular aircraft profiles at four European sites (Fig. 5). At Griffin (GRI), observed and simulated mole fractions show only small vertical gradients, while at Orléans (ORL), Hegyhátsál (HNG), and Białystok (BIK) large vertical gradients are visible, with increasing values towards the surface. The figure also includes the background mole fractions in the absence of model emissions over Europe calculated by TM5-4DVAR (based on the scheme of Rödenbeck et al., 2009). 
At GRI, the measurements are in general very close to the background mole fractions, illustrating that the impact of European emission is rather limited at this site. In contrast, pronounced enhancements in measured and simulated CH 4 compared to the background are apparent at the other three sites, especially in the lower ∼ 2 km due to regional emissions. These enhancements show some seasonal variation, with largest vertical extension during summer (∼ 2 km), while they are confined to the lower ∼ 1 km during winter due to the seasonal variations in the average boundary layer height (Koffi et al., 2016). Please note that the differences in the background mole fractions, which are visible in Fig. 5 between some sites, are partly due to the different temporal sampling at the different sites (compare Fig. 6).
To analyse potential model biases more quantitatively, in the following we evaluate the enhancement of observations and model simulations compared to background CH 4 values (1) integrated over the entire boundary layer, and (2) integrated over the lower troposphere up to ∼ 3-4 km. The rationale behind this approach is that emissions initially mainly accumulate within the boundary layer. Therefore, potential biases in model emissions should be reflected in differences between the observed and modelled integrated enhancement within the boundary layer. For the overall budget, however, mixing between the boundary layer and free troposphere plays an important role. Thus, the enhancement integrated over the entire lower troposphere provides additional diagnostics for potential model biases.
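One way to picture the integrated enhancement is as a height-weighted mean of the difference between a profile and the background, from the lowest level up to a chosen top (the boundary layer height, or ∼ 3-4 km for the lower troposphere). The sketch below makes that assumption explicit; it neglects air-density weighting, the function name `mean_enhancement` is hypothetical, and the actual procedure is the one described in the Supplement.

```python
import numpy as np

def mean_enhancement(height_m, mole_fraction_ppb, background_ppb, top_m):
    """Height-weighted mean enhancement (ppb) between the lowest level and top_m.

    Assumes height_m is sorted in ascending order; uses trapezoidal integration.
    """
    z = np.asarray(height_m, dtype=float)
    enh = np.asarray(mole_fraction_ppb, dtype=float) - np.asarray(background_ppb, dtype=float)
    inside = z < top_m
    if not inside.any():
        raise ValueError("no profile level below the integration top")
    # Close the layer exactly at top_m by linear interpolation
    z_grid = np.append(z[inside], top_m)
    enh_grid = np.append(enh[inside], np.interp(top_m, z, enh))
    integral = np.sum(0.5 * (enh_grid[1:] + enh_grid[:-1]) * np.diff(z_grid))
    return float(integral / (z_grid[-1] - z_grid[0]))

# Placeholder profile (heights in m, mole fractions in ppb), illustrative only
z = np.array([0.0, 500.0, 1500.0, 2000.0])
profile = np.array([1940.0, 1930.0, 1910.0, 1900.0])
background = np.full(4, 1900.0)
print(mean_enhancement(z, profile, background, top_m=1000.0))
```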
The integration of the enhancements is shown for the individual profiles at ORL, HNG, and BIK in the Supplement (Figs. S9, S10, S11). In addition, we use aircraft measurements from the IMECC campaign in September/October 2009 (Fig. S12). These include profile measurements at Orléans and Białystok but also at Karlsruhe, Jena, and Bremen, hence extending the spatial coverage of the sites with regular profiles (ORL, HNG, and BIK). To calculate the enhancements for the individual profiles, we apply the background mole fractions calculated for the TM5-4DVAR zoom domain as the common reference for the observations and the model simulations for all global models (i.e. TM5-4DVAR, TM5-CTE, and LMDZ). For STILT and NAME, the background CH 4 is calculated for the STILT and NAME domains, but the dependence of the background mole fractions (calculated by TM5-4DVAR) on the exact extension of the domain is generally rather small. However, the CH 4 background mole fractions used in the inversions of the regional models (for NAME based on baseline observations at Mace Head and for TM3-STILT based on the TM3 model) show significant differences compared to the TM5-4DVAR background, with typically ∼ 10 ppb higher values at the three continental aircraft sites (ORL, HNG, and BIK; see Fig. 5). In order to investigate which background mole fractions are more realistic we compared the model simulations with the aircraft observations for events with very low simulated contribution (≤ 3 ppb) from European CH 4 emissions (Fig. S14). This analysis shows that TM5-4DVAR simulations are close to the observations (average bias between −1.1 and 3.5 ppb), which indicates that the TM5-4DVAR background is relatively realistic, while NAME and TM3-STILT are consistently higher at the continental aircraft sites with average biases of 12-13 ppb for NAME and 9-12 ppb for TM3-STILT. This supports the use of the background calculated with TM5-4DVAR as reference for the measurements. For the evaluation of the simulated CH 4 enhancements of the regional models, however, we use the actual background used in NAME and TM3-STILT.
Figure 6. For NAME the model enhancement has been evaluated using the NAME background, for TM3-STILT using the TM3 background, while for all other models the TM5-4DVAR background is used. (a) Time series; (b) seasonal averages (including 1σ standard deviation) with numbers of available profiles given as bar graphs (see right axis). The numbers on the right side are the average relative bias, 1σ standard deviation, and total number of profiles over the entire period.
For the integration over the boundary layer, we use the boundary layer height (BLH) diagnosed by TM5. A recent comparison of the TM5 BLH with observations from the NOAA Integrated Global Radiosonde Archive (IGRA) (Koffi et al., 2016) showed that TM5 reproduces the daytime BLH relatively well (within ∼ 10-20 %), but larger deviations were found for the nocturnal BLH, especially during summer, when very low BLHs (< 100 m) are observed. Here, we use only profiles for which the (TM5 diagnosed) BLH is not lower than 500 m. The average enhancement of the measurements and model simulations in the boundary layer compared to the background is denoted by c OBS, BL and c MOD, BL , respectively (further details about the evaluation of the enhancements are given in the Supplement). Figure 6 shows the derived relative bias, defined as
$$ rb_{\mathrm{BL}} = \frac{c_{\mathrm{MOD,BL}} - c_{\mathrm{OBS,BL}}}{c_{\mathrm{OBS,BL}}}, \qquad (1) $$
for ORL, HNG, and BIK for the entire target period 2006-2012 (inversion S1). The three global inverse models (i.e. TM5-4DVAR, TM5-CTE, and LMDZ) show in general only a small average relative bias (rb BL between −7 and 11 %) at the three aircraft sites. In contrast, TM3-STILT and NAME have significant negative relative biases (TM3-STILT: rb BL between −13 and −24 % for the three sites; NAME: rb BL = −30 % for ORL and HNG). These negative biases are likely related to the positive bias in the background CH 4 used for NAME and TM3-STILT (see above), since the regional models invert the difference between the observations and the assumed background. In fact, at most continental atmospheric monitoring stations as well, the background used for NAME and TM3-STILT is significantly higher (∼ 10 ppb) compared to the TM5-4DVAR background (Fig. S15).
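The relative bias of the boundary layer enhancement can be sketched as follows, taking it as the modelled minus the observed enhancement, divided by the observed enhancement (by analogy with Eq. 2). The sketch assumes that the summary value is the mean of the per-profile relative biases and that profiles with a BLH below 500 m are discarded; the function names and the input values are hypothetical placeholders.

```python
import numpy as np

def relative_bias(c_mod, c_obs):
    """Relative bias of the modelled enhancement with respect to the observed one."""
    c_mod = np.asarray(c_mod, dtype=float)
    c_obs = np.asarray(c_obs, dtype=float)
    return (c_mod - c_obs) / c_obs

def summarise_relative_bias(c_mod, c_obs, blh_m, min_blh_m=500.0):
    """Mean, 1-sigma standard deviation and number of profiles with BLH >= min_blh_m."""
    keep = np.asarray(blh_m, dtype=float) >= min_blh_m
    rb = relative_bias(np.asarray(c_mod, dtype=float)[keep],
                       np.asarray(c_obs, dtype=float)[keep])
    return float(rb.mean()), float(rb.std(ddof=1)), int(keep.sum())

# Placeholder boundary layer enhancements (ppb) and BLH (m), illustrative only
c_mod = np.array([22.0, 18.0, 30.0])
c_obs = np.array([20.0, 20.0, 20.0])
blh = np.array([800.0, 1200.0, 300.0])
mean_rb, std_rb, n_profiles = summarise_relative_bias(c_mod, c_obs, blh)
print(f"rb_BL = {100 * mean_rb:.0f} % +/- {100 * std_rb:.0f} % (n = {n_profiles})")
```

The same function applies unchanged to enhancements integrated over the lower troposphere.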
The relative bias is also extracted separately for different seasons (right panel of Fig. 6). No clear seasonal cycle is apparent in the relative bias, and the variability between the different seasons is generally small (data points at BIK for DJF are considered not significant as they are from one single profile only). From this analysis there is no evidence that the seasonal cycle of emissions derived by four inverse models (TM5-4DVAR, TM5-CTE, TM3-STILT, and LMDZ; see Sect. 4.1) with clear maxima in summer could be due to a seasonal bias in the transport models. At the same time, however, NAME, which calculates much smaller seasonal variations of emissions, shows no seasonal variations of the average bias at ORL and HNG. However, especially at HNG the total number of profiles is rather small (n = 21), which limits the analysis of potential seasonal transport biases. Figure S13 shows the relative bias of the CH 4 enhancements integrated over the lower troposphere, defined as
$$ rb_{\mathrm{COL}} = \frac{c_{\mathrm{MOD,COL}} - c_{\mathrm{OBS,COL}}}{c_{\mathrm{OBS,COL}}}. \qquad (2) $$
The three global inverse models (i.e. TM5-4DVAR, TM5-CTE, and LMDZ) have a relative bias between −4 and 20 % at the three aircraft sites, indicating a small tendency to overestimate the European CH 4 emissions, while the regional models show a negative relative bias (TM3-STILT: between −9 and −20 % for the three sites; NAME: −31 % for ORL and −40 % for HNG). Figure 7 presents an overview of the derived relative biases for the enhancement integrated over the boundary layer (rb BL , top panel of figure) and in the lower troposphere (rb COL , lower panel). The differences in the relative bias integrated over the lower troposphere compared to that integrated only over the boundary layer (e.g. rb COL > rb BL for TM5-4DVAR and TM5-CTE at ORL and BIK) suggest that shortcomings of the models in simulating the exchange between the boundary layer and the free troposphere may contribute significantly to the bias in the derived emissions.
An illustrative example of the shortcomings of the models in simulating the free troposphere is provided by the IMECC profiles at Białystok on 30 September 2009 (Fig. S12). The measurements show a considerable CH 4 enhancement (∼ 25 ppb) at around 3.5 to 4 km, which is not reproduced by the models. This could indicate that cloud convective transport was missed by the models.
A general limitation of the analysis of the enhancements integrated over the lower troposphere, however, is that this analysis is more sensitive to potential errors in the simulated background mole fractions in the free troposphere compared to the boundary layer, because of the generally much lower enhancements in the free troposphere.
Finally, we analyse the correlation between the relative bias of the integrated CH 4 enhancements and the regional model emissions. Figure S16 shows the relationship between rb BL and the average model emissions around the aircraft site, integrating all model grid cells with a maximum distance of 400 km (hereafter referred to as integration radius) from the aircraft site. At all three sites clear correlations between rb BL and the regional model emissions are found, which confirms that rb BL derived from the aircraft profiles can be used to diagnose biases in the regional model emissions.
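Integrating model emissions within a fixed radius of a site can be sketched with a great-circle distance mask over the grid cell centres. This is an illustration only: the function names, the spherical Earth radius of 6371 km, the use of cell centres, and the example values are assumptions, not details taken from the analysis behind Fig. S16.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation (assumption)

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km; inputs in degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(np.asarray(v, dtype=float))
                              for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def regional_emission(cell_lat, cell_lon, cell_emission, site_lat, site_lon, radius_km=400.0):
    """Sum the emissions of all grid cells whose centre lies within radius_km of the site."""
    dist = great_circle_km(site_lat, site_lon, cell_lat, cell_lon)
    return float(np.sum(np.asarray(cell_emission, dtype=float)[dist <= radius_km]))

# Placeholder grid cells (centre coordinates in degrees, emissions in arbitrary units)
cell_lat = np.array([48.0, 49.0, 58.0])
cell_lon = np.array([2.0, 3.0, 2.0])
cell_emission = np.array([1.5, 0.5, 2.5])
total = regional_emission(cell_lat, cell_lon, cell_emission, site_lat=48.0, site_lon=2.0)
print(total)
```

Repeating this for each model and passing the resulting totals together with the corresponding rb BL values to `np.corrcoef` would give the kind of correlation discussed above.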
The derived correlations depend on the chosen area, over which model emissions are integrated. For ORL and HNG, significant correlations were found for integration radii between 200 and 800 km, while for BIK different integration radii resulted in poorer correlations (not shown), probably related to significant differences in the spatial emission patterns derived by the different models around this site. To further improve the analysis, the "footprints" (i.e. sensitivities of atmospheric concentrations to surface emissions) of the individual aircraft profiles should be taken into account in the future. Furthermore, it would be useful to calculate, for all global models individually, the background mole fractions using the scheme of Rödenbeck et al. (2009). This would allow the modelled CH 4 enhancements to be derived more accurately.

Conclusions
We have presented estimates of European CH 4 emissions for 2006-2012 using the new InGOS data set of in situ measurements from 18 European monitoring stations (and additional discrete air sampling sites) and an ensemble of seven different inverse models. For the EU-28, total CH 4 emissions of 26.8 (20.2-29.7) Tg CH 4 yr −1 are derived (mean, 10th and 90th percentiles from all inversions), compared to total anthropogenic CH 4 emissions of 21.3 Tg CH 4 yr −1 (2006) to 18.8 Tg CH 4 yr −1 (2012) reported to UNFCCC. Our analysis highlights the potentially significant contribution of natural emissions from wetlands (including peatlands and wet soils) to the total European emissions, with total wetland emissions of 4.3 (2.3-8.2) Tg CH 4 yr −1 (EU-28) estimated from the WETCHIMP ensemble of seven different wetland inventories (Wania et al., 2013). The hypothesis of a significant contribution from natural emissions is supported by the finding that four inverse models (TM5-4DVAR, TM5-CTE, TM3-STILT, LMDZ) derive significant seasonal variations of CH 4 emissions with maxima in summer. However, the NAME model calculates only a weak seasonal cycle, with a small maximum (of EU-28 total CH 4 emissions) in winter. Furthermore, it needs to be emphasised that wetland inventories have large uncertainties and show large differences in the spatial distribution of CH 4 emissions.
Figure 7. Overview of relative bias at different aircraft sites. (a) Relative bias within the boundary layer (rb BL ). (b) Column-averaged relative bias (rb COL ). For NAME the relative bias has been evaluated using the NAME background, for TM3-STILT using the TM3 background, while for all other models the TM5-4DVAR background is used. Numbers of available profiles are given as bar graphs (see right axis).
Taking into account the estimates of the WETCHIMP ensemble, the bottom-up and top-down estimates of total EU-28 CH 4 emissions are broadly consistent within the estimated uncertainties. However, the results from six inverse models are in the upper uncertainty range of the sum of anthropogenic emissions (reported to UNFCCC) and wetland emissions, while the emissions derived by NAME are in the lower range. Furthermore, the comparison of bottom-up and top-down estimates shows some differences for the different European subregions. For northern Europe (including Norway) several models are in the lower range (or below the lower uncertainty bound) of the combined UNFCCC and WETCHIMP inventory, while for eastern Europe several models are close to the upper uncertainty bound or above (NAME is very close to the mean). Considering the estimated uncertainties of the inverse models, however, the uncertainty ranges of bottom-up and top-down estimates generally overlap for the different European subregions.
To estimate potential biases of the emissions derived by the inverse models, we analysed the enhancements of CH 4 mole fractions compared to the background, integrated over the entire boundary layer and over the lower troposphere, using regular aircraft profiles at four European sites and the IMECC aircraft campaign.
For the three global inverse models (TM5-4DVAR, TM5-CTE, and LMDZ), this analysis showed a relatively small average relative bias (rb BL between −7 and 11 %, rb COL between −4 and 20 % for ORL, HNG, and BIK). The regional models revealed a significant negative bias (TM3-STILT: rb BL between −13 and −24 %, rb COL between −9 and −20 % for ORL, HNG, and BIK; NAME: rb BL = −30 %, rb COL between −31 and −40 % at ORL and HNG). A potential cause of the negative relative bias of TM3-STILT and NAME is the significant positive bias of the background used in TM3-STILT (from global TM3 inversion) and NAME (based on measurements at baseline conditions at Mace Head).
The relative bias rb BL shows clear correlations with regional model emissions around the aircraft profile sites, which confirms that rb BL can be used to diagnose biases in the regional model emissions. The accuracy of the estimated relative biases, however, depends on the quality of the simulated background mole fractions. In particular the enhancements derived for the lower troposphere above the boundary layer (which are usually much smaller than the enhancements within the boundary layer) are very sensitive to the background mole fractions. Therefore, potential model errors in the exchange between the boundary layer and the free troposphere (and their impact on the derived emissions) remain difficult to quantify.
Our study highlights the challenge of verifying anthropogenic bottom-up emission inventories with small uncertainties desirable for the international climate agreements. To reduce the uncertainties of the top-down estimates (1) the natural emissions need to be better quantified, (2) transport models need to be further improved, including their spatial resolution and in particular the simulation of vertical mixing, and (3) the network of atmospheric monitoring stations should be further extended, especially in southern Europe, which is currently clearly undersampled. Furthermore, the uncertainty estimates of bottom-up inventories (including both the anthropogenic and natural emissions) and atmospheric inversions need to be further improved.
Data availability. Underlying data are available upon request.