Global emissions of non-methane hydrocarbons deduced from SCIAMACHY formaldehyde columns through 2003–2006

Abstract. Formaldehyde columns retrieved from the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography/Chemistry (SCIAMACHY) instrument onboard the ENVISAT satellite from 2003 to 2006 are used as top-down constraints to derive updated global biogenic and biomass burning flux estimates for the non-methane volatile organic compounds (NMVOCs) that are precursors of formaldehyde. Our interest is centered on regions experiencing strong emissions, which exhibit a high signal-to-noise ratio and lower measurement uncertainties. The formaldehyde dataset used in this study has recently been made available to the community and complements the long record of formaldehyde measurements from the Global Ozone Monitoring Experiment (GOME). We use the IMAGESv2 global chemistry-transport model driven by the Global Fire Emissions Database (GFED) version 1 or 2 for biomass burning, and by the newly developed MEGAN-ECMWF isoprene emission database. The adjoint of the model is implemented in a grid-based framework within which emission fluxes are derived at the model resolution, together with a differentiation of the sources in a grid cell. Two inversion studies are conducted using either GFEDv1 or GFEDv2 as a priori for the pyrogenic fluxes. Although on the global scale the inferred emissions from the two categories exhibit only weak deviations from the corresponding a priori estimates, the regional updates often present large departures from their a priori values. The posterior isoprene emissions over North America, amounting to about 34 Tg C/yr, are estimated to be on average 25% lower than the a priori over 2003-2006, whereas a strong increase (55%) is deduced over southern Africa, the optimized emission being estimated at 57 Tg C/yr. Over Indonesia the biogenic emissions appear to be overestimated by 20-30%, whereas over Indochina and the Amazon basin during the wet season the a priori inventory captures both the seasonality and the magnitude of the observed columns. Although neither biomass burning inventory seems to be consistent with the data over all regions, the pyrogenic estimates inferred from the two inversions are reasonably similar, despite their a priori deviations. A number of sensitivity experiments are conducted in order to assess the impact of uncertainties related to the inversion setup and the chemical mechanism. Whereas changes in the background error covariance matrix have only a limited impact on the posterior fluxes, the use of an alternative isoprene mechanism characterized by lower HCHO yields (the GEOS-Chem mechanism) increases the posterior isoprene source estimate by 11% over North America, and by up to 40% in tropical regions.

Correspondence to: T. Stavrakou (jenny@aeronomie.be)

and anthropogenic non-methane volatile organic compounds (NMVOCs) contributes significantly to the HCHO formation. On the global scale, about 60% of the produced HCHO (ca. 1600 Tg/yr) originates from methane oxidation, 30% from isoprene degradation, 3% is formed either directly or through the oxidation of NMVOCs emitted during fire events, and the remainder is generated by anthropogenic sources (Stavrakou et al., 2009). Through its subsequent decomposition by photolysis and reaction with OH, HCHO becomes a source of HO2, a key actor in ozone formation and OH production, and of carbon monoxide, and thus affects the oxidizing capacity of the atmosphere (Lelieveld and Crutzen, 1990).
Although the degradation of the NMVOCs does not represent the major part of the global HCHO production, important HCHO column enhancements are observed from space over densely forested areas due to the combination of biomass burning and high isoprene emission fluxes, and to a lesser extent over regions with sustained human activities. Satellite measurements are therefore an important means of constraining the emission budget of NMVOCs. Previous studies have exploited the capability of HCHO columns from the Global Ozone Monitoring Experiment (GOME) satellite instrument to provide crucial information on the underlying NMVOCs. These studies focused on the biogenic emission estimates over the North American continent throughout the period from 1996 to 2001 (Abbot et al., 2003), or on the global scale (Shim et al., 2005), whereas more recently, North American isoprene emissions have been derived from HCHO columns measured by the Ozone Monitoring Instrument (OMI) satellite instrument for the summer of 2006. Long-term GOME HCHO columns have also been used to improve the regional NMVOC estimates in East and South Asia (Fu et al., 2007). The aforementioned modelling studies have been carried out using the GEOS-Chem chemistry-transport model and the Harvard retrieval of HCHO columns (Chance et al., 2000, 2002). The time series of GOME HCHO observations has recently been complemented by HCHO columns retrieved from the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography/Chemistry (SCIAMACHY) UV-visible nadir sounder onboard ENVISAT. This retrieval exercise is based on a new fitting window applicable to both GOME and SCIAMACHY, aiming to ensure the quality and consistency of the derived HCHO dataset, which spans the 12-year period from 1996 to 2007. In our study, HCHO columns measured by SCIAMACHY between 2003 and 2006 are used to provide constraints on the current estimates of the parent NMVOC fluxes (Sect. 2.1).
Our focus will be on biomass burning and biogenic emissions, responsible for the enhanced measured HCHO columns over tropical areas, as well as in the midlatitudes during the growing season. The NMVOC emission strengths are recognized to be highly uncertain for tropical ecosystems due to the scarcity of field measurements, although valuable information has recently become available (Karl et al., 2007a, b; Cohen et al., 2008).
The IMAGESv2 global chemistry-transport model (Müller and Stavrakou, 2005; Stavrakou et al., 2009) provides the a priori estimate for HCHO columns (Sect. 2.2). The model is driven by bottom-up isoprene emissions taken from Müller et al. (2008). This database uses the MEGAN algorithm, is driven by meteorological fields from the European Centre for Medium-Range Weather Forecasts (ECMWF), and is coupled with a canopy environment model. Two a priori pyrogenic emission inventories are tested in this study: the Global Fire Emission Database (GFED) version 1 (van der Werf et al., 2004), and version 2 (van der , both based on satellite fire products, and covering the target period. Previous work indicated that, despite the improved methodology and input data utilized in the updated GFEDv2, this database still has several limitations (Stavrakou et al., 2009).
To keep the influence of SCIAMACHY column errors minimal, data with low signal-to-noise ratio are excluded from the analysis (Sect. 3.1, 3.2). The adopted grid-based inverse modelling approach has been used earlier to provide improved emission estimates of reactive species (Stavrakou and Müller, 2006). It allows for the optimization of the emission strengths at the model resolution, providing a differentiation among the emission sources, while accounting for the interactions between chemical compounds (Sect. 3.3). Using monthly averaged observed HCHO columns as constraints, two inversion studies are performed, using GFEDv1 or GFEDv2 as a priori for pyrogenic NMVOC emissions, and updated top-down global emission estimates are derived for the NMVOC sources. Comparisons with previous modelling work are presented, as well as a number of sensitivity studies conducted to quantify how different parameters of the inversion setting influence the inferred solution (Sect. 4.6).

HCHO column abundances from SCIAMACHY
The SCIAMACHY UV-visible nadir sounder, onboard ENVISAT (2001), has been used to retrieve global tropospheric HCHO columns by applying the differential optical absorption spectroscopy (DOAS) technique (Platt et al., 1994) in the near UV region.
Slant columns are fitted within the 328.5-346 nm wavelength range using the DOAS analysis software (Van Roozendael et al., 1999). The applied HCHO absorption cross-sections are those of Cantrell et al. (1990), shifted by 0.08 nm and convolved to the resolution of the instrument. This shift reduces the retrieved HCHO slant column by 8% within the spectral window used for the retrieval. The derived columns are found to be within 2% of the columns that would be obtained using the Meller and Moortgat (2000) dataset (see also the interactive discussion of the paper). In the air mass factor calculation, weighting functions have been evaluated from radiative transfer calculations performed with a pseudospherical version of the DISORT code (Stammes et al., 1988; Kylling et al., 1995). For each observation, a sensitivity function is interpolated through a look-up table of the scattering properties of the atmosphere, modelled for a number of representative viewing geometries, UV albedos and ground altitudes. The normalised HCHO profile is provided by the IMAGESv2 model (Stavrakou et al., 2009) on a monthly basis and interpolated for each satellite geolocation. A correction for cloud effects is applied based on Martin et al. (2001), but no correction for the effects of aerosols is considered in the current version of the dataset. Cloud information is obtained from the Fast Retrieval Scheme for Cloud Observables (FRESCOv5) product (Koelemeijer et al., 2002; Wang et al., 2008). The surface albedo distribution is obtained from the climatology of Koelemeijer et al. (2003).

A random and a systematic error component of the slant columns are estimated. The random error on the slant columns reaches 10^16 molec./cm^2, whereas the systematic error, accounting for different sources of uncertainties (e.g. absorption cross-sections, inaccurate calibration, etc.), ranges from 2.5×10^15 molec./cm^2 in tropical regions to 8.0×10^15 molec./cm^2 at high latitudes. The error on the air mass factor calculations is estimated at 18% under clear sky conditions, but increases in the presence of clouds to reach 50% for a cloud fraction equal to 50%. On average, the total uncertainty on the monthly HCHO vertical column ranges between 20 and 40% for regions with high signal-to-noise ratio. The monthly mean HCHO columns used in this study are calculated using observations with a cloud fraction below 0.4.

Atmos. Chem. Phys., 9, 3663-3679, 2009, www.atmos-chem-phys.net/9/3663/2009/. T. Stavrakou et al.: Multi-year inversion of SCIAMACHY formaldehyde columns.
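The relation between the slant column, the air mass factor and the quoted error components can be illustrated with a minimal sketch. It assumes that independent error terms combine in quadrature and that the random slant-column error averages down as 1/sqrt(n) over n observations in a monthly mean; both are illustrative assumptions, not the error budget actually used for the dataset, and the function names are hypothetical.

```python
import math


def vertical_column(slant_column, amf):
    """Convert a slant column to a vertical column (same units, e.g. molec./cm^2)."""
    if amf <= 0:
        raise ValueError("air mass factor must be positive")
    return slant_column / amf


def relative_error(slant_column, random_err, systematic_err, amf_rel_err, n_obs=1):
    """Relative error on a (monthly mean) vertical column.

    Assumptions (not taken from the paper): error terms are independent and
    add in quadrature; the random part scales as 1/sqrt(n_obs).
    """
    random_mean = random_err / math.sqrt(n_obs)
    slant_rel = math.sqrt(random_mean ** 2 + systematic_err ** 2) / slant_column
    return math.sqrt(slant_rel ** 2 + amf_rel_err ** 2)


# Illustrative tropical case: slant column 1e16, random error 1e16,
# systematic error 2.5e15 (molec./cm^2), clear-sky AMF error 18%, 100 observations.
err = relative_error(1.0e16, 1.0e16, 2.5e15, 0.18, n_obs=100)
print(f"relative error on the monthly column: {err:.0%}")
```

With these illustrative inputs the combined error falls inside the 20-40% range quoted above, although the agreement depends entirely on the assumed number of observations.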

The IMAGESv2 global CTM
The IMAGESv2 CTM is an updated version of the IMAGES model (Müller and Brasseur, 1995;Müller and Stavrakou, 2005) with updates and improvements briefly discussed in this section.
The model calculates the distribution of 68 chemical compounds in the global troposphere at a resolution of 5 degrees in latitude and in longitude, with 40 levels in the vertical. Advection of chemical compounds is driven by monthly averaged winds obtained from ECMWF operational analyses from September 2002 to December 2006. Boundary layer diffusion, deep convection and other cloud processes are parameterized based on daily ECMWF fields. As the model time step is one day, diurnal variations in the photorates and in the concentrations are accounted for through correction factors (Stavrakou et al., 2009). The HCHO diurnal profiles are also used to estimate the HCHO concentration at the satellite overpass time (10:00 h local time) from the daily averaged values calculated with the one-day time step. The impact of the adopted numerical implementation (use of a long time step and of correction factors to account for the diurnal cycle) on the modelled HCHO columns was found to be on the order of a few percent in a sensitivity study using an explicit diurnal cycle with a 20-min time step and the KPP chemical solver (Stavrakou et al., 2009). The modelled columns are compared with HCHO column data (binned onto the model horizontal grid and monthly averaged) between 1 January 2003 and 31 December 2006, following a 4-month spin-up time. The calculated monthly averaged HCHO columns account for the sampling times of observations at each location.
The inorganic reactions, as well as the methane chemistry, are as in Müller and Stavrakou (2005), with updates from Sander et al. (2006). The chemical mechanism comprises 15 explicit NMVOCs: acetic acid, methanol, ethane, ethene, acetylene, propane, propene, acetone, biacetyl, isoprene, 2-butanone, glyoxal, methylglyoxal, glycolaldehyde, and acetaldehyde. The degradation mechanism for the majority of the NMVOCs is mostly based on the Master Chemical Mechanism (MCM) (Saunders et al., 2003), whereas for isoprene, the Mainz Isoprene Mechanism (MIM) (Pöschl et al., 2000), also reduced from MCM, is adopted with some modifications (Stavrakou et al., 2009). Given the large number and diversity of the NMVOCs involved in the HCHO production, a box model study using MCM has been carried out in order to determine the HCHO yields from the NMVOC oxidation under high NOx (1 ppbv NO2) and low NOx (0.1 ppbv NO2) conditions. The chemical mechanism is then modified so as to reproduce the MCM-computed HCHO yields (Stavrakou et al., 2009).
Short-term and final HCHO yields from the emitted NMVOCs in IMAGESv2 are calculated after one day or two months of box model simulations, respectively, and illustrated in Fig. 1. The simulations are performed under high NOx conditions (1 ppbv NO2), and are initialized with 0.1 ppbv NMVOC, 35 ppb O3, and 100 ppb CO. The compounds explicitly included in the chemical scheme represent 87% of the total short- and long-term HCHO production, based on the pyrogenic emission speciation of Andreae and Merlet (2001) with updates from M. O. Andreae (personal communication, 2007), and on the global biomass burnt estimates taken from the a priori biomass burning inventories (Sect. 2.3). The remainder (13%) is accounted for by a surrogate species (Stavrakou et al., 2009).
As displayed in Fig. 1, the highest short-term yields per unit carbon are calculated for ethene (C2H4, 0.7), propene (C3H6, 0.6), biacetyl ((CH3CO)2, 0.5), acetaldehyde (CH3CHO, 0.47) and isoprene (C5H8, 0.46), whereas abundantly emitted species like methanol (CH3OH) and acetic acid (CH3COOH) have very small one-day yields, but are important contributors to the final HCHO production.

Fig. 1. IMAGESv2-computed HCHO yields per unit carbon from the oxidation of NMVOCs under high NOx (1 ppbv NO2) conditions. Short-term and ultimate yields are represented by blue and red rectangles, respectively. Only one rectangle is shown for species having the same short-term and final yield.

A priori emissions
Anthropogenic CO and total NMVOC emissions are provided by the EDGAR v3.3 database for 1997 (Olivier et al., 2001; Olivier, 2002), while speciation for VOC emissions is taken from the POET database (Olivier et al., 2003).

In both inventories, monthly carbon fire emissions are estimated at 1 degree spatial resolution, by using satellite fire activity data converted into burned area together with biogeochemical modelling. Trace gas emissions are derived from the carbon emissions using emission factors from Andreae and Merlet (2001) with updates from M. O. Andreae (personal communication, 2007). In the current version of the model, pyrogenic emissions of 15 explicit NMVOCs are included, whereas the C4H10 lumped species accounts for the emissions of non-explicit NMVOCs. The total NMVOC biomass burning emissions amount on average over the 4-year period to 94 Tg/yr for GFEDv1 and to 90 Tg/yr for GFEDv2.
The principal updates of the GFEDv2 inventory with respect to GFEDv1 are (i) the use of hot spots from the Moderate Resolution Imaging Spectroradiometer (MODIS); (ii) the large number of processed burned area tiles, which allows regional variations to be accounted for; (iii) a regional relation between fire hot spots and burned area; (iv) the inclusion of combustion of belowground carbon, a source at least as important as the aboveground carbon burning in regions like Southeast Asia; and finally, (v) the use of fire persistence, which increases the combustion completeness in deforestation areas.
The MEGAN-ECMWF inventory for isoprene emissions used in this study accounts for the diurnal, daily, seasonal, and interannual variability of the emissions. It is based on the MEGAN model, which is driven primarily by landcover, temperature, and solar radiation, and on a detailed canopy environment model. The emission rate of isoprene is defined as

$$E = 0.52\,\varepsilon \left( \sum_{k=1}^{8} (\gamma_P)_k\,(\gamma_T)_k\,\mathrm{LAI}_k \right) \gamma_{age}\,\gamma_{SM} \qquad (1)$$

where ε (mg/m^2/h) is the emission at the standardized conditions defined in Guenther et al. (2006), and the factor 0.52 is introduced in order to ensure that E=ε under these conditions. The dependence on the photosynthetic photon flux density (PPFD, µmol/m^2/s) and the leaf temperature is considered by the emission activity factors γ_P and γ_T, respectively. The sum is taken over the eight layers of the canopy environment model. The emission response to the leaf growth state and the soil moisture stress is accounted for by the activity factors γ_age and γ_SM. The total isoprene emission for a given grid cell is then expressed as the sum of the emissions calculated for all plant functional types present in the given cell, for clear sky and cloudy conditions. Operational ECMWF analyses for the downward solar flux, the cloud cover fraction, the air temperature, the dewpoint temperature, and the windspeed directly above the canopy are used to drive the isoprene emission model. Monthly mean LAI values are climatological values derived from the MODIS satellite instrument. The emissions, available at 0.5 degree resolution, are daily averaged and gridded onto the resolution of IMAGESv2.
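The structure of the emission calculation can be sketched as follows. The sketch assumes a product form E = 0.52 · ε · Σ_k (γ_P · γ_T · LAI)_k · γ_age · γ_SM, with the sum running over the canopy layers; the layer-wise leaf area weighting and the function name `isoprene_emission` are assumptions made for illustration, and the input values are invented rather than taken from the inventory.

```python
def isoprene_emission(eps, gamma_p, gamma_t, lai, gamma_age=1.0, gamma_sm=1.0):
    """Isoprene emission rate (same units as eps, e.g. mg/m^2/h).

    eps       : emission at standardized conditions
    gamma_p   : per-layer light (PPFD) activity factors
    gamma_t   : per-layer leaf temperature activity factors
    lai       : per-layer leaf area index (assumed weighting, see lead-in)
    gamma_age : leaf age activity factor
    gamma_sm  : soil moisture activity factor
    """
    if not (len(gamma_p) == len(gamma_t) == len(lai)):
        raise ValueError("per-layer inputs must have the same length")
    canopy_sum = sum(p * t * l for p, t, l in zip(gamma_p, gamma_t, lai))
    return 0.52 * eps * canopy_sum * gamma_age * gamma_sm


# Illustrative eight-layer canopy with uniform activity factors.
layers = 8
emission = isoprene_emission(
    eps=1.0,
    gamma_p=[1.0] * layers,
    gamma_t=[1.0] * layers,
    lai=[0.25] * layers,
)
print(emission)
```

The grid-cell total would then be obtained by summing such terms over the plant functional types present in the cell, as described above.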
The biogenic source of methanol is as in Stavrakou et al. (2009). The biogenic source for CO, NO x and NMVOCs, as well as ocean emissions of CO and NMVOCs are taken from Müller and Stavrakou (2005).

Data selection for the inversion
The criteria for selecting the data used as constraints in the inversion are specified here. Data are excluded when the calculated error on a model grid cell exceeds 70%. The estimated error on the HCHO columns comprises a random and a systematic component, as well as the error on the air mass factor calculation (Sect. 2.1). In addition, we reject data in the vicinity of the South Atlantic anomaly (SAA), a region characterized by strong radiation and a weak geomagnetic field strength. We use the location of the SAA (latitude=−26.1°, width=16.7°, longitude=−47.4°, width=28.4°) as derived in Nichitiu et al. (2004), and we exclude those data for which the distance from the SAA location, defined as

$$d = \exp\left( -\left( \frac{\lambda_k + 47.4 - 360}{28.4} \right)^2 - \left( \frac{\theta_i + 26.1}{16.7} \right)^2 \right),$$

is higher than 0.7, where λ_k and θ_i are the longitude and latitude of the location. Further, as we do not optimize HCHO production from methane oxidation, oceanic data are also excluded from the current analysis. Seasonal averages of the data used in the inversion for the year 2006 are displayed in the left column of Fig. 2.
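The three rejection criteria can be combined into a single filter, sketched below. The Gaussian-shaped proximity measure is an assumption about the functional form of d, built only from the SAA centre and widths quoted above; longitudes are assumed to be given in the range 0-360°, and the names `saa_proximity` and `keep_observation` are hypothetical.

```python
import math

# SAA centre and widths in degrees, as quoted in the text (Nichitiu et al., 2004).
SAA_LAT, SAA_LAT_WIDTH = -26.1, 16.7
SAA_LON, SAA_LON_WIDTH = -47.4, 28.4


def saa_proximity(lon_deg, lat_deg):
    """Proximity to the SAA centre: 1 at the centre, tending to 0 far away.

    Assumed Gaussian-like form; lon_deg is expected in [0, 360).
    """
    dlon = (lon_deg - (360.0 + SAA_LON)) / SAA_LON_WIDTH
    dlat = (lat_deg - SAA_LAT) / SAA_LAT_WIDTH
    return math.exp(-(dlon ** 2 + dlat ** 2))


def keep_observation(rel_error, lon_deg, lat_deg, is_ocean):
    """Apply the selection criteria to one gridded monthly column."""
    if rel_error > 0.70:                          # total error above 70%
        return False
    if saa_proximity(lon_deg, lat_deg) > 0.7:     # too close to the SAA
        return False
    if is_ocean:                                  # oceanic data are not used
        return False
    return True


print(keep_observation(0.30, 312.6, -26.1, False))  # at the SAA centre
print(keep_observation(0.30, 20.0, 10.0, False))    # central Africa
```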

Modelled vs. observed HCHO columns
Two model simulations are conducted, using either the GFEDv1 or the GFEDv2 database as bottom-up inventory for biomass burning emissions. The middle column of Fig. 2 illustrates seasonal averages of the a priori modelled HCHO column abundances for 2006 calculated using the GFEDv2 emission inventory, to be compared with the corresponding observed HCHO columns in the left column of the same figure.
The largest columns are mostly associated with fire emissions, and with the biogenic fluxes during summertime over the temperate forests of the Northern Hemisphere. Column abundances can reach values up to 2×10^16 molec./cm^2, whereas values higher than 1×10^16 molec./cm^2 are observed most of the year in tropical regions, largely because of isoprene emitted by the evergreen vegetation. The a priori columns shown in this figure capture quite well the observed patterns and seasonality, as well as the observed column enhancements in most of the regions of interest, although a closer look reveals discrepancies in the magnitude of the columns, pointing to biases in the a priori NMVOC fluxes. The biogenic and pyrogenic NMVOC emission estimates that best reproduce the observations within their uncertainties, while accounting for the assigned errors on the a priori emissions, will be quantified through inverse modelling, whose general approach is described hereafter.

Grid-Based Inversion Setup
A thorough discussion of the inversion scheme is presented in Müller and Stavrakou (2005) and Stavrakou and Müller (2006). In this section we briefly explain the main steps of the inversion. Let Φ^pr_j(x, t) be the a priori emission distribution, where j=1, ..., m denotes the different emission categories, x the space variables (latitude, longitude, and altitude), and t the time (month). The a priori Φ^pr_j(x, t) and optimized Φ^opt(x, t, f) spatiotemporal emission distributions are linked through the expression

$$\Phi^{opt}(x, t, \mathbf{f}) = \sum_{j=1}^{m} \exp(f_j)\,\Phi^{pr}_j(x, t),$$

where m is the number of base functions, and f=(f_j) the vector of dimensionless control parameters to be determined so that the function J, quantifying the bias between the modelled HCHO columns H(f) and the observed columns y,

$$J(\mathbf{f}) = \frac{1}{2} \left[ (H(\mathbf{f}) - \mathbf{y})^T \mathbf{E}^{-1} (H(\mathbf{f}) - \mathbf{y}) + \mathbf{f}^T \mathbf{B}^{-1} \mathbf{f} \right],$$

reaches its minimum. In this equation, E and B are the matrices of errors on the observations and the emission parameters, respectively, and T denotes the transpose. The observation error matrix E, assumed diagonal, comprises the error on the vertical columns and a model error of 20%, based on sensitivity calculations (Stavrakou et al., 2009). The matrix B is non-diagonal. Its specification relies on assumptions made for the processes responsible for the estimated emission rates in the bottom-up inventories, as explained in the next section. The derivatives of J with respect to each component of f are computed using the adjoint of the model. By using J and its derivatives in an iterative minimization algorithm (Gilbert and Lemaréchal, 1989), we obtain an updated estimate for f. Once the norm of the gradient of J is reduced by a factor of 100 compared to its initial value (usually after 10-20 iterations), we consider that J has reached its minimum.
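The structure of the minimization problem can be illustrated with a toy example. The sketch below assumes a cost function of the usual form J(f) = ½[(H(f)−y)ᵀE⁻¹(H(f)−y) + fᵀB⁻¹f] with emissions scaled by exp(f_j), and replaces the chemistry-transport model by a hypothetical linear operator (`background` plus a matrix `K` of column contributions per category). In the actual inversion the gradient is supplied by the model adjoint; here it is written analytically only because the toy model is linear in exp(f).

```python
import math


def model_columns(f, background, K):
    """Toy stand-in for the CTM: H(f) = background + K . exp(f)."""
    scale = [math.exp(fj) for fj in f]
    return [b + sum(k * s for k, s in zip(row, scale))
            for b, row in zip(background, K)]


def cost(f, background, K, y, obs_var, B_inv):
    """J(f) with a diagonal observation error matrix (variances obs_var)."""
    resid = [h - yi for h, yi in zip(model_columns(f, background, K), y)]
    obs_term = sum(r * r / v for r, v in zip(resid, obs_var))
    n = len(f)
    prior_term = sum(f[i] * B_inv[i][j] * f[j]
                     for i in range(n) for j in range(n))
    return 0.5 * (obs_term + prior_term)


def gradient(f, background, K, y, obs_var, B_inv):
    """Analytic gradient of J for the toy model (B_inv assumed symmetric)."""
    resid = [h - yi for h, yi in zip(model_columns(f, background, K), y)]
    grad = []
    for j in range(len(f)):
        # dH_i/df_j = K[i][j] * exp(f_j)
        obs = sum(K[i][j] * math.exp(f[j]) * resid[i] / obs_var[i]
                  for i in range(len(y)))
        prior = sum(B_inv[j][k] * f[k] for k in range(len(f)))
        grad.append(obs + prior)
    return grad


# Two emission categories, three observed columns (arbitrary units).
background = [1.0, 1.0, 1.0]
K = [[1.0, 2.0], [0.5, 1.5], [2.0, 0.3]]
y = [3.5, 2.5, 3.0]
obs_var = [0.25, 0.25, 0.25]
B_inv = [[1.0 / 0.81, 0.0], [0.0, 1.0 / 0.81]]  # relative error 0.9, no correlation
print(cost([0.0, 0.0], background, K, y, obs_var, B_inv))
print(gradient([0.0, 0.0], background, K, y, obs_var, B_inv))
```

A quasi-Newton routine would take `cost` and `gradient` as inputs and iterate until the gradient norm has dropped by the chosen factor, mirroring the stopping criterion described above.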
The inversion approach adopted in this study consists in estimating the emission fluxes at the model resolution for each emitting process and month between 2003 and 2006 (Stavrakou and Müller, 2006; Stavrakou et al., 2008). The number of unknowns to be determined for each source category is equal to 120 960 (4 years×12 months×72 longitudes×35 latitudes). To exclude grid cells with very low emissions from the analysis, we apply a threshold of 10^9 molec./cm^2/s to the a priori maximum emission per category in a given grid cell over the 4-year period. An emission lower than this value in a grid cell for a category is not optimized. This requirement reduces the number of unknowns to ∼30 000 for vegetation fires and ∼45 000 for biogenic sources. Moreover, we introduce extra constraints in the inversion in the form of correlations among the a priori errors on the emission parameters. These correlations are implemented through the off-diagonal elements of the matrix B (Stavrakou and Müller, 2006).
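The size of the control vector follows directly from the grid and period dimensions quoted above, and the emission threshold acts as a simple mask. A minimal sketch, in which the dictionary of per-cell maximum emissions and the function names are hypothetical:

```python
YEARS, MONTHS, N_LON, N_LAT = 4, 12, 72, 35
THRESHOLD = 1.0e9  # molec./cm^2/s, applied to the a priori maximum emission


def n_unknowns(years=YEARS, months=MONTHS, n_lon=N_LON, n_lat=N_LAT):
    """Number of monthly grid-cell unknowns per source category before masking."""
    return years * months * n_lon * n_lat


def optimized_cells(max_prior_emission, threshold=THRESHOLD):
    """Keep the cells whose 4-year maximum a priori emission reaches the threshold."""
    return [cell for cell, value in max_prior_emission.items() if value >= threshold]


print(n_unknowns())  # 120960
print(optimized_cells({"cell_a": 5.0e8, "cell_b": 1.0e9, "cell_c": 3.0e10}))
```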
The spatiotemporal coexistence of the different HCHO sources is a recognized inherent difficulty in the derivation of the underlying emissions of HCHO precursors. This very challenging and as yet unresolved question is tackled in this study by using the prescribed spatiotemporal correlation setup discussed in the next section. It is based on physical assumptions of how the emission from a source type in a given month and location is expected to correlate with the emission estimated at neighbouring months and locations. Through this framework, the inversion is expected to find a compromise between the emission pattern taken from the a priori inventories and the HCHO observations, within their assigned and measured uncertainties, respectively.

Specifying the Error Covariance Matrix
The diagonal elements of B are defined as the squares of the relative error on the fluxes emitted by a grid cell. The relative errors on the biomass burning and biogenic emission parameters are taken equal to 0.9.
The spatial and temporal components of B are considered as statistically independent, and therefore B can be decomposed as B=B(r)B(t). The spatial correlations between errors on the pyrogenic and biogenic sources emitted by the grid cells i and j are assumed to decay exponentially with the distance d_ij between the cells,

$$B(r)_{ij} = \frac{\sigma_{\phi_i}}{\phi_i}\,\frac{\sigma_{\phi_j}}{\phi_j}\,\exp\left( -\frac{d_{ij}}{\ell} \right),$$

where the decorrelation length ℓ is taken equal to 500 km for both emission categories, φ_i denotes the total flux emitted in the grid cell i from pyrogenic or biogenic sources, σ_φi the error on the flux and σ_φi/φ_i the relative error. For biogenic emissions a dependence on the distribution of ecosystems is also assumed, with e^n_i being the fraction of the flux emitted in the i-th grid cell by the ecosystem n, according to the plant functional type distribution of Guenther et al. (2006) used in the MEGAN-ECMWF inventory. In this way, the errors on the emissions from different plant functional types are assumed
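The spatial part of the covariance described in this section can be sketched for a handful of grid cells. The sketch assumes that the covariance between two cells is the product of their relative errors and an exponential factor exp(−d_ij/ℓ) with ℓ = 500 km, and that d_ij is a great-circle distance between cell centres; the distance definition and the function names are assumptions, and the ecosystem dependence of the biogenic correlations is not included.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def spatial_covariance(cells, rel_err=0.9, decorrelation_km=500.0):
    """Spatial error covariance for a list of (lat, lon) cell centres."""
    n = len(cells)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d = great_circle_km(*cells[i], *cells[j])
            value = rel_err * rel_err * math.exp(-d / decorrelation_km)
            B[i][j] = value
            B[j][i] = value  # the matrix is symmetric by construction
    return B


# Three cell centres on the equator: neighbours on a 5-degree grid and a distant cell.
B = spatial_covariance([(0.0, 0.0), (0.0, 5.0), (0.0, 90.0)])
for row in B:
    print(["%.4f" % v for v in row])
```

On a 5-degree grid, adjacent equatorial cells are roughly 556 km apart, so with a 500 km decorrelation length their error correlation is about one third, while cells several grid boxes apart are effectively uncorrelated.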

Results and discussion
The performed inversions are summarized in Table 1. The right column of Fig. 2 illustrates the seasonally averaged HCHO columns for 2006 inferred from the I2 optimization. Comparison between the observed, a priori and updated columns shows that the optimization brings the model closer to the measurements at all latitudes and in all seasons compared to the a priori simulated columns. This is reflected in a cost function reduction of about 25% in both inversions. For the sake of simplicity, only global maps from the I2 inversion and for 2006 are shown here. Comparisons for the different years are also presented in this section. Over northern Africa in wintertime and over North America in the growing season, the HCHO column is found to be decreased by about 20% after optimization, whereas an increase of up to 30% in the HCHO columns over southern Africa is estimated during the local summer, and a significant reduction of the columns over Indonesia is found to be necessary in order to counterbalance the overestimated a priori columns. These column updates imply changes in the underlying biogenic and pyrogenic NMVOC emissions, as illustrated by the ratio of the optimized to the a priori annual emissions for 2006 shown in the bottom panels of Figs. 3 and 4. Note that both the a priori and optimized emissions are monthly and diurnally averaged values. Although on the global scale the inferred emissions from the two categories exhibit only weak to moderate deviations from the corresponding a priori estimates, as summarized in Table 2, the regional updates often present large departures from their a priori values. A description of the inversion results by region follows.

Southern Africa
The largest isoprene flux change is found over southern Africa, where the annually averaged posterior emissions are estimated to be 60% higher than the a priori values over an extended region (5-25° S, 15-35° E), while an increase of more than a factor of two is obtained over a narrow band extending through Zambia (10-20° S, 25-30° E). A priori isoprene emissions over southern Africa show a strong seasonality, with a decline from a maximum in January and February to a minimum in July due to low rainfall and leaf drop (Fig. 5, fourth panel), whereas the updated fluxes exhibit a much stronger variability with a peak-to-trough ratio of 4.5 for 2006. The same pattern is repeated for the different years with minor variations. As follows from Fig. 5, the mean a priori biogenic emission through 2003-2006 is equal to 37 Tg C/yr, whereas the posterior estimate amounts to 57 Tg C/yr after both inversions, which is in excellent agreement with the isoprene flux of 56 Tg C/yr estimated by Otter et al. (2003), based on detailed vegetation species information and emission capacity measurements performed in Africa south of the equator. The same study reports that light-dependent monoterpene emitters, having high emission rates, have been identified over the region experiencing the largest isoprene update. Such compounds, not accounted for in the model, could contribute to the HCHO production and partly explain the model/data discrepancies. The beginning of the fire season coincides with the senescent phase of the vegetation. As shown in Fig. 5, the maximum of biogenic emissions corresponds to the minimum of fire occurrence and vice versa.
Although over the extended region the posterior pyrogenic emission strengths lie on average very close to the a priori estimates, the SCIAMACHY data suggest a decrease of the a priori NMVOC emissions by 20% over Central East Africa (Congo/Angola), and an increase of 10-30% over the southeastern part of the continent (Fig. 4), whereas the derived pyrogenic estimates from both inversions tend to exhibit similar patterns after optimization (Fig. 5, third panel). The observed HCHO columns are compared with the a priori and optimized columns in Fig. 6 (fourth panel). The columns exhibit a moderate seasonal variability and range between 7×10^15 and 13×10^15 molec./cm^2. A significant bias reduction is obtained after optimization over this region, with most of the emission change taken up by the isoprene source.

Northern Africa
The fire activity over this region progresses eastwards from November through February, peaking over western Africa in November-December and over Central Africa in January-February, whereas its minimum is observed from May to September, as shown in Fig. 5 (first panel). Both optimizations infer a decrease of pyrogenic emissions over Central Africa (Fig. 4), which is more significant in the I2 results (30-40%), as the use of GFEDv2 leads to a large column overestimation (Fig. 6, third panel). The posterior pyrogenic fluxes from the two inversions converge to a satisfactory degree (Fig. 5). The GFEDv1 database is on average 20-30% lower than the GFEDv2 inventory over this region, and seems to better capture the intensity of the fire events. Nevertheless, the lack of aerosol correction in the retrieval might introduce a bias in the HCHO columns during important fire events, and erroneously lead to the conclusion that the GFEDv2 inventory is too high. Note, however, that an independent study using the same emission database also pointed to a large model overestimation when compared to observed NO₂ columns over this region. The annual posterior biogenic emissions over northern Africa stay very close to the a priori estimates, with low interannual variability. The mean posterior biogenic African source over 2003-2006 is estimated at 109 Tg C/yr and is similar to a previous estimate based on GOME HCHO columns (103.3 Tg C/yr, Shim et al., 2005).

Southeast Asia
The Indochina peninsula and Indonesia are the dominant contributors to the biogenic and pyrogenic NMVOC fluxes in Southeast Asia. In Indochina, the driest months, November to February, are followed by the hottest months, March to May (fire season), before the onset of the rainy season, lasting from May to October. As displayed in the first panel of Fig. 7, both biomass burning inventories capture the timing and the magnitude of the fire events, although the use of GFEDv2 leads to a slight underestimation of the columns. The inversions bring the posterior emissions closer to each other and to the data. The excellent agreement between the a priori columns and the data during the rainy season lends confidence to the isoprene database used, and in this case the optimization brings only slight changes to the a priori biogenic fluxes. Consistent with our results, the inversion study of Fu et al. (2007), which used GOME HCHO measurements over 1996-2001, infers no change in biogenic fluxes over Indochina. Indonesia (10° S-5° N, 95-150° E) is influenced by dry Australian continental air masses from June to November and by oceanic air masses during the rest of the year, but may exhibit deviations from this general description due to local atmospheric conditions and El Niño-La Niña occurrences. While the 2006 El Niño was rather weak compared to 1997-1998, massive forest fires broke out across Borneo and Sumatra in September-October 2006, and some of the highest deforestation rates ever have been recorded in Indonesia. As shown in Fig. 7, both pyrogenic inventories capture the timing of the maximum, although its intensity varies considerably between the two estimates, 7 Tg in GFEDv1 and 19 Tg in GFEDv2 in 2006 (Fig. 5). The enhanced emissions of GFEDv2 over this region are explained by the below-ground burning of decayed plant matter, which is taken into account in this version of the inventory (van der .
The reduction by almost a factor of two of the GFEDv2 2006 estimate provides an excellent posterior agreement with the observed HCHO columns over Indonesia, whereas the posterior I1 pyrogenic fluxes do not yield any significant improvement compared to the a priori simulation. Note, however, that the lack of an aerosol correction over fire scenes in the HCHO retrieval is a potentially important source of error for the inferred biomass burning NMVOC fluxes. According to the estimate by Fu et al. (2007), the air mass factors are reduced by up to 40% in the presence of absorbing aerosols, and thus the retrieved HCHO columns could be much higher over biomass burning events. The a priori model overestimation over Indonesia in GFEDv2 might be partly explained by the lack of aerosol correction, and thus the significant inferred flux decreases could be more moderate.
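The effect of a reduced air mass factor on the retrieved column can be illustrated with a short calculation. The sketch below assumes the standard relation between slant and vertical columns (vertical column = slant column / air mass factor); the slant column and the aerosol-free air mass factor are hypothetical values chosen only for illustration, and only the 40% reduction comes from the text.

```python
def vertical_column(slant_column, air_mass_factor):
    """Convert a slant column to a vertical column: VCD = SCD / AMF."""
    if air_mass_factor <= 0:
        raise ValueError("air mass factor must be positive")
    return slant_column / air_mass_factor


# Hypothetical inputs, for illustration only.
slant_column = 1.0e16                               # molec./cm2
amf_no_aerosol = 1.0                                # AMF computed without aerosols
amf_with_aerosol = amf_no_aerosol * (1.0 - 0.40)    # AMF reduced by 40%

vcd_uncorrected = vertical_column(slant_column, amf_no_aerosol)
vcd_corrected = vertical_column(slant_column, amf_with_aerosol)

# Relative increase of the column once the aerosol correction is applied.
increase_if_corrected = vcd_corrected / vcd_uncorrected - 1.0

# Relative deficit of the uncorrected column with respect to the corrected one.
deficit_if_uncorrected = 1.0 - vcd_uncorrected / vcd_corrected

print(f"corrected column is {100 * increase_if_corrected:.0f}% higher")
print(f"uncorrected column is {100 * deficit_if_uncorrected:.0f}% lower")
```

Under these assumptions, a 40% lower air mass factor corresponds to an uncorrected column that is 40% lower than the corrected one, or equivalently a corrected column about 67% higher than the uncorrected one.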
The inverse modelling suggests a decrease by 30% of the biogenic source over Indonesia, from the mean value of 33 Tg C/yr over 2003-2006 to 23 Tg C/yr for both inversions. A 21% decrease is found over Indonesia and Indochina, from 40.2 to 31.8 Tg C for 2006 (Table 3). A similar decrease is also reported by Fu et al. (2007), but their a priori estimate (23.8 Tg C/yr) is significantly lower than ours. The inversion study by Shim et al. (2005) suggests a 45% increase with respect to the a priori value of 20 Tg C/yr, which brings their inferred estimate closer to our biogenic flux estimate for this region.
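The percentages quoted above follow directly from the a priori and posterior totals. The sketch below recomputes them; the function name is ours, and the input values are those given in the text.

```python
def relative_change_percent(prior, posterior):
    """Relative change of the posterior with respect to the a priori, in percent."""
    return 100.0 * (posterior - prior) / prior


# Indonesia, 2003-2006 mean biogenic source (Tg C/yr).
indonesia_mean = relative_change_percent(33.0, 23.0)

# Indonesia and Indochina, 2006 (Tg C).
indonesia_indochina_2006 = relative_change_percent(40.2, 31.8)

# Posterior implied by a 45% increase over an a priori of 20 Tg C/yr
# (Shim et al., 2005, as quoted in the text).
shim_posterior = 20.0 * (1.0 + 0.45)

print(f"Indonesia: {indonesia_mean:.1f}%")
print(f"Indonesia and Indochina, 2006: {indonesia_indochina_2006:.1f}%")
print(f"Shim et al. posterior: {shim_posterior:.1f} Tg C/yr")
```

The last value shows why the Shim et al. (2005) result moves towards the posterior obtained here even though the sign of their update is opposite: their a priori is much lower.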

North America
Biogenic emissions over North America have been the subject of many studies attempting to infer their magnitude and their seasonal and interannual variability based on HCHO columns detected from space. Although a general consistency between these studies is demonstrated (Abbot et al., 2003; Palmer et al., 2006; Shim et al., 2005; Millet et al., 2006), a careful look at the inferred estimates reveals large discrepancies in the reported biogenic fluxes over this region. More precisely, whereas the GOME-derived isoprene emission is found to be 20% lower than the GEIA estimate, an increase by 20% with respect to the GEIA inventory is reported by Shim et al. (2005), although both studies use the same HCHO retrieval (but different inversion methods, a linear transfer function and a Bayesian approach, respectively). The HCHO columns over North America exhibit a pronounced seasonality with values ranging between 5×10¹⁵ molec./cm² and 13×10¹⁵ molec./cm² (Fig. 6, first two panels), whereas the largest divergence between the a priori and the data is observed from June through August. As shown in Fig. 5 and Table 3, the biogenic fluxes are significantly decreased after optimization over this region. The annual reduction varies between 32% and 45% over 2003-2006 in the eastern US, whereas the 2006 posterior isoprene emissions over the US are estimated at 11.5 Tg C, 38% lower than the a priori (Table 3).
Figure caption (fragment). On the left: monthly a priori pyrogenic emissions are displayed in dotted lines, in blue for GFEDv1 and in red for GFEDv2. Solid lines are used for the updated emissions, blue for I1 and red for I2. Annual a priori and posterior emissions for each year are shown inset, the a priori on the left, the posterior on the right. On the right: monthly a priori biogenic emissions are in black. Updates from I1 are in blue, from I2 in red. Annual a priori and posterior emissions for each year are shown inset.
It is worth noting that the HCHO retrieved columns by De  are about 30% lower than the GOME columns by Chance et al. (2000) used in previous studies. The reasons for the discrepancy between the two HCHO datasets over this region are unclear. The low biogenic emissions inferred by our study are supported, however, by the OMI-derived biogenic fluxes from June through August 2006, which are found to be up to 23% lower than their a priori MEGAN estimate. Further, the MEGAN inventory is found to be overestimated by up to a factor of two when compared to estimates based on airborne measurements (Warneke et al., 2007). Finally, a sensitivity calculation performed using halved isoprene emissions over the US resulted in a significant bias reduction between the INTEX-A campaign HCHO concentrations (Fried et al., 2008; Heikes et al., 2001) and the model concentrations in the boundary layer (Stavrakou et al., 2009).

Amazonia
Although Amazonia is thought of as the region where rain, high humidity and warmth meet all year round, large portions of the Amazon experience months of seasonal drought every year. The number of fire occurrences during the fire season, lasting from August to November, nearly doubled in the six-year period between (Koren et al., 2007). In particular, during the 2005 dry season, southwestern Amazonia experienced a severe drought, which induced massive fires in the Brazilian state of Acre (Aragão et al., 2007). This is reflected in the HCHO column enhancement detected by SCIAMACHY for this year over this region (Fig. 6, panel 6). Both pyrogenic emission inventories allow for a very good consistency with the data during the fire season, and an excellent agreement is found between the a priori model and the data over the extended western Amazonian region (10° S-5° N, 65-75° W, panel 5 of the same figure). Therefore, almost no change in the a priori biomass burning fluxes is inferred by the inversion (Table 3, Fig. 4), as opposed to a previous study (Shim et al., 2005) reporting a five-fold increase in biomass burning estimates over this region.
Table 4. Sensitivity studies performed using the GFEDv1 database.
Sensitivity inversion    Description
S1    a priori errors on the emission parameters halved
S2    a priori errors on the emission parameters doubled
S3    decorrelation length for biogenic emissions doubled
S4    zero decorrelation length for biogenic emissions
S5    modify the chemistry of isoprene as in Lelieveld et al. (2008)
S6    isoprene degradation as in the GEOS-Chem model (Evans et al., 2003)
The GFEDv1 inventory is found to be on average 35% higher than GFEDv2 over 2003-2006 in the extended Amazon region, a difference due to large discrepancies in the eastern part of the Amazon basin. In particular, over the Central-East Brazilian states Maranhão and Pará, the use of GFEDv2 leads to a severe underestimation of the modelled columns during the burning season (Fig. 6, last panel), whereas the use of GFEDv1 results in a better consistency with the data, despite the associated overestimation. The substantial bias reduction after the I1 optimization over central eastern Brazil is realized by decreasing the pyrogenic source by up to 20% and by increasing the biogenic flux by 20-60% (Figs. 3 and 4). The model/data bias reduction in the I2 scenario is achieved by a strong increase of the biogenic source, necessary in order to compensate for the low GFEDv2 emissions (Fig. 3), thus leading to an erroneous estimate for the biogenic source over this region. This result highlights the fact that, due to the overlap of the different emission source types, the quantification of NMVOC emissions using inverse modelling of HCHO columns should be conducted with particular care in order to avoid misinterpretation of the results.
Isoprene fluxes measured at two sites located in the Central Amazon basin, Tapajós (2° S, 55° W) and Manaus (2° S, 60° W), yield contradictory indications about the validity of the MEGAN-ECMWF estimates over this region. While MEGAN-ECMWF is found to be strongly overestimated in the Tapajós region, the isoprene fluxes over Manaus were found to be in good agreement with GEIA (Kuhn et al., 2007), which is higher than MEGAN-ECMWF over this region. Moreover, the MEGAN flux is found to be a factor of two higher than the surface isoprene flux measurements derived during the GABRIEL campaign in French Guiana and Suriname (Eerdekens et al., 2008). In contrast to the above deviations, the SCIAMACHY-derived isoprene emissions do not suggest significant departures from the a priori.

Sensitivity inversions
To quantify the importance of the inversion setup for the inferred NMVOC emission results, we conducted a number of sensitivity studies, which are summarized in Table 4. The a priori errors on the emission parameters, taken equal to 0.9 in the standard inversion for both emission categories, are now halved (S1) or doubled (S2), and the decorrelation length for isoprene emissions is set either to 1000 km or to zero (S3 and S4). The motivation for the S5 sensitivity study lies in the recent suggestion that the oxidation of isoprene recycles OH more efficiently than generally assumed under low NOx regimes, and that this could explain the underestimation of OH levels by current models over the pristine Amazon forest. Our isoprene degradation scheme has thus been modified to include a production of two OH radicals in the reaction of first generation peroxy radicals from isoprene with HO₂. In addition, the rate of the OH + isoprene reaction is reduced by 30% as recommended by Butler et al. (2008) in order to account for the isoprene/OH segregation effect. Finally, the isoprene degradation mechanism of the GEOS-Chem model (Evans et al., 2003) is used in the last sensitivity scenario. The GFEDv1 database is chosen as a priori inventory for carrying out the sensitivity exercises, and the results are compared with the corresponding estimates from the standard I1 inversion.
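The role of the a priori error and of the decorrelation length in S1-S4 can be illustrated with the a priori term of a generic Bayesian cost function, J_b(f) = 1/2 f^T B^-1 f. The sketch below is not the IMAGESv2 implementation: it assumes that the a priori corresponds to f = 0, that correlations decay exponentially with distance, and that the grid is a toy one-dimensional row of cells. The standard decorrelation length of 500 km is inferred from the statement that doubling it gives 1000 km.

```python
import numpy as np


def prior_covariance(positions_km, sigma, decorrelation_km):
    """Assumed form: B_ij = sigma^2 * exp(-d_ij / L); diagonal when L == 0."""
    positions_km = np.asarray(positions_km, dtype=float)
    distance = np.abs(positions_km[:, None] - positions_km[None, :])
    if decorrelation_km == 0:
        correlation = np.eye(positions_km.size)
    else:
        correlation = np.exp(-distance / decorrelation_km)
    return sigma**2 * correlation


def background_penalty(f, positions_km, sigma, decorrelation_km):
    """A priori term of the cost function, 1/2 f^T B^-1 f."""
    B = prior_covariance(positions_km, sigma, decorrelation_km)
    return 0.5 * f @ np.linalg.solve(B, f)


# Toy grid and trial emission parameters (hypothetical values).
positions_km = np.arange(5) * 500.0
f = np.full(5, 0.3)

# (a priori error, decorrelation length in km) for each setup.
setups = {
    "I1": (0.9, 500.0),    # standard inversion
    "S1": (0.45, 500.0),   # errors halved
    "S2": (1.8, 500.0),    # errors doubled
    "S3": (0.9, 1000.0),   # decorrelation length doubled
    "S4": (0.9, 0.0),      # zero decorrelation length
}

penalties = {
    name: background_penalty(f, positions_km, sigma, length)
    for name, (sigma, length) in setups.items()
}

for name, value in penalties.items():
    print(f"{name}: J_b = {value:.4f}")
```

In this sketch, halving the errors makes the same departure from the a priori four times more costly, and doubling them four times cheaper, which is consistent with the posterior staying closest to the a priori in S1 and departing most in S2.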
Changing the errors on the emission parameters is found to have a significant impact on the cost reduction obtained after optimization, which amounts to 10, 22 and 43% in the inversions S1, I1 (standard) and S2, respectively. The posterior pyrogenic emission estimates derived by the sensitivity tests are robust in all cases, with only weak deviations from the estimates of the standard I1 inversion. As seen on Fig. 8, which summarizes the deduced biogenic emissions by region for all inversions, the most significant departure from the a priori is found in the S2 results, whereas the posterior lies closest to the a priori in the S1 optimization, due to the strong trust assigned to the a priori inventory magnitudes and spatiotemporal patterns. The spread of regional flux estimates in the S1/S2 tests ranges between −23% and 23% compared to the I1 results over southern Africa and eastern US, whereas the dispersion is much less significant over Indonesia (−10% and 15%) and Amazonia (−6% and 12%), respectively.
The sensitivity inferred in the S3/S4 experiments is very weak (less than 12%) over the different regions (Fig. 8). Setting the biogenic length scale to zero means that the information content from emitting regions is not allowed to propagate across contiguous areas, which are not very strongly constrained by the observations. This results in unrealistic hot spots in the inferred emission patterns, with the highest emitting grid cells undergoing the largest changes, whereas the adjacent grid cells with lower emissions are not affected at all. Note also that the sensitivity of the results to these two scenarios would probably be larger if the model were run at a finer horizontal resolution.
Fig. 8. Isoprene emissions by region inferred by the inversion I1 (using GFEDv1) for the year 2003, and comparison with the emissions deduced from the sensitivity tests S1-S6 described in Table 4. Units are Tg/yr.
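The propagation of information through the a priori correlations can be seen in a minimal linear example. The sketch below is an illustration under simplifying assumptions, not the adjoint-based inversion used in this study: it takes a toy row of five grid cells, an assumed exponential correlation model, a single direct observation of the central cell, and hypothetical values for the innovation and the observation error.

```python
import math


def exponential_covariance(positions_km, sigma, decorrelation_km):
    """Assumed form: B[i][j] = sigma^2 * exp(-|x_i - x_j| / L); diagonal when L == 0."""
    n = len(positions_km)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                B[i][j] = sigma**2
            elif decorrelation_km > 0:
                distance = abs(positions_km[i] - positions_km[j])
                B[i][j] = sigma**2 * math.exp(-distance / decorrelation_km)
    return B


def increment_from_single_observation(B, observed_cell, innovation, obs_sigma):
    """Optimal linear update of all cells from one direct observation of one cell."""
    denominator = B[observed_cell][observed_cell] + obs_sigma**2
    return [B[i][observed_cell] * innovation / denominator for i in range(len(B))]


# Hypothetical setup: five cells 500 km apart, only the central one observed.
positions_km = [0.0, 500.0, 1000.0, 1500.0, 2000.0]
observed_cell = 2
innovation = 0.5      # observation minus a priori, in parameter units
obs_sigma = 0.3       # observation error

increment_uncorrelated = increment_from_single_observation(
    exponential_covariance(positions_km, 0.9, 0.0), observed_cell, innovation, obs_sigma
)
increment_correlated = increment_from_single_observation(
    exponential_covariance(positions_km, 0.9, 500.0), observed_cell, innovation, obs_sigma
)

print("L = 0 km:  ", [round(x, 3) for x in increment_uncorrelated])
print("L = 500 km:", [round(x, 3) for x in increment_correlated])
```

With a zero decorrelation length only the observed cell is updated, whereas with a finite length the neighbouring cells receive a share of the update that decays with distance.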
Not surprisingly, due to the artificially imposed OH recycling in the S5 inversion, the surface concentrations of OH are found to be more than a factor of two higher than in the I1 inversion over the Amazon forest. However, the HCHO surface concentrations calculated by I1 and S5 over this region do not differ significantly, despite the expected change induced by the enhanced sink term. This reflects the fact that photolysis is by far the dominant HCHO sink at the surface over Amazonia, and therefore the modelled HCHO columns are not affected by the additional OH source.
A global increase by 38% of the isoprene source strength is deduced from the S6 experiment. This is mainly due to the lower HCHO yield from isoprene (by 17% under high NOx conditions) in the GEOS-Chem mechanism compared to our mechanism (Stavrakou et al., 2009). The difference between the two mechanisms is more significant under low NOx levels, where the GEOS-Chem yields are found to be 35-50% lower than those calculated under high NOx conditions, whereas the IMAGESv2 yields are only slightly lower under low NOx conditions (Stavrakou et al., 2009). As a result, the posterior isoprene source increase is found to be more significant over tropical regions (30% in South Asia, 40% in South America), whereas the increase inferred in the mid-latitudes of the Northern Hemisphere is moderate (11% over North America). Such large discrepancies reflect important differences between the GEOS-Chem and MCM mechanisms, in particular regarding the dependence of the HCHO yield from isoprene on the NOx abundances. It is worth noting in this context that comparisons of model results with airborne measurements over Suriname in Amazonia suggest that the MCM mechanism (with which the IMAGESv2 mechanism is aligned) might overestimate the HCHO yield from isoprene under low NOx conditions.

Conclusions
This is the first study reporting estimates of pyrogenic and biogenic NMVOC emission strengths on the global scale constrained by formaldehyde columns retrieved from the SCIAMACHY instrument over 2003-2006. Our inversion framework, based on the adjoint model of the IMAGESv2 CTM, enables the optimization of the emission fluxes at the model resolution. A large number of HCHO precursors is included in the model, as well as state-of-the-art knowledge about degradation schemes and speciation for pyrogenically emitted species.
The GFED version 1 and 2 biomass burning inventories and the MEGAN-ECMWF database for isoprene are used to drive the model. Two inversion experiments are conducted, using GFEDv1 (I1) or GFEDv2 (I2) as a priori. Although the global posterior estimates remain very close to the a priori values, important changes are often inferred on the regional scale.
In Africa north of the equator, the inversions clearly suggest a decrease in the biomass burning source, by 16% and 35% with respect to GFEDv1 and GFEDv2. The strongest isoprene source change is encountered in southern Africa. The suggested average increase in the biogenic emissions over 2003-2006 amounts to 55%, and the posterior isoprene emission (57 Tg C/yr) agrees quite well with literature isoprene flux estimates in this region.
Whereas both GFEDv1 and v2 are consistent over Indochina and result in a very good agreement with the observed HCHO columns, a serious overestimation of the a priori columns is found over Indonesia when GFEDv2 is used. The a priori biogenic source over Indochina yields a nice model/data match, whereas over Indonesia a decrease by 20-30% is necessary to compensate for the high observed columns. The resulting isoprene emission estimate (23 Tg C/yr) agrees reasonably well with literature values.
The a priori simulations reproduce very well the observations over Amazonia, and thus no significant emission update is inferred over this region, as opposed to previous work suggesting a large increase of the isoprene source. However, over the central-eastern Amazon basin, the use of GFEDv1 or GFEDv2 leads either to overestimated or to strongly underestimated columns, respectively. Both inversions succeed in decreasing the model/data biases, but the I2 inversion fails to produce a reasonable posterior emission pattern.
Isoprene emissions over North America are found to be 25% lower than their a priori estimate over 2003-2006, the reduction being more important in the eastern US. The contiguous US isoprene flux for 2006 (11.5 Tg C) is 38% lower than the a priori. The suggested decrease, not reported in previous inversion studies using the Chance et al. (2000) GOME HCHO columns, is due to our significantly lower HCHO columns over this region compared to Chance et al. (2000). Although the reasons for the discrepancy between the retrievals remain unknown, the OMI satellite measurements also imply a decrease of up to 23% in the MEGAN estimate, corroborating our derived fluxes.
Although the inferred emissions are found to be only weakly sensitive to changes in the inversion setup (errors, correlations), they depend strongly on the choice of the degradation mechanism for isoprene, as shown by a sensitivity test performed with the GEOS-Chem model isoprene oxidation scheme. In this inversion experiment, the global isoprene source increases by 38% compared to our standard inversion results, the increase being more significant over low NOx areas in the Tropics. This strong sensitivity, associated with large uncertainties in the HCHO yields from isoprene under different NOx regimes, reflects the currently incomplete knowledge of the reactions involved in the isoprene oxidation.
The omission of an aerosol correction over fire scenes is a recognized, important source of error for the inferred biomass burning NMVOC fluxes, as it could lead to retrieved HCHO columns that are up to 40% too low. Therefore, the significant flux decreases inferred over e.g. Indonesia (when using GFEDv2) could be more moderate, whereas the deduced biomass burning flux increases over e.g. Amazonia and southern Africa (with GFEDv2) might be even more important. The effect of the inclusion of aerosols in the HCHO retrievals is under investigation and will be quantified in a future study.
Finally, it should be acknowledged that the inferred NMVOC emission estimates depend crucially on the quality of the HCHO columns observed from space. The large differences in the source strengths derived by different modelling groups might be partly due to the sometimes large discrepancies between the columns estimated by different retrieval groups. Hence, an intercomparison exercise between the different datasets, as well as a confrontation with available ground-based and airborne measurements, is needed in order to take full advantage of the measurements and to acquire additional insight into the underlying processes involved.