Evaluating the performance of pyrogenic and biogenic emission inventories against one decade of space-based formaldehyde columns

Abstract. A new one-decade (1997-2006) dataset of formaldehyde (HCHO) columns retrieved from GOME and SCIAMACHY is compared with HCHO columns simulated by an updated version of the IMAGES global chemical transport model. This model version includes an optimized chemical scheme with respect to HCHO production, where the short-term and final HCHO yields from pyrogenically emitted non-methane volatile organic compounds (NMVOCs) are estimated from the Master Chemical Mechanism (MCM) and an explicit speciation profile of pyrogenic emissions. The model is driven by the Global Fire Emissions Database (GFED) version 1 or 2 for biomass burning, whereas biogenic emissions are provided either by the Global Emissions Inventory Activity (GEIA), or by a newly developed inventory based on the Model of Emissions of Gases and Aerosols from Nature (MEGAN) algorithms driven by meteorological fields from the European Centre for Medium-Range Weather Forecasts (ECMWF). The comparisons focus on tropical ecosystems, North America and China, which experience strong biogenic and biomass burning NMVOC emissions reflected in the enhanced measured HCHO columns. These comparisons aim at testing the ability of the model to reproduce the observed features of the HCHO distribution on the global scale and at providing a first assessment of the performance of the current emission inventories. The high correlation coefficients (r>0.7) between the observed and simulated columns over most regions indicate a good consistency between the model, the implemented inventories and the HCHO dataset. The use of the MEGAN-ECMWF inventory improves the model/data agreement in almost all regions, but biases persist over parts of Africa and Australia.
Although neither GFED version is consistent with the data over all regions, a better agreement is achieved over Indonesia and Southern Africa when GFEDv2 is used, but GFEDv1 succeeds better in getting the correct seasonal patterns and intensities of the fire episodes over the Amazon basin, as reflected in the significantly higher correlations calculated in this region. Although the uncertainties in the HCHO retrievals, especially over fire scenes, can be quite large, this study provides a first assessment about whether the improved methodologies and input data implemented in GFEDv2 and MEGAN-ECMWF lead to better results in the comparisons of modelled with observed HCHO column measurements.


Introduction
Non-methane volatile organic compounds (NMVOCs) have a strong influence on tropospheric composition due to their impact on hydroxyl radical (OH) levels and because of their contribution to ozone production in the presence of NOx (Houweling et al., 1998). NMVOCs also influence the climate due to their role as precursors of secondary organic aerosols (SOA). Their lifetimes range from a few minutes to a few months. The largest source of NMVOCs is biogenic (1150 Tg C/yr; Guenther et al., 1995), as it represents about 85% of the total emission, the remainder being due to anthropogenic activities (12%, 161 Tg C/yr, Olivier et al., 2001) and vegetation fires (3%, 50 Tg C/yr, Andreae and Merlet, 2001, this study). Emissions of isoprene account for about half the total biogenic source (Guenther et al., 1995). However, due to their diversity, short lifetimes, and large spatiotemporal variability, the global burden as well as the speciation of NMVOCs are still highly uncertain.
Formaldehyde (HCHO) is the most abundant aldehyde in the atmosphere. A very small fraction is due to direct emissions by vegetation fires (Andreae and Merlet, 2001) and fossil fuel combustion (Olivier et al., 2003), but its main source is the chemical degradation of methane and the NMVOCs. Methane oxidation accounts for 60% of the global production of formaldehyde (ca. 1600 Tg/yr), the remainder being due to NMVOC oxidation, according to simulations performed using IMAGESv2, an updated version of the IMAGES global chemistry-transport model (Müller and Stavrakou, 2005). The contribution of directly emitted HCHO is small (<1% of the global total), yet it can be locally important, especially during fire events. Besides the relatively weak sinks associated with dry and wet deposition, the main sinks of HCHO include two photolysis reactions (HCHO→CO+H2 or HCO+H) and oxidation by OH (HCHO+OH→HCO+H2O). Although the reaction pathways leading from methane to HCHO are well established (e.g., Brasseur et al., 1999), the degradation mechanisms of the many NMVOCs are still very uncertain. For example, the globally averaged yield of HCHO from isoprene calculated using the MOZART model (Pfister et al., 2008) (1.1 mol/mol) is more than a factor of two lower than both the high NOx and low NOx yields (2.6 and 2.45 mol/mol) estimated using the Master Chemical Mechanism version 3 (MCMv3) (Saunders et al., 2003) (see also Subsect. 3.2). The final yield of HCHO in the oxidation of isoprene under high NOx conditions is about 20% higher in the MCMv3 (Saunders et al., 2003) than in the chemical mechanism of the GEOS-Chem model. For monoterpenes and sesquiterpenes, uncertainties are even larger. For example, the HCHO yield in the OH-initiated degradation of α-pinene is equal to 3.1 mol/mol in the MCM mechanism, but it is estimated to be about 20 times lower according to recent theoretical studies (Peeters et al., 2001; Vereecken et al., 2007).
In addition, large uncertainties are associated with the impact of isoprene on the oxidizing capacity of the atmosphere. Comparisons between the observations from recent tropical forest aircraft studies (Kuhn et al., 2007; Karl et al., 2007) and current models show that OH is largely underestimated in the boundary layer over the Amazon basin, possibly due to missing pathways or inaccurate reaction rates in the degradation mechanism of isoprene (Lelieveld et al., 2008).
Past studies have demonstrated the usefulness of HCHO column measurements from satellites as constraints on NMVOC emissions. GOME columns have been used together with the GEOS-Chem model in order to investigate isoprene emissions over North America (Abbot et al., 2003; Palmer et al., 2006), at the global scale (Shim et al., 2005), in southeast Asia (Fu et al., 2007), and in South America. Moreover, HCHO retrievals from the Ozone Monitoring Instrument (OMI) have been used to derive isoprene emissions over North America. Although these model studies demonstrate an excellent consistency between model results and satellite observations over several regions, like North America, serious discrepancies are often found in tropical regions, where the biomass burning and biogenic VOC sources cannot be disaggregated. For instance, in spite of the large increases in the posterior isoprene (factor of 2-4) and biomass burning (factor of two) emissions over tropical Africa inferred by the inverse modelling study of Shim et al. (2005) for the period 1996-1997, a satisfactory agreement between the simulated and the observed HCHO columns is not achieved after optimization.
In this article we present comparisons between the HCHO columns simulated with an updated version of the IMAGES CTM (Müller and Stavrakou, 2005) and a new dataset of spaceborne HCHO columns over the period 1997-2006. This 10-year record of HCHO columns retrieved from the GOME and SCIAMACHY satellite instruments is an air quality service of the Protocol Monitoring for the GMES Service Element: Atmosphere (PROMOTE) and Tropospheric Emission Monitoring Internet Service (TEMIS) projects (European Space Agency/Global Monitoring for Environment and Security-GMES) made available on the TEMIS website (www.temis.nl). It differs from previous retrievals (Wittrock et al., 2000, 2006) by the choice of the spectral window used, chosen to reduce artefacts over desert areas, minimize the noise, and produce a good degree of consistency between both instruments.
Our focus will be on tropical regions, where HCHO columns are dominated by pyrogenic and biogenic NMVOC emissions, as well as on North America and China, which experience enhanced biogenic emissions during the growing season. In order to provide meaningful comparisons of the model results with satellite data in regions where biomass burning is the dominant NMVOC source, the chemical scheme of the CTM has been extended and optimized with respect to HCHO production from pyrogenic NMVOCs. Due to the large uncertainties associated with pyrogenic and biogenic emission estimates, the model results are highly dependent on the emission inventories used in the comparisons. Two biomass burning inventories are tested in this study: the Global Fire Emission Database (GFED) version 1 (van der Werf et al., 2004), which has been extended until 2006 in the context of this study, and GFEDv2. Both databases rely on fire counts detected from satellites, and cover the 1997 to 2006 period. Biogenic emissions are provided either by the GEIA database (Guenther et al., 1995), or by a new inventory based on the MEGAN algorithm. The 10-year comparisons between simulated and measured HCHO columns are intended as a means to test the ability of the model to reproduce the observed features of the formaldehyde distribution in different regions of the world. These comparisons also provide a unique opportunity to assess the performance of emission inventories, highlight their potential strengths but also point to possible weaknesses.
This article is structured as follows. In Sect. 2 the satellite retrievals and the model are briefly presented. In Sect. 3 a detailed description of the proposed chemical mechanism is given, as well as the estimated contributions of individual NMVOCs to the total HCHO production. In Sect. 4, the relative contributions of the different emission sources to the simulated HCHO column are determined, and a crude estimation of the model errors is provided. Section 5 is dedicated to the comparisons between modelled and observed HCHO columns over the eastern US, Tropical America, Africa, the Far East and Indonesia. Finally, the conclusions and the perspectives of this work are discussed in Sect. 6. The Supplement comprises a short description of the pyrogenic and biogenic emission inventories used to drive the model (Part A), the complete chemical mechanism of the NMVOCs used in the model (Part B), as well as auxiliary figures for Sect. 4 (Part C).

GOME and SCIAMACHY HCHO columns
Observations from two UV-visible nadir sounders, GOME (Global Ozone Monitoring Experiment) launched in April 1995 onboard the ERS-2 satellite (pixel size: 320×40 km²) and SCIAMACHY (Scanning Imaging Absorption Spectrometer for Atmospheric Chartography) launched in June 2001 onboard ENVISAT (pixel size: 60×30 km²), have been used to retrieve global tropospheric HCHO column densities by applying the differential optical absorption spectroscopy (DOAS) technique (Platt et al., 1994) in the near UV region. Both satellites are in sun-synchronous orbit with an equatorial local overpass time of 10:30 and 10:00, respectively.
Slant column abundances are fitted within the 328.5-346 nm spectral window using a multi-purpose DOAS analysis software developed by Van Roozendael et al. (1999). This window is optimized to minimize noise, spectral interferences and artefacts over desert areas, and produces consistent HCHO datasets between GOME and SCIAMACHY observations. To calculate the air mass factors, scattering weights evaluated from radiative transfer calculations performed with a pseudo-spherical version of the DISORT code (Stammes et al., 1988; Kylling et al., 1995) are used together with a priori profile shapes provided by the IMAGESv2 model on a monthly basis and spatially interpolated at each satellite geolocation and each altitude.
A correction for cloud effects is applied to the dataset. No correction has been explicitly applied to account for the effect of aerosols on the air mass factors.
The effect of non-absorbing aerosols is implicitly included through the cloud correction (Boersma et al., 2004), and results in a relatively small error (generally lower than 16%) on the air mass factor calculation. Absorbing aerosols can lead to a reduction of the air mass factor by up to 40% (Fu et al., 2007). The omission of the aerosol correction may thus lead to a significant underestimation of the derived HCHO column by up to 40% over fire scenes. The inclusion of an explicit aerosol correction in the retrieval algorithm will be addressed in future work.
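To make the size of this aerosol effect concrete, the following minimal sketch applies the standard DOAS relation (vertical column = slant column / air mass factor) to a single pixel. The function name and all numerical inputs (slant column, aerosol-free air mass factor) are illustrative assumptions and are not taken from the retrieval; only the 40% reduction is the upper-end figure quoted above.

```python
def vertical_column(slant_column, amf):
    """Vertical column = slant column / air mass factor (AMF)."""
    if amf <= 0:
        raise ValueError("air mass factor must be positive")
    return slant_column / amf


# Illustrative (assumed) values, not taken from the actual retrieval.
slant_column = 1.2e16      # molec/cm^2
amf_no_aerosol = 1.0       # AMF computed without an aerosol correction
amf_reduction = 0.40       # upper-end AMF reduction by absorbing aerosols

amf_with_aerosol = amf_no_aerosol * (1.0 - amf_reduction)

retrieved = vertical_column(slant_column, amf_no_aerosol)
corrected = vertical_column(slant_column, amf_with_aerosol)

# Fraction by which the uncorrected column falls short of the corrected one.
underestimate = 1.0 - retrieved / corrected
print(f"underestimate: {underestimate:.0%}")
```

Under these assumptions, an air mass factor that should have been 40% smaller yields a column that is 40% too low relative to the aerosol-corrected value, which is the upper bound stated in the text.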
A thorough description of the error estimation is provided in De . For clear sky conditions, in absence of aerosols, the total error on the AMFs is estimated at 18% with equal contributions from the albedo and the profile shape uncertainties. The error increases with cloud fraction, mainly for low altitude clouds, up to 50% for a cloud fraction of 0.5. In this work, monthly HCHO averages are calculated using observations with a cloud fraction below 0.4. The total uncertainty on the monthly HCHO vertical column generally ranges between 20 and 40%, depending on the observation conditions. Over fire scenes, however, this uncertainty can be expected to be significantly larger due to the effect of aerosols on the air mass factors.
The vertical columns used in this study include GOME data from 1997 to 2002 and SCIAMACHY data from 2003 to 2006.

IMAGESv2 Chemical Transport Model
The IMAGESv2 CTM is an updated version of the IMAGES model (Müller and Stavrakou, 2005). Recent updates and improvements are briefly discussed in this section. The model provides the global distribution of 20 short-lived and 48 long-lived (i.e. transported) chemical compounds between the Earth's surface and the pressure level of 45 hPa through a chemical mechanism comprising 164 gas-phase reactions, 39 photodissociations and 3 heterogeneous reactions on the surface of sulfate aerosols. The model is run at a horizontal resolution of 5 degrees and is discretized in the vertical on 40 hybrid sigma-pressure layers. Monthly mean ECMWF/ERA40 reanalysed wind fields (Uppala et al., 2005) from September 1996 to December 2001 and ECMWF operational analyses beyond this date drive the advection represented by a semi-Lagrangian scheme (Smolarkiewicz and Rasch, 1991). The surface pressure, temperature and humidity fields are also monthly averages derived from the same ECMWF analysis. Note that hourly temperature and radiation fields are used as input for isoprene emissions. Horizontal and vertical diffusion coefficients are estimated using the ECMWF wind variances. ERA40 updraft mass convective fluxes are used until 2001, whereas a climatological mean over 1995-2001 is used beyond this year. Turbulent mixing in the planetary boundary layer (PBL) is parameterized as vertical diffusion.
The model time step is equal to one day and the model calculates daily averaged concentrations. Correction factors are applied on the photorates and the chemical kinetic rates to account for diurnal variations in the photorates and in the concentrations. These factors are updated every month and are given by $\gamma_i = \frac{\langle k_i C_A C_B \rangle}{\langle k_i \rangle \langle C_A \rangle \langle C_B \rangle}$, $C_A$ and $C_B$ being the concentrations of the reactants A and B, $k_i$ the rate of the reaction i, and $\langle \cdot \rangle$ the 24-h mean. The correction factor calculation is performed off-line from full diurnal cycle calculations using diurnally varying photorates and emissions with a 20 min time step using the fourth order Rosenbrock solver of the KPP package (Damian et al., 2002; Sandu and Sander, 2006). The HCHO diurnal profiles calculated from this simulation are also used to estimate the HCHO concentration at the satellite overpass time from the daily averaged values calculated with a one-day time step.
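The diurnal correction factor defined above can be illustrated with a short sketch. It evaluates the ratio of the 24-h mean of the product k·C_A·C_B to the product of the individual 24-h means, using synthetic hourly profiles. The profile shapes, magnitudes and function names are assumptions made purely for illustration; they are not output of the model or of the KPP solver.

```python
import math


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def diurnal_correction(k, c_a, c_b):
    """gamma = <k * C_A * C_B> / (<k> * <C_A> * <C_B>) over one diurnal cycle."""
    product_mean = mean([ki * a * b for ki, a, b in zip(k, c_a, c_b)])
    return product_mean / (mean(k) * mean(c_a) * mean(c_b))


# Synthetic hourly profiles (assumptions for illustration only).
hours = range(24)
daylight = [max(0.0, math.sin(math.pi * (h - 6) / 12)) for h in hours]

k = [1.0e-11] * 24                                     # constant rate coefficient
c_oh = [1.0e6 * s for s in daylight]                   # OH-like, zero at night
c_voc = [2.0e10 * (1.0 + 0.5 * s) for s in daylight]   # peaks together with OH

gamma = diurnal_correction(k, c_oh, c_voc)
print(f"gamma = {gamma:.3f}")
```

When the two reactants peak at the same time of day, gamma exceeds 1, so a model working with daily means would underestimate the reaction rate without the correction; for constant profiles gamma is exactly 1.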
The calculation of the monthly HCHO columns accounts for the sampling times of observations at each location. The simulated columns are evaluated against HCHO observations between January 1997 and December 2006, following a 4-month spin-up time. Archived fields from previous simulations are used as initial conditions. Anthropogenic CO and total NMVOC emissions are taken from the EDGAR v3.3 inventory for 1997 (Olivier et al., 2001; Olivier, 2002, see also http://www.mnp.nl/edgar/). Speciation of VOC emissions is obtained from the POET database (Olivier et al., 2003), except for biofuel emissions where we use the updated speciation by M. O. Andreae (personal communication, 2007) (Andreae and Merlet, 2001).
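One simple way to picture how model averages can account for the observation sampling is sketched below: the model is averaged only over those days for which a valid observation exists in the grid cell. This is a minimal illustration under assumed conventions (daily model values at overpass time, `None` marking days without a usable measurement); the function name and the numbers are hypothetical and do not describe the actual implementation in IMAGESv2.

```python
def sampled_monthly_means(model_daily, obs_daily):
    """Average model and observed columns over days with a valid observation.

    obs_daily uses None for days without a usable measurement.
    Returns (model_mean, obs_mean), or (None, None) if no day qualifies.
    """
    pairs = [(m, o) for m, o in zip(model_daily, obs_daily) if o is not None]
    if not pairs:
        return None, None
    n = len(pairs)
    model_mean = sum(m for m, _ in pairs) / n
    obs_mean = sum(o for _, o in pairs) / n
    return model_mean, obs_mean


# Hypothetical columns for one grid cell, in units of 1e15 molec/cm^2.
model = [8.0, 9.0, 12.0, 10.0]
obs = [None, 10.0, 14.0, None]

model_mean, obs_mean = sampled_monthly_means(model, obs)
print(model_mean, obs_mean)
```

In this example the sampled model mean (10.5) differs from the plain four-day mean (9.75), which shows why ignoring the sampling could bias a comparison during episodic events such as fires.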
The distribution of vegetation fires is provided by the Global Fire Emission Database (GFED) version 1 (van der Werf et al., 2004) and version 2 (van der , which cover the complete 10-year period from 1997 to 2006. These databases are briefly described and compared in the Supplement (Part A), and the global NMVOC fluxes are also given for each database and year. The NMVOC pyrogenic emission factors for tropical forest burning, extratropical forest burning and savanna burning are obtained from M. O. Andreae (personal communication, 2007) (Andreae and Merlet, 2001). A total number of 16 explicit NMVOCs are pyrogenically emitted in the current version of the model, whereas a lumped species accounts for the emissions of the non-explicit NMVOCs (Table 1), as discussed in Sect. 3.
The diurnal profile of biomass burning emissions displayed in Fig. 1 is applied in the diurnal cycle calculations with the CTM. This profile is based on fire observations from satellites over 15 tropical and subtropical regions (Giglio, 2007). The diurnal distribution of fire activity has a maximum in the afternoon, little or no burning during the night and in the early morning, and a slightly longer tail behind the afternoon peak. It should be acknowledged, however, that large differences exist between diurnal profiles in different regions, even though they exhibit qualitatively similar patterns (Giglio, 2007). The impact of the diurnal cycle of pyrogenic emissions is assessed in a sensitivity calculation with the model in Subsect. 4.2. To account for fire-induced convection, the biomass burning emissions are distributed over six layers from the surface to 6 km (0-100 m, 100-500 m, 500 m-1 km, 1-2 km, 2-3 km, 3-6 km), according to the spatially-dependent fractional distribution of emission heights provided in Dentener et al. (2006).
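The vertical injection described above amounts to splitting a surface flux among six height layers according to a set of fractions. The sketch below shows that bookkeeping. The layer edges follow the text, whereas the fractions and the total flux are hypothetical placeholders; the actual, spatially dependent fractions are those of Dentener et al. (2006) and are not reproduced here.

```python
# Layer edges in metres, following the six layers listed in the text.
LAYER_EDGES_M = [0, 100, 500, 1000, 2000, 3000, 6000]


def distribute_emission(total_flux, fractions):
    """Split a column-total emission flux among the injection layers.

    Returns a list of (bottom_m, top_m, layer_flux) tuples.
    """
    if len(fractions) != len(LAYER_EDGES_M) - 1:
        raise ValueError("one fraction per layer is required")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    bottoms = LAYER_EDGES_M[:-1]
    tops = LAYER_EDGES_M[1:]
    return [(b, t, total_flux * f) for b, t, f in zip(bottoms, tops, fractions)]


# Hypothetical fractions for a single grid cell (not from Dentener et al.).
fractions = [0.2, 0.2, 0.2, 0.2, 0.1, 0.1]
layers = distribute_emission(100.0, fractions)

for bottom, top, flux in layers:
    print(f"{bottom:>5}-{top:<5} m: {flux:.1f}")
```

The check that the fractions sum to one guarantees that the vertically integrated emission equals the inventory total, whatever profile is chosen.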
Biogenic emissions of isoprene are provided by the MEGAN-ECMWF inventory, which is based on the MEGAN model coupled with a detailed canopy environment model and driven by ECMWF fields. The isoprene emissions are daily averaged and gridded onto the resolution of IMAGESv2. The GEIA database (Guenther et al., 1995) is also used in this study as an alternative prior inventory. A description of these inventories and their differences is given in the Supplement (Part A).
The biogenic source of methanol is calculated as in Jacob et al. (2005), and amounts to 130 Tg/yr due to the plant growth process and 22 Tg/yr due to decaying dead plant matter. Biogenic emissions for CO, NOx and other NMVOCs are described in Müller and Stavrakou (2005), as well as ocean emissions of CO and NMVOCs.
Optimization of the chemical scheme with respect to HCHO production

The chemical scheme
The NMVOC chemistry species, chemical reactions and kinetic rates are presented in the Supplement (Part B). The inorganic chemistry reactions, as well as the methane degradation mechanism, are kept the same as in Müller and Stavrakou (2005), with updated kinetic rates from Sander et al. (2006). The degradation mechanism for the majority of the NMVOCs is largely based on the Master Chemical Mechanism (MCM) (Saunders et al., 2003). For isoprene, the condensed Mainz Isoprene Mechanism (MIM) (Pöschl et al., 2000), derived from the MCM, has been adopted with some modifications. More precisely, the mechanism includes compounds that do not appear explicitly in the MIM, like glycolaldehyde and glyoxal. The oxidation of isoprene hydroxynitrates by OH is adapted directly from the MCM, with the product NO2 being replaced by HNO3, a more likely product in the OH-addition pathway of alkylnitrates (Atkinson, 1994). Further, the rate constant for the formation of methacrylic peroxynitrate (MPAN) is multiplied by a factor of 0.12, i.e. the yield of acylperoxy radicals in C5H8+OH according to the MCM (0.27) multiplied by the ratio of the NO-reaction rates of alkylperoxy and acylperoxy radicals (0.43).
Table 1. Photochemical production of HCHO from pyrogenic NMVOCs and from biogenic isoprene. The emission estimates are 10-year averages based on GFEDv2 and the MEGAN/ECMWF inventory. Short-term and final yields are obtained from box model simulations (see text for details). Note a: This total includes 10 Tg/year of compounds with negligible HCHO yields: formic acid and acetylene (Saunders et al., 2003), hydrogen cyanide (Li et al., 2003), and acetonitrile (Tyndall et al., 2001).

HCHO yields from NMVOC oxidation
We present here calculations of the HCHO yields from the oxidation of different NMVOCs using the fully explicit Master Chemical Mechanism v3.1 (Saunders et al., 2003), and we compare with the HCHO yields calculated with the IMAGESv2 chemical mechanism (Table 1). To allow a meaningful comparison between the HCHO yields calculated with two different mechanisms, we employ in IMAGESv2 the same inorganic chemistry and photolysis rates provided for the MCM, except for the quantum yield of hydroxyacetone, which is taken equal to that of acetone, and for the quantum yields of methacrolein (MACR) and methyl vinyl ketone (MVK), which follow the recommendation of Sander et al. (2006).
Box model time-dependent simulations have been performed in both cases using the chemical solver of the KPP package (Damian et al., 2002; Sandu and Sander, 2006). Simulations start at 06:00 for a temperature of 298 K at a latitude of 30 degrees in February. The model is initialized with 0.1 ppb of the considered NMVOC, 35 ppb O3, and 100 ppb CO. The NO2 concentration is kept constant throughout the simulations and is taken equal to 1 ppb; such a choice reflects the high NOx regime associated with biomass burning events. Simulation results using 0.1 ppb NO2 have also been obtained, but are generally omitted in the discussion below, except for the important case of isoprene, a compound emitted in both high NOx and low NOx environments. Two HCHO yields are computed: after one day of simulation (short-term yield), and after 2 months ("ultimate" or final yield). The short-term yield (Eq. 1) is defined as the amount of HCHO produced during the first day of simulation divided by C0(NMVOC), the initial concentration of the NMVOC. This yield represents the number of HCHO molecules generated by a given NMVOC one day after the injection time. The short-term yield defined in Eq. (1) provides a better indication of the HCHO production that may be detected by the satellite instrument directly above biomass burning areas.
The final yield is defined as the amount of HCHO produced over the full simulation divided by ΔC(NMVOC), the difference between the initial and the final NMVOC concentrations. Due to the importance of both short-term and final yields in the correct representation of the HCHO production by our chemical mechanism, particular care has been taken to ensure that the IMAGESv2 calculated HCHO yields are as close as possible to the MCM yields, as is evident from Table 1. Compounds with the highest short-term HCHO yield are ethene (1.38 mol/mol), propene (1.78 mol/mol), 2,3-butanedione (2 mol/mol), and isoprene (2.26 mol/mol). Strongly emitted compounds like acetic acid and methanol with lifetimes of several days have very small 1-day yields. For relatively short-lived species like ethene, glycolaldehyde, propene, acetaldehyde, and isoprene, more than 80% of the final yield is reached within the first day in the box model simulations. The shortest-lived compounds glyoxal, methylglyoxal and biacetyl reach their ultimate HCHO yield within only a few hours.
Both IMAGESv2 and MCM yields of HCHO from isoprene are about 20% higher than the yield calculated with the GEOS-Chem mechanism under high NOx conditions. The GEOS-Chem yield has been found by Millet et al. (2006) to be consistent with aircraft measurements of isoprene and formaldehyde over North America. The molar primary yields of MVK (0.33), MACR (0.23) and HCHO (0.6) estimated from laboratory studies (Atkinson et al., 2006) are well reproduced by the MCM, except for a slight overestimation concerning HCHO (yields of 0.34, 0.22 and 0.67, respectively). The formation of HCHO observed in MVK and MACR photooxidation experiments has also been shown to be fairly well reproduced by the MCM mechanism (Pinho et al., 2005), when replacing the strongly overestimated quantum yields of MVK and MACR of the MCMv3 by values in line with the recommendations of Sander et al. (2006) adopted in IMAGESv2 (see Supplement). In conclusion, the HCHO yield from isoprene at high NOx appears to be only slightly biased in the IMAGESv2 mechanism according to the available laboratory data. The yield dependence on the abundance of NOx is more uncertain. As illustrated in Table 2, the yields in both MCM and IMAGESv2 are only slightly lower at low NOx (0.1 ppbv NO2) compared to high NOx (1 ppbv) (by 10% and 16% in MCM and IMAGESv2, respectively). This contrasts with the stronger dependence found in the GEOS-Chem mechanism, with 35%-50% lower yields under low NOx levels, due to the loss of carbon in lumped products of organic hydroperoxide reactions in this mechanism.
The MCM-computed HCHO yields from α- and β-pinene under the aforementioned conditions are 2 and 1.7 mol/mol, respectively (see also Palmer et al., 2006). However, the primary yield of HCHO from α-pinene+OH is estimated to be lower than 0.2 mol/mol according to theoretical estimates and in agreement with laboratory data (Peeters et al., 2001, and references therein). Therefore, despite the fact that biogenic emissions of terpenes can exceed those of isoprene for some ecosystems, terpenes are thought to contribute little to the HCHO columns, and will be omitted from the present analysis.

Quantifying the HCHO production from vegetation fires
The HCHO production by an NMVOC is calculated as $P(\mathrm{HCHO}) = P(\mathrm{NMVOC}) \times \frac{MW_{\mathrm{HCHO}}}{MW} \times Y$, where P(NMVOC) is the pyrogenically emitted NMVOC, MW its molecular weight, $MW_{\mathrm{HCHO}}$ the molecular weight of HCHO, and Y the short- or long-term yields computed by the MCM or the IMAGESv2, as shown in Table 1 and described in Subsect. 3.2. Note that the values in Table 1 are given for comparison purposes and that HCHO formation is explicitly treated in IMAGESv2.
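The conversion from an emitted NMVOC mass to an HCHO mass described above can be sketched as follows, assuming that emissions are expressed as mass of the compound and that the molar yield is converted to a mass yield through the ratio of molecular weights. The short-term yield of ethene (1.38 mol/mol) is the value quoted in Subsect. 3.2; the emission of 5 Tg/yr is a hypothetical input chosen for illustration, and the function name is not from the paper.

```python
MW_HCHO = 30.03  # g/mol, molecular weight of formaldehyde


def hcho_production(emission_tg, molecular_weight, molar_yield):
    """HCHO mass produced (Tg) from an emitted NMVOC mass (Tg).

    emission_tg      : emitted mass of the NMVOC
    molecular_weight : molecular weight of the NMVOC in g/mol
    molar_yield      : mol of HCHO formed per mol of NMVOC
    """
    if molecular_weight <= 0:
        raise ValueError("molecular weight must be positive")
    return emission_tg * (MW_HCHO / molecular_weight) * molar_yield


# Ethene: MW = 28.05 g/mol, short-term yield 1.38 mol/mol (Subsect. 3.2).
# The 5 Tg/yr emission is a hypothetical value, not taken from Table 1.
ethene_production = hcho_production(5.0, 28.05, 1.38)
print(f"short-term HCHO from ethene: {ethene_production:.2f} Tg/yr")
```

Because the molecular weights of ethene and HCHO are similar, the mass of HCHO produced is close to the emitted mass scaled by the molar yield.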
The total ultimate HCHO production from fires is estimated at 66 Tg/yr, half of which is produced after one day. As illustrated in Table 1, only three NMVOCs, namely acetic acid, methanol and ethene, account for one third of the total short-term HCHO production, and 43% of the final production. Among them, the oxidation of ethene contributes the most, 27% in the short-term and 18% in the long-term HCHO production, whereas the slowly reacting methanol and acetic acid contribute mainly to the final production term. Other important contributions to the short-term HCHO production come from directly emitted HCHO (14%), C3H6 (12%), 2,3-butanedione (7%), and CH3CHO (6%). Glyoxal, although abundantly emitted during fire events, leads to a negligible HCHO formation. The contribution of aromatic NMVOCs (benzene, toluene and xylenes) to the total HCHO production is also insignificant (1-2%), and therefore these compounds are not included explicitly in the present study. Although formic acid, hydrogen cyanide and acetonitrile represent about 9% of the total NMVOC emissions, they do not lead to HCHO formation after oxidation. The compounds explicitly included in the IMAGESv2 chemical mechanism represent about 87% of the total short- and long-term HCHO production, the remainder being accounted for by the surrogate species C4H10 (Müller and Stavrakou, 2005).
Since the short-term and final HCHO productions given in Table 1 have been calculated under high NOx conditions, and neglecting wet and dry deposition losses of oxygenated intermediates, they should be seen as only an upper limit of the HCHO produced in the atmosphere.

Contribution of the different emission sources to the total simulated HCHO column
The aim here is to calculate the contribution of different emission sources to the total annual modelled HCHO columns. Model runs are conducted for the year 2006 using the GFEDv2 biomass burning inventory and the MEGAN-ECMWF isoprene database. The precise choice of simulation year and emission inventories does not have a large influence on the calculated contributions, except for pyrogenic emissions, which exhibit a large interannual variability (see Fig. 1 in the Supplement).
To avoid the feedbacks of the emissions on the oxidant concentrations, we archive monthly OH, HO2, NO, NO2 and NO3 fields from a full simulation with all sources included. The contribution of each source category to the total HCHO column is calculated by including only emissions from that source, while keeping the concentrations of the oxidants at the archived values throughout the simulations. The four categories considered are the oxidation of methane (together with a small oceanic NMVOC source), anthropogenic emissions, vegetation fires and biogenic emissions. The resulting annual mean HCHO columns are shown in Fig. 2, as well as the global amount of HCHO produced annually by the corresponding source. As illustrated in this figure, the oxidation of methane in the background troposphere represents 60% of the HCHO source on the global scale (HCHO yield from methane oxidation is equal to ca. 0.9). However, oxidation of locally emitted NMVOCs generally dominates over continents. Among the NMVOCs, biogenic compounds (mostly isoprene) are dominant and contribute up to 30% on the global scale. Although anthropogenic sources are significant in populated and industrialized areas, they are responsible for only 7% of the total source. The smallest contribution (3%) comes from biomass burning. Although insignificant on the global scale, fire episodes lead to locally enhanced HCHO columns detectable by the satellites, especially in tropical regions.
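For orientation, the percentage shares quoted above can be turned into approximate absolute amounts by combining them with the global HCHO production of ca. 1600 Tg/yr given in the Introduction. The sketch below does only this arithmetic; combining the two figures is an assumption made for illustration, and the values actually reported in Fig. 2 may differ from these rounded estimates.

```python
GLOBAL_HCHO_SOURCE_TG = 1600.0  # ca. value quoted in the Introduction (Tg/yr)

# Global shares of the HCHO source by category, as quoted in the text (%).
shares_percent = {
    "methane oxidation": 60.0,
    "biogenic NMVOCs": 30.0,
    "anthropogenic NMVOCs": 7.0,
    "vegetation fires": 3.0,
}


def absolute_contributions(total, shares):
    """Convert percentage shares into absolute amounts of the same total."""
    if abs(sum(shares.values()) - 100.0) > 1e-9:
        raise ValueError("shares must sum to 100%")
    return {name: total * pct / 100.0 for name, pct in shares.items()}


contributions = absolute_contributions(GLOBAL_HCHO_SOURCE_TG, shares_percent)
for name, value in contributions.items():
    print(f"{name}: {value:.0f} Tg/yr")
```

The fire contribution obtained this way (about 48 Tg/yr) lies below the 66 Tg/yr box-model estimate of Subsect. 3.3, in line with the statement there that the latter is an upper limit.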
A comparison between the annually averaged spaceborne HCHO columns and the corresponding simulated columns is presented in Fig. 3 for 1997 and 2005. Regions with enhanced HCHO concentrations are clearly mostly related to biogenic emissions and biomass burning. The model reproduces well the observed distribution over continental regions and to some extent over the oceans, in particular within the Tropics. The SCIAMACHY columns tend to be slightly larger than both the modelled and the GOME values, especially at mid-latitudes of both hemispheres. As already discussed by De , the offset found between GOME and SCIAMACHY columns generally increases with latitude, and concerns primarily the winter values. Still, even for Europe, the offset (1.3×10^15 molec/cm²) is well below the estimated errors on the vertical columns. Over the eastern US, the offset appears to be negligible during summertime.
The seasonal variability of the monthly averaged observed and modelled columns for the year 2000 is illustrated in Fig. 4. The averages from the model account for the sampling times of observations at each location. The model generally reproduces well the seasonal variation of the HCHO columns over the main emission regions (continental Tropics, US and China), which are the main focus of our study. Noticeable exceptions (e.g. over Southern Africa in September-October, or over Indonesia and Australia throughout the year, see Fig. 4) can be generally attributed to the emissions used in the model, as will be discussed in detail later.
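The sampling-matched averaging mentioned above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the model column is available once per day at the overpass time and that a boolean flag marks the days with a valid (cloud-screened) observation; the function name and inputs are illustrative and do not reproduce the actual IMAGES processing.

```python
import numpy as np

def sampled_monthly_mean(model_daily, obs_valid):
    """Monthly mean of model columns restricted to days with valid observations.

    model_daily : daily model HCHO columns for one grid cell (hypothetical input)
    obs_valid   : booleans, True where the satellite provided a valid column
    """
    model_daily = np.asarray(model_daily, dtype=float)
    obs_valid = np.asarray(obs_valid, dtype=bool)
    if not obs_valid.any():
        return float("nan")  # no observation in this month: no average defined
    return float(model_daily[obs_valid].mean())

# Illustrative values only (units of 1e15 molec/cm2)
print(sampled_monthly_mean([1.0, 2.0, 3.0, 4.0], [True, False, True, False]))
```

Averaging only over the observed days keeps the model and the data comparable when the satellite sampling within a month is uneven.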

Uncertainties in the modelled HCHO columns
The model errors are very difficult to estimate because they originate in a large diversity of uncertainties related to the chemical mechanism, the OH and photolysis fields, the transport scheme, the emissions (e.g. injection heights, diurnal cycle), and specific numerical aspects of the IMAGES model. We provide here a tentative evaluation of these uncertainties based on a series of sensitivity simulations. Our aim is more to identify the largest sources of error than to provide precise error bars on the columns. Also, this evaluation is partial, since not all sources of error are investigated here.
Model runs are conducted for the year 2004 using the GFEDv2 biomass burning inventory and the MEGAN-ECMWF isoprene database. The precise choice of simulation year and emission inventories is not expected to have a large influence on the conclusions of this sensitivity analysis. Table 3 summarizes the simulations and provides the corresponding globally averaged biases and root mean square deviations (RMSD) calculated over grid cells characterized by a large HCHO signal (SCIAMACHY columns larger than 6×10¹⁵ molec/cm²). The global distributions of the relative differences with respect to the standard run are given in Part C of the Supplement for the most important cases.
Since our focus is on the main emission regions, where the HCHO column is mostly due to the oxidation of fast-reacting NMVOCs such as isoprene (see Subsect. 4.1), large-scale horizontal transport is not expected to have a large influence on the HCHO levels over these areas, and we do not attempt to quantify the errors associated with advection in the model. The role of subgrid-scale vertical transport is investigated in simulations T1-T4 (see Table 3). Convection redistributes the NMVOCs from the boundary layer to the free troposphere, where photolysis rates and OH levels are usually higher. Consequently, the chemical lifetime of the HCHO column is reduced when convection is enhanced in the model. Ventilation due to large-scale horizontal transport is also stronger at higher altitudes. The HCHO column is therefore reduced when convective fluxes are doubled, by 3.8% on average, and by up to about 10% in specific areas. The effect of changes in the vertical diffusion coefficients is comparatively much smaller.
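As an illustration of the statistics reported for each sensitivity run, the following Python sketch computes a mean relative bias and an RMSD with respect to the standard run over grid cells with a large observed signal. It is a minimal sketch under assumptions: the differences are taken as relative (in percent), no area weighting is applied, and the function and argument names are hypothetical; the exact definitions used for Table 3 may differ.

```python
import numpy as np

def sensitivity_stats(standard, perturbed, observed, threshold=6e15):
    """Mean relative bias and RMSD (both in %) of a sensitivity run.

    standard, perturbed : modelled HCHO columns (molec/cm2) per grid cell
    observed            : observed columns used only to select the grid cells
    threshold           : minimum observed column for a cell to be included
    """
    standard = np.asarray(standard, dtype=float)
    perturbed = np.asarray(perturbed, dtype=float)
    observed = np.asarray(observed, dtype=float)

    mask = observed > threshold  # keep only cells with a large HCHO signal
    rel = 100.0 * (perturbed[mask] - standard[mask]) / standard[mask]

    bias = float(rel.mean())                  # signed average difference
    rmsd = float(np.sqrt(np.mean(rel ** 2)))  # magnitude of the differences
    return bias, rmsd

# Illustrative values only: the third cell is excluded by the threshold
bias, rmsd = sensitivity_stats(
    standard=[1.0e16, 8.0e15, 2.0e15],
    perturbed=[0.9e16, 8.8e15, 4.0e15],
    observed=[7.0e15, 9.0e15, 1.0e15],
)
print(bias, rmsd)
```

The bias can be close to zero while the RMSD is not, which is why both numbers are useful when regional changes of opposite sign cancel in the global average.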
The abundance of OH radicals is influenced by many factors, including the radiative fluxes and the abundance of key compounds like O₃, NOₓ, CO and the NMVOCs. Furthermore, uncertainties in the chemical mechanism of NMVOCs might lead to large errors in the determination of OH concentrations. We illustrate the dependence of HCHO columns on OH levels in simulation C1, which uses an independent determination of the OH fields (Spivakovsky et al., 1990). Over the main NMVOC emission areas, the surface-level concentrations of OH of Spivakovsky et al. (1990) are 100-500% higher than in the standard run. In other regions, differences range between −80 and +50%. In spite of these large differences in OH, the relative differences in HCHO columns are small over most emission regions (<10%), although they can reach 20% over several low-NOₓ continental areas, and up to 50% over the Southern Ocean and over Europe during winter. In regions where methane is the main source of HCHO (see Subsect. 4.1), the production of HCHO is proportional to the OH abundance. Since the reaction with OH is only a minor sink for HCHO (about 24% globally according to our calculations), the HCHO abundances are almost proportional to the OH levels over these areas. Over the main biogenic and pyrogenic emission regions, however, most HCHO precursors are short-lived, and the production of HCHO shows little dependence on the OH levels.
Fig. 3. HCHO columns retrieved from GOME in 1997 and SCIAMACHY in 2005 and calculated using IMAGESv2 with the GFEDv1 biomass burning inventory and biogenic emissions from MEGAN-ECMWF. Units are 10¹⁵ molec/cm². Blank regions in the upper panels denote a lack of valid data (e.g. missing cloud cover information, South Atlantic Anomaly, large zenith angles).
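The near-proportionality between HCHO and OH in methane-dominated regions can be made explicit with a simple steady-state argument. This is only a sketch under assumptions that are not part of the original text: HCHO is in local photochemical steady state, methane oxidation by OH is its only source with yield Y, and all sinks other than the reaction with OH (photolysis, deposition) are lumped into a single first-order loss rate J. The symbols Y, k_1, k_2, J and f are introduced here for illustration.

```latex
\begin{align}
P &= Y\,k_1\,[\mathrm{OH}]\,[\mathrm{CH_4}], \qquad
L = \bigl(J + k_2\,[\mathrm{OH}]\bigr)\,[\mathrm{HCHO}], \\
[\mathrm{HCHO}] &= \frac{Y\,k_1\,[\mathrm{OH}]\,[\mathrm{CH_4}]}{J + k_2\,[\mathrm{OH}]}, \\
\frac{\partial \ln[\mathrm{HCHO}]}{\partial \ln[\mathrm{OH}]}
  &= 1 - \frac{k_2\,[\mathrm{OH}]}{J + k_2\,[\mathrm{OH}]} = 1 - f .
\end{align}
```

Here f is the fraction of the HCHO sink due to the reaction with OH. With the global value f ≈ 0.24 quoted above, the logarithmic sensitivity is about 0.76, so that a 10% change in OH would change the HCHO abundance by roughly 7.6%. The local value of f may of course differ from the global mean.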
Simulation C1 probably provides only an upper limit for the effect of OH levels on HCHO columns over emission regions, because the parameterization of Spivakovsky et al. (1990) neglects the effect of isoprene and other terpenoids on OH, and therefore probably overestimates the OH concentrations in these regions. Over areas characterized by large isoprene emissions and low NOₓ abundances, OH levels are strongly depleted in the model, a common feature of most CTMs (Lelieveld et al., 2008). This OH shutdown appears to be overestimated, however, according to recent field studies, and the OH sink due to the reaction of NMVOCs with OH could be partly compensated by unknown OH regeneration reactions in the degradation mechanism of isoprene (Lelieveld et al., 2008). The impact of an additional production of OH radicals in the oxidation of isoprene is tested in simulation C2. As suggested by Lelieveld et al. (2008), two OH radicals are produced in the reaction of isoprene hydroperoxides with OH. The surface-level OH concentrations are increased by 100-200% over remote forests in simulation C2, in comparison with the standard simulation. This OH increase results in only a small increase (a few percent) in the HCHO columns in these areas.
As mentioned in Subsect. 3.2, large differences exist between the isoprene degradation mechanisms used by different models. The GEOS-Chem mechanism for isoprene replaces the IMAGESv2 mechanism in simulation C3. The HCHO columns are reduced by about 11%, on average, and by up to 20% in low-NOₓ tropical areas (see Fig. 3 in the Supplement).
Despite the dominance of photolysis in the global sink of formaldehyde, the impact of doubling or halving the cloud optical depth (runs C4-C5) is small (RMSD around 1%). It should be recalled that cloudy pixels are rejected from the HCHO averages and that the model average accounts for the sampling times of the observations. Therefore, both the observed and simulated monthly columns represent averages in conditions when clouds have a limited influence on the photolysis rates.
The role of the temporal and vertical distribution of pyrogenic emissions is explored in simulations E1-E3. The omission of the diurnal cycle (run E1) leads to HCHO column increases exceeding 10% locally, because biomass burning emissions have their diurnal minimum in the hours preceding the time of the satellite overpass. The use of 8-day averages from the GFEDv2 inventory (instead of monthly averages in the standard run) leads to a complex pattern of HCHO column changes reflecting the sometimes uneven temporal sampling of HCHO columns. The RMSD in both E1 and E3 simulations is relatively small (2.9%). The vertical distribution of pyrogenic emissions has only a minor influence on the HCHO columns, with an RMSD of only 0.7%.
Numerical errors are associated with the chemical solver and the time step used in the model, as well as with the procedure used to account for the diurnal cycle (see Subsect. 2.2). The calculated HCHO columns are moderately  Fig. 4 in the Supplement).

Performed simulations
Three 10-year model simulations are conducted, using different combinations of emission inventories, as summarized in Table 4. The modelled HCHO columns in each of these simulations are evaluated against observed columns over the individual model grid cells and regions displayed in Fig. 5. Given the high noise and poor agreement between the model and the observations at high latitudes (e.g. Europe, Canada, and Siberia) and in the region influenced by the South Atlantic Anomaly (South America around 30° S), these areas will be excluded from the discussion. Furthermore, since the chemical scheme used in the model is not optimized with respect to HCHO production from anthropogenic NMVOCs, our interest focuses mainly on the Tropics, where pyrogenically and biogenically emitted NMVOCs are primarily responsible for the enhanced HCHO abundances (Figs. 2, 3), the anthropogenic contribution being much less important (Fig. 2). Comparisons are also provided over the eastern US and China, since the HCHO columns there are still dominated by the biogenic source, which is about twice as large as the anthropogenic source, as shown in the 2006 annual mean of Fig. 2.
Monthly averaged modelled and observed HCHO columns are compared over the regions shown in Fig. 5 and are illustrated in Fig. 6    for the burning season and for the rest of the year in Tables 5 and 6. The burning season is defined as the months of the year for which vegetation fires (CO₂ emission exceeding 5×10¹² molec/cm²/s) co-occur with the lowest precipitation rates. The differences between the computed correlation coefficients for each simulation are tested for statistical significance using Fisher's z-transformation (Bronshtein and Semendyayev, 1997), with the sample size being adjusted for (first-order) autocorrelation. The highest correlation coefficients are marked in bold in Tables 5 and 6 when the differences are found to be significant at the 95% confidence level. Pearson correlation coefficients r_d have also been calculated after removing the mean seasonal variation over 1997-2006 during the non-burning season, as illustrated in Table 6.
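The significance test described above can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the effective sample size formula (based on the product of the lag-1 autocorrelations of the two series) is one commonly used approximation and is an assumption here, as are all function names. The test also treats the two correlation coefficients as independent, which is a simplification when both simulations are compared with the same observations.

```python
import math
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation coefficient of a time series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d * d))

def effective_sample_size(n, rho_x, rho_y):
    """Sample size reduced for first-order autocorrelation (assumed formula)."""
    prod = max(rho_x * rho_y, 0.0)  # do not inflate n for negative products
    return n * (1.0 - prod) / (1.0 + prod)

def fisher_z_difference(r1, n1, r2, n2):
    """Two-sided test for the difference between two correlation coefficients.

    Returns the z statistic and the p-value; the difference is significant
    at the 95% confidence level when p < 0.05.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher's z-transformation
    se = math.sqrt(1.0 / (n1 - 3.0) + 1.0 / (n2 - 3.0))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    return z, p

# Illustrative values only: 120 monthly means, lag-1 autocorrelations of 0.5
n_eff = effective_sample_size(120, 0.5, 0.5)
z, p = fisher_z_difference(0.80, n_eff, 0.60, n_eff)
print(n_eff, z, p, p < 0.05)
```

Reducing the sample size for autocorrelation widens the standard error, so fewer differences pass the 95% threshold than with the nominal number of months.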

Eastern US
In Fig. 6, the monthly averaged modelled columns are compared with observed HCHO columns over the eastern United States (regions 1 and 2 of Fig. 5). Black and red solid lines correspond respectively to the S1 and S2 model simulations described in Table 4, monthly observed HCHO columns are represented as black diamonds, and the error bars represent the HCHO retrieval errors. The strong seasonal variation in the observed HCHO columns over this region, with values about a factor of two higher in summer than in winter, correlates clearly with the growing season characterized by enhanced isoprene emissions, as already reported in Abbot et al. (2003) and Palmer et al. (2006). Both biogenic emission datasets are consistent with each other over this region and capture the seasonal variations very well. This is reflected in the high correlation obtained for North America between the modelled and the observed columns (Table 6), even though the model is biased high by 16-19% on average in summertime, the overestimation being more significant for the southeastern US (about 38%). The use of the MEGAN-ECMWF inventory results in somewhat lower biases and higher correlation coefficients compared to the GEIA inventory over North America.
The model overestimation contrasts with the underestimation of GEOS-Chem HCHO compared to the Harvard GOME columns over the eastern US by 20-30%. Part of the overestimation in our study could be due to the relatively high yield of formaldehyde from isoprene in our chemical mechanism, as discussed in Subsect. 3.2. More importantly, the GOME HCHO columns used in this work are about 30% lower in this region than the Harvard GOME columns, which have been found to be up to 14% higher than OMI HCHO columns in the eastern US. The reasons for these discrepancies are currently unknown. A detailed intercomparison between the datasets is clearly needed to shed light on the causes of these differences.
A sensitivity calculation performed using halved isoprene emissions over North America (Fig. 6, blue line) leads to a significant reduction of the model/data bias for all years, from 37.2% to 7.6% over the southeastern US. The ability of the model to reproduce independent HCHO observations over this region is tested in Fig. 7, where the IMAGESv2 profiles derived from the S2 simulation and from the simulation with the reduced biogenic source are compared with the mean observed vertical distribution during the INTEX-A campaign (July-August 2004) over four regions. The measurements were performed by two groups, the National Center for Atmospheric Research (NCAR) and the University of Rhode Island (URI), and are publicly available at the NASA data center website (www.air.larc.nasa.gov). Different measurement methods were used by the two groups, namely tunable diode laser absorption spectrometry (Fried et al., 2008) by NCAR and an automated coil enzyme fluorometric system by URI (Heikes et al., 2001). The URI dataset has systematically lower values than the NCAR data, by about 30% in the boundary layer. The S2 modelled mixing ratios (black lines in Fig. 7) at altitudes higher than 1.5 km lie mostly between the values defined by the two datasets, whereas over the continental boundary layer they are more than 10% higher than the NCAR measurements, and about 20% higher than in the GEOS-Chem CTM (Millet et al., 2006). Reducing the biogenic source by a factor of two results in a 20-25% decrease of the HCHO concentration in the boundary layer, which is then found to lie between the values provided by the two datasets.

Tropical America
Comparisons are shown in Figs. 8 and 9 over several regions of Central and South America. The model/data agreement over these regions is quite satisfactory, especially for the extended Amazonian region, Guatemala, and Northern Mato Grosso. The high spatiotemporal correlation coefficients calculated in both the burning and non-burning seasons over Tropical America and Amazonia lend confidence to the spatiotemporal distribution of the implemented emission inventories. Important reductions of the model/data discrepancies are achieved when the lower emissions of the MEGAN-ECMWF inventory are used (Fig. 9 and Table 6). The differences between the correlation coefficients of S1 (using GEIA) and S2 (using MEGAN-ECMWF) are found to be significant for both the Amazonian basin and Tropical America during the non-burning season, with the MEGAN-ECMWF inventory offering a higher correlation (Table 6).
Although once rarely touched by fires, the Amazon rainforest now experiences human-induced fires, which render the damage caused by natural droughts, related principally to El Niño events, even more devastating. The Amazonian drought in 1997/1998, caused by the exceptionally strong El Niño event, and the 2005 extended dry season, most probably associated with the Atlantic Multidecadal Oscillation (Marengo et al., 2008), resulted in increased fire activity (Aragão et al., 2007). The HCHO column enhancements caused by these fires are quite well captured by both GFEDv1 and GFEDv2 (Supplement, Fig. 1), although their magnitude and timing may differ. The main burning season spans August through October over the Amazon basin, as illustrated in Figs. 8 and 9. As seen in these figures, the S2 simulation using GFEDv1 allows for a better match with the measurements compared to the S3 model run in most of the selected regions. This is also reflected in the higher correlation coefficients calculated in the S2 simulation during the burning season, despite the large positive biases between the modelled and the observed HCHO columns (Table 5). Such overestimation of the modelled columns over biomass burning regions might be partly or entirely due to the lack of aerosol correction in the retrieval, which can lower the retrieved HCHO columns by up to 40%, as already mentioned in Subsect. 2.1. Conversely, the generally underestimated modelled columns in the S3 simulation (Figs. 8, 9) are most probably related to an important underestimation of the NMVOC emissions used to drive the model.
Over the extended Amazonian region, the higher fire emissions of GFEDv1 and the lower values of the MEGAN-ECMWF emissions (S2 run) provide an excellent consistency with the measured columns, which contrasts with previously conducted studies (Shim et al., 2005). The agreement is very good over the Guatemala grid cell in both (wet and dry) seasons. Over Northern Peru, while the use of the GEIA inventory leads to systematically overestimated columns (by up to 50%), a substantial bias reduction is achieved when using the lower emissions of the MEGAN-ECMWF inventory. GFEDv2 succeeds well in capturing the dry season maxima over the Peru-Bolivia-Brazil grid cell, but fails to get the right magnitude over N-E Brazil and Santarem. It should be noted, however, that the MEGAN-ECMWF isoprene fluxes are found to be largely overestimated, especially in the wet season, when compared to surface flux measurements conducted at the Tapajós National Forest (2°51′ S, 54°58′ W), although the adequacy of a comparison between averages over a large region and point measurements can be questioned. The seasonal pattern of these flux measurements is in agreement with in-situ isoprene concentration measurements for 2002 (Trostdorf et al., 2004). Further measurements are obviously needed in order to assess the representativeness of these measurements at a larger scale.

Africa
Over Africa, North of the equator, the dry season extends from November through March, and the rainy season from April through October, whereas in the Southern Hemisphere the contrary is true. On both sides of the equator, however, local climates with two dry and two wet seasons are found. The two biomass burning inventories capture reasonably well the timing of the fires, yet several differences exist regarding the intensity of the maxima as well as the seasonal patterns.
Over the Central African Republic, a satisfactory agreement with the observations is obtained with GFEDv1, whereas systematically overestimated columns, especially in the dry season (up to 100% in January 2000 and are derived when GFEDv2 is used. The absence of aerosol correction in the retrieval could account for only a part of this large overestimation. The wet season over this region is characterized by significantly lower emissions compared to the dry season (factor of two). The lower values of the MEGAN-ECMWF isoprene emissions lead to a reduction of the model/data bias in the wet season for most of the years.
Over Southern Africa, the use of the MEGAN-ECMWF emissions increases the correlation between modelled and observed columns by 31% (from 0.44 to 0.58) compared to the simulation using the GEIA inventory (Table 6), a difference which is found to be statistically significant. Nevertheless, it results in a negative average bias (21.3%) of the modelled columns, except during the dry season, when hydrocarbon emissions by fires dominate over the biogenic source. This could be due to a bias in the retrievals and/or to an underestimation of biogenic emissions over this region. In fact, field studies characterizing species composition and isoprene emission factors conducted over this region allowed for the determination of the MEGAN isoprene emission estimates with smaller uncertainties than those for many other regions (Otter et al., 2003). On the other hand, the low emission rates of the MEGAN-ECMWF inventory used might be partly due to the soil moisture stress, which could have influenced the campaign measurements at the Southern African locations used to determine the MEGAN basal emission rates. It is more likely, however, that the very simple parameterization for soil moisture stress used in MEGAN might not be appropriate (except for shutting off emissions in severe drought conditions). During the dry season, the one-month delay in the fire peak in GFEDv2 is supported by the observations and results in a higher correlation coefficient (0.78), compared to the simulation using GFEDv1 (0.67). This difference is found to be significant at the 95% level of confidence (Table 5).
Over the Ivory Coast region, the use of GFEDv2 results in generally underestimated columns during the fire season, whereas the emission peaks and their variations are better captured in the S2 simulation. Both biogenic inventories are in good agreement, although the higher GEIA values generally improve the agreement with the data. However, there are no isoprene emission measurements reported for this region, and the MEGAN-ECMWF estimates are expected to be highly uncertain. In the equatorial rainforest of Congo, which experiences two dry seasons and two wet seasons every year, the use of the MEGAN-ECMWF inventory generally leads to lower modelled HCHO columns, but with a seasonal variation closer to that of the observations. The biogenic emission estimates for Congo derived by MEGAN are based on a limited set of aircraft or tower flux measurements that indicate that this region has generally low isoprene emissions. Both biomass burning inventories, however, miss the observed maximum in February.

Asia and Australia
The monthly mean HCHO columns over Northern and Southern China (Fig. 11) exhibit a pronounced seasonal cycle associated with enhanced biogenic emissions. The modelled columns using the GEIA inventory (black line) are overestimated by up to 20-40% in the growing season (May-October), whereas the use of the lower MEGAN-ECMWF emissions (S3 simulation) leads to a better agreement over both regions. The slight underestimation showing up after 2002 can be partially explained by a small offset between the GOME and SCIAMACHY column data. Another possible explanation could be a positive trend of isoprene emissions in China, especially over small spatial scales, due to massive afforestation programs which rank China as the country with the highest recorded annual planting rates. Such changes, however, should happen progressively throughout the years, as tree plantations become mature and emit isoprene at higher rates. Both pyrogenic inventories agree on their estimates of burnt biomass over China and provide a satisfactory agreement with the observations, except in April 2002, when neither inventory succeeds in capturing the elevated HCHO value.
Total biomass burning NMVOC emission from GFEDv2 for Far East and South Asia is estimated at about 20 Tg/yr for 1997-2001 average fire activity, and at 5.5 Tg for the year 2000, a factor of two lower than the prior estimate (12 Tg) used in the inversion study of Fu et al. (2007) for the same year. Their inversion constrained by GOME HCHO columns suggests a five-fold increase of the biomass burning source, which is not supported by our comparisons, even when accounting for retrieval biases over fire scenes. Further, the MEGAN-ECMWF biogenic fluxes averaged over the 1997-2001 period are 40% lower than the GEIA source (62 vs. 87 Tg/yr), but agree reasonably well with the posterior estimate of Fu et al. (2007) (56 Tg/yr).
The extremely high values of the monthly mean HCHO columns over Indonesia in September/October 1997 (up to 2.5×10¹⁶ molec/cm²) are due to the El Niño-induced fires, which raised the pollution levels to unprecedented heights over Sumatra and Borneo. This peak is overestimated by the model by more than a factor of two over Borneo when GFEDv2 is used. Below-ground burning, taken into account in GFEDv2, leads to enhanced fire emissions over Indonesia, and is believed to be the main reason for the discrepancy between the two inventories over these regions. The inclusion of peat fires in GFEDv2 is nevertheless partly supported by our comparisons, because of the better correlation with the data (0.82 vs. 0.71), which is found to be statistically significant (Table 5), even though modelled columns are biased higher on average with respect to the S2 simulation (59% vs. 36.3%).
Over Northern Australia, the use of the MEGAN-ECMWF inventory leads to a large overestimation (51.5%) of HCHO columns relative to the satellite observations. Over this continent, woody vegetation is dominated by Eucalyptus trees. The very high isoprene emission estimated for Northern Australia by MEGAN-ECMWF is primarily based on the assumption that all Eucalyptus trees emit isoprene. Although high isoprene emission rates have been reported for several Eucalyptus species (He et al., 2000), this represents only a few percent of the more than 700 Eucalyptus species found in Australia (Cronin, 2000). In addition, since it is known that other diverse genera, including oaks and acacias, have both isoprene emitters and non-emitters, it would not be unreasonable to suppose that at least some eucalypts emit small or negligible isoprene quantities. Measurements on the dominant Australian species are necessary to determine whether the assumptions made in the MEGAN model lead to overestimated isoprene emissions in northern Australia. The use of the MEGAN-ECMWF inventory, however, considerably improves the agreement regarding the seasonal variation of the observed HCHO columns, since the biogenic emissions over this region show a strong seasonality, as confirmed by the correlation coefficient, which is increased from 0.12 in the S1 simulation to 0.59 during the non-burning season (Table 6). During the burning season, higher correlations are achieved over Northern Australia when using GFEDv2 instead of GFEDv1, the difference being found to be statistically significant (0.67 vs. 0.47 in S2 simulation).

Conclusions
A new dataset of spaceborne HCHO columns derived from the GOME and SCIAMACHY satellite instruments and covering the period 1997-2006 has been compared with corresponding columns calculated with an updated version of the IMAGES global chemical transport model. The main difference between this dataset and previous retrievals lies in the choice of the fitting window, which reduces retrieval problems over desert areas, while allowing for a good consistency between datasets from the two instruments. Particular emphasis in the comparisons has been placed on the continental Tropics, where HCHO abundances are generally dominated by biogenic and pyrogenic NMVOC emissions, and on North America and China, which experience strong biogenic emissions during the growing season. Two biomass burning inventories (GFEDv1 and GFEDv2) and two biogenic inventories (GEIA and MEGAN-ECMWF) are evaluated with IMAGESv2 through comparisons between observed and simulated monthly averaged HCHO columns. The NMVOC chemical mechanism has been adjusted and extended on the basis of box model simulations conducted using the MCM mechanism for the most prominent pyrogenic NMVOCs. Three model simulations have been carried out using the following combinations of inventories: GFEDv1/GEIA (S1), GFEDv1/MEGAN-ECMWF (S2), and GFEDv2/MEGAN-ECMWF (S3). The high correlation coefficients computed between the (monthly averaged) simulated and retrieved columns provide strong confidence in both the model and the emission inventories used. The use of MEGAN-ECMWF is found to improve the correlation between model and data over the regions of Fig. 5 and Table 6, as evidenced by the statistically significant differences in the correlation coefficients between the S1 and S2 simulations. This suggests that MEGAN-ECMWF provides a better representation of the temporal variability in the emissions.
The situation is more complex regarding the biomass burning inventories. Although neither GFED version appears to be consistent with the observations over all regions, the differences between the two inventories can be evaluated based on our model/data comparisons. Two features of GFEDv2, namely the emissions associated with peat bog fires and the one-month delay of the emission peak over Southern Africa, induced by the use of MODIS hot spots, are corroborated by the observations, since the use of GFEDv2 results in a better seasonal variation over Indonesia and Southern Africa, in spite of an overestimation of the modelled columns. This is confirmed by the higher correlation coefficients over these regions during the burning season for the S3 simulation. Over Tropical America, however, the GFEDv1 inventory provides a significantly better correlation with the data columns compared to the very low emissions of GFEDv2. It should be acknowledged again, however, that large uncertainties in the HCHO retrieval are related to the omission of aerosol correction over fire scenes, which may lead to a significant underestimation of the HCHO columns. As a consequence, the overestimation of the modelled columns over several biomass burning areas, like Amazonia for GFEDv1, Indonesia for GFEDv2, or Oceania for both inventories, may be partly or entirely explained by the lack of aerosol correction in the retrievals. Conversely, underestimated modelled columns, as for instance over Amazonia and Southern Africa for GFEDv2, probably imply a large underestimation of the NMVOC emissions implemented in the model.
Table 6. Non-burning season average biases (model-observations) and correlation coefficients r and r_d (before and after removing the mean seasonal variation) calculated for the model simulations S1 and S2 (Table 4) over  Other regions are as in Table 5. Numbers in bold letters denote cases where the differences in the correlation coefficients of the two simulations are significant at the 95% confidence level.
Despite the overall good performance of the two biomass burning databases, it is understood that the use of monthly averaged emissions constitutes a limitation in the comparisons, because of the very high variability of fire emissions. Furthermore, the NMVOC degradation mechanisms used in CTMs constitute another important issue, as demonstrated by the factor of two difference between the HCHO yields from different isoprene degradation mechanisms used in current CTMs. Uncertainties in the chemical mechanism of other biogenic compounds are probably even larger.
Although this study puts emphasis on tropical regions, where the detected HCHO signal is strong, anthropogenic NMVOC emissions are also expected to produce a detectable signal in industrialized areas. Anthropogenic NMVOC emissions over North America seem, however, to be undetectable even with the high resolution Ozone Monitoring Instrument (OMI) in summertime. Their contribution in wintertime is expected to be more significant, though. In order to quantify this contribution based on satellite retrievals, the chemical mechanism of the model should be extended to include all anthropogenic precursors of formaldehyde, which should be identified based on a detailed box model study using (quasi-) explicit degradation mechanisms. This issue will be addressed in future work. The quantification of the parent NMVOC emissions through inverse modelling should, however, be performed with particular care due to the spatiotemporal overlaps of the different emission source types.
It is understood that our conclusions depend vitally on the quality of the retrieved columns. However, discrepancies exist among the HCHO datasets, inherent to differences in the retrieval methods. For instance, our GOME slant columns are about 30-40% lower than the Chance et al. (2000) dataset over North America and desert regions, whereas over central/Southern Africa, the HCHO columns used in Meyer-Arnek et al. (2005) and Wittrock et al. (2006) are 40% lower than in the TEMIS dataset. In view of these discrepancies, it is very important to carry out systematic comparisons between the datasets. More efforts should be devoted to validation with ground-based and aircraft data, as well as to the synergistic use of different satellite instruments over the same time period.