Which processes drive observed variations of HCHO columns over India?

Abstract. We interpret HCHO column variations observed by the Ozone Monitoring Instrument (OMI), aboard the NASA Aura satellite, over India during 2014 using the GEOS-Chem atmospheric chemistry and transport model. We use a nested version of the model with a horizontal resolution of approximately 25 km. HCHO columns are related to local emissions of volatile organic compounds (VOCs) with a spatial smearing that increases with the VOC lifetime. Over India, HCHO has biogenic, pyrogenic, and anthropogenic VOC sources. Using a 0-D photochemistry model, we find that isoprene has the largest molar yield of HCHO, which is typically realized within a few hours. We also find that forested regions that neighbour major urban conurbations are exposed to high levels of nitrogen oxides. This results in depleted hydroxyl radical concentrations and a delay in the production of HCHO from isoprene oxidation. We find that propene is the only anthropogenic VOC emitted in major Indian cities that produces HCHO at a comparable (but slower) rate to isoprene. The GEOS-Chem model reproduces the broad-scale annual mean HCHO column distribution observed by OMI (r = 0.6), which is dominated by a distinctive meridional gradient in the northern half of the country, and by localized regions of high columns that coincide with forests. Major discrepancies are noted over the Indo-Gangetic Plain (IGP) and Delhi. We find that the model has more skill at reproducing observations during winter (JF) and pre-monsoon (MAM) months with Pearson correlations r > 0.5 but with a positive model bias of ≃ 1 × 10¹⁵ molec cm⁻². During the monsoon season (JJAS) we reproduce only a diffuse version of the observed meridional gradient (r = 0.4). We find that on a continental scale most of the HCHO column seasonal cycle is explained by monthly variations in surface temperature (r = 0.9), suggesting a role for biogenic VOCs, in agreement with the 0-D and GEOS-Chem model calculations.
We also find that the seasonal cycle during 2014 is not significantly different from the 2008 to 2015 mean seasonal variation. There are two main loci for biomass burning (the states of Punjab and Haryana, and northeastern India), which we find makes a significant contribution (up to 1 × 10¹⁵ molec cm⁻²) to observed HCHO columns only during March and April over northeastern India. The slow production of HCHO from propene oxidation results in a smeared hotspot over Delhi that we resolve only on an annual mean timescale by using a temporal oversampling method. Using a linear regression model to relate GEOS-Chem isoprene emissions to HCHO columns we infer seasonal isoprene emissions over two key forest regions from the OMI HCHO column data. We find that the a posteriori emissions are typically lower than the a priori emissions, with a much stronger reduction of emissions during the monsoon season. We find that this reduction in emissions during monsoon months coincides with a large drop in satellite observations of leaf phenology that recovers in post-monsoon months. This may signal a forest-scale response to monsoon conditions.

(OH) (Jaeglé et al., 1998). It therefore plays a role in determining the oxidizing capacity of the global troposphere. The principal source of HCHO is the oxidation of methane (CH₄), which provides a global ambient background. Shorter-lived non-methane volatile organic compounds (NMVOCs) elevate HCHO concentrations over continental atmospheres. Minor direct HCHO sources include biomass burning, industry, agriculture, automobiles, shipping, and vegetation. The atmospheric lifetime of HCHO, determined by OH and photolysis, is typically several hours. Building on our past studies (Palmer et al., 2006; Barkley et al., 2008; Gonzi et al., 2011), we interpret HCHO column distributions over India observed by the Ozone Monitoring Instrument (OMI) aboard the NASA Aura spacecraft (Levelt et al., 2006). We interpret spatial and temporal variations in terms of biogenic, pyrogenic, and anthropogenic VOC sources.
India has the sixth largest economy and the second largest population of any country. It also has one of the largest increases in mortality rates due to chronic exposure to elevated levels of surface ozone and particulate matter (Cohen et al., 2017). Figure 1 shows that India has a rich landscape that includes the Thar Desert over northwestern India, major forests over the southwestern coast and over the east and northeast, and five megacities (New Delhi, Mumbai, Kolkata, Bengaluru, and Chennai). The Indo-Gangetic Plain (IGP) stretches from eastern Pakistan, across the northern edge of India (bounded by the Himalayas), to Bangladesh. The IGP represents more than a quarter of a million acres of fertile land, which is used primarily to grow rice and wheat, but also maize, sugarcane, and cotton. The southwest monsoon represents the main source of water to the IGP, with contributions also from rivers flowing from the Himalayas. The southwest monsoon begins in June and subsides in September. High temperatures over the Thar Desert cause a region of low pressure that helps to establish a large-scale land/sea breeze with the Indian Ocean. This results in warm, moisture-laden air from the Indian Ocean travelling inland. Eventually, this air meets the Himalayas where it is forced to rise. As the air rises, cooler temperatures result in precipitation. Some areas of India receive 10 m of rain annually, mostly during the monsoon season.
Satellite column observations of HCHO were originally developed using observed UV spectra from the Global Ozone Monitoring Experiment (Thomas et al., 1998; Chance et al., 2000). HCHO column data are now available from a range of satellite instruments, but here we focus on data from OMI. Generally, slant HCHO columns are retrieved by directly fitting to observed spectra in a narrow UV window (Chance et al., 2000; De Smedt et al., 2008; González Abad et al., 2015). Vertical columns are determined by scaling these slant columns by scene-dependent air mass factors (AMFs), taking clouds and aerosol scattering into account (Palmer et al., 2001). Past work has shown that HCHO columns over India have increased nationwide on average by between 1.6 % yr⁻¹ (from 1997; De Smedt et al., 2010) and 1.5 % yr⁻¹ (from 1995; Mahajan et al., 2015). Mahajan et al. (2015) also showed, using coincident satellite measurements of HCHO and NO₂, that over much of India O₃ production is limited by the availability of nitrogen oxides but over urban regions it is limited by the availability of VOCs, supported by detailed modelling studies over Delhi (Sharma et al., 2016). Here, we employ a high-resolution (approximately 25 km) model of atmospheric chemistry that is closely aligned with the resolution of the satellite data, allowing us to take advantage of the richness of these data.
HCHO columns are related to their parent VOC emissions with a smearing spatial scale that is related to the production rate and molar yield of HCHO. Past studies have used this relationship to infer isoprene emissions from major forests (Abbot et al., 2003; Shim et al., 2005; Palmer et al., 2006, 2007; Barkley et al., 2008; Millet et al., 2008; Stavrakou et al., 2009; Curci et al., 2010; Marais et al., 2012; Barkley et al., 2013), biomass burning emissions (Young and Paton-Walsh, 2010; Gonzi et al., 2011; Stavrakou et al., 2016), and anthropogenic emissions (Fu et al., 2007; Stavrakou et al., 2009). Detailed photochemical calculations that link VOCs and the time-dependent production of HCHO lay the groundwork for interpreting the HCHO column data. Many past studies have inferred VOC emissions from the HCHO columns using a linear regression model between these two variables (e.g. Palmer et al., 2003; Millet et al., 2008), but others have adopted a more rigorous Bayesian inverse model approach (e.g. Shim et al., 2005). For central Africa, Marais et al. (2012) employed an inversion method that accounted for the NOₓ dependence of the isoprene-HCHO relationship. Given the large uncertainties associated with VOC emissions (e.g. Guenther et al., 2012) and the production of HCHO in the low-NOₓ regime, both approaches provide useful insights. Our study is focused on India where there are significant sources of biogenic, pyrogenic, and anthropogenic VOCs.
The next section describes the OMI HCHO column data, the detailed box model used to study the time-dependent production of HCHO from VOC oxidation, and the GEOS-Chem atmospheric chemistry transport model focused on India. Section 3 reports the results from our analysis of the OMI HCHO column data over India, the associated model interpretation of these data, and the isoprene emissions we infer from the HCHO column data collected over two major forest regions. We conclude in Sect. 4.

Data and methods
Our data are focused on India, as defined by the GADM database of Global Administrative Areas (www.gadm.org). We adopt climatological definitions of seasons from the Indian Meteorological Department that are determined by the onset of the regional monsoon system as follows: winter includes January and February, the pre-monsoon season is from March to May, the monsoon season is from June to September, and the post-monsoon season is from October to December. Our focus is on OMI HCHO column variations during 2014, but we put this year into a longer temporal context by comparing it with data collected from 2008 to 2015.
Figure 1 (caption, partially recovered): the blue outline represents the states of Punjab and Haryana; the red outline denotes the "Seven Sister States"; and the yellow outline represents the state of Kerala. The pink and green shaded areas denote the NE and E forest sites, where we infer isoprene emissions from OMI HCHO columns. The grey shaded area denotes the oversampling region used to study Delhi. The stars denote the sites where we study HCHO production using a 0-D photochemical model. The population image is modified with permission from http://luminocity3d.org/: data provided by EC JRC & CIESIN, and the design provided by Duncan Smith, Bartlett Centre for Advanced Spatial Analysis, University College London.

Ozone Monitoring Instrument (OMI) HCHO columns
For our analysis we use HCHO vertical columns from OMI, a nadir-viewing UV/Vis spectrometer aboard the NASA Aura satellite that was launched in 2004. Aura is in a sun-synchronous orbit with a local equatorial crossing time of 13:38, achieving daily global coverage subject to cloud coverage. OMI uses two imaging grating spectrometers each with a CCD detector to collect solar backscattered radiation in the spectral range 270-500 nm using three channels: UV-1 (264-311 nm), UV-2 (307-383 nm), and Vis (349-504 nm). OMI has an across-track swath width of 2600 km, which in global mode is described by 60 scenes that have ground footprints from 13 × 24 km² at nadir to 28 × 160 km² at the swath edges.
Determination of HCHO vertical columns is a two-step procedure, which we describe below. First, slant columns are retrieved from the observed spectra. Second, an air mass factor (AMF) is used to transform these slant columns into geophysical vertical columns that can be compared more easily with models.
We use the NASA OMHCHOv003 data product (González Abad et al., 2015) from the NASA Data and Information Services Center, which fits HCHO slant columns in the 328.5-356.5 nm window and accounts for competing absorbers, the Ring effect, and undersampling. Typical slant columns range from 4 × 10¹⁵ to 6 × 10¹⁶ molec cm⁻² with associated fitting uncertainties ranging from 30 % for larger columns to more than 100 % for smaller columns (González Abad et al., 2015).
We adopt a conservative approach to filtering data, based on previous studies (e.g. De …). We filter slant column data using the main data quality flag (removing suspect, bad, or missing scenes); scenes that have been flagged as being affected by the row anomaly, corresponding to a problem with a row of CCD detectors on OMI; scenes with slant column values greater than 15 × 10¹⁶ molec cm⁻² that we also attribute to potential row anomalies; scenes with solar zenith angles less than 0° or greater than 70°; scenes with cloud fractions > 0.4; and scenes from the last two rows at the edge of the swath, which we believe are too wide to be useful in our analysis. We do not anticipate that these data filters will introduce bias in our analysis, with the exception of removing cloudy scenes, which introduces a clear-sky bias. To evaluate the clear-sky bias, we compared the corresponding model HCHO columns (described below) with and without these cloudy scenes and find that removing them results in a monthly mean positive bias of 13 %, consistent with previous work (Palmer et al., 2001).
To transform each observed slant column to a vertical column we calculate an AMF that accounts for temporal and spatial variations from scattering due to clouds and aerosols, and for the vertical distribution of HCHO. This approach is described in detail by Palmer et al. (2001) and Martin et al. (2003), and has since been evaluated in a large number of studies (e.g. Palmer et al., 2003, 2006; Fu et al., 2007; Millet et al., 2008; Barkley et al., 2008; Curci et al., 2010; Gonzi et al., 2011; Marais et al., 2012). We use the OMI OMCLDO2 cloud data product, and the nested GEOS-Chem model to provide information about aerosols and HCHO vertical distributions, described below.
Finally, as a post-processing step, we remove any large-scale biases using a reference sector normalization procedure (Martin et al., 2003). Over the remote Pacific we expect observed HCHO columns to be determined exclusively by the oxidation of methane. By anchoring observed values over this specific region (0-40° N, 160-180° W) to the corresponding GEOS-Chem model values we determine the monthly bias that we subsequently subtract from the data within the study domain.
Previous work using aircraft observations over the southeastern U.S. has reported that the OMHCHOv003 OMI HCHO column data product has a negative bias (37 %), and that the GEOS-Chem model (v9-02) has a 10 % negative bias (Zhu et al., 2016). In the absence of similar data over India, we cannot determine with certainty the generality of these biases, and have chosen to implicitly assume that the OMI data are not significantly impacted by systematic error.
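The screening criteria above can be read as a single keep-or-reject test per scene. The following is a minimal sketch of that test; the `Scene` fields and the way the flags are encoded are hypothetical and are not the variable names of the OMHCHO product, and the interpretation of "the last two rows at the edge of the swath" as two rows at each edge is an assumption.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
MAX_SLANT_COLUMN = 15e16    # molec cm^-2
MAX_SZA_DEG = 70.0          # solar zenith angle upper limit
MAX_CLOUD_FRACTION = 0.4
N_ROWS = 60                 # across-track scenes in global mode
EDGE_ROWS_DROPPED = 2       # assumption: two rows dropped at each swath edge


@dataclass
class Scene:
    """One OMI ground scene; all field names are hypothetical."""
    quality_ok: bool        # main data quality flag is neither suspect, bad, nor missing
    row_anomaly: bool       # scene flagged as affected by the row anomaly
    slant_column: float     # molec cm^-2
    sza_deg: float          # solar zenith angle in degrees
    cloud_fraction: float   # 0 to 1
    row: int                # 0-based across-track index


def keep_scene(s: Scene) -> bool:
    """Return True if the scene passes every filter listed in the text."""
    in_swath_core = EDGE_ROWS_DROPPED <= s.row < N_ROWS - EDGE_ROWS_DROPPED
    return (
        s.quality_ok
        and not s.row_anomaly
        and s.slant_column <= MAX_SLANT_COLUMN
        and 0.0 <= s.sza_deg <= MAX_SZA_DEG
        and s.cloud_fraction <= MAX_CLOUD_FRACTION
        and in_swath_core
    )
```

Keeping the thresholds as named constants makes it straightforward to repeat the clear-sky bias test by relaxing `MAX_CLOUD_FRACTION` alone.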
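The reference sector normalization described above amounts to estimating one offset per month over the remote Pacific and subtracting it from the study-domain data. The sketch below illustrates that idea under simplifying assumptions: scenes are plain `(lat, lon, column)` tuples, observed and model scenes are paired by position, and the bias is a simple mean difference. None of these choices is confirmed as the authors' implementation.

```python
def in_reference_sector(lat, lon):
    """Remote Pacific box from the text: 0-40 N, 160-180 W.

    Longitude is in degrees east, so 160-180 W is -180 to -160.
    """
    return 0.0 <= lat <= 40.0 and -180.0 <= lon <= -160.0


def monthly_bias(obs_scenes, model_scenes):
    """Mean observed-minus-model column over the reference sector for one month.

    Each argument is a list of (lat, lon, column) tuples; the two lists are
    assumed to be paired scene by scene.
    """
    diffs = [
        obs[2] - mod[2]
        for obs, mod in zip(obs_scenes, model_scenes)
        if in_reference_sector(obs[0], obs[1])
    ]
    return sum(diffs) / len(diffs)


def normalise(columns, bias):
    """Subtract the monthly reference-sector bias from study-domain columns."""
    return [c - bias for c in columns]
```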

Models of tropospheric chemistry
We use the CAABA box model, including a comprehensive description of atmospheric chemistry, to understand the time-dependent yield of HCHO from the oxidation of VOCs in Indian forest and urban environments. These calculations help determine which VOC emissions are responsible for observed HCHO column variations. We also use a nested version of the GEOS-Chem 3-D atmospheric chemistry model to interpret the satellite data and to understand chemistry on a regional spatial scale. We use this model to establish a relationship between emissions of VOCs and HCHO columns which is then used to infer emissions of VOCs that correspond to observed HCHO columns.
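As an illustration of how a model relationship between emissions and columns can be turned around to infer emissions, the sketch below fits a straight line to model output and inverts it for observed columns. It assumes the linear form column = slope × emission + background that the abstract describes as a linear regression model; the function names and the numbers are hypothetical and synthetic, not values from this study.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def infer_emissions(observed_columns, slope, background):
    """Invert the fitted line: emission = (column - background) / slope."""
    return [(c - background) / slope for c in observed_columns]


# Synthetic model output in arbitrary units, for illustration only.
model_emissions = [1.0, 2.0, 3.0, 4.0]
model_columns = [7.0, 9.0, 11.0, 13.0]   # generated as 2 * emission + 5

slope, background = fit_line(model_emissions, model_columns)
inferred = infer_emissions([8.0, 10.0], slope, background)
```

The intercept plays the role of the background column that is present without local emissions, so only the column enhancement above it is attributed to the emitted VOC.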
The chemical mechanisms from the CAABA box model and the GEOS-Chem model are not identical. CAABA includes a more explicit chemical mechanism and a reduced version of the Mainz Isoprene Mechanism 2 (MIM2, Taraborrelli et al., 2009). The version of GEOS-Chem we use here is described by Eastham et al. (2014). The latest version of the GEOS-Chem chemical mechanism, released after the majority of our calculations had been completed, results in an increase of the HCHO yield from the oxidation of isoprene. We find using preliminary calculations with v11-01 that the revised mechanism does not systematically change the results shown here. Previous studies have evaluated the performance of GEOS-Chem (Marvin et al., 2017) and MIM2 (Taraborrelli et al., 2009) against the Master Chemical Mechanism. The GEOS-Chem v10-01 mechanism slightly underestimates HCHO production in high-NOₓ conditions and the MIM2 mechanism shows similar HCHO production across a wide range of NOₓ values.

Box modelling
We use v3.0 of the CAABA/MECCA (0-D) box model (Sander et al., 2011), but without halogen, sulphur, and mercury chemistry, to estimate the time-dependent production of HCHO from the chemical oxidation of different VOCs in forest and urban photochemical environments (Table 1).
We set up the model to describe a well-mixed summertime boundary layer, with photochemistry driven by a diurnal cycle in sunlight. For each environment, the photolysis model JVAL (Sander et al., 2014) is used to calculate photolysis rates assuming clear-sky conditions given latitude, longitude, and day of year. For each study location, we assume constant values for humidity, pressure, temperature, and boundary layer height, taken from colocated GEOS-FP meteorology (see below). For forested environments we also prescribe fixed mixing ratios of O₃, NO₂, and CO from the GEOS-Chem model, described below. For the Delhi urban environment we use daytime average surface air pollutant values in 2014 as reported by Tyagi et al. (2016): O₃ (37 ppbv), CO (2.3 ppmv), and NO₂ (19 ppbv) (Table 1). Figure 1 shows the three forested regions (denoted by E, NE, and S) we chose to explore the range of photochemical environments. For each calculation, we spun up the model from initial conditions for 48 h before running the model for a further seven days. We use a model time step of two minutes.
To evaluate the time-dependent HCHO yield of anthropogenic and biogenic VOCs we run paired calculations as follows: (1) a control run; and (2) a run in which we perturb a VOC by approximately 5 × 10¹⁴ molec cm⁻² over a 15 min period at 09:00 LT on the first day after the two-day spin-up period. The perturbed amount is sufficiently small that we can assume that the chemistry response is approximately linear so that we can compare the perturbed run with the control run. We then use the difference between the perturbed and control model calculations to determine the per-carbon HCHO yield from the oxidation of the perturbed VOC.
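The paired-run calculation can be sketched as follows. This is an illustrative reading of the method rather than the authors' code: it assumes that each run provides a time series of cumulative HCHO produced, in the same units as the VOC pulse, and that the per-carbon yield is the perturbed-minus-control difference divided by the number of carbon atoms injected. The function name and the numbers in the usage lines are hypothetical.

```python
def per_carbon_yield(hcho_perturbed, hcho_control, voc_pulse, n_carbon):
    """Cumulative per-C HCHO yield at each output time.

    hcho_perturbed, hcho_control: cumulative HCHO produced in each run
        (same units as voc_pulse, for example molec cm^-2).
    voc_pulse: amount of VOC added in the perturbed run.
    n_carbon: number of carbon atoms in the perturbed VOC (5 for isoprene).
    """
    carbon_added = n_carbon * voc_pulse
    return [
        (pert - ctrl) / carbon_added
        for pert, ctrl in zip(hcho_perturbed, hcho_control)
    ]


# Synthetic example: a 5e14 molec cm^-2 isoprene pulse (values are made up).
control = [0.0, 1.0e14, 2.0e14]
perturbed = [0.0, 6.0e14, 1.45e15]
yields = per_carbon_yield(perturbed, control, voc_pulse=5e14, n_carbon=5)
```

Dividing by carbon atoms rather than molecules is what allows yields from isoprene (five carbons) and propene (three carbons) to be compared on the same scale.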

GEOS-Chem 3-D modelling
We use v10-01 of the GEOS-Chem global model of atmospheric chemistry and transport (www.geos-chem.org), driven by GEOS-FP analysed meteorological fields, provided by the Global Modeling and Assimilation Office (GMAO) at NASA Goddard Space Flight Center. The native spatial resolution of these data is 0.25° (latitude) × 0.3125° (longitude), and includes 47 vertical terrain-following sigma-levels that describe the atmosphere from the surface to 0.01 hPa of which about 30 are typically below the dynamic tropopause. The 3-D meteorological data are updated hourly, and 2-D fields and surface fields are updated every 3 h. The chemical mechanism used is the "tropchem" mechanism (Eastham et al., 2014). The atmospheric chemistry and transport time steps are five and ten minutes, respectively.
We use the nesting capability of the model to focus on India, defined here as 0-40° N, 65-100° E, using the native resolution of the meteorological data. We use time-dependent lateral boundary conditions archived from a self-consistent, 4° × 5° version of the global full-chemistry model, which is initialized in January 2013 to minimize the influence of initial conditions. Here, we describe emission inventories relevant to our nested India simulation and direct the reader to the GEOS-Chem website (http://geos-chem.org) for a more comprehensive description of inventories.
Monthly anthropogenic emissions for India are taken from the mosaic MIX inventory (Li et al., 2017), including NO, CO, SO₂, and NMVOCs that are available on a spatial resolution of 0.25° × 0.25°. The emissions for India are a mosaic of regional emission inventories, including the Regional Emission inventory in ASia (REAS) (Kurokawa et al., 2013) and those developed by the Argonne National Laboratory (Lu et al., 2011; Lu and Streets, 2012). Indian NMVOC emissions represent by mass 17 Tg per year, of which 43 % is from the residential sector, 36 % is from transport, 20 % is from industry, and a small amount is attributed to the power sector.
Approximately 20 % of India's geographical area is described as forest. To describe biogenic VOC emissions of isoprene, monoterpenes, alkenes, and acetone we use the MEGAN v2.1 emission model (Model of Emissions of Gases and Aerosols from Nature, Guenther et al., 2012). Figure 2 shows seasonal isoprene emissions over India. Monoterpene emissions have a similar distribution to isoprene but are a factor of five smaller in magnitude. Given their relatively small influence on HCHO column variability, we do not discuss them further. Within GEOS-Chem, we ad-
To interpret OMI HCHO column data we sample the model at the local time averaged between 13:00 and 15:00, corresponding to the overpass time of OMI, and the location of each observed scene. To put 2014 into a broader temporal context, we interpreted OMI HCHO columns for years 2008-2015. We find that the AMF plays only a minor role in determining OMI vertical column variations so we do not expect our approach to significantly influence our study of variations in the HCHO columns.

Results
First, we report results from the box model calculation to provide some insights into the VOCs that determine the observed HCHO column variations. We then report the annual and seasonal spatial distributions of OMI and GEOS-Chem HCHO columns. Using correlative space-borne data we explore the role of biogenic, pyrogenic, and anthropogenic emissions in determining the spatial distributions of HCHO. Finally, using a model relationship between isoprene emissions and HCHO columns we infer isoprene emissions that are consistent with the OMI vertical columns.

HCHO yield from VOC oxidation in urban and forest environments
We use the CAABA/MECCA box model to determine the time-dependent production of HCHO from the VOCs that we expect to be emitted from urban and forest environments over India. This represents important groundwork for interpreting the satellite observations of HCHO. Table 1 provides an overview of the box model calculations we describe below.
Relating HCHO column variations to emissions of its parent VOC requires that the VOC (1) has a high yield of HCHO so that eventual concentrations are elevated above the global background, determined mainly by the oxidation of CH₄; and (2) produces HCHO rapidly so that most of the HCHO is produced close to the emission source and not smeared over long spatial scales. Isoprene is the dominant HCHO precursor over many northern midlatitude and tropical forest ecosystems (Palmer et al., 2006; Barkley et al., 2008; Curci et al., 2010). Over tropical latitudes biomass burning emissions of VOCs also play a role in HCHO column variations (Fu et al., 2007; Barkley et al., 2008; Gonzi et al., 2011; Marais et al., 2012). Fu et al. (2007) also showed that reactive anthropogenic VOCs played a role in HCHO column variations over China.

Biogenic VOCs from forest environments
We explore three contrasting forest regions throughout 2014 (Fig. 1), characterized by latitude-dependent levels of photosynthetically active radiation (PAR) and by their proximity to urban emissions. We focus on the yield of HCHO from the emission of isoprene. We report the resulting lifetime of the injected isoprene for each calculation (Sect. 2.2.1), in addition to the cumulative per-C HCHO yield (Table 2). We find that the duration of the calculation is sufficiently long that the peak HCHO had time to diminish to a negligible amount.
The atmospheric lifetime of isoprene against OH is typically much shorter than an hour, with shortest values (< 15 min) during summer months. The most southern site, in the state of Kerala (Fig. 1), is where isoprene has the longest lifetime. This is due to OH being suppressed by high ambient NO₂ concentrations (in excess of 8 ppbv) originating from the coastal conurbation in that state. We find that levels of NOₓ are sufficiently high (averages generally around or above 1 ppbv) in all forested regions such that isoprene peroxy radicals preferentially react with NO to rapidly produce HCHO. The peak HCHO signal is typically reached within 30-90 min of the isoprene oxidation, corresponding to a smearing length scale of 9-27 km assuming an example wind speed of 5 m s⁻¹. For the Kerala region, where OH suppression slows the production of HCHO, the time taken to reach the peak HCHO signal is typically 2-3 h and the corresponding smearing length scale is 36-54 km. Even in Kerala the smearing length is comparable to the longitudinal extent of a single OMI scene and well within the swath width (Sect. 2). Figure 3 shows that the corresponding values of the per-C HCHO yield range from 0.38 to 0.66. Higher values are generally associated with higher values of NOₓ (cf. NO₂ mixing ratios in Table 1 and yields reported in Table 2), consistent with previous studies (Palmer et al., 2006; Barkley et al., 2013). The simulations reported in this work are specific to the Indian scenarios discussed.
Figure 3. Time-dependent cumulative HCHO yield (per-C) produced from isoprene in three contrasting photochemical environments, and from propene, propane, and n-butane in an urban photochemical environment based on Delhi. Calculations are denoted by an alphanumeric code that is explained in Table 1.
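The smearing length scales quoted in this section follow from multiplying the time taken to reach peak HCHO by the wind speed. A minimal sketch of that arithmetic, with a hypothetical function name, is:

```python
def smearing_length_km(time_to_peak_min, wind_speed_m_s):
    """Distance travelled by an air parcel before the HCHO signal peaks."""
    seconds = time_to_peak_min * 60.0
    return wind_speed_m_s * seconds / 1000.0


# Values from the text, all at the example wind speed of 5 m/s.
typical = (smearing_length_km(30, 5.0), smearing_length_km(90, 5.0))
kerala = (smearing_length_km(120, 5.0), smearing_length_km(180, 5.0))
```

Here `typical` evaluates to 9 and 27 km and `kerala` to 36 and 54 km, matching the ranges given above. Because the relationship is linear, halving the wind speed halves the smearing length for the same chemistry.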

Anthropogenic VOC in urban environment
To study the role of anthropogenic non-methane VOCs (NMVOCs) in determining HCHO columns we simulate boundary layer chemistry over Delhi using emissions from the MIX emission inventory (Li et al., 2017). Over major cities, NMVOC emissions originate mainly from stationary combustion and the transport sector. The species in the MIX inventory are defined following the SAPRC-99 chemical mechanism (Carter, 2000). From a comparison of the correlation coefficients we find that over Delhi, according to the MIX inventory, NMVOC emissions include propane, propene, and ALK4, representing 13, 32, and 55 % of NMVOC C emissions. ALK4 denotes > C₄ alkanes. The longest chain alkane in the 0-D chemical mechanism is n-butane. We denote this urban model scenario as "D05" (Table 1). We calculate individual HCHO yields from the oxidation of propane, propene, and n-butane.
Propene (CH₃CH=CH₂) has an atmospheric lifetime of several hours, determined by OH addition to its double bond. The major intermediate products include HCHO and acetaldehyde. The ultimate HCHO yield from the oxidation of propene (close to 0.5) is comparable with the value from isoprene oxidation (Fig. 3) but it only reaches 50 % of that final value within 12 hours, consistent with the longer lifetime of propene. Propane and n-butane have atmospheric lifetimes of 10 and 5 days, respectively, determined by OH. Long-lived intermediate oxidation products include acetone that further delay the production of HCHO (Fig. 3).
Assuming an average wind speed of 3 m s⁻¹, taken from wind measurements at Indira Gandhi airport in Delhi, only HCHO production from the oxidation of propene (out of all the gases that represent a major contribution to the regional emission inventory) would produce a signal that could potentially be distinguishable above the ambient concentration. We later show that there is some evidence that a Delhi HCHO hotspot can be observed but only after temporally oversampling the data.
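The contrast between these lifetimes can be illustrated with the standard expression for the lifetime of a gas against OH, 1 / (k[OH]). The sketch below is not taken from the box model: the rate constants are approximate room-temperature literature values and the OH concentration of 1 × 10⁶ molec cm⁻³ is an assumed diurnal-mean value, both introduced here only for illustration. Daytime OH is typically several times higher, which shortens each lifetime in proportion.

```python
SECONDS_PER_HOUR = 3600.0
SECONDS_PER_DAY = 86400.0

# Approximate OH rate constants near 298 K in cm^3 molec^-1 s^-1.
# These are assumed illustrative values, not those of the CAABA/MECCA mechanism.
K_OH = {
    "propene": 2.6e-11,
    "n-butane": 2.4e-12,
    "propane": 1.1e-12,
}

ASSUMED_OH = 1.0e6  # molec cm^-3, assumed diurnal-mean concentration


def lifetime_s(k_oh, oh_conc=ASSUMED_OH):
    """e-folding lifetime in seconds of a gas lost only by reaction with OH."""
    return 1.0 / (k_oh * oh_conc)


propene_hours = lifetime_s(K_OH["propene"]) / SECONDS_PER_HOUR
butane_days = lifetime_s(K_OH["n-butane"]) / SECONDS_PER_DAY
propane_days = lifetime_s(K_OH["propane"]) / SECONDS_PER_DAY
```

Under these assumptions propene has a lifetime of roughly ten hours while n-butane and propane last roughly five and ten days, which is the same ordering and approximate magnitude as the lifetimes reported above.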

Data filtering and AMF statistics
Using the data quality criteria, as defined above, we remove 58 % of all OMI HCHO measurements collected during 2014. The proportion of data removed per month varies during the year, with most scenes removed during the cloudy monsoon season in July (71 %) and the least number of scenes removed during March (45 %). Figure 4 shows the annual mean distribution of AMFs across India, with values ranging from 0.7 to 1.5 and a median value of 1. AMF values are highest in the Himalayas at the extreme north of India, which are often covered by snow, and over the Rann of Kutch salt marsh. In both cases, elevated AMF values are due to high surface albedos.
To determine the influence of the AMFs on the spatial distribution of HCHO vertical columns we compare fitted OMI slant HCHO columns and the corresponding vertical columns to the GEOS-Chem model. We find that the vertical columns reproduce 11 % more of the model spatial distribution than the observed slant columns, consistent with past studies (Palmer et al., 2001;Millet et al., 2008) that show that the AMF plays only a minor role in the observed spatial distribution of vertical HCHO columns.
To investigate the role of model resolution in our calculation of vertical columns we repeated the annual mean analysis using the 2° × 2.5° version of the GEOS-Chem model driven by the same inventories as described above. Figure 5 shows that the coarser resolution model fails to reproduce smaller-scale variations, as expected, which define much of the Indian west coast, and misses variations over northeastern India. While the model via the AMF calculation only contributes a small amount to the distribution of HCHO vertical columns it does affect the magnitude of the columns. We find that there is an overwhelming argument to justify the use of the higher-resolution model.
(2) territory at the south of the country, around the southern part of the Western Ghats mountain range, roughly following the borders of the state of Kerala; and (3) a broad area over the east of the country, roughly outlined by the states of Chhattisgarh, Odisha, and the northern part of Andhra Pradesh.

Annual mean spatial distribution
The nested GEOS-Chem model reproduces the observed magnitude and broad-scale spatial distribution of OMI HCHO columns over India (Fig. 5), but is much smoother. The model has a small positive bias of 14 % (12 %) for the mean (median) annual column amounts. We find that the model captures 33 % (r = 0.58) of the observed annual mean spatial variation of HCHO columns. The major discrepancies between the model and observed annual mean HCHO distributions are over the IGP and over Delhi. Observed HCHO columns are elevated over the IGP but not to the values shown in the model, which we discuss below. A HCHO hotspot over Delhi is apparently absent from the OMI columns, but as we show later there is evidence that the hotspot exists in the observations. Based on the box modelling, the largest source of HCHO column variability is expected to be isoprene emissions. Figure 1 shows that the broad-scale annual distribution of HCHO over India is consistent with the annual mean distribution of the Normalized Difference Vegetation Index (NDVI), with a Pearson correlation of r = 0.49. Below, we use seasonal variations of HCHO columns to improve understanding of the drivers.

Seasonal spatial distributions
Figure 6 shows OMI and GEOS-Chem model HCHO columns for each season during 2014. There are distinct seasonal cycles in the HCHO columns over three broad regions: northeastern India, southwestern India, and the IGP. Over (north)eastern India HCHO columns peak in pre-monsoon and monsoon months. Columns over the southwestern coast of India peak in winter and pre-monsoon months and are very low in other seasons. Columns over the IGP are elevated above those elsewhere in northern India during pre-monsoon months and peak in monsoon months. Figure 7 shows that the timing of these elevated columns coincides with warmer surface temperatures. This suggests a role for biogenic VOC emissions, consistent with the strong, local relationship between isoprene emissions and HCHO production that we demonstrated above.
The GEOS-Chem model captures these broad-scale observed distributions of HCHO but is much smoother, as expected. We find the model has some skill at reproducing the observed spatial distribution, with a Pearson correlation r of approximately 0.4 or greater, with larger values in winter and pre-monsoon months (r = 0.50 and 0.62, respectively) and smaller values in monsoon and post-monsoon months (r = 0.39 and 0.44, respectively). On an annual timescale the model has a positive bias of 1.0 × 10¹⁵ molec cm⁻² (33 %), which is skewed due to a large bias during the monsoon months (1.9 × 10¹⁵ molec cm⁻², 17 %). The model has a more defined peak over Delhi, peaking in monsoon months, but there is only a small, diffuse peak in the observations. We discuss this discrepancy below.
Figure 7 shows that on a continental scale the seasonal distribution of HCHO slant columns over India in 2014 is not significantly different from observed distributions from 2008 to 2015. In general, HCHO columns follow a similar cycle in the winter and pre-monsoon months, but there are substantial year-to-year variations during the monsoon and post-monsoon months, suggesting a role for large-scale variation in meteorology associated with the monsoon. Figure 8 shows the corresponding differences in the spatial distribution of HCHO columns from 2014 compared to the mean 2008–2015 distribution. Median HCHO columns for 2014 are slightly higher than the 8-year average by 0.4 × 10¹⁵ molec cm⁻², but show a similar spatial distribution taking into account year-to-year variability (Fig. 7). Mahajan et al. (2015) report an analysis of the drivers of year-to-year variations in HCHO columns over India. Figure 7 shows that the observed continental-scale HCHO seasonal cycle is reproduced by the model but with a positive model bias typically within 1 × 10¹⁵ molec cm⁻².
On this spatial scale, most of the seasonal cycle can be explained by variations in surface temperature, suggesting a larger than expected role for biogenic emissions. However, the model fails to capture the elevated monthly column during October 2014. We find this is driven mostly by observed variations in HCHO columns over tropical India, coinciding with the withdrawal of the monsoon from India between late September and mid-October (Pai and Bhan, 2015). We suggest this variation represents a response from the vegetation that is missing in the model. Model error associated with cloud coverage and consequent errors associated with the partitioning between direct and diffuse PAR could result in large-scale changes in biogenic emissions.
The only forested region that is an exception to the continental-scale picture is over Kerala (denoted by S in Fig. 1). Kerala has a tropical maritime climate with little seasonal variation in temperature. The forested region in Kerala neighbours an urban conurbation associated with a high level of NOₓ (Fig. 9), which influences the HCHO yields from the oxidation of biogenic VOCs, as described in Sect. 3.1. Figure 10 shows that the observed seasonal distribution of HCHO columns at this site, broadly reproduced by the model, peaks during the winter and is lowest during pre-monsoon and monsoon months. The size of the seasonal variation in HCHO columns is not fully explained by the small seasonal variation in surface temperature, suggesting a role for an additional driver. Figure 10 shows that over this region, the seasonal distribution is driven mostly by changes in leaf phenology rather than temperature or PAR. Satellite observations of LAI drop from 4 to 1 m² m⁻² during the monsoon months and recover in the post-monsoon months. We find similar behaviour over the eastern and northeastern forests with large reductions in LAI during monsoon months (not shown).
This behaviour is consistent with vegetation taking advantage of the decreased temperatures and higher precipitation rates during the monsoon season to regulate leaf flushing. Past work has found a contrasting relationship between leaf phenology and satellite observations of HCHO columns over the Amazon Basin (Barkley et al., 2009), in which HCHO columns and LAI values dropped in the transition period between the wet and dry seasons, and recovered soon afterwards. Thermotolerance is one hypothesis that describes why leaves emit isoprene (Singsaas et al., 1997). A few weeks after emerging, the emission capacity of leaves peaks and subsequently declines with age. To explain the variation in HCHO columns, Barkley et al. (2009) proposed wide-scale leaf flushing that allowed vegetation to maximize its protection against the light-rich environment of the dry season. A demographic model of leaf phenology based on the hypothesis that trees seek an optimal LAI as a function of available light and soil water (Caldararu et al., 2012, 2014) explained the observed increase in LAI over the Amazon Basin during the dry season as a net addition of leaves in response to increased solar radiation.

Pyrogenic VOCs
The main loci for biomass burning are as follows: (1) a region approximately encompassed by the state boundaries of Punjab and Haryana (Fig. 1) and (2) northeastern India. The states of Punjab and Haryana have two growing seasons, May–September (rice) and November–April (wheat). Stubble burning in May and October/November represents agricultural burning of wheat and rice residue, respectively. For our purposes northeastern India includes the "Seven Sister States" (Arunachal Pradesh, Assam, Meghalaya, Manipur, Mizoram, Nagaland, and Tripura), where there are significant forest fires, particularly during March and April. This is mainly due to deforestation to convert forests to agricultural land (Santenda and Kaushik, 2014). These two geographical regions account for more than 70 % of these pyrogenic emissions during 2014.
To illustrate the impact of fires on HCHO column variations we use MODIS fire count data (Justice et al., 2002) to identify when and where fires occur over the states of Punjab and Haryana, and over northeastern India. We calculate monthly mean HCHO vertical columns and fire counts. We then determine which 0.25° × 0.3125° grid cells are most affected by fires by selecting those cells in the top 20th percentile of cumulative fires. Figure 11 shows that the highest number of fire counts generally corresponds to the largest difference between all grid cells and those most affected by fire. This difference is relatively small over the states of Punjab and Haryana, with a peak value of 1.5 × 10¹⁵ molec cm⁻² during May. There is a large difference during March and April over northeastern India, where fires contribute up to 5 × 10¹⁵ molec cm⁻² to the monthly mean. We find that this contribution to HCHO columns is localized in time and geography.
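The selection of fire-affected cells described above can be sketched as follows. This is an illustrative outline only: the function name `fire_enhancement`, the dictionary layout, and the toy numbers are hypothetical, and "top 20th percentile of cumulative fires" is interpreted here as the 20 % of grid cells with the largest annual fire-count totals, which is an assumption.

```python
def fire_enhancement(fire_counts, columns, top_fraction=0.2):
    """Compare monthly mean columns over the most fire-affected cells with all cells.

    fire_counts, columns: dict mapping a cell id to a list of monthly values.
    Returns (top_cells, diffs) where diffs[m] is the mean column over the
    most fire-affected cells minus the mean over all cells for month m.
    """
    # Cumulative (annual) fire count per grid cell.
    cumulative = {cell: sum(counts) for cell, counts in fire_counts.items()}
    ranked = sorted(cumulative, key=cumulative.get, reverse=True)
    # Assumption: keep the top 20 % of cells ranked by cumulative fire count.
    n_top = max(1, round(top_fraction * len(ranked)))
    top_cells = ranked[:n_top]

    n_months = len(next(iter(columns.values())))
    diffs = []
    for m in range(n_months):
        all_mean = sum(columns[c][m] for c in columns) / len(columns)
        top_mean = sum(columns[c][m] for c in top_cells) / n_top
        diffs.append(top_mean - all_mean)
    return top_cells, diffs

# Hypothetical toy data: five cells, three months, columns in 10^15 molec cm^-2.
fire_counts = {"a": [0, 10, 2], "b": [0, 1, 0], "c": [0, 0, 0], "d": [1, 0, 0], "e": [0, 0, 1]}
columns = {"a": [5.0, 10.0, 6.0], "b": [5.0, 5.0, 5.0], "c": [5.0, 5.0, 5.0],
           "d": [5.0, 5.0, 5.0], "e": [5.0, 5.0, 5.0]}
top_cells, diffs = fire_enhancement(fire_counts, columns)
```

In the toy data the largest difference falls in the month with the most fires, which is the qualitative behaviour reported for Fig. 11.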

Anthropogenic hotspots
Guided by a priori emissions and our box modelling, we anticipate that propene is the only anthropogenic VOC likely to produce HCHO rapidly enough that we can relate elevated HCHO columns to emissions. However, because of the limited signal-to-noise ratio, we find little evidence that seasonally averaged OMI HCHO columns are elevated over Indian megacities, in agreement with previous work (Mahajan et al., 2015).
We use a temporal oversampling approach, following Zhu et al. (2014), to improve the spatial resolution of HCHO columns over Delhi and the surrounding region. Oversampling increases the signal-to-noise ratio and allows for inspection of finer spatial features, at the expense of the temporal information. We focus on Delhi because the National Capital Region has a population of approximately 17 million people (Perianayagam and Goli, 2012) over a geographical area of approximately 2000 km². Based on the MIX emissions inventory (Fig. 6), which is indicative of values from 2010, we expect to see an elevated signal from this region. This bottom-up emission inventory likely overestimates emissions from the transport sector, which has seen the biggest change from 2010 to 2014 (Jun-ichi Kurokawa, personal communication, Japan Environmental Sanitation Center, October 2017).
First, our area of focus is divided into a very high resolution grid (0.02° × 0.02°). The temporally averaged column for each point in this grid is the average of the OMI vertical columns collected during 2014 whose centre points lie within 43 km in both the latitudinal and longitudinal directions. This effectively smears out these observations over 43 km squares. Here, we average over 43 × 43 km² squares rather than 24 km radius circles, sampling all the 2014 vertical HCHO columns from the observational dataset for the area around Delhi. Figure 12 shows that the oversampling method results in distinct elevated HCHO columns over New Delhi and along major roadways, although the gradients are still noisy. The magnitude of this elevation is O(10¹⁵) molec cm⁻². Elevated areas to the east and southeast of the city may represent HCHO produced from VOCs transported downwind from Delhi.
Based on our results we find that anthropogenic emissions do not appear to play a large role in the observed column variations of HCHO over India. However, our use of HCHO columns from the OMI instrument, which has a local overpass time of 13:30, may be hindering our ability to observe the anthropogenic contribution to HCHO. Biogenic emissions and, to a lesser extent, biomass burning emissions peak in the early afternoon hours, which is ideal for OMI. Emissions from the transportation sector, a major source of anthropogenic VOCs, peak during the early morning and late afternoon, associated with commuter traffic. We argue that the early morning 09:30 overpass of the Global Ozone Monitoring Experiment (GOME-2) aboard the MetOp satellite is better suited to capture these anthropogenic emissions. Secondary production of HCHO is generally larger than direct emissions of HCHO, and will occur a few hours after the peak commuter time (e.g. Lin et al., 2012; Wang et al., 2017).
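The temporal oversampling applied to the Delhi region earlier in this section can be sketched as follows. This is a simplified illustration rather than the code used for Fig. 12: the helper names `axis` and `oversample` are hypothetical, the 43 km figure is taken to be the full side length of the averaging square (an assumption), and a constant 111.2 km per degree of latitude is used as an approximation.

```python
import math

KM_PER_DEG = 111.2  # approximate length of one degree of latitude (assumption)

def axis(start, stop, step=0.02):
    """Grid coordinates from start to stop (inclusive) at the given spacing in degrees."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]

def oversample(obs, lats, lons, side_km=43.0):
    """Average all observations falling in a square centred on each grid point.

    obs: list of (lat, lon, column) for every scene in the averaging period.
    Returns a nested list grid[i][j]; points with no observations are None.
    """
    half = side_km / 2.0  # assumption: 43 km is the full side of the square
    grid = []
    for glat in lats:
        dlat_max = half / KM_PER_DEG
        # Longitude spacing shrinks with the cosine of latitude.
        dlon_max = half / (KM_PER_DEG * math.cos(math.radians(glat)))
        row = []
        for glon in lons:
            vals = [c for lat, lon, c in obs
                    if abs(lat - glat) <= dlat_max and abs(lon - glon) <= dlon_max]
            row.append(sum(vals) / len(vals) if vals else None)
        grid.append(row)
    return grid

# Hypothetical scenes near Delhi (column units: 10^15 molec cm^-2).
obs = [(28.60, 77.20, 12.0), (28.65, 77.25, 10.0), (29.50, 78.50, 4.0)]
lats = axis(28.5, 28.7)
lons = axis(77.1, 77.3)
grid = oversample(obs, lats, lons)
```

Because every grid point averages all scenes from the whole period, neighbouring points share most of their observations; this is what trades temporal information for a smoother, higher signal-to-noise map.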
The early morning rush hour in Delhi starts after 07:00, so we expect a 09:30 overpass to also capture some fraction of the secondary HCHO production. The use of data collected at morning and afternoon overpass times to describe diurnal variations of HCHO was presented by De Smedt et al. (2015), but they did not discuss the relative importance of different VOC emission sources at these different times. Figure 13 shows the annual mean HCHO columns observed by GOME-2 and OMI. For our preliminary argument we are only interested in the distribution of HCHO columns. Here, we have standardized these data for the whole country so that they have a mean of zero and a unit standard deviation using
$$z_i = \frac{x_i - \bar{x}}{s},$$
where $x_i$ is a data point, $\bar{x}$ is the sample mean, and $s$ is the sample standard deviation. This allows us to compare the two datasets without worrying about bias. The resulting z scores represent the number of standard deviations from the mean. We find that GOME-2 data have higher columns over the IGP, while OMI more clearly emphasizes the forested regions that we can identify independently through LAI or NDVI measurements (Fig. 1). This qualitative test appears to support our hypothesis and is an early demonstration of how datasets with continuous measurements of a region (such as those from a geostationary satellite) could help capture the temporal variability of HCHO. Taking advantage of the complementary information from multiple sensors that have different local overpass times requires a sophisticated inverse model approach.
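The standardization described above, subtracting the sample mean and dividing by the sample standard deviation, can be sketched in a few lines. The function name `standardize` and the use of `None` for missing grid cells are illustrative assumptions; `statistics.stdev` computes the sample (n − 1) standard deviation.

```python
import statistics

def standardize(values):
    """Convert a field to z scores, leaving missing cells (None) untouched."""
    valid = [v for v in values if v is not None]
    mean = statistics.mean(valid)
    s = statistics.stdev(valid)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("cannot standardize a constant field")
    return [None if v is None else (v - mean) / s for v in values]

# Hypothetical columns from two sensors with different absolute levels.
morning = [1.0, 2.0, 3.0]
afternoon = [11.0, None, 12.0, 13.0]
z_morning = standardize(morning)
z_afternoon = standardize(afternoon)
```

After standardization the two toy fields have the same z scores despite their different offsets, which is why the comparison is insensitive to a constant bias between sensors.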

Inferring isoprene emissions from OMI HCHO columns
Based on our analysis of the HCHO yields from Indian VOC sources, and the distribution of observed HCHO columns, we conclude that the majority of the observed HCHO column variation is due to biogenic VOC emissions. Here, we adopt a simple inversion methodology based on linear regression to infer isoprene emissions from OMI HCHO columns (Palmer et al., 2006). First, we filter HCHO column data over India to minimize any interference from pyrogenic and anthropogenic sources. We focus on two relatively remote areas of India: "East", approximately defined as the area spanning 16–25° N and 76–87° E, and "Northeast", defined as the region of India east of 90° E (Fig. 1). We remove scenes with MODIS land cover classifications (Friedl et al., 2010) corresponding to croplands (including cropland mosaics), urban/built-up, snow/ice, barren/sparsely vegetated, and water bodies. We also filter out potential fire-affected data by identifying, for each day, the cells of the 0.25° × 0.3125° grid in which fires are reported in the MODIS active fire product. The data from these, and the adjacent cells, are then removed for that day as well as the preceding and succeeding days, following Barkley et al. (2013).
Table 3. A priori and a posteriori isoprene emission estimates (10¹¹ atom C cm⁻² s⁻¹) over NE and E forest sites (Fig. 1).
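The fire filter described in this section, which removes each fire cell, its adjacent cells, and the preceding and succeeding days, can be sketched as a small masking routine. The function name `fire_mask` and the index layout are hypothetical, and "adjacent cells" is taken here to mean the eight surrounding grid cells, which is an assumption.

```python
def fire_mask(fires, n_days, n_rows, n_cols):
    """Return the set of (day, row, col) entries to discard.

    fires: iterable of (day, row, col) grid indices where an active fire is reported.
    Each fire masks its own cell and the eight neighbours (assumption) on the
    day of the fire, the preceding day, and the succeeding day.
    """
    masked = set()
    for day, row, col in fires:
        for dd in (-1, 0, 1):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    d, r, c = day + dd, row + dr, col + dc
                    # Ignore offsets that fall outside the time series or the grid.
                    if 0 <= d < n_days and 0 <= r < n_rows and 0 <= c < n_cols:
                        masked.add((d, r, c))
    return masked

# Hypothetical example: a 3-day record on a 3 x 3 grid with one fire in the centre cell.
masked = fire_mask([(1, 1, 1)], n_days=3, n_rows=3, n_cols=3)
```

A single fire in the interior of the grid therefore removes 27 cell-days, so the filter is deliberately conservative.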
Second, we determine the model relationship between local isoprene emissions $E$ (molec cm⁻² s⁻¹), as calculated by MEGAN (Guenther et al., 2012), and HCHO columns $\Omega_{\mathrm{HCHO}}$ (molec cm⁻²):
$$\Omega_{\mathrm{HCHO}} = S E + b,$$
where the slope $S$ represents the production of HCHO column per emission of isoprene, and the intercept $b$ represents the HCHO column contributions from longer-lived VOCs, mainly from the oxidation of methane. We infer seasonal isoprene emissions from observed HCHO columns by inverting the model linear relationship between isoprene emissions and HCHO columns. For both study regions, we find a statistically significant linear relationship between those variables (Table 3), where Pearson correlation coefficients r range from 0.52 to 0.82, with a typical value in excess of 0.70. The slope values (in units of 10³ s) vary with region and season. The intercepts, which represent the background HCHO column, are higher during the pre-monsoon and monsoon summer months, when we expect larger HCHO production from a range of longer-lived VOCs (including CH₄) due to higher values of OH.
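The regression-based inversion described above can be sketched in two steps: fit a straight line to model isoprene emissions and model HCHO columns, then invert that line for an observed column. This is a minimal illustration, not the authors' implementation; the function names `fit_line` and `invert` and all numerical values are hypothetical.

```python
import math

def fit_line(emissions, columns):
    """Ordinary least-squares fit of columns = S * emissions + b.

    Returns (S, b, r): slope in s, intercept in molec cm^-2, and Pearson r.
    """
    n = len(emissions)
    mean_e = sum(emissions) / n
    mean_c = sum(columns) / n
    cov = sum((e - mean_e) * (c - mean_c) for e, c in zip(emissions, columns))
    var_e = sum((e - mean_e) ** 2 for e in emissions)
    var_c = sum((c - mean_c) ** 2 for c in columns)
    slope = cov / var_e
    intercept = mean_c - slope * mean_e
    r = cov / math.sqrt(var_e * var_c)
    return slope, intercept, r

def invert(observed_column, slope, intercept):
    """Emission implied by an observed column under the fitted linear model."""
    return (observed_column - intercept) / slope

# Hypothetical model output: emissions (molec cm^-2 s^-1) and columns (molec cm^-2).
model_emissions = [1.0e12, 2.0e12, 3.0e12, 4.0e12]
model_columns = [7.0e15, 9.0e15, 1.1e16, 1.3e16]
S, b, r = fit_line(model_emissions, model_columns)

# Emission estimate implied by a hypothetical observed column.
e_post = invert(8.0e15, S, b)
```

With these toy numbers the fit returns a slope near 2 × 10³ s and a background column near 5 × 10¹⁵ molec cm⁻², so an observed column of 8 × 10¹⁵ molec cm⁻² maps to an emission of about 1.5 × 10¹² molec cm⁻² s⁻¹.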
Our a posteriori emission estimates are generally lower than a priori values, reflecting the positive model HCHO column bias. This is most pronounced over the northeastern region during the monsoon season, where a posteriori isoprene emissions are 88 % lower than the a priori estimate because the model does not capture the sharp observed decline in HCHO that appears to be linked with monsoon conditions. We acknowledge that the bias between model and OMI HCHO columns could also reflect a bias in the OMI data (Zhu et al., 2016), but without independent measurements over this region we have chosen to assign these biases exclusively to the model.

Concluding remarks
We used models of atmospheric chemistry to interpret HCHO column distributions during 2014 observed by the Ozone Monitoring Instrument (OMI) satellite instrument over India. The annual mean OMI distribution of clear-sky HCHO columns is dominated by a distinctive meridional gradient in the northern half of the country, and by localized regions of high columns that coincide with forests. We found that the nested GEOS-Chem atmospheric chemistry model (spatially resolved at 25 km) reproduces these broad-scale observed features with a positive model bias, particularly over the Indo-Gangetic Plain and Delhi.
Over India, HCHO has biogenic, pyrogenic, and anthropogenic sources of volatile organic compounds (VOCs), some of which are spatially and temporally disaggregated. Using the CAABA 0-D photochemistry model, we explored a range of forest and urban photochemical environments found over India and their subsequent influence on HCHO concentrations. HCHO columns are related to local VOC emissions with a spatial smearing that increases with the VOC lifetime. We found that isoprene has the largest molar yield of HCHO, which is typically realized within a few hours in the presence of moderate levels of nitrogen oxides (1 ppbv), in agreement with previous studies. However, we also found that forested regions that neighbour major urban conurbations (e.g. in the state of Kerala) are exposed to much higher levels of nitrogen oxides (8 ppbv). This results in depleted hydroxyl radical concentrations and a delay in the production of HCHO from isoprene oxidation. Informed by a regional bottom-up emission inventory for India, we found that propene is the only major component of anthropogenic VOCs that produces HCHO at a comparable (but slower) rate to isoprene.
We found that the GEOS-Chem model reproduces observed spatial distributions during winter (JF) and pre-monsoon months (MAM) better than during monsoon (JJAS) and post-monsoon (OND) months. We attributed these differences in model skill to the response of the natural biosphere to changes in the meteorological and photochemical environments associated with the onset and retreat of the monsoon. We found that on a continental scale much of the seasonal cycle in observed HCHO columns can be explained by monthly variations in surface temperature. This observation, together with the strong local relationship we found between isoprene emissions and HCHO production, suggests a role for biogenic VOCs, in agreement with the GEOS-Chem model calculation. We also found that the seasonal cycle during 2014 is not significantly different from the 2008 to 2015 mean seasonal variation, but there are large year-to-year variations. There are two main loci for biomass burning (the states of Punjab and Haryana, and northeastern India), which we found make a significant contribution (up to 1 × 10¹⁵ molec cm⁻²) to observed columns only during March to April over northeastern India. The slow production of HCHO from propene oxidation results in a smeared hotspot over Delhi that we could only resolve by using a temporal oversampling method. Based on comparing GOME-2 and OMI HCHO column distributions, we argue that the early morning overpass time is better for quantifying anthropogenic emissions soon after the rush hour and before biogenic emissions are at their early afternoon peak.
Using a linear regression model to relate GEOS-Chem isoprene emissions to HCHO columns we inferred seasonal isoprene emissions over two key forest regions from the OMI HCHO column data. We found that the a posteriori emissions are typically lower than the a priori emissions, with a much stronger reduction of emissions during the monsoon season. This reduction in emissions during monsoon months coincided with a large drop in satellite observations of leaf phenology. Large-scale differences in observed and model HCHO columns during monsoon months may highlight errors in seasonal variations in basal emission rates and/or model errors associated with the underlying meteorological environments, e.g. partitioning of direct and diffuse photosynthetically active radiation.
The next logical step in this analysis is to simultaneously estimate anthropogenic, pyrogenic, and biogenic VOC emissions by using data collected from morning and afternoon local overpass times. In the case of biogenic VOC emissions, information from HCHO columns together with leaf phenology (e.g. leaf area index) and land surface parameters (e.g. soil moisture) can be integrated to develop a new satellite data-driven isoprene emission inventory. A self-consistent pan-tropical emission inventory for isoprene, for example, would help to improve understanding of tropospheric O₃ and organic aerosol, which represent some of the largest uncertainties associated with the Earth system. Our ability to achieve this capability is improved by the launch of TROPOMI aboard Sentinel-5P, which will result in daily maps of HCHO columns and complementary trace gases at a spatial resolution of 7 km, dramatically increasing the number of clear-sky scenes available for the analysis.
Data availability. The OMHCHOv003 OMI HCHO column data are publicly available through NASA's Mirador website. Model data are archived at the Edinburgh Data Share, http://dx.doi.org/10.7488/ds/2305 (Surl, 2018).
Author contributions. LS and PIP designed the computational experiments, PIP and LS wrote the paper, and GGA provided input on the paper regarding the OMI data analysis.
Competing interests. The authors declare that they have no conflict of interest.