Calibration of column-averaged CH4 over European TCCON FTS sites with airborne in-situ measurements

In September/October 2009, six European ground-based Fourier Transform Spectrometers (FTS) of the Total Carbon Column Observing Network (TCCON) were calibrated for the first time using aircraft measurements. The campaign was part of the Infrastructure for Measurement of the European Carbon Cycle (IMECC) project. During this campaign, altitude profiles of several trace gases and meteorological parameters were taken close to the FTS sites (typically within 1–2 km distance for flight altitudes below 5000 m). Profiles of CO2, CH4, CO and H2O were measured continuously. N2O, H2, and SF6 were later derived from flask measurements. The aircraft data had a vertical coverage ranging from approximately 300 to 13 000 m, corresponding to ∼80 % of the total atmospheric column seen by the FTS. This study summarizes the calibration results for CH4. The resulting calibration factor of 0.978 ± 0.002 (±1σ) from the IMECC campaign agreed very well with the results that Wunch et al. (2010) had derived for TCCON instruments in North America, Australia, New Zealand, and Japan using similar methods. By combining our results with the data of Wunch et al. (2010), the uncertainty of the calibration factor could be reduced by a factor of three (compared to using only IMECC or only Wunch et al. (2010) data). A careful analysis of the calibration method used by Wunch et al. (2010) revealed that the incomplete vertical coverage of the aircraft profiles can lead to a bias in the calibration factor. This bias can be compensated with a new iterative approach that we developed. Using this improved method, we derived a significantly lower calibration factor of 0.974 ± 0.002 (±1σ). This corresponds to a correction of all TCCON CH4 measurements by roughly −7 ppb.


Introduction
The Total Carbon Column Observing Network (TCCON) is a worldwide network of ground-based Fourier Transform Spectrometers (FTS). It currently consists of 18 sites that provide a validation source for satellite measurements like the Greenhouse Gas Observing Satellite (GOSAT: Yokota et al., 2009; Morino et al., 2010) and the upcoming Orbiting Carbon Observatory 2 (OCO-2: Crisp et al., 2004). Unlike surface measurements, the FTS data can be used directly for the validation of satellite measurements since both methods provide total column abundances.
TCCON also complements the in-situ measurement network by delivering column-averaged dry-air mole fractions (henceforth abbreviated as "cDMF") of different species like CO2 or CH4. By convention, the cDMF of a gas G is written as XG. In contrast to the ground-based in-situ network, total column measurements are not limited to the atmospheric boundary layer and are thus less sensitive to local sources and sinks and details of vertical transport (Gerbig et al., 2008). However, the reduced sensitivity of total column measurements to local influences makes the identification of seasonal and latitudinal variations of XCO2 and XCH4 challenging.
FTS spectral data deliver total-columns for the individual species. The cDMF of the target species is then calculated by dividing the total column value by the dry-air total column using the O 2 column as a proxy as described by Washenfelder et al. (2003). The vertical coverage of this type of measurement spans the whole atmosphere from the radiation source (sun) to the spectrometer (surface).
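The column ratio described above can be sketched in a few lines. This is a minimal illustration, not the GGG implementation: the function name and the example columns are hypothetical, and the dry-air O2 mole fraction of 0.2095 is the value assumed here for converting the O2 column into a dry-air column.

```python
# Assumed dry-air mole fraction of O2 used to convert the O2 column
# into a dry-air column (an assumption of this sketch).
O2_DRY_MOLE_FRACTION = 0.2095


def column_averaged_dmf(column_gas, column_o2):
    """Return the cDMF of a gas from its total column and the O2 column.

    Both columns must be given in the same unit (e.g. molecules per cm^2).
    """
    if column_o2 <= 0:
        raise ValueError("O2 column must be positive")
    dry_air_column = column_o2 / O2_DRY_MOLE_FRACTION
    return column_gas / dry_air_column


# Illustrative, made-up total columns in molecules per cm^2.
x_ch4 = column_averaged_dmf(3.8e19, 4.5e24)
print(f"XCH4 = {x_ch4 * 1e9:.1f} ppb")
```

Because both columns are retrieved from the same spectrum, errors common to both (for example pointing or surface pressure errors) largely cancel in the ratio.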
All members of the TCCON community use the same software, GFIT, to retrieve cDMF from their spectra. The whole software package including GFIT and other tools is called GGG. GFIT is a nonlinear least-squares fitting algorithm which computes column abundances from the solar absorption spectra. The GFIT algorithm scales an a-priori profile to generate the best spectral fit, and integrates the scaled profile to compute the column abundance. Therefore, the results of the GFIT retrieval contain no information about the vertical distribution of the species.
In-situ measurements and FTS measurements rely on different basic principles. The in-situ measurements are ultimately based on gravimetric or manometric standards (Dlugokencky et al., 2005) while the FTS measurements rely on spectroscopic parameters like line strength from spectral line catalogs. Spectroscopic line parameters like line-strength and line-width typically have uncertainties in the order of a few percent while in-situ measurements are typically accurate to 0.1 % or better. Biases in the spectroscopic data would therefore limit the absolute accuracy of the TCCON total column measurements to ∼1 % compared to a precision of < 0.25 % for X CO 2 .
This discrepancy between precision and accuracy is acknowledged by introducing a calibration factor ψ between total column and in-situ measurements. This calibration factor is expected to be close to but not exactly one. In principle, this calibration factor may consist of a method-dependent part (for example spectroscopic data) and an instrument-dependent part. Wunch et al. (2010) show that there exists a species-specific uniform calibration factor for the calibrated FTS systems and assume that the differences between in-situ and FTS measurements are caused by uncertainties of the spectroscopic line list that is used for the FTS data retrieval. Thus, it is highly likely that those species-specific uniform calibration factors apply to all FTS instruments of TCCON.
Calibration of the TCCON results against the in-situ measurements is especially important when TCCON results are used for source/sink estimations with inverse modelling. The results of the inverse models are very sensitive even to small biases in the data (Rayner and O'Brien, 2001).
Airborne in-situ measurements deliver vertical profile information of one or more species (see Sect. 2) with a high vertical resolution. However, with standard jet aircraft, the vertical coverage is typically limited to about 80 % of the total column.
The aircraft data can thus only deliver a partial column. For the calibration, the aircraft profile has to be extended to an artificial aircraft total column (see Sect. 4.2).
This article discusses the results of the X CH 4 calibration with airborne in-situ measurements. In general, the same methods used by Wunch et al. (2010) and Messerschmidt et al. (2011) (for X CO 2 ) were applied to the X CH 4 retrievals. In addition, it investigates improvements of the calibration method used by Wunch et al. (2010) that avoid biases caused by the limited aircraft vertical coverage.

The IMECC campaign
The first airborne campaign to calibrate FTS sites in Europe was part of the Infrastructure for Measurement of the European Carbon Cycle (IMECC), an Integrated Infrastructure Initiative within the European Union's 6th Framework Programme. Its main purpose was the calibration of five European TCCON sites and one mobile TCCON instrument (Geibel et al., 2010).
Two European TCCON FTS sites (Orléans and Bialystok) were co-located with tall tower stations. Figure 1 shows the five European TCCON sites, the mobile FTS in Jena, Germany, the airbase in Hohn, Germany, and the flight tracks of the IMECC campaign. Three other European TCCON sites (Sodankylä, Izaña, Ny-Ålesund) could not be reached by the aircraft during this campaign.
The campaign took place between 28 September and 9 October 2009. The aircraft used was a Learjet 35A, operated by Enviscope/GfD. The in-situ profiles were taken near the FTS sites in the form of spirals from the maximum flight altitude of ∼13 000 m down to ∼300 m (see Figs. 2 and 3). The distance between aircraft and FTS site depended mostly on altitude and limitations imposed by air traffic control. Above 5000 m flight altitude, the distance was typically in the range of tens of km. Below 3000 m flight altitude, the distance was typically within 1-2 km. The notable exception was the profile at Karlsruhe, which was taken during a landing at a nearby airport (43 km away). The Supplement contains all flight tracks.
Eight flights took place over four days with a total of 20 flight hours. During this time, 16 vertical profiles over the European TCCON sites were sampled at different solar zenith angles (SZA). The overall distance flown during the IMECC campaign was approximately 12 000 km. The details of the overflights are listed in Table 1.

FTS instruments and sites
During the campaign, the FTS sites were operated by the individual working groups that are responsible for each site. Three sites were operated by the Institute of Environmental Physics (IUP), Bremen, Germany; one site by IMK-ASF, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany; one by IMK-IFU (KIT), Garmisch-Partenkirchen, Germany; and one by the Max Planck Institute for Biogeochemistry (MPI-BGC), Jena, Germany. With one exception (Karlsruhe) the FTS instruments at these sites were Bruker IFS 125/HR spectrometers operating strictly according to the TCCON data protocol (2012). Each instrument had an Indium Gallium Arsenide (InGaAs) and a Silicon-diode detector that covered a total spectral range of at least 4000 to 15 000 cm −1 at a spectral resolution of 0.02 cm −1 or better.
The Karlsruhe FTS was of the same type and had the same resolution. However, it also did measurements in the mid-infrared region and therefore had a limited bandwidth of 5490 to 11 090 cm −1 for the TCCON measurements. Due to this limited bandwidth, the HF correction described in Sect. 3.6 could not be applied to the Karlsruhe data. Otherwise, the data processing was identical to that of the other FTS instruments.
The instrumental settings used during the campaign and a detailed description of the different sites can be found in Messerschmidt et al. (2011, their Table 2).

Aircraft in-situ instrumentation
During the whole campaign, outside air was sampled through an inlet in the aircraft cabin. This air was continuously analyzed for the abundances of CO 2 , CH 4 , H 2 O, and CO with a time resolution of about three seconds.
Carbon monoxide was measured with an Aero-Laser 5002 (Gerbig et al., 1999). Every ten minutes, an in-flight calibration was performed by replacing the sample gas with air from a working tank for thirty seconds. The air from that tank was traceable to the WMO-2004 CO scale (Novelli et al., 2003). Each calibration was followed by a thirty-second zero gas measurement. By accounting for instrument drift in span and zero, an accuracy of better than ±2 ppb could be achieved.
The other species were measured with a cavity ring-down spectroscopy (CRDS) instrument (Picarro Inc., Santa Clara, CA, USA), the same type as the one described by Chen et al. (2010). The accuracy was better than ±0.1 ppm for CO2 and ±2 ppb for CH4. The H2O abundance used to correct the aircraft profiles for dry air to derive cDMFs was measured with a precision of better than ±25 ppm and an accuracy of about 1.5 % (calibrated against a dew point mirror in the range of 0.7-3.0 %). The CRDS analyzer for CO2, CH4, and H2O was calibrated against MPI-BGC ambient air standards only before and after the campaign. Chen et al. (2010) had demonstrated that the CO2 measurements of a CRDS analyzer of the same type were stable over a two-week campaign in Brazil. Based on those results, we did not employ any in-flight calibrations during the IMECC campaign. No significant drift was detected within the precision of the CH4 measurements of the CRDS analyzer (0.6 ppb) for the range of 1880-2200 ppb. The MPI-BGC standards for CH4 were traceable to the WMO-2004 CH4 scale (Dlugokencky et al., 2005).
In addition to the continuous in-situ measurements, flasks were filled with air samples from an additional inlet. Up to eight flasks per profile were taken at different altitude levels. After the campaign, the concentrations of CO2 and its isotopes, CH4, N2O, CO, H2, and SF6 in the flasks were measured at MPI-BGC's gas analysis lab. The results were used to assure the quality of the continuous measurements. Supplemental meteorological data (air temperature, pressure and relative humidity) were also recorded. Detailed information about the aircraft instrumentation and in-situ data can be found in Messerschmidt et al. (2011).
Table 2. Uncertainties related to different parts of the total column that was derived from the aircraft measurements. The main contributions come from the extrapolation to the surface, the aircraft data and the extension of the column to the stratosphere. They are listed as individual uncertainty u_p, contribution to the total column uncertainty u_t, and relative contribution to the total aircraft XCH4 error in %.

FTS data processing
To ensure a uniform processing of the FTS data obtained within the IMECC campaign, all spectra of the participating sites were processed in Jena using identical software and settings. For the analysis of the spectral data, the TCCON standard retrieval software GFIT was used with the same settings as used in Wunch et al. (2010). In particular, the same spectral line list (included in GGG release 2010-10-06) was used throughout the study.

GFIT a-priori profiles
The GFIT a-priori profiles are based on MkIV balloon profiles and profiles obtained from the Atmospheric Chemistry Experiment (ACE-FTS) on-board SCISAT-1, both measured in the 30-40° N latitude range from 2003 to 2007. With the help of auxiliary data specific to the location and time of the FTS measurement (air temperature (AT), geopotential height (GH), specific humidity (SH), and tropopause pressure (TP) from the NCEP database, Kalnay et al., 1996), they are converted to a local a-priori profile for each day. Within the GFIT analysis, this local a-priori profile is weighted with an SZA-dependent averaging kernel and scaled with a retrieval scaling factor to perform a spectral fit of the measured spectral data.

GFIT retrieval uncertainties
The uncertainties of the GFIT retrieval are a combination of statistical errors (measurement noise) and systematic artifacts (e.g. errors/omissions in the spectroscopy, the modeling of the instrument response, and pointing-induced solar line shifts). The uncertainty estimate, called the GFIT error, is a standard product of the GFIT software. The main components of the GFIT error are from instrument alignment errors, nonlinearities of the spectral continuum, and a-priori profile uncertainties. Wunch et al. (2011, their Appendix B) provide a complete error budget.

Coincidence criteria for aircraft and FTS measurements
To derive the calibration factor ψ, each overflight must provide one data point consisting of an aircraft value and an FTS value. The aircraft value was calculated by integrating the extended aircraft column. For the FTS value, all spectral data within a time window of ±30 min around the spectrum closest to the aircraft overflight were chosen.

Reducing the effects of cloudy conditions at the FTS sites
The weather situation during the IMECC campaign was not optimal for FTS measurements. Although the flights were scheduled using forecast products and satellite imagery, many sites suffered from cloudy sky conditions during the overflights. Simply removing cloud-affected spectra was not an option as this would have reduced the number of spectra to zero for many FTS sites. However, the effects of solar intensity variations (SIV) from clouds on the interferograms can be corrected with the SIV-correction procedure described by Keppel-Aleks et al. (2007) (a standard TCCON procedure).
The applied SIV correction with GFIT default values reduced the scatter significantly: from a standard deviation of 4.2 ppb without SIV correction to 1.3 ppb with SIV correction. The error bars of early-morning measurements which were often affected by clouds were reduced. The ratio of mean error with SIV correction to mean error without SIV correction was 0.68 for spectra obtained before 06:30 UTC. Some data points that appeared as outliers without SIV correction could be better retrieved (some however with large error bars).

Pre-and post-processing
All available spectra were processed with the standard IPP software that converts interferograms to spectra (standard TCCON procedure). Besides the SIV correction (described above) that is part of IPP, no additional pre-screening was applied.
After processing, all spectra with a GFIT error (see Sect. 3.2) larger than 10 ppb were excluded. For all remaining spectra that matched the coincidence criterion for an overflight (see Sect. 3.3), the median value of the derived XCH4 data points was calculated. This value represented the FTS data point for calibration.
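The selection steps (±30 min window around the spectrum closest to the overflight, GFIT error threshold of 10 ppb, median of the remaining values) can be sketched as follows. The function name, the tuple layout and the sample values are hypothetical; determining the closest spectrum before applying the error filter is an assumption of this sketch, since the text does not fix the order.

```python
from statistics import median


def fts_calibration_value(spectra, overflight_time_s,
                          window_s=1800.0, max_error_ppb=10.0):
    """Return the median XCH4 (ppb) of the coincident, quality-filtered spectra.

    spectra: list of (time_s, xch4_ppb, gfit_error_ppb) tuples.
    Returns None if no spectrum survives the selection.
    """
    if not spectra:
        return None
    # The spectrum closest in time to the overflight defines the window centre
    # (assumption: determined before the error filter is applied).
    centre = min(spectra, key=lambda s: abs(s[0] - overflight_time_s))[0]
    selected = [
        xch4 for (t, xch4, err) in spectra
        if abs(t - centre) <= window_s and err <= max_error_ppb
    ]
    return median(selected) if selected else None


# Made-up spectra: (seconds since midnight, XCH4 in ppb, GFIT error in ppb).
spectra = [
    (0.0, 1800.0, 2.0),
    (600.0, 1802.0, 3.0),
    (1200.0, 1900.0, 15.0),   # rejected: GFIT error above 10 ppb
    (5000.0, 1790.0, 1.0),    # rejected: outside the +/-30 min window
]
print(fts_calibration_value(spectra, overflight_time_s=500.0))
```

Using the median rather than the mean keeps a single poorly retrieved spectrum from dominating the FTS data point.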
Retrieval biases due to laser sampling errors, so-called ghosts, could not be corrected. The empirical correction procedure as applied by Messerschmidt et al. (2011) had been established for XCO2 but not for XCH4.

Correction of GFIT a-priori CH 4 profiles via HF correlation
As indicated by Wunch et al. (2010), for a more precise retrieval of X CH 4 the estimated tropopause heights of the GFIT a-priori CH 4 profiles have to be corrected. This was done by using the correlation of methane and hydrofluoric acid (HF) that was observed by Luo et al. (1995) and Washenfelder et al. (2003). The CH 4 -HF-correlation is based on the assumption of complete absence of HF in the troposphere. X HF was retrieved near 4038 cm −1 . To apply this correction, the results of a GFIT X HF retrieval for the individual site were used to calculate an altitude shift for the CH 4 a-priori profiles (see Fig. 4). The modified GFIT a-priori profiles were used for a re-analysis of all IMECC spectral data with the exception of the Karlsruhe instrument. Due to the different detector setup of this instrument (see Sect. 2), the signal-to-noise ratio at 4038 cm −1 was not sufficient to apply the HF correction.
In general, the effect of the HF correction on the XCH4 calibration coefficient was small and well within the error bars.

Data analysis
Data analysis was performed separately for XCO2 and XCH4. This section describes the results of the XCH4 calibration. The results of the XCO2 calibration can be found in Messerschmidt et al. (2011).

Method of intercomparison of two different measurement principles
As pointed out in Sect. 1, in-situ and FTS data cannot be compared directly. The aircraft profile has a high vertical resolution but covers only a part of the total column that is observed by the FTS. Since the FTS total column cannot be reduced to the partial column measured by the aircraft, the aircraft profile has to be extended to an artificial aircraft total column (see Sect. 4.2). Rodgers and Connor (2003) developed a method that allows the intercomparison of two different measurement methods where one has a much higher resolution than the other. This method is adapted for the intercomparison of aircraft and FTS data after vertical integration (Wunch et al., 2010, their Eq. 3):

$$ c_s = \gamma\, c_a + a^T \left( x_h - \gamma\, x_a \right) \qquad (1) $$

with c_s: the retrieved cDMF derived from airborne measurements, γ: the FTS retrieval scaling factor, c_a: the FTS a-priori cDMF, a: a vector containing the FTS dry pressure-weighted column averaging kernel, x_h: the extended aircraft profile, and x_a: the FTS a-priori profile. The profile vectors x_h and x_a as well as the column-averaged DMFs c_s and c_a have units of µmol mol^−1. The scaling factor γ is dimensionless. Please note that γ is an internal variable of the GFIT retrieval and not the calibration factor that was mentioned in Sect. 1. Expressed in terms of vertical columns (VC), Eq. (1) reads:

$$ c_s = \gamma\, \frac{VC_{\mathrm{apriori\,CH_4}}}{VC_{\mathrm{dry\,air}}} + \frac{VC_{\mathrm{aircraft\,CH_4,ak}} - \gamma\, VC_{\mathrm{apriori\,CH_4,ak}}}{VC_{\mathrm{dry\,air}}} \qquad (2) $$

with c_s: XCH4 derived from airborne measurements, γ: the FTS retrieval scaling factor, VC_{dry air}: the total column of dry air, VC_{apriori CH4}: the total vertical a-priori column of CH4, VC_{aircraft CH4,ak}: the column-averaging-kernel-weighted vertical column of the aircraft, and VC_{apriori CH4,ak}: the column-averaging-kernel-weighted vertical a-priori column.

Atmos. Chem. Phys., 12, 8763-8775, 2012, www.atmos-chem-phys.net/12/8763/2012/
The presented method extends the aircraft profile to a total column as described in Sect. 4.2. It then uses the FTS cDMFs, the GFIT a-priori profiles, the retrieval scaling factor, and the GFIT averaging kernels to retrieve the cDMF of this extended aircraft column. This result is finally used to calculate the calibration factor for the FTS measurements (see Sect. 5.1).
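As a minimal numerical sketch of the smoothing relation of Wunch et al. (2010, their Eq. 3), c_s = γ c_a + a^T (x_h − γ x_a), the following function evaluates it for profiles given on a common layer grid. The function name and the four-layer example are hypothetical. The example assumes equal pressure weights and an averaging kernel of one in every layer, so that a^T x_a equals c_a; real GFIT kernels depend on altitude and solar zenith angle.

```python
import numpy as np


def smoothed_aircraft_cdmf(gamma, c_a, a, x_h, x_a):
    """Evaluate c_s = gamma * c_a + a^T (x_h - gamma * x_a).

    gamma: FTS retrieval scaling factor (dimensionless)
    c_a:   FTS a-priori cDMF
    a:     dry pressure-weighted column averaging kernel (one value per layer)
    x_h:   extended aircraft profile (same grid and unit as x_a)
    x_a:   FTS a-priori profile
    """
    a, x_h, x_a = (np.asarray(v, dtype=float) for v in (a, x_h, x_a))
    if not (a.shape == x_h.shape == x_a.shape):
        raise ValueError("a, x_h and x_a must share the same layer grid")
    return gamma * c_a + a @ (x_h - gamma * x_a)


# Hypothetical four-layer atmosphere, values in ppb.
a = np.full(4, 0.25)                      # equal pressure weights, kernel = 1
x_a = np.array([1800.0, 1800.0, 1750.0, 1500.0])
c_a = float(a @ x_a)                      # valid only because kernel = 1
gamma = 0.98
# Lower two layers from the aircraft, upper two from the scaled a-priori.
x_h = np.array([1850.0, 1830.0, gamma * 1750.0, gamma * 1500.0])
c_s = smoothed_aircraft_cdmf(gamma, c_a, a, x_h, x_a)
print(c_s)
```

Only the layers where the extended aircraft profile differs from the scaled a-priori contribute to the second term, which is why the vertical coverage of the aircraft data matters for the calibration.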

Aircraft total column extension
In most cases the aircraft data were limited to an altitude range from approximately 300 to 13 000 m. To compare the aircraft data with the FTS data, this partial column had to be extended both to the surface and to the top of the atmosphere.
For the FTS sites Orléans and Bialystok, ground-based in-situ data from the co-located tall-tower stations Trainou (TRN) and Bialystok (BIK), respectively, were used to extend the aircraft data to the ground. For the other sites the values measured at the lowermost altitude by the aircraft were linearly extrapolated to the surface. The uncertainty was estimated conservatively using the variance of the lowest aircraft data.
For the stratospheric part of the column the GFIT a-priori profile multiplied by the retrieval scaling factor was used (see Fig. 5). The a-priori profile was then weighted with the GFIT averaging kernel and scaled by the retrieval scaling factor for the individual overflight (see Sect. 4.1). The error of the stratospheric mixing ratio was estimated conservatively as 1 % of the scaled and weighted a-priori. This corresponds to the shifting of the profile by 1 km up and down performed by Wunch et al. (2010). An overview of the individual uncertainties of the extrapolation to the ground, the stratospheric extension by using the GFIT a-priori and the aircraft data can be found in Table 2. The extended aircraft columns were then used to calculate the aircraft-derived cDMF needed for Eq. (1).
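The construction of the extended aircraft profile can be sketched as below. This is a simplified illustration with hypothetical names and values: below the lowest aircraft level the profile is held constant (or set to a supplied surface value, standing in for tall-tower data) instead of the linear extrapolation used in the study, and the averaging-kernel weighting is left to the subsequent column calculation.

```python
import numpy as np


def extend_aircraft_profile(p_grid, p_air, x_air, x_apriori, gamma,
                            x_surface=None):
    """Extend an aircraft profile to the full pressure grid.

    p_grid:    retrieval pressure grid (hPa), any order
    p_air:     pressures of the aircraft samples (hPa)
    x_air:     aircraft mole fractions at p_air
    x_apriori: a-priori profile on p_grid
    gamma:     retrieval scaling factor applied to the a-priori aloft
    x_surface: optional surface value (e.g. from a tall tower)
    """
    p_grid = np.asarray(p_grid, dtype=float)
    x_apriori = np.asarray(x_apriori, dtype=float)
    order = np.argsort(p_air)             # np.interp needs increasing x
    p_sorted = np.asarray(p_air, dtype=float)[order]
    x_sorted = np.asarray(x_air, dtype=float)[order]

    # Inside the aircraft range: interpolate the measurements.
    x_ext = np.interp(p_grid, p_sorted, x_sorted)

    # Above the flight ceiling (lower pressure): scaled a-priori.
    above = p_grid < p_sorted[0]
    x_ext[above] = gamma * x_apriori[above]

    # Below the lowest aircraft level (higher pressure): surface extension.
    below = p_grid > p_sorted[-1]
    x_ext[below] = x_sorted[-1] if x_surface is None else x_surface
    return x_ext


# Made-up example, mole fractions in ppb.
profile = extend_aircraft_profile(
    p_grid=[1000.0, 900.0, 500.0, 200.0, 100.0],
    p_air=[900.0, 500.0, 200.0],
    x_air=[1850.0, 1820.0, 1800.0],
    x_apriori=[1800.0] * 5,
    gamma=0.98,
)
print(profile)
```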

Calibration factor between aircraft and FTS instruments
In a first step, the results of the GFIT retrievals with standard a-priori profiles (rather than with extended aircraft profiles) were investigated. Similar to Wunch et al. (2010), the data points were fitted with an error-weighted least-squares fit as published in York et al. (2004) to derive the calibration factor ψ_std. In agreement with the previous investigation of Wunch et al. (2010), an artificial calibration point at the origin was added (D. Wunch, personal communication, 2010). The fit of the IMECC campaign data produces a calibration factor of ψ_std = 0.978 ± 0.002 (±1σ). Although derived with GFIT standard a-priori profiles, it is already similar to the results of the earlier campaign (Wunch et al., 2010). To be able to compare the results of the IMECC campaign data with the data of Wunch et al. (2010), however, the GFIT retrieval was repeated using the extended aircraft profile from Sect. 4.2 as the a-priori profile for the GFIT retrieval. The different a-priori has minor effects of ±2 ppb on the retrieval for the individual sites. This is of the same order of magnitude as the typical GFIT error for XCH4. Figure 6 shows the results of the fit for this procedure (continuous line). The resulting calibration factor ψ_aircraft = 0.978 ± 0.002 is the same as ψ_std and it is also identical to the one derived by Wunch et al. (2010).
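A compact sketch of a straight-line fit with errors in both coordinates, following the iterative scheme of York et al. (2004) for uncorrelated errors, is given below. It is an illustration under stated assumptions rather than the code used in the study: the function name and the sample data are made up, and the small uncertainties assigned to the artificial point at the origin are an assumption, since the text does not specify them.

```python
import numpy as np


def york_fit(x, y, sx, sy, tol=1e-12, max_iter=100):
    """Fit y = a + b*x with uncorrelated errors sx, sy in both coordinates.

    Returns (intercept a, slope b, standard error of the slope).
    """
    x, y, sx, sy = (np.asarray(v, dtype=float) for v in (x, y, sx, sy))
    wx, wy = 1.0 / sx**2, 1.0 / sy**2

    def terms(b):
        # Combined weights, weighted means and auxiliary quantity beta.
        w = wx * wy / (wx + b**2 * wy)
        xbar, ybar = np.sum(w * x) / np.sum(w), np.sum(w * y) / np.sum(w)
        u, v = x - xbar, y - ybar
        beta = w * (u / wy + b * v / wx)
        return w, xbar, ybar, u, v, beta

    b = np.sum(x * y) / np.sum(x * x)      # starting guess for the slope
    for _ in range(max_iter):
        w, xbar, ybar, u, v, beta = terms(b)
        b_new = np.sum(w * beta * v) / np.sum(w * beta * u)
        done = abs(b_new - b) < tol
        b = b_new
        if done:
            break

    w, xbar, ybar, u, v, beta = terms(b)
    a = ybar - b * xbar
    x_adj = xbar + beta                    # least-squares adjusted x values
    u_adj = x_adj - np.sum(w * x_adj) / np.sum(w)
    sigma_b = np.sqrt(1.0 / np.sum(w * u_adj**2))
    return a, b, sigma_b


# Made-up aircraft (x) and FTS (y) values in ppb, lying exactly on a line
# with slope 0.978, plus an artificial point at the origin.
x = np.array([0.0, 1700.0, 1750.0, 1800.0, 1850.0])
y = 0.978 * x
sx = np.array([0.1, 5.0, 5.0, 5.0, 5.0])   # origin uncertainty: assumption
sy = np.array([0.1, 5.0, 5.0, 5.0, 5.0])
intercept, slope, slope_err = york_fit(x, y, sx, sy)
print(slope, slope_err)
```

Adding a tightly constrained point at the origin effectively forces the line through zero, so that the fitted slope can be read directly as the calibration factor.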
In the next step, the Wunch et al. (2010) data were added to the dataset and the fitting procedure was repeated (see dashed line in Fig. 6) to derive a calibration factor ψ I+W for all sites (IMECC + Wunch et al.). As a result, the calibration factor does not change, but the uncertainty is reduced by ∼ 68 % (from ±0.00205 to ±0.00066).
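The stated reduction can be checked directly from the two quoted uncertainties, and it is consistent with the "factor of three" given in the abstract:

```latex
\[
1 - \frac{0.00066}{0.00205} \approx 1 - 0.32 = 0.68,
\qquad
\frac{0.00205}{0.00066} \approx 3.1 .
\]
```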
To illustrate the quality of the fit, the residuals (cDMF_FTS − ψ_I+W cDMF_aircraft) for all calibration points are shown in Fig. 7. For overflights with a larger error bar, the residuals indicate a tendency to a slightly higher calibration factor than the one derived by Wunch et al. (2010). However, most of the calibration points include the calibration factor within their error bars.

Influence of the individual overflights of the IMECC sites on the calibration factor
As discussed before, uncertainties in the spectroscopy would lead to a network-wide calibration factor. However, it could not be excluded that the calibration factor could also contain station-dependent components even though none were identified by Wunch et al. (2010) or Messerschmidt et al. (2011).
To test the hypothesis that a single network-wide calibration factor is sufficient for all FTS sites, each overflight was analyzed separately. The York et al. (2004) fitting procedure was used to derive a separate calibration factor for each individual overflight and one based on all other overflights. Figure 8 shows an overlap of the error bars with the calibration factor for 11 of 16 overflights. This corresponds to 68.8 % and is consistent with the fraction expected for ±1σ error bars.
Fig. 6. IMECC and Wunch et al. (2010) data derived from the GFIT retrieval with aircraft profiles as a-priori. The black continuous line represents the fit for calibration factor ψ_aircraft derived for the IMECC data. The dark-orange dashed line represents the fit for calibration factor ψ_I+W for all sites (IMECC + Wunch et al.). Both fits are nearly identical.

Influence of the amount of aircraft data on the calibration points
An important factor for the calculation of the calibration factor is the vertical coverage of aircraft data in the artificial aircraft total column as shown in Sect. 4.2. The less aircraft information available, the more the a-priori has to be used to fill the profile.
To illustrate the effect of the vertical coverage of aircraft data in the aircraft total column, a sensitivity test was performed. The vertical coverage of aircraft data was artificially reduced to data measured at pressures above a certain minimum value (i.e. below a certain altitude). The remaining part of the column was filled with the scaled and averaging-kernel-weighted a-priori (see Sect. 4.2). Then the calibration point (FTS-to-aircraft ratio) was re-calculated.
Fig. 8. Influence of the individual overflights of the IMECC sites on the calibration factor. For this study, the calibration factor for each individual overflight and the artificial calibration point in the origin were derived (full and empty dots). An additional calibration factor was calculated for the corresponding remaining overflights and the artificial calibration point in the origin (full and empty triangles). The error bars of the overflights JE-OF1a, JE-OF1b, BI-OF2b, OR-OF1a and BR-OF2a do not overlap with the respective calibration factors derived from the overflights over the corresponding remaining sites.
The results show the expected behavior of an increasing FTS-to-aircraft ratio with the decrease of the vertical coverage of aircraft data (see Fig. 11a).
In an extreme scenario of no aircraft data, the profile is identical to the scaled a-priori. For Eq. (1) in Sect. 4.1, the consequence is that the calibration factor becomes 1. With fewer aircraft measurements, one is left to rely more upon a-priori knowledge about the calibration factor. In other words: the less aircraft information contributes to the extended aircraft total column, the more the extended aircraft column tends towards the GFIT a-priori. In the extreme case of no vertical coverage, the best guess for the calibration factor derived from this profile would be ψ = 1 (no information).
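The limiting case can be written out explicitly using the smoothing relation of Wunch et al. (2010, their Eq. 3), with the symbols as defined in Sect. 4.1. Without any aircraft data, the extended profile equals the scaled a-priori, x_h = γ x_a, so the kernel-weighted term vanishes. The last step assumes that the FTS cDMF equals the integrated scaled a-priori, γ c_a:

```latex
\[
c_s = \gamma\, c_a + a^T \left( \gamma\, x_a - \gamma\, x_a \right) = \gamma\, c_a
\quad\Longrightarrow\quad
\frac{c_{\mathrm{FTS}}}{c_s} = \frac{\gamma\, c_a}{\gamma\, c_a} = 1 .
\]
```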
Having these results in mind when looking at the individual aircraft profiles in Sect. 4.2, it is obvious that one can expect different behavior of different overflights due to the vertical coverage of aircraft data.
Good examples are the first two overflights over Jena. Overflight JE-OF1a has a maximum flight altitude of 13 km, overflight JE-OF1b of approximately 8 km. Due to the time difference between overflight and first spectrum, exactly the same FTS data are used for these calibration points. The aircraft data are similar as well. Hence, the difference in the residuals in Fig. 7 for these two calibration points is most likely due to the different amount of aircraft data. The residual of JE-OF1a is smaller and the calibration factor for this individual calibration point is closer to 0.978. The residual of JE-OF1b, however, is larger and the calibration factor for this individual calibration point is further away from 0.978 (see Fig. 8).

An improved approach to determine the calibration factor
The previous results have shown that the calibration points from aircraft profiles with less vertical coverage are biased towards one. This is caused by the extrapolation of the aircraft profiles with the GFIT a-priori. A simplified example can illustrate the problem. Figure 9 shows two measurements on an artificial pressure level. Measurement A represents the scaled FTS a-priori profile (which, if integrated, is equal to the FTS cDMF) and covers the complete pressure range (total column). Measurement B represents the aircraft profile and covers the lower 50 % of the pressure range (partial column). Measurements A and B are constant (A = 1, B = 3). The true calibration factor ψ true = 1/3 is known in this example.
Following the procedure of Wunch et al. (2010), measurement B is extrapolated to the full total column by using measurement A. This leads to an integrated profile for B and a calibration factor that is biased towards one (ψ int = 1/2). Therefore, the extrapolation of the aircraft profile with the FTS a-priori generally leads to a bias of the calibration factor towards one. The magnitude of this bias depends on the amount of aircraft data and the difference of the calibration factor from one.
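The simplified example can be reproduced numerically. The sketch below assumes constant values for both measurements, a uniform averaging kernel, and a coverage expressed as the fraction of the pressure column sampled by the aircraft; the function name is hypothetical.

```python
def biased_calibration_factor(fts_value, aircraft_value, coverage):
    """FTS-to-aircraft ratio when the unsampled part of the aircraft column
    is filled with the (uncorrected) FTS value.

    coverage: fraction (0..1) of the pressure column sampled by the aircraft.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie between 0 and 1")
    extended = coverage * aircraft_value + (1.0 - coverage) * fts_value
    return fts_value / extended


# Measurement A = 1 (FTS), measurement B = 3 (aircraft), as in the example.
for coverage in (1.0, 0.8, 0.5, 0.0):
    psi = biased_calibration_factor(1.0, 3.0, coverage)
    print(f"coverage {coverage:.1f}: psi = {psi:.3f}")
```

With full coverage the ratio equals the true value of 1/3, with 50 % coverage it is 1/2, and with no coverage it is exactly one, which is the bias towards one described in the text.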
A possible solution for this problem is to extrapolate measurement B with a calibration-factor-corrected measurement A to derive the true calibration factor. To be able to do this, the calibration factor has then to be derived in an iterative calculation.
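For the same simplified example (A = 1, B = 3, uniform averaging kernel), the iterative idea can be sketched as a fixed-point iteration: the unsampled part of the aircraft column is filled with measurement A divided by the current estimate of the calibration factor, and the factor is re-derived until it stops changing. The function name is hypothetical, and the sketch uses a single profile, whereas in the study the factor is obtained from a fit over all overflights.

```python
def iterate_calibration_factor(fts_value, aircraft_value, coverage,
                               psi0=1.0, tol=1e-12, max_iter=1000):
    """Iteratively derive the calibration factor for one idealized profile.

    The unsampled part of the column is filled with fts_value / psi,
    i.e. with the calibration-factor-corrected FTS value.
    """
    psi = psi0
    for _ in range(max_iter):
        extended = coverage * aircraft_value + (1.0 - coverage) * fts_value / psi
        psi_new = fts_value / extended
        if abs(psi_new - psi) < tol:
            return psi_new
        psi = psi_new
    return psi


psi_final = iterate_calibration_factor(1.0, 3.0, coverage=0.5)
print(psi_final)
```

The first iteration with ψ_0 = 1 reproduces the biased value of 1/2, and further iterations approach the true value of 1/3. With zero coverage the iteration simply returns its starting value, reflecting that such a profile carries no information about the calibration factor.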
Following this principle, the aircraft column has to be extrapolated with a calibration-factor-corrected GFIT a-priori profile (see Fig. 10). The approach of Rodgers and Connor (2003) (see Eq. 1) is modified to:

$$ c_s = \frac{\gamma}{\psi_n}\, c_a + a^T \left( x_h - \frac{\gamma}{\psi_n}\, x_a \right) \qquad (3) $$

with c_s: the retrieved cDMF of the aircraft, γ: the FTS retrieval scaling factor, c_a: the FTS a-priori cDMF, a: a vector containing the FTS dry pressure-weighted column averaging kernel, x_h: the extended aircraft profile, x_a: the FTS a-priori profile, and ψ_n: the iteratively-derived calibration factor.

Starting with an initial calibration factor ψ_0 = 1, Eq. (3) is identical to Eq. (1). The calibration points are calculated and the fitting procedure (see Sect. 5.1) is applied. This leads to a new calibration factor ψ_1 = ψ_std = 0.978, which is the same as the one determined with the original Wunch et al. (2010) approach. The procedure is then repeated until the factor converges to the final value ψ_n. Since the a-priori profile only has a small influence on the GFIT retrieval (see Sect. 5.1), the GFIT retrieval with a ψ_n-corrected aircraft profile as a-priori for each iteration step was not performed for this study.

Fig. 9. Illustration of the bias introduced by the extrapolation of measurement B to a total column using data of measurement A. The integrated profile leads to a calibration value of 2, while the true value should be 3.

Figure 11 illustrates the effect of the iterative approach. One profile (OR-OF2a) was artificially reduced in altitude coverage. Then the analysis of Sect. 5.3 was repeated. The more the vertical coverage was reduced, the more the FTS-to-aircraft ratio derived from this profile was biased towards one (see Fig. 11a) while the FTS-to-aircraft ratios derived from the other profiles remained unchanged. The calibration factor ψ was then calculated from the unbiased as well as the biased FTS-to-aircraft ratios.
Effectively, the bias in the altitude-coverage-reduced FTS-to-aircraft ratio would lead to a (smaller) bias of the calibration factor ψ towards one. The effective bias of ψ depends on the weight of the biased profile relative to the other profiles. Figure 11b shows how this bias can be compensated by the iterative approach. In the iterative approach, the best guess for the calibration factor ψ in the case of missing information is not one but rather the value from the previous iteration. Therefore, the FTS-to-aircraft ratio for the altitude-coverage-reduced profile (after several iterations) is no longer biased towards one. Instead, the FTS-to-aircraft ratio of this profile stays near the value determined by the other profiles, even if the altitude coverage is reduced to zero. Thus the bias of the calibration factor ψ can also be avoided. In other words: the calibration factor ψ is retrieved from all profiles, not from a single one. The iterative approach avoids biases caused by profiles that contain less vertical information than others. In the extreme case of a profile with zero vertical coverage (no information), the calibration factor would be determined from the other profiles only. In the original Wunch et al. (2010) approach, this zero-information profile would have biased the whole calibration factor ψ towards one.
Of course this does not imply that ψ could be derived with the same accuracy if all aircraft profiles had reduced or, in the extreme case, zero vertical coverage. Profiles with high vertical coverage are needed to compensate for profiles with low vertical coverage. The difference is that with the iterative approach, missing altitude information in some of the profiles is ignored as much as possible instead of biasing ψ towards one.
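The behaviour described above can be reproduced with a deliberately simplified toy model. The sketch below is not the paper's Eq. (3) and does not use averaging kernels: it assumes that each aircraft profile covers a fraction `coverage` of the true column, that the missing part is filled with the FTS column divided by the current factor ψ_n, and that the factor is fitted as the least-squares slope of a line through the origin. The function names and the input numbers are hypothetical and only serve as illustration.

```python
def fit_slope(aircraft, fts):
    """Least-squares slope of a line through the origin (fts = slope * aircraft).

    Assumption: a plain unweighted fit, standing in for the fitting
    procedure of Sect. 5.1, which is not reproduced here.
    """
    return sum(a * f for a, f in zip(aircraft, fts)) / sum(a * a for a in aircraft)


def iterate_calibration(true_cols, coverage, psi_true, tol=1e-10, max_iter=200):
    """Toy version of the iterative calibration.

    true_cols : true column-averaged values (e.g. in ppb), one per overflight
    coverage  : fraction of each column actually sampled by the aircraft (0..1)
    psi_true  : factor by which the simulated FTS differs from the truth
    Returns the converged factor and the list of factors per iteration.
    """
    # Simulated FTS columns: truth scaled by the (unknown) calibration factor.
    fts = [psi_true * t for t in true_cols]
    psi = 1.0  # psi_0 = 1 reproduces the standard (non-iterative) method
    history = []
    for _ in range(max_iter):
        # Measured part comes from the aircraft; the missing part is filled
        # with the FTS value corrected by the current calibration factor.
        aircraft = [c * t + (1.0 - c) * f / psi
                    for t, c, f in zip(true_cols, coverage, fts)]
        new_psi = fit_slope(aircraft, fts)
        history.append(new_psi)
        converged = abs(new_psi - psi) < tol
        psi = new_psi
        if converged:
            break
    return psi, history


if __name__ == "__main__":
    # Hypothetical inputs: three overflights with 80 %, 50 % and 0 % coverage.
    psi, history = iterate_calibration([1800.0, 1790.0, 1810.0],
                                       [0.8, 0.5, 0.0], 0.974)
    print("first iteration (standard method):", history[0])
    print("converged factor:", psi)
```

In this toy setting the first iteration is pulled towards one by the low-coverage profiles, while the converged value returns to the factor used to simulate the FTS data. If every profile has zero coverage, the factor stays at its starting value of one, which mirrors the caveat that profiles with high vertical coverage are still required.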
By using this iterative approach, the calibration points of the individual overflights showed roughly the same scatter and residuals (see Fig. 12, lower part) as in the approach of Wunch et al. (2010) (compare Figs. 6 and 7). The standard deviation for both residual calculations was the same (6 ppb). However, temporally close overflights (BI-OF2a/b, OR-OF1a/b) with different maximum flight altitudes were now more consistent. The influence of the vertical coverage of the aircraft data was reduced to a minimum.
Atmos. Chem. Phys., 12, 8763-8775, 2012 www.atmos-chem-phys.net/12/8763/2012/
Fig. 11. Illustration of the effect of limited aircraft altitude coverage on the calibration factor, using overflight OR-OF2a as an example. The red triangle is the (original) FTS-to-aircraft ratio for this overflight, while the blue line is the calibration factor determined from all sites (including Wunch et al. (2010) data). The vertical coverage of the profile is then reduced by including only data points with pressure p above a minimum pressure p_min (p > p_min). The black dots show the effect of this reduced vertical coverage on the FTS-to-aircraft ratio for this profile. The error bars are a combination of FTS and aircraft error and increase with reduced vertical coverage. (a) Standard method according to Wunch et al. (2010): with fewer aircraft data, the FTS-to-aircraft ratio for this profile approaches one (which would lead to a bias in the calibration factor). (b) New iterative method: the reduced vertical coverage no longer leads to a significant bias.
The resulting calibration factor for the IMECC campaign dataset, ψ_n = 0.974 ± 0.002 (±1σ) (see Fig. 12, upper part), was significantly different from the one derived by the method of Wunch et al. (2010). The difference of 0.004 between ψ_I+W and ψ_n corresponds to a ∼7 ppb offset for the FTS cDMFs.
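As a rough check of the magnitude only, assuming a typical column-averaged CH4 mole fraction of about 1800 ppb (an illustrative value assumed here, not taken from the campaign data) and treating the change to first order, the difference in the calibration factor translates to:

```latex
\Delta X_{\mathrm{CH_4}} \approx \left(\psi_{I+W} - \psi_n\right) X_{\mathrm{CH_4}}
  = (0.978 - 0.974) \times 1800\ \mathrm{ppb} \approx 7\ \mathrm{ppb}
```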

Conclusions
Using the same method as Wunch et al. (2010), the results of the IMECC aircraft campaign confirmed the earlier calibration factor for XCH4. When the results of Wunch et al. (2010) and the IMECC campaign were combined, the uncertainty of the fit of the calibration factor could be reduced by ∼68 % (see Table 3). It seems most likely that this factor is a uniform calibration factor for the whole TCCON network.
However, further investigation of the method of Wunch et al. (2010) shows that the stratospheric extrapolation of the aircraft data is sensitive to the vertical coverage of the aircraft data and introduces a bias in the calibration factor. A uniform vertical coverage of the aircraft data is, unfortunately, not always possible. In addition, the uncertainties of the stratospheric part lead to significant uncertainties in the aircraft XCH4 and generate ∼85 % of the total error budget. Better knowledge of the stratospheric distribution of CH4 is needed to reduce these errors and thus improve the calibration procedure. An iterative determination of the calibration factor presents a possible solution to the problem of different vertical coverage of the aircraft data and removes the bias resulting from the stratospheric extrapolation. The improved iterative method produced a slightly smaller calibration factor than the method of Wunch et al. (2010). For typical atmospheric values of XCH4, this corresponds to a high bias of about +7 ppb in the published Wunch et al. (2010) XCH4 data. This value corresponds to roughly twice the typical GFIT error for XCH4. Further investigations with more calibration points (e.g. IMECC + data from Wunch et al., 2010) are needed to validate the results of this approach.
Apart from the iterative method, there are two options to avoid this problem:
- The retrieval of a partial column from FTS spectral data that has the same vertical coverage as the aircraft profile. This is not yet implemented in the GFIT software, but there are efforts to do so. However, the vertical information is limited and the pressure-broadening coefficients of most spectral lines are not well known. Only a few degrees of freedom could be expected from such a retrieval. Besides, the quality of a partial-column retrieval would certainly be lower than that of a total-column retrieval. It is not clear if there would be a net benefit for the determination of the calibration factor.
- Future calibration campaigns with balloon-based instruments like AirCore (Karion et al., 2010). This would allow one to increase the vertical coverage drastically to an almost complete total column (0-30 km) and thus solve the problem of stratospheric uncertainties.