Supplement to: An assessment of the climatological representativeness of IAGOS-CARIBIC trace gas measurements using EMAC model simulations

Measurement data from the long-term passenger aircraft project IAGOS-CARIBIC is often used to derive trace gas climatologies. We investigate to what extent such derived climatologies can be assumed to be representative of the true state of the atmosphere. Using the chemistry-climate model EMAC, we sample the modelled trace gases along CARIBIC flight tracks. Different trace gases are considered and climatologies relative to the mid-latitude tropopause are calculated. Representativeness can then be assessed by comparing the CARIBIC-sampled model data to the true climatological model state. Three statistical methods are applied for this purpose: the Kolmogorov-Smirnov test, and scores based on the variability and relative differences. Generally, representativeness is expected to decrease with increasing variability and to increase with the number of available samples. Based on this assumption, we investigate the suitability of the different statistical measures for our problem. The Kolmogorov-Smirnov test seems too strict and does not identify any climatology as representative, not even for long-lived, well-observed trace gases. In contrast, the variability-based scores pass the general requirements for representativeness formulated above. In addition, even the simplest metric (relative differences) seems applicable for investigating representativeness. Using the relative differences score, we investigate the representativeness of a large number of different trace gases. For our final consideration we assume that the EMAC model is a reasonable representation of the real world and that representativeness in the model world can be translated to representativeness for CARIBIC measurements. This assumption is justified by comparing the model variability to the variability of CARIBIC measurements. Finally, we show how the representativeness score can be translated into a number of flights necessary to achieve a certain degree of representativeness.


1 Introduction
This supplement discusses further results of the study of the representativeness of IAGOS-CARIBIC data using the chemistry-climate model EMAC. For abbreviations and methods, please refer to the main text. Four points are discussed here: Section 2 briefly shows results of the comparison of model and measurement variability. The methods to describe representativeness developed and tested with model data were also applied to data from a random number generator. This is described in Section 3. Section 4 discusses the sensitivity study of the Kolmogorov-Smirnov test using a subsample of MOD_CARIBIC. Section 5 shows how the representativeness uncertainty of MOD_CARIBIC increases if the pressure range is widened to 10 hPa < p < 500 hPa, i.e. how the climatologies produced with data from IAGOS-CARIBIC depend on the pressure at which samples are taken.
2 Comparing measurement and model variability

In order to compare model and measurement variability, the relative standard deviation σ_r = σ/µ (σ being the standard deviation, µ the mean) was calculated in each month for MEAS_CARIBIC (CARIBIC measurements) and MOD_CARIBIC, giving σ_r^MEASCARIBIC and σ_r^MODCARIBIC. Figure 1

All three methods to investigate representativeness have also been applied to data created with a random number generator.
The results of this study are discussed here.
To produce the random numbers, 20 sets of 10^8 numbers were taken from a normal distribution. These 20 sets are referred to as species, although they are purely artificial. From species to species, the standard deviation σ was set to vary from 10^-3 to 10^3, with a linear increase in the exponent. 20 mean values µ (increasing from 10^4 to 10^8, again with a linear increase in the exponent) were distributed randomly onto the 20 species. 3000 samples were taken from each of the 20 species. The sample size increases by 20 from one sample to the next, each sample keeping the values of the one before. This way, the relationship of the representativeness score with the sample size is directly accessible. The statistics of each species will be denoted by the index 2, while samples are indexed by 1. For short, this dataset will be named RAND.
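The construction of RAND described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: the function names `make_rand` and `nested_samples`, the seeds, the reduced number of values per species (10^4 instead of 10^8) and the drawing of sample values with replacement are all assumptions made here to keep the example small.

```python
import random


def make_rand(n_species=20, n_values=10_000, seed=0):
    """Build a scaled-down stand-in for RAND.

    Assumption: 10**4 values per species instead of the 10**8 named in the
    text, so that the sketch runs quickly.
    """
    rng = random.Random(seed)
    # Exponents vary linearly: sigma from 10^-3 to 10^3, mu from 10^4 to 10^8.
    sig_exp = [-3 + 6 * i / (n_species - 1) for i in range(n_species)]
    mu_exp = [4 + 4 * i / (n_species - 1) for i in range(n_species)]
    rng.shuffle(mu_exp)  # mean values are assigned to the species at random
    species = []
    for s_exp, m_exp in zip(sig_exp, mu_exp):
        mu, sigma = 10.0 ** m_exp, 10.0 ** s_exp
        values = [rng.gauss(mu, sigma) for _ in range(n_values)]
        species.append({"mu": mu, "sigma": sigma, "values": values})
    return species


def nested_samples(values, n_samples=3000, step=20, seed=1):
    """Yield samples of size step, 2*step, ...; each keeps the previous one.

    Assumption: new values are drawn with replacement from the species.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n_samples):
        sample.extend(rng.choice(values) for _ in range(step))
        yield list(sample)  # copy, so earlier samples are not modified later
```

A call such as `nested_samples(make_rand()[0]["values"], n_samples=5)` yields samples of 20, 40, 60, 80 and 100 values, each containing the one before.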

The variability τ* of each species was defined as in Equation 5 of the main text: τ* = log10(µ_2/σ_2), where high values of τ* stand for low variability. The two requirements set up in Section 3.3 of the main text for representativeness in general also have to hold here:
1. Representativeness has to increase with the number of samples.
2. Representativeness has to decrease with increasing variability of the underlying distribution.
With RAND defined in this way, it is possible to test representativeness using the variability analysis following Rohrer and Berresheim (2006) and Kunz et al. (2008) (see Section 4.2 of the main text) and the relative differences (see Section 4.3 of the main text). The Kolmogorov-Smirnov test was positive only for very small samples (fewer than fifty numbers, independent of τ*) and will not be discussed further.
Its behaviour with aircraft data was the subject of a sensitivity study, the results of which are shown in Sec. 4 of this supplement.

Variability analysis
The variability analysis (defined in Section 4.2 and Eq. 3 of the main text) was applied in a simplified manner. As RAND is independent of time, R_var is reduced to a single value, the absolute difference between the variability of each species of RAND and that of the sample taken from it: R_var = |ν_1 − ν_2|, where ν is the mean variability. Figure 2 shows one result. The exact result is a matter of chance, as a random number generator is used. As with MOD_CARIBIC and MOD_RANDPATH, a strong dependence on τ* and a weak dependence on the number of samples are visible.
Like R_var calculated using MOD_CARIBIC and MOD_RANDPATH, the variability analysis using RAND meets the two requirements necessary for describing representativeness, which were described in Section 3.3 of the main text and above. This result supports the finding that R_var can be used as a statistic for describing representativeness.

Relative differences
Similar to R_var, R_rel is reduced to a simple relative difference when using RAND: R_rel = |µ_1 − µ_2|/µ_2, where µ is the mean. As for MOD_CARIBIC and MOD_RANDPATH, R_rel passes both conditions for a valid description of representativeness: it depends on the variability τ* and on the number of samples. The latter dependence is also influenced by chance and is generally much weaker.
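As a rough illustration of how this simplified relative difference behaves as a sample grows, the following sketch computes it for nested samples of a single artificial species. The species statistics (µ_2 = 10^4, σ_2 = 10^2), the seed, the helper name `r_rel` and the reduction to 200 nested samples are assumptions made for this example. Values are drawn directly from the normal distribution, and µ_2 is taken as the distribution mean rather than the mean of a stored species.

```python
import random
import statistics


def r_rel(sample, mu_species):
    """Relative difference between the sample mean and the species mean."""
    return abs(statistics.fmean(sample) - mu_species) / mu_species


rng = random.Random(42)
mu_2, sigma_2 = 1.0e4, 1.0e2  # hypothetical species statistics (index 2)
sample, scores = [], []
for _ in range(200):  # 200 nested samples here instead of 3000
    # Each new sample keeps the previous values and adds 20 more.
    sample.extend(rng.gauss(mu_2, sigma_2) for _ in range(20))
    scores.append((len(sample), r_rel(sample, mu_2)))
```

Plotting the second element of `scores` against the first would show the chance-driven scatter mentioned above, on top of a slow decrease with sample size.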
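The fact that R_rel passes the two conditions for a description of representativeness can be understood with some theoretical considerations. The standard error of the mean is defined by

$$\sigma_x = \frac{\sigma}{\sqrt{n}} \qquad (1)$$

where σ is the standard deviation of the species, n the number of values in the sample and x the sample mean. σ_x, the standard deviation of a sample, can be given by the following equation (N being the number of samples):

$$\sigma_x = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2} \qquad (2)$$

For N = 1, this gives:

$$\sigma_x = |x - \mu| \qquad (3)$$

Plugging Eq. 3 into Eq. 1 gives:

$$|x - \mu| = \frac{\sigma}{\sqrt{n}} \qquad (4)$$

and therefore, taking the logarithm of the relative difference and using τ* = log10(µ/σ),

$$R_{rel} = \log_{10}\frac{|x - \mu|}{\mu} = -0.5\,\log_{10}(n) + \log_{10}\frac{\sigma}{\mu} = -0.5\,\log_{10}(n) - \tau^{*} \qquad (5)$$

So ideally, R_rel should decrease linearly with increasing τ* and with the logarithm of the number of values. Figure 3 shows this is approximately true for RAND.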
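The expected scaling can be checked numerically. With τ* = log10(µ/σ) as defined above, log10(σ/µ) equals −τ*, so a typical relative deviation of the sample mean should lie near −0.5 log10(n) − τ* on the logarithmic scale. The sketch below compares this value with the root-mean-square relative deviation from repeated draws. The function name, the species statistics (τ* = 3), the seed and the number of repetitions are assumptions chosen for illustration, and the root mean square is used here in place of a single chance realisation.

```python
import math
import random


def rms_log_rel_diff(mu, sigma, n, repeats, rng):
    """log10 of the RMS relative difference of sample means of size n."""
    total = 0.0
    for _ in range(repeats):
        mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        total += ((mean - mu) / mu) ** 2
    return math.log10(math.sqrt(total / repeats))


rng = random.Random(7)
mu, sigma = 1.0e5, 1.0e2  # hypothetical species with tau* = 3
tau_star = math.log10(mu / sigma)
for n in (20, 200, 2000):
    simulated = rms_log_rel_diff(mu, sigma, n, repeats=400, rng=rng)
    expected = -0.5 * math.log10(n) - tau_star
    print(f"n={n:5d}  simulated={simulated:.3f}  expected={expected:.3f}")
```

A single sample, as used for RAND, scatters around this line, which is consistent with the statement that the relation holds only approximately.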
In the case of RAND, both R_rel and R_var can be used to describe representativeness, as both pass the two conditions.
Theoretical considerations make the finding plausible. RAND can be considered a theoretical abstraction of MOD. The finding here therefore strongly supports that of Sections 5.2 and 5.3 of the main text, where R_rel and R_var have also been found to be good descriptors of representativeness when using MOD_CARIBIC and MOD_RANDPATH or MOD_RANDLOC. In the main text, we use R_rel for final results, as it is more suitable to answer the question of representativeness for a climatology.

4 Sensitivity study on the Kolmogorov-Smirnov test
When using MOD_CARIBIC, MOD_RANDPATH or MOD_RANDLOC, the Kolmogorov-Smirnov test proved not to be usable, returning only negative results. This indicates that MOD_CARIBIC is not representative of MOD_RANDPATH in the sense of the Kolmogorov-Smirnov test. This behaviour was tested in a sensitivity study, the results of which are described here.
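For readers less familiar with the test, the following sketch shows how a two-sample Kolmogorov-Smirnov decision can be computed with the Python standard library. It is not the implementation used in the study. The example data (two normal distributions offset by a hypothetical 0.1 standard deviations), the function names and the 5 % significance level are assumptions. The example only illustrates a general property of the test: the rejection threshold shrinks as the samples grow, so ever smaller differences between two datasets lead to a negative result. This may be one reason why the test appears strict for large model datasets.

```python
import bisect
import math
import random


def ks_statistic(a, b):
    """Largest distance between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect.bisect_right(a, x) / len(a)
            - bisect.bisect_right(b, x) / len(b))
        for x in a + b
    )


def ks_rejects(a, b, alpha=0.05):
    """True if the two-sample KS test rejects 'same distribution'.

    Uses the large-sample critical value c(alpha) * sqrt((n + m) / (n * m)).
    """
    n, m = len(a), len(b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return ks_statistic(a, b) > c_alpha * math.sqrt((n + m) / (n * m))


rng = random.Random(3)
shift = 0.1  # hypothetical small offset, in units of the standard deviation
for n in (50, 500, 20000):
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(shift, 1.0) for _ in range(n)]
    print(n, round(ks_statistic(a, b), 3), ks_rejects(a, b))
```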
One of the most frequent destinations within the CARIBIC project is Vancouver, Canada (near 120

5 Aircraft tropopause pressure bias
By calculating R_rel using MOD_CARIBIC and MOD_RANDLOC, an important fact can be illustrated about data collected with instruments on civil aircraft. If data is resorted into heights relative to the tropopause (HrelTP), it still contains data taken at constant pressure altitudes in a limited range. Depending on the pressure at which the data was sampled, it contains information from different meteorological situations. The height of the tropopause relative to the sample pressure determines the

Figure 6 shows the results (right hand panel). For comparison, the left hand panel of Figure 6 shows R_rel of the same datasets when setting 180 hPa < p < 280 hPa, the range at which CARIBIC measures. On the right, the representativeness uncertainty increases strongly at all heights except just above the tropopause, where MOD_CARIBIC contains most data. Only the long-lived species CO₂, N₂O and CH₄ retain their low uncertainties. For the more variable species to the right of the figure, the representativeness uncertainty increases strongly, especially in the troposphere, where the variability increases.
The strong increase in representativeness uncertainty is due to the bias always present in measurement data from commercial aircraft, which can only collect data high above the tropopause when the tropopause is at high pressure and far below when it is at low pressure values. This bias is naturally contained in all data measured at constant pressure and then sorted relative to

Figure 6. R_rel calculated using MOD_CARIBIC and MOD_RANDLOC with the range of p set to 180 hPa < p < 280 hPa (left) and 10 hPa < p < 500 hPa (right).