Estimating Uncertainties in the SBUV Version 8.6 Merged Profile Ozone Dataset

The combined record of total and profile ozone measurements from the Solar Backscatter Ultraviolet (SBUV) and SBUV/2 series of instruments, known as the SBUV Merged Ozone Data (MOD) product, constitutes the longest satellite-based ozone time series from a single instrument type, and as such plays a key role in ozone trend analyses. Following the approach documented in Frith et al. (2014) to analyze the merging uncertainties in the MOD total ozone record, we use Monte Carlo simulations to estimate the potential for uncertainties in the calibration and drift of individual instruments in the profile ozone merged data set. We focus our discussion on the trends and associated merging uncertainty since 2001 in an effort to verify the start of ozone recovery as predicted by chemistry climate models. We find that merging uncertainty dominates the overall estimated uncertainty when considering only the 15 years of data since 2001. We derive trends versus pressure level for the MOD data set that are positive in the upper stratosphere as expected for ozone recovery. These trends appear to be significant when only statistical uncertainties are included, but are no longer significant at the 2σ level when instrument uncertainties are accounted for. However, when we use the entire data set from 1979 through 2015 and fit to the EESC (equivalent effective stratospheric chlorine) we find statistically significant fits throughout the upper stratosphere at all latitudes. This implies that the ozone profile data remain consistent with our expectation that chlorine is the dominant ozone forcing term.


Introduction
The solar backscatter ultraviolet (SBUV) series of instruments provides a 40+-year data record of broadly resolved vertical ozone profiles on a global scale. We recently reported on our updated Merged Ozone Data (MOD) record of integrated total column SBUV measurements (Frith et al., 2014).
Here we extend the record by considering the ozone profile measurements in layers from 25 to 1 hPa, where SBUV provides the best vertical resolution (Kramarova et al., 2013a; Bhartia et al., 2013). The SBUV record comprises data from nine instruments (Nimbus 4 backscatter ultraviolet (BUV), Nimbus 7 SBUV and SBUV/2s on NOAA 9, 11, 14, 16, 17, 18 and 19), providing ozone measurements over an era of changing chlorine levels and changing stratospheric climate. In order to isolate these signals from natural ozone variability, a single coherent data set is required. To this end, instruments in the series were cross-calibrated at the wavelength level using overlapping measurements collected within defined spatial and temporal limits (DeLand et al., 2012). Ozone was then derived for each instrument using the Version 8.6 retrieval algorithm to produce the measurement time series used to create the MOD data set (McPeters et al., 2013). While this approach minimized differences among instruments, Frith et al. (2014) showed that small remaining offsets and drifts between measurements contributed to the uncertainty in the total ozone SBUV MOD record.
Profile ozone measurements are inherently noisier than column ozone measurements and generally show larger variations between measurements (e.g., Kramarova et al., 2013b; Hassler et al., 2014; Tummon et al., 2015; Hubert et al., 2016). For the SBUV profile measurements, different wavelengths are sensitive to the ozone concentration at different pressure levels, and wavelength is being used to "scan" the profile. Wavelength-dependent calibration errors tend to cause ozone errors that oscillate in the vertical. Thus, profile ozone measurements are much more sensitive to wavelength calibration and instrument issues, while these errors tend to cancel in total ozone.
Published by Copernicus Publications on behalf of the European Geosciences Union. S. M. Frith et al.: Estimating Uncertainties in the SBUV
Although profile measurements have an inherently larger uncertainty than total ozone measurements (at least for SBUV), these data are critical in the search for indicators of the recovery of ozone. Model studies indicate that the expected recovery of ozone from the impacts of chlorine and bromine compounds will be latitude- and altitude-dependent (e.g., Li et al., 2009). To that end, a number of recent studies have indicated statistically significant ozone increases since the late 1990s based on merged ozone profile records, suggesting recovery from the earlier ozone decline attributed to ozone-depleting substances (ODSs; e.g., WMO, 2014; Tummon et al., 2015; Harris et al., 2015; Steinbrecht et al., 2017). However, a full characterization of the uncertainties associated with the merging process of data from multiple instruments is not generally available. Not only the individual instrument uncertainties but also the uncertainties arising from the merging process itself must be accounted for before the merged data can be properly interpreted. Such uncertainties result from individual instrument uncertainties (absolute calibration, drift, other systematic errors) but also from differences in measurement technique, spatial and temporal resolution, and native vertical coordinate systems of the merged records (e.g., Damadeo et al., 2014; Hassler et al., 2014; Sofieva et al., 2015). How to propagate such uncertainties and assess their impact on derived trends and other long-term signals is an outstanding question within the community (e.g., WMO, 2014; Harris et al., 2015).
In this work we estimate the merging uncertainty in the SBUV profile MOD record using a Monte Carlo approach similar to that presented in Frith et al. (2014) for total ozone. In the following sections we describe the SBUV measurements and analyze the consistency between the individual instrument data sets. We then describe the merged data product and summarize our approach to estimating potential long-term drifts in the data set. Using a multiple linear regression model, we compute profile ozone trends, focusing on the ozone changes since 2000, a period when models project ozone beginning to recover and the SBUV/2 data are most reliable. We evaluate two sources of uncertainty: the statistical uncertainty resulting from real atmospheric variability and an imperfect regression model fit to the data, and the uncertainty in merging the data from multiple instruments because of possible offsets or drifts in calibration. Finally, we compare our results with trends derived from an independent merged record based on the same SBUV instrument data, known as the NOAA Cohesive data set (Wild and Long, 2017). The NOAA Cohesive data set represents a different but reasonable approach to merging the same raw data, and as such, our estimates of merging uncertainty should encompass differences in the derived trends between the data sets.

Data
The SBUV instrument series and most recent Version 8.6 data processing have been described in detail in a series of publications (DeLand et al., 2012; McPeters et al., 2013; Bhartia et al., 2013). The data have been assessed and compared to independent measurements by Labow et al. (2013) and Kramarova et al. (2013b). Here we focus on the most recent updates, including data from NOAA 19 that were not included in the aforementioned studies. Figure 1 shows an update of the SBUV instrument orbit drift history. Here we plot the local time at which each satellite crosses the equator as a function of time. SBUV instruments ideally operate in late-morning to early-afternoon sun-synchronous orbits such that measurements are made at small solar zenith angles and at the same local time each orbit. NOAA satellites are launched into orbits that slowly drift toward the terminator, and several satellites have drifted through the terminator, thus making both afternoon and morning measurements (see Fig. 1). The afternoon and morning orbital segments are denoted by the suffix "pm" and "am" in the following discussion (for example, NOAA 16_pm and NOAA 16_am). The first NOAA instruments (NOAAs 9, 11 and 14) underwent more pronounced orbital drift, and the associated data quality during portions of these records is notably reduced (DeLand et al., 2012; Kramarova et al., 2013b; McPeters et al., 2013).
We follow the same data selection criteria as used in Frith et al. (2014) based on prior data quality assessments. That is, we only use measurements made when the satellite equator-crossing time (ECT) is between 08:00 and 16:00 to avoid issues with drifting orbits, with the exception of NOAA 11_pm to avoid a data gap. We do not include NOAA 9 data due to quality issues. Otherwise we retain the Tier 1 and Tier 2 instrument quality designations, with the aforementioned instruments in drifting orbits generally assigned as Tier 2. Tier 2 instruments are of lesser quality than Tier 1 but are still considered useful within the record and include NOAA 11_am, NOAA 14_pm, NOAA 14_am and NOAA 16_am. While we include Tier 2 data when creating the long-term MOD data record, we account for the varying data quality in our uncertainty estimates. Additionally for the profile data set, measurements are removed for a year after the El Chichón volcanic eruption and for 18 months after the eruption of Mt. Pinatubo to avoid periods when volcanic aerosols likely interfered with the algorithm (Bhartia et al., 2013). We identified these periods using internal algorithm parameters and external data comparisons but caution that small volcanic effects may persist beyond the period of missing data. No NOAA 9 profile data are used in the profile MOD (limited NOAA 9 data were used in the total ozone MOD to fill data gaps). Data from the next generation Ozone Mapping and Profiler Suite (OMPS) nadir profiler (NP) instrument, launched in October 2011 on the Suomi-NPP satellite, and subsequent planned missions on the JPSS (Joint Polar Satellite System) series will continue the nadir profiler measurements for decades to come. The Suomi-NPP is in a stable early afternoon orbit, while the NOAA 19 satellite has begun to drift towards the terminator. The current merged data set does not include OMPS NP measurements, but we anticipate that these data will be added soon, as the orbit for NOAA 19 approaches the 16:00 ECT cutoff and the OMPS data are reprocessed with an improved long-term calibration (Version 2).
Kramarova et al. (2013b) presented a complete validation of the individual SBUV instrument measurements relative to satellite and ground-based independent data sources. Figure 2 shows updated mean bias comparisons between Aura Microwave Limb Sounder (MLS; daytime measurements only) and SBUV NOAAs 16, 17, 18 and 19 averaged in three broad latitude bands. The MLS profile data are integrated to get column ozone in SBUV pressure layers and then smoothed using the SBUV averaging kernel to match the SBUV vertical resolution (e.g., Kramarova et al., 2013a). Here we use V4.2 MLS data and include NOAA 19 SBUV/2 data (through February 2017) as well as updated revisions of Version 8.6 for NOAAs 16, 17 and 18 SBUV/2 data, but the conclusions remain the same as in Kramarova et al. (2013b). Relative to Aura MLS, SBUV biases are largely within 5 %. The SBUV instruments in afternoon orbits (NOAAs 16_pm, 18 and 19) show a distinct pattern with SBUV lower than MLS in the upper stratosphere, higher than MLS in the middle stratosphere and then lower than MLS in the lower stratosphere. The two SBUV instruments in morning orbits (NOAA 16_am and NOAA 17) show a qualitatively similar but less pronounced pattern with respect to MLS. Using MLS as a transfer standard, the morning orbit SBUV measurements are smaller in the middle stratosphere and greater in the upper stratosphere than the afternoon orbit measurements. This pattern is generally consistent with diurnal variations in ozone observed from ground-based microwave measurements at Mauna Loa (Parrish et al., 2014).
Figure 3 shows the drifts of the SBUV instruments relative to Aura MLS over the time period when both instruments are making measurements, computed after removing the respective seasonal cycles from each data record. The drifts are generally within 5 % decade⁻¹ (hereafter % dec⁻¹). The largest drifts (> 5 % dec⁻¹ in some cases) are those for NOAA 16 in both the morning and afternoon orbits. Both of these records overlap MLS by only 30-35 months. The longer overlaps for NOAAs 17, 18 and 19 (80+ months) show significantly less drift with respect to Aura MLS. The 2σ uncertainty bounds in Fig. 3 are scaled to approximate the effects of autocorrelation and show that in many cases the larger drifts are not significant. However, significant drifts remain in the southern midlatitudes and upper levels in the tropics. Though these drifts are real, larger variations at the beginning or end of the overlap have an exaggerated effect on the drift amplitude computed over shorter time periods. Kramarova et al. (2013b;
The records from the various satellite SBUV instruments present a consistent picture of ozone variation over time. In the 2.5 to 1.6 hPa layer we see a rapid decline of about 15 % between 1980 and about 2000 with a leveling off after 2000 and perhaps a slight increase. In the 6.4 to 4 hPa layer we see a smaller decrease before 2000 with a subsequent increase back to the 1979-1981 mean level. In the lower 16 to 10 hPa layer we see a similar small decrease and subsequent increase, with a strong deviation in the seasonal cycle compared to the reference period of 1979 to 1981. This deviation in seasonal cycle was present in the higher layers, but it was not as pronounced. These seasonal differences may in part be a real response to long-term chlorine changes (e.g., Stolarski et al., 2012, and references therein), though differences in the amplitude of the seasonal variations among instruments over the same time period also suggest that instrument issues account for some of the seasonality.
Despite the overall coherence among the individual instrument records, offsets and drifts between the instruments are apparent. The largest differences occur in the mid- to late 1990s and again in the mid-2000s, associated with the Tier 2 NOAA 11_am, NOAA 14_pm and NOAA 14_am data. Smaller differences can be seen among the Tier 1 instruments as well.

Analysis
In this section we focus on the quantitative analysis of the SBUV time series with an emphasis on how the uncertainties in the measurements feed into the overall uncertainty in the merged ozone record. We use the same approach described by Frith et al. (2014), in which uncertainties have been constructed for the merged total ozone data record. Profile ozone differences among the individual SBUV instrument measurements are larger than was the case for total ozone, but the uncertainty issues are quite similar. That is, there are two major sources of uncertainty in combining the measurements from multiple satellite instruments. The first source of uncertainty is related to absolute calibration offsets between instruments, while the second stems from possible calibration drift over the lifetime of the instruments. We attempt to quantify these uncertainties and then model their time dependence using a Monte Carlo approach to estimate their impact on the long-term variability in the ozone profile measurements.

Monte Carlo uncertainty model parameters
Figures 2, 3 and 4 show ozone profile measurements from individual SBUV instruments that are generally similar during periods of overlap but include a range of inter-instrument offsets and drifts. The individual instrument data sets were produced by the SBUV processing team after considering all of the known issues with respect to the calibration of each instrument (DeLand et al., 2012). Despite the best efforts of this team to obtain the most accurate possible calibration of each instrument, differences remain that depend on latitude and altitude. To construct the MOD data set, we average the individual data records during periods of overlap of two or more instruments within the 08:00 to 16:00 ECT boundary.
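The overlap averaging described above can be sketched as follows. This is a minimal illustration only, assuming each instrument record is a mapping from a month index to a layer ozone value, with months outside the accepted ECT window simply absent. The function name `merge_overlap`, the instrument labels and the sample values are hypothetical and are not taken from the MOD processing code.

```python
from statistics import fmean

def merge_overlap(records):
    """Average all instruments that report a value in a given month.

    records: dict mapping instrument name -> {month_index: ozone_value}
    Returns {month_index: mean over the instruments present that month}.
    """
    months = sorted({m for series in records.values() for m in series})
    merged = {}
    for m in months:
        values = [series[m] for series in records.values() if m in series]
        merged[m] = fmean(values)
    return merged

# Hypothetical layer ozone values for two overlapping instruments.
records = {
    "inst_a": {0: 10.0, 1: 10.2, 2: 10.4},
    "inst_b": {1: 10.6, 2: 10.0, 3: 10.1},
}
merged = merge_overlap(records)
```

Months covered by a single instrument pass through unchanged, while months with two or more instruments receive the simple mean.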
One could argue that we could simply adjust offset differences at each latitude and altitude so that we had one continuous data set with little or no relative offset uncertainty. However, the existing data from each instrument had what the processing team deemed the best possible calibration. The remaining differences in ozone measurement during overlap periods are due to factors that are not understood at this time. This means that it is not clear how to adjust data to remove these offsets. An offset between two instruments during their period of overlap could result from the calibrations being different, but it could also result from a drift over time of one or both of the instruments from its initial calibration. The issues of offset and drift are thus inextricably linked. Therefore, instead of arbitrarily adjusting differences between two instruments during their period of overlap, we have chosen to consider the offsets and drifts as part of the uncertainty in "stitching" together the results from the individual satellite instruments to form our merged data set. In the absence of a standardized reference calibration source, we are in essence using the collection of SBUV inter-instrument offsets and drifts to define an SBUV system uncertainty, in an effort to account for both relative and absolute uncertainties.
To estimate offset and drift uncertainty for the SBUV profile data records, we use the same approach as described in Frith et al. (2014). We compute the mean bias and drift for the overlap pairs in each 5° latitude bin and use the collection of these values to determine the distribution of offsets and drifts for Tier 1 and Tier 2 instruments (see Frith et al., 2014, Figs. 5 and 6 for a detailed explanation of Tier 1 and Tier 2 instruments). We scale the offsets and drifts by the square root of 2 to distribute the relative offset/drift between instrument pairs.
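One way to read this scaling is sketched below. It rests on assumptions made here for illustration: that the two instruments in a pair carry independent errors δ_A and δ_B with a common standard deviation σ, and that the scaling is a division by √2. The symbols are introduced for this sketch only.

```latex
\operatorname{Var}(\delta_A - \delta_B) = \sigma^2 + \sigma^2 = 2\sigma^2
\quad\Longrightarrow\quad
\sigma = \frac{\sigma_{A-B}}{\sqrt{2}}
```

Here σ_{A-B} denotes the spread of the observed pairwise difference, so the per-instrument value is smaller than the pairwise value by a factor of √2 under these assumptions.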
Figure 5 shows the distribution of the absolute value of the biases for each instrument pair and latitude zone and the root mean square of the distribution to be used in the Monte Carlo simulations. As before, we compute the inverse error-weighted root mean square of the individual biases. The inverse weighting allows us to account for the length of overlap and autocorrelation in the differences for each instrument pair. That is, biases and drifts computed from longer overlap periods have greater weighting, while comparatively high autocorrelation reduces the weight. However, we do not account for correlation in latitude, treating each band as an independent measure of the bias. In addition, with the larger differences in the profile data, we found that the resulting distributions were more skewed than in total ozone. Relevant weighting and scaling is applied to the biases in the plotted distribution. Figure 6 shows the distribution of drifts between overlapping measurements. To avoid aliasing of seasonal differences into our drift computation, we first remove any seasonal signal in the differences. We thus require at least 24 months of overlap to account for seasonal variability and compute drift, leaving only two Tier 2 instrument pairs with sufficient overlap: NOAA 11_am-NOAA 14_pm and NOAA 16_am-NOAA 19 (in our total ozone analysis only one Tier 2 instrument pair was used). In this case, the distribution of drifts is sufficiently different for the two pairs for us to treat them individually. That is, the NOAA 16_am drift is assigned based on the NOAA 16_am-NOAA 19 overlap (solid blue line in Fig. 6), while the NOAAs 11_am, 14_pm and 14_am drifts are based on the relative drift between NOAA 11_am and NOAA 14_pm (dashed blue line in Fig. 6). Again, relevant weighting and scaling is applied to the drifts in the plotted distribution.
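An inverse error-weighted root mean square can be sketched as below. The exact weights used for the MOD analysis are not given in this section, so the choice of weights equal to the inverse squared error of each bias estimate is an assumption made for illustration, and the function name `weighted_rms` is hypothetical.

```python
import math

def weighted_rms(biases, errors):
    """Root mean square of biases, weighted by 1 / error**2 (assumed form).

    biases: bias estimates, one per instrument pair and latitude band
    errors: matching 1-sigma errors of those estimates; a short or highly
            autocorrelated overlap would carry a larger error and so
            a smaller weight
    """
    weights = [1.0 / (e * e) for e in errors]
    weighted_sum = sum(w * b * b for w, b in zip(weights, biases))
    return math.sqrt(weighted_sum / sum(weights))
```

With equal errors this reduces to the ordinary root mean square, and a bias estimate with a very large error contributes almost nothing to the result.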
The root mean square of the bias is less than 2 % for the Tier 1 instruments and less than 3 % for Tier 2. In general, high-quality satellite-based profile ozone observations agree to within ∼ 5 % (Fig. 2; Kramarova et al., 2013b; Tummon et al., 2015; Hubert et al., 2016). Similarly, Kramarova et al. (2013b) found the drift of the higher-quality SBUV records (our Tier 1) to be within 0.5 % yr⁻¹ when compared to independent measurements, and we see similar size drifts between the individual Tier 1 SBUV measurements, with slightly larger drifts indicated for the NOAA 16_am data at most layers. Hubert et al. (2016) analyzed data from 14 limb and occultation sounders relative to ground-based reference data sets and also found that most instruments were stable to within 0.5 % yr⁻¹ against the ground reference in the middle and upper stratosphere, though the satellite-to-satellite drifts might be larger. Overall, with the exception of the large drift between NOAA 11_am and NOAA 14_pm, offset and drift between individual SBUV instruments are comparable to those found in other satellite-based profile ozone measurements.

Monte Carlo model structure
The offset and drift parameters derived in the previous section form the basis for our Monte Carlo simulation of how these uncertainties lead to uncertainties in trend determination. We start under the assumption that the data from each satellite have a calibration that is unbiased with respect to the other satellites and, to the best of our knowledge, the calibration does not drift in time. We are thus assuming that any uncertainty could go in either direction. For each instrument used in the MOD, we then randomly prescribe offset and drift uncertainty from assumed Gaussian distributions with 1σ widths equal to the root mean square values shown in Figs. 5 and 6. Simulated Tier 1 instrument uncertainties are drawn from the Tier 1 distributions and Tier 2 uncertainties from the Tier 2 distributions, thereby explicitly representing the varying uncertainty in the individual records.
We follow closely, though not exactly, the approach illustrated in Fig. 7 of Frith et al. (2014). The difference here is that we treat the drift and offset separately, rather than applying a drift and offset simultaneously, as was done in Frith et al. (2014), Fig. 7a.
Specifically, we went through the following consecutive steps:
- Step 1: apply a drift uncertainty to each instrument. This step results in traces similar to Fig. 7a of Frith et al. (2014), but each trace starts on the zero line.
- Step 2: inter-calibrate the drifting records from individual instruments using two reference instruments, NOAA 11_pm and NOAA 17. In this step we repeat the instrument calibration process used in the algorithm (DeLand et al., 2012; Frith et al., 2014, Fig. 7b). Through this adjustment we induce the time dependence we expect from the internal calibration process but remove the instrument-to-instrument offset that is solely due to drift in one or more instruments, because this offset is also a component of the offset distribution and we want to avoid double-counting uncertainties.
- Step 3: add a bias uncertainty to each instrument. By adding the bias uncertainty after the calibration (Step 2) we avoid removing the offset through calibration and thus more realistically reflect the offsets that exist after the internal calibration process.
- Step 4: average time series from individual instruments into a single simulated time series.
The steps described above are repeated 10 000 times to form the distribution of potential error due to the merging process. We note that in our total ozone analysis, we used Dobson ground-based data to estimate the precision of the calibration between the beginning and the end of the record (Frith et al., 2014, Fig. 7d), but we eliminate this step in the profile analysis because we do not have a comparable correlative profile ozone record that we believe is stable over the full time period.
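The four steps and their repetition can be sketched in simplified form as follows. This is a toy illustration under stated assumptions, not the MOD implementation: it uses three hypothetical instruments and a single reference rather than the actual SBUV record with its two references, the 1σ widths are placeholders rather than the values from Figs. 5 and 6, and the inter-calibration is reduced to removing the mean difference from the reference over the common months. All names (`INSTRUMENTS`, `simulate_once`, `merging_spread`) are introduced here.

```python
import random
from statistics import fmean, pstdev

# Hypothetical toy record: (name, first month, last month, tier).
INSTRUMENTS = [
    ("ref", 24, 84, 1),
    ("early", 0, 48, 2),
    ("late", 60, 120, 1),
]
REFERENCE = "ref"
SIGMA_BIAS = {1: 1.0, 2: 2.0}      # placeholder 1-sigma offsets, percent
SIGMA_DRIFT = {1: 0.02, 2: 0.05}   # placeholder 1-sigma drifts, percent/month
N_MONTHS = 121

def simulate_once(rng):
    """Return one simulated merged error series (percent) per month."""
    # Step 1: each instrument drifts linearly, starting on the zero line.
    errors = {}
    for name, start, end, tier in INSTRUMENTS:
        rate = rng.gauss(0.0, SIGMA_DRIFT[tier])
        errors[name] = {m: rate * (m - start) for m in range(start, end + 1)}
    # Step 2: shift each record to match the reference over common months
    # (a simplified stand-in for the internal calibration process).
    ref = errors[REFERENCE]
    for name, series in errors.items():
        if name == REFERENCE:
            continue
        shift = fmean(series[m] - ref[m] for m in series if m in ref)
        for m in series:
            series[m] -= shift
    # Step 3: add a constant bias to every instrument after calibration.
    for name, _start, _end, tier in INSTRUMENTS:
        bias = rng.gauss(0.0, SIGMA_BIAS[tier])
        for m in errors[name]:
            errors[name][m] += bias
    # Step 4: average whichever instruments are present each month.
    return [
        fmean(s[m] for s in errors.values() if m in s)
        for m in range(N_MONTHS)
    ]

def merging_spread(n_sim=500, seed=0):
    """Standard deviation of the simulated merged error in each month."""
    rng = random.Random(seed)
    sims = [simulate_once(rng) for _ in range(n_sim)]
    return [pstdev(month_values) for month_values in zip(*sims)]

spread = merging_spread()
```

In this toy setup the spread is larger at the ends of the record than in the months covered by the reference, because drift accumulates with distance from the calibration overlap.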
Figure 7 shows two examples of Monte Carlo simulations based on the same bias and drift parameters described above at 10-16 hPa. Panel (a) represents a sequential merging process, with each new instrument adjusted to match the previous instrument. In this case the errors accumulate as each new record is added. Panel (b) shows the shape of the simulations from the Monte Carlo model used to represent the SBUV MOD record outlined above. Here the Monte Carlo model is structured to reflect the timing of the V8.6 internal calibration, which is based on two reference calibration periods: NOAA 11_pm in the early 1990s and NOAA 17 in the mid-2000s (DeLand et al., 2012). Nimbus 7 is calibrated to NOAA 11_pm, while all later instruments are calibrated using NOAA 17 as the reference. This means that the uncertainties in the SBUV records do not accumulate sequentially from one instrument to the next, but grow forward and backward in time away from the two reference data sets. Note that if the data set were constructed by referencing the records to an early single-base calibration, such as Nimbus 7 SBUV, the modeled uncertainties would then follow the first example in Fig. 7a. In this case the data at the end of the record would have a relatively large uncertainty with respect to the data at the beginning of the record because the calibration would have been transferred through the Tier 2 instruments (NOAAs 11 and 14) in the middle of the record.
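The accumulation in the sequential case can be made explicit with a simple idealization. Assuming, for illustration only, that each instrument-to-instrument transfer adds an independent error with the same standard deviation σ_t (a symbol introduced here), the calibration error after k sequential transfers grows like a random walk:

```latex
\sigma_k^2 = \sum_{i=1}^{k} \sigma_t^2 = k\,\sigma_t^2
\quad\Longrightarrow\quad
\sigma_k = \sqrt{k}\,\sigma_t
```

Anchoring the record to reference periods inside the time series shortens the longest chain of transfers, which is consistent with the qualitative difference between the two panels.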

Multiple linear regression model
Having established a merging uncertainty distribution as a function of time, we now must convert this uncertainty to an uncertainty in derived trends from the merged data set. To analyze long-term variability we use a standard multiple linear regression model including terms for the seasonal cycle, quasi-biennial oscillation, 11-year solar cycle, volcanic aerosols from the eruptions of El Chichón and Mt. Pinatubo, El Niño-Southern Oscillation, and either a fit to equivalent effective stratospheric chlorine (EESC) using the full record or linear fits to segments of the data after long-term solar and volcanic variations have been removed. The regression model and proxy data sources are described in detail in Frith et al. (2014), and data sources are given in the "Data availability" section of this paper.
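The structure of such a regression can be sketched with ordinary least squares on a small set of explanatory columns. This is a minimal sketch, not the model of Frith et al. (2014): it keeps only a constant, a linear trend, an annual harmonic and one slowly varying synthetic column standing in for a solar proxy, and it fits noise-free synthetic data. The helper names `solve` and `fit`, the column choices and the coefficient values are all assumptions made for illustration.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit(y, columns):
    """Least-squares coefficients from the normal equations."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(p * v for p, v in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)

n = 180                                  # 15 years of monthly values
columns = [
    [1.0] * n,                                           # mean level
    [i / 120.0 for i in range(n)],                       # time in decades
    [math.sin(2 * math.pi * i / 12) for i in range(n)],  # annual cycle
    [math.cos(2 * math.pi * i / 12) for i in range(n)],
    [math.sin(2 * math.pi * i / 132) for i in range(n)], # stand-in solar proxy
]
true_beta = [100.0, 2.0, 3.0, -1.0, 0.5]  # hypothetical coefficients
y = [sum(b * col[i] for b, col in zip(true_beta, columns)) for i in range(n)]
beta = fit(y, columns)                    # beta[1] is the trend per decade
```

Additional proxies (for example for the quasi-biennial oscillation or volcanic aerosol) would enter as further columns, and an EESC fit would replace the linear time column with the EESC time series.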
We first fit the original MOD time series to the regression model. The statistical uncertainty, defined as the uncertainty associated with the imperfect ability of the proxies to capture all variability in the data, is computed using a bootstrap analysis of the residual time series. We run 400 iterations, and correlation in the residual is accounted for through 1-year segment resampling (Efron, 1979). We then compute the merging uncertainty by similarly running each of the Monte Carlo uncertainty simulations (shown in Fig. 7b) through the regression model and calculating the standard deviation of the resulting regression coefficients. The total uncertainty is the combination of the statistical and merging uncertainty, computed as the root mean square of the individual uncertainties.
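A residual bootstrap with 1-year segment resampling can be sketched as follows. This is a simplified illustration under stated assumptions: the regression is reduced to a straight-line fit, the residuals are resampled in whole 12-month blocks drawn with replacement, and the input series is synthetic. The function names and the sample data are hypothetical; only the 400 iterations and the 1-year block length come from the text.

```python
import random
from statistics import fmean, pstdev

def slope(y):
    """Least-squares slope of y against its index (units per step)."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = fmean(y)
    num = sum((i - t_mean) * (v - y_mean) for i, v in enumerate(y))
    den = sum((i - t_mean) ** 2 for i in range(n))
    return num / den

def block_bootstrap_slope_sd(y, block=12, n_boot=400, seed=0):
    """Bootstrap standard deviation of the slope using 1-year blocks."""
    n = len(y)
    if n % block != 0:
        raise ValueError("series length must be a whole number of blocks")
    b = slope(y)
    a = fmean(y) - b * (n - 1) / 2.0
    fitted = [a + b * i for i in range(n)]
    resid = [v - f for v, f in zip(y, fitted)]
    blocks = [resid[i:i + block] for i in range(0, n, block)]
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_boot):
        # Draw whole years of residuals with replacement, keeping the
        # month-to-month correlation inside each year intact.
        resampled = []
        for _block in blocks:
            resampled.extend(rng.choice(blocks))
        slopes.append(slope([f + r for f, r in zip(fitted, resampled)]))
    return pstdev(slopes)

# Synthetic 15-year monthly series: small trend plus unit-variance noise.
noise = random.Random(1)
y = [0.01 * i + noise.gauss(0.0, 1.0) for i in range(180)]
sd = block_bootstrap_slope_sd(y)
```

Resampling whole years rather than individual months is what preserves short-term autocorrelation in the residuals, which would otherwise lead to an underestimated uncertainty.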
For this analysis, as in Frith et al. (2014), we use a simplified regression model fitting only the EESC and solar terms to the uncertainty simulations over the full time period or a simple linear trend when fitting to 1979-1994 or 2001-2015 time segments. However, we compared these results with fits to the full model and found negligible differences in the final uncertainty estimates. Table 1 gives the derived merging uncertainty as a function of layer for the EESC fit, converted to drift units based on the slope of the EESC curve from 2001 to 2015, and for the linear segment fit over the same period. The larger uncertainty associated with the linear trend parameter reflects the larger potential for merging uncertainties to alias into trends over the relatively short 2001-2015 period compared to lower potential aliasing onto the EESC functional form over the full time period. The linear trend parameter uncertainty is less than the 6 % dec⁻¹ (2σ) error used in the most conservative approach presented by Harris et al. (2015) but greater than the uncertainty derived by Steinbrecht et al. (2017) based on the spread of individual trends reported from a set of six merged ozone records.
Figure 8 shows the derived trend since 2001 estimated from EESC and from a linear fit as a function of latitude for the upper-stratospheric 1.6-1.0 hPa layer. The statistical uncertainty and total uncertainty are shown separately. The trend derived by either method is positive and nearly independent of latitude at a value of about 2 % dec⁻¹. Both are statistically significant at the 2σ level if uncertainty in the merging process is not included. After adding the merging process uncertainty, the trend obtained using the EESC fit to the entire data set still yields a statistically significant trend after 2001 at the 2σ level. However, the trend obtained by fitting a linear function to the data after 2001 is now not statistically significant at the 2σ level (although it is close).
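The effect of adding merging uncertainty on a 2σ significance test can be illustrated with a short sketch. Two assumptions are made here: the uncertainty components are taken to be independent and combined in quadrature, which is our reading of the combination rather than a statement of the exact formula used, and the numerical values in the usage note are hypothetical, chosen only to show the mechanism. The function name `is_significant` is introduced for this example.

```python
import math

def is_significant(trend, *sigmas, n_sigma=2.0):
    """True if |trend| exceeds n_sigma times the combined 1-sigma error.

    sigmas: one or more independent 1-sigma uncertainty components,
            combined in quadrature (an assumption of this sketch).
    """
    total = math.sqrt(sum(s * s for s in sigmas))
    return abs(trend) > n_sigma * total
```

For example, with a hypothetical trend of 2.0 % dec⁻¹ and a statistical 1σ of 0.8 % dec⁻¹, `is_significant(2.0, 0.8)` holds, whereas adding a hypothetical merging 1σ of 1.0 % dec⁻¹ raises the combined 2σ bound above the trend and `is_significant(2.0, 0.8, 1.0)` no longer holds.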
Figure 9 shows results at 10.1-6.4 hPa. The trend post-2001 for the EESC fit is small at all latitudes and not statistically significant except at the most southerly latitudes shown. When the merging uncertainty is added, the results are not significant at any latitude. For a linear fit to the data since 2001, somewhat larger trends are obtained that are significant at higher latitudes in both hemispheres if the merging uncertainty is excluded. However, when merging uncertainty is included, these trends are no longer statistically significant. Smaller trends at this pressure level are consistent with the vertical trend structure observed across different satellite and ground-based systems, reflecting smaller ozone losses relative to layers above and below (e.g., Randel et al., 1999; Harris et al., 2015).

Comparison with NOAA Cohesive data set
The NOAA Cohesive data set is an independently constructed merged ozone record based on the SBUV series of instruments (Wild and Long, 2017). NOAA Cohesive takes an alternative approach to account for the offsets between SBUV instruments. Only one instrument is included at any given time, and additional external offsets are applied to improve consistency over the record. This version of the NOAA Cohesive record is an update from that reported in the SPARC/IO3C/IGACO-O3/NDACC (SI2N) Past Changes in the Vertical Distribution of Ozone Initiative (Hassler et al., 2014; Harris et al., 2015). Comparisons done within SI2N indicated the potential for unphysical trends when a successive head-to-tail adjustment scheme was applied, as a result of the lower-quality NOAA 9 (NOAA Cohesive uses NOAA 9 rather than NOAA 14 in its construction) and NOAA 11 data (Tummon et al., 2015; Wild and Long, 2017; also Fig. 7a). NOAA Cohesive was revised to not include offsets to Nimbus 7 and NOAA 11 data in the early portion of the record. The long overlap periods from NOAA 16 to NOAA 19 are used to make adjustments in the latter period of the record using NOAA 18 as the reference instrument, but these adjustments are not linked back to the beginning of the record. The resulting data set no longer shows unrealistic trends (Steinbrecht et al., 2017; Wild and Long, 2017).
For our purposes, we can consider the NOAA Cohesive data set as another realistic rendition of the MOD with variations that are defined by the intra-instrument differences (via adjustments applied in NOAA Cohesive). Figure 10 shows the time series of the differences between the MOD data set and the NOAA Cohesive data set for two pressure layers. Differences, shown as blue dots, are plotted for all latitude bands to show the range of variations for each month. Also shown are the differences between MOD and the individual SBUV instrument monthly zonal means (black dots), which by definition are non-zero when more than one SBUV instrument contributes to MOD. The 2σ uncertainty limits defined from the variability of the Monte Carlo simulations are denoted by the red lines (Fig. 7). To the extent that the MOD and NOAA Cohesive data sets are two realizations of reasonably constructed time series from the SBUV data, we would expect these differences to fall within our estimated uncertainty bounds for most or all of the time series. The only exception is in the mid-1990s, but this is simply a matter of timing. NOAA Cohesive uses NOAA 9 data in 1994, while we continue with NOAA 11_pm until the NOAA 14 data start in early 1995. The uncertainty estimate reflects the timing of data used in the MOD record, but the magnitude of increased error with the introduction of a Tier 2 instrument is sufficient to cover the range of differences with NOAA 9 (also Tier 2). In the early portion of the record both merged data sets are based on Nimbus 7 SBUV. Minor random variations in this period can result from differences in the exact procedure used to compute the monthly zonal means, such as slightly different filtering criteria. The first significant variation comes in 1989, when MOD uses the average of Nimbus 7 and NOAA 11_pm, while the NOAA Cohesive data set switches to NOAA 11_pm. Starting in the mid-1990s, differences increase as more instruments come online and the potential for the data sets to diverge grows. Throughout the period both individual SBUV and NOAA Cohesive measurements are generally contained within the MOD 2σ variability.
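The containment check described above can be made quantitative by counting how many monthly differences fall inside the 2σ envelope. The following is a minimal sketch under stated assumptions: the function name `fraction_within_envelope`, the array layout (months by latitude bands), and the synthetic inputs are illustrative and are not taken from the MOD or NOAA Cohesive processing.

```python
import numpy as np

def fraction_within_envelope(differences, sigma, k=2.0):
    """Fraction of finite differences satisfying |diff| <= k * sigma.

    differences : array (n_months, n_lat_bands), e.g. MOD minus another
                  record in percent (hypothetical layout).
    sigma       : array (n_months,), 1-sigma spread of the simulations
                  for each month.
    """
    differences = np.asarray(differences, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    bound = k * sigma[:, np.newaxis]        # same bound for every latitude band
    valid = np.isfinite(differences)        # ignore missing months
    inside = np.abs(differences) <= bound   # NaN compares as False
    return np.count_nonzero(inside & valid) / np.count_nonzero(valid)

# Synthetic illustration only: Gaussian differences whose spread grows in time.
rng = np.random.default_rng(0)
sigma = np.linspace(0.5, 2.0, 120)
diffs = rng.normal(0.0, sigma[:, np.newaxis], size=(120, 20))
coverage = fraction_within_envelope(diffs, sigma)
print(f"fraction inside the 2-sigma envelope: {coverage:.3f}")
```

For differences that really are Gaussian with the stated spread, roughly 95 % of points should fall inside the envelope; a much lower fraction over some period would flag a departure like the mid-1990s timing difference discussed above.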
Figure 11 shows the trends calculated from the MOD and the NOAA Cohesive data sets for the latitude bands of 35-50° S and 35-50° N as a function of pressure. The trends were derived from both merged records using a linear fit to data after 2001, computed after long-term variations in solar cycle are removed based on an initial full fit to the data. The shaded areas show the statistical uncertainty from both fits, while the error bars show the combined merging and statistical uncertainty for the trend. Note that the merging uncertainty was calculated specifically for the MOD merging process, but for comparison purposes we assume the same merging uncertainty for NOAA Cohesive in the figure. The vertical structure of the derived trends is notably different between MOD and NOAA Cohesive, though in most cases the statistical errors do overlap, if just barely. However, when the MOD merging uncertainty is added, the combined errors encompass both results. The reasons for these trend differences are apparent in Fig. 10, where the MOD data drifted downward compared to NOAA Cohesive in the 4-2.5 hPa layer, leading to a smaller upward trend. The opposite is true for the 16-10 hPa layer, where MOD drifted upward compared to NOAA Cohesive, resulting in more positive trends. Although we are fitting to 15 years of post-2000 data, end effects can still lead to differences in the trend.
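As an illustration of how the merging uncertainty can change the significance of a post-2001 linear trend, the sketch below fits an ordinary least-squares slope, combines its statistical error with a merging term in quadrature, and applies a 2σ test. Several things here are assumptions rather than the method used in this work: the residuals are treated as uncorrelated (no autocorrelation scaling), the solar-cycle removal step is omitted, and the merging value of 0.15 % yr⁻¹ and the synthetic series are hypothetical.

```python
import numpy as np

def linear_trend(t_years, y):
    """OLS slope and its 1-sigma standard error (uncorrelated residuals assumed)."""
    t = np.asarray(t_years, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones_like(t), t - t.mean()])  # intercept + centered time
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    resid = y - A @ coef
    s2 = resid @ resid / (t.size - 2)                     # residual variance
    cov = s2 * np.linalg.inv(A.T @ A)
    return coef[1], np.sqrt(cov[1, 1])

def combined_sigma(sigma_stat, sigma_merge):
    """Quadrature sum of statistical and merging 1-sigma uncertainties."""
    return np.hypot(sigma_stat, sigma_merge)

def is_significant(trend, sigma, k=2.0):
    """True when the trend differs from zero by more than k sigma."""
    return abs(trend) > k * sigma

# Synthetic monthly anomalies (percent), 2001-2015, for illustration only.
rng = np.random.default_rng(1)
t = 2001.0 + np.arange(180) / 12.0
ozone_anom = 0.2 * (t - 2001.0) + rng.normal(0.0, 1.0, t.size)

trend, sigma_stat = linear_trend(t, ozone_anom)
sigma_total = combined_sigma(sigma_stat, sigma_merge=0.15)  # hypothetical merging term
print("significant (statistical only):", is_significant(trend, sigma_stat))
print("significant (with merging):    ", is_significant(trend, sigma_total))
```

Because the two terms add in quadrature, a merging term comparable to or larger than the statistical error dominates the total, which is the situation described for the 15-year record.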
On the other hand, we can examine the fits to EESC using the entire data set from 1979 through 2015. Figure 12 shows the trends derived for the time period 2001 to 2015 from both the MOD and NOAA Cohesive data sets using EESC to fit the entire data sets. We can see here that the trends derived from the two merged data sets are nearly identical when the entire data set is used to determine the fits. We also see that the trends in the top two layers are statistically significant including the merging uncertainties.
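The conversion from an EESC fit over the full record to an equivalent 2001 to 2015 trend can be sketched as follows: regress ozone on the EESC proxy, then multiply the fitted coefficient by the slope of the EESC curve over the recent window. This is a simplified sketch. The piecewise-linear EESC shape, its 1997 turnaround, the coefficient of -8, and the use of a straight-line fit to obtain the EESC slope are all hypothetical choices for illustration, and the other proxy terms of a full regression model are omitted.

```python
import numpy as np

def eesc_equivalent_trend(t_years, y, eesc, start=2001.0, end=2016.0):
    """Trend over [start, end) implied by a fit of y to the EESC proxy.

    Fits y = a + b * EESC over the whole record, then scales b by the
    linear slope of the EESC curve inside the window.
    """
    t = np.asarray(t_years, dtype=float)
    y = np.asarray(y, dtype=float)
    eesc = np.asarray(eesc, dtype=float)
    A = np.column_stack([np.ones_like(t), eesc])
    b = np.linalg.lstsq(A, y, rcond=None)[0][1]        # ozone change per unit EESC
    window = (t >= start) & (t < end)
    eesc_slope = np.polyfit(t[window] - start, eesc[window], 1)[0]
    return b * eesc_slope                              # ozone change per year

# Toy EESC curve (arbitrary units): rises to a 1997 peak, then declines slowly.
t = 1979.0 + np.arange(444) / 12.0                     # monthly, 1979 through 2015
eesc_toy = np.where(t < 1997.0,
                    (t - 1979.0) / 18.0,
                    1.0 - 0.3 * (t - 1997.0) / 18.0)
ozone_toy = -8.0 * eesc_toy                            # noise-free response, percent

trend_2001_2015 = eesc_equivalent_trend(t, ozone_toy, eesc_toy)
print(f"EESC-equivalent trend: {trend_2001_2015:.3f} % per year")
```

Since the coefficient is constrained by the whole 1979 to 2015 record, including the steep decline before 2000, the derived recent trend is far less sensitive to instrument offsets in the last 15 years than a direct linear fit over that window.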

Summary and conclusions
We have presented an analysis of the uncertainties in constructing a merged ozone data set for profile measurements of ozone from the SBUV instruments. The analysis is similar to our previous work on total ozone measurements from SBUV (Stolarski and Frith, 2006; Frith et al., 2014). However, the profile measurements are inherently noisier and the inter-calibration of instruments on different satellites is more uncertain. We find that the inclusion of uncertainty in constructing the instrumental record has a great impact on determining the statistical significance of trend results derived from time-series regression models. For instance, fitting a linear trend to the data since 2001 results in trends that are not statistically significant at the 2σ level at nearly all latitudes and altitudes when uncertainties related to the merging procedure are included. Despite the insignificance of the derived trends from a linear fit to data from 2001, a fit to EESC over the entire record is statistically significant at all latitudes. This significance comes mainly from the fitting to the downward trend prior to 2000. EESC is a modeled representation of the amount of chlorine and bromine in the atmosphere available to destroy stratospheric ozone at a given time and location, based on measurements of ozone-depleting substances in the troposphere, age of air, and the fractional release rates of chlorine and bromine from various chemical constituents (Newman et al., 2007). Our best current understanding, as represented in chemistry climate models, predicts that stratospheric ozone varies linearly with EESC (e.g., Newman et al., 2009). Therefore, when we fit a data record using the EESC as a regression parameter, we are testing the degree to which the data are following the model predictions. A fit to a linear trend, on the other hand, is a test of whether the data are increasing or decreasing as a result of any forcing not explicitly included in the regression model. Ideally a fit to a linear trend can be used to verify that ozone is recovering as expected from chemistry climate models (i.e., following the EESC functional form), or alternatively, to indicate that other factors, such as stratospheric cooling or other unexplained long-term variations, may also be affecting the data. However, the merging uncertainty on trends over the relatively short 15-year time period does not yet allow us to independently verify the ozone recovery rate predicted by the models. Nevertheless, the continued goodness of fit of the data to the EESC curve extended through 2015 provides strong evidence that chlorine is the major driver of the long-term changes in the ozone concentration.
Significant efforts are underway within the ozone trend community to properly characterize the errors associated with merged ozone records. Approaches based on comparing individual data sets directly (Hubert et al., 2016; Harris et al., 2015; this work) indicate larger uncertainties than are suggested based on the spread of the derived trends from multiple merged records (Steinbrecht et al., 2017). While our Monte Carlo approach worked well for total column ozone, the larger differences in the profile warrant investigation of a more complex means of distributing the uncertainties; the use of wide Gaussian distributions to represent the actual distributions shown in Figs. 5 and 6 likely leads to overly conservative error estimates. We know, for example, that the biases and drifts tend to be correlated from layer to layer and largely cancel in the total, putting a constraint on the potential offsets and drifts. We are also testing the sensitivity of the derived uncertainty to treating each instrument separately (as is done here for the two pairs of Tier 2 instruments) rather than combining multiple instrument pairs into a single distribution. Nevertheless, the differences with the NOAA Cohesive data set suggest the uncertainties modeled here are not overly exaggerated and that we still need more data to resolve these issues.
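The Monte Carlo idea referred to above can be illustrated with a deliberately simplified sketch: draw a Gaussian offset and drift for each instrument segment, build the resulting perturbation to the merged record, and take the standard deviation of the fitted trends across simulations. This is not the MOD procedure. The segment boundaries and sigma values are hypothetical, instruments are treated as back-to-back segments with no overlap averaging, and offsets and drifts are drawn independently, without the layer-to-layer correlation noted above.

```python
import numpy as np

def merging_trend_sigma(t_years, segments, n_sim=2000, seed=0):
    """1-sigma trend uncertainty from simulated instrument offsets and drifts.

    segments : list of (start, end, offset_sigma, drift_sigma); each segment
               covers start <= t < end (hypothetical instrument periods).
    Because the trend fit is linear, the trend error of a perturbed record
    equals the fitted slope of the perturbation itself.
    """
    rng = np.random.default_rng(seed)
    t = np.asarray(t_years, dtype=float)
    tc = t - t.mean()                                  # centered time for the OLS slope
    slopes = np.empty(n_sim)
    for i in range(n_sim):
        perturb = np.zeros_like(t)
        for start, end, off_sig, drift_sig in segments:
            mask = (t >= start) & (t < end)
            offset = rng.normal(0.0, off_sig)          # percent
            drift = rng.normal(0.0, drift_sig)         # percent per year
            perturb[mask] = offset + drift * (t[mask] - start)
        slopes[i] = (tc @ perturb) / (tc @ tc)         # OLS slope with intercept
    return slopes.std(ddof=1)

# Hypothetical three-instrument sequence covering 2001-2015 (monthly grid).
t = 2001.0 + np.arange(180) / 12.0
segments = [(2001.0, 2006.0, 1.0, 0.1),
            (2006.0, 2011.0, 1.0, 0.1),
            (2011.0, 2016.0, 2.0, 0.3)]
sigma_trend = merging_trend_sigma(t, segments)
print(f"2-sigma merging uncertainty on the trend: {2 * sigma_trend:.2f} % per year")
```

Replacing the independent Gaussian draws with empirical or correlated distributions is the kind of refinement that would narrow the spread if the wide Gaussians are indeed too conservative.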

Figure 2. Mean bias of SBUV instruments (N16-N19) relative to Aura MLS in percent computed over respective overlap periods for each instrument. Results shown are averaged in three broad latitude bands: 35-50° S, 20° S-20° N and 35-50° N. Shaded areas indicate 2σ deviations computed from the monthly means.

Figure 3. Drift of the NOAA SBUV instruments relative to Aura MLS over the time period of their overlap in % yr⁻¹ as a function of pressure level. Shaded areas indicate 2σ uncertainties scaled to account for autocorrelation.

Figure 4. Time series of ozone anomalies from individual SBUV records for three pressure levels. Anomalies are calculated from the 3-year 1979-1981 Nimbus 7 SBUV seasonal cycle. Data are averaged over the 35-50° N latitude band for (a) 2.5-1.6 hPa, (b) 6.4-4 hPa and (c) 16.1-10.1 hPa. The colors identify the individual instrument records, as indicated at the bottom of the figure.

Figure 5. Mean offset parameters as a function of pressure level derived from Tier 1 (Nimbus 7, NOAAs 11_pm, 16_pm, 17, 18 and 19) and Tier 2 (NOAAs 11_am, 14_pm, 14_am and 16_am) instrument overlaps used in Monte Carlo simulations to evaluate the uncertainty of potential offsets and drifts on the final merged ozone record. The parameters are the weighted root mean square of the collection of mean offsets computed in each 5° latitude bin and each overlapping instrument pair. The probability distribution for the Tier 1 instruments is shown by the red shaded histogram at each pressure level, while the probability distribution for the Tier 2 instruments is shown by the blue shaded histogram (shown upside down to separate it from the Tier 1 distribution).

Figure 6. Drift parameters as a function of pressure level derived from Tier 1 and Tier 2 instrument overlaps used in Monte Carlo simulations to evaluate the uncertainty of potential offsets and drifts on the final merged ozone record. For Tier 2 instruments, the NOAA 11_am to NOAA 14_pm relative drift is shown separately from that for NOAA 16_am to NOAA 19.

Figure 7. Two examples of Monte Carlo simulations used to model the propagation of uncertainty when merging data sets. Panel (a) shows a typical accumulation of uncertainty with each instrument added sequentially to the merged record. Panel (b) simulations use the same distribution of offset/drift values, but the timing reflects the cross-calibration done within the V8.6 algorithm relative to two baselines: NOAA 11 in the early 1990s and NOAA 17 in the mid-2000s. In both cases, periodic reductions in the error spread occur during periods when two or more instruments are averaged. In these examples only data included in MOD are considered.

Figure 8. Trend versus latitude for layer 8 (1.6-1.0 hPa) from 2001 to 2015 obtained by two methods. Red line and shading were obtained by fitting to EESC over the entire time period of 1979 to 2015 and converting to the slope of the EESC curve from 2001 to 2015. Solid vertical lines indicate 2σ uncertainty due to data variability. Red shading indicates uncertainty including the impact of merging uncertainty. Blue line shows the trend obtained by fitting the data from 2001 to 2015 by a linear trend. Solid vertical bars are the 2σ uncertainty due to data variability, while larger dashed vertical bars indicate the addition of merging uncertainty. The points on the far right side are the values obtained for the 50° S-50° N average.

Figure 10. Time series showing MOD/NOAA Cohesive data set differences at two pressure levels (4-2.5 hPa in panel a and 16-10 hPa in panel b). Black dots are MOD minus individual SBUV monthly mean differences, defined during times when more than one SBUV instrument contributes to MOD; blue dots are MOD minus NOAA Cohesive monthly mean differences. Both are generally contained within the 2σ variability. Differences in all latitude bands from 50° S to 50° N are included. The red lines indicate 2σ variability of 10 000 Monte Carlo simulations.

Table 1. 2σ uncertainty associated with EESC and linear trend proxy terms based on the standard deviation of fits to 10 000 Monte Carlo simulations. The total uncertainty is the root mean square of these values and the statistical uncertainty derived from the goodness of the regression model fit.