Past changes in the vertical distribution of ozone – Part 3: Analysis and interpretation of trends

Trends in the vertical distribution of ozone are reported and compared for a number of new and recently revised data sets. The amount of ozone-depleting compounds in the stratosphere (as measured by equivalent effective stratospheric chlorine, EESC) was maximised in the second half of the 1990s. We examine the periods before and after the peak to see if any change in trend is discernible in the ozone record that might be attributable to a change in the EESC trend, though no attribution is attempted. Prior to 1998, trends in the upper stratosphere (∼ 45 km, 4 hPa) are found to be −5 to −10 % per decade at mid-latitudes and closer to −5 % per decade in the tropics. No trends are found in the mid-stratosphere (28 km, 30 hPa). Negative trends are seen in the lower stratosphere at mid-latitudes in both hemispheres and in the deep tropics. However, it is hard to be categorical about the trends in the lower stratosphere for three reasons: (i) there are fewer measurements, (ii) the data quality is poorer, and (iii) the measurements in the 1990s are perturbed by aerosols from the Mt Pinatubo eruption in 1991. These findings are similar to those reported previously even though the measurements for the main satellite and ground-based records have been revised. There is no sign of a continued negative trend in the upper stratosphere since 1998: instead there is a hint of an average positive trend of ∼ 2 % per decade in mid-latitudes and ∼ 3 % per decade in the tropics. The significance of these upward trends is investigated using different assumptions of the independence of the trend estimates found from different data sets. The averaged upward trends are significant if the trends derived from various data sets are assumed to be independent (as in Pawson et al., 2014) but are generally not significant if the trends are not independent.

The successful implementation of the Montreal Protocol with its adjustments and amendments has led to reductions in stratospheric chlorine and bromine amounts since the late 1990s (WMO, 2011). These reductions are expected to result in less chemical depletion of ozone and so lead to increased stratospheric ozone amounts. To date, no signal of a positive response of ozone directly attributable to declining halogen levels has been unambiguously identified in the ozone record because it is hard to separate the effects of changes in chemistry and transport (e.g. WMO, 2011; Pawson et al., 2014; Hadjinicolaou et al., 2005; Mahieu et al., 2014). There are two main reasons for this: interannual variability in ozone amounts and changes in ozone due to the changing climate. A confounding factor has been the lack of a near-global, self-consistent set of vertically resolved ozone observations. Near-global measurements of the ozone profile using satellite instruments have been made continuously since 1979, a period in which the coverage achieved by ground-based instruments has increased enormously. However, since 1998, when the first SPARC/IO3C/GAW (Stratosphere-troposphere Processes And their Role in Climate/International Ozone Commission/Global Atmosphere Watch) report was published (Harris et al., 1998), no thorough assessment of the quality of all the measurements has been made in terms of their suitability for studies of long-term changes. Until 2005, when the SAGE (Stratospheric Aerosol and Gas Experiment) II and HALOE (Halogen Occultation Experiment) satellite instruments ceased operation, this did not present a major problem when assessing changes in the ozone profile as the SAGE and HALOE instruments provided internally consistent and quasi-independent records. Few studies extending the observational record after 2005 were published in time for consideration in WMO (2011), though several instrument records up to 2005 were compared thoroughly (Terao and Logan, 2007).
Updates were available for the ground-based instruments in the Network for the Detection of Atmospheric Composition Change (NDACC) (Steinbrecht et al., 2009) and for selected satellite instruments (Jones et al., 2009). This situation occurred despite the fact that many more instruments were making ozone profile measurements, including those on the Envisat and Aura satellites (Tegtmeier et al., 2013; Hassler et al., 2014). SPARC, the IO3C, the ozone focus area of the Integrated Global Atmospheric Chemistry Observations (IGACO-O3) and NDACC therefore supported the SPARC/IO3C/IGACO-O3/NDACC (SI2N) initiative with the aim of updating knowledge of changes in the vertical distribution of ozone. In addition, three synthesis papers are being prepared. The first, Hassler et al. (2014), summarises the wide range of available measurements of the ozone profile, highlights how they contribute to our general understanding of its long-term evolution, and provides a "bottom-up" view of the potential data quality and state of the retrieval algorithms. The second, Lambert et al. (2015), contains a thorough comparison of the various data sets and assesses the consistency between them, building on the work of Hubert et al. (2015). This third synthesis paper discusses the long-term changes calculated from the data sets and compares these changes with those found in other studies.
Section 2 (Methodology) summarises the sources of the data sets, relying heavily on Hassler et al. (2014) and the other SI2N papers, and highlights the important features to consider when calculating long-term changes. It also contains a description of the statistical approach used, including how the uncertainties in the trend estimates are calculated. Section 3 (Analysis) describes and discusses the results of the trend calculations performed in this study. Three periods are considered: (i) 1979-1990, a period prior to the eruption of Mt Pinatubo in which the stratosphere was relatively free of aerosols; (ii) 1979-1997, a period of increasing equivalent effective stratospheric chlorine (EESC); and (iii) 1998-2012, the early years of the ozone recovery from decreasing EESC. In addition to these main periods, a number of additional issues are investigated, including the dependence of the calculated trends on the end dates chosen and ways of averaging ozone trends from a number of data sets. In Sect. 4, we discuss and summarise the results. Overall, the aim is to determine the degree to which a trend can be detected given the uncertainties resulting from the various measurements, the statistical analysis, and the relatively short recent record since 1998. Best estimates of the trends and their uncertainties are given based on all the available evidence.
Atmos. Chem. Phys., 15, 9965-9982, 2015, www.atmos-chem-phys.net/15/9965/2015/

Data sets
The data sets used in this study are from the individual instruments described in Hassler et al. (2014) and the merged data sets described in Tummon et al. (2015). An assessment of the quality of the data sets based on measurement comparisons is given in Lambert et al. (2015) and references therein with additional information in Tegtmeier et al. (2013). Individual references are given below when referring to specific data sets. Unless otherwise stated, the measurements used in the analyses presented here were obtained through the URLs given in Table 1d in Hassler et al. (2014) and Table 7 in Tummon et al. (2015). No additional selection of published data is made based on quality assessment beyond that provided by the instrument teams. The measurements used here have different spatial resolutions. This study aims at providing a global overview of the observed changes and of the associated uncertainties. It does not look at the spatial structure in much detail; for example, issues associated with finer-scale structures in the vertical profile which are observable by some instruments are not addressed.

Measurements prior to 1997
Of the measurements prior to 1997, only four instrument records are suitable for trend analysis: the SAGE and SBUV (solar backscatter ultraviolet) satellite records and the ground-based ozonesonde and Umkehr measurements. Earlier versions of these data sets were analysed in the SPARC/IO3C/GAW report (Harris et al., 1998), and all have recently been improved. SAGE I made 33 months of ozone measurements from February 1979 to October 1981, a period in which stratospheric aerosol loading was exceptionally low.
SAGE II started making ozone measurements in October 1984, after a 3-year gap, when stratospheric aerosol loading was still relatively high following the eruption of El Chichón in early 1982. SAGE II version 7.0 is used here (Damadeo et al., 2014). The lack of an overlap period causes uncertainties in the relative calibration of the two sets of measurements; these are exacerbated by the uncertainty in the reference altitude for each SAGE I measurement. The ad hoc, latitude-dependent correction for this error (Wang et al., 1996) removes this systematic error source above an altitude of approximately 20 km. Ozone trends are reported here for the SAGE I/II time series. The SBUV instrument on the Nimbus 7 satellite made measurements from October 1978 to June 1990, with the SBUV/2 series of instruments starting in 1985 (McPeters et al., 2013). The SBUV instrument was subject to little orbital drift and its data are thought to be of higher quality than those of the SBUV/2 instruments that immediately followed it. Accordingly, we look at trends through June 1990 (Sect. 3.1.1). This approach additionally avoids any impact of the Mt Pinatubo eruption (June 1991) on either stratospheric ozone or on the quality of the ozone measurements. Section 3.1.2 shows results of trends computed over the full period of EESC increase, from 1979 to 1997.
The following procedure for data selection and averaging of ozone soundings was applied.
a. Integrated ozone columns are calculated for each 1 km layer from the ground to 30 km.
b. Data are withdrawn if there is a gap in measurements of more than 500 m in the ozone profile data.
c. Data are retained if at least 25 integrated layers can be calculated in the ozone profile.
d. To be included in the trend analysis, there must be at least two "successful" ozone profiles available during 1 month for the particular station under consideration.
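The selection steps (a) to (d) above can be sketched in code. This is only an illustrative reading of the criteria, not the procedure actually used: the input layout (per-level altitudes in metres with per-level partial ozone columns), the function names, and the choice to apply the 500 m gap test to the whole profile are all assumptions made for the example.

```python
from collections import defaultdict

MAX_GAP_M = 500.0   # criterion (b): largest tolerated gap between levels
MIN_LAYERS = 25     # criterion (c): minimum number of 1 km layers with data
MIN_PROFILES = 2    # criterion (d): minimum accepted profiles per month
TOP_KM = 30         # criterion (a): layers run from the ground to 30 km


def layer_columns(altitudes_m, partial_columns):
    """Sum per-level partial columns into 1 km layers (hypothetical input layout).

    Returns {layer_index: column}; layers with no data are simply absent.
    """
    layers = defaultdict(float)
    for z, col in zip(altitudes_m, partial_columns):
        k = int(z // 1000)
        if 0 <= k < TOP_KM:
            layers[k] += col
    return dict(layers)


def profile_ok(altitudes_m, partial_columns):
    """Apply criteria (b) and (c) to a single sounding."""
    zs = sorted(altitudes_m)
    if any(upper - lower > MAX_GAP_M for lower, upper in zip(zs, zs[1:])):
        return False
    return len(layer_columns(altitudes_m, partial_columns)) >= MIN_LAYERS


def months_for_trend(profiles):
    """Apply criterion (d).

    profiles: iterable of (year, month, altitudes_m, partial_columns).
    Returns the sorted (year, month) pairs with enough accepted profiles.
    """
    counts = defaultdict(int)
    for year, month, zs, cols in profiles:
        if profile_ok(zs, cols):
            counts[(year, month)] += 1
    return sorted(ym for ym, n in counts.items() if n >= MIN_PROFILES)
```

Keeping the thresholds as named constants makes it easy to test how sensitive the station sample is to each criterion.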
A total of 40 non-polar stations have been used in the analysis presented here. Once the preceding criteria have been applied, the stations are far from evenly distributed and, in particular, there are relatively few stations with data for the early part of the record. For example, only one record (Lauder, New Zealand) was used for the southern mid-latitudes in the analysis up to 1997.
High-quality up-to-date Umkehr measurements with records beginning prior to 1997 are available for several stations. Among these are four northern mid-latitude stations (Arosa, Switzerland, starting in 1956; Tateno, Japan, in 1957; Boulder, USA, in 1978; Haute Provence, France, in 1984), one tropical station (Mauna Loa, USA, in 1984), and one southern mid-latitude station (Lauder, New Zealand, in 1987). For our pre-1997 analysis, the records from the four northern mid-latitude stations are considered as a latitude band average. Of the other Umkehr records archived at the WOUDC, some are of poor quality or do not cover most of the 1979-2010 period. High levels of volcanic aerosol loading in the stratosphere can create errors in the retrieved profiles. Therefore, about 2 years of data are removed after the eruptions of El Chichón (1982) and Mt Pinatubo (1991). Regular calibration of the Dobson and Brewer instruments, which are used to make the Umkehr measurements, guarantees a stable time series. Occasionally, Dobson instruments are replaced, introducing potential steps in the record. In these cases, careful homogenisation is required with validation against other measurements (Miyagawa et al., 2009; Zanis et al., 2006). All the data sets mentioned above have undergone major revisions since Harris et al. (1998), and an updated comparison is timely.
As before, measurements at high vertical resolution are provided by ozonesondes at a few locations and by SAGE near-globally, with greater confidence in the SAGE I measurements at altitudes above 20 km. Measurements at lower vertical resolutions are provided by the Umkehr approach at a few locations and by SBUV near-globally.

Measurements since 1998
Section 3.2 describes our analyses of the period since 1998. Many more data sets exist for this period from both satellite and ground-based instruments (Fig. 2 in Hassler et al., 2014). The number and geographic coverage of ozonesonde and Umkehr instruments making measurements is much greater than in the earlier period (although there are unfortunately signs of this reversing). In addition, the coverage of other instrument types (microwave, lidar, FTIR) in NDACC has increased over this time and these measurements can be used for ozone trend studies as well as for comparisons with satellite measurements and verification of their long-term stability (Hubert et al., 2015; Nair et al., 2012, 2013; Steinbrecht et al., 2009; Vigouroux et al., 2015).
However, the most striking change in this period has been in the number of satellite instruments making measurements of the vertical distribution of ozone for the majority of this time and therefore useful for trend studies (Hassler et al., 2014). Once the recent data sets are mature, this should lead to a better overall knowledge of the changes in ozone over this period. However, significant uncertainties and differences still exist (Tegtmeier et al., 2013; Lambert et al., 2015), so care is needed to identify robust conclusions about the trends. These problems are likely less important for ground-based records where quality assurance and control procedures can be applied through the major networks (Nair et al., 2012).

Merged data sets: 1979-2012
To estimate trends for the whole period (1979-2012), data sets are required which combine measurements from two or more instruments. Issues that arise when merging data sets and which can contribute to uncertainty include individual instrument uncertainty (precision and offset), potential drift, changes in sampling and changes in measurement technique, all of which can vary with latitude, altitude, time of year, etc.
The SBUV series of instruments has produced a continuous record over the 1979-2012 period, albeit with uncertainties in trends resulting from inter-instrument differences.
Two merged data sets using the reprocessed SBUV v8.6 measurements (McPeters et al., 2013) are available: v8.6 SBUV MOD (Merged Ozone Data set; Frith et al., 2014) and SBUV merged cohesive. The two data sets differ somewhat in the combination of instruments used and the averaging carried out, with the SBUV merged cohesive data set using data from single instruments which are then bias-corrected to produce a continuous record. In contrast, the v8.6 SBUV MOD data set is constructed using the average of all available data for a particular period.
The two solar occultation instruments, SAGE II and HALOE, ceased operation in 2005, so trends calculated past that date require an extension of their record with measurements from other instruments. Several data sets exist that extend the SAGE II record past 2005: GOZCARDS (Global OZone Chemistry And Related trace gas Data records for the Stratosphere; Froidevaux et al., 2015), SWOOSH (Stratospheric Water and OzOne Satellite Homogenized; Davis et al., 2015), SAGE-GOMOS, and SAGE-OSIRIS (Bourassa et al., 2014). Each measurement system has different vertical and spatial resolutions, with some expressed as a function of pressure and others as a function of altitude. The gridded data sets were latitudinally weighted and averaged into three latitude bands: 35-60° S, 20° S-20° N, and 35-60° N. The original vertical gridding was kept for the altitude-based data sets (i.e. every 1 km spacing) and the pressure level data sets. When combining trend results from individual systems to estimate the overall trend, we convert to a common vertical coordinate and degrade to a common vertical scale (see Sect. 3.3 for details).
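The latitude-band averaging described above might look like the following sketch. The text does not state which latitudinal weighting was applied; weighting each grid latitude by the cosine of its latitude (proportional to the area of the band) is an assumption made here, as are the names `BANDS` and `band_mean`.

```python
import math

# The three latitude bands used in the text, as (south, north) limits in degrees.
BANDS = {
    "35-60S": (-60.0, -35.0),
    "20S-20N": (-20.0, 20.0),
    "35-60N": (35.0, 60.0),
}


def band_mean(lat_centres, values, south, north):
    """Cosine-latitude weighted mean of zonal-mean values inside one band.

    lat_centres: grid latitudes in degrees; values: ozone at one level and
    month, with None marking missing data. Returns None if nothing is valid.
    The cos(latitude) weight is an assumed choice, not taken from the source.
    """
    num = 0.0
    den = 0.0
    for lat, v in zip(lat_centres, values):
        if v is None or not (south <= lat <= north):
            continue
        w = math.cos(math.radians(lat))
        num += w * v
        den += w
    return num / den if den > 0 else None
```

For example, `band_mean(lats, ozone, *BANDS["35-60N"])` would return the northern mid-latitude average for one level and month.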

Statistical analysis
We apply a multiple linear regression model to all data sets using the updated version (Hassler et al., 2013) of the statistical model described in Bodeker et al. (1998) and used in Tummon et al. (2015). The core regression model includes an offset term (to describe the average annual ozone amount) and basis functions to describe ozone changes due to the seasonal variation, QBO (quasi-biennial oscillation), solar cycle, ENSO (El Niño-Southern Oscillation), and a proxy for the effect of the Mt Pinatubo aerosol on ozone. The orthogonal QBO basis function is a synthetically created time series orthogonal to the monthly mean 50 hPa Singapore zonal wind that allows for a time lag at different latitudes and altitudes.
where $O_3(t)$ is the ozone concentration for a particular month $t$ for a particular data set; A-H are the model coefficients corresponding to an offset term (to account for the mean annual cycle in ozone), linear trends, and basis functions used; and R(t) represents the residuals (difference between the measured and modelled ozone values). The subscript on each term A-H indicates how many Fourier pairs the term was expanded into to account for the seasonal dependence of the ozone anomalies on the basis functions (Bodeker et al., 1998); for example, $N_B = 2$ indicates two Fourier pairs (two sine, two cosine). The number of coefficients used was about 30. Two types of trend models were used. First, for the main set of analyses from 1979 to 1997 and 1998 to 2012, a piecewise linear trend (PWLT) model was used with separate linear trends calculated for each period and the two lines forced to meet at the end of 1997, the point of inflection. The trends in the two periods are therefore linked. Second, a single linear trend was used for the analyses of data from 1979 to 1990 (Sect. 3.1.1), for the SAGE I/II analysis (Sect. 3.1.2) and for some data sets that were only analysed for 1998-2012 (latter part of Sect. 3.2). A comparison of a PWLT model with a model including two unlinked linear trends found the differences to be generally small. This is also discussed further in Sect. 3.2.
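To make the PWLT idea concrete, the sketch below fits two linked linear trends by ordinary least squares. It is a deliberately reduced illustration, not the regression model used in this study: the seasonal Fourier expansion, the QBO, ENSO, solar and aerosol basis functions, and the autoregressive treatment of the residuals are all omitted, and the function names `solve` and `fit_pwlt` are hypothetical.

```python
def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_pwlt(times, ozone, t_inflect):
    """Least-squares piecewise linear trend with one inflection point.

    Model: y = a + b1 * (t - t0) + b2 * max(t - t0, 0).
    The hinge term is zero before t0, so the two line segments are forced
    to meet at t0. Returns (trend before t0, trend after t0) per unit time.
    """
    X = [[1.0, t - t_inflect, max(t - t_inflect, 0.0)] for t in times]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    Xty = [sum(row[i] * y for row, y in zip(X, ozone)) for i in range(3)]
    _, b1, b2 = solve(XtX, Xty)
    return b1, b1 + b2
```

Because the second slope is expressed as the first slope plus a change at the inflection point, the two trend estimates share information, which is the sense in which the trends in the two periods are linked.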
An autoregressive model is applied to the residuals R(t) following Eq. (2):

$$R(t) = \rho_1 R(t-1) + \rho_2 R(t-2) + e_t, \qquad (2)$$

where $\rho_1$ and $\rho_2$ are the model coefficients and $e_t$ represents the independent random errors with zero mean and variances that are allowed to change from month to month (see Reinsel et al., 1994). The basis functions used represent the QBO, specified as monthly mean 50 hPa Singapore zonal wind and a synthetic basis function orthogonal to this to allow for a time lag at different latitudes and altitudes; ENSO (El Niño-Southern Oscillation), using the monthly mean Southern Oscillation index as proxy; the solar cycle, based on monthly mean F10.7 solar flux data from NOAA's National Geophysical Data Center; and a proxy for ozone perturbations forced by aerosols from the Mt Pinatubo volcanic eruption based on a synthetic time series representing the approximate temporal evolution of stratospheric aerosol concentrations following the eruption (see Bodeker et al., 1998, for further details). Uncertainties in the trend estimates are thus based on the noise in the residual time series and are given as 2σ limits in this paper unless stated otherwise. Non-random effects are discussed later.
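As an illustration of a second-order autoregressive treatment of regression residuals, the sketch below estimates the two coefficients from the lag-1 and lag-2 autocorrelations (the Yule-Walker equations) and then removes the serial correlation. It assumes a constant error variance, whereas the model described above lets the variance change from month to month, and the estimation method and function names are choices made for this example rather than details taken from the source.

```python
def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return ck / c0


def fit_ar2(residuals):
    """Yule-Walker estimates of the two AR(2) coefficients."""
    r1 = autocorr(residuals, 1)
    r2 = autocorr(residuals, 2)
    coef1 = r1 * (1.0 - r2) / (1.0 - r1 ** 2)
    coef2 = (r2 - r1 ** 2) / (1.0 - r1 ** 2)
    return coef1, coef2


def whiten(residuals, coef1, coef2):
    """Remove the fitted AR(2) part, leaving an estimate of the random errors."""
    return [
        residuals[t] - coef1 * residuals[t - 1] - coef2 * residuals[t - 2]
        for t in range(2, len(residuals))
    ]
```

Accounting for autocorrelation matters for the trend uncertainties because positively correlated residuals contain fewer independent pieces of information than the number of months would suggest.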
The estimated magnitude of the Mt Pinatubo effect should be treated with caution as most of the data sets analysed here filter data that could be affected by additional aerosol in the stratosphere after the eruption. Most of the analysed data sets have gaps in their time series, so the calculated signal is thus a mix of the true signal and the impact of removing data in this period. While we argue that this gives greater confidence in the calculated trends, other analyses would be more appropriate for an examination of the effect of Mt Pinatubo on ozone. The same model was used in Tummon et al. (2015). In that work, where the main aim was comparing trends from different data sets, a minimum data coverage was required as part of the averaging protocol. Here, where the main aim is to provide the best estimate of trends from a given data set, all available data are used, so no minimum data coverage is imposed. The main difference is in areas where the data coverage is poorer, such as the tropics.

Analysis and interpretation
The aim of the analysis is to identify the changes that can be detected in the decadal trends on either side of the peak in EESC when halogen-catalysed ozone loss is expected to have maximised. Accordingly, we split the ozone record into two periods: before 1997, and from 1998 on. Within these two periods, the main issues in looking at decadal trends are the quality and particularly the stability of the available instruments' measurements, the way in which any merging is achieved, and the length of the record. In Sect. 3.3, we examine different methods of propagating uncertainties when combining the individual trends to produce best estimates. We do not use EESC as a proxy for ozone loss as that would mean the temporal evolution of the ozone change was predetermined (Kuttippurath et al., 2015).

1979-1990
Figure 1 shows the trends from 1979 to 1990 for (a) the SBUV instrument on the Nimbus 7 satellite and (b) the combined SAGE I/II records. The overall patterns as a function of latitude and altitude are in good agreement given the differences in vertical resolution and geographic coverage between the instruments. Decreases of 10-15 % per decade are seen in the upper stratosphere (∼ 5 hPa, 42 km) in both data sets. These are statistically significant at all latitudes and are large at high latitudes in each hemisphere. Larger trends are seen in the combined SAGE record. Statistically insignificant trends are seen in the tropical middle stratosphere. The trends again become larger at lower altitudes in the mid-latitude lower stratosphere. This is particularly true in the SAGE I/II measurements for which trends down to 20 km can be considered reliable and which additionally have higher vertical resolution. Figure 2 shows ozone trends from 1979 to 1990 as a function of altitude for three latitude bands: 35-60° S, 20° S-20° N, and 35-60° N. Four data sets are used: Nimbus 7 SBUV, SAGE I/II, Umkehr, and ozonesondes. Separate plots are shown for SBUV and Umkehr (pressure coordinate, coarse altitude resolution), and SAGE and ozonesondes (geometric altitude, fine altitude resolution). Approximate altitudes/pressures are shown on the right-hand axis to facilitate comparison. Only a few high-quality data sets exist for ozonesondes and Umkehr in this period. These geographically sparse records are unlikely to be representative of the latitude band, so care needs to be taken when comparing the ozonesonde and Umkehr zonal mean trends with those from the two satellite data sets. The SAGE I/II coverage is also limited in the tropics, especially at certain times of year.
Overall, there is good agreement in the trends derived from the two satellite records in the upper stratosphere given the 95 % confidence limits of 2-3 %. Looking more closely, the SAGE trends tend to be more negative than the SBUV trends in both northern and southern mid-latitudes. This is consistent with the observed stratospheric cooling which causes upper stratospheric ozone trends calculated using geometric altitude as a vertical coordinate to be ∼ 2 % per decade more negative than those calculated in pressure coordinates for 1979-2005 (McLinden et al., 2011; Pawson et al., 2014; see below). The trends derived from the Umkehr stations in northern mid-latitudes are in good agreement with the SBUV trends. Similarly, the ozonesonde trends agree within uncertainties with the SAGE trends in the region of overlap. The increased uncertainties at lower altitudes result from the larger ozone variability in the lowermost stratosphere. While these findings are in general agreement with those reported in Harris et al. (1998) with significant trends in the lower and upper stratosphere, we have more confidence in them and in the estimated errors associated with them as a result of revisions of the underlying measurements.
Trends were only brought forward into the summary of the SPARC/IO3C/GAW report (Harris et al., 1998) for the northern mid-latitudes due to uncertainties in the underlying measurements. Here we also report trends for the tropics and for the southern mid-latitudes. In the southern mid-latitudes, the trends calculated for SBUV and SAGE are similar to each other and, in general appearance, to those in the northern mid-latitudes. Closer examination shows that the lower stratospheric trends in the southern mid-latitudes are a little smaller than in the northern hemisphere, consistent with what has been observed in total column ozone (WMO, 2011). In the upper stratosphere, there is no significant difference in the trends in the two hemispheres. There are insufficient ozonesonde and Umkehr records in the southern mid-latitudes (and tropics) for comparison in this period.

1979-1997
The trends from Sect. 3.1.1 are now compared with those for 1979-1997; in the latter half of the period the stratospheric ozone was strongly influenced by the Mt Pinatubo eruption and a number of instruments started making measurements. In addition, the start of the period is when ozone depletion first became apparent, while 1997 was roughly when EESC reached the maximum in the stratosphere. Figure 3 shows the trends from 1979 to 1997 for (i) two versions of the combined SBUV instrument record based on different merging assumptions, (ii) the SAGE I and II records, and (iii) the GOZCARDS merged data set (consisting of SAGE I/II, UARS MLS and HALOE in this period; Froidevaux et al., 2015). Data sets starting in 1984, when the SAGE II record begins, are not included. Overall, the main features (negative trends in the upper stratosphere with larger trends at higher latitudes, and large negative trends at lower altitudes) are seen in the longer records (Fig. 3).
However, in contrast to the panels in Fig. 1, there are now noticeable differences between the three analyses shown. There is reasonable agreement between the v8.6 SBUV MOD and the SAGE I/II trends though the magnitudes of the trends are larger for SAGE I/II by up to 3 % per decade for most of the stratosphere. The differences between the two SBUV versions reflect the difficulties in tying together the records from instruments of the same type on different satellites (Tummon et al., 2015). The spatial distribution of the GOZCARDS trends is similar to that of v8.6 SBUV MOD, though the GOZCARDS trends are generally more negative. Figure 4 shows the ozone trends for the same satellite data sets as a function of altitude for three latitude bands: 35-60° S, 20° S-20° N, and 35-60° N. There are no major differences between these trends and those shown for 1979-1990 in Fig. 2. There is also good agreement between the trends from the different data sets. While Mt Pinatubo was clearly a major influence on ozone (Zerefos et al., 1994; Randel et al., 1995), extending the analysis through 1997 reduces its impact on the calculated decadal trends. The effect of excluding the proxy for the Mt Pinatubo eruption in the statistical analysis was tested and was found to be small when compared to the trends or their associated uncertainties.
As in Fig. 2, the trends calculated from available data sets for ozonesondes and Umkehr are shown in Fig. 4, though again care needs to be taken when comparing them with zonal mean trends from the two satellite data sets. The spread of about ±3 % in the trends within the group is a rough guide to the overall uncertainty in our knowledge of the ozone decrease. It is evident from Figs. 2 and 4 that the necessity to merge records from different instruments substantially adds to the uncertainties in the trends as a result of the choices of which instruments to include and how to make their records consistent (Tummon et al., 2015). The comparison of the uncertainties associated with the trends from the various ground-based records is not straightforward. Each record consists of measurements from a limited number of sites at any particular time. The number of sites and their geographic representativeness change through the period considered.
Nair et al. (2013) also reported positive upper stratospheric trends over northern mid-latitudes using GOZCARDS data and a combination of lidar and satellite data. However, this feature varies considerably between data sets (in magnitude, level of uncertainty, and latitudinal and altitudinal extent), which means that it is hard to be confident about its significance. It is, however, certainly a clear change from the negative trend seen prior to 1998.
There are some negative trends in the tropics at altitudes ∼ 30-35 km (∼ 15 hPa). This feature (see also Eckert et al., 2014, and Gebhardt et al., 2014) is seen in all data sets, although in most cases it is not statistically significant in individual data sets. It is also obvious that there are many differences in the trends calculated from the various data sets, e.g. in the shape of the trends in the upper stratosphere. Some of these result from different ways of merging the data, others from differences between instruments used in a merged data set (e.g. resolution, sampling). These issues are discussed further in Tummon et al. (2015). The large trends at the higher latitudes in the SAGE-GOMOS record are probably a result of sampling issues (Laine et al., 2014). Differences are more obvious in this period as more instruments are used, the length of the record is a little shorter, and the trend signal is smaller.
Figure 6 shows ozone trends for 1998-2012 as a function of altitude for the same three latitude bands (35-60° S, 20° S-20° N, and 35-60° N) as in Figs. 2 and 4. The top two rows show the results calculated using the same PWLT analysis as in Fig. 4. The lowest panel contains the trends for shorter time series which are calculated using a single linear trend. As a result, the ozonesondes show a positive trend at lower altitudes in the mid-latitudes in both hemispheres when calculated with the PWLT, but the northern mid-latitude trend becomes zero with the single linear trend model. In addition, trends are shown for the ground-based lidar, microwave, and FTIR (Fourier transform infrared spectroscopy) instrument latitude band averages. More ground-based and ozonesonde records are available for this period than for the period prior to 1997 as a result of the development of NDACC, but there are still not enough to consider them truly representative of the latitude band, especially in the tropics and the southern mid-latitudes. While the lack of a continued negative trend in the upper stratosphere is clear, there is again a hint of a positive trend when all the records are considered. The negative trend at ∼ 30 km in the tropics is less clear than in Fig. 5 where it is confined principally to the region between 10° S and 10° N.

The uncertainties calculated for the trends should include the uncertainties resulting from interannual variability, but this is inevitably less true for shorter records. We investigate the importance of this using GOZCARDS data, with the resulting ozone trends shown in Fig. 7. Before 1997 the values are relatively insensitive to changing this date, though there is some change for GOZCARDS in the southern mid-latitudes. The trends in the second period are more sensitive to the inflection date in the mid-latitudes of both hemispheres, consistent with its shorter length. For example, the magnitudes of the GOZCARDS trend at 1.5 hPa in the Southern Hemisphere are 5 % per decade for 1997-2012 and 9 % per decade for 2002-2012 (see also Gebhardt et al., 2014). A similar analysis for v8.6 SBUV MOD yields 0 % per decade and 4 % per decade, respectively.

Best estimates – combining the calculated trends
In principle, the trends calculated from the various data sets can be combined to form a best estimate of the actual trend along with the appropriate uncertainty estimates. In practice it is hard to do this because the degree of independence of the various data sets is not known. If the data sets were completely independent, then a weighted mean and the uncertainty in that mean could be straightforwardly calculated using the estimates of the trends and their associated uncertainties. This approach was taken, for example, in Harris et al. (1998). However, the data sets are not completely independent – notably, several use SAGE measurements in the early part of the record, and extensive comparisons of the different data sets have been made (e.g. Tegtmeier et al., 2013, and references therein) – so there is no rigorous way of performing the weighting or propagating the uncertainties. In addition, the trends are calculated using the same regression model and data with the same underlying atmospheric variability, so imperfect regression and outliers have similar effects in all trends. A similar conceptual problem is encountered when estimates of the lifetimes of several halocarbons are produced using a variety of methods relying to varying degrees on common underlying data sets (Appendix 2 of Chapter 6 in Ko et al., 2013). Ko et al. (2013) consider two contrasting cases to produce a best estimate. In the first case, the different estimates are assumed to be independent. The central estimate is taken as the weighted mean of the various estimates of the lifetime, and the uncertainty range is derived from the weighted mean of the variances from each method, i.e. the differences between the individual mean estimates are not a factor in the variance estimate. The resulting uncertainty should be considered an underestimate of the true uncertainty and is designated the "most likely range". 
This method is referred to as the sampling distribution of the weighted mean (the SWM-distribution approach). In the second case the different estimates are combined into a single distribution and the uncertainty range corresponds to the joint distribution of the individual variances around the arithmetic (unweighted) mean of the estimates, i.e. the differences in the mean trend estimates from the various satellite data sets are taken into account. The resulting range is designated the "possible range". This method is referred to as the joint distribution approach (the J-distribution approach). The ranges for halocarbon lifetimes produced by the two approaches are quite different. For example, the best estimate for the lifetime of CFC-11 was 52 years with a "most likely" range of 43-67 years (SWM distribution) and a "possible range" of 35-89 years (J distribution) (Ko et al., 2013).
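The contrast between the two approaches can be sketched as follows. The formulas are textbook forms chosen for illustration (inverse-variance weighting for the SWM case, an equal-weight mixture of the individual distributions for the J case) and may differ in detail from the definitions in Ko et al. (2013); the function names and the input values are hypothetical.

```python
import math

def swm_combine(trends, sigmas):
    """Inverse-variance weighted mean and its 1-sigma uncertainty.

    The spread between the individual trends does not enter the uncertainty.
    """
    weights = [1.0 / s**2 for s in sigmas]
    mean = sum(w * x for w, x in zip(weights, trends)) / sum(weights)
    return mean, math.sqrt(1.0 / sum(weights))

def joint_combine(trends, sigmas):
    """Arithmetic mean and 1-sigma width of an equal-weight mixture.

    The variance is the average individual variance plus the variance of
    the individual means about the arithmetic mean.
    """
    n = len(trends)
    mean = sum(trends) / n
    within = sum(s**2 for s in sigmas) / n
    between = sum((x - mean) ** 2 for x in trends) / n
    return mean, math.sqrt(within + between)

# Hypothetical trends (% per decade) and 1-sigma uncertainties.
trends = [1.0, 2.0, 3.0, 4.0]
sigmas = [1.0, 1.0, 1.0, 1.0]

print(swm_combine(trends, sigmas))    # (2.5, 0.5)
print(joint_combine(trends, sigmas))  # (2.5, 1.5)
```

In this example the central estimates coincide, but the joint width is three times the SWM width because the scatter between the individual estimates is counted as part of the uncertainty.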
In this study, we combine trends calculated from merged sets of observations of the same real quantity (O3) from different platforms, so the analogy with the halocarbon lifetime estimates is not exact. However, the similarities in the amount of rigorous knowledge of the uncertainties are such that we have adopted the same methodologies to combine the results of the time series modelling of the satellite data sets shown in Figs. 4 and 6. The resulting trends and their 95 % uncertainties are shown in Fig. 8. The SWM- and J-distribution approaches are represented by the dark blue and red lines, respectively. The underlying trend estimates from Figs. 4 and 6 are shown in the thin grey lines. There is very little difference in the estimated mean trends, but the uncertainties are substantially larger for the J distribution than for the SWM distribution. In other words, the possible range is noticeably larger than the most likely range. An additional source of uncertainty not captured in the trend analyses of the data sets is a systematic drift in the overall global ozone observing satellite system. The drift can be estimated by bottom-up consideration of the instruments and their long-term calibration systems (e.g. Harris et al., 1998) or by comparison with a well-characterised measurement set. Here we choose the latter approach as no bottom-up drift estimates exist for the merged data sets. The light blue line in Fig. 8 shows the effect of extending the SWM-distribution approach to include drift uncertainties based on comparisons of the ground-based lidar and ozonesonde observations with a number of satellite data sets (Hubert et al., 2015), similar to the approach of Nair et al. (2012). Here, a drift uncertainty of 3 % per decade (2σ) is included at all altitudes for the early trends (top row), while 4 % and 6 % per decade are included for the later trends in the middle or lower stratosphere and in the upper stratosphere, respectively. (Smaller estimates were tried as a sensitivity test and the overall uncertainty scaled accordingly.) 
The estimates, applied individually to each data set, are credible because the lidars and ozonesondes are well characterised and lidars in principle have a stable calibration (Werner et al., 1983; Godin et al., 1989; McDermid et al., 1990). The drift-adjusted SWM trends are by definition the same as the SWM trends. However, the uncertainty estimates are much larger as the drift uncertainty is the dominant term. Overall, the drift-adjusted SWM uncertainty estimates are comparable with those derived using the J-distribution approach, with some larger and some smaller. One explanation for this is that the effect of the drift uncertainties of the individual satellite data sets is included when using the J distribution since the assumption for propagation of uncertainties explicitly allows for the differences in the mean values of the underlying individual estimates.
Our best estimates of the ozone trends (Fig. 8) result from trends previously shown separately (in pressure or geometric altitude coordinates) in the earlier figures despite the problems associated with doing so discussed in Sect. 3.1.1. The conversion to common pressure coordinates for instruments whose natural measurement coordinate is altitude was made using MERRA (Modern-Era Retrospective Analysis for Research and Applications; Rienecker et al., 2011) temperature profiles for each altitude and latitude bin. Uncertainties arising from this procedure are likely mainly due to uncertainties in the MERRA reanalyses. They are not included in the estimation of the ozone trend uncertainties.
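For readers unfamiliar with this kind of conversion, the sketch below shows one common way to obtain pressure on an altitude grid from a temperature profile by integrating the hydrostatic equation. It is an illustration only: the exact procedure used with the MERRA profiles is not described here, the function name `pressure_on_altitude_grid` is hypothetical, the temperature profile is an assumed isothermal one, and the height dependence of gravity and the distinction between geometric and geopotential altitude are ignored.

```python
import numpy as np

R_DRY = 287.05  # J kg^-1 K^-1, specific gas constant of dry air
G0 = 9.80665    # m s^-2, standard gravity (taken as constant with height)

def pressure_on_altitude_grid(z_m, temp_k, p_bottom_hpa=1013.25):
    """Integrate dp/p = -g dz / (R T) upward from the lowest level z_m[0].

    z_m and temp_k are 1-D arrays on the same increasing altitude grid;
    p_bottom_hpa is the pressure at z_m[0].
    """
    z_m = np.asarray(z_m, dtype=float)
    temp_k = np.asarray(temp_k, dtype=float)
    integrand = G0 / (R_DRY * temp_k)
    # Trapezoidal rule, accumulated layer by layer.
    layers = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(z_m)
    exponent = np.concatenate([[0.0], np.cumsum(layers)])
    return p_bottom_hpa * np.exp(-exponent)

# Assumed isothermal 250 K profile on a 100 m grid up to 50 km.
z = np.linspace(0.0, 50000.0, 501)
temperature = np.full_like(z, 250.0)
pressure = pressure_on_altitude_grid(z, temperature)
print(f"pressure at 45 km: {pressure[450]:.2f} hPa")
```

Any error in the temperature profile feeds directly into the exponent, which is why uncertainties in the reanalysis temperatures are the natural place to look for uncertainties in the converted trends.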
Atmos. Chem. Phys., 15, 9965-9982 (Hubert et al., 2015). The thick red lines show the possible range for the ozone trends calculated assuming the J distribution. See text and Ko et al. (2013) for more details. The conversion to a common pressure scale of trends derived from instruments whose natural measurement coordinate is altitude was made using MERRA temperature profiles.
Looking at the different approaches together, the trends seen in the upper stratosphere before 1997 in all three latitude bands are negative and statistically significant. Small positive trends are seen in the period after 1998: they are significant when assuming the SWM distribution but not when assuming the J-distribution or drift-adjusted SWM distribution. The differences between the peak trends in the two periods are significant for all approaches. In the lower stratosphere, the differences in the trends are insignificant, with the trends in the later period being close to zero.

Discussion and summary
Trends are reported for a number of data sets for the periods before and after the peak in EESC in 1997. The findings for the period prior to 1997 are broadly similar to those reported elsewhere with decreases in the upper stratosphere at all latitudes and in the lower stratosphere over mid-latitudes. The values found here at 45 km for the combined SAGE I/II data set (1979-1997) are slightly larger than those found elsewhere for the SAGE II data set (Remsberg, 2014; Damadeo et al., 2014) and those for the merged data sets which rely primarily on SAGE II in this period (Bourassa et al., 2014; Tummon et al., 2015). There is reasonably good agreement in the lower stratosphere where the trends using just SAGE II measurements, i.e. from 1984, are smaller than the ones reported here starting in 1979. Considerable benefits would be gained if the SAGE I record could be revised to be consistent with the SAGE II record without having to use the altitude correction from Wang et al. (1996), as it would lead to better knowledge of the changes in ozone in the lower stratosphere in this early period.
Looking at the second half of the record, it is clear that the downward trend in upper stratospheric ozone has not continued and it is likely that there has been an increase since 1998. However, there is disagreement about both the size and the statistical significance of that increase. In particular, we show the sensitivity of the trends to changes in the end points of the analysed data sets and the problems in defining the uncertainties in those trends. Part of the problem is caused by the relatively small size of the increase (which is broadly consistent with model calculations, e.g. Fleming et al., 2011; Pawson et al., 2014). The other part is the relatively large uncertainty which results from the short periods under consideration and from uncertainties introduced when merging data sets.
The earlier discussion of uncertainties (Sect. 3.3) involved the propagation of the uncertainties associated with the trend estimates. It is also worth noting how uncertainties associated with individual data sets are propagated when they are merged to produce a longer time series. These issues are discussed further in the comparison of merged data sets in Tummon et al. (2015) as well as in the individual papers describing their production (e.g. Kyrölä et al., 2013; Bourassa et al., 2014; Frith et al., 2014). Conceptually, the linking of two time series to form a single one involves a difference and uncertainty associated with their absolute calibrations and a difference in the random noise characteristics of the two time series. Each time series additionally has a drift uncertainty. Each of these terms is likely to vary as a function of time and space. In addition, sampling errors can arise when the sampling frequency changes, especially when this is aliased with cycles in ozone, e.g. annually (Damadeo et al., 2014) and diurnally (Sakazaki et al., 2013; Parrish et al., 2014). All these factors serve to confuse and complicate the picture. As a general rule, however, it is clear that the fewer, more stable records that can be used, the better. This general rule will also apply in future: for ozone trend studies, a few good, long-lasting instruments are preferable to a larger number of shorter-lived ones, though it is hard to know in advance which these are. Frith et al. (2014) examined aspects of this issue for the SBUV MOD total ozone measurement series using a Monte Carlo simulation of potential offsets and drifts. The overall impact on the trend uncertainties therefore comprised a term related to individual instrument uncertainties (offsets and drifts) and a term related to the natural variability of the atmosphere. The former dominated at low latitudes where total ozone variability is low, while the latter dominated at high latitudes. 
Similar approaches (e.g. Fioletov et al., 2006) are worth pursuing for merged data sets of vertically resolved ozone, for analysing the uncertainties associated with the existing data sets and for learning lessons for future measurements.

Comparison with WMO (2014)
The results presented here can be compared to those in Pawson et al. (2014). Good agreement is found between the analyses prior to 1997. Overall, their approach is similar to the one used here, and many of the results are similar. However, there are some notable differences between the two analyses, most obviously in the statistical significance of the trends in the last 10-15 years. In particular, they conclude that a statistically significant ozone increase has occurred since 2000 in the upper stratosphere (35-45 km) in the northern mid-latitudes (peak value of +3.9 ± 1.3 % per decade), the tropics (+1.9 ± 1.2 % per decade) and southern mid-latitudes (+3.0 ± 1.2 % per decade). These compare to 1.4 ± 6, 2.7 ± 6, and 1.8 ± 6 % per decade, respectively, for the drift-adjusted SWM estimates. The central values are similar (given the uncertainties), but the associated uncertainties are clearly much smaller in Pawson et al. (2014). The difference could arise from differences in (a) the treatment and propagation of uncertainties, (b) the selection of data, and (c) the statistical approach. These are now considered in turn.
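The difference in significance follows directly from the quoted numbers, as the short check below illustrates. It assumes that all quoted uncertainties are 2σ values and that a trend counts as significant when its 2σ interval excludes zero; the helper `is_significant` and the dictionary layout are hypothetical.

```python
def is_significant(trend, two_sigma):
    """True when the 2-sigma interval around the trend excludes zero."""
    return abs(trend) > two_sigma

# (central value, quoted uncertainty) in % per decade, as listed in the text.
pawson = {
    "northern mid-latitudes": (3.9, 1.3),
    "tropics": (1.9, 1.2),
    "southern mid-latitudes": (3.0, 1.2),
}
drift_adjusted_swm = {
    "northern mid-latitudes": (1.4, 6.0),
    "tropics": (2.7, 6.0),
    "southern mid-latitudes": (1.8, 6.0),
}

for label, estimates in (("Pawson et al. (2014)", pawson),
                         ("drift-adjusted SWM", drift_adjusted_swm)):
    for band, (trend, unc) in estimates.items():
        verdict = "significant" if is_significant(trend, unc) else "not significant"
        print(f"{label}, {band}: {trend:+.1f} ± {unc:.1f} -> {verdict}")
```

All three Pawson et al. (2014) values pass this test and none of the drift-adjusted SWM values do, even though the central values are of similar size.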
The single most important factor affecting the uncertainty estimates is the drift estimate. Pawson et al. (2014) use the following data sets and 2σ drift uncertainties: HARMOZ (HARMonized dataset of OZone) 2 % per decade, GOZCARDS 2 % per decade, SBUV-NASA 3 % per decade, lidar 2.0 % per decade, microwave 2.0 % per decade, Umkehr 4.0 % per decade, and ozonesondes 4.0 % per decade. These drift uncertainties are added in quadrature to the trend uncertainties from the regression, and the resulting individual uncertainties are used in the calculation of the weighted average trend and its uncertainty. A typical value for the uncertainty in the weighted average is thus 0.8 % per decade. In contrast, in this study we use (for the satellite records only) values of the drift uncertainties in the early period of 3 % per decade at all altitudes, and in the later period 4 % per decade in the middle stratosphere and 6 % per decade in the upper stratosphere (all 2σ). These values arise from the comparison of the satellite records with the lidar and ozonesonde data and reflect the systematic uncertainties that are common to one or more data sets. The other major difference is that the ground-based records (lidar, microwave, Umkehr and ozonesondes) are not treated as independent estimates in our study, partly because of questions as to the representativeness of the number of sites and measurements and partly because several of the sites are common to multiple instruments.
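The effect of the two treatments of drift can be sketched numerically. The first part follows the procedure described for Pawson et al. (2014): each drift uncertainty is added in quadrature to a regression uncertainty and the totals enter an inverse-variance weighted average. The second part is only one possible reading of a drift that is common to the data sets, in which it is added after averaging and so does not shrink as more data sets are included. The regression uncertainty of 1.5 % per decade is a placeholder, so the output is not expected to reproduce the typical value quoted above.

```python
import math

# 2-sigma drift uncertainties (% per decade) as listed in the text.
DRIFT_2SIGMA = {
    "HARMOZ": 2.0, "GOZCARDS": 2.0, "SBUV-NASA": 3.0, "lidar": 2.0,
    "microwave": 2.0, "Umkehr": 4.0, "ozonesondes": 4.0,
}

# Placeholder 2-sigma regression uncertainty, identical for every data set.
REGRESSION_2SIGMA = 1.5

def weighted_mean_uncertainty(uncertainties):
    """Uncertainty of an inverse-variance weighted mean of independent estimates."""
    return 1.0 / math.sqrt(sum(1.0 / u**2 for u in uncertainties))

# Drift treated as independent between data sets: add in quadrature, then average.
per_set = [math.hypot(REGRESSION_2SIGMA, d) for d in DRIFT_2SIGMA.values()]
independent = weighted_mean_uncertainty(per_set)

# Drift treated as common to all data sets (assumed reading): average the
# regression uncertainties first, then add a single drift term in quadrature.
COMMON_DRIFT_2SIGMA = 6.0  # later-period upper-stratosphere value used in this study
regression_only = weighted_mean_uncertainty([REGRESSION_2SIGMA] * len(DRIFT_2SIGMA))
common = math.hypot(regression_only, COMMON_DRIFT_2SIGMA)

print(f"independent drifts: {independent:.2f} % per decade")
print(f"common drift:       {common:.2f} % per decade")
```

Under the independence assumption the combined uncertainty falls below every individual uncertainty, whereas a common drift term sets a floor that averaging cannot lower.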
While this factor is the largest contributor to our drift-adjusted SWM estimate, it might well also be the biggest contributor to the J-distribution estimates due to the differences in the drifts from the various satellites (Hubert et al., 2015). Developing approaches to reduce the systematic drift uncertainty, e.g. by using the ground-based instruments to ground-truth the satellite instruments or by selecting satellite data more carefully, will be important. For example, SAGE II is found to drift less than 1-1.5 % per decade (2σ), while Aura-MLS has no significant drift against the ground-based stations. Merged data sets such as SWOOSH or GOZCARDS based on these observations might be expected to provide more valid trend estimates.
Second, Pawson et al. (2014) use a different set of measurements. For example, SBUV merged cohesive, SWOOSH, SAGE-GOMOS and SAGE-OSIRIS were not used, while the HARMOZ data set, a weighted average of the measurements from OSIRIS, SMR, SCIAMACHY, GOMOS, and MIPAS covering 2002-2012, was included. They additionally include the ground-based records in their combined trends, while we choose not to because of concerns about the representativeness of the trends given the small number of measurement stations. Some of these selections reflect a difference in approach: Pawson et al. (2014) aim to provide the best possible estimates of recent trends and therefore take advantage of measurements from instruments which are no longer operational, while we are analysing merged data sets since they will inevitably have to be used as time goes by. This difference is probably a factor in the differing uncertainty estimates, with the ones presented here representing overestimations.
The statistical approaches are similar in most respects but differ in several ways. First, Pawson and Steinbrecht do not allow for an annual cycle in their regression coefficients (N_x = 0 in Eq. 1 for x = A to H). This may mean that their trend uncertainty estimates from the regression are smaller because fewer terms are used.
Second, Pawson et al. (2014) use a two-step procedure which first removes the dependence of ozone on the solar cycle, QBO and ENSO based on the entire record. In the second step, independent linear trends are fitted to the remaining ozone variations, which now largely contain the long-term trends only. This procedure is used to remove sensitivities to the chosen point of inflection and allows for more freedom in choosing the trend intervals for the first and second parts of the record. As always, when the second period is relatively short and comparable to the length of the solar cycle, the calculated trends are susceptible to any uncertainties in the solar cycle coefficient. We do not think that this different regression approach is a major factor, as our central estimates and the SWM uncertainties are similar to those in Pawson et al. (2014), both for the pre-1997 trends and the post-1998 trends.
The periods analysed are different, with Pawson et al. (2014) calculating trends for 2000-2013 while we chose 1998-2012. Our results (Fig. 7) show that the peak GOZCARDS trends are 1-2 % larger when calculated from 2000 than from 1998, so the different choice of period will be a contributing factor to the different value of the central trend but will not significantly impact the uncertainty estimate. With comparable uncertainties, of course, larger trends such as those from 2000 will be more significant.

Looking ahead
Observed ozone trends reflect the combined effects of several forcings in the atmosphere. In addition to changes in EESC, the abundances of non-CFC greenhouse gases in the atmosphere have increased in recent decades. Over the coming century, greenhouse gas levels are expected to continue to increase while EESC continues to decline. Increasing CO2 cools the stratosphere (Fels et al., 1980), which slows the catalytic cycles that destroy ozone (Haigh and Pyle, 1982), thereby leading to an increase in ozone in the middle and upper stratosphere. In addition, climate models consistently simulate a strengthening in the Brewer-Dobson circulation under climate change (Butchart et al., 2011). This is particularly important in the tropical and mid-latitude lower stratosphere where there are strong gradients in ozone changes between 100 and 20 hPa (Zubov et al., 2013). The magnitudes of these effects are of the order of a few percent per decade and are strongly dependent on the scenario for future greenhouse gas emissions (Eyring et al., 2013) and on the evolution of other chemically relevant species such as methane and N2O (Revell et al., 2012).
Model simulations indicate that during the rise in EESC the effects of climate change would be expected to have offset a small fraction of the halogen-induced decline in middle and upper stratospheric ozone (WMO, 2011). Conversely, as EESC decreases, both effects would act to increase upper stratospheric ozone (Plummer et al., 2010). However, projected changes in the stratospheric circulation mean that total column ozone in the tropics may not return to pre-1980 levels this century (Eyring et al., 2013). The net effect of climate change on column and profile ozone amounts will therefore have a complex horizontal and vertical structure, and these changes will occur concurrently with those from the expected decline in EESC. Model results indicate that measurements with lower vertical resolution will be suitable in the upper stratosphere where the gradients in the climate-change-induced ozone changes are not too steep. However, in the lower stratosphere, and especially in the tropics, higher vertical resolution will be required.
Finally, a critical factor which will determine our ability to monitor future changes in the vertical distribution of ozone is the stability and calibration of the overall system. The combined observing system for total ozone is estimated to be stable to about 1 % per decade, which is appropriate for the size of the changes we are considering. It is not yet clear what is required for the measurements of the vertical distribution of ozone, but a target of a few percent per decade is plausible based on modelled future changes. The results presented here suggest that this will be hard though not impossible to achieve. Certainly it is essential to minimise the drift. Success will require continued work as well as continued measurements. A global view means that satellite measurements are required. The need to ensure that both the quality and relative stability of the satellite instruments are known requires a complementary capability for independent assessment, most likely from the ground-based instruments in the NDACC and WMO-GAW observing networks. For example, lidars have been shown to have the technical capability to provide this assessment between 20 and 40 km (Nair et al., 2012). Ozonesondes have the capability of providing measurements of suitably high quality at lower altitudes, while Umkehr, microwave, and FTIR have the potential for high-quality measurements at higher altitudes. The ground-based networks have been designed and developed with this capability in mind. It is very important to ensure that they continue to possess the same capability in a period when ozone will be affected by declining levels of halogen compounds, increasing N2O and CH4, as well as dynamical changes from climate change.