Reanalysis intercomparisons of stratospheric polar processing diagnostics.

We compare herein polar processing diagnostics derived from the four most recent full-input reanalysis datasets: the National Centers for Environmental Prediction Climate Forecast System Reanalysis / Climate Forecast System, version 2 (CFSR/CFSv2), the European Centre for Medium-Range Weather Forecasts Interim Reanalysis (ERA-Interim), the Japan Meteorological Agency's Japanese 55-year Reanalysis (JRA-55), and the National Aeronautics and Space Administration's Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2). We focus on diagnostics based on temperatures and potential vorticity (PV) in the lower to middle stratosphere that are related to formation of polar stratospheric clouds (PSCs), chlorine activation, and the strength, size, and longevity of the stratospheric polar vortex. Polar minimum temperatures (Tmin) and the area of regions having temperatures below PSC formation thresholds (APSC) show large persistent differences between the reanalyses, especially in the Southern Hemisphere (SH), for years prior to 1999. Average absolute differences of the reanalyses from the reanalysis ensemble mean (REM) in Tmin are as large as 3 K at some levels in the SH (1.5 K in the Northern Hemisphere, NH), and absolute differences of reanalysis APSC from the REM are up to 1.5% of a hemisphere (0.75% of a hemisphere in the NH). After 1999, the reanalyses converge toward better agreement in both hemispheres, dramatically so in the SH: average Tmin differences from the REM are generally less than 1 K in both hemispheres, and average APSC differences are less than 0.3% of a hemisphere.
The comparisons of diagnostics based on isentropic PV for assessing polar vortex characteristics, including maximum PV gradients (MPVGs) and the area of the vortex in sunlight (or sunlit vortex area, SVA), show more complex behavior: SH MPVGs showed convergence toward better agreement with the REM after 1999, while NH MPVG differences remained largely constant over time; differences in SVA remained relatively constant in both hemispheres. While the average differences from the REM are generally small for these vortex diagnostics, understanding such differences among the reanalyses is complicated by the need to use different methods to obtain vertically resolved PV for the different reanalyses. We also evaluated other winter season summary diagnostics, including the winter mean volume of air below PSC thresholds, and vortex decay dates. For the volume of air below PSC thresholds, the reanalyses generally agree best in the SH, where relatively small interannual variability has led to many winter seasons with similar polar processing potential and duration, and thus low sensitivity to differences in meteorological conditions among the reanalyses. In contrast, the large interannual variability of NH winters has given rise to many seasons with marginal conditions that are more sensitive to reanalysis differences. For vortex decay dates, larger differences are seen in the SH than in the NH; in general, the differences in decay dates among the reanalyses follow from persistent differences in their vortex areas. Our results indicate that the transition from the reanalyses assimilating Tiros Operational Vertical Sounder (TOVS) data to advanced TOVS (ATOVS) and other data around 1998-2000 resulted in a profound improvement in the agreement of the temperature diagnostics presented (especially in the SH) and, to a lesser extent, the agreement of the vortex diagnostics.
We present several recommendations for using reanalyses in polar processing studies, particularly related to the sensitivity to changes in data inputs and assimilation. Because of these sensitivities, we urge great caution for studies aiming to assess trends derived from reanalysis temperatures. We also argue that one of the best ways to assess the sensitivity of scientific results on polar processing is to use multiple reanalysis datasets.


Introduction
Past, present, and future polar lower stratospheric ozone depletion is a subject of critical scientific and human interest. Not only does chemical ozone depletion depend critically on temperatures and polar vortex dynamics in the lower stratosphere, but changes in lower stratospheric ozone also feed back and alter dynamical conditions in both the stratosphere and troposphere, which can significantly affect surface climate (Polvani et al., 2011; Albers and Nathan, 2013; WMO, 2014; Waugh et al., 2015, and references therein). Moreover, ozone depletion is affected by changing tropospheric and stratospheric temperatures, and in turn alters those temperatures via radiative forcing (e.g., Lacis et al., 1990; Forster and Shine, 1997; Levine et al., 2007; Hegglin et al., 2009; Telford et al., 2009; Riese et al., 2012; WMO, 2014). The Southern Hemisphere (SH) springtime polar vortex breakup disperses ozone-depleted air over populated regions, increasing surface UV exposure (e.g., Ajtić et al., 2003, 2004; Pazmino et al., 2005; WMO, 2007). Our ability to quantify chemical ozone loss in observations and to fully understand the mechanisms resulting in that destruction is a key to improving our modeling capability, which in turn will allow accurate forecasting of future ozone changes and their feedbacks on weather and climate. That ability depends critically on accurate knowledge of temperatures in and the dynamics of the lower stratospheric winter and springtime polar vortices. Even in the Antarctic, interannual variations in chemical ozone loss are controlled largely by variations in polar vortex dynamical conditions; thus detection of recovery from chemical ozone depletion also requires accurate knowledge of variability and long-term changes in polar vortex dynamics and temperatures (e.g., Newman et al., 2004; Huck et al., 2005; WMO, 2014).
The chemistry leading to ozone loss involves the conversion, on the surfaces of cold aerosol particles and/or polar stratospheric clouds (PSCs), of chlorinated and brominated species into forms that destroy ozone (see, e.g., Solomon, 1999; WMO, 2014, for reviews). These processes only occur at temperatures below a threshold that is dependent on pressure and on water vapor (H2O) and nitric acid (HNO3) concentrations (e.g., Hanson and Mauersberger, 1988; Solomon, 1999). Furthermore, these processes can only result in widespread and persistent chlorine activation when/where the cold air is confined so that mixing with outside air cannot dilute the activated fields, that is, inside the "containment vessel" of the winter and springtime stratospheric polar vortex (e.g., Schoeberl and Hartmann, 1991; Schoeberl et al., 1992). Finally, the reactions by which active chlorine (Cl + ClO + 2ClOOCl) destroys ozone require sunlight. The formation and maintenance of the dynamical and chemical environment described above are referred to as "polar processing" since all of these conditions are required for chemical ozone depletion to take place in the lower stratosphere. Since reanalyses from data assimilation systems (DASs) are among the best available tools for modeling and understanding stratospheric dynamics, as well as for driving models of past and present ozone loss, the representation of lower stratospheric temperatures and vortex dynamics in these reanalyses is critical to furthering our understanding of and ability to predict ozone depletion and eventual ozone recovery.
One important finding of earlier studies was that the NCEP/NCAR (National Centers for Environmental Prediction/National Center for Atmospheric Research) and NCEP/Department of Energy reanalyses are unsuitable for polar processing studies because of their poor representation of the stratosphere (very low model top and few model levels) and outdated assimilation approaches (e.g., assimilation of retrieved temperature from operational sounders) (e.g., Manney et al., 2003, 2005a, b). The European Centre for Medium-Range Weather Forecasts (ECMWF) 40-year (ERA-40) reanalysis was also shown to be unsuitable for such studies, partly because of unrealistic oscillations in the temperature profiles (e.g., Manney et al., 2005a, b; Feng et al., 2005; Simmons et al., 2005). In the past few years, some studies have begun focusing on the latest generation of reanalyses, which have vast improvements in models and assimilation methods and more comprehensive data inputs (for a review of reanalysis characteristics, see Fujiwara et al., 2017). WMO (2014) showed comparisons of potential PSC volume and of springtime vortex breakup dates (calculated as in Nash et al., 1996) between NCEP/NCAR and two modern reanalyses, ECMWF's Interim (ERA-Interim) reanalysis and the NASA Global Modeling and Assimilation Office (GMAO) Modern-Era Retrospective analysis for Research and Applications (MERRA); NCEP/NCAR was shown to give much lower PSC volumes than the more modern reanalyses, and the vortex breakup dates differed substantially among the reanalyses. Simmons et al. (2014) provided a detailed analysis of the effects of DAS inputs on long-term variability and trends in the ERA-Interim reanalysis temperatures and made comparisons with MERRA, the Japan Meteorological Agency's 55-year (JRA-55) reanalysis, and the older ERA-40 reanalysis. Lawrence et al. (2015) compared a large suite of diagnostics, based on polar vortex characteristics and temperatures, that are important for polar processing between MERRA and ERA-Interim (hereinafter referred to as ERA-I) for the then-available 34 years of those reanalyses. These comparisons showed significant changes in agreement between the reanalyses over that period, with overall good agreement in the period since 2002, when the amount of data ingested into the two reanalyses' DASs was much greater; the largest improvements in agreement were seen in Antarctic temperature diagnostics. In a paper describing global temperature and wind comparisons as part of the Stratosphere-troposphere Processes And their Role in Climate (SPARC) Reanalysis Intercomparison Project (S-RIP) (Fujiwara et al., 2017), Long et al. (2017) also emphasized changes in agreement between reanalyses related to data input changes, especially improvements in temperature agreement after the transition from TOVS to advanced TOVS (ATOVS) around 1998 to 2000; they also pointed out issues with discontinuities in some reanalyses that were run in multiple streams. Changes such as these noted by Lawrence et al. (2015) and Long et al. (2017), as well as similarities in data inputs that could result in biases in all of the reanalyses, argue for great caution in using temperatures and other fields from individual reanalyses for diagnosing long-term changes and trends.
The ability to compare polar processing diagnostics with observations is very limited for several reasons. Somewhat paradoxically, the vast improvements in DAS usage of available observations have resulted in there being very few truly independent temperature datasets. Furthermore, many of the datasets that are available, even those ingested into the DAS, generally suffer from very limited spatial and/or temporal coverage (e.g., balloon-borne and lidar measurements) and/or issues with resolution, precision, and length of data records (e.g., limb-sounding research satellites). For example, many previous and current limb-sounding satellites had/have very coarse vertical resolution, and did not/do not retrieve temperatures to low enough altitudes to fully cover the lower stratosphere; those that do cover the lower stratosphere typically have incomplete coverage of the polar regions (e.g., the Aura Microwave Limb Sounder (MLS) does not observe poleward of 82° latitude). Further, validation studies generally do not indicate better quality in lower stratospheric temperatures from limb sounders than that of the reanalyses (e.g., Schwartz et al., 2008, and references therein). Nevertheless, several recent studies have compared some of the latest-generation reanalyses with observations: for example, Hoffmann et al. (2017) compared MERRA, MERRA-2 (the recent successor to MERRA), ERA-Interim, and NCEP/NCAR reanalyses with temperatures and winds from long-duration Concordiasi balloon flights in the Antarctic lower stratosphere in September 2010 through January 2011; unsurprisingly, they found much larger temperature biases for NCEP/NCAR than in the other reanalyses, not only because of the shortcomings in that reanalysis, but also because the other reanalyses they considered assimilated Concordiasi measurements. Lambert and Santee (2018) compared MERRA, MERRA-2, ERA-Interim, JRA-55 (the Japan Meteorological Agency's latest reanalysis assimilating both surface and upper air observations), and CFSR/CFSv2 (referring collectively to the NCEP Climate Forecast System Reanalysis and Climate Forecast System version 2) with COSMIC (Constellation Observing System for Meteorology, Ionosphere, and Climate) GPS-RO (Global Positioning System-Radio Occultation) temperatures, and presented an innovative analysis using thermodynamic calculations to derive an independent temperature reference from satellite observations of HNO3, H2O (from the Aura MLS), and PSC aerosols (from Cloud-Aerosol Lidar with Orthogonal Polarization). They found temperature biases in the reanalyses with respect to COSMIC of −0.6 to +0.5 K, and biases ranging from −1.6 to +0.1 K with respect to the derived temperature references for two PSC types.
The use of multiple data sources and novel methods allowed Lambert and Santee (2018) to compare temperatures over a wide range of winter polar vortex conditions in both hemispheres for 2008 through 2013. Studies comparing with other data sources, such as long-duration balloon flights (Hoffmann et al., 2017, and references therein), are generally restricted to more limited spatial and temporal regimes. In addition, many of the latest-generation reanalyses assimilate data sources such as COSMIC GPS-RO and long-duration balloon flights (Fujiwara et al., 2017; Hoffmann et al., 2017; Lambert and Santee, 2018), thus complicating interpretation of differences from those data sources. Also, as noted by Lawrence et al. (2015), some of the most useful diagnostics of polar processing, while conceptually simple, depend on having full and dense coverage of the polar regions (e.g., minimum high-latitude temperatures or area of temperatures below a PSC threshold), and/or are based on vortex diagnostics that are defined by potential vorticity (PV) (e.g., vortex edge PV gradients or vortex area) that do not have corresponding observations. Furthermore, reanalyses are used in polar processing studies that span the 35 (or more) years of their duration, but much of this period lacks data with widespread coverage for comparison. Because of these limitations, comparisons of reanalyses remain one of our most valuable tools for assessing their representation of the dynamical conditions that control polar chemical processing and ozone loss.
Since the work described in Lawrence et al. (2015), the MERRA-2 reanalysis (intended as a replacement for MERRA) has become available and widely used, including for polar processing and polar vortex studies (e.g., Manney and Lawrence, 2016; Lambert and Santee, 2018; Lawrence and Manney, 2018). While Long et al. (2017) compared temperatures in all of the latest-generation reanalyses, they focused on zonal means and the whole stratosphere rather than on the polar lower stratosphere and diagnostics specifically relevant to polar processing. To our knowledge, no studies have compared lower stratospheric polar processing diagnostics in the four most recent full-input reanalyses, that is, those that ingest both surface and upper air observations (MERRA-2, ERA-I, JRA-55, and CFSR/CFSv2).
In this paper, we compute and analyze the diagnostics used by Lawrence et al. (2015) to provide a more complete and quantitative characterization of reanalysis differences during the satellite era. In addition to including the MERRA-2, JRA-55, and CFSR/CFSv2 reanalyses, the calculation and analysis of diagnostics has been updated to include sensitivity tests (e.g., to temperature thresholds) and to include assessment of the variability in reanalysis differences and the statistical significance of those differences. Section 2.1 briefly describes the reanalysis datasets and the assimilation system inputs most relevant to assessment of polar processing diagnostics. Section 2.2 describes the diagnostics we calculate and the methods used to analyze them. Our results are presented in Sect. 3, comprising temperature (Sect. 3.1), vortex (Sect. 3.2), and derived (Sect. 3.3) diagnostics. Section 4 gives a summary and conclusions.
Data and methods

Fujiwara et al. (2017) provide detailed descriptions of the models, assimilation systems, and data inputs for the reanalyses used here in their overview paper on S-RIP. We compare the four most recent high-resolution full-input reanalyses for all winters with data available from the beginning of the "satellite era" in 1979. All of our analyses are done using daily 12:00 UT fields from each reanalysis dataset (we have tested the sensitivity of our analyses to using 00:00 UT data and have found we get virtually identical results). Because of the importance of resolution, especially in the vertical dimension, in representing the polar lower stratosphere and threshold processes in general (see, e.g., Manney et al., 2017), we start our analyses from reanalysis data on the native model levels and at or (in the case of spectral models) near the native horizontal resolution. Please see Table 1 for a list of relevant acronyms that we use below to describe the instruments, radiative transfer models, etc., that are used by the reanalyses.

CFSR/CFSv2
NCEP-CFSR/CFSv2 is a global reanalysis wherein CFSR covers 1979 through 2010 and CFSv2 covers 2011 through the present (Saha et al., 2010, 2014). The data are produced using a coupled ocean-atmosphere model and 3D-Var assimilation. CFSR/CFSv2 uses the CRTM for satellite radiance assimilation. The model resolution is T382L64, but the data used here are on a 0.5° × 0.5° horizontal grid on the model levels (available through 2015); vertical grid spacing in the lower stratosphere ranges from about 0.8 km near 100 hPa to 1.3 km near 10 hPa. CFSR underwent an undocumented update to its assimilation scheme in 2010 (Long et al., 2017). Furthermore, in the transition from CFSR to CFSv2 in 2011, the resolution, forecast model, and assimilation scheme were all upgraded; CFSv2 is, however, intended as a continuation of CFSR and can be treated as such for most purposes (Saha et al., 2014; Fujiwara et al., 2017; Long et al., 2017). We thus treat these as a single reanalysis in this paper.

ERA-Interim
ERA-Interim (see Dee et al., 2011) is another global reanalysis that covers the period from 1979 to the present. The data are produced using 4D-Var assimilation with a T255L60 spectral model. ERA-I uses version 7 of the RTTOV radiative transfer model for radiance assimilation. Here, we use ERA-Interim data on a 0.75° × 0.75° latitude-longitude grid (near the resolution of the model's Gaussian grid) on the 60 model levels. The spacing of the model levels in the lower stratosphere is about 1.2 to 1.4 km.

JRA-55
JRA-55 (Ebita et al., 2011; Kobayashi et al., 2015) is a global reanalysis that covers the period from 1958 to the present, and is produced using 4D-Var assimilation. The data from the JRA-55 T319L60 spectral model are provided on an approximately 0.56° Gaussian grid corresponding to that spectral resolution. JRA-55 uses RTTOV version 9.3 for satellite radiance assimilation. The JRA-55 fields on the model vertical levels have a vertical resolution of about 1.2 to 1.4 km in the lower stratosphere.

MERRA-2
MERRA-2 (Gelaro et al., 2017) is a global reanalysis produced by the National Aeronautics and Space Administration Global Modeling and Assimilation Office (NASA GMAO) covering 1980 to the present. It is based on the Goddard Earth Observing System (GEOS) 5.12.4 assimilation system, which uses 3D-Var assimilation with incremental analysis update (IAU) (Bloom et al., 1996) to constrain the analyses. MERRA-2 is intended to be a replacement for its predecessor, MERRA (Rienecker et al., 2011), as it includes many updates over MERRA (see, e.g., Bosilovich et al., 2015; Molod et al., 2015; Takacs et al., 2016). Changes between MERRA and MERRA-2 that may significantly affect representation of the lower stratosphere include the addition of new observation types in MERRA-2 (see Sect. 2.1.5, Fig. 1, and Fujiwara et al., 2017); an updated radiative transfer model for radiance assimilation; a different treatment of conventional temperature data; and assimilation of data that uses upgraded background error statistics, which control the magnitude and spatial extent of the impact of observations on the assimilated product.
The MERRA-2 data products are described by Bosilovich et al. (2016). All MERRA-2 data products used here are on 72 hybrid sigma-pressure levels that have about 1.2 km grid spacing in the lower stratosphere, and a 0.5° × 0.625° latitude-longitude grid. Data from MERRA-2 from its spinup year, 1979, are not in the public MERRA-2 record, but we use data from late 1979 to start the analysis with the Northern Hemisphere (NH) 1979/1980 winter. We use the MERRA-2 "assimilated" (ASM) data collection (Global Modeling and Assimilation Office, GMAO) here, as recommended by GMAO, particularly for studies that require consistency between mass and wind fields (see, e.g., Global Modeling and Assimilation Office, GMAO; Fujiwara et al., 2017).

Timeline of satellite data inputs to DASs
Operational satellite observations are the primary data constraints on reanalyses at stratospheric levels. Additional constraints on temperature are provided by radiosonde and other conventional observations (Fujiwara et al., 2017). While conventional data are important in the lower stratosphere, especially at midlatitudes, their coverage is sparse in the NH polar region and very poor in the SH, so that these regions are mainly constrained by satellite radiance measurements.

Figure 1. Timeline for operational satellite instrument inputs to the reanalyses used herein: panels (a) through (d) show CFSR/CFSv2, ERA-I, JRA-55, and MERRA-2, respectively. Table 1 gives a list of the acronyms used here. Within the constraint of putting them in the same order for each reanalysis, the input datasets are stacked in approximately chronological order, with earliest on the bottom and latest on the top. The black vertical line is at the middle of 1998, near the time of the TOVS to ATOVS transition (see text). See Fujiwara et al. (2017) for a similar timeline but organized per instrument.
While there are significant differences in how the reanalyses handle changing input data and which new data sources they include, the majority of the data inputs are the same for all of these reanalyses. We must thus keep in mind the possibility that similar spurious features represented in multiple reanalyses could arise from using the same (imperfect) data sources.

Methods
The methods and diagnostics used herein are largely the same as those used by Lawrence et al. (2015). In the subsections to follow, we describe the reanalysis fields we use, and how we prepare them and derive the polar processing diagnostics from them. We also provide some information on additional analysis techniques that we use to help interpret our results.

Preparation of meteorological fields
The two meteorological fields necessary to derive most of the diagnostics used herein are temperatures and PV on isentropic surfaces. In the results we present later, we show temperature diagnostics on pressure surfaces, and vortex diagnostics on isentropic surfaces; as will be discussed, we also calculate and use some temperature diagnostics on isentropic levels.
Of the four reanalyses described in Sect. 2.1, only MERRA-2 provides potential vorticity on its model levels. CFSR/CFSv2, ERA-Interim, and JRA-55 have isentropic PV available, but these products are only provided on a sparse set of isentropic levels, with very few common levels between the reanalyses. ERA-Interim provides absolute vorticity on model levels, and CFSR/CFSv2 provides relative vorticity; thus to get the vertically resolved PV fields that we need, we derive PV for these reanalyses using their provided vorticity, temperature, and pressure fields on the model levels.
In the case of JRA-55, we use the zonal and meridional wind components to first calculate relative vorticity, which we then use in combination with temperature and pressure to calculate PV on the model levels. While a thorough evaluation of biases that may arise from using different types of PV calculations for different reanalyses would be valuable, it is beyond the scope of this paper. We think, however, that data users will most likely use the most direct calculation to get from the provided fields to the model level PV (as we have), and thus the fields we are comparing are those that are most likely to be used in practice. When calculating polar processing diagnostics, we scale the PV fields into vorticity units as in Dunkerton and Delisi (1986) by dividing the PV by a standard value of static stability calculated from assuming a vertical temperature gradient of 1 K km⁻¹ and a pressure of 54 hPa on the 500 K isentropic level. We use this scaling so that the scaled PV (sPV) values are of the same order of magnitude at different levels throughout the stratosphere.
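To make the scaling concrete, the following is a minimal sketch of one way to obtain a standard static stability at the 500 K level and apply it to a PV value. It assumes a hydrostatic ideal-gas atmosphere, for which −g ∂θ/∂p = g (θ/p)(κ + (R/g) dT/dz); this closed form, the constants, and the function names `standard_static_stability` and `scale_pv` are illustrative assumptions and are not taken from Dunkerton and Delisi (1986) or from the processing used for the paper. How the standard value varies with isentropic level is not addressed here.

```python
import math

G = 9.80665        # gravitational acceleration, m s^-2
R_DRY = 287.05     # gas constant for dry air, J kg^-1 K^-1
KAPPA = 2.0 / 7.0  # R/cp for dry air

def standard_static_stability(theta=500.0, p_pa=5400.0, dtdz=1.0e-3):
    """Return -g * dtheta/dp (K m^2 kg^-1) for a hydrostatic ideal gas.

    theta : potential temperature of the level (K)
    p_pa  : assumed standard pressure on that level (Pa); 54 hPa here
    dtdz  : assumed vertical temperature gradient (K m^-1); 1 K km^-1 here
    """
    # From theta = T (p0/p)^kappa and dp/dz = -p g / (R T):
    #   dtheta/dp = -(theta / p) * (kappa + R * dtdz / g)
    return G * (theta / p_pa) * (KAPPA + R_DRY * dtdz / G)

def scale_pv(pv, stability):
    """Divide PV (K m^2 kg^-1 s^-1) by a static stability to get sPV (s^-1)."""
    return pv / stability

s500 = standard_static_stability()
spv_example = scale_pv(40.0e-6, s500)  # illustrative PV of 40 PVU at 500 K
print(f"static stability: {s500:.4f} K m^2 kg^-1, sPV: {spv_example:.3e} s^-1")
```

Under these assumptions, PV values of a few tens of PVU near 500 K map to sPV of order 10⁻⁴ s⁻¹, which is the order of magnitude quoted for the vortex-edge values later in the text.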
Since the reanalyses are all on different model levels, we use the reanalyses' temperature and pressure fields to vertically interpolate their temperature and PV fields to a common set of fixed pressure and isentropic surfaces. We use a standard set of pressure and isentropic levels that have been used for several NASA satellite instrument datasets (see, e.g., https://cdn.earthdata.nasa.gov/conduit/upload/4849/ESDS-RFC-009.pdf, last access: 12 September 2018) including the Aura Microwave Limb Sounder (Livesey et al., 2015); these are 12 levels per decade in pressure, and their climatologically corresponding isentropic surfaces. For polar processing diagnostics, we limit our focus to 14 levels with pressures between approximately 120 and 10 hPa; these pressure levels and their corresponding isentropic surfaces are shown in Appendix A (Table A2).
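As an illustration of this step, the sketch below builds a 12-levels-per-decade pressure grid and interpolates a single model column onto it. It assumes the grid is defined as 1000 × 10^(−i/12) hPa and that interpolation is linear in log-pressure; neither detail is stated in the text, and the function names are hypothetical. Table A2 remains the authoritative list of levels.

```python
import numpy as np

def standard_pressure_levels(p_top=10.0, p_bottom=125.0, per_decade=12):
    """Pressures (hPa) on the assumed grid 1000 * 10**(-i/per_decade),
    restricted to the range [p_top, p_bottom], ordered from high to low pressure."""
    i = np.arange(0, 6 * per_decade + 1)
    p = 1000.0 * 10.0 ** (-i / per_decade)
    # Small tolerance so that round-off does not drop the p_top level itself.
    keep = (p >= p_top * (1.0 - 1e-9)) & (p <= p_bottom)
    return p[keep]

def interp_to_pressure(field, p_model, p_target):
    """Interpolate one column from model-level pressures to target pressures,
    linearly in log-pressure. Targets outside the model range take the end values."""
    field = np.asarray(field, dtype=float)
    p_model = np.asarray(p_model, dtype=float)
    order = np.argsort(p_model)  # np.interp needs increasing abscissae
    return np.interp(np.log(p_target), np.log(p_model[order]), field[order])

levels = standard_pressure_levels()
# Synthetic column: temperature that is exactly linear in log-pressure.
p_col = np.array([200.0, 100.0, 50.0, 20.0, 5.0])
t_col = 200.0 + 10.0 * np.log(p_col)
t_on_levels = interp_to_pressure(t_col, p_col, levels)
print(levels.size, levels.round(1))
```

With the default arguments the grid has 14 levels spanning roughly 121 to 10 hPa, consistent with the count given above.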

Temperature and vortex diagnostics
The depletion of ozone in the lower stratosphere follows from a complex chain of processes that are highly dependent on meteorological conditions (see, e.g., Solomon, 1999; WMO, 2014, for reviews). The activation of chlorine requires the presence of PSCs, which form when temperatures are sufficiently low, and grow when temperatures stay low for sufficiently long periods of time (Hanson and Mauersberger, 1988; Solomon, 1999; WMO, 2014, and references therein). The catalytic ozone destruction cycles involving both chlorine and bromine further require sunlight, which is usually provided later in winter and spring when sunlight returns to the high-latitude polar regions. These chemical processes also require isolation from lower-latitude air, which is provided by the stratospheric polar vortex; the edge of the polar vortex acts as a barrier preventing transport and mixing, and thus the vortex acts as a containment vessel for the polar air where these processes take place (e.g., Schoeberl and Hartmann, 1991; Schoeberl et al., 1992). We thus examine polar processing diagnostics that primarily focus on lower stratospheric temperatures and the state of the polar vortex to assess the meteorological conditions conducive to ozone depletion. Unless specified otherwise, we focus on the months from December to March (DJFM) for the NH, and May to October (MJJASO) for the SH; these time periods cover roughly the full period during which polar processing takes place for most polar winters (see Sect. 3.1).
The temperature diagnostics that we use include minimum temperatures (Tmin) poleward of ±40° latitude, and the areas of temperatures (poleward of ±30° latitude) below PSC formation thresholds (APSC, or area T ≤ TPSC, where the subscript "PSC" may be replaced by "NAT" or "ice" to denote the specific type of PSC). For APSC, we specifically use the formation temperatures for solid nitric acid trihydrate (NAT; Hanson and Mauersberger, 1988) and ice particles on pressure levels, which we define using climatological profiles of HNO3 and H2O mixing ratios (see Appendix A and Table A1 for values). As a rule of thumb, the NAT threshold between 120 and 10 hPa ranges from roughly 198 to 187 K, respectively, and the ice threshold tends to be between 6 and 8 K below the NAT threshold. We stress that these thresholds are approximations, but they are convenient as proxies for PSC formation and chlorine activation. While most of the results we show herein for the temperature diagnostics are on pressure levels, we also calculate them on isentropic surfaces; for the diagnostics involving PSC thresholds, we assign the PSC thresholds on pressure levels to the isentropic surfaces that are roughly co-located (e.g., 520 K corresponds to 46.1 hPa). This is an additional approximation, but it allows us to keep the intercomparisons simple without having to calculate daily varying PSC thresholds for pressures/temperatures on isentropic levels or pre-computing climatological PSC threshold values from the reanalyses' fields. (Please see Appendix A and Table A2 for the pressure surfaces used, their corresponding isentropic surfaces, and the PSC threshold temperatures.) To mitigate issues with these approximations and test sensitivity to the thresholds, we also compute APSC with ±1 K offsets to the PSC formation temperatures.
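The two temperature diagnostics described above can be sketched for a single level and day as follows. This is a simplified illustration, not the code used for the paper: it assumes a regular latitude-longitude grid that covers the whole hemisphere, approximates grid-cell area with cos(latitude) weights, and uses a hypothetical function name and synthetic data.

```python
import numpy as np

def tmin_and_apsc(temp, lat_deg, t_psc, hemisphere="NH", offset=0.0):
    """Return (Tmin in K, A_PSC in % of a hemisphere) for one level and day.

    temp    : 2-D array (nlat, nlon) on a regular grid covering the hemisphere
    lat_deg : 1-D array of grid latitudes in degrees
    t_psc   : PSC threshold temperature (K) for this level
    offset  : sensitivity offset added to the threshold (e.g. -1.0 or +1.0 K)
    """
    temp = np.asarray(temp, dtype=float)
    # Mirror the SH so the same "poleward of" tests serve both hemispheres.
    lat = np.asarray(lat_deg, dtype=float) * (1.0 if hemisphere == "NH" else -1.0)
    # cos(latitude) weights approximate the relative area of each grid cell.
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(temp)
    hemisphere_area = w[lat > 0.0].sum()

    tmin = temp[lat >= 40.0].min()                         # poleward of 40 degrees
    cold = (temp <= t_psc + offset) & (lat >= 30.0)[:, None]  # poleward of 30 degrees
    apsc = 100.0 * w[cold].sum() / hemisphere_area
    return tmin, apsc

# Synthetic example: a 185 K polar cap poleward of 70N in a 210 K background.
lat = np.arange(-89.5, 90.0, 1.0)
temp = np.full((lat.size, 360), 210.0)
temp[lat >= 70.0, :] = 185.0
tmin, apsc = tmin_and_apsc(temp, lat, t_psc=195.0)
print(tmin, round(apsc, 2))
```

Calling the function again with `offset=-1.0` and `offset=+1.0` gives the kind of ±1 K threshold sensitivity test mentioned in the text.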
The vortex diagnostics that we use include maximum gradients in sPV as a function of equivalent latitude (maximum sPV gradients, or MPVGs), which assess the strength of the vortex edge (e.g., Manney et al., 1994, 2011; Lawrence et al., 2015), and the area of the vortex exposed to sunlight (sunlit vortex area, or SVA). To calculate MPVGs, we bin sPV as a function of equivalent latitude (EqL, the latitude that would enclose the same area between it and the pole as a given PV contour; Butchart and Remsberg, 1986), numerically differentiate, and catalog the maximum value between ±30 and ±80° EqL. We use ±30° as a lower limit because relatively large PV gradients can be found in the tropics, which can dominate early/late in the season when the vortex is forming/decaying; we use ±80° as an upper limit because the small areas represented by points poleward of ±80° can vary much more dramatically than the lower EqLs, sometimes producing large gradients (e.g., Nash et al., 1996) that are not indicative of the vortex edge.
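The MPVG search described above can be sketched as follows, assuming sPV has already been binned onto an EqL grid. The function name, the centred finite difference, and the use of the absolute gradient (so that the same code handles the negative PV of the SH) are assumptions made for this example.

```python
def max_spv_gradient(eql, spv, lo=30.0, hi=80.0):
    """Return (gradient, EqL) of the largest-magnitude d(sPV)/d(EqL)
    found between lo and hi degrees of absolute equivalent latitude.

    eql: monotonically increasing equivalent latitudes in degrees.
    spv: scaled PV binned on those equivalent latitudes.
    """
    best = None
    for i in range(1, len(eql) - 1):
        if not (lo <= abs(eql[i]) <= hi):
            continue
        # Centred finite difference on a possibly uneven grid.
        grad = (spv[i + 1] - spv[i - 1]) / (eql[i + 1] - eql[i - 1])
        if best is None or abs(grad) > abs(best[0]):
            best = (grad, eql[i])
    return best
```

Restricting the search window, rather than masking the field beforehand, mirrors the reasoning in the text: steep tropical gradients and noisy near-pole values remain in the profile but cannot be selected as the vortex edge.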
To obtain SVA, we calculate the area of the vortex that extends equatorward of the daily polar night latitude at 12:00 UT. The area of the vortex is determined by defining constant contours of sPV as the vortex edge over all years; herein we determined these vortex edges individually for each reanalysis from climatological seasonal averages of their maximum PV gradients (as determined above) from the extended periods of November through April for the NH, and April through November for the SH. We use periods that are longer than the DJFM and MJJASO periods we use for the intercomparisons because they help to include the formation and breakdown of the vortex. While constant vortex edges are a simplification, the ones we use here are defined for each reanalysis individually, and thus they inherently fold in any systematic differences that the reanalysis sPV fields may have. Furthermore, more common definitions of the vortex edge that provide daily varying values can be prone to giving spurious oscillations from day to day that could contaminate intercomparisons (e.g., Manney et al., 2007; Lawrence and Manney, 2018). Figure 2 shows the NH and SH profiles of vortex edges used for each reanalysis; it shows that the values obtained for each reanalysis and hemisphere are generally consistent, and that the largest differences between reanalyses below 850 K are around 0.2 × 10⁻⁴ s⁻¹.
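A rough sketch of the SVA calculation for the NH is given below. It assumes a simple cosine approximation for the solar declination and ignores refraction, twilight, and the altitude of the isentropic surface, none of which are specified in the text; the function names and grid layout are likewise hypothetical.

```python
import math

def polar_night_latitude_nh(day_of_year):
    """Approximate NH polar night boundary (degrees north).

    Uses the common approximation decl = -23.44 * cos(2*pi*(N + 10)/365);
    this is an assumption, not the formula used in the paper.
    """
    decl = -23.44 * math.cos(2.0 * math.pi * (day_of_year + 10) / 365.0)
    # Polar night exists in the NH only while the declination is negative.
    return 90.0 + min(decl, 0.0)

def sunlit_vortex_area_nh(spv, lats, edge, day_of_year):
    """Vortex area (percent of a hemisphere) equatorward of polar night.

    spv:  one row per latitude of scaled PV on equally spaced longitudes.
    lats: equally spaced latitude-band centres in degrees.
    edge: constant sPV contour taken as the vortex edge.
    """
    limit = polar_night_latitude_nh(day_of_year)
    dlat = abs(lats[1] - lats[0])
    total = 0.0
    for lat, row in zip(lats, spv):
        if lat <= 0.0 or lat >= limit:
            continue  # wrong hemisphere, or band centre inside polar night
        upper = math.radians(min(lat + dlat / 2.0, 90.0))
        lower = math.radians(max(lat - dlat / 2.0, 0.0))
        band = math.sin(upper) - math.sin(lower)
        inside = sum(1 for v in row if v >= edge) / len(row)
        total += band * inside
    return 100.0 * total
```

Dropping the polar night test (for example by setting the limit to 90°) gives the total vortex area, which is the quantity used for the volume and decay-date diagnostics.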

Derived diagnostics
Here, we describe some additional diagnostics that we examine later in the paper that are derived from the raw diagnostics we calculate (primarily those described above).
The winter mean volume of lower stratospheric air with temperatures below TPSC (VPSC) is a widely used diagnostic of polar processing potential. It is often expressed as a fraction of the vortex volume (VPSC/VVort) to provide a measure that is independent of the substantial interannual and interhemispheric variations in vortex size (e.g., Rex et al., 2004, 2006; Tilmes et al., 2006; Manney et al., 2011; Manney and Lawrence, 2016; WMO, 2014). (Again, we replace the subscript "PSC" with "NAT" or "ice" to denote specific PSC types.) Hence, VPSC/VVort represents the approximate fraction of the vortex (for a specified altitude range) in which temperatures are low enough for the formation of PSCs. Here, we calculate VPSC and the volume of the vortex using APSC and the area of the vortex on isentropic levels between 390 and 550 K. APSC is calculated as described above in Sect. 2.2.2; the area of the vortex is calculated similarly to SVA but instead by finding the total area within the PV contours representing the vortex edge. To get volumes, we assume each isentropic level is nominally representative of the volume of air midway between each level; for example, the 410 K level comes after 390 K and before 430 K, so 410 K is assumed to be representative of the altitude "width" between 400 and 420 K. The altitude widths of these nominal levels are determined using the Knox (1998) approximation; for the levels from 390 to 550 K, the Knox approximation gives a mean altitude differential between levels of 1.13 km with a minimum of 0.98 km, and a maximum of 1.30 km. These altitude differentials are then multiplied by the area diagnostics on each isentropic level (which are converted to km²), and summed over the vertical range to get volumes. The volume fraction is then VPSC/VVort. In the results we show later on, we specifically show winter mean VPSC/VVort; these winter means are taken over DJFM for the NH and MJJASO for the SH.
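The volume calculation described above amounts to a thickness-weighted sum of areas. The sketch below assumes the layer thicknesses are supplied by the caller (the Knox (1998) approximation itself is not reproduced here) and uses a mean Earth radius of 6371 km to convert percent of a hemisphere to km²; both choices, and the function names, are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius
HEMISPHERE_AREA_KM2 = 2.0 * math.pi * EARTH_RADIUS_KM ** 2

def percent_to_km2(area_percent):
    """Convert an area in percent of a hemisphere to km^2."""
    return area_percent / 100.0 * HEMISPHERE_AREA_KM2

def volume_km3(areas_percent, thickness_km):
    """Sum area * layer thickness over the isentropic levels."""
    return sum(percent_to_km2(a) * dz for a, dz in zip(areas_percent, thickness_km))

def volume_fraction(a_psc_percent, a_vortex_percent, thickness_km):
    """VPSC / VVort for one day; returns 0.0 when there is no vortex."""
    v_vort = volume_km3(a_vortex_percent, thickness_km)
    if v_vort == 0.0:
        return 0.0
    return volume_km3(a_psc_percent, thickness_km) / v_vort
```

For example, nine levels from 390 to 550 K with a uniform 1.13 km thickness, 5 % of a hemisphere below the PSC threshold, and a 20 % vortex area on every level give a volume fraction of 0.25.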
The SH vortex breakup is of considerable concern because it results in the dispersal of ozone-depleted vortex air over midlatitudes (e.g., Ajtić et al., 2003, 2004; Manney et al., 2005c; Pazmino et al., 2005; WMO, 2007). While ozone depletion in the Arctic has not yet been large enough for this to be an ongoing concern, vortex evolution during the 2011 Arctic vortex breakup led to significant areas of ozone-depleted air over populated regions associated with increased surface UV (e.g., Manney et al., 2011; Bernhard et al., 2012). To examine the variability and representation in reanalyses of the vortex decay in the lower-to-middle stratosphere, we examine approximate vortex decay dates, which we derive using the vortex area diagnostic on isentropic levels from 460 to 850 K. Here, we calculate vortex area with +0.1 × 10⁻⁴ s⁻¹ sPV offsets to the vortex edges shown in Fig. 2. To accomplish this, we examine NH vortex area between 1 December and 1 June, and SH vortex area between 1 May and 1 March; we have defined the decay date as the last day before which the vortex area is above 1 % of a hemisphere continuously for 30 days. We choose 1 % of a hemisphere as the limit because this threshold is only climatologically met at all levels at the beginning and end of the seasons when the vortex is forming or breaking down, which guarantees that any time the vortex is that small, it is either significantly disturbed or in the process of decaying. The 30-day limit was chosen to help guarantee that the vortex was sufficiently coherent beforehand. Finally, we use vortex edges with the positive sPV offset mentioned above to help remove the influence of small vortex fragments that can be present at the end of the season, which in some cases can add up to areas larger than 1 % of a hemisphere and lead to marginal scenarios that can skew the decay dates. The results we show herein are not highly sensitive to changing the area threshold or using vortex area with/without the sPV offset; except in some marginal cases that we discuss later on, adjusting the area threshold between 1 and 4 % changes fewer than 10 % of the cases (i.e., different years and levels) in all the reanalyses by more than 20 days in the NH and more than 10 days in the SH.
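One possible reading of the decay-date definition above is sketched here: the decay date is taken as the first day at or below the area threshold that follows the last run of at least 30 consecutive days above it. This interpretation, and the function name, are assumptions; the text does not say whether the date is the last day of the run or the day after it.

```python
def vortex_decay_index(area_percent, threshold=1.0, min_run=30):
    """Index of the first day at or below `threshold` following the last
    run of at least `min_run` consecutive days above it, or None.

    area_percent: daily vortex area (percent of a hemisphere) over the
                  search window, e.g. 1 December to 1 June for the NH.
    """
    run = 0
    decay = None
    for day, area in enumerate(area_percent):
        if area > threshold:
            run += 1
        else:
            if run >= min_run:
                decay = day  # a coherent vortex has just dropped out
            run = 0
    return decay
```

Short-lived fragments that briefly exceed the threshold late in the season do not move the decay date under this scheme, because they never reach the 30-day run length.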

Analysis techniques
For most of the results shown herein, we compare the diagnostics derived from each of the reanalyses to an average across all of the reanalyses, which is referred to as the reanalysis ensemble mean (REM). In Sects. 3.1 and 3.2, the comparisons primarily take the form of reanalysis differences from the REM (i.e., reanalysis minus REM).
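The REM and the reanalysis-minus-REM differences can be sketched as below for a single diagnostic, level, and winter. The dictionary layout, the function names, and the use of the sample standard deviation are assumptions made for this example.

```python
import statistics

def rem_differences(series_by_reanalysis):
    """Return (REM, differences), where differences[name] is the daily
    time series of that reanalysis minus the ensemble mean.

    series_by_reanalysis: dict mapping a reanalysis name to a list of
    daily values; all lists must cover the same days.
    """
    names = list(series_by_reanalysis)
    n_days = len(series_by_reanalysis[names[0]])
    rem = [statistics.fmean([series_by_reanalysis[k][t] for k in names])
           for t in range(n_days)]
    diffs = {k: [series_by_reanalysis[k][t] - rem[t] for t in range(n_days)]
             for k in names}
    return rem, diffs

def winter_summary(diff):
    """Seasonal mean and standard deviation of one difference series."""
    return statistics.fmean(diff), statistics.stdev(diff)
```

Because the REM is an unweighted mean of the same reanalyses it is compared against, the differences across reanalyses sum to zero on every day.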
Our analysis also includes a statistical significance test to determine whether the average differences between the reanalyses and the REM are statistically different from zero over a winter season. To accomplish this, we use a nonparametric bootstrap resampling technique that is useful for time series datasets called the stationary bootstrap (Politis and Romano, 1994). Bootstrapping methods for time series have generally relied on resampling blocks of consecutive observations to construct many artificial time series so that accuracy estimates can be made for sample statistics/estimators (e.g., Lahiri, 2003, and references therein). Rather than resampling random fixed-size blocks (which may or may not overlap) to construct artificial time series, the stationary bootstrap constructs artificial time series by resampling random blocks with random sizes determined from a geometric distribution with specified mean. Herein, we bootstrap the time series of differences between the reanalyses and the REM; we treat the difference time series for each reanalysis, diagnostic, and year individually while the vertical levels are resampled together. In nearly all cases (see the NH ANAT comparisons in Sect. 3.1 for the one exception), we perform stationary resampling with a specified geometric distribution mean of 10 (i.e., the expected block length is 10 days) and resample all the time series of differences 2 × 10⁵ times. We note that the results shown herein are not sensitive to the choice of the expected block length; we repeated our bootstrapping analysis for different expected block lengths between 5 and 15 days, and in all cases the results were nearly identical. Ultimately, we chose 10 days as a happy medium based on examinations of the decorrelation timescales of some of the difference time series. We then use the bootstrap percentile method to construct 99 % confidence intervals (CIs) of the average differences; the percentile method is known to have issues in cases with small sample sizes, but since we use a stricter 99 % CI and our time series are longer than 120 days, we expect that our estimates are robust (see discussion in DiCiccio and Efron, 1996, and references therein). When these 99 % CIs do not contain zero, we consider the average differences for the reanalysis minus the REM (for a specific level and year) to be indicative of persistent positive or negative differences. Thus, when statistical significance is mentioned hereinafter, we are referring to significance at the 99 % confidence level.
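A minimal sketch of the stationary bootstrap with a percentile confidence interval is given below for a single difference time series. It does not reproduce the joint resampling of vertical levels described above, and the function names, the default number of resamples, and the seed handling are assumptions made for this example.

```python
import random
import statistics

def stationary_resample(series, mean_block, rng):
    """One stationary-bootstrap resample (Politis and Romano, 1994).

    Block lengths follow a geometric distribution with mean `mean_block`;
    blocks wrap around the end of the series.
    """
    n = len(series)
    p_new_block = 1.0 / mean_block
    idx = rng.randrange(n)
    out = []
    while len(out) < n:
        out.append(series[idx])
        if rng.random() < p_new_block:
            idx = rng.randrange(n)   # start a new block at a random day
        else:
            idx = (idx + 1) % n      # continue the current block
    return out

def bootstrap_mean_ci(series, mean_block=10, n_boot=2000, level=0.99, seed=0):
    """Percentile confidence interval for the mean of `series`."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(stationary_resample(series, mean_block, rng))
        for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    lower = means[int(alpha * n_boot)]
    upper = means[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return lower, upper

def is_significant(series, **kwargs):
    """True when the confidence interval of the mean excludes zero."""
    lower, upper = bootstrap_mean_ci(series, **kwargs)
    return lower > 0.0 or upper < 0.0
```

The default of 2000 resamples keeps the sketch fast; the analysis described above uses 2 × 10⁵ resamples, which gives a much smoother estimate of the 0.5 and 99.5 percentiles.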

Results
In the next two subsections, we show comparisons of temperature and vortex diagnostics as yearly time series of average differences and standard deviations calculated over the polar processing periods in each hemisphere (DJFM for the NH, MJJASO for the SH). We use these averages and standard deviations alongside the bootstrapping analysis to evaluate the agreement between the reanalyses.

Temperature diagnostics
Figure 3 shows the climatological values of minimum temperatures from the REM. The well-known difference in stratospheric temperatures between NH and SH (e.g., Andrews, 1989) is seen clearly, with the climatological period with temperatures below the NAT PSC threshold spanning approximately December through mid-February in the NH and mid-May through early October in the SH. The lowest temperatures are centered near 20 hPa at about the time of the solstice in the NH, and near 25 hPa approximately a month after the solstice in the SH. NH winter temperatures are lowest earlier in the season because of the prevalence of sudden stratospheric warmings (SSWs) in January and February in that hemisphere.
Figure 4 shows "pixel plots" of the winter mean differences in SH minimum temperatures from the REM (Fig. 4a, c, e, g), and the standard deviations of the differences (Fig. 4b, d, f, h) for each of the reanalyses. We use similar pixel plots herein for the other diagnostics and hemispheres discussed in Sect. 2.2.2. In these plots, each pixel represents a winter mean difference (i.e., reanalysis minus REM averaged over a winter period) or a standard deviation of the differences (i.e., the standard deviation of the reanalysis minus REM over the designated winter period) for a single year and vertical level.
The most striking feature shown in Fig. 4 is an overall improvement in the agreement around the turn of the century, particularly evident in MERRA-2 after 1998. This transition is also apparent in ERA-I, occurring between 1999 and 2001. In earlier years, ERA-I and MERRA-2 bracket the ensemble with differences up to ±3 K, which in later years drop to near 0.5 K. The SH CFSR/CFSv2 and JRA-55 minimum temperatures tend to reside between those of MERRA-2 and ERA-I and are generally close to the REM.
In particular, the JRA-55 differences are marked as not statistically significant (at the 99 % confidence level, as described in Sect. 2.2) for many levels and years throughout the reanalysis period. The improvements after 1998 are largest at higher levels (where the differences and standard deviations are themselves largest), becoming less prominent, and less sudden, below about 50 hPa. MERRA-2 shows a change in sign of the differences in the upper levels (∼ 20-10 hPa). The overall convergence of the reanalyses after 1998-1999 is also seen as pronounced discontinuities in the standard deviations of the differences from the REM for ERA-I, JRA-55, and MERRA-2, with values frequently over 2 K before 1999 typically decreasing to below ∼ 0.8 K thereafter. The improvement is less evident in CFSR/CFSv2 with standard deviations in 1998. Investigations in progress (Long et al., 2018) show that both MERRA-2 and ERA-I temperatures in the SH polar stratosphere have oscillations of up to about 3 K that are in opposite directions, leading to the structure of the differences seen here. (Note that the absence of oscillations in the other reanalyses does not imply better agreement with sondes; Long et al., 2018.) After 2000, both reanalyses show slightly positive (and, in the case of MERRA-2, largely statistically insignificant) differences from the REM at most pressure levels. CFSR/CFSv2 shows mostly positive differences between 1979 and 1986; afterward, the differences are primarily slightly negative at most of the pressure levels shown.
While the main sources of stratospheric information for all the reanalyses before 1998 are the SSU and MSU instruments, different reanalyses use different radiative transfer models to assimilate them and apply bias correction differently (Wright et al., 2018). It is particularly difficult to speculate about changes in CFSR/CFSv2, since it has multiple discontinuities and biases related to stitching together execution streams and applying a bias correction in a model with a warm bias (Long et al., 2017). Thus, while we cannot pin down particular changes that are associated with the differences among the reanalyses prior to the introduction of ATOVS data, there are numerous factors that could contribute to this behavior.
Average differences in minimum temperatures in the NH (Fig. 5) show more complicated patterns of changes over the years than those seen in the SH. The differences are much smaller throughout the 38-year period, with maximum absolute differences near 1.5 K at the highest levels shown (mainly in the period from about 1994 to 2004), and more frequent years/levels where the average differences are not statistically significant. From roughly 10 to 25 hPa, the standard deviations do decrease from above ∼ 1 K to less than ∼ 0.75 K after around 1999, though (as was the case in the SH) they remain larger in CFSR/CFSv2 than in the other reanalyses. While there are indications of changes around 1999, particularly in ERA-I and MERRA-2, they are less abrupt and of smaller magnitudes than those in the SH, and it is not as clear that there is a uniform trend towards better agreement. As in the SH, the CFSR/CFSv2 differences are primarily positive before 1987 and negative afterwards. ERA-I shows mostly positive differences except near 10 hPa. The opposite is true for JRA-55, except for the period between 1998 and 2006 when the differences are near zero, but slightly positive (and many of them not statistically significant), at pressures greater than about 30 hPa. Similar to the SH case, MERRA-2 exhibits a layered structure of differences prior to 1998: positive between roughly 60 and 30 hPa, and negative outside of this layer. After 1998, the MERRA-2 differences are mainly positive, except at pressures greater than about 50 hPa, where the differences gradually change from negative to positive between 2005 and 2010, except at the lowest levels.
Figure 6 shows the REM climatological values of the area with temperatures below the NAT PSC threshold (ANAT) for the NH and SH winter seasons. As expected, these echo the patterns of minimum temperatures seen in Fig. 3, with the largest areas in the NH in early January, and in the SH in middle to late July. The great variability in the NH (see the grey envelopes in the line plots) results in the largest values being well above the climatological average, about 7 %-8 % of a hemisphere, but still much lower than the largest average values in the SH of over 10 % of a hemisphere.
Note that comparing differences in NH ANAT among the reanalyses is more difficult than doing so for the SH or for the other NH diagnostics. Because there is significant interannual variability in the onset, termination, and magnitudes of low temperatures in the NH (see both Figs. 3 and 6), there are many NH winters with relatively few days having temperatures below TNAT and thus many days with NH ANAT being zero. Thus, comparing differences among the reanalyses for the full DJFM time period can often be unfairly biased by the high occurrence of zeros, which artificially decreases the average differences and standard deviations. To allay this issue such that we fairly compare NH ANAT, we modify our analysis procedure as follows: we use time series of the REM NH ANAT in November through April on 30, 50, and 70 hPa to define approximate start and end dates for the periods having non-zero ANAT. We use 30 hPa to define the onset dates (because ANAT usually first becomes non-zero around this level; e.g., Fig. 6a), and 50 or 70 hPa to define the termination dates. More specifically, we define the onset dates for each year as the first day at 30 hPa having non-zero ANAT, and the termination dates as the latest day chosen by either 50 or 70 hPa having non-zero ANAT; both 50 and 70 hPa are used because termination most often happens latest around 70 hPa as seen in Fig. 6a, but in some winters it happens later around 50 hPa. This process gives us individual "NAT seasons" between 1979/1980 and 2016/2017; these have a median length of 85 days, with the minimum and maximum number of days being 40 and 126, respectively. We then use these truncated time series to define the average differences and standard deviations thereof. This modifies the bootstrapping procedure described in Sect. 2.2.4; we still perform 2 × 10⁵ stationary bootstraps for each year, but because the lengths of the time series vary, we also vary the expected block size for each year by specifying them as the nearest integer to the cube root of the time series lengths plus a constant offset of +5 (which ranges from 8 to 10 days for time series lengths between 40 and 126 days). As was found for the regular bootstrapping procedure, using different expected block lengths with offsets between 0 (3 to 5 days) and 10 (13 to 15 days) had very little effect on the statistical significance results.
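The NAT-season truncation and the length-dependent block size described above can be sketched as follows. The function names and the input layout (one REM ANAT time series per pressure level, covering November through April) are assumptions made for this example.

```python
def nat_season(a_nat_30, a_nat_50, a_nat_70):
    """Return (onset, termination) day indices of the NAT season, or None.

    Onset is the first day with non-zero ANAT at 30 hPa; termination is
    the latest day with non-zero ANAT at either 50 or 70 hPa.
    """
    onset = next((i for i, a in enumerate(a_nat_30) if a > 0), None)
    last_50 = max((i for i, a in enumerate(a_nat_50) if a > 0), default=None)
    last_70 = max((i for i, a in enumerate(a_nat_70) if a > 0), default=None)
    ends = [d for d in (last_50, last_70) if d is not None]
    if onset is None or not ends:
        return None
    return onset, max(ends)

def expected_block_length(n_days, offset=5):
    """Nearest integer to the cube root of the series length, plus offset."""
    return round(n_days ** (1.0 / 3.0)) + offset
```

With the default offset of +5, series of 40, 85, and 126 days give expected block lengths of 8, 9, and 10 days, matching the range quoted above.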
Figure 7 shows ANAT differences from the REM for the SH winter seasons. There is a very apparent sudden decrease in the seasonal standard deviations of the differences at levels above ∼ 25 hPa after 1998, similar to but much more pronounced than in the case of minimum temperatures. The 1998-1999 boundary is less obvious in the average differences for CFSR/CFSv2 and JRA-55 but is apparent in ERA-I and MERRA-2. By these metrics, all four reanalyses converge toward better agreement following the TOVS/ATOVS transition. The patterns of differences largely mirror (in an opposite sense) the patterns shown in Fig. 4; that is, there tend to be positive/negative differences from the REM in ANAT wherever there are negative/positive differences from the REM in minimum temperatures. ERA-I and MERRA-2 display layered difference structures prior to 1998; these layers of positive and negative differences are separated by approximately the 30 and 70 hPa pressure levels. As in the case of the minimum temperatures, the layered structures are more persistent in MERRA-2, extending between 1980 and 1998, whereas the one in ERA-I becomes apparent after 1986. JRA-55 and CFSR/CFSv2 are more often closer to the REM in terms of both the mean differences and the standard deviations. For CFSR/CFSv2, at pressures greater than 20 hPa, the differences are mostly negative (smaller ANAT) prior to 1986, and mostly positive thereafter. No clear pattern is apparent for JRA-55, although after approximately 2005 each reanalysis generally has a uniform sign of the differences from the REM in the deep layer between 120 and 10 hPa. Overall, the largest mean differences tend to be at levels above (pressures lower than) ∼ 20 hPa prior to 1998, with mean differences as large as ±1.5 % of a hemisphere; at higher pressures in the lower stratosphere where the bulk of polar processing takes place, average differences are often well within ±1 % of a hemisphere during this time and within ±0.5 % of a hemisphere thereafter. Despite the better agreement in later years, many of the differences remain statistically significant after 1998; given the low standard deviations, these results indicate small but persistent (i.e., roughly constant) differences relative to the REM.
Differences in NH ANAT from the REM (Fig. 8) show more complex patterns than those in the SH and less of an obvious convergence toward better agreement after 1998, similar to the corresponding Tmin differences. The differences do decrease after about 2000, with most average differences being between ±0.25 % of a hemisphere. JRA-55 does show a narrow band of slightly larger positive differences continuing into the later years between about 30 and 15 hPa. MERRA-2 and ERA-I exhibit a pattern of opposing differences in this same layer between 1986 and 1998, but a layered structure of positive and negative differences at the lower levels is mostly only apparent in MERRA-2, consistent with the structure of the Tmin differences seen in Fig. 5. Overall, the differences are mostly negative in CFSR/CFSv2 and ERA-I, and positive in JRA-55 and MERRA-2, but there is a considerable dependence on time and pressure for all the reanalyses. As was the case for the SH, the standard deviations decrease over time with the largest values seen before 2001. There is considerable year-to-year variability in the standard deviations at the higher levels in the earlier period, with some years especially standing out (1986 in CFSR/CFSv2 and MERRA-2, 1996 in ERA-I and MERRA-2, and 2000 in ERA-I). These highest levels tend to be where ANAT is climatologically marginal (see Fig. 6).
Overall, the patterns of differences in ANAT qualitatively follow those in Tmin in both hemispheres: positive/negative differences in ANAT correspond to negative/positive differences in the minimum temperatures, as expected. However, the patterns of statistical significance are often different. For example, broad patches of largely statistically insignificant differences in Tmin in MERRA-2 and JRA-55 in both hemispheres after 1998 do not always translate into differences in ANAT marked as not significant. Furthermore, the largest (most positive) values of one diagnostic do not always yield the smallest (most negative) ones in the other, and vice versa. Even more strikingly, the patterns of standard deviations, while overall similar, do not exhibit a simple monotonic relationship with those in Tmin and generally display much more year-to-year variability before 2000. This is not unexpected as ANAT differences depend not only on overall temperature biases but also on the morphology of the fields (e.g., spatial patterns or gradients), which varies from year to year and, to

Vortex diagnostics
Figure 9 shows the NH and SH climatologies of REM MPVGs. The evolution of MPVGs is quite similar in both hemispheres, particularly above 500 K; the gradients in sPV gradually increase over time, reaching maxima in roughly mid-February in the NH and early October in the SH. These patterns largely reflect two effects: one is the seasonal cycle of the vortex building up strength and subsiding. The other is the build-up effect from wave breaking and mixing/erosion of PV in the surf zone (the region of low-magnitude PV outside the vortex, e.g., McIntyre and Palmer, 1984) over the season, which can act to sharpen the gradients of PV in the vortex edge region. Generally, MPVGs provide a measure of the strength of the vortex edge as a transport barrier. For simplicity, in the discussion of results below, we will refer to 1.0 × 10⁻⁶ s⁻¹ deg⁻¹ as 1 scaled PV gradient unit, or 1 PVGU.
The averages and standard deviations of differences from the REM SH MPVGs are shown in Fig. 10. Through about 1998-2000, ERA-I and JRA-55 show similar patterns of differences from the REM, with a band of near-zero (for JRA-55) or small negative (for ERA-I, magnitudes up to ∼ 1.5 PVGUs) differences that are usually not statistically significant (at the 99 % confidence level, as described in Sect. 2.2) below about 460 K, positive differences between about 460 and 660 K, and negative differences above. The ERA-I differences are generally not statistically significant between 580 and 750 K. In the same time period, MERRA-2 shows an approximately opposite pattern, with negative differences from about 460 to 660 K and positive differences above and below; from about 1995 to 1999, MERRA-2 shows large average differences up to about 3.5 PVGUs above 700 K. In contrast to the banded structures in the other reanalyses, CFSR/CFSv2 generally shows small-magnitude negative differences across the levels and period, except during the period including 1985 through 1996 above about 700 K. The seasonal standard deviations of the differences are relatively small (usually on the order of 0.5-1.5 PVGUs), suggesting that statistically significant differences in the reanalyses typically represent differences that are more systematic in nature for MPVGs at these levels and times. The standard deviations tend to increase with height, especially at levels in the middle stratosphere above about 660 K where the differences can exceed 2.5-3 PVGUs. CFSR/CFSv2 and JRA-55 generally show slightly lower standard deviations than ERA-I and MERRA-2, and MERRA-2 shows a cluster of years between about 1994 and 1998 with large standard deviations above 700 K. After about 1998, there is a noticeable shift toward better agreement in most regions, similar to that seen in the SH temperature diagnostics. CFSR/CFSv2 and JRA-55 do not show an obvious improvement below about 550 K but already had close agreement with the REM there. In ERA-I and MERRA-2, most regions show small (magnitude less than 1 PVGU) differences that are not statistically significant after 1998. This shift toward better agreement is also reflected in the standard deviations, which markedly decrease in all the reanalyses, especially at levels above about 580 K. Similar to the temperature diagnostics, the TOVS to ATOVS transition most likely played a large role in this shift, with differences in the handling of this transition and the addition of AIRS radiances in 2002 also expected to be significant factors.
Differences in NH MPVGs from the REM (Fig. 11) indicate that CFSR/CFSv2 generally has smaller, and JRA-55 larger, PV gradients than the REM at levels up through about 750 K. ERA-I and MERRA-2 show smaller and less systematic patterns of differences that typically are not statistically significant. ERA-I does show a small vertical region with significant positive differences from the REM between 520 and 580 K until about 2001, similar to its pattern for the SH but with overall smaller differences. The standard deviations of the differences are largely consistent among the reanalyses; other than a few standout cases in ERA-I and MERRA-2 (1994/1995 in MERRA-2 and 2000/2001 in both), the standard deviations tend to increase consistently with height from less than 0.8 PVGUs at the lowest levels, to about 1.5+ PVGUs at the highest levels. There is some indication of convergence toward better agreement in MPVGs after roughly 2001 in MERRA-2 and ERA-I (when the reanalyses, except JRA-55, began assimilating AIRS radiances; Fig. 1), though most differences from the REM for these two reanalyses were not statistically significant even in the earlier years. No qualitative improvement in agreement with the REM is apparent in the CFSR/CFSv2 or JRA-55 differences, but the standard deviations of the differences do seem to decrease slightly above about 580 K for years after 2001.
Figure 12 shows the REM climatologies of SVA for both hemispheres. As was the case for MPVGs, the seasonal patterns of SVA for both hemispheres are similar. In this case, the patterns are largely due to the lack of sunlight early in the winter season, which gradually returns later on. However, there are notable differences between the hemispheres, particularly that SVA tends to be smaller in the NH; this is because the NH polar vortex is almost always smaller than its SH counterpart. The NH also shows relatively larger values in early winter above about 650 K, resulting from the NH vortex being more often disturbed and shifted to lower latitudes within sunlight. During individual winters, and given sufficiently low temperatures, the amount of vortex air exposed to sunlight at any time is generally indicative of the amount of air where ozone depletion can take place.
The averages and standard deviations of differences of SH SVA from the REM are shown in Fig. 13. There are some persistent patterns of differences among the reanalyses; JRA-55 SVA is consistently smaller than that from the REM between about 430 and 700 K and larger above and below. Above about 700 K, JRA-55 differences are generally not statistically significant through about 2003, after which each of the other reanalyses evaluated had started assimilating AIRS radiances (see Fig. 1 and Fujiwara et al., 2017). The other reanalyses generally show sandwiched structures of negative and positive differences: MERRA-2 (ERA-I) shows positive (negative) values between 430 and 520 K, with negative (positive) values above and below. CFSR/CFSv2 shows positive values between about 490 and 660 K, and small (often not statistically significant) negative values at higher and lower levels; in this case, the band of positive differences extends to higher levels after 1998. In the top several levels (approximately 750 to 850 K), agreement of the reanalyses with the REM appears to degrade starting about 1999-2000: ERA-I and MERRA-2 show a decrease in the number of values that are not significantly different from zero, while JRA-55 shows a similar decrease starting around 2003-2004. MERRA-2 differences increase in magnitude in this region and time period, and those in JRA-55 change from negative to positive, while ERA-I shows increased differences, near/over 2.5 %, at the highest levels in 1999-2001. CFSR/CFSv2 shows an increase in the significance of the differences at these levels after 2010 (the time of the CFSR to CFSv2 transition). The standard deviations of the differences are the highest at levels above 660 K where they are often above 1 % of a hemisphere. These are more pronounced in ERA-I, which shows standard deviations often ranging above 1 %, with some years reaching over 2 % above 660 K. Some slightly larger (0.4 % to 0.8 %) standard deviations are also seen at the lowest levels (390 and 410 K), which are around the top of the subvortex region for the SH. After 2001, the standard deviations of differences are generally less than 0.4 % of a hemisphere at most of the levels between 390 and 750 K in all the reanalyses, suggesting a small shift towards more consistent SVA differences compared to the REM among the reanalyses. Examination of the reanalyses' differences in vortex area from those in the REM reveals that they are nearly identical to those for SVA, indicating that the differences are largely dominated by differences in the area enclosed within the vortex edge contours.
The patterns of averages and standard deviations of differences in SVA for the NH (Fig. 14) are quite different from those in the SH: MERRA-2 and ERA-I show overall positive differences (except for narrow bands of small differences at the highest levels that are not significant), while JRA-55 shows overall negative values. CFSR/CFSv2 shows negative values below about 520 K and at 800 and 850 K, with positive values in between. There is no obvious indication of a decrease in the magnitude of the differences over the period compared. The standard deviations of NH SVA differences from the REM generally look consistent between the reanalyses, with the largest values greater than 0.7 %-1.2 % of a hemisphere usually confined to a band of levels between 700 and 850 K. At lower levels, however, the standard deviations are quite small throughout the period, generally on the order of 0.5 % of a hemisphere or less; CFSR/CFSv2 shows slightly higher values below about 460 K. As was the case for the SH, the SVA differences are dominated by differences in total vortex area among the reanalyses. Thus, while there is no consistent change in agreement over the years, our results indicate persistent differences in the size of the contours used to define the vortex edges and hence some persistent differences in the isentropic PV fields (reflected in differences in the PV values at which the maximum PV gradients are located).

Derived vortex-temperature diagnostics
The diagnostics shown in the following subsection are derived from the temperature and/or vortex diagnostics shown in the previous two subsections. Figure 15 shows the winter mean volume of temperatures below T_ice in the SH expressed as a fraction of the vortex volume, calculated for the central PSC threshold and the ±1 K sensitivity thresholds (see Sect. 2.2.2). Keeping in mind that T_ice was estimated assuming nominal pressure levels for isentropic levels (see Sect. 2.2.2), which can result in significant overestimates of areas/volumes, Fig. 15 shows that the volume fraction of cold air is relatively constant from year to year. Generally, the fractions of the vortex are between 0.20 and 0.30 each year, with sensitivities to the ice threshold offsets often less than ±0.05. During the winters of 1979 through 1986, there is a very persistent pattern with CFSR/CFSv2 having the lowest, and ERA-I having the highest, cold volume fractions of the vortex. During this period, CFSR/CFSv2 vortex fractions can individually be lower than the other reanalyses by nearly 0.025 to 0.03. These same years also have the largest inter-reanalysis spreads, with differences between the largest and smallest vortex fractions often greater than 0.04. For nearly all years between 1996 and 2016, ERA-I tends to have the lowest volume fractions, ranging from roughly 0.01 to 0.02 lower than the other reanalyses. For the years from 2007 to 2015, JRA-55 consistently has the highest volume fractions, but in these cases the inter-reanalysis differences are generally quite small. Differences among the reanalyses in the temperature threshold sensitivity envelopes are quite small, which indicates that there are not any persistent differences in temperature gradients among the reanalyses.
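The threshold sensitivity test described above can be illustrated with a minimal sketch. It assumes a toy list of vortex grid cells, each with a temperature and a volume, and a single threshold temperature; the function names, the toy values, and the use of one threshold for all levels are illustrative assumptions, not the procedure of Sect. 2.2.2.

```python
def cold_volume_fraction(cells, threshold_k):
    """Fraction of the total vortex volume with temperature below threshold_k.

    cells: list of (temperature_K, volume) pairs for grid cells inside the vortex.
    """
    total = sum(vol for _, vol in cells)
    cold = sum(vol for temp, vol in cells if temp < threshold_k)
    return cold / total


def threshold_sensitivity(cells, central_k, offset_k=1.0):
    """Return fractions for the thresholds central - offset, central, central + offset."""
    return tuple(
        cold_volume_fraction(cells, central_k + delta)
        for delta in (-offset_k, 0.0, offset_k)
    )


# Hypothetical vortex cells: (temperature in K, volume in arbitrary units).
toy_cells = [
    (184.0, 1.0), (187.5, 1.0), (188.5, 1.0),
    (189.5, 1.0), (195.0, 3.0), (200.0, 3.0),
]

# Hypothetical central threshold of 188 K with the +/-1 K offsets.
low, central, high = threshold_sensitivity(toy_cells, central_k=188.0)
print(low, central, high)
```

The width of the interval between the low and high values plays the role of the sensitivity envelope: it is wide when much of the vortex volume lies within 1 K of the threshold and narrow otherwise.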
Potential polar processing volumes in the NH are much lower and much more variable than those in the SH. The NH fraction of vortex volume below T_NAT (Fig. 16) shows values in the colder years that are comparable to those below T_ice in the SH. The lowest values are seen in 1984/1985, 1998/1999, 2001/2002, and 2003/2004, which are all years with very early (mid-December to the beginning of January) major SSWs that profoundly affected the entire stratosphere, including strongly disrupting the lower stratospheric vortex (e.g., Manney et al., 1999, 2005b; Naujokat et al., 2002); in these years, the fractional volumes are near 0.03, as opposed to nearly 0.30 in the coldest years (e.g., 1996, 2011, 2016).
The range of values from the PSC threshold temperature sensitivity tests varies from about ±0.02 in the warmest years up to over ±0.05 in the coldest years, with differences between reanalyses indicating some differences in horizontal temperature gradients (especially in, e.g., 1997, 2009, and 2011). The ranges of central values among the reanalyses do not show a monotonic progression towards smaller values over time, so any trend towards better agreement is masked by the larger influence of specific interannually varying conditions that affect the PSC volumes. We note that the results of intercomparisons of V_PSC/V_Vort outlined above are not very sensitive to the vortex volumes. When comparing these V_PSC/V_Vort results having V_Vort determined from the reanalyses' individual vortex areas to V_PSC/V_Vort calculated using the REM V_Vort (i.e., the reanalyses' V_PSC divided by the REM V_Vort), the reanalysis magnitudes and orderings remain generally consistent. However, using the REM V_Vort does tend to decrease the inter-reanalysis spreads and the sensitivities to the ±1 K PSC temperature thresholds.
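The comparison between the two normalizations of V_PSC can be sketched as follows. The volumes are hypothetical numbers chosen only for illustration, the REM V_Vort is assumed to be the unweighted mean of the four reanalysis vortex volumes, and the function names are not from the paper.

```python
from statistics import mean

# Hypothetical winter-mean volumes (arbitrary units) for a single season.
v_psc = {"CFSR/CFSv2": 2.0, "ERA-I": 1.9, "JRA-55": 2.3, "MERRA-2": 2.1}
v_vort = {"CFSR/CFSv2": 9.9, "ERA-I": 10.2, "JRA-55": 9.8, "MERRA-2": 10.1}


def fractions_own(v_psc, v_vort):
    """V_PSC / V_Vort using each reanalysis's own vortex volume."""
    return {name: v_psc[name] / v_vort[name] for name in v_psc}


def fractions_rem(v_psc, v_vort):
    """V_PSC / V_Vort using the ensemble-mean vortex volume for every reanalysis."""
    rem_vort = mean(v_vort.values())
    return {name: v_psc[name] / rem_vort for name in v_psc}


def spread(fractions):
    """Difference between the largest and smallest fraction."""
    return max(fractions.values()) - min(fractions.values())


own = fractions_own(v_psc, v_vort)
rem = fractions_rem(v_psc, v_vort)
print(sorted(own, key=own.get), round(spread(own), 4))
print(sorted(rem, key=rem.get), round(spread(rem), 4))
```

In this constructed example the ordering of the reanalyses is the same under both normalizations and the spread is smaller with the common denominator, which mirrors the behavior reported above; other input values need not behave the same way.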
Figure 18 shows that the NH vortex breakup is much more variable from year to year than that in the SH. Unlike the SH vortex, the NH vortex can decay nearly simultaneously over a wide range of levels (e.g., 1984 and 1999), or it can decay earlier at some low levels and later at higher levels (e.g., 2001 and 2009). Such variability in vortex decay is due to large variability induced by SSW disturbances to the vortex, as well as polar night jet oscillation events in which the middle and upper stratospheric vortex rapidly reforms following some major and minor disturbances (e.g., Hitchcock et al., 2013; Lawrence and Manney, 2018, and references therein). The reanalyses' differences from the REM are generally quite small; over 90 % of the differences are between ±4 days. With the exception of JRA-55, the reanalyses show no predominant patterns of differences (e.g., positive or negative bands). JRA-55 does seem to have a slightly more pronounced band of negative differences from about 620 to 700 K (with a band of small but positive differences above), in the same region where the JRA-55 vortex area differences tend to be the most negative (not shown directly but consistent with Fig. 14). There are also several outlier cases with absolute differences from the REM greater than 20 days (denoted by the white X symbols). Most of these cases are marginal scenarios when either the REM or the reanalyses' vortex areas oscillate above and below the specified 1 % of a hemisphere threshold at some levels, causing our algorithm to pick disparate decay dates. Many of these outlier cases occur at different singular levels and years in the reanalyses, but 460 K 2003/2004 does show up as a negative outlier in both CFSR/CFSv2 and JRA-55, while 660 K 2005/2006 shows up as a positive outlier in both CFSR/CFSv2 and ERA-I.
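The sensitivity of decay dates to marginal vortex areas can be illustrated with a small sketch. The 1 % of a hemisphere threshold is taken from the text; the persistence criterion (area must stay below the threshold for a fixed number of consecutive days), the function name, and the toy series are assumptions made for illustration and are not the decay date definition used in the paper.

```python
def decay_day(area_pct, threshold=1.0, persist=3):
    """Index of the first day starting a run of `persist` days below threshold.

    area_pct: daily vortex area in percent of a hemisphere.
    Returns None if no such run exists.
    """
    run = 0
    for day, area in enumerate(area_pct):
        run = run + 1 if area < threshold else 0
        if run == persist:
            return day - persist + 1
    return None


# Hypothetical series: the ensemble mean decays smoothly, while one
# reanalysis oscillates around the threshold before finally decaying.
rem_area = [3.0, 2.5, 2.0, 1.5, 0.8, 0.6, 0.4, 0.2, 0.1, 0.0, 0.0, 0.0]
rean_area = [3.0, 2.5, 2.0, 1.4, 0.9, 1.1, 0.9, 1.2, 0.8, 0.5, 0.3, 0.1]

difference_days = decay_day(rean_area) - decay_day(rem_area)
print(decay_day(rem_area), decay_day(rean_area), difference_days)
```

Although the two toy series never differ by more than about 1 % of a hemisphere, the oscillation about the threshold delays the diagnosed decay date, which is the kind of marginal behavior that produces the outlier cases described above.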

Conclusions
We have herein carried out an extensive intercomparison of diagnostics relevant to polar chemical processing among four recent full-input reanalyses, using the REM as a reference to compare CFSR/CFSv2, ERA-I, JRA-55, and MERRA-2. The diagnostics we compare are based on polar vortex and temperature conditions in the lower-to-middle stratosphere, and comprise measures of PSC formation and chlorine activation based on temperatures; vortex size, strength, and sunlight exposure; and additional diagnostics derived from those directly obtained from temperatures and vortex characteristics. They thus provide a thorough assessment of the reanalyses' representation of the potential for polar processing and ozone loss in both hemispheres. The main findings of our analyses are summarized in the following subsection.

Summary
Temperature diagnostics related to polar processing converge towards better agreement in the SH over the period compared (from 1979 to present). In the period prior to ∼ 1999, reanalysis differences in minimum temperatures compared to the REM could be as large as ±3 K, particularly at pressures below 30 hPa; in years after, reanalysis minimum temperature differences from the REM decrease to within roughly ±0.5 K throughout the 120-10 hPa column. The reanalysis differences from the REM for SH areas with temperatures below the NAT PSC threshold (A_NAT) show a similar and consistent shift, with differences among the reanalyses being as large as ±1.5 % of a hemisphere prior to ∼ 1999 but within ±0.5 % of a hemisphere thereafter. This shift toward better agreement in ∼ 1999 is seen both as a sudden decrease among the winter-averaged differences from the REM and as a sudden decrease in standard deviations of reanalysis minus REM differences, which is consistent with previous studies (e.g., Long et al., 2017) that show large improvements in zonal mean temperatures after the reanalyses transition from assimilating TOVS observations to including ATOVS. In the NH, the agreement among the reanalyses before ∼ 1999 was already much closer (generally within ±1.5 K from the REM for minimum temperatures but often within a much smaller margin at pressures greater than 30 hPa), but the average differences and standard deviations also decreased to a lesser extent thereafter. The structure of average differences, particularly before ∼ 1999, is varied among the reanalyses. MERRA-2 and ERA-I generally showed banded structures of average differences from the REM that changed signs with height prior to ∼ 1999. CFSR/CFSv2 tended to have average differences of the same sign throughout the 120-10 hPa column up until 1987, after which the differences switched signs (as in the case of minimum temperatures) or became more varied (as in the case of A_NAT). The structure of average differences from the
REM for JRA-55 was generally a bit more complicated than that in the other reanalyses but did show that the signs of the differences changed in the lower stratosphere between 100 and 30 hPa. The standard deviations of differences from the REM were quite consistent among the reanalyses; they increased with height (not necessarily monotonically), particularly at pressures lower than 30 hPa.
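The winter-averaged differences and standard deviations referred to above can be sketched for a single level as follows. The sketch assumes the REM is the unweighted mean of the four reanalyses at each time step; the temperatures are hypothetical and the function names are illustrative.

```python
from statistics import mean, stdev


def ensemble_mean(series_by_name):
    """Unweighted mean across reanalyses at each time step."""
    return [mean(values) for values in zip(*series_by_name.values())]


def difference_stats(series_by_name):
    """Return {name: (mean difference, std of differences)} relative to the REM."""
    rem = ensemble_mean(series_by_name)
    stats = {}
    for name, series in series_by_name.items():
        diffs = [value - ref for value, ref in zip(series, rem)]
        stats[name] = (mean(diffs), stdev(diffs))
    return stats


# Hypothetical daily minimum temperatures (K) at one pressure level.
toy_tmin = {
    "CFSR/CFSv2": [184.0, 183.5, 183.0, 184.5],
    "ERA-I": [186.0, 185.0, 185.5, 186.5],
    "JRA-55": [185.0, 184.5, 184.0, 185.0],
    "MERRA-2": [185.0, 185.0, 184.5, 186.0],
}

stats = difference_stats(toy_tmin)
for name, (avg, sd) in stats.items():
    print(f"{name}: mean diff {avg:+.2f} K, std {sd:.2f} K")
```

Because the reference is the mean of the same four datasets, the average differences sum to zero across the reanalyses; a persistent offset in one reanalysis therefore appears with the opposite sign, at reduced amplitude, in the others.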
Differences from the REM among the reanalyses for SH MPVGs showed a similar convergence toward better agreement as did the temperature diagnostics. Differences from the REM were within roughly ±2.5 PVGUs prior to ∼ 1999 but within ±1 PVGU after. The standard deviations of the SH differences increased with height to values that were commonly above 2 PVGUs, particularly at isentropic levels above 600 K; after 1999, these standard deviations decreased in magnitude, but the pattern of values increasing with height remained consistent. In these cases the differences from the REM for SH MPVGs were consistently negative across all years for CFSR/CFSv2, while those for JRA-55 were consistently positive. ERA-I and MERRA-2 had banded structures of differences similar to those in the SH temperature diagnostics that mostly disappeared after ∼ 1999. In the case of NH MPVGs, differences remained largely constant over time and potential temperature levels, generally being within ±1.5 PVGUs of the REM with standard deviations that increased with height. Here, again, CFSR/CFSv2 had average differences from the REM that were consistently negative, while JRA-55 was consistently positive. Differences from the REM in sunlit vortex area across the reanalyses in both hemispheres remained relatively constant over time, and they overall followed differences in the raw vortex areas.
In the SH, all the reanalyses showed similar magnitudes and temperature threshold sensitivities in the winter mean volume of air (as a fraction of vortex volume) below ice PSC thresholds. In the NH, the winter mean volume of air below NAT PSC thresholds varied much more from year to year, and the differences among the reanalyses and sensitivities to the temperature offsets were much larger percentages of the actual derived values. These characteristics are in many ways to be expected, since SH winters are much more consistent from year to year than NH winters; thus, even though the individual temperature polar processing diagnostics showed much larger average differences and standard deviations in the SH, the aggregation of the full winter seasons in the winter mean V_PSC/V_Vort led to more consistent results. For the vortex decay dates, the reanalyses' differences from the REM generally followed their differences from the REM in vortex area (and as a result, sunlit vortex area) in that wherever the reanalyses had larger/smaller vortex areas, they also had later/earlier vortex decay dates. The agreement among the reanalyses for vortex decay dates was generally best in the NH, despite there being some marginal cases with large differences due to vortex disturbances.

Implications
The results shown herein illustrate some implications that may be expected for polar processing studies using reanalysis temperatures and PV in the stratosphere. These implications will generally depend on the hemisphere in question and the details of the study. For example, the derived diagnostics in Sect. 3.3 demonstrate that in the aggregate most SH winters in the satellite era are quite similar, and that the sensitivities to different PSC temperature thresholds are consistent among the reanalyses. However, the differences shown in Sect. 3.1 indicate that differences can depend strongly on the levels and years examined, especially prior to 1999 before the assimilation of AMSU data in the reanalyses. Thus, studies that discuss SH winter conditions in aggregate are less likely to be affected than detailed studies (e.g., those making use of nudged and specified dynamics models, and/or Lagrangian transport models), whose conclusions could be significantly altered by the details of how, when, and where the temperatures differ among the reanalyses. In contrast, for the NH, Sect. 3.1 showed that temperature diagnostic differences were relatively small among the reanalyses, but the results in Sect. 3.3 showed that the winter-mean derived diagnostics vary widely between reanalyses in some cases, and can be highly sensitive to the specific temperature thresholds used. Clearly, polar processing potential is often much smaller in the NH than in the SH, and thus conclusions based on the often marginal conditions of the NH are much more likely to be affected by small differences among the reanalyses. Thus, both detailed and aggregate studies of NH polar processing could in some cases be markedly affected by differences among the reanalyses. However, all of the reanalyses do show similar interannual variations among the derived diagnostics, and thus for purposes of putting some NH winters into the context of others (e.g., comparing how cold some are relative to others), any of the reanalyses would give similar results. The extent to which different kinds of studies of NH and/or SH polar processing may be affected is beyond the scope of this paper, but work is in progress within S-RIP to explore some of these implications.
It is difficult to assess the potential implications of differences among the reanalyses in the vortex diagnostics. Since MERRA-2 is the only reanalysis that provides PV fields on its model levels, we have applied the strategy we think other data users requiring PV on model levels would use, which was to derive PV from each reanalysis using their available model level products. Thus, it is important to recognize that the vortex diagnostics used herein are derived from PV fields that are calculated from the different reanalyses in different ways, which makes it problematic to assess whether, and to what extent, the reanalysis differences are due to differences in such factors as the calculations, dynamics, and vertical and/or horizontal resolution. Because MERRA-2 includes PV calculated within its DAS, we generally consider MERRA-2 PV to be more consistent and complete than the PV fields derived from the other reanalyses' model level data. Despite these complicating factors, our treating each of the reanalyses equally (same procedure for calculating MPVGs, and using each of the reanalyses' climatological MPVGs to define vortex edges) allows us to draw some useful conclusions. While there were some small indications of convergence toward better agreement in MPVGs for both hemispheres among some reanalyses (see Sect. 3.2), there were primarily persistent differences in SVA. Given the combination of differences in MPVGs and SVA (and raw vortex area), the results shown here indicate that there are some inherent differences in the PV fields that lead to somewhat disparate equivalent latitude mappings, which in some cases could alter conclusions drawn about transport barriers and trace gases in equivalent latitude coordinates. It is also possible that results for the SH were contaminated by the presence of double-peaked (bifurcated) PV gradients (e.g., Conway et al., 2018) that could have different magnitudes or structures among the reanalyses.
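To give a sense of scale for the equivalent latitude mappings mentioned above, the following sketch converts an enclosed area, expressed as a fraction of a hemisphere, to the latitude of a pole-centered circle enclosing the same area. It uses the geometric fact that the fraction of a hemisphere poleward of latitude φ is 1 − sin φ; the function name and the example areas are illustrative assumptions.

```python
import math


def equivalent_latitude_deg(area_fraction):
    """Latitude (degrees) of the polar cap whose area is the given fraction of a hemisphere."""
    if not 0.0 <= area_fraction <= 1.0:
        raise ValueError("area fraction must lie between 0 and 1")
    return math.degrees(math.asin(1.0 - area_fraction))


# Hypothetical vortex areas of 10 % and 11 % of a hemisphere.
edge_a = equivalent_latitude_deg(0.10)
edge_b = equivalent_latitude_deg(0.11)
print(round(edge_a, 2), round(edge_b, 2), round(edge_a - edge_b, 2))
```

For these example values, a difference of 1 % of a hemisphere in enclosed area corresponds to a shift of roughly 1.3° in equivalent latitude, so persistent area differences of the magnitude discussed here can noticeably displace features mapped in equivalent latitude coordinates.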

Recommendations
All of the reanalyses used here represent very large improvements over those commonly used a decade ago, and with those improvements comes much closer agreement in the polar processing diagnostics presented here. The older reanalyses, especially ERA-40 and NCEP/NCAR and NCEP/DOE, have long been obsolete and are not recommended for studies focused on polar processing and the stratosphere in general (see Fujiwara et al., 2017, and references therein). Any of the modern reanalyses evaluated herein are much better choices for polar processing studies as they all provide more accurate and similar representations of interannual variability in polar processing diagnostics in both hemispheres.
In general, it is always better to use more than one reanalysis, even for studies involving recent winters where it can reasonably be expected that differences among the reanalyses will be small. One of the best ways to express uncertainty in results is using multiple reanalyses, and explicitly showing and discussing how they agree/disagree, and whether any differences affect the findings; this is especially important for diagnostics that cannot be compared with observations. As previously shown by Lawrence et al. (2015) for polar processing diagnostics, and Long et al. (2017) for zonal means, our intercomparisons show that there are substantial (especially large in the SH) changes in temperature-based diagnostics that are clearly related to changes in assimilated data inputs among the reanalyses. Since many of the major changes in data inputs are made at approximately the same time in each reanalysis, the agreement or lack thereof between the reanalyses does not provide the information to assess the degree to which these changes are caused by changes to the assimilated observations. We thus emphasize here that reanalysis temperatures, especially in the Antarctic, are not generally suitable for assessment of trends in temperature-based diagnostics; use of reanalyses in trend studies should be regarded with skepticism and only attempted after rigorous assessment of the relationships of temperature changes to observations assimilated (which, to our knowledge, has not been done for most of the reanalyses considered here); were such a study to be done, agreement among multiple reanalyses would in addition be required to consider any trends robust.
When using multiple reanalyses, it is important to treat them as fairly and equally as possible to reduce the uncertainty in sources of differences. For example, using one reanalysis with data on model levels and another one with data on pressure levels is not recommended. It is also important to be clear whether and how fields/quantities are derived from the products provided by the reanalyses, as we have done herein with PV. Until and unless reanalysis centers provide standard sets of products on standardized isobaric and isentropic levels, users of reanalysis data will generally be best served by using model data to vertically interpolate and derive fields as needed. Numerous evaluations of reanalyses for S-RIP are finding, as we have here, that it would be valuable to have PV on model levels available in future reanalyses.
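As a rough illustration of deriving fields on isentropic levels from level data, the sketch below computes potential temperature from temperature and pressure and then interpolates a field to a target isentropic surface. The toy profile, the function names, and the choice of linear interpolation in potential temperature are assumptions made for illustration; they do not represent the interpolation procedure used in this study.

```python
KAPPA = 2.0 / 7.0  # R/cp for dry air, approximately 0.286


def potential_temperature(temp_k, pressure_hpa, p0_hpa=1000.0):
    """Potential temperature (K) from temperature (K) and pressure (hPa)."""
    return temp_k * (p0_hpa / pressure_hpa) ** KAPPA


def interp_to_theta(theta_levels, field_levels, target_theta):
    """Linearly interpolate a field to target_theta.

    theta_levels must increase monotonically (i.e., levels ordered upward).
    """
    for i in range(len(theta_levels) - 1):
        lo, hi = theta_levels[i], theta_levels[i + 1]
        if lo <= target_theta <= hi:
            weight = (target_theta - lo) / (hi - lo)
            return field_levels[i] + weight * (field_levels[i + 1] - field_levels[i])
    raise ValueError("target_theta lies outside the profile")


# Hypothetical lower stratospheric profile, ordered from high to low pressure.
pressures = [100.0, 70.0, 50.0, 30.0]   # hPa
temps = [200.0, 198.0, 200.0, 205.0]    # K
thetas = [potential_temperature(t, p) for t, p in zip(temps, pressures)]

temp_at_460k = interp_to_theta(thetas, temps, 460.0)
print([round(th, 1) for th in thetas], round(temp_at_460k, 2))
```

Applying the same derivation to every reanalysis, starting from the same kind of input levels, is one way to follow the recommendation above that the datasets be treated as equally as possible.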
With regard to more specific polar processing applications, we also recommend that trends, correlations, and/or other similar analyses of diagnostics that assess low temperatures aggregated over winter months, seasons, and/or vertical levels in the NH polar region be performed with caution. Figure 16 demonstrates that there is non-negligible interannual variability in the sensitivity to the specific temperature values chosen to represent NAT PSC thresholds that are used to calculate the volume of air below NAT thresholds in the NH, especially relative to the SH (Fig. 15) in which we used the lower ice PSC thresholds. The vortex diagnostics in Sect. 3.2 show some differences that appear to be related to biases between PV in the reanalyses, arguing for careful assessment of the sensitivity of vortex diagnostics to exact PV values. Because many of the diagnostics that are most informative about lower stratospheric polar chemical processing cannot be readily validated by comparison with data, the comparison of reanalyses is a powerful tool for assessing robustness and uncertainty in these diagnostics.

Figure 2. Potential temperature profiles of the reanalysis vortex edges determined from climatological maximum PV gradients expressed as scaled PV for the NH (a) and the SH (b). Green, blue, purple, and red indicate CFSR/CFSv2, ERA-I, JRA-55, and MERRA-2, respectively.

Figure 3. Time series of reanalysis ensemble mean (REM) (a, c, e) Arctic and (b, d, f) Antarctic climatological (1979/1980 through 2016/2017 in the NH, and 1979 through 2016 in the SH) minimum high-latitude (poleward of 40° latitude) temperatures; panels (a) and (b) show contour plots (blue/purple shades represent low temperatures and reds high temperatures). The horizontal black lines mark the 30 hPa and 70 hPa pressure levels for which line plots are shown in panels (c) and (d) and in (e) and (f), respectively. The shading in the line plots shows the range of REM values on each date, and the black line shows the average values. Note that the time period shown is longer in the SH than in the NH. The same color range is used for each hemisphere, with values from 180 to 220 K shown (blues to reds; all values below/above 180/220 K are shown in the deepest blue/red).

Figure 4. SH winter season (MJJASO) (a, c, e, g) averages and (b, d, f, h) standard deviations of minimum temperature differences for each reanalysis from the reanalysis ensemble mean (REM) as a function of year and pressure for the 1979 through 2015 winters, concatenated from the individual years into pixel plots as described in the text. Columns of grey pixels indicate years with no data. Pixels with X symbols inside indicate years and levels where the differences from the REM are insignificant according to our bootstrapping analysis (see Sect. 2.2.4). Blues in the average difference panels show negative values (reanalysis less than the REM) and reds positive values (reanalysis greater than the REM); in the standard deviation panels, yellow/deep blue shades represent low/high standard deviations of the reanalysis differences, respectively.

Figure 5. As in Fig. 4 but for the NH winter seasons (DJFM) for 1979/1980 through 2015/2016. Note that different color ranges are used for the NH shown here than in Fig. 4 for the SH.

Figure 7. As in Fig. 4 but for area with temperatures below the NAT PSC threshold in the SH.

Figure 8. As in Fig. 7 but for the NH. See text explanation of date ranges used for the calculations.

Figure 9. As in Fig. 3 but for maximum sPV gradients. The color range in panels (a) and (b) is from 0 to 20 scaled PV gradient units (PVGUs) (blues to reds), with all values over 20 shown in the deepest red.

Figure 10. As in Fig. 4 but for maximum sPV gradients.

Figure 11. As in Fig. 10 but for the NH.

Figure 12. As in Fig. 9 but for sunlit vortex area. The color range in panels (a) and (b) is from 0 % to 13 % of a hemisphere (blues to reds), with all values over 13 % shown in the deepest red.

Figure 13. As in Fig. 4 but for SH sunlit vortex area.

Figure 14. As in Fig. 13 but for the NH.
The interannual variability is well represented in all of the reanalyses. The central values usually vary more between reanalyses in colder years; e.g., 1996 and 2011 stand out as showing wide ranges of about 0.045. Between 1992/1993 and 2016/2017, ERA-I tends to have the smallest vortex fractions. In contrast, JRA-55 tends to have the overall largest NAT vortex fractions, having the largest values for 32 of the 38 years, with many cases being noticeably offset from the other reanalyses. While many of the recent years show smaller ranges of central values than the early years, there is not a monotonic progression.

Figure 15. Winter means of the fraction of vortex volume between the 390 and 580 K isentropic surfaces with temperatures below T_ice in the SH (a, c), and range of values obtained for the ±1 K sensitivity tests (b, d). The colored bars show the range of values obtained from the tests of sensitivity to the PSC threshold temperature used (see Sect. 2.2.2), while the black dots show the value for the "central" threshold temperature. The winter mean is calculated over the full MJJASO period. For each year, the reanalyses are ordered from smallest central value on the left to largest central value on the right; this order is also given as a text string at the top of the column for each year. The numbers at the bottom of each year's column indicate the difference in winter mean fraction between the largest and smallest central values for the winter season (i.e., between the rightmost and leftmost black dots). In the range panels (b, d), the range about the central value (black dots in a and c) is shown for each reanalysis. Green, blue, purple, and red indicate CFSR/CFSv2, ERA-I, JRA-55, and MERRA-2, respectively.

Figure 16. As in Fig. 15 but for the NH and temperatures below T_NAT.

Figure 17. Pixel plots of (a) vortex decay dates (see text for the definition) based on the REM of vortex area, and (b-e) the difference between the vortex decay dates in each of the reanalyses from the REM (as reanalysis − REM).

Figure 18. As in Fig. 17 but for the NH. The color bar ranges are restricted to distinguish differences of a few days; differences whose magnitude greatly exceeds the range (by more than 7 days and thus differences with magnitude greater than 21 days) are marked with a white X symbol.

Table 1. List of acronyms for reanalysis assimilated observations and radiative transfer models.