A comparison of Loon balloon observations and stratospheric reanalysis products

Location information from long-duration super-pressure balloons flying in the Southern Hemisphere lower stratosphere during 2014 as part of X Project Loon is used to assess the quality of a number of different reanalyses including National Centers for Environmental Prediction Climate Forecast System version 2 (NCEP-CFSv2), European Centre for Medium-Range Weather Forecasts (ERA-Interim), NASA Modern-Era Retrospective Analysis for Research and Applications (MERRA), and the recently released MERRA version 2. Balloon GPS location information is used to derive wind speeds which are then compared with values from the reanalyses interpolated to the balloon times and locations. All reanalysis data sets accurately describe the winds, with biases in zonal winds of less than 0.37 m/s and meridional biases of less than 0.08 m/s. The standard deviation of the differences between Loon and reanalysis zonal winds is latitude dependent, ranging between 2.5 and 3.5 m/s and increasing equatorward. Comparisons between Loon trajectories and those calculated by applying a trajectory model to reanalysis wind fields show that MERRA-2 wind fields result in the most accurate simulated trajectories, with a mean 5-day balloon–reanalysis trajectory separation of 621 km and a median separation of 324 km, showing significant improvements over MERRA version 1 and slightly outperforming ERA-Interim. The latitudinal structure of the trajectory statistics for all reanalyses displays marginally lower mean separations between 15°S and 35°S than between 35°S and 55°S, despite standard deviations in the wind differences increasing toward the equator. This is shown to be related to the distance travelled by the balloon playing a role in the separation.

Introduction
X (an Alphabet company, formerly known as Google[x]) Project Loon, hereafter referred to as Loon, aims to provide worldwide Internet coverage using a network of long-duration super-pressure balloons. These balloons fly in the stratosphere at approximately 20 km altitude with flight durations averaging 55 days (maximum 187 days, median 42 days). In this study zonal and meridional wind speeds, derived from Loon location information obtained from the onboard GPS, are compared with interpolated winds from four different reanalyses. The reanalyses used are the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalysis (Dee et al., 2011), NASA's Modern-Era Retrospective Analysis for Research and Applications (MERRA) (Rienecker et al., 2011), the recently released MERRA-2, and the National Centers for Environmental Prediction (NCEP) Climate Forecast System Version 2 (CFSv2) analysis (Saha et al., 2011) (which we refer to as one of the reanalyses). The reanalyses assimilate a range of data to tightly constrain a global atmosphere-ocean climate model simulation. Using satellite data, in situ observations from radiosondes, and other data sources, the reanalyses generate a data set that provides a best estimate of the state of the global atmosphere.

L. S. Friedrich et al.: A comparison of Loon balloon observations and stratospheric reanalysis products
These reanalyses are often used to study stratospheric dynamical processes. In particular, reanalysis winds are used to compute forward and backward trajectories to trace the motion of air parcels. For example, a Lagrangian chemical box model can be used to determine ozone loss rates in an air parcel by measuring the concentration of ozone at various times while keeping track of the parcel through isentropic trajectory modelling (Vondergathen et al., 1995). Trajectory analyses are also important for quantifying mixing between different air masses, which can affect atmospheric chemistry. This is important as many chemical processes depend non-linearly on the concentrations of the reactants (Stohl et al., 2004), e.g. the rate of ozone loss in the stratospheric polar vortex (Tuck et al., 2003). Calculated trajectories are also used to infer various metrics of mixing (Nakamura, 1996; Haynes and Shuckburgh, 2000; Smith and McDonald, 2014). Determining trajectories is also central to domain-filling techniques, which allow fine-scale structure in chemical constituent fields to be derived from space-based measurements (Sutton et al., 1994; Smith and McDonald, 2014). Loon flights are therefore also used to examine the accuracy of trajectories derived from the reanalyses.
Stohl et al. (2004) discuss the importance of reanalysis quality in mixing studies. In particular, features such as the polar vortex, which act as barriers to mixing, may be displaced in an analysis relative to the position a forecast from the previous analysis would have predicted. Such a displacement is unphysical and arises from the assimilation of observations. In a transport model used with these analyses, an air parcel may therefore find itself on the other side of a mixing barrier without actually crossing it in a physically meaningful way. Thus, understanding the quality of the reanalysis fields is important in stratospheric chemistry studies.
Measurements of the stratospheric wind field are sparse. While routine radiosonde flights are made once, twice, or four times daily at more than 100 upper-air sites within the global observing system, the resultant data are assimilated into the reanalyses and therefore cannot provide an independent verification of the quality of the reanalyses. Independent data from long-duration balloon flights therefore provide a valuable assessment of reanalysis accuracy. The balloon-reanalysis comparison reported on here adds to the body of knowledge encompassed in previous studies, which used a range of models and balloon flights (Knudsen et al., 2002; Hertzog et al., 2004; Knudsen et al., 2006; Hertzog et al., 2006; Parrondo et al., 2007; Boccara et al., 2008; McDonald and Hertzog, 2008; de la Camara et al., 2010; Podglajen et al., 2014). These previous studies have been performed in varied geographical regions, generally using fewer balloons than are used in the analyses reported here. To provide a context for the results reported on below, a brief summary of the key results from previous comparison studies is provided.
Hertzog et al. (2004) used six super-pressure balloons launched from high northern latitudes to assess the quality of ECMWF and NCEP/NCAR reanalyses in the lower stratosphere. The NCEP/NCAR reanalysis temperatures showed a 0.8 K warm bias relative to the observations, while the ECMWF analyses showed a 0.3 K cold bias. The temperature observations exhibited small-scale fluctuations which Hertzog et al. (2004) attributed to mesoscale inertia-gravity waves. Both analyses accurately represented the winds with biases of less than 0.3 m s⁻¹ and standard deviations ranging from 2.3 to 2.7 m s⁻¹ using data with a 15 min temporal resolution. Trajectory comparisons suggested that ECMWF-derived trajectories were more accurate than those determined using NCEP/NCAR wind fields, with trajectory errors after 15 days of 1000 ± 1200 km for ECMWF and 2300 ± 1300 km for NCEP/NCAR trajectories.
Knudsen et al. (2006) examined data from 11 balloons launched from Brazil in 2004. Relative to the balloon-based temperatures, the temperature extracted from the ECMWF operational analyses had a mean 0.9 K cold bias, with a standard deviation of 1.3 K. ECMWF winds showed biases of less than 0.4 m s⁻¹, with standard deviations of about 3 m s⁻¹, resulting in average trajectory separations of about 500 km after 5 days.
Podglajen et al. (2014) used data from three equatorial long-duration balloon flights, launched in 2010, to examine the performance of ERA-Interim, MERRA, and ECMWF operational analysis. The results of the temperature comparisons were relatively similar to those of previous comparisons, with small warm biases (up to 1 K for MERRA), and standard deviations ranging from 1.5 K for ECMWF to 2.2 K for MERRA. The analysed winds, however, were found to show higher biases than similar analyses in the extra-tropics, with concomitant large differences in derived trajectories. All of the reanalyses were found to have zonal wind biases greater than 2 m s⁻¹, with the standard deviation of the reanalysis wind differences ranging from 3.5 to 5.8 m s⁻¹ using data with a 1 min temporal resolution. Detailed analysis of cases of persistent (more than 10 days) significant biases in the reanalyses, with zonal wind biases and standard deviations of ∼ 9 m s⁻¹, suggested that these events corresponded to large-scale equatorial Kelvin and Yanai wave packets with small vertical wavelengths which were not resolved in the reanalyses. Podglajen et al. (2014) also discussed the likely causes of the poor representation of stratospheric equatorial waves and concluded that one of the key factors was the lack of wind speed observations assimilated by the analyses, particularly over the data-sparse eastern Pacific and Indian Ocean. More recent work detailed in Kawatani et al. (2016) also suggests that at 50-70 hPa the geographical distributions of the disagreement between the different reanalyses are closely related to the density of radiosonde observations.
Hertzog et al. (2006) assessed the ECMWF ERA-40 and NCEP/NCAR NN50 reanalyses in the Southern Hemisphere upper troposphere and lower stratosphere based on comparisons with 480 super-pressure balloon flights, most lasting longer than 100 days, from the 1971-72 Eole experiment. These comparisons indicated that, in the sub-polar latitudes, NN50 and ERA-40 exhibited cold biases of 3 and 0.5 K respectively, while both had a warm bias of ∼ 1 K in the tropics. The winds were found to have biases of ±1 m s⁻¹, with latitude-binned standard deviations ranging from 5 to 15 m s⁻¹.
Boccara et al. (2008) used data from 27 super-pressure balloon flights with a 15 min temporal resolution, launched as part of the 2005 Antarctic Vorcore campaign, to examine the quality of ECMWF operational analysis and NCEP-NCAR NN50 reanalysis. The NN50 reanalysis showed a 1.51 K warm bias while the ECMWF analyses showed a 0.42 K cold bias. The winds in both reanalyses showed biases of less than 0.15 m s⁻¹, with standard deviations ranging between 2.4 and 3.4 m s⁻¹, with ECMWF performing better than NN50. These results indicated an improvement relative to those in Hertzog et al. (2006), which is likely related to the lack of data assimilated in the Southern Hemisphere prior to the satellite period. Boccara et al. (2008) attributed the small-scale fluctuations in the wind and temperature data to gravity waves that were unresolved in the reanalyses. By applying a low-pass filter to remove these small-scale fluctuations, they determined that a significant proportion of the standard deviation was a result of these perturbations. Trajectory separations were found to exceed 1000 ± 700 km after 5 days using NN50, and after 10 days for ECMWF.
McDonald and Hertzog (2008) compared temperature measurements in the Antarctic stratosphere made by the CHAMP radio occultation satellite and in situ temperature measurements from Vorcore campaign balloons. The analysis compared near-simultaneous and co-located temperature observations made by these instruments and found excellent agreement between the temperatures measured in two very different ways. The mean bias between the data sets was −0.52 K, with CHAMP temperatures being cooler than the balloon-based measurements, with a standard deviation in the differences of 1.6 K. This paired data set also enabled McDonald and Hertzog (2008) to show that an empirical correction used to remove the influence of radiative heating on the balloon temperature sensors, a variant of which is commonly used to correct balloon-based temperature measurements, did not produce any additional bias.
The remainder of this paper documents the Loon observations (Sect. 2.1), introduces the methodology used in our analysis and specifically details the trajectory model used (Sect. 2.2). Comparison of the Loon zonal and meridional wind speeds with reanalysis products is then detailed (Sect. 3.1) and the Loon flight paths are used to examine the accuracy of trajectories derived from the reanalyses in Sect. 3.2.1.
The pressure levels of the balloon flights vary between 30 and 70 hPa with an actively controlled altitude, although this active control is used relatively rarely, typically with multiple days between altitude changes. The Loon group use forecasts from the NCEP global forecast system (GFS), as well as forecasts from other sources, to simulate expected balloon trajectories. Based on these forecasts, decisions are made by the Loon team to occasionally adjust the balloons' altitudes, which is done by pumping air into or out of an internal bladder to modify the balloon density. While super-pressure balloons typically move along isopycnic (constant density) surfaces, this is no longer the case during the rare occasions of altitude control. Intervals during which the altitude of a balloon is being modified can be clearly identified by very rapid changes in the pressure. In the following analysis, whenever a pressure change greater than 5 hPa occurs within 1 h, the balloons are considered to be undergoing an altitude control manoeuvre and the data from that period are excluded from the subsequent analysis.
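The exclusion rule above (a pressure change greater than 5 hPa within 1 h) can be illustrated with a minimal sketch. The function name `flag_altitude_manoeuvres`, the use of a forward-looking sliding window, and the choice to flag every sample in an offending window are assumptions made for illustration; the text does not specify how the windows are constructed.

```python
import numpy as np

def flag_altitude_manoeuvres(t_s, p_hpa, max_dp_hpa=5.0, window_s=3600.0):
    """Return a boolean mask marking samples inside any 1 h window whose
    pressure range exceeds max_dp_hpa (hypothetical helper, not the authors' code).

    t_s   : sample times in seconds, sorted in increasing order
    p_hpa : balloon pressure in hPa at those times
    """
    t_s = np.asarray(t_s, dtype=float)
    p_hpa = np.asarray(p_hpa, dtype=float)
    flagged = np.zeros(t_s.size, dtype=bool)
    # For each sample i, find the first index beyond t_i + window_s.
    ends = np.searchsorted(t_s, t_s + window_s, side="right")
    for i, j in enumerate(ends):
        window = p_hpa[i:j]
        if window.max() - window.min() > max_dp_hpa:
            flagged[i:j] = True  # exclude the whole window from the analysis
    return flagged
```

Samples where the mask is `True` would then be dropped before computing wind or trajectory statistics.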
Each balloon data set includes three-dimensional GPS position, pressure, and balloon lift-gas temperature, all of which are typically recorded at 1 min intervals with occasional gaps due to telemetry failures. Throughout this study our analysis uses this 1 min temporal resolution data for comparison with interpolated reanalysis data or trajectories derived from that data. Although no specific details of the instruments used on each of the balloon flights are recorded, the Loon team have provided an upper bound on the uncertainties of the sensors, viz. 1.5 hPa for pressure, 10 m for GPS location, and 10 K for temperature. The GPS uncertainty suggests an upper bound of 0.23 m s⁻¹ uncertainty on derived wind speed measurements. The upper bound on the pressure sensor uncertainty is rather large and could potentially lead to uncertainties when vertically interpolating the reanalysis data sets to the balloon locations. Using the hydrostatic equation shows that a 1.5 hPa pressure uncertainty equates to about 300 m in altitude. Given a 3.0 m s⁻¹ change over 2 km at the bottom of the stratospheric jet in the Southern Hemisphere winter (approximated from ERA-Interim climatology), this equates to about 0.4 m s⁻¹ in the worst case.
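The uncertainty estimates above can be reproduced approximately with a short calculation. This is a sketch under stated assumptions: the wind uncertainty is taken as two independent 10 m position errors differenced over 1 min, and the altitude uncertainty uses the hydrostatic relation with an assumed stratospheric scale height of 6.5 km evaluated at the lowest flight pressure of 30 hPa. Neither the error-combination rule nor the scale height is stated in the text.

```python
import math

def wind_uncertainty(pos_sigma_m=10.0, dt_s=60.0):
    """Wind uncertainty from differencing two independent position fixes."""
    return math.sqrt(2.0) * pos_sigma_m / dt_s

def altitude_uncertainty(dp_hpa, p_hpa, scale_height_m=6500.0):
    """Hydrostatic estimate dz = H * dp / p (scale height H is an assumption)."""
    return scale_height_m * dp_hpa / p_hpa

def wind_error_from_shear(dz_m, du_ms=3.0, depth_m=2000.0):
    """Wind error implied by an altitude error in a layer with linear shear."""
    return du_ms / depth_m * dz_m

print(round(wind_uncertainty(), 3))                # about 0.24 m/s
print(round(altitude_uncertainty(1.5, 30.0)))      # about 325 m at 30 hPa
print(round(wind_error_from_shear(300.0), 2))      # 0.45 m/s for a 300 m error
```

With these assumptions the numbers agree with the quoted 0.23 m s⁻¹, roughly 300 m, and roughly 0.4 m s⁻¹ to within rounding.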
Comparisons of Loon pressure sensor measurements with pressures extracted from reanalyses, where the reanalyses' geopotential heights have been converted to geometric heights to allow direct comparisons with the GPS-referenced Loon data, indicate that each individual balloon flight exhibits pressure sensor biases ranging from −0.5 to +1.70 hPa, in agreement with the provided uncertainty estimate. Mean biases against NCEP-CFSv2 reanalyses (Loon minus reanalyses) are 0.535 ± 0.537 hPa. Adjusting the pressure data for these biases has only minor impacts on the subsequent analysis. The temperature measurements, being a measure of the lift gas and not the ambient air, are of questionable scientific utility in the current context; their usability is further examined in Sect. 3.3.

Methodology
For the comparisons between the Loon observations and the reanalysis products, a methodology very similar to that used in Boccara et al. (2008) is used to interpolate the reanalysis data to the temporal and spatial position of the balloon. A summary of the resolutions of the reanalysis products used in this study is provided in Table 1. Our interpolation scheme is a cubic spline fit over 6 data points in both horizontal directions, log-pressure, and time. Simple bilinear interpolation schemes occasionally displayed signs of discontinuities in the reanalysis fields, likely related to the assimilation of data, which subsequently produced dynamical inconsistencies as previously identified in Stohl et al. (2004).
The latitude and longitude GPS location data are combined with a simple finite difference calculation to derive the zonal and meridional winds which advect the balloons. Use of a five-point derivative calculation scheme, which is more robust in the presence of noise, produces almost no difference in the velocities derived, but is impacted more by occasional data gaps than the simple scheme, and was therefore not used in this study.
A Lagrangian trajectory model was also used to compare trajectories derived from reanalyses against the balloon trajectories. Every 6 h along a balloon flight, an 8-day trajectory was initialized. While super-pressure balloons closely follow isopycnic surfaces, and hence isopycnic trajectories are generally used (Hertzog et al., 2004; Boccara et al., 2008; Podglajen et al., 2014), in the model used here the vertical motion is also accounted for by setting the altitude of the modelled trajectory to correspond to the pressure level of the balloon, as is done by Knudsen et al. (2006). While this approach decreases the impact of potentially failing to recognize small altitude modifications, the range of potential trajectories is still limited by the occasional large altitude changes. Even when calculating trajectories with altitudes prescribed from the balloons, non-isopycnic altitude changes can exacerbate small separations in modelled and actual trajectories. Therefore, for the purposes of this analysis, any trajectories that encounter non-isopycnic balloon altitude changes are truncated such that the data after the altitude shift are excluded from later analysis.
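The simple finite-difference wind derivation described above can be sketched as follows. This is an illustrative version only: the function name `winds_from_gps`, the spherical-Earth radius, the use of the mid-point latitude, and the date-line wrapping are assumptions, and the text does not say which difference form (forward or centred) was used.

```python
import numpy as np

R_EARTH = 6.371e6  # mean Earth radius in metres (spherical approximation)

def winds_from_gps(t_s, lat_deg, lon_deg):
    """Zonal (u) and meridional (v) winds in m/s from successive GPS fixes.

    Uses a two-point finite difference between consecutive samples, so the
    returned arrays are one element shorter than the inputs.
    """
    t = np.asarray(t_s, dtype=float)
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dt = np.diff(t)
    dlat = np.diff(lat)
    # Wrap longitude differences into [-pi, pi) so date-line crossings are handled.
    dlon = (np.diff(lon) + np.pi) % (2.0 * np.pi) - np.pi
    lat_mid = 0.5 * (lat[1:] + lat[:-1])
    u = R_EARTH * np.cos(lat_mid) * dlon / dt
    v = R_EARTH * dlat / dt
    return u, v
```

Because each estimate uses only two neighbouring fixes, a telemetry gap affects a single wind value, which is consistent with the stated reason for preferring the simple scheme over a five-point stencil.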
The Lagrangian trajectory model used in this study was developed at the University of Canterbury and is a modified version of that used and discussed in Alexander et al. (2013), McDonald and Smith (2013) and Smith and McDonald (2014). It uses a fourth-order Runge-Kutta algorithm, with a 10 min time-step, with reanalysis wind speeds determined at the trajectory position using the spatial-temporal interpolation scheme detailed above. A polar stereographic coordinate system is used polewards of 70° to avoid the singularity at the pole.
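A fourth-order Runge-Kutta advection step with a 10 min time-step can be sketched as below. This is not the University of Canterbury model: it works only in longitude-latitude coordinates (no polar stereographic switch), ignores vertical motion, and takes a hypothetical user-supplied `wind(lon, lat, t)` callable in place of the interpolated reanalysis winds.

```python
import math

R_EARTH = 6.371e6  # mean Earth radius in metres

def tendency(lon, lat, t, wind):
    """Convert winds (m/s) at a position into angular rates (rad/s)."""
    u, v = wind(lon, lat, t)
    return u / (R_EARTH * math.cos(lat)), v / R_EARTH

def rk4_step(lon, lat, t, dt, wind):
    """Advance a parcel position (radians) by one RK4 step of length dt seconds."""
    k1 = tendency(lon, lat, t, wind)
    k2 = tendency(lon + 0.5 * dt * k1[0], lat + 0.5 * dt * k1[1], t + 0.5 * dt, wind)
    k3 = tendency(lon + 0.5 * dt * k2[0], lat + 0.5 * dt * k2[1], t + 0.5 * dt, wind)
    k4 = tendency(lon + dt * k3[0], lat + dt * k3[1], t + dt, wind)
    lon_new = lon + dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
    lat_new = lat + dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return lon_new, lat_new

def integrate(lon0, lat0, wind, n_steps, dt=600.0):
    """Integrate a trajectory for n_steps steps of dt seconds (default 10 min)."""
    lon, lat, t = lon0, lat0, 0.0
    for _ in range(n_steps):
        lon, lat = rk4_step(lon, lat, t, dt, wind)
        t += dt
    return lon, lat
```

An 8-day trajectory at this time-step corresponds to `n_steps=1152`. The `cos(lat)` factor in `tendency` is the term that becomes singular at the pole, which is the motivation for switching coordinate systems at high latitudes.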

Winds
A sample of the zonal and meridional winds derived from one of the Loon GPS data sets, along with the corresponding reanalysis winds, is shown in Fig. 2. This flight is shown as an example since it exhibits a wide range of zonal wind velocities. The comparison shows a good correspondence between the Loon observations and all four of the corresponding reanalysis wind time series. While some differences are observed between the reanalysis data sets, these are generally smaller than the differences between the reanalyses and the Loon data. High-frequency variability at periods close to and below 1 day is more noticeable in the Loon observations than in any of the reanalyses, which suggests that these small-scale variations might be important in explaining any differences. The differences likely represent the impact of small-scale waves, with a number of studies identifying that inertia-gravity waves may be important.
Statistics of the reanalyses minus Loon-derived wind differences, over a wide range of southern latitudes, show that the Loon-derived wind fields match well with the reanalyses. Histograms and key statistics of the wind differences are shown in Fig. 3 and Table 2. The wind differences shown in Fig. 3 all exhibit Gaussian distributions with biases less than 0.37 m s⁻¹ and standard deviations less than 3.4 m s⁻¹. These values are larger than those derived by Boccara et al. (2008), who found zonal and meridional standard deviations of 2.43 and 2.38 m s⁻¹ for the differences between ECMWF operational analyses and the Vorcore-derived winds. However, the larger standard deviations derived in our study are consistent with the observed latitudinal trend for the standard deviation as discussed below. Table 2 also shows that the mean zonal wind difference between the Loon-derived winds and the reanalyses is larger for ERA-Interim and CFSv2 than for MERRA and MERRA-2. It is also clear that inter-reanalysis differences in the standard deviations of the zonal and meridional wind differences are small. The statistical significance of the differences in the means of the Loon observations and the reanalysis output has been assessed using Student's t test, and that of the differences in the variances of the distributions using the F test. In every case, the differences between the Loon observations and the reanalysis output are significant at greater than the 99 % level.
The latitudinal structure in the differences between the Loon and reanalysis winds, shown in Fig. 4, shows a tendency for the standard deviation in the wind differences to increase closer to the equator. Although there is no obvious trend in the zonal wind biases, ERA-Interim has a consistent positive bias over all latitude ranges, as opposed to the biases in the other reanalyses which switch sign. Note that the 99 % confidence intervals associated with the biases are similar in size to the width of the line representing the bias. The large ERA-Interim zonal bias statistic listed in Table 2 is therefore not an indicator that ERA-Interim is worse in this respect than the other reanalyses, but rather that it exhibits a consistent bias across latitudes whereas the other reanalyses have biases of similar magnitudes which cancel when averaged over latitudes. Across all reanalyses, there appears to be a trend in the meridional biases with net over-estimation polewards of ∼ 40°S and under-estimation equatorward of ∼ 40°S.
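Latitude-binned bias and standard deviation statistics of the kind described above can be computed along the following lines. The function name `binned_wind_stats`, the bin edges in the usage note, and the use of the sample standard deviation (`ddof=1`) are illustrative assumptions rather than details taken from the text.

```python
import numpy as np

def binned_wind_stats(lat_deg, diff_ms, edges_deg):
    """Bias (mean) and standard deviation of wind differences per latitude bin.

    lat_deg   : balloon latitude for each sample (degrees, negative south)
    diff_ms   : reanalysis minus Loon wind difference for each sample (m/s)
    edges_deg : increasing bin edges in degrees
    Returns a list of (bias, std, n) tuples, one per bin.
    """
    lat = np.asarray(lat_deg, dtype=float)
    diff = np.asarray(diff_ms, dtype=float)
    idx = np.digitize(lat, edges_deg) - 1  # bin index of each sample
    stats = []
    for k in range(len(edges_deg) - 1):
        d = diff[idx == k]
        if d.size < 2:
            stats.append((np.nan, np.nan, d.size))  # too few samples for a std
        else:
            stats.append((d.mean(), d.std(ddof=1), d.size))
    return stats
```

For example, `binned_wind_stats(lat, u_reanalysis - u_loon, [-55, -45, -35, -25, -15])` would give one bias and one standard deviation for each 10° band.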
While the region closest to the equator has larger biases and standard deviations, these biases are significantly smaller than those derived by Podglajen et al. (2014). This may be related to seasonal differences, where most of the Loon flight data were collected through the Southern Hemisphere winter (June to September), while the measurements analysed by Podglajen et al. (2014) were collected in February. However, given the lack of strong seasonal variations in the tropics, this inference is questionable. Another possibility is that interannual variability in the mean winds could play a significant role; the phase of the quasi-biennial oscillation could be important. The fact that Podglajen et al. (2014) also examine a narrower latitude band (within 10° of the equator) may also be important. The work in Podglajen et al. (2014) also highlighted large wind biases in specific regions (i.e. the Indian Ocean and the eastern Pacific) where in situ observations are scarce. Therefore, given the limited quantity of observations near the equator in both studies, we cannot exclude the effects of sampling bias between the two data sets.
The wind difference statistics indicate that, of the four reanalyses analysed, ERA-Interim and MERRA-2 perform the best, with MERRA-2 showing a measurable improvement over MERRA.

Trajectories
The trajectory model described above was used to initialize a simulated trajectory every 6 h along the observed Loon balloon trajectory. The resultant separation statistics between the observed and simulated trajectories are shown in Fig. 5 and Table 3. The mean and median values of the trajectory separations as a function of time are shown in panel a of Fig. 5 for the four different reanalyses. A more detailed representation of the separation of the trajectories calculated from the MERRA-2 wind fields from the observed trajectories is shown in panel b of Fig. 5, including confidence intervals and inter-quartile ranges. If a trajectory's corresponding balloon underwent rapid altitude changes over the course of the simulated trajectory, only the separation data up to that altitude change are included, resulting in a decreasing number of available trajectories as time progresses (Fig. 5c). The results plotted in panel a of Fig. 5 show that after the first day, both the mean and median separations increase roughly linearly with time. For MERRA-2, the median separation grows at a rate of roughly 48 km per day. However, the growth of individual trajectory separations is far more chaotic. The departures between the mean and median values of the separation at a particular time along the trajectory suggest there are significant contributions due to extreme outliers, with the mean approaching the upper quartile of separations (Fig. 5b). This also suggests that the median is likely a better indicator of expected trajectory separation.
Histograms of the 5-day separations between the reanalysis-based simulations and the Loon trajectories are displayed in Fig. 6. After 5 days, the separations resulting from the MERRA-2-derived trajectories show a smaller number of large outliers and also a slightly higher proportion of simulations at lower separations than the other three reanalyses (Fig. 6). The histograms display a roughly log-normal distribution. A log-normal process is the statistical realization of the multiplicative product of many independent positive random variables, and this form is therefore suggestive of the fact that a combination of multiple factors impacts the separations observed. Comparison between the MERRA and MERRA-2 distributions also shows that the MERRA-2-based trajectories follow more closely the actual Loon trajectories.
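The separation statistics discussed above rest on a distance between the observed and simulated positions at matching times. A minimal sketch, assuming a great-circle (haversine) distance on a spherical Earth and using the hypothetical helper names `great_circle_km` and `separation_summary`, is given below; the text does not state which distance formula was used.

```python
import numpy as np

R_EARTH_KM = 6371.0  # mean Earth radius (spherical approximation)

def great_circle_km(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Haversine distance in km between two sets of positions (degrees)."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(x, dtype=float))
        for x in (lat1_deg, lon1_deg, lat2_deg, lon2_deg)
    )
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * R_EARTH_KM * np.arcsin(np.sqrt(a))

def separation_summary(sep_km):
    """Mean, median and quartiles of a set of separations at one lead time."""
    sep = np.asarray(sep_km, dtype=float)
    return {
        "mean": sep.mean(),
        "median": np.median(sep),
        "q25": np.percentile(sep, 25),
        "q75": np.percentile(sep, 75),
    }
```

For a right-skewed (roughly log-normal) set of separations the mean sits well above the median, which is the behaviour reported for the 5-day statistics.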
The separation statistics shown in Fig. 5 compare well with the analyses detailed in Hertzog et al. (2004) and Boccara et al. (2008) although, surprisingly, the ECMWF analyses used in Hertzog et al. (2004) have somewhat smaller separations at 5 days than those in this study.This may result from the higher quality of reanalyses in the Northern Hemisphere relative to the Southern Hemisphere identified in some previous studies.That said, given the improvement in the quantity of data being assimilated by the more recent reanalyses, and underlying model improvements, this is still a little puzzling.
If trajectories after forced balloon altitude manoeuvres are not excluded from the analyses, we find that the comparisons of the observed and modelled trajectories decrease significantly in quality. The median MERRA-2 separation after 5 days increases from 240 to 574 km, increasing at a rate of roughly 88 km per day. This increase could be expected as trajectories that were initially separated due to small biases in reanalyses, but still follow along the same general flow, might suddenly find themselves in different flow regions when the pressure level is adjusted, leading to higher trajectory separations. However, this apparent degradation in trajectory quality could also be an indicator of selection bias. The Loon team uses a numerical weather prediction (NWP) model output to forecast balloon trajectories, and any balloon motion not predicted by the NWP might require adjustment using forced altitude changes. This would then result in our analysis excluding the effects of the long-term behaviour of these inaccurate trajectories. Similarly, if the reanalyses have difficulty modelling these trajectories, this would lead to an automatic selection bias with the long-term separation statistics including more "good" trajectories. The short-term separation statistics are likely to be more reliable and less prone to this sampling bias.
To examine the separations in an alternative manner, we can also inspect the relative separations. There are two variants of this approach. We can examine the separation at some time divided by the total distance travelled by the balloon over 8 days, or alternatively, the separation after h hours divided by the distance travelled by the balloon during those h hours. One motivation for the former method is that if trajectories that travel further have concomitant greater separations, this might diminish the effect of these outliers. The resulting relative separations are shown in Fig. 7. A notable feature in the first relative separation method is that the MERRA-2 and ERA-Interim mean relative separations are much more distinct, and that the mean relative separations of the reanalyses are much closer to the median, lying well within the inter-quartile ranges. The second method also shows some interesting features, with median relative separations remaining roughly constant after the first day: for example, MERRA-2 shows a consistent median relative separation of ∼ 10 %.
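The two relative-separation variants described above can be written compactly. In this sketch the inputs are assumed to be a separation time series for one simulated trajectory and the distance covered by the balloon in each corresponding time interval; the function name `relative_separations` and this input layout are illustrative choices.

```python
import numpy as np

def relative_separations(sep_km, step_km):
    """Return the two relative-separation variants for one trajectory.

    sep_km  : separation (km) at the end of each time interval
    step_km : distance (km) travelled by the balloon during each interval
    """
    sep = np.asarray(sep_km, dtype=float)
    travelled = np.cumsum(np.asarray(step_km, dtype=float))
    # Variant 1: separation divided by the total distance over the full trajectory.
    rel_total = sep / travelled[-1]
    # Variant 2: separation after h hours divided by the distance covered in those h hours.
    rel_running = sep / travelled
    return rel_total, rel_running
```

If the separation grows in proportion to the distance travelled, the second variant is constant in time, which is the kind of behaviour reported for the median relative separation after the first day.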
Comparison of the results from Figs. 5 and 7a suggests that the trajectories with the highest separations tend to correspond to the flights with the longest distances travelled, which is also revealed when performing a more in-depth examination of individual events. In particular, there is a low correlation (r = 0.34) between total distance travelled and the resulting separation, but the mean separation for the upper half of trajectories ranked by distance travelled is nearly double that of the lower half, suggesting that this factor might dominate the observed variations. This would suggest that while the differences between the reanalyses and Loon winds are important in defining the separation, the mean state of the wind also plays an important role, as one would expect. In addition, the difference in separation statistics between ERA-Interim and MERRA-2 could then be identified as being related to the larger bias in the zonal mean in ERA-Interim than in the MERRA-2 dataset.
There is little latitudinal variation in trajectory accuracy, but we do find that for all reanalyses the mean trajectory separations are slightly lower between 15 and 35°S than between 35 and 55°S. This is slightly counter-intuitive because the standard deviations of wind errors display the opposite trend. This is likely explained by the fact that the growth of the separation depends on the type of flow; for example, over 8 days the balloon trajectories tend to travel a greater total distance as the latitude increases, which might explain the observed trend in trajectory accuracy. For the relative separation (separation divided by total distance travelled) shown in Fig. 7, the opposite trend is observed, with greater separations equatorward.
Notably, we find that the MERRA-2 trajectories are significantly improved with respect to the old MERRA version 1 trajectories, resulting in trajectories with similar mean separation statistics to those derived from ERA-Interim.While the mean separations are nearly indistinguishable, the MERRA-2 median separation is noticeably lower than that of ERA-Interim, suggesting that the MERRA-2 separation distribution is more skewed than that of the ERA-Interim.

Temperature
There are several difficulties associated with the Loon temperature data. As previously stated, the data result from measurements of the lift-gas temperature and not of the ambient air, resulting in strong solar zenith angle (SZA)-dependent differences between the lift-gas temperature and the ambient air temperature. These may result from the combination of the daytime radiative heating of temperature sensors and, we speculate, the balloon envelope absorbing in the UV-visible range. Additionally, although we are not aware of the specific instruments used, it seems that the thermometer used has a high uncertainty and is intended as a diagnostic instrument rather than for scientific data collection. An example of balloon-reanalysis temperature differences is shown in Fig. 8. The temperature differences between the lift-gas and ambient air can be corrected through the use of a correction function, as is commonly done to adjust for temperature measurement biases arising due to radiative heating of the temperature sensors (Hertzog et al., 2004, 2006; Knudsen et al., 2006), but it should be noted that the impact of solar heating on the lift-gas temperature is much more significant than the usual solar bias, up to +3 K as opposed to the typical ∼ 1.5 K. The temperature differences can be modelled with a correction function in which α, β, γ, δ, λ₀, λ₁, and λ₂ are fit coefficients determined from a linear least-squares regression. After removing some flights with anomalous observations (unreasonably large differences, questionable GPS or pressure data), we use temperature data from every second flight to fit the correction function, and then apply this correction to the remaining flights. The fitted parameters are provided in Table 4, and Fig. 9 shows the CFSv2 temperature differences with and without the correction applied. Application of the correction functions reduces the mean Loon-reanalyses temperature differences to a few degrees, significantly improving the utility of the Loon temperature measurements. However, the standard deviation and the shorter-term, day-to-day differences are still much greater than observed in other studies. Ignoring the differences between lift-gas and ambient temperatures by focusing only on the night-time measurements, we still find standard deviations of ∼ 6 K while other balloon studies typically have biases and standard deviations less than 2 K. Additionally, the night-time measurements show interesting behaviour with common consistent night-long differences of up to ±10 K. Consideration of the upper bound on the thermometer uncertainty provided by the Loon team, the significant difference between lift-gas and ambient temperatures, which is much greater than those usually dealt with using correction functions, and the unusually inaccurate night-time temperatures leads us to conclude that currently the quality of the Loon temperature data means it is of little value in assessing the quality of the reanalyses.
In particular, the variations in the differences between the reanalyses and the corrected temperatures are dominated by the uncertainty in the temperature observations, as the reanalyses show only a ∼0.2 K variation in the biases and standard deviations.
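The fit-and-apply procedure described in this section (fit on every second flight by linear least squares, then correct the remaining flights) can be sketched as follows. This is a sketch under stated assumptions, not the implementation used in the study. The basis of 1, cos(SZA) and cos²(SZA) is a hypothetical stand-in chosen only for illustration and is not the functional form fitted in this paper. The flight data are synthetic, and all function names are illustrative.

```python
import numpy as np

def design_matrix(sza_deg):
    """Columns of a HYPOTHETICAL SZA basis: 1, cos(SZA), cos^2(SZA).

    This is an illustrative stand-in, not the paper's correction function.
    """
    c = np.cos(np.radians(np.asarray(sza_deg, dtype=float)))
    return np.column_stack([np.ones_like(c), c, c**2])

def fit_correction(sza_deg, delta_t):
    """Linear least-squares fit of the lift-gas minus reanalysis difference."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(sza_deg), delta_t, rcond=None)
    return coeffs

def apply_correction(sza_deg, delta_t, coeffs):
    """Subtract the fitted SZA-dependent difference from the observed one."""
    return np.asarray(delta_t, dtype=float) - design_matrix(sza_deg) @ coeffs

# Synthetic stand-in for per-flight data (NOT Loon measurements).
rng = np.random.default_rng(0)
true_coeffs = np.array([1.0, 2.0, 0.5])
flights = []
for _ in range(20):
    sza = rng.uniform(0.0, 180.0, size=200)
    delta_t = design_matrix(sza) @ true_coeffs + rng.normal(0.0, 0.1, size=200)
    flights.append((sza, delta_t))

# Fit on every second flight, then correct the remaining (held-out) flights.
train, held_out = flights[0::2], flights[1::2]
coeffs = fit_correction(np.concatenate([f[0] for f in train]),
                        np.concatenate([f[1] for f in train]))
residuals = np.concatenate([apply_correction(s, d, coeffs) for s, d in held_out])
print("fitted coefficients:", coeffs)
print("held-out residual mean and std:", residuals.mean(), residuals.std())
```

Keeping the fitting and evaluation flights separate means that the residual statistics of the corrected flights are not flattered by the fit itself.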

Discussion and conclusions
Loon long-duration balloon GPS trajectory information has been used to examine the quality of the horizontal winds in reanalyses along with the concomitant trajectory errors.
The fundamental goal of this study is to test the potential for the Loon balloons to be used in the evaluation of reanalysis fields in the stratosphere. This dataset is potentially of high value because, with the exception of the EOLE experiment detailed in Hertzog et al. (2006), the number of measurements available in previous studies has been far lower than in the current dataset. It should also be noted that the EOLE experiment took place in 1971-1972, prior to the satellite era, and thus potentially does not offer a good test of the quality of the reanalyses, given the very limited amount of data that was assimilated in the Southern Hemisphere before the satellite era.
Our results are generally in agreement with the limited number of previous studies. In particular, we find differences between reanalysis winds and the winds derived from the Loon trajectories that are comparable with those in Knudsen et al. (2006) and Boccara et al. (2008); these differences are also smaller than those identified by Podglajen et al. (2014) but slightly larger than those identified in Hertzog et al. (2004). In this study, latitude-dependent wind biases of less than 0.5 m s⁻¹ and standard deviations of roughly 3 m s⁻¹ are observed. In common with Hertzog et al. (2006) and Podglajen et al. (2014), we also find that the standard deviation of these differences increases toward the equator. We also note that these Southern Hemisphere measurements have larger differences with the reanalyses than identified in the Northern Hemisphere study detailed in Hertzog et al. (2004). Unfortunately, we also find that the Loon temperature measurements are currently not suitable for comparison with reanalyses, even after a correction scheme similar to the one developed in Hertzog et al. (2004) is applied to the data. When considering the biases and standard deviations linked to the four reanalyses used in this study (ERA-Interim, MERRA, MERRA-2 and CFSv2), we find that ERA-Interim and MERRA-2 have slightly smaller standard deviations than the other two products, the improvement between the MERRA and MERRA-2 reanalyses being a notable achievement.
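The latitude-dependent bias and standard deviation statistics referred to above (see Fig. 4) amount to grouping the balloon-minus-reanalysis wind differences into latitude bins. A minimal sketch of that bookkeeping is given below. It is not the code used in the study: the function name, the use of the sample standard deviation, and the toy input values are assumptions made for illustration.

```python
import numpy as np

def binned_bias_and_std(lat, diff, lat_edges):
    """Mean (bias) and sample standard deviation of wind differences per latitude bin.

    lat       : observation latitudes in degrees (negative in the Southern Hemisphere)
    diff      : balloon-minus-reanalysis wind differences (m/s)
    lat_edges : increasing bin edges in degrees
    Bins with fewer than two samples are returned as NaN.
    """
    lat = np.asarray(lat, dtype=float)
    diff = np.asarray(diff, dtype=float)
    idx = np.digitize(lat, lat_edges) - 1   # bin b covers [edges[b], edges[b+1])
    n_bins = len(lat_edges) - 1
    bias = np.full(n_bins, np.nan)
    std = np.full(n_bins, np.nan)
    for b in range(n_bins):
        sel = diff[idx == b]
        if sel.size > 1:
            bias[b] = sel.mean()
            std[b] = sel.std(ddof=1)
    return bias, std

# Toy values only; 1-degree bins from 55 S to 15 S.
edges = np.arange(-55, -14, 1)
bias, std = binned_bias_and_std([-50.5, -50.2, -20.5, -20.1],
                                [1.0, 3.0, -2.0, 2.0], edges)
print(bias[4], std[4])    # bin covering 51 S to 50 S
print(bias[34], std[34])  # bin covering 21 S to 20 S
```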
When the trajectories derived from the reanalysis winds are compared to the balloon trajectories, we again find broad comparability with previous studies. For example, the resulting 5-day mean (median) trajectory separations are found to vary from 620 (320) to 760 (480) km, while the work detailed in Boccara et al. (2008) found mean spherical distances between 400 and 1000 km after 5 days. We also note that the present results are somewhat better than those identified in Knudsen et al. (2006) (1300 km after 5 days), which might be a little surprising given that inspection of Fig. 2 in that paper suggests the standard deviations in the winds used in the trajectory model are comparable. However, a larger bias in the zonal wind (0.7 m s⁻¹) was identified in Knudsen et al. (2006) than in the current study. We also note that the methodologies used in the current study and in Knudsen et al. (2006) are very similar, and we therefore suggest that this difference may be associated with latitudinal differences in the quality of the reanalyses. It is also notable that MERRA version 2 performs the best out of all the examined reanalyses, showing significant improvements over version 1. The relative separation analysis detailed in Fig. 7 also suggests that the mean state, and therefore the distance travelled by the balloon, plays a role in these separation statistics. This likely explains the latitudinal structure of the trajectory statistics, with marginally lower mean separations between 15 and 35° S than between 35 and 55° S in all four reanalyses, despite standard deviations in the wind differences increasing toward the equator.
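The separation and relative separation measures discussed above can be sketched as follows. This is a sketch under stated assumptions rather than the trajectory code used in the study: separations are taken as great-circle distances on a sphere of radius 6371 km computed with the haversine formula, the relative separation divides by the distance travelled by the balloon so far (as in Fig. 7b), and all function names are illustrative.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(np.asarray(x, dtype=float))
                              for x in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def relative_separation(balloon_lat, balloon_lon, model_lat, model_lon):
    """Balloon-model separation divided by the balloon's path length so far.

    Returns NaN wherever the balloon has not yet moved (e.g. the first time step).
    """
    balloon_lat = np.asarray(balloon_lat, dtype=float)
    balloon_lon = np.asarray(balloon_lon, dtype=float)
    sep = great_circle_km(balloon_lat, balloon_lon, model_lat, model_lon)
    steps = great_circle_km(balloon_lat[:-1], balloon_lon[:-1],
                            balloon_lat[1:], balloon_lon[1:])
    travelled = np.concatenate([[0.0], np.cumsum(steps)])
    rel = np.full_like(sep, np.nan)
    np.divide(sep, travelled, out=rel, where=travelled > 0)
    return rel

# Toy trajectories: the balloon moves along the equator, the model drifts north.
rel = relative_separation([0.0, 0.0, 0.0], [0.0, 1.0, 2.0],
                          [0.0, 0.0, 1.0], [0.0, 1.0, 2.0])
print(rel)
```

Normalizing by the distance travelled removes part of the dependence on the mean wind speed, which is why it is useful when comparing latitude bands with different background flow.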
As it stands, balloons launched as part of the X Project Loon network provide a useful independent test of atmospheric reanalysis winds. More balloons will continue to be launched which, if they are not assimilated into reanalyses, will allow significantly greater coverage for reanalysis comparisons, and perhaps enable an investigation into the seasonal variability of reanalysis accuracy. Further opportunities for understanding the mixing in the stratosphere using the currently available Loon data are also being explored.

Data availability
Reanalysis data used in this paper are publicly available from NCEP, ECMWF, and the GES DISC for the MERRA and MERRA-2 products. Loon data are available upon request from the Project Loon team.

Figure 1. General Loon flight information including (a) the set of all balloon trajectories viewed from the South Pole, (b) a timeline showing individual balloon launch times and flight durations, (c) a histogram of the observation distribution as a function of latitude, and (d) a histogram of the observation distribution as a function of pressure.

Figure 2. Wind speeds measured from Loon flight no. 263 along with interpolated reanalysis winds. This shows the typical behaviour for comparisons of balloon and reanalysis wind speeds, including the tendency for the balloon winds to oscillate about the reanalysis winds.

Figure 3. Zonal and meridional wind difference histogram outlines. Histograms are binned by steps of 0.25 m s⁻¹. Corresponding statistics are shown in Table 2.

Figure 4. Zonal and meridional wind differences binned by latitude, in 1° steps. There is a clear tendency for wind difference standard deviations to be larger near the equator. There also seems to be a trend in the meridional wind differences, with net over (under) estimation poleward (equatorward) of 40° S.

Figure 6. Histogram of the trajectory separation distribution after 5 days. (b) is the same as (a), but using logarithmic separation to highlight the log-normal distribution, with a long tail of extreme outliers which is not visible in (a).

Figure 7. Relative trajectory separations as a function of time. (a) is similar to Fig. 5a, except that, prior to deriving the statistics, the separation of each reanalysis trajectory is normalized by the total distance travelled by the balloon during those 8 days. (b) is similar, except that the separations are divided by the current distance travelled by the balloon, not the total.

Figure 8. Differences between Loon lift-gas temperatures and interpolated MERRA-2 temperatures for flight 322. The SZA-dependent bias is clearly visible.

Figure 9. Differences between Loon lift-gas temperatures obtained from selected odd-numbered flights (red traces) and temporally and spatially coincident NCEP-NCAS-CFSR reanalysis temperatures. Mean differences in each 1° SZA bin are shown with a solid line together with the first standard deviation of the differences as uncertainty bars. Differences after the application of the correction functions are shown in blue.

Table 1. Resolution of the model outputs used in this study. The last column identifies the number of pressure levels between 30 and 70 hPa inclusive. All model products are provided at 6 h intervals.

Table 2. Statistics of the wind differences between the reanalyses and the Loon balloons. Corresponding histograms are plotted in Fig. 3. Units are m s⁻¹.

Table 3. Statistics of the trajectory separations after 5 days in kilometres. Corresponding plots of separation over time are provided in Fig. 5. The errors on the means are the 90 % confidence intervals.

Table 4. Best fit correction function parameters as determined by applying the correction to every second flight.