Interactive comment on "How long do satellites need to overlap? Evaluation of climate data stability from overlapping satellite records" by

This paper examines the issue of quantifying differences between satellite measurement sets based on overlapping measurement periods, addressing the question of how long overlap periods need to be to accurately estimate offsets and drifts between instruments. A few general formulas are presented to calculate required overlap periods for given desired precision requirements of offset and drift estimation, and examples are presented.

and temporal coverage. A common challenge with satellite observations is to quantify their ability to provide well-calibrated, long-term, stable records of the parameters they measure. Ground-based intercomparisons offer some insight, while reference observations and internal calibrations give further assistance for understanding long-term stability. A valuable tool for evaluating and developing long-term records from satellites is the examination of data from overlapping satellite missions. This paper addresses how the length of overlap affects the ability to identify an offset or a drift between the data of two overlapping sensors. Ozone and temperature datasets are used as examples showing that overlap data can differ by latitude and can change over time. New results are presented for the general case of sensor overlap using SORCE SIM and SOLSTICE solar irradiance data as an example. To achieve a 1% uncertainty in estimating the offset for these two instruments' measurement of the Mg II core (280 nm) requires approximately 5 months of overlap. For a relative drift to be identified within 0.1% per year uncertainty (0.00008 watts m^-2 nm^-1 yr^-1), the overlap for these two satellites would need to be 2.5 years. Additional overlap of satellite measurements is needed if, as is the case for solar monitoring, unexpected jumps occur, adding uncertainty to both offsets and drifts; the additional length of time needed to account for a single jump in the overlap data may be as large as 50% of the original overlap period in order to achieve the same desired confidence in the stability of the merged dataset. Results presented here are directly applicable to satellite Earth observations. Approaches for Earth observations offer additional challenges due to the complexity of the observations, but Earth observations may also benefit from ancillary observations taken from ground-based and in situ sources. Difficult choices need to be made when monitoring approaches are considered; we outline some attempts at optimizing networks based on economic considerations.

In this paper, we estimate the direct impact of the length of overlap between satellites on the continuity of data from two overlapping satellites. We examine three separate factors that are of direct importance to the users and creators of merged satellite datasets: the quantified offset of the two datasets, the drift between the two datasets, and the impact of sudden jumps in the data during periods of overlap. We note that intercomparison of satellite records cannot, in isolation, determine which of two systems is more accurate or stable. Indeed, agreement of two observing systems can occur when both are similarly inaccurate or similarly drifting, and instruments can drift outside of the intercomparison periods. However, intercomparisons offer a valuable, independent assessment useful for developing a long-term record. For illustrative purposes, we look at ozone, temperature and solar radiation satellite records and discuss how these three factors can affect the long-term records of these parameters. We present techniques for evaluating overlapping data with the solar dataset because the data are less dependent on satellite drift, diurnal match-ups and differences in the instrument-dependent field of view. We note that the usefulness of overlapping data is highly dependent on the length of overlap and the ability to match overlapping data with high precision. In the final section of this paper, we outline methods for optimizing the set of choices which are needed to create a long-term and stable climate record under a variety of constraints, most notably economic constraints. Optimization will result in better use of resources to achieve more accurate and stable merged datasets.

Figure 1 shows the result of merging the GOME, SCIAMACHY and GOME-2A total ozone time series (Weber et al., 2011, 2015) into a continuous time series, which is presented as the thick grey line.
In this case, the SCIAMACHY and GOME-2A observations (thin blue and green lines) were successively bias adjusted to be continuous with the original GOME data. Biases (offsets) were determined as a function of latitude in steps of 1 degree using monthly zonal means. Despite extensive pre-calibration efforts and monitoring of instrument performance, differences are noted between data from the overlapping satellites. There appears to be a drop in the original GOME-2 data record during the 2009-2011 period relative to SCIAMACHY, which seems to be larger than the overall bias between the two datasets. In this case the very large overlap period from 2007 until 2012 was an advantage, and no further corrections beyond the latitude-dependent biases were needed to adjust GOME-2. Due to this non-physical drop in the GOME-2A data, the SCIAMACHY data became the preferred choice in the merged (GSG) dataset during the overlap period (2007-2011). In contrast, the overlap period for SCIAMACHY and GOME was less than 10 months (2002-2003).
Additional corrections beyond a simple bias are difficult and may require the use of external reference data, although the need for additional corrections may be indicated by satellite overlap data. The long-term stability of these data is critical for estimating ozone recovery and understanding the complex long-term factors affecting stratospheric ozone. Lessons from these overlap data serve to offer guidance for future decisions on satellite observations and overlap periods.
Perhaps no other set of satellite records has been as studied as the temperature records derived from the MSU and AMSU satellites. Two distinct challenges complicating the algorithms needed to develop reliable long-term temperature records are: 1) multiple satellite, in situ and ground-based measurements are available, each with unique characteristics, and 2) the level of agreement differs with latitude and altitude. Multiple sources of data can complicate merged datasets because different choices, even when reasonable, can lead to different long-term characteristics in the record (e.g., Thorne et al., 2005; also see Sec. 3 of this paper). However, multiple records also allow different groups to produce independent merged datasets which have long overlaps and can be directly compared. Through these comparisons we gain valuable information about the uncertainties that arise from the merging process itself, and about whether the datasets are stable relative to the requirements of the analysis.

Second, differences in the merged datasets can be remarkably large (over a half degree C for latitudinal averages), despite differences between datasets being minimized in the merging process and the offset and drift between the compared datasets being removed. Third, even when the linear drift over the length of the overlap is removed, the data show apparent drifts that last for several years in each latitude band. Such variations can limit the usefulness of the merged records, but can also highlight issues with particular data sources that can then be addressed. The latitudinal dependence of the variability may indicate regions that are better suited for analysis than others, though in all cases the physical reasons for the correlated variability and potential drifts need to be carefully examined.
The stability levels of satellite temperature datasets are critical for understanding the merged and complex feedbacks that determine regional long-term temperature changes; understanding apparent offsets and drifts between different sources of information is important, particularly when they are large compared to expected trends.

Detecting and understanding long-term changes require some of the most challenging stability criteria in order for confidence to be placed on the final results. A number of individuals and coordinated groups have worked to define the requirements for Earth observations, including the recently completed effort by WMO (2011b), which addresses the stability needed for various parameters. In the absence of explicit requirements for limits on drift, we suggest that the standard error of the drift, at the one-sigma level, be limited to half of the trend that one is seeking to detect.

For example, if a monitoring system is designed to detect a trend of 0.2 degrees per decade, the unchecked drift should be limited to 0.1 degrees per decade.

In situ observations continue to offer some of the most useful information for constraining offsets and drifts in satellite instruments, as well as providing reliable, direct information on the Earth system. Both campaigns and long-term monitoring efforts continue to help verify the accuracy and stability of satellite observations. Despite current efforts, long-term stability and absolute calibration still present a challenge to current internal consistency methods, leading many to look to other approaches for absolute calibration, traceability, and the ability to verify stability.

Perhaps the most innovative and needed advancements will come through future in-flight calibration approaches. The development and use of these high-accuracy climate benchmark instruments has been advocated for since the early

Even with future improvements in satellite observation accuracy, the challenge will remain to understand and merge records from different satellites (each potentially using its own calibration and collection approaches) to provide a single observational record. One of the key factors that we can control is the length of overlap between existing and future satellites. Analysis of an overlap record can only give us an estimate of relative drift, but in the absence of traceable in-flight calibration, it is often one of our best checks on the long-term stability of the final data products.

Understanding that decisions on overlap will directly affect both the cost of monitoring and the value of the final dataset for evaluating long-term changes in climate, we propose approaches to objectively evaluate the length of overlap needed to achieve a specific stability in the merged data record. In Section 8 we offer an approach to evaluate how important overlap is compared to other choices that can help improve a long-term data record.

In-flight circumstances that produce instrument instabilities are highly specific to individual sensors, so the best practice is to employ instrument telemetry and on-orbit calibration methods traceable to international standards.

Set up for SIM/SOLSTICE comparison:
Solar irradiance is a crucial driver in the Earth's atmospheric system, influencing variability, circulation and long-term behavior of the atmosphere and having a direct role in atmospheric chemistry for the upper layers of the atmosphere. A motivation for this solar irradiance study, as is true for most other long-term satellite monitoring efforts, arises from the need to understand the length of time needed for the overlap of the currently operating SORCE mission with the next-generation Total and Spectral Irradiance Sensor (TSIS). TSIS is currently scheduled for launch in the fourth quarter of 2017 for deployment on the International Space Station. While Earth observations often require a minimum of a one-year overlap to cover the full range of expected observations, such arbitrary criteria ignore longer-timescale phenomena including ENSO and NAO, and are impractical for covering a full 11-year solar cycle in a planned overlap period. Here we are applying analytical techniques to understand the length of time needed to quantify the offset between two satellite observing systems and to understand the drift between two satellite records (Weatherhead et al., 1998). While it is unclear whether the TSIS/SORCE overlap will mimic the findings from the comparison of the two SORCE instruments, this effort will examine how potential instrument anomalies (offsets, drifts, and jumps) affect merged records; similar anomalies appear in both the ozone and temperature satellite datasets described above. These three sources of uncertainty combine and contribute to the length of overlap needed to derive a robust climate record from satellite records.

1. There is an offset of about 0.5% in the pre-flight calibration between SOLSTICE and SIM. The pre-flight absolute calibration is on the order of 1-2%, so this offset is within the ability to absolutely calibrate the spectrometer.
Note that the observed differences in Figure 3 are within the expected pre-flight calibration uncertainty, but these differences are still large relative to some scientific uses for solar data. The value of overlapping missions for an appropriate period of time is the ability to verify pre-flight calibration estimates of uncertainty and potentially improve the long-term datasets for scientific applications.

The next sections of this paper will address the effects of these three anomalies (offsets, drifts, and jumps) in the SORCE datasets and discuss their impacts on dataset uncertainty. The primary contribution of this paper is to

One of the most studied issues underscoring the importance of proper treatment of multiple satellite records involves the corrections and merging of Microwave Sounding Unit temperature records. Christy et al. (1995, 1998, 2000) accurately pointed out that trends from satellite temperature records were not in agreement with other temperature records and showed a cooling of the troposphere rather than a warming. Additional work showed that a number of corrections to the satellite record could make a direct and notable difference on the trend derived from the resulting data (NRC, 2000a, 2000b; Zou and Qian, 2016). Some of the most salient lessons from this effort were summarized by Thorne et al. (2005), who concluded, among other points, that "individual adjustments will a priori retain a non-climatic signal of unknown sign and magnitude regardless of how reasonable and physically plausible the chosen homogenization approach." The uncertainty of merging satellite data records is a continual challenge, with a variety of approaches employed including comparison to ground-based records, statistical intercomparison of satellites by latitude, time of day and season, as well as use of physical models to look for appropriate consistencies with available data. Details of the merging process directly influence the resultant trends and add to the level of uncertainty in the final datasets (Karl et al., 1986). In this section we consider the case where overlap is non-existent, and for the case where overlap exists we consider the length of time required to achieve a specific uncertainty in an overlapping set of data. These cases illustrate the need for overlap periods of sufficient duration to make a quantifiable improvement in the long-term record.
We consider the straightforward method for merging two sequential (non-overlapping) data records by requiring the mean level of the three years of data prior to the discontinuity to be equal to the mean level of the three years of data after the discontinuity. In such a situation, those six years of data are being forced, by the algorithm, to have very little trend. Imagine a situation where there are two such discontinuities in a twenty-year record; more than half of the data has been coerced to have virtually no trend, making the resultant data unreliable for many long-term analyses.

While it could be argued that there is nothing unique about the time step of one month, it is a common practice in climate analyses. However, the example datasets used in this paper measure extraterrestrial solar radiation. With the Sun's rotation of 27.2753 days, we have a natural timeframe close to a monthly average. In Appendix A we carry out the calculations in this paper with monthly averages and with averages based on the solar rotation schedule; we see no notable change in the basic conclusions adopting the more natural solar rotation schedule instead of monthly averages.

The overlap data depicted in Fig. 4 show a mean difference between the two datasets of 6.8x10^-4 watts m^-2 nm^-1, with a standard error on this mean of 2.7x10^-5 watts m^-2 nm^-1 when the classic standard error calculation ignores autocorrelation. However, this figure does not support the assumption that the observed differences between SOLSTICE and SIM are stable and would continue beyond the observed end of the analysis period of December 2008, because of the apparent drift in the differences. For cases when a drift is not involved, we can make use of the standard formula for the standard error on the mean of the observed time series of differences when simple autocorrelation is present:

SE = (sigma / sqrt(n)) * sqrt((1 + phi) / (1 - phi)),   (1)

where sigma is the observed magnitude of variability of the observed differences in monthly averages, phi is the observed autocorrelation in those differences, and n is the number of months of observed overlap. This estimate of the standard error of the mean depends on the data behaving as an autoregressive process with a time lag of one month, AR(1), with the underlying innovations behaving approximately as a Gaussian distribution. This more appropriate formula gives a standard error on the mean of 5.2x10^-5 watts m^-2 nm^-1, notably larger than if autocorrelation is ignored.
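The autocorrelation-adjusted standard error above is straightforward to compute. The helper below is our own sketch, using illustrative values for sigma and phi rather than the exact fitted parameters from the SOLSTICE-SIM differences; it shows how the AR(1) inflation factor enlarges the classic standard error.

```python
import math

def ar1_adjusted_se(sigma, phi, n):
    """Standard error of the mean of n monthly differences: the classic
    sigma/sqrt(n) inflated by the AR(1) factor sqrt((1+phi)/(1-phi))."""
    return (sigma / math.sqrt(n)) * math.sqrt((1 + phi) / (1 - phi))

# Illustrative values only (not the paper's fitted parameters):
sigma = 2.0e-4   # monthly variability of the differences, watts m^-2 nm^-1
phi = 0.6        # lag-1 autocorrelation of the differences
n = 57           # months of overlap

classic = sigma / math.sqrt(n)
adjusted = ar1_adjusted_se(sigma, phi, n)
# With phi = 0.6, adjusted exceeds classic by a factor of sqrt(1.6/0.4) = 2,
# so ignoring autocorrelation understates the offset uncertainty.
```

With phi = 0 the factor collapses to 1 and the classic formula is recovered, which is a quick sanity check on any implementation.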

We can invert the formula for the standard error on the mean in Eq. (1) and solve for n, resulting in the time needed to estimate the mean offset between two satellites to a given accuracy as:

n = (1.96 * sigma / d)^2 * (1 + phi) / (1 - phi),   (2)

where d is the desired accuracy of the offset estimate. The above formula shows that, for a given magnitude of variability and autocorrelation in monthly satellite overlap data (sigma and phi, respectively), the length of overlap needed is inversely proportional to the square of the accuracy desired for the offset estimate. The factor of 1.96 is to support a 95% confidence limit on the offset. If we choose a limit of 0.0008 watts m^-2 nm^-1 (which is one percent of the mean of SOLSTICE during the overlap period), then the number of months would need to be 5 months using the student-t distribution, which offers 2.8 as the appropriate factor in place of 1.96. Note that to achieve the 95% confidence limit, we must use the appropriate student-t distribution, or approximately the 1.96 multiplier in the large-number limit, to assure we have the desired confidence in our overlap adjustment. Note also that this is a recursive effort because the answer, the number of months, is a function of the multiplier, which is itself a function of the number of months. This exercise is not overly onerous, because the formula offers an estimate of the length of time needed to limit uncertainty in an offset, and such an estimate is rarely precise to many significant digits. We conclude, for the datasets we have been exposed to, that after roughly two years of data collection the large-number limit of 1.96 may be considered appropriate.
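Because the student-t multiplier depends on the answer, the inverted formula is naturally solved by fixed-point iteration. The sketch below is ours, not the paper's code; it uses standard two-sided 95% t-table values and takes sigma, phi and the desired limit as inputs.

```python
import math

# Two-sided 95% Student-t critical values for selected degrees of freedom.
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
       8: 2.306, 10: 2.228, 15: 2.131, 20: 2.086, 30: 2.042}

def t_crit(df):
    """Nearest tabulated df at or below the requested one (conservative),
    falling back to the large-sample 1.96 above the table."""
    if df > 30:
        return 1.96
    best = 1
    for k in T95:
        if k <= df:
            best = max(best, k)
    return T95[best]

def months_needed(sigma, phi, limit):
    """Months of overlap so the 95% confidence half-width of the estimated
    offset is no larger than `limit`, iterating because the t multiplier
    itself depends on the number of months."""
    n = 4.0
    for _ in range(50):
        t = t_crit(max(int(n) - 1, 1))
        n_new = (t * sigma / limit) ** 2 * (1 + phi) / (1 - phi)
        if abs(n_new - n) < 0.5:
            return math.ceil(n_new)
        n = n_new
    return math.ceil(n)
```

As the text notes, once the answer exceeds roughly two years of monthly data the iteration is unnecessary and the large-number multiplier of 1.96 can be used directly.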

The impact of the offset on the use of the data is critically important to the final analysis. While a "best" merged dataset may be produced from multiple satellites, users should never ignore the added uncertainty due to merged data sources. Using the merged data without including the impacts of the merging would result in smaller standard errors in computed means and in trend variability than is actually appropriate. The magnitude of the impact of the offset correction is dependent on the use of the data. Two cases are considered here for illustrative purposes. If the merged dataset will be used to estimate the impact of storms on a stable electrical grid, and the impacts have been estimated from the effects observed using the first satellite record, an uncertainty of 0.2% means that the new solar storms may well be off by +/-0.2%, and the uncertainty in impacts needs to be appropriately calculated and conveyed.

If the merged datasets will be used to estimate long-term trends, then an uncertainty of +/-0.2% means that any trends derived will be affected by that level of uncertainty carried through the length of the dataset used for analysis, and may affect the significance of the expected trend if care is not taken to reduce the uncertainty in the overlap adjustments.

Weatherhead et al. (1998) have shown that one can estimate the length of time needed to detect trends in environmental observations. This approach is applied here to estimate the time needed to measure a differential drift with a specified uncertainty in the observations taken by two different systems. When detection is considered at the 95% confidence level, the estimated overlap for detection (in years) is:

n* = [ (3.3 * sigma / |drift|) * sqrt((1 + phi) / (1 - phi)) ]^(2/3),   (3)

where |drift| is the absolute value of the magnitude of the differential drift, and sigma and phi are the magnitude of variability and autocorrelation, respectively, of the differenced monthly data once any existing trend is removed.

With new technologies, these assumptions must be checked by careful evaluation of the data, thus emphasizing the importance of an adequate overlap period to help confine potential drifts to a specified level. While pre-launch calibration may indicate drift will be less than a specific level, the ability to verify this will depend on independent intercomparisons of observations.
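The detection-time formula above can be evaluated directly. The helper below is our own sketch (the scaling comment is illustrative, not a SOLSTICE-SIM result); it returns the approximate years of overlap at which a given differential drift is detected at 95% confidence with 50% probability of detection.

```python
import math

def years_to_detect_drift(drift, sigma, phi):
    """Years of overlap needed to detect |drift| at the 95% confidence level
    with 50% probability of detection (Weatherhead et al., 1998)."""
    inflation = math.sqrt((1 + phi) / (1 - phi))   # AR(1) noise inflation
    return ((3.3 * sigma / abs(drift)) * inflation) ** (2 / 3)

# Halving the drift magnitude lengthens the required overlap by a factor of
# 2**(2/3), roughly 1.59, since the detection time scales as drift**(-2/3).
```

The weak (-2/3 power) dependence on the drift magnitude is why modest reductions in the drift one hopes to resolve can still demand substantially longer overlap periods.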

Although no error bars are offered in Fig. 5, it is important to remark that when estimating how long it will take to detect a specified drift, two statistical levels must be considered: one that identifies the meaning of "detecting a drift" (here the 95% confidence level) and one that identifies the probability of detection (here taken as 50%).

We focus on the ability to detect and understand the jumps that last more than a few months, as they may be the interruptions that can cause the most serious damage to long-term records, particularly those used in the context of climate research. We consider separately the two cases of how a jump affects the offset estimate and how a jump affects the drift estimate.

While ancillary data about the instrument or the observed parameter may be used to identify the existence and timing of a jump, our analysis assumes that any drift and offset will be fitted to the data simultaneously. If data will be fitted sequentially to an offset and then to a drift, the derived drift will be considerably smaller than if the data were fitted to a drift and then to an offset.

The confounding effects of jumps and drifts cannot easily be separated, although ancillary data can be extremely helpful.

Note these are synthetic data with arbitrary units.

In the case of the SORCE instruments, these jumps are mostly prompted by changes in the performance of hardware. For the particular case of Fig. 3, the jump appears to be adequately corrected, but other wavelengths show a discernible discontinuity. A similar observation can be made about the SOLSTICE slit anomaly in 2006 (see Fig. 3).
Most disconcerting about these events is the possibility that the jumps can be disguised as a drift.

We present approaches for evaluating the stability of merged satellite data and illustrate these approaches with data from two instruments used to observe solar output.

The uncertainty due to the merging of satellite records is unavoidable, but quantification of this uncertainty is essential for understanding the Earth and potential long-term changes. The value of this paper is the ability to estimate, either prior to satellite launch or soon after satellite launch, the amount of time needed to achieve or verify tolerances for a stable merged satellite record using objective criteria.

Data and code availability: Example data and code used will be available from the first author on request (Betsy.Weatherhead@Colorado.edu).

The use of monthly averaged data has been common in climate studies for many years, despite obvious deficiencies in this somewhat arbitrary choice. One deficiency is in weighting a daily value in February more highly than a daily value from any other month, simply because February has fewer days than, for instance, May. A second deficiency is the lack of match-up between the monthly timeframe and the natural world: the summer solstice is not in the center of June but off-center, meaning that the June average contains more information on pre-solstice conditions than post-solstice conditions. Even non-scientific users of climate data are used to using reports such as "Climatic Normals."

The number of years of overlap needed to detect a drift follows

n* = [ (3.3 * sigma / |drift|) * sqrt((1 + phi) / (1 - phi)) ]^(2/3),   (3)

with sigma and phi as the monthly standard deviation and autocorrelation, as described in the body of the paper. Because the estimate of the number of years is dependent on these assumptions, we explicitly test the data used as an example in this paper for illustrative purposes.

The autocorrelation, phi, is the most difficult parameter to estimate accurately in a time series, particularly when phi is large.

In situations of high autocorrelation (0.7 in the top plot of Figure B1)

Any estimate of how long it will take to correctly identify a drift must be taken with some level of understanding of how the estimate is made and what can be expected from using these estimates. Figure 5 offers estimates for a range of times needed to detect specific drifts, assuming no jumps occur in the record. As a reminder, this plot was created assuming the type of overlap seen in the SOLSTICE-SIM overlap period; specifically, the calculations assume the amount of variability and autocorrelation observed in the differences (shown in the second plot of Fig. 4).

There are no error bars in Fig. 5. We would like to begin the discussion of appropriate error bars in this section. As stated in the previous paragraph, the data in Fig. 5 represent estimates of how long it will take to detect a specific level of drift. If we focus on a single point, for instance the two-year point indicating that a drift of 1.2x10^-4 watts m^-2 nm^-1 yr^-1 could be detected, it is possible that a slightly smaller drift could be detected in that two years of overlap if the variability happens to result in a signal-to-noise ratio for the overlap period that is slightly more favorable. Similarly, if the true underlying drift is actually five times as large (6x10^-4 watts m^-2 nm^-1 yr^-1), it is highly likely the drift would be detectable within the two years. So the "error bars" on this one point would extend slightly below the current point and infinitely upward, indicating that much larger drifts could be detected in the two-year period.

Extending our discussion of error bars in Fig. 5, we can similarly think in terms of horizontal error bars. Again focusing on the one point in Fig. 5 indicating that a drift of 1.2x10^-4 watts m^-2 nm^-1 yr^-1 could be detected in two years: this drift, if it is the true underlying drift, may be detectable a few months shy of two years or may take a few months more than two years. As stated above, the two years represents a 50% likelihood of detection. It is highly unlikely that such a drift could be detected in a few months of monitoring, but it would very likely be detected in ten years of monitoring. So, again, we have error bars that are non-standard in that they extend to the left in the plot and continue indefinitely to the right.
Figure C1. Estimates of how long it will take to detect a drift can be interpreted as the likely time needed. Depending on the variability present, even small drifts can be detected (although with less than 50% likelihood of detection), with the probability indicated by the width of the green area. For a given drift level, there is a chance that the drift can be detected in less than the number of years indicated, although that likelihood is less than 50% for times shorter than those indicated, with the probability indicated by the blue area.

If we want to express this uncertainty in the likelihood of detection in a visual manner, we could employ two-dimensional error bars, similar to the violin plots which are often employed to express variable information. Figure C1 shows the likelihood of detecting a particular drift with two years of overlap. For drifts considerably smaller than 1.2x10^-4 watts m^-2 nm^-1 yr^-1, the likelihood of detection with a two-year overlap is represented by the width of the green area. Larger drifts can be detected with higher likelihood. Figure C1 also shows the likelihood of a true drift of 1.2x10^-4 watts m^-2 nm^-1 yr^-1 being detected in less than two years. The height of the blue bar indicates the likelihood, with the linear scale defined such that the likelihood of detection is 50% at two years; considerably higher likelihood of detection is indicated with more years of overlap. There is also a small, but less than 50%, likelihood that the true drift might be detected in less than two years; again, the height of the blue bars indicates the likelihood of detection.

Treating the value of information derived from satellite data as a public good (e.g., weather forecasts and climate services have the non-rival and non-excludable characteristics which define public goods in economic theory), total societal benefits are the sum of the benefits realized by all users of the information. Net benefits (NB) are the difference between total benefits (TB) and total costs (TC).
For purposes of the current discussion we take total costs to simply be a function of the temporal overlap in satellite