Improvement of climate predictions and reduction of their uncertainties using learning algorithms

Simulated climate dynamics, initialized with observed conditions, is expected to be synchronized, for several years, with the actual dynamics. However, the predictions of climate models are not sufficiently accurate. Moreover, there is a large variance between simulations initialized at different times and between different models. One way to improve climate predictions and to reduce the associated uncertainties is to use an ensemble of climate model predictions, weighted according to their past performances. Here, we show that skillful predictions, for a decadal time scale, of the 2 m temperature can be achieved by applying a sequential learning algorithm to an ensemble of decadal climate model simulations. The predictions generated by the learning algorithm are shown to be better than those of each of the models in the ensemble, the better performing simple average and a reference climatology. In addition, the uncertainties associated with the predictions are shown to be reduced relative to those derived from an equally weighted ensemble of bias-corrected predictions. The results show that learning algorithms can help to better assess future climate dynamics.


Introduction
A new group of global climate simulations, referred to as the decadal experiments, was introduced in the Coupled Model Intercomparison Project phase 5 (CMIP5) multi-model ensemble (Taylor et al., 2012; Meehl et al., 2009). The decadal climate predictions differ from the long-term climate projections in their duration, aims and meaningful output. The idea behind the decadal experiments was to investigate the predictability of the climate by atmosphere-ocean general circulation models (AOGCMs) in time scales of up to 30 years, whereas long-term climate projections use the same type of models to predict the forced response of the climate system to different future atmospheric compositions over the next century (Meehl et al., 2009; Taylor et al., 2012).
The AOGCMs in the decadal experiments were initialized with interpolated observations of the ocean, sea ice and atmospheric conditions, together with the atmospheric composition (Taylor et al., 2011) (note that long-term projections are initialized with a quasi-equilibrium pre-industrial state; Taylor et al., 2012). Therefore, they were expected to reproduce the monthly and annual averages of the climate variables and the response of the climate system to changes in the atmospheric composition (Warner, 2011; Collins, 2007; Kim et al., 2012). Indeed, it was shown (Kim et al., 2012) that in some regions, the CMIP5 simulations have some prediction skill. It was also confirmed (Kim et al., 2012) that the multi-model average provides better predictions than each of the models, similar to what was found for other climate simulations (Doblas-Reyes et al., 2000; Palmer et al., 2004; Hagedorn et al., 2005; Feng et al., 2011). However, the simple multi-model average does not take into account the quality differences between the models; therefore, it is expected that a weighted average, with weights based on the past performances of the models, will provide better predictions than the simple average. As expected, it was shown that the weighted average of climate models can improve predictions when using ensembles of AGCMs (Rajagopalan et al., 2002; Robertson et al., 2004; Yun et al., 2003), AOGCMs (Yun et al., 2005; Pavan and Doblas-Reyes, 2000; Chakraborty and Krishnamurti, 2009) and regional climate models (Feng et al., 2011; Samuels et al., 2013).
Published by Copernicus Publications on behalf of the European Geosciences Union.

E. Strobach and G. Bel: Climate predictions using learning algorithms
The uncertainties in climate predictions can be attributed to three main sources: the internal variability of the model, inter-model variability and future forcing scenario uncertainties. The internal variability of the model stems from the sensitivity of the model to the initial conditions, sensitivity to the values of the parameters and the discretization method used. The inter-model variability is the result of different parameterization schemes and modeling approaches adopted in different models. The uncertainties due to different forcing scenarios are mostly related to different scenarios assumed regarding future greenhouse gas emissions. On a decadal time scale, forcing scenario uncertainties and uncertainties due to the internal variability of each model are considerably smaller than the inter-model uncertainties (Meehl et al., 2009; Hawkins and Sutton, 2009) (we also verified that the internal variability of each of the models we used is much smaller than the inter-model variability). Therefore, estimation of the uncertainties from an ensemble of climate models is expected to give a meaningful estimation of the total climate prediction uncertainties.
Different methods were used to improve climate predictions using an ensemble of models. A common approach is the simple regression (Krishnamurti et al., 2000; Krishnamurti, 1999). The regression does not assign a weight to each member of the ensemble but rather attempts to find the set of coefficients yielding the minimal square error for a linear combination of the ensemble model predictions. Bayesian methods have also been used for weighting ensembles of climate model projections (Rajagopalan et al., 2002; Robertson et al., 2004; Tebaldi and Knutti, 2007; Smith et al., 2009; Buser et al., 2009, 2010). The weighting scheme of these methods relies on a certain distribution of the errors and other prior assumptions regarding the models; these assumptions are not necessarily valid for climate dynamics and predictions. Many variations of the Bayesian methods were applied to weather forecasting in order to establish the ensemble of models (Kalnay et al., 2006); these methods are less useful for climate predictions in which the variability between different models is larger than the internal variability of each model (Meehl et al., 2009; Hawkins and Sutton, 2009).
Here, we use several sequential learning algorithms (SLAs) to weight climate models in the CMIP5 decadal experiments (Taylor et al., 2011) and thereby to improve both global and regional predictions. In addition, we show that the uncertainties associated with these improved predictions are smaller than those of the unweighted ensemble. The first algorithm is the exponentiated weighted average (EWA) (Littlestone and Warmuth, 1994), and the second is the exponentiated gradient average (EGA) (Kivinen, 1997). The two original algorithms were modified and adjusted to improve decadal climate predictions. A more recent algorithm, the learn-α algorithm (LAA), which is more suitable for the study of nonstationary sequences, was also used (Monteleoni and Jaakkola, 2003). The decadal climate predictions allow us to have a learning period and a validation period for testing the SLAs' performances. In addition, the use of methods for nonstationary sequences helps to assess the stationarity of the climate predictions in decadal time scales.
It is important to note that the SLA method assigns real weights (taking values between zero and one) to the ensemble models rather than to future climate paths (it is straightforward to use the weights of the models to get the probabilities of future climate paths, which are the common products of the Bayesian approaches); this characteristic makes the SLA method appropriate for model evaluation. The SLA method has several advantages compared with other weighting schemes: (i) it makes no assumptions regarding the distribution of the climate variables and the model parameters; therefore, it can be used for all climate variables and all types of predictions; (ii) there is an upper bound for the deviation of the weighted ensemble average from the best model. For a sufficiently lengthy learning period (the duration of this period depends on the variable, the learning rate (which is described later) and the number of models in the ensemble), the SLA prediction is at least as good as the prediction of the best model in the ensemble; (iii) the weights can be dynamically updated, when new measurements are introduced, with no significant computational cost.

The sequential learning algorithms
A sequential learning algorithm (SLA) (also known as online learning) assigns weights to the climate models (the experts) in the ensemble based on their past performance. In this work, the output of the models was divided into two periods: a learning period during which the weights were updated and a prediction period during which the weights remained fixed and equal to the weights assigned by the SLA in the last step of the learning process. In order to capture the spatial variability in model performance, the weights were spatially distributed and the weight of each model in each grid cell was determined by the local past performance of the model. For the sake of clarity, the algorithm is described below without spatial indexes although the calculations were done for each grid cell separately. The prediction of the SLA forecasters is the weighted average of the ensemble (Cesa-Bianchi and Lugosi, 2006). The weights are assigned to minimize the cumulative regret with respect to each one of the climate models. The cumulative regret of expert E is defined as follows:

$$R_{E,n} \equiv \sum_{t=1}^{n} \left( l(p_t, y_t) - l(f_{E,t}, y_t) \right) \equiv L_n - L_{E,n}. \tag{1}$$

Here, $t$ is a discrete time and $l$ denotes some loss function that is a measure of the difference between the predicted ($p_t$ by the forecaster and $f_{E,t}$ by expert E) and the true ($y_t$) values. In this work, we defined the loss function to be the square of the difference between the forecaster prediction and the "real" value, namely, $l(p_t, y_t) \equiv (p_t - y_t)^2$. $L_n \equiv \sum_{t=1}^{n} l(p_t, y_t)$ and $L_{E,n} \equiv \sum_{t=1}^{n} l(f_{E,t}, y_t)$ are the cumulative loss functions of the forecaster and expert E, respectively. The outcome of the forecaster, after $n - 1$ steps of learning, is weights assigned to the climate models in the ensemble to be used for forecasting the value at $t = n$. The forecast for $t = n$ is the weighted average of the climate models:

$$p_n \equiv \frac{\sum_{E=1}^{N} w_{E,n-1} \, f_{E,n}}{\sum_{E=1}^{N} w_{E,n-1}}. \tag{2}$$

Here, $N$ is the number of models (experts) and $w_{E,n-1}$ is the weight of expert E, which is determined by the regret up to time $n - 1$. We used two forecasters (weighting schemes): the exponentiated weighted average (EWA) and the exponentiated gradient average (EGA). The EWA weight is defined as follows:

$$w_{E,n-1} \equiv \frac{e^{-\eta L_{E,n-1}}}{\sum_{E'=1}^{N} e^{-\eta L_{E',n-1}}},$$

and its prediction at time $n$ is the following:

$$p_n = \frac{\sum_{E=1}^{N} f_{E,n} \, e^{-\eta L_{E,n-1}}}{\sum_{E=1}^{N} e^{-\eta L_{E,n-1}}}.$$

The EGA is similar to the EWA but with the cumulative loss calculated from the summation of the loss gradients. The cumulative loss for the EGA forecaster is defined as follows:

$$L'_{E,n} \equiv \sum_{t=1}^{n} l'(f_{E,t}, y_t),$$

where

$$l'(f_{E,t}, y_t) \equiv \frac{\partial l(p_t, y_t)}{\partial w_{E,t-1}} = 2 \, (p_t - y_t) \, f_{E,t}.$$

For both forecasters, $\eta > 0$ is a parameter representing the learning rate.
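As a rough illustration of the two forecasters, the following Python sketch applies them to the idealized two-expert experiment of Fig. 1 (one expert always predicts zero, the other always predicts one, and the true value is always 0.7). The function name `run_forecaster`, the value η = 0.5 and the number of steps are hypothetical choices made for this example. The EGA loss gradient is assumed to be 2(p_t − y_t) f_{E,t}, i.e., the derivative of the square loss with respect to the weight of expert E; this is a sketch under these assumptions and not the code used in the study.

```python
import math


def run_forecaster(expert_preds, truth, eta, kind="EWA"):
    """Sequentially weight experts; return (forecasts, final weights).

    expert_preds[t][E] is the prediction of expert E at step t and
    truth[t] is the observed value. The weights used at step t depend
    only on the losses accumulated up to step t - 1.
    """
    n_experts = len(expert_preds[0])
    cum = [0.0] * n_experts                  # cumulative loss (or loss gradient)
    weights = [1.0 / n_experts] * n_experts  # equal weights before learning
    forecasts = []
    for f_t, y_t in zip(expert_preds, truth):
        p_t = sum(w * f for w, f in zip(weights, f_t))
        forecasts.append(p_t)
        for E, f in enumerate(f_t):
            if kind == "EWA":
                cum[E] += (f - y_t) ** 2         # square loss of expert E
            else:  # "EGA" (assumed gradient of the forecaster's square loss)
                cum[E] += 2.0 * (p_t - y_t) * f
        shift = min(cum)                         # avoids overflow in exp()
        raw = [math.exp(-eta * (c - shift)) for c in cum]
        total = sum(raw)
        weights = [r / total for r in raw]
    return forecasts, weights


# Idealized experiment: experts predict 0 and 1, the true value is 0.7.
steps = 500
experts = [[0.0, 1.0]] * steps
truth = [0.7] * steps
ewa_forecasts, ewa_weights = run_forecaster(experts, truth, eta=0.5, kind="EWA")
ega_forecasts, ega_weights = run_forecaster(experts, truth, eta=0.5, kind="EGA")
print(ewa_forecasts[-1], ega_forecasts[-1])
```

In this toy setting, the EWA forecast approaches the prediction of the better expert, whereas the EGA forecast approaches the true value, which lies between the two experts.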
The deviation between the forecast and the "real" trajectory was quantified using the root mean square error (RMSE). The RMSE of a grid cell with coordinates $(i, j)$, over a period of $n$ time steps (months in our case), is defined as follows:

$$\mathrm{RMSE}(i, j) \equiv \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( p_t(i, j) - y_t(i, j) \right)^2},$$

where $p_t(i, j)$ is the value predicted by the forecaster and $y_t(i, j)$ is the "real" value. The global, area-weighted RMSE is defined as follows:

$$\mathrm{GRMSE} \equiv \frac{1}{A_{\mathrm{Earth}}} \sum_{i,j} A_{i,j} \, \mathrm{RMSE}(i, j),$$

where $A_{\mathrm{Earth}}$ is the earth's surface area and $A_{i,j}$ is the area of the $(i, j)$ grid cell.
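The two error measures can be sketched in Python as follows. The function names are hypothetical, and the area weighting assumes a regular latitude-longitude grid in which the area of a cell is proportional to the cosine of its central latitude; the grid actually used in the study may require different area weights.

```python
import math


def cell_rmse(forecast, observed):
    """RMSE of one grid cell over the time steps of the two series."""
    n = len(observed)
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(forecast, observed)) / n)


def global_rmse(rmse_grid, lats_deg):
    """Area-weighted mean of the per-cell RMSE.

    rmse_grid[i][j] holds the RMSE of the cell in latitude row i and
    longitude column j; lats_deg[i] is the central latitude of row i.
    Assumption: cell area is proportional to cos(latitude).
    """
    weighted_sum = 0.0
    total_area = 0.0
    for row, lat in zip(rmse_grid, lats_deg):
        area = math.cos(math.radians(lat))
        for value in row:
            weighted_sum += area * value
            total_area += area
    return weighted_sum / total_area


print(cell_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(global_rmse([[1.0, 1.0], [3.0, 3.0]], [0.0, 60.0]))
```

Because the cells at 60° latitude cover half the area of the equatorial cells, their larger error contributes less to the global value than it would in an unweighted mean.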
The learning rate, η, was chosen to minimize the metric M ≡ RMSE · (1 + floor(max(Δw/Δt)/(1/N))) during the learning period. This metric provides a minimal deviation of the forecast climate trajectory from the observed one and also ensures stable weights of the models (a change in the weight of a model was considered significant if it reached 1/N, the weight a model would be assigned in the absence of learning). We also tested the optimization of η using only a fraction of the learning period and found that as long as the optimization period was of the same order as the prediction period, there was no significant change in the outcome. The optimal value of η was found using a recursive search. We scanned the range η ∈ [0, 700] in which the lower limit represents no learning and the upper limit was set by the machine precision. However, the optimal value never reached the upper limit and, in most grid cells, was found to be at least an order of magnitude smaller. In the first scan, we used a coarse resolution of Δη = 10 and recursively narrowed the range to reach a resolution of Δη = 0.01. Other methods to search for the optimal value of η provided similar results but were less efficient. An important difference between the EWA and EGA methods is that after a long enough learning period under ideal conditions (stationary time series), the former converges to the best model while the latter converges to the "real" value assuming that the real value is known. Figure 1 illustrates this difference using a simple case.
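The metric and the coarse-to-fine scan can be sketched as follows. The function names, the sequence of intermediate resolutions (10, 1, 0.1, 0.01) and the use of the absolute weight change per time step are assumptions made for this example; the sketch also assumes that the metric has a single minimum within each bracket, which need not hold in practice.

```python
import math


def stability_metric(rmse, weight_history):
    """M = RMSE * (1 + floor(max|dw/dt| / (1/N))), with dt = one time step.

    weight_history[t][E] is the weight of model E after step t.
    Assumption: the absolute value of the weight change is used.
    """
    n_models = len(weight_history[0])
    max_dw = 0.0
    for prev, curr in zip(weight_history, weight_history[1:]):
        for w_old, w_new in zip(prev, curr):
            max_dw = max(max_dw, abs(w_new - w_old))
    return rmse * (1 + math.floor(max_dw * n_models))


def recursive_eta_search(metric, eta_min=0.0, eta_max=700.0,
                         steps=(10.0, 1.0, 0.1, 0.01)):
    """Scan [eta_min, eta_max] on a coarse grid, then refine around the best value."""
    lo, hi = eta_min, eta_max
    best = lo
    for step in steps:
        n = int(round((hi - lo) / step))
        candidates = [min(lo + k * step, hi) for k in range(n + 1)]
        best = min(candidates, key=metric)
        # Narrow the range to one coarse step on each side of the best value.
        lo, hi = max(eta_min, best - step), min(eta_max, best + step)
    return best


# Hypothetical smooth metric with a minimum at eta = 3.37.
print(recursive_eta_search(lambda eta: (eta - 3.37) ** 2))
```

In an application, `metric` would run the forecaster over the learning period for the given η and return `stability_metric` of the resulting RMSE and weight history.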
This difference between the forecasters implies that for a long enough learning period, using an ensemble that includes one model that performs better throughout the learning period, the weights will be distributed such that the prediction of the EWA will be determined by this best model and the uncertainty will be very small (due to the small weights of the other models).Under the same conditions, the EGA would still assign more significant weights to the other models in order to extract the information they contain regarding the dynamics of the "real" value, leading to larger uncertainty (and often better predictions).
The learn-α algorithm (Monteleoni and Jaakkola, 2003) is based on the fixed-share algorithm developed by Herbster and Warmuth (1998). The fixed-share algorithm is designed to switch between experts (or between climate models in our case) in response to changes in their performances. It is done by adding a switching probability parameter, α, that ensures that all experts are considered at all times. Monteleoni and Jaakkola (2003) improved this algorithm by learning the optimal switching rate between experts. This algorithm was already tested for long-term climate projections using the CMIP3 long-term experiments (Monteleoni et al., 2010, 2011), and here we also test its performance in decadal climate predictions for comparison with the EWA and EGA methods.
The learn-α algorithm assigns weights for each expert and for each value of the switching rate $\alpha_j \in [0, 1]$; the discrete index $j \in \{1, \ldots, m\}$ represents the performance-optimized discretization of α (Monteleoni and Jaakkola, 2003). The weight of each expert for a given value of α, $w^{E,t=1}_{\alpha_j}$, is set initially to $1/N_e$ ($N_e$ is the number of experts in the ensemble), and the weight of each $\alpha_j$, $w^{t=1}_{\alpha_j}$, is set initially to $1/N_\alpha$ ($N_\alpha$ is the number of discrete values of α ∈ [0, 1] that are considered). The weights are updated as follows. (i) At each time step, the loss of each model, E, is calculated in a similar manner to the EWA, $l_{E,t} \equiv (f_{E,t} - y_t)^2$. (ii) For each $\alpha_j$, the loss per α is calculated, $l^{t}_{\alpha_j} \equiv -\log \sum_{E=1}^{N_e} w^{E,t}_{\alpha_j} e^{-l_{E,t}}$, and the weight of $\alpha_j$ is updated according to

$$w^{t+1}_{\alpha_j} = \frac{1}{Z_t} \, w^{t}_{\alpha_j} \, e^{-l^{t}_{\alpha_j}},$$

where $Z_t$ normalizes the weights. (iii) For each model, E, and switching rate, $\alpha_j$, the weight $w^{E,t}_{\alpha_j}$ is updated according to

$$w^{E,t+1}_{\alpha_j} = \frac{1}{Z_t(\alpha_j)} \sum_{E'=1}^{N_e} w^{E',t}_{\alpha_j} \, e^{-l_{E',t}} \left[ (1 - \alpha_j) \, \delta(E, E') + \frac{\alpha_j}{N_e - 1} \left( 1 - \delta(E, E') \right) \right],$$

where $\delta(\cdot, \cdot)$ is the Kronecker delta and $Z_t(\alpha_j)$ normalizes the weights per alpha.
The prediction at $t = n$ is a weighted average of the experts and the different values of α:

$$p_n = \sum_{j=1}^{N_\alpha} \sum_{E=1}^{N_e} w^{n}_{\alpha_j} \, w^{E,n}_{\alpha_j} \, f_{E,n}.$$

One can see that in the LAA, the learning rate is fixed at η = 1 and the switching rate, α, is sequentially optimized, while for the EWA and EGA, the learning rate, η, was set to achieve the best performance during the learning period. The LAA is designed to switch between models faster than the EWA, which is important when the sequences learned are nonstationary.
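A compact Python sketch of the learn-α update is given below. It assumes the standard fixed-share transition, in which an expert keeps a fraction 1 − α of its weight and passes α/(N_e − 1) to each of the other experts. The grid of α values in the example is hypothetical and is not the performance-optimized discretization of Monteleoni and Jaakkola (2003). The sketch requires at least two experts and does not guard against numerical underflow for very large losses.

```python
import math


def learn_alpha(expert_preds, truth, alphas):
    """Learn-alpha sketch; returns (forecasts, alpha weights, expert weights).

    expert_preds[t][E] is the prediction of expert E at step t,
    truth[t] is the observed value and alphas is a list of switching rates.
    """
    n_e = len(expert_preds[0])
    n_a = len(alphas)
    w_alpha = [1.0 / n_a] * n_a                 # weight of each switching rate
    w_exp = [[1.0 / n_e] * n_e for _ in alphas]  # expert weights per rate
    forecasts = []
    for f_t, y_t in zip(expert_preds, truth):
        # Forecast: average over switching rates and experts.
        p_t = sum(wa * sum(w * f for w, f in zip(we, f_t))
                  for wa, we in zip(w_alpha, w_exp))
        forecasts.append(p_t)
        losses = [(f - y_t) ** 2 for f in f_t]
        new_w_alpha, new_w_exp = [], []
        for a, wa, we in zip(alphas, w_alpha, w_exp):
            post = [w * math.exp(-l) for w, l in zip(we, losses)]
            total = sum(post)                    # equals exp(-loss per alpha)
            new_w_alpha.append(wa * total)
            # Fixed share: keep (1 - a) of the own mass, receive a / (n_e - 1)
            # of the mass of every other expert.
            shared = [(1.0 - a) * post[E] + a / (n_e - 1) * (total - post[E])
                      for E in range(n_e)]
            norm = sum(shared)
            new_w_exp.append([s / norm for s in shared])
        z = sum(new_w_alpha)
        w_alpha = [x / z for x in new_w_alpha]
        w_exp = new_w_exp
    return forecasts, w_alpha, w_exp


# Nonstationary toy case: the best expert changes after 50 steps.
experts = [[0.0, 1.0]] * 150
truth = [0.0] * 50 + [1.0] * 100
forecasts, w_alpha, w_exp = learn_alpha(experts, truth, [0.0, 0.01, 0.1, 0.5])
print(forecasts[49], forecasts[149])
```

In this toy case, the forecast follows the first expert before the change and the second expert afterwards, while the weight of α = 0 (no switching) is reduced once the best expert changes.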

Improved predictions
We consider an ensemble of eight global climate models for the period of 1981-2011, whose results are part of the CMIP5 decadal experiments (Taylor et al., 2011). Table 1 describes the eight models that we used in this study. These models were first linearly interpolated to the spatial resolution of the NCEP/NCAR reanalysis data using the NCAR Command Language (NCL, 2011). We focus on the model predictions of the 2 m temperature. The decadal experiments of the CMIP5 project include a set of runs for each of the models, which represent different initial conditions. In agreement with common knowledge (Meehl et al., 2009), we found that on decadal time scales, the internal variability of each model is smaller than the variability between the models. Therefore, we chose, arbitrarily, the first run for each of the ensemble models. The results presented here are based on a learning period of 20 years, followed by predictions for a 10-year (2001-2011) validation period.
The learning period served for both learning (i.e., weight assignment) and correcting the bias of the models. This was simply done by subtracting the average of each of the models during the learning period and adding the average of the NCEP/NCAR reanalysis data (Kalnay et al., 1996) (considered here as reality). This bias correction was applied to each grid cell separately and was done to ensure that the improvement achieved by the forecasters was beyond the impact of a simple bias correction. In addition, we chose a long enough learning period to ensure that our results were not affected by the drift of the models from the initial condition toward their climate dynamics (Meehl et al., 2009).
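For a single grid cell, the bias correction described above can be sketched as follows. The function name and the toy numbers are hypothetical; the only assumption is that the model and reanalysis series are aligned in time and that the first `n_learn` entries form the learning period.

```python
def bias_correct(model_series, reanalysis_series, n_learn):
    """Shift a model series so that its learning-period mean matches the reanalysis.

    The learning-period average of the model is subtracted and the
    learning-period average of the reanalysis is added; the same shift
    is applied to the validation period.
    """
    model_mean = sum(model_series[:n_learn]) / n_learn
    reanalysis_mean = sum(reanalysis_series[:n_learn]) / n_learn
    return [x - model_mean + reanalysis_mean for x in model_series]


# Toy series: the first two entries play the role of the learning period.
print(bias_correct([2.0, 4.0, 6.0, 8.0], [1.0, 1.0, 1.0, 1.0], n_learn=2))
```

The correction removes only a constant offset per grid cell, so any improvement obtained by the weighting comes on top of this simple adjustment.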
The performance of the models was determined by comparing the model predictions to the NCEP/NCAR reanalysis data (Kalnay et al., 1996). We are aware of the spurious variability and trends in the NCEP data and of other reanalysis projects (Uppala et al., 2005; Onogi et al., 2007); however, in order to demonstrate the capability of the SLA to improve global and regional climate predictions, the reanalysis data is the best data set to use.
Using the predictions of the climate models only 20 years after they were initialized can cast doubt on their ability to generate skillful predictions since it is believed that climate models' skill tends to vanish after that long a period. However, we found that, for most of the models we used, this is not the case. This fact is illustrated in Fig. 2, which shows that the globally averaged RMSE of most of the climate models did not increase considerably during the 30-year-long simulations. Another noticeable and important feature of the CMIP5's climate models is the fact that, globally, climatology performs much better than each of the models. In Sect. 5, we show that, despite this fact, the SLA can use the models and the climatology to provide a forecast that is better than the climatology. Four forecasting methods (forecasters) were tested: the EWA, the EGA, the LAA and a simple average. The simple average represents no learning and is presented to illustrate the superior performance of the SLAs. The performance of the forecasters is measured by the root mean square error (RMSE), during the validation period, which quantifies the deviation of the predicted climate trajectory from the observed one.
Figure 3 shows the RMSE in the 2 m-temperature monthly average prediction, during the 10-year validation period, for each grid cell. Panels a, b, c and d correspond to the RMSE of the EWA, EGA, LAA and simple average weighting schemes, respectively. The EWA, EGA and LAA forecasters give better predictions than the simple average. The improvement achieved by the three forecasters, compared with the simple average, is more apparent close to the poles and in South America. In these regions, the models deviate more from each other, and the weighting schemes favor those that perform better. Over the oceans and low to mid-latitudes, the models showed better agreement, and therefore, the weighting schemes did not yield a large improvement. The global, area-weighted RMSE can be used to quantify the improvement achieved by the SLA forecasters, that is, 1.316 °C for the EWA, 1.297 °C for the EGA, 1.372 °C for the LAA and 1.390 °C for the simple average. Since the EWA has the tendency to converge to the best model (if the ensemble includes a model that is always better than the others in certain regions), we also compared the performance of the EWA and EGA forecasters with two forecasting methods that predict according to the best model (defined as the model that was assigned the highest weight according to either the EWA or the EGA) in each grid cell. The global, area-weighted RMSE was found to be 1.568 °C for the best model based on the EWA and 1.633 °C for the best model based on the EGA. These results show that the SLA forecasters outperform the best models in the ensemble. In general, we found that a longer learning period improves the predictions of the forecasters. Figure 4 shows that the area-weighted RMSE of the forecasters (during the validation period) is reduced when the learning period is extended. By increasing the learning rate, we found that shorter learning periods can be selected with no significant increase in error; however, we chose a learning period that is of the order of the prediction period in order to capture the climate dynamics in all the time scales that are relevant to the prediction period.

Reduced uncertainties
The weights obtained from the SLA method can be used to better estimate the uncertainties of the predictions. The uncertainties are quantified by the square root of the time average of the weighted variance of the ensemble. This quantity (for a period of $n$ time steps) in the $(i, j)$ grid cell is defined as follows:

$$\mathrm{SD}(i, j) \equiv \sqrt{\frac{1}{n} \sum_{t=1}^{n} \sum_{E=1}^{N} w_E(i, j) \left( f_{E,t}(i, j) - p_t(i, j) \right)^2}. \tag{13}$$

Here, $f_{E,t}(i, j)$ is the prediction of model E for grid cell $(i, j)$ at time $t$; $p_t(i, j)$ is the prediction of the forecaster for grid cell $(i, j)$, at time $t$ (i.e., the weighted average of all the models); and $w_E(i, j)$ is the weight assigned to model E at grid cell $(i, j)$ (the weights remain constant during the validation period for which the SD is calculated). The global, area-weighted uncertainty is defined as follows:

$$\mathrm{GSD} \equiv \frac{1}{A_{\mathrm{Earth}}} \sum_{i,j} A_{i,j} \, \mathrm{SD}(i, j).$$

Figure 4 (caption, partial). The presented RMSE was calculated for the EGA forecaster; however, a similar trend was obtained for the EWA and LAA. In general, a longer learning period improves the forecaster predictions.

Figure 5 shows the uncertainty of the 2 m temperature during the validation period for the four forecasting methods; panels a, b, c and d correspond to the EWA, EGA, LAA and simple average forecasters, respectively. It is important to note that this uncertainty is only due to the different predictions of the ensemble models; other sources of uncertainty are not affected by our forecasting schemes. The three learning algorithms, the EWA, EGA and LAA forecasters, yield smaller uncertainties than does the simple average. The improvement is significant in regions where the uncertainties are larger, such as toward the poles and over South America and Africa. The global, area-weighted uncertainties are 1.242, 1.381, 1.078 and 1.593 °C for the EWA, EGA, LAA and simple average forecasters, respectively. These values show that in addition to improving the predictions, the SLA forecasters also reduce the uncertainties of these predictions. Note that the smaller uncertainty of the EWA and the LAA forecasters is simply due to the fact that these forecasters converge to the best model in each grid cell (if the ensemble includes a model that is always the best). The uncertainty of the EGA provides a better estimate of the predictions' uncertainty because its predictions converge to the observations.
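The per-cell uncertainty, i.e., the square root of the time-averaged weighted variance of the ensemble, can be sketched in Python as follows. The function name and the toy inputs are hypothetical, and the weights are assumed to be normalized to sum to one.

```python
import math


def ensemble_sd(expert_preds, weights):
    """Square root of the time-averaged weighted ensemble variance.

    expert_preds[t][E] is the prediction of model E at step t and
    weights[E] is the fixed (normalized) weight of model E.
    """
    n = len(expert_preds)
    total = 0.0
    for f_t in expert_preds:
        p_t = sum(w * f for w, f in zip(weights, f_t))  # weighted forecast
        total += sum(w * (f - p_t) ** 2 for w, f in zip(weights, f_t))
    return math.sqrt(total / n)


preds = [[0.0, 1.0]] * 12
print(ensemble_sd(preds, [0.5, 0.5]))   # equally weighted ensemble
print(ensemble_sd(preds, [0.0, 1.0]))   # all weight on one model
```

The second call illustrates why a forecaster that concentrates nearly all the weight on a single model reports a very small uncertainty regardless of how far that model is from the observations.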

Skillful forecast
The skill of a forecaster may be defined as its ability to provide better predictions than the reference climatology. In our study, the natural choice is the climatology of the learning period; that is,

$$C_m \equiv \frac{1}{L} \sum_{i=1}^{L} y_{i,m},$$

where $y_{i,m}$ is the value of the variable (in this study, it is the 2 m temperature as reported in the reanalysis data) in the calendar month $m$ of the year $i$; the learning period duration is $L$ years; and the climatology, $C_m$, is just the average of that variable during the $L$ years. A prediction that is based on climatology assumes that for each month of the prediction period, the value of the variable will be equal to the climatology of the corresponding calendar month. Therefore, it is reasonable to expect that a skillful model should provide more information on the variability of the climate than the average of previous years (the climatology). Figure 6a shows the differences between the 10-year RMSE of the 2 m-temperature monthly mean of the climatology and of the EGA forecaster. Positive values represent locations where the EGA forecaster has a smaller RMSE and is, therefore, considered as a skillful forecaster. In most regions, the climatology performs better than the EGA forecaster (and, obviously, better than the best model); however, some regions indicate the EGA's advantage, such as eastern North America up to Greenland. We found that the regions in which the EGA forecaster performs better are characterized by larger variability (which increases the deviations from the climatology). The global, area-weighted RMSE is 1.188 °C for the climatology and 1.373 °C for the EGA. One could conclude that the EGA forecaster is not skillful.
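The climatology of the learning period and the forecast derived from it can be sketched as follows for one grid cell. The function names are hypothetical, and the sketch assumes a gap-free monthly series that starts in January.

```python
def monthly_climatology(observations, n_years):
    """Average of each calendar month over the first n_years of a monthly series."""
    return [sum(observations[12 * i + m] for i in range(n_years)) / n_years
            for m in range(12)]


def climatology_forecast(climatology, n_months, start_month=0):
    """Repeat the 12 climatological values over the prediction period."""
    return [climatology[(start_month + k) % 12] for k in range(n_months)]


# Toy series: two years in which every month of year 2 is 10 units warmer.
obs = [m + 10.0 * i for i in range(2) for m in range(12)]
clim = monthly_climatology(obs, n_years=2)
print(clim)
print(climatology_forecast(clim, n_months=14))
```

Treated this way, the climatology is simply one more time series per grid cell and can be handled like the prediction of any other expert.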
To circumvent this problem, we decided to add the climatology of the learning period as an additional model to our ensemble. In Fig. 6b, we show the difference between the RMSE of the EGA forecaster, for the model ensemble that includes the climatology, and the RMSE of the climatology itself. In this figure, one can see that the EGA forecaster, for the model ensemble that includes the climatology, provides predictions that are at least as good as the climatology over most of the globe. Adding the climatology to the ensemble reduced the global, area-weighted RMSE of the EGA forecaster to 1.156 °C, a small improvement (a reduction of about 2.7 %) over the climatology. The global, area-weighted RMSE of the EWA, LAA and simple average with climatology are 1.187, 1.180 and 1.337 °C, respectively. The global, area-weighted uncertainties of the 10-year validation period, in this case, are 0.118, 0.953, 0.836, and 1.552 °C for the EWA, EGA, LAA and simple average forecasters, respectively. Note that as we mentioned earlier, the small uncertainty associated with the EWA forecaster is not representative of the climate prediction uncertainty. In what follows, we focus on the significance of the results of the EGA forecaster.

Significance tests
There is more than one test that can be done to demonstrate the significance of the results. We focus on testing whether the EGA forecaster improves the predictions beyond climatology (as shown earlier, each of the models performs worse than the climatology) and whether it reduces the uncertainties below those of an equally weighted ensemble. Both tests were done globally and regionally. We start by defining two quantities. The first is the difference between the absolute error of the climatology and the absolute error of the EGA forecaster at a given grid cell and time point, that is, $|C_t(i, j) - y_t(i, j)| - |p_t(i, j) - y_t(i, j)|$. The second is the difference between the uncertainties of the equally weighted ensemble and the ensemble weighted according to the EGA forecaster at a given grid cell and time point, that is, $\frac{1}{N} \sum_{E=1}^{N} \left( f_{E,t}(i, j) - f_{\bullet,t}(i, j) \right)^2 - \sum_{E=1}^{N} w_E(i, j) \left( f_{E,t}(i, j) - p_t(i, j) \right)^2$ (the dot replacing the E index represents averaging over that index). For both quantities, positive values represent a better performance of the EGA forecaster. The 10-year validation period yields, for each of these quantities, a time series with 120 points in each grid cell. The fraction of the time series (the number of points out of the total 120) showing positive values can be used to test the significance of the improvement. We define a significant improvement by the EGA forecaster to be when the number of successes is above 66 (i.e., when the null hypothesis that the quantities defined above are symmetrically distributed around zero is rejected with ∼ 90 % confidence).
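The threshold can be examined with an exact binomial tail probability. The sketch below assumes that, under the null hypothesis, the 120 time points are independent and that each is positive with probability 1/2; the function name is hypothetical, and the independence assumption is a simplification because monthly values are typically autocorrelated.

```python
from math import comb


def upper_tail(n, k, p=0.5):
    """P(X > k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k + 1, n + 1))


# Probability of more than 66 positive values out of 120 under the null hypothesis.
print(upper_tail(120, 66))
# Probability of more than 108 positive values out of 120.
print(upper_tail(120, 108))
```

The first value is the one-sided probability of exceeding the threshold by chance; the second shows how unlikely it is to obtain more than 108 positive values from a symmetric distribution.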
Figure 7 shows the spatial distributions of the number of positive values (out of the total 120 time points) for the two quantities. The upper panel corresponds to the difference between the absolute error of the climatology and the EGA forecaster, and the lower panel corresponds to the difference between the uncertainties of the equally weighted and EGA-weighted ensembles.
The upper panel in Fig. 7 shows that there are large regions of improvement, which is more apparent over land, close to the poles and to the equator. The lower panel shows that in regions in which the EGA reduces the uncertainty, it does so for almost all time points and vice versa. No correlation between significant improvement of the predictions and significant reduction of the uncertainties was identified.
The global test we performed was done by calculating the area-weighted average of the two quantities defined above and plotting the histograms of their time series. These are shown in Fig. 8. The upper panel shows the globally averaged absolute error difference between the climatology and the EGA forecaster, and the lower panel shows the globally averaged difference between the uncertainties of the equally weighted and EGA-weighted ensembles. The x axis is in units of °C and is zero centered to emphasize the nonsymmetrical distribution of the data. The upper panel shows that there are only 11 negative values out of 120 and a positive peak at around 0.03 °C. The probability of more than 108 positive values out of 120 in a symmetrical distribution with a zero mean is practically zero; therefore, we conclude that, globally, the EGA forecaster predicts better than climatology. The difference in uncertainties shows that the EGA forecaster has lower uncertainty than the equally weighted ensemble for all the time points, and therefore, we can also conclude that the reduction of the globally averaged uncertainties is significant.

Summary and discussion
The SLA method does not rely on any assumptions regarding the distributions of the climate variables; therefore, it is robust and can be used for any climate variable. The updating scheme of the weights does not require a considerable computational cost and allows for a fast and easy update of the weights when new measurements become available. In the results presented here, we used the deviation from the trajectory of the climate variable as the metric for the weighting, but other weighting methods can also be applied. For example, one can use a measure of the statistical distance, such as the Kullback-Leibler divergence (Kullback and Leibler, 1951) or the Jensen-Shannon divergence (Manning and Schütze, 1999); a model that yields a probability density function (PDF) that is closer to the measured PDF of a variable will get a higher weight.
One disadvantage of the SLA method (which may also be considered as an advantage for some applications) is the fact that the weights are between zero and one. This means that if the measurements are not spanned by the predictions of the models, the SLA will not be able to track the measurements but would converge to the best model since, by definition, the SLA predictions are bound by the predictions of the models of the ensemble. In this case, other methods, such as the regression that can yield any linear combination of the model predictions, may achieve better predictions than the SLA forecasters but will not be able to reduce the ensemble uncertainties.
We showed that climate predictions (on a decadal time scale) of the 2 m-temperature monthly average can be improved and that the associated uncertainties can be reduced using the SLAs. The largest improvement was found using the EGA forecaster. We believe that the small improvement achieved by the EWA and LAA, when the climatology was added as an expert to the ensemble, stems from the fact that over most of the globe, the climatology dominated the predictions of these SLAs.
The improvement, relative to the climatology and the equally weighted ensemble, achieved by the LAA and the EWA, although small, was found to be statistically significant.The better performance of the EGA, compared with the LAA, suggests that in decadal climate predictions, the nonstationary nature of the climate system does not play a major role.The more significant improvement is achieved when focusing on tracking the best prediction rather than the best model (Cesa-Bianchi and Lugosi, 2006).
The improved predictions and reduced uncertainties considered here are only those arising from the variability between different models. This is because the ensemble used in this study consists of only one run (corresponding to one initial condition) of each of the models. The uncertainties due to the internal variability of each of the models remained unaffected. In principle, the SLA method can be used to quantify the quality of different initialization methods. However, there is no justification for weighting initial conditions generated by the same method at times that are of the same order of magnitude before the prediction period. Therefore, the SLA method cannot reduce uncertainties associated with the internal variability of the models.
The SLA method provided better predictions than each one of the models and their simple average. All the models, including the simple average, considered in this study showed no global skill; namely, in averaging over the globe, the climatology provided a better prediction than each of the models. The SLA forecasters do not resolve this issue unless the climatology is added as an additional model to the ensemble. When the model ensemble includes the climatology, the SLA forecasters can yield better predictions than the climatology itself by assigning high weight to the climatology in the regions where the models fail and high weight to the best models in regions where they perform better than the climatology (namely, regions where the best models are skillful).
The method and the results presented here provide performance-based, spatially distributed weights of climate models, which lead to improved climate predictions and reduced uncertainties. These can be relevant for many applications in agriculture and ecology, and for decision makers and other stakeholders. The spatially distributed weights may also be used for testing new parameterization and physics schemes in global circulation models.

Figure 1. An ideal experiment with two experts. The first always predicts zero and the second always predicts one. The true value is always 0.7. The EWA forecaster converges to the best model (predicting one) while the EGA forecaster converges to the true value.

Figure 2. Temporal evolution of the global and annual average of the 2 m temperature RMSE for the eight climate models (after bias correction) and the climatology.During the 30 years of the simulations, the skill of most of the models did not decline.In fact, a simple linear fit to the models indicates that some of them increased their skill with time.

Figure 3. 10-year RMSE of the 2 m temperature for three forecasting methods, (a) EWA, (b) EGA, and (c) LAA, and (d) the simple average.The colors represent the RMSE of each grid cell.All the SLA forecasters yield a smaller global RMSE than the simple average.The improvements achieved by the forecasters, compared with the simple average, are more apparent close to the poles and in southwestern America.

Figure 4. Global, area-weighted RMSE of the 2 m temperature, during the 10-year validation period, as a function of the learning time.The presented RMSE was calculated for the EGA forecaster; however, a similar trend was obtained for the EWA and LAA.In general, a longer learning period improves the forecaster predictions.

Figure 5. The 2 m-temperature uncertainty during the 10-year validation period for three forecasting methods, (a) EWA, (b) EGA, and (c) LAA, and (d) the simple average. The uncertainties of the EWA and LAA are smaller than those of the EGA; however, the predictions of the EGA are better (see the text for a more detailed explanation). All the forecasters yield smaller uncertainties than the simple average. The uncertainties, corresponding to the SLA forecasting schemes, are significantly reduced in regions where the uncertainties are larger, such as toward the poles and over South America and Africa.

Figure 6. The difference between the 10-year validation period average 2 m temperature RMSE of the climatology and that of the EGA forecaster: (a) EGA with an ensemble that includes eight models, and (b) EGA with an ensemble that includes the same eight models and also the climatology of the learning period as an additional model. The results demonstrate that when the ensemble includes the climatology, the EGA forecaster is skillful.

Figure 7. The number of time points in which the EGA forecaster performs better. The upper panel shows the spatial distribution of the number of time points in which the absolute error of the EGA forecaster is smaller than that of the climatology. The lower panel shows the spatial distribution of the number of time points in which the uncertainty of the EGA-weighted ensemble is smaller than that of the equally weighted ensemble. White circles represent significant improvement by the EGA forecaster and black circles represent its significantly poorer performance. Both quantities show better performance of the EGA forecaster over most of the globe.

Figure 8. The histograms of the globally averaged differences of absolute error and uncertainty. The upper panel shows the histogram of the globally averaged difference between the absolute error of the climatology and that of the EGA forecaster. The lower panel shows the histogram of the difference between the uncertainties of equally weighted and EGA-weighted ensembles. Both quantities show significantly improved performance of the EGA forecaster.