Multi-model dynamic climate emulator for solar geoengineering

Climate emulators trained on existing simulations can be used to project the climate effects that would result from different possible future pathways of anthropogenic forcing, without relying on general circulation model (GCM) simulations for every possible pathway. We extend this idea to include different amounts of solar geoengineering in addition to different pathways of greenhouse gas concentrations by training emulators from a multi-model ensemble of simulations from 5 the Geoengineering Model Intercomparison Project (GeoMIP). The emulator is trained on the abrupt 4×CO2 and a compensating solar reduction simulation (G1), and evaluated by comparing predictions against a simulated 1% per year CO2 increase and a similarly smaller solar reduction (G2). We find reasonable agreement in most models for predicting changes in temperature and precipitation (including regional effects), and annual-mean Northern hemisphere sea ice extent, with the 10 difference between simulation and prediction typically smaller than natural variability. This verifies that the linearity assumption used in constructing the emulator is sufficient for these variables over the range of forcing considered. Annual-minimum Northern hemisphere sea ice extent is less-well predicted, indicating the limits of the linearity assumption. For future pathways involving relatively small forcing from solar geoengineering, the errors introduced from nonlinear effects may be smaller 15 than the uncertainty due to natural variability, and the emulator prediction may be a more accurate estimate of the forced component of the models’ response than an actual simulation would be.

precipitation response, we employ an EOF-based approach (as in Herger et al., 2015) with a common set of EOFs constructed from both CO 2 -forced and geoengineering simulations.In addition to temperature and precipitation, we also consider Northern hemisphere sea-ice extent (see Supplementary 70 Material for net primary productivity; NPP).
We use simulations from the Geoengineering Model Intercomparison Study (GeoMIP, Kravitz et al., 2011) where solar reduction is used as a proxy for any approach that reduces incoming shortwave radiation.By training the emulator on one set of simulations and validating on a second, we can evaluate the fundamental assumption of linearity.Section 2 describes the methodology and sim-75 ulations used, and the resulting emulator and validation are given in Section 3.

Approach
The expectation that an emulator calibrated to match the GCM response to one climate forcing pathway can also do so for a different pathway is typically based on the assumption that the response to forcing can be reasonably approximated as linear and time-invariant (LTI).Consider a system forced 80 by both time-dependent forcing f (t) from changes in atmospheric greenhouse gas concentrations and time-dependent forcing g(t) from solar geoengineering.For any variable z i (t), define z f i (t) as the response to forcing f (t) with g(t) = 0 and z g i (t) as the response to forcing g(t) with f (t) = 0, where the response is defined in each case as the difference relative to the initial state, and neglecting natural variability.The system is linear if for any scalars α and β, the response to the combined 85 forcing αf (t) + βg(t) is the same linear combination of the individual responses, αz f i (t) + βz g i (t).Note that in general, even if the system is linear, the ratio of any two variables will vary with time simply because different variables respond at different rates; that is, for any forcing scenario, there is not in general some constant µ such that z i (t) = µz j (t) for all time (a plot of z i (t) against z j (t) will not be a straight line if these variables respond with different time constants).The usage of the 90 word nonlinear to express this latter idea is distinct from the concept of the dynamic system itself being linear or nonlinear.By a dynamic system we simply mean that z(t) depends on past values of the forcing f (t) or g(t) in addition to the current values.A linear system can be characterized purely by its response to a Dirac delta function; this is the impulse response.
For an LTI system forced by both time-dependent f (t) and g(t), the response of any variable z i (t) 95 can be expressed in terms of a convolution between the input time-series and the system impulse 3 Atmos.Chem. Phys. Discuss., doi:10.5194/acp-2016-535, 2016 Manuscript under review for journal Atmos.Chem.Phys.Published: 24 June 2016 c Author(s) 2016.CC-BY 3.0 License.response functions as where h f i (t) is the impulse response due to greenhouse gas forcing and h g i (t) the impulse response due to solar reductions; these will not in general be identical, nor in general the same for any choice of output variable z i .The variable n i (t) is included to denote climate variability.The system is 100 time-invariant if h(τ ) in equation ( 1) does not depend explicitly on the current time t; some possible exceptions are noted in Section 4. Herein we consider only annual-mean variables.The same formalism as in equation ( 1) also applies for predicting the seasonal-dependence of the response of a linear (but time-periodic) system, where the impulse response in general also depends on the time of year (e.g., h = h(τ, m) for month m) and additional training simulations would be required 105 to estimate the seasonal-dependence of h.If the climate system were indeed LTI, then equation ( 1) would hold for any variable (temperature, precipitation, etc) either at any one location or projected onto any particular spatial pattern.
To estimate the impulse response for CO 2 forcing, we use the difference between the abrupt 4×CO 2 simulation and pre-industrial simulation for each of the models participating in GeoMIP.

110
To estimate the impulse response for solar reduction, we use the G1 simulation from GeoMIP, in which the CO 2 concentration was quadrupled and insolation decreased to approximately maintain radiative balance and hence global mean temperature (see Figure S1).The difference between G1 and the 4×CO 2 simulations thus gives the response to an abrupt change in solar forcing.Note that each model separately chose the level of solar reduction g 4× required to balance the forcing from 115 increased atmospheric CO 2 , so that the percent solar reduction in G1 varies from model to model based on the efficacy of solar forcing in that model; see Table S1.Define where CO 2 (t) is the time-varying atmospheric CO 2 concentration and Solar(t) is the solar irradiance.The 4×CO 2 experiment then corresponds to forcing f (t) = 1, t ≥ 0 and f (t) = 0, t < 0 with g(t) = 0, while the GeoMIP G1 simulation uses the same f (t) but with g(t) = 1, t ≥ 0.
120 Substituting into Equation (1) then for any variable z i (t), the difference z 4× i (t) between its value in 4×CO 2 and preindustrial is given by 125 from which we can estimate The impulse responses h f,g i (t) could be estimated from the time-series of any forced simulation, but take particularly simple form from these step response simulations.(A linearly increasing forcing scenario such as a 1% per year increase in CO 2 also leads to a simple form, with the impulse response proportional to the second derivative of the 1%CO 2 response.)130 These impulse response estimates are "noisy" due to natural variability.Various approaches could be used to reduce the influence of natural variability, such as 1.Using multiple ensemble members or multiple forcing scenarios (as in Castruccio et al., 2014, for example), 2. Only considering spatial averages by computing the global mean as in pattern scaling, pro-135 jecting onto EOFs as in Herger et al. (2015), or averaging over specific spatial regions as in Castruccio et al. (2014), 3. Applying temporal filtering to smooth high-frequency noise in ĥ or fitting h(t) to some estimated functional form such as semi-infinite diffusion for global mean temperature (Caldeira and Myhrvold, 2013) or a multiple-exponential (Castruccio et al., 2014) or 140 4. Finding some less-noisy predictive variable, such as global mean temperature, to use as the predictor of other, noisier variables (effectively what is done in predicting the regional precipitation or temperature response in any pattern scaling analysis).
Choosing simulations with high forcing levels to train the emulator (4×CO 2 and GeoMIP G1) allows us to make useful predictions at lower forcing levels without the need for introducing arbitrary 145 a priori assumptions on the functional form of the dynamics, such as that every field simply scales with global mean temperature.The penalty for this choice is that the high forcing will exacerbate any nonlinear effects; this precludes, for example, useful predictions of the Northern hemisphere annualminimum sea ice extent, which would require that a lower-forcing simulation be used to train the emulator.

150
A frequency-domain perspective is useful to understand how the "noise" due to climate variability affects the emulator predictions.The Laplace transform of Equation (1) transforms the convolution into multiplication: where the Laplace transform of the impulse response, H i (s) = L(h i ), is the transfer function between that input and that output; capital letters will denote the Laplace transform of h(t), f (t) and g(t).The impulse response could thus equivalently be estimated by first taking the Laplace transform of the input and output, computing the ratio, and computing the inverse transform.Consider for example the response to increased CO 2 (the estimation for solar reduction is analogous), where the 160 emulator is trained on the input f t (t) and used to predict the response to forcing f p (t), with Laplace transforms F t (s) and F p (s).The transfer function estimate used by the emulator is and hence in the frequency domain the response predicted by the emulator for input forcing F p (s) is That is, climate variability in the simulation used to train the emulator leads to an error in the prediction that depends on the ratio of frequency content in the forcing signals between training and prediction simulations.Because a "step" change in the input such as in the abrupt 4×CO 2 simulation has more signal energy at low frequencies than high (Laplace transform proportional to 1/s), 170 it leads to a better estimate of the output response at low frequencies than at high frequencies; the high-frequency estimation errors due to natural variability manifest as "noise" on the estimated impulse response (see Figure 1 for an example).However, the smoothly varying radiative forcing input due to a 1% per year increase in CO 2 has even less energy at high temporal frequencies than the step input (Laplace transform proportional to 1/s 2 ).Thus training an emulator on a "step" input simu-175 lation and then using it to predict the results from a smoothly-varying forcing trajectory will result in relatively noise-free emulator predictions, despite the apparent high-frequency "noise" in the impulse response.Note that the GeoMIP G2 simulation (described at the beginning of the next section) has an abrupt change in the solar forcing at year 50 (see Figure S1), and the emulated responses to this "step" change in forcing are, as expected, noisier than those due to the smooth forcing changes 180 over the first 50 years of G2.

Results and validation
The impulse responses h f i (t) and h g i (t) are estimated for a number of different variables from the abrupt 4×CO 2 and G1 simulations as described above.The impulse-response based emulator for CO 2 forcing without any solar reduction can be validated by comparing the predictions with the 185 simulations for a 1% per year increase in CO 2 (1%CO 2 ).To validate the emulation of solar reduction, we use the GeoMIP G2 scenario, in which CO 2 levels increase at 1% per year, and for the first 50 years, the solar reduction is gradually increased to balance this forcing.This uses the same ratio 6 Atmos.Chem.Phys. Discuss., doi:10.5194/acp-2016-535, 2016 Manuscript under review for journal Atmos.Chem.Phys.Published: 24 June 2016 c Author(s) 2016.CC-BY 3.0 License.
of g(t) to f (t) as in G1 for each model.After 50 years, the solar reduction is returned to zero so that only the radiative forcing from the CO 2 remains (see Kravitz et al. (2011) and Supplementary 190 Figure S1 for a schematic of the forcing in the G1 and G2 simulations).Several of the climate models that conducted experiments G1 and G2 exhibit significant drift in the absence of net radiative forcing, presumably due to initialization issues.These models are not considered further, leading to a total of 9 models considered here (Table S1).
The impulse response functions for predicting the global mean temperature and precipitation re-195 sponses to either CO 2 or solar forcing are shown in Figure 1, averaged over all of these climate models.As expected these are "noisy" estimates due to natural variability.Note that while the temperature response characteristics are similar (aside from the sign) for increased CO 2 and reduced insolation, the precipitation response differs.The impulse response of precipitation clearly highlights that CO 2 and solar reduction have different "fast" responses (rapid atmospheric adjustments 200 in the climate system before temperature has time to adjust) related to different amounts of radiative forcing absorbed by the atmosphere (e.g., Andrews et al., 2010).
Figure 2 validates the ability of the impulse response formulation in equation ( 1) tuned from the 4×CO 2 and G1 simulations to correctly predict the global mean temperature response from the 1%CO 2 and G2 simulations.Figure 3 shows the corresponding plots for global mean precipitation.

205
Similar results are shown in the supplementary material (Figures S2-S3) for the temperature or precipitation difference between land and ocean.Linearity has previously been argued as a reasonable assumption for temperature and precipitation responses (Kravitz et al., 2014, and references therein) and since that is the only assumption made in constructing the emulator, this analysis also validates that assumption for these variables and at these forcing levels.Note that the difference between 210 GCM-simulated and emulator-predicted trajectories is typically less than the standard deviation of natural variability.
Northern hemisphere sea ice extent is an example of a variable that is both highly relevant for assessing possible future scenarios, yet one in which a nonlinear response to forcing might be expected.The 4×CO 2 forcing is large enough that September sea ice is nearly lost in all models, and 215 thus an emulator trained off of this simulation will do a relatively poor job at predicting the reduction in annual-minimum sea ice extent from smaller forcing.However, despite the obvious nonlinearity in the annual-minimum extent, the annual-mean sea ice extent does behave sufficiently linearly in most models, even at this large a forcing level, so that the 4×CO 2 simulation can be used to train a useful emulator.This is illustrated in Figure 4. 220 Finally, Figure 5 illustrates the ability to capture the spatial response.One of the concerns raised regarding the use of solar geoengineering is that the response from turning down the sun does not perfectly compensate that from increased CO 2 , resulting in some regional differences in temperature and precipitation responses (Ricke et al., 2010;Kravitz et al., 2014).It is therefore valuable to assess whether the emulator can capture some of the regional variation in the response between CO 2 and so- averaged over all 9 models (table S1); the inter-model standard deviation is shown by the shaded bands.While these impulse response functions are "noisy", predictions made using them are less so, particularly for forcing levels much smaller than those used in estimating these functions.Note for precipitation the robust "fast" response to increased CO2 has the opposite sign as the "slow" response.
lar forcing.For each model, EOFs are constructed from the spatial temperature response using both 4×CO 2 and G1 simulations to construct a single set of combined EOFs.In general, only the first few principal components are distinguishable from climate variability and have any predictive capability (Figure S4). Figure 5 plots the model-mean temperature and precipitation responses averaged over years 41-50 of the G2 simulation for both the simulation and the emulator prediction.Note that only 230 after averaging over 10 years and 9 climate models are the temperature and precipitation changes due to G2 -which is designed to have near-zero top-of-atmosphere radiative forcing -statistically significant at the grid-cell level.The differences here between the simulated and emulated responses are not significant at the grid cell level, although greater spatial averaging may indicate some statistically significant regional differences.In particular, the emulator appears to slightly underpredict 235 the amount of residual Arctic warming in G2, likely due to the nonlinearity associated with sea ice albedo feedback at the 4×CO 2 forcing used in training the emulator.Beyond this feature, it is difficult to assess with certainty to what extent the differences between the simulated and emulated regional responses are due to nonlinearities or simply due to natural variability.There is no evidence of nonlinearity in either the ability of the emulator to capture differences between land and ocean 240 temperatures or precipitation (Figure S2 and S3), nor in capturing the first few principal components of the response (Figure S4).Because the emulated response is based on simulations with roughly three times higher radiative forcing, and because the process of its construction suppresses highfrequency natural variability, it is potentially a more accurate representation of the forced-response to G2 in the models than that obtained from the actual G2 simulation.Figure 2. Simulated and predicted global mean temperature, both for a 1% per year increase in CO2 (blue curves) and for GeoMIP experiment G2 (red), for each of the climate models considered here.The predicted response using the emulator is given by black lines, solid for the 1% CO2 case and dashed for G2.

Discussion
Climate emulators provide a powerful tool for assessing any proposed future pathway of mitigation choices (including carbon dioxide removal) and different levels of geoengineering.For example, solar geoengineering could be used only to limit peak warming as part of an "overshoot" scenario in which atmospheric CO 2 concentrations peak and subsequently decline as net-negative carbon emis-250 sions reduce concentrations (Long and Shepherd, 2014;Tilmes et al., 2016).A limited, temporary deployment has also been described as a way to reduce the rate of warming (Keith and MacMartin, 2015;MacMartin et al., 2014).These types of limited-deployment scenarios are motivated in part by recognizing that solar geoengineering sufficient to reduce global mean temperature to preindustrial levels could lead to significant regional disparities and other risks, while a deployment that only par-

255
tially reduces global mean temperature might decrease some metrics of climate change everywhere (Kravitz et al., 2014).By training emulators on a standard set of simulations, such as GeoMIP, that have been conducted by multiple modeling centers, any future scenario such as these can be readily evaluated with multiple models.This can provide more insight into the robustness of conclusions than detailed sim- ulations with any single model.(Of course, any collection of models is an ensemble of opportunity, with interpretation challenges as a statistical sample; see, e.g., Collins et al. (2013), Section 12.2, for a thorough discussion.)The emulator used here assumes that the climate system response can be sufficiently well approximated over the range of forcing levels of interest by the output of a linear system.For many variables, the analysis here indicates that this is a sufficiently good assumption,

265
with the difference between simulated and emulated responses smaller than the standard deviation of natural variability.There are many more variables that may be of interest, including higher moments to predict extremes; similar analysis as here could be used to assess whether a linear assumption is or is not sufficient for projecting the response of any variable beyond those considered here.Finally, note that the results herein were obtained using simulations that reduce the solar constant 270 as a proxy for any solar geoengineering approach.The climate effects from any specific technology, such as stratospheric aerosol injection (SAI) will differ (e.g., Ferraro et al., 2015) both due to the different mechanism of radiative forcing, and the different spatial pattern of radiative forcing (the latter being at least partially a design choice; Kravitz et al., 2016).Further, while linearity appears to be a reasonable assumption in these climate models for predicting the response of many climate variables 275 to an imposed solar reduction, it may be a poorer approximation for SAI, for example.Nonlinearities will occur in aerosol size distribution (Heckendorn et al., 2009;Niemeier and Timmreck, 2015), as well as due to changes in the stratospheric circulation that result from the aerosols (Aquila et al., 2014); time-invariance might also not hold if, for example, time-varying stratospheric chlorine concentrations (which affects the aerosol impact on ozone) are considered part of the "system" rather 280 than a forcing.It is unclear how significantly these will affect the ability to develop emulators for this technology.
Author contributions.DGM and BK designed the study, conducted the analysis, and wrote the paper.(Bala et al., 2010), and an overcooling of the tropics and an undercooling of the poles (Kravitz et al., 2013).The latter is an artifact of a latitudinally-uniform reduction in sunlight, and could be better managed by increasing the forcing at high latitudes relative to low (Kravitz et al., 2016).

Figure 3 .
Figure 3.As in Figure 2 but for global mean precipitation.Simulated and emulated response are shown for 1% per year increase in CO2 and GeoMIP experiment G2 for each of the climate models considered here.

Figure 4 .
Figure 4.As in Figure 2 but for Northern Hemisphere annual-mean sea ice extent.Simulated and emulated response are shown for 1% per year increase in CO2 and GeoMIP experiment G2 for several of the climate models considered here; the dotted line shows the response for the abrupt 4×CO2 simulation.

Figure 5 .
Figure 5. Temperature (left) and precipitation (right) averaged over years 41-50 of G2 simulation and averaged over all 9 models.The upper row shows the simulated results; the lower row shows the prediction based on a spatial emulator developed using 4 EOFs for each model.As noted elsewhere, the robust response to increasing CO2 and reducing insolation to maintain zero global mean temperature difference is a net reduction (overcompensation) of global mean precipitation(Bala et al., 2010), and an overcooling of the tropics and an Intercomparison Project and their model development teams, CLIVAR/WCRP Working Group on Coupled Modeling for endorsing GeoMIP and 285 the scientists managing the Earth System Grid data nodes who have assisted with making GeoMIP output available.The Pacific Northwest National Laboratory is operated for the U.S. Department of Energy by Battelle Memorial Institute under contract DE-AC05-76RL01830.This work was partially supported by Cornell University's David R. Atkinson Center for a Sustainable Future (ACSF).12Atmos.Chem.Phys.Discuss., doi:10.5194/acp-2016-535,2016Manuscriptunder review for journal Atmos.Chem.Phys.Published: 24 June 2016 c Author(s) 2016.CC-BY 3.0 License.