Modeling the contributions of global air temperature , synoptic-scale phenomena and soil moisture to near-surface static energy variability using artificial neural networks

The static energy content of the atmosphere is increasing at the global scale, but exhibits important subglobal and sub-regional scales of variability and is a useful parameter for integrating the net effect of changes in the partitioning of energy at the surface and for improving understanding of the causes of so-called ‘warming-holes’ 10 (i.e. locations with decreasing daily maximum air temperatures (T) or increasing trends of lower magnitude than the global mean). Further, measures of the static energy content (herein the equivalent potential temperature, θe) are more strongly linked to excess human mortality and morbidity than air temperature alone, and have great relevance in understanding causes of past heat-related excess mortality and making projections of possible future events that are likely to be associated with negative human health and economic consequences. New non-linear statistical 15 models for summertime daily maximum and minimum θe are developed and used to advance understanding of drivers of historical change and variability over the eastern USA. The predictor variables are an index of the daily global mean temperature, daily indices of the synoptic scale meteorology derived from T and specific humidity (Q) at 850 hPa and 500 hPa geopotential heights (Z), and spatiotemporally averaged soil moisture (SM). SM is particularly important in determining the magnitude of θe over regions that have previously been identified as 20 exhibiting ‘warming holes’ confirming the key importance of SM in dictating the partitioning of net radiation into sensible and latent heat and dictating trends in near-surface T and θe. Consistent with our a priori expectations, models built using Artificial Neural Networks (ANN) out-perform linear models that do not permit interaction of the predictor variables (global T, synoptic-scale meteorological conditions and SM). This is particularly marked in regions with high variability in minand max-θe, where more complex models built using ANN with multiple 25 hidden layers are better able to capture the day-to-day variability in θe and the occurrence of extreme max-θe. Over the entire domain the ANN with 3 hidden layers exhibits high accuracy in predicting max-θe > 347 K. The median hit rate for max-θe > 347 K is > 0.60, while the median false alarm rate is ≈ 0.08. 1 Motivation and objectives 30 Extreme heat is associated with significant societal and environmental impacts, and a number of prior studies have elaborated on the drivers of extreme air temperatures (T) and made projections of extreme T and the associated

human health impacts and socioeconomic consequences (Sanderson and Ford, 2017;de'Donato et al., 2015;O'Neill and Ebi, 2009;Garcia-Herrera et al., 2010). Previous studies have sought to quantify the predictability of extreme T as a function of lead-time and variables describing teleconnections to remote sea-surface temperature anomalies (McKinnon et al., 2016) and/or local soil moisture (Brabson et al., 2005). Physiological stress is maximized under the co-occurrence of elevated T and specific humidity (q) . Thus to understand spatiotemporal 5 variability in heat-related mortality and/or morbidity there is a need to consider integrative variables derived from both T and q, such as equivalent potential temperature (θ e ) computed herein using the following approximation: Where T is air temperature (K), P is atmospheric pressure (hPa), R d is specific gas constant for air (J Kg -1 K -1 ), C pd is specific heat for dry air (J Kg -1 K -1 ), L(T) is latent heat of vaporization (ƒ(T)) (J Kg -1 ), q is specific 10 humidity of water vapor (Kg Kg -1 ) (Bolton, 1980) . Potential temperature is the temperature an unsaturated air parcel would have if brought adiabatically to a standard pressure. Thus, potential temperature is conserved for an unsaturated air parcel if it remains unsaturated as it rises and sinks. Equivalent potential temperature is conserved under vertical motion even if there is phase change of water vapor contained within the air. Hence, use of metrics such as θ e permits more accurate depictions of near-surface 15 energy budgets and surface heating trends for use in climate change detection studies (Davey et al., 2006). Equivalent potential temperature rather than equivalent temperature is used here to allow comparison of values from other reanalyses (or other model output) that uses a different discretization of terrain elevation.
When considering the local surface energy balance (Eq. 2), near-surface T is responsive only to changes in the sensible heat flux from/to the surface, while θ e responds to changes in both the sensible and latent heat flux terms: 20 static energy (θ e ) are enhanced in years and seasons with high global mean air temperatures. Previous research has indicated that variability in equivalent temperature (T e , i.e. the temperature computed from (1) but excluding the correction for bringing the air parcel adiabatically to a reference pressure of 1000 hPa) in the North Atlantic is strongly linked to NAO (Ribera et al., 2004), and the probability of 'heat waves' across the US is linked to hemispheric waves (Teng et al., 2013) and thus the PNA (Trenberth, 1990). 5 • Synoptic scale forcing (Grotjahn et al., 2016). Much of the eastern USA broke records for heat indices during the summer of 2012 in part due to persistent anticyclonic conditions (Peterson et al., 2013), and many heat watch-warning employ a synoptic typing based methodology (Sheridan and Kalkstein, 2004). Further, variability and temporal trends in summertime moist static energy and dew point temperatures (T d ) in the Midwestern USA appear to be linked to enhanced horizontal atmospheric moisture advection due in part to 10 northward expansion of the southeast summertime anticyclone (Kalkstein et al., 1998;Rogers et al., 2007;Pryor and Schoof, 2016;. • Local-regional changes in water availability and energy partitioning at the surface, due to factors such as land cover change and soil moisture modification as a result of irrigation of cropland (Davey et al., 2006;Pryor et al., 2016). 15 Observed tendencies in T, q and θ e are naturally a product of a combination of these drivers (Horton et al., 2016).
The expression of internal climates modes (e.g. ENSO, PNA and NAO) influences the frequency and intensity of different synoptic scale phenomena (Sheridan, 2003;Weaver, 2013), and has been found to be partly responsible for an increase in the number of oppressively hot days in many urban areas across the USA over recent decades (Sheridan et al., 2009). Further, there are important feedbacks between the land-atmosphere coupling, the synoptic 20 scale circulation patterns and boundary-layer structure (Lee et al., 2016). Thus, for example, extreme high T d (and by association elevated q) in the Midwestern USA is associated with (a) development and propagation of low pressure from the High Plains to through the upper Great Lakes, (b) healthy crops and sufficient surface soil moisture, and (c) restricted vertical mixing (Bentley and Stallins, 2008).
The objectives of this research are: 25 1) To use non-linear (machine learning) models applied to a three-tiered suite of predictors: (a) An index of the daily global mean temperature, (b) indices of the conditions at the synoptic scale based on principal components analysis (PCA) of upper-air variables, and (c) soil moisture estimates, to investigate spatiotemporal variations in θ e and enhance understanding of the causes of variability and change in θ e over the eastern USA.
• It is characterized by high summertime T e (and θ e ) and in situ data have indicated trends in T e exceed those in T alone (Schoof et al., 2015). Further, the region is largely congruent with an area of coherence for extreme T events in prior research (McKinnon et al., 2016).
• It encompasses major urban areas that have experienced a number of past extreme heat events associated with substantial excess mortality and morbidity (see summary in (Vanos et al., 2015) and also (Anderson and Bell, 5 2011)). Further, a range of reanalyses exhibit a consistent signal of increasing frequency of both dry (i.e. high T but low q) and humid (i.e. combined high T and q) heat wave days over the study region during the period 1981-2015 .
• It exhibits strong spatial gradients in terms of the nature of land cover and rate of change of both land management and SM (Figure 1a and g, (Pryor et al., 2016;Pryor and Schoof, 2016;Ellenburg et al., 2016)). For 10 example, over the period 1950-2000 the region as a whole experienced rapid population growth (though this was not spatially uniform), expansion of area classified as exurban, an overall reduction of land in agriculture (though again this was highly heterogeneous in space) and an increase in the intensity of water management (including expansion of irrigation) (Brown et al., 2005).
• Parts of it (focused on the southern Great Plains, i.e. the southern and western-most portion of the current study 15 area) were identified in the global land-atmosphere coupling experiments (GLACE) as exhibiting atypically strong atmosphere-surface coupling in some global climate models. Further, soil moisture makes a large contribution to sub-seasonal forecast skill for air temperatures and precipitation in this region (Koster et al., 2011;Koster et al., 2006;Dirmeyer and Halder, 2017;Guo et al., 2006).
• It also incorporates two areas of reduced daily maximum T during multiple consecutive years extending over 20 part or all of the twentieth century (i.e. warming holes). One is located along the border of Iowa (IA)-Nebraska (NE)-South Dakota (SD) and one is centered on Mississippi (MS) and Alabama (AL) (locations shown in Figure 1h). The lack of warming in both regions has been attributed to changing land-surface characteristics and enhanced soil moisture availability (Kalnay and Cai, 2003;Pan et al., 2009;Ellenburg et al., 2016). In the case of MS and AL upto 60% of the variance of summer temperatures has been ascribed to soil moisture (and thus an 25 increase in the LE at the surface at the expense of H, Eq. 2) and cloud cover (reducing the net radiation) (Ellenburg et al., 2016). However, additional factors may account for these warming holes including large scale climate modes such as the Interdecadal Pacific Oscillation and the associated sea surface temperature anomalies in the tropical Pacific (Meehl et al., 2015), and aerosol radiative forcing (Leibensperger et al., 2012;Yu et al., 2014). 30
This minimizes uncertainty in models linking our predictors to near-surface static energy resulting from use of different data sets to derive the predictor suite and/or the response variables (daily minimum and maximum θ e ). The MERRA-2 reanalysis product assimilates an unprecedented array of remote sensing and in situ data streams, but 35 does not assimilate in situ observations of near-surface T or q. MERRA-2 output is available at a resolution of 0.625°× ~0.5° (longitude-by-latitude). We use MERRA-2 output for all summer days (JJA) during 1980-2015 to compute the following variables: 1) A globally averaged daily mean air temperature computed using hourly 2-m T from all MERRA-2 grid cells. This predictor (Figure 1b) is intended to represent the long-term tendency in global mean temperatures and interannual/interdecadal variability caused by internal climate modes (Huang et al., 5 2017).
2) Indices of synoptic scale meteorology. Air temperature (T 850 ) and specific humidity (q 850 ) at 850 hPa along with 500 hPa geopotential heights (Z 500 ) in the domain (25.5-50°N, 97.5-65°W) for 20:00 UTC (i.e. 16:00 Eastern Daylight Time, 15:00 Central Daylight Time) are converted to z-scores (that indicates how many standard deviations an individual value is from the mean) and used in a rotated PCA to generate daily 10 principal component scores (PCs) that represent the proximity of each day to the major modes of synopticscale variability. These predictors (i.e. the PCs) are intended to represent variability in the synoptic-scale circulation (e.g. presence of anticyclonic conditions likely to be associated with subsidence and thus retarded vertical mixing) and also large-scale advection of static energy. Fifteen components are retained based on a scree plot analysis (Cattell, 1966) and are subject to Varimax rotation (Richman, 1986). Daily 15 PC scores for all 15 PCs are used as predictors in the statistical models, to allow each day to exhibit partial membership of multiple synoptic types. Spatial fields of the three variables sampled once per day are used in the PCA due to the high temporal autocorrelation present in these variables, and 20:00 UTC is selected to coincide with approximately the timing of the afternoon peak in surface T over the eastern USA. Figure   2 shows centroids of synoptic modes of variability as defined using the PCA (so-called key days) as 20 represented by the spatial patterns of T 850 , q 850 and Z 500 computed as the mean conditions on the seven days that exhibit highest PC scores on each PC. As shown, many of the synoptic types thus identified are readily interpretable as representing a diversity of zonal versus meridional circulation (cf. type 9 and 5 and 6), and some are characterized by conditions known to be associated with strong southerly low-level advection of high T and q into the region (e.g. types 1, 3, and 15) (Pryor and Schoof, 2016;Weaver, 2013). Further, most 25 types exhibit a high degree of similarity with other synoptic-scale classifications derived for the region (e.g. type 7 is very similar to one that is associated with summertime precipitation over the southeastern USA (Diem, 2006)).
3) An index of soil moisture (SM). This predictor is included to represent the availability of moisture at the atmosphere-surface boundary. Due to its spatial heterogeneity, following previous research (Ford and 30 Schoof, 2016), we use a time and spatially integrated metric of SM. Specifically we use a 90-day running mean estimate of antecedent SM (i.e. the value for 1 June is an average of values from approx. 1 March to 1 June) over 3×3 grid cells centered on the grid cell in question (i.e. integrated over an area of approximately 30,000 km 2 , see Figure 1g). MERRA-2 SM has been evaluated relative to in situ measurements of surface and root-zone SM and exhibits an unbiased root mean square error (RMSE) of 0.05 m 3 m -3 and a variance 35 explanation (R 2 ) value of the average root-zone SM anomaly of 0.56 (Reichle et al., 2017a). Further, since the MERRA-2 system applies bias correction to the precipitation estimates used in the land surface model (Reichle et al., 2017b), this may result in enhanced accuracy of SM estimates. However, it should be noted that there are relatively few direct measurements of SM and thus the evaluation of MERRA-2 is focused on agricultural locations. 4) An estimate of (i) daily maximum θ e and (ii) daily minimum θ e in each grid cell (see Figure 1c and d). The daily minimum and maximum values are used as the predictands in the downscaling and are derived using 5 Eq(1) applied to hourly T at 2-m, q at 2-m and surface pressure (P). The domain used to compute the gridded fields of the predictands (daily minimum and maximum θ e ) is truncated by one grid cell on each boundary of the domain used to generate the predictors to accommodate the spatial averaging used to generate the SM predictor. The range of grid cell θ e estimates are consistent with those derived from station observations within the study region (Pryor and Schoof, 2016), and in accord with a priori expectations 10 both daily max-θ e and min-θ e exhibit primarily latitudinal variability (Figure 1c  Thus the time series of predictors 1) and 2) (global mean T and the 15 PC scores) are common to models built for all grid cells, but predictor 3) (SM) and the response variables (predictands, daily max-θ e and min-θ e ) are grid-cell specific.

Methods 30
Artificial neural network (ANN) architectures are potentially highly useful in developing statistical models for response variables such as θ e because ANN do not require any assumptions about the form of the relationship between individual predictors, between predictors and predictands (min-and max-θ e ), and can treat complex and non-linear term interactions. ANNs are data-driven self-adaptive multi-layer perceptrons that model relationships between input variables and dependent output variables. Term interactions are described using mathematical 7 functions encoded within hidden-layers and weights that connect all nodes within the network layers including the input (predictors) and output (predictand) layers to perform the non-linear mapping between the input and output variables (Gardner and Dorling, 1998). The number of hidden layers within the ANN determines the degree of nonlinearity that can be modeled. Hence, if the data are linearly separable, no hidden layers are required. Our a priori expectation is that the different predictors of daily maximum and minimum θ e will interact in complex, non-linear 5 ways. Thus, we apply ANN to develop models relating the global mean T, PC scores of the synoptic-scale meteorology and antecedent SM to daily maximum or minimum θ e in each grid cell. Because we seek to examine spatial variability in model performance, we build and test the ANNs at the grid-cell level and then examine the resulting spatial coherence of model skill. A range of different learning algorithms can be employed in ANN. Herein the neural networks are constructed within Matlab using the Levenberg-Marquardt back-propagation algorithm (in 10 which the sum of the squares of the deviations between the observations and model predictions is minimized) (Papageorgiou and Poczeta, 2017). Although there is no single 'best-practice' regarding the number of hidden layers to use with ANN, there is evidence that a single hidden layer is sufficient for the large majority of problems (Toth et al., 2000). To test the dependence of model skill on the number of hidden-layers, three independent models are constructed for each MERRA-2 grid cell using: 15 1) No hidden layers, i.e. a linear regression model with no interaction between the predictors.
2) ANN with a single hidden layer.

3) ANN with three hidden layers.
To examine the importance of SM in determining the downscaling model skill, a fourth ANN model (with 3 hidden layers) is also built for each grid cell and each of the two predictands (daily maximum and minimum θ e ) that 20 excludes SM from the input variables. Lastly, it is challenging to determine which measures of SM are most appropriate to use within statistical downscaling models. Therefore, in addition to developing models using the MERRA-2 variable 'PRMC', which is the 'Total profile soil moisture content' in m 3 m -3 (that is summed across all six soil layers and represents the total water potentially available for evapotranspiration to the atmosphere), a fifth ANN model (with 3 hidden layers) is also built that uses the variable 'GWETTOP' that describes the SM content in 25 the upper 5 cm of the soil (unitless) (Reichle et al., 2017a), and thus best represents the SM that is readily available for evaporation into the overlying atmosphere. Table 1 summarizes the model abbreviations used herein. A schematic of the downscaling model architecture and data flows is given in Figure 3. For each model (and thus each grid cell) 70% of the data set is randomly selected to be used for training of the models, 15% is used for internal validation and 15% is withheld and used as an 30 independent sample for model testing. We use two primary metrics of model performance: RMSE and Pearson correlation coefficient (r) between observed and predicted daily minimum and maximum-θ e in each grid cell on each day in the test data set (i.e. independent observations), summarized at both the grid cell level and also averaged over all 1962 MERRA-2 grid cells that have some land areas within them (i.e. a domain average). The correlation coefficient is thus used as a relative measure of model performance, while RMSE provides an absolute measure of 35 degree of agreement between the model 'predictions' and the observed values (i.e. it is the typical value of the prediction error). Given the importance of extreme heat to human health we further examine the ability of the models to capture the occurrence of very high θ e . In this analysis we set a threshold of 347 K (73°C) to indicate extreme max-θ e (based on information provided in (Buzan et al., 2015) for the eastern USA) and a threshold of 337 K for min-θ e. A contingency table approach is used to evaluate the accuracy of the model predictions of extreme-θ e using the hit rate (HR) (Wilks, 2011): Where #hits is number of days in the independent data set where the observations and predictions both indicate exceedance of the threshold, and #misses is the number of days when the observations indicate an exceedance of the threshold but the model prediction does not.
And the false alarm rate (FAR) of each individual grid cell: Where #false alarms is number of days in the independent data set where the observations did not indicate exceedance of the threshold but the prediction was for an exceedance, and #correct non-events is the number of days when the observations and predictions both indicate the threshold is not exceeded.

Results
For both daily maximum-θ e and daily minimum-θ e, the worst model performance statistics (highest RMSE as a 15 fraction of the temporal variability θ e , and lowest r) are associated with the linear models that do not include parameter interactions (i.e. MLR) ( Table 1 and Figures 4 and 5). Nevertheless, output from all model architectures for min-θ e and max-θ e exhibit high correlation coefficients (r > 0.8) with independent data over most of the study domain. Correlation coefficients exceed 0.8 for 84% of grid cells for ANN-HL3-SM applied to max-θ e and 81% for min-θ e . Further, 92% of grid cells for ANN-HL3-SM exhibit RMSE < 5 K for max-θ e and 91% exhibit RMSE < 5 K 20 for min-θ e (Table 1, Figure 4 and 5). The ANN-HL3-SM model also exhibits the highest number of grid cells that have both a RMSE < 5 K and a r > 0.8 for both max-θ e and min-θ e . (Table 1). Grid cells with highest RMSE for min-θ e and max-θ e also generally have highest variance (i.e. largest day-to-day variability). For example, grid cells in IA exhibit highest variance and highest RMSE for max-θ e (Figure 1e and Figure 4), and grid-cells within IL are generally characterized by large RMSE and variance of min-θ e (Figure 1f and Figure 5). Thus, while noting the 25 RMSE (i.e. typical prediction error) is largest over IA for max-θ e (~ 5 K), it is less than half the standard deviation computed from the day-to-day variability in max-θ e (~ 10 K).
Generally all models exhibit slightly worse performance across both measures (r and RMSE) for min-θ e than max-θ e at the grid-cell level and integrated over all land grid cells (Table 1 and Figures 4 and 5). The reduced model skill for min-θ e may reflect use of output at 20:00 UTC values of the predictors used in the synoptic classification due to 30 our particular focus on daytime max-θ e .
Although performance differences between the five model architectures for daily maximum-θ e and daily minimumθ e are comparatively modest when averaged over the entire domain (Table 1), there are important regional variations in the performance of the different model functional forms. Over two-thirds of all grid cells (1332 of 1962) exhibit lower RMSE in the ANN model with 3 hidden layers and including SM (i.e. ANN-HL3-SM) than in any of the other models (e.g. MLR). The enhancement of model performance as measured by a decrease in RMSE for the more complex model of max-θ e and min-θ e is particularly marked in the west-central of the domain (over parts of Missouri (MO) and Iowa (IA), close to or within one of the 'warming holes') ( Figure 4 and 5). This is a region where a substantial fraction of T variance is explained by thermal and moisture advection by the GPLLJ (Weaver, 5 2013), where modeled land-atmosphere coupling is particularly intense (Koster et al., 2011;Koster et al., 2006) and where there are strong longitudinal gradients of SM (Figure 1g). Lowest correlations between predicted and observed min-and max-θ e values occur over east Texas (TX) for all model formulations although the RMSE of model predictions are not particularly high in this area (Figure 4 and 5). The low RMSE may reflect the small dayto-day variability in min-and max-θ e over this region (Figure 1e and f), possibly due to the proximity to the ocean, 10 while the low r may indicate that the synoptic types derived herein are not able to represent mesoscale features such as dry lines that play a key role in dictating day-to-day variability in θ e over this sub-region. It is also worthy of note that this area was excluded from the eastern US in terms of the area of coherence for extreme T over the eastern Consistent with prior research that has indicated the importance of atmosphere-surface interactions (Cai et al., 2014) 25 and specifically soil moisture (Pryor et al., 2016;Seneviratne et al., 2010) in surface energy partitioning and thus near-surface T and q regimes and static energy, exclusion of SM from the ANN with 3 hidden layers (i.e. ANN-HL3) decreases model performance relative to ANN-HL3-SM and increases the RMSE for max-θ e in 70% of grid cells. The regions for which this impact is most strongly manifest are close to or within the 'warming holes' described above and/or are located downstream of regions of significant county-level irrigation and anthropogenic 30 enhancement of SM (Pryor et al., 2016;DeAngelis et al., 2010) (Figure 1g), indicating the potential for anthropogenic enhancement of SM to strongly influence static energy and human heat-stress in these regions. For example, RMSE for max-θ e is increased in models excluding SM in all grid cells within MO, and all but one grid cell each in IA and IL (Figure 4). This finding is also replicated in the second region of weak or negative air temperature trends described above and centered on MS and AL (Ellenburg et al., 2016). The RMSE is lower in 35 ANN-HL3-SM than ANN-HL3 over all but one grid cell in these two states. Thus this analysis strongly supports prior assertions that SM plays a key role in dictating the surface energy balance and in the suppression of daily maximum T, while increasing max-θ e .
The statistical (downscaling) models show similar dynamic range to independent observations, although there is some evidence that the models underestimate the total variance in max-θ e leading to underestimation of extreme max-θ e as is evident from the flattening of the scatterplots for very high values of daily maximum θ e (see the upper row of panels in Figure 4). To examine this further we conduct an analysis of the HR and FAR for max-θ e in excess 5 of 347 K. This threshold is exceeded by daily maximum θ e derived from the MERRA-2 reanalysis on an average of ∼ 15% of summer days when all eastern US grid-cells are considered, but naturally exhibits a higher frequency of exceedance (of up to 75% of days) along the southeastern portion of the TX gulf coast and is observed on nearly 50% of days over coastal portions of the Gulf coast states and FL (Figure 6a). Conversely, it is seldom or never observed within grid cells in the north of the domain (Figure 6a). To ensure a sufficiently robust sample size on 10 which to compute the HR for extreme max-θ e we consider only grid cells where more than 40 days in the independent data sample (i.e. 8%) exceed this threshold. The mean HR values for the linear model (MLR), the ANN with 3-hidden layers (ANN-HL3-SM), and the ANN with 3-hidden layers but excluding SM (ANN-HL3) computed over all these grid cells is 59, 60 and 56%, respectively (Figure 6b-d), indicating that over the entire study domain the role of SM and predictor interactions in explaining the occurrence of extreme max-θ e is modest. All model forms 15 perform least well in terms of predicting the occurrence of max-θ e > 347 K over eastern TX and South Carolina (SC) (Figure 6e-g). However, the model excluding SM exhibits particularly poor performance (i.e. low HR) in these regions. The causes of the poor model performance in eastern TX and SC are currently not fully understood, although it is worthy of note that data from MERRA-2 grid-cells in SC exhibit a relatively low overall frequency of exceedance of this threshold and are also characterized by comparatively low 99 th percentile θ e in an analysis of heat 20 indices derived from the Community Land Model v4.5 (Buzan et al., 2015). Grid cells along the Gulf coast and over the states of MO, IA and IL exhibit high HR for prediction of extreme max-θ e and substantial improvement in HR is noted in IA, IL and MO (Figure 6e-g) in the ANN-HL3-SM relative to the other model forms. This is consistent with strong spatial gradients in SM (Figure 1e), findings of the GLACE projects of strong atmosphere-surface coupling (Koster et al., 2011;Koster et al., 2006), and analyses for stations in IL that also show a strong dependence 25 of high T e on soil moisture (Ford and Schoof, 2016). To contextualize the HR presented above it is important to note that they are associated with comparatively low false alarm rates (FAR). Indeed, FAR for the occurrence of min-θ e > 342 K or max-θ e > 347 K are very modest for all model formulations (Figure 6 and 7). For example, over 94% of grid cells indicate FAR for max-θ e > 347 K that are below 0.25 for the ANN-HL3-SM models. Thus, the relatively high HR reported herein are not being artificially inflated by unrealistically high predictions of the occurrence of 30 extreme θ e . The inclusion of SM as a predictor enhances HR in regions previously identified as exhibiting high variance in extreme θ e without a concomitant increase in FAR (Figure 6 and 7). It should be acknowledged that even the ANN with 3 hidden layers and soil moisture (ANN-HL3-SM) exhibits a modal grid-cell HR of 0.6-0.7, and thus misses a substantial fraction of extreme θ e . Nevertheless, these HR and FAR are indicative of positive Relative Operating Characteristics (ROC) (i.e. plots of the true positive rate greatly exceeds false positive rates) (Wilks, forecasts of summertime T at 2-m over the land areas of Southern Europe developed using the European Centre for Medium-Range Weather Forecasts (ECMWF) seasonal ensemble forecasting system (Weisheimer et al., 2011).
In contrast to the results for prediction of extreme max-θ e the model architecture has virtually no impact on HR for min-θ e > 337 K, and neither does the inclusion of SM in the model. In all cases the domain averaged HR = 59% and no region exhibits consistent improved or degraded performance for ANN-HL3-SM or ANN-HL3 over MLR 5 (Figure 7). This finding is consistent with the overall results for models of min-θ e that exhibit only modest decreases in model performance (increased RMSE and lower r) when SM is excluded from the predictor suite (Table 1 and Figure 5). Consistent with the interpretation of the surface energy balance (Eq. 2) this re-emphasizes that SM more directly impacts near-surface T and q during the daytime under conditions of positive net radiation.
Differences in model performance between ANN conditioned on total SM and using wetness only in the top soil 10 layer (upper 5 cm) are very small when averaged across the domain (Table 1) and indeed for virtually all grid cells.
Only 26 grid cells exhibited a Δ|RMSE| > 0.5 K for models using PRMC versus those using GWETTOP (out of a total of 1962), while 155 exhibited an increase in RMSE > 0.5 K when SM was excluded from the model. Thus, although the weights within the ANNs differ for use of the two SM parameters, the overall model skill is unchanged by use of the two SM estimates possibly due to the spatial and temporal averaging applied herein, or uncertainty in 15 reanalysis-derived SM variables.

Summary and Conclusions
Very few statistical downscaling analyses focus on integrative variables such as θ e that explicitly incorporate covariability of T and q, but such variables have direct applications to climate change impact analyses (such as analysis of heat waves (Buzan et al., 2015)). Further, this is an application of climate downscaling where statistical 20 approaches may be particularly useful given evidence that even when nested within observed lateral boundary conditions Regional Climate Models (RCMs) have difficulty in capturing the joint probability distributions of T and q and thus in accurately representing either the probability distribution of static energy or the spatio-temporal variability therein (Pryor and Schoof, 2016). Analyses of θ e are also essential to advancing fundamental understanding of changes in the total static energy content of the lower atmosphere, and may reveal important 25 information of relevance to both model performance analyses and attribution studies of global change.
The goal of this work is to develop a hierarchy of statistical models with increasing complexity and use them to determine the degree to which increased complexity enhances the skill of model predictions of θ e and to attribute variability in min-and max-θ e over eastern North America. Prior to discussing the results from application of this analysis framework to output from the MERRA-2 reanalysis it is worthy of note that previous research on regional 30 heat wave characteristics over the contiguous US using a suite of reanalyses indicated some important differences in the magnitude of derived equivalent temperature (T e ) between the reanalysis products  as well as in strength of land-atmosphere coupling between the reanalysis products (Ferguson et al., 2012). Thus, there would be value in applying this framework to additional observationally constrained data sets to evaluate: (1) The degree to which the findings of a key role of SM to determining the model skill for daily maximum θ e in specific sub-regions are generalizable and spatially consistent between reanalyses, and further if the predictability of θ e exhibits sensitivity to the spatiotemporal averaging used in deriving the SM predictors. (2) If use of a reanalysis product (or forecast model) that does not employ bias-correction of precipitation amounts would substantially alter the ANN model structure.
(3) If the partial truncation of the upper percentiles of daily maximum θ e in the model predictions is also a generalizable finding when our model framework is applied to different data sets. 5 Consistent with our a priori expectations, models built using ANN out-perform those that do not permit interaction of the predictor variables. Domain averaged RMSE for min-and max-θ e is smallest in the more complex models (i.e. for ANN-HL3-SM, RMSE < 4 K and < 4.3 K, respectively, c.f. mean max-θ e ≈ 333 K and mean min-θ e ≈ 321 K). Particularly in regions with high variability in min-and max-θ e the more complex models with multiple hidden layers are better able to capture the day-to-day variability in θ e . Correlation coefficients exceed 0.8 for 84% of grid 10 cells for ANN-HL3-SM applied to max-θ e and 81% for min-θ e . Further, 92% of grid cells for ANN-HL3-SM exhibit a RMSE < 5 K for max-θ e and 91% for min-θ e .
The primary purposes of this research are to enhance understanding of the causes of variability and change in θ e over the eastern USA and to propose a new downscaling approach to allow projections of daily minimum and maximum θ e using variables commonly available from reanalyses and global and regional climate models. However, 15 although prognostic thermal physiological models are required to make accurate assessments of human heat stress, the ANN models developed here may also have utility in assessments of possible climate change impacts on human health. Further, these analyses also may have applications to short-term forecasting of human-health relevant heat events (McKinnon et al., 2016;Weisheimer et al., 2011), since the methodological framework developed herein could be applied to observed antecedent SM, and modeled forecasts of the global mean T and conditions at the 20 synoptic scale over the eastern USA. Many of the heat watch-warning systems implemented across the United States currently employ a synoptic typing methodology (Sheridan and Kalkstein, 2004), but the performance of such systems may be aided by implementation of other variables/analysis methodologies such as those used herein. The ANN-HL3-SM models developed herein exhibit relative high skill in predicting the occurrence of extreme min-and max-θ e , and indeed out-perform the simpler models. The ANN with 3 hidden layers and that includes SM as a 25 predictor (i.e. ANN-HL3-SM) exhibits a domain averaged median hit rate for max-θ e > 347 K is > 0.60, while the median FAR is ≈ 0.08. Results from the ANN models further indicate that max-θ e and the occurrence of extreme max-θ e appear to be considerably more sensitive to SM than min-θ e which in turn appears to exhibit a stronger dependence on the precise prevailing synoptic scale conditions based on the ANN weights.
Our results imply there are large spatial gradients in the importance of the predictors we used herein. For example, 30 in the northeastern portions of our study region inclusion of SM as a predictor has considerably lower impact on model skill for either max-θ e or min-θ e (Figure 4-7). Global T substantially contributes to model skill near the Gulf coast and close to the Great Lakes but is less important over the remainder to the eastern USA, while SM exhibits greatest importance in sub-regions previously noted as exhibiting 'warming holes'. Our framework has greater skill for max-θ e than min-θ e. It is possible that inclusion of additional predictors could lead to enhanced model skill 35 particularly for extreme high values of max-θ e or min-θ e that are of greatest importance to human health, and/or that our methodology could be evolved to allow derivation of persistence indices (e.g. the occurrence of consecutive nights with high minimum θ e ).
We can not conclusively discount contributions from other phenomena (e.g. aerosol forcing, cloud cover) to the occurrence of 'warming holes' (areas with declining or no-trends in T) (Meehl et al., 2015), and these features may be a complex response to multiple drivers. However, results presented herein are consistent with past work that has 5 indicated the importance of soil moisture (SM) in determining partitioning of the surface energy budget, and thus the spatiotemporal patterns of θ e over the central and eastern USA (Koster et al., 2011;Koster et al., 2006;Pryor and Schoof, 2016;Pryor et al., 2016;Schoof, 2016, 2017;McKinnon et al., 2016). Indeed, SM is particularly important in determining the surface energy partitioning and the magnitude of θ e over regions that have previously been identified as exhibiting 'warming holes', and for all grid cells the RMSE for models including SM as a 10 predictor is smaller than the temporal variability of θ e as measured using the standard deviation of the daily θ e values. Specifically, only a model including SM is able to predict the occurrence of extreme (and highly healthrelevant) values of θ e over the western portion of Midwestern states such as IA, MO, IL and also in MS and AL.
This research thus implies that SM has played and may continue to play a key role in dictating the presence and intensity of 'warming holes' that have been previously noted in analyses of near-surface air temperature data (from 15 both in situ measurements and reanalysis products).