Introduction
Poor air quality is projected to become the most important environmental
cause of premature human mortality by 2030 (WHO, 2014). Long-term exposure to
high levels of ozone (O3) has been linked to increased risk of
respiratory illness, while chronic exposure to elevated fine particulate
matter (PM2.5) is associated with lung cancer, respiratory disease, and
cardiovascular disease (e.g., Dockery et al., 1993; Jerrett et al., 2009;
Krewski et al., 2009; Pope III et al., 2009). In addition to these
consistently documented risks of chronic exposure, there is some evidence
that acute exposures to pollution may themselves carry risks to human health
above and beyond those of the long-term mean exposures (Bell et al., 2005).
Thus, high pollution events may be responsible for a larger fraction of
annual acute mortality. In addition, particularly extreme events may hinder
day-to-day activities, and require the implementation of drastic tactical air
pollution control measures (e.g., the temporary banning of vehicles with
even-numbered license plates from driving in Paris during the spring of
2015). Despite the lack of an observed threshold concentration for
detrimental impacts of air pollution (e.g., Dockery et al., 1993), ambient
air-quality regulations are typically implemented as thresholds, with
penalties for exceedances. For example, in the USA, pollution standards for
O3 and PM2.5 include limits on not only mean annual values (in the
case of PM2.5) but also on thresholds for high annual values (equivalent
to the averaged 98th or 99th percentiles for PM2.5 and O3,
respectively). Thus, predicting and understanding potential changes in
extreme air pollution episodes is central to both air pollution policy and
human health concerns.
A changing climate may modulate air quality, with implications for human
health. Pollutant formation, transport, lifetime, and even emissions all
depend, to a certain degree, on local meteorological factors (Jacob and
Winner, 2009; Tai et al., 2010), meaning that changes in the behaviors of
these factors will often lead to changes in pollutant levels and exposure
risks. Understanding the relationships between meteorological variability
and observed pollutant levels will be critical to the development of robust
pollution projections, as well as sound pollution control strategies.
However, while straightforward sensitivity analyses using long-term averages
and simple linear regressions provide valuable information on mean pollutant
behavior, they are insufficient for analyses of extreme behaviors. Drivers
and sensitivities characteristic of average pollutant responses will not
necessarily be reflected throughout the entire pollutant distribution. To
evaluate these relationships statistically, alternative methodologies must be used.
Previous studies examining the impact of meteorology on pollution levels
have addressed the problem using a variety of tools. Modeling sensitivity
studies offer a direct means of comparing the impacts of large-scale
scenarios or individually adjusted parameters, allowing for a degree of
comparison and replication that is impossible using only observations
(e.g., Hogrefe et al., 2004; Mickley et al., 2004; Murazaki and Hess, 2006; Steiner
et al., 2006; Heald et al., 2008). From such output, pollutant levels under
multiple conditions or scenarios can be evaluated more or less in the same
way that observed levels are, including the examination of global burdens,
regional patterns, or even local exceedance frequencies as a function of
meteorological changes. However, while these tools are powerful, it can be
difficult to verify and understand projected changes due to the high degree
of complexity of these models. On the other hand, observation-based
examinations (e.g., Bloomer et al., 2009; Rasmussen et al., 2012) are tied closely to the actual
underlying physical processes producing changes in pollutant levels, but are
naturally limited in terms of identifying and quantifying the impacts of
individual drivers – it is difficult to separate the impacts of different
meteorological factors without the benefit of multiple sensitivity
comparisons afforded by models.
Daily maximum 8 h O3 vs. maximum daily temperature for an
example site in Essex County, MA (JJA, 2004–2012). An ordinary least-squares
regression line (a) captures the general trend, but is unable to represent
the increase of variability in the distribution with increasing temperature.
Using individual quantile regressions ranging from 5th to 95th percentiles (b),
the increased sensitivity of higher quantiles to increased temperatures becomes apparent.
Ordinary least-squares (OLS) regressions are effective tools for identifying
trends and sensitivities in the distribution of pollution levels as a whole,
especially for well-behaved data showing uniform sensitivities. Previous
studies have analyzed the impacts of changes in weather and climate on
O3 and PM2.5 levels (e.g., Brasseur et al., 2006;
Liao et al., 2006), finding connections between specific meteorological
conditions and mean pollutant response. In particular, the sensitivity of
surface O3 levels to changes in climate – the so-called “climate
change penalty” (Wu et al., 2008) – has been examined in multiple studies
worldwide (e.g., Bloomer et al., 2009), but previous
examinations of individual meteorological sensitivities have typically
produced single, monovariate estimates for changes in O3 given changes
in each driver (e.g., temperature). However, when the variability of a given
response is itself a function of the independent variable, as in Fig. 1a,
the information provided by such regressions is less valuable for describing
the specific response across the distribution – especially at the extremes
(defined here as pollutant levels below the 5th quantile or above the
95th quantile). If the sensitivities of high O3 extremes to
temperature tend to be higher than those of median to low O3 days (as
is the case at many polluted locations), a single sensitivity value would
underestimate the increase in high O3 event frequencies and magnitudes,
given rising temperatures.
This situation is one common example of a distribution that might be better
characterized through the use of more advanced statistical tools, such as
quantile regression (QR) (Koenker and Bassett Jr., 1978). A
semi-parametric estimator, quantile regression seeks to minimize the
sum of a linear (rather than quadratic) cost function, making it less
sensitive to outliers than OLS regression. Unweighted, this simple change
produces a conditional median (or 50th quantile regression), rather
than the conditional mean of OLS regression. Applying appropriately chosen
weights to the positive and negative residuals of this cost function then
targets specific percentiles of the response, allowing for the
quantification of sensitivity across nearly the entire response
distribution. An example of this regression performed across a broad range
of percentiles is shown in Fig. 1b, including the 5th quantile in
black, the 50th quantile in yellow, and the 95th quantile in red.
Here, we apply multivariate QR to an analysis of meteorological drivers of
O3 and PM2.5, with the goal of identifying the covariates most
correlated with changes in peak pollutant levels throughout the USA,
and how these differ from the median response. Such a statistical
examination of historical observations can provide a valuable reference
point for the evaluation of model-predicted extremes, as well as a platform
for short-term pollutant projections.
Location of AQS stations included in this study. The magnitude of
each station's 95th percentile measurement is indicated by color.
Methodology
Inputs
We use O3 and PM2.5 measurements from the US Environmental
Protection Agency's (EPA) Air Quality System (AQS) network, including daily
peak 8 h average measurements of O3 and daily mean PM2.5
levels. All stations with at least 150 valid maximum daily 8 h averages
between 2004 and 2012 are included in this study, totaling 1347 stations for
summer O3, 675 stations for winter O3, 647 stations for summer
PM2.5, and 636 stations for winter PM2.5 (locations and
95th percentile concentrations shown in Fig. 2).
Meteorological variables are taken from the National Centers for
Environmental Prediction (NCEP) North American Regional Reanalysis (NARR)
product (Mesinger et al., 2006). With a spatial resolution of 32 km and
8 output fields per day (representing 3-hourly averages), NARR output
provides a reasonable spatial and temporal match for each of the AQS stations
of interest. While the NARR product represents modeled output and includes
its own errors and biases when compared to observations, it allows for the
consistent use of many variables at high spatial and temporal resolution,
most of which would not be available at all included AQS stations examined
here. NARR reanalyses have been used in previous examinations of
meteorological air pollution drivers with some success (e.g., Tai et al.,
2010).
Meteorological variable generation
As an initial step towards understanding the impacts of meteorology on
pollutant extremes, we construct a large set of possible meteorological
covariates, including NARR meteorological variables for a range of time
frames. By extending the initial scope of possible drivers, we attempt to
capture the important factors and interactions, including not only effects
that were important at all sites, but also those that stood out only in
particular regions or types of locations. To this end, we begin by
considering as many potential indicators as possible, gradually trimming the
list down to a final set to be used in the multivariate quantile
regressions. We use the 3-hourly NARR output to reconstruct hourly
resolution diurnal cycles for each meteorological variable at each station
through time series cubic splines and bilinear interpolation of the gridded
fields to station latitudes and longitudes. In some cases regional means
were included, primarily due to insufficient variability in individual cell
values for that variable at some sites.
In addition to the raw variables available through NARR output, we calculate
several derived parameters. The synoptic recirculation of air has been
linked to elevated pollutant concentrations at many sites around the world,
especially in coastal regions where diurnal wind patterns are prone to
recirculation (Alper-Siman Tov et al., 1997; St. John and Chameides, 1997; Yimin and Lyons, 2003; Zhao
et al., 2009). When air masses are returned to a site with ongoing
emissions, the buildup of precursor concentrations may generate
exceptionally high pollutant levels. To measure this effect we calculate a
daily recirculation potential index (RPI) from surface wind speeds based on
the ratio between the vector sum magnitude (L) and scalar sum (S) of wind
speeds over the previous 24 h (Levy et al., 2009):
RPI=1-LS.
A high RPI (close to 1) indicates that, regardless of individual hourly
wind-speed magnitudes, the total displacement of air over the previous 24
hours was low, potentially leading to a pollutant buildup. Meanwhile, a very
low RPI (close to 0) indicates steady, consistent wind, advecting air masses
away from a location.
Stagnation, or the relative stability of tropospheric air masses, is another
meteorological phenomenon previously cited as a driver of pollutant extremes (Banta
et al., 1998; Jacob and Winner, 2009; Valente et al., 1998). While some of
the raw meteorological fields (e.g., wind speed and precipitation) are
already themselves good indicators of local stagnation, lower-tropospheric stability (LTS), the difference between surface and 700 hPa potential
temperatures, is also calculated as a reflection of temperature inversion
strength in the lower troposphere (Klein and
Hartmann, 1993). Temperature inversions, in which the daytime pattern of air
being warmer near the Earth's surface is reversed, generally lead to stable,
stagnant conditions well-suited for the buildup of pollutants such as
O3 and PM2.5. This phenomenon can be particularly pronounced in
areas with geographical barriers to horizontal transport, such as the basins
of Los Angeles and Salt Lake City (Langford et al., 2010; Pope 3rd, 1991).
From the selected set of raw and derived NARR meteorological fields (Table 1),
we generate a range of temporal variables for each individual
meteorological variable, including extrema and means for each 24 h day,
as well as for 8 h daytime and previous 8 h nighttime ranges. To
include possible long-term impacts of these meteorological variables, each
of the 9 daily values are then extended into 3- and 6-day maxima, minima, and
means, as well as a 1-day delta variable to show 24 h change, resulting
in 63 total temporal options for each listed meteorological variable.
Meteorological fields used in variable selection procedure. Each
NARR field shown was included using nine different possible daily values
(24 h max/min/mean, 8 h daytime max/min/mean, previous 8 h
nighttime max/min/mean), as well as longer term (3- and 6-day) aggregates
and 1-day deltas of those daily values. Variables marked “9x9” represent
regional means, and were generated by averaging the 9 × 9 square of NARR grid
cells centered around each station location (roughly 290 km per side).
NARR variables1
air.2m
2 m air temperature
pres.sfc
surface pressure
air.sfc_9x9
surface air temperature (regional)
rhum.2m
2 m relative humidity
apcp
accumulated total precipitation
shum.2m
2 m specific humidity
crain_9x9
binary precipitation flag (regional)
tcdc_9x9
total column cloud cover (regional)
dlwrf
downward long-wave radiation flux
tke.hl1_9x9
turbulence kinetic energy
dswrf
downward shortwave radiation flux
tmp.700
700 hPa temperature
hcdc_9x9
high-level cloud cover (regional)
uwnd. 500
500 hPa zonal wind speed
hgt.850
850 hPa geopotential height
uwnddir.10m
normalized 10 m wind direction
hpbl
planetary boundary layer height
vvel.700
700 hPa vertical velocity
lcdc_9x9
low-level clouds (regional)
vvel.hl1
lowest level vertical velocity
lftx4
best lifted index
vwnd.500
500 hPa meridional wind speed
mcdc_9x9
midlevel cloud cover (regional)
vwnddir.10m
normalized 10m wind direction
prate
precipitation rate
wspd.10m
10 m wind speed
Derived variables
fire
fire proximity metric
lts2
lower-tropospheric stability
rpi3
recirculation potential index
Temporal options
max
24 h maximum value
min
24 h minimum value
mean
24 h mean value
daymax/min/mean
as above, but using only 08:00 to 16:00 LT
nightmax/min/mean
as above, but using only preceding night: 08:00 to 16:00 LT
diff
change from previous day
3daymax/min/mean
max/min/mean of previous 3 days
6daymax/min/mean
max/min/mean of previous 6 days
1 Mesinger et al. (2006),
2 Klein and Hartmann (1993), and 3 Levy et al. (2009).
Fire proximity metric
Biomass burning emissions can impact pollutant concentrations (e.g.,
Streets et al., 2003) with indirect correlations to daily meteorological
variability, making it a potentially confounding factor when performing
analyses using meteorological variables alone. To help examine and quantify
the likely impact of fires on observed pollutant levels, we create a simple
fire metric to represent the spatial and temporal proximity of each site to
satellite-observed burn locations. Using output from the Moderate Resolution Imaging Spectroradiometer (MODIS) Global Monthly Fire Location Product
(Giglio et al., 2003; Justice et al., 2002), we estimate the total fire proximity impact for each site by
applying spatial and temporal decays to burn detection confidence values,
and summing these values across all detected pixels through the equation
F=log∑i1r12tconf.
Here, the fire proximity index F is a function of the distance (r) and number
of elapsed days (t; ranging from 0 to 6) separating a station from a
MODIS-detected burn pixel with a given confidence value (conf), summed over all
nearby burn pixels (i). The resulting proximity metric does not take transport,
precipitation, or any other meteorological variables into account, simply
producing higher values for stations near burning (or recently burned)
locations. A comprehensive treatment of biomass burning emissions and
transport requires accurate information on many complex factors, including
fuel type, burn intensity, and smoke injection heights (Val Martin et al., 2010;
Wiedinmyer et al., 2011), and fully representing these factors to generate a
robust estimate for the influence of fire emissions goes well beyond the
scope of this work. However, considering both the stochastic nature of large
fire events and the importance of biomass burning on air-quality
variability, we use this cumulative proximity metric as an intermediate measure.
Meteorological variable selection
Combining the 63 described temporal options with all chosen raw and derived
meteorological variables results in over 1700 possible pollutant indicators,
making variable selection problematic. With driver identification an
important goal of this work, we initially keep the selection procedure as
open as possible, maximizing the first sweep of candidates and only
eliminating possible drivers after thorough evaluation (Fig. 3). However,
indiscriminate inclusion of additional variables opens the strong likelihood
of problems related to overfitting and multicollinearity. Furthermore, for
the sake of comparison between stations, we aim for a single set of indicator
variables for the entire set of observation sites included, making selection
on a station-by-station basis impractical. For these reasons we utilize a
stepwise multivariate approach based on combining covariate rankings at
individual stations into a single selection metric. To reduce the
computational cost of variable selection initially we use a testing subset of
stations, including 10 stations (with varying degrees of mean pollutant
levels) from each of the 10 EPA regions (shown in Figs. 4, 5, 7, and 8). We
then use observed pollutant levels (maximum 8 h average O3 and daily
average PM2.5) from each of these 100 stations to evaluate and select
key indicators from the full set of possible meteorological variables
included. Meteorological variable selection is performed independently for
ozone and PM2.5, as well as for summer and winter seasons.
Flowchart of variable selection procedure described in Sect. 2.4.
We select meteorological indicators using 90th percentile quantile
regressions evaluated with the Bayesian information criterion (BIC) metric,
a statistical tool closely related to the Akaike information criterion (AIC)
and similarly based on the likelihood function (Schwarz, 1978; Lee et al., 2014). BIC evaluates the
likelihood of a given set of indicators representing the best set possible,
given a set of associated responses (in this case, daily pollutant levels),
with lower BIC values indicating a stronger statistical model (i.e., the set
of predictive meteorological indicators being evaluated). To perform
stepwise variable selection, we quantify the benefit (via BIC) of adding
each individual variable candidate to the list of selected variables in
turn. Large reductions in BIC indicate a more-important variable, while
small reductions (ΔBIC < 2) indicate a less-important
variable. Unlike other goodness of fit metrics such as the coefficient of
determination R2, BIC values say nothing about the overall strength of
the predictive model as a whole, but rather serve to compare the relative
effectiveness of multiple statistical models attempting to explain the same
set of results. However, again unlike R2, both BIC and (to a lesser
extent) AIC penalize the inclusion of extraneous indicators, reducing the
chance of overfitting. While there is some discussion within the statistical
literature regarding the strengths of BIC vs. AIC, both are considered
versatile, robust tools in the evaluation of statistical models (Burnham and Anderson, 2004; Yang,
2005), and applicable to quantile regression if errors are assumed to follow
an asymmetric Laplace distribution (Geraci and
Bottai, 2007). Note that while the 90th percentile of pollution levels
is lower than the 95th quantile targeted later in this study, the
slightly reduced value is chosen to improve robustness during the initial
variable selection phase.
We begin variable selection by using only time (measured in days elapsed) as
a predictor variable, accounting for any linear trend in pollutant behavior
over the course of the observed period (Fig. 3, step 3). From there, we
identify the most impactful temporal option (daily maximum, mean, minimum,
etc.) available for a single meteorological variable (e.g.,
surface temperature). We perform stepwise variable selection at each station
independently, selecting the candidate temporal option producing the
greatest reduction in BIC (and therefore greatest improvement in the
statistical model), and continuing until no further improvement is possible
(step 4). We then rank the final set of chosen variables at each station by
order of selection (step 5), invert those ranks, and sum these inverted
ranks over all 100 test stations (step 6). This sum represents an overall
importance metric, and will be large for variables that either appear
somewhat valuable at many stations, or that appear to be exceptionally
valuable at just a few stations. We then add the single temporal option with
the greatest summed total to the master list of selected variables. With a
new indicator chosen we filter the remaining candidates (step 8),
eliminating poor performers (those selected at too few sites in the previous
round) or those exhibiting collinearity with the current master list
(R2 ≥ 0.6 relative to previously selected covariates). After this
pruning process we start the selection routine again for all remaining
candidates, using time and all previously selected variables as fixed
covariates in the evaluation process. We repeat this cycle until no temporal
candidates exhibiting summed ranks higher than our chosen threshold remain
for the current meteorological variable, after which the temporal variable
selection starts anew with the next meteorological parameter. Once temporal
variable options have been filtered down for each individual meteorological
covariate through this selection process, we gather all selected variables
together and repeat the same procedure using the combined set of
approximately 300 candidates, finally arriving at trimmed a down set of less
than 20 meteorological indicators for each pollutant species and season
(Table 2, top). The selection process is somewhat sensitive to the
percentile used for the regression, as evidenced by the different variables
selected using the 50th percentile rather than the 90th (Table 2,
below). While most high-ranked meteorological variables show up using both
selection processes, there are noticeable differences, especially in the
temporal options chosen.
Frequency at which normalized 95th percentile QR coefficients for
selected variables were in the top two out of all included variables (above)
for summer O3, and box plots of normalized regression coefficients for
top three covariates in each region (below). Specific meteorological
variables (shown in legend) have been grouped into categories shown on the
x axis of the bar plot. Colors on inset box plots correspond to legend in
above panel, and gray dots indicate the fraction of stations showing a
statistically significant relationship (p ≤ 0.05) to the indicated
covariate in that region. EPA Region numbers are inset on top-right of box
plot panels.
Same as Fig. 4, but for winter O3.
Through this routine, variables can stand out for selection by being either
moderately important at many sites, or by being very important at fewer
sites. By adjusting the threshold parameter for variable selection, the
scope of variable inclusion can be tuned to a certain extent. Higher
thresholds end the selection process sooner, as fewer and fewer new
variables are ranked highly at enough stations to meet the summed value
requirements, while lower values allow the process to continue adding less
important variables. In this work we identify and compare both a concise
“core” set of indicators (variables with summed inverse ranks of at least 2)
and a “full” set of indicators (variables with summed inverse ranks of
at least 1).
It should be noted that the NARR fields used to provide our input
meteorological covariates likely exhibit intrinsic errors and biases which
will certainly affect the predictive power of our models, as well as the
strength of our variable selection process itself. Variables which are
better represented (e.g., temperature) will have an advantage compared to
other potentially important variables with greater uncertainties, such as precipitation.
Quantile regression
The final sets of indicator variables represent those covariates most
broadly associated with changes in high pollutant levels due to
meteorological factors at the 100 chosen test sites. Using these selected
meteorological variables, we next perform linear multivariate quantile
regression to identify sensitivities for percentiles from 2 to 98 % at
each station in the full set of AQS sites. From these regressions we collect
summer (JJA) and winter (DJF) quantile sensitivities of O3 and
PM2.5 to each meteorological variable for each AQS station.
Results
To assess relative covariate importance across the USA we normalize quantile
sensitivities to standard deviations of pollutant and indicator fluctuations
and rank them in relation to each other at each site. Top-ranking covariates
for any given station, then, are those whose variabilities (in normalized
units of standard deviations) are most responsible for variability in the
observed pollutant. Figures 4, 5, 7, and 8 show each variable's frequency of
appearing as the first or second most important indicator by this metric,
with similar variables grouped together into columns. We compare the
covariates most associated with the 95th and 50th percentile of pollutant
concentrations, finding similar, though not identical, frequencies between
top performers for the two quantiles.
Selected covariates for O3 and PM2.5 using
90th percentile (above) and 50th percentile (below) quantile regressions.
“Core” covariates (in bold) were selected using a minimum threshold for
summed inverted ranks of at least 2, with remaining covariates added by
rerunning the selection procedure including all core variables and a relaxed
selection threshold of 1.
Summer O3
Winter O3
Summer PM2.5
Winter PM2.5
Elected via 90th percentile QR
rhum.2m_mean
dswrf_mean.6daymax
air.2m_max
hpbl_mean
vwnddir.10m_mean
wspd.10m_mean
vwnddir.10m_mean
vwnddir.10m_mean
air.2m_max
vwnddir.10m_mean
lftx4_daymin
tke.hl1_9x9_daymax.3daymean
crain_9x9_daymean
rhum.2m_min
uwnddir.10m_mean.3daymean
wspd.10m_nightmax
fire
fire
wspd.10m_max.3daymean
rhum.2m_mean
uwnddir.10m_mean
rpi_max
air.sfc_9x9_nightmin.6daymean
shum.2m_daymax.6daymin
air.sfc_9x9_min.6daymin
hpbl_daymax
fire
crain_9x9_nightmean
pres.sfc_daymax
air.sfc_9x9_nightmin.6daymean
crain_9x9_max.6daymean
lts_min.3daymin
tke.hl1_9x9_max
dlwrf_daymax.6daymin
vwnddir.10m_daymean.6daymean
uwnddir.10m_mean.3daymean
dswrf_daymin.3daymean
crain_9x9_max
apcp_nightmax
dswrf_ max.3daymean
hpbl_max
uwnddir.10m_daymean
rpi_nightmin
lftx4_nightmin.6daymin
tcdc_9x9_mean
tcdc_9x9_mean
vvel.hl1_nightmax.6daymax
wspd.500_min
dswrf_min.6daymin
lts_nightmax.3daymin
hpbl_nightmax.6daymax
tke.hl1_9x9_max.6daymin
vwnd.500_daymax.3daymean
lftx4_min.diff
rpi_nightmax.6daymin
vwnd.500_max.diff
shum.2m_max.diff
lcdc_9x9_nightmin.6daymax
tcdc_9x9_max.6daymax
tcdc_9x9_max.diff
wspd.10m_daymin.3daymin
shum.2m_min.diff
wspd.10m_min.6daymax
hpbl_daymin.6daymin
lts_nightmin.6daymin
pres.sfc_min.diff
mcdc_9x9_nightmax.3daymin
apcp_daymin.3daymax
Selected via 50th percentile QR
rhum.2m_mean
dswrf_mean
air.2m_max
hpbl_mean
air.2m_max
wspd.10m_mean
air.sfc_9x9_nightmin.6daymax
vwnddir.10m_mean
dswrf_daymin.3daymean
dswrf_daymean.diff
crain_9x9_nightmax
wspd.10m_daymax.3daymax
vwnddir.10m_mean
vwnddir.10m_mean
wspd.10m_max.3daymean
crain_9x9_nightmax
crain_9x9_daymean
lts_daymin
vwnddir.10m_mean
wspd.10m_nightmax
fire
shum.2m_min
lftx4_mean
rhum.2m_mean
tke.hl1_9x9_daymax
uwnddir.10m_mean
lts_daymin
uwnddir.10m_mean
uwnddir.10m_daymean.3daymean
crain_9x9_daymax
uwnddir.10m_daymean.3daymean
wspd.10m_max.3daymin
air.sfc_9x9_daymin.3daymean
dswrf_min.3daymin
shum.2m_daymean.diff
rpi_max
rpi_max
fire
crain_9x9_max.6daymean
uwnddir.10m_nightmean.3daymean
lts_mean
air.sfc_9x9_mean.6daymean
rpi_max
dswrf_daymin.6daymax
dswrf_min.6daymin
hpbl_daymax
vwnd.500_min
lftx4_nightmin.3daymean
vwnd.500_min
hcdc_9x9_daymax
vwnd.500_daymax.6daymax
shum.2m_nightmin.6daymean
hpbl_nightmean.3daymin
pres.sfc_nightmin.6daymean
pres.sfc_max
fire
vvel.hl1_mean.6daymean
rpi_nightmax.6daymean
hgt.850_max.6daymax
pres.sfc_mean.diff
air.sfc_9x9_nightmin.diff
rhum.2m_max.diff
lts_daymax.6daymin
vwnd.500_min.diff
mcdc_9x9_nightmax.3daymin
Summer O3
In the summertime, covariates linked to high-percentile O3 are dominated
by a positive correlation with temperature at most sites (Fig. 4, top panel),
consistent with previous modeling sensitivity conclusions (Jacob and Winner,
2009). Altogether, 49 % of the analyzed sites show maximum daily surface
air temperature as the meteorological variable with the greatest normalized
slope relative to observed maximum 8 h average O3 concentrations, and
it is within the top five most influential variables at 79 % of all sites.
Underlying reasons for the dominance of temperature as a driver of observed
O3 include a positive correlation with biogenic emissions of isoprene (a
potential precursor of O3), a negative correlation with the lifetime of
peroxyacetyl nitrate (PAN; an important reservoir species for NOx
and HOx radicals), and an associated correlation between higher
temperatures and bright, stagnant conditions (Jacob and Winner, 2009).
While maximum daily surface temperature stands out as the covariate with the
highest normalized impact on daily summer O3 levels, many other
variables also play important roles, especially in the south and southeast
regions (Fig. 4, bottom panels). Water vapor generally reduces O3 levels
under pristine conditions, removing dissociated excited oxygen atoms and
producing the hydroxyl radical (OH). Under polluted conditions this negative
effect competes with increased O3 production as a result of OH reacting
with carbon monoxide (CO) or volatile organic compounds (VOCs), O3
precursors common to highly polluted environments. These two effects combine
to produce generally weak correlations between humidity and O3 in model
perturbation studies (Jacob and Winner, 2009). In this work, however,
relative humidity (RH) has a strong negative relationship with O3 in
many locations, particularly in the south, consistent with previous analyses
of observed sensitivities (e.g., Camalier et al., 2007). A negative
correlation with temperature and a positive correlation with cloudy, unstable
conditions may explain the stronger associations found in the observations
relative to those of model perturbation studies. Stability, in the form of
turbulent kinetic energy (TKE) is also a strong performer at many sites,
though less so for the 95th percentile than for the 50th. Finally, while fire
proximity stands out at relatively few stations as a dominant driver of
median O3 levels (50th percentile), it appears to be important at far
more sites when examining higher O3 levels (95th percentile).
Spatial and frequency distributions for key covariates of summer
(top panels) and winter (bottom panels) O3. Maps show 95th percentile O3
sensitivities to selected meteorological variables at stations where that
variable was most important (defined as being one of the top two normalized
covariates). Below each map, histograms show the distribution of
sensitivities for the 5th (gray), 50th (yellow), and 95th (red) percentiles at all sites.
Same as Fig. 4 but for summer PM2.5.
Same as for Fig. 4 but for winter PM2.5.
While the top covariate frequencies shown in Fig. 4 can help identify
dominant meteorological factors overall, they do not indicate spatial
distributions or sensitivity magnitudes. The bottom panel of Fig. 4 and top
panel of Fig. 6 address these aspects of selected top covariates, showing
where each tends to drive pollutant variability, as well as how the
sensitivity magnitudes are distributed overall. Spatially, the temperature
sensitivity of 95th percentile O3 levels appears to be most directly
associated with coastal areas, though the strong negative relationship
between relative humidity and O3 in the south likely includes
temperature effects (Fig. 4, bottom panels). In general, the sensitivities of
O3 to changes in temperature are greater for higher O3 quantiles,
as shown by the increasing and flattening distributions for 95th quantile
regression sensitivities compared to 50th and 5th quantile values (Fig. 6,
upper left panel). In fact, quantile regression coefficients for the
95th percentiles averaged 0.9 ppb ∘C-1, 50 % greater than
mean 50th percentile sensitivities. This difference again highlights the
importance of temperature in determining extreme O3 events, since
increased temperatures could be expected to positively affect the magnitudes
of high O3 days even more than would be expected based on average days.
By comparison, downward shortwave radiation flux also shows up as a positive
driver of high O3 levels, but displays much more consistent
sensitivities across O3 quantiles (Fig. 6, upper right panel).
Winter O3
O3 levels are generally lower at all percentiles during the winter
months compared to the summer months, with 95th percentile O3 levels
almost halved at some sites. As seen in Fig. 5, temperature is almost
completely absent from the top ranks of O3 indicators during the winter.
Instead, variables related to incoming radiation flux are most important at
many sites, especially for 95th percentile O3 levels. This indicates the
relative importance of consistently clear skies for O3 production during
the coldest months, a relationship that appears consistently across quantiles
and regions (Fig. 5, bottom panels). Among the incoming radiation metrics,
the 6-day maximum of daily mean shortwave radiation flux showed up as the top
covariate most often, with consistently positive correlations evenly
distributed spatially (Fig. 6, lower right panel). Sensitivities are slightly
greater, on average, for higher quantiles, and stand out as particularly
strong at stations in Wyoming, an area previously highlighted for its
dangerously high winter O3 levels (e.g., Schnell et al., 2009). As with
summer O3, downward shortwave radiation flux (DSWRF) again has a
generally positive influence on winter O3, with some increase in
sensitivity at higher quantiles. Planetary boundary layer height (HPBL)
(Fig. 6, lower left panel), wind, and specific humidity show up as top
covariates at many sites as well, but more so for median quantile regressions
than for 95th percentiles, while fire proximity becomes increasingly
important at the higher quantiles.
Same as Fig. 6 but for PM2.5.
Normalized pollutant concentration sensitivities to meteorological
covariates (0.0 = uniform sensitivity across quantiles). Values shown
here are the weighted least-squares regressions performed on normalized QR
coefficients as a function of quantile for covariates with a mean sensitivity
change of at least 0.05, by species and season. Colors of bars show mean
normalized sensitivities (roughly equivalent to slopes expected from an
ordinary least-squares regression), while magnitudes of bars show mean change
across quantiles, averaged over all stations. Error bars indicate standard
error of the mean.
Summer PM2.5
Figure 7 shows that mean daily temperature is also a key player in predicting
summertime PM2.5, with greater sensitivities at the highest
concentration percentiles. While the previously discussed sensitivities of
O3 to temperature shown in Fig. 6 are the greatest along both the
northeast coast and Southern California, PM2.5 sensitivities to
temperature peak entirely in the east (Fig. 9, upper left panel). One
possible reason for this spatial difference in PM2.5 temperature
sensitivity is the regionality of PM2.5 speciation, especially in terms
of competing sensitivities of nitrate and sulfate aerosol (Dawson et al.,
2007). While concentrations of nitrate aerosol (and, to a lesser extent,
organics) are generally reduced by higher temperatures due to increased gas
phase partitioning, sulfate aerosol concentrations can increase at higher
temperatures because of increased rates of oxidation. Sulfur emissions are
far higher in the east than in the west, offering a likely explanation for
the differing sensitivities of PM2.5 to temperature between the regions.
In addition to temperature, 95th percentile PM2.5 shows strong
sensitivities to wind speeds and tropospheric stability at many sites,
emphasizing the importance of transport and stagnancy for extreme PM2.5
events, particularly those in highly polluted regions (Fig. 7, bottom
panels). A total of 3-day average wind speed stood out among covariates at
many sites throughout the east and midwest regions, and influences tended to
be of higher magnitude for high-quantile PM2.5 levels than for medians
or low quantiles (Fig. 9, upper right panel). Positive correlations for this
metric may be associated with areas whose extremes were governed primarily by
transport, rather than production. Also increasingly important for higher
quantiles of fine particulate matter was fire proximity, with over twice as
many sites including this metric in the top drivers for 95th percentile
PM2.5 as for 50th percentile PM2.5.
Winter PM2.5
Unlike O3, winter PM2.5 levels in the USA are often comparable to
(or even greater than) those of the summer months at many sites (Fig. 2).
Compared to other seasons and species, the dominant covariates of winter
PM2.5 are more consistently distributed between a few key variables
(Fig. 8, top panel). Temperature is apparently less of a factor during cold
months, rarely appearing among the top normalized indicators, and metrics
related to stagnation stand out as important covariates associated with
pollution events. Among meteorological covariates associated with increased
winter PM2.5, stability metrics (TKE and LTS), relative humidity, and
HPBL, stood out as key variables at the most sites, with wind and rainfall
also important at many locations. Top covariates were particularly consistent
in selection and magnitude in the northeast (regions 1, 2, and 3), as shown
by the tight, nearly identical distributions (Fig. 8, bottom panels).
Turbulence had a consistently negative influence on winter PM2.5,
especially for high response quantiles (Fig. 9, lower left panel).
Compared to factors connected to median PM2.5 levels, the two included
tropospheric stability indicators (3-day average of max. daily TKE and 3-day
minimum LTS) showed exceptionally strong sensitivities among covariates of
95th percentile levels, suggesting that PM2.5 extremes in the wintertime
are particularly sensitive to persistently stable conditions (Fig. 9, lower
right panel). Sites in Colorado and Utah, some of which are well-known for
episodes of severely reduced winter air quality, stand out in this regard,
with 95th quantile sensitivities to LTS over 4 times those of other site
averages.
Discussion
Differences in quantile sensitivities
The differences between typical 5th, 50th, and 95th percentile sensitivities
shown in Figs. 4, 5, 7 and 8, help to illustrate the ways in which
meteorological impacts on pollutants can vary in magnitude across the
response distribution. These differences can be more clearly quantified and
compared by measuring the slope of a QR regression itself as a function of
the percentile (Fig. 10). Using the full range of normalized QR output
gathered, from 2 to 98 %, we perform weighted least-squares regressions for
each selected variable at each station. The resulting slope for each
regression (in normalized units of standard deviations) can be interpreted as
a measure of change in sensitivity across the pollutant distribution, with
high values representing strong positive differences in sensitivity, and low
values representing strong negative differences. In other words, a zero slope
implies that the response of a pollutant to a given meteorological covariate
is relatively uniform regardless of the pollutant's concentration, while a
positive slope implies that responses at the high extremes tend to be greater
than those of lower percentiles. To put these changes in context, the overall
mean sensitivity for each variable is shown in color. Quantifying the extent
to which these differences in quantile sensitivities might impact the
response distributions themselves is beyond the scope of this work, but the
magnitudes of sensitivity differences relative to the mean sensitivities
themselves suggest large differences between mean and extreme behavior. For
example, the sensitivity change of summer O3 to maximum air temperature
is shown to be roughly equivalent to the mean sensitivity itself. Thus, a
location showing a mean increase of 1 ppb O3 ∘C-1 might
exhibit an increase of only 0.5 ppb O3 ∘C-1 at the
5th percentile, but a much larger increase of
1.5 ppb O3 ∘C-1 at the 95th percentile. This could
clearly have important consequences for the resulting O3 distribution,
given increasing temperatures.
Ordinary least-squares coefficient of determination (R2)
between observed pollutant concentrations and the reduced set of
meteorological variables selected in this analysis. Results are shown by
pollutant (O3 or PM2.5), EPA region (see Figs. 4, 5, 7, and 8), and
season (JJA is summer, DJF is winter). Red circles indicate median values
using the full set of variables, for comparison. Refer to Table 2 for the
listing of the reduced and full set of variables. Box plot whiskers mark
5th and 95th percentile R2 values.
For summertime O3 and PM2.5, temperature stands out as a covariate
that not only has a strong positive impact on concentrations (indicated by
the bright red color), but also exhibits even stronger impacts on high-percentile pollutant levels than on lower percentile levels at most
stations. On the other hand, while HPBL also strongly impacts summertime
O3, the change in sensitivity between low and high quantiles is
generally small, indicating a variable whose impact on O3 is relatively
unchanging across pollutant percentiles. Besides temperature's connections
to summer O3 and PM2.5, the key meteorological factors associated
with winter PM2.5 stand out for having highly quantile-specific
sensitivities. The sensitivity of PM2.5 to relative humidity, LTS, HPBL, and TKE are all greater for high PM2.5
quantiles than they are for low ones, highlighting the importance of
characterizing the full pollutant response to meteorological covariates,
especially for winter PM2.5.
Overall predictive power of statistical models
The variables identified here were not selected based on their suitability
for ordinary least-squares regression, but they do show considerable skill at
predicting pollutant levels using this methodology, explaining over half of
the variability at most sites (Fig. 11). Predictive skill for summertime
O3 is greatest in east, south, and midwest (regions 2 through 6) and
least in the Pacific southwest and mountain and plain regions (regions 8
and 9). Winter O3 R2 values are generally slightly lower than those
of the summer months, especially in the Pacific northwest and US south
central regions, though this may be partly explained by reduced O3
variability overall in the winter months.
PM2.5 shows a strong split between the relatively well-modeled northeast
and the less-accurately represented midwest and southwest. These results
compare favorably to previous attempts to predict PM2.5 using
meteorological indicators (Demuzere et al., 2009; Tai et al., 2010). Tai et
al. (2010), for example, find multivariate linear regression capable of
explaining less than 50 % of PM2.5 variability in the
northeastern USA. Almost half of the stations in those same regions showed
adjusted R2 values of greater than 60 % using our method, despite the
indicators being chosen to optimize high quantile regressions rather than OLS
regressions. Regional differences in meteorological predictive power in this
work are also comparable to those of Tai et al. (2010), who found high
R2 values in the northeast and Pacific northwest (regions 2, 3, and 5),
and lower values in the south and mountain and plain regions (regions 6
and 8).
Pollutant variability and trend
It is apparent that relatively simple meteorological processes, chosen for
their influence on high percentiles of O3 and PM2.5, are also
capable of explaining a large fraction of daily pollutant variability. There
are a number of possible sources for the remaining variability, including
day-to-day fluctuations in pollutant precursor emissions and highly
localized meteorological patterns. While the nationwide variable selection
process of this study proved capable of identifying indicators that are
broadly effective at predicting daily pollutant levels in many locations,
specific features relevant to individual stations (e.g., direction and
distance of upwind emission sources) may not be adequately represented by
the globally selected variables. Variability in local emission sources
themselves, either due to sporadic local events or differences in weekend
vs. weekday emissions, may also play an important role at some sites. This
analysis is also subject to uncertainties in the NARR product and the
pollutant observations, as well as discrepancies between local station
conditions and the grid-averaged NARR output.
Another important consideration in the analysis of these results is the
nonstationarity of both pollutant concentrations and sensitivities. As a
result of the implementation of widespread emissions controls,
concentrations of O3 and PM2.5 have decreased dramatically in many
of the most polluted areas in the USA. Since 2004, mean summertime
O3 levels at the sites used in this study have decreased by an average
of 0.14 ppb per year, while 95th percentile O3 levels have
decreased by 0.58 ppb per year. Stations that started with exceptionally
high O3 levels (mean summertime levels greater than 80 ppb) have seen
even more dramatic decreases, with means falling by 0.63 ppb per year and
95th percentile levels falling by 1.3 ppb year.
To a certain extent, these changes in pollution levels over time are
accounted for in our analysis through the inclusion of time (measured in
days since the start of the analyzed record) as an indicator variable.
However, changes in meteorological sensitivities themselves as a function of
decreasing emissions are not accounted for. To assess how these decreases in
emissions and overall pollution levels might have affected meteorological
sensitivities, the analyses above were repeated using 4-year subsets of the
full data record: 2004–2007 and 2008–2012, showing a widespread reduction in
sensitivities over time, presumably due to changes in precursor emissions.
For example, 95th percentile sensitivities of summertime O3 to
temperature were 13 % lower in the years 2009–2012 relative to 2004–2007,
consistent with previously reported declines in temperature sensitivity
(Bloomer et al., 2009). In all, we see average
absolute differences in 95th percentile sensitivities among each
station's top two covariates of 22 %, with most changes representing
reductions in sensitivity. Despite these differences, the qualitative
features of our analysis (including sign of sensitivities and differences
between pollutant quantiles) are consistent over time.
Conclusions
This analysis demonstrates that air quality over the past decade was highly
sensitive to meteorology, and that this sensitivity varied across pollutant
type (O3 vs. PM2.5), season, and concentrations (50th vs. 95th percentiles). These differences offer insights into the key
drivers behind extreme pollution event frequencies in the observed record
beyond simple conditional means, highlighting the meteorological covariates
most associated with changes in the highest pollutant levels.
We find that temperature is a dominant covariate at most stations in the
summer for both O3 and PM2.5, with relative humidity, stability,
and radiation flux also key covariates relating to O3, and wind,
stability, and rain often effective for predicting high PM2.5 levels.
O3 variability during winter months is determined largely by changes in
incoming radiation, while winter PM2.5 extremes are most commonly
affected by stagnation, humidity, and HPBL. We show substantial
regional variation in these results, suggesting that while classes of
meteorological drivers of extreme air quality are generally consistent,
specific factors leading to air-quality exceedances are local.
Climate change in coming decades is likely to induce a response in regional
air pollution. The sensitivities of O3 and PM2.5 to changes in
meteorological patterns are, in general, stronger for higher pollution
percentiles, meaning that changes to certain factors (most notably
temperature, wind speed, HPBL, and tropospheric stability) are likely
to affect the magnitude and frequencies of pollutant extremes more
drastically than they affect more moderate pollution levels. This effect
suggests that regional changes to climate could have more significant
impacts on the frequencies of extreme O3 and PM2.5 events than
would be suggested by bulk sensitivities from OLS regressions.
This analysis framework offers new ways to investigate both the observed and
simulated air-quality responses to climate. Through quantile regression, the
selection and ranking of key predictors of pollutant variability can be
evaluated robustly, focusing not on the mean behavior of a heavy-tailed
pollutant distribution, but rather the sensitivities closer to the tail
itself. Furthermore, the comparison of observed sensitivities to those
simulated by regional or global air-quality models could identify key model
biases relevant to the projection of future air quality, potentially
providing insights on the underlying mechanistic reasons for those biases.