Introduction
By 2030, air pollution is projected to be the leading environmentally related
cause of premature mortality worldwide (OECD, 2012). The World Health Organization
(WHO) estimates that exposure to outdoor air pollution resulted in
3.7 million premature deaths in 2012. Many epidemiological studies have shown
that chronic exposure to fine particulate matter (PM2.5) is associated
with an increase in the risk of mortality from respiratory diseases, lung
cancer, and cardiovascular disease, with the underlying assumption that a
causal relationship exists between PM and health outcomes (Dockery et al.,
1993; Jerrett et al., 2005a; Krewski et al., 2009; Pope et al., 1995, 2002,
2004, 2006). This has been shown through single and multi-population time
series analyses, long-term cohort studies, and meta-analyses.
In order to stress the negative impacts of air pollution on human health and
inform policy development (particularly with regard to developing strategies
for intervention and risk reduction), many studies have calculated the total
number of premature deaths each year attributable to air pollution exposure
or the “burden of disease”, through health impact assessment methods. One
of the main obstacles in attributing specific health impacts of PM2.5 is
determining exposure and linking it to specific health outcomes. Jerrett et
al. (2005a) suggest that personal monitors would be the optimal method because
they would make it easier to attribute individually recorded health outcomes
to specific particulate levels, but point out that their financial cost and
time-intensiveness limit widespread use. Many studies have instead relied on
fixed-site monitors within a certain radius to estimate population-level
exposure. However, these monitoring networks are generally located in urban
regions and provide no information on concentration gradients between sites.
Thus, epidemiological studies typically have to quantify the aggregate
population response to an area-average concentration. Additionally, health
data can be limited and therefore the responses may be determined from a
subset of individuals that may not be representative of the wider population.
Premature mortality from PM2.5 exposure by all causes (All),
heart disease (Heart), and lung cancer (LC), as estimated in other studies for
the globe, the US (or North America), and China (or Asia). Values are in
thousands of deaths per year. All-cause values for this study are
calculated as the sum of heart disease, lung cancer, and respiratory disease
deaths (as opposed to being calculated from an all-cause CRF).
a Study provides several estimates determined using different CRFs.
b Study provides several estimates from 14 different atmospheric
models. Table 3 provides additional information on the data sources and
concentration response functions used in these studies.
Study | Estimates (× 1000 deaths per year), in column order US (North America): All, Heart, LC; China (Asia/Western Pacific, East Asia): All, Heart, LC; Global: All, Heart, LC | Year for estimate
Evans et al. (2013)a (WHO region) | 2640–4220; 1123–1669; 176–264 | 2004
Fann et al. (2012)a (US) | 130–320 | 2005
Anenberg et al. (2010)a (continents) | 141; 124; 17; 2736; 2584; 152; 2077–7714; 3499; 222; 1800–4549; 39–336 | 2000
Lelieveld et al. (2013) (US and China) | 55; 46; 9.1; 1006; 898; 108; 2200; 2000; 186 | 2005
Cohen et al. (2004)a (WHO region) | 28; 3–55; 1–12; 355; 192–504; 22–53; 799; 474–1132; 39–105 | 2000
Lim et al. (2012) (GBD 2010, US and China) | 86; 58; 20; 858; 563; 185; 3100 | 2010
Forouzanfar et al. (2015) (GBD 2013, US and China) | 78; 54; 17; 916; 600; 201; 2900 | 2013
WHO (2014) (WHO region) | 152; 1669; 3700; 1505; 227 | 2012
Fang et al. (2013) (North America and East Asia) | 38; 4.4; 661; 53; 1532; 95 | 2000
Silva et al. (2013)b (continents) | 12.2–77; 908–1240; 1880–2380 | 2000
US EPA (2010)a (US) | 26–360 | 2005
US EPA (2009) (US) | 144 |
Punger and West (2013) (US) | 66; 61; 9.9 | 2005
Lelieveld et al. (2015) (US and China) | 55; 1357; 3297 | 2010
Sun et al. (2015) (US) | 103.3; 68.3; 15.4 | 2000
Rohde and Muller (2015) (China) | 1600 | 2014
This Study: Satellite (US and China) with Burnett et al. (2014) | 50; 38; 5; 1271; 9; 138 | 2004–2011
This Study: Model (US and China) with Burnett et al. (2014) | 43; 32; 4; 1300; 931; 144 | 2004–2011
Estimating the burden of disease associated with particulate air pollution
requires robust estimates of PM2.5 exposure. Fixed-site monitoring
networks can be costly to operate and maintain, and many of these monitors in
the United States sample only every third or sixth day. Given the high spatial
and temporal variability in aerosol concentrations, this sparse sampling makes
it difficult to determine exposure and widespread health impacts. Worldwide,
monitoring networks are even scarcer, with many
developing countries lacking any long-term measurements. “Satellite-based”
concentrations are now used extensively for estimating mortality burdens and
health impacts (e.g. Crouse et al., 2012; Evans et al., 2013; Fu et al.,
2015; Hystad et al., 2012; Villeneuve et al., 2015). Satellite observations
of aerosol optical depth (AOD) can offer observational constraints for
population-level exposure estimates in regions where surface air quality
monitoring is limited; however, they represent the vertically integrated
extinction of radiation due to aerosols, and thus additional information on
the vertical distribution and the optical properties of particulate matter is
required (often provided by a model) to translate these observations to
surface air quality (van Donkelaar et al., 2006, 2010; Liu et al., 2004,
2005). Alternatively, some studies have relied on model-based estimates of
PM2.5 exposure. Table 1 shows that the resulting estimates of premature
mortality vary widely. Here, we discuss these different methods and contrast
the uncertainty in these approaches for estimating exposure for both the US,
where air quality has improved due to regulations and control technology, and
China, where air quality is a contemporary national concern. Our objective is
to investigate the factors responsible for uncertainty in estimates of the
burden of disease from chronic PM2.5 exposure, and to use these uncertainties to
contextualize the comparison of satellite-based and model-based estimates of
premature mortality with previous work. As health impact assessment methods
are becoming more popular in the scientific literature, a greater
understanding of the uncertainties in these methods and the data sets that are
used is important.
Methods and tools
General formulation to calculate the burden of disease
To estimate the burden of premature mortality due to a specific factor like
PM2.5 exposure, we rely on Eqs. (1) and (2) (Eqs. 6 and 8 in Ostro, 2004
and as previously used in van Donkelaar et al., 2011; Evans et al., 2013;
Marlier et al., 2013; Zheng et al., 2015). The attributable fraction (AF) of mortality due to
PM2.5 exposure depends on the relative risk value (RR), which here is
the ratio of the probability of mortality (all-cause or from a specific
disease) occurring in an exposed population to the probability of mortality
occurring in a non-exposed population. The total burden due to PM2.5
exposure (ΔM) can be estimated by multiplying the AF by the baseline
mortality (equal to the baseline mortality rate Mb × exposed population P).
The relative risk is assumed
to change (ΔRR) with concentration, so that, in general, exposure to
higher concentrations of PM2.5 should pose a greater risk for premature
mortality (Sect. 2.4).
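$$\mathrm{AF} = \frac{\mathrm{RR}-1}{\mathrm{RR}} \quad \left(\text{or, equivalently, } \mathrm{AF} = \frac{\Delta\mathrm{RR}}{\Delta\mathrm{RR}+1}\right) \tag{1}$$
$$\Delta M = M_b \times P \times \mathrm{AF} \tag{2}$$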
Application of this approach requires information on the baseline mortality
rates and population, along with the RR, which is determined through a
concentration response function (including a shape and initial relative risk,
Sect. 2.4), and ambient surface PM2.5 concentrations.
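Equations (1) and (2) can be sketched in Python as follows. This is a minimal, hypothetical implementation that assumes gridded inputs stored as NumPy arrays of the same shape, with the baseline rate in deaths per person per year; the function names are illustrative and not taken from any published code.

```python
import numpy as np


def attributable_fraction(delta_rr):
    """Eq. (1): AF = dRR / (dRR + 1), equivalent to (RR - 1) / RR."""
    delta_rr = np.asarray(delta_rr, dtype=float)
    return delta_rr / (delta_rr + 1.0)


def excess_mortality(baseline_rate, population, delta_rr):
    """Eq. (2): dM = Mb * P * AF, evaluated element-wise over a grid.

    baseline_rate: cause-specific baseline mortality rate Mb (deaths per person per year)
    population:    exposed population P in each grid box
    delta_rr:      change in relative risk at each box's PM2.5 concentration
    """
    mb = np.asarray(baseline_rate, dtype=float)
    pop = np.asarray(population, dtype=float)
    return mb * pop * attributable_fraction(delta_rr)


# Hypothetical single grid box: Mb = 0.002 per year, P = 1e6, RR = 1.06 (dRR = 0.06)
print(excess_mortality(0.002, 1e6, 0.06))
```

With these illustrative inputs the attributable burden is about 113 deaths per year; in practice the same call is applied to whole grids, and the cause-specific results are summed.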
Population density (per km²) for the year 2000 from the GPWv3
data for (a) the continental US and (c) China, and the projected increase
in population density by the year 2015 for (b) the continental US and
(d) China.
Baseline mortality and population
For population data, we use the Gridded Population of the World, Version 3
(GPWv3), created by the Center for International Earth Science Information
Network (CIESIN) and available from the Socioeconomic Data and Applications
Center (SEDAC). This gridded data set has a native resolution of
2.5 arcmin (∼ 5 km at the equator) and provides population
estimates for 1990, 1995, and 2000, and projections (made in 2004) for 2005,
2010, and 2015. We linearly interpolate between available years to get
population estimates for years not provided. Population density for China and
the United States for the year 2000 is shown in Fig. 1 along with the
projected change in population density by the year 2015, illustrating
continued growth of urbanized areas (at the expense of rural regions in
China). We also compare mortality estimates using only urban area population
(similar to Lelieveld et al., 2013, which estimates premature mortality in
megacities). For this, we rely on the populated places data set (provided by
Natural Earth, which gives values for a point location rather than a grid and
includes all major cities and towns along with some smaller towns in sparsely
inhabited regions) which is determined from LandScan population estimates
(Dobson et al., 2000). In the US, approximately 80 % of the population lives in urban areas. For China,
36 % of the population lived in urban areas in 2000, but this number rose
to 53 % in 2013 (World Bank, 2015).
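The linear interpolation between GPWv3 years could look like the sketch below. It assumes (hypothetically) that the six population grids for 1990–2015 are stacked into one NumPy array; the function name and array layout are illustrative, and years outside 1990–2015 are rejected rather than extrapolated.

```python
import numpy as np

# GPWv3 years: estimates for 1990-2000 and projections for 2005-2015
GPW_YEARS = np.array([1990, 1995, 2000, 2005, 2010, 2015])


def interpolate_population(year, grids):
    """Linearly interpolate gridded population to `year`.

    grids: array of shape (6, ny, nx), one population grid per entry of GPW_YEARS.
    """
    grids = np.asarray(grids, dtype=float)
    if not GPW_YEARS[0] <= year <= GPW_YEARS[-1]:
        raise ValueError("year outside the GPWv3 range; no extrapolation is attempted")
    # Index of the bracketing lower year (clamped so 2015 uses the 2010-2015 interval)
    i = min(np.searchsorted(GPW_YEARS, year, side="right") - 1, len(GPW_YEARS) - 2)
    weight = (year - GPW_YEARS[i]) / (GPW_YEARS[i + 1] - GPW_YEARS[i])
    return (1.0 - weight) * grids[i] + weight * grids[i + 1]
```

For example, a 2007 grid is 60 % of the 2005 projection plus 40 % of the 2010 projection.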
To determine baseline mortalities in the US for cardiovascular disease
(ischemic heart disease and stroke), lung cancer, and respiratory disease, we
use mortality rates for each cause of death for all ages from the Centers for
Disease Control and Prevention (cdc.gov) for each year and each state. We multiply the
gridded population by these state-level mortality rates to obtain the
baseline mortality in each grid box. Other studies have also used
country-wide (or regional) (e.g. Evans et al., 2013) or county-level (e.g.
Fann et al., 2013) average death rates by cause. Some studies use the
mortality rate for all cardiovascular diseases, which would produce larger
estimates than using only ischemic heart disease and cerebrovascular disease
(stroke). Additionally, some studies consider respiratory deaths only in
relation to ozone exposure. Mortality values are not as readily available for
China, so we rely on country-wide values for baseline mortality (WHO
age-standardized mortality rates by cause). Therefore, in China spatial
variations in Mb are only due to variations in population and not
regional variations in actual death rates (i.e. we do not account for
cause-specific mortality rates varying between provinces). In order to
account for some regional variability in mortality rates, we use a population
threshold to distinguish between urban and rural regions for lung cancer
mortality rates (Chen et al., 2013a).
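Mapping region-level (e.g. state) cause-specific rates onto the population grid, plus the urban/rural switch for lung cancer, might be sketched as below. The integer `region_id` grid that assigns each box to a state or country is a hypothetical input, and the population threshold is left as a parameter because its value is not stated here.

```python
import numpy as np


def baseline_mortality(population, region_id, rate_by_region):
    """Baseline deaths Mb * P per grid box from region-level cause-specific rates.

    population:     gridded population
    region_id:      integer grid (same shape) giving the region index of each box
    rate_by_region: annual death rate for each region (deaths per person per year)
    """
    rates = np.asarray(rate_by_region, dtype=float)[np.asarray(region_id)]
    return np.asarray(population, dtype=float) * rates


def lung_cancer_rate(population, urban_rate, rural_rate, threshold):
    """Assign the urban rate where population >= threshold, else the rural rate.

    The threshold is a placeholder parameter in this sketch.
    """
    return np.where(np.asarray(population, dtype=float) >= threshold, urban_rate, rural_rate)
```

For China, where a single national rate is used, `rate_by_region` reduces to one value and spatial variation in baseline deaths then comes only from population.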
Relative risk
The relative risk (RR) is a ratio of the probability of a health endpoint (in
this case premature mortality) occurring in a population exposed to a certain
level of pollution to the probability of that endpoint occurring in a
population that is not exposed. Values greater than one suggest an increased
risk, while a value of one would suggest no change in risk. These values are
determined through epidemiological studies which relate individual health
impacts to changes in concentrations, and literature values span a large
range (Fig. 2). While these studies attempt to account for differences in
populations, lifestyles, pre-existing conditions, and co-varying pollutants,
relative risk ratios determined from each study still differ. This is likely
due to variables not taken into consideration, errors in exposure estimates
(“exposure misclassification”) (Sheppard et al., 2012), and because,
although the long-term effects of exposure to atmospheric pollutants have
been well-documented, the pathophysiological mechanisms linking exposure to
mortality risk are still unclear (Chen and Goldberg, 2009; Pope and Dockery,
2013; Sun et al., 2010), which makes it difficult to determine how
transferable results are from the context in which they were generated.
Relative risk ratios from select previous studies for mortality due
to chronic exposure to PM2.5 (given per 10 µg m⁻³
increase), colored by cause of death. Studies applied in this work are
highlighted in bold.
For our initial estimates, we use the integrated risk function from Burnett
et al. (2014) for heart disease, respiratory disease, and lung cancer
premature mortality due to chronic exposure. We also compare our results to
premature mortality estimates using risk ratios determined by Krewski et
al. (2009), which is an extended analysis of the American Cancer Society
study (Pope et al., 1995), and from Laden et al. (2006), which is an updated
and extended analysis of the Harvard Six Cities study (Dockery et al., 1993).
The updated Krewski et al. (2009) risk ratios have been widely used in
similar studies due to the large study population with national coverage,
18-year time span, and extensive analysis of confounding variables
(ecological covariates, gaseous pollutants, weather, medical history, age,
smoking, etc.). However, the Burnett et al. (2014) function is becoming more
widely used in the literature (e.g. Lelieveld et al., 2015; Lim et al., 2012;
Apte et al., 2015) because it provides the shape of the mortality function
for the global range of exposure concentrations. Using these different risk
ratios can make our results more directly comparable to studies in Table 1
which rely on the risk ratios from these four studies (Burnett et al., 2014;
Krewski et al., 2009; Laden et al., 2006; Pope et al., 2002).
Concentration response function
In order to determine an attributable fraction, it is necessary to understand
how the response changes with concentration (i.e. does the relative risk
increase, decrease, or level off with higher concentrations?). The shape of
this concentration response function is an area of on-going epidemiological
research (e.g. Burnett et al., 2014; Pope et al., 2015).
In the simplest form, it might be assumed that the change in relative risk
(ΔRR, with RR given per 10 µg m⁻³ increase) depends linearly on the surface
PM2.5 concentration (C, in µg m⁻³), as given in Eq. (3):
$$\Delta\mathrm{RR} = (\mathrm{RR}-1) \times \frac{C - C_0}{10} \tag{3}$$
In this equation, C0 can be considered the “policy relevant background
(PRB)/target”, “natural background”, or
“threshold”/“counterfactual”/“lowest effect level” surface PM2.5
concentration. Studies have found no concentration level below which PM has
no adverse health effect (e.g. Pope et al., 2002; Shi et al., 2015), and most
experts in health impacts of ambient air quality
agree that there is no population-level threshold (although there may be
individual-level thresholds, e.g. Roman et al., 2008). However, there are few
epidemiological studies in regions with very low annual average
concentrations (although Crouse et al., 2012, do record an annual
concentration of 1.9 µg m⁻³ in rural Canada), making it
difficult to determine the health risks in relatively clean conditions. How
to extrapolate the relationship out of the range of observed measurements is
uncertain. Therefore, rather than assuming that the function is linear down
to zero, studies often set C0 to the value of the lowest measured level
(LML) observed in the epidemiology study from which the RRs are derived (e.g.
Evans et al., 2013, use 5.8 µg m⁻³ with the RR from Krewski
et al., 2009) or use the “policy relevant” background (PRB, generally
0–2 µg m⁻³) concentration. This is the level to which
policies might be able to reduce concentrations and is generally determined
from model simulations in which domestic anthropogenic emissions have been
turned off (e.g. Fann et al., 2012). Similarly, some studies have set this
value to preindustrial (1850) pollution levels (e.g. Fang et al., 2013; Silva
et al., 2013).
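To make the effect of the counterfactual concentration concrete, the linear function of Eq. (3) can be combined with Eq. (1) as in the sketch below. The RR of 1.06 per 10 µg m⁻³ and the 12 µg m⁻³ annual mean are hypothetical inputs chosen only for illustration, and setting ΔRR to zero below C0 is an assumption of this sketch rather than part of Eq. (3).

```python
import numpy as np


def delta_rr_linear(conc, rr_per_10, c0=0.0):
    """Eq. (3): dRR = (RR - 1) * (C - C0) / 10, with RR given per 10 ug m-3.

    Concentrations below C0 are assigned zero excess risk (an assumption here).
    """
    excess = np.clip(np.asarray(conc, dtype=float) - c0, 0.0, None)
    return (rr_per_10 - 1.0) * excess / 10.0


def attributable_fraction(delta_rr):
    """Eq. (1): AF = dRR / (dRR + 1)."""
    return delta_rr / (delta_rr + 1.0)


# Hypothetical inputs: RR = 1.06 per 10 ug m-3 and an annual mean of 12 ug m-3
for label, c0 in [("no threshold", 0.0), ("PRB upper bound", 2.0), ("LML", 5.8)]:
    af = attributable_fraction(delta_rr_linear(12.0, 1.06, c0))
    print(f"{label:16s} C0 = {c0:3.1f}  AF = {float(af):.4f}")
```

With these inputs, moving C0 from zero to the 5.8 µg m⁻³ LML roughly halves the attributable fraction, which is why the choice of counterfactual matters most in relatively clean regions such as the US.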
Linear response functions are generally a good fit to observed responses at
lower concentrations (Pope et al., 2002). However, studies suggest that
linear response functions can greatly overestimate RR at high concentrations
(e.g. Pope et al., 2015), where responses may start to level off. There is
uncertainty at high concentrations because most epidemiology studies of the
health effects of air pollution exposure have generally been conducted under
lower concentrations (i.e. in the US). In order to determine the shape of
this response at higher concentrations, smoking has been used as a proxy
(Burnett et al., 2014; Pope et al., 2011, 2009), which does show a
diminishing response at higher concentrations. Therefore, log-linear
functions (Eqs. 4 and 5) and a power-law function (Eq. 6) have also been
explored. For Eq. (5), β = 0.15515/0.23218 for heart disease/lung cancer
from Pope et al. (2002), or β = 0.18878/0.21136 for heart disease/lung
cancer from Krewski et al. (2009); for Eq. (4), β = 0.01205/0.01328 for
heart disease/lung cancer from Krewski et al. (2009). In Eq. (6), I is the
inhalation rate of 18 m³ day⁻¹, and β = 0.2730/0.3195 and α = 0.2685/0.7433
for heart disease/lung cancer (from Pope et al., 2011, as used in Marlier
et al., 2013).
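$$\Delta\mathrm{RR} = \exp\left[\beta (C - C_0)\right] - 1 \tag{4}$$
$$\Delta\mathrm{RR} = \left(\frac{C+1}{C_0+1}\right)^{\beta} - 1 \tag{5}$$
$$\Delta\mathrm{RR} = \alpha (I \times C)^{\beta} \tag{6}$$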
We note that Cohen et al. (2005) and Anenberg et al. (2010) refer to Eq. (4)
as a log-linear function, while Ostro (2004), Evans et al. (2013), and
Giannadaki et al. (2014) use it as their linear function and instead use
Eq. (5) as their log-linear function; for clarity, we refer to these functions
by equation number in other sections. Another method to limit the response at high
concentrations is to simply use a “ceiling”, “maximum
exposure/high-concentration threshold”, or “upper truncation” value in
which it is assumed that the response remains the same for any value above it
(e.g. Anenberg et al., 2012; Cohen et al., 2005; Evans et al., 2013). This
can be a somewhat arbitrary value or the highest observed concentration in
the original epidemiological study.
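The alternative response shapes of Eqs. (4)–(6) and a simple ceiling can be sketched as below. The function names are hypothetical; the β and α values quoted above would be passed in as arguments, and no clipping below C0 is applied, so Eqs. (4) and (5) return negative ΔRR when C < C0.

```python
import numpy as np


def delta_rr_eq4(conc, beta, c0=0.0):
    """Eq. (4): dRR = exp[beta * (C - C0)] - 1 (beta per ug m-3)."""
    return np.exp(beta * (np.asarray(conc, dtype=float) - c0)) - 1.0


def delta_rr_eq5(conc, beta, c0=0.0):
    """Eq. (5): dRR = [(C + 1) / (C0 + 1)]**beta - 1."""
    return ((np.asarray(conc, dtype=float) + 1.0) / (c0 + 1.0)) ** beta - 1.0


def delta_rr_eq6(conc, alpha, beta, inhalation=18.0):
    """Eq. (6): dRR = alpha * (I * C)**beta, with I in m3 day-1 (I * C is a daily dose)."""
    return alpha * (inhalation * np.asarray(conc, dtype=float)) ** beta


def apply_ceiling(conc, ceiling):
    """Hold the response constant above a 'ceiling' concentration (study-specific value)."""
    return np.minimum(np.asarray(conc, dtype=float), ceiling)
```

A ceiling is applied to the concentration before evaluating the response, for example `delta_rr_eq4(apply_ceiling(conc, ceiling), beta=0.01205, c0=5.8)` for heart disease with the Krewski et al. (2009) coefficient, where `ceiling` is whatever upper truncation a given study adopts.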
More recently, Burnett et al. (2014) fit an integrated exposure response
(IER) model using RRs from a variety of epidemiological studies on ambient
and household air pollution, active smoking, and secondhand tobacco smoke in
order to determine RR functions over all global PM2.5 exposure ranges
for ischemic heart disease, cerebrovascular disease, chronic obstructive
pulmonary disease, and lung cancer (Eq. 7). Monte Carlo simulations were
conducted in order to derive the 1000 sets of coefficients for the
IER function (the coefficients are available at
http://ghdx.healthdata.org/record/global-burden-disease-study-2010-gbd-2010-ambient-air-pollution-risk-model-1990-2010):
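$$\Delta\mathrm{RR} = \begin{cases} 0, & C < C_0 \\ \alpha\left\{1 - \exp\left[-\gamma (C - C_0)^{\rho}\right]\right\}, & C \ge C_0 \end{cases} \tag{7}$$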
This form is now being widely used (Apte et al., 2015; Lelieveld et al.,
2015; Lim et al., 2012), and we use it here for our baseline estimates. In
the following sections, we will discuss the uncertainty on the burden of
disease associated with the shape of the concentration response function and
threshold concentration.
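Evaluating Eq. (7) across an ensemble of coefficient sets, as the Monte Carlo draws allow, might look like the sketch below. The `coeffs` layout (one row per set with columns α, γ, ρ, C0) is a hypothetical arrangement; the format of the published GBD coefficient files is not assumed here, and the coefficient values in the example are illustrative only.

```python
import numpy as np


def delta_rr_ier(conc, alpha, gamma, rho, c0):
    """Eq. (7): dRR = alpha * {1 - exp[-gamma * (C - C0)**rho]} for C >= C0, else 0.

    alpha, gamma, rho and c0 may be arrays (one entry per coefficient set);
    they are broadcast against conc.
    """
    conc = np.asarray(conc, dtype=float)
    # Clip so (C - C0)**rho is never evaluated on negative numbers
    excess = np.clip(conc - c0, 0.0, None)
    return np.where(conc >= c0, alpha * (1.0 - np.exp(-gamma * excess ** rho)), 0.0)


def ier_percentiles(conc, coeffs, q=(2.5, 50.0, 97.5)):
    """Percentiles of dRR at one concentration across all coefficient sets.

    coeffs: array of shape (n_sets, 4) with columns alpha, gamma, rho, c0 (hypothetical layout).
    """
    coeffs = np.asarray(coeffs, dtype=float)
    alpha, gamma, rho, c0 = (coeffs[:, k] for k in range(4))
    return np.percentile(delta_rr_ier(conc, alpha, gamma, rho, c0), q)


# Illustrative (not published) coefficient sets
example = np.array([[0.5, 0.1, 1.0, 5.0], [0.6, 0.08, 0.9, 6.0], [0.4, 0.12, 1.1, 7.0]])
print(ier_percentiles(35.0, example))
```

Propagating every coefficient set through Eq. (1) in this way yields a distribution of attributable fractions rather than a single value.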
Estimating surface PM2.5
We use both a global model and satellite observations to estimate surface
PM2.5 concentrations and translate these to PM2.5 exposure and
health burden. In addition, we use surface measurements of PM2.5 to test
the accuracy of these estimates.
Long-term average (2004–2011) unconstrained model simulation of
PM2.5 for the (a) continental US and (b) China, along
with the (MODIS-Aqua Collection 6) satellite-based PM2.5 for the
(c) continental US and (d) China, and the difference
between the satellite-constrained and unconstrained model PM2.5
concentrations.
Unconstrained model simulation
We use the global chemical transport model GEOS-Chem (geos-chem.org) to
simulate both surface PM2.5 and AOD. We use v9.01.03 of the model,
driven by GEOS-5 meteorology, in the nested grid configuration over North
America and Asia (0.5° × 0.667° horizontal
resolution) for 2004–2011. Using this longer time period gives greater
confidence in our uncertainty results. The GEOS-Chem aerosol simulation
includes sulfate, nitrate, ammonium (Park et al., 2004), primary carbonaceous
aerosols (Park et al., 2003), dust (Fairlie et al., 2007; Ridley et al.,
2012), sea salt (Alexander et al., 2005), and secondary organic aerosols
(SOA) (Henze et al., 2008). There are several regional anthropogenic emission
inventories used in the model, such as BRAVO over Mexico (Kuhns et al.,
2003), EMEP over Europe (Vestreng et al., 2007), CAC for Canada
(http://www.ec.gc.ca/pdb/cac/cac_home_e.cfm), the EPA NEI05 inventory
(Hudman et al., 2007, 2008) over the US, and Streets et al. (2006) over Asia.
Any location not covered by one of these regional inventories relies on the
GEIA (Benkovitz et al., 1996) and EDGAR (Olivier and Berdowski, 2001;
Vestreng, 2003) inventories.
Biofuel emissions over the US are also from the EPA NEI05 inventory (Hudman
et al., 2007, 2008) and anthropogenic emissions of black and organic carbon
over North America follow Cooke et al. (1999) with the seasonality from Park
et al. (2003). Biogenic VOC emissions are calculated interactively following
MEGAN (Guenther et al., 2006), while year-specific biomass burning is
specified according to the GFED2 inventory (van der Werf et al., 2006).
Surface dry PM2.5 is calculated by combining sulfate, nitrate, ammonium,
elemental carbon, organic matter, fine dust, and accumulation mode sea salt
concentrations in the lowest model grid box. In the following discussion,
these values are referred to as the “unconstrained model”. Simulated AOD is
calculated at 550 nm based on aerosol optical and size properties as
described in Ford and Heald (2013).
Satellite-based
We also derive a satellite-based surface PM2.5 using satellite-observed
aerosol optical depth, with additional constraints from the GEOS-Chem model,
in a similar manner to Liu et al. (2004, 2007) and van Donkelaar et
al. (2006, 2010, 2011). This method relies on the following relationship:
$$\mathrm{PM}_{2.5,\,\mathrm{surface}} = \eta \times \mathrm{AOD}_{\mathrm{satellite}},$$
where the satellite-derived PM2.5 is estimated at the resolution of the
unconstrained model by multiplying the satellite observed AOD by the value
η, which is the ratio of model simulated surface PM2.5 to simulated
AOD at the time of the satellite overpass. This is then a combined product
which relies on a chemical transport model to simulate the spatially and
temporally varying relationship between AOD and surface PM2.5 by
accounting for all the aerosol properties and varying physical distribution
and then constraining these results by “real” (i.e. satellite) measurements
of AOD. Using the satellite to constrain the model concentrations is
extremely useful in regions where emissions inventories and model processes
are less well known.
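The scaling relationship above can be sketched for a set of grid boxes as follows. This assumes satellite AOD, simulated PM2.5, and simulated AOD are already on the model grid and sampled consistently at the satellite overpass; the function name is hypothetical, and boxes with no positive simulated AOD are returned as NaN rather than guessed.

```python
import numpy as np


def satellite_pm25(aod_sat, model_pm25, model_aod):
    """Satellite-based PM2.5 = eta * AOD_satellite, with eta = model PM2.5 / model AOD.

    Boxes where the simulated AOD is not positive (eta undefined) or the
    satellite AOD is missing (NaN) come out as NaN.
    """
    model_aod = np.asarray(model_aod, dtype=float)
    eta = np.divide(np.asarray(model_pm25, dtype=float), model_aod,
                    out=np.full(model_aod.shape, np.nan), where=model_aod > 0)
    return eta * np.asarray(aod_sat, dtype=float)
```

Because η carries the model's vertical profile and aerosol optical properties, errors in either propagate directly into the satellite-based concentration even when the satellite AOD is accurate.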
For AOD, we use observations from the Moderate Resolution Imaging
Spectroradiometer (MODIS) instrument and from the Multi-angle Imaging
SpectroRadiometer (MISR) instrument. For this work we use MODIS 550 nm
Level 2, Collection 6, Atmosphere Products for Aqua as well as Level 2,
Collection 5 for Aqua. We filter these data for cloud fraction (CF < 0.2)
and remove observations with high AOD (> 2.0), as in Ford and Heald (2012),
because cloud contamination causes known biases in AOD (Zhang et al., 2005);
we note, however, that this filtering could also remove high-pollution
observations, particularly in China. For MISR, we also use the Level 2 AOD
product (F12, version 22, 500 nm). We note that this is a different
wavelength from that used for MODIS, but we neglect that difference for
these comparisons. We use both of these observations for comparison as MODIS
has a greater number of observations while MISR is generally considered to
better represent the spatial and temporal variability of AOD over China
(Cheng et al., 2012; Qi et al., 2013; You et al., 2015). Satellite
observations are gridded to the GEOS-Chem nested grid resolution. We sample
GEOS-Chem to days and grid boxes with valid satellite observations to
calculate the η used to translate the AOD to surface PM2.5.
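The filtering and co-sampling steps can be sketched as below, assuming daily satellite retrievals have already been regridded to the GEOS-Chem nested grid and stacked as (days, ny, nx) arrays alongside the corresponding model fields. The names are hypothetical, and whether η is then formed from these period means or from daily ratios is not specified in the text, so that ordering is an assumption here.

```python
import numpy as np


def valid_mask(aod, cloud_fraction, max_cf=0.2, max_aod=2.0):
    """True where a retrieval passes the filters: CF < 0.2 and AOD <= 2.0."""
    aod = np.asarray(aod, dtype=float)
    cf = np.asarray(cloud_fraction, dtype=float)
    return np.isfinite(aod) & (cf < max_cf) & (aod <= max_aod)


def sampled_means(aod_sat, cloud_fraction, model_pm25, model_aod):
    """Average satellite AOD and model fields over the same valid days.

    All arrays have shape (ndays, ny, nx); boxes with no valid day return NaN.
    """
    mask = valid_mask(aod_sat, cloud_fraction)
    n_valid = mask.sum(axis=0)

    def masked_mean(field):
        total = np.where(mask, np.asarray(field, dtype=float), 0.0).sum(axis=0)
        return np.where(n_valid > 0, total / np.maximum(n_valid, 1), np.nan)

    return masked_mean(aod_sat), masked_mean(model_pm25), masked_mean(model_aod)
```

Sampling the model only on valid retrieval days keeps η consistent with the conditions under which the satellite actually observed.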
In Fig. 3, we show the long-term average (2004–2011) of satellite-based
PM2.5 for the US and China using MODIS Aqua Collection 6 and compare
this to model-only estimates. In the following sections, most of our results
will be shown using Collection 6; but reference and comparisons will be made
to other products as a measure of uncertainty. In general the unconstrained
model and satellite-based estimates show similar spatial features and
magnitudes, with stronger local features apparent in the satellite-based
PM2.5. The satellite-based estimate suggests that concentrations should
be higher over much of the western US, particularly over California, Nevada,
and Arizona (comparisons with surface measurements are discussed in
Sect. 2.5.3). In China, the satellite-derived PM2.5 is higher in Eastern
China, around Beijing, Hebei province, Tianjin, and Shanghai, but
lower in many of the central provinces. While many previous studies suggest
that MODIS may be biased high (and MISR biased low) over China (e.g. Cheng et
al., 2012; Qi et al., 2013; You et al., 2015) and the Indo-Gangetic Plain
(Bibi et al., 2015), Wang et al. (2013) note that the GEOS-Chem model
underestimates PM2.5 in the Sichuan basin, suggesting that the MODIS
satellite-based estimate could reduce the bias in this province.
Surface-based observations
We use observations of PM2.5 mass from two networks in the United States
(where long-term values are more readily available than in China) to evaluate
the model and satellite-derived PM2.5: the Interagency Monitoring of
Protected Visual Environments (IMPROVE) and the EPA Air Quality System (AQS)
database. The IMPROVE network measures PM2.5 over a 24-h period every
third day and these measurements are then analyzed for concentrations of
fine, total, and speciated particle mass (Malm et al., 1994). We use the
reconstructed fine mass (RCFM) values, which are the sum of ammonium sulfate,
ammonium nitrate, soil, sea salt, elemental carbon and organic matter.
Previous studies have generally shown good agreement between measurements and
GEOS-Chem simulations of PM2.5 (e.g. Ford and Heald, 2013; van Donkelaar
et al., 2010). In Fig. 4, we show the long-term average of PM2.5 at AQS
and IMPROVE sites in the US overlaid on simulated concentrations. In general,
GEOS-Chem agrees better with measurements at IMPROVE sites, likely because
these are located in rural regions where simulated values will not be as
impacted by the challenge of resolving urban plumes in a coarse Eulerian
model. There are noted discrepancies in California (Schiferl et al., 2014)
and the Appalachia/Ohio River Valley region where the model is biased low.
The model is biased low, with a mean bias of −25 % relative to measurements
at the EPA AQS sites and −6 % relative to measurements at IMPROVE
sites. Annual mean bias at individual sites ranges from −100 % to +150 %.
At these same AQS sites, the satellite-derived PM2.5 is less biased
(−12 % using MODIS C6 or −8 % using MISR).
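The text does not define how the mean bias is computed; one common choice is the normalized mean bias, sketched below together with the per-site percent bias of annual means. Both definitions are assumptions of this example, not necessarily the ones behind the values quoted above.

```python
import numpy as np


def normalized_mean_bias(model, obs):
    """Normalized mean bias in percent: 100 * sum(model - obs) / sum(obs)."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return 100.0 * np.sum(model - obs) / np.sum(obs)


def site_percent_bias(model, obs):
    """Percent bias of the annual mean at each site: 100 * (model - obs) / obs."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return 100.0 * (model - obs) / obs
```

The normalized form weights each site by its observed concentration, so polluted urban AQS sites dominate the network-wide value, whereas the per-site values expose the wide spread between individual locations.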
GEOS-Chem simulated average surface PM2.5 mass for years
2004–2011 overlaid with measurements at IMPROVE (circles) and AQS sites
(diamonds).
To estimate the uncertainty in satellite AOD, we also rely on surface-based
measurements of AOD from the global AErosol RObotic NETwork (AERONET) of sun
photometers. AOD and aerosol properties are recorded at eight wavelengths in
the visible and near-infrared (0.34–1.64 µm) and are often used to
validate satellite measurements (e.g. Remer et al., 2005). AERONET AOD has an
uncertainty of 0.01–0.015 (Holben et al., 1998). For this work, we use
hourly Version 2 Level 2 measurements sampled to 2-hour windows around the
times of the satellite overpasses. We also perform a least-squares polynomial
fit to interpolate measurements to 550 nm.
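The interpolation of AERONET AOD to 550 nm might be sketched as follows. The text specifies only a least-squares polynomial fit; fitting in log–log space (ln AOD against ln wavelength) with a second-order polynomial is an assumption of this sketch, as is the function name.

```python
import numpy as np


def aod_at_550(wavelengths_nm, aod, degree=2):
    """Interpolate sun-photometer AOD to 550 nm with a least-squares polynomial fit.

    The fit is performed on ln(AOD) versus ln(wavelength); the log-log space
    and the default quadratic degree are assumptions of this sketch.
    """
    x = np.log(np.asarray(wavelengths_nm, dtype=float))
    y = np.log(np.asarray(aod, dtype=float))
    coeffs = np.polyfit(x, y, degree)
    return float(np.exp(np.polyval(coeffs, np.log(550.0))))
```

In log–log space a pure Ångström power law is a straight line, so the quadratic term only absorbs curvature in the spectral dependence and the fit reproduces power-law data exactly.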
Percent of the population exposed to different annual PM2.5
concentrations in the US (a) and China (b). Lines denote
estimates using the unconstrained GEOS-Chem simulation (red) or using
satellite-based estimates with MODIS (green) and MISR (blue). Shading
represents potential uncertainty associated with the model η (described
in Sect. 4.2) and dashed black lines represent national annual air quality
standards.
Estimated health burden associated with exposure to PM2.5
We compare national exposure estimates for the US and China using
unconstrained and satellite-based (MODIS and MISR) annual average PM2.5
concentrations in Fig. 5, which is a cumulative distribution plot that is
calculated as the sum of the population in each grid box which has an annual
average concentration at or above each concentration level. For the US,
satellite-based estimates suggest a slightly greater fraction of the
population is exposed to higher annual average concentrations, while in
China, the satellite-based estimates suggest that a lower fraction is exposed.
Using MISR AOD suggests higher annual average concentrations in the US and much
lower concentrations in China, as MISR has a high bias in regions of low AOD and a low bias in
regions of high AOD (Jiang et al., 2007; Kahn et al., 2010). The large
discrepancy between results from MISR and MODIS could be due to differences
in sampling, but studies have also shown that MODIS is biased high in China
and MISR is biased low (Cheng et al., 2012; Qi et al., 2013; You et al.,
2015). We further discuss the uncertainties in these estimates in Sect. 4.
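The cumulative exposure curve described above (the share of the population in grid boxes at or above each concentration level) can be sketched as below, assuming matching concentration and population grids; the function name is hypothetical.

```python
import numpy as np


def percent_exposed_at_or_above(conc, population, levels):
    """Percent of the population living in grid boxes whose annual mean PM2.5
    is at or above each concentration level."""
    conc = np.ravel(np.asarray(conc, dtype=float))
    pop = np.ravel(np.asarray(population, dtype=float))
    total = pop.sum()
    return np.array([100.0 * pop[conc >= level].sum() / total
                     for level in np.atleast_1d(levels)])
```

Evaluating the curve at a national annual standard gives the percentage of the population living where that standard is exceeded, which is how the dashed reference lines are read.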
These exposure estimates are used to calculate an attributable fraction of
mortality associated with heart disease, lung cancer, and respiratory disease
attributable to chronic exposure using both model and satellite-based annual
average concentrations for the US and China (Table 1). In the US, we estimate
that exposure to PM2.5 accounts for approximately 2 % of total
deaths (6 % of heart diseases and 5 % of respiratory diseases)
compared to 14 % (33 % of heart and 22 % of respiratory) in China
using satellite-based concentrations. For comparison, the Global Burden of
Disease study estimates that, in 2010, 10 % of total deaths in China and 3 % of total deaths
in the US were attributable to exposure to PM2.5 (Lim et al., 2012). We
present these as an average over the 2004–2011 time period in order to
provide more robust results that are not driven by an outlier year, as there
is considerable year-to-year variability in AOD and surface PM2.5
concentrations (for example, heavy dust years in China). However, there are
trends in population (Fig. 1) and surface concentrations that can influence
these results. For example, there is a significant decreasing trend in AOD
over the northeastern US simulated in the model which is also noticeable in
the satellite observations and the surface concentrations (Hand et al.,
2012). This decreasing trend can be attributed to declining SO2 emissions
in the US as noted in Leibensperger et al. (2012). Trends in China are more
difficult to ascertain as emissions have been variable over this period in
general (Lu et al., 2011; Zhao et al., 2013) with widespread increases from
2004 to 2008 followed by variable trends in different regions through 2011.
The difference between mortality burden estimates using model or satellite
concentrations is approximately 20 % for the US and 2 % for China on
a nationwide basis, although regionally the difference can be much greater. A
question we aim to address here is whether these model and satellite-based
estimates are significantly different.
(a) Percent difference between annual mean AOD from MODIS
Collection 6 and Collection 5 and (b) simulated bias in
satellite-derived annual average surface PM2.5 associated with satellite
sampling.
We compare our results to premature mortality burden estimates from other
studies in Table 1. In general, our estimates for China are higher than most
previous estimates, except for Lelieveld et al. (2015) and Rohde and
Muller (2015). However, these studies provide estimates for 2010 and 2014,
respectively, and we did find an increasing trend in our mortality estimates
over the study time period. For the US, our estimates are in the lower range
of previous studies. The spread among these studies can be attributed to the
data used (e.g. MODIS Collection 5 rather than Collection 6, or unconstrained
model concentrations), the choice of baseline mortality rates and population,
the resolution of the data, the years studied, and the risk ratios and
response functions. For example, Evans et al. (2013) also use satellite-based
concentrations (using MISR/MODIS Collection 5 and GEOS-Chem), but use a
different concentration response function and regional baseline mortality
rates. In the following sections, we delineate some of the uncertainty in
these results and reasons for differences compared to previous studies.
Uncertainty in satellite-based PM2.5
Uncertainties in the PM2.5 concentrations derived from satellite
observations arise from the two pieces of information which inform this
estimate: (1) satellite AOD and (2) model η. Here we explore some of the
limitations and uncertainties associated with each of these inputs.
Uncertainty associated with satellite AOD
While satellite observations of aerosols are often used for model validation
(e.g. Ford and Heald, 2012), these are indirect measurements with their own
limitations and errors. The uncertainty in satellite AOD can be due to a
variety of issues such as the presence of clouds, the choice of optical model
used in the retrieval algorithm, and surface properties (Toth et al., 2014;
Zhang and Reid, 2006). For validation of satellite products, studies have
often relied on comparisons against AOD measured with sun photometers at
AERONET ground sites (e.g. Kahn et al., 2005; Levy et al., 2010; Remer et
al., 2005, 2008; Zhang and Reid, 2006). The uncertainty in AOD over land from
MODIS is estimated as ±(0.05 + 15 % × AOD) (Remer et al., 2005), while Kahn
et al. (2005) suggest that 70 % of MISR AOD data are within 0.05 (or
20 % × AOD) of AERONET AOD.
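The two validation criteria above can be written as simple envelope tests on collocated retrievals. This is a sketch: it reads the MISR criterion as "within 0.05 or 20 % of the AERONET AOD, whichever is larger", which is an interpretation, and the collocated pairs are hypothetical.

```python
def within_modis_envelope(aod_sat, aod_aeronet):
    """MODIS land expected error: |sat - AERONET| <= 0.05 + 0.15 * AERONET AOD."""
    return abs(aod_sat - aod_aeronet) <= 0.05 + 0.15 * aod_aeronet


def within_misr_criterion(aod_sat, aod_aeronet):
    """Assumed reading of the MISR criterion: within max(0.05, 0.20 * AERONET AOD)."""
    return abs(aod_sat - aod_aeronet) <= max(0.05, 0.20 * aod_aeronet)


def fraction_within(pairs, criterion):
    """Fraction of (satellite, AERONET) pairs that satisfy a criterion."""
    if not pairs:
        raise ValueError("no collocations")
    return sum(criterion(s, a) for s, a in pairs) / len(pairs)


# Hypothetical collocations as (satellite AOD, AERONET AOD).
pairs = [(0.12, 0.10), (0.45, 0.30), (0.22, 0.25), (0.80, 0.70)]
print(fraction_within(pairs, within_modis_envelope))
print(fraction_within(pairs, within_misr_criterion))
```

The MODIS envelope widens with AOD, so the same absolute error can pass at high aerosol loading and fail at low loading.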
There are also discrepancies between AOD measured by the different
instruments due to different observational scenarios and instrument design.
The Aqua platform has an afternoon overpass while the Terra platform has a
morning overpass. It might be expected that there would be some differences
in retrieved AOD associated with diurnal variations in aerosol loading.
However, the difference of 0.015 in the globally averaged AOD between MODIS
onboard Terra and Aqua (Collection 5), although within the uncertainty range
of the retrieval, is primarily attributed to uncertainties and a drift in the
calibration of the Terra instrument, noted in Zhang and Reid (2010) and Levy
et al. (2010). Collection 6 (as will be discussed further) reduces the AOD
divergence between the two instruments (Levy et al., 2013). MISR employs a
different multi-angle measurement technique with a smaller swath width; as a
result the correlation between MISR AOD and MODIS AOD is only 0.7 over land
(0.9 over ocean) (Kahn et al., 2005).
Figure 7. Normalized mean bias in AOD between MODIS-Aqua Collection 6 and
AERONET sites for (a) the US and (b) China.
Not only are there discrepancies in AOD between instruments, there are also
differences between product versions for the same instrument. The MODIS
Collection 6 Level 2 AOD is substantially different from Collection 5.1 (Levy
et al., 2013, and Fig. 6). In general, AOD decreases over land and increases
over ocean with Collection 6. These changes are due to a variety of algorithm
updates including better detection of thin cirrus clouds, a wind speed
correction, a cloud mask that now allows heavy smoke retrievals, better
assignments of aerosol types, and updates to the Rayleigh optical depths and
gas absorption corrections (Levy et al., 2013). These differences can also
impact the derived PM2.5 (and can explain some differences between our
results and previous studies). In particular, because Collection 6 suggests
higher AOD over many of the urbanized regions, the derived PM2.5 and
resulting exposure estimates (all other variables constant) are greater. The
difference between these two retrieval products, given the same set of
radiance measurements from the same platform, gives a sense of the
uncertainty in the satellite AOD product (Fig. 6a).
We estimate the uncertainty in satellite AOD used here by comparing satellite
observations to AERONET and determining the normalized mean bias (NMB)
between AOD from each satellite instrument and AERONET for the US and China
(Fig. 7). Although the number of sites in China is very limited, these
comparisons show that the satellites generally agree with AERONET better
in the eastern US and northeastern China than in the western US and western
and southeastern China. There are larger biases in the west near deserts and
at coastal regions where it may be challenging to distinguish land and water
in the retrieval algorithm. NMBs at each AERONET site are generally similar
among the instruments (MISR comparison not shown), with greater differences
at these western sites. While Collection 6 does reduce the bias at several
sites along the East Coast in the US, it is generally more biased at the Four
Corners region of the US. We use these NMBs to regionally “bias correct” our
AOD values and estimate the associated range of uncertainty in our premature
mortality estimates. Compared with the relative term (15 %) of the standard
MODIS AOD retrieval uncertainty, our overall NMB is smaller in the eastern US
(-1 %) and western China (11 %) and larger in the western US (40 %) and
eastern China (18 %).
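A minimal sketch of the NMB used here and of one possible form of the regional bias correction. The text does not give the correction formula, so dividing AOD by (1 + NMB) is an assumption, and the collocated values are hypothetical.

```python
def normalized_mean_bias(sat, obs):
    """NMB = sum(sat - obs) / sum(obs) over collocated values."""
    if len(sat) != len(obs) or not obs:
        raise ValueError("need equal-length, non-empty series")
    return sum(s - o for s, o in zip(sat, obs)) / sum(obs)


def bias_correct(aod, nmb):
    """Hypothetical multiplicative correction: divide AOD by (1 + NMB)."""
    return aod / (1.0 + nmb)


# Hypothetical collocations in one region where the satellite reads high.
sat = [0.14, 0.28]
obs = [0.10, 0.20]
nmb = normalized_mean_bias(sat, obs)   # 0.12 / 0.30
print(f"NMB = {nmb:.0%}")
print(f"corrected AOD = {bias_correct(0.28, nmb):.2f}")
```

Dividing by (1 + NMB) removes a uniform multiplicative bias exactly; an additive correction would be an equally simple alternative.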
Table 2. List of model sensitivity tests and descriptions, with results shown
in Fig. 8.
Sensitivity test | Description
AvgAOD | AOD is held constant through the season while η varies daily.
AvgEta | AOD varies daily, while η is held constant through the season.
AvgProf | Column mass varies daily, but the shape of the vertical profile is held constant for the season. AOD and η vary daily but are recalculated for the redistributed mass.
AvgRH | AOD and η vary daily but are recalculated assuming relative humidity remains constant throughout the season.
2 × 2.5 | η values are calculated for a simulation run at coarser (2° × 2.5°) resolution and then regridded to the nested resolution (0.5° × 0.666°).
SO4 | Assume all mass in the column is sulfate and recalculate η.
BC | Assume all mass in the column is black carbon and recalculate η.
No NO3 | Calculate AOD and η without the contribution of nitrate.
There may also be biases associated with the satellite sampling if
concentrations on days with available observations are not representative of
all days. To assess this sampling bias, we use the model to compare the annual
mean with the mean over days with valid observations (Fig. 6b). In general, sampling leads to
an underestimation in AOD (average of 20 % over the US). This can partly
be attributed to the presence of high aerosol concentrations below or within
clouds which cannot be detected by the satellite, the mistaken identification
of high aerosol loading as cloud in retrieval algorithms, as well as the
removal of anomalously high AOD values (> 2.0) from the observational
record. This suggests that the average AOD values can also be influenced by
the chosen filtering and data quality standards. Analysis of the impact of
satellite data quality on the AOD to PM2.5 relationship is discussed in
Toth et al. (2014). They find that using higher-quality observations tends to
improve correlations between observed AOD and surface PM2.5 across the US,
although correlations are generally low (< 0.55).
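The model-based sampling test can be sketched as a comparison between the full-period mean and the mean over usable days. In this sketch a day is usable when it has a retrieval and its AOD does not exceed 2.0, mimicking the filtering mentioned above; the daily values and cloud mask are hypothetical.

```python
from statistics import fmean


def sampling_bias(daily_aod, observed, max_valid=2.0):
    """Relative difference between the mean over usable days and the full mean.

    A day counts as usable when it has a retrieval (``observed`` is True) and
    its value does not exceed ``max_valid``, mimicking removal of AOD > 2.0.
    """
    full_mean = fmean(daily_aod)
    usable = [v for v, ok in zip(daily_aod, observed) if ok and v <= max_valid]
    if not usable:
        raise ValueError("no usable days")
    return (fmean(usable) - full_mean) / full_mean


# Hypothetical month: the two haziest days are cloud-covered and unobserved.
aod = [0.1, 0.2, 0.2, 0.5, 0.5]
observed = [True, True, True, False, False]
print(f"sampling bias: {sampling_bias(aod, observed):.0%}")
```

When the missed days are the polluted ones, the sampled mean is biased low, the same sign as the underestimation described above.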
Figure 8. Distribution of normalized mean biases in annual average PM2.5
for grid boxes in different regions of the US (top row) and China (bottom
row) determined from sensitivity tests to investigate the uncertainty in
η. Sensitivity tests are described (and abbreviations defined) in
Table 2.
Uncertainty associated with model η
In general, the model simulates PM2.5 well (Fig. 4) and represents
important processes; however, satellite AOD can help to constrain these estimates
to better represent measured concentrations (van Donkelaar et al., 2006).
However, in specific regions or periods of time, errors in η could lead
to discrepancies between satellite-derived and actual surface mass. Snider et
al. (2015) show some regional biases in the GEOS-Chem model η
compared to η determined from collocated surface measurements of AOD and
PM2.5. In order to assess the potential uncertainty in model-based
η, we perform multiple sensitivity tests to determine the impact that
different aerosol properties, grid-size resolution and timescales will have
on η and, ultimately, on the resulting satellite-based PM2.5
(listed in Table 2). These sensitivity tests are performed solely with model
output, which can provide a complete spatial and temporal record, and results
from the modified simulations are compared to the standard model simulation.
We note that these are “errors” only with respect to our baseline
simulation; we do not characterize how each sensitivity simulation may be
“better” or “worse” compared to true concentrations of surface
PM2.5, but rather how different they are from the baseline, thus
characterizing the uncertainty in derived PM2.5 resulting from the model
estimates of η. We make these comparisons for both the US and China and
show results in Fig. 8. Because mass concentrations in China are generally
much higher, the absolute value of potential errors can also be much greater.
The timescale of the estimated PM2.5 influences the error metric we
choose for this analysis. We use the NMB for estimating error associated with
annual PM2.5 exposure (the metric of interest for chronic exposure).
This allows for the possibility that day-to-day errors may compensate,
resulting in a more generally unbiased annual mean value. The error on any
given day of satellite-estimated PM2.5 is likely larger, and not
characterized by the NMB used here.
Our first sensitivity tests relate specifically to the methodology. Deriving
a satellite-based PM2.5 with this method requires both model output for every
day and valid satellite observations. Running a model can be labor intensive,
and some regions and time periods have poor satellite coverage. It might
therefore be beneficial to be able to use a climatological η or a
climatological satellite AOD. To test the
importance of daily variability in AOD, we compute daily η values and
then solve for daily surface PM2.5 values using a seasonally averaged
model simulated AOD (AvgAOD). This mimics the error introduced by using
seasonally averaged satellite observations, an attractive proposition to
overcome limitations in coverage. This approximation often produces the
greatest error (~20 % in the US and 0–50 % in China)
especially in regions where AOD varies more dramatically and specifically
where transported layers aloft can significantly increase AOD (Fig. 8). For
the seasonally averaged η test (AvgEta), we estimate daily PM2.5
values (which are averaged into the annual concentration) from the seasonally
averaged η and daily AOD values. As regional η relationships can be
more consistent over time than PM2.5 or AOD, this test evaluates the
necessity of using daily model output to define the η relationship. Using a
seasonally averaged η produces errors in the annual average of daily PM2.5
values that are very similar to those obtained using a seasonally averaged
AOD.
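The AvgAOD and AvgEta tests can be illustrated with a few hypothetical days in one season, assuming every day has a valid observation. In this idealized setting both shortcuts reduce to the product of the seasonal means, so both differ from the daily method by the covariance between η and AOD. This is an illustrative sketch, not the study's code, and the numbers are invented.

```python
from statistics import fmean


def pm_daily(eta, aod):
    """Standard method: seasonal mean of daily eta * daily AOD."""
    return fmean(e * a for e, a in zip(eta, aod))


def pm_avg_aod(eta, aod):
    """AvgAOD test: daily eta applied to the seasonal-mean AOD."""
    mean_aod = fmean(aod)
    return fmean(e * mean_aod for e in eta)


def pm_avg_eta(eta, aod):
    """AvgEta test: seasonal-mean eta applied to daily AOD."""
    mean_eta = fmean(eta)
    return fmean(mean_eta * a for a in aod)


# Hypothetical season: two ordinary days and one day with a lofted dust layer,
# which raises AOD but lowers eta (less of the column mass is at the surface).
eta = [40.0, 40.0, 10.0]   # ug m-3 per unit AOD
aod = [0.2, 0.2, 0.8]
print(pm_daily(eta, aod), pm_avg_aod(eta, aod), pm_avg_eta(eta, aod))
```

Here the lofted-layer day makes η and AOD anticorrelated, so both shortcuts overpredict the seasonal mean (about 12 versus 8 µg m-3), the same sign as the overprediction attributed to outliers in the text.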
The model η also inherently prescribes a vertical distribution of
aerosol, which may be inaccurately represented by the model and introduce
errors in the satellite-derived PM2.5. Previous studies have shown that
an accurate vertical distribution is essential for using AOD to predict
surface PM2.5 (e.g. Li et al., 2015; van Donkelaar et al., 2010). We
test the importance of the variability of the vertical distribution in the
η relationship for predicting surface PM2.5 concentrations by
comparing values from the standard simulation against using an η from a
seasonally averaged vertical distribution (AvgProf). For this comparison, we
allow the column mass loading to vary day-to-day, but we assume that the
profile shape does not change (i.e. we re-distribute the simulated mass to
the same seasonally averaged vertical profile). We note that this is not the
same as assuming a constant η, as relative humidity and aerosol
composition are allowed to vary. Additionally, this differs from other
studies (van Donkelaar et al., 2010; Ford and Heald, 2013) in that we are not
testing the representativeness of the seasonal average profile, but testing
the importance of representing the daily variability in the vertical profile.
From Fig. 8, we see that using a seasonally averaged vertical distribution
(AvgProf) can lead to large errors in surface concentrations. Information on
how the pollutants are distributed is extremely important because changes in
column AOD can be driven by changes in surface mass loading, but also by
layers of lofted aerosols that result from production aloft or transport (and
changes in the depth of the boundary layer). This is important in areas that
are occasionally impacted by transported elevated biomass burning plumes or
dust. Large errors often occur in China, especially during the spring when
these regions are influenced by transported dust from the Taklamakan and Gobi
Deserts (Wang et al., 2008). Southeastern China has the largest NMB owing to
transport not only from interior China but also from other countries in
Southeast Asia. There is a positive bias in most regions because, on average,
most of the aerosol mass is located at the surface; therefore, using an
average profile overpredicts the surface concentrations. Similar to the
average AOD and η (AvgAOD and AvgEta), average vertical distributions
generally overpredict PM2.5 due to the presence of outliers. This
stresses the importance not only of getting the mean profile correct but
also of simulating the variability in the profile on shorter timescales.
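The redistribution step in the AvgProf test can be sketched for a single column, assuming the seasonal-mean shape is the layer-by-layer mean mass normalized to a unit column (how the study forms its mean profile is not specified here). This mass-only sketch omits the recalculation of AOD and η, so it only shows how fixing the profile shape moves surface mass between days; the layer masses are hypothetical.

```python
from statistics import fmean


def mean_shape(profiles):
    """Seasonal-mean profile shape: mean mass per layer, normalized to sum to 1."""
    n_layers = len(profiles[0])
    mean_layers = [fmean(p[k] for p in profiles) for k in range(n_layers)]
    total = sum(mean_layers)
    return [m / total for m in mean_layers]


def surface_mass_fixed_shape(profiles):
    """Each day's column mass redistributed onto the seasonal-mean shape.

    Layer 0 is the surface layer; returns the redistributed surface mass per day.
    """
    shape = mean_shape(profiles)
    return [sum(p) * shape[0] for p in profiles]


# Hypothetical season of two days (layer masses, surface layer first).
profiles = [
    [6.0, 3.0, 1.0],   # typical day: mass concentrated near the surface
    [2.0, 3.0, 15.0],  # transported dust layer aloft
]
actual = [p[0] for p in profiles]
fixed = surface_mass_fixed_shape(profiles)
print(actual, fixed)
```

The typical day's surface mass falls from 6 to about 2.7 and the dust day's rises from 2 to about 5.3, so the daily errors are large. With this particular normalization the seasonal-mean surface mass is unchanged (4 in both cases); the recalculated AOD and η, omitted here, are what the full test adds.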
We also test the sensitivity of derived PM2.5 to aerosol water uptake.
This is done by recalculating η using a seasonally averaged relative
humidity (RH) profile (AvgRH). This generally reduces the seasonally averaged
AOD (less water uptake) in every season (because hygroscopic growth of
aerosols is non-linear with RH). This leads to an overestimate of η
when applied to the AOD values from the standard simulation and generally
overestimates surface PM2.5 in regions with potentially higher RH and
more hygroscopic aerosols (eastern US and eastern China). This is because,
for the same AOD, a higher η value would suggest more mass at the
surface in order to compensate for optically smaller particles aloft. Western
China (and some of central China) has a negative bias, suggesting that using
a mean relative humidity actually underestimates PM2.5. However, this is
because the RH is generally low but can have large variability, and
concentrations (outside of the desert regions) are also low so that the NMB
may be large although the absolute error is not.
A higher-resolution model, although more computationally expensive, will
likely better represent small-scale variability and is better suited to
estimating surface air quality. Punger and West (2013) find that
coarse-resolution models often drastically underestimate exposure in urban
areas. We therefore investigate the grid-size
dependence of our simulated η. For this, we determine the η values
from a simulation run at 2° × 2.5° grid
resolution (with the same emission inputs and time period), regrid these
values to the nested grid resolution (0.5° × 0.666°)
and solve for the derived PM2.5 concentrations using the AOD values from
the nested simulation (noted as 2 × 2.5 in Fig. 8). From Fig. 8, we
see larger discrepancies in regions which are dominated by more spatially
variable emissions (Northeastern US and China) rather than areas with broad
regional sources (Southeastern US). This is in line with Punger and
West (2013), who show smaller differences due to resolution in estimated
premature mortality due to PM2.5 exposure in rural areas than in urban
areas. Compared to the other sensitivity tests, using the coarser grid leads
to mean errors of only 10–15 % in the US and in China, which suggests
that spatially averaged η are potentially more useful than temporally
averaged η for constraining surface PM2.5. Thompson and
Selin (2012) and Thompson et al. (2014) show that coarse grids can
overpredict pollutant concentrations and consequently health impacts, but that
very fine grids do not significantly decrease the error in simulated
concentrations compared to observations. This effect is more pronounced for
ozone. Additionally, their coarsest grid resolution is 36 km, which they
compare to results at 2, 4, and 12 km. Punger and West (2013) compare health
impacts at a variety of resolutions up to 400 km and show that
coarser resolutions underestimate health impacts because concentrations are
diluted over larger areas instead of allowing high concentrations to be
co-located with large urban populations.
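The 2 × 2.5 test can be sketched on a one-dimensional row of grid cells. As a simplifying assumption, the coarse-run η is approximated here by block-averaging the fine-grid η, whereas the study takes η from a separate coarse simulation; regridding back to the fine grid is nearest-neighbour, and all values are hypothetical.

```python
from statistics import fmean


def coarsen(values, factor):
    """Block-average a 1-D field onto a grid that is `factor` times coarser."""
    if len(values) % factor:
        raise ValueError("field length must be a multiple of factor")
    return [fmean(values[i:i + factor]) for i in range(0, len(values), factor)]


def refine(values, factor):
    """Nearest-neighbour regrid: copy each coarse value to the fine cells it covers."""
    return [v for v in values for _ in range(factor)]


def pm_with_coarse_eta(eta_fine, aod_fine, factor):
    """Coarse eta, regridded to the fine grid, applied to fine-grid AOD."""
    eta_regridded = refine(coarsen(eta_fine, factor), factor)
    return [e * a for e, a in zip(eta_regridded, aod_fine)]


# Hypothetical pair of fine cells inside one coarse cell: an urban cell with
# mass concentrated near the surface (high eta) next to a rural cell.
eta_fine = [60.0, 30.0]
aod_fine = [0.5, 0.2]
fine_pm = [e * a for e, a in zip(eta_fine, aod_fine)]
print(fine_pm, pm_with_coarse_eta(eta_fine, aod_fine, 2))
```

The urban cell drops from 30 to 22.5 µg m-3 while the rural cell rises from 6 to 9, the kind of dilution of urban peaks that Punger and West (2013) link to underestimated exposure where populations are concentrated.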
The GEOS-Chem simulation of surface nitrate aerosol over the US is biased
high (Heald et al., 2012). This can be an issue in regions where nitrate has
a drastically different vertical profile (or η) from other species. To
test how this nitrate bias could impact η and the derived PM2.5, we
compute η without nitrate aerosol, and then derive PM2.5 using the
standard AOD (No NO3). This is not a large source of potential error
(< 15 %), with only slightly larger errors in winter and in regions
where nitrate has a significant high bias (central US). Furthermore, these
errors are less than the bias between the model and surface observations of
nitrate in the US (1–2 µg m-3 compared to
2–7 µg m-3), suggesting that even though there is a known
bias in the model, using satellite observations may largely correct for this
by constraining the total AOD when estimating satellite-derived PM2.5.
We also did this comparison for China. Measured nitrate concentrations are
not widely available for evaluation, but Wang et al. (2014) suggest that
model nitrate is also too high in eastern China. The NMB is even smaller in
China (< 10 %), with negative values in eastern China
(where nitrate concentrations are high) and positive values in western and
central China (where nitrate concentrations are lower and have less bias
compared to observations).
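The No NO3 test can be sketched for one grid box: nitrate is removed from both the surface mass and the AOD before forming η, and that η is then applied to the standard AOD. The function name and numbers are hypothetical.

```python
def eta_without_species(pm_total, pm_species, aod_total, aod_species):
    """Eta recomputed with one species removed from surface mass and from AOD."""
    aod_rest = aod_total - aod_species
    if aod_rest <= 0:
        raise ValueError("remaining AOD must be positive")
    return (pm_total - pm_species) / aod_rest


# Hypothetical grid box: 20 ug m-3 of surface PM2.5, 6 of it nitrate;
# AOD of 0.40, of which nitrate contributes 0.10.
pm_total, pm_no3 = 20.0, 6.0
aod_total, aod_no3 = 0.40, 0.10

derived = eta_without_species(pm_total, pm_no3, aod_total, aod_no3) * aod_total
relative_error = (derived - pm_total) / pm_total
print(f"derived {derived:.1f} ug m-3, error {relative_error:.1%}")
```

Because the total AOD is kept, the error depends only on how nitrate's share of surface mass compares with its share of AOD: here nitrate is 30 % of surface mass but 25 % of AOD, so the derived PM2.5 is about 7 % low.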
To further explore the role of aerosol composition (and possible
mischaracterization in the model), we take the simulated mass concentrations
and compute the AOD assuming that the entire aerosol mass is sulfate
(SO4 in Fig. 8) or, alternatively, hydrophobic black carbon (BC in
Fig. 8). Black carbon has a high mass extinction efficiency, which is
constant with RH given its hydrophobic nature, whereas sulfate is very
hygroscopic, resulting in much higher extinction efficiencies at higher
relative humidity values. Overall, assuming that all the mass is sulfate
leads to low biases on the order of 15–20 % as the AOD in many regions
in the US is dominated by inorganics. Errors are largest in regions and
seasons with larger contributions of less hygroscopic aerosols (organic
carbon and dust) and/or high relative humidity. Assuming the entire aerosol
mass is black carbon can lead to greater errors than sulfate because BC has a
larger mass extinction at lower relative humidity values and hydrophobic
black carbon generally makes up a small fraction of the mass loading in all
regions in the US and China. When RH is low, this assumption increases the
AOD, which leads to an underprediction in the derived PM2.5. When RH is
high, this decreases the AOD and leads to an overprediction in the derived
PM2.5. The largest percentage changes occur in the southwestern US and
western China (~ -30 %) due to the low relative humidity, low
mass concentrations, and large contribution of dust.
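The sign logic of the SO4 and BC tests can be sketched for one grid box. Since surface mass is unchanged, the derived PM2.5 equals the true surface value times the ratio of the standard AOD to the AOD recomputed for the assumed species. The mass extinction efficiency below is a placeholder, not a value from the study.

```python
def derived_pm_assumed_species(pm_surface, aod_standard, column_mass, mee_assumed):
    """Derived PM2.5 when all column mass is treated as a single species.

    column_mass: column aerosol mass (g m-2); mee_assumed: mass extinction
    efficiency (m2 g-1) of the assumed species at the ambient RH.
    """
    aod_test = column_mass * mee_assumed        # AOD if all mass were that species
    eta_test = pm_surface / aod_test            # eta implied by that assumption
    return eta_test * aod_standard              # applied to the standard AOD


MEE_BC = 8.0  # hypothetical placeholder, RH-independent for hydrophobic BC

# Dry case: the real mixture extinguishes less per gram than BC would.
dry = derived_pm_assumed_species(10.0, 0.2, 0.05, MEE_BC)
# Humid case: hygroscopic growth makes the real mixture extinguish more than BC.
humid = derived_pm_assumed_species(10.0, 0.6, 0.05, MEE_BC)
print(dry, humid)
```

In the dry case the BC assumption raises the AOD and the derived PM2.5 falls from 10 to 5; in the humid case it lowers the AOD and the derived value rises to 15, matching the behaviour described above.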
Figure 9. Premature mortality estimates for (a) the US and
(b) China determined using different RR, CRFs, and threshold/ceiling
values, as described in Table 3. Colors represent cause of death estimated
using PM2.5 concentrations from unconstrained model simulations (solid)
and satellite-based estimates (hatched).
We also compare these sensitivity tests on daily timescales. We do not show
the results here because we rely on chronic exposure (annual average
concentrations) for calculating mortality burdens. The normalized mean biases
in annual average concentrations (Fig. 8) are generally much smaller (range of
±20 % in the US and ±50 % in China) than potential random errors
in daily values as many of these daily errors cancel out in longer term
means. This is the case for our sensitivity tests regarding the vertical
profile and relative humidity, which have much larger errors on shorter
timescales. However, because our method to test the sensitivity to aerosol
type assumes that all aerosol mass is black carbon or sulfate, we introduce a
systematic bias that is not significantly reduced in the annual NMB. This
highlights the differing potential impacts due to systematic and random
errors, which is an important distinction for determining the usefulness of
this method. Systematic errors may not be as obvious on short timescales
compared to random errors (related to meteorology and/or representation of
plumes) that can lead to large biases in daily concentrations. However, these
random errors have less impact when we examine annual average concentrations
and mortality burdens. Systematic errors, potentially related to sources or
processes, may be harder to counteract even on longer timescales and even
when the model is constrained by satellite observations. However, we also
show that random daily errors can bias the long-term mean, stressing the
importance not only of correcting regional biases but also of accurately
simulating daily variability.
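The contrast between random and systematic errors can be illustrated by computing the NMB alongside a normalized mean (absolute) error, NME, on the same daily values. NME is introduced here only for contrast and is not a metric used in this study; the daily series are hypothetical.

```python
def nmb(pred, base):
    """Normalized mean bias: signed errors, so opposite-signed days cancel."""
    return sum(p - b for p, b in zip(pred, base)) / sum(base)


def nme(pred, base):
    """Normalized mean error: absolute errors, so daily scatter accumulates."""
    return sum(abs(p - b) for p, b in zip(pred, base)) / sum(base)


base = [10.0, 10.0, 10.0, 10.0]          # hypothetical baseline daily PM2.5
random_like = [13.0, 7.0, 13.0, 7.0]     # +/-30 % daily errors alternating in sign
systematic = [8.0, 8.0, 8.0, 8.0]        # uniform -20 % bias

for name, pred in [("random-like", random_like), ("systematic", systematic)]:
    print(f"{name}: NMB {nmb(pred, base):+.0%}, NME {nme(pred, base):.0%}")
```

Alternating errors vanish from the NMB but dominate the NME, while the uniform bias survives averaging unchanged.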
We translate this potential uncertainty in η to potential uncertainty in
mortality estimates determined from the satellite-based PM2.5. We use
the normalized mean bias in annual PM2.5 determined from the sensitivity
tests for RH, the vertical profile, grid resolution, and aerosol composition
for each grid box and then use these values to “bias correct” our
satellite-based annual PM2.5 concentrations and re-calculate exposure
(shown in Fig. 5) and mortality (discussed in Sect. 6). From Fig. 5, we see
that the uncertainty in η, when translated to an annual exposure level,
is larger than the differences in exposure levels estimated from model and
satellite-based PM2.5, suggesting that satellite-based products that
rely strongly on the model, or that do not account for the variability in the
aforementioned variables, do not necessarily provide a definitively better
estimate of exposure. Second, in many regions these uncertainties are
greater than both the difference between model and surface PM2.5 and the
difference between satellite-based and surface PM2.5. While these comparisons are
limited spatially and temporally, this highlights that constraining the model
with the satellite observations can improve estimates of PM2.5 but there
remains a large amount of uncertainty in these estimates.
Selection of concentration response function and relative risk
The choice of the shape of the concentration response function (CRF) and
relative risk ratio value explains much of the difference in burden estimated
in different studies listed in Table 1. In general, it is difficult to
determine risks at the population level and studies have found that using
ambient concentrations tends to under predict health effects (e.g. Hubbell et
al., 2009). However, personal monitoring is costly and time-intensive, and
therefore, epidemiology studies generally rely on determining
population-level concentration response functions rather than personal-level
exposure responses. Moreover, populations respond differently, and
therefore the shape and magnitude of this response vary among studies. The
uncertainty associated with the RR determined in the original epidemiology
study will impact results in any health impact assessment.
Table 3. Input for premature mortality burden estimate sensitivity tests and
the resulting percent change in mortality due to chronic exposure, determined
from satellite-based concentrations. Values in parentheses are determined
from model-simulated concentrations.
RR source | Threshold | CRF shape | % change USA | % change China | Study using method | Label in Fig. 9
Burnett et al. (2014) | fitted | IER (Eq. 7) | base | base | Lim et al. (2012); Lelieveld et al. (2015) | B-IER
Burnett et al. (2014) | fitted | IER (Eq. 7) | 167 (167) | 65 (64) | maximum value determined from set of coefficients | B-IERmax
Krewski et al. (2009) | Lowest measured level (5.8 µg m-3) | Eq. (4) | 18 (15) | 18 (21) | Evans et al. (2013); Lelieveld et al. (2013) | K-L5.8
Krewski et al. (2009) | Lowest measured level (5.8 µg m-3), ceiling (30 µg m-3) | Eq. (4) | 18 (15) | -24 (-26) | Anenberg et al. (2010) | K-Lc30
Krewski et al. (2009) | Lowest measured level (5.8 µg m-3), ceiling (50 µg m-3) | Eq. (4) | 90 (93) | 6 (4) | Cohen et al. (2004) | K-Lc50
Krewski et al. (2009) | Lowest measured level (5.8 µg m-3) | Eq. (5) | 143 (167) | -6 (-7) | Evans et al. (2013) | K-LL5.8
Krewski et al. (2009) | Policy Relevant Background | Eq. (4) | 134 (158) | | US EPA (2010) | K-LPR
Krewski et al. (2009) | No threshold | Eq. (4) | 169 (200) | 29 (31) | Silva et al. (2013) | K-L0
Pope et al. (2002) | Lowest measured level (5.8 µg m-3), ceiling (30 µg m-3) | Power law (Eq. 6) | 134 (158) | -26 (-28) | Marlier et al. (2013) | P-PL5.8c30
Pope et al. (2002) | Lowest measured level (7.5 µg m-3) | Power law (Eq. 6) | 102 (105) | -15 (-15) | Pope et al. (2002) | P-PL7.5
Laden et al. (2006) | Lowest measured level (10 µg m-3) | Eq. (4) | -58 (-68) | 126 (130) | Anenberg et al. (2010); US EPA (2010) | L-L10
Laden et al. (2006) | Lowest measured level (10 µg m-3) | Eq. (4) | 239 (275) | | US EPA (2010) | L-LPR
Laden et al. (2006) | Lowest measured level (10 µg m-3); ceiling (30 µg m-3) | Eq. (4) | | 38 (33) | Anenberg et al. (2010) | L-Lc30
Pope et al. (2002) | Lowest measured level (7.5 µg m-3) | Eq. (4) | -55 (-58) | -25 (-27) | | P-L7.5
Pope et al. (2002) | Lowest measured level (7.5 µg m-3) | Eq. (5) | -29 (-28) | 1 (1) | | P-LL7.5
For an initial metric of the uncertainty in the risk ratios, studies often
include estimates generated using the 95 % confidence intervals of the RR
determined in the original study (as shown in Fig. 2). A confidence interval
shows the statistical range within which the true PM coefficient for the
study population is likely to lie, which could be a single city, region, or
population group. The Krewski et al. (2009) study, which is a reanalysis of
the American Cancer Society (ACS) Cancer Prevention Study II (CPS-II),
included 1.2 million people in the Los Angeles and New York City regions,
whereas the Laden et al. (2006) study, an extended analysis of the Harvard
Six Cities Study, included 8096 white participants. Using just these
confidence intervals as a measure of uncertainty suggests that there exists a
large range of uncertainty in population-level health responses to exposure
and caution should be exercised when attempting to transfer these values
beyond the population from which they were determined in order to estimate
national-level mortality burdens based on ambient concentrations. The IER
coefficients from Burnett et al. (2014) are generated using the risk ratios,
threshold values, and confidence intervals from previous studies and
therefore also provide a large range in premature mortality estimates. To
depict this range, we also include the 5th and 95th percentile estimates in
addition to the mean estimate. We also show the maximum value in our
sensitivity tests.
Figure 10. Burden of mortality due to outdoor exposure to fine particulate
matter as determined in previous studies (Table 1, gray bars with values from
individual studies designated by black lines), calculated using model
(GEOS-Chem, solid) and satellite-based (hatched) annual concentrations
(colored by disease; whiskers denote 5th and 95th percentile estimates
generated using the Burnett et al., 2014, coefficients). The uncertainty
ranges on the MODIS-based estimates due to satellite AOD (taupe), model η
(coral), and CRF (blue) are shown on the right.
To test the impact of methodological choices associated with the burden
calculation, we compare results using different concentration response
functions and relative risk ratios that previous studies have used. Table 3
lists the different choices that we explore regarding the CRF and relative
risk, the study that used these values, and the resulting percent change in
burden compared to our initial estimates using the IER from Burnett et
al. (2014). In particular we compare our results using risk ratio values from
Krewski et al. (2009), Pope et al. (2002) and Laden et al. (2006), and
log-linear and power law relationships. Figure 9 shows that the largest
difference in burden is associated with using the higher risk ratios from
Laden et al. (2006) rather than those from Krewski et al. (2009) or the mean
estimates determined using the IER coefficients from Burnett et al. (2014);
the Laden et al. (2006) risk ratios suggest a much greater mortality response
to PM2.5 exposure.
Our estimates in Sect. 3 also use the same relative risk values for every
location. However, studies have found that different populations respond
differently to exposure (the potential for “effect modification”; Dominici et
al., 2003). One of the main uncertainties in these methods is the reliance on risk
ratios that are primarily determined from epidemiology studies conducted in
the US, which may not represent the actual risks for populations
in China. Long-term epidemiology studies examining exposure to PM2.5
across broad regions of China are scarce, but studies using acute exposure to
PM2.5 or chronic exposure to PM10 or total suspended particles have
suggested lower exposure-response coefficients than determined by studies
conducted in the US and Europe (Aunan and Pan, 2004; Chen et al., 2013b;
Shang et al., 2013), indicating that assessments which use CRFs from studies
conducted in the US might overestimate the health effects in China.
We also explore using different “threshold” values. The IER function uses
threshold values between 5.8 and 8.8 µg m-3. In the US,
higher threshold values can significantly reduce burden estimates. When we
compare sensitivity tests that use the same CRF (Krewski et al., 2009) but
with a regional PRB concentration instead of the lowest measured level
(5.8 µg m-3), the premature mortality estimates are
significantly reduced, suggesting that the choice of this value is very
important in the US where annual mean concentrations are relatively low.
However, in China these threshold values have less impact on our results
because annual mean concentrations are high enough that subtracting a
threshold makes little difference. Conversely, using a ceiling value of
30 or 50 µg m-3 produces no difference
in the US (0 % of the population experiences annual concentration values
greater than 30 µg m-3), while strongly reducing burden
estimates in China.
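A minimal sketch of how a threshold and a ceiling enter a burden calculation, assuming a generic log-linear health impact function, deaths = y0 × population × (1 − exp(−β ΔC)), with ΔC the concentration above the threshold after capping at the ceiling. This form, the baseline rate, the population, and β (set from an illustrative relative risk of 1.06 per 10 µg m-3) are assumptions for illustration and are not necessarily the document's Eq. (4) or its inputs.

```python
import math

BETA = math.log(1.06) / 10.0   # illustrative: RR of 1.06 per 10 ug m-3


def attributable_deaths(conc, population, baseline_rate, beta=BETA,
                        threshold=0.0, ceiling=None):
    """Deaths attributable to an annual-mean concentration (ug m-3)."""
    capped = conc if ceiling is None else min(conc, ceiling)
    excess = max(capped - threshold, 0.0)
    return baseline_rate * population * (1.0 - math.exp(-beta * excess))


POP, Y0 = 1_000_000, 0.008   # hypothetical population and baseline rate

# A low-concentration (US-like) and a high-concentration (China-like) grid box.
for conc in (9.0, 80.0):
    base = attributable_deaths(conc, POP, Y0, threshold=5.8)
    higher_threshold = attributable_deaths(conc, POP, Y0, threshold=7.5)
    with_ceiling = attributable_deaths(conc, POP, Y0, threshold=5.8, ceiling=30.0)
    print(conc, round(base), round(higher_threshold), round(with_ceiling))
```

At 9 µg m-3, raising the threshold from 5.8 to 7.5 µg m-3 removes roughly half of the burden while the ceiling changes nothing; at 80 µg m-3 the threshold barely matters but the ceiling removes well over half, as described above.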
We also see that the shape of the CRF produces different results between the
US and China. Using a power law (Eq. 6) or log-linear (Eq. 5) function increases
relative risks at low concentrations and decreases risk ratios at high
concentrations such that total disease burden estimates increase in the US
and decrease in China. In the US, a log-linear CRF is almost equivalent to a
linear response because of the low concentrations. In general, the shape of
the concentration response function is more important at low or very high
concentrations.
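The contrast in shapes can be illustrated by calibrating two generic forms to the same relative risk 10 µg m-3 above a 5.8 µg m-3 threshold: an exponential form, RR = exp(β(C − C0)), and a log-log form, RR = ((C + 1)/(C0 + 1))^β. Both forms and the calibration point (RR = 1.06) are assumptions for illustration; they stand in for, rather than reproduce, the document's Eqs. (4) to (6).

```python
import math

C0 = 5.8          # threshold (ug m-3)
RR_REF = 1.06     # illustrative relative risk at C0 + 10 ug m-3
C_REF = C0 + 10.0

BETA_EXP = math.log(RR_REF) / (C_REF - C0)
BETA_LOG = math.log(RR_REF) / math.log((C_REF + 1.0) / (C0 + 1.0))


def rr_exponential(c):
    """Exponential (log-linear in RR) form; RR = 1 at or below the threshold."""
    return math.exp(BETA_EXP * max(c - C0, 0.0))


def rr_log_log(c):
    """Log-log form; RR = 1 at or below the threshold."""
    return ((max(c, C0) + 1.0) / (C0 + 1.0)) ** BETA_LOG


def attributable_fraction(rr):
    """Fraction of deaths attributable to exposure: (RR - 1) / RR."""
    return (rr - 1.0) / rr


for c in (10.0, C_REF, 80.0):
    print(c, round(rr_exponential(c), 3), round(rr_log_log(c), 3))
```

Both curves agree at the calibration point; below it the log-log form gives the higher risk and far above it the lower risk, so switching to it raises burdens where concentrations are low (as in the US) and lowers them where they are high (as in China).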
Comparison of uncertainty
Figure 10 provides a summary of the different sources of uncertainty
discussed here. We show the mortality burdens for
respiratory disease, lung cancer and heart disease associated with chronic
exposure to ambient PM2.5 and calculated using annual average
model-based and “satellite-based” values (from MISR and MODIS) for both the
US and China. The satellite-based estimates suggest slightly higher
national burdens in the US and slightly lower national burdens in China.
However, our values using these different annual average concentrations fall
within the range of values found in the literature (Table 1).
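A national burden of the kind shown in Fig. 10 is, in essence, a sum of cell-level attributable deaths over a population grid. The minimal sketch below computes that sum for two hypothetical concentration fields standing in for model-based and satellite-based annual means. The grids, populations, threshold, β, and the single baseline rate are invented for illustration; real assessments use cause- and region-specific baseline rates.

```python
import math

BETA = math.log(1.06) / 10.0  # assumed: hypothetical RR of 1.06 per 10 ug m-3
THRESHOLD = 5.8               # assumed threshold (ug m-3)
Y0 = 0.008                    # hypothetical baseline mortality rate (per person per year)

# Hypothetical 2 x 3 grids flattened to lists: persons and annual mean PM2.5 (ug m-3).
population = [2.0e6, 5.0e5, 1.0e5, 8.0e5, 3.0e6, 2.0e5]
model_pm25 = [12.0, 9.0, 6.0, 10.0, 14.0, 7.0]
satellite_pm25 = [13.0, 9.5, 6.5, 10.5, 15.0, 7.0]


def national_burden(conc_grid, pop_grid, beta=BETA, threshold=THRESHOLD, y0=Y0):
    """Sum of y0 * Pop * (RR - 1) / RR over cells, with RR = exp(beta * max(C - threshold, 0))."""
    if len(conc_grid) != len(pop_grid):
        raise ValueError("concentration and population grids must have the same size")
    total = 0.0
    for conc, pop in zip(conc_grid, pop_grid):
        rr = math.exp(beta * max(conc - threshold, 0.0))
        total += y0 * pop * (rr - 1.0) / rr
    return total


model_total = national_burden(model_pm25, population)
sat_total = national_burden(satellite_pm25, population)
print(f"model-based: {model_total:.0f}  satellite-based: {sat_total:.0f}  "
      f"difference: {100 * (sat_total - model_total) / model_total:+.1f} %")
```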
We further compare these estimates with the range of uncertainty associated
with our observations and methodology. The difference between the burdens
calculated using the model alone and using the satellite-based approach is
greater than the uncertainty range in the satellite AOD, suggesting that this
difference cannot be explained by measurement limitations and errors alone.
However, the potential uncertainty in the satellite-based estimate due to the
conversion from AOD to surface PM2.5 (represented by the model η) is
substantially larger, exceeding even the difference between model-derived and
satellite-derived estimates. Therefore, while constraining the model estimate
of PM2.5 with observations should improve our health effect estimates, the
uncertainty in the required model information may limit the accuracy of this
approach. Again, we stress that these are “potential” model uncertainties that
may overestimate the true uncertainty in regions where the model accurately
represents the composition and distribution of aerosols. We also acknowledge
that we have investigated a limited set of factors; additional biases may
exacerbate these uncertainties. However, incorporating additional
observational data and model estimates can also help to better constrain these
satellite-based PM2.5 estimates (Brauer et al., 2012, 2016; van Donkelaar et
al., 2015a, b).
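Why the η term can dominate follows from the form of the conversion. If satellite-based PM2.5 is computed as PM2.5 = η × AOD and the two relative errors are assumed independent, then to first order they combine in quadrature. The sketch below uses hypothetical AOD, η, and uncertainty values (not those evaluated in this study) to show that when the η uncertainty is much larger than the AOD uncertainty, the combined error is close to the η uncertainty alone.

```python
import math


def satellite_pm25(aod, eta):
    """Surface PM2.5 (ug m-3) from AOD and a model-derived conversion factor eta."""
    return eta * aod


def combined_relative_error(rel_err_aod, rel_err_eta):
    """First-order relative error of a product of independent factors (quadrature sum)."""
    return math.hypot(rel_err_aod, rel_err_eta)


# Hypothetical values for a single grid cell.
aod, eta = 0.15, 60.0
rel_err_aod = 0.15  # assumed 15 % retrieval uncertainty in AOD
rel_err_eta = 0.50  # assumed 50 % uncertainty in the model conversion factor

pm = satellite_pm25(aod, eta)
total = combined_relative_error(rel_err_aod, rel_err_eta)
print(f"PM2.5 = {pm:.1f} ug m-3, combined relative error = {100 * total:.0f} %")
print(f"AOD alone: {100 * rel_err_aod:.0f} %, eta alone: {100 * rel_err_eta:.0f} %")
```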
Figure 10 also shows the range of mortality estimates for the US and China
that results from different choices of risk ratio or shape of the
concentration response. While epidemiological studies attempt to account
statistically for differences in populations and for confounding variables,
there is still a large spread in the reported risk ratios. Applying response
functions is as important as, or perhaps more important than, determining
ambient concentrations in quantifying the mortality burden due to outdoor air
pollution, and differences in exposure estimates can be overshadowed by
differences in these approaches. As a further example, we calculated the
mortality burden using only populated places, similar to Lelieveld et al.
(2013) and Cohen et al. (2004), and found that this decreased the burden by
13 % in the US (satellite-based; 18 % for the model) and by 72 % in China.
Differences between our estimates and those found in the literature can be
partly attributed to differences in how the CRF is applied, along with
differences in baseline mortalities and population estimates. Disease burdens
estimated in various studies can therefore only be truly compared when the
methodology is harmonized.
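The sensitivity to the risk ratio can be illustrated by converting a relative risk per 10 µg m-3 into a log-linear coefficient, β = ln(RR)/10, and recomputing the burden for one hypothetical cell while holding exposure fixed. The RR values, concentration, threshold, population, and baseline rate below are illustrative placeholders, not the specific values from the studies compared in Fig. 10.

```python
import math


def beta_from_rr(rr_per_10):
    """Log-linear coefficient (per ug m-3) from a relative risk per 10 ug m-3."""
    return math.log(rr_per_10) / 10.0


def attributable_fraction(conc, beta, threshold=5.8):
    """AF = (RR - 1) / RR with RR = exp(beta * max(C - threshold, 0)); threshold is assumed."""
    rr = math.exp(beta * max(conc - threshold, 0.0))
    return (rr - 1.0) / rr


# Hypothetical cell and illustrative risk ratios per 10 ug m-3.
conc, population, y0 = 12.0, 1_000_000, 0.008
deaths = {
    rr: y0 * population * attributable_fraction(conc, beta_from_rr(rr))
    for rr in (1.04, 1.06, 1.09, 1.14)
}
for rr, d in deaths.items():
    print(f"RR per 10 ug m-3 = {rr:.2f}: {d:.0f} attributable deaths")
```

With exposure held fixed, this illustrative spread in RR changes the burden by more than a factor of three, which is the sense in which CRF choices can overshadow differences in exposure estimates.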
Conclusions
Calculating health burdens is important for informing air pollution policy,
but literature estimates span a large range due to methodological differences
in both the estimation of ambient concentrations and the health impact
assessment. Satellite observations have proved useful in estimating exposure
and the resulting health impacts (van Donkelaar et al., 2015b; Yao et al.,
2013). However, large uncertainties remain in these satellite measurements and
in the methods for translating them into surface air quality, and these need
further investigation. In this work, we explore how mortality burden estimates
are made and how choices within this methodology can explain some of these
discrepancies. We also aim to provide context for interpreting estimates of
the health burden from chronic PM2.5 exposure.
While we have discussed several potential sources of uncertainty in
calculating health burdens with satellite-based PM2.5, many other sources of
uncertainty remain that we did not explore. Other processes in the model,
such as emissions and removal, could affect the relationship between AOD and
surface PM2.5. Additionally, our sensitivity test results likely depend in
part on the spatial resolution of the model and the satellite AOD, and on
their ability to capture fine-scale spatial variations in pollution in densely
populated regions. However, Thompson et al. (2014) suggest that uncertainty in
the CRF will likely still have a larger impact than uncertainties in
population-weighted concentrations due to model resolution.
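The resolution effect can be illustrated with the population-weighted mean concentration, Σ(Pop × C) / ΣPop. In the hypothetical example below, each polluted, densely populated cell is averaged with cleaner, sparsely populated neighbours when the grid is coarsened, which lowers the population-weighted concentration. The one-dimensional grid, the equal-area cells, the values, and the aggregation into blocks of four fine cells are all assumptions for illustration only.

```python
def population_weighted_mean(conc, pop):
    """Population-weighted mean concentration: sum(Pop * C) / sum(Pop)."""
    if len(conc) != len(pop):
        raise ValueError("concentration and population grids must match")
    return sum(c * p for c, p in zip(conc, pop)) / sum(pop)


def coarsen(conc, pop, block):
    """Aggregate consecutive blocks of `block` fine cells into one coarse cell.

    Concentration is area-averaged (equal-area cells assumed); population is summed.
    """
    coarse_conc, coarse_pop = [], []
    for i in range(0, len(conc), block):
        c_block, p_block = conc[i:i + block], pop[i:i + block]
        coarse_conc.append(sum(c_block) / len(c_block))
        coarse_pop.append(sum(p_block))
    return coarse_conc, coarse_pop


# Hypothetical fine grid: one polluted urban cell and three cleaner rural cells per block.
fine_conc = [40.0, 10.0, 10.0, 10.0, 20.0, 8.0, 8.0, 8.0]
fine_pop = [100.0, 10.0, 10.0, 10.0, 50.0, 5.0, 5.0, 5.0]

coarse_conc, coarse_pop = coarsen(fine_conc, fine_pop, block=4)
print(f"fine grid: {population_weighted_mean(fine_conc, fine_pop):.1f} ug m-3")
print(f"coarse grid: {population_weighted_mean(coarse_conc, coarse_pop):.1f} ug m-3")
```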
Satellite measurements have greatly advanced global air quality monitoring,
providing information in regions that previously had few measurements.
However, further progress is still needed in characterizing exposure to
ambient PM2.5 using these satellite observations, especially as they become
more widely used in epidemiological studies and health impact assessments.
Reducing uncertainty, even at the lower concentrations observed in the US, is
important if these methods and data sets are to be used for policy assessment
or air quality standards. Nevertheless, because air pollution is a leading
environmentally related cause of premature mortality, the difficulties in
applying these data should not detract from the importance of this endeavor.
Overcoming sampling limitations in satellite observations and better
accounting for regional biases could help to reduce the uncertainty in
satellite-retrieved AOD, and incorporating additional observational data and
model estimates can help to better constrain satellite-based PM2.5 estimates
(Brauer et al., 2012, 2016; van Donkelaar et al., 2015a, b). Future
geostationary satellites will also be critical for advancing this methodology
and will provide valuable information for daily monitoring and tracking of
air quality. Furthermore, these geostationary observations, in concert with
expanded surface monitoring, will offer new constraints for epidemiological
studies to develop health risk assessments and lessen the uncertainty in
applying concentration-response functions and determining health burdens.