10 yr spatial and temporal trends of PM 2.5 concentrations in the southeastern US estimated using high-resolution satellite data

Long-term PM 2.5 exposure has been reported to be associated with various adverse health outcomes. However, most ground monitors are located in urban areas, lead-ing to a potentially biased representation of the true regional PM 2.5 levels. To facilitate epidemiological studies, accurate estimates of spatiotemporally continuous distribution 5 of PM 2.5 concentrations are essential. Satellite-retrieved aerosol optical depth (AOD) has been widely used for PM 2.5 concentration estimation due to its comprehensive spatial coverage. Nevertheless, an inherent disadvantage of current AOD products is their coarse spatial resolutions. For instance, the spatial resolutions of the Moderate Resolution Imaging Spectroradiometer (MODIS) and the Multiangle Imaging Spectro- 10 Radiometer (MISR) are 10 km and 17.6 km, respectively. In this paper, a new AOD product with 1 km spatial resolution retrieved by the multi-angle implementation of atmospheric correction (MAIAC) algorithm was used. A two-stage model was developed to account for both spatial and temporal variability in the PM 2.5 -AOD relationship by incorporating the MAIAC AOD, meteorological ﬁelds, and land use variables as predic- 15 tors. Our study area is in the southeastern US, centered at the Atlanta Metro area, and data from 2001 to 2010 were collected from various sources. The model was ﬁtted for each year individually, and we obtained model ﬁtting R 2 ranging from 0.71 to 0.85, MPE from 1.73 to 2.50 µgm − 3 , and RMSPE from 2.75 to 4.10 µgm − 3 . In addition, we found cross validation R 2 ranging from 0.62 to 0.78, MPE from 2.00 to 3.01 µgm − 3 , and RM- 20 SPE from 3.12 to 5.00 µgm − 3 , indicating a good agreement between the estimated and observed values. Spatial trends show that high PM 2.5 levels occurred in urban areas and along major highways, while low concentrations appeared in rural or mountainous areas. A time series analysis was conducted to examine temporal trends of PM 2.5 concentrations in the study area from 2001 to 2010. The results showed that the PM 2.5 25 levels in the study area followed a generally declining trend from 2001 to 2010 and decreased about 20 % during the period. However, there was an exception of an increase in year 2005, which is attributed to elevated sulfate concentrations in the study area in


Introduction
Long-term exposure to PM 2.5 (particle size less than 2.5 µm in the aerodynamic diameter) is associated with various adverse health outcomes including respiratory and 5 cardiovascular diseases (Crouse et al., 2012;Peng et al., 2009). Due to the spatiotemporally continuous nature of fine particles, obtaining long-term and spatially-resolved distribution of PM 2.5 concentrations is important to reduce exposure misclassification and facilitate accurate epidemiological studies in the region. In addition, time series analyses of air pollution and human health have become the most common study 10 design to compare day-to-day fluctuations of air pollution and corresponding fluctuations in health outcomes (Ito et al., 2007), requiring long-term PM 2.5 concentration estimates. Previous research examined temporal trends in PM 2.5 levels. For instance, Weber et al. (2003) investigated the temporal variations in PM 2.5 mass at the US Environmental Protection Agency (EPA) Atlanta Supersite Experiment in August 1999.
15 So et al. (2007) examined long-term variation in PM 2.5 levels during two 12 month periods in Hong Kong. EPA (2011) reported temporal trends of annual and 24 h mean PM 2.5 concentrations at the national level from 2001 to 2010 and found that annual and 24 h mean PM 2.5 concentrations dropped 24 % and 28 %, respectively, within these ten years. 20 Another critical aspect is the investigation of spatial trends of PM 2.5 levels. Stationary ambient monitors have been established to measure ground-level PM 2.5 concentrations, yet those point measurements leave large areas uncovered, and therefore spatial variability is difficult to assess with those point measurements alone. In addition, these sites do not measure individual-specific exposure, and thus this approach inevitably 25 introduce measurement errors that likely have substantial implications for interpreting epidemiological studies, especially for time-series analyses (Zeger et al., 2000). Introduction On the other hand, aerosol observations from satellite remote sensing could substantially improve estimates of population exposure to PM 2.5 (van Donkelaar et al., 2010). As a result, satellite-retrieved aerosol optical depth (AOD) that measures light extinction by aerosols in the atmospheric column has been widely used to predict groundlevel PM 2.5 concentrations, considering its relatively low cost and large spatiotemporal 5 coverage. A number of AOD products from sensors such as the Moderate Resolution Imaging Spectroradiometer (MODIS), the Multiangle Imaging SpectroRadiometer (MISR), and the Geostationary Operational Environmental Satellite Aerosol/Smoke Product (GASP) have been applied to PM 2.5 concentration prediction in previous studies (Liu et al., 2007(Liu et al., , 2009Paciorek et al., 2008;Hu et al., 2013). Nevertheless, one 10 of the limitations inherent with those AOD products is their coarse spatial resolutions. For instance, the spatial resolutions of AOD derived from MODIS and MISR are 10 km and 17.6 km, respectively. Although GASP has a spatial resolution of 4 km, the AOD retrievals are less precise than those from the polar-orbiting instruments due to a limited information content (one spectral band) and relatively low signal-to-noise ratio of 15 GOES sensor (Prados et al., 2007). Meanwhile, epidemiological studies typically have access to health data geo-coded to small geographical units (e.g., zip code and census block groups), many of which are substantially smaller than the spatial resolutions of MODIS and MISR. In addition, estimated PM 2.5 concentrations at coarse resolutions inevitably omit some details of spatial variability of PM 2.5 exposure and therefore have 20 fundamental limitations in the investigation of spatial trends of PM 2.5 levels. Hence, it is essential to use high resolution AOD retrievals to generate high spatial resolution PM 2.5 concentration estimates. Recently, a new AOD product retrieved by a multi-angle implementation of atmospheric correction (MAIAC) algorithm has been reported (Lyapustin et al., 2011b). MAIAC AOD has a spatial resolution of 1 km and thus has the 25 ability to estimate PM 2.5 concentrations at that resolution. Moreover, the MAIAC AOD has been demonstrated to be strongly associated with monitored PM 2.5 levels in the New England region (Chudnovsky et al., 2012) Various statistical methods have been developed to establish the quantitative relationship between PM 2.5 and satellite-derived AOD including linear regression (Schafer et al., 2008;Wallace et al., 2007;Gupta and Christopher, 2009), many of which do not consider day-to-day variability. Lee et al. (2011) andKloog et al. (2011) argued that the PM 2.5 -AOD relationship varies day-to-day, and the temporal variability needs to be 5 accounted for in order to improve performance of the AOD-based prediction models. As a result, both studies developed a linear mixed effects model to incorporate daily calibration of the PM 2.5 -AOD relationship and obtained predictions with high accuracy. To move one step further, we introduce a geographically weighted regression (GWR) model as the second stage to account for the spatial variability in the PM 2.5 -AOD rela-10 tionship.
The objective of this paper is first, to estimate spatiotemporally-resolved PM 2.5 concentrations in the study domain during the period between 2001 and 2010 using a twostage model with MAIAC AOD as the primary predictor and meteorological and land use variables as the secondary predictors. Second, maps of annual mean PM 2.5 con-15 centrations as well as the changes between 2001 and 2010 were generated from the daily estimates to visually illustrate spatial trends of annual PM 2.5 levels from 2001 to 2010. Third, time series analyses were conducted for the study domain and the Atlanta metro area using the monthly and annual mean PM 2.5 estimates to examine the 10 yr temporal trends of fine particle levels, and the underlying causes were discussed. Fi-20 nally, potential impact of wild and prescribed fires on PM 2.5 levels in the south of our study region was investigated.

Study area
The study area is approximately 600 × 600 km 2 in the southeastern US, covering most 25 of Georgia, Alabama, and Tennessee, and parts of North and South Carolina (Fig. 1 The domain includes several large urban centers, numerous medium to small cities, and suburban and rural areas. Additionally, Zhang et al. (2010) reported that biomass burning contributes 13 % to PM 2.5 mass annually in the southeastern US. Therefore, to investigate the impact of wild and prescribed fires on PM 2.5 levels, two test sites of the same size (e.g., 18 × 18 km 2 ) were selected in the south of the study area. Those

PM 2.5 measurements 10
The 24 h average PM 2.5 concentrations from 2001 to 2010 collected from US Environmental Protection Agency (EPA) federal reference monitors (FRM) were downloaded from the EPA's Air Quality System Technology Transfer Network (http://www.epa.gov/ttn/airs/airsaqs/). PM 2.5 concentrations less than 2 µg m −3 (∼0.2-3 % of total data records) were discarded as they are below the established limit of 15 detection (EPA, 2008a).

Remote sensing data
MAIAC retrieves aerosol parameters over land at 1 km resolution, which was accomplished by using the time series of MODIS measurements and simultaneous processing of a group of pixels in fixed 25 × 25 km 2 blocks (Lyapustin et al., 2011a(Lyapustin et al., , b, 2012. 20 MAIAC uses a sliding window to collect up to 16 days of MODIS radiance observations over the same area and processes them to obtain surface parameters used for aerosol retrievals. To facilitate the time series analysis, MODIS data are initially gridded to a 1 km resolution in a selected projection. For this work, we used MODIS level 1B (calibrated and geometrically corrected) data from Collection 6 re-processing, which Introduction removed major effects of temporal calibration degradation of Terra and Aqua, a necessary prerequisite for the trend analysis. Validation showed that the MAIAC and operational Collection 5 MODIS Dark Target AOD have a similar accuracy over dark and vegetated surfaces, but also showed that MAIAC generally improves accuracy over brighter surfaces, including most urban areas 5 (Lyapustin et al., 2011b). MAIAC AOD data from 2001 to 2010 were obtained from NASA Goddard Space Flight Center. Zhang et al. (2012) found that Terra and Aqua may provide a good estimate of the daily average of AOD, thus the average of Aqua and Terra measurements can be used to predict PM 2.5 concentrations. In this study, Aqua (overpass at ∼1:30 p.m. LT) and 10 Terra (overpass at ∼10:30 a.m. LT) MAIAC AOD values were first combined to improve spatial coverage. In a common MAIAC pixel, there might be only one MAIAC product from either Aqua or Terra, or both may be present. In the second case, when we combine them, the averaged value represents the mean of the AOD distribution from 10 a.m. to 2 p.m. LT, yet in the first case, AOD as an indicator of PM 2.5 abundance is 15 biased towards the atmospheric condition either in the morning or early afternoon. To overcome this bias, Lee et al. (2011) defined a simple ratio between averaged Terra and Aqua AOD to estimate the missing AOD value. In this study, we fitted a linear regression to define the relationship between daily mean AOD values of MAIAC-Terra and MAIAC-Aqua. We used this regression to predict the missing AOD value (i.e., predict 20 MAIAC-Terra AOD with the available MAIAC-Aqua AOD, and vice versa), then averaged the observed and the predicted AOD values together. Finally, we set an upper bound of 2.0 for the combined AOD to reduce potential cloud contamination (∼0.05-0.1 % of total data records were excluded These data are Terra and Aqua MODIS fire and thermal anomalies data from the official NASA MCD14ML product, Collection 5, Version 1.

Meteorological fields
The meteorological fields provided by the North American Land Data Assimilation System (NLDAS) Phase 2 were downloaded from the NLDAS website 5 (http://ldas.gsfc.nasa.gov/nldas/). The spatial resolution of NLDAS meteorological data is 1/8th-degree (∼13 km). Another meteorological dataset used in this study is the North American Regional Reanalysis (NARR). NARR is a long term, consistent, highresolution climate dataset for North America (Mesinger et al., 2006). The spatial resolution of the NARR dataset is ∼32 km. NLDAS provides most of the meteorological 10 fields used in this analysis, including relative humidity, u-wind, and v-wind, while NARR provides another critical parameter: boundary layer height. To generate daytime meteorological fields corresponding to the MODIS overpass times, 3-hourly NARR measurements and hourly NLDAS measurements from 10 a.m. to 4 p.m. standard local time were averaged.

Land use variables
Elevation data were downloaded from the national elevation dataset (NED) (http://ned.usgs.gov). NED is the seamless elevation dataset covering the conterminous United States and is distributed by the US Geological Survey (USGS). The elevation data are downloaded at a spatial resolution of 1 arcsec (∼30 m

Data integration
All the data were first re-projected to the USA Contiguous Albers Equal Area Conic USGS coordinate system. For model fitting, a 1×1 km 2 square buffer was generated for each PM 2.5 monitor site. Meteorological fields and AOD values were assigned to each PM 2.5 monitor site using the nearest neighbor approach. Forest cover and elevation 10 were averaged, while road length and point emissions were summed over the 1×1 km 2 square buffer. For PM 2.5 prediction, the same procedure was performed for each 1 × 1 km 2 MAIAC grid cell.

Model structure
We developed a two-stage spatiotemporal model. The first stage is a linear mixed 15 effects model with day-specific random intercepts and slopes for AOD, meteorological fields to account for the temporally varying relationship between PM 2.5 and AOD (Eq. 1). The model structure can be expressed as where b i and b i,t (day-specific) are the fixed and random intercept and slopes, respec- on PM 2.5 concentration estimates for the entire period, while the random effects contribute to the covariance structure and account for the daily variability in associations between dependent and independent variables. The second stage is a geographically weighted regression (GWR) model that can generate a continuous surface of estimates for each parameter at each location instead 15 of a universal value for all observations. We fitted a monthly GWR model to calibrate the spatial variability within the PM 2.5 -AOD relationship, and the model can be expressed as where PM 2.5 _resi st denotes the residuals from the stage one model at site s in month t, 20 AOD st is the MAIAC AOD value (unitless) at site s in month t, and β 0,s and β 1,s denote the location-specific intercept and slope, respectively. It should be noted that the model was fitted for each year individually, and therefore the predictors used in the model may vary for different years. The model structures were determined by comparison of performance of models using different predictors 25 to ensure that relatively better prediction accuracy can be achieved. To assess the goodness of fit of the model, various statistical indicators such as the coefficient of determination (R 2 ), mean prediction error (MPE), and square root of the mean squared prediction errors (RMSPE) were calculated between the fitted PM 2.5 concentrations from the model and the observations. In addition, a 10-fold cross validation (CV) technique was adopted to assess the potential model over-fitting. That is, the model could perform better on the data used to fit the model than unobserved data. The entire model-fitting dataset was randomly split into ten subsets with approximately 10 % of 5 the total data records in each subset. In each round of cross validation, we selected one subset (10 % of the data) as testing samples and used the remaining nine subsets (90 % of the data) to fit the model. Predictions of the held-out subset (10 % of the data) were made from the fitted model. The process was repeated ten times until every subset was tested. Statistical indicators such as R 2 , MPE, and RMSPE were calculated 10 between the CV predicted concentrations and the observations. The model over-fitting assessment was conducted by the comparison between CV and model-fitting statistics. A relative accuracy value was also calculated for each year to make validation results comparable among different years. The daily PM 2.5 concentrations were estimated using the model for each year individ-15 ually. The maps of annual mean PM 2.5 concentrations as well as the changes between 2001 and 2010 for the study domain and the Atlanta metro area were generated using the daily estimates to visually examine spatial trends of PM 2.5 levels from 2001 to 2010. Moreover, time series analyses were conducted by year and month, respectively to quantitatively investigate the 10 yr temporal trends of fine particle levels in the study 20 domain and the Atlanta metro area. Finally, an examination of the impact of wild and prescribed fires on PM 2.5 levels was also conducted on two test sites.

Descriptive statistics
The descriptive statistics of variables used in fitting the models are listed in Table 1.  Table 1 also shows that land use variables and meteorological fields vary from year to year within the data.

5
The model-fitting and CV statistics (e.g. R 2 , MPE, and RMSPE) are listed in Table 2.
The results show that R 2 ranges from 0.71 to 0.85, MPE is from 1.73 to 2.50 µg m −3 , RMSPE ranges from 2.75 to 4.10 µg m −3 , and relative accuracy ranges from 72.9 % to 80.7 %, which indicate a good fit between predicted values from the fitted models and the observations. In addition, CV statistics results suggest that model over-fitting is present; that is, R 2 decreases, while MPE and RMSPE increase from model fitting to cross validation, yet the differences are relatively small for all the years. For instance, R 2 and relative accuracy have an average decrease of 0.08 and 4.21 %, respectively, while MPE and RMSPE have an average increase of 0.39 and 0.60 µg m −3 , respectively through all the years. Moreover, a regression with zero intercept (Fig. 2) was 15 performed to fit the predicted values against the observations. The figure shows that at high concentration levels, both model fitting and cross validation under-predicted the PM 2.5 concentrations by 3-7 % (e.g. fitted/CV PM 2.5 = 97 % to 93 % observed PM 2.5 ). Figure 3 illustrates the PM 2.5 concentration estimates at 1 km spatial resolution in the 20 study area. land, and high agricultural emissions may lead to high concentrations of fine particles. As reported by previous studies, ammonia (NH 3 ) and nitrogen oxides (NO x ) generated by agricultural activities, such as farm vehicles, domestic and farm animals, and fertilizer applications, can significantly increase the number of suspended particles (Kurvits and Marta, 1998). In addition, biomass burning also contributes to emissions of fine 5 particles in the region (Zhang et al., 2010). Figure 4 shows that the pattern of ground PM 2.5 measurements from FRM monitors corresponds well with that of our estimated concentrations.

Spatial trends of PM 2.5 concentrations
To take advantage of the high spatial resolution of the MAIAC data, a map of PM 2.5 estimates in Atlanta metro area was also generated for each year (Fig. 5). The annual 10 mean estimates from 2001 to 2010 are 15. 10, 14.64, 14.00, 14.54, 15.63, 14.39, 14.14, 11.78, 10.98, and 11.65 µg m −3 , respectively. Those maps distinctly show that high PM 2.5 levels occur in areas with high urban land use and along major highways, while low concentrations appear in forest and recreational areas, suggesting an underlying positive relationship between air pollution levels and urban development.
15 Figure 6 shows the percent changes in PM 2.5 concentrations from 2001 to 2010. Figure 6a illustrates the spatial trend of changes in PM 2.5 levels in the study region. The results show that PM 2.5 levels in most of the areas decreased from 0 to 25 %, and large parts of the areas had decreases exceeding 25 % and up to 50 %. Larger decreases occurred in areas with generally higher pollution levels such as Atlanta metro area 20 and along major highways, which might be due to recently enacted emission reduction program (EPA, 2011), since majority of the emission sources are located in or near urban areas and along major highways. Mitigation of fine particles has been effected through control of direct PM 2.5 emissions from both stationary and mobile sources (e.g., through installation of scrubbers and filters and the use of alternative fuels and 25 electric vehicles) (EPA, 2007). One exception is the mountainous area in the northeast of our domain with generally low PM 2.5 levels. The decreases of PM 2.5 concentrations in this region from 2001 to 2010 were also phenomenal, PM 2.5 levels decreased from 25 % to 50 % in most of the region, and some areas had decreases exceeding 50 %. By checking 2002 and 2008 EPA NEI facility emissions reports, this is probably due to dramatically reduced number of emission sources in the region during the period. Figure 6b illustrates the percent changes in PM 2.5 levels in the Atlanta Metro area. Once again, the spatial trend shows that larger decreases (25 % to 50 %) primarily occurred in urban built-up areas and along major highways, while small decreases (0 5 to 25 %) appeared in forest or recreational areas with generally lower pollution levels. It is reasonable because the increase or decrease of emissions from emission sources might be the major driver causing fluctuations of PM 2.5 levels in this region, while most of those sources are unlikely to be located in forest and recreational areas. Two pixels with large changes (increasing more than 25 % and decreasing more than 50 %) were identified (in blue and red circles). The large decrease of PM 2.5 concentration in the blue pixel was because of the large reduction of particle emissions from power plants located within that pixel during the period between 2001 and 2010. Likewise, the large increase of PM 2.5 concentration in the red pixel was due to the addition of one emission source that did not exist in year 2001.

Temporal trends of PM 2.5 concentrations
A time series analysis was conducted to quantitatively examine temporal trends of PM 2.5 levels in the study area as well as the Atlanta metro area during the period between 2001 and 2010 (Fig. 7). The results show that the estimated PM 2.5 concentrations are smaller than the monitored observations, and underestimation by our model 20 has occurred, especially in areas with high pollution levels (e.g., the Atlanta metro area), which agrees with our model validation results. The average underestimation is 0.99 µg m −3 for the study domain and 1.82 µg m −3 for the Atlanta metro area. The year 2005, and this phenomenon might be attributed to the increase of sulfate concentrations emitted from electric utilities and industrial boilers during the warm months (e.g, from May to September) of 2005 (EPA, 2008b). In addition, the steady decrease of PM 2.5 levels after year 2005 is due to the emissions reduction programs that have been enacted recently (EPA, 2011(EPA, , 2007.  To assess the impact of fires on PM 2.5 levels, we selected year 2007 to conduct a case study as fires in the southeastern US in 2007 has been examined by previous research using ground monitoring data (Zhang et al., 2010). The differences of number of fires and annual mean PM 2.5 concentrations between two test sites were calculated and illustrated in Fig. 9. It shows that in most 15 of the cases, the peaks and valleys of the differences of number of fires correspond well with the peaks and valleys of the differences of PM 2.5 concentrations (indicated by black arrows), indicating an underlying positive relationship between fires and PM 2.5 concentrations and suggesting that the fires may have impact on fine particle levels in the south of our study region. However, exceptions exist. For example, in April 2007, 20 a day in which a large difference of number of fires occurred did not show a higher difference of PM 2.5 concentrations than its neighboring days. In addition, in many days in which there are no differences of number of fires, the differences of PM 2.5 concentrations between two test sites still exist. Such situations might be attributed to several factors: first, due to the lack of data, our analysis only used the number of fires without 25 fire scale and intensity information and thus might be biased. often more substantial, contributors to PM 2.5 concentrations in the region (Zhang et al., 2010).

Discussion
A major strength of this study is that we used 1 km spatial resolution PM 2.5 estimates derived from 1 km resolution MAIAC AOD to investigate spatiotemporal trends of PM 2.5 5 concentrations in the study area as well as in the Atlanta metro area. PM 2.5 estimates at finer resolutions are more suitable for investigation of spatial trends than those at coarser resolutions derived from other AOD products (e.g., MODIS and MISR), because estimates at coarser scales inevitably omit local spatial details. Our results are capable of showing PM 2.5 concentrations and changes within a 1 × 1 km 2 area, which 10 are very useful for air pollution studies at local scales. For instance, spatial trends of changes in PM 2.5 concentrations in the Atlanta metro area show distinct patterns of higher PM 2.5 reduction in areas with generally higher pollution levels (e.g., urban builtup areas and along major highways). Some of the changes may be directly associated with the occurrence or disappearance of one or more emission sources as well as the 15 increase or decrease of emissions from those sources. Although high resolution PM 2.5 estimates can provide more details to examine spatial trends, difficulties lie in validation to ground monitoring. More ground measurements at specific locations are needed to further validate the results. Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | regional fine particle levels, and further research will continue to explore these associations.
In addition, long-term PM 2.5 estimates at high spatial resolutions might be better suited for incorporating fire data in order to examine the impact of fires on fine particle levels, since fires might occur far away from ground monitors. Our results show that 5 fires, to some extent, contribute to PM 2.5 concentrations in our study region. However, they might not be the major contributors to fine particle levels. It should be noted that our results are preliminary and might contain a certain degree of bias, since no fire scale and intensity data were incorporated in the analysis. However, in this paper, we primarily attempt to examine the 10 yr spatial and temporal trends of fine particle levels 10 in the region using high spatial resolution PM 2.5 estimates as well as the possible causes for high PM 2.5 concentrations in the south of our domain. Hence, quantification of the contribution of fires to PM 2.5 levels was beyond the scope of this analysis.

Conclusions
In this paper, we used a two-stage spatiotemporal model incorporating MAIAC AOD 15 data, meteorological fields, and land use variables to estimate PM 2. 5 Fig. 9. Time series analysis of relationship between differences of number of fires and differences of PM 2.5 concentrations for year 2007 (Only months with differences of number of fires were plotted, and the coincidences were indicated by black arrows).