Evaluating the skill of high-resolution WRF-Chem simulations in describing drivers of aerosol direct climate forcing on the regional scale

Assessing the ability of global and regional models to describe aerosol optical properties is essential to reducing uncertainty in aerosol direct radiative forcing in the contemporary climate and to improving confidence in future projections. Here we evaluate the performance of high-resolution simulations conducted using the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) in capturing spatiotemporal variability of aerosol optical depth (AOD) and the Ångström exponent (AE) by comparison with ground- and space-based remotely sensed observations. WRF-Chem is run over eastern North America at a resolution of 12 km for a representative year (2008). A systematic positive bias in simulated AOD relative to observations is found (annual mean fractional bias (MFB) is 0.15 and 0.50 relative to MODIS (MODerate resolution Imaging Spectroradiometer) and AERONET, respectively), whereas the spatial variability is well captured during most months. The spatial correlation of observed and simulated AOD shows a clear seasonal cycle with highest correlation during summer months (r = 0.5–0.7) when the aerosol loading is large and more observations are available. The model is biased towards the simulation of coarse-mode aerosols (annual MFB for AE = −0.10 relative to MODIS and −0.59 for AERONET), but the spatial correlation for AE with observations is 0.3–0.5 during most months, despite the fact that AE is retrieved with higher uncertainty from the remote-sensing observations. WRF-Chem also exhibits high skill in identifying areas of extreme and non-extreme aerosol loading, and its ability to correctly simulate the location and relative intensity of extreme aerosol events (i.e., AOD > 75th percentile) varies between 30 and 70 % during winter and summer months, respectively.


Introduction and Objectives
Atmospheric aerosol particles (aerosols) play a major role in dictating Earth's climate by both directly interacting with solar radiation (direct effect) and acting as cloud condensation nuclei and thus changing cloud properties (indirect effect) (Boucher et al., 2013). The global mean aerosol direct effect is estimated to be −0.27 (possible range of −0.77 to +0.23) W m⁻², while the indirect effect is −0.55 (−1.33 to −0.06) W m⁻² (Stocker et al., 2013). Therefore, their combined radiative forcing is likely a significant fraction of the overall net anthropogenic climate forcing since preindustrial times (i.e., 1.13–3.33 W m⁻² (Stocker et al., 2013)) and a substantial source of uncertainty in quantifying anthropogenic radiative forcing.
Accurate quantification of direct aerosol radiative forcing is strongly dependent on aerosol precursor and primary aerosol emissions. Both have evolved over the past 2 decades in terms of their spatiotemporal distribution and absolute magnitude. Emissions have generally increased in emerging economies (Kurokawa et al., 2013), biogenic and anthropogenic emissions have altered in response to changing land use and land cover (Wu et al., 2012), and the implementation of pollution control strategies, particularly in North America and Europe, has resulted in declines in air pollutant emissions (Xing et al., 2015; Giannouli et al., 2011). Therefore, there is evidence that aerosol burdens and thus direct climate forcing have varied markedly in the past and may change substantially in the future. Further, although best estimates of global anthropogenic radiative forcing from the aerosol direct and indirect effect are −0.27 and −0.55 W m⁻² (Stocker et al., 2013), respectively, the short residence time and high spatiotemporal variability of aerosol populations mean that their impact on regional climates can be much larger than the global mean but is also even more uncertain.
P. Crippa et al.: Evaluating the skill of high-resolution WRF-Chem simulations. Published by Copernicus Publications on behalf of the European Geosciences Union.
Long-term measurements of aerosol properties are largely confined to aerosol mass (total, PM10 or PM2.5) in the near-surface layer, which may or may not be representative of either the total atmospheric burden (Ford and Heald, 2013; Alston et al., 2012) or radiation extinction and hence climate forcing. Further, aerosol composition measurements are often a 24 h integrated sample taken only on 1 in 3 days and thus are subject to undersampling. Hence, they provide an incomplete description of temporal variability and mean aerosol burdens for model performance evaluation. Columnar remote-sensing measurements of aerosol optical properties are available from a range of ground-based and satellite-borne instrumentation but have only a relatively short period of record, are subject to nonzero measurement uncertainty (and bias), and undersample the range of atmospheric conditions due to cloud masking and infrequent satellite overpasses. Therefore, regional and global models are most commonly used to quantify historical and contemporary aerosol direct radiative forcing based on simulated properties such as the aerosol optical depth (AOD) and Ångström exponent (AE) (Boucher et al., 2013).
Most global models that include aerosol microphysics have been run at a fairly coarse resolution (spatial resolution of the order of 1–2.5°) (Table 1), usually for periods of a few years. The resulting fields of AOD (and less frequently AE) have been evaluated relative to ground-based and satellite-borne remote-sensing measurements of optical properties (Table 1). However, aerosol populations (and dynamics) are known to exhibit higher spatial variability, and at finer scales, than can be represented in those models (Kovacs, 2006; Kulmala et al., 2011; Santese et al., 2007; Schutgens et al., 2013; Shinozuka and Redemann, 2011). Despite recent improvements in the sophistication of aerosol processes and properties within global models, there are still substantial regional and latitudinal discrepancies in both the magnitude of AOD and other aerosol properties that impact aerosol direct radiative forcing and in the degree of model-to-model agreement (Myhre et al., 2013). Thus, the skill of these models in reproducing the spatiotemporal variability in the aerosol size distribution, composition, concentration and radiative properties is incompletely characterized. Further, large model-to-model variability exists both in the global mean direct aerosol forcing and in its spatial distribution (Kulmala et al., 2011; Myhre et al., 2013), leading to high uncertainty in the quantification of aerosol climate forcing.
Although a direct comparison between the studies summarized in Table 1 is inherently very difficult due to the different performance metrics reported and variations in both the model resolution and aerosol descriptions, there is a consistent finding of high spatial variability in model bias, both in sign and magnitude. Correlation coefficients of monthly and seasonal mean AOD from model simulations versus satellite-based measurements are typically in a range of ∼ 0.6–0.8 both in global (Colarco et al., 2010; Lee et al., 2015) and regional (Nabat et al., 2015) simulations. However, these correlations are largely reflective of the ability of the models to capture the seasonal cycle and columnar aerosol properties from remote sensing and thus ignore substantial variability on the synoptic scale (Sullivan et al., 2015) and on mesoscales (Anderson et al., 2003). A wider range of correlation coefficients (r ∼ 0.3–0.8) is reported when comparisons are made to high-frequency observations of AOD on the hourly or daily timescale both in global (Sič et al., 2015) and regional (Rea et al., 2015) simulations. The largest range of correlation coefficients ([−0.99, 0.9]; Table 1) is reported when simulated AOD is compared with observations from the AErosol RObotic NETwork (AERONET) and appears to be a function of temporal averaging, location of AERONET sites and model resolution. Correlations between time series of simulated AE versus AERONET observations are reported less frequently and, when conducted for monthly mean values, range from ∼ 0.4 (Li et al., 2015) to ∼ 0.8 (Colarco et al., 2010).
At least some of the variability in model performance, as indicated by the mutual variability with observations described by correlation coefficients, and model-to-model agreement shown in AeroCom (Aerosol Comparisons between Observations and Models) Phase II may be attributable to variations in model resolution, differences in gas and particle phase parameterizations, and aerosol descriptions. However, there are also variations in the way in which model skill is evaluated and divergent opinions regarding prioritization of future research directions. The direct effect remains poorly quantified on the regional scale, due to uncertainty in aerosol loading, uncertainty and spatiotemporal variability in aerosol physical properties (Colarco et al., 2014), and a relative paucity of rigorous model verification and validation exercises. Confidence in projections of possible future aerosol radiative forcing requires detailed assessment of skill in the current climate, and the need for and benefits of regional downscaling and/or the use of high-resolution global models require careful quantification.
Regional models represent an opportunity to assess whether running higher-resolution simulations over specific regions of interest improves the characterization of aerosol properties. The assessment of value added (or lack thereof) from high-resolution regional versus global coarse-resolution models has not been clearly quantified in previous studies (Table 1).
Although high-resolution simulations, comparable to those presented herein, have been run, they cover a small temporal and spatial domain (e.g., Tuccella et al., 2015) or lack quantitative assessment of aerosol optical properties (e.g., Tessum et al., 2014). Thus, the quantification of the skill of high-resolution modeling of aerosol optical properties is presented here along with a preliminary analysis of model performance as a function of spatial aggregation. Forthcoming work will include a direct comparison to coarser-resolution simulations to quantify the value added (or lack thereof) from increased model resolution.
We evaluate the skill of state-of-the-art high-resolution regional model simulations of climate-relevant aerosol properties using a range of descriptive statistics and investigate possible sources of discrepancies with observations. The impact of aerosols on climate and human health is strengthened under conditions of enhanced aerosol concentrations; thus, it is necessary to study and diagnose causes of "extreme aerosol events" (Chu, 2004; Gkikas et al., 2012) and to evaluate the ability of numerical models to simulate their occurrence, intensity, spatial extent and location. Prior analyses of Level-3 (1° resolution) MODIS (MODerate resolution Imaging Spectroradiometer) AOD over the eastern half of North America have indicated that extreme AOD values (> local 90th percentile) are coherent on regional scales (∼ 150 km) (Sullivan et al., 2015). Thus, our evaluation exercise also includes an analysis of the spatiotemporal coherence of extreme events.
We applied the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem version 3.6.1) at high resolution (12 × 12 km) over eastern North America during the year 2008, in the context of a pseudo type-2 downscaling exercise in which the high-resolution model is nested within reanalysis boundary conditions (Castro et al., 2005). This spatial resolution was chosen in part to match the resolution of the North American Mesoscale model that is used for the meteorological lateral boundary conditions and to ensure we capture some mesoscale variability while keeping the simulation computationally feasible.
Our evaluation is designed to investigate spatiotemporal variability of aerosol optical properties (i.e., AOD and AE) in their mean and extreme values. Thus, we conduct our evaluation of the simulations using [...] , 2009). This data set includes monthly accumulated precipitation data on a 0.5° × 0.5° grid, which is estimated by interpolating station observations from the Global Historical Climatology Network using the spherical version of Shepard's distance-weighting method (Shepard, 1968; Willmott et al., 1985).
This paper is structured as follows. We first describe the settings used in our WRF-Chem simulations and introduce the remote-sensing and other data used for model evaluation (Sect. 2). A description of statistical metrics used for the evaluation is also provided. Section 3 presents results of the evaluation of simulated AOD and AE versus observations, as well as findings on extreme AOD values. In Sect. 4 we summarize our findings and draw conclusions.

WRF-Chem simulations
WRF-Chem (Grell et al., 2005; Fast et al., 2006) is used to simulate aerosol processes over eastern North America during the whole of 2008. The simulation domain comprises 300 × 300 grid points with a 12 km resolution and is centered on southern Indiana (86° W, 39° N). The calendar year 2008 was selected because it is representative of average climate and aerosol conditions in the center of the model domain (near Indianapolis, IN). In 2008, mean Tmax, Tmin, precipitation and wind speed as measured at the National Weather Service Automated Surface Observing Systems (NWS ASOS) station at Indianapolis International Airport are within ±0.25 standard deviations (σ) of the 2000–2013 seasonal means. Further, mean seasonal AOD from Level-3 MODIS retrievals is within ±0.2σ of 2000–2013 mean values. Additionally, the choice of this year ensures the availability of multiple sources of ground- and space-based measurements of aerosol properties for the evaluation of the simulations.
Table 2 provides details of the WRF-Chem simulations. In brief, we used 32 vertical levels up to 50 hPa with telescoping to allow for a good vertical resolution in the boundary layer (i.e., approximately 10 layers below 1 km for [...] ter et al., 2011; Emmons et al., 2010). Anthropogenic emissions are from the POET (Precursors of Ozone and their Effects in the Troposphere) and the EDGAR (Emissions Database for Global Atmospheric Research) databases. The land cover is specified based on the USGS 24-category data at 3.7 km resolution (Anderson et al., 1976). Anthropogenic point and area emissions at 4 km resolution are input hourly from the US National Emissions Inventory (NEI-05) (US-EPA, 2009) and specified for 19 vertical levels (see Fig. 1 for an overview of the primary aerosol emissions). Biogenic emissions of isoprene, monoterpenes, other biogenic volatile organic compounds (VOCs), oxygenated VOCs (OVOCs), and nitrogen gas emissions from the soil are described as a function of simulated temperature and photosynthetic active radiation (for isoprene) using the model of Guenther (Guenther et al., 1993, 1994; Simpson et al., 1995). Aerosol and gas phase chemistry are described using the second generation Regional Acid Deposition Model (RADM2) chemical mechanism (Stockwell et al., 1990) and the Modal Aerosol Dynamics model for Europe (MADE), which incorporates the Secondary Organic Aerosol Model (SORGAM) (Ackermann et al., 1998; Schell et al., 2001).
The correct characterization of aerosol optical properties is dependent on model skill in describing particle composition and mixing state (Li et al., 2015; Curci et al., 2014). With this in mind, it is worthy of note that aerosol components are assumed to be internally mixed within each mode (although the composition differs by mode). The standard deviations of the lognormal Aitken and accumulation modes are fixed at 1.6 and 2, respectively. The choice of a modal representation of aerosol size distribution is dictated by the high computational demand of more sophisticated approaches (e.g., sectional description of the aerosol size distribution) for long-term simulations. With the current settings, the 1-year run was completed without restart in 9.5 days (230 h) on the Cray XE6/XK7 supercomputer (Big Red II) owned by Indiana University using 256 processors distributed on eight nodes, thus indicating the feasibility of this configuration for climate-scale simulations. Aerosol and gas phase concentrations and meteorological properties are saved once hourly.
AE from the WRF-Chem simulations is computed using
$$\mathrm{AE} = \frac{\ln\left(\mathrm{AOD}_{400\,\mathrm{nm}} / \mathrm{AOD}_{600\,\mathrm{nm}}\right)}{\ln\left(600\,\mathrm{nm} / 400\,\mathrm{nm}\right)}. \quad (1)$$
AOD at wavelengths (λ) of 500 and 550 nm for comparison with MODIS and MISR, respectively, is derived using the Ångström power law:
$$\mathrm{AOD}_{\lambda} = \mathrm{AOD}_{400\,\mathrm{nm}} \left(\frac{\lambda}{400\,\mathrm{nm}}\right)^{-\mathrm{AE}}. \quad (2)$$
We investigated the wavelength dependence of the AE calculation using λ at 300 and 1000 nm as proposed in Kumar et al. (2014) and found that, although AOD estimates are independent of the wavelength range selected, AE(400–600 nm) is systematically lower than AE(300–1000 nm). Analyses of AE reported in this study are obtained using λ = 400 and 600 nm since they are closer to those used in AE satellite retrievals.
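The AE calculation and the Ångström power-law interpolation described above can be sketched as follows. This is a minimal illustration, assuming the standard two-wavelength definition of AE and using 400 nm as the reference wavelength for the interpolation; the function names and the AOD values are hypothetical and are not taken from the WRF-Chem output.

```python
import math


def angstrom_exponent(aod_400, aod_600, lam_short=400.0, lam_long=600.0):
    """Two-wavelength Angstrom exponent (wavelengths in nm)."""
    return math.log(aod_400 / aod_600) / math.log(lam_long / lam_short)


def aod_at(lam, aod_ref, ae, lam_ref=400.0):
    """AOD at wavelength lam (nm) from the Angstrom power law."""
    return aod_ref * (lam / lam_ref) ** (-ae)


# Hypothetical column AOD values, chosen only for illustration.
aod_400, aod_600 = 0.30, 0.18
ae = angstrom_exponent(aod_400, aod_600)
aod_550 = aod_at(550.0, aod_400, ae)
print(f"AE = {ae:.3f}, AOD(550 nm) = {aod_550:.3f}")
```

By construction, interpolating back to 600 nm recovers the input AOD at 600 nm, which is a convenient sanity check on any implementation.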

Remotely sensed data
Consistent with previous research (Sect. 1 and Table 1), we evaluate the WRF-Chem simulations using four primary remote-sensing products: three are drawn from instruments on the Aqua and Terra satellites, while the fourth is from ground-based radiometers operated as part of the AERONET network. The data sets are as follows:
1. The MODIS instruments aboard the polar-orbiting Terra (∼ 10:30 overpass local solar time (LST)) and Aqua (∼ 13:30 LST) satellites. They have measured atmospheric aerosol optical properties since 2000 and 2002, respectively, with near-global daily coverage (Remer et al., 2005). Herein we use the Level 2 (L2; 10 km resolution) "dark-target" products of AOD at 550 nm and AE at 470 and 660 nm (Collection 5.1; Levy et al., 2010). The L2 AOD uncertainty is ±0.05 ± 0.15 × AOD over land relative to global sun photometer measurements from AERONET; even when no spatiotemporal averaging is used in the comparison (i.e., all combinations of MODIS retrievals within 30 km of an AERONET site and all AERONET retrievals within 30 min of the satellite overpass), 71 % of MODIS retrievals fall within a ±0.05 ± 0.2 × AOD envelope relative to AERONET over East CONtiguous US (E.CONUS) (Hyer et al., 2011). AE is retrieved with higher uncertainty and tends to exhibit a bimodality in retrieved values (Levy et al., 2010; Remer et al., 2005) (see Fig. S1 in the Supplement). For this reason, where we compare WRF-Chem-simulated AE with values from MODIS, we treat AE as a binary variable, wherein AE < 1 is taken as representing coarse-mode-dominated aerosol populations and AE > 1 indicates fine-mode-dominated populations (Pereira et al., 2011; Valenzuela et al., 2014).
3. Ground-based sun photometer measurements from 22 AERONET (Holben et al., 1998) stations are also used in this study (Fig. 1). This network is highly spatially inhomogeneous, but under cloud-free conditions the observations are available at multiple times during daylight hours. AOD is measured directly by the AERONET sun photometers at seven wavelengths (340, 380, 440, 500, 670, 870 and 1020 nm) with high accuracy (i.e., AOD uncertainty of < 0.01 for λ > 440 nm (Holben et al., 2001)). The AE is calculated for all available wavelengths within the AOD range. The AE 870–440 nm includes the 870, 670, 500 and 440 nm AOD data. Level-2 aerosol products from AERONET (i.e., cloud screened and quality assured) have been used extensively in satellite and model validation studies (including many of those summarized in Table 1) and are used herein.
To avoid the discontinuity in the MODIS retrieval algorithm due to different assumed aerosol types (Levy et al., 2007), we confine our analyses of model skill to longitudes east of 98° W. Only WRF grid cells with cloud fraction equaling 0 during the satellite overpass of each grid cell are used in comparison to MODIS and MISR observations, and only grid cells with at least five valid observations (both from MODIS and MISR and cloud-screened WRF) during a given month are included in the analyses presented herein.
It is worth noting that setting a threshold of 10 observations does not significantly affect the results. For a uniform assessment, L2 MODIS and L3 MISR data have been interpolated from their native grids (and resolutions of 10 km and 0.5° × 0.5°, respectively) to the WRF-Chem 12 km resolution grid by computing the mean of pixels with valid data within 0.1° and 0.3° for MODIS and MISR, respectively, from the model centroids. The choice of averaging over a slightly larger area than model resolution is dictated by the sparsity of valid satellite retrievals. For AERONET vs. MODIS comparison, we only use the nearest MODIS data (after regridding to WRF) to each site. Where hourly WRF-Chem output is compared with data from AERONET sites, a station is only included if there are at least 20 simultaneous estimates available, and each AERONET measurement is compared to the nearest WRF-Chem time step and to the grid cell containing the station.

Statistical methods used in the model evaluation
The primary error metric of overall model performance used herein is the mean fractional bias (MFB) (Boylan and Russell, 2006):
$$\mathrm{MFB} = \frac{1}{N}\sum_{i=1}^{N} \frac{C_{m,i} - C_{0,i}}{\left(C_{0,i} + C_{m,i}\right)/2}. \quad (3)$$
MFB is a useful model performance indicator since it weights positive and negative biases equally. It varies between +2 and −2 and has a value of 0 for an ideal model. Where MFB is reported for WRF-Chem versus MODIS, MISR and AERONET, C_m is the monthly mean AOD or AE simulated by WRF-Chem at a specific location, C_0 refers to the same quantity from remote-sensing data (Table 3) and N is the sample size.
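The MFB can be sketched in a few lines. The function below follows the Boylan and Russell (2006) definition, in which each model-minus-observation difference is normalized by the mean of the pair; the function name, the handling of missing values (None) and the sample values are assumptions made for illustration only.

```python
def mean_fractional_bias(model, obs):
    """Mean fractional bias of paired model and observed values.

    Pairs with a missing value (None) or a zero sum are skipped.
    The result lies between -2 and +2 and is 0 for a perfect model.
    """
    terms = [
        (m - o) / ((m + o) / 2.0)
        for m, o in zip(model, obs)
        if m is not None and o is not None and (m + o) != 0
    ]
    if not terms:
        raise ValueError("no valid model/observation pairs")
    return sum(terms) / len(terms)


# Hypothetical monthly mean AOD at three grid cells (illustration only).
wrf_aod = [0.25, 0.18, 0.40]
modis_aod = [0.20, 0.15, 0.35]
print(f"MFB = {mean_fractional_bias(wrf_aod, modis_aod):.3f}")
```

Because the normalization is symmetric, swapping the model and the observations only changes the sign of the result, which is why the metric weights positive and negative biases equally.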
The evaluation of WRF-Chem simulations of AOD and AE relative to satellite retrievals (MODIS and MISR) is also summarized using Taylor diagrams (Taylor, 2001) produced from the monthly means for the grid cells with simultaneous data availability.Taylor diagrams synthesize three aspects of model skill focused on evaluations of the spatial fields of the parameter of interest: the correlation coefficient of the modeled vs. observed field which is expressed by the azimuthal position; the root mean squared difference which is proportional to the distance between a point and the reference point on the x axis (at 1, 0); and the ratio of simulated and observed spatial standard deviation which is proportional to the radial distance from the origin.
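The three quantities summarized by a Taylor diagram can be computed as sketched below. This assumes the centered (bias-removed) root mean squared difference of Taylor (2001), normalized by the observed standard deviation so that the reference point sits at (1, 0); the function name and sample values are hypothetical.

```python
import math


def taylor_statistics(model, obs):
    """Return (correlation, std ratio, normalized centered RMSD)."""
    n = len(model)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    # Population standard deviations of the two spatial fields.
    std_m = math.sqrt(sum((m - mean_m) ** 2 for m in model) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs)) / n
    corr = cov / (std_m * std_o)
    # Centered RMSD: the mean of each field is removed first.
    crmsd = math.sqrt(
        sum(((m - mean_m) - (o - mean_o)) ** 2 for m, o in zip(model, obs)) / n
    )
    return corr, std_m / std_o, crmsd / std_o


# Hypothetical monthly mean AOD over five grid cells (illustration only).
wrf = [0.12, 0.20, 0.31, 0.18, 0.25]
modis = [0.10, 0.22, 0.27, 0.15, 0.30]
r, ratio, e = taylor_statistics(wrf, modis)
print(f"r = {r:.2f}, std ratio = {ratio:.2f}, normalized RMSD = {e:.2f}")
```

The three statistics are linked by the law of cosines, e² = 1 + ratio² − 2·ratio·r, which is what allows them to be shown as a single point on the diagram.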
To investigate model performance at given locations through time, empirical quantile-quantile (EQQ) plots are constructed using high-frequency realizations of AOD and AE at individual locations (AERONET sites) relative to WRF-Chem values simulated in the grid cell containing the measurement site.EQQ plots are thus generated for each of the AERONET stations using all hours when there are simultaneous estimates available from the direct observations and from the numerical simulations.The advantage of EQQ plots is that they make no assumptions regarding the underlying form of the data and can be readily used to determine which parts of the modeled distribution deviate from the observations (and thus fall away from a 1 : 1 line).
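An EQQ plot pairs the same empirical quantiles of two samples. The sketch below assumes linearly interpolated quantiles evaluated at the 5th to 95th percentiles in steps of 5; these choices, the function names and the sample values are illustrative assumptions and are not taken from the analysis itself.

```python
def empirical_quantiles(values, probs):
    """Linearly interpolated empirical quantiles of a sample."""
    xs = sorted(values)
    n = len(xs)
    out = []
    for p in probs:
        pos = p * (n - 1)          # fractional index into the sorted sample
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(xs[lo] + frac * (xs[hi] - xs[lo]))
    return out


def eqq_pairs(obs, model, probs=None):
    """Pairs (observed quantile, modeled quantile) for an EQQ plot."""
    if probs is None:
        probs = [i / 100 for i in range(5, 100, 5)]
    return list(zip(empirical_quantiles(obs, probs),
                    empirical_quantiles(model, probs)))


# Hypothetical simultaneous hourly AOD at one site (illustration only).
aeronet = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.26, 0.35]
wrf = [0.07, 0.11, 0.15, 0.16, 0.22, 0.30, 0.41, 0.50]
for q_obs, q_mod in eqq_pairs(aeronet, wrf, probs=[0.25, 0.5, 0.75]):
    side = "above" if q_mod > q_obs else "on or below"
    print(f"obs {q_obs:.3f} vs model {q_mod:.3f}: {side} the 1:1 line")
```

Points above the 1:1 line at a given quantile indicate that the modeled distribution is shifted towards higher values in that part of the distribution.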
The validity of AE estimates is a function of both the absolute magnitude of AOD and the uncertainty in the wavelength-dependent AOD. AE provides information regarding the relative abundance of fine to coarse particles. Thus, here we quantify the model skill in reproducing spatial patterns of fine- and coarse-mode particles observed by MODIS (Terra) by comparing the frequency distribution of AE lower and higher than 1 to distinguish populations dominated by coarse and fine aerosols, respectively, in WRF-Chem and MODIS (Valenzuela et al., 2014; Pereira et al., 2011). The choice of this threshold reflects the AE distribution. AE simulated by WRF-Chem generally conforms to a single normal distribution centered on 1 during January–April and on 1.3 from May–June to December; AERONET time series also tend to conform to a single mode, while MODIS estimates typically are bimodally distributed (see Fig. S1). We therefore consider the data in the form of a contingency table (Table 4) and perform a χ² test to assess whether the frequency distribution of fine and coarse particles is the same between MODIS and WRF-Chem. The χ² statistic is applied with 1 degree of freedom and a 99 % confidence limit.
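A χ² test of this kind can be sketched as below. The layout of the table (rows for data source, columns for coarse and fine classes), the use of the Pearson statistic without continuity correction, and the grid-cell counts are all assumptions made for illustration; the actual layout of Table 4 is not reproduced here. The critical value 6.635 is the standard χ² threshold for 1 degree of freedom at the 99 % level.

```python
CHI2_CRITICAL_DF1_99 = 6.635  # chi-square critical value, df = 1, 99 % level


def chi2_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical counts of grid cells classified as coarse (AE < 1) or
# fine (AE > 1); rows are WRF-Chem and MODIS (illustration only).
counts = [[30, 10],   # WRF-Chem: coarse, fine
          [20, 40]]   # MODIS:    coarse, fine
stat = chi2_2x2(counts)
differs = stat > CHI2_CRITICAL_DF1_99
print(f"chi2 = {stat:.2f}; distributions differ at 99 %: {differs}")
```

A statistic above the critical value would lead to rejecting the hypothesis that the two sources share the same frequency distribution of fine and coarse classes.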
As described above, the impact of aerosols on climate and human health is strengthened under conditions of enhanced aerosol concentrations; thus, two analyses were undertaken to evaluate the ability of the WRF-Chem simulations to represent extreme AOD values:
1. Evaluation of the spatial patterns of extreme events. Using daily estimates of AOD in each grid cell and month, we identified the 75th percentile value across space (i.e., p75) as a threshold for extreme AOD for WRF-Chem and MODIS separately. Grid cells with AOD exceeding that threshold were classified as exhibiting extreme values. The consistency in the spatial distribution of extreme values as simulated by WRF-Chem relative to MODIS is quantified using three skill statistics: the accuracy, hit rate (HR) and threat score (TS) defined in Eqs. (4)–(6). In these equations, WE, ME, WN and MN correspond to the frequency of extreme conditions in WRF-Chem (WE) or MODIS (ME) or neither (WN or MN). [...] The accuracy describes the fraction of grid cells co-identified as exceeding p75 or not in MODIS and WRF-Chem and thus weights event and non-event conditions equally. Since the accuracy quantifies model skill in correctly identifying both extreme and non-extreme aerosol loadings, it is indicative of model performance in capturing the overall AOD spatial variability. In this application, where extreme is identified as the 75th percentile, a value of 0.5 would indicate that none of the grid cells experiencing extreme events were reproduced by the model, while 1 would indicate perfect identification of events and non-events. The HR and TS metrics give "credit" only to those grid cells identified as "extreme". For these metrics, a value of 0 indicates no correct identification of grid cells with extreme values, while a perfect model would exhibit a value of 1.

2. Evaluation of the scales of coherence of extreme AOD.
For each day during the overpass time and hours of clear-sky conditions, we determine whether AOD simulated at our reference location (i.e., the center of the domain, in southern Indiana) is equal to or larger than the local p75 for that grid cell and season and then identify all grid cells in the domain that also satisfy the condition of AOD ≥ local p75.The reference location represents the center of gravity of the domain and was previously used by Sullivan et al. (2015) for assessing scales of coherence.In that work they also found that the spatial scales of coherence are not sensitive to the precise choice of reference location.For each season, we thus compute the probability of extreme AOD co-occurrence at our reference site and any other grid cell as the frequency of co-occurrence divided by the number of extreme occurrences at the reference location.The spatial scales of extreme AOD are then estimated by binning the radial distance of each grid cell centroid from the domain center into 100 km distance classes.An analogous procedure is applied to L2 MODIS data to compare them with simulations.
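The skill statistics used in the first of the two analyses above can be sketched as follows. The formulas are the standard forecast-verification definitions of accuracy, hit rate and threat score, and it is an assumption that they match the form used in Eqs. (4)–(6); the function names, the inclusive percentile method and the AOD values are likewise illustrative.

```python
import statistics


def extreme_mask(values):
    """Flag values above the 75th percentile computed across space."""
    p75 = statistics.quantiles(values, n=4, method="inclusive")[2]
    return [v > p75 for v in values]


def skill_scores(wrf_extreme, modis_extreme):
    """Return (accuracy, hit rate, threat score) for two boolean masks."""
    pairs = list(zip(wrf_extreme, modis_extreme))
    hits = sum(w and m for w, m in pairs)                 # extreme in both
    false_alarms = sum(w and not m for w, m in pairs)     # WRF-Chem only
    misses = sum(m and not w for w, m in pairs)           # MODIS only
    correct_neg = sum(not w and not m for w, m in pairs)  # extreme in neither
    accuracy = (hits + correct_neg) / len(pairs)
    observed = hits + misses
    hit_rate = hits / observed if observed else float("nan")
    union = hits + misses + false_alarms
    threat = hits / union if union else float("nan")
    return accuracy, hit_rate, threat


# Hypothetical AOD in eight grid cells with opposite spatial patterns.
wrf_aod = [1, 2, 3, 4, 5, 6, 7, 8]
modis_aod = [8, 7, 6, 5, 4, 3, 2, 1]
acc, hr, ts = skill_scores(extreme_mask(wrf_aod), extreme_mask(modis_aod))
print(f"accuracy = {acc}, HR = {hr}, TS = {ts}")
```

In this deliberately mismatched example no extreme cell is co-identified, so HR and TS are 0 while the accuracy is 0.5, consistent with the lower bound on accuracy noted in the text for a 75th percentile threshold.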

Evaluation of AOD
Overall, WRF-Chem is positively biased relative to remotely sensed AOD. The spatial MFB is 0.15 (0.14) when computed using all available MODIS measurements from Terra (Aqua) and 0.50 relative to data from the AERONET stations (Table 3). The sign of this bias is consistent across the entire simulation domain (Fig. 2). These results agree with findings from previous regional studies that have also shown an overestimation of AOD by WRF-Chem over eastern North America and Europe (i.e., regions dominated by sulfate aerosols) and an underestimation in the western USA and most of the rest of the globe (Zhang et al., 2012; Colarco et al., 2010; Curci et al., 2014) (Table 1). Higher biases of WRF-Chem-simulated annual mean AOD are found in the southern portion of the domain (Fig. 2), where the model also exhibits a positive bias in daily mean near-surface PM2.5 relative to observations from 1230 US EPA sites (see Figs. 3 and S2). We further investigated the bias in PM2.5 by comparing WRF-Chem simulations with ground-based measurements of particle composition at 123 IMPROVE sites over our domain. We computed the MFB on a seasonal basis between sulfate and nitrate concentrations in fine-mode particles (i.e., Aitken and accumulation mode) versus observations (Fig. 4) and found sulfate concentrations are underestimated almost over the entire domain during winter, whereas a positive bias is present in the other seasons. Conversely, nitrates tend to be overestimated during winter and fall at most sites, whereas they are underestimated during summer. Thus, the positive bias in AOD and PM2.5 mass, particularly during the summer, appears to be associated with excess sulfate concentrations.
The MFB of WRF-Chem relative to MODIS estimates of AOD is lower than the MFB relative to most of the AERONET stations except for a few sites located along the coast, one polluted site in the northeast and at a few land sites in the north or northwest (Figs. 1 and 5a).This is possibly a result of an inability of the model to capture variations in aerosol optical properties occurring on a local scale (below the resolution of 12 km).However, the evaluation statistics for WRF-Chem relative to AERONET did not vary consistently with the classification of AERONET stations.Indeed, the mean MFB for AOD at coastal, polluted and land sites varies between 0.26 (coastal) and 0.67 (land), whereas for AE it varies between −0.72 (coastal) and −0.50 (land).When MODIS is compared to the 22 AERONET stations the MFB is −1.23 suggesting an underestimation of AOD from AERONET relative to MODIS.The large bias can be explained noting that the number of co-samples in MODIS is quite small and that MFB is strongly impacted by a few outliers.When we remove the three most biased sites (one land site in the north and two sites along the east coast), the MFB decreases to −0.91.
Using very limited data, prior research indicated that mesoscale variability (horizontal scales of 40-400 km and temporal scales of 2-48 h) is a common and perhaps universal feature of lower-tropospheric aerosol light extinction (Anderson et al., 2003). However, we are not aware of prior systematic attempts to quantify and test the universality of AOD scales of coherence over the contiguous USA. To test the sensitivity of the MFB in simulated AOD to spatial aggregation, we excluded the first 12 cells on the western (left) and northern (top) edges of the simulated domain and averaged the remaining 12 × 12 km grid cells over the following scales: 24 × 24, 36 × 36, 48 × 48, 72 × 72, 96 × 96, 144 × 144, 192 × 192, 216 × 216, 288 × 288, 384 × 384, 432 × 432, 576 × 576, 864 × 864, 1152 × 1152, 1728 × 1728, 3456 × 3456 km. The last spatial average corresponds to a single grid cell encompassing the entire domain (excluding the outer 12 cells located to the west and north of the simulation domain). Each spatial average at a coarser resolution is computed as the mean of all valid 12 × 12 km grid cells within the averaging area. We then computed the MFB for the regridded WRF-Chem and MODIS data pair and found that, on a yearly basis, MFB is highest at 12 km (0.14 for Aqua and 0.15 for Terra) and reaches a first minimum at 72 km for Aqua (MFB = 0.13) and 384 km for Terra (MFB = 0.13) (see Fig. 6). However, the MFB, and hence systematic error in AOD relative to MODIS, exhibits only a weak dependence on the level of spatial aggregation.
Spatial patterns of monthly mean AOD show the largest differences relative to MODIS during winter months in the southern states and near the coastlines, which show MFB up to 0.7 and lower spatial correlation (see Fig. 7a). This may be due to the larger uncertainty in MODIS retrievals near the coast (Anderson et al., 2013), the smaller sample size in the observations (particularly at high latitudes) during December to March or the lower overall AOD. Conversely, the spatial correlation is maximized during the summer (r = 0.5-0.7) for MODIS and August for MISR, when most data are available. The spatial variability of monthly mean AOD fields is also well simulated by WRF-Chem during the warm season (months May-August), as indicated by the ratio of the spatial standard deviation which is close to 1. However, σ(AOD) is usually higher in MODIS and/or MISR than in WRF-Chem. The root mean squared difference (RMSD) is largest and the spatial correlation is lowest during September and October, when MFB is also > 0.4, in part because WRF-Chem simulates high AOD and aerosol nitrate and sulfate concentrations over large regions in eastern North America (Figs. S3 and 4). The high positive bias in these months is also reflected in the near-surface PM2.5 concentrations and their composition (Figs. S2 and 4). A possible explanation for the relatively poor model performance during September and October may derive from the simulation of precipitation. During the majority of calendar months, domain-averaged precipitation as simulated by WRF-Chem is slightly positively biased relative to the gridded observational data. However, during September and October, the model exhibits a negative bias (of 8-10 % relative to observations) and a substantial underestimation of precipitation in regions of typically high AOD such as the Ohio River valley and along the east coast (Fig. S4).
We also examined the impact of spatial aggregation (at 12, 24, 36, 48, 72 and 96 km resolution) on the seasonality of model performance. For AOD the spatial correlations are largest for most months when data are aggregated to a resolution of 24 × 24 km, and the ratio of spatial standard deviation is also closer to 1 when AOD is spatially aggregated, possibly indicating that the spatial patterns simulated by WRF-Chem on a fine scale do not always match those observed by MODIS (Fig. 8). For AE both spatial correlations and the ratio of standard deviations do not vary significantly when data are aggregated to a coarser resolution (Fig. S5).
Empirical quantile-quantile (EQQ) plots of AOD at AERONET stations computed for both simultaneous MODIS observations and WRF-Chem with AERONET observations indicate that the positive bias in WRF-Chem-simulated values of AOD is evident across much of the probability distribution (5th to 95th percentile values) at most AERONET stations. It is worth noting, however, that WRF-Chem comparisons with AERONET observations occupy much of the same observational range as simultaneous MODIS and AERONET observations at those sites (Fig. 9a), although the EQQ plot does not necessarily compare the same MODIS-AERONET and WRF-Chem-AERONET data pairs (i.e., the sample used to compare AERONET and MODIS may differ from that used to compare WRF-Chem and AERONET due to the cloud screening procedure). Thus, model simulations reproduce the range and probability of low-uncertainty AERONET measured AOD nearly as well as MODIS.
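The spatial aggregation test described in this section can be illustrated with a short sketch. It is not the authors' code: the function names `block_mean` and `mfb` are hypothetical, missing retrievals are represented by `None`, and the MFB expression used is the commonly adopted definition, MFB = (1/N) Σ 2(Cm − Co)/(Cm + Co), which is assumed here rather than taken from this section. The toy fields contain illustrative values only.

```python
def block_mean(grid, k):
    """Average a 2-D grid over non-overlapping k x k blocks, ignoring None cells."""
    ny, nx = len(grid), len(grid[0])
    out = []
    for j0 in range(0, ny - ny % k, k):
        row = []
        for i0 in range(0, nx - nx % k, k):
            vals = [grid[j][i]
                    for j in range(j0, j0 + k)
                    for i in range(i0, i0 + k)
                    if grid[j][i] is not None]
            # A block with no valid cell stays missing.
            row.append(sum(vals) / len(vals) if vals else None)
        out.append(row)
    return out


def mfb(model, obs):
    """Mean fractional bias over cells valid in both fields (assumed definition)."""
    terms = [2.0 * (m - o) / (m + o)
             for mrow, orow in zip(model, obs)
             for m, o in zip(mrow, orow)
             if m is not None and o is not None and (m + o) != 0]
    if not terms:
        raise ValueError("no valid model/observation pairs")
    return sum(terms) / len(terms)


# Toy 4 x 4 AOD fields on the native grid (illustrative values only).
wrf = [[0.30, 0.10, 0.20, 0.20],
       [0.10, 0.30, 0.20, 0.20],
       [0.25, 0.25, None, 0.40],
       [0.25, 0.25, 0.40, 0.40]]
modis = [[0.20, 0.20, 0.20, 0.20],
         [0.20, 0.20, None, 0.20],
         [0.20, 0.20, 0.30, 0.30],
         [0.20, 0.20, 0.30, 0.30]]

mfb_native = mfb(wrf, modis)                                   # native resolution
mfb_coarse = mfb(block_mean(wrf, 2), block_mean(modis, 2))     # 2 x 2 aggregation
```

On a 12 km grid, k = 2, 3, 4, ... would correspond to the 24, 36, 48 km (and coarser) averages; blocks without any valid cell remain missing and drop out of the MFB.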

Evaluation of AE
Despite the low confidence in AE retrievals from MODIS, the comparison of WRF-Chem with the remote-sensing estimates indicates some degree of agreement. The overall MFB of WRF-Chem vs. MODIS Terra is −0.09 (−0.11 vs. Aqua), and the correlation between WRF-Chem and MODIS monthly mean AE seems to be independent of season and lies between 0.20 and 0.54 for all months except April, May and November when it is lower, whereas r is always < 0.14 when compared to MISR (Fig. 7b). The AE RMSD relative to MODIS or MISR does not exhibit a clear seasonal pattern and the ratio of spatial standard deviations in the AE fields is always lower than 1, indicating more spatial variability in the satellite retrievals than in WRF-Chem. The degree to which these results are symptomatic of the difficulties in retrieving AE from the remote-sensing observations is unclear. When the AE values are treated as binary samples (AE < 1 indicating that coarse-mode aerosols dominate and AE > 1 indicating dominance of the fine mode) and presented as a contingency table, WRF-Chem and MODIS simultaneously identify coarse-mode dominance (i.e., AE < 1) in 18 % of grid cells (Table 5). WRF-Chem simulates 31 % of grid cells as exhibiting annual mean AE > 1, while MODIS indicates a larger fraction of grid cells with AE > 1 (80 %, Table 5). Both WRF-Chem and MODIS indicate the highest prevalence of fine-mode particles during the warm months, with the highest agreement for co-identification (above 50 %) during June-September.
Co-identification of coarse-mode particles is highest in the winter and spring months (above 20 % during February-May and December, Table 5). However, when a χ2 test is applied to the frequency of fine and coarse particles identified by WRF-Chem and MODIS, for all months except January and April, the p value is < 0.01; thus, we reject the null hypothesis of equal distribution of fine- and coarse-mode particles identified by MODIS and WRF-Chem. The two data sets agree on 29 % of the cases when trying to identify fine-mode particles and approximately 53 % of the cells are misclassified, with MODIS usually identifying a higher prevalence of fine aerosols than WRF-Chem. AE from WRF-Chem is also negatively biased relative to AERONET observations, with MFB = −0.59 indicating a greater prevalence of coarse-mode aerosols in the simulations (Table 3, Fig. 2).
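The binary fine/coarse classification behind the contingency table can be sketched as follows. This is a minimal illustration with a hypothetical function name (`ae_contingency`) and toy AE values; assigning cells with AE exactly equal to 1 to the coarse class is an assumption of the sketch, since the text only defines AE < 1 and AE > 1.

```python
def ae_contingency(wrf_ae, modis_ae, threshold=1.0):
    """Fractions of paired cells in each fine/coarse class.

    Keys follow the WF/MF naming of the contingency table: W = WRF-Chem,
    M = MODIS, F = fine (AE > threshold), C = coarse (otherwise; treating
    AE equal to the threshold as coarse is an assumption of this sketch).
    """
    counts = {"WF/MF": 0, "WC/MC": 0, "WF/MC": 0, "WC/MF": 0}
    n = 0
    for w, m in zip(wrf_ae, modis_ae):
        if w is None or m is None:
            continue  # skip cells without a valid model/satellite pair
        key = ("WF" if w > threshold else "WC") + "/" + ("MF" if m > threshold else "MC")
        counts[key] += 1
        n += 1
    if n == 0:
        raise ValueError("no valid pairs")
    return {k: v / n for k, v in counts.items()}


# Toy AE values (illustrative only); the last pair is dropped as missing.
wrf_ae = [1.3, 0.8, 0.9, 1.2, 0.7, None]
modis_ae = [1.5, 0.6, 1.4, 0.9, 1.1, 1.2]
table = ae_contingency(wrf_ae, modis_ae)
```

The annual fractions quoted above are mutually consistent under this bookkeeping: 29 % (WF/MF) plus 18 % (WC/MC) leaves about 53 % of cells with differing classification, and a 31 % fine fraction for WRF-Chem implies roughly 2 % WF/MC and 51 % WC/MF, which gives about 80 % fine cells for MODIS (values are approximate because of rounding).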
EQQ plots for all sites show good accord between WRF-Chem and AERONET observations, as indicated by the relatively consistent fractional error across the entire range of simulated and observed AE (Fig. 9b). Simulations from previous studies have also shown a systematic negative bias of simulated AE versus MODIS observations. AE is very difficult to derive from the MODIS measurements, and the uncertainty in AE scales with AOD (AE is very uncertain at AOD < 0.2). Further, AE is derived from wavelength-dependent AOD; thus, the uncertainties in the measurements are certainly correlated. As indicated in Fig. 5, for some AERONET sites there is evidence that positive bias in AOD is associated with a high negative bias in AE, but this does not uniformly occur over eastern North America (e.g., for the site at 77.8° W, 55.3° N, WRF-Chem exhibits a positive bias in AOD across the entire probability density function (pdf) while the simulated AE is negatively biased, but the site at 84.28° W, 35.95° N exhibits relatively good accord for AOD but is negatively biased in AE almost to the same amount as the northern station). Highest biases have been noted in regions dominated by dust aerosols or when the model overestimates the dust loading, since aerosol population mean diameter is inversely proportional to AE (Colarco et al., 2014; Balzarini et al., 2014). Sources of the biases in our study include the simplified treatment of the size distribution, weaknesses in the emission inventory or uncertainties in meteorological variables affecting particle growth (e.g., temperature and relative humidity). Future work will focus on examining these sensitivities.
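The link between AE and wavelength-dependent AOD noted above can be made concrete with the standard two-wavelength form, AE = −ln(τ1/τ2)/ln(λ1/λ2). The sketch below is illustrative only: the wavelength pair (470 and 660 nm) and the perturbation size are assumptions, and `ae_shift` is a hypothetical helper showing why a fixed absolute AOD error matters more at low AOD.

```python
import math


def angstrom_exponent(aod_short, aod_long, wl_short_nm, wl_long_nm):
    """Angstrom exponent from AOD at two wavelengths (requires positive AOD)."""
    return -math.log(aod_short / aod_long) / math.log(wl_short_nm / wl_long_nm)


def ae_shift(aod_short, alpha, delta, wl_short_nm=470.0, wl_long_nm=660.0):
    """Change in AE when only the short-wavelength AOD is perturbed by delta.

    The long-wavelength AOD is built from a power law with exponent alpha,
    so the unperturbed AE equals alpha. Wavelengths are assumed values.
    """
    aod_long = aod_short * (wl_long_nm / wl_short_nm) ** (-alpha)
    perturbed = angstrom_exponent(aod_short + delta, aod_long, wl_short_nm, wl_long_nm)
    return perturbed - alpha


# Same absolute AOD error (0.02) applied at low and at high aerosol loading.
shift_low = ae_shift(0.10, alpha=1.2, delta=0.02)
shift_high = ae_shift(0.50, alpha=1.2, delta=0.02)
```

In this toy case the AE error at AOD = 0.10 is several times larger than at AOD = 0.50, which is consistent with AE being poorly constrained at low AOD.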

AOD extremes
Averaged across the entire simulation period, WRF-Chem correctly identifies 70 % of locations with extreme and non-extreme AOD in the MODIS observations (i.e., Accuracy = 70 %, Table 6). The overall TS and HR also indicate that the geographic location of extreme AOD is similar between the model and satellite retrievals. The annual mean HR, which is defined as the proportion of grid cells with extreme AOD co-identified by WRF-Chem and MODIS relative to MODIS extremes, is 41 %. The annual mean TS, which also takes into account false alarms, is 27 % (Table 6).
For each month, the HR is significantly higher than the probability of co-identification of extremes by random chance (i.e., p0 = 0.25² = 0.0625), since the test statistic N is always larger than the critical value at 1 % (i.e., 2.575). HR and TS vary seasonally, with highest skill during summer months (HR up to 70 % and TS up to 54 %) and lowest skill during winter and early spring (minimum HR = 29 % and minimum TS = 17 %) (Table 6 and Fig. 10).
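The skill metrics defined above (Accuracy, TS and HR) can be sketched for a pair of spatial fields as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the nearest-rank percentile is only one of several conventions, and `z_vs_chance` assumes the usual normal approximation for a proportion applied to the fraction of co-identified cells, which may differ from the test statistic defined in the paper's methods section.

```python
import math


def percentile_threshold(values, q=0.75):
    """Nearest-rank percentile of the valid values (one of several conventions)."""
    s = sorted(v for v in values if v is not None)
    if not s:
        raise ValueError("no valid values")
    return s[max(0, math.ceil(q * len(s)) - 1)]


def extreme_skill(model, obs, q=0.75):
    """Accuracy, threat score (TS) and hit rate (HR) for cells above each field's own percentile."""
    pairs = [(m, o) for m, o in zip(model, obs) if m is not None and o is not None]
    m_thr = percentile_threshold([m for m, _ in pairs], q)
    o_thr = percentile_threshold([o for _, o in pairs], q)
    hits = sum(m > m_thr and o > o_thr for m, o in pairs)
    false_alarms = sum(m > m_thr and o <= o_thr for m, o in pairs)
    misses = sum(m <= m_thr and o > o_thr for m, o in pairs)
    correct_neg = len(pairs) - hits - false_alarms - misses
    if hits + misses == 0:
        raise ValueError("no observed extremes")
    return {
        "accuracy": (hits + correct_neg) / len(pairs),
        "TS": hits / (hits + false_alarms + misses),
        "HR": hits / (hits + misses),
        "hits": hits,
        "n": len(pairs),
    }


def z_vs_chance(hits, n, p0=0.25 ** 2):
    """Normal-approximation z score of the co-identified fraction against chance (assumed form)."""
    return (hits / n - p0) / math.sqrt(p0 * (1 - p0) / n)


# Toy monthly mean AOD at eight cells (illustrative values only).
wrf = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
modis = [0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 0.6, 0.7]
skill = extreme_skill(wrf, modis)
z = z_vs_chance(skill["hits"], skill["n"])
```

In this toy case one cell is a hit, one a false alarm and one a miss, so HR exceeds TS because TS also penalizes the false alarm.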
The relatively low skill in identifying the spatial occurrence of high AOD during winter and spring may reflect the relatively low AOD and low spatial variability during this season, which means "extreme" AOD may differ only marginally from the "non-extreme" areas (see Fig. S6 for monthly comparisons of extreme area identification).
The spatial distribution of extreme AOD also displays some seasonality, with areas of AOD > p75 concentrated over coastal regions and the southern states during summer months and smaller areas during winter and early spring (Fig. 10). Despite the relatively low simultaneous identification of extremes during cold seasons, the location of extremes moves from the coast to the Great Lakes region and Midwest states in both the model and MODIS (see Fig. S6). During winter and spring months, WRF-Chem simulates more areas with extreme AOD over coastal regions, whereas MODIS shows more spatial variability and indicates higher AOD in the Great Lakes area and in the states west of Illinois. Conversely, WRF-Chem underestimates areas of extreme AOD relative to MODIS in the northern regions of the domain, possibly due to the underestimation of sulfate aerosol. These two observations may be explained by noting that the mass fraction of aerosol nitrate in the accumulation and coarse mode predicted by WRF-Chem during most of the fall and winter months dominates the sulfate fraction over virtually all of the domain (see Fig. S3), whereas point observations indicate that aerosol nitrate mass fraction is dominant only over the Central Great Plains (Hand et al., 2012). This may be related to an overestimation of aerosol nitrate in winter and fall (Fig. 4) as a result of the impact of air temperature and relative humidity on aerosol ammonium nitrate (NH4NO3) stability (Aksoyoglu et al., 2011), as well as an underestimation of aerosol sulfate, mostly during winter (Fig. 4) and likely due to an underestimation of the rate of SO2 gaseous and aqueous (missing) oxidation or an underestimation of the nighttime boundary layer height which impacts sulfate formation near the surface (Tuccella et al., 2012). Localized negative biases in the model over the coast may be associated with the higher uncertainties in MODIS retrievals at coastlines (Anderson et al., 2013).
Table 6. Synthesis of the skill with which WRF-Chem identifies the spatial distribution and location of extreme AOD values. Cells with extreme AOD are identified as exceeding the 75th percentile computed on a monthly basis across space from monthly averaged daily means. The second column reports the Accuracy, which indicates the spatial coherence of extremes and non-extremes between WRF-Chem and MODIS. The Accuracy metric is computed as the sum of cells co-identified as exceeding the 75th percentile and not exceeding that threshold by WRF-Chem and MODIS (Terra) relative to the total number of cells with valid data (fifth column, N). The third column reports the Threat Score (TS), which indicates the probability of correctly forecasting extreme AOD conditional upon either forecasting or observing extremes. The fourth column shows the hit rate (HR) (i.e., probability of correct forecast), which is the proportion of cells correctly identified as extremes by WRF-Chem relative to MODIS extremes. Values in parentheses refer to the same metrics when comparing WRF-Chem and MODIS onboard the Aqua satellite.
Extreme AOD exhibits relatively large spatial scales of coherence in both the WRF-Chem simulations and MODIS L2 observations (Fig. 11). Consistent with prior analyses of L3 MODIS data (Sullivan et al., 2015), the largest scales of coherence are found in fall. In all seasons except winter, the probability of the co-occurrence of extremes at the domain center and any other grid cell in the simulation domain is > 0.5 up to a distance of 300 km. The simulated mean seasonal scales of extreme coherence are comparable to L2 MODIS AOD (Fig. 11), despite the larger variability in the MODIS data due to the limited retrievals with simultaneous extreme AOD at the reference location and at each other grid cell. Thus, consistent with prior research, this analysis indicates that extreme AOD occurs on large spatial scales and therefore may significantly impact regional climate.
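One plausible reading of the coherence calculation (the probability that a grid cell is extreme on days when the reference cell is extreme, averaged in 100 km distance classes) is sketched below. The function name, the gap-free boolean masks and the 60 km toy grid spacing are assumptions made for illustration; real MODIS fields contain missing retrievals that would need separate handling.

```python
import math
from collections import defaultdict


def coherence_by_distance(masks, ref, dx_km=12.0, bin_km=100.0):
    """Mean probability that a cell is extreme given the reference cell is extreme.

    masks: list of 2-D boolean grids (one per time step), True where AOD is extreme.
    ref:   (row, column) index of the reference cell.
    Returns {lower edge of distance bin in km: mean conditional probability}.
    """
    jr, ir = ref
    ref_times = [m for m in masks if m[jr][ir]]
    if not ref_times:
        raise ValueError("reference cell is never extreme")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for j in range(len(masks[0])):
        for i in range(len(masks[0][0])):
            if j == jr and i == ir:
                continue  # skip the reference cell itself
            prob = sum(bool(m[j][i]) for m in ref_times) / len(ref_times)
            dist = dx_km * math.hypot(j - jr, i - ir)
            b = int(dist // bin_km)
            sums[b] += prob
            counts[b] += 1
    return {b * bin_km: sums[b] / counts[b] for b in sorted(sums)}


# Toy example: a 1 x 3 grid with 60 km spacing and three time steps.
masks = [
    [[True, True, False]],
    [[True, True, True]],
    [[False, True, True]],
]
scales = coherence_by_distance(masks, ref=(0, 0), dx_km=60.0)
```

Here the neighbouring cell (60 km away) is extreme on both days when the reference cell is extreme, while the cell 120 km away is extreme on only one of them, so the probability falls from 1.0 to 0.5 between the first two distance classes.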

Discussion and concluding remarks
Aerosol direct and indirect radiative forcing on the climate system are highly uncertain. A systematic assessment of the ability of global and regional models to reproduce aerosol optical properties in the contemporary climate is essential to increasing confidence in future projections. We contribute to this growing literature by presenting high-resolution (12 km) simulations from WRF-Chem conducted over eastern North America during a year representative of average meteorological and aerosol conditions. We evaluate the simulations relative to daily MODIS and MISR observations, high-frequency AERONET measurements of AOD and AE, and near-surface PM2.5 mass and composition measurements. Results from this study show the following:
- After grid cells with any cloud presence are removed and considering only overpass hours, the domain-averaged simulated mean AOD is 0.22. Simulated AOD is positively biased relative to observations, with MFB = 0.14 when compared to MODIS-Aqua and MFB = 0.50 relative to AERONET (Figs. 1 and 2). A clear north-south gradient in AOD bias vs. MODIS is also observed. This positive bias is consistent across the entire probability distribution at most AERONET stations (Fig. 9) and is also evident in the comparison of modeled near-surface PM2.5 mass relative to daily mean observations distributed at 1230 stations across the domain (Fig. 3).
- Model skill in reproducing the spatial fields of monthly mean AOD as measured by the spatial correlation and ratio of the spatial variability with MODIS is maximized during the summer months (r ∼ 0.5-0.7, and ratio of σ ∼ 0.8 to 1.2). During this season observed AOD is higher and more observations are available (Fig. 7). Lowest model-observation agreement is found in September and October and is at least partially attributable to a dry bias in precipitation from WRF-Chem (Fig. S4).
- In part because of the difficulties in retrieving robust estimates of AE, few previous studies have evaluated model-simulated AE values. We show that AE as simulated by WRF-Chem over eastern North America is negatively biased relative to MODIS (MFB = −0.10) and AERONET (MFB = −0.59). This bias indicates that WRF-Chem simulates a larger fraction of coarse-mode particles than is evident in the remote-sensing observations (see Tables 3 and 5). While some of the bias relative to MODIS may reflect high observational uncertainty, the large bias relative to AERONET is consistent with prior research (Table 1) and is symptomatic of substantial systematic error in the aerosol size distribution.
- Causes of the model error may include insufficiently detailed treatment of size distribution or inaccurate representation of aerosol composition and mixing state which affect the simulated size distribution and thus AE (Li et al., 2015; Curci et al., 2014). Further, weaknesses in the emission inventory (e.g., size resolution of primary emissions), as suggested by the systematic bias in simulated PM2.5 concentrations relative to ground-based observations, and/or biases in the representation of meteorological conditions critical to determining aerosol nitrate concentrations may also affect model performance. Currently it is not possible to fully attribute the relative importance of these error sources.
- The majority of prior model evaluation exercises have tended to focus on mean AOD values. However, the climate and health impacts of aerosols are greater under high aerosol loadings. We demonstrate that WRF-Chem exhibits some skill in capturing the spatial patterns of extreme aerosol loading, especially during summer months. During this season, the hit rate for AOD > p75 reaches 70 %. Largest biases are found during winter months and near the coastlines where AOD from MODIS also exhibits largest retrieval uncertainty.
Despite the encouraging performance of WRF-Chem both in terms of simulation efficiency and in reproducing AOD (mean and extreme values) and the partial skill in reproducing AE over eastern North America, further investigations are needed to properly quantify the value added by running high-resolution simulations by direct comparison with analogous runs at a coarser resolution.Future simulations will also involve the assessment of accuracy of different aerosol schemes (i.e., sectional vs. modal approaches) to represent the size distribution.The inclusion of a direct description of new particle formation processes within WRF-Chem may also improve estimates of ultrafine-particle concentrations and thus of simulated aerosol optical properties.
The Supplement related to this article is available online at doi:10.5194/acp-16-397-2016-supplement.

Figure 1. Location of the AERONET stations (colored dots) used in this study and mean daily PM2.5 emissions (mg m−2 day−1) during 2008 (gray shading). Colors indicate the AERONET site classification based on Kinne et al. (2013): polluted (magenta), land (green), coastal (blue), unclassified (yellow). The numbers are mean fractional bias (MFB) for WRF-Chem vs. AERONET stations (red numbers indicate that WRF-Chem vs. AERONET has a larger MFB than WRF-Chem vs. MODIS, whereas black numbers indicate a lower bias in the comparison with AERONET).

Figure 2. Mean (a) AOD and (b) AE simulated by WRF-Chem during the year 2008. The mean values are computed after applying a cloud mask and are for the Terra overpass time. Mean fractional bias (MFB) for (c) AOD and (d) AE for WRF-Chem relative to MODIS (Terra) (similar results are found for Aqua). The inner black frame indicates the entire model domain, while as stated in the text, model evaluation is only undertaken for longitudes east of 98° W.

Figure 3. Mean daily PM2.5 concentrations (µg m−3) during 2008 as (a) simulated by WRF-Chem in the layer closest to the surface and (b) observed at 1230 EPA sites (note the different color bar). Panel (c) shows the probability density of daily mean PM2.5 concentrations observed (black line) and simulated (red line) at the measurement stations.

Figure 4. Mean fractional bias (MFB) of near-surface daily mean sulfate (first row) and nitrate (second row) concentrations in fine aerosol particles as simulated by WRF-Chem and observed in PM2.5 measurements at 123 IMPROVE sites in different seasons. A positive MFB indicates that WRF-Chem overestimates the observations. Note that the scales differ between the frames shown for sulfate and nitrate MFB, and that dots and diamonds refer to positive and negative MFB, respectively.

Figure 5. Summary statistics of comparisons of WRF-Chem simulations of (a) AOD and (b) AE relative to simultaneous observations at the AERONET sites. For a location to be included in this analysis at least 20 coincident observations and simulations must be available. The symbols at each AERONET station report MFB (outer square), root mean squared difference (RMSD, inner circle) and correlation coefficient (r, inner *). Note the different color bar for MFB and RMSD between the two frames. The correlation coefficient is displayed with different colors according to three classes: r < −0.1 (black), |r| < 0.1 (red) and r > 0.1 (white).

Figure 6. Mean fractional bias (MFB) of AOD from WRF-Chem as a function of spatial aggregation relative to observations from Terra (red line) and Aqua (blue line).

Figure 7. Taylor diagrams comparing the spatial fields of monthly mean (a) AOD and (b) AE from WRF-Chem vs. MODIS-Terra (color dots) or MISR (black squares). The numbers shown in the frames denote the month (e.g., 1 = January). The numbers shown in the legend indicate the sample size of WRF-Chem data used for computing the monthly mean, and the scale of the dots is proportional to the sample size. Note the change in scale for the ratio of standard deviations between the frames. The red dashed lines define the sector with a Pearson correlation coefficient between (a) 0.12 and 0.70 for AOD and (b) 0.20 and 0.54 for AE, which comprise at least two thirds of the months. Each dot or square summarizes the statistics (i.e., RMSD, ratio of standard deviations and correlation coefficient) of the WRF-Chem vs. MODIS or WRF-Chem vs. MISR comparison for a single month.

Figure 8. Taylor diagrams for AOD when MODIS observations and WRF-Chem simulations at 12 km are spatially aggregated to 24, 36, 48, 72 and 96 km. Numbers next to the colored dots and diamonds indicate different months (e.g., 1 = January).

Figure 9. Empirical quantile-quantile (EQQ) plots of (a) AOD and (b) AE of the 5th to 95th percentile as simulated by WRF-Chem relative to 22 AERONET stations (their longitude (E) and latitude (N) are reported in the legend). The yellow shading shows the data envelope for EQQ plots of AERONET and MODIS. For inclusion in the analysis, a location must have at least 20 coincident observations and simulations in the grid cell containing the AERONET station. Note that MODIS uncertainty in the retrievals (±0.05) in near-zero AOD conditions may lead to negative AOD values which are considered valid. The parameter space for MODIS-AERONET comparisons of AE is not shown because AE from the MODIS L2 data product is strongly bimodal (see examples given in Fig. S1 in the Supplement).

Figure 10. Spatial coherence in extreme AOD (i.e., the occurrence of AOD above the 75th percentile value) from WRF-Chem and MODIS Terra during (a) March 2008 and (b) July 2008. Green areas denote grid cells defined as experiencing extreme AOD only in the WRF-Chem simulations, blue pixels indicate extreme values as diagnosed using MODIS, while red pixels indicate areas where the occurrence of extreme values is indicated by both the WRF-Chem simulations and the MODIS observations.

Figure 11. Mean and error bars (±1 standard deviation from the mean) of the probability of co-occurrence of extreme AOD (i.e., AOD > 75th percentile) at the reference location (i.e., domain center) and any other simulated grid cell during different seasons. The distance between the reference point and each grid cell centroid was binned using 100 km distance classes. Solid lines indicate mean seasonal spatial scales simulated by WRF-Chem, whereas dashed lines are observed means from L2 MODIS data (only the mean of the coherence ratios is plotted for the MODIS data).

Table 1. Synthesis of some recent prior studies comparing simulated aerosol optical properties from global or regional model simulations with remote-sensing products. The first column summarizes the model used, the second the domain and the time period simulated, and the third shows the model resolution and summarizes the description of the aerosol size distribution. Columns 4 to 9 summarize the evaluation statistics in terms of the overall correlation coefficient (R), bias (as described using the mean fractional error (MFE)), and root mean square error (RMSE) or mean absolute error (MAE) relative to satellite or AERONET observations as reported in the references shown in column 10.

Table 2. Physical and chemical schemes adopted in the WRF-Chem simulations presented herein.
non-mountainous regions). Meteorological lateral boundary conditions are provided every 6 h from the North American Mesoscale (NAM) model applied at a 12 km resolution. The initial and boundary chemical conditions are based on output from the offline global chemical transport model MOZART-4 (Model for OZone And Related chemical Tracers, version 4), driven by meteorology from NCEP-NCAR reanalysis (Pfis-

Table 3. Spatial mean fractional bias (MFB) over the entire year. In the MFB definition, Cm is the monthly mean AOD or AE simulated by WRF-Chem at a specific location and C0 refers to the same quantity from MODIS, MISR and AERONET. Thus, a negative value indicates that the model is negatively biased relative to the observations. The total sample size N is 358 048 and 359 633 when comparing WRF-Chem with MODIS onboard Terra and Aqua, respectively. The comparison between MODIS and AERONET is affected by a few outlier sites, so the MFB when the three most biased sites are removed is given in parentheses. The mean domain-averaged AOD and AE from WRF-Chem (after applying the cloud screen and selecting only MODIS overpass hours) are 0.222 and 1.089, respectively.

Table 4. Contingency table used to compare the fraction of grid cells classified as fine, F (AE > 1), and coarse, C (AE < 1), by MODIS and WRF-Chem (indicated in the table by M and W, respectively).

Table 5. Contingency table showing the fraction of grid cells simultaneously identified as fine (WF/MF)- or coarse (WC/MC)-mode particles by WRF-Chem and MODIS, as well as cells with different classification (columns 4 and 5). Recall a threshold of AE = 1 is used to define fine (AE > 1) and coarse-mode (AE < 1) dominance. Months in bold indicate that the distributions of observed and simulated fine- or coarse-mode fractions are significantly different (p value < 0.01) according to the χ2 test described in Sect. 2.3. Columns: Month, WF/MF, WC/MC, WF/MC, WC/MF.