We examine the potential benefits of very high resolution for air-quality forecast simulations using a nested system of the Global Environmental Multiscale – Modelling Air-quality and Chemistry chemical transport model. We focus on simulations at 1 and 2.5 km grid-cell spacing for the same time period and domain (the industrial emissions region of the Athabasca oil sands). Standard grid cell to observation station pair analyses show no benefit to the higher-resolution simulation (and a degradation of performance for most metrics using this standard form of evaluation). However, when the evaluation methodology is modified, to include a search over equivalent representative regions surrounding the observation locations for the closest fit to the observations, the model simulation with the smaller grid-cell size had the better performance. While other sources of model error thus dominate net performance at these two resolutions, obscuring the potential benefits of higher-resolution modelling for forecasting purposes, the higher-resolution simulation shows promise in terms of better aiding localized chemical analysis of pollutant plumes, through better representation of plume maxima.
Numerical modelling of the atmosphere in an Eulerian framework relies on discretization of the computational domain into a numerical grid. The horizontal grid-cell size of atmospheric simulations can range from hundreds of kilometres to the metre-scale of large eddy simulation models. Air-quality model grid-cell size typically follows the grid-cell sizes used in weather forecasting models, which in turn have followed a gradual progression towards finer discretization where more explicit representation of cloud formation and local radiative transfer effects may be represented. The most recent weather forecasting applications (e.g. Leroyer et al., 2014) have reached grid-cell sizes as small as 250 m over limited domains such as individual cities, and have shown promising results in terms of being able to resolve some aspects of local circulation. In addition, as grid resolution reaches the 3 to 4 km scale, explicit cloud microphysics packages may be used, allowing potentially better performance, particularly with regards to feedbacks between meteorology and chemistry (Yu et al., 2014; Gong et al., 2015). However, while these models promise better physical representation of local chemistry, their performance may be limited by the quantity and availability of initialization and boundary condition meteorological data; these data may be used in a data assimilation context to improve their initial state. The accuracy of broader-scale meteorological predictions may thus influence local model accuracy, despite the ongoing decrease in meteorological model (and consequently air-quality model) grid-cell size. Some recent air-quality model simulation studies with grid-cell sizes on the order of 1 to 4 km include Thompson and Selin (2012), Li et al. (2014), Joe et al. (2014), Kheirbek et al. (2014, 2016), and Pan et al. (2017).
For the purposes of this study, very high resolution (VHR) modelling refers to the current higher-resolution limits of chemical transport models (CTMs), employing a horizontal grid-cell spacing of 1 km or less. It is in this regime that the photochemical processes may be forecasted with resolved microphysics (e.g. Milbrandt and Yau, 2005a, b) and detailed particle and gas-phase chemistry, using currently available computer technology. VHR modelling is very computationally expensive, and also introduces its own set of challenges, such as the availability of surface boundary condition fields as the model grid-cell size decreases. Moreover, it is not currently clear whether decreases in model grid-cell size leads to more accurate results when compared to observations. The motivation behind VHR modelling in CTMs is to reduce the impact of diluting chemical concentrations – especially from averaging emission plumes into large grid cells – in order to better capture inhomogeneities in emission profiles, to better simulate local transport processes associated with terrain that would otherwise be smoothed by the use of a coarse grid, and to reduce truncation errors and hence achieve better numerical accuracy (Jacobson, 1999).
We note here that while the terms “grid-cell size” and “resolution” tend
to be used interchangeably in the literature, this is not true in a precise
mathematical sense; more formally, the ability to resolve features of size
2
The issue of a model to distinguish these features is also compounded by uncertainties in model inputs. For example, in a large rural setting, a large model grid cell will represent an area containing many roads, whose emissions will be averaged into one value per species per time. As the grid-cell size decreases, however, this averaging effect will be reduced, giving each road's emissions more impact on the resulting concentrations in the grid cell containing it. However, the smaller grid-cell size will also result in steeper concentration gradients in the model between adjacent grid cells, which can in turn result in numerical instabilities that contaminate predictions (Salvador et al., 1999). At the same time, a reduction in grid-cell size can be shown formally to reduce inaccuracies in the discretization of the governing equations for atmospheric motion (Coiffier, 2011). Previous efforts to address these issues through variable grid size or structure in air quality modelling have not received sustained attention, and therefore most current air quality models use a uniform (albeit nested) grid-cell size in applications (Garcia-Menendez et al., 2010; Kumar et al., 1997).
As resolution increases further, the presence of local topographical features (e.g. buildings and street canyons) becomes more important. Both the increased topographic complexity and potential numerical instabilities can lead to differences in meteorological forcing as resolution increases (Wolke et al., 2012; Gego et al., 2005). The contribution of meteorological uncertainties due to resolution becomes more significant, especially for secondary pollutants such as ozone (Valari and Menut, 2008) or secondary particulate matter (PM). For example, Markakis et al. (2015) in their analysis of 4 km CHIMERE simulations for the relatively flat terrain of Paris, France, suggested that model meteorological grid-cell size does not significantly impact forecast accuracy. That may not have been the case, had their terrain been more complex. In contrast, Queen and Zhang (2008) observed considerable meteorological sensitivity to the more complex terrain in their 4 km resolution Community Multiscale Air Quality (CMAQ; EPA, 1999) model simulations over the Appalachian Mountains in the eastern United States, as did Salvador et al. (1999) for meteorological model simulations.
A number of studies have tried to evaluate the benefits of higher-resolution simulations and to quantify the impact of sub-grid variability by using different model grid-cell sizes (Vardoulakis et al., 2003; Ching et al., 2006; Pepe et al., 2016). These studies have often demonstrated that failure to account for higher-resolution features may result in mischaracterization of concentrations or health impacts (Isakov et al., 2007), although the capability of current models to provide this information with sufficient accuracy is unclear. One study found that increasing resolution did not change predicted health outcomes and concluded that “resolution requirements should be assessed on a case-by-case basis” (Thompson and Selin, 2012), while others (e.g. Kheirbek et al., 2014, 2016) have employed 1 km resolution without discussing the impacts of resolution on predicted health outcomes. Population exposure studies using air pollution models may be affected by resolution in a more complex fashion, given that both the predicted field (a pollutant with a known health impact) and the data to which the predicted field is to be linked (the human population) both have resolution dependencies. The health studies carried out to date highlight the need for better understanding of the underlying controlling factors for model accuracy with decreasing grid-cell size.
Terrain and meteorology are not the only factors that contribute to greater
uncertainties as horizontal grid-cell size is reduced – for example, the
ability of the model to locally resolve emission fluxes may also become a
factor. This may result in improved or deteriorated model performance as the
size of the grid cells decreases. Gridded model emissions may have an
intrinsic resolution dependence due to the underlying spatial disaggregation
fields, and this can contribute to uncertainties and errors in emissions as
grid-cell size is decreased. For instance, Valari and Menut (2008) found that
the discrepancy between their modelled and observed concentrations grew
rather than shrank, in response to decreases in grid-cell size from 48 to
6 km, and they associated these results with changes in the resulting local
emission fluxes. They showed that in their model setup, with regard to ozone,
a grid-cell size was reached (12 km
Salvador et al. (1999) studied the prediction accuracy impacts of
meteorological model grid-cell size in a region with a complex domain and
found that 2 km or smaller grid-cell sizes were required to resolve local-scale complex terrain flow features, and that daytime vertical advection and
predictions of turbulent kinetic energy and potential temperature were
influenced by grid-cell size. Dore et al. (2012) evaluated air quality
model
The benefits for model performance with increased spatial resolution are
unclear, based on the above literature. However, most papers converge
towards the following qualitative conclusions:
The impact of terrain topology on meteorological forcing as grid-cell
size decreases can dwarf the impact of a more accurate spatial apportionment
of the corresponding emissions. Decreases in grid-cell size result in a more realistic spatial distribution
of chemical species, whether or not model performance is improved. Uncertainties in spatial and temporal emissions allocation have an increasing
influence on overall model uncertainty as model grid-cell size decreases.
The 1980s saw several studies in which the potential impacts of wind
direction errors on dispersion model performance were examined. Fox (1981)
noted that pairing of model output at observation station locations could be
done as a function of both time and space: as a function of time (by
combining the data across all stations), as a function of space (by combining
all times, at each station location), or without any pairing (observations
and data were compared as cumulative frequency distributions). The accuracy
of regulatory dispersion models in the early 1980s was such that Fox (1984)
concluded that model and observation values paired in time and space
exhibited “little to no correlation” and discussed potential errors
associated with transport. Poor correlations were also noted
in the report on the first generation of reactive-transport models by Hanha (1988), who stated “wind
direction errors are the major cause of the poor agreement in hourly
predictions of concentrations at short distances downwind of point sources,” and described metrics for air-quality model evaluation. Hanha (1988)
also noted that model predictions could be offset in space and time relative
to observations, leading to poor performance statistics, despite a greater
degree of similarity of behaviour if the offsets are taken into account.
Errors in wind-field modelling were described as the main source of error in
simulations of plumes by Carhart et al. (1989), who again showed how better
agreement resulted when model and observations were unpaired in time and/or
space and noted that other metrics such as maximum plume width might better
represent model performance. Lee (1987) found that small perturbations in
space and time could result in poor correlations, despite similar histogram
distributions of both model and observations.
More recently, Kang et al. (2007) examined the concept of using the area of
the limiting resolution of the model (2 to 3
We expand on this concept to evaluate the impact of model grid-cell size in the context of an equivalent area about a given observation location. We examine area-weighted metrics in the form of averages over roughly equivalent areas for different model grid-cell sizes, and also use the a priori knowledge of the observations to determine whether the closest match to observations may be found within an equivalent area. We show that the latter metric demonstrates a positive impact of model grid-cell size on simulation results, while more simple paired comparisons, and averages over similar areas, mask these benefits.
We examine the impact of grid-cell size on model performance in a region of
intense petrochemical extraction and upgrading, the Athabasca oil sands
region (AOSR). The AOSR refers to the northernmost of three large bitumen
deposits located the northern part of the province of Alberta in Canada: the
Athabasca, Peace River, and Cold Lake areas. Together these areas cover
142 200 km
Next we describe our model, the simulation domains and forecasting setup, the emissions data, our evaluation methodology, and the results of our analysis.
Map showing the oil sands regions (based on an image from Government of Alberta, 2016).
The air-quality model used in this work is Environment and Climate Change
Canada's (ECCC) Global Environmental Multiscale – Modelling Air-quality and
Chemistry (GEM-MACH) model, which has been in use as Canada's operational
air-quality forecast model since 2009 (Moran et al., 2010). GEM-MACH is an
on-line model; that is, both meteorological and chemistry processes are
handled within a single model. The chemical processes reside within the
physics module of the Global Environmental Multiscale meteorological forecast
model (Côté et al., 1998a, b), originate with Environment Canada's
earlier off-line model (A Unified Regional Air-quality Modelling System,
AURAMS; Gong et al., 2006), and include process representation for particle
microphysics (Gong et al., 2003a, b), inorganic heterogeneous chemistry
(Makar et al., 2003), aqueous phase chemistry, in-cloud and below-cloud
scavenging (Gong et al., 2006), and secondary organic aerosol formation
(Stroud et al., 2011). GEM-MACH employs a sectional approach to represent
the size distribution of atmospheric particles, with 12-bin (Makar et al.,
2015a, b; Gong et al., 2015) or 2-bin configurations (Moran et al., 2010).
The latter configuration is designed for maximum computational efficiency,
with re-binning to the 12-bin distribution for key particle microphysics
processes, in order to improve accuracy. Here, the 2-bin version of the model
has been used, the main focus of the work being the impact of horizontal grid-cell size on model results. Eight aerosol chemical components are resolved in
GEM-MACH (sulfate, nitrate, ammonium, elemental carbon, primary organic
aerosol, secondary organic aerosol, sea salt, and crustal material). In the
present study, we make use of GEM-MACH v.1.5.1, described in more detail in
Makar et al. (2015a, b), employing 80 levels in a hybrid vertical
coordinate system extending up to 0.1 hPa (
Four levels of nesting have been employed in our simulations, shown in Fig. 2a. This version of GEM-MACH operates on a rotated latitude–longitude coordinate system wherein the position of the coordinate system poles is set by the user, allowing rotations of the grid with decreasing grid-cell size during nesting. The outermost nested grid corresponds to the westernmost two-thirds of the operational GEM-MACH forecasting domain, with a 10 km grid-cell size, and employ a combination of the Kain–Fritsch sub-gridscale convective cloud scheme (Kain and Fritsch, 1990; Kain, 2004) and a Sunqvist (1988) scheme for cloud parameterizations. Within that outer grid is nested a 10 km grid-cell size western Canada domain (yellow region, Fig. 2a) which has been rotated to match the horizontal orientation of the Rocky Mountains, and which makes use of a double-moment microphysics scheme (Milbrandt and Yau, 2005a, b) in place of the Sundqvist (1988) parameterization. The intention of this intermediate local 10 km simulation domain was to provide initial hydrometeors for the two innermost domains, to reduce the “spin-up” time required for the inner domains' meteorology to reach an equilibrium with respect to cloud formation. The latter two domains (2.5 and 1 km grid-cell sizes) resolve the cloud microphysics explicitly using the double-moment scheme alone and no convective parameterization (Milbrandt and Yau, 2005a, b). The third nested grid inset (green region, Fig. 2a) is the 2.5 km grid-cell size domain, which covers most of the Canadian provinces of Alberta and Saskatchewan. This grid will hereafter be referred to as the OS2.5km domain. The fourth and final nested grid (blue square, Fig. 2a) is a 1 km grid-cell size domain, roughly centered over and covering the immediate environs of the Athabasca oil sands, and is referred to hereafter as the OS1km model. This last nest also shows the region within which 22 instrumented aircraft flights were conducted during August and September of 2013, providing a unique measurement data set for our evaluation of the OS2.5 and OS1km model output for the same time period. Table 1 provides details on the horizontal dimensions of each of these nested domains and the duration of the simulations on each grid. All four model nests make use of the same vertical coordinate and levels. Figure 2b shows the topography of the 1 km domain in detail; the region to be modelled is situated in a broad river valley, with a local vertical relief of 750 m. Significant wind shears and frequent inversions are observed in the region, and part of our interest in 1 km grid-cell size simulations is to determine the extent to which these local features may influence model prediction accuracy.
Model simulations mimic an operational forecasting system, starting from the use of archived, data-assimilated meteorological analyses as meteorological input and boundary conditions every 36 h. The use of analysis fields is a standard meteorological forecasting practice to prevent the chaotic drift of the model results from observed meteorology over time. The outermost 10 km domain uses initial and boundary conditions from the output of a meteorological simulation, that is itself driven by an analysis field. The outermost domain model then carries out a 36 h forecast, of which the first 6 h is discarded as spin-up; the final 30 h is used as initial and boundary conditions for the rotated 10 km grid-cell size domain (the OS10 km domain). An OS10 km simulation of 30 h is then carried out, with the first 6 h being discarded as spin-up, and the latter 24 h forming the initial and boundary conditions for the 2.5 km grid-cell size OS2.5km simulation. The OS2.5km simulation is of 24 h duration. The OS1km simulation covers the same 24 h (and hence both 2.5 and 1 km simulations start from the same OS10 km initial conditions for every 24 h forecast), with the 2.5 km simulation providing boundary conditions thereafter to the OS1km model. Continuity between 24 h forecasts is thus maintained at the level of the outermost nest. The outermost domain is cycled every 12 h starting at 00:00 and 12:00 UT; however, we have selected the set of contiguous OS2.5 and OS1km 24 h simulations starting from the 12:00 UT continental domain for our comparison.
Meteorological boundary conditions for the lowest-resolution GEM-MACH simulations are taken from operational GEM forecasts, in turn driven by data assimilation analyses performed at the Canadian Meteorological Centre.
Nested domain specifications.
All emissions data used in this work are described in Zhang et al. (2018).
These emissions data include
direct observations of stack-specific hourly emissions measured by
continuous emission monitoring systems (CEMS); regional emissions inventory
data from the Cumulative Environmental Management Association (CEMA) – which
had the most detailed stack and process level emission data for the AOSR
facilities, including emissions from mine faces, tailings ponds, and the
off-road mining feet); the 2010 Canadian Air Pollutant
Emissions Inventory (APEI) – which is the most comprehensive national
emissions inventory, and which has the largest spatial coverage for area
sources outside the AOSR; and the 2013 National Pollutant Release Inventory
(NPRI) (a subset of the APEI) that is based on emissions reports from large
industrial facilities.
These emissions data sets primarily describe emissions of pollutants known as
criteria air contaminants (
Major-point sources are represented by a single geographical (latitude,
longitude) pair of coordinates, and are assigned to the grid cell in which
the point is located. These sources are likely to be the most impacted by
model horizontal grid-cell size, as even a large major-point source plume,
which in reality may only occupy an emissions horizontal area on the order of
100 m
Area sources that are large compared to both model grid-cell sizes (2.5 and 1 km) can be expected to be approximated by model grid cells of both resolutions and are thus expected to be less impacted by model resolution than emissions from point sources. However, smaller area sources (i.e. areas intermediate between 2.5 and 1 km to the side) may be better resolved, and hence have less dilution and higher downwind concentrations, when higher spatial resolution is employed.
In the AOSR, approximately 95 % of the
Comparisons between air-quality models and observations usually take the
approach of comparing observation and model-generated values paired in time
and space, from the observation location and corresponding model grid-cell
respectively. We refer to this approach hereafter as our “standard”
evaluation, for both 2.5 and 1 km simulations. However, we note additional
factors aside from grid-cell size may influence the outcome of air-quality
model evaluations. For example, the relative skill of the meteorological
component of the air-quality model will depend in part on the density of
meteorological observation data, incorporated into the model via data
assimilation, for the construction of the model's initial meteorological
state. This in turn will influence the local skill of the model's predicted
wind directions and hence the skill of its plume transport. The simulations
carried out here focus on the Fort McMurray area, where the nearest available
upper air meteorological sounding site is located at the ECCC Stony Plain
station, located approximately 500 km south-west of the study area. The
advantage of higher-resolution simulations (e.g. reduced numerical error
associated with the discretization of transport operators, and better
treatment of local topographic influences) may thus be offset by errors in
the predicted
While meteorological model synoptic-scale forecast errors may manifest
themselves locally as errors in the direction of winds driving local plume
transport, other advantages may result from the use of higher-resolution
air-quality models. Since lower resolution models de facto instantaneously
redistribute plumes emitted from large stack sources over a larger area, such
artificial diffusion will reduce the model's ability to accurately simulate
concentration maxima, and the resulting chemistry, within simulated model
plumes. However, the spatial extent of a plume in a model employing a large
horizontal grid-cell size may be such that its existence may be captured at
discrete observing sites. In contrast, forecast plumes in models with smaller
horizontal grid-cell sizes may correctly capture plume magnitude and chemical
behaviour, but may be more subject to errors in the larger-scale wind
direction. To illustrate this point, Fig. 3 shows a conceptual diagram of an
actual plume, a large grid-cell size model plume, and a small grid-cell size
model plume, where the latter two simulated plumes are both subject to the
same synoptic-scale error in wind forecast direction (indicated by large red
arrows; the smaller red arrow in Fig. 3c indicates the impact of local
forcing predicted for the second model). Observation station “
Schematic comparison of surface concentration contours and model
grid-cell values of a transported pollutant plume from a large stack (termed
a “point” source). Wind direction shown by red arrows. Monitoring station
location marked by “
In addition to the standard analysis, we perform additional analyses that
examine the model's ability to resolve plumes in the
Figure 4a shows an observation station enclosing the nine nearest-neighbour
model grid cells for a 2.5 km grid-cell size, while Fig. 4b shows the
corresponding 1 km grid-cell size map, with the 9 nearest-neighbour model
grid cells shown in light grey and the 49 nearest grid cells shown in
the region enclosed in dark grey. Figure 4a encloses a region of
56.25 km
Scale diagram of the same region in
As noted above, in a formal mathematical sense, the smallest region
resolvable by an Eulerian grid model is twice the size of the model grid-cell
size (relating to the Nyquist frequency of the model); hence the smallest
resolvable feature spans two model grid cells in each direction. However, in
a practical sense, a total of nine grid cells centered on the observation
station must be used to allow a boundary of two grid cells in any direction.
Sampling any or all of the nine grid cells in Fig. 4a may thus be said to be
representative of the model's ability to simulate events occurring at
discrete location “
Our approach follows two steps:
From the 2.5 km simulation, in addition to the predicted model value at the
grid cell containing the observation location, we determine the model
grid-cell value in the nine grid cells surrounding the observation station
location which has the closest value to that observed at the station. This
represents the model's “best estimate” of the value at the observation
station location itself and represents the model's ability to resolve features at 2.5 km
grid-cell size. From the 1 km simulation, in addition to the model value at the grid-cell
location, we select the closest value to the observation value from
the nearest nine grid cells to the observation station location, and the nearest 49 grid cells to the observation station location.
The former represents the model's “best estimate” of the value
at the observation station location itself, while the latter represents the
1 km model's best estimate in the closest equivalent region to the limiting
resolution of the 2.5 km model.
Comparing the resulting statistical measures of each of these selected values with observations, in addition to the standard analysis, thus evaluates the model's best attempt to resolve features for the specified grid-cell size and allows cross-comparison of model performance within nearly equivalent areas. Cross-comparing the statistical values for the different regions described above shows the model's ability to resolve features such as plumes from the standpoint of the region represented at the different grid-cell sizes. If synoptic-scale transport direction errors create situations similar to that depicted in Fig. 3a, a standard comparison of error would be expected to show little benefit to higher resolution. However, the “best model estimate” comparisons would capture the ability of the higher-resolution model to more accurately simulate the magnitude of the plume, if not its spatial location. Each of these selection procedures will be employed in the surface concentration comparisons which follow.
We evaluate our model simulations against observations made at surface monitoring networks in the vicinity of the Athabasca oil sands, and aboard an instrumented aircraft, the National Research Council of Canada Convair. For the surface monitoring data, hourly time series of model output were matched to station time series using the different strategies described above. For the aircraft observations, we extract model values through temporal and spatial interpolation to the aircraft's position during the flights and only perform the standard analysis, as well as examining the behaviour of the two simulations along cross sections corresponding to the flight paths.
Our statistical metrics for evaluation are common to many other air-quality
applications and were computed using the “modstat” function from the OpenAir
R package (Carslaw and Ropkins, 2012). Further discussion of different
metrics for model evaluation may also be found in Yu et al. (2006). The
statistics calculated here include mean bias (MB; perfect score: zero), mean
absolute gross error (MGE; perfect score: zero), normalized mean bias (NMB;
perfect score: zero), normalized mean gross error (NMGE: perfect score:
zero), root mean squared error (RMSE; perfect score: zero), correlation
coefficient (
We begin a comparison of 2.5 and 1 km grid-cell size for specific events,
and for averages across the 1 km domain, in order to provide a qualitative
comparison of the differences in simulations for the two simulations and
then continue with the quantitative comparison. Figure 5 compares OS2.5km
(left column) and OS1km (right column) simulation results for a
cross section located 0.2 km from a major
The model results are identical at hour 0 due to both the OS2.5 and OS1km models being initialized from the OS10 km data at this time (small differences between Fig. 5a and b are due to slight mismatches in the cross section locations). Subsequent cross sections show the OS1km model is capable of resolving both higher absolute mixing ratio values, and sharper gradients, within 12 h of simulation time (Fig. 5c, d). Multiple plumes are resolved by 12 h of simulation time in the 1 km grid-cell size simulation, along with markedly different plume heights, plume structure, and an increase of a factor of 2 in the magnitude of plume mixing ratios relative to the lower grid-cell size simulation, and these differences persist into the 24th simulation hour (Fig. 5e, f). Mixing ratio differences of these magnitudes are to be expected given the increase in resolution, but Fig. 5 shows that other important aspects of the predicted plumes have changed. The plume heights are a function of predicted local stability conditions in the grid square containing the source, and the variation shown here represents a substantial change in the predicted local stability for the origin sources of these plumes, resulting from the change in model horizontal grid-cell size.
Comparison of simulated
Figure 6 compares the maximum surface
Comparison of total-simulation
The maximum surface concentrations tend to show more elongated structures at
the smaller grid-cell size, comparing Fig. 6a and b, particularly for plumes in
the western (left) half of the OS1km domain. The difference plot (Fig. 6c)
shows that local maximum concentration differences of up to
The locations of the local network of 10 surface monitoring stations located
near the sources of emissions in the region (oil sands facilities) are shown
in Fig. 7. As noted in Sect. 2.4, we carry out several analyses:
averaging of the OS1km results across the nine grid cells prior to
evaluation (to determine whether the mean value is better represented by the
smaller grid-cell size, similar to the approach taken in Kang et al., 2007); selection of the the extent to which the 1 km grid-cell size is capable of improving
simulation results despite, for example, the larger-scale flow resulting in errors
in the plume placement, and whether the 1 km grid-cell size model is capable
of outperforming the 2.5 km grid-cell size model
While evaluations (Figs. 2b and 3) deliberately select the “best” value,
they also provide a quantitative estimate of the extent to which each model
is capable of achieving the correct answer within roughly equal
representative areas centered on the observation station locations. These
comparisons are intended to evaluate
In the last test, we place both models on an equal footing with regards to
the region being represented, as well as with regards to allowing cell-to-cell
variability and the selection of a closest match to observations.
Our evaluation is presented as tables of statistical metrics. The comparisons
employing the nearest-neighbour approach are described with a “B
Comparisons to surface concentrations were performed using publicly available
data collected by the Wood Buffalo Environmental Association (WBEA), which
operates the air-quality monitoring network residing within the OS1km
domain. The monitoring station locations are shown in Fig. 7. The statistical
performance of the models, calculated using the procedure outlined above, are
given in Tables 2 through 5, for
In the
Illustration of the OS1km domain, with observation station
locations.
When the standard evaluation is compared to the
We next ask the question, “does a more accurate simulation value
A valid argument could be made that the methodology employed in this fourth
evaluation is subject to selection bias, in that the selection of a
Two important results can be seen from this final evaluation. First, as was
the case for the “best of 9” for the OS1km simulation compared to the
standard OS1km evaluation, the best of 9 for the OS2.5km simulation
has a considerably better performance than the standard OS2.5km evaluation
(compare fifth and first columns, Tables 2 through 5). That is, the OS2.5km
model may
This final result is strongly suggestive of the presence of issues such as those illustrated in Fig. 3. These may include errors in the larger-scale synoptic wind flow, combined with the reduced size of plumes as grid-cell size is reduced, leading to more “misses” than “hits” for a given recorded event at a measurement station compared to the coarse grid-cell size model. There may be multiple additional causes for such errors (examples include poor observation density in the region for model initialization, underlying lower-resolution boundary condition fields such as topography not improving with the reduction in grid-cell size, inaccuracies in land use fields used in meteorological modelling due to rapid development, and errors in other aspects of the reaction transport modelling system aside from horizontal resolution). The expected advantages of the small grid-cell size, such as better representation of the concentrations of species within plumes and hence better representation of their reactive chemistry (see for example Lonsdale et al., 2012), may be lost in a standard performance analysis due to these other issues.
Our analysis suggests that a practical limit in the benefits of increasing model accuracy may be reached when resolution exceeds some threshold, as a result of other errors inherent in the modelling system. However, the analysis also suggests that if these non-resolution-related errors are corrected, the benefits of adopting a smaller grid-cell size may be substantial. For example, meteorological data assimilation employing a dense monitoring network for a specific area of interest would be expected to show a greater impact for smaller rather than larger grid-cell sizes, due to the greater ability of the former to take advantage of the observation density in correcting the initial meteorological state. We note that recent work applying land use data assimilation (Carrera et al., 2015) to regional 2.5 km grid-cell size weather simulations (Milbrandt et al., 2016) have suggested that such data assimilation may indeed improve forecast skill at the very local scale.
Surface
Surface
Surface
Surface PM
The surface observation data were also analyzed by the time of day, with both
observations and simulations split into daytime (09:00 to 18:00 LT)
and nighttime (19:00 to 08:00 LT) data pairs (Appendix A, Tables A1
through A8; Carslaw and Ropkins, 2012). Within each of these diurnally
segregated time periods, the broad aspects of the comparison were the same as
for the “all data” Tables 2 to 5 above: the OS1km simulations tended to
have reduced performance in a standard analysis, averaging improved but did not
completely ameliorate the performance of the OS1km simulation, a
methodology employing the best of nine OS1km grid cells had superior
performance to the two standard comparisons, and comparison of the “best
of” methodologies for equal areas showed better performance for the OS1km
compared to the OS2.5km simulation. We also noted substantial differences
in the day and night performance of both models across the methodologies. For
example, daytime
A total of 22 aircraft observation flights were carried out during the study simulation period – here we present statistical comparisons using the standard approach only (model grid cell containing the observation point to observation data at the aircraft location). Model values were linearly interpolated in time and space to the aircraft observation locations and times (aircraft observations were on a 10 s interval.) We begin with a composite comparison across all observation times, in Table 6.
Aircraft observation comparisons,
The results are in general similar to the surface analysis, in that the
OS1km simulation tended to have worse performance than the OS2.5km
simulation (exceptions being the biases for both
Figure 8a and c provide a further example of the kind of
situation referenced in Fig. 3; surface monitoring station locations are
depicted as grey circles, one of which is identified with a pink arrow. This
station lies within the plume at 2.5 km resolution (Fig. 8a), and outside of
the plume at 1 km resolution (Fig. 8c). While the plume direction is the
same at both scales, that is, the large-scale wind field controls the
positioning of the plume axis, the smaller grid-cell size simulation places a
stronger constraint on the accuracy of the wind field. For example, if the
simulated large-scale flow direction was inaccurately predicted by only a few
degrees, the plume would not appear in the 1 km simulation time series at
this location but would register as present in the 2.5 km simulation.
Nevertheless, the plume maximum concentration is better captured by the
smaller grid-cell size simulation (compare maximum values in observed
aircraft
Comparison between OS2.5km
Standard performance evaluation of Flight 8 for
Meanwhile other flights show a clear advantage of the OS1km model. One
example is given by the
These and the snap-shot comparisons described in Sect. 3.1 show that the higher-resolution model is having a significant impact on predictions – however, other aspects of the overall model performance are preventing the potential benefits of higher resolution from influencing the standard performance evaluation.
Standard performance evaluation of Flight 17 for
Flight 17 comparison for
A key result of our current work is that 1 km grid-cell size simulations
resulted in improved prediction of plume concentration maxima relative to
2.5 km grid-cell size simulations, despite having no improvement using
standard scoring methodologies. We also have described a scoring approach
wherein these potential advantages of higher resolution may be quantified. We
believe that flow field effects such as those described in Fig. 3 are a general
result of increasing grid resolution, but note important caveats, which
include the following:
Given these considerations, we
recommend that modellers should attempt successively smaller grid-cell sizes
to determine the following: first, the point at which, for their particular
system and simulation location, subsequent grid-cell size reductions fail to
improve performance, and second, to make use of still higher resolutions for
studies wherein the point-to-point comparison is less important, and other
factors such as accurately capturing the plume chemistry are more crucial.
Our work suggests the following: decreasing air-quality model horizontal grid-cell size will not necessarily result in improvements to model performance in standard performance evaluations, in which the model values at the grid cells encompassing measurement location stations are used in a pairwise comparison to observations. Other considerations, such as the accuracy of the larger-scale wind direction and speed forecast, and the accuracy of the plume rise parameterization used within the model may play a greater role in the overall performance of the model, and reduce the benefits of the smaller grid-cell size. In the context of a standard model performance evaluation, there may be fixed limits to the benefits of decreasing model grid-cell size.
Despite this difficulty, our results also show that the use of smaller grid-cell sizes have some potential benefits, in that these models do a better job
of resolving specific air pollution features, like high-concentration maxima
within plumes. Both coarse and fine grid-cell size plumes may be misplaced in
both time and space, with the net result that the latter model has a worse
performance in a standard comparison, but is nevertheless more likely to
capture the correct in-plume concentrations, and hence the chemistry, of the
actual plume, in the
Our results should not be taken as an indication that the standard metrics for model comparison are in some way flawed – they provide the most rigorous method for evaluating the performance of a model at specific monitoring locations and specific times. However, the ancillary performance assessment methodology presented here shows that models with very small grid sizes, which may have standard performance metric scores that have not improved or have even degraded relative to larger grid-cell size models, nevertheless have scientific value, in terms of being better able to capture plume concentrations and hence plume chemistry, if not plume position. The work also suggests that the prediction accuracy of very local transport conditions may be a large factor in preventing the smaller grid-cell size models from achieving improved performance in standard performance analyses.
These findings suggest that at the current state of development, VHR
air-quality models are of benefit for the specific purpose of chemical
process studies, in which the main aim of the work is to accurately simulate
plume chemistry – and in which accurate forecasting of the
Surface station air monitoring data used in this study were
obtained and are publicly available from the Wood Buffalo Environmental
Association (
The model output generated from the study requires a large amount of memory space (several gigabytes per day of simulation) in a binary format specific to Environment and Climate Change Canada's modelling systems. The size of the files precludes maintenance on a public site, but conversion to other formats and arrangements for uploading may be made on request to Environment and Climate Change Canada at paul.makar@canada.ca.
Model comparison statistics.
The limits on the summations were removed for brevity; all are from
Surface
Surface
Surface
Surface
Surface
Surface
Surface PM
Surface PM
MR conducted computer simulations and analysis and created graphical outputs and the initial manuscript draft. AH supervised MR, provided research advice and infrastructure, contributed to manuscript writing, and provided comments on manuscript drafts. PAM co-supervised MR, provided research advice and infrastructure, contributed to manuscript writing, and was the lead for revisions and responses to referees. AA provided model code assistance and setup, and the scoring package for model–observation comparisons. JZ provided 2.5 and 1 km resolution emissions files. MDM provided 2.5 and 1 km resolution emissions files and commented on manuscript drafts. QZ contributed to the provision of 2.5 and 1 km resolution emissions files.
The authors declare that they have no conflict of interest.
The authors wish to thank the support of Environment and Climate Change Canada (ECCC), under the CCAP program, for supporting this research. The authors also gratefully acknowledge the assistance of Michel Valin and Sylvie Gravel for advice and assistance with the installation of GEM-MACH on the Carleton University workstations during the early stages of this project.
This paper was edited by Jason West and reviewed by two anonymous referees.