Atmospheric Chemistry and Physics An interactive open-access journal of the European Geosciences Union
Journal metrics

  • IF: 5.509
  • IF 5-year: 5.689
  • CiteScore: 5.44
  • SNIP: 1.519
  • SJR: 3.032
  • IPP: 5.37
  • h5-index: 86
  • Scimago H index: 161
Volume 18, issue 9

Special issue: Atmospheric emissions from oil sands development and their...

Atmos. Chem. Phys., 18, 6543-6566, 2018
© Author(s) 2018. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article 08 May 2018

The use of hierarchical clustering for the design of optimized monitoring networks

Joana Soares¹, Paul Andrew Makar¹, Yayne Aklilu², and Ayodeji Akingunola¹
  • ¹ Air Quality Modelling and Integration Section, Air Quality Research Division, Environment and Climate Change, Toronto, ON, M3H 5T4, Canada
  • ² Environmental Monitoring and Science Division, Alberta Environment and Parks, Edmonton, AL, T5J 5C6, Canada

Abstract. Associativity analysis is a powerful tool for dealing with large-scale datasets: it clusters the data on the basis of (dis)similarity and can be used to assess the efficacy and design of air quality monitoring networks. We describe here our use of Kolmogorov–Zurbenko filtering and hierarchical clustering of NO₂ and SO₂ passive and continuous monitoring data to analyse and optimize air quality networks for these species in the province of Alberta, Canada. The methodology applied in this study assesses dissimilarity between monitoring station time series based on two metrics: 1 − R, where R is the Pearson correlation coefficient, and the Euclidean distance; we find that both should be used in evaluating monitoring site similarity. We have combined the analytic power of hierarchical clustering with the spatial information provided by deterministic air quality model results, using the gridded time series of model output as potential station locations, as a proxy for assessing monitoring network design and for network optimization. We demonstrate that clustering results depend on the air contaminant analysed, reflecting the difference in the respective emission sources of SO₂ and NO₂ in the region under study. Our work shows that much of the signal identifying the sources of NO₂ and SO₂ emissions resides in shorter timescales (hourly to daily), owing to the short-term variation of concentrations, and that longer-term averages in data collection may lose the information needed to identify local sources. When longer timescales (weekly to monthly) are considered, however, the methodology identifies stations mainly influenced by seasonality. We have performed the first dissimilarity analysis based on gridded air quality model output and have shown that the methodology is capable of generating maps of subregions within which a single station will represent the entire subregion, to a given level of dissimilarity. We have also shown that our approach is capable of identifying different sampling methodologies as well as outliers (stations' time series that are markedly different from all others in a given dataset).
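
As an illustration of the two dissimilarity metrics and the filtering step named in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. The function names, the truncated-window edge handling in the filter, and the toy concentration series are all assumptions made for the example. It shows why the abstract recommends using both metrics: two sites with identical temporal variation but different concentration levels have 1 − R close to zero yet a large Euclidean distance.

```python
import math

def kz_filter(series, window, iterations):
    """Kolmogorov-Zurbenko filter: repeated passes of a centred moving average.

    Assumption: `window` is odd and the averaging window is simply
    truncated at the ends of the series.
    """
    half = window // 2
    out = list(series)
    for _ in range(iterations):
        out = [
            sum(out[max(0, i - half): i + half + 1])
            / len(out[max(0, i - half): i + half + 1])
            for i in range(len(out))
        ]
    return out

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def one_minus_r(x, y):
    """Dissimilarity in temporal variation: 0 for perfectly correlated series."""
    return 1.0 - pearson_r(x, y)

def euclidean(x, y):
    """Dissimilarity in magnitude: Euclidean distance between the series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical toy series: same variation, constant offset of 10 units.
site_a = [1.0, 2.0, 3.0, 4.0, 5.0]
site_b = [11.0, 12.0, 13.0, 14.0, 15.0]

print(one_minus_r(site_a, site_b))  # near 0: same temporal behaviour
print(euclidean(site_a, site_b))    # large: different concentration levels
print(kz_filter([0.0, 0.0, 3.0, 0.0, 0.0], window=3, iterations=1))
```

In this sketch, a larger `window` or more `iterations` smooths out shorter timescales, which is one way to separate the hourly-to-daily signal from the weekly-to-monthly one before computing the dissimilarities.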

Short summary
Grouping data on the basis of (dis)similarity can be used to assess the efficacy of monitoring networks. The data are cross-compared in terms of temporal variation and magnitude of concentrations, and sites are ranked according to their level of potential redundancy. The methodology can be applied to measurement data, helping to identify sites with different measuring technologies or data flaws, and to model output, generating maps of areas of spatial representativeness of a monitoring site.
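
The ranking of sites by potential redundancy can be sketched with a small agglomerative (hierarchical) clustering routine. This is a hedged illustration only: the average-linkage rule, the function name `agglomerate`, the site labels and the dissimilarity values are assumptions for the example and are not taken from the study. Sites that merge at the lowest dissimilarity are the most similar, and therefore the first candidates for redundancy.

```python
def agglomerate(names, dist):
    """Agglomerative clustering over a precomputed dissimilarity table.

    names: list of site labels.
    dist:  dict mapping frozenset({a, b}) to the dissimilarity of sites a, b.
    Returns the merges in order as (height, cluster_1, cluster_2) tuples.
    Assumption: average linkage between clusters.
    """
    clusters = [(name,) for name in names]
    merges = []

    def linkage(c1, c2):
        # Mean pairwise dissimilarity between members of the two clusters.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it.
        height, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((height, clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical dissimilarities (for example 1 - R) between three sites.
sites = ["A", "B", "C"]
dissimilarity = {
    frozenset(("A", "B")): 0.1,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.6,
}

for height, left, right in agglomerate(sites, dissimilarity):
    print(f"merge {left} + {right} at dissimilarity {height:.2f}")
```

Reading the merge list from the top gives the redundancy ranking: here sites A and B join first, so one of them could represent the other at a low level of dissimilarity, while site C joins last and carries the most distinct information.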