The interactions between organic and
inorganic aerosol chemical components are integral to understanding and
modelling climate and health-relevant aerosol physicochemical properties,
such as volatility, hygroscopicity, light scattering and toxicity. This study
presents a synthesis analysis of eight data sets of non-refractory aerosol
composition, measured at a boreal forest site. The measurements, performed
with an aerosol mass spectrometer, cover in total around 9 months over the
course of 3 years. In our statistical analysis, we use the complete organic
and inorganic unit-resolution mass spectra, as opposed to the more common
approach of only including the organic fraction. The analysis is based on
iterative, combined use of (1) data reduction, (2) classification and
(3) scaling tools, producing a data-driven chemical mass balance type of
model capable of describing site-specific aerosol composition. The receptor
model we constructed was able to explain
Along with particle size, aerosol chemical composition is fundamental in understanding aerosol physicochemical properties such as hygroscopicity, volatility, optics and toxicity (Bilde et al., 2015; Swietlicki et al., 2008; Zimmermann, 2015). In the past decade aerosol mass spectrometry has provided a way to quantitatively resolve basic chemical composition of aerosol in near real time. This not only enables basic chemical speciation into organic and common inorganic ion species, but also produces a wealth of complex mass spectrometric data. It has since become clear that these data sets, although superficially hard to interpret, are rich in chemical information and their statistical analysis yields considerable new knowledge. However, tapping into this information source requires use of advanced analysis tools and chemometric methods (i.e. “using mathematical and statistical methods to provide maximum chemical information by analysing chemical data”; Kowalski, 1975). Consequently, advanced statistical methods for data reduction have quickly gained traction in aerosol mass spectrometry, and are presently widely used for deconvolution of complex organic mass spectra into their underlying components. Specifically, the positive matrix factorisation algorithm (PMF; Paatero and Tapper, 1994) has achieved a predominant status as the state-of-the-art analysis tool for deconvolving aerosol mass spectrometric data. Factorisation methods such as PMF notably allow for the condensation of information found in high-dimension data matrices into a manageable number of factors, corresponding to aerosol chemical species, sources or processes, for example. Data reduction often additionally allows for improved visualisation, aiding in interpretation of the underlying aerosol chemical phenomena.
In exploratory factor analysis, the principal difficulties often relate to deciding the optimal number of factors, choosing between multiple solutions of mathematically similar quality, and estimating the reliability and uncertainty of the results. Lacking robust but easy-to-use mathematical tools, the selection and interpretation of factorisation solutions remains prone to subjective bias by the analyst. Specifically, while analyst-imposed additional constraints in factorisation may sometimes be required to reduce rotational uncertainty and extract minor factors in data (e.g. Canonaco et al., 2013; Crippa et al., 2014) such procedures are especially prone to analyst-subjective decisions. Evaluation and verification of a factorisation solution thus generally requires meticulous study and understanding of, for example, correlations with auxiliary data, temporal changes and cycles and spectral references. While statistics-driven methods for spectra comparison and classification as of yet remain marginal in aerosol mass spectrometry, they do show promise in their capability to automatically group similar spectra based on their chemically relevant features, producing comparable classifications to those performed manually by expert analysts (Äijälä et al., 2017; Rebotier and Prather, 2007; Freutel et al., 2013).
The overwhelming majority of PMF analyses of aerosol mass spectrometer (AMS) data to date have been performed on the organic fraction alone (Zhang et al., 2011). Contrary to popular belief, there exist no tenable reasons to limit chemometric analysis to organic signals, as exemplified by the analyses of Sun et al. (2012) and Hao et al. (2014). Although it requires some additional data preparation and processing, inclusion of inorganics provides additional insight into, for example, salt formation in aerosol. In this work, we apply data reduction and classification methods for analysing organic and inorganic aerosol mass spectral data from several measurement campaigns in the boreal forest. We then derive a comprehensive receptor model resolving the dominant aerosol categories at the site. In addition, by presenting an example of a semi-supervised, statistics-driven analysis of large mass spectral data sets, we hope to pave the way for machine-learning-based data analysis approaches, reducing the need for expert analyst input and subjective judgement at each step.
Our instrumentation, data processing, measurement site and analysis algorithms have been comprehensively described in previous literature, to which we refer in the corresponding sections. Thus, we focus on the new aspects of this work, showing how the individual methods can be connected to form an analysis chain and exemplifying how chemometric information can be propagated through it. In short, we will first cover the measurement site, SMEAR II (Station for Measuring Ecosystem–Atmosphere Relations) and the sets of data available to us (Sect. 2.1). We then describe our mass spectrometer instrument and preparation of data (Sect. 2.2). In Sect. 2.3, we will briefly go through the various statistical tools and algorithms, covering the basics of data factorisation, classification of spectra using a clustering algorithm and clustering solution evaluation, and detail the pre- and post-weighting involved. Section 2.4 describes typical reference methods for inorganics and nitrate apportionment: an ion balance scheme and a separate parameterisation for estimating organonitrate loading, to provide a comparison for the inorganic speciation from our statistics-based receptor model. Finally, in Sect. 2.5, we present a summarised, step-by-step description of how the methods were combined to produce a receptor model for aerosol composition at the measurement site.
The AMS data of this study were collected at the SMEAR II site in Hyytiälä, southern
Finland (61
The environment consists mostly of forests dominated by Scots pine (
A large part of the aerosol loading at SMEAR II is attributable to regional
biogenic secondary organic aerosol (SOA; Corrigan et al., 2013; Crippa et
al., 2014; Allan et al., 2006) and long-range transport from industrial
regions in southern Finland, western Russia and central Europe (Kulmala et
al., 2000; Patokoski et al., 2015; Niemi et al., 2009; Sogacheva et al.,
2005). Regional anthropogenic aerosol sources include the towns Orivesi (pop.
9500; 19 km south) and Tampere (pop. 213 000; 48 km south-west), as well
as two sawmills and a pellet factory in the village of Korkeakoski, Juupajoki
(7 km east-south-east of the station). The surrounding countryside is
sparsely populated (5–10 inhabitants km
Data sets used in this study and their time frames (dd.mm.yyyy).
For months when AMS data were available, percentages indicate the fraction of days with at least one data point.
In this study, the aerosol composition was monitored by an AMS between 2008 and 2011, during several short measurement campaigns. Notable larger, intensive campaigns at the time were the EUCAARI project (2008–2009; Kulmala et al., 2009, 2011) and HUMPPA-COPEC (2010; Williams et al., 2011; Corrigan et al., 2013). The sets of data used along with their time frames are shown in Table 1. Data availability by year and month is presented in Table 2.
The mass spectrometric data for this study were acquired with a Time-of-Flight Aerosol Mass Spectrometer (ToF-AMS), developed by Aerodyne Research Inc. (Billerica, MA, US). AMS instruments in general have been described by Canagaratna et al. (2007), and the compact ToF analyser version (CToF) used in this study by Drewnick et al. (2005). Further details on the specific instrument we used are available in our previous study (Äijälä et al., 2017).
In brief, the AMS instrument draws sample aerosol from atmospheric pressure
to vacuum conditions through an inlet system consisting of a critical
orifice and a particle concentrating aerodynamic lens (Liu et al., 2007).
The sample aerosol beam is directed at a vaporiser operated at 600
The per-amu (atomic mass unit) analyser signal is subsequently quantified based on instrument response calibrations and corrections (among others the correction for relative ionisation efficiency between the species, RIE; Allan et al., 2004; Supplement Sect. S4). Individual, unit-mass-resolution amu signals are then chemically speciated, based on chemical information on fragmentation and air composition (see Allan et al., 2003b, for details). Additional, specific minor modifications to our instrument have been discussed in our previous work (Äijälä et al., 2017).
After basic processing, the data were further prepared to serve as input for factorisation (described in Sect. 2.3). The organic and inorganic data and related uncertainties were extracted, and down-weighting of signals was performed. The procedure for extraction and preparation of AMS organic signal and related error matrices has been described by Allan et al. (2003b) and Ulbrich et al. (2009).
In short, measurement points or variables with missing data were omitted and
error matrices calculated, based on a function accounting for both counting-statistics-induced uncertainty and background noise from the detector
and electronics. The signals were then down-weighted by multiplying the
error-matrix-conveyed uncertainty values for low signal-to-noise ratio (SNR)
variables with a scalar: “weak” variables (SNR
For factorisation, we used the PMF model developed by Pentti Paatero and
colleagues (Paatero, 1997, 1999; Paatero and Tapper, 1994) and widely used
for analysis of AMS data since 2007 (Lanz et al., 2007b; Zhang et al., 2011).
In brief, PMF is a statistical model, typically resolving a bilinear
combination of factor profiles (
Two main features set PMF apart from other similar factorisation models and make it particularly suitable for atmospheric aerosol models. First, factor profiles and time series are limited to positive values, which drastically reduces the amount of rotational ambiguity. Second, the error model is improved: the quantity to minimise is the residual weighted (typically by the measurement uncertainty), resulting in higher weight for the variables with better SNR. In PMF, the minimum weighted residual is solved using one of the related algorithms, i.e. PMF2 or Multilinear Engine 2 (ME-2; Paatero, 1999). Of the two algorithms, ME-2 can take in additional equations defined by the user, i.e. constraints the solutions need to adhere to. In this study, when ME-2 constraints were applied to the factor profiles, we set upper and lower bounds for the allowed profile solutions. The bounds were based on variability estimates obtained from earlier analysis, as explained later, in Sect. 2.5. A variability estimate of the final model is available in the Supplement (Fig. S13). For running the PMF and ME-2 algorithms, we used the Igor Pro (Wavemetrics Inc.) based SoFi (v. 4.8) user interface developed by Francesco Canonaco and co-workers at Paul Scherrer Institute (PSI). The interface allows input of the pre-processed data and user-selected parameters, and calls on the solver algorithms (PMF2 or ME-2, depending on assignment) to return a solution to be displayed and analysed in SoFi (Canonaco et al., 2013).
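As a minimal illustration of the uncertainty-weighted residual described above, the following Python sketch evaluates the sum of squared, weighted residuals for a toy bilinear model. The names `X` (data), `G` (time series), `F` (profiles) and `sigma` (uncertainties) are assumptions made for this sketch, as is the down-weighting factor of 2; the sketch only evaluates the quantity and does not reproduce the PMF2 or ME-2 solvers.

```python
def weighted_q(X, G, F, sigma):
    """Sum of squared, uncertainty-weighted residuals of the model X ~ G*F."""
    q = 0.0
    for i, row in enumerate(X):
        for j, x_ij in enumerate(row):
            # Modelled signal: sum over factors of loading times profile value
            model = sum(G[i][k] * F[k][j] for k in range(len(F)))
            q += ((x_ij - model) / sigma[i][j]) ** 2
    return q


# Toy data: 2 time points, 3 mass channels, 1 factor (all values hypothetical)
X = [[1.0, 2.0, 0.5], [2.0, 4.0, 1.5]]
G = [[1.0], [2.0]]
F = [[1.0, 2.0, 0.5]]
sigma = [[0.1, 0.1, 0.5], [0.1, 0.1, 0.5]]
q_plain = weighted_q(X, G, F, sigma)

# Down-weight the noisy third channel by inflating its uncertainty
# (the factor 2 is illustrative only)
sigma_dw = [[s if j < 2 else 2.0 * s for j, s in enumerate(row)] for row in sigma]
q_dw = weighted_q(X, G, F, sigma_dw)
```

Inflating the uncertainty of a low-SNR variable shrinks its contribution to the minimised quantity, which is why down-weighting is applied to the error matrix rather than to the signal itself.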
When PMF is used as a standalone method for source attribution, the
selection of solution needs to be carefully validated. Sensitivities towards
a different number of factors, rotations and initialisation seeds are
meticulously analysed, and correlations with auxiliary data are computed. A case
is then made for why the selection is the best possible. Contrarily, in our
analysis approach, we
To harmonise the description of aerosol components, we constructed a
constrained receptor model, where all the profile components were
constrained. For this purpose we applied an ME-2-based chemical mass balance
(CMB) type of model. CMB models are typically used as receptor models for
cases where source profiles are known, and only the mass loading information
needs resolving (Friedlander, 1973; Gordon, 1988; Hopke, 1991, 2016; Miller et
al., 1972). In such mass-conservation-based models, the observed
loadings are modelled as a sum of multiple individual sources. Although CMB
is often presented mathematically as the sum of loadings (Supplement; Sect. S1, Eq. S1), it can also be thought
of as a special case of the bi-linear model described in Eq. (1). Only
now the profile matrix (
In this work, we use a relaxed CMB-like bilinear model (henceforth abbreviated as r-CMB), where all the source profiles are constrained but allowed to vary within narrow limits (derived from variability estimates; see Sect. 2.5; Supplement Fig. S13). In strict technical terms this approach could be labelled “an extremely constrained ME-2 model”, but we choose to use the term “relaxed CMB” to differentiate between the typical use of ME-2 or constraining only part of the profiles, which allows the model considerably more freedom. We regard our use of the model as much closer to the idea of constraining all profiles than (semi-)exploratory factorisation typical for ME-2. The naming also serves to better highlight the conceptual differences between models in the different analysis phases.
Generally, the biggest problems of the CMB models relate to the selection of source profiles, typically from spectral libraries, and handling of their uncertainty. In our use, the anchor spectra as well as the limits for their allowed variabilities are experimentally derived from data, alleviating some of these typical concerns.
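The idea of constraining every profile within narrow, data-derived limits can be sketched as follows. This is an illustration under stated assumptions, not the ME-2 or SoFi implementation: the function names are hypothetical, the limits are assumed to be the anchor plus or minus a per-channel variability, and the lower limit is assumed to be floored at zero.

```python
def profile_bounds(anchor, variability):
    """Lower and upper limits for a profile (assumed form: anchor +/- variability)."""
    lower = [max(a - v, 0.0) for a, v in zip(anchor, variability)]  # no negative signal
    upper = [a + v for a, v in zip(anchor, variability)]
    return lower, upper


def clip_profile(profile, lower, upper):
    """Project a candidate profile onto the allowed range, channel by channel."""
    return [min(max(p, lo), hi) for p, lo, hi in zip(profile, lower, upper)]


# Hypothetical three-channel anchor spectrum and its variability estimate
anchor = [0.5, 0.3, 0.2]
variability = [0.05, 0.4, 0.02]
lower, upper = profile_bounds(anchor, variability)

# A candidate profile proposed by a solver is forced back inside the limits
candidate = [0.6, 0.1, 0.1]
constrained = clip_profile(candidate, lower, upper)
```

The narrower the limits, the closer the model behaves to a strict CMB with fixed source profiles; wide limits approach the freedom of an exploratory factorisation.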
For spectra classification, we selected the
Based on our earlier metric comparison (Äijälä et al., 2017), we
used (Pearson) correlation as a metric for spectral dissimilarity (or
“distance”,
In clustering mass spectra, data weighting is often applied. Based on
previous tests (Äijälä et al., 2017), we applied mass scaling of
variables, advocated by Stein and Scott and others (Stein and Scott, 1994;
Kim et al., 2012; Horai et al., 2010), giving additional emphasis to higher
mass signals. This common practice is based on the idea that higher mass
fragment ions are more indicative of their parent ions, and thus the
original characteristic composition, while smaller fragments can be produced
from a wider variety of molecular fragmentation events. In mass scaling the
weighted variables (
The optimisation of mass scaling was based on the silhouette metric (later also
abbreviated as “silh”; Rousseeuw, 1987), ranging between
In order to mitigate the
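The combination of mass scaling, a correlation-based dissimilarity and the silhouette metric can be sketched in a few lines of Python. Several details here are assumptions rather than the definitions used in this study: mass scaling is taken as multiplying each signal by its mass raised to an exponent `s`, the dissimilarity is taken as one minus the Pearson correlation, and singleton clusters are assigned a silhouette of zero. All spectra and names are hypothetical.

```python
from math import sqrt


def mass_scale(spectrum, mz, s):
    """Assumed scaling form: signal times mass**s (s = 0 leaves the spectrum as is)."""
    return [x * (m ** s) for x, m in zip(spectrum, mz)]


def pearson(a, b):
    """Pearson correlation; assumes neither input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def dissimilarity(a, b, mz, s):
    """Assumed distance: 1 - r between mass-scaled spectra."""
    return 1.0 - pearson(mass_scale(a, mz, s), mass_scale(b, mz, s))


def silhouettes(spectra, labels, mz, s):
    """Silhouette value per spectrum; needs at least two clusters."""
    n = len(spectra)
    d = [[dissimilarity(spectra[i], spectra[j], mz, s) for j in range(n)]
         for i in range(n)]
    out = []
    for i in range(n):
        own = [d[i][j] for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            out.append(0.0)  # convention for single-member clusters
            continue
        a = sum(own) / len(own)  # mean distance to own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        out.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return out


# Hypothetical four-channel spectra forming two well-separated groups
mz = [43, 44, 55, 57]
spectra = [
    [0.20, 0.60, 0.10, 0.10],
    [0.25, 0.55, 0.10, 0.10],
    [0.10, 0.05, 0.40, 0.45],
    [0.12, 0.05, 0.38, 0.45],
]
labels = [0, 0, 1, 1]
silh = silhouettes(spectra, labels, mz, s=1.0)
mean_silh = sum(silh) / len(silh)
```

Repeating the calculation over a range of `s` values and keeping the one with the highest mean silhouette is one way to picture the optimisation of mass scaling described above.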
Aerosol inorganic chemical speciation is better understood than the organic
speciation, due to much lower diversity of the chemical compounds involved.
A variety of aerosol inorganic equilibrium models exist and are typically
used as modules in atmospheric meteorological and air quality models.
However, performing thermodynamic equilibrium calculations is
computationally demanding (e.g. Fountoukis and Nenes, 2007) and requires a
good deal of auxiliary data on thermodynamic conditions and chemical
activities. Due to the complexity of the models and increased data needs,
simpler approximations are often used in connection with AMS inorganic
speciation. In the following ion-balance-scheme description, we denote the
respective AMS ion species molar concentrations in square brackets (e.g.
[
A typical salt formation approximation used for AMS results is the Hong et
al. (2017) ion pairing scheme, used in aerosol volatility and light
scattering models, for example (Hong et al., 2017; Zieger et al., 2015). The
Hong et al. (2017) scheme is based on similar approximation of Gysel et
al. (2007), which in turn is a simplification of the more extensive model by
Reilly and Wood (1969). We modified the Hong et al. (2017) scheme to
additionally allow organonitrate (
Briefly, in the scheme we apply,
Schematic representation of the inorganic apportionment scheme. The
scheme is divided into three cases according to the ratio of
[
The organic nitrate estimate in the above model is very sensitive to calibration
parameters (see Supplement Sect. S4). Therefore, in addition to the ion-balance-based scheme above, we additionally calculated a particulate organonitrate
mass estimate (
As stated in the Introduction, one of the aims of our work was to derive a robust, harmonised receptor model for the measurement site via explorative analysis. Considering the large number of campaigns during different seasons, resulting in changing aerosol source contributions and mass spectral profiles, factorisation needed to be performed on a per-campaign (data set) basis. However, instead of performing traditional PMF complete with correlation analysis, source validation and the various sensitivity analyses separately, which would be an arduous task even for a single measurement set, we used the large number of data sets to our advantage. Instead of optimising individual factorisations, we constructed an r-CMB model applicable to all data sets. A similar task of constructing a semi-exploratory synthesis aerosol model, albeit one applying a different methodology, was undertaken and reported by Sofowote et al. (2015).
A flowchart illustrating the analysis using combined methodology. After initial data collection and preparation, statistical analysis is performed in three phases (P-I to P-III). Each phase limits the freedom given to factorisation from completely free (PMF) to nearly fully constrained (r-CMB). Finally, we evaluate and interpret the r-CMB model from an aerosol chemical perspective.
To derive the anchors and constraints for a synthesis r-CMB model, we
analysed the data in three phases (P-I to P-III; Fig. 2), each consisting
of factorisation, classification and silhouette-based post-weighting of
anchor spectra and their allowed variabilities. The allowed variabilities
were constrained by setting upper and lower bounds (the estimated
variability ranges from the previous phase) for factor profiles. In Phases I
and II, a fixed number of 10 factors was resolved. This number of factors
was semi-arbitrarily chosen, and in our case likely to be somewhat above the
optimal number for most data sets, leading to over-resolved factor
solutions. However, unlike in traditional PMF analysis, we can use
additional statistical diagnostics and post-processing options available to
deal with potential fallout of unrealistic factor splitting (i.e.
classification for identifying outliers and post-processing down-weighting
or nullifying their influence). Sensitivity to initialisation seed was
examined by performing all runs using 10 initialisation seeds, and generally
selecting the solution with the lowest normalised residual. In rare cases where
the solution with the lowest residual was physically unrealistic (e.g. only
In phase I (P-I), we performed unconstrained factorisation for all the eight data
sets. With 10 factors each, this resulted in a total of 80 factor mass spectra. We
then determined the dominant spectra classes using
For a cluster centroid to qualify as an anchor for further phases of our
analysis, we applied the following two criteria: (1) the spectra forming the
cluster were present in multiple (
Using the anchors and within-cluster variabilities, we re-ran factorisation as in P-I, except now partly constrained (ME-2; 4 of 10 factors constrained using anchors from P-I). In phase II, we focused on analysing the remaining free factors, likely corresponding to the biogenic and assumedly more variable factors (Canonaco et al., 2015; Crippa et al., 2014). The procedure for classification and the selection criteria for the (assumedly) biogenic SOA in this phase were the same as in phase I.
Due to the data-driven analysis approach, specifically the constrained factors being selected from phase I, we do not expect major changes between phase I and phase II (P-II) results. While arguably the methodology could be further developed to constrain the r-CMB components directly from the phase I result, phase II of our analysis currently serves several purposes: (1) it should narrow down the solution space for improved description of the various SOA types, by constraining the anthropogenic, assumedly primary aerosols. (2) Compared to P-I, the allowed solutions are more similar for all data sets in P-II, which reduces the scatter of the factorisation solutions. This reduces the spectral variability (uncertainty) arising from the analysis process itself, allowing us to iteratively converge on more realistic limit values for the constraints. Ultimately, the limits should reflect the actual, natural chemical variabilities within the aerosol types. (3) Similarity of results between successive, un- or semi-constrained phases allows evaluation of stability, reliability and repeatability of the method, so that it is not e.g. overly sensitive to rotational ambiguity or initialisation parameters of algorithms. This is important since the method described here is new, and its robustness needs to be demonstrated, but less so in potential later use.
In phase III (P-III), we constructed the r-CMB receptor model. In this phase, all
the factors were constrained using anchors and variabilities from the
previous phase result. The number of components in the final r-CMB model, in
our case 7, was equal to the total number of selected aerosol types in phase II. With these model constraints, we performed runs for each of the eight data
sets separately. Using the resulting
In Sect. 3.1, we briefly describe the results from analysis phases I to III (P-I to P-III; corresponding to Sect. 3.1.1 to 3.1.3) but concentrate more on the receptor model results and their interpretation (Sect. 3.2). We then compare our results with reference methods (Sect. 3.3). Comparison results are available in the literature for organic aerosol components (Sect. 3.3.1), and in Sect. 3.3.2 we compare inorganic speciation with the alternative inorganic attribution methods described in Methods (Sect. 2.4). Finally, we briefly describe some of the outlier observations which contain potentially interesting chemical information (Sect. 3.4).
When interpreting and identifying aerosol components, we evaluate spectral
similarity using the same similarity metric (mass scaled correlation) as for
the clustering (Eqs. 3 and 4). We thus report mass scaled squared
correlation coefficients (
In phase I, we performed unconstrained PMF runs using 10 factors for all eight data sets separately. The resulting 80 factor spectra were subsequently
clustered. Maximal data structure (silhouette 0.56) was achieved at mass
scaling
The eight largest clusters for P-I classification of factorisation
results. Cluster centroids (coloured bars) and variabilities (error bars) are
silhouette-weighted averages and standard deviations for the cluster members.
The main anthropogenic aerosol types were identified as clusters no. 2
(“Ammonium sulfate”, AS), no. 4 (“Hydrocarbon-like organic aerosol”,
HOA), no. 5 (“Biomass burning organic aerosol”, BBOA) and no. 8 (“Ammonium
nitrate”, AN). Cluster number, silhouette and population (
Unsurprisingly, the classification returns two large clusters of organic
aerosol resembling the ubiquitous low-volatile oxidised
organic aerosols (no. 1; LV-OOA) and semi-volatile oxidised organic aerosol
(SV-OOA; e.g. Aiken et al., 2007; Jimenez et al., 2009; Zhang et al., 2011).
Comparing to library spectra, the aerosol type dominated by
Final silhouette-weighted reference spectra (coloured bars) and variabilities (error bars) for the r-CMB model components.
The solution also contains a large cluster (no. 2) with spectra dominated by
ammonium and sulfate ion species. This is in agreement with ammonium
sulfate being a main component of ambient aerosols. Although it also contains
trace amounts of other species, we name the
The main nitrate-containing spectra are divided into two clusters (no. 6 and
no. 8). The divisive feature seems to be the ratio of
A fraction of the organic signal observed at
Two of the clusters (no. 4 and no. 5) seem related to anthropogenic
(primary) organic aerosol types. The spectrum of cluster no. 4 is similar to the
hydrocarbon-like-organic aerosol (HOA) spectra from the AMS spectral
database (Ulbrich et al., 2009) and closely matches, among others, HOA
reported by Zhang et al. (2005) for Pittsburgh
(
Cluster no. 5 features high signals for ions typical of biomass burning
organic aerosol (BBOA, e.g. Alfarra et al., 2007) and cooking organic
aerosol (COA, e.g. Mohr et al., 2012). The spectrum features the marker
signals of levoglucosan (Cubison et al., 2011; Schneider et al., 2006) at
The differentiation between HOA versus BBOA or COA can often be resolved even
from unit resolution spectra, using the
Cluster 7 spectrum offers little in terms of unique spectral features, and it appears as though it could be represented as a combination of the more distinct AS (no. 2), LV-OOA (no. 1) and ON (no. 6) aerosol types. It is unclear whether this class represents an actual aerosol chemical type, or whether it is due to incomplete resolving of the aforementioned species in the PMF model. We note that the organics part of AS, LV-OOA and ON are all highly oxidised, which may imply similar levels of aging and thus similar origins for these species. Organic spectral components are further analysed and discussed in Sect. 3.2.2.
Based on this interpretation and evaluation of criteria outlined in
Sect. 2.5, we decided to select the following as the main representative anthropogenic
aerosol types: ammonium sulfate (cluster no. 2,
In the second phase of our analysis, ME-2 factorisations were run for 10
factors for all the data sets. We constrained 4 out of the 10 factors with
the anchors and variabilities for anthropogenic aerosol types, derived from
the previous phase (AS, AN, HOA, BBOA). The resulting 80 factor profiles were
again extracted and classified. The classification solutions featured
generally higher silhouette values than in the first phase, which is at least
partly explained by constrained spectra being forced to conform to their set
limits. The highest total silhouette (0.66) was obtained for 15 clusters (at
The expected LV-OOA (no. 1;
For P-III of our analysis, we additionally fix the organic nitrogen class
(ON, P-II cluster no. 8). Irrespective of the exact chemical composition and
label of this aerosol component, we assess that there is enough literature
support (among others Kiendler-Scharr et al., 2016; Farmer et al., 2010;
Drewnick et al., 2015; Murphy et al., 2007; Hao et al., 2014) for
nitrogen-containing aerosol types other than AN to warrant the inclusion
of this class. In any case, the classification of nitrate signal at
In the final phase (P-III) of constructing our r-CMB receptor model, we used seven factors which were all constrained with the profiles and allowed variabilities from the previous phase (P-II, AS, LV-OOA, SV-OOA, BBOA, ON, HOA, AN). The ME-2 algorithm was tasked with resolving the factors' temporal behaviour.
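With all profiles fixed, resolving the temporal behaviour reduces to finding non-negative loadings for each measured spectrum. The sketch below illustrates this for a single time point using a simple multiplicative-update scheme, which is swapped in here for clarity and is not the ME-2 algorithm. The profiles, uncertainties and function name are hypothetical, and the profiles are held strictly fixed rather than allowed to vary within limits.

```python
def solve_loadings(x, profiles, sigma, n_iter=500, eps=1e-12):
    """Non-negative loadings g minimising sum(((x - g*F) / sigma)**2) for fixed F.

    Uses multiplicative updates, so x and the profiles must be non-negative.
    """
    w = [1.0 / (s * s) for s in sigma]   # weights from measurement uncertainty
    g = [1.0] * len(profiles)            # positive starting guess
    n_ch = len(x)
    for _ in range(n_iter):
        # Modelled spectrum with the current loadings
        model = [sum(g[k] * profiles[k][j] for k in range(len(g)))
                 for j in range(n_ch)]
        for k, f in enumerate(profiles):
            num = sum(w[j] * f[j] * x[j] for j in range(n_ch))
            den = sum(w[j] * f[j] * model[j] for j in range(n_ch))
            g[k] *= num / (den + eps)    # update keeps g[k] non-negative
    return g


# Two hypothetical fixed profiles over four mass channels
F1 = [0.7, 0.2, 0.1, 0.0]
F2 = [0.0, 0.1, 0.3, 0.6]

# Synthetic observation: 2 units of source 1 plus 3 units of source 2
x = [2.0 * a + 3.0 * b for a, b in zip(F1, F2)]
sigma = [0.1, 0.1, 0.1, 0.1]

g = solve_loadings(x, [F1, F2], sigma)
```

Applying the same solve to every time point yields the component time series; in the r-CMB model the profiles may additionally move within their allowed variabilities, which this sketch does not attempt.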
To derive final characteristic spectra for the model components, as well as
to study the variability of spectra in the solutions, we once more applied
the same clustering procedure and silhouette analysis as for previous
phases. The maximal structure (silh 0.85) was achieved for the seven-cluster
solution (
Tabulation of final explained variations (EVs; Paatero, 2000; Canonaco et
al., 2013) for the r-CMB model is shown in Table 3. The seven-component
r-CMB model explains
Explained variations (EV, in percent) for the r-CMB model.
“Default” chemical speciation for r-CMB components: mass
loadings
Model results for campaign VIII, especially regarding BBOA, are very different from other data sets, including the other cold season results available in data set III, for example (Fig. S5). Upon closer examination, we attribute the VIII anomaly at least partly to pronounced surface ionisation effects, discussed more in Sect. 3.4. While we consider the r-CMB results for campaign VIII too unreliable for use in models or further studies, we decided not to omit data set VIII, since other AMS data are likely also affected by the same processes, albeit to a lesser degree. The attribution of anomalies to exact processes is very difficult, and surface ionisation effects remain hard to quantify. We hope that reporting our results in full also furthers the discussion of surface ionisation in the AMS, and potentially helps other AMS users making similar observations.
The composition of our r-CMB components is shown in Fig. 5b, and the same in
absolute mass units in panel (a). The opposite visualisation,
i.e. attribution of default species into r-CMB components, is similarly given
for absolute mass concentration and relative units in Fig. 5c and d. Unlike
mass spectral variables and estimated EV, where signals at
Generally, the separation between the inorganic r-CMB components (AS, AN)
and organics (LV-OOA, SV-OOA, BBOA, HOA) seems clear (Fig. 5). Ammonium
nitrate and sulfate components consist primarily of inorganic ion species
(81 % to 84 %), while for organic components the inorganic ion species
contribution is small (LV-OOA: 8 %, SV-OOA: 8 %, BBOA: 6 %, HOA:
3 %). However, extensive oxidation of organics in aerosol typically
results in the formation of organic acids (Yatavelli et al., 2015; Vogel et al.,
2013; Duplissy et al., 2011), and we hypothesise that organic salt formation with
[
Mass attribution in the default AMS speciation scheme
Explanations for the observed mixing of ion species can include (1) mixed
emission profiles at sources, variabilities within a source type, as well as
collocation of sources; (2) atmospheric processes, such as mass transfer
between the species by evaporation, condensation (e.g. Ye et al., 2016) or
coagulation; and (3) PMF or r-CMB modelling uncertainties. We will discuss the
relative ratios and neutralisation balances of inorganic ion species in
Sect. 3.3.2, in relation to inorganic salt formation scheme. The interesting
exception to the rather clear-cut ion species separation is the ON component,
which contains 40 % of
As for the organics–inorganics division, the two speciations (default vs. r-CMB) give similar results (Fig. 6). For all the data sets combined, the default organic ion species (“org”) explains an average 57 % of total aerosol mass at the site. Similarly, combining the mass of all organic-dominated components (LV-OOA, SV-OOA, BBOA, HOA and ON) results in 60 % mass fraction versus 40 % explained by ammonium nitrate (5 %) and ammonium sulfate (35 %) salts. The per-data-set mass apportionment is presented in the Supplement (Fig. S9).
As discussed above, despite the mixing observed, the inorganic aerosol classes generally seem separate from organic aerosols. The scaled correlation values between inorganic and organic spectra are extremely low (Supplement Sect. S8, Tables S1 and S2), indicating near-zero similarity and clear-cut separation between the inorganic and organic aerosol types by the clustering algorithm. For inter-correlations between the organics-dominated aerosol classes, the picture is somewhat more complex.
To understand the drivers for the separation of the organic aerosol types, we
visualised the phase I (unconstrained PMF) and phase III (r-CMB)
classification results with a projection of the clustering solutions onto a
plane defined by an axis corresponding to estimated oxidation level and
another connected to source type (P-III in Fig. 7; P-I available in the
Supplement, Fig. S6). Similar to Äijälä et al. (2017), we
describe the oxidation level of the organic fraction of each component using
the oxygen-to-carbon ratio (O : C) parameterisation of Aiken et al. (2008),
and use the ratio of
The LV-OOA aerosol type, characterised by the dominant
We also projected the P-I and P-III solutions to the (
As stated in Sect. 3.1, the spectra of BBOA and HOA aerosol types match
the previously published observations. The HOA spectrum is characterised by
the ion series
In terms of spectral characteristics, the organic contributions of AS and AN
classes fall somewhere between the distinct organic classes and offer little
in terms of significant organic markers. Notably, the organics in the ON
class exhibit some of the characteristics of LV-OOA and feature generally
high
In order to evaluate the performance of the source apportionment approach
presented in this study for organic aerosol, we compare our results to
results relying only on the organic mass spectral fingerprints. Specifically,
two data sets covered in this study (data sets II and III; Table 1) were also
included in the Crippa et al. (2014) analysis, which allows us to compare
factorisation results directly. We chose to compare the Crippa et al. (2014)
results to ours from data set II. We note that while there are minor
differences in the pre-processing and corrections for data covered in Crippa
et al. (2014), the factorisation input is very similar in both cases. The
ME-2 model used by Crippa and co-workers included only the organic spectra
and apportioned its mass to four factors: LV-OOA, SV-OOA, BBOA and HOA. The
latter two components were constrained using a HOA profile from an urban
aerosol study in Paris (Crippa et al., 2013) and an average BBOA of those
extracted for Mexico City, Mexico, and Houston, USA (Ng et al., 2011). The
allowed variability around these anchors for all variables (
We compared the Crippa et al. (2014) factorisation solution to our r-CMB
model solution for data set II, both for loadings (Fig. 8) and profiles (Fig. 9).
Generally the solutions correlated highly – the loadings (
Time series comparison of aerosol organic component with Crippa et al. (2014) for the September 2008 campaign (data set II). For comparability, only the organic part of r-CMB model components are considered. Data from this work have been averaged to 1 h resolution. Organics in other r-CMB components (AS, AN, ON) are taken into account for the total amount but not shown separately. Discrepancy in total organics loading is due to differences in pre-processing values (e.g. ionisation efficiency, collection efficiency).
Comparison of organic part of spectra with Crippa et al. (2014) for
data set II. The r-CMB model results from this study are shown in colour, and
the Crippa et al. (2014) spectra in black. For comparability, the Crippa et
al. (2014) spectra were corrected for a difference in fragmentation tables
used (included
The discrepancy in the distribution of absolute mass for the LV-OOA and SV-OOA components, indicated by the sub-unity slope, suggests that the r-CMB model attributes a part of the organic mass from the SOA factors to the BBOA, AS, AN and ON components, while HOA is represented almost identically in both models. A difference in mass distribution between the results is to be expected, considering that the r-CMB model allows for organics in seven components, while the model of Crippa et al. (2014) comprises only four components. Generally, we take the similar results of the methods, as shown by the high correlation values, to indicate that inclusion of inorganics in the model does not significantly perturb modelling of the organics. We also note that the r-CMB components included (HOA, BBOA, LV-OOA, SV-OOA) are predominantly composed of organics (92 % to 97 %; Fig. 5), and the four components presented comprise 82 % of total organics.
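A comparison of this kind can be sketched as follows: one loading series is averaged to 1 h resolution and then regressed against a reference series, giving a slope and a correlation coefficient. This is a minimal sketch under stated assumptions, not the procedure used in this study. The function names, the timestamps and all values are hypothetical, and an ordinary least-squares fit with intercept is assumed.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_mean(times, values):
    """Average (time, value) samples into 1 h bins keyed by the start of the hour."""
    bins = defaultdict(list)
    for t, v in zip(times, values):
        bins[t.replace(minute=0, second=0, microsecond=0)].append(v)
    return {hour: sum(vs) / len(vs) for hour, vs in sorted(bins.items())}


def ols_slope_and_r(x, y):
    """Ordinary least-squares slope of y on x (with intercept) and Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


# Hypothetical 30 min samples over four hours (placeholder values, not measured data).
start = datetime(2008, 9, 1, 0, 0)
times = [start + timedelta(minutes=30 * i) for i in range(8)]
model_raw = [0.6, 1.0, 1.4, 1.8, 2.2, 2.6, 3.0, 3.4]
model_hourly = list(hourly_mean(times, model_raw).values())

# Hypothetical hourly reference loadings for the same four hours.
reference_hourly = [1.0, 2.0, 3.0, 4.0]

slope, r = ols_slope_and_r(reference_hourly, model_hourly)
print(f"slope = {slope:.2f}, r = {r:.2f}")
```

In this placeholder example the two series are perfectly correlated, but the slope is below unity, which is the pattern described above for a component that receives less mass in one model than in the other.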
To evaluate the inorganic mass apportionment result, we compared the
loadings from the r-CMB solution against the result from the inorganics
apportionment scheme (Sect. 2.4.1). The comparison, again performed for
data set II, is presented in Fig. 10. We additionally compared the r-CMB
ON component loadings with
Comparison of inorganics apportionment methods (r-CMB and ion balance scheme). The estimates from the ion balance scheme (Sect. 2.4.1) are shown in black, and the r-CMB model results in colour. The linear fits (right panels) represent the data poorly due to the large number of zero-value points and outliers.
The loadings for the (r-CMB) AS component compare well with the combined
The loading prediction for organic nitrogen by the speciation scheme model
is similarly event-driven and the model results do not correlate. This is
caused by the nitrate assignment to organonitrate class when not explained
by
On these differences between the models, we note that the ion-balance-based
apportionment scheme is sensitive to small changes in
Comparison of Kiendler-Scharr parameterisation (Kiendler-Scharr et
al., 2016; black line; moving median filter for 11 points window applied;
In addition to deriving organic nitrogen mass from the ion balance scheme,
we compared the r-CMB-derived ON loading with the Kiendler-Scharr method for
estimating the orgNO3 mass loading (Eq. 6). The comparison, shown in
Fig. 11, indicates that the two methods produce a very similar result for
organic nitrogen mass (
The similarity to Kiendler-Scharr parameterisation result does seem to
support the interpretation of a nitrogen component in ON as organonitrate
(
The
During the course of our analysis we encountered some anomalous observations likely stemming from surface ionisation effects, i.e. molecules being thermally ionised at the heater surface rather than at the ionisation region by electron impact. A thorough review and discussion of AMS-related surface ionisation effects was recently published by Drewnick et al. (2015). Drewnick et al. (2015) emphasise that the division between refractory and non-refractory aerosol is not binary, and there exist a number of semi-refractory compounds that the AMS can measure, albeit non-quantitatively.
Spectra of outlier clusters (no. 9 to no. 17) for P-I. The spectra
for these outlier classes were omitted from our analysis due to not meeting
the criteria of (1) occurrence and/or (2) interpretability (on an acceptable
level). Despite their mostly speculative value, many of them feature some
chemically interesting characteristics, potentially pointing to the presence of
amines (signals at
Our observations on extracted “outlier” PMF factors from the different phases of analysis match well with the findings and calculations of Drewnick et al. (2015), as well as other similar published AMS observations. In Fig. 12, we present the outlier clusters from the phase I classification solution that were excluded from further analysis due to a low number of occurrences and/or questionable interpretability. The emergence of most of these spectra is likely attributable to over-resolution or questionable separation of the main PMF factors, due to setting the number of PMF factors to 10. Despite their questionable value for the main analysis, we find they contain many potentially interesting mass spectral features and seem not to emerge by chance. Below we present some hypotheses on their possible interpretation.
Drewnick et al. (2015) note that the main semi-refractory elements eligible
for ionisation in the AMS are Cd (
A similar data processing/correction artefact is likely seen in cluster
no. 12 with a lone, dominant signal at
The prominent signals at
As for the signals at 86 and 100 Th, often attributed to amines (McLafferty,
1959) and featured in cluster no. 11, in the absence of an alternative explanation
we are inclined to believe they actually represent
atmospheric amines. The cluster spectrum also corresponds to the spectra of
pollution plumes, extracted for data sets I to III in our previous study on
pollution events (Äijälä et al., 2017). We note that amines are also
reported to be prone to surface ionisation, and for example trimethylamine is
thermally ionised above temperatures 300
Clusters no. 13, no. 15 and no. 16 are interesting from the viewpoint of
organonitrates and sulfates. Nitrate signal in clusters no. 15 and no. 16
is composed mostly of
Finally, we wish to draw attention to the ion series of cluster no. 16, with prominent organic signals at 69, 79, 81, 95, 107 and 109 Th, which have been connected to cycloalkanes (McLafferty and Turecek, 1993; Alfarra et al., 2004). Cycloalkanes are common in lubricating oils, for example (Liang et al., 2018), which are an important, even dominant, component in traffic emissions (Worton et al., 2014). The closest literature match on ambient observations we found was the study of Takami et al. (2007), who observed similarly high concentrations at mass-to-charge ratios of 95, 107 and 109 Th, as well as 58 and 85 Th, but were unable to attribute the observation to a specific source.
We performed a synthesis analysis on eight AMS data sets from a boreal
forest site and constructed a data-driven chemical mass balance type of
receptor model, with relaxed constraints on the component profiles (r-CMB).
Notably, the data comprised both inorganic and organic aerosol components.
The resulting seven-component model explained
Remarkably, organic nitrogen seems to be a larger component than ammonium nitrate
for the site. However, ambiguity remains in the interpretation of the
organic nitrogen class as organonitrate, prompting caution against casual
use of the
We suggest inorganics should be routinely included in factorisation of AMS
data due to the high demand for such data in aerosol models. We wish
specifically to point out that adding the inorganic information is easy and
only requires application of the same tried-and-tested data processing and uses
the same error model as for organics. While inclusion of inorganics does
diminish the relative weight organics carry in the analysis and thus may
hinder extraction of organic factors comprising a very low fraction (
The classification methods presented here for evaluating factor analysis
output can also be useful in applications that produce large
quantities of discrete aerosol spectral data, such as deriving factorisation
error estimates via bootstrapping analysis (Osborne et al., 2014; Brown et
al., 2015). With further development, we find it likely that a two-step analysis
(exploratory factorisation
We would also encourage further development of combined statistical methods for improved mass spectral feature extraction and parameterisation, as they will enable future machine-learning applications for data analysis. Drawing on the comprehensive information available in current size-resolved aerosol mass spectrometric data, it seems likely that advanced machine-learning methods (such as data reduction combined with predictive neural networking, e.g. Burns and Whitesides, 1993; Gasteiger and Zupan, 1993) will provide new, improved ways to model aerosol physicochemical properties like hygroscopicity, volatility and optics in the near future.
The AMS r-CMB data presented in this study are available
online (Äijälä et al., 2019). The r-CMB component profiles will
additionally be made available in the AMS spectral database
(
The supplement related to this article is available online at:
Contributor roles (shown in italics) corresponding to the taxonomy of CASRAI's CRediT
definitions (
The authors declare that they have no conflict of interest.
We wish to thank the technical staff at INAR and SMEAR II (Pasi Aalto, Erkki Siivola, Heikki Laakso, Toivo Pohja, Veijo Hiltunen and Janne Levula) for valuable support during the years 2008–2010 in acquiring the data sets analysed here. We thank Douglas Worsnop for pioneering work in starting the AMS studies at University of Helsinki, and the valuable insightful discussions on AMS data analysis and interpretation. We also gratefully acknowledge the friendly support staff at Aerodyne Research (especially Donna Sueper and Leah Williams) for their help on data analytical questions.
The research was supported by the following programs: the European Commission FP6 projects EUCAARI (036833-2), FP7 ACTRIS (262254), the Horizon 2020 project ACTRIS-2 (654109), ERC Grant COALA (638703), the Finnish COE project CRAICC (272041) and the Academy of Finland COE in Atmospheric Science (2008–2019).
Edited by: Dominick Spracklen
Reviewed by: two anonymous referees