Improved identification of primary biological aerosol particles using single-particle mass spectrometry

Measurements of primary biological aerosol particles (PBAP), especially at altitudes relevant to cloud formation, are scarce. Single-particle mass spectrometry (SPMS) has been used to probe aerosol chemical composition from ground and aircraft for over 20 years. Here we develop a method for identifying bioaerosols (PBAP and particles containing fragments of PBAP as part of an internal mixture) using SPMS. We show that identification of bioaerosol using SPMS is complicated because phosphorus-bearing mineral dust and phosphorus-rich combustion by-products such as fly ash produce mass spectra with peaks similar to those typically used as markers for bioaerosol. We have developed a methodology to differentiate and identify bioaerosol using machine learning statistical techniques applied to mass spectra of known particle types. This improved method provides far fewer false positives compared to approaches reported in the literature. The new method was then applied to two sets of ambient data collected at Storm Peak Laboratory and a forested site in Central Valley, California to show that 0.04–2 % of particles in the 200–3000 nm aerodynamic diameter range were identified as bioaerosol. In addition, 36–56 % of particles identified as biological also contained spectral features consistent with mineral dust, suggesting internal dust–biological mixtures.


Introduction
Biological atmospheric aerosol (or bioaerosol) has recently garnered interest because certain species of bacteria and plant material might impact climate via the nucleation of ice in clouds (Hiranuma et al., 2015; Möhler et al., 2008). However, many field-based measurements of ice nuclei and ice residuals do not indicate that bioaerosol is a major class of ice-active particles (Cziczo et al., 2013; DeMott et al., 2003; Ebert et al., 2011). While modeling efforts suggest that biological material is not significant in ice cloud formation on a global scale, uncertainties continue to exist because field measurements of ice-nucleating particles are currently sparse (Hoose et al., 2010; Sesartic et al., 2012).
Published by Copernicus Publications on behalf of the European Geosciences Union.
In situ techniques specific to biological samples are typically based on the fluorescence of biological material following UV excitation. Examples include the wide-band integrated bioaerosol sensor (WIBS), which is available commercially (Kaye et al., 2000, 2005). WIBS has been successfully deployed in several locations (Gabey et al., 2010; O'Connor et al., 2014; Toprak and Schnaiter, 2013). Fluorescence-based detection of biological aerosol is subject to interferences, however. For example, polycyclic aromatic compounds or humic acids can have similar fluorescent properties (Gabey et al., 2010; Pan et al., 1999). Cigarette smoke has similar fluorescent properties to bacteria (Hill et al., 1999). In an attempt to address interferences, WIBS collects fluorescence information using several channels with different wavelengths while also measuring the size and shape of the particles. Table 1 summarizes some recent measurements of bioaerosol. More information can be found in recent reviews focused on bioaerosols in the atmosphere, such as Després et al. (2012).
Measurements of bioaerosol in the free and upper troposphere, where they could be relevant to cloud formation, remain scarce. Four of the recent studies reported in Table 1 used an aircraft to access altitudes higher than 4000 m (DeLeon-Rodriguez et al., 2013; Pósfai et al., 2003; Twohy et al., 2016; Ziemba et al., 2016). Two of these used the WIBS to report vertical profiles of fluorescent particles (Twohy et al., 2016; Ziemba et al., 2016). In the remaining two cases, aerosols were collected on filters and analyzed off-line. There can exist significant uncertainty in these measurements. A recent aircraft-based study by DeLeon-Rodriguez et al. (2013) reports analysis of high-altitude (8-15 km) samples taken before, after and during two major tropical hurricanes. The abundances of microbes, mostly bacteria, were reported to be between 3.6 × 10⁴ and 3.0 × 10⁵ particles m⁻³ in the 0.25-1 µm size range. The methods and conclusions of this study were re-evaluated by Smith and Griffin (2013), who argued that in some instances the reported concentrations of bioaerosol were not possible because they exceeded the total aerosol by several factors. The samples were also taken over periods of hours, possibly including sampling in clouds when the high-speed impaction of droplets and ice can dislodge particles from the inlet (Cziczo and Froyd, 2014; Froyd et al., 2010; Murphy et al., 2004).
Although difficult, measurements of bioaerosol in the upper troposphere are necessary in order to constrain their influence on atmospheric properties and cloud formation processes. All of the techniques discussed above, except for WIBS, are off-line and require expertise in sample processing and decontamination. WIBS is a possible in situ detection technique for bioaerosols, but it is relatively new and, as a result, has a short deployment history. There has been considerable interest in using aerosol mass spectrometry techniques to measure bioaerosol. Single-particle mass spectrometry (SPMS) has been successfully used since the mid-1990s to characterize the chemical composition of atmospheric aerosol particles in situ and in real time (Murphy, 2007). The ability of SPMS to simultaneously characterize volatile and refractory aerosol components makes it an attractive tool for investigating the mechanisms of cloud formation (Cziczo et al., 2013; Friedman et al., 2013). The general principle behind SPMS, and in particular the instrument discussed in this paper, the Particle Analysis by Laser Mass Spectrometry (PALMS), is the use of a pulsed UV laser for the ablation and ionization of single aerosol particles. Ions are then accelerated into a time-of-flight mass spectrometer. Laser ablation or ionization used with SPMS produces ion fragments and clusters and is susceptible to matrix effects such that quantitative results are possible only with careful calibration and consistent composition (Cziczo et al., 2001).
Biological aerosols have been studied with SPMS, in particular the aerosol time-of-flight mass spectrometer (ATOFMS; Cahill et al., 2015; Creamean et al., 2013; Fergenson et al., 2004; Pratt et al., 2009b). A property of SPMS bioaerosol spectra that has been exploited for their detection is the presence of phosphate (PO⁻, PO₂⁻, PO₃⁻) and organic nitrogen ions (CN⁻, CNO⁻) (Cahill et al., 2015; Fergenson et al., 2004). Those ions have also previously been shown to be present in nonbiological particles with the same instrument, however, such as vehicular exhaust (Sodeman et al., 2005) and soil dust (Silva et al., 2000). Particles that contain phosphates, organic nitrates and silicates have historically been classified as mixtures of bioaerosol and dust (Creamean et al., 2013). This work examines the prevalence of these ions in the context of spectra collected with PALMS.
Phosphorus was chosen as the focus of this paper because of its abundance in spectra of bioaerosol but also because it does not undergo gas-phase partitioning in the atmosphere (Mahowald et al., 2008). Therefore, the presence of phosphorus on a particle can often constrain its source, and only the classes of particles that are most likely to contain phosphorus are examined here. Emission estimates qualitatively agree that mineral dust, combustion products and biological particles constitute the principal phosphate emission sources. The global phosphorus budget has been modeled by Mahowald et al. (2008), indicating that 82 % of the total burden is emitted in the form of mineral dust. Bioaerosol accounts for 12 % and anthropogenic combustion sources, including fossil fuels, biofuels and biomass burning, account for 5 % (Mahowald et al., 2008). Recently, Wang et al. (2014) provided a higher estimate of phosphorus emissions from anthropogenic combustion sources: 31 %. In this estimate, mineral dust was responsible for 27 %, bioaerosol for 17 % and natural combustion sources for 20 % of total phosphorus emissions (Wang et al., 2014).
In this work, calcium-phosphate-rich minerals (apatite and monazite) and fly ash are chosen to represent dust and industrial combustion particle classes, respectively. In atmospheric particles, the composition can be mixed, containing some phosphate from inorganic sources, such as calcium phosphate, and some phosphate from microbes. For instance, soils can contain minerals, live microbes and biogenic matter at all stages of decomposition. Therefore, classifying soil-derived particles with a binary biological-nonbiological classifier has uncertainties. These uncertainties are quantified here for soils using soil samples collected in various locations.
Table 1. Measurements of biological aerosol in the atmosphere (NR: not reported; FBAPs: fluorescent particles, attributed to bioaerosol; TEM: transmission electron microscopy).
In this work, the presence of phosphorus in a mass spectrum is evaluated as a proxy for bioaerosol. All biological cells contain phosphorus because it is a component of nucleic acids and cell membranes. Distinguishing the specific mass spectral phosphate signature of biological cells from other nonbiological phosphorus is the topic of the analysis in this paper. The goal of this paper is to develop a method that can differentiate PALMS bioaerosol spectra from spectra of dust and combustion by-products.

Experimental setup
The objective of this work is to describe and validate a new SPMS-based data analysis technique that allows for the selective measurement of bioaerosol. A dataset of single-particle spectra of bioaerosol, phosphate-rich mineral and coal fly ash (the three largest sources of phosphorus in atmospheric aerosols) was used to derive a classification algorithm for biological and nonbiological phosphate-containing material. This classifier was then applied to an ambient dataset collected at the Storm Peak Laboratory during the Fifth International Ice Nucleation Workshop, phase 3 (FIN03).

PALMS
The NOAA PALMS instrument has been discussed in detail elsewhere (Cziczo et al., 2006; Thomson et al., 2000). Currently, there are two copies of the PALMS instrument, both of which were used in this work. The laboratory PALMS is a prototype for the flight PALMS, which is more compact and can be deployed unattended at field sites and on aircraft (Thomson et al., 2000). Briefly, PALMS uses an aerodynamic lens to sample aerosols and impart them with a size-dependent velocity (Zhang et al., 2002, 2004). Aerodynamic particle diameter is measured by timing the particles between two continuous-wave laser beams (532 nm Nd:YAG in laboratory PALMS and 405 nm diode in flight PALMS). The particles are ablated and ionized in one step by a 193 nm excimer laser. A unipolar reflectron time-of-flight mass spectrometer is then used to acquire mass spectra. PALMS acquires spectra in either positive or negative polarity, but not simultaneously. For field datasets presented in this paper, sampling polarity was switched every 5 min for FIN03 and every 30 min for the Carbonaceous Aerosol and Radiative Effects Study (CARES).
Due to the high laser fluence used for desorption and ionization (∼10⁹ W cm⁻²), PALMS spectra show both atomic ions and ion clusters, which complicate spectral interpretation. SPMS is considered a semiquantitative technique because the ion signal depends on the abundance and ionization potential of the substance rather than solely on its abundance (Murphy, 2007). Additionally, the ion signals can depend on the overall chemical composition of the particle, an influence known as matrix effects (Murphy, 2007). The lower particle size threshold for PALMS is ∼200 nm diameter and is set by the amount of detectable scattered light. The upper size threshold is set by transmission in the aerodynamic lens at ∼3 µm diameter (Cziczo et al., 2006). In PALMS, particles toward the larger end of this size range are transmitted into the laser beam more efficiently than smaller particles. The 193 nm excimer laser can ionize all atmospherically relevant particles within this size range with little detection bias (Murphy, 2007). The ionization region is identical in the laboratory and flight PALMS instruments. Raw PALMS spectra are processed using custom IDL software. Mass peak intensities used in this paper refer to integrated peak areas normalized by the total ion current.
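The normalization of peak intensities described above can be illustrated with a minimal Python sketch. It is not the custom IDL processing software; the function name, the ion labels and the peak areas are hypothetical and chosen only to show how an integrated peak area becomes a fraction of the total ion current.

```python
def normalize_peak_areas(peak_areas):
    """Divide each integrated peak area by the total ion current of the spectrum."""
    total_ion_current = sum(peak_areas.values())
    if total_ion_current <= 0:
        raise ValueError("spectrum has no ion signal")
    return {ion: area / total_ion_current for ion, area in peak_areas.items()}


# Hypothetical integrated peak areas (arbitrary units) for one negative spectrum.
raw_spectrum = {"CN-": 120.0, "CNO-": 80.0, "PO2-": 50.0, "PO3-": 150.0, "other": 600.0}
fractions = normalize_peak_areas(raw_spectrum)
```

With these illustrative numbers the total ion current is 1000 units, so the PO₃⁻ peak corresponds to a fractional area of 0.15.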

Aerosol standards
Table 2 shows numbers of negative spectra for all analyses in this paper. A portion of the data from each of the bioaerosol and nonbiological phosphate samples was used as "training data" to build the classification algorithm. The remaining test data were classified using the trained algorithm.

Training dataset
A collection of phosphorus-containing samples of biological and inorganic origin was used to train the classification algorithm used in this work. Some of the samples were analyzed with the laboratory PALMS at the Aerosol Interaction and Dynamics in the Atmosphere (AIDA) facility at Karlsruhe Institute of Technology (KIT) during the Fifth International Ice Nucleation Workshop, phase 1 (FIN01), with the remainder sampled at MIT.
Biological aerosol sampled at AIDA included two aerosolized cultures of Pseudomonas syringae bacteria, Snomax (Snomax International, Denver, CO) (irradiated, desiccated and ground Pseudomonas syringae) and hazelnut pollen wash water. The Snomax and P. syringae cultures were suspended in water and aerosolized with a Collison-type atomizer. The growth medium for P. syringae cultures was Pseudomonas Agar Base (CM0559, Oxoid Microbiology Products, Hampshire, UK).
Biological aerosol sampled at MIT included giant ragweed (Ambrosia trifida) pollen, oak (Quercus rubra) pollen, European white birch (Betula pendula) pollen, Fusarium solani spores and yeast. Samples of dried pollens and F. solani spores were purchased from Greer (Lenoir, NC). Information supplied by the manufacturer indicates that F. solani fungus was grown on enriched trypticase growth medium and killed with acetone prior to harvesting the spores. Ragweed and oak pollen originated from wild plants, while the birch pollen [...] The yeast powder was sampled by PALMS from a vial subjected to slight manual agitation. Pollen grains were too large (18.9-37.9 µm according to manufacturer's specification) to sample with PALMS. They were suspended in ultrapure water (18.2 MΩ cm, Millipore, Bedford, MA), and the suspensions were sonicated in an ultrasonic bath for ∼30 min to break up the grains. Large material was allowed to settle to the bottom, a few drops of the clear solution from the top of the suspensions were further diluted in ultrapure water, and the resulting solutions were aerosolized with a disposable medical nebulizer (Briggs Healthcare, Waukegan, IL). A diffusion dryer was used to remove condensed-phase water prior to sampling with PALMS. F. solani spores were sampled in two different ways: (1) dry and unprocessed, in the same way as the yeast, and (2) fragmented in an ultrasonic bath and wet-generated, in the same way as pollen samples. Examination of PALMS spectra revealed no changes in chemistry resulting from different processing methods.
Samples of fly ash from four coal-fired US power plants were used as a proxy for combustion aerosol: J. Robert Welsh Power Plant (Mount Pleasant, TX), Joppa Power Station (Joppa, IL), Clifty Creek Power Plant (Madison, IN) and Miami Fort Generating Station (Miami Fort, OH). The samples were obtained from a commercial fly ash supplier (Fly Ash Direct, Cincinnati, OH). Fly ash was dry-generated with the shaker.
Apatite and monazite-Ce mineral samples were generated from ∼7.5 cm pieces of rock. The rocks were ground and the samples aerosolized with the shaker. Both apatite and monazite were sampled and processed at MIT. The apatite rock was contributed by Adam Sarafian (Woods Hole Oceanographic Institution, Woods Hole, MA).
Two samples of German soil were used as an example of agricultural soil that was known to be fertilized with inorganic phosphate. These were also sampled at the AIDA facility during FIN01. Note that while all other soil samples are used as test aerosols for the completed classifier, these two in particular are used in the training set to account for the presence of inorganic fertilizer.
Samples of apatite and J. Robert Welsh Power Plant fly ash were also subjected to processing with nitric acid to approximate atmospheric aging. Powdered sample was aerosolized from the shaker to fill a 9 L glass mixing volume. A hot plate below the volume was used to heat the air inside to 31 °C, measured in the center of the volume with a thermocouple. PALMS sampled at a flow rate of 0.44 slpm (STP (standard temperature and pressure): 0 °C, 1 atm) from the 9 L volume. This constituted unprocessed aerosol. Then, 80 % HNO₃ was placed with a Pasteur pipette at the heated bottom of the mixing volume. Two types of experiment were conducted. In experiments using 0.1 mL of nitric acid, the entire volume of HNO₃ evaporated, producing an estimated partial pressure of about 0.005 atm in a static situation. In 1 mL experiments, some liquid HNO₃ remained at the bottom of the volume, with an estimated partial pressure of HNO₃ of 0.04 atm. The aerosol and gas-phase HNO₃ were allowed to interact for 2 min, at which point PALMS began sampling from the volume.
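The order of magnitude of the 0.1 mL estimate can be checked with the ideal gas law. The Python sketch below is an illustration under stated assumptions, not the calculation used in this work: it assumes the 80 % concentration is by mass, takes a solution density of about 1.45 g mL⁻¹ (an assumed handbook-type value), and assumes complete evaporation into a static 9 L volume at 31 °C with no wall losses or uptake by particles.

```python
R_L_ATM = 0.082057  # gas constant in L atm mol^-1 K^-1
M_HNO3 = 63.01      # molar mass of HNO3 in g mol^-1


def hno3_partial_pressure_atm(acid_volume_ml, mass_fraction=0.80,
                              density_g_per_ml=1.45, chamber_volume_l=9.0,
                              temperature_c=31.0):
    """Ideal-gas partial pressure of HNO3 if all of the acid evaporates.

    mass_fraction and density_g_per_ml are assumptions for illustration.
    """
    moles_hno3 = acid_volume_ml * density_g_per_ml * mass_fraction / M_HNO3
    temperature_k = temperature_c + 273.15
    return moles_hno3 * R_L_ATM * temperature_k / chamber_volume_l


p_small = hno3_partial_pressure_atm(0.1)  # complete evaporation assumed
p_large_upper_bound = hno3_partial_pressure_atm(1.0)  # upper bound only
```

Under these assumptions the 0.1 mL case gives roughly 0.005 atm. For the 1 mL case the same formula yields only an upper bound, because some liquid remained at the bottom of the volume.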

Test dataset
Samples of natural soil dust were collected from various locations listed in Table 3. Five samples were investigated at the AIDA facility during FIN01 (Bächli soil, Argentina soil, Ethiopian soil, Moroccan soil and Chinese soil) with the remaining analysis at MIT (Storm Peak and Saudi Arabian soil).
Internally mixed biological-mineral particles were also analyzed at MIT. Illite NX (Clay Mineral Society) without bioaerosol was sampled dry, using a shaker (Garimella et al., 2014), and wet-generated, using a medical nebulizer containing ultrapure water. A second disposable medical nebulizer was then used to aerosolize a suspension of illite NX and F. solani spore fragments. This wet-generated aerosol was also dried with a diffusion dryer prior to PALMS sampling.

Statistical analysis
A support vector machine (SVM), a supervised machine learning algorithm (Cortes and Vapnik, 1995), was used for the statistical analysis of these data. In this case a nonlinear binary classifier was constructed using nonlinear kernel functions (Ben-Hur et al., 2001; Cortes and Vapnik, 1995). A Gaussian radial basis function kernel was empirically determined to provide the best performance. For this work, the SVM algorithm was implemented in MATLAB 2016a (MathWorks, Natick, MA) using the Statistics and Machine Learning toolbox.
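To illustrate how a trained kernel SVM assigns a class, the following Python sketch evaluates the standard decision function f(x) = Σᵢ cᵢ K(sᵢ, x) + b with a Gaussian radial basis function kernel K(s, x) = exp(−γ‖s − x‖²). It is not the MATLAB implementation used here; the support vectors, coefficients, bias and γ are hypothetical toy values, and training (finding those values) is not shown.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian radial basis function kernel exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svm_score(x, support_vectors, dual_coefs, bias, gamma):
    """Signed score; each dual coefficient is alpha_i * y_i from training."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical toy model with one support vector per class in a 2-D feature space.
support_vectors = [(1.0, 1.0), (5.0, 2.0)]
dual_coefs = [-1.0, 1.0]   # negative: nonbiological, positive: bioaerosol
bias = 0.0
gamma = 0.5

score_near_bio = svm_score((5.0, 2.0), support_vectors, dual_coefs, bias, gamma)
score_near_nonbio = svm_score((1.0, 1.0), support_vectors, dual_coefs, bias, gamma)
```

The sign of the score gives the predicted class and its magnitude indicates how far the observation lies from the decision boundary.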

Field data
The method was employed on two ambient datasets: one acquired at the Desert Research Institute's (DRI's) Storm Peak Laboratory located in Steamboat Springs, CO, and the other acquired at the Cool, CA, site during the CARES study. Storm Peak Laboratory is located on Mt. Werner at 3220 m elevation at 106.74° W, 40.45° N. This high-altitude site is often in free-tropospheric air, mainly during overnight hours, with minimal local sources (Borys and Wetzel, 1997). Ambient air was sampled using the Storm Peak facility inlet with the flight PALMS instrument between 14 and 27 September 2015, during the Fifth International Ice Nucleation Workshop, phase 3 (FIN03).
The CARES study was carried out in the summer of 2010 and included the deployment of instruments at two different ground sites: one urban (Sacramento, CA) and another in the Sierra Nevada foothills area rich in biogenic emissions (Cool, CA, site) (Zaveri et al., 2012). Thermally driven winds tend to transport the urban plume into the Sierra Nevada foothills and sometimes back again into the Sacramento area (Zaveri et al., 2012). The laboratory PALMS instrument was deployed at the Cool, CA, site at 450 m elevation at 121.02° W, 38.87° N in a trailer throughout the campaign. It sampled ambient air between 4 and 24 June 2010.

Results
Figure 1 shows the spectra of biological species: P. syringae bacteria, Snomax and hazelnut pollen wash water particles. These particles contain both organic and inorganic compounds. Because they are easy to ionize, the inorganic ions sodium and potassium stand out in the positive spectra despite their minor fraction by mass. Sulfates, phosphates and nitrates are present, and visible in their associations with potassium. Negative spectra are dominated by CN⁻, CNO⁻, phosphate (PO₂⁻ and PO₃⁻) and sulfate (HSO₄⁻). Higher mass associations of potassium, sulfates, phosphates and nitrates [...] Chlorine is present on some particles. Chlorine is a known contaminant from the agar growth medium since spectra of aerosolized agar devoid of bacteria contain large amounts of chlorine (not shown here).
Figure 2 shows spectra of apatite. In positive polarity, apatite spectra are dominated by calcium, its oxides and associations with phosphate (CaPO⁺, CaPO₂⁺, CaPO₃⁺, Ca₂PO₃⁺ and Ca₂PO₄⁺) and fluorine (CaF⁺, Ca₂OF⁺ and Ca₃OF⁺). Negative spectra are dominated by phosphates (PO⁻, PO₂⁻ and PO₃⁻), and fluorine is often present. Lab-generated apatite spectra analyzed in this study contain little organic matter. This may be a result of the post-processing of the apatite sample, in particular of the use of ethanol as a grinding lubricant. In contrast, ethanol was not used in grinding the monazite sample here, and its spectra exhibit peaks associated with organic matter (C₂H⁻).
Figure 3 shows spectra of coal fly ash from the J. Robert Welsh Power Plant. The positive spectra contain sodium, aluminum, calcium, iron, strontium, barium and lead. As in apatite, calcium-oxygen, calcium-phosphate and calcium-fluorine fragments are present. Fly ash particles also contain sulfate (H₃SO₃⁺). The negative spectra contain phosphates (PO₂⁻, PO₃⁻), sulfates (HSO₄⁻) and silicate fragments, such as (SiO₂)₂⁻, (SiO₂)₂O⁻, (SiO₂)₂Si⁻ and (SiO₂)₃⁻. The results of HNO₃ processing experiments are also shown in Figs. 2 and 3. The processing with nitric acid had an effect on both apatite and fly ash: the calcium-fluorine positive markers (CaF⁺, Ca₂OF⁺ and Ca₃OF⁺) and the negative fluorine marker (F⁻) are either reduced in intensity or completely absent after processing. Additionally, CN⁻ and CNO⁻ appear and/or intensify after processing.
A classifier was designed to use the ratios of phosphate (PO₂⁻, PO₃⁻) and organic nitrogen (CN⁻, CNO⁻) spectral peaks. Those spectral peaks were used for several reasons: (1) they are clearly visible in all biological spectra that were acquired as a part of this study (Fig. 1); (2) they were used to distinguish bioaerosol from other species in previous studies (Creamean et al., 2013; Pratt et al., 2009b); and (3) sources of phosphorus on aerosol particles are well-defined and documented in the literature (Mahowald et al., 2008). The only requirement for this analysis was that each spectrum used in the training set contains both phosphate and organic nitrogen (otherwise the ratios used here become undefined). This was ensured by selecting spectra where PO₂⁻ > 0.001 and CNO⁻ > 0.001. Nearly all biological spectra in the training set satisfied this criterion (Table 2). Figure 4a shows normalized histograms of the PO₃⁻/PO₂⁻ ratio for the laboratory aerosol. The aerosols that contain only inorganic phosphorus, such as apatite, monazite and fly ash, cluster at PO₃⁻/PO₂⁻ less than 4 and often less than 2. The bioaerosols cluster at PO₃⁻/PO₂⁻ greater than 2 and often greater than 4. Ragweed pollen is an exception, with a wide cluster in PO [...] to shift the PO₃⁻/PO₂⁻ ratio to larger values, decreasing the disparity from the bioaerosols. Soil dusts are shown in Fig. 4, even though they are not used as training aerosol; their histogram shows a broad distribution with a tail extending into the PO₃⁻/PO₂⁻ > 2 region, indicating a mixed inorganic-biological composition. In comparison, fertilized soil dusts show a similar distribution to apatite (PO₃⁻/PO₂⁻ < 4) due to the presence of inorganic fertilizer, which is calcium phosphate.
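The selection criterion and the two peak ratios can be written compactly. The Python sketch below is illustrative only: the function name, the dictionary layout and the example peak areas are hypothetical, and the inputs are assumed to be peak areas already normalized by the total ion current.

```python
def peak_ratios(spectrum, min_area=0.001):
    """Return (PO3-/PO2-, CN-/CNO-), or None if the selection criterion fails.

    A spectrum is kept only when both PO2- and CNO- exceed min_area, so that
    neither ratio has a vanishing denominator.
    """
    po2 = spectrum.get("PO2-", 0.0)
    cno = spectrum.get("CNO-", 0.0)
    if po2 <= min_area or cno <= min_area:
        return None
    return spectrum.get("PO3-", 0.0) / po2, spectrum.get("CN-", 0.0) / cno


# Hypothetical normalized peak areas for two negative spectra.
kept = peak_ratios({"PO2-": 0.02, "PO3-": 0.10, "CN-": 0.06, "CNO-": 0.03})
rejected = peak_ratios({"PO2-": 0.02, "PO3-": 0.10, "CN-": 0.06, "CNO-": 0.0002})
```

The first spectrum yields ratios of about 5 and 2; the second is excluded because its CNO⁻ signal is below the threshold.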
The SVM algorithm was used here to optimize boundaries between clusters. To do this, the algorithm needs a training dataset, where the classes are known a priori. In this paper, the training dataset is defined in Table 2. Once an optimized boundary is drawn, some of the training data can still fall on the incorrect side of the boundary when the clusters are not perfectly separable. Accuracy here is defined as the percentage of correctly classified particles in the training set once the optimized boundary is found. A simple 1-D classifier can be made based only on the ratio of phosphate peaks PO₃⁻/PO₂⁻ greater or less than 3. The accuracy of this simple filter is 70-80 % for the materials considered here, with ragweed pollen and fly ash as the greatest sources of confusion between the bioaerosol and nonbiological classes. A higher accuracy for differentiation of the bioaerosol and nonbiological classes can be achieved if the ratio of organic nitrogen peaks is also taken into account. Figure 4b shows normalized histograms of CN⁻/CNO⁻ ratios for the test aerosol. In contrast to PO₃⁻/PO₂⁻ ratios, CN⁻/CNO⁻ ratios do not, by themselves, exhibit a clear difference between the classes. A superior separation is achieved when data are plotted in a CN⁻/CNO⁻ vs. PO₃⁻/PO₂⁻ space, as shown in Fig. 5. In this case, two clusters appear. The soil dust class was left out of the training set because it is not known a priori if and how much biological material it contains (classification of soil dusts with the SVM algorithm is discussed later). The boundary between the classes in CN⁻/CNO⁻ vs. PO₃⁻/PO₂⁻ space is nonlinear, as shown in Fig. 5. The accuracy in this 2-D classification is 97 %. As before, ragweed pollen is the cause of most errors; if it is removed from the training dataset, the accuracy increases to 99 %. Processed mineral dust had a smaller impact on the accuracy: removing it from the training dataset increased the accuracy to 97.5 %.
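The simple 1-D filter and the accuracy metric defined above can be sketched in a few lines of Python. The ratios and labels below are made up for illustration and are not data from this study; the function names are likewise hypothetical.

```python
def classify_1d(po3_po2_ratio, threshold=3.0):
    """Label a spectrum from its PO3-/PO2- ratio alone."""
    return "bioaerosol" if po3_po2_ratio > threshold else "nonbiological"


def accuracy_percent(ratios, true_labels, threshold=3.0):
    """Percentage of spectra whose 1-D label matches the known class."""
    correct = sum(classify_1d(r, threshold) == label
                  for r, label in zip(ratios, true_labels))
    return 100.0 * correct / len(true_labels)


# Hypothetical training spectra: one bioaerosol with a low ratio and one
# nonbiological particle with a high ratio fall on the wrong side of the cut.
ratios = [5.2, 4.1, 2.5, 1.2, 0.8, 3.6]
labels = ["bioaerosol", "bioaerosol", "bioaerosol",
          "nonbiological", "nonbiological", "nonbiological"]
acc = accuracy_percent(ratios, labels)
```

In this toy set four of the six spectra are classified correctly, which shows how overlapping clusters limit a single-ratio threshold.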
For every observation, a distance from the SVM boundary can be calculated (otherwise known as score). Those distances can then be converted to the probability of correct identification. An optimized function to convert scores to probabilities was found by 10-fold cross-validation (Platt, 1999). Because in this experiment the classes are not perfectly separable, the conversion function is a sigmoid. Posterior probabilities near 0 and 1 indicate high-confidence identification. An uncertainty boundary was defined between 0.2 and 0.8. This boundary is shown in Fig. 5. Points that lie within this boundary are marked as low-confidence assignments. Those correspond to shaded areas in Figs. 6 and 7.
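A sigmoid conversion from score to posterior probability, followed by the 0.2-0.8 uncertainty band, might look like the Python sketch below. It uses the common Platt form P = 1 / (1 + exp(A·s + B)); the parameters A and B here are hypothetical rather than the fitted values from this work, and treating the band edges as high confidence is an assumption.

```python
import math


def platt_probability(score, a, b):
    """Platt-style sigmoid mapping an SVM score to a posterior probability."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def confidence_label(probability, low=0.2, high=0.8):
    """Flag posterior probabilities inside the (low, high) band as uncertain."""
    if probability >= high:
        return "bioaerosol"
    if probability <= low:
        return "nonbiological"
    return "low confidence"


# Hypothetical sigmoid parameters; a is negative so that larger scores
# give larger bioaerosol probabilities.
A, B = -2.0, 0.0
labels = [confidence_label(platt_probability(s, A, B)) for s in (1.5, 0.2, -1.5)]
```

With these illustrative parameters a score of 0.2 maps to a probability of about 0.6 and is therefore flagged as a low-confidence assignment.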
Once trained with the training set, the SVM algorithm was used to analyze the FIN03 and CARES field datasets collected at Storm Peak and Cool, CA, respectively. As a first step, "phosphorus-containing" particles were identified in both datasets. The criterion for phosphorus-containing particles used for this work is the presence of both PO₂⁻ and PO₃⁻ ions at fractional peak area (area of peak of interest / total spectral signal area) greater than 0.01. This threshold was set by examination of the ambient mass spectra to determine when the phosphate peaks are distinct. Ambient particles commonly have numerous small peaks at masses below ∼200 due to a diversity of organic components. The height of this background is ∼0.01, and data below this level are considered uncertain. Phosphorus-containing ambient spectra were then classified by the SVM algorithm as bioaerosol or inorganic phosphorus if the CNO⁻ ion was also present at fractional peak area greater than 0.001. If CNO⁻ fractional area was less than 0.001, the spectrum was also classified as inorganic phosphorus.
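The screening steps for ambient spectra amount to a short decision cascade, sketched here in Python. The function and label names are hypothetical, the `svm_predict` argument stands for any trained binary classifier, and the stand-in rule used in the example is not the SVM from this work.

```python
def classify_ambient(spectrum, svm_predict, phosphate_min=0.01, cno_min=0.001):
    """Apply the phosphorus screen, then the CNO- screen, then the classifier.

    spectrum: dict of fractional peak areas; svm_predict: callable that takes
    (PO3-/PO2-, CN-/CNO-) and returns True for bioaerosol.
    """
    po2 = spectrum.get("PO2-", 0.0)
    po3 = spectrum.get("PO3-", 0.0)
    if po2 <= phosphate_min or po3 <= phosphate_min:
        return "not phosphorus-containing"
    cno = spectrum.get("CNO-", 0.0)
    if cno <= cno_min:
        return "inorganic phosphorus"
    ratios = (po3 / po2, spectrum.get("CN-", 0.0) / cno)
    return "bioaerosol" if svm_predict(ratios) else "inorganic phosphorus"


def stand_in(ratios):
    """Placeholder rule for demonstration only; not the trained SVM."""
    return ratios[0] > 3.0


result_bio = classify_ambient(
    {"PO2-": 0.02, "PO3-": 0.10, "CN-": 0.06, "CNO-": 0.03}, stand_in)
result_no_cno = classify_ambient(
    {"PO2-": 0.05, "PO3-": 0.06, "CNO-": 0.0005}, stand_in)
result_weak = classify_ambient({"PO2-": 0.005, "PO3-": 0.2}, stand_in)
```

Spectra that fail the phosphate screen never reach the classifier, and spectra without a distinct CNO⁻ peak are assigned to inorganic phosphorus without evaluating the ratios.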
During the FIN03 campaign, phosphorus-containing particles represented from 0.2 to 0.5 % by number of the total detected particles in negative ion mode depending on the sampling day, with a 0.4 % average for the entire dataset. As shown in Fig. 6a, when the binary classifier described in this work was applied to the phosphorus-containing particles, bioaerosol represented a 29 % subset by number (i.e., 0.1 % of total analyzed particles). During the CARES campaign, phosphorus-containing particles were 1.1 to 4.2 % by number of the total particles detected in negative ion mode, with a 2.4 % average for the dataset (Fig. 7a). Bioaerosol particles represented a 63 % subset by number (i.e., 1.2 % of total analyzed particles). This range (0.1-1.2 %) is within, and towards the lower end of, previous estimates with biological-specific techniques (Table 1). This lower-end estimate may, in part, be due to PALMS sampling particles in the 200-500 nm diameter range as well as larger sizes. Previous estimates tend to show increased bioaerosol in the supermicrometer range, and data are often unavailable for the numerous particles smaller than 500 nm diameter.
The origin of the nonbiological phosphate particles is likely phosphate-bearing mineral dust or fly ash. The CARES site experienced influences of aged marine, urban and local biogenic sources. Within the urban plumes, a likely source of inorganic phosphate is industrial combustion aerosol. At Storm Peak a likely source is the mining of phosphate rock and nearby monazite deposits. Figure 6b shows HYSPLIT back trajectories for the 10 days of the FIN03 campaign; the air masses sampled cross deposits of either phosphate rock (apatite) or rare-earth elements (monazite or carbonatite). As examples, on 27 September the back trajectory intersects the vicinity of an active rare-earth element (REE) mine in Mountain Pass, CA, and on 18 September and 20 September the air mass intersected active phosphate mines in Idaho. Although negative spectra of apatite and monazite cannot be definitively differentiated from fly ash or soil dust spectra, positive spectra acquired during FIN03 additionally suggest that monazite-type material was present. Figure 2g and h show nonbiological phosphate-rich ambient spectra from FIN03. Figure 2e and f (monazite) contain similar features and matching rare-earth elements.
In total, 56 and 36 % of phosphate-containing particles analyzed in FIN03 and CARES, respectively, categorized as biological also contained silicate features.Considered in more detail in the next section, a subset of these may represent internal mixtures of biological and mineral components.

Discussion
The method of identification of bioaerosol described here is based on ratios of phosphate and organic nitrogen peaks. This work is specific to PALMS but can be considered a starting point from which identification and differentiation can be made with similar instruments. Previous work with PALMS shows that this ratio approach can be used to identify differences in chemistry, for example among mineral dusts (Gallavardin et al., 2008). In this case the classes are bioaerosol and nonbiological phosphorus; Fig. 4a shows that phosphorus ionizes differently in these classes. In apatite and monazite, phosphorus occurs as calcium phosphate. In biological particles, phosphorus occurs mostly in phospholipid bilayers and nucleic acids. In these experiments, the PO3−/PO2− ratio differs between those two forms (Fig. 4a). The agricultural soils considered here cluster with the minerals and fly ash, and we assume the phosphorus is due to the use of inorganic fertilizer, which is derived from calcium phosphate (Koppelaar and Weikard, 2013). Fly ash aerosol clusters similarly to apatite and monazite but with a wider distribution; this is likely because the chemical form of phosphorus in fly ash is different from that in the minerals. Phosphorus present in coal is volatilized and then condenses into different forms during the combustion process (Wang et al., 2014).
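As a minimal sketch of the two ratio features named above, the following Python snippet computes the PO3−/PO2− and CN−/CNO− ratios from the peak areas of a single negative spectrum. The dictionary layout, the function name and the example peak areas are illustrative assumptions and do not represent the PALMS data format; the nominal m/z values (26, 42, 63 and 79) follow from the integer masses of the ions.

```python
from typing import Dict, Optional, Tuple

# Nominal m/z of the marker ions (integer ion masses).
MZ_CN, MZ_CNO, MZ_PO2, MZ_PO3 = 26, 42, 63, 79


def marker_ratios(spectrum: Dict[int, float]) -> Optional[Tuple[float, float]]:
    """Return (PO3-/PO2-, CN-/CNO-), or None if a denominator peak is absent.

    `spectrum` maps nominal m/z to a normalized peak area (assumed layout).
    """
    po2 = spectrum.get(MZ_PO2, 0.0)
    cno = spectrum.get(MZ_CNO, 0.0)
    if po2 <= 0.0 or cno <= 0.0:
        return None  # ratios are undefined without both denominator peaks
    po3 = spectrum.get(MZ_PO3, 0.0)
    cn = spectrum.get(MZ_CN, 0.0)
    return po3 / po2, cn / cno


# Hypothetical peak areas for one particle (not measured data).
example = {26: 0.04, 42: 0.02, 63: 0.01, 79: 0.05}
print(marker_ratios(example))
```

Returning `None` when a denominator peak is missing keeps particles without both marker pairs out of the ratio space instead of assigning them an arbitrary value.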
During the FIN03 campaign at Storm Peak, 0.2-0.5 % of particles by number detected in negative polarity contained measurable phosphorus (Fig. 6a). On most days, the majority of phosphorus-rich particles were inorganic. Particles with positive spectra showing the characteristics of monazite, coupled to back trajectories over source areas, suggest the origin of the inorganic phosphate particles. Although apatite or monazite particles make up a small portion of ambient particles at Storm Peak, they are potentially interesting not only due to their possible confusion with biological phosphate but also as a tracer for industrial mining and processing activities. Currently, such activities are taking place in Idaho and until very recently at Mountain Pass, CA (US Geological Survey, 2016a, b). Smaller exploration activities are also taking place at Bear Lodge, WY, and the REE-rich areas in Colorado, Idaho and Montana are of interest (US Geological Survey, 2016a).
During the CARES campaign more particles contained phosphorus (1-4.2 %) and a higher percentage of phosphate-rich particles were identified as biological (63 % vs. 29 % in FIN03). Because the site contains strong local biogenic and urban influences, the sources of biological particles are probably local. As shown in Fig. 7b, aged marine particles were also present on many days; however, only 4 % of particles identified as biological also contained markers associated with sea salts.

Comparison with existing literature
Previous studies have attempted to identify bioaerosol with SPMS based on the presence of phosphate and organic nitrate components. Creamean et al. (2013) and Pratt et al. (2009b) suggested a "Boolean criterion" where the existence of CN−, CNO− and PO3− in a particle resulted in its classification as biological. If silicate components were additionally present, the particle was classified as an internal mixture of mineral dust and biological components (Creamean et al., 2013, 2014). Such "Boolean" criteria for particle identification can be helpful in distinguishing aerosol types when the signatures are unique to one particle type.
The selectivity of this simple three-component filter (presence or absence of CN−, CNO− and PO3−) for biological particles was investigated for PALMS using the test aerosol database, with results shown in Fig. 8. Note that previous literature does not provide information on the thresholds used to determine the presence or absence of ions in an analysis of ATOFMS spectra. Furthermore, because of hardware differences, detection limits of PALMS and ATOFMS are known to be different (Murphy, 2007). This analysis focuses on PALMS and the threshold for "presence" was chosen as 0.001, which was observed to be the detection limit for CN−, CNO− and PO3− in the laboratory aerosol database used here. The simple filter successfully picks biological material. However, it also has a high rate of false positives. For the material that contains inorganic phosphorus (i.e., samples known to be devoid of biological material), the three-component filter selects 56 % of fly ash, 56 % of agricultural dust and 32 % of apatite and monazite. Soil dust is identified as biological 78 % of the time.
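The three-component filter described above can be sketched in a few lines of Python. The 0.001 presence threshold is the value quoted in the text; treating it as a lower bound on a normalized peak area, the dictionary layout and the two example spectra are assumptions made for illustration only.

```python
from typing import Dict

PRESENCE_THRESHOLD = 0.001  # detection limit quoted in the text for PALMS
MARKER_MZ = {"CN-": 26, "CNO-": 42, "PO3-": 79}  # nominal m/z of the markers


def boolean_filter(spectrum: Dict[int, float],
                   threshold: float = PRESENCE_THRESHOLD) -> bool:
    """Return True when all three marker peaks exceed the presence threshold.

    `spectrum` maps nominal m/z to a normalized peak area (assumed layout).
    """
    return all(spectrum.get(mz, 0.0) > threshold for mz in MARKER_MZ.values())


# Hypothetical spectra (not measured data): both carry all three markers,
# so the filter cannot tell them apart.
bio_like = {26: 0.030, 42: 0.020, 79: 0.050}
apatite_like = {26: 0.004, 42: 0.003, 79: 0.200}
print(boolean_filter(bio_like), boolean_filter(apatite_like))
```

Because the filter tests only for presence, any inorganic particle that carries trace CN− and CNO− alongside phosphate passes it, which is the mechanism behind the false positives reported above.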
The effect of the misidentification of inorganic phosphate as biological can be considered in the context of the atmospheric abundance of the three major phosphate-bearing aerosols: mineral dust, fly ash and bioaerosol (estimates given in Table 4). Because the emissions estimates vary, the highest fraction of bioaerosol is the case of the highest estimate of bioaerosol coupled to the lowest estimate of fly ash and mineral dust (Table 4 and Fig. 9a). Conversely, the lowest fraction of bioaerosol is the case of the lowest estimate of bioaerosol coupled to the highest estimate of fly ash and mineral dust (Table 4 and Fig. 9b).
The misidentification rates noted above are then propagated onto the high and low estimates. As an example, the fraction of aerosol phosphate due to fly ash (1 % in the high and 5 % in the low bioaerosol estimate) is multiplied by 0.56 to indicate the fraction of fly ash that would be misidentified as biological phosphate with the simple three-component filter. This misidentification effect is repeated for the mineral dust emission rate and misidentification fraction. For simplicity, we considered the mineral dust fraction to be desert soils, termed Aridisols and Entisols, which are predominantly present in dust-productive regions, such as the Sahara or the dust bowl (Yang et al., 2013). According to Yang and Post (2011), the organic phosphate content of those soils is 5-15 %, but this is a second-order effect when compared to misclassification. In the high-bioaerosol scenario, 17 % of the phosphate aerosol is biological (Fig. 9a), but when misidentification is considered, 81 % of particles are identified as such (Fig. 9c). In the low-bioaerosol scenario, 2 % of the phosphate aerosol is biological (Fig. 9b), but when misidentification is considered, 77 % of the particles are identified as such (Fig. 9d). This illustrates that simplistic identification can lead to large misclassification errors of aerosol sources.
Misidentification can also lead to misattribution. Pratt et al. (2009b) analyzed ice residuals sampled in an orographic cloud and suggested a biological source using the simple three-component filter applied to spectra containing calcium, sodium, organic carbon, organic nitrogen and phosphate. The processed apatite spectrum in Fig. 2, devoid of biological material, contains all of these markers. Similar to the Storm Peak dataset, the Pratt et al. (2009b) wave cloud occurred in west-central Wyoming, which is near the Idaho phosphate rock deposits (Fig. 6) and four US states with active mining of phosphate rock for use as inorganic fertilizer in agriculture (US Geological Survey, 2016b).
As noted above, the Pratt et al. (2009b) and Creamean et al. (2013, 2014) studies were performed with a different SPMS, the ATOFMS (Gard et al., 1997; Pratt et al., 2009a). Because the ATOFMS uses a desorption-ionization laser of a different wavelength (266 nm), the SVM algorithm used here may not directly translate to that instrument (Murphy, 2007). Instead, the calculation above assumes only that the misidentification rates between the simple three-component filter and the SVM algorithm apply.
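The propagation of misidentification rates described in this section can be sketched as a weighted sum. In the snippet below, the bioaerosol and fly ash shares (17 and 1 %, 2 and 5 %) and the filter selection rates (0.56 for fly ash, 0.78 for soil dust) are taken from the text. Taking the mineral dust share as the remainder, applying the soil dust rate to all dust and assuming that the filter selects every biological particle are simplifying assumptions, and the 5-15 % organic phosphate content of desert soils is ignored.

```python
from typing import Dict


def apparent_bio_percent(shares: Dict[str, float],
                         selection_rates: Dict[str, float]) -> float:
    """Percent of phosphate aerosol that the Boolean filter labels biological."""
    return sum(shares[kind] * selection_rates[kind] for kind in shares)


# Fraction of each aerosol type selected by the three-component filter.
rates = {"bio": 1.0, "fly_ash": 0.56, "dust": 0.78}

# Percent of phosphate aerosol by type; dust is the remainder (assumption).
high_bio = {"bio": 17.0, "fly_ash": 1.0, "dust": 82.0}
low_bio = {"bio": 2.0, "fly_ash": 5.0, "dust": 93.0}

for name, shares in (("high", high_bio), ("low", low_bio)):
    apparent = apparent_bio_percent(shares, rates)
    false_positive = apparent - shares["bio"]
    print(f"{name}: apparent {apparent:.1f} %, false positive {false_positive:.1f} %")
```

Under these assumptions the sums come out near 81.5 and 77.3 %, close to the 81 and 77 % quoted above; the small difference presumably reflects rounding of the input shares.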

Soil dust and internal dust-biological mixtures
Soil dust is an important but complicated category of phosphate-containing atmospheric particles. Modeling studies, such as Mahowald et al. (2008), treat all phosphorus in soil dust aerosol as inorganic. However, the phosphorus in soil investigated here took both organic and inorganic forms. Walker and Syers (1976) proposed a conceptual model of transformations of phosphorus depending on the age of the soil. At the beginning of its development, all soil phosphorus is bound in its primary mineral form, matching that of the parent material, which is primarily apatite (Walker and Syers, 1976; Yang and Post, 2011). As the soil ages, the primary phosphorus is released. Some of it enters the organic reservoir and is utilized by vegetation; some is adsorbed onto the surface of secondary soil minerals (non-occluded phosphorus) and then gradually encapsulated by secondary minerals (Fe and Al oxides) into an occluded form. The total phosphorus content of the soil decreases as the soil ages, due to leaching. The organic fraction can encompass microorganisms, their metabolic by-products and other biological matter at various stages of decomposition. Soil microorganisms are the key players in converting organic phosphorus back into the mineral form (Brookes et al., 1984). Yang and Post (2011) estimated organic and inorganic phosphorus content of various soils based on available data. Spodosols (moist forest soils) have the highest fraction of organic phosphorus (∼ 45 %), and Aridisols (sandy desert soils) have the lowest (∼ 5 %) (Yang and Post, 2011). Yang et al. (2013) compiled a global map of soil phosphorus distribution and its forms and found that 20 %, on average, of total phosphorus is organic. Wang et al. (2010) arrive at 34 % of soil phosphorus as organic globally.
The biological PALMS filter was applied to several soil dust samples (Table 3). As would be expected, soils collected in areas with less vegetation exhibit smaller biological contributions. We note that organic phosphorus content is not necessarily a direct indicator of microbes since it also encompasses decomposed biogenic and organic matter. At this time, we are not able to delineate between primary biological, biogenic or simply complex organic (such as humic acids) material.
In the FIN03 field dataset, 56 % of particles identified as biological also contained silicate markers normally associated with mineral dust. In the CARES dataset the percentage of such particles was 36 %. This represents an upper limit of particles that are an internal mixture of dust and biological material. As stated in the last paragraph, this biological material probably does not consist of whole cells sitting on mineral particles; internally mixed mineral dust particles carrying whole cells or fragments of biological material are not supported by EM (Peter Buseck, personal communication). It currently remains unclear whether such internally mixed particles would be counted as biological with an optical microscope after fluorescent staining.
Internal mixtures of biological and mineral components were generated in the laboratory in order to investigate this; an exemplary spectrum of such a particle is shown in Fig. 10. The spectrum contains alumino-silicate markers consistent with mineral dust together with phosphate markers that, in this case, come from the biological material. In spectra of pure illite, no phosphate markers are present. Applying the classifier developed in this paper to the laboratory-generated internally mixed particles correctly identifies the phosphate signatures as biological.

Table 4 entries (two estimates per aerosol type, each with its source):
Dust: (Zender, 2003); 7800 (Jacobson and Streets, 2009)
Primary biological: 186 (Mahowald et al., 2008); 298 (Jacobson and Streets, 2009)
Fly ash: 14.9 (Garimella, 2017); 390 (Garimella, 2017)

Uncertainty in bioaerosol identification in PALMS spectra
Phosphorus peak ratios in biological particles cluster differently than in inorganic phosphorus particles, with ragweed pollen being an exception (Fig. 4a). No satisfactory explanation for this observation has been found, although contamination with phosphate fertilizer cannot be ruled out.
The accuracy of the biological filter using PO3−/PO2− and CN−/CNO− ratios is 97 %, with ragweed alone being the source of most of the error. This unexplained behavior is a cause for concern, as the list of biological samples used as a training set is extensive but not exhaustive and other exceptions could exist.
The basic classifier presented in this paper is binary: all phosphate- and organic-nitrogen-containing particles are classified either as biological or inorganic. However, spectra whose PO3−/PO2− and CN−/CNO− ratios are very close to the SVM boundary have more uncertain assignments than those whose ratios fall far away from the boundary. In order to provide an additional measure of classification uncertainty, a probability bound was defined as shown in Fig. 5. According to this definition, 96 % of particles in the training dataset were classified with high confidence (Fig. 5). In the FIN03 and CARES field datasets, 79 % of phosphate-containing particles were classified with high confidence. The low-confidence assignments are shown in Figs. 6a and 7a with shaded areas. The low-confidence assignments in field datasets can be related to chemical processing of particles (either at the source, as in soils, or during transport) or to internal mixing of biological and inorganic phosphate.
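The idea of a binary assignment with a low-confidence zone near the boundary can be illustrated with a linear decision function and a margin band. This is only a stand-in: the fitted SVM parameters and the exact definition of the probability bound in Fig. 5 are not given in this section, so the weights, bias, band width, the use of log10 ratios and even the side of the boundary labeled biological are all placeholders.

```python
import math
from typing import Tuple


def decision_value(po3_po2: float, cn_cno: float,
                   w_po: float, w_cn: float, bias: float) -> float:
    """Signed value of a linear boundary in log10-ratio space."""
    return w_po * math.log10(po3_po2) + w_cn * math.log10(cn_cno) + bias


def classify(value: float, band: float) -> Tuple[str, str]:
    """Binary label from the sign; confidence from the distance to zero."""
    label = "biological" if value >= 0.0 else "inorganic"
    confidence = "high" if abs(value) >= band else "low"
    return label, confidence


# Placeholder parameters chosen only for illustration (not fitted values).
W_PO, W_CN, BIAS, BAND = -1.0, 1.0, 0.5, 0.3

# Hypothetical (PO3-/PO2-, CN-/CNO-) ratio pairs, not measured data.
for ratios in [(1.0, 10.0), (3.0, 1.0), (20.0, 0.5)]:
    value = decision_value(*ratios, W_PO, W_CN, BIAS)
    print(ratios, round(value, 2), classify(value, BAND))
```

Separating the label from the confidence flag keeps the classifier binary while allowing uncertain assignments to be reported separately.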
Because soil dusts are a special category, where lines between biological and inorganic phosphorus sources can be blurry because of ongoing chemical transformations, they have higher classification uncertainties than other types of phosphate-containing aerosols. In the field data, dust-biological mixtures (defined as particles classified as biological with silicate features) are overrepresented in the low-confidence assignments. Dust-biological mixtures constitute 26 % (CARES) to 46 % (FIN03) of high-confidence assignments and 64 % (CARES) to 68 % (FIN03) of low-confidence assignments. Moreover, only 75 % of phosphate-containing soil dust particles were classified with high confidence. However, in simple two-component internal mixtures of dust and biological fragments (Fig. 10), phosphate features can be identified as biological with high confidence (98 %).
Because the field studies were performed during different time periods, it was difficult to control for a constant excimer laser fluence. However, laser fluence was similar for all laboratory samples acquired (3-5 mJ pulse energy). This is a possible source of uncertainty, as fragmentation patterns can differ depending on pulse energy.

Conclusions
This paper examines criteria that can be used with SPMS instruments to identify bioaerosol. We propose a new technique of bioaerosol detection and validate it using a database of phosphorus-bearing spectra. A simple binary classification scheme was optimized using an SVM algorithm, with 97 % accuracy. Ambient data collected during the FIN03 and CARES campaigns are then analyzed with this binary classifier. Particles with phosphorus were up to 0.5 % for FIN03 and 4.2 % for CARES by number of all ambient particles in the 200-3000 nm size range. On average, 29 % (FIN03) and 63 % (CARES) of these particles were identified as biological.
Our work expands on previous SPMS sampling that used a simpler Boolean three-marker criterion (CN−, CNO− and PO3−) to classify particles as primary biological or not (Creamean et al., 2013, 2014). We show that the presence of these markers is necessary but not sufficient. We show a false positive rate of the Boolean filter between 64 and 75 % for a realistic atmospheric mixture of soil dust, fly ash and primary biological particles.
The trained SVM algorithm was also used to measure the biological content of soil dusts. Different soil dust samples can have different contents of biological material, with a range from 2 to 32 % observed here. Consistent with the literature, samples taken from areas with vegetation exhibit a higher biological content.

Figure 1. Representative PALMS spectra of bioaerosol. (a, b) Snomax. (c, d) P. syringae. (e, f) Hazelnut wash water. Right and left columns are positive and negative polarity, respectively. Red dotted lines are features indicated in the literature as markers for biological material.

Figure 2. Representative PALMS spectra of phosphorus-rich minerals and ambient aerosol. (a, b) Unprocessed apatite. (c, d) Apatite processed with HNO3 (see text for details). (e, f) Monazite-Ce. (g, h) Ambient particles sampled at Storm Peak matching monazite chemistry. Right and left columns are positive and negative polarity, respectively. Red dotted lines are features indicated in the literature as markers for biological material.

Figure 3. Representative PALMS spectra of coal fly ash from the J. Robert Welsh power plant. (a, b) Unprocessed fly ash. (c, d) Fly ash processed with HNO3 (see text for details). Right and left columns are positive and negative polarity, respectively. Red dotted lines are features indicated in the literature as markers for biological material.

Figure 4 .Figure 5 .
Figure 4. (a) Normalized histograms of the PO − 3 / PO − 2 ratio for the laboratory aerosol.(b) Normalized histograms of the CN − / CNO − ratio for the same laboratory aerosol as in (a).Delineation between the clusters at a PO − 3 / PO − 2 ratio of 3 results in a 70-80 % classification accuracy depending on the types of particles considered.Note that soil dusts were not used as part of the training dataset and that not all training aerosols are shown here for clarity.
Figure 6. (a) The percentage of ambient aerosol particles from the FIN03 dataset categorized as biological and inorganic (phosphate-bearing mineral dust or fly ash) phosphate using the criteria developed in this work. Hatched regions indicate uncertain assignments as per the boundaries in Fig. 5. Note that at this location and time of year inorganic phosphate dominates biological particles. (b) HYSPLIT back trajectories plotted for 10 measurement days at Storm Peak Laboratory. Locations of REE, phosphate and carbonatite deposits, sourced from US Geological Survey, are co-plotted (Berger et al., 2009; Chernoff and Orris, 2002; Orris and Grauch, 2002). Dates are given as MM/DD.

Figure 7. (a) The percentage of ambient aerosol particles from the CARES dataset categorized as biological and inorganic (phosphate-bearing mineral dust or fly ash) phosphate using the criteria developed in this work. Hatched regions indicate uncertain assignments per the boundaries in Fig. 5. (b) HYSPLIT back trajectories plotted for 10 measurement days at the Cool, CA, site. Locations of REE, phosphate and carbonatite deposits, sourced from US Geological Survey, are co-plotted (Berger et al., 2009; Chernoff and Orris, 2002; Orris and Grauch, 2002) along with locations of major urban centers. Dates are given as MM/DD.

Figure 9. Abundance of bioaerosol, mineral dust and fly ash in the atmosphere constructed using emissions estimates in Table 4. (a) Highest estimate for bioaerosol coupled to lowest estimates for dust and fly ash. (b) Lowest estimate of bioaerosol in the atmosphere coupled to highest estimates for dust and fly ash. (c, d) Effect of misidentification of phosphate- and organic-nitrogen-containing aerosol as biological using the emissions in (a) and (b), respectively. The hatched regions correspond to the misidentified fractions of mineral dust and fly ash. In these estimates the correct emissions (solid green region) in (a) and (b) (17 and 2 %, respectively) are overestimated (hatched green region of misidentified aerosol plus solid green region) in (c) and (d) (as 81 and 77 %, respectively).

Figure 10. Exemplary PALMS negative polarity spectra of (a) dry-dispersed illite NX; (b) wet-dispersed illite NX from a distilled, deionized water slurry; (c) similarly wet-dispersed illite NX but from a water slurry that also contained F. solani spores. Note that phosphate features are absent in (a) and (b) but present in (c) due to the addition of biological material to the mineral dust.

Table 2. Summary of particle statistics for samples used to both train and test the classifier.
originated from a cultivated plant. Pollen was collected, mechanically sieved and dried. The yeast used in this experiment was commercial active dry yeast (Star Market brand).

Table 3. Soil dust samples used in this work. The last column shows the results of analysis with the SVM classifier developed here as a percentage of negative spectra acquired.
Figure 8. Percentage of particles that include PO3−, CN− and CNO− markers in five classes of atmospherically relevant aerosol spectra acquired with PALMS in this work. Note that the green bars indicate the percentage of particles of each type identified as biological using literature criteria. In the case of bioaerosol the identification is correct. In all other aerosol classes the green bar denotes a typical level of misidentification.

Table 4. Literature estimates of emission rates of primary biological particles, dust and fly ash.