Classification of aerosol population type and cloud condensation nuclei properties in a coastal California littoral environment using an unsupervised cluster model

Aerosol particle and cloud condensation nuclei (CCN) measurements from a littoral location on the northern coast of California at Bodega Bay Marine Laboratory (BML) are presented for approximately six weeks of observations during the CalWater-2015 field campaign. A combination of aerosol microphysical and meteorological parameters was used to classify variability in the properties of the BML surface aerosol using a K-means cluster model. Eight aerosol population types were identified that were associated with a range of impacts from both marine and terrestrial sources. Average measured total particle number concentrations, size distributions, hygroscopicities, and activated fraction spectra between 0.08% and 1.1% supersaturation are given for each of the identified aerosol population types, along with meteorological observations and transport pathways during time periods associated with each type. Five terrestrially influenced aerosol population types represented different degrees of aging of the continental outflow from the coast and interior of California, and their appearance at the BML site was often linked to changes in wind direction and transport pathway. In particular, distinct aerosol populations, associated with diurnal variations in source region induced by land/sea-breeze shifts, were classified by the clustering technique. A terrestrial type representing fresh emissions, and/or a recent new particle formation event, occurred in approximately 10% of the observations. Over the entire study period, three marine influenced population types were identified that typically occurred when the regular diurnal land/sea-breeze cycle collapsed and BML was continuously ventilated by air masses from marine regions for multiple days. These marine types differed from each other primarily in the degree of cloud processing evident in the size distributions, and in the presence of an additional large-particle mode for the type associated with the highest wind speeds. One of the marine types was associated with a multi-day period during which an atmospheric river made landfall at BML. The generally higher total particle number concentrations but lower activated fractions of four of the terrestrial types yielded similar CCN number concentrations to two of the marine types for supersaturations below about 0.4%. Despite quite different activated fraction spectra, the two remaining marine and terrestrial types had CCN spectral number concentrations very similar to each other, due in part to higher number concentrations associated with the terrestrial type.

Introduction
Atmospheric rivers (ARs) are tropical moisture advection phenomena that can account for large fractions of the wintertime precipitation in California (Ralph et al., 2004; Dettinger et al., 2011). The winter-spring 2015 CalWater-2015 study (Ralph et al., 2015), and the coordinated ACAPEX study (Leung, 2016) that included aircraft- and ship-based observations in the same region, were designed to probe the atmospheric conditions in and around ARs, and to provide new observations of the characteristics of regional aerosols that may interact with these atmospheric moisture features and thereby influence the downwind formation of precipitation. As part of the CalWater-2015 study, ground-based aerosol observations were conducted at the Bodega Bay Marine Laboratory (BML), a coastal California site that is suitable for observation of aerosols in landfalling marine air masses, and in mixtures of marine and continental air.
In marine regions impacted by continental outflow, aerosol chemical and microphysical properties, including particle number concentrations and size distributions, are often moderated by impacts from terrestrial sources (Nair et [...] sulfates and nitrate salts have values in the same range as many remote marine aerosol populations (Andreae and Rosenfeld, 2008). The cloud-droplet-nucleating activity of particles in coastal regions therefore depends on the composition, size distribution, and degree of aging of the marine particles, along with the characteristics of particles from non-marine sources that become mixed with the marine aerosol. Thus, observation and analysis of the coastal California aerosol environment will support efforts to better understand implications of aerosol properties on precipitation and cloud development in the region, including aerosol influences on how precipitation develops in landfalling ARs in this region.
In this study, observations from the BML ground site during the CalWater-2015 field campaign were used to evaluate the nature and variability of surface (marine boundary layer, MBL) aerosol populations at BML. Observations were classified into periods of similar aerosol and meteorological characteristics using an unsupervised cluster model (Atwood et al., 2017) to derive distinct littoral aerosol population types and link them to source regions. The clusters were assessed with respect to their aerosol characteristics, particularly the contributions of each population type to cloud condensation nuclei (CCN) concentrations that influence the microphysical properties of liquid-phase clouds and the evolution of precipitation in mixed-[...]
This study used size distribution measurements for particle diameters between 14 and 730 nm from a TSI 3080 SMPS using a 0.3 L min⁻¹ sample flow rate and 3.0 L min⁻¹ sheath flow rate. SMPS scans were conducted approximately every 5 minutes and were subsequently averaged to approximately 15-minute time periods to match additional coincident measurements.
The presence of large numbers of particles smaller than approximately 30 nm in diameter was intermittently observed in the size distribution measurements. These particles were likely from a combination of local sources that included vehicle and other activity at BML, local camp and brush fires, and emissions from the nearby town of Bodega Bay, located to the east of the site. This nucleation mode was generally superimposed on more stable aerosol populations, and thus rather than removing the entire observation during these events, we removed only the contamination mode, as described below and in further depth in the supplemental material.
The best-fit modal parameterization for each size distribution spectrum was first assessed using a lognormal mixture distribution fitting algorithm based on Hussein et al. (2005), as described in Atwood et al. (2017). The algorithm selects between one and three modes to best represent the size distribution based on empirical rules and maximum-likelihood fitting criteria, and defines each mode by three parameters (median diameter, geometric standard deviation, and fractional number concentration). Within each of the size spectra, a fitted mode was identified as local contamination if the following criteria were met:
• The combined fit's distribution was identified to have more than one mode
• The median diameter of the smallest fitted mode was at or below 20 nm
• The smallest fitted mode's dN/dlog10Dp value was less than 50% of the combined fit's dN/dlog10Dp value at 50 nm (indicating only a small contribution to total particle count in larger fit modes)
• The smallest fitted mode did not persist continuously for more than three hours
In the case that an observed size distribution spectrum passed all these criteria, the smallest mode was classified as a contamination mode likely associated with local sources as described above and therefore not representative of the regional aerosol. These modes were removed from the size distributions and total number concentrations while retaining the remaining fitted modes. This was accomplished by multiplying the observed dN/dlog10Dp value by one minus the contamination mode's number fraction of the total distribution for each size bin. An example of the removal is shown in Fig. 1, with an observed distribution without a local mode shown in (a).
Approximately one hour later, a distinct small mode was observed with a median diameter less than 14 nm and nearly no contribution to particle concentration at diameters above 20 nm (b), while total number concentration showed a rapid increase (not shown). This mode was no longer present approximately 1.5 hours later. The mode therefore passed all four criteria to be considered local contamination, and it was removed from the reported size distribution and total number concentration to give the final corrected distribution (c).
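The screening criteria and the bin-wise removal described above can be sketched in a few lines of Python. This is a minimal illustration, not the processing code used in the study: the function names, the tuple layout `(number, median_nm, gsd)`, and the reading of "number fraction" as the contamination mode's share of the combined fit in each size bin are all assumptions made for this example.

```python
import math

def lognormal_dndlogdp(d_nm, number, median_nm, gsd):
    """dN/dlog10(Dp) of a single lognormal mode evaluated at diameter d_nm."""
    log_gsd = math.log10(gsd)
    z = (math.log10(d_nm) - math.log10(median_nm)) / log_gsd
    return number / (math.sqrt(2.0 * math.pi) * log_gsd) * math.exp(-0.5 * z * z)

def is_local_contamination(modes, persistence_hours):
    """Apply the four screening criteria to a list of (number, median_nm, gsd) modes."""
    if len(modes) < 2:                       # criterion 1: more than one fitted mode
        return False
    smallest = min(modes, key=lambda m: m[1])
    if smallest[1] > 20.0:                   # criterion 2: median at or below 20 nm
        return False
    combined_50 = sum(lognormal_dndlogdp(50.0, *m) for m in modes)
    if lognormal_dndlogdp(50.0, *smallest) >= 0.5 * combined_50:
        return False                         # criterion 3: small contribution at 50 nm
    return persistence_hours <= 3.0          # criterion 4: not persistent beyond 3 h

def remove_contamination(diameters_nm, observed, modes):
    """Scale each bin by one minus the smallest mode's share of the combined fit
    (assumed interpretation of the per-bin number fraction)."""
    smallest = min(modes, key=lambda m: m[1])
    corrected = []
    for d, obs in zip(diameters_nm, observed):
        total_fit = sum(lognormal_dndlogdp(d, *m) for m in modes)
        share = lognormal_dndlogdp(d, *smallest) / total_fit if total_fit > 0 else 0.0
        corrected.append(obs * (1.0 - share))
    return corrected
```

With a hypothetical two-mode fit such as `[(3000, 12, 1.4), (800, 120, 1.8)]` that lasts 1.5 hours, the small mode is flagged, bins near 12 nm are almost entirely removed, and bins near 200 nm are left essentially unchanged.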

CCN measurements
As described in Martin et al. (2017), size-resolved CCN concentrations (srCCN) were measured using a DMT cloud condensation nuclei counter (CCN-100) coupled with a TSI 3080 SMPS to scan across a range of particle diameters (12-540 nm at water supersaturations of s = 0.1%, 0.19%, 0.28%, 0.44%, 0.58%, and 0.67%). A distribution-averaged apparent hygroscopicity parameter κ (Petters and Kreidenweis, 2007) was calculated from these scans as described in Petters and Petters (2016).
A separate scanning flow CCN system (sfCCN) was used to measure total CCN number concentration as supersaturation was ramped between approximately 0.08% and 1.1%. The system used a DMT CCN-100 instrument that had been modified by connecting a voltage-modulated proportional flow valve to the bottom of the column to control flow (Suda et al., 2014). The flow rate through the CCN column was increased from 0.2 to 1.2 L min⁻¹ over 5 minutes while holding the temperature gradient constant, thereby scanning peak column supersaturation as a function of flow rate and column temperature gradient (Moore and Nenes, 2009). A TSI 3010 Condensation Particle Counter (CPC) was placed in parallel to the CCN instrument to measure total particle number concentration (CN). CCN and CN concentrations were recorded at a frequency of 1 Hz. Each approximately 5-minute scan was repeated three times, after which the temperature gradient was changed to scan a different range of supersaturations. As the column temperatures took approximately 3 minutes to stabilize, the first scan of each three-repetition set was not analyzed. As the residence time in the sfCCN varied with total flow rate when compared to the parallel CPC, the CPC timestamps were empirically adjusted to ensure the 1 Hz data points from both instruments were aligned [...] shown as red data points (placed for comparison to CCN concentrations at 1.3% supersaturation).
During the contamination period approximately one hour later (Fig. 2(c&d)), the effect of the small mode noted in Fig. 1(b) was seen via increased CN concentrations (reaching as high as 5000 cm⁻³) and decreased activated fractions above approximately 0.1% supersaturation. Observed CCN concentrations remained relatively constant, indicating the additional particles were too small to activate below 1.1% supersaturation. After removal of this small contamination mode, the corrected CN concentrations and activation spectrum (Fig. 2(e&f)) were similar in characteristics to the earlier non-contaminated period.

2.4 Classification of aerosol population type
Classification of aerosol population types impacting BML was conducted using an unsupervised K-means cluster analysis. Such clustering methods utilize properties of the aerosol and environment to identify periods of potentially similar impacts and aerosol population types (Wilks, 2011). Cluster analyses have been used to classify aerosol particle size distributions (Tunved et al., 2004), associate them with various environmental and atmospheric processes (Charron et [...]
The K-means clustering methodology involved selection of specific variables that partially defined the state of the aerosol and meteorological environment at the sampling site. The degree of similarity of the state of the environment between any two data points (i.e. specific times) was estimated by the use of a distance function that grouped data points into clusters that had broadly similar values among the input variables. Here, we utilized the cluster.KMeans class of the Python scikit-learn package (Pedregosa et al., 2011) to perform the analysis. Selection of the appropriate number of clusters followed a similar methodology to that described in Atwood et al. (2017). A hierarchical cluster analysis was first created using the cluster.AgglomerativeClustering class of scikit-learn to identify potential numbers of clusters using a dendrogram. Various [...] with verification that the results maintained physically distinct and temporally coherent clusters, in order to select the appropriate number of K-means clusters.

2.4.1 Cluster variables
Variables used in this analysis included measurements of aerosol microphysical properties and of meteorological parameters at BML. Aerosol property variables included the normalized size distribution at each time stamp (normalized to an integrated value of 1 cm⁻³ by dividing by the total particle number concentration), after correction for local contamination. The distributions were discretized into 20 logarithmically spaced bins that served as separate variables in the clustering distance function (e.g. Charron et al. (2008)). Activated fraction spectra for each time stamp were divided into 20 linearly spaced bins distributed between 0% and 1.1% supersaturation to incorporate CCN properties into the analysis. Total particle number concentration was included as a separate cluster variable.
Meteorological parameters at BML included the local 10 m observed wind velocity as perpendicular u and v component variables. Additionally, HYSPLIT backtrajectories were assigned to data points closest in time to each trajectory. The backtrajectory was converted to separate variables for the distance function by determining the distance from the receptor, initial bearing, and altitude, every three hours backwards along the trajectory for 24 hours, yielding a total of 24 trajectory clustering variables for each time stamp.
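The meteorological variables described above can be illustrated with a short Python sketch. It is a hedged example under stated assumptions rather than the study's code: the function names, the `hours_back -> (lat, lon, altitude)` dictionary layout, the use of a spherical Earth with a 6371 km radius, and the meteorological "direction the wind blows from" convention are all choices made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean spherical Earth radius

def wind_components(speed, direction_deg):
    """Convert wind speed and direction (degrees the wind blows FROM) to (u, v)."""
    rad = math.radians(direction_deg)
    return -speed * math.sin(rad), -speed * math.cos(rad)

def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) and initial bearing (deg) from point 1 to point 2."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    dist = 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return dist, math.degrees(math.atan2(y, x)) % 360.0

def trajectory_features(receptor, trajectory):
    """Distance, initial bearing, and altitude every 3 h back to 24 h: 8 x 3 = 24 values.

    receptor: (lat, lon); trajectory: dict mapping hours back -> (lat, lon, altitude_m).
    """
    features = []
    for hours_back in range(3, 25, 3):
        lat, lon, alt = trajectory[hours_back]
        dist, bearing = distance_and_bearing(receptor[0], receptor[1], lat, lon)
        features.extend([dist, bearing, alt])
    return features
```

Eight trajectory points with three values each reproduce the 24 trajectory variables per time stamp. Note that bearing is a circular quantity, so values near 0° and 360° are numerically far apart even though they are physically close; how this was handled in the original analysis is not stated.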

Distance function
The Karl Pearson Euclidean distance function (Wilks, 2011) used in the cluster model was modified to include a relative weight parameter for each input variable as:
$$ d_{i,j} = \left[ \sum_{k=1}^{K} w_k \left( x_{i,k} - x_{j,k} \right)^2 \right]^{1/2} $$
where $d_{i,j}$ gives the Euclidean distance between two data point vectors, $x_i$ and $x_j$, in a $K$-dimensional space (i.e. $K$ nominally independently measured or observed, orthogonal variables at each time stamp), for each variable, $k$, with relative weight, $w_k$. Each variable was first standardized, with missing data imputed to values of zero to minimize their impact on the distance function.
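A weighted Euclidean distance of this kind can be sketched as follows. The example assumes the weights multiply the squared differences of the standardized variables, uses `None` to mark missing data, and standardizes with the sample standard deviation; these details, and all function names, are assumptions for illustration. The last helper shows one common way to obtain the same distances from a standard (unweighted) K-means implementation: scale each standardized variable by the square root of its weight before clustering.

```python
import math

def standardize(column):
    """Z-score one variable; missing values (None) are imputed to 0 afterwards."""
    valid = [v for v in column if v is not None]
    mean = sum(valid) / len(valid)
    var = sum((v - mean) ** 2 for v in valid) / (len(valid) - 1)
    std = math.sqrt(var)
    if std == 0.0:
        return [0.0 for _ in column]
    return [0.0 if v is None else (v - mean) / std for v in column]

def weighted_distance(x_i, x_j, weights):
    """Weighted Euclidean distance between two standardized data point vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x_i, x_j, weights)))

def scale_by_sqrt_weight(x, weights):
    """Pre-scale a vector so plain Euclidean distance equals the weighted distance."""
    return [math.sqrt(w) * v for v, w in zip(x, weights)]
```

Because a standardized variable has zero mean, imputing missing values to zero places them at the variable's average, which limits their pull on the distance.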
The relative weights for each variable were included to prevent properties of the aerosol from becoming over-weighted in the cluster model due to having more variables describing them. The size distribution and activated fraction variables, each of which had 20 variables, were given relative weights of 1/20 such that their total relative weights summed to 1. As these variables were of primary importance to aerosol population microphysical properties, the other variable groups were decreased in relative importance. The backtrajectory and wind vector groups were each assigned a relative weight of 0.5, and the total particle number concentration variable was assigned a relative weight of 0.1. As cluster analysis is by its nature an exploratory data [...]

Number of clusters
Hierarchical clustering and internal validity measures indicated two, six, eight, and twelve clusters were potentially appropriate for the K-means analysis. Clusters associated with periods of marine aerosol impacts (discussed further in the next section) became temporally coherent and physically meaningful after the number of clusters was increased to eight.
In the case of the twelve-cluster analysis, several of the clusters were composed of few or even a single data point, indicating the model had begun to separate outliers into distinct clusters. In addition, several of the temporally consistent clusters were split, indicating that too many clusters had been selected. All potential cluster numbers between seven and eleven were then investigated to determine if physically or temporally coherent population types emerged to a greater degree than the eight-cluster model. As the eight-cluster option was initially identified as potentially appropriate, and the other potential models did not improve physical interpretability of the results, the eight-cluster model was selected as the most appropriate unsupervised classification result.
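Two of the qualitative checks mentioned above, clusters made of only a few points and clusters that fragment in time, can be approximated with simple diagnostics on the sequence of cluster labels. The sketch below is hypothetical: the function names, the `min_points` threshold, and the use of mean run length as a proxy for temporal coherence are illustrative assumptions, not the internal validity measures used in the study.

```python
from collections import Counter
from itertools import groupby

def small_clusters(labels, min_points=2):
    """Return cluster labels with fewer than min_points members (possible outlier clusters)."""
    counts = Counter(labels)
    return sorted(c for c, n in counts.items() if n < min_points)

def mean_run_length(labels):
    """Average number of consecutive time stamps sharing one label.

    Longer runs suggest more temporally coherent clusters.
    """
    runs = [len(list(group)) for _, group in groupby(labels)]
    return sum(runs) / len(runs)
```

Applied to label sequences from candidate models with different numbers of clusters, a rise in the number of tiny clusters or a drop in mean run length would be consistent with the over-splitting described for the twelve-cluster case.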

Aerosol population type classification results and discussion
Three of the eight identified clusters were defined as "marine" population types, as backtrajectory data showed evidence of transport pathways primarily over ocean areas. These marine types, denoted as clusters M1-M3, tended to have lower average number concentrations (below approximately 1500 cm⁻³), while the terrestrial clusters (T1-T5) had typical averages between approximately 2000 and 4000 cm⁻³ and were associated with transport from more terrestrial source regions. The exception to this was cluster T5, which had number concentrations of roughly 1500 cm⁻³, more oceanic transport pathways, and size distributions with the largest median diameters among the "terrestrial" clusters. Table 1 provides the cluster-averaged number concentrations, wind velocities, HYSPLIT accumulated precipitation along the 24-hour trajectory, hygroscopicity parameters from the srCCN system, and the percentage of all measurements associated with each of the eight identified clusters. Best-fit size distribution parameters and best-fit CCN-spectrum activated fraction parameters (see Supplemental Material) for each cluster are provided in Table 2.

Marine population types
Further analysis of the marine clusters showed generally distinct meteorological conditions associated with each. Cluster M1 primarily occurred toward the end of the study period, during a period of high-velocity winds from the northwest. Backtrajectories agreed with local winds and showed generally faster transport velocities. This cluster dominated during 26 and 27 February, a period during which the cleanest air masses and lowest number concentrations of the study were observed, reaching as low as 50 cm⁻³ during the height of the event. The normalized size distributions indicated that the Aitken mode dominated the number distribution (Fig. 4), and also suggested the presence of a somewhat larger mode (particles larger than ~400 nm) that may have been associated with generation of sea spray by the higher wind velocities. Backtrajectories indicated that airmasses had passed over the ocean to the northwest of BML, while 24-hour accumulated precipitation along the trajectories (Table 1) indicated rainfall had occurred in the days prior to the airmass' arrival at BML. Each of these findings was consistent with classification of the M1 cluster as a precipitation-scrubbed, clean marine aerosol population.
The marine cluster with the next highest number concentration was M2, with 78% of its total occurrences between 17 and 21 February. The wind rose for this cluster indicated a primarily northwesterly wind, similar to cluster M1 but with much slower velocities, and backtrajectories with oceanic transport pathways. While HYSPLIT does not always simulate sub-synoptic scale transport with high fidelity, this pattern may be indicative of slower transport of air from a marine region just off the coast, as opposed to the direct fast-transport path from more distant ocean regions seen in the M1 type. As observed for M1, the best-fit average normalized particle size distribution (Fig. 4) [...] Minimal rainfall along the transport pathway was evident in the HYSPLIT accumulated precipitation estimates (Table 1), indicating no recent precipitation scrubbing and suggesting that more cloud processing without rainout led to larger numbers of particles in the accumulation mode compared with M1. In contrast to M1, for which a third fitted mode was found above 400 nm, the third mode in M2 occurred at very small particle sizes (diameters less than 30 nm).
The final marine cluster, M3, shared similarities with the other marine types, including a bimodal normalized size distribution with a minimum near 110 nm and indications of oceanic source regions in local winds and backtrajectories. However, the average total number concentration was nearly double that of the other marine types. The primary period during which this cluster occurred was 4-9 February, bracketing the time before, during, and after landfall of the AR that impacted the BML region during CalWater-2015. This cluster is therefore of interest, as it may be indicative of a unique population type associated with AR meteorological conditions. Some caution is warranted, however, as instrument downtime led to a gap in the aerosol size distribution dataset during late 6 February through 7 February, during a high-wind and precipitation period that marked the landfall of the AR, and thus some key data that could be used to guide the clustering during this event were missing. When confined to the 5-8 February period noted by Leung (2016) when the AR made landfall at BML, average number concentration was 749 cm⁻³ for time periods associated with the M3 cluster (excluding the data gap on 6-7 February), and 1052 cm⁻³ for the entire 5-8 February AR period (including the intermittent periods classified as terrestrial or anthropogenic aerosol).
Backtrajectories for the M3 cluster indicated source regions from just off the coast of the San Francisco Bay area prior to reaching BML, while HYSPLIT accumulated precipitation along the trajectory was the highest of any cluster (Table 1). Flows associated with AR landfall at the coast can be complex (Neiman et al., 2013); however, emissions from this urban area could potentially have mixed with the relatively low particle number counts that would be expected in a precipitation-scrubbed AR air mass as it made landfall, accounting for the elevated (> 1000 cm⁻³) total number (CN) concentrations that persisted for much of the AR, including during the high-wind and heavy precipitation period (Fig. 3b) when SMPS and CCN data were not available. Local generation of fine-mode sea spray aerosol could also be a factor during high winds.

Terrestrial population types
During periods dominated by diurnal shifts in aerosol and meteorological observations, and for short-duration periods during times associated primarily with marine clusters, the cluster model identified clusters that corresponded to terrestrially influenced populations. In the case of the multi-day events dominated by the M1 and M2 types, these short-duration periods were often identified as either T4 or T5, clusters that were notable for having largely monomodal normalized size distributions with median diameters around 100 nm. Occurrences of these cluster types were often associated with a spike in number concentration and changes to either wind direction or wind velocity. Thus, the cluster model was able to identify and separate short-duration periods of impacts from terrestrial sources during multi-day marine aerosol conditions at BML.
Several longer periods (28-31 January and 13-16 February) were observed during which populations T4 and T5 regularly alternated in tandem with the diurnal land/sea-breeze shift. Similar diurnal-shift behavior, but between clusters T2 and T3, occurred during 25-28 January and 1-5 March. During these diurnal shifts between various terrestrial clusters, the cluster with the larger median diameter was typically associated with the sea-breeze and transport from oceanic regions, while the smaller diameter cluster was associated with the land-breeze and transport from terrestrial regions. As aging of terrestrial aerosol typically leads to an increase in the median diameter of the aerosol modes, these four clusters may therefore be indicative of various degrees of aging of regional terrestrial aerosol during "sea-breeze resampling" at BML (Martin et al., 2017).
[...] BML was subjected to extended periods of sea-breezes and ventilation by air masses almost exclusively from ocean regions, the cluster model selected marine cluster types, indicative of marine airmasses that had not experienced much mixing with terrestrial air masses.
The terrestrial type T1 featured a dominant mode of particles with median diameters around 30 nm, indicating relatively little aging of the particles had occurred. Both low-level winds and backtrajectories during this cluster type indicated transport pathways from many directions (Fig. 5), though with the highest wind speeds of the terrestrial clusters, consistent with less time between the aerosol source and observation at BML. This cluster occurred primarily during two periods, 23-24 February and 28 February-1 March. During these times, normalized size distributions and modal median diameter fits shown in Fig. 3 indicated that the T1 cluster grew into other clusters with more dominant accumulation mode sizes over the course of several days. The T1 cluster may therefore identify a freshly emitted population type or a recent new particle formation event.

CCN and activated fraction spectra characteristics
The best-fit activated fraction spectra, as functions of water supersaturation, are shown for all valid data points in each of the clusters in Fig. 6(a). As a general comparison against other reported values for aerosol activation spectra, Fig. 6 [...] were drawn from a range of marine, littoral, and continental sites, and were impacted by both marine and terrestrial airmasses, subject to a variety of emissions. The BML spectra spanned much of the range reported for the EUCAARI network, with the M2 cluster slightly above this range at intermediate supersaturations. Activated fraction spectra are independent of total number concentration; thus the effect of particle size on activation is evident for the BML cases, with clusters with larger size modes having higher activated fractions across the range of measured supersaturations. These results show the wide range of activated fraction spectra at BML associated with differences in aerosol population type, and the corresponding complexity of the population characteristics at this site.
In the CalWater-2015 dataset, marine population types all reached activated fractions of about 0.2 at supersaturations around 0.1% to 0.15%, while the terrestrial types did not reach equivalent fractions until supersaturations between approximately 0.18% and 0.6% were reached. Terrestrial population types with smaller median diameters tended to have less fractional activation across the full range of measured supersaturations, leading to activated fractions at 1.0% supersaturation that varied from approximately 0.3 to 0.85. However, due to the generally higher total particle number concentrations, and despite lower activated fractions of the terrestrial populations, differences in CCN concentrations between marine and terrestrial types (Fig. 6b) were smaller than the differences in the activated fraction spectra.
Only at supersaturations above approximately 0.5% did the CCN concentration for the terrestrial types (except T1, associated with many fresh, small particles) consistently exceed those of the marine types (Fig. 6b). Between approximately 0.1% and 0.4%, CCN concentrations were often similar between the marine and terrestrial types.

3.4 Comparison of reconstructed and directly-measured CCN spectra
Average values for observations of the hygroscopicity parameter κ from the srCCN system are given for each cluster in Table 1. Mean κ values for the three marine population types were higher than for any of the terrestrial clusters, with the κ for the marine populations found to be significantly different (p < 0.05) from those for any of the terrestrial clusters. [...] was 0.30, near the lower end of typical values for marine aerosol in regions of continental outflow, but still above those of the terrestrial population types. As the M2 cluster was also the only marine population type with no indication of recent precipitation scrubbing of the airmass prior to arrival at BML (Fig. 3b), some combination of influences from cloud processing, marine, and terrestrial or anthropogenic sources may result in the observed hygroscopicity values between those of the other population types.
The cluster-average hygroscopicities from Table 1 were combined with the average cluster size distributions from Table 2 to create a reconstructed activated fraction spectrum for each cluster. These reconstructions are compared with the directly measured CCN spectra in Fig. S3 and Table 2. Generally, the reconstructed spectra are within one standard deviation of the directly measured spectra from the sfCCN system. However, the reconstruction overpredicts activated fraction for the marine [...] associated CCN number prediction, though they also noted that this discrepancy between predicted and observed CCN concentrations has not been fully resolved.
At larger particle diameters measurement uncertainty increases due to imprecise particle size cuts, losses in inlets and tubing, and inversion uncertainties, along with generally lower number concentrations than at smaller particle diameters, leading to higher expected uncertainty in CCN predictions and reconstructions when the critical activation diameter is in this range of particle sizes. At low supersaturations where BML data showed an overprediction bias, the critical activation diameter would be above 150 nm. The marine types that had the largest biases at low supersaturations also tended to have larger fractions of particles at these large sizes. This would be expected to add to uncertainty in ways [...]
Further investigation of the closure between predicted and observed CCN concentration was conducted using two prediction models. Hygroscopicity-derived predictions using cluster average κ and normalized size distribution values were generated and compared against all CCN concentrations observed by the sfCCN system in Fig. 7(a). Similarly, the predicted CCN concentrations using cluster average activated fraction spectra were compared against observations in Fig. 7(b). Both models predicted the activated fraction using the cluster type identified during the observation, which was then multiplied by the observed total number concentration at the observation time. The hygroscopicity and size model showed overprediction compared to both observations and the activated fraction model (Fig. 7c), with best-fit slopes of 1.08 and 1.09 respectively. The activated fraction model predictions did improve on the hygroscopicity model, with a slope of 1.00 and R² values increasing from 0.87 to 0.89, though this is in part due to the model being based on a direct fit of the observed data.
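A hygroscopicity-based reconstruction of this kind can be sketched with the κ-Köhler approximation of Petters and Kreidenweis (2007), which relates the critical supersaturation to dry diameter for κ above roughly 0.2. The sketch below is illustrative only: the function names, the water properties at 298.15 K, the assumption of a single κ for an internally mixed population, and the sharp activation cut at the critical diameter are assumptions, not details taken from the study.

```python
import math

def critical_diameter_nm(kappa, supersaturation_pct, temp_k=298.15):
    """Dry diameter (nm) that activates at the given supersaturation (kappa-Kohler approx.)."""
    sigma_w = 0.072    # N m-1, surface tension of water (assumed)
    m_w = 0.018015     # kg mol-1, molar mass of water
    rho_w = 997.0      # kg m-3, density of water
    r_gas = 8.314      # J mol-1 K-1
    a = 4.0 * sigma_w * m_w / (r_gas * temp_k * rho_w)   # Kelvin parameter, m
    ln_s = math.log(1.0 + supersaturation_pct / 100.0)
    d_cubed = 4.0 * a ** 3 / (27.0 * kappa * ln_s ** 2)
    return d_cubed ** (1.0 / 3.0) * 1e9

def activated_fraction(modes, kappa, supersaturation_pct):
    """Fraction of particles larger than the critical diameter.

    modes: list of (number_fraction, median_nm, gsd) lognormal modes summing to 1.
    """
    d_c = critical_diameter_nm(kappa, supersaturation_pct)
    total = 0.0
    for fraction, median_nm, gsd in modes:
        z = math.log(d_c / median_nm) / (math.sqrt(2.0) * math.log(gsd))
        total += fraction * 0.5 * math.erfc(z)
    return total

def predicted_ccn(modes, kappa, supersaturation_pct, total_cn):
    """Predicted CCN concentration: activated fraction times observed total number."""
    return activated_fraction(modes, kappa, supersaturation_pct) * total_cn
```

For κ = 0.30 at 0.1% supersaturation, these assumptions give a critical diameter of roughly 166 nm, in line with the statement that the critical activation diameter lies above 150 nm at low supersaturations.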

Aerosol optical properties
While optical properties of the various aerosol population types were not directly measured, a simple optical reconstruction was conducted to evaluate potential differences between the population types due to differences in size distribution and hygroscopicity. Average particle size distributions and measured average κ values were used to grow particles to equilibrium with relative humidity across a range of values between 0% and 99%, followed by estimation of mass scattering efficiency on a dry mass basis using Mie theory (Bohren and Huffman, 1983). For the purposes of this simple optical comparison, supersaturated κ values used for this analysis were assumed to be sufficient to provide an estimate of subsaturated hygroscopic growth. An assumed dry index of refraction of 1.5 + 0.0i and density of 1.0 g cm⁻³ (e.g. Remer et al., 2006) were used for all population types in order to estimate the relative effect of differences in size distribution, hygroscopicity, and relative humidity on scattering properties at a wavelength of 550 nm. The index of refraction of each humidified particle was adjusted based on volume mixing with water (m = 1.33 + 0.0i). Reconstructed mass scattering efficiencies for each of the population types are shown in Fig. 8.
Computed dry mass scattering efficiencies at 550 nm ranged between 4.2 and 7.7 m² g⁻¹, although actual values would be expected to be lower due to expected particle dry densities higher than 1 g cm⁻³. Further, these values represent only the contribution to mass scattering efficiencies from the sub-micron aerosol, whereas super-micron aerosol, including particles generated by sea spray in littoral environments, can represent a large fraction of the total light scattering. The M1 and M3 marine types, which included the largest fraction of larger accumulation mode particles, had the highest associated dry mass scattering efficiency.
The M2 marine type, which occurred at generally lower wind speeds than the other marine types and thus had fewer larger particles associated with wind generated sea spray aerosol (O'Dowd and Leeuw, 2007), had a lower dry mass scattering efficiency than several of the terrestrial types with the largest median particle diameters. However, at relative humidity values above roughly 60%, as would typically be expected in littoral environments such as BML, the marine population types all yielded expected mass scattering efficiencies above those of the terrestrial types. For the marine types at a relative humidity of 95%, the mass scattering efficiencies on a unit-density dry mass basis were between 34 and 49 m² g⁻¹, roughly twice the range of the terrestrial types, 16 to 25 m² g⁻¹. As an additional point of comparison, similar reconstructions with MODIS fine mode ocean aerosol populations (Remer et al., 2006) yielded dry mass scattering efficiencies between 3.4 and 5.4 m² g⁻¹, and 33 and 54 m² g⁻¹ at 95% RH, indicating the marine types were within the same range as assumptions used for MODIS marine aerosol populations.
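The humidification steps of the reconstruction described in this section can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names and the example κ value are hypothetical, the growth factor uses the κ-Köhler relation with the Kelvin (curvature) term neglected so that water activity equals RH/100, and the Mie calculation itself is omitted. The density rescaling assumes the same size distribution and refractive index, in which case mass scattering efficiency on a dry mass basis varies inversely with the assumed dry density.

```python
def growth_factor(kappa, rh_percent):
    """Diameter growth factor from the kappa-Koehler relation.

    Assumption: the Kelvin term is neglected, so water activity = RH / 100.
    """
    if not 0.0 <= rh_percent < 100.0:
        raise ValueError("rh_percent must be in [0, 100)")
    a_w = rh_percent / 100.0
    return (1.0 + kappa * a_w / (1.0 - a_w)) ** (1.0 / 3.0)

def mixed_refractive_index(gf, m_dry=1.5 + 0.0j, m_water=1.33 + 0.0j):
    """Volume-weighted refractive index of the humidified particle."""
    dry_volume_fraction = 1.0 / gf ** 3
    return m_dry * dry_volume_fraction + m_water * (1.0 - dry_volume_fraction)

def rescale_to_density(mse_unit_density, dry_density_g_cm3):
    """Convert an efficiency computed at 1.0 g cm^-3 to another dry density."""
    return mse_unit_density / dry_density_g_cm3

kappa_example = 0.3  # hypothetical value, not a cluster average from the study
for rh in (0.0, 60.0, 95.0):
    gf = growth_factor(kappa_example, rh)
    print(rh, round(gf, 3), mixed_refractive_index(gf))
```

The wet diameter (dry diameter times the growth factor) and the mixed refractive index would then be passed to a Mie routine at 550 nm, with the resulting scattering normalized by the dry particle mass.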

Summary
The unsupervised cluster model analysis successfully identified distinct aerosol population types in the littoral zone aerosol at BML during CalWater-2015. The time periods selected by each cluster tended to be both temporally and physically coherent. Clusters also tended to be grouped into periods with physically meaningful microphysical properties that could be associated with meteorological processes and expected sources and transport pathways. For example, the clustering methodology identified regular diurnal swings in aerosol properties associated with land/sea-breeze changes and assigned two distinct, terrestrially influenced aerosol types during these periods. Overall, the clustering results for the CalWater-2015 dataset produced a reliable set of aerosol population types, and appropriately screened intermittent periods of impacts from various other sources as an outcome of the classification. Both marine and terrestrially influenced aerosol population types were identified by the unsupervised cluster model. Several marine events that persisted for days were identified as distinct in character from each other, differing in the degree of cloud processing and precipitation removal prior to arrival at the measurement site, and in the extent to which high winds contributed larger sea spray particles. About 10% of the observations were associated with a terrestrial population with a large fraction of small particles, indicating it was affected by relatively fresh emissions and/or new particle formation.

A primary motivation for CalWater-2015 was improving characterization of the regional aerosol and how it might affect the formation of precipitation. The CCN activation spectra observed at BML spanned a full range reported in the literature, from clean marine to strongly terrestrially influenced in character.
However, differences in total aerosol number concentrations associated with the marine and terrestrial types partially offset the differences in activated fraction over a range of measured supersaturations, such that the variability in CCN concentrations between some marine and terrestrial aerosol types at some supersaturations was less than expected from the differences in the averaged total aerosol particle concentrations.

In this littoral region sea-breeze resampling and complex mixing between marine, terrestrial, and free-tropospheric air masses [...] Thus, mixed-phase microphysical processes occurring in those clouds might also be expected to be similar whether marine or terrestrial aerosols served as the nuclei for the supercooled droplets. In contrast, for clouds forming at higher maximum supersaturations, terrestrial aerosol populations are expected to yield higher drop concentrations than marine types, with the exception of a terrestrial population characterized by primarily small, recently emitted or newly formed particles. Thus the droplet size distributions formed on terrestrial vs. marine types, for liquid clouds formed in stronger updrafts or at higher cooling rates that reach such higher supersaturations, should be distinctly different.

Table 1. Aerosol and meteorological parameters for each of the cluster time periods. Cluster mean values are given for total particle number concentration, κ hygroscopicity parameter from the srCCN system, HYSPLIT 24-hour accumulated precipitation along the trajectory, and local wind velocity observations. Best-fit size distribution and activated fraction parameters are shown, with activated fraction parameters pertaining to the fit model given in the supplemental information.