University of Birmingham A statistical analysis of North East Atlantic (submicron) aerosol size distributions

The Global Atmosphere Watch research station at Mace Head (Ireland) offers the possibility to sample some of the cleanest air masses being imported into Europe as well as some of the most polluted being exported out of Europe. We present a statistical cluster analysis of the physical characteristics of aerosol size distributions in air ranging from the cleanest to the most polluted for the year 2008. Data coverage achieved was 75 % throughout the year. By applying the Hartigan-Wong k-means method, 12 clusters were identified as systematically occurring. These 12 clusters could be further combined into 4 categories with similar characteristics, namely: coastal nucleation category (occurring 21.3 % of the time), open ocean nucleation category (occurring 32.6 % of the time), background clean marine category (occurring 26.1 % of the time) and anthropogenic


Introduction
The parameters of atmospheric aerosols are poorly characterized in global climate models. Particularly uncertain are those influencing the radiative balance and the properties of clouds, such as the number size distribution, chemical composition and particle mixing state. The lack of a proper representation of aerosol size distributions in global and regional models is a major reason why the direct and indirect climate effects (Twomey, 1974; Albrecht, 1989; Chen et al., 2011) of atmospheric aerosols constitute the largest uncertainty in our present understanding of anthropogenic climate forcing (IPCC, 2007). Atmospheric aerosol particles span several orders of magnitude in diameter (Dp), from a few nanometres to hundreds of micrometres. Small particles, in particular nucleation mode (typically with Dp < 10 nm) and Aitken mode (10 nm < Dp < 100 nm) particles, contribute little to total particulate mass in background air; however, they contribute significantly to surface area and dominate particle number concentration. The surface area contributes to the overall aerosol condensation sink and optical properties (influencing the direct aerosol radiative effect), while number concentration influences cloud droplet concentrations (contributing to the indirect aerosol radiative effect). The shape of the aerosol size distribution is influenced by air mass origin, which determines the different aerosol sources and evolution processes. Studies such as that of Dall'Osto et al. have focused on characterization via air mass origin over short periods of weeks to months rather than quantifying the frequency of occurrence of each size distribution type over time scales of the order of a year (Charron et al., 2007; Costabile et al., 2009).
Similarly, a number of studies have already focused on size distributions of particles detected at Mace Head, mainly focusing on marine aerosol constituents. Typical marine aerosol size distributions sampled at the coastal Mace Head research station showed low particle number concentrations with a sub-micron marine size distribution characterised by a bimodal shape, with an accumulation mode centred at 200 nm and an Aitken mode at 40 nm, indicative of a cloud-residual accumulation mode produced by the in-cloud growth of activated fine mode particles (Hoppel et al., 1994). A study on the seasonality of the clean marine particle size distributions was reported by Yoon et al. (2007), showing the aerosol size distribution modal diameters for different seasons: 31 nm in winter and 49 nm in summer for the Aitken mode, and 103 nm in winter and 177 nm in summer for the accumulation mode. By contrast, in air masses affected by anthropogenic pollution sampled at Mace Head during anticyclonic periods and conditions of continental outflow, the Aitken and accumulation modes were enhanced by a factor of 5 compared to the marine sector (O'Dowd et al., 2001; Coe et al., 2006).
As part of the EUCAARI (European Aerosol Cloud Climate and Air Quality Interactions) Integrated Project, one of the major activities was to conduct and analyse size distribution measurements at a range of European supersites. The EUCAARI intensive measurement programme ran throughout the full year of 2008. As one of the 12 EUCAARI atmospheric supersites, Mace Head (Ireland) is uniquely located on the interface between the NE Atlantic and Europe, thus enabling sampling of both the cleanest air entering Europe and some of the most polluted air being exported out of Europe into the N. Atlantic (Jennings et al., 2003). Asmi et al. (2011) presented a combined statistical analysis of 22 stations (including all 12 EUCAARI supersites) for the years 2008-2009 in terms of means and extremes of the number concentration size distributions encountered.
In this paper, aerosol size distributions are analyzed by using k-means cluster analysis (Beddows et al., 2009). All particle size distributions are considered, and are not filtered according to any other criteria. Different states of the aerosol were determined by using a novel application of cluster analysis, which uses the degree of similarity and difference between individual observations to define the groups and to assign group memberships. One advantage of this clustering method over providing an average of aerosol size distributions is that it provides a specific number of size distributions which can be compared across different time periods (Beddows et al., 2009). Accordingly, the final cluster centres reflect particle number size distributions representative of each cluster. Some examples of cluster analysis of particle size distributions for substantial SMPS datasets can be found in the literature, where similar approaches have previously been used. Tunved et al. (2004) showed an analysis resulting in eight different clusters capturing different stages of the aerosol lifecycle. Additionally, Charron et al. (2007) presented an examination of the source signature or origin signature represented by particle number size distributions and modal diameters measured at a rural receptor site in southern England, while Beddows et al. (2009) were able to reduce the complexity of rural, urban and curbside atmospheric particle size data according to the temporal and spatial trends of the clusters. Whilst a number of intensive field studies have focused on average monthly datasets, clustering analyses of year-long particle size distribution measurements are scarce (Engler et al., 2007; Kivekäs et al., 2009; Venzac et al., 2009).

Location
The Mace Head Atmospheric Research Station is located in Connemara, County Galway, on the Atlantic Ocean coastline of Ireland at 53°19′36″ N, 9°54′14″ W and offers a clean oceanic sector from 190° through west to 300°. Meteorological records show that on average, over 60 % of the air masses arrive at the station in the clean sector (Jennings et al., 2003). Air is sampled at 10 m height from a main community sampling duct positioned 80-120 m from the coastline, depending on tide. More information about the station can be found in . Clean air is generally characterized by black carbon mass concentrations of 50 ng m⁻³ or less.

Instrumentation
The on-line aerosol analysers sampled from a 10 cm diameter laminar flow community sampling duct at 10 m height, with a 50 % size cut at 3.5 µm at a wind speed of about 10 m s⁻¹. Total particle concentrations were measured using a Condensation Particle Counter (CPC) bank consisting of a standard TSI CPC 3025A with a 50 % particle diameter size cut-off at 3 nm and a TSI CPC 3010 with a 50 % size cut at 10 nm. Data acquisition time resolution for the CPC bank was set to 1 Hz. The sample to the CPC 3025A was diluted by a factor of 16.1 to avoid saturation of the CPC at total number concentrations in excess of 100 000 cm⁻³ (Yoon et al., 2005). With the aid of this calibrated dilution system it is possible to detect number concentrations of ultrafine particles up to 1.7×10⁶ cm⁻³. Throughout this paper N_d>3nm and N_d>10nm indicate TSI 3025A and TSI 3010 particle number concentrations, respectively. Size distributions were measured using a TSI nano-Scanning Mobility Particle Sizer (SMPS) between 3 and 20 nm, scanning every 30 s, and a standard SMPS operating 10-min size distribution scans between 20 and 500 nm (Wang and Flagan, 1990).
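The two CPC channels described above can be combined into a simple estimate of the number of particles between the 3 nm and 10 nm cut-offs. The following is a minimal sketch, assuming that the raw CPC 3025A reading has to be multiplied by the quoted dilution factor of 16.1 and that small negative differences caused by counting noise are clamped to zero. The function names are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only; function names are hypothetical.
# The dilution factor 16.1 is the value quoted in the text.
DILUTION_FACTOR_3025A = 16.1


def ambient_n3(measured_n3_cm3, dilution=DILUTION_FACTOR_3025A):
    """Undo the dilution applied upstream of the CPC 3025A (cm^-3)."""
    return measured_n3_cm3 * dilution


def nucleation_mode_number(n3_cm3, n10_cm3):
    """Particles between the 3 nm and 10 nm cut-offs (cm^-3).

    Assumption: negative differences are treated as counting noise
    and clamped to zero.
    """
    return max(n3_cm3 - n10_cm3, 0.0)
```

This difference, N_d>3nm minus N_d>10nm, is the quantity the paper later uses to characterise the strength of the nucleation mode in the coastal nucleation clusters.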
Basic meteorological parameters such as wind speed (WS), wind direction (WD), relative humidity (RH), temperature (T), atmospheric pressure, precipitation, global radiation and UV radiation are measured at the 10 m height level, with some of the measurements duplicated at 22 m. Aerosol scattering coefficient measurements were performed by a TSI Inc. three-wavelength integrating nephelometer (Bodhaine et al., 1991; Heintzenberg and Charlson, 1996). Aerosol absorption (and black carbon mass) was measured using both a Magee Scientific Aethalometer AE-16 and a Multi-Angle Absorption Photometer (MAAP). Furthermore, a TEOM instrument (PM2.5) was also deployed.

Clustering method
The available SMPS data (five minutes resolution) were averaged into hourly values for the analysis. For the year 2008, data were available between 1 January and 18 November, giving an overall coverage for the year of 75 %. The 6578 SMPS size distributions obtained at one hour resolution were then normalised by their vector length and cluster analysed (Beddows et al., 2009). The use of cluster analysis was justified in this work using a Cluster Tendency test, which gave a Hopkins Index of 0.20, implying the presence of structure in the form of clusters in the dataset (Beddows et al., 2009). The choice of k-means clustering was made from a selection of the partitional cluster packages (Beddows et al., 2009). The mathematical details of the method are available in the Supplement (SI). K-means clustering identifies homogeneous groups by minimizing the clustering error, defined as the sum of the squared Euclidean distances between each data point and the corresponding cluster centre. The complexity of the dataset is thereby reduced, allowing characterization of the data according to the temporal and spatial trends of the clusters. In order to choose the optimum number of clusters the Dunn Index (DI) was used, which aims to identify dense and well-separated clusters. DI is defined as the ratio of the minimal inter-cluster distance to the maximal intra-cluster distance. Since internal criteria seek clusters with high intra-cluster similarity and low inter-cluster similarity, algorithms that produce clusters with high DI are more desirable. In other words, we sought the clustering which maximizes the Dunn Index.
The Dunn Index for the results of the k-means cluster analysis for different cluster numbers showed a clear maximum for 12 clusters, some of which belonged only to specific times of the day, specific mechanisms or specific seasons.
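The procedure described above (vector-length normalisation, k-means clustering, and selection of the cluster number by the Dunn Index) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: it uses a plain Lloyd iteration instead of the Hartigan-Wong algorithm named in the abstract, it takes the inter-cluster distance as the smallest point-to-point distance between clusters and the intra-cluster distance as the largest cluster diameter, and all function names are hypothetical.

```python
import math
import random


def normalise(spectrum):
    """Scale a size distribution to unit Euclidean (vector) length."""
    length = math.sqrt(sum(v * v for v in spectrum))
    return [v / length for v in spectrum] if length > 0 else list(spectrum)


def dist(a, b):
    """Euclidean distance between two spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd iteration (NOT Hartigan-Wong); returns (labels, centres)."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda j: dist(p, centres[j]))
                  for p in points]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centres


def dunn_index(points, labels):
    """Minimum inter-cluster distance over maximum intra-cluster diameter.

    Requires at least two non-empty clusters.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    min_inter = min(dist(a, b)
                    for i, g in enumerate(groups)
                    for h in groups[i + 1:]
                    for a in g for b in h)
    max_intra = max(dist(a, b) for g in groups for a in g for b in g)
    return min_inter / max_intra if max_intra > 0 else float("inf")
```

To choose the number of clusters, one would run `kmeans` for a range of `k` values on the normalised hourly spectra and keep the `k` with the largest `dunn_index`. Because Lloyd iterations depend on the starting centres, several seeds per `k` would normally be tried.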

Overview of meteorological conditions of year 2008
The main meteorological data for the year 2008 are presented in Fig. 1 as monthly averages (December is not included in the analysis as SMPS measurements were not available for this month). The months of July and August were found to have the highest temperature (15±1 °C for both), with the lowest temperatures occurring from January to March. However, when considering atmospheric pressure and WS, the month of August showed lower values than the other summer months. WS was found to be highest during the winter months, with average values of 9.7±8 and 9.6±8 m s⁻¹ for January and February, respectively. Monthly averaged WD (vector averages) were mainly associated with the clean marine sector (180-300°), whereas the month of May was most associated with easterly wind (140±75°). Relative humidity had a minimum in May and a maximum in August.
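Wind direction is a circular quantity, so monthly means have to be formed as vector averages, as noted above: the arithmetic mean of 350° and 10° is 180°, whereas the physically meaningful mean is north. The sketch below shows one common way of doing this. Whether the averages in Fig. 1 were weighted by wind speed is not stated in the text, so the optional weighting here is an assumption, and the function name is hypothetical.

```python
import math


def vector_mean_direction(directions_deg, speeds=None):
    """Vector-averaged wind direction in degrees, in the range [0, 360).

    If speeds is None every observation gets unit weight (assumption);
    otherwise each direction is weighted by its wind speed.
    """
    if speeds is None:
        speeds = [1.0] * len(directions_deg)
    # Sum the east (u) and north (v) components of each observation.
    u = sum(s * math.sin(math.radians(d))
            for d, s in zip(directions_deg, speeds))
    v = sum(s * math.cos(math.radians(d))
            for d, s in zip(directions_deg, speeds))
    angle = math.degrees(math.atan2(u, v)) % 360.0
    # Guard against floating-point rounding that can yield exactly 360.0.
    return 0.0 if angle >= 360.0 else angle
```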

Air mass back trajectories
Back trajectories of the air masses arriving at Mace Head were calculated for 00:00, 06:00, 12:00 and 18:00 UTC for each day of 2008, depicting the path taken by the air mass reaching the sampling site over the previous five days. The back trajectories were calculated using the on-line HYSPLIT model developed by the National Oceanic and Atmospheric Administration (NOAA) (Draxler et al., 1997). The meteorological air mass classification was used following the method described in Dall   (5±8 %) when considering all year 2008. Figure 2 shows for each month of 2008 the percentage occurrence of each of the five different air masses. When considering different seasons, spring experienced more cmP, cP and mA air masses relative to the other types. By contrast, mP air masses dominated during winter months. The month of May was found to be the most affected by anthropogenically influenced air masses (cmP and cP together contributing up to 52 %), whilst the months of August and October were found to be the least affected by such air mass types.

SMPS clustering
K-means analysis of the SMPS aerosol size distributions gave 12 clearly distinct clusters whose frequency varied between 2.8 % and 15.2 % (Table 1), none of which dominated the overall population. Table 1 summarises the main features of the 12 clusters, and their particle size distributions are presented in Fig. 3a-d. A curve-fitting programme was used to disaggregate the size distributions of each cluster into a number of lognormal distributions (nucleation, Aitken and accumulation) whose average modal diameters are reported in Table 1. Further information related to each cluster is provided in Figs. 4-6, Tables 2-3 and in the Supplement (Fig. S2). In this section average meteorological data (WS, WD, RH, Temp.) are also presented for each SMPS cluster. Moreover, particular emphasis is given to describing the unique physical and chemical properties as well as the specific temporal profiles peculiar to a given cluster. While 12 individual clusters were identified in the cluster analysis, it was further found that these 12 clusters could be distributed into four broader groupings, namely coastal nucleation, open ocean nucleation, anthropogenic and background clean marine categories. This additional categorisation was based not only upon the similarity of their size distributions (see Fig. 3a-d) but also upon strong similarities in other parameters, particularly meteorological parameters and air mass trajectories. The reduction to four more generic classifications, while not based on statistics, is based on existing knowledge of distributions typically observed and associated with the categories (O'Dowd et al., 2002a, b, for coastal nucleation; , for open ocean events; Yoon et al., 2007, for background marine distributions; and both O'Dowd et al., 2001, and Coe et al., 2006, for continental characteristics).
The average aerosol size distributions of the four aerosol categories (obtained by averaging the SMPS clusters of each individual category) are presented in Fig. 3e. The main characteristics and related environmental parameters associated with the 4 main classifications are described below.
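The decomposition of each cluster into lognormal modes mentioned above is conventionally written as a sum of lognormal functions. The text does not give the functional form used by the curve-fitting programme, so the expression below is the standard multi-modal lognormal distribution and is offered as an assumption. The symbols are introduced here for illustration: N_i is the number concentration of mode i, \bar{D}_{pg,i} its geometric mean diameter, \sigma_{g,i} its geometric standard deviation, and n the number of modes (up to three here: nucleation, Aitken and accumulation).

```latex
\frac{\mathrm{d}N}{\mathrm{d}\log D_p}
  = \sum_{i=1}^{n}
    \frac{N_i}{\sqrt{2\pi}\,\log\sigma_{g,i}}
    \exp\!\left[
      -\frac{\left(\log D_p - \log \bar{D}_{pg,i}\right)^{2}}
            {2\,\log^{2}\sigma_{g,i}}
    \right]
```

Under this form, the modal diameters reported for each cluster correspond to the fitted \bar{D}_{pg,i} values.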

Coastal nucleation (Coast. N. type)
Clusters 1, 2 and 3 possess strong aerosol nucleation modes associated with coastal new particle production events (Fig. 3a). These three clusters accounted for 21.3 % of the total size distributions classified, implying that about one fifth of the time Mace Head is under the influence of coastal nucleation events (O'Dowd et al., 2002a, b). Clusters 1 and 2 showed the largest nucleation modes (N_D>3 − N_D>10 = 21 447 and 42 676 cm⁻³, respectively), while Cluster 3 had a nucleation mode with a concentration of 5703 cm⁻³. Clusters 1 and 2 showed the lowest average RH (75.1±10 % and 74.9±10 %, respectively) of the 12 clusters and the highest average solar radiation among all clusters (Fig. 4f). A very interesting feature of these three clusters relative to all the others classified is their strong diurnal variation (Fig. 5): they were the only clusters showing a clear diurnal trend peaking during the day. Clusters 1, 2 and 3 show a progressive shift in the peak of the diurnal profile towards later hours of the day. In other words, whilst Cluster 1 peaks during the 12:00-15:00 time bin, Clusters 2 and 3 are progressively shifted towards the early afternoon time bins. This can be explained by a shift in the size of the nucleation mode from Cluster 1 to Cluster 3 as the day progresses (4-5 nm, 6-8 nm and 9-11 nm for Clusters 1, 2 and 3, respectively). Cluster 1 is most likely associated with the initial stage of a nucleation event occurring at low tide, while Clusters 2 and 3 represent later stages of the nucleation events, when particles have grown to larger sizes during mid-day hours before advecting to Mace Head. Our study is in line with those of O'Dowd et al. (2002a, b), which reported that "Type I" events showed the highest concentrations at sizes below 5 nm (Cluster 1, this study), whereas in "Type II" cases the maximum concentration peak lies at 6-8 nm (Cluster 2, this study). In "Type III" cases, particle growth from below 10 nm to sizes above 20 nm was observed (Cluster 3, this study).

Figure 4e shows the average size distribution for each of the four groups (error bars are one sigma standard deviation). Aerosol category background clean marine is presented as bimodal marine (Clusters 7 and 11) and coarse marine (Cluster 12).

Open ocean nucleation and growth (Op. Oc. Nucl. type)
While nucleation appears to be a frequent phenomenon in many coastal areas (O'Dowd and de Leeuw, 2007; Modini et al., 2009), in open ocean environments new particle formation events have been observed less frequently. O'Dowd et al. (2010) reported open ocean nucleation events detected at the Mace Head station and presented a seven year climatology of such events, illustrating a peak frequency of occurrence in May. In this study, Clusters 4, 5 and 8 are classified as open ocean particle production events and represented 32.6 % of the classified SMPS spectra, suggesting that about a third of the time the Mace Head atmospheric station is under the influence of open ocean nucleation events. Moreover, these clusters were found to be associated with lowest PM2.5 concentrations (see Table 3

Anthropogenic influence (Anth. type)
Clusters 6, 9 and 10 (Fig. 3c) show typical particle size distributions affected by anthropogenic pollutants, confirmed by the fact that the average BC concentrations associated with these three clusters were the highest (∼200-600 ng m⁻³) among the 12 classified clusters (Table 3, column e). Furthermore, the clusters shown in Fig. 4c presented average wind directions linked with inland advection (Fig. 4b). Cluster 10 also presented the highest PM2.5 concentrations (16.5 µg m⁻³, Table 3, column d). Among the three clusters, Cluster 6 was associated with lower temperatures (9.4 °C, Fig. 4d), and its seasonality pattern is described in the following section. Table 2 shows that Clusters 9 and 10 often occur under cP and cmP air masses, with Cluster 10 showing a 40 % concurrence with the cP air mass type. Cluster 6, by contrast, does not show a strong occurrence under cP/cmP air masses, which may be due to a more regional or local anthropogenic pollution source for this cluster. Cluster 10 was also associated with the strongest scattering signal among all clusters (Table 3, column f). Nephelometer scattering data for all three wavelengths show a contrasting trend for Cluster 10 relative to all others, with higher values at shorter wavelengths, suggesting a dominant contribution of particles in the accumulation mode.

Background clean marine
Clusters 7, 11 and 12 represented 26.1 % of the total SMPS spectra and were characterised by the lowest particle number concentrations among all clusters. These three clusters were also characterised by similar wind direction and wind speed properties, with the strongest westerly average wind speed among all clusters (Fig. 4a). Within this group, Clusters 7 and 12 were found to be similar, associated with cold temperatures and low average atmospheric pressure, whilst Cluster 11 showed the opposite trends and occurred mainly during summer months (Fig. 4e). Clusters in this background clean marine group were mostly detected in mP air masses, with Cluster 12 associated with this air mass type up to 59 % of the time (Table 2). Our study shows similar results to those reported by Yoon et al. (2007), and further discussion of the seasonality of the SMPS clusters associated with this type is provided in the next section. Aerosols described by these SMPS clusters were also characterised by high scattering efficiency (Table 3, column f), reflected again in the high PM2.5 mass loadings (Table 3, column e). The marine aerosol in supermicron sizes is composed predominantly of sea salt. For example, Clusters 11 and 12 ranked second and third, respectively, in scattering signal among all clusters (Table 3, column f), despite being associated with very low particle number concentrations (Table 3, columns a-b). Cluster 7 did not possess high scattering properties, likely due to its smaller accumulation mode intensity relative to Clusters 11 and 12 (Fig. 3d) as well as its lower aerosol mass loadings (Table 3, column d).
It is worth noting that Clusters 7 and 11 were similar to the clean marine air particle spectra reported by O'Dowd et al. (2004) representing low and high biological activity over the Northeast Atlantic. However, Cluster 12 is a unique cluster exhibiting a large accumulation mode size (with an average accumulation mode diameter of 372 nm, Fig. 3d), likely originating from sea spray at high wind speed. Recently, Spracklen et al. (2007) used a statistical synthesis of marine aerosol measurements from experiments in four different oceans, predicting a bimodal size distribution that agrees well with observations as a grand average over all regions, although large regional differences were found. Notably, observed Aitken mode number concentrations were more than a factor of 10 higher than in the model for the N. Atlantic, indicating the importance of organic matter in this region. Besides different size distributions associated with different seasons, our study suggests that not only one bimodal size distribution but two distinct types of aerosol size distributions are present during winter months.

Figure 6 shows the seasonality of each SMPS cluster, represented by the occurrence of each cluster during each month considered in this study. The 12 SMPS particle size distribution clusters showed very different seasonality for multiple reasons, including different meteorology and different biological ocean activity throughout the year as well as different anthropogenic influences over time. Whilst the seasonality could be described for the 4 aerosol categories presented in Sect. 4.1, some clusters belonging to the same group showed a different seasonality, and therefore each cluster is presented and discussed individually. Within the coastal nucleation clusters (1, 2, 3), Clusters 1 and 2 peaked during spring and autumn (Fig. 6). Cluster 1 seems to peak more during late summer (July-September), whereas Cluster 2 shows its highest frequency during spring months (March-May). Cluster 3 did not present a clear seasonality. The concentration of iodine-related precursor gases is related to the width of the tidal zone exposed to the atmosphere and, in turn, to tidal height (O'Dowd et al., 2002a, b). In addition to these two factors, the origin of air masses arriving at the measurement location is also particularly important, especially at Mace Head (O'Dowd et al., 2002a), since this also determines which tidal areas the air interacts with. The number of event days and the event duration show a clear seasonal cycle, with the most frequent occurrence in spring and autumn and the rarest in winter. Our study confirms earlier studies in which it was found that the number of event days for summer is relatively lower than for spring/autumn, mainly due to the amount of precursor gases emitted from marine algae during low tide (O'Dowd et al., 2002a, b).
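The monthly occurrence of each cluster that underlies a seasonality plot such as Fig. 6 can be obtained by counting hourly cluster assignments per month. A minimal sketch follows, assuming that occurrence is expressed as a percentage of the classified spectra available in each month. The normalisation actually used for the figure is not stated in the text, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def monthly_occurrence(months, labels):
    """Percentage of each month's hourly spectra assigned to each cluster.

    months and labels are parallel sequences: the calendar month and the
    cluster number of every classified hourly size distribution.
    """
    per_month = defaultdict(Counter)
    for month, label in zip(months, labels):
        per_month[month][label] += 1
    return {
        month: {label: 100.0 * n / sum(counts.values())
                for label, n in counts.items()}
        for month, counts in per_month.items()
    }
```

Normalising by month rather than by cluster keeps months with poor data coverage comparable with well-sampled ones, which matters here because coverage for 2008 was 75 % and December is missing entirely.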

Overall seasonality of different particle size distributions clusters
Cluster 4 (open ocean nucleation events) by contrast showed a very different trend, peaking mainly during the months of May and June, in agreement with the study of O'Dowd et al. (2010). We conclude that not only the diurnal profiles but also the seasonality patterns differ between the coastal and open ocean nucleation events. The other two clusters belonging to open ocean nucleation events (Clusters 5 and 8) occurred mainly during summer (with maxima in June and September, respectively). The higher occurrence of Cluster 4 during May and June is likely to be related to the higher probability of an open ocean nucleation event during the months of high biogenic activity, whereas the occurrence of Clusters 5 and 8 (with larger Aitken modes, Fig. 3b) during warmer summer months is likely to be due to photochemically enhanced secondary aerosol production.
Regarding the clusters affected by anthropogenic aerosols (high BC values), Cluster 9 showed a strong seasonal trend with higher values during summer months (Fig. 6). The remaining clusters belonging to anthropogenic size distributions did not show a clearly discernible seasonal trend. Finally, when considering background clean marine size distributions, Clusters 7 and 12 tended to occur more frequently during winter, characterising clean marine air masses advecting to Mace Head research station during the colder months. The lower contribution of clean marine events to the total number of events for autumn months is most likely due to the more frequent occurrence of high pressure weather systems during these months, bringing more continental air to Mace Head; hence the number of clean marine events is reduced. Whilst Clusters 7 and 12 occur mainly during winter months, Cluster 11 tends to be more predominant during warmer ones. Yoon et al. (2007) reported that the aerosol light scattering coefficient showed a minimum value of 5.5 Mm⁻¹ in August and a maximum of 21 Mm⁻¹ in February. This seasonal variation was due to the higher contribution of sea salt in the MBL during the North Atlantic winter. By contrast, aerosols during late spring and summer exhibited larger Ångström parameters than in winter, indicating a large contribution from the biogenically driven fine or accumulation modes. Our study shows similar trends, with clean marine aerosols dominating winter months, and secondary marine aerosols as well as anthropogenic ones dominating the summer months. Yoon et al. (2007) furthermore reported that clean marine aerosols with a large organic fraction appear to be physically larger than those without (the former occurring mainly during summer, at times of high oceanic biological activity, the latter during winter). Laboratory studies on bubble-mediated aerosol production (Sellegri et al., 2006) revealed that the aerosol distribution resulting from the use of surfactant-free seawater comprised three modes: (1) a dominant accumulation mode at 110 nm; (2) an Aitken mode at 45 nm; and (3) a third mode, at 300 nm, resulting from forced bursting of bubbles. Forced bursting occurs when bubbles fail to burst upon reaching the surface and are later shattered by splashing associated with breaking waves and/or wind pressure at the surface.
However, the more complex submicrometer spectral structure, which is significantly affected by salinity, temperature and surfactant concentration, suggests that more detailed studies are required for laboratory generated sea-spray aerosol (O'Dowd and de Leeuw, 2007). In summary, during cold winter months, bimodal distributions of the background marine air can dominate, as meteorology favours the clean marine sector. As the weather warms, contributions from open ocean nucleation can add to the background and, as the air approaches the coast, ultrafine particles can contribute to the size distribution depending on solar radiation and tides. This is all perturbed by anthropogenic pollution under continental air masses and/or by local and regional pollution from inland wind directions. Table 4 displays the average aerosol physical and radiative parameters for the four aerosol categories presented. The results show that coastal nucleation events possess very high particle number concentrations peaking during daytime, with a marked seasonality linked to coastal marine biota. Anthropogenic aerosol size distributions are associated with high BC mass loadings and high scattering at short wavelengths (opposite to background clean marine aerosols, Table 4). Open ocean nucleation events dominated the warmer months and, by contrast, clean marine distributions prevailed during winter months. Both primary and secondary formation processes are likely to contribute to the observed physical size distributions, but at this stage it is not clear to what extent each process contributes, especially in the Aitken mode.