Inverse modelling of cloud-aerosol interactions – Part 2: Sensitivity tests on liquid phase clouds using a Markov chain Monte Carlo based simulation approach

This paper presents a novel approach to investigate cloud-aerosol interactions by coupling a Markov chain Monte Carlo (MCMC) algorithm to an adiabatic cloud parcel model. Despite the number of numerical cloud-aerosol sensitivity studies previously conducted, few have used statistical analysis tools to investigate the global sensitivity of a cloud model to input aerosol physiochemical parameters. Using numerically generated cloud droplet number concentration (CDNC) distributions (i.e. synthetic data) as cloud observations, this inverse modelling framework is shown to successfully estimate the correct calibration parameters, and their underlying posterior probability distribution. The employed analysis provides a new, integrative to evaluate the global sensitivity of the derived distribution to the input lognormal large results from prior are the relative very clean the lognormal aerosol


Introduction
Clouds are recognised as one of the most important modulators of radiative processes in the atmosphere (Platnick and Twomey, 1994). Cloud reflectance is partially dependent on droplet size, which in turn is linked to the concentration of cloud condensation nuclei (CCN). The net effect of an increase in CCN is to increase cloud albedo (at fixed cloud liquid water path) generally resulting in a radiative cooling of the surface. To assess the impact of aerosols on clouds in the climate system, it is crucial to understand the underlying physical processes governing cloud-aerosol interactions. The ability of a particle to act as a CCN is a function of the size of the particle, its composition and mixing state, and the supersaturation of the air (Fitzgerald, 1974;Hegg and Larson, 1990;Laaksonen et al., 1998;Feingold, 2003;Conant et al., 2004;Kanakidou et al., 2005;Quinn et al., 2008). Untangling the relative importance of size and composition for the cloud nucleating ability of aerosol particles is at present a major challenge facing the cloud-aerosol modelling community, and this topic is at the core of the aerosol indirect effect (Dusek et al., 2006;McFiggans et al., 2006;Andreae and Rosenfeld, 2008;Stevens and Feingold, 2009). Dusek et al. (2006) showed that particle size accounts for 84 to 96 % of observed variability in CCN concentrations. They hypothesised that aerosol-CCN relationships could be simplified by parameterising the effects of chemical composition on CCN activation for certain aerosol types. Modelling studies by Feingold (2003) and Ervens et al. (2005) also showed that for an internally-mixed aerosol, composition has a relatively small effect on droplet activation, except perhaps under conditions of both high pollution levels and small updraft velocities. 
However, Hudson (2007) presented a more extensive set of measurements that showed significantly more variability in the relationship between dry particle size and critical supersaturation by including cleaner air masses in the analysis. Other studies have also shown that under certain meteorological/aerosol conditions the effect of chemistry may be relatively more important (e.g. Lance et al., 2004;Rissman et al., 2004;Twohy and Anderson, 2008). In light of this, it is necessary to scrutinize and evaluate model parameters over a wide range of input and output conditions by efficiently searching the entire parameter space of relevant properties governing aerosol activation and growth.
The difficulty in untangling relationships among aerosols, clouds and precipitation has been attributed to the inadequacy of existing tools and methodologies (Stevens and Feingold, 2009). Numerous cloud-aerosol modelling sensitivity studies have been conducted (e.g. Feingold, 2003; Rissman et al., 2004, and references therein; Chuang, 2006). However, few have used statistical analysis tools to investigate the global sensitivity of a cloud model to input aerosol parameters. There are two kinds of sensitivity analysis: local and global. The former examines input parameter variations across ranges that are believed to contain the appropriate values, while global sensitivity analysis considers input parameter changes over the entire multi-dimensional parameter domain (Pérez et al., 2006). When the local sensitivity to a set of model input parameters is tested, models are often run iteratively, perturbing one set of selected parameters at a time, thus testing the sensitivity to these parameters individually. This approach requires prior knowledge as to how best to perturb each input parameter, as the number of possible model permutations performed is usually limited. The selection of these values becomes more difficult if a parameter is non-measurable or if only limited or unreliable measurements exist.
Methods which explore the whole multi-dimensional parameter space, on the other hand, have distinct advantages. Firstly, global sensitivity analysis generally leads to different, but more reliable, results because parameter sensitivities in nonlinear models of complex systems typically vary considerably over the feasible space of solutions. Secondly, if a model exhibits highly nonlinear parameter interactions, it is possible to diagnose this parameter compensation by simultaneously varying parameters.
Few studies have used global sensitivity analysis to study cloud-aerosol interactions. One example is the study of Anttila and Kerminen (2007), which used the probabilistic collocation method (PCM) to test the sensitivity of cloud microphysics to Aitken mode particles (50-100 nm diameters). One of the main conclusions of their work is that parameters describing the aerosol number size distribution are generally more important than those describing chemical composition unless the particle surface tension or mass accommodation coefficient of water is strongly reduced due to the presence of surface-active organics. This corroborates the results of e.g. Dusek et al. (2006). Despite the progress made, the PCM method which uses a polynomial approximation can never perfectly replace the original cloud-parcel model. Moreover, the parameters used in the polynomial function do not represent system properties, but are just fitting coefficients.
An alternative approach to global sensitivity analysis of cloud-aerosol interactions is to embrace an inverse modelling approach and invoke posterior probability density functions of model parameters using Markov chain Monte Carlo simulation (MCMC). Such methods not only provide an estimate of the best parameter values, but also a sample set of the underlying (posterior) uncertainty. This distribution contains important information about parameter sensitivity, and correlation (interaction), and can be used to produce confidence intervals on the model predictions. The parameter sensitivity determined for the full dimensional parameter set augments the sensitivity derived from 2-D response surface analyses (Partridge et al., 2011, herein denoted P11).
Unfortunately, MCMC simulation requires significant computational resources and in addition, standard MCMC approaches are not particularly efficient and typically require many thousands of model evaluations to find the posterior parameter distribution, even for relatively simple problems. Therefore, it is paramount to test the performance and applicability of sophisticated state-of-the-art MCMC algorithms for investigating cloud-aerosol parameter interactions.
P11 introduced an automatic parameter estimation framework to solve the cloud-aerosol inverse problem using the Shuffled Complex Evolution (SCE-UA) global optimisation algorithm (Duan et al., 1992), in conjunction with an adiabatic cloud parcel model (Roelofs and Jongen, 2004). Synthetic calibration data, in the form of droplet size distributions generated from literature values of model input (calibration) parameters, were used to illustrate the methodology. This allowed us to demonstrate conclusive convergence to the appropriate parameters used to generate the synthetic data, as the true values of the calibration parameters were known a priori. In P11 it was shown that without holding a number of calibration parameters at their true values, specifically the lognormal parameters describing the Aitken mode aerosol, surface tension and mass accommodation coefficient, it would be difficult for the automatic search algorithm to find the true optimal parameter values. In particular, it was illustrated that the cloud-aerosol inverse problem is difficult to solve because it is highly nonlinear, and may contain numerous local minima both within the immediate vicinity of the true solution and far away. Although the SCE-UA algorithm was shown to successfully locate the optimum parameter values for the soluble mass fraction and lognormal aerosol parameters describing the accumulation mode, it does not provide an estimate of the underlying parameter uncertainty associated with model nonlinearity, measurement error and model error.
Explicit treatment of parameter uncertainty is possible if we adopt Bayesian statistics. Therefore, in this study, we pose the model calibration problem in a Bayesian framework, and use the DREAM adaptive MCMC sampling scheme (Vrugt et al., 2008b, 2009b) to approximate the posterior parameter distribution. This distribution contains the best parameter values found with SCE-UA, but also summarizes the associated parameter uncertainty. The method is used to compare the global sensitivity of the adiabatic cloud parcel model to different key input parameters. The specific aims are as follows:
- Demonstrate that DREAM, a current state-of-the-art MCMC method, can be successfully used to provide estimates of parameter uncertainty and correlation when coupled to an adiabatic cloud parcel model.
- Demonstrate the applicability and power of MCMC simulation to investigate cloud-aerosol interactions. We are particularly concerned with a global sensitivity analysis of the parameters describing the aerosol physiochemical properties, i.e. the lognormal parameters describing the aerosol accumulation mode and chemistry (denoted by the soluble mass fraction).
- Pinpoint which are the dominant parameters controlling the activation of cloud droplets in different aerosol environments, from clean marine Arctic conditions to polluted continental conditions.
To the authors' knowledge this study is the first to use an MCMC framework with an adiabatic cloud parcel model to summarize the parameter and model uncertainty for cloud-aerosol interactions, and infer probability distributions of the factors determining the growth of droplets for different atmospheric conditions. The paper is organised as follows. First we provide a brief introduction to inverse modelling using Bayesian inference. This includes a detailed description of MCMC simulation using the DREAM algorithm, and a discussion of the choice of the objective function. We then provide a short overview of the most important cloud-aerosol sensitivity tests that will be performed, followed by a stepwise summary of the results. These results highlight the sensitivity of the cloud droplet number concentration (CDNC) distribution to the different calibration parameters, and are followed by a discussion of the main findings and conclusions of the work considered herein.

Bayesian inference
To start, we provide a short summary of Bayesian inference. For a comprehensive review see e.g. Tamminen and Kyrölä (2001); Jackson et al. (2004); Villagran et al. (2008). Bayesian inference represents a mathematically rigorous approach to parameter estimation. This statistical method treats the model parameters as random variables with a joint (but as yet unknown) posterior probability distribution. This distribution is the product of the prior distribution and the likelihood function and conveys all desired information about the current knowledge of the parameters, and implicitly carries information about their maximum a-posteriori (MAP) values, underlying uncertainties and possible multi-dimensional correlations. The posterior probability density function of the parameters, hereafter referred to as p(θ|Y), can be written as follows using Bayes' law:

$$p(\theta \mid Y) = \frac{p(\theta)\, L(\theta \mid Y)}{p(Y)} \qquad (1)$$

where p(θ) denotes the prior distribution of the parameters, and L(θ|Y) ≡ p(Y|θ) signifies the likelihood function. The normalization factor, p(Y), also called the "evidence", is difficult to estimate directly in practice, and is instead derived from an integration over the parameter space so that p(θ|Y) integrates to unity.
The prior distribution defines our knowledge about the parameters before the actual measurement data is collected and processed. Priors can change iteratively after assimilating new data. This distribution typically constitutes information about the system of interest, and ensures that the parameter estimates at least partially adhere to prior knowledge.
The likelihood function provides a diagnostic measure of how well the model fits the data. It essentially measures the distance between the model predictions and corresponding observations. If we assume a standard Gaussian form of L(θ|Y ), then the highest likelihood is typically found for those parameter values that provide the least squares fit to the experimental data. Additional observations (new evidence) are easily processed in this framework and will result in changes in the posterior parameter distribution. Hence, when confronted with new data, the likelihood function (and prior distribution) will change and alter the parameter and predictive uncertainty. Many different formulations of this function are available in the (Bayesian) literature. Schoups and Vrugt (2010) recently introduced a generalized likelihood function that encapsulates most of these different formulations, and is especially developed to explicitly treat autocorrelation, heteroscedasticity, and non-Gaussianity of the residuals.
Once the posterior parameter distribution has been sampled with MCMC simulation, predictive uncertainty can be derived by evaluating the different posterior samples with the adiabatic cloud parcel model. This results in an ensemble of model predictions from which the appropriate prediction intervals (90 %, 95 %, 99 %, etc.) can be estimated.
The posterior parameter distribution contains the required information to assess the importance (sensitivity) of individual parameters, and their cross-correlation. If the marginal posterior distribution of a given parameter is very well defined and extends over only a small portion of its prior range, the parameter can be considered sensitive. Conversely, if the marginal posterior distribution extends over a large region of its prior distribution, then the parameter is said to be insensitive. Thus, the reduction in uncertainty of the posterior distribution compared to the prior is a simple and useful diagnostic to assess parameter sensitivity. Further details are given in Sect. 3.3.1.
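As a minimal sketch of this diagnostic, the width of a central posterior interval can be divided by the prior range of the same parameter. The function name `posterior_to_prior_ratio`, the choice of a 95 % central interval, and the simple index-based quantile are illustrative assumptions, not part of the original analysis.

```python
def posterior_to_prior_ratio(samples, prior_lower, prior_upper, coverage=0.95):
    """Width of the central posterior interval divided by the prior range.

    Values close to 0 suggest a well-constrained (sensitive) parameter,
    values close to 1 suggest an insensitive one. The 95 % default and the
    index-based quantile are assumptions made for this illustration.
    """
    if not samples:
        raise ValueError("need at least one posterior sample")
    ordered = sorted(samples)
    tail = (1.0 - coverage) / 2.0
    last = len(ordered) - 1
    lower_index = int(tail * last)
    upper_index = min(last, int(round((1.0 - tail) * last)))
    width = ordered[upper_index] - ordered[lower_index]
    return width / (prior_upper - prior_lower)
```

A ratio near zero corresponds to a marginal posterior that occupies only a small portion of the prior range.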
In the past decade, much progress has been made in the development of efficient sampling methods that approximate the posterior distribution within a limited number of model evaluations. The Markov chain Monte Carlo (MCMC) scheme was introduced by Metropolis et al. (1953). Its basis is a Markov chain, which generates a random walk through the search space and successively visits solutions stemming from a fixed probability distribution (Vrugt et al., 2009a). This sampling procedure operates in two steps: (1) The proposal step: a candidate value is sampled from a "proposal distribution". (2) The acceptance/rejection step: the candidate value is either accepted or rejected using the Metropolis acceptance probability (Järvinen et al., 2010).
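The two steps can be illustrated with a one-dimensional random-walk Metropolis sampler. This is a generic textbook sketch, not the sampler used in this study; the Gaussian proposal, the function name `metropolis` and its arguments are assumptions chosen for illustration.

```python
import math
import random


def metropolis(log_posterior, start, step, n_steps, rng):
    """Random-walk Metropolis sampler for a scalar parameter.

    log_posterior: function returning the log of the (unnormalised) posterior.
    step: standard deviation of the symmetric Gaussian proposal (assumed form).
    Returns the chain (including the start) and the acceptance rate.
    """
    chain = [start]
    current, current_lp = start, log_posterior(start)
    accepted = 0
    for _ in range(n_steps):
        # (1) Proposal step: draw a candidate around the current state.
        candidate = current + rng.gauss(0.0, step)
        candidate_lp = log_posterior(candidate)
        # (2) Acceptance/rejection step: accept with probability
        #     min(1, p(candidate) / p(current)), evaluated in log space.
        log_ratio = candidate_lp - current_lp
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            current, current_lp = candidate, candidate_lp
            accepted += 1
        chain.append(current)
    return chain, accepted / n_steps
```

For example, `metropolis(lambda x: -0.5 * x * x, 3.0, 1.0, 20000, random.Random(0))` samples a standard normal target starting from a deliberately poor initial value.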
The original Metropolis MCMC scheme was extended for posterior inference in a Bayesian framework by Gelfand and Smith (1990), and has subsequently enjoyed widespread use in many fields of study (Vrugt et al., 2009b, and references therein). MCMC algorithms are typically used to summarize parameter and model output uncertainty, without recourse to studying parameter sensitivities. A few studies exist that have used MCMC simulation to study "global" parameter sensitivities (Benke et al., 2008; Kanso et al., 2006; Vrugt et al., 2006, 2008b), yet such contributions are rather novel. This is rather remarkable as the posterior distribution directly conveys information about parameter sensitivity.
Existing theory and experiments prove the convergence of well-constructed MCMC schemes to the appropriate limiting distribution under a variety of different conditions. In practice, however, this convergence is often frustratingly slow, with the efficiency limited by the scale and orientation of the proposal distribution (Vrugt et al., 2009b). Slow convergence towards the correct target distribution is frequently caused by an inappropriate selection of the proposal distribution used to generate trial moves in the Markov chain. This indicates the need for preliminary test runs or arduous hand tuning of the proposal distribution. This is a particular hindrance for the application of Bayesian inference to CPU-intensive models, and it motivates the use of more sophisticated MCMC methods that employ adaptive techniques to 'learn' during the sampling process. Such methods continuously adapt the shape and size of the proposal distribution so that the sampler evolves more rapidly towards the appropriate limiting distribution (Vrugt et al., 2009b). When single-chain MCMC methods are used, convergence can also be hindered for inverse problems that contain numerous local minima in the posterior parameter space. Gelman and Rubin (1992) advocate the use of MCMC algorithms that run multiple different Markov chains (trajectories) in parallel. This not only reduces the chance of getting stuck in a local solution, but also helps to monitor convergence to a limiting distribution. For instance, a simple comparison of the within-chain and between-chain variances will help judge whether the same distribution is being sampled by the different parallel chains. This convergence diagnostic was introduced by Gelman and Rubin (1992) and is generally referred to as the R̂-statistic.
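The comparison of within-chain and between-chain variances can be sketched as follows for a single parameter. This follows the commonly quoted form of the Gelman and Rubin (1992) diagnostic without the later split-chain or degrees-of-freedom refinements; the function name `gelman_rubin` is an illustrative assumption.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """R-hat for one parameter from several equally long chains.

    chains: list of lists, one list of samples per Markov chain.
    Values close to 1 indicate that all chains sample the same distribution.
    """
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return (pooled / within) ** 0.5
```

Chains that have settled on different regions of parameter space give a large between-chain variance and hence a value well above 1.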
In this study we employ a state-of-the-art, self-adaptive DiffeRential Evolution Adaptive Metropolis algorithm (DREAM) (Vrugt et al., 2009b) for the efficient investigation of the cloud-aerosol inverse problem.

DiffeRential Evolution Adaptive Metropolis algorithm: DREAM
The DREAM sampling scheme is an adaptation of the Shuffled Complex Evolution Metropolis (SCEM-UA) algorithm (Vrugt et al., 2003), but maintains detailed balance and ergodicity (Vrugt et al., 2008a, 2009b). The DREAM algorithm uses differential evolution as a genetic algorithm for population evolution with a Metropolis selection rule to decide whether to accept the candidate points (offspring) or not.
In DREAM, N different Markov chains are run in parallel, and jumps in each chain are generated using a fixed multiple of the difference of the states of one or more randomly chosen pairs of chains. The scale and orientation of this discrete proposal distribution is continuously changing en route to the posterior target distribution. The samples generated after convergence can be used to summarize the posterior distribution, and communicate parameter and model predictive uncertainty. Synthetic and real-world case studies have shown that this new approach elicits good efficiencies for complex, highly nonlinear, and multimodal target distributions (Vrugt et al., 2009b) typical for the parameters involved in cloud-aerosol interactions (P11). It is therefore well suited to the purpose of this investigation.
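The jump generation described above can be sketched in simplified form. The example below uses a single randomly chosen pair of other chains and the jump factor 2.38/sqrt(2d) that is commonly quoted for differential-evolution samplers; it omits the subspace sampling, adaptation and outlier handling of the full DREAM algorithm. The function name `de_proposal` and the small Gaussian `noise` term are illustrative assumptions.

```python
import math
import random


def de_proposal(states, i, rng, noise=1e-6):
    """Candidate point for chain i from the difference of two other chains.

    states: current position of every chain (list of equal-length lists).
    Simplified sketch: one pair of chains, all dimensions updated at once.
    """
    if len(states) < 3:
        raise ValueError("need at least three chains")
    d = len(states[i])
    gamma = 2.38 / math.sqrt(2 * d)          # commonly used jump factor
    others = [k for k in range(len(states)) if k != i]
    r1, r2 = rng.sample(others, 2)           # two distinct other chains
    return [
        states[i][j] + gamma * (states[r1][j] - states[r2][j]) + rng.gauss(0.0, noise)
        for j in range(d)
    ]
```

Because the jump is built from the current spread of the chains, its scale and orientation change automatically as the population contracts towards the posterior. The candidate would then be accepted or rejected with the Metropolis rule.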

Adiabatic cloud parcel model
Adiabatic cloud parcel models have been used successfully with field measurements to estimate the impact of aerosol size/composition for liquid clouds (Ayers and Larson, 1990;Nenes et al., 2002;Hsieh et al., 2009). To complete an MCMC simulation for a single cloud case with relatively few calibration parameters, many thousands of cloud model evaluations are required to explore the posterior distribution. The computational requirements of MCMC could therefore hinder the use of CPU intensive models. In this paper, we utilize a computationally efficient adiabatic cloud parcel model that provides a reasonable trade-off between processes accounted for and computational speed. This provides us with flexibility to run different MCMC trials with different data sets, and calibration parameters. The chosen cloud parcel model (Roelofs and Jongen, 2004) simulates the adiabatic ascent of an air parcel, condensation and evaporation of water vapor on aerosols, particle activation, condensational growth, collision and coalescence between droplets, and aqueous phase sulfur chemistry. As in P11, the model is currently configured so that the aerosol is represented as an internal mixture of compounds. The reader is referred to P11 for a description of the model setup and to Roelofs and Jongen (2004) for more information on the cloud parcel model.

Calibration parameters
To test a wide range of input aerosol size distributions, data from four distinctively different aerosol environments were used, as outlined in P11. These are: 1. Marine Arctic: summertime measurements performed at Ny-Ålesund, Svalbard (P. Tunved, personal communication, 2011).
The base "true" values for all 10 input parameters of the adiabatic cloud parcel model, and the associated lower and upper (prior) limits for the four parameters to be investigated (herein termed calibration parameters), can be found in Table 1 for marine Arctic and marine average conditions and in Table 2 for rural continental and polluted continental conditions. The only difference to P11 is in the definition of the prior limits for the soluble mass fraction. In P11, as our main interest was in visualising the posedness and sensitivity of the calibration parameters, the soluble mass fraction was allowed to vary over the entire range of possible solutions, i.e. between 0.05 and 1. The main thrust of the present paper is to investigate the global parameter sensitivity for different aerosol environments. It is important that the prior limits are representative of the real atmosphere, or else the subsequently derived relative sensitivity estimates for individual parameters may be misleading. Therefore, to better portray the behavior of the soluble mass fraction in the real world, it is necessary to narrow its prior range somewhat to represent the range of possible observations. Thus, we define the prior limits for the soluble mass fraction based on the statistics available in the same literature used to define the soluble mass fraction base values (P11).

Synthetic calibration data
To benchmark our MCMC algorithm, it is useful to start the inverse modelling analysis with numerically generated cloud observations (i.e. "synthetic" calibration data) simulated using known values of the model parameters. In this study, these known values are the base ("true") values listed in Tables 1 and 2. This is important to ensure that the subsequent sensitivity analysis is not contaminated by model error or parameter nonidentifiability. The choice of the calibration data set essentially determines the posterior distribution of the parameters. More information available in the calibration data allows for more parameters to be constrained. Conversely, noisy data with poor sensitivity to the individual parameters will result in uncertainty in the posterior distribution. Hence, in such situations it will be difficult to reduce parameter uncertainty, and appropriately calibrate the adiabatic cloud parcel model. Thus, the information content of the calibration data directly determines the identifiability, uncertainty, and correlation of the adiabatic cloud parcel parameters (P11).

Table 1. Model parameter values used to generate synthetic data for marine Arctic and marine average aerosol environments (bold), as well as their respective lower and upper prior bounds used to create posterior distributions derived with DREAM. Parameters N, R and GSD denote particle number concentration, mean radius, and geometric standard deviation of the aerosol mode where an accompanying number 1 indicates the Aitken mode and number 2 the accumulation mode. Sol MF denotes the soluble mass fraction. Parameters 1-6 are held fixed at their true values during the MCMC simulation. Columns: Parameter; Lower limit, True value and Upper limit for each environment (Marine Arctic, Marine average).
Here we wish to assess the impact of the calibration parameters on the number of activated cloud droplets, and we therefore remove the interstitial aerosols from our calibration data set. The simulated droplet size distribution, output at 100 m above cloud base, is used as the calibration target.
To investigate the influence of environmental conditions on the posterior distribution and associated sensitivity of the governing adiabatic cloud parcel model parameters we synthetically generate CDNC distributions using input from four different aerosol environments (cf. Sect. 2.4). The resulting CDNC distributions are depicted in Fig. 1.

Fig. 1. The dN/dlogD_p particle size distribution generated for marine Arctic (cyan), marine average (blue), rural continental (green) and polluted continental (red) aerosol environments. Black dotted line represents location of 2 µm diameter.

Coupling adiabatic cloud parcel model to MCMC algorithm
The top part of the schematic in Fig. 2 corresponds to "the real world", in our case represented by synthetically generated data. The environmental conditions (denoted with "true input") act on the "real cloud" to produce a certain particle size distribution (dotted blue line, Fig. 2). Note that although the cloud parcel model output includes the interstitial aerosol as represented in the schematic, in this study we only include the activated droplets in the calibration data (cf. Sect. 2.4.1). The terminology "true" and "observed" response is used to differentiate between reality and respective observations of reality that are prone to measurement error and uncertainty. Our framework thus explicitly recognizes the role of measurement error.
The DREAM algorithm is now used to find those values of the adiabatic cloud parcel parameters that provide the best possible fit to the measured droplet size distribution. This results in an ensemble of parameter values that define the posterior distribution. Mathematically, the model calibration problem can be formulated as follows. Let Ỹ = φ(X, θ) = {ỹ_1, ..., ỹ_n} denote predictions of the model with observed input variables X and model parameters θ. Let Y = {y_1, ..., y_n} represent observations of the droplet size distribution (where n corresponds to the resolution, i.e. the number of size bins used in the cloud parcel model). The difference between the model-predicted and measured droplet size distribution can be represented by the residual vector E as:

$$E(\theta) = G(\tilde{Y}) - G(Y) = \{e_1(\theta), \ldots, e_n(\theta)\} \qquad (3)$$

where G(.) allows for various monotonic (such as logarithmic) transformations of the output. The inverse modelling approach now relies on the estimation of the set of input parameters θ such that E(θ) is in some sense forced to be as close to zero as possible.
We run the DREAM algorithm with the parameter bounds of the four calibration parameters listed in Tables 1 and 2, using 10 different Markov chains and 75 000 cloud parcel model evaluations. Our experience with other parameter estimation problems of similar dimension suggests that these settings are appropriate. Such a setup completes an MCMC simulation in approximately two days using a standard desktop computer.

Defining the objective function -OF (θ)
In practice, it is difficult, if not impossible, to work directly with the n-dimensional vector of residuals and find the appropriate parameter values. Instead, it is much easier to aggregate the error residuals (E(θ), see Eq. 3) into a single measure of model performance and minimize (or maximize, if appropriate) this diagnostic. Such a measure is typically called the objective function, hereafter referred to as OF(θ). Typically, we are seeking a minimum discrepancy between our model predictions and the corresponding data. The simple least squares (SLS) OF(θ) is one of the most commonly used measures in model-data synthesis studies, and is defined as

$$\mathrm{OF}(\theta) = \sum_{i=1}^{n} e_i(\theta)^2 \qquad (4)$$

where e_i(θ) is the i-th element of the residual vector E(θ) and θ signifies the vector of calibration parameters. For the cloud-aerosol inverse problem these are the input lognormal parameters describing the accumulation mode and the soluble mass fraction. The SLS approach essentially assumes that each data point has a similar measurement error. This is also referred to as homoscedasticity. Examples of measurements that typically exhibit homoscedastic errors include temperature and pressure. In this specific case, the likelihood function of Eq. (1), L(θ|Y), is directly related to the OF(θ),

$$L(\theta \mid Y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{e_i(\theta)^2}{2\sigma^2}\right] \qquad (5)$$

where σ denotes the standard deviation of the measurement error.
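As a brief sketch of how the SLS objective function and the corresponding homoscedastic Gaussian likelihood relate, both can be computed from the same residuals. The function names are illustrative assumptions, the output transformation G(.) is taken as the identity, and the likelihood is evaluated in log space for numerical stability.

```python
import math


def sls(predicted, observed):
    """Simple least squares objective: sum of squared residuals."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def gaussian_log_likelihood(predicted, observed, sigma):
    """Log of the Gaussian likelihood with one common error standard deviation.

    Equivalent to -n/2 * log(2*pi*sigma^2) - SLS / (2*sigma^2), so that
    minimising the SLS objective maximises the likelihood.
    """
    n = len(observed)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - sls(predicted, observed) / (2.0 * sigma ** 2))
```

Since the first term does not depend on the parameters, ranking parameter sets by this log-likelihood is the same as ranking them by the SLS objective.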
If the measurement error varies dynamically with the magnitude of the data, then the error residuals need to be normalized with the measurement error to ensure statistically optimal estimates of the model parameters. Real world observations of precipitation, river discharge and the cloud droplet size distribution considered herein typically exhibit heteroscedasticity. The synthetically generated model output therefore needs to be perturbed with a "measurement error" (Koda and Seinfeld, 1978) to obtain parameter uncertainty. We assume a 10 % error for each individual calibration data point, and perturb each observation artificially with this measurement error. We then use this perturbed data set as our calibration data using MCMC simulation with DREAM to derive the posterior parameter distribution, and subsequent global parameter sensitivities. The particular choice of error function used here was guided by experience with real world measurements.

Fig. 2. A schematic representation of inverse modelling. The rectangular box in the bottom panel represents the cloud-parcel model that is being used to predict the observed particle size distribution from given input data (also called forcing or boundary conditions), and some a-priori values of the model parameters. The model parameters are iteratively adjusted so that the predictions of the model (represented by the green and red solid lines) approximate as closely and consistently as possible the observed response (measured particle size distribution: blue dotted line).
The likelihood function for the heteroscedastic case is closely related to Eq. (5), but normalizes each error residual as follows,

$$L(\theta \mid Y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left[-\frac{e_i(\theta)^2}{2\sigma_i^2}\right] \qquad (6)$$

where e_i(θ) is the i-th element of the residual vector E(θ) and the measurement error variance σ_i² now explicitly depends on the actual observation. The identifiability of the calibration parameters is somewhat dependent on the definition of the OF(θ). Adiabatic cloud parcel models that employ a moving centre (MvCr) framework are particularly problematic for inverse modelling techniques as both the droplet radius and number are simultaneously changing in each run (P11).
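The perturbation of the synthetic data and the heteroscedastic likelihood described above can be sketched as follows. The interpretation of the "10 % error" as zero-mean Gaussian noise with a standard deviation equal to 10 % of each observation is an assumption of this sketch, as are the function names and the small `floor` that guards against a zero standard deviation for empty size bins.

```python
import math
import random


def perturb(observations, rel_error, rng):
    """Add Gaussian noise whose standard deviation scales with each value."""
    return [y + rng.gauss(0.0, rel_error * abs(y)) for y in observations]


def heteroscedastic_log_likelihood(predicted, observed, rel_error, floor=1e-12):
    """Gaussian log-likelihood with an observation-dependent error variance.

    Each residual is normalised by sigma_i = rel_error * |y_i| (assumed form);
    `floor` is a hypothetical safeguard for observations equal to zero.
    """
    total = 0.0
    for p, o in zip(predicted, observed):
        sigma_i = max(rel_error * abs(o), floor)
        total += (-0.5 * math.log(2.0 * math.pi * sigma_i ** 2)
                  - (p - o) ** 2 / (2.0 * sigma_i ** 2))
    return total
```

With this weighting, a 10 % misfit contributes the same penalty in a sparsely populated size bin as in the most populated one, whereas the SLS objective would be dominated by the largest values.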
For comparisons between different simulations to be meaningful, it is essential to construct a calibration data set that is constant with respect to the droplet size grid regardless of the prescribed calibration input parameters. If the OF (θ) is defined using only the raw MvCr output of the dN/dlogD p function, without any radius information, then it is in theory possible to achieve exactly the same function shape for different parameter combinations, i.e. the calibration parameters are non-identifiable.
To avoid this, a direct interpolation of the droplet size distributions is performed, so that the corresponding model predictions of the dN/dlogD_p size distribution function Ỹ = {ỹ_1, ..., ỹ_n} are interpolated to the size grid of the calibration data, Y = {y_1, ..., y_n} (Fig. 1). Unfortunately, depending on the environmental conditions (aerosol size distribution/updraft velocity), this interpolation can result in poorly defined and chaotic response surfaces (P11) and nonidentifiability problems for high dimensional setups.
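A minimal sketch of such a regridding step is given below. It assumes piecewise-linear interpolation in diameter and a fill value of zero outside the range covered by the model output; the interpolation actually used with the cloud parcel model may differ (for example, it may operate in logarithmic diameter). The function name `interpolate_to_grid` is illustrative.

```python
from bisect import bisect_right


def interpolate_to_grid(model_diam, model_dn, target_diam, fill=0.0):
    """Linearly interpolate a size distribution onto a fixed diameter grid.

    model_diam: strictly increasing bin-centre diameters from one model run.
    model_dn:   dN/dlogDp values at those diameters.
    target_diam: diameters of the calibration data grid.
    """
    result = []
    last = len(model_diam) - 1
    for d in target_diam:
        if d < model_diam[0] or d > model_diam[last]:
            result.append(fill)              # outside the modelled size range
            continue
        k = bisect_right(model_diam, d) - 1  # index of the bin centre at or below d
        if k == last:
            result.append(model_dn[last])
            continue
        weight = (d - model_diam[k]) / (model_diam[k + 1] - model_diam[k])
        result.append(model_dn[k] + weight * (model_dn[k + 1] - model_dn[k]))
    return result
```

After this step every simulation is compared with the calibration data on the same n size bins, regardless of how the moving-centre bins shifted during the run.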
In all simulations presented herein, we discard the first 80 % of the samples in each Markov chain, giving the MCMC sampler more than sufficient time to converge to the posterior distribution. The number of steps each chain requires to reach the posterior distribution (convergence) is commonly called "burn-in", and these samples are removed from the analysis (Dekker et al., 2010). In principle, we could retain all samples for which the R̂-statistic is smaller than 1.2; instead, we use only the last 20 % of our 75 000 samples. This is sufficient to obtain stable posterior statistics.
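The burn-in removal and the convergence check mentioned above can be sketched as follows. The R̂ function implements the basic Gelman-Rubin ratio for a single parameter; the exact variant used inside DREAM may differ, and both function names are hypothetical.

```python
from statistics import mean, variance


def discard_burn_in(chain, keep_fraction=0.20):
    """Keep only the last keep_fraction of the samples in a chain."""
    n_keep = round(len(chain) * keep_fraction)
    return chain[len(chain) - n_keep:]


def gelman_rubin(chains):
    """Basic Gelman-Rubin R-hat for one parameter from equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    w = mean(variance(c) for c in chains)  # mean within-chain variance
    b = n * variance(chain_means)          # between-chain variance
    var_hat = (n - 1) / n * w + b / n      # pooled variance estimate
    return (var_hat / w) ** 0.5
```

Values of R̂ close to 1 indicate that the chains sample the same distribution; the threshold of 1.2 quoted in the text would be applied to the returned value.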

Performed sensitivity simulations and analysis
In this first study using MCMC to investigate cloud-aerosol interactions we limit ourselves to investigating four parameters. Simulations and analysis will be presented for the calibration parameters deemed to be of most interest for the discussion regarding the relative importance of particle size versus chemistry. Those are the number concentration, mean radius, and geometric standard deviation of the accumulation mode aerosol as well as the soluble mass fraction (cf. Tables 1-2). The analysis is performed for four aerosol environments (Sect. 2.4).
In the following, we will:
1. Perform an initial sensitivity analysis of the calibration parameters for marine average and rural continental environments.
2. Examine the posterior parameter distributions for all four aerosol environments in order to present a more detailed sensitivity analysis whilst concurrently revealing the effects of parameter compensation within the adiabatic cloud parcel model.
3. Repeat step 2 for "lower" and "higher" updraft velocity conditions to study the effect of updraft velocity on the derived sensitivity.

Performance of MCMC algorithm
To demonstrate that DREAM successfully converges to a posterior distribution that contains the correct parameter values, please consider the blue dots in Figs. 3 and 4 that illustrate the performance of the MCMC sampler. We display the results for marine average and rural continental conditions only. The blue dots represent the convergence of the prior distribution towards the marginal posterior distribution for each parameter and correspond to when we perturbed our calibration data with a 10 % synthetic measurement error.
They illustrate the sensitivity bounds with respect to the true optimal solution, which for this synthetic study are the base parameter values documented in Tables 1, 2, as represented by the green dotted line for each calibration parameter. The convergence of the MCMC algorithm when run without perturbing with a heteroscedastic measurement error in reaching these single optimal values is illustrated by the red lines.
The range on the Y-axis of each subplot in Figs. 3 and 4 corresponds to the prior range defined in Tables 1 and 2 for marine average and rural continental conditions, within which the algorithm is allowed to search. This means that the range of the posterior distribution for a specific parameter in relation to the prior distribution (seen at function evaluation = 0) provides key information as to how sensitive the particle size distribution is to changes in a parameter (cf. Sect. 3.3.1). Since all input parameters are simultaneously optimised within this framework, a calibration parameter whose posterior distribution has a small spread about the true solution is of high importance, as there are few combinations for which it can be defined in the model input and still get a measurement output which is close to the calibration data.

[Figure caption fragment, beginning missing] ... generate the synthetic droplet size distribution. The red line represents the convergence of the DREAM algorithm when the calibration data set is not perturbed with a heteroscedastic measurement error.
To visualise the maximisation of the likelihood function with regard to the individual Markov Chains consider one parameter, the number concentration of the accumulation mode aerosol for marine average conditions (Fig. 5). In this figure the convergence of the parameter value (Fig. 5a) and the evolution of log likelihood value, log-L(θ|Y ) (Fig. 5b) for the separate Markov chains are plotted, complementing the results shown in Fig. 3a. It is clear from this figure that the convergence is fast (∼8000 samples) and the posterior distribution is stationary after ∼15 000 samples.

Initial results
In the following sections we focus on the samples stored in the posterior distribution with respect to the relative parameter sensitivity. To calculate the relative sensitivity of each individual parameter, we normalize its posterior range with the prior range and subtract this value from 1, so that a large reduction in uncertainty corresponds to a high sensitivity (cf. Sect. 4). For instance, if the prior distribution of a given parameter varies between 0 and 10, and its (marginal) posterior distribution ranges from 2 to 4, then the relative sensitivity of this parameter is 1 − 2/10 = 0.8. Sensitivities thus range between 0 (completely insensitive) and 1 (extremely sensitive, or uniquely defined). In other words, the larger the reduction in uncertainty of the posterior range of a parameter compared to its prior range, the more sensitive a parameter is. We choose to define our relative sensitivities in this way; however, we are aware that the information stored in the posterior distribution allows for alternatives (e.g. the standard deviation).
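The relative sensitivity defined above can be written as a short function. The sketch below takes the posterior range as the difference between the largest and smallest retained sample; the function name is hypothetical and the sample values are chosen only to reproduce the worked example in the text.

```python
def relative_sensitivity(posterior_samples, prior_min, prior_max):
    """Return 1 - (posterior range / prior range).

    0 means completely insensitive, 1 means uniquely defined.
    """
    posterior_range = max(posterior_samples) - min(posterior_samples)
    return 1.0 - posterior_range / (prior_max - prior_min)


# Prior from 0 to 10, posterior samples spanning 2 to 4 (illustrative values)
sensitivity = relative_sensitivity([2.0, 3.1, 4.0, 2.7], 0.0, 10.0)
```

For these values the function returns approximately 0.8, in line with the example given in the text.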
Based on the width of the posterior distribution (cf. Sect. 3.2), it is clear from Figs. 3a and 4a that for both aerosol environments the key calibration parameter for describing the CDNC distribution is the number of particles in the accumulation mode, as its posterior range is the narrowest out of all calibration parameters relative to its prior range. The CDNC distribution associated with this posterior distribution is shown in Fig. 6 for all four aerosol environments. It is clear that the solutions stored within the posterior distribution bound the calibration data set for all aerosol conditions investigated.

Atmos. Chem. Phys., 12, 2823-2847

Parameter sensitivity
We will now explore the relative sensitivity between the parameters by investigating the normalised posterior distribution for each of the calibration parameters for all four aerosol environments (Fig. 7). A larger normalised posterior range represents smaller sensitivity to a calibration parameter. It should be noted here that our normalised ranges used to infer parameter sensitivity are dependent on the prior range. It is for this reason that the prior ranges have to represent physically reasonable lower and upper limits for each parameter (cf. Sect. 2.4).
The results for marine average aerosol conditions (Fig. 7b) confirm those displayed in Fig. 3, i.e. for the adiabatic cloud parcel model used in this study the particle concentration of the accumulation mode is the most important parameter for the activation of cloud droplets. The geometric standard deviation of the accumulation mode and soluble mass fraction are least important. For marine Arctic conditions (Fig. 7a), whilst the sensitivity towards the geometric standard deviation, mean radius and soluble mass fraction is increased compared to marine conditions, the relative sensitivity between the parameters is very similar. A low relative sensitivity to chemistry in cleaner aerosol environments (fewer CCN) is intuitive; it does not matter how soluble a particle is if it does not exist. Thus, the number of particles must be, up to a certain threshold, the limiting factor in any environment for the cloud droplet nucleating ability of an aerosol population. This will be especially true for environments in which the number of available cloud condensation nuclei (CCN) is limited (P11). This is also consistent with current observations and theory for cleaner (e.g. marine) aerosol environments (e.g. Dusek et al., 2006).
For rural continental conditions, the overall picture is the same: the number of aerosol particles in the accumulation mode is still the most important parameter and the soluble mass fraction is the least important calibration parameter (Fig. 7c). However, the soluble mass fraction is now relatively more important, having approximately the same normalized parameter range as the accumulation mode mean radius. The importance of the accumulation mode number concentration is lower than for marine average conditions. The geometric standard deviation of the accumulation mode is only slightly less important for the cloud nucleating ability of particles than the mean radius. This is in agreement with the study of Anttila and Kerminen (2007), which also focussed on continental background aerosol conditions.
Moving to a yet more polluted environment (Fig. 7d), we see an increase in the importance of chemistry for describing droplet activation, with the soluble mass fraction now the most important parameter, and with the difference between the sensitivities of the lognormal aerosol parameters describing the accumulation mode, in particular the number concentration, decreasing further. These results are consistent with current theory for conditions in which the environment is polluted and the updraft is relatively low (0.3 m s^-1). For more polluted aerosol conditions the higher concentration of larger particles results in the activation of larger droplets, followed by a suppression of peak supersaturation which tends to reduce the total number of droplets activated. This allows for the soluble mass fraction to be relatively more important, in agreement with previous studies (Feingold, 2003; Lance et al., 2004; Ervens et al., 2005; Quinn et al., 2008). It is expected that at higher updraft velocities the critical supersaturation is reduced, enabling a greater fraction of the larger aerosol to activate (regardless of composition), thereby decreasing the relative sensitivity of the aerosol composition compared to aerosol size.
The evolution of the calibration parameter sensitivity from very clean (marine Arctic) to more polluted conditions is in keeping with a previous two-dimensional response surface analysis of the sensitivity between aerosol accumulation mode number and chemistry (P11). There is a clear tipping point in the relative sensitivity between chemistry (denoted by the soluble mass fraction) and the lognormal parameters describing the accumulation mode aerosol that occurs at an accumulation mode number concentration level between marine average (Fig. 7b) and rural continental conditions (Fig. 7c). This behaviour is caused by the shift from clean aerosol environments (low available CCN) to more polluted environments (higher available CCN) and the associated competition for water vapour.

[Figure caption fragment, beginning missing] The last 20 % of the samples generated with DREAM were used to derive the results. The y-axes are scaled between 0 and 1 using the prior ranges defined in Table 1 to yield normalized ranges. The blue error-bars represent the 1 %-99 % limits of the normalized posterior distribution. The blue circles are used to signify the MAP values of the calibration parameters that provide the highest likelihood to the measured (synthetic) droplet size distribution, whereas the red circles denote the true parameter values used to create the synthetic calibration data. Each grey line going from left to right through each panel is a different parameter sample from the posterior distribution.

Table 3 lists values of the derived posterior mean, minimum, maximum, coefficient of variation (CV) and MAP value of the four calibration parameters under investigation for all aerosol environments. The MAP value is simply the point in the MCMC sample for which the likelihood function, L(θ|Y), was maximized (hence the calibration parameter value that provided the best fit to the calibration data). This is because we assume a flat (uniform) prior parameter distribution. With other, non-uniform, prior distributions, the MAP is defined as the maximum of the product of the likelihood function and the prior density.
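The summary statistics and the MAP selection described above can be sketched for a single parameter as follows. The function name, the use of the sample standard deviation for the CV, and the simplification of a prior that depends on this one parameter only are assumptions made for illustration.

```python
from statistics import mean, stdev


def posterior_summary(samples, log_likelihoods, log_prior=None):
    """Summarise posterior samples of one calibration parameter.

    samples         : sampled values of the parameter
    log_likelihoods : log L(theta|Y) of the sample each value belongs to
    log_prior       : optional function returning the log prior density of a
                      value; None means a uniform prior, in which case the
                      MAP is the sample with the highest likelihood
    """
    if log_prior is None:
        scores = list(log_likelihoods)
    else:
        # Log of (likelihood * prior density)
        scores = [ll + log_prior(x) for ll, x in zip(log_likelihoods, samples)]
    best = max(range(len(samples)), key=scores.__getitem__)
    mu = mean(samples)
    return {
        "mean": mu,
        "min": min(samples),
        "max": max(samples),
        "cv": stdev(samples) / abs(mu),  # coefficient of variation
        "map": samples[best],
    }
```

With a uniform prior the `log_prior` argument is simply omitted; supplying a non-uniform prior can move the MAP away from the maximum-likelihood sample.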

Distribution of parameter values
For all aerosol environments the soluble mass fraction has the highest coefficient of variation, showing the parameter to have the highest uncertainty within the posterior parameter distribution. As indicated by the less constrained minimum and maximum ranges after optimisation for polluted conditions, more variability in the calibration parameters describing the activation of cloud droplets is possible, whilst still achieving approximately the same CDNC distribution compared to clean aerosol environments; this will be discussed further (cf. Sect. 3.5).
The MAP value is generally very close to the base values of the calibration parameter for all aerosol environments, with the MAP value of the soluble mass fraction departing furthest from the base value, e.g. for marine average conditions (Table 3; Fig. 7b) (0.75 compared to 0.90). For polluted conditions the accumulation mode number concentration MAP value is 1352 cm^-3, ∼150 cm^-3 higher than the true value. The reason for this departure from the true value can be partially ascribed to the magnitude of the calibration data. The perturbation to the synthetically generated CDNC distribution using a 10 % heteroscedastic error in Sect. 2.6 to obtain the calibration data was generally positive, resulting in it having on average a higher peak droplet number concentration than the original (error free) CDNC distribution generated using the base parameter values. Therefore, it is consistent that the MCMC algorithm tends towards a MAP accumulation mode number concentration that is higher than the base value for this parameter. This is more noticeable for polluted aerosol conditions, for which there is a reduced sensitivity (higher uncertainty) to the particle concentration as (for the current base updraft velocity) more particles remain unactivated, staying within the interstitial size regime.

Table 3. Prior ranges and true values for each environment are presented under heading "Initial Range" for: marine Arctic, marine average, rural continental and polluted continental conditions. Summary statistics of the derived final (posterior) distribution are also listed for each parameter, for which CV denotes the coefficient of variation and MAP the maximum a posteriori (MAP) values.

Parameter compensation and correlation
To explore relationships between calibration parameters further we analyze the marginal (posterior) distributions for each aerosol environment and present the results in Fig. 8.
The marginal density is the probability distribution of the variables contained in our four-dimensional inverse problem and provides us with counts of the calibration parameter values over their posterior distribution range, thus providing the shape of the posterior distribution. The marginal distributions are derived by plotting the frequency distribution of each individual parameter sampled with DREAM. In such a procedure we essentially marginalize out all the other parameters, and in probability theory and statistics we therefore refer to these histograms as marginal distributions. A marginal distribution that extends over the entire prior range is indicative of poor parameter sensitivity. On the contrary, if the histogram is well defined with narrow ranges, then this parameter is well defined, and sensitive to the calibration data. For polluted continental aerosol conditions (subplots M-P) the histograms are not well defined for the lognormal parameters describing the accumulation mode aerosol, showing significant parameter variation across the posterior range. This indicates that for these three calibration parameters there is a wide range of possible aerosol size distributions that can be considered optimal for the given environmental conditions. The spread of the posterior distribution around the modal values for these calibration parameters is generally more constrained for cleaner aerosol conditions. This indicates that for clean environments these parameters are particularly important for the accurate prediction of the droplet size distribution.
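A marginal histogram of the kind described above can be obtained by binning the sampled values of one parameter while ignoring the others. The sketch below uses equal-width bins spanning the prior range; the bin count and the function name are arbitrary choices for illustration.

```python
def marginal_histogram(samples, prior_min, prior_max, n_bins=10):
    """Count samples of one parameter in equal-width bins over the prior range."""
    width = (prior_max - prior_min) / n_bins
    counts = [0] * n_bins
    for x in samples:
        k = int((x - prior_min) / width)
        k = max(0, min(k, n_bins - 1))  # clamp so x == prior_max falls in the last bin
        counts[k] += 1
    return counts
```

Counts concentrated in a few adjacent bins correspond to a well-defined parameter, whereas counts spread over most bins correspond to a parameter that the calibration data constrain poorly.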
The shape of the marginal density distribution for all aerosol environments except marine Arctic indicates the presence of correlations between the four calibration parameters under investigation. For each of these three environments, many of the four calibration parameters depart from a normal distribution. The probability density is forced to accumulate at the parameter bounds so that the peak of the probability distribution departs from the true optimal solution, causing the marginal distributions to be skewed. This result indicates that aerosol physiochemical properties within the adiabatic cloud parcel model compensate each other to achieve the same CDNC distribution. To examine this in more detail consider Table 4 that presents correlation coefficients of the samples of the posterior parameter distribution. For each aerosol environment the calibration parameters that show significant co-variation (correlation coefficient |r| > 0.6) have been highlighted in bold.
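The correlation screening described above can be sketched with a plain Pearson correlation coefficient computed between the posterior samples of each parameter pair, flagging pairs with |r| > 0.6. The function names and the dictionary layout are hypothetical.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def strongly_correlated_pairs(samples_by_name, threshold=0.6):
    """Return (name_a, name_b, r) for every parameter pair with |r| > threshold."""
    names = list(samples_by_name)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(samples_by_name[a], samples_by_name[b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs
```

Applied to the retained posterior samples of one environment, the returned list would correspond to the entries of a correlation table that exceed the chosen threshold.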
As three of the four aerosol environments share common correlations between three sets of calibration parameters we present these three sets in the form of scatter plots for all conditions (Fig. 9). These scatter plots can potentially be used to gauge at which point within the parameter space a specific parameter used to describe the activation of cloud droplets becomes important in relation to another calibration parameter for a certain atmospheric environment. All parameter combinations in the posterior distribution shown in Fig. 9 give approximately the same cloud droplet size distribution for each aerosol environment respectively (Fig. 6).
The geometric standard deviation is positively correlated with the number concentration of particles in the accumulation mode for all four environments (Fig. 9a, d, g and j). Thus, to reach the same CDNC distribution it is necessary for both the number and geometric standard deviation to increase simultaneously. This is in agreement with previous studies; for instance, Quinn et al. (2008) reported that for a given mean particle diameter and total number concentration, increases in the geometric standard deviation lead to a decrease in the total droplet concentration, because a broader mode suppresses the supersaturation due to the presence of more larger particles.
There is a strong negative relationship between the soluble mass fraction and the number of aerosol particles (Fig. 9b, e, h and k) as well as the geometric standard deviation of the accumulation mode (Fig. 9c, f, i and l) for all aerosol environments. There is a clear shift in the linearity of the correlation as we move into polluted environments, which can be attributed to the associated increase in sensitivity of the soluble mass fraction relative to the lognormal aerosol properties describing the accumulation mode. The change in shape (width) of the correlation across the parameter space is indicative of the relationship between parameter pairs. For polluted continental conditions the relative sensitivity towards the soluble mass fraction decreases if the number or the geometric standard deviation of the accumulation mode is increased (Fig. 9k and l), evident from the increase in scatter in the posterior distribution. Thus the ability of the chemistry to compensate changes in these lognormal accumulation mode parameters in such conditions is reduced, as a larger percentage change in the soluble mass fraction is required to match the calibration data. This is in agreement with current theory that for more polluted environments the effect of a decrease in supersaturation with a larger geometric standard deviation is larger in the presence of more large particles. This analysis highlights the importance of a proper representation of the geometric standard deviation for estimating the cloud nucleating ability of particles (cf. Sect. 3.3).

In Fig. 10 the relationship between all four calibration parameters is presented. If the soluble mass fraction is reduced, the number, geometric standard deviation and mean radius of the accumulation mode must increase to achieve a very similar CDNC distribution. The correlations are less clear for marine Arctic conditions (Fig. 10a) and this can most likely be partly attributed to the very narrow CDNC distribution and the loss of information caused by an interpolation of this function to a fixed size grid (P11).

The scatter plots presented in Fig. 10 illustrate that there exists a wide range of aerosol physio-chemical properties that result in very similar modelled cloud microphysical properties. Therefore, we can surmise that for real world applications of inverse modelling of cloud-aerosol interactions, it will be necessary to obtain detailed measurements of cloud properties to ensure that different clouds can be considered "unique". In light of this, height resolved measurements, size resolved chemistry, and interstitial aerosol measurements are beneficial (P11).
In summary, the sensitivity analysis presented in Sect. 3 illustrates that the size of the aerosol particle is only "sometimes" more important than its chemical composition, highlighting the importance of accurately representing the chemical composition of aerosols in global climate models. This must be considered in future development of parameterisations used to calculate droplet number with respect to subsequent calculations of the aerosol indirect effect, thus it is paramount to estimate the importance of chemical effects for a variety of environments and meteorological conditions globally.

Effect of updraft velocity
The impact of the magnitude of the updraft velocity on the relative sensitivity of the aerosol physiochemical properties is investigated by changing the base updraft velocity value from 0.3 m s^-1 to 0.15 m s^-1 and to 0.60 m s^-1, respectively. The statistics of the posterior distribution resulting from these simulations are presented in Table 5 for all four aerosol environments. The same initial ranges were used, as defined in Table 3. It is important to ascertain the effect of updraft on the sensitivity of the parameters describing the aerosol physiochemical characteristics, as it has a strong influence on the number and size of cloud droplets formed (Brenguier and Wood, 2009). We also showed from our initial response surface analysis (P11) that the CDNC distribution was most sensitive to updraft perturbations.
To illustrate the results from all updraft simulations simultaneously, we calculate the normalized posterior parameter range (NPR) and subtract this value from 1 (cf. Sect. 3.3.1), so that a high value represents a relatively more important parameter, and plot these against the accumulation mode number concentration for each aerosol environment (Fig. 11). The relative importance of the chemistry compared to the accumulation mode radius increases for all aerosol environments when the updraft is halved (Fig. 11a). For all base updraft velocity values the importance of the soluble mass fraction is higher than that of the lognormal parameters describing the accumulation mode aerosol for polluted continental conditions, and relatively the least important for clean marine Arctic conditions. The sensitivity to all parameters is increased for the marine average aerosol environment when the updraft is doubled. This is due to the increase in updraft (keeping all other values fixed) resulting in a higher fraction of activated droplets (subsequently less interstitial aerosol remaining compared to Fig. 1). Therefore, as the updraft is not optimised during the MCMC simulation it cannot act as a limiting factor, and with the same number of aerosol particles available, smaller perturbations in the remaining parameters will be amplified, causing this clean aerosol environment to exhibit higher sensitivity to changes in aerosol physiochemical properties. The effect is not as pronounced for marine Arctic conditions when the updraft is doubled, because at an updraft of 0.3 m s^-1 there is already only a small reservoir of particles left unactivated (Fig. 1).

[Figure caption fragment, beginning missing] The last 20 % of the samples generated with DREAM were used to derive the results. The y-axes NPR label denotes the "Normalized posterior parameter range". A higher value of 1-NPR indicates a parameter having higher relative sensitivity. Thus, we present the relative sensitivity for each calibration parameter as the aerosol environment becomes more polluted. Going from left to right for each of the four aerosol environments, the x-axis corresponds to the accumulation mode aerosol concentration of marine Arctic, marine average, rural continental, and polluted continental aerosol environments, respectively.
To check this hypothesis a simple sensitivity analysis to the input parameters was performed as in P11 (figures not shown). For the higher updraft base case the simulated CDNC distribution was more sensitive to a small perturbation in these parameters compared to the low updraft case. This effect becomes weaker as the environment becomes more polluted partly due to the effect of parameter compensation (cf. Sect. 3.5). The smaller response in the relative sensitivities when halving the updraft compared to doubling it with respect to the base case (Fig. 11b) can be explained by the non-linear relationship between updraft and the accumulation mode concentration and soluble mass fraction as shown by our response surface analysis (P11), so that below a certain updraft value only small changes in the sensitivity will be observed. In summary, for low updraft the critical saturation vapour pressure is the limiting factor, whereas for high updraft conditions the non-linear physiochemical effects relating to the aerosol are limiting.

Inclusion of additional calibration parameters
In the base setup used in this paper we focus on the sensitivity of the accumulation mode aerosol and the chemistry (denoted by the soluble mass fraction), as well as the correlation between these parameters. Whilst this is a limited number of parameters, it was deemed important to keep the analysis simple for this first coupling of an MCMC algorithm to a cloud parcel model to investigate cloud-aerosol interactions.

[Figure caption fragment, beginning missing] ... Tables 1, 2 to yield normalized ranges. The blue error-bars represent the 1 %-99 % limits of the posterior distribution. The blue circles are used to signify the MAP values of the calibration parameters that provide the highest likelihood to the measured (synthetic) droplet size distribution, whereas the red circles denote the true parameter values used to create the synthetic calibration data. Each grey line going from left to right through each panel is a different parameter sample from the posterior distribution.
The results presented herein focus on those parameters considered to be most interesting with respect to the importance of aerosol size versus chemistry. However, as demonstrated in P11, the updraft was clearly the most important calibration parameter for droplet activation. As a focus of this paper was to determine the relative importance of size and chemistry for the nucleating ability of an aerosol particle, the updraft was held fixed at 0.3 m s^-1. Nevertheless, it is possible to include additional parameters in the analysis. We therefore repeat our simulations for two aerosol environments (marine average and rural continental), increasing the number of calibration parameters from four to seven by including the mass accommodation coefficient (MAC), surface tension (ST), and updraft velocity. The associated relative parameter sensitivity is presented in Fig. 12. This test allows us to examine whether there is a significant change in the relative sensitivity of the original four calibration parameters when the number of calibration parameters included in the MCMC analysis is increased.
In the absence of reliable measurements in the literature the median and prior range for the MAC and ST are defined as in P11. Thus, the median of the MAC is set to 1 with a prior range of 0.01-1, and the median of the ST to 70 mN m^-1 with a prior range of 20-75 mN m^-1. The value of the MAC is widely acknowledged to be uncertain, with an experimentally determined range of 0.01 to 1.0 (Xue and Feingold, 2004, and references therein). The ST is also a highly uncertain parameter, and the presence of organic surface tension-lowering compounds in the aerosol is acknowledged to affect cloud microphysical properties (Facchini et al., 1999; Gautam and Tyagi, 2006). The prior range of the updraft is selected to represent a wide range of meteorological conditions as in P11 (0.05-2 m s^-1).
In Fig. 12a for marine average aerosol conditions, the updraft velocity is the most important calibration parameter, slightly more important than the accumulation mode number concentration. Its importance increases relative to the accumulation mode number concentration in rural continental  Ervens et al. (2005) who examined numerous chemical/composition effects in unison and showed that due to compensation between parameters the effect of composition on total droplet number was significantly less than suggested by studies that address the effects individually. The relative sensitivity of the original four calibration parameters is slightly decreased compared to the base sensitivity results when the updraft velocity, MAC and ST are included in the analysis (compare Figs. 7 and 12). This can be explained by parameter compensation, the effects of which are stronger for the more polluted environment (cf. Sect. 3.5).
Tests were conducted (figures not shown) in which the lognormal parameters describing the smaller Aitken aerosol mode were also included as calibration parameters using the prior ranges found in P11. In P11 these parameters were shown to be non-identifiable (the droplet size distribution used as calibration data does not include the necessary information content to warrant their estimation), thus they are not crucial for accurately simulating the droplet size distribution. The results obtained when posing the inverse problem in a Bayesian framework and calculating their relative sensitivity support the response surface analysis in P11, namely that these parameters are non-identifiable (their posterior ranges extend to their prior ranges). Including these parameters did alter the relative sensitivity of the other calibration parameters due to parameter correlation. Therefore, for such synthetic studies where correlation between certain calibration parameters exists, if the number of calibration parameters included in the inverse procedure is increased substantially (e.g. including the lognormal parameters describing the Aitken mode), the calculated global sensitivity of parameters can be altered.
We also tested the effect on the non-identifiability of the lognormal parameters describing the Aitken mode by including the interstitial aerosol in the calibration data. For simplicity we applied the same 10 % heteroscedastic error function. By including this information in the calibration data the lognormal parameters describing both aerosol modes generally become more constrained. It is possible to isolate and measure the interstitial aerosol and this has been undertaken during the MASE II campaign in which a reverse-facing inlet in cloud was used to sample the interstitial aerosol (Sorooshian et al., 2010).

Discussion
The sensitivity analysis presented in Sects. 3 and 4 shows that the importance of the chemistry for the cloud nucleating ability of aerosol particles varies substantially as a function of both the accumulation mode aerosol concentration and the updraft velocity. We have probed an idealised cloud using synthetically generated CDNC distribution measurements with respect to four of the key calibration parameters of an adiabatic cloud parcel model.
The strong correlation between three of the four parameters investigated in this synthetic study provides hope for simplification of parameterisations describing droplet activation (Kivekäs et al., 2007), and this motivates applying MCMC simulation to real world observations of cloud-aerosol properties. The strong parameter correlation and compensation for all aerosol environments also highlights the need for detailed measurements of cloud properties if we wish to constrain the cloud-aerosol inverse problem using physically based cloud models. Future measurement campaigns should therefore measure the cloud microphysical properties at multiple height levels simultaneously, and include the interstitial aerosol (cf. Sect. 5) as well as size resolved chemistry. This will allow us to increase the information content stored in the calibration data, and more accurately constrain more of the calibration parameters. The parameter compensation and correlation, in particular for polluted environments, also highlights the difficulty in ascertaining the true parameter sensitivity using synthetic studies, and care should be taken when performing local sensitivity studies on aerosol parameters as their effects on the droplet nucleating ability are highly non-linear. Thus, the effects of parameter correlation and interaction justify the use of statistical approaches such as MCMC simulation for investigating cloud-aerosol interactions with respect to parametric uncertainty and cloud model structural accuracy.
Table 5. Summary statistics of the derived final (posterior) distribution are presented for MCMC simulations using lower base updraft velocity (updraft velocity = 0.15 m s−1) and higher base updraft velocity (updraft velocity = 0.60 m s−1) for: marine Arctic, marine average, rural continental and polluted continental aerosol conditions. CV denotes the coefficient of variation and MAP the maximum a posteriori values.
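The two summary statistics named in the Table 5 caption can be computed from a posterior sample as sketched below. This is an illustrative sketch only: it assumes that the CV is the sample standard deviation divided by the sample mean, and that the MAP value is taken as the draw with the highest posterior density. The draws and log-posterior values are hypothetical and are not taken from Table 5.

```python
import statistics

def coefficient_of_variation(draws):
    """CV: sample standard deviation divided by the sample mean."""
    return statistics.stdev(draws) / statistics.mean(draws)

def map_estimate(draws, log_posteriors):
    """MAP: the posterior draw with the highest (log) posterior density."""
    best = max(range(len(draws)), key=lambda i: log_posteriors[i])
    return draws[best]

# Hypothetical draws of one parameter and their log-posterior densities.
draws = [0.28, 0.30, 0.31, 0.29, 0.32]
log_posteriors = [-3.1, -1.2, -1.9, -2.4, -2.8]

cv = coefficient_of_variation(draws)
map_value = map_estimate(draws, log_posteriors)
```

A small CV indicates a parameter that is tightly constrained relative to its mean, which is the sense in which the CV complements the posterior range.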
When the updraft velocity was included as a calibration parameter (cf. Sect. 5) it was shown to be a very important parameter for the accurate simulation of the CDNC distribution, as would be expected. As it is currently considered both difficult to measure and highly variable (Lance et al., 2004), the results shown with MCMC highlight the importance of constraining the uncertainty of its measured value in future cloud-aerosol measurement campaigns. It has also been shown that for clean aerosol conditions the fraction of aerosols activated to droplets is a weak function of vertical velocity, and a much stronger function of vertical velocity when aerosol concentrations are typical of polluted environments (Snider and Brenguier, 2000). To investigate this, further analysis is required in which the updraft velocity is included as a calibration parameter for polluted continental aerosol conditions, which was beyond the scope of this study.
When the MAC and ST were included in the analysis they were found to be unimportant, indicating that constraining their measured values by improved instrumentation would not significantly improve the accuracy of the simulated droplet size distribution with respect to measurements for the cloud model setup used herein. For instance, for the MAC in the two environments investigated, regardless of how much the prior range is constrained, the relative sensitivity (1 − normalized posterior range) will always be close to zero.
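The relative sensitivity measure quoted above can be illustrated with a short sketch. It assumes that the posterior range is taken as the full spread (maximum minus minimum) of the posterior draws and is normalized by the width of the prior range; the function name and the example values are hypothetical.

```python
def relative_sensitivity(posterior_draws, prior_min, prior_max):
    """Return 1 - (posterior range / prior range).

    Values near 0: the posterior fills the prior (unimportant parameter).
    Values near 1: the posterior is narrow relative to the prior.
    """
    posterior_range = max(posterior_draws) - min(posterior_draws)
    return 1.0 - posterior_range / (prior_max - prior_min)

# Hypothetical unimportant parameter: the posterior spans almost the whole prior.
unimportant = relative_sensitivity([0.02, 0.35, 0.61, 0.98], 0.0, 1.0)

# Hypothetical well-constrained parameter: a narrow posterior inside the same prior.
important = relative_sensitivity([0.45, 0.48, 0.52, 0.55], 0.0, 1.0)
```

Because the posterior of an unimportant parameter widens or narrows together with its prior, the ratio stays close to one and the sensitivity close to zero, whichever prior range is chosen.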
This illustrates a further benefit of MCMC when developing droplet activation parameterisations: the information stored in the posterior distribution can be used to identify unimportant parameters and thus reduce the number of input parameters required. However, this must also be investigated using real world droplet size distribution measurements as calibration data rather than synthetically generated observations from the adiabatic cloud parcel model.
The particular choice of measurement error used to perturb the model output in the setup (in this case assumed to be 10 %) can potentially influence the resulting parameter sensitivities. A comprehensive evaluation of the effect of changes in the assumed measurement error on the derived parameter sensitivities is beyond the scope of this work and will be more appropriately dealt with when real world measurements are used. Nevertheless, a simple test of the sensitivity was conducted whereby the measurement error was increased from 10 % to 20 % for the marine average case (figures not included). Although the absolute sensitivities of the parameters decreased somewhat with increasing measurement error (which can be attributed to a higher spread of probability mass over the parameter space, hence larger parameter uncertainty), the relative importance of the parameters remained relatively unchanged.
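The effect of the assumed measurement error can be made concrete with a small sketch. It assumes independent Gaussian errors whose standard deviation is a fixed fraction of each observed value (a heteroscedastic error); the likelihood form, the function names and the droplet concentrations per size bin are assumptions made for illustration and are not taken from the model setup.

```python
import math
import random

def perturb(model_output, rel_error, rng):
    """Add zero-mean Gaussian noise with sigma proportional to each value."""
    return [y + rng.gauss(0.0, rel_error * y) for y in model_output]

def log_likelihood(simulated, observed, rel_error):
    """Gaussian log-likelihood with sigma_i = rel_error * observed_i."""
    total = 0.0
    for sim, obs in zip(simulated, observed):
        sigma = rel_error * obs
        total += -0.5 * math.log(2.0 * math.pi * sigma ** 2)
        total += -0.5 * ((sim - obs) / sigma) ** 2
    return total

rng = random.Random(42)
truth = [120.0, 85.0, 40.0, 10.0]     # hypothetical droplet concentrations per size bin
observed = perturb(truth, 0.10, rng)  # synthetic observations with 10 % error
biased = [1.1 * o for o in observed]  # a simulation that is 10 % too high in every bin

def penalty(rel_error):
    """Log-likelihood lost by the biased simulation relative to a perfect fit."""
    perfect = log_likelihood(observed, observed, rel_error)
    return perfect - log_likelihood(biased, observed, rel_error)

penalty_10 = penalty(0.10)  # approximately 2.0
penalty_20 = penalty(0.20)  # approximately 0.5
```

Doubling the assumed error reduces the penalty for the same misfit by a factor of four, so a wider region of parameter space remains plausible. This is consistent with the lower absolute sensitivities found at 20 % error.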

Conclusion
In this study, we have coupled a state-of-the-art MCMC algorithm to an adiabatic cloud parcel model. By using synthetically generated observations for marine Arctic, marine average, rural continental and polluted continental conditions, we have shown that the MCMC algorithm provides an efficient means of calculating the global sensitivity of key input parameters for describing the development of a CDNC population in an adiabatic cloud parcel model.
The most important merits of the approach adopted are:
- MCMC algorithms can successfully be coupled with adiabatic cloud parcel models. This framework opens up new ways forward to investigate cloud-aerosol interactions.
- It is possible to simultaneously quantify global parameter sensitivity and investigate the structure of the cloud model in relation to its input parameters. This framework results in a high level of transparency with respect to statistical inference of parameter uncertainty and correlation, and assessment of model prediction uncertainty ranges.
Considerations to be taken into account when applying inverse modelling to cloud-aerosol interactions:
- The parameter sensitivity results presented herein are dependent on the choice of the calibration data set, the likelihood (objective) function used, and the number of calibration parameters investigated.
- The ability of the DREAM MCMC algorithm to search the entire parameter space significantly reduces the chance of getting stuck in local optima. Yet, population based search and optimization algorithms may pose computational challenges, particularly when the (cloud) model under investigation requires significant time to run and produce the desired output.
- Future studies can benefit from the ability of the DREAM MCMC algorithm to be run in parallel, and distributed computing opens up new possibilities for solving complex and computationally demanding parameter estimation problems related to cloud-aerosol interactions.
- To inspire confidence in the MCMC inverse modelling approach, a successful demonstration using real rather than synthetic measurements is required. This is a prerequisite to accurately predict cloud-aerosol interactions across a range of spatial scales.
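The population based search mentioned in the considerations above can be illustrated with a differential-evolution Metropolis (DE-MC) sampler, the simpler multi-chain scheme on which DREAM builds. The sketch below is not the DREAM algorithm used in this study: it omits DREAM's subspace sampling and outlier handling, and the cloud parcel model is replaced by a hypothetical two-dimensional Gaussian target.

```python
import math
import random

def de_mc(log_post, init_chains, n_iter, rng, jitter=1e-6):
    """Differential-evolution Metropolis sampler (DE-MC), not full DREAM.

    Each chain proposes a jump along the difference of two other chains,
    so the proposal scale adapts to the spread of the population.
    """
    chains = [list(c) for c in init_chains]
    n_chains, dim = len(chains), len(chains[0])
    gamma = 2.38 / math.sqrt(2 * dim)  # standard DE-MC jump rate
    logp = [log_post(c) for c in chains]
    draws = []
    for _ in range(n_iter):
        for i in range(n_chains):
            # Pick two distinct partner chains, both different from chain i.
            a, b = rng.sample([j for j in range(n_chains) if j != i], 2)
            proposal = [
                chains[i][k] + gamma * (chains[a][k] - chains[b][k])
                + rng.gauss(0.0, jitter)
                for k in range(dim)
            ]
            lp = log_post(proposal)
            # Metropolis acceptance rule, evaluated from the log-posterior difference.
            if rng.random() < math.exp(min(0.0, lp - logp[i])):
                chains[i], logp[i] = proposal, lp
            draws.append(list(chains[i]))
    return draws

def toy_log_post(theta):
    """Hypothetical stand-in for the cloud model posterior: 2-D standard normal."""
    return -0.5 * sum(t * t for t in theta)

rng = random.Random(1)
start = [[rng.uniform(-3.0, 3.0) for _ in range(2)] for _ in range(8)]
draws = de_mc(toy_log_post, start, n_iter=2000, rng=rng)
kept = draws[len(draws) // 2:]  # discard the first half as burn-in
mean_0 = sum(d[0] for d in kept) / len(kept)
var_0 = sum((d[0] - mean_0) ** 2 for d in kept) / (len(kept) - 1)
```

In this sketch the chains are updated one after another. With an expensive cloud model, the `log_post` evaluations of the different chains are the natural unit of work to distribute across processors.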
We found strong correlations between certain input parameters, for example, the solubility versus the number and geometric standard deviation of the accumulation mode aerosol in polluted aerosol environments. In light of this it is crucial to improve our knowledge of the physical upper and lower limits of aerosol physio-chemical properties in the real atmosphere by performing more measurements of these parameters both spatially and temporally. This will ensure greater confidence in subsequently derived global parameter sensitivity using MCMC methods. The applied algorithm shows that for marine Arctic and marine average aerosol conditions the aerosol particle concentration and mean radius of the accumulation mode are the most important parameters when simulating the CDNC distribution, whereas the chemical composition is the least important. However, for the present updraft applied (0.3 m s−1), in more polluted environments (aerosol concentration of the accumulation mode >400 cm−3) the relative importance of the soluble mass fraction increases considerably. In polluted conditions (aerosol concentration of the accumulation mode >1000 cm−3) chemistry dominates the lognormal aerosol parameters describing the accumulation mode.
Whilst these main conclusions mostly confirm those obtained by previous studies, the method presented considers and displays a number of important findings in an integrative way, providing a clear way to deconstruct complex cloud-aerosol interactions into a simple form.
Atmos. Chem. Phys., 12, 2823-2847, 2012 www.atmos-chem-phys.net/12/2823/2012/
The results presented here are not derived using real-world cloud data, the findings so far being limited to synthetic cases only. In a related study we will investigate cloud-aerosol interactions in an inverse framework using real measurements from the Marine Stratus/Stratocumulus Experiment (MASE II) campaign to investigate aerosol-CDNC distribution closure. This will allow a more detailed investigation of the structure of the adiabatic cloud parcel model for the accurate simulation of stratocumulus clouds. We will also assess the global parameter sensitivity compared to the results presented herein using synthetic data for marine aerosol conditions.