Detection of particle layers in backscatter profiles: application to Antarctic lidar measurements

A detection method is proposed and studied to infer the presence of hidden signals in a statistical way. It is applied here to the detection of Polar Stratospheric Cloud (PSC) layers in lidar backscatter profiles measured over the Dumont D’Urville station (Antarctica). PSCs appear as layers with enhanced variance in non stationary, heteroscedastic signal profiles, between two unknown altitudes to be estimated. The method is based on a three step algorithm. The first step is the stationarization of the signal, the second performs the maximum likelihoods estimation of the signal (PSC altitude range and variance inside and outside the PSC layer). The last step uses a Fisher-Snédécor test to decide whether the detection of PSC layer is statistically significant. Performances and robustness of the method are tested on simulated data with given statistical properties. Bias and detection limit are estimated. The method is then applied to lidar backscatter profiles measured in 2008. No PSC are detected during seasons when PSCs are not expected to form. As expected, PSC layers are detected during the austral winter and early spring. The effect of time averaging of the profiles is investigated. The best compromise for detection of PSC layers in lidar backscatter profiles acquired at Dumont D’Urville is a time averaging window of 1 h typically.


Introduction
During winter, the low temperatures prevailing in the polar regions in the lower stratosphere lead to the formation of clouds, called Polar Stratospheric Clouds (PSCs) between 12 and 30 km. PSCs play a key role in the formation of the so-called ozone hole over Antarctica at the beginning of spring. PSCs provide reactive surfaces for heterogeneous chemical reactions that quickly convert halogen reservoir species into ozone-destroying radicals (see for example WMO, 2007 and for more details). PSCs may also play a significant role in the radiative balance of the atmosphere, as suggested in Sloan and Pollard (1998) or in Lachlan-Cope et al. (2009). For theses reasons, a long term increase in PSCs can affect polar stratospheric ozone or even the climate. One of the most sensitive instrument to PSC layers is the lidar (LIght Detection And Ranging). Note however that, although there are several long lidar time series available, homogeneous times series of lidar-based PSCs detections remain scarce which is why there is a need for systematic, reliable and simple methods to extract PSC signals from lidar profiles time series (David et al., 2010).
Several types of PSC have been identified and are usually distinguished according to their optical properties. The optical properties depend on PSCs size distribution, state and composition that are quite variable. As the crucial parameter in the processes of formation and evaporation of PSCs is the temperature, the temperature evolution mostly determines changes in PSC composition, phase and size distribution. PSCs can be liquid or solid, composed of nitric acidrich mixtures or ice and have typical sizes of approximatively a micron. The following references give an overview of the different types of PSC: Rosen et al. (1975), Voigt et al. (2000) and Tabazadeh et al. (1994).
Lidar is a widely used remote instrument technique to detect PSCs. Lidars are widely used in PSC studies (Adriani et al., 2004, Iwasaka et al., 1986, Fiocco et al., 1992or WMO, 1999. Lidar measurements consist of very short pulses of focused light, illuminating the overhead atmospheric column, with a relatively small divergence. The returning photons are collected and converted into an electrical signal. The time elapsed between the emitted laser pulse and the scattered returned signal is proportional to the altitude over which the scattering occurred. The intensity of the returned signal depends on the nature and concentration of the scatterers, Bohren and Huffman (1983), Measures (1984) and SPARC (2010). PSC detection is important for studies of the chemistry and dynamics of the polar stratosphere. It also allows to identify PSC-free profiles that can be used for modelling stratospheric profiles where only sulphuric acid aerosols particles are present in the lower stratosphere (i.e. profiles without PSC layer, see Sing Wong et al., 2009 andAdriani et al., 1999) or clear-sky profiles for lidar calibration (Platt, 1979).
The large amount of data (several thousand lidar profiles per year) makes it difficult to identify in a reliable and objective way the presence of PSC layers on every profile without a systematic and robust detection method. The purpose of the present work is the development and testing of a PSC detection algorithm in lidar profiles. Several detection methods have been tested in the literature, for example, Chang and Zhang (2007) approach focuses on the detection of a single long lasting variance shift detection, and Gumedze et al. (2010) worked on outliers detection. Even if they are strongly related, these two studies do not deal with the detection problem in the same way as the present method where transient variance shifts (i.e. short lasting variance shifts) are studied. In addition, some studies still do not pay attention to stationarity properties of the signal. The assumption of stationarity means that the distribution of the signal does not change with altitude in a lidar profile (or, more specifically, homoscedasticity indicates that the variance of the signal remains constant with altitude). In other words, this property assumes that whatever the altitude, the signal has to follow a constant probability distribution with constant parameters. The characterisation of statistical properties of the signal is necessary and required statistical tests because the lack of stationarity precludes in principle statistical calculations of interest (as theoretically introduced in Goldfarb and Pardoux, 2007). For example, the mean or variance of a sample is meaningful only if the assumption of stationarity can be previously confirmed. Methods to stationarize signals exist and have been studied in Goldfarb and Pardoux (2007) or Bourbonnais and Terraza (2004). Other methods rely on wavelet approaches and the use of arbitrary thresholds to discriminate whether a detected signal is significant or not (e.g. Morille et al., 2007or Berthier et al., 2008. Although this last wavelet-based approach gives good results on detecting PSC layers, it is limited by the fact that it does not allow to give a confidence interval on the parameters of the detected signal (e.g. amplitude, top and bottom altitudes ...). Finally, other methods require the a-priori knowledge of the optical prop-erties of the scatterers (see the work of Chazette et al., 2001), which are not known in our case. The current study proposes a new statistical method to systematically detect PSC layers in a lidar profile by testing only the profile, assuming no other information is available. The method is based on the fact that the variance of a backscatter profile is locally affected by the presence of PSC layers. PSCs are identified here in lidar profiles as a transient increase in the variance (an increase which is localized between a bottom and a top altitude) of the signal with an automated procedure that does not require the use of visual or ad-hoc threshold selection and allows to calculate the confidence interval of the parameters of the detected signal.
The paper is organized as follows. Section 2 briefly describes the lidar data we used. The detection procedure is explained in Sect. 3, introducing the different statistical characteristics of the lidar data. Section 4 presents and discusses the results on the application of the detection procedure to a large lidar data set. The last section is devoted to other possible applications of this detection method and concluding remarks.

Lidar data
The international Network for the Atmospheric Composition Changes (NDACC) is composed of worldwide remotesensing stations monitoring the physical and chemical parameters of the atmosphere. The current study is focused on lidar data collected at the Dumont d'Urville (hereafter refered as DDU, 66 • 39 ′ 46 ′′ S 140 • 0 ′ 5 ′′ E) station in Antarctica. The lidar initially installed in 1989, provides vertical backscatter profiles of the atmosphere from several meters above the instrument to 30-35 km, with a 5 min time integration. About 100-140 nights of observations are performed per year.
The retrieval process and necessary assumptions in processing lidar data from DDU are explained in details in Chazette et al. (1995) and David et al. (1998). Instrumental concerns on the DDU lidar can be found for example in Stefanutti et al. (1992) and in David et al. (1998). These measurements provide backscatter aerosols profiles which can contain indications of the presence of PSCs over Antarctica. The vertical resolution of the profiles is 60 meters. Since PSCs form between 12 and 30 km approximately, the detection procedure is applied on the altitude range between 8 and 35 km only, giving 360 data points per lidar profiles. The equation relating the received backscattered signal intensity P (z) from a given z altitude, involving the extinction from the air column and particles ranging from the lidar ground level to the backscattering z altitude is given by, where P (z) is typically the lidar power incident on receiver from z (typically a flux photons: number of photons per unit time and unit surface), F 0 is the laser pulse energy, β(z) is the total aerosol and molecular backscatter coefficient, Kencompasses the various instrumental constants (including area of the lidar receiver), z 0 is the altitude where the instrument is located and α(z) is the total extinction coefficient (molecules + particles). In particular, the presence of clouds layers modify the scattering and extinction properties along the optical path of the laser beam. The resolution of this equation is widely discussed in literature (see for example David et al., 1998, Collis and Russell, 1976, Fierli et al., 2001and David et al., 2005. This gives rise to both theoretical and instrumental issues. Fernald et al. (1972), Klett (1981) and identified a first order Bernouilli differential equation and stated on the formalism of its solution. The critical assumption is the a priori knowledge of the ratio between extinction and backscattering, the so-called lidar ratio. The values of this ratio depend on the particle type, being either aerosols, cirruses, or PSCs. With known lidar ratios, an objectivity issue still remains in the selection of the altitude ranges separating the different particle types along any lidar profile. This step has to use quantifiable and objective criteria to ensure the reliability of lidar time series. This is the substance of the present paper.

A procedure to detect PSCs
An example of a cloud-free profile is displayed in the top left hand corner of Fig. 1, this profile was measured on 17 April 2008 over the DDU station. Typically, the backscattered signal decreases sharply with the increasing altitude between 8 and 35 km, due to the decrease of the molecular density. Every backscatter profile exhibits an interesting statistical feature: the variance (calculated from the difference between the raw and smoothed profiles) is never constant, and varies with altitude (see panel b of Fig. 1). A signal with varying mean and/or variance is called a heteroscedastic signal. Most of the cloud-free (i.e. background) variance originates from instrumental noise and, possibly, some natural shortterm variability of the atmosphere.
The presence of a PSC layer in a profile (panel d of Fig. 1, profile measured on 23 August 2008) generates a local increase in the variance, as illustrated in the panel 1-e which shows the same profile as in 1-b after removing the smoothed profile (i.e. the low frequency component of the signal; thereafter referred as smoothed signal or trend). The lower altitude of 8 km was chosen to prevent including high-altitude cirrus clouds in the variance estimation.
Our procedure of detection is based on these three characteristics (i.e. the trend, the decreasing variance and the transient variance break) and requires three steps in the signal processing. The first step is the stationarization of the signal. That means removing the trend and controlling the variance. In the second step, we proceed to the maximum likelihood estimation of the parameters of model (2) (see Appendix A for details), and then estimate the more likely altitude range of a PSC layer. The last step uses a Fisher-Snédécor test to decide whether the detection of PSC is statistically significant.
Based on the characteristics of the lidar backscatter profiles described previously, the raw signal P raw is modelled with a combination of signals including random variables P raw = P trend + P cloud + P back (2) where P trend describes the trend of the signal (low frequency component of the signal). P cloud describes the signal fluctuations generated by the PSC; this PSC signal is null except between two boundaries, the top and bottom altitudes of the PSC layer, where it is modelled with a zero-mean Gaussian variable whose distribution is usually denoted by, N (0, σ 2 cloud ) with 0 being the mean and σ 2 cloud being the variance. Finally P back describes the heteroscedastic (i.e. variance is not constant) background signal which is modelled with a zero-mean Gaussian variable whose distribution is denoted by, N (0, σ 2 back ); σ 2 back is the altitude-dependent background variance which is found to decrease approximately linearly with increasing altitude (Fig. 1b). P cloud and P back are assumed to be independent.

Stationarization procedure
As explained above, a backscatter profile is obviously not stationary (i.e. its distribution is not constant along the altitude). The stationarization procedure described here tends to remove the trend and make the variance of the remaining signal constant with altitude. The smoothing of the signal P trend is carried out using a centred moving average filter of vertical length p with p being the number of points of averaging window. Once the trend is estimated, it is subtracted from the raw signal to generate a zero-mean signal P hf given by, The residuals P hf are the high-frequency component of the signal. They are heteroscedastic and so P hf is non-stationary. However, an empirical analysis of P hf in a large number of our backscatter profiles and the confirmation on literature (e.g. Liu et al., 2006) show that the raw lidar signal P raw follows a Poisson distribution. That means that a proportional one-to-one relationship exists between the mean of the signal and its variance. So that the altitude dependency of the variance (here denoted σ back ) can be accurately reproduced by the previously estimated trend P trend ; this parametrization of the variance allows us to remove the altitude dependency of the variance in P hf in order to generate a stationary signal (i.e. the variance is now constant with altitude).
It is worth pointing out that, over the cloud altitude range, the total variance is expected to be higher because it will be the sum of the background variance σ 2 back and of the cloud variance σ 2 cloud . After estimating the constants a and b using a common least square fitting approach in the altitude range where the PSC layer are known not to appear (below 12 km and above 30 km), the final step to stationarize the signal is to divide P hf by its own standard deviation σ back . This step is similar to an altitude-dependant normalisation and can be expressed as P * is homoscedastic and is unitless whereas P raw has units of power. The exponent * is always used here to refer to quantities derived from the stationarized signal P * (generated by the altitude-dependent normalisation given by Eq. (4)). Once the signal is stationarized, the resulting distributions of P * can be considered as independent and identically distributed, and it remains constant over the cloud-free altitude ranges (see panel c of Fig. 1). The analysis of a large number of backscatter profiles indicates that the distribution of the stationarized signal P * can be assumed to be Gaussian (zero-mean and variance equal to σ * 2 ). Figure 2 shows the gaussian behaviour of the P *signal. The upper top panel represents the distribution of a stationarized PSC free lidar profile (black circles) compared to a gaussian distribution (red line), whereas the bottom panel represents the stationarization of a profile with a PSC layer (the two graphics represent the distribution inside and outside the PSC layer). The variance σ * 2 depends on the considered region (either inside or outside the cloud layer). Outside the PSC layer, the distribution is denoted by N (0, σ * 2 out ), i.e. σ * 2 = σ * 2 out . The signal P * displays a higher variability within a PSC layer (see Fig. 1f) and the distribution of P * within a PSC layer is denoted by N (0, σ * 2 in ), i.e. σ * 2 = σ * 2 in . When analysing the results, it must be kept in mind that σ 2 back refers to the variance of P hf , the high-frequency component of the backscatter profile, whereas σ * 2 , σ * 2 in and σ * 2 out refer to the variance of P * , the stationarized P hf . When there is no PSC, the variances σ * 2 , σ * 2 in and σ * 2 out are equal (as in panel c of Fig. 1).
The entire previous procedure is illustrated in Fig. 1 for a cloud-free profile measured on 17 April 2008 and for a profile where a PSC layer appears between 18 and 21.5 km on 23 August 2008. The three panels on the top of Fig. 1 correspond to the cloud-free profile monitored on 17 April 2008: the panels 1a and 1b show the raw profile P raw and the variance of P hf (= raw profile -smoothed profile) respectively. Panel 1c shows the stationarized profile P * resulting from the three-step processing described above. The profile P * appears as a somewhat constantly distributed signal over the cloud-free altitude ranges, while, in the case of a PSC layer (the three bottom panels), the variance sharply increases between the two cloud boundaries that have to be estimated.

PSC parameters estimation by likelihood maximisation
This section explains the likelihood maximisation procedure on the signal P * in order to determine the most likely altitude range of a possible PSC layer. The previous procedure allows to assume now that the signal P * is stationary. This means that its distribution is constant inside and outside the hypothetical PSC layer, and can be equal when there is no PSC layer. This assumption is necessary to develop the following calculation. The M 0 -model (Eq. 5) assumes the profile does not contain a PSC. Conversely, the alternative M 1 -model (Eq. 6) assumes there is a PSC somewhere in the profile between two altitudes τ b and τ t , to be estimated representing respectively the bottom and top altitude of the PSC layer.
Thanks to the stationarisation procedure, the signal P * is now assumed to be an independent and identically distributed (iid) Gaussian with a higher variance within the PSC layer. The two models are presented by, M 0 : P * variance denoted by σ * 2 out does not vary with altitude, with the index "out" referring to the domain outside the PSC layer and "in" referring to the domain inside the PSC layer. Model M 0 is nested in M 1 (by considering σ * 2 in = σ * 2 out ). In this case the two altitudes τ b and τ t still exist but do not have any influence on signal P * .
The underlying likelihood of model M 1 following Eq. (6) is given by, where σ * out , σ * in , τ b and τ t are the parameters that need to be estimated, and n is the number of altitude range.
The details of the calculation giving Eq. (8) are given in Appendix A. This maximisation of Eq. (8) has to be done under the constraint that the bottom altitude of the PSC layer has to be lower than the top altitude and that these two altitudes have to be found within certain boundaries (i.e. the bottom altitude is above 12 km and the top altitude is below 30 km). The final constraint is that the variance of the signal within the cloud layer (σ * in ) has to be higher or equal to the variance of the cloud-free domain (σ * out ), or, more precisely, that the two variances have to be equal when there is no PSC. Overall the maximisation under constraints can be expressed by There are a number of difficulties in solving (Eq. 8) (likelihood L not continuous with respect to τ b and τ t (see 8), taking into account the constraints, the number of parameters). However, a recursive scheme has been implemented. Instead of having the 4 parameters (σ * out , σ * in , τ b and τ t ) as control variables in this maximisation problem with constraints, L is only maximised with respect to τ b and τ t using as σ * out and σ * in as fixed parameters that have been estimated previously. Then, once L is maximised, the corresponding values of τ b and τ t are used to recalculate σ * out and σ * in which are in turn used in a new resolution of (Eq. 8). At the end of each iteration, the values of τ b and τ t estimated by the resolution of (Eq. 8) are compared to the values of τ b and τ t estimated in the previous iteration and used to calculate σ * out and σ * in (inputs to the resolution of (Eq. 8)). As long as the input and estimated values of τ b and τ t are significantly different, this procedure is repeated. It is found to converge after fewer than 5 iterations in most cases.
The estimation of the variances is performed using the definition of the empirical variance (see Sprinthall, 2009) by splitting the signal in two intervals. The first interval corresponds to the cloud-free domain [z 1 , τ b [∪]τ t , z n ]. The second one corresponds to the PSC domain [τ b , τ t ]. The respective variances of these intervals (i.e. inside and outside) are given by, where τ t and τ b are expressed in units of number of datapoints in the vertical profile instead of km with 8 km being the origin. These two estimates correspond to the values of σ * out and σ * in which maximize Eq. (8), when considering τ t and τ b as constant.
The first estimatesσ * out andσ * in (used as inputs in the first resolution of Eq. (8)) are calculated assuming that the cloudfree altitude ranges cover below 12 km and above 30 km because PSCs are usually not observed at those altitudes. This choice of altitude ranges is rather arbitrary. Nonetheless, it has no influence on the final estimation because the iteration procedure recalculates recursively the cloud and cloud-free altitude ranges. After a few iterations, the estimates ofσ * 2 out , σ * 2 in ,τ b andτ t do not change anymore. Further investigations on the robustness of the estimation are discussed in part 3.4.
As the cloud altitude range corresponds to discrete values (vertical resolution of 60 m), the maximisation of L with respect to τ b and τ t be computed numerically. It is not necessary to calculate the entire n×n matrix, with n being the total number of discrete altitudes. First, the constraint (8b) τ b ≤ τ t means that only half the calculation of the matrix is needed. Second, the fact that PSCs form between 12 km and 30 km further limits the calculations to τ b > 12 km and τ t < 30 km. An example of matrix (L as a function of τ b and τ t ) is provided in Fig. 4.
Several methods were tested to estimate τ b and τ t . As an example of the tested methods, a raw maximisation of the ratio between the two variances (using the empirical forms of the variances) appeared to be too sensitive to outliers, and led to detect too thin PSC layers. The selected method was inspired by maximum likelihood methods and dynamic programming proposed in . The maximum of L from Eq. (8) appears to be well suited to our parameters estimation problem; The method for solving equation is successful for both simulated and real data. The method using the raw variances ratio is too sensitive to outliers. In Eq. (8), the presence of (τ t − τ b ) log σ * out σ * in reduces the influence of outliers by giving a higher weight to wide layer (i.e. L increases when the distance τ b − τ t increases).

Statistical significance of the parameters estimation by a transient shift test
Once convergence is achieved and that the residuals are found to be independent and to follow a gaussian distribution (i.e. N (0, σ back )), the maximum likelihood algorithm provides estimates of the parameters (cloud altitude range and variances over the cloud and cloud-free domains), assuming there is a PSC layer. However, it does not check the likelihood of the existence of the PSC layer. Now it is time to test the statistical significance of the PSC detection as defined by these parameters: (τ b andτ t ) representing the best estimates of the bottom and top altitudes of a hypothetic PSC andσ * 2 out andσ * 2 in representing the best estimates of the variances in the interval [z 1 , τ b [∪]τ t , z n ] and in the interval [τ b , τ t ] respectively. A test is needed to rule whether the detection of a PSC layer is statistically significant.
The two-hypothesis model can be reduced to the problem to know whetherσ * 2 out =σ * 2 in orσ * 2 in >σ * 2 out , or similarly to know if, statistically, the variability inside and outside the PSC can be considered as equal or if the variability is statistically significantly higher in the inside interval than the one in the outside interval. This last case would indicate the presence of a PSC.
A fisher-Snédécor test handles this problem by considering the ratio of the squared variances of each samples (see Mood, 1974). The ratio allows to test the equality of the variance of two independent samples. Two samples are created from the values of P * split in the two different intervals with the test taking into account the different sizes of the two samples. The ratio is then given by, where, according to Eq. (9),σ * 2 in andσ * 2 out both follow a χ 2 n i −1 -distribution (i.e. the chi-square distribution being the sum of weighted squared gaussian distributed variables, see Sprinthall, 2009), and where n 1 being the sample size of the inside interval and n 2 the sample size of the outside interval. This implies that F follows a Fisher distribution with (n 1 − 1, n 2 −1) degree of freedom. As commonly done in statistics, the decision is made using a fixed confidence rate of 97%. This test ultimately decides on the existence of a PSC layer.

Estimation of bias and detection limit using simulated data
The purpose of this section is to evaluate the performances of the detection algorithm on perfectly characterized data that are generated numerically. In such a configuration, one can assess the ability of the algorithm to detect and quantify apriori known signals in the profiles. The characteristics are chosen such that they mimic typical characteristics of lidar profiles. The aims of this type of numerical experiment are, for instance, to identify possible biases and estimate a detection limit of PSCs. Non-stationary signals are first simulated numerically. Signals representative of average background backscatter profiles are generated by combining a smoothed profile average backscatter profile and a heteroscedastic (i.e. altitude dependent) Poisson noise (P(σ 2 back )), where σ 2 back is proportional to the smoothed profile value and so is also dependent to z, for z ∈ [1,360] with z expressed in units of number of points in the vertical profile (8 km corresponding to the origin). Then, between two altitudes, corresponding to the bottom and the top altitudes of a PSC layer, another Gaussian noise with a greater variance (=N (0, σ * 2 in )) is added to the background profiles. An example of profile simulated by adding a cloud variance σ * 2 in = 20 between 20.9 and 22.2 km is shown in Fig. 3. The detection algorithm is applied to this simulated lidar profile; Fig. 4 shows the likelihood (see Eq. (8)) as a function of the cloud altitudes. The best estimation of the cloud altitudes is provided by the maximum of the likehood, indicated by the open circle on Fig. 4 and by the dotted lines in Fig. 3. The retrieved cloud bottom altitude is underestimated by about 300 m (corresponding to 4 data points for the 60 m vertical resolution of the profiles) and the cloud top altitude is overestimated by the same amount.
The performances of the algorithm are then tested for a wide range of cloud variance values in order to characterise further biases and estimate the detection limit which is expected to depend both on the cloud-to-background variance ratio and on the length of the moving average window, p (used to smooth the raw lidar backscatter profiles (see Sect. 3.2)). Note that, for each value of cloud variance σ * 2 in considered, 500 profiles are simulated and treated by the detection algorithm. Figure 5 shows the PSC altitude range,τ b andτ t , estimated by the detection algorithm as a function of the cloud variance σ * 2 in which is added to the simulated background For a ratio between σ * 2 in and σ * 2 out smaller than 2, the retrieved values of the PSC altitude range vary substantially with many outliers. This suggests that the estimation of the cloud altitude range is not fully reliable when σ * 2 in is smaller or of the same order as σ * 2 out . In this region, the Fisher test does not allow to confirm the presence of a PSC layer. In contrast, for a variance ratio greater than 2,τ b andτ t vary little. There are not a single outlier and the Fisher test allows to confirm more than 95 % of the PSC layers. The same features and evolution are found at the top and bottom cloud altitude. However, the retrieved values exhibit a bias of about 300 m with respect to the cloud altitude range where the variance was enhanced compared to the background variance. The bias is positive at the top cloud altitude and negative at the bottom. Once the bias is corrected, the estimation is found then to be robust.
This bias in the estimated cloud altitudes is caused by the way the profiles are smoothed. Let's recall that a PSC is generated by enhancing the variance on a simulated background profile within a given cloud altitude range. As the smoothed raw profile (i.e. trend P trend ) is estimated with a moving average, the smoothed raw profile differs from the   Fig. 5. Boxplot of the PSC altitude range,τ b andτ t , estimated by the detection algorithm as a function of the ratio between cloud variance σ * 2 in and the background variance σ * out . The PSC altitude range is added between 19.9 and 23.5 km to the simulated background profiles. The median value (thick horizontal black bar), 25th and 75th percentiles (lower and upper box bounds respectively), and the lowest and highest data within 1,5 interquartile range of the lower and upper quartile respectively (lower and upper whiskers respectively) are also indicated. The outliers (i.e. data not included between the whiskers) are plotted as open circles. The actual PSC altitude range is indicated with two dashed horizontal lines (19.9 and 23.5 km). The Fisher test allows finally to confirm whether there is a PSC layer or not. smoothed background profile, not only within the cloud altitude range (from τ b to τ t ), but also in the vicinity of the cloud boundaries. Indeed, the moving average being of length p, the trend P trend is expected to be modified over an altitude range exceeding the cloud altitude range by about 300 m (60 m×p/2, where 60 m is the vertical resolution) on each side of the cloud boundaries. As a result, the high-frequency component P hf (=P raw − P trend ) and the associated variance are artificially enhanced by the presence of a PSC layer from τ b − p/2 altitude to τ t + p/2) altitude. As the PSC detection algorithm is based on the detection of changes in the variance, the estimated cloud bottom (top) altitude is found to be lower (higher) than in the simulated raw backscatter profile. Figure 5 illustrates quite well this small bias of the detection algorithm. It means that, for an accurate determination of the cloud altitude range, the bias has to be removed from the cloud altitude range estimated by the algorithm. It is also necessary for the cloud variance σ * 2 in to be at least of the order of twice the background variance σ * 2 out in order for the algorithm to detect and reliably estimate the cloud altitude range. The level of the background variance in the profile can also be interpreted as the detection limit of the algorithm.

The effect of temporal averaging of profiles using real data
This section describes the study of real backscatter profiles measured at the DDU station. As a first example, the detection of a PSC over DDU on 9 July 2008 is presented in Fig. 6. The estimated cloud altitude range (between 18.1 km and 21.15 km) is indicated with the dashed lines. For the same example, the evolution of the likelihood L(P * ; σ * out , σ * in , τ b , τ t ) is plotted as a function of the cloud bottom τ b and top τ t altitude in Fig. 7. The maximum of L is represented with an open circle and indicates the best estimates of the PSC bottom and top altitude. Overall, the processing of measured backscatter profiles by the algorithm gives results that are very similar to those obtained with simulated profiles (see Fig. 4). The statistical signification of these estimates is calculated using the Fisher Snedecor test of Eq. (10) with the 97 % confidence rate.
The detection algorithm is applied to lidar aerosol backscatter profiles measured between March and October 2008. Lidar aerosol profiles are available at a 5 min resolution corresponding to the measurement time integration. The total number of profiles is 3857. In the literature, before analysis, raw lidar signal profiles are usually averaged over several hours. The averaging allows to minimise the measurement noise and, therefore, make it easier to detect the aerosol/cloud signals. In essence, it is a way of reducing the background variance and hence improving detection. However, the averaging process also has negative consequences. It degrades the temporal resolution. And, it can reduce the cloud signal/variance when the cloud characteristics are not  stable over the averaging window. That is the case for rapidly varying PSC events. The averaging can lead to profiles with radically different characteristics (different PSC variance and altitude ranges, absence of PSCs on the profiles) being averaged together. The length of the averaging window represents a compromise between the benefit of minimising the photon noise and the detrimental effects of degrading the temporal resolution and attenuating the cloud signal.
The consequence of averaging the profiles is illustrated in Fig. 8 where the altitude range of PSC layers detected by the algorithm between June and September 2008 are reported. Each panel corresponds to PSC detections carried out over different averaging intervals: 5 min, 1 h, 4 h and 24 h. All the detection results are compared with the 5 min interval detections (the first top panel) that are indicated in grey on every other panels. The dots at the bottom of each panel indicate the average profiles processed by the algorithm. The larger the averaging interval is, the smaller the number of data (average profiles) is, the sparser the dots are. The results for March, April, May and October 2008 are not shown because no PSCs were detected during those months except once, in May, on a 10 min average. This detection is clearly a false positive because PSCs do not form above DDU during this period and no PSC was detected at 5 and 30 min averaging intervals. The fluctuations from the background noise can very exceptionally (1 out of 1228) generate false positive detection at very short intervals.
The global temporal pattern of detections remains similar from a panel to another. The number of PSC detections decreases when the lidar averaging interval increases. It is expected because, at the same time, the temporal resolution and the number of profiles decrease. Note, however, that the decrease in the number of detections is stronger than expected. In addition, there is a tendency to detect thinner PSC layers when longer averaging intervals are considered. These effects start to be most significant when the averaging interval exceeds 2 h. For the longest averaging intervals (6 h and beyond), some PSC layers seen on short averaging intervals are not detected anymore. It is due to the fact that, over some periods, the PSC signals are so attenuated by the averaging of mixed profiles that the algorithm is not able to detect them anymore. The effect of averaging on the signal variance can be analysed in a more formal way with the following relationship which gives the total variance of the average of two signals, Var( 1 2 (P 1 + P 2 )) = 1 4 Var(P 1 ) + 1 4 Var(P 2 ) + 1 2 Cov(P 1 , P 2 ), where P 1 and P 2 are two profiles.
www.atmos-chem-phys.net/12/3205/2012/ Let us consider separately the calculation inside and outside the PSC layer. Outside the PSC layer, the covariance term (i.e. Cov(P 1 , P 2 )) should be rather constant and small compared to the first 2 terms because the background variance mostly originates from instrumental noise that is characterised by a weak temporal correlation. On the other hand, inside the PSC layer, the PSC signal is expected to exhibit longer and stronger temporal correlation whose timescales are given by the persistence of PSC events seen over DDU; in other words, how long a PSC event typically lasts over DDU. When the profiles to average are separated by a time interval shorter than the PSC correlation timescales (and so PSC profiles with similar characteristics are averaged), the positive correlation between the profiles inside the PSC layer ensures that the inside variance decreases less quickly than the outside variance with averaging. Since the detection relies on the ratio between the inside and the outside variance, the averaging has a detrimental effect on the detection, i.e. there is a time threshold above which the averaged PSC layer is diluted in the background signal. For example, there is a wide short-lived PSC layer clearly detected (bottom altitude at around 11 km and top altitude at around 23 km) just after 9 July 2008 (see Fig. 8) at short averaging intervals (i.e. 1 h and 4 h). However, this layer is not detected at the original 5 min interval, indicating that the background noise was too strong to detect the PSC signal in the original profiles. The averaging initially reduces the background noise more than the PSC signal to make it detectable. At the largest averaging interval, this PSC layer is not detected anymore, meaning that the PSC signal is diluted in the averaging.
When the profiles to average are separated by a time interval beyond the PSC correlation timescales (and so profiles with completely different characteristics are averaged), the positive correlation disappears on average and the covariance (Cov(P 1 , P 2 )) inside the PSC layer should decrease with increasing averaging time intervals (then so does the variance Var( 1 2 (P 1 + P 2 ))). As a result, PSC signals become more difficult to detect in the background noise for large averaging time intervals. This attenuation effect of the averaging starts to be noticeable just on the inner edges of PSC layers where the variance is not very much higher than the outside variance. This explains why the detected PSC layers become thinner when the averaging interval is increased. For long time intervals, 6 h and beyond, the PSC variance can become so weak over entire PSC layers that they are completely missed by the algorithm. According to Fig. 8, the most reliable and robust results for 2008 are obtained between 30 min and 2 h intervals. Overall, the best compromise between the temporal resolution and the accuracy of the detection seems to be an averaging interval of 1 h typically.

Discussion and conclusion
An method of PSC detection on raw lidar signal profiles is presented. The detection is based on the local increase in the profile variance produced by the presence of a PSC layer. The detection procedure consists of three steps. The first step consist of performing a stationarisation of the backscatter profiles. The second step involves the calculation of a maximum likelihoods. In the last step, the statistical efficiency of the PSC detection is estimated. The performances of the detection system are evaluated on simulated backscatter profiles that mimic typical characteristics of lidar profiles. The tests on simulated data show that PSC layers are reliably detected when they produce changes in variances greater than the background (i.e. PSC-free) variance. They also show that the dispersion of the estimated cloud bottom and top altitudes is found to be about 200 meters typically and that there is a systematic bias of about 300 m linked to the smoothing of the profiles.
After having been successfully tested on simulated data, the method is applied to real backscatter profiles measured above DDU station between March and October 2008. The results confirm the relevance of the detection algorithm. Series of PSC layers are detected during the austral winter and early spring (June, July, August and September). No PSC layer is detected during months when PSCs are not expected to form according to thermodynamical thresholds. The effect of temporal averaging has also been analysed. This averaging is often necessary when the lidar measurement time integration is very short. Its aim is to minimise the photon noise and hence maximise the signal-to-noise ratio. However the averaging degrades the temporal resolution and more importantly, if the temporal averaging far exceeds the inner variability time scale of the probed PSC layer, the measurements end up considering an overall optical smoothed equivalent of the cloud. The results suggest that the best compromise for PSC lidar detection at DDU is of the order of 1 hour.
There are other potential applications of this detection method presently applied to ground-based lidar profiles. The first is to include the detection of cloud layer in the inversion process of lidar data. Indeed this inversion requires the knowledge of the optical properties of the atmosphere along the laser beam, which is impacted by the presence of PSC layer. Second, a similar treatment could be applied to satellite lidar profiles (for example satellite observations from Calipso, Pitts et al., 2007 andPitts et al., 2009). Since the optical signature of volcanic aerosol layers on lidar profiles is rather similar to the weak signal of optically small PSC, applying this method to the detection of volcanic layer appears straightforward (i.e. David et al., 1998 andDavid et al., 2010). In the same way, the detection of other clouds (cirrus or noctulescent clouds Von Cossart et al., 1996or Dubietis et al., 2010 should also be possible with this approach. Finally, this could also be suited for the detection of biomass burning plumes or desert dust layers in tropospheric lidar profiles. One limitation of the model is that it only detects a single layer in a profile. In case of superimposed PSC layers our method detect them as a single layer. The detection of distinct multiple PSC layers would improves the caracterization (frequence, height ...) of PSCs and then would help to a better understanding of their formation and role in ozone depletion process. Such improvement of the method requires new developments but no theoretical issues are to be overcome. As PSC backscattered signals depend on the lidar wavelength, the use of lidar profiles acquired with different wavelengths and a multivariate approach (one per wavelength) would allow to distinguish the type of detected PSCs. By taking into account a priori knowledge (for instance, an average PSC height, their most probable altitude ... ), a bayesian approach (see for example the development to variance shifts detection of Hannart and Naveau, 2009) could be considered in order to tackle these new problems (both the multilayer aspect and the distinction of PSC type).

Appendix A Likelhood calculation
This annexe present the calculation which allows to infer the parameters of profiles. The first model, M 0 , explained by Eq. (5) can be mathematically modelled by M 0 : ∀z ∈ [z 1 , z n ] P * (z) ֒→ N (0, σ * 2 out ).
This means that the distribution of the stationarized profile P * is constant along the altitude range (i.e. ∀z ∈ [z 1 , z n ]). Whereas the alternative model, M 1 , explained by (6) is expressed by M 1 : ∀z ∈ [z 1 , τ b [∪]τ t , z n ] P * (z) ֒→ N (0, σ * 2 out ) ∀z ∈ [τ b , τ t [ P * (z) ֒→ N (0, σ * 2 in ), and means that two altitudes exist τ b and τ t which correspond to the bottom altitude and the top altitude of a hidden signal, within this altitudes the variance is supposed to be greater or equal to the variance outside. Note that, if considering σ * in = σ * out in Eq. (A2), models from Eq. (A1) turn out to be embedded in models from Eq. (A2). To estimate the parameters of the model, the calculation of the likelihood maximum of distribution given by Eq. (A2) is needed.
Assuming the random variables P * (z) z 1 ≤z i ≤z n are independent, then, under M 1 , the distribution of the vector P * = (P * (z 1 ), ..., P * (z n )) is given by f (P * |M 1 ) (A4) The likelihood is then given by L(z; σ * out , σ * in , τ b , τ t ) = log(f (P * |M 1 )) (A5) For programming performance, the previous likelihood can be written as Where T is the total sum of squared P * (z) (i.e. z∈[z 1 ,z n ] P * (z) 2 ). This last step allows to calculate only one of the two sums of Eq. (A6).
The search of the maximum of L(z; σ * out , σ * in , τ b , τ t ) regarding σ * out , σ * in , τ b and τ t is performed using a iterative method explained in Sect. 3.2.