Boundary-layer turbulent processes and mesoscale variability represented by numerical weather prediction models during the BLLAST campaign

This study evaluates the ability of three operational models, with horizontal resolutions ranging from 2.5 to 16 km, to predict the boundary-layer turbulent processes and mesoscale variability observed during the Boundary Layer Late-Afternoon and Sunset Turbulence (BLLAST) field campaign. We analyse the representation of the vertical profiles of temperature and humidity, and the time evolution of near-surface atmospheric variables and of the radiative and turbulent fluxes, over a total of 12 intensive observing periods (IOPs), each lasting 24 h. Special attention is paid to the evolution of the turbulent kinetic energy (TKE), which was sampled by a combination of independent instruments. For the first time, this variable, which is central to the turbulence scheme used in AROME and ARPEGE, is evaluated against observations. In general, the 24 h forecasts succeed in reproducing the day-to-day variability in cloud cover, temperature and boundary-layer depth. However, they exhibit some systematic biases, in particular a cold bias within the daytime boundary layer for all models. An overestimation of the sensible heat flux is noted for two grid points in ARPEGE and is found to be partly related to an inaccurate simplification of surface characteristics. AROME shows a moist bias within the daytime boundary layer, which is consistent with overestimated latent heat fluxes. ECMWF presents a dry bias at 2 m above the surface and also overestimates the sensible heat flux. The high-resolution model AROME resolves the vertical structures better, in particular the strong daytime inversion and the thin evening stable boundary layer. This model is also able to capture some specific observed features, such as the orographically driven subsidence and a well-defined evening maximum of the water vapour mixing ratio in the upper part of the residual layer, due to fine-scale advection.
The model reproduces the order of magnitude of spatial variability observed at mesoscale (a few tens of kilometres). AROME provides a good simulation of the diurnal variability of the turbulent kinetic energy, while ARPEGE shows the right order of magnitude.


Introduction
Limited-area numerical weather prediction (NWP) models are used routinely for operational weather forecasting across the world. As their resolution increases and they are used for a growing number of applications, such as predicting black ice on roads or agro-meteorology, it becomes important to evaluate their capability to reproduce low-tropospheric vertical profiles of temperature and moisture as well as surface turbulent and radiative fluxes. Here we assess how well these models represent near-surface variables and boundary-layer turbulent kinetic energy (TKE), a question that has remained largely unexplored so far.
The evaluation and improvement of models is often a motivation for deploying instruments in field campaigns. However, field-campaign observations are less often used extensively to evaluate how operational models represent surface and boundary-layer processes. Atlaskin and Vihma (2012) used observations from a field campaign to evaluate NWP models. They focused on the representation of very stable conditions at very low temperatures (below −10 °C) in northern Europe and showed a systematic positive bias in the 2 m temperature, due to an underestimation of the stratification during the coldest nights, which were characterized by very stable conditions. Many studies have used field-campaign data to evaluate the behaviour of various non-operational limited-area models. Steeneveld et al. (2008) used data from three particular days of the CASES-99 field campaign to evaluate the impact of the boundary-layer scheme and the radiative scheme on the performance of three different limited-area models. LeMone et al. (2013) used CASES-97 observations to evaluate boundary-layer schemes and their diagnostics based on mesoscale model simulations. In parallel, models have been evaluated over permanent observing sites, such as the ground-based remote sensing observations on the Swiss Plateau (Collaud Coen et al., 2014), the Atmospheric Radiation Measurement program sites (ARM; Morcrette, 2002, or Guichard et al., 2003) or the Cloudnet sites (Illingworth et al., 2007). In particular, the Cloudnet project has allowed a systematic evaluation of clouds in different operational forecast models. For instance, Bouniol et al. (2010) showed that models tended to overestimate cloud occurrence at all levels.
Published by Copernicus Publications on behalf of the European Geosciences Union. 8984 F. Couvreux et al.: Boundary-layer turbulent processes and mesoscale variability
The Boundary Layer Late Afternoon and Sunset Turbulence (BLLAST) field campaign was conducted from 14 June to 8 July 2011 at Lannemezan in southern France, in an area of complex and heterogeneous terrain. A wide range of instrument platforms, including full-sized aircraft, remotely piloted aircraft systems (RPASs), remote sensing instruments, radiosoundings, tethered balloons, surface flux stations and various meteorological towers, were deployed over different types of surface (Lothon et al., 2014). During this campaign, 12 fair-weather days, corresponding mainly to high-pressure situations, were extensively documented by intensive observing periods (IOPs). In this study, we take advantage of the large dataset provided by this campaign to evaluate the vertical structure of the boundary layer and its diurnal evolution as represented in NWP models.
Here, we also focus on the mesoscale variability that can occur in the area, on how it affects the local observations and on how well it is reproduced by the models. Acevedo and Fitzjarrald (2001) used observations complemented by a large-eddy simulation (LES) to show that spatial variability peaked during the evening transition and that land use and orography played a crucial role in setting temperature anomaly patterns. This highlights the importance of fine resolution for representing the orography correctly in models. They also found that, around sunset, horizontal advection played a secondary role compared to vertical divergence.
Several recent studies have also assessed the behaviour of single-column models (a single column of the atmosphere that integrates the same suite of parametrizations as a full 3-D simulation) in representing the entire diurnal cycle, by comparison with LES. Single-column runs are often used as a simplified configuration of a full 3-D simulation in order to highlight deficiencies in the physics parametrizations of the model and to test new developments. By comparing a 1-D model with LES for a case based on observations at Cardington, UK (Beare et al., 2006), which covered the transition from early afternoon to the next morning, Edwards et al. (2006) showed that the 1-D model had difficulties in correctly representing turbulent diffusivity during the afternoon transition, which affected the mean profiles. More recently, Svensson et al. (2011) compared LES and single-column models over the entire diurnal cycle of a CASES-99 case and found that the single-column models showed a faster temperature decrease in the afternoon than LES. However, this type of evaluation has not been carried out for operational NWP models, nor has it used observations of turbulence throughout the boundary layer. For example, observations of TKE profiles are quite rare, as they are only made during field campaigns. Therefore, boundary-layer parametrizations based on a prognostic equation for the turbulent kinetic energy, which have been shown to perform better than first-order schemes (Holt and Raman, 1988), have only been evaluated via comparisons with LES results (e.g. Cuxart et al., 2006). Here, we carefully analyse the turbulent kinetic energy, which is a key parameter of the turbulence scheme (Cuxart et al., 2000) used in the two French models evaluated.
Our objectives are (i) to evaluate the skill of operational NWP models in predicting the whole diurnal cycle of boundary-layer temperature and moisture, in particular the afternoon transition; (ii) to assess the representation of the turbulent kinetic energy by models whose boundary-layer parametrization is based on a prognostic equation for the turbulent kinetic energy; and (iii) to evaluate the variation of surface thermodynamic parameters over different surface covers. The observations and the models evaluated are described in Sect. 2, together with the methodology used for the comparison. Results are presented in Sect. 3, focusing on the representation of the entire diurnal cycle: we provide separate analyses of the surface energy balance, the surface meteorological variables and the boundary-layer characteristics, and end with a specific focus on the behaviour of the models during the afternoon transition. The paper closes with a discussion and conclusions.

Observations
The observations used in this study were acquired during the BLLAST field campaign and have been described in detail by Lothon et al. (2014); they are briefly summarized here. They consist of measurements made by remote sensing instruments (Doppler lidar, aerosol lidar, ultra-high-frequency (UHF) wind profiler) and in situ instruments (automatic meteorological stations, soundings, remotely piloted aircraft systems, manned aircraft). These observations were not used in the assimilation systems and could therefore be used unambiguously for evaluation. Table 1 summarizes all the types of data and measurements used in this study, giving details on the resolution of the raw data, the estimated parameters and their sampling.
In the following, we use the observations from the 12 IOPs of the field campaign (Lothon et al., 2014). In total, seven different sites were instrumented with eddy covariance systems and radiometers, documenting various types of cover (wheat, grass, forest, moor (an area of open wasteland with grass and heath), corn and more heterogeneous sites). Forest and grassland were the two main land-cover types of the area, while moor and urban surfaces were intermediate and corn, wheat and bare soil were minority covers (Hartogensis, 2015). A common procedure to retrieve surface heat fluxes from the raw data acquired at 10 Hz was applied to all surface stations measuring turbulence, providing surface turbulent and radiative fluxes at 30 min resolution (De Coster and Pietersen, 2012). These observations were used to evaluate the radiative and turbulent fluxes as well as the near-surface meteorological parameters simulated by the models. The site locations are indicated in Fig. 1b by small yellow dots. At these sites, the wind was measured at different heights above the ground and was interpolated to 10 m for comparison with the models, using a logarithmic profile and the wind stress measured close to the surface.
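The 10 m interpolation can be sketched as follows, assuming neutral stratification (the text does not say whether a stability correction was applied) and a von Kármán constant of 0.4. Writing the log law at the measurement height z_m and at 10 m and subtracting removes the roughness length, so only the measured wind and the friction velocity u* derived from the measured stress are needed. The function name `wind_at_10m` is hypothetical.

```python
import math

KAPPA = 0.4  # von Karman constant (assumed value)


def wind_at_10m(u_meas: float, z_meas: float, u_star: float) -> float:
    """Extrapolate a measured wind speed to 10 m with a neutral log profile.

    u_meas : wind speed (m/s) measured at height z_meas (m above ground)
    u_star : friction velocity (m/s) derived from the measured wind stress
    Assumes neutral stratification: no Monin-Obukhov stability correction.
    """
    if z_meas <= 0.0:
        raise ValueError("measurement height must be positive")
    # U(10) - U(z_m) = (u*/kappa) * ln(10 / z_m); the roughness length cancels.
    return u_meas + (u_star / KAPPA) * math.log(10.0 / z_meas)


if __name__ == "__main__":
    # Hypothetical example: 2.0 m/s measured at 3 m with u* = 0.2 m/s
    print(round(wind_at_10m(2.0, 3.0, 0.2), 2))  # prints 2.6
```

Under stable nocturnal conditions the neutral profile is only a first approximation, which is why this should be read as a sketch rather than the campaign's processing.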
To describe the vertical profile of the boundary layer, we used data from (i) radiosondes (MODEM, M10 probes) launched four times per day (00:00, 06:00, 12:00 and 18:00 UTC; UTC was the same as solar time since the sites were very close to the Greenwich meridian) from the north-easternmost site ("main site" in the following, indicated by large orange dots in Fig. 1b); (ii) radiosondes (Vaisala RS92 probes) launched hourly into the lower troposphere (up to 3 to 4 km; Legain et al., 2013) from the southernmost launching site (4 km from the main site); and (iii) vertical profiles obtained from the remotely piloted aircraft system (RPAS) SUMO (Reuder et al., 2012), which flew around the main site and provided 4 to 10 soundings of the lower troposphere during the afternoons of the IOPs. These measurements provided vertical profiles of temperature, water vapour content and horizontal wind. Boundary-layer depths were derived from these profiles as detailed in Sect. 2.3. Boundary-layer depths derived from UHF and aerosol lidar data were also used.
The combination of various measurements that provided estimates of the turbulent kinetic energy was a unique aspect of this field campaign.The Doppler lidar (Windcube, manufactured by LEOSPHERE, Gibert et al., 2012) and measurements from ground towers, aircraft and the turbulence probe mounted on the tethered balloon (Canut et al., 2016) all contributed estimates of the variance of horizontal and/or vertical wind at high sampling rates (every 4 s for the lidar and 0.1 s for the turbulence probe) and thus estimates of the turbulent kinetic energy.
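As an illustration of how variance estimates translate into TKE, the sketch below computes TKE = 0.5 (σu² + σv² + σw²) over consecutive, non-overlapping windows of a high-rate wind record (for example 12 000 samples for a 20 min window at 10 Hz). It is a minimal sketch: fluctuations are taken about each window mean with no detrending or filtering, which is an assumption; the actual processing of each instrument is described by Canut et al. (2016). The function name is hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence


def tke_per_window(u: Sequence[float], v: Sequence[float], w: Sequence[float],
                   samples_per_window: int) -> List[float]:
    """Return one TKE estimate (m2/s2) per complete, non-overlapping window.

    Fluctuations are taken about each window's own mean (no detrending or
    filtering), so TKE = 0.5 * (var(u) + var(v) + var(w)). A trailing
    partial window is discarded.
    """
    n = len(u)
    if len(v) != n or len(w) != n:
        raise ValueError("u, v and w must have the same length")
    if samples_per_window < 2:
        raise ValueError("a window needs at least two samples")
    tke = []
    for start in range(0, n - samples_per_window + 1, samples_per_window):
        window = slice(start, start + samples_per_window)
        tke.append(0.5 * (pvariance(u[window]) + pvariance(v[window])
                          + pvariance(w[window])))
    return tke
```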

Numerical weather prediction models
In this study we evaluate the behaviour of three numerical weather prediction (NWP) models: two NWP models from Météo-France, namely (i) a global model, ARPEGE (Courtier and Geleyn, 1988), with a stretched horizontal grid of about 10 km × 10 km over France and a 4D-Var assimilation system, and (ii) a limited-area non-hydrostatic model, AROME (Seity et al., 2011), with a grid of 2.5 km × 2.5 km and a 3D-Var data assimilation system; and (iii) the operational ECMWF IFS model, with a horizontal grid size of around 16 km × 16 km (Simmons et al., 1989).
Figure 1 (caption fragment): (b) Zoom of (a) with surface sites shown by small yellow dots and the radiosounding launching sites by large orange dots. Note that the westernmost site was the launching site for the few GRAW soundings, which were not used in this study (source: Google Earth).
Table 2 presents the main characteristics (horizontal resolution, number of vertical levels, boundary-layer scheme, initialization time and forecast period, initialization of the land-surface properties) of the three models. For this field campaign, the AROME model was run in near-real time over a smaller domain (about a quarter of France), using lateral boundary and initial conditions from the operational AROME, which itself uses ARPEGE for its lateral boundary conditions. This provided specific outputs for the 16 grid points surrounding the main site (Fig. 1b).
All models employed a terrain-following hybrid sigma-pressure vertical coordinate. However, the vertical grid differed from one model to another (Table 2): ARPEGE had 70 vertical levels with about 11 levels within the first kilometre (first level at 16 m), AROME had 60 vertical levels with about 15 levels within the first kilometre (first level at 10 m) and ECMWF had 91 vertical levels with about 11 levels within the first kilometre (first level at 10 m). The time step varied from 1 min for AROME to about 10 min for ARPEGE and ECMWF. The models also differed in their parametrizations. For boundary-layer turbulence, AROME uses an eddy-diffusivity mass-flux approach, with local turbulence (small eddies) represented by a prognostic turbulent kinetic energy (TKE) scheme (Cuxart et al., 2000) with a non-local length scale (Bougeault and Lacarrere, 1989), and boundary-layer thermals and shallow convection represented by a mass-flux scheme (Pergaud et al., 2009). ARPEGE uses the same prognostic TKE scheme (Cuxart et al., 2000) and uses a mass-flux scheme only when shallow convection is active (Bechtold et al., 2001). ECMWF uses an eddy-diffusivity mass-flux scheme based on two updraughts (Koehler et al., 2011) and a non-local K profile for the boundary layer, while shallow convection is handled by a separate bulk mass-flux scheme (Tiedtke, 1989). The surface scheme is ISBA in ARPEGE (Noilhan and Planton, 1989; Giard and Bazile, 2000); AROME uses the surface platform SURFEX (Martin et al., 2014) and ECMWF uses the HTESSEL model (Balsamo et al., 2009). All models use the same long-wave radiation scheme, the RRTM parametrization (Mlawer et al., 1997), but they differ in the shortwave component: ECMWF uses the SRTM parametrization, while AROME and ARPEGE use the Morcrette et al. (1991) code. The radiation scheme is called every hour in ARPEGE and every 15 min in AROME. Concerning the cloud scheme, ARPEGE uses a relative humidity distribution based on Smith (1990), AROME a distribution of the saturation deficit based on Bougeault (1982), and ECMWF a prognostic scheme (Forbes et al., 2011). In ARPEGE, there are 12 different vegetation covers and each grid point is assigned a single vegetation cover, while in AROME each grid point is associated with fractions of various vegetation types (crops, land, town, mixtures of crops and woodland, Landes forest or broadleaf forest).

Comparison methodology
This section gives a detailed description of how the comparison was conducted, focusing on the temporal and spatial resolution of the different variables obtained from models and observations.
Because of the coarse grid spacing of each model, real surface heterogeneities, topography and local circulations are not expected to be reproduced by the models. The real orography and that of each model are shown in Fig. 2, from which it can be seen that high resolution (2.5 km) is needed to resolve the north-south valleys of the Pyrenees. Large variability of surface fluxes exists among the sites (Fig. 1) at scales smaller than 2.5 km × 2.5 km, the size of a grid box in AROME (see, for example, in Fig. 7 of Lothon et al. (2014), the differences between the moor and corn sites, or between the grass and wheat sites, which are a few hundred metres apart). As noted by Lothon et al. (2014), this is mainly due to surface cover. Nevertheless, the variability among observations and the differences between model outputs and observations provide clues to the main shortcomings of the models. The simulated grid points (and associated columns) surrounding the locations of the measurement sites were extracted and are shown in Fig. 1: 3 neighbouring grid points for ARPEGE, 16 for AROME (a 10 km × 10 km box including all sites) and 9 for ECMWF. Table 3 presents the main physiographic characteristics (altitude, albedo, vegetation fraction and roughness length) of these points.
For ECMWF we evaluated both the analysis, available every 6 h, and the operational forecast from the run launched at 00:00 UTC, with 3-hourly outputs for the surface characteristics, while for the two other models we show the forecast launched at 00:00 UTC with hourly outputs. The forecast length analysed here was chosen to be 24 h. The atmospheric variables corresponded to instantaneous fields sampled every hour for AROME and ARPEGE and every 6 h for ECMWF. The diagnostics T2m (temperature at 2 m), rh2m (relative humidity at 2 m) and ws10m (horizontal wind speed at 10 m) were obtained by vertical interpolation between the surface and the first model level, based on Monin-Obukhov similarity theory and following Geleyn (1988), for ARPEGE and IFS, and were calculated using a prognostic surface boundary-layer scheme for AROME (Masson and Seity, 2009).
In the models, the boundary-layer depth is the first level where the TKE falls below 0.01 m² s⁻². In the observations, various diagnostics allowed the boundary-layer depth to be derived:
i. the height of the maximum of the refractive index structure coefficient (Jacoby-Koaly et al., 2002), obtained from UHF data; it usually provides an estimate of the inversion height based on the vertical gradient of relative humidity;
ii. the first level below the height diagnosed through (i) where the TKE dissipation rate becomes greater than a threshold (10⁻³ m² s⁻³), also derived from the UHF data; this criterion gives an estimate of the top of the turbulent layer;
iii. the height of the largest gradient of aerosol backscatter from the aerosol lidar data (Boyouk et al., 2010), another way to estimate the inversion height; and
iv. the best (determined manually) of four criteria applied to the various vertical profiles from soundings and RPASs (Lothon et al., 2014): the height where the virtual potential temperature exceeds its average over the lower levels plus 0.2 K, the height of maximum first derivative of the potential temperature, the height of maximum relative humidity, or the height of minimum first derivative of the specific humidity; the criterion based on the virtual potential temperature is often chosen.
A comparison of boundary-layer depths derived from various instruments is presented in Bennett et al. (2010).
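The model diagnostic and the virtual-potential-temperature criterion in (iv) can be sketched as below, with heights ordered from the surface upward. The text does not specify over which lower levels θv is averaged, so the sketch uses a hypothetical near-surface layer below `z_mean_top` (100 m by default); the function names are also hypothetical.

```python
from typing import Optional, Sequence


def bl_depth_from_tke(heights: Sequence[float], tke: Sequence[float],
                      threshold: float = 0.01) -> Optional[float]:
    """Model diagnostic: height (m) of the first level, counted upward from
    the surface, where TKE falls below `threshold` (m2/s2).

    Returns None if the TKE never drops below the threshold.
    """
    for z, e in zip(heights, tke):
        if e < threshold:
            return z
    return None


def bl_depth_from_thetav(heights: Sequence[float], theta_v: Sequence[float],
                         excess: float = 0.2,
                         z_mean_top: float = 100.0) -> Optional[float]:
    """Profile criterion: first height (m) above `z_mean_top` where theta_v (K)
    exceeds its mean over the levels at or below `z_mean_top` plus `excess` (K).

    The averaging layer `z_mean_top` is a hypothetical choice.
    Returns None if the threshold is never exceeded.
    """
    lower = [t for z, t in zip(heights, theta_v) if z <= z_mean_top]
    if not lower:
        raise ValueError("no levels at or below z_mean_top")
    reference = sum(lower) / len(lower) + excess
    for z, t in zip(heights, theta_v):
        if z > z_mean_top and t > reference:
            return z
    return None
```

Both functions return a model or sounding level, so their precision is limited by the vertical spacing of the profile (about 11 to 15 levels in the first kilometre for the models considered here).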
The decrease of the boundary-layer depth during the afternoon transition is a delicate process and, in practice, its estimation is sensitive to the criteria used to derive the boundary-layer depth, as already shown by Grimsdell and Angevine (2002) and Bennett et al. (2010). Details of this will be given in Sect. 3.5. The diagnostic used in the models was compared with criterion (iv) applied to the model profiles. These two diagnostics were consistent, but in ARPEGE the model diagnostic tended to exceed the value derived from the profiles by about 200 m, while in AROME there was very good agreement except for 14 June after 15:00 UTC and 15 June after 14:00 UTC, due to the presence of clouds (discussed later). In the following, we use the model diagnostic, discarding these hours of disagreement, as it depicts the turbulent layer, in particular during the afternoon transition.
Figure caption fragment (beginning missing): (…) and the maximum horizontal range (right axis), computed as the difference between the maximum and minimum values over all sites or all grid points of a given model, averaged separately over day and night; for observations, the range is computed both with all sites (full line) and after removing the forest stations (dash-dotted lines). The vertical grey shading marks the night-time. Two consecutive vertical dashed lines indicate interruptions between days. Note that for ARPEGE, owing to the different behaviour of ARP1 and ARP3, only ARP2 is plotted as the mean, while the spatial variability is computed with all three points.
When comparing observations and model outputs, we made the horizontal and temporal averaging of the observations as consistent as possible with the time step and resolution of the simulations. In the simulations, the surface turbulent and radiative fluxes at a given hour h correspond to the average value between hour h − 1 and hour h. In the observations, values were processed every 30 min and then averaged to provide the 1 h average used in the comparison. Furthermore, it should be kept in mind that the surface area sampled by the measured surface turbulent fluxes (a footprint of a few hundred metres) was small relative to the grid size of the three NWP models.
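The time matching can be sketched as follows, assuming (hypothetically) that each 30 min observed value is time-stamped at the end of its averaging period. The hourly value labelled h then averages the periods ending at h − 0:30 and h, which matches the model convention of an average between hour h − 1 and hour h. The function name is hypothetical.

```python
from typing import Dict


def hourly_from_halfhourly(values_by_end_minute: Dict[int, float]) -> Dict[int, float]:
    """Average 30 min values into hour-ending 1 h means.

    values_by_end_minute maps the END time of each 30 min averaging period,
    in minutes since 00:00 UTC, to the flux value (W/m2). The 1 h mean
    labelled hour h uses the periods ending at minutes 60*h - 30 and 60*h,
    i.e. it covers hour h-1 to hour h. Hours with a missing half are skipped.
    """
    hourly = {}
    for end_minute, second_half in values_by_end_minute.items():
        if end_minute % 60 != 0:
            continue  # only hour-ending stamps close a 1 h period
        first_half = values_by_end_minute.get(end_minute - 30)
        if first_half is not None:
            hourly[end_minute // 60] = 0.5 * (first_half + second_half)
    return hourly
```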
In the observations, the TKE was estimated over 20 min time windows for the 60 m tower, the Doppler lidar and the tethered balloon; over 10 min windows for the 10 m tower (a sensitivity test with 20 min windows did not change the results); and over horizontal legs of 25-30 km for the aircraft measurements (corresponding to 5-8 min; cf. Table 1 and Canut et al., 2016, for more details). This is a compromise between having the same time window as the other measurements and minimizing the influence of mesoscale heterogeneities. Note that a 5 km high-pass filter was applied to the aircraft raw data only, before the calculation of the TKE, to filter out the mesoscale variability. This is the standard treatment used for flux computation, but it induces an underestimation of the TKE of about 20 %. We also tested TKE estimates obtained with a 2.5 km high-pass filter, but they were affected by large temporal variability, indicating that the samples were not large enough. The estimation of the TKE with the Doppler lidar (Gibert et al., 2012) assumed that the turbulence was isotropic and derived the TKE from the measured vertical velocity variance. To evaluate this hypothesis, we computed the coefficient $A = 1.5\,\overline{w'^2}/\mathrm{TKE}$ from the tower measurements (both from the 60 m tower and the 10 m tower) and from the tethered balloon. A = 1 if the turbulence is isotropic; when A > 1, the contribution of the vertical velocity variance is dominant (A = 3 if the horizontal velocity variances are zero), and when A < 1, the contribution of the horizontal variances is dominant (A = 0 if the vertical velocity variance is zero). Both the tower and the tethered-balloon measurements (the tethered balloon never reached heights above 500 m) indicated that above 0.1 to 0.2 $z_i$ ($z_i$ being the boundary-layer height) and in the middle of the boundary layer, this coefficient was between 1 and 2, suggesting that the vertical velocity variance was often the main contributor to the TKE at that height and that the TKE could be estimated as $\mathrm{TKE} = 1.5\,\overline{w'^2}$. Aircraft measurements indicate that closer to the top of the boundary layer this coefficient decreased again, taking values between 0.75 and 1. Below 0.1 $z_i$, the variance of the horizontal wind was significant and the coefficient A was mostly below 0.6 (see Canut et al., 2016, for more details). Therefore, in the following, we only use Doppler lidar estimates from altitudes above 100 m. More complex computations taking into account the day-to-day and vertical variations of the anisotropy factor derived from the tethered balloon or aircraft could be performed in a future study. Note also that, since we derive the TKE as $1.5\,\overline{w'^2}$, the observed TKE tends to be overestimated most of the time but may be underestimated on windier days, when horizontal wind fluctuations are expected to be larger.
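The relation between the anisotropy coefficient and the isotropic lidar estimate can be checked numerically. With TKE = 0.5 (σu² + σv² + σw²), the estimate 1.5 σw² equals A × TKE, so it overestimates the TKE whenever A > 1 and underestimates it when A < 1. The function names and example variances below are hypothetical.

```python
def tke_from_variances(var_u: float, var_v: float, var_w: float) -> float:
    """TKE (m2/s2) from the three velocity variances."""
    return 0.5 * (var_u + var_v + var_w)


def anisotropy_coefficient(var_u: float, var_v: float, var_w: float) -> float:
    """A = 1.5 * var_w / TKE: 1 if isotropic, 3 if the horizontal variances
    vanish, 0 if the vertical variance vanishes."""
    return 1.5 * var_w / tke_from_variances(var_u, var_v, var_w)


def tke_isotropic_from_w(var_w: float) -> float:
    """Isotropic estimate used for the Doppler lidar: TKE = 1.5 * var_w."""
    return 1.5 * var_w


if __name__ == "__main__":
    # Hypothetical mid-boundary-layer variances with vertical motion dominant
    var_u, var_v, var_w = 0.5, 0.5, 1.0
    true_tke = tke_from_variances(var_u, var_v, var_w)
    lidar_tke = tke_isotropic_from_w(var_w)
    # The overestimation ratio equals A (both 1.5 here)
    print(lidar_tke / true_tke, anisotropy_coefficient(var_u, var_v, var_w))
```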
In the models, a horizontal resolution of 2.5 km in AROME and 10 km in ARPEGE is equivalent to about 9 and 30 min, respectively, for a boundary-layer wind speed of around 3-5 m s⁻¹. This is consistent with the 20 min windows used to derive the TKE from point observations. We checked that none of the models directly resolved boundary-layer eddies, not even the model with the finest resolution (owing to its effective resolution of ∼9Δx; see Ricard et al., 2013). In AROME, the contribution of the mass-flux scheme was taken into account by adding the mass-flux contribution, estimated as $0.5\,a_{\mathrm{up}}\,w_{\mathrm{up}}^2$, where $a_{\mathrm{up}}$ is the fractional area covered by the thermals and $w_{\mathrm{up}}$ the thermal vertical velocity, to the subgrid TKE. This contribution is small close to the surface and reaches about 20 % of the total in the middle of the boundary layer.
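A minimal sketch of how the AROME thermal contribution is added to the subgrid TKE, using the expression given above; the function name and example values are hypothetical.

```python
def total_tke(tke_subgrid: float, a_up: float, w_up: float) -> float:
    """Subgrid TKE (m2/s2) plus the mass-flux contribution 0.5 * a_up * w_up**2.

    a_up : fractional area covered by the thermals (between 0 and 1)
    w_up : vertical velocity of the thermals (m/s)
    """
    if not 0.0 <= a_up <= 1.0:
        raise ValueError("a_up must be a fraction between 0 and 1")
    return tke_subgrid + 0.5 * a_up * w_up ** 2


if __name__ == "__main__":
    # Hypothetical values: 10 % thermal coverage with w_up = 2 m/s
    tke = total_tke(0.8, 0.1, 2.0)
    mass_flux_share = (tke - 0.8) / tke
    print(round(tke, 3), round(mass_flux_share, 3))
```

Because $a_{\mathrm{up}}$ is small and $w_{\mathrm{up}}$ vanishes at the surface, the added term is negligible near the ground and largest in the middle of the boundary layer, where the thermals are strongest.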
Finally, to characterize the afternoon transition, the time at which the buoyancy flux became negative was determined in both observations and models. This was done by finding the zero crossing from the interpolation of hourly flux outputs. Below, we evaluate the representation of the diurnal cycle of the boundary-layer characteristics and surface energy budgets over all 12 IOPs. As shown in Lothon et al. (2014), these days correspond mainly to high-pressure fair-weather conditions with no cloud cover or, on 14, 15, 24 and 30 June, a few clouds. Most of the days experienced a typical mountain breeze circulation, with nocturnal southerly downslope wind and north-westerly to north-easterly upslope wind during the day. The days of 25, 26 and 27 June did not register such a circulation (cf. Lothon et al., 2014, Fig. 6) and were characterized by easterly winds. These 3 days also showed higher temperatures and stronger winds, due to the presence of a low-pressure system in the Gulf of Lion (for more details see Nilsson et al., 2015a). In the following, these 3 days are referred to as hot days.
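The zero-crossing search can be sketched as a linear interpolation between the last positive and the first non-positive hourly value. The `start_hour` parameter, used to skip any sign change before midday, and the function name are hypothetical; the text only states that the crossing was found by interpolating hourly outputs.

```python
from typing import Optional, Sequence


def buoyancy_flux_sign_change(hours: Sequence[float], flux: Sequence[float],
                              start_hour: float = 12.0) -> Optional[float]:
    """Time (h UTC) at which the buoyancy flux first changes from positive to
    non-positive at or after `start_hour`, by linear interpolation between
    consecutive hourly outputs. Returns None if no such change is found."""
    for i in range(len(hours) - 1):
        t0, t1 = hours[i], hours[i + 1]
        f0, f1 = flux[i], flux[i + 1]
        if t0 >= start_hour and f0 > 0.0 >= f1:
            # The straight line through (t0, f0) and (t1, f1) crosses zero here.
            return t0 + (t1 - t0) * f0 / (f0 - f1)
    return None
```

With hourly outputs the interpolated time can differ from the true crossing by a fraction of an hour, which is comparable to the timing differences discussed later between models and observations.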

Results
In this section, we compare surface fluxes, meteorological variables, boundary-layer structure and turbulent kinetic energy for the 12 IOPs.

Radiative and surface fluxes
Figure 3 presents the 24 h sequences of observed and simulated surface downwelling solar radiation, sensible heat flux and latent heat flux for the 12 IOPs (from 14 June to 5 July 2011). The mean value and the maximum range (computed at each time step as the difference between the maximum and the minimum over all points of a given model or of the observations), averaged separately over daytime and night-time as a measure of horizontal variability, are plotted. The cloudy days are clearly depicted by an increase in the horizontal variability of the observed surface downwelling solar radiation (Fig. 3a), consistent with Lothon et al. (2014). ARPEGE and AROME mostly distinguish between the clear days (marked "o") and the cloudy days (marked by triangles), the latter being indicated by increased horizontal variability. For at least 2 observed clear days (20 and 27 June), ECMWF shows a decrease in downwelling solar radiation from 10:30 to 13:30 UTC, suggesting the presence of clouds in the model. There are some clouds from 15:00 to 19:00 UTC on 26 June, while ECMWF predicts variability in the downwelling solar radiation from 10:30 to 13:30 UTC. ARPEGE has high clouds throughout the day on 27 June, while observations only registered thin cirrus after 17:00 UTC (not shown). Stratocumulus is present in the morning of 30 June, clearing up through the afternoon; cloud cover remains quite variable in the afternoon, whereas ARPEGE and ECMWF predict a cloud-free atmosphere. The spatial variability in AROME is slightly overestimated for 14, 15 and 30 June and underestimated for 24 June, but is otherwise in good agreement with observations. In summary, all models generally capture the spatial and temporal variability of downwelling solar radiation, with AROME performing better in terms of cloud occurrence and spatial variability.
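The horizontal-variability measure described above can be sketched as follows: at each output time the range is max − min over the sites or grid points, and the ranges are then averaged separately over daytime and night-time steps. The optional `exclude` argument mimics the observed range computed without the forest stations; all names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def _mean(values: List[float]) -> float:
    """Arithmetic mean, NaN for an empty list."""
    return sum(values) / len(values) if values else math.nan


def day_night_mean_range(series_by_point: Sequence[Sequence[float]],
                         is_night: Sequence[bool],
                         exclude: Iterable[int] = ()) -> Tuple[float, float]:
    """Return (daytime, night-time) averages of the maximum horizontal range.

    series_by_point : one equal-length time series per site or grid point
    is_night        : one flag per time step
    exclude         : indices of points to leave out (e.g. forest stations)
    """
    skip = set(exclude)
    kept = [s for i, s in enumerate(series_by_point) if i not in skip]
    if not kept:
        raise ValueError("no points left after exclusion")
    # Range across points at each time step
    ranges = [max(vals) - min(vals) for vals in zip(*kept)]
    day = [r for r, night in zip(ranges, is_night) if not night]
    night = [r for r, night in zip(ranges, is_night) if night]
    return _mean(day), _mean(night)
```

The range grows with the number and diversity of points, so it is not strictly comparable between 7 sites, 3 ARPEGE points, 16 AROME points and 9 ECMWF points; it is best read as an order-of-magnitude measure.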
There is more discrepancy in the simulation of sensible heat fluxes, with biases exceeding 100 W m⁻² (Fig. 3b). For instance, ECMWF overestimates the surface sensible heat flux. The variability from one IOP to another (Fig. 3b) is correctly reproduced by all three models, with, for instance, a decrease in the maximum sensible heat flux during the hot days. They also all predict more negative sensible heat flux during the nights of the hot period (from 25 to 27 June), although ECMWF and ARPEGE underestimate this negative sensible heat flux while AROME overestimates it during the first night (25 to 26 June). Concerning spatial variability, the large observed range among surface sites is noteworthy. The observed range is computed either for all stations (full black line) or after removing the forest stations (dash-dotted black line). The forest stations induce larger observed ranges, especially during the first part of the period. The spatial variability among the ECMWF grid points is much smaller, which is partly explained by its coarser horizontal grid, while the values for ARPEGE and AROME are of the same order of magnitude as the observations but slightly underestimated at the end of the period. As shown in Fig. 4a, ARPEGE predicts very large sensible heat fluxes for two of its three points (ARP1 and ARP3 mainly differ from ARP2 in altitude and roughness length, as shown in Table 3). These fluxes are of the same order of magnitude as those recorded at the forest sites (dashed and dash-dotted black lines), and these two points are characterized by forest cover, which has a lower albedo (0.12 against 0.2). They are also at higher altitude. However, these simulated sensible heat fluxes are too large to be representative of a 10 km wide grid box over this area, which, according to Fig. 1, cannot be characterized by a uniform forest cover; indeed, surface cover varies considerably at scales below 10 km. The third point (northernmost, ARP2) is in better agreement with the non-forest sites (indicated by the black error bars).
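A rough sense of the albedo effect can be obtained from a worked estimate. Assuming a hypothetical clear-sky midday downwelling shortwave flux of 800 W m⁻² (an illustrative value, not one taken from the observations), the additional shortwave radiation absorbed at the forest-covered points relative to ARP2 is

$$\Delta SW_{\mathrm{net}} = (\alpha_{\mathrm{ARP2}} - \alpha_{\mathrm{forest}})\,SW_{\downarrow} = (0.20 - 0.12) \times 800\ \mathrm{W\,m^{-2}} = 64\ \mathrm{W\,m^{-2}}.$$

This extra energy is shared among the surface fluxes, so under the assumed irradiance it is an upper bound on the albedo-driven increase in sensible heat flux.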
There is also discrepancy in the simulation of latent heat fluxes. AROME systematically overestimates the observed values, by up to 100 W m⁻² (Fig. 3c), which may be related to an excessive soil moisture content (however, no observations were available at the various sites to evaluate this variable). The two high-vegetation points of ARPEGE (Fig. 4b) do not show the greater evaporation that could have been expected from their larger net radiation (due to the lower albedo). ECMWF correctly reproduces the range of observations. The variability among IOPs is also correctly reproduced, with higher latent heat fluxes during the hot days (Fig. 3c). The spatial variability is of the same order of magnitude as observed in AROME, slightly underestimated in ARPEGE and strongly underestimated in ECMWF. Interestingly, when the latent heat fluxes are plotted against the sensible heat fluxes at 12:00 UTC, the models reproduce the −1 slope associated with an almost constant available energy (cf. Fig. 12), in agreement with LeMone et al. (2003). This holds better for the clear days (cyan or blue symbols) than for the cloudy days (green and purple symbols), in agreement with Lohou and Patton (2014). Most of the observations also show a negative relationship (though with a less steep slope), except those at 60 m on the tower (grey squares) and at 30 m over the forest (dots).
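The −1 slope follows from the surface energy balance: if the available energy Rn − G is nearly constant across points at 12:00 UTC, then H + LE ≈ Rn − G, so LE ≈ (Rn − G) − H. A minimal least-squares check of the slope on hypothetical numbers is sketched below; the function name is hypothetical.

```python
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical points sharing the same available energy of 500 W/m2
    available = 500.0
    h = [100.0, 200.0, 300.0]
    le = [available - v for v in h]
    print(ols_slope(h, le))  # prints -1.0
```

A slope less steep than −1, as found for most observations, indicates that the available energy (or the energy-balance residual) varies systematically with H across the sites.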
To sum up, ARPEGE overestimates the sensible heat flux at the two forest-covered points, as does ECMWF to a lesser extent, and AROME strongly overestimates the latent heat flux. All models reproduce the day-to-day variability, in particular the characteristics of the hot period. The observed spatial variability is underestimated by ECMWF, probably because of its larger horizontal grid size and the more extended area covered by its nine extracted grid points.

Meteorological variables
Figure 5 presents the same type of time series as Fig. 3 for the observed and simulated 2 m temperature, 2 m water vapour mixing ratio and 10 m wind speed. First, all models reproduce the variability of the 2 m temperature over the period, in particular the warming from 24 to 27 June. In AROME and ARPEGE, the daytime temperature maximum occurs earlier (by about 1 h) than in the observations (this could not be analysed for ECMWF, whose outputs are 3-hourly). The main discrepancies occur at night, when the models tend to have a cold bias, consistent with common deficiencies of NWP models (Svensson et al., 2011). The spatial variability of night-time temperature among sites is smaller during the hot period, probably because of the higher wind speed at that time (as shown by LeMone et al., 2003, and Acevedo and Fitzjarrald, 2001). The models do not reproduce this behaviour: during the hot period, they predict an increase in the spatial variability of both the night-time sensible heat flux and the 2 m temperature. The underestimation of the spatial variability by AROME and ARPEGE on most days is not due to a misrepresentation of the wind, which was relatively weak over the whole period and broadly in agreement with observations. ECMWF overestimates the spatial variability, partly because its westerly grid points are warmer (not shown). The diurnal cycle of the spatial variability in ECMWF is also inverted compared to observations, with higher variability during the day than at night. This needs further investigation.
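The remark that the 1 h shift could not be assessed for ECMWF follows from output sampling: with 3-hourly data, the time of the maximum can only be located to within a 3 h slot. A minimal sketch using idealized cosine diurnal cycles (hypothetical peak times of 15:00 UTC for the observations and 14:00 UTC for the model, not values from the campaign) shows that hourly model output recovers the 1 h lead, while 3-hourly model output compared with the same hourly observations does not.

```python
import numpy as np


def time_of_max(hours, values):
    """Sampled hour at which the series reaches its maximum."""
    return int(hours[int(np.argmax(values))])


def diurnal(hours, t_peak, mean=20.0, amp=8.0):
    """Idealized cosine diurnal cycle (deg C) peaking at t_peak (UTC hours)."""
    return mean + amp * np.cos(2.0 * np.pi * (hours - t_peak) / 24.0)


obs_peak, mod_peak = 15.0, 14.0      # hypothetical peak times, model 1 h early

hourly = np.arange(0, 24)            # hourly sampling (as for AROME and ARPEGE)
three_hourly = np.arange(0, 24, 3)   # 3-hourly sampling (as for ECMWF)

t_obs = time_of_max(hourly, diurnal(hourly, obs_peak))
lag_hourly = time_of_max(hourly, diurnal(hourly, mod_peak)) - t_obs
lag_3h = time_of_max(three_hourly, diurnal(three_hourly, mod_peak)) - t_obs
print(lag_hourly, lag_3h)  # -1 0
```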
Concerning the 2 m water vapour mixing ratio, the models reproduce the progressive moistening before a precipitating event (the days with precipitation were not IOPs and thus correspond to a time interruption in Fig. 4, indicated by the double vertical dotted lines). Observations often show morning and evening maxima (e.g. 19, 27 and 30 June and 1 and 2 July), associated with the latent heat flux within a shallow boundary layer, and this is reproduced by the models. The models also reproduce the increase in spatial variability during the hot period. There is no clear diurnal cycle in the observations or the models, except in ECMWF, which presents a midday drying leading to a daytime dry bias, especially in the second part of the period. The overestimation of the latent heat flux by AROME has no clear consequence for the 2 m water vapour mixing ratio. Concerning the 10 m wind speed, ARPEGE and AROME reproduce the higher wind speeds (greater than 2-3 m s−1) during the hot period, together with a larger spatial variability. ECMWF does not reproduce this shift.
In summary, the surface meteorological variables were well simulated by AROME and ARPEGE but slightly less accurately by ECMWF, especially the wind speed and the water vapour mixing ratio. In the following sections, we focus only on the French models, for which hourly outputs are available.

Vertical structure
Figure 6 presents scatter plots of the simulated vs. observed potential temperature and water vapour mixing ratio averaged over the lowest 500 m. First, all types of observations agree well for potential temperature. However, the MODEM soundings are drier than the others by about 1 g kg−1, consistent with the findings of Agusti-Panareda et al. (2010). AROME and ARPEGE display a cold bias of about 1.5 K. In ARPEGE, the temperature bias depends on the average temperature, with less bias for temperatures above 305 K. ARPEGE does not present a warm bias despite its overestimation of the sensible heat flux at two of the grid points. AROME presents a moist bias, consistent with its excessive latent heat flux, while ARPEGE exhibits a dry bias. The AROME moist and cold biases are not evident in the time evolution of the 2 m variables, indicating that the surface layer and the boundary layer are reproduced differently.
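The text does not specify how the 0-500 m averages in Fig. 6 were computed. One common choice, assumed here, is a height-weighted (trapezoidal) mean between the lowest level and 500 m, with the profile linearly interpolated to 500 m; the helper `layer_mean` and the idealized profiles are hypothetical. The bias is then simply the difference between the modelled and observed layer means.

```python
import numpy as np


def layer_mean(z, x, z_top=500.0):
    """Height-weighted (trapezoidal) mean of x from the lowest level to z_top.

    Hypothetical helper. z must be strictly increasing (m above ground) with
    its lowest level below z_top; x is linearly interpolated to z_top.
    """
    z = np.asarray(z, dtype=float)
    x = np.asarray(x, dtype=float)
    keep = z < z_top
    zz = np.append(z[keep], z_top)
    xx = np.append(x[keep], np.interp(z_top, z, x))
    # Trapezoidal integral of x over height, divided by the layer thickness.
    integral = np.sum(0.5 * (xx[1:] + xx[:-1]) * np.diff(zz))
    return float(integral / (zz[-1] - zz[0]))


# Hypothetical potential temperature profiles (K).
z = np.array([0.0, 100.0, 250.0, 400.0, 600.0])
theta_obs = 300.0 + 0.004 * z
theta_mod = theta_obs - 1.5          # a uniform 1.5 K cold bias

bias = layer_mean(z, theta_mod) - layer_mean(z, theta_obs)
print(round(layer_mean(z, theta_obs), 3), round(bias, 3))  # 301.0 -1.5
```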
Figure 7 illustrates the time evolution of the vertical profiles of potential temperature and water vapour mixing ratio (sampled every 2 h for clarity) from 12:00 to 20:00 UTC for two clear IOPs: 27 June 2011 (one of the hot days) and 1 July 2011. AROME better captures the strong potential temperature inversion at the top of the boundary layer (at 14:00 UTC on 27 June and 1 July), and this is true for most of the IOPs. This may be due to its finer vertical grid. In both models, the spatial variability is larger during the hot period than on other days and remains so throughout the day, consistent with the results at the surface shown previously (higher variability of surface heat fluxes and 2 m meteorological variables). In AROME in particular, on 27 June, the variability among the 16 columns is larger than that among the 3 ARPEGE columns, even though the area covered by the 16 AROME points is equivalent to a single ARPEGE grid box. For 1 July, note the maximum in water vapour mixing ratio simulated by AROME in the upper part of the boundary layer; this maximum is also observed in the radiosoundings. Analysis of the moisture budget indicated that this maximum was mainly due to fine-scale advection not resolved at 10 km (not shown).

Figure 7. Vertical profiles of potential temperature and water vapour mixing ratio for observations (left panels), AROME (middle panels) and ARPEGE (right panels) for 2 days: 27 June 2011 (upper panels) and 1 July 2011 (lower panels). For clarity, the vertical profiles are offset by adding 2 K or 2 g kg−1 every 2 h from 14:00 to 20:00 UTC.
To further assess the representation of the vertical structure of the boundary layer, we compare the boundary-layer depth estimated by the models with that estimated from observations. The boundary-layer depth is a useful diagnostic for evaluating the boundary-layer evolution in models, as it results from the interplay of surface fluxes, turbulence and subsidence (LeMone et al., 2013). Figure 8 presents the time evolution of the different boundary-layer depth estimates for all the IOPs. The overestimation of the boundary-layer depth by AROME and ARPEGE (more pronounced in ARPEGE) on 14 and 15 June 2011 arises because the model boundary-layer depth criterion, based on significant TKE, then marks the top of the shallow cumulus layer. Both AROME and ARPEGE reproduce the contrast between days with deeper and shallower boundary layers, for instance a shallower boundary layer during the hot days and the deepest ones on 30 June and 1 and 2 July (discarding 14 and 15 June). The model forecasts are initialized every day, so part of the variability among the IOPs is imposed by the initial state, but the variability of the boundary-layer depth among the IOPs shows that the model physics responds correctly to these differences in weather. Lothon et al. (2014) identified three types of morning boundary-layer growth: typical growth on 20, 24, 25 and 30 June and 2 July, slow growth on 26 and 27 June and 5 July, and rapid growth on 14 and 19 June and 1 July. The different types of morning growth are related to the initial profiles, the intensity of the sensible heat flux and the intensity of the subsidence, as explained in Lothon et al. (2014). This distinction is reproduced by the models. Evaluating the decay of the boundary layer in the afternoon is more complex. The aerosol diagnosis based on the lidar always shows the top of the inversion layer in the afternoon, while the profile diagnosis and the reflectivity gradient from the UHF indicate either the top of the stable layer or the top of the residual layer, depending on the case. The model diagnosis depicts the top of the turbulent layer, as does the boundary-layer depth diagnosed from the dissipation rate measured by the UHF. The difference between these diagnoses in the afternoon reveals a pre-residual layer between the top of the turbulent layer and the top of the inversion layer, as detailed in Nilsson et al. (2015b). Concerning the decay of the turbulent layer, ARPEGE usually predicts a later decrease than AROME. AROME agrees better with the boundary-layer depth diagnosed from the dissipation rate, although it tends to give slightly higher values; this could be explained by the use of a different turbulence variable to diagnose the boundary-layer depth: TKE instead of dissipation. The large spatial variability among the model grid points is also worth noting, in particular on 26 and 27 June and 2 and 5 July. However, the deepest boundary layer is not systematically found at the same grid point, so this variability cannot be explained by particular surface characteristics.
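The model boundary-layer depth is described above as the height where TKE stops being significant. A minimal sketch of such a diagnostic, assuming a hypothetical threshold of 0.01 m2 s−2 (the actual criterion used in AROME and ARPEGE is not given here) and scanning upward from the lowest level, is shown below; `bl_depth_from_tke` and both profiles are hypothetical. The second profile illustrates the failure on 14 and 15 June: when significant TKE extends continuously from the subcloud layer into a shallow cumulus layer, the first quiet level is the cloud top.

```python
import numpy as np


def bl_depth_from_tke(z, tke, threshold=0.01):
    """Lowest height where TKE falls below `threshold` (hypothetical value,
    m2 s-2), scanning upward; linearly interpolated between levels."""
    z = np.asarray(z, dtype=float)
    tke = np.asarray(tke, dtype=float)
    quiet = np.nonzero(tke < threshold)[0]
    if quiet.size == 0:
        return float("nan")       # significant TKE throughout the column
    k = quiet[0]
    if k == 0:
        return float(z[0])        # no significant TKE even at the first level
    frac = (tke[k - 1] - threshold) / (tke[k - 1] - tke[k])
    return float(z[k - 1] + frac * (z[k] - z[k - 1]))


# Hypothetical clear-sky profile: turbulence dies out near 1 km.
z_clear = [10, 200, 500, 800, 1000, 1200]
tke_clear = [0.8, 1.0, 0.9, 0.5, 0.005, 0.0]

# Hypothetical profile with TKE continuing into a shallow cumulus layer:
# the diagnostic then returns the cloud top, not the subcloud-layer top.
z_cu = [10, 500, 1000, 1500, 2000, 2500]
tke_cu = [0.8, 0.6, 0.3, 0.2, 0.15, 0.0]

print(round(bl_depth_from_tke(z_clear, tke_clear)))  # 998
print(round(bl_depth_from_tke(z_cu, tke_cu)))        # 2467
```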

Turbulent kinetic energy
A unique feature of this campaign was the availability of several simultaneous measurements of the turbulent kinetic energy at various heights in the atmosphere. We used these measurements to evaluate the TKE produced by the subgrid turbulence scheme in AROME and ARPEGE. We remind the reader that, despite its fine resolution of 2.5 km, AROME did not simulate resolved eddies, and that we included the mass-flux contribution in the total TKE.
Figure 9 presents the time evolution of the TKE for all the IOPs close to the surface and higher in the boundary layer. In the upper panel, the TKE observed close to the surface, at ∼8 m, is compared with the TKE modelled at the first level (11 m in AROME and 17.5 m in ARPEGE). Observations often show significant TKE in the morning, which is not simulated except on a few days (25, 26 and 27 June for AROME and 24 June for ARPEGE) characterized by stronger winds and therefore stronger shear production (Fig. 5c). There is also significant TKE in the evening, with a minimum around sunset, which is likewise not simulated except on a few days (20, 25 and 26 June and 5 July for AROME and 5 July for ARPEGE). This TKE minimum is associated with a wind speed minimum and is present on most days with weak wind. Note that the maximum measured on the evening of 27 June was associated with convective storms and is reproduced by the models. These morning and evening TKE values are related to slope winds and potentially also to the nocturnal low-level jet in the early morning. ARPEGE tends to present a Gaussian-shaped diurnal cycle of TKE on most days (except 24 and 27 June and 5 July, when the TKE maximum occurs in the morning or the evening), with a maximum value consistent with observations. AROME systematically underestimates the maximum value but produces a diurnal cycle that varies from one day to another. This underestimation apparently contradicts its larger sensible heat flux, at least near the end of the period. The higher values in ARPEGE can be explained by a higher first model level (17.5 m vs. 11 m, as less turbulence is expected close to the ground) and a larger grid size (9 km vs. 2.5 km). Higher in the atmosphere, the modelled and observed TKE values agree better. Note that the various types of observations agree in terms of intensity. The temporal variability at these levels is well reproduced by the models, with smaller values during the hot period, consistent with the lower buoyancy flux, which is the main source of TKE during the day (see also Nilsson et al., 2015a). At 60 m and above, AROME systematically has less TKE than ARPEGE, as expected from its smaller grid size.
Figure 10 illustrates the time evolution of the modelled and observed vertical profiles of TKE on 1 July (the only day with enough observations to retrieve a time-varying vertical profile of TKE). AROME has larger TKE than ARPEGE around midday, and its turbulence decays more rapidly. The shapes of the vertical profiles are consistent between each model and the observations. The lidar observations (triangles; note that this is a TKE estimate deduced from the variance of the vertical velocity) indicate a roughly stationary value in the middle of the boundary layer from 14:00 to 16:00 UTC, which is not simulated by the models. However, the lidar only measures the vertical velocity variance, and the TKE estimate assumes A = 1 (same contribution from vertical and horizontal velocity variances). A comparison of the square (tethered balloon) and triangle (Doppler lidar) symbols of the same colour and at the same altitude gives an idea of the error in this estimate: the assumption A = 1 underestimates A during daytime, when values are closer to 1.3-1.8 (smaller contribution from vertical wind variances), and overestimates it in late afternoon (17:00 and 18:00 UTC), when A is around 0.4-0.8 (stronger contribution from horizontal wind variances). This deserves further investigation with more measurements of vertical profiles. The shear and buoyancy contributions to TKE production, and the TKE budget in general, could also be analysed further in observations and models.
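The text does not spell out how A enters the lidar estimate. As a sketch, assuming the standard definition e = ½(σu² + σv² + σw²), an estimate from the vertical variance alone under isotropy is e ≈ (3/2)σw². The ratio of an independently measured TKE (for example from the tethered balloon) to this isotropic estimate then quantifies the error: it exceeds 1 when the mean horizontal variance exceeds the vertical one and falls below 1 in the reverse case. We do not claim that this ratio is identical to the A used above; the function names and variances are hypothetical.

```python
def tke_full(su2, sv2, sw2):
    """TKE (m2 s-2) from the three velocity variances: 0.5*(su2 + sv2 + sw2)."""
    return 0.5 * (su2 + sv2 + sw2)


def tke_isotropic(sw2):
    """Estimate from the vertical variance alone, assuming each horizontal
    variance equals the vertical one: 0.5 * 3 * sw2 = 1.5 * sw2."""
    return 1.5 * sw2


def isotropy_error_ratio(tke_measured, sw2):
    """Measured TKE divided by the isotropic estimate (hypothetical metric).
    > 1: mean horizontal variance exceeds the vertical one; < 1: the reverse."""
    return tke_measured / tke_isotropic(sw2)


# Hypothetical variances (m2 s-2): an isotropic case and a case where
# the horizontal variances dominate.
r_iso = isotropy_error_ratio(tke_full(0.5, 0.5, 0.5), 0.5)
r_hor = isotropy_error_ratio(tke_full(0.9, 0.9, 0.6), 0.6)
print(round(r_iso, 3), round(r_hor, 3))  # 1.0 1.333
```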

Afternoon transition
In this section, we focus on the afternoon transition. During this period, turbulence changes from a fully convective regime, close to homogeneous and isotropic, towards more heterogeneous and intermittent turbulence. Most of the terms in the TKE equation (buoyancy production, shear production, dissipation and vertical transport) are small (Nilsson et al., 2015b).
Concerning the evolution of the boundary layer in the afternoon, the IOPs can be separated into the two categories proposed by Grimsdell and Angevine (2002)   1 and 2 July pertaining to the inversion layer separation (ILS) cases, in which the height of the reflectivity gradient stays roughly at the maximum reached during the day, and 25, 26 and 27 June pertaining to the descent cases, in which the height of the reflectivity gradient decreases with time in the evening. As in Grimsdell and Angevine (2002), the ILS cases are colder and drier days characterized by a strong potential temperature inversion at the top of the boundary layer, associated with strong shear as shown in Nilsson et al. (2015a). The models also reproduce the strong inversion in these cases (not shown except for 1 July). The descent cases are warmer and moister days corresponding to the hot period. However, the height of the strongest gradient in UHF reflectivity is more representative of the top of the inversion layer and does not really mark the top of the turbulent layer, which is better indicated by the height derived from the dissipation rate (pink in Fig. 8). This height is more comparable to the boundary-layer depth diagnosed in the models, which makes sense as TKE and dissipation rate are closely related. AROME always predicts an earlier decrease of turbulence than ARPEGE and agrees better with the evolution of the height derived from the dissipation rate.
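Grimsdell and Angevine (2002) distinguish the two categories qualitatively, from whether the height of the strongest UHF reflectivity gradient stays near its daily maximum or descends in the evening. A hypothetical threshold version of that rule is sketched below: a day is labelled ILS if the mean evening height (from 17:00 UTC) is at least 80 % of the daily maximum. Both the 17:00 UTC start and the 80 % fraction are illustrative assumptions, not values from this study, and `classify_evening` and the height series are hypothetical.

```python
import numpy as np


def classify_evening(hours, zi, evening_start=17.0, frac=0.8):
    """Label a day 'ILS' if the evening reflectivity-gradient height stays
    near its daily maximum, else 'descent'. Thresholds are hypothetical."""
    hours = np.asarray(hours, dtype=float)
    zi = np.asarray(zi, dtype=float)
    z_max = np.nanmax(zi)
    evening_mean = np.nanmean(zi[hours >= evening_start])
    return "ILS" if evening_mean >= frac * z_max else "descent"


hours = list(range(12, 21))  # 12:00 to 20:00 UTC
# Hypothetical heights (m) of the strongest reflectivity gradient.
zi_ils = [1200, 1400, 1500, 1500, 1480, 1500, 1490, 1500, 1470]
zi_descent = [1200, 1400, 1500, 1450, 1300, 1100, 900, 700, 600]

print(classify_evening(hours, zi_ils))      # ILS
print(classify_evening(hours, zi_descent))  # descent
```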
The layer between the pink and the red symbols was named the pre-residual layer by Nilsson et al. (2015b). It is characterized by very low turbulence and results from the adjustment of turbulence to the decreasing surface fluxes (Darbieu et al., 2015). Figure 11 shows, for each IOP and each point, the time t_Hv0 at which the virtual temperature flux (a combination of the surface sensible and latent heat fluxes) becomes negative. In the observations, this time varies strongly from one surface to another, as already shown by Lothon et al. (2014, their Fig. 8 and black symbols in Fig. 9), suggesting that the vegetation partly controls the delay of the transition from one site to another. The range of t_Hv0 among the three ARPEGE points (blue symbols) is less than 1 h except during the hot period (26 and 27 June) and on 1 July. The range is much larger in AROME (cyan symbols), from 2 to 6 h, though with no systematic behaviour for a given point (indicated by a given symbol). AROME systematically has an earlier t_Hv0 than ARPEGE, consistent with its earlier decrease of turbulence. The transition time t_Hv0 also occurs earlier during the hot period than on other days, and the models reproduce this. In both observations and models, the spatial variability is strongest during the hot period.
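The virtual temperature (buoyancy) flux combines the two surface fluxes; a common approximation is Hv ≈ H + 0.61 cp T LE / Lv, roughly H + 0.07 LE near 295 K. The sketch below, with hypothetical afternoon fluxes, computes Hv and the time t_Hv0 at which it first becomes negative after 12:00 UTC, by linear interpolation between hourly samples. Whether the study used exactly this approximation and interpolation is an assumption; `virtual_heat_flux` and `time_hv_zero` are hypothetical names.

```python
import numpy as np

CP = 1005.0   # J kg-1 K-1, specific heat of dry air at constant pressure
LV = 2.5e6    # J kg-1, latent heat of vaporization


def virtual_heat_flux(h, le, t_kelvin=295.0):
    """Approximate buoyancy (virtual temperature) flux in W m-2:
    H + 0.61 cp T LE / Lv (hypothetical helper)."""
    h = np.asarray(h, dtype=float)
    le = np.asarray(le, dtype=float)
    return h + 0.61 * CP * t_kelvin / LV * le


def time_hv_zero(hours, hv, after=12.0):
    """First time after `after` (UTC hours) when hv goes from positive to
    zero or negative, linearly interpolated between samples; NaN if never."""
    hours = np.asarray(hours, dtype=float)
    hv = np.asarray(hv, dtype=float)
    for i in range(len(hours) - 1):
        if hours[i] >= after and hv[i] > 0.0 >= hv[i + 1]:
            frac = hv[i] / (hv[i] - hv[i + 1])
            return float(hours[i] + frac * (hours[i + 1] - hours[i]))
    return float("nan")


# Hypothetical hourly surface fluxes (W m-2) from 12:00 to 19:00 UTC.
hours = np.arange(12, 20)
H = np.array([200, 170, 130, 80, 30, -5, -20, -25], dtype=float)
LE = np.array([300, 280, 240, 180, 120, 60, 20, 5], dtype=float)

t_h = time_hv_zero(hours, H)                          # sensible heat flux alone
t_hv = time_hv_zero(hours, virtual_heat_flux(H, LE))  # buoyancy flux
print(round(t_h, 2), round(t_hv, 2))  # 16.86 16.98
```

Because LE remains positive late in the afternoon, the buoyancy flux changes sign later than the sensible heat flux alone in this example.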
In summary, the models perform relatively well during the afternoon. This could be related to the quasi-stationary behaviour discussed in Darbieu et al. (2015) and Nilsson et al. (2015a), in which no changes in turbulence structure or characteristics are evident after normalization by the decreasing surface sensible heat flux. The difficulties increase in the very late afternoon. The models also have more difficulty reproducing the varying characteristics of near-surface variables at night, which highlights their difficulties in representing stable conditions.

Conclusions
The BLLAST field campaign gathered a large dataset, in particular high-frequency observations of the vertical structure of the boundary layer and observations of the turbulent kinetic energy, which enabled us to evaluate three numerical weather prediction models extensively. In summary, all models reproduced the temporal variability observed among the different IOPs in terms of cloud amount (clear vs. partly cloudy conditions), maximum boundary-layer height and temperature. For instance, during the hot period, models and observations showed lower sensible heat fluxes, higher temperatures, stronger winds and weaker TKE than during the other days. This is a necessary first step if such models are to be used to derive large-scale fields, e.g. the large-scale advection needed for smaller-scale modelling studies. The different types of boundary-layer growth encountered during the field campaign and detailed in Lothon et al. (2014) were correctly distinguished by AROME and ARPEGE. However, systematic biases appeared over the 12 IOPs: excessive latent heat fluxes in AROME and, for ECMWF, an excessive diurnal amplitude of 2 m relative humidity and a daytime dry bias (especially at the end of the period). At two ARPEGE points, the surface fluxes were similar to those measured over forest, but satellite data do not indicate a homogeneous forest cover over the corresponding 10 × 10 km2 areas. AROME better reproduced the vertical structures, as well as the variability in boundary-layer depth among the IOPs in terms of daily maximum and morning growth. The spatial variability reproduced by AROME was similar to that derived from the various in situ surface sites.
For the first time, turbulent kinetic energy, the prognostic variable of the turbulence scheme in AROME and ARPEGE, has been evaluated against observations. Both models reproduced the right order of magnitude. AROME better reproduced the day-to-day variation of its diurnal cycle, while ARPEGE always predicted a similar bell-shaped evolution. However, AROME underestimated the intensity, while ARPEGE was in better agreement with the observed values (note that the contribution of the mass-flux scheme to the TKE was taken into account here). This difference may be due not only to grid size but also to physical parametrization. In a future study, some insight could be gained by evaluating the different simulated terms of the near-surface TKE budget, which have also been derived from observations by Nilsson et al. (2015a).
Overall, this study is a first attempt to analyse the improvements provided by high-resolution numerical weather prediction. AROME seemed to depict the mesoscale spatial and temporal variability better. However, future studies are needed to determine the respective roles of the increase in resolution and of the change in physical parametrization.

Data availability
The data used in this study are freely available in the BLLAST database: http://bllast.sedoo.fr/database/.
Acknowledgements. The authors thank F. Said for providing the tower measurements, J. Reuder for providing the SUMO measurements, F. Gibert for providing the lidar measurements, P. Augustin for the boundary-layer diagnostic derived from the lidar, E. Pardyjak, D. Alexander and C. Darbieu for the forest flux measurements, D. Legain for the contribution of the CNRM to the field campaign, P. Durand for processing the Piper Aztec turbulence data, B. Piguet for processing the tethered-balloon turbulence data and LEOSPHERE for providing the Doppler lidar to the field campaign. The BLLAST field experiment was made possible thanks to the contribution and support of several institutions: INSU-CNRS (Institut National des Sciences de l'Univers, Centre National de la Recherche Scientifique, LEFE-IMAGO program), Météo-France, Observatoire Midi-Pyrénées (University of Toulouse), EUFAR (European Facility for Airborne Research), BLLATE-1&2 and COST ES0802 (European Cooperation in the field of Scientific and Technical Research). The field experiment would not have occurred without the contribution of all participating European and American research groups, which have all contributed significantly. The Piper Aztec research aeroplane is operated by SAFIRE, a unit supported by INSU-CNRS, Météo-France and the French Spatial Agency (CNES). The BLLAST field experiment was hosted by the instrumented site of the Centre de Recherches Atmosphériques, Lannemezan, France (Observatoire Midi-Pyrénées, Laboratoire d'Aérologie). This research has also been carried out in the framework of the DEPHY2 project, supported by INSU-CNRS through the LEFE-IMAGO program and the Ministry for Environment, Energy and the Sea.

Edited by: S. Galmarini
Reviewed by: M. LeMone and one anonymous referee

Figure 1. (a) Map of the different points extracted from the models (red for ECMWF, blue for ARPEGE and cyan for AROME). (b) Zoom of (a), with surface sites shown by small yellow dots and radiosounding launch sites by large orange dots. Note that the westernmost site was used for launching the few GRAW soundings, which were not used in this study (Google Earth source).

Figure 3. Time series of 24 h sequences for the 12 IOPs of (a) surface downwelling solar flux, (b) sensible heat flux and (c) latent heat flux, measured at the surface sites (black) and simulated by ARPEGE (blue), AROME (green) and ECMWF (red): mean value (left axis) and maximum horizontal range (right axis), computed as the difference between the maximum and minimum values over all sites or all grid points of a given model and averaged separately over day and night. For the observations, the range is computed both with all sites (full line) and without the forest stations (dash-dotted line). The vertical grey shading marks night-time. Two consecutive vertical dashed lines indicate an interruption between days. Note that for ARPEGE, because of the different behaviour of ARP1 and ARP3, only ARP2 is plotted as the mean, while the spatial variability is computed with all three points.

Figure 4. Time series of 24 h sequences for the 12 IOPs of (a) sensible heat flux and (b) latent heat flux. Measurements over several surfaces are shown by a black curve for the mean, with the horizontal standard deviation indicated by error bars; the dashed and dot-dashed black lines correspond to the observations over the forest sites, which are included in neither the mean nor the horizontal standard deviation. Values simulated by ARPEGE are shown in dark blue for point 2, light green for point 1 and green for point 3.

Figure 5. Time series of 24 h sequences for the 12 IOPs of (a) 2 m temperature, (b) 2 m water vapour mixing ratio and (c) 10 m wind speed, measured over several surfaces (black) and simulated by ARPEGE (blue), AROME (green) and ECMWF (red): mean value (left axis) and maximum horizontal range (right axis), computed as the difference between the maximum and minimum values over all sites or all grid points of a given model and averaged separately over day and night. The vertical grey shading marks night-time. Two consecutive vertical dashed lines indicate an interruption between days. Note that for ARPEGE, because of the different behaviour of ARP1 and ARP3, only ARP2 is plotted as the mean, while the spatial variability is computed with all three points.

Figure 6. Scatter plots of (a, b, c) potential temperature and (d, e, f) water vapour mixing ratio averaged over the lowest 500 m: (a, d) AROME vs. observed values, (b, e) ARPEGE vs. observed values and (c, f) values from the Vaisala and SUMO profiles vs. values from the MODEM profiles. Symbols vary from one day to another and colours from one time to another (see legend).

Figure 8. Time series of boundary-layer height observed by aerosol lidar (orange diamonds), UHF (from reflectivity, red squares; from dissipation, pink triangles) and radiosoundings or SUMO profiles (green stars), or simulated by ARPEGE (blue triangles) or AROME (cyan full circles), for each IOP. As indicated in the text, no value is drawn for ARPEGE and AROME after 14:00 UTC on 14 and 15 June, because the presence of clouds means that the boundary-layer height diagnostic actually depicts the top of the shallow clouds.

Figure 9. Time series of turbulent kinetic energy observed (symbols) and simulated by AROME (full line) and ARPEGE (dotted line) at (top) 8 m above ground level for observations, 11 m for AROME and 17.5 m for ARPEGE, (middle) 60 m above ground level and (bottom) 100, 300 and 600 m above ground level for the different IOPs from 15 June to 5 July.

Figure 10. Vertical profiles of the turbulent kinetic energy modelled by AROME (full lines) and ARPEGE (dotted lines) from 12:00 to 18:00 UTC on 1 July 2011 (see legend); when available, observations are overplotted.

Figure 11. Time (on the x axis) when the virtual temperature flux becomes negative for surface station observations (black symbols), the ARPEGE grid points (blue symbols) or the AROME grid points (cyan symbols) for each IOP plotted on the y axis.

Figure 12. Latent heat flux vs. sensible heat flux at 12:00 UTC in observations (black for clear days and grey for cloudy days; dots correspond to the observations over the forest and crossed squares to the observations at 60 m on the 60 m tower) and models (AROME in cyan for clear days and green for cloudy days; ARPEGE in blue for clear days and purple for cloudy days). One symbol is plotted for each IOP.

Table 1. List of the instruments and their spatial and temporal resolutions.

Table 2. Description of the three models.

Table 3. Surface characteristics of the various points extracted from the models: albedo, vegetation fraction (the complement being bare soil), leaf area index (LAI) and roughness length, each given as the total value for the grid point. In ARPEGE and ECMWF, the roughness length takes the subgrid orography into account.