Regional effects of atmospheric aerosols on temperature: an evaluation of an ensemble of online coupled models

. The climate effect of atmospheric aerosols is associated with their in(cid:3)uence on the radiative budget of the Earth due to the direct aerosol(cid:150)radiation interactions (ARIs) and indirect effects, resulting from aerosol(cid:150)cloud(cid:150)radiation interactions (ACIs). Online coupled meteorology(cid:150)chemistry models permit the description of these effects on the basis of simulated atmospheric aerosol concentrations, although there is still some uncertainty associated with the use of these models. Thus, the objective of this work is to assess whether the inclusion of atmospheric aerosol radiative feedbacks of an ensemble of online coupled models improves the simulation results for maximum, mean and minimum temperature at 2 m over Europe. The evaluated models outputs orig-inate from EuMetChem COST Action ES1004 simulations for Europe, differing in the inclusion (or omission) of ARI and ACI in the various models. The cases studies cover two important atmospheric aerosol episodes over Europe in the year 2010: (i) a heat wave event and a forest (cid:2)re episode (July(cid:150)August 2010) and (ii) a more humid episode including a Saharan desert dust outbreak in October 2010. The simulation results are evaluated against observational data from the E-OBS gridded database. The results indicate that, although there is only a slight improvement in the bias of the simulation results when including the radiative feedbacks, the spatiotemporal variability and correlation coef(cid:2)cients are improved for the cases under study when atmospheric aerosol radiative effects are included.


Introduction
Atmospheric aerosol particles are known to have an impact on Earth's radiative budget due to their interaction with radiation and clouds properties, which is dependent on their optical, microphysical and chemical properties, and they are considered to be the most uncertain forcing agent.They influ-Published by Copernicus Publications on behalf of the European Geosciences Union.
ence climate by modifying the global energy balance through both absorption and scattering of radiation (direct effect) and by acting as cloud condensation nuclei, thus affecting clouds' droplet size distribution, lifetime (Twomey, 1977;Lohmann and Feichter, 2005;Chung, 2012) and reflectance (indirect effects) (Ghan and Schwartz, 2007;Yang et al., 2011).Depending on the atmospheric aerosol concentration, aerosolcloud interactions may result in an increase or decrease in the liquid water content, cloud cover and lifetime of lowlevel clouds and a suppression or enhancement of precipitation (Bangert et al., 2011).Moreover, aerosol absorption may decrease low-cloud cover by heating the air and reducing relative humidity.This leads to a positive radiative forcing, termed the semi-direct effect, which amplifies the warming influence of absorbing aerosols (Hansen et al., 1997).The Fifth Report of the Intergovernmental Panel on Climate Change (IPCC AR5) (Boucher et al., 2013;Myhre et al., 2013) distinguishes between aerosol-radiation interactions (ARIs), which encompass the aerosol direct and semidirect effect, and the aerosol-cloud interactions (ACIs), which encompass the indirect effects.
In order to account for these atmospheric aerosol effects, the use of fully coupled models is needed for meteorological, chemical and physical processes.Online coupled models include the interaction of atmospheric pollutants (gaseousphase compounds and aerosols) with meteorological variables (Baklanov et al., 2014).In this context, in its phase 2, the air quality model evaluation international initiative (AQMEII) (Alapaty et al., 2012;Galmarini et al., 2015) focused on the assessment of how well the current generation of coupled regional-scale air quality models can simulate the spatiotemporal variability in the optical and radiative characteristics of atmospheric aerosols and associated feedbacks among aerosols, radiation, clouds and precipitation.On this basis, a coordinated exercise of working groups 2 and 4 of the COST Action ES1004 European framework for online integrated air quality and meteorology modeling (EuMetChem, http://eumetchem.info)emerged in order to take into account the radiative feedbacks of atmospheric aerosol effects on meteorology.In this initiative, two important episodes with high loads of atmospheric aerosols were analyzed which were identified during the previous AQMEII phase 2 modeling intercomparison exercise (Galmarini et al., 2015).They were selected due to their strong potential for aerosol-radiation and aerosol-cloud-radiation interactions (Makar et al., 2015a, b;Forkel et al., 2015).
As a result of the AQMEII phase 2 initiative and Eu-MetChem COST Action, several studies covering the analysis of the ARI+ACI feedbacks to meteorology have been done (e.g., Baró et al., 2015;Forkel et al., 2015Forkel et al., , 2016;;Kong et al., 2015;San José et al., 2015).Focusing on the effects of including ARI+ACI on temperature, Forkel et al. (2015) focused on the 2010 Russian wildfire episode, where the presence of the atmospheric aerosols decreased the 2 m mean temperature during summer 2010 by 0.25 K over the tar-get area.For the same episode, Péré et al. (2014) showed daily mean surface temperature reductions between 0.2 and 2.6 K. Forkel et al. (2012) studied a 2-month episode (June to July 2006) in order to allow medium-range effects of the direct and indirect aerosol effect on meteorological variables and air quality.They found a slightly lower temperature over western Europe when including atmospheric aerosol feedbacks.This reduction followed the same pattern as the planetary boundary layer height.Moreover, during July 2006, Meier et al. (2012) found a general decrease of 0.14 K in 2 m temperature when simulating absorbing aerosol in upper layers compared to an aerosol-free troposphere over land surface.
However, all these studies are based on individual model evaluations and do not take into account an ensemble of regional models, in order to build confidence in model simulations and to characterize the uncertainty associated with the use of different modeling systems.Therefore, the objective of this work is to assess whether the outputs of an ensemble of regional online coupled model simulations including aerosol radiative feedbacks, during two important atmospheric aerosol episodes of the year 2010, improves the prognostic for maximum, mean and minimum temperature at 2 m over Europe.

Methodology
The analyzed model outputs are the results of a coordinated modeling exercise which was performed within the COST Action ES1004 (EuMetChem).In order to analyze the ARI or ARI+ACI effect on temperature, it was suggested to run three case studies for two episodes with different online coupled models with identical meteorological boundary conditions and anthropogenic emissions.The two considered episodes are (i) the Russian heat wave and wildfire episode in the summer of 2010 (25 July-15 August 2010) and (ii) an autumn Saharan dust episode, including dust transport to Europe (2-15 October 2010).
The weather conditions during the Russian forest fires were mainly dry and particularly hot, with light winds (Péré et al., 2014;Kong et al., 2015).During this situation, sealevel pressure (SLP) showed a high-pressure system over the northeast part of the Russian area, giving a strong positive SLP anomaly for this period.This resulted in a strong positive surface temperature anomaly accompanied by weak winds from the southeast (Baró et al., 2017).On the other hand, the dust period situation is characterized by a very deep trough with a vortex reaching 20 • N latitude.This situation is maintained for several days, causing a continuous transport in middle levels.It is also worth mentioning the blocking situation over all central Europe.The dust event was dominated by strong southeasterly wind.This may explain windblown dust emissions increasing with wind speed and being transported to some parts of the European area (Kong et al., 2015).For the chosen episodes, simulations with each model were performed with and without considering the atmospheric aerosol effects.Three different configurations were requested: the first one which does not consider any aerosol effect feedbacks to meteorology (no radiative feedbacks, NRF; C11 fire and C21 dust episode); second, where only aerosol-radiation interactions are considered (ARIs; C12 fire and C22 dust episode); and third, where aerosol-radiationcloud interactions are considered (ARI+ACI; C13 fire and C23 dust episode) (this case could not be submitted by all of the participants).Although the NRF case does not consider the aerosol effects and feedbacks, this configuration considers an assumption of 250 cm −3 used by WRF-Chem in the absence of ACI for estimating cloud droplet number.This number is used in the corresponding microphysics parameterization (Morrison or Lin).On the other hand, ARI uses this constant value for accounting the interaction between aerosols and clouds but allows the modification of the radiation budget by using the online estimated aerosols.Lastly, the ARI+ACI cases are based on simulated aerosol concentrations which interact both with radiation and aerosols.The common setup for the participating models and a unified output strategy allow analyzing the model output with respect to similarities and differences in the model response to the aerosol direct effect and aerosol-cloud interactions.

Participating models
An overview of the different models and their configurations is shown in Table 1, where in the first row the model acronym is shown.The participating models shown here are COSMO-MUSCAT (Wolke et al., 2012) and WRF-Chem (Grell et al., 2005;Fast et al., 2006;Gustafson Jr. et al., 2007;Chapman et al., 2009;Grell and Baklanov, 2011) with different chemistry and physics options.The table also includes the episodes run for each model.The horizontal grid spacing is around 25 km for most of the contributions.Only for the fire episode were the COSMO-MUSCAT simulations made with a grid with of 0.125 • (approximately 14 km); there is an additional WRF-Chem run with 9 km grid spacing.The COSMO models use Kessler-type bulk microphysics (Doms et al., 2011), and WRF-Chem uses Morrison microphysics (Morrison et al., 2009), except for one contribution that utilizes Lin (Lin et al., 1983).COSMO models use prognostic turbulence kinetic energy (TKE) (Doms et al., 2011) planetary boundary layer (PBL).The The Yonsei University (YSU) PBL scheme (Hong et al., 2006) was chosen for the WRF-Chem simulations.In general, the Modal Aerosol Dynamics Model for Europe (MADE) is applied (Ackermann et al., 1998) except for one WRF-Chem simulation, which uses the Model for Simulating Aerosol Interactions and Chemistry (MOSAIC) (four aerosol size bins) approach (Zaveri et al., 2008).For further information and details about the models, we refer to the work of Forkel et al. (2015), Im et al. (2015a), Im et al. (2015b) and Baró et al. (2015).To enable the cross compar- ison between models, the participating groups interpolated their model output to a common grid with 0.1 • resolution.
Moreover, the ensemble of the available simulations has also been included in this comparison, as recommended by several studies (Vautard et al., 2012;Jiménez-Guerrero et al., 2013;Landgren et al., 2014;Solazzo and Galmarini, 2015;Kioutsioukis et al., 2016), in order to check whether the design of an ensemble of simulations outperforms (or not) the skill of individual models.

Emissions and boundary conditions
For the EU domain, the anthropogenic emissions for the year 2009 (http://www.gmes-atmosphere.eu/)were applied by all modeling groups and are based on the TNO-MACC-II (Netherlands Organization for Applied Scientific Research, Monitoring Atmospheric Composition and Climate-Interim Implementation) framework (Kuenen et al., 2014;Pouliot et al., 2015).As described in Im et al. (2015a), annual emissions of methane (CH 4 ), carbon monoxide (CO),  ammonia (NH 3 ), total non-methane volatile organic compounds (NMVOCs), nitrogen oxides (NO x ), particulate matter (PM 10 and PM 2.5 ) and sulfur dioxide (SO 2 ) from 10 activity sectors are provided on a latitude-longitude grid of 1/8 × 1/16 resolution.Consistent temporal profiles (diurnal, day-of-week, seasonal) and vertical distributions were also made available to AQMEII and EuMetChem participating groups for time disaggregation.The temporal profiles for the EU anthropogenic emissions were provided from Schaap et al. (2005).For further details, the reader is referred to Im et al. (2015a) and Im et al. (2015b).
Hourly biomass burning emissions were provided by the Finnish Meteorological Institute (FMI) fire assimilation system (http://is4fires.fmi.fi/)(Sofiev et al., 2009).More details on the fire emissions and their uncertainties are discussed in Soares et al. (2015).The fire assimilation system only provides data for total PM emissions; the estimation of emissions for other species are described in Im et al. (2015b).
The chemical initial conditions (ICs) were provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) IFS-MOZART model and are available in 3 h time intervals and provided in daily files with eight different times of day per file.They were run under the MACC-II project (Monitoring Atmospheric Composition and Climate -Interim Implementation) which uses an updated data set of anthropogenic emissions and compiles satellite observation assimilations of O 3 , CO and NO 2 in the IFS-MOZART system.

Observational database
The comparison of regional models with gridded data sets has to be undertaken with care given the differences between available databases.For instance, Gómez-Navarro et al. (2012) showed that even in areas covered by dense monitoring networks, uncertainties in the observations are comparable to the uncertainties within state-of-the-art regional climate models, at least when they are driven by nominally perfect boundary conditions like reanalysis.This work uses the E-OBS (Haylock et al., 2008) version 11.0 gridded observational database for maximum, mean and minimum temperature.E-OBS is a high-resolution European land-only daily gridded data set covering the period 1950-2014.The E-OBS 0.25 • regular latitude-longitude grid has been used as the reference for validation.Thus, data from all model runs have been bilinearly interpolated onto the E-OBS grid.Since the resolution of the models is similar to that of E-OBS, the interpolation procedure is not expected to alter our results significantly.
The election of this gridded data set is based on the abundant scientific literature using E-OBS for the evaluation of regional climate models (e.g., Costa et al., 2012;Jiménez-Guerrero et al., 2013;Turco et al., 2013;Ceglar et al., 2014, among many others).However, several authors highlight the E-OBS limitations.Thus, Kysely and Plavcova (2010) compare E-OBS and a data set gridded onto the same grid from a high-density network of stations in the Czech Republic (GriSt), finding that large differences existed between the two gridded data sets, particularly for minimum temperatures and diurnal temperature range.The errors tended to be larger in tails of the distributions.Therefore, when evaluating regional models against one gridded data set, results have to be treated with caution.

Validation methodology
All the statistical measures are calculated at individual grid points.Only land grid points are considered in the analysis, since these are the only points where E-OBS contains information.Areas in grey indicate cells where E-OBS data are not available (southeastern part of the domain for the wildfires or southern part of the domain in the dust episode) or areas not covered by the modeling domain (southern part of the domain for the CS2 configuration).
We will use the notation V k ipc for a variable from model k at grid point i in period p = fires, dust and case c = 1 2, 3 representing no radiative feedbacks, ARI and ARI+ACI.If we use bracket notation for an average over a given index (e.g., • pc ), we can express the bias at a given grid point as where O ip is the value observed.The model bias is the simplest measure of model performance.
The ensemble mean, V k ipc k , is usually considered as an additional simulation which compensates for the errors of the different ensemble members.Even though this is a very simplistic view of the ensemble (which should be considered from a probabilistic point of view), it can be useful to reinforce the common signal of the different models in our analysis of the mean climate.Notice, however, that the ensemble mean is not a physical realization of any of the models, but just a statistical average (Knutti et al., 2010;Jiménez-Guerrero et al., 2013).R. Baró et al.: Regional effects of atmospheric aerosols on temperature Then, the variability was assessed on the hourly series (V k ipc ).The ability to represent the variability can be decomposed into the ability to represent its size, which can be represented by the standard deviation of the series: and can be compared to that of the observations sd[O] ip , and the ability to represent the hourly variations, which can be represented by the linear determination coefficient (ρ 2 ) with the observations The latter ability can only be expected in simulations nested into "perfect" boundary conditions such as those considered in this study.Finally, pattern agreement between simulated and observed data was quantified in a Taylor diagram by means of the spatial correlation (r) and the ratio between simulated and observed standard deviations, This information can be summarized in a Taylor (2001) diagram, which is a polar plot, with radial coordinate s k and angular coordinate related to r k .

Aerosol representation
In order to address the influence of aerosol effects on the surface temperature, it is crucial to have an understanding of the aerosol loading, both observed and modeled.For that purpose, aerosol optical depth (AOD) from the MODIS platform (Levy et al., 2013) is used, precisely Level 2 of the Atmospheric Aerosol Product (MxD04_L2), collection 6 (C6), with a resolution of 10 km.Palacios-Peña et al. (submitted to ACP, this issue) provide full details of the evaluation of the same set of models presented here against diverse satellite observations for AOD.The current contribution includes a brief description of the results.Figure 1 represents the model-MODIS comparison of AOD at 550 nm both for the fire and the dust episode.
For the Russian wildfire episode, the highest values of AOD measurements by MODIS (around 2.7) are found over Russia and the surroundings areas, due to the emissions produced by the wildfires.According to the estimation of the bias (MBE), all WRF-Chem simulations (CS1, CS2, ES1, ES3) and the ensemble underestimate AOD over the fireaffected areas (minimum MBE values for NRF: the ensemble −1.30; CS1 −1.46; CS2 −1.61; ES1 −1.46; and ES3 −1.62).Over the rest of the domain, a lower overestimation (around 0.5) is produced by the WRF-Chem simulations (maximum MBE values for NRF: CS1 0.55; CS2 0.37; ES1 0.45; and ES3 0.64) and the ensemble (maximum MBE values for NRF 0.23).For DE3, the underestimation is lower (minimum MBE values for NRF −0.72) and does not cover so large an area as the rest of the experiments; however, over the rest of the domain a higher overestimation is found in DE3 (maximum MBE values for NRF 2.61).Generally, for ARI and ARI+ACI simulations, slightly lower MBE values than NRF are found in all the experiments (for example, in ES1 simulations: NRF −1.46; ARI −1.43; ARI+ACI −1.41).However, the MBE for the ensemble (NRF −1.3; ARI −1.23; ARI+ACI −1.40) does not show this improvement, but this analysis should be treated with caution because the ARI+ACI ensemble does not include DE3 simulations.
For the dust episode, AOD values measured by MODIS > 0.5 are observed over the southeast of the domain due to the dust transported.This value is not very high for a dust outbreak, but this is caused by the wet deposition (rain during the episode).The highest AOD values, around 1.3, are found over a small area near the Po Valley.All experiments (no CS2 simulations are available in this case) underestimated high AOD values (over the southeast of the domain).MBE values over this area are around −0.3 for DE3, but for the rest of the experiments (WRF-Chem simulations) these values are around −0.2.However, small areas with a higher underestimation are found over this zone (minimum MBE values: the ensemble −0.73; CS1 −0.68; DE3 −0.84; ES1 −0.70; ES3 −0.67).Over the rest of the domain, small overestimations are modeled (MBE values around 0.1).Conversely, small, specific areas with a high overestimations are found (maximum MBE values for ENS 0.54; CS1 0.81; DE3 0.62; ES1 0.48; ES3 1.09).

Bias
The results for the daily bias of maximum, mean and minimum temperature have been obtained by calculating the bias of the daily mean series at each grid point of all the land grid points of the corresponding domain for the fire and dust episodes.They are summarized in Table 2 for the entire domain.Table 3 only considers the biases in those cells and time steps with a high load of aerosols (masking only those areas where 1 h AOD 550 > 1.0 in the fire case or 1 h AOD 550 > 0.5).
During the fire episode (Fig. 2a and c) there is a general underestimation of the maximum temperature in the base case (average domain values from −2.1 K in ES3-C11 to −1.2 K in DE3-C11 for the entire domain or −5.7 K in CS2-C12 to −3.0 K in DE3-C11 only in those areas with 1 h AOD 550 > 1.0).This is especially noticeable over several cells in Russia (up to −7 K).Conversely, a general overestimation is found in the west and northwest area of the domain (positive differences between +1.0 K in DE3-C11 and +6.5 K in ES1-C11).When introducing the ARI or ARI+ACI, model biases do not improve (mean variation in the bias of +17.2 % in C12 and +11.0 % in C13 for the entire domain).This positive variation was expected because of the cold bias of models for reproducing maximum temperature and the overall cooling effects of aerosols.However, the improvement of introducing aerosol-cloud interactions is remarkable with respect to the case of including just aerosolradiation effects (the bias reduces by 6.2 % in ARI+ACI with respect to ARI simulations).During the dust episode (Fig. 2b and d) the analysis of the results is similar to the fire case (averaged domain underestimations around −1.0 K in DE3-C11 to −0.56 K in ES1-C21; −4.1 K in DE3-C21 to −2.8 K in ES1-C11 only for areas and time steps where 1 h AOD 550 > 0.5).Here, the inclusion of ARI (C22) leads to a mean increase in the bias of +10.2 % for the entire domain, but ARI+ACI (C23) leads to a very limited improvement of the simulations with respect to the base case (C21), i.e., generally to reductions in the bias of around −0.4 %.
A similar discussion can be had for mean temperature.During the fire episode (Fig. 3a and c), all runs (except DE3) tend to underestimate the domain-averaged mean temperature (biases ranging from −0.4 K in ES1-C11 to +1.0 in DE3-C11; for those areas when AOD 550 > 1.0, biases range from −1.1 K in CS2-C13 to +1.0 in DE3-C11).Here, the ensemble (ENS) simulation clearly outperforms the individual simulations (bias of −0.2 K in ENS-C11 for the entire domain and −0.1 K in the high-AOD domain).Again, the model skill does not improve for mean temperature when including ARI or ARI+ACI (bias increase of 46.0 and 56.2 %, respectively, for the fire episode averaged over the entire domain) except in the case of the DE3-C12 simulation (including ARI reduces the bias by −27.3 %).During the dust episode (Fig. 3b and d), there is a general averaged overestimation of mean temperature (+0.4 in ES1-C21 to 0.8 K in DE3-C21; for those areas when AOD 550 > 0.5, biases range from −0.5 K in CS1-C21 to −0.1 in ES1-C11).Conversely to the fire episode, the inclusion of ARI and ARI+ACI improves the entire-domain bias (reductions in this variable of −13.4 % in C22 and −4.2 % in C23).The reduction in the bias when including ARI+ACI is especially remarkable for the ensemble of simulations, where the bias decreases by −24.4 % in ENS-C23 (averaged for the entire domain).
Lastly, minimum temperature during the fire episodes is shown in Fig. 4a and c.Here, the analysis of results regarding improvements or worsening of the bias is very different, since the domain-averaged errors are in the order of −0.01 K for WRF-based models in C11 and C12, so a very slight difference would lead to a percentage increase (or reduction) in the bias compared to the base case.However, for DE3-C11 the bias is larger (up to +1.6 K for minimum temperature averaged over all the domain) and the inclusion of ARI leads only to a small improvement (−1.5 %).Despite the conclusions being similar for areas with 1 h AOD 550 > 1.0, WRF-Chem based models present biases of around +3.0 to 3.5 K for the fire episode and of around +4.5 K for DE3-C11 and DE3-C12.The dust case (Fig. 4b and d) shows a general overestimation of minimum temperature for domainaveraged values, with base-case biases ranging from +0.5 K in ES1-C21 to +1.8 K in DE3-C21 (biases from +2.0 in ES3-C23 to +3.5 in DE3-C21 in areas with AOD 550 > 0.5).Here, the inclusion of ARI and ARI+ACI slightly improves the average bias for the entire domain (reductions of −10.5 % in C22 and −5.0 % in C23).Here, again, the improvement of the ENS-C22 and ENS-C23 simulations is larger than for the rest of the models (reductions in the bias of −29.7 and −38.2 % for ARI and ARI+ACI, respectively).Analogous discussions can be had for the masked domain according to the AOD 550 values.

Temporal correlation
The temporal correlation (estimated through the coefficient of determination, ρ 2 ) between simulated and observed series is shown in Figs. 5, 6 and 7 for maximum, mean and minimum temperature, in that order.The latter are also summarized in Table 4 for the entire domain.Table 5 only considers the temporal correlation in those cells and time steps with a high load of aerosols (masking only those areas where 1 h AOD 550 > 1.0 in the fire case or 1 h AOD 550 > 0.5).Since the values and conclusions are very similar, only the results from the entire domain are discussed below.
The first column in each panel represents the value of ρ 2 of the base case (C11 or C21) of each individual model (or the ensemble) with respect to the E-OBS database.The center (C12 or C22) and right (C13 and C23) columns indicate the increase (red values) or decrease (blue value) in the ρ 2 for each simulation with respect to the case not including feedbacks.Then, that gives an idea of the improvement (or not) in the skill of the model for representing the time evolution of our series when compared to the observations.For maximum, mean and minimum temperature during the fire episode (Figs.5a, 6a and 7a, respectively), domainaveraged ρ 2 is higher than 0.5 for all models (0.52 in CS1-C11 minimum temperature to 0.78 in DE3-C11 mean tem- perature).In general, coefficients of determination are highest for mean temperature (ranging from 0.60 to 0.78 depending on the individual model) with respect to minimum and maximum temperature.The variable with the lowest ρ 2 is minimum temperature (varying from 0.50 to 0.56 depending on the model).Moreover, the coefficient of determination for the ensemble is always higher than that of each individual model for the three studied variables (0.75, 0.79 and 0.61, respectively, for maximum, mean and minimum temperature).The highest ρ 2 values are found over the north and west part of the domain (above 0.8 in mean temperature) and the lowest mainly over the south and southeast area of the domain (under 0.2).According to the improvement with respect to the C11 case, when analyzing the inclusion of the ARI and ARI+ACI, a general improvement is observed for maximum and mean temperature, with positive values reaching up to 0.18 (domain-averaged values improve for individual models around 1 % for maximum and 0.3 % for mean temperature).Correlation with minima experiences a slight decrease (−0.4 %) when including ARI or ARI+ACI for the ensemble mean.
During the dust episode (Figs.5b, 6b and 7b), domainaveraged ρ 2 is higher than for the fire episode for all models and variables in the base case (0.76 in DE3-C21 minima to 0.90 in DE3-C21 mean temperature), with the ensemble again providing the highest correlation (values ranging from 0.88 for maximum, 0.91 for mean and 0.84 for minimum temperature).As before, the inclusion of the ARI and ARI+ACI shows an improvement over some areas in the order of 0.17 for mean and maximum temperature, with domain-averaged improvements of 0.3 % in C22-C23 for maximum temperature, 0.2 % in C22-C23 for mean temperature and 0.1 % in C23 for minimum temperature, with no improvement for C22 in this latter variable.

Temporal variability
The results for the daily variability in maximum, mean and minimum temperature have been obtained by calculating the standard deviation of the daily mean series at each grid point of all the land grid points of the corresponding domain for the fire and dust episodes.
Considering maximum temperature, in the fire episode (Fig. 8a and c), all runs tend to slightly overestimate the standard deviation of maximum temperature for the base case (no radiative feedbacks), with biases of maximum temperature standard deviation varying between +1.28 K for DE3-C11 to +0.25 K for ES1-C11.The biases of the standard deviation are reduced by −22.6 % (on average) when including the ARI, with reductions in the biases of the standard deviation ranging from −34.2 % in ES1-C12 and −8.6 % for DE3-C12.For the ARI+ACI simulations, the average reduction in the bias is −41.21 % (−56.9 % for ES1-C13 and −24.40 % for CS2-C13).The rest of the models and cases show an intermediate behavior for representing the variability, with the best skills always for the cases including the ARI+ACI interactions.Analogous results can be found for maximum temperature during the dust episode (Fig. 8b and d): the inclusion of aerosol feedbacks generally improves the represen-tation of the temporal variability in maximum temperature, with an average reduction in the bias of the standard deviation of −5.9 % (−16.6 %) for ARI (ARI+ACI) simulations.
For mean temperature during the fire episode (Fig. 9a and  c), all runs tend to overestimate the standard deviation for the base case (no radiative feedbacks), with biases of mean temperature standard deviation between +0.2 and +1.1 K.As for the maximum temperature, the biases of the standard deviation are reduced on −41.8 % (on average) when including the ARI and −66.5 % for the ARI+ACI simulations, with reductions in the biases of the standard deviation ranging from −8.5 % in the DE3-C12 simulation to −78.2 % in the ES1-C13 case.Similar to the maximum temperature, the rest of the models and cases show an intermediate representation of the variability in the mean temperature, with the best skills always for the cases including the ARI+ACI interactions.Results for the dust episode are shown in Fig. 9b and d.The standard deviation tends to be overestimated by all models in the north of Africa and central Europe, and underestimated in the eastern part of the target domain.Overall, the inclusion of  ARI does not lead to better skills of the models when representing the temporal variability (+2.4 %), and for ARI+ACI the skill improved only marginally (reductions of −0.6 %).
With respect to the minimum temperature, for the fire episode (Fig. 10a and c) all runs tend to overestimate the standard deviation.Biases of the minimum tempera- If considering the biases of the standard deviation, there is a slight improvement when including ARI or ARI+ACI for the fire episode, while a slight worsening is depicted for the dust case.The variations in the biases of the standard deviation are on average −2.1 and −4.9 %, respectively, for the ARI and ARI+ACI simulations (+3.4 and +5.4 % for the dust episode).

Spatial variability
Taylor diagrams (Taylor, 2001) allow an easy comparison between the spatial and temporal patterns of two fields (Rauscher et al., 2010).In Fig. 11 shows the relative spatial standard deviation (radial distance from the origin) and the correlation (the cosine of the angular coordinate) with E-OBS.Model results with good performance in terms of spatial variability and correlation are located closer to the standard deviation ratio 1 and correlation 1, which corresponds to E-OBS (indicated by the small black asterisk).For maximum, mean and minimum temperature, the diverse models (and configurations) show a narrow spread in the representation of the spatial structure of the standard deviation.
With respect to the mean field of maximum temperature (Fig. 11a and d), all models perform well for the fire period (panels a-c), with high spatial correlations (over 0.9) and a normalized standard deviation close to observations.However, the no radiative feedback configuration (C11 cases in Fig. 11) represents excessive spatial variability (standard deviation ratio over 1).The spatial variability in the daily standard deviation for the ARI simulations (asterisks in Fig. 11, C12 cases) as well as for ARI+ACI simulations (squares, C13 cases) is substantially improved, despite the spatial correlation remaining practically constant for all models.Since there is a positive bias in the models when representing the spatial variability in the no radiative feedbacks simulations, the inclusion of radiative effects reduces the variability and therefore improves its spatial patterns.Analogous results can be found for the dust episode (Fig. 11c-f), with a larger agreement between models and lower differences between C21, C22 and C23 cases (no feedbacks, ARI and ARI+ACI simulations, in that order).
With respect to the mean temperature (Fig. 11b and e), the models perform very similarly with each other, showing a high spatial correlation with the observations (over 0.9 for all models and cases), with a small overestimation of the spatial variability for the C11 (fire episode, no radiative feedbacks) case (panels a-c), which improves when including the ARI and ARI+ACI interactions.Similarly, the spatial variability is slightly overestimated for the C21 (dust, no radiative feedbacks) case, except for the DE3 model.Generally, the models capture the spatial structure of the variability during the fire and dust cases better (Fig. 11b and e) when including the radiative feedbacks.The correlation is only slightly improved for the ARI and ARI+ACI cases (except for ENS simula- tions, which will be discussed below) and is always higher for the mean temperature than for maximum temperature.
The minimum temperature (Fig. 11c and f) is captured well as the maximum and mean temperature.While for the fire episode the models (in all cases) tend to provide a higher spatial variability than the observations, the spatial variability is underestimated for the dust episode, but with a high correlation (over 0.9) for both episodes.For this variable, the improvement of including the radiative feedbacks is not as evident, since the spatial variability does not generally improve for C12, C13, C22 or C23 cases with respect to the configuration without radiative feedbacks.Moreover, the correlation coefficient is even slightly reduced with the inclusion of ARI or ARI+ACI.
Lastly, the added value of considering the ensemble mean of all available simulations in each episode and case is clear for the fire episode but not that obvious for the dust period.For the fire episode, the ensemble mean outperforms indi-vidual models in terms of the standard deviation and the correlation coefficient, especially for mean temperature, where the correlation increases up to 0.99 for the ENS-C11 case.The exception is found for the ENS-C13 for minimum temperature.Generally, the skill of most models improves when aerosol-meteorology interactions are taken into account For the dust case, the ensemble mean outperforms the individual models in representing the standard deviation (that is, the spatial variability).However, the spatial correlation coefficient is somewhat reduced as compared to the individual models.

Summary and conclusions
This study shows a collective operational evaluation of the temperature at 2 m (maximum, mean and minimum) simulated by the coupled chemistry and meteorology models under the umbrella of COST Action ES1004 for a wildfires and a dust episode in the year 2010.The meteorological parameters considered in this assessment are important to understand the effect of the aerosol interactions with clouds and radiation.In this sense, this study complements several other analyses (e.g., Brunner et al., 2015;Forkel et al., 2015;Makar et al., 2015b) by analyzing whether the inclusion of the radiative feedbacks improves the representation of the temperature field (maximum, mean and minimum) in an ensemble of simulations or not.
Focusing on the bias, in both episodes there is a general underestimation of the studied variables, which is most noticeable in maximum temperature.In general, there is no a straightforward conclusion with respect to the improvement (or not) of the bias when introducing aerosol radiative feedbacks.Broadly, the biases are improved when including ARI or ARI+ACI in the dust case, but no evident improvements are found for the heat wave/wildfire episode.Although the ensemble does not outperform the individual models (in general), the improvements found when including ARI and ARI+ACI are by far more remarkable for the ensemble than for the individual models.
With respect to the temporal correlation, maximum and mean temperatures in the fire and dust episodes show higher correlations over most of the domain when considering the C11 case with respect to the E-OBS database than minimum temperature.During these episodes, a twofold conclusion can be drawn: (1) the ensemble of simulations always outperforms the representation of the temporal variability in the series; and (2) an improvement of the ρ 2 coefficient is found when considering ARI or ARI+ACI feedbacks (in both episodes).
Regarding the temporal variability, during the fire episode there is a general, pronounced overestimation of the standard deviation of the studied variables.Here, the inclusion of aerosol feedbacks largely improves the representation of the temporal variability in the three studied variables (reduction in the bias of the standard deviation) showing the best skills for the cases including the ARI+ACI interactions, with a re- duction in bias of the standard deviation by as much as 75 %.Very similar results can be found for the dust episode.Generally, the inclusion of the aerosol radiative feedbacks shows the largest improvements for temporal variability and results in an added value of the computational effort made to include direct aerosol-radiation interactions and aerosol-cloud interactions in the models.Lastly, with respect to the spatial variability for maximum and mean temperature, the inclusion of radiative effects reduces the variability and improves the spatial patterns for both episodes.For the minimum temperature, the improvement of including the radiative feedbacks is less evident.
In order to further investigate the impact of including the aerosol interactions in online coupled models, more episodes with effects on the aerosol-radiation-cloud interactions should be considered.In this work, the fire episode represents a situation of clear skies, and therefore the ARI feedbacks are dominant.The dust episode election permits us to study aerosol-cloud interaction; most of the ARI+ACI differences found in the models with respect to the base case were found over the Mediterranean sea.Since the observational data from E-OBS only have values over land, the effect of ARI+ACI was not evaluated here.Unfortunately part of the interpretation of the results may be missing due to the unavailability of this database over the ocean.Furthermore, it should be pointed out that all results for the ARI+ACI cases were from WRF-Chem simulations, which may bias the ARI+ACI results towards the behavior of this model.
There are still modeling issues regarding the representation of the field of temperature, where maximum temperatures are underestimated and minimum temperatures are overestimated, and the inclusion of the aerosol feedbacks does not improve this situation.Nevertheless, in this study, a general improvement in the temporal variability and correlation has been seen.These improvements may be important not only for certain episodes, as analyzed here, by also for the representation of the climatology of temperatures.How- ever, climatically representative periods should be covered in further studies.
Data availability.The modeling data generated are available by contacting the corresponding managing organizations.The E-OBS database is available through the ECAD project platform (http: //www.ecad.eu).
Competing interests.The authors declare that they have no conflict of interest.Special issue statement.This article is part of the special issue "Coupled chemistry-meteorology modelling: status and relevance for numerical weather prediction, air quality and climate communities (SI of EuMetChem COST ES1004) (ACP/GMD inter-journal SI)".It is not associated with a conference.

Figure 1 .
Figure 1.Aerosol optical depth (AOD) at 550 nm for the fire (a) and dust (b) episodes, as derived from MODIS.The panels below represent the bias for the fire (c) and dust (d) episodes of each simulation with respect to the MODIS AOD.

Figure 2 .
Figure 2. Maximum temperature (TMAX) for the fire (a) and dust (b) episodes, as derived from the E-OBS database (in K).The panels below represent the bias for the fire (c) and dust (d) episodes of each simulation with respect to the E-OBS database.

Figure 3 .
Figure 3. Mean temperature (TEMP) for the fire (a) and dust (b) episodes, as derived from the E-OBS database (in K).The panels below represent the bias for the fire (c) and dust (d) episodes of each simulation with respect to the E-OBS database.

Figure 4 .
Figure 4. Minimum temperature (TMIN) for the fire (a) and dust (b) episodes, as derived from the E-OBS database (in K).The panels below represent the bias for the fire (c) and dust (d) episodes of each simulation with respect to the E-OBS database.

Figure 5 .
Figure 5.Time determination coefficient (ρ 2 ) (model vs. E-OBS) of the maximum temperature (TMAX) for the fire (a) and dust (b) episodes.The first column in each panel represents the value of ρ 2 of the no radiative feedback case with respect to the E-OBS database.The center and right columns indicate the increase (red values) or decrease (blue value) in each simulation with respect to the case not including feedbacks.

Figure 6 .
Figure 6.Time determination coefficient (ρ 2 ) (model vs. E-OBS) of the mean temperature (TEMP) for the fire (a) and dust (b) episodes.The first column in each panel represents the value of ρ 2 of the no radiative feedback case with respect to the E-OBS database.The center and right columns indicate the increase (red values) or decrease (blue value) in each simulation with respect to the case not including feedbacks.

Figure 7 .
Figure 7. Time determination coefficient (ρ 2 ) (model vs. E-OBS) of the minimum temperature (TMIN) for the fire (a) and dust (b) episodes.The first column in each panel represents the value of ρ 2 of the no radiative feedback case with respect to the E-OBS database.The center and right columns indicate the increase (red values) or decrease (blue value) in each simulation with respect to the case not including feedbacks.

Figure 8 .
Figure 8.Standard deviation (SD) of the maximum temperature (TMAX) for the fire (a) and dust (b) episodes, as derived from the E-OBS database (in K).The panels below represent the bias for the standard deviation of the fires (c) and dust (d) episodes of each simulation with respect to the E-OBS database.

Figure 9 .
Figure 9.Standard deviation (SD) of the mean temperature (TEMP) for the fire (a) and dust (b) episodes, as derived from the E-OBS database (in K).The panels below represent the bias for the standard deviation of the fires (c) and dust (d) episodes of each simulation with respect to the E-OBS database.

Figure 10 .
Figure 10.Standard deviation (SD) of the minimum temperature (TMIN) for the fire (a) and dust (b) episodes, as derived from the E-OBS database (in K).The panels below represent the bias for the standard deviation of the fires (c) and dust (d) episodes of each simulation with respect to the E-OBS database.

Figure 11 .
Figure 11.Taylor diagrams for (a, d) maximum temperature, (b, e) mean temperature and (c, f) minimum temperature for the simulations included in the analysis.Panels (a-c) represent the Taylor diagrams for the fire episode, while panels (d-f) stands for the dust episode.The cases included are no radiative feedbacks (filled circle), ARI (asterisk) and ARI+ACI (empty squares).Each configuration is shown in a different color: CS1 (green), CS2 (dark blue), DE3 (red), ES1 (yellow), ES3 (pink) and ENS (black).

Table 1 .
Modeling systems participating and their contributions to the case studies.

Table 2 .
Domain-averaged bias (in K)for the fire (C1X) and dust (C2X) episodes of each simulation with respect to the E-OBS database.

Table 3 .
Bias (in K)in those areas affected by high aerosol optical depths (1 h AOD > 1.0 for the fires, C1X case, and AOD > 0.5 for the dust, C2X case) with respect to the E-OBS database.

Table 4 .
Domain-averaged coefficient of determination (ρ 2 ) for the fire (C1X) and dust (C2X) episodes of each simulation with respect to the E-OBS database.

Table 5 .
Coefficient of determination (ρ 2 ) in those areas affected by high aerosol optical depths (1 h AOD > 1.0 for the fires, C1X case, and AOD > 0.5 for the dust, C2X case) with respect to the E-OBS database.