Modelled black carbon radiative forcing and atmospheric lifetime in AeroCom Phase II constrained by aircraft observations

Atmospheric black carbon (BC) absorbs solar radiation, and exacerbates global warming through exerting positive radiative forcing (RF). However, the contribution of BC to ongoing changes in global climate is under debate. Anthropogenic BC emissions, and the resulting distribution of BC concentration, are highly uncertain. In particular, long-range transport and processes affecting BC atmospheric lifetime are poorly understood. Here we discuss whether recent assessments may have overestimated present-day BC radiative forcing in remote regions. We compare vertical profiles of BC concentration from four recent aircraft measurement campaigns to simulations by 13 aerosol models participating in the AeroCom Phase II intercomparison. An atmospheric lifetime of BC of less than 5 days is shown to be essential for reproducing observations in remote ocean regions, in line with other recent studies. Adjusting model results to measurements in remote regions, and at high altitudes, leads to a 25 % reduction in AeroCom Phase II median direct BC forcing, from fossil fuel and biofuel burning, over the industrial era. The sensitivity of modelled forcing to BC vertical profile and lifetime highlights an urgent need for further flight campaigns, close to sources and in remote regions, to provide improved quantification of BC effects for use in climate policy.


Introduction
Because BC absorbs solar radiation, anthropogenic BC emissions can contribute positively to global radiative forcing through the aerosol direct effect. They can also affect clouds through the aerosol indirect and semidirect effects, change the albedo of snow and ice, and influence precipitation by changing atmospheric stability and the surface energy balance (Myhre et al., 2013a; Ramanathan and Carmichael, 2008; Haywood and Shine, 1995). Presently, both the magnitude of anthropogenic BC emissions and the resulting global distribution of BC concentrations are highly uncertain. In particular, the vertical profile of BC concentration, which strongly affects its total impact on the energy balance of the atmosphere, is poorly constrained (Koffi et al., 2012; Textor et al., 2007; Samset et al., 2013). Comparisons of measurements with model results, with emphasis on total BC mass, spatio-temporal distribution and vertical structure, are therefore essential for constraining estimates of BC effects on climate.
The IPCC AR5 assessed the direct aerosol effect radiative forcing due to anthropogenic BC (defined here as BC from anthropogenic fossil fuel and biofuel sources, BC FF + BF) to be +0.40 [range: +0.05 to +0.80] W m −2 over the period 1750-2010. That assessment took into account both model-based and observational studies. Recently, Phase II of the AeroCom model intercomparison project also evaluated BC FF + BF radiative forcing (RF), based purely on 15 global aerosol models (Myhre et al., 2013b), and found it to be +0.23 [+0.06 to +0.48] W m −2. The uncertainty ranges (5-95 %) show that BC is still a major contributor to the total uncertainty in anthropogenic RF. Further, it has recently been shown that, because the direct RF per unit mass BC increases strongly with altitude (Ban-Weiss et al., 2011; Zarzycki and Bond, 2010; Samset and Myhre, 2011), the diversity in modelled vertical profiles of BC concentration in the AeroCom Phase II models may account for up to 50 % of the model diversity in anthropogenic BC RF. Climate model simulations, however, indicate that, while direct BC forcing strengthens with altitude, its climate efficacy may decrease, i.e. the surface temperature response to BC in the middle and upper troposphere may be small or even negative (Ban-Weiss et al., 2011; Flanner, 2013). Further, Samset et al. (2014) concluded, from an analysis of multimodel variability in per-species aerosol burdens and optical parameters, that an upward adjustment of the model-based uncertainty in total aerosol forcing may be necessary. BC forcing diversity was found to be a significant component in this analysis.
Recently, several single-model studies have investigated other factors that may underlie the intermodel variability in AeroCom Phase II, or assessed its multimodel mean results in light of observations. For example, Wang et al. (2014a) compared results from the GEOS-Chem model (http://geos-chem.org) with results from the HIPPO flight campaign, and concluded that to reproduce HIPPO, more wet removal was required than is represented in most present models. Based on this, and on revised estimates of the direct RF of BC when taking HIPPO constraints into account, they argue that previous model estimates may be biased high due to elevated BC concentrations in the free troposphere. Bauer et al. (2013) studied the atmospheric lifetime of BC, which is a combined measure of transport and removal processes, by comparing simulations with the GISS-MATRIX model to HIPPO (Bauer et al., 2010), using CMIP5 emissions. They found that under present-day conditions, BC lifetime should be no more than 4 days, which is significantly shorter than what is used in some present models. Hodnebrog et al. (2014) showed that such a reduction in lifetime, when combined with estimates of the impact from BC on atmospheric stability, can lead to major reductions in the global mean climate impact of BC emissions. Based on the above, it is natural to investigate both whether the full AeroCom Phase II RF estimate is biased high, as was found for GEOS-Chem, and what contribution to multimodel variability may be due to differences in modelled BC lifetime.
In the following, we compare vertical concentration profiles from the AeroCom Phase II models to recent aircraft campaigns. The results are used to set multimodel constraints on BC lifetime, and to find limits on the possible high bias of AeroCom Phase II BC RF if constraining to HIPPO results.

Flight data and BC definition
We have used data from the flight campaigns HIAPER Pole-to-Pole Observations (HIPPO) 1-5, Arctic Research of the Composition of the Troposphere from Aircraft and Satellites (ARCTAS) SP2 (Jacob et al., 2010), Polar Airborne Measurements and Arctic Regional Climate Model Simulation Project (PAMARCMiP) (Herber et al., 2012; Stone et al., 2010) and Aerosol Radiative Forcing in East Asia (A-FORCE) (Oshima et al., 2012); see Table 1. All flights measured BC concentrations using the single particle soot photometer (SP2) instrument (Schwarz et al., 2010). Hence, in the present work, "BC" in relation to measured data stands for "refractory BC (rBC)" as quantified by SP2, equivalent to properly measured elemental carbon.
HIPPO 1-5 flew mainly pole-to-pole over the Pacific Ocean at various times during 2009-2011. Combining data from all five campaigns yields an approximate annual average. The HIPPO data have been screened against contributions from fires, so as to be representative of the background concentration of BC over the Pacific. Two of the HIPPO campaigns also flew over the North American mainland, allowing for comparisons closer to anthropogenic BC source regions. HIPPO covers atmospheric pressures from surface values up to 100 hPa, meaning that its upper range reaches into the lower stratosphere.
The ARCTAS SP2 campaign (Jacob et al., 2010) was flown in two separate time periods in 2008. During spring, flights were conducted over the northern Pacific Ocean, comparable to parts of the HIPPO region, and also over the North Polar regions. During summer, flights were conducted over the North American continent, again comparable to HIPPO. Part of the ARCTAS motivation was to study fires, so the measured concentrations can be expected to have a larger contribution from open biomass burning than the HIPPO data set. ARCTAS data cover the atmospheric segment from the surface up to 250 hPa.
The PAMARCMiP campaign (Herber et al., 2012; Stone et al., 2010) consisted of a series of flights conducted over the North Polar and Northern Pacific regions, from 2009 through 2012. It covers a vertical range up to 500 hPa, and is partially comparable to both ARCTAS and HIPPO. Like ARCTAS, PAMARCMiP is partially affected by biomass burning emissions.
Finally, the Aerosol Radiative Forcing in East Asia (A-FORCE) aircraft campaign (Oshima et al., 2012) flew over East Asia (southwestern Japan) in spring 2009, covering a vertical range up to 300 hPa. It is not regionally comparable to the other campaigns, but is highly relevant because it represents a region dominated by outflow from mainland China, one of the main sources of anthropogenic BC emissions.
All models submitted monthly mean 3-D fields of total BC mass mixing ratios, using year 2000 emissions (Lamarque et al., 2010). Meteorological year was 2006, or model internal present-day (PD) climatology. To calculate BC concentrations from mixing ratios, the models' own monthly mean temperature and pressure fields were used. When discussing modifications to the BC radiative forcing from the direct effect from anthropogenic fossil fuels and biofuels, the models' own monthly mean 2-D forcing fields were used, in combination with a preindustrial simulation using year 1850 emissions but still year 2006 or PD meteorology (Myhre et al., 2013b). For consistency with recent literature, forcing results are given for 1750-2010, using scaling factors presented in Myhre et al. (2013b).
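The conversion from mass mixing ratio to concentration described above can be sketched as follows. This is a minimal illustration that assumes air behaves as a dry ideal gas; the function name and the input values are hypothetical and are not taken from any AeroCom model.

```python
R_DRY_AIR = 287.05  # specific gas constant of dry air, J kg-1 K-1


def bc_concentration_ng_m3(mmr_kg_per_kg, pressure_pa, temperature_k):
    """Convert a BC mass mixing ratio (kg BC per kg air) to ng m-3.

    Assumption: air density follows the dry ideal gas law, rho = p / (R T).
    """
    air_density = pressure_pa / (R_DRY_AIR * temperature_k)  # kg m-3
    return mmr_kg_per_kg * air_density * 1e12  # kg m-3 -> ng m-3


# Hypothetical grid cell: 500 hPa, 250 K, mixing ratio 1e-11 kg kg-1
conc = bc_concentration_ng_m3(1e-11, 50000.0, 250.0)
print(f"BC concentration: {conc:.2f} ng m-3")
```

Using each model's own monthly mean temperature and pressure, as the text describes, keeps the conversion consistent with that model's air density.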

Analysis
To compare flight campaigns and model output, a series of geographical regions were first selected. See Fig. 1. For the flight data, only measurements that fell within the regions were kept, and for each region an average profile was constructed. For the models, all output within the selected region was averaged into a single profile for each model. However, to take the seasonality into account, we produced a model profile for each measurement point or profile from the flights. These were then averaged. The result is a set of model profiles that correspond to the flight profiles both geographically and temporally.
From the concentration profiles, we calculated aerosol burdens for both models and flight data. To ensure comparability, model burdens were calculated only in the same vertical range covered by the flight campaign.
Further, we used a previously published methodology to calculate BC RF from the concentration profiles. Briefly, we use spatially and temporally resolved normalized forcing efficiency profiles (RF exerted per gram of aerosol at a given altitude) calculated from a single model (Samset and Myhre, 2011), and multiply them by profiles of BC burden per model layer. This yields comparable estimates for total BC forcing within the selected regions, for the seasons covered by the respective flight campaigns; thus all RF calculations are performed with a consistent method. To distinguish the models' own estimates of RF from the RF calculated by this method, we refer to the two as "native RF" and "recalculated RF" respectively. While using the forcing efficiency from a single model (OsloCTM2) will naturally bias the calculated RF towards the forcing strength predicted by that model, it also allows for an estimate of differences in vertical profile shape. Since the forcing efficiency for BC rises strongly and monotonically with altitude, differences between the overall shape of measured and modelled profiles will cause the ratio of recalculated forcing per burden to differ from unity.
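The burden and recalculated RF computations described above can be sketched for a single column as follows. This is a minimal illustration only: the layer concentrations, thicknesses and forcing efficiency values are hypothetical numbers chosen for easy arithmetic, not values from Samset and Myhre (2011), and the function names are assumptions.

```python
def layer_burdens(conc_ng_m3, thickness_m):
    """BC burden per layer (g m-2) from concentration (ng m-3) and thickness (m)."""
    return [c * 1e-9 * dz for c, dz in zip(conc_ng_m3, thickness_m)]


def recalculated_rf(burdens_g_m2, efficiency_w_per_g):
    """Sum over layers of burden times forcing efficiency, in W m-2."""
    return sum(b * e for b, e in zip(burdens_g_m2, efficiency_w_per_g))


# Hypothetical three-layer column, ordered from the surface upwards
conc = [100.0, 20.0, 5.0]             # ng m-3
thickness = [2000.0, 4000.0, 6000.0]  # m
efficiency = [500.0, 1500.0, 3000.0]  # W g-1, rising with altitude

burdens = layer_burdens(conc, thickness)
total_burden = sum(burdens)            # g m-2
rf = recalculated_rf(burdens, efficiency)
rf_per_burden = rf / total_burden      # W g-1

print(f"burden = {total_burden * 1e3:.2f} mg m-2, RF = {rf:.2f} W m-2")
```

Because the efficiency increases with altitude, moving the same burden to a higher layer raises the recalculated RF, which is why the forcing per burden is sensitive to profile shape.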

Calculation of RMSE values and correlations
To estimate how well a given model reproduces the Pacific HIPPO flight data, as primarily used below, we calculated model bias, root-mean-square error (RMSE) values and correlation coefficients. The HIPPO data set was subdivided into five regions (P1-P5 in Fig. 1), and an annual mean profile was constructed for each region as shown in Fig. 2. For each model, diagnostics were calculated from the difference between the HIPPO concentration profile at each of its given altitude levels, and the regionally averaged model concentration value interpolated to the corresponding altitude, according to the following equations for mean bias (MB), mean normalized bias (MNB) and RMSE, given here in their standard forms:

$$\mathrm{MB} = \frac{1}{N}\sum_{i=1}^{N}\left(C_{\mathrm{model},i} - C_{\mathrm{HIPPO},i}\right) \tag{1}$$

$$\mathrm{MNB} = \frac{1}{N}\sum_{i=1}^{N}\frac{C_{\mathrm{model},i} - C_{\mathrm{HIPPO},i}}{C_{\mathrm{HIPPO},i}} \tag{2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(C_{\mathrm{model},i} - C_{\mathrm{HIPPO},i}\right)^{2}} \tag{3}$$

Here C denotes a concentration value and N is the total number of data points. In the present case, this value is 72, determined by the number of altitude bins where HIPPO reported measurements. Further, we calculated the Pearson sample correlation coefficient based on the same data set.
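The diagnostics named above (mean bias, mean normalized bias, RMSE and the Pearson sample correlation coefficient) can be sketched as follows, assuming their standard definitions and model values already interpolated to the observation altitudes. The profile values are hypothetical and the function names are illustrative.

```python
import math


def mean_bias(model, obs):
    """Mean of (model - observation) over all data points."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


def mean_normalized_bias(model, obs):
    """Mean of (model - observation) / observation over all data points."""
    return sum((m - o) / o for m, o in zip(model, obs)) / len(obs)


def rmse(model, obs):
    """Root-mean-square difference between model and observation."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def pearson(model, obs):
    """Pearson sample correlation coefficient."""
    n = len(obs)
    mean_m, mean_o = sum(model) / n, sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_m * var_o)


# Hypothetical concentrations (ng m-3) at four altitude bins, surface first
obs = [10.0, 4.0, 1.0, 0.5]
model = [12.0, 8.0, 3.0, 2.0]

mb = mean_bias(model, obs)
mnb = mean_normalized_bias(model, obs)
err = rmse(model, obs)
rho = pearson(model, obs)
```

In this toy profile the largest absolute differences sit at the low-altitude bins, while the largest relative differences sit at the top, which illustrates why the absolute and normalized diagnostics weight altitude ranges differently.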

Derivation of scaled forcing estimates
Two scalings were applied in the present work to assess the potential impact of adjusting models to measured BC concentrations. These scalings were derived by altering the 3-D concentration fields of total BC (fossil fuel, biofuel and biomass burning) provided by the AeroCom models, and then applied to the BC FF + BF forcing fields supplied. This method is used to ensure that intermodel variability in RF due to differences in optical parameters of BC, cloud distributions and other factors related to the host model is kept unchanged. For the "remote ocean" scaling, the concentration fields were altered within the grey boxes shown in Fig. 1. Concentrations were reduced to 1/3 of their original values between the surface and 500 hPa, to 1/8 between 500 and 200 hPa, and to 1/15 between 200 hPa and the top of the atmosphere (TOA). These factors were derived from the comparison between AeroCom Phase II and HIPPO 1-5 presented in Schwarz et al. (2013).
Using the forcing efficiency profile method (Samset and Myhre, 2011), we then calculated global, annual mean BC RF from both scaled and unscaled concentration fields. The ratio of these forcing values is taken as the scaling factor for that particular model. Finally, we constructed the multimodel median BC FF + BF for all 13 models used for the present study, based on their original 2-D forcing fields. These forcing values were scaled with the derived scaling factors, to produce the revised model median forcing.
For the "high altitude" scaling the same procedure was followed, except that the concentration fields were scaled to 1/20 at altitudes between 200 hPa and TOA globally.
For the "all scaled" analysis both scalings were applied to the concentration fields, i.e. the fields were scaled to 1/20 at altitudes between 200 hPa and TOA globally, and then as described above in the grey marked regions at altitudes below 200 hPa.
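The three scalings described above, and the derivation of a per-model scaling factor as the ratio of scaled to unscaled RF, can be sketched as follows. This is an illustration for a single hypothetical column rather than the global, annual mean used in the analysis; the treatment of values exactly at 500 and 200 hPa, the function names and all numbers are assumptions.

```python
def remote_ocean_factor(pressure_hpa):
    """Concentration scaling for the "remote ocean" case, by pressure level."""
    if pressure_hpa >= 500.0:
        return 1.0 / 3.0
    if pressure_hpa >= 200.0:
        return 1.0 / 8.0
    return 1.0 / 15.0


def high_altitude_factor(pressure_hpa):
    """Global "high altitude" scaling between 200 hPa and TOA."""
    return 1.0 / 20.0 if pressure_hpa < 200.0 else 1.0


def all_scaled_factor(pressure_hpa, in_remote_region):
    """Both scalings combined, without doubly scaling above 200 hPa."""
    if pressure_hpa < 200.0:
        return 1.0 / 20.0
    return remote_ocean_factor(pressure_hpa) if in_remote_region else 1.0


def column_rf(burdens_g_m2, efficiency_w_per_g, factors):
    """RF of a column after applying a scaling factor to each layer."""
    return sum(b * e * f for b, e, f in zip(burdens_g_m2, efficiency_w_per_g, factors))


# Hypothetical remote-ocean column
pressure = [900.0, 400.0, 150.0]      # hPa
burdens = [2e-4, 8e-5, 3e-5]          # g m-2
efficiency = [500.0, 1500.0, 3000.0]  # W g-1

unscaled = column_rf(burdens, efficiency, [1.0] * 3)
scaled = column_rf(burdens, efficiency, [remote_ocean_factor(p) for p in pressure])
scaling_factor = scaled / unscaled  # would be applied to the native 2-D forcing
```

Applying the ratio to each model's native forcing field, rather than replacing the forcing with the recalculated value, is what leaves host-model differences in optics and clouds untouched.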

Comparisons of flights and models
Here, we constrain the model range of global, annual mean direct RF by anthropogenic BC, by comparing AeroCom Phase II vertical profiles from 13 models to recent aircraft campaigns. Figure 1 shows the flight tracks of the four campaigns, the AeroCom multimodel median anthropogenic BC forcing field, and the regions selected for analysis. Table 2 shows individual model BC RF, recalculated using the forcing efficiency profile method, globally and for the regions in Fig. 1. We also show the fraction of RF above 5 km (500 hPa).
Figure 2 compares flight campaign data with AeroCom Phase II model output. Panels a-f show the HIPPO 1-5 campaigns for five regions in the remote Pacific Ocean and for western North America, overlain with AeroCom Phase II results. A common pattern is that the models strongly overpredict the HIPPO measurements. Further, the overprediction is more pronounced at high altitudes. Comprising five campaigns distributed throughout the year, HIPPO represents an approximate annual average. As recently noted, its Pacific measurements indicate that at the highest altitudes studied, BC concentrations converge towards a common background value, here found to be approximately 0.1 ng m −3, with very low seasonality. Here we also find (Fig. 2f) the same background value above western North America.
Panels g-i of Fig. 2 show the ARCTAS (Jacob et al., 2010) campaign, which reports significantly higher concentrations than HIPPO. The models mainly underpredict these observations, linked to the fact that ARCTAS encountered biomass burning BC from episodic forest fires (Wang et al., 2011), which HIPPO did not encounter in this region. A notable feature is that above the fire-dominated segments, the ARCTAS profiles show a strong decline with altitude. In the P1 (Northern Pacific) region upper tropospheric ARCTAS concentrations are similar to those measured in HIPPO.
Panels j-l show PAMARCMiP (Herber et al., 2012; Stone et al., 2010) data, first over the North Pacific region, and then over two North Polar regions. While the altitude range covered by PAMARCMiP is limited compared to ARCTAS and HIPPO, the concentrations found in the lowest few kilometres of the troposphere are consistent with ARCTAS. Over the NP1 and NP2 regions, north of America and Greenland, models underpredict the measurements. As for ARCTAS, this is at least partly due to episodic fires. The Arctic region may, however, have further sources of BC not adequately represented in the emission inventories used by the models (Stohl et al., 2013).
Panel m shows A-FORCE (Oshima et al., 2012) data in the sea areas around Japan. Here we find good agreement between models and measured concentrations, both in absolute values and in the shape of the vertical profile. The variability between models is also much lower in this region than for the others. The aerosol in the A-FORCE region is mainly sensitive to outflow from mainland China, Korea and Japan. This indicates that in AeroCom Phase II, East Asian BC emissions and outflow are either well represented or, if emissions are still underestimated as discussed for AeroCom Phase I in recent literature (Bond et al., 2013; Chung et al., 2012), the atmospheric lifetime of BC must be compensatingly long in the models to allow enough BC to be transported into the region sampled by A-FORCE. We note, however, that the A-FORCE data do not extend as far up in the atmosphere as HIPPO did, and that we find significant intermodel variability at p<400 hPa also for the near-source A-FORCE and HIPPO America regions.
While the aircraft data in the present study were taken over the period 2008-2012, the models used emissions from year 2000. BC emissions have increased in the intervening period (Wang et al., 2014b), indicating that any overestimation of concentrations by the models would have been strengthened had they used a more recent emission inventory. One model (CAM4-Oslo) delivered results for both year 2000 and 2006 emissions, reflecting this increase. In remote regions (e.g. the HIPPO regions in Fig. 1), the resulting increase in concentration is found to be evenly distributed throughout the vertical profile, except in the range 1000-800 hPa where no significant increase was found. It is clear that for future comparisons, model calculations with updated emission inventories are desirable.
Figure 2. Comparison of measurements and model data for all selected regions. For each panel, the left box shows an overlay of the observed total BC concentration profiles (black lines: mean (solid), median (dotted) and mean +1 standard deviation (dashed); grey band: 25th-75th percentile range) with the mean BC concentration profiles from individual AeroCom Phase II models (coloured lines, see legend). The three middle boxes show, from left to right, the BC burden (mg m −2 ), direct radiative forcing (W m −2 ) and forcing efficiency (W g −1 ) for observations (black) and models (red). The coloured diamonds show the individual AeroCom Phase II models. Finally, the rightmost box shows the ratio of models to observations for the burden (green), radiative forcing (blue) and forcing efficiency (red) within the selected region.
Table 2. Modelled mean forcing and fractions for the regions used in the present analysis (see Fig. 1). Mean RF is the forcing within that region. Fraction of global RF is defined as the fraction of energy deposited, on annual mean, within that region. "Remotes" represents the total of all grey marked regions in Fig. 1.
Atmos. Chem. Phys., 14, 12465-12477, 2014 www.atmos-chem-phys.net/14/12465/2014/

Consequences for BC atmospheric lifetime
In the following, we assess the implications of the flight observations on modelled BC lifetime and RF. Episodic biomass burning emissions from fires, even though represented in the model emissions, pose challenges when comparing flight campaigns to monthly mean model data. Arguably fires are also difficult to characterize as anthropogenic. Below, we therefore constrain our discussion to the HIPPO data set, which was less influenced by episodic fires, reached the highest altitudes, covers the largest geographical area, and represents an approximate annual mean. Figure 2 shows that some models more closely reproduce the measurements than others, both in magnitude and shape. Several studies have suggested that to reproduce HIPPO data, a low modelled atmospheric lifetime, or a short ageing timescale, of BC is required (e.g. Wang et al., 2014a). Here we can test this supposition for a larger set of models. Quantifying the difference between models and data requires care, as absolute concentrations range over several orders of magnitude. Common diagnostic variables include model bias and RMS error. Of these, RMS error and model mean bias (see Methods) will be dominated by high absolute concentrations, i.e. low altitudes in the present case. Model mean normalized bias avoids this, but will be more sensitive to model and measurement uncertainties in high altitude, low concentration ranges. In Fig. 3, RMS error and biases are plotted as a function of the modelled BC lifetime. Lifetime, also referred to as atmospheric residence time, is here defined as modelled global, annual mean burden divided by emissions (Table 3). Figure 3 shows that, independent of diagnostic variable, a low BC lifetime is a requirement for good reproduction of absolute measured concentrations. Regressing bias or RMS error versus lifetime (black, dashed line in Fig. 3) gives an intercept at 3 days for RMS error and model mean bias. This value is in line with indications from other recent studies, e.g. Bauer et al. (2013). In the present data set, a single model with high lifetime (HadGEM2) represents a significant outlier. That model did not include BC ageing and transition to a hydrophilic state, with the consequence that both BC lifetime and burdens over remote areas become high. To test the impacts of single models on the result, Fig. 3 also shows regressions with one model removed (grey lines). For the normalized mean bias, which is less sensitive to high concentrations, the regression with this particular model removed is consistent with the results from RMS error and mean bias.
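The lifetime calculation and the regression against lifetime can be sketched as follows. Lifetime is taken here as global burden divided by the emission rate, which has units of time. The (lifetime, bias) pairs are synthetic and constructed so that the fitted line crosses zero bias at exactly 3 days; they are not model results. Reading the "intercept" as the lifetime at which the fitted diagnostic reaches zero is an assumption of this sketch.

```python
def lifetime_days(burden_tg, emissions_tg_per_yr):
    """Atmospheric lifetime as burden divided by emission rate, in days."""
    return burden_tg / emissions_tg_per_yr * 365.0


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical global values: 0.1 Tg burden, 7.3 Tg yr-1 emissions
example_lifetime = lifetime_days(0.1, 7.3)

# Synthetic diagnostics: bias = lifetime - 3 (illustration only)
lifetimes = [4.0, 5.0, 7.0, 9.0]  # days
biases = [1.0, 2.0, 4.0, 6.0]     # ng m-3

slope, intercept = linear_fit(lifetimes, biases)
zero_bias_lifetime = -intercept / slope  # lifetime where the fit crosses zero
```

Repeating the fit with one point removed at a time, as done for the grey lines in Fig. 3, shows how strongly a single high-lifetime model can steer the result.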
Bias and RMSE give information on absolute deviations, but less on any covariance in shape. The Pearson correlation coefficient, however, is sensitive to the shape of the BC profiles. In Fig. 3, correlation is indicated by symbol size (see also values in Table 3). Several models with low lifetimes also yield low correlations. Regressing only the models with correlation coefficients ρ>0.8 gives similar slopes and intercepts to what we find using all models (red, dashed line). We note that 12 out of 13 models show correlation with the Pacific HIPPO data at significance p<0.05.
Low BC lifetime appears necessary, but not sufficient, to describe the data. Only three models (IMPACT, GMI, GISS-MATRIX) exhibit both a low bias or RMS error and a high correlation, with no single obvious factor linking their aerosol treatments. AeroCom Phase II models use a wide variety of microphysics schemes (Mann et al., 2014). Meteorology and treatment of BC aging and wet scavenging also vary, and will impact the vertical profiles (Kipling et al., 2013;Bauer et al., 2013). Further model experiments, in line with the single-model study in Wang et al. (2014a), are required to address the reasons behind the relationship found in Fig. 3.

Consequences for modelled BC FF + BF RF
The three models that best reproduce HIPPO in the Pacific all report consistent and relatively low BC FF + BF forcing, exerted close to emission sources as expected from the low lifetime. In these three models very little BC reaches remote ocean regions, or gets lifted above 500 hPa, relative to the other models in the ensemble. While the correspondence to HIPPO cannot be used to extract information close to emission sources, it does suggest that scaling down the average modelled forcing aloft and in remote ocean regions has merit. Compared with HIPPO, the current model ensemble overestimates BC concentrations at all altitudes in remote regions. The overestimation increases with altitude, and is particularly significant at pressures below 200 hPa. Further, Schwarz et al. (2013) suggest that the minimum concentration consistently observed by HIPPO in the upper troposphere, lower stratosphere and tropical transition layer may be a global feature. Interestingly, the models shown here do, on average, reproduce the general feature of a common background level, but with a concentration that is approximately 20 times higher than indicated by HIPPO.
The HIPPO data set allows us to test the possible implications of these observations on the multimodel BC RF from AeroCom Phase II, a key basis for the BC forcing recently assessed in the IPCC AR5. We attempt two scalings of the modelled BC concentration fields, to align them with the HIPPO observations. The first assumes that the vertically resolved ratio between models and observations in the Pacific holds for all remote ocean regions, shown as grey shaded areas in Fig. 1. The second assumes that the supposition of a globally uniform high-altitude BC concentration from Schwarz et al. (2013) is true. While the present data set is insufficient to determine if such a supposition is true, it is nevertheless interesting to assess its potential impact to see if efforts to measure high-altitude BC concentrations should be prioritized. Figure 4 shows the implications of these scalings on the multimodel direct RF due to BC from fossil fuel and biofuel burning (BC FF + BF). Unscaled, the multimodel median RF found here is +0.24 [+0.17 to +0.47] W m −2 (see Table 3). Applying the remote region scaling reduces the global, annual median BC FF + BF RF to +0.22 [+0.16 to +0.39] W m −2. The high-altitude scaling reduces it to +0.19 [+0.15 to +0.33] W m −2. Applying both simultaneously, while ensuring that we do not doubly scale in remote, high-altitude regions, yields a BC FF + BF RF of +0.17 [+0.13 to +0.28] W m −2, or a reduction of 25 % from the AeroCom Phase II value combined with a strong reduction in the model spread.
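The relative reductions implied by the quoted medians can be checked with a few lines of arithmetic. The sketch below uses only the rounded values stated in the text, so it reproduces the quoted 25 % only approximately (about 26 % relative to the AeroCom Phase II median of +0.23 W m −2); the function and variable names are illustrative.

```python
def relative_reduction(reference, revised):
    """Fractional reduction of revised relative to reference."""
    return (reference - revised) / reference


AEROCOM_PHASE_II = 0.23  # W m-2, multimodel value quoted from Myhre et al. (2013b)
UNSCALED = 0.24          # W m-2, median of the 13 models used here
REMOTE = 0.22            # W m-2, remote region scaling
HIGH_ALTITUDE = 0.19     # W m-2, high-altitude scaling
ALL_SCALED = 0.17        # W m-2, both scalings applied

vs_phase_ii = relative_reduction(AEROCOM_PHASE_II, ALL_SCALED)
vs_unscaled = relative_reduction(UNSCALED, ALL_SCALED)

for label, value in [("remote", REMOTE), ("high altitude", HIGH_ALTITUDE),
                     ("all scaled", ALL_SCALED)]:
    print(f"{label}: {100 * relative_reduction(UNSCALED, value):.0f} % below unscaled")
```

The high-altitude scaling alone accounts for most of the combined reduction in these rounded numbers, in line with the strong increase of forcing efficiency with altitude.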
A 25 % reduction in the direct RF of BC would have significant implications, placing the entire model-based 5-95 % range below the central BC RF value recently reported in IPCC AR5. Presently, the remaining uncertainty in BC forcing is heavily driven by scalings such as the ones attempted above. The IPCC AR5 assessment took input both from AeroCom Phase II and other studies. One of these studies (Bond et al., 2013) reported a significantly stronger forcing of 0.51 W m −2 from fossil fuel and biofuel burning. That estimate includes both a gross 15 % global downscaling of BC forcing efficiency due to overestimation of BC aloft, and a differentiated regional upscaling of emissions derived by comparing aerosol absorption optical depth from AeroCom Phase I with that from AERONET ground-based remote sensing. Their downscaling, based on recent model studies (Bond et al., 2013; Samset and Myhre, 2011; Zarzycki and Bond, 2010) and evaluation of AeroCom Phase I results (Schwarz et al., 2010), is comparable to our 25 % reduction in forcing, though our reduction is attributed to remote ocean areas. For near-source and remote regions covered in the present study we here find no need for an emission bias-related upscaling; however, the present data do not cover the regions where the upscaling in that analysis (Bond et al., 2013) was most pronounced. Also, the median anthropogenic BC RF in AeroCom Phase II is already a factor of 2 stronger than in Phase I, in part due to differences in emissions and modelled aerosol optical properties (Myhre et al., 2013b). We note that the above conclusions are broadly consistent with recent findings using the GEOS-Chem model, which is not represented in the present data set (Wang et al., 2014a). It is clear that further observations of BC concentrations, vertically resolved and both in situ and remote, are imperative for constraining the RF of BC.
Figure 4. ... (Myhre et al., 2013b). The grey bar shows unscaled values from the present work, then with remote scaling (pink) and high-altitude scaling (blue) applied. The khaki bar shows the lower limit on BC FF + BF forcing from the present work, with both scalings applied. Below we compare with the recent estimate in IPCC WG1's AR5.

Applicability of scaling factors derived from total BC to BC FF + BF fields
A question raised by the scaling analysis is whether we bias the results by deriving scaling factors based on total BC fields, and subsequently applying them to BC FF + BF forcing. At present, no measurement exists that can determine systematic differences between the global distributions of total BC and BC from fossil fuel and biofuel burning. However, four of the models participating here (OsloCTM2, CAM3, CAM5 and IMPACT) also supplied full 3-D concentration fields from BC FF + BF only, and we have used these to test the applicability of the method.
For this subset of models, which spans the range of predicted BC burdens, we found the ratio of modelled anthropogenic BC FF + BF to total BC concentrations to be approximately constant with altitude in the regions defined as remote. While trends exist for individual models in single regions, for the remote regions as a whole the ratio changes by less than 10 % through the atmospheric column. Hence any alteration of the BC vertical profile should equally affect both fields. Further, the fraction of the total global mean forcing found to be exerted at altitudes above 200 hPa was, for these models, found to be comparable for total BC and BC FF + BF. (See Table 2.) These two observations lead us to conclude that we do not strongly bias our results by applying scaling factors derived from total BC fields to BC FF + BF forcing.

Forcing pattern from models with low RMS and good correlation with HIPPO
We have shown that of the models participating in the present comparison, there are three (GMI, GISS-MATRIX, IMPACT) that both show a low RMSE and a good correlation with the Pacific HIPPO data. These three models all have low global mean atmospheric lifetimes of BC, and report among the lowest BC FF + BF RF values in the AeroCom Phase II ensemble. Figure 5 shows the zonal mean BC FF + BF RF, and total BC forcing density (RF per unit height) vertical profile, from the full model ensemble, and from the three HIPPO-corresponding models only. Total BC is used for the vertical profile as not all models provided full 3-D concentration fields, as outlined above. The obvious feature is that for these three models, forcing is exerted primarily closer to the sources, and at lower altitudes, than in the full ensemble. Very little is exerted in the Southern Ocean, or above 200 hPa. Figure 6 shows the results from the scaling analysis above, compared with results for the three HIPPO-corresponding models only. From the outset they have low forcing, and the scaling exercise does not significantly affect them as there is already very little forcing in the scaled regions. The final model median, however, is consistent with that from the scaled full model ensemble. This gives a separate indication that a reduction of 25 % in anthropogenic BC RF relative to the AeroCom Phase II value is reasonable if we take the HIPPO Pacific measurements as guidance.

Conclusions
We have compared recent aircraft-based measurements of BC concentration with state-of-the-art global aerosol-climate models. In remote regions where BC concentrations are dominated by long-range transport, and at high altitudes, there is a tendency for the models to overestimate the aircraft measurements, where and when the effects of fires are small. For a region sensitive to Asian emission sources, models reproduce the aircraft measurements remarkably well, with no indication of an underestimation in BC emissions. In remote ocean regions, an atmospheric lifetime of anthropogenic BC of less than 5 days seems crucial, but not sufficient, to be able to reproduce measurement data. Scaling the multimodel results to HIPPO measurements, in remote regions and aloft, and assuming a globally uniform high-altitude BC concentration, leads to a reduction of 25 % in anthropogenic BC direct RF, relative to the models' native values. The revised median of 0.17 W m −2 stands in stark contrast to recent assessments, which report up to 2-3 times stronger present-day BC forcing, but is in line with recent single-model studies (Wang et al., 2014a; Bauer et al., 2013). This discrepancy underlines the impact of combining measured BC concentration data with model estimates. To resolve these differences, and better constrain the climate impact of BC, there is an urgent need for further flight campaigns to provide BC vertical concentration profiles over both source regions, and regions where anthropogenic BC concentrations are dominated by transport and wet scavenging.
Figure 6. As Fig. 4, except also showing the BC FF + BF forcing from the three models selected based on RMSE and correlation versus HIPPO (hatched boxes).