Source attribution using FLEXPART and carbon monoxide emission inventories: SOFT-IO version 1.0

Since 1994, the In-service Aircraft for a Global Observing System (IAGOS) program has produced in-situ measurements of atmospheric composition during more than 51000 commercial flights. To help analyze these observations and understand the processes driving the observed concentration distribution and variability, we developed the SOFT-IO tool to quantify source/receptor links for all measured data. Based on the FLEXPART particle dispersion model (Stohl et al., 2005), SOFT-IO simulates the contributions of anthropogenic and biomass burning emissions from the ECCAD emission inventory database for all locations and times corresponding to the measured carbon monoxide mixing ratios along each IAGOS flight. Contributions are simulated from emissions occurring during the 20 days before an observation, separating individual contributions from the different source regions. The main goal is to supply added-value products to the IAGOS database by identifying the geographical origin and emission sources driving the CO enhancements observed in the troposphere and lower stratosphere. This requires a good match between observed and modeled CO enhancements. Indeed, SOFT-IO detects more than 95% of the observed CO anomalies over most of the regions sampled by IAGOS in the troposphere. In the majority of cases, SOFT-IO simulates CO pollution plumes with biases lower than 10-15 ppbv. Differences between the model and observations are larger for very low or very high observed CO values. The added-value products will help in understanding the trace-gas distribution and seasonal variability. They are available in the IAGOS database via http://www.iagos.org. The SOFT-IO tool could also be applied to similar data sets of CO observations (e.g. ground-based measurements, satellite observations). SOFT-IO could also be used for statistical validation as well as for inter-comparisons of emission inventories using large amounts of data.


boxes (290000 particles) of initialization as a whole.

FLEXPART is set up for backward simulations (Seibert and Frank, 2004) from these boxes as described in Stohl et al. (2003), and backward transport is computed for 20 days prior to the in-situ observation, which is sufficient to consider hemispheric scale pollution transport in the mid-latitudes (Damoah et al., 2004; Stohl et al., 2002; Cristofanelli et al., 2013). This duration is also expected to be longer than the usual lifetime of polluted plumes in the free troposphere, i.e. the time during which the concentration of pollutants in plumes is significantly larger than the surrounding background. Indeed, the tropospheric mixing time scale has been estimated to be typically shorter than 10 days (Good et al., 2003; Pisso et al., 2009). Therefore the model is expected to be able to link air mass anomalies such as strong enhancements in CO to the source regions of emissions (Stohl et al., 2003). It is important to note that we aim to simulate recent pollution events explaining CO enhancements over the background, but not to simulate the CO background, which results from aged and well-mixed emissions.

The FLEXPART output is a residence time, as presented and discussed in Stohl et al. (2003). These data represent the average time spent by the transported air masses in a grid cell, divided by the air density, and are proportional to the sensitivity of the receptor mixing ratio to surface emissions. In our case, it is calculated for every input point along the flight track, every day for N_t = 20 days backward in time, on a 1° longitude × 1° latitude global grid with N_z = 12 vertical levels (every 1 km from 0 to 12 km, and 1 layer above 12 km). Furthermore, the altitude of the 2 PVU potential vorticity level above or below the flight track is extracted from the wind and temperature fields, in order to locate the CO observations above or below the dynamical tropopause according to the approach of Thouret et al. (2006).

Table 2 along with the references describing them. The four global inventories are used to study the model's performance and sensitivity in Sect. 5.

To further test the sensitivity to the emission inventories, we also used one regional inventory, which is expected to provide a better representation of emissions in its region of interest than generic global inventories. For biomass burning, we tested the North American emissions inventory of the International Consortium for Atmospheric Research on Transport and Transformation (ICARTT) campaign, developed by Turquety et al. (2007) for the summer of 2004 and provided at 1° × 1° horizontal resolution. It combines daily area burned data from forest services with the satellite data used by global inventories, and uses a specific vegetation database, including burning of peat lands, which represents a significant contribution to the total emissions.

Coupling transport output with CO emissions
Calculating the recent contributions C(i) (kg m^-3) of CO emissions for each of the model's initialization points i along the flight tracks requires three kinds of data:
• the residence time T_R (in seconds, gridded with N_x = 360 by N_y = 180 horizontal points, N_z = 12 vertical levels, N_t = 20 days) from the backward transport described in Sect. 3.1,
• the CO surface emissions E_CO(N_x, N_y, N_t) (in kg CO m^-2 s^-1),
• the injection profile Inj(z) defining the fraction of pollutants diluted in the different vertical levels (with ∆z being the thickness, in meters) just after emission.

In the case of anthropogenic emissions, CO is simply emitted into the first vertical layer of the residence time grid (∆z = 1000 m).
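The coupling of the three inputs listed above can be illustrated with a minimal Python sketch. The explicit formula is not reproduced in this text, so the sketch assumes the dimensionally consistent form C(i) = Σ T_R · E_CO · Inj(z) / ∆z (s × kg m^-2 s^-1 × m^-1 = kg m^-3). The function name, the nested-list layout and the toy numbers are hypothetical and are not taken from the SOFT-IO code.

```python
def co_contribution(residence_time, emissions, injection, dz):
    """Sum T_R * E_CO * Inj(z) / dz over days, levels and grid cells.

    residence_time[t][z][y][x] : seconds (hypothetical nested-list layout)
    emissions[t][y][x]         : kg CO m-2 s-1
    injection[z]               : fraction of the emission put into level z
    dz[z]                      : layer thickness in metres
    Returns the contribution in kg m-3 for one initialization point.
    """
    total = 0.0
    for t, day in enumerate(residence_time):
        for z, level in enumerate(day):
            weight = injection[z] / dz[z]
            if weight == 0.0:
                continue  # nothing is injected into this level
            for y, row in enumerate(level):
                for x, t_r in enumerate(row):
                    total += t_r * emissions[t][y][x] * weight
    return total


# Toy case: 1 day, 2 levels, 1 x 2 grid cells (illustrative values only).
residence_time = [[[[100.0, 200.0]], [[50.0, 50.0]]]]
emissions = [[[1e-9, 2e-9]]]
dz = [1000.0, 1000.0]

# Anthropogenic case: everything is emitted into the first layer.
anthropogenic = co_contribution(residence_time, emissions, [1.0, 0.0], dz)

# Assumed fire-like profile: emission split evenly over both layers.
split_profile = co_contribution(residence_time, emissions, [0.5, 0.5], dz)
```

Because the residence times are precomputed, only this summation would need to be repeated when an emission inventory or an injection profile is swapped.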

For biomass burning emissions in the tropics and mid-latitudes, the lifting of biomass burning plumes is usually due to small and large scale dynamical processes, such as turbulence in the boundary layer, deep convection and frontal systems, which are usually represented by global meteorological models. At higher latitudes, however, boreal fires can also be associated with pyro-convection and quick injection above the planetary boundary layer. Pyro-convection plume dynamics are often associated with small-scale processes that are not represented in global meteorological data and emission inventories (Paugam et al., 2016). In order to characterize the effect of these processes, we implemented three methodologies to parameterize biomass injection height:
Fig. 2).
• the second, named MIXED, uses the same injection profiles as in DENTENER for the tropics and mid-latitudes, but for the boreal forest, injection profiles are deduced from a lookup

Automatic detection of CO anomalies
For individual measurement cases, pollution plumes can usually be identified by eye using the observed CO mixing ratio time series or the CO vertical profiles. However, this is not feasible for a database of tens of thousands of observation flights. In order to create statistics of the model's performance, we need to systematically identify observed pollution plumes in the IAGOS database. The methodology to do this is based on what has previously been done for the detection of layers in the MOZAIC database (Newell et al., 1999; Thouret et al., 2000), along with more recent calculations of the CO background and CO percentiles defined for different regions along the IAGOS data set (Gressent et al., 2014). An example demonstrating the procedure, which is described below, is shown in Fig. 3.

(Fig. 4) for the UT. Note that C_R^VP (Q3) or C_R^UT_season (Q3) needs to be higher than 5 ppb (the accuracy of the CO instrument; Nédélec et al., 2015) in order to consider an anomaly.

In the examples shown in Fig. 3a and Fig. 3b, the red line represents CO anomalies.

(anthropogenic or biomass burning) and transport (at regional or synoptic scale, pyro-convection, deep convection, frontal systems). Systematic evaluation of the model performance against emission inventories will be presented in Sect. 5.
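As an illustration of the anomaly criterion in the automatic detection procedure, the following Python sketch flags observations whose residual CO (observed minus background) exceeds the third quartile Q3 of the residual, provided that Q3 itself exceeds the 5 ppb instrument accuracy. The exact equation is not reproduced in this text, so the "residual greater than Q3" test is an assumption; the function name and the sample values are hypothetical.

```python
INSTRUMENT_ACCURACY_PPB = 5.0  # CO instrument accuracy quoted in the text


def flag_anomalies(observed, background, residual_q3,
                   accuracy=INSTRUMENT_ACCURACY_PPB):
    """Flag points whose residual CO (observed - background) exceeds Q3.

    residual_q3 is the third quartile of the residual CO, taken here as a
    precomputed input (from the profile or from a seasonal climatology).
    No point is flagged if that quartile does not exceed the accuracy.
    """
    if residual_q3 <= accuracy:
        return [False] * len(observed)
    return [(obs - bkg) > residual_q3
            for obs, bkg in zip(observed, background)]


observed = [100.0, 130.0, 180.0]   # ppb, illustrative values
background = [95.0, 95.0, 95.0]    # ppb, illustrative values
flags = flag_anomalies(observed, background, residual_q3=30.0)
```

Passing Q3 as an argument keeps the sketch neutral about how the quartile is obtained, which differs between vertical profiles and the upper troposphere.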

Anthropogenic emission inventories
Among the case studies listed in Table 1

Observations show little variability in the free troposphere down to around 3 km. Strong pollution is observed below, with +300 ppb enhancement over the background on average between 0 and 3 km. Note that we do not discuss CO enhancement above 3 km.

In agreement with C_R^VP, SOFT-IO simulates a strong CO enhancement in the lowest 3 km of the profile, caused by fresh emissions. However, the simulated enhancement is weaker than the observed one, a feature that is typical for this region, as we shall see later.

In addition to the CO mixing ratio, SOFT-IO calculates CO source contributions and geographical origins of the modeled CO, respectively displayed in Fig. 5b and Fig. 5c (using the methodology described in Sect. 3.4), here using MACCity and GFAS v1.2 as an example. For the geographical origin we use the same 14 regions as defined for the GFED emissions (http://www.globalfiredata.org/data.html). Note that only the average of the calculated CO is displayed for each anomaly (0-3 km; 3.5-6 km) in Fig. 5b and Fig. 5c.

Colored lines in Fig. 5a show the calculated CO using anthropogenic sources described by the two inventories selected in Sect. 3.2, MACCity (green line) and EDGARv4.2 (yellow line), along the flight track. In both cases, biomass burning emissions are described by GFASv1.2. Emissions from fires have negligible influence (less than 3%) on this pollution event, as depicted in Fig. 5b.

In the two simulations, the calculated CO mixing ratio is below 50 ppb in the free troposphere, as we do not simulate background concentrations with SOFT-IO. CO enhancement around 4 to 6 km is overestimated by SOFT-IO. CO above 6 km is not considered as an anomaly, as C_R^UT < C_R^UT_season (Q3). Simulated mixing ratios in the 0-2 km polluted layer are almost homogeneous, with values around 280 ppb using MACCity and around 160 ppb using EDGARv4.2. They are attributed to anthropogenic emissions (more than 97% of the simulated CO) originating mostly from Central Asia with around 95% influence. In this regard, the CO simulated using MACCity is in better agreement with the observed CO than the one obtained using EDGARv4.2. Indeed, using MACCity, simulated CO reaches 90% of the observed enhancement (+300 ppb on average) over the background (around 100 ppb), while for EDGARv4.2 the corresponding value is only 53%, indicating strong underestimation of this event. The difference in the calculated CO using these two inventories is also consistent with the results

Atmos. Chem. Phys. Discuss., https://doi.org/10.5194/acp-2017-653. Manuscript under review for journal Atmos. Chem. Phys. Discussion started: 26 July 2017. © Author(s) 2017. CC BY 4.0 License.

• around 100°W (around +10 ppb of CO enhancement on average): plume 1
• between 80°W and 50°W (+30 ppb of CO enhancement on average): plume 2
• between 0° and 10°E (+40 ppb of CO enhancement on average): plume 3.

These polluted air masses are surrounded by stratospheric air masses with CO values lower than 80-90 ppb. As polluted air masses were sampled at an altitude of around 10 km, they are expected to be due to long-range transport of pollutants.

The calculated CO is shown in Fig. 6a using MACCity (green line) and EDGARv4.2 (yellow line) for anthropogenic emissions and GFASv1.0 for biomass burning emissions. SOFT-IO estimates that these plumes are mostly anthropogenic (representing 77% to 93% of the total simulated CO, Fig. 6b). Pollution mostly originates from Central and South-East Asia, with a strong contribution from North America (Fig. 6c) for plume 3.

SOFT-IO correctly locates the three observed polluted air masses with the two anthropogenic inventories. CO is also correctly calculated using MACCity, with almost the same mixing ratios on average as the observed enhancements in the three plumes. Only 2/3 of the observed enhancements are simulated using EDGARv4.2, except for plume 1, where results are better. We have already seen in the previous case study that emissions in Asia may be underestimated, especially in the EDGARv4.2 inventory.

Similar comparisons were performed in the four case studies selected to estimate and validate the anthropogenic emission inventories coupled with the FLEXPART model. Results are summarized in Table 3. For three of the cases, SOFT-IO simulations showed a better agreement with observations when using MACCity than when using EDGARv4.2. In the fourth case both inventories performed equally well. One reason for the better performance of MACCity is the fact that it provides monthly information (Table 2).
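The percentages quoted in these case studies are ratios of simulated to observed CO enhancement. The short Python sketch below reproduces that arithmetic for the first case study, using the approximate layer values given in the text ("around" 280 ppb and 160 ppb simulated, +300 ppb observed); the function name is hypothetical, and because the inputs are rounded the MACCity ratio comes out slightly above the 90% quoted.

```python
def fraction_reproduced(simulated_ppb, observed_ppb):
    """Return the simulated enhancement as a fraction of the observed one."""
    if observed_ppb <= 0:
        raise ValueError("observed enhancement must be positive")
    return simulated_ppb / observed_ppb


observed_enhancement = 300.0  # ppb above background, first case study
edgar = fraction_reproduced(160.0, observed_enhancement)
maccity = fraction_reproduced(280.0, observed_enhancement)
print(f"EDGARv4.2: {edgar:.0%}, MACCity: {maccity:.0%}")
```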

Biomass burning emission inventories
In order to evaluate and choose biomass burning emission inventories, we have selected eleven case studies with fire-induced plumes (Table 1)

The last two, on 30 and 31 July 2008, focused on biomass burning plumes observed in the ITCZ region above Africa as described in a previous study (Sauvage et al., 2007a).

peaks, one near the ground that is half due to local anthropogenic emissions and half due to contributions from North American biomass burning and thus not considered in this discussion.

The second, more intense peak, simulated in the free troposphere where the enhanced CO air masses were sampled, is mostly caused by biomass burning emissions (87% of the total calculated CO, Fig. 7b), originating from North America (99% of the total enhanced CO). When calculated using the ICARTT campaign inventory, the simulated CO enhancement reaches over 150 ppb, which is 10 ppb higher than the observed mixing ratio above the background (+140 ppb), but only for the upper part of the plume.

When using global inventories, the simulated contribution peak reaches 70 ppb using GFASv1.2 and 100 ppb using GFED4, i.e. only about 50% and 70% of the measured enhancement (+140 ppb), respectively. This comparison demonstrates the large uncertainty in simulated CO caused by the emission inventories, for both biomass burning and anthropogenic emissions. For that reason we aim to provide simulations with different global and regional inventories for the IAGOS data set.

As the ICARTT campaign inventory was created using local observations in addition to satellite products, the large difference in the simulated CO compared to the other inventories may in part be due to different quantification of the total area burned (for GFED, GFAS using the FRP as constraint).

deduced from seasonal IAGOS mixing ratios over this region. Such CO enhancements have been attributed to regional fires injected through ITCZ convection (Sauvage et al., 2007b).

The SOFT-IO simulations (colored lines in Fig. 8a) link these air masses mostly to recent biomass burning (responsible for 68% of the total simulated CO, Fig. 8b) in South Africa (Fig. 8c)

The resulting detection rates are presented in Fig. 9 for eight of the eleven regions shown in Fig. 4. Statistics are presented separately for three altitude levels (Lower Troposphere 0-2 km, Middle Troposphere 2-8 km and Upper Troposphere > 8 km). Figure 9 shows that SOFT-IO performance in detecting plumes is very good and not strongly altitude- or region-dependent. In the three layers (LT, MT and UT), detection rates are higher than 95% and even close to 100% in the LT, where CO anomalies are often related to short-range transport.

It is important to note that the biases remain of the same order (±10-15 ppb) when comparing the first (Q1), second (Q2) and third (Q3) quartiles of the CO anomalies observed and modeled within most of the regions (Fig. 10b). This confirms the good capacity of the SOFT-IO software to reproduce the CO mixing ratio anomalies in most of the observed pollution plumes.

Differences become much larger when considering outlier values of CO anomalies (lower and upper whiskers, ±2.7σ or 99.3%, Fig. 10b), i.e. exceptional events of very low and very high CO enhancements (accounting for 1.4% of the CO plumes), with biases from ±10 ppb to ±50 ppb for most of the regions. Higher discrepancies are found in the lower and the upper troposphere and can reach ±50 to ±200 ppb in two specific regions (North Asia UT and South Asia LT) for these extreme CO anomalies.
Note that North Asia UT and South Asia LT present extreme pollution events related, respectively, to pyro-convection (Nédélec et al., 2005) for the first region, and to strong anthropogenic surface emissions (Zhang et al., 2012) for the second one. This may suggest that the model fails to correctly reproduce the transport for some specific but rare events of pyro-convection.

When looking at the origin of the different CO anomalies (Fig. 10c)

It is worth noting the good ability of SOFT-IO to quantitatively reproduce the CO enhancements observed by IAGOS. This is especially noticeable in the LT and UT, with similar CO mixing ratios observed and modeled during the entire period and within the standard deviation. However, the amplitude of the seasonal cycle of CO maxima is highly underestimated (-100%) after January 2009 in the European LT, where anthropogenic sources are predominant with more than 90% influence (Fig. 10c). This suggests misrepresentation of anthropogenic emissions in Europe after the year 2009 (Stein et al., 2014).

In the middle troposphere (2-8 km), the CO plumes are systematically overestimated by SOFT-IO by 50% to 100% compared to the observations. This might be related to different reasons:
• the chosen methodology for detecting CO plume enhancements at those altitudes (described in Sect. 3.4), which may lead to a large number of plumes with small CO enhancements, which are difficult to simulate. This could be due to the difficulty in defining a realistic CO background in the middle troposphere.
• the source-receptor transport, which may be more difficult to simulate between 2 and 8 km than in the LT, where receptors are close to sources, or than in the UT, where most of the plumes are related to convective detrainment, which is better represented in the models than MT detrainment, which might be less intense.
• the frequency of the IAGOS observations, which is lower in the MT than in the UT.

Biomass burning emissions
We first investigate the sensitivity of SOFT-IO to the type of biomass burning inventory, using MACCity with GFAS v1.2 or GFED 4 (2003-2013), using the same MIXED methodology for vertical injection of emissions (Fig. 2). As for anthropogenic emissions, Fig. 13 represents the Taylor diagram and averaged biases for the different configurations.

Performances (correlations, standard deviations and biases) are very similar for both biomass burning inventories, with smaller differences compared to anthropogenic inventories. Even for regions dominated by biomass burning such as Africa or South America, as depicted previously (Fig. 11c), the sensitivity of the SOFT-IO performance to the type of global fire inventory is below 5 ppb.

Based on case studies, we discussed in Sect. 4.2 the comparison of CO contributions modeled using regional fire emission inventories. It resulted in a better representation of biomass burning plumes using the specifically designed campaign inventory than using the global inventories (Table 4)

it is hard to conclude that the ICARTT inventory gives systematically better results. While simulations (not shown) give better results for a few specific events of very high CO using ICARTT, similarly good results are obtained when using GFASv1.2 or GFED4 for most other cases. It is worth noting that IAGOS samples biomass burning plumes far from ICARTT sources, after dispersion and diffusion during transport in the atmosphere. Besides, few boreal fire plumes (which would be better represented using ICARTT) are sampled by the IAGOS program.

Secondly, we investigate the influence of the vertical injection scheme for the biomass burning emissions, using the three methodologies for determining injection heights described in Sect. 3.3. Sensitivity tests (Fig. 13c and Fig. 13d) demonstrate a small influence of the injection scheme on the simulated plumes. The largest influence is found over North Asia UT, where pyro-convection has been highlighted in the IAGOS observations (Nédélec et al., 2005), with however less than 5 ppb difference between the different schemes. More generally, the small influence of vertical injection is probably due to too few cases where boreal fire emissions are injected outside the PBL by

Analyzing long term in situ observations of trace gases can be difficult without a priori knowledge of the processes driving their distribution and seasonal/regional variability, like transport and photochemistry. This is particularly the case for the extensive IAGOS database, which provides a large number of aircraft-based in-situ observations (more than 51000 flights so far) distributed on a global scale, and with no a priori sampling strategy, unlike dedicated field campaigns.

In order to help study and analyze such a large data set of in situ observations, we developed a system that allows quantifying the origin of trace gases both in terms of geographical location and of source type. The SOFT-IO module (https://doi.org/10.25326/2) is based on the FLEXPART particle dispersion model, which is run backward from each trace gas observation, and on different emission inventories (EDGAR v4.2, MACCity, GFED 4, GFAS v1.2) that can be easily changed.

The main advantages of the SOFT-IO module are:
• Its flexibility. Source-receptor relationships pre-calculated with the FLEXPART particle dispersion model can be coupled easily with different emission inventories, allowing each user to select model results based on a range of different available emission inventories.
• CO calculation, which is computationally very efficient, can be repeated easily whenever updated emission information becomes available without running the FLEXPART model again. It can also be extended to a larger number of emission datasets, particularly when new inventories become available, or for inter-comparisons of emission inventories. It can also be extended to other species with a lifetime similar to or longer than that of CO, to study other types of pollution sources.
• The high sensitivity of the SOFT-IO CO mixing ratios to the choice of sources for very specific regions and case studies, especially in the LT, which is most of the time driven by local or regional emissions, may also help improve emission inventory estimates through evaluation with a large database such as the IAGOS one. Indeed, as it is based on a Lagrangian dispersion model, the tool presented here is able to reproduce small-scale variations, which facilitates comparison to in situ observations. It can then be used to

The main results are the following:
• By calculating the contributions of recent emissions to the CO mixing ratio along the flight tracks, SOFT-IO identifies the source regions responsible for the observed pollution events, and is able to attribute such plumes to anthropogenic and/or biomass burning emissions.
• On average, SOFT-IO detects 95% of all observed CO plumes. In certain regions, detection frequency reaches almost 100%.
• SOFT-IO gives a good estimation of the CO mixing ratio enhancements for the majority of the regions and the vertical layers. In the majority of cases, the CO contribution is reproduced with a mean bias lower than 10-15 ppb, except for the measurements in the LT of Central and South Asia and in the UT of North Asia, where emission inventories seem to be less accurate.
• CO anomalies calculated by SOFT-IO are very close to observations in the LT and UT, where most of the IAGOS data are recorded. Agreement is lower in the MT, possibly because of numerous thinner plumes of lower intensity (maybe linked to the methodology of the plume selection).
• SOFT-IO has less skill in modeling CO in extreme plume enhancements, with biases higher than 50 ppb.

In its current version, SOFT-IO is limited by different parameters, such as the inherent parameterization of the Lagrangian model, but also by the input of external parameters such as meteorological field analyses and emission inventories. Sensitivity analyses were therefore performed using different meteorological analyses and emission inventories, and are summarized as follows:
• Model results were not very sensitive to the resolution of the meteorological input data. Increasing the resolution from 1° to 0.5° resulted only in minor improvements. On the other hand, using operational meteorological analyses allowed more accurate simulations than using ERA-Interim reanalysis data, perhaps related to the better vertical resolution of the former.
• Concerning the sensitivity tests on anthropogenic emissions, results display regional differences depending on the choice of emission inventory. Slightly better results are obtained using MACCity.
• Model results were not sensitive to the choice of global biomass burning inventory, with good results using either GFED 4 or GFAS v1.2. However, a regional emission inventory shows better results for a few individual cases with high CO enhancements. There is a low sensitivity to parameterizing the altitude of fire