Quantifying methane emissions from natural gas production in north-eastern Pennsylvania

Natural gas infrastructure releases methane (CH4), a potent greenhouse gas, into the atmosphere. The estimated emission rate associated with the production and transportation of natural gas is uncertain, hindering our understanding of its greenhouse footprint. This study presents a new application of inverse methodology for estimating regional emission rates from natural gas production and gathering facilities in north-eastern Pennsylvania. An inventory of CH4 emissions was compiled for major sources in Pennsylvania. This inventory served as input emission data for the Weather Research and Forecasting model with chemistry enabled (WRFChem), and atmospheric CH4 mole fraction fields were generated at 3 km resolution. Simulated atmospheric CH4 enhancements from WRF-Chem were compared to observations obtained from a 3-week flight campaign in May 2015. Modelled enhancements from sources not associated with upstream natural gas processes were assumed constant and known and therefore removed from the optimization procedure, creating a set of observed enhancements from natural gas only. Simulated emission rates from unconventional production were then adjusted to minimize the mismatch between aircraft observations and model-simulated mole fractions for 10 flights. To evaluate the method, an aircraft mass balance calculation was performed for four flights where conditions permitted its use. Using the model optimization approach, the weighted mean emission rate from unconventional natural gas production and gathering facilities in northeastern Pennsylvania approach is found to be 0.36 % of total gas production, with a 2σ confidence interval between 0.27 and 0.45 % of production. Similarly, the mean emission estimates using the aircraft mass balance approach are calculated to be 0.40 % of regional natural gas production, with a 2σ confidence interval between 0.08 and 0.72 % of production. These emission rates as a percent of production are lower than rates found in any other basin using a top-down methodology, and may be indicative of some characteristics of the basin that make sources from the north-eastern Marcellus region unique. Published by Copernicus Publications on behalf of the European Geosciences Union. 13942 Z. R. Barkley et al.: Quantifying methane emissions from natural gas production


Introduction
The advent of hydraulic fracturing and horizontal drilling technology has opened up the potential to access vast reservoirs of previously inaccessible natural gas, shifting energy trends in the United States away from coal and towards natural gas (EIA, 2016b).From a greenhouse gas (GHG) emissions perspective, natural gas has the potential to be a cleaner energy source than coal.For every unit of energy produced, half as much carbon dioxide (CO 2 ) is emitted through the stationary combustion of natural gas in comparison to coal (EPA, 2016).However, during the process of extracting and distributing natural gas a percentage of the overall production escapes into the atmosphere through both planned releases and unintended leaks in infrastructure.Though these emissions may be small from an economic perspective, their climatological impacts are not negligible (Alvarez et al., 2012;Schwietzke et al., 2014).Methane (CH 4 ), the main component of natural gas, is a potent greenhouse gas with a global warming potential over a 20-year period (GWP 20 ) of 84 (Myhre et al., 2013).Over a 100-year period the GWP is reduced to 28 due mostly to interactions with the hydroxyl radical which transform the CH 4 molecule to CO 2 .Depending on which timespan is used, the relative climatological impacts of natural gas as an energy source compared to coal can vary.Using the GWP 20 value, it is estimated that a natural gas emission rate of greater than 3 % of total gas production would result in a natural gas power plant having a more negative impact on the climate than a coal-powered plant.Using the GWP 100 value, this emission rate threshold shifts to 10 % of production (Schwietzke et al., 2014;Alvarez et al., 2012).Complicating matters further, the future climate impacts associated with an increased availability of natural gas extends well beyond a simple greenhouse gas footprint comparison with coal.Lower fuel prices linked to this new reservoir of energy can change the course of future energy development globally.With many states and countries attempting to find a suitable balance between their energy policies and greenhouse gas footprint, it is important for the scientific community to be able to quantify and monitor natural gas emission rates.
The drilling and transportation of natural gas can be broken down into five stages: production, processing, storage, transmission, and distribution.The United States Environmental Protection Agency (EPA) uses a bottom-up approach to quantify these emissions, estimating emission rates per facility or component (such as a compressor, unit length of pipeline, pneumatic device) or an average emission per event (such as a well completion or liquids unloading).These "emission factors" are then multiplied by nationwide activity data containing the number of components or events associated with each emission factor, and a total emission rate is produced for the country (EPA, 2015b).This bottom-up approach is a practical methodology for estimating emissions over a large scale but has limitations.A bottom-up inven-tory depends on the quality and quantity of its emission factors and activity data.Emissions from sources in the natural gas industry can be temporally variable and have a wide range of values depending on a number of factors, such as the quality and age of the device and the gas pressure moving through the component.Furthermore, recent studies have shown that a majority of emissions come from a small percentage of devices, often referred to as "super-emitters", creating a long-tail distribution of emission sources (Brandt et al., 2014;Omara et al., 2016;Zavala-Araiza et al., 2015, 2017;Frankenberg et al., 2016).These factors make it difficult to sample enough devices and adequately describe the mean emission rate, thus allowing for significant representation errors in the emission factors.Because emission factors are required for hundreds of different components, these errors can accumulate and lead to systematic biases in the total emissions estimate.
One way to compliment results based on inadequate sample sizes in the bottom-up approach is to measure the aggregated enhancement in the atmospheric mole fraction at larger scales through a top-down approach.Instead of measuring emissions from individual devices and scaling up, a top-down approach takes atmospheric greenhouse gas concentrations measured downwind of a continent (e.g.Bousquet et al., 2006), a region (e.g.Lauvaux et al., 2008), a city (e.g.White et al., 1976;Mays et al., 2009;Lamb et al., 2016), or a facility (e.g.Ryerson et al., 2001) and uses inverse methodologies to attribute the enhancements to potential sources upwind.One of these methods, the aircraft mass balance technique, has been performed at many different oil and gas fields to characterize natural gas emissions (Pétron et al., 2012;Karion et al., 2013Karion et al., , 2015;;Peischl et al., 2015;Conley et al., 2016).While this methodology is able to capture surface fluxes over a large region, it remains difficult to attribute the emissions to any individual source (Cambaliza et al., 2014).Any sources from within the flux region that emit CH 4 will be measured in the downwind observations and be a part of the aggregated regional enhancement.Atmospheric observations may include other sources of CH 4 unrelated to natural gas, such as anaerobic respiration from landfills and wetlands, enteric fermentation from cattle, anaerobic decomposition of manure, CH 4 seepage from coal mining, and many other smaller sources.If the purpose of the study is to solve for the emissions from the natural gas industry, emissions from all sources unrelated to natural gas must be known and removed from the regional flux estimate.Thus, top-down experiments require an accurate CH 4 inventory of the study area, and any errors associated with the inventory will propagate into the final emissions estimate.A more advanced technique used to separate out non-natural gas sources has been developed using ethane as a tracer for natural gas (Smith et al., 2015).However, such methods may struggle in dry gas basins, where smaller ethane-to-methane ratios within the gas can make the ethane signature more difficult to separate out, or in regions where multiple ethane sources are present.
Similarly to bottom-up methods, top-down studies fail to address temporal variability, with observations from many of these studies having been collected during a limited number of 2 to 4 h aircraft flights performed over a period of weeks.
In recent years, both bottom-up and top-down studies have aimed to calculate natural gas emission rates, with bottomup studies generally finding smaller emission rates than their top-down counterparts (Brandt et al., 2014).The discrepancy between the results from these two methodologies must be better understood if the true emission rate is to be known.Both the bottom-up and top-down approaches have their own inherent sources of error.For the bottom-up approach, a small sample size could result in the omission of any super-emitters, resulting in a low emissions bias.For the top-down approach, difficulty in attributing the measured enhancements to their correct sources can lead to errors when solving for the emissions of a particular sector.
Top-down emission estimates of individual basins have shown variation in the emission rate across the different basins.An aircraft mass balance performed over the Barnett Shale in Texas found an emission rate between 1.3 and 1.9 % of production (Karion et al., 2015), yet a similar mass balance study executed over unconventional wells in Uintah County, Utah, calculated an emission rate between 6.2 and 11.7 % of production (Karion et al., 2013).Differences in regional emission rates can perhaps best be illustrated by recent studies in the Marcellus region.The Marcellus Shale gas play is part of the Marcellus geological formation running close to the Appalachian mountain chain from West Virginia to southern New York and contains an estimated 140 billion cubic feet of technically recoverable natural gas (EIA, 2012).Reaching peak production by the end of 2015, the Marcellus is the largest producer of shale in the USA, producing 17 000 million standard cubic feet per day (MMSCFD) of natural gas (EIA, 2016a).A bottom-up study measuring emissions from 17 unconventional well sites in the Marcellus found a median emission rate from the wells of 0.13 % of production but estimated a mean emission rate between 0.38 and 0.86 % of production due to the potential presence of super-emitters, which would skew the mean emission rate towards values higher than the median (Omara et al., 2016).An aircraft mass balance study over north-eastern Pennsylvania calculated an emission rate between 0.18 and 0.41 %, a number that accounted for emissions from the production, processing, and transmission of the gas (Peischl et al., 2015).Both of these derived estimates fall below emission rates calculated throughout other basins and are below the 3 % threshold required for natural gas to be a smaller climate pollutant in comparison to coal over a 20-year timescale.The low rates in the Marcellus compared to other regions could be the result of a systematic difference within the Marcellus that leads to a more efficient extraction of natural gas.However, while useful as a first-guess estimation, current studies performed in the region are based on relatively small sample sizes (1 aircraft mass balance and 88 individual well measurements).
A more thorough analysis of the emission rate in the Marcellus would provide insight into regional differences in CH 4 emissions from different shale basins and help to improve national estimates of emissions from natural gas.
This study seeks to provide confidence in the emission rate for the north-eastern Marcellus by performing the most thorough top-down analysis of the north-eastern Marcellus region to date.CH 4 measurements were taken from aircraft observations across 10 flights in north-eastern Pennsylvania.A new implementation of modelling CH 4 mole fractions is developed to track complex plume structures associated with different emitters, and an optimal natural gas emission rate is solved for each of the 10 flights.An aircraft mass balance technique is also conducted for four of the flights and natural gas emission estimates from this method are compared to those calculated using the modelling technique.Using information on the uncertainty with both methods, a regional emission rate is calculated for the natural gas industry in the north-eastern Marcellus region.

Methods
The objective of this study is to quantify CH 4 emissions coming from unconventional wells and compressor stations, henceforth referred to as upstream natural gas emissions in the north-eastern Marcellus region (defined as the area contain within 41.1-42.2• N 75.2-77.6• W, see Fig. 1) through two different top-down methodologies.CH 4 observations from aircraft data are collected for 10 individual flights over a 3-week period in May 2015.These data are used to solve for the upstream natural gas emission rate using an aircraft mass balance approach.Additionally, a CH 4 emissions inventory for the region is compiled and input into the atmospheric transport model described below.CH 4 concentrations are modelled for each flight, and the upstream natural gas emission rate within the model is optimized to create the best match between aircraft observations and model-projected enhancement, providing another estimate for the upstream natural gas emission rate.The sections below detail the regional CH 4 inventory, the aircraft campaign, the transport model, the model optimization technique, and the mass balance approach used in this study.

Regional methane emission inventory
In this study we characterize emissions from the natural gas industry into five different sectors: emissions from wells, emissions from compressor facilities, emissions from storage facilities, emissions from pipelines, and emissions in the distribution sector.
To estimate CH 4 emissions from the production sector of the natural gas industry, data were first obtained on the location and production rate of each unconventional well from the Pennsylvania Department of Environmental Protection www.atmos-chem-phys.net/17/13941/2017/Oil and Gas Reports website (PADEP, 2016) and the West Virginia Department of Environmental Protection (WVDEP, 2016).To convert the production rate into an emission rate, we need to assume a first guess as to the expected leakage from wells in the area.A first-guess natural gas emission rate of 0.13 % was applied to the production value of each of the 7000+ producing unconventional wells based on the median rate from Omara et al. (2016).The natural gas emission rate was then converted to a CH 4 emission rate by assuming a CH 4 composition in the natural gas of 95 % (Peischl et al., 2015).
In addition to unconventional wells, the domain also contains more than 100 000 shallow conventional wells.Annual conventional production rates for the year 2014 were obtained through the PADEP Oil and Gas Reports website, the WVDEP, and the New York Department of Environmental Conservation (NYDEC, 2016).Despite the large number of wells, the average conventional well in PA produces 1 % of the natural gas of its unconventional counterpart.However, it is speculated that the older age of these wells and a lack of maintenance and care for them results in a higher emission rate for these wells as a function of their production (Omara et al., 2016).A first-guess natural gas emission rate of 11 % was applied to the production values of the conventional wells based on the median emission rate from the wells sampled in Omara et al. (2016).Similarly to the unconventional wells, the natural gas emission rate was then converted to a CH 4 emission rate by assuming a CH 4 composition in the natural gas of 95 %.
Compressor stations located within the basin are responsible for collecting natural gas from multiple well locations, removing non-CH 4 hydrocarbons and other liquids from the flow, and regulating pressure to keep gas flowing along gathering and transmission pipelines, and can be a potential source for methane emissions.Data for compressor station locations and emissions come from a data set used in Marchese et al. (2015).A total of 489 compressor facilities are listed for Pennsylvania, with 87 % of the listed facilities also containing location data.Emissions for each compressor station are calculated through two different methodologies.In the simplest case, a flat emission rate of 32.35 kg h −1 is applied to each station, which is the mean emission rate of a gathering facility in PA found in Marchese et al. (2015).In the more complex scenario, the same emissions total is used as in the flat rate case but is distributed among the compressor stations linearly as a function of their energy usage.Wattage between compressors in our data set can vary greatly, from 10 kW for small compressors to 7000 kW or more at large gathering facilities.Using the wattage as a proxy for emissions allows us to account for the size and throughput of natural gas at each station and assumes larger stations will emit more natural gas compared to smaller stations (Marchese et al., 2015).
Data on locations of underground storage facilities were obtained from the United States Energy Information Administration (EIA, 2015).For each of these locations, a base emission rate of 96.7 kg h −1 was applied according to the average value emitted by a compressor station associated with an underground storage facility (Zimmerle et al., 2015).
To calculate pipeline emissions, data on pipeline locations needed to be collected.Information on transmission pipelines, which connect gathering compressors to distribution networks, is provided by the Natural Gas Pipelines GIS product purchased from Platts, a private organization which collects and creates various infrastructural layers for the natural gas and oil industry (Platts, 2016).Gathering pipeline data corresponding to the transfer of gas from wellheads to gathering compressors is nearly non-existent for PA with the exception of Bradford County (2016), which maps out all gathering pipeline infrastructure within the county border.In PA, information on the location of a gathering pipeline elsewhere is only available where a gathering line crosses a stream or river.To account for gathering pipelines in the remainder of the state, a GIS model was created using Bradford County pipelines maps in addition to previously generated pipeline maps of Lycoming County (Langlois et al., 2017).A typical pattern was simulated, connecting pipelines between unconventional wells throughout the state (Fig. 2).The resulting pattern follows the valley of the Appalachian Mountains, with larger pipelines crossing through the state to connect the different branches of the network.These pipelines were then multiplied by an emission factor of 0.043 kg per mile of pipe, used for gathering pipeline leaks in the Inventory of US Greenhouse Gas Emissions and Sinks: 1990-2013(EPA, 2015b).
CH 4 emissions from natural gas distribution sources, coal mines, and animals/animal waste were provided from Maasakkers et al. (2016), which takes national-scale emissions from the EPA's greenhouse gas inventory for the year 2012 and transforms it into a 0.1 • × 0.1 • emissions map for the continental USA.For natural gas distribution emissions, various pipeline data were collected at state level and emission factors were accounted for to calculate a total distribution emission for the state.This emissions total was then distributed within the state, proportional to the population density.Emission estimates for coal are calculated using information from the Greenhouse Gas Reporting Program (GHGRP) for active mines and the Abandoned Coal Mine Methane Opportunities Database for abandoned mines (EPA, 2008).State-level emissions missions from enteric fermentation and manure management are provided in the EPA's inventory.These emissions were segregated into higher resolutions using county-level data from the 2012 US Census of Agriculture (USDA, 2012) and land-type mapping.
Finally, the EPA's Greenhouse Gas Reporting Program data set for the year 2014 was used to capture all other major sources of CH 4 in the region that are otherwise unaccounted for, the majority of which are emissions from landfills and some industrial sources (EPA, 2015a).Sources within the GHGRP that overlap with natural gas sources already accounted for within our inventory were removed to prevent redundancy.
Although our emissions map used for the model runs did not account for potential CH 4 emissions from wetland sources, a series of wetlands emission scenarios was obtained for the region using data from Bloom et al. (2017).From this data set, wetland CH 4 emissions make up only 1 % of all regional CH 4 emissions in the most extreme scenario, and thus we assume their impact is negligible in this study.

Aircraft campaign
Observations for this project were obtained from a 3week aircraft campaign during the period of 14 May-3 June 2015 and are available for public access (https://doi.org/10.15138/G35K54).The campaign was led by the Global Monitoring Division (GMD) of the National Oceanic and Atmospheric Administration Earth Systems Research and Laboratory (NOAA ESRL), in collaboration with the University of Michigan.During this period, the NOAA Twin Otter aircraft flew throughout the north-eastern portion of Pennsylvania, providing a total of 10 flights across 9 days.The aircraft was equipped with a cavity ring-down spectroscopic analyser (Picarro G2401-m) measuring CH 4 , CO 2 , CO, and water vapour mole fractions at approximately 0.5 Hz with random errors of 1 ppb, 0.1 ppm, 4 ppb, and 50 ppm respectively (Karion et al., 2013).GPS location, horizontal winds, temperature, humidity, and pressure were also recorded at 1 Hz.The majority of observations for each flight occurred during the afternoon hours at heights lower than 1500 m above ground level.Each flight contains at least one vertical profile within and above the boundary layer, with temperature and water vapour observations from these profiles used to estimate the atmospheric boundary layer height and ensure that the aircraft sampled air within the boundary layer throughout the flight.Observations suspected of being located above the boundary layer top are flagged and removed from all calculations.3. Dates are mm/dd/yyyy.Flight paths, wind speeds, and CH 4 observations for each of the 10 flights can be seen in Fig. 3.For 6 of the 10 flights, a box pattern was flown around a large portion of unconventional natural gas wells in north-eastern PA.These flights were performed typically on days with a strong, steady wind, with a clearly defined upwind and downwind transect intended for use in an aircraft mass balance calculation.Five of the six box-pattern flights were composed of two loops circling the gas basin, allowing for two separate calculations of the upstream natural gas emission rate for the flight.On the remaining four flights, raster patterns were performed to help to identify spatial complexities of CH 4 emissions within the basin.All 10 flights were used in the model optimization calculation of the upstream natural gas emission rate.

Transport model
The atmospheric transport model used in this study is the Advanced Weather Research and Forecasting (WRF) model (WRF-ARW, Skamarock et al., 2005) version 3.6.1.The WRF configuration for the model physics used in this research includes the use of (1) the double-moment scheme (Thompson et al., 2008) for cloud microphysical processes, (2) the Kain-Fritsch scheme (Kain and Fritsch, 1990;Kain, 2004) for cumulus parameterization on the 9 km grid, (3) the rapid radiative transfer method for general circulation models (GCMs) (RRTMG, Mlawer et al., 1997;Iacono et al., 2008), (4) the level 2.5 TKE-predicting MYNN planetary boundary layer (PBL) scheme (Nakanishi and Niino, 2006), and (5) the Noah 4-layer land-surface model (LSM), which predicts soil temperature and moisture (Chen and Dudhia, 2001;Tewari et al., 2004) in addition to sensible and latent heat fluxes between the land surface and atmosphere.
The WRF model grid configuration used in this research contains two grids, 9 and 3 km, each with a mesh of 202 × 202 grid points.The 9 km grid contains the mid-Atlantic region, the entire north-eastern United States east of Indiana, parts of Canada, and a large area of the northern Atlantic Ocean.The 3 km grid contains the entire state of Pennsylvania and most of the state of New York.Fifty vertical terrain-following model layers are used, with the centre point of the lowest model layer located at ∼ 10 m above ground level.The thickness of the layers stays nearly constant with height within the lowest 1 km, with 26 model layers below 850 hPa (∼ 1550 m a.g.l.).One-way nesting is used so that information from the coarse domain translates to the fine domain but no information from the fine domain translates to the coarse domain.
The WRF modelling system used for this study also has four-dimensional data assimilation (FDDA) capabilities to allow meteorological observations to be assimilated into the model (Deng et al., 2009).With WRF FDDA, observations are assimilated through the entire simulation to ensure the optimal model solutions that combine both the observation and the dynamic solution, a technique referred to as dynamic analysis.Data assimilation can be accomplished by nudging the model solutions toward gridded analyses based on observations (analysis nudging) or directly toward the individual observations (observation nudging), with a multiscale gridnesting assimilation framework, typically using a combination of these two approaches (Deng et al., 2009;Rogers et al., 2013).
FDDA (Deng et al., 2009) was used in this research with the same strategy as used in Rogers et al. (2013).Both analysis nudging and observation nudging were applied on the 9 km grid, and only observation nudging was applied on the 3 km grid.In addition to assimilating observations and using the North America Regional Reanalysis model as initial conditions, we reinitialize the WRF model every 5 days, allowing 12 h of overlap period in consideration of model spin-up period to prevent model errors from growing over long periods.The observation data types assimilated include standard WMO surface and upper-air observations distributed by the National Weather Service (NWS), available hourly for the surface and 12-hourly for upper air, and the Aircraft Communications Addressing and Reporting System (ACARS) commercial aircraft observations, available anywhere in space and time with low-level observations near the major airports.
The WRF model used in this study enables the chemical transport option within the model, allowing for the projection of CH 4 concentrations throughout the domain.Surface CH 4 emissions used as input for the model come from our CH 4 emissions inventory and are all contained within the 3 km nested grid.Each source of CH 4 within our inventory is defined with its own tracer (Table 1), allowing for the tracking of each individual source's contribution to the overall projected CH 4 enhancement within the model.For this study, CH 4 is treated as an inert gas.The potential for interaction with the hydroxyl radical (OH), the main sink of CH 4 , is neglected.A calculation assuming an above-average OH mole fraction over a rural region of 0.5 pptv (Stone et al., 2012) and a reaction rate of 6.5 × 10 −15 (Overend et al., 1975) produces a CH 4 sink of 0.5 ppb h −1 .The duration of a flight can be up to 3 h, leading to a potential loss of 1.5 ppb over the course of a flight.This loss is small but not insignificant.CH 4 plumes associated with natural gas during each flight ranged between 15 and 70 ppb, and a change of 1.5 ppb could theoretically impact observations by as much as 10 % of the plume signal.However, this decrease in the CH 4 mole fraction would likely have equal impacts on both the background CH 4 values as well as the enhancement.Because emission calculations are based on the relative difference between the CH 4 background mole fraction and the enhancement downwind, it would take a gradient in the oxidation of OH to impact the results.Considering this relatively low destruction rate, the expected homogeneity of the sink across the region, and the difficulties associated with the simulation of chemical loss processes, we assumed that the CH 4 mass is conserved throughout the afternoon and therefore we ignored the impact of oxidation by OH.

Model optimization methodology
The objective of the model optimization technique is to solve for an emission rate as a percent of natural gas production that creates the best match between modelled CH 4 concentration maps, provided by the transport model, with actual CH 4 mole fraction observations provided by the aircraft data.The optimization process in this study was originally designed to solve for natural gas emission from unconventional wells and emissions from compressor facilities separately.Because the flow rate of natural gas being processed was not available for each compressor station, emissions at each facility were originally scaled based on the size of the station.However, when running the transport model using this emissions map, enhancements from the compressor stations produced plume structures nearly identical in shape to enhancements from the unconventional wells due to the similar spatial distributions of these two tracers.Without distinct differences between the enhancement patterns from each tracer, it becomes impossible to distinguish which emissions source must be adjusted to obtain the closest match to the observations.For this reason, emissions from compressor facilities are merged with unconventional well emissions in the optimized emission rate.Though the emission rate solved for in this experiment only uses the locations and production for the unconventional wells, this optimized rate represents emissions from both the wells and compressor facilities and are referred to as the modelled upstream natural gas emission rate.Midstream and downstream natural gas processes (such as processing, transmission and distribution of the gas) and emissions from conventional wells are not solved for in this study due to their minimal contribution (less than 5 %) to CH 4 emissions in the region encompassed by the aircraft campaign.Using the transport model WRF-Chem, CH 4 atmospheric enhancements were generated for each flight using different tracers to track different components to the overall CH 4 enhancement (e.g.animals/animal waste, distribution sector, industries).From these concentration fields, the upstream natural gas emission rate was solved for each flight using a threestep model optimization technique.First, a background concentration was determined for each flight and subtracted from the observations to create a set of "observed CH 4 enhancements", using where X Obs is the CH 4 mole fraction observation from the aircraft, X bg is a chosen background value for the flight, and X EnhO is the calculated CH 4 enhancement at each observation.In this study, the background value is defined as the ambient CH 4 mole fraction over the region not accounted for by any of the sources within the model, with each flight having a unique background value.Box-pattern flights containing 2 loops around the basin may have a different background value assigned for each loop.To determine the background mole fraction, we start with the value of the observed mole fraction in the lowest 2nd percentile of all observations within the boundary layer for a given flight or loop.This chosen background value represents the CH 4 mole fraction across the flight path from sources that are outside of our model domain.Because the background value is meant to represent the CH 4 mole fraction outside the model domain that is otherwise unaccounted for in our model, using the observations with the lowest CH 4 mole fraction is not always a sufficient definition for the background.On certain days, CH 4 enhancements from sources within the model domain can form plumes with wide spatial coverage that cover all observations during a flight.For example, during a flight the lowest CH 4 observations from the aircraft may be 1850 ppb, but the model simulation during that period indicates that all observations within the flight are being impacted by at minimum a 20 ppb enhancement.In this case, we would set our background value for the flight at 1830 ppb, and say that our 1850 ppb observations from the flight are a combination of an 1830 ppb background in addition to a 20 ppb enhancement from sources within the model.By subtracting this background value from our observations, we create a set of observed CH 4 enhancements, which can be directly compared to the model-projected enhancements.The next step is to remove enhancements from this set that are not associated with emissions from upstream natural gas using where X OtherM is the modelled CH 4 enhancement at each observation from sources unrelated to upstream natural gas processes, and X GasO is the observation-derived CH 4 enhancement associated with upstream natural gas emissions for each observation.In this step, each observed CH 4 enhancement has subtracted from it the projected non-natural gas enhancement from the model (i.e.nearest grid point in space) using the corresponding model output time closest to the observation within a 20 min time interval.This creates a set of observed CH 4 enhancements related only to emissions from upstream gas processes, filtering out potential signals from other CH 4 emitters and providing a set of observed enhancements that can be directly compared to the projected upstream natural gas enhancement within the model.By subtracting these other sources from the observations, we make the assumptions that our emissions inventory is accurate for non-natural gas sources and that the transport of these emissions is perfect, both of which are actually uncertain.Because errors exist in both the emissions and transport, it is possible to create a negative observation-derived upstream gas enhancement if model-projected enhancements from other sources are larger than the observation-derived enhancement.From the 10 flights, 16 % of the observationderived enhancements are negative, but only 3 % are negative by more than 5 ppb.To avoid solving for unrealistic negative values, these negative observation-based upstream gas enhancements are set to 0. Errors associated with this issue and other uncertainties with our inventory are examined further in the uncertainly analysis section of this paper.
In the final step, the upstream natural gas emission rate within the model is adjusted to create the best match between the modelled upstream gas enhancement and observationderived upstream gas enhancement using where X GasO and X GasM are the observed and modelled enhancements for each observation.In this equation, J is a cost function we are trying to minimize by solving for a scalar multiplier C, which, when applied to the modelled natural gas enhancements, creates the smallest sum of the differences between the observation-derived upstream gas enhancement and the modelled upstream enhancement.Because the emission rate within the model is linearly proportional to the model enhancements, we can solve for the upstream natural gas emission rate that minimizes the cost function using where 0.13 was the first-guess upstream emission rate (in percent of production) used in the model, and E is the optimized emission rate for the flight as a percentage of the natural gas production at each well.This final value represents an overall emission rate associated with both unconventional wells and compressor stations across the region.
The decision to use a scalar cost function rather than the sum of squares is to account for possible misalignment between any observed CH 4 plume and modelled plumes.There are two potential ways in which misalignment may occur.One possibility is that the modelled wind direction differs from the true wind direction, leading to a plume in the model that is off-centre in relation to the observed plume.The other possibility relates to how the model treats emissions from natural gas as a uniform percent of production.In reality the emissions are more random in nature, and thus the plume may not always develop over the wells with the largest production values.If a cost function is used that minimizes the sum of the squares, any misalignment between the modelled and observed plume will result in the peak of the modelled plume aligning with the height of the tail of the observed plume (Fig. 4).Unless the observed plume aligns perfectly with the modelled plume, the optimized emission rate using a sum of squares approach will always have a low bias.By using a scalar cost function, we solve for an optimized emission rate that results in a plume with the same area under the curve compared to the observed plume (Fig. 4).This methodology is not impacted by any misalignment between the modelled and observed plumes, preventing the low biases associated with a sum of squares minimization.

Model optimization uncertainty assessment
For each of the 10 flights, an uncertainty assessment was performed to obtain a range of likely upstream emission rates for any individual flight.Five different sources of error were considered in this assessment: model wind speed error, model boundary layer height error, CH 4 background error, CH 4 emission inventory error, and model/observation mismatch error.These five sources of error vary substantially from flight to flight depending on conditions, and each can have significant impacts on the total uncertainty (Tables 4,  5).
Errors in the modelled wind speed and boundary layer height have impacts on our emission estimates that linearly impact the results.If we assume a constant wind speed, a constant boundary layer height, and no entrainment of air from the top of the boundary layer, we can use the following equation to understand their impacts.
where C is the total CH 4 enhancement of the column of air contained within the boundary layer, F 0 is the average emission rate over the path the parcel travelled, x is the distance the column of air travelled, U is the wind speed and D is the boundary layer height.Using this equation, we can see the linear relationship between the model wind speed, model boundary layer height, and the calculated emission rate.As an example, if wind speeds in the model have low bias, natural gas enhancements projected by the model increase inversely.To compensate for this effect, the optimized emission rate decreases proportionally.A similar case can be made for bias in the boundary layer height.Both errors in the wind speed and boundary layer height have known impacts on the optimized emission rate which can be corrected for, as long as the errors of each are known.
To calculate the error in the model wind speed, we assume aircraft observations are truth and use where U obs is the mean wind speed observed by the aircraft across all points within the boundary layer, U m is the mean modelled wind speed across all points closest in time and space to each observation, and U e is the wind speed error percentage.
To compute the error in the modelled boundary layer height, the observed boundary layer height for each flight is assumed to be the true boundary layer height, and the boundary layer height percentage error, H e , is estimated using where H obs is the average observed boundary layer height across each of the aircraft profiles for a given flight, H m is the model boundary layer height closest in time and space to the location of the observed profiles averaged over all profiles.For both the observation and the model, boundary layer heights were determined by locating the height of the potential temperature inversion associated with the top of the boundary layer.On the 22 May flight where a potential temperature inversion could not easily be identified in the observations, changes in water vapour, CO 2 , and CH 4 mixing ratios were used to identify the boundary layer top.
Errors in the model wind speed and boundary layer height are calculated for each of the 10 flights.From these errors, a corrected optimized emission rate is calculated for each flight using Eq. ( 8): where E is the original emission rate, and E new is the corrected optimized upstream natural gas emission rate as a percent of production.
In addition to errors related to wind speed and boundary layer height, we quantify three other sources of error in each flight: errors in the selected CH 4 background value, errors in the CH 4 inventory, and errors associated with the overall model performance (Table 5).Unlike the wind speed and boundary layer errors, which have easily computable impacts on the emission estimates, these other three sources of error and their impact on the optimized emission rate are more difficult to quantify.For example, a background value of 1 ppb below the true background for a given flight would add 1 ppb to each observed natural gas enhancement for all observations, creating a high bias with the optimized upstream emission rate.
To account for this error, each flight's optimization processes was rerun, iterating the background value by ±5 ppb, and the ratio of the percent change in the emission rate compared to the original case was defined as the resulting error in the emission rate due to background uncertainty.This ±5 ppb background error range is an estimate at the range of possible error in the background based on changes observed in the upwind measurements from each of the flights and is meant to be a conservative estimate of the error.The impact this error can have on the emission rate varies depending on the magnitude of the observed downwind enhancements during a flight.A plume containing a CH 4 enhancement of 50 ppb will have a smaller relative error from a 5 ppb change compared to one with an enhancement of only 10 ppb.Thus, days with high wind speeds and a high boundary layer height (and thus enhancements of a smaller magnitude) tend to be affected the most by background errors.
Similarly to background errors, errors from the CH 4 emissions inventory are difficult to quantify.In the model optimization technique, we subtract enhancements from sources unrelated to unconventional natural gas before solving for the upstream gas emission rate.In doing so, we are making the assumption that our emissions inventory for sources unrelated to upstream natural gas processes are accurate.In truth, each emission source in our inventory comes from a different data set and has its own unique error bounds, many of which are unknown.To simulate the potential errors associated with unknown bounds in our inventory, we use a Monte Carlo approach and iterate the unconventional emissions optimization approach for each flight 10 000 times, applying a random multiplier between 0 and 2 for each of the different sources not associated with unconventional natural gas production.The resulting range of optimized natural gas emission rates was fit to a Gaussian distribution and the 2σ emission range was calculated.Despite varying the emissions used in the error analysis by 0 to 200 % of their original value, their impacts on the optimized natural gas emission rate are minimal on most days due to the north-eastern Marcellus region having very few emission sources not related to upstream natural gas processes.Only for the flights on 24 May do we see errors from the inventory contributing significantly to the overall daily error, when the coal plume in south-western PA enters the centre of the study region and has a large role in the upstream natural gas emission rate calculation for that day (Table 5).
The final source of error attempts to quantify the similarity of the pattern of modelled and observed natural gas enhancements, referred to here as the model performance error.Figure 5 shows an example of 2 days, one in which the model appears to recreate the observations, and the other in which the model poorly matches the shape of the observed enhancements.Having compared these two simulations with no other information, we hypothesize that one should put more trust in the upstream natural gas emission rate calculated for the flight with modelled upstream enhancements that match structurally compared to the emission rate from the flight with a modelled enhancement that bares little semblance to the observed enhancement.The model performance error is designed to account for the trustworthiness of the optimized upstream emission rate based on how well the model simulates a given day.The model performance error is calculated using a modified normalized root mean squared error formula given in Eq. ( 9): In this equation, σ X is the standard deviation of the difference between the modelled and observation-derived upstream natural gas CH 4 enhancement using the optimized emission rate, and X gas is the observed magnitude of enhancement from the major natural gas plume observed in each flight.Here, X gas serves as a normalization factor to account for the varying strength of the enhancement from flight to flight and ensures that days with increased enhancements due to meteorological conditions or true daily fluctuations in the upstream natural gas emissions do not proportionally impact the performance error percentage.For example, a day with high winds and a deep boundary layer would produce smaller enhancements, leading to a small σ X regardless of model performance unless normalized by X gas .

Aircraft mass balance method and uncertainty assessment
An aircraft mass balance calculation was performed for four applicable flights from the aircraft campaign as an alternative method to calculate upstream natural gas emission rates independently of the transport model.The aircraft mass balance approach uses the CH 4 enhancement between a downwind and upwind transect to calculate the total CH 4 flux of the area contained between the two transects.We use the mass balance equation from Karion et al. (2013): where E is the total flux (in mol s −1 ) coming from the enclosed flight track, U is the mean wind speed (in m s −1 ), θ is the mean angle of the wind perpendicular to the flight track, X is the CH 4 enhancement measured along the downwind flight track from −b to b (expressed as a mole fraction), n air is the molar density of air within the boundary layer (in mol m −3 ).Each of the integrals represents the summing over all air being measured within our transect in both the horizontal (x) and the vertical (z).By simplifying this further and using the mean enhancement along each downwind transect as the enhancement and choosing z top to be the top of the boundary layer, we can transform the previous equation into the following: Z. R. Barkley et al.: Quantifying methane emissions from natural gas production  where L is the length of the transect (in metres), D is the depth of the boundary layer (in metres) found using observations from vertical ascents during each flight, X is the mean enhancement across the transect (expressed as a mole fraction), U and θ are the mean wind speed (in m s −1 ) and wind direction relative to the angle of the transect, and 37.3 is the average molar density of dry air within the boundary layer (in mol m −3 ), assuming an average temperature and pressure of 290 K and 900 hPa.Out of the 6 days of the aircraft campaign with a clearly defined upwind and downwind transect, 1 day (14 May) con-tained a surface high-pressure centre in the middle of the flight, resulting in erratic wind patterns, and another day (25 May) had CH 4 plumes from south-western PA affecting portions of the flight observations.These days were not used for mass balance, and calculations were performed for the remaining four box-pattern flights (22,23,28,29 May).From this list of remaining flights, three of them contained two loops around a portion of the Marcellus basin.A mass balance was performed on each loop, resulting in a total of seven mass balance calculations for the region across 4 days.Table 6 summarizes the results from the mass balance flights.
For each flight, a total flux within the box encompassed was calculated using Eq. ( 11).Using this flux, a natural gas emission rate based on production from within the box was calculated using Eq. ( 12) where E is the total flux from Eq. ( 11) (in kg h −1 ), E other are the emissions enclosed in the box from sources not related to upstream natural gas processes (in kg h −1 ), P is the total CH 4 from natural gas being produced within the box (in kg h −1 ), and E % is the resulting natural gas emission rate as a percent of total production within the box.
As an error analysis for the mass balance flight, we look at four potential sources of error (Table 7).One source of uncertainty comes from the observed wind speed used in Eq. ( 11).For our experiment, we take the mean observed wind speed from the aircraft and assume this value represents the mean wind speed within the entire box during the 2-4 h period it would take for air to travel from the upwind transect to the downwind transect.To understand the uncertainty and biases associated with this assumption, we recreate wind observations along the flight path using values from WRF-Chem, and compare the mean wind speed from the simulated obser- vations to the mean model winds contained within the box integrated throughout the boundary layer during the 3 h period closest to the flight time.By making this comparison, we are able to understand the representation error associated with treating the wind speed observations from the aircraft as the wind speed within the entire box during the period it would take for air to cross from the upwind transect to the downwind transect.On average, modelled wind speeds following the flight were 7 % faster than integrated wind speeds within the box due to the inability for aircraft observations to account for slower wind speeds closer to the surface.This bias was removed from each day's calculated wind speed.After accounting for the wind speed bias, the average error of the modelled wind speed following the flight path compared to the modelled winds within the box was 3 %.This 3 % uncertainty was applied to each flight and used as the potential uncertainty in the mean wind speed.Errors in the wind direction were neglected, as each flight used in the mass balance completely surrounded the basin using downwind transects at multiple angles, and thus small errors in the wind angle would result in a negligible net change in the total flux calculated.
Another source of uncertainty is error in the boundary layer height.For each flight, between 2 and 3 vertical profiles were performed, and the mean height was used in Eq. ( 11).The standard deviation of different heights from each transect was used as the uncertainty.On 22 May, a boundary layer height could be interpreted from only one vertical transect.For this day, we assume an uncertainty of ±200 m (±9 %).
Uncertainty in the CH 4 background mole fraction was estimated similarly to the boundary layer height.On three of the four flights, two upwind transects were performed.The mean observed CH 4 mole fraction between the two transects was used as the background value for the entire flight, and the standard deviation between the loops was used as the uncertainty.On both the 23 and 28 May flights, background differences between the two transects were less than the instrument error of 1 ppb.On these days, we use the instrument error as the background error.On 22 May, only one upwind transect was usable for the calculation.For this day, we assume a conservative estimate in the uncertainty of the background of ±5 ppb.Finally, we assess the uncertainty in the emissions inventory.After a CH 4 flux is calculated for each loop, emissions from sources contained within the box that are not associated with upstream natural gas processes must be subtracted to solve for the upstream natural gas emission rate.Any errors associated with our inventory will result in a CH 4 source attribution error.To account for the potentially large uncertainty with the emission sources in our inventory, we vary these non-natural gas emissions by a factor of 2 to test the impact on the solved upstream natural gas emission rate.Because north-eastern Pennsylvania contains few sources of CH 4 emissions outside of natural gas production, the impact of this uncertainty is typically less than 20 % of the total emissions calculated within the box.

Methane inventory
From the first-guess CH 4 inventory created in this study, a total anthropogenic CH 4 emission rate of 2.76 Tg CH 4 yr −1 is projected within our inner model domain (Fig. 6) with values for individual source contributions shown in Table 2.This total emissions estimate assumes a leak rate of 0.13 % of gas production for unconventional wells and does not account for emissions from natural gas transmission and storage facilities outside of PA due to a lack of information available from other states.Within the model domain, the area encompassing south-western PA and north-eastern WV stands out as the largest contributor to CH 4 emissions, with emissions from conventional gas, unconventional gas, and coal mines all having significant contributions to the total.In particular, the large emissions from coal make this region unique in comparison to other shales.The EPA's Greenhouse Gas Reporting Program data set for the year 2014 lists individual coal mines in the south-western portion of our domain as 8 of the top 10 CH 4 emitting facilities across the entire United States.This large area source of CH 4 can have an impact on CH 4 concentrations hundreds of kilometres downwind and must be taken into account when winds are from the southwest (Fig. 7).Examples of this plume and its impacts on the aircraft campaign are discussed in Sect.3.2.1.

Case studies
From the aircraft campaign, a total of 10 flights across 9 days were used in the model optimization technique.For each one of these flights, CH 4 concentration fields were produced using WRF-Chem, and the emission rate from upstream gas processes was adjusted as outlined in the methods section to find the rate that best matches the total observed CH 4 enhancement.For box flights that completed two loops around the basin, emission rates were calculated for each loop independent from one another and then averaged for the flight.Table 3 provides the general meteorology for the 10 flights.
During each of the observational periods, we use the transport model to project the mole fraction enhancement across the region for each of the different CH 4 tracers (Fig. 8).From these projections, we see three common sources of CH 4 which can significantly influence the observed mole fractions in our study region of north-eastern PA.The first is emissions   between the modelled and observed enhancements match closely after optimization.For this flight, a box pattern was flown, encompassing a majority of the unconventional wells in north-eastern PA, and enhancements were observed along the western and northern transects of the flight.Modelled enhancements from sources unrelated to upstream gas emissions showed a broad CH 4 plume associated mostly with animal agriculture along the western edge of the flight, and a smaller enhancement on the eastern edge associated with two landfills in the Scranton/Wilkes-Barre urban corridor (Fig. 9).Both of these enhancements are subtracted off from the observations to produce a set of observation-derived enhancements due to upstream natural gas production and gathering facilities.Any enhancements in this new observational data set are located almost entirely along the northern transect of the flight, directly downwind of the natural gas activity in the region.The observation-derived upstream gas enhancement is then directly compared to the modelled upstream enhancement using its first-guess emission rate, and an optimized upstream emission rate of 0.26 % of production (i.e. a doubling of the first guess) is calculated by minimizing the difference between the two data sets (Fig. 10).
The match between observed and modelled CH 4 enhancements on the first loop of the 29 May flight is closer than any other flight in the campaign.The success of the model on this day is likely due to a number of ideal conditions.In general, inconsistencies between the modelled and observed mean wind speeds and boundary layer heights can have a linear bias on the projected enhancements, but for this flight, differences between the observed and modelled wind speed and boundary layer height were near 0 for both loops (Figs. 11,12).Observed wind directions throughout the course of the flight had little directional spread and the averaged observed wind direction was only 9 • different from modelled values,  resulting in a transport of the CH 4 plumes that the model was able to match well.Furthermore, the observed mean wind speed was 4.6 m s −1 , a moderate wind which allows for a steady transport of any enhancements towards the downwind transect but not strong enough to dilute their magnitude, resulting in an easily observable enhancement downwind of the basin.Finally, intrusions from sources unrelated to upstream gas were small on this day due to favourable wind conditions, reducing the probability of incorrectly attributing the observed enhancements to the wrong source.Enhancements from upstream natural gas processes were between 15 and 40 ppb along our downwind transect.By comparison, enhancements from other sources were lower than 15 ppb along a majority of the flight, and most of these enhancements were located west of the downwind transect, making them easier to identify and remove without unintentionally impacting enhancements from the natural gas plume.All of these different factors likely contributed to producing a situation where the model was successfully able to match CH 4 observations during the 29 May flight.
Flights that occurred on days with a south-westerly wind had a tendency to produce CH 4 observations that were intuitively difficult to interpret due to convolved CH 4 sources in south-western Pennsylvania.One of these complex observation sets occurred during the late afternoon flight on 24 May 2015 (Fig. 13).Observations on this day show a CH 4 enhancement that decreased with latitude, with higher CH 4 mole fractions observed farther south.Given the location of the wells in the middle of the flight path and the WSW wind pattern in the region, this northern/southern CH 4 gradient is unexpected and counterintuitive compared to where one would expect the enhancements to be based solely on the presence of the gas industry in north-eastern PA.However, through modelling each of the many contributors of CH 4 within our inventory, we are able to recreate this latitudinal CH 4 gradient and better understand the observed pat- terns (Fig. 13).Throughout an 18 h period leading up to the 24 May flight, winds from the SSW transport emissions from coal in south-western PA north-eastward until they reach the centre of the state, where a westerly wind then shifts the plume across the study region such that it only intersects the southern half of the flight path.Because of both the magnitude of the coal emissions and an accumulation that occurred in the south-western portion of the state during the previous night, the modelled enhancement from the coal plume is substantial (> 20 ppb), as it crosses over the flight path and covers up much of the signal from upstream gas emissions.Nonetheless, the transport model is able to account for these far-reaching sources and attempts to separate out their contribution to the observed enhancements.We are able to recreate the 24 May flight observations more accurately than most other flights, with a correlation coefficient of 0.71 between the observations and model CH 4 values.Although the model successfully recreates the overall observed CH 4 pattern on this flight, attempting to match model vs. observationderived enhancements specifically from upstream natural gas contributions is much more difficult.Contributions from nonnatural natural gas sources are large such that they overwhelm much of the signal from local natural gas sources.
After subtracting out non-natural gas sources from the observations, the correlation specifically between modelled and observation-derived upstream natural gas enhancements is only 0.11.Despite the model's success at recreating observations from the 24 May late-afternoon flight, there is reason to be careful when interpreting results on days with observations influenced by distant sources.In particular, some transport error is unavoidable in atmospheric reanalyses, and the longer the time and distance a plume takes to reach the observations, the more its position and magnitude will be susceptible to these errors.During the early 24 May flight, a small 50 km shift in the location of the coal plume across the study region would change projected enhancements at some observations by as much as 20 ppb.Furthermore, errors in the transport speed could create scenarios where the coal plume either arrives at the study region too early or exits too late, creating a projected enhancement pattern that does not agree with the observations (Fig. 14).Additionally, inaccuracies with the emission estimates of non-unconventional gas sources in the inventory will impact the magnitude of their CH 4 enhancements, creating additional errors in the optimization process when subtracting out these enhancements from the observations.The early-afternoon 24 May flight and 25 May flight are both examples in which influences from CH 4 sources in south-western PA create complex structures in the enhancements, which the model is not able to match.Another example is the late-afternoon flight on 24 May (Fig. 15).Although observations and modelled enhancements closely match throughout portions of these two flights, a slight shift in the modelled wind direction can lead to vastly differing results due to the large offset small changes in the wind field can have on an emission source hundreds of kilometres away.Thus, results from the flights on 24 and 25 May should be taken with caution.A deeper analysis of these errors can be found in Sect.3.2.2.

Emission rates and uncertainty assessment
Table 4 shows the wind speed and boundary layer height errors for each flight as well as the optimized and corrected natural gas emission rates.On days where model performance was poor with regards to the wind speed and boundary layer height, we can see changes in the corrected emission rate.
For most days, this change is less than 20 % different than the original optimized emission rate.However, both 14 and 25 May have corrected emission rates which are around a factor of 2 different from their original value.Whether these corrected emission rates are more accurate than the original optimized rates is debatable.To calculate these alternative emission rates, we must assume that the wind speeds and boundary layer heights from our limited number of observations are the true values in the atmosphere, which may not be the case.Regardless of which rate is more accurate for each flight, the overall 16 % high bias in the model wind speed and the −12 % low bias in the model boundary layer result in compensating errors that cancel out, and the mean emission rates across all flights end up similar.Thus, any errors associated with these two meteorological variables has a trivial impact on the overall calculated emission rate for the region, and the uncorrected emission rates are used for the final mean and uncertainty calculations.Table 5 summarizes the background error, inventory error, and model performance error, and assumes independence between the three error sources to calculate the total uncertainty for each flight.The largest uncertainty exists for the 22 May flight, where an unexplained enhancement along the northern transect led to a poor match between the modelled enhancements and the observed enhancements.This may explain the anomalously high optimized emission rate for that day.Other flights with large uncertainty are those that occurred on 24 May, where enhancements from south-western PA are believed to be influencing large portions of the observations.
Based on the conservative methodology used to calculate these uncertainties, we assume the total uncertainty for each flight represents a 2σ range of possible emission rates and calculate a weighted mean and a 2σ confidence interval for the overall upstream emission rate across the 10 flights.From this approach, we find a mean upstream emission rate of 0.36 % of production and a 2σ confidence interval from 0.27 to 0.45 % of production.

Aircraft mass balance results
Calculated emission rates varied extensively between flights used for the mass balance analysis, ranging from 0.11 to 1.04 % of natural gas production (Table 6).When comparing emission rates between loops on the same day, we see more consistency in the values.This result is not surprising, as on each of the days with multiple loops, upwind and downwind CH 4 concentrations patterns tended to be similar between loops.Thus, differences in the total emission rate are likely due to either errors specific to each day (such as background variability, errors in meteorology) or real daily variability in the upstream natural gas emission rate.
From Table 7, we can see that the largest error with regards to the absolute uncertainty in the emission rate occurs on the 22 May flight.It is on this day that we have the largest uncertainty in the background value, with observations towards the end of the flight becoming unusable due to a rapid and unexplained decrease in the CH 4 mole fraction of 8 ppb over a 30 min period (Fig. 16).This day also features the highest boundary layer height and fastest winds of all flights in this study, reducing the magnitude of the enhancement asso- Using the mean estimated CH 4 emissions and uncertainty for each loop, we calculate a daily mean emission rate and uncertainty for each of the 4 days.We then solve for an unweighted mean across the four flights to derive our overall emissions estimate from the aircraft mass balance approach, and use the standard error of the flights to estimate the uncertainty.In doing so, we derive a natural gas emission rate from upstream processes of 0.40 % of production, with a 2σ confidence interval from 0.08 to 0.72 % of production.Here, we use the arithmetic mean rather than a weighted mean due to the linear relationship between the size of the emis- sion rate and the size of the errors.Because errors associated with ABL height and wind speed have a proportional impact on the calculated emissions within the box, days with a high emissions estimate produce large uncertainties relative to days with a small emission rate.A weighted mean approach assigns more weight to the days with low estimated emissions and produces an overall emission estimate with an uncertainty range too small to have confidence in (0.12 ± 0.02 percent of gas production).

Upstream emission rate
From this study, we estimate an emission rate between 0.27 and 0.45 % of gas production using the model optimization method and 0.08-0.72 % of gas production using the aircraft mass balance with a 2σ confidence interval.Figure 17 provides the emission range estimates from upstream natural gas processes using both the model optimization technique and mass balance technique when applicable.Top-down studies of other basins in the USA have all found emission rates greater than 1 % of production, and thus the rates calculated for the north-eastern Marcellus basin are the lowest observed yet, raising questions as to why the values in this region appear to be low.One possibility may be related to the high efficiency of the north-eastern Marcellus region compared to other major shale plays (Table 8).In terms of gas production per unconventional well, the Marcellus is the highest of all major basins in the USA.Furthermore, the gas production per well increases by nearly a factor of two when focusing specifically on Susquehanna and Bradford counties in northeastern Pennsylvania where the majority of the wells from this study are located (Fig. 1).The large difference in production per well between the north-eastern Marcellus and other shales may partly explain the low emission rates as a percentage of production.Throughout this study, we normalize natural gas emissions as a percentage of total production under the assumption that higher throughput of natural gas in a sys-tem should lead to higher emissions in the system.However, if leaks are more influenced by the number of components in operation rather than by the throughput passing through the wells, a high production-per-well system such as the unconventional wells in the north-eastern Marcellus could end up having a very low emission rate as a percentage of production but a similar emission rate compared to other basins based on the number of wells, compressors, etc.A thorough bottomup study of the Marcellus region measuring emissions on a device level could provide an answer to this hypothesis.
Although we calculate a low emission rate for this region, rates calculated for 22 and 25 May stand out as outliers where emissions fall well above our uncertainty bounds.It is possible that emissions from natural gas sources were higher on these days compared to others.Releases of natural gas into the atmosphere from short time frame events, such as liquids unloading and venting, can add a temporal component to the emission rate.Such events occurring at an increased frequency during the 22 and 25 May flights could be responsible for the higher emission rates.However, these 2 days have issues that could have affected the optimized emission rate.On 22 May, we observe a sudden drop in the observed CH 4 values that is nearly as large as the main plume on that day, creating concerns about background concentrations.On 25 May, a south-westerly wind was present, and while the model showed the coal plume to be west of the flight path, a small shift in the model wind direction would shift the coal plume over the region.For these reasons we are sceptical but not dismissive of the high emission rates found during these two flights.

Advantages of combining observations with model output
One of the major advantages of using a chemical transport model to solve for natural gas emission rates compared to a standard mass balance approach is that the transport model is able to account for the complex and often non-uniform plume structures originating from sources outside the flight path that can affect observations.When performing a mass   balance over a basin, it is assumed that the upwind transect is representative of the air exiting the downwind transect after subtracting out all sources within the box.However, this assumption is only true if winds contained within the flight path are in perfect steady state during the time it take for air to move from the upwind transect to the downwind transect, and that measurements from the downwind transect occurred at a much later time so that the air being measured is the same air measured from the upwind transect.These conditions are not easily achieved for regionalscale mass balances due to the long times needed for the air from the upwind transect to reach the downwind transect.As an example, from the four mass balance flights performed for this study the average time for air to move from the upwind transect to the downwind transect was 4 h, whereas the average time between the aircraft's upwind and downwind measurements was ∼ 40 min.The aircraft observations can be thought of as a snapshot in time, which can be problematic if large-scale plumes from outside the domain are moving through the region and impacting only certain portions of the observations during the flight's short time frame.By using a transport model for a domain much larger than that of the flight paths, we are able to track these far-reaching plumes and identify situations where the background CH 4 concentrations may be spatially heterogeneous.The potential usefulness of using a transport model alongside a mass balance calculation can best be demonstrated from observations taken over the Marcellus during a 2013 aircraft campaign (Peischl et al., 2015).During this flight the prevailing winds were from the WSW, and the largest CH 4 enhancements were observed along the western edge of the flight path, upwind of the unconventional wells.Using our transport model, we are able to recreate the day of flight and attempt to use our inventory to explain this feature (Fig. 18).Comparisons between modelled output and observations show a 60 ppb CH 4 enhancement from coal and conventional wells in south-western PA stretching close to the western edge of the aircraft observations, a plume structure similar to the one observed during the 24 May flight from our own study.Though this plume does not initially align with the observed transect with the largest enhancements, we recognize that the coal and gas plume travels for more than 20 h (a distance of 400 km) from its source before reaching the flight path.If we allow for a 10 % error in the transport speed and therefore advance the transport model by an additional 2 h past the time in which the aircraft observed these high values, we are able to line up the centre of the plume with the largest-observed CH 4 mole fractions along the western edge of the flight.In addition to the 60 ppb enhancement along the centre of the plume, the model projects 20 ppb enhancements along the edges and in front of the plume centre.These smaller enhancements have an influence along different portions of the flight which vary in magnitude, making it difficult to assess a proper background CH 4 value upwind of the wells and potentially masking natural gas enhancements downwind of them.However, by using a transport model, we are able to see the potential impact of these far-reaching sources, which would otherwise not be considered in a regional mass balance, and better understand the complex CH 4 plume structures, which can occur in a given region under specific wind conditions.

Conclusion
Using the model optimization technique presented in this study, we find a weighted mean natural gas emission rate from unconventional production and gathering facilities of 0.36 % of production with a 2σ confidence interval from 0.27 to 0.45 % of production.This emission rate is supported by four mass balance calculations, which produce a mean of 0.40 % and a 2σ confidence interval of 0.08-0.72 % of production.Applied to all the wells in our study region, this mean rate results in a leakage rate of 20 Mg CH 4 h −1 for the year 2015.The emission rate found in this top-down study quantified as a percent of production is significantly lower than rates found using top-down methodology at any other basin and indicates the presence of some fundamental difference in the north-eastern Marcellus gas industry that result in more efficient extraction and processing of the natural gas.
The 10 flights that took place in this study reveal large regional variations in the CH 4 enhancement patterns depending on the prevailing wind direction.On days with a northwesterly wind, observed enhancements come primarily from natural gas sources, and a small plume associated with it can be seen on the downwind leg of each flight with few enhancements upwind of the wells.Flights that took place with winds conditions predominantly from the south-west were more difficult to interpret.Plumes associated with coal and other potential sources of CH 4 in south-western Pennsylvania create complex enhancement patterns affecting both the upwind and downwind portions of the flight, making both the background CH 4 mole fraction and enhancements from the gas industry difficult to interpret.The stark difference between observations that occurred with a north-westerly wind compared to a south-westerly wind illustrates the importance of having multiple flights across days with various wind conditions to better understand the major influences on CH 4 concentrations throughout a region.The regional influences in Pennsylvania also demonstrate the utility of deriving an emissions inventory that provides input data to drive a transport model, allowing one to forecast CH 4 mole fractions on difficult days and better understand the daily uncertainties associated with heterogeneous background conditions.
Though this study presented observations from 10 flights over a 3-week period, it is not able to account for the potential of long-term temporal variability in the emission rates.In May 2015 when the flights took place, the entire Marcellus basin was nearing peak production and active drilling and hydraulic fracturing was still ongoing in the region.By mid-2016, the rate of drilling of new wells in the north-east of Marcellus had decreased and natural gas production had begun to decline in the area.A snapshot of the emission rate during one month of a basin in its peak production is insufficient to characterize emissions from an area that is likely to be producing and transporting gas at various intensities for decades.We need to quantify the long-term climatological impacts of gas production.Future work examining the temporal variability of CH 4 emissions within natural gas basins would complement short-term, high-intensity studies such as this one, and aid with understanding how well the calculated emission rates represent the gas basin over the course of time.
Data availability.Aircraft data for this project are available for public access at https://doi.org/10.15138/G35K54(Sweeney et al., 2015).
Competing interests.The authors declare that they have no conflict of interest.

Figure 1 .
Figure 1.A map of the unconventional wells in Pennsylvania shown with purple dots.Production values of wells for May 2015 are indicated by the marker colour.Red rectangle and inset show the region of focus for this study; 41.1-42.2• N 75.2-77.6• W.

Figure 2 .
Figure 2. A map of transmission and gathering pipelines for the states of PA and NY.Transmission pipelines are provided by Platts Natural Gas Pipelines product.Gathering pipelines associated with unconventional wells in PA are extrapolated using information on existing gathering pipelines provided by Bradford County, PA (highlighted in yellow).

Figure 3 .
Figure 3. Observed CH 4 enhancements within the boundary layer from each of the 10 afternoon flights used in this study, with green dots showing the location of unconventional wells in PA and blue arrows showing the modelled wind direction during the time of the flight.CH 4 enhancements are calculated by taking the observed CH 4 mole fraction values and subtracting the flight's background CH 4 value shown in Table3.Dates are mm/dd/yyyy.

Figure 4 .
Figure 4. (a) Observed and model-projected CH 4 enhancements during the 14 May 2015 at 16:00 Z.(b) Comparison of observed natural gas enhancement and modelled natural gas enhancement along flight path, with upstream emission rate optimized by minimizing the absolute error between the data sets.(c) Same as previous panel but optimized by minimizing the sum of the error between the data sets.
The background error relates to the value chosen for each flight, which represents the ambient CH 4 concentration in the boundary layer unrelated to emission sources within the model.In this study background values ranged from 1897 to 1923 ppb.Though background values should not have high variability during a 2-3 h mid-afternoon flight, entrainment from the boundary layer top can lead to the mixing in of tropospheric air that has different CH 4 mole fraction values from those within the boundary layer, resulting in a change in the afternoon background value with time.Furthermore, for days on which all aircraft observations (including those upwind of the unconventional wells) are impacted by various CH 4 plumes predicted within the model, it is difficult to determine the background CH 4 concentration accurately.Additionally, observations corresponding to locations with no modelled enhancement may in fact have been impacted by missing sources in our inventory, highlighting the difficult nature of knowing with certainty where and what the background is for any given flight.Understanding this uncertainty is crucial; any error in subtracting the background value directly impacts each observation's natural gas enhancement.

Figure 5 .
Figure 5.Comparison of observed natural gas enhancement and modelled natural gas enhancement for segments along the (left) 22 May flight and (right) 28 May flight.A distinct lack of representativeness of the observations in the modelled enhancement can be seen in the 22 May flight compared to the 28 May flight.

Figure 6 .
Figure 6.A log-scale contour of the anthropogenic CH 4 emissions inventory from this study used within the transport model.The red rectangle surrounds the study region where the aircraft campaign took place.

Figure 7 .
Figure 7. (a) Model-projected CH 4 enhancement at the surface associated with underground, surface, and abandoned coal mines on 27 May 2015 at 19:00 Z, with the shaded regions showing the CH 4 enhancement and the arrows representing the wind direction.(b) Projected enhancement from (a) mapped over measured CH 4 enhancement from a driving campaign.The height and colour of the bars represent the scale of the CH 4 enhancement.

Figure 8 .
Figure 8. Projected CH 4 enhancements during the late afternoon flight of 24 May 2015 at 21:00 Z, 700 m above ground level from (a) upstream unconventional gas processes (b) downstream unconventional gas processes (c) conventional production (d) coal mines (e) animal emissions and (f) landfills and other sources within the EPA GHG Inventory Report.The centre figure is a map of the combined enhancement from sources (a-f).

Figure 9 .
Figure 9. (a) Observed CH 4 enhancements from within the boundary layer during the first loop of the 29 May aircraft campaign.(b) Aircraft observations laid overtop modelled CH 4 concentrations at 700 m from sources unrelated to emissions from upstream gas production.(c) Observed CH 4 enhancements from the 29 May flight after subtracting modelled sources in (b).The new set of observations represent the observation-derived upstream gas enhancement during the flight.

Figure 10
Figure 10.(a) Observed enhancement from unconventional natural gas production overtop projected upstream natural gas enhancements at 700 m from the first loop of the 29 May flight, using an upstream gas emission rate of 0.13 % of production.(c) Direct comparison of the observed natural gas enhancement with the modelled enhancement following the path from A to D using an unconventional emission rate of 0.13 %. (b, d) Same as left figures, except using the optimized upstream emission rate of 0.26 %

Figure 12 .
Figure 12.(a) Observed potential temperature profile with height from the first aircraft spiral on the 29 May flight at 17:00 Z.(b) Modelled potential temperature at the location and time at which the aircraft spiral occurred.In both cases, an inversion in the potential temperature profile begins to occur around 850 m.

Figure 13 .
Figure 13.(a) Observed CH 4 enhancement from the late-afternoon flight on 24 May 2015.(b) Observed CH 4 enhancement compared to the model projected CH 4 enhancement from the sum of all sources in the region.The colour scale of observed and projected enhancements is scaled 1 : 1, with matching colours indicating matching values.Modelled wind vectors and CH 4 concentrations are from 700 m model height level.

Figure 14 .
Figure 14.Observed CH 4 enhancements from an early flight on 28 May 2015 compared to projected CH 4 enhancements from coal emissions modelled at (a) 14:00 Z and (b) 15:00 Z.The 1 h time difference results in vastly different projected enhancements across the southern portion of observations.Modelled wind vectors and CH 4 concentrations are from the 700 m model height level.

Figure 15 .
Figure 15.Observed and model-projected CH 4 enhancements during (a) the early afternoon flight of 24 May 2015 at 17:00 Z and (b) the flight of 25 May 2016 at 19:00 Z. Modelled wind vectors and CH 4 concentrations are from 700 m model height level.

Figure 16 .
Figure 16.Time series of CH 4 mole fractions from the second loop of the 22 May flight.Observations at the shaded areas below A and B were taken at similar locations in space, showing the change in the background mole fraction across time.

Figure 17 .
Figure 17.Calculated upstream natural gas emission rates using (black) model optimization technique and (red) aircraft mass balance technique.Error bars represent the 2σ confidence interval for each flight.Mass balance performed in Peischl et al. (2015) included for comparison.

Figure 18 .
Figure 18.Observations and modelled enhancements of the flight from Peischl et al. (2015) for 6 July 2013.(a) Observed enhancements from the flight over model-projected enhancements from all sources at 21:00 Z.(b) Projected enhancement from upstream gas processes using a 0.4 % emission rate.(c) Projected enhancement from coal sources in south-western PA.Modelled wind vectors and CH 4 concentrations are from 700 m model height level.

Table 1 .
List of tracers used in the transport model.

Table 2 .
Annual emission rate totals from anthropogenic sources within the innermost model domain based on values from the inventory within this study.

Table 3 .
Meteorological statistics from the May 2015 flight campaign.

Table 4 .
Optimized natural gas emission rates for each flight as well as corrected emission rates adjusting for errors in the model wind speed and boundary layer height.For wind speed and boundary layer height error, a negative value represents a model value lower than the observations.

Table 5 .
Emission rates and potential errors associated with the model optimization technique.r-values represent the correlation between the model and observation-derived upstream natural gas enhancements.

Table 6 .
Emission rates from mass balance calculations on applicable days, with emission ranges associated with a ±5 ppb error in the background value.

Table 7 .
Relative error associated with the different sources of uncertainty in the aircraft mass balance.