Monitoring of volatile organic compounds (VOCs) from an oil and gas station in northwest China for 1 year

Oil and natural gas are important for energy supply around the world. Exploration, drilling, transportation and processing in oil and gas regions can release large amounts of volatile organic compounds (VOCs). To understand the VOC levels, compositions and sources in such regions, an oil and gas station in northwest China was chosen as the research site, and 57 VOCs designated as photochemical precursors were continuously measured for an entire year (September 2014–August 2015) using an online monitoring system. The average concentration of total VOCs was 297 ± 372 ppbv and the main contributor was alkanes, accounting for 87.5 % of the total VOCs. According to the propylene-equivalent concentration and maximum incremental reactivity methods, alkanes were identified as the most important VOC group for the ozone formation potential. Positive matrix factorization (PMF) analysis showed that the annual average contributions from natural gas, fuel evaporation, combustion sources, oil refining processes and asphalt (anthropogenic and natural sources) to the total VOCs were 62.6 ± 3.04, 21.5 ± 0.99, 10.9 ± 1.57, 3.8 ± 0.50 and 1.3 ± 0.69 %, respectively. The five identified VOC sources exhibited various diurnal patterns due to their different emission patterns and the impact of meteorological parameters. Potential source contribution function (PSCF) and concentration-weighted trajectory (CWT) models based on backward trajectory analysis indicated that the five identified sources had similar geographic origins. Raster analysis based on CWT analysis indicated that local emissions contributed 48.4–74.6 % to the total VOCs. Based on the high-resolution observation data, this study clearly described and analyzed the temporal variation in VOC emission characteristics at a typical oil and gas field, which exhibited different VOC levels, compositions and origins compared with those in urban and industrial areas.

Published by Copernicus Publications on behalf of the European Geosciences Union (Atmos. Chem. Phys., 18, 4567-4595, 2018; www.atmos-chem-phys.net/18/4567/2018/).
Research on atmospheric VOCs, including their emissions, atmospheric transformation and health impacts, remains a much-discussed topic around the world.
Previous studies in China mainly focused on the measurements of VOCs in urban agglomerations such as the Pearl River Delta (PRD) region (Tang et al., 2007; Liu et al., 2008; Cheng et al., 2010; Ling et al., 2011), the Yangtze River Delta (YRD) region (An et al., 2014; Li et al., 2016; Shao et al., 2016) and the Beijing-Tianjin-Hebei (BTH) region (Li et al., 2015), and in key megacities including Beijing (Song et al., 2007; Wang et al., 2010; Yuan et al., 2010), Shanghai (Cai et al., 2010; Wang, 2014), Guangzhou (Zou et al., 2015) and Wuhan (Lyu et al., 2016). These studies found that vehicle emissions and solvent usage contributed most to the ambient VOCs in urban areas. A few studies were also conducted in industrial areas (An et al., 2014; Wei et al., 2015; Shao et al., 2016) and in petrochemical industrial regions with large VOC emissions (Lin et al., 2004; Wei et al., 2015; Jia et al., 2016; Mo et al., 2017). The studies conducted in industrial areas found that the VOC sources and compositions are complex due to the different emissions and atmospheric processes (Warneke et al., 2014). However, research conducted in oil and gas areas in China is still limited, even though the VOC emission characteristics of such regions have been widely studied elsewhere in the world (Buzcu-Guven and Fraser, 2008; Simpson et al., 2010; Rutter et al., 2015; Bari et al., 2016). For instance, Leuchner and Rappenglück (2010) found that natural gas or crude oil sources contributed most to the VOC emissions in Houston. Gilman et al. (2013) found that oil and gas emissions strongly contribute to propane and butanes in northeast Colorado. Therefore, studies concerning VOC emission characteristics in oil and gas areas in China are very important.
In previous studies, the ambient air was sampled for a few days or weeks, or in a single season, with low time resolution. The diurnal, monthly and seasonal variations were mostly overlooked, which hindered the understanding of the temporal behavior of VOCs as influenced by real-time emissions, photochemical reactions and meteorological conditions. Therefore, long-term monitoring of VOCs with a high time resolution is desired (Baudic et al., 2016; Liu et al., 2016). It should be emphasized that in September 2013, VOC control in petrochemical regions was listed as one of the main objectives of the Action Plan of Atmospheric Pollution Control released by the central government of China (http://www.gov.cn/zwgk/2013-09/12/content_2486773.htm, last access: 29 June 2017), which set new requirements for research in this type of field.
To identify the VOC sources, receptor models including chemical mass balance (CMB), positive matrix factorization (PMF) and principal component analysis/absolute principal component scores (PCA/APCSs) have been widely used (Guo et al., 2004; Rodolfo Sosa et al., 2009; An et al., 2014; Liu et al., 2016). Meanwhile, dispersion models including the conditional probability function (CPF), backward trajectory, potential source contribution function (PSCF) and concentration-weighted trajectory (CWT) are also employed to locate the potential source origins (Song et al., 2007; Chan et al., 2011; Liu et al., 2016). Recently, the combination of these two types of models has been developed to determine the locations of various air pollutant sources (Zhang et al., 2013a; Bressi et al., 2014; Chen et al., 2016). These practices mainly focus on atmospheric fine particles (PM2.5); few studies have addressed the local and regional source contributions of VOCs.
In this study, an oil and gas field located in northwest China was chosen as the study area to conduct long-term monitoring of VOCs with high time resolution. The main objectives are to (1) compare VOC concentrations, compositions and OFP at this oil and gas station with those of other areas, (2) discuss the relationships between VOC concentrations and meteorological parameters on different timescales, (3) identify the possible VOC sources by PMF, and (4) identify the local source contributions and regional origins of VOCs based on PMF and dispersion models. This study is the first VOC study with high time resolution at an oil and gas field in China, and provides new information on the temporal variation, OFP, and local and regional contributions of VOCs. This study will help to establish VOC control measures for this type of region around the world.

Site description
The study area (44.1-46.3° N and 84.7-86.0° E) is located in northwest China at the northwestern margin of the Junggar Basin, which is an important oil- and gas-bearing basin (Fig. 1a). The proven deposits of oil and natural gas are 2.41 × 10⁹ t and 1.97 × 10¹¹ m³, respectively. There are hundreds of oil and gas wells in this field with an annual gas deliverability of 1.20 × 10¹⁰ m³ (Chen, 2015). Additionally, 126 petrochemical plants are spread across this area. This area can be divided into two regions, with oil and gas operation and oil refineries to the north (Region 1) and petrochemical industry to the south (Region 2). These two regions are about 150 km apart from each other (Fig. 1b). Region 1 is abundant in oil and gas resources and the main petrochemical factories are oil refineries and natural gas chemical plants. The main products include gasoline, diesel, asphalt and 1,3-butadiene. The flow charts of chemical processing are shown in Fig. S1 in the Supplement. Region 2 is a key petrochemical base, with the production capacity of oil and ethylene being 6 × 10⁶ and 2.2 × 10⁵ t yr⁻¹, respectively. The sampling site is located on the rooftop of a building (15 m above the ground, 45.6° N, 85° E), about 11 km to the southeast of the urban region. To the northeast of the sampling site, there are hundreds of oil and gas wells (Fig. 1c). The study area is in the hinterland of Eurasia. The typical temperate continental arid desert climate results in high temperature in summer (27.9 °C) and low temperature in winter (−15.4 °C). The sufficient solar radiation, little precipitation and low humidity (43-56 %) result in high evaporation (> 3000 mm) in this region.

Descriptions of instruments and quality assurance and quality control
From September 2014 to August 2015, 57 ambient VOCs designated as O3 precursors by the Photochemical Assessment Monitoring Station (PAMS) were continuously sampled and measured using an online monitoring system (TH-300B, Wuhan Tianhong Instrument Co., Ltd, China) with 2 h time resolution. The sampling and analysis procedures were described elsewhere (Lyu et al., 2016). Briefly, two channels were installed to analyze VOCs separately. Water and carbon dioxide in the sampled air were first removed in a cold trap maintained at −80 °C, and the VOCs were then concentrated at −150 °C in another cold trap. After the purification and concentration, the VOCs were desorbed by rapid heating to 100 °C. The C2-C5 VOCs were separated with a porous-layer open tubular (PLOT) column (diameter: 0.32 mm, film thickness: 1.5 µm, length: 60 m) and were quantified using a gas chromatograph with a flame ionization detector (GC-FID, Agilent 7890). The C5-C12 VOCs were separated with a DB-624 column (diameter: 0.25 mm, film thickness: 3 µm, length: 60 m) and were quantified using a mass spectrometer detector (MSD, Agilent 5975).
The target compounds involved 57 VOC species: alkanes (30), alkenes (9), alkynes (acetylene) and aromatics (17). The standard gases from PAMS were used for the equipment calibration and verification through the five-point method every 2 weeks (Lyu et al., 2016). The correlation coefficients of the calibration curves usually varied from 0.991 to 0.998. The detection limits were in the range of 0.04 to 0.12 ppbv (Table 1). Missing values were due to power failures or instrument maintenance and were excluded from the data analysis.

Meteorological parameters and air pollutants
Other datasets such as the 3 h resolution meteorological parameters (atmospheric pressure, P; temperature, T; relative humidity, RH; wind speed, WS; and wind direction, WD) were collected from Meteomanz (www.meteomanz.com, last access: 24 July 2017) and are shown in Fig. 2. The boundary layer height (BLH) was computed every 3 h each day through NOAA's READY Archived Meteorology website (http://www.ready.noaa.gov/READYamet.php, last access: 23 July 2017).
The hourly concentrations of CO, NO2, O3, SO2, inhalable particles (PM10) and fine particles (PM2.5) were measured using an ambient air quality continuously automated monitor (TH-2000 series, Wuhan-Tianhong Instrument Co., Ltd, China) and the data were acquired from the Qingyue Open Environmental Data Center (https://data.epmap.org, last access: 29 June 2017). It should be noted that the NO2 (NO2 = NOx − NO) concentrations were in fact overestimated. This is because some oxidized reactive nitrogen is converted by the molybdenum converter during the NOx measurement, while the NO measurement using the chemiluminescence technique is accurate. Therefore, the NO2 concentrations discussed below are considered greater than the actual values (Dunlea et al., 2007; Zou et al., 2015). According to the ambient air quality standards II (GB/3095-2012), the main air pollutants were PM10 and PM2.5 in winter and NO2 in autumn (Fig. S2).

VOC source apportionment and OFP
The PMF model has been widely employed for VOC source apportionment (Buzcu-Guven and Fraser, 2008; Leuchner and Rappenglück, 2010; Liu et al., 2016; Lyu et al., 2016). In this study, the EPA PMF 5.0 (US EPA, 2014) was employed and additional information is given in Appendix A. The VOC concentrations are not proportional to the OFP due to their wide ranges of photochemical reactivity with OH radicals (Table 1). Two methods, the propylene-equivalent concentration (propy-equiv) and the maximum incremental reactivity (MIR), were adopted to analyze the OFP of VOCs. More details can be found in Atkinson and Arey (2003) and Zou et al. (2015).
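The two reactivity scales can be illustrated with a short Python sketch. It assumes the commonly used definitions: the propy-equiv concentration scales the carbon-based concentration by the ratio of the species' OH rate constant to that of propylene, and the MIR-based OFP multiplies the mass concentration by the MIR coefficient. The `Species` container, the function names, the rate constants, the MIR coefficients and the 24.45 L mol⁻¹ molar volume are illustrative assumptions, not values taken from Table 1 of this study.

```python
from dataclasses import dataclass

# Illustrative OH rate constant of propylene (cm^3 molecule^-1 s^-1); an assumption.
K_OH_PROPYLENE = 26.3e-12

@dataclass
class Species:
    name: str
    ppbv: float        # mixing ratio by volume
    n_carbon: int      # number of carbon atoms
    k_oh: float        # OH rate constant (illustrative value)
    mir: float         # MIR coefficient, g O3 per g VOC (illustrative value)
    molar_mass: float  # g mol^-1

def propy_equiv(s: Species) -> float:
    """Propylene-equivalent concentration in ppbC."""
    return s.ppbv * s.n_carbon * s.k_oh / K_OH_PROPYLENE

def ppbv_to_ugm3(ppbv: float, molar_mass: float, molar_volume: float = 24.45) -> float:
    """Convert ppbv to ug m^-3, assuming 25 degC and 1 atm (24.45 L mol^-1)."""
    return ppbv * molar_mass / molar_volume

def ofp_mir(s: Species) -> float:
    """MIR-weighted ozone formation potential in ug m^-3 of O3."""
    return ppbv_to_ugm3(s.ppbv, s.molar_mass) * s.mir

# Example: the mean ethane level reported in this study, with placeholder coefficients.
ethane = Species("ethane", ppbv=39.7, n_carbon=2, k_oh=0.248e-12, mir=0.28, molar_mass=30.07)
print(ethane.name, propy_equiv(ethane), ofp_mir(ethane))
```

Because both scales weight each species independently, abundant but slowly reacting alkanes such as ethane can rank far lower on either scale than their concentrations suggest.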

Conditional probability function (CPF)
The CPF is widely used to locate the direction of sources based on wind direction data (Song et al., 2007). In this study, the directions of various VOC sources were explored based on the G matrix in PMF analysis and wind directions. The CPF is defined as

$$\mathrm{CPF} = \frac{m_\theta}{n_\theta},$$

where $m_\theta$ is the number of data from wind sector θ (each is 22.5°) that exceed the threshold value (75th percentile of each source contribution) and $n_\theta$ is the total number of occurrences from the same wind direction. Calm conditions (wind speed < 1 m s⁻¹) were excluded from the calculation because of the difficulty in defining the wind direction.
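A minimal Python sketch of the CPF calculation is given below. It follows the description above (22.5° sectors, 75th-percentile threshold, calm winds below 1 m s⁻¹ excluded), but the function name, the choice to centre the first sector on north and the choice to compute the threshold over all samples (including calm ones) are assumptions made for illustration. Missing values are assumed to have been removed beforehand.

```python
import numpy as np

def conditional_probability_function(contrib, wind_dir, wind_speed,
                                     sector_width=22.5, percentile=75.0,
                                     calm_speed=1.0):
    """Return the CPF per wind sector (NaN where a sector has no data)."""
    contrib = np.asarray(contrib, dtype=float)
    wind_dir = np.asarray(wind_dir, dtype=float)
    wind_speed = np.asarray(wind_speed, dtype=float)

    n_sectors = int(round(360.0 / sector_width))
    # Threshold: percentile of the source contribution (assumed over all samples).
    threshold = np.percentile(contrib, percentile)
    not_calm = wind_speed >= calm_speed

    # Sector 0 is centred on north (assumption); directions are in degrees.
    shifted = (wind_dir + sector_width / 2.0) % 360.0
    sector = np.floor(shifted / sector_width).astype(int) % n_sectors

    # n_theta: all non-calm samples per sector; m_theta: those above the threshold.
    n_theta = np.bincount(sector[not_calm], minlength=n_sectors).astype(float)
    exceed = not_calm & (contrib > threshold)
    m_theta = np.bincount(sector[exceed], minlength=n_sectors).astype(float)

    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(n_theta > 0, m_theta / n_theta, np.nan)
```

In practice the `contrib` series would be one column of the PMF G matrix, matched in time with the wind observations.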
The FNL global analysis data produced by the National Centers for Environmental Prediction's Global Data Assimilation System (GDAS) wind field reanalysis were introduced into the calculation. A total of 2743 backward trajectories were generated and were then grouped into four clusters according to their geographic sources and histories. As shown in Fig. S3, the trajectories mainly originated from the northwest of the sampling site during the whole observation period.

Local and regional transport contribution
The PSCF and CWT models have been previously used to identify the possible source regions based on the backward trajectory analysis (Cheng et al., 2013; Bressi et al., 2014; Liu et al., 2016). The PSCF gives the proportion of air pollution trajectory in a given grid and the CWT reflects the concentration levels of trajectories. The geographic domain (31-71° N, 36-107° E) was found to be within the annual range of 48 h backward trajectories. The total number of grids was 11 360 with a resolution of 0.5° × 0.5°. More information about the PSCF and CWT analysis can be found in Appendix B.
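The gridding behind the two models can be sketched in Python as follows. The sketch uses the domain and resolution stated above, which give 80 × 142 = 11 360 cells, and the widely used textbook forms of the two quantities: PSCF as the fraction of trajectory endpoints in a cell that belong to trajectories above a concentration threshold, and CWT as the endpoint-weighted mean concentration. The function name is hypothetical, residence time is approximated by the endpoint count, and any weighting function described in Appendix B is omitted, so this is not necessarily the exact procedure used in this study.

```python
import numpy as np

LAT0, LAT1, LON0, LON1, RES = 31.0, 71.0, 36.0, 107.0, 0.5
N_LAT = int(round((LAT1 - LAT0) / RES))  # 80 rows
N_LON = int(round((LON1 - LON0) / RES))  # 142 columns

def pscf_cwt(trajectories, conc, threshold):
    """trajectories: list of (k, 2) arrays of (lat, lon) endpoints;
    conc: one receptor concentration per trajectory."""
    n = np.zeros((N_LAT, N_LON))   # all endpoints per cell
    m = np.zeros_like(n)           # endpoints of "polluted" trajectories
    cw = np.zeros_like(n)          # concentration-weighted endpoint sum
    for pts, c in zip(trajectories, conc):
        pts = np.asarray(pts, dtype=float)
        i = np.floor((pts[:, 0] - LAT0) / RES).astype(int)
        j = np.floor((pts[:, 1] - LON0) / RES).astype(int)
        ok = (i >= 0) & (i < N_LAT) & (j >= 0) & (j < N_LON)
        np.add.at(n, (i[ok], j[ok]), 1.0)
        np.add.at(cw, (i[ok], j[ok]), c)
        if c > threshold:
            np.add.at(m, (i[ok], j[ok]), 1.0)
    with np.errstate(invalid="ignore", divide="ignore"):
        pscf = np.where(n > 0, m / n, np.nan)
        cwt = np.where(n > 0, cw / n, np.nan)
    return pscf, cwt
```

Cells never crossed by a trajectory are returned as NaN rather than zero, which keeps them distinguishable from cells with genuinely low values.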
Local and regional source contributions of the observed VOCs were calculated using raster analysis. In previous studies, the domain was divided into 12 sectors (each was 30°) to study the regional contributions (Bari et al., 2003; Wang et al., 2015, 2016). However, in this study, the domain was simply divided into two sections (local and regional), with the sampling site as the origin. The range of local sources was defined as a circle with a radius given by the 12 h backward trajectories, and the range of regional sources was outside of the circle (detailed descriptions can be found in Appendix C). The concentration of each grid was calculated using CWT analysis. By counting and averaging in each section, contributions of local emissions and regional transportation were produced. To reduce the effects of background values, the lowest CWT value ($C_b$) in each section was deducted from the concentrations. The contribution (%) of local source and regional transportation was defined as follows:

$$\mathrm{Contribution}_i\,(\%) = \frac{(C_i - C_{bi}) \times N_i}{\sum_i (C_i - C_{bi}) \times N_i} \times 100,$$

where $C_i$ is the mean CWT value in the ith section (local or regional), $C_{bi}$ is the background value of the ith section and $N_i$ is the number of grids with non-zero CWT concentrations in the ith section.
Several factors affect the calculated results of regional and local source contributions, including the radius of the circle and the CWT value. In this study, the 12 h backward trajectories were chosen to differentiate the local area from the regional area. In fact, the longer the backward trajectories chosen, the lower the resulting regional contributions. In addition, the PMF model was employed for VOC source apportionment and the contribution of each identified source was introduced into the CWT calculation. However, negative values of source contribution were inevitably generated despite the application of Fpeak in PMF analysis. Therefore, negative CWT values were excluded from the raster analysis, which would affect the results of regional and local source contributions. Overall, although flaws existed in this new method, it gave new insight into the quantitative contributions of local and regional sources to the VOCs in the study area.
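One reading of the raster calculation described in this section can be sketched in Python. Each section's weight is taken as the background-corrected mean CWT multiplied by the number of valid grids, and the two weights are normalized to 100 %. The function name and the `is_local` mask are hypothetical; how the local circle is drawn from the 12 h trajectories (Appendix C) is not reproduced here, and treating zero, negative and missing CWT values as invalid is an assumption based on the text.

```python
import numpy as np

def local_regional_share(cwt, is_local):
    """Return (local %, regional %) from gridded CWT values.

    cwt: array of CWT values per grid cell.
    is_local: boolean array, True for cells inside the local circle.
    """
    cwt = np.asarray(cwt, dtype=float)
    is_local = np.asarray(is_local, dtype=bool)
    # Keep only cells with positive, finite CWT values.
    valid = np.isfinite(cwt) & (cwt > 0)

    weights = []
    for mask in (valid & is_local, valid & ~is_local):
        if mask.any():
            vals = cwt[mask]
            # (mean - background) * number of grids, background = section minimum
            weights.append((vals.mean() - vals.min()) * vals.size)
        else:
            weights.append(0.0)

    total = sum(weights)
    if total == 0:
        return float("nan"), float("nan")
    return 100.0 * weights[0] / total, 100.0 * weights[1] / total
```

For example, a local section with CWT values of 4, 2 and 6 and a regional section with values of 1 and 3 give weights of 6 and 2, that is, shares of 75 and 25 %.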

VOC levels and compositions
The statistics of observed VOCs are summarized in Table 1 and the 2-hourly variations in the four VOC categories are shown in Fig. 3. Among the four different VOC groups, the average concentrations of alkanes were highest (129 ± 173 ppbv), followed by alkenes (9.52 ± 14.5 ppbv), aromatic hydrocarbons (4.28 ± 8.24 ppbv) and acetylene (3.03 ± 5.55 ppbv). The top four alkanes were ethane (39.7 ± 57.3 ppbv), propane (22.6 ± 33.5 ppbv), n-butane (15.8 ± 21.4 ppbv) and i-butane (12.5 ± 17.5 ppbv). These four species accounted for 64.8 % of the alkanes in total. Among the alkenes, 1-pentene, propylene and ethylene were the most abundant species, with average concentrations of 4.47 ± 6.72, 1.88 ± 10.2 and 1.42 ± 1.69 ppbv, respectively. They represented 71.8 % of the alkenes in total. Benzene, toluene, and m- and p-xylene made up 96.7 % of the aromatic hydrocarbons, with corresponding average concentrations of 1.13 ± 1.62, 1.06 ± 1.91 and 0.72 ± 1.94 ppbv, respectively. High concentrations of alkanes, ethane and propane in ambient air were also reported in other oil and natural gas operation and industrial areas in the US (Pétron et al., 2012; Helmig et al., 2014; Warneke et al., 2014). For instance, the average concentrations of ethane and propane were 74 ± 79 and 33 ± 33 ppbv, respectively, in Horse Pool and Uintah Basin in the winter of 2012. Despite the highly enhanced VOC levels due to the temperature inversion, the VOC levels in Uintah Basin were still higher than those in the regional background areas because of oil and gas exploitation activities (Helmig et al., 2014). A distinct chemical signature of collected air samples from the Boulder Atmospheric Observatory in northeastern Colorado was also found, with enhanced concentrations of most alkanes (propane, n-butane, i-pentane and n-pentane; Pétron et al., 2012).
The VOC concentrations, compositions and the top five species in this study and other areas around the world were compared and are shown in Fig. 4. The total VOC concentrations in this study (297 ± 372 ppbv) were 1-50× higher than those in urban areas like Beijing (34.5 ppbv), Shanghai (32.4 ppbv), Guangzhou (43.6 ppbv), Seoul (122 ppbv), Mexico (117 ppbv) and 28 cities in the US (9.91 ppbv) as well as industrial areas, including Houston (31.2 ppbv), northeastern Colorado (96.1 ppbv), the Alberta oil sands (2.87 ppbv), Ulsan (91.7 ppbv), the YRD (22.9 ppbv) and Nanjing (34.5 ppbv; Fig. 4a). As shown in Figs. 3e and 4b, the alkanes were the most abundant group (87.5 % on average) during the whole sampling period, a proportion considerably higher than in other urban or industrial areas (45.3-67.2 %, Fig. 4b). Similar relatively high proportions of alkanes were found in Houston (77.1 %; Leuchner and Rappenglück, 2010), the Alberta oil sands area (74.8 %; Simpson et al., 2010) and northeastern Colorado (97.4 %; Gilman et al., 2013), which were all related to oil and gas operations. In urban areas, the aromatics accounted for about 10.1-47.9 % of the total VOCs, with toluene as one of the most abundant species. Toluene is mainly from solvent usage (Guo et al., 2004; Yuan et al., 2010) or vehicle exhaust emissions (Wang et al., 2010) in cities. Another dominant compound in the urban air is propane (1.45-14.7 ppbv), which is the main component of liquefied petroleum gas and natural gas (LPG/NG; McCarthy et al., 2013). In industrial areas, alkanes and alkenes contribute most to the total VOCs (43.4-97.4 and 1.8-43.6 %, respectively), with ethane, propane and ethylene usually as the top species (Fig. 4c). They may originate from incomplete combustion or LPG/NG usage (Durana et al., 2006; Tang et al., 2007; Guo et al., 2011). To sum up, the concentrations of VOCs in this study were higher than those in many other regions and cities. The compositions and the top five species of VOCs exhibited the typical characteristics of oil and gas exploration regions, such as Houston (Leuchner and Rappenglück, 2010), northeastern Colorado (Gilman et al., 2013) and the Alberta oil sands area (Simpson et al., 2010).

Contribution of VOCs to OFP
The profiles of different VOC categories with concentrations expressed on different scales are shown in Fig. 5. The top 10 VOC species for OFP obtained using the propy-equiv and MIR methods are listed in Table S1 in the Supplement. Among the top 10 compounds calculated using the two methods, six compounds were the same, but differed in their rank order. Considering the kinetic activity, 1-pentene ranked first with the propy-equiv method. However, o-xylene showed the highest OFP based on the MIR method, which may be related to the chemical mechanisms and the impacts of NOx (Zou et al., 2015). Although the two methods differ in mechanism, the proportions of different VOC categories to the OFP were the same. From the non-weighted concentrations by volume and carbon atom, alkanes contributed 83 ± 9 and 82 ± 9 %, respectively, to the total VOC concentrations, followed by alkenes (11 ± 6 and 9 ± 4 %, respectively) and aromatics (5 ± 6 and 8 ± 7 %, respectively).
Although the proportions of alkenes and aromatics increased when compared to the values of non-weighted concentrations, the alkanes were still dominant, accounting for 45 ± 11 and 50 ± 14 % of the OFP weighted by the propy-equiv and MIR methods, respectively. In summary, the alkanes had the highest concentrations (for both volume and carbon atom) and the largest proportions of the OFP weighted by the propy-equiv and MIR methods. The results of this study were different from previous research. For example, the alkanes with the highest concentrations (both for volume and carbon atom) contributed less to OFP, while alkenes and aromatics with lower concentrations contributed most to the OFP in Guangzhou (73 and 83 %, respectively; Zou et al., 2015) and Tianjin (about 28-40 and 32-42 %; Liu et al., 2016) as well as in a petrochemical industrialized city (48-49 and 37-49 %; Jia et al., 2016).

Temporal variations
Figure 6 shows the temporal variations in ethane, ethylene, acetylene and benzene on different timescales. Though differences existed, the selected compounds broadly represent the respective alkanes, alkenes, alkynes and aromatics (Lyu et al., 2016). Significant differences were found between the meteorological parameters in different seasons (p < 0.01) and the highest concentrations of these VOC species (ethane, ethylene, acetylene and benzene) were observed in winter. The seasonal variation in VOCs is controlled by meteorological conditions, photochemical activities and source emissions. The highest values in winter were due to inhibited photochemical activities under suppressed dispersion conditions (averaged BLH of 121 ± 71.7 m, wind speed of 1.20 ± 0.76 m s⁻¹) and low temperature (−11.8 ± 5.00 °C).
For instance, all these species were negatively correlated with BLH, exhibiting higher VOC levels under lower BLH (Fig. 6). The wind speed and temperature were also found to be negatively correlated with VOC concentrations, and ethylene showed the highest negative correlation coefficient with these parameters (Table S2). The reduced photochemical reactions can result in high concentrations in winter, which was shown by the negative correlation between VOCs and O3 (Table S2). Additional sources (i.e., combustion) may also be present, in view of the obvious increase in acetylene (Fig. 6c) from summer (0.87 ± 1.00 ppbv) to winter (10.5 ± 8.51 ppbv). Conversely, the high temperature, WS and BLH favor the dilution and dispersion of ambient VOCs and the photochemical depletion in summer.
The diurnal variations in VOCs and trace gases (NO2 and O3) related to photochemical reaction are shown in Fig. 7. The VOCs had a reverse trend with O3 (r = −0.82, p < 0.01). The lower BLH and weaker photochemical activity resulted in peak values for VOCs and low O3 concentrations before sunrise (06:00 local time). After sunrise, with the initiation of photochemical oxidation and the increasing BLH, the concentrations of VOCs decreased while the O3 increased rapidly. The minimum of VOCs occurred at about 12:00-14:00 LT, resulting from both dispersion or dilution conditions and photochemical reactions (with the highest O3 concentrations at 14:00 LT) in the afternoon. The diurnal variation in NO2 was controlled by BLH, O3, and photochemical reactions (i.e., OH radical) and showed a double peak. Similar diurnal patterns were also found for compounds with different atmospheric lifetimes, including ethane, ethylene, acetylene and benzene (the most abundant contributors to their categories; Fig. S4). To better understand the effects of BLH and photochemical reactions on VOCs, the diurnal variations in VOCs, BLH and O3 in winter and summer were analyzed (Fig. 7b, c). VOC concentrations in winter (213 ± 97.7 ppbv) were significantly higher than those in summer (130 ± 100 ppbv). However, the VOC concentrations of summer and winter decreased by 8.3× and 2.3×, respectively, from their maximum to the minimum. This was due to the BLH increasing by 8.2× in summer while the BLH in winter only increased by 2.3×. The effects of photochemical reactions on VOCs in the two seasons were comparable, which was explained by a similar O3 increment in winter (0.78× up) and summer (0.71× up). Therefore, we can conclude that the role of BLH variation was more important than the photochemical reaction for the diurnal variation in VOCs.
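The maximum-to-minimum comparison used in this argument can be reproduced with a few lines of Python. The sketch averages a time series by hour of day and returns the ratio of the largest to the smallest hourly mean; applying it to the VOC and BLH series of one season gives the two fold changes that are compared above. The function name and the input layout (one hour label per sample) are hypothetical.

```python
import numpy as np

def diurnal_fold_change(hours, values):
    """Return (mean diurnal profile, max/min ratio of that profile)."""
    hours = np.asarray(hours)
    values = np.asarray(values, dtype=float)
    # Mean value for each hour of day, ignoring missing samples.
    profile = {h.item(): float(np.nanmean(values[hours == h]))
               for h in np.unique(hours)}
    peak = max(profile.values())
    trough = min(profile.values())
    return profile, peak / trough
```

If the VOC fold change closely tracks the BLH fold change in both seasons, as reported here (8.3× against 8.2× in summer, 2.3× against 2.3× in winter), dilution by boundary layer growth is the simplest explanation for the diurnal amplitude.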

Ambient ratios: sources and photochemical removal
Ambient ratios for VOC species with similar reaction rates with OH radicals can reflect the source features, as these compounds are equally affected by photochemical processing and new emission inputs (Russo et al., 2010; Baltrėnas et al., 2011; Miller et al., 2012). For example, n-butane and i-butane have similar reaction rates with OH radicals, with differences of < 10 %, so the ratio of this pair of species indicates different sources. The butanes are associated with NG, LPG, vehicle emissions and biomass burning, and the i-butane / n-butane ratios vary according to source (i.e., 0.2-0.3 for vehicles, 0.46 for LPG and 0.6-1.0 for NG; Buzcu and Fraser, 2006; Russo et al., 2010). In this study, the slope of i-butane / n-butane (0.80-0.82, Fig. 8a) was within the range of reported emissions from NG. Additionally, i-pentane and n-pentane have similar physical and chemical characteristics (i.e., boiling point and reaction rate coefficients with the hydroxyl radical), which makes the i-pentane / n-pentane ratio less susceptible to atmospheric processing in source identification (Gilman et al., 2013). The pentanes are always from NG emissions, vehicle emissions, liquid gasoline and fuel evaporation, with the i-pentane / n-pentane ratios ranging between 0.82 and 0.89 (Gilman et al., 2010, 2013), ∼ 2.2 and 3.8 (Conner et al., 1995; McGaughey et al., 2004), 1.5 and 3.0, and 1.8 and 4.6 (Watson et al., 2001), respectively. As shown in Fig. 8b, the slopes of i-pentane / n-pentane were 1.03-1.24 in this study, suggesting that the pentanes were more likely from the mixed sources of NG and fuel evaporation. This assumption was supported by the high loadings of pentanes in the NG and fuel evaporation source compositions in Sect. 3.5.

Information on the photochemical removal process can be obtained by comparing the ambient ratios of aromatics due to their differences in atmospheric lifetimes. For example, the atmospheric lifetimes of benzene (9.4 days), toluene (1.9 days) and ethylbenzene (1.6 days) are longer than those of m-xylene (11.8 h) and p-xylene (19.4 h; Monod et al., 2001). The commonly used ratios are benzene / toluene, m- and p-xylene / ethylbenzene, benzene / ethylbenzene, and toluene / ethylbenzene. The diurnal variation in these compounds and ratios is shown in Fig. 9. A continuous decrease in these compounds and ratios was observed from 08:00 to 14:00 LT, indicating increased photochemical removal due to the increase in reactive radicals (i.e., hydroxyl radical). The diurnal patterns of benzene / ethylbenzene and toluene / ethylbenzene in this study (Fig. 9b, d) were opposite to those observed in Dallas, which was mainly influenced by vehicle emissions (Qin et al., 2007). After 14:00 LT, the increase in the ratios and aromatic concentrations was due to the weakening of photochemical activities. Unusually high concentrations of ethylbenzene and m- and p-xylene were observed at about 02:00 LT (Fig. 9c), which might be related to new emissions. This assumption was verified by a small peak occurring at 02:00 LT in the diurnal profile of an oil refinery source (see Sect. 3.5.1). After 12 h of dispersion, dilution and photochemical reaction, the concentrations of these two compounds reached their minimum values at about 14:00 LT.
Generally speaking, when the reaction with OH radicals is the only factor controlling the seasonal ratio of longer atmospheric lifetime to shorter lifetime compounds (i.e., benzene / toluene, m- and p-xylene / ethylbenzene), an increase in ratio value from winter to summer would be expected (Russo et al., 2010). However, the seasonal variation in BTEX (benzene, toluene, ethylbenzene, and xylenes) ratios in this study was opposite to the general behavior. For example, the benzene / toluene ratio decreased from winter-spring (0.63-0.69) to summer-fall (0.52-0.57; Fig. 8c) and the ethylbenzene / m- and p-xylene ratio also decreased from autumn-winter (0.47-0.69) to spring-summer (0.19-0.37; Fig. 8d). The same results were also observed in both industrial areas (Miller et al., 2012) and urban areas (Ho et al., 2004; Hoque et al., 2008; Russo et al., 2010). The results obtained in this study indicated that there were other factors affecting the seasonal variation, such as source emissions. The BTEX mainly originate from vehicle exhaust (Wang et al., 2010), solvent usage (Guo et al., 2004; Yuan et al., 2010) and the petrochemical industry (Na and Kim, 2001; Hsieh et al., 2006; Baltrėnas et al., 2011). The m- and p-xylene / ethylbenzene ratio here (2.2 ± 1.2) was within the ranges reported at a petrochemical area in southern Taiwan (1.5-2.6; Hsieh et al., 2006) and the vicinity of a crude oil refinery in the Baltic region (3.0-4.0; Baltrėnas et al., 2011). Therefore, the BTEX in this area was mainly from oil refinery emissions. The unexpectedly low benzene / toluene and m- and p-xylene / ethylbenzene ratios in summer were due to the strong oil refinery emissions, and this finding was verified by the seasonal source contribution results in Sect. 3.5.1.
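The ratio-based source screening applied to the butanes in this section can be sketched in Python. The sketch fits a straight line to paired concentrations and compares the slope with the i-butane / n-butane ranges quoted above. The function names are hypothetical, and the use of an ordinary least-squares fit with an intercept is an assumption, since the regression method behind Fig. 8 is not stated in this section.

```python
import numpy as np

# i-butane / n-butane ranges quoted in the text (Buzcu and Fraser, 2006; Russo et al., 2010)
IBUTANE_NBUTANE_RANGES = {
    "vehicle": (0.2, 0.3),
    "LPG": (0.46, 0.46),
    "NG": (0.6, 1.0),
}

def ratio_slope(x, y):
    """Least-squares slope of y against x (for example i-butane against n-butane)."""
    slope, _intercept = np.polyfit(np.asarray(x, float), np.asarray(y, float), 1)
    return float(slope)

def matching_sources(slope, ranges, tol=0.0):
    """Names of the sources whose reported ratio range contains the slope."""
    return [name for name, (lo, hi) in ranges.items()
            if lo - tol <= slope <= hi + tol]
```

With the slopes of 0.80-0.82 reported here, only the NG range matches, which is consistent with the interpretation given above.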

Source apportionment: temporal variation in and contribution to OFP
Five sources including oil refining process, NG, combustion source, asphalt and fuel evaporation were identified by the PMF analysis, and their source profiles and daily contributions are shown in Fig. 10. The monthly, seasonal and annual contributions were calculated and are shown in Fig. 11. The relationships among daily source contributions and meteorological parameters and trace gases were analyzed using scatter plots (Fig. 12). The source apportionment of this high-resolution dataset provided a unique opportunity to discuss the diurnal variation in different sources, as shown in Fig. 13.

Oil refining
The emissions from the refining process are complex due to the diversity of VOC species, which depend on the production processes (Vega et al., 2011; Mo et al., 2015). Crude oil is composed of ≥ C5 alkanes, cycloalkanes, aromatics and asphaltics (Simpson et al., 2010), which are supplied as the raw materials for various oil refining processes. High fractions of C5-C9 alkanes including hexane (32 ± 6.2 %), cyclohexane (40 ± 7.9 %), methylcyclohexane (47 ± 6.9 %), n-octane (56 ± 4.2 %), n-nonane (58 ± 2.9 %) and aromatics (i.e., 22 ± 3.0 % for benzene, 39 ± 5.4 % for toluene and 45 ± 7.3 % for xylenes) were present (Fig. 10a), which was similar to the chemical compositions measured from oil refineries (Liu et al., 2008; Dumanoglu et al., 2014). The calculated daily source contributions from the PMF model were well correlated with the high loading species in its source profiles. For example, methylcyclohexane showed significant correlation with this source contribution (Fig. S5a), suggesting that the tracers of oil refineries were well reproduced by the PMF model. The main products from oil refineries in this area are gasoline, diesel, lubricating oils and kerosene, consistent with the factor derived here.
The annual contribution of the oil refining source was relatively stable throughout the year (3.8 ± 0.50 %). The highest relative contribution was found in summer (5.3 %) and the lowest in winter (2.4 %; Fig. 11b). The Pearson analysis between the daily source contributions and wind speed revealed a weak but statistically significant negative correlation (r = −0.12, p < 0.05). However, no statistically significant correlations between the daily source contribution and other meteorological parameters were found (Table S3), even for the BLH. Conversely, significant positive correlations between this source and trace gases (NO2 and CO) were found, with r being 0.33 and 0.21, respectively (Fig. 12a). These trace gases are associated with oil refinery emissions (Cetin et al., 2003). Therefore, the daily variation in oil refinery sources in this study was more controlled by oil refining emission strength and less influenced by meteorological conditions.
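A coefficient as small as r = −0.12 can still be statistically significant when the number of paired daily values is large. The sketch below computes a Pearson coefficient and the corresponding t statistic with n − 2 degrees of freedom; the sample size used in the usage note is hypothetical, since the number of valid daily pairs is not given in this section.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r != 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

For instance, with an assumed n = 365 daily pairs, `t_statistic(-0.12, 365)` is about −2.3, whose magnitude exceeds the two-sided 5 % critical value of roughly 1.97 for that many degrees of freedom.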
The diurnal pattern of this source contribution was well correlated with that of methylcyclohexane (r = 0.76, p < 0.01) and was characterized by a double wave profile with the first peak at 02:00 LT and the second peak at 06:00 LT (Fig. 13a). The small peak at 02:00 LT was due to the increase in ethylbenzene and m- and p-xylene (Fig. 9), and the second peak at 06:00 LT resulted from the low BLH. After sunrise, the contribution continuously decreased owing to the increase in BLH and photochemical reactions, and the minimum value occurred at 14:00 LT.

Natural gas
Ethane and propane are the most abundant nonmethane hydrocarbon compounds in natural gas (Xiao et al., 2008; McCarthy et al., 2013). The i-butane / n-butane ratio indicated the butanes were from the natural gas (Sect. 3.4). Through PMF analysis, an NG source was identified through the high weights on ethane (81 ± 2.4 %), propane (85 ± 5.3 %), n-butane (62 ± 7.5 %) and i-butane (54 ± 6.4 %). As an important oil and gas resource base in China, the export amount of natural gas from this region was 4.4 × 10^9 m^3 and the loss rate was 1.4 % in 2014 (Chen, 2015). The leakage from exploiting, storing, transporting and processing cannot be ignored, suggesting that it was reasonable to attribute this factor to an NG source.
The annual contribution of the NG leakage source was 53 ppbv, accounting for 62.6 ± 3.04 % of the total VOCs on average. The highest contribution occurred in spring (65.2 %), followed by summer (63.6 %), autumn (63.0 %) and winter (60.4 %). The daily variation in this source was influenced by meteorological parameters such as the BLH (r = −0.42, p < 0.01; Table S3). Significant positive correlations between the source contribution and NO2 and CO were also found, with Pearson coefficients of 0.45 and 0.44, respectively (Fig. 12b), indicating that the daily variation in the NG source was influenced by meteorological conditions and photochemical activities. The diurnal variation in the NG leakage was significantly correlated (p < 0.01) with the diurnal patterns of propane, n-butane and i-butane, with Pearson coefficients of 0.94, 0.87 and 0.91 (Fig. 13b), which was also reported by Baudic et al. (2016). The diurnal behavior of this source was characterized by a nighttime high and mid-afternoon low pattern, which can be interpreted as the diurnal evolution of BLH (Bon et al., 2011; Baudic et al., 2016).

Combustion source
This source was dominantly weighted by ethylene (95 ± 3.5 %) and acetylene (97 ± 2.6 %) and moderately influenced by BTEX. These species are key markers of combustion (Fujita, 2001; Watson et al., 2001; Jobson, 2004) or a petrochemical source (Brocco et al., 1997; Song et al., 2007). However, the independent combustion tracers such as CO, NO2 and PM2.5 were well correlated with this source contribution, with Pearson correlation coefficients of 0.59, 0.49 and 0.77, respectively (Fig. 12c and Table S3). Therefore, this factor was attributed to a combustion source. This source exhibited obvious seasonal differences with the highest contribution in winter (14.9 %) and the lowest contribution in summer (6.9 %). The seasonal difference was due to the temperature change, as supported by the significant negative correlation with ambient temperature (r = −0.57, p < 0.01). The diurnal variation in the combustion source was in accordance with the diurnal patterns of ethylene and CO, with Pearson correlation coefficients of 0.71 (p < 0.05) and 0.84 (p < 0.01), respectively. It was characterized by a double peak profile with an initial increase from 03:00 to 08:00 LT and a second increase at nighttime (20:00-24:00 LT; Fig. 13c). The increase in the morning was related to the low BLH. Unlike in other studies, where the combustion source was reported to increase in the rush hour period (Gaimoz et al., 2011; Baudic et al., 2016), no increasing trend of this source was found during 07:00-10:00 LT here. Conversely, decreasing trends were found for independent combustion tracers (CO and NO2) during this period (Fig. 7a). During rush hour at 18:00-20:00 LT, the enhancement of combustion source contributions and CO from 16:00 LT (Fig. 11c) may be related to the reduction of BLH. The reduction of NO2 from 18:00 LT (Fig. 7a) was also observed, which indicated that the diurnal variation in the combustion source was less affected by vehicle exhaust in the present study.

Asphalt
The annual contribution of asphalt was the lowest among the five sources and only contributed 1.3 ± 0.69 % to the total VOCs. The daily contributions of this source and temperature had a statistically reliable positive correlation (r = 0.19, p < 0.01). The seasonal variation in this source was influenced by temperature, with the highest contributions occurring in autumn (2.1 %) and the lowest in winter (0.5 %). However, the influence of BLH on the contribution of asphalt was not significant (r = 0.04, p > 0.05). The correlation between this source and O3 was found to be insignificant (r = −0.001, p > 0.05). However, a significant positive correlation between asphalt and oil refinery sources was observed (r = 0.47, p < 0.01; Fig. 12d), indicating they shared the same origin, which should be oil refining processes in the current study.
The diurnal variation in asphalt was different from other sources and followed the diurnal patterns of decane (r = 0.76, p < 0.01) and undecane (r = 0.86, p < 0.01) well. It continuously decreased from 02:00 to 06:00 LT, slowly increased from 06:00 to 10:00 LT and subsequently decreased (Fig. 13d). A minimum source contribution occurred when the BLH was low in the morning, which was contrary to the other sources. In addition, no significant correlation between this source and O3 (r = −0.02, p > 0.05) was found. Therefore, the temporal variation in asphalt was less controlled by BLH and photochemical reaction, but was more influenced by the emission strength.

Fuel evaporation
The fuel evaporation is controlled by temperature, leading to higher contributions in summer. The highest contribution was found in summer (22.9 %) in this study. The same results were also observed previously (Baudic et al., 2016; Liu et al., 2016). A significant correlation between the contributions of NG and fuel evaporation was observed (r = 0.65, p < 0.01), indicating these two sources were influenced by similar factors. The diurnal distribution pattern of the fuel evaporation source was different from former studies in urban areas (with an increasing trend from 07:00 to 10:00 LT due to the morning rush hour traffic; Baudic et al., 2016). Conversely, the source contribution followed the diurnal variations in fuel evaporation tracers such as i-pentane, n-pentane and methylcyclohexane, with Pearson correlation coefficients of 0.86 (p < 0.01), 0.87 (p < 0.01) and 0.67 (p < 0.05), respectively.

Contribution to OFP
The contributions of the five identified VOC sources to OFP were also evaluated using the F matrix and MIR methods. Fuel evaporation showed the highest contribution (41.9 %, 41.6 ppbv), followed by NG (29.6 %, 29.4 ppbv), combustion (14.2 %, 14.1 ppbv), oil refineries (11.3 %, 11.2 ppbv) and asphalt (3.0 %, 3.0 ppbv). Therefore, more attention should be paid to fuel evaporation due to its high OFP. It should be noted that the source contributions to OFP were calculated only for the 20 VOC species selected for PMF modeling, so the actual contributions to OFP would be higher than the values reported here.
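The percentage shares quoted above follow directly from the reported OFP values. The sketch below shows the generic MIR-style calculation (species concentration in a source profile multiplied by a reactivity coefficient and summed) and then recomputes the shares from the five reported values. The function names are illustrative, and no real MIR coefficients are included here.

```python
def source_ofp(profile, mir):
    """OFP of one source: sum over species of concentration x MIR coefficient.

    profile: {species: concentration attributed to the source}
    mir:     {species: MIR coefficient} (values must be supplied by the user)
    """
    return sum(conc * mir[species] for species, conc in profile.items())


def percentage_shares(ofp_by_source):
    """Convert absolute OFP values into percentage shares of their total."""
    total = sum(ofp_by_source.values())
    return {name: 100.0 * value / total for name, value in ofp_by_source.items()}


# OFP values as reported in the text above.
reported = {"fuel evaporation": 41.6, "NG": 29.4, "combustion": 14.1,
            "oil refineries": 11.2, "asphalt": 3.0}
shares = percentage_shares(reported)
```

Rounded to one decimal place, `shares` reproduces the percentages given in the text (41.9, 29.6, 14.2, 11.3 and 3.0 %).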

Source contributions compared with previous studies
The source apportionment results showed that the dominant source in this study was the natural gas source, contributing 62.6 ± 3.04 % to the total VOCs for the annual average, followed by fuel evaporation (21.5 ± 2.99 %), combustion (10.9 ± 1.57 %), oil refineries (3.80 ± 0.50 %) and asphalt emissions (1.30 ± 0.69 %). Each identified PMF factor exhibited obvious temporal variations due to the emission strength, photochemical reaction and meteorological conditions. The source apportionment results in this study were compared with former studies based on long-term monitoring (Table 2). The contributors to VOCs in urban areas were complex, with at least five different sources including fuel evaporation, LPG/NG, industrial emissions, vehicle emissions and solvent usage (Table 2). Fewer VOC sources were apportioned in industrial areas than in cities. For example, only three sources including vehicle emissions (58.3 %), solvent usage (22.2 %) and industrial activities (19.5 %) were apportioned by principal component analysis-multiple linear regression (PCA-MLR) in Lanzhou, a petrochemical industrialized city in northwest China (Jia et al., 2016). Similarly, only fuel evaporation, industrial emissions and vehicle emissions were identified in Houston (Leuchner and Rappenglück, 2010). In these studies, vehicle emissions were an important source in both urban and industrial areas and contributed about 11-58.3 % to the total VOCs (Table 2). However, the vehicle emission source was not identified in this study for several reasons. First, despite the similarity between the source profile of combustion or fuel evaporation in this study and the vehicle emissions (i.e., high loadings on acetylene, ethylene, BTEX, butanes and pentanes), the temporal variations in these species did not show a distinct increase during the traffic rush hour.
www.atmos-chem-phys.net/18/4567/2018/ Atmos. Chem. Phys., 18, 4567-4595, 2018
In fact, the identified combustion source in this study represented the characteristics of coal burning and torch burning in oil refineries (to eliminate the hazardous gases). Secondly, differences existed in sampling location and vehicle numbers. In previous urban studies, the sampling location was in megacities with huge vehicle flows. For example, in the Wuhan study (Lyu et al., 2016), the sampling site was located in the city center and the car population was 2.2 × 10^6 by the end of 2015. In contrast, the sampling location here was about 11 km away from the urban areas and the car ownership was only 1.1 × 10^5. Therefore, the factor with higher loadings of these species was not likely to be contributed by vehicle emissions in this study. LPG and NG sources are usually apportioned in both urban and industrial areas. These sources contribute 10-32 % to the total VOCs and are mainly from household or fugitive industrial emissions. However, in this study, the NG source was mainly from NG exploitation and the NG chemical industry due to its abundance in this area and accounted for 62.6 ± 3.04 % of VOCs on average, which was higher than in many other areas as summarized in Table 2.
Solvent usage also accounts for a large proportion of total VOCs in urban areas (4.7-36.4 %). In this study, a similar source related to asphalt was identified with heavy weights on C9-C12 compounds. The solvent usage in urban areas is usually from painting or coating. However, the asphalt in this study originated from oil refineries (Fig. S1) and fugitive emissions from a black oil hill located to the northwest of the sampling site. Due to its high boiling point, the seasonal contribution of asphalt was distinct, with the highest contribution in July (7.2 %) and the lowest contribution in January (1.4 %). Despite the source contribution of asphalt being low, it was unique in this study.

Geographic origins of VOC sources: local vs. regional contributions
The possible geographic origins of the five identified VOC sources were explored using CPF, PSCF and CWT as shown in Figs. 14, 15 and 16, respectively. These methods aimed at providing insights on the potential geographic origins of VOC sources but did not claim to be precise at the cell level or pixel level.
The highest CPF value of oil refineries was found east of the sampling site (Fig. 14a), which indicated the potential location of this source. However, the oil refineries are mainly located to the southwest of the sampling site (Fig. 1c) and a high CPF value (0.95) was also found in the southwest direction. Therefore, the CPF results were able to reflect the location of the oil refineries. Similarly, high probabilities and concentrations of oil refineries were also found from the southeast to southwest of the sampling site according to the PSCF (Fig. 15a) and CWT plots (Fig. 16a). As shown in Fig. 1a and b, the sampling site is located to the west of the Junggar Basin, which is the second largest oil and gas basin in China. Indeed, high values of CPF, PSCF and CWT were found in the east (Figs. 14b, 15b and 16b), which indicated the potential geographic origins of NG. Given the fact that the NG source was composed of long atmospheric lifetime species (i.e., ethane, propane and butanes), the high probabilities and concentrations of this factor likely resulted from aged air masses from each direction. The combustion source showed high potentials from the ESE to SE according to the CPF, PSCF and CWT plots. There were no high values to the northwest of the sampling site, where the urban area is located. This also indicated that the combustion from vehicle emissions was insignificant in this study. For the asphalt source, the highest CPF value was found in the east while the PSCF and CWT plots showed high values to the northeast. As discussed above, the asphalt source in this study was from the natural source (black oil hill to the northwest of the sampling site) and oil refineries (southwest direction). The CPF, PSCF and CWT results indicated that these methods failed to locate the natural source of asphalt. The potential geographic origins of fuel evaporation were widespread from the ESE to W, which was similar to the oil refinery source.
Diversities of geographic origins were also found in different seasons (Figs. S6-S13). The potential source areas of the five sources spread from northeast to southwest in autumn. In winter, both PSCF and CWT methods indicated that the VOC sources were probably from the southeast and southwest. In spring, VOCs were mainly from long-range transport from the west. However, high probabilities and contributions existed around the sampling site. In summer, high potential and contribution were from the west to the southeast. Overall, the five sources exhibited different local source areas proved by the CPF plots on the annual scale. Similar regional distributions of these sources were found on the seasonal scale.
To quantify the contributions of local emissions and long-range transport to the sampling site, raster analysis based on CWT was used and the results are summarized in Table 3. Annually, except for the combustion source, the identified VOC sources were mainly from the local emissions, with contributions of 53.6 % for oil refining, 54.5 % for NG, 50.5 % for asphalt and 50.6 % for fuel evaporation. The seasonal patterns were the same as the annual pattern, exhibiting higher contributions from local areas and the differences only existed in the proportions. The highest local contributions of oil refining (69.4 %) and combustion (69.2 %) were observed in summer, while the local sources contributed most to the NG (74.6 %), asphalt (65.4 %) and fuel evaporation (68.3 %) in autumn.
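For orientation, the PSCF and CWT quantities used above can be sketched from their usual textbook definitions: PSCF for a grid cell is the fraction of trajectory endpoints in that cell that belong to arrivals with a source contribution above a chosen threshold, and CWT is the endpoint-weighted mean contribution in the cell. The grid size, the threshold and the input layout below are assumptions of this illustration, and the weighting function implied by the "weight" maps in Figs. 15 and 16 is omitted.

```python
from collections import defaultdict


def grid_cell(lat, lon, size=0.5):
    """Index of the size x size degree cell containing a trajectory endpoint."""
    return (int(lat // size), int(lon // size))


def pscf_and_cwt(trajectories, threshold, size=0.5):
    """Compute unweighted PSCF and CWT fields.

    trajectories: list of (contribution, [(lat, lon), ...]) pairs, one pair
    per back trajectory arriving at the site (hypothetical layout).
    Returns (pscf, cwt), each a dict keyed by grid cell.
    """
    n = defaultdict(int)           # all endpoints falling in each cell
    m = defaultdict(int)           # endpoints from above-threshold arrivals
    weighted = defaultdict(float)  # sum of contribution over endpoints in cell
    for contribution, endpoints in trajectories:
        for lat, lon in endpoints:
            cell = grid_cell(lat, lon, size)
            n[cell] += 1
            weighted[cell] += contribution
            if contribution > threshold:
                m[cell] += 1
    pscf = {cell: m[cell] / n[cell] for cell in n}
    cwt = {cell: weighted[cell] / n[cell] for cell in n}
    return pscf, cwt
```

In practice cells with very few endpoints are usually down-weighted, which is what the omitted weighting function is for.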

Summary
Based on 1 year of continuous online monitoring of VOCs in an oil and gas field, and on the use of the PMF receptor model, back trajectories, and the PSCF and CWT dispersion models, this study compared the VOC levels and compositions with other studies, identified the VOC sources and explored the potential geographic origins of the five identified VOC sources. The main findings are summarized as follows.
1. The total VOC concentrations in this study were not only higher than those in urban areas but also higher than those measured in petrochemical areas. Alkanes contributed most to the total VOCs (accounting for 87.5 % and 128 ± 82.4 ppbv on average), followed by alkenes (6.81 % and 9.1 ± 5.6 ppbv), aromatic hydrocarbons (3.37 % and 4.8 ± 6.5 ppbv) and acetylene (2.32 % and 3.1 ± 5.1 ppbv).
2. Five sources with local characteristics were identified. The NG contributed most to the VOCs (62.6 ± 3.04 %), followed by fuel evaporation (21.5 ± 2.99 %), combustion source (10.9 ± 1.57 %), oil refining (3.80 ± 0.50 %) and asphalt (1.30 ± 0.69 %). The NG and fuel evaporation source contributions showed positive correlation with each other and shared the same diurnal variation pattern, exhibiting a single peak profile. The diurnal variation in oil refining and combustion source exhibited a similar double wave with peaks occurring at 06:00-08:00 LT. Different from other sources, the diurnal profile of asphalt exhibited a decreasing trend from nighttime to its minimum before sunrise (06:00 LT).
3. The geographic origins of the five VOC sources were the same during the whole period, and differences existed only in their seasonal variations. For instance, VOCs were mainly from the northeast and southwest in autumn, while they originated from the southeast and southwest in winter. The raster analysis indicated that the VOCs in this study were mainly from local emissions with contributions ranging from 48.4 to 74.6 % in different seasons.
In summary, this study found that the VOC concentrations, compositions, ozone formation potential and sources were different from those in urban and industrial areas and similar to those in oil and gas rich areas. This study will be helpful for VOC control in these types of regions around the world.
H. Zheng et al.: Monitoring of volatile organic compounds for 1 year

Appendix A: Detailed operation of positive matrix factorization (PMF) in source apportionment of the VOC dataset

A1 Data preparation
Two files including species concentration and uncertainty are required as input to the EPA PMF 5.0 model. The concentration file is an i (number of samples) × j (number of species) matrix (X matrix), i.e., 2743 × 20 in this study. There are two types of uncertainty files: sample specific and equation based. The sample-specific uncertainty file is also a matrix with the same dimensions as the concentration matrix. The equation-based uncertainty dataset is constructed according to the method detection limit (MDL) and error fraction (%).
Not all 57 VOCs are introduced into the PMF model; the following rules decide which species are included in or excluded from the PMF model: (1) highly collinear species, such as propane and n-butane or benzene and toluene, are included (Fig. A1); (2) species indicating VOC sources (i.e., acetylene is the marker of combustion sources) are retained; (3) species that are highly reactive are excluded (i.e., ipentene) since they are rapidly reacted away in the ambient atmosphere (Guo et al., 2011; Shao et al., 2016). Prior to the PMF model base run, the retained species were first classified as strong, weak or bad based on their signal-to-noise ratios (S / N). Species with S / N ratios less than 0.5 were grouped as bad, and as weak if S / N ratios were in the range of 0.5-1.0 (US EPA, 2014). However, the S / N ratios were not useful for categorizing species because all species had S / N ratios greater than 2.0 in this study. Therefore, the percentage of samples below the detection limit (BDL), the residual scale and prior knowledge of VOC source tracers were used. Species with more than 60 % of samples BDL were categorized as bad and were excluded from the model (i.e., trans/cis-2-pentene, isoprene); species with more than 50 % of samples BDL were categorized as weak (Callén et al., 2014). Finally, nine species (ethane, propane, n-hexane, cyclohexane, methylcyclohexane, n-octane, n-nonane, n-decane and n-undecane) were categorized as strong and 11 species (i-butane, n-butane, i-pentane, n-pentane, n-dodecane, ethylene, acetylene, benzene, toluene, m- and p-xylene, and o-xylene) were categorized as weak due to their residual scale beyond 3.
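The equation for the equation-based uncertainty is not reproduced in this text. As a hedged sketch, the form given in the EPA PMF 5.0 user guide is shown below: 5/6 of the MDL for values at or below the MDL, and otherwise the root sum of squares of (error fraction × concentration) and (0.5 × MDL). That the study used exactly this form is an assumption. The second function encodes only the BDL-based rule stated above, and both function names are illustrative.

```python
import math


def equation_based_uncertainty(conc, mdl, error_fraction):
    """Uncertainty following the equation-based option of the EPA PMF 5.0 guide.

    error_fraction is a fraction (0.10 for 10 %), not a percentage.
    """
    if conc <= mdl:
        return 5.0 / 6.0 * mdl
    return math.sqrt((error_fraction * conc) ** 2 + (0.5 * mdl) ** 2)


def classify_by_bdl(fraction_below_mdl):
    """Categorize a species from the share of samples below the detection limit.

    Thresholds follow the text: > 60 % is bad, > 50 % is weak.
    Other criteria mentioned in the text (residual scale, tracer knowledge)
    are not covered by this helper.
    """
    if fraction_below_mdl > 0.60:
        return "bad"
    if fraction_below_mdl > 0.50:
        return "weak"
    return "strong"
```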

A2 The optimal number of factors
Choosing the optimal number of factors (P value) is a critical question in PMF analysis. Too many factors will result in meaningless factor profiles, while too few factors will make it difficult to segregate the mixing sources (Bressi et al., 2014). Factors ranging from 3 to 8 were tested in this study.
Each model was run 20 times with a random seed. All the Q values (Q_true, Q_robust, Q_except and Q_true/Q_except), observed versus predicted (O/P) concentrations and scaled residuals were evaluated. In theory, if the number of sources is estimated properly, the Q_true value should be approximately equal to Q_except. If the number of sources is not well determined, the Q value may deviate from the theoretical value (Bressi et al., 2014; Baudic et al., 2016). However, Q_true deviates from Q_except in many cases, especially for large datasets (Liu et al., 2016, 2017; Shao et al., 2016). The variation in the Q values with the number of factors is shown in Fig. A2a and the correlation coefficients among O/P values in each factor number solution are shown in Fig. A2b.
As shown in Fig. A2a, Q_true/Q_except decreased substantially between two-, three- and four-factor solutions, indicating that a substantial amount of the variability in the dataset was accounted for by each additional factor; for P = 5, Q_true/Q_except exhibited the minimum value; as the factor number changed from 6 to 8, the Q_true/Q_except value increased again. The Pearson correlation coefficients between the observed and predicted total VOC concentrations for different factor numbers are shown in Fig. A2b, which indicates that the total VOC concentrations were well reproduced by the PMF model. In addition, for the 20 individual VOC species, the PMF model also reproduced the observed concentrations well, with r^2 ranging from 0.42 to 0.96 (Table A1). Therefore, we considered the five-factor solution to be the optimum solution for this PMF analysis (Fig. A3).
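The Q values compared above can be illustrated with a short sketch. Q is the sum of squared residuals between the data matrix X and the product of the contribution matrix G and the profile matrix F, each scaled by its uncertainty. The expected Q is approximated here as the number of data values minus the number of fitted elements in G and F, treating all species as strong, which is a simplification rather than the exact bookkeeping of the EPA software.

```python
def q_value(x, u, g, f):
    """Sum of squared, uncertainty-scaled residuals for X approximated by G F.

    x, u: n_samples x n_species lists (data and uncertainties)
    g:    n_samples x n_factors list (contributions)
    f:    n_factors x n_species list (profiles)
    """
    q = 0.0
    for i, row in enumerate(x):
        for j, x_ij in enumerate(row):
            fitted = sum(g[i][k] * f[k][j] for k in range(len(f)))
            q += ((x_ij - fitted) / u[i][j]) ** 2
    return q


def q_expected(n_samples, n_species, n_factors):
    """Approximate expected Q: data values minus fitted elements of G and F."""
    return n_samples * n_species - n_factors * (n_samples + n_species)
```

Under this simplification, the 2743 × 20 matrix with five factors gives `q_expected(2743, 20, 5)` = 41045.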

A3 Bootstrap run (BS)
After choosing the five-factor solution, the bootstrap (BS) method was used to detect and estimate the disproportionate effects of a small set of observations on the solution and also, to a lesser extent, the effects of rotational ambiguity. BS datasets are constructed by randomly sampling blocks of observations from the original dataset (US EPA, 2014). The base run with the lowest Q_robust is mapped to each BS run with a minimum Pearson correlation coefficient of 0.6. The number of BS datasets is set to 100 to ensure the robustness of the statistics. In this study, the base and boot factors were matched except for factor 3 (combustion) and factor 5 (fuel evaporation; Table A2). Mapping over 80 % of the factors indicates that BS uncertainties can be interpreted and the number of factors may be appropriate. As seen from Table A2, the BS results indicated a rotational ambiguity, and F-peak should be further applied.
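The mapping step described above can be sketched as follows: each bootstrap factor is assigned to the base factor with which it correlates best, provided the correlation reaches the 0.6 threshold, and is otherwise left unmapped. This is a simplified illustration with hypothetical function names, not the EPA PMF implementation, and it assumes each factor is represented by a plain numeric sequence of equal length.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def map_boot_to_base(base_factors, boot_factors, min_r=0.6):
    """Assign each bootstrap factor to its best-correlated base factor.

    Returns a list holding the matched base index for each bootstrap
    factor, or None when no correlation reaches min_r.
    """
    mapping = []
    for boot in boot_factors:
        scores = [pearson_r(boot, base) for base in base_factors]
        best = max(range(len(scores)), key=scores.__getitem__)
        mapping.append(best if scores[best] >= min_r else None)
    return mapping
```

Counting, over all bootstrap runs, how often each base factor receives its own bootstrap factor gives a mapping table of the kind reported in Table A2.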

A4 BS-DISP error estimation
Bootstrap displacement (BS-DISP) estimates the errors associated with both random and rotational ambiguity. A key file containing the number of cases accepted, largest decrease in Q, number of swaps in best fit and DISP was generated (Table A3). Swaps by factor were used to assess the error fraction. There were 99 bootstrap cases accepted and one resample was rejected. The decrease in Q was less than 1 %, which indicated that the test of BS was validated and no more testing was required. It suggested that the solution was well constrained and the BS-DISP results can be reported.
Finally, F-peak values from −1 to 1 at intervals of 0.1 were used to remove the rotational ambiguity as discussed above. The F-peak bootstrap was also used to test the mapping between the base model and the F-peak runs. Results indicated that F-peak = 0.2 was the optimal solution, with all factors mapping 100 % and the base run of each species within the interquartile range (IQR) of the BS run.
Therefore, the local and regional area was divided by a circle with a radius of 7° (sampling site as the origin).

C2 Raster analysis
The CWT results obtained with the TrajStat software were stored in shapefile format and were then introduced into the ArcGIS software (10.1, Esri, US). The first step was to remove the negative values from the shapefile and then convert the shapefile into raster format. The local and regional area was extracted by a circle with a radius of 7° as discussed above. The inner area of the circle was defined as local transport while the external area was set as regional transport; an example is shown in Fig. C2. The statistics (count, minimum, maximum, sum, mean and standard deviation) of the extracted raster were shown in its layer properties in ArcMap. The statistics of each VOC source in each season are summarized in Table C1. The percentage contributions of local source and regional transport of the five VOC sources in different seasons were calculated according to Eq. (2) in Sect. 2.4.3.
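The local versus regional split can be sketched without GIS software. The example below drops negative cells, separates the remaining cells by a 7° radius around the site and expresses each part as a percentage of the summed CWT. Eq. (2) is not reproduced in this section, so the ratio-of-sums form is an assumption, as are the planar distance in degrees and the input layout.

```python
import math


def local_regional_split(cells, site_lat, site_lon, radius_deg=7.0):
    """Split summed CWT values into local and regional percentages.

    cells: iterable of (lat, lon, cwt_value) tuples (hypothetical layout).
    Returns (local_percent, regional_percent).
    """
    local = 0.0
    regional = 0.0
    for lat, lon, value in cells:
        if value < 0:
            # Negative cells are removed first, as described in the text.
            continue
        # Planar distance in degrees; a simplification of a true circle on a map.
        distance = math.hypot(lat - site_lat, lon - site_lon)
        if distance <= radius_deg:
            local += value
        else:
            regional += value
    total = local + regional
    return 100.0 * local / total, 100.0 * regional / total
```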

Figure 1. The spatial distribution of oil- and gas-bearing basins in China (a) and the terrain of the study area (b). The sampling site is about 11 km away from the urban area and located to the northeast of an oil refinery plant and southwest of an oil and gas field. The northeasterly winds prevailed during the sampling periods (c).

Figure 2. Meteorological parameters at the observation site from September 2014 to August 2015 for every 3 h.

Figure 5. Box and whisker plots of VOC profiles based on different scales during the whole sampling period. Box and whisker plots are constructed according to the 25th-75th and 5th-95th percentiles of the calculation results.

Figure 6. Seasonal and daily variations in ethane (a), ethylene (b), acetylene (c) and benzene (d) during the sampling period.

Figure 7. Diurnal variation in boundary layer height (BLH), VOC, NO2, and O3 concentrations on different timescales: annual (a), winter (b) and summer (c). The solid line represents the average value and the filled area indicates the 95 % confidence interval of the mean.

Figure 10. Source profiles of five factors resolved with PMF modeling including oil refineries (a), NG (b), combustion source (c), asphalt (d), and fuel evaporation (e) and their corresponding hourly source contributions. Box and whisker plots are constructed according to the 5th-95th percentiles of the F-peak bootstrap runs (n = 100).

Figure 11. Variation in monthly averaged (a) and seasonally averaged (b) contributions of five identified VOC sources (expressed as a percentage).

Figure 12. Scatter plots of daily concentrations of trace gas and source contributions including oil refineries (a), NG (b), combustion (c), asphalt (d) and fuel evaporation (e) under different meteorological conditions (wind speed, WS; boundary layer height, BLH; and temperature, T).

Figure 13. Diurnal variation in the contributions (ppbv) of five identified sources including oil refining processes (a), NG (b), combustion source (c), asphalt (d), and fuel evaporation (e) and specific compounds with high loadings in each source profile. Note that the CO in combustion source is expressed in milligrams per cubic meter.

Figure 15. Annual weight potential source contribution function (WPSCF) maps for five identified sources derived from PMF analysis including oil refineries (a), NG (b), combustion (c), asphalt (d) and fuel evaporation (e). The black cross represents the sampling site.

Figure 16. Annual weight concentration-weighted trajectory (WCWT) maps for five identified sources derived from PMF analysis including oil refineries (a), NG (b), combustion (c), asphalt (d) and fuel evaporation (e). The black cross represents the sampling site.

Figure A2. Q_true/Q_except, Q_robust and Q_true plotted against the number of factors used in the positive matrix factorization (PMF) solution (a) and the correlation coefficients (r^2) between observed and predicted VOC concentrations of each factor solution (b).

Figure A3. Scatter plots between the total predicted and observed VOC concentrations based on the five-factor PMF solution.

Table 1. Concentrations (mean ± standard deviation) during the sampling period and the photochemical properties of VOCs.

Table 2. Comparison of VOC source apportionment results with former studies.

Table 3. Contributions (%) of local sources and regional transport of five sources in different seasons.

Table A1. Pearson coefficients between the observed and predicted VOC concentrations for the five-factor solution.

Table A2. Mapping of bootstrap factors to base factors.

Table C1. The statistics of VOC sources in each season obtained using raster analysis.