Bayesian inverse modeling of the atmospheric transport and emissions of a controlled tracer release from a nuclear power plant

Probability distribution functions (PDFs) of model inputs that affect the transport and dispersion of a trace gas released from a coastal California nuclear power plant are quantified using ensemble simulations, machine-learning algorithms, and Bayesian inversion. The PDFs are constrained by observations of tracer concentrations and account for uncertainty in meteorology, transport, diffusion, and emissions. Meteorological uncertainty is calculated using an ensemble of simulations of the Weather Research and Forecasting (WRF) model that samples five categories of model inputs (initialization time, boundary layer physics, land surface model, nudging options, and reanalysis data). The WRF output is used to drive tens of thousands of FLEXPART dispersion simulations that sample a uniform distribution of six emissions inputs. Machine-learning algorithms are trained on the ensemble data and used to quantify the sources of ensemble variability and to infer, via inverse modeling, the values of the 11 model inputs most consistent with tracer measurements. We find a substantial ensemble spread in tracer concentrations (factors of 10 to 10^3), most of which is due to changing emissions inputs (about 80 %), though the cumulative effects of meteorological variations are not negligible. The performance of the inverse method is verified using synthetic observations generated from arbitrarily selected simulations. When applied to measurements from a controlled tracer release experiment, the inverse method satisfactorily determines the location, start time, duration, and amount of the release. In a 2 km × 2 km area of possible locations, the actual location is determined to within 200 m. The start time is determined to within 5 min out of 2 h, and the duration to within 50 min out of 4 h. Over a range of release amounts of 10 to 1000 kg, the estimated amount exceeds the actual amount of 146 kg by only 32 kg. The inversion also estimates probabilities of different WRF configurations. To best match the tracer observations, the highest-probability cases in WRF are associated with using a late initialization time and specific reanalysis data products.

terrain-induced flow. High-resolution terrain and land use fields were generated for D5 using National Elevation Dataset and National Land Cover Database data at 1 arc second (approx. 30 m) resolution, downloaded from the United States Geological Survey Multi-Resolution Land Characteristics data server (Homer et al., 2015).

WRF Ensemble
Several features also make WRF ideal for creating an ensemble of plausible atmospheric conditions for uncertainty assessments. Ensemble modeling approaches have been shown to be effective at quantifying physically plausible states of the atmosphere in a probabilistic manner (e.g., Mullen and Baumhefner, 1994; Stensrud et al., 2000; Berner et al., 2011). The WRF modeling system contains numerous physics schemes that parameterize subgrid-scale processes, such as the surface energy exchange, cloud microphysics, and turbulent mixing in the planetary boundary layer. By running WRF with multiple combinations of physics options, an ensemble is generated that captures meteorological uncertainty due to subgrid-scale parameterization error. In addition, WRF can be initialized and run using a variety of publicly available reanalysis data sets to quantify meteorological uncertainty resulting from errors in initial/lateral boundary conditions. Meteorological data from gridded analysis data sets and observations can also be integrated into WRF simulations to improve model accuracy by using four-dimensional data assimilation (FDDA) options for analysis (Stauffer and Seaman, 1994) and observational (Liu et al., 2005, 2009) nudging. The goal of the analysis FDDA option is to nudge the large-scale motion towards an observed state using relaxation terms, while the observational nudging affects the prediction of local-scale atmospheric phenomena.

Table 1 summarizes the five major variables that were selected for the WRF ensemble. For the purposes of the Monte Carlo sampling and analysis (Fig. 1), the WRF variables are treated as categorical random variables. By taking all of the combinations among the five variables in the table, we constructed a WRF ensemble containing 162 members. The major categories in the table include model initialization time, source of input reanalysis data, FDDA nudging weighting factors, planetary boundary layer (PBL) physics, and land surface model (LSM) physics. The ensemble categories and their variations were selected based on previous WRF user experience and a literature review of sources of uncertainty that are likely to impact meteorological fields important to the specific tracer release experiment described in Sect. 4. Variations in PBL and LSM physics, for example, have been shown to affect near-surface stability and wind fields (Lee et al., 2012), which can have a large impact on plume transport modeling. Other categories related to changes in microphysics and cumulus physics are not considered in this report because precipitation and cloud cover were not present during the specific tracer release experiment. Future studies can easily incorporate these factors, and others, by including additional categories for Monte Carlo sampling.
Several variations are included in the weather ensemble to account for uncertainty related to model initialization and meteorological reanalysis inputs. WRF simulations were started either 15 hours or 9 hours before the beginning of the tracer release (i.e., at 00:00 UTC or 06:00 UTC on 4 September 1986) to investigate the sensitivity of model solutions to initialization start time and model spin-up duration. All of the WRF simulations ended 13 hours after the end of the tracer release (i.e., at 12:00 UTC on 5 September 1986). The three reanalysis variations included are the North American Regional Reanalysis (NARR) data (Mesinger et al., 2006), European Centre for Medium-Range Weather Forecasts (ECMWF) data (Hersbach et al., 2015), and Climate Forecast System Reanalysis (CFSR) fields (Saha et al., 2010). NARR meteorological data are available every 3 hours on 30 vertical levels with a horizontal grid spacing of 32 km. Both ECMWF and CFSR reanalysis fields are available every 6 hours on 38 vertical levels; however, CFSR fields have a horizontal grid spacing of roughly 60 km versus roughly 125 km for ECMWF data.
The FDDA weighting of meteorological data fields during the WRF ensemble simulations was varied to account for uncertainty associated with the assimilation of gridded reanalysis fields and irregularly spaced weather observations. Weather simulations were performed with the WRF FDDA analysis and observational nudging options either turned off, using the default weighting factors suggested by WRF guidance, or using a high option with weighting factors one order of magnitude larger than the default values. Additionally, FDDA analysis nudging was used only on the two outer coarse-resolution model domains, while FDDA observational nudging was used only on the two innermost model domains (see the WRF domains described above).

Three PBL models and three LSM schemes were used to construct the WRF ensemble to account for uncertainty associated with turbulent mixing and surface momentum, moisture, and thermodynamic fluxes. The PBL models included the Yonsei University (YSU) scheme (Hong et al., 2006), the Mellor-Yamada-Janjic (MYJ) scheme (Janjić, 1994), and the Mellor-Yamada-Nakanishi-Niino (MYNN) scheme (Nakanishi and Niino, 2006). Among the PBL models, the biggest difference is that the YSU scheme uses a countergradient flux (non-local) method to develop parabolic mixing profiles in the boundary layer, while the MYJ and MYNN schemes use different numerical approaches to solve for local turbulent kinetic energy (TKE) based vertical mixing in the PBL and free atmosphere. The LSM physics options include the Thermal Diffusion (Dudhia, 1996), NOAH (Ek et al., 2003), and RUC (Benjamin et al., 2004) models. Soil moisture and explicit vegetation canopy physics are not included in the Thermal Diffusion model, while the NOAH and RUC models parameterize vegetation canopy effects to differing degrees and both provide soil moisture gradients.
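As a concrete illustration of how the ensemble is assembled, the sketch below enumerates all combinations of the five categorical variables in Table 1 (2 initialization times × 3 reanalyses × 3 nudging options × 3 PBL schemes × 3 LSMs = 162 members). The option labels are shorthand stand-ins for the actual WRF namelist settings.

```python
# Sketch: enumerating the 162-member WRF ensemble from the five
# categorical variables summarized in Table 1.
from itertools import product

init_time = ["00:00 UTC", "06:00 UTC"]            # initialization time
reanalysis = ["NARR", "ECMWF", "CFSR"]            # input reanalysis data
nudging = ["off", "default", "high"]              # FDDA weighting factors
pbl = ["YSU", "MYJ", "MYNN"]                      # PBL physics
lsm = ["ThermalDiffusion", "NOAH", "RUC"]         # land surface model

ensemble = list(product(init_time, reanalysis, nudging, pbl, lsm))
assert len(ensemble) == 162                       # 2 x 3 x 3 x 3 x 3 members

for i, (t0, rean, fdda, pbl_opt, lsm_opt) in enumerate(ensemble):
    pass  # each tuple defines one WRF configuration to run
```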

FLEXPART
The FLEXPART Lagrangian particle dispersion model (Stohl et al., 1998, 2005; Stohl and Thomson, 1999) was used to simulate the atmospheric transport and mixing of the tracer gas released from the Diablo Canyon nuclear power plant. Field experiments have been used to validate the performance of FLEXPART (Stohl et al., 1998; Forster et al., 2007). FLEXPART has also been used in a wide variety of dispersion applications, including the transport of air pollutants (An et al., 2014; Avey et al., 2007), of radiological releases from nuclear power plants and radioisotope production facilities (Andreev et al., 1998; Wotawa et al., 2010; Stohl et al., 2012), of volcanic plumes (Stohl et al., 2011), and of noble gases produced from nuclear weapons tests (Becker et al., 2010).
We used FLEXPART-WRF version 3.1 (Brioude et al., 2013a), which was developed to use meteorological data generated by the WRF model to drive atmospheric transport and diffusion processes. The FLEXPART-WRF code is open source and available for download (https://www.flexpart.eu/). Mean particle trajectories and tracer concentrations were calculated using the three-dimensional wind components from WRF (u, v, and w) over the 50 km by 50 km domain shown in Fig. 3 (dashed rectangular area). The FLEXPART-WRF grid used 401 cells in each of the horizontal directions and 11 vertical levels from the surface to 3 km, with 6 levels contained in the lowest 500 m. Tracer concentrations were derived using one million particles released from a randomly selected point source location at random release times, as detailed in the next section. Wind fluctuations (σ_v and σ_w) were calculated using parameterizations (Hanna, 1982; Ryall and Maryon, 1998) and WRF micrometeorological output variables (friction velocity, surface sensible heat flux, planetary boundary layer height, and Monin-Obukhov length scale). Lagrangian particles were evolved using a sampling rate and synchronization interval of 20 seconds, and the simulations utilized a subgrid terrain parameterization. FLEXPART-WRF can also simulate wet and dry deposition removal processes (Wesely and Hicks, 1977; McMahon and Denison, 1979; Slinn, 1982; Hertel et al., 1995), though these processes were not needed for the passive gas tracer release at Diablo Canyon.

FLEXPART Ensemble
In addition to the wind field variations generated by the WRF ensemble, the inverse modeling system in Fig. 1 also applies a Monte Carlo sampling loop to the emissions variables in FLEXPART. The goal of this part of the inversion is to determine the location, timing, and magnitude of the tracer release emissions by minimizing the differences between FLEXPART predictions and field measurements.
The location, timing, and magnitude of the Diablo Canyon release are inferred by sampling the six emissions inputs shown in Table 2. Each input is represented by a continuous random variable that can take any value in the inversion range, including the minimum and maximum values. The ranges bound the actual values used for the Diablo Canyon tracer release experiment, which are also listed in the table. The release latitude and longitude are sampled over a roughly 2 km × 2 km bounding box centered on the actual location. The height of the release is varied between 1 and 10 meters above ground, with the actual height at 2 m above the surface. Potential release start times within a two-hour period centered on the actual start time (08:00 local time) are considered. Similarly, a range of possible release durations lasting between 6 and 10 hours is tested, with the actual release occurring for 8 hours. Lastly, the inversion algorithm considered any amount between 10 and 1000 kg for the trace gas released, with the true value at 146 kg.
The FLEXPART ensemble contains 40,000 dispersion simulations that were run and analyzed for the Diablo Canyon release.
The ensemble FLEXPART simulations were generated by randomly sampling both the WRF ensemble and the FLEXPART emissions variables. Random samples were drawn using a Latin hypercube design (Helton and Davis, 2003), which is a space-filling variation of Monte Carlo, assuming an 11-dimensional uniform probability distribution. Further details on the statistical aspects of the ensemble modeling are given below in Sect. 3.
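A minimal sketch of this sampling step using SciPy's quasi-Monte Carlo module is shown below. The numeric bounds are illustrative placeholders consistent with the ranges described above (Table 2), not the exact experiment coordinates, and the categorical WRF member is drawn with a simple uniform sample here for brevity.

```python
# Sketch: Latin hypercube sampling of the six continuous FLEXPART
# emissions inputs (Table 2) plus a WRF ensemble member for each run.
import numpy as np
from scipy.stats import qmc

n_runs = 40000
sampler = qmc.LatinHypercube(d=6, seed=0)
u = sampler.random(n=n_runs)                     # samples in [0, 1]^6

# Bounds: latitude, longitude, height (m), start offset (h),
# duration (h), amount (kg) -- placeholder values, not the exact box.
lo = np.array([35.20, -120.86, 1.0, -1.0, 6.0, 10.0])
hi = np.array([35.22, -120.84, 10.0, 1.0, 10.0, 1000.0])
x_flex = qmc.scale(u, lo, hi)                    # (n_runs, 6) emissions inputs

# Categorical WRF member for each run (uniform draw shown here; the
# full design samples all 11 dimensions jointly).
rng = np.random.default_rng(0)
wrf_member = rng.integers(0, 162, size=n_runs)
```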

Machine learning
Machine learning is used to train statistical regression functions to approximate the input-output relationships in the WRF-FLEXPART ensemble. Once trained, the machine learning functions can be evaluated very efficiently at new input values and used for uncertainty propagation, parameter estimation, Bayesian inference, and other types of statistical analysis. These functions are used for two primary purposes in our work: to identify and rank the effects of input features in WRF and FLEXPART on the tracer responses (i.e., a form of sensitivity analysis) and to determine the values of the inputs that yield responses that are similar to tracer observations (i.e., optimization and inverse modeling). These applications are described in more detail below.

The WRF-FLEXPART ensemble is mathematically expressed as

y = F(x_WRF, x_FLX),   (1)

where y is a vector containing information about the output or response of the simulations in the ensemble, x_WRF and x_FLX are vectors containing the corresponding categorical and continuous inputs to WRF and FLEXPART, respectively, and the function F represents the WRF and FLEXPART physical models. The response vector y is taken as either the tracer concentration at a specified location and time or a measure of the goodness-of-fit between the ensemble simulations and measurements. Complex spatiotemporal dispersion patterns are not contained in y, although new statistical methods are being developed to capture these effects (Francom et al., 2016). Machine learning is used to approximate Eq. (1) by training on the ensemble data, which results in

y = F̂(x_WRF, x_FLX) + ε_y = ŷ + ε_y,   (2)

where ŷ is an approximation of y and ε_y is the approximation error. The value of ε_y is small for our analysis and is neglected for the remaining discussion (i.e., differences between y and ŷ are less than 10 % on average, not shown). For notational convenience, the inputs x_WRF and x_FLX are also combined into a single input vector x in subsequent discussion.
We use a method called gradient boosting (GB) that fits statistical regressions to Eq. (1) using sums of decision trees, following the basic notion that an individual decision tree by itself is a weak learner, but a combination of trees is a strong learner capable of fitting complex systems. The GB algorithm is briefly outlined below, and further details are available elsewhere (e.g., Hastie et al., 2009). We use the GB version available in the Scikit-learn package (Pedregosa et al., 2011).
As noted, a GB model is a sum of decision trees of the form

F_M(x) = Σ_{m=1}^{M} T_m(x; θ_m),   (3)

where T_m(x; θ_m) is the tree at stage m and θ_m is a set of tree fitting parameters (e.g., depth of the trees). An individual tree is

T(x; θ) = Σ_{j=1}^{J} γ_j I(x ∈ R_j),   (4)

which partitions the input space x into J disjoint regions R_j and assigns a value of γ_j to region R_j via the indicator function I.
Starting from an initial model that fits the mean of the response, F_0, the regression model is built up additively using a boosting technique that fits trees to the residuals between the current and previous stages,

F_m(x) = F_{m-1}(x) + Σ_{j=1}^{J} γ_jm I(x ∈ R_jm),   (5)

where the tree output values, γ_jm, are defined implicitly. This expression is solved by numerically estimating gradients of a loss function (e.g., least squares) and using steepest descent optimization. A stochastic variation of GB is used that considers a random subset of training data during each stage, which has been shown to improve the accuracy of the fits.
Although other statistical regression methods could be used, GB offers two clear advantages for fitting the WRF-FLEXPART ensemble. First, as shown in Eq. (2), GB naturally handles heterogeneous inputs (i.e., x_WRF and x_FLX). This makes it convenient for analyzing the combined effects of WRF inputs that vary as categories or discrete variables and FLEXPART variations that vary continuously. Second, GB has a built-in technique for determining the influence of the inputs on the outputs during training using feature importance. The algorithm estimates feature scores for the inputs based on the positions of the nodes in the decision trees. Inputs near the top root node affect the output more strongly than inputs near the bottom terminal nodes, and therefore have higher feature scores. Applied to the WRF-FLEXPART ensemble, the feature scores are analogous to sensitivity coefficients that quantify the fraction of the ensemble variance caused by the changes in the inputs.
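The sketch below shows how such a stochastic GB surrogate with feature scores can be fit with Scikit-learn. The training data are placeholders, and the hyperparameter values are illustrative rather than the paper's settings.

```python
# Sketch: fitting a stochastic gradient-boosting surrogate. The real X
# holds integer-coded WRF categories (columns 0-4) and continuous
# FLEXPART inputs (columns 5-10); y is the transformed tracer response.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = np.column_stack([rng.integers(0, 3, (1000, 5)), rng.random((1000, 6))])
y = rng.random(1000)

gb = GradientBoostingRegressor(
    n_estimators=500,     # number of boosting stages M in Eq. (3)
    max_depth=4,          # size of each tree T_m (parameters theta)
    learning_rate=0.05,
    subsample=0.5,        # stochastic GB: random subset of data per stage
)
gb.fit(X, y)

# Feature scores: relative influence of each input on the response,
# analogous to the sensitivity coefficients discussed above.
print(gb.feature_importances_)
```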

Bayesian inverse modeling
The goal of the inverse modeling is to determine the values of the inputs to WRF and FLEXPART that provide output concentrations that best match the tracer measurements. The inversion uses an extension of our approximate Bayesian computation algorithm described in Lucas et al. (2016). The algorithm has been updated to enable joint inversions of categorical inputs in WRF and continuous inputs in FLEXPART, and to allow more flexibility in the calculation of the model-observation likelihood distance weights. In particular, the likelihood weights now utilize predictions from the GB regressions and consider more than one distance metric. Further information about the scheme is provided below and is illustrated in Fig. 5.
The remaining term in Eq. (6) is the likelihood function, L = P(y|x), which quantifies the level of agreement between the simulated and measured tracer concentrations for a given draw from the prior distribution. Relatively high and low likelihood values correspond, respectively, to simulations that agree well and poorly with measurements. Following our previous work, we compute the mean-squared error (mse) between simulations and measurements as one metric for calculating high and low likelihoods. Furthermore, we include the correlation (corr) between observations and simulations as another metric to determine high and low likelihood values. The corr metric is included because it is sensitive to different aspects of model and observation differences than the mse. The mse varies with the magnitude of the differences between observations and simulations, and is expected to be mainly sensitive to changes in the source amount. The temporal correlation, on the other hand, is expected to be sensitive to changes in the arrival time and duration of the plume at the measurement locations. By combining the two metrics mse and corr into a single likelihood weight, we aim to constrain a larger number of input parameters than we could using either metric alone.
To account for the two metrics in the likelihood function, we use an expression of the form

L(x) ∝ exp[ −(1/2) (y_s − y_t)ᵀ Σ⁻¹ (y_s − y_t) ],   (7)

where y_s is a column vector of the metrics for a simulation at input x, y_t is the corresponding column vector of the metric "targets", and Σ is the 2 × 2 covariance matrix of model and observation errors. The highest likelihood values occur at the inputs that jointly minimize the mse and maximize the corr. Ideally, with a perfect model and data, the targets for mse and corr would be 0 and 1, respectively. In practice, however, models are imperfect and data are noisy, so it is usually not possible to find simulations that match the data perfectly. To avoid extrapolation, we therefore define the targets in y_t using a small number of the best-fitting simulations within the ensemble, and then estimate the covariance matrix Σ using a bootstrap resampling procedure (Wilks, 2011). Further details of the target and covariance estimation are provided in Sect. 5.4.
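A minimal sketch of this two-metric Gaussian likelihood is shown below. The target vector and covariance values are illustrative; in the actual scheme they come from the best-fitting ensemble members and the bootstrap procedure described above.

```python
# Sketch: the two-metric likelihood of Eq. (7), assuming a Gaussian
# form in the (mse, corr) space.
import numpy as np

def likelihood(y_s, y_t, cov):
    """Unnormalized Gaussian likelihood for the metric vector y_s = [mse, corr]."""
    d = y_s - y_t
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d))

y_t = np.array([0.05, 0.95])                   # mse and corr targets (illustrative)
cov = np.array([[0.01, 0.0], [0.0, 0.01]])     # 2 x 2 error covariance (illustrative)
print(likelihood(np.array([0.08, 0.90]), y_t, cov))
```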

Before computing the mse and corr in Eq. (8), the tracer concentrations are transformed using a Box-Cox power transformation with an exponent of −0.25 (see Wilks, 2011). This transformation generalizes the logarithmic transform and is used because the tracer concentrations vary over many orders of magnitude. Without it, the likelihood metrics would be skewed toward higher tracer concentrations near the release location. The Box-Cox transformation also symmetrizes the distribution of differences in Eq. (7) by removing long tails.
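The transform itself is available directly in SciPy; the sketch below applies the one-parameter Box-Cox form, (y^λ − 1)/λ, with λ = −0.25 to concentrations spanning several orders of magnitude.

```python
# Sketch: the Box-Cox power transform with exponent -0.25 applied to
# tracer concentrations before computing mse and corr. The one-parameter
# form (y**lam - 1) / lam reduces to log(y) as lam -> 0.
import numpy as np
from scipy.special import boxcox

conc = np.array([1e-3, 1e0, 1e3])   # concentrations spanning six orders of magnitude
print(boxcox(conc, -0.25))          # compressed, roughly symmetric scale
```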

The Bayesian inversion is performed using GB regressions, instead of actual model simulations, to predict y_s(x) for 2 million new Latin hypercube input values. Two million points are needed to better cover the large sampling volume of the 11-dimensional prior distribution because the sampling volume varies exponentially with the number of dimensions. For instance, partitioning the ranges of only the 6 FLEXPART inputs into 10 bins each results in a volume with 10^6 bins. Running and analyzing the output of 10^6 FLEXPART simulations is computationally infeasible, so we use the ensemble of 4 × 10^4 simulations (Sect. 2) as a training dataset to build the GB regressions, and then use the regressions as surrogates for the actual models in the inversion because they can be evaluated very efficiently.
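The sketch below illustrates this surrogate-based inversion step. The GB fits stand in for the real trained regressions, the sample size is reduced from 2 million so the sketch runs quickly, and all numeric values are placeholders.

```python
# Sketch: evaluate GB surrogates (not model runs) on new Latin hypercube
# points and convert the metrics to normalized likelihood weights.
import numpy as np
from scipy.stats import qmc
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
X_train = rng.random((1000, 11))
gb_mse = GradientBoostingRegressor().fit(X_train, rng.random(1000))
gb_corr = GradientBoostingRegressor().fit(X_train, rng.random(1000))

# New Latin hypercube points over the 11-dimensional prior (categorical
# WRF columns would be mapped back to integer settings here).
x_new = qmc.LatinHypercube(d=11, seed=2).random(n=100_000)

y_s = np.column_stack([gb_mse.predict(x_new), gb_corr.predict(x_new)])
y_t = np.array([0.05, 0.95])                      # metric targets (illustrative)
cov_inv = np.linalg.inv(np.diag([0.01, 0.01]))    # inverse 2 x 2 covariance

d = y_s - y_t
w = np.exp(-0.5 * np.einsum("ij,jk,ik->i", d, cov_inv, d))
w /= w.sum()                                      # normalized likelihood weights

# Posterior marginals are weighted histograms of the columns of x_new,
# e.g. np.histogram(x_new[:, 5], bins=30, weights=w) for one input.
```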
To verify the Bayesian inversion scheme, we performed a series of "synthetic data" tests using model-generated inputs and outputs. These tests are important because inverse problems often have multiple solutions and may be poorly constrained (i.e., ill-posed and ill-conditioned). Sect. 5.5 highlights the results of a synthetic data inversion test. Field measurements from the Diablo Canyon nuclear power plant tracer release experiment (Thuillier, 1992) are used for the subsequent full inversion. As an illustration of the forward modeling, Fig. 6 shows reference plumes computed using WRF configurations from Table 1, while the FLEXPART simulations use the actual values of the source release parameters listed in Table 2. These plumes therefore represent our best prior knowledge in a forward modeling sense, and provide tracer concentrations that we would expect to compare reasonably well to measurements without inverse modeling.
The upper portion of the figure shows the plumes using NARR and ECMWF five hours after the release. At this stage of the simulations, there is a large spatial difference between the plumes. The dispersion using NARR is directed eastward, is spatially more confined, and does not extend downwind of Pismo Beach, as compared to the southeast-directed ECMWF plume. The ECMWF plume covers a much wider region, though most of the extended area is over the ocean. Because there are not many measurement sensors over the ocean, we expect there to be smaller differences between NARR and ECMWF in the inversion algorithm than the plumes in the upper part of Fig. 6 suggest.
Nine hours after the release, as shown in the lower part of the figure, the plumes using NARR and ECMWF begin to resemble each other. Both are directed to the southeast, and both have about the same spatial extent. The higher-concentration areas of the plumes using ECMWF are a little more dispersed near the release location (see red contour), but otherwise the differences between the two reanalysis cases are minor. The next section shows that differences are substantial when the 11 model inputs are varied together.

Prior probability distribution of SF6

The ensemble time series of SF6 concentrations in Fig. 7 are similar across the measurement sites, except that the arrival times of the plume are delayed and the peak concentrations are reduced with distance from the release point. Considering the large 5-95 % quantile range in the time series, which spans about three orders of magnitude in the SF6 concentrations, the tracer measurements generally agree with the ensemble, other than during the initial 2-3 hours. Also note that, although the reference case appears to agree with the measurements reasonably well (i.e., the black versus red lines), the inversion algorithm may be able to identify a WRF configuration that provides a better match.

Feature scores of SF6
The SF6 concentrations in the prior probability distribution are analyzed using the feature scores estimated during the GB training (Fig. 8). Overall, the feature scores suggest that the prior uncertainty in the source term inputs is more critical than the prior uncertainty in the meteorological inputs for this particular tracer release experiment. Although the WRF inputs are not the dominant source of variability, the combined effects of the sources of meteorological uncertainty still cannot be neglected. It is also important to note that, for tracer release simulations conducted under different meteorological conditions, at different space and time scales, or that consider additional sources of uncertainty, the input sensitivities will likely differ from those estimated here.
Moreover, the contribution of meteorological uncertainty is expected to be larger for forecast problems that are not constrained by reanalysis data.
In addition to being useful for understanding the drivers of variance in the prior probability distribution, the feature scores are also useful for interpreting the results of the Bayesian inversion in the following sections. Inputs with relatively high feature scores are often easier to constrain with observations. On this basis, therefore, we expect the posterior probability distributions for the FLEXPART location and source amount inputs to be narrower than those of the other FLEXPART inputs because they have the highest feature scores.

Inversion with synthetic data
Before performing an inversion with the Diablo Canyon tracer data, we first apply the algorithm to "synthetic" data with known inputs and outputs as a verification test. Synthetic data are generated using the output concentrations from a randomly selected ensemble member as the inversion target. The posterior probability distribution of parameter values is computed using the previously described methods (i.e., fitting GB regressions to the mse and corr and estimating the covariance matrix as in Sect. 5.4). We draw 2 million new Latin hypercube points from the prior distribution to better cover the 11-dimensional input space, evaluate these points in the likelihood function, and compare the maximum likelihood locations to the known input values. For this particular test, the WRF simulation used the 06:00 UTC initialization time, the NARR reanalysis data, the MYNN TKE PBL scheme, the RUC land surface model, and no data assimilation nudging. The Bayesian inversion algorithm successfully determines these inputs, because the areas of highest posterior probability density coincide with the known values (i.e., the tallest bars and red areas overlap with the black lines and circles). All of the WRF inputs except the land surface model type exhibit large differences across the posterior categories, which indicates that the inputs are well constrained by the data and agree well with the known input values. The algorithm also infers the location, amount, and timing of the source, with the exception of the release altitude. As determined from the widths of the posteriors, the release latitude and longitude are the best constrained FLEXPART inputs, followed by the source amount and duration. The posterior distribution for the release altitude is relatively unchanged from the flat prior distribution, suggesting that the data and metrics are not sufficient to constrain the relatively small variations of the release altitude (0 to 10 meters). As previously noted, there is a reasonable correspondence between the widths of the FLEXPART posteriors in Fig. 10 and the sizes of the feature scores in Fig. 8.
These results, along with other synthetic data tests that are not shown, provide confidence that the inversion algorithm is functioning adequately. The algorithm returns values for the WRF and FLEXPART inputs that are close to the actual values for most of the parameters.

Inversion with Diablo Canyon tracer data
For the inversion using the SF6 measurements, we draw another 2 million Latin hypercube points from the prior distribution for WRF and FLEXPART, evaluate the points in the GB fits for mse and corr, and then compute the likelihood weights relative to the target and covariance displayed in Fig. 9. The resulting posterior distribution of WRF and FLEXPART parameters is shown in Fig. 11.
As is the case with the synthetic inversion tests, the actual values for the location, start, duration, and amount of the SF6 release are known for this tracer experiment. The inversion, however, assumes that the release parameters are unknown and uses the measurements to infer their values. As shown in the figure, the data and algorithm are sufficient to determine most of the FLEXPART source term parameters, because the positions of the maximum likelihood values of the parameters closely match the known experimental values (black lines and circles in Fig. 11). The close agreement between the two implies that the WRF and FLEXPART models do not have any severe deficiencies that prevent them from accurately simulating tracer transport for this experiment. The synthetic data inversion tests from the previous section would not expose model deficiencies because the same deficiency would be present in both the simulations and the target, and hence would be subtracted out of the analysis.
Referring to Fig. 11, we see that the marginal distributions for the latitude, longitude, and amount of the release have the sharpest peaks and are therefore the most constrained by the measurements. The source start and duration are also moderately constrained, though the distribution for the release height is unconstrained and remains essentially flat. The posterior distribution also suggests that the release duration lasts longer than the 8-hour period used in the experiment. It is difficult to simulate rapid changes associated with the leading and trailing edges of the plume, and so the model and methods may be smoothing out these features and causing the overestimation of the duration. Other likelihood metrics besides mse and corr may help alleviate this issue.

The posterior distribution also shows a strong covariance relationship between the release latitude and longitude (see the bivariate distribution in the upper left of Fig. 11). An area of relatively high probability stretches from northwest of the actual release location to the southeast. The shape of this covariance stems from the large-scale flow pattern and nearby measurement locations. The general direction of the flow for the release period is from the northwest to the southeast, and the release point is situated within a fairly close arc of sensors (see Fig. 3). As long as the release stays within this arc, moving the release location slightly upwind or downwind will not greatly affect the simulated concentrations at the sensor locations. If the release location is moved orthogonal to the flow or outside of the arc, however, FLEXPART will simulate SF6 at sensors where none was measured, and vice versa.
The WRF configurations in the posterior distribution are displayed on the right-hand side of Fig. 11. Unlike the known FLEXPART inputs described above or the known inputs in the synthetic data experiments, the actual values of the meteorology are not known here (i.e., there are no black lines or circles for WRF). The configurations that minimize the mse and maximize the corr with the SF6 measurements are represented by the tallest bars in the univariate distributions along the diagonal and by the red-colored squares or bands in the off-diagonal bivariate distributions.
As shown in the figure, the maximum likelihood configuration consists of the 06:00 UTC initialization time, the ECMWF reanalysis fields, the YSU PBL scheme, the RUC land surface model, and no data assimilation nudging. Some of these configuration settings may, at first, seem surprising. For example, the NARR reanalysis fields have a higher spatial resolution than the ECMWF fields and therefore may be expected to perform better. Likewise, the option to run without data assimilation seems to outperform the options with assimilation. Referring to the figure for these cases, the posterior distribution still has significant probability density for both the NARR and low nudging options. Compared to the posterior distribution in the synthetic data inversion, the WRF inputs are not as strongly constrained using the tracer data, especially the inputs for the land surface model and nudging. Only two of the WRF inputs have settings with negligible probability: the earlier initialization time and the CFSR reanalysis data. The alternate PBL schemes also have relatively low probability. We therefore conclude that winds generated using the 06:00 UTC initialization time, the NARR or ECMWF reanalysis fields, and the YSU PBL scheme will optimize our likelihood metrics, and that there is not a preferred land surface model or nudging option.

Posterior probability distribution of SF6
The ensemble time series in Fig. 7 are based on sampling the prior distribution of input parameters, which results in a substantial spread of SF6 concentrations over two to three orders of magnitude. Most of these ensemble members do not agree well with the tracer measurements, so we estimate the posterior distribution of SF6 concentrations by applying the likelihood weights of Sect. 5.4 and 5.6 to the ensemble time series in Fig. 7. The resulting posterior quantiles are shown in Fig. 12. Other than at site 325, where a large spread remains, the 5-95 % range covers about one order of magnitude, a reduction of two orders of magnitude relative to the prior. Even with the reduced range, most of the measurements still fall within the light blue area. We do not expect all of the measurements to lie in the 5-95 % range because the likelihood metrics consider the aggregation of all of the sites and times. In order to achieve an overall higher likelihood, individual measurement points may be farther away from the median in the posterior distribution than they were in the prior.

In addition to the reduction in the variance, Fig. 12 also shows that the median and other quantiles of the posterior distribution shift to higher concentrations. This shift occurs because the source term parameter variations in the prior distribution lead to many simulations having SF6 concentrations that are too low relative to the measurements. The likelihood weights discount these low-concentration simulations and shift the posterior distribution upward.
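The posterior quantiles in Fig. 12 amount to likelihood-weighted percentiles of the prior ensemble time series. The sketch below shows one way to compute them; the ensemble array and weights are stand-ins for the actual values.

```python
# Sketch: posterior concentration quantiles as weighted percentiles of
# the prior ensemble, applied independently at each output time.
import numpy as np

def weighted_quantile(values, weights, q):
    """Quantile q in (0, 1) of `values` under normalized `weights`."""
    order = np.argsort(values)
    cdf = np.cumsum(weights[order])
    return np.interp(q, cdf, values[order])

rng = np.random.default_rng(3)
conc = rng.lognormal(size=(40000, 24))     # members x output times (stand-in)
w = rng.random(40000)
w /= w.sum()                               # stand-in likelihood weights

post = np.array([[weighted_quantile(conc[:, t], w, q)
                  for q in (0.05, 0.5, 0.95)]
                 for t in range(conc.shape[1])])   # 5%, median, 95% per time
```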

Conclusions
We have developed an ensemble-based Bayesian inverse modeling system that can determine information about an atmospheric release from a nuclear power plant using measurements collected at a relatively safe distance downwind from the plant. The system uses an ensemble of WRF simulations to capture uncertainty in meteorological fields and an ensemble of FLEXPART dispersion simulations to vary factors related to emissions. Machine learning algorithms are trained on the input-output relationships in the meteorological and dispersion ensemble, resulting in statistical surrogate models that mimic the behavior of the actual WRF and FLEXPART models, but that can be evaluated very rapidly at millions of new input value combinations.
Using our system, we can determine the input factors that are most important for understanding and reducing uncertainty in the ensemble (i.e., sensitivity analysis) and can estimate the values of the model inputs that provide likely matches between model output and field measurements (i.e., inverse modeling). Bayes' rule is used for the inversion, which provides probability distribution functions of model inputs and outputs that are constrained by observations and serve as a quantitative assessment of model performance. The inversion is designed to estimate the location, timing, and amount of material released to the atmosphere, and to determine the best categories of settings for running a meteorological model. The inversion system can handle, without difficulty, additional factors related to the transport and dispersion of potential materials released during a nuclear power plant accident (e.g., wet and dry deposition of soluble radioactive products).
Our ensemble system is tested against a tracer release experiment conducted near the Diablo Canyon nuclear power plant located in the rugged terrain of coastal California (Thuillier, 1992). An ensemble of 40,000 dispersion simulations is created using a Monte Carlo method to sample uncertainty in 6 source term parameters in FLEXPART and 5 meteorological categories distributed among 162 unique configurations in WRF. The variance of the resulting unconstrained tracer concentration ensemble (i.e., the prior distribution) is substantial, covering a 5 % to 95 % concentration probability range of about four orders of magnitude. About 80 % of the unconstrained variance is due to source term parameter variations, with about half of the overall variance coming from just three input parameters (release amount, latitude, and longitude). Although the meteorological inputs are not dominant sources of ensemble variability, they cumulatively account for 20 % of the variance in the prior distribution and are important because their uncertainty is not easily reduced.
By calculating the mean squared error and correlation between the tracer measurements and the surrogate model predictions, the Bayesian inversion algorithm produces a posterior distribution of model inputs and outputs for the tracer release experiment.
Even though the source term parameters are initially unknown in the inversion (i.e., a non-informative prior), the most likely posterior values of the FLEXPART inputs closely estimate the actual values used in the tracer release experiment, which demonstrates a successful inversion. Table 3 summarizes the estimated and actual release parameters. It is important to keep in mind that the ensemble and inversion methods can be applied to problems other than nuclear power plant releases. While the location of a nuclear power plant release is generally restricted to reactor buildings or other nearby facilities, the inversion algorithm can also determine an arbitrary release location from within a large area (e.g., hundreds of km²) if suitable observations are available.