Balancing aggregation and smoothing errors in inverse models
A. J. Turner (aturner@fas.harvard.edu, ORCID: 0000-0003-1406-7372) and D. J. Jacob
School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, USA
Department of Earth and Planetary Sciences, Harvard University, Cambridge, Massachusetts, USA
Correspondence to: A. J. Turner (aturner@fas.harvard.edu)
Atmos. Chem. Phys., 15, 7039–7048, 2015. Copernicus GmbH, Göttingen, Germany. ISSN 1680-7324. doi:10.5194/acp-15-7039-2015
Received: 4 December 2014 – Published in ACPD: 13 January 2015 – Revised: 30 March 2015 – Accepted: 8 May 2015 – Published: 30 June 2015
This work is distributed under the Creative Commons Attribution 3.0 License (http://creativecommons.org/licenses/by/3.0/).
Article: https://acp.copernicus.org/articles/15/7039/2015/acp-15-7039-2015.html – PDF: https://acp.copernicus.org/articles/15/7039/2015/acp-15-7039-2015.pdf
Abstract
Inverse models use observations of a system (observation vector) to
quantify the variables driving that system (state vector) by
statistical optimization. When the observation vector is large,
such as with satellite data, selecting a suitable dimension for the
state vector is a challenge. A state vector that is too large
cannot be effectively constrained by the observations, leading to
smoothing error. However, reducing the dimension of the state
vector leads to aggregation error as prior relationships between
state vector elements are imposed rather than optimized. Here we
present a method for quantifying aggregation and smoothing errors as
a function of state vector dimension, so that a suitable dimension
can be selected by minimizing the combined error. Reducing the
state vector within the aggregation error constraints can have the
added advantage of enabling analytical solution to the inverse
problem with full error characterization. We compare three methods
for reducing the dimension of the state vector from its native
resolution: (1) merging adjacent elements (grid coarsening),
(2) clustering with principal component analysis (PCA), and (3) applying
a Gaussian mixture model (GMM) with Gaussian pdfs as state vector
elements on which the native-resolution state vector elements are
projected using radial basis functions (RBFs). The GMM method leads
to somewhat lower aggregation error than the other methods, but more
importantly it retains resolution of major local features in the
state vector while smoothing weak and broad features.
Introduction
Inverse models quantify the state variables driving the evolution of
a physical system by using observations of that system. This requires
a physical model F, known as the forward model, that
relates a set of input variables x (state vector) to a set of
output variables y (observation vector),
y=F(x)+ϵ.
The observational error ϵ includes contributions from
both the forward model and the measurements. Solution to the inverse problem
involves statistical optimization to achieve a best error-weighted estimate
of x given y.
A critical step in solving the inverse problem is determining the amount of
information contained in the observations and choosing the state vector
accordingly. This is a non-trivial problem when using large observational
data sets with large errors. An example that will guide our discussion is the
inversion of methane emissions on the basis of satellite observations of
atmospheric methane concentrations. Methane
concentrations can be predicted on the basis of emissions by using a chemical
transport model (CTM) that solves the 3-D continuity equation for methane
concentrations. Here the CTM is the forward model F, the satellite
provides a large observation vector y, and we need to choose the
resolution at which to optimize the methane emission vector x.
The simplest approach would be to use the native resolution of the CTM in
order to extract the maximum information from the observations. However, the
observations may not be sufficiently dense or precise to optimize emissions
at that level of detail, resulting in an underdetermined problem.
refer to this as the “resolution problem”. The inverse
solution must then rely on some prior estimate for the state vector and may
not be able to depart sufficiently from that knowledge. The associated error
is known as the smoothing error and increases
with the size of the state vector. The severity of this problem has been
illustrated in previous inversions of methane emissions using satellite data.
An additional drawback of using a large state vector is that analytical
solution to the inverse problem may not be computationally tractable.
Analytical solution requires calculation of the Jacobian matrix,
∇xF, and inversion and multiplication of the error
covariance matrices. It has the major advantage of
providing complete error statistics as part of the solution, but it becomes
impractical as the state vector becomes too large. Numerical solutions using
variational methods circumvent this problem but do not provide error
characterization as part of the solution. Approximate error statistics can be
obtained e.g.,, but at the cost of additional
computation.
Reducing the dimensionality of the state vector in the inverse problem thus
has two advantages. It improves the observational constraints on individual
state vector elements and it facilitates analytical solution. Reduction can
be achieved by aggregating state vector elements. For a state vector of
gridded time-dependent emissions, the state vector can be reduced by
aggregating grid cells and time periods. However, this introduces error in
the inversion as the underlying spatial and temporal patterns of the
aggregated emissions are now imposed from prior knowledge and not allowed to
be optimized as part of the inversion. The resulting error is called the
aggregation error.
Previous work developed optimal grids that allow the
transfer of information across multiple scales. These computationally
efficient methods generally require the use of the
native-resolution grid to derive the optimal representation. They also assume
that the native-resolution prior error covariance matrices can be accurately
constructed. However, in practice we are generally unable to specify
realistic prior error correlations and must resort to simple assumptions.
Here we present a method for optimizing the selection of the state vector in
the solution of the inverse problem for a given ensemble of observations
without requiring an accurate specification of the native-resolution prior
error covariance matrix. Instead, we use the expected error correlations
between native-resolution state vector elements as criteria in the
aggregation process. Relative to these optimal approaches, our method is suboptimal
but is more practical to implement. As the dimension of the state vector
decreases, the smoothing error decreases while the aggregation error
increases. Therefore, there is potentially an optimum dimension where the
overall error is minimized. We derive an analytical expression for the
aggregation error covariance matrix and show how this can guide selection of
a reduced-dimension state vector where the aggregation error remains below an
acceptable threshold. We also show how intelligent selection of the state
vector can extract more information from the observations for a given state
vector dimension.
Formulating the inverse problem
Inverse problems are commonly solved using Bayes' theorem,
P(x|y)∝P(y|x)P(x),
where P(x|y) is the posterior probability density
function (pdf) of the state vector x (n×1) given
a vector of observations y (m×1), P(x) is the
prior pdf of x, and P(y|x) is the conditional
pdf of y given the true value of x. Assuming Gaussian
distributions for P(y|x) and P(x) allows us to
write the posterior pdf as
P(x|y) ∝ exp[ −½ (y − F(x))^T S_O^{-1} (y − F(x)) − ½ (x_a − x)^T S_a^{-1} (x_a − x) ],
where xa is the n×1 prior state vector,
SO is the m×m observational error covariance
matrix, and Sa is the n×n prior error
covariance matrix. Here and elsewhere, our notation and terminology follow
standard practice in the inverse modeling literature. The most probable solution x^ (called
the maximum a posteriori or MAP) is defined by the maximum of
P(x|y), i.e., the minimum of the cost function
J(x):
J(x) = ½ (y − F(x))^T S_O^{-1} (y − F(x)) + ½ (x_a − x)^T S_a^{-1} (x_a − x).
This involves solving
∇_x J = (∇_x F(x))^T S_O^{-1} (F(x) − y) + S_a^{-1} (x − x_a) = 0.
Solution to Eq. () can be done analytically if F
is linear; i.e., F(x)=Kx+c where
K≡∇xF=∂y/∂x is the Jacobian of F and
c is a constant that can be set to zero in the general case by
subtracting c from the observations. This yields
x^ = x_a + G (y − K x_a),
where G = S^ K^T S_O^{-1} is the gain
matrix and S^ is the posterior error covariance matrix,
S^ = (K^T S_O^{-1} K + S_a^{-1})^{-1}.
The MAP solution can also be expressed in terms of the true value
x as
x^ = x_a + A (x − x_a) + Gϵ,
where A is the averaging kernel matrix that measures the error
reduction resulting from the observations
A = GK = I − S^ S_a^{-1},
and Gϵ is the observation error in state space
with error covariance matrix GSOGT.
We have assumed here that errors are unbiased, as is standard practice in the
inverse modeling literature. An observational error bias bO
would propagate as a bias GbO in the solution
x^ in Eq. ().
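For concreteness, the analytical expressions above can be sketched in a few lines of numpy. This is an illustrative sketch with our own function and variable names, not code from the inversion itself:

```python
import numpy as np

def map_solution(K, S_O, S_a, x_a, y):
    """Analytical MAP solution for a linear forward model y = K x + eps.

    Returns the posterior estimate x_hat, the posterior error covariance
    S_hat, the gain matrix G, and the averaging kernel A.
    """
    S_O_inv = np.linalg.inv(S_O)
    S_hat = np.linalg.inv(K.T @ S_O_inv @ K + np.linalg.inv(S_a))
    G = S_hat @ K.T @ S_O_inv          # gain matrix
    A = G @ K                          # averaging kernel, equals I - S_hat S_a^{-1}
    x_hat = x_a + G @ (y - K @ x_a)
    return x_hat, S_hat, G, A
```

A useful sanity check is the identity A = GK = I − S^ S_a^{-1}, which holds exactly for the linear problem.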
The analytical solution to the inverse problem thus provides full error
characterization as part of the solution. It does require that the forward
model be linear. The Jacobian matrix must generally be constructed
numerically, requiring n sensitivity simulations with the forward model.
Subsequent matrix operations are also of dimension n. This limits the
practical size of the state vector. The matrix operations also depend on the
dimension m of the observation vector, but this can be easily addressed by
splitting that vector into uncorrelated packets, a method known as sequential
updating.
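Sequential updating is straightforward to illustrate: because the observation packets have uncorrelated errors, feeding them through the analytical update one at a time, with each posterior serving as the prior for the next packet, reproduces the batch solution exactly. A numpy sketch (names are ours):

```python
import numpy as np

def update(K, S_O, S_a, x_a, y):
    """One analytical MAP update (same expressions as in the text)."""
    S_hat = np.linalg.inv(K.T @ np.linalg.inv(S_O) @ K + np.linalg.inv(S_a))
    G = S_hat @ K.T @ np.linalg.inv(S_O)
    return x_a + G @ (y - K @ x_a), S_hat

def sequential_update(packets, x_a, S_a):
    """Process uncorrelated observation packets (K_i, S_O_i, y_i) in turn,
    using each posterior as the prior for the next packet."""
    x, S = x_a, S_a
    for K_i, S_O_i, y_i in packets:
        x, S = update(K_i, S_O_i, S, x, y_i)
    return x, S
```

For a diagonal (or block-diagonal) observational error covariance matrix, the sequential result matches the batch solution to machine precision.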
The limitation on the state vector size can be lifted by finding the solution
to ∇xJ=0 numerically, rather than
analytically, for example by using the adjoint of the forward model to
calculate ∇xJ iteratively at successive approaches
to the solution. This variational method allows for
optimization of state vectors of any size because the Jacobian is not
explicitly constructed. But it only yields the MAP solution, x^,
with no error statistics. Several approaches have been presented to obtain
approximate error characterization, but they
can be computationally expensive. An excessively large state vector relative
to the strength of the observational constraints also incurs smoothing error,
as discussed above.
Quantifying aggregation and smoothing errors
The resolution of the forward model (e.g., grid resolution of the CTM) places
an upper limit on the dimension for the state vector, which we call the
native dimension. As we reduce the dimension of the state vector from this
native resolution, the smoothing error decreases while the aggregation error
increases. Here we present analytical expressions for the aggregation and
smoothing error covariance matrices and show how they can be used to select
an optimal state vector dimension.
Aggregation error
We define a restriction (aggregation) operator that
maps the native-resolution state vector x of dimension n to
a reduced-resolution vector xω of dimension p. We assume
a linear restriction operator Γω as a p×n
matrix relating xω to x:
xω=Γωx.
Detailed analyses of aggregation error for reduced-resolution state
vectors exist in the literature. They rely heavily on the
construction of a prolongation operator (Γ⋆) mapping
xω back to x: x=Γ⋆xω. However, construction of this
prolongation operator is not unique. We present here a simpler and more
practical method.
Aggregation error is the error introduced by aggregating state vector
elements in the inversion. The relationship between the aggregated elements
is not optimized as part of the inversion anymore and instead becomes an
unoptimized parameter in the forward model, effectively increasing the
forward model error and inhibiting the ability of the model to fit the
observations. The aggregation error is thus a component of the observational
error.
The aggregation error can be quantified by comparing the observational error
incurred by using the native-resolution state vector,
ϵ=y-Kx,
to that using the aggregated state vector,
ϵω=y-Kωxω.
Here y is the observation vector (common in both cases), x
and xω are the true values of the native-resolution and
aggregated state vectors, and K and Kω are the
native resolution and the reduced-dimension Jacobians. The only difference
between ϵ and ϵω is the
aggregation of state vector elements. As such,
ϵω=ϵ+ϵA
where ϵA is the aggregation error.
Rearranging,
ϵ_A = (K − K_ω Γ_ω) x.
Obtaining the error statistics for ϵA
requires knowledge of the pdf of x for the ensemble of possible true
states. Let x‾
represent the mean value of this ensemble and Se the
corresponding covariance matrix. The aggregation error covariance matrix is:
S_A = E[(ϵ_A − E[ϵ_A])(ϵ_A − E[ϵ_A])^T],
where E is the expected value operator.
E[ϵ_A] = (K − K_ω Γ_ω) x‾
is the bias introduced by the aggregation. Replacing into
Eq. ():
S_A = (K − K_ω Γ_ω) E[(x − x‾)(x − x‾)^T] (K − K_ω Γ_ω)^T = (K − K_ω Γ_ω) S_e (K − K_ω Γ_ω)^T.
In designing our inversion system we use xa as our best
estimate of x‾ and Sa as our best
estimate of Se. Indeed, if xa=x there would be no aggregation error since the prior relationship
assumed between state vector elements would be correct, thus K=KωΓω and the aggregation bias would
be zero. Assuming Sa=Se allows us
to calculate the aggregation error covariance matrix as
S_A = (K − K_ω Γ_ω) S_a (K − K_ω Γ_ω)^T,
and we will use this expression in the analysis that follows. Application of
Eq. () requires computation of the native-resolution
Jacobian K, but this can be done for a limited test period only.
We will give an example below.
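The expression above for S_A translates directly into code. The sketch below (our own naming) also illustrates the limiting case in which the prior relationships between aggregated elements happen to be exact, so that K = K_ω Γ_ω and the aggregation error vanishes:

```python
import numpy as np

def aggregation_error_covariance(K, K_omega, Gamma_omega, S_a):
    """S_A = (K - K_omega Gamma_omega) S_a (K - K_omega Gamma_omega)^T,
    the aggregation error covariance with S_e approximated by S_a."""
    D = K - K_omega @ Gamma_omega      # m x n mismatch between Jacobians
    return D @ S_a @ D.T
```

As a check, if the columns of K are identical within each aggregated group (the prior relationship is exact), the aggregation error covariance is zero.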
Smoothing error
We can express the smoothing error on
x^ by rearranging Eqs. () and ():
x^ − x = (I − A)(x_a − x) + Gϵ,
where ϵ_S = (I − A)(x_a − x) is the
smoothing error. The smoothing error
statistics must be derived from the pdf of possible true states, in the same
way as for the aggregation error and characterized by the error covariance
matrix Se. For purposes of designing the inverse system
we assume that Se=Sa. Thus we have
S_S = (I − A) S_a (I − A)^T.
We can also express the smoothing error in observation space,
ϵS∗, (i.e., as a difference between
y and Kx^) by multiplying both sides of
Eq. () by the Jacobian matrix:
K(x^ − x) = K(I − A)(x_a − x) + KGϵ
so that
ϵ_S^* = K(I − A)(x_a − x).
The corresponding smoothing error covariance matrix in observation
space is
S_S^* = K (I − A) S_a (I − A)^T K^T.
This expression can be generalized to compute the smoothing error
covariance matrix in observation space for any reduced-dimension state
vector xω with Jacobian Kω, prior error
covariance matrix Sa,ω, and averaging
kernel matrix Aω:
S_S^* = K_ω (I − A_ω) S_{a,ω} (I − A_ω)^T K_ω^T.
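Both smoothing error covariances can be computed with a few matrix products. In the sketch below (names ours), a perfect averaging kernel (A_ω = I) gives zero smoothing error, while a vanishing averaging kernel returns the full prior error:

```python
import numpy as np

def smoothing_error_covariances(K_omega, A_omega, S_a_omega):
    """Smoothing error covariance in state space, S_S = (I - A) S_a (I - A)^T,
    and its observation-space counterpart S_S* = K S_S K^T."""
    IA = np.eye(A_omega.shape[0]) - A_omega
    S_S = IA @ S_a_omega @ IA.T
    return S_S, K_omega @ S_S @ K_omega.T
```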
Total error budget
From Eq. () we can see that the total error on
x^ without aggregation is ϵT=ϵS+Gϵ in the
state space, or ϵT∗=ϵS∗+KGϵ in the observation space. The
KG term in the observation space appears because we are interested
in the error on x^. If x^=x then
KG=I and A=I, thus
ϵS=0 and our total error reverts
to ϵ,
ϵ_T^*|_{x^=x} = K(I − A)(x_a − x) + KGϵ = ϵ.
Additional consideration of aggregation error for a reduced-dimension
state vector xω yields a total error in the state space
ϵ_T = ϵ_S + G_ω ϵ + G_ω ϵ_A,
where
G_ω = (K_ω^T S_O^{-1} K_ω + S_{a,ω}^{-1})^{-1} K_ω^T S_O^{-1}
is the gain matrix for the reduced-dimension state vector. In the
observation space we get
ϵ_T^* = ϵ_S^* + K_ω G_ω ϵ + K_ω G_ω ϵ_A.
From these relationships we derive the total error covariance matrix
as
S_{T,ω} = (I − A_ω) S_{a,ω} (I − A_ω)^T  [smoothing error]
  + G_ω (K − K_ω Γ_ω) S_a (K − K_ω Γ_ω)^T G_ω^T  [aggregation error]
  + G_ω S_O G_ω^T  [observation error]
in the state space and
S_{T,ω}^* = K_ω (I − A_ω) S_{a,ω} (I − A_ω)^T K_ω^T  [smoothing error]
  + K_ω G_ω (K − K_ω Γ_ω) S_a (K − K_ω Γ_ω)^T G_ω^T K_ω^T  [aggregation error]
  + K_ω G_ω S_O G_ω^T K_ω^T  [observation error]
in the observation space. A bias term should exhibit a scale dependence
similar to that of the observation error term and could be included by
following a similar derivation.
Each of the three error terms above depends on state vector dimension.
Because the smoothing error increases with state vector dimension while the
aggregation error decreases, analysis of the error budget can potentially
point to the optimal dimension where the total error is minimum. It can also
point to the minimum state vector dimension needed for the aggregation error
to be below a certain tolerance, e.g., smaller than the observation error. We
give an example in Sect. .
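The error budget can be assembled term by term from the expressions above; scanning candidate state vector dimensions then amounts to comparing the mean error standard deviations of the three terms. A numpy sketch with our own naming and a toy restriction operator:

```python
import numpy as np

def total_error_terms(K, K_omega, Gamma_omega, S_a, S_a_omega, S_O):
    """The three state-space terms of S_{T,omega}: smoothing, aggregation,
    and observation error covariances for a reduced state vector."""
    S_O_inv = np.linalg.inv(S_O)
    S_hat = np.linalg.inv(K_omega.T @ S_O_inv @ K_omega
                          + np.linalg.inv(S_a_omega))
    G = S_hat @ K_omega.T @ S_O_inv                 # reduced-dimension gain
    A = G @ K_omega                                 # averaging kernel
    I = np.eye(A.shape[0])
    D = K - K_omega @ Gamma_omega
    smoothing = (I - A) @ S_a_omega @ (I - A).T
    aggregation = G @ D @ S_a @ D.T @ G.T
    observation = G @ S_O @ G.T
    return smoothing, aggregation, observation

def mean_error_std(S):
    """Mean error standard deviation: sqrt of the mean diagonal element."""
    return float(np.sqrt(np.diag(S).mean()))
```

The toy prolongation in the usage below (splitting each aggregate evenly across its members) is our own illustrative choice.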
A caveat in the above expressions for the aggregation and smoothing error
covariance matrices is that they are valid only if the prior
xa is the mean value x‾ for the pdf of
true states and if the error covariance matrix Sa is the
covariance matrix for that pdf (S_e = S_a). Detailed discussions of the
errors induced by failing to meet this assumption are available in the
literature. Since these assumptions define our prior, they can be taken as
valid for the purpose of selecting an appropriate state vector dimension in
an inverse problem. However, they should not be used to diagnose errors on
the inversion results.
Illustration of different approaches for aggregating a state
vector. Here the native-resolution state vector is a field of
gridded methane emissions at
1/2° × 2/3° resolution over
North America. Extreme reduction to eight state vector elements is
shown with individual elements distinguished by color.
Aggregation methods
Aggregation of state vector elements to reduce the state vector dimension
introduces aggregation error, as described in Sect. . The
aggregation error can be reduced by grouping elements with correlated errors.
Analyzing the off-diagonal structure of a precisely constructed prior error
correlation matrix would provide the best objective way to carry out the
aggregation, as described by , , and
. We generally lack such information but do have some qualitative
knowledge of prior error correlation that can be used to optimize the
aggregation. By aggregating regions that have correlated errors we can
exploit additional information that would otherwise be neglected in a
native-resolution inversion assuming (by default) uncorrelated errors.
Previous work used tiling and tree-based aggregation methods, hierarchical
clustering based on prior error patterns, and principal component analysis
(PCA) coupled to a hierarchical grid to compute an optimal grid. Here we
compare three
aggregation methods: (1) simple grid coarsening, (2) PCA clustering, and
(3) a Gaussian mixture model (GMM) with radial basis functions (RBFs) to
project native-resolution state vector elements to Gaussian pdfs.
A qualitative illustration of these methods is shown in
Fig. for the aggregation of a native-resolution state
vector of methane emissions with
1/2° × 2/3° native grid resolution over
North America. We focus here on spatial aggregation
and assume that the state vector has no temporal dimension. However, the same
methods can be used for temporal aggregation.
The simplest method for reducing the dimension of the state vector is to
merge adjacent elements, i.e., neighboring grid cells. This method considers
only spatial proximity as a source of error correlation. It may induce large
aggregation errors if proximal, but otherwise dissimilar regions are
aggregated together. In the case of methane emissions, aggregating
neighboring wetlands and farmland would induce large errors because different
processes drive methane emissions from these two source types.
The other two methods enable consideration of additional similarity factors
besides spatial proximity when aggregating state vector elements. These
similarity factors are expressed by vectors of dimension n describing
correlative properties of the original native-resolution state vector
elements. In the case of a methane source inversion, for example, we can
choose as similarity vectors latitude and longitude to account for spatial
proximity, but also wetland fraction to account for error correlations in the
bottom-up wetland emission estimate used as prior.
Similarity matrix for aggregation
Table lists the similarity vectors chosen for our example
problem of estimating methane emissions. The first
two vectors account for spatial proximity, the third represents the scaling
factors from the first iteration of an adjoint-based inversion at native
resolution, and the others are the source type patterns
from the bottom-up inventories used as prior. All similarity vectors are
normalized and then weighted by judgment of their importance. We choose here
to include initial scaling factors from the adjoint-based inversion because
we have them available and they can serve to correct any prior patterns that
are grossly inconsistent with the observations, or to identify local emission
hotspots missing from the prior. One iteration of the adjoint-based inversion
is computationally inexpensive and is sufficient to pick up major departures
from the prior.
Similarity vectors for inverting methane emissions in North
Americaa.
a The K=14 similarity vectors describe prior
error correlation criteria for the native-resolution state vector,
representing here the methane emission in North America at the
1/2° × 2/3° resolution of the GEOS-Chem
chemical transport model. The criteria are normalized and then weighted
(weighting factor). Criteria 4–14 are prior emission patterns used in the
GEOS-Chem model.
b The weighting factors (dimensionless) measure the estimated
relative importance of the different similarity criteria in determining prior
error correlations in the state vector. For the prior emission patterns these
weighting factors are the fractional contributions to total prior emissions
in North America. c Distance in kilometers from
the equator. d Distance in kilometers from the
prime meridian. e Initial scaling factors from
one iteration of an adjoint inversion at the native resolution.
Let c1,…,cK represent the K similarity
vectors chosen for the problem (K=14 in our example of
Table ). We assemble them into a n×K similarity
matrix C. We will also make use of the ensemble of similarity
vector values for individual state vector elements, which we assemble into
vectors {c1′,…,cn′} representing the
rows of C. Thus:
C = [ c_1  c_2  ⋯  c_K ] (viewed by columns) = [ c′_1; c′_2; ⋮; c′_n ] (viewed by rows).
In this work all of the aggregation methods
except for grid coarsening will use the same similarity matrix to construct
the restriction operator.
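Constructing C is then a matter of normalizing each similarity vector and scaling it by its weighting factor. The text does not specify the normalization, so the sketch below (names ours) assumes standardization to zero mean and unit variance:

```python
import numpy as np

def build_similarity_matrix(similarity_vectors, weights):
    """Assemble the n x K similarity matrix C. Each similarity vector is
    normalized (assumed here: zero mean, unit variance) and scaled by its
    weighting factor."""
    cols = []
    for v, w in zip(similarity_vectors, weights):
        v = np.asarray(v, dtype=float)
        cols.append(w * (v - v.mean()) / v.std())
    return np.column_stack(cols)
```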
This approach of using a similarity matrix C to account for prior
error covariances bears some resemblance to the geostatistical approach for
inverse modeling.
The geostatistical approach specifies the prior estimate as
xa=Cβ, where β is a vector
of unknown drift coefficients to be optimized as part of the inversion. Here
we use the similarity matrix to reduce the dimension of the state vector,
rather than just as a choice of prior constraints.
Clustering with principal component analysis
In this method we cluster state vector elements following the principal
components of the similarity matrix. It is generally not practical to derive
the principal components in state vector space because the n-dimension is
large. Instead we derive them in similarity space (dimension K) as the
eigenvectors of CTC sorted in order of importance by
their eigenvalues. The leading j principal components are kept for
clustering. The reduced state vector is then constructed by grouping state
vector elements that have the same sign patterns for all j principal
components. Each unique j-dimensional sign pattern constitutes a cluster.
The number of clusters defined in that way ranges between j and 2^j.
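A minimal sketch of this clustering (our own naming; we use the eigendecomposition of the K×K matrix C^T C rather than a full SVD):

```python
import numpy as np

def pca_sign_clusters(C, j):
    """Cluster native-resolution elements by the sign patterns of their
    projections onto the leading j principal components of C, obtained as
    eigenvectors of C^T C sorted by decreasing eigenvalue."""
    evals, evecs = np.linalg.eigh(C.T @ C)
    V = evecs[:, np.argsort(evals)[::-1][:j]]     # leading j eigenvectors
    signs = (C @ V) > 0                           # n x j sign patterns
    patterns, labels = np.unique(signs, axis=0, return_inverse=True)
    return labels, len(patterns)
```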
Figure b shows an example of applying this method to
methane emissions in North America with reduction of the state vector to p = 8 elements. The separation into four quadrants reflects the importance of latitude
and longitude as error correlation factors. The additional separation within
each quadrant isolates large from weak sources as defined by the prior.
Gaussian mixture model (GMM)
Here we use a Gaussian mixture model (GMM) to project the
native-resolution state vector onto p Gaussian pdfs using radial basis
functions (RBFs). Mixture models are probabilistic models for representing
a population comprising p subpopulations. Each subpopulation is assumed
to follow a pdf, in this case Gaussian. The Gaussians are K-dimensional
where K is the number of similarity criteria. Each native-resolution state
vector element is fit to this ensemble of Gaussians using RBFs as weighting
factors.
The first step in constructing the GMM is to define a p×n weighting
matrix W=[w1,w2,…,wp]T. Each element
wi,j of this weighting matrix is the relative probability for
native-resolution state vector element j to be described by Gaussian
subpopulation i; i.e., “how much does element j look like Gaussian
i?”. It is given by
w_{i,j} = π_i N(c′_j | μ_i, Λ_i) / Σ_{k=1}^{p} π_k N(c′_j | μ_k, Λ_k).
Here cj′ is the jth row of the similarity matrix
C, μi is a 1×K row vector of means for the
ith Gaussian, Λi is a K×K covariance matrix
for the ith Gaussian, and π = (π_1, …, π_p)^T contains the relative weights of the p Gaussians in the mixture.
N(c′_j | μ_i, Λ_i)
denotes the probability density of vector cj′ on the normal
distribution of Gaussian i. We define a p×K matrix
M with rows μi and a K×K×p
third-order tensor L=[Λ1,…,Λp] as the set of covariance matrices.
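The weighting matrix above can be sketched as follows, with the multivariate normal density written out explicitly (function names are ours):

```python
import numpy as np

def gaussian_density(c, mu, Lam):
    """Multivariate normal density N(c | mu, Lam)."""
    Kdim = len(mu)
    d = c - mu
    norm = np.sqrt((2.0 * np.pi) ** Kdim * np.linalg.det(Lam))
    return np.exp(-0.5 * d @ np.linalg.solve(Lam, d)) / norm

def weighting_matrix(C, pi, M, L):
    """p x n matrix W of RBF weights: w[i, j] is the relative probability
    that element j (row c'_j of C) is described by Gaussian i."""
    p, n = len(pi), C.shape[0]
    W = np.empty((p, n))
    for i in range(p):
        for j in range(n):
            W[i, j] = pi[i] * gaussian_density(C[j], M[i], L[i])
    return W / W.sum(axis=0, keepdims=True)   # normalize over the p Gaussians
```

By construction, each column of W sums to one: the weights distribute each native-resolution element over the p Gaussians.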
Projection of the native-resolution state vector onto the GMM involves four
unknowns: W, π, M, and
L. This is solved by constructing a cost function to
estimate the parameters of the Gaussians in the mixture model using maximum
likelihood:
J_GMM(C | π, M, L) = Σ_{j=1}^{n} ln Σ_{i=1}^{p} π_i N(c′_j | μ_i, Λ_i).
Starting from an initial guess for π, M,
and L we compute the weight matrix W using
Eq. (). We then differentiate the cost function with
respect to π, M, and L,
and set the derivative to zero to obtain
μ_i = Ψ_i Σ_{j=1}^{n} w_{i,j} c′_j,
Λ_i = Ψ_i Σ_{j=1}^{n} w_{i,j} (c′_j − μ_i)^T (c′_j − μ_i),
π_i = 1/(n Ψ_i),
where
Ψ_i = 1 / Σ_{j=1}^{n} w_{i,j}.
The weights are re-calculated from the updated guesses of W,
π, M, and L from
Eqs. () to (), and so on until
convergence. The final weights define the restriction operator as
Γ_ω = W. The computational complexity of
the expectation–maximization algorithm is O(nK + pn^2); however, the actual
runtime is largely dictated by the convergence criterion. Here we use an
absolute tolerance of τ < 10^{-10}, where
τ = Σ_i Σ_j |M_{i,j} − M_{i,j}^⋆| + Σ_i Σ_j Σ_k |L_{i,j,k} − L_{i,j,k}^⋆| + Σ_i |π_i − π_i^⋆|,
and the superscript star indicates the value from the previous iteration.
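A compact, generic EM implementation of these updates is sketched below. The deterministic initialization and the small covariance regularization are our own choices for numerical stability, not part of the scheme described here:

```python
import numpy as np

def fit_gmm(C, p, tol=1e-10, max_iter=500):
    """Generic EM sketch for fitting p Gaussians to the rows of C.
    Returns the converged weights W (used as the restriction operator
    Gamma_omega), along with pi, the means M, and the covariances L."""
    n, Kdim = C.shape
    # deterministic initialization: spread the initial means along the
    # first similarity coordinate (our choice, not from the text)
    order = np.argsort(C[:, 0])
    M = C[order[np.linspace(0, n - 1, p).astype(int)]].astype(float)
    L = np.array([np.eye(Kdim) for _ in range(p)])
    pi = np.full(p, 1.0 / p)
    for _ in range(max_iter):
        # E-step: responsibilities w[i, j], normalized over the p Gaussians
        W = np.empty((p, n))
        for i in range(p):
            d = C - M[i]
            quad = np.einsum("jk,kl,jl->j", d, np.linalg.inv(L[i]), d)
            W[i] = pi[i] * np.exp(-0.5 * quad) / np.sqrt(
                (2.0 * np.pi) ** Kdim * np.linalg.det(L[i]))
        W /= W.sum(axis=0, keepdims=True)
        # M-step: the update equations above, with 1/Psi_i = sum_j w[i, j]
        inv_psi = W.sum(axis=1)
        M_new = (W @ C) / inv_psi[:, None]
        for i in range(p):
            d = C - M_new[i]
            L[i] = (W[i, :, None] * d).T @ d / inv_psi[i] \
                + 1e-9 * np.eye(Kdim)  # regularization for stability
        pi = inv_psi / n
        converged = np.abs(M_new - M).sum() < tol
        M = M_new
        if converged:
            break
    return W, pi, M, L
```

For well-separated subpopulations the responsibilities converge to near-hard assignments, and the final W can be used directly as the restriction operator.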
Gaussian mixture model (GMM) representation of methane
emissions in Southern California with Gaussian pdfs as state vector
elements. The Gaussians are constructed from a similarity matrix
for methane emissions on the
1/2° × 2/3° horizontal
resolution of the GEOS-Chem CTM used as forward model for the
inversion. The figure shows the dominant three Gaussians for
Southern California with contours delineating the 0.5, 1.0, 1.5,
and 2.0σ spreads for the latitude–longitude dimensions.
The RBF weights w1, w2, and w3 of the
three Gaussians for each
1/2° × 2/3° grid square are
also shown along with their sum.
The GMM allows each native-resolution state vector element to be represented
by a unique linear combination of the Gaussians through the RBFs. For a state
vector of a given dimension, defined by the number of Gaussian pdfs, we can
achieve high resolution for large localized sources by sacrificing resolution
for weak or uniform source regions where resolution is not needed. This is
illustrated in Fig. with the resolution of Southern
California in an inversion of methane sources for North America. The figure
shows the three dominant Gaussians describing emissions in Southern
California and the corresponding RBF weights for each native-resolution grid
square. Gaussian 1 is centered over Los Angeles and is highly localized,
Gaussian 2 covers the Los Angeles Basin, and Gaussian 3 is a Southern
California background. The sum of these three Gaussians accounts for most of
the emissions in Southern California and Nevada (which is mostly background).
Additional Gaussians (not shown) resolve the southern San Joaquin Valley
(large livestock and oil/gas emissions) and Las Vegas (large emissions from
waste).
Application
We apply the aggregation methods described above to our example problem of
estimating methane emissions from satellite observations of methane
concentrations, focusing on selecting a reduced-dimension state vector that
minimizes aggregation and smoothing errors. The inversion is described in
detail elsewhere and uses GOSAT satellite observations for
2009–2011 over North America. The forward model for the inversion is the
GEOS-Chem CTM with 1/2° × 2/3° grid
resolution. The native-resolution state vector of methane emissions as
defined on that grid includes 7366 elements.
For the purpose of selecting an aggregated state vector for the inversion, we
consider a subset of observations for May 2010 (m=6070) so that we can
afford to construct the corresponding Jacobian matrix K at the
native resolution; this is necessary to derive the aggregation error
covariance matrix following Eq. (). The prior error
covariance matrix is specified as diagonal with 100 % uncertainty at the
native resolution, decreasing with aggregation following the central limit
theorem. The observational error covariance matrix is
also diagonal and specified as the scene-specific retrieval error, which
dominates the total observational error. We compare the three methods presented in
Sect. for aggregating the state vector in terms of the
implications for aggregation and smoothing errors for different state vector
dimensions. In addition to the GMM with RBFs, we also consider a “GMM
clustering” method where each native-resolution state vector element is
assigned exclusively to its dominant Gaussian pdf. This yields sharp
boundaries between clusters (Fig. ) as in the grid
coarsening and PCA methods.
Aggregation and smoothing error dependences on the
aggregation of state vector elements in an inverse model. The
application here is to an inversion of methane emissions over North
America using satellite methane data with 7366 native-resolution
state vector elements. Results are shown as the square roots of
the means of the diagonal terms (mean error standard deviation) in the aggregation
and smoothing error covariance matrices. Different methods for
aggregating the state vector (Sect. ) are shown as
separate lines. Note the log scale on the x axis.
Figure shows the mean error standard deviation in
the aggregation and smoothing error covariance matrices, computed as the
square root of the mean of the diagonal terms, as a function of state vector
dimension. The aggregation error is zero by definition at the native
resolution (7366 state vector elements), and increases as the number n of
state vector elements decreases, following a roughly n^{-0.7} dependence.
Conversely, the smoothing error increases as the number of state vector
elements increases, following roughly a log(n) dependence. The different
aggregation methods of Sect. yield very similar smoothing
errors, suggesting that any reasonable aggregation scheme (such as
k-means clustering) would perform comparably. The aggregation error is
somewhat lower with the GMM method.
RBF weighting performs slightly better than GMM clustering (sharp
boundaries). As discussed above, a major advantage of the GMM method is its
ability to retain resolution of large localized sources after aggregation.
Figure shows the sum of contributions from
aggregation, smoothing, and observational error standard deviations as a function of
state vector aggregation using the GMM with RBF weighting. In this
application, aggregation error dominates for small state vectors (n<100), but drops below the observation error for n>100 and below
the smoothing error for n>1000. The smoothing error remains
smaller than the observational error even at the native resolution (n=7366). The observational error is not independent of aggregation,
as shown in Eq. (), but we find here that
the dependence is small.
Total error budget from the aggregation of state vector
elements in an inverse model. The application here is to an
inversion of methane emissions over North America using satellite
methane data with 7366 native-resolution state vector
elements (Sect. and ).
Results are shown as the square roots of the means of the diagonal
terms (mean error standard deviation) in the aggregation, smoothing, and
observational error covariance matrices, and for the sum of these
matrices. Aggregation uses the GMM with RBF weighting
(Sect. ). There is an optimal state vector size for which the total
error is minimized, shown as the circle. Gray shading indicates the
90 % range for the total error on individual elements, as diagnosed
from the 5th and 95th percentiles of the diagonal elements in the
total error covariance matrix. Note the
log scale on the x axis.
From Fig. we can identify a state vector dimension for which the
total error is minimum (n = 2208; circle in Fig. ). However, the error
growth is small down to n ≈ 200, below which the aggregation error
grows rapidly. A state vector of 369
elements, as adopted by , does not incur significant
errors associated with aggregation or smoothing, and enables computation of
an analytical solution to the inverse problem with full error
characterization.
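This tradeoff can be illustrated numerically using the approximate scalings quoted above (aggregation error falling roughly as n^{-0.7}, smoothing error growing roughly as log(n), observational error nearly constant). The coefficients below are hypothetical, chosen only to produce an interior minimum; they are not the fitted values of the inversion:

```python
import numpy as np

# Hypothetical error scalings motivated by the text: aggregation error
# falls roughly as n^-0.7, smoothing error grows roughly as log(n),
# and the observational error is nearly independent of aggregation.
n = np.arange(10, 7367)
aggregation_err = 50.0 * n ** -0.7
smoothing_err = 0.5 * np.log(n)
observational_err = 3.0

total_err = aggregation_err + smoothing_err + observational_err
n_opt = int(n[np.argmin(total_err)])
print(n_opt)  # interior minimum balancing the two opposing error terms
```

Because the two error terms vary in opposite directions with n, the total error always has a minimum at some intermediate state vector dimension under these scalings.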
Previous work by , , ,
, and analyzed the scale dependence of
different grids using the degrees of freedom for signal,
DFS = Tr(I − S_{a,ω}^{-1} Ŝ_ω). These studies found this metric to
increase monotonically with state vector dimension, implying that the
native-resolution grid has the least total error and that there is no
optimal resolution, except from a numerical efficiency standpoint.
Here we find a local minimum that is seemingly at odds with this
previous work.
This local minimum arises because we have allowed the aggregation to
account for spatial error correlations that we are unable to specify
at the native resolution. We are thus taking more information into
account and obtain a minimum total error at a state vector size
smaller than the native resolution. If the native-resolution error
covariance matrices were correct then, as previous work showed, the
only reason to perform aggregation would be to reduce the
computational expense, and the grid used here would be suboptimal
because it does not depend on the native-resolution grid.
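For reference, the DFS diagnostic used in these studies can be evaluated as follows for a linear Gaussian inversion, where Ŝ is the posterior error covariance. The Jacobian and covariance matrices below are small illustrative stand-ins:

```python
import numpy as np

# DFS = Tr(I - S_a^{-1} S_hat), with S_hat the posterior error
# covariance of a linear Gaussian inversion. K, S_a, and S_o are
# hypothetical small matrices, not those of the actual inversion.
rng = np.random.default_rng(0)
n_state, n_obs = 5, 20
K = rng.normal(size=(n_obs, n_state))   # Jacobian of the forward model
S_a = np.eye(n_state)                   # prior error covariance
S_o = 0.5 * np.eye(n_obs)               # observational error covariance

S_hat = np.linalg.inv(K.T @ np.linalg.inv(S_o) @ K + np.linalg.inv(S_a))
dfs = np.trace(np.eye(n_state) - np.linalg.inv(S_a) @ S_hat)
print(dfs)  # between 0 and n_state; larger when observations constrain more
```

The DFS counts the number of independent pieces of information that the observations provide on the state vector, which is why it grows monotonically as state vector elements are added.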
Conclusions
We presented a method for optimizing the selection of the state vector
in the solution of the inverse problem for a given ensemble of
observations. The optimization involves minimizing the total error in
the inversion by balancing the aggregation error (which increases as
the state vector dimension decreases), the smoothing error (which
increases as the state vector dimension increases), and the
observational error. We further showed how one can reduce the state
vector dimension within the constraints from the aggregation error in
order to facilitate an analytical solution to the inverse problem with
full error characterization.
We explored different methods for aggregating state vector elements as
a means of reducing the dimension of the state vector. Aggregation
error can be minimized by grouping state vector elements with the
strongest correlated prior errors. We showed that a Gaussian mixture
model (GMM), where the state vector elements are multi-dimensional
Gaussian pdfs constructed from prior error correlation patterns, is
a powerful aggregation tool. Reduction of the state vector dimension
using the GMM retains fine-scale resolution of important features in
the native-resolution state vector while merging weak or uniform
features.
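The RBF weighting described above can be sketched as computing, for each native-resolution element, normalized Gaussian responsibilities with respect to the aggregated elements. The means and variances below stand in for a fitted GMM; all values are hypothetical:

```python
import numpy as np

# RBF-style aggregation weights: each native-resolution element (row of
# X) receives a normalized Gaussian weight for each aggregated element.
def rbf_weights(X, means, variances):
    """Normalized Gaussian responsibilities for isotropic components."""
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-0.5 * d2 / variances[None, :])
    return w / w.sum(axis=1, keepdims=True)

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])  # native elements
means = np.array([[0.0, 0.0], [5.0, 5.0]])          # 2 Gaussian centers
variances = np.array([1.0, 1.0])
W = rbf_weights(X, means, variances)
print(W)  # each row sums to 1; elements load mostly on the nearest Gaussian
```

Soft weights of this kind, in contrast to the sharp boundaries of hard clustering, let a native-resolution element contribute to several aggregated elements at once.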
Acknowledgements
For advice and discussions, we thank K. Wecht (Harvard
University). Special thanks to R. Parker and H. Boesch
(University of Leicester) for providing the GOSAT observations.
This work was supported by the NASA Carbon Monitoring System and by
a Department of Energy (DOE) Computational Science Graduate
Fellowship (CSGF) to A. J. Turner. We thank the Harvard SEAS Academic
Computing center for access to computing resources. We also thank
M. Bocquet and an anonymous reviewer for their thorough comments.
Edited by: R. Harley
References
Bishop, C. M.: Pattern Recognition and Machine Learning, Springer, 1st Edn., New York, 2007.
Bocquet, M.: Towards optimal choices of control space representation for geophysical data assimilation, Mon. Weather Rev., 137, 2331–2348, doi:10.1175/2009MWR2789.1, 2009.
Bocquet, M. and Wu, L.: Bayesian design of control space for optimal assimilation of observations. Part II: Asymptotic solution, Q. J. Roy. Meteor. Soc., 137, 1357–1368, doi:10.1002/qj.841, 2011.
Bocquet, M., Wu, L., and Chevallier, F.: Bayesian design of control space for optimal assimilation of observations. Part I: Consistent multiscale formalism, Q. J. Roy. Meteor. Soc., 137, 1340–1356, doi:10.1002/qj.837, 2011.
Bousserez, N., Henze, D. K., Perkins, A., Bowman, K. W., Lee, M., Liu, J., Deng, F., and Jones, D. B. A.: Improved analysis-error covariance matrix for high-dimensional variational inversions: application to source estimation using a 3D atmospheric transport model, Q. J. Roy. Meteor. Soc., doi:10.1002/qj.2495, online first, 2015.
Bousquet, P., Peylin, P., Ciais, P., Le Quere, C., Friedlingstein, P., and Tans, P. P.: Regional changes in carbon dioxide fluxes of land and oceans since 1980, Science, 290, 1342–1346, doi:10.1126/science.290.5495.1342, 2000.
Chen, Z., Haykin, S., Eggermont, J. J., and Becker, S.: Correlative Learning: A Basis for Brain and Adaptive Systems, John Wiley & Sons, 1st Edn., New York, 2007.
Chevallier, F., Breon, F. M., and Rayner, P. J.: Contribution of the Orbiting Carbon Observatory to the estimation of CO2 sources and sinks: theoretical study in a variational data assimilation framework, J. Geophys. Res.-Atmos., 112, D09307, doi:10.1029/2006JD007375, 2007.
Courtier, P., Thepaut, J., and Hollingsworth, A.: A strategy for operational implementation of 4D-Var, using an incremental approach, Q. J. Roy. Meteor. Soc., 120, 1367–1387, doi:10.1002/qj.49712051912, 1994.
Desroziers, G., Berre, L., Chapnik, B., and Poli, P.: Diagnosis of observation, background and analysis-error statistics in observation space, Q. J. Roy. Meteor. Soc., 131, 3385–3396, doi:10.1256/qj.05.108, 2005.
Gourdji, S. M., Mueller, K. L., Schaefer, K., and Michalak, A. M.: Global monthly averaged CO2 fluxes recovered using a geostatistical inverse modeling approach: 2. Results including auxiliary environmental data, J. Geophys. Res., 113, D21115, doi:10.1029/2007JD009733, 2008.
Henze, D. K., Hakami, A., and Seinfeld, J. H.: Development of the adjoint of GEOS-Chem, Atmos. Chem. Phys., 7, 2413–2433, doi:10.5194/acp-7-2413-2007, 2007.
Kaminski, T. and Heimann, M.: Inverse modeling of atmospheric carbon dioxide fluxes, Science, 294, 259, doi:10.1126/science.294.5541.259a, 2001.
Kaminski, T., Rayner, P. J., Heimann, M., and Enting, I. G.: On aggregation errors in atmospheric transport inversions, J. Geophys. Res., 106, 4703, doi:10.1029/2000JD900581, 2001.
Koohkan, M. R., Bocquet, M., Wu, L., and Krysta, M.: Potential of the International Monitoring System radionuclide network for inverse modelling, Atmos. Environ., 54, 557–567, doi:10.1016/j.atmosenv.2012.02.044, 2012.
Michalak, A. M., Bruhwiler, L., and Tans, P. P.: A geostatistical approach to surface flux estimation of atmospheric trace gases, J. Geophys. Res., 109, D14109, doi:10.1029/2003JD004422, 2004.
Michalak, A. M., Hirsch, A., Bruhwiler, L., Gurney, K. R., Peters, W., and Tans, P. P.: Maximum likelihood estimation of covariance parameters for Bayesian atmospheric trace gas surface flux inversions, J. Geophys. Res., 110, D24107, doi:10.1029/2005JD005970, 2005.
Miller, S. M., Kort, E. A., Hirsch, A. I., Dlugokencky, E. J., Andrews, A. E., Xu, X., Tian, H., Nehrkorn, T., Eluszkiewicz, J., Michalak, A. M., and Wofsy, S. C.: Regional sources of nitrous oxide over the United States: seasonal variation and spatial distribution, J. Geophys. Res., 117, D06310, doi:10.1029/2011JD016951, 2012.
Parker, R., Boesch, H., Cogan, A., Fraser, A., Feng, L., Palmer, P. I., Messerschmidt, J., Deutscher, N., Griffith, D. W. T., Notholt, J., Wennberg, P. O., and Wunch, D.: Methane observations from the Greenhouse Gases Observing SATellite: comparison to ground-based TCCON data and model calculations, Geophys. Res. Lett., 38, L15807, doi:10.1029/2011GL047871, 2011.
Rodgers, C. D.: Inverse Methods for Atmospheric Sounding, World Scientific, Singapore, 2000.
Schuh, A. E., Denning, A. S., Uliasz, M., and Corbin, K. D.: Seeing the forest through the trees: recovering large-scale carbon flux biases in the midst of small-scale variability, J. Geophys. Res., 114, G03007, doi:10.1029/2008JG000842, 2009.
Turner, A. J., Jacob, D. J., Wecht, K. J., Maasakkers, J. D., Lundgren, E., Andrews, A. E., Biraud, S. C., Boesch, H., Bowman, K. W., Deutscher, N. M., Dubey, M. K., Griffith, D. W. T., Hase, F., Kuze, A., Notholt, J., Ohyama, H., Parker, R., Payne, V. H., Sussmann, R., Sweeney, C., Velazco, V. A., Warneke, T., Wennberg, P. O., and Wunch, D.: Estimating global and North American methane emissions with high spatial resolution using GOSAT satellite data, Atmos. Chem. Phys., 15, 7049–7069, doi:10.5194/acp-15-7049-2015, 2015.
von Clarmann, T.: Smoothing error pitfalls, Atmos. Meas. Tech., 7, 3023–3034, doi:10.5194/amt-7-3023-2014, 2014.
Wecht, K. J., Jacob, D. J., Frankenberg, C., Jiang, Z., and Blake, D. R.: Mapping of North American methane emissions with high spatial resolution by inversion of SCIAMACHY satellite data, J. Geophys. Res.-Atmos., 119, 7741–7756, doi:10.1002/2014JD021551, 2014.
Wu, L., Bocquet, M., Lauvaux, T., Chevallier, F., Rayner, P., and Davis, K.: Optimal representation of source-sink fluxes for mesoscale carbon dioxide inversion with synthetic data, J. Geophys. Res., 116, D21304, doi:10.1029/2011JD016198, 2011.