Direct inversion of circulation and mixing from tracer measurements – Part 1: Method

From a series of zonal mean global stratospheric tracer measurements sampled in altitude vs. latitude, circulation and mixing patterns are inferred by the inverse solution of the continuity equation. As a first step, the continuity equation is written as a tendency equation, which is numerically integrated over time to predict a later atmospheric state, i.e., mixing ratio and air density. The integration is formally performed by the multiplication of the initially measured atmospheric state vector by a linear prediction operator. Further, the derivative of the predicted atmospheric state with respect to the wind vector components and mixing coefficients is used to find the most likely wind vector components and mixing coefficients which minimize the residual between the predicted atmospheric state and the later measurement of the atmospheric state. Unless multiple tracers are used, this inversion problem is under-determined, and dispersive behavior of the prediction further destabilizes the inversion. Both these problems are addressed by regularization. For this purpose, a first-order smoothness constraint has been chosen. The usefulness of this method is demonstrated by application to various tracer measurements recorded with the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS). This method aims at a diagnosis of the Brewer–Dobson circulation without involving the concept of the mean age of stratospheric air, and related problems like the stratospheric tape recorder, or intrusions of mesospheric air into the stratosphere.


Introduction
In the context of climate change, possible changes in the intensity of the Brewer-Dobson circulation have become an important research topic.Climate models predict an intensification of the Brewer-Dobson circulation (Butchart et al., 2006).Engel et al. (2009), however, found a weakly significant slow increase in the mean age of stratospheric air.The latter is defined as the mean time lag between the date of the transition of tropospheric air into the stratosphere and the date when the mixing ratio of a monotonically growing tracer was measured in the air volume under investigation, and its increase suggests a deceleration of the Brewer-Dobson circulation.These measurements have been challenged as not being representative (Garcia et al., 2011), and global mean age of air measurements by Stiller et al. (2012) suggest that the true picture is not that one-dimensional.Instead, stratospheric age trends vary with altitude and with latitude.The determination of the age of air and its use as a diagnostic of the intensity of the Brewer-Dobson circulation, however, has its own limitations.First, due to mixing processes, the age of a stratospheric air volume is not unique but characterized by an age spectrum, which has to be considered since the tropospheric growth of SF 6 mixing ratios is not strictly linear; some ad hoc assumptions on this spectrum have to be made (Waugh and Hall, 2002).These include the adequacy of the Wald (inverse Gaussian) function for the representation of the age spectrum and the choice of its width parameter.Second, the most suited age tracer, SF 6 , which has significant and monotonic growth rates in the troposphere, is not fully inert: it has a mesospheric sink (Hall and Waugh, 1998;Reddmann et al., 2001) and introduces some age uncertainty when mesospheric air subsides into the stratosphere in the polar winter vortex (Stiller et al., 2008).Third, the de-Published by Copernicus Publications on behalf of the European Geosciences Union.
T. von Clarmann and U. Grabowski: Circulation and mixing from tracer measurements termination of the mean age relies upon a reference air mass where the age, by definition, is 0. When the age of air concept was introduced, the reference was simply the troposphere, which is well mixed and thus avoids any related complication (Solomon, 1990;Schmidt and Khedim, 1991).Since the age of air has become a model diagnostic, parts of the community have established the stratospheric entry point as a reference (Hall and Plumb, 1994), which makes a difference due to the slow ascent of air through the tropical tropopause layer (Fueglistaler et al., 2009).For model validation, however, this redefined age of air is of limited use because no long measured time series of tracer mixing ratios are available there.
Facing these difficulties, it is desirable to infer the atmospheric circulation directly from tracer measurements, without going back to the age of air concept.Multiple approaches have been developed to infer wind fields from measured atmospheric state variables.Sequential data assimilation and, in its optimal form, the extended Kalman filter approach (e.g., Ghil and Malanotte-Rizzoli, 1991;Ghil, 1997), calculates the optimal average of the forecasted meteorological variables for the time of the observation and the observed meteorological variables themselves and uses this average to initialize the next forecast step.The wind field is calculated by a dynamical model.This method involves the generalized inversion of the observation operator where the forecast is used as a constraint.In contrast, so-called1 variational data assimilation minimizes the residual between the forecasted and the measured atmospheric state variables by optimally adjusting the initialization of the forecast via inversion of an adjoint forecast model, constrained by some background state (Thompson, 1969).Both approaches rely on dynamical models2 and are suited to infer the most probable atmospheric state variables rather than the wind field, which is a by-product of the assimilation.The wind field or atmospheric circulation can also be inferred directly by kinematic methods from tracer measurements.Such methods rely solely on the continuity equation, do not involve a dynamic model, and thus do not depend on any ad hoc parameterization of effects which are either not resolved by the discrete model, computationally too expensive for explicit modeling, or simply not well understood.While this work is targeted primarily at an assessment of the Brewer-Dobson circulation, its applicability is much wider and includes stratospheric-tropospheric exchange, the mesospheric overturning circulation, and other areas.Early approaches to infer the circulation from tracer measurements include Holton and Choi (1988) as well as Salby and Juckes (1994) who used approaches which share several ideas with ours.The direct inversion of wind speeds from tracer measurements in a volcano plume has, e.g., been suggested by Krueger et al. (2013), however without consideration of mixing.The continuity equation including diffusion terms has been exploited by Wofsy et al. (1994) for the assessment of diffusion of stratospheric aircraft exhaust.
In this paper we present a method to infer two-dimensional (latitude-altitude) circulation and mixing coefficients from subsequent measurements of inert tracers.The application of this method, i.e., the inference of the Brewer-Dobson circulation from global SF 6 distributions (Stiller et al., 2008(Stiller et al., , 2012) ) measured with the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS), is presented in a companion paper.In order to avoid the reader not seeing the forest for the trees, we give a short overview of our method in Sect. 2. The prediction of pressure and tracer mixing ratio fields on the basis of the continuity equation and related error estimation is described in Sect.3. The estimation of circulation and mixing coefficients by the inversion of the continuity equation is presented in Sect. 4. In Sect.5, the applicability of the method and the need for further refinements is discussed critically.The benefits of the method are discussed in Sect.6.The paper concludes with recommendations on how these results should be used and with an outlook for future work (Sect.7).Changes in the Brewer-Dobson circulation during 2002-2012, i.e., the MIPAS mission, are currently investigated by means of this method and will be published in a subsequent paper.

General concept
Knowing the initial state of the atmosphere in terms of mixing ratio and air density distributions, wind speed, and mixing coefficients at each grid point, a future atmospheric state can be predicted with respect to the distribution of any inert tracer.This procedure we call the forward problem.If no ideal tracers are available, source and sink terms of related species have to be included in the forward model.The goal of this work is to invert the forward model in order to infer the circulation and mixing coefficients from tracer measurements by the minimization of the residual between the predicted and measured atmospheric state.This approach is complementary to free-running climate models because it makes no assumptions about atmospheric dynamics except for the validity of the continuity equation.It is further considered more robust than age-of-air analysis (Stiller et al., 2012) because it does not depend on a reference point where the age is assumed 0, nor does it require knowledge of the history of an air parcel.
Our concept involves the following operations.First, a general solution of the forward problem is formulated (Sect.3).The forward problem is the solution of the prediction equation as a function of the initial atmospheric state for given winds and mixing coefficients.For our chosen solver, which involves the MacCormack (1969) integration scheme for the solution of the transport problem (Eqs.5-10), the relevant dependencies of the final state on the initial state are reported in .The formulation in matrix notation (Eqs.27-28) allows the easy treatment of multiple successive time steps (Eq.29) and an easy estimation of the prediction error via generalized Gaussian error estimation (Eq.30).As a next step, the dependence of the predicted state on winds and mixing coefficients is estimated for a given initial state.This is achieved by the differentiation of the solution of the prediction equation with respect to winds and mixing coefficients .These partial derivatives form the Jacobian matrix of the problem, with which the estimation of winds and mixing coefficients can be reduced to a constrained least squares optimization problem where the inversely variance-weighted residual between the predicted atmospheric state and the respective measured atmospheric state is minimized.The latter step involves the generalized inverse of the Jacobian matrix (Eqs.78-90).

The forward problem
The forward model reads the measured atmospheric state at time t and predicts the atmospheric state (number density of air, c, and volume mixing ratios, vmr) at time t + t for given wind vectors and mixing coefficients representing the time interval [t; t + t] by solving the continuity equation.The continuity equation allows us to calculate the local tendencies of the number densities and volume mixing ratios.These local tendencies ∂ρ ∂t and ∂vmr ∂t are then integrated over time to give the new number densities and mixing ratios.

The continuity equation
The local change in number density ρ of air is in spherical coordinates (for all auxiliary calculations, see Supplement 1): Here the shallowness approximation (Hinkelmann, 1951;Phillips, 1966, quoted after Kasahara, 1977), which simplifies the equations using the assumption that z is much smaller than r E and which is, often implicitly, used in the usual textbooks on atmospheric sciences (e.g., Brasseur and Solomon, 2005, their Eq. 3.46a), is intentionally not used for reasons which will become clear in Sect.3.2.The local change in the volume mixing ratio of gas g can be calculated from known velocities and mixing coefficients as well as source/sink terms as where vmr g is the volume mixing ratio of species g, K λ is the zonal diffusion coefficient, K φ is the meridional diffusion coefficient, K z is the vertical diffusion coefficient, and S g is the production/loss rate of species g in terms of number density over time (e.g., Brasseur andSolomon, 2005, Eq. 3.46b, andJones et al., 2007).
Since we are only interested in a two-dimensional representation of the atmosphere in altitude and latitude coordinates, zonal advection and mixing terms are ignored in Eqs. ( 1)-(2).In this two-dimensional representation, all atmospheric state variables represent zonal mean values.The kinematic variables, viz. the velocities and mixing coefficients, have to be reinterpreted because they do not represent merely the zonally averaged velocities and mixing coefficients.Instead, they include also eddy transport and diffusion terms, and their interpretation is less unique than one might hope because it depends on the definition of the kinematic variables and the approximations used (see Appendix A for details of the interpretation of the kinematic quantities).The local change in number density ρ of air in a two-dimensional atmosphere thus is and the local change in vmr g is calculated as

Integration of tendencies
The integration of Eqs.(3)-( 4) is performed numerically for time steps of t p .For practical reasons, processes (advection, diffusion, sinks) are split; i.e., the tendencies due to these three classes of processes are integrated independently.The time steps t p used for the integration are chosen to be smaller than the time difference t between two measurements, in order not to clash with the Courant limit (Courant et al., 1952).In the following, we call t p a "micro time increment" and t "macro time increment".The atmospheric state after a macro time increment is predicted by successive prediction over the micro time increment.In the following, index i designates time t, i + 1 designates the time t + t p , etc., and I designates the time after the final micro time increment, i.e., the next macro time increment.
For the discrete integration of the advection part of the tendencies, the MacCormack (1969) method is used in a generalized multidimensional version similar to the one described by (Perrin and Hu, 2006).This is a predictor-corrector method.For a general state variable c(t, x, y) = c i (x, y) at location (x, y), and time t with e(c) and f (c) being functions of c, an equation of the form is solved by preliminary predictions of the state variable as a first step: These are then used in a subsequent correction step which gives the final prediction Application to the continuity equation in spherical coordinates requires the reformulation of Eq. (3) (see, e.g., Chang and St.-Maurice, 1991): The predictor of r 2 ρ cos(φ) is then calculated as and the corrected prediction for ρ then gives For respectively.
Atmos.Chem.Phys., 16, 14563-14584, 2016 www.atmos-chem-phys.net/16/14563/2016/Sinks of the species considered here are treated as unimolecular processes (see, e.g., Brasseur and Solomon, 2005, their Eq. 2.27d) and integrated as where k g is the sink strength of the gas g.The abundance of gas g after time step t p is then simply the sum of the increments due to horizontal and vertical advection, diffusion, and chemical losses.
Admittedly, there exist more elaborate advection schemes than the one used here.However, the need to provide the Jacobians needed in Sects.3.3-4 justifies a reasonable amount of simplicity.Further, numerical errors cannot easily accumulate because after each time step t, the system is reinitialized with measured data.
Since we do not have a closed system but have mass exchange and mixing with the atmosphere below the lowermost model altitude and above the uppermost altitude, the atmospheric state is not predicted for the lowermost and uppermost altitudes.Prediction is only possible from the second altitude from the bottom up to the altitude below the uppermost one.Henceforth, we call this restricted altitude range the "nominal altitude range".Instead, the atmospheric state of the uppermost and lowermost altitude is estimated by the linear interpolation of measured values at times t and t + t and used as a boundary condition for prediction within the nominal altitude range.Alternatively, derivatives at the border can be approximated by asymmetric difference quotients.
We use the following convention.Atmospheric state variables are sampled on a regular latitude-altitude grid.For some grid points, no valid measurements may be available but we assume that, for each state variable, we have a contiguous subset of this grid with valid measurements.For state variable g, we have a total of J g valid measurements within the "nominal altitude range", each denoted by index j .A state variable in this context is either air density ρ (g = 0) or the mixing ratio of one species vmr g .The nominal altitude range at latitude φ is the altitude range where, for each grid point, a valid measurement is available at the grid point itself and for its northern, southern, upper, lower, and diagonal neighbors.The use of asymmetric difference quotients can be emulated by extrapolation, generating artificial border values, which guarantee that each grid point within the nominal altitude and latitude range has all required neighbors.The availability of neighbor values is necessary to allow the calculation of numerical derivatives of the state variable with respect to latitude and altitude.We therefore have K g border elements of each quantity g, each denoted by index k.This implies, for each state variable g, a total of L g = J g + K g grid points with indices l.

Integration in operator notation
For further steps (error propagation and the solution of the inverse problem), it is convenient to rewrite the prediction of air density and mixing ratios in matrix notation.For this purpose, we differentiate the predicted air densities (Eq.10) and mixing ratios (Eqs.11-13) with respect to air density and mixing ratios of the gases under assessment at all relevant locations.The sensitivities of the densities of the first predictive step with respect to the initial densities at the same latitude and altitude are We further differentiate predicted air densities with respect to air densities at the adjacent southern latitude but the same altitude.
The derivative of the predicted air densities with respect to air densities at the adjacent northern latitude but the same altitude is As a next step we differentiate predicted air densities with respect to the initial air densities at the next higher altitude but the same latitude.
The derivative of the predicted air densities with respect to the initial air densities at the next lower altitude but the same latitude is www.atmos-chem-phys.net/16/14563/2016/T. von Clarmann and U. Grabowski: Circulation and mixing from tracer measurements Finally we differentiate the predicted air densities with respect to the initial air densities at the adjacent southern latitude and higher altitude, and vice versa where i is the index of the time increment and where φ ± φ and z ± z refer to the adjacent model grid points in latitude and altitude, respectively.For mixing ratios, the respective derivatives are where Loss(month, φ, z) is the relative loss rate in the respective month at latitude φ and altitude z.These derivatives are simplifications in the sense that they do not consider the full chemical Jacobian but assume instead that the source strength depends on no other concentration than the actual concentration of the same species.For the typical long-lived tropospheric source gases considered here, like SF 6 or CFCs, this assumption is appropriate.Pre-tabulated loss rates are used which have been calculated by locally integrating loss rates over an entire month at a time resolution adequate to resolve the diurnal cycle.From the monthly losses, the Loss(month, φ, z) values, which are the contribution of losses to the partial derivatives of the local mixing ratios with respect to the initial local mixing ratios, are calculated as the secant of the local decay curve.
With these expressions, the prediction of air density and volume mixing ratio can be rewritten in matrix notation for a single micro time increment.This notation simplifies the estimation of the uncertainties of the predicted atmospheric state and the inversion of the prediction equation.In matrix notation, the prediction reads where D ρ;i is the (L 0 +K 0 )×(L 0 +K 0 ) Jacobian matrix of air density for time increment i, i.e., the sensitivities of the prediction with respect to the initial state, ∂c i+1,m ∂c i,n (here m and n run over the model grid points); D ρ,nom is a J 0 × L 0 Jacobian containing the partial derivatives ∂ρ i+1;j /∂ρ i;l , applied to the nominal altitude range; ρ I ;k=1,K 0 is the K 0 -dimensional vector of air densities in the border region after the final time step, i.e., for the time of the next measurement; ρ i;k=1,K 0 is the K 0 -dimensional vector of air densities in the border region at the current time step as resulting from interpolation in time; and ρ i;j =K 0 +1,L 0 is the J 0 -dimensional vector of air densities in the nominal region at the current time step as resulting from integration according to the MacCormack scheme as described above.
The operation of these sub-matrices is illustrated in the uppermost three (violet, green, blue) blocks of Fig. 1.
Since the source term depends on air density, the integration in matrix notation for vmr requires simultaneous treatment of vmr and air density, and we get the following, using notation accordant with air density: where D i is the total Jacobian with respect to air densities and all involved gas mixing ratios, and where g runs over all gases.Note that 1. the Jacobian D i is time-dependent because it includes sub-matrices controlling the interpolation between the initial time and the end time.In the case of vmr, a further time dependence is introduced by the time-dependent source function.
2. the first "row" of the Jacobian matrix includes identity I K because the prediction is not supposed to change the measured ρ I and vmr I at the end of the macro time increment.This value is used to construct the boundary condition.Row is here written in quotes because the elements of this "row" are matrices in themselves.The introduction of unity Jacobian elements is necessary because Eqs. ( 27)-( 28) are autonomized, originally nonautonomous systems of differential equations.
3. W is used to interpolate the boundary state between the initial time of the micro time interval, t +(i−1) t p , and the time at the end of the time interval t + t to give the atmospheric state at the border region at time t + i t p .
4. the Jacobian sub-matrices D ρ,nom and D g,nom are used to predict the atmospheric state in the nominal range after one further micro time increment from the atmospheric state at the current time and the boundary condition.Its elements are described in Eqs. ( 15)-( 21) and ( 22)-( 26).The part of D g,nom which refers to the border mixing ratios (vmr g;I,k=1,K g ) is zero.The dimension of 5. no simple mapping mechanism between the field of atmospheric state variables sampled at latitudes and altitudes and the vectors ρ and vmr g is provided because the fields are irregular in the sense that the number of relevant altitudes is latitude-dependent.Pointer variables have to be used instead.
The matrix structure is exemplified in Fig. 1.For the macro time increment t, we get

Prediction errors
Let S 0 be the L × L covariance matrix describing the uncertainties of all involved measurements ρ 0 and vmr 0 , with diagonal elements s 0;l,l = σ 2 0;l,l and L = gases 0 L g .We assume that these measurement errors in the state variables used for the prediction are the only relevant error sources.With S 0 and 1 i=I D i available, generalized Gaussian error propagation for ρ I and vmr I can be easily formulated as Even if S 0 is diagonal, i.e., the initial errors are assumed to be uncorrelated, error propagation through the forward model will generate non-zero error covariances in S I representing the atmospheric state at time t + t. S I will be needed in the inversion of circulation and mixing coefficients described in Sect. 4.  .
Matrix structure of the right-hand part of Eq. ( 28).Colors indicate which matrix blocks operate on which part of the input vector.
An example with two gases is shown.

A note on finite-resolution measurements
The measurements used are not a perfect image of the true atmospheric state but contain some prior information.
In the case of the data provided by the Institute of Meteorology and Climate Research (IMK), a priori profiles are usually set to zero, and the constraint is built with a Tikhonov-type first-order finite differences smoothing constraint (see von Clarmann et al., 2009).That means that, besides the mapping of measurement and parameter errors, the only distortion of the truth via the retrieval is reduced altitude resolution; no other effect of the prior information is to be considered.Usually, any comparison between modeled and measured fields requires application of the averaging kernels of the retrieval to the model data in order to account for the smoothing by the constraint of the retrieval (assuming that the model grid is much finer than the resolution of the retrieval).
In our case, the situation is different.The model is initialized with measurements of reduced altitude resolution, and the fields predicted by the model are then compared to measurements of the same altitude resolution.It is fair to assume that the model does not dramatically change the altitude resolution of the profiles, and thus comparable quantities are compared when the residuals between predicted and measured atmospheric state are evaluated.

A note on numerical mixing
Let the initial mixing ratio field be homogeneous except one point with delta-type excess mixing ratio.Assume further a homogeneous velocity field and zero mixing coefficients.If the velocity is such that the position of the excess mixing ratio is displaced during t by a distance which is not equal to an integer multiple of the grid width, then the resulting distribution will no longer be a delta-type distribution but will be smoothed.We refer to the widening of the delta peak as numerical mixing.The MacCormack transport scheme is designed to fight this diffusivity but some higher-order effects may still survive.One might think that, during the inversion, the widening is misinterpreted as mixing, leading to mixing coefficients that are too large.
Again, in our case, the situation is different.The widening does not accumulate over the t p time steps because we first calculate the operator 1 i=I D i , which is applied only once to the initial field, which avoids the accumulation of numerical mixing over time steps.Still one widening process as described above can occur, when the forward model leads to a position of the new peak which cannot be represented in the grid chosen.However, since the gas distributions vmr g;I at the end of time step t are sampled on the same grid, the maximum in the real atmosphere would be widened in the same way and there would be no residual the inversion would try to get rid of by increasing the mixing coefficients.And the next time step t is initialized again with measured data, which also excludes the accumulation of numerical mixing effects.
These considerations aside, there are other numerical artifacts.These are related to the numerical evaluation of partial derivatives of the state variables in our transport scheme chosen.Particularly in the case of delta functions in the state variable field, these cause side wiggles behind and smearing in front of the transported structure.To keep these artifacts small, it is necessary to set the spatial grid fine enough that every structure is represented by multiple grid points.

The inverse problem
For convenience, we combine the variables of the initial atmospheric state and the predicted state at the end of the macro time interval, respectively, into the vectors and The related subsets of x 0 and x I which contain only state variables in the nominal altitude range but not those in the border region are x 0 = (x 0;1 . ..x 0;J ) T and x I = (x I ;1 . ..xI ;J ) T , respectively.The reason why the distinction between x and x is made is that, contrary to the prediction step, for the inversion vector elements related to the interpolation of values in the border region are no longer needed.Further, we combine the fields of meridional and vertical wind components and mixing coefficients into the vector and assume constant velocities and mixing coefficients during the macro time step.To infer circulation patterns and mixing coefficients from the measurements of air densities and mixing ratios, the Jacobian matrix F, is needed, where N = 4J , where J = gases 0 J g , because there are four unknown quantities, v j , w j , K φ;j , and K z;j , at each grid point of the nominal region where these variables shall be inferred.The elements of F are calculated from Eq. ( 28) by application of the product rule: where the tilde symbol in f n indicates that the vectors resulting from Eq. ( 35) still include the border elements which have to be discarded to obtain f n .The quantity f n is more efficiently computed using the following recursive scheme, where f l,i is the respective column of the Jacobian after micro time step i: With the argument of D specifying the column of the D matrix such that D c,i (φ, z) relates ρ i+1 (φ, z) to ρ i (φ, z), D ρ,i (φ ± φ, z) relates ρ i+1 (φ, z) to ρ i (φ ± φ, z), and D ρ,i (φ, z ± z) relates ρ i+1 (φ, z) to ρ i (φ, z ± z), and for vmr accordingly, the entries of D i relevant to v are www.atmos-chem-phys.net/16/14563/2016/T. von Clarmann and U. Grabowski: Circulation and mixing from tracer measurements ∂ ∂vmr i+1 (φ,z) Entries not mentioned here are zero.Entries relevant to w are ∂ ∂vmr i+1 (φ,z) ∂ ∂vmr i+1 (φ,z) Entries relevant to K φ are ∂ ∂vmr i+1 (φ,z) ∂ ∂vmr i+1 (φ,z)   ∂vmr i (φ+ φ,z) ∂ ∂vmr i+1 (φ+ φ,z) ∂ ∂vmr i+1 (φ− φ,z) And finally, entries relevant to K z are With these derivatives we linearize the prediction with respect to wind and mixing coefficients; i.e., we linearly predict the new atmospheric state for a given initial state q 0 as a function of wind and mixing ratios.
This formulation gives access to the winds and mixing ratios via inversion of F.
Assuming linearity and Gaussian statistics, the most likely set q of winds and mixing ratios during the macro time interval minimizes the following cost function: where x m;I is the measured state at the end of the macro time step and S r is the error covariance matrix of the residual, which is, under the assumption that prediction error and measurement errors are uncorrelated, the sum of the prediction covariance matrix and the measurement covariance matrix, both after the macro time step: where S p is a J × J matrix containing those elements of S I which are relevant to x I .S m,I is the measurement error covariance matrix of the atmospheric state after the macro time step.The minimization of the cost function gives the following estimate q of winds and mixing coefficients: The matrix F T S −1 r F can be singular either because the related system of equations is under-determined or ill-posed due to nearly linearly dependent equations.Singularity is addressed by adding the following constraint term to the cost function of Eq. ( 78): T. von Clarmann and U. Grabowski: Circulation and mixing from tracer measurements χ 2 con = q − q a T R q − q a (81) where q a is some prior assumption on velocities and mixing coefficients.R is a J × J regularization matrix of which the choice is discussed below.From this, the constrained estimate of velocities and mixing coefficients can be inferred: An equivalent formulation, which is more efficient if the dimension of q is larger than that of x (under-determined problem) but which requires a non-singular regularization matrix and does not give easy access to diagnostics (see below), is (Rodgers, 2000) The covariance matrix characterizing the uncertainty of estimated winds and mixing coefficients is and the estimated winds and mixing coefficients are related to the true ones as which is unity in the case of the unconstrained estimation of q.In the case of Newtonian iteration, Eqs. ( 85)-( 86) are evaluated using the Jacobian F valid at the solution.
Due to the concentration dependence of the source function and the q dependence of F, Eq. ( 29) is valid only in linear approximation.This is helped by putting the inversion in the context of a Newtonian iteration (see, e.g., Rodgers, 2000, p. 85).Equation ( 80) becomes where it is the iteration index.Equation ( 83) becomes or alternatively and Eq. ( 84) becomes x m;I − x I,it + F it (q it − q a ) .
With q a = 0 and diagonal R = γ I we get the smallest possible velocities and mixing coefficients still consistent with the measurement, where tuning parameter γ will be set depending on how large fit residuals may be for the user to still consider them to be "consistent".With R being diagonally blockwise composed of squared and scaled first-order finite difference operators and q a = 0, smooth fields of wind vectors and mixing coefficients can be enforced.Setting q a the result of the previous macro time step corresponds to sequential data assimilation.In this application R is set to the reciprocal uncertainty of q a plus some margin to allow for variability of velocity and mixing coefficients in time.And finally, if prior knowledge is formed by independent measurements and their reciprocal uncertainties as a constraint matrix, or within the debatable framework of Bayesian statistics, estimates q would even be the most probable estimate of velocities and mixing ratios.

Prediction of the atmospheric state
In a first step we test the predictive power of the formalism defined by Eqs. ( 3)-( 29).Since the formalism itself is deductive and starts from a well-established theoretical concept, the purpose of the test is solely to verify that the implementation of the formalism is correct and that involved numerical approximations are adequate.As a consequence of the Bonini paradox (see Bonini, 1963, andStarbuck, 1975), a model is the harder to understand the more complex it is.While the predictive power of a model usually increases with complexity, this does not necessarily hold for its explanatory power.Thus, we have decided to test our model on the basis of very simple test cases, where major failure of the model is immediately obvious.Four test cases have been chosen, each dedicated to one kinematic variable (v, w, K φ and K z ), while the other three were set to zero.
In the first case, v was set to 1/10 of the Courant limit (Courant et al., 1928) (about 0.17/ cos(φ) m s −1 ) everywhere.As one would expect from the continuity equation applied to a spherical atmosphere, no changes in air density except boundary effects at the poles were observed, and structures near the equator were transported by about 4 • within a month, as expected from the equation of motion.A Gaussianshaped perturbation of a half width of one latitudinal grid width (4 • ) causes an upwind wiggle of less than 0.7 % of the amplitude of the perturbation at a meridional velocity of one grid point per month.There is no discernable change in the width of the transported structure.Similarly, for the second case a constant field of w of 1.1 × 10 −3 m s −1 lifts a structure upwards by about 3 km per month.Mixing coefficients were verified to smear out structures in the respective direc-tion while leaving air density and structures in the orthogonal direction unchanged.

Inversion of simulated measurements
Case studies based on real measurement data are inadequate as the sole proof of concept because the truth is unknown and the result thus is unverifiable.Instead we first test our scheme on the basis of simulated atmospheric states and consider the scheme as verified if the velocities and mixing coefficients used to simulate the atmospheric states are sufficiently well reproduced.In the noise-free well-conditioned case, one might even expect, within the numerical precision of the system, the exact reproduction of the reference data; due to the -weak but non-zero -dispersivity of the numerical transport scheme, the wiggles discussed in the previous subsection cause, at some grid points, D matrix entries of the wrong sign.In order to fight resulting convergence problems of the inversion, some small regularization is at least necessary, even if the system of equations to be solved is well or overdetermined.Since the system in reality tends to be illconditioned and the constraint applied to the inversion prevents the reproduction of the reference data, we use a variety of idealized tracers instead.After this initial test of functionality, more and more realistic test cases are constructed in order to study the competing influence of constraint and measurement data on the solution.
During code development, a series of basic tests of increasing complexity were performed, including a variety of mixing ratio distributions transported with various velocity fields.The main lesson learned was that, even when the rows of Eq. ( 80) were independent and no formal ill-posedness could be diagnosed, the unphysical upwind wiggles in the vicinity of the mixing ratio peak as discussed in the previous subsection could trigger errors which are boosted during the iteration.This problem, which is associated with sharp structures and large velocities (of the order of one grid width per macro time step) can be solved by the use of a smoothing regularization matrix R as discussed in the last paragraph of Sect.4, however at the cost of degraded spatial resolution of the result.
As an example we show the following test.An altitudeindependent meridional velocity field (in degrees per month) was chosen, while the vertical velocity was set to zero (Fig. 2a).This particular choice of the meridional velocity field keeps boundary problems at the poles due to divergence in a circulation which is not closed reasonably small.Initial mixing ratio distributions (in ppmv) of four idealized gases were constructed such that gas (a) was sugarloaf shaped, while the mixing ratio distributions of the gases (b) to (d) were monotonic slopes. and where z is altitude above surface in km and φ is latitude in degrees.The distribution of gas (a) is shown in Fig. 2b.The transport problem was solved in the forward mode using the velocity field as defined in Eq. ( 91) for integrating the tendency equation (Eq.28) over 1 month.The resulting distribution of gas (a) after this macro time step is shown Fig. 2c.
The "sugarloaf" is transported to the north by the expected distance.Due to the latitudinal gradients with lower velocities in the luff of the sugarloaf and larger velocities in the lee, the shape of the sugarloaf is slightly distorted, leading to a steeper slope in the luff and a flatter slope in the lee.No indication of problems with diffusivity or dispersivity of the transport scheme is seen.At the poles boundary problems are visible which are unavoidable when such an unrealistic (but instructive) velocity field is used.The inversion nicely reproduces the initial velocity field (not shown because hardly discernable from Fig. 2a).Due to the smoothing constraint the peak velocity is decreased by 18 %.This smoothing is the price to pay for the stabilization of the retrieval in the presence of the boundary problems discussed above.The residual between the simulated measurement of the distribution of gas (a) at t 0 + t and the distribution predicted with the resulting velocity field is shown in (Fig. 2d).

Case study with MIPAS measurements
The risk of case studies based on simulated data typically is that not all difficulties encountered with real data are foreseen during theoretical studies.In order to demonstrate applicability to real data, global monthly latitude-altitude distributions of CFC-12, CH 4 , N 2 O, and SF 6 (Kellmann et al., 2012;Plieninger et al., 2015;Haenel et al., 2015) measured with the MIPAS (Fischer et al., 2008) were used.The purpose of these tests is to demonstrate the feasibility of the method presented.An investigation of the atmospheric circulation on the basis of this method applied to MIPAS data is left for a companion paper.For this proof of concept, sinks of these long-lived tracers have been ignored, but these will certainly be considered in scientific applications.
For this case study, zonal monthly mean distributions of air densities and mixing ratios of these four species from September and October 2010 were used.Figure 3 shows the measured distributions of these quantities in September (left column) and October (middle column) and the residuals between the measured and predicted contributions for October (right column).The residuals are reasonably small and show, except for methane in the polar upper stratosphere, no patterns which would suggest peculiarities with the inferred kinematic quantities.
The resulting circulation vectors which best explain the change in the mixing ratio distributions from September to October 2010 are shown in the upper left panel of Fig. 4. Winter polar subsidence, summer polar upwelling, the mesospheric overturning circulation, the upper and lower branches of the Brewer-Dobson circulation, and the tropical pipe are clearly visible.Details of the tropical pipe are visible in the right upper panel.As expected, the Brewer-Dobson circulation is much more pronounced in the northern (early) winter hemisphere.Velocities are roughly consistent with the mean ages of stratospheric air as determined by Stiller et al. (2008) and Haenel et al. (2015) in the sense that the quotient of the typical circulation velocity and the distance between equator and pole gives an age estimate of the correct order of magnitude.While the inferred field of circulation vectors shows many detailed features demanding scientific investigation in their own right (e.g., the latitude offset between the intertropical convergence zone and the stratospheric tropical pipe or the interface between the stratospheric two-cell circulation and the overturning mesospheric circulation and the transition altitude between them), the reproduction of the expected features justifies confidence in the method proposed.Resulting mixing coefficients K φ and K z are shown in the left and right lower panels, respectively.Negative mixing coefficients indicate counter-gradient mixing, which seems to be most pronounced in the tropical upper stratosphere.
Jacobian elements with respect to v values and K values seem to form a null space.Thus the K values were constrained to zero using diagonal components only in the R matrix diagonal blocks associated with the K values.The strength of this constraint was adjusted such that the K values were as small as possible as long as this did not boost the residual.Resulting K φ and K z distributions are shown in Fig. 4.
The errors in the estimated transport velocities and mixing coefficients have been estimated according to Eq. ( 85) and are shown in Fig. 5.The uncertainties in the transport velocities caused by the propagation of measurement errors are in the 1 % range, indicating that the information contained in the measurements is adequate for the purpose of retrieving circulation parameters.It even seems possible to improve the time resolution of the circulation analysis and aim at weekly instead of monthly temporal sampling.
Larger errors above 65 km altitude and at the bins closest to the pole are border effects, resulting from the fact that no symmetric derivatives can be calculated there.The uncertainties in K φ show the same patterns as the K φ values themselves.

Discussion
The analysis of the age of stratospheric air can be understood as an integrated look at the equations of motion of stratospheric air because the total travel time of the air parcel through the stratosphere is represented.The refinement of this method which analyzes the mean age just considers a weighted mean of the above, but it is still an integral method.Contrary to these integral methods, our direct inversion scheme supports a -in approximation, due to discrete sampling in the time domain -differential view of the same problem.The related advantages are the following: a. independence of assumptions on the age spectrum because during each time step mixing is explicitly considered b.insensitivity to SF 6 depletion in the mesosphere (see, e.g., Reddmann et al., 2001;Stiller et al., 2012) because the scheme uses the actual entry values of subsiding air as a reference c. applicability to nonideal tracers in the stratosphere (since the atmospheric state is updated for each time step by measured value, depletion does not accumulate, even if no sink functions are considered) and d. the circular reasoning that the lifetimes of nonideal tracers depend on their trajectories (and thus atmospheric circulation), while the determination of the circulation requires knowledge of the lifetimes, can be resolved; our scheme requires knowledge only of the local, not the global, lifetimes e. the method is an empirical method which does not involve any dynamical model; i.e., the forces which cause the circulation are not required.
The method only finds that kinematic state of the atmosphere which, according to the continuity equation, fit the measurements best.These kinematic state values are provided as model diagnostics to assess the performance of dynamical models.Due to these advantages, the major problems in the empirical analysis of the Brewer-Dobson circulation as mentioned by Butchart (2014)  Results of the case study presented in Sect.5.3 suggest that these problems are under control in the current application of the proposed scheme.

Conclusions and outlook
We have presented a method which infers mixing coefficients and effective velocities of a 2-D atmosphere by inversion of the continuity equation.The main steps of this procedure are a. the integration of the continuity equation over time to predict pressure and mixing ratios for given initial pressures and mixing ratios and initially guessed velocities and mixing coefficients; b. the propagation of errors of initial pressures and mixing ratios onto the predicted pressures and mixing ratios, by differentiation of the predicted state with respect to the initial state and generalized Gaussian error propagation; c. the estimation of the sensitivities of the predicted state with respect to the velocities and mixing coefficients; and d. the minimization of a quadratic cost function involving the residual between measured and predicted state at the end of the forecasting interval by inversion of the continuity equation.
The inferred velocities are suggested to be used as a model diagnostic in order to avoid problems encountered with other model diagnostics like mean age of stratospheric air.It is important to note that the diagnostics inferred here are effective transport velocities and effective mixing coefficients in the sense that they include eddy transport and diffusion terms.Thus, they cannot simply be compared to zonal mean velocities and mixing coefficients of a 3-D model, but the eddy terms have to be considered when these diagnostics quan- tities are calculated.The application of this method to SF 6 distributions measured by MIPAS (Stiller et al., 2012) to diagnose the Brewer-Dobson circulation is discussed in a companion paper.Obvious future activities are the extension of this method to three dimensions and the inclusion of sink functions of non-inert species to explore a larger number of tracers in order to better constrain the related inverse problem.
8 Data/Code availability The code is still under development.The data used are compiled in Supplement 2. The complete MIPAS data are available at http://www.imk-asf.kit.edu/english/308.php.
Appendix A: From 3-D to 2-D The reduction of the transport problem from three to two dimensions involves Reynolds decomposition of the threedimensional continuity equation and subsequent zonal averaging and gives rise to eddy mixing and transport terms.
The inference of effective two-dimensional transport velocities and effective mixing coefficients from measurements discussed in the main paper relies on the fact that, within certain assumptions and approximations, all these eddy effects can be understood as additional pseudo-advection and pseudomixing terms according to the advection equation and Fick's law, with gas-independent pseudo-velocities and pseudomixing coefficients.The exact interpretation of the twodimensional velocities and mixing coefficients inferred from the measurements depends on the approximations made.We apply our scheme to zonally averaged mixing ratios (no mass-weighted averaging).Contrary to the main text, where the symbols v, w, K φ , and K z are used in the twodimensional system, here zonal averages are indicated by a bar.Assuming that the deviations from the zonal mean are small compared to the zonal mean itself such that linearization is justifiable, that meridional advection is negligibly small compared to zonal advection, that the time variation of the zonal mean quantities is assumed to be much slower than the time variation of the deviations from the zonal mean, which corresponds to the assumption of a quasi-steady state, Tung (1982) derives a two-dimensional approximation to the continuity equation which, adjusted to our notation, written for geometric altitudes instead of potential temperatures, and using the shallowness approximation, reads and Assuming further that our scheme is applied only to long-lived species, such that chemical eddy terms can be ignored because chemical lifetimes are long compared to transport lifetimes (see Pyle and Rogers, 1980) and wave disturbances are dominated by steady or periodic terms, such that the terms with the mixed second derivative terms tend to disappear (Matsuno, 1980;Clark and Rogers, 1978;Pyle and Rogers, 1980;Tung, 1982) and an even weaker approximation is sufficient, namely The Supplement related to this article is available online at doi:10.5194/acp-16-14563-2016-supplement.

Figure 2 .
Figure 2. Test case: (a) velocity field; (b) initial distribution of gas (a); (c) distribution of gas (a) after 1 month; (d) residual between distribution of gas (a) after 1 month predicted with the resulting and the correct velocities.The units of the color bar in (a) are degrees per month.

Figure 3 .
Figure3.Measured distributions in September (left column) and October (middle column) and residual distributions between October measurements and predictions for October (right column) for air density and mixing ratios of CFC-12, CH 4 , N 2 O, and SF 6 (top to bottom).Grey grid boxes indicate nonavailability of valid data.

Figure 4 .
Figure 4. Resulting circulation vectors (v(z, φ), w(z, φ)) (upper left panel), where colors on the red side of the rainbow color scale represent higher velocities; a zoomed-in view of this (upper right panel); mixing coefficients K φ (lower left panel) and K z (lower right panel).
with the assumption that K zφ ≈ −K φz , Eq. (A1) can be rewritten as a tendency equation of the type