What is the Lagrangian for Nonlinear Filtering?
Jin W. Kim, Prashant G. Mehta, Sean P. Meyn

TL;DR
This paper extends the classical duality between estimation and control from linear to nonlinear filtering, introducing a dual process via BSDEs to derive the nonlinear filter equation.
Contribution
It generalizes the Kalman-Bucy duality to nonlinear filters using backward stochastic differential equations and optimal control techniques.
Findings
Derived the nonlinear filter equation using duality and BSDEs.
Showed classical Kalman-Bucy duality as a special case.
Provided a new framework for nonlinear filtering via dual processes.
Jin-Won Kim, Prashant G. Mehta and Sean P. Meyn
Financial support from the NSF grant 1761622 and the ARO grant W911NF1810334 is gratefully acknowledged. J-W. Kim and P. G. Mehta are with the Coordinated Science Laboratory and the Department of Mechanical Science and Engineering at the University of Illinois at Urbana-Champaign (UIUC); S. P. Meyn is with the Department of Electrical and Computer Engineering at the University of Florida at Gainesville. Corresponding email: [email protected].
Abstract
Duality between estimation and optimal control is a problem of rich historical significance. The first duality principle appears in the seminal paper of Kalman-Bucy, where the problem of minimum variance estimation is shown to be dual to a linear quadratic (LQ) optimal control problem. Duality offers a constructive proof technique to derive the Kalman filter equation from the optimal control solution.
This paper generalizes the classical duality result of Kalman-Bucy to the nonlinear filter: The state evolves as a continuous-time Markov process and the observation is a nonlinear function of state corrupted by an additive Gaussian noise. A dual process is introduced as a backward stochastic differential equation (BSDE). The process is used to transform the problem of minimum variance estimation into an optimal control problem. Its solution is obtained from an application of the maximum principle, and subsequently used to derive the equation of the nonlinear filter. The classical duality result of Kalman-Bucy is shown to be a special case.
I Introduction
In Kalman’s celebrated paper with Bucy, it is shown that the problem of minimum variance estimation is dual to a deterministic optimal control problem [1]. Duality offers a constructive proof technique to derive the Kalman filter equation from the optimal control solution [2, Ch. 7]. Apart from the formulation’s aesthetic appeal to control theorists and aficionados of variational techniques, the proof helps explain why, with the time arrow reversed, the covariance update equation of the Kalman filter is the same as the dynamic Riccati equation (DRE) of optimal control. Given this, two natural questions are: (i) What is the dual optimal control problem for the nonlinear filter? and (ii) Can the equation of the nonlinear filter be derived from the solution of an optimal control problem? These questions are answered in this paper.
In classical linear Gaussian settings, dual constructions are of the following two types [3, Sec. 7.3]: (i) minimum variance estimation, which was first outlined in Kalman-Bucy’s original paper, and (ii) minimum energy estimation whose formulation first appears in [4].
Given the historical significance of this area, several extensions have been considered over the decades [4, 5, 6, 7, 8]. Much work has been done on extending and interpreting the duality for minimum energy estimation: (i) as a MAP estimator [8, 9]; (ii) through an application of the log transformation, which transforms the Bellman equation of optimal control into the Zakai equation of filtering [6, 7, 10]; or (iii) based upon the variational Kallianpur-Striebel formula [11], [12, Lemma 2.2.1]. Based on these extensions, the negative log-posterior has been shown to have an interpretation as an optimal value function. Such a formulation is used to derive the equations of nonlinear smoothing in a companion paper on arXiv [13].
It must be said that none of these earlier results are extensions of duality for the minimum variance estimation problem.
It has been noted in prior work that (i) the dual relationship between the DRE of the LQ optimal control and the covariance update equation of the Kalman filter is not consistent with the interpretation of the negative log-posterior as a value function, and (ii) some of the linear algebraic operations, e.g., the use of matrix transpose to define the dual system, are not applicable to nonlinear systems [14, 9]. For these reasons, the original duality of Kalman-Bucy is widely understood as an LQG artifact that does not generalize [14].
The present paper has a single contribution: generalization of the original Kalman-Bucy duality theory to nonlinear filtering. It is an exact extension, in the sense that the dual optimal control problem has the same minimum variance structure for linear and nonlinear filtering problems. Kalman-Bucy’s linear Gaussian result is shown to be a special case. Explicit expressions for the control Lagrangian and the Hamiltonian are described. These expressions are expected to be useful to construct approximate algorithms for filtering via learning techniques that have become popular of late.
A related but distinct formulation for duality was proposed by the authors in a recent paper [15]. The formulation described in this paper is original and fixes many of the issues (information structure, stochastic terms in the objective function) in the earlier paper. Both the objective function and the BSDE constraint described in this paper are original. The main analysis technique – the use of maximum principle to derive the nonlinear filter – is also original.
The outline of the remainder of this paper is as follows: The dual optimal control problem is proposed in Sec. II. The dual problem is described for two cases: in the first the state space is finite, and in the second the state is defined as an Itô diffusion. The solutions for these two cases appear in Sec. III and Sec. IV, respectively. A martingale characterization of the solution appears in Sec. V. The proofs are contained in the Appendix.
II Problem Formulation
Notation: For state-space denoted by , we let denote the Borel σ-algebra on , and denote the set of probability measures on . Any probability measure acts on a Borel-measurable function according to .
For a filtration and a measurable space , denotes the space of -adapted square-integrable processes taking values in . Likewise, is the space of -measurable square-integrable random variables taking values in . is the space of -times differentiable functions from to , and is the space of square-integrable (with respect to the Lebesgue measure) functions from to .
For a matrix, denotes the trace and denotes the vector of its diagonal entries. For a vector, denotes a diagonal matrix with diagonal entries given by the vector. For a function , is the gradient vector, and is the Hessian matrix. For a vector-valued function , is the divergence of .
II-A Filtering problem
Consider a pair of continuous-time stochastic processes . The state is a Markov process that evolves in the state-space . The vector-valued observation process is defined according to the following model:
[TABLE]
where is the observation function and is an -dimensional Wiener process (w.p.) with covariance matrix . The initial distribution for is denoted .
The filtering problem is to compute the conditional distribution (posterior) of the state given the filtration (time history of observations) . The posterior distribution at time is denoted as . It is an element of . For any integrable function ,
[TABLE]
It is well-known that is the minimum variance estimator of [16, Lemma 5.1]. The central question of this paper is to formulate the minimum variance estimation as a dual optimal control problem.
These formulations are described in Sec. II-B and Sec. II-C, for the Euclidean and the finite state-space settings, respectively. The duality principle is then presented in Sec. II-D. The relationship to the well known linear Gaussian case is discussed in Sec. II-E.
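The minimum variance property of the conditional expectation noted above can be checked on a toy static example. The following is a minimal sketch, not taken from the paper: the joint distribution `p`, the function values `f`, and the helper `mse` are all hypothetical, and the setting is a single discrete observation rather than a continuous-time observation history.

```python
import numpy as np

# Hypothetical joint pmf p[x, z] of a discrete state X (3 values)
# and a discrete observation Z (2 values); entries sum to 1.
p = np.array([[0.30, 0.10],
              [0.05, 0.25],
              [0.10, 0.20]])

# Hypothetical values f(x) of the function being estimated.
f = np.array([0.0, 1.0, 4.0])

def mse(g):
    """Mean-squared error E[(f(X) - g(Z))^2] of an estimator g of f(X)."""
    return float(np.sum(p * (f[:, None] - g[None, :]) ** 2))

# Conditional expectation E[f(X) | Z = z] for each value z.
cond_mean = (f @ p) / p.sum(axis=0)

# Any other function of Z should do no better than the conditional mean.
rng = np.random.default_rng(0)
perturbed = cond_mean + rng.normal(size=cond_mean.shape)
print(mse(cond_mean), mse(perturbed))
```

The same orthogonality argument underlies the continuous-time statement; the sketch only replaces the observation filtration by a single finite-valued observation.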
II-B Itô diffusion on the Euclidean space
The state process evolves on according to the Itô stochastic differential equation (SDE)
[TABLE]
where is a vector valued standard w.p., and are functions of appropriate dimensions. It is assumed that , , are mutually independent.
The differential generator of , denoted as , acts on functions in its domain according to
[TABLE]
It is assumed that is an elliptic operator: there is such that for all .
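For reference, the generator of an Itô diffusion has a standard form. The sketch below assumes the SDE is written as $\mathrm{d}X_t = a(X_t)\,\mathrm{d}t + \sigma(X_t)\,\mathrm{d}B_t$; the symbol names $a$, $\sigma$, $B$ and $\mathcal{A}$ are assumptions made here for illustration and may differ from the paper's notation.

```latex
(\mathcal{A} f)(x) \;=\; a(x)^{\top} \nabla f(x)
  \;+\; \tfrac{1}{2}\,\mathrm{tr}\!\big(\sigma(x)\,\sigma(x)^{\top}\, D^{2} f(x)\big)
```

Here $\nabla f$ is the gradient vector and $D^{2} f$ is the Hessian matrix, as in the notation paragraph of Sec. II.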
For functions , , , and , a cost function is defined as follows:
[TABLE]
Dual optimal control problem:
[TABLE]
The constraint (4b) is a backward stochastic partial differential equation (BSPDE) with boundary condition prescribed at the terminal time . The function appearing in this boundary condition is allowed to be random, with .
The admissible set of control input is as follows:
[TABLE]
The solution of the BSPDE is adapted to the filtration . It is an element of ; cf., [17].
II-C Finite state-space
The continuous-time process evolves on the finite state space . Its statistics are characterized by the initial distribution and the row-stochastic rate matrix .
The dual of is the space of all functions on , which can be identified with : Any function is determined by its values at the basis vectors , and for any , the expectation can be expressed as a dot product: . Similarly, the observation function can be expressed , , where .
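The identification of functions with vectors can be made concrete. The following is a small sketch with hypothetical numbers: the dimension `d`, the vector `f_vec` of function values, and the probability vector `pi` are illustrative choices, not values from the paper.

```python
import numpy as np

d = 3
basis = np.eye(d)                    # rows are the basis vectors e_1, ..., e_d
f_vec = np.array([2.0, -1.0, 0.5])   # hypothetical: f_vec[i] is the value of f at e_{i+1}
pi = np.array([0.2, 0.5, 0.3])       # hypothetical probability vector on the state space

def f(x):
    """Evaluate the function at a basis vector x via the dot product."""
    return float(f_vec @ x)

# Expectation as an explicit sum over states ...
expectation_sum = sum(pi[i] * f(basis[i]) for i in range(d))
# ... and as a single dot product between pi and the vector of values.
expectation_dot = float(pi @ f_vec)
print(expectation_sum, expectation_dot)
```

Both expressions compute the same number, which is why the paper can treat functions on the state space and vectors interchangeably.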
We also define a matrix for each as follows:
[TABLE]
The cost function in this case is defined as
[TABLE]
where , , , and .
Dual optimal control problem:
[TABLE]
The constraint is a backward stochastic differential equation (BSDE) with terminal condition as before. We write , with our convention that functions on are identified with vectors in .
The admissible set of control input is and the solution pair ; cf. [18, Ch. 7].
II-D Duality relationship
Suppose is defined according to the observation model (1). Consider an admissible control input and define via solution of the BSDE – Eq. (4b) for the Euclidean case, and Eq. (7b) for the finite case. Assume the following linear structure of the estimator:
[TABLE]
The duality relationship is expressed in the following proposition whose proof appears in Appendix A-A:
Proposition 1
Consider the observation model (1), the linear estimator (8), together with the dual optimal control problem. Then for any choice of admissible control :
[TABLE]
Thus, formally, the problem of obtaining the minimum variance estimate of (minimizer of the right-hand side of the equality) is converted into the problem of finding the optimal control (minimizer of the left-hand side of the identity). However, there is a subtle problem: It is not a priori clear whether there exists a such that . (This will be true, e.g., if all -measurable random variables have a representation of the form (8).)
In this paper, the following assumption is made:
Assumption A1: For each fixed terminal time and function , there exists a such that .
Under this assumption, the following is proved in Appendix A-B:
Proposition 2
Consider the dual optimal control problem. Suppose is the optimal control input and that is the associated optimal trajectory obtained as a solution of the BSDE. Then for all :
[TABLE]
II-E Linear-Gaussian case
The linear-Gaussian case assumes the following model:
1. The drift in the Itô diffusion is linear in . That is,
[TABLE]
where and .
2. The coefficient of the process noise is a constant matrix, . We denote .
3. The prior is a Gaussian distribution with mean and variance .
Classical Kalman-Bucy duality is concerned with the problem of constructing a minimum variance estimator for the random variable where is a given deterministic vector [1]. Therefore, we also have
The terminal condition in (4b) is a linear function .
We impose the following restrictions:
1. The control input is restricted to be a deterministic function of time (in particular, it does not depend upon the observations). Such a control is trivially -adapted hence admissible. For such a control input, with deterministic , the solution of the BSPDE is a deterministic function of time, and . The BSPDE becomes a PDE:
[TABLE]
where the lower-case notation is used to stress the fact that and are now deterministic functions of time.
2. Without loss of generality, it suffices to consider the restriction of the optimal control problem (4) on a finite () dimensional subspace of the function space:
[TABLE]
It is easy to see that is an invariant subspace for the dynamics (10). On , the PDE reduces to an ODE:
[TABLE]
and the cost function becomes
[TABLE]
It no longer depends upon .
In summary, the optimal control problem (4) reduces to the deterministic LQ problem of classical duality:
[TABLE]
The solution of the optimal control problem yields the optimal control input , along with the vector that determines the minimum-variance estimator:
[TABLE]
The Kalman filter is obtained by expressing as the solution to a linear SDE [2, Ch. 7].
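The classical duality can be checked numerically. The following is a minimal sketch under stated assumptions, not the paper's own construction: it assumes the model dX = A X dt + (process noise with covariance Q) and dZ = H X dt + (observation noise with covariance R), and the backward dual ODE -dy/dt = Aᵀy + Hᵀu with y(T) = f. The symbols `A`, `H`, `Q`, `R`, `Sigma0`, `f` and the function name `dual_lq_cost` are chosen here for illustration, and the paper's sign and transpose conventions may differ. With the feedback u = -R⁻¹HΣy, where Σ solves the forward Riccati equation, the LQ cost should match fᵀΣ(T)f up to discretization error.

```python
import numpy as np

def dual_lq_cost(A, H, Q, R, Sigma0, f, T=1.0, n=4000, optimal=True):
    """Euler sketch of the deterministic dual LQ problem (assumed conventions).

    Returns (cost, f' Sigma_T f), where cost is
    y_0' Sigma0 y_0 + int_0^T (y' Q y + u' R u) dt.
    """
    dt = T / n
    d = A.shape[0]
    Rinv = np.linalg.inv(R)

    # Forward pass: covariance (Riccati) equation of the Kalman-Bucy filter.
    Sig = np.empty((n + 1, d, d))
    Sig[0] = Sigma0
    for k in range(n):
        S = Sig[k]
        Sig[k + 1] = S + dt * (A @ S + S @ A.T + Q - S @ H.T @ Rinv @ H @ S)

    # Backward pass: dual state y, starting from the terminal condition f.
    y = np.array(f, dtype=float)
    cost = 0.0
    for k in range(n, 0, -1):
        u = -Rinv @ H @ Sig[k] @ y if optimal else np.zeros(H.shape[0])
        cost += dt * (y @ Q @ y + u @ R @ u)
        y = y + dt * (A.T @ y + H.T @ u)   # one Euler step backward in time
    cost += y @ Sigma0 @ y                 # penalty on the initial dual state
    return float(cost), float(f @ Sig[n] @ f)

# Hypothetical two-dimensional example.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
H = np.array([[1.0, 0.0]])
Q = np.diag([0.0, 0.5])
R = np.array([[1.0]])
Sigma0 = np.eye(2)
f = np.array([1.0, 0.0])

cost_opt, variance = dual_lq_cost(A, H, Q, R, Sigma0, f)
cost_zero, _ = dual_lq_cost(A, H, Q, R, Sigma0, f, optimal=False)
print(cost_opt, variance, cost_zero)
```

The zero control corresponds to ignoring the observations, so its cost is the prior variance of fᵀX_T; the gap between `cost_zero` and `cost_opt` is the variance reduction due to filtering.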
Remark 1
Consider the dual optimal control problem (7) for the finite state space model. Suppose one only allows control inputs that are deterministic functions of time. In this case, is a deterministic function of time and . Consequently, the objective function in (7a) simplifies to
[TABLE]
where and is the quadratic variation process for . The resulting problem is a deterministic LQ problem whose optimal solution will (in general) yield a sub-optimal estimate using (8). In Appendix A-C, the solution is used to derive a Kalman filter for the Markov chain. Such sub-optimal filters for Markov chains have been applied in [19, 20].
We have now set the stage to derive the nonlinear filter via the solution to the dual optimal control problem. We describe the solution for the finite state case first in Sec. III. Although technically and notationally more challenging, the considerations for the Euclidean case are entirely analogous and described in Sec. IV.
III Solution for the finite-state case
III-A Standard form of the optimal control problem
Consider the dual optimal control problem (7) associated with the finite state space model. There is only one natural filtration in this problem, the filtration generated by the observation process.
The ‘state’ of the optimal control problem is adapted to the filtration by construction. So, the problem is fully observed. However, the problem is not in its standard form [21, Def. 5.4].
There are two issues:
1. The stochastic process on the right-hand side of the BSDE (7b) is not a Wiener process.
2. The cost function in (7a) depends upon the exogenous process which is not adapted to .
In order to resolve these issues and express the optimal control problem in a standard form, we introduce three stochastic processes , , and as follows:
[TABLE]
From the standard filtering theory, it is well known that i) is a Wiener process that is adapted to [16, Lemma 5.6], and ii) the filtration generated by the innovation process equals for all [22].
We therefore express the BSDE constraint as
[TABLE]
The solution is adapted to , now interpreted as the filtration generated by the innovation process.
In order to remove the explicit dependence of the cost function on the non-adapted process , we use the tower property of the conditional expectation:
[TABLE]
Denote .
Because are all -adapted, it is a straightforward calculation to see that
[TABLE]
where . The function is the control Lagrangian.
We are now ready to state the standard form of the optimal control problem for the finite state-space case:
Dual optimal control problem (standard form):
[TABLE]
In its standard form, all the processes are adapted to the filtration and moreover the stochastic process on the right-hand side of the BSDE is a Wiener process with respect to this filtration.
III-B Solution using the maximum principle
The Hamiltonian is defined as follows:
[TABLE]
A characterization of the optimal input is contained in the following. Its proof, based on an application of the maximum principle [23], appears in Appendix A-D.
Theorem 1
Consider the optimal control problem (13). Suppose is the optimal control input and that is the associated optimal trajectory obtained as a solution of the BSDE (13b). Then there exists a -adapted vector valued process such that
[TABLE]
where and satisfy
[TABLE]
Remark 2
From linear optimal control theory, it is known that (see [18, Sec. 6.6]) where is a -adapted matrix-valued process. The boundary condition suggests that . This is indeed the case as shown in the proof of Theorem 2, where the following equation is derived (see (25)):
[TABLE]
This is the DRE of the nonlinear filter.
III-C Derivation of the nonlinear filter
From Prop. 2, using the formula (14) for the optimal control,
[TABLE]
This formula is used to derive the Wonham filter [24]. The proof of the following theorem appears in Appendix A-E.
Theorem 2 (Nonlinear filter)
Consider the optimal estimator (16) where and solve Hamilton’s equations (15). Then
[TABLE]
and furthermore for all .
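The Wonham filter is straightforward to simulate. The following is a minimal Euler sketch in its standard textbook form, under assumptions made here for illustration: a scalar observation with unit noise covariance, a rate matrix `A` whose rows sum to zero, and the observation function stored as a vector `h`. The function name `simulate_wonham` and all numerical values are hypothetical, and the clipping and renormalization step is a numerical safeguard rather than part of the filter equation.

```python
import numpy as np

def simulate_wonham(A, h, pi0, T=5.0, n=5000, seed=0):
    """Simulate a finite-state Markov chain, its observations, and the Wonham filter."""
    rng = np.random.default_rng(seed)
    dt = T / n
    d = len(h)
    x = rng.choice(d, p=pi0)            # initial state drawn from the prior
    pi = np.array(pi0, dtype=float)     # filter initialized at the prior
    path = np.empty((n + 1, d))
    states = np.empty(n + 1, dtype=int)
    path[0], states[0] = pi, x
    for k in range(n):
        # Observation increment: dZ = h(X) dt + dW (unit noise covariance assumed).
        dZ = h[x] * dt + np.sqrt(dt) * rng.standard_normal()
        # Filter update: prediction with A^T plus innovation-driven correction.
        h_hat = pi @ h
        pi = pi + (A.T @ pi) * dt + pi * (h - h_hat) * (dZ - h_hat * dt)
        # Numerical safeguard (not part of the equation): keep pi a probability vector.
        pi = np.clip(pi, 1e-12, None)
        pi /= pi.sum()
        # Chain transition over dt: jump to j != x with probability A[x, j] * dt.
        p = A[x] * dt
        p[x] = 1.0 + A[x, x] * dt
        x = rng.choice(d, p=p)
        path[k + 1], states[k + 1] = pi, x
    return path, states

# Hypothetical two-state example.
A = np.array([[-1.0, 1.0], [1.0, -1.0]])
h = np.array([-1.0, 1.0])
path, states = simulate_wonham(A, h, pi0=[0.5, 0.5])
print(path[-1], states[-1])
```

The explicit Euler step preserves the total mass of the posterior but not its positivity, which is why the safeguard is included; more careful schemes work with the unnormalized (Zakai-type) equation instead.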
IV Solution for the Euclidean case
As in the finite state-space case, the starting point is the definition of the Lagrangian: where the cost function for the Euclidean case is defined in (3) and is the conditional distribution at time defined according to (2). Explicitly,
[TABLE]
The standard form of the optimal control problem is as follows:
Dual optimal control problem (standard form):
[TABLE]
Assumption A2: The (generic) measure is absolutely continuous with respect to Lebesgue measure.
Notation: The Radon-Nikodym derivative is denoted as . Consequently, . In the remainder of this section, with a slight abuse of notation, we will drop the tilde to simply write .
The co-state and the Hamiltonian are defined as follows:
[TABLE]
Hamilton’s equations are described in the following, while the proof appears in Appendix A-F:
Theorem 3
Consider the optimal control problem (18). Suppose is the optimal control input and that is the associated optimal solution obtained by solving the BSPDE (18b). Then there exists a -adapted function-valued process such that
[TABLE]
where the optimal control is given by
[TABLE]
Using the result of Prop. 2, the optimal estimator is
[TABLE]
As before, the formula is used to derive the Kushner filter equation [25]. The proof appears in Appendix A-G:
Theorem 4
Consider the optimal estimator (21) where and solve Hamilton’s equation (19). Then the conditional density solves the SPDE:
[TABLE]
and furthermore $P_{t}(x)=\pi_{t}(x)\big(Y_{t}(x)-\langle\pi_{t},Y_{t}\rangle\big)$ for all and .
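For reference, the Kushner (Kushner-Stratonovich) equation for the conditional density has the following standard form. This is a sketch under assumed notation: $\mathcal{A}^{\dagger}$ denotes the adjoint of the generator, $h$ the observation function, $R$ the covariance of the observation noise, and $Z$ the observation process; these symbol names are assumptions made here and may differ from those used in the theorem.

```latex
\mathrm{d}\pi_t(x) \;=\; (\mathcal{A}^{\dagger}\pi_t)(x)\,\mathrm{d}t
  \;+\; \pi_t(x)\,\big(h(x) - \hat{h}_t\big)^{\top} R^{-1}
        \big(\mathrm{d}Z_t - \hat{h}_t\,\mathrm{d}t\big),
\qquad
\hat{h}_t := \int h(x)\,\pi_t(x)\,\mathrm{d}x
```

The first term is the Fokker-Planck prediction step and the second is the correction driven by the innovation increment $\mathrm{d}Z_t - \hat{h}_t\,\mathrm{d}t$.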
V A martingale characterization
For the finite state-space case, define the function as follows:
[TABLE]
For the Euclidean case, the analogous function is as follows:
[TABLE]
In the statement of the following theorem, denotes the optimal control input as defined according to the formula (14) for the finite state-space case and the formula (20) for the Euclidean case. The proof appears in Appendix A-H.
Theorem 5
Suppose is the posterior process and is the -adapted solution of the dual BSDE. For every input , the process
[TABLE]
is a supermartingale; it is a martingale if and only if . Consequently,
[TABLE]
with equality if and only if . Consequently, the right-hand side is the value function for the dual optimal control problem.
Appendix A
A-A Proof of Prop. 1
Euclidean case: For a random function , the Itô-Wentzell theorem [26, Thm. 1.17] gives the formula for the differential
[TABLE]
Integrating over ,
[TABLE]
Using the formula (8) for the estimator, the error
[TABLE]
Upon squaring and taking an expectation,
[TABLE]
Finite state-space case: A martingale is defined as follows:
[TABLE]
whose quadratic variation is (see (5) for the definition of ). Itô’s product formula gives
[TABLE]
Using this, together with (8) for the estimator, results in the following error equation:
[TABLE]
Upon squaring and taking an expectation,
[TABLE]
A-B Proof of Prop. 2
Suppose . Define
[TABLE]
Using Prop. 1,
[TABLE]
By the dynamic programming principle, if is an optimal control input over the time-horizon then minimizes the right-hand side over all -adapted control inputs.
It then follows from Assumption A1 that .
A-C Derivation of the Kalman filter for the Markov process
The optimal control solution is given by
[TABLE]
Upon substituting the solution into the estimator (8)
[TABLE]
where
[TABLE]
Define to denote the state transition matrix and express the solution as . Thus,
[TABLE]
Noting that time is arbitrary, upon differentiating with respect to , one obtains the Kalman filter
[TABLE]
where we have replaced by .
A-D Proof of Thm. 1
Equations (15a)-(15c) are Hamilton’s equations for optimal control of a BSDE [23]. Explicitly, the partial derivatives are as follows:
[TABLE]
Using these formulae, the explicit form of Hamilton’s equation is as follows:
[TABLE]
The optimal control is obtained from the maximum principle:
[TABLE]
Since is quadratic in the control input, the explicit formula (14) is obtained by setting the derivative to zero:
[TABLE]
A-E Proof of Thm. 2
As noted in Remark 2, the optimal control problem for the finite state-space case has a linear structure, and thus for some matrix-valued process . The proof is broken into two steps: In step 1, we assume and derive the equation (17) of the Wonham filter. In step 2, we show that this assumption is consistent with the filter.
Step 1: The foregoing implies the following identities
[TABLE]
The first equality is (16), and the second follows from Itô’s product formula.
Hamilton’s equation (22b) gives a formula for , which when combined with (23) gives
[TABLE]
Upon integrating both sides, one finds that
[TABLE]
is a finite variation process. Therefore, by an application of [27, Theorem 4.8],
[TABLE]
for some yet to be determined process . Now,
[TABLE]
Therefore, the right-hand side of (24) must be zero:
[TABLE]
This gives the equation of the Wonham filter.
Step 2: It remains to verify that . Using the equation of the Wonham filter (17) and the definition (12b) for , it is a direct calculation to see that
[TABLE]
The assertion is shown by establishing
[TABLE]
and noting .
The calculation showing (26) is notationally cumbersome but straightforward. It is included in step 3 below.
Step 3: For any two column vectors , denotes the Hadamard (element-wise) product. For , it is a straightforward calculation to see
[TABLE]
Multiplying both sides of the matrix-valued equation (26) by , upon using the identity with and to simplify the right-hand side, one obtains
[TABLE]
Similarly, multiplying both sides of (26) by , using the identity with and , after applying Itô rules to simplify the right-hand side, one obtains
[TABLE]
Therefore,
[TABLE]
The right-hand side is identical to the right-hand side of Hamilton’s equation (22a) for .
A-F Proof of Thm. 3
Equations (19) are Hamilton’s equations. The Gateaux differentials of are
[TABLE]
Therefore the explicit formulas for Hamilton’s equations are
[TABLE]
The optimal control is obtained from the maximum principle:
[TABLE]
The explicit formula (20) for the optimal control is obtained by setting the derivative to zero:
[TABLE]
A-G Proof of Thm. 4
The proof is identical to the proof of Theorem 2 for the finite state-space case. In step 1, we derive the equation of the filter by first assuming the following linear relationship between and :
[TABLE]
In step 2, we verify this relationship is consistent with the filter equation.
Step 1: Suppose (27) is true. The differential form of (21) is given by:
[TABLE]
As in the finite state-space case, the proof approach is to use Hamilton’s equation for to derive the equation for .
Using the Itô product formula,
[TABLE]
Use Hamilton’s equation (19b) to evaluate , and equate the resulting expression to the right-hand side of (28) to obtain:
[TABLE]
This is the Euclidean counterpart of (24) in the proof of Theorem 2 in the finite state-space case. The derivation of the filter is now identical.
Step 2: The verification of (27) follows along the same lines as the finite state-space case. It is omitted here.
A-H Proof of Thm. 5
The proof is given for the finite state-space case. In this case,
[TABLE]
Upon using (12b) for and (17) for the filter, it is a direct application of the Itô product formula that:
[TABLE]
Using this formula,
[TABLE]
Therefore, is always a supermartingale with respect to , and is a martingale if and only if
[TABLE]
for almost every . Consequently,
[TABLE]
Adding $\mathsf{E}\big(\int_{0}^{T}\mathcal{L}(Y_{s},V_{s},U_{s}\,;\pi_{s})\,\mathrm{d}s\big)$ on both sides yields the optimality result.
[1] R. E. Kalman and R. S. Bucy, “New results in linear filtering and prediction theory,” Journal of Basic Engineering, vol. 83, no. 1, pp. 95–108, 1961.
[2] K. J. Åström, Introduction to Stochastic Control Theory. Academic Press, 1970.
[3] A. Bensoussan, Estimation and Control of Dynamical Systems. Springer, 2018, vol. 48.
[4] R. E. Mortensen, “Maximum-likelihood recursive nonlinear filtering,” Journal of Optimization Theory and Applications, vol. 2, no. 6, pp. 386–394, 1968.
[5] K. W. Simon and A. R. Stubberud, “Duality of linear estimation and control,” Journal of Optimization Theory and Applications, vol. 6, no. 1, pp. 55–67, 1970.
[6] W. Fleming and S. Mitter, “Optimal control and nonlinear filtering for nondegenerate diffusion processes,” Stochastics, vol. 8, pp. 63–77, 1982.
[7] W. H. Fleming and E. De Giorgi, “Deterministic nonlinear filtering,” Annali della Scuola Normale Superiore di Pisa-Classe di Scienze-Serie IV, vol. 25, no. 3, pp. 435–454, 1997.
[8] G. C. Goodwin, J. A. de Doná, M. M. Seron, and X. W. Zhuo, “Lagrangian duality between constrained estimation and control,” Automatica, vol. 41, no. 6, pp. 935–944, 2005.
