What is the Lagrangian for Nonlinear Filtering?

Jin W. Kim; Prashant G. Mehta; Sean P. Meyn

arXiv:1903.11195·math.OC·October 28, 2019·CDC


TL;DR

This paper extends the classical duality between estimation and control from linear to nonlinear filtering, introducing a dual process via BSDEs to derive the nonlinear filter equation.

Contribution

It generalizes the Kalman-Bucy duality to nonlinear filters using backward stochastic differential equations and optimal control techniques.

Findings

01. Derived the nonlinear filter equation using duality and BSDEs.

02. Showed classical Kalman-Bucy duality as a special case.

03. Provided a new framework for nonlinear filtering via dual processes.

Abstract

Duality between estimation and optimal control is a problem of rich historical significance. The first duality principle appears in the seminal paper of Kalman-Bucy, where the problem of minimum variance estimation is shown to be dual to a linear quadratic (LQ) optimal control problem. Duality offers a constructive proof technique to derive the Kalman filter equation from the optimal control solution. This paper generalizes the classical duality result of Kalman-Bucy to the nonlinear filter: The state evolves as a continuous-time Markov process and the observation is a nonlinear function of state corrupted by an additive Gaussian noise. A dual process is introduced as a backward stochastic differential equation (BSDE). The process is used to transform the problem of minimum variance estimation into an optimal control problem. Its solution is obtained from an application of the maximum…

Equations

$$Z_{t}=\int_{0}^{t}h(X_{s})\,\mathrm{d}s+W_{t}$$

$$\pi_{t}(f):={\sf E}\big(f(X_{t})\mid{\cal Z}_{t}\big)$$

$$\mathrm{d}X_{t}=a(X_{t})\,\mathrm{d}t+\sigma(X_{t})\,\mathrm{d}B_{t},\quad X_{0}\sim\pi_{0}$$

$$({\cal A}y)(x):=a^{\top}(x)\frac{\partial y}{\partial x}(x)+\tfrac{1}{2}\mbox{tr}\Big(\sigma(x)\sigma^{\top}(x)\frac{\partial^{2}y}{\partial x^{2}}(x)\Big)$$

$$\ell(y,v,u\,;x)=\tfrac{1}{2}\Big|\sigma^{\top}(x)\frac{\partial y}{\partial x}(x)\Big|^{2}+\tfrac{1}{2}(u+v(x))^{\top}R(u+v(x))$$

$$\mathop{\text{Min}}_{U\in\,{\cal U}}\ {\sf J}(U)={\sf E}\Big(\tfrac{1}{2}|Y_{0}(X_{0})-\pi_{0}(Y_{0})|^{2}+\int_{0}^{T}\ell(Y_{t},V_{t},U_{t}\,;X_{t})\,\mathrm{d}t\Big)$$

$$\text{Subj.}\ \ \mathrm{d}Y_{t}(x)=-\big(({\cal A}Y_{t})(x)+h^{\top}(x)(U_{t}+V_{t}(x))\big)\,\mathrm{d}t+V_{t}^{\top}(x)\,\mathrm{d}Z_{t}$$

$$Y_{T}(x)=f(x),\quad\forall\,x\in\mathbb{R}^{d}$$

$${\cal U}:=L_{{\cal Z}}^{2}([0,T]\,;\mathbb{R}^{m})$$

$$Q(e_{i}):=\sum_{j=1}^{d}A_{ij}(e_{j}-e_{i})(e_{j}-e_{i})^{\top},\quad i=1,\ldots,d$$

$$\ell(y,v,u\,;x)=\tfrac{1}{2}y^{\top}Q(x)y+\tfrac{1}{2}(u+v^{\top}x)^{\top}R(u+v^{\top}x)$$

$$\mathop{\text{Min}}_{U\in\,{\cal U}}\ {\sf J}(U)={\sf E}\Big(\tfrac{1}{2}|Y_{0}^{\top}(X_{0}-\pi_{0})|^{2}+\int_{0}^{T}\ell(Y_{t},V_{t},U_{t}\,;X_{t})\,\mathrm{d}t\Big)$$

$$\text{Subj.}\ \ \mathrm{d}Y_{t}=-\big(AY_{t}+HU_{t}+\operatorname{diag}(HV_{t}^{\top})\big)\,\mathrm{d}t+V_{t}\,\mathrm{d}Z_{t},\quad Y_{T}=f$$

$$S_{T}=\pi_{0}(Y_{0})-\int_{0}^{T}U_{t}^{\top}\,\mathrm{d}Z_{t}$$

$${\sf J}(U)=\tfrac{1}{2}{\sf E}\big(|S_{T}-f(X_{T})|^{2}\big)$$

$$\pi_{t}(Y_{t})=\pi_{0}(Y_{0})-\int_{0}^{t}U_{s}^{\top}\,\mathrm{d}Z_{s}$$

$$a(x)=Ax\quad\text{and}\quad h(x)=Hx$$

$$\frac{\partial y_{t}}{\partial t}(x)=-({\cal A}y_{t})(x)-h^{\top}(x)u_{t},\quad y_{T}(x)\equiv f^{\top}x\quad\forall\,x\in\mathbb{R}^{d}$$

$${\cal V}:=\{\tilde{y}:\tilde{y}(x)=y^{\top}x\ \ \forall\,x\in\mathbb{R}^{d}\ \text{where}\ y\in\mathbb{R}^{d}\}$$

$$\frac{\mathrm{d}y_{t}}{\mathrm{d}t}=-A^{\top}y_{t}-H^{\top}u_{t},\quad y_{T}=f$$

$$L(y,u):=\tfrac{1}{2}y^{\top}Qy+\tfrac{1}{2}u^{\top}Ru$$

$$\mathop{\text{Minimize}}_{u}:\quad{\sf J}(u)=\tfrac{1}{2}y_{0}^{\top}\Sigma_{0}y_{0}+\int_{0}^{T}L(y_{t},u_{t})\,\mathrm{d}t$$

$$\text{Subject to}:\quad\frac{\mathrm{d}y_{t}}{\mathrm{d}t}=-A^{\top}y_{t}-H^{\top}u_{t},\quad y_{T}=f$$

$${\sf J}(U)=\tfrac{1}{2}Y_{0}^{\top}\Sigma_{0}Y_{0}+\int_{0}^{T}\Big(\tfrac{1}{2}U_{t}^{\top}RU_{t}+\tfrac{1}{2}Y_{t}^{\top}{\sf E}(Q(X_{t}))Y_{t}\Big)\,\mathrm{d}t$$

$$\mathrm{d}Y_{t}=-\big(AY_{t}+HU_{t}+\operatorname{diag}(HV_{t}^{\top})-V_{t}H^{\top}\pi_{t}\big)\,\mathrm{d}t+V_{t}\,\mathrm{d}{\sf I}_{t}$$

$${\sf J}(U)=\tfrac{1}{2}{\sf E}\big(|Y_{0}(X_{0})-\pi_{0}(Y_{0})|^{2}\big)+{\sf E}\Big(\int_{0}^{T}{\sf E}\big(\ell(Y_{t},V_{t},U_{t}\,;X_{t})\mid{\cal Z}_{t}\big)\,\mathrm{d}t\Big)$$

$${\cal L}(y,v,u\,;\mu)=\tfrac{1}{2}y^{\top}\mu(Q)y+\tfrac{1}{2}u^{\top}Ru+u^{\top}Rv^{\top}\mu+\tfrac{1}{2}\mu^{\top}\operatorname{diag}(vRv^{\top})$$

$$\mathop{\text{Min}}_{U\in\,{\cal U}}\ {\sf J}(U)={\sf E}\Big(\tfrac{1}{2}Y_{0}^{\top}\Sigma_{0}Y_{0}+\int_{0}^{T}{\cal L}(Y_{t},V_{t},U_{t}\,;\pi_{t})\,\mathrm{d}t\Big)$$

$$\text{Subj.}\ \ \mathrm{d}Y_{t}=-\big(AY_{t}+HU_{t}+\operatorname{diag}(HV_{t}^{\top})-V_{t}H^{\top}\pi_{t}\big)\,\mathrm{d}t+V_{t}\,\mathrm{d}{\sf I}_{t},\quad Y_{T}=f$$


Full text

What is the Lagrangian for Nonlinear Filtering?

Jin-Won Kim, Prashant G. Mehta and Sean P. Meyn

Financial support from the NSF grant 1761622 and the ARO grant W911NF1810334 is gratefully acknowledged. J-W. Kim and P. G. Mehta are with the Coordinated Science Laboratory and the Department of Mechanical Science and Engineering at the University of Illinois at Urbana-Champaign (UIUC); S. P. Meyn is with the Department of Electrical and Computer Engineering at the University of Florida at Gainesville. Corresponding email: [email protected].

Abstract

Duality between estimation and optimal control is a problem of rich historical significance. The first duality principle appears in the seminal paper of Kalman-Bucy, where the problem of minimum variance estimation is shown to be dual to a linear quadratic (LQ) optimal control problem. Duality offers a constructive proof technique to derive the Kalman filter equation from the optimal control solution.

This paper generalizes the classical duality result of Kalman-Bucy to the nonlinear filter: The state evolves as a continuous-time Markov process and the observation is a nonlinear function of state corrupted by an additive Gaussian noise. A dual process is introduced as a backward stochastic differential equation (BSDE). The process is used to transform the problem of minimum variance estimation into an optimal control problem. Its solution is obtained from an application of the maximum principle, and subsequently used to derive the equation of the nonlinear filter. The classical duality result of Kalman-Bucy is shown to be a special case.

I Introduction

In Kalman’s celebrated paper with Bucy, it is shown that the problem of minimum variance estimation is dual to a deterministic optimal control problem [1]. Duality offers a constructive proof technique to derive the Kalman filter equation from the optimal control solution [2, Ch. 7]. Apart from the formulation’s aesthetic appeal to control theorists and aficionados of variational techniques, the proof helps explain why, with the time arrow reversed, the covariance update equation of the Kalman filter is the same as the dynamic Riccati equation (DRE) of optimal control. Given this, two natural questions are: (i) What is the dual optimal control problem for the nonlinear filter? and (ii) Can the equation of the nonlinear filter be derived from the solution of an optimal control problem? These questions are answered in this paper.

In classical linear Gaussian settings, dual constructions are of the following two types [3, Sec. 7.3]: (i) minimum variance estimation, which was first outlined in Kalman-Bucy’s original paper, and (ii) minimum energy estimation whose formulation first appears in [4].

Given the historical significance of this area, several extensions have been considered over the decades [4, 5, 6, 7, 8]. Much work has been done on extending and interpreting the duality for minimum energy estimation: (i) as a MAP estimator [8, 9]; (ii) through an application of the log transformation, which transforms the Bellman equation of optimal control into the Zakai equation of filtering [6, 7, 10]; or (iii) based upon the variational Kallianpur-Striebel formula [11], [12, Lemma 2.2.1]. Based on these extensions, the negative log-posterior has been shown to have an interpretation as an optimal value function. Such a formulation is used to derive the equations of nonlinear smoothing in a companion paper on arXiv [13].

It must be said that none of these earlier results are extensions of duality for the minimum variance estimation problem.

It has been noted in prior work that (i) the dual relationship between the DRE of the LQ optimal control and the covariance update equation of the Kalman filter is not consistent with the interpretation of the negative log-posterior as a value function, and (ii) some of the linear algebraic operations, e.g., the use of matrix transpose to define the dual system, are not applicable to nonlinear systems [14, 9]. For these reasons, the original duality of Kalman-Bucy is widely understood as an LQG artifact that does not generalize [14].

The present paper has a single contribution: generalization of the original Kalman-Bucy duality theory to nonlinear filtering. It is an exact extension, in the sense that the dual optimal control problem has the same minimum variance structure for linear and nonlinear filtering problems. Kalman-Bucy’s linear Gaussian result is shown to be a special case. Explicit expressions for the control Lagrangian and the Hamiltonian are described. These expressions are expected to be useful to construct approximate algorithms for filtering via learning techniques that have become popular of late.

A related but distinct formulation for duality was proposed by the authors in a recent paper [15]. The formulation described in this paper is original and fixes many of the issues (information structure, stochastic terms in the objective function) in the earlier paper. Both the objective function and the BSDE constraint described in this paper are original. The main analysis technique – the use of maximum principle to derive the nonlinear filter – is also original.

The outline of the remainder of this paper is as follows: The dual optimal control problem is proposed in Sec. II. The dual problem is described for two cases: in the first the state space is finite, and in the second the state is defined as an Itô-diffusion. The solutions for these two cases appear in Sec. III and Sec. IV, respectively. A martingale characterization of the solution appears in Sec. V. The proofs are contained in the Appendix.

II Problem Formulation

Notation: For state-space denoted by $\mathbb{S}$ , we let ${\cal B}(\mathbb{S})$ denote the Borel $\sigma$ -algebra on $\mathbb{S}$ , and ${\cal P}(\mathbb{S})$ denote the set of probability measures on ${\cal B}(\mathbb{S})$ . Any probability measure $\mu\in{\cal P}(\mathbb{S})$ acts on a Borel-measurable function $y$ according to $\mu(y):=\int_{\mathbb{S}}y(x)\mu(\,\mathrm{d}x)$ .

For a filtration ${\cal Z}:=\{{\cal Z}_{t}:0\leq t\leq T\}$ and a measurable space $S$, $L_{{\cal Z}}^{2}([0,T]\,;{S})$ denotes the space of ${\cal Z}$-adapted square-integrable processes taking values in $S$. Likewise, $L^{2}_{{\cal Z}_{T}}(S)$ is the space of ${\cal Z}_{T}$-measurable square-integrable random variables taking values in $S$. $C^{k}(\mathbb{R}^{d};S)$ is the space of $k$-times differentiable functions from $\mathbb{R}^{d}$ to $S$, and $L^{2}(\mathbb{R}^{d};S)$ is the space of square-integrable (with respect to the Lebesgue measure) functions from $\mathbb{R}^{d}$ to $S$.

For a matrix, $\mbox{tr}(\cdot)$ denotes the trace and $\operatorname{diag}(\cdot)$ denotes the vector of its diagonal entries. For a vector, $\operatorname{diag}^{\dagger}(\cdot)$ denotes a diagonal matrix with diagonal entries given by the vector. For a function $y\in C^{2}(\mathbb{R}^{d})$, $\frac{\partial y}{\partial x}$ is the gradient vector, and $\frac{\partial^{2}y}{\partial x^{2}}$ is the Hessian matrix. For a vector-valued function $f\in C^{1}(\mathbb{R}^{d}\,;\mathbb{R}^{d})$, $\operatorname{div}(f)$ is the divergence of $f$.

II-A Filtering problem

Consider a pair of continuous-time stochastic processes $(X,Z)$ . The state $X=\{X_{t}:\,0\leq t\leq T\}$ is a Markov process that evolves in the state-space $\mathbb{S}$ . The vector-valued observation process $Z=\{Z_{t}:\,0\leq t\leq T\}$ is defined according to the following model:

$$Z_{t}=\int_{0}^{t}h(X_{s})\,\mathrm{d}s+W_{t} \tag{1}$$

where $h:\mathbb{S}\rightarrow\mathbb{R}^{m}$ is the observation function and $W=\{W_{t}:\,t\geq 0\}$ is an $m$ -dimensional Wiener process (w.p.) with covariance matrix $R\succ 0$ . The initial distribution for $X_{0}$ is denoted $\pi_{0}\in{\cal P}(\mathbb{S})$ .

The filtering problem is to compute the conditional distribution (posterior) of the state $X_{t}$ given the filtration (time history of observations) ${\cal Z}_{t}:=\sigma(Z_{s},0\leq s\leq t)$ . The posterior distribution at time $t$ is denoted as $\pi_{t}$ . It is an element of ${\cal P}(\mathbb{S})$ . For any integrable function $f:\mathbb{S}\rightarrow\mathbb{R}$ ,

$$\pi_{t}(f):={\sf E}\big(f(X_{t})\mid{\cal Z}_{t}\big) \tag{2}$$

It is well-known that $\pi_{t}(f)$ is the minimum variance estimator of $f(X_{t})$ [16, Lemma 5.1]. The central question of this paper is to formulate the minimum variance estimation as a dual optimal control problem.

These formulations are described in Sec. II-B and Sec. II-C, for the Euclidean and the finite state-space settings, respectively. The duality principle is then presented in Sec. II-D. The relationship to the well known linear Gaussian case is discussed in Sec. II-E.

II-B Itô diffusion on the Euclidean space

The state process $X$ evolves on $\mathbb{S}=\mathbb{R}^{d}$ according to the Itô stochastic differential equation (SDE)

$$\mathrm{d}X_{t}=a(X_{t})\,\mathrm{d}t+\sigma(X_{t})\,\mathrm{d}B_{t},\quad X_{0}\sim\pi_{0}$$

where $B=\{B_{t}:\,t\geq 0\}$ is a vector valued standard w.p., and $a(\cdot),\;\sigma(\cdot)$ are $C^{2}$ functions of appropriate dimensions. It is assumed that $W$ , $B$ , $X_{0}$ are mutually independent.
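The state and observation models can be simulated on a time grid. The following is a minimal sketch using the Euler-Maruyama scheme, not something taken from the paper: the function name `simulate`, the step count, and the example drift, diffusion coefficient, and observation function are all illustrative assumptions. The only facts used from the text are the forms $\mathrm{d}X_{t}=a(X_{t})\,\mathrm{d}t+\sigma(X_{t})\,\mathrm{d}B_{t}$ and $\mathrm{d}Z_{t}=h(X_{t})\,\mathrm{d}t+\mathrm{d}W_{t}$, with $W$ of covariance $R$.

```python
import numpy as np

def simulate(a, sigma, h, x0, R, T=1.0, n=1000, seed=0):
    """Euler-Maruyama paths for dX = a(X) dt + sigma(X) dB and dZ = h(X) dt + dW."""
    rng = np.random.default_rng(seed)
    dt = T / n
    x0 = np.asarray(x0, dtype=float)
    p = sigma(x0).shape[1]               # dimension of the Wiener process B
    m = R.shape[0]
    chol_R = np.linalg.cholesky(R)       # W has covariance R per unit time
    X = np.zeros((n + 1, x0.size))
    Z = np.zeros((n + 1, m))             # Z_0 = 0 by the integral form of the model
    X[0] = x0
    for k in range(n):
        dB = np.sqrt(dt) * rng.standard_normal(p)
        dW = np.sqrt(dt) * (chol_R @ rng.standard_normal(m))
        X[k + 1] = X[k] + a(X[k]) * dt + sigma(X[k]) @ dB
        Z[k + 1] = Z[k] + h(X[k]) * dt + dW
    return X, Z

# Illustrative model (all choices hypothetical): linear drift, constant sigma, nonlinear h.
a = lambda x: -x
sigma = lambda x: 0.5 * np.eye(2)
h = lambda x: np.array([np.sin(x[0])])
R = np.array([[0.1]])
X, Z = simulate(a, sigma, h, x0=[1.0, -1.0], R=R)
```

The Cholesky factor of $R$ is used so that the simulated increments of $W$ have covariance $R\,\mathrm{d}t$; any other square root of $R$ would serve equally well.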

The differential generator of $X$ , denoted as ${\cal A}$ , acts on $C^{2}$ functions in its domain according to

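$$({\cal A}y)(x):=a^{\top}(x)\frac{\partial y}{\partial x}(x)+\tfrac{1}{2}\mbox{tr}\Big(\sigma(x)\sigma^{\top}(x)\frac{\partial^{2}y}{\partial x^{2}}(x)\Big)$$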

It is assumed that ${\cal A}$ is an elliptic operator: there is $\varepsilon>0$ such that $\sigma(x)\sigma^{\top}(x)\geq\varepsilon I$ for all $x\in\mathbb{R}^{d}$ .

For functions $y\in C^{1}(\mathbb{R}^{d};\mathbb{R})$ , $v\in C(\mathbb{R}^{d};\mathbb{R}^{m})$ , $u\in\mathbb{R}^{m}$ , and $x\in\mathbb{R}^{d}$ , a cost function is defined as follows:

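$$\ell(y,v,u\,;x)=\tfrac{1}{2}\Big|\sigma^{\top}(x)\frac{\partial y}{\partial x}(x)\Big|^{2}+\tfrac{1}{2}(u+v(x))^{\top}R(u+v(x)) \tag{3}$$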

Dual optimal control problem:

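$$\mathop{\text{Min}}_{U\in\,{\cal U}}\ {\sf J}(U)={\sf E}\Big(\tfrac{1}{2}|Y_{0}(X_{0})-\pi_{0}(Y_{0})|^{2}+\int_{0}^{T}\ell(Y_{t},V_{t},U_{t}\,;X_{t})\,\mathrm{d}t\Big) \tag{4a}$$

$$\begin{aligned}\text{Subj.}\ \ \mathrm{d}Y_{t}(x)&=-\big(({\cal A}Y_{t})(x)+h^{\top}(x)(U_{t}+V_{t}(x))\big)\,\mathrm{d}t+V_{t}^{\top}(x)\,\mathrm{d}Z_{t},\\ Y_{T}(x)&=f(x),\quad\forall\,x\in\mathbb{R}^{d}\end{aligned} \tag{4b}$$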

The constraint (4b) is a backward stochastic partial differential equation (BSPDE) with boundary condition prescribed at the terminal time $T$ . The function $f$ appearing in this boundary condition is allowed to be random, with $f\in L^{2}_{{\cal Z}_{T}}(L^{2}(\mathbb{R}^{d};\mathbb{R}))$ .

The admissible set of control input is as follows:

$${\cal U}:=L_{{\cal Z}}^{2}([0,T]\,;\mathbb{R}^{m})$$

The solution $(Y,V):=\{(Y_{t}(x),V_{t}(x))\,:\,0\leq t\leq T,\;x\in\mathbb{R}^{d}\}$ of the BSPDE is adapted to the filtration ${\cal Z}$. It is an element of $L^{2}_{{\cal Z}}([0,T];L^{2}(\mathbb{R}^{d};\mathbb{R}))\times L^{2}_{{\cal Z}}([0,T];L^{2}(\mathbb{R}^{d};\mathbb{R}^{m}))$; cf. [17].

II-C Finite state-space

The continuous-time process evolves on the finite state space $\mathbb{S}=\{e_{1},\ldots,e_{d}\}$ . Its statistics are characterized by the initial distribution $\pi_{0}\in{\cal P}(\mathbb{S})$ and the row-stochastic rate matrix $A\in\mathbb{R}^{d\times d}$ .

The dual of ${\cal P}(\mathbb{S})$ is the space of all functions on $\mathbb{S}$ , which can be identified with $\mathbb{R}^{d}$ : Any function $y:\mathbb{S}\to\mathbb{R}$ is determined by its values at the basis vectors $\{e_{i}\}$ , and for any $\pi\in{\cal P}(\mathbb{S})$ , the expectation can be expressed as a dot product: $\pi(y)=\sum\pi(e_{i})y(e_{i})$ . Similarly, the observation function can be expressed $h(x)=H^{\top}x$ , $x\in\mathbb{S}$ , where $H\in\mathbb{R}^{d\times m}$ .

We also define a $d\times d$ matrix for each $i$ as follows:

$$Q(e_{i}):=\sum_{j=1}^{d}A_{ij}(e_{j}-e_{i})(e_{j}-e_{i})^{\top},\quad i=1,\ldots,d \tag{5}$$

The cost function in this case is defined as

$$\ell(y,v,u\,;x)=\tfrac{1}{2}y^{\top}Q(x)y+\tfrac{1}{2}(u+v^{\top}x)^{\top}R(u+v^{\top}x)$$

where $y\in\mathbb{R}^{d}$ , $v\in\mathbb{R}^{d\times m}$ , $u\in\mathbb{R}^{m}$ , and $x\in\mathbb{S}\subset\mathbb{R}^{d}$ .
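To make the finite-state cost concrete, here is a small sketch that builds $Q(e_{i})=\sum_{j}A_{ij}(e_{j}-e_{i})(e_{j}-e_{i})^{\top}$ and evaluates $\ell(y,v,u\,;e_{i})=\tfrac{1}{2}y^{\top}Q(e_{i})y+\tfrac{1}{2}(u+v^{\top}e_{i})^{\top}R(u+v^{\top}e_{i})$. The helper names `Q_matrix` and `running_cost` and the 3-state matrix are hypothetical. The example assumes the usual generator convention for the rate matrix: nonnegative off-diagonal entries and rows summing to zero.

```python
import numpy as np

def Q_matrix(A, i):
    """Q(e_i) = sum_j A_ij (e_j - e_i)(e_j - e_i)^T for a rate matrix A."""
    d = A.shape[0]
    I = np.eye(d)
    Q = np.zeros((d, d))
    for j in range(d):
        diff = I[j] - I[i]               # e_j - e_i (zero when j == i)
        Q += A[i, j] * np.outer(diff, diff)
    return Q

def running_cost(y, v, u, i, A, R):
    """ell(y, v, u; e_i) = 1/2 y^T Q(e_i) y + 1/2 (u + v^T e_i)^T R (u + v^T e_i)."""
    w = u + v[i]                         # v^T e_i is the i-th row of v (v has shape d x m)
    return 0.5 * y @ Q_matrix(A, i) @ y + 0.5 * w @ R @ w

# Hypothetical 3-state rate matrix: off-diagonals nonnegative, rows sum to zero.
A = np.array([[-1.0, 0.7, 0.3],
              [0.2, -0.5, 0.3],
              [0.4, 0.6, -1.0]])
```

Because the $j=i$ term vanishes and the off-diagonal rates are nonnegative, each $Q(e_{i})$ is positive semidefinite, and $y^{\top}Q(e_{i})y=\sum_{j}A_{ij}(y_{j}-y_{i})^{2}$.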

Dual optimal control problem:

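$$\mathop{\text{Min}}_{U\in\,{\cal U}}\ {\sf J}(U)={\sf E}\Big(\tfrac{1}{2}|Y_{0}^{\top}(X_{0}-\pi_{0})|^{2}+\int_{0}^{T}\ell(Y_{t},V_{t},U_{t}\,;X_{t})\,\mathrm{d}t\Big) \tag{7a}$$

$$\text{Subj.}\ \ \mathrm{d}Y_{t}=-\big(AY_{t}+HU_{t}+\operatorname{diag}(HV_{t}^{\top})\big)\,\mathrm{d}t+V_{t}\,\mathrm{d}Z_{t},\quad Y_{T}=f \tag{7b}$$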

The constraint is a backward stochastic differential equation (BSDE) with terminal condition as before. We write $Y_{T}=f\in L^{2}_{{\cal Z}_{T}}(\mathbb{R}^{d})$ , with our convention that functions on $\mathbb{S}$ are identified with vectors in $\mathbb{R}^{d}$ .

The admissible set of control input is ${\cal U}=L_{{\cal Z}}^{2}([0,T]\,;{\mathbb{R}^{m}})$ and the solution pair $\{(Y_{t},V_{t})\,:\,0\leq t\leq T\}=:(Y,V)\in L_{{\cal Z}}^{2}([0,T]\,;{\mathbb{R}^{d}})\times L_{{\cal Z}}^{2}([0,T]\,;{\mathbb{R}^{d\times m}})$ ; cf. [18, Ch. 7].

II-D Duality relationship

Suppose $Z$ is defined according to the observation model (1). Consider an admissible control input $U\in{\cal U}$ and define $Y_{0}$ via the solution of the BSDE (Eq. (4b) for the Euclidean case, and Eq. (7b) for the finite case). Assume the following linear structure of the estimator:

$$S_{T}=\pi_{0}(Y_{0})-\int_{0}^{T}U_{t}^{\top}\,\mathrm{d}Z_{t} \tag{8}$$

The duality relationship is expressed in the following proposition, whose proof appears in Appendix A-A:

Proposition 1

Consider the observation model (1), the linear estimator (8), together with the dual optimal control problem. Then for any choice of admissible control $U\in{\cal U}$ :

$${\sf J}(U)=\tfrac{1}{2}{\sf E}\big(|S_{T}-f(X_{T})|^{2}\big)$$

Thus, formally, the problem of obtaining the minimum variance estimate $S_{T}$ of $f(X_{T})$ (minimizer of the right-hand side of the equality) is converted into the problem of finding the optimal control $U$ (minimizer of the left-hand side of the identity). However, there is a subtle problem: It is not a priori clear whether there exists a $U\in{\cal U}$ such that $S_{T}=\pi_{T}(f)$. (This will be true, e.g., if every ${\cal Z}_{T}$-measurable random variable has a representation of the form (8).)

In this paper, the following assumption is made:

Assumption A1: For each fixed terminal time $T>0$ and function $f\in L^{2}_{{{\cal Z}}_{T}}$ , there exists a $U\in{\cal U}$ such that $S_{T}=\pi_{T}(f)$ .

Under this assumption, the following is proved in Appendix A-B:

Proposition 2

Consider the dual optimal control problem. Suppose $U=\{U_{t}:0\leq t\leq T\}$ is the optimal control input and that $Y=\{Y_{t}:0\leq t\leq T\}$ is the associated optimal trajectory obtained as a solution of the BSDE. Then for all $t\in[0,T]$ :

$$\pi_{t}(Y_{t})=\pi_{0}(Y_{0})-\int_{0}^{t}U_{s}^{\top}\,\mathrm{d}Z_{s}$$

II-E Linear-Gaussian case

The linear-Gaussian case assumes the following model:

1. The drift in the Itô diffusion is linear in $x$. That is,

$$a(x)=Ax\quad\text{and}\quad h(x)=Hx$$

where $A\in\mathbb{R}^{d\times d}$ and $H\in\mathbb{R}^{m\times d}$.

2. The coefficient of the process noise is a constant matrix, $\sigma(x)\equiv\sigma$. We denote $Q:=\sigma\sigma^{\top}\in\mathbb{R}^{d\times d}$.

3. The prior $\pi_{0}$ is a Gaussian distribution with mean $m_{0}\in\mathbb{R}^{d}$ and variance $\Sigma_{0}\in\mathbb{R}^{d\times d}$.

Classical Kalman-Bucy duality is concerned with the problem of constructing a minimum variance estimator for the random variable $f^{\top}X_{T}$ where $f\in\mathbb{R}^{d}$ is a given deterministic vector [1]. Therefore, we also have

4. The terminal condition in (4b) is a linear function $f^{\top}x$.

We impose the following restrictions:

1. The control input $U=u$ is restricted to be a deterministic function of time (in particular, it does not depend upon the observations). Such a control is trivially ${\cal Z}$-adapted and hence admissible. For such a control input, with deterministic $f$, the solution $Y=y$ of the BSPDE is a deterministic function of time, and $V=0$. The BSPDE becomes a PDE:

$$\frac{\partial y_{t}}{\partial t}(x)=-({\cal A}y_{t})(x)-h^{\top}(x)u_{t},\quad y_{T}(x)\equiv f^{\top}x\quad\forall\,x\in\mathbb{R}^{d} \tag{10}$$

where the lower-case notation is used to stress the fact that $u$ and $y$ are now deterministic functions of time.

2. Without loss of generality, it suffices to consider the restriction of the optimal control problem (4) to a finite ($d$-) dimensional subspace of the function space:

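$${\cal V}:=\{\tilde{y}:\tilde{y}(x)=y^{\top}x\ \ \forall\,x\in\mathbb{R}^{d}\ \text{where}\ y\in\mathbb{R}^{d}\}$$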

It is easy to see that ${\cal V}$ is an invariant subspace for the dynamics (10). On ${\cal V}$ , the PDE reduces to an ODE:

$$\frac{\mathrm{d}y_{t}}{\mathrm{d}t}=-A^{\top}y_{t}-H^{\top}u_{t},\quad y_{T}=f$$

and the cost function becomes

$$L(y,u):=\tfrac{1}{2}y^{\top}Qy+\tfrac{1}{2}u^{\top}Ru$$

It no longer depends upon $x$ .
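The reduction to the linear subspace can be checked by direct substitution. The following worked sketch assumes the linear-Gaussian model ($a(x)=Ax$, $h(x)=Hx$, constant $\sigma$ with $Q=\sigma\sigma^{\top}$), a deterministic control with $V=0$, and a linear function $\tilde{y}_{t}(x)=y_{t}^{\top}x$:

```latex
\begin{align*}
\tilde{y}_{t}(x) &= y_{t}^{\top}x
  \;\Longrightarrow\;
  \frac{\partial\tilde{y}_{t}}{\partial x}(x)=y_{t},\quad
  \frac{\partial^{2}\tilde{y}_{t}}{\partial x^{2}}(x)=0,\\
({\cal A}\tilde{y}_{t})(x) &= (Ax)^{\top}y_{t}+\tfrac{1}{2}\mbox{tr}\big(\sigma\sigma^{\top}\cdot 0\big)=(A^{\top}y_{t})^{\top}x,\\
h^{\top}(x)\,u_{t} &= (Hx)^{\top}u_{t}=(H^{\top}u_{t})^{\top}x,\\
\frac{\partial\tilde{y}_{t}}{\partial t}(x) &= -\big(A^{\top}y_{t}+H^{\top}u_{t}\big)^{\top}x
  \;\Longrightarrow\;
  \frac{\mathrm{d}y_{t}}{\mathrm{d}t}=-A^{\top}y_{t}-H^{\top}u_{t},\\
\ell(\tilde{y}_{t},0,u_{t}\,;x) &= \tfrac{1}{2}\big|\sigma^{\top}y_{t}\big|^{2}+\tfrac{1}{2}u_{t}^{\top}Ru_{t}
  =\tfrac{1}{2}y_{t}^{\top}Qy_{t}+\tfrac{1}{2}u_{t}^{\top}Ru_{t}.
\end{align*}
```

Every term on the right-hand side of the PDE is again linear in $x$, which is why the subspace is invariant, and the gradient of a linear function is constant, which is why the cost loses its dependence on $x$.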

In summary, the optimal control problem (4) reduces to the deterministic LQ problem of classical duality:

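$$\mathop{\text{Minimize}}_{u}:\quad{\sf J}(u)=\tfrac{1}{2}y_{0}^{\top}\Sigma_{0}y_{0}+\int_{0}^{T}L(y_{t},u_{t})\,\mathrm{d}t$$

$$\text{Subject to}:\quad\frac{\mathrm{d}y_{t}}{\mathrm{d}t}=-A^{\top}y_{t}-H^{\top}u_{t},\quad y_{T}=f$$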

The solution of the optimal control problem yields the optimal control input $u$ , along with the vector $y_{0}$ that determines the minimum-variance estimator:

[TABLE]

The Kalman filter is obtained by expressing $\{S_{t}(f):t\geq 0,\ f\in\mathbb{R}^{d}\}$ as the solution to a linear SDE [2, Ch. 7].
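The classical duality can be checked numerically. The sketch below treats the deterministic LQ problem with dynamics $\frac{\mathrm{d}y_{t}}{\mathrm{d}t}=-A^{\top}y_{t}-H^{\top}u_{t}$, $y_{T}=f$, and cost $\tfrac{1}{2}y_{0}^{\top}\Sigma_{0}y_{0}+\int_{0}^{T}\big(\tfrac{1}{2}y_{t}^{\top}Qy_{t}+\tfrac{1}{2}u_{t}^{\top}Ru_{t}\big)\,\mathrm{d}t$. It relies on two standard facts that are assumptions here rather than statements quoted from the text: the covariance $\Sigma_{t}$ solves the forward Riccati equation $\dot{\Sigma}=A\Sigma+\Sigma A^{\top}+Q-\Sigma H^{\top}R^{-1}H\Sigma$, and completing the square gives the feedback $u_{t}=-R^{-1}H\Sigma_{t}y_{t}$ with optimal cost $\tfrac{1}{2}f^{\top}\Sigma_{T}f$. The function name, the Euler discretization, and the example matrices are illustrative.

```python
import numpy as np

def dual_lq_cost(A, H, Q, R, Sigma0, f, T=1.0, n=4000, use_optimal=True):
    """Return (J(u), 0.5 * f^T Sigma_T f) for the deterministic dual LQ problem."""
    dt = T / n
    R_inv = np.linalg.inv(R)
    # Forward Euler for the filter Riccati equation, starting from Sigma_0.
    Sigmas = [Sigma0.copy()]
    for _ in range(n):
        S = Sigmas[-1]
        Sigmas.append(S + dt * (A @ S + S @ A.T + Q - S @ H.T @ R_inv @ H @ S))
    # Explicit Euler sweep, backward in time, for dy/dt = -A^T y - H^T u from y_T = f.
    y, cost = f.astype(float).copy(), 0.0
    for k in range(n, 0, -1):
        u = -R_inv @ H @ Sigmas[k] @ y if use_optimal else np.zeros(H.shape[0])
        cost += 0.5 * dt * (y @ Q @ y + u @ R @ u)
        y = y + dt * (A.T @ y + H.T @ u)     # y_{t-dt} = y_t - dt * (dy/dt)
    cost += 0.5 * y @ Sigma0 @ y
    return cost, 0.5 * f @ Sigmas[n] @ f

# Hypothetical two-dimensional example with a scalar observation.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
H = np.array([[1.0, 0.0]])
Q = 0.1 * np.eye(2)
R = np.array([[0.5]])
Sigma0 = np.eye(2)
f = np.array([1.0, 1.0])
cost_opt, half_var = dual_lq_cost(A, H, Q, R, Sigma0, f)
cost_zero, _ = dual_lq_cost(A, H, Q, R, Sigma0, f, use_optimal=False)
```

Up to discretization error, `cost_opt` should agree with `half_var`, which is half the variance of the error in estimating $f^{\top}X_{T}$. The zero control corresponds to ignoring the observations, so `cost_zero` should be larger.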

Remark 1

Consider the dual optimal control problem (7) for the finite state space model. Suppose one only allows control inputs $U$ that are deterministic functions of time. In this case, $Y$ is a deterministic function of time and $V=0$. Consequently, the objective function in (7a) simplifies to

$${\sf J}(U)=\tfrac{1}{2}Y_{0}^{\top}\Sigma_{0}Y_{0}+\int_{0}^{T}\Big(\tfrac{1}{2}U_{t}^{\top}RU_{t}+\tfrac{1}{2}Y_{t}^{\top}{\sf E}(Q(X_{t}))Y_{t}\Big)\,\mathrm{d}t$$

where $\Sigma_{0}:={\sf E}((X_{0}-\pi_{0})(X_{0}-\pi_{0})^{\top})$ and ${\sf E}(Q(X))$ is the quadratic variation process for $X$. The resulting problem is a deterministic LQ problem whose optimal solution $\{U_{t}:0\leq t\leq T\}$ will (in general) yield a sub-optimal estimate $S_{T}$ using (8). In Appendix A-C, the solution is used to derive a Kalman filter for the Markov chain. Such sub-optimal filters for Markov chains have been applied in [19, 20].

We have now set the stage to derive the nonlinear filter via the solution to the dual optimal control problem. We describe the solution for the finite state case first in Sec. III. Although technically and notationally more challenging, the considerations for the Euclidean case are entirely analogous and described in Sec. IV.

III Solution for the finite-state case

III-A Standard form of the optimal control problem

Consider the dual optimal control problem (7) associated with the finite state space model. There is only one natural filtration in this problem, the filtration ${\cal Z}$ generated by the observation process.

The ‘state’ of the optimal control problem $(Y,V)$ is adapted to the filtration by construction. So, the problem is fully observed. However, the problem is not in its standard form [21, Def. 5.4].

There are two issues:

1. The stochastic process $Z$ on the right-hand side of the BSDE (7b) is not a Wiener process.

2. The cost function in (7a) depends upon the exogenous process $X$, which is not adapted to ${\cal Z}$.

In order to resolve these issues and express the optimal control problem in a standard form, we introduce three stochastic processes $\pi:=\{\pi_{t}\in{\cal P}(\mathbb{S})\subset\mathbb{R}^{d}:0\leq t\leq T\}$ , $\Sigma:=\{\Sigma_{t}\in\mathbb{R}^{d\times d}:0\leq t\leq T\}$ , and ${\sf I}:=\{{\sf I}_{t}\in\mathbb{R}^{m}:0\leq t\leq T\}$ as follows:

[TABLE]

From the standard filtering theory, it is well known that (i) ${\sf I}$ is a Wiener process that is adapted to ${\cal Z}$ [16, Lemma 5.6], and (ii) the filtration $\sigma({\sf I}_{s}:0\leq s\leq t)$ generated by the innovation process equals ${\cal Z}_{t}$ for all $t\in[0,T]$ [22].

We therefore express the BSDE constraint as

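$$\mathrm{d}Y_{t}=-\big(AY_{t}+HU_{t}+\operatorname{diag}(HV_{t}^{\top})-V_{t}H^{\top}\pi_{t}\big)\,\mathrm{d}t+V_{t}\,\mathrm{d}{\sf I}_{t}$$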

The solution $(Y,V)$ is adapted to ${\cal Z}$ , now interpreted as the filtration generated by the innovation process.

In order to remove the explicit dependence of the cost function on the non-adapted process $X$ , we use the tower property of the conditional expectation:

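$${\sf J}(U)=\tfrac{1}{2}{\sf E}\big(|Y_{0}(X_{0})-\pi_{0}(Y_{0})|^{2}\big)+{\sf E}\Big(\int_{0}^{T}{\sf E}\big(\ell(Y_{t},V_{t},U_{t}\,;X_{t})\mid{\cal Z}_{t}\big)\,\mathrm{d}t\Big)$$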

Denote ${\cal{L}}(Y_{t},V_{t},U_{t}\,;\pi_{t})\mathbin{:=}{\sf E}(\ell(Y_{t},V_{t},U_{t}\,;X_{t})|{\cal Z}_{t})$ .

Because $Y,V,U$ are all ${\cal Z}$ -adapted, it is a straightforward calculation to see that

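$${\cal L}(y,v,u\,;\mu)=\tfrac{1}{2}y^{\top}\mu(Q)y+\tfrac{1}{2}u^{\top}Ru+u^{\top}Rv^{\top}\mu+\tfrac{1}{2}\mu^{\top}\operatorname{diag}(vRv^{\top})$$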

where $\mu(Q):=\operatorname{diag}^{\dagger}(A^{\top}\mu)-A^{\top}\operatorname{diag}^{\dagger}(\mu)-\operatorname{diag}^{\dagger}(\mu)A$ . The function ${\cal{L}}$ is the control Lagrangian.
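The "straightforward calculation" behind the control Lagrangian can be checked numerically. The sketch below compares the closed form ${\cal L}(y,v,u\,;\mu)=\tfrac{1}{2}y^{\top}\mu(Q)y+\tfrac{1}{2}u^{\top}Ru+u^{\top}Rv^{\top}\mu+\tfrac{1}{2}\mu^{\top}\operatorname{diag}(vRv^{\top})$ against the direct average $\sum_{i}\mu_{i}\,\ell(y,v,u\,;e_{i})$. The function names and the numerical values are hypothetical. The rate matrix is assumed to have rows summing to zero and $R$ is assumed symmetric, both of which the identity needs.

```python
import numpy as np

def mu_Q(A, mu):
    """mu(Q) = diag†(A^T mu) - A^T diag†(mu) - diag†(mu) A."""
    D = np.diag(mu)
    return np.diag(A.T @ mu) - A.T @ D - D @ A

def lagrangian(y, v, u, mu, A, R):
    """Closed form of E(ell | Z_t) when the posterior is mu."""
    return (0.5 * y @ mu_Q(A, mu) @ y + 0.5 * u @ R @ u
            + u @ R @ v.T @ mu + 0.5 * mu @ np.diag(v @ R @ v.T))

def lagrangian_by_averaging(y, v, u, mu, A, R):
    """Direct average sum_i mu_i * ell(y, v, u; e_i)."""
    d = len(mu)
    I = np.eye(d)
    total = 0.0
    for i in range(d):
        Qi = sum(A[i, j] * np.outer(I[j] - I[i], I[j] - I[i]) for j in range(d))
        w = u + v.T @ I[i]               # u + v^T e_i
        total += mu[i] * (0.5 * y @ Qi @ y + 0.5 * w @ R @ w)
    return total

# Hypothetical data: 3 states, 2 observation channels.
rng = np.random.default_rng(1)
A = np.array([[-1.0, 0.7, 0.3],
              [0.2, -0.5, 0.3],
              [0.4, 0.6, -1.0]])
R = np.array([[0.5, 0.1], [0.1, 0.3]])
mu = np.array([0.2, 0.3, 0.5])
y, v, u = rng.standard_normal(3), rng.standard_normal((3, 2)), rng.standard_normal(2)
closed_form = lagrangian(y, v, u, mu, A, R)
averaged = lagrangian_by_averaging(y, v, u, mu, A, R)
```

The two values agree because $\sum_{i}\mu_{i}Q(e_{i})$ collapses to the stated expression for $\mu(Q)$ once $\sum_{j}A_{ij}=0$ is used.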

We are now ready to state the standard form of the optimal control problem for the finite state-space case:

Dual optimal control problem (standard form):

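$$\mathop{\text{Min}}_{U\in\,{\cal U}}\ {\sf J}(U)={\sf E}\Big(\tfrac{1}{2}Y_{0}^{\top}\Sigma_{0}Y_{0}+\int_{0}^{T}{\cal L}(Y_{t},V_{t},U_{t}\,;\pi_{t})\,\mathrm{d}t\Big) \tag{13a}$$

$$\text{Subj.}\ \ \mathrm{d}Y_{t}=-\big(AY_{t}+HU_{t}+\operatorname{diag}(HV_{t}^{\top})-V_{t}H^{\top}\pi_{t}\big)\,\mathrm{d}t+V_{t}\,\mathrm{d}{\sf I}_{t},\quad Y_{T}=f \tag{13b}$$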

In its standard form, all the processes are adapted to the filtration ${\cal Z}$ and moreover the stochastic process ${\sf I}$ on the right-hand side of the BSDE is a Wiener process with respect to this filtration.

III-B Solution using the maximum principle

The Hamiltonian ${\cal H}:\mathbb{R}^{d}\times\mathbb{R}^{d\times m}\times\mathbb{R}^{m}\times\mathbb{R}^{d}\times{\cal P}(\mathbb{S})\to\mathbb{R}$ is defined as follows:

[TABLE]

A characterization of the optimal input is contained in the following theorem. Its proof, based on an application of the maximum principle [23], appears in Appendix A-D.

Theorem 1

Consider the optimal control problem (13). Suppose $U=\{U_{t}:0\leq t\leq T\}$ is the optimal control input and that $(Y,V)=\{(Y_{t},V_{t}):0\leq t\leq T\}$ is the associated optimal trajectory obtained as a solution of the BSDE (13b). Then there exists a ${\cal Z}$ -adapted vector valued process $P=\{P_{t}:0\leq t\leq T\}$ such that

[TABLE]

where $P$ and $Y$ satisfy

[TABLE]

Remark 2

From linear optimal control theory, it is known that $P_{t}=M_{t}Y_{t}$ (see [18, Sec. 6.6]) where $M:=\{M_{t}\in\mathbb{R}^{d\times d}:0\leq t\leq T\}$ is a ${\cal Z}$ -adapted matrix-valued process. The boundary condition $P_{0}=\Sigma_{0}Y_{0}$ suggests that $M=\Sigma$ . This is indeed the case as shown in the proof of Theorem 2, where the following equation is derived (see (25)):

[TABLE]

This is the DRE of the nonlinear filter.

III-C Derivation of the nonlinear filter

From Prop. 2, using the formula (14) for the optimal control,

[TABLE]

This formula is used to derive the Wonham filter [24]. The proof of the following theorem appears in Appendix A-E.

Theorem 2 (Nonlinear filter)

Consider the optimal estimator (16) where $(Y,V)$ and $P$ solve Hamilton’s equations (15). Then

[TABLE]

and furthermore $P_{t}=\Sigma_{t}Y_{t}$ for all $t\in[0,T]$.
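For readers who want to experiment, the following is a minimal sketch of one Euler step of the Wonham filter. The displayed filter equation is not reproduced in this text, so the form used here is the standard one and should be read as an assumption: $\mathrm{d}\pi_{t}=A^{\top}\pi_{t}\,\mathrm{d}t+\big(\operatorname{diag}^{\dagger}(\pi_{t})-\pi_{t}\pi_{t}^{\top}\big)HR^{-1}\big(\mathrm{d}Z_{t}-H^{\top}\pi_{t}\,\mathrm{d}t\big)$, with $h(x)=H^{\top}x$ and $H\in\mathbb{R}^{d\times m}$ as in Sec. II-C. The function name, the clipping safeguard, and the example matrices are hypothetical.

```python
import numpy as np

def wonham_step(pi, dZ, A, H, R, dt):
    """One Euler step of d pi = A^T pi dt + (diag(pi) - pi pi^T) H R^{-1} (dZ - H^T pi dt)."""
    Sigma = np.diag(pi) - np.outer(pi, pi)     # conditional covariance of X given pi
    innov = dZ - (H.T @ pi) * dt               # innovation increment
    pi_new = pi + (A.T @ pi) * dt + Sigma @ H @ np.linalg.solve(R, innov)
    pi_new = np.clip(pi_new, 0.0, None)        # guard against Euler overshoot (not part of the filter)
    return pi_new / pi_new.sum()

# Hypothetical 3-state chain with a scalar observation h(e_i) = H[i].
A = np.array([[-1.0, 0.7, 0.3],
              [0.2, -0.5, 0.3],
              [0.4, 0.6, -1.0]])
H = np.array([[0.0], [1.0], [2.0]])
R = np.array([[0.25]])
dt = 0.01
pi0 = np.array([0.2, 0.3, 0.5])
pi1 = wonham_step(pi0, np.array([0.03]), A, H, R, dt)
```

Both the drift $A^{\top}\pi$ and the gain $\big(\operatorname{diag}^{\dagger}(\pi)-\pi\pi^{\top}\big)H$ sum to zero over the states, so the Euler update preserves total probability. The clipping and renormalization only matter when a large increment pushes an entry negative.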

IV Solution for the Euclidean case

As in the finite state-space case, the starting point is the definition of the Lagrangian: ${\cal{L}}(Y_{t},V_{t},U_{t}\,;\pi_{t}):={\sf E}(\ell(Y_{t},V_{t},U_{t}\,;X_{t})|{\cal Z}_{t})$ where the cost function $\ell$ for the Euclidean case is defined in (3) and $\pi_{t}$ is the conditional distribution at time $t$ defined according to (2). Explicitly,

[TABLE]

The standard form of the optimal control problem is as follows:

Dual optimal control problem (standard form):

[TABLE]

Assumption A2: The (generic) measure $\mu$ is absolutely continuous with respect to Lebesgue measure.

Notation: The Radon-Nikodym derivative is denoted as $\tilde{\mu}(x):=\frac{\,\mathrm{d}\mu}{\,\mathrm{d}x}(x)$. Consequently, $\mu(f)=\int_{\mathbb{R}^{d}}f(x)\tilde{\mu}(x)\,\mathrm{d}x=:\langle\tilde{\mu},f\rangle$. In the remainder of this section, with a slight abuse of notation, we will drop the tilde to simply write $\langle\mu,f\rangle$.

The co-state $p\in L^{2}(\mathbb{R}^{d};\mathbb{R})$ and the Hamiltonian are defined as follows:

[TABLE]

Hamilton’s equations are described in the following theorem, whose proof appears in Appendix A-F:

Theorem 3

Consider the optimal control problem (18). Suppose $U=\{U_{t}:0\leq t\leq T\}$ is the optimal control input and that $(Y,V)=\{(Y_{t},V_{t}):0\leq t\leq T\}$ is the associated optimal solution obtained by solving the BSPDE (18b). Then there exists a ${\cal Z}$-adapted function-valued process $P=\{P_{t}:0\leq t\leq T\}$ such that

[TABLE]

where the optimal control is given by

[TABLE]

Using the result of Prop. 2, the optimal estimator is

[TABLE]

As before, the formula is used to derive the Kushner filter equation [25]. The proof appears in Appendix A-G:

Theorem 4

Consider the optimal estimator (21) where $(Y,V)$ and $P$ solve Hamilton’s equation (19). Then the conditional density solves the SPDE:

[TABLE]

and furthermore $P_{t}(x)=\pi_{t}(x)\big{(}Y_{t}(x)-\langle\pi_{t},Y_{t}\rangle\big{)}$ for all $x\in\mathbb{R}^{d}$ and $t\in[0,T]$ .

V A martingale characterization

For the finite state-space case, define the function ${\cal V}:\mathbb{R}^{d}\times{\cal P}(\mathbb{R}^{d})\rightarrow\mathbb{R}$ as follows:

[TABLE]

For the Euclidean case, the analogous function is as follows:

[TABLE]

In the statement of the following theorem, $U^{*}$ denotes the optimal control input as defined according to the formula (14) for the finite state-space case and the formula (20) for the Euclidean case. The proof appears in Appendix A-H.

Theorem 5

Suppose $\pi$ is the posterior process and $(Y,V)$ is the ${\cal Z}$ -adapted solution of the dual BSDE. For every input $U\in{\cal U}$ , the process

[TABLE]

is a supermartingale; it is a martingale if and only if $U=U^{*}$ . Consequently,

[TABLE]

with equality if and only if $U=U^{*}$ . Consequently, the right-hand side is the value function for the dual optimal control problem.

Appendix

A-A Proof of Prop. 1

Euclidean case: For a random function $Y_{t}(\,\cdot\,)$ , the Itô-Wentzell theorem [26, Thm. 1.17] gives the formula for the differential $\,\mathrm{d}(Y_{t}(X_{t}))$

[TABLE]

Integrating over $[0,T]$ ,

[TABLE]

Using the formula (8) for the estimator, the error

[TABLE]

Upon squaring and taking an expectation,

[TABLE]

Finite state-space case: A martingale $N=\{N_{t}:t\geq 0\}$ is defined as follows:

[TABLE]

whose quadratic variation is $[N]_{t}=\int_{0}^{t}Q(X_{s})\,\mathrm{d}s$ (see (5) for the definition of $Q$ ). Itô’s product formula gives

[TABLE]

Using this, together with (8) for the estimator, results in the following error equation:

[TABLE]

Upon squaring and taking an expectation,

[TABLE]

A-B Proof of Prop. 2

Suppose $U\in{\cal U}$ . Define

[TABLE]

Using Prop. 1,

[TABLE]

By the dynamic programming principle, if $\{U_{s}:0\leq s\leq T\}$ is an optimal control input over the time-horizon $[0,T]$ then $\{U_{s}:0\leq s\leq t\}$ minimizes the right-hand side over all ${\cal Z}$ -adapted control inputs.

It then follows from Assumption (A1) that $S_{t}=\pi_{t}(Y_{t})$ .

A-C Derivation of the Kalman filter for the Markov process

The optimal control solution is given by

[TABLE]

Upon substituting the solution into the estimator (8)

[TABLE]

where

[TABLE]

Define $\Phi_{t,T}$ to denote the state transition matrix and express the solution as $Y_{t}=\Phi_{t,T}f$ . Thus,

[TABLE]

Noting that time $T$ is arbitrary, upon differentiating with respect to $T$ , one obtains the Kalman filter

[TABLE]

where we have replaced $T$ by $t$ .

A-D Proof of Thm. 1

Equations (15a)-(15c) are Hamilton’s equations for optimal control of a BSDE [23]. Explicitly, the partial derivatives are as follows:

[TABLE]

Using these formulae, the explicit form of Hamilton’s equation is as follows:

[TABLE]

The optimal control is obtained from the maximum principle:

[TABLE]

Since ${\cal H}$ is quadratic in the control input, the explicit formula (14) is obtained by setting the derivative to zero:

[TABLE]
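The last step, setting the derivative of a quadratic to zero, can be illustrated on a generic example. The sketch below uses a concave quadratic $-\tfrac{1}{2}u^{\top}Ru+b^{\top}u$ with hypothetical values of $R$ and $b$. It is not the Hamiltonian of the paper, and the sign convention is an assumption made only for the illustration.

```python
import numpy as np

# Hypothetical coefficients: R is symmetric positive definite, b is arbitrary.
R = np.array([[2.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -3.0])

def quadratic(u):
    """Concave quadratic in the control input u."""
    return -0.5 * u @ R @ u + b @ u

# Stationarity: the gradient -R u + b vanishes, so u* solves R u* = b.
u_star = np.linalg.solve(R, b)

# Any perturbation lowers the value by 0.5 * delta^T R delta.
rng = np.random.default_rng(0)
perturbed_values = [quadratic(u_star + rng.standard_normal(2)) for _ in range(100)]
```

Because the quadratic is strictly concave, the stationary point is the unique maximizer, which is why the first-order condition alone determines the control.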

A-E Proof of Thm. 2

As noted in Remark 2, the optimal control problem for the finite state-space case has a linear structure, and thus $P_{t}=M_{t}Y_{t}$ for some matrix-valued process $\{M_{t}\in\mathbb{R}^{d\times d}:0\leq t\leq T\}$ . The proof is broken into two steps: In step 1, we assume $M_{t}=\Sigma_{t}$ and derive the equation (17) of the Wonham filter. In step 2, we show that this assumption is consistent with the filter.

Step 1: The foregoing implies the following identities

[TABLE]

The first equality is (16), and the second follows from Itô’s product formula.

Hamilton’s equation (22b) gives a formula for $\,\mathrm{d}Y_{t}$ , which when combined with (23) gives

[TABLE]

Upon integrating both sides, one finds that

[TABLE]

is a finite variation process. Therefore, by an application of [27, Theorem 4.8],

[TABLE]

for some yet to be determined process $g$ . Now,

[TABLE]

Therefore, the right-hand side of (24) must be zero:

[TABLE]

This gives the equation of the Wonham filter.
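As a numerical illustration, the sketch below discretizes the Wonham filter with an Euler-Maruyama step. It assumes a scalar observation $\mathrm{d}Z_{t}=h(X_{t})\,\mathrm{d}t+\mathrm{d}W_{t}$ with noise variance $r$ and a rate matrix $A$ whose rows sum to zero. It uses the standard textbook form $\mathrm{d}\pi_{t}=A^{\top}\pi_{t}\,\mathrm{d}t+r^{-1}\pi_{t}\cdot(h-\pi_{t}(h))(\mathrm{d}Z_{t}-\pi_{t}(h)\,\mathrm{d}t)$, whose notation may differ from (17). The function names, the parameter values, and the clipping and renormalization are illustrative choices, not part of the paper.

```python
import numpy as np

def wonham_step(pi, A, h, r, dZ, dt):
    """One Euler-Maruyama step of the Wonham filter (scalar observation)."""
    h_hat = float(h @ pi)                      # pi_t(h)
    innovation = dZ - h_hat * dt               # increment of the innovation process
    pi_new = pi + A.T @ pi * dt + pi * (h - h_hat) * innovation / r
    pi_new = np.clip(pi_new, 1e-12, None)      # guard against discretization overshoot
    return pi_new / pi_new.sum()               # project back onto the simplex

def simulate(A, h, r, pi0, T, dt, seed=0):
    """Simulate the chain and observations, and return the filter path."""
    rng = np.random.default_rng(seed)
    d = len(h)
    n_steps = int(round(T / dt))
    x = rng.choice(d, p=pi0)
    pi = pi0.copy()
    path = np.empty((n_steps + 1, d))
    path[0] = pi
    for k in range(n_steps):
        # Chain transition over dt: jump probabilities A[x, j] * dt for j != x.
        probs = A[x] * dt
        probs[x] = 1.0 + A[x, x] * dt
        x = rng.choice(d, p=probs)
        dZ = h[x] * dt + np.sqrt(r * dt) * rng.standard_normal()
        pi = wonham_step(pi, A, h, r, dZ, dt)
        path[k + 1] = pi
    return path

# Hypothetical two-state example.
A = np.array([[-0.5, 0.5],
              [0.3, -0.3]])
h = np.array([-1.0, 1.0])
path = simulate(A, h, r=0.1, pi0=np.array([0.5, 0.5]), T=1.0, dt=1e-3)
```

The explicit renormalization is a numerical safeguard only. In continuous time the filter preserves the probability simplex without it.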

Step 2: It remains to verify that $P_{t}=\Sigma_{t}Y_{t}$ . Using the equation of the Wonham filter (17) and the definition (12b) for $\Sigma_{t}$ , it is a direct calculation to see that

[TABLE]

The assertion is shown by establishing

[TABLE]

and noting $\Sigma_{0}Y_{0}=P_{0}$ .

The calculation showing (26) is notationally cumbersome but straightforward. It is included in step 3 below.

Step 3: For any two column vectors $a,b\in\mathbb{R}^{d}$, $a\cdot b$ denotes the Hadamard (element-wise) product. For such $a$ and $b$, a straightforward calculation shows that

[TABLE]

Multiplying both sides of the matrix-valued equation (26) by $Y_{t}$ and using the identity with $a=HR^{-1}\,\mathrm{d}{\sf I}_{t}$ and $b={Y}_{t}$ to simplify the right-hand side, one obtains

[TABLE]

Similarly, multiplying both sides of (26) by ${V}_{t}\,\mathrm{d}{\sf I}_{t}$, using the identity with $a=HR^{-1}\,\mathrm{d}{\sf I}_{t}$ and $b={V}_{t}\,\mathrm{d}{\sf I}_{t}$, and applying the Itô rules to simplify the right-hand side, one obtains

[TABLE]

Therefore,

[TABLE]

The right-hand side is identical to the right-hand side of Hamilton’s equation (22a) for $P_{t}$ .

A-F Proof of Thm. 3

Equations (19) are Hamilton’s equations. The Gâteaux differentials of ${\cal H}$ are

[TABLE]

Therefore the explicit formulas for Hamilton’s equations are

[TABLE]

The optimal control is obtained from the maximum principle:

[TABLE]

The explicit formula (20) for the optimal control is obtained by setting the derivative to zero:

[TABLE]

A-G Proof of Thm. 4

The proof parallels that of Theorem 2 for the finite state-space case. In step 1, we derive the equation of the filter by first assuming the following linear relationship between $P_{t}$ and $Y_{t}$:

[TABLE]

In step 2, we verify this relationship is consistent with the filter equation.

Step 1: Suppose (27) is true. The differential form of (21) is given by:

[TABLE]

As in the finite state-space case, the proof approach is to use Hamilton’s equation for $Y_{t}$ to derive the equation for $\pi_{t}$ .

Using the Itô product formula,

[TABLE]

Use Hamilton’s equation (19b) to evaluate $\,\mathrm{d}Y_{t}$ , and equate the resulting expression to the right-hand side of (28) to obtain:

[TABLE]

This is the Euclidean counterpart of (24) in the proof of Theorem 2 in the finite state-space case. The derivation of the filter is now identical.

Step 2: The verification of (27) follows along the same lines as the finite state-space case. It is omitted here.

A-H Proof of Thm. 5

The proof is given for the finite state-space case, in which

[TABLE]

Upon using (12b) for $\Sigma_{t}$ and (17) for the filter, a direct application of the Itô product formula gives:

[TABLE]

Using this formula,

[TABLE]

Therefore, ${\cal M}^{U}$ is always a supermartingale with respect to ${\cal Z}$, and it is a martingale if and only if

[TABLE]

for almost every $t\in[0,T]$ . Consequently,

[TABLE]

Adding ${\sf E}\big(\int_{0}^{T}{\cal L}(Y_{s},V_{s},U_{s}\,;\pi_{s})\,\mathrm{d}s\big)$ to both sides yields the optimality result.

Bibliography

[1] R. E. Kalman and R. S. Bucy, “New results in linear filtering and prediction theory,” Journal of Basic Engineering, vol. 83, no. 1, pp. 95–108, 1961.
[2] K. J. Åström, Introduction to Stochastic Control Theory. Academic Press, 1970.
[3] A. Bensoussan, Estimation and Control of Dynamical Systems. Springer, 2018, vol. 48.
[4] R. E. Mortensen, “Maximum-likelihood recursive nonlinear filtering,” Journal of Optimization Theory and Applications, vol. 2, no. 6, pp. 386–394, 1968.
[5] K. W. Simon and A. R. Stubberud, “Duality of linear estimation and control,” Journal of Optimization Theory and Applications, vol. 6, no. 1, pp. 55–67, 1970.
[6] W. Fleming and S. Mitter, “Optimal control and nonlinear filtering for nondegenerate diffusion processes,” Stochastics, vol. 8, pp. 63–77, 1982.
[7] W. H. Fleming and E. De Giorgi, “Deterministic nonlinear filtering,” Annali della Scuola Normale Superiore di Pisa-Classe di Scienze-Serie IV, vol. 25, no. 3, pp. 435–454, 1997.
[8] G. C. Goodwin, J. A. de Doná, M. M. Seron, and X. W. Zhuo, “Lagrangian duality between constrained estimation and control,” Automatica, vol. 41, no. 6, pp. 935–944, 2005.
