State and Parameter Estimation from Observed Signal Increments

Nikolas Nüsken; Sebastian Reich; Paul J. Rozdeba

arXiv:1903.10717·math.NA·June 26, 2019·Entropy

TL;DR

This paper develops ensemble Kalman-Bucy filter algorithms for simultaneous state and parameter estimation in continuous-time stochastic systems with correlated errors, demonstrated on complex multi-scale models.

Contribution

It introduces new ensemble Kalman-Bucy algorithms tailored for joint state and parameter estimation in correlated error settings.

Findings

01. Effective estimation in multi-scale stochastic models

02. Algorithms handle correlated model and measurement errors

03. Successful application to complex systems

Abstract

The success of the ensemble Kalman filter has triggered a strong interest in expanding its scope beyond classical state estimation problems. In this paper, we focus on continuous-time data assimilation where the model and measurement errors are correlated and both states and parameters need to be identified. Such scenarios arise from noisy and partial observations of Lagrangian particles which move under a stochastic velocity field involving unknown parameters. We take an appropriate class of McKean–Vlasov equations as the starting point to derive ensemble Kalman–Bucy filter algorithms for combined state and parameter estimation. We demonstrate their performance through a series of increasingly complex multi-scale model systems.

Equations

$${\rm d}X_{t}=f(X_{t},a)\,{\rm d}t+G\,{\rm d}W_{t}, \tag{1}$$

$$f(x,a)=f_{0}(x)+B(x)a=f_{0}(x)+\sum_{i=1}^{N_{a}}b_{i}(x)a^{i}, \tag{2}$$

$${\rm d}Y_{t}=H\,{\rm d}X_{t}+R^{1/2}\,{\rm d}V_{t}=Hf(X_{t},a)\,{\rm d}t+HG\,{\rm d}W_{t}+R^{1/2}\,{\rm d}V_{t},\qquad Y_{0}=X_{0}=x_{0}, \tag{3}$$

$$h(x,a)=Hf(x,a) \tag{4}$$

$$E_{t}^{\rm o}:=HGW_{t}+R^{1/2}V_{t} \tag{5}$$

$$C=HGG^{\rm T}H^{\rm T}+R=HQH^{\rm T}+R \tag{6}$$

$$l_{t}(a)=\exp\left(\int_{0}^{t}f(Y_{s},a)^{\rm T}Q^{-1}\,{\rm d}Y_{s}-\frac{1}{2}\int_{0}^{t}f(Y_{s},a)^{\rm T}Q^{-1}f(Y_{s},a)\,{\rm d}s\right) \tag{7}$$

$$\Pi_{t}(a)=\frac{l_{t}(a)\,\Pi_{0}(a)}{\Pi_{0}[l_{t}]} \tag{8}$$

$$\Pi_{0}[l_{t}]=\int_{\mathbb{R}^{N_{a}}}l_{t}(a)\,\Pi_{0}(a)\,{\rm d}a \tag{9}$$

$${\rm d}\Pi_{t}[\phi]=\left(\Pi_{t}[\phi h_{t}]-\Pi_{t}[\phi]\,\Pi_{t}[h_{t}]\right)^{\rm T}Q^{-1}\left({\rm d}Y_{t}-\Pi_{t}[h_{t}]\,{\rm d}t\right) \tag{10}$$

$$h_{t}(a)=f(Y_{t},a), \tag{11}$$

$${\rm d}\widetilde{A}_{t}=K_{t}(\widetilde{A}_{t})\,{\rm d}I_{t}+\Omega_{t}(\widetilde{A}_{t})\,{\rm d}t, \tag{12}$$

$$\nabla\cdot\left(\widetilde{\Pi}_{t}(K_{t}Q)\right)=-\widetilde{\Pi}_{t}\left(h_{t}-\widetilde{\Pi}_{t}[h_{t}]\right)^{\rm T},\qquad \widetilde{\Pi}_{t}={\rm Law}(\widetilde{A}_{t}), \tag{13}$$

$${\rm d}I_{t}={\rm d}Y_{t}-\frac{1}{2}\left(h_{t}(\widetilde{A}_{t})+\widetilde{\Pi}_{t}[h_{t}]\right){\rm d}t, \tag{14}$$

$${\rm d}I_{t}={\rm d}Y_{t}-\left(h_{t}(\widetilde{A}_{t})\,{\rm d}t+G\,{\rm d}W_{t}\right), \tag{15}$$

$$\Omega_{t}^{i}=\frac{1}{2}\sum_{j=1}^{N_{a}}\sum_{k,l=1}^{N_{y}}Q^{kl}K_{t}^{jl}\left(\partial_{j}K_{t}^{ik}\right),\qquad i=1,\ldots,N_{a}. \tag{16}$$

$${\rm d}\widetilde{A}_{t}=K_{t}(\widetilde{A}_{t})\circ{\rm d}I_{t}, \tag{17}$$

$$K_{t}=\nabla\Psi_{t}\,Q^{-1}, \tag{18}$$

$$\nabla\cdot\left(\widetilde{\Pi}_{t}\nabla\Psi_{t}\right)=-\widetilde{\Pi}_{t}\left(h_{t}-\widetilde{\Pi}_{t}[h_{t}]\right)^{\rm T},\qquad \widetilde{\Pi}_{t}[\Psi_{t}]=0, \tag{19}$$

$$\sum_{i=1}^{N_{a}}\sum_{j=1}^{N_{y}}\partial_{i}\left(\widetilde{\Pi}_{t}\left(K_{t}^{ij}Q^{jk}\right)\right)=-\widetilde{\Pi}_{t}\left(h_{t}^{k}-\widetilde{\Pi}_{t}[h_{t}^{k}]\right),\qquad k=1,\ldots,N_{y}, \tag{20}$$

$$\sum_{j=1}^{N_{y}}K_{t}^{ij}(a)\,Q^{jk}=\partial_{i}\psi_{t}^{k}(a),\qquad i=1,\ldots,N_{a},\quad k=1,\ldots,N_{y}. \tag{21}$$

$$K_{t}=P_{t}^{aa}B(Y_{t})^{\rm T}Q^{-1} \tag{22}$$

$${\rm d}\widetilde{A}_{t}=P_{t}^{aa}B(Y_{t})^{\rm T}Q^{-1}\,{\rm d}I_{t} \tag{23}$$

$${\rm d}I_{t}={\rm d}Y_{t}-\left(f_{0}(Y_{t})+\frac{1}{2}B(Y_{t})\left(\widetilde{A}_{t}+\overline{a}_{t}\right)\right){\rm d}t \tag{24}$$

$${\rm d}I_{t}={\rm d}Y_{t}-\left(\left(f_{0}(Y_{t})+B(Y_{t})\widetilde{A}_{t}\right){\rm d}t+G\,{\rm d}W_{t}\right). \tag{25}$$

$$K_{t}=P_{t}^{ah}Q^{-1},\qquad P_{t}^{ah}=\widetilde{\Pi}_{t}\left[(a-\overline{a}_{t})\left(h_{t}(a)-\widetilde{\Pi}_{t}[h_{t}]\right)^{\rm T}\right]=\widetilde{\Pi}_{t}\left[a\left(h_{t}(a)-\widetilde{\Pi}_{t}[h_{t}]\right)^{\rm T}\right] \tag{26}$$

$${\rm d}\widetilde{A}_{t}^{i}=K_{t}^{M}(\widetilde{A}_{t}^{i})\circ{\rm d}I_{t}^{i}, \tag{27}$$

$${\rm d}I_{t}^{i}={\rm d}Y_{t}-\frac{1}{2}\left(h_{t}(\widetilde{A}_{t}^{i})+\overline{h}_{t}^{M}\right){\rm d}t,\qquad \overline{h}_{t}^{M}=\frac{1}{M}\sum_{i=1}^{M}h_{t}(\widetilde{A}_{t}^{i}), \tag{28}$$

$${\rm d}I_{t}^{i}={\rm d}Y_{t}-\left(h_{t}(\widetilde{A}_{t}^{i})\,{\rm d}t+G\,{\rm d}W_{t}^{i}\right), \tag{29}$$

$$K_{t}^{M}=\widehat{P}_{t}^{ah}Q^{-1},\qquad \widehat{P}_{t}^{ah}=\frac{1}{M-1}\sum_{i=1}^{M}\widetilde{A}_{t}^{i}\left(h_{t}(\widetilde{A}_{t}^{i})-\overline{h}_{t}^{M}\right)^{\rm T}, \tag{30}$$

Full text

Keywords: parameter estimation, continuous-time data assimilation, ensemble Kalman filter, correlated noise, multi-scale diffusion processes

Correspondence: [email protected]; Tel.: +49-331-977-1859

1 Introduction

The research presented in this paper has been motivated by the state and parameter estimation problem for particles moving under a stochastic velocity field, with the measurements given by partial and noisy observations of their position increments. If the deterministic contributions to the velocity field are stationary, and the position increments of the moving particle are exactly observed, then one is led to a standard parameter estimation problem for stochastic differential equations (SDEs) (Kutoyants, 2004; Pavliotis, 2014). In Apte et al. (2007), this setting was extended to the case where the deterministic contributions to the velocity field themselves undergo a stochastic time evolution. Furthermore, while continuous-time observations of position increments are the focus of the present study, the assimilation of discrete-time observations of particle positions has been investigated in Salman et al. (2006) and Apte et al. (2008) under a so-called Lagrangian data assimilation setting for atmospheric fluid dynamics.

The assumption of exactly and fully observed position increments is not always realistic, and the case of partial and noisy observations is at the center of the present study. Having access to partial and noisy observations of position increments leads to correlations between the measurement and model errors. The theoretical impact of such correlations on state and parameter estimation problems has been discussed, for example, in Simon (2006) in the context of linear systems, and in Bain and Crisan (2009) for nonlinear systems. One finds in particular that the appropriately adjusted data likelihood involves the gradient of log-densities, which is nontrivial from a computational perspective, and which prevents a straightforward application of standard Markov chain Monte Carlo (MCMC) or sequential Monte Carlo (SMC) methods (Liu, 2001).

In this paper, we instead follow an alternative Monte Carlo approach based on appropriately adjusted McKean–Vlasov filtering equations, an approach pioneered in Crisan and Xiong (2010) in the context of the standard state estimation problem for diffusion processes. We recall that the notion of McKean–Vlasov equations, first studied in McKean (1966), characterises a class of SDEs whose right-hand side depends on the law of the process itself. We rely on a particular formulation of such McKean–Vlasov filtering equations, the so-called feedback particle filters (Yang et al., 2013), utilising stochastic innovation processes (Reich, 2019). Our proposed Monte Carlo formulation avoids the need for estimating log-densities, and can be implemented in a numerically robust manner relying on a generalised ensemble Kalman–Bucy filter approximation applied to an extended state space formulation (Majda and Harlim, 2012). The ensemble Kalman–Bucy filter (Bergemann and Reich, 2012; Taghvaei et al., 2017) has been introduced previously as an extension of the popular ensemble Kalman filter (Majda and Harlim, 2012; Law et al., 2015; Reich and Cotter, 2015) to continuous-time data assimilation under the assumption of uncorrelated measurement and model errors.

We apply the proposed algorithms to a series of state and parameter estimation problems of increasing complexity. First, we study the state and parameter estimation problem for an Ornstein–Uhlenbeck process (Pavliotis, 2014). Two further experiments investigate the behaviour of the filters for reduced model equations, with the data being collected from underlying multi-scale models. There we distinguish between the averaging and homogenisation scenarios (Pavliotis and Stuart, 2008). Finally, we also look at nonparametric drift estimation (Apte et al., 2007), and parameter estimation for the stochastic heat equation (Altmeyer and Reiß, 2019).

2 Mathematical problem formulation

We consider the time evolution of a random state variable $X_{t}\in\mathbb{R}^{N_{x}}$ in $N_{x}$ -dimensional state space, $N_{x}\geq 1$ , as prescribed by an SDE of the form

$${\rm d}X_{t}=f(X_{t},a)\,{\rm d}t+G\,{\rm d}W_{t}, \tag{1}$$

for time $t\geq 0$ , with the drift function $f:\mathbb{R}^{N_{x}}\times\mathbb{R}^{N_{a}}\to\mathbb{R}^{N_{x}}$ depending on $N_{a}\geq 0$ unknown parameters $a=(a^{1},\ldots,a^{N_{a}})^{\rm T}\in\mathbb{R}^{N_{a}}$ . Model errors are represented through standard $N_{w}$ -dimensional Brownian motion $W_{t}$ , $N_{w}\geq 1$ , and a matrix $G\in\mathbb{R}^{N_{x}\times N_{w}}$ . We also introduce the associated model error covariance matrix $Q=GG^{\rm T}$ . We will generally assume that the initial condition $X_{0}$ is fixed, that is, $X_{0}=x_{0}$ a.s. for given $x_{0}\in\mathbb{R}^{N_{x}}$ . In terms of a more specific example, one can think of $X_{t}$ denoting the position of a particle at time $t\geq 0$ moving in $N_{x}=3$ dimensional space under the influence of a stochastic velocity field, with deterministic contributions given by $f$ and stochastic perturbations by $GW_{t}$ . In the case $G=0$ , the SDE (1) reduces to an ordinary differential equation with given initial condition $x_{0}$ .

We assume throughout this paper that (1) possesses unique, strong solutions for all parameter values $a$. See, for example, Pavliotis (2014) for sufficient conditions on the drift function $f$. The distribution of $X_{t}$ is denoted by $\pi_{t}$, which we also abbreviate by $\pi_{t}={\rm Law}(X_{t})$. We use the same notation for measures and their Lebesgue densities, provided they exist.

Example

A wide class of drift functions can be written in the form

$$f(x,a)=f_{0}(x)+B(x)a=f_{0}(x)+\sum_{i=1}^{N_{a}}b_{i}(x)a^{i}, \tag{2}$$

where $f_{0}:\mathbb{R}^{N_{x}}\to\mathbb{R}^{N_{x}}$ is a known drift function, the $b_{i}:\mathbb{R}^{N_{x}}\to\mathbb{R}^{N_{x}}$ , $i=1,\ldots,N_{a}$ , denote appropriate basis functions, and the vector $a=(a^{1},\ldots,a^{N_{a}})^{\rm T}\in\mathbb{R}^{N_{a}}$ contains the unknown parameters of the model. The family $\{b_{i}(x)\}$ of basis functions, which we collect in a matrix-valued function $B(x)=(b_{1}(x),b_{2}(x),\ldots,b_{N_{a}}(x))\in\mathbb{R}^{N_{x}\times N_{a}}$ , could arise from a finite-dimensional truncation of some appropriate Hilbert space $\mathcal{H}$ . See, for example, Papaspiliopoulos et al. (2012) for computational approaches to nonparametric drift estimation using a Galerkin approximation in $\mathcal{H}$ , where the $b_{i}(x)$ become finite element basis functions. Furthermore, the expansion coefficients $\{a^{i}\}$ could be made time-dependent by letting them evolve according to some system of differential equations arising, for example, from the discretisation of an underlying partial differential equation with solutions in $\mathcal{H}$ . See Apte et al. (2007) for specific examples of such a setting. While the present paper focuses on stationary drift functions, that is, the parameters $\{a^{i}\}$ are time-independent, the results from Sections 3 and 5, respectively, can easily be extended to the non-stationary case where the parameters themselves satisfy given evolution equations.
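As a minimal sketch of the parametrisation (2), the following Python snippet assembles $B(x)$ for a scalar state ($N_{x}=1$, $N_{a}=2$). The choices $f_{0}=0$, $b_{1}(x)=-x$ and $b_{2}(x)=1$ are illustrative assumptions and are not taken from the paper; with them, $a=(\theta,\theta\mu)^{\rm T}$ yields the Ornstein–Uhlenbeck drift $-\theta(x-\mu)$.

```python
import numpy as np

def f0(x):
    # Known part of the drift; assumed to vanish in this illustration.
    return np.zeros_like(x)

def B(x):
    # Columns are the (hypothetical) basis functions b_1(x) = -x and b_2(x) = 1,
    # so B(x) has shape (N_x, N_a) = (1, 2).
    return np.column_stack([-x, np.ones_like(x)])

def drift(x, a):
    # f(x, a) = f0(x) + B(x) a, cf. (2).
    return f0(x) + B(x) @ a

theta, mu = 0.5, 2.0
a = np.array([theta, theta * mu])   # parameter vector a = (theta, theta * mu)
x = np.array([3.0])
value = drift(x, a)                 # coincides with -theta * (x - mu)
```

Because the drift is linear in $a$, the same `B` can later be reused to build the observation map $h_{t}(a)=f(Y_{t},a)$.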

Data and an observation model are required in order to perform state and parameter estimation for SDEs of the form (1). In this paper, we assume that we observe partial and noisy increments ${\rm d}Y_{t}$ of the signal $X_{t}$ , that is,

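$${\rm d}Y_{t}=H\,{\rm d}X_{t}+R^{1/2}\,{\rm d}V_{t}=Hf(X_{t},a)\,{\rm d}t+HG\,{\rm d}W_{t}+R^{1/2}\,{\rm d}V_{t},\qquad Y_{0}=X_{0}=x_{0}, \tag{3}$$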

for $t$ in the observation interval $[0,T]$ , $T>0$ , where $H\in\mathbb{R}^{N_{y}\times N_{x}}$ is a given linear operator, $V_{t}$ denotes standard $N_{y}$ -dimensional Brownian motion with $N_{y}\geq 1$ and $R\in\mathbb{R}^{N_{y}\times N_{y}}$ is a covariance matrix. We introduce the observation map

$$h(x,a)=Hf(x,a) \tag{4}$$

for later use. Unless $HG=0$ , we find that the model error $E_{t}^{\rm m}:=GW_{t}$ in (1) and the total observation error

$$E_{t}^{\rm o}:=HGW_{t}+R^{1/2}V_{t} \tag{5}$$

in (3) are correlated. The impact of correlations between the model and measurement errors on the state estimation problem has been discussed by Simon (2006) and Bain and Crisan (2009). Furthermore, such correlations require adjustments to sequential estimation methods (Särkkä, 2013; Law et al., 2015; Reich and Cotter, 2015); these adjustments are the main focus of this paper. We assume throughout this paper that the covariance matrix

$$C=HGG^{\rm T}H^{\rm T}+R=HQH^{\rm T}+R \tag{6}$$

of the observation error (5) is invertible.
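The correlation structure can be checked numerically. The sketch below samples increments of the model error $GW_{t}$ and of the total observation error (5) and compares their sample covariances with $C$ from (6) and with the cross-covariance $QH^{\rm T}$. The matrices $G$, $H$, $R$ and the step size are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: N_x = N_w = 2, only the first component is observed (N_y = 1).
G = np.array([[0.5, 0.0], [0.1, 0.3]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.04]])
R_sqrt = np.linalg.cholesky(R)       # any factor with R_sqrt @ R_sqrt.T == R will do

Q = G @ G.T
C = H @ Q @ H.T + R                  # covariance (6) of the total observation error

dt, n = 1e-2, 20000
dW = rng.normal(scale=np.sqrt(dt), size=(n, 2))
dV = rng.normal(scale=np.sqrt(dt), size=(n, 1))

dEm = dW @ G.T                       # increments of the model error G W_t
dEo = dEm @ H.T + dV @ R_sqrt.T      # increments of the observation error (5)

cov_oo = dEo.T @ dEo / (n * dt)      # sample estimate of C
cov_mo = dEm.T @ dEo / (n * dt)      # sample estimate of the cross-covariance Q H^T
```

The cross-covariance `cov_mo` vanishes only if $HG=0$, in line with the statement above.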

The special case $R=0$ and $H=I$ leads to a pure parameter estimation problem, which has been extensively studied in the literature in the settings of maximum likelihood and Bayesian estimators (Kutoyants, 2004; Pavliotis, 2014). In Section 3, we provide a reformulation of the Bayesian approach in the form of McKean–Vlasov equations in the parameters, based on the results in Crisan and Xiong (2010) and Yang et al. (2013).

If $R\not=0$ , then (1) and (3) lead to a combined state and parameter estimation problem with correlated noise terms. We will first discuss the impact of this correlation on the pure state estimation problem in Section 4 assuming that the parameters of the problem are known. Again, we will derive appropriate McKean–Vlasov equations in the state variables. Our key contribution is a formulation that avoids the need for log-density estimates, and can be put into an appropriately generalised ensemble Kalman–Bucy filter approximation framework (Bergemann and Reich, 2012; Taghvaei et al., 2017). We also formally demonstrate that the McKean–Vlasov filter equation reduces to ${\rm d}X_{t}={\rm d}Y_{t}$ in the limit $R\to 0$ and $H=I$ , a property which is less straightforward to demonstrate for filter formulations involving log-densities.

These McKean–Vlasov equations can be generalised to the combined state and parameter estimation problem via an augmentation of state space (Majda and Harlim, 2012) in Section 5. Given the results from Section 4, such an extension is rather straightforward.

The numerical experiments in Section 6 rely exclusively on the generalised ensemble Kalman–Bucy filter approximation to the McKean–Vlasov equations, which are easy to implement and yield robust and accurate numerical results.

3 Parameter estimation from noiseless data

In this section, we treat the simpler Bayesian parameter estimation problem which arises from setting $R=0$ and $H=I$ in (3), that is, $N_{y}=N_{x}$. This leads to ${\rm d}X_{t}={\rm d}Y_{t}$ and, furthermore, $X_{t}=Y_{t}$ for all $t\in[0,T]$, provided $X_{0}=Y_{0}=x_{0}$, which we assume throughout this paper. Invertibility of $C=Q$ requires that $G$ has rank $N_{x}$, and hence that $N_{w}\geq N_{x}$ in (1). The data likelihood

$$l_{t}(a)=\exp\left(\int_{0}^{t}f(Y_{s},a)^{\rm T}Q^{-1}\,{\rm d}Y_{s}-\frac{1}{2}\int_{0}^{t}f(Y_{s},a)^{\rm T}Q^{-1}f(Y_{s},a)\,{\rm d}s\right) \tag{7}$$

thus follows from the observation model with additive Brownian noise in (3). Given a prior distribution $\Pi_{0}(a)$ for the parameters, the resulting posterior distribution at any time $t\in(0,T]$ is

$$\Pi_{t}(a)=\frac{l_{t}(a)\,\Pi_{0}(a)}{\Pi_{0}[l_{t}]} \tag{8}$$

according to Bayes’ theorem (Bain and Crisan, 2009). Here, we have introduced the shorthand

$$\Pi_{0}[l_{t}]=\int_{\mathbb{R}^{N_{a}}}l_{t}(a)\,\Pi_{0}(a)\,{\rm d}a \tag{9}$$

for the expectation of $l_{t}$ with respect to $\Pi_{0}$. It is well known that the posterior distributions $\Pi_{t}$ satisfy the stochastic partial differential equation

$${\rm d}\Pi_{t}[\phi]=\left(\Pi_{t}[\phi h_{t}]-\Pi_{t}[\phi]\,\Pi_{t}[h_{t}]\right)^{\rm T}Q^{-1}\left({\rm d}Y_{t}-\Pi_{t}[h_{t}]\,{\rm d}t\right) \tag{10}$$

with time-dependent observation map

$$h_{t}(a)=f(Y_{t},a), \tag{11}$$

where $\phi:\mathbb{R}^{N_{a}}\to\mathbb{R}$ is a compactly supported smooth test function, and $\Pi_{t}[\phi]$ again denotes the expectation of $\phi$ with respect to $\Pi_{t}$. See Bain and Crisan (2009) for a detailed discussion. Equation (10) constitutes a special instance of the well-known Kushner–Stratonovich equation from time-continuous filtering (Bain and Crisan, 2009).
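To illustrate the likelihood (7) and the posterior (8)–(9), the following sketch evaluates a left-point (Itô) discretisation of the exponent in (7) on a parameter grid. The scalar drift $f(y,a)=-ay$, the flat prior on the grid, and all numerical values are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from dX = -a X dt + sigma dW with Y = X (R = 0, H = I).
a_true, sigma, dt, n = 1.0, 1.0, 1e-2, 20000
xi = rng.normal(size=n)
Y = np.empty(n + 1)
Y[0] = 0.0
for k in range(n):
    Y[k + 1] = Y[k] - a_true * Y[k] * dt + sigma * np.sqrt(dt) * xi[k]
dY = np.diff(Y)
q = sigma**2                                   # Q = G G^T (scalar here)

def log_likelihood(a):
    # Left-point discretisation of the exponent in (7) with f(y, a) = -a y.
    f = -a * Y[:-1]
    return np.sum(f * dY) / q - 0.5 * np.sum(f * f) * dt / q

grid = np.linspace(0.0, 3.0, 601)
log_prior = np.zeros_like(grid)                # flat prior on the grid (assumption)
log_post = np.array([log_likelihood(a) for a in grid]) + log_prior
post = np.exp(log_post - log_post.max())       # subtract the maximum for stability
post /= post.sum()                             # discrete analogue of the normalisation (9)

a_map = grid[np.argmax(post)]
# For this drift the exponent is quadratic in a, with maximiser:
a_mle = -np.sum(Y[:-1] * dY) / (np.sum(Y[:-1] ** 2) * dt)
```

Grid evaluation scales poorly with $N_{a}$, which is one motivation for the particle formulations discussed next.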

3.1 Feedback particle filter

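We now state a McKean–Vlasov reformulation of the Kushner–Stratonovich equation (10) as a special instance of the feedback particle filter of Yang et al. (2013) and Reich (2019). The key idea is to formulate a stochastic differential equation in the parameters in which they are treated as time-dependent random variables. We introduce the notation $\widetilde{A}_{t}$ for these, and require that the law of $\widetilde{A}_{t}$ coincide with (8) for $t\in[0,T]$, that is, with the solution to (10).

Lemma (Feedback particle filter).

Consider the McKean–Vlasov equations

$${\rm d}\widetilde{A}_{t}=K_{t}(\widetilde{A}_{t})\,{\rm d}I_{t}+\Omega_{t}(\widetilde{A}_{t})\,{\rm d}t, \tag{12}$$

where the matrix-valued Kalman gain $K_{t}\in\mathbb{R}^{N_{a}\times N_{y}}$ satisfies

$$\nabla\cdot\left(\widetilde{\Pi}_{t}(K_{t}Q)\right)=-\widetilde{\Pi}_{t}\left(h_{t}-\widetilde{\Pi}_{t}[h_{t}]\right)^{\rm T},\qquad \widetilde{\Pi}_{t}={\rm Law}(\widetilde{A}_{t}), \tag{13}$$

the innovation process $I_{t}$ can be chosen to be given by either

$${\rm d}I_{t}={\rm d}Y_{t}-\frac{1}{2}\left(h_{t}(\widetilde{A}_{t})+\widetilde{\Pi}_{t}[h_{t}]\right){\rm d}t, \tag{14}$$

or

$${\rm d}I_{t}={\rm d}Y_{t}-\left(h_{t}(\widetilde{A}_{t})\,{\rm d}t+G\,{\rm d}W_{t}\right), \tag{15}$$

and

$$\Omega_{t}^{i}=\frac{1}{2}\sum_{j=1}^{N_{a}}\sum_{k,l=1}^{N_{y}}Q^{kl}K_{t}^{jl}\left(\partial_{j}K_{t}^{ik}\right),\qquad i=1,\ldots,N_{a}. \tag{16}$$

Then, the distribution $\widetilde{\Pi}_{t}=\operatorname{Law}(\widetilde{A}_{t})$ coincides with the solution to (10), provided that the initial distributions agree. In other words, $\widetilde{\Pi}_{t}=\Pi_{t}$ for all $t\in[0,T]$.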

Throughout this paper, we write (12) in the more compact Stratonovich form

$${\rm d}\widetilde{A}_{t}=K_{t}(\widetilde{A}_{t})\circ{\rm d}I_{t}, \tag{17}$$

where the Stratonovich interpretation is to be applied only to $\widetilde{A}_{t}$ in $K_{t}(\widetilde{A}_{t})$, while the explicit time-dependence of $K_{t}$ remains in its Itô interpretation. It should be noted that the matrix-valued function $K_{t}$ is not uniquely defined by the PDE (13). Indeed, provided $K_{t}$ solves (13), $K_{t}+\beta_{t}$ is also a solution whenever $\nabla\cdot\left(\widetilde{\Pi}_{t}\beta_{t}\right)=0$. As discussed in Taghvaei et al. (2017), the minimiser over all suitable $K_{t}$ with respect to a kinetic energy-type functional is of the form

$$K_{t}=\nabla\Psi_{t}\,Q^{-1}, \tag{18}$$

for a vector of potential functions $\Psi_{t}=(\psi_{t}^{1},\ldots,\psi_{t}^{N_{x}})$, $\psi^{k}_{t}:\mathbb{R}^{N_{a}}\to\mathbb{R}$. Inserting (18) into (13) leads to $N_{x}$ elliptic partial differential equations (often referred to as Poisson equations),

$$\nabla\cdot\left(\widetilde{\Pi}_{t}\nabla\Psi_{t}\right)=-\widetilde{\Pi}_{t}\left(h_{t}-\widetilde{\Pi}_{t}[h_{t}]\right)^{\rm T},\qquad \widetilde{\Pi}_{t}[\Psi_{t}]=0, \tag{19}$$

understood componentwise, where the centering condition $\widetilde{\Pi}_{t}[\Psi_{t}]=0$ makes the solution unique under mild assumptions on $\widetilde{\Pi}_{t}$; see Laugesen et al. (2015). Finally, (15) yields a particularly appealing formulation, since it is based on a direct comparison of ${\rm d}Y_{t}$ with a random realisation of the right-hand side of the SDE (1), given a parameter value $a=\widetilde{A}_{t}(\omega)$ and a realisation of the noise term ${\rm d}W_{t}(\omega)$. This fact will be explored further in Section 4.

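Remark

For clarity, let us repeat equations (13) and (18) in their index forms:

$$\sum_{i=1}^{N_{a}}\sum_{j=1}^{N_{y}}\partial_{i}\left(\widetilde{\Pi}_{t}\left(K_{t}^{ij}Q^{jk}\right)\right)=-\widetilde{\Pi}_{t}\left(h_{t}^{k}-\widetilde{\Pi}_{t}[h_{t}^{k}]\right),\qquad k=1,\ldots,N_{y}, \tag{20}$$

$$\sum_{j=1}^{N_{y}}K_{t}^{ij}(a)\,Q^{jk}=\partial_{i}\psi_{t}^{k}(a),\qquad i=1,\ldots,N_{a},\quad k=1,\ldots,N_{y}. \tag{21}$$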

3.2 Ensemble Kalman–Bucy filter

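Let us now assume that the initial distribution $\Pi_{0}$ is Gaussian, and that $f$ is linear in the unknown parameters, as in (2). Then, the distributions $\widetilde{\Pi}_{t}$ remain Gaussian for all times with mean $\overline{a}_{t}$ and covariance matrix $P^{aa}_{t}$. The elliptic PDE (13) is solved by the parameter-independent Kalman gain matrix

$$K_{t}=P_{t}^{aa}B(Y_{t})^{\rm T}Q^{-1} \tag{22}$$

and one obtains the McKean–Vlasov formulation

$${\rm d}\widetilde{A}_{t}=P_{t}^{aa}B(Y_{t})^{\rm T}Q^{-1}\,{\rm d}I_{t} \tag{23}$$

of the Kalman–Bucy filter, with the innovation process $I_{t}$ defined by either

$${\rm d}I_{t}={\rm d}Y_{t}-\left(f_{0}(Y_{t})+\frac{1}{2}B(Y_{t})\left(\widetilde{A}_{t}+\overline{a}_{t}\right)\right){\rm d}t \tag{24}$$

or

$${\rm d}I_{t}={\rm d}Y_{t}-\left(\left(f_{0}(Y_{t})+B(Y_{t})\widetilde{A}_{t}\right){\rm d}t+G\,{\rm d}W_{t}\right). \tag{25}$$

Note that the Stratonovich formulation (17) reduces to the standard Itô interpretation, since $K_{t}$ no longer depends explicitly on $\widetilde{A}_{t}$.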
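In the linear-Gaussian setting, the mean and covariance implied by (23) with the innovation (24) can be cross-checked against a conjugate-Gaussian computation. The sketch below integrates $ {\rm d}\overline{a}_{t}=K_{t}({\rm d}Y_{t}-(f_{0}+B\overline{a}_{t}){\rm d}t)$ and ${\rm d}P_{t}^{aa}=-K_{t}B(Y_{t})P_{t}^{aa}\,{\rm d}t$ with an explicit Euler step and compares the result with the posterior in information form for the time-discretised likelihood. These moment equations follow from taking expectations in (23)–(24) and are written out here as an assumption; the basis functions, the prior $N(0,I)$ and $f_{0}=0$ are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def B(y):
    # Hypothetical basis: b_1(y) = -y, b_2(y) = 1 (scalar state, N_a = 2).
    return np.array([[-y, 1.0]])

a_true, sigma, dt, n = np.array([1.0, 0.5]), 1.0, 1e-2, 20000
Qinv = np.array([[1.0 / sigma**2]])

# Synthetic noiseless observations Y = X of dX = B(X) a dt + sigma dW (f0 = 0).
xi = rng.normal(size=n)
Y = np.empty(n + 1)
Y[0] = 0.0
for k in range(n):
    Y[k + 1] = Y[k] + (B(Y[k]) @ a_true)[0] * dt + sigma * np.sqrt(dt) * xi[k]

a_bar, P = np.zeros(2), np.eye(2)              # Gaussian prior N(0, I) (assumption)
prec = np.linalg.inv(P)                        # prior precision (information form)
shift = prec @ a_bar                           # prior precision times prior mean

for k in range(n):
    Bk, dYk = B(Y[k]), np.array([Y[k + 1] - Y[k]])
    K = P @ Bk.T @ Qinv                          # gain (22)
    a_bar = a_bar + K @ (dYk - Bk @ a_bar * dt)  # Euler step for the mean
    P = P - K @ Bk @ P * dt                      # Euler step for the covariance
    prec = prec + Bk.T @ Qinv @ Bk * dt          # accumulate posterior precision
    shift = shift + Bk.T @ Qinv @ dYk            # accumulate the linear term

P_info = np.linalg.inv(prec)                   # posterior covariance, information form
a_info = P_info @ shift                        # posterior mean, information form
```

The two computations agree up to the $O(\Delta t)$ discretisation error of the Euler step.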

The McKean–Vlasov equations (23) can be extended to nonlinear, non-Gaussian parameter estimation problems by generalising the parameter-independent Kalman gain matrix (22) to

$$K_{t}=P_{t}^{ah}Q^{-1},\qquad P_{t}^{ah}=\widetilde{\Pi}_{t}\left[(a-\overline{a}_{t})\left(h_{t}(a)-\widetilde{\Pi}_{t}[h_{t}]\right)^{\rm T}\right]=\widetilde{\Pi}_{t}\left[a\left(h_{t}(a)-\widetilde{\Pi}_{t}[h_{t}]\right)^{\rm T}\right] \tag{26}$$

Clearly, the gain (26) provides only an approximation to the solution of (13). However, such approximations have become popular in nonlinear state estimation in the form of the ensemble Kalman filter (Law et al., 2015; Reich and Cotter, 2015), and we will test its suitability for parameter estimation in Section 6.

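Numerical implementations of the proposed McKean–Vlasov approaches rely on Monte Carlo approximations. More specifically, given $M$ samples $\widetilde{A}_{0}^{i}$, $i=1,\ldots,M$, from the initial distribution $\Pi_{0}$, we introduce the interacting particle system

$${\rm d}\widetilde{A}_{t}^{i}=K_{t}^{M}(\widetilde{A}_{t}^{i})\circ{\rm d}I_{t}^{i}, \tag{27}$$

where the innovation processes $I_{t}^{i}$ are defined by either

$${\rm d}I_{t}^{i}={\rm d}Y_{t}-\frac{1}{2}\left(h_{t}(\widetilde{A}_{t}^{i})+\overline{h}_{t}^{M}\right){\rm d}t,\qquad \overline{h}_{t}^{M}=\frac{1}{M}\sum_{i=1}^{M}h_{t}(\widetilde{A}_{t}^{i}), \tag{28}$$

or, alternatively,

$${\rm d}I_{t}^{i}={\rm d}Y_{t}-\left(h_{t}(\widetilde{A}_{t}^{i})\,{\rm d}t+G\,{\rm d}W_{t}^{i}\right), \tag{29}$$

and $W_{t}^{i}$, $i=1,\ldots,M$, denote independent $N_{w}$-dimensional Brownian motions. For $K^{M}_{t}$, we will use the parameter-independent empirical Kalman gain approximation

$$K_{t}^{M}=\widehat{P}_{t}^{ah}Q^{-1},\qquad \widehat{P}_{t}^{ah}=\frac{1}{M-1}\sum_{i=1}^{M}\widetilde{A}_{t}^{i}\left(h_{t}(\widetilde{A}_{t}^{i})-\overline{h}_{t}^{M}\right)^{\rm T}, \tag{30}$$

in our numerical experiments, which leads to the so-called ensemble Kalman–Bucy filter (Bergemann and Reich, 2012; Taghvaei et al., 2017). Note that $\widehat{P}_{t}^{ah}$ provides an unbiased estimator of $P_{t}^{ah}$.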
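The empirical gain (30) amounts to a few lines of linear algebra. The following sketch computes $\widehat{P}_{t}^{ah}$ and $K_{t}^{M}$ for an ensemble stored row-wise; the function name `empirical_gain`, the ensemble, and the frozen matrix `Bmat` standing in for $B(Y_{t})$ are hypothetical. For a linear observation map the result reduces to $\widehat{P}^{aa}B^{\rm T}Q^{-1}$, mirroring (22).

```python
import numpy as np

def empirical_gain(A, hA, Q):
    """Empirical gain (30): A has shape (M, N_a), hA = h_t(A^i) has shape (M, N_y)."""
    M = A.shape[0]
    h_bar = hA.mean(axis=0)
    # Only h is centred, as in (30); centring A as well gives the same matrix
    # because the centred h-values sum to zero.
    P_ah = A.T @ (hA - h_bar) / (M - 1)
    return P_ah @ np.linalg.inv(Q), P_ah

rng = np.random.default_rng(3)
M, Na = 200, 2
A = rng.normal(size=(M, Na))          # ensemble from a hypothetical prior N(0, I)
Bmat = np.array([[-0.7, 1.0]])        # B(Y_t) frozen at one time instant (illustrative)
hA = A @ Bmat.T                       # linear observation map h_t(a) = B(Y_t) a with f0 = 0
Q = np.array([[0.25]])

K, P_ah = empirical_gain(A, hA, Q)
P_aa = np.cov(A, rowvar=False)        # unbiased empirical covariance of the parameters
```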

Finally, a robust and efficient time-stepping procedure for approximating $\widetilde{A}_{t_{n}}$, $t_{n}=n\Delta t$, is provided in Amezcua et al. (2014), de Wiljes et al. (2018) and Blömker et al. (2018). Denoting the approximations at time $t_{n}$ by $\widetilde{A}_{n}^{i}$, $i=1,\ldots,M$, we obtain

[TABLE]

with step size $\Delta t>0$ , empirical covariance matrices

[TABLE]

and innovation increments $\Delta I_{n}^{i}$ given by either

[TABLE]

or

[TABLE]

Here we have used the abbreviations $h_{n}(a)=f(Y_{n},a)$ , $Y_{n}=Y_{t_{n}}$ , and $\Delta Y_{n}=Y_{t_{n+1}}-Y_{t_{n}}$ .
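A minimal sketch of an ensemble Kalman–Bucy parameter filter follows, assuming a plain explicit Euler discretisation of (27) with the deterministic innovation (28) and the empirical gain (30). This is a simplification chosen for brevity and is not the robust time-stepping scheme referenced above; the scalar drift $f(y,a)=-ay$, the prior $N(2,1)$ and all numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic reference data: dX = -a X dt + sigma dW, observed without noise (Y = X).
a_true, sigma, dt, n = 1.0, 1.0, 1e-2, 20000
xi = rng.normal(size=n)
Y = np.empty(n + 1)
Y[0] = 0.0
for k in range(n):
    Y[k + 1] = Y[k] - a_true * Y[k] * dt + sigma * np.sqrt(dt) * xi[k]

q = sigma**2
M = 50
A = rng.normal(loc=2.0, scale=1.0, size=M)    # ensemble from a hypothetical prior N(2, 1)
spread0 = A.std(ddof=1)

for k in range(n):
    h = -A * Y[k]                              # h_n(A^i) = f(Y_n, A^i)
    h_bar = h.mean()
    P_ah = np.sum(A * (h - h_bar)) / (M - 1)   # scalar version of the empirical covariance
    K = P_ah / q                               # gain (30), identical for all members
    dI = (Y[k + 1] - Y[k]) - 0.5 * (h + h_bar) * dt   # innovation increments, cf. (28)
    A = A + K * dI                             # explicit Euler step of (27)

a_est, spread = A.mean(), A.std(ddof=1)
```

Since the gain does not depend on the individual ensemble member, the Stratonovich and Itô interpretations of (27) coincide here, and the ensemble spread contracts as data accumulate.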

While the feedback particle formulation (17) and its ensemble Kalman–Bucy filter approximation (31) are special cases of already available formulations, they provide the starting point for our novel McKean–Vlasov equations and their numerical approximation of the combined state and parameter estimation problem with correlated measurement and model errors, which we develop in the following two sections.

4 State estimation for noisy data

We return to the observation model (3) with $R\not=0$ and general $H$ . The pure state estimation problem is considered first, that is, $f(x,a)=f(x)$ in (1).

Using $E_{t}^{\rm o}$ , given by (5), and $E_{t}^{\rm c}$ defined by

[TABLE]

with the total measurement error covariance matrix $C$ given by (6), we find that

[TABLE]

and the covariations (Pavliotis, 2014) satisfy

[TABLE]

Hence (1) and (3) can be rewritten as follows:

[TABLE]

where $\widehat{W}_{t}$ and $\widehat{V}_{t}$ denote mutually independent standard Brownian motions of dimension $N_{w}$ and $N_{y}$ , respectively. These equations correspond exactly to the correlated noise example from (Bain and Crisan, 2009, Section 3.8). Furthermore, $H=I$ and $R=0$ lead to $E_{t}^{\rm c}=0$ , $QH^{\rm T}C^{-1/2}=C^{1/2}$ , and, hence, ${\rm d}X_{t}={\rm d}Y_{t}$ .
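The identity $QH^{\rm T}C^{-1/2}=C^{1/2}$ in the limit $H=I$, $R=0$ can be verified numerically. The sketch below assumes that $C^{1/2}$ denotes the symmetric square root and uses a hypothetical full-rank matrix $G$.

```python
import numpy as np

def sym_sqrt(S):
    # Symmetric square root of a symmetric positive definite matrix
    # via its eigendecomposition.
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

G = np.array([[0.5, 0.0], [0.1, 0.3]])   # hypothetical full-rank noise matrix
Q = G @ G.T
H, R = np.eye(2), np.zeros((2, 2))       # fully observed increments, no measurement noise

C = H @ Q @ H.T + R                      # covariance (6); equals Q in this limit
C_half = sym_sqrt(C)
lhs = Q @ H.T @ np.linalg.inv(C_half)    # Q H^T C^{-1/2}
```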

A straightforward application of the results from (Bain and Crisan, 2009, Section 3.8) yields the following statement:

Lemma (Generalised Kushner–Stratonovich equation).

The conditional expectations $\pi_{t}[\phi]=\mathbb{E}[\phi(X_{t})|Y_{[0,t]}]$ satisfy

[TABLE]

where

$$\mathcal{L}\phi=f\cdot\nabla\phi+\frac{1}{2}Q:\nabla\nabla\phi \tag{40}$$

is the generator of (1), with the notation $Q:\nabla\nabla\phi=\sum_{i,j=1}^{N_{x}}Q^{ij}\partial_{i}\partial_{j}\phi$, $h(x)=Hf(x)$ denotes the observation map, and $\phi$ is a compactly supported smooth function.

For the convenience of the reader, we present an independent derivation in Appendix A. We note that (39) also arises as the Kushner–Stratonovich equation for an SDE model (1) with observations $Y_{t}$ satisfying the observation model

[TABLE]

where $\widetilde{V}_{t}$ denotes $N_{y}$ -dimensional Brownian motion independent of the Brownian motion $W_{t}$ in (1). Here we have used that $\pi_{t}\left[HQ\nabla\pi_{t}\right]=0$ . This reinterpretation of our state estimation problem in terms of uncorrelated model and observation errors and modified observation map

[TABLE]

allows one to apply available MCMC and SMC methods for continuous-time filtering and smoothing problems. See, for example, Law et al. (2015). However, there are two major limitations of such an approach. First, it requires approximating the gradient of the log-density. Second, the modified observation model (41) is not well-defined in the limit $R\to 0$ and $H=I$ , since the density $\pi_{t}$ collapses to a Dirac delta function under the given initial condition $X_{0}=x_{0}$ a.s.

In order to circumvent these complications, we develop an alternative approach based on an appropriately modified feedback particle filter formulation in the following subsection.

4.1 Generalised feedback particle filter formulation

While it is clearly possible to apply the standard feedback particle filter formulations using (41), the following alternative formulation avoids the need for approximating the gradient of the log-density.

Lemma (Feedback particle filter with correlated innovation).

Consider the McKean–Vlasov equation

[TABLE]

where the gain $K_{t}\in\mathbb{R}^{N_{x}\times N_{y}}$ solves

[TABLE]

with observation map $h(x)=Hf(x)$ . The function $\Omega_{t}$ is given by

[TABLE]

and the innovation process $I_{t}$ by

[TABLE]

Here, $W_{t}$ and $U_{t}$ denote mutually independent $N_{w}$-dimensional and $N_{y}$-dimensional Brownian motions, respectively. Then, $\widetilde{\pi}_{t}=\operatorname{Law}(\widetilde{X}_{t})$ coincides with the solution to (39), provided that the initial distributions agree.

It should be stressed that $W_{t}$ in (43) and (46) denotes the same Brownian motion, resulting in correlations between the innovation process and model noise.

Proof.

In this proof the Einstein summation convention over repeated indices is employed, noting that (44) takes the form

[TABLE]

We begin by writing (43) in its Itô-form,

[TABLE]

where

[TABLE]

Here we have used that the covariation between $K_{t}$ and $I_{t}$ satisfies

[TABLE]

and furthermore $\langle GW,I\rangle_{t}=-QH^{\rm T}t$ as well as $\langle I,I\rangle_{t}=2Ct$ .

For a smooth compactly supported test function $\phi$ , Itô’s formula implies

[TABLE]

where the covariation process is given by

[TABLE]

Our aim is to show that $\widetilde{\pi}_{t}[\phi]$ coincides with $\pi_{t}[\phi]$ as defined by the Kushner–Stratonovich equation (39). To this end, we insert (48) and (52) into (51) and take the conditional expectation, arriving at

[TABLE]

recalling that the generator $\mathcal{L}$ has been defined in (40). Under the assumption that $K_{t}$ satisfies (44), the two equations (39) and (53) coincide. Indeed,

[TABLE]

implies

[TABLE]

and the $\mathrm{d}Y_{s}$ -contributions agree. To verify the same for the $\mathrm{d}s$ -contributions, we use (44) to obtain

[TABLE]

Finally, collecting terms in (53) and (56) and applying (55) to the remaining $\mathrm{d}s$ -contribution, i.e. $-\widetilde{\pi}_{s}[\nabla\phi\cdot K_{s}]\widetilde{\pi}_{s}[h]$ , leads to the desired result. ∎

We note that the correlation between the innovation process $I_{t}$ and the model error $W_{t}$ leads to a correction term $\Omega_{t}$ in (43) which cannot be subsumed into a Stratonovich correction, in contrast to the standard feedback particle filter formulation (17).

Remark

Assuming that there exist potential functions $\Psi_{t}=(\psi_{t}^{1},\ldots,\psi_{t}^{N_{y}})$, $\psi_{t}^{k}:\mathbb{R}^{N_{x}}\to\mathbb{R}$, solving the Poisson equation(s) (19) (with $\widetilde{\Pi}_{t}$ being replaced by $\widetilde{\pi}_{t}$), (44) can be solved by requiring

[TABLE]

thus generalising (18).

Remark

If we set $R=0$, $H=I$, and $K_{t}=QH^{\rm T}C^{-1}=I$ in (43), then one obtains

$${\rm d}\widetilde{X}_{t}={\rm d}Y_{t} \tag{58}$$

since $\Omega_{t}$ vanishes, and all other terms in (43) cancel each other out. If, furthermore, $Y_{0}=\widetilde{X}_{0}=x_{0}$ a.s., then $\widetilde{X}_{t}=Y_{t}$ for all $t\in[0,T]$ , which in turn justifies our assumption that the gain $K_{t}$ is independent of the state variable. Hence, the McKean–Vlasov formulation (43) reproduces the exact reference trajectory $Y_{t}$ in the case of no measurement errors and perfectly known initial conditions.

We develop a simplified version of the feedback particle filter formulation (43) for linear SDEs and Gaussian distributions in the following subsection, which will form the basis of the generalised ensemble Kalman–Bucy filter put forward in Section 4.3.

4.2 Generalised Kalman–Bucy filter

Let us assume that $f(x)=Fx$ with $F\in\mathbb{R}^{N_{x}\times N_{x}}$ , that is, equations (1) and (3) take the form

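$${\rm d}X_{t}=FX_{t}\,{\rm d}t+G\,{\rm d}W_{t},\qquad {\rm d}Y_{t}=HFX_{t}\,{\rm d}t+HG\,{\rm d}W_{t}+R^{1/2}\,{\rm d}V_{t}, \tag{59}$$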

with initial conditions drawn from a Gaussian distribution. In this case $\pi_{t}$ stays Gaussian for all $t>0$ , i.e. $\pi_{t}\sim{\rm N}(\overline{x}_{t},P_{t})$ with $\overline{x}_{t}\in\mathbb{R}^{N_{x}}$ , $P_{t}\in\mathbb{R}^{N_{x}\times N_{x}}$ . Equations (19) can be solved uniquely by $\nabla_{x}\Psi=P_{t}F^{\rm T}H^{\rm T}$ , and thus the McKean–Vlasov equations for the feedback particle filter (43) reduce to

[TABLE]

with the innovation process (46) leading to

[TABLE]

We take the expectation in (60)–(61) and end up with

[TABLE]

Defining $u_{t}:=\widetilde{X}_{t}-\overline{x}_{t}$ , we see that

[TABLE]

Next we use

[TABLE]

and $P_{t}=\mathbb{E}[u_{t}u_{t}^{\rm T}]$ to obtain, after some calculations,

[TABLE]

Hence we have shown that our McKean–Vlasov formulation (60) agrees with the standard Kalman–Bucy filter equations for the mean and the covariance matrix in the correlated noise case; see Simon (2006).
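As a minimal sketch, the mean and covariance equations of the Kalman–Bucy filter with correlated noise can be time-stepped with an explicit Euler scheme. The code assumes the textbook form of these equations, with gain $K_{t}=(P_{t}F^{\rm T}H^{\rm T}+QH^{\rm T})C^{-1}$ and $C=HQH^{\rm T}+R$, which is consistent with the gain quoted in Section 4.3. The function name `kalman_bucy_step` and its signature are illustrative and not taken from the paper.

```python
import numpy as np

def kalman_bucy_step(xbar, P, dY, F, G, H, R, dt):
    """One explicit Euler step of the Kalman-Bucy mean/covariance equations
    for dX = F X dt + G dW observed through dY = H dX + R^{1/2} dV."""
    Q = G @ G.T                                   # model error covariance
    C = H @ Q @ H.T + R                           # covariance rate of the observed increments
    K = (P @ F.T @ H.T + Q @ H.T) @ np.linalg.inv(C)   # gain with the extra Q H^T term
    innovation = dY - H @ F @ xbar * dt           # observed minus predicted increment
    xbar_new = xbar + F @ xbar * dt + K @ innovation
    P_new = P + (F @ P + P @ F.T + Q - K @ C @ K.T) * dt
    return xbar_new, P_new, K
```

For $R=0$, $H=I$ and $P_{0}=0$ the gain equals the identity and the covariance remains zero, so the mean simply accumulates the observed increments, in line with the second remark of Section 4.1.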

4.3 Ensemble Kalman–Bucy filter

The McKean–Vlasov equations (60) for linear systems and Gaussian distributions suggest approximating the feedback particle filter formulation (43) for nonlinear systems by

[TABLE]

where the innovation process $I_{t}$ is given by (46) as before. In other words, we approximate the gain matrix $K_{t}$ in (43) by the state-independent term $\left(P_{t}^{xh}+QH^{\rm T}\right)C^{-1}$ with the covariance matrix $P_{t}^{xh}$ defined by

[TABLE]

where $\widetilde{\pi}_{t}$ denotes the law of $\widetilde{X}_{t}$ .

We can now generalise the ensemble Kalman–Bucy filter formulation (31) for the pure parameter estimation problem to the state estimation problem with correlated noise. We assume that $M$ initial state values $\widetilde{X}_{0}^{i}$ have been sampled from an initial distribution $\pi_{0}$ or, alternatively, that $\widetilde{X}_{0}^{i}=x_{0}$ for all $i=1,\ldots,M$ if the initial condition is known exactly. These state values are then propagated under the time-stepping procedure

[TABLE]

with $\Theta_{n}^{i}\sim{\rm N}(0,I)$ , step size $\Delta t>0$ , empirical covariance matrices

[TABLE]

and innovation increments $\Delta I_{n}^{i}$ given by

[TABLE]
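The time-stepping procedure can be illustrated by the following sketch. It is a plain Euler–Maruyama discretisation written under two assumptions that are not confirmed by the displayed equations here: the innovation increment is taken in the perturbed form $\Delta I_{n}^{i}=\Delta Y_{n}-h(\widetilde{X}_{n}^{i})\Delta t-\Delta t^{1/2}(HG\Theta_{n}^{i}+R^{1/2}\Xi_{n}^{i})$ with $h(x)=Hf(x)$, and the gain is $(P_{n}^{xh}+QH^{\rm T})C^{-1}$ without further step-size corrections. The name `enkbf_state_step` and the argument `R_sqrt` (a matrix square root of $R$) are hypothetical.

```python
import numpy as np

def enkbf_state_step(X, dY, f, H, G, R_sqrt, dt, rng):
    """Advance an (M, Nx) ensemble by one Euler-Maruyama step (illustrative sketch, M >= 2)."""
    M = X.shape[0]
    Q = G @ G.T                                    # model error covariance
    C = H @ Q @ H.T + R_sqrt @ R_sqrt.T            # covariance rate of dY
    drift = f(X)                                   # f applied row-wise, shape (M, Nx)
    h = drift @ H.T                                # h(x) = H f(x), shape (M, Ny)
    P_xh = (X - X.mean(axis=0)).T @ (h - h.mean(axis=0)) / (M - 1)
    K = (P_xh + Q @ H.T) @ np.linalg.inv(C)        # state-independent gain
    w = np.sqrt(dt) * rng.standard_normal((M, G.shape[1])) @ G.T            # model noise
    v = np.sqrt(dt) * rng.standard_normal((M, R_sqrt.shape[1])) @ R_sqrt.T  # observation noise
    dI = dY - h * dt - w @ H.T - v                 # innovation reuses the model noise w
    return X + drift * dt + w + dI @ K.T
```

Reusing the same noise sample `w` in the model update and in the innovation is what encodes the correlation between model error and innovation. With $H=I$, $R=0$ and identical initial members, the update collapses to $\widetilde{X}_{n+1}^{i}=\widetilde{X}_{n}^{i}+\Delta Y_{n}$.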

The McKean–Vlasov equations of this section form the basis of the methods proposed for the combined state and parameter estimation problem to be considered next.

5 Combined state and parameter estimation

We now return to the combined state and parameter estimation problem and consider the augmented dynamics

[TABLE]

with observations (3) as before. The initial conditions satisfy $X_{0}=x_{0}$ a.s. and $A_{0}\sim\Pi_{0}$ . Let us introduce the extended state-space variable $Z_{t}=(X_{t}^{\rm T},A_{t}^{\rm T})^{\rm T}$ . In terms of $Z_{t}$ , the equations (71) and (3) take the form

[TABLE]

with

[TABLE]

Thus we end up with an augmented state estimation problem of the general structure already considered in detail in Section 4. Below we provide details on some of the necessary modifications.

5.1 Feedback particle filter formulation

The appropriately extended feedback particle filter equation (43) leads to

[TABLE]

where (46) takes the form

[TABLE]

with observation map (4), and the correction $\Omega_{t}$ is given by (45) with $Q$ replaced by $\bar{Q}=\bar{G}\bar{G}^{\rm T}$ and $H$ by $\bar{H}$. In the Poisson equation(s) (19), $\widetilde{\Pi}_{t}$ is replaced by $\widetilde{\pi}_{t}$, denoting the joint density of $(\widetilde{X}_{t},\widetilde{A}_{t})$. We also stress that $\Psi_{t}$ becomes a function of $x$ and $a$, and we distinguish between gradients with respect to $x$ and $a$ using the notation $\nabla_{x}$ and $\nabla_{a}$, respectively.

Numerical implementations of the extended feedback particle filter are demanding because they require solving the Poisson equation(s) (19). Instead we again rely on the ensemble Kalman–Bucy filter approximation, which we describe next.

5.2 Ensemble Kalman–Bucy filter

We approximate the joint density $\widetilde{\pi}_{t}$ of $\widetilde{Z}_{t}$ by an ensemble of particles

[TABLE]

that is,

[TABLE]

where $\delta_{z^{\prime}}$ denotes the Dirac delta function centred at $z^{\prime}$ . The initial ensemble satisfies $X_{0}^{i}=x_{0}$ for all $i=1,\ldots,M$ , and the initial parameter values $A_{0}^{i}$ are independent draws from the prior distribution $\Pi_{0}$ .

At the same time, we make the approximation $\widetilde{Z}_{t}\sim{\rm N}(\overline{z}_{t}^{M},\widehat{P}^{zz}_{t})$ when dealing with the Kalman gain of the feedback particle filter. Here the empirical mean $\overline{z}_{t}^{M}$ has components

[TABLE]

and the joint empirical covariance matrix is given by

[TABLE]

As in Section 4.3, the solution to (19) can be approximated by

[TABLE]

where the covariance matrices $P^{xh}_{t}$ and $P^{ah}_{t}$ are finally estimated by their empirical counterparts

[TABLE]

with $\overline{h}^{M}_{t}$ defined by

[TABLE]

Summing everything up, we obtain the following generalised ensemble Kalman–Bucy filter equations

[TABLE]

where the innovations are given by

[TABLE]

and $W_{t}^{i}$ and $U_{t}^{i}$ denote independent Brownian motions of dimension $N_{x}$ and $N_{y}$, respectively, for $i=1,\ldots,M$.

The interacting particle equations (83) can be time-stepped along the lines discussed in Section 4.3 for the pure state estimation formulation of the ensemble Kalman–Bucy filter.
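The two gain matrices entering such a time-stepping scheme can be sketched as follows. The sketch assumes that the parameters carry no noise of their own, so that $\bar{Q}\bar{H}^{\rm T}$ has the blocks $QH^{\rm T}$ and $0$, which gives a state gain $(P^{xh}+QH^{\rm T})C^{-1}$ and a parameter gain $P^{ah}C^{-1}$ with $C=HQH^{\rm T}+R$. The function `augmented_gains` and its signature are hypothetical.

```python
import numpy as np

def augmented_gains(X, A, f, H, Q, R):
    """Return (K_x, K_a) for a state ensemble X (M, Nx) and a parameter ensemble A (M, Na)."""
    M = X.shape[0]
    h = f(X, A) @ H.T                              # h(x, a) = H f(x, a), shape (M, Ny)
    dh = h - h.mean(axis=0)                        # centred forecast observations
    P_xh = (X - X.mean(axis=0)).T @ dh / (M - 1)   # empirical state/observation covariance
    P_ah = (A - A.mean(axis=0)).T @ dh / (M - 1)   # empirical parameter/observation covariance
    C_inv = np.linalg.inv(H @ Q @ H.T + R)
    K_x = (P_xh + Q @ H.T) @ C_inv                 # state gain, includes the Q H^T term
    K_a = P_ah @ C_inv                             # parameter gain, no Q H^T term
    return K_x, K_a
```

Given innovation increments `dI` of shape `(M, Ny)`, the states would then be corrected by `dI @ K_x.T` and the parameters by `dI @ K_a.T`, so that the parameters are driven by the data only through their correlation with the forecast drift.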

6 Numerical results

We now apply the generalised ensemble Kalman–Bucy filter formulation (83) with innovation (84) to five different model scenarios.

6.1 Parameter estimation for the Ornstein–Uhlenbeck process

Our first example is provided by the Ornstein–Uhlenbeck process

$${\rm d}X_{t}=aX_{t}\,{\rm d}t+Q^{1/2}\,{\rm d}W_{t}\qquad(85)$$

with unknown parameter $a\in\mathbb{R}$ and known initial condition $X_{0}=1/2$. We assume an observation model of the form (3) with $H=1$ and a measurement error variance taking the values $R=0.01$, $R=0.0001$, and $R=0$. The model error variance is set to either $Q=0.5$ or $Q=0.005$. Except for the case $R=0$, a combined state and parameter estimation problem is to be solved. We implement the ensemble Kalman–Bucy filter (83) with innovation (84), step size $\Delta t=0.005$, and ensemble size $M=1000$. The data is generated using the Euler–Maruyama method applied to (85), with $a=-1/2$ and integrated over a time-interval $[0,500]$ with the same step size. The prior distribution $\Pi_{0}$ for the parameter is Gaussian with mean $\overline{a}=-1/2$ and variance $\sigma_{a}^{2}=2$. The results can be found in Figure 1. We find that the ensemble Kalman–Bucy filter is able to successfully identify the unknown parameter under all tested experimental settings, except for the largest measurement error case where $R=0.01$. There, a small systematic offset of the estimated parameter value can be observed. One can also see that the variance in the parameter estimate decreases monotonically in time in all cases, while the variance in the state estimates approximately reaches a steady state.
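A scaled-down version of this experiment can be sketched in a few lines. The code below is not the implementation used for Figure 1: it assumes a plain Euler–Maruyama discretisation with perturbed innovations $\Delta I_{n}^{i}=\Delta Y_{n}-A_{n}^{i}X_{n}^{i}\Delta t-\Delta t^{1/2}(Q^{1/2}\Theta_{n}^{i}+R^{1/2}\Xi_{n}^{i})$, and its defaults use a shorter time interval and a smaller ensemble than stated above. The function `run_ou_enkbf` and all of its arguments are hypothetical.

```python
import numpy as np

def run_ou_enkbf(a_true=-0.5, Q=0.5, R=1e-4, dt=0.005, T=100.0, M=200, seed=0):
    """Joint state/parameter estimation for dX = a X dt + Q^{1/2} dW with H = 1 (sketch)."""
    rng = np.random.default_rng(seed)
    sq = np.sqrt(dt)
    x_ref = 0.5                                        # reference state, X_0 = 1/2
    X = np.full(M, 0.5)                                # known initial state for every member
    A = -0.5 + np.sqrt(2.0) * rng.standard_normal(M)   # prior N(-1/2, 2) on the parameter
    C = Q + R                                          # C = H Q H^T + R with H = 1
    for _ in range(int(round(T / dt))):
        # synthetic data: increment of the reference state plus measurement noise
        dx = a_true * x_ref * dt + np.sqrt(Q) * sq * rng.standard_normal()
        dY = dx + np.sqrt(R) * sq * rng.standard_normal()
        x_ref += dx
        # ensemble update with empirical covariances
        h = A * X                                      # forecast drift h(x, a) = a x
        P_xh = np.cov(X, h)[0, 1]
        P_ah = np.cov(A, h)[0, 1]
        w = np.sqrt(Q) * sq * rng.standard_normal(M)   # model noise, shared with the innovation
        v = np.sqrt(R) * sq * rng.standard_normal(M)   # perturbed-observation noise
        dI = dY - h * dt - w - v
        X = X + h * dt + w + (P_xh + Q) / C * dI
        A = A + P_ah / C * dI
    return A, X, x_ref
```

Calling `A, X, x_ref = run_ou_enkbf()` returns the final parameter and state ensembles together with the reference state; the ensemble mean `A.mean()` then serves as the parameter estimate and `A.var()` as its empirical spread.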

6.2 Averaging

Consider the equations

[TABLE]

from Pavliotis and Stuart (2008) for $\lambda,\alpha,\gamma,\epsilon>0$ , and initial condition $Y_{0}=1/2$ , $Z_{0}=0$ . The reduced equations in the limit $\epsilon\to 0$ are given by (85), with parameter value

[TABLE]

and initial condition $X_{0}=1/2$ . The reduced dynamics corresponds to a (stable) Ornstein–Uhlenbeck process for $\lambda/\alpha>1$ . We wish to estimate the parameter $a$ from observed increments

$$\Delta Y_{n}=Y_{n+1}-Y_{n},\qquad(88)$$

where the sequence $\{Y_{n}\}_{n\geq 0}$ is obtained by time-stepping (86) using the Euler–Maruyama method with a step size $\Delta t$. We set $\lambda=3$, $\alpha=2$ (so that $a=-1/2$), $Q=0.5$, and $\epsilon\in\{0.1,0.01\}$ in our experiments. The measurement noise is set to $R=0.01$ or $R=0$ (pure parameter estimation).

We implement the ensemble Kalman–Bucy filter (83) with innovation (84), step size $\Delta t=\epsilon/50$, and ensemble size $M=1000$ for the reduced equations (87). The data is generated from an Euler–Maruyama discretization of (86) with the same step size. We also investigate the effect of subsampling the observations for $\epsilon=0.01$ by solving (86) with step size $\Delta t=\epsilon/50$ and storing only every tenth solution $Y_{n}$, while the reduced equations and the ensemble Kalman–Bucy filter equations are integrated with $\Delta t=\epsilon/5$. The results are shown in Figure 2. Figure 3 shows the results for the same experiments repeated with a smaller ensemble size of $M=10$. We find that the smaller ensemble size leads to noisier estimates for the variance in $\widetilde{X}_{n}$ and a faster decay of the variance in $\widetilde{A}_{n}$, but the estimated parameter values are equally well converged. Subsampling does not lead to significant changes in the estimated parameter values. This is in contrast to the example considered next.

Finally, we mention Harlim (2017) for alternative approaches to sequential estimation in the context of averaging, which, however, rely on different assumptions on the data.

6.3 Homogenisation

In this example, the data is produced by integrating the multi-scale SDE

[TABLE]

with parameter values $\epsilon=0.1$ , $a=-1/2$ , $\sigma=1/2$ , and initial condition $Y_{0}=1/2$ , $Z_{0}=0$ . Here, $W_{t}^{z}$ denotes standard Brownian motion. The equations are discretised with step size $\Delta\tau=\epsilon^{2}/50=0.0002$ , and the resulting increments (88) are stored over a time interval $[0,500]$ . See Krumscheid et al. (2011) for more details.

According to homogenisation theory, the reduced model is given by (85) with $Q=\sigma$ , and we wish to estimate the parameter $a$ from the data $\{\Delta Y_{n}\}$ produced according to (88). It is known that a standard maximum likelihood estimator (MLE) given by

[TABLE]

leads to $a_{\rm ML}=0$ in the limit $\Delta\tau\to 0$ and the observation interval $T\to\infty$ . This MLE corresponds to $H=I$ and $R=0$ in our extended state space formulation of the problem. Subsampling can be achieved by choosing an appropriate time-step $\Delta t>\Delta\tau$ in the ensemble Kalman–Bucy filter equations and a corresponding subsampling of the data points $Y_{n}$ in (88). We used $\Delta t=50\Delta\tau=0.01$ and $\Delta t=500\Delta\tau=0.1$ , respectively. The results can be found in Figure 4. It can be seen that only the larger subsampling leads to a correct estimate of the parameter $a$ . This is in line with known results for the maximum likelihood estimator (90). See Krumscheid et al. (2011) and references therein.
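The interplay between the estimator and subsampling can be made concrete with a short sketch. It uses the standard discretised drift estimator for a linear drift, $\sum_{n}Y_{n}\Delta Y_{n}/(\Delta t\sum_{n}Y_{n}^{2})$, which is assumed here to be the form of (90); the function `drift_mle` and its `stride` argument are hypothetical names.

```python
import numpy as np

def drift_mle(Y, dtau, stride=1):
    """Discretised maximum likelihood estimate of a in dX = a X dt + noise,
    computed from samples Y taken with step dtau and subsampled by `stride`."""
    Ys = np.asarray(Y, dtype=float)[::stride]   # keep every `stride`-th sample
    dY = np.diff(Ys)                            # increments over the coarse step
    dt = stride * dtau                          # effective step size after subsampling
    return np.sum(Ys[:-1] * dY) / (dt * np.sum(Ys[:-1] ** 2))
```

With `stride=1` the estimator sees every fine-scale increment, which is the regime in which the multi-scale data produce a biased estimate, whereas a larger `stride` corresponds to choosing $\Delta t>\Delta\tau$ as in the experiments above.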

6.4 Nonparametric drift and state estimation

We consider nonparametric drift estimation for one-dimensional SDEs over a periodic domain $[0,2\pi)$ in the setting considered from a theoretical perspective by van Waaij and van Zanten (2016). There, a zero-mean Gaussian process prior $\mathcal{GP}(0,\mathcal{D}^{-1})$ is placed on the unknown drift function, with inverse covariance operator

[TABLE]

The integer parameter $p$ sets the regularity of the process, whereas $\eta,\kappa\in\mathbb{R}^{+}$ control its characteristic correlation length and stationary variance.

Spatial discretization of the problem is carried out by first defining a grid of $N_{d}$ evenly spaced points on the domain, at locations $x_{i}=i\Delta x$ , $\Delta x=2\pi/N_{d}$ . The drift function is projected onto compactly supported functions centred at these points, which are piecewise linear with

[TABLE]

and linear interpolation is used to define a drift function $f(x,a)$ for all $x\in[0,2\pi)$ , that is, it is of the form (2) with $f_{0}(x)\equiv 0$ . In this example, we set $N_{d}=200$ . Sample realisations, as well as the reference drift $f^{*}$ , can be found in Figure 5(a).
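The piecewise linear representation of the drift can be sketched as follows, assuming standard periodic hat functions on the grid and zero-based indexing $x_{i}=i\Delta x$, $i=0,\ldots,N_{d}-1$. The names `hat_basis` and `drift` are illustrative only.

```python
import numpy as np

def hat_basis(x, n_grid):
    """Return B of shape (len(x), n_grid) with B[k, i] = b_i(x_k) for periodic
    piecewise-linear hat functions centred at x_i = i * 2*pi / n_grid."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dx = 2.0 * np.pi / n_grid
    s = (x % (2.0 * np.pi)) / dx              # position measured in grid units
    left = np.floor(s).astype(int) % n_grid   # index of the grid point to the left
    right = (left + 1) % n_grid               # periodic right neighbour
    w = s - np.floor(s)                       # relative distance to the left point
    B = np.zeros((x.size, n_grid))
    rows = np.arange(x.size)
    B[rows, left] = 1.0 - w
    B[rows, right] += w
    return B

def drift(x, a):
    """Evaluate f(x, a) = sum_i b_i(x) a^i by linear interpolation of the coefficients a."""
    return hat_basis(x, len(a)) @ np.asarray(a, dtype=float)
```

Each row of `B` has at most two nonzero entries that sum to one, so `drift` returns the coefficient $a^{i}$ at a grid point and the average of two neighbouring coefficients halfway between them, including across the periodic boundary.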

Data is generated by integrating the SDE (1) with drift $f^{\ast}$ forward in time from initial condition $X_{0}=\pi$ and with noise level $Q=0.1$ , using the Euler–Maruyama discretisation with step size $\Delta t=0.1$ over one million time-steps. The spatial distribution of the solutions $X_{n}$ is plotted in Figure 5(b). The data is then given by

[TABLE]

with $R=0.00001$ . Data assimilation is performed using the time-discretised ensemble Kalman–Bucy filter equations (83) with innovation (84), ensemble size $M=200$ , and step size $\Delta t=0.1$ .

The final estimate of the drift function (ensemble mean) and the ensemble of drift functions can be found in Figure 5(c). Figure 5(d) displays the ensemble of state estimates and the value of the reference solution at the final time. We find that the ensemble Kalman–Bucy filter is able to successfully estimate the drift function and the model states. Further experiments reveal that the drift function can only be identified for sufficiently small measurement errors.

6.5 SPDE parameter estimation

Consider the stochastic heat equation on the periodic domain $x\in[0,2\pi)$ , given in conservative form by the stochastic partial differential equation (SPDE)

[TABLE]

where $W(x,t)$ is space-time white noise. With constant $\theta(x)=\theta$ , this SPDE reduces to

[TABLE]

In this example, we examine the estimation of $\theta$ from incremental measurements of a locally averaged quantity $q(x,t)$ that arises naturally in a standard finite volume discretisation of (95).

To discretise the system, one first defines $q^{i}_{t}=q(x_{i},t)$ around $N_{d}=200$ grid points $x_{i}$ on a regular grid, separated by distances $\Delta x$ , as

[TABLE]

The conservative (drift) term in (94) reduces to

[TABLE]

where $\theta_{i\pm 1/2}\equiv\theta(x_{i}\pm\Delta x/2)$, etc. The following standard finite difference approximations

[TABLE]

yield the $N_{d}$ -dimensional SDE

[TABLE]

for constant $\theta$ , where $W_{t}^{i}$ are independent one-dimensional Brownian motions in time.
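One Euler–Maruyama step of such a semi-discretisation can be sketched as below, assuming a periodic three-point second difference for the drift and additive noise of a user-supplied amplitude. The functions `heat_step` and `max_stable_dt` and the parameter `noise_amp` are hypothetical; the noise scaling in particular is left as an input rather than taken from the paper.

```python
import numpy as np

def heat_step(q, theta, dx, dt, noise_amp, rng):
    """One Euler-Maruyama step of dq^i = theta (q^{i+1} - 2 q^i + q^{i-1}) / dx^2 dt + noise."""
    lap = (np.roll(q, -1) - 2.0 * q + np.roll(q, 1)) / dx**2       # periodic second difference
    noise = noise_amp * np.sqrt(dt) * rng.standard_normal(q.shape)  # independent Brownian increments
    return q + theta * lap * dt + noise

def max_stable_dt(theta, dx):
    """Explicit Euler stability bound for the heat equation: dt <= dx^2 / (2 theta)."""
    return dx**2 / (2.0 * theta)
```

The bound in `max_stable_dt` is the classical restriction for explicit time-stepping of the deterministic heat equation; a step size well below it keeps all Fourier modes of the drift part damped.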

Following recent results from Altmeyer and Reiß (2019) we consider the case of estimation of a constant $a=\theta$ value from measurements ${\rm d}q^{\ast}_{t}$ at a fixed location/index $j^{\ast}\in\{1,\ldots,N_{d}\}$ . The data trajectory is thus given by

[TABLE]

where $R^{1/2}$ is a scalar and $V_{t}$ is a standard Brownian motion in one dimension. We perform numerical experiments in which the initial state $q_{0}^{i}$ is set to zero for all indices $i$ and the prior on the unknown parameter $a=\theta$ is uniform over the interval $[0.2,1.8]$ .

The increment data is generated by first integrating (95) forward in time from the known initial condition $q_{0}^{i}=0$ for all $i$. The equation is discretised in time using the Euler–Maruyama method. Stability of the explicit Euler–Maruyama discretisation requires $\Delta t<\Delta x^{2}/(2\theta)$; we use the much smaller time step $\Delta t=\Delta x^{2}/80$. The solution is sampled with this same time step, and increment measurements are approximated at time $t_{n}$ by setting the measurement noise level $R$ to zero in (100), resulting in

[TABLE]

Note that the associated model error in (1) is given by $G=\sigma^{1/2}\Delta x^{1/2}I$ and the matrix $H$ in (3) projects the vector of state increments onto a single component with index $j^{\ast}=N_{d}/2$. Simulations are performed over the time-interval $[0,20]$. The results can be found in Figure 6(a). We also compute the model evidence for a sequence of parameter values $\theta\in\{0.2,0.3,\ldots,1.8\}$ based on a standard Kalman–Bucy filter (Simon, 2006) for the associated linear state estimation problem; see Figure 6(b). Both approaches agree with the reference value $\theta=1$.

6.6 Discussion

The presented results demonstrate that the proposed methodology can be applied to a broad range of continuous-time state and parameter estimation problems with correlated measurement and model errors. Alternatively, one could have employed standard SMC or MCMC methods utilising the modified observation model (41). However, such implementations require the approximation of the additional $Q\nabla\log\pi_{t}$ term which is nontrivial if only samples from $\pi_{t}$ are available. Furthermore, the limiting behaviour of such implementations in the limit $R\to 0$ and $H=I$ (pure parameter estimation problem) is unclear. The proposed generalised ensemble Kalman–Bucy filter avoids these issues and is easy to implement. In fact, the only differences to the standard ensemble Kalman–Bucy filter formulation of Bergemann and Reich (2012) consist in the additional $QH^{\rm T}$ term in the Kalman gain and a correlation between the stochastic innovation process and the model error.

7 Conclusions

In this paper, we have derived McKean–Vlasov equations for combined state and parameter estimation from continuously observed state increments. An approximate and robust implementation of these McKean–Vlasov equations in the form of a generalised ensemble Kalman–Bucy filter has been provided and applied to a range of increasingly complex model systems. Future work will address the treatment of temporally correlated measurement and model errors as well as a rigorous analysis of these McKean–Vlasov equations in a multi-scale context and in the context of nonparametric drift estimation.

Author contributions

Methodology, N.N. and S.R.; software, S.R. and P.R.; validation, N.N., S.R. and P.R.; writing–original draft preparation, N.N., S.R.; writing–review and editing, N.N., S.R. and P.R.

Funding

This research has been partially funded by Deutsche Forschungsgemeinschaft (DFG) through grants CRC 1294 ‘Data Assimilation’ (project A06) and CRC 1114 ‘Scaling Cascades’ (project A02).

Conflicts of interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A The filtering equations for correlated noise

In this appendix we outline a derivation of the Kushner-Stratonovich equation (39) for the signal-observation dynamics given by (38). In fact, we only compute the evolution equation (termed modified Zakai equation) for the unnormalised filtering distribution $\rho_{t}[\phi]=\mathbb{E}\left[l_{t}\phi(X_{t})|Y_{[0,t]}\right]$ , where the likelihood $l_{t}$ is given by

[TABLE]

Obtaining the Kushner-Stratonovich formulation is then standard, applying Itô’s formula to the Kallianpur-Striebel formula $\pi_{t}[\phi]=\rho_{t}[\phi]/\rho_{t}[\mathbf{1}]$; see (Bain and Crisan, 2009, Chapter 3). The following result is in agreement with Corollaries 3.39 and 3.40 in Bain and Crisan (2009).

Lemma.

The modified Zakai equation is given by

[TABLE]

where the generator $\mathcal{L}$ has been defined in (40).

Proof.

For convenience, let us define the process

[TABLE]

where $Y_{s}$ satisfies (38b). From $\langle Y\rangle_{t}=Ct$ we see that

[TABLE]

hence the likelihood takes the form

[TABLE]

satisfying the SDE

[TABLE]

For an arbitrary smooth, compactly supported test function $\phi$, Itô’s formula implies

[TABLE]

where $X_{s}$ satisfies (38a). For the covariation process $\langle l,X\rangle_{t}$ we obtain

[TABLE]

using $\langle Y,X\rangle_{t}=HQt$ . Furthermore, $\langle X,X\rangle_{t}=Qt$ , which follows from the definition of the stochastic contributions in (38a).

We now apply the conditional expectation to (108). Noticing that

[TABLE]

the result follows from (107). ∎

References

1. Kutoyants, Y. Statistical inference for ergodic diffusion processes; Springer-Verlag: New York, 2004.
2. Pavliotis, G. Stochastic processes and applications; Springer-Verlag: New York, 2014.
3. Apte, A.; Hairer, M.; Stuart, A.; Voss, J. Sampling the posterior: An approach to non-Gaussian data assimilation. Physica D: Nonlinear Phenomena 2007, 230, 50–64.
4. Salman, H.; Kuznetsov, L.; Jones, C.; Ide, K. A method for assimilating Lagrangian data into a shallow-water-equation ocean model. Mon. Wea. Rev. 2006, 134, 1081–1101.
5. Apte, A.; Jones, C.; Stuart, A. A Bayesian approach to Lagrangian data assimilation. Tellus A 2008, 60, 336–347.
6. Simon, D. Optimal State Estimation; Wiley: Hoboken, New Jersey, 2006.
7. Bain, A.; Crisan, D. Fundamentals of stochastic filtering; Springer-Verlag: New York, 2009.
8. Liu, J. Monte Carlo strategies in scientific computing; Springer-Verlag: New York, 2001.