A Probabilistic Framework for Moving-Horizon Estimation: Stability and Privacy Guarantees

Vishaal Krishnan; Sonia Martínez

arXiv:1812.09672·math.OC·December 20, 2019·IEEE Trans. Autom. Control.

TL;DR

This paper introduces a probabilistic framework for stable moving-horizon estimation of nonlinear systems, incorporating differential privacy, and compares two variants based on Wasserstein distance and KL-divergence.

Contribution

It unifies stability analysis and privacy guarantees in moving-horizon estimators using probabilistic metrics and introduces two novel estimator variants.

Findings

1. $W_{2}$-MHE provides a gradient-based estimation approach.
2. KL-MHE functions as a particle filter with stability properties.
3. Differential privacy can be achieved through entropy regularization.

Equations

$$\operatorname{prox}_{F}(x)=\arg\min_{\tilde{x}\in\mathcal{X}}\ \frac{1}{2}\|\tilde{x}-x\|^{2}+F(\tilde{x}).$$

$$W_{2}^{2}(\mu_{1},\mu_{2})=\inf_{\pi\in\Pi(\mu_{1},\mu_{2})}\int_{\mathcal{X}\times\mathcal{X}}\|x-y\|^{2}\,d\pi(x,y).$$

$$D_{\textup{KL}}(\mu_{1}\,\|\,\mu_{2})=\int_{\mathcal{X}}\log\left(\frac{d\mu_{1}(x)}{d\mu_{2}(x)}\right)d\mu_{1}(x)=\int_{\mathcal{X}}\rho_{1}(x)\log\left(\frac{\rho_{1}(x)}{\rho_{2}(x)}\right)\operatorname{dvol}(x).$$

$$D_{\textup{max}}(\mu_{1},\mu_{2})=\sup_{x\in\mathcal{X}}\log\left(\frac{\rho_{1}(x)}{\rho_{2}(x)}\right).$$

$$\epsilon\geq D_{\textup{max}}(\mathcal{E}[y_{1}],\mathcal{E}[y_{2}])=\sup_{x\in\mathcal{X}}\log\left(\frac{\rho_{1}(x)}{\rho_{2}(x)}\right)\geq\log\left(\frac{\rho_{1}(x)}{\rho_{2}(x)}\right).$$

$$\Omega:\begin{cases}x_{k+1}=f(x_{k},w_{k}),\\ y_{k}=h(x_{k})+v_{k},\end{cases}$$

$$\Sigma:\begin{cases}x_{k+1}=f(x_{k},0)=f_{0}(x_{k}),\\ y_{k}=h(x_{k}).\end{cases}$$

$$f_{0}(x)=\begin{cases}3x, & \text{for } x\in(0,a\pi-\epsilon],\\ \gamma(x), & \text{for } x\in(a\pi-\epsilon,a\pi+\epsilon],\\ 2x+a\pi, & \text{for } x\in(a\pi+\epsilon,\infty),\end{cases}$$

$$x_{0}\in\arg\min_{x\in\mathbb{X}}\ J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(x)).$$

$$\frac{\left\langle\nabla^{2}\Sigma_{T}[v,v](x),\nabla J_{T}\big|_{\Sigma_{T}(x)}\right\rangle}{\left\|\nabla\Sigma_{T}[v]\right\|^{2}}\leq-\lambda_{\textup{max}}\left(\operatorname{Hess}J_{T}\big|_{\Sigma_{T}(x)}\right),$$

$$\mu_{0}\in\arg\min_{\mu\in\mathcal{P}(\mathbb{X})}\ \mathbb{E}_{\mu}\left[J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(\cdot))\right].$$

$$\mu_{k}\in\arg\min_{\mu\in\mathcal{P}(\mathbb{X})}\ D(\mu,f_{0\#}\mu_{k-1})+\eta\,\mathbb{E}_{\mu}\left[G^{N}_{k}\right],\quad\text{given }\mu_{0}\in\mathcal{P}(\mathbb{X}),$$

$$\mu_{k}\in\arg\min_{\mu\in\mathcal{P}(\mathbb{X})}\ \frac{1}{2}W_{2}^{2}(\mu,f_{0\#}\mu_{k-1})+\eta\,\mathbb{E}_{\mu}\left[G^{N}_{k}\right],\quad\text{given }\mu_{0}\in\mathcal{P}(\mathbb{X}).$$

$$c=\frac{\delta}{\delta\mu}\left[\frac{1}{2}W_{2}^{2}(\mu,f_{0\#}\mu_{k-1})+\eta\,\mathbb{E}_{\mu}\left[G^{N}_{k}\right]\right]\bigg|_{\mu=\mu_{k}}=\phi_{k}+\eta G^{N}_{k},$$

$$\nabla\phi_{k}(x)+\eta\nabla G^{N}_{k}(x)=0.$$

$$x=T_{k}^{-1}(x)-\eta\nabla G^{N}_{k}(x).$$

$$z_{k}=f_{0}(z_{k-1})-\eta\nabla G^{N}_{k}(z_{k}),\quad k>0.$$

$$z_{k}\in\arg\min_{z}\ \frac{1}{2}\|z-f_{0}(z_{k-1})\|^{2}+\eta G^{N}_{k}(z),\quad k>0,\quad z_{0}\sim\mu_{0}\in\mathcal{P}(\mathbb{X}).$$

$$z_{k}=\operatorname{prox}_{\eta G^{N}_{k}}(f_{0}(z_{k-1})),\quad k>0,\quad z_{0}\sim\mu_{0}\in\mathcal{P}(\mathbb{X}),$$

$$\left|G^{N}_{k}(f_{0}(z_{k-1}))-G^{N}_{k}(z_{k})-\left\langle\nabla G^{N}_{k}(z_{k}),f_{0}(z_{k-1})-z_{k}\right\rangle\right|\leq\frac{l}{2}\|f_{0}(z_{k-1})-z_{k}\|^{2}.$$

$$\left|G^{N}_{k}(f_{0}(z_{k-1}))-G^{N}_{k}(z_{k})-\eta\|\nabla G^{N}_{k}(z_{k})\|^{2}\right|\leq\eta^{2}\frac{l}{2}\|\nabla G^{N}_{k}(z_{k})\|^{2}.$$

$$G^{N}_{k}(z_{k})\leq G^{N}_{k}(f_{0}(z_{k-1}))-\eta\left(1-\frac{l}{2}\eta\right)\|\nabla G^{N}_{k}(z_{k})\|^{2}.$$

$$G^{N}_{k}(z_{k})\leq G^{N}_{k-1}(z_{k-1})+L\|\nabla G^{N}_{k-1}(z_{k-1})\|^{2}-\eta\left(1-\frac{l}{2}\eta\right)\|\nabla G^{N}_{k}(z_{k})\|^{2}.$$

$$\eta\left(1-\frac{l}{2}\eta\right)\sum_{k=1}^{K}\|\nabla G^{N}_{k}(z_{k})\|^{2}-L\sum_{k=1}^{K}\|\nabla G^{N}_{k-1}(z_{k-1})\|^{2}\leq G^{N}_{0}(z_{0})-G^{N}_{K}(z_{K}).$$

$$\left[\eta\left(1-\frac{l}{2}\eta\right)-L\right]\sum_{k=1}^{K}\|\nabla G^{N}_{k}(z_{k})\|^{2}\leq G^{N}_{0}(z_{0})+L\|\nabla G^{N}_{0}(z_{0})\|^{2}.$$

$$\bar{\mu}_{k}\in\arg\min_{\mu\in\mathcal{P}(\mathbb{X})}\ \frac{1}{2}W_{2}^{2}(\mu,f_{0\#}\bar{\mu}_{k-1})+\eta\,\mathbb{E}_{\mu}\left[\bar{G}^{N}_{k}\right],\quad\text{given }\bar{\mu}_{0}\in\mathcal{P}(\mathbb{X}).$$

$$\bar{z}_{k}=f(\bar{z}_{k-1},w_{k-1})-\eta\nabla\bar{G}^{N}_{k}(\bar{z}_{k}),$$

$$\begin{aligned}\|z_{k}-\bar{z}_{k}\| &=\big\|f_{0}(z_{k-1})-f_{0}(\bar{z}_{k-1})+f_{0}(\bar{z}_{k-1})-f(\bar{z}_{k-1},w_{k-1})\\ &\qquad-\eta\nabla G^{N}_{k}(z_{k})+\eta\nabla G^{N}_{k}(\bar{z}_{k})-\eta\nabla G^{N}_{k}(\bar{z}_{k})+\eta\nabla\bar{G}^{N}_{k}(\bar{z}_{k})\big\|\\ &\leq c_{f}^{(1)}\|z_{k-1}-\bar{z}_{k-1}\|+c_{f}^{(2)}\|w_{k-1}\|+\eta l\|z_{k}-\bar{z}_{k}\|\end{aligned}$$

Full text

A Probabilistic Framework for Moving-Horizon Estimation: Stability and Privacy Guarantees

Vishaal Krishnan, Sonia Martínez

The authors are with the Department of Mechanical and Aerospace Engineering, University of California at San Diego, La Jolla CA 92093 USA (email: [email protected]; [email protected]).

Abstract

This work proposes a unifying probabilistic framework for the design of robustly asymptotically stable moving-horizon estimators (MHE) for discrete-time nonlinear systems, and a mechanism to incorporate differential privacy in moving-horizon estimation. We begin with an investigation of the classical notion of strong local observability of nonlinear systems and its relationship to optimization-based state estimation. We then present a general moving-horizon estimation framework for strongly locally observable systems, as an iterative minimization scheme in the space of probability measures. This framework allows for the minimization of the estimation cost with respect to different metrics. In particular, we consider two variants, which we name $W_{2}$ -MHE and KL-MHE, where the minimization scheme uses the 2-Wasserstein distance and the KL-divergence, respectively. The $W_{2}$ -MHE yields a gradient-based estimator whereas the KL-MHE yields a particle filter, for which we investigate asymptotic stability and robustness properties. Stability results for these moving-horizon estimators are derived in the probabilistic setting, against the backdrop of the classical notion of strong local observability which, to the best of our knowledge, differentiates it from other previous works. We then propose a mechanism to encode differential privacy of the measurements used by the estimator via an entropy regularization of the MHE objective functional. In particular, we find sufficient bounds on the regularization parameter to achieve the desired level of differential privacy. Numerical simulations demonstrate the performance of these estimators.

1 Introduction

Moving-horizon estimation (MHE) is an optimization-based state estimation method that uses the most recent measurements within a moving time horizon to recursively update state estimates. In principle, its optimization-based formulation enables it to handle nonlinearities and state constraints much more effectively than other known methods. This, coupled with the availability of increasingly powerful, inexpensive computing platforms, has brought new impetus to the adoption of moving-horizon estimation in various data-driven applications. In many cases, data is acquired from particular individuals or users, which introduces new ethical concerns about data collection and manipulation, highlighting an increasing need for data privacy. Such is the case in home monitoring and traffic estimation (with vehicle GPS data) applications, to name a few. Motivated by this, here we design and analyze a new class of moving-horizon estimation filters that can guarantee the differential privacy of the data.

The origins of MHE can be traced back to the limited memory optimal filters introduced in [19]. Theoretical investigations on MHE have broadly been directed at their asymptotic stability [29, 2, 33] and robustness [20, 24, 18] properties. These properties have primarily been built upon underlying assumptions of input/output-to-state (IOSS) stability, which is adopted as the notion of detectability, wherein the norm of the state is bounded given the sequences of inputs and outputs. However, alternative foundations for the stability results in other classical notions of observability, such as strong observability [25], have remained unexplored. The connection between nonlinear observability theory and estimation problems runs deep, see [22] and more recently [32], and it is worthwhile to explore this connection in the context of optimization-based estimation methods such as moving-horizon estimation.

The problem of state estimation is fundamentally about dealing with uncertainty, manifested as uncertainty in the initial conditions and/or in the evolution of the system in the presence of unknown disturbances. This is appropriately formulated in the space of probability measures over the state space of the system. Recent advances in gradient flows in the space of probability measures [5], [30], and the corresponding discrete-time movement-minimizing schemes [28] present powerful theoretical tools that can be applied to recursive optimization-based estimation methods such as moving-horizon estimation, and can serve as a unifying framework for their design and analysis.

Another important consideration in the MHE problem is the cost of computation. The problem formulation commonly involves solving an optimization problem at every time instant, with the state estimate and disturbances as decision variables, so that the dimension of the problem scales with the size of the horizon. This approach, in general, tends to be computationally intensive, which poses a hurdle for real-time implementation. This has motivated the search for fast MHE schemes that implement one or more iterations of the optimization at every time instant. Recently, in [3], [4], the authors develop such a method for noiseless systems and provide theoretical guarantees on convergence. However, these works assume the convexity of the cost function, which is restrictive for general nonlinear systems, and not well connected to notions of observability. None of these works has considered the additional question of privacy.

Differential privacy [11] has emerged over the past decade as a benchmark in data privacy. The typical setting assumes independence between the records in static databases; however, basic existing mechanisms fail to provide guarantees when correlations exist between the records in the database. This is the case when data is employed by a state estimation process whose output is then released: there is a dynamic system from which a time series of sensor measurements is obtained, and the measurement data and the released estimates are correlated.

In [8, 9], the authors generalize the definition of differential privacy to include general notions of distance between datasets and design differentially private mechanisms for Bayesian inference. In [23, 31], the authors investigate privacy-preserving mechanisms for the case where correlations exist between database records. Privacy-preserving mechanisms for functions and functional data were investigated in [15]. The work [27] studies the problem of differentially-private state estimation, introducing the formal notion of differential privacy into the framework of Kalman filter design for dynamic systems. The authors of [13] consider the problem of optimal state estimation for linear discrete-time systems with measurements corrupted by Laplacian noise. A finite-dimensional distributed convex optimization is considered in [26], where differential privacy is achieved by perturbation of the objective function. We refer the reader to [7] for a broad overview of the systems and control-theoretic perspective on differential privacy.

Contributions: The contributions of this work are two-fold: establishing the robust asymptotic stability of the proposed moving-horizon estimator in a probabilistic framework, founded on the notion of strong local observability; and incorporating differential privacy in moving-horizon estimation. We begin with the well-studied notion of strong local observability of nonlinear, discrete-time systems and investigate its relationship to the optimization-based state estimation problem. To handle uncertain initial conditions and the possible non-uniqueness of solutions to the estimation problem, we adopt a generalized problem formulation over the space of probability measures over the state space. More precisely, we define the MHE as a proximal gradient descent in the space of probability measures, with a non-convex, time-varying cost function. This probabilistic setting serves as a unifying framework for moving-horizon estimation and allows us to develop different classes of moving-horizon estimators by simply varying the metric used to define the proximal operator, and to obtain implementable filters by Monte Carlo methods. We then consider the Wasserstein metric and the KL-divergence, which yield the more familiar MHE and a particle filter, respectively. Following this, we present an analysis of the convergence and robustness properties of these estimators in the probabilistic setting, under assumptions of strong local observability. Further, we modify the optimization problem (in the space of probability measures) by an entropy regularization to derive conditions that guarantee a desired level of differential privacy for these filters.

Paper organization: The rest of the paper is organized as follows. In Section 2, we introduce the notation and mathematical preliminaries used in the paper. Section 3 introduces the observability notions on which the analysis rests. We present the optimization-based state estimation problem in Section 4, where Section 4.1 deals with the Full Information Estimation (FIE) problem and the Moving-horizon Estimation (MHE) problem is introduced in Section 4.2. We present the MHE method based on proximal gradient descent with the Wasserstein metric in Section 5, and with the KL-divergence in Section 6. In Section 7, we address the differential privacy considerations for the moving-horizon estimators designed. The results from numerical experiments are presented in Section 8, with the conclusions in Section 9.

2 Notation and preliminaries

In this section, we introduce the notation and mathematical preliminaries relevant to this paper.

Let $\|\cdot\|:\mathbb{R}^{d}\rightarrow{\mathbb{R}}_{\geq 0}$ denote the Euclidean norm on $\mathbb{R}^{d}$ and $|\cdot|:\mathbb{R}\rightarrow{\mathbb{R}}_{\geq 0}$ the absolute value function. We denote by $\nabla=\left(\frac{\partial}{\partial x_{1}},\ldots,\frac{\partial}{\partial x_{d}}\right)$ the gradient operator in $\mathbb{R}^{d}$. For $\mathcal{X}\subset\mathbb{R}^{d}$, we let $\mu\in\mathcal{P}(\mathcal{X})$ be an absolutely continuous probability measure on $\mathcal{X}$. We denote by $\rho$ the corresponding density function, where $\operatorname{d}\mu=\rho\operatorname{dvol}$, with $\operatorname{vol}$ being the Lebesgue measure. For $M\subseteq\mathcal{X}$, let the distance $d(x,M)$ of a point $x\in\mathcal{X}$ to the set $M$ be given by $d(x,M)=\inf_{y\in M}\|x-y\|$. We denote by $\left\langle p,q\right\rangle$ the inner product of functions $p,q:\mathcal{X}\rightarrow\mathbb{R}$ with respect to the Lebesgue measure $\operatorname{vol}$, given by $\left\langle p,q\right\rangle=\int_{\mathcal{X}}pq\operatorname{dvol}$. Let $F:\mathcal{P}(\mathcal{X})\rightarrow\mathbb{R}$ be a smooth real-valued function on the space of probability measures on $\mathcal{X}\subset\mathbb{R}^{d}$. We denote by $\frac{\delta F}{\delta\mu}(x)$ the derivative of $F$ with respect to $\mu$, see [12], such that a perturbation $\delta\mu$ of the measure results in a perturbation $\delta F=\int_{\mathcal{X}}\frac{\delta F}{\delta\mu}d(\delta\mu)$. Given a map $\mathcal{T}:\mathcal{X}\rightarrow\mathcal{Y}$ and a measure $\mu\in\mathcal{P}(\mathcal{X})$, we let $\nu=\mathcal{T}_{\#}\mu$ denote the pushforward measure of $\mu$ by $\mathcal{T}$, where for a measurable set $\mathcal{B}\subset\mathcal{T}(\mathcal{X})$, we have $\nu(\mathcal{B})=\mathcal{T}_{\#}\mu(\mathcal{B})=\mu(\mathcal{T}^{-1}(\mathcal{B}))$. Moreover, we denote by $\mathbb{E}_{\mu}$ the expectation operator with respect to the measure $\mu$.

We now introduce the notion of $l$ -smoothness that underlies the results on convergence of gradient descent methods.

Definition 1.

($l$-smoothness). A function $p:\mathcal{X}\rightarrow\mathbb{R}$ is called $l$-smooth (or Lipschitz differentiable) if for any $x,y\in\mathcal{X}$, we have $\|\nabla p(y)-\nabla p(x)\|\leq l\|y-x\|$.

The following lemma [6] can be easily verified for $l$ -smooth functions:

Lemma 1.

($l$-smooth functions). For an $l$-smooth function $p:\mathcal{X}\rightarrow\mathbb{R}$ and any $x,y\in\mathcal{X}$, we have $|p(y)-p(x)-\langle\nabla p(x),y-x\rangle|\leq\frac{l}{2}\|y-x\|^{2}$. $\bullet$
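As a quick numerical sanity check of Lemma 1, the sketch below samples random pairs of points and compares both sides of the inequality. The test function $p(x)=\sin x$ and the constant $l=1$ are illustrative assumptions, not taken from the paper; they are chosen because $\nabla p=\cos$ is 1-Lipschitz.

```python
import math
import random

def p(x):
    # Hypothetical test function (an assumption for illustration): p(x) = sin(x).
    return math.sin(x)

def grad_p(x):
    # Gradient of p; cos is 1-Lipschitz, so p is l-smooth with l = 1.
    return math.cos(x)

L_SMOOTH = 1.0

def lemma1_sides(x, y):
    """Return (lhs, rhs) of |p(y) - p(x) - <grad p(x), y - x>| <= (l/2) |y - x|^2."""
    lhs = abs(p(y) - p(x) - grad_p(x) * (y - x))
    rhs = 0.5 * L_SMOOTH * (y - x) ** 2
    return lhs, rhs

random.seed(0)
pairs = [(random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)) for _ in range(1000)]

# Collect any sampled pair that violates the bound (with a small floating-point slack).
violations = []
for x, y in pairs:
    lhs, rhs = lemma1_sides(x, y)
    if lhs > rhs + 1e-12:
        violations.append((x, y))
```

A passing check on samples is of course not a proof; it only illustrates what the inequality asserts.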

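We now define the proximal operator on $\mathcal{X}$ with respect to a function $F:\mathcal{X}\rightarrow\mathbb{R}$, as follows:

$$\operatorname{prox}_{F}(x)=\arg\min_{\tilde{x}\in\mathcal{X}}\ \frac{1}{2}\|\tilde{x}-x\|^{2}+F(\tilde{x}).$$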
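To make the proximal operator concrete, here is a minimal scalar sketch. It assumes $\mathcal{X}=\mathbb{R}$ and a convex $F$, and compares a brute-force minimizer (ternary search, a choice made here for simplicity) with two standard closed forms: the quadratic $F(z)=\frac{c}{2}(z-a)^{2}$ and the absolute value $F(z)=\lambda|z|$. The function names are hypothetical.

```python
import math

def prox_numeric(F, x, lo=-100.0, hi=100.0, iters=200):
    """Approximate argmin_z 0.5*(z - x)**2 + F(z) on [lo, hi] by ternary search.

    Assumes the objective is convex in z, so that it is unimodal on the interval.
    """
    def objective(z):
        return 0.5 * (z - x) ** 2 + F(z)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            hi = m2  # the minimizer cannot lie in (m2, hi]
        else:
            lo = m1  # the minimizer cannot lie in [lo, m1)
    return 0.5 * (lo + hi)

def prox_quadratic(x, c, a):
    """Closed form for F(z) = (c/2)*(z - a)**2: set the derivative to zero."""
    return (x + c * a) / (1.0 + c)

def prox_abs(x, lam):
    """Closed form for F(z) = lam*|z| (soft-thresholding)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

# Compare the numerical and closed-form proximal points at x = 3.
numeric_quad = prox_numeric(lambda z: 0.5 * 2.0 * (z - 1.0) ** 2, 3.0)
exact_quad = prox_quadratic(3.0, c=2.0, a=1.0)
numeric_abs = prox_numeric(lambda z: 1.0 * abs(z), 3.0)
exact_abs = prox_abs(3.0, lam=1.0)
```

The proximal point trades off staying close to $x$ against reducing $F$, which is the same trade-off the estimator later makes between the predicted state and the measurement cost.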

The notion of observability used in this paper is intricately related to solutions of inverse problems, with an associated notion of well-posedness that is introduced below:

Definition 2.

(Well-posedness [21]). Let $\mathcal{X}$ and $\mathcal{Y}$ be normed spaces, and $P:\mathcal{X}\rightarrow\mathcal{Y}$ a mapping. The equation $P(x)=y$ is called well-posed if:

1. Existence: For every $y\in\mathcal{Y}$, there is (at least one) $x\in\mathcal{X}$ such that $P(x)=y$.
2. Uniqueness: For every $y\in\mathcal{Y}$, there is at most one $x\in\mathcal{X}$ such that $P(x)=y$.
3. Stability: The solution $x$ depends continuously on $y$, that is, for any sequence $\{x_{i}\}\subset\mathcal{X}$ such that $P(x_{i})\rightarrow P(x)$, it follows that $x_{i}\rightarrow x$.

We now introduce the notion of lower semicontinuity of set-valued maps, which underlies some of the results on optimization-based state estimation in this paper.

Definition 3.

(Lower semicontinuity of set-valued maps). A point-to-set mapping $H:\mathcal{Z}\subset\mathbb{R}\rightrightarrows\mathbb{R}^{d}$ is lower semicontinuous at a point $\alpha\in\mathcal{Z}$ if for any $x\in H(\alpha)$ and any sequence $\{\alpha_{i}\}\subseteq\mathcal{Z}$ with $\{\alpha_{i}\}\rightarrow\alpha$, there exists a sequence $\{x_{i}\}\subseteq\mathbb{R}^{d}$ with $x_{i}\in H(\alpha_{i})$ for all sufficiently large $i$ such that $\{x_{i}\}\rightarrow x$. If $H$ is lower semicontinuous at every $\alpha\in\mathcal{Z}$, then $H$ is said to be lower semicontinuous on $\mathcal{Z}$.

We now define some notions of distance in the space of probability measures. Let $\mu_{1},\mu_{2}\in\mathcal{P}(\mathcal{X})$ be two absolutely continuous probability measures on $\mathcal{X}$ , with $\rho_{1},\rho_{2}$ being the corresponding density functions. Also, let $\Pi(\mu_{1},\mu_{2})\subset\mathcal{P}(\mathcal{X}\times\mathcal{X})$ be the space of joint probability measures that have $\mu_{1}$ and $\mu_{2}$ as their marginals. The $2$ -Wasserstein distance $W_{2}(\mu_{1},\mu_{2})$ between $\mu_{1}$ and $\mu_{2}$ is given by:

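$$W_{2}^{2}(\mu_{1},\mu_{2})=\inf_{\pi\in\Pi(\mu_{1},\mu_{2})}\int_{\mathcal{X}\times\mathcal{X}}\|x-y\|^{2}\,d\pi(x,y).$$

In what follows, we let $\frac{\delta W_{2}^{2}(\mu_{1},\mu_{2})}{\delta\mu_{1}}=\phi_{1}$, where $\phi_{1}$ is the so-called Kantorovich potential [30] associated with the transport from $\mu_{1}$ to $\mu_{2}$.

The KL-divergence from $\mu_{1}$ to $\mu_{2}$ is given by:

$$D_{\textup{KL}}(\mu_{1}\,\|\,\mu_{2})=\int_{\mathcal{X}}\log\left(\frac{d\mu_{1}(x)}{d\mu_{2}(x)}\right)d\mu_{1}(x)=\int_{\mathcal{X}}\rho_{1}(x)\log\left(\frac{\rho_{1}(x)}{\rho_{2}(x)}\right)\operatorname{dvol}(x).$$

The max-divergence between $\mu_{1}$ and $\mu_{2}$ is defined as:

$$D_{\textup{max}}(\mu_{1},\mu_{2})=\sup_{x\in\mathcal{X}}\log\left(\frac{\rho_{1}(x)}{\rho_{2}(x)}\right).$$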
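The three quantities just introduced can be illustrated on simple discrete stand-ins. The sketch below is an assumption-laden simplification: the KL- and max-divergences are computed for probability vectors on a common finite set (with the second vector strictly positive), and the squared 2-Wasserstein distance is computed for two one-dimensional empirical measures with equally many, equally weighted samples, where the optimal coupling is known to match sorted samples. None of the function names come from the paper.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for probability vectors on the same finite set (q > 0 assumed)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def max_divergence(p, q):
    """D_max(p, q) = max_x log(p(x) / q(x)), taken over points where p(x) > 0."""
    return max(math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def w2_squared_1d(xs, ys):
    """Squared 2-Wasserstein distance between two 1-D empirical measures.

    Assumes len(xs) == len(ys) and uniform weights; in one dimension the
    optimal transport plan then pairs the sorted samples.
    """
    xs, ys = sorted(xs), sorted(ys)
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

p = [0.5, 0.5]
q = [0.25, 0.75]
kl_pq = kl_divergence(p, q)      # 0.5 * log(4/3)
dmax_pq = max_divergence(p, q)   # log(2)
w2sq = w2_squared_1d([0.0, 1.0], [2.0, 3.0])  # a pure shift by 2
```

On these inputs the KL-divergence is bounded by the max-divergence, since an average of log-ratios cannot exceed their maximum, and the shift by 2 gives a squared Wasserstein distance of 4.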

We refer the reader to [14] for a detailed overview of the relations between the various metrics and divergences in probability spaces.

We define an estimator $\mathcal{E}:\mathcal{Y}\rightarrow\mathcal{P}(\mathcal{X})$ as a function that accepts as input data $y$ from the metric space $\mathcal{Y}$ and releases as output $\mathcal{E}[y]$ , a probability measure over the space $\mathcal{X}$ .

Definition 4.

(Differential privacy). Given $\delta$ , an estimator $\mathcal{E}$ is $\epsilon$ -differentially private if for any two $\delta$ -adjacent measurements $y_{1},y_{2}\in\mathcal{Y}$ (that is $d_{\mathcal{Y}}(y_{1},y_{2})\leq\delta$ ), and any measurable $A\subseteq\mathcal{X}$ , we have $\mathcal{E}[y_{1}](A)\leq e^{\epsilon}\mathcal{E}[y_{2}](A)$ .

Note that the condition $d_{\mathcal{Y}}(y_{1},y_{2})\leq\delta$ is a generalization of the notion of adjacency to arbitrary metric spaces that we adopt in this paper. We now have the following lemma on the connection between the notions of differential privacy and max-divergence introduced above:

Lemma 2.

(Differential privacy and max-divergence). An estimator $\mathcal{E}$ is $\epsilon$-differentially private if and only if $D_{\textup{max}}(\mathcal{E}[y_{1}],\mathcal{E}[y_{2}])\leq\epsilon$ for any $y_{1},y_{2}\in\mathcal{Y}$ with $d_{\mathcal{Y}}(y_{1},y_{2})\leq\delta$.

Proof.

Clearly, if for any $y_{1},y_{2}\in\mathcal{Y}$ with $d_{\mathcal{Y}}(y_{1},y_{2})\leq\delta$ we have $D_{\textup{max}}(\mathcal{E}[y_{1}],\mathcal{E}[y_{2}])\leq\epsilon$, then, with $\rho_{1}$ and $\rho_{2}$ denoting the densities of $\mathcal{E}[y_{1}]$ and $\mathcal{E}[y_{2}]$:

$$\epsilon\geq D_{\textup{max}}(\mathcal{E}[y_{1}],\mathcal{E}[y_{2}])=\sup_{x\in\mathcal{X}}\log\left(\frac{\rho_{1}(x)}{\rho_{2}(x)}\right)\geq\log\left(\frac{\rho_{1}(x)}{\rho_{2}(x)}\right).$$

This implies that for any $x\in\mathcal{X}$, we have $\rho_{1}(x)\leq e^{\epsilon}\rho_{2}(x)$. Now, for any measurable $A\subseteq\mathcal{X}$, we have $\mathcal{E}[y_{1}](A)=\int_{A}\rho_{1}(x)\operatorname{dvol}\leq\int_{A}e^{\epsilon}\rho_{2}(x)\operatorname{dvol}=e^{\epsilon}\int_{A}\rho_{2}(x)\operatorname{dvol}=e^{\epsilon}\mathcal{E}[y_{2}](A)$, which implies that $\mathcal{E}$ is $\epsilon$-differentially private. The forward implication can be easily verified. ∎

Thus, $\epsilon$ -differential privacy essentially imposes an upper bound on the sensitivity of the estimate generated by $\mathcal{E}$ (in the sense of the max-divergence $D_{\textup{max}}$ ), to the measurement.
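The equivalence in Lemma 2 can be checked by brute force in a finite setting. The sketch below assumes two hypothetical estimator outputs `p1` and `p2` (probability vectors on a three-point space, standing in for $\mathcal{E}[y_{1}]$ and $\mathcal{E}[y_{2}]$ for one adjacent pair), sets $\epsilon$ to their max-divergence, and verifies the set-wise bound $\mathcal{E}[y_{1}](A)\leq e^{\epsilon}\mathcal{E}[y_{2}](A)$ over every nonempty subset $A$.

```python
import itertools
import math

def max_divergence(p, q):
    """D_max(p, q) = max_x log(p(x) / q(x)) for strictly positive probability vectors."""
    return max(math.log(pi / qi) for pi, qi in zip(p, q))

def satisfies_dp_bound(p1, p2, eps, tol=1e-12):
    """Check p1(A) <= exp(eps) * p2(A) for every nonempty subset A of the index set."""
    n = len(p1)
    for size in range(1, n + 1):
        for subset in itertools.combinations(range(n), size):
            mass1 = sum(p1[i] for i in subset)
            mass2 = sum(p2[i] for i in subset)
            if mass1 > math.exp(eps) * mass2 + tol:
                return False
    return True

# Hypothetical outputs of an estimator on two adjacent measurements.
p1 = [0.2, 0.3, 0.5]
p2 = [0.25, 0.25, 0.5]

eps = max_divergence(p1, p2)                    # log(0.3 / 0.25) = log(1.2)
holds_at_eps = satisfies_dp_bound(p1, p2, eps)
holds_at_half_eps = satisfies_dp_bound(p1, p2, 0.5 * eps)
```

With $\epsilon$ equal to the max-divergence the bound holds on every subset, while any smaller $\epsilon$ fails on the singleton where the log-ratio is maximal. A full check of Definition 4 would repeat this for every adjacent pair, in both orders.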

3 Observability notions

In this paper, we consider systems of the form:

$$\Omega:\begin{cases}x_{k+1}=f(x_{k},w_{k}),\\ y_{k}=h(x_{k})+v_{k},\end{cases}\tag{1}$$

where $f:\mathbb{X}\times\mathbb{W}\rightarrow\mathbb{X}$ and $h:\mathbb{X}\rightarrow\mathbb{Y}$, $w_{k}\in\mathbb{W}$ is the process noise, $v_{k}\in\mathbb{V}$ is the measurement noise at time instant $k$, and $\mathbb{X}\subset\mathbb{R}^{d_{X}}$, $\mathbb{Y}\subset\mathbb{R}^{d_{Y}}$, $\mathbb{W}\subset\mathbb{R}^{d_{W}}$, and $\mathbb{V}\subset\mathbb{R}^{d_{V}}$.

Assumption 1.

(Lipschitz continuity). The functions $f$ and $h$ are Lipschitz continuous, with $\|f(x_{1},w_{1})-f(x_{2},w_{2})\|\leq c_{f}^{(1)}\|x_{1}-x_{2}\|+c_{f}^{(2)}\|w_{1}-w_{2}\|$ and $\|h(x_{1})-h(x_{2})\|\leq c_{h}\|x_{1}-x_{2}\|$ .

Assumption 2.

(Noise characteristics). The noise sequences $\{w_{k}\}_{k\in\mathbb{N}}$ and $\{v_{k}\}_{k\in\mathbb{N}}$ are i.i.d. samples from distributions $\omega$ and $\nu$ (with supports in $\mathbb{W}$ and $\mathbb{V}$, respectively). The sets $\mathbb{W}$ and $\mathbb{V}$ are bounded, with $\|w_{k}\|\leq W$ and $\|v_{k}\|\leq V$. Moreover, we assume that $\mathbb{E}_{\omega}[w_{k}]=0$ and $\mathbb{E}_{\nu}[v_{k}]=0$.

We also introduce the following autonomous system corresponding to (1):

$$\Sigma:\begin{cases}x_{k+1}=f(x_{k},0)=f_{0}(x_{k}),\\ y_{k}=h(x_{k}).\end{cases}\tag{2}$$

With a slight abuse of notation, for any $x\in\mathbb{X}$, we let $\Sigma_{T}(x)=\left(h(x),h\circ f_{0}(x),\ldots,h\circ f_{0}^{T}(x)\right)$, the sequence of outputs over a horizon of length $T+1$ for the system (2) from the state $x\in\mathbb{X}$. Similarly, for the system (1), we let $\Omega(x,\mathbf{w}_{i:j})=\left(h(x),h\circ f(x,w_{i}),\ldots,h\circ f(\ldots f(f(x,w_{i}),w_{i+1}),\ldots,w_{j})\right)$, for some sequence of process noise samples $\{w_{k}\}$, where $\mathbf{w}_{i:j}=(w_{i},\ldots,w_{j})$.
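The output maps $\Sigma_{T}$ and $\Omega$ are simply rollouts of the dynamics with the output recorded at each step. A minimal sketch follows, using a hypothetical scalar system $f(x,w)=0.5x+w$, $h(x)=x$ chosen only for illustration.

```python
def sigma_T(x, T, f0, h):
    """Noise-free output sequence (h(x), h(f0(x)), ..., h(f0^T(x))) of length T + 1."""
    outputs = []
    state = x
    for _ in range(T + 1):
        outputs.append(h(state))
        state = f0(state)
    return tuple(outputs)

def omega(x, w_seq, f, h):
    """Output sequence of the system with process noise samples w_seq = (w_i, ..., w_j)."""
    outputs = [h(x)]
    state = x
    for w in w_seq:
        state = f(state, w)
        outputs.append(h(state))
    return tuple(outputs)

# Hypothetical scalar system (an assumption for illustration only).
def f(x, w):
    return 0.5 * x + w

def f0(x):
    return f(x, 0.0)

def h(x):
    return x

noise_free = sigma_T(4.0, 2, f0, h)
zero_noise = omega(4.0, (0.0, 0.0), f, h)
perturbed = omega(4.0, (1.0, 0.0), f, h)
```

With zero process noise the two maps coincide, which mirrors the relation between systems (1) and (2).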

The theoretical results in the moving-horizon estimation literature have largely been derived in the setting of input/output-to-state (IOSS) stability, as in [29, 20, 18] to name a few, which is a notion of norm-observability, see [17], wherein the norm of the state is bounded using the sequences of inputs and outputs. However, there are other classical notions of observability based on the notion of distinguishability, which generalize the approach taken to linear systems. For a detailed treatment, we refer the reader to [25] and [1]. In this paper, we explore the connection between the classical notion of strong local observability and moving-horizon estimation.

We now introduce the notion of strong local observability used in this paper:

Definition 5.

(Strong local observability). The system $\Sigma$ defined in (2) is called strongly locally observable if there exists a $T_{0}\in\mathbb{N}$ such that for any given $x\in\mathbb{X}$ and $T\geq T_{0}$ , we have that ${\Sigma_{T}}^{-1}\circ\Sigma_{T}(x)$ is a set of isolated points. Moreover, for all $x\in\mathbb{X}$ and $T_{1},T_{2}\geq T_{0}$ , we have that ${\Sigma_{T_{1}}}^{-1}\circ\Sigma_{T_{1}}(x)={\Sigma_{T_{2}}}^{-1}\circ\Sigma_{T_{2}}(x)$ . We call $T_{0}$ the minimum horizon length of $\Sigma$ .

The above definition is equivalent to the definitions contained in [25, 1], which we have restated in a manner suitable for the optimization-based estimation framework considered here. As seen from the above definition, strong observability is based on a distinguishability notion, and when it holds globally (i.e., ${\Sigma_{T}}^{-1}\circ\Sigma_{T}=id$ for all $T\geq T_{0}$) it is equivalent to the notion of uniform observability, as established in [16].

For systems with process noise, of the form $\Omega$ in (1), we introduce the notion of almost sure strong local observability.

Definition 6.

(Almost sure strong local observability). The system $\Omega$ defined in (1) is called almost surely strongly locally observable if there exists a $T^{w}\in\mathbb{N}$ such that, for $T\geq T^{w}$, given a process noise sequence $\mathbf{w}_{0:T-1}\in\mathbb{W}^{T}$ and any $\mathbf{y}_{0:T}=\Omega_{\mathbf{w}_{0:T-1}}(x)\in\mathbb{Y}^{T+1}$, we have that $\Omega_{\mathbf{w}_{0:T-1}}^{-1}(\mathbf{y}_{0:T})$ is a set of isolated points almost surely. More precisely, the set of noise sequences $\mathbf{w}_{0:T-1}$ for which $\Omega_{\mathbf{w}_{0:T-1}}^{-1}(\mathbf{y}_{0:T})$ is not a set of isolated points is of measure zero. Moreover, we call $T^{w}$ the minimum horizon length of $\Omega$.

We now present a fundamental result that characterizes strong local observability via a rank condition.

Lemma 3.

(Observability rank condition [25]). The system $\Sigma$ is strongly locally observable with minimum horizon length $T_{0}$ if and only if $\text{Rank}(\nabla\Sigma_{T}(x))=\text{dim}(\mathbb{X})$ for all $T\geq T_{0}$ and $x\in\mathbb{X}$. The system $\Omega$ is almost surely strongly locally observable with minimum horizon length $T^{w}$ if and only if $\text{Rank}(\nabla\Omega_{\mathbf{w}_{0:T-1}}(x))=\text{dim}(\mathbb{X})$ almost surely for all $T\geq T^{w}$ and $x\in\mathbb{X}$. $\bullet$

We now present an example to illustrate these concepts.

Example 1.

Consider a system with the state space $\mathbb{X}=(0,\infty)$ , with $x_{k+1}=f_{0}(x_{k})$ and $y_{k}=h(x_{k})$ , such that:

$$f_{0}(x)=\begin{cases}3x, & \text{for } x\in(0,a\pi-\epsilon],\\ \gamma(x), & \text{for } x\in(a\pi-\epsilon,a\pi+\epsilon],\\ 2x+a\pi, & \text{for } x\in(a\pi+\epsilon,\infty),\end{cases}$$

for some $a\in\mathbb{N}$, $\epsilon$ small and a smooth function $\gamma$ such that $\gamma(a\pi-\epsilon)=3(a\pi-\epsilon)$ and $\gamma(a\pi+\epsilon)=2(a\pi+\epsilon)+a\pi$. Moreover, let the output be $h(x)=\sin{x}$. We note that $\nabla h(x)=\cos{x}$, which implies that $\nabla h((2m+1)\pi/2)=0$ for all $m\in\mathbb{N}$. Applying Lemma 3 to this system with $a=2$, we infer that the minimum horizon length is $T_{0}=3$. This is because the system becomes strongly locally observable at $x=\pi/2$ only over a horizon of length $T_{0}=3$, that is, $\nabla\Sigma_{k}(\pi/2)=\mathbf{0}_{k+1}$ for $k\in\{0,1,2\}$. Indeed, the trajectory from $\pi/2$ visits $3\pi/2$ and $9\pi/2$, where the cosine vanishes, before reaching $2(9\pi/2)+2\pi=11\pi$, where $\cos(11\pi)=-1$. This is a case of a one-dimensional system which is strongly locally observable with a minimum horizon of length $T_{0}=3$. With larger values of $a$, the minimum horizon length is further increased. $\bullet$
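The claim of Example 1 for $a=2$ can be reproduced numerically by evaluating $\nabla\Sigma_{T}(\pi/2)$ with the chain rule. The sketch below is a minimal illustration under stated assumptions: $\epsilon=10^{-2}$ is an arbitrary small value, and the transition function $\gamma$ is left unspecified (the code raises an error if the trajectory enters the transition region, which does not happen from $x=\pi/2$).

```python
import math

A = 2        # the parameter a of Example 1
EPS = 1e-2   # hypothetical value of the small parameter epsilon

def f0(x):
    """Piecewise dynamics of Example 1 outside the transition region."""
    if x <= A * math.pi - EPS:
        return 3.0 * x
    if x > A * math.pi + EPS:
        return 2.0 * x + A * math.pi
    raise ValueError("transition region: gamma is not specified in this sketch")

def df0(x):
    """Derivative of f0 outside the transition region."""
    if x <= A * math.pi - EPS:
        return 3.0
    if x > A * math.pi + EPS:
        return 2.0
    raise ValueError("transition region: gamma is not specified in this sketch")

def grad_sigma_T(x, T):
    """Entries d/dx sin(f0^k(x)) = cos(f0^k(x)) * (f0^k)'(x) for k = 0, ..., T."""
    grads = []
    state, jac = x, 1.0
    for k in range(T + 1):
        grads.append(math.cos(state) * jac)
        if k < T:
            jac *= df0(state)   # chain rule: accumulate the derivative of f0^k
            state = f0(state)
    return grads

# Entries 0..2 vanish up to rounding; entry 3 equals cos(11*pi) * 3 * 3 * 2 = -18.
g = grad_sigma_T(math.pi / 2.0, 3)
```

Since the state is scalar, the rank condition of Lemma 3 at $x=\pi/2$ reduces to asking whether at least one entry of $\nabla\Sigma_{T}(\pi/2)$ is nonzero, which first happens at $T=3$.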

We make the following assumption in the rest of the paper:

Assumption 3.

(Strong local observability).

1. The system $\Sigma$ in (2) is strongly locally observable with minimum horizon length $T_{0}$ .

2. The system $\Omega$ in (1) is almost surely strongly locally observable with minimum horizon length $T^{w}$ .

4 Optimization-based state estimation

We now begin by addressing the state estimation problem for the autonomous system $\Sigma$ , and develop a recursive moving-horizon estimator for it.

4.1 Full-Information Estimation (FIE)

Let $\{y_{k}\}_{k\in\{0\}\cup\mathbb{N}}$ be a sequence of measurements generated by the system $\Sigma$ . Let $\{0,\ldots,T\}$ be a time horizon such that $T\geq T_{0}$ , the minimum horizon length of the system $\Sigma$ , and denote $\mathbf{y}_{0:T}=(y_{0},\ldots,y_{T})$ . The estimation problem aims at characterizing ${\Sigma_{T}}^{-1}(\mathbf{y}_{0:T})$ , which is an inverse problem, and optimal estimation formulates this problem as an optimization. Assumptions 1 and 3, on Lipschitz continuity and strong local observability, respectively, ensure that the inverse problem is locally well-posed as in Definition 2.

To formulate the inverse problem as an optimization, consider a convex function $J_{T}(\mathbf{y}_{0:T},\cdot):\mathbb{Y}^{T+1}\rightarrow{\mathbb{R}}_{\geq 0}$ such that $J_{T}(\mathbf{y}_{0:T},\xi)=0$ if and only if $\xi=\mathbf{y}_{0:T}$ . Moreover, we let $\lim_{T\rightarrow\infty}J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(x))=\infty$ if $x\notin{\Sigma_{T}}^{-1}(\mathbf{y}_{0:T})$ for $T\geq T_{0}$ . Now, the problem of interest becomes:

[TABLE]

In the above, $\mathbf{y}_{0:T}$ is the given data of the estimation problem. Since the objective is to solve the original inverse problem using gradient descent-based methods, we would like every local minimizer of $J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(x))$ to belong to the set ${\Sigma_{T}}^{-1}(\mathbf{y}_{0:T})$ , or, in other words, every local minimizer to also be global. We therefore make the following additional assumption on the system $\Sigma$ and the choice of $J_{T}$ . For conciseness of notation, in the following assumption and lemma we let $J_{T}(\cdot)=J_{T}(\mathbf{y}_{0:T},\cdot)$ , suppressing the data $\mathbf{y}_{0:T}$ where it is understood from context.

Assumption 4.

(Lower semicontinuity of sublevel sets). We assume that, for all $T\geq T_{0}$ , the convex function $J_{T}:\mathbb{Y}^{T+1}\rightarrow\mathbb{R}$ is such that the set-valued map $\mathcal{S}_{\mathbb{X}}(\alpha)={\Sigma_{T}}^{-1}\left(\mathcal{S}^{J_{T}}_{\mathbb{Y}^{T+1}}(\alpha)\cap\Sigma_{T}(\mathbb{X})\right)$ is lower semicontinuous, where $\mathcal{S}^{J_{T}}_{\mathbb{Y}^{T+1}}(\alpha)=\{\xi\in\mathbb{Y}^{T+1}|J_{T}(\xi)\leq\alpha\}$ .

The above assumption ensures that the function $J_{T}\left(\mathbf{y}_{0:T},\Sigma_{T}(\cdot)\right)$ satisfies the condition for the local minimizers to be global (Theorem 1 from [34]). The following lemma provides a sufficient condition for it to hold.

Lemma 4.

(Second-order sufficient condition for lower semicontinuity). Assumption 4 holds if for any $x\in\mathbb{X}$ such that $\nabla\left(J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(x))\right)=0$ we have $J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(x))=0$ , or the following condition holds when $J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(x))\neq 0$ for any $v\in\mathbb{R}^{d_{X}}$ , $v\neq 0$ :

[TABLE]

where $\operatorname{Hess}J_{T}$ is the Hessian of $J_{T}$ . $\bullet$

The final inequality in Lemma 4 merely states that those critical points at which the cost function does not reach the global minimum value are local maximizers.

We are now ready to present the following theorem that establishes the equivalence between the inverse problem of characterizing the set ${\Sigma_{T}}^{-1}(\mathbf{y}_{0:T})$ and the optimization (3).

Theorem 1.

(Inverse as minimizer). For a convex $J_{T}(\mathbf{y}_{0:T},\cdot):\mathbb{Y}^{T+1}\rightarrow{\mathbb{R}}_{\geq 0}$ such that $J_{T}(\mathbf{y}_{0:T},\xi)=0$ if and only if $\xi=\mathbf{y}_{0:T}$ for any $\mathbf{y}_{0:T}\in\mathbb{Y}^{T+1}$ , under Assumptions 3 and 4, and any $T\geq T_{0}$ , it holds that $z\in{\Sigma_{T}}^{-1}(\mathbf{y}_{0:T})$ if and only if $z$ is a minimizer of $J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(\cdot))$ .

Proof.

If $z\in{\Sigma_{T}}^{-1}(\mathbf{y}_{0:T})$ , we have that $h\circ f^{k}_{0}(z)=y_{k}$ for all $k\in\{0,\ldots,T\}$ . It now follows that $J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(z))=0$ . Since $J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(z))\geq 0$ by definition, we infer that $z$ is a global minimizer of $J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(\cdot))$ .

Suppose that $z$ is a local minimizer of $J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(\cdot))$ . By Assumption 4 and Theorem 1 in [34], we get that the local minima of $J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(\cdot))$ are also global, which implies that $J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(z))=0$ , and therefore $\Sigma_{T}(z)=\mathbf{y}_{0:T}$ . ∎

Theorem 1 suggests that the state estimates for the system $\Sigma$ can be obtained by minimizing $J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(\cdot))$ over a horizon of length $T\geq T_{0}$ . This is also called the full information estimation (FIE) problem in the optimal state estimation literature [29, 18], as it works with the entire sequence of output measurements over the horizon $\{0,\ldots,T\}$ .
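The full-information estimation idea can be illustrated on a small example. The sketch below assumes a hypothetical linear rotation system with a scalar output and the quadratic choice $J_{T}(\mathbf{y}_{0:T},\xi)=\frac{1}{2}\|\xi-\mathbf{y}_{0:T}\|^{2}$ , neither of which is prescribed by the paper, and recovers the initial state by plain gradient descent.

```python
import numpy as np

THETA = 0.3
A = np.array([[np.cos(THETA), -np.sin(THETA)],
              [np.sin(THETA),  np.cos(THETA)]])  # hypothetical linear f0
C = np.array([[1.0, 0.0]])                        # hypothetical linear h

def sigma(x, T):
    """Stacked noiseless outputs (h(x), ..., h(f0^T(x)))."""
    outputs = []
    for _ in range(T + 1):
        outputs.append(C @ x)
        x = A @ x
    return np.concatenate(outputs)

def fie_gradient_descent(y, T, x_init, iters=2000):
    """Minimize 0.5 * ||Sigma_T(x) - y||^2 by gradient descent."""
    # For this linear example Sigma_T(x) = O x, with O the stacked output matrix.
    O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(T + 1)])
    step = 1.0 / np.linalg.norm(O, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.asarray(x_init, dtype=float)
    for _ in range(iters):
        x = x - step * O.T @ (O @ x - y)
    return x

x_true = np.array([1.0, -0.5])
T = 3
y = sigma(x_true, T)
x_hat = fie_gradient_descent(y, T, x_init=np.zeros(2))
cost = 0.5 * np.sum((sigma(x_hat, T) - y) ** 2)
```

In this linear case the cost has a single minimizer, so the inverse set is a singleton. For a nonlinear $\Sigma$ the inverse set may contain several isolated points, and gradient descent would reach only the one whose basin of attraction contains the initial guess.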

Now, from Assumption 3 and Theorem 1, we have that ${\Sigma_{T}}^{-1}(\mathbf{y}_{0:T})$ is a set of isolated points which are minimizers of $J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(\cdot))$ . It then follows that ${\Sigma_{T}}^{-1}(\mathbf{y}_{0:T})$ is the set of stable fixed points of the negative gradient vector field of $J_{T}(\mathbf{y}_{0:T},\Sigma_{T}(\cdot))$ . We let $\mathcal{C}_{0}$ be the basin of attraction of this set. Moreover, we note that $f^{k}({\Sigma_{T}}^{-1}(\mathbf{y}_{0:T}))$ is the set of stable fixed points of the negative gradient vector field of $J_{T}\left(\mathbf{y}_{k:k+T},f^{k}\circ\Sigma_{T}(\cdot)\right)$ , and we let $\mathcal{C}_{k}$ be the basin of attraction of ${\Sigma_{T}}^{-1}(\mathbf{y}_{k:k+T})$ . We have used above the fact that ${\Sigma_{T}}^{-1}(\mathbf{y}_{k:k+T})=f_{0}^{k}({\Sigma_{T}}^{-1}(\mathbf{y}_{0:T}))$ , which follows from the definition of strong local observability.

We now lift the FIE problem (3) to the space of probability measures over $\mathbb{X}$ , as a minimization in expectation of the estimation objective function:

[TABLE]

The above formulation allows us to capture information about the (possibly many) optimal estimates through a probability measure $\mu_{0}$ , and helps encode distributional constraints, which will be considered in a forthcoming publication.

In the following, we develop recursive moving-horizon estimators that generate sequences $\{\mu_{k}\}_{k\in\mathbb{N}}$ of probability measures in $\mathcal{P}(\mathbb{X})$ as estimates. We then obtain practically implementable estimators using Monte Carlo methods to sample from the measures $\mu_{k}$ .

4.2 Moving-Horizon Estimation (MHE)

In the previous section, we presented a formulation of the full information estimation (FIE) problem for the autonomous system $\Sigma$ , which uses the entire measurement sequence over a horizon of length $T\geq T_{0}$ . However, the minimum horizon length $T_{0}$ may be large, which would make the estimation computationally intensive. Moreover, we would like to progressively assimilate the incoming measurements online. We therefore adopt a moving-horizon estimation method which, at any time instant $k+N$ , uses the output measurements from the horizon $\{k+1,\ldots,k+N\}$ (of length $N<T_{0}$ ), and the state estimate at the time instant $k-1$ , to obtain the state estimate at instant $k$ , recursively.

We let $G^{N}_{k}(z)=J_{N-1}\left(\mathbf{y}_{k+1:k+N},\Sigma_{N}(z)\right)$ be the objective function over the horizon $\{k+1,\ldots,k+N\}$ , at the time instant $k+N$ , where $\mathbf{y}_{k+1:k+N}=(y_{k+1},\ldots,y_{k+N})$ .

Assumption 5.

(Moving-horizon cost). We make the following assumptions on the cost function $G^{N}_{k}$ :

1. the cost $G^{N}_{k}$ is $l$ -smooth,

2. it holds that $|G^{N}_{k+1}(f_{0}(z))-G^{N}_{k}(z)|\leq L\|\nabla G^{N}_{k}(z)\|^{2}$ ,

3. the previous constants are such that $lL\leq\frac{1}{2}$ ,

4. for any two $\delta$ -adjacent measurements $\mathbf{y},\tilde{\mathbf{y}}\in\mathbb{Y}^{T+1}$ , such that $\|\mathbf{y}-\tilde{\mathbf{y}}\|\leq\delta$ and with corresponding costs $G^{N}_{k}$ and ${\widetilde{G}}^{N}_{k}$ , for $k\in\{0,\ldots,T\}$ and $N\leq T-k$ , we have $\|\nabla(G^{N}_{k}-{\widetilde{G}}^{N}_{k})(x)\|\leq l\delta$ for all $x\in\mathbb{X}$ .

We now formulate the general moving-horizon estimation method as follows:

[TABLE]

where $D:\mathcal{P}(\mathbb{X})\times\mathcal{P}(\mathbb{X})\rightarrow{\mathbb{R}}_{\geq 0}$ is a placeholder for a metric, divergence or transport cost on $\mathcal{P}(\mathbb{X})$ . We obtain implementable observers from the above formulation by sampling from the measures using Monte Carlo methods. As discussed in the ensuing sections, using the $2$ -Wasserstein distance $W_{2}$ yields the more familiar MHE formulation, whereas with the KL-divergence we obtain a moving-horizon particle filter. Hence, this formulation is proposed as a unifying probabilistic framework for moving-horizon estimation, where different estimators are generated by different choices of $D$ .

We now introduce the following asymptotic stability notion for estimators that will be used in investigating the properties of the estimators we design.

Definition 7.

(Asymptotic stability of state estimator). We call an estimator of the form (5) an asymptotically stable observer for the system $\Sigma$ if the sequence of estimates $\{\mu_{k}\}_{k\in\mathbb{N}}$ is such that $\lim_{k\rightarrow\infty}\mu_{k}({\Sigma_{T}}^{-1}(\mathbf{y}_{k:k+T}))=1$ for $T\geq T_{0}$ .

5 A $W_{2}$ -Moving-Horizon Estimator

In this section, we derive a moving-horizon estimator, which we refer to as the $W_{2}$ -MHE, to generate a sequence of probability distributions $\{\mu_{k}\}_{k\in\mathbb{N}}$ . This is based on the one-step minimization scheme of [30] in $\mathcal{P}(\mathbb{X})$ w.r.t. the Wasserstein metric $W_{2}$ , which we extend to the moving-horizon setting. For every $k>0$ , consider:

[TABLE]

We let $\mathcal{K}_{k}$ be the support of $\mu_{k}$ , with $\mathcal{K}_{0}\subseteq\mathcal{C}_{0}$ , where $\mathcal{C}_{0}$ is as defined earlier in Section 4.1.

5.1 Sample update scheme for $W_{2}$ -MHE

We now derive a sample update scheme for $W_{2}$ -MHE, which also yields an implementable filter for the $W_{2}$ -MHE formulation.

We note that any local minimizer $\mu_{k}$ of (6) is a critical point of the objective functional and therefore, it satisfies:

[TABLE]

where $\phi_{k}$ is the Kantorovich potential [30] associated with the transport from $\mu_{k}$ to $f_{0\#}\mu_{k-1}$ , and $c$ is a constant (from the constraint $\int_{\mathbb{X}}d\mu(x)=1$ , for $\mu\in\mathcal{P}(\mathbb{X})$ , due to which the first variation is defined up to an additive constant). From the above equation, we now obtain:

[TABLE]

The gradient of the Kantorovich potential $\phi_{k}$ defines the deterministic optimal transport map $T_{k}$ (note that this notation is not to be confused with that of the time horizon $T$ ) w.r.t. the $W_{2}$ -distance from $\mu_{k}$ to $f_{0\#}\mu_{k-1}$ , which determines $\nabla\phi_{k}(x)=x-T_{k}^{-1}(x)$ (where $\mu_{k}={T_{k}}_{\#}f_{0\#}\mu_{k-1}$ ). We therefore get:

[TABLE]

The above equation allows us to design an implementable filter for the $W_{2}$ -MHE (6). We let $z_{k}\sim\mu_{k}$ , that is, $z_{k}\in\mathcal{K}_{k}$ is sampled from the distribution $\mu_{k}$ . From (7), it holds that $z_{k}=T_{k}^{-1}(z_{k})-\eta\nabla G^{N}_{k}(z_{k})$ . Since $(T_{k}^{-1})_{\#}\mu_{k}=f_{0\#}\mu_{k-1}$ , we let $T_{k}^{-1}(z_{k})=f_{0}(z_{k-1})$ , a sample of the distribution $f_{0\#}\mu_{k-1}$ , and we obtain the following recursive estimator:

[TABLE]

We now note that the estimate $z_{k}$ in (8) corresponds to a critical point of the following minimizing movement scheme:

[TABLE]

Lemma 5.

(Strong convexity). For $\eta<l^{-1}$ , the objective function in (9) is strongly convex, and therefore $\text{prox}_{\eta G^{N}_{k}}(f_{0}(x))$ is a singleton for any $x\in\mathbb{X}$ .

Proof.

Let $\Theta(z)=\frac{1}{2}\left\|z-f_{0}(\tilde{z})\right\|^{2}+\eta G^{N}_{k}(z)$ . We have that $\nabla\Theta(z_{1})-\nabla\Theta(z_{2})=z_{1}-z_{2}+\eta\left(\nabla G^{N}_{k}(z_{1})-\nabla G^{N}_{k}(z_{2})\right)$ . It now follows that $\left\langle\nabla\Theta(z_{1})-\nabla\Theta(z_{2}),z_{1}-z_{2}\right\rangle=\|z_{1}-z_{2}\|^{2}+\eta\left\langle\nabla G^{N}_{k}(z_{1})-\nabla G^{N}_{k}(z_{2}),z_{1}-z_{2}\right\rangle$ . From Assumption 5-(1), on the moving-horizon cost, we now get that $\left\langle\nabla\Theta(z_{1})-\nabla\Theta(z_{2}),z_{1}-z_{2}\right\rangle\geq(1-\eta l)\|z_{1}-z_{2}\|^{2}$ , and since $\eta l<1$ , we infer that $\Theta$ is strongly convex, and therefore has a unique minimizer. Thus, $\text{prox}_{\eta G^{N}_{k}}(f_{0}(\tilde{z}))=\arg\min_{z}\Theta(z)$ is a singleton. ∎

We note that the minimization (9) defines a proximal mapping w.r.t. the Euclidean metric, which we represent in a compact form using the proximal operator as:

[TABLE]

where $\text{supp}(\mu_{0})=\mathcal{K}_{0}\subseteq\mathcal{C}_{0}$ .
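The sample update $z_{k}=\text{prox}_{\eta G^{N}_{k}}(f_{0}(z_{k-1}))$ is implicit in $z_{k}$ , since it requires solving $z_{k}=f_{0}(z_{k-1})-\eta\nabla G^{N}_{k}(z_{k})$ . One possible way to evaluate it, sketched below, is a fixed-point iteration, which is a contraction when $\eta l<1$ . The fixed-point solver, the quadratic cost, and the dynamics used here are assumptions made for illustration and are not part of the paper.

```python
import numpy as np

def prox_step(grad_G, center, eta, iters=200, tol=1e-12):
    """Solve z = center - eta * grad_G(z) by fixed-point iteration.

    The iteration map is a contraction when eta * l < 1, with l the
    Lipschitz constant of grad_G, matching the condition of Lemma 5.
    """
    z = np.array(center, dtype=float)
    for _ in range(iters):
        z_next = center - eta * grad_G(z)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

def w2_mhe_update(z_prev, f0, grad_G, eta):
    """One sample update: z_k = prox_{eta G}(f0(z_{k-1}))."""
    return prox_step(grad_G, f0(z_prev), eta)

# Hypothetical quadratic cost G(z) = 0.5 * l * ||z - a||^2, for which the
# proximal point is known in closed form: (c + eta * l * a) / (1 + eta * l).
l, eta = 2.0, 0.25
a = np.array([1.0, -1.0])
grad_G = lambda z: l * (z - a)
f0 = lambda z: 0.9 * z          # hypothetical dynamics

z_prev = np.array([2.0, 2.0])
z_k = w2_mhe_update(z_prev, f0, grad_G, eta)
closed_form = (f0(z_prev) + eta * l * a) / (1.0 + eta * l)
```

Applying `w2_mhe_update` to each sample drawn from $\mu_{k-1}$ gives an empirical approximation of $\mu_{k}$ .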

5.2 Asymptotic stability of $W_{2}$ -MHE

We present the asymptotic stability result for $W_{2}$ -MHE in this section. Before doing so, we introduce the following assumption on positive invariance of the discrete-time dynamics defined by the map $\text{prox}_{\eta G^{N}_{k}}\circ f$ .

Assumption 6.

(Positive invariance). We assume that there exists $\alpha>(1-\sqrt{1-2lL}){l}^{-1}$ such that for all $\eta\in(0,\alpha)$ , we have $\text{prox}_{\eta G^{N}_{k}}(f(\mathcal{C}_{k-1}))\subseteq\mathcal{C}_{k}$ .

The above assumption ensures that under the discrete-time dynamics defined by the map $\text{prox}_{\eta G^{N}_{k}}\circ f$ , any sequence starting in the basin of attraction $\mathcal{C}_{0}$ of ${\Sigma_{T}}^{-1}(\mathbf{y}_{0:T})$ remains within the basins of attraction $\mathcal{C}_{k}$ of ${\Sigma_{T}}^{-1}(\mathbf{y}_{k:k+T})$ at the subsequent instants of time $k\in\mathbb{N}$ .

We are now ready to present the asymptotic stability result for $W_{2}$ -MHE:

Theorem 2.

(Asymptotic stability of $W_{2}$ -MHE). The estimator (6), under Assumptions 3 to 6, with a constant step size $\eta\in\displaystyle{\left(\frac{1-\sqrt{1-2lL}}{l},\min\left\{\alpha,\frac{1}{l}\right\}\right)}$ , is an asymptotically stable observer for the system $\Sigma$ .

Proof.

By Assumption 5- $(1)$ , on the moving-horizon cost, and Lemma 1, we have:

[TABLE]

Substituting from (8) into the above, we get:

[TABLE]

It now follows that:

[TABLE]

From Assumption 5- $(2)$ , on the moving-horizon cost, we have:

[TABLE]

Summing the above inequality from $k=1$ to $K$ , we get:

[TABLE]

From here, we obtain:

[TABLE]

Since $\eta\in\displaystyle{\left(\frac{1-\sqrt{1-2lL}}{l},\frac{1}{l}\right)}$ , we have that $\eta\left(1-\frac{l}{2}\eta\right)-L>0$ and therefore, taking limits in the previous inequality, we deduce that the series is summable. The latter implies that $\lim_{k\rightarrow\infty}\nabla G^{N}_{k}(z_{k})=0$ , and from (8), we have that $\lim_{k\rightarrow\infty}\|z_{k}-f(z_{k-1})\|=0$ .

It now follows, by definition, from the above that $\lim_{k\rightarrow\infty}\nabla G_{k}^{T+1}(z_{k})=\lim_{k\rightarrow\infty}\nabla\left(J_{T}\left(\mathbf{y}_{k:k+T},\Sigma_{T}(z_{k})\right)\right)=0$ , over a horizon of length $T+1$ (with $T\geq T_{0}$ ). The initial condition $z_{0}\in\mathcal{K}_{0}\subseteq\mathcal{C}_{0}$ and Assumption 6 ensure that $z_{k}\in\mathcal{C}_{k}$ , the basin of attraction of $f^{k}\left({\Sigma_{T}}^{-1}(\mathbf{y}_{0:T})\right)$ . Together with the fact that $\lim_{k\rightarrow\infty}\nabla\left(J_{T}\left(\mathbf{y}_{k:k+T},\Sigma_{T}(z_{k})\right)\right)=0$ , this implies that $\{z_{k}\}$ converges to the local minima of $J_{T}\left(\mathbf{y}_{k:k+T},\Sigma_{T}(\cdot)\right)$ . By Theorem 1, it now follows that $\{z_{k}\}$ converges to the set ${\Sigma_{T}}^{-1}(\mathbf{y}_{k:k+T})$ . Therefore $\lim_{k\rightarrow\infty}d(z_{k},{\Sigma_{T}}^{-1}(\mathbf{y}_{k:k+T}))=0$ .

Moreover, since $\lim_{k\rightarrow\infty}d(z_{k},{\Sigma_{T}}^{-1}(\mathbf{y}_{k:k+T}))=0$ for all $z_{0}\in\mathcal{K}_{0}$ , it follows that $\lim_{k\rightarrow\infty}\mathcal{K}_{k}={\Sigma_{T}}^{-1}(\mathbf{y}_{k:k+T})$ . We know that $\operatorname{supp}(\mu_{k})=\mathcal{K}_{k}$ , and therefore we get that $\lim_{k\rightarrow\infty}\mu_{k}\left({\Sigma_{T}}^{-1}(\mathbf{y}_{k:k+T})\right)=1$ . ∎
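The step-size range in Theorem 2 is exactly the set where the coefficient $\eta\left(1-\frac{l}{2}\eta\right)-L$ used in the proof is positive, intersected with $\left(0,\min\{\alpha,l^{-1}\}\right)$ . A small numerical check, with hypothetical values of $l$ and $L$ chosen only for illustration, can be sketched as:

```python
import math

def step_size_interval(l, L, alpha=math.inf):
    """Open interval of admissible constant step sizes eta in Theorem 2.

    Requires l * L <= 1/2 (Assumption 5-(3)). Returns (lower, upper),
    or None if the interval is empty.
    """
    if l <= 0 or L < 0 or l * L > 0.5:
        raise ValueError("need l > 0, L >= 0 and l * L <= 1/2")
    lower = (1.0 - math.sqrt(1.0 - 2.0 * l * L)) / l
    upper = min(alpha, 1.0 / l)
    return (lower, upper) if lower < upper else None

def descent_margin(eta, l, L):
    """The coefficient eta * (1 - l * eta / 2) - L used in the proof."""
    return eta * (1.0 - 0.5 * l * eta) - L

# Hypothetical constants: l = 1, L = 0.375, so that l * L = 0.375 <= 1/2.
interval = step_size_interval(l=1.0, L=0.375)
margin_inside = descent_margin(0.75, l=1.0, L=0.375)
margin_at_lower_end = descent_margin(0.5, l=1.0, L=0.375)
```

The margin vanishes at the lower endpoint, which is why the interval is open there, and the interval is empty when $lL=\frac{1}{2}$ .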

5.3 Robustness of $W_{2}$ -MHE

We now characterize the performance of the estimator (6) on the system $\Omega$ in (1). Since the true process and measurement noise sequences remain unknown, we are interested in the robustness properties of this estimator, expressed as an upper bound on the estimation error in terms of the norms of the disturbance sequences.

We begin by constructing a reference estimator that recursively generates the estimate sequence, given the true disturbance sequences $\{w_{k}\}_{k\in\mathbb{N}}$ and $\{v_{k}\}_{k\in\mathbb{N}}$ , as follows:

[TABLE]

where, for conciseness, we write $\mathbf{w}\equiv\mathbf{w}_{k:k+N-1}=(w_{k},\dots,w_{k+N-1})$ and $\mathbf{v}\equiv\mathbf{v}_{k+1:k+N}=(v_{k+1},\dots,v_{k+N})$ , so that $\bar{G}^{N}_{k}(z)\equiv\bar{G}^{N}_{k}(z,\mathbf{w},\mathbf{v})=J_{N-1}\left(\mathbf{y}_{k+1:k+N},\Omega_{\mathbf{w}_{k:k+N-1}}(z)+\mathbf{v}_{k+1:k+N}\right)$ . Note that $G^{N}_{k}=\bar{G}^{N}_{k}\big|_{\mathbf{w}=0,\mathbf{v}=0}$ . We let $\bar{\mathcal{K}}_{k}$ be the support of $\bar{\mu}_{k}$ , with $\bar{\mathcal{K}}_{0}\subseteq\bar{\mathcal{C}}_{0}$ , where the definition of $\bar{\mathcal{C}}_{k}$ is similar to that of $\mathcal{C}_{k}$ but takes the noise sequences $\{w_{k}\}$ and $\{v_{k}\}$ into account.

Assumption 7.

(l-Smoothness w.r.t. disturbances). We assume that $\|\nabla G^{N}_{k}(z)-\nabla\bar{G}^{N}_{k}(z)\|\leq l_{w}\|(\mathbf{w}_{k:k+N-1},\mathbf{v}_{k+1:k+N})\|$ for all $z\in\mathbb{X}$ .

Following the proof of Theorem 2, under the same set of underlying assumptions, we infer that the reference estimator (11) is almost surely an asymptotically stable observer for the system $\Omega$ , given a particular realization of the disturbances $\{w_{k}\}_{k\in\mathbb{N}}$ and $\{v_{k}\}_{k\in\mathbb{N}}$ .

We now present the following theorem on the robustness of the estimator (6), characterized by a bound on the error in the estimates generated by (6) with respect to the estimates generated by the reference estimator (11):

Theorem 3.

(Robustness of $W_{2}$ -MHE). Under Assumptions 1, 3, 5, and 7, given the estimate sequences $\{\mu_{k}\}_{k\in\mathbb{N}}$ generated by (6) and $\{\bar{\mu}_{k}\}_{k\in\mathbb{N}}$ generated by the reference estimator (11), with $\mu_{0}=\bar{\mu}_{0}$ , we have $W_{2}(\mu_{k},\bar{\mu}_{k})\leq\frac{c_{f}^{(2)}}{c_{f}^{(1)}}WC_{k}+\frac{\eta l_{w}\sqrt{N}}{c_{f}^{(1)}}(W+V)C_{k}$ , for all $k\in\mathbb{N}$ , where $C_{k}=\sum_{\ell=1}^{k}(\frac{c_{f}^{(1)}}{1-\eta l})^{\ell}$ .

Proof.

The estimator (11) yields the following reference recursive scheme:

[TABLE]

where the above is derived similarly to the noiseless case. Let $\{z_{k}\}_{k\in\mathbb{N}}$ and $\{\bar{z}_{k}\}_{k\in\mathbb{N}}$ be the estimate sequences generated by (8) and (12) respectively, with $z_{0}=\bar{z}_{0}$ , for which we have:

[TABLE]

where the final inequality follows from Assumptions 1, 5, and 7, on the Lipschitz properties of $f$ , the gradient of $G^{N}_{k}$ , and $\bar{G}^{N}_{k}$ , respectively. Further, since $\eta l<1$ , we obtain from the above that:

[TABLE]

We note that if $\frac{c_{f}^{(1)}}{1-\eta l}<1$ , we have that $\lim_{k\rightarrow\infty}C_{k}=\frac{c_{f}^{(1)}}{1-\eta l-c_{f}^{(1)}}$ is finite, and therefore, $\|z_{k}-\bar{z}_{k}\|$ is bounded as $k\rightarrow\infty$ . We note here that even when $z_{0}\neq\bar{z}_{0}$ , the effect of this initial discrepancy vanishes as $k\rightarrow\infty$ .

Now, let $T_{k}:\mathcal{K}_{k}\rightarrow\bar{\mathcal{K}}_{k}$ be a map such that for sequences $\{z_{k}\}$ and $\{\bar{z}_{k}\}$ generated by (8) and (12) respectively, with $z_{0}=\bar{z}_{0}$ , we have $T_{k}(z_{k})=\bar{z}_{k}$ . It then follows that ${T_{k}}_{\#}\mu_{k}=\bar{\mu}_{k}$ . Now, from the above, and by definition of the $2$ -Wasserstein distance, we have:

[TABLE]

∎
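The bound of Theorem 3 is straightforward to evaluate once the constants are known. The sketch below computes the geometric factor $C_{k}$ and the right-hand side of the bound, and compares $C_{k}$ for large $k$ with its limit $\frac{c_{f}^{(1)}}{1-\eta l-c_{f}^{(1)}}$ . All numerical values are hypothetical, and `c1`, `c2` stand for $c_{f}^{(1)}$ , $c_{f}^{(2)}$ .

```python
def geometric_factor(k, c1, eta, l):
    """C_k = sum_{j=1}^{k} (c1 / (1 - eta * l))**j, assuming eta * l < 1."""
    r = c1 / (1.0 - eta * l)
    return sum(r ** j for j in range(1, k + 1))

def w2_robustness_bound(k, c1, c2, eta, l, l_w, N, W, V):
    """Right-hand side of the bound in Theorem 3."""
    C_k = geometric_factor(k, c1, eta, l)
    return (c2 / c1) * W * C_k + (eta * l_w * N ** 0.5 / c1) * (W + V) * C_k

# Hypothetical constants with c1 / (1 - eta * l) = 2/3 < 1.
c1, eta, l = 0.5, 0.25, 1.0
C_large = geometric_factor(200, c1, eta, l)
C_limit = c1 / (1.0 - eta * l - c1)
bound_k5 = w2_robustness_bound(5, c1, c2=1.0, eta=eta, l=l,
                               l_w=1.0, N=4, W=0.1, V=0.1)
bound_k50 = w2_robustness_bound(50, c1, c2=1.0, eta=eta, l=l,
                                l_w=1.0, N=4, W=0.1, V=0.1)
```

When $\frac{c_{f}^{(1)}}{1-\eta l}\geq 1$ the factor $C_{k}$ grows without bound in $k$ , so the bound is informative over long horizons only in the contractive regime.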

6 A KL-Moving-Horizon Estimator

In this section, we derive a moving-horizon estimator, which we refer to as KL-MHE, to generate a sequence of probability distributions $\{\mu_{k}\}_{k\in\mathbb{N}}$ . Using the KL-divergence $D_{\textup{KL}}$ as the choice of divergence in the moving-horizon formulation (5), we obtain:

[TABLE]

We note that any local minimizer $\mu_{k}$ of (13) is a critical point of the objective functional, and, therefore, it satisfies:

[TABLE]

where $c$ is a constant (from the constraint $\int_{\mathbb{X}}d\mu(x)=1$ , for $\mu\in\mathcal{P}(\mathbb{X})$ , due to which the first variation is defined up to an additive constant). From the above, we get:

[TABLE]

where for any $\ell\in\{0,1,\ldots\}$ , $\rho_{\ell}$ is the density function corresponding to the measure $\mu_{\ell}$ . Therefore, the corresponding recursive update scheme for the density function is given by:

[TABLE]

where $c_{k}$ is the normalization constant. We note that the above is a particle filter formulation, with the horizon cost $G^{N}_{k}$ defining the weighting function. Implementable filters are obtained by a Sequential Monte Carlo method, see [10]. We now present the asymptotic stability result for KL-MHE:
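A weighted-particle version of this update can be sketched as follows. The sketch assumes that the density update has the exponential form $\rho_{k}\propto\left(f_{0\#}\rho_{k-1}\right)\exp(-\eta G^{N}_{k})$ , which is the form a KL-proximal step typically yields. The static dynamics and quadratic cost are hypothetical, and the resampling step of a full Sequential Monte Carlo method is omitted to keep the example deterministic.

```python
import numpy as np

def kl_mhe_update(particles, log_w, f0, G, eta):
    """One weighted-particle update: propagate through f0, then reweight.

    Assumes rho_k is proportional to (pushforward of rho_{k-1} by f0)
    times exp(-eta * G), an assumption stated in the accompanying text.
    """
    particles = f0(particles)
    log_w = log_w - eta * G(particles)
    log_w = log_w - np.logaddexp.reduce(log_w)   # normalize in log space
    return particles, log_w

# Hypothetical scalar example: static state, quadratic horizon cost.
f0 = lambda z: z
G = lambda z: 0.5 * (z - 1.0) ** 2
particles = np.linspace(-3.0, 3.0, 61)
log_w = np.full(particles.size, -np.log(particles.size))

for _ in range(40):
    particles, log_w = kl_mhe_update(particles, log_w, f0, G, eta=0.5)

weights = np.exp(log_w)
estimate = np.sum(weights * particles)
```

Working with log-weights avoids underflow as the accumulated cost grows, which is the regime in which the weights concentrate on the minimizers of the cost.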

Theorem 4.

(Asymptotic stability of KL-MHE). The estimator (13), under Assumptions 1 to 4, is an asymptotically stable observer for the system $\Sigma$ .

Proof.

We know that for any map $\mathcal{T}$ and measure $\mu$ , we have that $d\mathcal{T}_{\#}\mu(x)=d\mu\left(\mathcal{T}^{-1}(x)\right)$ . It then follows from (14) that:

[TABLE]

We now rewrite the above as:

[TABLE]

Repeating the above process $k$ times, we obtain:

[TABLE]

where $C_{k}=c_{k}c_{k-1}\ldots c_{1}$ is the normalization constant. If $x\notin{\Sigma_{T}}^{-1}(\mathbf{y}_{0:T})$ , we have that $\lim_{k\rightarrow\infty}\rho_{k}(f^{k}_{0}(x))=0$ , since $\sum_{\ell=1}^{k}G^{N}_{\ell}(f^{\ell}_{0}(x))\rightarrow\infty$ as $k\rightarrow\infty$ for all $x\notin{\Sigma_{T}}^{-1}(\mathbf{y}_{0:T})$ (by definition of the cost function, the sum diverges over an infinitely long horizon). Thus, we get:

[TABLE]

∎

7 Differential privacy

In this section, we discuss the mechanism for encoding the desired level of differential privacy in moving-horizon estimators. We then apply this mechanism to the two estimators presented in the previous sections, the $W_{2}$ -MHE and KL-MHE. We conclude the section with a discussion on differential privacy of the estimators over a time horizon. Our aim here is to guarantee differential privacy of the measurement data $\mathbf{y}_{0:T}$ , when the estimate sequence $\{\mu_{k}\}$ is released (made public). We consider the class of scenarios where an adversary can access the released estimates, while the measurement data itself is not accessible to the adversary. Our goal in incorporating differential privacy in estimation is to ensure that the adversary is not able to distinguish (in the sense of $\epsilon$ -differential privacy) between measurement sequences that are $\delta$ -adjacent, using the released estimates, which is an underlying risk when the estimates are directly released without such a consideration.

Given the framework (5), we encode differential privacy by an entropic regularization of the estimation objective function, as follows:

[TABLE]

where $s_{k}\in[0,1]$ is a tunable time-dependent parameter and $\mathcal{K}_{k}$ is the support of $f_{0\#}\mu_{k-1}$ (with $\mathcal{K}_{0}$ being the support of $\mu_{0}$ ). Moreover, $S^{A}(\mu)=\int_{A}\rho\log(\rho)\operatorname{dvol}$ , where $A\subset\mathbb{X}$ and $d\mu=\rho\operatorname{dvol}$ . We note that when $s_{k}=1$ , the above formulation reduces to (5) and when $s_{k}=0$ , it is equivalent to an entropy maximization problem, yielding a uniform distribution over the set $f_{0}(\mathcal{K}_{k-1})$ as the solution. Clearly, the uniform distribution is insensitive to the measurements, and therefore offers maximum privacy, while being of no value to the estimation objective. The ensuing analysis in this section is directed at determining upper bounds on the parameter sequence $\{s_{k}\}_{k\in\mathbb{N}}$ such that the MHE offers $\epsilon$ -differential privacy. We rewrite the optimization problem (15) for $s_{k}\in(0,1]$ as follows:

[TABLE]

Let $\mathbf{y},\tilde{\mathbf{y}}\in\mathbb{Y}^{T+N+1}$ be two $\delta$ -adjacent measurement sequences as in Definition 4, over a horizon $\{0,\ldots,T+N\}$ , such that $\|\mathbf{y}-\tilde{\mathbf{y}}\|\leq\delta$ and let $\{\mu_{k}\}_{k\in\mathbb{N}}$ and $\{\tilde{\mu}_{k}\}_{k\in\mathbb{N}}$ be the sequences of estimates derived from (16). In the following, we determine conditions on $\{s_{k}\}_{k\in\mathbb{N}}$ that guarantee differential privacy for each of the estimators derived in previous sections.

7.1 Differentially private $W_{2}$ -MHE

We now design a differentially private $W_{2}$ -moving-horizon estimator. We begin by considering:

[TABLE]

for $s_{k}\in(0,1]$ .

The following theorem provides a sufficient upper bound on $s_{T}$ such that the entropy-regularized $W_{2}$ -MHE in (17) is $\epsilon_{T}$ -differentially private at a time instant $T$ .

Theorem 5.

(Sensitivity of $W_{2}$ -MHE). Given two $\delta$ -adjacent measurement sequences $\mathbf{y},\tilde{\mathbf{y}}\in\mathbb{Y}^{T+N+1}$ , under Assumption 5, we have that the estimates generated by (17) satisfy $D_{\textup{max}}\left(\mu_{T},\tilde{\mu}_{T}\right)\leq\epsilon_{T}$ if $s_{T}\leq\epsilon_{T}\left(\epsilon_{T}+c_{f}^{T}\text{diam}(\mathcal{K}_{0})\left(\eta l\delta+c_{f}^{T}\text{diam}(\mathcal{K}_{0})q(\delta)\right)\right)^{-1}$ , where $q:{\mathbb{R}}_{\geq 0}\rightarrow{\mathbb{R}}_{\geq 0}$ is a class- $\mathcal{K}$ function that satisfies $q(0)=0$ .

Proof.

Let $G^{N}_{k}$ and ${\widetilde{G}}^{N}_{k}$ be the estimation objective functions at time instant $k$ , corresponding to the measurement sequences $\mathbf{y}$ and $\widetilde{\mathbf{y}}$ respectively, and let $\mu_{k}$ and $\widetilde{\mu}_{k}$ be the respective estimated probability measures, with $\rho_{k},\widetilde{\rho}_{k}$ the corresponding density functions. From (17), we get that for all $k\in\{0,\ldots,T\}$ , $\mu_{k}$ , being a local minimizer, is also a critical point of the objective functional. We therefore obtain:

[TABLE]

where $\phi_{k}$ is the Kantorovich potential associated with the transport from $\mu_{k}$ to ${f_{0}}_{\#}\mu_{k-1}$ and $c$ is a constant. It now follows that:

[TABLE]

Similarly, we have:

[TABLE]

Taking the difference between the above two equations:

[TABLE]

We have that $\nabla\phi_{k}(x)=x-T_{k}^{-1}(x)$ , where $\mu_{k}={T_{k}}_{\#}\left(f_{0\#}\mu_{k-1}\right)$ . This implies that $\nabla(\phi_{k}-\widetilde{\phi}_{k})(x)=-(T_{k}^{-1}(x)-\widetilde{T}_{k}^{-1}(x))$ . However, $T_{k}^{-1}(x),\widetilde{T}_{k}^{-1}(x)\in f_{0}(\mathcal{K}_{k-1})=f^{k}_{0}(\mathcal{K}_{0})$ , and therefore $\|\nabla(\phi_{k}-\widetilde{\phi}_{k})(x)\|\leq c_{f}^{k}\text{diam}(\mathcal{K}_{0})q(\delta)$ , for all $x\in f^{k}_{0}(\mathcal{K}_{0})$ and some class- $\mathcal{K}$ function $q$ . We let $q$ characterize the dependence of $\phi$ on the measurement sequence, and we get that $\|\nabla(\phi_{k}-\widetilde{\phi}_{k})(x)\|=0$ for all $x\in\mathbb{X}$ , when $\delta=0$ . Moreover, by Assumption 5, we get $\|\nabla(G^{N}_{k}-{\widetilde{G}}^{N}_{k})(x)\|\leq l\delta$ . Therefore, we obtain:

[TABLE]

We also have that for any $x\in f^{k}_{0}(\mathcal{K}_{0})$ :

[TABLE]

where $\gamma(0)=\bar{x}$ and $\gamma(1)=x$ . Since $\rho_{k}$ and $\widetilde{\rho}_{k}$ are continuous, with $\int_{f^{k}_{0}(\mathcal{K}_{0})}(\rho_{k}-\widetilde{\rho}_{k})=0$ (since $\int_{f^{k}_{0}(\mathcal{K}_{0})}\rho_{k}=\int_{f^{k}_{0}(\mathcal{K}_{0})}\widetilde{\rho}_{k}=1$ ), there exists an $\bar{x}\in f^{k}_{0}(\mathcal{K}_{0})$ such that $\rho_{k}(\bar{x})=\widetilde{\rho}_{k}(\bar{x})$ , which implies that $\log\left(\frac{\rho_{k}}{\widetilde{\rho}_{k}}\right)(\bar{x})=0$ . From (18) and (19), for a straight line segment $\gamma$ , we therefore obtain:

[TABLE]

where we have used the fact that $\int_{0}^{1}|\dot{\gamma}(t)|dt=\|x-\bar{x}\|\leq\text{diam}({f_{0}^{k}}(\mathcal{K}_{0}))\leq c_{f}^{k}\text{diam}(\mathcal{K}_{0})$ . Thus, for $k=T$ , we let:

[TABLE]

from which we obtain that:

[TABLE]

and since $\left|\log\left(\frac{\rho_{T}}{\widetilde{\rho}_{T}}\right)(x)\right|\leq\epsilon_{T}$ for all $x\in f^{T}_{0}(\mathcal{K}_{0})$ , we have that $\sup_{x\in f^{T}_{0}(\mathcal{K}_{0})}\left|\log\left(\frac{\rho_{T}}{\widetilde{\rho}_{T}}\right)\right|=D_{\textup{max}}(\mu_{T},\widetilde{\mu}_{T})\leq\epsilon_{T}$ . ∎
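To see how the sufficient bound on $s_{T}$ in Theorem 5 behaves, it can be evaluated for given constants. In the sketch below the class- $\mathcal{K}$ function $q$ is not specified by the theorem, so a linear $q(\delta)=\delta$ is assumed purely for illustration, and all numerical values are hypothetical.

```python
def s_upper_bound_w2(eps_T, T, c_f, diam_K0, eta, l, delta, q):
    """Sufficient upper bound on s_T from Theorem 5."""
    reach = c_f ** T * diam_K0            # bound on diam(f0^T(K_0))
    return eps_T / (eps_T + reach * (eta * l * delta + reach * q(delta)))

q_linear = lambda d: d   # hypothetical class-K function, for illustration only

common = dict(T=5, c_f=1.0, diam_K0=1.0, eta=0.5, l=1.0, q=q_linear)
s_zero_delta = s_upper_bound_w2(0.1, delta=0.0, **common)
s_small_delta = s_upper_bound_w2(0.1, delta=0.01, **common)
s_large_delta = s_upper_bound_w2(0.1, delta=0.1, **common)
```

With $\delta=0$ the bound equals $1$ , so no regularization is needed, and it decreases as $\delta$ grows. For $c_{f}>1$ it also shrinks with $T$ through the factor $c_{f}^{T}$ .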

As noted earlier, Theorem 5 provides a sufficient upper bound on $s_{T}$ for differential privacy of the estimate at $T$ . The goal, however, is to guarantee the desired level of differential privacy over a time horizon $\{0,\ldots,T\}$ . The key issue here is that the recursive update scheme of the estimator introduces a dependence between the estimates at different time instants. This essentially means that imposing an upper bound on sensitivity for the marginal distributions $\mu_{k}$ individually, without regard to the dependence between these distributions, may not be sufficient. Therefore, to guarantee the desired level of differential privacy over the time horizon, we must impose an upper bound on the sensitivity of the joint distribution $\sigma\in\mathcal{P}(\mathbb{X}^{T+1})$ , where the estimates $\mu_{k}$ are the marginals of $\sigma$ over $\mathbb{X}$ .

The following theorem provides a sufficient upper bound on $\{s_{k}\}_{k=1}^{T}$ such that the entropy-regularized $W_{2}$ -MHE in (17) is $\epsilon$ -differentially private over a time horizon $\{0,\ldots,T\}$ .

Theorem 6.

(Differentially private $W_{2}$ -MHE). Given two $\delta$ -adjacent measurement sequences $\mathbf{y},\widetilde{\mathbf{y}}\in\mathbb{Y}^{T+N+1}$ , under Assumption 5, we have that the estimates generated by (17) satisfy $D_{\textup{max}}\left(\sigma,\widetilde{\sigma}\right)\leq\epsilon$ if $\sum_{k=1}^{T}\left(\frac{s_{k}}{1-s_{k}}\right)c_{f}^{k}\leq\frac{\epsilon}{l\delta\text{diam}(\mathcal{K}_{0})}$ .

Proof.

Let $G^{N}_{k}$ and ${\widetilde{G}}^{N}_{k}$ be the estimation objective functions at time instant $k$ , corresponding to the measurement sequences $\mathbf{y}$ and $\widetilde{\mathbf{y}}$ respectively, and let $\sigma$ and $\widetilde{\sigma}$ be the respective joint probability measures over the horizon $\{0,\ldots,T\}$ . With a slight abuse of notation, we allow $\sigma$ and $\widetilde{\sigma}$ to also denote the joint density function. We now have:

[TABLE]

where $\rho_{k}(x_{k}|x_{k-1})$ is the marginal density at $x_{k}$ at time instant $k$ , given that the distribution at time instant $k-1$ is concentrated at $x_{k-1}$ . Moreover, we note that the $W_{2}$ -MHE (17) yields a Markov process, which allows us to express $\rho_{k}(x_{k}|x_{k-1},\ldots,x_{0})=\rho_{k}(x_{k}|x_{k-1})$ . Now, $\rho_{k}(x_{k}|x_{k-1})$ is the density corresponding to the measure obtained by the following:

[TABLE]

where $\partial_{\xi}$ is the Dirac measure concentrated at $\xi$ . From the above, we get that for all $k\in\{0,\ldots,T\}$ , $\mu_{k}$ , being a local minimizer, is also a critical point of the objective functional. Applying similar steps to those in the proof of Theorem 5, we obtain:

[TABLE]

Now, we have:

[TABLE]

By taking

[TABLE]

we obtain the following inequality:

[TABLE]

and that $D_{\textup{max}}(\sigma,\widetilde{\sigma})\leq\epsilon$ . ∎

We note that for a given $\epsilon$, the upper bound on the sequence $\{s_{k}\}$ decreases with $\delta$. In other words, guaranteeing $\epsilon$-differential privacy w.r.t. measurement sequences that are farther apart requires the addition of more noise and incurs a greater loss in estimation accuracy. This is because the weighting on the entropic regularization term in the estimation objective increases when $s_{k}$ is reduced. The same holds when $\epsilon$ is reduced for a given $\delta$, which corresponds to a more stringent privacy requirement.
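To make the dependence on $\delta$ and $\epsilon$ concrete, the following is a minimal sketch of how the bound of Theorem 6 could be evaluated for constant weights $s_{k}=s$. With a constant weight, the condition reads $\frac{s}{1-s}\sum_{k=1}^{T}c_{f}^{k}\leq B$ with $B=\epsilon/(l\delta\,\text{diam}(\mathcal{K}_{0}))$, which rearranges to $s\leq B/(B+\sum_{k=1}^{T}c_{f}^{k})$. The function names, and the restriction to $s\in(0,1)$, are assumptions made for illustration and are not taken from the paper.

```python
def theorem6_lhs(s_seq, c_f):
    """Left-hand side sum_{k=1}^T s_k / (1 - s_k) * c_f^k for weights s_1..s_T."""
    return sum(s / (1.0 - s) * c_f ** k for k, s in enumerate(s_seq, start=1))


def max_constant_weight_w2(eps, delta, lip, diam, c_f, T):
    """Largest constant s in (0, 1) meeting the Theorem 6 bound (assumes s_k = s)."""
    budget = eps / (lip * delta * diam)              # right-hand side B
    geom = sum(c_f ** k for k in range(1, T + 1))    # sum_{k=1}^T c_f^k
    # s / (1 - s) * geom <= budget  <=>  s <= budget / (budget + geom)
    return budget / (budget + geom)
```

Under these assumptions, doubling $\delta$ halves $B$ and therefore lowers the admissible $s$, which matches the observation above that the bound on $\{s_{k}\}$ decreases with $\delta$.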

7.2 Differentially private KL-MHE

We now design a differentially private KL-moving-horizon estimator. We begin by considering the entropy-regularized KL-MHE formulation, given by:

[TABLE]

for $s_{k}\in(0,1]$ . The corresponding recursive update scheme for (20) is given by:

[TABLE]

which will be derived in the proof of Theorem 7 below.

The following theorem provides a sufficient upper bound on $s_{k}$ such that the entropy-regularized KL-MHE in (20) is $\epsilon_{T}$ -differentially private at a time instant $T$ , while ignoring the correlations between the estimates $\mu_{k}$ across time.

Theorem 7.

(Sensitivity of KL-MHE). Given two $\delta$ -adjacent measurement sequences $\mathbf{y},\widetilde{\mathbf{y}}\in\mathbb{Y}^{T+N+1}$ , under Assumption 5, we have that the estimates generated by (20) satisfy $D_{\textup{max}}\left(\mu_{T},\widetilde{\mu}_{T}\right)\leq\epsilon_{T}$ if $\sum_{k=1}^{T}\left(\prod_{i=k}^{T}s_{i}\right)\leq\epsilon_{T}\left(2\eta\max_{k\in\{0,\ldots,T\}}\left(\alpha_{k}+lc_{f}^{k}\delta\text{diam}(\mathcal{K}_{0})\right)\right)^{-1}$ , where $\alpha_{k}=\min_{\xi\in f^{k}_{0}(\mathcal{K}_{0})}\left|\left(G^{N}_{k}-{\widetilde{G}}^{N}_{k}\right)(\xi)\right|$ .

Proof.

Let $G^{N}_{k}$ and ${\widetilde{G}}^{N}_{k}$ be the estimation objective functions at time instant $k$, corresponding to the measurement sequences $\mathbf{y}$ and $\widetilde{\mathbf{y}}$ respectively, and let $\mu_{k}$ and $\widetilde{\mu}_{k}$ be the respective estimated probability measures, with $\rho_{k},\widetilde{\rho}_{k}$ the corresponding density functions. From (20), we get that for all $k\in\{0,\ldots,T\}$, the estimate $\mu_{k}$, being a local minimizer, is also a critical point of the objective functional. We therefore obtain:

[TABLE]

from which we derive that:

[TABLE]

The above equation can be rewritten as follows:

[TABLE]

where $c_{k}$ is the normalization constant. We therefore obtain:

[TABLE]

Expanding the above, we get:

[TABLE]

where $C_{T}=c_{1}c_{2}\ldots c_{T}$ . Similarly, we have:

[TABLE]

where $\widetilde{C}_{T}=\widetilde{c}_{1}\widetilde{c}_{2}\ldots\widetilde{c}_{T}$ and $\rho_{0}=\widetilde{\rho}_{0}$ , as we assume that the estimator starts with the same initial $\mu_{0}$ . From the above two equations, we obtain:

[TABLE]

The max-divergence between $\mu_{T}$ and $\widetilde{\mu}_{T}$ can be upper bounded now by:

[TABLE]

where the final inequality is due to the following (note that we use the fact that $\rho_{0}=\widetilde{\rho}_{0}$, as mentioned earlier):

[TABLE]

We now have, for all $k\in\{1,\ldots,T\}$ :

[TABLE]

where $\gamma_{k}(0)=\xi_{k}$ and $\gamma_{k}(1)=f_{0}^{k}(x)$ . From Assumption 5, we have $\left\|\nabla\left(G^{N}_{k}-{\widetilde{G}}^{N}_{k}\right)(\xi)\right\|\leq l\delta$ . Moreover, let $\xi_{k}\in f_{0}^{k}(\mathcal{K}_{0})$ such that $\left|\left(G^{N}_{k}-{\widetilde{G}}^{N}_{k}\right)(\xi_{k})\right|=\min_{f_{0}^{k}(\mathcal{K}_{0})}\left|\left(G^{N}_{k}-{\widetilde{G}}^{N}_{k}\right)\right|=\alpha_{k}$ , and we obtain:

[TABLE]

This yields the following inequality:

[TABLE]

We now let:

[TABLE]

which yields the bound

[TABLE]

and we get $D_{\textup{max}}(\mu_{T},\widetilde{\mu}_{T})\leq\epsilon_{T}$ . ∎

We note here that, in practice, with the choice of a sufficiently large domain $\mathcal{K}_{0}$ , we can ensure that $\alpha_{k}=\min_{\xi\in f^{k}_{0}(\mathcal{K}_{0})}\left|\left(G^{N}_{k}-{\widetilde{G}}^{N}_{k}\right)(\xi)\right|=0$ for all $k\in\{0,\ldots,T\}$ . This is owing to the fact that for a large enough $\mathcal{K}_{0}$ , we will have $\min_{\xi\in f^{k}_{0}(\mathcal{K}_{0})}\left(G^{N}_{k}-{\widetilde{G}}^{N}_{k}\right)(\xi)\leq 0\leq\max_{\xi\in f^{k}_{0}(\mathcal{K}_{0})}\left(G^{N}_{k}-{\widetilde{G}}^{N}_{k}\right)(\xi)$ . Moreover, since the function $G^{N}_{k}-{\widetilde{G}}^{N}_{k}$ is continuous, there must therefore exist a point $\xi^{*}$ such that $\left(G^{N}_{k}-{\widetilde{G}}^{N}_{k}\right)(\xi^{*})=0$ .
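As an illustration of how the condition in Theorem 7 might be checked numerically, the sketch below evaluates both sides of the inequality for a given weight sequence. It takes $\alpha_{k}=0$ by default, in line with the remark above. The function names and argument names are hypothetical, and $\eta$, $l$, $c_{f}$ and $\text{diam}(\mathcal{K}_{0})$ are simply passed in as numbers.

```python
def theorem7_lhs(s_seq):
    """sum_{k=1}^T prod_{i=k}^T s_i, accumulated from the tail of s_1..s_T."""
    total, tail = 0.0, 1.0
    for s in reversed(s_seq):      # i = T, T-1, ..., 1
        tail *= s                  # tail now equals prod_{i=k}^T s_i
        total += tail
    return total


def kl_privacy_rhs(eps, eta, lip, delta, diam, c_f, T, alpha=None):
    """eps / (2 * eta * max_k (alpha_k + l * c_f^k * delta * diam)), k = 0..T."""
    if alpha is None:
        alpha = [0.0] * (T + 1)    # assumption: K_0 large enough that alpha_k = 0
    worst = max(alpha[k] + lip * c_f ** k * delta * diam for k in range(T + 1))
    return eps / (2.0 * eta * worst)


def satisfies_theorem7(s_seq, eps, eta, lip, delta, diam, c_f):
    """True if the weights s_1..s_T meet the sufficient condition of Theorem 7."""
    T = len(s_seq)
    return theorem7_lhs(s_seq) <= kl_privacy_rhs(eps, eta, lip, delta, diam, c_f, T)
```

Because each product $\prod_{i=k}^{T}s_{i}$ is built from the tail, later weights $s_{i}$ appear in more terms of the sum than earlier ones, so reducing the most recent weights has the largest effect on the left-hand side.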

As with the $W_{2}$ -MHE, we now characterize the differential privacy of the KL-MHE over a horizon $\{0,\ldots,T\}$ . We recall that the KL-MHE yields a sequence of distributions $\{\mu_{k}\}_{k=0}^{T}$ over the time horizon. Differential privacy over the horizon requires an upper bound on the sensitivity of the joint distribution $\sigma$ over the horizon, where $\mu_{k}$ is the marginal of $\sigma$ at the time instant $k$ . As before, with a slight abuse of notation, letting $\sigma$ also denote the joint density function, we have:

[TABLE]

From the above, we infer that to estimate the sensitivity of the joint density function, we must estimate the sensitivity of the conditionals $\rho_{k}(x_{k}|x_{k-1})$. The conditional $\rho_{k}(x_{k}|x_{k-1})$ at any time instant $k$ is obtained from the coupling between the marginal distributions $\mu_{k}$ and $\mu_{k-1}$.

We now obtain an upper bound for the case where the marginals $\mu_{k}$ are independently coupled. In other words, we suppose that:

[TABLE]
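The practical consequence of an independent coupling can be illustrated on a finite state space: when the joint distribution is the product of its marginals, the max-divergence of the joint is the sum of the max-divergences of the marginals. The sketch below assumes the standard definition $D_{\textup{max}}(p,q)=\max_{x}\ln\left(p(x)/q(x)\right)$ for discrete distributions with full support. The helper names and the numbers in the usage example are illustrative only.

```python
import math
from itertools import product


def max_divergence(p, q):
    """D_max(p, q) = max_x ln(p(x) / q(x)) for discrete pmfs with full support."""
    return max(math.log(pi / qi) for pi, qi in zip(p, q))


def independent_joint(marginals):
    """Joint pmf of independently coupled marginals, flattened over the product space."""
    return [math.prod(values) for values in product(*marginals)]


# Two marginals per measurement sequence (illustrative numbers only).
mus = [[0.5, 0.5], [0.2, 0.8]]
mus_tilde = [[0.4, 0.6], [0.25, 0.75]]

joint_div = max_divergence(independent_joint(mus), independent_joint(mus_tilde))
sum_of_marginal_divs = sum(max_divergence(p, q) for p, q in zip(mus, mus_tilde))
```

Here `joint_div` and `sum_of_marginal_divs` agree up to rounding, which is why a bound on the joint sensitivity can be assembled from per-instant bounds in this setting.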

Theorem 8.

(Differentially private KL-MHE). Given two $\delta$ -adjacent measurement sequences $\mathbf{y},\widetilde{\mathbf{y}}\in\mathbb{Y}^{T+N+1}$ , under Assumption 5 and the independent coupling (22), we have that the estimates generated by (20) satisfy $D_{\textup{max}}\left(\sigma,\widetilde{\sigma}\right)\leq\epsilon$ if $\sum_{k=1}^{T}\sum_{j=1}^{k}\left(\prod_{i=j}^{k}s_{i}\right)\leq\epsilon\left(2\eta\max_{k}\left(\alpha_{k}+lc_{f}^{k}\delta\text{diam}(\mathcal{K}_{0})\right)\right)^{-1}$ , where $\alpha_{k}=\min_{\xi\in f^{k}_{0}(\mathcal{K}_{0})}\left|\left(G^{N}_{k}-{\widetilde{G}}^{N}_{k}\right)(\xi)\right|$ and $l$ denotes the constant from Assumption 5.

Proof.

Let $G^{N}_{k}$ and ${\widetilde{G}}^{N}_{k}$ be the estimation objective functions at time instant $k$ , corresponding to the measurement sequences $\mathbf{y}$ and $\widetilde{\mathbf{y}}$ respectively, and let $\sigma$ and $\widetilde{\sigma}$ be the respective joint probability measures over the horizon $\{0,\ldots,T\}$ . With a slight abuse of notation, we allow $\sigma$ and $\widetilde{\sigma}$ to also denote the joint density function. From (22), we get:

[TABLE]

which implies that:

[TABLE]

From the proof of Theorem 7 on the sensitivity of KL-MHE, we further get:

[TABLE]

Therefore, it holds that $D_{\textup{max}}(\sigma,\widetilde{\sigma})\leq\epsilon$ if:

[TABLE]

∎
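For constant weights $s_{i}=s$, the double sum in Theorem 8 reduces to $\sum_{k=1}^{T}\left(s+s^{2}+\cdots+s^{k}\right)$, which is increasing in $s$, so the largest admissible weight can be located by bisection. The following is a minimal sketch under that assumption. The function names are hypothetical, and `budget` stands for the right-hand side of the inequality in Theorem 8, computed separately.

```python
def theorem8_lhs(s, T):
    """Double sum of Theorem 8 for constant weights: sum_{k=1}^T (s + ... + s^k)."""
    total, inner = 0.0, 0.0
    for _ in range(T):
        inner = s * (1.0 + inner)   # inner_k = s + s^2 + ... + s^k
        total += inner
    return total


def max_constant_weight_kl(budget, T, tol=1e-12):
    """Largest s in (0, 1] with theorem8_lhs(s, T) <= budget, found by bisection."""
    if theorem8_lhs(1.0, T) <= budget:
        return 1.0                  # the bound is inactive on (0, 1]
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if theorem8_lhs(mid, T) <= budget:
            lo = mid                # mid is admissible, search above it
        else:
            hi = mid                # mid violates the bound, search below it
    return lo
```

Since the left-hand side grows with $T$ for any fixed $s$, a longer horizon forces a smaller weight for the same privacy budget.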

8 Simulation results

In this section, we present results from numerical simulations of the estimators studied in this paper. The simulations were performed in MATLAB (version R2017a) on a 2.5 GHz Intel Core i5 processor.

We considered the following nonlinear discrete-time system:

[TABLE]

with $\tau=0.1$, where $w_{k}$ and $v_{k}$ are i.i.d. disturbances sampled uniformly from the intervals $[-0.1,0.1]$ and $[-0.15,0.15]$, respectively. We used a quadratic estimation objective function $J_{T}(\mathbf{y}^{(1)}_{0:T},\mathbf{y}^{(2)}_{0:T})=\|\mathbf{y}^{(1)}_{0:T}-\mathbf{y}^{(2)}_{0:T}\|^{2}$.

We first present the simulation results for $W_{2}$-MHE. We ran 30 trials of the estimator (9) on the same measurement sequence, with randomly generated initial conditions, over a time horizon of length $T=100$. The length of the moving horizon was chosen to be $N=10$. Figure 1 plots the mean of the estimates along with the true states. The root mean squared errors (RMSE) of the mean state estimate sequences were found to be ${z_{1}}^{\operatorname{RMSE}}=0.0856$ and ${z_{2}}^{\operatorname{RMSE}}=0.0846$ for the estimates of $x_{1}$ and $x_{2}$, respectively. The average time for computing the state estimate through the minimization (9), using the fminunc function in MATLAB, was observed to be $t_{\textup{comp}}=0.012\pm 0.02$ s.
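For readers who wish to reproduce the error metric, the sketch below shows one plausible reading of it: the estimates are first averaged across trials at each time step, and the RMSE is then taken between this mean sequence and the true state sequence. This interpretation, the function name, and the data layout (one list of scalar estimates per trial) are assumptions rather than details given in the paper, and the sketch is written in Python rather than MATLAB.

```python
import math


def rmse_of_mean_estimate(trials, truth):
    """RMSE between the trial-averaged estimate sequence and the true sequence.

    trials: list of estimate sequences, one per trial, each of length len(truth).
    truth:  true values of a single state component over the horizon.
    """
    n_trials = len(trials)
    horizon = len(truth)
    # Average the estimates across trials at each time step.
    mean_est = [sum(trial[t] for trial in trials) / n_trials for t in range(horizon)]
    # Root of the mean squared deviation from the true state.
    squared_errors = [(m - x) ** 2 for m, x in zip(mean_est, truth)]
    return math.sqrt(sum(squared_errors) / horizon)
```

The function would be applied once per state component, for example to the sequences of $x_{1}$ estimates and of $x_{2}$ estimates separately.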

We then implemented the KL-MHE estimator (13) with 30 samples, over a time horizon of length $T=100$. The length of the moving horizon was chosen to be $N=10$. Figure 2 plots the mean of the estimates along with the true states. The root mean squared errors (RMSE) of the mean state estimate sequences were found to be ${z_{1}}^{\operatorname{RMSE}}=0.1073$ and ${z_{2}}^{\operatorname{RMSE}}=0.1144$ for the estimates of $x_{1}$ and $x_{2}$, respectively. The average run-time for the minimization (13) by a resampling method was observed to be $t_{\textup{comp}}=(4.8\pm 0.4)\times 10^{-4}$ s.

In simulation, with 30 samples, we find that the $W_{2}$ -MHE performs better with respect to the root mean squared error, while the KL-MHE is much faster. The performance of the KL-MHE is determined by the richness of the sample set and effectiveness of the resampling procedure, choices that depend on context and experience. In this manuscript, we did not attempt to investigate improvements in performance with respect to these choices. The performance of $W_{2}$ -MHE does not necessarily improve with the richness of the sample set, but for systems for which ${\Sigma_{T}}^{-1}(\mathbf{y}_{0:T})$ is not a singleton, a richer sample set allows for a more complete characterization of the set of feasible estimates.

Figure 3 illustrates the typical trade-off between accuracy and privacy in moving-horizon estimation. We considered constant weights $s_{k}=s$ for the entropic regularization terms in (17) and (20). The values of $s$ were chosen such that they satisfied the bounds specified in Theorems 6 and 8 for $\epsilon$-differential privacy of the estimators over the horizon. In Figure 3, we plot the RMSE of the estimates of the state $x_{1}$ for $W_{2}$-MHE, averaged over the $30$ samples, as a measure of accuracy, for different values of the privacy parameter $\epsilon$. We recall that a higher value of $\epsilon$ indicates a less stringent privacy requirement. We notice that the accuracy of the estimators improves with an increase in the privacy parameter.

9 Conclusions

In this work, we laid out a unifying probabilistic framework for moving-horizon estimation. We clearly established the connection between the classical notion of strong local observability and the stability of moving-horizon estimation, for nonlinear discrete-time systems. We then proposed a differentially private mechanism based on entropic regularization and derived conditions under which $\epsilon$ -differential privacy is guaranteed at any given time instant and over time horizons. As an extension to this work, we intend to include distributional constraints in the moving-horizon estimation framework. An important consideration in the estimation problem, in addition to the asymptotic stability, is the rate of convergence of the observer. It is of interest to obtain convergence rate bounds for the moving-horizon estimators proposed in this paper, and to compare their performance for various choices of the metric (or divergence) in the unifying formulation, which will be undertaken in our future work.

