A Probabilistic Framework for Moving-Horizon Estimation: Stability and Privacy Guarantees
Vishaal Krishnan, Sonia Martínez

TL;DR
This paper introduces a probabilistic framework for stable moving-horizon estimation of nonlinear systems, incorporating differential privacy, and compares two variants based on Wasserstein distance and KL-divergence.
Contribution
It unifies stability analysis and privacy guarantees in moving-horizon estimators using probabilistic metrics and introduces two novel estimator variants.
Findings
W2-MHE provides a gradient-based estimation approach.
KL-MHE functions as a particle filter with stability properties.
Differential privacy can be achieved through entropy regularization.
The authors are with the Department of Mechanical and Aerospace Engineering, University of California at San Diego, La Jolla CA 92093 USA (email: [email protected]; [email protected]).
Abstract
This work proposes a unifying probabilistic framework for the design of robustly asymptotically stable moving-horizon estimators (MHE) for discrete-time nonlinear systems, and a mechanism to incorporate differential privacy in moving-horizon estimation. We begin with an investigation of the classical notion of strong local observability of nonlinear systems and its relationship to optimization-based state estimation. We then present a general moving-horizon estimation framework for strongly locally observable systems, as an iterative minimization scheme in the space of probability measures. This framework allows for the minimization of the estimation cost with respect to different metrics. In particular, we consider two variants, which we name $W_2$-MHE and KL-MHE, where the minimization scheme uses the 2-Wasserstein distance and the KL-divergence, respectively. The $W_2$-MHE yields a gradient-based estimator whereas the KL-MHE yields a particle filter, for which we investigate asymptotic stability and robustness properties. Stability results for these moving-horizon estimators are derived in the probabilistic setting, against the backdrop of the classical notion of strong local observability which, to the best of our knowledge, differentiates it from other previous works. We then propose a mechanism to encode differential privacy of the measurements used by the estimator via an entropy regularization of the MHE objective functional. In particular, we find sufficient bounds on the regularization parameter to achieve the desired level of differential privacy. Numerical simulations demonstrate the performance of these estimators.
1 Introduction
Moving-horizon estimation (MHE) is an optimization-based state estimation method that uses the most recent measurements within a moving time horizon to recursively update state estimates. In principle, its optimization-based formulation enables it to handle nonlinearities and state constraints much more effectively than other known methods. This, coupled with the availability of increasingly powerful, inexpensive computing platforms, has brought new impetus to the adoption of moving-horizon estimation in various data-driven applications. In many cases, data is acquired from particular individuals or users, which introduces new ethical concerns about data collection and manipulation, highlighting an increasing need for data privacy. Such is the case in home monitoring and traffic estimation (with vehicle GPS data) applications, to name a few. Motivated by this, here we design and analyze a new class of moving-horizon estimation filters that can guarantee the differential privacy of the data.
The origins of MHE can be traced back to the limited memory optimal filters introduced in [19]. Theoretical investigations on MHE have broadly been directed at their asymptotic stability [29, 2, 33] and robustness [20, 24, 18] properties. These properties have primarily been built upon underlying assumptions of input/output-to-state (IOSS) stability, which is adopted as the notion of detectability, wherein the norm of the state is bounded given the sequences of inputs and outputs. However, alternative foundations for the stability results in other classical notions of observability, such as strong observability [25], have remained unexplored. The connection between nonlinear observability theory and estimation problems runs deep, see [22] and more recently [32], and it is worthwhile to explore this connection in the context of optimization-based estimation methods such as moving-horizon estimation.
The problem of state estimation is fundamentally about dealing with uncertainty, manifested as uncertainty in the initial conditions and/or in the evolution of the system in the presence of unknown disturbances. This is appropriately formulated in the space of probability measures over the state space of the system. Recent advances in gradient flows in the space of probability measures [5], [30], and the corresponding discrete-time movement-minimizing schemes [28] present powerful theoretical tools that can be applied to recursive optimization-based estimation methods such as moving-horizon estimation, and can serve as a unifying framework for their design and analysis.
Another important consideration in the MHE problem is the cost of computation. The most common formulation involves solving an optimization problem at every time instant, with the state estimate and disturbances as decision variables, where the dimension of the problem scales with the size of the horizon. This approach, in general, tends to be computationally intensive, which poses a hurdle for real-time implementation. This has motivated the search for fast MHE schemes that implement only one or more iterations of the optimization at every time instant. Recently, in [3], [4], the authors develop such a method for noiseless systems and provide theoretical guarantees on convergence. However, these works assume the convexity of the cost function, which is restrictive for general nonlinear systems, and not well connected to notions of observability. None of these works has considered the additional question of privacy.
Differential privacy [11] has emerged over the past decade as a benchmark in data privacy. The typical setting assumes independence between the records in static databases; however, basic existing mechanisms fail to provide guarantees when correlations exist between the records in the database. This is the case when data is employed by a state estimation process whose output is then released: there is a dynamic system from which a time series of sensor measurements is obtained, and the measurement data and the released estimates are correlated.
In [8, 9], the authors generalize the definition of differential privacy to include general notions of distance between datasets and design differentially private mechanisms for Bayesian inference. In [23, 31], the authors investigate privacy-preserving mechanisms for the case where correlations exist between database records. Privacy-preserving mechanisms for functions and functional data were investigated in [15]. The work [27] studies the problem of differentially-private state estimation, introducing the formal notion of differential privacy into the framework of Kalman filter design for dynamic systems. The authors of [13] consider the problem of optimal state estimation for linear discrete-time systems with measurements corrupted by Laplacian noise. A finite-dimensional distributed convex optimization is considered in [26], where differential privacy is achieved by perturbation of the objective function. We refer the reader to [7] for a broad overview of the systems and control-theoretic perspective on differential privacy.
Contributions: The contributions of this work are two-fold: establishing the robust asymptotic stability of the proposed moving-horizon estimator in a probabilistic framework, founded on the notion of strong local observability; and incorporating differential privacy in moving-horizon estimation. We begin with the well-studied notion of strong local observability of nonlinear, discrete-time systems and investigate its relationship to the optimization-based state estimation problem. To handle uncertain initial conditions and the possible non-uniqueness of solutions to the estimation problem, we adopt a generalized problem formulation over the space of probability measures over the state space. More precisely, we define the MHE as a proximal gradient descent in the space of probability measures, with a non-convex, time-varying cost function. This probabilistic setting serves as a unifying framework for moving-horizon estimation and allows us to develop different classes of moving-horizon estimators by simply varying the metric used to define the proximal operator, and to obtain implementable filters by Monte Carlo methods. We then consider the Wasserstein metric and the KL-divergence, which yield the more familiar MHE and a particle filter, respectively. Following this, we present an analysis of the convergence and robustness properties of these estimators in the probabilistic setting, under assumptions of strong local observability. Further, we modify the optimization problem (in the space of probability measures) by an entropy regularization to derive conditions that guarantee a desired level of differential privacy for these filters.
Paper organization: The rest of the paper is organized as follows. In Section 2, we introduce the notation and mathematical preliminaries used in the paper, and in Section 3 we introduce the observability notions. We present the optimization-based state estimation problem in Section 4, where Section 4.1 deals with the Full Information Estimation (FIE) problem and the Moving-horizon Estimation (MHE) problem is introduced in Section 4.2. We present the MHE method based on proximal gradient descent with the Wasserstein metric in Section 5, and with the KL-divergence in Section 6. In Section 7, we address the differential privacy considerations for the moving-horizon estimators designed. The results from numerical experiments are presented in Section 8, with the conclusions in Section 9.
2 Notation and preliminaries
In this section, we introduce the notation and mathematical preliminaries relevant to this paper.
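Let $\|\cdot\|$ denote the Euclidean norm on $\mathbb{R}^d$ and $|\cdot|$ the absolute value function. We denote by $\nabla$ the gradient operator in $\mathbb{R}^d$. For any $\mathcal{X}\subseteq\mathbb{R}^d$, we let $\mu\in\mathcal{P}(\mathcal{X})$ be an absolutely continuous probability measure on $\mathcal{X}$. We denote by $\rho$ the corresponding density function, where $d\mu=\rho\,d\mathrm{vol}$, with $\mathrm{vol}$ being the Lebesgue measure. For $\mathcal{M}\subseteq\mathcal{X}$, let the distance of a point $x\in\mathcal{X}$ to the set $\mathcal{M}$ be given by $d(x,\mathcal{M})=\inf_{y\in\mathcal{M}}\|x-y\|$. We denote by $\langle f,g\rangle$ the inner product of functions $f,g$ with respect to the Lebesgue measure vol, given by $\langle f,g\rangle=\int_{\mathcal{X}}fg\,d\mathrm{vol}$. Let $F:\mathcal{P}(\mathcal{X})\to\mathbb{R}$ be a smooth real-valued function on the space of probability measures on $\mathcal{X}$. We denote by $\frac{\delta F}{\delta\mu}$ the derivative of $F$ with respect to $\mu$, see [12], such that a perturbation $\delta\mu$ of the measure results in a perturbation $\delta F=\int_{\mathcal{X}}\frac{\delta F}{\delta\mu}\,d(\delta\mu)$. Given a map $T:\mathcal{X}\to\mathcal{Y}$ and a measure $\mu$, in the space of probability measures $\mathcal{P}(\mathcal{X})$, we let $T_{\#}\mu$ denote the pushforward measure of $\mu$ by $T$, where for a measurable set $B\subseteq\mathcal{Y}$, we have $T_{\#}\mu(B)=\mu(T^{-1}(B))$. Moreover, we denote by $\mathbb{E}_{\mu}$ the expectation operator w.r.t. the measure $\mu$.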
We now introduce the notion of $l$-smoothness that underlies the results on convergence of gradient descent methods.
Definition 1.
($l$-smoothness). A function $f:\mathcal{X}\to\mathbb{R}$ is called $l$-smooth (or Lipschitz differentiable) if for any $x,y\in\mathcal{X}$, we have $\|\nabla f(x)-\nabla f(y)\|\le l\,\|x-y\|$.
The following lemma [6] can be easily verified for $l$-smooth functions:
Lemma 1.
($l$-smooth functions). For an $l$-smooth function $f$ and any $x,y\in\mathcal{X}$, we have $f(y)\le f(x)+\nabla f(x)^{\top}(y-x)+\frac{l}{2}\|y-x\|^{2}$.
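As a quick numerical illustration of Definition 1 and Lemma 1, the sketch below checks both inequalities on random pairs of points. The test function `f(x) = sin(x) + x^2/2` and the constant `L_SMOOTH = 2` are assumptions chosen for this example only (its second derivative lies in [0, 2]); they do not come from the paper.

```python
import math
import random

def f(x):
    # Illustrative function: f''(x) = 1 - sin(x) lies in [0, 2], so f is 2-smooth.
    return math.sin(x) + 0.5 * x * x

def grad_f(x):
    return math.cos(x) + x

L_SMOOTH = 2.0  # assumed smoothness constant for this particular f

def smoothness_gap(x, y):
    """l*|x - y| - |f'(x) - f'(y)|; nonnegative when f is l-smooth."""
    return L_SMOOTH * abs(x - y) - abs(grad_f(x) - grad_f(y))

def descent_gap(x, y):
    """Quadratic upper bound of Lemma 1 minus f(y); nonnegative when f is l-smooth."""
    bound = f(x) + grad_f(x) * (y - x) + 0.5 * L_SMOOTH * (y - x) ** 2
    return bound - f(y)

random.seed(0)
pairs = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
min_smooth_gap = min(smoothness_gap(x, y) for x, y in pairs)
min_descent_gap = min(descent_gap(x, y) for x, y in pairs)
```

Both minima should be nonnegative up to floating-point rounding, which is what the quadratic upper bound is used for in convergence arguments for gradient methods.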
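We now define the proximal operator on $\mathcal{X}$ with respect to a function $f:\mathcal{X}\to\mathbb{R}$, as follows:
$$\mathrm{prox}_{f}(x)=\arg\min_{z\in\mathcal{X}}\left(f(z)+\tfrac{1}{2}\|x-z\|^{2}\right).$$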
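A minimal sketch of a proximal operator, assuming the usual definition prox_f(x) = argmin_z f(z) + (1/2)|x - z|^2 in the scalar case. The helper names, the step size and the quadratic test function are illustrative assumptions, not taken from the paper.

```python
def prox_numeric(grad_f, x, step=0.1, iters=2000):
    """Approximate prox_f(x) by gradient descent on z -> f(z) + 0.5*(z - x)^2 (scalar case)."""
    z = x
    for _ in range(iters):
        z -= step * (grad_f(z) + (z - x))
    return z

def prox_quadratic(a, x):
    """Closed form for f(z) = 0.5*a*z^2: the minimizer solves a*z + (z - x) = 0."""
    return x / (1.0 + a)

a = 3.0
x0 = 2.0
z_numeric = prox_numeric(lambda z: a * z, x0)  # gradient of 0.5*a*z^2 is a*z
z_exact = prox_quadratic(a, x0)
```

For this quadratic the two values agree, and the proximal point lies between `x0` and the minimizer of `f`, which is the behaviour a proximal step is expected to show.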
The notion of observability used in this paper is intricately related to solutions of inverse problems, with an associated notion of well-posedness that is introduced below:
Definition 2.
(Well-posedness [21]). Let $X$ and $Y$ be normed spaces, and $K:X\to Y$ a mapping. The equation $Kx=y$ is called well-posed if:
1. Existence: For every $y\in Y$, there is (at least one) $x\in X$ such that $Kx=y$.
2. Uniqueness: For every $y\in Y$, there is at most one $x\in X$ such that $Kx=y$.
3. Stability: The solution $x$ depends continuously on $y$, that is, for any sequence $(x_n)\subset X$ such that $Kx_n\to Kx$, it follows that $x_n\to x$.
We now introduce the notion of lower semicontinuity of set-valued maps, which underlies some of the results on optimization-based state estimation in this paper.
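Definition 3.
(Lower semicontinuity of set-valued maps). A point-to-set mapping $H:\mathcal{A}\rightrightarrows\mathcal{B}$ is lower semicontinuous at a point $\alpha\in\mathcal{A}$ if for any $x\in H(\alpha)$ and any sequence $\{\alpha_i\}\subset\mathcal{A}$ with $\alpha_i\to\alpha$, there exists a sequence $\{x_i\}$ such that $x_i\in H(\alpha_i)$ for all sufficiently large $i$ and $x_i\to x$. If $H$ is lower semicontinuous at every $\alpha\in\mathcal{A}$, then $H$ is said to be lower semicontinuous on $\mathcal{A}$.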
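We now define some notions of distance in the space of probability measures. Let $\mu,\nu\in\mathcal{P}(\mathcal{X})$ be two absolutely continuous probability measures on $\mathcal{X}$, with $\rho_{\mu},\rho_{\nu}$ being the corresponding density functions. Also, let $\Pi(\mu,\nu)$ be the space of joint probability measures on $\mathcal{X}\times\mathcal{X}$ that have $\mu$ and $\nu$ as their marginals. The $2$-Wasserstein distance between $\mu$ and $\nu$ is given by:
$$W_{2}(\mu,\nu)=\left(\inf_{\pi\in\Pi(\mu,\nu)}\int_{\mathcal{X}\times\mathcal{X}}\|x-y\|^{2}\,d\pi(x,y)\right)^{1/2}.$$
In what follows, we let $\phi$ denote the first variation of $\tfrac{1}{2}W_{2}^{2}(\cdot,\nu)$ at $\mu$, where $\phi$ is the so-called Kantorovich potential [30] associated with the transport from $\mu$ to $\nu$.
The KL-divergence from $\mu$ to $\nu$ is given by:
$$D_{\mathrm{KL}}(\mu\,\|\,\nu)=\int_{\mathcal{X}}\rho_{\mu}(x)\log\frac{\rho_{\mu}(x)}{\rho_{\nu}(x)}\,d\mathrm{vol}(x).$$
The max-divergence between $\mu$ and $\nu$ is defined as:
$$D_{\max}(\mu,\nu)=\sup_{x\in\mathcal{X}}\log\frac{\rho_{\mu}(x)}{\rho_{\nu}(x)}.$$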
We refer the reader to [14] for a detailed overview of the relations between the various metrics and divergences in probability spaces.
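The three quantities above can be computed directly in simple settings. The sketch below is illustrative only: it uses discrete distributions for the KL- and max-divergence, and equal-size empirical samples on the real line for the 2-Wasserstein distance (where the optimal coupling matches sorted samples). All function names and numbers are assumptions made for the example.

```python
import math

def w2_empirical_1d(xs, ys):
    """2-Wasserstein distance between two equal-size empirical measures on the real line.
    In one dimension the optimal coupling pairs the sorted samples."""
    assert len(xs) == len(ys)
    sq = sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys)))
    return math.sqrt(sq / len(xs))

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions with q_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def max_divergence(p, q):
    """sup_i log(p_i / q_i): discrete analogue of the density-ratio form of the max-divergence."""
    return max(math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
kl_pq = kl_divergence(p, q)
dmax_pq = max_divergence(p, q)
w2_shift = w2_empirical_1d([0.0, 1.0, 2.0], [3.0, 2.0, 1.0])  # second sample is the first shifted by 1
```

Here the KL-divergence is bounded above by the max-divergence, since it is an average of the same log-ratios whose supremum defines the latter, and a unit shift of the samples gives a 2-Wasserstein distance of 1.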
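We define an estimator $\mathcal{E}$ as a function that accepts as input data $y$ from the metric space $(\mathcal{Y},d_{\mathcal{Y}})$ and releases as output $\mathcal{E}(y)$, a probability measure over the space $\mathcal{X}$.
Definition 4.
(Differential privacy). Given $\delta,\epsilon>0$, an estimator $\mathcal{E}$ is $\epsilon$-differentially private if for any two $\delta$-adjacent measurements $y_1,y_2\in\mathcal{Y}$ (that is, $d_{\mathcal{Y}}(y_1,y_2)\le\delta$), and any measurable $B\subseteq\mathcal{X}$, we have $\mathcal{E}(y_1)(B)\le e^{\epsilon}\,\mathcal{E}(y_2)(B)$.
Note that the condition $d_{\mathcal{Y}}(y_1,y_2)\le\delta$ is a generalization of the notion of adjacency to arbitrary metric spaces that we adopt in this paper. We now have the following lemma on the connection between the notions of differential privacy and max-divergence introduced above:
Lemma 2.
(Differential privacy and max-divergence). An estimator $\mathcal{E}$ is $\epsilon$-differentially private if and only if $D_{\max}(\mathcal{E}(y_1),\mathcal{E}(y_2))\le\epsilon$ for any $y_1,y_2\in\mathcal{Y}$ with $d_{\mathcal{Y}}(y_1,y_2)\le\delta$.
Proof.
Clearly, if for any $y_1,y_2\in\mathcal{Y}$ with $d_{\mathcal{Y}}(y_1,y_2)\le\delta$ we have $D_{\max}(\mathcal{E}(y_1),\mathcal{E}(y_2))\le\epsilon$, then, with $\rho_1,\rho_2$ the densities of $\mathcal{E}(y_1),\mathcal{E}(y_2)$:
$$\sup_{x\in\mathcal{X}}\log\frac{\rho_{1}(x)}{\rho_{2}(x)}\le\epsilon.$$
This implies that for any $x\in\mathcal{X}$, we have $\rho_1(x)\le e^{\epsilon}\rho_2(x)$. Now, for any measurable $B\subseteq\mathcal{X}$, we have $\mathcal{E}(y_1)(B)=\int_{B}\rho_1\,d\mathrm{vol}\le e^{\epsilon}\int_{B}\rho_2\,d\mathrm{vol}=e^{\epsilon}\,\mathcal{E}(y_2)(B)$, which implies that $\mathcal{E}$ is $\epsilon$-differentially private. The forward implication can be easily verified. ∎
Thus, $\epsilon$-differential privacy essentially imposes an upper bound on the sensitivity of the estimate generated by $\mathcal{E}$ (in the sense of the max-divergence $D_{\max}$), to the measurement.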
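The equivalence in Lemma 2 can be checked by brute force on a finite outcome space. The following sketch is an illustration under assumed data: `p` and `q` stand for the (hypothetical) output distributions of an estimator on two adjacent measurements, and the event-wise privacy inequality is tested for every nonempty event.

```python
import math
from itertools import combinations

def max_divergence(p, q):
    """sup_i log(p_i / q_i) for discrete distributions with q_i > 0."""
    return max(math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def is_dp_for_all_events(p, q, eps, tol=1e-12):
    """Check P(B) <= exp(eps) * Q(B) for every nonempty event B of a finite outcome space."""
    n = len(p)
    for r in range(1, n + 1):
        for event in combinations(range(n), r):
            pb = sum(p[i] for i in event)
            qb = sum(q[i] for i in event)
            if pb > math.exp(eps) * qb + tol:
                return False
    return True

# Hypothetical output distributions for two adjacent measurements.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# Smallest eps for which the max-divergence bound holds in both directions.
eps = max(max_divergence(p, q), max_divergence(q, p))
holds_at_eps = is_dp_for_all_events(p, q, eps) and is_dp_for_all_events(q, p, eps)
holds_below_eps = is_dp_for_all_events(p, q, 0.5 * eps) and is_dp_for_all_events(q, p, 0.5 * eps)
```

With these numbers the event-wise inequality holds at `eps` and fails at half that value, consistent with the max-divergence being the tightest privacy level for the pair.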
3 Observability notions
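In this paper, we consider systems of the form:
$$\Omega:\quad x_{k+1}=f(x_k,w_k),\qquad y_k=h(x_k)+v_k,\tag{1}$$
where $f:\mathbb{X}\times\mathbb{W}\to\mathbb{X}$ and $h:\mathbb{X}\to\mathbb{Y}$, $w_k\in\mathbb{W}$ is the process noise, $v_k\in\mathbb{V}$ is the measurement noise at time instant $k$, and $\mathbb{X}\subseteq\mathbb{R}^{d_X}$, $\mathbb{W}\subseteq\mathbb{R}^{d_W}$, $\mathbb{Y}\subseteq\mathbb{R}^{d_Y}$, and $\mathbb{V}\subseteq\mathbb{R}^{d_Y}$.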
Assumption 1.
(Lipschitz continuity). The functions and are Lipschitz continuous, with and .
Assumption 2.
(Noise characteristics). The noise sequences and are i.i.d samples from distributions and (with supports in and ). The sets and are bounded, with and . Moreover, we assume that and .
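We also introduce the following autonomous system corresponding to (1):
$$\Sigma:\quad x_{k+1}=f(x_k,0),\qquad y_k=h(x_k).\tag{2}$$
With a slight abuse of notation, for any $x\in\mathbb{X}$, we let $\Sigma_T(x)\in\mathbb{Y}^{T}$ be the sequence of outputs over a horizon of length $T$ for the system (2) from the state $x$. Similarly, for the system (1), we let $\Omega_T(x,\mathbf{w})\in\mathbb{Y}^{T}$ be the corresponding output sequence, for some sequence of process noise samples $\mathbf{w}=\{w_k\}$, where $w_k\in\mathbb{W}$.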
The theoretical results in the moving-horizon estimation literature have largely been derived in the setting of input/output-to-state (IOSS) stability, as in [29, 20, 18] to name a few, which is a notion of norm-observability, see [17], wherein the norm of the state is bounded using the sequences of inputs and outputs. However, there are other classical notions of observability based on the notion of distinguishability, which generalize the approach taken to linear systems. For a detailed treatment, we refer the reader to [25] and [1]. In this paper, we explore the connection between the classical notion of strong local observability and moving-horizon estimation.
We now introduce the notion of strong local observability used in this paper:
Definition 5.
(Strong local observability). The system $\Sigma$ defined in (2) is called strongly locally observable if there exists a $T_0$ such that for any given $\mathbf{y}\in\mathbb{Y}^{T}$ and $T\ge T_0$, we have that $\Sigma_T^{-1}(\mathbf{y})$ is a set of isolated points. Moreover, for all $T\ge T_0$ and $x\in\mathbb{X}$, we have that $\Sigma_T^{-1}(\Sigma_T(x))=\Sigma_{T_0}^{-1}(\Sigma_{T_0}(x))$. We call $T_0$ the minimum horizon length of $\Sigma$.
The above definition is equivalent to the definitions contained in [25, 1], which we have restated in a manner suitable for the optimization-based estimation framework considered here. As seen from the above definition, strong observability is based on a distinguishability notion, and when it holds globally (i.e., for all $x\in\mathbb{X}$) it is equivalent to the notion of uniform observability, as established in [16].
For systems with process noise, of the form in (1), we introduce the notion of almost sure strong local observability.
Definition 6.
(Almost sure strong local observability). The system $\Omega$ defined in (1) is called almost surely strongly locally observable if there exists a $T_0$ such that, given a process noise sequence $\mathbf{w}$, for $T\ge T_0$, any $x\in\mathbb{X}$, and $\mathbf{y}=\Omega_T(x,\mathbf{w})$, we have that $\Omega_T^{-1}(\mathbf{y},\mathbf{w})$ is a set of isolated points almost surely. More precisely, the set of noise sequences for which $\Omega_T^{-1}(\mathbf{y},\mathbf{w})$ is not a set of isolated points, is of measure zero. Moreover, we call $T_0$ the minimum horizon length of $\Omega$.
We now present a fundamental result that characterizes strong local observability via a rank condition.
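Lemma 3.
(Observability rank condition [25]). The system $\Sigma$ is locally strongly observable with minimum horizon length $T_0$ if and only if $\operatorname{rank}\big(\nabla\Sigma_T(x)\big)=d_X$ for all $x\in\mathbb{X}$ and $T\ge T_0$. The system $\Omega$ is almost surely locally strongly observable with minimum horizon length $T_0$ if and only if $\operatorname{rank}\big(\nabla_x\Omega_T(x,\mathbf{w})\big)=d_X$ almost surely for all $x\in\mathbb{X}$ and $T\ge T_0$.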
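A rank condition of this kind can be probed numerically by stacking the outputs over a horizon and estimating the Jacobian of the resulting map with respect to the initial state. The sketch below is a hedged illustration: the two-state system `f`, the output `h`, and the helper names are hypothetical and are not the example from the paper.

```python
import math

def f(x):
    # Hypothetical noise-free dynamics with two states.
    return [x[0] + 0.1 * x[1], 0.9 * x[1] - 0.1 * math.sin(x[0])]

def h(x):
    # Only the first state is measured.
    return x[0]

def output_map(x, T):
    """Stack the outputs h(x), h(f(x)), ..., h(f^{T-1}(x))."""
    outputs = []
    for _ in range(T):
        outputs.append(h(x))
        x = f(x)
    return outputs

def jacobian(x, T, step=1e-6):
    """Central-difference Jacobian of the output map (T rows, len(x) columns)."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += step
        xm[j] -= step
        yp, ym = output_map(xp, T), output_map(xm, T)
        cols.append([(a - b) / (2 * step) for a, b in zip(yp, ym)])
    return [[cols[j][i] for j in range(len(x))] for i in range(T)]

def rank(matrix, tol=1e-8):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = max(range(r, len(m)), key=lambda i: abs(m[i][c]), default=None)
        if pivot is None or abs(m[pivot][c]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

x_test = [0.3, -0.2]
rank_T1 = rank(jacobian(x_test, 1))
rank_T2 = rank(jacobian(x_test, 2))
```

For this toy system a single output sample cannot distinguish the two states, while a horizon of two samples gives a full-rank Jacobian, which is the sense in which a minimum horizon length arises.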
We now present an example to illustrate these concepts.
Example 1.
Consider a system with the state space , with and , such that:
[TABLE]
for some , small and a smooth function such that and . Moreover, let the output . We note that which implies that for all . Applying Lemma 3 for this system, we can infer that for , we get that the minimum horizon length . This is because the system becomes strongly locally observable at only over a horizon of length , that is for . This is a case of a one-dimensional system which is strongly locally observable with a minimum horizon of length . With larger values of , the minimum horizon length is further increased.
We make the following assumption in the rest of the paper:
Assumption 3.
(Strong local observability).
1. The system $\Sigma$ in (2) is strongly locally observable with minimum horizon length $T_0$.
2. The system $\Omega$ in (1) is almost surely strongly locally observable with minimum horizon length $T_0$.
4 Optimization-based state estimation
We begin by addressing the state estimation problem for the autonomous system (2), and develop a recursive moving-horizon estimator for it.
4.1 Full-Information Estimation (FIE)
Let be a sequence of measurements generated by the system . Let be a time horizon such that , the minimum horizon length of the system , and denote . The problem of estimation essentially aims at characterizing , which is an inverse problem, and optimal estimation formulates this problem as an optimization. Assumptions 1, and 3, on Lipschitz continuity and strong local observability, respectively, ensure that the inverse problem is locally well-posed as in Definition 2.
To formulate the inverse problem as an optimization, consider a convex function such that if and only if . Moreover, we let if for . Now, the problem of interest becomes:
[TABLE]
In the above, is the data in the estimation problem, which is given. Since the objective is to solve the original inverse problem, and we would like to use gradient descent-based methods, we would like for every local minimizer of to belong to the set , or, in other words, that every local minimizer is also global. We therefore make the following additional assumption on the system and the choice of . For a conciseness of notation, in the following assumption and lemma, we let , suppressing the data in the notation where useful, and is understood from context.
Assumption 4.
(Lower semicontinuity of sublevel sets). We assume that, for all , the convex function is such that the set-valued map is lower semicontinuous, where .
The above assumption ensures that the function satisfies the condition for the local minimizers to be global (Theorem 1 from [34]). The following lemma provides a sufficient condition for it to hold.
Lemma 4.
(Second-order sufficient condition for lower semicontinuity). Assumption 4 holds if for any such that we have , or the following condition holds when for any , :
[TABLE]
where is the Hessian of .
The final inequality in Lemma 4 merely states that those critical points at which the cost function does not reach the global minimum value are local maximizers.
We are now ready to present the following theorem that establishes the equivalence between the inverse problem of characterizing the set and the optimization (3).
Theorem 1.
(Inverse as minimizer). For a convex such that if and only if for any , under Assumptions 3 and 4, and any , it holds that if and only if is a minimizer of .
Proof.
If , we have that for all . It now follows that . Since, by definition, we infer that is a global minimizer of .
Suppose that is a local minimizer of . By Assumption 4 and Theorem 1 in [34], we get that the local minima of are also global, which implies that , and therefore . ∎
Theorem 1 suggests that the state estimates for the system can be obtained by minimizing over a horizon of length . This is also called the full information estimation (FIE) problem in the optimal state estimation literature [29, 18], as it works with the entire sequence of output measurements over the horizon .
Now, from Assumption 3 and Theorem 1, we have that is a set of isolated points which are minimizers of . It then follows that is the set of stable fixed points of the negative gradient vector field of . We let be the basin of attraction of this set. Moreover, we note that is the set of stable fixed points of the negative gradient vector field of , and we let be the basin of attraction of . We have used above the fact that , which follows from the definition of strong local observability.
We now lift the FIE problem (3) to the space of probability measures over , as a minimization in expectation of the estimation objective function:
[TABLE]
The above formulation allows us to capture information about the (possibly many) optimal estimates through a probability measure $\mu$, and helps encode distributional constraints, which will be considered in a forthcoming publication.
In the following, we develop recursive moving-horizon estimators that generate sequences of probability measures in as estimates. We then obtain practically implementable estimators using Monte Carlo methods to sample from the measures .
4.2 Moving-Horizon Estimation (MHE)
In the previous section, we presented a formulation of the full information estimation (FIE) problem for the autonomous system (2), which uses the entire measurement sequence over a horizon of length $T\ge T_0$. However, the minimum horizon length $T_0$ may be large, which would make the estimation computationally intensive. Moreover, we would like to progressively assimilate the incoming measurements online. We therefore adopt a moving-horizon estimation method which, at any time instant $k$, uses the output measurements from the most recent horizon (of length $N$), and the state estimate at the time instant $k-1$, to obtain the state estimate at instant $k$, recursively.
We let be the objective function over the horizon , at the time instant , where .
Assumption 5.
(Moving-horizon cost). We make the following assumptions on the cost function $G^{N}_{k}$:
1. the cost $G^{N}_{k}$ is $l$-smooth,
2. it holds that ,
3. the previous constants are such that ,
4. for any two $\delta$-adjacent measurements , such that and with corresponding costs and , for and , we have for all .
We now formulate the general moving-horizon estimation method as follows:
[TABLE]
where $D$ is a placeholder for a metric, divergence or transport cost on $\mathcal{P}(\mathbb{X})$. We obtain implementable observers from the above formulation by sampling from the measures, by Monte Carlo methods. As discussed in the ensuing sections, using the $2$-Wasserstein distance yields the more familiar MHE formulation, whereas with the KL-divergence we obtain a moving-horizon particle filter. Hence, this formulation is proposed as a unifying probabilistic framework for moving-horizon estimation, where different estimators are generated by different choices of $D$.
We now introduce the following asymptotic stability notion for estimators that will be used in investigating the properties of the estimators we design.
Definition 7.
(Asymptotic stability of state estimator). We call an estimator of the form (5) an asymptotically stable observer for the system if the sequence of estimates is such that for .
5 A $W_2$-Moving-Horizon Estimator
In this section, we derive a moving-horizon estimator, which we refer to as the $W_2$-MHE, to generate a sequence of probability distributions $\{\mu_k\}$. This is based on the one-step minimization scheme of [30] in $\mathcal{P}(\mathbb{X})$ w.r.t. the Wasserstein metric $W_2$, which we extend to the moving-horizon setting. For every $k$, consider:
[TABLE]
We let be the support of , with , where is as defined earlier in Section 4.1.
5.1 Sample update scheme for $W_2$-MHE
We now derive a sample update scheme for $W_2$-MHE, which also yields an implementable filter for the $W_2$-MHE formulation.
We note that any local minimizer of (6) is a critical point of the objective functional and therefore, it satisfies:
[TABLE]
where is the Kantorovich potential [30] associated with the transport from to , and is a constant (from the constraint , for , due to which the first variation is defined up to an additive constant). From the above equation, we now obtain:
[TABLE]
The gradient of the Kantorovich potential defines the deterministic optimal transport map (note that this notation is not to be confused with that of the time horizon ) w.r.t. the -distance from to , which determines (where ). We therefore get:
[TABLE]
The above equation allows us to design an implementable filter for the -MHE (6). We let , that is, is sampled from the distribution . From (7), it holds that . Since , we let , a sample of the distribution , and we obtain the following recursive estimator:
[TABLE]
We now note that the estimate in (8) corresponds to a critical point of the following minimizing movement scheme:
[TABLE]
Lemma 5.
(Strong convexity). For , the objective function in (9) is strongly convex, and therefore is a singleton for any .
Proof.
Let . We have that . It now follows that . From Assumption 5-(1), on the moving-horizon cost, we now get that , and since , we infer that is strongly convex, and therefore has a unique minimizer. Thus, is a singleton. ∎
We note that the minimization (9) defines a proximal mapping w.r.t. the Euclidean metric, which we represent in a compact form using the proximal operator as:
[TABLE]
where .
5.2 Asymptotic stability of $W_2$-MHE
We present the asymptotic stability result for $W_2$-MHE in this section, before which we introduce the following assumption on positive invariance of the discrete-time dynamics defined by the map $f$.
Assumption 6.
(Positive invariance). We assume that there exists such that for all , we have .
The above assumption ensures that under the discrete-time dynamics defined by the map , any sequence starting in the basin of attraction of remains within the basins of attraction of at the subsequent instants of time .
We are now ready to present the asymptotic stability result for $W_2$-MHE:
Theorem 2.
(Asymptotic stability of $W_2$-MHE). The estimator (6), under Assumptions 3 to 6, with a constant step size, is an asymptotically stable observer for the system (2).
Proof.
By Assumption 5-(1), on the moving-horizon cost, and Lemma 1, we have:
[TABLE]
Substituting from (8) into the above, we get:
[TABLE]
It now follows that:
[TABLE]
From Assumption 5-, on the moving-horizon cost, we have:
[TABLE]
Summing the above inequality from to , we get:
[TABLE]
From here, we obtain:
[TABLE]
Since , we have that and therefore, taking limits in the previous inequality, we deduce that the series is summable. The latter implies that , and from (8), we have that .
It now follows, by definition, from the above that , over a horizon of length (with ). We now have that the initial condition and Assumption 6 ensure that , the basin of attraction of and from the fact that , we infer that converges to the local minima of . By Theorem 1, it now follows that converges to the set . Therefore .
Moreover, since for all , it follows that . We know that , and therefore we get that . ∎
5.3 Robustness of $W_2$-MHE
We now characterize the performance of the estimator (6) on the system in (1). Since the true process and measurement noise sequences remain unknown, we are interested in the robustness properties of the estimator (11), in the form of an upper bound by the norms of the disturbance sequences on the estimation error.
We begin by constructing a reference estimator that recursively generates the estimate sequence, given the true disturbance sequences and , as follows:
[TABLE]
where we employ for conciseness and , so that . Note that $G^{N}_{k}=\bar{G}^{N}_{k}\big|_{\mathbf{w}=0,\mathbf{v}=0}$. We let be the support of , with , where the definition of is similar to that of but taking the noise $\mathbf{w}$ and $\mathbf{v}$ into account.
Assumption 7.
(l-Smoothness w.r.t. disturbances). We assume that for all .
Following the proof of Theorem 2, under the same set of underlying assumptions, we infer that the reference estimator (11) is almost surely an asymptotically stable observer for the system (1), given a particular realization of the disturbances $\mathbf{w}$ and $\mathbf{v}$.
We now present the following theorem on the robustness of the estimator (6), characterized by a bound on the error in the estimates generated by (6) with respect to the estimates generated by the reference estimator (11):
Theorem 3.
*(Robustness of -MHE). Under Assumptions 1, 3, 5, and 7, given the estimate sequences generated by (6) and generated by the reference estimator (11), with , we have , for all , where . *
Proof.
The estimator (11) yields the following reference recursive scheme:
[TABLE]
where the above is derived similarly to the noiseless case. Let and be the estimate sequences generated by (8) and (12) respectively, with , for which we have:
[TABLE]
where the final inequality follows from Assumptions 1, 5, and 7, on the several Lipschitz properties of the gradient of , and , respectively. Further, since , we obtain from the above that:
[TABLE]
We note that if , we have that is finite, and therefore, is bounded as . We note here that even when , the effect of this initial discrepancy vanishes as .
Now, let be a map such that for sequences and generated by (8) and (12) respectively, with , we have . It then follows that . Now, from the above, and by definition of the -Wasserstein distance, we have:
[TABLE]
∎
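The argument above follows the usual pattern for a perturbed contraction. As a minimal sketch, assume a scalar surrogate of the error recursion, e_{k+1} <= a * e_k + b * d_k, where the contraction factor `a` (with 0 <= a < 1), the gain `b`, and the disturbance magnitudes `d_k` are hypothetical stand-ins rather than the constants of Theorem 3. Under that assumption the geometric series gives e_k <= a^k * e_0 + b * d_max / (1 - a), which shows why an initial discrepancy fades while the residual error scales with the disturbance bound.

```python
def error_envelope(e0, a, b, disturbances):
    """Iterate the worst-case scalar recursion e_{k+1} = a * e_k + b * d_k."""
    errors = [e0]
    for d in disturbances:
        errors.append(a * errors[-1] + b * d)
    return errors


def closed_form_bound(e0, a, b, d_max, k):
    """Geometric-series bound a^k * e0 + b * d_max / (1 - a), valid for 0 <= a < 1."""
    return (a ** k) * e0 + b * d_max / (1.0 - a)


# Illustrative (assumed) values: a = 0.8, b = 0.5, disturbances bounded by 0.1.
envelope = error_envelope(1.0, 0.8, 0.5, [0.1] * 50)
```

With identical initial conditions (`e0 = 0`), the envelope stays below `b * d_max / (1 - a)` at every step.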
6 A KL-Moving-Horizon Estimator
In this section, we derive a moving-horizon estimator, which we refer to as KL-MHE, to generate a sequence of probability distributions . Using the KL-divergence as the choice of divergence in the moving-horizon formulation (5), we obtain:
[TABLE]
We note that any local minimizer of (13) is a critical point of the objective functional, and, therefore, it satisfies:
[TABLE]
where is a constant (from the constraint , for , due to which the first variation is defined up to an additive constant). From the above, we get:
[TABLE]
where for any , is the density function corresponding to the measure . Therefore, the corresponding recursive update scheme for the density function is given by:
[TABLE]
where is the normalization constant. We note that the above is a particle filter formulation, with the horizon cost defining the weighting function. Implementable filters are obtained by a Sequential Monte Carlo method, see [10]. We now present the asymptotic stability result for KL-MHE:
Theorem 4.
(Asymptotic stability of KL-MHE). The estimator (13), under Assumptions 1 to 4, is an asymptotically stable observer for the system .
Proof.
We know that for any map and measure , we have that . It then follows from (14) that:
[TABLE]
We now rewrite the above as:
[TABLE]
Repeating the above process times, we obtain:
[TABLE]
where is the normalization constant. If , we have that , since as for all (by definition of the cost function, the sum diverges over an infinitely long horizon). Thus, we get:
[TABLE]
∎
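The particle-filter reading of the KL-MHE update can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: particles are scalars, `dynamics` and `horizon_cost` are hypothetical callables standing in for the system map and the horizon cost, the weight is taken to be exp(-cost), and multinomial resampling is one choice among several Sequential Monte Carlo schemes.

```python
import math
import random


def kl_mhe_step(particles, dynamics, horizon_cost, rng):
    """One propagate / reweight / resample step on a list of scalar particles."""
    # Propagate each particle through the (assumed known) state dynamics.
    predicted = [dynamics(x) for x in particles]
    # Weight by exp(-cost); subtracting the minimum cost avoids underflow
    # and does not change the normalized weights.
    costs = [horizon_cost(x) for x in predicted]
    c_min = min(costs)
    weights = [math.exp(-(c - c_min)) for c in costs]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Multinomial resampling: draw particles in proportion to their weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, weights


# Toy usage with assumed dynamics x -> 0.9 x and a quadratic cost around 0.5.
rng = random.Random(0)
cloud = [rng.uniform(-2.0, 2.0) for _ in range(200)]
cloud, w = kl_mhe_step(cloud, lambda x: 0.9 * x, lambda x: 5.0 * (x - 0.5) ** 2, rng)
```

Repeating the step multiplies the weights accumulated along each trajectory, which mirrors the product of exponential factors appearing in the proof of Theorem 4.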
7 Differential privacy
In this section, we discuss the mechanism for encoding the desired level of differential privacy in moving-horizon estimators. We then apply this mechanism to the two estimators presented in the previous sections, the $W_2$-MHE and the KL-MHE. We conclude the section with a discussion on differential privacy of the estimators over a time horizon. Our aim here is to guarantee differential privacy of the measurement data , when the estimate sequence is released (made public). We consider the class of scenarios where an adversary can access the released estimates but not the measurement data itself. Our goal in incorporating differential privacy in estimation is to ensure that the adversary cannot use the released estimates to distinguish (in the sense of -differential privacy) between measurement sequences that are -adjacent, a risk that arises when the estimates are released directly without such a safeguard.
Given the framework (5), we encode differential privacy by an entropic regularization of the estimation objective function, as follows:
[TABLE]
where is a tunable time-dependent parameter and is the support of (with being the support of ). Moreover, , where and . We note that when , the above formulation reduces to (5) and when , it is equivalent to an entropy maximization problem, yielding a uniform distribution over the set as the solution. Clearly, the uniform distribution is insensitive to the measurements, and therefore offers maximum privacy, while being of no value to the estimation objective. The ensuing analysis in this section is directed at determining upper bounds on the parameter sequence such that the MHE offers -differential privacy. We rewrite the optimization problem (15) for as follows:
[TABLE]
Let be two -adjacent measurement sequences as in Definition 4, over a horizon , such that and let and be the sequences of estimates derived from (16). In the following, we determine conditions on that guarantee differential privacy for each of the estimators derived in previous sections.
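To make the role of the regularization weight concrete, here is a minimal sketch on a finite candidate set. It assumes (this form is an illustration, not necessarily the exact weighting in (15)) that the objective is eta * <p, c> + (1 - eta) * sum(p * log p), whose minimizer over the simplex is the Gibbs distribution p proportional to exp(-eta * c / (1 - eta)). The names `gibbs`, `max_divergence`, and `eta` are hypothetical. The sketch also checks the standard exponential-mechanism bound: if two cost vectors differ by at most `delta` entrywise, the log-ratio of the two minimizers is at most 2 * delta * eta / (1 - eta).

```python
import math


def gibbs(costs, eta):
    """Minimizer of eta*<p, c> + (1 - eta)*sum(p*log p) over the simplex, 0 < eta < 1."""
    beta = eta / (1.0 - eta)
    c_min = min(costs)  # shift for numerical stability; the result is unchanged
    w = [math.exp(-beta * (c - c_min)) for c in costs]
    z = sum(w)
    return [wi / z for wi in w]


def max_divergence(p, q):
    """Largest absolute log-ratio between two strictly positive discrete distributions."""
    return max(abs(math.log(pi / qi)) for pi, qi in zip(p, q))


# Costs induced by two nearby (assumed) measurement sequences on four candidates.
c1 = [0.0, 1.0, 4.0, 9.0]
c2 = [0.3, 0.8, 4.2, 9.1]
delta = max(abs(a - b) for a, b in zip(c1, c2))
eps = 0.5
eta = eps / (eps + 2.0 * delta)  # largest eta for which 2*delta*eta/(1-eta) <= eps
privacy_loss = max_divergence(gibbs(c1, eta), gibbs(c2, eta))
```

In this toy setting, a smaller `eta` flattens the distribution toward uniform and lowers the privacy loss, while `eta` close to 1 concentrates the mass on the cost minimizer, which matches the qualitative trade-off described above.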
7.1 Differentially private $W_2$-MHE
We now design a differentially private $W_2$-moving-horizon estimator. We begin by considering:
[TABLE]
for .
The following theorem provides a sufficient upper bound on such that the entropy-regularized $W_2$-MHE in (17) is -differentially private at a time instant .
Theorem 5.
(Sensitivity of $W_2$-MHE). Given two -adjacent measurement sequences , under Assumption 5, we have that the estimates generated by (17) satisfy if , where is a class- function that satisfies .
Proof.
Let and be the estimation objective functions at time instant , corresponding to the measurement sequences and respectively, and let and be the respective estimated probability measures, with the corresponding density functions. From (17), we get that for all , , being the local minimizer is also a critical point of the objective functional. We therefore obtain:
[TABLE]
where is the Kantorovich potential associated with the transport from to and is a constant. It now follows that:
[TABLE]
Similarly, we have:
[TABLE]
Taking the difference between the above two equations:
[TABLE]
We have that , where . This implies that . However, , and therefore , for all and some class- function . We let characterize the dependence of on the measurement sequence, and we get that for all , when . Moreover, by Assumption 5, we get . Therefore, we obtain:
[TABLE]
We also have that for any :
[TABLE]
where and . Since and are continuous, with (since ), there exists an such that , which implies that . From (18) and (19), for a straight line segment , we therefore obtain:
[TABLE]
where we have used the fact that . Thus, for , we let:
[TABLE]
from which we obtain that:
[TABLE]
and since for all , we have that . ∎
As noted earlier, Theorem 5 provides a sufficient upper bound on for differential privacy of the estimate at . The goal, however, is to guarantee the desired level of differential privacy over a time horizon . The key issue here is that the recursive update scheme of the estimator introduces a dependence between the estimates at different time instants. This essentially means that imposing an upper bound on sensitivity for the marginal distributions individually, without regard to the dependence between these distributions, may not be sufficient. Therefore, to guarantee the desired level of differential privacy over the time horizon, we must impose an upper bound on the sensitivity of the joint distribution , where the estimates are the marginals of over .
The following theorem provides a sufficient upper bound on such that the entropy-regularized $W_2$-MHE in (17) is -differentially private over a time horizon .
Theorem 6.
(Differentially private $W_2$-MHE). Given two -adjacent measurement sequences , under Assumption 5, we have that the estimates generated by (17) satisfy if .
Proof.
Let and be the estimation objective functions at time instant , corresponding to the measurement sequences and respectively, and let and be the respective joint probability measures over the horizon . With a slight abuse of notation, we allow and to also denote the joint density function. We now have:
[TABLE]
where is the marginal density at at time instant , given that the distribution at time instant is concentrated at . Moreover, we note that the $W_2$-MHE (17) yields a Markov process, which allows us to express . Now, is the density corresponding to the measure obtained by the following:
[TABLE]
where is the Dirac measure concentrated at . From the above, we get that for all , , being the local minimizer is also a critical point of the objective functional. Applying similar steps to those in the proof of Theorem 5, we obtain:
[TABLE]
Now, we have:
[TABLE]
By taking
[TABLE]
we obtain the following inequality:
[TABLE]
and that . ∎
We note that for a given , the upper bound on the sequence decreases with . In other words, guaranteeing -differential privacy w.r.t. measurement sequences that are farther apart requires the addition of more noise and a greater loss in estimation accuracy. This is because the weighting on the entropic regularization term in the estimation objective increases when is reduced. The same is the case when is reduced for a given , which corresponds to a more stringent privacy requirement.
7.2 Differentially private KL-MHE
We now design a differentially private KL-moving-horizon estimator. We begin by considering the entropy-regularized KL-MHE formulation, given by:
[TABLE]
for . The corresponding recursive update scheme for (20) is given by:
[TABLE]
which will be derived in the proof of Theorem 7 below.
The following theorem provides a sufficient upper bound on such that the entropy-regularized KL-MHE in (20) is -differentially private at a time instant , while ignoring the correlations between the estimates across time.
Theorem 7.
(Sensitivity of KL-MHE). Given two -adjacent measurement sequences , under Assumption 5, we have that the estimates generated by (20) satisfy if , where .
Proof.
Let and be the estimation objective functions at time instant , corresponding to the measurement sequences and respectively, and let and be the respective estimated probability measures, with the corresponding density functions. From (20), we get that for all , , being the local minimizer is also a critical point of the objective functional. We therefore obtain:
[TABLE]
from which we derive that:
[TABLE]
The above equation can be rewritten as follows:
[TABLE]
where is the normalization constant. We therefore obtain:
[TABLE]
Expanding the above, we get:
[TABLE]
where . Similarly, we have:
[TABLE]
where and , as we assume that the estimator starts with the same initial . From the above two equations, we obtain:
[TABLE]
The max-divergence between and can be upper bounded now by:
[TABLE]
where the final inequality is due to the following (note that we use the fact that , as mentioned earlier):
[TABLE]
We now have, for all :
[TABLE]
where and . From Assumption 5, we have . Moreover, let such that , and we obtain:
[TABLE]
This yields the following inequality:
[TABLE]
We now let:
[TABLE]
which yields the bound
[TABLE]
and we get . ∎
We note here that, in practice, with the choice of a sufficiently large domain , we can ensure that for all . This is owing to the fact that for a large enough , we will have . Moreover, since the function is continuous, there must therefore exist a point such that .
As with the $W_2$-MHE, we now characterize the differential privacy of the KL-MHE over a horizon . We recall that the KL-MHE yields a sequence of distributions over the time horizon. Differential privacy over the horizon requires an upper bound on the sensitivity of the joint distribution over the horizon, where is the marginal of at the time instant . As before, with a slight abuse of notation, letting also denote the joint density function, we have:
[TABLE]
From the above, we infer that to estimate the sensitivity of the joint density function, we must estimate the sensitivity of the conditionals . The conditional at any time instant , is obtained from the coupling between the marginal distributions and .
We now obtain an upper bound for the case where the marginals are independently coupled. In other words, we suppose that:
[TABLE]
Theorem 8.
(Differentially private KL-MHE). Given two -adjacent measurement sequences , under Assumption 5 and the independent coupling (22), we have that the estimates generated by (20) satisfy if , where .
Proof.
Let and be the estimation objective functions at time instant , corresponding to the measurement sequences and respectively, and let and be the respective joint probability measures over the horizon . With a slight abuse of notation, we allow and to also denote the joint density function. From (22), we get:
[TABLE]
which implies that:
[TABLE]
From the proof of Theorem 7 on the sensitivity of KL-MHE, we further get:
[TABLE]
Therefore, it holds that if:
[TABLE]
∎
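The additive structure used in this proof can be checked numerically on a small discrete example. The sketch below assumes finite state sets and the independent coupling, so that the joint is the product of the marginals; `joint_of_independent` and `max_divergence` are hypothetical helper names. Because the log-ratio of two product densities is the sum of the marginal log-ratios, the joint max-divergence is at most the sum of the per-step values. Splitting a total budget evenly across the time steps is one simple (assumed) way to satisfy this; it is not claimed to be the bound of Theorem 8.

```python
import math
from itertools import product


def max_divergence(p, q):
    """Largest absolute log-ratio between two strictly positive discrete distributions."""
    return max(abs(math.log(a / b)) for a, b in zip(p, q))


def joint_of_independent(marginals):
    """Probabilities of the product measure, enumerated over all index tuples."""
    index_ranges = [range(len(m)) for m in marginals]
    return [math.prod(m[i] for m, i in zip(marginals, idx))
            for idx in product(*index_ranges)]


# Marginals over three time steps for two nearby (assumed) measurement sequences.
P = [[0.2, 0.8], [0.5, 0.5], [0.1, 0.9]]
Q = [[0.25, 0.75], [0.4, 0.6], [0.1, 0.9]]
per_step = [max_divergence(p, q) for p, q in zip(P, Q)]
joint_loss = max_divergence(joint_of_independent(P), joint_of_independent(Q))
```

Here `joint_loss` never exceeds `sum(per_step)`, so bounding each step's sensitivity bounds the sensitivity of the whole released sequence.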
8 Simulation results
In this section, we present results from numerical simulations of the estimators studied in this paper. The simulations were performed in MATLAB (version R2017a) on a 2.5 GHz Intel Core i5 processor.
We considered the following nonlinear discrete-time system:
[TABLE]
with , and are i.i.d. disturbances, sampled uniformly from the intervals and respectively, and a quadratic estimation objective function .
We first present the simulation results for $W_2$-MHE. We ran 30 trials of the estimator (9) on the same measurement sequence, with randomly generated initial conditions and over a time horizon of length . The length of the moving horizon was chosen to be . Figure 1 contains the plots of the mean of the estimates along with the true states. The root mean squared errors (RMSE) of the mean state estimate sequences were found to be and for the estimates of and , respectively. The average time for computing the state estimate through the minimization (9) using the function in MATLAB was observed to be .
We then implemented the estimator (13) with 30 samples, over a time horizon of length . The length of the moving horizon was chosen to be . Figure 2 contains the plots of the mean of the estimates along with the true states. The root mean squared errors (RMSE) of the mean state estimate sequences were found to be and for the estimates of and , respectively. The average run-time for the minimization (13) by a resampling method was observed to be .
In simulation, with 30 samples, we find that the $W_2$-MHE performs better with respect to the root mean squared error, while the KL-MHE is much faster. The performance of the KL-MHE is determined by the richness of the sample set and effectiveness of the resampling procedure, choices that depend on context and experience. In this manuscript, we did not attempt to investigate improvements in performance with respect to these choices. The performance of $W_2$-MHE does not necessarily improve with the richness of the sample set, but for systems for which is not a singleton, a richer sample set allows for a more complete characterization of the set of feasible estimates.
Figure 3 illustrates the typical trade-off between accuracy and privacy in moving-horizon estimation. We considered constant weights for the entropic regularization terms in (17) and (20). The values of were chosen such that they satisfied the bounds specified in Theorems 6 and 8 for -differential privacy of the estimators over the horizon. In Figure 3, we plot the RMSE (for the estimates of the state ) for $W_2$-MHE, averaged over the samples, specifying the accuracy, for different values of , the privacy parameter. We recall that a higher value of indicates a less stringent privacy requirement. We notice that the accuracy of the estimators improves with an increase in the privacy parameter.
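For readers reproducing the accuracy figures, the RMSE of the mean estimate sequence can be computed as sketched below. This is a plain-Python illustration rather than the MATLAB code used for the reported results; the function name and the data layout (`estimates[s][k]` is the estimate of one state component from trial or sample `s` at time `k`) are assumptions.

```python
import math


def rmse_of_mean_estimate(estimates, truth):
    """RMSE between the across-trial mean estimate and the true state sequence."""
    n_trials = len(estimates)
    horizon = len(truth)
    # Average the estimates over trials (or samples) at each time step.
    mean_est = [sum(est[k] for est in estimates) / n_trials for k in range(horizon)]
    # Root of the time-averaged squared error of that mean sequence.
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(mean_est, truth)) / horizon)


# Two toy trials whose errors cancel in the mean.
example = rmse_of_mean_estimate([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]], [2.0, 2.0, 2.0])
```

Averaging the per-sample RMSE values instead, as the Figure 3 discussion describes, would swap the order of the two averaging steps and generally gives a larger number.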
9 Conclusions
In this work, we laid out a unifying probabilistic framework for moving-horizon estimation. We clearly established the connection between the classical notion of strong local observability and the stability of moving-horizon estimation, for nonlinear discrete-time systems. We then proposed a differentially private mechanism based on entropic regularization and derived conditions under which -differential privacy is guaranteed at any given time instant and over time horizons. As an extension to this work, we intend to include distributional constraints in the moving-horizon estimation framework. An important consideration in the estimation problem, in addition to the asymptotic stability, is the rate of convergence of the observer. It is of interest to obtain convergence rate bounds for the moving-horizon estimators proposed in this paper, and to compare their performance for various choices of the metric (or divergence) in the unifying formulation, which will be undertaken in our future work.
References
- [1] F. Albertini and D. D’Alessandro. Observability and forward–backward observability of discrete-time nonlinear systems. Mathematics of Control, Signals and Systems, 15(4):275–290, 2002.
- [2] A. Alessandri, M. Baglietto, and G. Battistelli. Moving-horizon state estimation for nonlinear discrete-time systems: New stability results and approximation schemes. Automatica, 44(7):1753–1765, 2008.
- [3] A. Alessandri and M. Gaggero. Moving-horizon estimation for discrete-time linear and nonlinear systems using the gradient and Newton methods. In IEEE Int. Conf. on Decision and Control, pages 2906–2911, 2016.
- [4] A. Alessandri and M. Gaggero. Fast moving horizon state estimation for discrete-time systems using single and multi iteration descent methods. IEEE Transactions on Automatic Control, 62(9):4499–4511, 2017.
- [5] L. Ambrosio, N. Gigli, and G. Savaré. Gradient flows: in metric spaces and in the space of probability measures. Springer, 2008.
- [6] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
- [7] J. Cortés, G. E. Dullerud, S. Han, J. Le Ny, S. Mitra, and G. J. Pappas. Differential privacy in control and network systems. In IEEE Int. Conf. on Decision and Control, pages 4252–4272, Las Vegas, NV, 2016.
- [8] C. Dimitrakakis, B. Nelson, A. Mitrokotsa, and B. Rubinstein. Robust and private Bayesian inference. In Int. Conf. on Algorithmic Learning Theory, pages 291–305, 2014.
