Outlier-robust Kalman filters with mixture correntropy

Hongwei Wang; Wei Zhang; Junyi Zuo; Heping Wang

arXiv:1907.00307·stat.ME·April 29, 2020·J. Frankl. Inst.


TL;DR

This paper introduces two novel robust Kalman filters based on mixture correntropy to effectively handle outliers in nonlinear measurement data, improving estimation accuracy in challenging scenarios.

Contribution

The work develops two new robust filters using mixture correntropy, enhancing traditional Kalman filtering for outlier resistance in nonlinear systems.

Findings

01

The proposed filters outperform existing robust methods in numerical tests.

02

Mixture correntropy effectively mitigates measurement outliers.

03

The methods maintain reasonable estimates when measurement errors are small.


Tables

Table 1: TRMSE of $x_{1}$ and $x_{2}$ with different $\alpha$

$α$		0(MCC-CKF1)	0.1	0.3	0.5	0.7	0.9	1(MCC-CKF2)
$ϕ = 0.3, φ = 200$	$x_{1}$	0.4245	0.3692	0.3625	0.3631	0.3650	0.3578	0.4912
$ϕ = 0.3, φ = 200$	$x_{2}$	0.4792	0.4243	0.4180	0.4154	0.4154	0.4127	0.5004
$ϕ = 0.2, φ = 300$	$x_{1}$	0.2726	0.2665	0.2599	0.2544	0.2498	0.2458	0.3640
$ϕ = 0.2, φ = 300$	$x_{2}$	0.3522	0.3387	0.3327	0.3280	0.3243	0.3213	0.3709

Equations

$$V(X,Y)=E[\kappa(X,Y)]=\iint\kappa(x,y)\,p(x,y)\,dx\,dy$$

$$M(X,Y)=E\left(\alpha\kappa_{1}(X,Y)+(1-\alpha)\kappa_{2}(X,Y)\right)$$

$$M(X,Y)=\frac{1}{N}\sum_{i=1}^{N}\left(\alpha\kappa_{1}(e_{i})+(1-\alpha)\kappa_{2}(e_{i})\right)$$

$$\kappa_{1}(e_{i})=\exp\left(-\frac{e_{i}^{2}}{2\sigma_{1}^{2}}\right),\quad\kappa_{2}(e_{i})=\exp\left(-\frac{e_{i}^{2}}{2\sigma_{2}^{2}}\right)$$

$$\kappa_{1}(e_{i})=\exp\left(-\frac{e_{i}^{2}}{2\sigma_{1}^{2}}\right),\quad\kappa_{2}(e_{i})=\exp\left(-\frac{|e_{i}|}{\sigma_{2}}\right)$$

$$L(X,Y)=1-M(X,Y)$$

$\bm{x}_{t}$

$\bm{y}_{t}$

$$\hat{\bm{x}}_{t|t}=\arg\min_{\bm{x}_{t}}\left(-\log p(\bm{x}_{t}|\bm{y}_{1:t-1})-\log p(\bm{y}_{t}|\bm{x}_{t})\right)$$

$$\hat{\bm{x}}_{t|t}=\arg\min_{\bm{x}_{t}}\left(\frac{1}{2}\left\|\bm{x}_{t}-\hat{\bm{x}}_{t|t-1}\right\|_{\bm{P}_{t|t-1}^{-1}}^{2}+\frac{1}{2}\sum_{i=1}^{m}e_{t,i}^{2}\right)$$

$$\hat{\bm{x}}_{t|t}=\arg\min_{\bm{x}_{t}}\left\{\frac{1}{2}\left\|\bm{x}_{t}-\hat{\bm{x}}_{t|t-1}\right\|_{\bm{P}_{t|t-1}^{-1}}^{2}+\lambda\left[1-\frac{1}{m}\sum_{i=1}^{m}\left(\alpha\exp\left(-\frac{e_{i}^{2}}{2\sigma_{1}^{2}}\right)+(1-\alpha)\exp\left(-\frac{e_{i}^{2}}{2\sigma_{2}^{2}}\right)\right)\right]\right\}$$

$$e^{\delta}\approx 1+\delta$$

$$L_{DG\text{-}MCL}\approx\frac{\alpha\sigma_{2}^{2}+(1-\alpha)\sigma_{1}^{2}}{m\sigma_{1}^{2}\sigma_{2}^{2}}\,\frac{1}{2}\sum_{i=1}^{m}e_{t,i}^{2}$$

$$\lambda=\frac{m\sigma_{1}^{2}\sigma_{2}^{2}}{\alpha\sigma_{2}^{2}+(1-\alpha)\sigma_{1}^{2}}$$

$$\bm{P}_{t|t-1}^{-1}\left(\bm{x}_{t}-\hat{\bm{x}}_{t|t-1}\right)-\frac{\lambda}{m}\sum_{i=1}^{m}\left(\alpha\frac{\partial\kappa_{1}(e_{i})}{\partial e_{i}}\frac{\partial e_{i}}{\partial\bm{x}_{t}}+(1-\alpha)\frac{\partial\kappa_{2}(e_{i})}{\partial e_{i}}\frac{\partial e_{i}}{\partial\bm{x}_{t}}\right)=0$$

$$\frac{\partial\kappa_{1}(e_{i})}{\partial e_{i}}=-\frac{e_{i}}{\sigma_{1}^{2}}\kappa_{1}(e_{i}),\quad\frac{\partial\kappa_{2}(e_{i})}{\partial e_{i}}=-\frac{e_{i}}{\sigma_{2}^{2}}\kappa_{2}(e_{i})$$

$$\bm{P}_{t|t-1}^{-1}\left(\bm{x}_{t}-\hat{\bm{x}}_{t|t-1}\right)+\frac{\lambda}{m}\sum_{i=1}^{m}\left(\frac{\alpha\kappa_{1}(e_{i})}{\sigma_{1}^{2}}+\frac{(1-\alpha)\kappa_{2}(e_{i})}{\sigma_{2}^{2}}\right)e_{i}\frac{\partial e_{i}}{\partial\bm{x}_{t}}=0$$

$$\bm{\Lambda}_{t,ii}=\frac{\lambda}{m}\left(\frac{\alpha\kappa_{1}(e_{i})}{\sigma_{1}^{2}}+\frac{(1-\alpha)\kappa_{2}(e_{i})}{\sigma_{2}^{2}}\right)$$

$$\bm{P}_{t|t-1}^{-1}\left(\bm{x}_{t}-\hat{\bm{x}}_{t|t-1}\right)+\frac{\partial\bm{e}_{t}}{\partial\bm{x}_{t}}\bm{\Lambda}_{t}\bm{e}_{t}=0$$

$$\hat{\bm{x}}_{t|t}=\arg\min_{\bm{x}_{t}}\left(\frac{1}{2}\left\|\bm{x}_{t}-\hat{\bm{x}}_{t|t-1}\right\|_{\bm{P}_{t|t-1}^{-1}}^{2}+\frac{1}{2}\left\|\bm{y}_{t}-h(\bm{x}_{t})\right\|_{\bar{\bm{R}}_{t}^{-1}}^{2}\right)$$

$$\bar{\bm{R}}_{t}=\bm{R}_{t}^{T/2}\bm{\Lambda}_{t}^{-1}\bm{R}_{t}^{1/2}$$

$$\left\|\hat{\bm{x}}_{t|t}^{k+1}-\hat{\bm{x}}_{t|t}^{k}\right\|<\epsilon$$

$$\hat{\bm{x}}_{t|t}=\arg\min_{\bm{x}_{t}}\Big\{\frac{1}{2}\|\bm{x}_{t}\ldots$$

$$L_{LG\text{-}MCL}\approx\frac{1}{2}\sum_{i=1}^{m}\eta_{i}e_{t,i}^{2}$$

$$\eta_{i}=\frac{1}{m}\left(\frac{\alpha}{\sigma_{1}^{2}}+\frac{2(1-\alpha)}{\sigma_{2}|e_{i}|}\right)$$

$$\lambda_{i}=\eta_{i}^{-1}=m\left(\frac{\alpha}{\sigma_{1}^{2}}+\frac{2(1-\alpha)}{\sigma_{2}|e_{i}|}\right)^{-1}$$

$$\hat{\bm{x}}_{t|t}=\arg\min_{\bm{x}_{t}}\left(\frac{1}{2}\left\|\bm{x}_{t}-\hat{\bm{x}}_{t|t-1}\right\|_{\bm{P}_{t|t-1}^{-1}}^{2}+\frac{1}{2}\left\|\bm{y}_{t}-h(\bm{x}_{t})\right\|_{\bar{\bm{R}}_{t}^{-1}}^{2}\right)$$

$\bar{\bm{R}}_{t}$

$\bm{\Lambda}_{t}$

$\bm{\Lambda}_{t,ii}$

$$\left\{\begin{array}{l}\dot{x}_{1}=x_{2}\\ \dot{x}_{2}=\mu(1-x_{1}^{2})x_{2}-x_{1}\end{array}\right.$$


Taxonomy

Topics: Target Tracking and Data Fusion in Sensor Networks · Inertial Sensor and Navigation · Advanced Adaptive Filtering Techniques

Full text

Outlier-robust Kalman filters with mixture correntropy

Hongwei Wang

Wei Zhang

Junyi Zuo

Heping Wang

National Key Laboratory of Science and Technology on Communications, University of Electronic Science and Technology of China, Chengdu, People’s Republic of China, 611731

School of Aeronautics, Northwestern Polytechnical University, Xi’an, People’s Republic of China, 710072

Abstract

We consider the robust filtering problem for a nonlinear state-space model with outliers in measurements. To improve the robustness of the traditional Kalman filtering algorithm, we propose two robust filters based on mixture correntropy, namely the double-Gaussian mixture correntropy and the Laplace-Gaussian mixture correntropy. We formulate the robust filtering problem by adopting the mixture correntropy induced cost in place of the quadratic one used by the conventional Kalman filter for measurement fitting errors. In addition, a tradeoff weight coefficient is introduced to ensure that the proposed approaches provide reasonable state estimates when measurement fitting errors are small. The formulated robust filtering problems are solved iteratively within the cubature Kalman filtering framework with a reweighted measurement covariance. Numerical results show that the proposed methods achieve a performance improvement over existing robust solutions.

Keywords: Robust Kalman filter, mixture correntropy, cubature Kalman filter, state estimation, measurement outliers

This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 61473227 and 11472222. Corresponding author: Hongwei Wang ([email protected]).

1 Introduction

State estimation for stochastic discrete-time dynamic systems is one of the vital issues in control engineering, and it has broad applications in various areas, such as target tracking, sparse signal processing, fault detection and diagnosis, pose estimation, and many others [1, 2, 3, 4, 5, 6, 7]. The state estimate of a linear system with Gaussian noise is provided by the celebrated Kalman filter [8]. For nonlinear systems under a Gaussian assumption (i.e., both the process and measurement noises are Gaussian), several Kalman-like Gaussian approximation filters (GKFs) have been investigated, e.g., the unscented Kalman filter [9] and the cubature Kalman filter [10, 11]. These solutions perform well when the Gaussian assumption holds. In some applications, however, the Gaussian assumption on the measurement noise may fail because outliers from unreliable sensors contaminate the measurements. Outliers give the measurement noise a heavy tail and make it non-Gaussian, which substantially degrades the existing GKFs.

The sequential Monte Carlo sampling/particle filter (PF) [12] and the Gaussian sum filter (GSM) are two general strategies for dealing with non-Gaussian noise caused by measurement outliers. In the PF, a massive number of particles is needed to approximate the posterior probability density function well enough to obtain reasonable estimates. In the GSM, the state estimates are obtained by combining the results of several filters run in parallel via an interacting procedure. Therefore, both the PF and GSM suffer from a heavy computational burden, which prevents them from being widely used in applications. A computationally economical alternative, i.e., integrating a robust cost from $M$-estimation (e.g., Huber’s cost) into the GKF framework [13, 14, 15], has also been studied. This type of robust filter was developed by interpreting the GKF filtering problem as a linear or nonlinear regression. Other approaches to robust filtering, such as the heavy-tailed distribution based solution [16] and the $H_{\infty}$ filter [17], have also been reported in the literature.

Recently, a local similarity measure called correntropy, from information-theoretic learning, was introduced to deal with heavy-tailed non-Gaussian noise [18, 19, 20], and its associated maximum correntropy criterion (MCC) has been employed to design robust filtering algorithms. In [21, 22] the MCC was first employed to improve the robustness of the KF for linear systems. Those methods employed the gradient descent approach and ignored the covariance propagation procedure, which may cause a loss of information. To handle this issue, a robust Kalman filter called MCKF [23] was developed by recasting the Kalman filtering problem as a linear regression problem. Afterward, several variants of the MCKF were developed for nonlinear systems [24, 25, 26]. Although the feasibility of MCC based robust filters for dealing with non-Gaussian noise has been demonstrated, the default kernel in the MCC, e.g., the Gaussian kernel, may not be sufficient for more complex data in many practical problems [27, 28]. Besides, there is still no guideline for selecting the kernel parameter, which has a significant influence on the MCC based robust filter.

There is a need, therefore, for kernel parameter-insensitive algorithms that deal with measurement outliers. To address this challenge, we propose two robust Kalman filters based on the mixture correntropy. We formulate the robust filtering problem by using a mixture correntropy induced loss to replace the quadratic one in the GKF for measurement fitting errors. In addition, a weighting coefficient is included to seek a tradeoff between the model and measurement fitting errors. The resulting robust filtering problems are solved iteratively within the GKF framework with a reweighted measurement covariance. The simulation results show the superior performance of the proposed algorithms and also provide a heuristic rule for designing kernel parameters.

The remainder of the paper is organized as follows. In Section 2, we give a brief introduction to mixture correntropy. In Section 3, we formulate the mixture correntropy based robust Kalman filtering problems and derive the related algorithms. In Section 4, we present simulation examples to verify the performance of the proposed algorithms. Finally, Section 5 concludes our work.

Notation: In this paper, boldface lower-case and upper-case letters represent column vectors and matrices, respectively. Scalars are denoted by normal font letters. $A^{T}$ denotes the transpose of the matrix $A$. $\mathcal{N}(\cdot,\cdot)$ denotes a Gaussian distribution. The estimate of the state $\bm{x}_{t}$ given the measurements up to $t=m$ is denoted by $\hat{\bm{x}}_{t|m}$.

2 Brief review of the mixture correntropy

Correntropy is a newly developed similarity measure that originated from information-theoretic learning. Given two random variables $X$ and $Y$ , correntropy is defined by [18]

$$V(X,Y)=E[\kappa(X,Y)]=\iint\kappa(x,y)\,p(x,y)\,dx\,dy$$

where $\kappa(\cdot,\cdot)$ is a kernel function which satisfies Mercer’s theorem, $E(\cdot)$ denotes the expectation operation and $p(x,y)$ is the joint probability density function of $X$ and $Y$. Readers can refer to [18] for the properties of correntropy and its associated maximum correntropy criterion. The advantages of correntropy in dealing with measurement outliers in the Kalman filtering framework have been illustrated in several works, e.g., [21, 22, 23, 24, 25, 26]. However, correntropy with a single kernel may suffer performance degradation when dealing with more complex data. In addition, correntropy is sensitive to the parameter of the kernel function, which may limit its performance. To address those issues and improve flexibility, a mixture correntropy was introduced in [27], i.e.,

$$M(X,Y)=E\left(\alpha\kappa_{1}(X,Y)+(1-\alpha)\kappa_{2}(X,Y)\right)\tag{2}$$

where $0<\alpha<1$ is the mixture coefficient, $\kappa_{1}(\cdot)$ and $\kappa_{2}(\cdot)$ are two different Mercer kernel functions. Calculating the exact value of the mixture correntropy is in general intractable due to the lack of knowledge of $p(x,y)$ . In practice, only some finite data samples $\{x_{i},y_{i}\}_{i=1}^{N}$ are available, hence the value of mixture correntropy can be empirically approximated by

$$M(X,Y)=\frac{1}{N}\sum_{i=1}^{N}\left(\alpha\kappa_{1}(e_{i})+(1-\alpha)\kappa_{2}(e_{i})\right)$$

where $e_{i}=x_{i}-y_{i}$ . It is clear that the mixture correntropy will reduce to the original correntropy when $\alpha=0$ or $\alpha=1$ .

Remark 1

For simplicity, we mainly focus on the mixture correntropy with two different kernels in this work. The mixture correntropy, however, has a generalized form other than the definition in (2), i.e., $M(X,Y)=E\left(\sum_{i}\alpha_{i}\kappa_{i}(X,Y)\right)$ with $i\geq 2$ , $\alpha_{i}>0$ and $\sum_{i}\alpha_{i}=1$ .

Generally, the difference between $\kappa_{1}(\cdot)$ and $\kappa_{2}(\cdot)$ in the mixture correntropy can be obtained in two ways. In the first, $\kappa_{1}(\cdot)$ and $\kappa_{2}(\cdot)$ come from the same kernel family but have distinct kernel parameters, resulting in a homogeneous mixture correntropy, e.g., the double-Gaussian kernel mixture correntropy (DG-MC) [27] where

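$$\kappa_{1}(e_{i})=\exp\left(-\frac{e_{i}^{2}}{2\sigma_{1}^{2}}\right),\quad\kappa_{2}(e_{i})=\exp\left(-\frac{e_{i}^{2}}{2\sigma_{2}^{2}}\right)$$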

In the other, $\kappa_{1}(\cdot)$ and $\kappa_{2}(\cdot)$ are different types of kernel functions, leading to a heterogeneous mixture correntropy, e.g., the Laplace-Gaussian kernel mixture correntropy (LG-MC) proposed in [28] in which

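$$\kappa_{1}(e_{i})=\exp\left(-\frac{e_{i}^{2}}{2\sigma_{1}^{2}}\right),\quad\kappa_{2}(e_{i})=\exp\left(-\frac{|e_{i}|}{\sigma_{2}}\right)$$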

It is apparent that both the DG-MC and LG-MC meet their maximum when $X=Y$ (i.e., two random variables are exactly the same). Therefore, we here define the mixture correntropy loss in (4) for facilitating the formulation of the optimization problem

$$L(X,Y)=1-M(X,Y)\tag{4}$$

In the next section, we use the DG-MC loss (DG-MCL) and the LG-MC loss (LG-MCL) to design robust filters.
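The empirical mixture correntropy and its loss can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names and the `kind` switch are hypothetical, and the inputs are assumed to be the sample errors $e_{i}=x_{i}-y_{i}$.

```python
import numpy as np


def gaussian_kernel(e, sigma):
    """Gaussian kernel exp(-e^2 / (2 sigma^2)), applied elementwise."""
    return np.exp(-e**2 / (2.0 * sigma**2))


def laplace_kernel(e, sigma):
    """Laplace kernel exp(-|e| / sigma), applied elementwise."""
    return np.exp(-np.abs(e) / sigma)


def mixture_correntropy(e, alpha, sigma1, sigma2, kind="DG"):
    """Sample average of alpha*kappa_1(e_i) + (1 - alpha)*kappa_2(e_i).

    kind="DG": both kernels are Gaussian (homogeneous mixture).
    kind="LG": kappa_1 is Gaussian and kappa_2 is Laplace (heterogeneous mixture).
    """
    e = np.asarray(e, dtype=float)
    k1 = gaussian_kernel(e, sigma1)
    if kind == "DG":
        k2 = gaussian_kernel(e, sigma2)
    elif kind == "LG":
        k2 = laplace_kernel(e, sigma2)
    else:
        raise ValueError("kind must be 'DG' or 'LG'")
    return float(np.mean(alpha * k1 + (1.0 - alpha) * k2))


def mixture_correntropy_loss(e, alpha, sigma1, sigma2, kind="DG"):
    """Loss 1 - M: zero when all errors vanish, bounded above by one."""
    return 1.0 - mixture_correntropy(e, alpha, sigma1, sigma2, kind)
```

Because both kernels equal one at zero error and decay toward zero for large errors, the loss stays in $[0,1)$, which is what limits the influence of a single outlier.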

3 Derivation of the proposed robust Kalman filter

Consider the stochastic dynamic process described by a state-space model

$$\bm{x}_{t}=f(\bm{x}_{t-1})+\bm{w}_{t-1}\tag{5}$$

$$\bm{y}_{t}=h(\bm{x}_{t})+\bm{v}_{t}\tag{6}$$

where $\bm{y}_{t}\in\mathcal{R}^{m}$ is a measurement related to the state of interest $\bm{x}_{t}\in\mathcal{R}^{n}$ ; $f(\cdot)$ and $h(\cdot)$ are some known mappings to model the state transition and measurement procedure respectively; $\bm{w}_{t-1}\sim\mathcal{N}(0,\bm{Q}_{t-1})$ is the process noise and $\bm{v}_{t}$ is the measurement noise. In canonical Kalman filtering, $\bm{v}_{t}$ is assumed to be Gaussian, i.e., $\bm{v}_{t}\sim\mathcal{N}(0,\bm{R}_{t})$ . Under such a Gaussian assumption, the Kalman filtering problem can be formulated as the following minimization problem

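$$\hat{\bm{x}}_{t|t}=\arg\min_{\bm{x}_{t}}\left(-\log p(\bm{x}_{t}|\bm{y}_{1:t-1})-\log p(\bm{y}_{t}|\bm{x}_{t})\right)\tag{7}$$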

where $p(\bm{y}_{t}|\bm{x}_{t})$ is the likelihood function given by $\mathcal{N}(h(\bm{x}_{t}),\bm{R}_{t})$, and $p(\bm{x}_{t}|\bm{y}_{1:t-1})$ is the predictive density which can be approximated by $\mathcal{N}(\hat{\bm{x}}_{t|t-1},\bm{P}_{t|t-1})$ in the Gaussian approximation filtering framework. Substituting both the predictive density and the likelihood into (7), and discarding the terms that do not depend on $\bm{x}_{t}$, we can rewrite (7) as

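$$\hat{\bm{x}}_{t|t}=\arg\min_{\bm{x}_{t}}\left(\frac{1}{2}\left\|\bm{x}_{t}-\hat{\bm{x}}_{t|t-1}\right\|_{\bm{P}_{t|t-1}^{-1}}^{2}+\frac{1}{2}\sum_{i=1}^{m}e_{t,i}^{2}\right)\tag{8}$$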

where $e_{t,i}$ is the $i$-th component of $\bm{e}_{t}=\bm{R}_{t}^{-1/2}(\bm{y}_{t}-h(\bm{x}_{t}))$. The optimization problem in (8) can be solved by several GKFs, e.g., the CKF. In the following, we present our robust Kalman filters in conjunction with the CKF, which is briefly introduced in Appendix A. It is straightforward to extend the proposed filters to other GKFs.

From (8) we note that the traditional Kalman filter based on the Gaussian assumption has a quadratic loss for the measurement fitting error. The quadratic loss is sensitive to outliers, which is the main cause of the performance degradation of the KF in scenarios where measurement outliers are encountered. To improve the robustness of the filtering algorithm against outliers, the DG-MCL and LG-MCL are each used to replace the quadratic loss for the measurement fitting error, yielding the mixture correntropy based robust filters.

3.1 DG-MCL based robust Kalman filter

In this section, we first derive a robust Kalman filter based on the DG-MCL. Adopting the DG-MCL to the measurement fitting error leads to the following robust filtering problem

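$$\hat{\bm{x}}_{t|t}=\arg\min_{\bm{x}_{t}}\left\{\frac{1}{2}\left\|\bm{x}_{t}-\hat{\bm{x}}_{t|t-1}\right\|_{\bm{P}_{t|t-1}^{-1}}^{2}+\lambda\left[1-\frac{1}{m}\sum_{i=1}^{m}\left(\alpha\exp\left(-\frac{e_{i}^{2}}{2\sigma_{1}^{2}}\right)+(1-\alpha)\exp\left(-\frac{e_{i}^{2}}{2\sigma_{2}^{2}}\right)\right)\right]\right\}\tag{9}$$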

where $\lambda$ is a weighting coefficient that balances the model fitting error and the measurement fitting error. $\lambda$ should be chosen carefully to obtain a reasonable estimation result. Specifically, we expect the DG-MCL to behave like the quadratic loss when the measurement fitting error is small. Note that for a small real value $\delta$, we have

$$e^{\delta}\approx 1+\delta$$

Therefore, for a small measurement fitting error, the DG-MCL can be approximated as

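$$L_{DG\text{-}MCL}\approx\frac{\alpha\sigma_{2}^{2}+(1-\alpha)\sigma_{1}^{2}}{m\sigma_{1}^{2}\sigma_{2}^{2}}\,\frac{1}{2}\sum_{i=1}^{m}e_{t,i}^{2}$$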

In order to maintain the similarity of the quadratic loss and DG-MCL when the measurement fitting error is small, $\lambda$ should be determined as

$$\lambda=\frac{m\sigma_{1}^{2}\sigma_{2}^{2}}{\alpha\sigma_{2}^{2}+(1-\alpha)\sigma_{1}^{2}}$$
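The effect of this choice of $\lambda$ can be checked numerically. The sketch below is illustrative only (the helper names `dg_mcl` and `tradeoff_weight` are hypothetical): it compares $\lambda L_{DG\text{-}MCL}$ with the quadratic loss $\frac{1}{2}\sum_{i}e_{i}^{2}$ for small and for large normalized residuals, using $\sigma_{1}=4$, $\sigma_{2}=5$ and $\alpha=0.5$.

```python
import numpy as np


def dg_mcl(e, alpha, s1, s2):
    """DG-MCL: 1 - (1/m) * sum_i [alpha*k1(e_i) + (1 - alpha)*k2(e_i)]."""
    k1 = np.exp(-e**2 / (2.0 * s1**2))
    k2 = np.exp(-e**2 / (2.0 * s2**2))
    return 1.0 - np.mean(alpha * k1 + (1.0 - alpha) * k2)


def tradeoff_weight(m, alpha, s1, s2):
    """lambda = m*s1^2*s2^2 / (alpha*s2^2 + (1 - alpha)*s1^2)."""
    return m * s1**2 * s2**2 / (alpha * s2**2 + (1.0 - alpha) * s1**2)


alpha, s1, s2 = 0.5, 4.0, 5.0

# Small residuals: the scaled DG-MCL tracks the quadratic loss closely.
e_small = np.array([0.05, -0.02, 0.01])
lam = tradeoff_weight(e_small.size, alpha, s1, s2)
scaled_small = lam * dg_mcl(e_small, alpha, s1, s2)
quad_small = 0.5 * np.sum(e_small**2)

# Large residuals: the scaled DG-MCL saturates at lambda, the quadratic loss does not.
e_large = np.array([100.0, -80.0, 120.0])
scaled_large = lam * dg_mcl(e_large, alpha, s1, s2)
quad_large = 0.5 * np.sum(e_large**2)

print(scaled_small, quad_small)
print(scaled_large, quad_large)
```

The saturation at $\lambda$ for large residuals is the mechanism that keeps an outlier from dominating the cost.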

Differentiating the cost function in (9) with respect to $\bm{x}_{t}$ and setting the result to zero, we have

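$$\bm{P}_{t|t-1}^{-1}\left(\bm{x}_{t}-\hat{\bm{x}}_{t|t-1}\right)-\frac{\lambda}{m}\sum_{i=1}^{m}\left(\alpha\frac{\partial\kappa_{1}(e_{i})}{\partial e_{i}}\frac{\partial e_{i}}{\partial\bm{x}_{t}}+(1-\alpha)\frac{\partial\kappa_{2}(e_{i})}{\partial e_{i}}\frac{\partial e_{i}}{\partial\bm{x}_{t}}\right)=0\tag{11}$$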

For the Gaussian kernel, we know that

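$$\frac{\partial\kappa_{1}(e_{i})}{\partial e_{i}}=-\frac{e_{i}}{\sigma_{1}^{2}}\kappa_{1}(e_{i}),\quad\frac{\partial\kappa_{2}(e_{i})}{\partial e_{i}}=-\frac{e_{i}}{\sigma_{2}^{2}}\kappa_{2}(e_{i})\tag{12}$$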

Substituting (12) into (11) results in

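$$\bm{P}_{t|t-1}^{-1}\left(\bm{x}_{t}-\hat{\bm{x}}_{t|t-1}\right)+\frac{\lambda}{m}\sum_{i=1}^{m}\left(\frac{\alpha\kappa_{1}(e_{i})}{\sigma_{1}^{2}}+\frac{(1-\alpha)\kappa_{2}(e_{i})}{\sigma_{2}^{2}}\right)e_{i}\frac{\partial e_{i}}{\partial\bm{x}_{t}}=0\tag{13}$$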

Define a diagonal matrix $\bm{\Lambda}_{t}$ with its $i$ -th element given by

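$$\bm{\Lambda}_{t,ii}=\frac{\lambda}{m}\left(\frac{\alpha\kappa_{1}(e_{i})}{\sigma_{1}^{2}}+\frac{(1-\alpha)\kappa_{2}(e_{i})}{\sigma_{2}^{2}}\right)\tag{14}$$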

With $\bm{\Lambda}_{t}$ , one can rewrite (13) into the matrix format as

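$$\bm{P}_{t|t-1}^{-1}\left(\bm{x}_{t}-\hat{\bm{x}}_{t|t-1}\right)+\frac{\partial\bm{e}_{t}}{\partial\bm{x}_{t}}\bm{\Lambda}_{t}\bm{e}_{t}=0\tag{15}$$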

Equation (15) is essentially the derivative of the cost function of the following optimization problem

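$$\hat{\bm{x}}_{t|t}=\arg\min_{\bm{x}_{t}}\left(\frac{1}{2}\left\|\bm{x}_{t}-\hat{\bm{x}}_{t|t-1}\right\|_{\bm{P}_{t|t-1}^{-1}}^{2}+\frac{1}{2}\left\|\bm{y}_{t}-h(\bm{x}_{t})\right\|_{\bar{\bm{R}}_{t}^{-1}}^{2}\right)\tag{16}$$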

where

$$\bar{\bm{R}}_{t}=\bm{R}_{t}^{T/2}\bm{\Lambda}_{t}^{-1}\bm{R}_{t}^{1/2}\tag{17}$$

Despite its simple structure, directly solving (16) is intractable because $\bar{\bm{R}}_{t}$ depends on the state $\bm{x}_{t}$ via $\bm{\Lambda}_{t}$. To address this, we adopt an alternating iterative algorithm. Specifically, given the estimate $\hat{\bm{x}}_{t|t}^{k}$ after the $k$-th iteration, we construct $\bm{\Lambda}^{k}_{t}$ via (14), and then $\bar{\bm{R}}_{t}^{k}$ via (17). In the next iteration, we solve the optimization problem (16) with $\bar{\bm{R}}_{t}^{k}$ to obtain $\hat{\bm{x}}_{t|t}^{k+1}$. Note that (16) has the same structure as the problem under the Gaussian assumption in (8), which enables us to solve (16) by applying existing Gaussian approximation filtering solutions, e.g., the CKF in Appendix A. The iteration continues until the algorithm converges, e.g., for a small tolerance $\epsilon$,

$$\left\|\hat{\bm{x}}_{t|t}^{k+1}-\hat{\bm{x}}_{t|t}^{k}\right\|<\epsilon$$

At the beginning of the iteration procedure, we initialize $\bm{\Lambda}_{t}$ as an identity matrix, meaning that in the first loop the conventional CKF is implemented. The proposed robust filter is summarized in Algorithm 1.
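The reweighting loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a transcription of Algorithm 1: the function names are hypothetical, the matrix square root of $\bm{R}_{t}$ is taken to be its Cholesky factor, the small floor on the weights is added here only to avoid division by zero, and the CKF measurement update is the standard one with $2n$ equally weighted cubature points.

```python
import numpy as np


def cubature_points(x, P):
    """Return the 2n cubature points of N(x, P) as columns."""
    n = x.size
    S = np.linalg.cholesky(P)  # P = S @ S.T
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])
    return x[:, None] + S @ xi


def ckf_update(x_pred, P_pred, y, h, R):
    """Standard CKF measurement update with measurement covariance R."""
    X = cubature_points(x_pred, P_pred)
    Y = np.column_stack([h(X[:, i]) for i in range(X.shape[1])])
    y_pred = Y.mean(axis=1)  # equal weights 1/(2n)
    dX = X - x_pred[:, None]
    dY = Y - y_pred[:, None]
    P_yy = dY @ dY.T / X.shape[1] + R
    P_xy = dX @ dY.T / X.shape[1]
    K = P_xy @ np.linalg.inv(P_yy)
    return x_pred + K @ (y - y_pred), P_pred - K @ P_yy @ K.T


def dg_mcl_ckf_update(x_pred, P_pred, y, h, R, sigma1=4.0, sigma2=5.0,
                      alpha=0.5, max_iter=3, tol=1e-6):
    """DG-MCL measurement update: CKF passes with a reweighted R."""
    m = y.size
    lam = m * sigma1**2 * sigma2**2 / (alpha * sigma2**2 + (1 - alpha) * sigma1**2)
    R_sqrt = np.linalg.cholesky(R)  # assumption: Cholesky factor as R^{1/2}
    # First pass: Lambda = I, i.e. the conventional CKF update.
    x, P = ckf_update(x_pred, P_pred, y, h, R)
    for _ in range(max_iter - 1):
        e = np.linalg.solve(R_sqrt, y - h(x))  # normalized residual e_t
        k1 = np.exp(-e**2 / (2 * sigma1**2))
        k2 = np.exp(-e**2 / (2 * sigma2**2))
        weights = lam / m * (alpha * k1 / sigma1**2 + (1 - alpha) * k2 / sigma2**2)
        weights = np.maximum(weights, 1e-12)  # floor added for numerical safety
        R_bar = R_sqrt @ np.diag(1.0 / weights) @ R_sqrt.T
        x_new, P = ckf_update(x_pred, P_pred, y, h, R_bar)
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x, P
```

With a scalar state observed directly, a prior $\mathcal{N}(0,1)$ and unit measurement variance, a measurement of $50$ pulls the plain CKF estimate to $25$, while the reweighted update inflates the effective measurement variance and stays close to the prior mean. For a small residual the weights are close to one and the two updates nearly coincide.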

3.2 LG-MCL based robust Kalman filter

Similarly, the LG-MCL based robust filtering problem can be formulated as

[TABLE]

where $\lambda_{i}$ is the weighting parameter. We assign a weighting parameter to each component of the measurement fitting error because the kernel functions in the LG-MC are heterogeneous. Likewise, for a small fitting error, the LG-MCL can be approximated by

$$L_{LG\text{-}MCL}\approx\frac{1}{2}\sum_{i=1}^{m}\eta_{i}e_{t,i}^{2}$$

where $\eta_{i}$ is given by

$$\eta_{i}=\frac{1}{m}\left(\frac{\alpha}{\sigma_{1}^{2}}+\frac{2(1-\alpha)}{\sigma_{2}|e_{i}|}\right)$$

Therefore, $\lambda_{i}$ should be chosen as in (21) so that the LG-MCL behaves like the quadratic loss when the measurement fitting error is small

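$$\lambda_{i}=\eta_{i}^{-1}=m\left(\frac{\alpha}{\sigma_{1}^{2}}+\frac{2(1-\alpha)}{\sigma_{2}|e_{i}|}\right)^{-1}\tag{21}$$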
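The small-error approximation behind this choice can be spelled out per component. The following is a sketch that applies the first-order expansion $e^{\delta}\approx 1+\delta$ to both kernels and assumes $e_{i}\neq 0$:

```latex
\begin{aligned}
1-\alpha\kappa_{1}(e_{i})-(1-\alpha)\kappa_{2}(e_{i})
&\approx \frac{\alpha e_{i}^{2}}{2\sigma_{1}^{2}}+\frac{(1-\alpha)|e_{i}|}{\sigma_{2}}\\
&=\frac{1}{2}\left(\frac{\alpha}{\sigma_{1}^{2}}+\frac{2(1-\alpha)}{\sigma_{2}|e_{i}|}\right)e_{i}^{2},
\qquad e_{i}\neq 0 .
\end{aligned}
```

Dividing by $m$ gives the per-component factor $\eta_{i}=\frac{1}{m}\left(\frac{\alpha}{\sigma_{1}^{2}}+\frac{2(1-\alpha)}{\sigma_{2}|e_{i}|}\right)$, so choosing $\lambda_{i}=\eta_{i}^{-1}$ makes each weighted term equal to $\frac{1}{2}e_{i}^{2}$ to first order. Unlike the single $\lambda$ of the DG-MCL, $\lambda_{i}$ depends on $|e_{i}|$ because the Laplace kernel is linear rather than quadratic in the error near zero.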

Akin to the derivation of the DG-MCL based robust Kalman filter, the reformulated optimization problem for the LG-MCL based robust filtering problem is

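$$\hat{\bm{x}}_{t|t}=\arg\min_{\bm{x}_{t}}\left(\frac{1}{2}\left\|\bm{x}_{t}-\hat{\bm{x}}_{t|t-1}\right\|_{\bm{P}_{t|t-1}^{-1}}^{2}+\frac{1}{2}\left\|\bm{y}_{t}-h(\bm{x}_{t})\right\|_{\bar{\bm{R}}_{t}^{-1}}^{2}\right)\tag{22}$$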

where

[TABLE]

Here we apply a similar iterative procedure to solve (22), and the details of the resulting robust filter are presented in Algorithm 2.

4 Simulations and results

In this section, we analyze the proposed algorithms through two numerical simulations, i.e., estimating the state of a Van der Pol oscillator (VPO) and the state-of-charge (SoC) of a battery. For comparison, we also consider the conventional CKF and some existing robust filters, including the maximum correntropy derivative-free robust CKF (MCC-CKF) [26], the linear regression and maximum correntropy based CKF (RMCC-CKF) [24], and the Huber’s cost function based CKF (Huber-CKF) [15]. We use two different setups in the MCC-CKF, i.e., the MCC-CKF1 with $\{\sigma=100,\ \eta=4\}$ and the MCC-CKF2 with $\{\sigma=100,\ \eta=5\}$ (setting $\sigma=100$ is to deal with the Gaussian process noise). We set the kernel parameter in the RMCC-CKF to $5$ and the threshold parameter in the Huber-CKF to $1.345$. In the DG-MCL-CKF and LG-MCL-CKF, the kernel parameters are set to $\sigma_{1}=4$ and $\sigma_{2}=5$, and the mixture coefficient $\alpha$ is set to $0.5$.

Remark 2

The kernel parameters in the mixture correntropy will influence the performance of the MCL. In this work, we select these user-defined parameters by trial and error. Further studies, however, are needed to explore the detailed parameter-selection strategy, which would be beyond the scope of this work.

4.1 VPO model

The standard VPO model is given by

$$\left\{\begin{array}{l}\dot{x}_{1}=x_{2}\\ \dot{x}_{2}=\mu(1-x_{1}^{2})x_{2}-x_{1}\end{array}\right.$$

where $\mu$ is a coefficient that controls the nonlinearity of the VPO. Using a sampling interval $\delta$ to discretize the VPO results in

[TABLE]

where $\bm{x}_{t}=[x_{1,t},x_{2,t}]^{T}$ is the state of interest and $\bm{w}_{t}$ is the process noise which is assumed to be Gaussian, i.e., $\bm{w}_{t}\sim\mathcal{N}(0,\bm{Q}_{t-1})$ . We utilize the fourth-order Runge-Kutta scheme to numerically calculate the integral terms in (28) which in general have no analytical solutions. Furthermore we assume that the noisy measurements are gathered via

[TABLE]

The measurement noise is modeled as the following Gaussian-mixture model to simulate the heavy-tailed property caused by outliers

[TABLE]

in which $\phi$ is the contaminating ratio, $\varphi$ is the outlier strength factor and $R_{t}$ is the covariance of the nominal measurement noise.
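The two ingredients of this simulation setup can be sketched in Python. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, the Runge-Kutta step integrates the noise-free VPO dynamics over one sampling interval, and the contaminated noise is assumed to take the common form $(1-\phi)\mathcal{N}(0,R_{t})+\phi\mathcal{N}(0,\varphi R_{t})$, i.e., $\varphi$ is assumed to scale the covariance of the outlier component.

```python
import numpy as np


def vpo_rhs(x, mu=1.0):
    """Right-hand side of the Van der Pol oscillator."""
    return np.array([x[1], mu * (1.0 - x[0]**2) * x[1] - x[0]])


def rk4_step(x, dt, mu=1.0):
    """Advance the noise-free VPO state by one fourth-order Runge-Kutta step."""
    k1 = vpo_rhs(x, mu)
    k2 = vpo_rhs(x + 0.5 * dt * k1, mu)
    k3 = vpo_rhs(x + 0.5 * dt * k2, mu)
    k4 = vpo_rhs(x + dt * k3, mu)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def contaminated_noise(rng, size, phi, varphi, R=1.0):
    """Draw scalar noise from (1 - phi)*N(0, R) + phi*N(0, varphi*R).

    Assumed form of the Gaussian-mixture noise: phi is the contaminating
    ratio and varphi scales the covariance of the outlier component.
    """
    is_outlier = rng.random(size) < phi
    std = np.where(is_outlier, np.sqrt(varphi * R), np.sqrt(R))
    return std * rng.standard_normal(size)


rng = np.random.default_rng(0)
x_next = rk4_step(np.array([0.0, -0.5]), dt=0.1, mu=1.0)
v = contaminated_noise(rng, size=120, phi=0.2, varphi=200.0)
```

Under the assumed mixture, the overall noise variance is $(1-\phi+\phi\varphi)R_{t}$, which for $\phi=0.2$ and $\varphi=200$ is about $40.8$ times the nominal variance.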

In the simulation, we set $\mu=1$ and use $T=120$ samples with the sampling interval $\delta=0.1$ s. The true value of the initial state is $\bm{x}_{0}=[0,-0.5]^{T}$ and the estimated initial state is drawn from the Gaussian distribution $\mathcal{N}({\bm{x}}_{0},0.01\bm{I}_{2})$. The covariances of the process noise and the nominal measurement noise are, respectively, given by $\bm{Q}_{t-1}=0.005\bm{I}_{2}$ and $R_{t}=1$. $L=1000$ Monte Carlo runs are performed to obtain the simulation results. The time-averaged root mean square error (TRMSE) is employed as a metric, which is defined as

[TABLE]

First, we have studied the performance of the proposed methods versus the iteration number. Fig. 1 shows the TRMSEs of $x_{1}$ and $x_{2}$ when the iteration number of our algorithms varies from $1$ to $10$ . It is apparent that the proposed approaches converge after $2$ or $3$ iterations. In the following simulations, we set $3$ as a default value of the iteration number for the proposed algorithms.

Fig. 2 illustrates the TRMSEs of $x_{1}$ and $x_{2}$ with varying $\varphi$ and fixed $\phi=0.2$; Fig. 3 shows the same quantities when $\phi$ varies and $\varphi=200$. As expected, the conventional CKF degrades significantly since the quadratic loss in the CKF is sensitive to outliers. Overall, the proposed DG-MCL-CKF and LG-MCL-CKF, which perform similarly, have the smallest TRMSEs among all robust solutions, and the RMCC-CKF has the largest. The inferior performance of the RMCC-CKF is due primarily to the linearization error in the linear regression procedure. The Huber-CKF performs comparably to the MCC-CKF, whose performance is significantly influenced by the kernel parameters, as illustrated by the fact that the MCC-CKF1 outperforms the MCC-CKF2. Similar conclusions can be drawn from Fig. 4, which presents the RMSE of the two state components for the different algorithms in the scenario where $\phi=0.3$ and $\varphi=200$.

We have further studied how the parameter $\alpha$ influences the performance of the proposed method. Table 1 presents the TRMSEs of $x_{1}$ and $x_{2}$ in the two selected scenarios with different $\alpha$. We only show the data of the DG-MCL-CKF and omit that of the LG-MCL-CKF because the results are similar. The DG-MCL-CKF reduces to the MCC-CKF1 when $\alpha=1$ and to the MCC-CKF2 when $\alpha=0$. From the results one can observe that the DG-MCL-CKF with $0<\alpha<1$ outperforms both the MCC-CKF1 and MCC-CKF2, which indicates that the mixture correntropy is superior to the conventional correntropy. This suggests a heuristic for designing a correntropy based robust Kalman filtering algorithm: use the mixture correntropy with one larger kernel parameter and one relatively small one in place of the original correntropy, thereby skipping the kernel parameter selection step. The optimal value of $\alpha$, however, still needs further investigation.

4.2 SoC estimation in batteries

Owing to its high power density, low cost and long cycle life, the lithium-ion battery is widely employed in numerous applications such as electric vehicles. SoC, the amount of charge remaining in a battery, is a crucial monitored parameter in these applications. Unfortunately, SoC is in general not physically measurable. A considerable amount of effort has been devoted to providing an accurate estimate of SoC. One common solution is based on the Kalman filter, in which the evolution of SoC over time is modeled by a nonlinear state-space model (SSM) according to the equivalent circuit of a battery.

Here we consider an equivalent circuit of the lithium-ion battery [29], which is shown in Fig. 5. The associated nonlinear system is given by

[TABLE]

where $a$ is SoC; $b$ is the voltage of the RC circuit; $c$ is the hysteresis voltage; $I$ is the discharging current; $y$ is the measurement of the terminal voltage; $\beta$, $R_{d}$, $C_{d}$, and $R_{s}$ are parameters of the lithium-ion battery. Clearly, the measurement is related to SoC in a complicated way, hence outlier-contaminated measurements may degrade the estimation accuracy of SoC. We here apply the proposed robust filters to reduce the negative effect of outliers.

In the simulation, denote $\bm{x}=[a,b,c]^{T}$ and discretize (34) by the Euler method to construct the SSM so that the KF can be applied to estimate SoC. We set $\beta=5.634^{-5}$ , $R_{d}=3^{-3}$ ${\Omega}$ , $C_{d}=9^{3}$ F, $R_{s}=5^{-3}$ ${\Omega}$ , $\gamma=2.47^{-3}$ . The process noise obeys $\mathcal{N}(0,10^{-6}\bm{I}_{3})$ , and the measurement noise is from the following Gaussian mixture noise

[TABLE]

where $\bm{R}=10^{-2}$. The true value of the initial state is $\bm{x}_{0}=[1,0,0]^{T}$. All filters are initialized by $\mathcal{N}(\hat{\bm{x}}_{0|0},\bm{P}_{0|0})$ where $\hat{\bm{x}}_{0|0}=[0.95,0.1,0.001]^{T}$ and $\bm{P}_{0|0}=0.05\bm{I}_{3}$.

The TRMSE and RMSE of SoC, which are based on 100 independent Monte Carlo runs, are used as metrics to illustrate the performance of the different filters. The results for the different filters are presented in Fig. 6 and Fig. 7. The TRMSEs of SoC when $\lambda=0.2$ and $\kappa$ varies are shown in Fig. 6a. Among the robust filters, the proposed MCL based solutions, which perform similarly, have the lowest TRMSE for all $\kappa$. It can also be seen that the TRMSEs of all filters increase slightly when $\kappa$ is small, while they fluctuate dramatically for larger $\kappa$. Fig. 6b shows the SoC TRMSEs as $\lambda$ changes. The performance of all robust filters degrades as $\lambda$ increases. Again, in these scenarios, our methods outperform the other robust filters.

The RMSEs of SoC for the compared robust solutions under a certain scenario are presented in Fig. 7. Although all filters have converged over time, the convergence speed of our methods is faster than others. It is noted that the convergence values of all robust filters are similar, which are about 30% smaller than that of the conventional CKF.

5 Conclusion

In this paper, we have investigated outlier-robust Kalman filters based on mixture correntropy for a nonlinear system involving the heavy-tailed measurement noise. Two mixture correntropy induced losses are employed to replace the quadratic loss for the measurement fitting error in the conventional Kalman filtering framework. The resulting robust Kalman filtering problems are then iteratively solved by the conventional CKF with a reweighted covariance matrix of the measurement noise. It can be noted from the simulation results that the proposed algorithms can outperform the existing MCC based solutions.

In the current work, we only consider two kinds of Mercer kernels, i.e., the Gaussian kernel and the Laplace kernel. We do not take other kernels such as the Student’s t kernel into account, which can be addressed in future work. In addition, mixture correntropy based on the multi-kernel method is expected to be another research direction.

Appendix A Cubature Kalman Filter [10]

For the state-space model described in (5) and (6) with the Gaussian process and measurement noises, the CKF is implemented as follows:

1. Initialize the state $\bm{x}_{0}\sim\mathcal{N}(\hat{\bm{x}}_{0|0},\bm{P}_{0|0})$ and generate the basic weighted cubature point set $\{\bm{\xi}_{i},\eta_{i}\}$ for $i=1,\cdots,2n$, where $n$ is the dimension of the state, $\bm{\xi}_{i}=\sqrt{n}[\bm{I}]_{i}$, $[\bm{I}]=[\bm{I}_{n},-\bm{I}_{n}]$ and $\eta_{i}=1/(2n)$.

2. Generate the sigma points related to the distribution $\mathcal{N}(\hat{\bm{x}}_{t-1|t-1},\bm{P}_{t-1|t-1})$

[TABLE]

3. Calculate the predicted state and its associated error covariance

[TABLE]

4. Generate the sigma points for the predicted distribution $\mathcal{N}(\hat{\bm{x}}_{t|t-1},\bm{P}_{t|t-1})$

[TABLE]

5. Calculate the predicted measurement, predicted measurement covariance and state-measurement covariance

[TABLE]

6. Obtain the filtered state and its associated error covariance

[TABLE]
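Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration of the standard CKF time update, assuming additive process noise and a Cholesky factor as the covariance square root; the function names are hypothetical and the code is not taken from the paper.

```python
import numpy as np


def cubature_points(x, P):
    """Steps 1-2: map the basic points sqrt(n)*[I, -I] to N(x, P)."""
    n = x.size
    S = np.linalg.cholesky(P)  # P = S @ S.T
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])  # basic point set
    return x[:, None] + S @ xi  # one point per column, 2n columns


def ckf_predict(x, P, f, Q):
    """Step 3: predicted state and covariance with equal weights 1/(2n)."""
    X = cubature_points(x, P)
    X_prop = np.column_stack([f(X[:, i]) for i in range(X.shape[1])])
    x_pred = X_prop.mean(axis=1)
    dX = X_prop - x_pred[:, None]
    P_pred = dX @ dX.T / X.shape[1] + Q
    return x_pred, P_pred


# For a linear transition the cubature rule is exact, which gives a quick check.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
x = np.array([1.0, 2.0])
P = np.array([[2.0, 0.3], [0.3, 1.0]])
Q = 0.01 * np.eye(2)
x_pred, P_pred = ckf_predict(x, P, lambda s: A @ s, Q)
```

For the linear map used in the check, the result coincides with the Kalman filter prediction $A\bm{x}$ and $A\bm{P}A^{T}+\bm{Q}$; for a nonlinear $f$ the same code gives the third-degree cubature approximation.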

Bibliography

[1] M. S. Grewal, A. P. Andrews, Applications of Kalman filtering in aerospace 1960 to the present [historical perspectives], IEEE Control Systems Magazine 30 (3) (2010) 69–78.
[2] F. Auger, M. Hilairet, J. M. Guerrero, E. Monmasson, T. Orlowska-Kowalska, S. Katsura, Industrial applications of the Kalman filter: A review, IEEE Transactions on Industrial Electronics 60 (12) (2013) 5458–5471.
[3] P. Lu, L. Van Eykeren, E. Van Kampen, C. De Visser, Q. Chu, Adaptive three-step Kalman filter for air data sensor fault detection and diagnosis, Journal of Guidance, Control, and Dynamics (2015) 590–604.
[4] H. Wang, H. Yu, M. Hoy, J. Dauwels, H. Wang, Variational Bayesian dynamic compressive sensing, in: 2016 IEEE International Symposium on Information Theory (ISIT), IEEE, 2016, pp. 1421–1425.
[5] T.-S. Lou, L. Wang, H. Su, M.-W. Nie, N. Yang, Y. Wang, Desensitized cubature Kalman filter with uncertain parameters, Journal of the Franklin Institute 354 (18) (2017) 8358–8373.
[6] H. A. Hashim, L. J. Brown, K. Mc Isaac, Nonlinear stochastic attitude filters on the special orthogonal group 3: Ito and Stratonovich, IEEE Transactions on Systems, Man, and Cybernetics: Systems.
[7] H. A. Hashim, L. J. Brown, K. Mc Isaac, Nonlinear stochastic position and attitude filter on the special Euclidean group 3, Journal of the Franklin Institute 356 (7) (2019) 4144–4173.
[8] R. E. Kalman, A new approach to linear filtering and prediction problems, Journal of Basic Engineering 82 (1) (1960) 35–45.