Large-scale distributed Kalman filtering via an optimization approach

Mathias Hudoba de Badyn; Mehran Mesbahi

arXiv:1704.03125·math.OC·April 12, 2017

Large-scale distributed Kalman filtering via an optimization approach

Mathias Hudoba de Badyn, Mehran Mesbahi

PDF

TL;DR

This paper introduces a novel distributed Kalman filtering method that employs an optimization approach, specifically gradient descent, to efficiently estimate the error covariance matrix in large-scale sensor networks.

Contribution

It extends existing distributed Kalman filtering techniques by integrating an optimization-based update for the error covariance, reducing computational complexity in high-dimensional systems.

Findings

01

The proposed method effectively reduces computational costs.

02

The filter maintains accuracy comparable to traditional approaches.

03

Applications demonstrate improved scalability in sensor networks.

Abstract

Large-scale distributed systems such as sensor networks, often need to achieve filtering and consensus on an estimated parameter from high-dimensional measurements. Running a Kalman filter on every node in such a network is computationally intensive; in particular the matrix inversion in the Kalman gain update step is expensive. In this paper, we extend previous results in distributed Kalman filtering and large-scale machine learning to propose a gradient descent step for updating an estimate of the error covariance matrix; this is then embedded and analyzed in the context of distributed Kalman filtering. We provide properties of the resulting filters, in addition to a number of applications throughout the paper.

Figures5

Click any figure to enlarge with its caption.

Equations92

x_{k + 1} \tilde{y}_{k} = A x_{k} + B u_{k} + Υ_{k} w_{k} = C_{k} x_{k} + v_{k},

x_{k + 1} \tilde{y}_{k} = A x_{k} + B u_{k} + Υ_{k} w_{k} = C_{k} x_{k} + v_{k},

\overset{x}{^}_{k + 1}

\overset{x}{^}_{k + 1}

K_{k}

P_{k + 1}

β_{k + 1} = β_{k} - μ \nabla f (β_{k}),

β_{k + 1} = β_{k} - μ \nabla f (β_{k}),

β_{k + 1}^{(i)} = β_{k}^{(i)} - \frac{1}{2} μ \frac{\partial ( δ _{k}^{T} δ _{k} )}{\partial β _{k}^{(i)}} .

β_{k + 1}^{(i)} = β_{k}^{(i)} - \frac{1}{2} μ \frac{\partial ( δ _{k}^{T} δ _{k} )}{\partial β _{k}^{(i)}} .

β_{k + 1}^{(i)}

β_{k + 1}^{(i)}

h_{k + 1}^{(i)}

\hat{P}_{k}^{+} = diag (exp (β_{k + 1}^{(i)}))

\hat{P}_{k}^{+} = diag (exp (β_{k + 1}^{(i)}))

β_{k + 1}^{(i)}

β_{k + 1}^{(i)}

= β_{k}^{(i)} + μ δ_{k}^{T} C^{[i]} h_{k}^{(i)} (K_{k - 1} (β_{k}^{(i)})) .

β_{k + 1}^{(i)}

β_{k + 1}^{(i)}

= α_{k}^{(i)} + μ δ_{k}^{T} C^{[i]} h_{k}^{(i)} (K_{k - 1} (α_{k}^{(i)}))

α_{k}^{(i)} = β_{k}^{(i)} + \frac{k - 1}{k + 2} (β_{k}^{(i)} - β_{k - 1}^{(i)}) .

α_{k}^{(i)} = β_{k}^{(i)} + \frac{k - 1}{k + 2} (β_{k}^{(i)} - β_{k - 1}^{(i)}) .

\hat{P}

\hat{P}

β_{k + 1}^{(i)}

β_{k + 1}^{(i)}

β_{k + 1}^{(i)} = β_{k}^{(i)} - \frac{1}{2} μ \nabla f (β_{k}^{(i)}) .

β_{k + 1}^{(i)} = β_{k}^{(i)} - \frac{1}{2} μ \nabla f (β_{k}^{(i)}) .

\frac{1}{2} μ_{k} = ar g λ min ∥ Δ β - λ Δ g (β) ∥ .

\frac{1}{2} μ_{k} = ar g λ min ∥ Δ β - λ Δ g (β) ∥ .

μ_{k} = 2 \frac{Δ g ( β ) ^{T} Δ β}{Δ g ( β ) ^{T} Δ g ( β )}, Δ β = β_{k} - α_{k - 1} .

μ_{k} = 2 \frac{Δ g ( β ) ^{T} Δ β}{Δ g ( β ) ^{T} Δ g ( β )}, Δ β = β_{k} - α_{k - 1} .

\tilde{x}_{k} = x_{k} - \overset{x}{^}_{k}^{-}

\tilde{x}_{k} = x_{k} - \overset{x}{^}_{k}^{-}

\tilde{x}_{k + 1}

\tilde{x}_{k + 1}

= (A - L C) \overset{x}{^}_{k}^{-} + (w_{k} - L v_{k}),

L = A \hat{P} C^{T} (C \hat{P} C^{T} + R)^{- 1} .

L = A \hat{P} C^{T} (C \hat{P} C^{T} + R)^{- 1} .

A - L C

A - L C

N^{(ii)} = ⎩ ⎨ ⎧ \frac{e ^{β_{k}^{(i)}} ( C ^{(ii)} ) ^{2}}{e ^{β_{k}^{(i)}} ( C ^{(ii)} ) ^{2} + R ^{(ii)}} 0 if 1 \leq i \leq p if p < i \leq n .

N^{(ii)} = ⎩ ⎨ ⎧ \frac{e ^{β_{k}^{(i)}} ( C ^{(ii)} ) ^{2}}{e ^{β_{k}^{(i)}} ( C ^{(ii)} ) ^{2} + R ^{(ii)}} 0 if 1 \leq i \leq p if p < i \leq n .

∣ λ (L) ∣

∣ λ (L) ∣

\leq ∣ λ_{m a x} (A) ∣∣ λ_{m a x} (I - N) ∣

\leq ∣ λ_{m a x} (A) ∣,

\displaystyle A=\left[\begin{array}[]{cc}1.001&0.011\\ -0.0301&0.98\end{array}\right],B=\left[\begin{array}[]{c}5\mathrm{E}-5\\ 1\mathrm{E}-2\end{array}\right],C=I

\displaystyle A=\left[\begin{array}[]{cc}1.001&0.011\\ -0.0301&0.98\end{array}\right],B=\left[\begin{array}[]{c}5\mathrm{E}-5\\ 1\mathrm{E}-2\end{array}\right],C=I

\dot{\overset{x}{^}}_{i} = j \in N_{i} \sum (\overset{x}{^}_{j} - \overset{x}{^}_{i}) .

\dot{\overset{x}{^}}_{i} = j \in N_{i} \sum (\overset{x}{^}_{j} - \overset{x}{^}_{i}) .

\overset{u}{˙} = (α \frac{\partial ^{2} u}{\partial x} + β \frac{\partial ^{2} u}{\partial y}) .

\overset{u}{˙} = (α \frac{\partial ^{2} u}{\partial x} + β \frac{\partial ^{2} u}{\partial y}) .

\dot{u} = \frac{2}{Δ x ^{2}} [I_{n} \otimes (α D) + (β D) \otimes I_{n}] u, u = vec (u_{ij}^{k})

\dot{u} = \frac{2}{Δ x ^{2}} [I_{n} \otimes (α D) + (β D) \otimes I_{n}] u, u = vec (u_{ij}^{k})

β_{k + 1}^{(i)} = β_{k}^{(i)} - \frac{1}{2} μ \frac{\partial ( δ _{k}^{T} δ _{k} )}{\partial β _{k}^{(i)}},

β_{k + 1}^{(i)} = β_{k}^{(i)} - \frac{1}{2} μ \frac{\partial ( δ _{k}^{T} δ _{k} )}{\partial β _{k}^{(i)}},

\frac{\partial ( δ _{k}^{T} δ _{k} )}{\partial x ^ _{k}^{(i)}}

\frac{\partial ( δ _{k}^{T} δ _{k} )}{\partial x ^ _{k}^{(i)}}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Large-Scale Distributed Kalman Filtering via an Optimization Approach

Mathias Hudoba de Badyn∗ Mehran Mesbahi∗

∗William E. Boeing Department of Aeronautics and Astronautics, University of Washington, Seattle, WA 98195 USA

(e-mails: {hudomath,mesbahi}@uw.edu).

Abstract

Large-scale distributed systems such as sensor networks, often need to achieve filtering and consensus on an estimated parameter from high-dimensional measurements. Running a Kalman filter on every node in such a network is computationally intensive; in particular the matrix inversion in the Kalman gain update step is expensive. In this paper, we extend previous results in distributed Kalman filtering and large-scale machine learning to propose a gradient descent step for updating an estimate of the error covariance matrix; this is then embedded and analyzed in the context of distributed Kalman filtering. We provide properties of the resulting filters, in addition to a number of applications throughout the paper.

keywords:

Machine learning, fast Kalman algorithms, state estimation, gradient methods

††thanks: This research was supported by the U.S. Army Research Laboratory and the U.S. Army Research Office under contract number W911NF-13-1-0340 and AFOSR grant FA9550-16-1-0022.

1 Introduction

The Kalman filter is an algorithm that uses the known dynamics of a system to remove noise from measurements of that system. When considering large-scale dynamical systems, implementation of the standard or extended Kalman filters can be computationally difficult. In such cases, the Kalman filter requires the inversion of very large matrices at each timestep. This may cause the Kalman filter to run slower than the dynamical process it is trying to measure, or to severely reduce the temporal resolution of the measurements.

There are many systems for which measurements are taken by a network of sensors and are of high dimension; examples of such systems can be found in Khan and Moura (2008) and Kutz (2013). Previous methods for circumventing this problem include decomposing the dynamical system being measured into several subsystems and distributing the subsystems over the sensor network as in Khan and Moura (2008), or using Monte-Carlo methods for estimating the error covariance, such as in Furrer and Bengtsson (2007).

Previously, Sutton (1992) proposed to modify the Kalman filter error covariance update with a gradient descent method for the purpose of mimimizing memory consumption, albeit for a specific instantiation of a SISO linear system. In this paper, we extend the gradient descent algorithm for estimating the Kalman filter error covariance to the general MIMO linear system as a proposed solution to the problem of running a Kalman filter on a high-dimensional system. We improve the gradient descent using Nesterov acceleration and adaptive learning rate methods. Lastly, we apply the methods above to distributed Kalman filtering on a sensor network.

Distributed Kalman filtering seeks to estimate the state of a system by distributing the tasks of measuring the system and subsequently filtering the data to many agents, who then collectively assemble the state estimate. Such algorithms utilizing consensus to provide a global estimate were presented by Olfati-Saber (2005) and Olfati-Saber (2007), and later extended by Carli et al. (2008). Performance of distributed Kalman filters using graph-theoretic quantities were studied by Spanos et al. (2005).

The organization of the paper is as follows. In §2, we outline the mathematical notation and conventions we use. We discuss the gradient descent algorithm, Nesterov acceleration and adaptive learning rate methods in §3, and the distributed version of these algorithms with a relevant example in §4. The paper is summarized in §5.

2 MATHEMATICAL PRELIMINARIES

In this section, we lay out the mathematical notation used in this paper, and summarize the essential background on Kalman filtering.

Consider the noisy discrete-time controlled linear system

[TABLE]

where $x_{k}\in\mathbb{R}^{n}$ denotes the state vector, $u_{k}\in\mathbb{R}^{m}$ denotes the control vector, $\tilde{y}_{k}\in\mathbb{R}^{p}$ is the output vector or measurement, and $w_{k},v_{k}$ are Gaussian white noise vectors of appropriate dimensions with covariances $Q_{k}$ and $R_{k}$ respectively. The subscript $k$ refers to the timestep. The random vectors $w_{k}$ and $v_{k}$ represent system disturbances and sensor noise respectively, and the matrix $\Upsilon_{k}$ describes how the disturbance propagates into the system. Since most matrix quantities will have a subscript $k$ denoting the timestep, we will refer to the $ij$ th entry of a matrix $A_{k}$ with bracketed superscripts: $A_{k}^{(ij)}$ . Similarly, the $i$ th entry of a vector $a_{k}$ is denoted $a_{k}^{(i)}$ , the $i$ th column of a matrix $A_{k}$ is denoted $A_{k}^{[i]}$ and the $i$ th row is denoted $A_{k}^{\{i\}}$ .

A network is represented by a graph $\mathcal{G}=(V,E)$ where $V$ is a set of nodes representing agents in the network, and $E$ is the set of edges $\{i,j\}$ representing the connections between agents $i$ and $j$ . The neighbourhood set $N_{i}$ of node $i$ is the set of indices $j$ such that $\{i,j\}$ is an edge. We assume for simplicity that the graph is undirected, meaning that $\{i,j\}$ and $\{j,i\}$ represent the same edge.

The purpose of Kalman filtering is to provide an accurate estimate $\hat{x}_{k}$ of the state $x_{k}$ using the measurement $\tilde{y}_{k}$ and the known information about the system and noise. The standard discrete-time linear Kalman filter yields the estimate $\hat{x}_{k+1}$ and is given by

[TABLE]

where $K_{k}$ is the Kalman gain and $P_{k}$ is the error covariance matrix. The problem for high-dimensional systems is the inverse when computing $K_{k}$ . In this paper, we assume that the sensor noise is uncorrelated, and so $R_{k}$ is a diagonal positive-definite matrix. This leaves the term $CP_{k}C^{T}$ ; if this term is diagonal, then the matrix inverse becomes a series of $n$ scalar divisions. We propose to replace $P_{k}$ with a diagonal estimate $\hat{P}_{k}$ , and one can also assume that $C$ is a matrix that measures individual states of the system without redundancy. Therefore, a system with $1\leq p\leq n$ outputs will have a measurement matrix of the form $C=[\mathrm{diag}(C^{(ii)})I_{p}~{}\mathbf{0}_{n-p}]$ . This assumption is common for networked systems where one measures the state of individual nodes of the network, see for example Chapman and Mesbahi (2015). In the next few sections, we will formalize these assertions in the context of our algorithm.

3 GENERAL KALMAN FILTERING

In this section, we extend the gradient descent estimate for the error covariance proposed by Sutton (1992). In §3.1, we discuss the linear Kalman filter, and then extend the error covariance estimate to an accelerated gradient descent in §3.2. The accelerated gradient descent benefits from a clever adaptive learning rate, which is discussed in §3.3. The final algorithm combining all these methods is summarized in Algorithm 2.

3.1 Gradient Descent for the Error Covariance Update: Linear Case

Gradient descent is an iterative algorithm that seeks to find the local minimum of a function $f$ by stepping in the direction of the largest gradient:

[TABLE]

where $\mu$ is the learning rate. For the Kalman filter, we seek to find a diagonal matrix estimate $\hat{P}_{k}$ of $P_{k}$ . We do this by assuming $\hat{P}_{k}$ is of the form $\hat{P}_{k}=\mathrm{diag}(\hat{P}_{k}^{(ii)})=\mathrm{diag}(e^{\beta_{k}^{i}})$ , where $\beta_{k}$ is a parameter undergoing gradient descent attempting to minimize the norm of the error $\delta_{k}=\tilde{y}_{k}-C\hat{x}_{k}$ :

[TABLE]

Computing the gradient of $\delta_{k}^{T}\delta_{k}$ as outlined in Appendix A yields the following gradient descent equations for $\hat{P}$ :

[TABLE]

The form of $\hat{P}^{+}$ guarantees that it remains positive-definite, which is required to preserve convergence of the Kalman filter.

3.2 Nesterov-Accelerated Methods

Nexterov acceleration is a method used to increase the convergence rate of gradient descent [Nesterov (1983)]. Although Nesterov-accelerated gradient descent converges in fewer timesteps, it does so by sacrificing monotonicity: the gradient descent trajectory will tend to oscillate as it converges towards the estimate.

In order to implement Nesterov acceleration, one must see that the quantity $h_{k}^{(i)}$ is a function of $K_{k-1}$ , which in turn is a function of $\beta^{(i)}_{k}$ :

[TABLE]

Hence, we can get the form of the Nesterov-accelerated gradient descent by introducing the quantity $\alpha_{k}$ :

[TABLE]

where $\alpha_{k}^{(i)}$ is given by

[TABLE]

Therefore, the Nesterov-accelerated estimate $\hat{P}$ replaces the covariance update with the following equations:

[TABLE]

where $h_{k}$ is as in Equation (1), but using this new expression for $k_{k}^{(i)}$ in Equation (2).

3.3 Adaptive Learning Rate Methods

Using a constant learning rate $\mu$ can lead to suboptimal performance, and therefore it is prudent to adapt $\mu$ on-the-fly as the gradient descent is run. A well-known method for adapting the learning rate is given by Barzilai and Borwein Barzilai and Borwein (1988). Suppose we have the gradient descent with learning rate $\mu$ :

[TABLE]

Let $\Delta\beta=\beta_{k}-\beta_{k+1}$ and let $\Delta g(\beta)=\nabla f\left(\beta_{k+1}\right)-\nabla f\left(\beta_{k}\right)$ . Then, the Barzilai and Borwein method selects the learning rate parameter using a secant line approximation:

[TABLE]

For accelerated gradient descent, the updated learning rate $\mu$ is given by

[TABLE]

Combining the accelerated gradient descent estimate of the error covariance with the adaptive learning rate $\mu_{k}$ yields the final algorithm as outlined in Algorithm 1.

3.4 Filter Properties

In the standard Kalman filter, $P_{k}$ is computed from an algebraic Ricatti equation. In order for this equation to have a positive-definite solution, a requirement for stable error dynamics, one must assume that $(A^{T},C^{T})$ is stabilizable and $(A,\sqrt{Q})$ is detectable (see Dorato et al. (1994)). Therefore, since our estimate of $P$ is positive definite by construction and not computed by an algebraic Ricatti equation, we do not need to have these assumptions. The next theorem shows that all one needs for stability of the error dynamics is for the system matrix $A$ to be stable.

Theorem 1

Suppose $A$ is an asymptotically stable matrix, and so the eigenvalues of $A$ satisfy $|\lambda(A)|\leq 1$ . Then, the error dynamics

[TABLE]

are stable, where $\hat{x}_{k}^{-}$ is the estimate computed according to Algorithm 1.

{pf}

We can write the error dynamics as

[TABLE]

where $L$ is given by

[TABLE]

Hence, it suffices to show that $(A-LC)$ has eigenvalues in the unit disc. We can write $(A-LC)$ as

[TABLE]

where $N=\hat{P}C^{T}(C\hat{P}C^{T}+R)^{-1}C$ . By the form of $C$ and $\hat{P}$ , if the system has $p$ outputs, then $N$ is a diagonal matrix with the form

[TABLE]

It is clear that $0\leq N^{(ii)}<1$ since $R$ is a positive definite diagonal matrix. Therefore, $0<(I-N)^{(ii)}\leq 1$ , and so it follows that

[TABLE]

and so by the asymptotic stability of $A$ , the asymptotic stability of $(A-LC)$ follows.

3.5 Example

Here, we show a simple numerical implementation of Algorithm 1 to see how the algorithm performs compared to the standard Kalman filter for different strengths of noise. Consider the linear system with matrices

[TABLE]

propagating with process noise of covariance $Q=0.001$ and measurement noise of covariance $R=\alpha I$ for various values of $\alpha$ . The first and second plots of Figure 1 show the error and first state of the system for $R=0.115I$ for the raw measurement data, and the filtered data using the accelerated gradient descent and the standard Kalman filter. The third plot shows the average steady state error computed on the time interval $t\in[6.6,10]$ for various values of $\alpha$ indicating the covariance of the measurement error. Figure 1 shows that the filter in Algorithm 1 achieves similar performance as the standard Kalman filter.

4 DISTRIBUTED KALMAN FILTERING

Olfati-Saber (2007) describes several methods of Kalman filtering using a network of sensors, with each $n$ th sensor using a unique sensor model $\tilde{y}_{n,k}=C_{n,k}x_{k}+v_{n,k}$ , to arrive at a global measurement of a system. The context of their system model is on a graph $\mathcal{G}$ , which is an abstract representation of the connection structure of sensors, which are represented by nodes. An edge in the graph denotes a communication link between sensors by which they can exchange information about their current state estimate. One such information exchange algorithm is called consensus, where each node in the graph changes its estimated value by continuously averaging with the estimates of its neighbours:

[TABLE]

One particular algorithm in Olfati-Saber (2007) utilizes a Kalman-consensus filter in which every node of the network computes an estimate of the state of the system and then performs consensus with its neighbours on this estimate. This allows sensors that only see a portion of the system to collect a global state estimate over time.

For high-dimensional systems, implementing this filter is problematic again because each sensor has to invert a large matrix at each timestep in order to compute the Kalman filter gain. In Algorithm 2, we summarize the Kalman-consensus filter, where every node performs an accelerated gradient descent with adaptive learning rate, and shares its estimate with its neighbours. In the following section, we provide an example of such a high-dimensional system.

4.1 Example

Consider a physical phenomenon, such as weather, propagating on a planar surface according to the partial differential equation

[TABLE]

As studied in Khan and Moura (2008), and Kutz (2013), by discretizing the surface into an equally-spaced $n\times n$ grid in the $x$ and $y$ directions and the temporal dimension, thus obtaining a grid of values $u_{ij}^{k}$ , one can construct the linear system representation

[TABLE]

where $D$ is a tridiagonal matrix with $1$ ’s on the sub and superdiagonal, and $-2$ ’s on the diagonal. The discrete-time state matrix $A$ can be found using a matrix exponential, or a finite-order Taylor expansion. Periodic boundary conditions may be added by setting $D_{1n}=D_{n1}=1$ . Suppose that this phenomenon is being measured according to this discretization. We use a sensor model such that each node collects data from an $n_{l}\times n_{l}$ square portion of the grid $u_{ij}^{k}$ so that each individual grid point is measured by at least one sensor. The sensor network by the graph in Figure 2 is used, with $n_{l}=4$ for a $50\times 50$ grid, and the sensor noise is given unit covariance. Five 2D Gaussian functions are used to set the initial conditions for the simulation, which was run over a time interval of 10 seconds. The true state of the system at 1.2s is shown in Figure 3, and an aggregate measurement of the entire system is shown in Figure 4. An estimate from a single sensor at this timestep is shown in Figure 5, and one can see that there is a substantial reduction of noise.

5 Conclusion

In this paper, we extended the results of Sutton (1992) to produce a gradient descent estimate of the error covariance in a standard state estimation algorithm for linear systems. This gradient descent method was improved with Nesterov acceleration and an adaptive learning rate algorithm.

Finally, the algorithm was extended to improve the Kalman-consensus filter on a sensor network observing a distributed high-dimensional process. The diagonal estimate of $P$ removes the necessity of computing a large matrix inverse at each timestep at each sensor node, and allows for a computationally cheaper and therefore faster state estimation of the system over the entire network. The modified Kalman-consensus filter was implemented on a simple 2D diffusion model with 2500 grid points corresponding to the states of the system, and the algorithm was shown to remove a substantial amount of noise from the measurements, and to propagate the state estimate quickly across the sensor network.

Appendix A Gradient Descent for Kalman Filter Covariance Update

Sutton’s Gradient Descent for updating the error covariance matrix approximation $\hat{P}=\mathrm{diag}\left(\hat{P}_{k}^{(ii)}\right)=\mathrm{diag}\left(\exp\beta^{(i)}_{k+1}\right)$ uses gradient descent on $\beta_{k+1}^{(i)}$ :

[TABLE]

where $\delta_{k}=\tilde{y}_{k}-C\hat{x}_{k}^{-}$ is the error: To derive the exact form of the gradient descent on $\beta_{k}^{(i)}$ , consider the following terms:

[TABLE]

Next, we have

[TABLE]

We assume that for $i\neq j$ ,

[TABLE]

Next, define the following two quantities:

[TABLE]

Next, we compute the following derivatives:

[TABLE]

Now, we can write the gradient descent on $\beta^{(i)}$ :

[TABLE]

All that remains is to find an approximation to $\frac{\partial\hat{x}_{k}^{-(i)}}{\partial\beta^{(i)}}$ . We denote this approximation with $h_{k+1}^{(i)}$ :

[TABLE]

This yields the final gradient descent update:

[TABLE]

In conclusion, the Kalman filter with gradient descent error covariance update is given by the equations

[TABLE]

Bibliography12

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Barzilai and Borwein (1988) Barzilai, J. and Borwein, J.M. (1988). Two-point step size gradient methods.
2Carli et al. (2008) Carli, R., Chiuso, A., Schenato, L., and Zampieri, S. (2008). Distributed Kalman filtering based on consensus strategies. IEEE Journal on Selected Areas in Communications , 26(4), 622–633.
3Chapman and Mesbahi (2015) Chapman, A. and Mesbahi, M. (2015). State controllability, output controllability and stabilizability of networks : A symmetry perspective. In Proc. 54th IEEE Conference on Decision and Control , 4776–4781. Osaka, Japan.
4Dorato et al. (1994) Dorato, P., Cerone, V., and Abdallah, C. (1994). Linear-Quadratic Control: An Introduction . Simon & Schuster.
5Furrer and Bengtsson (2007) Furrer, R. and Bengtsson, T. (2007). Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. Journal of Multivariate Analysis , 98(2), 227–255.
6Khan and Moura (2008) Khan, U.A. and Moura, J.M.F. (2008). Distributing the Kalman filter for large-scale systems. IEEE Transactions on Signal Processing , 56(10), 4919–4935.
7Kutz (2013) Kutz, J.N. (2013). Data-driven modeling & scientific computation: Methods for complex systems & big data . Oxford University Press.
8Nesterov (1983) Nesterov, Y. (1983). A method of solving a convex programming problem with convergence rate O (1/k^2). In Soviet Mathematics Doklady , volume 27, 372–376.