Performance analysis of local ensemble Kalman filter

Xin T. Tong

arXiv:1705.10598·math.PR·April 4, 2018·J. Nonlinear Sci.


TL;DR

This paper provides a rigorous analysis of the local ensemble Kalman filter (LEnKF) for linear systems, establishing conditions for error control and revealing an intrinsic inconsistency due to localization, supported by numerical validation.

Contribution

It offers the first rigorous theoretical analysis of LEnKF error behavior for linear systems with localized structures and sparse observations.

Findings

1. Filter error dominated by ensemble covariance under certain conditions

2. Stable localized structure is necessary for controlling localization inconsistency

3. Numerical validation confirms theoretical predictions

Abstract

The ensemble Kalman filter (EnKF) is an important data assimilation method for high dimensional geophysical systems. Efficient implementation of EnKF in practice often involves the localization technique, which updates each component using only information within a local radius. This paper rigorously analyzes the local EnKF (LEnKF) for linear systems, and shows that the filter error can be dominated by the ensemble covariance, as long as 1) the sample size exceeds the logarithm of the state dimension times a constant that depends only on the local radius; 2) the forecast covariance matrix admits a stable localized structure. In particular, this indicates that with small system and observation noises, the filter will be accurate in the long run even if the initialization is not. The analysis also reveals an intrinsic inconsistency caused by the localization technique, and a stable localized…

Equations

$$[A\circ D]_{i,j}=[A]_{i,j}[D]_{i,j}.$$

$$X_{n+1}=A_{n}X_{n}+b_{n}+\xi_{n+1},\quad \xi_{n+1}\sim\mathcal{N}(0,\Sigma_{n}),\qquad Y_{n+1}=HX_{n+1}+\zeta_{n+1},\quad \zeta_{n+1}\sim\mathcal{N}(0,\sigma_{o}^{2}I_{q}).$$

$$\|A_{n}\|\leq M_{A},\quad m_{\Sigma}I_{d}\preceq\Sigma_{n}\preceq M_{\Sigma}I_{d}.$$

$$[H]_{k,j}=\mathds{1}_{j=o_{k}},\quad 1\leq k\leq q,\quad 1\leq j\leq d.$$

$$\widehat{X}^{(k)}_{n+1}=A_{n}X^{(k)}_{n}+b_{n}+\xi^{(k)}_{n+1},\quad \xi^{(k)}_{n+1}\sim\mathcal{N}(0,\Sigma_{n}).$$

$$\overline{\widehat{X}}_{n+1}=\frac{1}{K}\sum_{k=1}^{K}\widehat{X}^{(k)}_{n+1},\quad \Delta\widehat{X}^{(k)}_{n+1}:=\widehat{X}^{(k)}_{n+1}-\overline{\widehat{X}}_{n+1},\quad \widehat{C}_{n+1}=\frac{1}{K}\sum_{k=1}^{K}\Delta\widehat{X}^{(k)}_{n+1}\otimes\Delta\widehat{X}^{(k)}_{n+1}.$$

$$X^{(k)}_{n+1}=(I-\widetilde{K}_{n+1}H)\widehat{X}^{(k)}_{n+1}+\widetilde{K}_{n+1}Y_{n+1}-\widetilde{K}_{n+1}\zeta^{(k)}_{n+1}.$$

$$[\widehat{C}_{n}]_{i,j}\propto\phi(\mathbf{d}(i,j)).$$

$$[\mathbf{D}_{l}]_{i,j}=\phi(\mathbf{d}(i,j)).$$

$$\phi(x)=\Big(1+\frac{x}{c_{l}}\Big)\exp\Big(-\frac{x}{c_{l}}\Big)\mathds{1}_{x\leq l},$$

$$[\mathbf{D}_{cut}^{l}]_{i,j}=\mathds{1}_{\mathbf{d}(i,j)\leq l}.$$

$$l:=\inf\{x\geq 0:[A]_{i,j}=0\text{ if }\mathbf{d}(i,j)>x\}.$$

$$\mathcal{B}_{l}=\max_{i}\#\{j:\mathbf{d}(i,j)\leq l\}.$$

$$K^{i}_{n+1}=\widehat{C}^{i}_{n+1}H^{T}(\sigma_{o}^{2}I_{q}+H\widehat{C}^{i}_{n+1}H^{T})^{-1},$$

$$\widehat{K}_{n+1}=\sum_{i=1}^{d}\mathbf{e}_{i}\mathbf{e}_{i}^{T}K^{i}_{n+1}.$$

$$\overline{\widehat{X}}_{n+1}=A_{n}\overline{X}_{n}+b_{n},\quad \Delta\widehat{X}^{(k)}_{n+1}=\sqrt{r}\,(A_{n}\Delta X^{(k)}_{n}+\xi^{(k)}_{n+1}),\quad \xi^{(k)}_{n+1}\sim\mathcal{N}(0,\Sigma_{n}),\quad \widehat{C}_{n+1}=\frac{1}{K}\sum_{k=1}^{K}\Delta\widehat{X}^{(k)}_{n+1}\otimes\Delta\widehat{X}^{(k)}_{n+1},\quad \overline{X}_{n+1}=(I-\widehat{K}_{n+1}H)\overline{\widehat{X}}_{n+1}+\widehat{K}_{n+1}Y_{n+1},\quad \Delta X^{(k)}_{n+1}=(I-\widehat{K}_{n+1}H)\Delta\widehat{X}^{(k)}_{n+1}+\widehat{K}_{n+1}\zeta^{(k)}_{n+1},\quad \zeta^{(k)}_{n+1}\sim\mathcal{N}(0,\sigma_{o}^{2}I_{q}).$$

$$C_{n+1}=\frac{1}{K}\sum_{k=1}^{K}\Delta X^{(k)}_{n+1}\otimes\Delta X^{(k)}_{n+1}.$$

$$\mathcal{F}^{S}_{n}=\sigma\{\Delta\widehat{X}^{(k)}_{0},\xi^{(k)}_{t},\zeta^{(k)}_{t-1},\ t=1,\dots,n,\ k=1,\dots,K\}.$$

$$\Delta\widehat{X}^{(k)}_{n},\ \Delta X^{(k)}_{n-1},\ \widehat{C}_{n},\ C_{n-1},\ \widehat{K}_{n}\in\mathcal{F}^{S}_{n}.$$

$$\mathcal{F}_{n}=\sigma\{X_{0},\widehat{X}^{(k)}_{0},\xi_{t},\zeta_{t-1},\xi^{(k)}_{t},\zeta^{(k)}_{t-1},\ t=1,\dots,n,\ k=1,\dots,K\}.$$

$$\mathcal{R}_{n}(\widehat{C}_{n}):=A_{n}(I-\widehat{K}_{n}H)\widehat{C}_{n}(I-\widehat{K}_{n}H)^{T}A_{n}^{T}+\sigma_{o}^{2}A_{n}\widehat{K}_{n}\widehat{K}_{n}^{T}A_{n}^{T}+\Sigma_{n}.$$

$$\Delta\widehat{X}^{(k)}_{n+1}=\sqrt{r}A_{n}(I-\widehat{K}_{n}H)\Delta\widehat{X}^{(k)}_{n}+\sqrt{r}A_{n}\widehat{K}_{n}\zeta^{(k)}_{n}+\sqrt{r}\xi^{(k)}_{n+1}.$$

$$Z=\frac{1}{K}\sum_{k=1}^{K}(a_{k}+z_{k})\otimes(a_{k}+z_{k}),\quad \Sigma_{a}=\frac{1}{K}\sum_{k=1}^{K}a_{k}\otimes a_{k}.$$

$$\sigma_{a,z}=\max_{i,j}\{[\Sigma_{z}]_{i,i},\ [\Sigma_{a}]_{i,i}^{1/2}[\Sigma_{z}]_{j,j}^{1/2}\}.$$

$$\mathbb{P}(\|(Z-\mathbb{E}Z)\circ\mathbf{D}_{L}\|\geq\|\mathbf{D}_{L}\|_{1}\sigma_{a,z}t)\leq 8\exp(2\log d-cK\min\{t,t^{2}\}).$$

$$\mathbb{P}(\|Z-\mathbb{E}Z\|_{\infty}\geq\sigma_{a,z}t)\leq 8\exp(2\log d-cK\min\{t,t^{2}\}).$$

$$a_{k}=\sqrt{r}A_{n}(I-\widehat{K}_{n}H)\Delta\widehat{X}^{(k)}_{n},\quad z_{k}=\sqrt{r}A_{n}\widehat{K}_{n}\zeta^{(k)}_{n}+\sqrt{r}\xi^{(k)}_{n+1},$$

$$\mathbb{E}_{\mathcal{F}^{S}_{n}}\hat{e}_{n}\otimes\hat{e}_{n}=\mathbb{E}_{S}\hat{e}_{n}\otimes\hat{e}_{n}.$$

$$e_{n}=\overline{X}_{n}-X_{n}=\overline{\widehat{X}}_{n}-\widehat{K}_{n}(H\overline{\widehat{X}}_{n}-HX_{n}-\zeta_{n})-X_{n}=(I-\widehat{K}_{n}H)\hat{e}_{n}+\widehat{K}_{n}\zeta_{n},$$

$$\hat{e}_{n+1}=\overline{\widehat{X}}_{n+1}-X_{n+1}=A_{n}(\overline{X}_{n}-X_{n})-\xi_{n+1}=A_{n}(I-\widehat{K}_{n}H)\hat{e}_{n}+A_{n}\widehat{K}_{n}\zeta_{n}-\xi_{n+1}.$$


Full text

Performance analysis of local ensemble Kalman filter

Xin T. Tong, National University of Singapore

Abstract

The ensemble Kalman filter (EnKF) is an important data assimilation method for high dimensional geophysical systems. Efficient implementation of EnKF in practice often involves the localization technique, which updates each component using only information within a local radius. This paper rigorously analyzes the local EnKF (LEnKF) for linear systems, and shows that the filter error can be dominated by the ensemble covariance, as long as 1) the sample size exceeds the logarithm of the state dimension times a constant that depends only on the local radius; 2) the forecast covariance matrix admits a stable localized structure. In particular, this indicates that with small system and observation noises, the filter will be accurate in the long run even if the initialization is not. The analysis also reveals an intrinsic inconsistency caused by the localization technique, and a stable localized structure is necessary to control this inconsistency. While this structure is usually taken for granted for the operation of LEnKF, it can also be rigorously proved for linear systems with sparse local observations and weak local interactions. These theoretical results are also validated by a numerical implementation of LEnKF on a simple stochastic turbulence model in two dynamical regimes.

1 Introduction

Data assimilation is a sequential procedure, in which observations of a dynamical system are incorporated to improve the forecasts of that system. In many of its most important geoscience and engineering applications, the main challenge comes from the high dimensionality of the system. For contemporary atmospheric models, the dimension can reach $d\sim 10^{8}$ , and the classical particle filter is no longer feasible [1, 2]. The ensemble Kalman filter (EnKF) was invented by meteorologists [3, 4, 5] to resolve this issue. By sampling the forecast uncertainty with a small ensemble, and then employing Kalman filter procedures to the empirical distribution, EnKF can often capture the major uncertainty and produce accurate predictions. The simplicity and efficiency of EnKF have made it a popular choice for weather forecasting and oil reservoir management [6, 7].

One fundamental technique employed by EnKF is localization [8, 4, 9, 10, 11]. In most geophysical applications, each component $[X]_{i}$ of the state variable $X$ holds information of one spatial location, and there is a natural distance $\mathbf{d}(i,j)$ between two components. In most physical systems, the covariance between $[X]_{i}$ and $[X]_{j}$ is formed by information propagation in space; intuitively, its strength decays with the distance $\mathbf{d}(i,j)$. In particular, when $\mathbf{d}(i,j)$ exceeds a threshold $L$, the covariance is approximately zero. This is a special sparse and localized structure that can be exploited in the EnKF operation: the forecast covariance can be artificially set to zero if $\mathbf{d}(i,j)>L$. In other words, there is no need to sample these covariance terms, and indeed sampling them leads to higher errors [4]. Such a modification significantly reduces the sampling difficulty and the required sample size. This is crucial for EnKF operation, since often only a few hundred samples can be generated in practice. Various versions of localized EnKF (LEnKF) are derived based on this principle, and there is ample numerical evidence showing that their performance is robust against the growth of dimension [4, 9, 10, 11, 12, 13, 14, 15, 16, 17]. Moreover, there is a growing interest in applying the same technique to the classical particle filters [18, 19, 20].

While there is a consensus on the importance of the localization technique for EnKF, currently there is no rigorous explanation of its success. This paper contributes to this issue by showing that in the long run, the LEnKF can reach its estimated performance for linear systems, if the ensemble size $K$ exceeds $D_{L}\log d$ , and the ensemble covariance matrix admits a stable localized structure of radius $L$ . The constant $D_{L}$ above depends on the radius $L$ but not on $d$ .

Showing that the necessary sample size has only logarithmic dependence on $d$ is our major interest. In the simpler scenario of sampling a static covariance matrix, [21] shows that the necessary sample size scales with $D_{L}\log d$. Generalizing this result to the setting of EnKF is highly nontrivial, since the target covariance matrix evolves constantly in time, and the sampling error at one time step has a nonlinear impact on future iterations. By analyzing the evolution of the filter forecast error and comparing it with the evolution of the filter covariance, we show that the filter error covariance can be dominated by the ensemble covariance with high probability. In other words, the LEnKF can reach its estimated performance. One important corollary is that if the system and observation noises are of scale $\sqrt{\epsilon}$, then the error covariance scales as $\epsilon$, which indicates that LEnKF can be accurate regardless of the initial condition. Such a property is often termed accuracy for practical filters or observers [22, 23, 24].

Interestingly, our analysis also captures an intrinsic inconsistency caused by the localization technique. Generally speaking, the localization technique can be applied to the ensemble covariance matrix, but not the ensemble. However, the Kalman update is applied to the ensemble, but not to the localized ensemble covariance matrix. As these two operations do not commute, an inconsistency emerges, which we will call the localization inconsistency. This phenomenon has been mentioned in [9, 25]. Moreover, [15] numerically examines its role with serial observation processing, and shows that it may lead to significant filter error. In correspondence to these findings, one crucial step in our analysis is showing that the localization inconsistency is controllable, if the forecast covariance matrix indeed has a localized structure.

While most applications of LEnKF assume the underlying covariance matrices are localized, rigorous justification of this assumption is sorely missing in the literature. A recent work [26] considers applying a projection to the continuous time Kalman-Bucy filter, and shows that if the projection is a small perturbation on the covariance matrix, its impact on the filter process is also small. It is shown through an example that if the filter system can be decoupled into independent local parts, a projection similar to the LEnKF localization procedure can be made. Unfortunately, in most practical problems, all spatial dimensions are coupled with local interactions, and it is very difficult to show that the localization procedure is a small perturbation.

This paper partially investigates the theoretical gaps mentioned above. We show that for linear systems with weak local interactions and sparse local observations, the localized structure is stable for the LEnKF ensemble covariance. Weak local interaction is an intuitive requirement, else fast information propagation will form strong covariances between far away locations. Sparse local observation, on the other hand, is assumed to simplify the assimilation formulas.

Roughly speaking, our main results consist of the following statements.

1. To sample a localized covariance matrix correctly, the necessary sample size scales with $D_{L}\log d$ (Theorem 2.1). This reveals the sampling advantage gained by applying the localization procedure.

2. While localization improves the sampling, it creates an inconsistency in the assimilation steps. For the LEnKF ensemble covariance to capture the filter error covariance with $D_{L}\log d$ samples, the localization inconsistency needs to be small (Theorem 2.4).

3. One way to guarantee a small localization inconsistency is to have a stable localized structure in the forecast ensemble covariance matrix (Proposition 2.3).

4. The LEnKF forecast covariance has a stable localized structure if the underlying linear system has weak interactions and sparse local observations (Theorem 2.5). So by points 2 and 3, we know that LEnKF has good forecast skills, since its ensemble covariance captures the true filter error covariance.

5. The results above scale linearly with the variance of the noises. So when applying LEnKF to a linear system with small system and observation noises, its long time performance is accurate (Theorem 2.7).

Section 2 will provide the setup of our problem, and present the precise statements of the main results. The implication of these results on the issue of localized radius is discussed in Section 2.6.

Section 3 verifies the theoretical results by implementing LEnKF on a stochastically forced dissipative advection equation [6]. One stable and one unstable dynamical regime are tested. In both, LEnKF shows robust forecast skill with only $K=10$ ensemble members, while the dimension varies between $10$ and $1000$. Moreover, the localized covariance structure and the accuracy with small noises are also verified for LEnKF in both regimes.

Section 4 investigates the covariance sampling problem of LEnKF, and proves Theorem 2.1. Section 5 analyzes the localization inconsistency and filter error evolution. It contains the proofs of Theorem 2.4 and Proposition 2.3. Section 6 studies the localized structure of linear systems with weak local interactions and sparse observations, and shows that the small noise scaling can be applied to our results. Section 7 concludes this paper and discusses some interesting extensions.

2 Main Results

2.1 Problem Setup

Since its invention, the ensemble Kalman filter (EnKF) has been modified constantly for two decades, and its formulation has become rather sophisticated today. In this subsection we briefly review some of the key modifications, in particular the localization techniques.

The following notations will be used throughout the paper. For two vectors $a$ and $b$ , $\|a\|$ denotes the $l_{2}$ norm of $a$ , $a\otimes b$ denotes the matrix $ab^{T}$ . Square bracket with subscripts indicates a component or entry of an object. So $[a]_{i}$ is the $i$ -th component of vector $a$ . In particular, we use $\mathbf{e}_{i}$ to denote the $i$ -th standard basis vector, i.e. $[\mathbf{e}_{i}]_{j}=\mathds{1}_{i=j}$ .

Given a matrix $A$, $[A]_{i,j}$ is the $(i,j)$-th entry of $A$. The $l_{2}$ operator norm is denoted by $\|A\|=\inf\{c:\|Av\|\leq c\|v\|,\forall v\}$. The $l_{\infty}$ operator norm is denoted by $\|A\|_{1}=\max_{i}\sum_{j}|[A]_{i,j}|$. The maximum absolute entry is denoted by $\|A\|_{\infty}=\max_{i,j}|[A]_{i,j}|$. We also use $I_{m}$ to denote the $m\times m$ identity matrix. Given two matrices $A$ and $D$, their Schur (Hadamard) product is defined by the entrywise product

$$[A\circ D]_{i,j}=[A]_{i,j}[D]_{i,j}.$$

For two real symmetric matrices $A$ and $B$ , $A\preceq B$ indicates that $B-A$ is positive semidefinite.
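The notation above can be made concrete with a few lines of NumPy. This is only an illustrative sketch: the function names `schur`, `row_sum_norm`, `max_entry_norm` and `loewner_leq` are hypothetical helpers introduced here, not part of the paper.

```python
import numpy as np

def schur(A, D):
    """Schur (Hadamard) product: [A o D]_{ij} = [A]_{ij} [D]_{ij}."""
    return A * D

def op_norm_2(A):
    """l2 operator norm ||A|| (largest singular value)."""
    return np.linalg.norm(A, 2)

def row_sum_norm(A):
    """The quantity written ||A||_1 in the text: max_i sum_j |A_ij|."""
    return np.abs(A).sum(axis=1).max()

def max_entry_norm(A):
    """The quantity written ||A||_inf in the text: max_{i,j} |A_ij|."""
    return np.abs(A).max()

def loewner_leq(A, B, tol=1e-12):
    """A <= B in the positive semidefinite order, i.e. B - A is PSD."""
    return np.linalg.eigvalsh(B - A).min() >= -tol
```

For a symmetric banded mask such as the cutoff matrix used later, `row_sum_norm` simply counts the largest number of nonzero entries in a row, which is why it does not grow with the dimension.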

Ensemble Kalman Filter

In this paper, we consider a linear system in $\mathbb{R}^{d}$ with partial observations,

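$$\begin{aligned}
X_{n+1}&=A_{n}X_{n}+b_{n}+\xi_{n+1},\quad &\xi_{n+1}&\sim\mathcal{N}(0,\Sigma_{n}),\\
Y_{n+1}&=HX_{n+1}+\zeta_{n+1},\quad &\zeta_{n+1}&\sim\mathcal{N}(0,\sigma_{o}^{2}I_{q}).
\end{aligned}\tag{2.1}$$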

Throughout our discussion, we assume the matrices $A_{n},\Sigma_{n}$ are bounded:

$$\|A_{n}\|\leq M_{A},\quad m_{\Sigma}I_{d}\preceq\Sigma_{n}\preceq M_{\Sigma}I_{d}.$$

The time-inhomogeneous generality can be used to model intermittent dynamical systems [6, 27]. We assume that the observations are made at $q<d$ distinct locations $\{o_{1},o_{2},\cdots,o_{q}\}\subset\{1,\cdots,d\}$ . This can be modelled by letting

$$[H]_{k,j}=\mathds{1}_{j=o_{k}},\quad 1\leq k\leq q,\quad 1\leq j\leq d.$$

Note that the operator norm $\|H\|=1$ .

It is well known that the optimal estimate of $X_{n}$ given historical observations $Y_{1},\ldots,Y_{n}$ is provided by the Kalman filter [28], assuming $X_{0}$ is Gaussian distributed. Unfortunately, direct implementation of the Kalman filter involves a stepwise computation complexity of $O(d^{2}q)$ . When the state dimension $d$ is high, the Kalman filter is not computationally feasible.

The ensemble Kalman filter (EnKF) was invented by meteorologists [5] to reduce the computational complexity. $K$ samples of (2.1) are updated using the Kalman filter rules, and their ensemble mean and covariance are employed to estimate the signal $X_{n}$. Specifically, suppose the posterior ensemble for $X_{n}$ is denoted by $\{X_{n}^{(k)}\}_{k=1,\ldots,K}$. The forecast ensemble of $X_{n+1}$ is first generated by propagating the linear system in (2.1):

$$\widehat{X}^{(k)}_{n+1}=A_{n}X^{(k)}_{n}+b_{n}+\xi^{(k)}_{n+1},\quad \xi^{(k)}_{n+1}\sim\mathcal{N}(0,\Sigma_{n}).$$

The EnKF then estimates $X_{n+1}$ with a prior distribution $\mathcal{N}(\overline{\widehat{X}}_{n+1},\widehat{C}_{n+1})$ , where the mean and covariance are obtained by the forecast ensemble:

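$$\overline{\widehat{X}}_{n+1}=\frac{1}{K}\sum_{k=1}^{K}\widehat{X}^{(k)}_{n+1},\quad \Delta\widehat{X}^{(k)}_{n+1}:=\widehat{X}^{(k)}_{n+1}-\overline{\widehat{X}}_{n+1},\quad \widehat{C}_{n+1}=\frac{1}{K}\sum_{k=1}^{K}\Delta\widehat{X}^{(k)}_{n+1}\otimes\Delta\widehat{X}^{(k)}_{n+1}.$$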

Applying the Bayes’ formula to the prior distribution and the linear observation $Y_{n+1}$ , a target Gaussian posterior distribution for $X_{n+1}$ can be obtained. There are several ways to update the forecast ensemble so its statistics approximate the target ones. Here we consider the standard EnKF in [5, 6] with artificial perturbations:

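$$X^{(k)}_{n+1}=(I-\widetilde{K}_{n+1}H)\widehat{X}^{(k)}_{n+1}+\widetilde{K}_{n+1}Y_{n+1}-\widetilde{K}_{n+1}\zeta^{(k)}_{n+1}.\tag{2.3}$$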

The Kalman gain matrix is given by $\widetilde{K}_{n+1}=\widehat{C}_{n+1}H^{T}(\sigma_{o}^{2}I_{q}+H\widehat{C}_{n+1}H^{T})^{-1}$ . The $\zeta^{(k)}_{n+1}$ are independent noises sampled from $\mathcal{N}(0,\sigma_{o}^{2}I_{q})$ .
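The forecast and analysis steps of the standard (non-localized) EnKF with perturbed observations can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the author's implementation: the ensemble is stored as the columns of a $d\times K$ array, the function name `enkf_step` and its arguments are hypothetical, and the $1/K$ normalization of the sample covariance follows the text.

```python
import numpy as np

def enkf_step(X, A, b, Sigma, H, sigma_o, y, rng):
    """One EnKF cycle. X is the (d, K) posterior ensemble at time n,
    y is the observation Y_{n+1}. Returns the posterior ensemble at
    time n+1, the forecast covariance, and the Kalman gain."""
    d, K = X.shape
    q = H.shape[0]
    # Forecast: propagate each member through the linear dynamics.
    xi = rng.multivariate_normal(np.zeros(d), Sigma, size=K).T
    X_f = A @ X + b[:, None] + xi
    # Ensemble mean and sample covariance (normalized by K as in the text).
    dX = X_f - X_f.mean(axis=1, keepdims=True)
    C_f = dX @ dX.T / K
    # Kalman gain C H^T (sigma_o^2 I + H C H^T)^{-1}, via a linear solve;
    # the transpose trick uses the symmetry of C_f and of S.
    S = sigma_o**2 * np.eye(q) + H @ C_f @ H.T
    gain = np.linalg.solve(S, H @ C_f).T
    # Analysis with artificial perturbations zeta^{(k)} ~ N(0, sigma_o^2 I).
    zeta = sigma_o * rng.standard_normal((q, K))
    X_a = X_f + gain @ (y[:, None] - H @ X_f - zeta)
    return X_a, C_f, gain
```

The last line is the update $(I-\widetilde{K}H)\widehat{X}^{(k)}+\widetilde{K}Y-\widetilde{K}\zeta^{(k)}$ written in innovation form.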

The computation complexity of EnKF is roughly $O(K^{2}d)$ , assuming $A_{n}$ and $\Sigma_{n}$ are sparse [29]. In practice, the ensemble size $K$ is often less than a few hundred, so the operational speed is significantly improved. On the other hand, with the sample size $K$ much smaller than the state space dimension $d$ , the sample covariance $\widehat{C}_{n+1}$ often produces spurious correlations [30, 5]. Spurious correlations may seriously reduce the filter accuracy, since the Kalman filter operation hinges heavily on the correctness of covariance estimation. The localization techniques are often employed to resolve such problems.

Localization techniques

In most geophysical applications, each dimension index $i\in\{1,\ldots,d\}$ corresponds to a spatial location. For simplicity, we assume different indices correspond to different spatial locations. Let $\mathbf{d}(i,j)$ be the spatial distance between the locations that $i$ and $j$ specify; then $\mathbf{d}$ is also a distance on the index set $\{1,\ldots,d\}$. In other words,

• $\mathbf{d}(i,j)=0$ if and only if $i=j$;

• $\mathbf{d}(i,j)=\mathbf{d}(j,i)$;

• $\mathbf{d}(i,j)+\mathbf{d}(j,k)\geq\mathbf{d}(i,k)$.

For a simple example, one can identify index $i$ with the integer $i$; then $\mathbf{d}(i,j)=|i-j|$ clearly defines a distance.

For most geophysical problems that can be modeled by a (stochastic) partial differential equation, the covariance between two locations is caused by the propagation of information through local interactions. Information often is also dissipated during its propagation, so its impact gets less significant when it reaches far-away locations. This leads to a localized covariance structure. In other words, there is a decreasing function $\phi:[0,\infty)\mapsto[0,1]$ , $\phi(0)=1$ such that

$$[\widehat{C}_{n}]_{i,j}\propto\phi(\mathbf{d}(i,j)).$$

In geophysical applications, a localization radius $l$ is often defined, so that $\phi(x)=0$ for $x>l$. Consequently, it is natural to model the localization as

$$[\mathbf{D}_{l}]_{i,j}=\phi(\mathbf{d}(i,j)).\tag{2.4}$$

In particular, the widely used Gaspari-Cohn matrix [31] is of this form with

$$\phi(x)=\Big(1+\frac{x}{c_{l}}\Big)\exp\Big(-\frac{x}{c_{l}}\Big)\mathds{1}_{x\leq l},\tag{2.5}$$

where the radius is often picked as $l=\sqrt{10/3}\,c_{l}$ or $2c_{l}$ [32]. Another simple localization matrix corresponds to the cutoff or Heaviside function $\phi(x)=\mathds{1}_{x\leq l}$, and we denote it by $\mathbf{D}_{cut}^{l}$. In other words,

$$[\mathbf{D}_{cut}^{l}]_{i,j}=\mathds{1}_{\mathbf{d}(i,j)\leq l}.\tag{2.6}$$

As a remark, while (2.5) is more useful in practice, (2.6) is much simpler for theoretical analysis and interpretation. Most of our analysis results below apply only to (2.6), with the exception of Theorem 2.1. It would be very interesting to generalize the analysis framework here to localization functions like (2.5).
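Both localization matrices are easy to build explicitly. The sketch below assumes the one-dimensional distance $\mathbf{d}(i,j)=|i-j|$ from the simple example above; the helper names `distance_matrix`, `cutoff_matrix` and `taper_matrix` are hypothetical, and the taper uses the function $\phi(x)=(1+x/c_{l})\exp(-x/c_{l})\mathds{1}_{x\leq l}$ exactly as written in (2.5).

```python
import numpy as np

def distance_matrix(d):
    """Pairwise distances d(i, j) = |i - j| on the index set {0, ..., d-1}."""
    idx = np.arange(d)
    return np.abs(idx[:, None] - idx[None, :])

def cutoff_matrix(d, l):
    """Cutoff localization: entry 1 if d(i, j) <= l, else 0."""
    return (distance_matrix(d) <= l).astype(float)

def taper_matrix(d, l, c_l):
    """Tapered localization: phi(d(i, j)) with
    phi(x) = (1 + x / c_l) * exp(-x / c_l) for x <= l, and 0 otherwise."""
    dist = distance_matrix(d)
    return (1.0 + dist / c_l) * np.exp(-dist / c_l) * (dist <= l)
```

A localized covariance estimate is then the entrywise product `C * cutoff_matrix(d, l)`. With this choice of distance the largest row sum of the cutoff mask is at most $2l+1$, independent of $d$.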

The notion of localization radius is closely related to the bandwidth of a matrix [33]. For a matrix $A$ , we define its bandwidth as:

$$l:=\inf\{x\geq 0:[A]_{i,j}=0\text{ if }\mathbf{d}(i,j)>x\}.\tag{2.7}$$

The bandwidth roughly captures how fast different components interact with each other. If $A$ has bandwidth $l$, each component interacts with at most $\mathcal{B}_{l}$ components when multiplied by $A$, where the volume constant $\mathcal{B}_{l}$ is defined by

$$\mathcal{B}_{l}=\max_{i}\#\{j:\mathbf{d}(i,j)\leq l\}.\tag{2.8}$$
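Both quantities can be read off directly from a distance matrix. The following sketch uses hypothetical helper names (`line_distance`, `bandwidth`, `volume_constant`) and the one-dimensional distance $|i-j|$ purely for illustration; the bandwidth is computed as the largest distance between indices with a nonzero entry, which equals the infimum in the definition.

```python
import numpy as np

def line_distance(d):
    """Distance matrix for d(i, j) = |i - j| (an illustrative choice)."""
    idx = np.arange(d)
    return np.abs(idx[:, None] - idx[None, :])

def bandwidth(A, dist):
    """Smallest x >= 0 such that A_ij = 0 whenever d(i, j) > x."""
    nonzero = A != 0
    return dist[nonzero].max() if nonzero.any() else 0

def volume_constant(dist, l):
    """B_l = max_i #{j : d(i, j) <= l}."""
    return int((dist <= l).sum(axis=1).max())
```

For a tridiagonal matrix on a line, `bandwidth` returns 1 and `volume_constant(dist, 1)` returns 3 once $d\geq 3$.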

A localized covariance structure is extremely useful for EnKF. It indicates that only covariances between nearby indices are worth sampling. By ignoring the covariances between far-apart indices, the necessary sample size can be significantly reduced. To apply this idea, the localization technique modifies the Kalman gain matrix in (2.3), and ensures that the assimilation updates from far-away observations are insignificant. There are two main types of localization methods in the literature, domain localization and covariance localization [14]. This paper discusses only the former, while a similar analysis should in principle apply to the latter as well.

With domain localization, the $i$ -th component is updated using only observations of indices within distance $l$ , which are elements of $\mathcal{I}_{i}=\{j:\mathbf{d}(i,j)\leq l\}$ . Let $\mathbf{P}_{\mathcal{I}_{i}}$ be the projection matrix of a $\mathbb{R}^{d}$ vector to its components on $\mathcal{I}_{i}$ , note that it is diagonal so it is symmetric. Then $\widehat{C}^{i}_{n+1}:=\mathbf{P}_{\mathcal{I}_{i}}\widehat{C}_{n+1}\mathbf{P}_{\mathcal{I}_{i}}$ contains the local covariance relevant to the $i$ -th component. The corresponding Kalman gain is

$$K^{i}_{n+1}=\widehat{C}^{i}_{n+1}H^{T}(\sigma_{o}^{2}I_{q}+H\widehat{C}^{i}_{n+1}H^{T})^{-1},\tag{2.9}$$

and the $i$ -th component is updated using the $i$ -th row of (2.9), namely $\mathbf{e}_{i}\mathbf{e}_{i}^{T}K^{i}_{n+1}$ . Again $\mathbf{e}_{i}$ is the $i$ -th standard basis vector of $\mathbb{R}^{d}$ . The final Kalman gain matrix patches all rows together

$$\widehat{K}_{n+1}=\sum_{i=1}^{d}\mathbf{e}_{i}\mathbf{e}_{i}^{T}K^{i}_{n+1}.\tag{2.10}$$

Since each $K^{i}_{n+1}$ has nonzero entries only with indices in $\mathcal{I}_{i}\times\mathcal{I}_{i}$, $\widehat{K}_{n+1}H$ is of bandwidth $l$ as well. The proof of Proposition 2.3 below verifies this. Therefore, each component is updated using observations at distance at most $l$ from it.
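The row-by-row construction of the domain-localized gain can be sketched as below. This is an unoptimized illustration under stated assumptions: `localized_gain` is a hypothetical name, `dist` is a precomputed $d\times d$ distance matrix, and a full $q\times q$ solve is done for every row, which a practical implementation would restrict to the local observations.

```python
import numpy as np

def localized_gain(C, H, sigma_o, dist, l):
    """Domain-localized Kalman gain: row i is taken from the gain K^i
    computed with the local covariance P_i C P_i, where P_i projects
    onto the indices within distance l of i."""
    d = C.shape[0]
    q = H.shape[0]
    K_hat = np.zeros((d, q))
    for i in range(d):
        mask = (dist[i] <= l).astype(float)   # diagonal of the projection P_i
        C_i = C * np.outer(mask, mask)        # P_i C P_i
        S = sigma_o**2 * np.eye(q) + H @ C_i @ H.T
        K_i = np.linalg.solve(S, H @ C_i).T   # C_i H^T S^{-1} (S, C_i symmetric)
        K_hat[i] = K_i[i]                     # keep only the i-th row
    return K_hat
```

With this construction, the entries of `K_hat @ H` at index pairs farther apart than `l` vanish, and taking `l` at least as large as the maximal distance recovers the usual global gain.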

Localized EnKF with covariance inflation

Besides causing spurious correlations, a small sample size also jeopardizes the EnKF operation, as the forecast covariance is often underestimated [34, 35, 23]. In order to resolve this issue, the covariance needs to be inflated with a fixed ratio $r>1$. [23] has shown that such modifications are pivotal to EnKF performance. We also incorporate this idea in our LEnKF.

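In summary, the localized EnKF (LEnKF) updates a posterior ensemble $\{X_{n}^{(k)},k=1,\cdots,K\}$ with mean $\overline{X}_{n}=\frac{1}{K}\sum_{k=1}^{K}X^{(k)}_{n}$ and spread $\Delta X_{n}^{(k)}=X_{n}^{(k)}-\overline{X}_{n}$ through the following steps, with $\widehat{K}_{n+1}$ given by (2.9) and (2.10):

$$\begin{aligned}
\overline{\widehat{X}}_{n+1}&=A_{n}\overline{X}_{n}+b_{n},\\
\Delta\widehat{X}^{(k)}_{n+1}&=\sqrt{r}\,(A_{n}\Delta X^{(k)}_{n}+\xi^{(k)}_{n+1}),\quad \xi^{(k)}_{n+1}\sim\mathcal{N}(0,\Sigma_{n}),\\
\widehat{C}_{n+1}&=\frac{1}{K}\sum_{k=1}^{K}\Delta\widehat{X}^{(k)}_{n+1}\otimes\Delta\widehat{X}^{(k)}_{n+1},\\
\overline{X}_{n+1}&=(I-\widehat{K}_{n+1}H)\overline{\widehat{X}}_{n+1}+\widehat{K}_{n+1}Y_{n+1},\\
\Delta X^{(k)}_{n+1}&=(I-\widehat{K}_{n+1}H)\Delta\widehat{X}^{(k)}_{n+1}+\widehat{K}_{n+1}\zeta^{(k)}_{n+1},\quad \zeta^{(k)}_{n+1}\sim\mathcal{N}(0,\sigma_{o}^{2}I_{q}).
\end{aligned}\tag{2.11}$$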

The posterior covariance matrix can be obtained through the spread

$$C_{n+1}=\frac{1}{K}\sum_{k=1}^{K}\Delta X^{(k)}_{n+1}\otimes\Delta X^{(k)}_{n+1}.$$

Note here we update the mean and ensemble spread, the $\Delta$ terms, separately. This is different from the standard EnKF, since the average noise terms $\frac{1}{K}\sum\xi^{(k)}_{n+1}$ and $\frac{1}{K}\sum\zeta^{(k)}_{n+1}$ are ignored for simplicity. Also the sum of the ensemble spread, $\sum\Delta X_{n}^{(k)}$ , may not be zero. On the other hand, these differences are small by the law of large numbers. The proofs can also be generalized to admit these terms, but the discussion will be notationally complicated.
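The separate update of mean and spread in (2.11) can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the author's code: `lenkf_step` is a hypothetical name, the spread is stored as a $d\times K$ array, and the gain is supplied through a hypothetical callable `gain_fn` that maps the forecast covariance to a $d\times q$ gain, so that a domain-localized gain (or any other gain) can be plugged in.

```python
import numpy as np

def lenkf_step(mean, spread, A, b, Sigma, H, sigma_o, y, r, gain_fn, rng):
    """One LEnKF cycle with inflation ratio r.
    mean: (d,) posterior mean; spread: (d, K) posterior spread at time n;
    y: observation Y_{n+1}; gain_fn: forecast covariance -> (d, q) gain."""
    d, K = spread.shape
    q = H.shape[0]
    # Forecast step: the mean follows the noise-free dynamics, while the
    # spread receives the sample noise and is inflated by sqrt(r).
    xi = rng.multivariate_normal(np.zeros(d), Sigma, size=K).T
    mean_f = A @ mean + b
    spread_f = np.sqrt(r) * (A @ spread + xi)
    C_f = spread_f @ spread_f.T / K
    # Gain from the forecast covariance (localized if gain_fn localizes).
    K_hat = gain_fn(C_f)
    # Analysis step for the mean and for the spread.
    zeta = sigma_o * rng.standard_normal((q, K))
    mean_a = mean_f + K_hat @ (y - H @ mean_f)
    spread_a = spread_f - K_hat @ (H @ spread_f) + K_hat @ zeta
    C_a = spread_a @ spread_a.T / K
    return mean_a, spread_a, C_f, C_a
```

Passing the gain as a function keeps the sketch independent of the particular localization scheme; with a gain that is identically zero, the analysis step leaves the forecast mean and covariance unchanged.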

One classical property of the Kalman filter is that the filter covariances and the Kalman gain matrices are predetermined, with no dependence on the realization of system (2.1). This is inherited by the LEnKF (2.11): the covariances and Kalman gain depend only on the realizations of the sample noises $\xi_{n}^{(k)},\zeta_{n}^{(k)}$, but not on $(X_{n},Y_{n})$.

To illustrate, consider the filtration generated by sample noise realization,

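$$\mathcal{F}^{S}_{n}=\sigma\{\Delta\widehat{X}^{(k)}_{0},\xi^{(k)}_{t},\zeta^{(k)}_{t-1},\ t=1,\dots,n,\ k=1,\dots,K\}.$$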

Using induction, it is easy to verify the ensemble spread, ensemble covariance and Kalman gain, are all $\mathcal{F}^{S}_{n}$ adapted:

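$$\Delta\widehat{X}^{(k)}_{n},\ \Delta X^{(k)}_{n-1},\ \widehat{C}_{n},\ C_{n-1},\ \widehat{K}_{n}\in\mathcal{F}^{S}_{n}.$$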

The corresponding conditional expectation is denoted by $\mathbb{E}_{\mathcal{F}^{S}_{n}}$ . We will use $\mathcal{F}^{S}_{\infty}=\bigvee\mathcal{F}^{S}_{n}$ to denote the $\sigma$ -field for all ensemble spread information.

The other randomness of EnKF comes from the realization of system (2.1). We can average out this part of the randomness by conditioning on $\mathcal{F}^{S}_{\infty}$, which we will denote as $\mathbb{E}_{S}$. This is useful when comparing the filter error and sample covariance. The natural filtration generated by all random outcomes up to time $n$ is

$$\mathcal{F}_{n}=\sigma\{X_{0},\widehat{X}^{(k)}_{0},\xi_{t},\zeta_{t-1},\xi^{(k)}_{t},\zeta^{(k)}_{t-1},\ t=1,\dots,n,\ k=1,\dots,K\}.$$

We will denote the conditional expectation with $\mathcal{F}_{n}$ as $\mathbb{E}_{n}$ .

2.2 Sampling errors of localized forecast covariance

Since EnKF relies on the ensemble forecast covariance matrix to assimilate new observations, its performance depends on the accuracy of the sampling procedure. The sampling procedure updates the forecast matrix from time $n$ to $n+1$ .

Given the forecast ensemble covariance $\widehat{C}_{n}$ , based on the Kalman update rule, the inflated target forecast covariance at $n+1$ is given by $r\mathcal{R}_{n}(\widehat{C}_{n})$ , with the posterior Riccati map

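$$\mathcal{R}_{n}(\widehat{C}_{n}):=A_{n}(I-\widehat{K}_{n}H)\widehat{C}_{n}(I-\widehat{K}_{n}H)^{T}A_{n}^{T}+\sigma_{o}^{2}A_{n}\widehat{K}_{n}\widehat{K}_{n}^{T}A_{n}^{T}+\Sigma_{n}.$$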

The real ensemble forecast covariance $\widehat{C}_{n+1}=\frac{1}{K}\sum\Delta\widehat{X}^{(k)}_{n+1}\otimes\Delta\widehat{X}^{(k)}_{n+1}$ is generated by the ensemble spread

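$$\Delta\widehat{X}^{(k)}_{n+1}=\sqrt{r}A_{n}(I-\widehat{K}_{n}H)\Delta\widehat{X}^{(k)}_{n}+\sqrt{r}A_{n}\widehat{K}_{n}\zeta^{(k)}_{n}+\sqrt{r}\xi^{(k)}_{n+1}.$$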

It is straightforward to verify that the average of $\widehat{C}_{n+1}$ over $\zeta^{(k)}_{n}$ and $\xi^{(k)}_{n+1}$ matches the inflated target $r\mathcal{R}_{n}(\widehat{C}_{n})$, that is, $\mathbb{E}_{n}\widehat{C}_{n+1}=r\mathcal{R}_{n}(\widehat{C}_{n})$.

In order to control the sampling error $\|\widehat{C}_{n+1}-r\mathcal{R}_{n}(\widehat{C}_{n})\|$, it is necessary to have a sufficiently large $K$. Unfortunately, without further structure, the size of $K$ would need to grow linearly with $d$ [21]. As a simple example, let $\widehat{C}_{n}=\widehat{K}_{n}=0$, $\Sigma_{n}=I_{d}$, $r=1$; then $\Delta\widehat{X}^{(k)}_{n+1}=\xi^{(k)}_{n+1}$ are i.i.d. samples from $\mathcal{N}(0,I_{d})$, and the target covariance matrix is $I_{d}$. Yet $\|\widehat{C}_{n+1}\|\approx(1+\sqrt{d/K})^{2}$ with high probability by the Bai-Yin law [36]. In practical settings, $K\ll d$, so the sample covariance is unlikely to be accurate.
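The simple example above is easy to reproduce numerically. The sketch below draws $K$ i.i.d. samples from $\mathcal{N}(0,I_{d})$ and compares the operator-norm error of the raw sample covariance with the error after masking by a cutoff matrix of radius $L$ under the distance $|i-j|$. The function name `sample_cov_errors` and all parameter values are illustrative assumptions, not taken from the paper's experiments.

```python
import numpy as np

def sample_cov_errors(d, K, L, rng):
    """Operator-norm errors of the sample covariance of K standard
    Gaussian samples in R^d, without and with cutoff localization."""
    X = rng.standard_normal((d, K))
    C = X @ X.T / K                                   # sample covariance, target is I_d
    idx = np.arange(d)
    D = (np.abs(idx[:, None] - idx[None, :]) <= L).astype(float)
    err_full = np.linalg.norm(C - np.eye(d), 2)       # grows with d / K
    err_local = np.linalg.norm((C - np.eye(d)) * D, 2)
    return err_full, err_local

# Example usage with K much smaller than d:
rng = np.random.default_rng(0)
err_full, err_local = sample_cov_errors(d=400, K=20, L=2, rng=rng)
print(err_full, err_local)
```

Since the sample covariance has rank at most $K$ while its trace is close to $d$, the unlocalized error is at least of order $d/K$; the masked error is controlled by the largest row sum of the mask times the largest entrywise error, which is the mechanism quantified in Theorem 2.1.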

As discussed in Section 2.1, the main idea of localization is that we assume the inflated target covariance $r\mathcal{R}_{n}(\widehat{C}_{n})$ is localized, so it suffices to consider $r\mathcal{R}_{n}(\widehat{C}_{n})\circ\mathbf{D}_{L}$, which can be sampled by $\widehat{C}_{n+1}\circ\mathbf{D}_{L}$. Here $\mathbf{D}_{L}$ can be any matrix of form (2.4), where its radius $L$ does not need to match $l$ used in (2.9). In fact, we will mostly use $\mathbf{D}_{L}=\mathbf{D}_{cut}^{L}$ (2.6) with $L\geq 4l$ in our discussion. One important advantage gained by localization is that, in order for the covariance sampling to be accurate, that is, for $\|(\widehat{C}_{n+1}-r\mathcal{R}_{n}(\widehat{C}_{n}))\circ\mathbf{D}_{L}\|$ to be small, the necessary sample size scales only with $D_{L}\log d$, instead of $d$, where $D_{L}$ is some constant that only depends on $L$. This phenomenon was discovered in statistics [21], assuming the samples are generated from one fixed distribution. But in EnKF, the conditional mean of each sample is different, i.e. $\mathbb{E}_{n}\Delta\widehat{X}^{(k)}_{n+1}=\sqrt{r}A_{n}(I-\widehat{K}_{n}H)\Delta\widehat{X}_{n}^{(k)}$. A generalization of [21] is our first result:

Theorem 2.1.

For any fixed group of $a_{k}\in\mathbb{R}^{d}$ , $k=1,\ldots,K$ , and $K$ i.i.d. samples $z_{k}\sim\mathcal{N}(0,\Sigma_{z})$ . Consider the sample covariances

[TABLE]

Let

[TABLE]

Then $Z$ concentrates around its mean in the following two ways, where $c$ is an absolute constant:

a)

Schur product with a symmetric matrix $\mathbf{D}_{L}$ . For any $t\geq 0$

[TABLE]

Recall that $\|\mathbf{D}_{L}\|_{1}:=\max_{i}\sum_{j=1}^{d}|[\mathbf{D}_{L}]_{i,j}|$ , which is often independent of $d$ .

b)

Entry-wise. Consider $\|Z-\mathbb{E}Z\|_{\infty}=\max_{i,j}|[Z-\mathbb{E}Z]_{i,j}|$ , then for any $t\geq 0$

[TABLE]

In application to LEnKF, we will let

[TABLE]

and Theorem 2.1 shows that $\widehat{C}_{n+1}\circ\mathbf{D}_{L}$ concentrates around $r\mathcal{R}_{n}(\widehat{C}_{n})\circ\mathbf{D}_{L}$ . The exact statement is given below by Corollary 5.4. The result in [21] is equivalent to the special case where $a_{k}\equiv 0$ . Fortunately, the generalization is not difficult, and its proof is given in Section 4.
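The benefit of the Schur product can be illustrated in the special case $a_{k}\equiv 0$ . The sketch below is only an illustration under stated assumptions: the ring geometry uses the cyclic distance of Section 3.1, the cut-off mask is taken to be the 0/1 indicator of $\mathbf{d}(i,j)\leq L$ , and the banded factor `B`, the sizes and the seed are hypothetical choices. It compares the spectral error of the raw sample covariance with that of its localized version when $K\ll d$ .

```python
import numpy as np

def cyclic_dist_matrix(d):
    """Cyclic distances d(i,j) = min(|i-j|, d-|i-j|) on a ring of d sites."""
    idx = np.arange(d)
    gap = np.abs(idx[:, None] - idx[None, :])
    return np.minimum(gap, d - gap)

d, K, L = 400, 40, 2                       # illustrative sizes with K << d
rng = np.random.default_rng(0)
dist = cyclic_dist_matrix(d)

# Hypothetical banded factor: Sigma_z = B B^T then has bandwidth 2 <= L.
B = np.where(dist == 0, 1.0, 0.0) + 0.4 * np.where(dist == 1, 1.0, 0.0)
Sigma_z = B @ B.T

z = rng.standard_normal((K, d)) @ B.T      # K samples from N(0, Sigma_z), one per row
Z = z.T @ z / K                            # sample covariance in the case a_k = 0
D_L = (dist <= L).astype(float)            # cut-off localization mask (assumed convention)

err_full = np.linalg.norm(Z - Sigma_z, 2)            # no localization
err_loc = np.linalg.norm((Z - Sigma_z) * D_L, 2)     # Schur product with D_L
print(f"unlocalized error = {err_full:.2f}, localized error = {err_loc:.2f}")
```

Because `Sigma_z` already vanishes outside the mask, the localized error measures sampling noise inside the band only, which is why it is much smaller.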

2.3 Localization inconsistency with localized covariance

While the localization technique makes the covariance sampling much easier, it also introduces additional errors. The fundamental reason is that localization is applied to the covariance matrices, but cannot be applied to the ensemble members themselves. On the other hand, the analysis update is applied to the ensemble but not to the covariance. This leads to a matrix inconsistency [9, 25, 15].

To illustrate, we look at the forecast filter error at time $n$ , $\hat{e}_{n}=\overline{\widehat{X}}_{n}-X_{n}$ . At this moment, the sample noise realization of $\mathcal{F}^{S}_{n}$ is available, so it is natural to consider the conditional covariance of the forecast filter error:

[TABLE]

The identity holds because the sample noises after time $n$ are independent of $\hat{e}_{n}\in\mathcal{F}_{n}$ .

Suppose this covariance is captured by the localized ensemble covariance, in other words $\mathbb{E}_{S}\hat{e}_{n}\otimes\hat{e}_{n}=\widehat{C}_{n}\circ\mathbf{D}_{L}$ . Based on the LEnKF formulation (2.11), the filter errors after the next assimilation step and forecast step are:

[TABLE]

Since the Kalman gain $\widehat{K}_{n}\in\mathcal{F}^{S}_{n}$ , $\zeta_{n}$ and $\xi_{n+1}$ are independent of $\mathcal{F}^{S}_{\infty}$ , the new forecast error covariance is

[TABLE]

On the other hand, the ensemble covariance is generated by the update in (2.14). With no inflation, $r=1$ , Theorem 2.1 indicates $\widehat{C}_{n+1}\circ\mathbf{D}_{L}$ is near its average

[TABLE]

Recall the posterior Riccati map $\mathcal{R}_{n}(\widehat{C}_{n})$ is defined by (2.13).

The difference between (2.16) and (2.17) can be interpreted as the inconsistency caused by commuting the localization and Kalman covariance update. In order for the ensemble covariance to capture the error covariance, it is necessary for this difference to be small. This is an issue not governed by the sampling scheme, but governed by the localization operation.

As discussed in the introduction, the major motivation behind localization techniques is that the covariance is localized. We formalize this notion through the following definition.

Definition 2.2.

Given a decreasing function $\Phi:\mathbb{R}^{+}\mapsto[0,1]$ with $\Phi(0)=1$ , we say the forecast covariance sequence $\widehat{C}_{n}$ follows an $(M_{n},\Phi,L)$ -localized structure, if

[TABLE]

The decay function $\Phi$ and $L$ need not coincide with the $\phi$ and $l$ used in Kalman gain localization (2.4). This flexibility is useful when we try to verify the localized structure. Intuitively, in order for localization techniques to be effective, we need $\Phi(x)$ to be near zero when $x$ is large. This holds true for most localized covariance structures, such as the Gaspari-Cohn matrix (2.5), and also for the function $\Phi(x)=\lambda_{A}^{x}$ with a certain $\lambda_{A}<1$ , which will appear below in Theorem 2.5 for linear systems.
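To make the localized structure concrete, the following sketch computes a localization constant for a given matrix. It rests on an assumption about the display in Definition 2.2, namely that the structure bounds entries as $|[\widehat{C}_{n}]_{i,j}|\leq M_{n}\Phi(\mathbf{d}(i,j))$ for pairs with $\mathbf{d}(i,j)\leq L$ ; the function name `localization_status`, the ring geometry and the test matrix are hypothetical.

```python
import numpy as np

def cyclic_dist_matrix(d):
    """Cyclic distances d(i,j) = min(|i-j|, d-|i-j|) on a ring of d sites."""
    idx = np.arange(d)
    gap = np.abs(idx[:, None] - idx[None, :])
    return np.minimum(gap, d - gap)

def localization_status(C, phi, L):
    """Smallest M with |C[i,j]| <= M * phi(d(i,j)) over pairs with d(i,j) <= L.

    This encodes an assumed reading of Definition 2.2, not its exact statement.
    """
    dist = cyclic_dist_matrix(C.shape[0])
    inside = dist <= L
    return float(np.max(np.abs(C[inside]) / phi(dist[inside])))

d, L, lam = 30, 4, 0.5
C = 2.0 * lam ** cyclic_dist_matrix(d)     # entries decay exactly like 2 * lam^d(i,j)
M = localization_status(C, lambda x: lam ** x, L)
print(M)
```

For this synthetic matrix the decay matches $\Phi(x)=\lambda^{x}$ exactly, so the computed constant equals the prefactor.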

One interesting phenomenon is that if the forecast covariance is already localized, then the localization inconsistency is in general small:

Proposition 2.3.

Suppose $\|A_{n}\|\leq M_{A}$ , $A_{n}$ and $\Sigma_{n}$ are of bandwidth less than $l$ , and $\widehat{C}_{n}$ follows an $(M_{n},\Phi,L)$ -localized structure, then the localization inconsistency with $\mathbf{D}_{L}=\mathbf{D}_{cut}^{L}$ and $L\geq 4l$ , given by

[TABLE]

has nonzero entries only around the localization boundary:

[TABLE]

Moreover, it is bounded by

[TABLE]

Here $\mathcal{B}_{L,l}$ is a volume constant $\mathcal{B}_{L,l}=\max_{i}\#\{j:|\mathbf{d}(i,j)-L|\leq 2l\}$ , and $\mathcal{B}_{l}$ is given by (2.8). Note that if $\Phi(L-2l)$ is close to zero, the right side is very small.

2.4 Main result: LEnKF performance

There are different ways to quantify the performance of EnKF. One approach is to compare EnKF with its large ensemble limit, which is the Kalman filter, and estimate the convergence rate [LeG11, 37, 38, 39]. Moreover, advanced sampling techniques, such as multilevel Monte Carlo, can be applied to the EnKF procedures to speed up the convergence [HLT16, CHLT16]. However, these results have not investigated the dependence of the sample size $K$ on the underlying dimension, thus they are not helpful in explaining the advantages of the localization procedures. Moreover, the large ensemble limit for LEnKF is not necessarily optimal, since the localization techniques may violate Bayes' formula.

A more practical approach looks for qualitative EnKF properties, where the necessary sample size $K$ scales with quantities much less than $d$ [40, 41, 42, 43], for example a low effective dimension [23]. One central issue of EnKF is that, unlike the Kalman filter, it estimates the forecast uncertainty by the ensemble covariance, which can be faulty. Since the forecast covariance matrix plays a pivotal role in the EnKF operation, it is important to ask if the ensemble covariance captures the real filter error covariance.

In our particular case, we are interested in finding a bound for the filter error covariance $\mathbb{E}_{S}\hat{e}_{n}\otimes\hat{e}_{n}$ . We will compare it with the filter ensemble covariance $\widehat{C}_{n}$ . Note that the conditioning $\mathbb{E}_{S}$ is with respect to the sample noise filtration $\mathcal{F}^{S}_{\infty}$ given in (2.12), and that $\widehat{C}_{n}\in\mathcal{F}_{\infty}^{S}$ . Therefore the comparison is legitimate. By showing that $\mathbb{E}_{S}\hat{e}_{n}\otimes\hat{e}_{n}$ is dominated by a proper inflation of $\widehat{C}_{n}$ with large probability, we demonstrate that the LEnKF reaches its estimated performance. In order to achieve that, we need the localized structure to be stable as well.

Theorem 2.4.

Suppose the forecast ensemble covariance follows a stable $(M_{n},\Phi,L)$ -localized structure, and the sample size $K$ exceeds $D_{L}\log d$ with a constant $D_{L}$ that depends on $L$ . Then the LEnKF (2.11) reaches its estimated performance in the long time average. Specifically, for any $\delta>0$ , suppose the following conditions hold:

1)

In the signal-observation system (2.1), $A_{n}$ and $\Sigma_{n}$ are of bandwidth $l$ , moreover

[TABLE]

2)

The initial error satisfies $\mathbb{E}_{S}\hat{e}_{0}\otimes\hat{e}_{0}\preceq r_{0}(\widehat{C}_{0}+\rho I_{d})$ for some $r_{0}$ and $\rho$ such that

[TABLE]

This can always be achieved by picking a larger $r_{0}$ .

3)

The forecast covariance $\widehat{C}_{n}$ follows an $(M_{n},\Phi,L)$ -localized structure as in Definition 2.2. Moreover, the localized structure is stable, so there are constants $B_{0},D_{0}$ and $M_{0}$ so that

[TABLE]

4)

The localized structure $\Phi$ and radius $L$ satisfy

[TABLE]

The volume constants are given by Proposition 2.3.

5)

The sample size $K>\Gamma(r\mathcal{B}_{l}\delta^{-1},d)$ , with

[TABLE]

and the absolute constant $c$ is given by Theorem 2.1.

Then for any $1<r_{*}<r$ , the filter error covariance is dominated by the filter covariance

[TABLE]

with high probability $1-O(\delta)$ in the long time average

[TABLE]

2.5 Weak local interaction with sparse observations

By Theorem 2.4, the stability of the localized structure is a necessary condition for the LEnKF to reach its estimated performance. In practice this condition is often assumed to be true to motivate the localization technique, and one can check it while the algorithm is running. Still, it is interesting to find sufficient a-priori conditions on system (2.1) so that (2.20) holds. Unfortunately, rigorous investigations in this direction are sorely missing. Here we provide a stability analysis in a simple setting.

The origin of localized covariances is intuitively clear. In most physical systems, the covariance between $[X]_{i}$ and $[X]_{j}$ comes from information propagation in space. So if the propagation is weak and decays at the same time, there will be a localized covariance. For our linear models, the information propagation is carried by local interactions, described by the off diagonal terms of $A_{n}$ . To enforce its weakness, we assume that there is a $\lambda_{A}<1$ , such that

[TABLE]

For the simplicity of our discussion, we also assume the system noise is diagonal $\Sigma_{n}=\sigma^{2}_{\xi}I_{d}$ .

Note that $\lambda_{A}<1$ , so $\lambda_{A}^{-\mathbf{d}(i,k)}$ is a large number when $i$ and $k$ are far apart. So condition (2.22) constrains the long distance interaction, measured by $|[A_{n}]_{i,k}|$ , to be weak. In other words, (2.22) models a local interaction. If we consider the unfiltered covariance of the sequence $[X]_{i}$ , then $\lambda_{A}<1$ is sufficient to guarantee that the covariance is localized, using Proposition 6.2 below.

The main difficulty actually comes from the observation part. For simplicity, we require the observations in (2.2) to be sparse in the sense that $\mathbf{d}(o_{i},o_{j})>2l$ for any $i\neq j$ . Recall that $o_{i}$ is the $i$ -th observable component. Then for each location $i\in\{1,\cdots,d\}$ , there is at most one location $o(i)\in\{o_{1},\cdots,o_{q}\}$ such that $\mathbf{d}(i,o(i))\leq l$ . This will significantly simplify the analysis step and yield an explicit expression. Sparse observations are in fact quite common in practice. Moreover, it is also possible to generalize the results here to the non-sparse scenario by using sequential assimilation [15], but the conditions will be much more involved.
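The sparsity requirement and the map $i\mapsto o(i)$ can be sketched in a few lines. This is an illustrative sketch only: it assumes a one dimensional ring with the cyclic distance of Section 3.1, 0-based indices, and a hypothetical layout with one observed site every `p` components; the helper names are not from the paper.

```python
def cyclic_distance(i, j, d):
    """d(i,j) = min(|i-j|, d-|i-j|) on a ring of d sites."""
    gap = abs(i - j)
    return min(gap, d - gap)

def is_sparse(obs, d, l):
    """True if every pair of observed sites is more than 2l apart."""
    return all(cyclic_distance(a, b, d) > 2 * l
               for k, a in enumerate(obs) for b in obs[k + 1:])

def nearest_observation(obs, d, l):
    """Map each site i to the observed site o(i) within distance l, or None."""
    out = {}
    for i in range(d):
        close = [o for o in obs if cyclic_distance(i, o, d) <= l]
        out[i] = close[0] if close else None   # at most one entry when obs is sparse
    return out

d, p, l = 20, 5, 1
obs = list(range(0, d, p))                     # hypothetical layout: sites 0, 5, 10, 15
print(is_sparse(obs, d, l))
print(nearest_observation(obs, d, l))
```

When `is_sparse` holds, two observed sites within distance $l$ of the same location would be at most $2l$ apart by the triangle inequality, so `close` never has more than one element.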

Under the sparse observation scenario, the following function describes how the localized structure of $\widehat{C}_{n}$ updates to that of $\widehat{C}_{n+1}$ :

[TABLE]

This function provides a way to ensure stable localized structure:

Theorem 2.5.

Given a LEnKF (2.11), suppose the following hold:

1)

The system noise is diagonal and the observations are sparse

[TABLE]

2)

There is a $\lambda_{A}<r^{-1}$ such that (2.22) holds.

3)

There are constants

[TABLE]

such that $\psi_{\lambda_{A}}(M_{*},\delta_{*})\leq M_{*}$ with $\psi_{\lambda_{A}}$ given by (2.23).

4)

Denote $n_{*}=2L+\lceil\frac{\log 4\delta_{*}^{-1}}{\log\lambda^{-1}_{A}}\rceil$ . The sample size $K$ exceeds

[TABLE]

Then the forecast ensemble covariance follows a stable localized structure $(M_{n},\Phi,L)$ with $\Phi(x)=\lambda_{A}^{x}$ . Specifically, the stochastic sequence $M_{n}$ is dissipative every $n_{*}$ steps:

[TABLE]

The long time average condition (2.20) can be verified by

[TABLE]

Remark 2.6.

Note that

[TABLE]

With sufficiently small $\lambda_{A}$ or $\sigma^{-1}_{o}$ , $\psi_{\lambda_{A}}(M,\delta)<M$ can have a solution, so condition $3)$ holds.
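The block length $n_{*}=2L+\lceil\log(4\delta_{*}^{-1})/\log\lambda_{A}^{-1}\rceil$ from condition 4) of Theorem 2.5 is plain arithmetic. The sketch below evaluates it by plugging in the Regime I values reported in Section 3.2 ( $\lambda_{A}=0.5186$ , $\delta_{*}=0.128$ , $L=4$ ); the function name `n_star` is a hypothetical helper, and the paper does not state the resulting number itself.

```python
import math

def n_star(L, delta_star, lambda_A):
    """n_* = 2L + ceil( log(4 / delta_*) / log(1 / lambda_A) )."""
    return 2 * L + math.ceil(math.log(4.0 / delta_star) / math.log(1.0 / lambda_A))

n = n_star(L=4, delta_star=0.128, lambda_A=0.5186)
print(n)  # 14
```

With these inputs the ratio of logarithms is about 5.24, so the ceiling contributes 6 steps on top of $2L=8$ .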

2.6 Localization radius

One important and difficult issue of LEnKF implementation is how to choose the localization radius $l$ . The theoretical results above shed some light on this issue qualitatively. It is worth noticing that this paper has two localization radii: $l$ is the one used in the LEnKF formulation (2.11), and $L$ is used in the theoretical analysis of the filter error. Generally speaking, $L$ and $l$ should be picked so that $L\geq 4l$ , so we concern ourselves only with $L$ in the following. We also assume that $\Phi(x)=\lambda_{A}^{x}$ from Theorem 2.5 for simpler discussion.

A smaller localization radius simplifies the sampling task by focusing on a smaller assimilation domain, and significantly reduces the necessary sample size. This comes from two perspectives. First, in order for the LEnKF to sample the correct localized covariance matrix, condition 5) of Theorem 2.4 requires the sample size to grow polynomially with $L$ , since $\|\Phi\|_{1}$ is summing over $\mathcal{B}_{L}$ entries. Second, the localized covariance structure can be very delicate at the boundary, and to maintain it one needs the random forecast covariance to have sampling error of scale $\lambda_{A}^{L}$ . This leads to the exponential dependence of $K$ on $L$ , as in condition 4) of Theorem 2.5.

On the other hand, a larger localization radius $L$ reduces the size of the localization inconsistency. Based on Proposition 2.3, the localization inconsistency is of order $\Phi(L-2l)=\lambda_{A}^{L-2l}$ , because within inequality (2.19), $\mathcal{B}_{l}$ is independent of $L$ , and $\mathcal{B}_{L,l}$ is also independent of $L$ if $i,j$ are taken from $\{1,\cdots,d\}$ . This becomes condition 4) of Theorem 2.4, where we need the localization radius to be large, so the inconsistency is bounded by the tolerance.
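The two competing scales in this discussion can be tabulated directly. The sketch below is only a worked illustration: it uses $\Phi(x)=\lambda_{A}^{x}$ with the Regime I value $\lambda_{A}=0.5186$ and $l=1$ from Section 3, and the list of radii is an arbitrary choice. It prints the order of the localization inconsistency, $\lambda_{A}^{L-2l}$ , next to the sampling accuracy scale $\lambda_{A}^{L}$ needed at the boundary.

```python
lambda_A, l = 0.5186, 1
radii = [4, 6, 8, 12]                                      # illustrative choices with L >= 4l

inconsistency = [lambda_A ** (L - 2 * l) for L in radii]   # order of the bound in Proposition 2.3
accuracy = [lambda_A ** L for L in radii]                  # sampling error scale at the boundary

for L, inc, acc in zip(radii, inconsistency, accuracy):
    print(f"L = {L:2d}: inconsistency ~ {inc:.2e}, required sampling accuracy ~ {acc:.2e}")
```

Both columns decay geometrically in $L$ : the first is the gain from a larger radius, the second is the price paid in sample size.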

2.7 LEnKF accuracy with small noises

In practice, with frequent and accurate observations, the system noises, $\Sigma_{n}$ and $\sigma_{o}^{2}$ , are often of scale $\epsilon$ . In this scenario, the LEnKF has its error covariance scale with $\epsilon$ in long time, showing an accurate forecast skill. Moreover, there is no requirement that the initial ensemble have error of scale $\epsilon$ , meaning the LEnKF can converge to the signal $X_{n}$ given enough time.

Theorem 2.7.

Suppose the signal-observation system (2.1) satisfies the conditions of Theorem 2.5, and its LEnKF is tuned to satisfy the conditions of Theorem 2.4 except (2.20). Then if the same LEnKF is applied to the following system

[TABLE]

it has small filter error covariance of scale $\epsilon$ . In particular, the ensemble covariance is of scale $\epsilon$ in long time average

[TABLE]

Moreover, the real filter covariance is dominated by $\widehat{C}_{n}$ with high probability:

[TABLE]

Note that $\epsilon$ appears only in terms that converge to zero with $T\to\infty$ .

Remark 2.8.

We need the system to follow the conditions in Theorem 2.5 only to ensure the stable localized structure exists. If one can find other conditions to verify that the LEnKF follows an $(M_{n},\Phi,L)$ localized structure such that $M_{n}$ converges to a scale of $\epsilon$ , the conditions in Theorem 2.5 can be replaced.

3 Numerical experiments

There is plenty of numerical evidence showing that LEnKF has good forecast skill even with nonlinear dynamical systems. Moreover, this paper intends to understand LEnKF from a theoretical perspective, not an empirical one. On the other hand, several new concepts and conditions are introduced in our analysis framework. To understand their significance, we conduct a few simple numerical experiments in this section.

3.1 Experiments setup: a stochastic turbulence model

We consider a stochastically forced dissipative advection equation on a one-dimensional periodic domain from Section 6.3 of [6]:

[TABLE]

To transform it to a discrete linear system, we apply the centered difference formula with spatial grid size $h$ , and Euler scheme with time step $\Delta t$ . We assume $W(x,t)$ is a white noise in both time and space. The discretized signal-system $[X_{n,1},\cdots,X_{n,d}]^{T}$ follows

[TABLE]

The indices should be interpreted cyclically, that is $X_{n,0}=X_{n,d}$ and $X_{n,d+1}=X_{n,1}$ . The natural distance between indices is $\mathbf{d}(i,j)=\min\{|i-j|,||i-j|-d|\}$ . The system noises $W_{n,i}$ are independent samples from $\mathcal{N}(0,1)$ . We also initialize $X_{0,i}\sim\mathcal{N}(0,1)$ for simplicity. Evidently, if we formulate (3.1) in the format of (2.1), the corresponding matrix $A_{n}$ is constant with bandwidth $l=1$ . In other words it is tridiagonal. We assume one observation is made every $p$ components with independent Gaussian noise $B_{n,k}\sim\mathcal{N}(0,1)$ :

[TABLE]
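The structural facts used later (a constant, cyclically tridiagonal $A_{n}$ and an observation operator that picks every $p$ -th component) can be sketched as follows. This is a hedged illustration, not the paper's scheme (3.1): the coefficients `diag`, `lower` and `upper` are hypothetical placeholders for whatever the centered-difference and Euler discretization produces, and the indices are 0-based.

```python
import numpy as np

def cyclic_dist_matrix(d):
    """Cyclic distances d(i,j) = min(|i-j|, d-|i-j|) on a ring of d sites."""
    idx = np.arange(d)
    gap = np.abs(idx[:, None] - idx[None, :])
    return np.minimum(gap, d - gap)

def cyclic_tridiagonal(d, diag, lower, upper):
    """Constant matrix coupling each site to its two cyclic neighbours."""
    A = np.zeros((d, d))
    for i in range(d):
        A[i, i] = diag
        A[i, (i - 1) % d] = lower              # wraps around: X_{n,0} = X_{n,d}
        A[i, (i + 1) % d] = upper              # wraps around: X_{n,d+1} = X_{n,1}
    return A

def observation_matrix(d, p):
    """Selection matrix H observing one site every p components."""
    sites = np.arange(0, d, p)
    H = np.zeros((len(sites), d))
    H[np.arange(len(sites)), sites] = 1.0
    return H

def bandwidth(A):
    """Largest cyclic distance between the indices of a nonzero entry."""
    dist = cyclic_dist_matrix(A.shape[0])
    return int(dist[A != 0].max())

d, p = 12, 4
A = cyclic_tridiagonal(d, diag=0.8, lower=0.15, upper=-0.05)   # hypothetical coefficients
H = observation_matrix(d, p)
print(bandwidth(A), H.shape)
```

The corner entries of `A` are nonzero, yet the bandwidth is still 1 because the distance is cyclic; `H` has orthonormal rows, so $HH^{T}=I_{q}$ .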

A simple LEnKF with domain localization radius $l=1$ and inflation $r=1.1$ will be applied to recover $X_{n}$ . A small sample size $K=10$ is taken. For comparison, we implement a standard EnKF with the same inflation, sample size and sample noise realization. A standard Kalman filter is also computed to indicate the optimal filter error. We are interested to see

• Does LEnKF have a close to optimal performance? Does localization play a key role?

• Is filter performance robust against dimension increase?

• Does filter performance scale with the noise strength?

• Does the LEnKF ensemble covariance localize, and is this structure stable?

• Do the a-priori conditions of Theorem 2.5 hold?

In the discussion below, we consider dimension in a wide range $d=10,100,1000$ . Yet we will fix the grid size $h$ in each regime. This corresponds to a sequence of domains with increasing size, but not a fixed domain with increasing refinement. Although the latter can also have very high dimension, localization is not a suitable tool; a proper projection to the low effective dimension should be more effective [23]. Also it is worth noticing that there are better ways to filter (3.1), such as Fourier domain filtering [6]. We are running LEnKF here just to support our theoretical analysis.

3.2 Regime I: strong dissipation

We first consider a regime of (3.1) with strong uniform damping and weak advection

[TABLE]

In this regime, the conditions of Theorem 2.5 can be verified. In particular, (2.22) can be formulated as

[TABLE]

Direct numerical computation shows that $\lambda_{A}=0.5186$ satisfies this relation. Furthermore, we can verify that $(\delta_{*},M_{*})=(0.128,0.2187)$ satisfy condition 3) of Theorem 2.5. Theorem 2.5 predicts that a stable stochastic sequence $M_{n}$ exists so that $\widehat{C}_{n}$ follows a localized structure $(M_{n},\Phi,4)$ , where $\Phi(x)=\lambda_{A}^{x\wedge L}$ and $M_{n}$ has its mean bounded by $8.8959$ . On the other hand, Theorem 2.5 requires the sample size to be around $K=2.8\times 10^{4}$ for $d=100$ , and $K=7.34\times 10^{4}$ for $d=10^{6}$ . We will see that $K=10$ is sufficient for LEnKF to perform well numerically. The overestimate is reasonable, as theoretical analysis is often conservative. The main point of the theoretical analysis is to show a logarithmic dependence of $K$ on the dimension.

The numerical results are presented in Figure 3.1. In subplot a) the dimension average square forecast error

[TABLE]

of LEnKF is plotted for 100 iterations. The time mean DSE (MSE) is around 0.142 for $d=100$ . This is comparable with the optimal Kalman filter MSE 0.129. Moreover, this performance is robust for all dimensions, MSE=0.137 for $d=10$ and MSE=0.143 for $d=1000$ , while the oscillation is stronger in $d=10$ case due to averaging over a small dimension.

Since this regime is very stable, EnKF without localization also has surprisingly good performance, as shown in subplot b). Its MSE is around 0.15, which is worse than LEnKF. This shows that, while the conditions of Theorem 2.5 are sufficient for LEnKF to work well, they might be too strong. It will be interesting if sharper working conditions for LEnKF can be found. It will also be interesting if one can show such strong conditions can already guarantee EnKF to work without localization.

Two other properties predicted by our theory are also validated. In subplot c), the localization status $M_{n}$ is plotted for all three dimensions. All three time sequences are stable, and they are all bounded below the theoretical estimate $8.8959$ from Theorem 2.5. We also test LEnKF with small scale system noises $\sigma^{\epsilon}_{x}=\sqrt{\epsilon}\sigma_{x},\sigma^{\epsilon}_{o}=\sqrt{\epsilon}\sigma_{o}$ . In subplot d), we plot the time mean DSE of $\epsilon=1,\frac{1}{2},\frac{1}{4},\ldots,\frac{1}{32}$ in logarithmic scales. It is clear that the LEnKF has the correct MSE scale of $\epsilon$ as Theorem 2.7 predicted.

3.3 Regime II: strong advection

The second regime we considered has a strong advection, while the damping is weak:

[TABLE]

This regime is close to unstable, since the linear system map $A_{n}$ has spectral norm 0.99. (3.2) does not have a solution below $1$ , so the conditions of Theorem 2.5 are not verifiable. Nevertheless, we find empirically the LEnKF ensemble covariance matrices are localized. In Figure 3.2, we demonstrate this by plotting

[TABLE]

using an empirical average from 1000 samples with $d=100$ , $n=100$ . The clear covariance strength transition around $x=4$ indicates that the ensemble covariance is localized. Therefore Theorem 2.4 applies and predicts that LEnKF will have a good performance.

This is indeed the case. In subplot a) of Figure 3.3, we see that LEnKF has a forecast skill. The MSE is around 1.63 for $d=100$ , where the optimal Kalman filter MSE is 1.06. This performance does not change much with the dimension, MSE=1.42 for $d=10$ , MSE=1.72 for $d=1000$ . The EnKF on the other hand is highly unstable except for the low dimension $d=10$ case. In subplot b), we see that for $d=100$ and $1000$ , the DSE of EnKF grows exponentially to $10^{10}$ . This is a phenomenon known as EnKF catastrophic filter divergence, previously studied by [6, 43]. This also demonstrates how important the localization technique is. Such divergence can be resolved by introducing an adaptive additive inflation, where the stability can be rigorously proved [42].

In this unstable regime, LEnKF retains its stability and accuracy. Since the localization structure does not have a theoretical ground in this regime, subplot c) of Figure 3.3 plots only the largest matrix component of $\widehat{C}_{n}$ . From it we see the LEnKF ensemble covariance is stochastically stable for all three dimensions. As in Regime I, we also test LEnKF with small scale system noises $\sigma^{\epsilon}_{x}=\sqrt{\epsilon}\sigma_{x},\sigma^{\epsilon}_{o}=\sqrt{\epsilon}\sigma_{o}$ , where $\epsilon=1,\frac{1}{2},\frac{1}{4},\ldots,\frac{1}{32}$ . Subplot d) indicates the LEnKF has the correct MSE scaling with $\epsilon$ .

4 Concentration of localized random matrices

In this section, we present the proof of Theorem 2.1. While part a) is more useful, it can be established easily from part b), using a similar argument as in [21].

4.1 Entry-wise concentration

It is well known that the averages of independent Gaussian variables concentrate around their expected values. Specifically, a simplified version of Theorem 1.1 from [44] is:

Theorem 4.1 (Hanson-Wright inequality).

Let $\xi\sim\mathcal{N}(0,I_{n})$ and $A$ be an $n\times n$ matrix. Then for any $t\geq 0$

[TABLE]

Here $c$ is a constant independent of other parameters. The Hilbert-Schmidt (Frobenius) norm is denoted by $\|A\|_{HS}=[\sum_{i,j}[A]_{i,j}^{2}]^{1/2}$ .

The Hanson-Wright inequality provides a straightforward way to control the random matrix entries $[Z]_{i,j}$ in Theorem 2.1.
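The scale on which a Gaussian quadratic form concentrates can be checked by simulation. The sketch below is a Monte Carlo illustration, not a proof: it uses the standard facts $\mathbb{E}\,\xi^{T}A\xi=\operatorname{tr}(A)$ and $\operatorname{Var}(\xi^{T}A\xi)=2\|A\|_{HS}^{2}$ for symmetric $A$ and $\xi\sim\mathcal{N}(0,I_{n})$ , and takes $A=K^{-1}I_{K}$ , the shape (up to a scalar) of the matrix that appears in the proof of Lemma 4.2. The values of `K`, `n_trials` and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
K, n_trials = 50, 20000
A = np.eye(K) / K                                   # symmetric, trace 1, ||A||_HS^2 = 1/K

xi = rng.standard_normal((n_trials, K))             # each row is one draw of xi ~ N(0, I_K)
quad = np.einsum("ti,ij,tj->t", xi, A, xi)          # xi^T A xi for every draw

hs_sq = np.sum(A ** 2)                              # squared Hilbert-Schmidt norm
print(f"mean = {quad.mean():.4f} (trace = {np.trace(A):.4f})")
print(f"var  = {quad.var():.4f} (2 ||A||_HS^2 = {2 * hs_sq:.4f})")
```

The fluctuation is of order $\|A\|_{HS}=K^{-1/2}$ , which is the Gaussian part of the Hanson-Wright bound.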

Lemma 4.2.

Under the conditions of Theorem 2.1, let $\Delta=Z-\mathbb{E}Z$ . There is an absolute constant $c$ such that for any $t\geq 0$ ,

[TABLE]

Proof.

For any vector $u$ , denote $\Delta_{u}=u^{T}[Z-\mathbb{E}Z]u$ . Then by symmetry,

[TABLE]

Recall that $\mathbf{e}_{i}$ is the $i$ -th standard basis vector. So it suffices to find a concentration bound for $\Delta_{u}$ with $u=\mathbf{e}_{i}\pm\mathbf{e}_{j}$ . To do that, note that $u^{T}\Sigma_{z}u=\mathbb{E}u^{T}z_{k}z_{k}^{T}u$ , so we can decompose $\Delta_{u}$

[TABLE]

We denote by $\langle a,b\rangle=a^{T}b$ the inner product, and refer to the two summations above as I and II in the following. Notice that $\langle u,z_{k}\rangle\sim\mathcal{N}(0,u^{T}\Sigma_{z}u)$ and $K^{-1}\sum_{k=1}^{K}\langle u,a_{k}\rangle^{2}=u^{T}\Sigma_{a}u$ . Moreover for $u=\mathbf{e}_{i}\pm\mathbf{e}_{j}$ ,

[TABLE]

We have the same conclusion for $u^{T}\Sigma_{a}u$ . Because $\langle u,a_{k}\rangle$ is a deterministic scalar,

[TABLE]

and

[TABLE]

Because by definition of $\sigma_{a,z}$ , $u^{T}\Sigma_{a}u\cdot u^{T}\Sigma_{z}u\leq 16\sigma_{a,z}^{2}$ , by the Chernoff bound for Gaussian distributions, there is a $c_{1}>0$ so that

[TABLE]

In order to deal with II, notice that

[TABLE]

So

[TABLE]

where $A=\frac{1}{K}(u^{T}\Sigma_{z}u)I_{K}$ . Clearly, $\|A\|\leq\frac{4\sigma_{a,z}}{K}$ , and $\|A\|^{2}_{HS}\leq\frac{16\sigma_{a,z}^{2}}{K}$ . Therefore, by Theorem 4.1 there is a constant $c_{2}$ so that for all $s\geq 0$

[TABLE]

Let $t=\sigma^{-1}_{a,z}s$ , the inequality can be written as

[TABLE]

Because $|\Delta_{u}|\leq|\text{I}|+|\text{II}|$ , by the union bound, if we let $c=\min\{c_{1},c_{2}\}$ ,

[TABLE]

Finally, recall the bound above holds for all $u=\mathbf{e}_{i}\pm\mathbf{e}_{j}$ , so by (4.1)

[TABLE]

∎

Entry-wise concentration now comes as a direct corollary.

Proof of Theorem 2.1 b).

Let $\Delta=Z-\mathbb{E}Z$ . Note that $\|Z-\mathbb{E}Z\|_{\infty}=\max_{i,j=1,\ldots,d}\{|[\Delta]_{i,j}|\},$ so using the previous lemma we have our claim by the union bound

[TABLE]

∎
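The contrast between entry-wise and operator-norm deviations is easy to observe numerically. The sketch below assumes the simplest setting $a_{k}\equiv 0$ and $\Sigma_{z}=I_{d}$ ; the sizes and seed are illustrative, and the reference scale $\sqrt{\log d/K}$ is the heuristic order suggested by the union bound, not a constant from the theorem.

```python
import numpy as np

def cov_errors(d, K, seed=0):
    """Entry-wise and spectral deviation of the sample covariance from I_d."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((K, d))
    delta = z.T @ z / K - np.eye(d)                 # Z - E Z
    return np.abs(delta).max(), np.linalg.norm(delta, 2)

K = 200
results = {d: cov_errors(d, K) for d in (50, 300)}
for d, (entry, op) in results.items():
    scale = np.sqrt(np.log(d) / K)
    print(f"d = {d:3d}: max entry error = {entry:.3f}, "
          f"sqrt(log d / K) = {scale:.3f}, operator error = {op:.3f}")
```

The largest entry error grows only slowly with $d$ , while the operator-norm error grows with $d/K$ .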

4.2 Summation of entry-wise deviation

One simple fact about matrix norms is that $\|\Delta\|\leq\|\Delta\|_{1}$ for symmetric $\Delta$ . This is also exploited by [21].

Lemma 4.3.

Given a matrix $\Delta$ , the following holds

a)

If $\Delta$ is symmetric, then

[TABLE]

b)

$\|\Delta\|_{\infty}\leq\|\Delta\|$ always holds. If in addition $\Delta$ has bandwidth $l$ , then $\|\Delta\|\leq\mathcal{B}_{l}\|\Delta\|_{\infty}.$

Proof.

For a) part, recall that $\mathbf{e}_{i}$ is the $i$ -th standard basis vector. Notice that

[TABLE]

Therefore

[TABLE]

For the b) part, by the definition of operator norm, and $\|\mathbf{e}_{i}\|=\|\mathbf{e}_{j}\|=1$ , we have

[TABLE]

Taking maximum among all $i$ and $j$ , we have $\|\Delta\|_{\infty}\leq\|\Delta\|$ .

Next note that $\|\Delta\|^{2}=\|\Delta\Delta^{T}\|\leq\max_{i}\sum_{j}|[\Delta\Delta^{T}]_{i,j}|$ , and if $\Delta$ is of bandwidth $l$ , by part a)

[TABLE]

Therefore $\|\Delta\|\leq\mathcal{B}_{l}\|\Delta\|_{\infty}$ . ∎
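The norm comparisons of Lemma 4.3 can be verified on a random example. The sketch below assumes a ring with the cyclic distance of Section 3.1, for which the volume constant is taken to be $\mathcal{B}_{l}=2l+1$ (the number of sites within distance $l$ , an assumed reading of (2.8)); the matrix and seed are arbitrary.

```python
import numpy as np

def cyclic_dist_matrix(d):
    """Cyclic distances d(i,j) = min(|i-j|, d-|i-j|) on a ring of d sites."""
    idx = np.arange(d)
    gap = np.abs(idx[:, None] - idx[None, :])
    return np.minimum(gap, d - gap)

rng = np.random.default_rng(2)
d, l = 60, 2
dist = cyclic_dist_matrix(d)
G = rng.standard_normal((d, d))
Delta = np.where(dist <= l, G + G.T, 0.0)      # symmetric matrix of bandwidth l

op = np.linalg.norm(Delta, 2)                  # operator norm ||Delta||
row_sum = np.abs(Delta).sum(axis=1).max()      # ||Delta||_1 = max_i sum_j |Delta_ij|
entry_max = np.abs(Delta).max()                # entry-wise ||Delta||_inf
B_l = int((dist[0] <= l).sum())                # sites within distance l of site 0

print(f"{entry_max:.3f} <= {op:.3f} <= min({row_sum:.3f}, {B_l * entry_max:.3f})")
```

Both upper bounds depend only on the bandwidth, not on $d$ , which is what makes them useful for localized matrices.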

Now Theorem 2.1 a) comes as a direct corollary:

Proof of Theorem 2.1 a).

Let $\Delta=Z-\mathbb{E}Z$ . By Lemma 4.3 a),

[TABLE]

Therefore by part b) of this theorem,

[TABLE]

∎

5 Error analysis of LEnKF

5.1 Localization inconsistency

Lemma 5.1.

Fix an $L>l$ , if matrix $A$ is of bandwidth $l$ , the difference caused by commuting localization and bilinear product with $A$

[TABLE]

has nonzero entries only for indices $(i,j)$ with $|\mathbf{d}(i,j)-L|\leq 2l$ .

If in addition, matrix $C$ follows an $(M,\Phi,L)$ -localized structure, then

[TABLE]

Recall that $\mathcal{B}_{l}$ is the volume constant given by (2.8).

Proof.

By the matrix product rule,

[TABLE]

If $\mathbf{d}(i,j)>L+2l$ , note that $[A]_{i,u}[A]_{j,v}\neq 0$ only when $\mathbf{d}(i,u)\leq l,\mathbf{d}(j,v)\leq l$ . But for these terms, by the triangle inequality $\mathbf{d}(u,v)>L$ , and they are not included in (5.1). Therefore (5.1) $=0$ .

If $\mathbf{d}(i,j)\leq L$ , it is easy to verify that $[\Delta]_{i,j}=-\sum_{\mathbf{d}(u,v)>L}[A]_{i,u}[C]_{u,v}[A]_{j,v}$ . Moreover, $[A]_{i,u}[A]_{j,v}\neq 0$ only when $\mathbf{d}(i,u)\leq l,\mathbf{d}(j,v)\leq l$ . So if $\mathbf{d}(i,j)<L-2l$ , then by the triangle inequality $\mathbf{d}(u,v)<L$ and $[\Delta]_{i,j}=0$ .

Next, we assume $C$ follows an $(M,\Phi,L)$ -localized structure. If $L<\mathbf{d}(i,j)$ , then among the nonzero terms in $[\Delta]_{i,j}=\sum_{\mathbf{d}(u,v)\leq L}[A]_{i,u}[C]_{u,v}[A]_{j,v}$ , $\mathbf{d}(u,v)\geq L-2l$ by the triangle inequality. This leads to

[TABLE]

Here we used that

[TABLE]

If $L-2l\leq\mathbf{d}(i,j)\leq L$ , then by $[\Delta]_{i,j}=-\sum_{\mathbf{d}(u,v)>L}[A]_{i,u}[C]_{u,v}[A]_{j,v}$ ,

[TABLE]

where we applied the inequality

[TABLE]

In either case, we have the bound we claim, since $\Phi(L)\leq\Phi(L-2l)$ . ∎
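The support statement of Lemma 5.1 can be confirmed numerically. The sketch below takes the difference to be $\Delta=A[C\circ\mathbf{D}_{L}]A^{T}-[ACA^{T}]\circ\mathbf{D}_{L}$ , which matches the sign conventions used in the proof, and assumes the cut-off mask is the indicator of $\mathbf{d}(i,j)\leq L$ on a ring; the random matrices, sizes and seed are arbitrary.

```python
import numpy as np

def cyclic_dist_matrix(d):
    """Cyclic distances d(i,j) = min(|i-j|, d-|i-j|) on a ring of d sites."""
    idx = np.arange(d)
    gap = np.abs(idx[:, None] - idx[None, :])
    return np.minimum(gap, d - gap)

rng = np.random.default_rng(3)
d, l, L = 40, 1, 6
dist = cyclic_dist_matrix(d)

A = np.where(dist <= l, rng.standard_normal((d, d)), 0.0)   # bandwidth l
G = rng.standard_normal((d, d))
C = G @ G.T / d                                              # generic dense PSD matrix
D_L = (dist <= L).astype(float)                              # cut-off mask (assumed convention)

# Localize then propagate, minus propagate then localize.
Delta = A @ (C * D_L) @ A.T - (A @ C @ A.T) * D_L

support = np.abs(Delta) > 1e-10                 # tolerance absorbs round-off
outside_band = np.abs(dist - L) > 2 * l
print("nonzero entries outside the band:", int((support & outside_band).sum()))
```

All surviving entries sit within $2l$ of the localization boundary $\mathbf{d}(i,j)=L$ , even though $C$ itself is dense.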

Proof of Proposition 2.3.

Since Schur product is a linear operation, we can decompose the localization inconsistency as

[TABLE]

Since both $\widehat{K}_{n}$ and $\Sigma_{n}$ are of bandwidth at most $l$ , $A_{n}\widehat{K}_{n}\widehat{K}_{n}^{T}A_{n}^{T}$ has bandwidth at most $4l$ by the triangle inequality. Since $L\geq 4l$ ,

[TABLE]

In other words, $\Delta_{loc}$ is

[TABLE]

to which Lemma 5.1 can be applied. Next, we bound $\|A_{n}(I-\widehat{K}_{n}H)\|_{\infty}$ . By $\|H\|=1$ , $\|A_{n}\|\leq M_{A}$ and Lemma 4.3 b),

[TABLE]

In domain localization (2.10), $\widehat{K}_{n}H$ has bandwidth $l$ . To see this, note that

[TABLE]

Since $\widehat{C}^{i}_{n}$ has nonzero entries only in $\mathcal{I}_{i}\times\mathcal{I}_{i}$ ,

[TABLE]

Also $[\widehat{C}^{i}_{n}]_{i,o_{k}}=0$ if $\mathbf{d}(o_{k},i)>l$ . Therefore, $[\widehat{K}_{n}H]_{i,j}=0$ if $\mathbf{d}(i,j)>l$ .

By Lemma 4.3 b), $\|\widehat{K}_{n}H\|\leq\mathcal{B}_{l}\|\widehat{K}_{n}H\|_{\infty}$ . Since the $i$ -th row of $\widehat{K}_{n}H$ is the $i$ -th row of $K^{i}_{n}H$ , by Lemma 4.3 b),

[TABLE]

Moreover, by definition (2.9) and Lemma 4.3 a)

[TABLE]

Note that $\widehat{C}^{i}_{n}$ has nonzero entries only in $\mathcal{I}_{i}\times\mathcal{I}_{i}$ , by Lemma 4.3,

[TABLE]

Moreover, since $\widehat{C}_{n}$ follows an $(M_{n},\Phi,L)$ structure, $\|\widehat{C}_{n}\|_{\infty}\leq M_{n}$ . Summing up, the domain localized Kalman gain can be bounded by

[TABLE]

Then by Lemma 5.1, the localization inconsistency matrix is bounded entry-wise

[TABLE]

while $|[\Delta]_{i,j}|=0$ if $|\mathbf{d}(i,j)-L|>2l$ . So there are at most $\mathcal{B}_{L,l}=\max_{i}\#\{j,|\mathbf{d}(i,j)-L|\leq 2l\}$ nonzero entries in each row.

As a consequence

[TABLE]

∎

5.2 Component information gain through filtering

One of the fundamental properties of the Kalman filter is that the assimilation of observations improves the estimate. Mathematically, this is expressed by the fact that the forecast covariance matrix dominates the posterior covariance matrix. Unfortunately, with LEnKF, this natural property, $\widehat{C}_{n}\succeq(I-\widehat{K}_{n}H)\widehat{C}_{n}(I-\widehat{K}_{n}H)^{T}+\sigma_{o}^{2}\widehat{K}_{n}\widehat{K}_{n}^{T}$ , may no longer hold. However, we can still show the dominance at the diagonal entries.

Proposition 5.2.

The assimilation step lowers the variance at each component:

[TABLE]

Proof.

Recall that the $i$ -th coordinate of $\Delta\widehat{X}^{(k)}_{n}$ is updated through the Kalman gain matrix $\widehat{K}^{i}_{n}$ . Therefore,

[TABLE]

Moreover, in (5.2) we have shown that $[\widehat{K}^{i}_{n}H]_{i,j}\neq 0$ only when $\mathbf{d}(i,j)\leq l$ , so

[TABLE]

Note that the right side is the posterior Kalman covariance with the forecast covariance being $\widehat{C}^{i}_{n}$ . Therefore by

[TABLE]

we have

[TABLE]

∎
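For the plain Kalman update without localization, the dominance property recalled at the start of this subsection can be checked directly. The sketch below is an illustration under assumptions: it uses the standard optimal gain $K=CH^{T}(HCH^{T}+\sigma_{o}^{2}I)^{-1}$ , a random forecast covariance, and a hypothetical observation operator selecting every other component. It does not implement the domain-localized gain $\widehat{K}^{i}_{n}$ of (2.9).

```python
import numpy as np

rng = np.random.default_rng(4)
d, sigma_o = 8, 0.5
G = rng.standard_normal((d, d))
C = G @ G.T / d + 0.1 * np.eye(d)               # a generic forecast covariance
H = np.eye(d)[::2]                              # observe components 0, 2, 4, 6

S = H @ C @ H.T + sigma_o**2 * np.eye(H.shape[0])
K_gain = C @ H.T @ np.linalg.inv(S)             # unlocalized Kalman gain
I_KH = np.eye(d) - K_gain @ H

# Posterior covariance in the form quoted in the text.
P = I_KH @ C @ I_KH.T + sigma_o**2 * K_gain @ K_gain.T

gap = C - P
min_eig = np.linalg.eigvalsh((gap + gap.T) / 2).min()
print(f"smallest eigenvalue of C - P: {min_eig:.2e}")
print("diagonal variances reduced:", bool(np.all(np.diag(P) <= np.diag(C) + 1e-12)))
```

With the optimal gain the full matrix ordering holds; Proposition 5.2 states that, for LEnKF, only the diagonal part of this ordering is retained.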

5.3 Sampling error

First, we have the following general integral lemma

Lemma 5.3.

If $Y$ is a nonnegative random variable that satisfies

[TABLE]

Then for any $\delta\in(0,1)$ , if $K\geq\Gamma(M\delta^{-1},d)$ , where

[TABLE]

we have $\mathbb{E}Y\leq\delta$ and $\mathbb{E}Y^{2}\leq 2M\delta$ .

Proof.

Let $\epsilon=\frac{\delta}{3M}$ , and $X=Y/M$ , we have $K\geq\max\{\epsilon^{-2},\tfrac{8}{c\epsilon},\frac{2}{c\epsilon^{2}}\log d\}$ , and

[TABLE]

We will show that $\mathbb{E}X\leq 3\epsilon$ and $\mathbb{E}X^{2}\leq 6\epsilon$ , which are equivalent to our claims. Recall the integration by parts formula for nonnegative random variables, $\mathbb{E}X=\int^{\infty}_{0}\mathbb{P}(X>x)dx$ ,

[TABLE]

Note that with our requirement on $K$ , $d^{2}\exp(-cK\epsilon)\leq 1$ ,

[TABLE]

And for $t>\epsilon$ , $8\leq 2\epsilon cKt$ , so

[TABLE]

As for $\mathbb{E}X^{2}$ , we again apply the integration by parts formula

[TABLE]

We used $K\geq\max\{\epsilon^{-2},\tfrac{8}{c\epsilon},\frac{2}{c\epsilon^{2}}\log d\}$ in the last line. ∎
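The two tail-integral identities used in this proof, $\mathbb{E}X=\int_{0}^{\infty}\mathbb{P}(X>x)dx$ and $\mathbb{E}X^{2}=\int_{0}^{\infty}2x\,\mathbb{P}(X>x)dx$ , can be sanity-checked on a distribution with a known tail. The sketch below uses $X\sim\mathrm{Exp}(1)$ , for which $\mathbb{P}(X>x)=e^{-x}$ , $\mathbb{E}X=1$ and $\mathbb{E}X^{2}=2$ ; this choice and the grid are illustrative and unrelated to the random variable $Y$ of the lemma.

```python
import numpy as np

def trapezoid(f, dx):
    """Composite trapezoid rule on a uniform grid with spacing dx."""
    return dx * (f[:-1] + f[1:]).sum() / 2.0

x = np.linspace(0.0, 50.0, 500001)       # the tail beyond 50 is negligible
dx = x[1] - x[0]
tail = np.exp(-x)                        # P(X > x) for X ~ Exp(1)

EX = trapezoid(tail, dx)                 # E X   = int_0^inf P(X > x) dx
EX2 = trapezoid(2.0 * x * tail, dx)      # E X^2 = int_0^inf 2 x P(X > x) dx
print(f"E X = {EX:.6f}, E X^2 = {EX2:.6f}")
```

In the lemma the same identities convert the tail bound on $Y$ into bounds on its first two moments.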

Corollary 5.4.

Under condition 1) of Theorem 2.4, suppose $\widehat{C}_{n}$ follows $(M_{n},\Phi,L)$ -localized structure. For any $\epsilon\in(0,1)$ , if

a) $K>\Gamma(\mathcal{B}_{L}\epsilon^{-1},d)$ , then the sampling error

[TABLE]

b) $K>\Gamma(rC\epsilon^{-1},d)$ for any $C\geq 1$ , then the entry-wise sampling error

[TABLE]

Proof.

We apply Theorem 2.1 with

[TABLE]

and $\mathbf{D}_{L}=\mathbf{D}_{cut}^{L}$ . Then

[TABLE]

Note that $\Sigma_{a}\preceq\Sigma_{a}+\Sigma_{z}$ and $\Sigma_{z}\preceq\Sigma_{a}+\Sigma_{z}=r\mathcal{R}_{n}(\widehat{C}_{n})$ , where recall

[TABLE]

Therefore

[TABLE]

Moreover, since $Q_{n}$ is positive semidefinite (PSD),

[TABLE]

Moreover, by Proposition 5.2,

[TABLE]

Since $\mathcal{R}_{n}(\widehat{C}_{n})$ is PSD, and by Lemma 4.3 $\|A_{n}\|_{\infty}\leq\|A_{n}\|\leq M_{A}$ ,

[TABLE]

Apply Theorem 2.1, since $\|\mathbf{D}_{cut}^{L}\|_{1}=\max_{i}\sum_{j:\mathbf{d}(i,j)<L}1=\mathcal{B}_{L}$ , we have that

[TABLE]

Here $\mathbb{P}_{n}$ denotes the probability conditioned on $\mathcal{F}_{n}$ . Applying Lemma 5.3 to both inequalities, with $\delta=\epsilon$ for the first and $\delta=\epsilon C^{-1}$ for the second, yields the claimed results. ∎
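To give a feel for why localization keeps the sampling error under control when $K$ is much smaller than $d$, the sketch below compares a raw sample covariance with its Schur product against a cut-off mask. It is a toy experiment under stated assumptions, not a reproduction of Theorem 2.1: the true covariance has entries $\lambda^{|i-j|}$, the mask is taken as $\mathds{1}_{|i-j|\leq L}$ (the exact cut-off convention is assumed), and the error is measured in the operator norm. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, L, lam = 100, 20, 3, 0.3

idx = np.arange(d)
dist = np.abs(idx[:, None] - idx[None, :])
C_true = lam ** dist.astype(float)            # localized covariance lam^{|i-j|}

# Draw K samples with covariance C_true and form the sample covariance.
chol = np.linalg.cholesky(C_true)
samples = rng.standard_normal((K, d)) @ chol.T
centered = samples - samples.mean(axis=0)
C_sample = centered.T @ centered / (K - 1)

mask = (dist <= L).astype(float)              # assumed cut-off localization matrix
C_localized = C_sample * mask                 # Schur (entry-wise) product

err_raw = np.linalg.norm(C_sample - C_true, 2)
err_localized = np.linalg.norm(C_localized - C_true, 2)
max_row_count = int(mask.sum(axis=1).max())   # max number of entries kept per row

print("operator-norm error, raw sample covariance:", err_raw)
print("operator-norm error, localized covariance: ", err_localized)
print("largest row count of the mask:", max_row_count)
```

With only 20 samples in dimension 100 the raw sample covariance is rank deficient, while the localized version keeps at most `2 * L + 1` noisy entries per row, which is the role played by $\mathcal{B}_{L}$ in the corollary.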

5.4 Error analysis

Next, we proceed to prove Theorem 2.4.

Proof of Theorem 2.4.

For each time $n$ , let $r_{n}$ be the smallest number such that the following hold,

[TABLE]

We will try to find a recursive upper bound of $r_{n+1}$ in terms of $r_{n}$ .

Step 1: tracking the filter error. Recall that the forecast error at time $n+1$ is given by (2.15), and its covariance conditioned on the sample noise realization is

[TABLE]

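By Young’s inequality $(a+b)(a+b)^{T}\preceq 2aa^{T}+2bb^{T}$ , which holds because the difference $2aa^{T}+2bb^{T}-(a+b)(a+b)^{T}=(a-b)(a-b)^{T}$ is PSD, and the fact that $HH^{T}=I_{q}$ ,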

[TABLE]

Moreover, $A_{n}A_{n}^{T}\preceq M^{2}_{A}I_{d}\preceq\frac{M^{2}_{A}}{m_{\Sigma}}\Sigma_{n}$ . Denote $D_{\Sigma}=\max\{\frac{2M^{2}_{A}}{m_{\Sigma}},\frac{2}{\sigma_{o}^{2}}\}$ , then

[TABLE]

Furthermore,

[TABLE]

Recall that $\mathcal{R}_{n}^{\prime}(\widehat{C}_{n})$ in (2.16) is

[TABLE]

Therefore

[TABLE]

With our condition 2) on $\rho$ ,

[TABLE]

so $\mathbb{E}_{S}\hat{e}_{n+1}\otimes\hat{e}_{n+1}\preceq\max\{1,r_{n}/r\}r\mathcal{R}_{n}^{\prime}(\widehat{C}_{n})$ .

Step 2: difference between filter error covariance and its estimate.

The EnKF estimates the error covariance by the ensemble covariance $\widehat{C}_{n+1}$ . Its conditional expectation is

[TABLE]

In order to establish a control of the new filter error using localized ensemble covariance matrix, consider the difference

[TABLE]

The first part of (5.3) is the error caused by sampling. By Corollary 5.4, if we denote

[TABLE]

then $\mathbb{E}_{n}\mu_{n+1}\leq(\mathcal{B}^{2}_{l}M_{A}^{2}M_{n}+M_{\Sigma})\delta/r$ if $K$ satisfies condition 5).

The second part of (5.3) is the localization inconsistency. By Proposition 2.3, we have

[TABLE]

Summing these two parts up,

[TABLE]

Then

[TABLE]

Recall that in step 1, we have $\mathbb{E}_{S}\hat{e}_{n+1}\hat{e}_{n+1}^{T}\preceq\max\{1,\tfrac{r_{n}}{r}\}r\mathcal{R}_{n}^{\prime}(\widehat{C}_{n})$ , so if we let $r_{n+1}$ be the smallest number such that

[TABLE]

then

[TABLE]

Step 3: long time stability analysis. Since $r_{*}\leq r$

[TABLE]

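Taking the logarithm of (5.4), and using that $\log(1+x+y^{3})\leq x+2y$ for all $x,y\geq 0$ (which follows from $\log(1+x+y^{3})\leq\log(1+x)+\log(1+y^{3})$ , $\log(1+x)\leq x$ and $\log(1+y^{3})\leq 2y$ ),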

[TABLE]

Summing this inequality over $n=0,\ldots,{T-1}$ , we have

[TABLE]

Because $r_{T}\geq 1$ ,

[TABLE]

Take expectation,

[TABLE]

Step 4: Upper bounds for (5.5). Recall in step 2 we have that

[TABLE]

Next, note the following holds because $\mathcal{B}_{l}\geq 1$

[TABLE]

With condition 4), we have

[TABLE]

so

[TABLE]

In conclusion,

[TABLE]

Plug these bounds to (5.5), and then use (2.20)

[TABLE]

For our result, simply notice that

[TABLE]

∎

6 Localized covariance for linear LEnKF systems

As discussed in the introduction, the existence of a localized covariance structure is often assumed in practice to motivate the localization technique. Our result, Theorem 2.4, shows that such a structure can indeed guarantee estimation performance, provided the parameters and sample size are properly tuned. It is then natural to ask when a stable localized structure exists. This is an interesting and important question in itself, but answering it rigorously for general signal-observation systems is beyond the scope of this paper. Here we demonstrate how to verify a stable localized covariance for simple linear models.

6.1 Localized covariance propagation with weak local interactions

As discussed in Theorem 2.4, we require $A_{n}$ to have a short bandwidth $l$ . In other words, within one time step, interaction exists only between components that are at most distance $l$ apart. When $l=1$ , this type of interaction is often called nearest neighbor interaction, and it includes many statistical physics models with proper spatial discretization.

Generally speaking, localized covariance is formed through weak local interactions. With linear dynamics described by $A_{n}$ , one way to enforce a weak local interaction is through (2.22). We will show in this subsection that weak local interaction propagates a localized covariance structure of form $[\widehat{C}_{n}]_{i,j}\propto\lambda_{A}^{\mathbf{d}(i,j)}$ , from diagonal entries of the covariance matrix to entries further away from diagonal.

To describe the state of localization in covariance matrices $\widehat{C}_{n}$ and $C_{n}$ , we define the following quantities

[TABLE]

Then clearly, the forecast covariance matrices follow the $(M_{n},\lambda^{x}_{A},L)$ localized structure with $M_{n}=\widehat{M}_{n,L}$ . The goal of this section is to show that $\widehat{M}_{n,L}$ is a stable stochastic sequence.

The following properties hold immediately because the matrices involved are PSD.

Lemma 6.1.

Given positive semidefinite (PSD) matrices $C_{n},\widehat{C}_{n}$ , define $M_{n,l},\widehat{M}_{n,l}$ as in (6.1), we have $\widehat{M}_{n,0}=\max_{i}[\widehat{C}_{n}]_{i,i}$ ,

[TABLE]

The same properties also hold for $M_{n,k}$ .

Proof.

Recall that $[\widehat{C}_{n}]_{i,j}$ is the ensemble covariance, so for $i\neq j$

[TABLE]

Therefore

[TABLE]

The monotonicity of $\widehat{M}_{n,k}$ in $k$ is quite obvious since $\mathbf{d}(i,j)\wedge k\leq\mathbf{d}(i,j)\wedge(k+1)$ , and

[TABLE]

∎
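The two properties in Lemma 6.1 can be illustrated numerically. Since the display (6.1) is not reproduced here, the sketch below assumes the localization status has the form $\widehat{M}_{n,k}=\max_{i,j}|[\widehat{C}_{n}]_{i,j}|\lambda_{A}^{-(\mathbf{d}(i,j)\wedge k)}$, which is inferred from the surrounding text, and it uses $\mathbf{d}(i,j)=|i-j|$. The function name `localization_status` and all parameter values are hypothetical.

```python
import numpy as np

def localization_status(C, lam, k):
    """Assumed form of (6.1): max_{i,j} |C[i,j]| * lam^{-min(d(i,j), k)}, d(i,j) = |i - j|."""
    idx = np.arange(C.shape[0])
    dist = np.abs(idx[:, None] - idx[None, :])
    weight = lam ** (-np.minimum(dist, k).astype(float))
    return float(np.max(np.abs(C) * weight))

rng = np.random.default_rng(2)
d, lam = 30, 0.5
B = rng.standard_normal((d, d))
C = B @ B.T / d                          # a generic PSD matrix

status = [localization_status(C, lam, k) for k in range(6)]
max_variance = float(np.diag(C).max())

print("status for k = 0..5:", status)
print("largest diagonal entry:", max_variance)
```

For a PSD matrix the largest entry in absolute value sits on the diagonal, so `status[0]` should match `max_variance`, and the sequence should be nondecreasing in `k` because $\lambda_{A}<1$.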

Next, we investigate how the forecast step changes the state of localization.

Proposition 6.2.

Suppose $\Sigma_{n}=\sigma_{\xi}^{2}I_{d}$ and the linear dynamics admits a weak local interaction satisfying (2.22). Then the forecast step propagates the localization in covariance. In particular, given any covariance matrix $C_{n}$ , let $\widehat{C}_{n+1}=A_{n}C_{n}A_{n}^{T}+\Sigma_{n}$ ; then the localization states described by (6.1) satisfy

[TABLE]

Proof.

Note that $[\widehat{C}_{n+1}]_{i,j}=[A_{n}C_{n}A_{n}^{T}]_{i,j}+\sigma_{\xi}^{2}\mathds{1}_{i=j}$ . Moreover

[TABLE]

which by (2.22) is bounded by $\lambda_{A}^{2}M_{n,k}\lambda_{A}^{\mathbf{d}(i,j)\wedge k}$ .

By Lemma 6.1,

[TABLE]

Moreover,

[TABLE]

∎
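The key bound in this proof, $|[A_{n}C_{n}A_{n}^{T}]_{i,j}|\leq\lambda_{A}^{2}M_{n,k}\lambda_{A}^{\mathbf{d}(i,j)\wedge k}$, can be tested on a small example. The sketch below makes two assumptions that are not confirmed by this section: condition (2.22) is read as $\sum_{j}|[A_{n}]_{i,j}|\lambda_{A}^{-\mathbf{d}(i,j)}\leq\lambda_{A}$ for every $i$, and the status $M_{n,k}$ is read as $\max_{i,j}|[C_{n}]_{i,j}|\lambda_{A}^{-(\mathbf{d}(i,j)\wedge k)}$ with $\mathbf{d}(i,j)=|i-j|$. The tridiagonal matrix and all names are hypothetical.

```python
import numpy as np

def line_distance(d):
    idx = np.arange(d)
    return np.abs(idx[:, None] - idx[None, :])

def localization_status(C, lam, k, dist):
    """Assumed form of (6.1): max_{i,j} |C[i,j]| * lam^{-min(d(i,j), k)}."""
    weight = lam ** (-np.minimum(dist, k).astype(float))
    return float(np.max(np.abs(C) * weight))

d, lam, k, sigma_xi = 30, 0.8, 4, 0.3
dist = line_distance(d)

# Tridiagonal dynamics (bandwidth l = 1) with weak off-diagonal coupling.
a_diag, a_off = 0.5, 0.1
A = a_diag * np.eye(d) + a_off * (np.eye(d, k=1) + np.eye(d, k=-1))

# Assumed reading of (2.22): weighted absolute row sums are at most lam.
weak_interaction = float(np.max(np.sum(np.abs(A) * lam ** (-dist.astype(float)), axis=1)))

rng = np.random.default_rng(4)
B = rng.standard_normal((d, d))
C = B @ B.T / d                                  # any PSD covariance

M_k = localization_status(C, lam, k, dist)
ACA = A @ C @ A.T
C_forecast = ACA + sigma_xi**2 * np.eye(d)       # forecast covariance with Sigma = sigma_xi^2 I

# Ratio of the propagated status to the claimed bound lam^2 * M_k.
ratio = localization_status(ACA, lam, k, dist) / (lam**2 * M_k)
print("weighted row sum:", weak_interaction, "<= lam =", lam)
print("status(A C A^T) / (lam^2 * M_k):", ratio)
```

Under the assumed condition the printed ratio should not exceed 1; the noise term only adds $\sigma_{\xi}^{2}$ on the diagonal of `C_forecast`.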

6.2 Preserving a localized structure with sparse observations

From now on, we require the observations to be sparse in the sense that $\mathbf{d}(o_{i},o_{j})>2l$ for any $i\neq j$ . Then for each location $i\in\{1,\cdots,d\}$ , there is at most one location $o(i)\in\{o_{1},\cdots,o_{q}\}$ such that $\mathbf{d}(i,o(i))\leq l$ . If no such $o(i)$ exists, we set $o(i)=nil$ ; the analysis step does not update component $i$ , and, as we will see, the discussion for such components is trivial.

With domain localization and sparse observations, the analysis step updates the information at the $i$ -th component using only the observation at $o(i)$ . This significantly simplifies the formulation of $(H\widehat{C}^{i}_{n}H^{T}+\sigma_{o}^{2}I_{q})^{-1}$ , which is diagonal with entries $(\sigma_{o}^{2}+[\widehat{C}_{n}]_{o_{i},o_{i}})^{-1}$ in $\mathcal{I}_{i}\times\mathcal{I}_{i}$ . As a result, the Kalman update matrix has entries

[TABLE]

In fact, if we apply the covariance localization scheme instead of domain localization, the Kalman gain remains the same in this setting.
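The simplification described above can be checked directly: with sparse observations, the matrix inverse in the localized gain reduces to a scalar division. The sketch below compares the two computations. It assumes that row $i$ of the domain-localized gain is taken from the Kalman gain of the covariance restricted to $\{j:|i-j|\leq l\}$, and that the scalar entries are $[\widehat{C}_{n}]_{i,o(i)}/(\sigma_{o}^{2}+[\widehat{C}_{n}]_{o(i),o(i)})$; both are readings of the text rather than quoted formulas, and all names are hypothetical.

```python
import numpy as np

def localized_gain(C, H, sigma_o, radius):
    """Matrix form: row i from the gain of the locally restricted covariance (assumed)."""
    d, q = C.shape[0], H.shape[0]
    idx = np.arange(d)
    K = np.zeros((d, q))
    for i in range(d):
        near = np.abs(idx - i) <= radius
        C_i = np.where(np.outer(near, near), C, 0.0)
        S = H @ C_i @ H.T + sigma_o**2 * np.eye(q)
        K[i] = (C_i @ H.T @ np.linalg.inv(S))[i]
    return K

def sparse_gain(C, obs_sites, sigma_o, radius):
    """Scalar form: each component uses only its unique nearby observation, if any."""
    d = C.shape[0]
    K = np.zeros((d, len(obs_sites)))
    for i in range(d):
        for k, o in enumerate(obs_sites):
            if abs(i - o) <= radius:
                K[i, k] = C[i, o] / (sigma_o**2 + C[o, o])
    return K

rng = np.random.default_rng(3)
d, radius, sigma_o = 12, 1, 0.7
obs_sites = [1, 5, 9]                    # pairwise distance 4 > 2 * radius
B = rng.standard_normal((d, d))
C_hat = B @ B.T / d
H = np.eye(d)[obs_sites]

K_matrix = localized_gain(C_hat, H, sigma_o, radius)
K_scalar = sparse_gain(C_hat, obs_sites, sigma_o, radius)
gain_gap = float(np.abs(K_matrix - K_scalar).max())
print("largest difference between the two gains:", gain_gap)
```

Components farther than `radius` from every observation site (here 3, 7 and 11) get a zero row in both versions, which corresponds to the case $o(i)=nil$.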

Below, we investigate how the assimilation step changes the state of localization.

Proposition 6.3.

Given any covariance matrix $\widehat{C}_{n}$ , define $\widehat{K}_{n}$ as the Kalman gain in (2.10), and let

[TABLE]

Define the state of localization using (6.1). Then

[TABLE]

where

[TABLE]

Proof.

Based on Lemma 6.1, $M_{n,0}=\max_{i}|[C_{n}]_{i,i}|,\widehat{M}_{n,0}=\max_{i}|[\widehat{C}_{n}]_{i,i}|$ , so $M_{n,0}\leq\widehat{M}_{n,0}$ holds by Proposition 5.2. Next, we look at the off diagonal terms:

[TABLE]

We have the following bounds for each term in (6.2)

[TABLE]

In summary

[TABLE]

∎

Proposition 6.4.

Denote $\delta_{n+1}=\lambda_{A}^{-L}\|\widehat{C}_{n+1}-r\mathcal{R}_{n}(\widehat{C}_{n})\|_{\infty}/\|\mathcal{R}_{n}(\widehat{C}_{n})\|_{\infty},$ and

[TABLE]

Then for $k\leq L-1$ ,

[TABLE]

Proof.

Recall that

[TABLE]

Following (6.1), we define its localized status:

[TABLE]

Apply Proposition 6.3,

[TABLE]

Then apply Proposition 6.2, we find that

[TABLE]

Finally by Lemma 6.1,

[TABLE]

Since $\|\mathcal{R}_{n}(\widehat{C}_{n})\|_{\infty}\leq\lambda_{A}^{2}\widehat{M}_{n,0}+\sigma_{\xi}^{2}$ , we have our bound for $\widehat{M}_{n+1,0}$ . Likewise,

[TABLE]

∎

6.3 Stability of localized structures

Lemma 6.5.

Under the conditions of Theorem 2.5, when $K>\Gamma(r\epsilon^{-1},d)$ with $\epsilon=\min\{\tfrac{1}{2\lambda_{A}}-\tfrac{r}{2},\tfrac{\delta}{2}\}$ , the diagonal status defined by (6.1) satisfies:

[TABLE]

Therefore, by Gronwall’s inequality,

[TABLE]

Proof.

We apply Lemma 6.1, Proposition 6.3 to find that

[TABLE]

and by the first claim of Proposition 6.2,

[TABLE]

Also by Young’s inequality, one can show that

[TABLE]

With $\epsilon=\min\{\tfrac{1}{2\lambda_{A}}-\tfrac{r}{2},\tfrac{\delta}{2}\}$ , when $K>\Gamma(r\epsilon^{-1},d)$ , by Corollary 5.4 b),

[TABLE]

By $\epsilon+r\leq\lambda_{A}^{-1}$ and $\|\mathcal{R}_{n}(\widehat{C}_{n})\|_{\infty}\leq\lambda_{A}^{2}\|\widehat{C}_{n}\|_{\infty}+\sigma_{\xi}^{2}$ ,

[TABLE]

Likewise, because $(r+2\epsilon)\leq\lambda^{-1}_{A}$ ,

[TABLE]

∎
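The Gronwall step invoked in Lemma 6.5 is of the standard discrete type. As a reminder, and with placeholder constants $a$ and $b$ that stand in for the specific quantities of the lemma (which are not reproduced here), a one-step contraction bound iterates as follows; for conditional expectations the same argument applies after taking expectations at each step.

```latex
% Generic discrete Gronwall inequality, with placeholder constants 0 <= a < 1 and b >= 0:
x_{n+1} \le a\, x_{n} + b \quad \text{for all } n \ge 0
\;\Longrightarrow\;
x_{n} \le a^{n} x_{0} + b \sum_{j=0}^{n-1} a^{j}
      \le a^{n} x_{0} + \frac{b}{1-a}.
```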

Lemma 6.6.

Suppose the following holds

[TABLE]

and the sample size satisfies (2.24). Then

[TABLE]

Proof.

Case 1: if $\widehat{M}_{0,L}>\frac{4(r+\delta_{*})\sigma_{\xi}^{2}}{\lambda_{A}^{L}(1-\lambda_{A})}$ . By Lemma 6.1

[TABLE]

Then by Lemma 6.5

[TABLE]

By our choice of $n_{*}$ , $\lambda_{A}^{n_{*}-L}\leq\frac{1}{4}$ , so we have our claim, since

[TABLE]

Case 2: if $\widehat{M}_{0,L}\leq\frac{4(r+\delta_{*})\sigma_{\xi}^{2}}{\lambda_{A}^{L}(1-\lambda_{A})}$ . Consider the event

[TABLE]

Denote its complementary set as $\mathcal{U}^{c}$ . Then the expectation can be decomposed as

[TABLE]

where we applied the Cauchy inequality for the $\mathcal{U}^{c}$ part, and $\mathbb{P}_{0}$ is the probability conditioned on $\mathcal{F}_{0}$ . We will find a bound for each of the two parts.

If $\mathcal{U}$ holds, then $\delta_{n+1}\leq\delta_{*}$ for $n\leq n_{*}-1$ . By Proposition 6.4,

[TABLE]

Then by Gronwall’s inequality, under $\mathcal{U}$ ,

[TABLE]

Since $\widehat{M}_{0,0}\leq\widehat{M}_{0,L}\leq\frac{4(r+\delta_{*})\sigma_{\xi}^{2}}{\lambda_{A}^{L}(1-\lambda_{A})}$ , after $n_{0}=n_{*}-L\geq L+\lceil-\log(4r\delta_{*}^{-1}+4)/\log\lambda_{A}\rceil$ steps

[TABLE]

In the next $1\leq k\leq L$ steps, since $\delta_{n}\leq\delta_{*}$ when $\mathcal{U}$ holds, because $\psi_{\lambda_{A}}$ is increasing, by Proposition 6.4

[TABLE]

we can derive that $\widehat{M}_{n_{0}+L,L}\leq M_{*}$ . Therefore by $n_{*}=n_{0}+L$ ,

[TABLE]

In order to conclude our claim, it suffices to show that

[TABLE]

Apply Lemma 6.5 with $\delta=\delta_{*}$ , recall that $\widehat{M}_{0,L}\leq\frac{4(r+\delta_{*})\sigma_{\xi}^{2}}{\lambda_{A}^{L}(1-\lambda_{A})}$ and $16\lambda_{A}^{n_{*}-2L}\leq 1$ ,

[TABLE]

Moreover, by Theorem 2.1 b)

[TABLE]

where the final bound comes with the sample $K$ satisfying (2.24). Therefore, by the law of iterated expectation,

[TABLE]

and (6.3) comes as a result. ∎

Proof of Theorem 2.5.

Recall that $M_{n}=\widehat{M}_{n,L}$ . So

[TABLE]

has been proved by Lemma 6.6. This leads to the following using Gronwall’s inequality,

[TABLE]

Next, for $k=1,\cdots,n_{*}-1$ , apply Lemma 6.5 with $\delta=\delta_{*}$

[TABLE]

because $\widehat{M}_{0,0}=\|\widehat{C}_{0}\|_{\infty}\leq\|\widehat{C}_{0}\|$ by Lemma 6.1. Then if $k+mn_{*}\leq T$ ,

[TABLE]

Summing the inequality above over $k=0,\cdots,n_{*}-1$ , we obtain our final claim. ∎

6.4 Small noise scaling

Proof of Theorem 2.7.

It suffices to verify the conditions of Theorems 2.4 and 2.5 under the small noise scaling.

First, we check Theorem 2.5. Condition 1) is invariant except that $\Sigma_{n}=\epsilon\sigma_{\xi}^{2}I_{d}$ . Condition 2) concerns only $A_{n}$ , so it and $\lambda_{A}$ are also invariant under small noise scaling. For condition 3), suppose it holds without small noise scaling, that is

[TABLE]

This leads to

[TABLE]

Moreover, condition 3) requires that

[TABLE]

Therefore, with small noise scaling, condition 3) holds with the same $\delta_{*}$ , while $M_{*}$ is replaced by $\epsilon M_{*}$ . Condition 4) is invariant under the small noise scaling, since $\delta_{*}$ and $\lambda_{A}$ are invariant.

As a consequence, Theorem 2.5 implies the following:

[TABLE]

This yields the first claimed result, since $M_{k}=\widehat{M}_{k,L}\geq\|\widehat{C}_{k}\|_{\infty}$ by Lemma 6.1.

Next we check the conditions of Theorem 2.4. For condition 1), $m_{\Sigma}$ and $M_{\Sigma}$ need to be replaced by $\epsilon\sigma_{\xi}^{2}$ since we assume $\Sigma_{n}=\epsilon\sigma_{\xi}^{2}I_{d}$ . Condition 2) still holds with $(r_{0},\rho)\to(\epsilon^{-1}r_{0},\epsilon\rho)$ since

[TABLE]

Condition 3) is guaranteed by (6.4) above, with $M_{0}=2(1+\delta_{*})\epsilon M_{*}$ . Conditions 4) and 5) are both invariant, as they concern only geometric quantities. Finally, it suffices to plug in all the estimates for the result, and find

[TABLE]

Note that above some $\epsilon$ terms are upper-bounded by $1$ , so that the inequality has a simpler form. ∎
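The invariance arguments in this proof reflect a homogeneity of the Kalman covariance recursion: if the system noise, the observation noise variance and the covariance are all multiplied by $\epsilon$, the gain is unchanged and the next covariance is multiplied by $\epsilon$ as well. The sketch below checks this for the classical (non-ensemble) Kalman recursion. It assumes that small noise scaling multiplies both $\Sigma_{n}$ and $\sigma_{o}^{2}$ by $\epsilon$, which is a reading of the setup rather than a quoted definition; the function `kalman_cycle` and the test matrices are hypothetical.

```python
import numpy as np

def kalman_cycle(C, A, H, Sigma, sigma_o2):
    """One analysis-then-forecast step of the classical Kalman covariance recursion."""
    S = H @ C @ H.T + sigma_o2 * np.eye(H.shape[0])
    K = C @ H.T @ np.linalg.inv(S)       # Kalman gain
    C_post = C - K @ H @ C               # posterior covariance
    return A @ C_post @ A.T + Sigma      # next forecast covariance

rng = np.random.default_rng(5)
d, eps = 8, 0.01
B = rng.standard_normal((d, d))
C = B @ B.T / d
A = 0.9 * np.eye(d) + 0.05 * np.eye(d, k=1)
H = np.eye(d)[[0, 4]]                    # observe components 0 and 4
Sigma = 0.2 * np.eye(d)
sigma_o2 = 0.5

C_next = kalman_cycle(C, A, H, Sigma, sigma_o2)
C_next_scaled = kalman_cycle(eps * C, A, H, eps * Sigma, eps * sigma_o2)

scaling_gap = float(np.abs(C_next_scaled - eps * C_next).max())
print("largest deviation from exact eps-scaling:", scaling_gap)
```

The deviation should be at the level of floating-point rounding, consistent with covariance bounds such as $M_{*}$ being replaced by $\epsilon M_{*}$ while ratio-type quantities stay fixed.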

7 Conclusion and discussion

The ensemble Kalman filter (EnKF) is a popular tool for high dimensional data assimilation problems. Domain localization is an important EnKF technique that exploits the natural localized covariance structure, and simplifies the associated sampling task. We rigorously investigate the performance of localized EnKF (LEnKF) for linear systems. We show in Theorem 2.4 that in order for the filter error covariance to be dominated by the ensemble covariance, 1) the sample size $K$ needs to exceed a constant that depends on the localization radius and the logarithm of the state dimension, 2) the forecast covariance has a stable localized structure. Condition 2) is necessary for an intrinsic localization inconsistency to be bounded. This condition is usually assumed in LEnKF operations, but it can also be verified for systems with weak local interaction and sparse observation by Theorem 2.5.

While the results here provide the first successful explanation of LEnKF performance with almost dimension independent sample size, there are several issues that require further study. Below we discuss a few of them.

1. There are several ways to apply the localization technique in EnKF. We discuss here only the domain localization with standard EnKF procedures. In principle, our results can be generalized to the covariance localization/tempering technique, and also the popular ensemble square root implementation. But such generalization will not be trivial, as the Kalman gain will not be of a small bandwidth, and localization techniques will have unclear impact on the square root SVD operation.

2. This paper studies the sampling effect of LEnKF and shows the sampling error is controllable. Yet LEnKF without sampling error, in other words, LEnKF in the large ensemble limit, is not well studied mathematically. The effect of the localization techniques on the classical Kalman filter controllability and observability condition is not known. This may lead to practical guidelines in the choice of localization radius.

3. Theorem 2.5 provides the first proof that LEnKF covariance has a stable localized structure. But the conditions we impose here are quite strong, while localized structure is taken for granted in practice. How to show it in general nonlinear settings is a very interesting question.

Acknowledgement

This research is supported by the NUS grant R-146-000-226-133, where X.T.T. is the principal investigator. The author thanks Andrew J. Majda, Lars Nerger and Ramon van Handel for their discussion on various parts of this paper.
