Particle filter efficiency under limited communication

Deborshee Sen

arXiv:1904.09623·math.ST·February 21, 2022

TL;DR

This paper investigates how limited communication structures in particle filters affect their convergence and stability, demonstrating that randomized local communication can maintain efficiency and Monte Carlo convergence rates.

Contribution

It introduces the analysis of communication structure effects on particle filter stability and proposes randomized local communication to improve distributed algorithm efficiency.

Findings

01. Limited communication affects convergence and stability.

02. Randomized local communication maintains the Monte Carlo rate.

03. Good mixing properties ensure algorithm stability.

Equations

$$X_{0}\sim\pi_{0}(\cdot),\qquad X_{t}\mid(X_{t-1}=x_{t-1})\sim K_{t}(x_{t-1},\cdot)\ \ (t\geq 1),\qquad Y_{t}\mid(X_{t}=x_{t})\sim g_{t}(x_{t},\cdot)\ \ (t\geq 0).\tag{1}$$

$$\pi_{T}(\varphi)=\frac{1}{Z_{T}}\int_{\mathsf{X}^{T+1}}\pi_{0}(\text{d}x_{0})\prod_{t=1}^{T}K_{t}(x_{t-1},\text{d}x_{t})\prod_{t=0}^{T-1}g_{t}(x_{t})\,\varphi(x_{T}),\tag{2}$$

$$\gamma_{T}(\varphi)=Z_{T}\times\pi_{T}(\varphi).\tag{3}$$

$$W_{t}^{i}=\sum_{j=1}^{N}\alpha_{t-1}^{ij}\,W_{t-1}^{j}\,g_{t-1}(X_{t-1}^{j})\qquad(i=1,\dots,N).\tag{4}$$

$$P(X_{t}^{i}\in\text{d}x\mid\mathcal{F}_{t-1})=\frac{1}{W_{t}^{i}}\sum_{j=1}^{N}\alpha_{t-1}^{ij}\,W_{t-1}^{j}\,g_{t-1}(X_{t-1}^{j})\,K_{t}(X_{t-1}^{j},\text{d}x).$$

$$X_{t}^{i}\mid\mathcal{F}_{t-1}\sim\frac{1}{W_{t}^{i}}\sum_{j=1}^{N}\alpha_{t-1}^{ij}\,W_{t-1}^{j}\,g_{t-1}(X_{t-1}^{j})\,K_{t}(X_{t-1}^{j},\cdot)\quad\text{independently.}$$

$$\mathsf{F}_{t}\pi(\varphi)=\frac{\pi(g_{t-1}K_{t}\varphi)}{\pi(g_{t-1})},\qquad t\geq 1.$$

$$E_{t-1}\{\widehat{\gamma}_{t}^{N}(\varphi)\}=\frac{1}{N}\sum_{i=1}^{N}E_{t-1}\{W_{t}^{i}\,\varphi(X_{t}^{i})\}=\frac{1}{N}\sum_{j=1}^{N}W_{t-1}^{j}\,\mathsf{Q}_{t}\varphi(X_{t-1}^{j})=\widehat{\gamma}_{t-1}^{N}(\mathsf{Q}_{t}\varphi).\tag{5}$$

$$\lambda(\alpha)=\sup_{v\in\mathsf{B}_{1}^{0}}\|\alpha v\|<1,\tag{6}$$

$$\Big\|\alpha^{k}v-\Big(\sum_{i=1}^{N}\frac{v_{i}}{N}\Big)\times\mathbf{1}\Big\|\leq\lambda(\alpha)^{k}\,\Big\|v-\Big(\sum_{i=1}^{N}\frac{v_{i}}{N}\Big)\times\mathbf{1}\Big\|.\tag{7}$$

$$\|\alpha w\|^{2}\leq\frac{1-\lambda^{2}(\alpha)}{N}+\lambda^{2}(\alpha)\,\|w\|^{2}.$$

$${|\!|\!|\mu-\nu|\!|\!|}^{2}=\sup\big\{E\big[\{\mu(\varphi)-\nu(\varphi)\}^{2}\big]\;:\;\varphi\in\mathcal{B}(\mathsf{X})\big\}.$$

$$\sup_{t\geq 0}\,{|\!|\!|\widehat{\pi}_{t}^{N}-\pi_{t}|\!|\!|}\leq\textrm{Cst}\times N^{-1/2}.$$

$$N\times{|\!|\!|\widehat{\pi}_{t}^{N}-\pi_{t}|\!|\!|}^{2}\leq\frac{D}{1-\rho}\times\frac{\kappa_{g}^{4}\,\{1-\lambda^{2}(\alpha)\}}{1-\kappa_{g}^{4}\,\lambda^{2}(\alpha)}\tag{9}$$

$$N\times{|\!|\!|\widehat{\pi}_{t}^{N}-\pi_{t}|\!|\!|}^{2}\leq C_{\textrm{bootstrap}}+\textrm{Const}\times\lambda^{2}(\alpha)+\mathcal{O}\{\lambda^{4}(\alpha)\}.$$

$$\alpha\sim\upsilon_{N}\;\Longrightarrow\;P\alpha P^{-1}\sim\upsilon_{N}\quad\text{for any }N\times N\text{ permutation matrix }P.$$

$$\mu_{t}(\varphi)=\mu_{t-1}(\widetilde{\mathsf{Q}}_{t}\varphi)\times\lim_{N\to\infty}\big[E\{(\alpha^{11})^{2}\}+N\,E\{(\alpha^{12})^{2}\}\big]+Z_{t}^{2}\,\pi_{t}(\varphi)\times\lim_{N\to\infty}\big\{N^{2}E(\alpha^{12}\alpha^{13})+2N\,E(\alpha^{11}\alpha^{12})\big\};\tag{12}$$

$$\begin{aligned}\mathbb{V}_{t}^{\gamma}(\varphi)&=\mathbb{V}_{t-1}^{\gamma}(\mathsf{Q}_{t}\varphi)+\mu_{t}(\varphi^{2})-Z_{t}^{2}\,\pi_{t}(\varphi)^{2},\\ \mathbb{V}_{t}^{\pi}(\varphi)&=\frac{\mathbb{V}_{t-1}^{\pi}(\mathsf{Q}_{t}\varphi)}{\pi_{t-1}(g_{t-1})^{2}}+\mu_{t}(\overline{\varphi}^{2}),\end{aligned}$$

$$\mu_{t}(\varphi)=\frac{1}{d}\,\mu_{t-1}(\widetilde{\mathsf{Q}}_{t}\varphi)+\frac{d-1}{d}\,Z_{t}^{2}\,\pi_{t}(\varphi).$$

$$\mathbb{V}_{t}^{\gamma}(\varphi)=\big\{Z_{0}^{2}\,\mathrm{var}_{\pi_{0}}(\mathsf{Q}_{0,t}\varphi)+Z_{1}^{2}\,\mathrm{var}_{\pi_{1}}(\mathsf{Q}_{1,t}\varphi)+\dots+Z_{t}^{2}\,\mathrm{var}_{\pi_{t}}(\varphi)\big\}+\sum_{k=1}^{t}\frac{\beta_{t,k}(\varphi)}{d^{k}}$$

$$\mathbb{V}_{t}^{\gamma}(\varphi)=\mathbb{V}_{t}^{\textrm{bootstrap}}(\varphi)+\sum_{k=1}^{t}\frac{\beta_{t,k}(\varphi)}{d^{k}}\approx\mathbb{V}_{t}^{\textrm{bootstrap}}(\varphi)+\frac{\beta_{t,1}(\varphi)}{d},$$

$$\lambda(\alpha)\xrightarrow{\;\textrm{pr}\;}\frac{2\sqrt{d-1}}{d}\quad\text{as }N\to\infty.$$

$$X_{t+\Delta t,1},\qquad X_{t+\Delta t,2},\qquad X_{t+\Delta t,3}$$

$$\mathsf{F}_{t}\pi(\varphi)$$

$$\begin{aligned}{|\!|\!|\widehat{\pi}_{T}^{N}-\pi_{T}|\!|\!|}&\leq\sum_{t=1}^{T}{|\!|\!|\mathsf{F}_{t,T}\widehat{\pi}_{t}^{N}-\mathsf{F}_{t,T}\mathsf{F}_{t}\widehat{\pi}_{t-1}^{N}|\!|\!|}+{|\!|\!|\mathsf{F}_{0,T}\widehat{\pi}_{0}^{N}-\mathsf{F}_{0,T}\pi_{0}|\!|\!|}\\ &\leq\sum_{t=1}^{T}D\rho^{T-t}\,{|\!|\!|\widehat{\pi}_{t}^{N}-\mathsf{F}_{t}\widehat{\pi}_{t-1}^{N}|\!|\!|}+D\rho^{T}\,{|\!|\!|\widehat{\pi}_{0}^{N}-\pi_{0}|\!|\!|}.\end{aligned}$$

$$E_{t-1}\big[\{\widehat{\pi}_{t}^{N}(\varphi)-\mathsf{F}_{t}\widehat{\pi}_{t-1}^{N}(\varphi)\}^{2}\big]=B^{-2}\,\mathrm{var}_{t-1}(A)=B^{-2}\sum_{i=1}^{N}(W_{t}^{i})^{2}\,\mathrm{var}_{t-1}\{\varphi(X_{t}^{i})\}\leq B^{-2}\sum_{i=1}^{N}(W_{t}^{i})^{2}=\|\overline{W}_{t}\|^{2}=E_{t}^{N}.$$

$${|\!|\!|\widehat{\pi}_{t}^{N}-\mathsf{F}_{t}\widehat{\pi}_{t-1}^{N}|\!|\!|}^{2}$$

$$\begin{aligned}E_{t}^{N}&=\frac{\sum_{i=1}^{N}\big\{\sum_{j=1}^{N}\alpha^{ij}\,W_{t-1}^{j}\,g_{t-1}(X_{t-1}^{j})\big\}^{2}}{\big\{\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha^{ij}\,W_{t-1}^{j}\,g_{t-1}(X_{t-1}^{j})\big\}^{2}}\leq\kappa_{g}^{4}\sum_{i=1}^{N}\Big(\sum_{j=1}^{N}\alpha^{ij}\,\overline{W}_{t-1}^{j}\Big)^{2}=\kappa_{g}^{4}\,\|\alpha\overline{W}_{t-1}\|^{2}\\ &\leq\kappa_{g}^{4}\Big\{\frac{1-\lambda^{2}(\alpha)}{N}+\lambda^{2}(\alpha)\,\|\overline{W}_{t-1}\|^{2}\Big\}=\kappa_{g}^{4}\Big\{\frac{1-\lambda^{2}(\alpha)}{N}+\lambda^{2}(\alpha)\,E_{t-1}^{N}\Big\}.\end{aligned}$$

Full text

Particle filter efficiency under limited communication

Deborshee Sen

(Department of Mathematical Sciences, University of Bath, Bath BA2 7AY, UK)

Abstract

Sequential Monte Carlo methods are typically not straightforward to implement on parallel architectures, because standard resampling schemes involve communication between all particles. The $\alpha$-sequential Monte Carlo method was recently proposed as a potential solution that limits communication between particles. This limited communication is controlled through a sequence of stochastic matrices known as $\alpha$-matrices. We study the influence of the communication structure on the convergence and stability properties of the resulting algorithms. In particular, we quantitatively show that the mixing properties of the $\alpha$-matrices play an important role in the stability properties of the algorithm. Moreover, we prove that one can ensure good mixing properties by using randomized communication structures where each particle only communicates with a few neighboring particles. The resulting algorithms converge at the usual Monte Carlo rate. This leads to efficient versions of distributed sequential Monte Carlo.

Keywords: $\alpha$-sequential Monte Carlo; Bootstrap particle filter; Central limit theorem; Distributed algorithms; Mixing; Stability.

1 Introduction

Hidden Markov models (Rabiner and Juang, 1986), also known as state-space models (Durbin and Koopman, 2012), constitute a large class of models frequently used in statistics and signal processing. Examples of application areas include ecology (Michelot et al., 2016), finance (Nystrup et al., 2017), medical physics (Ingle et al., 2015), natural language processing (Kang et al., 2018), oceanology (Grecian et al., 2018), and sociology (Qiao et al., 2017).

A hidden Markov model with measurable state space $(\mathsf{X},\mathcal{X})$ and observation space $(\mathsf{Y},\mathcal{Y})$ is a process $\{(X_{t},Y_{t})\}_{t\geq 0}$ , where $\{X_{t}\}_{t\geq 0}$ is a Markov chain on $\mathsf{X}$ , and each observation $Y_{t}$ , valued in $\mathsf{Y}$ , is conditionally independent of the rest of the process given $X_{t}$ . Let $\pi_{0}$ and $\{K_{t}\}_{t\geq 1}$ be respectively a probability distribution and a sequence of Markov kernels on $(\mathsf{X},\mathcal{X})$ , and let $\{g_{t}\}_{t\geq 0}$ be a sequence of Markov kernels acting from $(\mathsf{X},\mathcal{X})$ to $(\mathsf{Y},\mathcal{Y})$ , with $g_{t}(x,\cdot)$ admitting a strictly positive density – denoted similarly by $g_{t}(x,y)$ – with respect to some dominating $\sigma$ -finite measure for every $t\geq 0$ , which we shall assume to be the Lebesgue measure for convenience. The hidden Markov model specified by $\pi_{0}$ , $\{K_{t}\}_{t\geq 1}$ and $\{g_{t}\}_{t\geq 0}$ is

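$$X_{0}\sim\pi_{0}(\cdot),\qquad X_{t}\mid(X_{t-1}=x_{t-1})\sim K_{t}(x_{t-1},\cdot)\ \ (t\geq 1),\qquad Y_{t}\mid(X_{t}=x_{t})\sim g_{t}(x_{t},\cdot)\ \ (t\geq 0).\tag{1}$$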

In the sequel, we fix a sequence of observations $y=\{y_{t}\}_{t\geq 0}$ and use $g_{t}(x)$ to denote $g_{t}(x,y_{t})$ for $t\geq 0$ . The functions $\{g_{t}(\cdot)\}_{t\geq 0}$ are known as potential functions and the kernels $\{K_{t}\}_{t\geq 1}$ are known as latent transition kernels. Let $\mathcal{M}(\mathsf{X})$ and $\mathcal{P}(\mathsf{X})$ denote the set of measures and probability measures on $(\mathsf{X},\mathcal{X})$ , respectively, and let $\mathcal{B}(\mathsf{X})$ denote the set of all real-valued measurable functions on $(\mathsf{X},\mathcal{X})$ which are bounded by one in absolute value. For a measure $\pi\in\mathcal{M}(\mathsf{X})$ and a function $\varphi\in\mathcal{B}(\mathsf{X})$ , we define $\pi(\varphi)=\int_{\mathsf{X}}\varphi(x)\pi(\text{d}x)$ , and for a Markov kernel $K$ on $(\mathsf{X},\mathcal{X})$ , we define $K\varphi(x)=\int_{\mathsf{X}}\varphi(x^{\prime})K(x,\text{d}x^{\prime})$ . We use the notation $Y_{s:t}$ for $s\leq t$ to denote $(Y_{s},\dots,Y_{t})$ .

We focus our attention on the predictive distribution in this article, which is the distribution of $X_{T}\mid Y_{0:(T-1)}$ for $T\geq 1$ . The analysis developed can be straightforwardly extended to the filtering distribution, which is the distribution of $X_{T}\mid Y_{0:T}$ . We denote the predictive distribution by $\pi_{T}\{X_{T}\mid Y_{0:(T-1)}\}$ for $T\geq 1$ . Integrals of functions $\varphi\in\mathcal{B}(\mathsf{X})$ with respect to the predictive distribution can be written as

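$$\pi_{T}(\varphi)=\frac{1}{Z_{T}}\int_{\mathsf{X}^{T+1}}\pi_{0}(\text{d}x_{0})\prod_{t=1}^{T}K_{t}(x_{t-1},\text{d}x_{t})\prod_{t=0}^{T-1}g_{t}(x_{t})\,\varphi(x_{T}),\tag{2}$$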

where $Z_{T}$ is the normalisation constant, which is the marginal likelihood of the observations $Y_{0:(T-1)}$ given by $Z_{T}=\int_{\mathsf{X}^{T+1}}\pi_{0}(\text{d}x_{0})\prod_{t=1}^{T}K_{t}(x_{t-1},\text{d}x_{t})\prod_{t=0}^{T-1}g_{t}(x_{t})$ ; we also define $Z_{0}=1$ . For the purpose of analysis, it is useful to also consider the unnormalised measure $\gamma_{T}$ defined as

$$\gamma_{T}(\varphi)=Z_{T}\times\pi_{T}(\varphi).\tag{3}$$

Unfortunately, these integrals cannot be evaluated analytically except in special cases such as linear Gaussian models, and Monte Carlo methods must be used instead. The bootstrap particle filter algorithm (Gordon et al., 1993) is commonly used for inference in hidden Markov models. It starts by generating $N\geq 1$ independent and identically distributed samples, termed particles, $X_{0}=\{X_{0}^{i}\}_{i=1}^{N}$ from the distribution $\pi_{0}$. Given particles $X_{t-1}=\{X_{t-1}^{i}\}_{i=1}^{N}$, it performs multinomial resampling according to (unnormalised) weights $\{g_{t-1}(X_{t-1}^{i})\}_{i=1}^{N}$, before propagating the particles via the Markov kernel $K_{t}$. At each time $t\geq 0$, the bootstrap particle filter provides a particle approximation of the predictive distribution $\pi_{t}$ and the normalisation constant $Z_{t}$.
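The bootstrap particle filter described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the author's implementation: the callables `sample_pi0`, `sample_K` and `g`, and the toy linear Gaussian model used to call them, are assumptions introduced here for the example.

```python
import numpy as np

def bootstrap_particle_filter(sample_pi0, sample_K, g, T, N, rng):
    """Sketch of the bootstrap particle filter for the predictive distribution pi_T.

    sample_pi0(N, rng)   -> N i.i.d. draws from pi_0            (hypothetical helper)
    sample_K(t, x, rng)  -> one draw from K_t(x_i, .) per entry  (hypothetical helper)
    g(t, x)              -> potentials g_t(x_i) = g_t(x_i, y_t)  (hypothetical helper)
    """
    x = sample_pi0(N, rng)               # X_0^i ~ pi_0, i.i.d.
    Z_hat = 1.0                          # Z_0 = 1 by convention
    for t in range(1, T + 1):
        w = g(t - 1, x)                  # unnormalised weights g_{t-1}(X_{t-1}^i)
        Z_hat *= w.mean()                # running estimate of Z_t
        ancestors = rng.choice(N, size=N, p=w / w.sum())  # multinomial resampling
        x = sample_K(t, x[ancestors], rng)                # propagate through K_t
    return x, Z_hat

# Toy model, assumed only for illustration:
# X_0 ~ N(0, 1), X_t = 0.9 X_{t-1} + N(0, 1), Y_t = X_t + N(0, 1).
rng = np.random.default_rng(0)
y = np.array([0.3, -0.1, 0.8])           # made-up observations y_0, y_1, y_2
particles, Z_hat = bootstrap_particle_filter(
    sample_pi0=lambda N, rng: rng.standard_normal(N),
    sample_K=lambda t, x, rng: 0.9 * x + rng.standard_normal(x.shape),
    g=lambda t, x: np.exp(-0.5 * (y[t] - x) ** 2) / np.sqrt(2 * np.pi),
    T=3, N=1000, rng=rng,
)
```

The equally weighted `particles` approximate $\pi_{3}$ and `Z_hat` approximates $Z_{3}$. The call to `rng.choice` is the step in which every particle may select any other particle as its ancestor, which is the all-to-all communication that makes the resampling step hard to parallelise.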

Parallel and distributed algorithms have become increasingly relevant as parallel computing architectures have become the norm rather than the exception. While there has been significant research devoted to distributed Markov chain Monte Carlo algorithms (Ahn et al., 2014; Scott et al., 2016; Li et al., 2017; Heng and Jacob, 2019; Ou et al., 2021), the same has generally not been true for particle filtering. The resampling step of particle filters makes them difficult to parallelize. Two parallel implementations of the resampling step were proposed by Bolic et al. (2005), and alternative schemes were investigated by Miao et al. (2011), Murray (2012), and Murray et al. (2016). Vergé et al. (2015) provided algorithms involving resampling at two hierarchical levels, and Del Moral et al. (2017) proved convergence and a central limit theorem. Míguez (2014) and Míguez and Vázquez (2016) provided proofs of convergence for distributed particle filters relying on techniques developed in Bolic et al. (2005); however, these assumed that a certain notion of weight degeneracy does not occur. We prove in this article that weight degeneracy can be avoided by suitably choosing network architectures of distributed particle filters. Heine et al. (2020) designed stable-in-time distributed sequential Monte Carlo algorithms with limited interactions; however, these converge at a slower rate than the standard Monte Carlo rate.

The $\alpha$-sequential Monte Carlo algorithm (Whiteley et al., 2016) was proposed recently as a general method for distributed sequential Monte Carlo. This is a generalisation of the bootstrap particle filter that can be implemented on parallel architectures. This is achieved by allowing particles to interact with only a small subset of other particles in the resampling step, and is formalized through a sequence of stochastic matrices. These are referred to as connectivity matrices in the sequel since they describe how particles are connected to each other. It has been shown that certain “local exchange” communication structures do not lead to stable algorithms (Heine and Whiteley, 2017), and sophisticated adaptive mechanisms have been designed for ensuring stability (Lee and Whiteley, 2016; Heine et al., 2020). However, a general understanding of the influence of the communication structure on the stability properties of the algorithm is lacking.

In this article, we relate the stability properties of the $\alpha$ -sequential Monte Carlo algorithm to the connectivity and mixing properties of the communication structures described by the connectivity matrices. In particular, we show that it is possible to design $\alpha$ -sequential Monte Carlo algorithms with time-uniform convergence at the standard Monte Carlo rate of $N^{-1/2}$ without the degree of the interaction graph growing with the number of particles $N$ . Computer code for numerical experiments in this article can be found online at https://github.com/deborsheesen/alphaSMC.

2 $\alpha$ -Sequential Monte Carlo

2.1 Algorithm description

The $\alpha$-sequential Monte Carlo algorithm with $N\geq 1$ particles relies on a sequence of (possibly random) matrices $\{\alpha_{t}\}_{t\geq 0}$, where each $\alpha_{t}=(\alpha_{t}^{ij})_{i,j=1}^{N}\in\mathbb{R}^{N,N}$ is a stochastic matrix, that is, for any time index $t\geq 0$ and particle index $i=1,\dots,N$, we have $\sum_{j=1}^{N}\alpha_{t}^{ij}=1$. The $\alpha$-sequential Monte Carlo algorithm simulates a sequence $\{X_{t};\,t\geq 0\}$, where, for each time index $t\geq 0$, we have $X_{t}=\{X_{t}^{i}\,;\,i=1,\dots,N\}$, and $X^{i}_{t}\in\mathsf{X}$ is the location of the $i$-th particle at time index $t\geq 0$. The particle approximation $\widehat{\pi}_{t}^{N}$ of $\pi_{t}$ produced by the $\alpha$-sequential Monte Carlo algorithm is given by $\widehat{\pi}_{t}^{N}=\sum_{i=1}^{N}\overline{W}_{t}^{i}\,\delta_{X_{t}^{i}}$, where $\overline{W}_{t}=(\overline{W}^{1}_{t},\dots,\overline{W}^{N}_{t})\in\mathcal{P}(N)$ denotes the vector of normalised weights $\overline{W}^{i}_{t}=W^{i}_{t}/(\sum_{j=1}^{N}W^{j}_{t})$, and $\mathcal{P}(N)=\{x\in\mathbb{R}^{N}_{+}\;:\;\sum_{i=1}^{N}x_{i}=1\}$ is the $N$-dimensional probability simplex. The unnormalised weights $W_{t}=(W^{1}_{t},\dots,W^{N}_{t})\in\mathbb{R}_{+}^{N}$ are recursively defined as follows. At time index $t=0$, the weights are all initialized to one, that is, $W^{i}_{0}=1$ $(i=1,\dots,N)$. For $t\geq 1$, the weights are recursively defined as

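$$W_{t}^{i}=\sum_{j=1}^{N}\alpha_{t-1}^{ij}\,W_{t-1}^{j}\,g_{t-1}(X_{t-1}^{j})\qquad(i=1,\dots,N).\tag{4}$$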

The $\alpha$ -sequential Monte Carlo algorithm also produces a particle approximation of the unnormalised measure $\gamma_{t}$ and the normalisation constant $Z_{t}$ as $\widehat{\gamma}_{t}^{N}=(1/N)\sum_{i=1}^{N}W_{t}^{i}\,\delta_{X_{t}^{i}}$ and $\widehat{Z}^{N}_{t}=\widehat{\gamma}_{t}^{N}(1)=(1/N)\sum_{i=1}^{N}W_{t}^{i}$ . The particle equivalent of equation 3 is $\widehat{\gamma}_{t}^{N}=\widehat{Z}_{t}^{N}\times\widehat{\pi}_{t}^{N}$ , which states that the estimate of the unnormalised measure can be decomposed into the product of estimates for the normalised measure and the normalisation constant; this is the same as for the bootstrap particle filter.

The particles are initialised as follows. At time index $t=0$ , particles $X_{0}^{i}\in\mathsf{X}$ are simulated as being independent and identically distributed from the initial distribution $\pi_{0}$ . We define $\mathcal{F}_{t-1}$ to be the $\sigma$ -algebra generated by all the particles up to and including time $(t-1)$ , that is, $X_{0:(t-1)}$ , and all the connectivity matrices up to and including time $(t-1)$ , that is, $\alpha_{0:(t-1)}$ . We also define the notations $E_{t}(\cdot)=E(\cdot\mid\mathcal{F}_{t})$ and $\mathrm{var}_{t}(\cdot)=\mathrm{var}(\cdot\mid\mathcal{F}_{t})$ for convenience, which are the conditional mean and variance conditioned upon the state of the system up to and including time $t$ ; these will typically be used in the context of events happening after time $t$ . At time index $t\geq 1$ and conditionally upon $\mathcal{F}_{t-1}$ , the particles $\{X_{t}^{i}\}_{i=1}^{N}$ are simulated independently, with

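$$P(X_{t}^{i}\in\text{d}x\mid\mathcal{F}_{t-1})=\frac{1}{W_{t}^{i}}\sum_{j=1}^{N}\alpha_{t-1}^{ij}\,W_{t-1}^{j}\,g_{t-1}(X_{t-1}^{j})\,K_{t}(X_{t-1}^{j},\text{d}x).$$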

The $\alpha$-sequential Monte Carlo algorithm is summarised in Algorithm 1. Throughout this text, we assume that the connectivity matrices $\{\alpha_{t}\}_{t\geq 0}$ can all be generated at the start of the algorithm. In other words, we do not consider adaptive schemes for constructing the connectivity matrices, such as those explored by Liu and Chen (1995), Whiteley et al. (2016), and Lee and Whiteley (2016).

Zhang et al. (2020) have implemented a distributed resampling technique using a message passing interface for a scheme that is similar to the local exchange scheme analysed by Heine and Whiteley (2017), and have reported computational gains from doing so.
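A single step of the algorithm, from time $t-1$ to time $t$, can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the code from the author's repository: it stores $\alpha_{t-1}$ as a dense array for readability, and `sample_K` is a hypothetical callable that draws from $K_{t}(x,\cdot)$.

```python
import numpy as np

def alpha_smc_step(alpha, W_prev, x_prev, g_prev, sample_K, rng):
    """One alpha-SMC step from time t-1 to time t (illustrative sketch).

    alpha    : (N, N) row-stochastic connectivity matrix alpha_{t-1}
    W_prev   : (N,) unnormalised weights W_{t-1}
    x_prev   : (N,) particles X_{t-1}
    g_prev   : (N,) potentials g_{t-1}(X_{t-1}^j)
    sample_K : hypothetical callable (x, rng) -> draws from K_t(x, .)
    """
    N = len(W_prev)
    # M[i, j] = alpha^{ij} * W_{t-1}^j * g_{t-1}(X_{t-1}^j)
    M = alpha * (W_prev * g_prev)[None, :]
    W = M.sum(axis=1)  # weight recursion: W_t^i = sum_j M[i, j]
    # Particle i selects ancestor j with probability M[i, j] / W_t^i,
    # so it only ever looks at the particles j with alpha^{ij} > 0.
    ancestors = np.array([rng.choice(N, p=M[i] / W[i]) for i in range(N)])
    x = sample_K(x_prev[ancestors], rng)  # propagate through K_t
    return W, x
```

With `alpha = np.full((N, N), 1 / N)` every particle sees every other particle, as in the bootstrap particle filter, and with `alpha = np.eye(N)` each particle keeps itself as ancestor, as in sequential importance sampling. A distributed implementation would presumably store only the nonzero entries of each row rather than a dense matrix.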

2.2 Basic Properties

The predictive probability distributions $\{\pi_{t}\}_{t\geq 0}$ defined by the state-space model (2) satisfy $\pi_{t}=\mathsf{F}_{t}\pi_{t-1}$ , where the mapping $\mathsf{F}_{t}:\mathcal{P}(\mathsf{X})\rightarrow\mathcal{P}(\mathsf{X})$ associates to any probability measure $\pi\in\mathcal{P}(\mathsf{X})$ the probability measure $\mathsf{F}_{t}\pi$ that acts on functions $\varphi\in\mathcal{B}(\mathsf{X})$ as

$$\mathsf{F}_{t}\pi(\varphi)=\frac{\pi(g_{t-1}K_{t}\varphi)}{\pi(g_{t-1})},\qquad t\geq 1.$$

For two time indices $0\leq s\leq t$ , set $\mathsf{F}_{s,t}=\mathsf{F}_{t}\circ\cdots\circ\mathsf{F}_{s+1}$ , with the convention that $\mathsf{F}_{t,t}$ is the identity mapping, so that we have $\pi_{t}=\mathsf{F}_{s,t}\,\pi_{s}$ . Similarly, the unnormalised measures $\{\gamma_{t}\}_{t\geq 0}$ satisfy $\gamma_{t}(\varphi)=\gamma_{s}(\mathsf{Q}_{s,t}\,\varphi)$ , where $\mathsf{Q}_{s,t}=\mathsf{Q}_{s+1}\circ\cdots\circ\mathsf{Q}_{t}$ and the operator $\mathsf{Q}_{t}$ acts on a test function $\varphi\in\mathcal{B}(\mathsf{X})$ as $\mathsf{Q}_{t}\,\varphi=g_{t-1}\,K_{t}\varphi$ , $t\geq 1$ .

As noted in Whiteley et al. (2016), suppose that the connectivity matrices $\{\alpha_{t}\}_{t\geq 0}$ keep (almost surely) the uniform distribution on $\{1,\dots,N\}$ invariant, that is, $\mathbf{1}\alpha_{t}=\mathbf{1}$, where $\mathbf{1}=(1,\dots,1)\in\mathbb{R}^{N}$ is the $N$-dimensional vector of ones. Then the definition (4) of the weights shows that the particle approximations $\widehat{\gamma}_{t}^{N}$ are such that for any test function $\varphi\in\mathcal{B}(\mathsf{X})$,

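$$E_{t-1}\{\widehat{\gamma}_{t}^{N}(\varphi)\}=\frac{1}{N}\sum_{i=1}^{N}E_{t-1}\{W_{t}^{i}\,\varphi(X_{t}^{i})\}=\frac{1}{N}\sum_{j=1}^{N}W_{t-1}^{j}\,\mathsf{Q}_{t}\varphi(X_{t-1}^{j})=\widehat{\gamma}_{t-1}^{N}(\mathsf{Q}_{t}\varphi).\tag{5}$$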

Consequently, iterating equation 5 shows that the particle approximation $\widehat{\gamma}^{N}_{t}(\varphi)$ is unbiased: $E\{\widehat{\gamma}^{N}_{t}(\varphi)\}=E\{\widehat{\gamma}^{N}_{0}(\mathsf{Q}_{0,t}\varphi)\}=\gamma_{0}(\mathsf{Q}_{0,t}\varphi)=\gamma_{t}(\varphi)$. Since $\widehat{Z}^{N}_{t}=\widehat{\gamma}^{N}_{t}(1)$, it also follows that $E(\widehat{Z}^{N}_{t})=Z_{t}$. This lack-of-bias property allows the $\alpha$-sequential Monte Carlo approach to be straightforwardly leveraged within other Monte Carlo schemes such as the pseudo-marginal Monte Carlo approach (Andrieu and Roberts, 2009), particle Markov chain Monte Carlo methods (Andrieu et al., 2010), and advanced sequential Monte Carlo methods (Chopin et al., 2013).
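For completeness, the one-step identity $E_{t-1}\{\widehat{\gamma}_{t}^{N}(\varphi)\}=\widehat{\gamma}_{t-1}^{N}(\mathsf{Q}_{t}\varphi)$ behind this unbiasedness can be checked directly. The following is a sketch of the computation, using only the weight recursion, the conditional law of the particles, the fact that $W_{t}^{i}$ is $\mathcal{F}_{t-1}$-measurable, and the column-sum condition $\sum_{i=1}^{N}\alpha_{t-1}^{ij}=1$ implied by $\mathbf{1}\alpha_{t-1}=\mathbf{1}$.

```latex
\begin{align*}
E_{t-1}\{W_{t}^{i}\,\varphi(X_{t}^{i})\}
  &= W_{t}^{i}\int_{\mathsf{X}}\varphi(x)\,P(X_{t}^{i}\in\mathrm{d}x\mid\mathcal{F}_{t-1})
   = \sum_{j=1}^{N}\alpha_{t-1}^{ij}\,W_{t-1}^{j}\,g_{t-1}(X_{t-1}^{j})\,K_{t}\varphi(X_{t-1}^{j})
   = \sum_{j=1}^{N}\alpha_{t-1}^{ij}\,W_{t-1}^{j}\,\mathsf{Q}_{t}\varphi(X_{t-1}^{j}),\\
E_{t-1}\{\widehat{\gamma}_{t}^{N}(\varphi)\}
  &= \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{t-1}^{ij}\,W_{t-1}^{j}\,\mathsf{Q}_{t}\varphi(X_{t-1}^{j})
   = \frac{1}{N}\sum_{j=1}^{N}\Big(\sum_{i=1}^{N}\alpha_{t-1}^{ij}\Big)W_{t-1}^{j}\,\mathsf{Q}_{t}\varphi(X_{t-1}^{j})
   = \widehat{\gamma}_{t-1}^{N}(\mathsf{Q}_{t}\varphi).
\end{align*}
```

The factor $1/W_{t}^{i}$ in the conditional law cancels against the weight $W_{t}^{i}$, which is why the unnormalised measure, rather than the normalised one, is the unbiased quantity.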

3 Time-uniform stability of $\alpha$ -sequential Monte Carlo

3.1 Mixing of connectivity matrices

In this section, we assume that there exists a fixed bi-stochastic matrix $\alpha\in\mathbb{R}_{+}^{N,N}$ such that $\alpha_{t}=\alpha$ for all $t\geq 0$ . Under the assumption that the uniform distribution on $\{1,\dots,N\}$ is the unique invariant distribution of $\alpha$ , we relate the stability properties of the $\alpha$ -sequential Monte Carlo algorithm to the mixing properties of the connectivity matrix $\alpha$ . We define the mixing constant $\lambda(\alpha)$ of the connectivity matrix $\alpha$ as

$$\lambda(\alpha)=\sup_{v\in\mathsf{B}_{1}^{0}}\|\alpha v\|<1,\tag{6}$$

where $\|\cdot\|$ denotes the Euclidean norm and $\mathsf{B}^{0}_{1}=\{v\in\mathbb{R}^{N}\;:\;\|v\|=1\;\textrm{and}\;\langle v,\mathbf{1}\rangle=0\}$ is the compact set of unit vectors that are orthogonal to the vector $\mathbf{1}$ . The quantity $\lambda(\alpha)\geq 0$ is the smallest constant such that for any vector $v\in\mathbb{R}^{N}$ , we have

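$$\Big\|\alpha^{k}v-\Big(\sum_{i=1}^{N}\frac{v_{i}}{N}\Big)\times\mathbf{1}\Big\|\leq\lambda(\alpha)^{k}\,\Big\|v-\Big(\sum_{i=1}^{N}\frac{v_{i}}{N}\Big)\times\mathbf{1}\Big\|.\tag{7}$$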

If the Markov transition matrix $\alpha$ is reversible with respect to the uniform distribution on $\{1,\dots,N\}$ , that is, $\alpha$ is symmetric, the quantity $\lambda(\alpha)$ equals the absolute value of the second largest (in absolute value) eigenvalue of $\alpha$ : $\lambda(\alpha)=\max_{k\in\{2,\dots,N\}}\;|\lambda_{k}|$ , where $1=\lambda_{1}\geq\cdots\geq\lambda_{N}>-1$ is the spectrum of $\alpha$ . In other words, in the reversible case, $\lambda(\alpha)$ can also be expressed as one minus the absolute spectral gap of the matrix $\alpha$ . In the case where $w\in\mathbb{R}^{N}$ is a probability vector, that is, $w\in\mathcal{P}(N)$ , equation 7 can be reformulated as

$$\|\alpha w\|^{2}\leq\frac{1-\lambda^{2}(\alpha)}{N}+\lambda^{2}(\alpha)\,\|w\|^{2}.$$

This is the key inequality that we will use to establish the stability properties of the $\alpha$ -sequential Monte Carlo algorithm.
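To make the mixing constant concrete, the sketch below computes $\lambda(\alpha)$ numerically as the largest singular value of $\alpha$ restricted to the subspace orthogonal to $\mathbf{1}$, and checks the bound $\|\alpha w\|^{2}\leq\{1-\lambda^{2}(\alpha)\}/N+\lambda^{2}(\alpha)\|w\|^{2}$ on a random probability vector. The helper `lazy_ring`, in which each particle averages over itself and its two neighbours on a cycle, is a hypothetical example chosen here for illustration and is not a structure taken from the article.

```python
import numpy as np

def mixing_constant(alpha):
    """lambda(alpha) = sup{ ||alpha v|| : ||v|| = 1, <v, 1> = 0 }."""
    N = alpha.shape[0]
    P = np.eye(N) - np.ones((N, N)) / N       # orthogonal projection onto 1-perp
    return np.linalg.norm(alpha @ P, ord=2)   # largest singular value of alpha P

def lazy_ring(N):
    """Hypothetical example: symmetric bi-stochastic matrix on a cycle (N >= 3)."""
    I = np.eye(N)
    return (I + np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1)) / 3.0

N = 6
alpha = lazy_ring(N)
lam = mixing_constant(alpha)

# Check the key inequality on a random probability vector w.
rng = np.random.default_rng(0)
w = rng.dirichlet(np.ones(N))
lhs = np.sum((alpha @ w) ** 2)
rhs = (1 - lam**2) / N + lam**2 * np.sum(w**2)
```

Two sanity checks follow from the definition: the bootstrap matrix $\mathbf{1}\mathbf{1}^{T}/N$ gives a mixing constant of zero up to rounding, and the identity matrix gives one. For the ring above the eigenvalues are $\{1+2\cos(2\pi k/N)\}/3$, so $\lambda(\alpha)$ approaches one as $N$ grows with this fixed local structure.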

3.2 Stability

To measure the discrepancy between two (possibly random) probability measures $\mu$ and $\nu$ , consider the norm

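$${|\!|\!|\mu-\nu|\!|\!|}^{2}=\sup\big\{E\big[\{\mu(\varphi)-\nu(\varphi)\}^{2}\big]\;:\;\varphi\in\mathcal{B}(\mathsf{X})\big\}.$$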

We assume in this section that the potential functions $\{g_{t}\}_{t\geq 0}$ and latent transition kernels $\{K_{t}\}_{t\geq 1}$ of the state-space model (1) are uniformly bounded in time; this is standard when studying the stability properties of particle filters (Del Moral and Guionnet, 2001; Whiteley et al., 2016). In other words, we make the following Assumption 1.

Assumption 1.

There exist constants $\kappa_{K}>1$ and $\kappa_{g}>1$ such that $\kappa_{K}^{-1}\leq K_{t}\leq\kappa_{K}$ $(t\geq 1)$ and $\kappa_{g}^{-1}\leq g_{t}\leq\kappa_{g}$ $(t\geq 0)$ .

The main result of this section is that under Assumption 1 and as soon as the absolute spectral gap of the matrix $\alpha$ is large enough, the discrepancy between the particle approximation $\widehat{\pi}_{t}^{N}$ and its limiting value $\pi_{t}$ can be uniformly bounded in time:

$$\sup_{t\geq 0}\,{|\!|\!|\widehat{\pi}_{t}^{N}-\pi_{t}|\!|\!|}\leq\textrm{Cst}\times N^{-1/2}.$$

In other words, the particle approximation $\widehat{\pi}^{N}_{t}$ converges to the true predictive distribution $\pi_{t}$ at the usual Monte Carlo rate, and this convergence can be controlled uniformly in time. This is formalised in Theorem 1, which is proved in Appendix A.

Theorem 1 (Uniform stability).

Suppose that the state-space model (1) satisfies Assumption 1. Consider the $\alpha$ -sequential Monte Carlo algorithm with $N$ particles and a constant bi-stochastic connectivity matrix $\alpha\in\mathbb{R}_{+}^{N,N}$ such that

• the uniform distribution on $\{1,\dots,N\}$ is the unique invariant distribution of $\alpha$, and

• the mixing constant $\lambda(\alpha)$ defined in equation 6 satisfies $\lambda(\alpha)<\kappa_{g}^{-2}$.

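Then the following uniform bound for the $N$-particle approximations $\widehat{\pi}_{t}^{N}$ holds:

$$N\times{|\!|\!|\widehat{\pi}_{t}^{N}-\pi_{t}|\!|\!|}^{2}\leq\frac{D}{1-\rho}\times\frac{\kappa_{g}^{4}\,\{1-\lambda^{2}(\alpha)\}}{1-\kappa_{g}^{4}\,\lambda^{2}(\alpha)}\tag{9}$$

for constants $D>0$, $\kappa_{g}>1$ and $0<\rho<1$ that depend only on the state-space model (1).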

The bootstrap particle filter corresponds to the case where $\lambda(\alpha)=0$, and in that case one obtains that $N\times{|\!|\!|\widehat{\pi}^{N}_{t}-\pi_{t}|\!|\!|}^{2}\;\leq\;D\kappa_{g}^{4}/(1-\rho)=C_{\textrm{bootstrap}}$. In the case of fast mixing connectivity matrices, that is, $\lambda(\alpha)\ll 1$, expanding the right-hand side of equation 9 in powers of $\lambda(\alpha)$ yields that

$$N\times{|\!|\!|\widehat{\pi}_{t}^{N}-\pi_{t}|\!|\!|}^{2}\leq C_{\textrm{bootstrap}}+\textrm{Const}\times\lambda^{2}(\alpha)+\mathcal{O}\{\lambda^{4}(\alpha)\}.$$

In other words, when compared to the bootstrap particle filter, the use of a connectivity matrix $\alpha$ with limited communication incurs a cost of leading order $\lambda^{2}(\alpha)$ . In Section 5.2, we discuss another situation leading to similar conclusions.
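The expansion can be made explicit with a geometric series, which converges because Theorem 1 requires $\lambda(\alpha)<\kappa_{g}^{-2}$. The following is a sketch of that computation, writing $\lambda=\lambda(\alpha)$ and using $C_{\textrm{bootstrap}}=D\kappa_{g}^{4}/(1-\rho)$; the explicit value of the leading constant is derived here and is not quoted from the article.

```latex
\begin{align*}
\frac{\kappa_{g}^{4}\,(1-\lambda^{2})}{1-\kappa_{g}^{4}\lambda^{2}}
  &= \kappa_{g}^{4}\,(1-\lambda^{2})\,\bigl\{1+\kappa_{g}^{4}\lambda^{2}+\mathcal{O}(\lambda^{4})\bigr\}
   = \kappa_{g}^{4}+\kappa_{g}^{4}\,(\kappa_{g}^{4}-1)\,\lambda^{2}+\mathcal{O}(\lambda^{4}),\\
\frac{D}{1-\rho}\times\frac{\kappa_{g}^{4}\,(1-\lambda^{2})}{1-\kappa_{g}^{4}\lambda^{2}}
  &= C_{\textrm{bootstrap}}+C_{\textrm{bootstrap}}\,(\kappa_{g}^{4}-1)\,\lambda^{2}+\mathcal{O}(\lambda^{4}).
\end{align*}
```

Under this reading the coefficient of $\lambda^{2}(\alpha)$ is $C_{\textrm{bootstrap}}(\kappa_{g}^{4}-1)$, which is positive since $\kappa_{g}>1$, so the penalty relative to the bootstrap particle filter grows with the spread of the potential functions.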

4 Randomized connectivity matrices

4.1 Setting and basic properties

We extend the analysis of the previous section to randomized connectivity structures and obtain a central limit theorem. To this end, let $\mathcal{M}_{N}$ be the set of all $N\times N$ symmetric stochastic matrices, and consider a distribution $\upsilon_{N}$ on $\mathcal{M}_{N}$ such that

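$$\alpha\sim\upsilon_{N}\;\Longrightarrow\;P\alpha P^{-1}\sim\upsilon_{N}\quad\text{for any }N\times N\text{ permutation matrix }P.$$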

The operation $P\alpha P^{-1}$ corresponds to permuting the nodes of the graph associated with the matrix $\alpha$. For $\alpha\in\mathcal{M}_{N}$, let $\mathcal{S}(\alpha)$ be the set of all permutations of the nodes of the graph associated with $\alpha$. The distribution $\upsilon_{N}$ is uniform over the set $\mathcal{S}(\alpha)$ for every $\alpha$. Moreover, the mixing constant (6) is the same for every matrix in the set $\mathcal{S}(\alpha)$, and this is therefore a generalisation of the setting considered in Section 3. Common examples of this framework include the bootstrap particle filter (which corresponds to $\upsilon_{N}$ placing mass one on the matrix $\mathbf{1}\mathbf{1}^{T}/N$) and importance sampling (which corresponds to $\upsilon_{N}$ placing mass one on the identity matrix). We shall consider another such setting in Section 5.1 with limited connections. We study the asymptotic behaviour of the $\alpha$-sequential Monte Carlo algorithm under this setting.

In order to keep the analysis simple, we assume in this section that the potential functions $\{g_{t}\}_{t\geq 0}$ are uniformly bounded: there exists a constant $\kappa_{g}>0$ such that, for any time index $t\geq 0$ and $x\in\mathsf{X}$ , we have $0<g_{t}(x)\leq\kappa_{g}$ ; we note that this is weaker than Assumption 1. We also make the following assumption.

Assumption 2.

The distribution $\upsilon_{N}$ is such that for $\alpha\sim\upsilon_{N}$ , $E(\alpha^{ij}\alpha^{ik})=\mathcal{O}(N^{-2})$ for all $i\neq j\neq k$ . In particular, there exists $0\leq c_{3}<\infty$ such that $E(\alpha^{ij}\alpha^{ik})\leq c_{3}N^{-2}$ for $N$ large.

Assumption 2 is clearly satisfied for the bootstrap particle filter and for sequential importance sampling.

We prove consistency and a central limit theorem for the normalised measures $\widehat{\pi}_{t}^{N}$ and unnormalised measures $\widehat{\gamma}_{t}^{N}$ under this setting. In the proof of the central limit theorem, we will need to consider a further sequence of unnormalised measures defined as $\widehat{\mu}^{N}_{t}=(1/N)\sum_{i=1}^{N}(W^{i}_{t})^{2}\delta_{X^{i}_{t}}$ . Define an operator $\widetilde{\mathsf{Q}}_{t}$ that acts on a test function $\varphi\in\mathcal{B}(\mathsf{X})$ as $\widetilde{\mathsf{Q}}_{t}\varphi\;=\;g_{t-1}^{2}K_{t}\varphi$ . We show in Section 4.2 that under Assumption 2, as $N\to\infty$ , the unnormalised measures $\widehat{\mu}^{N}_{t}$ converge to the measure $\mu_{t}$ defined as $\mu_{0}=\pi_{0}$ , and, for a test function $\varphi\in\mathcal{B}(\mathsf{X})$ ,

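$$\mu_{t}(\varphi)=\mu_{t-1}(\widetilde{\mathsf{Q}}_{t}\varphi)\times\lim_{N\to\infty}\big[E\{(\alpha^{11})^{2}\}+N\,E\{(\alpha^{12})^{2}\}\big]+Z_{t}^{2}\,\pi_{t}(\varphi)\times\lim_{N\to\infty}\big\{N^{2}E(\alpha^{12}\alpha^{13})+2N\,E(\alpha^{11}\alpha^{12})\big\};\tag{12}$$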

we have implicitly assumed that the limits on the right hand side of equation 12 exist. This is true for the bootstrap particle filter and sequential importance sampling, and more generally is true for the settings we consider in Section 5. Moreover, Assumption 2 and Proposition 1 of Section B.1 ensure that the right hand side of the previous equation is finite. We shall exploit equation 12 to study the asymptotic behaviour of $\alpha$ -sequential Monte Carlo with sparse connections in Section 5.2.
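As a sanity check, the two limits in the recursion for $\mu_{t}$, namely $\lim_{N\to\infty}[E\{(\alpha^{11})^{2}\}+N\,E\{(\alpha^{12})^{2}\}]$ and $\lim_{N\to\infty}\{N^{2}E(\alpha^{12}\alpha^{13})+2N\,E(\alpha^{11}\alpha^{12})\}$, can be evaluated by hand in the two reference cases. This is a worked example added for illustration; the labels $\ell_{1}$ and $\ell_{2}$ are introduced here for the first and second limit and do not appear in the article.

```latex
\begin{align*}
\text{Bootstrap } (\alpha^{ij}=1/N):\quad
  \ell_{1} &= \lim_{N\to\infty}\Bigl(\frac{1}{N^{2}}+N\cdot\frac{1}{N^{2}}\Bigr)=0, &
  \ell_{2} &= \lim_{N\to\infty}\Bigl(N^{2}\cdot\frac{1}{N^{2}}+2N\cdot\frac{1}{N^{2}}\Bigr)=1,
  & \mu_{t}(\varphi) &= Z_{t}^{2}\,\pi_{t}(\varphi);\\
\text{Importance sampling } (\alpha=I):\quad
  \ell_{1} &= \lim_{N\to\infty}(1+N\cdot 0)=1, &
  \ell_{2} &= \lim_{N\to\infty}(N^{2}\cdot 0+2N\cdot 0)=0,
  & \mu_{t}(\varphi) &= \mu_{t-1}(\widetilde{\mathsf{Q}}_{t}\varphi).
\end{align*}
```

Both outcomes agree with a direct computation. For the bootstrap particle filter all weights equal $W_{t}^{i}=\widehat{Z}_{t}^{N}$, so $\widehat{\mu}_{t}^{N}=(\widehat{Z}_{t}^{N})^{2}\,\widehat{\pi}_{t}^{N}$. For sequential importance sampling $W_{t}^{i}=W_{t-1}^{i}\,g_{t-1}(X_{t-1}^{i})$, so squaring the weights introduces exactly the factor $g_{t-1}^{2}$ that appears in $\widetilde{\mathsf{Q}}_{t}$.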

4.2 Consistency and central limit theorem

We first establish that the particle approximations $\widehat{\pi}^{N}_{t}$ , $\widehat{\gamma}^{N}_{t}$ , and $\widehat{\mu}^{N}_{t}$ are consistent.

Theorem 2 (Consistency).

Assume that the potential functions satisfy $0<g_{t}(x)\leq\kappa_{g}$ , and suppose also that Assumption 2 holds. For any test function $\varphi\in\mathcal{B}(\mathsf{X})$ , as $N\to\infty$ , the particle approximations $\widehat{\pi}^{N}_{t}(\varphi)$ , $\widehat{\gamma}^{N}_{t}(\varphi)$ and $\widehat{\mu}^{N}_{t}(\varphi)$ converge in probability to $\pi_{t}(\varphi)$ , $\gamma_{t}(\varphi)$ , and $\mu_{t}(\varphi)$ , respectively.

Theorem 2 is proved in Section B.2. Consistency of the particle approximations $\widehat{\pi}^{N}_{t}(\varphi)$ and $\widehat{\gamma}^{N}_{t}(\varphi)$ was established in Whiteley et al. (2016) under an asymptotic negligibility condition, which is automatically satisfied when the $\alpha$ -matrices are bi-stochastic; we nonetheless include a straightforward proof to keep the article self-contained. The proof of consistency of $\widehat{\mu}^{N}_{t}(\varphi)$ is more involved and is novel to our work.

We next show a central limit theorem for the particle approximations $\widehat{\pi}^{N}_{t}$ and $\widehat{\gamma}^{N}_{t}$ , which is proved in Section B.3.

Theorem 3 (Central limit theorem).

Assume that the potential functions satisfy $0<g_{t}(x)\leq\kappa_{g}$ , and suppose also that Assumption 2 holds. For any bounded test function $\varphi\in\mathcal{B}(\mathsf{X})$ , the re-normalised quantities $N^{1/2}\{\widehat{\gamma}^{N}_{t}(\varphi)-\gamma_{t}(\varphi)\}$ and $N^{1/2}\{\widehat{\pi}^{N}_{t}(\varphi)-\pi_{t}(\varphi)\}$ converge in law to centred Gaussian distributions with variances $\mathbb{V}^{\gamma}_{t}(\varphi)$ and $\mathbb{V}^{\pi}_{t}(\varphi)$ , respectively, where the variances satisfy the following recursions:

[TABLE]

where $\overline{\varphi}_{t}=\varphi-\pi_{t}(\varphi)$ .

Theorem 3 provides a way to quantify the trade-off (relative to the bootstrap particle filter) in using $\alpha$ -sequential Monte Carlo under different settings as measured by its asymptotic variance. It is worth stressing that the terms $\mu_{t}(\varphi^{2})$ and $\mu_{t}(\overline{\varphi}^{2})$ depend on the choice of the $\alpha$ -matrices used. We discuss this in more detail in Section 5. In particular, we consider a setting in which particles are connected to a few other particles at each time and study the effect of the number of connections on the asymptotic variances.

5 Statistical tradeoffs

5.1 Permutations of a random walk on $d$ -regular graph

We describe and analyse a version of $\alpha$ -sequential Monte Carlo with sparse connections that falls into the setting considered in Section 4.1. Consider an undirected $d$ -regular graph $\mathcal{G}_{N}$ with $N$ vertices. Let $A$ be the stochastic matrix corresponding to a random walk on $\mathcal{G}_{N}$ . In other words, $A^{ij}=d^{-1}$ if nodes $i$ and $j$ have an edge connecting them, and zero otherwise. Let $\mathcal{P}_{{\rm permute}}$ be the uniform distribution over the set of all permutations of $\{1,\dots,N\}$ , and let $\upsilon_{N}$ be a distribution over $\mathcal{M}_{N}$ specified by $PAP^{-1}$ for $P\sim\mathcal{P}_{{\rm permute}}$ . The operation $PAP^{-1}$ re-indexes $A$ according to the permutation $\sigma$ of the indices associated with $P$ . We consider the case where the graph $\mathcal{G}_{N}$ corresponds to a random $d$ -regular graph without self connections (that is, no node is connected to itself). In this case, $\lim_{N\to\infty}E(\alpha^{ii})=0$ , $\lim_{N\to\infty}NE\{(\alpha^{ij})^{2}\}=1/d$ , $\lim_{N\to\infty}2NE(\alpha^{ii}\alpha^{ij})=0$ , and $\lim_{N\to\infty}N^{2}E(\alpha^{ij}\alpha^{ik})=(d-1)/d$ . By equation 12, this implies

[TABLE]

This is used to analyze the asymptotic variances (13) of $\alpha$ -sequential Monte Carlo in the next section and compare them to those of the bootstrap particle filter.
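The limits quoted in this section can be checked by direct computation. The sketch below is illustrative only and is not part of the original experiments: it assumes that both indices are relabelled by the same uniformly random permutation, so that an off-diagonal pair (respectively triple) of indices is mapped to a uniformly chosen ordered pair (respectively triple) of distinct indices, and the moments can then be obtained exactly as averages over the entries of a fixed $A$. The function names are ours, and a circulant ring is used instead of a random $d$-regular graph purely for convenience, since these two moments depend only on the graph being simple and $d$-regular.

```python
def ring_walk_matrix(n, d):
    """Random-walk matrix of the ring in which node i is linked to i±1, ..., i±d/2.

    Assumes d is even and n > d, so that every node has exactly d distinct neighbours.
    """
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for s in range(1, d // 2 + 1):
            A[i][(i + s) % n] = 1.0 / d
            A[i][(i - s) % n] = 1.0 / d
    return A


def permutation_moments(A):
    """Exact E{(alpha^{ij})^2} and E(alpha^{ij} alpha^{ik}) for distinct i, j, k,

    where alpha is A re-indexed by a uniformly random permutation.
    """
    n = len(A)
    sum_sq = 0.0
    sum_cross = 0.0
    for a in range(n):
        off_diagonal = [A[a][b] for b in range(n) if b != a]
        s1 = sum(off_diagonal)
        s2 = sum(x * x for x in off_diagonal)
        sum_sq += s2
        # Sum over ordered pairs b != c (both different from a) of A[a][b] * A[a][c].
        sum_cross += s1 * s1 - s2
    second_moment = sum_sq / (n * (n - 1))
    cross_moment = sum_cross / (n * (n - 1) * (n - 2))
    return second_moment, cross_moment


if __name__ == "__main__":
    d = 4
    for n in (50, 500):
        m2, cross = permutation_moments(ring_walk_matrix(n, d))
        # N * m2 should approach 1/d and N^2 * cross should approach (d-1)/d.
        print(n, n * m2, n * n * cross)
```

For a simple $d$-regular graph the two quantities equal $1/\{d(N-1)\}$ and $(d-1)/\{d(N-1)(N-2)\}$ exactly, which recovers the limits $1/d$ and $(d-1)/d$ after multiplying by $N$ and $N^{2}$.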

5.2 Cost under sparse connections

We leverage the central limit theorem to analyze the influence of the number of connections $d$ on the performance of the $\alpha$ -sequential Monte Carlo algorithm. Iterating equation 14 immediately shows that $\mu_{t}(\varphi)=Z^{2}_{t}\pi_{t}(\varphi)+\sum_{k=1}^{t}\beta_{t,k}(\varphi)/d^{k}$ for some coefficients $\{\beta_{t,k}(\varphi)\}_{k=1}^{t}$ that depend on the test function $\varphi$ and the state-space model (1), but not on the connectivity $d$ . It then follows from Theorem 3 that the asymptotic variance can be expanded as

[TABLE]

for some coefficients $\{\widetilde{\beta}_{t,k}(\varphi)\}_{k=1}^{t}$ that depend on the test function $\varphi$ and the state-space model (1), but not on the connectivity $d$ ; here $\mathrm{var}_{\pi_{s}}$ denotes the variance under $\pi_{s}$ . Not surprisingly, since the limit $d\to\infty$ corresponds to the bootstrap particle filter, the first term on the right-hand side of the previous equation equals exactly the asymptotic variance obtained from a standard bootstrap particle filter (Chopin, 2004). In other words,

[TABLE]

where $\mathbb{V}^{\textrm{bootstrap}}_{t}(\varphi)$ denotes the asymptotic variance of the bootstrap particle filter.

It is interesting to note that, in general, the first coefficient $\widetilde{\beta}_{t,1}(\varphi)$ can be either positive or negative. In other words, there are situations where the estimates obtained from $\alpha$ -sequential Monte Carlo are statistically more efficient than those obtained from the bootstrap particle filter: $\widetilde{\beta}_{t,1}(\varphi)<0$ . At a heuristic level, this may be explained as follows. When using $\alpha$ -sequential Monte Carlo, the propagation of information between particles is typically worse than that for the bootstrap particle filter. For example, if the distribution $\pi_{t}$ is more concentrated than the initial distribution $\pi_{0}$ , it is typically the case that the distributional estimates obtained from $\alpha$ -sequential Monte Carlo with a low value of $d$ will have thicker tails than those obtained from the bootstrap particle filter (Figure 1). In these situations, the $\alpha$ -sequential Monte Carlo estimates of tail events of $\pi_{t}$ can have lower variance than those obtained from the bootstrap particle filter.

As a concrete example, one can show that when $\pi_{0}$ is a standard real Gaussian distribution, $g_{0}(x)=0.1+100\times\mathbb{I}(|x|<0.1)$ , $\varphi(x)=\mathbb{I}(|x|>1)$ , and $K_{1}(x,\text{d}y)=\delta_{x}(\text{d}y)$ , the $\alpha$ -sequential Monte Carlo estimates of $\gamma_{1}(\varphi)=\pi_{0}(g_{0}\varphi)$ with $d=2$ have an asymptotic variance that is roughly half as large as the one obtained from the bootstrap particle filter.

In most realistic scenarios where particle filters are routinely used (for example, tracking of partially and/or noisily observed dynamical systems), though, we have indeed observed that $\alpha$ -sequential Monte Carlo estimates have a higher variance than the estimates obtained from the bootstrap particle filter. Equation (15) shows that using $\alpha$ -sequential Monte Carlo instead of the bootstrap particle filter typically costs an additional variance of order $\mathcal{O}(d^{-1})$ . This is demonstrated numerically in Figure 5 of Section 6.3.

Connecting back to the setting considered in Section 3 (which considers a fixed $\alpha$ matrix), this result is in the same spirit as the bound (10) that showed that there was a cost of order $\lambda(\alpha)^{2}$ (when controlling $N\times{|\kern-1.07639pt|\kern-1.07639pt|\widehat{\pi}_{t}^{N}-\pi_{t}|\kern-1.07639pt|\kern-1.07639pt|}^{2}$ ) when $\alpha$ -sequential Monte Carlo is used instead of the bootstrap particle filter. To see the connection, consider the connectivity matrix $\alpha\in\mathbb{R}_{+}^{N,N}$ to be equal to the Markov transition matrix of the random walk on an undirected graph $\mathcal{G}_{N}$ that is chosen uniformly at random among all the $d$ -regular graphs on $N$ vertices. Any such connectivity matrix $\alpha$ is bi-stochastic, so $\lambda(\alpha)$ equals one minus the absolute spectral gap of $\alpha$ : the Alon-Friedman theorem (Alon, 1986; Friedman, 2008) states that

[TABLE]

In other words, for such graphs and for a fixed connectivity $d\geq 3$ , the mixing constant $\lambda(\alpha)$ does not deteriorate as $N\to\infty$ ; this is demonstrated numerically in Section 6.1. If the connectivity matrix $\alpha$ were chosen this way, for large $N$ we would observe that $\lambda^{2}(\alpha)=\mathcal{O}(d^{-1})$ . Theorem 1 thus shows that, under regularity assumptions on the state-space model, in order to obtain an $\alpha$ -sequential Monte Carlo algorithm that is stable, one does not need to increase the number of connections $d\geq 3$ with the total number of particles $N$ as long as $d$ is large enough. We conjecture that a similar result holds for randomised connectivity matrices as well. Note that if $\mathcal{G}_{N}$ is the undirected graph on $\{1,\dots,N\}$ where the vertex $i$ is connected to each vertex $j\in\{i\pm 1,\dots,i\pm\lfloor d/2\rfloor\}\,\mathrm{mod}\,N$ , the mixing constant $\lambda(\alpha)$ converges to one as $N\to\infty$ , ultimately leading to poor performance. This is a variation of the local exchange mechanism considered in Heine and Whiteley (2017), where the authors indeed show that one cannot expect such an algorithm to converge uniformly at rate $N^{-1/2}$ .
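The contrast between the two graph families can be illustrated with a few lines of code. The sketch below rests on stated assumptions: it takes $\lambda(\alpha)$ to be the second-largest absolute eigenvalue of $\alpha$ (as described above for bi-stochastic matrices), it uses the closed-form eigenvalues of a symmetric circulant matrix for the ring graph with even $d$, and it compares against $2\sqrt{d-1}/d$, which is the usual Alon-Friedman value for the adjacency matrix rescaled by $d$. The function names are ours.

```python
import math


def ring_mixing_constant(n, d):
    """Second-largest absolute eigenvalue of the random walk on the ring graph.

    Node i is linked to i±1, ..., i±d/2 (d even, n > d). The walk matrix is a
    symmetric circulant, so its eigenvalues are (2/d) * sum_s cos(2*pi*k*s/n).
    """
    best = 0.0
    for k in range(1, n):  # k = 0 gives the trivial eigenvalue 1
        eig = sum(2.0 * math.cos(2.0 * math.pi * k * s / n)
                  for s in range(1, d // 2 + 1)) / d
        best = max(best, abs(eig))
    return best


def alon_friedman_value(d):
    """Large-N value 2*sqrt(d-1)/d for the walk on a random d-regular graph."""
    return 2.0 * math.sqrt(d - 1) / d


if __name__ == "__main__":
    d = 4
    for n in (100, 1000, 10000):
        print(n, ring_mixing_constant(n, d), alon_friedman_value(d))
```

The ring value creeps towards one as $N$ grows, whereas the random-graph value depends on $d$ only.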

6 Numerical examples

6.1 Spectral gap of random $\alpha$ -matrices

We use the random graph generation algorithm of Steger and Wormald (1999) as implemented in the NetworkX package of Python (Hagberg et al., 2008) to generate random $\alpha$ -matrices. This generates graphs $\mathcal{G}_{N}$ that are samples from the uniform distribution over all $d$ -regular graphs with $N$ nodes. The $\alpha$ -matrix is defined as the Markov transition matrix of the random walk on $\mathcal{G}_{N}$ . We consider different values of $(d,N)$ and simulate 100 random $\alpha$ -matrices for each pair. Figure 2 shows the mixing constant $\lambda(\alpha)$ as a function of $d$ and $N$ , as well as the limiting value as $N\to\infty$ as described in equation 16.
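A minimal sketch of this construction is given below. It is not the code used for the experiments: the function names are ours, the dense-matrix representation is chosen for readability rather than efficiency, and $\lambda(\alpha)$ is again assumed to be the second-largest absolute eigenvalue of the symmetric walk matrix.

```python
import networkx as nx
import numpy as np


def alpha_from_random_regular_graph(d, n, seed=None):
    """Walk matrix of a random d-regular graph on n nodes (n * d must be even)."""
    graph = nx.random_regular_graph(d, n, seed=seed)
    adjacency = nx.to_numpy_array(graph, nodelist=sorted(graph.nodes()))
    return adjacency / d  # every row has exactly d entries equal to 1/d


def mixing_constant(alpha):
    """Second-largest absolute eigenvalue of a symmetric stochastic matrix."""
    magnitudes = np.sort(np.abs(np.linalg.eigvalsh(alpha)))
    return magnitudes[-2]


if __name__ == "__main__":
    for d in (3, 5, 10):
        alpha = alpha_from_random_regular_graph(d, 500, seed=0)
        print(d, mixing_constant(alpha), 2.0 * np.sqrt(d - 1) / d)
```

Repeating the call with different seeds gives the kind of replicated samples described above.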

6.2 Predictive distribution estimations

Consider the state-space model with initial distribution $\pi_{0}=\mathrm{Normal}(0,1)$ and dynamics $X_{t+1}=\beta X_{t}+\sqrt{1-\beta^{2}}\xi_{t}$ with $\beta=0.9$ and $\xi_{t}\sim\mathrm{Normal}(0,1)$ . We consider the situation where the potential functions are all equal and given by $g_{t}(x)=0.1+10\times\mathbb{I}(|x-2|<0.1)$ for $t\geq 0$ . We run several experiments with $N=10^{4}$ particles for different values of the connectivity $d$ . For each experiment, we randomly generate a $d$ -regular graph as described in Section 6.1 and run the $\alpha$ -sequential Monte Carlo algorithm using it. The top panel of Figure 3 shows the performance of the $\alpha$ -sequential Monte Carlo algorithm for the estimation of $\pi_{T=6}(\varphi)$ for $\varphi(x)=x^{2}$ . For a connectivity $d=50$ , the estimate from $\alpha$ -sequential Monte Carlo is roughly as accurate as the bootstrap particle filter. The bottom panel of Figure 3 shows the Wasserstein distance between the estimated predictive distributions and the true predictive distribution obtained by running an $\alpha$ -sequential Monte Carlo algorithm for several values of the connectivity $d\geq 0$ ; the true predictive distribution is obtained by running the bootstrap particle filter with a large number of particles.
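For readers who want to experiment with this example, one $\alpha$-sequential Monte Carlo step can be sketched as follows. This is our reading of the algorithm of Section 2.1, which is not reproduced in this section: we assume the weight update $W^{i}_{t}=\sum_{j}\alpha^{ij}W^{j}_{t-1}g_{t-1}(X^{j}_{t-1})$ (consistent with the quantities $A$ and $B$ used in Appendix A) and that particle $i$ picks ancestor $j$ with probability proportional to $\alpha^{ij}W^{j}_{t-1}g_{t-1}(X^{j}_{t-1})$. All function names are ours, and the dense $N\times N$ matrix is only practical for small $N$.

```python
import numpy as np


def potential(x):
    """g_t(x) = 0.1 + 10 * 1(|x - 2| < 0.1), the potential of this example."""
    return 0.1 + 10.0 * (np.abs(x - 2.0) < 0.1)


def propagate(x, rng, beta=0.9):
    """One step of X_{t+1} = beta * X_t + sqrt(1 - beta^2) * xi_t."""
    return beta * x + np.sqrt(1.0 - beta**2) * rng.standard_normal(x.shape)


def alpha_smc_step(x, w, alpha, rng):
    """One assumed alpha-SMC step; returns new particles, weights and ancestors."""
    v = w * potential(x)              # v_j = W_{t-1}^j g_{t-1}(X_{t-1}^j)
    mix = alpha * v[None, :]          # mix[i, j] = alpha^{ij} v_j
    new_w = mix.sum(axis=1)           # W_t^i
    cdf = np.cumsum(mix / new_w[:, None], axis=1)
    u = rng.random(x.shape[0])
    # Inverse-CDF sampling of one ancestor per particle, guarded against rounding.
    ancestors = np.minimum((cdf < u[:, None]).sum(axis=1), x.shape[0] - 1)
    return propagate(x[ancestors], rng), new_w, ancestors


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 500
    alpha = np.full((n, n), 1.0 / n)  # fully connected case (bootstrap filter)
    x, w = rng.standard_normal(n), np.ones(n)
    for _ in range(6):
        x, w, _ = alpha_smc_step(x, w, alpha, rng)
    print(np.sum(w * x**2) / np.sum(w))  # estimate of pi_6(phi), phi(x) = x^2
```

Passing the identity matrix as `alpha` gives sequential importance sampling, and passing the walk matrix of a $d$-regular graph gives the sparse setting studied here.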

6.3 Comparison with bootstrap particle filter

We numerically investigate the effects of using sparse $d$ -regular networks on the stability of the $\alpha$ -sequential Monte Carlo algorithm. Three settings are considered.

(a)

A local exchange scheme (Heine and Whiteley, 2017).

(b)

Generating an $\alpha$ matrix as described in Section 6.1 at the beginning of the algorithm and fixing it throughout. This is the setting considered in Section 3 and is referred to as ‘random $d$ -regular (no permutation)’ in this section.

(c)

Randomly permuting the matrix generated in (b) at each time step; this is the setting considered in Section 4 and is referred to as ‘random $d$ -regular (with permutation)’ in this section.

We consider a time-discretized version of the chaotic Lorenz 63 model (Lorenz, 1963). The hidden chain $\{X_{t}\}_{t\geq 0}$ is three-dimensional with $X_{t}=(X_{t,1},X_{t,2},X_{t,3})$ and evolves as

[TABLE]

where $\Delta t=10^{-3}$ is the time-discretization and $\varepsilon_{t}=(\varepsilon_{t,1},\varepsilon_{t,2},\varepsilon_{t,3})$ are independent and identically distributed as $\mathrm{Normal}(0,\Delta t\tau^{2}I)$ for $\tau=10^{-1}$ . This model is known to be chaotic when $(\sigma,\rho,\beta)=(10,28,8/3)$ , and this is the setting we choose. We collect observations $Y_{t}$ after every $\delta=10\Delta t$ units of time and assume that they are distributed as $Y_{t}\mid X_{t}\sim\mathrm{Normal}(X_{t},\eta^{2}I)$ for $\eta=5\times 10^{-1}$ .
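A data-generating sketch for this model is given below. Since the display equation is not reproduced in this text, the sketch assumes a standard Euler-Maruyama discretisation of the Lorenz 63 drift with additive $\mathrm{Normal}(0,\Delta t\tau^{2}I)$ noise; the scheme used in the experiments may differ in its details. The function names and the initial state are ours.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # chaotic regime quoted in the text


def lorenz_drift(x):
    """Drift of the Lorenz 63 system at state x = (x1, x2, x3)."""
    x1, x2, x3 = x
    return np.array([SIGMA * (x2 - x1), x1 * (RHO - x3) - x2, x1 * x2 - BETA * x3])


def lorenz_step(x, rng, dt=1e-3, tau=1e-1):
    """One assumed Euler-Maruyama step with Normal(0, dt * tau^2 * I) noise."""
    noise = rng.normal(0.0, np.sqrt(dt) * tau, size=3)
    return x + dt * lorenz_drift(x) + noise


def simulate(x0, n_obs, rng, dt=1e-3, tau=1e-1, eta=0.5, steps_per_obs=10):
    """Hidden states at observation times and observations Y ~ Normal(X, eta^2 I)."""
    x = np.asarray(x0, dtype=float)
    states, observations = [], []
    for _ in range(n_obs):
        for _ in range(steps_per_obs):  # one observation every 10 * dt time units
            x = lorenz_step(x, rng, dt, tau)
        states.append(x.copy())
        observations.append(x + eta * rng.standard_normal(3))
    return np.array(states), np.array(observations)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states, observations = simulate([1.0, 1.0, 1.0], n_obs=1000, rng=rng)
    print(states.shape, observations.shape)
```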

We generate $T=10^{3}$ observations from this model. The bootstrap particle filter with $10^{6}$ particles is used to calculate the ground truth. We compare the relative mean square errors of the estimates of the log-likelihood and of the predictive mean $E(X_{T}\mid Y_{0:(T-1)})$ for the three methods; the relative mean square error is the ratio of the mean square error of the estimate obtained by each method to that of the estimate obtained by the bootstrap particle filter with the same number of particles. We repeat the experiments $100$ times to obtain the mean square error.

The two left plots of Figure 4 display relative mean square errors for $N=5\times 10^{4}$ as the degree $d$ of the graph increases. As expected, the local exchange particle filter has a large error as compared to the bootstrap particle filter, which decreases as the degree increases. More interestingly, choosing a random $d$ -regular graph has much lower error and is virtually indistinguishable from the bootstrap particle filter. This is true irrespective of whether we permute the nodes of the graph at every time, which is unsurprising as the permutation operation leaves the mixing constant unchanged. A random $5$ -regular graph appears to perform extremely well.
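The remark that permuting the nodes leaves the mixing constant unchanged follows from $PAP^{-1}$ being a similarity transform, which preserves eigenvalues. The short check below illustrates this on a ring graph; it is a sketch with our own function names, and it assumes that the same permutation is applied to rows and columns.

```python
import numpy as np


def ring_alpha(n, d):
    """Walk matrix of the ring where node i is linked to i±1, ..., i±d/2 (d even)."""
    alpha = np.zeros((n, n))
    idx = np.arange(n)
    for s in range(1, d // 2 + 1):
        alpha[idx, (idx + s) % n] = 1.0 / d
        alpha[idx, (idx - s) % n] = 1.0 / d
    return alpha


def permute(alpha, rng):
    """Relabel the nodes of alpha with a uniformly random permutation."""
    sigma = rng.permutation(alpha.shape[0])
    return alpha[np.ix_(sigma, sigma)]


def abs_spectrum(alpha):
    """Sorted absolute eigenvalues of a symmetric matrix."""
    return np.sort(np.abs(np.linalg.eigvalsh(alpha)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha = ring_alpha(60, 4)
    gap = np.max(np.abs(abs_spectrum(alpha) - abs_spectrum(permute(alpha, rng))))
    print(gap)  # difference between the two spectra, up to rounding error
```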

The two right plots of Figure 4 display relative mean square errors as the network size (number of particles) $N$ increases. The performance of the local exchange particle filter deteriorates as $N$ increases, which is unsurprising since its mixing deteriorates. However, as predicted by the theory, the performance of a random $d$ -regular graph remains stable as $N$ increases, whether or not the nodes are permuted at each time.

Finally, we display in Figure 5 the additional variance of the $\alpha$ -sequential Monte Carlo algorithm as compared to the bootstrap particle filter when using a random $d$ -regular graph as the connectivity structure. As predicted by the theory, the additional variance is of order $\mathcal{O}(d^{-1})$ .

7 Conclusion

The bottleneck in parallelising particle filters is usually the resampling step since it typically involves interactions between all particles. Reducing these interactions can lead to more efficient algorithms, albeit sometimes at the expense of stability. Future directions can include relaxing the assumptions made in this article, considering adaptive sequential Monte Carlo (Fearnhead and Taylor, 2013), and considering high-dimensional target spaces (Beskos et al., 2014). An interesting future direction would be to consider estimating the variance of the estimates obtained by the $\alpha$ -sequential Monte Carlo algorithm along the lines of Chan and Lai (2013) and Lee and Whiteley (2018). From an applied perspective, it would be interesting to compare stable distributed implementations of sequential Monte Carlo with distributed Markov chain Monte Carlo.

Acknowledgment

The author acknowledges support from grant DMS-1638521 from SAMSI. The author would like to thank Alexandre H Thiery and Kari Heine for helpful discussions.

Appendix A Proof of time-uniform stability

Proof of Theorem 1.

Recall that the sequence of probability measures $\{\pi_{t}\}_{t\geq 0}$ defined by the state-space model (2) of the main text satisfies $\pi_{t}=\mathsf{F}_{s,t}\pi_{s}$ , where the operator $\mathsf{F}_{t}$ is defined as

[TABLE]

The stability properties of the operators $\{\mathsf{F}_{t}\}_{t\geq 1}$ are well-understood (Del Moral, 2004). Under Assumption 1 of the main text, there exist constants $D>0$ and $\rho\in(0,1)$ such that for any two probability measures $\mu,\mu^{\prime}\in\mathcal{P}(\mathsf{X})$ , we have ${|\kern-1.07639pt|\kern-1.07639pt|\mathsf{F}_{s,t}\mu-\mathsf{F}_{s,t}\mu^{\prime}|\kern-1.07639pt|\kern-1.07639pt|}\;\leq\;D\rho^{t-s}{|\kern-1.07639pt|\kern-1.07639pt|\mu-\mu^{\prime}|\kern-1.07639pt|\kern-1.07639pt|}$ . The decomposition $(\widehat{\pi}^{N}_{T}-\pi_{T})=(\widehat{\pi}^{N}_{T}-\mathsf{F}_{0,T}\widehat{\pi}^{N}_{0})+(\mathsf{F}_{0,T}\widehat{\pi}^{N}_{0}-\mathsf{F}_{0,T}\pi_{0})$ and the standard telescoping expansion $(\widehat{\pi}^{N}_{T}-\mathsf{F}_{0,T}\widehat{\pi}^{N}_{0})=\sum_{t=1}^{T}(\mathsf{F}_{t,T}\widehat{\pi}^{N}_{t}-\mathsf{F}_{t,T}\mathsf{F}_{t}\widehat{\pi}^{N}_{t-1})$ together yield that the discrepancy ${|\kern-1.07639pt|\kern-1.07639pt|\widehat{\pi}^{N}_{T}-\pi_{T}|\kern-1.07639pt|\kern-1.07639pt|}$ can be controlled as

[TABLE]

Since $\widehat{\pi}^{N}_{0}(\varphi)=N^{-1}\sum_{i=1}^{N}\varphi(X_{0,i})$ for independent and identically distributed samples $X_{0,i}\sim\pi_{0}$ , it follows that ${|\kern-1.07639pt|\kern-1.07639pt|\widehat{\pi}^{N}_{0}-\pi_{0}|\kern-1.07639pt|\kern-1.07639pt|}\leq N^{-1/2}$ . Consequently, since $\rho\in(0,1)$ , for proving an upper bound of the type $\sup_{t\geq 0}\;{|\kern-1.07639pt|\kern-1.07639pt|\widehat{\pi}^{N}_{t}-\pi_{t}|\kern-1.07639pt|\kern-1.07639pt|}\;\leq\;\textrm{Cst}\times N^{-1/2}$ , it only remains to prove that the quantities ${|\kern-1.07639pt|\kern-1.07639pt|\widehat{\pi}^{N}_{t}-\mathsf{F}_{t}\widehat{\pi}^{N}_{t-1}|\kern-1.07639pt|\kern-1.07639pt|}$ can be uniformly bounded in time by a constant multiple of $N^{-1/2}$ .
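The two elementary steps used in this paragraph can be spelled out. We assume here, as we understand the definition given in the main text (not reproduced in this appendix), that the norm is the worst-case root mean square error over test functions bounded by one, ${|\kern-1.07639pt|\kern-1.07639pt|\mu-\nu|\kern-1.07639pt|\kern-1.07639pt|}=\sup_{\|\varphi\|_{\infty}\leq 1}E[\{\mu(\varphi)-\nu(\varphi)\}^{2}]^{1/2}$. For independent and identically distributed samples from $\pi_{0}$ and any such $\varphi$,

$$E\big[\{\widehat{\pi}^{N}_{0}(\varphi)-\pi_{0}(\varphi)\}^{2}\big]=\frac{\mathrm{var}_{\pi_{0}}(\varphi)}{N}\leq\frac{\pi_{0}(\varphi^{2})}{N}\leq\frac{1}{N},$$

which gives the bound $N^{-1/2}$ after taking square roots. Moreover, if the one-step errors are bounded by $c\,N^{-1/2}$ uniformly in $t$ for some constant $c$ (a symbol introduced here for illustration only), the contraction at rate $\rho$ yields a bound that does not depend on $T$:

$$\sum_{t=1}^{T}D\rho^{T-t}\,c\,N^{-1/2}\;\leq\;\frac{D\,c}{1-\rho}\,N^{-1/2}.$$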

For a test function $\varphi\in\mathcal{B}(\mathsf{X})$ , we have $\widehat{\pi}^{N}_{t}(\varphi)-\mathsf{F}_{t}\widehat{\pi}^{N}_{t-1}(\varphi)=(\widetilde{A}-A)/B$ for $A=\sum_{i=1}^{N}W_{t-1}^{i}g_{t-1}(X_{t-1}^{i})K_{t}\varphi(X_{t-1}^{i})$ , $\widetilde{A}=\sum_{i=1}^{N}W_{t}^{i}\varphi(X_{t}^{i})$ , and $B=\sum_{i=1}^{N}W_{t-1}^{i}g_{t-1}(X_{t-1}^{i})$ . We have $E_{t-1}(\widetilde{A})=A$ , and the quantities $A$ , $B$ and $W^{i}_{t}$ are all $\mathcal{F}_{t-1}$ -measurable. It follows that

[TABLE]

In the last line of equation 18, we have used the fact that $B=\sum_{i=1}^{N}W_{t}^{i}$ . We have also introduced the quantity $\mathsf{E}^{N}_{t}=\|\overline{W}_{t}\|^{2}$ ; this is a measure of the effective sample size (Whiteley et al., 2016). In summary, we have thus established that

[TABLE]

As recognised in Whiteley et al. (2016), equation 18 shows that controlling the behaviour of $\mathsf{E}_{t}^{N}$ is crucial to studying the stability properties of the $\alpha$ -sequential Monte Carlo algorithm. For proving a bound of the type given by $\sup_{t\geq 0}\;{|\kern-1.07639pt|\kern-1.07639pt|\widehat{\pi}^{N}_{t}-\pi_{t}|\kern-1.07639pt|\kern-1.07639pt|}\;\leq\;\textrm{Cst}\times N^{-1/2}$ , equation 19 reveals that it suffices to have the uniform-in-time bound $E(\mathsf{E}_{t}^{N})\leq\mathrm{Cst}/N$ . Recalling that $\kappa_{g}^{-1}\leq g_{t-1}(x)\leq\kappa_{g}$ by Assumption 1 of the main text, the bound (8) yields that

[TABLE]

If the constant $\lambda(\alpha)$ defined in equation 6 of the main text satisfies $\lambda(\alpha)<1/\kappa_{g}^{2}$ , iterating the bound (20) directly yields that

[TABLE]

Combining equation 17 and equation 21, the theorem is proved. ∎

Appendix B Proofs for randomized connections

B.1 Setup

The following proposition is useful in studying the asymptotic behaviour of the $\alpha$ -sequential Monte Carlo algorithm.

Proposition 1 (Basic properties).

The following are true, where the expectations are with respect to the distribution $\upsilon_{N}$ on $\mathcal{M}_{N}$ .

(a)

$E\{(\alpha^{ii})^{2}\}$ does not depend on $i$ .

(b)

For $i\neq j$ , $E(\alpha^{ij})$ and $E\{(\alpha^{ij})^{2}\}$ do not depend on $(i,j)$ . Further, there exists $0\leq c_{1}<\infty$ such that $E(\alpha^{ij})\leq c_{1}N^{-1}$ and $E\{(\alpha^{ij})^{2}\}\leq c_{1}N^{-1}$ for $N$ large.

(c)

For $i\neq j$ , $E(\alpha^{ii}\alpha^{ij})$ does not depend on $(i,j)$ . Further, there exists $0\leq c_{2}<\infty$ such that $E(\alpha^{ii}\alpha^{ij})\leq c_{2}N^{-1}$ for $N$ large.

For example, for $i\neq j$ , we have $E(\alpha^{ij})=N^{-1}$ and $E\{(\alpha^{ij})^{2}\}=N^{-2}$ for the bootstrap particle filter, and we have $E(\alpha^{ij})=E\{(\alpha^{ij})^{2}\}=0$ for sequential importance sampling.

Let the permutation corresponding to a permutation matrix $P$ be $\sigma:\{1,\dots,N\}\mapsto\{1,\dots,N\}$ . Then $(P\alpha P^{-1})^{ij}=\alpha^{\sigma(i)\sigma^{-1}(j)}$ . By equation 11, this implies $\alpha^{ij}\stackrel{{\scriptstyle\mathrm{D}}}{{=}}\alpha^{\sigma(i)\sigma^{-1}(j)}$ for all $1\leq i,j\leq N$ and all permutations $\sigma$ , where we have used the notation $\cdot\stackrel{{\scriptstyle\mathrm{D}}}{{=}}\cdot$ to denote that the left and right hand side have the same distribution.

Proof of Proposition 1.

Part a follows since for any $i\neq i^{\prime}$ , there exists a permutation $\sigma$ such that $\sigma(i)=i^{\prime}$ and $\sigma(i^{\prime})=i$ , which implies that $\alpha^{ii}\stackrel{{\scriptstyle\mathrm{D}}}{{=}}\alpha^{\sigma(i)\sigma^{-1}(i)}=\alpha^{i^{\prime}i^{\prime}}$ .

To see part b, consider $i\neq j\neq k$ . There exists a permutation $\sigma$ such that $\sigma(i)=i$ and $\sigma(k)=j$ , which implies that $\alpha^{ij}\stackrel{{\scriptstyle\mathrm{D}}}{{=}}\alpha^{\sigma(i)\sigma^{-1}(j)}=\alpha^{ik}$ . This implies that $E(\alpha^{ij})=E(\alpha^{ik})$ for all $i\neq j\neq k$ . Similarly, we have $E(\alpha^{ij})=E(\alpha^{i^{\prime}j})$ for all $i\neq i^{\prime}\neq j$ . The previous two statements imply that $E(\alpha^{ij})$ does not depend on $(i,j)$ for $i\neq j$ . Since $\sum_{j=1}^{N}\alpha^{ij}=1$ and $\alpha^{ij}\leq 1$ , the results follow.

To see part c, consider $i\neq j\neq k$ . There exists a permutation $\sigma$ such that $\sigma(i)=i$ and $\sigma(k)=j$ , and therefore $(\alpha^{ii},\alpha^{ij})\stackrel{{\scriptstyle\mathrm{D}}}{{=}}(\alpha^{\sigma(i)\sigma^{-1}(i)},\alpha^{\sigma(i)\sigma^{-1}(j)})=(\alpha^{ii},\alpha^{ik})$ for $j\neq k$ . Thus $E(\alpha^{ii}\alpha^{ij})$ does not depend on $j$ for $j\neq i$ . Similarly, for $i\neq i^{\prime}\neq j$ , there exists a permutation $\sigma$ such that $\sigma(i)=i^{\prime}$ , $\sigma(i^{\prime})=i$ , and $\sigma(j)=j$ , which implies $(\alpha^{ii},\alpha^{ij})\stackrel{{\scriptstyle\mathrm{D}}}{{=}}(\alpha^{\sigma(i)\sigma^{-1}(i)},\alpha^{\sigma(i)\sigma^{-1}(j)})=(\alpha^{i^{\prime}i^{\prime}},\alpha^{i^{\prime}j})$ . Thus $E(\alpha^{ii}\alpha^{ij})$ does not depend on $i$ for $i\neq j$ . The inequality follows from part b as $\alpha^{ij}\leq 1$ . ∎
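The inequalities in parts b and c, which the proof states briefly, can be written out as follows. The constants are one admissible choice and are not claimed to be sharp. For fixed $i$ and any $j\neq i$, the fact that $E(\alpha^{ik})$ is the same for all $k\neq i$, together with the row-sum constraint, gives

$$(N-1)\,E(\alpha^{ij})=E\Big(\sum_{k\neq i}\alpha^{ik}\Big)\leq E\Big(\sum_{k=1}^{N}\alpha^{ik}\Big)=1,\qquad\text{so}\qquad E(\alpha^{ij})\leq\frac{1}{N-1}\leq\frac{2}{N}\quad(N\geq 2).$$

Since $0\leq\alpha^{ij}\leq 1$ and $0\leq\alpha^{ii}\leq 1$, we also have

$$E\{(\alpha^{ij})^{2}\}\leq E(\alpha^{ij})\leq\frac{2}{N},\qquad E(\alpha^{ii}\alpha^{ij})\leq E(\alpha^{ij})\leq\frac{2}{N},$$

so that $c_{1}=c_{2}=2$ works in Proposition 1.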

By the assumption that for any time index $t\geq 0$ and $x\in\mathsf{X}$ , we have $0<g_{t}(x)\leq\kappa_{g}$ , it follows that for any time index $t\geq 0$ and particle index $1\leq i\leq N$ , we have $0<W^{i}_{t}\leq\kappa_{g}^{t}$ , so that, for a test function $\varphi\in\mathcal{B}(\mathsf{X})$ , the random variables $\widehat{\gamma}^{N}_{t}(\varphi)$ and $\widehat{\mu}^{N}_{t}(\varphi)$ are almost surely bounded: $\|\widehat{\gamma}^{N}_{t}(\varphi)\|_{\infty}\leq\kappa_{g}^{t}$ and $\|\widehat{\mu}^{N}_{t}(\varphi)\|_{\infty}\leq\kappa_{g}^{2t}$ , where the infinity norm of a random variable $X$ is defined as $\|X\|_{\infty}=\inf\{B>0:\mathsf{pr}(|X|\leq B)=1\}$ .
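The weight bound can be verified by a one-line induction. We assume here that the weights of the main text (Section 2.1, not reproduced in this appendix) start from $W^{i}_{0}=1$ and follow the recursion $W^{i}_{t}=\sum_{j}\alpha^{ij}_{t-1}W^{j}_{t-1}g_{t-1}(X^{j}_{t-1})$ with row-stochastic $\alpha_{t-1}$. If $0<W^{j}_{t-1}\leq\kappa_{g}^{t-1}$ for all $j$, then

$$0<W^{i}_{t}=\sum_{j=1}^{N}\alpha^{ij}_{t-1}\,W^{j}_{t-1}\,g_{t-1}(X^{j}_{t-1})\leq\kappa_{g}^{t-1}\,\kappa_{g}\sum_{j=1}^{N}\alpha^{ij}_{t-1}=\kappa_{g}^{t}.$$

For a test function with $\|\varphi\|_{\infty}\leq 1$, it follows that

$$|\widehat{\gamma}^{N}_{t}(\varphi)|\leq\frac{1}{N}\sum_{i=1}^{N}W^{i}_{t}\,|\varphi(X^{i}_{t})|\leq\kappa_{g}^{t},\qquad|\widehat{\mu}^{N}_{t}(\varphi)|\leq\frac{1}{N}\sum_{i=1}^{N}(W^{i}_{t})^{2}\,|\varphi(X^{i}_{t})|\leq\kappa_{g}^{2t}.$$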

For the bootstrap particle filter, it is standard that as $N\to\infty$ , the sequence of particle approximations $\widehat{\pi}^{N}_{t}$ is consistent (Del Moral, 1996). Since in that case, all the weights $W^{i}_{t}$ are equal and converge in probability to $Z_{t}$ , it follows that

[TABLE]

where $\stackrel{{\scriptstyle\mathrm{pr}}}{{\rightarrow}}$ denotes convergence in probability. Equation (12) of the main text can be seen as a generalisation of equation 22.

For the bootstrap particle filter, equation 12 of the main text implies $\mu_{t}(\varphi)=Z_{t}^{2}\pi_{t}(\varphi)$ , which in turn implies equation 22. Similarly, for importance sampling, equation 12 of the main text implies $\mu_{t}(\varphi)=\mu_{t-1}(\widetilde{\mathsf{Q}}_{t}\varphi)$ , which is as expected. Equation (12) is therefore a generalization of the bootstrap particle filter and importance sampling, which represent two extreme communication structures (fully connected and not connected at all, respectively).

B.2 Consistency

Proof of Theorem 2.

Since $\widehat{\pi}^{N}_{t}(\varphi)=\widehat{\gamma}^{N}_{t}(\varphi)/\widehat{\gamma}^{N}_{t}(1)$ , it suffices to prove that $\widehat{\gamma}^{N}_{t}(\varphi)\stackrel{{\scriptstyle\mathrm{pr}}}{{\rightarrow}}\gamma_{t}(\varphi)$ and $\widehat{\mu}^{N}_{t}(\varphi)\stackrel{{\scriptstyle\mathrm{pr}}}{{\rightarrow}}\mu_{t}(\varphi)$ . We work by induction, the initial case $t=0$ following directly from the weak law of large numbers.

**(I) Convergence of $\widehat{\gamma}^{N}_{t}$ .** This follows from Theorem 1 of Whiteley et al. (2016), but we include a proof for the sake of completeness. Since $\|\widehat{\gamma}^{N}_{t}(\varphi)\|_{\infty}\leq\kappa_{g}^{t}$ and $E\{\widehat{\gamma}^{N}_{t}(\varphi)\}=\gamma_{t}(\varphi)$ , for proving $\widehat{\gamma}^{N}_{t}(\varphi)\stackrel{{\scriptstyle\mathrm{pr}}}{{\rightarrow}}\gamma_{t}(\varphi)$ , it suffices to prove that the variance of $\widehat{\gamma}^{N}_{t}(\varphi)$ converges to zero.

•

The variance of $E_{t-1}\{\widehat{\gamma}^{N}_{t}(\varphi)\}=\widehat{\gamma}^{N}_{t-1}(\mathsf{Q}_{t}\varphi)$ converges to zero since the random variables $\widehat{\gamma}^{N}_{t-1}(\mathsf{Q}_{t}\varphi)$ are bounded by $\kappa_{g}^{t}$ and, by the induction hypothesis, converge in probability to $\gamma_{t-1}(\mathsf{Q}_{t}\varphi)$ as $N\to\infty$ .

•

The expectation of $\mathrm{var}_{t-1}\{\widehat{\gamma}^{N}_{t}(\varphi)\}$ also converges to zero. Indeed, since the random variables $\{W^{i}_{t}\,\varphi(X^{i}_{t})\}_{i=1}^{N}$ are independent conditionally upon $\mathcal{F}_{t-1}$ and upper bounded in absolute value by $\kappa_{g}^{t}$ , we have that, almost surely,

[TABLE]

Since $\mathrm{var}\{\widehat{\gamma}^{N}_{t}(\varphi)\}$ can be decomposed as the sum of the expectation of $\mathrm{var}_{t-1}\{\widehat{\gamma}^{N}_{t}(\varphi)\}$ and the variance of $E_{t-1}\{\widehat{\gamma}^{N}_{t}(\varphi)\}$ , this concludes the proof that $\widehat{\gamma}^{N}_{t}(\varphi)\stackrel{{\scriptstyle\mathrm{pr}}}{{\rightarrow}}\gamma_{t}(\varphi)$ .

**(II) Convergence of $\widehat{\mu}^{N}_{t}$ .** We proceed in two steps. We first show that $E\{\widehat{\mu}^{N}_{t}(\varphi)\}\to\mu_{t}(\varphi)$ , and then prove that $\mathrm{var}\{\widehat{\mu}^{N}_{t}(\varphi)\}$ converges to zero. Since $\|\widehat{\mu}^{N}_{t}(\varphi)\|_{\infty}\leq\kappa_{g}^{2t}$ , this is enough to obtain that $\widehat{\mu}^{N}_{t}(\varphi)\stackrel{{\scriptstyle\mathrm{pr}}}{{\rightarrow}}\mu_{t}(\varphi)$ .

To begin with,

[TABLE]

Expression (23) is

[TABLE]

where the first equality is by Proposition 1 a and the first part of Proposition 1 b, the second and fourth equalities are because $\widehat{\mu}^{N}_{t-1}(\widetilde{\mathsf{Q}}_{t}\varphi)=(1/N)\sum_{i=1}^{N}(W_{t-1}^{i})^{2}g^{2}_{t-1}(X_{t-1}^{i})K_{t}\varphi(X_{t-1}^{i})$ , the third equality is by the second part of Proposition 1 b, and the limit is by the consistency of $\widehat{\mu}^{N}_{t-1}$ .

For the sake of convenience, define

[TABLE]

Expression (24) can be written as

[TABLE]

where the third equality uses Proposition 1 c, the fourth equality uses Assumption 2, and the sixth equality uses the fact that all the relevant quantities are bounded.

Putting together what we have obtained so far, we get

[TABLE]

It remains to prove that $\mathrm{var}\{\widehat{\mu}^{N}_{t}(\varphi)\}=\mathrm{var}[E_{t-1}\{\widehat{\mu}^{N}_{t}(\varphi)\}]+E[\mathrm{var}_{t-1}\{\widehat{\mu}^{N}_{t}(\varphi)\}]\to 0$ . We now prove that each term converges to zero. From the proof of $E_{t-1}\{\widehat{\mu}^{N}_{t}(\varphi)\}\to\mu_{t}(\varphi)$ , we have

[TABLE]

•

By Proposition 1 and Assumption 2, all terms on the right-hand side of equation 25 are upper bounded by a universal constant and, by induction, converge in probability to a constant. Therefore the variance of $E_{t-1}\{\widehat{\mu}^{N}_{t}(\varphi)\}$ converges to zero.

•

The expectation of $\mathrm{var}_{t-1}\{\widehat{\mu}^{N}_{t}(\varphi)\}$ also converges to zero. Indeed, since the random variables $\{(W^{i}_{t})^{2}\,\varphi(X^{i}_{t})\}_{i=1}^{N}$ are independent conditionally upon $\mathcal{F}^{N}_{t-1}$ and upper bounded in absolute value by $\kappa_{g}^{2t}$ , we have that, almost surely,

[TABLE]

This concludes the proof. ∎

B.3 Central limit theorem

Proof of Theorem 3.

Notice that $\{\widehat{\pi}^{N}_{t}(\varphi)-\pi_{t}(\varphi)\}=\{\widehat{\gamma}^{N}_{t}(1)\}^{-1}\,[\widehat{\gamma}^{N}_{t}\{\varphi-\pi_{t}(\varphi)\}]$ and $\widehat{\gamma}^{N}_{t}(1)\stackrel{{\scriptstyle\mathrm{pr}}}{{\rightarrow}}Z_{t}$ . Consequently, the recursive formula for the asymptotic variance $\mathbb{V}^{\pi}_{t}(\varphi)$ readily follows from the one describing $\mathbb{V}^{\gamma}_{t}(\varphi)$ . We thus concentrate on proving the recursive formula for $\mathbb{V}^{\gamma}_{t}(\varphi)$ . We proceed by induction and use a standard Fourier-theoretic approach. The initial case $t=0$ follows directly from the standard central limit theorem for independent and identically distributed random variables. We need to prove that for any $\xi\in\mathbb{R}$ and $S^{N}_{t}=N^{1/2}\{\widehat{\gamma}^{N}_{t}(\varphi)-\gamma_{t}(\varphi)\}$ , we have $E\{\exp{\left(\mathrm{i}\mkern 1.0mu\xi S^{N}_{t}\right)}\}\stackrel{{\scriptstyle\mathrm{pr}}}{{\rightarrow}}\exp\{-\mathbb{V}^{\gamma}_{t}(\varphi)\xi^{2}/2\}$ , where $\mathrm{i}\mkern 1.0mu$ denotes the imaginary unit. We have

[TABLE]

Further, $B^{N}_{t}(\varphi)=N^{1/2}\{\widehat{\gamma}^{N}_{t-1}(\mathsf{Q}_{t}\varphi)-\gamma_{t-1}(\mathsf{Q}_{t}\varphi)\}$ , so the induction hypothesis yields that $E[\exp\{\mathrm{i}\mkern 1.0mu\xi B^{N}_{t}(\varphi)\}]\stackrel{{\scriptstyle\mathrm{pr}}}{{\rightarrow}}\exp\{-\mathbb{V}^{\gamma}_{t-1}(\mathsf{Q}_{t}\varphi)\xi^{2}/2\}$ . To conclude, it suffices to show that

[TABLE]

since it then follows (by Slutsky’s theorem) that

[TABLE]

We thus concentrate on establishing equation (26). To this end, note that we can write $A^{N}_{t}(\varphi)=N^{-1/2}\sum_{i=1}^{N}U^{i}_{t}$ , where the random variables $U^{i}_{t}=W^{i}_{t}\varphi(X^{i}_{t})-E_{t-1}\{W^{i}_{t}\varphi(X^{i}_{t})\}=W^{i}_{t}\varphi(X^{i}_{t})-\widehat{\gamma}^{N}_{t-1}(\mathsf{Q}_{t}\varphi)$ are independent and identically distributed conditionally upon $\mathcal{F}_{t-1}$ . Theorem A.3 of Douc and Moulines (2007) shows that in order to prove equation 26, it is enough to prove that for any $\epsilon>0$ and as $N\to\infty$ , we have

[TABLE]

where $\mathbb{I}(\cdot)$ denotes the indicator function. The tail condition (28) directly follows from the fact that we consider bounded test functions $\varphi\in\mathcal{B}(\mathsf{X})$ and that $0<W^{i}_{t}\leq\kappa_{g}^{t}$ almost surely. We thus focus on proving equation 27. We have $\mathrm{var}_{t-1}(U^{i}_{t})=E_{t-1}\{(W^{i}_{t})^{2}\varphi^{2}(X^{i}_{t})\}-E_{t-1}\{W^{i}_{t}\varphi(X^{i}_{t})\}^{2}$ , so equation 5 of the main text, Theorem 2, and the boundedness of $\widehat{\mu}^{N}_{t}(\varphi^{2})$ together yield that

[TABLE]

as desired. This concludes the proof. ∎
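The Fourier-theoretic criterion used in the proof can be checked numerically in the simplest setting, the initial case $t=0$ with independent and identically distributed draws. The sketch below is an illustration only and is not part of the paper: it uses Uniform(0, 1) draws as a stand-in for $\varphi(X^{i}_{0})$ (mean 1/2, variance 1/12), and the function names `empirical_char_fn` and `gaussian_char_fn` are hypothetical.

```python
import numpy as np


def empirical_char_fn(xi, n_particles, n_reps, rng):
    """Monte Carlo estimate of E[exp(i * xi * S_N)], S_N = sqrt(N) * (mean - mu).

    Assumption for illustration: Uniform(0, 1) draws stand in for phi(X_0^i),
    so mu = 1/2 and the limiting variance is 1/12.
    """
    samples = rng.uniform(0.0, 1.0, size=(n_reps, n_particles))
    s_n = np.sqrt(n_particles) * (samples.mean(axis=1) - 0.5)
    return np.exp(1j * xi * s_n).mean()


def gaussian_char_fn(xi, variance):
    """Characteristic function of a centred Gaussian with the given variance."""
    return np.exp(-variance * xi**2 / 2.0)


rng = np.random.default_rng(0)
xi = 1.5
limit = gaussian_char_fn(xi, 1.0 / 12.0)
estimate = empirical_char_fn(xi, n_particles=500, n_reps=4000, rng=rng)

# The gap should be small, up to Monte Carlo error over the replications.
print(abs(estimate - limit))
```

Repeating the comparison over a grid of $\xi$ values gives a rough visual check that the characteristic function of $S^{N}_{0}$ approaches its Gaussian limit.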

Bibliography

Ahn, S., Shahbaba, B., and Welling, M. (2014). Distributed stochastic gradient MCMC. In International Conference on Machine Learning, pages 1044–1052.
Alon, N. (1986). Eigenvalues and expanders. Combinatorica, 6(2):83–96.
Andrieu, C., Doucet, A., and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342.
Andrieu, C. and Roberts, G. O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics, pages 697–725.
Beskos, A., Crisan, D., and Jasra, A. (2014). On the stability of sequential Monte Carlo methods in high dimensions. The Annals of Applied Probability, 24(4):1396–1445.
Bolic, M., Djuric, P. M., and Hong, S. (2005). Resampling algorithms and architectures for distributed particle filters. IEEE Transactions on Signal Processing, 53(7):2442–2450.
Chan, H. P. and Lai, T. L. (2013). A general theory of particle filters in hidden Markov models and some applications. The Annals of Statistics, 41(6):2877–2904.
Chopin, N. (2004). Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. The Annals of Statistics, 32(6):2385–2411.
