Accelerated inference for stochastic compartmental models with over-dispersed partial observations
Michael Whitehouse

TL;DR
This paper introduces a fast method for analyzing disease spread models with uncertain observations.
Contribution
A new approximate likelihood method for stochastic compartmental models with over-dispersed observations is developed.
Findings
The method achieves order of magnitude speed gains over existing approaches.
It successfully recovers latent disease states and reporting probabilities in simulations.
The approach is implemented in Stan for practical disease modeling applications.
Abstract
An assumed density approximate likelihood is derived for a class of partially observed stochastic compartmental models which permit observational over-dispersion. This is achieved by treating time-varying reporting probabilities as latent variables and integrating them out using Laplace approximations within Poisson Approximate Likelihoods (LawPAL), resulting in a fast deterministic approximation to the marginal likelihood and filtering distributions. We derive an asymptotically exact filtering result in the large population regime, demonstrating the approximation’s ability to recover latent disease states and reporting probabilities. Through simulations we: 1) demonstrate favorable behavior of the maximum approximate likelihood estimator in the large population and time horizon regime in terms of ground truth recovery; 2) demonstrate order of magnitude computational speed gains over a…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7- —https://doi.org/10.13039/501100000265Medical Research Council
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCOVID-19 epidemiological studies · Markov Chains and Monte Carlo Methods · Statistical Methods and Bayesian Inference
Introduction
Compartmental models, initially developed in the seminal works of Mckendrick (1925); Kermack and Mckendrick (1927), capture disease outbreak dynamics by dividing a population into distinct disease states we call ‘compartments’. One then defines a transmission model which describes the rate at which individuals transition between compartments through time - in turn, this implicitly characterizes the progression of the disease through the population. In practice this process is noisily observed through testing data, usually cast as random under-reporting of either incidence (newly infected individuals) or prevalence (currently infected individuals) counts.
Such models can be cast in myriad ways, deterministic, stochastic, continuous, or discrete time, finite population counts, or continuous proportions. The link between the Markov jump process formulation and its corresponding large population limiting deterministic ODE system was formalized by Kurtz (1970). When coupled with a noisy observation model likelihoods for ODE transmission models are relatively easy to compute using an ODE solver. This has motivated widespread adoption of the ODE approach in practice. Such an approach, however, ignores the inherent stochasticity governing disease transmission (King et al. 2015).
A complication introduced when using a stochastic transmission model is the need to marginalize over the latent transmission process, which subsequently renders the likelihood intractable. Many simulation based solutions to this issue have been proposed in the literature, approaches include: sequential Monte Carlo (SMC) (Corbella et al. 2022; Koepke et al. 2016; Wheeler et al. 2024), Approximate Bayesian Computation (Mckinley et al. 2009; Kypraios et al. 2017), Data Augmentation MCMC (Fintzi et al. 2017; Lekone and Finkenstädt 2006; Nguyen-Van-Yen et al. 2021). In principle these methods only require the ability to simulate from the model, though in practice they rely on intricate tuning, vast computational power for many repeated simulations, and the construction of sophisticated proposal distributions or summary statistics (Rimella et al. 2022; Prangle 2018). One deterministic approach is the linear noise approximation (LNA) (Golightly et al. 2023; Chkrebtii et al. 2022) which consists of Gaussian approximations derived from the functional central limit theorem associated with Kurtz (1970).
The recently developed suite of assumed density approximate likelihoods (ADAL) (Whiteley and Rimella 2021; Whitehouse et al. 2023; Rimella et al. 2025) compose a family of computationally efficient methods for inference in discrete time stochastic compartmental models which exhibit favorable theoretical properties. ADALs are obtained by marginalizing over the latent compartmental model through a combination of recursive expectation propagation (Minka 2001) and assumed density (Sorenson and Stubberud 1968) steps to arrive at deterministic approximations to filtering distributions and the marginal likelihood. In their vanilla form they can be used to fit models with Binomial under-reporting (Whiteley and Rimella 2021), this is an equi-dispersed observation model in the sense that the variance of a binomial random variable is strictly of the same order as its mean. Such assumptions can lead to problematic inference when confronted with data exhibiting over-dispersion, i.e. an under-reporting distribution with variance significantly larger than its mean, as explored in Section 5.3 and noted in previous studies (Whitehouse et al. 2023; Stocks et al. 2020). The use of ADALs can be extended to over-dispersed observation mechanisms by treating reporting probabilities as time varying random variables, see Section 2.3 for details, and integrating them out by embedding ADALs in an SMC sampler, incurring significant computational cost (Whitehouse et al. 2023). In this paper we extend the remit of ADALs to models permitting observational over-dispersion without resorting to sampling methods. This is achieved by recursively using Laplace approximations to integrate out the latent reporting rates, resulting in a novel deterministic and computationally simple marginal likelihood approximation. We introduce the class of latent compartmental models considered by this paper in Section 2. We derive and present an algorithm to compute our approximate filtering distributions and likelihood in Section 3. In Section 4 we present a result on the asymptotic behavior of our filtering approximations in the large population limit. Through simulations in Section 5 we explore properties of the maximum approximate likelihood estimator in terms of ground truth recovery and benchmark our approximation against an SMC approach. To conclude we perform a comparison of observation mechanisms by embedding our methodology within a probabilistic programming language (PPL) to facilitate automated Bayesian inference.
Model
Notation
The set of natural numbers, including 0, is denoted \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {N}_0$$\end{document} . The set of non-negative real numbers is denoted \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {R}_{\ge 0}$$\end{document} . For an integer \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m\ge 1$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$[m]:=\{1,\ldots ,m\}$$\end{document} . Matrices and vectors are denoted by bold upper-case and bold lower-case letters, respectively, e.g., \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{A}$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{b}$$\end{document} , with non-bold upper-case and lower case used for their respective elements \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A^{(i,j)}$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b^{(i)}$$\end{document} . All vectors are column vectors unless stated otherwise. We use \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{1}_m$$\end{document} to denote the vector of m 1’s. The indicator function is denoted \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {I}[\cdot ]$$\end{document} . For a sequence \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_1,x_2,\dots $$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s\in \mathbb {N}$$\end{document} we write \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_{1:s} = \{ x_1,\dots , x_s\}$$\end{document} .
For \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}\in \mathbb {N}_0^m$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\lambda }\in \mathbb {R}_{\ge 0}^m$$\end{document} we write \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}\sim \textrm{Pois}(\boldsymbol{\lambda })$$\end{document} to denote that the elements of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}$$\end{document} are independent and element \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x^{(i)}$$\end{document} is Poisson distributed with parameter \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda ^{(i)}$$\end{document} . We shall say that such a random vector \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}$$\end{document} has a “vector-Poisson distribution”. For a probability vector \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\pi }$$\end{document} we write \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{Mult}(n,\boldsymbol{\pi })$$\end{document} for the associated multinomial distribution. Similarly, for a random matrix \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{X}\in \mathbb {N}_0^{m \times l}$$\end{document} and a matrix \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\Lambda }\in \mathbb {R}_{\ge 0}^{m \times l}$$\end{document} , we write \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{X}\sim \textrm{Pois}(\boldsymbol{\Lambda }) $$\end{document} when the elements of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{X}$$\end{document} are independent with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X^{(i,j)}$$\end{document} being Poisson distributed with parameter \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Lambda ^{(i,j)}$$\end{document} . We call \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\lambda }$$\end{document} (resp. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\Lambda }$$\end{document} ) the intensity vector (resp. matrix). For \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a,b,\mu \in \mathbb {R}$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a<b$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2>0$$\end{document} we write \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {N}_{[a,b]}(\mu ,\sigma ^2)$$\end{document} to denote a Gaussian distribution with mean and variance parameters \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu $$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2$$\end{document} conditioned to take values in [a, b].
Latent compartmental model
The transmission model we consider for the remainder of the paper is defined by: m, the number of compartments; n the initial population size; a length-m probability vector \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\pi }_0$$\end{document} and for each \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t \ge 0$$\end{document} a mapping from length-m probability vectors to size- \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m \times m$$\end{document} row-stochastic matrices, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\eta } \mapsto \textbf{K}_{t, \boldsymbol{\eta }}$$\end{document} . The population at time \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t \in \mathbb {N}_0$$\end{document} is a set of n random variables \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{\xi _{t}^{(1)}, \ldots , \xi _{t}^{(n)}\}$$\end{document} , each valued in [m]. The counts of individuals in each of the m compartments at time t are collected in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}_{t}=[x_{t}^{(1)} \cdots x_{t}^{(m)}]^\top $$\end{document} , where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_{t}^{(i)}=\sum _{j=1}^{n} \mathbb {I}[\xi _{t}^{(j)}=i].$$\end{document} The population is initialized as a draw \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\textbf{x}}_0 \sim \textrm{Mult}(n, \boldsymbol{\pi }_0)$$\end{document} . The members of the population are exchangeable, labeled by, e.g., a uniformly random assignment of indices \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{\xi _{0}^{(1)}, \ldots , \xi _{0}^{(n)}\}$$\end{document} subject to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_{0}^{(j)}:=\sum _{i=1}^{n} \mathbb {I}[\xi _{0}^{(i)}=j].$$\end{document} For \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}\in \mathbb {N}_0^m$$\end{document} we define \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\eta }(\textbf{x}) = [x^{(1)}/n \,\cdots \, x^{(m)}/n]^\top $$\end{document} . For \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t\ge 1$$\end{document} , given \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{\xi _{t-1}^{(1)}, \ldots , \xi _{t-1}^{(n)}\}$$\end{document} , the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{\xi _{t}^{(1)}, \ldots , \xi _{t}^{(n)}\}$$\end{document} are conditionally independent such that for each i, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\xi _{t}^{(i)}$$\end{document} is drawn from the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\xi _{t-1}^{(i)}$$\end{document} ’th row of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{K}_{t, \boldsymbol{\eta }({\textbf{x}}_{t-1})}$$\end{document} . Moreover, let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{Z}_{t}$$\end{document} be an \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m \times m$$\end{document} matrix with elements \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Z_{t}^{(i, j)}:=\sum _{k=1}^{n} \mathbb {I}[\xi _{t-1}^{(k)}=i, \xi _{t}^{(k)}=j],$$\end{document} which counts the individuals transitioning from compartment i at \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t-1$$\end{document} to j at time t. It should be understood that the way in which \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{K}_{\boldsymbol{\eta }}$$\end{document} depends on \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\eta }$$\end{document} encodes how transition rates depend on the current population disease states (e.g. prevalence), we illustrate this now with an example.
SEIR example
A popular model used in practice (Lekone and Finkenstädt 2006; Rawson et al. 2025) is the discrete time stochastic susceptible exposed infected removed (SEIR) model. The dynamics of susceptible \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(S_t)$$\end{document} , exposed \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(E_t)$$\end{document} , infected \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(I_t)$$\end{document} , and removed \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(R_t)$$\end{document} counts are described by
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} & S_{t+1} = S_{t} - B_{t},\quad E_{t+1} = E_{t} + B_{t} - C_{t},\\ & \quad I_{t+1} = I_{t} + C_{t} - D_{t}, \qquad R_{t+1} = R_{t} + D_{t}. \end{aligned}$$\end{document}With conditionally independent, binomially distributed random increment variables:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} & B_t \sim \textrm{Bin}(S_t, 1- e^{-h\beta \frac{I_t}{n}}), \quad C_t \sim \textrm{Bin}(E_t, 1- e^{-h\rho }),\\ & \quad D_t \sim \textrm{Bin}(I_t, 1- e^{-h\gamma }), \end{aligned}$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h>0$$\end{document} is a time-step size (we set \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h=1$$\end{document} throughout) and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta , \rho , \gamma $$\end{document} are model parameters. The model is initialized with non negative integers in each compartment as a sample \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$[S_0,E_0,I_0,R_0]^\top \sim \textrm{Mult}(n,\boldsymbol{\pi }_0)$$\end{document} for some length-4 probability vector \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\pi }_0$$\end{document} . We can interpret \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document} as the rate at which interactions between susceptible and infected individuals results in an infection. The mean exposure and infection periods are given by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1/\rho $$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1/\gamma $$\end{document} respectively and the reproduction number is given by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta /\gamma $$\end{document} . This model can be cast as an instance of the model from Section 2.2 by taking \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m=4$$\end{document} , identifying \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}_t\equiv [S_t\;E_t\;I_t\;R_t]^\top $$\end{document} and:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \textbf{K}_{t, \boldsymbol{\eta }}=\left[ \begin{array}{cccc} e^{-h \beta \eta ^{(3)}} & 1-e^{-h \beta \eta ^{(3)}} & 0 & 0\\ 0 & e^{-h \rho } & 1-e^{-h \rho } & 0\\ 0 & 0& e^{-h \gamma } & 1-e^{-h \gamma }\\ 0 & 0 & 0 & 1 \end{array}\right] . \end{aligned}$$\end{document}We can then further identify the time t transitions \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$B_t:=Z_t^{(1,2)}$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_t:=Z_t^{(2,3)}$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_t:=Z_t^{(3,4)}$$\end{document} .
Over-dispersed observation model of disease incidence
In this article we neglect observation models derived from disease prevalence, the total number currently infected in a population at a given time, since these data are rarely available in practice. In our notation this corresponds to observations derived from the process \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{\textbf{x}_t\}_{t\ge 0}$$\end{document} . Instead, we focus on observations derived from disease incidence, the number of newly infected individuals in a given time window - corresponding to observations derived from the process \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{\textbf{Z}_t\}_{t\ge 1}$$\end{document} . In particular, the number of individuals transitioning between two compartments \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$i,j \in [m]$$\end{document} at time \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t\ge 1$$\end{document} is modeled by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Z_t^{(i,j)}$$\end{document} . For pedagogical reasons we consider here incidence corresponding to a single pair of compartments i and j, i.e. observations are a univariate time series \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y_{1},y_2,\dots $$\end{document} , though we note that the methodology could be extended to higher dimensional problems, such as age-stratified incidence or geographical meta-population models incorporating immigration and emigration (Ionides et al. 2022; Andrade and Duggan 2020), we discuss this further in Section 6. The observation at time t, denoted \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y_t$$\end{document} , has distribution given by the following hierarchical model.
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} q_t&\sim f(\cdot \mid \varphi ), \nonumber \\ y_t|\textbf{Z}_t,q_t&\sim \text {Bin}(Z_t^{(i,j)},q_t), \end{aligned}$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f(\cdot \mid \varphi )$$\end{document} is a distribution with support [0, 1] and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varphi $$\end{document} is a model parameter. If \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f(\cdot | \varphi )$$\end{document} is a Dirac mass at some \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _q \in [0,1]$$\end{document} then we recover an equi-dispersed binomial observation model (Whiteley and Rimella 2021; Fintzi et al. 2017). Considering a general f gives rise to over-dispersed models for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y_t$$\end{document} when \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_t$$\end{document} is marginalized out. For the remainder of the paper we follow (Whitehouse et al. 2023) and take f to be a Gaussian distribution truncated to the interval (0, 1) with parameter \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varphi :=[\mu _q,\sigma _q^2]$$\end{document} , where the components denote mean and variance parameters, respectively. In Section 3 we show that with this model structure it is possible to derive an analytically tractable assumed density approximate likelihood.
Approximate filtering and marginal likelihood
This section presents methodology for performing inference on the model described in Section 2; the proposed approximations fall under the umbrella of assumed density approximate likelihood (ADAL) methods (Whiteley and Rimella 2021; Rimella et al. 2025; Whitehouse et al. 2023), extending previous advances to handle over-dispersed observations.
Model likelihood
The processes \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{(\textbf{Z}_t,q_t)\}_{t\ge 1}$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{y_t\}_{t\ge 1}$$\end{document} can be cast as a hidden Markov model (HMM), with the former constituting the latent hidden state and latter the observed. Given a finite time horizon \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T\in \mathbb {N}$$\end{document} the exact likelihood is given by
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} & p(y_{1:T}) = \sum _{\textbf{x}_0}p(\textbf{x}_0)\sum _{\textbf{Z}_{1:T}}\int _{(0,1)^{T}}\\ & \quad \prod _{t=1}^T p(y_t|\textbf{Z}_t,q_t) p(\textbf{Z}_t|\textbf{Z}_{t-1})p(q_t)dq_{1:T}, \end{aligned}$$\end{document}which has prohibitive computational cost for even moderate population size n, compartment number m and time horizon T (Whiteley and Rimella 2021). The forward algorithm reduces this cost to linear in T using the so-called prediction and update steps, in principle, given \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{x}_0)$$\end{document} one could compute
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} p(\textbf{Z}_t| y_{1:t-1})&= \sum _{\textbf{Z}_{t-1}}p(\textbf{Z}_t|\textbf{Z}_{t-1})p(\textbf{Z}_{t-1}|y_{1:t-1}), \end{aligned}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} p(y_t|y_{1:t-1})&= \sum _{\textbf{Z}_t} \int p(y_t| \textbf{Z}_t,q_t)p(\textbf{Z}_t,q_t| y_{1:t-1})d{q_t}, \end{aligned}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} p(q_t | y_{1:t})&= \frac{p(y_t|q_t,y_{1:t-1})p(q_t|y_{1:t-1})}{p(y_t|y_{1:t-1})}, \end{aligned}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} p(\textbf{Z}_t| y_{1:t})&= \frac{p(y_t|\textbf{Z}_t)p(\textbf{Z}_t|y_{1:t-1})}{p(y_t|y_{1:t-1})}. \end{aligned}$$\end{document}However, equations (2)-(5) are still not available in closed form due to the intractable marginalization steps over \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{Z}_{t-1},\textbf{Z}_t,$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_t$$\end{document} , and must be approximated.
Approximation derivation
Our strategy to approximate (2)-(5) is outlined by the following approximation steps:
- Given a matrix Poisson approximation \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{Poi}(\bar{\mathbf \Lambda }_{t-1})$$\end{document} of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{Z}_{t-1}|y_{1:t-1})$$\end{document} for some \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m\times m$$\end{document} matrix \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{\mathbf \Lambda }_{t-1}$$\end{document} , derive an approximation \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{Poi}({\mathbf \Lambda }_{t})$$\end{document} of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{Z}_t| y_{1:t-1})$$\end{document} for some \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m\times m$$\end{document} matrix \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbf \Lambda }_{t}$$\end{document} .
- Perform a Laplace approximation to the intractable calculations (3) and (4) to approximate \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(y_t|y_{1:t-1})$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(q_t | y_{1:t})$$\end{document} .
- Calculate an approximation of the first moment \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {E}[\textbf{Z}_t| y_{1:t}]$$\end{document} and use this to define an approximation \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{Poi}(\bar{\mathbf \Lambda }_{t})$$\end{document} of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{Z}_t| y_{1:t})$$\end{document} for some \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m\times m$$\end{document} matrix \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{\mathbf \Lambda }_{t}$$\end{document} . These steps define a recursive strategy to approximate (2)-(5). Step 1 follows the same reasoning as the prediction step associated with the Poisson Approximate Likelihood (PAL) procedure of Whitehouse et al. (2023). The recipe for steps 2 and 3, along with supporting theory and experiments in Sections 4 and 5, constitute the novel contributions of this paper.
Approximation step 1: prediction step. Step 1 can be formalized with Lemma 1, which was first presented in Whitehouse et al. (2023). For \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{Z}\in \mathbb {N}_0^{m\times m}$$\end{document} and a length-m probability vector \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\eta }$$\end{document} , let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{M}_t(\textbf{Z}, \boldsymbol{\eta }, \cdot )$$\end{document} denote the probability mass function of a random \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m \times m$$\end{document} matrix, say \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tilde{\textbf{Z}}$$\end{document} , such that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{1}_m^\top \textbf{Z}= (\tilde{\textbf{Z}} \textbf{1}_m )^\top $$\end{document} with probability 1 where, given the row sums \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tilde{\textbf{Z}}\textbf{1}_m = \textbf{x}$$\end{document} , the rows of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tilde{\textbf{Z}}$$\end{document} are conditionally independent with the conditional distribution of the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$i^{th}$$\end{document} row being \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{Mult}(x^{(i)}, \textbf{K}_{t,\boldsymbol{\eta }}^{(i,\cdot )})$$\end{document} . That is, by construction, we have that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{M}_t(\textbf{Z}_{t-1}, \boldsymbol{\eta }(\textbf{1}_m^\top \textbf{Z}_{t-1}), \textbf{Z}_t)$$\end{document} is equal to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{Z}_{t}|\textbf{Z}_{t-1})$$\end{document} as per our latent compartmental model. We therefore have that at time t the prediction step of equation (2) is given by
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} p(\textbf{Z}_t| y_{1:t-1})&= \sum _{\textbf{Z}_{t-1}} \bar{M}_t(\textbf{Z}_{t-1}, \boldsymbol{\eta }(\textbf{1}_m^\top \textbf{Z}_{t-1}), \textbf{Z}_t)p(\textbf{Z}_{t-1}|y_{1:t-1}), \end{aligned}$$\end{document}which we have noted is intractable. To circumvent this issue we replace \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{Z}_{t-1}| y_{1:t-1})$$\end{document} with our approximation from the previous time step \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{Poi}(\bar{\mathbf \Lambda }_{t-1})$$\end{document} and replace \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\eta (\textbf{1}_m^\top \textbf{Z}_{t-1})$$\end{document} with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\eta (\mathbb {E}[\textbf{1}_m^\top \textbf{Z}_{t-1}])$$\end{document} , where the expectation is under \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{Z}_{t-1} \sim \textrm{Poi}(\bar{\mathbf \Lambda }_{t-1})$$\end{document} . This approximation turns out to have an analytically tractable solution in the form of another matrix Poisson distribution, we formalize this with Lemma 1.
Lemma 1
If for a given \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m \times m$$\end{document} matrix \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf \Lambda $$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{\mu }$$\end{document} is the probability mass function associated with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{Pois}(\mathbf \Lambda )$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {E}_{\bar{\mu }}[ \textbf{1}_m^\top \textbf{Z}]$$\end{document} is the expected value of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{1}^\top _m \textbf{Z}$$\end{document} where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{Z}\sim \bar{\mu }$$\end{document} , then we have that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sum _{\textbf{Z}\in \mathbb {N}_{0}^{m\times m}}\bar{\mu }(\textbf{Z}) \bar{M}_t(\textbf{Z},\boldsymbol{\eta }(\mathbb {E}_{\bar{\mu }}[\textbf{1}_m^\top \textbf{Z}]),\cdot )$$\end{document} is the probability mass function associated with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{Pois}((\boldsymbol{\lambda }\otimes \textbf{1}_m) \odot \textbf{K}_{t,\boldsymbol{\eta }(\boldsymbol{\lambda })})$$\end{document} , where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\lambda }^\top :=\textbf{1}_m^\top \boldsymbol{\Lambda }$$\end{document} .
Using Lemma 1 we arrive at the following recipe for approximation step 1. Given a matrix Poisson approximation \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{Poi}(\bar{\mathbf \Lambda }_{t-1})$$\end{document} of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{Z}_{t-1}|y_{1:t-1})$$\end{document} for some \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m\times m$$\end{document} matrix \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{\mathbf \Lambda }_{t-1}$$\end{document} , we approximate \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{Z}_t| y_{1:t-1})$$\end{document} with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{Poi}({\mathbf \Lambda }_{t})$$\end{document} where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf \Lambda _{t} = (\bar{\boldsymbol{\lambda }}_{t-1} \otimes \textbf{1}_m) \odot \textbf{K}_{t,\boldsymbol{\eta }(\bar{\boldsymbol{\lambda }}_{t-1})}$$\end{document} where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{\boldsymbol{\lambda }}_{t-1}^\top :=\textbf{1}_m^\top \bar{\mathbf \Lambda }_{t-1}$$\end{document} .
Approximation step 2: marginal likelihood.
Given our approximation \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{Poi}({\mathbf \Lambda }_{t})$$\end{document} of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{Z}_t| y_{1:t-1})$$\end{document} we wish to perform a Laplace approximation to equations (3) and (4), we now outline this recipe. Once again we distill the problem to suppress dependence on t and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y_{1:{t-1}}$$\end{document} : for some \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m\times m$$\end{document} matrix \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf \Lambda $$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(i,j)\in [m]^2$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _q \in (0,1)$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma _q^2>0$$\end{document} , let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{Z}\sim \textrm{Poi}(\mathbf \Lambda )$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q \sim \mathcal {N}_{[0,1]}(\mu _q, \sigma _q^2)$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y \sim \textrm{Binom}(Z^{(i,j)},q)$$\end{document} . One can use standard results of Poisson processes (Kingman 1992) to notice that, given q, y is a marked Poisson process and we can write \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y|q\sim \textrm{Poi}(q\Lambda ^{(i,j)})$$\end{document} .
For an approximation with a Gaussian to be appropriate, at the very least we should want p(q|y) to be uni-modal. To see this is indeed the case note that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(q|y)\propto p(y|q)p(q)$$\end{document} where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\partial ^2\log p(y|q)/\partial q^2 = -y/q^2<0$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\partial ^2 \log p(q)/\partial q^2 = -1/\sigma _q^2<0$$\end{document} , so that p(y|q)p(q) is a product of log-concave functions. This in turn means p(q|y) is itself log-concave and, hence, uni-modal in q.
The standard recipe for a Laplace approximation, see e.g. Murphy (2012) Section 8.4.1, to p(q|y) and p(y) is given by
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} p(q\mid y)&\overset{{ {\textit{law}}}}{\approx } \mathcal {N}_{[0,1]}(\bar{q}, s^2), \end{aligned}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \log p(y)&\approx \log p(y, \bar{q}) + \log (s\sqrt{2\pi }), \end{aligned}$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{q}:= \arg \max _qp(y,q)$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s^2:= \left( -\frac{\partial ^2}{\partial q^2}\log p(y,q)\right. $$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\left. |_{q=\bar{q}} \right) ^{-1}$$\end{document} where the support in (6) has further been restricted to the unit interval [0, 1] in line with the prior support on q. It turns out both \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{q}$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s^2$$\end{document} can be calculated analytically: for some constant \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_y $$\end{document} , which does not depend on q, write
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} & \log p(y,q){=}\log \left\{ p(y|q)p(q) \right\} {=} y\log (q\Lambda ^{(i,j)}){-}q\Lambda ^{(i,j)} \nonumber \\ & - \frac{1}{2}\left( \frac{q-\mu _q}{\sigma ^2_q} \right) ^2+ C_y. \end{aligned}$$\end{document}Differentiating and setting to zero gives a quadratic equation we can solve
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \frac{\partial }{\partial q}\log p(y,q) = \frac{y}{q} - \Lambda ^{(i,j)} - \frac{q-\mu _q}{\sigma ^2_q}&= 0, \\ \iff q^2 +( \Lambda ^{(i,j)}\sigma ^2_q - \mu _q)q - y\sigma ^2_q&= 0, \end{aligned}$$\end{document}which gives \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\arg \max _{q}p(y,q)= \frac{1}{2}\left( \mu _q - \Lambda ^{(i,j)}\sigma ^2_q \right. $$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\left. + \sqrt{\left( \Lambda ^{(i,j)}\sigma ^2_q - \mu _q \right) ^2 - 4y\sigma ^2_q }\right) ,$$\end{document} which we can in turn use to deduce \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\left( -\frac{\partial ^2}{\partial q^2}\log p(y,q)|_{q=\bar{q}} \right) ^{-1} = \left( \frac{y}{\bar{q}^2}+ \frac{1}{\sigma _q^2}\right) ^{-1}$$\end{document} , where we take the convention \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$0/0=0$$\end{document} . If we let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\phi (\cdot )$$\end{document} be the probability density function associated with a \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {N}_{[0,1]}(\mu _q, \sigma _q^2)$$\end{document} random variable then we can write (7) as
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \log p(y)&\approx \log \left\{ p(y|q)p(q) \right\} {+}\log (s\sqrt{2\pi })\\&=y\log (\bar{q}\Lambda ^{(i,j)}){-}\bar{q}\Lambda ^{(i,j)}{-}y! + \phi (\bar{q}){+}\log (s\sqrt{2\pi }). \end{aligned}$$\end{document}Our recipe for approximation step 2 is thus as follows. Given an approximation \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{Poi}({\mathbf \Lambda }_{t})$$\end{document} of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{Z}_t| y_{1:t-1})$$\end{document} , we define \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{q}_t :=\frac{1}{2}\left( \mu _q - \Lambda _t ^{(i,j)}\sigma ^2_q + \sqrt{\left( \Lambda _t ^{(i,j)}\sigma ^2_q - \mu _q \right) ^2 - 4y_t\sigma ^2_q }\right) $$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s^2_t \leftarrow \left( \frac{y_t}{\bar{q}_t}+ \frac{1}{\sigma _q^2}\right) ^{-1}$$\end{document} to arrive at the approximation:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} & \log p(y_t|y_{1:t-1}) \approx \ell (y_t|y_{1:t-1}) :=- \Lambda _t^{(i,j)} \bar{q}_t+ \\ & \quad + y_t \log (\Lambda _t^{(i,j)} \bar{q}_t) 4- y_t! + \phi \left( \bar{q}_t\right) +\frac{1}{2}\log (s_t2\pi ). \end{aligned}$$\end{document}Approximation step 3: update step. To complete our recursions we move on to approximation step 3 in which we use a moment matching strategy to approximate (5) with an assumed matrix Poisson distribution. For any \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(k,l)\ne (i,j)$$\end{document} we have \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {E}\left[ Z^{(k,l)}|y\right] =\mathbb {E}\left[ Z^{(k,l)}\right] =\Lambda ^{(k,l)}$$\end{document} . Considering now \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Z^{(i,j)}$$\end{document} , we have by Lemma 4 of Whitehouse et al. (2023) that
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \mathbb {E}\left[ Z^{(i,j)}|q,y\right] = y + (1-q)\Lambda ^{(i,j)}. \end{aligned}$$\end{document}Then by the tower rule we have that
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \mathbb {E}\left[ Z^{(i,j)}|y\right] =\mathbb {E}\left[ \mathbb {E}\left[ Z^{(i,j)}|q,y\right] |y\right] = y + (1-\mathbb {E}\left[ q|y\right] )\Lambda ^{(i,j)}. \end{aligned}$$\end{document}We then approximate \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {E}\left[ q|y\right] $$\end{document} using (6). We make the further approximation that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {E}\left[ q|y\right] \approx \bar{q}$$\end{document} , this error becomes negligible when the variance of (6) is small and can be justified both theoretically and experimentally as per results presented in Sections 4 and 5. Defining the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$m \times m$$\end{document} matrix \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{\mathbf \Lambda }$$\end{document} element-wise as \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{\Lambda }^{(k,l)}:= {\Lambda }^{(k,l)}$$\end{document} for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(k,l)\ne (i,j)$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{\Lambda }^{(i,j)}:= y + (1-\bar{q}){\Lambda }^{(i,j)}$$\end{document} we have that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {E}\left[ \textbf{Z}|y\right] \approx \bar{\mathbf \Lambda }$$\end{document} , leading us to the moment matching approximation of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{Z}|y)$$\end{document} with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{Poi}(\bar{\mathbf \Lambda })$$\end{document} . Hence our recipe for approximation step 3 is as follows. Given the definitions of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf \Lambda _{t}$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{q}_t$$\end{document} from steps 1 and 2, we approximate \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{Z}_t| y_{1:t})$$\end{document} with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{Poi}(\bar{\mathbf \Lambda }_{t})$$\end{document} where
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \bar{\Lambda }^{(k,l)}_{t} = {\left\{ \begin{array}{ll} {\Lambda }_{t}^{(k,l)} \quad & \text { if } {(k,l)} \ne {(i,j)}, \\ y_t + (1-\bar{q}_t){\Lambda _t}^{(k,l)} \quad & \text { if } {(k,l)} = {(i,j)}. \end{array}\right. } \end{aligned}$$\end{document}Laplace approximations within PALs algorithm
We can now collect the approximation steps and present them as a recursive algorithm, which we call the Laplace approximations within Poisson Approximate Likelihoods (LawPAL) algorithm. Line 2 is informed by approximation step 1, lines 3-5 by step 2, and lines 6 and 7 by step 3.
Algorithm 1LawPAL
As output from algorithm 1 one may take the approximate filtering distributions \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ p(q_t | y_{1:t}) \approx \mathcal {N}_{[0,1]}(\bar{q}_t,s_t^2)$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{x}_t| y_{1:t}) \approx \textrm{Poi}(\bar{\boldsymbol{\lambda }}_t)$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{Z}_t| y_{1:t}) \approx \textrm{Poi}(\bar{\boldsymbol{\Lambda }}_t)$$\end{document} , which we study analytically in Section 4, and approximate marginal log-likelihood
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \log p(y_{1:T})= & \sum _{t=1}^T\log p(y_t|y_{1:t-1}) \\ \approx & \sum _{t=1}^T\ell (y_t|y_{1:t-1}) =: \ell (y_{1:T}), \end{aligned}$$\end{document}which can be used to fit \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _q,\sigma ^2_q$$\end{document} , and parameters of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{K}_{t,\boldsymbol{\eta }}$$\end{document} , as demonstrated through examples in Section 5.
Asymptotic filtering theory
In this section we study the behavior of the approximate filtering quantities and distributions \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ p(q_t | y_{1:t}) \approx \mathcal {N}_{[0,1]}(\bar{q}_t,s_t^2)$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{x}_t| y_{1:t}) \approx \textrm{Poi}(\bar{\boldsymbol{\lambda }}_t)$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{Z}_t| y_{1:t}) \approx \textrm{Poi}(\bar{\boldsymbol{\Lambda }}_t)$$\end{document} in the large population limit on a fixed time horizon \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T \in \mathbb {N}_0$$\end{document} . In a change of perspective, for this section we will consider the under-reporting parameters \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_{1:T} \in (0,1)^T$$\end{document} as fixed but unknown and to be estimated, with (1) defining a prior. The main result of this section, Theorem 1, rigorously shows that Algorithm 1 is able to asymptotically recover both the reporting probabilities and the un-observed state of the process. To make dependence on population size explicit, throughout this section we will index the model elements and filtering quantities of algorithm 1 with the initial population size n: \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}_{t,n}$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{Z}_{t,n}$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y_{t,n}$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf \lambda _{t,n}$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbf \Lambda _{t,n}$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{q}_{t,n},$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{\mathbf \Lambda }_{t,n}$$\end{document} , for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t= 1, \dots , T.$$\end{document} To be technically complete we write \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(\Omega _n, \mathcal {F}_n, \mathbb {P}_n)$$\end{document} for the probability space underlying the distribution of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}_{t,n}$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{Z}_{t,n}$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y_{t,n}$$\end{document} with initial population size n, denoting the overall probability space we shall work with as \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(\Omega , \mathcal {F}, \mathbb {P}) :=(\prod _{n \ge 1}\Omega _n, \bigotimes _{n \ge 1}\mathcal {F}_n, \bigotimes _{n \ge 1}\mathbb {P}_n)$$\end{document} . We begin with an assumption.
Assumption 1
There exists a constant \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c>0$$\end{document} such that for all \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t \ge 1$$\end{document} , all vectors \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{f}_1, \boldsymbol{f}_2 \in \mathbb {R}^m$$\end{document} , and all probability vectors \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\eta }, \boldsymbol{\eta }'$$\end{document} :
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} |\boldsymbol{f}_1^\top \textbf{K}_{t, \boldsymbol{\eta }}\boldsymbol{f}_2 - \boldsymbol{f}_1^\top \textbf{K}_{t, \boldsymbol{\eta }'}\boldsymbol{f}_2 | \le c \Vert \boldsymbol{f}_1\Vert _\infty \Vert \boldsymbol{f}_2\Vert _\infty \Vert \boldsymbol{\eta }- \boldsymbol{\eta '}\Vert _\infty . \end{aligned}$$\end{document}That is, for all \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t\ge 1$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{K}_{t, \boldsymbol{\eta }}$$\end{document} is a continuous function of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\eta }$$\end{document} .
Law of large numbers. To analyze the large population limiting properties of the filter, we must first start with the data generating process. Define the following recursions:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \boldsymbol{\nu }_0&:=\boldsymbol{\pi }_0, \\ \boldsymbol{N}_t&:=(\boldsymbol{\nu }_{t-1} \otimes \textbf{1}_m) \odot \boldsymbol{K}_{t,\eta (\boldsymbol{\nu }_{t-1})}, \\ \boldsymbol{\nu }_t&:=(\textbf{1}_m^\top \boldsymbol{N}_t)^\top . \end{aligned}$$\end{document}Under assumption 1 it can be shown (Whitehouse et al. 2023) that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n^{-1}\textbf{Z}_{t,n} \rightarrow \boldsymbol{N}_t$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${n^{-1}\textbf{x}_{t,n} \rightarrow \boldsymbol{\nu }_t}$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${n^{-1}y_{t,n} \rightarrow q_t N_t^{(i,j)}}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {P}$$\end{document} -almost surely, which can be interpreted as a discrete analogue of the convergence of jump-Markov processes to ODE limits (Kurtz 1970). One can interpret \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\nu }_t$$\end{document} as the limiting behavior of the (hidden) population prevalence process.
Asymptotically exact filtering. Theorem 1 presents a result on the asymptotic properties of algorithm 1, in particular it shows that the LawPAL algorithm asymptotically recovers both the prevalence process and the under-reporting rates \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_{1:T}$$\end{document} in the large population limit.
Theorem 1
Let assumption 1 hold. Then for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t=1,\dots , T$$\end{document} we have that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n^{-1}\bar{\boldsymbol{\lambda }}_{t,n}\rightarrow \boldsymbol{\nu }_t$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n^{-1}\bar{\boldsymbol{\Lambda }}_{t,n} \rightarrow \boldsymbol{N}_t$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{q}_{t,n} \rightarrow q_t$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {P}$$\end{document} -almost surely.
Proof
The proof proceeds by induction on t. The result for time 0 follows directly from the definition of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\lambda }_0 = n\boldsymbol{\pi }_0$$\end{document} , hence it is trivial that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{n}^{-1}\boldsymbol{\lambda }_{0,n} \rightarrow \boldsymbol{\pi }_0 = \boldsymbol{\nu }_0.$$\end{document} Now, for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t \ge 1$$\end{document} , for our inductive hypothesis we assume that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{n}^{-1}\bar{\boldsymbol{\lambda }}_{n,t-1} \rightarrow \boldsymbol{\nu }_{t-1}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {P}$$\end{document} - almost surely. By assumption 1 and the continuous mapping theorem we then have that
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} & n^{-1}\mathbf \Lambda _{t,n} = (n^{-1}\bar{\boldsymbol{\lambda }}_{t-1,n} \otimes \textbf{1}_m) \odot \textbf{K}_{t,\boldsymbol{\eta }(\bar{\boldsymbol{\lambda }}_{t-1,n})} \rightarrow \nonumber \\ & \quad (\boldsymbol{\nu }_{t-1} \otimes \textbf{1}_m) \odot \textbf{K}_{t,\boldsymbol{\nu }_{t-1}} = \boldsymbol{N}_t, \quad \mathbb {P}\text {-a.s.}. \end{aligned}$$\end{document}We now proceed to show that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{q}_{t,n} \rightarrow q_t$$\end{document} , which requires a more delicate argument.
Firstly we write
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \bar{q}_{t,n}&= \arg \max _q \quad y_{t,n}\log \left( q\Lambda _{t,n}^{(i,j)}\right) - q\Lambda _{t,n}^{(i,j)} - \frac{1}{2}\left( \frac{q-\mu _q}{\sigma ^2_q} \right) ^2+ C_y\\&= \arg \max _q \quad \underbrace{n^{-1}y_{t,n}\log \left( \frac{n^{-1}q\Lambda _{t,n}^{(i,j)}}{n^{-1}q_t\Lambda _{t,n}^{(i,j)}}\right) - n^{-1}q\Lambda _{t,n}^{(i,j)}+n^{-1}q_t\Lambda _{t,n}^{(i,j)} - \frac{n^{-1}}{2}\left( \frac{q-\mu _q}{\sigma ^2_q} \right) ^2}_{ \textstyle { =:\mathcal {C}_{t,n}(q).}} \end{aligned}$$\end{document}where the second line is obtained from the first by subtracting a quantity constant in q and dividing by n. The next step is to notice that by the continuous mapping theorem we have that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {P}\text {-a.s.}$$\end{document}
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \mathcal {C}_{t,n}(q)&\rightarrow q_tN_t^{(i,j)}\log \left( \frac{q\boldsymbol{N}_{t}^{(i,j)}}{q_t\boldsymbol{N}_{t}^{(i,j)}}\right) -q\boldsymbol{N}_{t}^{(i,j)}+q_t\boldsymbol{N}_{t}^{(i,j)} \\&= -\textrm{KL}\left[ \textrm{Poi}\left( q_t\boldsymbol{N}_{t}^{(i,j)}\right) ||\textrm{Poi}\left( q\boldsymbol{N}_{t}^{(i,j)}\right) \right] =: C_t(q). \end{aligned}$$\end{document}Fig. 1. LawPAL filtering distributions for the SEIR example, dotted lines indicate ground truth \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_{1:T}$$\end{document} , red lines the LawPAL filter mean, orange bands LawPAL filter 95% credible region.
Hence, by properties of the Kullback Leibler divergence, we have that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {C}_{t,n}(q)$$\end{document} converges \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {P}$$\end{document} -a.s. to a function maximized (uniquely) at \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_t$$\end{document} . By a standard continuity argument, e.g. (Cappé et al. (2005) ,Chapter 12.1), we then have the maximizer of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {C}_{t,n}(q)$$\end{document} converges \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {P}$$\end{document} -a.s. to the maximizer of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {C}_{t}(q)$$\end{document} and hence \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {P}$$\end{document} -a.s. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{q}_{t,n} \rightarrow q_t$$\end{document} . We can now complete the inductive argument, noting that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {P}$$\end{document} -a.s
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} n^{-1}\bar{\mathbf \Lambda }_{t,n}= & n^{-1}y_{t,n} + (1-\bar{q}_{t,n})n^{-1}\mathbf \Lambda _{t,n} \\ & \quad \rightarrow q_t\textbf{N}_t + (1 - q_t)\textbf{N}_t = \textbf{N}_t, \end{aligned}$$\end{document}hence by the continuous mapping theorem \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n^{-1}\bar{\boldsymbol{\lambda }}_{t,n} = \left( \textbf{1}_m^\top n^{-1} \bar{\mathbf \Lambda }_{t,n}\right) ^\top \rightarrow \left( \textbf{1}_m^\top \textbf{N}_t\right) ^\top = \boldsymbol{\nu }_t$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {P}$$\end{document} -a.s., thus completing the induction. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\square $$\end{document}
If one considers the full approximate filtering distribution \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ p(q_t | y_{1:t}) \approx \mathcal {N}_{[0,1]}(\bar{q}_t,s_t^2)$$\end{document} , we can further notice that since \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y_{t,n}\rightarrow \infty $$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {P}$$\end{document} -a.s. implies \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s^2_t = \left( \frac{y_{t,n}}{\bar{q}_{t,n}}+ \frac{1}{\sigma _q^2}\right) ^{-1} \rightarrow 0$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {P}$$\end{document} -a.s., we have that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {N}_{[0,1]}(\bar{q}_t,s_t^2)$$\end{document} will converge to a point mass at \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_t$$\end{document} . To demonstrate this result through simulations we ran a Python implementation of the filter on data from the SEIR model in Section 2.2.1 for population sizes \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=10^3,10^4,10^5$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T=100$$\end{document} . We first simulated \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_{1:T}\sim \mathcal {N}_{[0,1]}(0.5,0.1)$$\end{document} which are then fixed for all values of n, for each population size we then simulate \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y_{n,1:T}$$\end{document} using parameters \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta = 0.8,\rho = 0.1,\gamma = 0.2, \boldsymbol{\pi }_0=[0.99,0,0.01,0]$$\end{document} and observation model 2.3. For each simulation corresponding to each value of n we ran the LawPAL filter. Figure 1 plots the approximate filtering means and 95% credible bands for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_t$$\end{document} against the ground truth values, demonstrating that, for each t, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {N}_{[0,1]}(\bar{q}_t,s_t^2)$$\end{document} collapses onto the true value \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_t$$\end{document} as per Theorem 1. In the next section we assess the ability of LawPAL approximate likelihood based inference to recover ground truth \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _q,\sigma ^2_q$$\end{document} , and parameters of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{K}_{t,\boldsymbol{\eta }}$$\end{document} .
Experiments and examples
Code for all examples can be found at GitHub.com/Michael-Whitehouse/LawPAL, implemented using Rcpp.
Parameter recovery
Section 4 studies the theoretical performance of the filter under a well specified parameterization. Whilst in some settings it is reasonable to make assumptions on epidemiological parameters (Kucharski et al. 2020), in practice these quantities are unknown and must be estimated from data. Here we study the behavior of the estimator obtained by maximizing the approximate likelihood of algorithm 1, as population size n and time horizon T both grow large.
We consider a Susceptible-Infected-Removed (SIR) type model defined by an initial distribution \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}_0\sim \textrm{Mult}(n,\boldsymbol{\pi }_0)$$\end{document} , for some length-3 probability vector \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\pi }_0$$\end{document} , and transition matrix
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \textbf{K}_{t, \boldsymbol{\eta }}=\left[ \begin{array}{ccc} e^{- \beta \eta ^{(2)}} & 1-e^{- \beta \eta ^{(2)}} & 0 \\ 0 & e^{- \gamma } & 1-e^{-\gamma }\\ 0 & 0 & 1 \end{array}\right] . \end{aligned}$$\end{document}Observations are associated with new infections \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y_t\sim \textrm{Binom}(Z_t^{1,2},q_t)$$\end{document} where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_t\sim \mathcal {N}_{[0,1]}(\mu _q,\sigma ^2_q)$$\end{document} , for some \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _q \in [0,1], \sigma ^2_q>0$$\end{document} . We simulate from this model using initial distribution \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\pi }_0 = [0.995,0.005,0]$$\end{document} and parameters \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta =0.15,\gamma =0.1,\mu _q=0.5,\sigma ^2_q=0.1$$\end{document} , repeated with varying population sizes \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=5000,10^4,10^5,10^6$$\end{document} and time horizons \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T=50,100,150,200$$\end{document} , and consider the problem of jointly estimating \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta ,\gamma ,\mu _q,\sigma ^2_q$$\end{document} with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\pi }_0$$\end{document} assumed known. Approximate maximum likelihood estimates are obtained by maximizing \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ell (y_{1:T})$$\end{document} jointly with respect to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta ,\gamma ,\mu _q,\sigma ^2_q$$\end{document} using a simple coordinate ascent algorithm, detailed in the associated GitHub repository. For each pair of n and T we produce 100 trajectories \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y_{1:T}$$\end{document} and 100 corresponding to approximate maximum likelihood estimates of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta ,\gamma ,\mu _q,\sigma ^2_q$$\end{document} , summaries of these estimates are presented in tables 1(a)-(d) and are visualized with a series of box-plots for each experiment in figure 2.Table 1. Approximate maximum likelihood estimate sample means and standard deviations.(a) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T=50$$\end{document} n5000 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^4$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^5$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^6$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document} 0.165(.040)0.155(.032)0.154(.019)0.153(.020) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma $$\end{document} 0.117(.043)0.106(.035)0.104(0.019)0.103(.021) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _q$$\end{document} 0.484(.080)0.480(.077)0.493(.088)0.503(.091) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2_q$$\end{document} 0.092(.106)0.097(.066)0.096(.040)0.097(.039)(b) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T=100$$\end{document} n5000 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^4$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^5$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^6$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document} 0.144(.030)0.147(.017)0.149(.006)0.150(.003) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma $$\end{document} 0.094(0.30)0.99(.019)0.99(.006)0.010(.004) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _q$$\end{document} 0.493(.069)0.507(.052)0.499(.022)0.498(.020) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2_q$$\end{document} 0.105(.066)0.104(.036)0.099(.021)0.098(.014)(c) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T=150$$\end{document} n5000 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^4$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^5$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^6$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document} 0.151(.021)0.147(.016)0.150(.005)0.150(.002) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma $$\end{document} 0.102(.024)0.097(0.018)0.099(.006)0.010(.002) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _q$$\end{document} 0.510(.064)0.500(.049)0.498(.017)0.501(.009) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2_q$$\end{document} 0.107(.065)0.101(.036)0.099(.013)0.099(.011)(d) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T=200$$\end{document} n5000 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^4$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^5$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^6$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document} 0.151(.017)0.151(.015)0.149(.004)0.150(.001) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma $$\end{document} 0.102(.021)0.103(.018)0.100(.004)0.100(0.001) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _q$$\end{document} 0.508(.064)0.511(.051)0.500(.013)0.500(.008) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2_q$$\end{document} 0.112(.067)0.106(.040)0.101(.015)0.100(.010)
It is clear that as n and T increase the estimate samples become more concentrated around ground truth values; demonstrating that the LawPAL approximate likelihood can be used to recover ground truth parameters. In the large population \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n\rightarrow \infty $$\end{document} regime with a fixed time horizon \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T<\infty $$\end{document} , a consistency result along with identifiability conditions are rigorously established in Whitehouse et al. (2023) for an equi-dispersed model, corresponding to an atomic distribution for the reporting probability. Tables 1(a)-(d) and figure 2 provide evidence that a consistency result may be feasible for the over-dispersed auxiliary variable model in the joint \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n\rightarrow \infty $$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T\rightarrow \infty $$\end{document} regime. An intuition for the need to also take \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T\rightarrow \infty $$\end{document} comes by considering that trajectories \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y_{1:T}$$\end{document} with a fixed T can only contain information related to a finite sample \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_{1:T}\sim \mathcal {N}_{[0,1]}(\mu _q,\sigma ^2_q)$$\end{document} , and hence finite information related to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _q$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma _q^2$$\end{document} ; for more information we would need more samples and hence a larger T. This heuristic also explains why the sample variance of the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _q$$\end{document} estimates in table 1(a) does not decrease as n increases: without T also increasing, little information on \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _q$$\end{document} is gained. A rigorous proof for such a consistency result and identifiability conditions in the joint regime would require significant development of the approach and additional tools to those used in Whitehouse et al. (2023), e.g. time uniform bounds on convergence rates of the data generating process. Such a result is an enticing prospect, but out of the scope of the present paper; we leave its study to future work.Fig. 2. Box-plots displaying mean and inter-quartile range of approximate maximum likelihood estimate samples for different values of n and T. Horizontal red lines correspond to the data generating parameter.
Computational efficiency and statistical compromise
When making the kind of approximations used in LawPAL, we should recognize that we probably pay some statistical sacrifice. For example, the constant \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_y$$\end{document} in equation (8), is tacitly ignored– one may wonder what knock on effects this may have. This is the motivation of the present section in which we aim to characterize the statistical price paid for the computational efficiency gained when using the LawPAL instead of sequential Monte Carlo.
Comparison with sequential Monte Carlo
It is well known that SMC can be used to consistently (as the number of particles \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rightarrow \infty $$\end{document} ) estimate the true likelihood for a hidden Markov model (Chopin et al. 2020), furthermore, using SMC within a pseudo marginal Metropolis Hastings sampler produces an ergodic chain targeting the true posterior distribution (Andrieu et al. 2010). As such, SMC is often described as a ‘gold standard’ approximation and provides us with a way to check the performance of the LawPAL algorithm, with the heuristic being if our approximations closely match those obtained with SMC, then our approach well approximates the true likelihood.Fig. 3. Posterior plots for simulated data with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=2.5\times 10^4$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T=50$$\end{document} .
To this end, we simulate data from model with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\pi }_0 = [0.995,0.005,0]$$\end{document} and parameters \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta =0.3,\gamma =0.2,\mu _q=0.5,\sigma ^2_q=0.1$$\end{document} under 2 regimes, once with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=2.5\times 10^4$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T=50$$\end{document} (of order corresponding a moderately sized town) and once with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^6$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T=100$$\end{document} (of order corresponding a large city). For each of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta ,\gamma ,\sigma _q^2$$\end{document} we place a vague \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {N}_{[0,\infty ]}(0,10)$$\end{document} prior, for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _q$$\end{document} we use \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {N}_{[0,1]}(0.5,10)$$\end{document} . For each of the simulations we ran two Metropolis Hasting schemes, 1) with the likelihood approximated by SMC using 1000 particles (Pmmh), 2) with the likelihood approximated by the LawPAL algorithm (LawPALmh). Priors are kept consistent for both schemes so that any discrepancies in posterior distributions must be due to discrepancies in likelihood approximations. For each chain we ran a 2000 iteration burn in with independent Gaussian random walk proposals with variance 0.01, then use this to construct an ‘optimal’ (Andrieu and Thoms 2008) joint multi-variate Gaussian proposal covariance \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Sigma = (2.38^2/d)\hat{\Sigma }$$\end{document} where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\Sigma }$$\end{document} is the sample covariance of the burn in chain, and use this to produce a final chain of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^5$$\end{document} iterations. Each chain showed no signs of poor mixing and exhibited satisfactory decay in auto-correlation plots. The LawPALmh scheme performed \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^4$$\end{document} iterations 3.8 seconds, the Pmmh scheme took 345 seconds, demonstrating a \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim 90\times $$\end{document} speed up, though we note this is of course implementation dependent. We further note that for the pmmh scheme specially designed proposal distributions (Whitehouse et al. 2023) were required to reduce the marginal likelihood estimate variance to ensure adequate estimation of the likelihood ratio step in the Metropolis Hastings sampler; this extra step was not necessary for the deterministic LawPAL marginal likelihood approach.
Figures 3 and 4 plot posterior histogram densities for each regime using a thinned sample of size \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$3\times 10^4$$\end{document} . Under the moderate population regime the statistical price of the approximation becomes apparent: whilst both schemes produce posterior distributions concentrated close to the data generating parameters, there are disagreements in the tails of the distributions with the pmmh posteriors exhibiting higher variance, indicating presence of slight over-confidence in LawPALmh posteriors. Under the large population regime this discrepancy becomes less pronounced, as exhibited by figure 4. This can be heuristically linked with Theorem 1: as n grows large, the first moment \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{q}_t$$\end{document} becomes more representative of the full filtering distribution as it collapses to a single point. Hence, less is sacrificed when we propagate the point value \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{q}_t$$\end{document} in algorithm line 7, as opposed to the full conditional distribution.Fig. 4. Posterior plots for simulated data with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=10^6$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T=100$$\end{document} .
Bayesian inference with Stan: Covid-19 in Switzerland
In this section we demonstrate how the LawPAL algorithm can be used for the development of a model for real data of practical interest using ‘off the shelf’ software. Probabilistic programming languages (PPL) have recently been recognized as a valuable tool in the development of disease transmission models (Grinsztajn et al. 2021; Andrade and Duggan 2020). In contrast to sampling based methods, the LawPAL approximation comprises only simple linear algebraic operations and is therefore naturally amenable to auto-differentiation and use within PPL libraries to draw posterior samples using an Hamiltonian Monte Carlo (HMC) scheme. HMC samplers are a type of MCMC algorithm which use auxiliary ’momentum’ variables to aid in exploring the posterior, crucially they require access to the gradient of the likelihood function to sample Hamiltonian trajectories.
We consider data of Covid-19 incidence reported in Switzerland in 2020. The count time-series data we consider is freely available, e.g. from Grinsztajn et al. (2021), and starts from the index case lasting 109 time steps. Our transmission model is a discrete-time stochastic counterpart to the ODE SEIR model developed by Grinsztajn et al. (2021), which captures the effects of national control measures implemented at time \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t^*=23$$\end{document} using a time varying transmission rate.
SEIR model with control intervention
Cast as an instance of our latent compartmental model, the transmission process is governed by the time dependent transition matrix
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \textbf{K}_{t, \boldsymbol{\eta }}=\left[ \begin{array}{cccc} e^{- \beta _t \eta ^{(3)}} & 1-e^{-\beta _t \eta ^{(3)}} & 0 & 0\\ 0 & e^{- \rho } & 1-e^{- \rho } & 0\\ 0 & 0& e^{- \gamma } & 1-e^{- \gamma }\\ 0 & 0 & 0 & 1 \end{array}\right] , \end{aligned}$$\end{document}with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _t = \beta (\alpha +(1-\alpha )(1+\exp (b(t-t^*-d)))^{-1})$$\end{document} , where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document} denotes the uncontrolled transmission rate, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha $$\end{document} is the decrease in transmission when controls are fully in place, b is the slope of the decrease, and d is the delay in the effect of control measures. Observations are \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y_t\sim \textrm{Binom}(Z_t^{(2,3)},q_t)$$\end{document} where we consider two distinct models: an equi-dispersed observation model \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_t = \mu _q$$\end{document} for all t, and an over-dispersed observation model \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_t \sim \mathcal {N}_{[0,1]}(\mu _q,\sigma _q^2)$$\end{document} , where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _q\in [0,1]$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2_q>0$$\end{document} . The population size is taken to be \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=8.57\times 10^6$$\end{document} and the initial distribution probability vector is \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\boldsymbol{\pi }_0 = n^{-1} [n-i_0-e_0,i_0,e_0,0]$$\end{document} where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$i_0,e_0>0$$\end{document} are parameters.Table 2. Parameter priors and posterior mean (standard deviation) summaries.ParameterPriorover-dispersedequi-dispersed \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {N}_{[0,\infty )}(2,0.5)$$\end{document} 1.53(0.33)1.24(0.04) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho $$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {N}_{[0,1]}(0.2,0.1)$$\end{document} 0.17(0.06)0.07(0.0007) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma $$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {N}_{[0,1]}(0.2,0.1)$$\end{document} 0.33(0.08)0.003(0.004) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha $$\end{document} Beta(2.5, 4)0.09(0.04)0.0005(0.00003)b**Beta(1, 1)0.24(0.24)0.013(0.001)d**Exp(0.1)3.31(0.77)1.54(0.12) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _q$$\end{document} Beta(1, 2)0.62(0.17)0.67(0.09) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$i_0$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {N}_{[0,\infty )}(0,20)$$\end{document} 24.5(9.49)41.2(6.30) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$e_0$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathcal {N}_{[0,\infty )}(0,20)$$\end{document} 15.6(10.31)12.1(6.42) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma _q^2$$\end{document} Exp(0.1)0.21(0.06)NA
Inference with HMC
The PPL Stan (Carpenter et al. 2017) provides a framework for the implementation of HMC in which a user only needs to provide priors and a likelihood function to produce posterior samples, providing a far less labor intensive approach than the pmmh scheme explored in Section 5.2. The equi-dispersed model is fit by embedding the Poisson Approximate Likelihood (PAL) algorithm (Whitehouse et al. 2023) in a Stan program, the over-dispersed model is similarly fit by embedding the LawPAL algorithm, implementation details can be found on the GitHub page. For both models the parameters to be learned are \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta ,\rho ,\gamma ,\alpha ,b,d,\mu _q,i_0,e_0$$\end{document} with the over-dispersed model having the additional dispersion parameter \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma _q^2$$\end{document} . Vague priors, inspired by Grinsztajn et al. (2021), are assigned to each of these parameters and are given in table 2. For each model the HMC sampler was run to produce chains of length \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^5$$\end{document} which showed no signs of unsatisfactory mixing. Posterior summaries are presented in table 2, there are significant disagreements between our two models, which we explore in the next section.
Posterior predictive checks: model parsimony
Fig. 5. Graphical posterior predictive distributions with credible intervals. Fig. 6. Graphical posterior predictive plots. Solid red line plots observed data. Coloured faint lines plot simulations from the posterior predictive distribution.
Graphical model checks provide a way for practitioners to look for systematic discrepancies between observations and model simulations by plotting them alongside one another and making comparisons ( Gelman et al. (1995) ,Chapter 6.4). We now apply this philosophy to assess our two models. Using the posterior samples obtained in Section 5.3.2 we sampled from the posterior predictive distributions for the equi-dispersed and over-dispersed models. In figure 5 we plot each of these distributions with credible intervals. Based on these plots alone both models appear reasonable, each with strong coverage of the observed data. However, these plots are effectively 2-dimensional projections of 109 dimensional objects, summarizing only marginal distributions for each time step and obscuring between-timestep covariation. Figure 6 presents ‘spaghetti’ plots of posterior predictive draws alongside the observed data. Using these graphical checks the limitations of the equi-dispersed model are exposed, with the observed data exhibiting significantly more between-timestep variation than the posterior draws. This is in contrast to the corresponding plot for the over-dispersed model, in which posterior predictive draws and the observed trajectory are indistinguishable in this regard. Furthermore the posterior samples associated with the equi-dispersed model imply an astronomically high posterior mean \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_0=\beta /\gamma \approx 8872$$\end{document} , which is epidemiologically absurd. The corresponding estimate for the over-dispersed model is a far more reasonable \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_0 \approx 4.4$$\end{document} . These issues exhibit the potential pitfalls of assuming equi-dispersed observation models, and demonstrate how the LawPAL has allowed us to build a model parsimonious with observed data using an ‘off the shelf’ sampler, in a way that the vanilla PAL algorithm does not permit. Furthermore, the exercise exhibits the need for an holistic approach to posterior evaluation and that is it appropriate to inform model selection decisions with multiple plots and checks.
Discussion and limitations
We have developed an analytically tractable ADAL procedure for stochastic epidemic models permitting observational over-dispersion, supported with theoretical results and favorable experimental performance. This extends the remit of ADAL methods to a broader class of models than previously possible and provides order of magnitude computational advantages over simulation-based sequential Monte Carlo methods. A natural extension of this work would be to tackle models with mechanistic over-dispersion (Ning and Ionides 2024; Whitehouse et al. 2023), e.g. by treating transmission rates as latent variables. It could be fruitful to investigate ADAL style approximations for such models, alternatively the LawPAL algorithm could be embedded within an SMC scheme. One could also consider alternative models to the multinomial and truncated Gaussian used herein. For example, one could let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_t$$\end{document} follow a Gamma distribution with the view of exploiting conjugacy in approximation step 2, as noted by an anonymous reviewer; or consider a Dirichlet-multinomial transmission model to induce dynamic over-dispersion. In principle one could consider LawPAL style approximations to the standard HMM backward equations (Briers et al. 2010) to obtain smoothing distributions \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{Z}_{t}|y_{1:T})$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(q_{t}|y_{1:T})$$\end{document} for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t= 1,\dots , t$$\end{document} . These could then be used with expectation maximization algorithms for approximate maximum likelihood estimation, or within SMC samplers for smoothed estimation of time-varying parameters with Markovian priors Whiteley and Rimella (2021). Models with increasing numbers of auxiliary variables inducing increasing levels of over-dispersion define a set of nested models, each with a hierarchical latent structure. It would be interesting to compare such models using Bayesian selection techniques which explicitly penalize model variability pertinent to hierarchical models with latent structure, such as the Watanabe-Akaike information criterion or deviance information criterion (Watanabe 2013; Spiegelhalter et al. 2002). ADAL methods provide a taxonomy for accelerating posterior computations and hence could be used to develop a Bayesian workflow (Gelman et al. 1995) for the design and selection of models with differing extents of over-dispersion, as a Bayesian compliment to a previously explored frequentist approach (Stocks et al. 2020). The approximations developed in this work are derived from the specifics of the model and subsequently are not as general as simulation based methods such as particle filtering, highlighting a trade off between model choices and computational cost. We envisage that the use of ADAL methods in conjunction with particle filter methods has the potential to provide significant aid in scaling up procedures to large systems by reducing the dimensionality of the filtering problem, all without compromising the fidelity of the inference, as previously explored by Whitehouse et al. (2023) in the context of look-ahead filtering schemes (Rimella et al. 2022). Future work could also aim to provide a generalized recipe for approximation step 2 through the use of auto-differentiation software, further investigation would be needed to elucidate the potential gains of such an approach - taking inspiration from integrated nested Laplace approximation (Rue et al. 2009) type approaches could be of benefit to this end.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Corbella, A., Mc Kinley, T.J., Birrell, P.J., De Angelis, D., Presanis, A.M., Roberts, G.O., Spencer, S.E.: The lifebelt particle filter for robust estimation from low-valued count data. ar Xiv preprint ar Xiv:2212.04400 (2022)
- 2Ionides, E.L., Ning, N., Wheeler, J.: An iterated block particle filter for inference on coupled dynamic systems with shared and unit-specific parameters. ar Xiv preprint ar Xiv:2206.03837 (2022)
- 3Mckendrick, A.: Applications of mathematics to medical problems. Proc. Edinb. Math. Soc. 44, 98–130 (1925)
- 4Mc Kinley, T., Cook, A.R., Deardon, R.: Inference in epidemic models without likelihoods. The International Journal of Biostatistics 5(1) (2009)
- 5Rimella, L., Jewell, C., Fearnhead, P.: Approximating optimal smc proposal distributions in individual-based epidemic models. ar Xiv preprint ar Xiv:2206.05161 (2022)
- 6Rimella, L., Whiteley, N., Jewell, C., Fearnhead, P., Whitehouse, M.: Scalable calibration for partially observed individual-based epidemic models through categorical approximations. ar Xiv preprint ar Xiv:2501.03950 (2025)
