A large sample property in approximating the superposition of i.i.d. point processes

Tianshu Cong; Aihua Xia; Fuxi Zhang

arXiv:1906.10008·math.PR·June 25, 2019

TL;DR

This paper investigates the large sample property (LSP) for the superposition of i.i.d. point processes, extending classical results from sums of i.i.d. variables to point process superpositions.

Contribution

It establishes the LSP for the superposition of i.i.d. point processes, a novel extension of the law of small numbers in the context of point processes.

Findings

01. LSP holds for superpositions of i.i.d. point processes

02. Error in approximation decreases with sample size

03. Extends classical LSP results to point process superpositions

Abstract

One of the main differences between the central limit theorem and the Poisson law of small numbers is that the former possesses the large sample property (LSP), i.e., the error of normal approximation to the sum of $n$ independent identically distributed (i.i.d.) random variables is a decreasing function of $n$. Since the 1980s, considerable effort has been devoted to recovering the LSP for the law of small numbers in discrete random variable approximation. In this paper, we aim to establish the LSP for the superposition of i.i.d. point processes.

Equations

$$\mathbb{E}\left[\int_{\Gamma}f(x,\Xi-\delta_{x})\,\Xi(dx)\right]=\int_{\Gamma}\mathbb{E}f(x,\Xi_{x})\,{\lambda}(dx),$$

$$\mathbb{E}\left[\iint_{\Gamma^{2}}f(x,y;\Xi-\delta_{x}-\delta_{y})\,\Xi(dx)(\Xi-\delta_{x})(dy)\right]=\iint_{\Gamma^{2}}\mathbb{E}f(x,y;\Xi_{xy})\,{\lambda}^{[2]}(dx,dy),$$

$$d_{1}(\rho_{1},\rho_{2}):=\begin{cases}1,&\mbox{for }|\rho_{1}|\neq|\rho_{2}|,\\ 0,&\mbox{for }|\rho_{1}|=|\rho_{2}|=0,\\ \sup_{u\in\mathscr{K}}\left|\bar{\rho}_{1}(u)-\bar{\rho}_{2}(u)\right|,&\mbox{for }|\rho_{1}|=|\rho_{2}|>0,\end{cases}$$

$$d_{2}({\bf P},{\bf Q}):=\sup_{f\in\mathscr{F}}|{\bf P}(f)-{\bf Q}(f)|=\inf_{{\cal W}\sim{\bf P},\,{\cal Z}\sim{\bf Q}}\mathbb{E}d_{1}({\cal W},{\cal Z}),$$

$$\mathscr{M}_{\mathscr{G}}\circ\eta:=\sum_{i\in I}\eta(G_{i})\delta_{t_{i}}.$$

$$TV_{\mathscr{G}}({\cal W}):=\max_{i\in I}d_{tv}\left(\mathscr{L}(\mathscr{M}_{\mathscr{G}}\circ{\cal W}+\delta_{t_{i}}),\mathscr{L}(\mathscr{M}_{\mathscr{G}}\circ{\cal W})\right),$$

$$\vartheta_{\varepsilon}({\cal W}):=\inf_{\mathscr{G}\in{\mathscr{P}}_{\varepsilon}}TV_{\mathscr{G}}({\cal W}).$$

$$\pi_{a,b;{\beta}}(i+1):=C\prod_{j=0}^{i}\frac{a+bj}{(j+1)(1+{\beta}j)},\quad i\in\mathbb{Z}_{+}:=\{0,1,2,\dots\},$$

$$C:=C(a,b;{\beta})=\left(1+\sum_{i=0}^{\infty}\prod_{j=0}^{i}\frac{a+bj}{(j+1)(1+{\beta}j)}\right)^{-1}.$$

$${\cal Z}:=\sum_{n=1}^{Z}\delta_{U_{n}},$$

$${\lambda}(dx):=\mathbb{E}\Xi_{1}(dx),\quad{\lambda}^{[2]}(dx,dy):=\mathbb{E}\Xi_{1}(dx)(\Xi_{1}-\delta_{x})(dy),$$

$$\theta_{r}:=\mathbb{E}\left[|\Xi_{1}|(|\Xi_{1}|-1)\cdots(|\Xi_{1}|-r+1)\right],\quad\forall r\in\mathbb{N}.$$

$${\beta}=\frac{\theta_{1}^{2}-\theta_{2}}{(n-1)\theta_{1}\left[\theta_{1}-2(\theta_{1}^{2}-\theta_{2})\right]+(\theta_{3}+\theta_{2}-\theta_{1}\theta_{2})}.$$

$$\theta_{1}-2(\theta_{1}^{2}-\theta_{2})=2\left[{\rm Var}(|\Xi_{1}|)-\frac{1}{2}\mathbb{E}|\Xi_{1}|\right],$$

$a$

$\mu(dx)$

$$\begin{aligned}d_{2}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\mu}\right)&\leq 2\varepsilon+C\vartheta_{\varepsilon}\left(\sum_{k=1}^{n-2}\Xi_{k}\right)+Cn\mathbb{P}\left(\sum_{k=1}^{n-2}|\Xi_{k}|\leq\rho(n-2)\theta_{1}\right)\\ &\leq 2\varepsilon+C\vartheta_{\varepsilon}\left(\sum_{k=1}^{n-2}\Xi_{k}\right)+O\left(n^{-1/2}\right)\end{aligned}$$

$$\begin{aligned}d_{2}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\mu}\right)&\leq 2(C+1)\varepsilon_{n-2}+Cn\mathbb{P}\left(\sum_{k=1}^{n-2}|\Xi_{k}|\leq\rho(n-2)\theta_{1}\right)\\ &\leq 2(C+1)\varepsilon_{n-2}+O\left(n^{-1/2}\right),\end{aligned}$$

$$\varepsilon_{n}=\inf\left\{v:\ 2v\geq\vartheta_{v}\left(\sum_{k=1}^{n}\Xi_{k}\right)\right\}.$$

$$d_{2}\left(\mbox{\boldmath$\pi$}_{a,b;{\beta};\mu},\mbox{\boldmath$\pi$}_{a,b;{\beta};\nu}\right)\leq d_{1}(\mu,\nu)=O(n^{-1}).$$

$$d_{2}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\mu}\right)=O\left(n^{-1/2}\right)\mbox{ as }n\to\infty.$$

$$d_{tv}(\mathscr{L}(\vec{Y}+e_{j}),\mathscr{L}(\vec{Y}))=d_{tv}(\mathscr{L}(Y_{j}+1),\mathscr{L}(Y_{j})).$$

$$d_{tv}(\mathscr{L}(Y_{j}+1),\mathscr{L}(Y_{j}))=\max_{0\leq l\leq n-2}\binom{n-2}{l}p_{j}^{l}(1-p_{j})^{n-2-l}\leq\frac{C}{\sqrt{(n-2)p_{j}(1-p_{j})}}.$$

$$d_{2}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\mu}\right)=O\left(n^{-1/2}\right),\mbox{ as }n\rightarrow\infty.$$

$$d_{2}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\nu}\right)=O\left(n^{-1/2}\right),\mbox{ as }n\rightarrow\infty.$$

$$d_{2}^{\prime}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}^{\prime}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\nu^{\prime}}\right)=O\left(n^{-1/2}\right)\mbox{ as }n\to\infty.$$

$$\mathbb{E}d_{1}^{\prime}\left(\sum_{i=1}^{m}Y_{1i}\delta_{t_{i}},\sum_{i=1}^{m}Y_{2i}\delta_{t_{i}}\right)=d_{2}^{\prime}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}^{\prime}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\nu^{\prime}}\right).$$

$${\cal W}_{1}:=\sum_{i=1}^{m}\sum_{j=1}^{Y_{1i}}\delta_{\zeta_{ij}}\mbox{ and }{\cal W}_{2}:=\sum_{i=1}^{m}\sum_{j=1}^{Y_{2i}}\delta_{\zeta_{ij}},$$

$$d_{2}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\nu}\right)\leq\mathbb{E}d_{1}\left({\cal W}_{1},{\cal W}_{2}\right)\leq\mathbb{E}d_{1}^{\prime}\left(\sum_{i=1}^{m}Y_{1i}\delta_{t_{i}},\sum_{i=1}^{m}Y_{2i}\delta_{t_{i}}\right).$$

Taxonomy

Topics: Bayesian Methods and Mixture Models · Data Management and Algorithms · Stochastic processes and statistical mechanics

Full text

A large sample property in approximating the superposition of i.i.d. point processes

Tianshu Cong [1], Aihua Xia [2] and Fuxi Zhang [3]

[1] School of Mathematics and Statistics, The University of Melbourne, VIC 3010, Australia, E-mail: [email protected]. Work supported by a Research Training Program Scholarship and a Cross-Disciplinary PhD Scholarship in Mathematics and Statistics at the University of Melbourne.

[2] School of Mathematics and Statistics, The University of Melbourne, VIC 3010, Australia, E-mail: [email protected]. Work supported in part by the Belz fund, Australian Research Council Grants Nos DP150101459 and DP190100613.

[3] School of Mathematical Sciences, Peking University, Beijing 100871, China, E-mail: [email protected]. Work supported in part by NSF of China 11371040.

Abstract

One of the main differences between the central limit theorem and the Poisson law of small numbers is that the former possesses the large sample property (LSP), i.e., the error of normal approximation to the sum of $n$ independent identically distributed (i.i.d.) random variables is a decreasing function of $n$. Since the 1980s, considerable effort has been devoted to recovering the LSP for the law of small numbers in discrete random variable approximation. In this paper, we aim to establish the LSP for the superposition of i.i.d. point processes.

Key words and phrases: point process approximation, superposition, central limit theorem.

AMS 2010 Subject Classification: Primary 60F05; secondary 60E15, 60G55.

Running title: Superposition of Point Processes

1 Introduction

The central limit theorem states that the distribution of the sum $S:=\sum_{i=1}^{n}X_{i}$ of independent copies of a random variable $X$ with finite second moment, after being normalized, converges weakly to the standard normal distribution. The Berry-Esseen bound ensures that, if $X$ has a finite third moment, the error of the normal approximation, measured in the Kolmogorov metric, is no worse than $c/\sqrt{n}$, where $c$ is a constant determined by the distribution of $X$. In other words, the central limit theorem has the large sample property (LSP), i.e., the quality of the approximation improves as the sample size becomes large. The LSP can also be established for the functional central limit theorem measured in the Lévy-Prokhorov distance [Borovkov & Sakhanenko (1980), Haeusler (1984), Kubilius (1985), Ferger (1994), Utev (1986)]. Moreover, Stein’s method can be used to estimate the errors of diffusion approximation [Barbour (1990)].

The Poisson law of small numbers, on the other hand, does not possess the LSP. More precisely, if the $X_{i}$’s are independent indicator random variables with $\mathbb{P}(X_{i}=1)=1-\mathbb{P}(X_{i}=0)=p_{i}$ for each $i$, then the total variation distance between the distribution of $W=\sum_{i=1}^{n}X_{i}$ and the Poisson distribution with mean $\lambda:=\sum_{i=1}^{n}p_{i}$ is of the order ${\Omega}\left(\lambda^{-1}\sum_{i=1}^{n}p_{i}^{2}\right)$ [Barbour & Hall (1984)]. In particular, if $p_{i}=p$ for all $i$, one can see that the quality of approximation does not improve when $n$ becomes large. This is due to the fact that a Poisson distribution has only one parameter while a normal distribution has two parameters. To recover the LSP, one has to introduce more parameters into the approximating distributions, e.g., signed compound Poisson measures, translated Poisson, compound Poisson, negative binomial and polynomial birth-death distributions [Presman (1983), Kruopis (1986), Čekanavičius (1997), Barbour & Xia (1999), Barbour & Choi (2004), Röllin (2007), Barbour, Chen & Loh (1992), Brown & Phillips (1999), Brown & Xia (2001)].
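The failure of the LSP in the equal-probability case can be checked numerically. The following is a minimal sketch, not taken from the paper: the helper names `binom_pmf`, `poisson_pmf` and `tv_binomial_poisson` are hypothetical, the value $p=0.3$ is an arbitrary choice, and the Poisson mass beyond the truncation point is simply ignored.

```python
import math


def binom_pmf(n, p, k):
    """P(Binomial(n, p) = k), evaluated on the log scale for stability."""
    if k < 0 or k > n:
        return 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def poisson_pmf(lam, k):
    """P(Poisson(lam) = k), evaluated on the log scale for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def tv_binomial_poisson(n, p):
    """Total variation distance between Binomial(n, p) and Poisson(n * p).

    Assumption of this sketch: the sum is truncated well past n, where the
    remaining Poisson mass is negligible for the parameters used here.
    """
    lam = n * p
    upper = n + int(10 * math.sqrt(lam)) + 50
    return 0.5 * sum(abs(binom_pmf(n, p, k) - poisson_pmf(lam, k))
                     for k in range(upper + 1))


if __name__ == "__main__":
    for n in (50, 500, 5000):
        # The distance stays bounded away from zero as n grows.
        print(n, tv_binomial_poisson(n, 0.3))
```

With $p$ fixed, the printed distances should stay between the Barbour and Hall bounds $p/32$ and $p$ rather than shrink with $n$, which is the behaviour the paragraph above describes.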

If we consider point processes rather than nonnegative integer-valued random variables, the counterpart is the superposition ${\cal V}_{n}=\Xi_{1}+\cdots+\Xi_{n}$ of point processes $\{\Xi_{i}:\ 1\leq i\leq n\}$. The pioneering work of Grigelionis [Grigelionis (1963)] demonstrates that the distribution of the superposition of independent sparse point processes on the carrier space $\mathbb{R}_{+}$ converges weakly to a Poisson process distribution. The same phenomenon can be established for the superposition of dependent sparse point processes on a general carrier space [Goldman (1967), Jagers (1972), Brown (1978), Kallenberg (1983)]. The accuracy of Poisson point process approximation has been of considerable interest since the 1970s [Serfling (1975), Brown (1978)]. Stein’s method for Poisson process approximation was subsequently established by [Barbour (1988), Barbour & Brown (1992)] for estimating the approximation errors and the method was further refined by [Brown, Weinberg & Xia (2000), Xia (2005b), Chen & Xia (2004)]. In the context of the aforementioned superposition of i.i.d. point processes, no error estimates were studied until the last decade [Schuhmacher (2005), Chen & Xia (2011)] and these studies show that the Poisson point process approximation to the superposition of i.i.d. point processes does not possess the LSP either. The aim of this note is to show that, by introducing more parameters into the approximating point process distribution, it is possible to recover an LSP in approximating the superposition of i.i.d. point processes.

Given that a Poisson point process on a compact metric space can be viewed as a Poisson number $Z$ of i.i.d. points in the space, a natural step of introducing more parameters into the approximating point process is to replace the Poisson number $Z$ by a random variable $N$ whose distribution is controlled by two or more parameters, such as the translated Poisson [Barbour & Choi (2004), Röllin (2007)], negative binomial [Brown & Phillips (1999)] and polynomial birth-death distributions [Brown & Xia (2001)]. The family of approximating distributions we will consider in this note is the polynomial birth-death process distributions introduced in [Xia & Zhang (2008)]. To quantify the difference between two point processes, as in [Schuhmacher (2005), Chen & Xia (2011)], we use the Wasserstein distance $d_{2}$ initiated in [Barbour & Brown (1992)]. The formal statement of the main result is given in Theorem 2.2. Several applications are provided in Section 3 to illustrate the order of convergence in the LSP. Section 4 is devoted to the proof of the main result.

2 Preliminaries and the main result

1. Point processes. For the reader’s convenience, in this part, we collect some basic concepts and facts, and introduce a partitional total variation distance for comparing point processes under a partition of the carrier space. The basic concepts needed for this note are point process, reduced Palm process [Kallenberg (1983), Chapter 10], the Wasserstein distance $d_{2}$ [Barbour & Brown (1992)] and partition [Xia & Zhang (2012)].

Let ${\Gamma}$ be a compact metric space with metric $d_{0}$ bounded by 1. Let ${\mathscr{B}}({\Gamma})$ be the Borel ${\sigma}$ -algebra induced by $d_{0}$ . A configuration $\xi$ on ${\Gamma}$ is a collection of finitely many particles located in ${\Gamma}$ . Equivalently, it can be represented as a non-negative integer-valued finite measure on ${\Gamma}$ . Denote by $|\rho|$ the total mass of a measure $\rho$ . Therefore, we can write $\xi$ as $\sum_{i=1}^{|\xi|}\delta_{x_{i}}$ , where $\delta_{x}$ is the Dirac measure at $x$ . Let ${\mathscr{H}}$ be the set of all configurations on ${\Gamma}$ , and ${\mathscr{B}}({\mathscr{H}})$ be the ${\sigma}$ -algebra generated by the mappings $\xi\mapsto\xi(C)$ , $C\in{\mathscr{B}}({\Gamma})$ [Kallenberg (1983), p. 12]. A point process is a measurable mapping from a probability space into $({\mathscr{H}},{\mathscr{B}}({\mathscr{H}}))$ . We use $\xi,\eta,\cdots$ to stand for configurations, $\Xi$ , ${\cal V},{\cal W},{\cal Z},\cdots$ to stand for point processes, and ${\bf P},{\bf Q},\mathscr{L}(\Xi),\mathscr{L}({\cal V}),\cdots$ to stand for the laws of point processes.

Let $\Xi$ be a point process with finite mean measure ${\lambda}(dx):=\mathbb{E}\Xi(dx)$. The family of point processes $\{\Xi_{x}:x\in{\Gamma}\}$ is said to be the family of reduced Palm processes associated with $\Xi$ if for any measurable function $f:{\Gamma}\times{\mathscr{H}}\rightarrow{\mathbb{R}}_{+}:=[0,\infty)$,

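$$\mathbb{E}\left[\int_{\Gamma}f(x,\Xi-\delta_{x})\,\Xi(dx)\right]=\int_{\Gamma}\mathbb{E}f(x,\Xi_{x})\,{\lambda}(dx),$$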

[Kallenberg (1983), Chapter 10]. Furthermore, suppose ${\lambda}^{[2]}(dx,dy):=\mathbb{E}\Xi(dx)(\Xi-\delta_{x})(dy)$ is finite, then one can define the second order reduced Palm processes $\{\Xi_{xy}:x,y\in{\Gamma}\}$ associated with $\Xi$ by

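$$\mathbb{E}\left[\iint_{\Gamma^{2}}f(x,y;\Xi-\delta_{x}-\delta_{y})\,\Xi(dx)(\Xi-\delta_{x})(dy)\right]=\iint_{\Gamma^{2}}\mathbb{E}f(x,y;\Xi_{xy})\,{\lambda}^{[2]}(dx,dy),$$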

for any measurable function $f:{\Gamma}^{2}\times{\mathscr{H}}\rightarrow[0,\infty)$ [Kallenberg (1983), Chapter 12].

[Barbour & Brown (1992)] introduce a Wasserstein distance $d_{2}$ for quantifying the difference between two probability measures ${\bf P}$ , ${\bf Q}$ on $({\mathscr{H}},{\mathscr{B}}({\mathscr{H}}))$ . The metric is defined in two stages. First, for two finite measures $\rho_{1}$ and $\rho_{2}$ on ${\Gamma}$ , define

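$$d_{1}(\rho_{1},\rho_{2}):=\begin{cases}1,&\mbox{for }|\rho_{1}|\neq|\rho_{2}|,\\ 0,&\mbox{for }|\rho_{1}|=|\rho_{2}|=0,\\ \sup_{u\in\mathscr{K}}\left|\bar{\rho}_{1}(u)-\bar{\rho}_{2}(u)\right|,&\mbox{for }|\rho_{1}|=|\rho_{2}|>0,\end{cases}$$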

where $\bar{\rho}:=\rho/|\rho|$ is the normalized measure of $\rho$ , $\mathscr{K}:=\{u:|u(x)-u(y)|\leq d_{0}(x,y),\forall x,y\in{\Gamma}\}$ and $\rho(u):=\int_{{\Gamma}}ud\rho$ . In particular, by the Kantorovich-Rubinstein duality theorem [Rachev (1991), Theorem 8.1.1], for two probability measures $\mu$ and $\nu$ on ${\Gamma}$ , $d_{1}(\mu,\nu)=\inf_{X\sim\mu,Y\sim\nu}\mathbb{E}d_{0}(X,Y)$ , where $X,\ Y$ are ${\Gamma}$ -valued ${\mathscr{B}}({\Gamma})$ -measurable random elements. For two configurations $\xi_{1}:=\sum_{i=1}^{|\xi_{1}|}\delta_{x_{1i}},\xi_{2}:=\sum_{i=1}^{|\xi_{2}|}\delta_{x_{2i}}\in{\mathscr{H}}$ , we have the duality representation $d_{1}(\xi_{1},\xi_{2})=\min_{\bm{\pi}}\left\{n^{-1}\sum_{i=1}^{n}d_{0}(x_{1i},x_{2\bm{\pi}(i)})\right\}$ when $|\xi_{1}|=|\xi_{2}|=n$ and $1$ otherwise, where $\min$ is taken over all permutations of $\{1,2,\cdots,n\}$ . The metric $d_{2}$ is defined as

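$$d_{2}({\bf P},{\bf Q}):=\sup_{f\in\mathscr{F}}|{\bf P}(f)-{\bf Q}(f)|=\inf_{{\cal W}\sim{\bf P},\,{\cal Z}\sim{\bf Q}}\mathbb{E}d_{1}({\cal W},{\cal Z}),$$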

where $\mathscr{F}:=\{f:|f(\xi)-f(\eta)|\leq d_{1}(\xi,\eta),\ \forall\ \xi,\eta\in{\mathscr{H}}\}$ , and the last equality is due to the duality theorem [Rachev (1991), Theorem 8.1.1].
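The permutation form of $d_{1}$ between two configurations can be sketched directly. This is an illustrative brute-force version under stated assumptions: the carrier space is taken to be $[0,1]$ with $d_{0}(x,y)=|x-y|$, configurations are plain Python lists of points, and the names `d0` and `d1` are hypothetical. The search over all permutations is only practical for a handful of points; a linear assignment solver would be the usual replacement for larger configurations.

```python
from itertools import permutations


def d0(x, y):
    """Metric on the carrier space [0, 1], bounded by 1 (assumed here)."""
    return min(abs(x - y), 1.0)


def d1(xi1, xi2):
    """d_1 distance between two finite configurations given as lists of points."""
    if len(xi1) != len(xi2):
        return 1.0           # different total masses
    n = len(xi1)
    if n == 0:
        return 0.0           # both configurations are empty
    # Minimise the average matched distance over all permutations.
    return min(
        sum(d0(x, xi2[j]) for x, j in zip(xi1, perm)) / n
        for perm in permutations(range(n))
    )


if __name__ == "__main__":
    print(d1([0.1, 0.9], [0.9, 0.2]))   # optimal matching pairs 0.1 with 0.2 and 0.9 with 0.9
```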

For a partition $\mathscr{G}=\{G_{i}:\ i\in I\}\subset{\mathscr{B}}({\Gamma})$ of ${\Gamma}$ , where $I\subset\mathbb{N}:=\{1,2,3,\dots\}$ is a finite set, let $t_{i}\in\operatorname*{arg\,min}_{x}\sup_{s\in G_{i}}d_{0}(s,x)$ , that is, $t_{i}\in{\Gamma}$ is a point such that $d_{0}(G_{i},t_{i}):=\sup_{s\in G_{i}}d_{0}(s,t_{i})$ is as small as possible, $i\in I$ . We call $t_{i}$ a center of $G_{i}$ . Let $d_{0}(\mathscr{G}):=\max_{i\in I}d_{0}(G_{i},t_{i})$ . We call $\mathscr{G}$ an $\varepsilon$ -partition of ${\Gamma}$ if $d_{0}(\mathscr{G})\leq\varepsilon$ . Denote all $\varepsilon$ -partitions of ${\Gamma}$ by ${\mathscr{P}}_{\varepsilon}$ .

For any partition $\mathscr{G}=\{G_{i}:\ i\in I\}$ , we define an assembling mapping $\mathscr{M}_{\mathscr{G}}$ as

$$\mathscr{M}_{\mathscr{G}}\circ\eta:=\sum_{i\in I}\eta(G_{i})\delta_{t_{i}}.$$

The assembling mapping, when applied to a configuration $\eta$ , shifts all particles of $\eta$ in $G_{i}\in\mathscr{G}$ to its center $t_{i}$ . For a point process ${\cal W}$ , we define the partitional total variation distance as

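$$TV_{\mathscr{G}}({\cal W}):=\max_{i\in I}d_{tv}\left(\mathscr{L}(\mathscr{M}_{\mathscr{G}}\circ{\cal W}+\delta_{t_{i}}),\mathscr{L}(\mathscr{M}_{\mathscr{G}}\circ{\cal W})\right),$$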

where for two probability measures ${\bf P}$ and ${\bf Q}$ on ${\mathscr{H}}$ , $d_{tv}({\bf P},{\bf Q})=\sup_{A\in{\mathscr{B}}({\mathscr{H}})}|{\bf P}(A)-{\bf Q}(A)|.$ We write

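$$\vartheta_{\varepsilon}({\cal W}):=\inf_{\mathscr{G}\in{\mathscr{P}}_{\varepsilon}}TV_{\mathscr{G}}({\cal W}).$$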
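As an illustration of the assembling mapping, the sketch below moves every particle of a configuration on $[0,1]$ to the midpoint of its cell. The partition into `k` equal intervals is an assumption made for the example, not a choice prescribed by the text, and the function name `assemble` is hypothetical. With $d_{0}(x,y)=|x-y|$ the midpoints are centers of the cells, so this particular partition is a $1/(2k)$-partition.

```python
def assemble(config, k):
    """Apply the assembling mapping for [0, 1] split into k equal cells.

    Returns a list of (center, count) pairs, i.e. the measure
    sum_i eta(G_i) * delta_{t_i} written out cell by cell.
    """
    counts = [0] * k
    for x in config:
        i = min(int(x * k), k - 1)   # cell index; the point 1.0 goes to the last cell
        counts[i] += 1
    centers = [(i + 0.5) / k for i in range(k)]
    return list(zip(centers, counts))


if __name__ == "__main__":
    print(assemble([0.05, 0.12, 0.95, 1.0], 4))
```

Comparing the law of the assembled process with the law of the same process plus one extra particle at a center $t_{i}$ is then a comparison of integer-valued random vectors, which is what $TV_{\mathscr{G}}$ measures.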

2. Polynomial birth-death point process. As mentioned in the Introduction, there are various ways to introduce more parameters into the approximating point process for better accuracy of approximation. In this part, we collect the facts around the polynomial birth-death point process established in [Xia & Zhang (2008)].

For $a>0$ , $0\leq b<1$ , ${\beta}\geq 0$ , we define the polynomial birth-death distribution introduced in [Brown & Xia (2001)] as

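$$\pi_{a,b;{\beta}}(i+1):=C\prod_{j=0}^{i}\frac{a+bj}{(j+1)(1+{\beta}j)},\quad i\in\mathbb{Z}_{+}:=\{0,1,2,\dots\},$$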

where

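$$C:=C(a,b;{\beta})=\left(1+\sum_{i=0}^{\infty}\prod_{j=0}^{i}\frac{a+bj}{(j+1)(1+{\beta}j)}\right)^{-1}.$$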

The distribution can be viewed as the equilibrium distribution of the birth-death process with birth rates $\{a+bk:\ k\in\mathbb{Z}_{+}\}$ and death rates $\{k(1+{\beta}(k-1)):\ k\in\mathbb{N}\}$ . The polynomial birth-death point process is given by

$${\cal Z}:=\sum_{n=1}^{Z}\delta_{U_{n}},$$

where $Z;U_{1},U_{2},\cdots$ are independent, $Z\sim\pi_{a,b;{\beta}}$ , $U_{n}\sim\mu$ with $\mu$ being a probability measure on ${\Gamma}$ , $\forall n\geq 1$ . Denote $\mathscr{L}({\cal Z})$ by $\mbox{\boldmath$ \pi $}_{a,b;{\beta};\mu}$ .
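A small numerical sketch of this construction follows. It rests on assumptions that are not in the paper: the infinite series is truncated at a hypothetical `max_k`, which must be large compared with the bulk of the distribution, the mass at $0$ is taken to be the normalizing constant $C$ (as the normalization implies), and `sample_mu` is a user-supplied sampler for $\mu$. The names `pbd_pmf` and `sample_pbd_process` are illustrative.

```python
import random


def pbd_pmf(a, b, beta, max_k=500):
    """Truncated polynomial birth-death pmf on {0, 1, ..., max_k}.

    Unnormalised weights follow the birth/death ratio
    (a + b*j) / ((j + 1) * (1 + beta*j)); truncation at max_k is an
    approximation made for this sketch.
    """
    weights = [1.0]
    for j in range(max_k):
        weights.append(weights[-1] * (a + b * j) / ((j + 1) * (1 + beta * j)))
    total = sum(weights)
    return [w / total for w in weights]


def sample_pbd_process(a, b, beta, sample_mu, rng):
    """Draw one configuration: Z points placed independently according to mu."""
    pmf = pbd_pmf(a, b, beta)
    z = rng.choices(range(len(pmf)), weights=pmf)[0]
    return [sample_mu(rng) for _ in range(z)]


if __name__ == "__main__":
    rng = random.Random(0)
    # mu is taken to be uniform on [0, 1] purely for illustration.
    print(sample_pbd_process(2.0, 0.0, 0.1, lambda r: r.random(), rng))
```

Two sanity checks on the weights: with $b=0$ and ${\beta}=0$ they reduce to a Poisson distribution with mean $a$, and with ${\beta}=0$ and $0<b<1$ to a negative binomial distribution.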

3. Main Result. Suppose $\Xi_{1},\cdots,\Xi_{n}$ are independent and identically distributed point processes. In each of the following two cases, we can give the polynomial birth-death point process approximation of the superposition ${\cal V}_{n}=\Xi_{1}+\cdots+\Xi_{n}$ under the Wasserstein distance $d_{2}$ . Denote

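$${\lambda}(dx):=\mathbb{E}\Xi_{1}(dx),\quad{\lambda}^{[2]}(dx,dy):=\mathbb{E}\Xi_{1}(dx)(\Xi_{1}-\delta_{x})(dy),$$

$$\theta_{r}:=\mathbb{E}\left[|\Xi_{1}|(|\Xi_{1}|-1)\cdots(|\Xi_{1}|-r+1)\right],\quad\forall r\in\mathbb{N}.$$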

Case 1. If ${\rm Var}(|\Xi_{1}|)\geq\mathbb{E}|\Xi_{1}|$ , we take $b=\frac{\theta_{2}-\theta_{1}^{2}}{\theta_{2}-\theta_{1}^{2}+\theta_{1}}$ and ${\beta}=0$ .

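Case 2. If either $\frac{1}{2}\mathbb{E}|\Xi_{1}|<{\rm Var}(|\Xi_{1}|)<\mathbb{E}|\Xi_{1}|$, or $\frac{1}{2}\mathbb{E}|\Xi_{1}|={\rm Var}(|\Xi_{1}|)$ and $\theta_{3}>\theta_{2}(\theta_{1}-1)$, we set $b=0$ and

$${\beta}=\frac{\theta_{1}^{2}-\theta_{2}}{(n-1)\theta_{1}\left[\theta_{1}-2(\theta_{1}^{2}-\theta_{2})\right]+(\theta_{3}+\theta_{2}-\theta_{1}\theta_{2})}.$$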

Remark 2.1

Since $\theta_{2}-\theta_{1}^{2}={\rm Var}\left(|\Xi_{1}|\right)-\mathbb{E}|\Xi_{1}|$ and

$$\theta_{1}-2(\theta_{1}^{2}-\theta_{2})=2\left[{\rm Var}(|\Xi_{1}|)-\frac{1}{2}\mathbb{E}|\Xi_{1}|\right],$$

we have $0\leq b<1$ and ${\beta}>0$ for $n>1+\frac{\theta_{1}\theta_{2}-\theta_{2}-\theta_{3}}{\theta_{1}(\theta_{1}-2(\theta_{1}^{2}-\theta_{2}))}\vee 0$.
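The case distinction and the resulting choice of $b$ and ${\beta}$ can be traced numerically from the distribution of $|\Xi_{1}|$. The sketch below is illustrative only: the function names `factorial_moments` and `choose_b_beta` are hypothetical, the distribution of $|\Xi_{1}|$ is assumed to have finite support and is passed as a list of probabilities, and the parameters $a$ and $\mu$ are not computed. Exact floating-point comparisons are used at the case boundaries, which is fragile in practice.

```python
def factorial_moments(pmf):
    """Return (theta_1, theta_2, theta_3) for a pmf given as a list over 0, 1, 2, ..."""
    theta = []
    for r in (1, 2, 3):
        total = 0.0
        for k, pk in enumerate(pmf):
            falling = 1.0
            for s in range(r):
                falling *= (k - s)      # k (k-1) ... (k-r+1)
            total += falling * pk
        theta.append(total)
    return tuple(theta)


def choose_b_beta(pmf, n):
    """Return (case label, b, beta) following Case 1 and Case 2 of the text."""
    t1, t2, t3 = factorial_moments(pmf)
    var = t2 + t1 - t1 ** 2
    if var >= t1:                                   # Case 1: over-dispersion
        return "Case 1", (t2 - t1 ** 2) / (t2 - t1 ** 2 + t1), 0.0
    slack = t1 - 2 * (t1 ** 2 - t2)                 # equals 2 Var - E
    if slack > 0 or (slack == 0 and t3 > t2 * (t1 - 1)):
        denom = (n - 1) * t1 * slack + (t3 + t2 - t1 * t2)
        if denom <= 0:
            raise ValueError("n is too small for beta to be positive")
        return "Case 2", 0.0, (t1 ** 2 - t2) / denom
    raise ValueError("neither Case 1 nor Case 2 applies")


if __name__ == "__main__":
    binomial_4_03 = [0.2401, 0.4116, 0.2646, 0.0756, 0.0081]   # |Xi_1| ~ Binomial(4, 0.3)
    print(choose_b_beta(binomial_4_03, 10))
```

For the binomial example the variance $0.84$ lies strictly between $\frac{1}{2}\mathbb{E}|\Xi_{1}|=0.6$ and $\mathbb{E}|\Xi_{1}|=1.2$, so Case 2 applies and ${\beta}$ decays like $1/n$.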

In all cases, let

[TABLE]

where $\Xi_{1,x}$ is the reduced Palm process of $\Xi_{1}$ at $x$. Our main result is as follows.

Theorem 2.2

For both cases above, there exists a constant $C$ , depending on $\mathscr{L}(\Xi_{1})$ , and $0<\rho<1$ such that for any $\varepsilon>0$ ,

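$$\begin{aligned}d_{2}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\mu}\right)&\leq 2\varepsilon+C\vartheta_{\varepsilon}\left(\sum_{k=1}^{n-2}\Xi_{k}\right)+Cn\mathbb{P}\left(\sum_{k=1}^{n-2}|\Xi_{k}|\leq\rho(n-2)\theta_{1}\right)\qquad(4)\\ &\leq 2\varepsilon+C\vartheta_{\varepsilon}\left(\sum_{k=1}^{n-2}\Xi_{k}\right)+O\left(n^{-1/2}\right)\qquad(5)\end{aligned}$$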

for $n>2$ in Case 1 and $n>1+\frac{\theta_{1}\theta_{2}-\theta_{2}-\theta_{3}}{\theta_{1}(\theta_{1}-2(\theta_{1}^{2}-\theta_{2}))}\vee 0$ in Case 2. In particular,

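$$\begin{aligned}d_{2}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\mu}\right)&\leq 2(C+1)\varepsilon_{n-2}+Cn\mathbb{P}\left(\sum_{k=1}^{n-2}|\Xi_{k}|\leq\rho(n-2)\theta_{1}\right)\qquad(6)\\ &\leq 2(C+1)\varepsilon_{n-2}+O\left(n^{-1/2}\right),\qquad(7)\end{aligned}$$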

valid for the same range of $n$ specified above, where $\varepsilon_{n}$ is defined as

$$\varepsilon_{n}=\inf\left\{v:\ 2v\geq\vartheta_{v}\left(\sum_{k=1}^{n}\Xi_{k}\right)\right\}.$$

Remark 2.3

The last terms of (4) and (6) are typically of order $O(e^{-cn})$ for a positive constant $c$ depending on the distribution of $\Xi_{1}$ [Chung & Lu, Theorem 2.7] but the remaining terms of (4) and (6) are typically of order no better than $O\left(n^{-1/2}\right)$.

Proposition 2.4

$\varepsilon_{n}$ is decreasing in $n$.

Proof. First, $\vartheta_{\varepsilon}(\sum_{i=1}^{n+1}\Xi_{i})\leq\vartheta_{\varepsilon}(\sum_{i=1}^{n}\Xi_{i})$ since $TV_{\mathscr{G}}(\sum_{i=1}^{n+1}\Xi_{i})\leq TV_{\mathscr{G}}(\sum_{i=1}^{n}\Xi_{i})$ for any partition $\mathscr{G}$ . It follows that $\vartheta_{\varepsilon_{n}}(\sum_{i=1}^{n+1}\Xi_{i})\leq\vartheta_{\varepsilon_{n}}(\sum_{i=1}^{n}\Xi_{i})\leq 2\varepsilon_{n}$ . Noting that $\vartheta_{\varepsilon}(\sum_{i=1}^{n+1}\Xi_{i})>2\varepsilon$ when $\varepsilon<\varepsilon_{n+1}$ and $\vartheta_{\varepsilon}(\sum_{i=1}^{n+1}\Xi_{i})\leq 2\varepsilon$ when $\varepsilon>\varepsilon_{n+1}$ , we obtain $\varepsilon_{n}\geq\varepsilon_{n+1}$ .

Remark 2.5

Case 1 is known as over-dispersion [Faddy (1994)]. It is shown in [Brown, Hamza & Xia (1998)] that over-dispersion in statistics arising from natural phenomena is much more common than under-dispersion, i.e., ${\rm Var}(|\Xi_{1}|)<\mathbb{E}|\Xi_{1}|$.

Remark 2.6

Let $\nu(dx)=\frac{1}{\theta_{1}}{\lambda}(dx)$ be the normalized distribution of ${\lambda}(dx)$. Since $\mu(dx)$ is the normalized distribution of ${\lambda}(dx)+\frac{{\beta}\mathbb{E}|\Xi_{1,x}|}{1+{\beta}(n-1)\theta_{1}}{\lambda}(dx)$, we have

$$d_{2}\left(\mbox{\boldmath$\pi$}_{a,b;{\beta};\mu},\mbox{\boldmath$\pi$}_{a,b;{\beta};\nu}\right)\leq d_{1}(\mu,\nu)=O(n^{-1}).$$

Thus $\mu$ in Theorem 2.2 can be replaced by $\nu$ at the cost of $O(n^{-1})$ being added to the upper bound.

3 Examples

In this section, we demonstrate the use of Theorem 2.2 in five applications: Bernoulli process, Bernoulli process with shifts, compound Poisson process, renewal process and entrances and exits of Markov process. For simplicity, except in subsection 3.3, we only consider point processes on the carrier space ${\Gamma}=[0,1]$ with $d_{0}(x,y)=|x-y|$ . Extension to any compact carrier space is a straightforward exercise.

3.1 Bernoulli process

As a warm-up example, we consider a simple Bernoulli process $\Xi_{1}=\sum_{i=1}^{m}I_{i}\delta_{t_{i}}$, where $\{I_{i}:\ 1\leq i\leq m\}$ are independent Bernoulli random variables with $\mathbb{P}(I_{i}=1)=1-\mathbb{P}(I_{i}=0)=p_{i}\in(0,1)$, $\{t_{i}:\ 1\leq i\leq m\}\subset{\Gamma}$ and $m$ is a finite positive integer. This is a typical case where the actual support space of the point process is a subset of the carrier space and it reminds us that the partition technique should not be applied blindly. The following theorem is a generalisation of [Xia & Zhang (2008)] with the same order of convergence as that for the special case in [Xia & Zhang (2008)].

Theorem 3.1

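For i.i.d. Bernoulli processes $\{\Xi_{k}\}_{k\in\mathbb{N}}$, if $\sum_{i=1}^{m}p_{i}(1-p_{i})>\frac{1}{2}\sum_{i=1}^{m}p_{i}$ and $a,b;{\beta}$ and $\mu$ are as defined in Case 2, then

$$d_{2}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\mu}\right)=O\left(n^{-1/2}\right)\mbox{ as }n\to\infty.$$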

Remark 3.2

The distances amongst $\{t_{i}:\ 1\leq i\leq m\}$ play no role in the speed of convergence.

Proof of Theorem 3.1. The support of $\Xi_{1}$ is a reduced carrier space ${\Gamma}_{r}:=\{t_{i}:\ 1\leq i\leq m\}$, so it suffices to consider the reduced carrier space ${\Gamma}_{r}$ with partition $\mathscr{G}=\{\{t_{i}\}:\ 1\leq i\leq m\}$ and $\varepsilon=0$. With the partition $\mathscr{G}$, $\Xi_{1}$ corresponds to the vector $\vec{X}=(I_{1},\cdots,I_{m})$. Let $\vec{Y}$ be the sum of $n-2$ independent copies of $\vec{X}$, and $e_{j}$ be the vector with value $1$ at the $j$-th component and $0$ otherwise. Then, by the independence,

$$d_{tv}(\mathscr{L}(\vec{Y}+e_{j}),\mathscr{L}(\vec{Y}))=d_{tv}(\mathscr{L}(Y_{j}+1),\mathscr{L}(Y_{j})).$$

Noting that $Y_{j}$ is the sum of $n-2$ independent Bernoulli $(p_{j})$ random variables, we have $Y_{j}\sim{\rm Binomial}(n-2,p_{j})$, which implies

$$d_{tv}(\mathscr{L}(Y_{j}+1),\mathscr{L}(Y_{j}))=\max_{0\leq l\leq n-2}\binom{n-2}{l}p_{j}^{l}(1-p_{j})^{n-2-l}\leq\frac{C}{\sqrt{(n-2)p_{j}(1-p_{j})}}.\qquad(8)$$

Hence, it follows from (8) that the second term of (5) is bounded by $O\left(n^{-1/2}\right)$ .
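The binomial step of this proof can be checked numerically. The sketch below is an illustration rather than part of the argument: the helper names `binom_pmf_list` and `tv_shift_by_one` are hypothetical, and `m` plays the role of $n-2$. It computes the total variation distance between a binomial law and its shift by one, which for a unimodal distribution coincides with the largest point probability.

```python
import math


def binom_pmf_list(m, p):
    """pmf of Binomial(m, p) as a list, evaluated on the log scale."""
    return [
        math.exp(math.lgamma(m + 1) - math.lgamma(l + 1) - math.lgamma(m - l + 1)
                 + l * math.log(p) + (m - l) * math.log1p(-p))
        for l in range(m + 1)
    ]


def tv_shift_by_one(pmf):
    """d_tv between the law of Y + 1 and the law of Y, for Y with the given pmf."""
    padded = pmf + [0.0]      # law of Y on {0, ..., m + 1}
    shifted = [0.0] + pmf     # law of Y + 1 on {0, ..., m + 1}
    return 0.5 * sum(abs(s - q) for s, q in zip(shifted, padded))


if __name__ == "__main__":
    p = 0.3
    for m in (100, 400, 1600):
        tv = tv_shift_by_one(binom_pmf_list(m, p))
        # The rescaled distance stays roughly constant, consistent with a rate of order m^(-1/2).
        print(m, tv, tv * math.sqrt(m * p * (1 - p)))
```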

3.2 Bernoulli process with shifts

The aim of this example is to show that we may use a marked point process to get better approximation bounds.

Similar to the previous subsection, we define $\Xi_{1}=\sum_{i=1}^{m}I_{i}\delta_{\zeta_{i}}$ , where $\{(I_{i},\zeta_{i})\}$ are independent with $I_{i}$ having Bernoulli distribution with $\mathbb{P}(I_{i}=1)=1-\mathbb{P}(I_{i}=0)=p_{i}\in(0,1)$ , $\zeta_{i}$ taking values in ${\Gamma}$ , and $m$ is a fixed positive integer.

Theorem 3.3

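If $\sum_{i=1}^{m}p_{i}(1-p_{i})>\frac{1}{2}\sum_{i=1}^{m}p_{i}$ and $a,b;{\beta}$ and $\mu$ are as defined in Case 2, then for i.i.d. Bernoulli processes with shifts $\{\Xi_{k}\}_{k\in\mathbb{N}}$,

$$d_{2}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\mu}\right)=O\left(n^{-1/2}\right),\mbox{ as }n\rightarrow\infty.$$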

Remark 3.4

If we apply a partition $\mathscr{G}$ as introduced in the previous section directly, then the bound we may obtain is at most of order $o(1)$.

Proof. According to Remark 2.6, with $\nu(dx)=\frac{1}{\mathbb{E}|\Xi_{1}|}\mathbb{E}\Xi_{1}(dx)$ , it suffices to show

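$$d_{2}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\nu}\right)=O\left(n^{-1/2}\right),\mbox{ as }n\rightarrow\infty.\qquad(9)$$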

We embed $\{\Xi_{i}\}$ into marked point processes [Daley & Vere-Jones (2008), pp. 194–195] and use Theorem 3.1 to complete the proof. To this end, we take fixed points $\{t_{i}:\ 1\leq i\leq m\}$ with $d_{0}^{\prime}$ distances $1$ from each other and define a ground process [Daley & Vere-Jones (2008), p. 194] of $\Xi_{1}$ as $\Xi^{\prime}_{1}=\sum_{i=1}^{m}I_{i}\delta_{t_{i}}$. The metric $d_{0}^{\prime}$ induces $d_{1}^{\prime}$ and $d_{2}^{\prime}$ in the same way as $d_{0}$ generates $d_{1}$ and $d_{2}$. The mean measure of $\Xi^{\prime}_{1}$ is $\lambda^{\prime}=\sum_{i=1}^{m}p_{i}\delta_{t_{i}}$. Let $\theta_{1},\theta_{2};a,b;{\beta}$ be the same as those defined in Theorem 3.3 and set $\nu^{\prime}(dx)=\left(\sum_{i=1}^{m}p_{i}\right)^{-1}\lambda^{\prime}(dx).$ For i.i.d. Bernoulli processes $\{\Xi^{\prime}_{k}\}$, it follows from Theorem 3.1, Remark 3.2 and Remark 2.6 that

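$$d_{2}^{\prime}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}^{\prime}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\nu^{\prime}}\right)=O\left(n^{-1/2}\right)\mbox{ as }n\to\infty.\qquad(10)$$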

Using the Kantorovich-Rubinstein duality theorem [Rachev (1991), Theorem 8.1.1] and decompositions of point processes [Kallenberg (1983), §2.1], we can find $\mathbb{Z}_{+}^{m}$-valued random vectors $\left(Y_{11},\dots,Y_{1m}\right)$ and $\left(Y_{21},\dots,Y_{2m}\right)$ such that $\mathscr{L}\left(\sum_{i=1}^{m}Y_{1i}\delta_{t_{i}}\right)=\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}^{\prime}\right)$, $\mathscr{L}\left(\sum_{i=1}^{m}Y_{2i}\delta_{t_{i}}\right)=\mbox{\boldmath$ \pi $}_{a,b;{\beta};\nu^{\prime}}$ and

$$\mathbb{E}d_{1}^{\prime}\left(\sum_{i=1}^{m}Y_{1i}\delta_{t_{i}},\sum_{i=1}^{m}Y_{2i}\delta_{t_{i}}\right)=d_{2}^{\prime}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}^{\prime}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\nu^{\prime}}\right).\qquad(11)$$

We now use $\sum_{i=1}^{m}Y_{1i}\delta_{t_{i}}$ and $\sum_{i=1}^{m}Y_{2i}\delta_{t_{i}}$ as ground processes to construct marked point processes as suitable realisations of $\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}\right)$ and $\mbox{\boldmath$ \pi $}_{a,b;{\beta};\nu}$ . Let $\{\left(\zeta_{1i},\dots,\zeta_{mi}\right):\ i\in\mathbb{N}\}$ be independent copies of $\left(\zeta_{1},\dots,\zeta_{m}\right)$ such that $\{\left(\zeta_{1i},\dots,\zeta_{mi}\right):\ i\in\mathbb{N}\}$ is independent of $\{\left(Y_{11},\dots,Y_{1m}\right),\left(Y_{21},\dots,Y_{2m}\right)\}$ , define

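$${\cal W}_{1}:=\sum_{i=1}^{m}\sum_{j=1}^{Y_{1i}}\delta_{\zeta_{ij}}\mbox{ and }{\cal W}_{2}:=\sum_{i=1}^{m}\sum_{j=1}^{Y_{2i}}\delta_{\zeta_{ij}},$$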

then $\mathscr{L}\left({\cal W}_{1}\right)=\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}\right)$ , $\mathscr{L}\left({\cal W}_{2}\right)=\mbox{\boldmath$ \pi $}_{a,b;{\beta};\nu}$ , and

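$$d_{2}\left(\mathscr{L}\left(\sum_{k=1}^{n}\Xi_{k}\right),\mbox{\boldmath$\pi$}_{a,b;{\beta};\nu}\right)\leq\mathbb{E}d_{1}\left({\cal W}_{1},{\cal W}_{2}\right)\leq\mathbb{E}d_{1}^{\prime}\left(\sum_{i=1}^{m}Y_{1i}\delta_{t_{i}},\sum_{i=1}^{m}Y_{2i}\delta_{t_{i}}\right).\qquad(12)$$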

Combining (10), (11) and (12) gives (9).

3.3 Compound Poisson process

[Barbour, Chen & Loh (1992)] and [Barbour & Månsson (2002)] demonstrate that a compound Poisson process is often good enough as a suitable asymptotic model for a variety of random phenomena. In this example, we show that the superposition of such a model can be well described by Theorem 2.2.

Recall that a compound Poisson process on a compact carrier space ${\Gamma}$ is defined as $\Xi_{1}=\sum_{i=1}^{\infty}iX_{i}$ , where $\{X_{i}\}$ are independent Poisson processes with mean measures $\{\lambda_{i}\}$ on ${\Gamma}$ respectively and we write $\Xi_{1}\sim{\rm CP}(\lambda_{1},\lambda_{2},\dots)$ .

Theorem 3.5

If $\Xi_{1}\sim{\rm CP}(\lambda_{1},\lambda_{2},\dots)$ with $\mathbb{E}|\Xi_{1}|<\infty$ and $\sum_{j\geq 2}\lambda_{j}$ being absolutely continuous with respect to $\lambda_{1}$ , then for i.i.d. compound Poisson processes $\{\Xi_{k}\}_{k\in\mathbb{N}}$ , with $a,b;{\beta};\mu$ chosen as in Case 1, we have

[TABLE]

Remark 3.6

Noting that the superposition $\sum_{i=1}^{n}\Xi_{i}\sim{\rm CP}(n\lambda_{1},n\lambda_{2},\dots)$, Theorem 3.5 states that, with suitably chosen parameters, $\mbox{\boldmath$ \pi $}_{a,b;{\beta};\mu}$ can be used to replace a compound Poisson process in the context of superposition of point processes.

Remark 3.7

The condition that $\sum_{j\geq 2}\lambda_{j}$ is absolutely continuous with respect to $\lambda_{1}$ guarantees aperiodicity of the distribution and it plays a crucial role in the theory of compound Poisson approximation in [Barbour, Chen & Loh (1992), Barbour & Utev (1998), Barbour & Utev (1999), Barbour & Månsson (2002), Xia (2005a)].

Remark 3.8

It can be observed from the proof below that better upper bounds are possible if more information about $\{\lambda_{i}\}$ is available.

Proof of Theorem 3.5. Taking a reduced carrier space if necessary, without loss of generality, we assume ${\Gamma}$ equals the support of $\lambda_{1}$ , that is, the smallest closed set $A$ such that $\lambda_{1}(A)=|\lambda_{1}|$ . Since ${\rm Var}(|\Xi_{1}|)=\sum_{i=1}^{\infty}i^{2}\lambda_{i}({\Gamma})\geq\mathbb{E}|\Xi_{1}|=\sum_{i=1}^{\infty}i\lambda_{i}({\Gamma})$ , Case 1 applies. Set $\Xi=\sum_{i=1}^{n-2}\Xi_{i}$ , let $\mathscr{G}=\{G_{1},\cdots,G_{k}\}$ be a partition and $W_{1}$ be a Poisson process on ${\Gamma}$ with mean measure $(n-2)\lambda_{1}$ , then

[TABLE]

where the last inequality is from Proposition A.2.7 in [Barbour, Holst & Janson (1992)]. Hence

[TABLE]

which implies, for arbitrary $\varepsilon>0$ , $\vartheta_{\varepsilon}(\Xi)\leq\inf_{\mathscr{G}\in{\mathscr{P}}_{\varepsilon}}{\max_{G\in\mathscr{G}}{\frac{1}{\sqrt{2e(n-2)\lambda_{1}(G)}}}}$ . It then follows from (5) that

[TABLE]

completing the proof.
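The factor $\frac{1}{\sqrt{2e(n-2)\lambda_{1}(G)}}$ in the proof has the form of a classical Poisson estimate. The display equations are not reproduced here, so the sketch below only assumes that the inequality quoted from [Barbour, Holst & Janson (1992)] is the bound $\max_{k}{\rm Po}(\lambda)\{k\}\leq 1/\sqrt{2e\lambda}$ , and it uses the fact that, for a unimodal distribution, the total variation distance to its unit shift equals the largest point probability. The function names are hypothetical.

```python
import math

def poisson_pmf(lam, k):
    """P(Po(lam) = k), computed on the log scale for numerical stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def shift_tv(lam):
    """Total variation distance between Po(lam) and Po(lam) + 1.

    The infinite sum is truncated far in the upper tail, where the
    remaining terms are negligible.
    """
    upper = int(lam + 20 * math.sqrt(lam) + 50)
    pmf = [poisson_pmf(lam, k) for k in range(upper + 1)]
    diffs = [pmf[0]] + [abs(pmf[k] - pmf[k - 1]) for k in range(1, upper + 1)]
    return 0.5 * sum(diffs)

def shift_bound(lam):
    """The upper bound 1 / sqrt(2 e lam)."""
    return 1.0 / math.sqrt(2.0 * math.e * lam)

if __name__ == "__main__":
    for lam in (0.5, 1.0, 10.0, 100.0):
        print(lam, shift_tv(lam), shift_bound(lam))
```

With $\lambda=(n-2)\lambda_{1}(G)$ , the bound decays at the rate $n^{-1/2}$ for each fixed cell $G$ of positive $\lambda_{1}$ -measure.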

3.4 Renewal process

The superposition of renewal processes is not a renewal process unless they are Poisson processes [Feller (1968), p. 370], and the exact behaviour of the superposition is generally hard to extract. In this subsection, we establish its asymptotic behaviour.

Let $W_{0}$ , $W_{1}$ , $W_{2}$ , $\cdots$ be independent non-negative random variables defined on a probability space $(\Omega,\mathscr{F},\mathbb{P})$ . The variables $W_{1}$ , $W_{2}$ , $\cdots$ are strictly positive and identically distributed, and they play the role of the inter-renewal times of the renewal process $\mathbf{S}=(S_{n})_{0}^{\infty}:=(\sum_{i=0}^{n}W_{i})_{0}^{\infty}$ . We assume $\mathbb{E}(W_{1}^{2})<\infty$ and choose the delay $W_{0}$ to make the renewal process stationary [Daley & Vere-Jones (2008), p. 75]. We define $\Xi_{1}=\sum_{m=0}^{\infty}\delta_{S_{m}}\mathbf{1}_{S_{m}\in{\Gamma}}$ , which is the renewal point process restricted to $\Gamma=[0,1]$ [Kallenberg (1983), p. 12]. Before stating the result of this subsection, we briefly recall three pieces of terminology. The support of a random variable $X$ is defined as the smallest closed set $A$ such that $\mathbb{P}(X\in A)=1$ and, for two subsets $B_{1},B_{2}$ of ${\mathbb{R}}$ , $B_{1}+B_{2}:=\{x+y:\ x\in B_{1},y\in B_{2}\}$ and $d_{0}(B_{1},B_{2})=\inf\{d_{0}(x,y):\ x\in B_{1},y\in B_{2}\}$ .
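The set operations just recalled can be made concrete when the sets are finite unions of closed intervals. The following sketch is only an illustration under two assumptions of ours: sets are encoded as lists of `(lo, hi)` pairs, and the metric is taken to be $d_{0}(x,y)=\min\{|x-y|,1\}$ . The function names are hypothetical.

```python
def minkowski_sum(a, b):
    """B1 + B2 for finite unions of closed intervals given as [(lo, hi), ...]."""
    return [(lo1 + lo2, hi1 + hi2) for lo1, hi1 in a for lo2, hi2 in b]

def set_distance(a, b):
    """d_0(B1, B2) = inf min(|x - y|, 1) over x in B1, y in B2.

    Assumption (ours): d_0 is the Euclidean distance truncated at 1.
    """
    best = 1.0
    for lo1, hi1 in a:
        for lo2, hi2 in b:
            gap = max(lo2 - hi1, lo1 - hi2, 0.0)  # 0 when the intervals meet
            best = min(best, gap)
    return best

if __name__ == "__main__":
    support = [(0.3, 0.5)]                       # of the form [a, 2a - b]
    print(set_distance(minkowski_sum(support, support), support))
    support = [(0.0, 0.4)]                       # contains 0
    print(set_distance(minkowski_sum(support, support), support))
```

A support of the form $[a,2a-b]$ with $0<b<a$ gives $A+A=[2a,4a-2b]$ , which lies at distance $b>0$ from $A$ , whereas any support containing $0$ satisfies $A\subset A+A$ and hence $d_{0}(A+A,A)=0$ .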

Theorem 3.9

Assume the renewal time $W_{1}$ satisfies

[TABLE]

and ${\rm Var}(|\Xi_{1}|)>\frac{1}{2}\mathbb{E}|\Xi_{1}|$ , then for i.i.d. renewal processes $\{\Xi_{k}\}_{k\in\mathbb{N}}$ , with $a,b;{\beta};\mu$ chosen as in Case 1 if ${\rm Var}(|\Xi_{1}|)\geq\mathbb{E}|\Xi_{1}|$ and in Case 2 if $\frac{1}{2}\mathbb{E}|\Xi_{1}|<{\rm Var}(|\Xi_{1}|)<\mathbb{E}|\Xi_{1}|$ , we have

[TABLE]

Remark 3.10

If $0\in{\rm supp}(W_{1})$ , then it satisfies (13) and the bound in Theorem 3.9 holds.

Remark 3.11

The condition (13) is almost necessary. See Counterexample 3.13 below.

Remark 3.12

The condition ${\rm Var}(|\Xi_{1}|)>\frac{1}{2}\mathbb{E}|\Xi_{1}|$ cannot be easily deduced from the moments of $W_{1}$ . However, if we consider a sufficiently large carrier space, the asymptotic behaviour of the renewal process ensures that the condition can be verified through the first two moments of $W_{1}$ .

Proof of Theorem 3.9. For any $\epsilon>0$ , we take an $m$ such that $2^{-m}<\epsilon$ . We divide ${\Gamma}$ into $2^{m}$ equally spaced intervals with $s_{j}=j/2^{m}$ so that $\mathscr{G}=\{G_{1},\dots,G_{2^{m}}\}$ , where $G_{1}=[0,s_{1}]$ and $G_{j}=\left(s_{j-1},s_{j}\right]$ for $2\leq j\leq 2^{m}$ . The centre of $G_{j}$ is $t_{j}=(s_{j-1}+s_{j})/2$ and $d_{0}(\mathscr{G})=2^{-(m+1)}$ . Consequently, the first term in (5) is bounded by $\epsilon$ . With the partition $\mathscr{G}$ , set $X_{j}=\Xi(G_{j})$ , define $\vec{X}=(X_{1},\cdots,X_{2^{m}})$ and $\vec{Y}$ as the sum of $n-2$ independent copies of $\vec{X}$ . Applying [Barbour, Luczak & Xia (2018), Lemma 4.1], we obtain

[TABLE]

where $u_{m}:=\min_{1\leq j\leq 2^{m}}\{1-d_{tv}(\mathscr{L}(\vec{X}),\mathscr{L}(\vec{X}+e_{j}))\}$ and $C$ is a universal constant. If $u_{m}\neq 0$ , then the second term in (5) with $\varepsilon=2^{-(m+1)}$ is also dominated by $\epsilon$ for sufficiently large $n$ , which implies that the bound in (5) can be made arbitrarily small as $n\to\infty$ . To establish $u_{m}\neq 0$ , we make use of the assumption that the support $A$ of $W_{1}$ satisfies $d_{0}(A+A,A)=0$ . Since $A$ is closed and the operation $+$ is continuous on ${\mathbb{R}}_{+}$ , $A+A$ and $A$ are both closed, which means that $(A+A)\cap A\neq\emptyset$ . This in turn implies that there exists at least one $x\in\mathbb{R}_{+}$ such that both $\mathbb{P}(W_{1}\in(x-\epsilon_{1},x+\epsilon_{1}))$ and $\mathbb{P}(W_{1}+W_{2}\in(x-\epsilon_{1},x+\epsilon_{1}))$ are positive for all $\epsilon_{1}>0$ . It is also possible to find a $0<y<x$ such that $\mathbb{P}(W_{1}\in(y-\epsilon_{2},y+\epsilon_{2}),W_{1}+W_{2}\in(x-\epsilon_{1},x+\epsilon_{1}))\neq 0$ for all $\epsilon_{1}$ , $\epsilon_{2}>0$ . For convenience of argument, we extend the stationary renewal point process $\Xi_{1}$ to $\Xi_{1}^{\prime}$ on ${\mathbb{R}}$ . For $0<j\leq 2^{m}$ , if $\varsigma>0$ is small enough, the set

[TABLE]

has positive Lebesgue measure.

From stationarity, there is a positive probability that there is at least one point in $B_{\varsigma}$ . Conditional on the largest point in $B_{\varsigma}$ and the past, the renewal process has a positive probability for the future inter-renewal times $W_{1}^{\prime}$ , $W_{2}^{\prime}$ , $\cdots$ to evolve as $W_{i}^{\prime}\in(x-\frac{2\varsigma}{2^{i}},x+\frac{2\varsigma}{2^{i}})$ for all $i\in\mathbb{N}$ until time $1$ , and there is also a positive probability that the incoming inter-renewal times $W_{1}^{\prime\prime}$ , $W_{2}^{\prime\prime}$ , $\cdots$ evolve as $W_{1}^{\prime\prime}\in(y-\varsigma,y+\varsigma)$ , $W_{1}^{\prime\prime}+W_{2}^{\prime\prime}\in(x-\varsigma,x+\varsigma)$ , ${W_{i}^{\prime\prime}=W_{i-1}^{\prime}}\in(x-\frac{4\varsigma}{2^{i}},x+\frac{4\varsigma}{2^{i}})$ for $i\geq 3$ until time $1$ . The choice of $B_{\varsigma}$ and the synchronicity of $\{W_{i}^{\prime}:\ i\geq 1\}$ and $\{W_{i}^{\prime\prime}:\ i\geq 1\}$ ensure that an extra renewal point caused by $W^{\prime\prime}_{1}$ is added in $G_{j}$ , and the subsequent renewal points of the two renewal processes occur in the same partition sets $\{G_{k}:\ k\geq j\}$ simultaneously. Consequently, we can set aside a positive probability event $B_{+}$ such that on $B_{+}$ , the two renewal processes run together until the point in $B_{\varsigma}$ and then one runs according to $\{W_{i}^{\prime}\}$ and the other evolves as $\{W^{\prime\prime}_{i}\}$ . Figure 1 shows the coupling when $m=2$ , $x=0.5$ , $y=0.2$ , $j=1$ and a renewal happens at around $-0.25$ : with a positive probability, the next three inter-arrival times are each around $x=0.5$ ; with another positive probability, the incoming four inter-arrival times respectively take values around $y=0.2$ , $x-y=0.3$ , $x=0.5$ , $x=0.5$ .

For this coupling, the corresponding vectors $\vec{X}^{\prime}$ and $\vec{X}^{\prime\prime}$ satisfy that $\vec{X}^{\prime\prime}=\vec{X}^{\prime}+e_{j}$ on $B_{+}$ (in Figure 1, $\vec{X}^{\prime}=(0,1,0,1)$ and $\vec{X}^{\prime\prime}=(1,1,0,1)$ ), which implies that $d_{tv}(\mathscr{L}(\vec{X}),\mathscr{L}(\vec{X}+e_{j}))<1$ for all $0<j\leq 2^{m}$ . This concludes the proof.
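The effect of the coupling on the count vectors can be traced with a short computation. The sketch below bins renewal points into the dyadic cells $G_{1}=[0,s_{1}]$ , $G_{j}=(s_{j-1},s_{j}]$ and runs the two inter-renewal sequences of the example with $m=2$ , $x=0.5$ , $y=0.2$ , $j=1$ . The location of the last renewal before time $0$ is taken to be $t_{1}-y=-0.075$ ; this is a hypothetical choice made here so that the extra point lands at the centre of $G_{1}$ , and the function names are likewise assumptions.

```python
import math

def cell_index(t, m):
    """Index j in 1..2**m of the cell G_j containing t in [0, 1]."""
    return max(1, math.ceil(t * 2 ** m))   # t = 0 belongs to G_1

def count_vector(start, gaps, m):
    """Counts (X_1, ..., X_{2^m}) of renewal points falling in [0, 1].

    start is the last renewal before time 0 and gaps are the subsequent
    inter-renewal times.
    """
    counts = [0] * (2 ** m)
    t = start
    for gap in gaps:
        t += gap
        if 0.0 <= t <= 1.0:
            counts[cell_index(t, m) - 1] += 1
    return tuple(counts)

if __name__ == "__main__":
    start = -0.075                                     # hypothetical: t_1 - y
    print(count_vector(start, [0.5, 0.5, 0.5], 2))       # path driven by W'
    print(count_vector(start, [0.2, 0.3, 0.5, 0.5], 2))  # path driven by W''
```

With these inputs the first path yields $(0,1,0,1)$ and the second $(1,1,0,1)$ , so the two vectors differ exactly by $e_{1}$ .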

Counterexample 3.13

If ${\rm supp}{(W_{1})}\subset[a,2a-b]$ for some $0<b<a\leq\frac{1}{2}$ , then $u_{m}=0$ for some $m$ , so the method does not work.

In fact, for $m$ large enough, when $X_{2^{m-1}}=0$ , there is one point of $\mathbf{S}$ sitting in the interval $B_{3}:=\left[\frac{1}{2}-\frac{1}{2^{m}}-a+\frac{b}{2},\frac{1}{2}+a-\frac{b}{2}\right]$ almost surely. But when we have $X_{2^{m-1}}=1$ , there are no points in $B_{3}$ except in $G_{2^{m-1}}$ . On the other hand, for $m$ large enough, $X_{2^{m-1}}\leq 1$ almost surely because $a>0$ . In this situation, $d_{tv}(\mathscr{L}(\vec{X}),\mathscr{L}(\vec{X}+e_{2^{m-1}}))=1$ , i.e., $u_{m}=0$ .

Remark 3.14

It is possible to extend Theorem 3.9 to the superposition of i.i.d. non-stationary renewal processes, provided there are user-friendly criteria for ensuring $u_{m}:=\min_{1\leq j\leq 2^{m}}\{1-d_{tv}(\mathscr{L}(\vec{X}),\mathscr{L}(\vec{X}+e_{j}))\}\neq 0$ for all $m\geq 1$ and $0<j\leq 2^{m}$ .

3.5 Entrances or exits of a Markov process

Let $\{M_{t}\}_{t\in\mathbb{R}}$ be a time-reversible and irreducible Markov chain with finite state space $E$ . Let $S_{0}$ be a proper subset of $E$ . As the exit process from $S_{0}$ can be viewed as the entrance process of $E\backslash S_{0}$ , we consider the entrance process to $S_{0}$ only. Let $T_{1}:=\inf\{t\geq 0:\ M_{t^{-}}\notin S_{0}\mbox{ and }M_{t}\in S_{0}\}$ and $T_{i+1}=\inf\{t>T_{i}:M_{t^{-}}\notin S_{0}\mbox{ and }M_{t}\in S_{0}\}$ for $i\geq 1$ . Then the total number of entrances to $S_{0}$ in ${\Gamma}=[0,1]$ can be written as $\tau=\max\{n:T_{n}\leq 1\}$ with $\max\emptyset:=0$ , and the times of entrances form a point process $\Xi_{1}:=\sum_{1\leq i\leq\tau}\delta_{T_{i}}$ with the convention $\Xi_{1}=0$ when $\tau=0$ . Clearly, $|\Xi_{1}|$ is almost surely finite.
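The definition of the entrance times can be read off a sample path directly. The sketch below assumes, for illustration only, that a path is encoded as a time-ordered list of `(jump_time, new_state)` pairs whose first entry records the state occupied at time $0$ ; the encoding and the name `entrance_times` are hypothetical.

```python
def entrance_times(path, s0, horizon=1.0):
    """Times T_i in [0, horizon] at which the chain jumps from outside s0 into s0.

    path: list of (jump_time, new_state) in increasing time order; the chain
    sits in path[k][1] from path[k][0] until the next listed jump.
    """
    times = []
    for (_, prev_state), (t, state) in zip(path, path[1:]):
        # An entrance requires M_{t-} outside s0 and M_t inside s0.
        if 0.0 <= t <= horizon and prev_state not in s0 and state in s0:
            times.append(t)
    return times

if __name__ == "__main__":
    path = [(-0.3, "a"), (0.1, "b"), (0.4, "c"), (0.6, "a"), (0.9, "c"), (1.2, "b")]
    print(entrance_times(path, {"b", "c"}))
```

In this example the jump from `b` to `c` at time $0.4$ is not counted, because the chain is already in $S_{0}=\{b,c\}$ ; the entrances occur at $0.1$ and $0.9$ , so $\tau=2$ .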

Theorem 3.15

For i.i.d. entrance processes $\{\Xi_{k}\}_{k\in\mathbb{N}}$ , with $a,b;{\beta};\mu$ chosen as in Case 1,

[TABLE]

Remark 3.16

When $S_{0}$ is a single point set, $(T_{n})_{0}^{\infty}$ forms a renewal process and Theorem 3.15 becomes a special case of Theorem 3.9. However, when $S_{0}$ contains more than one state, $\Xi_{1}$ is no longer a renewal process.

Proof of Theorem 3.15. [Brown, Hamza & Xia (1998), Corollary 2] implies that ${\rm Var}(|\Xi_{1}|)\geq\mathbb{E}|\Xi_{1}|$ , so Case 1 applies. The rest of the proof is essentially the same as that of Theorem 3.9. For any $\epsilon>0$ , we choose an $m$ such that $2^{-m}<\epsilon$ . Let $s_{j}=j/2^{m}$ and $\mathscr{G}=\{G_{1},\dots,G_{2^{m}}\}$ , where $G_{1}=[0,s_{1}]$ and $G_{j}=\left(s_{j-1},s_{j}\right]$ for $2\leq j\leq 2^{m}$ . The centre of $G_{j}$ is at $t_{j}=(s_{j-1}+s_{j})/2$ and $d_{0}(\mathscr{G})=2^{-(m+1)}$ . This partition ensures that the first term in (5) is bounded by $\epsilon$ . Set $X_{j}=\Xi(G_{j})$ and define $\vec{X}=(X_{1},\cdots,X_{2^{m}})$ and $\vec{Y}$ as the sum of $n-2$ independent copies of $\vec{X}$ . It follows from [Barbour, Luczak & Xia (2018), Lemma 4.1] that

[TABLE]

where $u_{m}:=\min_{1\leq j\leq 2^{m}}\{1-d_{tv}(\mathscr{L}(\vec{X}),\mathscr{L}(\vec{X}+e_{j}))\}$ and $C$ is a universal constant. It remains to show that $u_{m}\neq 0$ . Since $\{M_{t}\}_{t\in{\mathbb{R}}}$ is irreducible, we can choose a state $s\in E\backslash S_{0}$ such that there is a positive probability of entering $S_{0}$ immediately after leaving $s$ . Let $\tau_{1}$ be the first time that the Markov chain enters $S_{0}$ , $\tau_{2}$ be the first time after $\tau_{1}$ to depart from $S_{0}$ , $T_{1}^{\prime}$ and $T_{2}^{\prime}$ be the first and second jump times of $\{M_{t}\}_{t\in\mathbb{R}}$ after time $0$ . From the assumption that $\{M_{t}\}_{t\in{\mathbb{R}}}$ is finite and irreducible, we can conclude that $\mathbb{P}(M_{0}=s)>0$ and so $q_{1}:=\mathbb{P}\left(\vec{X}=\mathbf{0}\right)\geq\mathbb{P}\left(M_{0}=s,\tau_{1}>1\right)\geq\mathbb{P}\left(M_{0}=s,T_{1}^{\prime}>1\right)>0$ and

[TABLE]

which in turn imply $d_{tv}(\mathscr{L}(\vec{X}),\mathscr{L}(\vec{X}+e_{j}))\leq\max\{1-q_{1},1-q_{2}\}<1$ , as claimed.

4 The Proof of Theorem 2.2

The advantage of using $\mbox{\boldmath$ \pi $}_{a,b;{\beta};\mu}$ as approximating distribution is that it can be considered as the unique stationary distribution of an ${\mathscr{H}}$ -valued positive recurrent process with the generator

[TABLE]

see [Xia & Zhang (2012)] for more details. We use $\mathbb{Z}_{\xi}(\cdot)$ to stand for a birth-death point process with generator ${\mathscr{A}}$ and initial configuration $\xi$ . For any bounded measurable function $f$ on $({\mathscr{H}},{\mathscr{B}}({\mathscr{H}}))$ , it can be shown that

[TABLE]

is well defined and is the solution of the Stein equation

[TABLE]

Estimating $d_{2}(\mathscr{L}({\cal W}),\mbox{\boldmath$ \pi $}_{a,b;{\beta};\mu})$ is equivalent to bounding $\mathbb{E}{\mathscr{A}}h_{f}({\cal W})$ for all $f\in\mathscr{F}$ defined on page 2. As $\mathbb{E}{\mathscr{A}}h({\cal W})$ can be expressed via the differences of $h$ , the successful application of Stein’s method hinges on sharp upper bounds of

[TABLE]

Let $\Delta_{2}h(\xi):=\sup\{|\Delta_{2}h(\xi;x,y)|:\ x,y\in{\Gamma}\}$ . Then it is shown in [Xia & Zhang (2012)] that

[TABLE]

Now, we are ready to prove Theorem 2.2.

Proof of Theorem 2.2. The inequalities (5) and (7) are due to the well-known concentration inequality, see [McDiarmid (1998), Theorem 2.7] and [Chung & Lu, Theorem 2.7]. Hence it remains to show (4) and (6).

Suppose $\mathscr{G}\in{\mathscr{P}}_{\varepsilon}$ . The “assembling mapping” $\mathscr{M}_{\mathscr{G}}$ ensures that for any configuration $\eta$ ,

[TABLE]

It follows that $d_{2}(\mathscr{L}({\cal W}),\mathscr{L}(\mathscr{M}_{\mathscr{G}}\circ{\cal W}))\leq\varepsilon$ for any point process ${\cal W}$ , which yields

[TABLE]

To compute the $d_{2}$ distance between $\mathscr{L}(\mathscr{M}_{\mathscr{G}}\circ\Xi)$ and $\mathscr{L}(\mathscr{M}_{\mathscr{G}}\circ{\cal Z})$ , we concentrate on the space $\tilde{\Gamma}:=\{t_{1},\cdots,t_{k}\}$ and apply Stein’s method. Denote by $\tilde{\mathscr{H}}$ the class of all configurations on $\tilde{\Gamma}$ . For any $\xi\in\tilde{\mathscr{H}}$ , let

[TABLE]

Then, with generator $\tilde{\mathscr{A}}$ , we have a positive recurrent Markov process on $\tilde{\mathscr{H}}$ . The unique stationary measure is $\tilde{\mbox{\boldmath$ \pi $}}:=\mbox{\boldmath$ \pi $}_{a,b;{\beta};\tilde{\mu}}=\mathscr{L}(\mathscr{M}_{\mathscr{G}}\circ{\cal Z})$ , where

[TABLE]

Denote

[TABLE]

For any $\tilde{f}\in\tilde{\mathscr{F}}$ , let $\tilde{h}_{\tilde{f}}$ be the unique solution of

[TABLE]

Then

[TABLE]

Now we concentrate on estimating $\mathbb{E}\tilde{\mathscr{A}}\tilde{h}_{f}(\mathscr{M}_{\mathscr{G}}\circ\Xi)$ . Let

[TABLE]

Then

[TABLE]

First of all, we can write $\mathbb{E}\tilde{\mathscr{A}}\tilde{h}(\mathscr{M}_{\mathscr{G}}\circ\Xi)$ via $\Delta^{2}h$ ’s. Namely, since $\Xi_{1},\cdots,\Xi_{n}$ are independent identically distributed,

[TABLE]

With the reduced Palm processes, one can write $\mathbb{E}\tilde{\mathscr{A}}\tilde{h}(\mathscr{M}_{\mathscr{G}}\circ\Xi)$ as

[TABLE]

We subtract $\Delta h(\sum_{j=2}^{n}\Xi_{j};x)$ in the first four terms and $\Delta h(\sum_{k=3}^{n}\Xi_{k};x)$ in the last one. Then, $\mathbb{E}\tilde{\mathscr{A}}\tilde{h}(\mathscr{M}_{\mathscr{G}}\circ\Xi)$ can be written via $\Delta_{2}h$ ’s, provided that the number of $-\Delta_{2}h$ added is balanced with that of $\Delta_{2}h$ added. More precisely, we need

[TABLE]

which is equivalent to (2) and (3). With (2) and (3), we write $\mathbb{E}\tilde{\mathscr{A}}\tilde{h}(\mathscr{M}_{\mathscr{G}}\circ\Xi)$ via $\Delta_{2}h$ ’s. For example, the first term in (18) becomes

[TABLE]

The difference $\Delta h(\Xi;x)-\Delta h(\sum_{j=2}^{n}\Xi_{j};x)$ can be telescoped out as the sum of $|\Xi_{1}|$ $\Delta_{2}h$ functions. Provided the number of $\Delta_{2}h$ is balanced with that of $-\Delta_{2}h$ , one can further write $\mathbb{E}\tilde{\mathscr{A}}\tilde{h}(\mathscr{M}_{\mathscr{G}}\circ\Xi)$ via $\Delta_{3}h$ ’s or differences of two $\Delta_{2}h$ ’s. To this end, let $z\in{\Gamma}$ and

[TABLE]

Then,

[TABLE]

provided that

[TABLE]

In both cases, $b$ and ${\beta}$ are taken to ensure the above equality.

To estimate $e_{1},\cdots,e_{7}$ , we decompose them into the sum of $\Delta_{2}h$ functions of the forms

[TABLE]

The bounds in the following lemma can be found in [Xia & Zhang (2012), pp. 3060-3061].

Lemma 4.1

For any point process ${\cal W}$ , and $u>0$ , both $|\mathbb{E}\Delta_{3}h({\cal W};x,y,z)|$ and $|\mathbb{E}\Delta_{2,T}h({\cal W};x,y;z,z)|$ are bounded above by

[TABLE]

where $a$ is defined in (2) and ${\rm TV}_{\mathscr{G}}({\cal W})$ is defined in (1).

To estimate $e_{1}$ , let $\Xi_{1}=\sum_{n=1}^{|\Xi_{1}|}\delta_{X_{n}}$ , $\langle\Xi_{1}\rangle_{r}:=\sum_{n=1}^{r}\delta_{X_{n}}$ , ${\cal W}=\sum_{j=2}^{n}\Xi_{j}$ . Then,

[TABLE]
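The telescoping of a difference of first differences into $|\Xi_{1}|$ second differences can be checked on a toy example. The sketch assumes the usual definitions $\Delta h(\xi;x)=h(\xi+\delta_{x})-h(\xi)$ and $\Delta_{2}h(\xi;x,y)=\Delta h(\xi+\delta_{y};x)-\Delta h(\xi;x)$ , which are not restated here; configurations are encoded as tuples of points, and the test function `h` is an arbitrary symmetric function chosen for illustration.

```python
def delta(h, xi, x):
    """First difference: h(xi + delta_x) - h(xi)."""
    return h(xi + (x,)) - h(xi)

def delta2(h, xi, x, y):
    """Second difference: delta(h, xi + delta_y; x) - delta(h, xi; x)."""
    return delta(h, xi + (y,), x) - delta(h, xi, x)

def telescoped(h, w, xi1, x):
    """Sum over r of delta2(h, w + <xi1>_{r-1}; x, X_r)."""
    total = 0.0
    for r, point in enumerate(xi1):
        # xi1[:r] holds the first r points, i.e. <xi1>_{r} before adding X_{r+1}
        total += delta2(h, w + xi1[:r], x, point)
    return total

if __name__ == "__main__":
    h = lambda xi: len(xi) ** 3 + sum(xi) ** 2   # hypothetical test function
    w, xi1, x = (0.1, 0.7, 0.7), (0.2, 0.5, 0.9), 0.4
    lhs = delta(h, w + xi1, x) - delta(h, w, x)
    print(lhs, telescoped(h, w, xi1, x))
```

Both printed values agree up to rounding, reflecting the identity $\Delta h({\cal W}+\Xi_{1};x)-\Delta h({\cal W};x)=\sum_{r=1}^{|\Xi_{1}|}\Delta_{2}h({\cal W}+\langle\Xi_{1}\rangle_{r-1};x,X_{r})$ under the assumed definitions.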

Since ${\cal W}$ is independent of $\Xi_{1}$ , it follows that

[TABLE]

Similarly, we have

[TABLE]

Since $r(\sum_{j=2}^{n}\Xi_{j})\leq r(\sum_{k=3}^{n}\Xi_{k})$ , we have $|\mathbb{E}\tilde{\mathscr{A}}\tilde{h}(\Xi)|\leq C_{n}r(\sum_{k=3}^{n}\Xi_{k})$ , where

[TABLE]

It is not difficult to check that in each of the two cases, $a$ has order $n$ , $b$ is a constant and ${\beta}$ has order $1/n$ . Hence $C_{n}$ has order $n$ . Let $u$ be a constant independent of $n$ such that $a/u<n\theta_{1}/4$ , then for $n>2$ ,

[TABLE]

where $C$ is a constant. Using the fact that $\sum_{k=3}^{n}\Xi_{k}$ and $\sum_{k=1}^{n-2}\Xi_{k}$ have the same distribution, we combine (16) and (17) to conclude that

[TABLE]

Since $\mathscr{G}$ is arbitrary, the proof of Theorem 2.2 is complete.

Bibliography

[Barbour (1988)] Barbour, A. D. (1988) Stein’s method and Poisson process convergence. J. Appl. Probab. 25 (A), 175–184.
[Barbour (1990)] Barbour, A. D. (1990) Stein’s method for diffusion approximations. Probab. Theory Related Fields 84, 297–322.
[Barbour & Brown (1992)] Barbour, A. D. & Brown, T. C. (1992) Stein’s method and point process approximation. Stochastic Processes Appl. 43, 9–31.
[Barbour, Chen & Loh (1992)] Barbour, A. D., Chen, L. H. Y. & Loh, W. L. (1992) Compound Poisson approximation for nonnegative random variables via Stein’s method. Ann. Probab. 20, 1843–1866.
[Barbour & Choi (2004)] Barbour, A. D. & Choi, K. P. (2004) A non-uniform bound for translated Poisson approximation. Electron. J. Probab. 9, 18–36.
[Barbour & Hall (1984)] Barbour, A. D. & Hall, P. (1984) On the rate of Poisson convergence. Math. Proc. Cambridge Philos. Soc. 95, 473–480.
[Barbour, Holst & Janson (1992)] Barbour, A. D., Holst, L. & Janson, S. (1992) Poisson Approximation. Oxford Univ. Press.