On moderate deviations in Poisson approximation

Qingwei Liu; Aihua Xia

arXiv:1906.10016·math.PR·September 9, 2020·J. Appl. Probab.


TL;DR

This paper investigates the accuracy of Poisson approximation for rare event counts, showing that with proper adjustments, the error estimates for moderate deviations are improved and applicable across various problems.

Contribution

It introduces refined error estimates for moderate deviations in Poisson approximation that apply to several classical problems, contain no unspecified constants and are easy to apply.

Findings

01

Poisson tail probabilities better approximate rare event counts than normal tails.

02

Adjusted estimates improve error bounds for moderate deviations.

03

Applications include Poisson-binomial, matching, occupancy, birthday, random graphs, and 2-runs.

Abstract

In this paper, we first use the distribution of the number of records to demonstrate that the right tail probabilities of counts of rare events are generally better approximated by the right tail probabilities of Poisson distribution than those of normal distribution. We then show the moderate deviations in Poisson approximation generally require an adjustment and, with suitable adjustment, we establish better error estimates of the moderate deviations in Poisson approximation than those in [Chen, Fang & Shao (2013a)]. Our estimates contain no unspecified constants and are easy to apply. We illustrate the use of the theorems in six applications: Poisson-binomial distribution, matching problem, occupancy problem, birthday problem, random graphs and 2-runs. The paper complements the works of [Chen & Choi (1992), Barbour, Chen & Choi (1995), Chen, Fang & Shao (2013a)].

Equations

$$\operatorname{\mathbb{E}}e^{t_{0}|X_{1}|}\leq c_{0}<\infty,$$

$$\frac{\mathbb{P}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i}\geq z\right)}{1-\Phi(z)}=1+O(1)\frac{1+z^{3}}{\sqrt{n}},\quad 0\leq z\leq c_{1}n^{1/6},$$

$$I_{i}:={\bf 1}\left[\eta_{i}>\max_{1\leq j\leq i-1}\eta_{j}\right],$$

$$\lambda_{n}:=\operatorname{\mathbb{E}}S_{n}=\sum_{i=2}^{n}\frac{1}{i};\quad\sigma_{n}^{2}:=\operatorname{\mathrm{Var}}(S_{n})=\sum_{i=2}^{n}\frac{1}{i}\left(1-\frac{1}{i}\right).$$

$$\lim_{n\to\infty}\frac{\mathbb{P}\left(W_{n}\geq np+x\sqrt{np(1-p)}\right)}{\mathbb{P}\left(Y_{n}\geq np+x\sqrt{np(1-p)}\right)}=\frac{\mathbb{P}(Z\geq x)}{\mathbb{P}(Z\geq x\sqrt{1-p})},$$

$$\lim_{n\to\infty}\frac{\mathbb{P}\left(W_{n}\geq np+x\sqrt{np(1-p)}\right)}{\mathbb{P}\left(Y_{n}\geq np+x\sqrt{np}\right)}=1$$

$$\lim_{n\to\infty}\frac{\mathbb{P}\left(W_{n}\geq np+x\sqrt{np(1-p)}\right)}{\mathbb{P}\left(Y_{n}^{\prime}\geq np(1-p)+x\sqrt{np(1-p)}\right)}=1.$$

$$\mu_{i}=\operatorname{\mathbb{E}}(X_{i}),\quad\mu=\operatorname{\mathbb{E}}(W),\quad\sigma^{2}=\operatorname{\mathrm{Var}}(W).$$

$$\theta_{i}:=\operatorname*{ess\,sup}\max_{j}\mathbb{P}(W=j\,|\,{\cal F}_{i}),$$

$$\frac{\mathbb{P}(W-a\geq k)}{\mathbb{P}(Y\geq k)}-1\leq$$

$$+\operatorname{\mathbb{E}}\left[|X_{i}-\mu_{i}|Z_{i}(Z_{i}^{\prime}-Z_{i}/2-1/2)\right]\Big\}$$

$$+{\mathds{C}}_{1}(\lambda,k)\left|\lambda-\sigma^{2}\right|+\mathbb{P}(W-a<-1),$$

$${\mathds{C}}_{1}(\lambda,k)$$

$${\mathds{C}}_{2}(\lambda,k)$$

$$\mbox{ratio}_{i}:={\mathds{C}}_{i}(\lambda,k)/[(1-e^{-\lambda})/(\lambda\mathbb{P}(Y\geq k))],\quad i=1,2,$$

$$\frac{\mathbb{P}(W-a\geq k)}{\mathbb{P}(Y\geq k)}-1\leq{\mathds{C}}_{2}(\lambda,k)\sum_{i\in{\cal I}}\theta_{i}\left\{\mu_{i}\left|\operatorname{\mathbb{E}}[X_{i}(X_{i}-\mu_{i})]\right|+\frac{1}{2}\operatorname{\mathbb{E}}\left[|X_{i}-\mu_{i}|X_{i}(X_{i}-1)\right]\right\}+{\mathds{C}}_{1}(\lambda,k)\left|\lambda-\sigma^{2}\right|+\mathbb{P}(W-a<-1).$$

$$\mathbb{P}(W-a<-1)\leq\exp\left(-\frac{(\mu-a+2)^{2}}{2\sum_{i\in{\cal I}}\operatorname{\mathbb{E}}(X_{i}^{2})}\right).$$

$$dF^{s}(w)=\frac{wdF(w)}{\mu},\quad\mbox{$w\geq 0$,}$$

$$\operatorname{\mathbb{E}}[Wg(W)]=\mu\operatorname{\mathbb{E}}g(W^{s})\quad\mbox{for all $g$ with $\operatorname{\mathbb{E}}|Wg(W)|<\infty$.}$$

$$\frac{\mathbb{P}(W-a\geq k)}{\mathbb{P}(Y\geq k)}-1\leq$$

$$+\mathbb{P}(W-a<-1),$$

$$\operatorname{\mathbb{E}}[(V-\mu)g(V)]=\sigma^{2}\operatorname{\mathbb{E}}\Delta g(V^{\star}),$$

$$\theta_{R}=\max_{j}\mathbb{P}(W=j\,|\,R).$$

$$\frac{\mathbb{P}(W-a\geq k)}{\mathbb{P}(Y\geq k)}-1\leq{\mathds{C}}_{2}(\lambda,k)\sigma^{2}\operatorname{\mathbb{E}}[|R|\theta_{R}]+{\mathds{C}}_{1}(\lambda,k)\left|\lambda-\sigma^{2}\right|\lambda^{-1}+\mathbb{P}(W-a<-1),$$

$$W^{s}=\sum_{j\neq I}X_{j}^{(I)}+1,$$

$$\mathscr{L}(\{X_{j}^{(i)}:\ j\in{\cal I}\})=\mathscr{L}(\{X_{j}:\ j\in{\cal I}\}\,|\,X_{i}=1),$$

$$\operatorname{\mathbb{E}}|W+1-W^{s}|=\operatorname{\mathbb{E}}(W+1-W^{s})=\mu^{-1}(\mu-\sigma^{2}),$$

$$\begin{aligned}\operatorname{\mathbb{E}}|W+1-W^{s}|&\leq\operatorname{\mathbb{E}}\left\{\sum_{j\neq I}(X_{j}^{(I)}-X_{j})+X_{I}\right\}\\&=\operatorname{\mathbb{E}}(W^{s}-W-1)+2\mu^{-1}\sum_{i\in{\cal I}}p_{i}^{2}\\&=\mu^{-1}(\sigma^{2}-\mu)+2\mu^{-1}\sum_{i\in{\cal I}}p_{i}^{2}.\end{aligned}$$

$$\frac{\mathbb{P}(W\geq k)}{{\rm Pn}(\mu)([k,\infty))}-1\leq{\mathds{C}}_{1}(\mu,k)\mu_{2}$$


Full text

ON MODERATE DEVIATIONS IN POISSON APPROXIMATION

Qingwei Liu¹ and Aihua Xia²

¹ School of Mathematics and Statistics, The University of Melbourne, VIC 3010, Australia, E-mail: [email protected]. Work supported in part by China Scholarship Council.

² School of Mathematics and Statistics, The University of Melbourne, VIC 3010, Australia, E-mail: [email protected]. Work supported in part by Australian Research Council Grants No DP190100613.


Abstract

In this paper, we first use the distribution of the number of records to demonstrate that the right tail probabilities of counts of rare events are generally better approximated by the right tail probabilities of Poisson distribution than those of normal distribution. We then show the moderate deviations in Poisson approximation generally require an adjustment and, with suitable adjustment, we establish better error estimates of the moderate deviations in Poisson approximation than those in [Chen, Fang & Shao (2013a)]. Our estimates contain no unspecified constants and are easy to apply. We illustrate the use of the theorems in six applications: Poisson-binomial distribution, matching problem, occupancy problem, birthday problem, random graphs and 2-runs. The paper complements the works of [Chen & Choi (1992), Barbour, Chen & Choi (1995), Chen, Fang & Shao (2013a)].

Key words and phrases: Stein-Chen method, Poisson approximation, moderate deviation.

AMS 2020 Subject Classification: Primary 60F05; secondary 60E15.

1 Introduction

An exemplary moderate deviation theorem is as follows (see [Petrov (1975), p. 228]). Let $X_{i}$ , $1\leq i\leq n$ , be independent and identically distributed (i.i.d.) random variables with $\operatorname{\mathbb{E}}(X_{1})=0$ and $\operatorname{\mathrm{Var}}(X_{1})=1$ . If for some $t_{0}>0$ ,

$$\operatorname{\mathbb{E}}e^{t_{0}|X_{1}|}\leq c_{0}<\infty,\tag{1.1}$$

then there exist positive constants $c_{1}$ and $c_{2}$ depending on $c_{0}$ and $t_{0}$ such that

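$$\frac{\mathbb{P}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i}\geq z\right)}{1-\Phi(z)}=1+O(1)\frac{1+z^{3}}{\sqrt{n}},\quad 0\leq z\leq c_{1}n^{1/6},\tag{1.2}$$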

where $\Phi(z)$ is the distribution function of the standard normal and $|O(1)|\leq c_{2}$ . However, since the pioneering work [Chen (1975)], it has been shown [Barbour, Holst & Janson (1992)] that, for the counts of rare events, the Poisson distribution provides a better approximation. For example, the distribution of the number of records [Dwass (1960), Rényi (1962)] in Example 1.1 below can be better approximated by the Poisson distribution having the same mean than by a normal distribution [Deheuvels & Pfeifer (1988)]. Moreover, a suitable refinement of the Poisson distribution can further improve the performance of the approximation [Borovkov (1988), Borovkov & Pfeifer (1996)].

The right tail probabilities of counts of rare events are often needed in statistical inference, but these probabilities are so small that the error estimates in approximations of distributions of the counts are usually of no use because the bounds are often larger than the probabilities of interest. Hence it is of practical interest to consider their approximations via moderate deviations in Poisson approximation in a similar fashion to (1.2). However, there has been little progress in the general framework beyond the special cases in [Chen & Choi (1992), Barbour, Chen & Choi (1995), Chen, Fang & Shao (2013a), Tan, Lu & Xia (2018), Čekanavičius & Vellaisamy (2019)]. This is partly because the tail behaviour of a Poisson distribution differs significantly from that of a normal distribution, a fact observed by [Gnedenko (1943)] in the context of extreme value theory. In particular, [Gnedenko (1943)] concludes that the Poisson distribution does not belong to any domain of attraction while the normal distribution belongs to the domain of attraction of the Gumbel distribution.

Example 1.1

We use the distribution of the number of records to explain the difference of moderate deviations between Poisson and normal approximations. More precisely, let $\{\eta_{i}:\ 1\leq i\leq n\}$ be i.i.d. random variables with a continuous cumulative distribution function. As the value of $\eta_{1}$ is always a record, for $2\leq i\leq n$ , we say $\eta_{i}$ is a record if $\eta_{i}>\max_{1\leq j\leq i-1}\eta_{j}$ . We define the indicator random variable

$$I_{i}:={\bf 1}\left[\eta_{i}>\max_{1\leq j\leq i-1}\eta_{j}\right],$$

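that is, $I_{i}=1$ if a new record occurs at time $i$ and $I_{i}=0$ otherwise. Our interest is in the distribution of $S_{n}:=\sum_{i=2}^{n}I_{i}$ , denoted by $\mathscr{L}(S_{n})$ . [Dwass (1960), Rényi (1962)] state that $\operatorname{\mathbb{E}}I_{i}=1/i$ and that $\{I_{i}:\ 2\leq i\leq n\}$ are independent, so

$$\lambda_{n}:=\operatorname{\mathbb{E}}S_{n}=\sum_{i=2}^{n}\frac{1}{i};\quad\sigma_{n}^{2}:=\operatorname{\mathrm{Var}}(S_{n})=\sum_{i=2}^{n}\frac{1}{i}\left(1-\frac{1}{i}\right).$$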

We use ${\rm Pn}(\lambda)$ to stand for the Poisson distribution with mean $\lambda$ , ${\rm Pn}(\lambda)(A):=\mathbb{P}(Y\in A)$ for $Y\sim{\rm Pn}(\lambda)$ , and $N(\mu,\sigma^{2})$ to stand for the normal distribution with mean $\mu$ and variance $\sigma^{2}$ .

Let $v_{n}:=\lambda_{n}+x\cdot\sigma_{n}$ , and consider approximations of $\mathbb{P}(S_{n}\geq v_{n})$ by moderate deviations based on ${\rm Pn}(\lambda_{n})$ [Barbour, Chen & Choi (1995), Chen, Fang & Shao (2013a)] and $N_{n}\sim N(\lambda_{n},\sigma_{n}^{2})$ . For $x=3$ , Figures 1, 2 and 3 are respectively the plots of the ratios $\mathbb{P}(S_{n}\geq v_{n})/{\rm Pn}(\lambda_{n})([v_{n},\infty))$ , $\mathbb{P}(S_{n}\geq v_{n})/\mathbb{P}(N_{n}\geq v_{n})$ and $\mathbb{P}(S_{n}\geq v_{n})/{\rm Pn}(\sigma_{n}^{2})([v_{n},\infty))$ for $n\in[3,10^{5}]$ . As observed in [Borovkov & Pfeifer (1996)], the Poisson and normal approximations to $\mathscr{L}(S_{n})$ are of order $O((\ln n)^{-1})$ and $O((\ln n)^{-1/2})$ respectively, and the numerical studies confirm that approximation by the Poisson distribution is better than that by the normal distribution. In fact, it appears that the speed of convergence of $\mathbb{P}(S_{n}\geq v_{n})/\mathbb{P}(N_{n}\geq v_{n})$ to $1$ as $n\to\infty$ is too slow to be of practical use. In the context of normal approximation to the distribution of integer valued random variables, a common practice is to introduce a 0.5 correction, giving the ratios $\mathbb{P}(S_{n}\geq v_{n})/\mathbb{P}(N_{n}\geq\lceil v_{n}\rceil-0.5)$ , where $\lceil x\rceil$ is the smallest integer that is not less than $x$ . Figure 4 is the plot of these ratios, and we can see that they are still far from the limit of 1. Finally, the difference between Figure 1 and Figure 3 shows that a minor change of the mean of the approximating Poisson can change the quality of moderate deviation approximation significantly, further highlighting the difficulty of obtaining sharp bounds in theoretical studies in the area.
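The comparison in Example 1.1 can be explored numerically. The following Python sketch is illustrative only: it computes the exact law of $S_{n}$ by convolving the independent indicators $I_{i}$ and forms the three ratios at $v_{n}=\lambda_{n}+x\sigma_{n}$ . The function names (`records_pmf`, `poisson_tail`, `tail_ratios`) are hypothetical, and the Poisson and exact tails are evaluated at $\lceil v_{n}\rceil$ because the counts are integer valued.

```python
import math

def records_pmf(n):
    """Exact pmf of S_n = sum_{i=2}^n I_i with independent I_i ~ Bernoulli(1/i)."""
    pmf = [1.0]
    for i in range(2, n + 1):
        p = 1.0 / i
        new = [0.0] * (len(pmf) + 1)
        for s, q in enumerate(pmf):
            new[s] += q * (1.0 - p)   # no record at time i
            new[s + 1] += q * p       # record at time i
        pmf = new
    return pmf

def poisson_tail(lam, k):
    """P(Y >= k) for Y ~ Pn(lam), summing pmf terms upward from k."""
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, j = 0.0, k
    while True:
        total += term
        j += 1
        term *= lam / j
        if term <= 1e-17 * total:
            return total

def tail_ratios(n, x=3.0):
    """Ratios of P(S_n >= v_n) to the three approximating tails."""
    lam = sum(1.0 / i for i in range(2, n + 1))
    var = sum((1.0 / i) * (1.0 - 1.0 / i) for i in range(2, n + 1))
    k = math.ceil(lam + x * math.sqrt(var))
    exact = sum(records_pmf(n)[k:])
    normal = 0.5 * math.erfc(x / math.sqrt(2.0))  # P(N_n >= v_n) = 1 - Phi(x)
    return {
        "poisson_mean": exact / poisson_tail(lam, k),
        "normal": exact / normal,
        "poisson_variance": exact / poisson_tail(var, k),
    }

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, tail_ratios(n))
```

The dynamic programme costs $O(n^{2})$ operations, so this sketch is suited to moderate $n$ rather than the full range $n\leq 10^{5}$ used for the figures.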

Example 1.1 shows that the distribution of the counts of rare events often has a heavier right tail than that of the corresponding normal distribution, so approximations by the moderate deviations in the normal distribution are generally inferior to those by the moderate deviations in the Poisson distribution. The next example says that the parameter of the approximating Poisson distribution suggested in [Chen & Choi (1992), Barbour, Chen & Choi (1995), Chen, Fang & Shao (2013a)] is not optimal and some adjustment can significantly improve the quality of approximations by the moderate deviations in the Poisson distribution.

Example 1.2

With $0<p<1$ , let $W_{n}\sim\text{Bi}(n,p)$ , $Y_{n}\sim{\rm Pn}(np)$ and $Z\sim N(0,1)$ , then for a fixed $x>0$ ,

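$$\lim_{n\to\infty}\frac{\mathbb{P}\left(W_{n}\geq np+x\sqrt{np(1-p)}\right)}{\mathbb{P}\left(Y_{n}\geq np+x\sqrt{np(1-p)}\right)}=\frac{\mathbb{P}(Z\geq x)}{\mathbb{P}(Z\geq x\sqrt{1-p})},$$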

which systematically deviates from 1 as $x$ moves away from 0. The systematic bias can be removed by introducing an adjustment into the approximate models: for a fixed $x>0$ ,

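$$\lim_{n\to\infty}\frac{\mathbb{P}\left(W_{n}\geq np+x\sqrt{np(1-p)}\right)}{\mathbb{P}\left(Y_{n}\geq np+x\sqrt{np}\right)}=1$$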

or equivalently, with $Y_{n}^{\prime}\sim{\rm Pn}(np(1-p))$ ,

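$$\lim_{n\to\infty}\frac{\mathbb{P}\left(W_{n}\geq np+x\sqrt{np(1-p)}\right)}{\mathbb{P}\left(Y_{n}^{\prime}\geq np(1-p)+x\sqrt{np(1-p)}\right)}=1.$$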

Example 1.2 suggests that it is more suitable to approximate the right tail probabilities by looking at the number of standard deviations away from the mean, which is essentially the original idea of the translated (shifted) Poisson approximation [Barbour & Xia (1999), Röllin (2005), Röllin (2007)]. In this paper, we show that it is indeed better to approximate the right tail probabilities via the moderate deviations in the translated Poisson distribution.
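The systematic bias in Example 1.2 and its removal can be seen at finite $n$ . The sketch below is a hedged illustration, not part of the paper: it compares the binomial right tail with a Poisson tail at the same threshold, at a threshold measured in Poisson standard deviations, and with ${\rm Pn}(np(1-p))$ . The function names and the parameter values in the usage line are assumptions chosen for illustration, and real-valued thresholds are rounded up to integers.

```python
import math

def binom_tail(n, p, k):
    """P(W >= k) for W ~ Bi(n, p), summing pmf terms computed in log-space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(max(k, 0), n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_p + (n - j) * log_q)
        total += math.exp(log_pmf)
    return total

def poisson_tail(lam, k):
    """P(Y >= k) for Y ~ Pn(lam), summing pmf terms upward from k."""
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, j = 0.0, k
    while True:
        total += term
        j += 1
        term *= lam / j
        if term <= 1e-17 * total:
            return total

def example_1_2_ratios(n, p, x):
    """Return the unadjusted ratio and the two adjusted ratios of Example 1.2."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    exact = binom_tail(n, p, math.ceil(mean + x * sd))
    # Same threshold for W_n and Y_n ~ Pn(np): biased by the variance mismatch.
    unadjusted = exact / poisson_tail(mean, math.ceil(mean + x * sd))
    # Threshold for Y_n counted in Poisson standard deviations sqrt(np).
    rescaled = exact / poisson_tail(mean, math.ceil(mean + x * math.sqrt(mean)))
    # Y_n' ~ Pn(np(1-p)), whose variance matches that of W_n.
    lam = n * p * (1 - p)
    matched = exact / poisson_tail(lam, math.ceil(lam + x * sd))
    return unadjusted, rescaled, matched

if __name__ == "__main__":
    print(example_1_2_ratios(20000, 0.3, 2.0))
```

For fixed $p$ and $x$ , the first ratio should settle near $\mathbb{P}(Z\geq x)/\mathbb{P}(Z\geq x\sqrt{1-p})$ as $n$ grows, while the other two should approach 1.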

Our approach does not rely on the boundedness of the Radon-Nikodym derivative as in [Chen & Choi (1992), Barbour, Chen & Choi (1995)] or the tacit assumption of well-behaved tail probabilities as in [Chen, Fang & Shao (2013a)], see Remark 2.4 for more details. For the case of Poisson-binomial, we show in Proposition 3.2 that our approach works for the case that the maximum of the success probabilities of the Bernoulli random variables is not small, such as the distribution of the number of records.

The paper is organised as follows. We state the main results in the context of local dependence, size-biased distribution and discrete zero-biased distribution in Section 2. The accuracy of our bounds is illustrated in six examples in Section 3. The proofs of the main results are postponed to Section 4 where we also establish Stein’s factors for Poisson moderate deviations in Lemma 4.1.

2 The main results

In this section, we state three theorems on moderate deviations in Poisson approximation: the first is under a local dependence structure, the second is with respect to the size-biased distribution, and the last is in terms of the discrete zero-biased distribution.

We first consider a class of non-negative integer valued random variables $\{X_{i}:\ i\in{\cal I}\}$ satisfying the local dependent structure (LD2) in [Chen & Shao (2004)] (see also [Arratia, Goldstein & Gordon (1989)] for its origin). For ease of reading, we quote the definition of (LD2) below.

(LD2) For each $i\in{\cal I}$ , there exist $A_{i}\subset B_{i}\subset{\cal I}$ such that $X_{i}$ is independent of $\{X_{j}:\ j\in A_{i}^{c}\}$ and $\{X_{j}:\ j\in A_{i}\}$ is independent of $\{X_{l}:\ l\in B_{i}^{c}\}$ .

We set $W=\sum_{i\in{\cal I}}X_{i}$ , $Z_{i}=\sum_{j\in A_{i}}X_{j}$ , $Z_{i}^{\prime}=\sum_{j\in B_{i}}X_{j}$ , $W_{i}=W-Z_{i}$ and $W_{i}^{\prime}=W-Z_{i}^{\prime}$ . We write

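$$\mu_{i}=\operatorname{\mathbb{E}}(X_{i}),\quad\mu=\operatorname{\mathbb{E}}(W),\quad\sigma^{2}=\operatorname{\mathrm{Var}}(W).$$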

As suggested in Example 1.2, we consider $Y\sim{\rm Pn}(\lambda)$ approximation to $W-a$ with $\left|\lambda-\sigma^{2}\right|$ being not too large and $a=\mu-\lambda$ being an integer so that $k$ in $\mathbb{P}(W-a\geq k)$ and $\mathbb{P}(Y\geq k)$ is in terms of the number of standard deviations of $W$ . In principle, the constant $a$ is chosen to minimise the error of approximation; however, our theory is formulated flexibly enough that other choices of $\lambda$ and $a$ are also acceptable. The three most useful choices of $a$ are $a=0$ , $a=\lfloor\mu-\sigma^{2}\rfloor$ and $a=\lceil\mu-\sigma^{2}\rceil$ , where $\lfloor\cdot\rfloor$ stands for the largest integer in $(-\infty,\cdot]$ .

Theorem 2.1

With the setup in the preceding paragraph, assume that $\{X_{i}:\ i\in{\cal I}\}$ satisfies (LD2) and, for each $i$ , there exists a $\sigma$ -algebra ${\cal F}_{i}$ such that $\{X_{j}:\ j\in B_{i}\}$ is ${\cal F}_{i}$ measurable. Define

$$\theta_{i}:=\operatorname*{ess\,sup}\max_{j}\mathbb{P}(W=j\,|\,{\cal F}_{i}),$$

where $\operatorname*{ess\,sup}V$ is the essential supremum of the random variable $V$ . Then for integer $a<\mu$ , $\lambda=\mu-a$ and positive integer $k>\lambda$ , we have

[TABLE]

where, with $F(j)=\mathbb{P}(Y\leq j)$ , $\overline{F}(j)=\mathbb{P}(Y\geq j)$ ,

[TABLE]

Remark 2.2

Both ${\mathds{C}}_{1}$ and ${\mathds{C}}_{2}$ can be numerically computed in applications and they cannot be improved in general (see the proofs below). They are better than the “naive” counterparts $(1-e^{-\lambda})/(\lambda\mathbb{P}(Y\geq k))$ derived through the total variation bounds in [Barbour & Eagleson (1984), Barbour, Holst & Janson (1992)]. Figure 5 provides details of

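$$\mbox{ratio}_{i}:={\mathds{C}}_{i}(\lambda,k)/[(1-e^{-\lambda})/(\lambda\mathbb{P}(Y\geq k))],\quad i=1,2,$$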

for $\lambda=10$ , $k$ from $10$ to $43$ . We would like to mention that for large $k$ and/or large $\lambda$ , the tail probabilities are so small that the calculation using MATLAB produces unstable results since accumulated computation errors often exceed the tail probabilities, hence more powerful computational tools are needed to achieve the required accuracy or one has to resort to known approximations to the Poisson right tails and point probabilities.
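One way to sidestep the instability mentioned in Remark 2.2 is to work with logarithms of the Poisson tail. The sketch below is an illustration under that assumption, not the computation used for Figure 5: it evaluates $\log\mathbb{P}(Y\geq k)$ by factoring out the point probability at $k$ , and from it the logarithm of the naive factor $(1-e^{-\lambda})/(\lambda\mathbb{P}(Y\geq k))$ . The function names are hypothetical; the constants ${\mathds{C}}_{1}$ and ${\mathds{C}}_{2}$ themselves are not computed here.

```python
import math

def log_poisson_tail(lam, k):
    """log P(Y >= k) for Y ~ Pn(lam), intended for k around or above lam."""
    log_pmf_k = -lam + k * math.log(lam) - math.lgamma(k + 1)
    # P(Y >= k) = pmf(k) * (1 + lam/(k+1) + lam^2/((k+1)(k+2)) + ...)
    series, term, j = 0.0, 1.0, k
    while True:
        series += term
        j += 1
        term *= lam / j
        if term <= 1e-17 * series:
            break
    return log_pmf_k + math.log(series)

def log_naive_factor(lam, k):
    """log of (1 - e^{-lam}) / (lam * P(Y >= k))."""
    return math.log1p(-math.exp(-lam)) - math.log(lam) - log_poisson_tail(lam, k)

if __name__ == "__main__":
    for k in range(10, 44):
        print(k, log_naive_factor(10.0, k))
```

Because only the bounded series factor is accumulated, the result stays finite even when the tail probability itself would underflow in double precision.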

Remark 2.3

Due to the discrete nature of Poisson distribution, it seems impossible to analytically simplify ${\mathds{C}}_{1}$ and ${\mathds{C}}_{2}$ at negligible costs for the diverse range of $k>\lambda$ .

Remark 2.4

If $\lambda$ is chosen reasonably close to $\sigma^{2}$ so that $\lambda-\sigma^{2}$ is bounded, then $\theta_{i}$ in the bound (2.1) converges to [math] when $\sigma^{2}$ converges to $\infty$ . Our bound does not rely on the Radon-Nikodym derivative of $\mathscr{L}(W)$ with respect to ${\rm Pn}(\lambda)$ , which is the crucial ingredient in [Chen & Choi (1992), Barbour, Chen & Choi (1995)]. On the other hand, the tacit assumption of [Chen, Fang & Shao (2013a)] is that $\sup_{\lambda\leq r\leq k}\frac{\mathbb{P}(W\geq r)}{\mathbb{P}(Y\geq r)}$ for $W$ and $Y$ in Theorem 2.1 is well-behaved and this assumption is hard to verify. The bound (2.1), although relatively crude, does not rely on this assumption and covers more general cases.

Corollary 2.5

For the sum of independent non-negative integer valued random variables $W=\sum_{i\in{\cal I}}X_{i}$ , let $\theta_{i}=\max_{j}\mathbb{P}(W-X_{i}=j)$ , $\mu_{i}=\mathbb{E}X_{i}$ , $\mu=\sum_{i\in{\cal I}}\mu_{i}$ , $\sigma^{2}=\operatorname{\mathrm{Var}}(W)$ . For any integer $a<\mu$ , let $\lambda=\mu-a$ , $Y\sim{\rm Pn}(\lambda)$ , then for $k>\lambda$ ,

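$$\frac{\mathbb{P}(W-a\geq k)}{\mathbb{P}(Y\geq k)}-1\leq{\mathds{C}}_{2}(\lambda,k)\sum_{i\in{\cal I}}\theta_{i}\left\{\mu_{i}\left|\operatorname{\mathbb{E}}[X_{i}(X_{i}-\mu_{i})]\right|+\frac{1}{2}\operatorname{\mathbb{E}}\left[|X_{i}-\mu_{i}|X_{i}(X_{i}-1)\right]\right\}+{\mathds{C}}_{1}(\lambda,k)\left|\lambda-\sigma^{2}\right|+\mathbb{P}(W-a<-1).$$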

Remark 2.6

We leave $\mathbb{P}(W-a<-1)$ in the upper bound (2.1) because the current approach cannot remove it from the bound. Nevertheless, it is no more than $1$ and converges to zero exponentially fast with a suitable choice of $a$ . For the sum of independent non-negative integer valued random variables in Corollary 2.5, if $a$ is smaller than $\mu$ by at least a few multiples of $\sigma$ , we can use [Chung & Lu (2006), Theorem 2.7] to obtain

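$$\mathbb{P}(W-a<-1)\leq\exp\left(-\frac{(\mu-a+2)^{2}}{2\sum_{i\in{\cal I}}\operatorname{\mathbb{E}}(X_{i}^{2})}\right).\tag{2.4}$$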

For any non-negative random variable $W$ with mean $\mu\in(0,\infty)$ and distribution $dF(w)$ , the $W$ -size biased distribution [Cochran (1977), Arratia & Goldstein (2010)] is given by

$$dF^{s}(w)=\frac{wdF(w)}{\mu},\quad\mbox{$w\geq 0$,}$$

or equivalently by the characterising equation

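$$\operatorname{\mathbb{E}}[Wg(W)]=\mu\operatorname{\mathbb{E}}g(W^{s})\quad\mbox{for all $g$ with $\operatorname{\mathbb{E}}|Wg(W)|<\infty$.}$$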
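For a non-negative integer valued $W$ , size biasing reweights the point probabilities by $w/\mu$ , so the characterising equation can be checked directly. The following Python sketch is illustrative only, with hypothetical helper names and an arbitrary example pmf. It also checks the standard fact that the size-biased version of a Poisson variable is the same Poisson variable shifted by one, which is why $W+1-W^{s}$ measures closeness to the Poisson law.

```python
import math

def size_biased_pmf(pmf):
    """pmf of W^s on {0, 1, ...}, given the pmf of a non-negative integer W."""
    mu = sum(w * p for w, p in enumerate(pmf))
    return [w * p / mu for w, p in enumerate(pmf)]

def characterising_sides(pmf, g):
    """Return (E[W g(W)], mu * E g(W^s)); the two values should agree."""
    mu = sum(w * p for w, p in enumerate(pmf))
    lhs = sum(w * g(w) * p for w, p in enumerate(pmf))
    rhs = mu * sum(g(w) * q for w, q in enumerate(size_biased_pmf(pmf)))
    return lhs, rhs

def poisson_pmf(lam, size):
    """Point probabilities of Pn(lam) on {0, ..., size - 1}."""
    return [math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
            for j in range(size)]

if __name__ == "__main__":
    example = [0.2, 0.5, 0.2, 0.1]          # arbitrary pmf on {0, 1, 2, 3}
    print(characterising_sides(example, lambda w: 1.0 / (1.0 + w)))
    pois = poisson_pmf(2.0, 80)
    biased = size_biased_pmf(pois)
    # For Poisson, P(W^s = j + 1) should equal P(W = j).
    print(max(abs(biased[j + 1] - pois[j]) for j in range(40)))
```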

Theorem 2.7

Let $W$ be a non-negative integer-valued random variable with mean $\mu$ and variance $\sigma^{2}$ , $a<\mu$ be an integer, $\lambda=\mu-a$ . Then for integer $k>\lambda$ , we have

[TABLE]

where $Y\sim{\rm Pn}(\lambda)$ .

Remark 2.8

Theorem 2.7 improves [Chen, Fang & Shao (2013a), Theorem 3] in a number of ways, with less restrictive conditions and no unspecified constants.

The next theorem is based on the discrete zero-biased distribution defined in [Goldstein & Xia (2006)] and the approach is very similar to that in [Chen, Fang & Shao (2013b)]. For an integer valued random variable $V$ with mean $\mu$ and finite variance $\sigma^{2}$ , we say that $V^{\star}$ has the discrete $V$ -zero biased distribution [Goldstein & Xia (2006), Definition 2.1] if, for all bounded functions $g:\ \mathbb{Z}:=\{0,\pm 1,\pm 2,\dots\}\rightarrow\mathbb{R}$ with $\mathbb{E}|Vg(V)|<\infty$ ,

$$\operatorname{\mathbb{E}}[(V-\mu)g(V)]=\sigma^{2}\operatorname{\mathbb{E}}\Delta g(V^{\star}),$$

where $\Delta f(i):=f(i+1)-f(i)$ .
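For a variable $V$ supported on $\{0,\dots,m\}$ , summation by parts in the defining identity gives the point probabilities $\mathbb{P}(V^{\star}=j)=\sigma^{-2}\sum_{i>j}(i-\mu)\mathbb{P}(V=i)$ for $0\leq j\leq m-1$ . This formula is derived here for illustration and is not quoted from the paper. The Python sketch below, with hypothetical function names, builds this pmf and checks the identity $\operatorname{\mathbb{E}}[(V-\mu)g(V)]=\sigma^{2}\operatorname{\mathbb{E}}\Delta g(V^{\star})$ on an arbitrary example.

```python
import math

def zero_biased_pmf(pmf):
    """pmf of V* on {0, ..., m-1} for V with the given pmf on {0, ..., m}."""
    mu = sum(i * p for i, p in enumerate(pmf))
    var = sum((i - mu) ** 2 * p for i, p in enumerate(pmf))
    out, upper = [], 0.0
    # Accumulate sum_{i >= j} (i - mu) P(V = i), which equals var * P(V* = j - 1).
    for j in range(len(pmf) - 1, 0, -1):
        upper += (j - mu) * pmf[j]
        out.append(upper / var)
    return out[::-1]

def zero_bias_sides(pmf, g):
    """Return (E[(V - mu) g(V)], var * E[g(V* + 1) - g(V*)])."""
    mu = sum(i * p for i, p in enumerate(pmf))
    var = sum((i - mu) ** 2 * p for i, p in enumerate(pmf))
    lhs = sum((i - mu) * g(i) * p for i, p in enumerate(pmf))
    rhs = var * sum(q * (g(j + 1) - g(j))
                    for j, q in enumerate(zero_biased_pmf(pmf)))
    return lhs, rhs

if __name__ == "__main__":
    example = [0.2, 0.5, 0.2, 0.1]          # arbitrary pmf on {0, 1, 2, 3}
    print(zero_biased_pmf(example))
    print(zero_bias_sides(example, math.sin))
```

For a single Bernoulli variable the construction returns a point mass at 0, which is consistent with the representation $W^{\star}=W-X_{I}$ used for Poisson-binomial sums in Section 3.1.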

Theorem 2.9

Let $W$ be a non-negative integer-valued random variable with mean $\mu$ , variance $\sigma^{2}$ , $a<\mu$ be an integer, and $W^{\star}$ have the discrete $W$ -zero biased distribution and be defined on the same probability space as $W$ . Set $R=W^{\star}-W$ and define

$$\theta_{R}=\max_{j}\mathbb{P}(W=j\,|\,R).$$

Then, for integer $k>\lambda$ , with $\lambda=\mu-a>0$ , we have

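$$\frac{\mathbb{P}(W-a\geq k)}{\mathbb{P}(Y\geq k)}-1\leq{\mathds{C}}_{2}(\lambda,k)\sigma^{2}\operatorname{\mathbb{E}}[|R|\theta_{R}]+{\mathds{C}}_{1}(\lambda,k)\left|\lambda-\sigma^{2}\right|\lambda^{-1}+\mathbb{P}(W-a<-1),\tag{2.6}$$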

where $Y\sim{\rm Pn}(\lambda)$ .

3 Examples

As many applications of Poisson approximation rely on size biased distributions, we begin with a review of some facts about size biasing.

Size biasing has been of considerable interest for many decades (see [Barbour, Holst & Janson (1992)], [Ross (2011)], [Arratia, Goldstein & Kochman (2013)] and references therein). In the context of the sum of Bernoulli random variables, its size biasing is particularly simple. More precisely, if $\{X_{i}:\ i\in{\cal I}\}$ is a family of Bernoulli random variables with $\mathbb{P}(X_{i}=1)=p_{i}$ , then the size biased distribution of $W=\sum_{i\in{\cal I}}X_{i}$ is

$$W^{s}=\sum_{j\neq I}X_{j}^{(I)}+1,\tag{3.1}$$

where

$$\mathscr{L}(\{X_{j}^{(i)}:\ j\in{\cal I}\})=\mathscr{L}(\{X_{j}:\ j\in{\cal I}\}\,|\,X_{i}=1),$$

$I$ is a random element independent of $\{\{X_{j}^{(i)}:\ j\in{\cal I}\}:\ i\in{\cal I}\}$ having distribution $\mathbb{P}(I=i)=\frac{p_{i}}{\mathbb{E}W}$ , $i\in{\cal I}$ . Moreover, $\{X_{i}:\ i\in{\cal I}\}$ are said to be negatively related (resp. positively related) [Barbour, Holst & Janson (1992), p. 24] if one can construct $\{\{X_{j}^{(i)}:\ j\in{\cal I}\}:\ i\in{\cal I}\}$ such that $X_{j}^{(i)}\leq$ (resp. $\geq$ ) $X_{j}$ for all $j\neq i$ . When $\{X_{i}:\ i\in{\cal I}\}$ are negatively related, we have

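$$\operatorname{\mathbb{E}}|W+1-W^{s}|=\operatorname{\mathbb{E}}(W+1-W^{s})=\mu^{-1}(\mu-\sigma^{2}),\tag{3.2}$$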

where $\mu=\mathbb{E}W$ and $\sigma^{2}=\operatorname{\mathrm{Var}}(W)$ . On the other hand, if $\{X_{i}:\ i\in{\cal I}\}$ are positively related, then

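$$\begin{aligned}\operatorname{\mathbb{E}}|W+1-W^{s}|&\leq\operatorname{\mathbb{E}}\left\{\sum_{j\neq I}(X_{j}^{(I)}-X_{j})+X_{I}\right\}\\&=\operatorname{\mathbb{E}}(W^{s}-W-1)+2\mu^{-1}\sum_{i\in{\cal I}}p_{i}^{2}\\&=\mu^{-1}(\sigma^{2}-\mu)+2\mu^{-1}\sum_{i\in{\cal I}}p_{i}^{2}.\end{aligned}\tag{3.3}$$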

3.1 Poisson-binomial trials

Let $\{X_{i},\ 1\leq i\leq n\}$ be independent Bernoulli random variables with $\mathbb{P}(X_{i}=1)=p_{i}\in(0,1)$ , $W=\sum_{i=1}^{n}X_{i}$ , $\mu=\mathbb{E}W$ and $\mu_{2}=\sum_{i=1}^{n}p_{i}^{2}$ . When $\tilde{p}:=\max_{1\leq i\leq n}p_{i}\to 0$ , the large deviation of $W$ is investigated in [Chen & Choi (1992), Barbour, Chen & Choi (1995)] with precise asymptotic order. We give two results for this particular case without assuming that $\tilde{p}$ is small: the first is a direct consequence of the general results in Section 2, and the second is based on our approach using a more fine-tuned analysis and well-studied properties of the tail behaviour of $W$ .

Proposition 3.1

Recalling ${\mathds{C}}_{1}$ and ${\mathds{C}}_{2}$ in (2.2) and (2.3), for any integer $k>\mu$ , we have

$$\frac{\mathbb{P}(W\geq k)}{{\rm Pn}(\mu)([k,\infty))}-1\leq{\mathds{C}}_{1}(\mu,k)\mu_{2}\tag{3.4}$$

and, with $a=\lfloor\mu_{2}\rfloor$ and $\lambda:=\mu-a$ ,

[TABLE]

Proof The claim (3.4) is a consequence of Theorem 2.7 with $a=0$ and $\mu\mathbb{E}|W+1-W^{s}|=\sum_{i=1}^{n}p_{i}^{2}$ , as shown in (3.2).

The bound (3.5) is a special case of Corollary 2.5. Since $\mathscr{L}(W_{i})$ is unimodal, [Mattner & Roos (2007), Corollary 1.6] says that

[TABLE]

On the other hand, $\mathbb{E}(X_{i}^{2})=p_{i}$ , hence the upper bound (3.5) is an immediate consequence of Corollary 2.5 and (2.4).

One can also use Theorem 2.9 to obtain the same bound. More precisely, according to the construction of the discrete zero-biased distribution suggested in [Goldstein & Xia (2006)], let $I$ be a random variable independent of $\{X_{i},\leavevmode\nobreak\ 1\leq i\leq n\}$ with distribution $\mathbb{P}(I=i)=p_{i}(1-p_{i})/\sigma^{2}$ for $1\leq i\leq n$ , then we can write $W^{\star}=W-X_{I}$ , giving $R=-X_{I}$ . We then apply (3.6) to bound $\theta_{R}$ as

[TABLE]

and a routine calculation gives $\mathbb{E}|R|=\sum_{i=1}^{n}p_{i}^{2}(1-p_{i})/\sigma^{2}$ , hence (3.5) follows from (2.6) and (2.4).

Proposition 3.2

Define

[TABLE]

then for any integer $k$ with $x:=(k-\mu)/\sqrt{\mu}\geq 1$ , we have

[TABLE]

The proof relies on more information about the solutions of Stein’s equation and is postponed to the end of Section 4. The bound (3.7) improves [Chen, Fang & Shao (2013a), (3.1)] in two respects: it contains no unspecified constants and it does not require $\tilde{p}$ to be small. For the distribution of the number $S_{n}$ of records, the large deviation results in [Barbour, Chen & Choi (1995)] do not apply. However, recalling that $\lambda_{n}=\sum_{i=2}^{n}\frac{1}{i}$ , we apply Proposition 3.2 with the harmonic series $\lambda_{n}=\sum_{i=2}^{n}\frac{1}{i}\geq\ln n+\gamma-1$ and the Riemann zeta function $\sum_{i=2}^{n}\frac{1}{i^{2}}\leq\sum_{i=2}^{\infty}\frac{1}{i^{2}}=\frac{\pi^{2}}{6}-1$ to get the following estimate.

Corollary 3.3

For any integer $k$ with $x:=(k-\lambda_{n})/\sqrt{\lambda_{n}}\geq 1$ , we have

[TABLE]

where $\gamma$ is Euler’s constant.

Remark 3.4

We conjecture that, with $a=\lfloor\mu_{2}\rfloor$ and $\lambda:=\mu-a$ , the bound in (3.5) can be significantly improved and the better estimate is likely dependent on the Radon-Nikodym derivative bound $\sup_{r\geq 0}\frac{\mathbb{P}(W-a=r)}{{\rm Pn}(\lambda)(\{r\})}$ .

3.2 Matching problem

For a fixed $n$ , let $\pi$ be a uniform random permutation of $\{1,\dots,n\}$ , $W=\sum_{j=1}^{n}{\bf 1}_{\{j=\pi(j)\}}$ be the number of fixed points in the permutation.

Proposition 3.5

For the random variable $W$ defined above and any integer $k\geq 2$ , we have

[TABLE]

Proof of Proposition 3.5 In this case, the size-biased distribution $\mathscr{L}(W^{s})$ can be coupled with $W$ as follows [Chatterjee, Diaconis & Meckes (2005)]. Let $I$ be uniformly distributed on $\{1,2,\dots,n\}$ , independent of $\pi$ , and define

[TABLE]

Set $W^{s}=\sum_{j=1}^{n}{\bf 1}_{\{j=\pi^{s}(j)\}}$ , one can easily verify that $W^{s}$ has the size-biased distribution of $W$ . Also, we can check that $\mathbb{E}W=\operatorname{\mathrm{Var}}(W)=1$ , giving $\mathbb{E}W^{s}=2$ . Let $\Delta=W+1-W^{s}$ , using the above construction of $W^{s}$ , we can conclude that $\Delta$ takes values in $\{-1,0,1\}$ and $\mathbb{P}(\Delta=1|W)=W/n$ . Since $\mathbb{E}\Delta=0,$ we have $\mathbb{P}(\Delta=1)=\mathbb{P}(\Delta=-1)$ , and $\mathbb{E}|\Delta|=2/n$ . On the other hand, $\lambda=\mu$ allows us to get rid of the second term in (2.5). By Theorem 2.7 with $a=0$ , $\lambda=\mu=1$ , the claim follows.
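The moment facts used in this proof, $\mathbb{E}W=\operatorname{\mathrm{Var}}(W)=1$ and $\mathbb{E}W^{s}=2$ , can be confirmed for small $n$ by brute-force enumeration. The Python sketch below is an illustration with hypothetical function names; it uses exact rational arithmetic and obtains $\mathbb{E}W^{s}$ from the size-biased weights $w\,\mathbb{P}(W=w)/\mu$ rather than from the coupling in the proof.

```python
from fractions import Fraction
from itertools import permutations

def fixed_point_pmf(n):
    """Exact pmf of the number of fixed points of a uniform permutation of n items."""
    counts = [0] * (n + 1)
    total = 0
    for perm in permutations(range(n)):
        fixed = sum(1 for j, image in enumerate(perm) if j == image)
        counts[fixed] += 1
        total += 1
    return [Fraction(c, total) for c in counts]

def mean_and_variance(pmf):
    """Exact mean and variance of a pmf on {0, 1, ...}."""
    mean = sum(w * p for w, p in enumerate(pmf))
    second = sum(w * w * p for w, p in enumerate(pmf))
    return mean, second - mean ** 2

def size_biased_mean(pmf):
    """E W^s = E(W^2) / E(W), from the size-biased weights w * P(W = w) / mu."""
    mean = sum(w * p for w, p in enumerate(pmf))
    return sum(w * (w * p / mean) for w, p in enumerate(pmf))

if __name__ == "__main__":
    pmf = fixed_point_pmf(6)
    print(mean_and_variance(pmf), size_biased_mean(pmf))
```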

Remark 3.6

The bound (3.8) contains no unknown constants and improves the bound of [Chen, Fang & Shao (2013a), §3.3].

3.3 Occupancy Problem

The occupancy problem has a long history dating back to the early development of probability theory. General references on this subject can be found in classics, e.g., [Feller (1968), Vol 1, Chapter 2] and [Barbour, Holst & Janson (1992), Chapter 6].

The occupancy problem can be formulated as follows. Let $l$ balls be thrown independently of each other into $n$ boxes uniformly. Let $X_{i}$ be the indicator variable of the event that the $i$ -th box is empty, so the number of empty boxes can be written as $W=\sum_{i=1}^{n}X_{i}$ . Noting that $p:=\mathbb{E}X_{i}=\left(1-\frac{1}{n}\right)^{l}$ , direct computation gives

[TABLE]
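The first two moments of the number of empty boxes can be checked by enumeration for small $n$ and $l$ . In the Python sketch below, the closed forms $\mu=np$ and $\sigma^{2}=np(1-p)+n(n-1)\left[(1-2/n)^{l}-p^{2}\right]$ follow from the standard observation that two given boxes are both empty with probability $(1-2/n)^{l}$ ; they are supplied here for the check and are not quoted from the paper. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def empty_box_moments(n, l):
    """Exact mean and variance of the number of empty boxes, by enumeration."""
    total = first = second = 0
    for throw in product(range(n), repeat=l):   # box chosen by each ball
        empty = n - len(set(throw))
        total += 1
        first += empty
        second += empty * empty
    mean = Fraction(first, total)
    return mean, Fraction(second, total) - mean ** 2

def closed_form_moments(n, l):
    """Mean and variance from P(box empty) and P(two given boxes both empty)."""
    p = Fraction(n - 1, n) ** l
    q = Fraction(n - 2, n) ** l
    mean = n * p
    var = n * p * (1 - p) + n * (n - 1) * (q - p * p)
    return mean, var

if __name__ == "__main__":
    print(empty_box_moments(4, 5))
    print(closed_form_moments(4, 5))
```

In these examples the variance is smaller than the mean, in line with the negative relation among the indicators used in the proof of Proposition 3.7.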

Proposition 3.7

For the random variable $W$ defined above and any integer $k>\mu$ , we have

[TABLE]

where $Y\sim{\rm Pn}(\mu)$ .

Proof of Proposition 3.7 For the sake of completeness, we provide the following proof which is essentially a repeat of [Barbour, Holst & Janson (1992), p. 23]. From the construction of $W$ -size biased distribution in (3.1), we can construct a coupling as follows. Let $I$ be uniform on $\{1,\dots,n\}$ , that is, we randomly pick one box with equal probability. If the selected box is not empty, we redistribute all balls in the box randomly into the other $n-1$ boxes with equal probability $1/(n-1)$ . Define $X_{j}^{(i)}$ as the indicator of the event that the box being selected is $i$ , and after the redistribution, box $j$ is empty. With this coupling in mind, one can verify that $\{X_{i}\}$ is negatively related so it follows from (3.2) that

[TABLE]

Now, applying Theorem 2.7 with $a=0$ yields (3.9).

3.4 Birthday problem

The classical birthday problem is essentially a variant of the occupancy problem. For this reason, we throw $l$ balls independently and equally likely into $n$ boxes and let $X_{ij}$ be the indicator random variable of the event that ball $i$ and ball $j$ fall into the same box. The number of pairs of balls going into the same boxes (i.e., the number of pairs of people having the same birthdays) can be written as $W=\sum_{i<j}X_{ij}$ . Define $p=\mathbb{E}X_{ij}=\frac{1}{n}$ , so $\mu=\mathbb{E}W={l\choose 2}p.$ [Chatterjee, Diaconis & Meckes (2005)] give the following construction of $W^{s}$ : label the balls from $1$ to $l$ , randomly choose two balls $J_{1}$ and $J_{2}$ and move ball $J_{1}$ into the box that $J_{2}$ is in, then $W$ is the number of pairs of balls before the move while $W^{s}$ is the number of pairs of balls after the move. Let $E$ be the event that $J_{1}$ and $J_{2}$ are from the same box. When $E$ occurs, $W^{s}=W$ so $|W+1-W^{s}|=1$ ; otherwise, $J_{1}$ and $J_{2}$ are from different boxes with $B_{1}$ and $B_{2}$ balls respectively, giving

[TABLE]

Hence,

[TABLE]

This, together with Theorem 2.7 and $a=0$ , gives the following Proposition.

Proposition 3.8

For the random variable $W$ defined above and any integer $k>\mu$ , we have

[TABLE]

where $Y\sim{\rm Pn}(\mu)$ .

3.5 Triangles in the Erdős-Rényi random graph

Let $G=G(n,p)$ be an Erdős-Rényi random graph on $n$ vertices with edge probability $p$ . Let $K_{n}$ be the complete graph on $n$ vertices, and $\Gamma$ be the set of all triangles in $K_{n}$ . For $\alpha\in\Gamma$ , let $X_{\alpha}$ be the indicator that there is a triangle in $G$ at $\alpha$ , i.e.

[TABLE]

Therefore the number of triangles in $G$ can be represented as $W=\sum_{\alpha\in\Gamma}X_{\alpha}$ . Clearly, $X_{\alpha}$ is independent of $X_{\beta}$ if $\alpha$ and $\beta$ don’t share a common edge. By analysing the numbers of shared edges, we obtain (see [Ross (2011), p. 255])

[TABLE]

Proposition 3.9

For the random variable $W$ defined above and any integer $k>\mu$ , we have

[TABLE]

where $Y\sim{\rm Pn}(\mu)$ .

Proof of Proposition 3.9 The following proof is a special version of the general argument in [Barbour, Holst & Janson (1992), p. 89]. Since $X_{\alpha}$ and $X_{\beta}$ are independent if $\alpha$ and $\beta$ have no common edges, a size biased distribution of $W$ can be constructed as follows. Let

[TABLE]

then $\mathscr{L}(\{X_{\beta}^{(\alpha)},\beta\neq\alpha\})=\mathscr{L}(\{X_{\beta},\beta\neq\alpha\}|X_{\alpha}=1)$ . Here the union of graphs is in the sense of set operation of their vertices and edges. Let $I$ be a random element taking values in $\Gamma$ with equal probability and be independent of $\mathscr{L}(\{X_{\beta}^{(\alpha)},\alpha,\beta\})$ , then we can write $W^{s}=\sum_{\beta\neq I}X_{\beta}^{(I)}+1$ . Because $X_{\beta}^{(\alpha)}\geq X_{\beta}$ for all $\beta\in\Gamma$ , (3.3) implies

[TABLE]

The claim follows from Theorem 2.7 with $a=0$ .

Remark 3.10

Since $\mu={n\choose 3}p^{3}$ , if $p=O(1/n)$ , then the error bound (3.10) is of the same order $O(1/n)$ .
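To make the quantities in this subsection concrete, here is a small Python sketch that samples $G(n,p)$, counts triangles by brute force, and evaluates $\mu={n\choose 3}p^{3}$. The helper names are hypothetical, and the brute-force count is only practical for small $n$. With $p=c/n$ the mean stays bounded (it tends to $c^{3}/6$), which is the regime considered in Remark 3.10.

```python
import math
import random
from itertools import combinations

def sample_graph(n, p, rng):
    """Edge set of G(n, p): each pair (a, b), a < b, is kept with probability p."""
    return {e for e in combinations(range(n), 2) if rng.random() < p}

def count_triangles(n, edges):
    """W: number of vertex triples whose three edges are all present."""
    return sum(
        1
        for a, b, c in combinations(range(n), 3)
        if (a, b) in edges and (a, c) in edges and (b, c) in edges
    )

def mean_triangles(n, p):
    """mu = C(n, 3) * p^3."""
    return math.comb(n, 3) * p ** 3

rng = random.Random(1)          # seed chosen arbitrarily
n, c = 40, 2.0                  # illustrative values
p = c / n
counts = [count_triangles(n, sample_graph(n, p, rng)) for _ in range(200)]
print("empirical mean:", sum(counts) / len(counts), "mu:", mean_triangles(n, p))
```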

3.6 2-runs

Let $\{\xi_{1},\dots,\xi_{n}\}$ be i.i.d. ${\rm Bernoulli}(p)$ random variables with $n\geq 9$, $p<2/3$. For each $1\leq i\leq n$, define $X_{i}=\xi_{i}\xi_{i+1}$ and, to avoid edge effects, we define $\xi_{j+n}=\xi_{j}$ for $-3\leq j\leq n$. The number of $2$-runs in the Bernoulli sequence is defined as $W=\sum_{i=1}^{n}X_{i}$, so that the mean is $\mu=np^{2}$ and the variance is $\sigma^{2}=np^{2}(1-p)(3p+1)$.
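The stated mean and variance can be checked by exhaustive enumeration for small $n$. The Python sketch below does this for the circular sequence and also evaluates the shift $a=\lfloor np^{3}(3p-2)\rfloor$ and $\lambda=\mu-a$ used in Proposition 3.11; note that $\mu-\sigma^{2}=np^{3}(3p-2)$, so $a$ is the integer part of the mean-variance gap. The function names and the values $n=9$, $p=0.3$ are assumptions for illustration.

```python
import math
from itertools import product

def two_runs(xi):
    """W = sum_i xi_i * xi_{i+1}, with indices taken modulo n (circular)."""
    n = len(xi)
    return sum(xi[i] * xi[(i + 1) % n] for i in range(n))

def formula_moments(n, p):
    """mu, sigma^2, the shift a and lambda = mu - a, as defined in the text."""
    mu = n * p ** 2
    sigma2 = n * p ** 2 * (1 - p) * (3 * p + 1)
    a = math.floor(n * p ** 3 * (3 * p - 2))
    return mu, sigma2, a, mu - a

def exact_moments(n, p):
    """E W and Var W by enumerating all 2^n sequences (small n only)."""
    m1 = m2 = 0.0
    for xi in product((0, 1), repeat=n):
        ones = sum(xi)
        prob = p ** ones * (1 - p) ** (n - ones)
        w = two_runs(xi)
        m1 += prob * w
        m2 += prob * w * w
    return m1, m2 - m1 ** 2

n, p = 9, 0.3                   # illustrative values with n >= 9, p < 2/3
print(formula_moments(n, p))
print(exact_moments(n, p))
```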

Proposition 3.11

For any integer $k>\mu$ ,

[TABLE]

With $a:=\lfloor np^{3}(3p-2)\rfloor$ , $\lambda=\mu-a$ , then for any integer $k>\lambda$ ,

[TABLE]

Proof For (3.11), we apply Theorem 2.7 with $a=0$ ,

[TABLE]

$I$ a uniform random variable on $\{1,\dots,n\}$ independent of $\{X_{j}^{(i)}\}$ , and

[TABLE]

giving

[TABLE]

Apropos of (3.12), we make use of Theorem 2.1. To this end, let $A_{i}=\{i-1,i,i+1\}$ , $B_{i}=\{i-2,i-1,i,i+1,i+2\}$ , ${\cal F}_{i}=\sigma\{\xi_{j}:\ i-2\leq j\leq i+3\}$ , then [Barbour & Xia (1999), Lemma 5.1] with $\alpha_{j}=0$ or $1$ for $j=i-2,\cdots,i+5$ gives

[TABLE]

On the other hand, $\mathbb{E}(Z_{i}^{\prime})=5p^{2}$ , $|\mathbb{E}((X_{i}-\mu_{i})Z_{i})|\leq\mathbb{E}(Z_{i})=3p^{2}$ ,

[TABLE]

and $|\lambda-\sigma^{2}|\lambda^{-1}\leq 1\wedge(\lambda^{-1})$ , $a=\lfloor np^{3}(3p-2)\rfloor\leq 0$ , $\lambda\geq\sigma^{2}$ , hence $\mathbb{P}(W-a<-1)=0$ and (3.12) follows from Theorem 2.1 by collecting these terms.

4 The proofs of the main results

The celebrated Stein-Chen method [Chen (1975)] is based on the observation that a non-negative random variable $Y\sim{\rm Pn}(\lambda)$ if and only if $\mathbb{E}[\lambda f(Y+1)-Yf(Y)]=0$ for all bounded functions $f:\leavevmode\nobreak\ \mathbb{Z}_{+}:=\{0,1,2,\dots\}\rightarrow\mathbb{R}$ , leading to a Stein identity for Poisson approximation as

[TABLE]

where ${\rm Pn}(\lambda)\{h\}:=\mathbb{E}h(Y)$. Since $f(0)$ plays no role in Stein’s equation, we set $f(0)=f(1)$ and $f(j)=0$ for $j<0$. The following lemma plays a key role in the proofs of the main results: it enables us to circumvent checking the moment condition (1.1), which seems to be inevitable in the existing procedures for proving moderate deviation theorems.
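The characterisation $\mathbb{E}[\lambda f(Y+1)-Yf(Y)]=0$ is easy to check numerically. The following Python sketch evaluates the functional for a truncated Poisson law and, for contrast, for a binomial law with the same mean, where it no longer vanishes. The helper names, the truncation point and the test functions are assumptions chosen for illustration.

```python
import math

def poisson_pmf_list(lam, cutoff=200):
    """Probabilities P(Y = y), y = 0..cutoff-1, for Y ~ Pn(lam)."""
    return [math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
            for y in range(cutoff)]

def binomial_pmf_list(n, p):
    """Probabilities P(W = w), w = 0..n, for W ~ Bin(n, p)."""
    return [math.comb(n, w) * p ** w * (1 - p) ** (n - w) for w in range(n + 1)]

def stein_functional(pmf, lam, f):
    """E[lam * f(W + 1) - W * f(W)] for W with the given pmf list."""
    return sum(q * (lam * f(w + 1) - w * f(w)) for w, q in enumerate(pmf))

lam = 2.5
capped = lambda y: min(y, 50)    # bounded test function
print(stein_functional(poisson_pmf_list(lam), lam, math.sin))
print(stein_functional(poisson_pmf_list(lam), lam, capped))
# For W ~ Bin(10, 0.3) and f(y) = y on the support, the functional equals n p^2.
print(stein_functional(binomial_pmf_list(10, 0.3), 3.0, capped))
```

For the binomial case with $f(y)=y$ on the support, the functional equals $\lambda(\lambda+1)-\mathbb{E}W^{2}=np^{2}$, which is exactly the gap between mean and variance.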

Lemma 4.1

For fixed $k\in\mathbb{Z}_{+}$ , let $h={\bf 1}_{[k,\infty)}$ . With $\pi_{\cdot}={\rm Pn}(\lambda)(\{\cdot\})$ , $\Delta f(i)=f(i+1)-f(i)$ and $\Delta^{2}f=\Delta(\Delta f)$ , the solution $f:=f_{h}$ of the Stein equation (4.1) has the following properties:

(i) $\|f\|:=\sup_{i\in\mathbb{Z}_{+}}|f(i)|={\mathds{C}}_{0}(\lambda,k){\rm Pn}(\lambda)\{h\}$ , where ${\mathds{C}}_{0}(\lambda,k):=\frac{F(k-1)}{k\pi_{k}}$ ;

(ii) $\Delta f(i)$ is negative and decreasing in $i\leq k-1$ ; and positive and decreasing in $i\geq k$ ;

(iii) $\|\Delta f\|_{k-}:=\sup_{i\leq k-1}|\Delta f(i)|={\mathds{C}}_{1-}(\lambda,k){\rm Pn}(\lambda)\{h\}$ and $\|\Delta f\|_{k+}:=\sup_{i\geq k}|\Delta f(i)|={\mathds{C}}_{1+}(\lambda,k){\rm Pn}(\lambda)\{h\}$ , where

[TABLE]

and

[TABLE]

(iv) $\|\Delta f\|:=\sup_{i\in\mathbb{Z}_{+}}|\Delta f(i)|={\mathds{C}}_{1}(\lambda,k){\rm Pn}(\lambda)\{h\}$ and $\|\Delta^{2}f\|:=\sup_{i\in\mathbb{Z}_{+}}|\Delta^{2}f(i)|={\mathds{C}}_{2}(\lambda,k){\rm Pn}(\lambda)\{h\}$ ;

where ${\mathds{C}}_{1}$ and ${\mathds{C}}_{2}$ are defined in (2.2) and (2.3).
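Parts (i) and (ii) of Lemma 4.1 can be sanity-checked numerically. The Python sketch below is written under two assumptions that are not visible in this section: that the Stein equation (4.1) has the standard form $\lambda f(i+1)-if(i)=h(i)-{\rm Pn}(\lambda)\{h\}$, and that $F$ denotes the distribution function of ${\rm Pn}(\lambda)$. It uses the closed form of the solution for $h={\bf 1}_{[k,\infty)}$, namely $f(i)=-\pi(A)F(i-1)/(i\pi_{i})$ for $1\leq i\leq k$ and $f(i)=-F(k-1)\mathbb{P}(Y\geq i)/(i\pi_{i})$ for $i>k$, where $\pi(A)=\mathbb{P}(Y\geq k)$. All function names and the values $\lambda=3$, $k=6$ are illustrative.

```python
import math

def pmf(lam, i):
    """pi_i = P(Y = i) for Y ~ Pn(lam)."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

def upper_tail(lam, i, terms=400):
    """P(Y >= i), summed term by term to avoid cancellation."""
    term, total = pmf(lam, i), 0.0
    for j in range(i, i + terms):
        total += term
        term *= lam / (j + 1)
    return total

def stein_solution(lam, k, imax):
    """f(0..imax) for h = 1_[k, inf), with the convention f(0) = f(1)."""
    tail_k = upper_tail(lam, k)          # Pn(lam){h}
    cdf_km1 = 1.0 - tail_k               # F(k - 1)
    f = [0.0] * (imax + 1)
    cdf = 0.0                            # running value of F(i - 1)
    for i in range(1, imax + 1):
        cdf += pmf(lam, i - 1)
        if i <= k:
            f[i] = -tail_k * cdf / (i * pmf(lam, i))
        else:
            f[i] = -cdf_km1 * upper_tail(lam, i) / (i * pmf(lam, i))
    f[0] = f[1]
    return f, tail_k, cdf_km1

lam, k, imax = 3.0, 6, 30
f, tail_k, cdf_km1 = stein_solution(lam, k, imax)
d1 = [f[i + 1] - f[i] for i in range(imax)]          # Delta f(i)
d2 = [d1[i + 1] - d1[i] for i in range(imax - 1)]    # Delta^2 f(i)
residual = max(abs(lam * f[i + 1] - i * f[i] - ((i >= k) - tail_k))
               for i in range(imax))
print("Stein equation residual:", residual)
print("sup|f|:", max(map(abs, f)),
      "C_0 * Pn{h}:", cdf_km1 / (k * pmf(lam, k)) * tail_k)
print("sign pattern (ii):", all(d <= 0 for d in d1[:k]) and all(d > 0 for d in d1[k:]))
print("monotonicity (ii):", all(d <= 0 for d in d2[:k - 1] + d2[k:]))
```

The only index at which $\Delta^{2}f$ is positive is $i=k-1$, which is the point where $\Delta f$ jumps from negative to positive values.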

For $k>\lambda$ , death rates are bigger than the birth rate, so it seems intuitively obvious that $\tau_{k}^{-}$ is stochastically less than or equal to $\tau_{k-2}^{+}$ for such $k$ . In view of representation (4.11) and $f(k)<0$ as shown in (4.6), this is equivalent to ${\mathds{C}}_{1-}(\lambda,k)>{\mathds{C}}_{1+}(\lambda,k)$ , leading to the following conjecture.

Conjecture 4.2

We conjecture that ${\mathds{C}}_{1-}(\lambda,k)>{\mathds{C}}_{1+}(\lambda,k)$ for all $k>\lambda$ and the gap increases exponentially as a function of $k-\lambda$.

Proof of Lemma 4.1 We build our argument on the birth-death process representation of the solution

[TABLE]

where $Z_{n}(t)$ is a birth-death process with birth rate $\lambda$ , unit per capita death rate and initial state $Z_{n}(0)=n$ [Barbour (1988), Barbour & Brown (1992), Brown & Xia (2001)]. For convenience, we adopt the notation in [Brown & Xia (2001)]: for $i,j\in\mathbb{Z}_{+}$ , define

[TABLE]

and

[TABLE]

Applying Lemmas 2.1 and 2.2 of [Brown & Xia (2001)] with birth rate $\lambda$ , death rate $\beta_{i}=i$ , $A:=[k,\infty)$ and $\pi(\cdot)=\sum_{l\in\cdot}\pi_{l}$ , we have

[TABLE]

and for $j\in\mathbb{Z}_{+}$ ,

[TABLE]

where, as in Theorem 2.1,

[TABLE]

One can easily simplify (4.3) to get

[TABLE]

which, together with (4.4) and the balance equations

[TABLE]

implies

[TABLE]

It follows from [Brown & Xia (2001), Lemma 2.4] that for $i\geq 1$ ,

[TABLE]

which, together with (4.7), ensures

[TABLE]

hence, $f(k)\leq f(i)\leq 0$ and combining (4.4), (4.5) and (4.6) gives $\|f\|=|f(k)|=\frac{F(k-1)}{k\pi_{k}}\pi(A),$ as claimed in (i).

Apropos of (ii), because of (4.9) and (4.10), it remains to show that $\Delta f$ is decreasing in the two ranges. To this end, we will mainly rely on the properties of the solution (4.2). Let $T$ be an exponential random variable with mean $1$, independent of the birth-death process $Z_{i-1}$; then $Z_{i}$ can be represented as

[TABLE]

hence we obtain from (4.2) and the strong Markov property in the second last equality that

[TABLE]

This enables us to give another representation of (4.8) as

[TABLE]

and so

[TABLE]

For $i\geq k$ , using the strong Markov property again in the equalities below, we have

[TABLE]

where the inequality follows from

[TABLE]

Similarly, for $i\leq k-2$ , $\tau_{i-1,i}$ is stochastically less than or equal to $\tau_{i,i+1}$ , so

[TABLE]

Hence, $\Delta^{2}f(i)\leq 0$ for $i\geq k$ and $i\leq k-2$ , which concludes the proof of (ii).

In terms of (iii), we use (ii) to obtain

[TABLE]

Likewise,

[TABLE]

Since (iv) is clearly an immediate consequence of (iii), (2.2) and (2.3), the proof of Lemma 4.1 is complete.
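The birth-death process underlying the proof above (birth rate $\lambda$, unit per capita death rate) has ${\rm Pn}(\lambda)$ as its stationary distribution, so its long-run time average should be close to $\lambda$ whatever the initial state. The following Python sketch simulates the process event by event and reports that time average; it is a minimal illustration with an arbitrarily chosen seed, horizon and initial state, not part of the argument.

```python
import random

def time_average(lam, start, horizon, seed=0):
    """Time average of Z(t) on [0, horizon] for the birth-death process
    with birth rate lam and death rate equal to the current state."""
    rng = random.Random(seed)
    t, state, area = 0.0, start, 0.0
    while True:
        rate = lam + state                  # total jump rate out of `state`
        hold = rng.expovariate(rate)        # exponential holding time
        if t + hold >= horizon:
            area += state * (horizon - t)   # truncate the last sojourn
            break
        area += state * hold
        t += hold
        # Birth with probability lam / rate, otherwise a death.
        state += 1 if rng.random() < lam / rate else -1
    return area / horizon

print(time_average(2.0, start=10, horizon=5000.0))
```

Starting far from equilibrium (here at state $10$ with $\lambda=2$) only adds a transient of order $1/\text{horizon}$ to the average, since the process forgets its initial state at an exponential rate.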

Proof of Theorem 2.1 As in the proof of Lemma 4.1, we set $A=[k,\infty)$ and $h={\bf 1}_{A}$ , then

[TABLE]

Define

[TABLE]

then it follows from (4.1) that

[TABLE]

For the estimate of $e_{1}$ , from $f(0)=f(1)$ , we know that $\lambda f(0)=-{\rm Pn}(\lambda)\{h\}$ , thus

[TABLE]

which gives

[TABLE]

For the estimate of $e_{2}$ , denoting ${\tilde{f}}(j):=f(j-a)$ , we have

[TABLE]

Using Lemma 4.1 (ii), we see that $\Delta^{2}{\tilde{f}}(m)$ is negative for all $m$ except $m=a+k-1$, which implies $-\sum_{m\neq k-1}\Delta^{2}f(m)\leq\Delta^{2}f(k-1)=\|\Delta^{2}f\|$ and

[TABLE]

and

[TABLE]

hence

[TABLE]

By taking

[TABLE]

we have from (4.14) that

[TABLE]

where the third last equality is because $\sum_{i\in{\cal I}}\mathbb{E}[(X_{i}-\mu_{i})Z_{i}]=\sigma^{2}$ and $(X_{i},Z_{i})$ is independent of $W_{i}^{\prime}$ , and the last equality is due to the assumption that $\{X_{j}:\ j\in B_{i}\}$ is ${\cal F}_{i}$ measurable. Using (4.15) in (4.16), we obtain

[TABLE]

Now, combining Lemma 4.1 (iii), (iv), (4.12), (4.13) and (4.17) gives (2.1).

Proof of Corollary 2.5 Under the setting of the local dependence, the claim follows from Theorem 2.1 by taking $Z_{i}=Z_{i}^{\prime}=X_{i}$ .

Proof of Theorem 2.7 Recalling the Stein representation (4.12) and the estimate (4.13), it remains to tackle (4.14). However,

[TABLE]

thus

[TABLE]

Hence, combining (4.12), (4.13) and (4) completes the proof.

Proof of Theorem 2.9 Again, we make use of the Stein representation (4.12) and the estimate (4.13) so that it suffices to deal with (4.14). To this end, we have

[TABLE]

However, with $R=W^{\star}-W$ ,

[TABLE]

and a similar argument for (4.15) ensures

[TABLE]

hence

[TABLE]

The claim follows from combining (4.12), (4.13) and (4.19) and using Lemma 4.1 (iii), (iv).

Proof of Proposition 3.2 The first inequality of (3.7) is a direct consequence of [Hoeffding (1956)]. For the second inequality, let $h={\bf 1}_{[k,\infty)}$ and $f$ be the solution of the Stein identity (4.1) with $\lambda=\mu$, and set $W_{i}=W-X_{i}$, $Y\sim{\rm Pn}(\mu)$. The following argument is standard (see [Barbour, Holst & Janson (1992), p. 6]); we repeat it for ease of reading:

[TABLE]

For any non-negative integer valued random variable $U$ such that the following expectations exist, the summation by parts gives

[TABLE]

On the other hand, [Barbour, Chen & Choi (1995), Proposition 2.1] ensures

[TABLE]

so using Lemma 4.1 (ii), we have

[TABLE]

However, by (4.2), since ${\rm Pn}(\mu)$ is the stationary distribution of $Z_{i}$ , $Z_{Y}(t)\sim{\rm Pn}(\mu)$ , leading to

[TABLE]

where $T_{1},T_{2}$ are i.i.d. $\exp(1)$ random variables independent of $Y$ . Combining (4.20), (4.21) and (4.22), we have

[TABLE]

For the first term of (4.23), using [Barbour, Holst & Janson (1992), Proposition A.2.1 (ii)], we obtain

[TABLE]

For the second term of (4.23), we use the crude estimate of $\Delta^{2}f(k-1)\leq 2\|\Delta f\|\leq 2(1-e^{-\mu})/\mu$ (see [Barbour, Holst & Janson (1992), Lemma 1.1.1] or Remark 2.2), so applying [Barbour, Holst & Janson (1992), Proposition A.2.1 (ii)] again,

[TABLE]

The bound (3.7) follows by collecting (4.23), (4.24) and (4.25).

Acknowledgements We thank the anonymous referees for suggesting the “naive bound” in Remark 2.2 and comments leading to the improved version of the paper. We also thank Serguei Novak for email discussions about the quality of the bounds presented in the paper versus the “naive bound”.

Bibliography


[Arratia & Goldstein (2010)] Arratia, R. & Goldstein, L. (2010). Size bias, sampling, the waiting time paradox, and infinite divisibility: when is the increment independent? arXiv:1007.3910.
[Arratia, Goldstein & Gordon (1989)] Arratia, R., Goldstein, L. & Gordon, L. (1989). Two moments suffice for Poisson approximations: The Chen-Stein method. Ann. Probab. 17, 9–25.
[Arratia, Goldstein & Kochman (2013)] Arratia, R., Goldstein, L. & Kochman, F. (2013). Size bias for one and all. arXiv preprint arXiv:1308.2729.
[Barbour (1988)] Barbour, A. D. (1988). Stein’s method and Poisson process convergence. J. Appl. Probab. 25 (A), 175–184.
[Barbour & Brown (1992)] Barbour, A. D. & Brown, T. C. (1992). Stein’s method and point process approximation. Stochastic Process. Appl. 43, 9–31.
[Barbour, Chen & Choi (1995)] Barbour, A. D., Chen, L. H. Y. & Choi, K. P. (1995). Poisson approximation for unbounded functions, I: Independent summands. Statist. Sinica 2, 749–766.
[Barbour & Eagleson (1984)] Barbour, A. D. & Eagleson, G. K. (1984). Poisson Convergence for Dissociated Statistics. Journal of the Royal Statistical Society. Series B (Methodological) 46, 397–402.
[Barbour, Holst & Janson (1992)] Barbour, A. D., Holst, L. & Janson, S. (1992). Poisson Approximation. Oxford Univ. Press.