Non-uniform Bounds in the Poisson Approximation with Applications to Informational Distances. I

S.G. Bobkov; G.P. Chistyakov; F. Götze

arXiv:1906.09152·math.PR·August 13, 2019·IEEE Trans. Inf. Theory


TL;DR

This paper derives asymptotically optimal bounds on the deviation of Bernoulli convolutions from the Poisson limit, measured in the Shannon relative entropy and the Pearson $\chi^{2}$-distance, based on non-uniform density estimates.

Contribution

It introduces new non-uniform bounds for deviations of Bernoulli convolutions from the Poisson limit in terms of informational distances.

Findings

01. Established asymptotically optimal bounds for deviations

02. Applied bounds to non-homogeneous Bernoulli models

03. Enhanced understanding of informational distances in Poisson approximation

Abstract

We explore asymptotically optimal bounds for deviations of Bernoulli convolutions from the Poisson limit in terms of the Shannon relative entropy and the Pearson $\chi^{2}$-distance. The results are based on proper non-uniform estimates for densities. They deal with models of non-homogeneous, non-degenerate Bernoulli distributions.

Equations

${\mathbb{P}}\{W=k\}=\sum p_{1}^{\varepsilon_{1}}q_{1}^{1-\varepsilon_{1}}\dots p_{n}^{\varepsilon_{n}}q_{n}^{1-\varepsilon_{n}},$

$\lambda=p_{1}+\dots+p_{n},$

$v_{k}={\mathbb{P}}\{Z=k\}=\frac{\lambda^{k}}{k!}\,e^{-\lambda},\quad k=0,1,\dots$

$d(W,Z)$

$\lambda_{2}=p_{1}^{2}+\dots+p_{n}^{2}.$

$\frac{1}{32}\,\min(1,1/\lambda)\,\lambda_{2}\leq\frac{1}{2}\,d(W,Z)\leq\frac{1-e^{-\lambda}}{\lambda}\,\lambda_{2}.$

$d(W,Z)\leq 2\lambda_{2}.$

$d(W,Z)\leq\frac{2.1}{\lambda}\,\lambda_{2}$ if $\max_{j\leq n}p_{j}\leq\frac{1}{4}$, respectively $d(W,Z)\leq\frac{10}{\lambda}\,\lambda_{2}.$

$D(W||Z)=\sum_{k=0}^{\infty}w_{k}\log\frac{w_{k}}{v_{k}},$

$\frac{-\log(1-p)-p}{2}-\frac{14p^{2}}{n(1-p)^{3}}$

$D(W||Z)=\frac{\lambda^{2}}{4n^{2}}+O(1/n^{3})$ as $n\to\infty.$

$D(W||Z)\leq\frac{1}{\lambda}\sum_{j=1}^{n}\frac{p_{j}^{3}}{1-p_{j}}$

$D(W||Z)\leq A_{\lambda}\lambda_{2}^{2}$

$D(W||Z)\geq\frac{1}{4}\,\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}$

$\chi^{2}(W,Z)=\sum_{k=0}^{\infty}\frac{(w_{k}-v_{k})^{2}}{v_{k}}.$

$D(W||Z)\,\leq\,\chi^{2}(W,Z)\,\leq\,c\,\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}.$

$D(W||Z)\sim\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\,(1+\log F),\qquad\chi^{2}(W,Z)\sim\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\,\sqrt{F},$

$F=\frac{\max(1,\lambda)}{\max(1,\lambda-\lambda_{2})}.$

$\Delta_{k}=w_{k}-v_{k}={\mathbb{P}}\{W=k\}-{\mathbb{P}}\{Z=k\}.$

$|\Delta_{k}|\leq 2\lambda_{2}\,{\mathbb{P}}\{k-2\leq Z\leq k\}.$

$|\Delta_{0}|\leq 3\lambda_{2}\,e^{-\lambda},\qquad|\Delta_{k}|\leq 3\lambda_{2}\quad(k\geq 1).$

$|\Delta_{k}|$

$|\Delta_{k}|\,\leq\,c\,\Big(\frac{(k-\lambda)^{2}}{\lambda}+1\Big)\,\frac{\lambda_{2}}{\lambda}\,{\mathbb{P}}\{Z=k\},$

$|\Delta_{k}|\,\leq\,c\,\Big(\frac{k}{\lambda}\Big)^{3}\,\lambda_{2}\,{\mathbb{P}}\{Z=k\}.$

$v_{k}=f(k)={\mathbb{P}}\{Z=k\}=\frac{\lambda^{k}}{k!}\,e^{-\lambda},\quad k=0,1,\dots,$

$\sqrt{2\pi}\,k^{k+\frac{1}{2}}\,e^{-k}\leq k!\leq e\,k^{k+\frac{1}{2}}\,e^{-k}\quad(k\geq 1).$

$f(k)\leq\frac{1}{\sqrt{2\pi k}}.$

$\frac{1}{e\sqrt{k}}\,e^{-\frac{(k-\lambda)^{2}}{\lambda}}\leq f(k)\leq\frac{1}{\sqrt{2\pi k}}\,e^{-\frac{(k-\lambda)^{2}}{3\lambda}}.$

$f(k)\geq\frac{1}{e\sqrt{k}}\,e^{-\frac{(k-\lambda)^{2}}{2\lambda}}.$

$f(k)$


Taxonomy

Topics: Statistical Mechanics and Entropy · Stochastic processes and financial applications · Sparse and Compressive Sensing Techniques

Full text

NON-UNIFORM BOUNDS IN THE POISSON APPROXIMATION WITH APPLICATIONS TO INFORMATIONAL DISTANCES. I

S. G. Bobkov (1), G. P. Chistyakov (2), and F. Götze (2)

(1) School of Mathematics, University of Minnesota, USA; research was partially supported by SFB 1283, Humboldt Foundation, and NSF grant

(2) Faculty of Mathematics, University of Bielefeld, Germany; research was partially supported by SFB 1283

Sergey G. Bobkov, School of Mathematics, University of Minnesota, 127 Vincent Hall, 206 Church St. S.E., Minneapolis, MN 55455 USA

Gennadiy P. Chistyakov, Fakultät für Mathematik, Universität Bielefeld, Postfach 100131, 33501 Bielefeld, Germany

Friedrich Götze, Fakultät für Mathematik, Universität Bielefeld, Postfach 100131, 33501 Bielefeld, Germany

Abstract.

We explore asymptotically optimal bounds for deviations of Bernoulli convolutions from the Poisson limit in terms of the Shannon relative entropy and the Pearson $\chi^{2}$-distance. The results are based on proper non-uniform estimates for densities. This part deals with the so-called non-degenerate case.

Key words and phrases:

$\chi^{2}$-divergence, Relative entropy, Poisson approximation

1991 Mathematics Subject Classification:

Primary 60E, 60F

1. Introduction

Let $X_{1},\dots,X_{n}$ be independent Bernoulli random variables taking the two values $1$ (interpreted as a success) and $0$ (as a failure) with respective probabilities $p_{j}$ and $q_{j}=1-p_{j}$. The total number of successes $W=X_{1}+\dots+X_{n}$ takes values $k=0,1,\dots,n$ with probabilities

$${\mathbb{P}}\{W=k\}=\sum p_{1}^{\varepsilon_{1}}q_{1}^{1-\varepsilon_{1}}\dots p_{n}^{\varepsilon_{n}}q_{n}^{1-\varepsilon_{n}}, \tag{1.1}$$

where the summation runs over all 0-1 sequences $\varepsilon_{1},\dots,\varepsilon_{n}$ such that $\varepsilon_{1}+\dots+\varepsilon_{n}=k$. Although this expression is difficult to evaluate for arbitrary $p_{j}$ and large $n$, it can be well approximated by the Poisson probabilities under quite general assumptions. Putting

$$\lambda=p_{1}+\dots+p_{n},$$

let $Z$ be a Poisson random variable with parameter $\lambda>0$ (for short, $Z\sim P_{\lambda}$), i.e.,

$$v_{k}={\mathbb{P}}\{Z=k\}=\frac{\lambda^{k}}{k!}\,e^{-\lambda},\quad k=0,1,\dots$$
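As a small illustration of the probabilities of $W$ and $Z$ just introduced, the following Python sketch evaluates ${\mathbb{P}}\{W=k\}$ in two ways: by the literal sum over 0-1 sequences, and by adding one Bernoulli summand at a time, which avoids the $2^{n}$ terms. The function names and the sample values of $p_{j}$ are illustrative assumptions and do not come from the paper.

```python
import itertools
import math

def pmf_by_enumeration(p):
    """P{W = k} as the sum over all 0-1 sequences with eps_1 + ... + eps_n = k."""
    n = len(p)
    w = [0.0] * (n + 1)
    for eps in itertools.product((0, 1), repeat=n):
        prob = 1.0
        for pj, e in zip(p, eps):
            prob *= pj if e == 1 else 1.0 - pj
        w[sum(eps)] += prob
    return w

def pmf_by_convolution(p):
    """Same probabilities, adding one Bernoulli summand at a time."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)   # X_j = 0
            nxt[k + 1] += wk * pj       # X_j = 1
        w = nxt
    return w

def poisson_pmf(lam, kmax):
    """v_k = lam^k e^{-lam} / k! for k = 0..kmax."""
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(kmax + 1)]

p = [0.10, 0.25, 0.05, 0.30]            # hypothetical success probabilities
w_enum = pmf_by_enumeration(p)
w_conv = pmf_by_convolution(p)
v = poisson_pmf(sum(p), len(p))
for k, (a, b, c) in enumerate(zip(w_enum, w_conv, v)):
    print(k, round(a, 6), round(b, 6), round(c, 6))
```

The enumeration is only feasible for small $n$; the convolution form is the one reused for larger examples.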

It has long been known that, if $\max_{j\leq n}p_{j}$ is small, the distribution $P_{\lambda}$ approximates the distribution $P_{W}$ of $W$, and the quality of this approximation may be quantified by means of the total variation distance

$$d(W,Z)=\sum_{k=0}^{\infty}|w_{k}-v_{k}|,$$

where $w_{k}={\mathbb{P}}\{W=k\}$ . In particular, based on Stein-Chen’s method, there is the following two-sided bound due to Barbour and Hall involving the functional

$$\lambda_{2}=p_{1}^{2}+\dots+p_{n}^{2}.$$

Theorem 1.1 [1]. One has

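$$\frac{1}{32}\,\min(1,1/\lambda)\,\lambda_{2}\leq\frac{1}{2}\,d(W,Z)\leq\frac{1-e^{-\lambda}}{\lambda}\,\lambda_{2}. \tag{1.2}$$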

Here, the parameter $\lambda_{2}$ , or more precisely – the ratio $\lambda_{2}/\lambda$ (for $\lambda$ bounded away from zero), plays a similar role as the Lyapunov ratio $L_{3}$ in the central limit theorem.

In the i.i.d. case with $p_{j}=\lambda/n$ and fixed $\lambda>0$ , both sides of (1.2) are of the same order $1/n$ . In the case $\lambda\leq 1$ , the upper bound in (1.2) is sharp also in the sense that the second inequality becomes an equality for $p_{1}=\lambda$ , $p_{j}=0$ ( $2\leq j\leq n$ ).
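The two-sided bound of Theorem 1.1 and its equality case can be checked numerically. The sketch below assumes the normalization $d(W,Z)=\sum_{k}|w_{k}-v_{k}|$ (twice the usual total variation), which is the reading under which the equality case for a single Bernoulli variable works out; the helper names, the truncation of the Poisson tail, and the example vectors are assumptions made for illustration.

```python
import math

def bernoulli_sum_pmf(p):
    """pmf of W = X_1 + ... + X_n, built one summand at a time."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)
            nxt[k + 1] += wk * pj
        w = nxt
    return w

def poisson_pmf(lam, kmax):
    """v_k for k = 0..kmax via v_k = v_{k-1} * lam / k."""
    v = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        v.append(v[-1] * lam / k)
    return v

def distance_d(p, extra=80):
    """sum_k |w_k - v_k|, with the Poisson tail cut off after n + extra."""
    w = bernoulli_sum_pmf(p) + [0.0] * extra
    v = poisson_pmf(sum(p), len(w) - 1)
    return sum(abs(a - b) for a, b in zip(w, v))

def barbour_hall(p):
    """Return (lower bound, d(W,Z)/2, upper bound) from the two-sided estimate."""
    lam = sum(p)
    lam2 = sum(x * x for x in p)
    lower = min(1.0, 1.0 / lam) * lam2 / 32.0
    upper = (1.0 - math.exp(-lam)) / lam * lam2
    return lower, 0.5 * distance_d(p), upper

examples = {
    "iid, n=50, lambda=1": [1.0 / 50] * 50,
    "mixed": [0.3, 0.2, 0.1, 0.05, 0.05],
    "single p_1=0.7": [0.7],            # equality case of the upper bound
}
results = {name: barbour_hall(p) for name, p in examples.items()}
for name, (lower, half_d, upper) in results.items():
    print(f"{name}: {lower:.6f} <= {half_d:.6f} <= {upper:.6f}")
```

In the single-variable example the middle and right-hand values agree up to rounding, in line with the remark on sharpness for $\lambda\leq 1$.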

Theorem 1.1 refined many previous results in this direction, starting from bounds for the i.i.d. case by Prokhorov [17] and bounds for the general case by Le Cam [14]. In particular, Le Cam obtained the upper bound

$$d(W,Z)\leq 2\lambda_{2}. \tag{1.3}$$

For large $\lambda$ Kerstan [12] and respectively Chen [4] improved these bounds to

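$$d(W,Z)\leq\frac{2.1}{\lambda}\,\lambda_{2}\ \ \text{if}\ \max_{j\leq n}p_{j}\leq\frac{1}{4},\quad\text{respectively}\quad d(W,Z)\leq\frac{10}{\lambda}\,\lambda_{2}.$$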

See also [10], [23], [21], [18], [19], [2] and the references therein. A certain refinement of the lower bound in (1.2) was obtained in Sason [20].

While (1.2) provides a sharp estimate for the total variation distance, one may wonder whether or not similar approximation bounds hold for the stronger informational distances. As a first interesting example, one may consider the relative entropy

$$D(W||Z)=\sum_{k=0}^{\infty}w_{k}\log\frac{w_{k}}{v_{k}},$$

often called the Kullback-Leibler distance, or an informational divergence of $P_{W}$ from $P_{\lambda}$ . It dominates the total variation distance in view of the Pinsker inequality $D(W||Z)\geq\frac{1}{2}\,d(W,Z)^{2}.$ In this context, lower and upper bounds for the relative entropy were studied by Harremoës [6], [7], and Harremoës and Ruzankin [9]. In particular, in the i.i.d. case $p_{j}=p$ , it was shown in [9] that

[TABLE]

If $p=\lambda/n$ with a fixed (or just bounded) value of $\lambda$ , these estimates provide the rate of Poisson approximation

$$D(W||Z)=\frac{\lambda^{2}}{4n^{2}}+O(1/n^{3})\quad\text{as}\ n\to\infty. \tag{1.4}$$

The general non-i.i.d. scenario (with not necessarily equal probabilities $p_{j}$ ) has been partially studied as well. A simple upper estimate $D(W||Z)\leq\lambda_{2}$ , analogous to Le Cam’s bound (1.3), may be found in [6], cf. also Johnson [11]. It is however not so sharp as (1.4). A tighter upper bound

$$D(W||Z)\leq\frac{1}{\lambda}\sum_{j=1}^{n}\frac{p_{j}^{3}}{1-p_{j}} \tag{1.5}$$

was later derived by Kontoyiannis, Harremoës and Johnson [13]. If $p_{j}=\lambda/n$ with $\lambda\leq n/2$ , it yields $D(W||Z)\leq 2\lambda^{2}/n^{2}$ reflecting a correct decay with respect to $n$ up to a constant, according to (1.4). Nevertheless, in the general case, Pinsker’s inequality and the bounds (1.2)-(1.3) suggest that a further sharpening such as

$$D(W||Z)\leq A_{\lambda}\lambda_{2}^{2} \tag{1.6}$$

might be possible by involving $\lambda_{2}$ rather than the functional $\lambda_{3}=p_{1}^{3}+\dots+p_{n}^{3}$ . To compare the two quantities, note that, by Cauchy’s inequality, $\lambda_{2}^{2}\leq\lambda\lambda_{3}$ . Hence, the inequality (1.6) would be sharper compared to (1.5) modulo a $\lambda$ -dependent factor. An upper bound such as (1.6) may also be inspired by the lower bound

$$D(W||Z)\geq\frac{1}{4}\,\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2} \tag{1.7}$$

recently derived by Harremoës, Johnson and Kontoyiannis [8]. It is consistent with (1.4) and also shows that the constant $1/4$ is best possible.
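The rate $\lambda^{2}/(4n^{2})$ in the i.i.d. case, the upper bound of Kontoyiannis, Harremoës and Johnson in terms of $\sum_{j}p_{j}^{3}/(1-p_{j})$, and the lower bound $\frac{1}{4}(\lambda_{2}/\lambda)^{2}$ can be compared on concrete examples. The following is a minimal sketch; the function names and the chosen probability vectors are assumptions for illustration only.

```python
import math

def bernoulli_sum_pmf(p):
    """pmf of W = X_1 + ... + X_n, built one summand at a time."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)
            nxt[k + 1] += wk * pj
        w = nxt
    return w

def poisson_pmf(lam, kmax):
    """v_k for k = 0..kmax via v_k = v_{k-1} * lam / k."""
    v = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        v.append(v[-1] * lam / k)
    return v

def relative_entropy(p):
    """D(W||Z) = sum_k w_k log(w_k / v_k); terms with w_k = 0 contribute nothing."""
    w = bernoulli_sum_pmf(p)
    v = poisson_pmf(sum(p), len(w) - 1)
    return sum(a * math.log(a / b) for a, b in zip(w, v) if a > 0.0)

def entropy_bounds(p):
    """Return (lower bound, D(W||Z), upper bound) for a vector of probabilities."""
    lam = sum(p)
    lam2 = sum(x * x for x in p)
    lower = 0.25 * (lam2 / lam) ** 2
    upper = sum(x ** 3 / (1.0 - x) for x in p) / lam
    return lower, relative_entropy(p), upper

# i.i.d. case with lambda = 1: n^2 * D should approach lambda^2 / 4 = 0.25
scaled = {n: n * n * relative_entropy([1.0 / n] * n) for n in (10, 50, 100)}
print(scaled)

mixed = entropy_bounds([0.3, 0.2, 0.1, 0.05, 0.05])
print(mixed)
```

For moderate $n$ the scaled values sit slightly above $1/4$, which is consistent with the lower bound being asymptotically attained in the i.i.d. case.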

As it turns out, (1.6) does hold in the so-called non-degenerate situation, and in essence, (1.7) may be reversed (we say that the range of $(\lambda,\lambda_{2})$ is non-degenerate, if $\lambda_{2}\leq\kappa\lambda$ with $\kappa\in(0,1)$ , or if $\lambda\leq\lambda_{0}$ , and implicitly mean that the resulting inequalities may contain $\kappa$ or $\lambda_{0}$ as fixed parameters). Moreover, one can further sharpen (1.6) by replacing the relative entropy with the Pearson $\chi^{2}$ -distance, as well as with other Rényi/Tsallis distances. To avoid technical complications, let us restrict ourselves to the $\chi^{2}$ -divergence which is given by

$$\chi^{2}(W,Z)=\sum_{k=0}^{\infty}\frac{(w_{k}-v_{k})^{2}}{v_{k}}.$$

It is a divergence type quantity which dominates the relative entropy: $\chi^{2}(W,Z)\geq D(W||Z)$. For a general theory of informational distances, we refer interested readers to the recent review by van Erven and Harremoës [5]; additional material may be found in the books [15], [16], [22], [11]. Here, we reverse the bound (1.7) and prove:

Theorem 1.2. If $\lambda_{2}\leq\lambda/2$, then with some absolute constant $c$ we have

$$D(W||Z)\,\leq\,\chi^{2}(W,Z)\,\leq\,c\,\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}. \tag{1.8}$$

The condition $\lambda_{2}\leq\lambda/2$ is fulfilled as long as all $p_{j}\leq 1/2$ (note that, if $\lambda\leq 1/2$, then necessarily $p_{j}\leq 1/2$ and then $\lambda_{2}\leq\lambda/2$). Bounds similar to (1.8) continue to hold under the weaker assumption $\lambda_{2}\leq\kappa\lambda$ with a constant $c=c_{\kappa}$ depending on $\kappa\in(0,1)$, cf. Proposition 6.2 below. This assumption may actually be replaced with the requirement that $\lambda$ is bounded. More precisely, in the second part of the paper it will be shown that without any restriction, up to some universal factors, we have
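To get a feeling for the sizes involved in Theorem 1.2, one can compute $D(W||Z)$, $\chi^{2}(W,Z)$ and the reference quantity $(\lambda_{2}/\lambda)^{2}$ for a few probability vectors with $\lambda_{2}\leq\lambda/2$. This is a sketch under stated assumptions: the helper names, the truncation of the Poisson tail in the $\chi^{2}$ sum, and the examples are illustrative, and the printed ratios say nothing about the best constant $c$.

```python
import math

def bernoulli_sum_pmf(p):
    """pmf of W = X_1 + ... + X_n, built one summand at a time."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)
            nxt[k + 1] += wk * pj
        w = nxt
    return w

def poisson_pmf(lam, kmax):
    """v_k for k = 0..kmax via v_k = v_{k-1} * lam / k."""
    v = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        v.append(v[-1] * lam / k)
    return v

def relative_entropy(p):
    w = bernoulli_sum_pmf(p)
    v = poisson_pmf(sum(p), len(w) - 1)
    return sum(a * math.log(a / b) for a, b in zip(w, v) if a > 0.0)

def chi_square(p, extra=80):
    """sum_k (w_k - v_k)^2 / v_k; for k > n the term equals v_k (tail cut at n + extra)."""
    w = bernoulli_sum_pmf(p) + [0.0] * extra
    v = poisson_pmf(sum(p), len(w) - 1)
    return sum((a - b) ** 2 / b for a, b in zip(w, v))

def summary(p):
    lam = sum(p)
    lam2 = sum(x * x for x in p)
    ref = (lam2 / lam) ** 2
    return relative_entropy(p), chi_square(p), ref

cases = {
    "iid, n=50, lambda=1": [1.0 / 50] * 50,
    "mixed": [0.3, 0.2, 0.1, 0.05, 0.05],
    "all p_j = 1/2, n=6": [0.5] * 6,
}
table = {name: summary(p) for name, p in cases.items()}
for name, (d, chi2, ref) in table.items():
    print(f"{name}: D={d:.6g}  chi2={chi2:.6g}  D/ref={d / ref:.3f}  chi2/ref={chi2 / ref:.3f}")
```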

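$$D(W||Z)\sim\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\,(1+\log F),\qquad\chi^{2}(W,Z)\sim\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\,\sqrt{F},$$

where

$$F=\frac{\max(1,\lambda)}{\max(1,\lambda-\lambda_{2})}.$$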

This shows that in general the bound (1.7) cannot be reversed.

For the study of the asymptotic behavior of $D$ and $\chi^{2}$ in terms of $\lambda$ and $\lambda_{2}$ , we derive new bounds for the difference between densities of $W$ and $Z$ , that is, for

$$\Delta_{k}=w_{k}-v_{k}={\mathbb{P}}\{W=k\}-{\mathbb{P}}\{Z=k\}.$$

To this end, one has to consider different zones of $\lambda$’s, distinguishing between “small” and “large” values. The case $\lambda\leq\frac{1}{2}$ can be handled directly, leading to the non-uniform density bound

$$|\Delta_{k}|\leq 2\lambda_{2}\,{\mathbb{P}}\{k-2\leq Z\leq k\}.$$

It easily yields sharp upper bounds for all the above distances as in Theorems 1.1-1.2 in the case of small $\lambda$, at least up to numerical factors (cf. Propositions 3.3 and 3.4). To treat larger values of $\lambda$, a more sophisticated analysis in the complex plane is needed, using the closeness of the generating functions associated with the sequences $w_{k}$ and $v_{k}$. In particular, the following statement may be of independent interest.

Theorem 1.3. For all integers $k\geq 0$,

$$|\Delta_{0}|\leq 3\lambda_{2}\,e^{-\lambda},\qquad|\Delta_{k}|\leq 3\lambda_{2}\quad(k\geq 1). \tag{1.9}$$

Moreover, putting $\rho=(\lambda-\lambda_{2})\,\min\{\frac{k}{\lambda},\frac{\lambda}{k}\}$ , $k=1,2,\dots$ , we have

[TABLE]

Let us clarify the meaning of the last bound, assuming that $\lambda_{2}\leq\kappa\lambda$ with some constant $\kappa\in(0,1)$ . If $k\leq 2\lambda$ and $\lambda\geq 1/2$ , then with some $c=c_{\kappa}>0$ , it gives

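$$|\Delta_{k}|\,\leq\,c\,\Big(\frac{(k-\lambda)^{2}}{\lambda}+1\Big)\,\frac{\lambda_{2}}{\lambda}\,{\mathbb{P}}\{Z=k\},$$

while for $k\geq\lambda\geq 1/2$, we also have

$$|\Delta_{k}|\,\leq\,c\,\Big(\frac{k}{\lambda}\Big)^{3}\,\lambda_{2}\,{\mathbb{P}}\{Z=k\}.$$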

Since $|k-\lambda|$ is of order at most $\sqrt{\lambda}$ on a sufficiently large part of ${\mathbb{Z}}$ measured by $P_{\lambda}$ , these non-uniform bounds explain the possibility of upper bounds in Theorem 1.2.
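The density bounds discussed in this introduction lend themselves to a direct numerical check. The sketch below tests two of them as they are read here: the small-$\lambda$ bound $|\Delta_{k}|\leq 2\lambda_{2}\,{\mathbb{P}}\{k-2\leq Z\leq k\}$ for $\lambda\leq 1/2$, and the uniform bounds $|\Delta_{0}|\leq 3\lambda_{2}e^{-\lambda}$, $|\Delta_{k}|\leq 3\lambda_{2}$ of Theorem 1.3. The exact form of these inequalities, the helper names, and the test vectors are assumptions of this illustration.

```python
import math

def bernoulli_sum_pmf(p):
    """pmf of W = X_1 + ... + X_n, built one summand at a time."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)
            nxt[k + 1] += wk * pj
        w = nxt
    return w

def poisson_pmf(lam, kmax):
    """v_k for k = 0..kmax via v_k = v_{k-1} * lam / k."""
    v = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        v.append(v[-1] * lam / k)
    return v

def deltas(p, extra=40):
    """Delta_k = w_k - v_k for k = 0..n + extra, together with the Poisson pmf."""
    w = bernoulli_sum_pmf(p) + [0.0] * extra
    v = poisson_pmf(sum(p), len(w) - 1)
    return [a - b for a, b in zip(w, v)], v

def small_lambda_bound_holds(p):
    """Check |Delta_k| <= 2 lam2 P{k-2 <= Z <= k} on the truncated range of k."""
    d, v = deltas(p)
    lam2 = sum(x * x for x in p)
    for k, dk in enumerate(d):
        window = sum(v[max(0, k - 2): k + 1])
        if abs(dk) > 2.0 * lam2 * window * (1.0 + 1e-12):
            return False
    return True

def uniform_bounds_hold(p):
    """Check |Delta_0| <= 3 lam2 e^{-lam} and |Delta_k| <= 3 lam2 for k >= 1."""
    d, _ = deltas(p)
    lam = sum(p)
    lam2 = sum(x * x for x in p)
    return abs(d[0]) <= 3.0 * lam2 * math.exp(-lam) and all(abs(x) <= 3.0 * lam2 for x in d[1:])

small_cases = [[0.5], [0.1] * 5, [0.2, 0.15, 0.1, 0.05]]   # lambda = 1/2 in each case
general_cases = small_cases + [[0.3] * 10, [0.9, 0.8]]
print([small_lambda_bound_holds(p) for p in small_cases])
print([uniform_bounds_hold(p) for p in general_cases])
```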

The paper is organized as follows. First we describe several general bounds on the probability function of the Poisson law (Section 2). In Section 3, we consider the deviations $\Delta_{k}$ and prove Theorem 1.2 in case $\lambda\leq 1/2$. Sections 4-5 are devoted to non-uniform bounds and the proof of Theorem 1.3, which is used in Section 6 to complete the proof of Theorem 1.2 for $\lambda\geq 1/2$. Uniform bounds for large $\lambda$ are discussed in Section 7. There we shall demonstrate that in a typical situation, when the ratio $\lambda_{2}/\lambda$ is small, the Poisson approximation considerably improves the rate of normal approximation described by the Berry-Esseen bound in the central limit theorem.

2. Gaussian Type Bounds on Poisson Probabilities

When bounding the Poisson probabilities

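$$v_{k}=f(k)={\mathbb{P}}\{Z=k\}=\frac{\lambda^{k}}{k!}\,e^{-\lambda},\quad k=0,1,\dots,$$

with a fixed parameter $\lambda>0$, it is convenient to use the well-known Stirling-type two-sided bound:

$$\sqrt{2\pi}\,k^{k+\frac{1}{2}}\,e^{-k}\leq k!\leq e\,k^{k+\frac{1}{2}}\,e^{-k}\quad(k\geq 1). \tag{2.1}$$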

In particular, it implies the following Gaussian type estimates.

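Lemma 2.1. For all $k\geq 1$,

$$f(k)\leq\frac{1}{\sqrt{2\pi k}}. \tag{2.2}$$

Moreover, if $1\leq k\leq 2\lambda$, then

$$\frac{1}{e\sqrt{k}}\,e^{-\frac{(k-\lambda)^{2}}{\lambda}}\leq f(k)\leq\frac{1}{\sqrt{2\pi k}}\,e^{-\frac{(k-\lambda)^{2}}{3\lambda}}. \tag{2.3}$$

Here, the lower bound may be improved in the region $k\geq\lambda$ as

$$f(k)\geq\frac{1}{e\sqrt{k}}\,e^{-\frac{(k-\lambda)^{2}}{2\lambda}}. \tag{2.4}$$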

Proof. Applying the lower estimate in (2.1), we get

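$$f(k)\leq\frac{1}{\sqrt{2\pi k}}\,e^{k-\lambda}\,\Big(\frac{\lambda}{k}\Big)^{k}=\frac{1}{\sqrt{2\pi k}}\,e^{\lambda h(\theta)}, \tag{2.5}$$

where

$$h(\theta)=\theta-(1+\theta)\log(1+\theta),\qquad\theta=\frac{k-\lambda}{\lambda}.$$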

The function $h(\theta)$ is concave on the half-axis $\theta\geq-1$ , with $h(0)=h^{\prime}(0)=0$ . Hence, $h(\theta)\leq 0$ for all $\theta$ , thus proving the first assertion (2.2).

Assuming that $1\leq k\leq 2\lambda$ (with $\lambda\geq\frac{1}{2}$ ), we necessarily have $|\theta|\leq 1$ . In this interval, consider the function $T_{c}(\theta)=h(\theta)+c\theta^{2}$ with parameter $\frac{1}{4}<c\leq 1$ . The second derivative

$$T_{c}^{\prime\prime}(\theta)=2c-\frac{1}{1+\theta}$$

vanishes at the point $\theta_{0}=\frac{1}{2c}-1$, while $T_{c}^{\prime\prime}(-1)=-\infty$. This means that $T_{c}$ is concave on $[-1,\theta_{0}]$ and convex on $[\theta_{0},1]$. Since also $T_{c}(0)=T_{c}^{\prime}(0)=0$, we have $T_{c}(\theta)\leq 0$ for all $\theta\in[-1,1]$ if and only if this inequality is fulfilled at $\theta=1$. But $T_{c}(1)=1-2\log 2+c$, so the optimal value is $c=2\log 2-1=0.386...>\frac{1}{3}$. Hence, $h(\theta)\leq-\frac{1}{3}\,\theta^{2}$, and we arrive at the upper bound in (2.3).

Similarly, applying the upper estimate in (2.1), we get

[TABLE]

Choosing $c=1$ , consider the function $T(\theta)=h(\theta)+\theta^{2}$ in the interval $|\theta|\leq 1$ . Since $T^{\prime\prime}(-\frac{1}{2})=0$ , it is concave on $[-1,-\frac{1}{2}]$ and is convex on $[-\frac{1}{2},1]$ . Since $T(0)=T^{\prime}(0)=0$ and $T(-1)=0$ , this means that $\theta=0$ is the point of minimum of $T$ . Therefore, $T(\theta)\geq 0$ , that is, $h(\theta)\geq-\theta^{2}$ for all $\theta\in[-1,1]$ , giving the lower bound in (2.3).

Finally, to get the refinement (2.4) in the region $k\geq\lambda$ , consider the function $T(\theta)=h(\theta)+\frac{1}{2}\,\theta^{2}$ for $\theta\geq 0$ . Since $T(0)=0$ and $T^{\prime}(\theta)=\theta-\log(1+\theta)\geq 0$ , this function is increasing. Therefore, $T(\theta)\geq 0$ , that is, $h(\theta)\geq-\frac{1}{2}\,\theta^{2}$ for all $\theta\geq 0$ . ∎
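The Gaussian type estimates of Lemma 2.1 can be verified numerically on a grid of $\lambda$ and $k$. The sketch below assumes the forms with square-root factors, $f(k)\leq 1/\sqrt{2\pi k}$, $\frac{1}{e\sqrt{k}}e^{-(k-\lambda)^{2}/\lambda}\leq f(k)\leq\frac{1}{\sqrt{2\pi k}}e^{-(k-\lambda)^{2}/(3\lambda)}$ for $1\leq k\leq 2\lambda$, and $f(k)\geq\frac{1}{e\sqrt{k}}e^{-(k-\lambda)^{2}/(2\lambda)}$ for $k\geq\lambda$, which is how they follow from the Stirling-type bound. The function names, the grid, and the small relative tolerance are assumptions of this illustration.

```python
import math

TOL = 1e-12   # relative slack: the lower bounds are attained at k = lam = 1

def poisson_f(k, lam):
    """f(k) = lam^k e^{-lam} / k!, computed through logarithms for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def lemma_2_1_holds(lam, kmax=None):
    """Check the three Gaussian type estimates for k = 1..kmax."""
    kmax = kmax or int(4 * lam) + 20
    for k in range(1, kmax + 1):
        f = poisson_f(k, lam)
        g = (k - lam) ** 2 / lam
        if f > (1.0 + TOL) / math.sqrt(2.0 * math.pi * k):
            return False
        if k <= 2 * lam:
            lower = math.exp(-g) / (math.e * math.sqrt(k))
            upper = math.exp(-g / 3.0) / math.sqrt(2.0 * math.pi * k)
            if not (lower * (1.0 - TOL) <= f <= upper * (1.0 + TOL)):
                return False
        if k >= lam:
            if f < (1.0 - TOL) * math.exp(-g / 2.0) / (math.e * math.sqrt(k)):
                return False
    return True

c_opt = 2.0 * math.log(2.0) - 1.0     # the optimal constant from the proof
print(round(c_opt, 6))
grid = (0.1, 0.5, 1.0, 2.5, 10.0, 50.0)
print({lam: lemma_2_1_holds(lam) for lam in grid})
```

The tolerance matters only at $k=\lambda=1$, where $f(1)=e^{-1}$ coincides with both lower bounds.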

3. Elementary Upper Bounds

We keep the same notations as before; in particular,

[TABLE]

while

[TABLE]

with summation over all 0-1 sequences $\varepsilon=(\varepsilon_{1},\dots,\varepsilon_{n})$ such that $\varepsilon_{1}+\dots+\varepsilon_{n}=k$ . Clearly, ${\mathbb{P}}\{W=k\}=0$ for $k>n$ . To eliminate this condition, one may always assume that $n$ is arbitrary, by extending the sequence $(X_{1},\dots,X_{n})$ to $(X_{1},\dots,X_{k})$ in case $n<k$ with $p_{n+1}=\dots=p_{k}=0$ . This does not change the value of $W$ .

First, let us consider the probability that $W$ equals $k=0$ .

Lemma 3.1. If $\max_{j}p_{j}\leq\frac{1}{2}$ , then

[TABLE]

Proof. Expanding the function $p\rightarrow-\log(1-p)$ near zero according to the Taylor formula as in the previous section, write

[TABLE]

Using $\lambda_{s}\leq(\max_{j}p_{j})^{s-2}\,\lambda_{2}\leq 2^{-(s-2)}\,\lambda_{2}$ for $s\geq 2$ , we have

[TABLE]

Hence

[TABLE]

∎

Note that the condition of Lemma 3.1 is fulfilled automatically, if $\lambda\leq 1/2$ . In that case, the upper bounds of the lemma may easily be reversed up to numerical factors, for example, in the form

[TABLE]

Moreover, if $\lambda\leq 1/8$ , then also

[TABLE]

Here, the value $k=2$ turns out to be most essential for obtaining lower bounds, since it immediately yields $d(W,Z)\geq c\lambda_{2}$ and $D(W||Z)\geq c\,(\frac{\lambda_{2}}{\lambda})^{2}$ with some absolute constant $c>0$ .

Returning to upper bounds, recall the notation $\Delta_{k}={\mathbb{P}}\{W=k\}-{\mathbb{P}}\{Z=k\}$ . In order to involve the values $k\geq 1$ , we need the following:

Lemma 3.2. If $\max_{j}p_{j}\leq 1/2$ , then

[TABLE]

Moreover, for any $k\geq 2$ ,

[TABLE]

Proof. Denote by $I$ the collection of all tuples $\varepsilon=(\varepsilon_{1},\dots,\varepsilon_{n})$ with integer components $\varepsilon_{i}\geq 0$ such that $\varepsilon_{1}+\dots+\varepsilon_{n}=k$, and let $J=\{\varepsilon\in I:\max_{i}\varepsilon_{i}\leq 1\}$. Representing the Poisson random variable $Z\sim P_{\lambda}$ as $Z=Z_{1}+\dots+Z_{n}$ with independent summands $Z_{j}\sim P_{p_{j}}$, we have that, for any $k=0,1,\dots$,

[TABLE]
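The representation $Z=Z_{1}+\dots+Z_{n}$ with independent $Z_{j}\sim P_{p_{j}}$ used in this proof can be illustrated by convolving the individual Poisson laws and comparing with $P_{\lambda}$. This is a minimal sketch; the function names, the truncation level, and the values of $p_{j}$ are assumptions for illustration.

```python
import math

def poisson_pmf(lam, kmax):
    """v_k for k = 0..kmax via v_k = v_{k-1} * lam / k."""
    v = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        v.append(v[-1] * lam / k)
    return v

def convolve_truncated(a, b):
    """Convolution of two pmf prefixes of equal length.

    Entry k of the result only uses entries 0..k of a and b, so cutting both
    inputs at the same index does not affect the entries that are returned.
    """
    kmax = len(a) - 1
    c = [0.0] * (kmax + 1)
    for i, ai in enumerate(a):
        for j in range(kmax + 1 - i):
            c[i + j] += ai * b[j]
    return c

p = [0.10, 0.25, 0.05, 0.30]     # hypothetical parameters p_j
kmax = 12
law_of_sum = poisson_pmf(p[0], kmax)
for pj in p[1:]:
    law_of_sum = convolve_truncated(law_of_sum, poisson_pmf(pj, kmax))
target = poisson_pmf(sum(p), kmax)
max_gap = max(abs(x - y) for x, y in zip(law_of_sum, target))
print(max_gap)
```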

Hence, we may start with the formula

[TABLE]

where

[TABLE]

For a 0-1 sequence $\varepsilon=(\varepsilon_{1},\dots,\varepsilon_{n})\in J$ , put

[TABLE]

By the Taylor formula once more,

[TABLE]

Similarly to (3.1)-(3.2), we have

[TABLE]

Therefore,

[TABLE]

Moreover, since $L_{\varepsilon}\leq\min{(\lambda,k)}$ , we have $\frac{e^{L_{\varepsilon}}-1}{L_{\varepsilon}}\leq\frac{e^{\min(\lambda,k)}-1}{\min(\lambda,k)}\equiv c_{k,\lambda}$ , which in turn implies $e^{\lambda}\,V_{\varepsilon}\leq e^{L_{\varepsilon}}\leq 1+c_{k,\lambda}\,L_{\varepsilon}$ . The two bounds give $L_{\varepsilon}-\lambda_{2}\leq e^{\lambda}\,V_{\varepsilon}-1\leq c_{k,\lambda}\,L_{\varepsilon},$ so that

[TABLE]

Next, applying the multinomial formula, we have

[TABLE]

and

[TABLE]

Thus,

[TABLE]

The remaining terms participating in ${\mathbb{P}}(Z=k)$ correspond to the tuples $\varepsilon\in I$ with $\max_{i}\varepsilon_{i}\geq 2$ , which is only possible for $k\geq 2$ . In that case, restricting for definiteness to the constraint $\varepsilon_{n}\geq 2$ , we have

[TABLE]

Similarly, for any $i=1,\dots,n$ ,

[TABLE]

and summing over $i\leq n$ , we then get

[TABLE]

It remains to combine this bound with the bound (3.5) and apply both in (3.4). Then we finally obtain that

[TABLE]

If $k=1$ , then $c_{1,\lambda}\leq e-1$ , and we arrive at the first inequality in (3.3). In the case $k\geq 2$ , one may use $c_{k,\lambda}\leq\frac{e^{\lambda}-1}{\lambda}$ , and then we arrive at the second inequality of the lemma. ∎

Note that when $\lambda\leq\frac{1}{2}$ , we also have $c_{1,\lambda}\leq 2(\sqrt{e}-1)$ , and then (3.3) may be replaced with a slightly better bound

[TABLE]

Combining Lemmas 3.1–3.2 (cf. (3.6)), we thus obtain the following non-uniform bound on the deviations of $\Delta_{k}$ .

Proposition 3.3. If $\max_{j}p_{j}\leq 1/2$ , then, for all $k\geq 0$ ,

[TABLE]

The estimates obtained so far are sufficient to establish Theorem 1.2 in the case $\lambda\leq 1/2$ . In fact, one may weaken the latter condition to $\max_{j}p_{j}\leq 1/2$ , as shown in the next statement. To compare the lower and upper bounds, we recall the lower bound (1.7) of Harremoës, Johnson and Kontoyiannis [8].

Proposition 3.4. If $\max_{j}p_{j}\leq 1/2$ , then

[TABLE]

where $C_{\lambda}$ depends on $\lambda\geq 0$ as an increasing continuous function with $C_{0}=2$ . In particular, if $\lambda\leq 1/2$ , then

[TABLE]

Proof. Applying Lemmas 3.1-3.2, we get

[TABLE]

where $c_{\lambda}=\frac{e^{\lambda}-1}{\lambda}$ . Expanding the squares of the brackets in this sum results in

[TABLE]

which is the same as

[TABLE]

Multiplying by $\lambda^{2}$ , this gives the desired inequality

[TABLE]

with

[TABLE]

It is easy to check that $\frac{d}{d\lambda}\,B_{\lambda}>0$ , so that this function is increasing in $\lambda$ , with $C_{0}=B_{0}=2$ .

For the range $\lambda\leq\frac{1}{2}$ , the term $e-1$ appearing in the definition of $C_{\lambda}$ may be replaced with $2(\sqrt{e}-1)$ (according to the inequality (3.7)), which leads to the constant $C_{1/2}=\frac{1}{2}(\frac{1}{2}+2(\sqrt{e}-1))^{2}+\frac{7}{8}-2\sqrt{e}+6\,e<15$ . ∎

4. Generating functions

The probability function $f(k)={\mathbb{P}}\{Z=k\}$ of the Poisson random variable $Z\sim P_{\lambda}$ satisfies the equation $\lambda f(k-1)=kf(k)$ in integers $k\geq 1$ , which immediately implies

$${\mathbb{E}}\,Zh(Z)=\lambda\,{\mathbb{E}}\,h(Z+1)$$

for any function $h$ on ${\mathbb{Z}}$ (as long as the expectations exist). This identity was emphasized by Chen [4] who proposed to consider an approximate equality

$${\mathbb{E}}\,Xh(X)\sim\lambda\,{\mathbb{E}}\,h(X+1)$$

as a characterization of a random variable $X$ being almost Poisson with parameter $\lambda$ . This idea was inspired by a similar approach of Charles Stein to problems of normal approximation on the basis of the approximate equality ${\mathbb{E}}\,h^{\prime}(X)\sim{\mathbb{E}}\,Xh(X)$ .
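The identity obtained from $\lambda f(k-1)=kf(k)$, namely that ${\mathbb{E}}\,Zh(Z)$ and $\lambda\,{\mathbb{E}}\,h(Z+1)$ coincide for a Poisson variable, is easy to test numerically, and the same difference evaluated for $W$ shows how far a Bernoulli sum is from satisfying it. The sketch below is an illustration under assumptions: the test function $h$, the helper names, and the parameter values are not taken from the paper.

```python
import math

def bernoulli_sum_pmf(p):
    """pmf of W = X_1 + ... + X_n, built one summand at a time."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)
            nxt[k + 1] += wk * pj
        w = nxt
    return w

def poisson_pmf(lam, kmax):
    """v_k for k = 0..kmax via v_k = v_{k-1} * lam / k."""
    v = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        v.append(v[-1] * lam / k)
    return v

def stein_chen_gap(pmf, lam, h):
    """E[X h(X)] - lam * E[h(X + 1)] for X with the given pmf on 0..len(pmf)-1."""
    lhs = sum(k * h(k) * pk for k, pk in enumerate(pmf))
    rhs = lam * sum(h(k + 1) * pk for k, pk in enumerate(pmf))
    return lhs - rhs

def h(k):
    """A bounded test function chosen only for illustration."""
    return 1.0 / (1.0 + k)

lam = 3.5
gap_poisson = stein_chen_gap(poisson_pmf(lam, 200), lam, h)

p = [0.3, 0.2, 0.1, 0.05, 0.05]
lam2 = sum(x * x for x in p)
gap_w = stein_chen_gap(bernoulli_sum_pmf(p), sum(p), h)
print(gap_poisson, gap_w, lam2)
```

For the Bernoulli sum the gap is nonzero but small compared with $\lambda_{2}$ in this example, which matches the role of $\lambda_{2}$ as the natural error scale.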

Another natural approach to the Poisson approximation is based on the comparison of characteristic functions. Since the random variables $W$ and $Z$ take non-negative integer values, one may equivalently consider the associated generating functions.

The generating function for the Poisson law $P_{\lambda}$ with parameter $\lambda>0$ is given by

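$$\varphi(w)={\mathbb{E}}\,w^{Z}=\sum_{k=0}^{\infty}{\mathbb{P}}\{Z=k\}\,w^{k}=e^{\lambda(w-1)}=\prod_{j=1}^{n}e^{p_{j}(w-1)}, \tag{4.1}$$

which is an entire function of the complex variable $w$. Correspondingly, the generating function for the distribution of the random variable $W=X_{1}+\dots+X_{n}$ in (1.1) is

$$g(w)={\mathbb{E}}\,w^{W}=\sum_{k=0}^{n}{\mathbb{P}}\{W=k\}\,w^{k}=\prod_{j=1}^{n}(q_{j}+p_{j}w), \tag{4.2}$$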

which is a polynomial of degree $n$ . Hence, the difference between the involved probabilities may be expressed via the contour integrals by the Cauchy formula

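$$\Delta_{k}={\mathbb{P}}\{W=k\}-{\mathbb{P}}\{Z=k\}=\int_{|w|=r}w^{-k}\,(g(w)-\varphi(w))\,d\mu_{r}(w), \tag{4.3}$$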

where $\mu_{r}$ is the uniform probability measure on the circle $|w|=r$ of an arbitrary radius $r>0$ .

Note that for $w=e^{it}$ with real $t$ , the generating functions $\varphi$ and $g$ become the characteristic functions of $Z$ and $W$ , respectively. Hence, closeness of the distributions of these random variables may be studied as a problem of the closeness of the generating functions on the unit circle.
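The contour representation of ${\mathbb{P}}\{W=k\}-{\mathbb{P}}\{Z=k\}$ can be tried out numerically by replacing the uniform measure $\mu_{r}$ with equally spaced points on the circle $|w|=r$. The sketch below assumes the product forms $g(w)=\prod_{j}(q_{j}+p_{j}w)$ and $\varphi(w)=e^{\lambda(w-1)}$ for the two generating functions; the function names, the number of nodes, and the sample values are illustrative assumptions.

```python
import cmath
import math

def g(w, p):
    """Generating function of W: product of (q_j + p_j w)."""
    out = 1.0 + 0.0j
    for pj in p:
        out *= (1.0 - pj) + pj * w
    return out

def phi(w, lam):
    """Generating function of the Poisson law with parameter lam."""
    return cmath.exp(lam * (w - 1.0))

def delta_by_contour(k, p, r=1.0, num_points=256):
    """Average of w^{-k} (g(w) - phi(w)) over equally spaced points of |w| = r."""
    lam = sum(p)
    total = 0.0 + 0.0j
    for m in range(num_points):
        w = r * cmath.exp(2j * math.pi * m / num_points)
        total += w ** (-k) * (g(w, p) - phi(w, lam))
    return (total / num_points).real

def delta_direct(k, p):
    """P{W = k} - P{Z = k} computed from the two probability functions."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for i, wi in enumerate(w):
            nxt[i] += wi * (1.0 - pj)
            nxt[i + 1] += wi * pj
        w = nxt
    lam = sum(p)
    wk = w[k] if k < len(w) else 0.0
    return wk - math.exp(-lam) * lam ** k / math.factorial(k)

p = [0.3, 0.2, 0.1, 0.05, 0.05]
worst = max(
    abs(delta_by_contour(k, p, r) - delta_direct(k, p))
    for r in (0.5, 1.0, 2.0)
    for k in range(7)
)
print(worst)
```

The agreement for several radii reflects the freedom in choosing $r>0$, which the later sections exploit by taking $r$ depending on $k$.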

Let us now describe first steps based on the application of the formula (4.3). Given complex numbers $a_{j},b_{j}$ ( $1\leq j\leq n$ ), we have an identity

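$$a_{1}\dots a_{n}-b_{1}\dots b_{n}=\sum_{j=1}^{n}(a_{j}-b_{j})\prod_{l<j}b_{l}\prod_{l>j}a_{l} \tag{4.4}$$

with the convention that $\prod_{l<j}b_{l}=1$ for $j=1$ and $\prod_{l>j}a_{l}=1$ for $j=n$. It implies

$$|a_{1}\dots a_{n}-b_{1}\dots b_{n}|\leq\sum_{j=1}^{n}|a_{j}-b_{j}|\prod_{l<j}|b_{l}|\prod_{l>j}|a_{l}|. \tag{4.5}$$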
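The telescoping identity for a difference of two products, and the bound on its modulus obtained from the triangle inequality, can be checked on the very choice $a_{j}=q_{j}+p_{j}w$, $b_{j}=e^{p_{j}(w-1)}$ made in the text. The following sketch is an illustration; the helper names, the point $w$, and the values of $p_{j}$ are assumptions.

```python
import cmath

def product(values):
    """Product of a sequence of complex numbers (empty product = 1)."""
    out = 1.0 + 0.0j
    for x in values:
        out *= x
    return out

def telescoping_sum(a, b):
    """sum_j (a_j - b_j) * prod_{l<j} b_l * prod_{l>j} a_l."""
    return sum(
        (a[j] - b[j]) * product(b[:j]) * product(a[j + 1:])
        for j in range(len(a))
    )

def modulus_bound(a, b):
    """sum_j |a_j - b_j| * prod_{l<j} |b_l| * prod_{l>j} |a_l|."""
    return sum(
        abs(a[j] - b[j]) * abs(product(b[:j])) * abs(product(a[j + 1:]))
        for j in range(len(a))
    )

p = [0.3, 0.2, 0.1, 0.05, 0.05]
w = 0.8 * cmath.exp(0.7j)                       # a point on the circle |w| = 0.8
a = [(1.0 - pj) + pj * w for pj in p]           # factors of g(w)
b = [cmath.exp(pj * (w - 1.0)) for pj in p]     # factors of phi(w)

lhs = product(a) - product(b)
rhs = telescoping_sum(a, b)
bound = modulus_bound(a, b)
print(abs(lhs - rhs), abs(lhs), bound)
```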

According to the product representations (4.1)-(4.2) to be used in (4.3), one should choose here $a_{j}=q_{j}+p_{j}w$ and $b_{j}=e^{p_{j}(w-1)}$ with $|w|=r$ . Then

[TABLE]

Therefore

[TABLE]

To estimate the terms in this sum, consider the function

$$\xi(u)=e^{u}-1-u=u^{2}\int_{0}^{1}(1-t)\,e^{tu}\,dt \tag{4.7}$$

of the complex variable $u$, where the Taylor integral formula is applied in the second representation. If ${\rm Re}\,u\leq 0$, then $|u^{2}\,e^{tu}|=|u|^{2}\,\exp\{t\,{\rm Re}\,u\}\leq|u|^{2}$, so,

$$|\xi(u)|\leq\frac{1}{2}\,|u|^{2}. \tag{4.8}$$

In particular, for $u=p_{j}(w-1)$ with $w=\cos\theta+i\sin\theta$, we have

$$|u|^{2}=p_{j}^{2}\,|w-1|^{2}=2p_{j}^{2}\,(1-\cos\theta),$$

hence $|\xi(u)|\leq p_{j}^{2}\,(1-\cos\theta)$ , and (4.6) yields

[TABLE]

Integrating over the unit circle in (4.3), we then arrive at the uniform bound:

Proposition 4.1. We have

[TABLE]

This is a weakened variant of Le Cam’s bound $|{\mathbb{P}}\{W\in A\}-{\mathbb{P}}\{Z\in A\}|\,\leq\,\lambda_{2}$ , specialized to the one-point set $A=\{k\}$ . In order to get a similar bound with arbitrary sets, or develop applications to stronger distances, we need sharper forms of (4.9), with the right-hand side properly depending on $k$ .

5. Proof of Theorem 1.3

Applying (4.4) with $a_{j}=q_{j}+p_{j}w$ and $b_{j}=e^{p_{j}(w-1)}$ in (4.3), one may write this formula as

[TABLE]

with

[TABLE]

where the integration is performed over the uniform probability measure $\mu_{r}$ on the circle $|w|=r$ . Let us write $w=r(\cos\theta+i\sin\theta)$ , $|\theta|<\pi$ , and estimate $|T_{j}(k)|$ by inserting the absolute value sign inside the integral. Then, using (4.5), we get

[TABLE]

Here, in order to estimate $|a_{j}-b_{j}|$ , let us return to the function $\xi(u)$ introduced in (4.7), which we need at the values $u_{j}=p_{j}(w-1)$ with $|w|=r$ .

Case 1: $r\geq 1$ . Since ${\rm Re}\,u_{j}\leq p_{j}(r-1)$ , we have, for any $t\in(0,1)$ ,

[TABLE]

so, by (4.7),

[TABLE]

Case 2: $0<r<1$ . Then ${\rm Re}\,u_{j}\leq 0$ , so, by (4.8),

[TABLE]

Since $|w-1|^{2}=(r-1)^{2}+4r\,\sin^{2}(\theta/2)$ , we therefore obtain from (5.2) that

[TABLE]

where

[TABLE]

and

[TABLE]

In order to estimate the last integrals, which we need with $m=0$ and $m=2$ , let us first note that

[TABLE]

Hence, using $1-x\leq e^{-x}$ ( $x\in{\mathbb{R}}$ ), we have

[TABLE]

so that

[TABLE]

Here we applied the inequalities $\frac{2}{\pi}\,t\leq\sin t\leq t$ ( $0\leq t\leq\frac{\pi}{2}$ ) and used the notation

[TABLE]

Thus, we need to bound $\gamma_{j}$ from below. If $r\geq 1$ , then $q_{l}+p_{l}r\leq r$ , so

[TABLE]

This gives

[TABLE]

In case $r\leq 1$ , we use $q_{l}+p_{l}r\leq 1$ , implying that

[TABLE]

Therefore in this range we have a similar lower bound, namely

[TABLE]

Since $q_{j}p_{j}\leq\frac{1}{4}$ , both lower bounds yield

[TABLE]

As a result, (5.5) is simplified to

[TABLE]

The last integral may be extended to the whole real line, which makes sense for large values of $\psi(r)$ , or one may bound the exponential term in the integrand by 1, which makes sense for small values of $\psi(r)$ . These two ways of estimation lead to

[TABLE]

where $\xi$ is a standard normal random variable. In particular, we get the upper bounds

[TABLE]

In view of $q_{l}+p_{l}r\leq e^{(r-1)p_{l}}$ , from the definition of $R_{j}(r)$ we also have the bound

[TABLE]

in case $r\geq 1$ , while for $r\leq 1$

[TABLE]

Applying these bounds in (5.3), we therefore obtain that $|T_{j}(k)|$ may be bounded from above by

[TABLE]

where $\delta_{r}=1$ in case $r\geq 1$ and $\delta_{r}=e$ for $r<1$ . Summing over $j\leq n$ and recalling (5.1), one can estimate $|\Delta_{k}|$ from above by

[TABLE]

Now, letting $r\rightarrow 0$ in the case $k=0$ , (5.6) leads to

[TABLE]

and we obtain the first inequality in (1.9). Letting $r\downarrow 1$ in the case $k\geq 1$ , (5.6) gives

[TABLE]

which is the second inequality in (1.9).

But, if $k\geq 1$ , one may also use (5.6) with $r=\frac{k}{\lambda}$ and apply the bound $k!\leq e\,k^{k+\frac{1}{2}}\,e^{-k}$ , cf. (2.1), giving

[TABLE]

To simplify the numerical constants, note that $\frac{1}{2}\,e^{5/2}<6.1$ and $\frac{1}{6}\,e^{5/2}\,\pi^{2}<20.1$ . Recalling that $\psi(r)=\rho$ for $r=k/\lambda$ , we finally get the second inequality (1.10),

[TABLE]

∎

6. Consequences of Theorem 1.3

Under the natural requirement that $\lambda_{2}$ is bounded away from $\lambda$ , the bound (5.7) on $\Delta_{k}={\mathbb{P}}\{W=k\}-{\mathbb{P}}\{Z=k\}$ may be simplified. As before, we use the notations

[TABLE]

Note that $\lambda_{2}\leq\lambda$ and recall that $\rho=(\lambda-\lambda_{2})\,\min\{\frac{k}{\lambda},\frac{\lambda}{k}\}$ .

Corollary 6.1. If $\lambda_{2}\leq\kappa\lambda$ , $\kappa\in(0,1)$ , then for any integer $k\geq 0$ ,

[TABLE]

In particular, if $k\leq 2\lambda$ , then

[TABLE]

If $k\geq\lambda\geq 1/2$ , we also have

[TABLE]

Proof. The assumption $\lambda_{2}\leq\kappa\lambda$ ensures that $\rho\geq(1-\kappa)\lambda\,\min\{\frac{k}{\lambda},\frac{\lambda}{k}\}$ .

If $1\leq k\leq K\lambda$ ( $K\geq 1$ ), then $\frac{k}{\lambda}\leq K^{2}\frac{\lambda}{k}$ and $\rho\geq\frac{1-\kappa}{K^{2}}\,k$ , so, the right-hand side of (5.7) is bounded from above by

[TABLE]

Choosing $K=\max\{\frac{k}{\lambda},1\}$ , this expression does not exceed the right-hand side of (6.1). Thus, the inequality (1.10) yields (6.1), which in turn immediately implies (6.2).

In case $k=0$ , we apply the inequality (1.9). Since $\frac{(k-\lambda)^{2}}{\lambda}+3\geq\lambda$ for $k=0$ , the right-hand side of (1.10) is dominated by the right-hand side of (6.1). Thus, we obtain (6.1) without any constraints on $k$ , and (6.2) for all $k\leq 2\lambda$ .

In case $k\geq\lambda$ , necessarily $\rho\geq(1-\kappa)\,\lambda^{2}/k$ . Hence, the right-hand side of (5.7) may be bounded from above by

[TABLE]

Using $(\frac{k-\lambda}{\lambda})^{2}\leq\frac{k^{2}}{\lambda^{2}}$ to bound the first term in the brackets and $\frac{k}{\lambda}\leq 2k$ to bound the second term (using $\lambda\geq 1/2$ ), we obtain the bound (6.3). ∎

We are now prepared to extend Proposition 3.4 to larger values of $\lambda$ under the assumption that $\lambda_{2}/\lambda$ is bounded away from 1. The next assertion, being combined with Proposition 3.4, yields Theorem 1.2 with $c=15$ in case $\lambda\leq 1/2$ and $c=56\cdot 10^{6}$ in case $\lambda>1/2$ .

Proposition 6.2. If $\lambda\geq 1/2$ and $\lambda_{2}\leq\kappa\lambda$ with $\kappa\in(0,1)$ , then

[TABLE]

where $c_{\kappa}=c\,(1-\kappa)^{-3}$ with, for example, $c=7\cdot 10^{6}$ .

Proof. The leftmost lower bound in (6.4) is added according to (1.7) (using the Pinsker inequality, it also follows with some constant from Barbour-Hall’s lower bound in Theorem 1.1). Hence, it remains to show the rightmost upper bound in (6.4). Write

[TABLE]

In the range $0\leq k\leq[2\lambda]$ , we apply the inequality (6.2) which gives

[TABLE]

Hence

[TABLE]

In the sequel, we use a simple moment inequality ${\mathbb{E}}\,Z^{m}\leq\lambda(\lambda+1)\dots(\lambda+m-1)$ . We also have ${\mathbb{E}}\,(Z-\lambda)^{2}=\lambda$ and ${\mathbb{E}}\,(Z-\lambda)^{4}=\lambda(3\lambda+1)$ , so that

[TABLE]

with $C_{1}=94\,080$ (where we used the assumption $\lambda\geq 1/2$ on the last step).
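The moment inequality ${\mathbb{E}}\,Z^{m}\leq\lambda(\lambda+1)\dots(\lambda+m-1)$ used above can be checked exactly for small $m$. The following is a minimal sketch, not part of the original argument: it assumes the standard representation of Poisson raw moments through Stirling numbers of the second kind, ${\mathbb{E}}\,Z^{m}=\sum_{j}S(m,j)\lambda^{j}$, and the helper names (`stirling2_row`, `poisson_raw_moment`, `rising_factorial`) are hypothetical.

```python
from fractions import Fraction

def stirling2_row(m):
    """Return [S(m,0), ..., S(m,m)], Stirling numbers of the second kind."""
    row = [1]  # S(0,0) = 1
    for n in range(1, m + 1):
        new = [0] * (n + 1)
        for j in range(1, n + 1):
            # Recurrence S(n,j) = j*S(n-1,j) + S(n-1,j-1), with S(n-1,n) = 0.
            prev_j = row[j] if j < n else 0
            new[j] = j * prev_j + row[j - 1]
        row = new
    return row

def poisson_raw_moment(lam, m):
    """E Z^m for Z ~ Poisson(lam), as the polynomial sum_j S(m,j) lam^j."""
    return sum(s * lam ** j for j, s in enumerate(stirling2_row(m)))

def rising_factorial(lam, m):
    """lam (lam+1) ... (lam+m-1), the upper bound in the moment inequality."""
    result = Fraction(1)
    for i in range(m):
        result *= lam + i
    return result

if __name__ == "__main__":
    for lam in (Fraction(1, 2), Fraction(8)):
        for m in (2, 4, 6):
            print(lam, m, poisson_raw_moment(lam, m), rising_factorial(lam, m))
```

Exact rational arithmetic is used so that the comparison is not affected by rounding; for $m=2$ the two sides coincide.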

In order to estimate $S_{2}$ , we use the following elementary bound

[TABLE]

which holds for any $d=1,2,\dots$ as long as $k_{0}^{d}/(k_{0}+1)^{d-1}>\lambda$ . For the proof, write

[TABLE]

where

[TABLE]

Since the function $(x+1)^{d-1}\,x^{-d}$ is decreasing in $x>0$ , we have $1>\theta_{1}>\theta_{2}>\dots$ . This gives

[TABLE]

that is, (6.6). In particular, for $k_{0}=[2\lambda]+1$ and $\lambda\geq 8$ (with $d=6$ ),

[TABLE]

So, by (6.6), and using $[2\lambda]+1\leq\frac{17}{8}\lambda$ for the chosen range of $\lambda$ , we have

[TABLE]

Hence, by (6.3),

[TABLE]

with $C_{2}=49^{2}\cdot 3.1\cdot(17/8)^{6}<685\,343$ . Asymptotically with respect to large $\lambda$ , this bound is much better than (6.4). Applying $f(k)\leq\frac{1}{\sqrt{2\pi k}}\,e^{k-\lambda}\,(\frac{\lambda}{k})^{k}$ as in (2.5) with $k=[2\lambda]+1$ and using $2\lambda\leq k\leq 2\lambda+1$ , we have

[TABLE]

This gives

[TABLE]

As a result, we arrive at the desired upper bound in (6.4).
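The elementary ingredients of the case $\lambda\geq 8$ can be verified numerically. The sketch below is illustrative only and rests on an assumption: $f(k)$ is taken to be the Poisson mass ${\mathbb{P}}\{Z=k\}$, so that the bound from (2.5) is the usual Stirling-type estimate. It checks that estimate, the relation $[2\lambda]+1\leq\frac{17}{8}\lambda$, the condition $k_{0}^{d}/(k_{0}+1)^{d-1}>\lambda$ with $d=6$, and the arithmetic behind $C_{2}$. All function names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Poisson mass P{Z = k}, computed in log scale to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def stirling_upper(k, lam):
    """The bound e^{k-lam} (lam/k)^k / sqrt(2 pi k), meaningful for k >= 1."""
    log_val = k - lam + k * (math.log(lam) - math.log(k))
    return math.exp(log_val) / math.sqrt(2 * math.pi * k)

def tail_start(lam):
    """k_0 = [2 lam] + 1, the first index of the tail sum."""
    return math.floor(2 * lam) + 1

def tail_condition_holds(lam, d=6):
    """Check k_0^d / (k_0 + 1)^(d-1) > lam for k_0 = [2 lam] + 1."""
    k0 = tail_start(lam)
    return k0 ** d / (k0 + 1) ** (d - 1) > lam

# Numerical value of the constant 49^2 * 3.1 * (17/8)^6 quoted for C_2.
C2_VALUE = 49 ** 2 * 3.1 * (17 / 8) ** 6

if __name__ == "__main__":
    for lam in (8.0, 20.0, 100.0):
        k0 = tail_start(lam)
        print(lam, k0, tail_condition_holds(lam),
              poisson_pmf(k0, lam), stirling_upper(k0, lam))
    print(C2_VALUE)
```

The grid of values of $\lambda$ is arbitrary; the checks illustrate the inequalities and do not replace their proofs.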

Finally, let us estimate $S_{2}$ for the range $\frac{1}{2}\leq\lambda\leq 8$ . Returning to (6.7), we have

[TABLE]

where $C_{2}^{\prime}=49^{2}\,\sup_{\frac{1}{2}\leq\lambda\leq 8}\psi(\lambda)$ , $\psi(\lambda)=\lambda^{-4}\,{\mathbb{E}}Z^{6}$ . Here

[TABLE]

with $\psi_{1}(\lambda)=5+\lambda+\frac{4}{\lambda}$ , $\psi_{2}(\lambda)=7+\lambda+\frac{10}{\lambda}$ , $\psi_{3}(\lambda)=1+\frac{3}{\lambda}$ . All three functions are convex, while $\psi_{3}$ is decreasing. In addition, $\psi_{i}(1/2)\geq\psi_{i}(8)$ for $i=1,2$ . Hence $\psi(\lambda)\leq\psi(1/2)=\frac{1}{4}\cdot 11!!$ . It follows that $C_{2}^{\prime}=49^{2}\cdot\frac{1}{4}\cdot 11!!<6\,239\,600$ , and thus $c=C_{1}+C_{2}^{\prime}$ is the resulting constant in (6.4). ∎
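The factorization into $\psi_{1},\psi_{2},\psi_{3}$ can be confirmed in exact arithmetic. The sketch below assumes that the omitted display bounds $\psi(\lambda)$ by the product $\psi_{1}\psi_{2}\psi_{3}$ through the moment inequality ${\mathbb{E}}\,Z^{6}\leq\lambda(\lambda+1)\dots(\lambda+5)$; it checks that this product equals $\lambda^{-4}\lambda(\lambda+1)\dots(\lambda+5)$, that its value at $\lambda=1/2$ is $\frac{1}{4}\cdot 11!!$, and that on a grid over $[\frac{1}{2},8]$ the maximum is attained at $\lambda=1/2$. The function names and the grid step are hypothetical choices.

```python
from fractions import Fraction

def rising_over_lam4(lam):
    """lam (lam+1) ... (lam+5) / lam^4, the moment bound for E Z^6 / lam^4."""
    prod = Fraction(1)
    for i in range(6):
        prod *= lam + i
    return prod / lam ** 4

def psi_product(lam):
    """Product psi_1 * psi_2 * psi_3 with the three factors from the text."""
    psi1 = 5 + lam + 4 / lam
    psi2 = 7 + lam + 10 / lam
    psi3 = 1 + 3 / lam
    return psi1 * psi2 * psi3

def double_factorial(n):
    """n!! = n (n-2) (n-4) ... down to 1 or 2."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

if __name__ == "__main__":
    half = Fraction(1, 2)
    print(psi_product(half), Fraction(double_factorial(11), 4))
    print(psi_product(Fraction(8)))
```

A grid check is of course weaker than the convexity argument in the proof; it only serves as a sanity test of the endpoint values.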

Remark 6.3. Up to a numerical constant, the upper bound in (6.4) immediately implies an upper bound of Theorem 1.1 in case $\lambda\geq 1/2$ , in view of the Pinsker inequality $d(W,Z)^{2}\leq 2\,D(W,Z)$ . Indeed, (6.4) gives $d(W,Z)\leq c_{\kappa}\lambda_{2}/\lambda$ , provided that $\lambda_{2}\leq\kappa\lambda$ . But, in the other case $\lambda_{2}\geq\kappa\lambda$ , there is nothing to prove, since $d(W,Z)\leq 2$ . Note also that, for $\lambda\leq 1/2$ , the correct upper bound on the total variation distance is of the form $d(W,Z)\leq C\lambda_{2}$ . It may be obtained as a consequence of Lemmas 3.1-3.2.

7. Uniform Bounds. Comparison with Normal Approximation

A different choice of the parameter $r$ in the proof of Theorem 1.3 may provide various uniform bounds in the Poisson approximation, like in the next assertion. Using the $L^{\infty}(\mu)$ -norm with respect to the counting measure $\mu$ on ${\mathbb{Z}}$ , let us focus on the deviations of the densities of $W$ and $Z$ and the deviations of their distribution functions. These distances are thus given by

[TABLE]

Putting $r=1$ in (5.6), we arrive at the next assertion which sharpens Proposition 4.1.

Theorem 7.1. We have

[TABLE]

This uniform bound is not new; with a non-explicit numerical factor, it corresponds to Theorem 3.1 in Čekanavičius [3], p. 53. For $\lambda\leq 1$ , this relation simplifies to

[TABLE]

which cannot be improved (modulo a numerical factor) in view of the lower bounds on $|\Delta_{k}|$ with $k=0,1,2$ mentioned in Section 3. We also have a similar bound for the Kolmogorov distance, $K(W,Z)\leq C\lambda_{2}$ , which follows from the upper bound for the stronger total variation distance as in Theorem 1.1.

When, however, $\lambda$ is large (and say all $p_{j}\leq 1/2$ ), one would expect to achieve more accurate bounds when replacing the Poisson approximation for $P_{W}$ by the normal law $N(\lambda,\lambda)$ with mean $\lambda$ and variance $\lambda$ . Indeed, suppose, for example, that $p_{j}=1/2$ , so that $W$ has a binomial distribution with parameters $(n,1/2)$ , while the approximating Poisson distribution has parameter $\lambda=n/2$ with $\lambda_{2}=n/4$ . Here (1.2) only yields $d(W,Z)\sim 1$ , which means that there is no Poisson approximation with respect to the total variation! Nevertheless, the approximation is still meaningful in a weaker sense in terms of the Kolmogorov distance $K$ , as well as in terms of $M$ . In this case, both $P_{W}$ and $P_{\lambda}$ are almost equal to $N(\lambda,\lambda)$ , and the Berry-Esseen theorem provides a correct bound $K(W,Z)\leq\frac{c}{\sqrt{n}}$ via the triangle inequality for $K$ . Since $M\leq 2K$ (which holds true for all probability distributions on ${\mathbb{Z}}$ ), we also have $M(W,Z)\leq\frac{c}{\sqrt{n}}$ . Note that this inequality also follows from Theorem 7.1. Indeed, when $\lambda_{2}\leq\frac{1}{2}\,\lambda$ , (7.1) is simplified to

[TABLE]

which yields a correct order for growing $n$ . Thus, the two approaches are equivalent for this particular (i.i.d.) example.
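The binomial example can be illustrated numerically. The sketch below is an illustration only: it computes the $L^{1}$-type distance $d(W,Z)=\sum_{k}|{\mathbb{P}}\{W=k\}-{\mathbb{P}}\{Z=k\}|$ and the uniform distance $M(W,Z)=\sup_{k}|{\mathbb{P}}\{W=k\}-{\mathbb{P}}\{Z=k\}|$ for $W$ binomial with parameters $(n,1/2)$ and $Z$ Poisson with $\lambda=n/2$. The truncation of the Poisson support at $2n$ and all function names are assumptions made here for convenience.

```python
import math

def binomial_half_pmf(n):
    """Masses of Bin(n, 1/2) on 0..n, computed via log-gamma."""
    log_norm = -n * math.log(2.0)
    return [
        math.exp(math.lgamma(n + 1) - math.lgamma(k + 1)
                 - math.lgamma(n - k + 1) + log_norm)
        for k in range(n + 1)
    ]

def poisson_pmf_list(lam, kmax):
    """Masses of Poisson(lam) on 0..kmax, computed via log-gamma."""
    return [
        math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        for k in range(kmax + 1)
    ]

def d_and_M(n):
    """Return (d, M) for W ~ Bin(n, 1/2) and Z ~ Poisson(n/2).

    The Poisson support is truncated at kmax = 2n (an assumption of this
    sketch); the neglected mass beyond 4*lam is negligible for moderate n.
    """
    lam = n / 2
    kmax = 2 * n
    w = binomial_half_pmf(n) + [0.0] * (kmax - n)
    z = poisson_pmf_list(lam, kmax)
    diffs = [abs(a - b) for a, b in zip(w, z)]
    return sum(diffs), max(diffs)

if __name__ == "__main__":
    for n in (100, 400, 1600):
        d, M = d_and_M(n)
        print(n, d, M, M * math.sqrt(n))
```

Printing $M\sqrt{n}$ alongside $d$ makes the contrast visible: $d$ stays of order one as $n$ grows, while $M$ decays at the rate $n^{-1/2}$.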

To see whether the normal approximation is better or worse than the Poisson approximation in the general non-i.i.d. situation (that is, with different $p_{j}$ ’s), let us evaluate the corresponding Lyapunov ratio in the central limit theorem and apply the Berry-Esseen bound $K(W,N_{\lambda})\leq cL_{3}$ , where the random variable $N_{\lambda}$ is distributed according to $N(\lambda,\lambda)$ . Since ${\rm Var}(W)=\sum_{j=1}^{n}p_{j}q_{j}=\lambda-\lambda_{2}$ , the Lyapunov ratio for the sequence $X_{1},\dots,X_{n}$ is given by

[TABLE]

(note that $\frac{1}{2}\leq p_{j}^{2}+q_{j}^{2}\leq 1$ ). Hence $K(W,N_{\lambda})\leq\frac{c}{\sqrt{\lambda-\lambda_{2}}}$ , up to some absolute constant $c>0$ . A similar bound holds for $Z$ as well when representing $Z$ as the sum of $n$ independent Poisson random variables $Z_{j}$ with parameters $p_{j}$ . Namely, for the sequence $Z_{1},\dots,Z_{n}$ , we have

[TABLE]

Therefore, $K(Z,N_{\lambda})\leq\frac{c}{\sqrt{\lambda}}$ and hence, by the triangle inequality, $K(W,Z)\leq\frac{c}{\sqrt{\lambda-\lambda_{2}}}$ . In particular, in a typical situation where $\lambda_{2}\leq\frac{1}{2}\,\lambda$ , the normal approximation yields

[TABLE]

with some absolute constant $c$ . But this bound is, perhaps surprisingly, worse than (7.2) as long as $\lambda_{2}=o(\lambda)$ .
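The Lyapunov ratio for Bernoulli summands discussed above can be evaluated directly. The following sketch assumes the standard definition $L_{3}=\sum_{j}{\mathbb{E}}\,|X_{j}-p_{j}|^{3}/{\rm Var}(W)^{3/2}$ together with the identity ${\mathbb{E}}\,|X_{j}-p_{j}|^{3}=p_{j}q_{j}(p_{j}^{2}+q_{j}^{2})$ for a Bernoulli variable; the function name is hypothetical.

```python
import math

def lyapunov_ratio_bernoulli(p):
    """Lyapunov ratio L_3 for independent Bernoulli(p_j) summands.

    Uses E|X_j - p_j|^3 = p_j q_j (p_j^2 + q_j^2) and
    Var(W) = sum_j p_j q_j = lambda - lambda_2.
    """
    var = sum(x * (1 - x) for x in p)
    third = sum(x * (1 - x) * (x * x + (1 - x) ** 2) for x in p)
    return third / var ** 1.5

if __name__ == "__main__":
    p = [1 / (2 * math.sqrt(j)) for j in range(1, 401)]
    var = sum(x * (1 - x) for x in p)
    # Since 1/2 <= p_j^2 + q_j^2 <= 1, L_3 lies between
    # 1/(2 sqrt(Var W)) and 1/sqrt(Var W).
    print(0.5 / math.sqrt(var), lyapunov_ratio_bernoulli(p), 1 / math.sqrt(var))
```

The two-sided comparison printed at the end reflects the remark that $\frac{1}{2}\leq p_{j}^{2}+q_{j}^{2}\leq 1$, so $L_{3}$ is of the exact order $(\lambda-\lambda_{2})^{-1/2}$.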

Consider as an example $p_{j}=1/(2\sqrt{j})$ for $j=1,\dots,n$ . Then $\lambda\sim\sqrt{n}$ , $\lambda_{2}\sim\log n$ , and we get $M(W,Z)\leq cn^{-3/4}\log n$ in (7.2), while (7.3) only yields $M(W,Z)\leq cn^{-1/4}$ . This example is also illustrative when comparing Theorem 1.2 with (1.5). The first one provides a correct asymptotic $D(W,Z)\sim\frac{\log^{2}n}{n}$ (within absolute factors), while (1.5) only gives $D(W,Z)\leq c$ .
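The example $p_{j}=1/(2\sqrt{j})$ can be explored numerically. The sketch below is a hedged illustration: it computes the exact distribution of $W$ by successive convolution, evaluates $M(W,Z)$ on the range $0\leq k\leq n$, and compares the two rates $\lambda_{2}/\lambda^{3/2}$ and $1/\sqrt{\lambda}$. These two rates are stated here without the absolute constants, which are not specified in the text, and the function names are hypothetical.

```python
import math

def bernoulli_sum_pmf(p):
    """Exact masses of W = X_1 + ... + X_n for independent Bernoulli(p_j)."""
    pmf = [1.0]
    for pj in p:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1 - pj)   # X_j = 0
            new[k + 1] += mass * pj     # X_j = 1
        pmf = new
    return pmf

def compare_rates(n):
    """Return (lam, lam2, M, lam2/lam^{3/2}, 1/sqrt(lam)) for p_j = 1/(2 sqrt(j)).

    M is the maximum of |P{W=k} - P{Z=k}| over 0 <= k <= n; Poisson masses
    beyond n are negligible here because lam is of order sqrt(n).
    """
    p = [1 / (2 * math.sqrt(j)) for j in range(1, n + 1)]
    lam = sum(p)
    lam2 = sum(x * x for x in p)
    w = bernoulli_sum_pmf(p)
    z = [math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
         for k in range(n + 1)]
    M = max(abs(a - b) for a, b in zip(w, z))
    return lam, lam2, M, lam2 / lam ** 1.5, 1 / math.sqrt(lam)

if __name__ == "__main__":
    for n in (100, 400, 1000):
        print(n, compare_rates(n))
```

The quadratic-time convolution is adequate for $n$ in the low thousands; for larger $n$ a faster method would be needed.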

Acknowledgement. The authors would like to thank Igal Sason and two referees for valuable comments and for drawing our attention to additional references related to the Poisson approximation in informational distances.

Bibliography

[1] Barbour, A. D.; Hall, P. On the rate of Poisson convergence. Math. Proc. Cambridge Philos. Soc. 95 (1984), no. 3, 473–480.
[2] Barbour, A. D.; Holst, L.; Janson, S. Poisson approximation. Oxford Studies in Probability, 2. Oxford Science Publications. The Clarendon Press, Oxford University Press, New York, 1992. x+277 pp.
[3] Čekanavičius, V. Approximation methods in Probability Theory. Universitext. Springer (2016), 274 pp.
[4] Chen, L. H. Y. Poisson approximation for dependent trials. Ann. Probability 3 (1975), no. 3, 534–545.
[5] van Erven, T.; Harremoës, P. Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inform. Theory 60 (2014), no. 7, 3797–3820.
[6] Harremoës, P. Binomial and Poisson distributions as maximum entropy distributions. IEEE Trans. Inform. Theory 47 (2001), no. 5, 2039–2041.
[7] Harremoës, P. Convergence to Poisson distribution in information divergence. Preprint 2, Math. Department, University of Copenhagen, Feb. 2003.
[8] Harremoës, P.; Johnson, O.; Kontoyiannis, I. Thinning and information projections. arXiv:1601.04255, Jan. 2016.