Non-Uniform Bounds in the Poisson Approximation with Applications to Informational Distances. II

S.G. Bobkov; G.P. Chistyakov; F. Götze

arXiv:1906.09156·math.PR·August 15, 2019

TL;DR

This paper extends previous work on bounds for how closely sums of independent Bernoulli variables approximate a Poisson distribution, using various informational distances, without parameter restrictions.

Contribution

It generalizes earlier results by removing parameter constraints, providing asymptotically optimal bounds for distribution deviations in informational distances.

Findings

1. Derived bounds for Bernoulli sums in terms of Shannon and Rényi distances
2. Extended previous results to all Bernoulli parameters without restrictions
3. Provided asymptotically optimal bounds for distribution deviations

Abstract

We explore asymptotically optimal bounds for deviations of distributions of independent Bernoulli random variables from the Poisson limit in terms of the Shannon relative entropy and Rényi/Tsallis relative distances (including Pearson's $\chi^{2}$). This part generalizes the results obtained in Part I and removes any constraints on the parameters of the Bernoulli distributions.

Equations

$$w_{k}={\mathbb{P}}\{W=k\}=\sum p_{1}^{\varepsilon_{1}}q_{1}^{1-\varepsilon_{1}}\dots p_{n}^{\varepsilon_{n}}q_{n}^{1-\varepsilon_{n}},\qquad k=0,1,\dots,n,$$

$$v_{k}={\mathbb{P}}\{Z=k\}=\frac{\lambda^{k}}{k!}\,e^{-\lambda},\qquad k=0,1,\dots$$

$$\frac{1}{32}\,\min(1,1/\lambda)\,\lambda_{2}\leq\frac{1}{2}\,d(W,Z)\leq\frac{1-e^{-\lambda}}{\lambda}\,\lambda_{2}.$$

$$D_{\alpha}=D_{\alpha}(W||Z)=\frac{1}{\alpha-1}\,\log\sum_{k=0}^{\infty}\Big(\frac{w_{k}}{v_{k}}\Big)^{\alpha}\,v_{k}$$

$$T_{\alpha}=T_{\alpha}(W||Z)=\frac{1}{\alpha-1}\,\bigg[\sum_{k=0}^{\infty}\Big(\frac{w_{k}}{v_{k}}\Big)^{\alpha}\,v_{k}-1\bigg].$$

$$D=D_{1}=T_{1}=\sum_{k=0}^{\infty}w_{k}\log\frac{w_{k}}{v_{k}},\qquad T_{2}=\chi^{2}=\sum_{k=0}^{\infty}\frac{(w_{k}-v_{k})^{2}}{v_{k}}.$$

$$\lim\frac{\chi^{2}}{(\lambda_{2}/\lambda)^{2}}=\frac{1}{2}\quad\text{as}\quad\lambda^{6}\lambda_{2}\to 0.$$

$$\frac{1}{4}\,\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\leq D\leq\chi^{2}\leq c\,\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}.$$

$$\chi^{2}\leq 2\,(\sqrt{e}-1)^{2}\,\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\Big(1-\frac{\lambda_{2}}{\lambda}\Big)^{-3}$$

$$F=\frac{\max(1,\lambda)}{\max(1,\lambda-\lambda_{2})}.$$

$$D\sim\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\,(1+\log F),\qquad\chi^{2}\sim\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\,\sqrt{F}.$$

$$D\,\sim\,\log\frac{\lambda}{\max\{1,\lambda-\lambda_{2}\}},\qquad\chi^{2}\,\sim\,\bigg(\frac{\lambda}{\max\{1,\lambda-\lambda_{2}\}}\bigg)^{1/2}.$$

$$D=\log\frac{1}{{\mathbb{P}}\{Z=n\}}=\log\Big(\frac{n!}{n^{n}}\,e^{n}\Big)\sim\log n,$$

$$\chi^{2}=\frac{1}{{\mathbb{P}}\{Z=n\}}-1=\frac{n!}{n^{n}}\,e^{n}-1\sim\sqrt{2\pi n}.$$

$$T_{\alpha}\sim\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\,F^{\frac{\alpha-1}{2}}$$

$$H(W||Z)=H(Z)-H(W),$$

$$H(Z)=-\sum_{k}v_{k}\log v_{k},\qquad H(W)=-\sum_{k}w_{k}\log w_{k}.$$

$$H(W||Z)\leq C_{\lambda}\,\frac{\lambda_{2}}{\lambda}.$$

$$D(W||Z)=\sum_{k}w_{k}\log\frac{w_{k}}{v_{k}},\qquad\chi^{2}(W,Z)=\sum_{k}\frac{(w_{k}-v_{k})^{2}}{v_{k}}.$$

$$-\sum_{w_{k}<v_{k}}w_{k}\log\frac{w_{k}}{v_{k}}\leq 1.$$

$$D(W||Z)\geq\frac{1}{2}\sum_{k}\frac{(w_{k}-v_{k})^{2}}{\max\{w_{k},v_{k}\}}.$$

$$\sum_{w_{k}<v_{k}}w_{k}\log\frac{w_{k}}{v_{k}}$$

$$\sum_{w_{k}<v_{k}}(w_{k}-v_{k})=-\frac{1}{2}\sum_{k=0}^{\infty}|w_{k}-v_{k}|\geq-1,$$

$$\sum_{w_{k}>v_{k}}w_{k}\log\frac{w_{k}}{v_{k}}$$

$$\sum_{k}w_{k}\log\frac{w_{k}}{v_{k}}\geq\frac{1}{2}\sum_{w_{k}>v_{k}}\frac{(w_{k}-v_{k})^{2}}{w_{k}}+\frac{1}{2}\sum_{w_{k}<v_{k}}\frac{(w_{k}-v_{k})^{2}}{v_{k}},$$

$$D(W_{1}+W_{2}||Z_{1}+Z_{2})\leq D(W_{1}||Z_{1})+D(W_{2}||Z_{2}).$$

$$\chi^{2}(W_{1}+W_{2},Z_{1}+Z_{2})+1\leq(\chi^{2}(W_{1},Z_{1})+1)\,(\chi^{2}(W_{2},Z_{2})+1).$$

$$\sum_{k=0}^{\infty}\frac{{\mathbb{P}}\{W_{1}+W_{2}=k\}^{\alpha}}{{\mathbb{P}}\{Z_{1}+Z_{2}=k\}^{\alpha-1}}\leq\sum_{k=0}^{\infty}\frac{{\mathbb{P}}\{W_{1}=k\}^{\alpha}}{{\mathbb{P}}\{Z_{1}=k\}^{\alpha-1}}\,\sum_{k=0}^{\infty}\frac{{\mathbb{P}}\{W_{2}=k\}^{\alpha}}{{\mathbb{P}}\{Z_{2}=k\}^{\alpha-1}}$$

$${\mathbb{P}}\{Z=0\}$$

$${\mathbb{P}}\{W=0\}$$

$$\frac{1}{4}\,\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\,\leq\,D(W||Z)\,\leq\,\chi^{2}(W,Z)\,\leq\,C_{\lambda}\,\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2},$$

Full text

School of Mathematics, University of Minnesota, USA; research was partially supported by the Simons Foundation and the NSF grant DMS-1855575

Faculty of Mathematics, University of Bielefeld, Germany; research was partially supported by SFB 1283

NON-UNIFORM BOUNDS IN THE POISSON APPROXIMATION WITH APPLICATIONS TO INFORMATIONAL DISTANCES. II

S. G. Bobkov, G. P. Chistyakov, and F. Götze

Sergey G. Bobkov, School of Mathematics, University of Minnesota, 127 Vincent Hall, 206 Church St. S.E., Minneapolis, MN 55455 USA

Gennadiy P. Chistyakov, Fakultät für Mathematik, Universität Bielefeld, Postfach 100131, 33501 Bielefeld, Germany

Friedrich Götze, Fakultät für Mathematik, Universität Bielefeld, Postfach 100131, 33501 Bielefeld, Germany

Abstract.

We explore asymptotically optimal bounds for deviations of distributions of independent Bernoulli random variables from the Poisson limit in terms of the Shannon relative entropy and Rényi/relative Tsallis distances (including Pearson’s $\chi^{2}$). This part generalizes the results obtained in Part I and removes any constraints on the parameters of the Bernoulli distributions.

Key words and phrases:

$\chi^{2}$-divergence, Relative entropy, Poisson approximation

1991 Mathematics Subject Classification:

Primary 60E, 60F

1. Introduction

Let $W=X_{1}+\dots+X_{n}$ be the sum of independent random variables $X_{j}$ taking values $1$ and $0$ with respective probabilities $p_{j}$ and $q_{j}=1-p_{j}$. Thus,

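$$w_{k}={\mathbb{P}}\{W=k\}=\sum p_{1}^{\varepsilon_{1}}q_{1}^{1-\varepsilon_{1}}\dots p_{n}^{\varepsilon_{n}}q_{n}^{1-\varepsilon_{n}},\qquad k=0,1,\dots,n, \tag{1.1}$$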

where the summation runs over all 0-1 sequences $\varepsilon_{1},\dots,\varepsilon_{n}$ such that $\varepsilon_{1}+\dots+\varepsilon_{n}=k$ .
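As a small illustration of this definition, the following Python sketch computes the probabilities $w_{k}$ in two ways: by direct enumeration over all 0-1 sequences, and by adding one Bernoulli variable at a time. The function names and the example probabilities are illustrative choices and are not taken from the paper.

```python
from itertools import product
from math import isclose, prod


def pmf_by_enumeration(p):
    """w_k as the sum over all 0-1 sequences with exactly k ones."""
    n = len(p)
    w = [0.0] * (n + 1)
    for eps in product((0, 1), repeat=n):
        weight = prod(pj if e else 1.0 - pj for pj, e in zip(p, eps))
        w[sum(eps)] += weight
    return w


def pmf_by_convolution(p):
    """The same probabilities, built by adding one Bernoulli variable at a time."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)   # X_j = 0
            nxt[k + 1] += wk * pj       # X_j = 1
        w = nxt
    return w


p_example = [0.1, 0.25, 0.4, 0.05]  # hypothetical parameters p_1, ..., p_n
w_enum = pmf_by_enumeration(p_example)
w_conv = pmf_by_convolution(p_example)
agree = all(isclose(a, b, abs_tol=1e-12) for a, b in zip(w_enum, w_conv))
```

The enumeration costs $2^{n}$ operations and is only meant as a check; the convolution costs of order $n^{2}$.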

Denote by $Z$ a Poisson random variable with parameter $\lambda=p_{1}+\dots+p_{n}$, i.e., taking non-negative integer values with probabilities

$$v_{k}={\mathbb{P}}\{Z=k\}=\frac{\lambda^{k}}{k!}\,e^{-\lambda},\qquad k=0,1,\dots \tag{1.2}$$

It is well known that, if all $p_{j}$ are small, the distribution of $Z$ approximates the distribution of $W$ in terms of the total variation distance $d(W,Z)=\sum_{k=0}^{\infty}\,|w_{k}-v_{k}|$. In particular, involving the functional $\lambda_{2}=p_{1}^{2}+\dots+p_{n}^{2}$, Barbour and Hall [2] derived a two-sided bound

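$$\frac{1}{32}\,\min(1,1/\lambda)\,\lambda_{2}\leq\frac{1}{2}\,d(W,Z)\leq\frac{1-e^{-\lambda}}{\lambda}\,\lambda_{2}. \tag{1.3}$$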
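The two-sided bound of Barbour and Hall, $\frac{1}{32}\min(1,1/\lambda)\,\lambda_{2}\leq\frac{1}{2}\,d(W,Z)\leq\frac{1-e^{-\lambda}}{\lambda}\,\lambda_{2}$, can be checked numerically on a small example. The sketch below is only an illustration: the parameters are hypothetical, and the Poisson tail is truncated at a point where the remaining mass is negligible.

```python
from math import exp


def bernoulli_sum_pmf(p):
    """Probabilities w_0, ..., w_n of W = X_1 + ... + X_n."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)
            nxt[k + 1] += wk * pj
        w = nxt
    return w


def poisson_pmf(lam, kmax):
    """v_0, ..., v_kmax via the recursion v_k = v_{k-1} * lam / k."""
    v = [exp(-lam)]
    for k in range(1, kmax + 1):
        v.append(v[-1] * lam / k)
    return v


def total_variation(p, extra_terms=60):
    """d(W, Z) = sum_k |w_k - v_k|, with the Poisson tail cut off beyond n + extra_terms."""
    w = bernoulli_sum_pmf(p)
    kmax = len(w) - 1 + extra_terms
    v = poisson_pmf(sum(p), kmax)
    w = w + [0.0] * (kmax + 1 - len(w))
    return sum(abs(a - b) for a, b in zip(w, v))


p_example = [0.1, 0.25, 0.4, 0.05]  # hypothetical parameters
lam = sum(p_example)
lam2 = sum(pj * pj for pj in p_example)
d = total_variation(p_example)
lower = min(1.0, 1.0 / lam) * lam2 / 32.0
upper = (1.0 - exp(-lam)) / lam * lam2
```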

There is considerable interest as well in the question of Poisson approximation for (stronger) informational distances, including the Rényi divergences or, equivalently, the Tsallis relative entropies in their full hierarchy. Being well-defined in the setting of abstract measure spaces (cf. e.g. [6], [3]), in the discrete model specified above these important quantities are respectively given for any parameter $\alpha>0$ by

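$$D_{\alpha}=D_{\alpha}(W||Z)=\frac{1}{\alpha-1}\,\log\sum_{k=0}^{\infty}\Big(\frac{w_{k}}{v_{k}}\Big)^{\alpha}\,v_{k}$$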

and

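$$T_{\alpha}=T_{\alpha}(W||Z)=\frac{1}{\alpha-1}\,\bigg[\sum_{k=0}^{\infty}\Big(\frac{w_{k}}{v_{k}}\Big)^{\alpha}\,v_{k}-1\bigg].$$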

The functions $\alpha\rightarrow D_{\alpha}$ and $\alpha\rightarrow T_{\alpha}=\frac{1}{\alpha-1}\,(\exp\{(\alpha-1)\,D_{\alpha}\}-1)$ are non-decreasing, and in the particular cases $\alpha=1$ and $\alpha=2$ , we deal with the more familiar relative entropy (Kullback-Leibler distance) and the Pearson $\chi^{2}$ -distance

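$$D=D_{1}=T_{1}=\sum_{k=0}^{\infty}w_{k}\log\frac{w_{k}}{v_{k}},\qquad T_{2}=\chi^{2}=\sum_{k=0}^{\infty}\frac{(w_{k}-v_{k})^{2}}{v_{k}}.$$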
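To make these definitions concrete, the following Python sketch evaluates $D_{\alpha}$, $T_{\alpha}$, $D$ and $\chi^{2}=T_{2}$ for a Bernoulli sum against the Poisson law with the same mean. It is a minimal illustration with hypothetical parameters; the helper names are not from the paper. Since $w_{k}=0$ for $k>n$, the sums $\sum_{k}(w_{k}/v_{k})^{\alpha}v_{k}$ only involve $k\leq n$.

```python
from math import exp, log


def bernoulli_sum_pmf(p):
    """Probabilities w_0, ..., w_n of W = X_1 + ... + X_n."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)
            nxt[k + 1] += wk * pj
        w = nxt
    return w


def poisson_pmf(lam, kmax):
    """v_0, ..., v_kmax via the recursion v_k = v_{k-1} * lam / k."""
    v = [exp(-lam)]
    for k in range(1, kmax + 1):
        v.append(v[-1] * lam / k)
    return v


def power_sum(w, v, alpha):
    """sum_k (w_k / v_k)^alpha v_k over the support of W (other terms vanish)."""
    return sum((wk / vk) ** alpha * vk for wk, vk in zip(w, v) if wk > 0.0)


def renyi(w, v, alpha):
    """Renyi divergence D_alpha for alpha != 1."""
    return log(power_sum(w, v, alpha)) / (alpha - 1.0)


def tsallis(w, v, alpha):
    """Tsallis relative entropy T_alpha for alpha != 1."""
    return (power_sum(w, v, alpha) - 1.0) / (alpha - 1.0)


def relative_entropy(w, v):
    """D = D_1 = T_1, the Kullback-Leibler distance."""
    return sum(wk * log(wk / vk) for wk, vk in zip(w, v) if wk > 0.0)


p_example = [0.1, 0.25, 0.4, 0.05]  # hypothetical parameters
w = bernoulli_sum_pmf(p_example)
v = poisson_pmf(sum(p_example), len(w) - 1)
D = relative_entropy(w, v)
chi2 = tsallis(w, v, 2.0)  # T_2 coincides with the Pearson chi^2-distance
```

On such an example one can observe the monotonicity in $\alpha$ and the identity $T_{\alpha}=\frac{1}{\alpha-1}(\exp\{(\alpha-1)D_{\alpha}\}-1)$ up to rounding errors.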

We refer to [13] and [4] for historical references related to the lower and upper bounds as in (1.3), as well as to recent developments on the problem of bounding $D$ and $\chi^{2}$. Here, let us only mention a few results in this direction.

In a rather general asymptotic regime (which is typical in applications), Borisov and Vorozheĭkin [5] observed that $\chi^{2}$ is approximately $\frac{1}{2}\,\big(\frac{\lambda_{2}}{\lambda}\big)^{2}$, and more precisely,

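$$\lim\frac{\chi^{2}}{(\lambda_{2}/\lambda)^{2}}=\frac{1}{2}\quad\text{as}\quad\lambda^{6}\lambda_{2}\to 0.$$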

On the other hand, Harremoës, Johnson and Kontoyiannis [8] have recently derived a universal lower bound on the relative entropy, $D\geq\frac{1}{4}\,\big(\frac{\lambda_{2}}{\lambda}\big)^{2}$. Here, the constant $\frac{1}{4}$ is best possible and is asymptotically attained in the case of equal probabilities $p_{j}$ [9]. It is therefore natural to wonder whether or not there are two-sided bounds such as

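$$\frac{1}{4}\,\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\leq D\leq\chi^{2}\leq c\,\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}. \tag{1.4}$$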

This turns out to be true in the case where $\lambda_{2}/\lambda$ is bounded away from 1. Based on orthogonal expansions in Charlier polynomials over the Poisson measure and using the Parseval identity in this context, Zacharovas and Hwang [13] obtained a superior upper bound

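$$\chi^{2}\leq 2\,(\sqrt{e}-1)^{2}\,\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\Big(1-\frac{\lambda_{2}}{\lambda}\Big)^{-3} \tag{1.5}$$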

(among other similar results for different distances). Consequently, if for example $\frac{\lambda_{2}}{\lambda}\leq\frac{1}{2}$ , then (1.4) is fulfilled with $c=6.74$ .
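The value $c=6.74$ can be checked by a short computation, added here for convenience. In the Zacharovas and Hwang bound $\chi^{2}\leq 2(\sqrt{e}-1)^{2}\big(\frac{\lambda_{2}}{\lambda}\big)^{2}\big(1-\frac{\lambda_{2}}{\lambda}\big)^{-3}$, the last factor is increasing in $\lambda_{2}/\lambda$, so on the range $\lambda_{2}/\lambda\leq\frac{1}{2}$ it is at most $8$:

```latex
2\,(\sqrt{e}-1)^{2}\,\Big(1-\frac{1}{2}\Big)^{-3}
  = 16\,(\sqrt{e}-1)^{2}
  \approx 16\cdot 0.42084
  \approx 6.7334 < 6.74 .
```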

The upper estimate such as (1.4) also appears as a consequence of non-uniform bounds which have been recently studied in [4]. It was shown there that $\frac{w_{k}-v_{k}}{v_{k}}$ is of order at most $\lambda_{2}$ on a large part of the support of the Poisson measure, especially when $\lambda$ is large. One of the aims of this paper is to extend (1.4) modulo absolute constants to the whole range of $(\lambda,\lambda_{2})$. To formulate results in a compact form, let us use the notation $Q_{1}\sim Q_{2}$, whenever two positive quantities are related by $c_{1}Q_{1}\leq Q_{2}\leq c_{2}Q_{1}$ with some absolute constants $c_{j}>0$. Introduce the quantity

$$F=\frac{\max(1,\lambda)}{\max(1,\lambda-\lambda_{2})}.$$

Clearly, $F\geq 1$ .

Theorem 1.1. We have

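$$D\sim\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\,(1+\log F),\qquad\chi^{2}\sim\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\,\sqrt{F}. \tag{1.6}$$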

If $\frac{\lambda_{2}}{\lambda}$ is bounded away from 1, then $F$ is bounded, and (1.6) recovers (1.4). A similar conclusion is also true when $\lambda$ is not large, say $\lambda\leq 10$, which is typical for applications (note that for such $\lambda$’s, $\frac{\lambda_{2}}{\lambda}$ may be close to 1, and then (1.5) fails to be optimal). On the other hand, if these two assumptions on $\lambda$ and $\lambda_{2}$ are violated (which we henceforth call the “degenerate case”), both distances are bounded away from zero and can be large, since then

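$$D\,\sim\,\log\frac{\lambda}{\max\{1,\lambda-\lambda_{2}\}},\qquad\chi^{2}\,\sim\,\bigg(\frac{\lambda}{\max\{1,\lambda-\lambda_{2}\}}\bigg)^{1/2}.$$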

This shows that the lower bound for $D$ in (1.4) may not be reversed in general. Indeed, in the extreme case with all $p_{j}=1$ , we have $\lambda_{2}=\lambda=n$ . Here ${\mathbb{P}}\{W=n\}=1$ , hence as $n\rightarrow\infty$

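$$D=\log\frac{1}{{\mathbb{P}}\{Z=n\}}=\log\Big(\frac{n!}{n^{n}}\,e^{n}\Big)\sim\log n,$$

$$\chi^{2}=\frac{1}{{\mathbb{P}}\{Z=n\}}-1=\frac{n!}{n^{n}}\,e^{n}-1\sim\sqrt{2\pi n}.$$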
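The growth in this extreme case is easy to observe numerically. The sketch below (an illustration only, with hypothetical helper names) evaluates $D=\log\big(\frac{n!}{n^{n}}e^{n}\big)$ through the log-gamma function and compares it with the Stirling value $\frac{1}{2}\log(2\pi n)$; it also reports the ratio of $\chi^{2}=e^{D}-1$ to $\sqrt{2\pi n}$.

```python
from math import exp, lgamma, log, pi, sqrt


def log_inverse_poisson_at_mean(n):
    """log(1 / P{Z = n}) for Z ~ Poisson(n), that is, log(n! e^n / n^n)."""
    return lgamma(n + 1) + n - n * log(n)


rows = []
for n in (10, 100, 1000, 10000):
    D = log_inverse_poisson_at_mean(n)      # relative entropy when all p_j = 1
    chi2 = exp(D) - 1.0                      # Pearson distance in the same case
    stirling = 0.5 * log(2.0 * pi * n)       # leading Stirling term for D
    rows.append((n, D, stirling, chi2 / sqrt(2.0 * pi * n)))

for row in rows:
    print(row)
```

Here $F=n$, so the observed orders $\log n$ and $\sqrt{n}$ agree with the factors $1+\log F$ and $\sqrt{F}$ in Theorem 1.1.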

As a next step, we employ the non-uniform bounds of [4] to extend (1.4) and (1.6) to all Tsallis entropies.

Theorem 1.2. Given $\alpha\,>\,1$ ,

$$T_{\alpha}\sim\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\,F^{\frac{\alpha-1}{2}}$$

with involved constants depending on $\alpha$ . In particular, $T_{\alpha}\leq c_{\alpha}\chi^{2}$ as long as $\frac{\lambda_{2}}{\lambda}\leq\frac{1}{2}$ .

Let us finally mention one application of Theorem 1.1 to the problem of the estimation of the difference of entropies

$$H(W||Z)=H(Z)-H(W),$$

where $H$ stands for the Shannon entropy, that is,

$$H(Z)=-\sum_{k}v_{k}\log v_{k},\qquad H(W)=-\sum_{k}w_{k}\log w_{k}.$$

The property that $H(W||Z)$ is positive is a consequence of the assertion, recently proved by Hillion and Johnson [10], that $H(p)\equiv H(W)$ is a concave function of the vector $p=(p_{1},\dots,p_{n})$ . Indeed, since $H(p)$ is invariant under permutations of $p_{j}$ , this entropy attains its maximum on the simplex $p_{j}\geq 0$ , $p_{1}+\dots+p_{n}=\lambda$ at the point where all coordinates coincide, that is, for $p_{j}=\lambda/n$ . But in that case, the distribution of $W$ represents the binomial law with parameters $n$ and $\lambda/n$ whose entropy is dominated by $H(Z)$ , as was shown by Harremoës [7].

Thus, the difference of entropies in this particular model may be viewed as a kind of informational distance. Sason proposed to bound $H(W||Z)$ for equal $p_{j}$’s by means of the so-called maximal coupling, cf. [12]. Here, we show that this distance may be controlled in terms of $\chi^{2}(W,Z)$, which together with the upper bound on the Pearson distance as in (1.4)-(1.5) leads to the following estimate.

Corollary 1.3. With some constants $C_{\lambda}$ depending only on $\lambda$ , we have

$$H(W||Z)\leq C_{\lambda}\,\frac{\lambda_{2}}{\lambda}.$$

If $\lambda_{2}\leq\frac{1}{2}\,\lambda$ , one may take $C_{\lambda}=C\log(2+\lambda)$ with an absolute constant $C$ .

Below, we start with some general bounds involving the relative entropy and the Pearson distance (Section 2). In Section 3, we describe several results obtained in [4] in the non-degenerate case, and employ there some bounds for the probability function of the Poisson law. The remaining parts are devoted to the proof of Theorems 1.1 and 1.2 in the degenerate case (Sections 4-10) and of Corollary 1.3 (Section 11). Thus, the paper is structured as follows:

1. Introduction
2. General bounds on relative entropy and $\chi^{2}$
3. Poisson approximation in the non-degenerate case
4. Upper bounds on $D$ and $\chi^{2}$
5. Lower bound on $\chi^{2}$
6. Lower bound on $D$
7. Proof of Theorem 1.1
8. Tsallis versus Vajda-Pearson
9. Estimates of Vajda-Pearson distances
10. Proof of Theorem 1.2
11. Difference of entropies

2. General Bounds on Relative Entropy and $\chi^{2}$

Before turning to the problem of lower and upper bounds for the relative entropy and $\chi^{2}$ -distance, we first collect several useful general inequalities. If two discrete random elements $W$ and $Z$ in a measurable space $\Omega$ take at most countably many values $\omega_{k}\in\Omega$ with probabilities $w_{k}={\mathbb{P}}\{W=\omega_{k}\}$ and $v_{k}={\mathbb{P}}\{Z=\omega_{k}\}$ , the above distances are defined canonically by

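$$D(W||Z)=\sum_{k}w_{k}\log\frac{w_{k}}{v_{k}},\qquad\chi^{2}(W,Z)=\sum_{k}\frac{(w_{k}-v_{k})^{2}}{v_{k}}.$$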

Proposition 2.1. We have

$$-\sum_{w_{k}<v_{k}}w_{k}\log\frac{w_{k}}{v_{k}}\leq 1. \tag{2.1}$$

Moreover,

$$D(W||Z)\geq\frac{1}{2}\sum_{k}\frac{(w_{k}-v_{k})^{2}}{\max\{w_{k},v_{k}\}}. \tag{2.2}$$

Proof. Using the Taylor formula for the logarithmic function, write

[TABLE]

Here

$$\sum_{w_{k}<v_{k}}(w_{k}-v_{k})=-\frac{1}{2}\sum_{k=0}^{\infty}|w_{k}-v_{k}|\geq-1,$$

thus proving the first assertion. Similarly, we have a second identity

[TABLE]

Adding the two identities, we get

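$$\sum_{k}w_{k}\log\frac{w_{k}}{v_{k}}\geq\frac{1}{2}\sum_{w_{k}>v_{k}}\frac{(w_{k}-v_{k})^{2}}{w_{k}}+\frac{1}{2}\sum_{w_{k}<v_{k}}\frac{(w_{k}-v_{k})^{2}}{v_{k}},$$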

which is the desired inequality (2.2). ∎
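Both assertions of Proposition 2.1, namely $-\sum_{w_{k}<v_{k}}w_{k}\log\frac{w_{k}}{v_{k}}\leq 1$ and $D(W||Z)\geq\frac{1}{2}\sum_{k}\frac{(w_{k}-v_{k})^{2}}{\max\{w_{k},v_{k}\}}$, hold for arbitrary discrete distributions, so they can be sanity-checked on random pairs. The following Python sketch does this on a finite alphabet; the alphabet size, the seed and the function names are arbitrary illustrative choices.

```python
import random
from math import log


def random_distribution(size, rng):
    """A random probability vector with strictly positive entries."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [x / total for x in weights]


def negative_part(w, v):
    """-sum over {k : w_k < v_k} of w_k log(w_k / v_k)."""
    return -sum(wk * log(wk / vk) for wk, vk in zip(w, v) if wk < vk)


def relative_entropy(w, v):
    """D(W||Z) = sum_k w_k log(w_k / v_k)."""
    return sum(wk * log(wk / vk) for wk, vk in zip(w, v))


def weighted_square_sum(w, v):
    """(1/2) sum_k (w_k - v_k)^2 / max{w_k, v_k}."""
    return 0.5 * sum((wk - vk) ** 2 / max(wk, vk) for wk, vk in zip(w, v))


rng = random.Random(0)
checks = []
for _ in range(1000):
    w = random_distribution(8, rng)
    v = random_distribution(8, rng)
    checks.append((negative_part(w, v), relative_entropy(w, v), weighted_square_sum(w, v)))

first_ok = all(neg <= 1.0 for neg, _, _ in checks)
second_ok = all(D >= low - 1e-12 for _, D, low in checks)
```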

Proposition 2.2. Let $W_{1}$ and $W_{2}$ be independent, non-negative, integer-valued random variables with finite means, and let $Z_{1}$ and $Z_{2}$ be independent Poisson random variables with ${\mathbb{E}}Z_{1}={\mathbb{E}}W_{1}$ and ${\mathbb{E}}Z_{2}={\mathbb{E}}W_{2}$ . Then

$$D(W_{1}+W_{2}||Z_{1}+Z_{2})\leq D(W_{1}||Z_{1})+D(W_{2}||Z_{2}). \tag{2.3}$$

In addition,

$$\chi^{2}(W_{1}+W_{2},Z_{1}+Z_{2})+1\leq(\chi^{2}(W_{1},Z_{1})+1)\,(\chi^{2}(W_{2},Z_{2})+1). \tag{2.4}$$

For the proof, we refer to Johnson [11], pp. 133–134. Let us only mention that (2.4) is obtained in [11] in the more general form

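$$\sum_{k=0}^{\infty}\frac{{\mathbb{P}}\{W_{1}+W_{2}=k\}^{\alpha}}{{\mathbb{P}}\{Z_{1}+Z_{2}=k\}^{\alpha-1}}\leq\sum_{k=0}^{\infty}\frac{{\mathbb{P}}\{W_{1}=k\}^{\alpha}}{{\mathbb{P}}\{Z_{1}=k\}^{\alpha-1}}\,\sum_{k=0}^{\infty}\frac{{\mathbb{P}}\{W_{2}=k\}^{\alpha}}{{\mathbb{P}}\{Z_{2}=k\}^{\alpha-1}}$$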

with arbitrary $\alpha\geq 1$ , which represents a Poisson analog of weighted convolution inequalities due to Andersen [1]. Here, for $\alpha=1$ there is an equality, and comparing the derivatives of both sides at this point, we arrive at the relation (2.3).
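The subadditivity of $D$ and the submultiplicativity of $\chi^{2}+1$ stated in Proposition 2.2 can be illustrated on Bernoulli sums. In the sketch below (hypothetical parameters and helper names), $W_{1}+W_{2}$ is again a Bernoulli sum, obtained by concatenating the two parameter lists, and $Z_{1}+Z_{2}$ is Poisson with the sum of the means.

```python
from math import exp, log


def bernoulli_sum_pmf(p):
    """Probabilities w_0, ..., w_n of a sum of independent Bernoulli variables."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)
            nxt[k + 1] += wk * pj
        w = nxt
    return w


def poisson_pmf(lam, kmax):
    """v_0, ..., v_kmax via the recursion v_k = v_{k-1} * lam / k."""
    v = [exp(-lam)]
    for k in range(1, kmax + 1):
        v.append(v[-1] * lam / k)
    return v


def divergences(p):
    """Return (D, chi^2) between the Bernoulli sum with parameters p and Poisson(sum(p))."""
    w = bernoulli_sum_pmf(p)
    v = poisson_pmf(sum(p), len(w) - 1)
    D = sum(wk * log(wk / vk) for wk, vk in zip(w, v) if wk > 0.0)
    # chi^2 + 1 = sum_k w_k^2 / v_k, and only k <= n contributes
    chi2 = sum(wk * wk / vk for wk, vk in zip(w, v)) - 1.0
    return D, chi2


p1 = [0.3, 0.6]          # hypothetical parameters of W_1
p2 = [0.2, 0.5, 0.9]     # hypothetical parameters of W_2
D1, chi2_1 = divergences(p1)
D2, chi2_2 = divergences(p2)
D12, chi2_12 = divergences(p1 + p2)  # W_1 + W_2 against Z_1 + Z_2
```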

3. Poisson Approximation in the Non-Degenerate Case

Now, we restrict ourselves to the random variables $W=X_{1}+\dots+X_{n}$ and $Z$ with distributions described in (1.1)-(1.2). In particular,

[TABLE]

The bounds (1.4) follow from the following two assertions proved in [4]. To compare the lower and upper bounds, we recall the lower bound (1.4) of Harremoës, Johnson and Kontoyiannis [8].

Proposition 3.1. If $\max_{j}p_{j}\leq\frac{1}{2}$ , then

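$$\frac{1}{4}\,\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}\,\leq\,D(W||Z)\,\leq\,\chi^{2}(W,Z)\,\leq\,C_{\lambda}\,\Big(\frac{\lambda_{2}}{\lambda}\Big)^{2}, \tag{3.2}$$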

where $C_{\lambda}$ depends on $\lambda\geq 0$ and is an increasing continuous function with $C_{0}=2$ . In particular, if $\lambda\leq 1/2$ , then

[TABLE]

Proposition 3.2. If $\lambda\geq 1/2$ and $\lambda_{2}\leq\kappa\lambda$ with $\kappa\in(0,1)$ , then

[TABLE]

where one may take $c_{\kappa}=c\,(1-\kappa)^{-3}$ with some absolute constant, e.g. $c=7\cdot 10^{6}$ .

A natural approach to the Poisson approximation is based on the comparison of characteristic functions. Since the random variables $W$ and $Z$ assume non-negative integer values only, one may equivalently consider the associated generating functions, similarly to [4]. The generating function for the Poisson law with parameter $\lambda>0$ is given by

[TABLE]

which is an entire function of the complex variable $w$ . Correspondingly, the generating function for the distribution of the random variable $W$ is

[TABLE]

which is a polynomial of degree $n$ . Hence, the difference between the involved probabilities may be expressed with the help of the contour integrals by the Cauchy formula

[TABLE]

where $\mu_{r}$ is the uniform probability measure on the circle $|w|=r$ of an arbitrary radius $r>0$ . This identity for the difference of probabilities was used in [4] in the derivation of the upper bound in (3.2), while here the representation

[TABLE]

will be particularly helpful in the study of the degenerate case.
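The idea behind such contour representations can be illustrated numerically. Assuming the generating functions $g(w)=\prod_{l}(q_{l}+p_{l}w)$ of $W$ and $e^{\lambda(w-1)}$ of $Z$, the $k$-th Taylor coefficient of their difference is the average of $w^{-k}(g(w)-e^{\lambda(w-1)})$ over the circle $|w|=r$. The sketch below replaces the integral by an average over equally spaced nodes; the radius, the number of nodes and the helper names are illustrative assumptions.

```python
import cmath
from math import exp, factorial, pi


def generating_function(p, z):
    """g(z) = prod_l (q_l + p_l z), the generating function of W."""
    value = 1.0 + 0.0j
    for pl in p:
        value *= (1.0 - pl) + pl * z
    return value


def coefficient(func, k, r=1.0, nodes=64):
    """Average of func(w) * w**(-k) over `nodes` equally spaced points of |w| = r."""
    total = 0.0 + 0.0j
    for j in range(nodes):
        point = r * cmath.exp(2j * pi * j / nodes)
        total += func(point) / point ** k
    return (total / nodes).real


p_example = [0.1, 0.25, 0.4, 0.05]  # hypothetical parameters
lam = sum(p_example)
r = 0.7                              # any radius r > 0 may be used


def difference(z):
    return generating_function(p_example, z) - cmath.exp(lam * (z - 1.0))


w_minus_v = [coefficient(difference, k, r) for k in range(len(p_example) + 1)]

# Direct values w_k - v_k for comparison.
w = [1.0]
for pj in p_example:
    w = [a * (1.0 - pj) + b * pj for a, b in zip(w + [0.0], [0.0] + w)]
v = [exp(-lam) * lam ** k / factorial(k) for k in range(len(w))]
direct = [a - b for a, b in zip(w, v)]
```

For the polynomial part the discrete average is exact once the number of nodes exceeds $n$; for the Poisson part it carries an aliasing error that is negligible here.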

When estimating the Poisson probabilities

[TABLE]

for a fixed parameter $\lambda>0$ , it is convenient to use the well-known Stirling-type two-sided bound:

[TABLE]

In particular, it implies the following Gaussian-type estimates (cf. [4]).

Lemma 3.3. For all $k\geq 1$ ,

[TABLE]

Moreover, if $1\leq k\leq 2\lambda$ , then

[TABLE]

Here, the lower bound may be improved in the region $k\geq\lambda$ as

[TABLE]
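Two consequences of Stirling-type bounds that are used later in the text, $(\frac{e}{k})^{k}\leq e\sqrt{k}\,\frac{1}{k!}$ and $f(k)\leq\frac{1}{\sqrt{2\pi k}}$ for the Poisson probabilities $f(k)$, can be checked numerically together with the classical lower bound $k!\geq\sqrt{2\pi k}\,(k/e)^{k}$. The following sketch works on the logarithmic scale to avoid overflow; the range of $k$, the tested values of $\lambda$ and the tolerance are arbitrary choices.

```python
from math import lgamma, log, pi


def log_poisson_pmf(lam, k):
    """log of f(k) = lam^k e^{-lam} / k!."""
    return k * log(lam) - lam - lgamma(k + 1)


def check_bounds(kmax=200, tol=1e-12):
    """Return the list of (k, lam) pairs at which one of the bounds fails."""
    failures = []
    for k in range(1, kmax + 1):
        log_base = k * log(k) - k          # log of (k/e)^k
        log_fact = lgamma(k + 1)           # log of k!
        if log_fact < 0.5 * log(2.0 * pi * k) + log_base - tol:
            failures.append((k, None))     # k! >= sqrt(2 pi k) (k/e)^k fails
        if log_fact > 1.0 + 0.5 * log(k) + log_base + tol:
            failures.append((k, None))     # k! <= e sqrt(k) (k/e)^k fails
        for lam in (0.5, 0.5 * k, float(k), 2.0 * k):
            if log_poisson_pmf(lam, k) > -0.5 * log(2.0 * pi * k) + tol:
                failures.append((k, lam))  # f(k) <= 1 / sqrt(2 pi k) fails
    return failures


failures = check_bounds()
```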

4. Upper Bounds on $D$ and $\chi^{2}$

We now turn to Theorem 1.2 in the degenerate case, where the optimal bounds on the relative entropy and $\chi^{2}$ have a different behavior. As an intermediate step, let us derive the following upper bounds for the $\chi^{2}$ -distance and the relative entropy, by using the quantity

$$Q=\frac{\lambda}{\max\{1,\lambda-\lambda_{2}\}}.$$

Proposition 4.1. For $\lambda\geq 1/2$ , we have

[TABLE]

These bounds are sharp when $\lambda_{2}\geq\kappa\lambda$ , cf. Propositions 5.1 and 6.1.

Proof. Setting $g(w)=\prod_{l=1}^{n}(q_{l}+p_{l}w)$ , $w\in{\mathbb{C}}$ , we exploit the contour integral representation (3.3), i.e.,

[TABLE]

It yields an upper bound

[TABLE]

where

[TABLE]

Let us choose $r=k/\lambda$ . Since $q_{j}+p_{j}r\leq e^{p_{j}(r-1)}$ , we have

[TABLE]

Moreover, applying $(\frac{e}{k})^{k}\leq e\sqrt{k}\,\frac{1}{k!}$ , cf. (3.4), the above is simplified to

[TABLE]

where $f(k)$ is the density of the Poisson law with parameter $\lambda$ .

Now, to bound $I(r)$ , for all $|\theta|\leq\pi$ , using $\sin(\frac{\theta}{2})\geq\frac{1}{\pi}\,\theta$ , we have

[TABLE]

Here

[TABLE]

and

[TABLE]

These right-hand sides have the form

[TABLE]

and we get

[TABLE]

First, we consider the region $\frac{1}{4}\,\lambda\leq k\leq 4\lambda$ , in which case $\frac{1}{4}\leq r\leq 4$ and $\psi(r)\geq\frac{1}{4}\,(\lambda-\lambda_{2})$ and thus

[TABLE]

Applying this bound together with (4.4) in (4.3), we get

[TABLE]

As for the regions $1\leq k<\frac{1}{4}\,\lambda$ and $k>4\lambda$ , we use the property $|I(r)|\leq 1$ , which yields simpler upper bounds

[TABLE]

Now, recall that ${\mathbb{P}}\{W=0\}\leq f(0)$ (as mentioned in (3.1)) and write

[TABLE]

By (4.5),

[TABLE]

To estimate $S_{1}$ , first note that $S_{1}=0$ for $\lambda<4$ . For $\lambda\geq 4$ , using the property that the function $k\rightarrow(\frac{e\lambda}{k})^{k}$ is increasing for $k<\lambda$ , we obtain from (4.6) that

[TABLE]

Here we applied the inequality

[TABLE]

with $p=3/2$ and $c=4/e^{3}$ .

To estimate $S_{3}$ , one may bound the sequence $\sqrt{k}\,(\frac{e\lambda}{k})^{k}$ for $k>4\lambda\geq 2$ by the geometric progression $Ab^{k}$ with suitable parameters $A>0$ and $0<b<1$ . To this aim, consider the function

[TABLE]

We have

[TABLE]

if $b\geq\frac{1}{4}\,e^{1/4}$ which we assume. In this case, $u$ is decreasing, so that $u(x)\leq u(4\lambda)=\log\big{(}2\sqrt{\lambda}\,(\frac{e}{4b})^{4\lambda}\big{)}\leq\log A$ , where

[TABLE]

where on the last step we choose $b=3/4$ and applied (4.7) with $p=1/2$ and $c=e/3$ . Thus, putting $k_{0}=[4\lambda]+1$ and noting that $k_{0}\geq 2$ , we get

[TABLE]

Finally, using $Q=\lambda Q_{0}\geq 1/2$ (due to $\lambda\geq 1/2$ ), we get $S_{1}+S_{3}<5.611\leq 5.611\sqrt{2Q}$ . This gives $S_{1}+S_{2}+S_{3}<(5.611\sqrt{2}+4e)\sqrt{Q}<18.81\sqrt{Q}$ , so (4.1) follows.
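The numerical constant in the last step can be verified directly. The following arithmetic is added for convenience and uses only the values quoted in the preceding paragraph:

```latex
5.611\,\sqrt{2}+4e \;\approx\; 7.9352+10.8731 \;=\; 18.8083 \;<\; 18.81 .
```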

Turning to the second assertion and using ${\mathbb{P}}\{W=0\}\leq f(0)$ , write similarly

[TABLE]

For the region $\frac{1}{4}\,\lambda\leq k\leq 4\lambda$ , we can apply the bound (4.5) again, which gives

[TABLE]

and therefore, using $Q\geq 1/2$ ,

[TABLE]

Using (4.6) together with the inequality $\log(et)\leq t$ ( $t>0$ ), we obtain, similarly to the derivation of the bound on $T_{1}$ in the $\chi^{2}$ -case, that

[TABLE]

Choosing again $k_{0}=[4\lambda]+1$ similarly to the derivation of the bound on $S_{3}$ in the $\chi^{2}$ -case, we also get

[TABLE]

Hence, $T_{1}+T_{3}<5.087<16.578\,\log(eQ)$ , and (4.2) follows as well. ∎

5. Lower Bound on $\chi^{2}$

Here, we complement Proposition 4.1 by a similar lower bound for the $\chi^{2}$ -distance in terms of the same quantity $Q=\lambda/\max\{1,\lambda-\lambda_{2}\}$ . Let $c_{0}=2.5\cdot 10^{-6}$ .

Proposition 5.1. If $\lambda\geq 1/2$ , then with some absolute constant $c\in[c_{0},1)$

[TABLE]

Moreover,

[TABLE]

as long as $\lambda_{2}\geq(1-\frac{c^{2}}{4})\,\lambda$ .

Suppose that $\lambda_{2}\geq(1-\frac{c^{2}}{4})\,\lambda$ . To derive (5.2) from (5.1), it is sufficient to require that $c\sqrt{Q}\geq 2$ , since then $c\sqrt{Q}-1\geq\frac{c}{2}\sqrt{Q}$ . This condition is fulfilled, as long as $\lambda\geq\lambda_{0}=\frac{4}{c^{2}}$ and then we obtain (5.2). In the remaining case $\frac{1}{2}\leq\lambda\leq\lambda_{0}$ , the inequality (5.2) follows from the lower bound

[TABLE]

cf. (1.4). Indeed, in this case, $\lambda-\lambda_{2}\leq\frac{c^{2}}{4}\,\lambda\leq 1$ , so that $Q=\lambda\leq\frac{4}{c^{2}}$ , and thus $\frac{c}{9}\sqrt{Q}\leq\frac{2}{9}$ , while $\frac{1}{4}\,(\frac{\lambda_{2}}{\lambda})^{2}\geq\frac{1}{4}\,(1-\frac{c^{2}}{4})^{2}$ .

Thus, it remains to derive the first inequality (5.1). First we shall prove it, assuming that $\lambda-\lambda_{2}$ is sufficiently large. As in Section 4, for any fixed $r>0$ , we apply the Cauchy theorem and write

[TABLE]

with integration over the uniform distribution $\mu_{r}$ on the circle $|w|=r$ of the complex plane. Here and below

[TABLE]

and

[TABLE]

We split the integration into two regions so as to work with the representation

[TABLE]

where

[TABLE]

To properly estimate $I_{k}(r)$ from below, $I_{k2}(r)$ needs to be estimated from above (in absolute value), while $I_{k1}(r)$ , which is a real number, should be estimated from below.

Furthermore, the quantity $R_{k}(r)$ needs to be estimated from below as well. To this aim, we choose the radius $r=r(k)>0$ by the condition $R_{k}^{\prime}(r)=0$ , or equivalently

[TABLE]

Since the function $F$ is monotone and $F(0)=0$ , $F(\infty)=n$ , there is a unique solution, say $r$ , to this equation as long as $n>k$ (which may be assumed). We also assume that not all $p_{k}$ are equal to 0 or 1, so that $\lambda_{2}<\lambda$ .

Let us also emphasize that $F$ is concave on the positive half-axis. Since $F(1)=\lambda$ , we necessarily have $r(k)<1$ in case $k<\lambda$ , and $r(k)>1$ in case $k>\lambda$ .

Lemma 5.2. For any $k=0,\dots,n-1$ , the solution $r=r(k)$ to the equation $(5.3)$ satisfies

[TABLE]

Moreover, in case $|k-\lambda|\leq\frac{1}{6}\,(\lambda-\lambda_{2})$ , we have $\frac{5}{6}\leq r\leq\frac{6}{5}$ , and actually with some $0\leq b_{i}\leq 1$

[TABLE]

Proof. We have

[TABLE]

The inverse function $F^{-1}:[0,n)\rightarrow[0,\infty)$ is increasing and convex. Hence, for any $s\in[0,n)$ ,

[TABLE]

Plugging $s=k$ , we obtain the first inequality.

Now, since $q_{l}+p_{l}r\leq 1$ for $r\leq 1$ , we conclude that $F^{\prime}(r)\geq\sum_{l=1}^{n}p_{l}q_{l}=\lambda-\lambda_{2}$ and $F(1)-F(r)\geq(1-r)(\lambda-\lambda_{2})$ . Thus, if $k\leq\lambda$ , we obtain that

[TABLE]

implying $r(k)\geq\frac{5}{6}$ . For $r\geq 1$ , one may use $q_{l}+p_{l}r\leq r$ , which gives $F^{\prime}(r)\geq\frac{1}{r^{2}}\,(\lambda-\lambda_{2})$ and $F(r)-F(1)\geq(1-\frac{1}{r})\,(\lambda-\lambda_{2})$ . Hence, again by the assumption,

[TABLE]

implying $r(k)\leq\frac{6}{5}$ . In both cases, $\frac{5}{6}\leq r(k)\leq\frac{6}{5}$ , proving the second assertion of the lemma.

Now, in the interval $\frac{5}{6}\leq r\leq\frac{6}{5}$ , we necessarily have $\frac{5}{6}\leq q_{l}+p_{l}r\leq\frac{6}{5}$ , so that

[TABLE]

In addition,

[TABLE]

Let us now write the Taylor expansion up to the linear and quadratic terms for the inverse function $F^{-1}(s)$ around the point $\lambda$ . Then we get

[TABLE]

where the points $s_{1}$ and $s_{2}$ lie between $\lambda$ and $s$ . Putting $r=F^{-1}(s)$ and $r_{i}=F^{-1}(s_{i})$ , the above is simplified as

[TABLE]

where $r_{1}$ and $r_{2}$ lie between $1$ and $r$ . It remains to apply these equalities with $s=k$ , that is, $r=r(k)$ , and note that $\frac{1}{F^{\prime}(r_{1})}\leq(\frac{6}{5})^{2}\,\frac{1}{\lambda-\lambda_{2}}$ , while

[TABLE]

Note that $(\frac{6}{5})^{2}=1.44$ and $(\frac{6}{5})^{9}<5.16$ . ∎
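The localization $\frac{5}{6}\leq r(k)\leq\frac{6}{5}$ can be observed numerically. The sketch below assumes that the function in (5.3) has the form $F(r)=\sum_{l}\frac{p_{l}r}{q_{l}+p_{l}r}$; this form is an inference from the properties stated in the text ($F(0)=0$, $F(1)=\lambda$, $F(\infty)=n$, monotonicity and concavity) and is not quoted from the paper. The equation $F(r)=k$ is then solved by bisection for all $k$ with $|k-\lambda|\leq\frac{1}{6}(\lambda-\lambda_{2})$.

```python
def saddle_F(p, r):
    """F(r) = sum_l p_l r / (q_l + p_l r); assumed form, see the lead-in."""
    return sum(pl * r / ((1.0 - pl) + pl * r) for pl in p)


def solve_radius(p, k, iterations=200):
    """Solve F(r) = k by bisection, using that F is increasing with F(0) = 0."""
    lo, hi = 0.0, 1.0
    while saddle_F(p, hi) < k:      # enlarge the bracket until F(hi) >= k
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if saddle_F(p, mid) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


p_example = [0.3] * 40 + [0.7] * 20   # hypothetical parameters, n = 60
lam = sum(p_example)                   # close to 26
lam2 = sum(pl * pl for pl in p_example)
window = (lam - lam2) / 6.0            # close to 2.1
radii = {
    k: solve_radius(p_example, k)
    for k in range(len(p_example))
    if abs(k - lam) <= window
}
```

In this example the admissible values are $k=24,\dots,28$, with $r(k)<1$ for $k<\lambda$ and $r(k)>1$ for $k>\lambda$.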

Lemma 5.3. Let $r=r(k)$ be the solution of $(5.3)$ for $0\leq\lambda-k\leq\frac{1}{6}\,(\lambda-\lambda_{2})$ . Then

[TABLE]

Proof. The function

[TABLE]

is vanishing at $r=1$ and has derivative

[TABLE]

Since $F$ is increasing and concave, $F(a)-F(b)\leq F^{\prime}(b)\,(a-b)$ whenever $a\geq b>0$ . In particular, in the interval $r(k)\leq r\leq 1$ , we have

[TABLE]

which implies

[TABLE]

By Lemma 5.2, $\frac{5}{6}\leq r(k)\leq 1$ and $1-r(k)\leq(\frac{6}{5})^{2}\,\frac{\lambda-k}{\lambda-\lambda_{2}}$ . Moreover, as was shown in the proof, $F^{\prime}(r(k))\leq(\frac{6}{5})^{2}\,(\lambda-\lambda_{2})$ . Hence

[TABLE]

Here, $(\frac{6}{5})^{7}<3.6$ . ∎

Lemma 5.4. Let $\lambda-\lambda_{2}\geq 100$ . Then, for $0\leq\lambda-k\leq\frac{1}{6}\,(\lambda-\lambda_{2})$ ,

[TABLE]

Proof. By Lemma 5.2, $1\geq r(k)\geq\frac{5}{6}$ . As in the proof of Proposition 4.1, recall that for $r>0$ and $-\pi\leq\theta\leq\pi$ ,

[TABLE]

For $\frac{5}{6}\leq r\leq 1$ , necessarily $q_{l}+p_{l}r\leq 1$ and

[TABLE]

Hence

[TABLE]

Let us now estimate $I_{k1}(r)$ from below. Using $4q_{l}p_{l}r\leq(q_{l}+p_{l}r)^{2}$ which is the same as $(q_{l}-p_{l}r)^{2}\geq 0$ , we have, for $|\theta|\leq\pi/2$ ,

[TABLE]

In the region $0\leq\varepsilon\leq\varepsilon_{0}<1$ , there is a lower bound $1-\varepsilon\geq e^{-c\varepsilon}$ with best attainable constant when $\varepsilon=\varepsilon_{0}$ . In the case $\varepsilon_{0}=\frac{1}{2}$ , this constant is given by $c=2\log 2$ . Therefore, for $|\theta|\leq\frac{\pi}{2}$ ,

[TABLE]

Here, the involved function

[TABLE]

is increasing in $0\leq r\leq r_{l}\equiv q_{l}/p_{l}$ and decreasing in $r\geq r_{l}$ . Hence, if $r_{l}\geq 1$ , then $\max_{\frac{5}{6}\leq r\leq 1}w_{l}(r)=w_{l}(1)=1$ . If $r_{l}\leq\frac{5}{6}$ , that is, when $p_{l}\geq\frac{6}{11}$ , we have

[TABLE]

Finally, if $\frac{5}{6}\leq r_{l}\leq 1$ , which is equivalent to $\frac{1}{2}\leq p_{l}\leq\frac{6}{11}$ , we have

[TABLE]

Thus, in all cases, $w_{l}(r)\leq\frac{6}{5}$ on the interval $\frac{5}{6}\leq r\leq 1$ , so that

[TABLE]

and thus

[TABLE]

Here we used $\lambda-\lambda_{2}\geq 100$ , which ensures that

[TABLE]

where $\xi\sim N(0,1)$ . In addition (recalling one of the upper bounds when bounding the integral $I_{k2}$ from above), and using $\sin(\theta/2)\geq\frac{\sqrt{2}}{\pi}\,\theta$ for $0\leq\theta\leq\pi/2$ , we get that

[TABLE]

Now, the assumption (5.3) may be rewritten as

[TABLE]

Here, the functions ${\rm Im}\big{(}\log(q_{l}+p_{l}r\,e^{i\theta})\big{)}$ are odd, so their 2nd derivatives are vanishing at zero. We now apply the Taylor formula up to the cubic term to the function

[TABLE]

on the interval $\theta\in[-\pi/2,\pi/2]$ to get that

[TABLE]

with some $\theta_{0}\in[-\frac{\pi}{2},\frac{\pi}{2}]$ . To perform differentiation, consider a function of the form

[TABLE]

We have

[TABLE]

Therefore,

[TABLE]

implying that

[TABLE]

But, for $\frac{5}{6}\leq r\leq 1$ and $|\theta|\leq\frac{\pi}{2}$ ,

[TABLE]

Hence

[TABLE]

Here we used the property that $u_{l}(r)$ is increasing in $r\leq r_{l}=q_{l}/p_{l}$ and is decreasing in $r\geq r_{l}$. If $r_{l}\geq 1$, this gives $u_{l}(r)\leq u_{l}(1)=\frac{1}{q_{l}^{2}+p_{l}^{2}}\leq 2$. If $r_{l}\leq\frac{5}{6}$, that is, when $p_{l}\geq\frac{6}{11}$, we get $u_{l}(r)\leq u_{l}(5/6)=\frac{5/6}{q_{l}^{2}+\frac{5}{6}\,p_{l}^{2}}$. The latter expression is maximized at $p_{l}=\frac{6}{11}$, where it has the value $\frac{121}{66}$. Finally, if $\frac{5}{6}\leq r_{l}\leq 1$, which is equivalent to $\frac{1}{2}\leq p_{l}\leq\frac{6}{11}$, we have

[TABLE]

From this,

[TABLE]

so that

[TABLE]

with $c_{0}=\frac{121}{60}+2\,(\frac{121}{60})^{3/2}<7.744438$ . Thus,

[TABLE]

Now, as we mentioned before, the function $A_{k}$ is odd in $\theta$ , so that $I_{k1}(r)$ is a real number given by

[TABLE]

Hence, using

[TABLE]

from the previous estimates we may deduce the lower bound

[TABLE]

where on the last step we assume that $\lambda-\lambda_{2}\geq 100$ . Together with the upper bound on $I_{k2}$ , we arrive at the lower bound

[TABLE]

Thus, Lemma 5.4 is proved. ∎

Proof of Proposition 5.1. We conclude from Lemmas 5.3 and 5.4 that

[TABLE]

for $0\leq\lambda-k\leq\frac{1}{6}\,(\lambda-\lambda_{2})$ under the assumption $\lambda-\lambda_{2}\geq 100$ .

On the other hand, $f(k)={\mathbb{P}}\{Z=k\}\leq\frac{1}{\sqrt{2\pi k}}$ , cf. (3.5). Since $k\geq\lambda-\frac{1}{6}\,(\lambda-\lambda_{2})\geq\frac{5}{6}\,\lambda$ , we have

[TABLE]

As a consequence,

[TABLE]

In order to clarify the last inequality, note that the condition $\lambda-\lambda_{2}\geq 100$ implies that $\lambda>100$ . The above summation is performed over all integers $k$ from the interval $\lambda-\frac{1}{6}\sqrt{\lambda-\lambda_{2}}\leq x\leq\lambda$ of length at least $10/6$ . It contains at least one integer point, and actually, the number of integer points in it is at least $h=\frac{1}{6}\sqrt{\lambda-\lambda_{2}}$ . Moreover,

[TABLE]

Here, we used the bounds $4\,\frac{\lambda-[\lambda]}{\sqrt{\lambda-\lambda_{2}}}\leq\frac{2}{5}$ and $4\,\frac{\lambda-[\lambda-h]}{\sqrt{\lambda-\lambda_{2}}}\geq 4\,\frac{\lambda-[\lambda-10/6]}{10}\geq\frac{2}{3}$ , together with $\Phi(2/3)-\Phi(2/5)>0.09$ .
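The last numerical claim is easy to confirm. The sketch below is only a sanity check, written under the assumption that $\Phi$ denotes the standard normal distribution function; it expresses $\Phi$ through the error function and evaluates the difference $\Phi(2/3)-\Phi(2/5)$.

```python
import math

def Phi(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# With lambda - lambda_2 >= 100 the square root is at least 10, and the
# fractional part lambda - [lambda] is less than 1, so the first argument
# is at most 4 * 1 / 10 = 2/5.
first_arg_bound = 4 * 1 / math.sqrt(100)

gap = Phi(2 / 3) - Phi(2 / 5)
print(first_arg_bound, gap)
```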

In order to treat the region $\lambda-\lambda_{2}\leq 100$ , we apply Proposition 2.2. Let $W_{1}=W$ and $W_{2}=Y_{1}+\dots+Y_{m}$ , where $Y_{1},\dots,Y_{m}$ are independent Bernoulli random variables taking the values 1 and 0 with probability $1/2$ each, and $m=400$ . Assume as well that $W$ and $W_{2}$ are independent. Then $\tilde{\lambda}=\lambda+m/2$ and $\tilde{\lambda}_{2}=\lambda_{2}+m/4$ satisfy the condition $\tilde{\lambda}-\tilde{\lambda}_{2}\geq 100$ .
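To illustrate the padding step, here is a small sketch under the assumption (consistent with the formulas for $\tilde{\lambda}$ and $\tilde{\lambda}_{2}$ above) that $\lambda=\sum_{l}p_{l}$ and $\lambda_{2}=\sum_{l}p_{l}^{2}$ . The parameters `ps` are a hypothetical example with $\lambda-\lambda_{2}\leq 100$ ; appending $m=400$ fair Bernoulli summands raises the difference by exactly $m/4=100$ .

```python
import random

def lam_and_lam2(ps):
    """Return (lambda, lambda_2) = (sum p_l, sum p_l^2); assumed notation."""
    return sum(ps), sum(p * p for p in ps)

random.seed(0)
# Hypothetical Bernoulli parameters close to 1, so lambda - lambda_2 is small.
ps = [random.uniform(0.9, 1.0) for _ in range(50)]
lam, lam2 = lam_and_lam2(ps)

m = 400
lam_t, lam2_t = lam_and_lam2(ps + [0.5] * m)  # pad with m fair coins

print(lam - lam2, lam_t - lam2_t)
```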

Denote by $Z_{2}$ a Poisson random variable with ${\mathbb{E}}Z_{2}=m/2$ which is independent of $Z_{1}=Z$ . By the previous step and the inequality (2.4) of Proposition 2.2,

[TABLE]

Here, by (4.1), $\chi^{2}(W_{2},Z_{2})\leq 19\sqrt{2}$ . Moreover, since $\lambda-\lambda_{2}\leq 100$ , we have

[TABLE]

It follows that

[TABLE]

Hence, Proposition 5.1 holds in the case $\lambda-\lambda_{2}\leq 100$ as well.

6. Lower Bound on $D$

An analogue of Proposition 5.1 is the following statement for the relative entropy. Recall that $Q=\lambda/\max\{1,\lambda-\lambda_{2}\}$ .

Proposition 6.1. If $\lambda_{2}\geq\kappa_{0}\lambda$ and $\lambda\geq\lambda_{0}$ , then

[TABLE]

where $\kappa_{0}=1-\exp\{-2\cdot 10^{7}\}$ , $\lambda_{0}=\exp\{2\cdot 10^{7}\}$ , and $c_{0}=e^{-14}$ .

Proof. Let us recall two estimates from the previous section, namely

[TABLE]

The first one is valid under the conditions $0\leq\lambda-k\leq\frac{1}{6}\,(\lambda-\lambda_{2})$ and $\lambda-\lambda_{2}\geq 100$ , cf. (5.4). Clearly, they are fulfilled if $0\leq\lambda-k\leq\frac{5}{3}\sqrt{\lambda-\lambda_{2}}$ and $\lambda-\lambda_{2}\geq 100$ . If additionally $\lambda_{2}\geq\kappa\lambda$ , $0<\kappa<1$ , then

[TABLE]

Since $k\geq\frac{5}{6}\,\lambda$ , we also have an upper bound

[TABLE]

In order that $w_{k}\geq v_{k}$ , it is therefore sufficient to require that $\frac{1}{10\sqrt{1-\kappa}}\,e^{-100/9}\geq\frac{1}{\sqrt{5\pi/3}}$ , that is, $1-\kappa\leq\frac{\pi}{60}\,e^{-200/9}$ . We have, moreover,

[TABLE]
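The sufficient condition on $1-\kappa$ stated above is obtained by squaring both sides of the preceding inequality. As a hedged numerical check (the function name `lhs` is illustrative), the sketch below confirms that equality holds exactly at $1-\kappa=\frac{\pi}{60}\,e^{-200/9}$ and that the inequality holds for smaller values and fails for larger ones.

```python
import math

# Threshold for 1 - kappa claimed in the text.
bound = math.pi / 60 * math.exp(-200 / 9)

def lhs(one_minus_kappa):
    """Left-hand side e^{-100/9} / (10 sqrt(1 - kappa))."""
    return math.exp(-100 / 9) / (10 * math.sqrt(one_minus_kappa))

rhs = 1 / math.sqrt(5 * math.pi / 3)

print(bound, lhs(bound), rhs)
```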

Now, applying the inequality (2.1) of Proposition 2.1, we get

[TABLE]

Note that, if $\lambda-\lambda_{2}\geq 100$ , the $x$ -interval $0\leq\lambda-x\leq\frac{5}{3}\sqrt{\lambda-\lambda_{2}}$ has length at least $50/3$ , so the total number of integer points in this interval is at least $50/3$ as well. Hence, the last sum can be bounded from below by

[TABLE]

Thus,

[TABLE]

Moreover, if $\lambda_{2}\geq\kappa\lambda$ with $\kappa\geq\kappa_{1}=1-\exp\{-60\,e^{11}\}$ , then

[TABLE]

and (6.2) yields

[TABLE]

The proposition is thus proved under the conditions $\lambda-\lambda_{2}\geq 100$ and $\lambda_{2}\geq\kappa\lambda$ with $\kappa_{1}\leq\kappa<1$ . It remains to eliminate the first condition, assuming that $\lambda-\lambda_{2}<100$ and again that $\lambda_{2}\geq\kappa\lambda$ with $\kappa$ being sufficiently close to 1. To this aim, we appeal to Proposition 2.2 again, as in the last step of the proof of Proposition 5.1. Namely, using the same notation and assumptions, from the inequality (2.3) and using (6.3), we obtain that

[TABLE]

where $W_{1}=W$ and $Z_{1}=Z$ . This holds as long as $\tilde{\lambda}_{2}\geq\kappa\tilde{\lambda}$ , i.e., $\lambda_{2}+m/4\,\geq\,\kappa\,(\lambda+m/2).$ Since $\lambda-\lambda_{2}<100$ , the latter would follow from

[TABLE]

which is solved as

[TABLE]

Moreover, by (4.2), we have $D(W_{2}||Z_{2})\leq 23\,\log(2e)$ . This bound may be used in (6.4), which gives

[TABLE]

where the second inequality holds true when $1-\kappa$ is sufficiently small. Namely,

[TABLE]

if $\tilde{\lambda}_{2}\geq\kappa\tilde{\lambda}$ and $1-\kappa\leq\exp\{-8\cdot 23\cdot\log(2e)\cdot e^{11}\}$ . Since the product in the exponent is smaller than $1.87\cdot 10^{7}$ , we may choose $\kappa=1-\exp\{-1.87\cdot 10^{7}\}>\kappa_{1}$ . In this case,

[TABLE]

assuming that $\lambda\geq 200\,\frac{\kappa}{1-\kappa}$ . But

[TABLE]

for all $\lambda\geq 4\cdot 10^{4}$ . It remains to note that $200\,\frac{\kappa}{1-\kappa}<\lambda_{0}$ , $\kappa<\kappa_{0}$ , $\frac{1}{2}\,c_{1}>c_{0}$ . ∎
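The constants in the last part of the proof are too extreme for direct floating-point evaluation ( $\exp\{-1.87\cdot 10^{7}\}$ underflows), so the following sketch compares them on the logarithmic scale instead. It is only a sanity check of the stated numerical relations; the variable names are illustrative.

```python
import math

# Exponent 8 * 23 * log(2e) * e^{11}, claimed to be smaller than 1.87e7.
exponent_needed = 8 * 23 * math.log(2 * math.e) * math.exp(11)

# Work with log(1 - kappa) rather than with kappa itself.
log_one_minus_kappa = -1.87e7               # chosen kappa
log_one_minus_kappa1 = -60 * math.exp(11)   # kappa_1
log_one_minus_kappa0 = -2e7                 # kappa_0
log_lambda0 = 2e7                           # log lambda_0

# Since kappa < 1, log(200 kappa / (1 - kappa)) < log 200 - log(1 - kappa).
log_threshold_upper = math.log(200) - log_one_minus_kappa

print(exponent_needed, log_threshold_upper)
```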

7. Proof of Theorem 1.1

Let us summarize. Using the quantity

[TABLE]

the results on Poisson approximation obtained for different regions of $\lambda$ and $\lambda_{2}$ can be combined in the form of the following two-sided bounds

[TABLE]

which are valid up to some absolute positive constants $c_{1}$ and $c_{2}$ . Let us describe the proof of Theorem 1.1 and provide explicit values for these constants. As we will see, (7.1)-(7.2) hold with $c_{1}=10^{-8}$ and $c_{2}=5.6\cdot 10^{7}$ .

An upper bound in (7.1).

If $\lambda\leq 1/2$ , these bounds simplify and are made precise via

[TABLE]

Here, the left inequality holds for all $\lambda$ and $\lambda_{2}$ , cf. [H-J-K], while the right inequality is part of Proposition 3.1. Note that $\lambda\leq 1/2$ implies $\lambda_{2}\leq\frac{1}{2}\,\lambda$ .

If $\lambda\geq 1/2$ and $\lambda_{2}\leq\frac{1}{2}\,\lambda$ , we have, by Proposition 3.2,

[TABLE]

so that

[TABLE]

In the case where $\lambda\geq 1/2$ and $\lambda_{2}>\frac{1}{2}\,\lambda$ , one may apply (4.2) which gives

[TABLE]

Here, the right-hand side contains a better numerical constant in comparison with (7.4), and we finally get (7.1) with a constant $c_{2}=56\cdot 10^{6}$ .

A lower bound in (7.1).

If $\lambda\leq 1$ , then $F=1$ , so that the lower bound in (7.3) yields (7.1) with $c_{1}=1/4$ .

If $\lambda\geq 1$ , the inequality (7.4) may be reversed by virtue of (6.1), which gives

[TABLE]

with $c_{0}=e^{-14}$ , provided that $\lambda_{2}\geq\kappa_{0}\lambda$ and $\lambda\geq\lambda_{0}$ , where $\kappa_{0}=1-\exp\{-2\cdot 10^{7}\}$ and $\lambda_{0}=\exp\{2\cdot 10^{7}\}$ . But the remaining regions belong to the non-degenerate case, where $F$ is bounded by a quantity which depends on $\kappa_{0}$ or $\lambda_{0}$ . Indeed, if $\lambda_{2}\leq\kappa_{0}\lambda$ , then $\log F\leq-\log(1-\kappa_{0})=2\cdot 10^{7}$ , so

[TABLE]

This means that the left inequality in (7.1) holds with a constant $c_{1}=\frac{1}{4\,(1+2\cdot 10^{7})}$ which is smaller than $c_{0}$ in the analogous inequality (7.5). Similarly, if $1\leq\lambda<\lambda_{0}$ , then $F\leq\lambda<\lambda_{0}$ , and we get, by the lower bound in (7.3),

[TABLE]

This means that the left inequality in (7.1) holds true with the same constant $c_{1}$ as above. Thus, the lower bound in (7.1) holds with constant $c_{1}$ ( $>10^{-8}$ ).
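The comparison of constants in this lower bound can be verified in one line each. The sketch below simply evaluates $c_{1}=\frac{1}{4\,(1+2\cdot 10^{7})}$ and $c_{0}=e^{-14}$ and checks the ordering $10^{-8}<c_{1}<c_{0}$ claimed above.

```python
import math

c1 = 1 / (4 * (1 + 2e7))   # constant from the regions with bounded F
c0 = math.exp(-14)         # constant from Proposition 6.1

print(c1, c0)
```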

An upper bound in (7.2).

If $\lambda\leq 1/2$ , we have (7.3), which implies (7.2) with $c_{2}=15$ .

If $\lambda\geq 1/2$ and $\lambda_{2}\leq\frac{1}{2}\,\lambda$ , a stronger version of (7.4) is provided by Proposition 3.2, which gives

[TABLE]

so that (7.2) holds true with $c_{2}=56\cdot 10^{6}$ . In the case where $\lambda\geq 1/2$ and $\lambda_{2}>\frac{1}{2}\,\lambda$ , one may apply (4.1) which gives

[TABLE]

Here, the right-hand side contains a better numerical constant, and we finally get (7.2) with the same constant $c_{2}$ as in (7.1).

A lower bound in (7.2).

If $\lambda\leq 1$ , then $F=1$ , so that the lower bound in (7.3) yields (7.2) with $c_{1}=1/4$ .

Assume that $\lambda\geq 1$ , in which case $F=Q=\lambda/\max(1,\lambda-\lambda_{2})$ . By (5.2), we have

[TABLE]

with $c_{0}=2.5\cdot 10^{-6}$ , provided that $\lambda_{2}\geq\kappa_{0}\lambda$ , $\kappa_{0}=1-c_{0}^{2}/4$ . This gives

[TABLE]

and we obtain the left inequality in (7.2) with $c_{1}=c_{0}/9>10^{-7}$ .

The remaining region belongs to the non-degenerate case, where $F$ is bounded. Indeed, if $\lambda_{2}\leq\kappa_{0}\lambda$ , then $1/\sqrt{F}\geq\sqrt{1-\kappa_{0}}=\frac{c_{0}}{2}=0.8\,\cdot 10^{-6}$ , so that, by the left inequality in (7.3),

[TABLE]

This means that the left inequality in (7.2) holds true with constant $c_{1}=2\,\cdot 10^{-7}$ which is slightly better than the constant in the analogous inequality (7.6). Thus, the lower bound in (7.2) holds true with constant $c_{1}=10^{-7}$ . ∎

8. Tsallis versus Vajda-Pearson

We now turn to the Tsallis relative entropies of other indexes. To make an application of non-uniform bounds more convenient, first let us relate $T_{\alpha}$ to the Vajda-Pearson distance

[TABLE]

It is defined for arbitrary random elements $X$ and $Z$ in a measure space $(\Omega,\pi)$ whose distributions are absolutely continuous and have densities $p$ and $q$ respectively with respect to the measure $\pi$ on $\Omega$ (the definition does not depend on the choice of $\pi$ ).

Recall that

[TABLE]

so that $T_{2}=\chi_{2}$ is the classical Pearson distance, and note that $T_{\alpha}=\chi_{\alpha}=\infty$ as long as the distribution of $X$ is not absolutely continuous with respect to the distribution of $Z$ . We need the following auxiliary result.

Proposition 8.1. For any $\alpha\geq 2$ ,

[TABLE]

Proof. We may assume that the distribution of $X$ is absolutely continuous with respect to the distribution of $Z$ , with $\chi_{\alpha}(X,Z)<\infty$ . In this case, the (non-negative) function $\xi=p/q$ is well defined a.e. with respect to the probability measure $Q=q\,d\pi$ . We consider it as a random variable on the probability space $(\Omega,Q)$ with finite moment of order $\alpha$ . Note that

[TABLE]

Putting $\eta=\xi-1\geq-1$ , define the function $\psi(t)={\mathbb{E}}\,(1+t\eta)^{\alpha}-1$ , $t\geq 0$ , so that $\psi(1)=(\alpha-1)\,T_{\alpha}(X||Z)$ . By the integral Taylor formula,

[TABLE]

Introducing the sets $A=\{\xi\leq 2\}=\{\eta\leq 1\}$ and $B=\{\xi>2\}=\{\eta>1\}$ , we have

[TABLE]

and

[TABLE]

We obtain the assertion of the proposition from the last two bounds. ∎
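To make the two distances of this section concrete, the following sketch computes them for a sum of independent Bernoulli variables against the Poisson law with the same mean. It is written under stated assumptions: the Tsallis distance is taken as $T_{\alpha}=\frac{1}{\alpha-1}\big{(}{\mathbb{E}}\,\xi^{\alpha}-1\big{)}$ , which agrees with the relation $\psi(1)=(\alpha-1)\,T_{\alpha}$ used in the proof, while the Vajda-Pearson distance is assumed to have the standard form $\chi_{\alpha}={\mathbb{E}}\,|\xi-1|^{\alpha}$ (the displayed definition is not reproduced here). The parameters `ps` and the truncation level `K` are hypothetical.

```python
import math

def bernoulli_sum_pmf(ps):
    """Distribution of a sum of independent Bernoulli(p_l), by convolution."""
    w = [1.0]
    for p in ps:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1 - p)
            nxt[k + 1] += wk * p
        w = nxt
    return w

def poisson_pmf(lam, kmax):
    """Poisson probabilities v_0, ..., v_kmax via the recursion v_k = v_{k-1} lam / k."""
    v = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        v.append(v[-1] * lam / k)
    return v

def tsallis(w, v, alpha):
    """Assumed form: (sum (w_k / v_k)^alpha v_k - 1) / (alpha - 1)."""
    s = sum((wk / vk) ** alpha * vk for wk, vk in zip(w, v))
    return (s - 1.0) / (alpha - 1.0)

def vajda_pearson(w, v, alpha):
    """Assumed form: sum |w_k - v_k|^alpha / v_k^(alpha - 1)."""
    return sum(abs(wk - vk) ** alpha / vk ** (alpha - 1) for wk, vk in zip(w, v))

ps = [0.3, 0.1, 0.25, 0.05, 0.2]      # hypothetical Bernoulli parameters
lam = sum(ps)
K = 60                                 # truncation of the Poisson support
w = bernoulli_sum_pmf(ps)
w = w + [0.0] * (K + 1 - len(w))       # w_k = 0 beyond n
v = poisson_pmf(lam, K)

T2, chi2 = tsallis(w, v, 2.0), vajda_pearson(w, v, 2.0)
T3, chi3 = tsallis(w, v, 3.0), vajda_pearson(w, v, 3.0)
print(T2, chi2, T3, chi3)
```

For $\alpha=2$ the two quantities coincide up to rounding and truncation error, in line with $T_{2}=\chi_{2}$ ; for other indexes they differ, which is what Proposition 8.1 addresses.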

9. Estimates of Vajda-Pearson distances

For the proof of Theorem 1.2 we need the following propositions. We thus return to the setting of Bernoulli trials. Let us denote by $c(\alpha)$ a positive constant depending on $\alpha$ only, which may vary from place to place.

Proposition 9.1. For $\alpha>1$ and $\lambda\leq\frac{1}{2}$ , we have

[TABLE]

Proof. Applying Lemmas III.1-2 and repeating the argument used in the proof of Proposition III.4 from [4], we obtain that

[TABLE]

∎

Proposition 9.2. Let $\alpha>1$ . If $\lambda\geq\,\frac{1}{2}$ and $\lambda_{2}\,\leq\,\kappa\lambda$ with $\kappa\in(0,1)$ , then

[TABLE]

Proof. Write

[TABLE]

In the range $0\leq\,k\,\leq[2\lambda]$ we apply the inequality (VI.2) from [4] which gives

[TABLE]

Therefore

[TABLE]

Here we use the upper bound ${\mathbb{E}}\,|Z-\lambda|^{2\alpha}\leq c(\alpha)\,\lambda^{\alpha}$ .

In order to estimate $S_{2}$ we use the inequalities (VI.3) and (II.1) from [4] to get

[TABLE]

The assertion of the proposition follows immediately from the last two estimates. ∎

10. Proof of Theorem 1.2

To complete the proof of Theorem 1.2, we need the following two lemmas. Recall that $Q=\lambda/\max\{1,\lambda-\lambda_{2}\}$ .

Lemma 10.1. For $\alpha\,>1$ and $\lambda\geq\,\frac{1}{2}$ ,

[TABLE]

Proof. By the definition of the Tsallis distance,

[TABLE]

By (4.5),

[TABLE]

Using (4.6) and repeating the argument of Section 4, we obtain the upper bounds $S_{1}+S_{3}\,\leq\,c(\alpha)$ . The last three estimates give the assertion of the lemma. ∎

Lemma 10.2. For $\alpha\,>1$ and $\lambda\geq\,\frac{1}{2}$ , with some constant $c_{1}(\alpha)\in(0,1)$

[TABLE]

Moreover

[TABLE]

as long as $\lambda_{2}\,\geq\,\big{(}1-\frac{c_{1}(\alpha)^{2}}{4}\big{)}\,\lambda$ .

Proof. The assertion (10.2) follows from the assertion (10.1) in the same way as (5.2) follows from (5.1). Therefore we omit the proof.

In order to prove (10.1) we use the lower bound (5.4). Repeating the argument of the proof of Proposition 5.1, we easily obtain the lower bound, under the assumption $\lambda-\lambda_{2}\,\geq\,100$ ,

[TABLE]

In order to treat the region $\lambda-\lambda_{2}\,\leq\,100$ we refer to Johnson [11], pp. 133–134, and repeat the argument of the end of Section 5. ∎

Proof of Theorem 1.2. Assuming that $\frac{\lambda_{2}}{\lambda}\leq 1-\frac{1}{4}c_{1}(\alpha)^{2}$ , we have $F\,\sim\,1$ with involved constants depending on $\alpha$ , and then we need to show that $T_{\alpha}(W||Z)\sim(\frac{\lambda_{2}}{\lambda})^{2}$ .

In the case $1\,<\alpha\,\leq 2$ , we have

[TABLE]

Turning to the case $\alpha\,\geq 2$ , first let $\lambda\,\leq\frac{1}{2}$ . Since $\lambda_{2}\leq\lambda^{2}$ , by Propositions 8.1 and 9.1,

[TABLE]

Now, let $\lambda\,\geq\frac{1}{2}$ . Then, by Propositions 8.1 and 9.2, we conclude that

[TABLE]

It remains to consider the region $\frac{\lambda_{2}}{\lambda}\,\geq\,1-\frac{1}{4}\,c_{1}(\alpha)^{2}$ . But in this case, the assertion of the theorem immediately follows from Lemmas 10.1 and 10.2.

11. Difference of Entropies

For the proof of Corollary 1.3, we shall use another functional

[TABLE]

where $Z$ is an integer-valued random variable. Thus, while the Shannon entropy $H(Z)=-{\mathbb{E}}\,\log v(Z)$ describes the average of the informational content $-\log v(Z)$ , the informational quantity $H_{2}(Z)$ represents the 2nd moment of this random variable.

An application of Theorem 1.1 is based upon the following elementary relation.

Proposition 11.1. For all integer-valued random variables $W$ and $Z$ with finite entropies, we have

[TABLE]

Proof. We may assume that the distribution of $W$ is absolutely continuous with respect to the distribution of $Z$ (since otherwise $\chi^{2}(W,Z)=\infty$ ). Equivalently, for all $k\in{\mathbb{Z}}$ , $v_{k}=0\Rightarrow w_{k}=0$ , where $w_{k}={\mathbb{P}}\{W=k\}$ . Define $t_{k}=w_{k}/v_{k}$ in case $v_{k}>0$ . Recalling the definition (1.8), we then have

[TABLE]

We now apply the inequality $t\log t\leq(t-1)+(t-1)^{2}$ ( $t\geq 0$ ), which follows from $\log t\leq t-1$ after multiplication by $t$ , obtaining

[TABLE]

Here, the first sum in the last bound is exactly $\chi^{2}(W,Z)$ , while, by Cauchy’s inequality, the square of the last sum is bounded from above by

[TABLE]

∎
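The elementary inequality $t\log t\leq(t-1)+(t-1)^{2}$ used in the proof above can also be checked numerically. The sketch below evaluates the gap between the two sides on a grid of $t\in[0,20]$ , with the usual convention $0\log 0=0$ ; the grid and the name `gap` are illustrative choices.

```python
import math

def gap(t):
    """(t - 1) + (t - 1)^2 - t log t, with the convention 0 log 0 = 0."""
    tlogt = 0.0 if t == 0 else t * math.log(t)
    return (t - 1) + (t - 1) ** 2 - tlogt

ts = [j / 100 for j in range(0, 2001)]   # grid on [0, 20]
min_gap = min(gap(t) for t in ts)

print(min_gap, gap(1.0))
```

The gap equals $t\,(t-1-\log t)$ , so it vanishes at $t=0$ and $t=1$ and is positive elsewhere.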

In view of (11.1), we also need:

Proposition 11.2. If $Z$ has a Poisson distribution with parameter $\lambda$ , then

[TABLE]

Proof. Put $v_{k}={\mathbb{P}}\{Z=k\}$ . In particular, $v_{0}\,(\log v_{0})^{2}=\lambda^{2}\,e^{-\lambda}$ and $v_{1}\,(\log v_{1})^{2}=\lambda e^{-\lambda}\,(\lambda+\log(1/\lambda))^{2}$ . This shows that the above upper bound for small $\lambda$ can be reversed up to a constant. For $\lambda\leq 1$ , given $k\geq 1$ , from

[TABLE]

we get

[TABLE]

Hence, $H_{2}^{2}(Z)\leq 25\,\lambda\,\log^{2}(e/\lambda)$ , thus proving the second upper bound of the proposition.

Now, assuming that $\lambda\geq 1$ , let us apply the lower bounds (3.6)-(3.7) from Lemma 3.3, which for all $k\geq 1$ give

[TABLE]

and

[TABLE]

Note that this bound is also true for $k=0$ . Using the concavity of the function $\log^{2}x$ in $x\geq e$ and applying Jensen’s inequality, we therefore obtain that

[TABLE]

Hence $H_{2}(Z)\leq Cx$ , $x=\log(1+\lambda)\geq\log 2$ , with $C^{2}=2\,(1+\frac{1}{x})^{2}+\frac{18}{x^{2}}<50$ .

Applying the upper bound (3.6) from Lemma 3.3, we also see that this upper bound on $H_{2}$ can be reversed up to a constant as well.

∎
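The numerical claims in the proof of Proposition 11.2 can be examined with a short computation. The sketch below is hedged in one important respect: the displayed definition of $H_{2}$ is not reproduced here, so the code assumes $H_{2}(Z)=\big{(}{\mathbb{E}}\,\log^{2}v(Z)\big{)}^{1/2}$ , which is consistent with the bound on $H_{2}^{2}(Z)$ derived above. The function names and the truncation rule for the Poisson support are illustrative.

```python
import math

def C_squared(x):
    """C^2 = 2 (1 + 1/x)^2 + 18 / x^2 from the proof, for x = log(1 + lambda)."""
    return 2 * (1 + 1 / x) ** 2 + 18 / x ** 2

worst = C_squared(math.log(2))   # x >= log 2, and C^2 is decreasing in x

def poisson_H2(lam):
    """Assumed definition: sqrt(E log^2 v(Z)) for Z ~ Poisson(lam)."""
    kmax = int(lam + 40 * math.sqrt(lam) + 60)   # generous truncation
    total = 0.0
    for k in range(kmax + 1):
        logv = k * math.log(lam) - lam - math.lgamma(k + 1)
        total += math.exp(logv) * logv ** 2
    return math.sqrt(total)

# Small lambda: H_2^2 <= 25 lambda log^2(e / lambda).
small = {lam: (poisson_H2(lam) ** 2, 25 * lam * math.log(math.e / lam) ** 2)
         for lam in (0.1, 0.5, 1.0)}

# Large lambda: H_2 <= C log(1 + lambda) with C^2 < 50.
large = {lam: (poisson_H2(lam), math.sqrt(50) * math.log(1 + lam))
         for lam in (1.0, 10.0, 100.0)}

print(worst, small, large)
```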

Remark 11.3. With similar arguments, it follows that

[TABLE]

which can be reversed modulo an absolute factor $c>0$ . Hence, $H_{2}(Z)\sim H(Z)$ as long as $\lambda$ stays bounded away from zero.

Proof of Corollary 1.3. By Theorem 1.1 with $W$ as in (1.1) and with a Poisson random variable $Z$ with parameter $\lambda$ , we have

[TABLE]

up to some absolute constant $C$ . Using this estimate in (11.1) and applying Proposition 11.2, the desired inequality (1.9) immediately follows (in view of $\lambda_{2}\leq\lambda$ ).

To derive a more precise inequality illustrating the asymptotic behaviour in $\lambda$ in the typical case $\lambda_{2}\leq\frac{1}{2}\,\lambda$ , let us apply once more Theorem 1.1 with its sharper bound

[TABLE]

as in Proposition 3.1. By Proposition 11.1, this gives

[TABLE]

It remains to note that $1+H_{2}(Z)\leq C\log(2+\lambda)$ according to Proposition 11.2. ∎

Acknowledgement. We would like to thank the referee for drawing our attention to the work by V. Zacharovas and H.-K. Hwang. Thanks also to A. Zaitsev for drawing our attention to the work by I. S. Borisov and I. S. Vorozheĭkin.

Bibliography


[1] Andersen, K. F. Weighted inequalities for iterated convolutions. Proc. Amer. Math. Soc. 127 (1999), no. 9, 2643–2651.
[2] Barbour, A. D.; Hall, P. On the rate of Poisson convergence. Math. Proc. Cambridge Philos. Soc. 95 (1984), no. 3, 473–480.
[3] Bobkov, S. G.; Chistyakov, G. P.; Götze, F. Rényi divergence and the central limit theorem. Ann. Probab. 47 (2019), no. 1, 270–323.
[4] Bobkov, S. G.; Chistyakov, G. P.; Götze, F. Non-uniform bounds in the Poisson approximation with applications to informational distances. I. IEEE Transactions on Information Theory. Published online 25 April 2019.
[5] Borisov, I. S.; Vorozheĭkin, I. S. Accuracy of approximation in the Poisson theorem in terms of $\chi^{2}$ distance. (Russian) Sibirsk. Mat. Zh. 49 (2008), no. 1, 8–22; translation in Sib. Math. J. 49 (2008), no. 1, 5–17.
[6] van Erven, T.; Harremoës, P. Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inform. Theory 60 (2014), no. 7, 3797–3820.
[7] Harremoës, P. Binomial and Poisson distributions as maximum entropy distributions. IEEE Trans. Inform. Theory 47 (2001), no. 5, 2039–2041.
[8] Harremoës, P.; Johnson, O.; Kontoyiannis, I. Thinning and information projections. arXiv:1601.04255, Jan. 2016.
