Non-Uniform Bounds in the Poisson Approximation with Applications to Informational Distances. II
S. G. Bobkov, G. P. Chistyakov, F. Götze

TL;DR
This paper extends previous work on bounds for how closely sums of independent Bernoulli variables approximate a Poisson distribution, using various informational distances, without parameter restrictions.
Contribution
It generalizes earlier results by removing parameter constraints, providing asymptotically optimal bounds for distribution deviations in informational distances.
Findings
Derived bounds for Bernoulli sums in terms of Shannon and Rényi distances
Extended previous results to all Bernoulli parameters without restrictions
Provided asymptotically optimal bounds for distribution deviations
Abstract
We explore asymptotically optimal bounds for deviations of distributions of independent Bernoulli random variables from the Poisson limit in terms of the Shannon relative entropy and Rényi/Tsallis relative distances (including Pearson's $\chi^2$). This part generalizes the results obtained in Part I and removes any constraints on the parameters of the Bernoulli distributions.
School of Mathematics, University of Minnesota, USA; research was partially supported by the Simons Foundation and the NSF grant DMS-1855575
Faculty of Mathematics, University of Bielefeld, Germany; research was partially supported by SFB 1283
Sergey G. Bobkov, School of Mathematics, University of Minnesota, 127 Vincent Hall, 206 Church St. S.E., Minneapolis, MN 55455, USA
Gennadiy P. Chistyakov, Fakultät für Mathematik, Universität Bielefeld, Postfach 100131, 33501 Bielefeld, Germany
Friedrich Götze, Fakultät für Mathematik, Universität Bielefeld, Postfach 100131, 33501 Bielefeld, Germany
Key words and phrases:
$\chi^2$-divergence, Relative entropy, Poisson approximation
1991 Mathematics Subject Classification:
Primary 60E, 60F
1. Introduction
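Let $W = X_1 + \dots + X_n$ be the sum of independent random variables $X_j$ taking values 1 and 0 with respective probabilities $p_j$ and $q_j = 1 - p_j$. Thus,
$$\mathbb{P}\{W = k\} = \sum p_1^{\varepsilon_1} q_1^{1-\varepsilon_1} \cdots p_n^{\varepsilon_n} q_n^{1-\varepsilon_n}, \tag{1.1}$$
where the summation runs over all 0-1 sequences $\varepsilon_1, \dots, \varepsilon_n$ such that $\varepsilon_1 + \dots + \varepsilon_n = k$.
Denote by $Z$ a Poisson random variable with parameter $\lambda = p_1 + \dots + p_n$, i.e., taking non-negative integer values with probabilities
$$\mathbb{P}\{Z = k\} = \frac{\lambda^k}{k!}\, e^{-\lambda}, \qquad k = 0, 1, 2, \dots \tag{1.2}$$
It is well known that, if all $p_j$ are small, the distribution of $W$ approximates the distribution of $Z$ in terms of the total variation distance $d(W,Z)$. In particular, involving the functional $\lambda_2 = p_1^2 + \dots + p_n^2$, Barbour and Hall [2] derived a two-sided bound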
[TABLE]
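The setting above can be explored numerically. The following is a minimal sketch, not the authors' method: it computes the law of the Bernoulli sum by convolution and compares the total variation distance with the Barbour–Hall bounds. It assumes the bounds in their standard form, $\frac{1}{32}\min\{1, 1/\lambda\}\,\lambda_2 \le d \le \frac{1-e^{-\lambda}}{\lambda}\,\lambda_2$, with $d$ normalized as a supremum over sets (half the $\ell^1$ distance). The function names are illustrative.

```python
import math

def bernoulli_sum_pmf(p):
    """PMF of W = X_1 + ... + X_n for independent Bernoulli(p_j), by convolution."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)   # X_j = 0
            nxt[k + 1] += wk * pj       # X_j = 1
        w = nxt
    return w

def poisson_pmf(lam, k):
    """P{Z = k} for Z ~ Poisson(lam), lam > 0, computed through logarithms."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def tv_and_bounds(p):
    """Return (lower bound, total variation distance, upper bound)."""
    lam = sum(p)
    lam2 = sum(pj * pj for pj in p)
    w = bernoulli_sum_pmf(p)
    v = [poisson_pmf(lam, k) for k in range(len(w))]
    tail = max(1.0 - sum(v), 0.0)  # P{Z > n}, where P{W = k} = 0
    # Assumed normalization: d = sup_A |P{W in A} - P{Z in A}| = half the l1 distance.
    d = 0.5 * (sum(abs(wk - vk) for wk, vk in zip(w, v)) + tail)
    lower = min(1.0, 1.0 / lam) * lam2 / 32.0
    upper = (1.0 - math.exp(-lam)) / lam * lam2
    return lower, d, upper

print(tv_and_bounds([0.1] * 20))
print(tv_and_bounds([0.5, 0.2, 0.05]))
```

If the distance is instead normalized as the full $\ell^1$ norm, the computed value doubles and the bounds should be rescaled accordingly.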
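There is considerable interest as well in the question of Poisson approximation for (stronger) informational distances, including the Rényi divergences, or equivalently, the Tsallis relative entropies in their full hierarchy. Being well-defined in the setting of abstract measure spaces (cf. e.g. [6], [3]), in the discrete model specified above these important quantities are respectively given, with $w_k = \mathbb{P}\{W = k\}$ and $v_k = \mathbb{P}\{Z = k\}$, for any parameter $\alpha > 0$, $\alpha \neq 1$, by
$$D_\alpha = D_\alpha(W\|Z) = \frac{1}{\alpha - 1}\, \log \sum_{k=0}^{\infty} \Big(\frac{w_k}{v_k}\Big)^{\alpha} v_k$$
and
$$T_\alpha = T_\alpha(W\|Z) = \frac{1}{\alpha - 1}\, \Big[\sum_{k=0}^{\infty} \Big(\frac{w_k}{v_k}\Big)^{\alpha} v_k - 1\Big].$$
The functions $\alpha \to D_\alpha$ and $\alpha \to T_\alpha$ are non-decreasing, and in the particular cases $\alpha = 1$ and $\alpha = 2$, we deal with the more familiar relative entropy (Kullback-Leibler distance) and the Pearson $\chi^2$-distance
$$D = D(W\|Z) = \sum_{k=0}^{\infty} w_k \log \frac{w_k}{v_k}, \qquad \chi^2 = \chi^2(W,Z) = \sum_{k=0}^{\infty} \frac{(w_k - v_k)^2}{v_k}.$$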
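The hierarchy of distances just described can be illustrated with a short computation. This is a sketch under the assumption that the Rényi and Tsallis quantities take their standard forms, $D_\alpha = \frac{1}{\alpha-1}\log\sum_k (w_k/v_k)^\alpha v_k$ and $T_\alpha = \frac{1}{\alpha-1}\big[\sum_k (w_k/v_k)^\alpha v_k - 1\big]$, where $w_k$ and $v_k$ denote the probabilities of $W$ and $Z$ at $k$. All function names are illustrative.

```python
import math

def bernoulli_sum_pmf(p):
    """PMF of a sum of independent Bernoulli(p_j) variables, by convolution."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)
            nxt[k + 1] += wk * pj
        w = nxt
    return w

def laws(p):
    """Return (w, v): the law of W and the Poisson(lambda) law on {0, ..., n}."""
    lam = sum(p)
    w = bernoulli_sum_pmf(p)
    v = [math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) for k in range(len(w))]
    return w, v

def power_sum(w, v, alpha):
    """sum_k (w_k / v_k)^alpha v_k; only k <= n contributes since w_k = 0 beyond n."""
    return sum(wk ** alpha * vk ** (1.0 - alpha) for wk, vk in zip(w, v) if wk > 0.0)

def renyi(w, v, alpha):
    return math.log(power_sum(w, v, alpha)) / (alpha - 1.0)

def tsallis(w, v, alpha):
    return (power_sum(w, v, alpha) - 1.0) / (alpha - 1.0)

def relative_entropy(w, v):
    return sum(wk * math.log(wk / vk) for wk, vk in zip(w, v) if wk > 0.0)

def pearson_chi2(w, v):
    """sum_k (w_k - v_k)^2 / v_k over all k >= 0; each k > n contributes v_k."""
    head = sum((wk - vk) ** 2 / vk for wk, vk in zip(w, v))
    return head + max(1.0 - sum(v), 0.0)

w, v = laws([0.2, 0.5, 0.1, 0.3])
print("D     =", relative_entropy(w, v))
print("chi^2 =", pearson_chi2(w, v), " T_2 =", tsallis(w, v, 2.0))
for a in (1.5, 2.0, 3.0):
    print(a, renyi(w, v, a), tsallis(w, v, a))
```

The output should show $T_2$ agreeing with $\chi^2$ up to rounding, and both families increasing in $\alpha$, in line with the monotonicity noted above.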
We refer to [13] and [4] for historical references related to the lower and upper bounds as in (1.3), as well as to recent developments towards the problem of bounding of and . Here, let us only mention a few results in this direction.
In a rather general asymptotic regime (which is typical in applications), Borisov and Vorozheĭkin [5] observed that $\chi^2$ is approximately $\frac{1}{2}\big(\frac{\lambda_2}{\lambda}\big)^2$, and more precisely,
[TABLE]
On the other hand, Harremoës, Johnson and Kontoyiannis [8] have recently derived a universal lower bound on the relative entropy, $D \geq \frac{1}{4}\big(\frac{\lambda_2}{\lambda}\big)^2$. Here, the constant $\frac{1}{4}$ is best possible and is asymptotically attained in the case of equal probabilities [9]. It is therefore natural to wonder whether or not there are two-sided bounds such as
[TABLE]
This turns out to be true in the case where is bounded away from 1. Based on orthogonal expansions in Charlier polynomials over the Poisson measure and using the Parseval identity in this context, Zacharovas and Hwang [13] obtained a superior upper bound
[TABLE]
(among other similar results for different distances). Consequently, if for example , then (1.4) is fulfilled with .
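The universal lower bound of Harremoës, Johnson and Kontoyiannis quoted above, $D \geq \frac14 (\lambda_2/\lambda)^2$, is easy to probe numerically. The sketch below computes the relative entropy of a Bernoulli sum with respect to the Poisson law with the same mean and reports the ratio $D / (\lambda_2/\lambda)^2$. The helper names and the test vectors are illustrative choices, not taken from the paper.

```python
import math

def bernoulli_sum_pmf(p):
    """PMF of a sum of independent Bernoulli(p_j) variables, by convolution."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)
            nxt[k + 1] += wk * pj
        w = nxt
    return w

def relative_entropy_to_poisson(p):
    """D(W || Z) for W the Bernoulli sum and Z ~ Poisson(lambda), lambda = sum(p)."""
    lam = sum(p)
    w = bernoulli_sum_pmf(p)
    total = 0.0
    for k, wk in enumerate(w):
        if wk > 0.0:
            log_vk = k * math.log(lam) - lam - math.lgamma(k + 1)
            total += wk * (math.log(wk) - log_vk)
    return total

def ratio(p):
    """D divided by (lambda_2 / lambda)^2; the quoted bound says this is >= 1/4."""
    lam = sum(p)
    lam2 = sum(pj * pj for pj in p)
    return relative_entropy_to_poisson(p) / (lam2 / lam) ** 2

for p in ([0.05] * 200, [0.5] * 4, [0.9, 0.1, 0.3], [1.0] * 5):
    print(len(p), round(sum(p), 3), ratio(p))
```

For many equal small probabilities the ratio should sit just above $\frac14$, while for probabilities close to 1 it is noticeably larger.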
The upper estimate such as (1.4) also appears as a consequence of non-uniform bounds which have been recently studied in [4]. It was shown there that is of order at most on a large part of the support of the Poisson measure, especially when is large. One of the aims of this paper is to extend (1.4) modulo absolute constants to the whole range of . To formulate results in a compact form, let us use the notation , whenever two positive quantities are related by with some absolute constants . Introduce the quantity
[TABLE]
Clearly, .
Theorem 1.1. We have
[TABLE]
If is bounded away from 1, then is bounded, and (1.6) recovers (1.4). A similar conclusion is also true, when is not large, say , which is typical for applications (note that for such ’s, may be close to 1, and then (1.5) fails to be optimal). On the other hand, if these two assumptions on and are violated (which we henceforth call the “degenerate case”), both distances are bounded away from zero and can be large, since then
[TABLE]
This shows that the lower bound for in (1.4) may not be reversed in general. Indeed, in the extreme case with all , we have . Here , hence as
[TABLE]
[TABLE]
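The extreme case can be worked out directly from the definitions. The sketch below assumes it refers to all $p_j = 1$, so that $W = n$ almost surely and $\lambda = \lambda_2 = n$. Then the relative entropy reduces to $D = \log\big(1/\mathbb{P}\{Z = n\}\big)$ and the Pearson distance to $\chi^2 = 1/\mathbb{P}\{Z = n\} - 1$, which by Stirling's formula behave like $\frac12 \log(2\pi n)$ and $\sqrt{2\pi n}$ respectively.

```python
import math

def extreme_case(n):
    """All p_j = 1 (assumed): W = n almost surely and lambda = lambda_2 = n."""
    log_vn = n * math.log(n) - n - math.lgamma(n + 1)  # log P{Z = n}, Z ~ Poisson(n)
    D = -log_vn                 # sum_k w_k log(w_k / v_k) with the single atom w_n = 1
    chi2 = math.expm1(-log_vn)  # sum_k w_k^2 / v_k - 1 = 1 / v_n - 1
    return D, chi2

for n in (10, 100, 1000):
    D, chi2 = extreme_case(n)
    print(n, D, 0.5 * math.log(2 * math.pi * n), chi2, math.sqrt(2 * math.pi * n))
```

Since $(\lambda_2/\lambda)^2 = 1$ here, both distances grow with $n$ while the lower bound $\frac14(\lambda_2/\lambda)^2$ stays fixed.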
As a next step, we employ the non-uniform bounds of [4] to extend (1.4) and (1.6) to all Tsallis entropies.
Theorem 1.2. Given ,
[TABLE]
with involved constants depending on . In particular, as long as .
Let us finally mention one application of Theorem 1.1 to the problem of the estimation of the difference of entropies
[TABLE]
where stands for the Shannon entropy, that is,
[TABLE]
The property that is positive is a consequence of the assertion, recently proved by Hillion and Johnson [10], that is a concave function of the vector . Indeed, since is invariant under permutations of , this entropy attains its maximum on the simplex , at the point where all coordinates coincide, that is, for . But in that case, the distribution of represents the binomial law with parameters and whose entropy is dominated by , as was shown by Harremoës [7].
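The entropy comparison described in this paragraph can be checked on a small example. The following sketch computes the Shannon entropy of a Bernoulli sum, of the binomial law with the same $n$ and the same mean, and of the Poisson law with the same mean. The Poisson entropy is approximated by truncating the series at a fixed number of terms, which is an implementation choice made here for illustration.

```python
import math

def bernoulli_sum_pmf(p):
    """PMF of a sum of independent Bernoulli(p_j) variables, by convolution."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)
            nxt[k + 1] += wk * pj
        w = nxt
    return w

def entropy(pmf):
    """Shannon entropy (natural logarithm) of a finite probability vector."""
    return -sum(x * math.log(x) for x in pmf if x > 0.0)

def poisson_entropy(lam, terms=400):
    """Entropy of Poisson(lam), truncated to the first `terms` atoms."""
    pmf = [math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) for k in range(terms)]
    return entropy(pmf)

p = [0.1, 0.3, 0.5]
equal = [sum(p) / len(p)] * len(p)          # same n and same lambda
h_w = entropy(bernoulli_sum_pmf(p))
h_binomial = entropy(bernoulli_sum_pmf(equal))
h_z = poisson_entropy(sum(p))
print(h_w, h_binomial, h_z)
```

The three printed values should be increasing: unequal probabilities, then the binomial case, then the Poisson limit.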
Thus, the difference of entropies in this particular model may be viewed as a kind of informational distance. Sason proposed to bound for equal ’s by means of the so-called maximal coupling, cf. [12]. Here, we show that this distance may be controlled in terms of , which together with the upper bound on the Pearson distance as in (1.4)-(1.5) leads to the following estimate.
Corollary 1.3. With some constants depending only on , we have
[TABLE]
If , one may take with an absolute constant .
Below, we start with some general bounds involving the relative entropy and the Pearson distance (Section 2). In Section 3, we describe several results obtained in [4] in the non-degenerate case, and employ there some bounds for the probability function of the Poisson law. The remaining parts are devoted to the proof of Theorems 1.1 and 1.2 in the degenerate case (Sections 4-10) and of Corollary 1.3 (Section 11). Thus, the paper is structured as follows:
1. Introduction
2. General bounds on relative entropy and $\chi^2$
3. Poisson approximation in the non-degenerate case
4. Upper bounds on $\chi^2$ and $D$
5. Lower bound on $\chi^2$
6. Lower bound on $D$
7. Proof of Theorem 1.1
8. Tsallis versus Vajda-Pearson
9. Estimates of Vajda-Pearson distances
10. Proof of Theorem 1.2
11. Difference of entropies
2. General Bounds on Relative Entropy and $\chi^2$
Before turning to the problem of lower and upper bounds for the relative entropy and -distance, we first collect several useful general inequalities. If two discrete random elements and in a measurable space take at most countably many values with probabilities and , the above distances are defined canonically by
[TABLE]
Proposition 2.1. We have
[TABLE]
Moreover,
[TABLE]
Proof. Using the Taylor formula for the logarithmic function, write
[TABLE]
Here
[TABLE]
thus proving the first assertion. Similarly, we have a second identity
[TABLE]
Adding the two identities, we get
[TABLE]
which is the desired inequality (2.2). ∎
Proposition 2.2. Let and be independent, non-negative, integer-valued random variables with finite means, and let and be independent Poisson random variables with and . Then
[TABLE]
In addition,
[TABLE]
For the proof, we refer to Johnson [11], pp. 133–134. Let us only mention that (2.4) is obtained in [11] in the more general form
[TABLE]
with arbitrary , which represents a Poisson analog of weighted convolution inequalities due to Andersen [1]. Here, for there is an equality, and comparing the derivatives of both sides at this point, we arrive at the relation (2.3).
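A numerical illustration of the kind of convolution inequalities in Proposition 2.2 is given below. It is a sketch under an assumption: the inequalities are taken in the standard forms $D(X + X' \,\|\, Z + Z') \le D(X\|Z) + D(X'\|Z')$ and $1 + \chi^2(X + X', Z + Z') \le (1 + \chi^2(X,Z))(1 + \chi^2(X',Z'))$, both of which follow from the data-processing inequality; the exact statements (2.3)-(2.4) may differ in form. The check is restricted to Bernoulli sums, and the helper names are illustrative.

```python
import math

def bernoulli_sum_pmf(p):
    """PMF of a sum of independent Bernoulli(p_j) variables, by convolution."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)
            nxt[k + 1] += wk * pj
        w = nxt
    return w

def d_and_chi2(p):
    """Relative entropy and Pearson distance from the Bernoulli sum to Poisson(sum(p))."""
    lam = sum(p)
    w = bernoulli_sum_pmf(p)
    log_v = [k * math.log(lam) - lam - math.lgamma(k + 1) for k in range(len(w))]
    D = sum(wk * (math.log(wk) - lv) for wk, lv in zip(w, log_v) if wk > 0.0)
    # chi^2 = sum_k w_k^2 / v_k - 1, with the sum effectively over k <= n
    chi2 = sum(wk * wk * math.exp(-lv) for wk, lv in zip(w, log_v)) - 1.0
    return D, chi2

p1, p2 = [0.2, 0.7], [0.9, 0.4, 0.1]
D1, c1 = d_and_chi2(p1)
D2, c2 = d_and_chi2(p2)
# The sum of two independent Bernoulli sums is the Bernoulli sum of the joined
# parameters, and the sum of independent Poisson variables is again Poisson.
D12, c12 = d_and_chi2(p1 + p2)
print(D12, "<=", D1 + D2)
print(1.0 + c12, "<=", (1.0 + c1) * (1.0 + c2))
```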
3. Poisson Approximation in the Non-Degenerate Case
Now, we restrict ourselves to the random variables and with distributions described in (1.1)-(1.2). In particular,
[TABLE]
The bounds (1.4) follow from the following two assertions proved in [4]. To compare the lower and upper bounds, we recall the lower bound (1.4) of Harremoës, Johnson and Kontoyiannis [8].
Proposition 3.1. If , then
[TABLE]
where depends on and is an increasing continuous function with . In particular, if , then
[TABLE]
Proposition 3.2. If and with , then
[TABLE]
where one may take with some absolute constant, e.g. .
A natural approach to the Poisson approximation is based on the comparison of characteristic functions. Since the random variables $W$ and $Z$ assume non-negative integer values only, one may equivalently consider the associated generating functions, as in [4]. The generating function for the Poisson law with parameter $\lambda$ is given by
[TABLE]
which is an entire function of the complex variable . Correspondingly, the generating function for the distribution of the random variable is
[TABLE]
which is a polynomial of degree . Hence, the difference between the involved probabilities may be expressed with the help of the contour integrals by the Cauchy formula
[TABLE]
where is the uniform probability measure on the circle of an arbitrary radius . This identity for the difference of probabilities was used in [4] in the derivation of the upper bound in (3.2), while here the representation
[TABLE]
will be particularly helpful in the study of the degenerate case.
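The contour-integral representation can be sketched numerically. The code below assumes the generating functions in their standard forms, $e^{\lambda(z-1)}$ for the Poisson law and $\prod_j (q_j + p_j z)$ for the Bernoulli sum, and recovers individual probabilities from the Cauchy formula on a circle of radius $r$, discretized by the trapezoid rule. The function names and the number of nodes are illustrative choices.

```python
import cmath
import math

def gen_W(p, z):
    """Generating function of the Bernoulli sum: prod_j (q_j + p_j z)."""
    out = 1.0 + 0.0j
    for pj in p:
        out *= (1.0 - pj) + pj * z
    return out

def gen_Z(lam, z):
    """Generating function of the Poisson law: exp(lam (z - 1))."""
    return cmath.exp(lam * (z - 1.0))

def coefficient(g, k, r=1.0, nodes=256):
    """k-th Taylor coefficient of g: the average of g(z) z^(-k) over |z| = r.

    The average over the uniform measure on the circle is approximated by
    `nodes` equally spaced points; for a polynomial of degree < nodes this
    is exact up to rounding.
    """
    total = 0.0 + 0.0j
    for m in range(nodes):
        z = r * cmath.exp(2j * math.pi * m / nodes)
        total += g(z) * z ** (-k)
    return (total / nodes).real

p = [0.3, 0.6]
for k in range(3):
    print(k, coefficient(lambda z: gen_W(p, z), k),
          coefficient(lambda z: gen_Z(sum(p), z), k, r=0.5))
```

The radius is a free parameter of the representation; different choices of $r$ return the same coefficient.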
When estimating the Poisson probabilities
[TABLE]
for a fixed parameter , it is convenient to use the well-known Stirling-type two-sided bound:
[TABLE]
In particular, it implies the following Gaussian type estimates (cf. [4])
Lemma 3.3. For all ,
[TABLE]
Moreover, if , then
[TABLE]
Here, the lower bound may be improved in the region as
[TABLE]
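A Stirling-type two-sided bound of the kind used in this section can be verified numerically. The sketch below assumes the common form $\sqrt{2\pi}\, k^{k+1/2} e^{-k} \le k! \le e\, k^{k+1/2} e^{-k}$ for $k \ge 1$; the paper's exact statement may differ. It also shows the resulting two-sided bound on the Poisson probabilities. All computations are done with logarithms to avoid overflow.

```python
import math

def log_factorial_bounds(k):
    """Return (lower, log k!, upper) for the assumed Stirling-type bound, k >= 1."""
    core = (k + 0.5) * math.log(k) - k
    lower = 0.5 * math.log(2.0 * math.pi) + core   # log of sqrt(2 pi) k^(k+1/2) e^(-k)
    upper = 1.0 + core                             # log of e k^(k+1/2) e^(-k)
    return lower, math.lgamma(k + 1), upper

def log_poisson_bounds(lam, k):
    """Return (lower, log P{Z = k}, upper) for Z ~ Poisson(lam), k >= 1."""
    lo_f, log_f, up_f = log_factorial_bounds(k)
    base = k * math.log(lam) - lam
    # Upper bound on k! gives a lower bound on the probability, and vice versa.
    return base - up_f, base - log_f, base - lo_f

print(log_factorial_bounds(1))
print(log_factorial_bounds(50))
print(log_poisson_bounds(10.0, 12))
```

Equality holds in the upper factorial bound at $k = 1$, so that bound cannot be improved by a smaller constant.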
4. Upper Bounds on $\chi^2$ and $D$
We now turn to Theorem 1.2 in the degenerate case, where the optimal bounds on the relative entropy and have a different behavior. As an intermediate step, let us derive the following upper bounds for the -distance and the relative entropy, by using the quantity
[TABLE]
Proposition 4.1. For , we have
[TABLE]
These bounds are sharp when , cf. Propositions 5.1 and 6.1.
Proof. Setting , , we exploit the contour integral representation (3.3), i.e.,
[TABLE]
It yields an upper bound
[TABLE]
where
[TABLE]
Let us choose . Since , we have
[TABLE]
Moreover, applying , cf. (3.4), the above is simplified to
[TABLE]
where is the density of the Poisson law with parameter .
Now, to bound , for all , using , we have
[TABLE]
Here
[TABLE]
and
[TABLE]
These right-hand sides have the form
[TABLE]
and we get
[TABLE]
First, we consider the region , in which case and and thus
[TABLE]
Applying this bound together with (4.4) in (4.3), we get
[TABLE]
As for the regions and , we use the property , which yields simpler upper bounds
[TABLE]
Now, recall that (as mentioned in (3.1)) and write
[TABLE]
By (4.5),
[TABLE]
To estimate , first note that for . For , using the property that the function is increasing for , we obtain from (4.6) that
[TABLE]
Here we applied the inequality
[TABLE]
with and .
To estimate , one may bound the sequence for by the geometric progression with suitable parameters and . To this aim, consider the function
[TABLE]
We have
[TABLE]
if which we assume. In this case, $u$ is decreasing, so that $u(x) \leq u(4\lambda) = \log\big(2\sqrt{\lambda}\,(\tfrac{e}{4b})^{4\lambda}\big) \leq \log A$, where
[TABLE]
where in the last step we chose and applied (4.7) with and . Thus, putting and noting that , we get
[TABLE]
Finally, using (due to ), we get . This gives , so (4.1) follows.
Turning to the second assertion and using , write similarly
[TABLE]
For the region , we can apply the bound (4.5) again, which gives
[TABLE]
and therefore, using ,
[TABLE]
Using (4.6) together with the inequality (), we obtain, similarly to the derivation of the bound on in the -case, that
[TABLE]
Choosing again similarly to the derivation of the bound on in the -case, we also get
[TABLE]
Hence, , and (4.2) follows as well. ∎
5. Lower Bound on $\chi^2$
Here, we complement Proposition 4.1 by a similar lower bound for the -distance in terms of the same quantity . Let .
Proposition 5.1. If , then with some absolute constant
[TABLE]
Moreover,
[TABLE]
as long as .
Suppose that . To derive (5.2) from (5.1), it is sufficient to require that , since then . This condition is fulfilled, as long as and then we obtain (5.2). In the remaining case , the inequality (5.2) follows from the lower bound
[TABLE]
cf. (1.4). Indeed, in this case, , so that , and thus , while .
Thus, it remains to derive the first inequality (5.1). First we shall prove it, assuming that is sufficiently large. As in Section 4, for any fixed , we apply the Cauchy theorem and write
[TABLE]
with integration over the uniform distribution on the circle of the complex plane. Here and below
[TABLE]
and
[TABLE]
We split the integration over the two regions so as to work with the representation
[TABLE]
where
[TABLE]
To properly estimate from below, needs to be estimated from above (in absolute value), while , which is a real number, should be estimated from below.
Furthermore, the quantity needs to be estimated from below as well. To this aim, we choose the radius by the condition , or equivalently
[TABLE]
Since the function is monotone and , , there is a unique solution, say , to this equation as long as (which may be assumed). We also assume that not all are equal to 0 or 1, so that .
Let us also emphasize that is concave on the positive half-axis. Since , we necessarily have in case , and in case .
Lemma 5.2. For any , the solution to the equation satisfies
[TABLE]
Moreover, in case , we have , and actually with some
[TABLE]
Proof. We have
[TABLE]
The inverse function is increasing and convex. Hence, for any ,
[TABLE]
Plugging , we obtain the first inequality.
Now, since for , we conclude that and . Thus, if , we obtain that
[TABLE]
implying . For , one may use , which gives and . Hence, again by the assumption,
[TABLE]
implying . In both cases, , proving the second assertion of the lemma.
Now, in the interval , we necessarily have , so that
[TABLE]
In addition,
[TABLE]
Let us now write the Taylor expansion up to the linear and quadratic terms for the inverse function around the point . Then we get
[TABLE]
where the points and lie between and . Putting and , the above is simplified as
[TABLE]
where and lie between and . It remains to apply these equalities with , that is, , and note that , while
[TABLE]
Note that and . ∎
Lemma 5.3. Let be the solution of for . Then
[TABLE]
Proof. The function
[TABLE]
is vanishing at and has derivative
[TABLE]
Since is increasing and concave, whenever . In particular, in the interval , we have
[TABLE]
which implies
[TABLE]
By Lemma 5.2, and . Moreover, as was shown in the proof, . Hence
[TABLE]
Here, . ∎
Lemma 5.4. Let . Then, for ,
[TABLE]
Proof. By Lemma 5.2, . As in the proof of Proposition 4.1, recall that for and ,
[TABLE]
For , necessarily and
[TABLE]
Hence
[TABLE]
Let us now estimate from below. Using which is the same as , we have, for ,
[TABLE]
In the region , there is a lower bound with best attainable constant when . In the case , this constant is given by . Therefore, for ,
[TABLE]
Here, the involved function
[TABLE]
is increasing in and decreasing in . Hence, if , then . If , that is, when , we have
[TABLE]
Finally, if , which is equivalent to , we have
[TABLE]
Thus, in all cases, on the interval , so that
[TABLE]
and thus
[TABLE]
Here we used , which ensures that
[TABLE]
where . In addition (recalling one of the upper bounds when bounding the integral from above), and using for , we get that
[TABLE]
Now, the assumption (5.3) may be rewritten as
[TABLE]
Here, the functions $\mathrm{Im}\big(\log(q_l + p_l r\, e^{i\theta})\big)$ are odd, so their second derivatives are vanishing at zero. We now apply the Taylor formula up to the cubic term to the function
[TABLE]
on the interval to get that
[TABLE]
with some . To perform differentiation, consider a function of the form
[TABLE]
We have
[TABLE]
Therefore,
[TABLE]
implying that
[TABLE]
But, for and ,
[TABLE]
Hence
[TABLE]
Here we used the property that is increasing in and is decreasing in . If , this gives . If , that is, when , we get . The latter expression is minimized at where it has the value . Finally, if , which is equivalent to , we have
[TABLE]
From this,
[TABLE]
so that
[TABLE]
with . Thus,
[TABLE]
Now, as we mentioned before, the function is odd in , so that is a real number given by
[TABLE]
Hence, using
[TABLE]
from the previous estimates we may deduce the lower bound
[TABLE]
where on the last step we assume that . Together with the upper bound on , we arrive at the lower bound
[TABLE]
Thus, Lemma 5.4 is proved. ∎
Proof of Proposition 5.1. We conclude from Lemmas 5.3 and 5.4 that
[TABLE]
for under the assumption .
On the other hand, , cf. (3.5). Since , we have
[TABLE]
As a consequence,
[TABLE]
In order to clarify the last inequality, note that the condition implies that . The above summation is performed over all integers from the interval of length at least . It contains at least one integer point, and actually, the number of integer points in it is at least . Moreover,
[TABLE]
Here, we used the bounds and , together with .
In order to treat the region , we apply Proposition 2.2. Let and , where are independent Bernoulli random variables taking values 1 and 0 with probabilities and . Assume as well that and are independent. Then and satisfy the condition .
Denote by a Poisson random variable with which is independent of . By the previous step and the inequality (2.4) of Proposition 2.2,
[TABLE]
Here, by (4.1), . Moreover, since , we have
[TABLE]
It follows that
[TABLE]
Hence, Proposition 5.1 holds in the case as well.
6. Lower Bound on $D$
An analogue of Proposition 5.1 is the following statement for the relative entropy. Recall that .
Proposition 6.1. If and , then
[TABLE]
where , , and .
Proof. Let us recall two estimates from the previous section, namely
[TABLE]
The first one is valid under the conditions and , cf. (5.4). Clearly, they are fulfilled if and . If additionally , , then
[TABLE]
Since , we also have an upper bound
[TABLE]
In order that , it is therefore sufficient to require that , that is, . We have, moreover,
[TABLE]
Now, applying the inequality (2.1) of Proposition 2.1, we get
[TABLE]
Note that, if , the -interval has length at least , so, the total number of integer points in this interval is at least as well. Hence, the last sum can be bounded from below by
[TABLE]
Thus,
[TABLE]
Moreover, if with , then
[TABLE]
and (6.2) yields
[TABLE]
The proposition is thus proved under the conditions and with . It remains to eliminate the first condition, assuming that and again that with being sufficiently close to 1. To this aim, we appeal to Proposition 2.2 again like in the last step of the proof of Proposition 5.1. Namely, using the same notations and assumptions, from the inequality (2.3) and using (6.3), we obtain that
[TABLE]
where and . It holds, as long as , i.e., Since , the latter would follow from
[TABLE]
which is solved as
[TABLE]
Moreover, by (4.2), we have . This bound may be used in (6.4), which gives
[TABLE]
where the second inequality holds true when is sufficiently small. Namely,
[TABLE]
if and . Since the product in the exponent is smaller than , we may choose . In this case,
[TABLE]
assuming that . But
[TABLE]
for all . It remains to note that , , . ∎
7. Proof of Theorem 1.1
Let us summarize. Using the quantity
[TABLE]
the results on Poisson approximation obtained for different regions of and can be combined in the form of the following two-sided bounds
[TABLE]
[TABLE]
which are valid up to some absolute positive constants and . Let us describe the proof of Theorem 1.1 and provide explicit values for these constants. As we will see, (7.1)-(7.2) hold with and .
An upper bound in (7.1).
If , these bounds simplify and are made precise via
[TABLE]
Here, the left inequality holds for all and , cf. [8], while the right inequality is part of Proposition 3.1. Note that implies .
If and , we have, by Proposition 3.2,
[TABLE]
so that
[TABLE]
In the case where and , one may apply (4.2) which gives
[TABLE]
Here, the right-hand side contains a better numerical constant in comparison with (7.4), and we finally get (7.1) with a constant .
A lower bound in (7.1).
If , then , so that the lower bound in (7.3) yields (7.1) with .
If , the inequality (7.4) may be reversed by virtue of (6.1), which gives
[TABLE]
with , provided that and , where and . But, the remaining regions belong to the non-degenerate case, where is bounded by a quantity which depends on or . Indeed, if , then , so,
[TABLE]
This means that the left inequality in (7.1) holds with a constant which is smaller than in the analogous inequality (7.5). Similarly, if , then , and we get, by the lower bound in (7.3),
[TABLE]
This means that the left inequality in (7.1) holds true with the same constant as above. Thus, the lower bound in (7.1) holds with constant ().
An upper bound in (7.2).
If , we have (7.3), which implies (7.2) with .
If and , a stronger version of (7.4) is provided by Proposition 3.2, which gives
[TABLE]
so that (7.2) holds true with . In the case where and , one may apply (4.1) which gives
[TABLE]
Here, the right-hand side contains a better numerical constant, and we finally get (7.2) with the same constant as in (7.1).
A lower bound in (7.2).
If , then , so that the lower bound in (7.3) yields (7.1) with .
Assume that , in which case . By (5.2), we have
[TABLE]
with , provided that , . This gives
[TABLE]
and we obtain the left inequality in (7.2) with .
The remaining region belongs to the non-degenerate case, where is bounded. Indeed, if , then , so that, by the left inequality in (7.3),
[TABLE]
This means that the left inequality in (7.1) holds true with constant which is slightly better than the constant in the analogous inequality (7.6). Thus, the lower bound in (7.2) holds true with constant . ∎
8. Tsallis versus Vajda-Pearson
We now turn to the Tsallis relative entropies of other indexes. To make an application of non-uniform bounds more convenient, first let us relate to the Vajda-Pearson distance
[TABLE]
It is defined for arbitrary random elements and in a measure space whose distributions are absolutely continuous and have densities and respectively with respect to the measure on (the definition does not depend on the choice of ).
Recall that
[TABLE]
so that is the classical Pearson distance, and note that as long as the distribution of is not absolutely continuous with respect to the distribution of . We need the following auxiliary result.
Proposition 8.1. For any ,
[TABLE]
Proof. We may assume that the distribution of is absolutely continuous with respect to the distribution of , with . In this case, the (non-negative) function is well defined a.e. with respect to the probability measure . We consider it as a random variable on the probability space with finite moment of order . Note that
[TABLE]
Putting , define the function , , so that . By the integral Taylor formula,
[TABLE]
Introducing the sets and , we have
[TABLE]
and
[TABLE]
We obtain the assertion of the proposition from the last two bounds. ∎
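The Vajda-Pearson distance of this section can be computed directly in the Bernoulli setting. The sketch below assumes the standard form $\chi_\alpha = \sum_k |w_k - v_k|^\alpha / v_k^{\alpha - 1}$, with $w_k$ and $v_k$ the probabilities of the Bernoulli sum and of the Poisson law at $k$. For $\alpha = 2$ it reduces to the classical Pearson distance. The function names are illustrative.

```python
import math

def bernoulli_sum_pmf(p):
    """PMF of a sum of independent Bernoulli(p_j) variables, by convolution."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)
            nxt[k + 1] += wk * pj
        w = nxt
    return w

def poisson_pmf(lam, k):
    """P{Z = k} for Z ~ Poisson(lam), lam > 0."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def vajda_pearson(p, alpha):
    """Assumed form: sum_k |w_k - v_k|^alpha * v_k^(1 - alpha) over all k >= 0."""
    lam = sum(p)
    w = bernoulli_sum_pmf(p)
    v = [poisson_pmf(lam, k) for k in range(len(w))]
    head = sum(abs(wk - vk) ** alpha * vk ** (1.0 - alpha) for wk, vk in zip(w, v))
    # For k > n we have w_k = 0, so each term equals v_k and the tail is P{Z > n}.
    return head + max(1.0 - sum(v), 0.0)

p = [0.2, 0.5, 0.3]
for alpha in (1.5, 2.0, 3.0):
    print(alpha, vajda_pearson(p, alpha))
```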
9. Estimates of Vajda-Pearson distances
For the proof of Theorem 1.2 we need the following propositions. We thus return to the setting of Bernoulli trials. Let us denote by a positive constant depending on only, which may vary from place to place.
Proposition 9.1. For and , we have
[TABLE]
Proof. Applying Lemmas III.1-2 and repeating the argument used in the proof of Proposition III.4 from [4], we obtain that
[TABLE]
∎
Proposition 9.2. Let . If and with , then
[TABLE]
Proof. Write
[TABLE]
In the range we apply the inequality (VI.2) from [4] which gives
[TABLE]
Therefore
[TABLE]
Here we use the upper bound .
In order to estimate we use the inequalities (VI.3) and (II.1) from [4] to get
[TABLE]
The assertion of the proposition follows immediately from the last two estimates. ∎
10. Proof of Theorem 1.2
To complete the proof of Theorem 1.2, we need the following two lemmas. Recall that .
Lemma 10.1. For and ,
[TABLE]
Proof. By the definition of the Tsallis distance,
[TABLE]
By (4.5),
[TABLE]
Using (4.6) and repeating the argument of Section 4, we obtain the upper bounds . The three last estimates give the assertion of the lemma. ∎
Lemma 10.2. For and , with some constant
[TABLE]
Moreover
[TABLE]
as long as $\lambda_2 \geq \big(1 - \frac{c_1(\alpha)^2}{4}\big)\,\lambda$.
Proof. The assertion (10.2) follows from the assertion (10.1) in the same way as (5.2) follows from (5.1). Therefore we omit the proof.
In order to prove (10.1) we use the lower bound (5.4). Repeating the argument of the proof of Proposition 5.1, we easily obtain the lower bound, under the assumption ,
[TABLE]
In order to treat the region we refer to Johnson [11], pp. 133–134, and repeat the argument of the end of Section 5. ∎
Proof of Theorem 1.2. Assuming that , we have with involved constants depending on , and then we need to show that .
In the case , we have
[TABLE]
Turning to the case , first let . Since , by Propositions 8.1 and 9.1,
[TABLE]
Now, let . Then, by Propositions 8.1 and 9.2, we conclude that
[TABLE]
It remains to consider the region . But in this case, the assertion of the theorem immediately follows from Lemmas 10.1 and 10.2.
11. Difference of Entropies
For the proof of Corollary 1.3, we shall use another functional
[TABLE]
where is an integer-valued random variable. Thus, while the Shannon entropy describes the average of the informational content , the informational quantity represents the second moment of this random variable.
An application of Theorem 1.1 is based upon the following elementary relation.
Proposition 11.1. For all integer-valued random variables and with finite entropies, we have
[TABLE]
Proof. We may assume that the distribution of is absolutely continuous with respect to the distribution of (since otherwise ). Equivalently, for all , , where . Define in case . Recalling the definition (1.8), we then have
[TABLE]
We now apply the inequality (), obtaining
[TABLE]
Here, the first sum in the last bound is exactly , while, by Cauchy’s inequality, the square of the last sum is bounded from above by
[TABLE]
∎
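A relation of the type used in Proposition 11.1 can be checked numerically. The sketch below does not claim to reproduce (11.1) exactly. It verifies the elementary bound $H(Z) - H(W) \le D + \sqrt{\chi^2 \cdot M}$, where $M = \sum_k v_k \log^2 v_k$ is the second moment of the informational content of $Z$ (the name `second_moment` is an illustrative choice). The bound follows from the identity $H(Z) - H(W) = D + \sum_k (w_k - v_k)\log v_k$ together with Cauchy's inequality. The Poisson series are truncated at a fixed number of terms.

```python
import math

def bernoulli_sum_pmf(p):
    """PMF of a sum of independent Bernoulli(p_j) variables, by convolution."""
    w = [1.0]
    for pj in p:
        nxt = [0.0] * (len(w) + 1)
        for k, wk in enumerate(w):
            nxt[k] += wk * (1.0 - pj)
            nxt[k + 1] += wk * pj
        w = nxt
    return w

def entropy_gap_and_bound(p, terms=400):
    """Return (H(Z) - H(W), D + sqrt(chi^2 * second_moment)) for Z ~ Poisson(sum(p))."""
    lam = sum(p)
    w = bernoulli_sum_pmf(p)
    w = w + [0.0] * (terms - len(w))  # extend W's law by zeros beyond n
    v = [math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) for k in range(terms)]
    D = sum(wk * math.log(wk / vk) for wk, vk in zip(w, v) if wk > 0.0)
    chi2 = sum((wk - vk) ** 2 / vk for wk, vk in zip(w, v) if vk > 0.0)
    h_w = -sum(wk * math.log(wk) for wk in w if wk > 0.0)
    h_z = -sum(vk * math.log(vk) for vk in v if vk > 0.0)
    second_moment = sum(vk * math.log(vk) ** 2 for vk in v if vk > 0.0)
    return h_z - h_w, D + math.sqrt(chi2 * second_moment)

for p in ([0.1] * 20, [0.1, 0.3, 0.5], [0.9, 0.8, 0.95]):
    print(entropy_gap_and_bound(p))
```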
In view of (11.1), we also need:
Proposition 11.2. If has a Poisson distribution with parameter , then
[TABLE]
Proof. Put . In particular, and . This shows that the above upper bound for small can be reversed up to a constant. For , given , from
[TABLE]
we get
[TABLE]
Hence, , thus proving the second upper bound of the proposition.
Now, assuming that , let us apply the lower bounds (3.6)-(3.7) from Lemma 3.3, which for all give
[TABLE]
and
[TABLE]
Note that this bound is also true for . Using the concavity of the function in and applying Jensen’s inequality, we therefore obtain that
[TABLE]
Hence , , with .
Applying the upper bound (3.6) from Lemma 3.3, we also see that this upper bound on can be reversed up to a constant as well.
∎
Remark 11.3. With similar arguments, it follows that
[TABLE]
which can be reversed modulo an absolute factor . Hence, as long as stays bounded away from zero.
Proof of Corollary 1.3. By Theorem 1.1 with as in (1.1) and with a Poisson random variable with parameter , we have
[TABLE]
up to some absolute constant . Using this estimate in (11.1) and applying Proposition 11.2, the desired inequality (1.9) immediately follows (in view of ).
To derive a more precise inequality illustrating the asymptotic behaviour in in the typical case , let us apply once more Theorem 1.1 with its sharper bound
[TABLE]
as in Proposition 3.1. By Proposition 11.1, this gives
[TABLE]
It remains to note that . according to Proposition 11.2. ∎
Acknowledgement. We would like to thank the referee for drawing our attention to the work by V. Zacharovas and H.-K. Hwang. Thanks also to A. Zaitsev for drawing our attention to the work by I. S. Borisov and I. S. Vorozheĭkin.
References
[1] Andersen, K. F. Weighted inequalities for iterated convolutions. Proc. Amer. Math. Soc. 127 (1999), no. 9, 2643–2651.
[2] Barbour, A. D.; Hall, P. On the rate of Poisson convergence. Math. Proc. Cambridge Philos. Soc. 95 (1984), no. 3, 473–480.
[3] Bobkov, S. G.; Chistyakov, G. P.; Götze, F. Rényi divergence and the central limit theorem. Ann. Probab. 47 (2019), no. 1, 270–323.
[4] Bobkov, S. G.; Chistyakov, G. P.; Götze, F. Non-uniform bounds in the Poisson approximation with applications to informational distances. I. IEEE Transactions on Information Theory. Published online 25 April 2019.
[5] Borisov, I. S.; Vorozheĭkin, I. S. Accuracy of approximation in the Poisson theorem in terms of $\chi^2$ distance. (Russian) Sibirsk. Mat. Zh. 49 (2008), no. 1, 8–22; translation in Sib. Math. J. 49 (2008), no. 1, 5–17.
[6] van Erven, T.; Harremoës, P. Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inform. Theory 60 (2014), no. 7, 3797–3820.
[7] Harremoës, P. Binomial and Poisson distributions as maximum entropy distributions. IEEE Trans. Inform. Theory 47 (2001), no. 5, 2039–2041.
[8] Harremoës, P.; Johnson, O.; Kontoyiannis, I. Thinning and information projections. arXiv:1601.04255, Jan. 2016.
