Bounding distributional errors via density ratios
Lutz Duembgen, Richard Samworth, Jon Wellner

TL;DR
This paper introduces explicit bounds on distributional approximation errors using the maximal density ratio, providing a more informative measure than total variation distance, with applications to common distribution approximations.
Contribution
It develops new explicit error bounds based on density ratios, applicable to various classical distribution approximation problems, with both upper and lower bounds.
Findings
Provides explicit bounds for hypergeometric by binomial distributions
Offers bounds for binomial by Poisson distributions
Includes bounds for beta by gamma distributions
Lutz Dümbgen (University of Bern; research supported by the Swiss National Science Foundation),
Richard J. Samworth (University of Cambridge; research supported by an Engineering and Physical Sciences Research Council fellowship) and
Jon A. Wellner (University of Washington, Seattle; research supported in part by NSF Grant DMS-1566514 and NI-AID Grant 2R01 AI291968-04)
(February 4, 2020)
Abstract
We present some new and explicit error bounds for the approximation of distributions. The approximation error is quantified by the maximal density ratio of the distribution $Q$ to be approximated and its proxy $P$. This non-symmetric measure is more informative than the total variation distance and implies bounds for it.
Explicit approximation problems include, among others, hypergeometric by binomial distributions, binomial by Poisson distributions, and beta by gamma distributions. In many cases we provide both upper and (matching) lower bounds.
Key words:
Binomial distribution, hypergeometric distribution, Poisson approximation, relative errors, total variation distance.
1 Introduction
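The aim of this work is to provide new inequalities for the approximation of probability distributions. A traditional measure of discrepancy between distributions $P$, $Q$ on a space $(\mathcal{X},\mathcal{A})$ is their total variation distance
\[ d_{\rm TV}(Q,P) := \sup_{A\in\mathcal{A}} \bigl|Q(A)-P(A)\bigr| . \]
Alternatively we consider the maximal ratio
\[ \rho(Q,P) := \sup_{A\in\mathcal{A}} \frac{Q(A)}{P(A)} \]
with the conventions $0/0:=0$ and $a/0:=\infty$ for $a>0$. Obviously $\rho(Q,P)\geq 1$ because $Q(\mathcal{X})=P(\mathcal{X})=1$. While $d_{\rm TV}$ is a standard and strong metric on the space of all probability measures on $(\mathcal{X},\mathcal{A})$, the maximal ratio $\rho(Q,P)$ is particularly important in situations in which a distribution $Q$ is approximated by a distribution $P$. When $\rho(Q,P)<\infty$, we know that
\[ Q(A) \leq \rho(Q,P)\,P(A) \]
for arbitrary events $A$, no matter how small $P(A)$ is, whereas total variation distance gives only the additive bounds $|Q(A)-P(A)|\leq d_{\rm TV}(Q,P)$.
Explicit values or bounds for $\rho(Q,P)$ are obtained via density ratios. From now on let $P$ and $Q$ have densities $f$ and $g$, respectively, with respect to some measure $\mu$ on $(\mathcal{X},\mathcal{A})$. Then
\[ \rho(Q,P) = \operatorname*{ess\,sup}_{x\in\mathcal{X}} \frac{g(x)}{f(x)} . \tag{1} \]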
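For distributions on a finite set, both discrepancy measures can be evaluated directly from the weights. The following minimal sketch assumes that the maximal ratio equals the largest weight ratio (the discrete case of the density-ratio description above); the weights `q` and `p` and all function names are hypothetical and chosen only for illustration. It also checks the elementary consequence $d_{\rm TV}(Q,P)\leq 1-\rho(Q,P)^{-1}$ of a finite maximal ratio.

```python
from itertools import combinations

def total_variation(q, p):
    """d_TV(Q, P) = sup_A |Q(A) - P(A)|, i.e. half the L1 distance of the weights."""
    return 0.5 * sum(abs(qi - pi) for qi, pi in zip(q, p))

def max_ratio(q, p):
    """rho(Q, P) = max_x q(x)/p(x), with 0/0 := 0 and a/0 := infinity for a > 0."""
    best = 0.0
    for qi, pi in zip(q, p):
        if pi > 0:
            best = max(best, qi / pi)
        elif qi > 0:
            return float("inf")
    return best

def all_events_dominated(q, p, rho, tol=1e-12):
    """Check Q(A) <= rho * P(A) for every event A of the finite sample space."""
    points = range(len(q))
    for size in range(len(q) + 1):
        for event in combinations(points, size):
            q_a = sum(q[i] for i in event)
            p_a = sum(p[i] for i in event)
            if q_a > rho * p_a + tol:
                return False
    return True

# Hypothetical weights: Q is the distribution to be approximated, P its proxy.
q = [0.5, 0.3, 0.2, 0.0]
p = [0.4, 0.3, 0.2, 0.1]

rho = max_ratio(q, p)            # largest weight ratio, attained at the first point
d_tv = total_variation(q, p)
print(rho, d_tv, all_events_dominated(q, p, rho))
print(d_tv <= 1 - 1 / rho)       # total variation bound implied by a finite rho
print(max_ratio(p, q))           # the reverse ratio is infinite: rho is not symmetric
```

In this toy example the reverse ratio is infinite because `p` puts mass on a point that `q` does not, which illustrates why the order of the two arguments matters.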
The ratio measure plays an important role in acceptance-rejection sampling (von Neumann, 1951): Suppose that . Let and be independent random variables where and . Now let denote all indices such that . Then the random variables and (, ) are independent with and .
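The acceptance-rejection scheme can be sketched in a few lines. This is a generic textbook version under stated assumptions, not a transcription of the notation above: `bound` is any finite constant with $g\leq{}$`bound`${}\cdot f$, proposals are drawn from $P$, and a proposal $x$ is accepted when an independent uniform variable falls below $g(x)/(\text{bound}\cdot f(x))$. The discrete weights `q` and `p` and the function names are hypothetical.

```python
import random

def rejection_sample(g, f, propose, bound, rng):
    """Draw one value with density g from proposals with density f.

    Requires g(x) <= bound * f(x) for all x, i.e. bound >= rho(Q, P).
    Returns the accepted value and the number of proposals that were needed.
    """
    trials = 0
    while True:
        trials += 1
        x = propose(rng)
        # Accept with probability g(x) / (bound * f(x)).
        if rng.random() * bound * f(x) < g(x):
            return x, trials

# Hypothetical target weights q and proposal weights p on {0, 1, 2, 3}.
q = [0.5, 0.3, 0.2, 0.0]
p = [0.4, 0.3, 0.2, 0.1]
bound = 1.25                      # equals max_x q(x)/p(x) for these weights

rng = random.Random(0)
propose = lambda r: r.choices(range(4), weights=p)[0]
draws = [rejection_sample(lambda x: q[x], lambda x: p[x], propose, bound, rng)
         for _ in range(20000)]

freq = [sum(1 for x, _ in draws if x == k) / len(draws) for k in range(4)]
mean_trials = sum(t for _, t in draws) / len(draws)
print(freq)          # empirical frequencies, close to q
print(mean_trials)   # close to bound, the mean of a geometric waiting time
```

Each proposal is accepted with probability `1/bound`, so a smaller bound on the ratio measure translates directly into fewer wasted proposals.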
As soon as we have a finite bound for , we can bound total variation distance or other measures of discrepancy. The general result is as follows:
Proposition 1.
Suppose that for some number .
(a) For any non-decreasing function with ,
[TABLE]
(b) For any convex function ,
[TABLE]
Both inequalities are equalities if takes only values in .
Under the assumptions of Proposition 1, the following inequalities hold true, with equality in case of :
Total variation: With , part (a) leads to
[TABLE]
Kullback-Leibler divergence: With , part (a) yields
[TABLE]
Hellinger distance: With $\psi(t):=2^{-1}\bigl(\sqrt{t}-1\bigr)^{2}$, part (b) leads to
[TABLE]
Pearson divergence: With , part (b) yields
[TABLE]
Inequality (2) implies that , and the latter quantity is easily seen to be the mixture index of fit introduced by Rudas et al. (1994),
[TABLE]
The remainder of this paper is organized as follows: In Section 2 we present an explicit inequality for with being a hypergeometric and being an approximating binomial distribution. Our result complements results of Diaconis and Freedman (1980), Ehm (1991) and Holmes (2004) for .
In Section 3 we first consider the case of being a binomial distribution and being the Poisson distribution with the same mean. The corresponding ratio measure has been analyzed previously by Christensen et al. (1995) and Antonelli and Regoli (2005). Our new explicit bounds bridge the gap between these two works. As a by-product we obtain explicit bounds for which are comparable to well-known bounds from the literature. All these bounds carry over to multinomial distributions, to be approximated by a product of Poisson distributions. In particular, we improve and generalize approximation bounds by Diaconis and Freedman (1987). Indeed, at several places we use sufficiency arguments similarly to the latter authors to reduce multivariate approximation problems to univariate ones. Section 4 presents several further examples, most of which are based on approximating beta by gamma distributions.
Most proofs are deferred to Section 5. In particular, we provide a slightly strengthened version of the Stirling–Robbins approximation of factorials (Robbins, 1955) and some properties of the log-gamma function. This part is potentially of independent interest. As notation used throughout, we write and for real numbers and integers .
2 Binomial approximation of hypergeometric distributions
Sampling from a finite population.
First we revisit a result of Freedman (1977) concerning sampling with and without replacement. For integers let , the set of all samples of size drawn with replacement from . The uniform distribution on has weights
[TABLE]
for . When sampling without replacement, we consider the set of all with all components different, and the distribution with weights
[TABLE]
Consequently, on and on , so Proposition 1 (a) with implies that
[TABLE]
Freedman (1977) showed that
[TABLE]
Here are two new bounds for which we will prove in Section 5. The lower bound in the following display follows from Freedman’s proof of the lower bound in (4), while the upper bound is new.
[TABLE]
From (3) and (4) one would get the upper bound $-\log\bigl(1-n(n-1)/(2N)\bigr)$ with the convention that for . For this coincides with the upper bound in (5), for it is strictly larger.
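The quantities in this subsection are easy to evaluate exactly. The sketch below assumes, as the weights above suggest, that the density ratio equals $N^{n}/[N]_{n}$ on samples with distinct components and $0$ elsewhere, where $[N]_{n}=N(N-1)\cdots(N-n+1)$, so that $d_{\rm TV}=1-\rho^{-1}$. It then checks two elementary inequalities that follow from $\log(1-x)\leq -x$ and $\prod_i(1-x_i)\geq 1-\sum_i x_i$, together with the upper bound $-\log\bigl(1-n(n-1)/(2N)\bigr)$ mentioned in the text. The function names and the values of `N` and `n` are hypothetical.

```python
import math

def log_rho_sampling(N, n):
    """log rho(Q, P) = log(N^n / [N]_n) = -sum_{i<n} log(1 - i/N), for n <= N."""
    return -sum(math.log1p(-i / N) for i in range(n))

def tv_sampling(N, n):
    """d_TV(Q, P) = 1 - [N]_n / N^n, which equals 1 - 1/rho in this setting."""
    return -math.expm1(-log_rho_sampling(N, n))

N, n = 100, 5
x = n * (n - 1) / (2 * N)         # the classical quantity n(n-1)/(2N)
log_rho = log_rho_sampling(N, n)
d_tv = tv_sampling(N, n)

print(x <= log_rho <= -math.log(1 - x))   # bounds for the log-ratio measure
print(1 - math.exp(-x) <= d_tv <= x)      # bounds for total variation
```

For `n = 2` the exact value of the log-ratio measure is `-log(1 - 1/N)`, which agrees with the upper bound `-log(1 - n(n-1)/(2N))` evaluated at `n = 2`.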
Hypergeometric and binomial distributions.
Now recall the definition of the hypergeometric distribution: Consider an urn with balls, of them being black and being white. Now we draw balls at random and define to be the number of black balls in this sample. When sampling with replacement, has the binomial distribution , and when sampling without replacement (), has the hypergeometric distribution . Intuitively one would guess that the difference between and is small when . Note that when Freedman’s (1977) result is applied to a particular function, e.g. the number of black balls, the resulting bound is suboptimal because it involves rather than . Indeed, Diaconis and Freedman (1980) showed that
[TABLE]
Stronger bounds have been obtained by means of the Chen–Stein method. Ehm (1991) showed that with ,
[TABLE]
while Holmes (2004) proved that
[TABLE]
Our first main result shows that for fixed parameters $N$ and $n$, the ratio measure $\rho\bigl(\mathrm{Hyp}(N,L,n),\mathrm{Bin}(n,L/N)\bigr)$ is maximized by $L=1$ (and $L=N-1$):
Theorem 2.
For integers with , and ,
[TABLE]
Moreover,
[TABLE]
Remarks.
Note that our bounds for $d_{\rm TV}\bigl(\mathrm{Hyp}(N,L,n),\mathrm{Bin}(n,L/N)\bigr)$ are slightly better than the bound (7) of Holmes (2004). If we fix and let such that , then our bounds are equal to
[TABLE]
and thus similar to the bound (6) of Ehm (1991). If we fix and let such that , then our two bounds converge to
[TABLE]
whereas the upper bound in (7) tends to , and (6) is not applicable.
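The claim that a single black ball is the worst case can be checked numerically for small parameters. The sketch below computes the ratio measure between the hypergeometric and binomial weights by brute force for every number of black balls strictly between $0$ and $N$. The closed form $(1-1/N)^{-(n-1)}$ used for comparison is our own evaluation of the weight ratio at one black ball in the sample, not a formula copied from the displays above, and the values of `N` and `n` are hypothetical.

```python
import math

def hyp_pmf(N, L, n, k):
    """Hypergeometric weight: k black balls in n draws without replacement."""
    return math.comb(L, k) * math.comb(N - L, n - k) / math.comb(N, n)

def bin_pmf(n, p, k):
    """Binomial weight: k black balls in n draws with replacement."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def rho_hyp_bin(N, L, n):
    """Maximal weight ratio rho(Hyp(N, L, n), Bin(n, L/N)) for 0 < L < N."""
    p = L / N
    return max(hyp_pmf(N, L, n, k) / bin_pmf(n, p, k) for k in range(n + 1))

N, n = 30, 6
rhos = {L: rho_hyp_bin(N, L, n) for L in range(1, N)}
worst = (1 - 1 / N) ** (-(n - 1))   # ratio of the weights at k = 1 when L = 1

print(max(rhos.values()), worst)
print(worst <= 1 / (1 - (n - 1) / N))   # consequence of Bernoulli's inequality
```

The comparison uses a small tolerance in practice, because several values of `L` may attain the maximum up to rounding (for instance `L = 1` and `L = N - 1` by symmetry).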
3 Poisson approximations
3.1 Binomial distributions
It is well-known that for and , the binomial distribution may be approximated by the Poisson distribution if is small. Explicit bounds for the approximation error have been developed in the more general setting of sums of independent but not necessarily identically distributed Bernoulli random variables by various authors. Hodges and Le Cam (1960) introduced a coupling method which was refined by Serfling (1975) and implies the inequality
[TABLE]
By direct calculations involving density ratios, Reiss (1993) showed that
[TABLE]
Finally, by means of the Chen–Stein method, Barbour and Hall (1984) derived the remarkable bound
[TABLE]
Concerning the ratio measure $\rho\bigl(\mathrm{Bin}(n,p),\mathrm{Poiss}(np)\bigr)$, Christensen et al. (1995) showed that
[TABLE]
is a convex, piecewise linear function of with and
[TABLE]
A close inspection of their proof reveals that is the maximum of the log-ratio measure $\log\rho\bigl(\mathrm{Bin}(n,p),\mathrm{Poiss}(np)\bigr)$ over all integers , so the bound is probably rather conservative for large sample sizes . Indeed, it follows from the results of Antonelli and Regoli (2005) that for any fixed ,
[TABLE]
which is substantially smaller than , at least for small values . By means of elementary calculations and an appropriate version of Stirling’s formula, we shall prove the following bounds:
Theorem 3.
For arbitrary ,
[TABLE]
is a continuous and strictly increasing function of , satisfying and
[TABLE]
for . More precisely, with ,
[TABLE]
Remarks.
Since , the first two upper bounds of Theorem 3 and Proposition 1 (a) lead to the inequalities
[TABLE]
see inequality (20) in Section 5. For fixed , the bound in (8) may be rephrased as $n\,d_{\rm TV}\bigl(\mathrm{Bin}(n,\lambda/n),\mathrm{Poiss}(\lambda)\bigr)\leq(1-e^{-\lambda})\lambda$. Our bounds imply that
[TABLE]
and for . The refined inequalities imply that for any fixed ,
[TABLE]
The proof of Theorem 3 reveals that $\Lambda_{n}(p)=\log\rho\bigl(\mathrm{Bin}(n,p),\mathrm{Poiss}(np)\bigr)$ is concave in $p\in\bigl[(k-1)/n,k/n\bigr]$ for each . Figure 1 illustrates this for . In the left panel one sees (black) together with (black dashed) and the simple upper bounds (green) and (blue). The right panel shows the quantities (black), i.e. the difference of and the asymptotic bound of Antonelli and Regoli (2005), together with the upper bound (blue) and the two bounds in (11) (red and orange).
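The log-ratio measure $\Lambda_{n}(p)$ can be computed by brute force over the support of the binomial distribution. In the sketch below the weight ratio is written as $[n]_{k}\,n^{-k}(1-p)^{n-k}e^{np}$, which follows from cancelling $p^{k}/k!$ in the two weights. The comparison value $-\log(1-p)/2$ is the large-$n$ limit suggested by a local normal approximation of both weights at the mode, and $(1-e^{-np})p$ is the well-known Chen–Stein type bound for total variation in the case of identical success probabilities. Function names and parameter values are hypothetical.

```python
import math

def log_ratio(n, p, k):
    """log of the Bin(n, p) weight over the Poiss(np) weight at k in {0, ..., n}."""
    return (math.lgamma(n + 1) - math.lgamma(n - k + 1) - k * math.log(n)
            + (n - k) * math.log1p(-p) + n * p)

def log_rho_bin_poiss(n, p):
    """Lambda_n(p) = log rho(Bin(n, p), Poiss(np)) for 0 < p < 1."""
    return max(log_ratio(n, p, k) for k in range(n + 1))

def tv_bin_poiss(n, p):
    """d_TV as the sum of positive parts of weight differences over {0, ..., n}."""
    lam = n * p
    total = 0.0
    for k in range(n + 1):
        b = math.comb(n, k) * p**k * (1 - p)**(n - k)
        pois = math.exp(-lam) * lam**k / math.factorial(k)
        total += max(b - pois, 0.0)   # beyond n the binomial weight is zero
    return total

big = log_rho_bin_poiss(2000, 0.3)
print(big, -0.5 * math.log(1 - 0.3))            # close for large n

lam_small = log_rho_bin_poiss(50, 0.1)
d_tv = tv_bin_poiss(50, 0.1)
print(d_tv <= 1 - math.exp(-lam_small))         # bound implied by the ratio measure
print(d_tv <= (1 - math.exp(-50 * 0.1)) * 0.1)  # Chen-Stein type bound
```

Working with `math.lgamma` keeps the computation stable for large `n`, where the individual weights underflow.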
Poisson binomial distributions.
The distribution can be replaced with the distribution of with independent Bernoulli variables with arbitrary parameters and in place of . Dümbgen and Wellner (2020) showed that with .
3.2 Multinomial distributions and Poissonization
Multinomial distributions.
The previous bounds for the approximation of binomial by Poisson distributions imply bounds for the approximation of multinomial distributions by products of Poisson distributions. For integers and parameters such that , let follow a multinomial distribution
[TABLE]
where . Further, let be independent Poisson random variables with parameters respectively. Elementary calculations reveal that with and ,
[TABLE]
for arbitrary integers . Moreover,
[TABLE]
This implies that for arbitrary integers and ,
[TABLE]
Consequently, by (1),
[TABLE]
and one easily verifies that
[TABLE]
Poissonization.
Theorem 3 applies also to Poissonization for empirical processes: Let be independent random variables with distribution on a measurable space . Let be the random measure , and let be a Poisson process on with intensity measure . Then has the same distribution as , where is independent from . For a set with , the restrictions of the random measures and to satisfy the equality
[TABLE]
Here and stand for the random measures
[TABLE]
on . Indeed, for arbitrary integers ,
[TABLE]
while
[TABLE]
Consequently,
[TABLE]
and
[TABLE]
4 Gamma approximations and more
In this section we present further examples of bounds for the ratio measure . In all but one case, they are related to the approximation of beta by gamma distributions.
4.1 Beta distributions
In what follows, let be the beta distribution with parameters . The corresponding density is given by
[TABLE]
with the gamma function . Note that we view as a distribution on the halfline , because we want to approximate it by gamma distributions. Specifically, let be the gamma distribution with shape parameter and rate parameter (i.e. inverse scale parameter) . The corresponding density is given by
[TABLE]
The next theorem shows that may be approximated by for suitable rate parameters , provided that .
Theorem 4.
(i) For arbitrary parameters and ,
[TABLE]
where
[TABLE]
(ii) For , , and arbitrary ,
[TABLE]
Moreover, for this optimal rate parameter ,
[TABLE]
where
[TABLE]
Remarks.
The rate parameter is canonical in the sense that the means of and are both equal to . But note that
[TABLE]
if . Hence, yields a remarkably better approximation than , unless is rather large or is close to .
In the special case of , the proof of Theorem 4 shows the following: For ,
[TABLE]
and for ,
[TABLE]
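The optimisation over the rate parameter can be reproduced numerically. The sketch below assumes the standard densities of $\mathrm{Beta}(a,b)$ on $(0,1)$ and $\mathrm{Gamma}(a,c)$ on the halfline, so that the log-density ratio on $(0,1)$ is $\log\Gamma(a+b)-\log\Gamma(b)-a\log c+(b-1)\log(1-x)+cx$. For $b>1$ this is concave in $x$ with stationary point $x=1-(b-1)/c$ when $c\geq b-1$, and minimising the resulting maximum over $c$ gives $c=a+b-1$. These formulas are our own derivation along the lines of the proof of Theorem 4; function names and parameter values are hypothetical.

```python
import math

def log_density_ratio(a, b, c, x):
    """log of the Beta(a, b) density over the Gamma(a, c) density at x in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(b) - a * math.log(c)
            + (b - 1) * math.log1p(-x) + c * x)

def log_rho_beta_gamma(a, b, c):
    """Closed-form log rho(Beta(a, b), Gamma(a, c)) for b > 1 and c > 0."""
    const = math.lgamma(a + b) - math.lgamma(b) - a * math.log(c)
    if c <= b - 1:
        return const                  # the ratio is largest as x tends to 0
    x0 = 1 - (b - 1) / c              # stationary point of the log-ratio
    return const + (b - 1) * math.log1p(-x0) + c * x0

a, b = 2.0, 5.0
c_opt = a + b - 1                     # minimiser of the ratio measure over c
c_mean = a + b                        # rate parameter matching the means

grid_max = max(log_density_ratio(a, b, c_opt, i / 10000) for i in range(1, 10000))
print(grid_max, log_rho_beta_gamma(a, b, c_opt))
print(log_rho_beta_gamma(a, b, c_opt) <= log_rho_beta_gamma(a, b, c_mean))
```

The grid maximum serves only as a sanity check of the closed form; it slightly underestimates the supremum when the stationary point falls between grid points.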
4.2 The Lévy–Poincaré projection problem
Let be uniformly distributed on the unit sphere in . It is well-known that can be represented as where and denotes standard Euclidean norm. Then the first coordinates of satisfy
[TABLE]
since by the weak law of large numbers. Indeed, let
[TABLE]
with , and let
[TABLE]
Diaconis and Freedman (1987) showed that
[TABLE]
By means of Theorem 4, this bound can be improved by a factor larger than . The approximation becomes even better if we set . To verify all this, we consider the random variables $R_{k}:=\bigl(\sum_{i=1}^{k}Z_{i}^{2}\bigr)$, $R_{n}:=\bigl(\sum_{i=1}^{n}Z_{i}^{2}\bigr)$ and
[TABLE]
Note that is uniformly distributed on the unit sphere in and independent of . Moreover,
[TABLE]
But and . Hence,
[TABLE]
Applying Theorem 4 with , and yields the following bounds:
Corollary 5.
For ,
[TABLE]
where
[TABLE]
Figures 2 and 3 illustrate Corollary 5 in case of . For dimensions , Figure 2 shows the standard Gaussian density (green) and the density of in case of (black) and (blue). Figure 3 depicts the corresponding ratios . The dotted black and blue lines are the corresponding upper bounds from Corollary 5. These pictures show clearly that using instead of yields a substantial improvement.
4.3 Dirichlet distributions and uniform spacings
Dirichlet distributions.
For integers and parameters , let be a random vector with independent components . With , it is well-known that the random vector
[TABLE]
and are independent, where with
[TABLE]
The distribution of is the Dirichlet distribution with parameters , written
[TABLE]
Now let us focus on the first components of and :
[TABLE]
with
[TABLE]
Then and is independent of , while
[TABLE]
with
[TABLE]
Hence, the difference between and , in terms of the ratio measure, is the difference between and . Thus Theorem 4 yields the following bounds:
Corollary 6.
Let , and let . Then
[TABLE]
where either
[TABLE]
or
[TABLE]
Uniform spacings.
A special case of the previous result are uniform spacings: For an integer , let be independent random variables with uniform distribution on . Then we consider the order statistics . With and , it is well-known that
[TABLE]
That means, the spacings have the same distribution as with independent, standard exponential random variables and . Consequently, Corollary 6 and the second remark after Theorem 4 yield the following bounds:
Corollary 7.
For integers let be the distribution of the vector
[TABLE]
Further let be the -fold product of the standard exponential distribution. Then
[TABLE]
In particular,
[TABLE]
Remarks.
Corollary 7 gives another proof of the results of Runnenburg and Vervaat (1969), who obtained bounds on by first bounding the Kullback–Leibler divergence; see their Remark 4.1, pages 74–75. It can be shown via the methods of Hall and Wellner (1979) that
[TABLE]
where .
4.4 Student distributions
For let denote Student’s t distribution with degrees of freedom, with density
[TABLE]
It is well-known that converges uniformly to the density of the standard Gaussian distribution , where . The distribution has heavier tails than the standard Gaussian distribution and, indeed,
[TABLE]
However, for the reverse ratio measure we do obtain a reasonable upper bound:
Lemma 8.
For ,
[TABLE]
Remarks.
It follows from Lemma 8 that
[TABLE]
By means of Proposition 1 (a) we obtain the inequality for . Pinelis (2015) proved that
[TABLE]
for , and that $r\,d_{\rm TV}\bigl(N(0,1),t_{r}\bigr)\rightarrow C$ as . So is optimal in the bound for , whereas is optimal for .
Let and be random variables with distribution and , respectively, where . Then for any Borel set ,
[TABLE]
In particular,
[TABLE]
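The ratio measure between the standard Gaussian distribution and Student's t distribution has a simple closed form that can be evaluated numerically. The sketch below uses the standard density of $t_{r}$, proportional to $(1+x^{2}/r)^{-(r+1)/2}$. Differentiating the log-density ratio shows that its derivative equals $x(1-x^{2})/(r+x^{2})$, so the maximum is attained at $x=\pm 1$. This is our own calculation, and the first-order behaviour $1/(2r)$ used as a comparison comes from a Stirling-type expansion rather than from the displays above. Function names are hypothetical.

```python
import math

def log_ratio_normal_t(r, x):
    """log of the N(0, 1) density over the t_r density at x."""
    log_phi = -0.5 * x * x - 0.5 * math.log(2 * math.pi)
    log_t = (math.lgamma((r + 1) / 2) - math.lgamma(r / 2)
             - 0.5 * math.log(r * math.pi)
             - (r + 1) / 2 * math.log1p(x * x / r))
    return log_phi - log_t

def log_rho_normal_t(r):
    """log rho(N(0, 1), t_r): the log-ratio is maximal at x = +1 and x = -1."""
    return log_ratio_normal_t(r, 1.0)

grid_max = max(log_ratio_normal_t(5, i / 1000) for i in range(-6000, 6001))
print(grid_max, log_rho_normal_t(5))

for r in (1, 2, 5, 10, 100):
    print(r, log_rho_normal_t(r), r * log_rho_normal_t(r))

# Far in the tails the Gaussian density is negligible relative to t_r,
# which is why the reverse ratio measure is infinite.
print(log_ratio_normal_t(5, 50.0))
```

The products `r * log_rho_normal_t(r)` approach `0.5` as `r` grows, in line with an approximation error of order `1/r`.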
4.5 A counterexample: convergence of normal extremes
In all previous settings, we derived upper bounds for which implied reasonable bounds for , whereas in general. This raises the question whether there are probability densities and , , such that , but both and ? The answer is “yes” in view of the following example.
Example 9.
Suppose that are independent, standard Gaussian random variables. Let . Let satisfy and then set . Then it is well-known that
[TABLE]
where is the Gumbel distribution function given by . Set for and . Hall (1979) shows that for constants and sufficiently large ,
[TABLE]
and for the Lévy metric . It is also known that if and , then , and (13) continues to hold with and replaced by and , but the rate of convergence in the last display is not better than .
In this example the densities of are given by
[TABLE]
for each fixed ; here is the standard normal density and is the standard normal distribution function. Thus by Scheffé’s lemma. But in this case it is easily seen that both and where the infinity in the first case occurs in the left tail, and the infinity in the second case occurs in the right tail.
We do not know a rate for the total variation convergence in this example, but it cannot be faster than .
5 Proofs and Auxiliary Results
5.1 Proofs of the main results
Proof of (1).
Suppose that for some real number . Then , -almost everywhere, so for all , and this implies that . On the other hand, if for some real number , then satisfies and , whence . These considerations show that equals the -essential supremum of . ∎
Proof of Proposition 1.
(a) Under the given hypotheses that is non-decreasing with and , we have
[TABLE]
Equality holds in the first inequality if and only if $Q\bigl(\{g<f\}\cap\{\psi(g/f)<0\}\bigr)=0$, and in the second inequality if and only if $Q\bigl(\{g>f\}\cap\{\psi(g/f)<\psi(\rho)\}\bigr)=0$. In particular, if , then $Q\bigl(\{g<f\}\bigr)=Q(\{g/f=0\})=0$ and $Q\bigl(\{g>f\}\cap\{\psi(g/f)<\psi(\rho)\}\bigr)=Q(\emptyset)=0$, so we have equality in (14).
(b) For any convex function and , we have
[TABLE]
with equality in case of . Hence
[TABLE]
Equality holds if . ∎
Proof of (5) and comparison with (4).
The asserted bounds are trivial in case of , so we assume that . Note first that
[TABLE]
with for . Since ,
[TABLE]
This is essentially Freedman’s (1977) argument. For the upper bound, it suffices to show that for , the increment
[TABLE]
is not larger than the increment
[TABLE]
But the difference between (16) and (15) equals
[TABLE]
because is non-decreasing on . Since for and , we may also conclude that for ,
[TABLE]
∎
Auxiliary inequalities.
In what follows, we will use repeatedly the following inequalities for logarithms: For real numbers and ,
[TABLE]
These inequalities follow essentially from the fact
[TABLE]
with , where the Taylor series expansion in the second to last step is well-known and follows from the usual expansion . Then it follows from that
[TABLE]
whereas
[TABLE]
Here is another expression which will be encountered several times: For ,
[TABLE]
and the inequality implies that
[TABLE]
Recall that we write and for real numbers and integers . In particular, for integers .
Proof of Theorem 2.
The assertions are trivial in case of or , because then . Hence it suffices to consider and . For let
[TABLE]
Since
[TABLE]
it even suffices to consider
[TABLE]
In this case, for , and for .
In order to maximize the weight ratio , note that for any integer ,
[TABLE]
if and only if
[TABLE]
Consequently,
[TABLE]
The worst-case value equals if and only if . But
[TABLE]
Consequently, it suffices to consider
[TABLE]
Note that these inequalities for imply that . Hence it remains to prove the assertions when and .
The case is treated separately: Here it suffices to show that
[TABLE]
Indeed
[TABLE]
with equality if and only if . The latter expression is less than or equal to if and only if
[TABLE]
and elementary manipulations show that this is equivalent to
[TABLE]
But this inequality is satisfied for all .
Consequently, it suffices to prove our assertion in case of
[TABLE]
The maximizer of the density ratio is , and
[TABLE]
Now our task is to bound
[TABLE]
from above. Corollary 11 in Section 5.2 implies that for integers ,
[TABLE]
where
[TABLE]
Consequently,
[TABLE]
Now we introduce the auxiliary quantities
[TABLE]
and write
[TABLE]
Then
[TABLE]
whence
[TABLE]
It follows from (18) with , and that
[TABLE]
and with , and we may conclude that
[TABLE]
Hence
[TABLE]
where
[TABLE]
because . It will be shown later that
[TABLE]
Consequently,
[TABLE]
because , and we want to show that the right-hand side is not greater than
[TABLE]
Hence, it suffices to show that
[TABLE]
But the left-hand side is a convex function of and takes the value [math] for . Thus it suffices to verify that the latter inequality holds for . Indeed, for , the left-hand side is .
It remains to verify (21). When , this is relatively easy: Here , so
[TABLE]
because . Hence,
[TABLE]
The case is a bit more involved: Since
[TABLE]
inequality (21) is equivalent to
[TABLE]
The left-hand side of (22) equals
[TABLE]
because , while the right-hand side of (22) equals
[TABLE]
because and . Consequently, it suffices to verify that
[TABLE]
To this end, note that depends on , namely, , whence and
[TABLE]
so (23) is equivalent to
[TABLE]
But the left-hand side is
[TABLE]
For , the denominator is strictly positive, and the derivative of the numerator is , which is strictly positive, too. Thus it suffices to verify that the numerator is nonnegative for . Indeed, for .
Finally, it follows from Bernoulli’s inequality (footnote 4: $(1+x)^{d}\geq 1+dx$ for real numbers $x\geq-1$ and $d\geq 1$) that . Now the inequalities for the total variation distance are an immediate consequence of Proposition 1 (a) with and the fact that and , whence
[TABLE]
∎
Proof of Theorem 3.
Obviously, . For we introduce the weights and . Obviously, for , while for and ,
[TABLE]
Note that the right hand side is a continuous function of with limit as , where . Thus we may conclude that
[TABLE]
is a continuous function of .
Next we need to determine the maximizer of . For ,
[TABLE]
Consequently,
[TABLE]
From now on we fix an integer and focus on $p\in\bigl[(k-1)/n,k/n\bigr]$, so that if . Then
[TABLE]
This is a concave function of with derivative
[TABLE]
if . Since is the derivative of with respect to , and since , this implies that
[TABLE]
On the other hand, is strictly increasing, whence
[TABLE]
But Corollary 11 in Section 5.2 implies that
[TABLE]
with
[TABLE]
Consequently,
[TABLE]
where the last inequality follows from (18) with , , and .
The refined bounds are for the quantity
[TABLE]
For $p\in\bigl[(k-1)/n,k/n\bigr]$,
[TABLE]
and
[TABLE]
Consequently,
[TABLE]
It follows from (17) with , and that
[TABLE]
and with ,
[TABLE]
Hence
[TABLE]
On the other hand, the lower bound for in (11) is trivial in case of , and otherwise
[TABLE]
by (19) with and . ∎
Proof of Theorem 4.
We start with the first statement of part (ii). Let and for . Since for , it suffices to consider the log-density ratio
[TABLE]
for , noting that the latter expression for is well-defined for all . The derivative of equals
[TABLE]
and this is smaller or greater than zero if and only if is greater or smaller than the ratio , respectively. This shows that in case of ,
[TABLE]
For ,
[TABLE]
But the derivative of the latter expression with respect to equals
[TABLE]
so the unique minimizer of $\log\rho\bigl(\mathrm{Beta}(a,b),\mathrm{Gamma}(a,c)\bigr)$ with respect to is .
It remains to verify the inequalities
[TABLE]
Then the total variation bounds of Theorem 4 follow from Proposition 1 (a) and the elementary inequality (20). Corollary 11 in Section 5.2 implies that
[TABLE]
Combining this with (25) yields (26):
[TABLE]
by (18) with . Concerning (27), it follows from (25) and (28) that
[TABLE]
where and . Now (27) follows from
[TABLE]
because .
In the special case of , we do not need (28) but get via (25) the explicit expression
[TABLE]
because . Now the standard Taylor series for yields that
[TABLE]
and in case of , the latter expression is not larger than
[TABLE]
∎
Proof of Lemma 8.
By Proposition 1 (a) and the inequality for , it suffices to verify the claims about $\log\rho\bigl(N(0,1),t_{r}\bigr)$. Note first that
[TABLE]
and
[TABLE]
whence
[TABLE]
On the one hand, the Taylor expansion yields that
[TABLE]
and the latter series equals
[TABLE]
Moreover, it follows from Lemma 12 in Section 5.2 with that
[TABLE]
because by assumption. Consequently,
[TABLE]
On the other hand, the previous considerations and Lemma 12 imply that
[TABLE]
and
[TABLE]
whence
[TABLE]
∎
5.2 Auxiliary Results for the Gamma Function
In what follows, let
[TABLE]
With a random variable one may write
[TABLE]
The functions and are known as the digamma and trigamma functions; see e.g., Olver et al. (2010), Section 5.15. This shows that is strictly convex in . Moreover, it follows from concavity of and Jensen’s inequality that
[TABLE]
The well-known identity is equivalent to
[TABLE]
Binet’s first formula and Stirling’s approximation.
Binet’s first integral formula states that
[TABLE]
where
[TABLE]
see Chapter 12 of Whittaker and Watson (1996). The following lemma provides a lower and upper bound for , and these yield rather precise bounds for the remainder .
Lemma 10.
For arbitrary ,
[TABLE]
In particular, the remainder in Binet’s formula (29) is strictly decreasing in and satisfies
[TABLE]
Since , Lemma 10 implies a slight improvement of the Stirling approximation by Robbins (1955): For arbitrary integers ,
[TABLE]
with
[TABLE]
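For orientation, the classical bounds of Robbins (1955) can be checked numerically. These are the well-known inequalities $1/(12n+1)<\log n!-\bigl[\tfrac12\log(2\pi n)+n\log n-n\bigr]<1/(12n)$, not the slightly strengthened version stated in the display above, which is not reproduced here. The function name is hypothetical.

```python
import math

def stirling_remainder(n):
    """log(n!) minus Stirling's leading terms 0.5*log(2*pi*n) + n*log(n) - n."""
    leading = 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n
    return math.lgamma(n + 1) - leading

for n in (1, 2, 5, 10, 50):
    r = stirling_remainder(n)
    # Robbins' classical sandwich for the remainder.
    print(n, 1 / (12 * n + 1) < r < 1 / (12 * n))
```

The remainder is close to the upper bound `1/(12n)` already for moderate `n`, which indicates that any improvement has to come from the lower bound.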
In addition, Binet’s formula (29) and Lemma 10 lead to useful inequalities for the increments of .
Corollary 11.
For arbitrary ,
[TABLE]
where
[TABLE]
Proof of Lemma 10.
The series expansion of the exponential function and some elementary algebra lead to the representation
[TABLE]
with
[TABLE]
Note that , and
[TABLE]
This shows that with strict inequality for . Consequently, .
The reverse inequality, , is equivalent to
[TABLE]
The left hand side equals , while the right hand side equals
[TABLE]
Note that . Consequently, for all , provided that for all . But and , whence for . Consequently, it suffices to show that
[TABLE]
But
[TABLE]
if and only if , and for integers this is equivalent to . Hence
[TABLE]
Since for any fixed , the integrand is strictly decreasing in , the remainder is strictly decreasing in . The two bounds for imply that is larger than and smaller than . ∎
Proof of Corollary 11.
Writing with the auxiliary function , the remainder term equals . But
[TABLE]
and since , it follows from that
[TABLE]
Moreover, since ,
[TABLE]
∎
Special increments of .
In connection with Student distributions, we need lower and upper bounds for the quantities . With a random variable , the latter expression equals , so it follows from Jensen’s inequality that . The next lemma shows that is close to for large .
Lemma 12.
For arbitrary ,
[TABLE]
Proof of Lemma 12.
Let us first mention that the second derivative of the log-gamma function is given by Gauss’ formula
[TABLE]
see Chapter 12 of Whittaker and Watson (1996). In particular, is strictly convex and decreasing on with
[TABLE]
because .
Now we start with a general consideration about second order differences of : For arbitrary ,
[TABLE]
where and are independent random variables with uniform distribution on . Since is convex and , it follows from Jensen’s inequality that
[TABLE]
Note also that the distribution of is given by the triangular density , so
[TABLE]
We first apply these findings with and : Since ,
[TABLE]
which gives us the upper bound for . Furthermore,
[TABLE]
On the other hand, if , then with and we obtain
[TABLE]
Note that
[TABLE]
has the following properties:
[TABLE]
and
[TABLE]
These properties plus the convexity of imply that
[TABLE]
Indeed, the latter integral doesn’t change if we replace with with constants such that . But then, by convexity of and the sign changes of , we have that . Consequently,
[TABLE]
Finally, with , the latter expression equals
[TABLE]
∎
Acknowledgements.
Constructive comments of David Ginsbourger, Dominic Schuhmacher and Kaspar Stucki on an early version of this paper are gratefully acknowledged. We also thank Lutz Mattner for drawing our attention to the technical report Christensen et al. (1995) and for pointing out the connection between the ratio measure and the mixture index of fit. Constructive comments of a referee led to further improvements such as Proposition 1.
References
- Antonelli, S. and Regoli, G. (2005). On the Poisson-binomial relative error. Statist. Probab. Lett. 71 249–256.
- Barbour, A. D. and Hall, P. (1984). On the rate of Poisson convergence. Math. Proc. Cambridge Philos. Soc. 95 473–480.
- Christensen, J., Fischer, P. and Kvols, K. (1995). On the ratio of binomial and Poisson probability distributions. Tech. Rep. 7, Matematisk Institut, Københavns Universitet.
- Diaconis, P. and Freedman, D. (1980). Finite exchangeable sequences. Ann. Probab. 8 745–764.
- Diaconis, P. and Freedman, D. (1987). A dozen de Finetti-style results in search of a theory. Ann. Inst. H. Poincaré Probab. Statist. 23 397–423.
- Dümbgen, L. and Wellner, J. A. (2020). The density ratio of Poisson binomial versus Poisson distributions. Statist. Probab. Lett. 165 108862. (arXiv:1910.03444).
- Ehm, W. (1991). Binomial approximation to the Poisson binomial distribution. Statist. Probab. Lett. 11 7–16.
- Freedman, D. (1977). A remark on the difference between sampling with and without replacement. J. Amer. Statist. Assoc. 72 681.
