Bounding distributional errors via density ratios

Lutz Duembgen; Richard Samworth; Jon Wellner

arXiv:1905.03009·math.ST·September 2, 2022

TL;DR

This paper introduces explicit bounds on distributional approximation errors using the maximal density ratio, providing a more informative measure than total variation distance, with applications to common distribution approximations.

Contribution

It develops new explicit error bounds based on density ratios, applicable to various classical distribution approximation problems, with both upper and lower bounds.

Findings

01. Explicit bounds for approximating hypergeometric by binomial distributions

02. Bounds for approximating binomial by Poisson distributions

03. Bounds for approximating beta by gamma distributions

Abstract

We present some new and explicit error bounds for the approximation of distributions. The approximation error is quantified by the maximal density ratio of the distribution $Q$ to be approximated and its proxy $P$ . This non-symmetric measure is more informative than and implies bounds for the total variation distance. Explicit approximation problems include, among others, hypergeometric by binomial distributions, binomial by Poisson distributions, and beta by gamma distributions. In many cases we provide both upper and (matching) lower bounds.

Equations

$$\mathop{d_{\rm TV}}(Q,P)\ :=\ \sup_{A\in\mathcal{A}}\,\bigl|Q(A)-P(A)\bigr|.$$

$$\rho(Q,P)\ :=\ \sup_{A\in\mathcal{A}}\,\frac{Q(A)}{P(A)},$$

$$Q(A)\ \leq\ \rho(Q,P)\,P(A)$$

$$\rho(Q,P)\ =\ \operatorname*{ess\,sup}_{x\in\mathcal{X}}\,\frac{g(x)}{f(x)}.$$

$$\int\psi(g/f)\,dQ\ \leq\ Q(\{g>f\})\,\psi(\rho).$$

$$\int\psi(g/f)\,dP\ \leq\ \psi(0)+\frac{\psi(\rho)-\psi(0)}{\rho}.$$

$$\mathop{d_{\rm TV}}(Q,P)\ \leq\ Q(\{g>f\})\,(1-\rho^{-1}).$$

$$\int\log(g/f)\,dQ\ \leq\ Q(\{g>f\})\,\log\rho.$$

$$\frac{1}{2}\int\bigl(\sqrt{f}-\sqrt{g}\bigr)^{2}\,d\mu\ \leq\ 1-\rho^{-1/2}.$$

$$\int(g/f-1)^{2}\,dP\ \leq\ \rho-1.$$

$$\pi^{*}(P,Q)\ =\ \min\bigl\{\pi\in[0,1]:P\geq(1-\pi)Q\ \text{on}\ \mathcal{A}\bigr\}.$$

$$P(\{x\})\ =\ 1/N^{n}$$

$$Q(\{x\})\ =\ \begin{cases}1/[N]_{n}&\text{if}\ x\in\mathcal{X}_{*},\\ 0&\text{if}\ x\in\mathcal{X}\setminus\mathcal{X}_{*}.\end{cases}$$

$$\rho(Q,P)\ =\ N^{n}/[N]_{n}\quad\text{and}\quad\mathop{d_{\rm TV}}(Q,P)\ =\ 1-\rho(Q,P)^{-1}\ =\ 1-[N]_{n}/N^{n}.$$

$$1-\exp\Bigl(-\frac{n(n-1)}{2N}\Bigr)\ \leq\ \mathop{d_{\rm TV}}(Q,P)\ \leq\ \frac{n(n-1)}{2N}.$$

$$\frac{n(n-1)}{2N}\ \leq\ \log\rho(Q,P)\ \leq\ -\frac{n}{2}\log\Bigl(1-\frac{n-1}{N}\Bigr).$$

$$\mathop{d_{\rm TV}}\bigl(\mathrm{Bin}(n,L/N),\mathrm{Hyp}(N,L,n)\bigr)\ \leq\ 2\,\frac{n}{N}.$$

$$\mathop{d_{\rm TV}}\bigl(\mathrm{Hyp}(N,L,n),\mathrm{Bin}(n,L/N)\bigr)\ \leq\ \frac{n}{n+1}\bigl(1-p^{n+1}-(1-p)^{n+1}\bigr)\frac{n-1}{N-1}\quad\text{for}\ 1\leq n\leq\min\{L,N-L\},$$

$$\mathop{d_{\rm TV}}\bigl(\mathrm{Hyp}(N,L,n),\mathrm{Bin}(n,L/N)\bigr)\ \leq\ \frac{n-1}{N-1}.$$

$$\rho\bigl(\mathrm{Hyp}(N,L,n),\mathrm{Bin}(n,L/N)\bigr)\ \leq\ \rho\bigl(\mathrm{Hyp}(N,1,n),\mathrm{Bin}(n,1/N)\bigr)\ =\ \Bigl(1-\frac{1}{N}\Bigr)^{-(n-1)}\ \leq\ \Bigl(1-\frac{n-1}{N}\Bigr)^{-1}.$$

$$\mathop{d_{\rm TV}}\bigl(\mathrm{Hyp}(N,L,n),\mathrm{Bin}(n,L/N)\bigr)\ \leq\ \Bigl(1-\frac{[L]_{n}}{[N]_{n}}-\frac{[N-L]_{n}}{[N]_{n}}\Bigr)\frac{n-1}{N}.$$

$$(1+o(1))\,(1-p^{n}-q^{n})\,\frac{n-1}{N}$$

$$(1-\gamma^{L})(1-e^{-\gamma})\ \leq\ (1-\gamma^{L})\,\gamma,$$

$$\mathop{d_{\rm TV}}\bigl(\mathrm{Bin}(n,p),\mathrm{Poiss}(np)\bigr)\ \leq\ np(1-e^{-p})\ \leq\ np^{2}.$$

$$\mathop{d_{\rm TV}}\bigl(\mathrm{Bin}(n,p),\mathrm{Poiss}(np)\bigr)\ \leq\ p.$$

$$\mathop{d_{\rm TV}}\bigl(\mathrm{Bin}(n,p),\mathrm{Poiss}(np)\bigr)\ \leq\ (1-e^{-np})\,p.$$

$$\Lambda(p)\ :=\ \max_{n\geq 1}\,\log\rho\bigl(\mathrm{Bin}(n,p),\mathrm{Poiss}(np)\bigr)$$

$$\Lambda(p)\ =\ p\quad\text{for}\ 0\leq p\leq\log(2).$$

Full text

Bounding distributional errors via density ratios

Lutz Dümbgen (University of Bern; research supported by the Swiss National Science Foundation),

Richard J. Samworth (University of Cambridge; research supported by an Engineering and Physical Sciences Research Council fellowship) and

Jon A. Wellner (University of Washington, Seattle; research supported in part by NSF Grant DMS-1566514 and NI-AID Grant 2R01 AI291968-04)

(February 4, 2020)


Key words:

Binomial distribution, hypergeometric distribution, Poisson approximation, relative errors, total variation distance.

1 Introduction

The aim of this work is to provide new inequalities for the approximation of probability distributions. A traditional measure of discrepancy between distributions $P,Q$ on a space $(\mathcal{X},\mathcal{A})$ is their total variation distance

$$\mathop{d_{\rm TV}}(Q,P)\ :=\ \sup_{A\in\mathcal{A}}\,\bigl|Q(A)-P(A)\bigr|.$$

Alternatively we consider the maximal ratio

$$\rho(Q,P)\ :=\ \sup_{A\in\mathcal{A}}\,\frac{Q(A)}{P(A)},$$

with the conventions $0/0:=0$ and $a/0:=\infty$ for $a>0$ . Obviously $\rho(Q,P)\geq 1$ because $Q(\mathcal{X})=P(\mathcal{X})=1$ . While $\mathop{d_{\rm TV}}(\cdot,\cdot)$ is a standard and strong metric on the space of all probability measures on $(\mathcal{X},\mathcal{A})$ , the maximal ratio $\rho(Q,P)$ is particularly important in situations in which a distribution $Q$ is approximated by a distribution $P$ . When $\rho(Q,P)<\infty$ , we know that

$$Q(A)\ \leq\ \rho(Q,P)\,P(A)$$

for arbitrary events $A$ , no matter how small $P(A)$ is, whereas total variation distance gives only the additive bounds $P(A)\pm\mathop{d_{\rm TV}}(Q,P)$ .

Explicit values or bounds for $\rho(Q,P)$ are obtained via density ratios. From now on let $P$ and $Q$ have densities $f$ and $g$ , respectively, with respect to some measure $\mu$ on $(\mathcal{X},\mathcal{A})$ . Then

$$\rho(Q,P)\ =\ \operatorname*{ess\,sup}_{x\in\mathcal{X}}\,\frac{g(x)}{f(x)}.\tag{1}$$

The ratio measure $\rho(Q,P)$ plays an important role in acceptance-rejection sampling (von Neumann, 1951): Suppose that $\rho(Q,P)\leq C<\infty$ . Let $X_{1},X_{2},X_{3},\ldots$ and $U_{1},U_{2},U_{3},\ldots$ be independent random variables where $X_{i}\sim P$ and $U_{i}\sim\mathrm{Unif}[0,1]$ . Now let $\tau_{1}<\tau_{2}<\tau_{3}<\cdots$ denote all indices $i\in\mathbb{N}$ such that $U_{i}\leq C^{-1}g(X_{i})/f(X_{i})$ . Then the random variables $Y_{j}:=X_{\tau_{j}}$ and $W_{j}:=\tau_{j}-\tau_{j-1}$ ( $j\in\mathbb{N}$ , $\tau_{0}:=0$ ) are independent with $Y_{j}\sim Q$ and $W_{j}\sim\mathrm{Geom}(1/C)$ .
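The acceptance-rejection scheme just described can be sketched in a few lines of Python. This is only an illustration: the function name `rejection_sample` and the example pair (a Beta(2, 2) target with a uniform proposal, for which $\sup g/f=1.5$) are assumptions chosen for the demonstration, not taken from the paper.

```python
import random

def rejection_sample(sample_p, ratio, C, m, rng):
    """Draw m values from Q by acceptance-rejection with proposals from P.

    sample_p(rng) draws X ~ P, ratio(x) returns g(x)/f(x), and C >= rho(Q, P).
    Returns the accepted values Y_j and the waiting times W_j between acceptances.
    """
    ys, ws, wait = [], [], 0
    while len(ys) < m:
        x = sample_p(rng)
        u = rng.random()
        wait += 1
        if u <= ratio(x) / C:          # accept with probability g(x) / (C f(x))
            ys.append(x)
            ws.append(wait)
            wait = 0
    return ys, ws

# Illustrative choice (an assumption): Q = Beta(2, 2) with density 6 x (1 - x),
# P = Unif[0, 1] with density 1, so rho(Q, P) = sup g/f = 1.5.
rng = random.Random(0)
ys, ws = rejection_sample(lambda r: r.random(), lambda x: 6 * x * (1 - x), 1.5, 20000, rng)
mean_y = sum(ys) / len(ys)             # should be close to the Beta(2, 2) mean 1/2
mean_w = sum(ws) / len(ws)             # should be close to the Geom(1/C) mean C = 1.5
```

The average waiting time estimates $C$, which is why a small bound on $\rho(Q,P)$ translates directly into an efficient sampler.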

As soon as we have a finite bound for $\rho(Q,P)$ , we can bound total variation distance or other measures of discrepancy. The general result is as follows:

Proposition 1.

Suppose that $g/f\leq\rho$ for some number $\rho\in[1,\infty)$ .

(a) For any non-decreasing function $\psi:[0,\infty)\to\mathbb{R}$ with $\psi(1)=0$ ,

$$\int\psi(g/f)\,dQ\ \leq\ Q(\{g>f\})\,\psi(\rho).$$

(b) For any convex function $\psi:[0,\infty)\to\mathbb{R}$ ,

$$\int\psi(g/f)\,dP\ \leq\ \psi(0)+\frac{\psi(\rho)-\psi(0)}{\rho}.$$

Both inequalities are equalities if $g/f$ takes only values in $\{0,\rho\}$ .

Under the assumptions of Proposition 1, the following inequalities hold true, with equality in case of $g/f\in\{0,\rho\}$ :

Total variation: With $\psi(t):=(1-t^{-1})_{+}$ , part (a) leads to

$$\mathop{d_{\rm TV}}(Q,P)\ \leq\ Q(\{g>f\})\,(1-\rho^{-1}).\tag{2}$$

Kullback-Leibler divergence: With $\psi(t):=\log t$ , part (a) yields

$$\int\log(g/f)\,dQ\ \leq\ Q(\{g>f\})\,\log\rho.$$

Hellinger distance: With $\psi(t):=2^{-1}\bigl{(}\sqrt{t}-1\bigr{)}^{2}$ , part (b) leads to

$$\frac{1}{2}\int\bigl(\sqrt{f}-\sqrt{g}\bigr)^{2}\,d\mu\ \leq\ 1-\rho^{-1/2}.$$

Pearson $\chi^{2}$ divergence: With $\psi(t):=(t-1)^{2}$ , part (b) yields

$$\int(g/f-1)^{2}\,dP\ \leq\ \rho-1.$$

Inequality (2) implies that $\mathop{d_{\rm TV}}(Q,P)\leq 1-\rho(Q,P)^{-1}$ , and the latter quantity is easily seen to be the mixture index of fit introduced by Rudas et al. (1994),

$$\pi^{*}(P,Q)\ =\ \min\bigl\{\pi\in[0,1]:P\geq(1-\pi)Q\ \text{on}\ \mathcal{A}\bigr\}.$$
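As a quick numerical sanity check, the four consequences of Proposition 1 (for total variation, Kullback-Leibler divergence, Hellinger distance and Pearson $\chi^{2}$ divergence) can be evaluated for a pair of discrete distributions. The sketch below is illustrative only: the helper `discrepancies` and the two weight vectors are assumptions made up for the example.

```python
import math

def discrepancies(g, f):
    """Return rho, TV, KL, Hellinger, Pearson chi^2 and Q({g > f}) for pmfs g (of Q), f (of P), f > 0."""
    pairs = list(zip(g, f))
    rho = max(gi / fi for gi, fi in pairs)
    tv = 0.5 * sum(abs(gi - fi) for gi, fi in pairs)
    kl = sum(gi * math.log(gi / fi) for gi, fi in pairs if gi > 0)
    hell = 0.5 * sum((math.sqrt(fi) - math.sqrt(gi)) ** 2 for gi, fi in pairs)
    chi2 = sum((gi / fi - 1) ** 2 * fi for gi, fi in pairs)
    q_pos = sum(gi for gi, fi in pairs if gi > fi)      # Q({g > f})
    return rho, tv, kl, hell, chi2, q_pos

f = [0.25, 0.25, 0.25, 0.25]     # weights of the proxy P (illustrative)
g = [0.40, 0.30, 0.20, 0.10]     # weights of the target Q (illustrative)
rho, tv, kl, hell, chi2, q_pos = discrepancies(g, f)

bounds_hold = (
    tv <= q_pos * (1 - 1 / rho)          # total variation bound
    and kl <= q_pos * math.log(rho)      # Kullback-Leibler bound
    and hell <= 1 - rho ** -0.5          # Hellinger bound
    and chi2 <= rho - 1                  # Pearson chi^2 bound
)
```

In this example $\rho=1.6$ and $d_{\rm TV}=0.2$, while the total variation bound evaluates to $0.7\cdot(1-1/1.6)=0.2625$.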

The remainder of this paper is organized as follows: In Section 2 we present an explicit inequality for $\rho(Q,P)$ with $Q$ being a hypergeometric and $P$ being an approximating binomial distribution. Our result complements results of Diaconis and Freedman (1980), Ehm (1991) and Holmes (2004) for $\mathop{d_{\rm TV}}(Q,P)$ .

In Section 3 we first consider the case of $Q$ being a binomial distribution and $P$ being the Poisson distribution with the same mean. The corresponding ratio measure $\rho(Q,P)$ has been analyzed previously by Christensen et al. (1995) and Antonelli and Regoli (2005). Our new explicit bounds bridge the gap between these two works. As a by-product we obtain explicit bounds for $\mathop{d_{\rm TV}}(Q,P)$ which are comparable to well-known bounds from the literature. All these bounds carry over to multinomial distributions, to be approximated by a product of Poisson distributions. In particular, we improve and generalize approximation bounds by Diaconis and Freedman (1987). Indeed, at several places we use sufficiency arguments similarly to the latter authors to reduce multivariate approximation problems to univariate ones. Section 4 presents several further examples, most of which are based on approximating beta by gamma distributions.

Most proofs are deferred to Section 5. In particular, we provide a slightly strengthened version of the Stirling–Robbins approximation of factorials (Robbins, 1955) and some properties of the log-gamma function. This part is potentially of independent interest. As notation used throughout, we write $[a]_{0}:=1$ and $[a]_{m}:=\prod_{i=0}^{m-1}(a-i)$ for real numbers $a$ and integers $m\geq 1$ .

2 Binomial approximation of hypergeometric distributions

Sampling from a finite population.

First we revisit a result of Freedman (1977) concerning sampling with and without replacement. For integers $1\leq n\leq N$ let $\mathcal{X}=\{1,\ldots,N\}^{n}$ , the set of all samples of size $n$ drawn with replacement from $\{1,\ldots,N\}$ . The uniform distribution $P$ on $\mathcal{X}$ has weights

$$P(\{x\})\ =\ 1/N^{n}$$

for $x=(x_{1},\ldots,x_{n})\in\mathcal{X}$ . When sampling without replacement, we consider the set $\mathcal{X}_{*}$ of all $x$ with all components different, and the distribution $Q$ with weights

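$$Q(\{x\})\ =\ \begin{cases}1/[N]_{n}&\text{if}\ x\in\mathcal{X}_{*},\\ 0&\text{if}\ x\in\mathcal{X}\setminus\mathcal{X}_{*}.\end{cases}$$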

Consequently, $dQ/dP=N^{n}/[N]_{n}$ on $\mathcal{X}_{*}$ and $dQ/dP=0$ on $\mathcal{X}\setminus\mathcal{X}_{*}$ , so Proposition 1 (a) with $\psi(t):=(1-t^{-1})_{+}$ implies that

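$$\rho(Q,P)\ =\ N^{n}/[N]_{n}\quad\text{and}\quad\mathop{d_{\rm TV}}(Q,P)\ =\ 1-\rho(Q,P)^{-1}\ =\ 1-[N]_{n}/N^{n}.\tag{3}$$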

Freedman (1977) showed that

$$1-\exp\Bigl(-\frac{n(n-1)}{2N}\Bigr)\ \leq\ \mathop{d_{\rm TV}}(Q,P)\ \leq\ \frac{n(n-1)}{2N}.\tag{4}$$

Here are two new bounds for $\rho(Q,P)$ which we will prove in Section 5. The lower bound in the following display follows from Freedman’s proof of the lower bound in (4), while the upper bound is new.

$$\frac{n(n-1)}{2N}\ \leq\ \log\rho(Q,P)\ \leq\ -\frac{n}{2}\log\Bigl(1-\frac{n-1}{N}\Bigr).\tag{5}$$

From (3) and (4) one would get the upper bound $-\log\bigl{(}1-n(n-1)/(2N)\bigr{)}$ with the convention that $\log(t):=-\infty$ for $t\leq 0$ . For $n=2$ this coincides with the upper bound in (5), for $n\geq 3$ it is strictly larger.
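The exact values $\rho(Q,P)=N^{n}/[N]_{n}$ and $d_{\rm TV}(Q,P)=1-[N]_{n}/N^{n}$ are easy to compare numerically with Freedman's bounds on $d_{\rm TV}$ and the bounds on $\log\rho$. The following sketch does this for one illustrative pair $(N,n)$; the function names and the choice $N=365$, $n=23$ are assumptions for the example.

```python
import math

def falling(a, m):
    """Falling factorial [a]_m = a (a - 1) ... (a - m + 1), with [a]_0 = 1."""
    out = 1
    for i in range(m):
        out *= a - i
    return out

def replacement_bounds(N, n):
    """Exact log rho and d_TV for sampling without vs. with replacement, plus their bounds."""
    rho = N ** n / falling(N, n)
    x = n * (n - 1) / (2 * N)
    return {
        "log_rho": math.log(rho),
        "tv": 1 - 1 / rho,
        "tv_lower": 1 - math.exp(-x),                              # Freedman's lower bound
        "tv_upper": x,                                             # Freedman's upper bound
        "log_rho_lower": x,                                        # lower bound on log rho
        "log_rho_upper": -(n / 2) * math.log(1 - (n - 1) / N),     # upper bound on log rho
    }

b = replacement_bounds(N=365, n=23)     # illustrative values
```

For these values the total variation distance is about $0.507$, squeezed between roughly $0.500$ and $0.693$, and $\log\rho\approx 0.708$ lies between about $0.693$ and $0.715$.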

Hypergeometric and binomial distributions.

Now recall the definition of the hypergeometric distribution: Consider an urn with $N$ balls, $L$ of them being black and $N-L$ being white. Now we draw $n$ balls at random and define $X$ to be the number of black balls in this sample. When sampling with replacement, $X$ has the binomial distribution $\mathop{\mathrm{Bin}}\nolimits(n,L/N)$ , and when sampling without replacement ( $n\leq N$ ), $X$ has the hypergeometric distribution $\mathop{\mathrm{Hyp}}\nolimits(N,L,n)$ . Intuitively one would guess that the difference between $\mathop{\mathrm{Bin}}\nolimits(n,L/N)$ and $\mathop{\mathrm{Hyp}}\nolimits(N,L,n)$ is small when $n\ll N$ . Note that when Freedman’s (1977) result is applied to a particular function, e.g. the number of black balls, the resulting bound is suboptimal because it involves $n(n-1)/N$ rather than $n/N$ . Indeed, Diaconis and Freedman (1980) showed that

$$\mathop{d_{\rm TV}}\bigl(\mathrm{Bin}(n,L/N),\mathrm{Hyp}(N,L,n)\bigr)\ \leq\ 2\,\frac{n}{N}.$$

Stronger bounds have been obtained by means of the Chen–Stein method. Ehm (1991) showed that with $p:=L/N$ ,

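$$\mathop{d_{\rm TV}}\bigl(\mathrm{Hyp}(N,L,n),\mathrm{Bin}(n,L/N)\bigr)\ \leq\ \frac{n}{n+1}\bigl(1-p^{n+1}-(1-p)^{n+1}\bigr)\frac{n-1}{N-1}\quad\text{for}\ 1\leq n\leq\min\{L,N-L\},\tag{6}$$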

while Holmes (2004) proved that

$$\mathop{d_{\rm TV}}\bigl(\mathrm{Hyp}(N,L,n),\mathrm{Bin}(n,L/N)\bigr)\ \leq\ \frac{n-1}{N-1}.\tag{7}$$

Our first main result shows that for fixed parameters $N$ and $n\leq N/2+1$ , the ratio measure $\rho\bigl{(}\mathop{\mathrm{Hyp}}\nolimits(N,L,n),\mathop{\mathrm{Bin}}\nolimits(n,L/N)\bigr{)}$ is maximized by $L=1$ (and $L=N-1$ ):

Theorem 2.

For integers $N,L,n$ with $1\leq n\leq N$ , $n-1\leq N/2$ and $L\in\{0,1,\ldots,N\}$ ,

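$$\rho\bigl(\mathrm{Hyp}(N,L,n),\mathrm{Bin}(n,L/N)\bigr)\ \leq\ \rho\bigl(\mathrm{Hyp}(N,1,n),\mathrm{Bin}(n,1/N)\bigr)\ =\ \Bigl(1-\frac{1}{N}\Bigr)^{-(n-1)}\ \leq\ \Bigl(1-\frac{n-1}{N}\Bigr)^{-1}.$$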

Moreover,

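$$\mathop{d_{\rm TV}}\bigl(\mathrm{Hyp}(N,L,n),\mathrm{Bin}(n,L/N)\bigr)\ \leq\ \Bigl(1-\frac{[L]_{n}}{[N]_{n}}-\frac{[N-L]_{n}}{[N]_{n}}\Bigr)\frac{n-1}{N}.$$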

Remarks.

Note that our bounds for $\mathop{d_{\rm TV}}\bigl{(}\mathop{\mathrm{Hyp}}\nolimits(N,L,n),\mathop{\mathrm{Bin}}\nolimits(n,L/N)\bigr{)}$ are slightly better than the bound (7) of Holmes (2004). If we fix $n$ and let $L,N\to\infty$ such that $L/N\to p\in(0,1)$ , then our bounds are equal to

$$(1+o(1))\,(1-p^{n}-q^{n})\,\frac{n-1}{N}$$

and thus similar to the bound (6) of Ehm (1991). If we fix $L$ and let $n,N\to\infty$ such that $n/N\to\gamma\in(0,1)$ , then our two bounds converge to

[TABLE]

whereas the upper bound in (7) tends to $\gamma$ , and (6) is not applicable.
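The statement that the ratio measure is largest for $L=1$ (and $L=N-1$) can be checked by brute force for small parameters. The sketch below computes $\rho\bigl(\mathrm{Hyp}(N,L,n),\mathrm{Bin}(n,L/N)\bigr)$ as a maximum of weight ratios and compares it with $(1-1/N)^{-(n-1)}$; the helper names and the choice $N=50$, $n=10$ are assumptions for illustration.

```python
from math import comb

def hyp_pmf(N, L, n, k):
    """P(X = k) for X ~ Hyp(N, L, n): k black balls among n draws without replacement."""
    return comb(L, k) * comb(N - L, n - k) / comb(N, n)

def bin_pmf(n, p, k):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def rho_hyp_bin(N, L, n):
    """Maximal ratio of Hyp(N, L, n) weights to Bin(n, L/N) weights over k = 0, ..., n."""
    p = L / N
    return max(hyp_pmf(N, L, n, k) / bin_pmf(n, p, k) for k in range(n + 1))

N, n = 50, 10                                   # illustrative; satisfies n - 1 <= N / 2
rhos = {L: rho_hyp_bin(N, L, n) for L in range(1, N)}
closed_form = (1 - 1 / N) ** (-(n - 1))         # value of the ratio measure at L = 1
crude_bound = 1 / (1 - (n - 1) / N)
```

Here `closed_form` is about $1.199$ and `crude_bound` about $1.220$, and no entry of `rhos` exceeds `closed_form`.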

3 Poisson approximations

3.1 Binomial distributions

It is well-known that for $n\in\mathbb{N}$ and $p\in[0,1]$ , the binomial distribution $\mathop{\mathrm{Bin}}\nolimits(n,p)$ may be approximated by the Poisson distribution $\mathop{\mathrm{Poiss}}\nolimits(np)$ if $p$ is small. Explicit bounds for the approximation error have been developed in the more general setting of sums of independent but not necessarily identically distributed Bernoulli random variables by various authors. Hodges and Le Cam (1960) introduced a coupling method which was refined by Serfling (1975) and implies the inequality

$$\mathop{d_{\rm TV}}\bigl(\mathrm{Bin}(n,p),\mathrm{Poiss}(np)\bigr)\ \leq\ np(1-e^{-p})\ \leq\ np^{2}.$$

By direct calculations involving density ratios, Reiss (1993) showed that

$$\mathop{d_{\rm TV}}\bigl(\mathrm{Bin}(n,p),\mathrm{Poiss}(np)\bigr)\ \leq\ p.$$

Finally, by means of the Chen–Stein method, Barbour and Hall (1984) derived the remarkable bound

$$\mathop{d_{\rm TV}}\bigl(\mathrm{Bin}(n,p),\mathrm{Poiss}(np)\bigr)\ \leq\ (1-e^{-np})\,p.\tag{8}$$
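To get a feeling for how the three classical bounds $np(1-e^{-p})$, $p$ and $(1-e^{-np})p$ compare with the actual total variation distance, one can evaluate everything numerically. The sketch below is a plain summation over the binomial support; the function name `tv_bin_poiss` and the values $n=20$, $p=0.1$ are illustrative assumptions.

```python
import math

def tv_bin_poiss(n, p):
    """Total variation distance between Bin(n, p) and Poiss(np) for 0 < p < 1."""
    lam = n * p
    b = (1 - p) ** n                  # Bin(n, p) weight at k = 0
    q = math.exp(-lam)                # Poiss(np) weight at k = 0
    diff, poiss_mass = abs(b - q), q
    for k in range(1, n + 1):
        b *= (n - k + 1) / k * p / (1 - p)      # recursion for binomial weights
        q *= lam / k                            # recursion for Poisson weights
        diff += abs(b - q)
        poiss_mass += q
    tail = max(0.0, 1 - poiss_mass)   # Poisson mass beyond n, where Bin has no mass
    return 0.5 * (diff + tail)

n, p = 20, 0.1                        # illustrative values
tv = tv_bin_poiss(n, p)
coupling_bound = n * p * (1 - math.exp(-p))     # Hodges and Le Cam / Serfling
reiss_bound = p                                 # Reiss
barbour_hall_bound = (1 - math.exp(-n * p)) * p # Barbour and Hall
```

For these values the Barbour and Hall bound (about $0.086$) is the smallest of the three, followed by $p=0.1$ and the coupling bound (about $0.190$).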

Concerning the ratio measure $\rho\bigl{(}\mathop{\mathrm{Bin}}\nolimits(n,p),\mathop{\mathrm{Poiss}}\nolimits(np)\bigr{)}$ , Christensen et al. (1995) showed that

$$\Lambda(p)\ :=\ \max_{n\geq 1}\,\log\rho\bigl(\mathrm{Bin}(n,p),\mathrm{Poiss}(np)\bigr)$$

is a convex, piecewise linear function of $p\in[0,1)$ with $\lim_{p\to 1}\Lambda(p)=\infty$ and

$$\Lambda(p)\ =\ p\quad\text{for}\ 0\leq p\leq\log(2).$$

A close inspection of their proof reveals that $\Lambda(p)$ is the maximum of the log-ratio measure $\log\rho\bigl{(}\mathop{\mathrm{Bin}}\nolimits(n,p),\mathop{\mathrm{Poiss}}\nolimits(np)\bigr{)}$ over all integers $n\leq 1/(1-p)$ , so the bound $\Lambda(p)$ is probably rather conservative for large sample sizes $n$ . Indeed, it follows from the results of Antonelli and Regoli (2005) that for any fixed $p\in(0,1)$ ,

[TABLE]

which is substantially smaller than $\Lambda(p)$ , at least for small values of $p$ . By means of elementary calculations and an appropriate version of Stirling’s formula, we shall prove the following bounds:

Theorem 3.

For arbitrary $n\in\mathbb{N}$ ,

[TABLE]

is a continuous and strictly increasing function of $p\in[0,1)$ , satisfying $\Lambda_{n}(0)=0$ and

[TABLE]

for $0<p<1$ . More precisely, with $k:=\lceil np\rceil$ ,

[TABLE]

Remarks.

Since $P(\{0\})=e^{-np}\geq Q(\{0\})=(1-p)^{n}$ , the first two upper bounds of Theorem 3 and Proposition 1 (a) lead to the inequalities

[TABLE]

see inequality (20) in Section 5. For fixed $\lambda>0$ , the bound in (8) may be rephrased as $n\mathop{d_{\rm TV}}\bigl{(}\mathop{\mathrm{Bin}}\nolimits(n,\lambda/n),\mathop{\mathrm{Poiss}}\nolimits(\lambda)\bigr{)}\leq(1-e^{-\lambda})\lambda$ . Our bounds imply that

[TABLE]

and $\lceil\lambda\rceil/2<\lambda$ for $\lambda>1/2$ . The refined inequalities imply that for any fixed $p_{o}\in(0,1)$ ,

[TABLE]

The proof of Theorem 3 reveals that $\Lambda_{n}(p)=\log\rho\bigl{(}\mathop{\mathrm{Bin}}\nolimits(n,p),\mathop{\mathrm{Poiss}}\nolimits(np)\bigr{)}$ is concave in $p\in\bigl{[}(k-1)/n,k/n\bigr{]}$ for each $k\in\{1,\ldots,n\}$ . Figure 1 illustrates this for $n=40$ . In the left panel one sees $\Lambda_{n}(p)$ (black) together with $\Lambda(p)$ (black dashed) and the simple upper bounds $-\log(1-p)$ (green) and $-\log(1-\lceil np\rceil/n)/2$ (blue). The right panel shows the quantities $\Lambda_{n}(p)+\log(1-p)/2$ (black), i.e. the difference of $\Lambda_{n}(p)$ and the asymptotic bound $-\log(1-p)/2$ of Antonelli and Regoli (2005), together with the upper bound $-\log(1-\lceil np\rceil/n)/2+\log(1-p)/2$ (blue) and the two bounds in (11) (red and orange).
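The comparison shown in the left panel of Figure 1 can be reproduced numerically. The sketch below computes $\Lambda_{n}(p)$ for $n=40$ as a maximum of log weight ratios and checks it against the two simple upper bounds $-\log(1-p)$ and $-\log(1-\lceil np\rceil/n)/2$ on a grid; the function name, the grid and the small numerical tolerance are assumptions made for the illustration.

```python
import math

def log_rho_bin_poiss(n, p):
    """Lambda_n(p) = log rho(Bin(n, p), Poiss(np)) for 0 < p < 1, computed from log-weights."""
    lam = n * p
    best = -math.inf
    for k in range(n + 1):
        log_b = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * math.log(p) + (n - k) * math.log(1 - p))
        log_q = -lam + k * math.log(lam) - math.lgamma(k + 1)
        best = max(best, log_b - log_q)
    return best

n = 40
grid = [i / 200 for i in range(1, 190)]         # p = 0.005, 0.010, ..., 0.945
checks = []
for p in grid:
    lam_n = log_rho_bin_poiss(n, p)
    k = math.ceil(n * p)
    checks.append(lam_n <= -math.log(1 - p) + 1e-9
                  and lam_n <= -math.log(1 - k / n) / 2 + 1e-9)
```

Only $k\leq n$ needs to be scanned, because the binomial weights vanish beyond $n$ and so cannot contribute to the maximal ratio.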

Poisson binomial distributions.

The distribution $\mathop{\mathrm{Bin}}\nolimits(n,p)$ can be replaced with the distribution $Q$ of $\sum_{i=1}^{n}Z_{i}$ with independent Bernoulli variables $Z_{i}$ with arbitrary parameters $p_{i}:=\mathop{\mathrm{I\!P}}\nolimits(Z_{i}=1)\in(0,1)$ and $\lambda:=\sum_{i=1}^{n}p_{i}$ in place of $np$ . Dümbgen and Wellner (2020) showed that $\rho(Q,\mathop{\mathrm{Poiss}}\nolimits(\lambda))\leq(1-p_{*})^{-1}$ with $p_{*}:=\max_{1\leq i\leq n}p_{i}$ .
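The bound $\rho(Q,\mathrm{Poiss}(\lambda))\leq(1-p_{*})^{-1}$ for Poisson binomial distributions is easy to probe numerically. In the sketch below the weights of $\sum_{i}Z_{i}$ are obtained by sequential convolution; the helper names and the parameter vector are illustrative assumptions.

```python
import math

def poisson_binomial_pmf(ps):
    """Weights of Z_1 + ... + Z_n for independent Bernoulli(p_i), by sequential convolution."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, w in enumerate(pmf):
            nxt[k] += w * (1 - p)       # Z_i = 0
            nxt[k + 1] += w * p         # Z_i = 1
        pmf = nxt
    return pmf

def rho_vs_poisson(ps):
    """Maximal weight ratio of the Poisson binomial law to Poiss(lambda), lambda = sum p_i."""
    lam = sum(ps)
    g = poisson_binomial_pmf(ps)
    f = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(len(g))]
    return max(gk / fk for gk, fk in zip(g, f))

ps = [0.05, 0.1, 0.2, 0.3]              # illustrative parameters p_i
rho = rho_vs_poisson(ps)
bound = 1 / (1 - max(ps))               # (1 - p_*)^{-1}
```

With these parameters the maximal ratio is attained at $k=1$ and is roughly $1.19$, compared with the bound $1/0.7\approx 1.43$.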

3.2 Multinomial distributions and Poissonization

Multinomial distributions.

The previous bounds for the approximation of binomial by Poisson distributions imply bounds for the approximation of multinomial distributions by products of Poisson distributions. For integers $n,K\geq 1$ and parameters $p_{1},\ldots,p_{K}>0$ such that $p_{+}:=\sum_{i=1}^{K}p_{i}<1$ , let $(Y_{0},Y_{1},\ldots,Y_{K})$ follow a multinomial distribution

[TABLE]

where $p_{0}:=1-p_{+}$ . Further, let $X_{1},\ldots,X_{K}$ be independent Poisson random variables with parameters $np_{1},\ldots,np_{K}$ respectively. Elementary calculations reveal that with $Y_{+}:=\sum_{i=1}^{K}Y_{i}$ and $X_{+}:=\sum_{i=1}^{K}X_{i}$ ,

[TABLE]

for arbitrary integers $m\geq 0$ . Moreover,

[TABLE]

This implies that for arbitrary integers $x_{1},\ldots,x_{K}\geq 0$ and $x_{+}:=\sum_{i=1}^{K}x_{i}$ ,

[TABLE]

Consequently, by (1),

[TABLE]

and one easily verifies that

[TABLE]

Poissonization.

Theorem 3 applies also to Poissonization for empirical processes: Let $X_{1},X_{2},X_{3},\ldots$ be independent random variables with distribution $P$ on a measurable space $(\mathcal{X},\mathcal{A})$ . Let $M_{n}$ be the random measure $\sum_{i=1}^{n}\delta_{X_{i}}$ , and let $\widetilde{M}_{n}$ be a Poisson process on $(\mathcal{X},\mathcal{A})$ with intensity measure $nP$ . Then $\widetilde{M}_{n}$ has the same distribution as $\sum_{i\leq N_{n}}\delta_{X_{i}}$ , where $N_{n}\sim\mathop{\mathrm{Poiss}}\nolimits(n)$ is independent from $(X_{i})_{i\geq 1}$ . For a set $A_{o}\in\mathcal{A}$ with $0<p_{o}:=P(A_{o})<1$ , the restrictions of the random measures $M_{n}$ and $\widetilde{M}_{n}$ to $A_{o}$ satisfy the equality

[TABLE]

Here $M_{n}|_{A_{o}}$ and $\widetilde{M}_{n}|_{A_{o}}$ stand for the random measures

[TABLE]

on $A_{o}$ . Indeed, for arbitrary integers $m\geq 0$ ,

[TABLE]

while

[TABLE]

Consequently,

[TABLE]

and

[TABLE]

4 Gamma approximations and more

In this section we present further examples of bounds for the ratio measure $\rho(Q,P)$ . In all but one case, they are related to the approximation of beta by gamma distributions.

4.1 Beta distributions

In what follows, let $\mathrm{Beta}(a,b)$ be the beta distribution with parameters $a,b>0$ . The corresponding density is given by

[TABLE]

with the gamma function $\Gamma(a):=\int_{0}^{\infty}x^{a-1}e^{-x}\,dx$ . Note that we view $\mathrm{Beta}(a,b)$ as a distribution on the halfline $(0,\infty)$ , because we want to approximate it by gamma distributions. Specifically, let $\mathrm{Gamma}(a,c)$ be the gamma distribution with shape parameter $a>0$ and rate parameter (i.e. inverse scale parameter) $c>0$ . The corresponding density is given by

[TABLE]

The next theorem shows that $\mathrm{Beta}(a,b)$ may be approximated by $\mathrm{Gamma}(a,c)$ for suitable rate parameters $c>0$ , provided that $b\gg\max(a,1)$ .

Theorem 4.

(i) For arbitrary parameters $a>0$ and $b>1$ ,

[TABLE]

where

[TABLE]

(ii) For $a>0$ , $b>1$ , and arbitrary $c>0$ ,

[TABLE]

Moreover, for this optimal rate parameter $c=a+b-1$ ,

[TABLE]

where

[TABLE]

Remarks.

The rate parameter $c=a+b$ is canonical in the sense that the means of $\mathrm{Beta}(a,b)$ and $\mathrm{Gamma}(a,a+b)$ are both equal to $a/(a+b)$ . But note that

[TABLE]

if $b\gg\max\{a,1\}$ . Hence, $\mathrm{Gamma}(a,a+b-1)$ yields a remarkably better approximation than $\mathrm{Gamma}(a,a+b)$ , unless $a$ is rather large or $b$ is close to $1$ .
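The advantage of the rate $c=a+b-1$ over the canonical rate $c=a+b$ can be illustrated with a direct computation. The sketch below assumes the standard densities, so that on $(0,1)$ the ratio of the $\mathrm{Beta}(a,b)$ density to the $\mathrm{Gamma}(a,c)$ density equals $\Gamma(a+b)\,(1-x)^{b-1}e^{cx}/(\Gamma(b)\,c^{a})$, and it uses the elementary fact that this expression is maximal at $x=1-(b-1)/c$ when $c\geq b-1$. The function name and the parameter values are illustrative assumptions, and the closed-form bounds of Theorem 4 are not reproduced here.

```python
import math

def log_rho_beta_gamma(a, b, c):
    """log of sup_x [Beta(a, b) density / Gamma(a, c) density], for b > 1 and c >= b - 1."""
    x = 1 - (b - 1) / c                           # maximizer of (b-1) log(1-x) + c x on [0, 1)
    return (math.lgamma(a + b) - math.lgamma(b) - a * math.log(c)
            + (b - 1) * math.log(1 - x) + c * x)

a, b = 2.0, 50.0                                  # illustrative, with b much larger than max(a, 1)
canonical = log_rho_beta_gamma(a, b, a + b)       # rate matching the means
optimal = log_rho_beta_gamma(a, b, a + b - 1)     # rate c = a + b - 1
nearby = [log_rho_beta_gamma(a, b, a + b - 1 + d) for d in (-0.5, -0.1, 0.1, 0.5)]
```

For $a=2$, $b=50$ this gives a log-ratio of about $0.020$ at $c=a+b-1$ versus about $0.030$ at $c=a+b$, and perturbing $c$ away from $a+b-1$ in either direction increases the value.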

The proof of Theorem 4 also shows the following in the special case $a=1$ : For $b>1$ ,

[TABLE]

and for $b\geq 2$ ,

[TABLE]

4.2 The Lévy–Poincaré projection problem

Let $\boldsymbol{U}=(U_{1},U_{2},\ldots,U_{n})$ be uniformly distributed on the unit sphere in $\mathbb{R}^{n}$ . It is well-known that $\boldsymbol{U}$ can be represented as $\boldsymbol{Z}/\|\boldsymbol{Z}\|$ where $\boldsymbol{Z}\sim N_{n}(0,I)$ and $\|\cdot\|$ denotes standard Euclidean norm. Then the first $k$ coordinates of $\boldsymbol{U}$ satisfy

[TABLE]

since $n^{-1}\sum_{j=1}^{n}Z_{j}^{2}\rightarrow_{p}1$ by the weak law of large numbers. Indeed, let

[TABLE]

with $r_{n}>0$ , and let

[TABLE]

Diaconis and Freedman (1987) showed that

[TABLE]

By means of Theorem 4, this bound can be improved by a factor larger than $2$ . The approximation becomes even better if we set $r_{n}=\sqrt{n-2}$ . To verify all this, we consider the random variables $R_{k}:=\bigl{(}\sum_{i=1}^{k}Z_{i}^{2}\bigr{)}^{1/2}$ , $R_{n}:=\bigl{(}\sum_{i=1}^{n}Z_{i}^{2}\bigr{)}^{1/2}$ and

[TABLE]

Note that $\boldsymbol{V}$ is uniformly distributed on the unit sphere in $\mathbb{R}^{k}$ and independent of $(R_{k},R_{n})$ . Moreover,

[TABLE]

But $R_{k}^{2}\sim\mathrm{Gamma}(k/2,1/2)$ and $R_{k}^{2}/R_{n}^{2}\sim\mathrm{Beta}(k/2,(n-k)/2)$ . Hence,

[TABLE]

Applying Theorem 4 with $a:=k/2$ , $b:=(n-k)/2$ and $c:=r_{n}^{2}/2$ yields the following bounds:

Corollary 5.

For $n>k+2$ ,

[TABLE]

where

[TABLE]

Figures 2 and 3 illustrate Corollary 5 in case of $k=1$ . For dimensions $n=5,10$ , Figure 2 shows the standard Gaussian density $f$ (green) and the density $g_{n}$ of $Q_{n,1}$ in case of $r_{n}=\sqrt{n}$ (black) and $r_{n}=\sqrt{n-2}$ (blue). Figure 3 depicts the corresponding ratios $g_{n}/f$ . The dotted black and blue lines are the corresponding upper bounds $(1-\delta)^{-1/2}$ from Corollary 5. These pictures show clearly that using $r_{n}=\sqrt{n-2}$ instead of $r_{n}=\sqrt{n}$ yields a substantial improvement.
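The effect of the scaling $r_{n}$ in the case $k=1$ can be reproduced with a short grid search. The sketch below assumes the standard fact that the first coordinate $U_{1}$ of a uniform point on the unit sphere in $\mathbb{R}^{n}$ has density $\Gamma(n/2)\,(1-u^{2})^{(n-3)/2}/\bigl(\sqrt{\pi}\,\Gamma((n-1)/2)\bigr)$ on $(-1,1)$, and it approximates $\sup_{x}g_{n}(x)/\phi(x)$ on a grid; the function name and the grid size are illustrative choices.

```python
import math

def sup_ratio_first_coordinate(n, r, steps=20000):
    """Grid approximation of sup_x g_n(x) / phi(x), where g_n is the density of r * U_1 (n >= 4)."""
    log_c = (math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
             - 0.5 * math.log(math.pi) - math.log(r))
    best = -math.inf
    for i in range(steps):                         # x in [0, r); the ratio is symmetric in x
        x = r * i / steps
        log_g = log_c + (n - 3) / 2 * math.log(1 - (x / r) ** 2)
        log_phi = -x * x / 2 - 0.5 * math.log(2 * math.pi)
        best = max(best, log_g - log_phi)
    return math.exp(best)

# For each dimension: (sup ratio with r_n = sqrt(n), sup ratio with r_n = sqrt(n - 2)).
results = {n: (sup_ratio_first_coordinate(n, math.sqrt(n)),
               sup_ratio_first_coordinate(n, math.sqrt(n - 2))) for n in (5, 10)}
```

For $n=5$ the maximal ratio drops from roughly $1.51$ with $r_{n}=\sqrt{n}$ to roughly $1.19$ with $r_{n}=\sqrt{n-2}$, in line with the improvement visible in Figure 3.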

4.3 Dirichlet distributions and uniform spacings

Dirichlet distributions.

For integers $1\leq k\leq N$ and parameters $a_{1},\ldots,a_{N},c>0$ , let $\boldsymbol{X}$ be a random vector with independent components $X_{i}\sim\mathrm{Gamma}(a_{i},c)$ . With $X_{+}:=\sum_{i=1}^{N}X_{i}$ , it is well-known that the random vector

[TABLE]

and $X_{+}$ are independent, where $X_{+}\sim\mathrm{Gamma}(a_{+},c)$ with

[TABLE]

The distribution of $\boldsymbol{Y}$ is the Dirichlet distribution with parameters $a_{1},\ldots,a_{N}$ , written

[TABLE]

Now let us focus on the first $k$ components of $\boldsymbol{X}$ and $\boldsymbol{Y}$ :

[TABLE]

with

[TABLE]

Then $(V_{1},\ldots,V_{k})\sim\mathrm{Dirichlet}(a_{1},\ldots,a_{k})$ and is independent of $(X_{+}^{(k)},X_{+})$ , while

[TABLE]

with

[TABLE]

Hence, the difference between $\mathcal{L}(Y_{1},\ldots,Y_{k})$ and $\mathcal{L}(X_{1},\ldots,X_{k})$ , in terms of the ratio measure, is the difference between $\mathrm{Beta}(a_{+}^{(k)},a_{+}-a_{+}^{(k)})$ and $\mathrm{Gamma}(a_{+}^{(k)},c)$ . Thus Theorem 4 yields the following bounds:

Corollary 6.

Let $P_{k}:=\otimes_{i=1}^{k}\mathrm{Gamma}(a_{i},c)$ , and let $Q_{N,k}:=\mathcal{L}(Y_{1},\ldots,Y_{k})$ . Then

[TABLE]

where either

[TABLE]

or

[TABLE]

Uniform spacings.

Uniform spacings are a special case of the previous result: For an integer $n\geq 2$ , let $U_{1},\ldots,U_{n}$ be independent random variables with uniform distribution on $[0,1]$ . Then we consider the order statistics $0<U_{n:1}<U_{n:2}<\cdots<U_{n:n}<1$ . With $U_{n:0}:=0$ and $U_{n:n+1}:=1$ , it is well-known that

[TABLE]

That means, the $n+1$ spacings have the same distribution as $(E_{j}/E_{+})_{j=1}^{n+1}$ with independent, standard exponential random variables $E_{1},\ldots,E_{n+1}$ and $E_{+}:=\sum_{j=1}^{n+1}E_{j}$ . Consequently, Corollary 6 and the second remark after Theorem 4 yield the following bounds:

Corollary 7.

For integers $1\leq k<n$ let $Q_{n,k}$ be the distribution of the vector

[TABLE]

Further let $P_{k}$ be the $k$ -fold product of the standard exponential distribution. Then

[TABLE]

In particular,

[TABLE]

Remarks.

Corollary 7 gives another proof of the results of Runnenburg and Vervaat (1969), who obtained bounds on $\mathop{d_{\rm TV}}(Q_{n,k},P_{k})$ by first bounding the Kullback–Leibler divergence; see their Remark 4.1, pages 74–75. It can be shown via the methods of Hall and Wellner (1979) that

[TABLE]

where $2e^{-2}\approx 0.2707<1/2$ .

4.4 Student distributions

For $r>0$ let $t_{r}$ denote Student’s t distribution with $r$ degrees of freedom, with density

[TABLE]

It is well-known that $f_{r}$ converges uniformly to the density $\phi$ of the standard Gaussian distribution $N(0,1)$ , where $\phi(x):=\exp(-x^{2}/2)/\sqrt{2\pi}$ . The distribution $t_{r}$ has heavier tails than the standard Gaussian distribution and, indeed,

[TABLE]

However, for the reverse ratio measure we do obtain a reasonable upper bound:

Lemma 8.

For $r\geq 2$ ,

[TABLE]

Remarks.

It follows from Lemma 8 that

[TABLE]

By means of Proposition 1 (a) we obtain the inequality $r\mathop{d_{\rm TV}}(N(0,1),t_{r})\leq 1/2$ for $r\geq 2$ . Pinelis (2015) proved that

[TABLE]

for $r\geq 4$ , and that $r\mathop{d_{\rm TV}}\bigl{(}N(0,1),t_{r}\bigr{)}\rightarrow C$ as $r\rightarrow\infty$ . So $C$ is optimal in the bound for $\mathop{d_{\rm TV}}$ , whereas $1/2$ is optimal for $\rho$ .
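The inequality $r\mathop{d_{\rm TV}}(N(0,1),t_{r})\leq 1/2$ can be checked numerically for a few values of $r$. The sketch below assumes the standard formula for the density of $t_{r}$ and uses $d_{\rm TV}=\int(\phi-f_{r})_{+}\,dx$ with a simple midpoint rule; the function names, the integration range and the step size are illustrative choices.

```python
import math

def t_density(x, r):
    """Density of Student's t distribution with r degrees of freedom."""
    log_c = math.lgamma((r + 1) / 2) - math.lgamma(r / 2) - 0.5 * math.log(r * math.pi)
    return math.exp(log_c - (r + 1) / 2 * math.log1p(x * x / r))

def phi(x):
    """Standard Gaussian density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def tv_normal_t(r, half_width=10.0, steps=20000):
    """Midpoint-rule approximation of d_TV(N(0,1), t_r) = int (phi - f_r)_+ dx.

    The integrand vanishes in the tails, where t_r is heavier than the Gaussian,
    so truncating to [-half_width, half_width] loses nothing beyond grid error.
    """
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += max(phi(x) - t_density(x, r), 0.0)
    return total * h

scaled = {r: r * tv_normal_t(r) for r in (2, 5, 20)}    # values of r * d_TV
```

All entries of `scaled` stay below $1/2$, consistent with the inequality obtained from Proposition 1 (a).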

Let $Z$ and $T_{r}$ be random variables with distribution $N(0,1)$ and $t_{r}$ , respectively, where $r\geq 2$ . Then for any Borel set $B\subset\mathbb{R}$ ,

[TABLE]

In particular,

[TABLE]

4.5 A counterexample: convergence of normal extremes

In all previous settings, we derived upper bounds for $\rho(Q,P)$ which implied reasonable bounds for $\mathop{d_{\rm TV}}(Q,P)=\mathop{d_{\rm TV}}(P,Q)$ , whereas $\rho(P,Q)=\infty$ in general. This raises the question of whether there are probability densities $g$ and $f_{n}$ , $n\geq 1$ , such that $\mathop{d_{\rm TV}}(f_{n},g)\to 0$ but both $\rho(f_{n},g)=\infty$ and $\rho(g,f_{n})=\infty$ . The answer is “yes” in view of the following example.

Example 9.

Suppose that $Z_{1},Z_{2},Z_{3},\ldots$ are independent, standard Gaussian random variables. Let $V_{n}:=\max\{Z_{i}:1\leq i\leq n\}$ . Let $b_{n}>0$ satisfy $2\pi b_{n}^{2}\exp(b_{n}^{2})=n^{2}$ and then set $a_{n}:=1/b_{n}$ . Then it is well-known that

[TABLE]

where $G$ is the Gumbel distribution function given by $G(x)=\exp(-\exp(-x))$ . Set $F_{n}(x):=P(Y_{n}\leq x)$ for $n\geq 1$ and $x\in\mathbb{R}$ . Hall (1979) shows that for constants $0<C_{1}<C_{2}\leq 3$ and sufficiently large $n$ ,

[TABLE]

and $d_{\mathrm{L}}(F_{n},G)=O(1/\log n)$ for the Lévy metric $d_{\mathrm{L}}$ . It is also known that if $\widetilde{b}_{n}:=(2\log n)^{1/2}-(1/2)\{\log\log n+\log(4\pi)\}/(2\log n)^{1/2}$ and $\widetilde{a}_{n}:=1/\widetilde{b}_{n}$ , then $\widetilde{a}_{n}/a_{n}\rightarrow 1$ , $(\widetilde{b}_{n}-b_{n})/a_{n}\rightarrow 0$ and (13) continues to hold with $a_{n}$ and $b_{n}$ replaced by $\widetilde{a}_{n}$ and $\widetilde{b}_{n}$ , but the rate of convergence in the last display is not better than $(\log\log n)^{2}/\log n$ .

In this example the densities $f_{n}$ of $F_{n}$ are given by

[TABLE]

for each fixed $x\in\mathbb{R}$ ; here $\phi$ is the standard normal density and $\Phi(z):=\int_{-\infty}^{z}\phi(y)dy$ is the standard normal distribution function. Thus $\mathop{d_{\rm TV}}(F_{n},G)\rightarrow 0$ by Scheffé’s lemma. But in this case it is easily seen that both $\rho(f_{n},g)=\infty$ and $\rho(g,f_{n})=\infty$ where the infinity in the first case occurs in the left tail, and the infinity in the second case occurs in the right tail.

We do not know a rate for the total variation convergence in this example, but it cannot be faster than $1/\log n$ .

5 Proofs and Auxiliary Results

5.1 Proofs of the main results

Proof of (1).

Suppose that $\mu(\{g/f>r\})=0$ for some real number $r>0$ . Then $g\leq rf$ , $\mu$ -almost everywhere, so $Q(A)\leq rP(A)$ for all $A\in\mathcal{A}$ , and this implies that $\rho(Q,P)\leq r$ . On the other hand, if $\mu(\{g/f\geq r\})>0$ for some real number $r>0$ , then $A:=\{g/f\geq r\}=\{g\geq rf\}\cap\{g>0\}$ satisfies $Q(A)>0$ and $Q(A)\geq rP(A)$ , whence $\rho(Q,P)\geq r$ . These considerations show that $\rho(Q,P)$ equals the $\mu$ -essential supremum of $g/f$ . ∎

Proof of Proposition 1.

(a) Under the given hypotheses that $\psi$ is non-decreasing with $\psi(1)=0$ and $g/f\leq\rho$ , we have

[TABLE]

Equality holds in the first inequality if and only if $Q\bigl{(}\{g<f\}\cap\{\psi(g/f)<0\}\bigr{)}=0$ , and in the second inequality if and only if $Q\bigl{(}\{g>f\}\cap\{\psi(g/f)<\psi(\rho)\}\bigr{)}=0$ . In particular, if $g/f\in\{0,\rho\}$ , then $Q\bigl{(}\{g<f\})=Q(\{g/f=0\})=0$ and $Q\bigl{(}\{g>f\}\cap\{\psi(g/f)<\psi(\rho)\}\bigr{)}=Q(\emptyset)=0$ , so we have equality in (14).

(b) For any convex function $\psi:[0,\infty)\rightarrow\mathbb{R}$ and $y\in[0,\rho]$ , we have

[TABLE]

with equality in case of $y\in\{0,\rho\}$ . Hence

[TABLE]

Equality holds if $g/f\in\{0,\rho\}$ . ∎

Proof of (5) and comparison with (4).

The asserted bounds are trivial in case of $n=1$ , so we assume that $2\leq n\leq N$ . Note first that

[TABLE]

with $H(x):=-\log(1-x/N)=\sum_{\ell=1}^{\infty}(x/N)^{\ell}/\ell$ for $x\geq 0$ . Since $H(x)\geq x/N$ ,

[TABLE]

This is essentially Freedman’s (1977) argument. For the upper bound, it suffices to show that for $1\leq n<N$ , the increment

[TABLE]

is not larger than the increment

[TABLE]

But the difference between (16) and (15) equals

[TABLE]

because $H(x)/x$ is non-decreasing on $[0,\infty)$ . Since $H(tx)>tH(x)$ for $x\in[0,N)$ and $t>1$ , we may also conclude that for $3\leq n\leq N$ ,

[TABLE]

∎

Auxiliary inequalities.

In what follows, we will use repeatedly the following inequalities for logarithms: For real numbers $x,a>0$ and $b>-x$ ,

[TABLE]

These inequalities follow essentially from the fact

[TABLE]

with $y:=a/(2x+a)$ , where the Taylor series expansion in the second to last step is well-known and follows from the usual expansion $\log(1\pm y)=-\sum_{k=1}^{\infty}(\mp y)^{k}/k$ . Then it follows from $x+b>0$ that

[TABLE]

whereas

[TABLE]

Here is another expression which will be encountered several times: For $\delta\in[0,1]$ ,

[TABLE]

and the inequality $\sqrt{1-\delta}\geq 1-\delta$ implies that

[TABLE]

Recall that we write $[a]_{0}:=1$ and $[a]_{m}:=\prod_{i=0}^{m-1}(a-i)$ for real numbers $a$ and integers $m\geq 1$ . In particular, $\binom{n}{k}=[n]_{k}/k!$ for integers $0\leq k\leq n$ .
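The falling factorial notation can be made concrete with a few lines of code. This is a minimal sketch; the function name `falling` is hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def falling(a, m):
    """Falling factorial [a]_m = a (a-1) ... (a-m+1), with [a]_0 = 1."""
    result = 1
    for i in range(m):
        result *= a - i
    return result

# [n]_k / k! reproduces the binomial coefficient for integers 0 <= k <= n.
checks = [falling(n, k) == comb(n, k) * factorial(k)
          for n in range(8) for k in range(n + 1)]

# The definition also makes sense for non-integer a, e.g. [5/2]_2 = (5/2)(3/2).
half_example = falling(Fraction(5, 2), 2)
```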

Proof of Theorem 2.

The assertions are trivial in case of $n=1$ or $L\in\{0,N\}$ , because then $\mathop{\mathrm{Hyp}}\nolimits(N,L,n)=\mathop{\mathrm{Bin}}\nolimits(n,L/N)$ . Hence it suffices to consider $n\geq 2$ and $1\leq L\leq N-1$ . For $k\in\{0,1,\ldots,n\}$ let

[TABLE]

Since

[TABLE]

it even suffices to consider

[TABLE]

In this case, $r(k)>0$ for $1\leq k\leq\min(n,L)$ , and $r(k)=0$ for $\min(n,L)<k\leq n$ .

In order to maximize the weight ratio $r$ , note that for any integer $0\leq k<\min(L,n)$ ,

[TABLE]

if and only if

[TABLE]

Consequently,

[TABLE]

The worst-case value $k_{N,L,n}$ equals $1$ if and only if $L\leq N/(n-1)$ . But

[TABLE]

Consequently, it suffices to consider

[TABLE]

Note that these inequalities for $L$ imply that $n-1>2$ . Hence it remains to prove the assertions when $n\geq 4$ and $N/(n-1)<L\leq N/2$ .

The case $n=4$ is treated separately: Here it suffices to show that

[TABLE]

Indeed

[TABLE]

with equality if and only if $L=N/2$ . The latter expression is less than or equal to $1$ if and only if

[TABLE]

and elementary manipulations show that this is equivalent to

[TABLE]

But this inequality is satisfied for all $N\geq 5$ .

Consequently, it suffices to prove our assertion in case of

[TABLE]

The maximizer $k=k_{N,L,n}$ of the density ratio is $k=\lceil(n-1)L/N\rceil\geq 2$ , and

[TABLE]
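The location of the maximizer can be checked numerically. The sketch below assumes that $\mathop{\mathrm{Hyp}}\nolimits(N,L,n)$ describes the number of marked items in a sample of size $n$ drawn without replacement from $N$ items, $L$ of which are marked; all function names are hypothetical. It verifies on a small grid that $\lceil(n-1)L/N\rceil$ maximizes the weight ratio in the regime $n\geq 5$ and $N/(n-1)<L\leq N/2$ .

```python
from fractions import Fraction
from math import comb

def hyp_pmf(N, L, n, k):
    """Hypergeometric weight: k marked items in n draws without replacement."""
    return Fraction(comb(L, k) * comb(N - L, n - k), comb(N, n))

def bin_pmf(n, p, k):
    """Binomial weight, exact when p is a Fraction."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def ratio_maximizers(N, L, n):
    """All k in {0,...,n} maximizing Hyp(N,L,n)({k}) / Bin(n,L/N)({k})."""
    p = Fraction(L, N)
    r = [hyp_pmf(N, L, n, k) / bin_pmf(n, p, k) for k in range(n + 1)]
    top = max(r)
    return [k for k, rk in enumerate(r) if rk == top]

def ceil_div(a, b):
    """Ceiling of a / b for positive integers."""
    return -(-a // b)

# Regime treated at this point of the proof: n >= 5 and N/(n-1) < L <= N/2.
violations = []
for N in range(5, 31):
    for n in range(5, N + 1):
        for L in range(1, N // 2 + 1):
            if L * (n - 1) > N:
                k_star = ceil_div((n - 1) * L, N)
                if k_star < 2 or k_star not in ratio_maximizers(N, L, n):
                    violations.append((N, L, n))
```

The function returns all maximizers because the ratio can attain its maximum at two neighbouring points when $(n-1)L/N$ is an integer.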

Now our task is to bound

[TABLE]

from above. Corollary 11 in Section 5.2 implies that for integers $A\geq m\geq 2$ ,

[TABLE]

where

[TABLE]

Consequently,

[TABLE]

Now we introduce the auxiliary quantities

[TABLE]

and write

[TABLE]

Then

[TABLE]

whence

[TABLE]

It follows from (18) with $x=L\Delta$ , $a=1-\gamma$ and $b=1/2-\gamma$ that

[TABLE]

and with $x=(N-L)\Delta$ , $a=\gamma$ and $b=\gamma-1/2$ we may conclude that

[TABLE]

Hence

[TABLE]

where

[TABLE]

because $\gamma(1-\gamma)\leq 1/4$ . It will be shown later that

[TABLE]

Consequently,

[TABLE]

because $\delta\leq 1/2$ , and we want to show that the right-hand side is not greater than

[TABLE]

Hence, it suffices to show that

[TABLE]

But the left-hand side is a convex function of $\delta\in[0,1/2]$ and takes the value $0$ for $\delta=0$ . Thus it suffices to verify that the latter inequality holds for $\delta=1/2$ . Indeed, for $\delta=1/2$ , the left-hand side is $\log(2)/2+1/7-1/2=(\log(2)-5/7)/2<0$ .

It remains to verify (21). When $k=\lceil L\delta\rceil\geq 3$ , this is relatively easy: Here $2\delta^{-1}<L\leq N/2$ , so

[TABLE]

because $n\geq 5$ . Hence,

[TABLE]

The case $k=2$ is a bit more involved: Since

[TABLE]

inequality (21) is equivalent to

[TABLE]

The left-hand side of (22) equals

[TABLE]

because $7\gamma(1-\gamma)\leq 7/4<2$ , while the right-hand side of (22) equals

[TABLE]

because $N-L\geq L$ and $L\delta>1$ . Consequently, it suffices to verify that

[TABLE]

To this end, note that $\gamma$ depends on $L$ , namely, $\gamma=2-L\delta$ , whence $L=(2-\gamma)\delta^{-1}$ and

[TABLE]

so (23) is equivalent to

[TABLE]

But the left-hand side is

[TABLE]

For $n\geq 5$ , the denominator is strictly positive, and the derivative of the numerator is $15.5n-43.5$ , which is strictly positive, too. Thus it suffices to verify that the numerator is nonnegative for $n=5$ . Indeed, $4(n-3)(7n-9)-(4.5n-8.5)^{2}=12$ for $n=5$ .

Finally, it follows from Bernoulli’s inequality, $(1+x)^{m}\geq 1+mx$ for real numbers $x>-1$ and $m\geq 1$ , that $(1-1/N)^{-(n-1)}\leq(1-(n-1)/N)^{-1}$ . Now the inequalities for the total variation distance are an immediate consequence of Proposition 1 (a) with $\psi(t)=(1-t^{-1})_{+}$ and the fact that $Q(\{0\})\leq P(\{0\})$ and $Q(\{n\})\leq P(\{n\})$ , whence

[TABLE]

∎
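The consequence of Bernoulli’s inequality used in the last step can be confirmed with exact arithmetic. This is a small sanity check with a hypothetical function name, not an additional argument.

```python
from fractions import Fraction

def bernoulli_gap(N, n):
    """(1 - (n-1)/N)^(-1) minus (1 - 1/N)^(-(n-1)), computed exactly."""
    upper = 1 / (1 - Fraction(n - 1, N))
    lower = (1 - Fraction(1, N)) ** (-(n - 1))
    return upper - lower

# The gap should be nonnegative for all integers 1 <= n <= N.
gaps = [bernoulli_gap(N, n) for N in range(2, 40) for n in range(1, N + 1)]
```

The gap vanishes for $n\in\{1,2\}$ and is strictly positive for larger $n$ in the range checked.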

Proof of Theorem 3.

Obviously, $\Lambda_{n}(0)=0$ . For $k\in\mathbb{N}_{0}$ we introduce the weights $b(k)=b_{n,p}(k):=\mathop{\mathrm{Bin}}\nolimits(n,p)(\{k\})$ and $\pi(k)=\pi_{np}(k):=\mathop{\mathrm{Poiss}}\nolimits(np)(\{k\})=e^{-np}(np)^{k}/k!$ . Obviously, $b(k)=0$ for $k>n$ , while for $0\leq k\leq n$ and $p\in(0,1)$ ,

$$\lambda_{n,p}(k)\ :=\ \log\frac{b(k)}{\pi(k)}\ =\ \log\frac{[n]_{k}}{n^{k}}+(n-k)\log(1-p)+np.$$

Note that the right hand side is a continuous function of $p\in[0,1)$ with limit $\lambda_{n,0}(k):=\log([n]_{k}/n^{k})\leq 0$ as $p\to 0$ , where $\lambda_{n,0}(0)=0$ . Thus we may conclude that

[TABLE]

is a continuous function of $p\in[0,1)$ .

Next we need to determine the maximizer of $\lambda_{n,p}(\cdot)$ . For $k\in\{0,1,\ldots,n-1\}$ ,

[TABLE]

Consequently,

[TABLE]

From now on we fix an integer $k\in\{1,\ldots,n\}$ and focus on $p\in\bigl{[}(k-1)/n,k/n\bigr{]}$ , so that $k=\lceil np\rceil$ if $p>(k-1)/n$ . Then

[TABLE]

This is a concave function of $p$ with derivative

[TABLE]

if $(k-1)/n<p<k/n$ . Since $1/(1-p)$ is the derivative of $-\log(1-p)$ with respect to $p$ , and since $\Lambda_{n}(0)=0=-\log(1-0)$ , this implies that

[TABLE]
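The two facts used so far can be explored numerically. The sketch below computes the log-ratio of the weights $b(k)$ and $\pi(k)$ directly from their definitions, checks that $k=\lceil np\rceil$ is a maximizer, and compares $\Lambda_{n}(p)$ with $-\log(1-p)$ . It assumes that the bound obtained from the derivative comparison is $\Lambda_{n}(p)\leq-\log(1-p)$ ; the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, lgamma, log

def log_ratio(n, p, k):
    """log( Bin(n,p)({k}) / Poiss(np)({k}) ) for 0 <= k <= n and 0 < p < 1."""
    log_b = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
             + k * log(p) + (n - k) * log(1 - p))
    log_pi = -n * p + k * log(n * p) - lgamma(k + 1)
    return log_b - log_pi

def big_lambda(n, p):
    """Maximum of the log-ratio over k, i.e. log rho(Bin(n,p), Poiss(np))."""
    return max(log_ratio(n, p, k) for k in range(n + 1))

results = []
for n in (1, 2, 5, 20, 100):
    for j in range(1, 20):
        p_exact = Fraction(j, 20)      # exact p, so ceil(n p) is exact
        p = float(p_exact)
        k_star = ceil(n * p_exact)
        lam = big_lambda(n, p)
        results.append((lam - log_ratio(n, p, k_star),  # ~0 if k_star maximizes
                        -log(1 - p) - lam))             # > 0 if the bound holds
```

The first entry of each pair is compared with zero only up to rounding, since the maximum is attained at two neighbouring values of $k$ whenever $np$ is an integer.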

On the other hand, $\Lambda_{n}$ is strictly increasing, whence

[TABLE]

But Corollary 11 in Section 5.2 implies that

[TABLE]

with

[TABLE]

Consequently,

[TABLE]

where the last inequality follows from (18) with $x=n-k$ , $a=1$ , and $b=1/2$ .

The refined bounds are for the quantity

[TABLE]

For $p\in\bigl{[}(k-1)/n,k/n\bigr{]}$ ,

[TABLE]

and

[TABLE]

Consequently,

[TABLE]

It follows from (17) with $x=n-k+1/2$ , $a=1/2$ and $b=0$ that

[TABLE]

and with $y:=n-k+3/4\geq 3/4$ ,

[TABLE]

Hence

[TABLE]

On the other hand, the lower bound for $D_{n}(p)$ in (11) is trivial in case of $k=n$ , and otherwise

[TABLE]

by (19) with $x=n-k$ and $a=1$ . ∎

Proof of Theorem 4.

We start with the first statement of part (ii). Let $\beta:=\beta_{a,b}$ and $\gamma_{c}:=\gamma_{a,c}$ for $c>0$ . Since $\beta(x)=0$ for $x\geq 1$ , it suffices to consider the log-density ratio

[TABLE]

for $0\leq x<1$ , noting that the latter expression for $\lambda_{c}(x)$ is well-defined for all $x<1$ . The derivative of $\lambda_{c}$ equals

[TABLE]

and this is smaller or greater than zero if and only if $x$ is greater or smaller than the ratio $(c-b+1)/c$ , respectively. This shows that in case of $c\leq b-1$ ,

[TABLE]

For $c\geq b-1$ ,

[TABLE]

But the derivative of the latter expression with respect to $c\geq b-1$ equals

[TABLE]

so the unique minimizer of $\log\rho\bigl{(}\mathrm{Beta}(a,b),\mathrm{Gamma}(a,c)\bigr{)}$ with respect to $c>0$ is $c=a+b-1$ .
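The optimal rate $c=a+b-1$ can also be located by a crude grid search. The sketch below assumes that $\mathrm{Gamma}(a,c)$ has shape $a$ and rate $c$ , which is consistent with the critical point $(c-b+1)/c$ found above; the parameter values and function names are hypothetical, and the supremum over $x$ is only approximated on a grid.

```python
from math import lgamma, log

def log_density_ratio(x, a, b, c):
    """log( beta_{a,b}(x) / gamma_{a,c}(x) ) for 0 < x < 1.

    Assumes Gamma(a, c) has shape a and rate c, i.e. density
    c^a x^(a-1) exp(-c x) / Gamma(a).
    """
    log_beta = (lgamma(a + b) - lgamma(a) - lgamma(b)
                + (a - 1) * log(x) + (b - 1) * log(1 - x))
    log_gamma = a * log(c) - lgamma(a) + (a - 1) * log(x) - c * x
    return log_beta - log_gamma

def log_rho(a, b, c, grid=2000):
    """Grid approximation of the supremum of the log-ratio over (0, 1)."""
    return max(log_density_ratio(i / grid, a, b, c) for i in range(1, grid))

a, b = 2.0, 5.0                            # hypothetical example parameters
cs = [4.0 + 0.05 * i for i in range(81)]   # candidate rates c in [4, 8]
c_best = min(cs, key=lambda c: log_rho(a, b, c))
print(c_best, a + b - 1)
```

For these parameters the grid minimizer should agree with $a+b-1=6$ up to the grid spacing of $0.05$ .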

It remains to verify the inequalities

[TABLE]

Then the total variation bounds of Theorem 4 follow from Proposition 1 (a) and the elementary inequality (20). Corollary 11 in Section 5.2 implies that

[TABLE]

Combining this with (25) yields (26):

[TABLE]

by (18) with $(x,a,b)=(b-1,a,1/2)$ . Concerning (27), it follows from (25) and (28) that

[TABLE]

where $A:=2b-1$ and $B:=2(a+b)-1$ . Now (27) follows from

[TABLE]

because $A<B$ .

In the special case of $a=1$ , we do not need (28) but get via (25) the explicit expression

[TABLE]

because $\Gamma(b+1)=b\Gamma(b)$ . Now the standard Taylor series for $\log(1-x)$ yields that

[TABLE]

and in case of $b\geq 2$ , the latter expression is not larger than

[TABLE]

∎

Proof of Lemma 8.

By Proposition 1 (a) and the inequality $1-\exp(-x)\leq x$ for $x\geq 0$ , it suffices to verify the claims about $\log\rho\bigl{(}N(0,1),t_{r}\bigr{)}$ . Note first that

[TABLE]

and

[TABLE]

whence

[TABLE]

On the one hand, the Taylor expansion $-\log(1-x)=\sum_{k=1}^{\infty}x^{k}/k$ yields that

[TABLE]

and the latter series equals

[TABLE]

Moreover, it follows from Lemma 12 in Section 5.2 with $x:=r/2$ that

[TABLE]

because $r-1\geq 1$ by assumption. Consequently,

[TABLE]

On the other hand, the previous considerations and Lemma 12 imply that

[TABLE]

and

[TABLE]

whence

[TABLE]

∎

5.2 Auxiliary Results for the Gamma Function

In what follows, let

$$h(x)\ :=\ \log\Gamma(x),\quad x>0.$$

With a random variable $Y_{x}\sim\mathrm{Gamma}(x,1)$ one may write

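$$h^{\prime}(x)\ =\ \mathop{\mathrm{I\!E}}\nolimits\log(Y_{x})\quad\text{and}\quad h^{\prime\prime}(x)\ =\ \mathop{\mathrm{Var}}\nolimits\bigl(\log(Y_{x})\bigr)\ >\ 0.$$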

The functions $h^{\prime}$ and $h^{\prime\prime}$ are known as the digamma and trigamma functions; see e.g., Olver et al. (2010), Section 5.15. This shows that $h(x)$ is strictly convex in $x>0$ . Moreover, it follows from concavity of $\log(\cdot)$ and Jensen’s inequality that

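$$h^{\prime}(x)\ =\ \mathop{\mathrm{I\!E}}\nolimits\log(Y_{x})\ <\ \log\mathop{\mathrm{I\!E}}\nolimits(Y_{x})\ =\ \log x.$$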

The well-known identity $\Gamma(x+1)=x\Gamma(x)$ is equivalent to

$$h(x+1)-h(x)\ =\ \log x.$$

Binet’s first formula and Stirling’s approximation.

Binet’s first integral formula states that

$$h(x)\ =\ \log\sqrt{2\pi}+(x-1/2)\log x-x+R(x),\tag{29}$$

where

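$$R(x)\ :=\ \int_{0}^{\infty}e^{-tx}w(t)\,dt\quad\text{and}\quad w(t)\ :=\ \frac{1}{t}\Bigl(\frac{1}{2}-\frac{1}{t}+\frac{1}{e^{t}-1}\Bigr);$$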

see Chapter 12 of Whittaker and Watson (1996). The following lemma provides a lower and upper bound for $w(t)$ , and these yield rather precise bounds for the remainder $R(x)$ .

Lemma 10.

For arbitrary $t>0$ ,

$$12^{-1}e^{-t/12}\ <\ w(t)\ <\ 12^{-1}.$$

In particular, the remainder $R(x)$ in Binet’s formula (29) is strictly decreasing in $x>0$ and satisfies

$$\frac{1}{12x+1}\ <\ R(x)\ <\ \frac{1}{12x}.$$

Since $n!=\Gamma(n+1)$ , Lemma 10 implies a slight improvement of the Stirling approximation by Robbins (1955): For arbitrary integers $n\geq 0$ ,

[TABLE]

with

[TABLE]

In addition, Binet’s formula (29) and Lemma 10 lead to useful inequalities for the increments of $h(\cdot)$ .

Corollary 11.

For arbitrary $0<a<b$ ,

[TABLE]

where

[TABLE]

Proof of Lemma 10.

The series expansion of the exponential function and some elementary algebra lead to the representation

[TABLE]

with

[TABLE]

Note that $a_{1}=12^{-1}$ , and

[TABLE]

This shows that $a_{m}\leq 12^{-1}$ with strict inequality for $m\geq 3$ . Consequently, $w(t)<12^{-1}$ .

The reverse inequality, $w(t)>12^{-1}e^{-t/12}$ , is equivalent to

[TABLE]

The left hand side equals $12\sum_{m=1}^{\infty}a_{m}t^{m-1}/m!$ , while the right hand side equals

[TABLE]

Note that $12a_{1}=1=c_{1}$ . Consequently, $w(t)>12^{-1}e^{-t/12}$ for all $t>0$ , provided that $12a_{m}\geq c_{m}$ for all $m\geq 2$ . But $c_{2}=61/72$ and $c_{m+1}/c_{m}<11/12$ , whence $c_{m}\leq(61/72)(11/12)^{m-2}$ for $m\geq 2$ . Consequently, it suffices to show that

[TABLE]

But

[TABLE]

if and only if $m^{2}-9m<-12$ , and for integers $m\geq 2$ this is equivalent to $m\leq 7$ . Hence

[TABLE]

Since for any fixed $t>0$ , the integrand $e^{-tx}w(t)$ is strictly decreasing in $x>0$ , the remainder $R(x)$ is strictly decreasing in $x>0$ . The two bounds for $w(t)$ imply that $R(x)$ is larger than $12^{-1}\int_{0}^{\infty}e^{-t(x+1/12)}\,dt=(12x+1)^{-1}$ and smaller than $12^{-1}\int_{0}^{\infty}e^{-tx}\,dt=(12x)^{-1}$ . ∎
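The bounds of Lemma 10 are easy to probe numerically. The sketch below assumes the standard form of the weight in Binet’s first formula, $w(t)=t^{-1}\bigl(1/2-1/t+1/(e^{t}-1)\bigr)$ , and evaluates $R(x)$ as the difference between $\log\Gamma(x)$ and the main terms $\log\sqrt{2\pi}+(x-1/2)\log x-x$ . The grids and function names are hypothetical choices.

```python
from math import exp, expm1, lgamma, log, pi

def w(t):
    """Weight in Binet's first formula (standard form, assumed here)."""
    return (0.5 - 1.0 / t + 1.0 / expm1(t)) / t

def R(x):
    """Remainder: log Gamma(x) minus the Stirling-type main terms."""
    return lgamma(x) - (x - 0.5) * log(x) + x - 0.5 * log(2 * pi)

ts = [0.1 * i for i in range(1, 501)]   # t in (0, 50]
xs = [0.5 * i for i in range(1, 101)]   # x in (0, 50]

# Bounds for w(t) and for the remainder R(x), plus monotonicity of R.
w_ok = all(exp(-t / 12) / 12 < w(t) < 1 / 12 for t in ts)
R_ok = all(1 / (12 * x + 1) < R(x) < 1 / (12 * x) for x in xs)
R_decreasing = all(R(x1) > R(x2) for x1, x2 in zip(xs, xs[1:]))
```

Very small values of $t$ are avoided on purpose, because the expression for $w(t)$ suffers from cancellation in floating-point arithmetic as $t\to 0$ .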

Proof of Corollary 11.

Writing $h(x)=\log\sqrt{2\pi}+\widetilde{h}(x)+R(x)$ with the auxiliary function $\widetilde{h}(x):=(x-1/2)\log x-x$ , the remainder term $s(a,b)$ equals $R(b)-R(a)$ . But

[TABLE]

and since $e^{-ta}-e^{-tb}>0$ , it follows from $0<w(t)<12^{-1}$ that

[TABLE]

Moreover, since $w(t)>12^{-1}e^{-t/12}$ ,

[TABLE]

∎

Special increments of $h$ .

In connection with Student distributions, we need lower and upper bounds for the quantities $h(x+1/2)-h(x)-\log(x)/2$ . With a random variable $Y_{x}\sim\mathrm{Gamma}(x,1)$ , the latter expression equals $\log\mathop{\mathrm{I\!E}}\nolimits\sqrt{Y_{x}/x}$ , so it follows from Jensen’s inequality that $h(x+1/2)-h(x)-\log(x)/2<\log\sqrt{\mathop{\mathrm{I\!E}}\nolimits(Y_{x}/x)}=0$ . The next lemma shows that $h(x+1/2)-h(x)-\log(x)/2$ is close to $-1/(8x)$ for large $x$ .
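Both statements, negativity and closeness to $-1/(8x)$ , can be illustrated with the log-gamma function from a standard library. This is a minimal sketch with hypothetical names; it only evaluates the quantity at a few points and does not reproduce the bounds of the lemma below.

```python
from math import lgamma, log

def increment(x):
    """h(x + 1/2) - h(x) - log(x)/2 with h = log Gamma."""
    return lgamma(x + 0.5) - lgamma(x) - 0.5 * log(x)

xs = [0.25, 0.5, 1.0, 2.0, 5.0, 10.0, 100.0, 1000.0]
values = {x: increment(x) for x in xs}

# Relative deviation from the approximation -1/(8x).
rel_dev = {x: abs(values[x] * 8 * x + 1) for x in xs}

for x in xs:
    print(x, values[x], -1 / (8 * x), rel_dev[x])
```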

Lemma 12.

For arbitrary $x>0$ ,

[TABLE]

Proof of Lemma 12.

Let us first mention that the second derivative of the log-gamma function $h$ is given by Gauss’ formula

$$h^{\prime\prime}(x)\ =\ \sum_{n=0}^{\infty}\frac{1}{(x+n)^{2}},$$

see Chapter 12 of Whittaker and Watson (1996). In particular, $h^{\prime\prime}$ is strictly convex and decreasing on $(0,\infty)$ with

$$h^{\prime\prime}(x)\ >\ \int_{x}^{\infty}y^{-2}\,dy\ =\ \frac{1}{x},$$

because $(x+n)^{-2}>\int_{x+n}^{x+n+1}y^{-2}\,dy$ .

Now we start with a general consideration about second order differences of $h$ : For arbitrary $0<a<z$ ,

[TABLE]

where $U$ and $V$ are independent random variables with uniform distribution on $[0,1]$ . Since $h^{\prime\prime}$ is convex and $h^{\prime\prime}(z)>1/z$ , it follows from Jensen’s inequality that

[TABLE]

Note also that the distribution of $W:=U+V$ is given by the triangular density $f(w):=(1-|w-1|)_{+}$ , so

[TABLE]

We first apply these findings with $z=x+1/2$ and $a=1/2$ : Since $h(x+1)-h(x)=\log x$ ,

[TABLE]

which gives us the upper bound for $h(x+1/2)-h(x)-\log(x)/2$ . Furthermore,

[TABLE]

On the other hand, if $x>1/2$ , then with $z=x+1/2$ and $a=1$ we obtain

[TABLE]

Note that

[TABLE]

has the following properties:

[TABLE]

and

[TABLE]

These properties plus the convexity of $h^{\prime\prime}$ imply that

[TABLE]

Indeed, the latter integral doesn’t change if we replace $h^{\prime\prime}(x+1/2+t)$ with $g(t):=h^{\prime\prime}(x+1/2+t)+a+bt$ with constants $a,b$ such that $g(\pm 1/3)=0$ . But then, by convexity of $g$ and the sign changes of $\Delta$ , we have that $g\Delta\geq 0$ . Consequently,

[TABLE]

Finally, with $y:=(2x)^{-1}<1$ , the latter expression equals

[TABLE]

∎

Acknowledgements.

Constructive comments of David Ginsbourger, Dominic Schuhmacher and Kaspar Stucki on an early version of this paper are gratefully acknowledged. We also thank Lutz Mattner for drawing our attention to the technical report Christensen et al. (1995) and for pointing out the connection between the ratio measure and the mixture index of fit. Constructive comments of a referee led to further improvements such as Proposition 1.

Bibliography

Antonelli, S. and Regoli, G. (2005). On the Poisson-binomial relative error. Statist. Probab. Lett. 71 249–256.
Barbour, A. D. and Hall, P. (1984). On the rate of Poisson convergence. Math. Proc. Cambridge Philos. Soc. 95 473–480.
Christensen, J., Fischer, P. and Kvols, K. (1995). On the ratio of binomial and poisson probability distributions. Tech. Rep. 7, Matematisk Institut, Kobenhavns Universitet.
Diaconis, P. and Freedman, D. (1980). Finite exchangeable sequences. Ann. Probab. 8 745–764.
Diaconis, P. and Freedman, D. (1987). A dozen de Finetti-style results in search of a theory. Ann. Inst. H. Poincaré Probab. Statist. 23 397–423.
Dümbgen, L. and Wellner, J. A. (2020). The density ratio of Poisson binomial versus Poisson distributions. Statist. Probab. Lett. 165 108862. (arXiv:1910.03444).
Ehm, W. (1991). Binomial approximation to the Poisson binomial distribution. Statist. Probab. Lett. 11 7–16.
Freedman, D. (1977). A remark on the difference between sampling with and without replacement. J. Amer. Statist. Assoc. 72 681.