The Bennett-Orlicz norm

Jon A. Wellner

arXiv:1703.01721·math.ST·March 7, 2017


TL;DR

This paper introduces the Bennett-Orlicz norm, a new mathematical tool linked to Bennett inequalities, providing potentially tighter bounds for expectations of maxima compared to existing norms.

Contribution

The paper presents the Bennett-Orlicz norm, expanding the family of Orlicz norms and connecting it to Bennett inequalities for improved expectation bounds.

Findings

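1. The Bennett-Orlicz norm yields inequalities for expectations of maxima that are potentially somewhat tighter than those from the Bernstein-Orlicz norm when both apply.

2. Connections are established between the Bennett-Orlicz and Bernstein-Orlicz norms and exponential inequalities of the Bernstein, Bennett, and Prokhorov types.

3. Comparisons are made with results of Talagrand (1989, 1994) and Boucheron, Lugosi, and Massart (2013).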

Abstract

Lederer and van de Geer (2013) introduced a new Orlicz norm, the Bernstein-Orlicz norm, which is connected to Bernstein type inequalities. Here we introduce another Orlicz norm, the Bennett-Orlicz norm, which is connected to Bennett type inequalities. The new Bennett-Orlicz norm yields inequalities for expectations of maxima which are potentially somewhat tighter than those resulting from the Bernstein-Orlicz norm when they are both applicable. We discuss cross connections between these norms, exponential inequalities of the Bernstein, Bennett, and Prokhorov types, and make comparisons with results of Talagrand (1989, 1994), and Boucheron, Lugosi, and Massart (2013).


Equations

$$\|X\|_{\Psi}=\inf\left\{c>0:\ E\Psi\left(\frac{|X|}{c}\right)\leq 1\right\},$$

$$\Big\|\max_{1\leq i\leq m}X_{i}\Big\|_{\Psi}\leq K\Psi^{-1}(m)\max_{1\leq i\leq m}\|X_{i}\|_{\Psi}$$

$$\limsup_{x,y\rightarrow\infty}\frac{\Psi^{-1}(xy)}{\Psi^{-1}(x)\Psi^{-1}(y)}<\infty\ \ \mbox{and}\ \ \limsup_{x\rightarrow\infty}\frac{\Psi^{-1}(x^{2})}{\Psi^{-1}(x)}<\infty.$$

$$\bigg\|\sup_{k\geq 1}\frac{|X_{k}|}{\Psi^{-1}(k)}\bigg\|_{\Psi}\leq M\sup_{k\geq 1}\|X_{k}\|_{\Psi}.$$

$$\Psi(x)=\exp(h(x))-1$$

$h_{0}(x)$ , $h_{1}(x)$ , $h_{2}(x)$ , $h_{4}(x)$ , $h_{5}(x)$

$$\Psi_{L}(x)\equiv\Psi_{1}(x;L)\equiv\exp\left\{\left(\frac{\sqrt{1+2Lx}-1}{L}\right)^{2}\right\}-1=\exp\left\{\frac{2}{L^{2}}h_{1}(Lx)\right\}-1.$$

$$\Psi_{1}(x;L)\sim\left\{\begin{array}{ll}\exp(x^{2})-1&\ \ \mbox{for}\ Lx\ \mbox{small},\\ \exp(2x/L)-1&\ \ \mbox{for}\ Lx\ \mbox{large}.\end{array}\right.$$

$$P\left(|Z|>\tau\left[\sqrt{t}+2^{-1}Lt\right]\right)\leq 2e^{-t}\ \ \mbox{for all}\ t>0;$$

$$P\left(|Z|>(\tau/L)h_{1}^{-1}\left(\frac{L^{2}t}{2}\right)\right)\leq 2e^{-t}\ \ \mbox{for all}\ t>0;$$

$$P(|Z|>z)\leq 2\exp\left(-\frac{2}{L^{2}}h_{1}\left(\frac{Lz}{\tau}\right)\right)\ \ \mbox{for all}\ z>0.$$

$$P\left(|Z|\geq\tau\left[\sqrt{t}+2^{-1}Lt\right]\right)\leq 2e^{-t}\ \ \mbox{for all}\ t>0.$$

$$P(|X-\nu|\geq z)\leq 2\exp(-\nu h_{2}(z/\nu))\leq 2\exp(-9\nu h_{1}(z/(3\nu)))$$

$$\|X-\nu\|_{\Psi_{1}(\cdot;\sqrt{2/(3\nu)})}\leq\sqrt{6\nu}.$$

$$E\left\{\max_{1\leq j\leq m}|Z_{j}|\right\}\leq\tau\Psi_{1}^{-1}(m;L)=\tau\left\{\sqrt{\log(1+m)}+\frac{L}{2}\log(1+m)\right\}.$$

$$E\left\{\max_{1\leq j\leq m}|Z_{j}|\right\}\leq\tau\Psi_{1}^{-1}(m;L)\leq 2\max\{\tau,L\tau/2\}\log(1+m).$$

$$E\left\{\max_{1\leq j\leq m}|Z_{j}|\right\}\leq\tau\Psi_{1}^{-1}(m;L)\leq 2\left\{\sqrt{2\nu}\vee 1/3\right\}\log(1+m).$$

$$h(x)=x(\log x-1)+1,$$

$$h_{1}(x)=1+x-\sqrt{1+2x},$$

$$h_{0}(x)=\frac{x^{2}}{2(1+x)}.$$

$$9h_{0}(x/3)\leq 9h_{1}(x/3)\leq h(1+x).$$

$$h_{0}(x)\leq h_{1}(x)\leq 2h_{0}(x)\leq h_{0}(2x).$$

$$h_{1}^{-1}(y)=y+\sqrt{2y},\ \ \mbox{for}\ y\geq 0,$$

$$h_{0}^{-1}(y)=y+\sqrt{y^{2}+2y}.$$

$$\sqrt{t}+2^{-1}Lt=\frac{1}{L}h_{1}^{-1}\left(\frac{L^{2}t}{2}\right).$$

$$P(|Z|>z)\leq 2\exp\left(-\frac{z^{2}}{2(A+Bz)}\right)=2\exp\left(-\frac{A}{B^{2}}h_{0}\left(\frac{Bz}{A}\right)\right)\ \ \mbox{for all}\ z>0$$

$$P(\sqrt{n}(\overline{X}_{n}-\mu)\geq z)$$


Taxonomy

Topics: Advanced Numerical Analysis Techniques · Mathematical Inequalities and Applications · Optimization and Variational Analysis

Full text

The Bennett-Orlicz norm

Jon A. Wellner

Department of Statistics

University of Washington

Seattle, WA 98195-4322


http://www.stat.washington.edu/jaw/

Supported in part by NSF Grants DMS-1104832 and DMS-1566514, and NI-AID grant 2R01 AI291968-04.

AMS subject classifications: 60E15, 60F10

Keywords: Bennett’s inequality, exponential bound, maximal inequality, Orlicz norm, Poisson, Prokhorov’s inequality

1 Orlicz norms and maximal inequalities
2 The Bernstein-Orlicz norm
3 Bennett’s inequality and the Bennett-Orlicz norm
4 Prokhorov’s “arcsinh” exponential bound and Orlicz norms
5 Comparisons with some results of Talagrand
6 Appendix 1: Lambert’s function $W$ ; inverses of $h$ and $h_{2}$
7 Appendix 2: General versions of Lemmas 1-5

1 Orlicz norms and maximal inequalities

Let $\Psi$ be an increasing convex function from $[0,\infty)$ onto $[0,\infty)$ . Such a function is called a Young-Orlicz modulus by Dudley [1999], and a Young modulus by de la Peña and Giné [1999]. Let $X$ be a random variable. The Orlicz norm $\|X\|_{\Psi}$ is defined by

$$\|X\|_{\Psi}=\inf\left\{c>0:\ E\Psi\left(\frac{|X|}{c}\right)\leq 1\right\},$$

where the infimum over the empty set is $\infty$ . By Jensen’s inequality it is easily shown that this does define a norm on the set of random variables for which $\|X\|_{\Psi}$ is finite. The most important functions $\Psi$ for a variety of applications are those of the form $\Psi(x)=\exp(x^{p})-1\equiv\Psi_{p}(x)$ for $p\geq 1$ , and in particular $\Psi_{1}$ and $\Psi_{2}$ corresponding to random variables which are “sub-exponential” or “sub-Gaussian” respectively. See Krasnosel′skiĭ and Rutickiĭ [1961], Dudley [1999], Arcones and Giné [1995], de la Peña and Giné [1999], and van der Vaart and Wellner [1996] for further background on Orlicz norms, and see Rao and Ren [1991], Krasnosel′skiĭ and Rutickiĭ [1961], and Hewitt and Stromberg [1975] for more information about Birnbaum-Orlicz spaces.
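As a small illustration of the definition, the sketch below approximates $\|X\|_{\Psi}$ by bisection for a random variable with finitely many values. The helper names `psi_p` and `orlicz_norm`, the bracket, and the overflow cap at 700 are assumptions made for this example only; they are not from the paper.

```python
import math

def psi_p(p):
    """Young function Psi_p(x) = exp(x**p) - 1, capped to avoid float overflow."""
    return lambda x: math.expm1(min(x ** p, 700.0))

def orlicz_norm(values, probs, psi, lo=1e-9, hi=1e9, iters=200):
    """Approximate inf{c > 0 : E psi(|X|/c) <= 1} for a finite discrete X."""
    def expected_psi(c):
        return sum(q * psi(abs(v) / c) for v, q in zip(values, probs))
    # E psi(|X|/c) is nonincreasing in c, so bisect on the feasibility test.
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint for the wide bracket
        if expected_psi(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

# Rademacher variable: |X| = 1, so the Psi_2 norm solves exp(1/c^2) - 1 = 1.
rademacher_psi2 = orlicz_norm([-1.0, 1.0], [0.5, 0.5], psi_p(2))
rademacher_psi1 = orlicz_norm([-1.0, 1.0], [0.5, 0.5], psi_p(1))
print(rademacher_psi2, 1.0 / math.sqrt(math.log(2.0)))
```

For the Rademacher case the answer is available in closed form, which makes it a convenient sanity check of the bisection.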

The following useful lemmas are from van der Vaart and Wellner [1996], pages 95-97, and Arcones and Giné [1995] (see also de la Peña and Giné [1999], pages 188-190), respectively.

Lemma 1.1.

Let $\Psi$ be a convex, nondecreasing, nonzero function with $\Psi(0)=0$ and $\limsup_{x,y\rightarrow\infty}\Psi(x)\Psi(y)/\Psi(cxy)<\infty$ for some constant $c$ . Then, for any random variables $X_{1},\ldots,X_{m}$ ,

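$$\Big\|\max_{1\leq i\leq m}X_{i}\Big\|_{\Psi}\leq K\Psi^{-1}(m)\max_{1\leq i\leq m}\|X_{i}\|_{\Psi},\qquad(1.1)$$

where $K$ is a constant depending only on $\Psi$ .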

Lemma 1.2.

Let $\Psi$ be a Young Modulus satisfying

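$$\limsup_{x,y\rightarrow\infty}\frac{\Psi^{-1}(xy)}{\Psi^{-1}(x)\Psi^{-1}(y)}<\infty\ \ \mbox{and}\ \ \limsup_{x\rightarrow\infty}\frac{\Psi^{-1}(x^{2})}{\Psi^{-1}(x)}<\infty.$$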

Then for some constant $M$ depending only on $\Psi$ and every sequence of random variables $\{X_{k}:\ k\geq 1\}$ ,

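$$\bigg\|\sup_{k\geq 1}\frac{|X_{k}|}{\Psi^{-1}(k)}\bigg\|_{\Psi}\leq M\sup_{k\geq 1}\|X_{k}\|_{\Psi}.\qquad(1.2)$$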

The inequality (1.1) shows that if Orlicz norms for individual random variables $\{X_{i}\}_{i=1}^{m}$ are under control, then the $\Psi-$ Orlicz norm of the maximum of the $X_{i}$ ’s is controlled by a constant times $\Psi^{-1}(m)$ times the maximum of the individual Orlicz norms. The inequality (1.2) shows a stronger related Orlicz norm control of the supremum of an entire sequence $X_{k}$ divided by $\Psi^{-1}(k)$ if the supremum of the individual Orlicz norms is finite. Lemma 1.2 implies Lemma 1.1 for Young functions of exponential type (such as $\Psi_{p}(x)=\exp(x^{p})-1$ with $p\geq 1$ ), but it does not hold for power type Young functions such as $\Psi(x)=x^{p}$ , $p\geq 1$ . These latter Young functions continue to be covered by Lemma 1.1. Arcones and Giné [1995] carefully define Young moduli $\Psi_{p}(x)=\exp(x^{p})-1$ for all $p>0$ and use Lemma 1.2 to establish laws of the iterated logarithm for U-statistics.

A general theme is that if $\Psi_{a}\leq\Psi_{b}$ and we have control of the individual $\Psi_{b}$ Orlicz norms, then Lemma 1.1 or Lemma 1.2 applied with $\Psi=\Psi_{b}$ will yield a better bound than with $\Psi=\Psi_{a}$ in the sense that $\Psi_{b}^{-1}(m)\leq\Psi_{a}^{-1}(m)$ .

Here we are interested in functions $\Psi$ of the form

$$\Psi(x)=\exp(h(x))-1$$

where $h$ is a nondecreasing convex function with $h(0)=0$ not of the form $x^{p}$ . In fact, the particular functions $h$ of interest here are (scaled versions of):

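$$h_{0}(x)=\frac{x^{2}}{2(1+x)},\qquad h_{1}(x)=1+x-\sqrt{1+2x},\qquad h_{2}(x)=h(1+x),\qquad h_{4}(x)=\frac{x}{2}\,\mbox{arcsinh}\left(\frac{x}{2}\right),$$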

for the particular $h(x)\equiv x(\log x-1)+1$ . The functions $h_{0}$ and $h_{1}$ are related to Bernstein exponential bounds and refinements thereof due to Birgé and Massart [1998], while the function $h_{2}$ is related to Bennett’s inequality (Bennett [1962]), and $h_{4}$ is related to Prokhorov’s inequality (Prokhorov [1959]).

van de Geer and Lederer [2013] studied the family of Orlicz norms defined in terms of scaled versions of $h_{1}$ , and called them Bernstein-Orlicz norms. Our primary goal here is to compare and contrast the Orlicz norms defined in terms of $h_{0}$ , $h_{1}$ , $h_{2}$ , and $h_{4}$ . We begin in the next section by reviewing the Bernstein-Orlicz norm(s) as defined by van de Geer and Lederer [2013]. Section 3 gives corresponding results for what we call the Bennett-Orlicz norm(s) corresponding to the function $h_{2}$ . In Section 4 we give further comparisons and two applications.

2 The Bernstein-Orlicz norm

For a given number $L>0$ , van de Geer and Lederer [2013] have defined the Bernstein-Orlicz norm $\|X\|_{\Psi_{L}}$ with

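$$\Psi_{L}(x)\equiv\Psi_{1}(x;L)\equiv\exp\left\{\left(\frac{\sqrt{1+2Lx}-1}{L}\right)^{2}\right\}-1=\exp\left\{\frac{2}{L^{2}}h_{1}(Lx)\right\}-1.$$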

It is easily seen that

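$$\Psi_{1}(x;L)\sim\left\{\begin{array}{ll}\exp(x^{2})-1&\ \ \mbox{for}\ Lx\ \mbox{small},\\ \exp(2x/L)-1&\ \ \mbox{for}\ Lx\ \mbox{large}.\end{array}\right.$$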
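The two regimes of $\Psi_{1}(\cdot;L)$ can be checked numerically. The sketch below, with function names chosen for this example only, evaluates $\Psi_{1}(x;L)=\exp\{(2/L^{2})h_{1}(Lx)\}-1$ with $h_{1}(x)=1+x-\sqrt{1+2x}$, together with the explicit inverse $\sqrt{\log(1+y)}+(L/2)\log(1+y)$ used later in Lemma 2.3.

```python
import math

def h1(x):
    return 1.0 + x - math.sqrt(1.0 + 2.0 * x)

def psi1_exponent(x, L):
    """Exponent (2/L^2) h1(Lx), so that Psi_1(x; L) = exp(exponent) - 1."""
    return (2.0 / L ** 2) * h1(L * x)

def psi1(x, L):
    return math.expm1(psi1_exponent(x, L))

def psi1_inv(y, L):
    t = math.log1p(y)
    return math.sqrt(t) + 0.5 * L * t

L = 0.7
round_trip = psi1(psi1_inv(5.0, L), L)               # should recover y = 5
small_ratio = psi1_exponent(1e-3, 1.0) / (1e-3) ** 2  # close to 1 (Gaussian regime)
large_ratio = psi1_exponent(1e6, 1.0) / (2.0 * 1e6)   # close to 1 (exponential regime)
print(round_trip, small_ratio, large_ratio)
```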

The following three lemmas of van de Geer and Lederer [2013] should be compared with the development on page 96 of van der Vaart and Wellner [1996].

Lemma 2.1.

Let $\tau\equiv\|Z\|_{\Psi_{1}(\cdot;L)}$ . Then

$$P\left(|Z|>\tau\left[\sqrt{t}+2^{-1}Lt\right]\right)\leq 2e^{-t}\ \ \mbox{for all}\ t>0;$$

or, equivalently, with $h_{1}^{-1}(y)\equiv y+\sqrt{2y}$ ,

$$P\left(|Z|>(\tau/L)h_{1}^{-1}\left(\frac{L^{2}t}{2}\right)\right)\leq 2e^{-t}\ \ \mbox{for all}\ t>0;$$

or

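$$P(|Z|>z)\leq 2\exp\left(-\frac{2}{L^{2}}h_{1}\left(\frac{Lz}{\tau}\right)\right)\ \ \mbox{for all}\ z>0.$$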

Lemma 2.2.

Suppose that for some $\tau$ and $L>0$ we have

$$P\left(|Z|\geq\tau\left[\sqrt{t}+2^{-1}Lt\right]\right)\leq 2e^{-t}\ \ \mbox{for all}\ t>0.$$

Equivalently, the inequality (2.3) holds. Then $\|Z\|_{\Psi_{1}(\cdot;\sqrt{3}L)}\leq\sqrt{3}\tau$ .

Example 2.1.

Suppose that $X\sim\mbox{Poisson}(\nu)$ . Then it is well-known (see e.g. Boucheron et al. [2013], page 23), that

$$P(|X-\nu|\geq z)\leq 2\exp(-\nu h_{2}(z/\nu))\leq 2\exp(-9\nu h_{1}(z/(3\nu)))$$

where $h_{2}(x)=h(1+x)=(x+1)\log(x+1)-x$ . Thus the inequality involving $h_{1}$ holds with $9\nu=2/L^{2}$ and $1/(3\nu)=L/\tau$ . Thus $L=\sqrt{2/(9\nu)}=3^{-1}\sqrt{2/\nu}$ , $\tau=L3\nu=\sqrt{2\nu}$ . We conclude from Lemma 2.2 that

$$\|X-\nu\|_{\Psi_{1}(\cdot;\sqrt{2/(3\nu)})}\leq\sqrt{6\nu}.$$
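To see how the two bounds in Example 2.1 compare in practice, the following sketch computes the exact two-sided Poisson tail by direct summation and sets it beside the Bennett-type bound $2\exp(-\nu h_{2}(z/\nu))$ and the weaker bound $2\exp(-9\nu h_{1}(z/(3\nu)))$. The choice $\nu=4$, the truncation `kmax`, and all function names are assumptions for illustration.

```python
import math

def h1(x):
    return 1.0 + x - math.sqrt(1.0 + 2.0 * x)

def h2(x):
    # h2(x) = h(1 + x) = (1 + x) log(1 + x) - x
    return (1.0 + x) * math.log1p(x) - x

def poisson_pmf(k, nu):
    return math.exp(k * math.log(nu) - nu - math.lgamma(k + 1.0))

def two_sided_tail(z, nu, kmax=500):
    """P(|X - nu| >= z) for X ~ Poisson(nu); kmax truncates the summation."""
    return sum(poisson_pmf(k, nu) for k in range(kmax + 1) if abs(k - nu) >= z)

def bennett_bound(z, nu):
    return 2.0 * math.exp(-nu * h2(z / nu))

def bernstein_type_bound(z, nu):
    return 2.0 * math.exp(-9.0 * nu * h1(z / (3.0 * nu)))

nu = 4.0
for z in (2.0, 4.0, 8.0, 12.0):
    print(z, two_sided_tail(z, nu), bennett_bound(z, nu), bernstein_type_bound(z, nu))
```

In each row the exact tail should sit below the $h_{2}$ bound, which in turn sits below the $h_{1}$ bound, with the gap between the two bounds widening as $z$ grows.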

Pisier [1983] and Pollard [1990] showed how to bound the Orlicz norm of the maximum of random variables with bounded Orlicz norms; see also de la Peña and Giné [1999], section 4.3, and van der Vaart and Wellner [1996], Lemma 2.2.2, page 96. The following bound for the expectation of the maximum was given by van de Geer and Lederer [2013]; also see Boucheron et al. [2013], Theorem 2.5, pages 32-33.

Lemma 2.3.

Let $\tau$ and $L$ be positive constants, and let $Z_{1},\ldots,Z_{m}$ be random variables satisfying $\max_{1\leq j\leq m}\|Z_{j}\|_{\Psi_{L}}\leq\tau$ . Then

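$$E\left\{\max_{1\leq j\leq m}|Z_{j}|\right\}\leq\tau\Psi_{1}^{-1}(m;L)=\tau\left\{\sqrt{\log(1+m)}+\frac{L}{2}\log(1+m)\right\}.$$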

Corollary 2.1.

For $m\geq 2$

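$$E\left\{\max_{1\leq j\leq m}|Z_{j}|\right\}\leq\tau\Psi_{1}^{-1}(m;L)\leq 2\max\{\tau,L\tau/2\}\log(1+m).$$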

In particular when $Z_{j}\sim\mbox{Poisson}(\nu)$ for $1\leq j\leq m$

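$$E\left\{\max_{1\leq j\leq m}|Z_{j}|\right\}\leq\tau\Psi_{1}^{-1}(m;L)\leq 2\left\{\sqrt{2\nu}\vee 1/3\right\}\log(1+m).$$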

Proof. This follows from Lemma 2.3 since $\sqrt{x}\leq x$ for $x\geq 1$ . The Poisson $(\nu)$ special case then follows from Example 2.1. $\Box$
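The relation between the bound of Lemma 2.3 and the cruder bound of Corollary 2.1 can be tabulated directly. This is a minimal sketch with hypothetical function names; the values of $\tau$ and $L$ are arbitrary illustrations.

```python
import math

def lemma_2_3_bound(m, tau, L):
    """tau * Psi_1^{-1}(m; L) = tau * (sqrt(log(1+m)) + (L/2) log(1+m))."""
    t = math.log1p(m)
    return tau * (math.sqrt(t) + 0.5 * L * t)

def corollary_2_1_bound(m, tau, L):
    """2 * max(tau, L*tau/2) * log(1+m), stated for m >= 2."""
    return 2.0 * max(tau, 0.5 * L * tau) * math.log1p(m)

tau, L = 1.5, 0.8
for m in (2, 10, 1000, 10 ** 6):
    print(m, lemma_2_3_bound(m, tau, L), corollary_2_1_bound(m, tau, L))
```

The corollary trades the $\sqrt{\log(1+m)}$ term for a simpler expression, so it is looser whenever the Gaussian term dominates.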

It will be helpful to relate $\Psi_{1}(\cdot;L)$ to several functions appearing frequently in the theory of exponential bounds as follows: for $x\geq 0$ , we define

$$h(x)=x(\log x-1)+1,\qquad h_{1}(x)=1+x-\sqrt{1+2x},\qquad h_{0}(x)=\frac{x^{2}}{2(1+x)}.$$

It is easily shown (see e.g. Boucheron et al. [2013] Exercise 2.8, page 47) that

$$9h_{0}(x/3)\leq 9h_{1}(x/3)\leq h(1+x).$$

A trivial restatement of the inequality on the left above and some algebra and easy inequalities yield

$$h_{0}(x)\leq h_{1}(x)\leq 2h_{0}(x)\leq h_{0}(2x).$$

The latter inequalities imply that the Orlicz norms based on $h_{0}$ and $h_{1}$ are equivalent up to constants.
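The two chains of inequalities relating $h_{0}$, $h_{1}$, and $h(1+\cdot)$ are easy to spot-check on a grid. The sketch below is only a numerical sanity check, not a proof; the grid and the tolerance `tol` are arbitrary choices for this example.

```python
import math

def h0(x):
    return x * x / (2.0 * (1.0 + x))

def h1(x):
    return 1.0 + x - math.sqrt(1.0 + 2.0 * x)

def h2(x):
    # h(1 + x) with h(x) = x (log x - 1) + 1, written via log1p for accuracy
    return (1.0 + x) * math.log1p(x) - x

def chains_hold(x, tol=1e-12):
    first = 9.0 * h0(x / 3.0) <= 9.0 * h1(x / 3.0) + tol <= h2(x) + 2.0 * tol
    second = (h0(x) <= h1(x) + tol
              and h1(x) <= 2.0 * h0(x) + tol
              and 2.0 * h0(x) <= h0(2.0 * x) + tol)
    return first and second

grid = [0.01 * k for k in range(1, 5001)]  # x in (0, 50]
all_ok = all(chains_hold(x) for x in grid)
print(all_ok)
```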

One reason the functions $h_{0}$ and $h_{1}$ are so useful is that they both have explicit inverses: from Boucheron, Lugosi, and Massart (2013), page 29, for $h_{1}$ and direct calculation for $h_{0}$ ,

$$h_{1}^{-1}(y)=y+\sqrt{2y}\ \ \mbox{for}\ y\geq 0,\qquad h_{0}^{-1}(y)=y+\sqrt{y^{2}+2y}.$$

To relate the inequalities in Lemmas 2.1 and 2.2 to more standard inequalities (with names) we note that

$$\sqrt{t}+2^{-1}Lt=\frac{1}{L}h_{1}^{-1}\left(\frac{L^{2}t}{2}\right).$$

This implies immediately that the inequality in Lemma 2.2 can be rewritten as

[TABLE]
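The explicit inverses $h_{1}^{-1}(y)=y+\sqrt{2y}$ and $h_{0}^{-1}(y)=y+\sqrt{y^{2}+2y}$, and the identity $\sqrt{t}+Lt/2=L^{-1}h_{1}^{-1}(L^{2}t/2)$ that links Lemmas 2.1 and 2.2 to the $h_{1}$ form, can be confirmed by a round-trip computation. Function names and test values here are illustrative assumptions.

```python
import math

def h0(x):
    return x * x / (2.0 * (1.0 + x))

def h1(x):
    return 1.0 + x - math.sqrt(1.0 + 2.0 * x)

def h0_inv(y):
    return y + math.sqrt(y * y + 2.0 * y)

def h1_inv(y):
    return y + math.sqrt(2.0 * y)

def threshold_direct(t, L):
    return math.sqrt(t) + 0.5 * L * t

def threshold_via_h1_inv(t, L):
    return h1_inv(0.5 * L * L * t) / L

ys = (0.1, 1.0, 7.5, 200.0)
h0_errors = [abs(h0(h0_inv(y)) - y) for y in ys]
h1_errors = [abs(h1(h1_inv(y)) - y) for y in ys]
identity_gap = abs(threshold_direct(3.0, 0.4) - threshold_via_h1_inv(3.0, 0.4))
print(max(h0_errors), max(h1_errors), identity_gap)
```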

Here is a formal statement of a proposition relating exponential tail bounds in the traditional Bernstein form in terms of $h_{0}$ to tail bounds in terms of the (larger) function $h_{1}$ .

Proposition 2.1.

Suppose that a random variable $Z$ satisfies

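$$P(|Z|>z)\leq 2\exp\left(-\frac{z^{2}}{2(A+Bz)}\right)=2\exp\left(-\frac{A}{B^{2}}h_{0}\left(\frac{Bz}{A}\right)\right)\ \ \mbox{for all}\ z>0$$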

for numbers $A,B>0$ . Then the hypothesis of Lemma 2.2 holds with $L$ and $\tau$ given by $L^{2}=2B^{2}/A$ and $\tau=2^{3/2}A^{1/2}$ :

[TABLE]

Proof. This follows from (2.7) and elementary manipulations. $\Box$
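One way to read Proposition 2.1 is that, with $L^{2}=2B^{2}/A$ and $\tau=2^{3/2}A^{1/2}$, the Bernstein exponent $z^{2}/(2(A+Bz))$ is at least $(2/L^{2})h_{1}(Lz/\tau)$, which is the inequality $h_{1}(u)\leq h_{0}(2u)$ applied at $u=Bz/(2A)$. The sketch below checks this numerically; the values of $A$ and $B$ and the function names are arbitrary illustrations.

```python
import math

def h1(x):
    return 1.0 + x - math.sqrt(1.0 + 2.0 * x)

def bernstein_exponent(z, A, B):
    return z * z / (2.0 * (A + B * z))

def h1_exponent(z, A, B):
    # Constants from Proposition 2.1: L^2 = 2 B^2 / A, tau = 2^{3/2} A^{1/2}.
    L = math.sqrt(2.0 * B * B / A)
    tau = 2.0 ** 1.5 * math.sqrt(A)
    return (2.0 / L ** 2) * h1(L * z / tau)

def proposition_holds(A, B, zs, tol=1e-12):
    return all(bernstein_exponent(z, A, B) + tol >= h1_exponent(z, A, B) for z in zs)

zs = [0.05 * k for k in range(1, 2001)]
print(proposition_holds(1.0, 1.0, zs), proposition_holds(4.0, 0.25, zs))
```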

The classical route to proving inequalities of the form given in (2.8) for sums of independent random variables is via Bernstein’s inequality; see for example van der Vaart and Wellner [1996] Lemmas 2.2.9 and 2.2.11, pages 102 and 103, or Boucheron et al. [2013], Theorem 2.10, page 37. But the recent developments of concentration inequalities via Stein’s method yields inequalities of the form given in (2.8) for many random variables $Z$ which are not sums of independent random variables: see, for example, Ghosh and Goldstein [2011a, b] and Goldstein and Işlak [2014]. The point of the previous proposition is that (up to constants) these inequalities in terms of $h_{0}$ can be re-expressed in terms of the (larger) function $h_{1}$ .

3 Bennett’s inequality and the Bennett-Orlicz norm

We begin with a statement of a version of Bennett’s inequality for sums of bounded random variables; see Bennett [1962], Shorack and Wellner [1986], and Boucheron et al. [2013]. Let $h(x)\equiv x(\log x-1)+1$ and $h_{2}(x)\equiv h(1+x)$ . This function arises in Bennett’s inequality for bounded random variables and elsewhere; see e.g. Bennett [1962], Shorack and Wellner [1986], and Boucheron et al. [2013], page 35 (but note that their $h$ is our $h_{2}=h(1+\cdot)$ ). As noted in Example 2.1 above, the function $h$ also appears in exponential bounds for Poisson random variables: see Shorack and Wellner [1986] page 485, and Boucheron et al. [2013] page 23.

Proposition 3.1.

(Bennett) (i) Let $X_{1},\ldots,X_{n}$ be independent with $\max_{1\leq j\leq n}(X_{j}-\mu_{j})\leq b$ , $E(X_{j})=\mu_{j}$ , $Var(X_{j})=\sigma_{n,j}^{2}$ . Let $\mu\equiv\sum_{j=1}^{n}\mu_{j}/n$ , $\sigma_{n}^{2}\equiv(\sigma_{n,1}^{2}+\cdots+\sigma_{n,n}^{2})/n$ . Then with $\psi(x)\equiv 2h(1+x)/x^{2}$ ,

[TABLE]

for all $z>0$ .

(ii) If, in addition, $\max_{1\leq j\leq n}|X_{j}-\mu_{j}|\leq b$ , then

[TABLE]

Using the inequality $h(1+x)\geq 9h_{1}(x/3)$ , it follows that

[TABLE]

Thus an inequality of the form of that in Lemma 2.1 holds with $2/L^{2}=9n\sigma_{n}^{2}/b^{2}$ and $L/\tau=b/(3\sqrt{n}\sigma_{n}^{2})$ . Thus $L=\sqrt{2/9}b/(\sqrt{n}\sigma_{n})$ and $\tau=L3\sqrt{n}\sigma_{n}^{2}/b=\sqrt{2}\sigma_{n}$ . It follows from Lemma 2.2 that

[TABLE]

or

[TABLE]

But this bound has not taken advantage of the fact that the first bound above involves the function $h$ (or $h_{2}$ ) rather than $h_{1}$ . It would seem to be of potential interest to develop an Orlicz norm based on the function $h_{2}\equiv h(1+\cdot)$ rather than the function $h_{1}$ . Motivated by the first inequality in Proposition 3.1, we define for each $L>0$ a new Orlicz norm based on the function $h_{2}$ as follows.

$$\Psi_{2}(x;L)\equiv\exp\left(\frac{2}{L^{2}}h_{2}(Lx)\right)-1.$$

Since $h_{2}$ is convex, $h_{2}(0)=0$ , and $h_{2}$ is increasing on $[0,\infty)$ , it follows that $\Psi_{2}(\cdot;L)$ defines a valid Orlicz norm (as defined in Section 1) for each $L$ :

$$\|X\|_{\Psi_{2}(\cdot;L)}=\inf\left\{c>0:\ E\Psi_{2}\left(\frac{|X|}{c};L\right)\leq 1\right\}.$$

We call $\|X\|_{\Psi_{2}(\cdot;L)}$ the Bennett-Orlicz norm of $X$ . Note that with $\psi(Lx)\equiv x^{-2}(2/L^{2})h_{2}(Lx)$ ,

$$\Psi_{2}(x;L)=\exp\left(x^{2}\psi(Lx)\right)-1.$$

We first relate $\Psi_{2}(\cdot;L)$ to $\Psi_{1}(x;L)$ and to the usual Gaussian Orlicz norm defined by $\Psi_{2}(x)=\exp(x^{2})-1$ .

Proposition 3.2.

(i) $\Psi_{2}(x;L)\leq\exp(x^{2})-1=\Psi_{2}(x)$ for all $x\geq 0$ .

(ii) $\Psi_{2}(x;L)\geq\Psi_{1}(x;L/3)$ for $x\geq 0$ .

Proof. (i) follows since $\psi(x)\equiv 2x^{-2}h(1+x)\leq 1$ for all $x\geq 0$ ; see Shorack and Wellner [1986], Proposition 11.1.1, page 441. To show that (ii) holds, note that by (2.1)

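$$\Psi_{1}(x;L/3)=\exp\left(\frac{18}{L^{2}}h_{1}(Lx/3)\right)-1.$$

Thus the claimed inequality in (ii) is equivalent to

$$\frac{2}{L^{2}}h_{2}(Lx)\geq\frac{18}{L^{2}}h_{1}(Lx/3),$$

or equivalently

$$h_{2}(Lx)\geq 9h_{1}(Lx/3).$$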

But the inequality in the last display holds in view of (2.6). $\Box$
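Proposition 3.2 sandwiches $\Psi_{2}(\cdot;L)$ between $\Psi_{1}(\cdot;L/3)$ and the Gaussian modulus $\exp(x^{2})-1$. Since all three functions have the form $\exp(\cdot)-1$, it is enough to compare exponents, which also avoids overflow. The following sketch does that on a grid; the grid, tolerance, and names are assumptions for illustration.

```python
import math

def h1(x):
    return 1.0 + x - math.sqrt(1.0 + 2.0 * x)

def h2(x):
    return (1.0 + x) * math.log1p(x) - x

def exponents(x, L):
    """Exponents of Psi_1(x; L/3), Psi_2(x; L), and exp(x^2) - 1, in that order."""
    bernstein = (18.0 / L ** 2) * h1(L * x / 3.0)
    bennett = (2.0 / L ** 2) * h2(L * x)
    gaussian = x * x
    return bernstein, bennett, gaussian

def ordering_holds(x, L, tol=1e-12):
    bernstein, bennett, gaussian = exponents(x, L)
    return bernstein <= bennett + tol and bennett <= gaussian + tol

xs = [0.1 * k for k in range(1, 101)]
sandwich_ok = all(ordering_holds(x, L) for x in xs for L in (0.1, 1.0, 5.0))
print(sandwich_ok)
```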

Note that while $h_{1}$ and $\Psi_{1}(\cdot;L)$ have explicit inverses given in terms of $\sqrt{v}$ and $\log(1+v)$ by (2) and (2.4), inverses of the functions $h_{2}$ and $\Psi_{2}(\cdot;L)$ can only be written in terms of Lambert’s function (also called the product log function) $W$ satisfying $W(z)\exp(W(z))=z$ ; see Corless et al. [1996]. But this slight difficulty is easily overcome by way of several nice inequalities for $W$ . By use of $W$ and the inequalities developed in the Appendix, Section 6, we obtain the following proposition concerning $\Psi_{2}^{-1}(\cdot;L)$ .

Proposition 3.3.

(i) $\Psi_{2}^{-1}(y;L)\leq\Psi_{1}^{-1}(y;L/3)=\sqrt{\log(1+y)}+(L/6)\log(1+y)$ for $y\geq 0$ .

(ii) Furthermore, with $W$ denoting the Lambert $W$ function,

[TABLE]

(iii) If $(L^{2}/2)\log(1+y)\geq 1$ , then

[TABLE]

(iv) If $(L^{2}/2)\log(1+y)\geq 5$ , then

[TABLE]

(v) If $(L^{2}/2)\log(1+y)\leq 9/4$ , then

[TABLE]

(vi)

[TABLE]

Proof. (i) follows immediately from Proposition 3.2. (ii) follows from the definition of $\Psi_{2}(\cdot;L)$ and direct computation for the first part; the second part follows from Lemma 6.1. The inequality in (iii) follows from (ii) and Lemma 6.2. The first inequality in (iv) follows from (iii) since $\log(y-1)\geq(1/2)\log y$ for $y\geq 4$ . The second inequality in (iv) follows by noting that

[TABLE]

if $L^{2}/2\geq 1$ . (v) follows from (ii) and Lemma 6.3, part (iv). $\Box$
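For readers who want to evaluate $\Psi_{2}^{-1}(\cdot;L)$ numerically, here is a minimal sketch. It computes the principal branch of $W$ by bisection (a deliberately simple stand-in for a library routine), uses the identity $h^{-1}(y)=\exp(1+W((y-1)/e))$ obtained by solving $x\log(x/e)=y-1$, and then spot-checks part (i) of Proposition 3.3. All function names are assumptions for this example.

```python
import math

def lambert_w(z, iters=200):
    """Principal branch W(z) for z >= -1/e, by bisection on w * exp(w) = z."""
    if z < -1.0 / math.e:
        raise ValueError("principal branch needs z >= -1/e")
    lo = -1.0
    hi = 1.0 if z < math.e else math.log(z)  # W(z) <= log z once z >= e
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def h2(x):
    return (1.0 + x) * math.log1p(x) - x

def h2_inv(y):
    # h(x) = y  <=>  (x/e) log(x/e) = (y - 1)/e  <=>  x = exp(1 + W((y - 1)/e))
    return math.exp(1.0 + lambert_w((y - 1.0) / math.e)) - 1.0

def psi2_inv(y, L):
    """Inverse of Psi_2(x; L) = exp((2/L^2) h2(Lx)) - 1."""
    return h2_inv(0.5 * L * L * math.log1p(y)) / L

def psi1_inv(y, L):
    t = math.log1p(y)
    return math.sqrt(t) + 0.5 * L * t

for y in (1.0, 10.0, 1e3, 1e6):
    print(y, psi2_inv(y, 1.0), psi1_inv(y, 1.0 / 3.0))
```

Bisection is slow compared with Newton or Halley iterations, but it is robust near the branch point $z=-1/e$, where derivative-based methods need extra care.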

Lemmas 2.1 and 2.2 by van de Geer and Lederer [2013] as stated in Section 2 should be compared with the development on page 96 of van der Vaart and Wellner [1996]. We now show that the following analogues of Lemmas 2.1 - 2.3 hold for $\|Z\|_{\Psi_{2}(\cdot;L)}$ .

Lemma 3.1.

Let $\tau\equiv\|Z\|_{\Psi_{2}(\cdot;L)}$ . Then

[TABLE]

where $h_{2}(x)\equiv h(1+x)$ and $h_{2}^{-1}$ is the inverse of $h_{2}$ (so that $h_{2}^{-1}(y)=h^{-1}(y)-1$ ).

Proof. Let $y>0$ . Since $\Psi_{2}(x;L)=\exp((2/L^{2})h_{2}(Lx))-1=e^{t}-1$ implies $h_{2}(Lx)=L^{2}t/2$ , it follows that for any $c>\|Z\|_{\Psi_{2}(\cdot;L)}$ we have

[TABLE]

Lemma 3.2.

Suppose that for some $\tau>0$ we have

[TABLE]

Equivalently,

[TABLE]

Then $\|Z\|_{\Psi_{2}(\cdot;\sqrt{3}L)}\leq\sqrt{3}\tau$ .

Proof. Let $\alpha,\beta>0$ . We compute

[TABLE]

Choosing $\alpha=\beta=\sqrt{3}$ this yields

[TABLE]

Hence we conclude that $\|Z\|_{\Psi_{2}(\cdot;\sqrt{3}L)}\leq\sqrt{3}\tau$ . $\Box$

Corollary 3.1.

(i) If $X\sim\mbox{Poisson}(\nu)$ , then $\|X-\nu\|_{\Psi_{2}(\cdot;\sqrt{6/\nu})}\leq\sqrt{6\nu}$ .

(ii) If $X_{1},\ldots,X_{n}$ are i.i.d. Bernoulli $(p)$ , then

[TABLE]

(iii) If $X\sim N(0,1)$ , then $\|X\|_{\Psi_{2}(\cdot;L)}\leq\sqrt{6}$ for every $L>0$ . By taking the limit on $L\searrow 0$ and noting that $\Psi_{2}(z;L)\rightarrow\Psi_{2}(z)\equiv\exp(z^{2})-1$ as $L\searrow 0$ this yields $\|X\|_{\Psi_{2}}\leq\sqrt{6}$ . In this case it is known that $\|X\|_{\Psi_{2}}=\sqrt{8/3}$ . (See van der Vaart and Wellner [1996], Exercise 2.2.1, page 105.)
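The value $\sqrt{8/3}$ can be recovered numerically from the standard identity $E\exp(X^{2}/c^{2})=(1-2/c^{2})^{-1/2}$ for $X\sim N(0,1)$ and $c^{2}>2$. The sketch below bisects on $c$; the bracket and function names are assumptions for illustration.

```python
import math

def expected_psi2_gaussian(c):
    """E[exp(X^2 / c^2)] - 1 for X ~ N(0, 1); infinite unless c^2 > 2."""
    if c * c <= 2.0:
        return math.inf
    return 1.0 / math.sqrt(1.0 - 2.0 / (c * c)) - 1.0

def gaussian_psi2_norm(iters=200):
    lo, hi = math.sqrt(2.0), 10.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_psi2_gaussian(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

norm = gaussian_psi2_norm()
print(norm, math.sqrt(8.0 / 3.0), math.sqrt(6.0))
```

The exact norm is noticeably smaller than the bound $\sqrt{6}$ obtained from Lemma 3.2, which reflects the constant lost in passing from a tail bound to an Orlicz norm.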

Now for an inequality paralleling Lemma 2.3 for the Bernstein-Orlicz norm:

Lemma 3.3.

Let $\tau$ and $L$ be constants, and let $Z_{1},\ldots,Z_{m}$ be random variables satisfying $\max_{1\leq j\leq m}\|Z_{j}\|_{\Psi_{2}(\cdot;L)}\leq\tau$ . Then

[TABLE]

Furthermore,

[TABLE]

for all $m$ such that $\log(1+m)\geq 5$ (or $m\geq e^{5}-1$ ).

Remark 3.1.

The point of this last bound is that it gives an explicit trade-off between the Gaussian component (the term $\sqrt{\log(1+m)}$ ) and the Poisson component (the term $\log(1+m)/\log\log(1+m)$ ) governed by a Bennett type inequality. In contrast, the bounds obtained by van de Geer and Lederer [2013] yield a trade-off between the Gaussian world and the sub-exponential world governed by a Bernstein type inequality.

Proof. We write $\Psi_{2,L}\equiv\Psi_{2}(\cdot;L)$ . Let $c>\tau$ . Then by Jensen’s inequality

[TABLE]

Therefore,

[TABLE]

The remaining claims follow from Proposition 3.3. $\Box$
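The trade-off described in Remark 3.1 can be made visible by comparing growth rates in $m$. The sketch below assumes that the first display of Lemma 3.3 has the form $E\max_{j}|Z_{j}|\leq\tau\Psi_{2}^{-1}(m;L)$, by analogy with Lemma 2.3; that form is an assumption here, since the display itself is not reproduced in this text. It inverts $h_{2}$ by plain bisection rather than through Lambert's $W$, and compares the result with $\Psi_{1}^{-1}(m;L/3)$ from Proposition 3.3(i).

```python
import math

def h2(x):
    return (1.0 + x) * math.log1p(x) - x

def h2_inv_bisect(y, iters=200):
    """Numerical inverse of the increasing function h2 on [0, infinity)."""
    lo, hi = 0.0, 1.0
    while h2(hi) < y:       # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h2(mid) < y:
            lo = mid
        else:
            hi = mid
    return hi

def bennett_rate(m, L):
    """Psi_2^{-1}(m; L) = (1/L) h2^{-1}((L^2/2) log(1+m))."""
    return h2_inv_bisect(0.5 * L * L * math.log1p(m)) / L

def bernstein_rate(m, L):
    """Psi_1^{-1}(m; L) = sqrt(log(1+m)) + (L/2) log(1+m)."""
    t = math.log1p(m)
    return math.sqrt(t) + 0.5 * L * t

L = 1.0
ratios = {m: bennett_rate(m, L) / bernstein_rate(m, L / 3.0) for m in (1e2, 1e4, 1e8)}
print(ratios)
```

In this illustration the ratio drifts downward as $m$ grows, consistent with the $\log(1+m)/\log\log(1+m)$ behaviour of the Bennett-type term against the $\log(1+m)$ behaviour of the Bernstein-type term.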

Here are analogues of Lemmas 4 and 5 of van de Geer and Lederer [2013].

Lemma 3.4.

Let $Z_{1},\ldots,Z_{m}$ be random variables satisfying

[TABLE]

for some $L$ and $\tau$ . Then, for all $t>0$

[TABLE]

Proof. For any $a>0$ and $t>0$ concavity of $h_{2}^{-1}$ together with $h_{2}^{-1}(0)=0$ imply that

[TABLE]

Therefore, by using a union bound and Lemma 3.1

[TABLE]

$\Box$

Lemma 3.5.

Let $Z_{1},\ldots,Z_{m}$ be random variables satisfying (3.7). Then

[TABLE]

Proof. Let

[TABLE]

Then Lemma 3.4 implies that

[TABLE]

Then the conclusion follows from Lemma 3.2. $\Box$

4 Prokhorov’s “arcsinh” exponential bound and Orlicz norms

Another important exponential bound for sums of independent bounded random variables is due to Prokhorov [1959]. As will be seen below, Prokhorov’s bound involves another function $h_{4}$ (rather than $h_{2}$ of Bennett’s inequality) given by

[TABLE]

Suppose that $X_{1},\ldots,X_{n}$ are independent random variables with $E(X_{j})=\mu_{j}$ and $|X_{j}-\mu_{j}|\leq b$ for some $b>0$ . Let $S_{n}=X_{1}+\cdots+X_{n}$ , and set $\mu\equiv n^{-1}\sum_{j=1}^{n}\mu_{j}$ , $\sigma_{n}^{2}\equiv n^{-1}Var(S_{n})$ . Prokhorov’s “arcsinh” exponential bound is as follows:

Proposition 4.1.

(Prokhorov) If the $X_{j}$ ’s satisfy the above assumptions, then

[TABLE]

Equivalently, with $\sigma_{n}^{2}\equiv n^{-1}Var(S_{n})$ and $h_{4}(x)\equiv(x/2)\mbox{arcsinh}(x/2)$ ,

[TABLE]

See e.g. Prokhorov [1959], Stout [1974], de la Peña and Giné [1999], Johnson et al. [1985], and Kruglov [2006]. Johnson et al. [1985] use Prokhorov’s inequality to control Orlicz norms for functions $\Psi$ of the form $\Psi(x)=\exp(\psi(x))$ with $\psi(x)\equiv x\log(1+x)$ and use the resulting inequalities to show that the optimal constants $D_{p}$ in Rosenthal’s inequalities grow as $p/\mbox{log}(p)$ .

Kruglov [2006] gives an improvement of Prokhorov’s inequality which involves replacing $h_{4}$ by

[TABLE]

Note that Prokhorov’s inequality is of the same form as Bennett’s inequality (3.1) in Proposition 3.1, but with Bennett’s $h_{2}$ replaced by Prokhorov’s $h_{4}$ .

Thus we want to compare Prokhorov’s inequality (and Kruglov’s improvement thereof) to Bennett’s inequality. As can be seen from the above development, this boils down to comparison of the functions $h_{2}$ , $h_{4}$ , and $h_{5}$ . The following lemma makes a number of comparisons and contrasts between the functions $h_{2}$ , $h_{4}$ , and $h_{5}$ .

Lemma 4.1.

(Comparison of $h_{2}$ , $h_{4}$ , and $h_{5}$ )

(i)(a) $h_{2}(x)\geq h_{5}(x)\geq h_{4}(x)$ for all $x\geq 0$ .

(i)(b) $h_{2}^{-1}(y)\leq h_{5}^{-1}(y)\leq h_{4}^{-1}(y)$ for all $y\geq 0$ .

(ii)(a) $h_{2}(x)\geq(x/2)\log(1+x)\geq(x/2)\log(1+x/2)$ for all $x\geq 0$ .

(ii)(b) $h_{4}(x)\geq(x/2)\log(1+x/2)$ for all $x\geq 0$ .

(ii)(c) $h_{5}(x)\geq\ (x/2)\log(1+x/2)$ for all $x\geq 0$ .

(iii)(a) $h_{2}(x)\sim 2^{-1}x^{2}$ as $x\searrow 0$ ; $h_{2}(x)\sim x\log(x)$ as $x\rightarrow\infty$ .

(iii)(b) $h_{4}(x)\sim 4^{-1}x^{2}$ as $x\searrow 0$ ; $h_{4}(x)\sim(1/2)x\log(x)$ as $x\rightarrow\infty$ .

(iii)(c) $h_{5}(x)\sim 4^{-1}x^{2}$ as $x\searrow 0$ ; $h_{5}(x)\sim x\log(x)$ as $x\rightarrow\infty$ .

(iii)(d) $h_{2}(x)-h_{4}(x)\sim x^{2}/4$ as $x\searrow 0$ ; $h_{2}(x)-h_{4}(x)\sim(1/2)x\log x$ as $x\rightarrow\infty$ .

(iii)(e) $h_{2}(x)-h_{5}(x)\sim x^{2}/4$ as $x\searrow 0$ ; $h_{2}(x)-h_{5}(x)\sim\log x$ as $x\rightarrow\infty$ .

(iv)(a) $h_{2}(x)=2^{-1}x^{2}\psi_{2}(x)$ where

[TABLE]

(iv)(b) $h_{4}(x)=4^{-1}x^{2}\psi_{4}(x)$ where

[TABLE]

(iv)(c) $h_{5}(x)=4^{-1}x^{2}\psi_{5}(x)$ where

[TABLE]

Proof. (i) We first prove that $h_{2}(x)\geq h_{4}(x)$ . Let $g(x)=h_{2}(x)-h_{4}(x)$ ; thus

[TABLE]

Then $g(0)=0$ and

[TABLE]

also has $g^{\prime}(0)=0$ . Note that $\sqrt{1+(x/2)^{2}}\leq 1+x/2$ and hence $x/2+\sqrt{1+(x/2)^{2}}\leq 1+x$ . Thus

[TABLE]

and hence

[TABLE]

and it suffices to show that the right side is $\geq 0$ for all $x$ . Thus we let

[TABLE]

Let $\overline{m}(x)\equiv 2m(2x)=\log(1+2x)-\frac{x}{\sqrt{1+x^{2}}}$ . Then $\overline{m}(0)=0$ and we compute

[TABLE]

so that $\overline{m}^{\prime}(0)=1$ and the numerator, $j$ , is easily seen to be non-negative since $(1+x^{2})^{3/2}\geq 1+x^{2}$ implies $2(1+x^{2})^{3/2}\geq 2(1+x^{2})\geq 1+2x$ for all $x\geq 0$ . Thus $h_{2}(x)\geq h_{4}(x)$ .

Kruglov [2006] shows that $h_{5}(x)\geq h_{4}(x)$ . Now we show that $h_{2}(x)\geq h_{5}(x)$ . Note that with $g(x)\equiv h_{2}(x)-h_{5}(x)$ ,

[TABLE]

has $g(0)=0$ and $g^{\prime}(x)\geq 0$ (as was shown above in (4.3) ). Thus $g(x)=\int_{0}^{x}g^{\prime}(v)dv\geq 0$ .

(i)(b) The inequalities for the inverse functions follow immediately from the inequalities for the functions themselves in (i)(a).

(ii)(a) To show that the first inequality holds, consider

[TABLE]

Then $g(0)=0$ and

[TABLE]

Thus $g^{\prime}(0)=0$ and $g(x)=\int_{0}^{x}g^{\prime}(y)dy\geq 0$ . The second inequality in (ii)(a) is trivial.

(ii)(b) This follows easily from $\mbox{arcsinh}(v)=\log(v+\sqrt{1+v^{2}})\geq\log(v+1)$ for all $v\geq 0$ .

(ii)(c) This follows from (i)(a) and (ii)(b).

(iii)(a) This follows from $\psi_{2}(x)\equiv\psi(x)\rightarrow 1$ as $x\searrow 0$ ; see Proposition 11.1.1, page 441, Shorack and Wellner [1986].

(iii)(b) Now

[TABLE]

with $h_{4}^{\prime}(0)=0$ , and

[TABLE]

with $h_{4}^{\prime\prime}(0)=1/2$ . Therefore

[TABLE]

and

[TABLE]

(iii)(c) Now

[TABLE]

where $h_{5}^{\prime}(0)=0$ and $h_{5}^{\prime\prime}$ is decreasing. Thus $h_{5}(x)=(x^{2}/2)h_{5}^{\prime\prime}(x^{*})$ for some $0\leq x^{*}\leq x$ and we conclude that $4x^{-2}h_{5}(x)\rightarrow 1$ as $x\geq x^{*}\searrow 0$ .

(iv)(a) The first part is a restatement of (ii)(a). The second part follows from (2.7): $h_{2}(x)=h(1+x)\geq 9h_{0}(x/3)=x^{2}/(2(1+x/3))$ , and the claim follows by definition of $\psi_{2}$ .

(iv)(b) The first inequality is a restatement of (ii)(b). The second inequality follows since $h_{4}(x)=(x^{2}/2)h_{4}^{\prime\prime}(x^{*})$ for some $0\leq x^{*}\leq x$ , where $x\mapsto h_{4}^{\prime\prime}(x)$ is decreasing, so

[TABLE]

To prove the third inequality, note that

[TABLE]

holds if $1+x^{2}/8\geq c(1+x^{2}/4)$ , or if $1-c\geq(x^{2}/4)(c-1/2)$ . Then rearrange and take $c=(1-\delta)$ for $\delta\in(0,1/2)$ .

(iv)(c) The first inequality follows from (ii)(c). The second inequality follows by arguing as in (iv)(b), but now without the complicating second factor: note that

[TABLE]

since $h_{5}^{\prime\prime}$ is decreasing. $\Box$

Discussion: 1. Even though Kruglov’s inequality improves on Prokhorov’s inequality, (i)(a) of Lemma 4.1 shows that Bennett’s inequality dominates both Kruglov’s improvement of Prokhorov’s inequality and Prokhorov’s inequality itself: $h_{2}\geq h_{5}\geq h_{4}$ .

2. (ii) of Lemma 4.1 shows that all three of the inequalities, Bennett, Kruglov, and Prokhorov, are based on functions $h_{2}$ , $h_{5}$ , and $h_{4}$ which are bounded below by $(x/2)\log(1+x/2)$ for all $x\geq 0$ . On the other hand, (iii) shows that $h_{2}$ and $h_{5}$ are very nearly equivalent for large $x$ , but that although $h_{4}$ grows at the same $x\log x$ rate as $h_{2}$ and $h_{5}$ , $h_{4}$ is smaller by a multiplicative factor of $1/2$ as $x\rightarrow\infty$ .

3. (iii)(a-c) of Lemma 4.1 shows that $h_{2}(x)\sim x^{2}/2$ as $x\searrow 0$ while $h_{k}(x)\sim x^{2}/4$ for both $h_{5}$ and $h_{4}$ ; thus $h_{2}(x)$ is larger near $x=0$ by a factor of $2$ . Furthermore, the difference $h_{2}-h_{4}$ is of order $(1/2)x\log x$ as $x\rightarrow\infty$ , while the difference $h_{2}-h_{5}$ is only of order $\log x$ as $x\rightarrow\infty$ .

4. (iv) of Lemma 4.1 re-expresses the behavior of the Kruglov and Prokhorov inequalities for small values of $x$ in terms of the corresponding $\psi_{k}$ functions.

The upshot of all of these comparisons is that Bennett’s inequality dominates both the Kruglov and Prokhorov inequalities. Figures 1 - 2 give graphical versions of these comparisons as well as comparisons to the Bernstein type $h-$ functions $h_{0}$ and $h_{1}$ .
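Some of the comparisons in Lemma 4.1 can be spot-checked numerically using only the definitions $h_{2}(x)=(1+x)\log(1+x)-x$ and $h_{4}(x)=(x/2)\mbox{arcsinh}(x/2)$. The sketch below covers parts (i)(a) (for $h_{2}$ versus $h_{4}$), (ii)(a), (ii)(b), and the small-$x$ behaviour in (iii)(a) and (iii)(b); $h_{5}$ is left out because its formula is not reproduced in this text. Grid, tolerance, and names are illustrative assumptions.

```python
import math

def h2(x):
    return (1.0 + x) * math.log1p(x) - x

def h4(x):
    return 0.5 * x * math.asinh(0.5 * x)

def lower_envelope(x):
    return 0.5 * x * math.log1p(0.5 * x)

def comparisons_hold(x, tol=1e-12):
    return (h2(x) + tol >= h4(x)                        # (i)(a), h2 versus h4
            and h2(x) + tol >= 0.5 * x * math.log1p(x)  # (ii)(a)
            and h4(x) + tol >= lower_envelope(x))       # (ii)(b)

grid = [0.01 * k for k in range(1, 10001)]  # x in (0, 100]
lemma_ok = all(comparisons_hold(x) for x in grid)
ratio_near_zero = h2(1e-3) / h4(1e-3)  # (iii)(a) and (iii)(b) suggest a value near 2
print(lemma_ok, ratio_near_zero)
```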

5 Comparisons with some results of Talagrand

Our goal in this section is to give comparisons with some results of Talagrand [1989] and Talagrand [1994], especially his Theorem 3.5, page 45, and Proposition 6.5, page 58.

Talagrand [1994] defines a function $\varphi_{L,S}$ as follows:

[TABLE]

Because of the square-root on the log term, this can be regarded as corresponding to a “sub-Bennett” type exponential bound. One of the interesting properties of $\varphi_{L,S}$ established by Talagrand [1994] is given in the following lemma:

Lemma 5.1.

There is a number $K(L)$ depending on $L$ only such that

[TABLE]

This is Lemma 3.6 of Talagrand [1994] page 47. Talagrand uses this Lemma to develop a Kiefer-type inequality: see also van der Vaart and Wellner [1996], Corollary A.6.3. In the basic Kiefer type inequality for Binomial random variables, van der Vaart and Wellner [1996], Corollary A.6.3, it follows that

[TABLE]

for $\log(1/p)-1\geq 11$ ; i.e. for $p\leq e^{-12}$ .

A similar fact holds for any exponential bound of the Bennett type under a certain boundedness hypothesis. Suppose that

[TABLE]

and that $P(|Z|\geq v)=0$ for all $v\geq C$ . Then, since $\psi$ is decreasing, for $z\leq C$

[TABLE]

where the $\log$ term can be made arbitrarily large by choosing $\tau$ sufficiently small. Here the second inequality follows from the fact that

[TABLE]

Proof of (5.2): Since $\psi(x)=2x^{-2}h(1+x)$ where $h(x)=x(\log x-1)+1$ , we can write, with $\underline{\psi}(x)\equiv 2x^{-2}\log(x/e)$ ,

[TABLE]

where both terms are clearly non-negative. $\Box$

Now we consider another basic inequality due to Talagrand [1994]. Suppose that

[TABLE]

satisfies the following three properties:

(a) $C\subset D$ implies that $\theta(C)\leq\theta(D)$ for $C,D\in 2^{\cal X}$ .

(b) $\theta(C\cup D)\leq\theta(C)+\theta(D)$ .

(c) $\theta(C)\leq|C|=\#(C)$ .

Then if $X_{1},\ldots,X_{n}$ are i.i.d. $P$ non-atomic on $({\cal X},{\cal A})$ and $Z\equiv\theta(\{X_{1},\ldots,X_{n}\})$ , for some universal constant $K_{2}$ we have, for $z\geq K_{2}E(Z)$ ,

[TABLE]

As noted by Talagrand [1994], this follows from an isoperimetric inequality established in Talagrand [1989], but it is also a consequence of results of Talagrand [1991, 1995]. Here we simply note that it can be rephrased as a Bennett type inequality: for all $z\geq K_{2}E(Z)$

[TABLE]

This follows by simply checking that

[TABLE]

for $z\geq K_{2}E(Z)$ .

Also see Ledoux [2001], Theorem 7.5, page 142 and Corollary 7.8, page 148; Massart [2000], and Boucheron et al. [2013], Theorem 6.12, page 182.

One further remark seems to be in order: Talagrand [1989], Theorem 2 and Proposition 12, shows that Orlicz norms of the Bennett type are “too large” to yield nice generalizations of the classical Hoffmann-Jørgensen inequality in the setting of sums of independent bounded sequences in a general Banach space. This follows by noting that Talagrand’s condition (2.11) fails for the Bennett-Orlicz norm $\Psi_{2}(\cdot,L)$ as defined in (3.2).

6 Appendix 1: Lambert’s function $W$ ; inverses of $h$ and $h_{2}$

Let $h(x)\equiv x(\log x-1)+1$ and $h_{2}(x)\equiv h(1+x)$ for $x\geq 0$. The function $h$ is convex, decreasing on $[0,1]$, increasing on $[1,\infty)$, with $h(1)=0$; see Shorack and Wellner [1986], page 439. The Lambert function $W$, also called the product log function (see e.g. Corless et al. [1996]), satisfies $W(x)e^{W(x)}=x$ for $x\geq-1/e$. As noted by Boucheron et al. [2013], problem 2.18, the inverse functions $h^{-1}$ (for the function $h:[1,\infty)\rightarrow[0,\infty)$) and $h_{2}^{-1}$ (for the function $h_{2}:[0,\infty)\rightarrow[0,\infty)$) can be expressed in terms of the function $W$. Here are some facts about $W$:

Fact 1: $W:[-1/e,\infty)\mapsto{\mathbb{R}}$ is multi-valued on $[-1/e,0)$ with two branches $W_{0}$ and $W_{-1}$ where $W_{0}(x)\geq-1$, $W_{-1}(x)\leq-1$, and $W_{0}(-1/e)=-1=W_{-1}(-1/e)$.

Fact 2: $W_{0}$ is monotone increasing on $[-1/e,\infty)$ with $W(0)=0$ and $W^{\prime}(0)=1$ .

See Roy and Olver [2010], section 4.13, page 111; and Corless et al. [1996].
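The defining identity $W(x)e^{W(x)}=x$ and Facts 1 and 2 can be checked numerically. The following is a minimal sketch, not a production implementation: the function name `lambert_w0` and the bisection strategy are choices made here for illustration, and only the Python standard library is assumed.

```python
import math

def lambert_w0(x):
    """Principal branch W_0: the solution w >= -1 of w * exp(w) = x, for x >= -1/e."""
    if x < -1.0 / math.e:
        raise ValueError("W_0 is real-valued only for x >= -1/e")
    # w * exp(w) is increasing on [-1, inf), and (1 + x) * log(1 + x) >= x for x >= 0,
    # so the root lies in the bracket [-1, log(1 + max(x, 0))].
    lo, hi = -1.0, math.log1p(max(x, 0.0))
    for _ in range(200):  # plain bisection; many more halvings than doubles can resolve
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Print the residual of the defining identity at a few points.
for x in (-0.2, 0.0, 1.0, math.e, 100.0):
    w = lambert_w0(x)
    print(f"x = {x:10.4f}   W(x) = {w:.12f}   W(x) exp(W(x)) - x = {w * math.exp(w) - x:.2e}")
```

Bisection is slow but robust near the branch point $x=-1/e$, where Newton-type iterations divide by $1+W(x)\approx 0$; accuracy there is limited to roughly the square root of machine precision.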

In the following we simply write $W$ for $W_{0}$ . The following lemma shows that the inverses of the functions $h$ and $h_{2}$ can be expressed in terms of $W$ .

Lemma 6.1.

($h$ and $h_{2}$ inverses in terms of $W$)

(i) For $y\geq 0$

[TABLE]

(ii) For $y\geq 0$

[TABLE]

Proof. If $h^{-1}$ is as in the display we have, since $h(x)=x(\log x-1)+1$ ,

[TABLE]

Thus (6.1) holds. Then (6.2) follows immediately. $\Box$
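As a numerical sanity check of Lemma 6.1, the sketch below assumes the closed form $h^{-1}(y)=(y-1)/W((y-1)/e)$, which is what solving $h(x)=y$ with the substitution $x=e\,e^{w}$ gives; the displays (6.1) and (6.2) are not reproduced above, so this formula is a reconstruction rather than a quotation. The helper names are hypothetical, and `lambert_w0` is a simple bisection solver written for this example.

```python
import math

def lambert_w0(x):
    """Principal branch W_0 of the Lambert function on [-1/e, inf), by bisection."""
    if x < -1.0 / math.e:
        raise ValueError("W_0 is real-valued only for x >= -1/e")
    lo, hi = -1.0, math.log1p(max(x, 0.0))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def h(x):
    """h(x) = x (log x - 1) + 1 for x > 0."""
    return x * (math.log(x) - 1.0) + 1.0

def h_inverse(y):
    """Inverse of h restricted to [1, inf), for y >= 0.

    Assumed closed form: (y - 1) / W((y - 1) / e). Since W(u) exp(W(u)) = u,
    that ratio equals exp(1 + W((y - 1) / e)), which avoids 0/0 at y = 1.
    """
    if y < 0.0:
        raise ValueError("h maps [1, inf) onto [0, inf), so y must be >= 0")
    return math.exp(1.0 + lambert_w0((y - 1.0) / math.e))

def h2_inverse(y):
    """Inverse of h_2(x) = h(1 + x) on [0, inf)."""
    return h_inverse(y) - 1.0

# Round trip: h(h^{-1}(y)) should reproduce y.
for y in (0.0, 0.5, 1.0, 2.0, 10.0, 100.0):
    print(f"y = {y:7.2f}   h_inverse(y) = {h_inverse(y):.10f}   h(h_inverse(y)) = {h(h_inverse(y)):.10f}")
```

The round trip returns $y$ up to rounding error, and `h_inverse(1.0)` returns $e$, the limiting value of the ratio as $y\to 1$.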

In view of Lemma 6.1, the following lower bounds on the function $W$ will be useful in deriving upper bounds on $h^{-1}$ and $h_{2}^{-1}$ .

Lemma 6.2.

(A lower bound for $W$ ) For $z>0$

[TABLE]

Proof. We first prove (6.3) for $z\geq 1/e$ . Since $W(z)$ is increasing for $z\geq 0$ , the claimed inequality is equivalent to

[TABLE]

for $ez\geq 1$ where $y\equiv(ez)^{1/2}$ . But then the last display is equivalent to

[TABLE]

or

[TABLE]

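Now $g(1)=1$, $g(e)=0$, and $g^{\prime}(y)=2y-e-e\log y$ has $g^{\prime}(1)=2-e<0$ and $g^{\prime}(e)=0$. Since $g^{\prime\prime}(y)=2-e/y$ is increasing with $g^{\prime\prime}(e)=2-e/e=1>0$, the function $g^{\prime}$ is convex, so $g^{\prime}(y)\leq 0$ for $1\leq y\leq e$ and $g^{\prime}(y)>0$ for $y>e$. Hence $g$ attains its minimum over $[1,\infty)$ at $y=e$, where $g(e)=0$, and so $g\geq 0$ on $[1,\infty)$. Thus the claimed bound holds for $z\geq 1/e$. For $0\leq z<1/e$ the bound holds trivially since $W(z)\geq 0$ while $2^{-1}\log(ez)<0$. $\Box$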
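The reduction used in this proof can be checked numerically. The sketch below takes $g(y)=y^{2}-ey\log y$, the antiderivative of the stated $g^{\prime}$ with $g(e)=0$; this is an assumption, since the display defining $g$ is not reproduced above. It also checks the bound $W(z)\geq 2^{-1}\log(ez)$ without computing $W$, using the fact that $w\mapsto we^{w}$ is increasing on $[0,\infty)$. The function names are hypothetical.

```python
import math

def g(y):
    """g(y) = y^2 - e * y * log(y), assumed from the stated derivative g'(y) and g(e) = 0."""
    return y * y - math.e * y * math.log(y)

def g_prime(y):
    """g'(y) = 2y - e - e log(y), as in the proof."""
    return 2.0 * y - math.e - math.e * math.log(y)

def w_lower_bound_holds(z, tol=1e-12):
    """Check W(z) >= b with b = log(e z) / 2, without evaluating W.

    If b <= 0 the bound is trivial because W(z) >= 0 for z >= 0.
    Otherwise W(z) >= b is equivalent to b * exp(b) <= z by monotonicity of w * exp(w).
    """
    b = 0.5 * math.log(math.e * z)
    return b <= 0.0 or b * math.exp(b) <= z * (1.0 + tol)

grid = [1.0 + 0.001 * k for k in range(20001)]  # y in [1, 21]
y_min = min(grid, key=g)
print(f"g(1) = {g(1.0):.6f}, g(e) = {g(math.e):.2e}, grid minimiser = {y_min:.3f}")
print("bound holds on test points:", all(w_lower_bound_holds(10.0 ** k) for k in range(-8, 9)))
```

The grid minimiser sits next to $y=e$, which corresponds to $z=y^{2}/e=e$, where the bound is attained with equality: $W(e)=1=2^{-1}\log(e^{2})$.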

Combining Lemma 6.1 with the lower bounds for $W$ given in Lemma 6.2 yields the following upper bounds for $h^{-1}$ and $h_{2}^{-1}$ . The second and third parts of the following lemma are motivated by the fact that $h_{2}(x)=h(1+x)\equiv(x^{2}/2)\psi(x)$ where $\psi(x)\nearrow 1$ as $x\searrow 0$ ; see Shorack and Wellner [1986], Proposition 4.4.1, page 441.

Lemma 6.3.

(Upper bounds for $h^{-1}$ and $h_{2}^{-1}$)

(i) For $y>1+e$

[TABLE]

(ii) For $y>1+e$ ,

[TABLE]

(iii) For $0\leq y\leq 9c^{-2}(c^{2}/2-1)^{2}$ with $c>\sqrt{2}$ ,

[TABLE]

In particular, with $c=2$, the bound holds for $0\leq y\leq 9/4$, and with $c=2.2$, the bound holds for $0\leq y\leq 1+e$.

(iv) For $0<y<\infty$,

[TABLE]

Proof. (i) Follows from (i) of Lemma 6.1 together with Lemma 6.2. Note that $g(x)\equiv x/\log(x)\geq e$ and $g$ is increasing for $x\geq e$ .

(ii) follows from (ii) of Lemma 6.1 and Lemma 6.2.

(iii) To show that (6.6) holds, note that the inequality is equivalent to $y\leq h_{2}(c\sqrt{y})$ , and hence, by taking $x\equiv c\sqrt{y}$ , to the inequality

[TABLE]

where $\psi(x)\equiv(2/x^{2})h(1+x)\geq 1/(1+x/3)$ by Lemma 4.1 (iva) (or by (10) of Proposition 11.4.1, Shorack and Wellner [1986] page 441). But then we have

[TABLE]

where the last inequality holds if $0\leq x\leq 3(c^{2}/2-1)$ . Hence the inequality in (iii) holds for $0\leq y\equiv x^{2}/c^{2}\leq 9(c^{2}/2-1)^{2}/c^{2}$ . Finally, (iv) holds by combining the bounds in (ii) and (iii). $\Box$
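The argument for part (iii) can be illustrated numerically. The sketch below checks the equivalent form $y\leq h_{2}(c\sqrt{y})$ on a grid of the stated range $0\leq y\leq 9c^{-2}(c^{2}/2-1)^{2}$, the two special cases $c=2$ and $c=2.2$, and the lower bound $\psi(x)\geq 1/(1+x/3)$ used in the proof. The function names and grid sizes are choices made here, not part of the lemma.

```python
import math

def h2(x):
    """h_2(x) = h(1 + x) = (1 + x) log(1 + x) - x for x >= 0."""
    return (1.0 + x) * math.log1p(x) - x

def psi(x):
    """psi(x) = (2 / x^2) h_2(x), extended by its limit psi(0) = 1."""
    return 1.0 if x == 0.0 else 2.0 * h2(x) / (x * x)

def y_limit(c):
    """Right endpoint of the range in part (iii): 9 c^{-2} (c^2 / 2 - 1)^2."""
    if c <= math.sqrt(2.0):
        raise ValueError("part (iii) requires c > sqrt(2)")
    return 9.0 * (c * c / 2.0 - 1.0) ** 2 / (c * c)

def check_part_iii(c, n=2000, tol=1e-12):
    """Check y <= h_2(c sqrt(y)), i.e. h_2^{-1}(y) <= c sqrt(y), on a grid of [0, y_limit(c)]."""
    top = y_limit(c)
    ys = [k * top / n for k in range(n + 1)]
    return all(y <= h2(c * math.sqrt(y)) + tol for y in ys)

for c in (1.5, 2.0, 2.2, 3.0):
    print(f"c = {c:.1f}   range = [0, {y_limit(c):.4f}]   bound holds on grid: {check_part_iii(c)}")
```

With $c=2.2$ the right endpoint is about $3.7495$, slightly above $1+e\approx 3.7183$, which is why that choice of $c$ covers the gap left by parts (i) and (ii).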

7 Appendix 2: General versions of Lemmas 1-5

Now consider Young functions of the form $\Psi=e^{\psi}-1$ where $\psi$ is assumed to be convex and nondecreasing with $\psi(0)=0$ . (Note that we have changed notation in this section: the functions $h$ and $h_{j}$ for $j\in\{0,1,2,4,6\}$ in Sections 1 - 6 are denoted here by $\psi$ .) Our goal in this section is to give general versions of Lemmas 1 - 5 of van de Geer and Lederer [2013] and section 3 above. The advantage of this formulation is that the resulting lemmas apply to all the special cases treated in Sections 2 and 3 and more.

Lemma 7.1.

Suppose that $\tau\equiv\|Z\|_{\Psi}<\infty$ . Then for all $t>0$

[TABLE]

For the general version of Lemma 2 we consider a scaled version of $\Psi$ as follows:

[TABLE]

Lemma 7.2.

Suppose that for some $\tau>0$ and $L>0$

[TABLE]

Then $\|Z\|_{\Psi(\cdot;\sqrt{3}L)}\leq\sqrt{3}\tau$ .

Lemma 7.3.

Suppose that $\Psi$ is non-decreasing, convex, with $\Psi(0)=0$ . Suppose that $Z_{1},\ldots,Z_{m}$ are random variables with $\max_{1\leq j\leq m}\|Z_{j}\|_{\Psi}\equiv\tau<\infty$ . Then

[TABLE]

Lemma 7.4.

Suppose that $\Psi$ is non-decreasing, convex, with $\Psi(0)=0$ . Suppose that $Z_{1},\ldots,Z_{m}$ are random variables with $\max_{1\leq j\leq m}\|Z_{j}\|_{\Psi}\equiv\tau<\infty$ . Then

[TABLE]

Lemma 7.5.

Suppose that $\Psi$ is non-decreasing, convex, with $\Psi(0)=0$ . Suppose that $Z_{1},\ldots,Z_{m}$ are random variables with $\max_{1\leq j\leq m}\|Z_{j}\|_{\Psi}\equiv\tau<\infty$ . Then

[TABLE]

Proof of Lemma 7.1. For all $c>\|Z\|_{\Psi}$

[TABLE]

Thus letting $c\searrow\tau$ yields

[TABLE]

$\Box$
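To make Lemma 7.1 concrete, the following sketch computes $\|Z\|_{\Psi}=\inf\{c>0:E\Psi(|Z|/c)\leq 1\}$ by bisection for a finitely supported $Z$ and checks a tail bound on a grid. Several things here are assumptions: the displays of the lemma are not reproduced above, so the bound is taken in the standard form $P(|Z|>\tau\psi^{-1}(t))\leq 2e^{-t}$, checked in the equivalent form $P(|Z|>\tau x)\leq 2e^{-\psi(x)}$; the exponent $\psi(x)=(1+x)\log(1+x)-x$ is an unscaled Bennett-type choice made for illustration; and the distribution `VALUES`, `PROBS` is hypothetical.

```python
import math

VALUES = [0.0, 1.0, 3.0]   # hypothetical support of Z
PROBS = [0.5, 0.4, 0.1]    # hypothetical probabilities

def psi(x):
    """Illustrative exponent: convex, nondecreasing, psi(0) = 0."""
    return (1.0 + x) * math.log1p(x) - x

def young(x):
    """Young function Psi = exp(psi) - 1."""
    return math.expm1(psi(x))

def expected_young(values, probs, c):
    """E Psi(|Z| / c) for a finitely supported Z."""
    return sum(p * young(abs(v) / c) for v, p in zip(values, probs))

def orlicz_norm(values, probs, iters=200):
    """inf{c > 0 : E Psi(|Z| / c) <= 1}, by bisection (c -> E Psi(|Z|/c) is decreasing)."""
    if not any(v != 0.0 and p > 0.0 for v, p in zip(values, probs)):
        return 0.0  # Z = 0 almost surely
    hi = max(abs(v) for v in values)
    while expected_young(values, probs, hi) > 1.0:
        hi *= 2.0
    lo = hi
    while expected_young(values, probs, lo) <= 1.0:
        lo /= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_young(values, probs, mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

def tail_prob(values, probs, t):
    """P(|Z| > t)."""
    return sum(p for v, p in zip(values, probs) if abs(v) > t)

tau = orlicz_norm(VALUES, PROBS)
print(f"tau = {tau:.6f},  E Psi(|Z|/tau) = {expected_young(VALUES, PROBS, tau):.6f}")
for x in (0.5, 1.0, 2.0, 3.0):
    print(f"x = {x:.1f}   P(|Z| > tau x) = {tail_prob(VALUES, PROBS, tau * x):.3f}"
          f"   2 exp(-psi(x)) = {2.0 * math.exp(-psi(x)):.3f}")
```

The check mirrors the Markov-inequality argument: on the event $|Z|>\tau x$ one has $\Psi(|Z|/\tau)+1>e^{\psi(x)}$, and $E\Psi(|Z|/\tau)+1\leq 2$.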

Proof of Lemma 7.2. Let $\alpha,\beta>0$ . We compute

[TABLE]

by choosing $\alpha=\beta=\sqrt{3}$ . $\Box$

Proof of Lemma 7.3. Let $c>\tau$ . Then by Jensen’s inequality and convexity of $\Psi$

[TABLE]

Letting $c\searrow\tau$ yields

[TABLE]

$\Box$
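The Jensen argument in this proof leads to a bound of the form $E\max_{1\leq j\leq m}|Z_{j}|\leq\tau\Psi^{-1}(m)$; since the display of Lemma 7.3 is not reproduced above, that form is an assumption here. The sketch below compares both sides for i.i.d. copies of a finitely supported variable, where $E\max_{j}|Z_{j}|$ can be computed exactly. Independence is used only to make the left side computable and is not needed for the bound. The exponent $\psi$, the distribution, and all function names are hypothetical choices.

```python
import math

VALUES = [0.0, 1.0, 3.0]   # hypothetical support of Z
PROBS = [0.5, 0.4, 0.1]    # hypothetical probabilities

def young(x):
    """Psi = exp(psi) - 1 with the illustrative exponent psi(x) = (1 + x) log(1 + x) - x."""
    return math.expm1((1.0 + x) * math.log1p(x) - x)

def solve_increasing(f, target, iters=200):
    """Solve f(x) = target on [0, inf) for an increasing f with f(0) <= target."""
    hi = 1.0
    while f(hi) < target:
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def orlicz_norm(values, probs):
    """inf{c > 0 : E Psi(|Z|/c) <= 1}, found as 1/s where E Psi(s |Z|) = 1."""
    if not any(v != 0.0 and p > 0.0 for v, p in zip(values, probs)):
        raise ValueError("Z must not be identically zero")
    mean_young = lambda s: sum(p * young(s * abs(v)) for v, p in zip(values, probs))
    return 1.0 / solve_increasing(mean_young, 1.0)

def expected_max_iid(values, probs, m):
    """Exact E max_{j <= m} |Z_j| for i.i.d. copies, via P(max <= a) = F(a)^m."""
    mass = {}
    for v, p in zip(values, probs):
        mass[abs(v)] = mass.get(abs(v), 0.0) + p
    total, cdf_prev = 0.0, 0.0
    for a in sorted(mass):
        cdf = cdf_prev + mass[a]
        total += a * (cdf ** m - cdf_prev ** m)
        cdf_prev = cdf
    return total

tau = orlicz_norm(VALUES, PROBS)
for m in (1, 2, 5, 50, 1000):
    lhs = expected_max_iid(VALUES, PROBS, m)
    rhs = tau * solve_increasing(young, float(m))
    print(f"m = {m:5d}   E max = {lhs:.4f}   tau * Psi^(-1)(m) = {rhs:.4f}")
```

For this bounded example the left side saturates at $\max|Z|=3$ while the right side keeps growing with $m$, so the bound is loose for large $m$; it is sharper for variables whose tails actually match the Young function.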

Proof of Lemma 7.4. For any $u>0$ and $v>0$ concavity of $\psi^{-1}$ implies that

[TABLE]

Therefore, by using this with $u=\log(1+m)$ and $v=t$ , a union bound, and Lemma 7.1,

[TABLE]

$\Box$

Proof of Lemma 7.5. By Lemma 7.4

[TABLE]

so the hypothesis of Lemma 7.2 holds for

[TABLE]

with $L=\sqrt{2}$ and $\tau$ replaced by $\sqrt{2}\tau$ . Thus the conclusion of Lemma 7.2 holds for $Z$ with these choices of $L$ and $\tau$ : $\|Z\|_{\Psi(\cdot;\sqrt{6})}\leq\sqrt{6}\tau$ . $\Box$

**Acknowledgement:** I owe thanks to Evan Greene and Johannes Lederer for several helpful conversations and suggestions. Thanks are also due to Richard Nickl for a query concerning Prokhorov’s inequality.

Bibliography

Arcones, M. A. and Giné, E. (1995). On the law of the iterated logarithm for canonical $U$-statistics and processes. Stochastic Process. Appl., 58(2), 217–245.
Bennett, G. (1962). Probability inequalities for the sum of independent random variables. Journal of the American Statistical Association, 57, 33–45.
Birgé, L. and Massart, P. (1998). Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli, 4(3), 329–375.
Boucheron, S., Lugosi, G., and Massart, P. (2013). Concentration Inequalities. Oxford University Press, Oxford.
Corless, R. M., Gonnet, G. H., Hare, D. E. G., Jeffrey, D. J., and Knuth, D. E. (1996). On the Lambert $W$ function. Adv. Comput. Math., 5(4), 329–359.
de la Peña, V. H. and Giné, E. (1999). Decoupling: From dependence to independence. Probability and its Applications (New York). Springer-Verlag, New York.
Dudley, R. M. (1999). Uniform Central Limit Theorems, volume 63 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge.
Ghosh, S. and Goldstein, L. (2011a). Applications of size biased couplings for concentration of measures. Electron. Commun. Probab., 16, 70–83.
