Modified log-Sobolev inequalities and two-level concentration

Holger Sambale; Arthur Sinulis

arXiv:1905.06137·math.PR·April 13, 2021


TL;DR

This paper establishes that modified log-Sobolev inequalities lead to two-level concentration inequalities, applicable in continuous and discrete settings, and demonstrates their use in proving Talagrand's inequality and analyzing fluctuations of statistics.

Contribution

It introduces a general framework connecting modified log-Sobolev inequalities to two-level concentration and applies it to symmetric groups and hypercube slices.

Findings

(1) Derived two-level concentration inequalities from mLSI.

(2) Proved Talagrand's convex distance inequality using mLSI.

(3) Obtained fluctuation orders consistent with CLTs for known statistics.


Tables

Table 1. Invariance and probabilistic properties of the four functions $H$ (Hamming distance), $D$ (Spearman’s footrule), $S^{2}$ (Spearman’s rank correlation) and $I$ (Kendall’s $\tau$). This table has been extracted from information in [Dia88, Chapter 6].

function $d$	invariance	mean $\operatorname{\mathbb{E}} d(\mathrm{id}, \cdot)$	$\mathrm{Var}(d(\mathrm{id}, \cdot))$	limit theorem
$H$	bi-invariant	$n - 1$	$1$	$n - H \Rightarrow \mathrm{Poi}(1)$
$D$	right invariant	$\frac{n^{2} - 1}{3}$	$\frac{(n + 1) (2 n^{2} + 7)}{45}$	CLT
$S^{2}$	right invariant	$\frac{n (n^{2} - 1)}{6}$	$\frac{n^{2} (n - 1) {(n + 1)}^{2}}{36}$	CLT
$I$	right invariant	$\frac{n (n - 1)}{4}$	$\frac{n (n - 1) (2 n + 5)}{72}$	CLT

Equations

\operatorname{\mathbb{P}}\big(f(X)-\operatorname{\mathbb{E}}f(X)\geq t\big)\leq C\exp\Big(-\frac{t^{2}}{2K^{2}}\Big)

\operatorname{\mathbb{P}}\big(f(X)-\operatorname{\mathbb{E}}f(X)\geq t\big)\leq C\exp\Big(-\frac{t^{2}}{2(a+bt)}\Big)

\operatorname{\mathbb{P}}\big(f(X)-\operatorname{\mathbb{E}}f(X)\geq t\big)\leq C\exp\Big(-\min\Big(\frac{t^{2}}{a},\frac{t}{b}\Big)\Big).

\mathrm{Ent}_{\mu}(e^{f})\leq\frac{\rho}{2}\operatorname{\mathbb{E}}_{\mu}\Gamma(f)^{2}e^{f},

\mu(f-\operatorname{\mathbb{E}}_{\mu}f\geq t)\leq\exp\Big(-\frac{t^{2}}{2\rho}\Big),

\mu(g\geq c+t)\leq C\exp\Big(-\frac{t^{2}}{2K^{2}}\Big).

\mu\big(f-\operatorname{\mathbb{E}}_{\mu}f\geq t\big)\leq\frac{4C}{3}\exp\Big(-\frac{1}{8}\min\Big(\frac{t^{2}}{\rho c^{2}},\frac{t}{\sqrt{\rho}K}\Big)\Big).

\mu\big(\lvert f-\operatorname{\mathbb{E}}_{\mu}f\rvert\geq t\big)\leq 2C\exp\Big(-\frac{1}{12}\min\Big(\frac{t^{2}}{\rho c^{2}},\frac{t}{\sqrt{\rho}K}\Big)\Big).

\mu\big(f-\operatorname{\mathbb{E}}_{\mu}f\geq t\big)\leq\frac{4}{3}\exp\Big(-\frac{1}{8\rho}\min\Big(\frac{t^{2}}{(\operatorname{\mathbb{E}}_{\mu}g)^{2}},\frac{t}{b}\Big)\Big).

\mu\big(f-\operatorname{\mathbb{E}}_{\mu}f\geq t\big)\leq\exp\Big(-c\min\Big(\frac{t^{2}}{\rho(\operatorname{\mathbb{E}}_{\mu}g)^{2}+2b^{2}\rho^{2}},\frac{t}{\sqrt{2}\rho b}\Big)\Big)

\mu(f-\operatorname{\mathbb{E}}_{\mu}f\geq t)\leq\exp\Big(-\frac{1}{4\rho}\min\Big(\frac{t^{2}}{2\operatorname{\mathbb{E}}_{\mu}g^{2}},\frac{t}{b}\Big)\Big).

\frac{t^{2}}{a^{2}+bt}\leq\min\Big(\frac{t^{2}}{a^{2}},\frac{t}{b}\Big)\leq\frac{2t^{2}}{a^{2}+bt}.

\mu\big(f-\operatorname{\mathbb{E}}_{\mu}f\geq t\big)\leq\frac{4C}{3}\exp\Big(-\frac{t^{2}}{8(\rho c^{2}+\sqrt{\rho}Kt)}\Big).

\Gamma(f)^{2}\leq af+b

\mu\big(f-\operatorname{\mathbb{E}}_{\mu}f\geq t\big)\leq\exp\Big(-\frac{t^{2}}{2\rho(2a\operatorname{\mathbb{E}}_{\mu}f+2b+\frac{1}{3}at)}\Big).

\mu\big(\operatorname{\mathbb{E}}_{\mu}f-f\geq t\big)\leq\exp\Big(-\frac{t^{2}}{2\rho(2a\operatorname{\mathbb{E}}_{\mu}f+2b+\frac{1}{3}at)}\Big).

\Gamma(f)(\sigma)^{2}

\Gamma^{+}(f)(\sigma)^{2}

\mathrm{ObsDiam}(S_{n},d)\coloneqq\max_{\sigma\in S_{n}}n^{-1}\sum_{i,j}d(\sigma,\sigma\tau_{ij})^{2}.

\mathrm{ObsDiam}(S_{n},d)=n^{-1}\sum_{i,j}d(\mathrm{id},\tau_{ij})^{2}.

\pi_{n}(\lvert f-\operatorname{\mathbb{E}}_{\pi_{n}}f\rvert\geq t)\leq 2\exp\Big(-\frac{t^{2}}{2\mathrm{ObsDiam}(S_{n},d)}\Big).

\mathrm{Var}_{\pi_{n}}(f)\leq 4\,\mathrm{ObsDiam}(S_{n},d).

d_{T}(\omega,A)\coloneqq\sup_{\alpha\in\mathbb{R}^{n}:\lvert\alpha\rvert_{2}=1}d_{\alpha}(\omega,A),

d_{\alpha}(\omega,A)\coloneqq\inf_{\omega^{\prime}\in A}d_{\alpha}(\omega,\omega^{\prime})\coloneqq\inf_{\omega^{\prime}\in A}\sum_{i=1}^{n}\lvert\alpha_{i}\rvert\mathbbm{1}_{\omega_{i}\neq\omega_{i}^{\prime}}.

\pi_{n}(A)\operatorname{\mathbb{E}}_{\pi_{n}}\exp\Big(\frac{d_{T}(\cdot,A)^{2}}{144}\Big)\leq 1.

\pi_{n}(d_{T}(\cdot,A)\geq t)\leq 2\exp\Big(-\frac{t^{2}}{64}\Big).

\Gamma(f)(\eta)^{2}

\Gamma^{+}(f)(\sigma)^{2}

\mu_{n,r}(A)\operatorname{\mathbb{E}}_{\mu_{n,r}}\exp\Big(\frac{d_{T}(\cdot,A)^{2}}{544}\Big)\leq 1.

H(\pi,\sigma)

D(\pi,\sigma)

S(\pi,\sigma)

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text


Modified log-Sobolev inequalities and two-level concentration

Holger Sambale and Arthur Sinulis

Faculty of Mathematics, Bielefeld University, Bielefeld, Germany

{hsambale, asinulis}@math.uni-bielefeld.de

Abstract.

We consider a generic modified logarithmic Sobolev inequality (mLSI) of the form $\mathrm{Ent}_{\mu}(e^{f})\leq\tfrac{\rho}{2}\operatorname{\mathbb{E}}_{\mu}e^{f}\Gamma(f)^{2}$ for some difference operator $\Gamma$ , and show how it implies two-level concentration inequalities akin to the Hanson–Wright or Bernstein inequality. This can be applied to the continuous setting (e.g. the sphere or bounded perturbations of product measures) as well as the discrete setting (the symmetric group, finite measures satisfying an approximate tensorization property, …).

Moreover, we use modified logarithmic Sobolev inequalities on the symmetric group $S_{n}$ and for slices of the hypercube to prove Talagrand’s convex distance inequality, and provide concentration inequalities for locally Lipschitz functions on $S_{n}$ . Some examples of known statistics are worked out, for which we obtain the correct order of fluctuations, which is consistent with central limit theorems.

Key words and phrases:

Bernstein inequality, concentration of measure phenomenon, convex distance inequality, Hanson–Wright inequality, modified logarithmic Sobolev inequality, symmetric group

This research was supported by the German Research Foundation (DFG) via CRC 1283 “Taming uncertainty and profiting from randomness and low regularity in analysis, stochastics and their applications”.

1. Introduction

Concentration and one-sided deviation inequalities have become an indispensable tool of probability theory and its applications. A question that arises frequently is to bound the fluctuations of a function $f=f(X_{1},\ldots,X_{n})$ of many random variables (or, equivalently, a function on a product space) around its mean, and often times it is possible to prove sub-Gaussian tail decay of the form

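$$\operatorname{\mathbb{P}}\big(f(X)-\operatorname{\mathbb{E}}f(X)\geq t\big)\leq C\exp\Big(-\frac{t^{2}}{2K^{2}}\Big)$$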

for some $C\geq 1$ , $K^{2}>0$ and all $t\geq 0$ . There are various ways to establish sub-Gaussian estimates, such as the martingale method, the entropy method and an information-theoretic approach, and we refer to the monograph [BLM13] for further details.

On the other hand, in some situations it is not possible to prove sub-Gaussian tails, and a suitable replacement might be Bernstein-type

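$$\operatorname{\mathbb{P}}\big(f(X)-\operatorname{\mathbb{E}}f(X)\geq t\big)\leq C\exp\Big(-\frac{t^{2}}{2(a+bt)}\Big)$$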

or Hanson–Wright-type inequalities

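$$\operatorname{\mathbb{P}}\big(f(X)-\operatorname{\mathbb{E}}f(X)\geq t\big)\leq C\exp\Big(-\min\Big(\frac{t^{2}}{a},\frac{t}{b}\Big)\Big).$$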

As both inequalities show two different levels of tail decay (a Gaussian one for $t\leq ab^{-1}$ and an exponential one for $t>ab^{-1}$ ), we use the terminology of Adamczak (see [ABW17, AKPS19]) and call inequalities of this type two-level deviation inequalities. If a similar estimate holds for $-f(X)$ as well, we refer to these as two-level concentration inequalities.

The purpose of this note is to give a unified treatment of some of the existing literature on two-level deviation and concentration inequalities by showing that these are implied by a modified logarithmic Sobolev inequality (mLSI for short). We prove a general theorem providing two-level deviation and concentration inequalities in various frameworks. In particular, in Section 2, we get back and partially improve a number of earlier results like [BCG17] and [GS20].

We work in a general framework which was introduced in [BG99]. Consider a probability space $(\Omega,\mathcal{F},\mu)$ and let $\operatorname{\mathbb{E}}_{\mu}f$ denote the expectation of a random variable $f$ with respect to $\mu$ . An operator $\Gamma$ on a class $\mathcal{A}$ of bounded, measurable functions is called a difference operator, if

(1) for all $f\in\mathcal{A}$ , $\Gamma(f)$ is a non-negative measurable function,

(2) for all $f\in\mathcal{A}$ and $a\geq 0,b\in\operatorname{\mathbb{R}}$ we have $af+b\in\mathcal{A}$ and $\Gamma(af+b)=a\Gamma(f)$ .

At first reading, one can think of $\Gamma(f)=\lvert\nabla f\rvert$ in the setting $\Omega=\operatorname{\mathbb{R}}^{n}$ . However, we want to stress that we do not require $\Gamma$ to satisfy a chain rule, and $\Gamma$ does not need to be an operator in the language of functional analysis.

We say that $\mu$ satisfies a $\Gamma\mathrm{-mLSI}(\rho)$ for some $\rho>0$ , if for all $f\in\mathcal{A}$ we have

$$\mathrm{Ent}_{\mu}(e^{f})\leq\frac{\rho}{2}\operatorname{\mathbb{E}}_{\mu}\Gamma(f)^{2}e^{f},$$

where $\mathrm{Ent}_{\mu}(f)=\operatorname{\mathbb{E}}_{\mu}f\log f-\operatorname{\mathbb{E}}_{\mu}f\log(\operatorname{\mathbb{E}}_{\mu}f)$ ( $f\geq 0$ ) is the entropy functional. This functional inequality is well-known in the theory of concentration of measure and has been used in various works, see [BG99] and the references therein. It is well-known that if $\mu$ satisfies a $\Gamma\mathrm{-mLSI}(\rho)$ , we have for any function $f\in\mathcal{A}$ such that $\Gamma(f)\leq 1$ ,

$$\mu(f-\operatorname{\mathbb{E}}_{\mu}f\geq t)\leq\exp\Big(-\frac{t^{2}}{2\rho}\Big),\tag{1.2}$$

which is a classical first order concentration of measure result yielding subgaussian concentration (cf. (3.5)). It is not hard to see that the same holds for $-f$ if $\Gamma(af)=\lvert a\rvert\Gamma(f)$ for all $a\in\mathbb{R}$ . Our first goal is to establish second order analogues of (1.2).
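For a measure with finitely many atoms, the entropy functional $\mathrm{Ent}_{\mu}$ defined above can be evaluated directly. The following Python sketch is only an illustration: the function name `entropy_functional` and the two-point toy measure are assumptions introduced here, not objects from the paper.

```python
import math

def entropy_functional(weights, values):
    """Ent_mu(f) = E_mu[f log f] - E_mu[f] * log(E_mu[f]) for f >= 0.

    weights: probabilities mu({omega}) of the atoms (summing to 1)
    values:  f(omega) >= 0 on the same atoms
    The convention 0 * log 0 = 0 is used.
    """
    mean = sum(w * v for w, v in zip(weights, values))
    e_f_log_f = sum(w * v * math.log(v) for w, v in zip(weights, values) if v > 0)
    return e_f_log_f - (mean * math.log(mean) if mean > 0 else 0.0)

# Hypothetical toy example: uniform measure on two points, f = (0, 1).
mu = [0.5, 0.5]
f = [0.0, 1.0]
ent_exp_f = entropy_functional(mu, [math.exp(x) for x in f])  # left-hand side of the mLSI
ent_const = entropy_functional(mu, [2.0, 2.0])                # vanishes for constants
```

The quantity `ent_exp_f` is the left-hand side $\mathrm{Ent}_{\mu}(e^{f})$ of the mLSI for this toy $f$; by Jensen's inequality it is non-negative and it vanishes exactly for constant functions.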

1.1. Two-level concentration inequalities

Our first set of results are two-level deviation inequalities for probability measures satisfying a modified logarithmic Sobolev inequality.

Theorem 1.1.

Assume that $\mu$ satisfies a $\Gamma\mathrm{-mLSI}(\rho)$ for some difference operator $\Gamma$ and $\rho>0$ . Let $f,g:\Omega\to\operatorname{\mathbb{R}}$ be two measurable functions such that $\Gamma(f)\leq g$ and $g$ is sub-Gaussian, i. e. for some $c>0$ , $C\geq 1$ and $K>0$

$$\mu(g\geq c+t)\leq C\exp\Big(-\frac{t^{2}}{2K^{2}}\Big).$$

Then for all $t\geq 0$ it holds

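$$\mu\big(f-\operatorname{\mathbb{E}}_{\mu}f\geq t\big)\leq\frac{4C}{3}\exp\Big(-\frac{1}{8}\min\Big(\frac{t^{2}}{\rho c^{2}},\frac{t}{\sqrt{\rho}K}\Big)\Big).$$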

If moreover $\Gamma(af)=\lvert a\rvert\Gamma(f)$ for all $a\in\operatorname{\mathbb{R}}$ , we have

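$$\mu\big(\lvert f-\operatorname{\mathbb{E}}_{\mu}f\rvert\geq t\big)\leq 2C\exp\Big(-\frac{1}{12}\min\Big(\frac{t^{2}}{\rho c^{2}},\frac{t}{\sqrt{\rho}K}\Big)\Big).$$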

One possible way to show sub-Gaussian concentration for $g$ in presence of a $\Gamma\mathrm{-mLSI}(\rho)$ is by the Herbst argument. This leads to the following corollary.

Corollary 1.2.

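Assume that $\mu$ satisfies a $\Gamma\mathrm{-mLSI}(\rho)$ for some difference operator $\Gamma$ and $\rho>0$ . Let $f,g$ be two measurable functions such that $\Gamma(f)\leq g$ and $\Gamma(g)\leq b$ . Then for all $t\geq 0$ we have

$$\mu\big(f-\operatorname{\mathbb{E}}_{\mu}f\geq t\big)\leq\frac{4}{3}\exp\Big(-\frac{1}{8\rho}\min\Big(\frac{t^{2}}{(\operatorname{\mathbb{E}}_{\mu}g)^{2}},\frac{t}{b}\Big)\Big).$$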

If, again, $\Gamma(af)=\lvert a\rvert\Gamma(f)$ for all $a\in\operatorname{\mathbb{R}}$ , then the same bound holds for $-f$ .

By elementary means (cf. (3.1)), the constant $4C/3$ can be replaced by any $C^{\prime}>1$ . It is also possible to modify our proofs in order to apply [KZ18, Lemma 1.3], which leads to an inequality of the form

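$$\mu\big(f-\operatorname{\mathbb{E}}_{\mu}f\geq t\big)\leq\exp\Big(-c\min\Big(\frac{t^{2}}{\rho(\operatorname{\mathbb{E}}_{\mu}g)^{2}+2b^{2}\rho^{2}},\frac{t}{\sqrt{2}\rho b}\Big)\Big)$$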

for some absolute constant $c$ (the same one as in [KZ18]). However, this is at the cost of a weaker denominator in the Gaussian term as compared to (1.4), and so we choose to present it in the form of Theorem 1.1.

If the difference operator $\Gamma$ satisfies a chain rule-type condition, we obtain the following result, especially improving some of the constants above:

Proposition 1.3.

Assume that $\mu$ satisfies a $\Gamma\mathrm{-mLSI}(\rho)$ for some $\rho>0$ and some difference operator $\Gamma$ which satisfies $\Gamma(g^{2})\leq 2g\Gamma(g)$ for all positive functions $g$ . Let $f\in\mathcal{A}$ be such that $\Gamma(f)\leq g$ and $\Gamma(g)\leq b$ . For any $t\geq 0$ it holds

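$$\mu(f-\operatorname{\mathbb{E}}_{\mu}f\geq t)\leq\exp\Big(-\frac{1}{4\rho}\min\Big(\frac{t^{2}}{2\operatorname{\mathbb{E}}_{\mu}g^{2}},\frac{t}{b}\Big)\Big).$$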

If $\Gamma$ satisfies $\Gamma(g^{2})\leq 2\lvert g\rvert\Gamma(g)$ for any $g\in\mathcal{A}$ , the same bound holds for $-f$ .

We will see a number of examples of such difference operators all along this paper. Obviously, one example is the usual gradient, but also many difference operators involving a positive part satisfy the property in question.

In all the above results, a possible choice of $g$ is usually given by $g=\Gamma(f)$ , resulting in $\operatorname{\mathbb{E}}_{\mu}\Gamma(f)$ in the denominator of the Gaussian term. In this case, the second condition reads as $\Gamma(\Gamma(f))\leq b$ , which can be understood as a condition on an iterated (and thus second order) difference of $f$ .

In fact, Theorem 1.1 can be understood as a Bernstein-type concentration inequality. Indeed, it is easy to see that for all $a,b>0$ and $t\geq 0$ we have

$$\frac{t^{2}}{a^{2}+bt}\leq\min\Big(\frac{t^{2}}{a^{2}},\frac{t}{b}\Big)\leq\frac{2t^{2}}{a^{2}+bt}.$$

This leads to the following corollary.

Corollary 1.4.

In the situation of Theorem 1.1, for all $t\geq 0$ we have

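$$\mu\big(f-\operatorname{\mathbb{E}}_{\mu}f\geq t\big)\leq\frac{4C}{3}\exp\Big(-\frac{t^{2}}{8(\rho c^{2}+\sqrt{\rho}Kt)}\Big).$$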

If $\Gamma(af)=\lvert a\rvert\Gamma(f)$ for all $a\in\operatorname{\mathbb{R}}$ , then the same bound holds with $f$ replaced by $-f$ .
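The passage from the two-level form to the Bernstein form rests on the elementary comparison $\frac{t^{2}}{a^{2}+bt}\leq\min\big(\frac{t^{2}}{a^{2}},\frac{t}{b}\big)\leq\frac{2t^{2}}{a^{2}+bt}$, which follows from $\max(a^{2},bt)\leq a^{2}+bt\leq 2\max(a^{2},bt)$. The Python sketch below checks it numerically on a logarithmic grid; the function names and the grid are illustrative choices made here, not part of the paper.

```python
def bernstein_exponent(t, a, b):
    """Exponent t^2 / (a^2 + b t) of the Bernstein-type form."""
    return t * t / (a * a + b * t)

def two_level_exponent(t, a, b):
    """Exponent min(t^2 / a^2, t / b) of the two-level form."""
    return min(t * t / (a * a), t / b)

def sandwich_holds(t, a, b, tol=1e-12):
    """Check bernstein <= two_level <= 2 * bernstein up to relative rounding error."""
    lower = bernstein_exponent(t, a, b)
    middle = two_level_exponent(t, a, b)
    return lower <= middle * (1 + tol) and middle <= 2 * lower * (1 + tol)

# Illustrative grid of positive parameters spanning several orders of magnitude.
grid = [10.0 ** k for k in range(-3, 4)]
all_ok = all(sandwich_holds(t, a, b) for t in grid for a in grid for b in grid)
```

The upper bound is attained when $a^{2}=bt$, which is the crossover point between the Gaussian and the exponential regime; this is why a small relative tolerance is used in the comparison.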

Let us remark that the use of modified LSIs allows us to prove results for some classes of measures we could not address in previous work (e. g. [GSS18b]), e. g. weakly dependent measures which might not have a finite number of atoms.

Next, we show similar deviation inequalities for an important class of functions, namely self-bounded functions. In our framework, for a difference operator $\Gamma$ we say that $f\geq 0$ is a $\Gamma$-$(a,b)$-self-bounded function, if

$$\Gamma(f)^{2}\leq af+b$$

for some constants $a,b\geq 0$ . For a product measure $\mu$ , there are various sources that provide deviation or concentration inequalities for self-bounded functions, see e. g. [BLM00, Theorem 2.1], [Rio01, Théorème 3.1], [BLM03, Theorem 5], [BBLM05, Corollary 1], [Cha05, Theorem 3.9], [MR06, Theorem 1] and [BLM09, Theorem 1]. As many of the proofs rely on the entropy method, it is not hard to adapt them to obtain Bernstein-type deviation inequalities only requiring an mLSI, which includes many more types of measures also allowing for dependencies:

Proposition 1.5.

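Assume that $\mu$ satisfies a $\Gamma\mathrm{-mLSI}(\rho)$ and let $f\geq 0$ be a $\Gamma$-$(a,b)$-self-bounded function. Then for all $t\geq 0$ we have

$$\mu\big(f-\operatorname{\mathbb{E}}_{\mu}f\geq t\big)\leq\exp\Big(-\frac{t^{2}}{2\rho(2a\operatorname{\mathbb{E}}_{\mu}f+2b+\frac{1}{3}at)}\Big).$$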

If, additionally, $\Gamma(\lambda f)=\lvert\lambda\rvert\Gamma(f)$ for all $\lambda\in\operatorname{\mathbb{R}}$ , then for all $t\in[0,\operatorname{\mathbb{E}}_{\mu}f]$ it holds

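$$\mu\big(\operatorname{\mathbb{E}}_{\mu}f-f\geq t\big)\leq\exp\Big(-\frac{t^{2}}{2\rho(2a\operatorname{\mathbb{E}}_{\mu}f+2b+\frac{1}{3}at)}\Big).$$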

As we show in Proposition 2.18, product measures always satisfy an mLSI with respect to a certain $L^{2}$ -type difference operator, which was also used in the works mentioned above. This is a well-known fact and was first proven in [Mas00].

1.2. The symmetric group

One example we especially discuss in this note is the symmetric group $S_{n}$ equipped with the uniform measure. To this end, we need some notations. We write the group operation on $S_{n}$ as $\tau\sigma$ for $\tau,\sigma\in S_{n}$ , and denote by $\tau_{ij}$ the transposition of $i$ and $j$ . We define two difference operators (on $\mathcal{A}=L^{\infty}(\pi_{n})=\operatorname{\mathbb{R}}^{S_{n}}$ ) via

[TABLE]

For our results, we will need that the symmetric group satisfies modified logarithmic Sobolev inequalities with respect to the two difference operators defined above:

Proposition 1.6.

Let $(S_{n},\pi_{n})$ be the symmetric group equipped with the uniform measure. Then a $\Gamma\mathrm{-mLSI}(1)$ and a $\Gamma^{+}\mathrm{-mLSI}(2)$ hold.

To formulate our next result, let us recall the notion of observable diameter. In the context of $S_{n}$ equipped with any metric $d$ , we define it by

$$\mathrm{ObsDiam}(S_{n},d)\coloneqq\max_{\sigma\in S_{n}}n^{-1}\sum_{i,j}d(\sigma,\sigma\tau_{ij})^{2}.$$

For some metrics, this expression can be simplified. We say that a metric is right invariant, if for any $\pi,\sigma,\tau\in S_{n}$ we have $d(\pi,\sigma)=d(\pi\tau,\sigma\tau)$ , and left invariant if $d(\pi,\sigma)=d(\tau\pi,\tau\sigma)$ . It is bi-invariant, if it is right and left invariant. Assuming that $d$ is left (or right) invariant, we have

$$\mathrm{ObsDiam}(S_{n},d)=n^{-1}\sum_{i,j}d(\mathrm{id},\tau_{ij})^{2}.$$

We call a function $f:S_{n}\to\operatorname{\mathbb{R}}$ locally Lipschitz with respect to $d$ , if for all $\sigma\in S_{n}$ and $i,j\in\{1,\ldots,n\}$ we have $\lvert f(\sigma)-f(\sigma\tau_{ij})\rvert\leq d(\sigma,\sigma\tau_{ij})$ .

Theorem 1.7.

Let $(S_{n},d)$ be the symmetric group equipped with a metric $d$ and $\pi_{n}$ be the uniform distribution on $S_{n}$ . Assume that $f:S_{n}\to\operatorname{\mathbb{R}}$ is locally Lipschitz with respect to $d$ . For all $t\geq 0$ it holds

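$$\pi_{n}(\lvert f-\operatorname{\mathbb{E}}_{\pi_{n}}f\rvert\geq t)\leq 2\exp\Big(-\frac{t^{2}}{2\mathrm{ObsDiam}(S_{n},d)}\Big).$$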

As a consequence, we have

$$\mathrm{Var}_{\pi_{n}}(f)\leq 4\,\mathrm{ObsDiam}(S_{n},d).$$

For example, Theorem 1.7 can easily recover concentration inequalities for locally Lipschitz functions with respect to the normalized Hamming distance $d_{H}(\sigma,\pi)=n^{-1}\sum_{i=1}^{n}\mathbbm{1}_{\sigma(i)\neq\pi(i)}$ . In this case, $\mathrm{ObsDiam}(S_{n},d_{H})\leq 4$ . We work out further examples in Subsection 2.1.

Finally, we give a proof of Talagrand’s famous concentration inequality for the convex distance for random permutations by means similar to those used in the proofs of the results above. To this end, recall that for any measurable space $\Omega$ and any $\omega=(\omega_{1},\ldots,\omega_{n})\in\Omega^{n}$ , we may define the convex distance of $\omega$ to some measurable set $A\subset\Omega^{n}$ by

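$$d_{T}(\omega,A)\coloneqq\sup_{\alpha\in\operatorname{\mathbb{R}}^{n}:\lvert\alpha\rvert_{2}=1}d_{\alpha}(\omega,A),$$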

where

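$$d_{\alpha}(\omega,A)\coloneqq\inf_{\omega^{\prime}\in A}d_{\alpha}(\omega,\omega^{\prime})\coloneqq\inf_{\omega^{\prime}\in A}\sum_{i=1}^{n}\lvert\alpha_{i}\rvert\mathbbm{1}_{\omega_{i}\neq\omega_{i}^{\prime}}.$$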
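To make the definition concrete, the following Python sketch evaluates the weighted Hamming distance $d_{\alpha}(\omega,A)$ for a finite set $A$. Since $d_{T}$ is a supremum over unit vectors $\alpha$, any single unit vector only yields a lower bound on $d_{T}(\omega,A)$; the sketch uses the uniform vector $\alpha_{i}=n^{-1/2}$ for this purpose. All function names and the toy data are assumptions made for illustration.

```python
import math

def weighted_hamming(alpha, omega, omega_prime):
    """d_alpha(omega, omega') = sum of |alpha_i| over coordinates where the points differ."""
    return sum(abs(a) for a, x, y in zip(alpha, omega, omega_prime) if x != y)

def d_alpha(alpha, omega, A):
    """d_alpha(omega, A) = minimum of d_alpha(omega, omega') over omega' in the finite set A."""
    return min(weighted_hamming(alpha, omega, w) for w in A)

def convex_distance_lower_bound(omega, A):
    """Lower bound on d_T(omega, A) obtained from the uniform unit vector alpha_i = n^{-1/2}."""
    n = len(omega)
    uniform = [1.0 / math.sqrt(n)] * n
    return d_alpha(uniform, omega, A)

# Hypothetical toy example on S_3, with permutations written as tuples of values.
omega = (1, 2, 3)
A = [(1, 3, 2), (2, 1, 3)]
first_coordinate_only = d_alpha([1.0, 0.0, 0.0], omega, A)
lower_bound = convex_distance_lower_bound(omega, A)
```

Computing $d_{T}$ exactly requires optimizing over all unit vectors $\alpha$, which this sketch does not attempt.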

Proposition 1.8.

For any $A\subseteq S_{n}$ it holds

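$$\pi_{n}(A)\operatorname{\mathbb{E}}_{\pi_{n}}\exp\Big(\frac{d_{T}(\cdot,A)^{2}}{144}\Big)\leq 1.\tag{1.8}$$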

As compared to Talagrand’s original formulation (see [Tal95, Theorem 5.1]), (1.8) has a weaker absolute constant 144 instead of 16. It is possible to improve our own constant a bit by invoking slightly more subtle estimates but we do not seem to arrive at 16. For product measures, an inequality similar to (1.8) was deduced in [Tal95], a form of which with a weaker constant was proven in [BLM09] with the help of the entropy method. This was extended to weakly dependent random variables in [Pau14]. However, it does not seem possible to adjust the method therein to the case of the symmetric group, and so we are not aware of any proof of either of the inequalities for the symmetric group using the entropy method. In [Sam17] the author has proven the convex distance inequality for the symmetric group using weak transport inequalities.

It is possible to prove a weaker version of (1.8) with a somewhat better constant:

Proposition 1.9.

Let $S_{n}$ be the symmetric group and $\pi_{n}$ be the uniform distribution on $S_{n}$ . For any set $A\subseteq S_{n}$ with $\pi_{n}(A)\geq 1/2$ and all $t\geq 0$ we have

$$\pi_{n}(d_{T}(\cdot,A)\geq t)\leq 2\exp\Big(-\frac{t^{2}}{64}\Big).\tag{1.9}$$

In fact, (1.8) implies (1.9) with a constant of 144 instead of 64.

1.3. Slices of the hypercube

Finally, let us discuss another model for which we are able to prove a convex distance inequality similar to (1.8). Given two natural numbers $n,r$ such that $r\leq n$ , consider the corresponding slice of the hypercube $C_{n,r}\coloneqq\{\eta\in\{0,1\}^{n}\colon\sum_{i}\eta_{i}=r\}$ , and denote by $\mu_{n,r}$ the uniform measure on $C_{n,r}$ . On $C_{n,r}$ , we define the difference operators

[TABLE]

Here, $\tau_{ij}\eta$ switches the $i$ -th and the $j$ -th coordinate of the configuration $\eta$ . Up to the scaling of $2/n$ , $\Gamma(f)^{2}$ is the generator of the so-called Bernoulli–Laplace model.

As in the previous section, a modified logarithmic Sobolev inequality holds:

Proposition 1.10.

For $(C_{n,r},\mu_{n,r})$ as above, a $\Gamma\mathrm{-mLSI}(1)$ and a $\Gamma^{+}\mathrm{-mLSI}(2)$ hold.

Using this, we may establish a convex distance inequality by means of the entropy method again:

Proposition 1.11.

For any $A\subseteq C_{n,r}$ it holds

$$\mu_{n,r}(A)\operatorname{\mathbb{E}}_{\mu_{n,r}}\exp\Big(\frac{d_{T}(\cdot,A)^{2}}{544}\Big)\leq 1.$$

1.4. Outline

In Section 2 we provide various applications and concentration inequalities. This includes examples of functions on the symmetric group (Section 2.1), concentration inequalities for multilinear polynomials in $[0,1]$ -valued random variables (Section 2.2), as well as consequences of Theorem 1.1 for the Euclidean sphere and measures on $\operatorname{\mathbb{R}}^{n}$ satisfying a logarithmic Sobolev inequality (Section 2.3) and for probability measures (on general spaces) satisfying an mLSI with respect to some “ $L^{2}$ difference operator” (see Section 2.4). Moreover, in Section 2.5 we recover and extend the classical Bernstein inequality for independent random variables (up to constants).

Section 3 contains all the proofs, both of the results mentioned in this section as well as in Section 2.

2. Applications

Let us now describe various situations which give rise to mLSIs with respect to “natural” difference operators, and show some consequences of the main results.

2.1. Symmetric group

The aim of this subsection is to show how the results from Section 1 can be used to easily obtain concentration inequalities for functions on the symmetric group. In particular, we calculate many examples of statistics for which central limit theorems were proven, and show that the variance proxy of the sub-Gaussian estimate and the true variance agree (up to a constant independent of the dimension). This provides non-asymptotic concentration results, which are consistent with the limit theorems.

First, let us introduce the following natural metrics on $S_{n}$ :

[TABLE]

Table 1 collects some basic properties of $H$ , $D$ , $S^{2}$ and $I$ .
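The means and variances in Table 1 can be checked by exhaustive enumeration for small $n$. The Python sketch below does this with exact rational arithmetic. It assumes the standard definitions from [Dia88, Chapter 6]: $H$ counts the positions where the permutation differs from the identity, $D$ and $S^{2}$ sum the absolute and squared displacements, and $I$ counts inversions. The function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def hamming(s):
    """H(id, s): number of non-fixed points."""
    return sum(1 for i, v in enumerate(s, start=1) if v != i)

def footrule(s):
    """D(id, s): Spearman's footrule."""
    return sum(abs(v - i) for i, v in enumerate(s, start=1))

def spearman_sq(s):
    """S^2(id, s): sum of squared displacements."""
    return sum((v - i) ** 2 for i, v in enumerate(s, start=1))

def kendall(s):
    """I(id, s): number of inversions."""
    n = len(s)
    return sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])

def mean_and_variance(stat, n):
    """Exact mean and (population) variance of stat under the uniform measure on S_n."""
    values = [stat(s) for s in permutations(range(1, n + 1))]
    mean = Fraction(sum(values), len(values))
    second_moment = Fraction(sum(v * v for v in values), len(values))
    return mean, second_moment - mean * mean

def table_values(n):
    """Mean and variance as listed in Table 1."""
    return {
        "H": (Fraction(n - 1), Fraction(1)),
        "D": (Fraction(n * n - 1, 3), Fraction((n + 1) * (2 * n * n + 7), 45)),
        "S2": (Fraction(n * (n * n - 1), 6), Fraction(n * n * (n - 1) * (n + 1) ** 2, 36)),
        "I": (Fraction(n * (n - 1), 4), Fraction(n * (n - 1) * (2 * n + 5), 72)),
    }

STATISTICS = {"H": hamming, "D": footrule, "S2": spearman_sq, "I": kendall}

def table_matches(n):
    expected = table_values(n)
    return all(mean_and_variance(STATISTICS[name], n) == expected[name] for name in STATISTICS)
```

Calling `table_matches(n)` for small `n` compares all four rows at once; the enumeration has $n!$ terms, so it is only practical for small $n$.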

Example 2.1.

In this example, we calculate the observable diameters of the metrics on the symmetric group introduced above. By Theorem 1.7, this yields concentration properties for (locally) Lipschitz functions.

(1) For the Hamming distance $H$ it is clear that $H(\sigma,\sigma\tau_{ij})=2$ , which implies $\mathrm{ObsDiam}(S_{n},H)=4(n-1)$ . So, Theorem 1.7 recovers a concentration result from [Mau79].

The resulting variance estimate is not always sharp; for example, if we consider the function $H(\cdot,\mathrm{id})$ , the variance is $1$ and not of order $n$ . On the other hand, the function $G=n-H(\mathrm{id},\cdot)$ is a locally Lipschitz function with respect to $H$ , which converges weakly to a Poisson random variable. As a consequence, there cannot be an $n$ -independent sub-Gaussian estimate in the class of all locally Lipschitz functions.

(2) If we define for $p\in[1,\infty)$ a distance $d_{p}$ on $S_{n}$ by the induced $\ell^{p}$ norm

[TABLE]

this yields $d_{p}(\sigma,\sigma\tau_{ij})=2^{1/p}\lvert\sigma(i)-\sigma(j)\rvert$ . Consequently, recalling that

[TABLE]

for any $\sigma\in S_{n}$ , we have

[TABLE]

The case $p=1$ gives Spearman’s footrule and $p=2$ Spearman’s rank correlation.

(3) Considering Kendall’s $\tau$ , we can readily see that for two indices $i,j$ and any $\sigma\in S_{n}$ it holds $I(\sigma,\sigma\tau_{ij})\leq 2\lvert\sigma(i)-\sigma(j)\rvert$ , since $\tau_{ij}\sigma^{-1}$ can be brought to $\sigma^{-1}$ by first taking $\sigma^{-1}(i)$ to its place, and then $\sigma^{-1}(j)$ . So, as above, this leads to

[TABLE]

(4) In a more general setting, let $\rho:S_{n}\to\mathrm{GL}(V)$ be a faithful, unitary representation of $S_{n}$ and let $\lVert\cdot\rVert$ be a unitarily invariant norm on $\mathrm{GL}(V)$ . Then $d_{\rho}(\sigma,\tau)\coloneqq\lVert\rho(\sigma)-\rho(\tau)\rVert$ defines a bi-invariant metric on $S_{n}$ , and in this case we have

[TABLE]
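The observable diameters in parts (1) and (2) can be confirmed by brute force for small $n$. The Python sketch below evaluates $\max_{\sigma}n^{-1}\sum_{i,j}d(\sigma,\sigma\tau_{ij})^{2}$ directly, where $\sigma\tau_{ij}$ exchanges the values at positions $i$ and $j$. The closed form $2^{2/p}\,n(n^{2}-1)/6$ used for $d_{p}$ is derived here from $d_{p}(\sigma,\sigma\tau_{ij})=2^{1/p}\lvert\sigma(i)-\sigma(j)\rvert$ and $\sum_{i,j}(i-j)^{2}=n^{2}(n^{2}-1)/6$, and should be read as an assumption of this sketch; the function names are illustrative.

```python
from itertools import permutations

def swap_positions(sigma, i, j):
    """Return sigma tau_ij, i.e. sigma with the values at positions i and j exchanged."""
    s = list(sigma)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

def hamming(s, t):
    """Unnormalized Hamming distance H(s, t)."""
    return sum(1 for a, b in zip(s, t) if a != b)

def lp_distance(p):
    """Distance d_p induced by the l^p norm on the value vectors."""
    def d(s, t):
        return sum(abs(a - b) ** p for a, b in zip(s, t)) ** (1.0 / p)
    return d

def obs_diam(n, d):
    """max over sigma of n^{-1} * sum over ordered pairs (i, j) of d(sigma, sigma tau_ij)^2."""
    best = 0.0
    for sigma in permutations(range(1, n + 1)):
        total = sum(d(sigma, swap_positions(sigma, i, j)) ** 2
                    for i in range(n) for j in range(n))
        best = max(best, total / n)
    return best

def lp_closed_form(n, p):
    """Closed form 2^{2/p} n (n^2 - 1) / 6 derived in the lead-in (an assumption of this sketch)."""
    return 2 ** (2.0 / p) * n * (n * n - 1) / 6
```

For $n=4$ this gives `obs_diam(4, hamming)` equal to $4(n-1)=12$, in line with part (1).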

Example 2.2.

Define the random variable $f(\sigma)=S^{2}(\sigma,\mathrm{id})=\sum_{i=1}^{n}(\sigma(i)-i)^{2}$ . We have

[TABLE]

If we define the matrix $A(\sigma)=(a_{ij}(\sigma))_{i,j}$ via $a_{ij}(\sigma)=(\sigma(i)-\sigma(j))(i-j)$ , then the right hand side is (up to the factor $4n^{-1}$ ) the squared Hilbert–Schmidt norm of $A(\sigma)$ . It is clear that $\lvert A(\sigma)\rvert_{\mathrm{HS}}=\lvert A(\sigma^{-1})\rvert_{\mathrm{HS}}$ , and one can also easily see that it is invariant under right multiplication with any transposition $\tau_{kl}$ . As any permutation can be written as a product of transpositions, we can evaluate it at the identity element. Consequently,

[TABLE]

Using (1.2), this leads to the concentration inequality

[TABLE]

Actually, the term $n^{5}$ is natural, as the variance of $f$ is of order $n^{5}$ (see the table above). Incorporating the variance of $f$ into the inequality above leads to

[TABLE]

which yields the correct tail behavior.

Example 2.3.

Let us consider the $1$ -Lipschitz function $f(\sigma)=I(\sigma,\mathrm{id})$ . For any $t\geq 0$ we have by (1.7), $\mathrm{Var}_{\pi_{n}}(f)=n(n-1)(2n+5)/72$ and Example 2.1 (3)

[TABLE]

which is consistent with the central limit theorem for $f$ .

Example 2.4.

We define the number of ascents $f(\sigma)=\sum_{j=1}^{n-1}\mathbbm{1}_{\sigma(j+1)>\sigma(j)}$ . It can be easily shown that for any $i\neq j$ the number of ascents is not sensitive to transpositions in the sense that $\lvert f(\sigma)-f(\sigma\tau_{ij})\rvert\leq 2$ . Consequently, this leads to $\Gamma(f)^{2}\leq 4(n-1)$ , implying the concentration inequality

[TABLE]

again using (1.2). Alternatively, this also follows from Example 2.1 (1). Again, the fluctuation scale of order $\sqrt{n}$ is the right one, as in [CKSS72] the authors have shown a central limit theorem for the number of ascents. More precisely, the sequence $g_{n}=(f-\operatorname{\mathbb{E}}_{\pi_{n}}f)/(\sqrt{(n+1)/12})$ converges in distribution to a standard normal distribution. The above calculations lead to

[TABLE]
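The two elementary claims of this example can be verified exhaustively for small $n$. The Python sketch below checks $\lvert f(\sigma)-f(\sigma\tau_{ij})\rvert\leq 2$ and the resulting bound $4(n-1)$. It assumes that $\Gamma(f)(\sigma)^{2}=n^{-1}\sum_{i,j}(f(\sigma)-f(\sigma\tau_{ij}))^{2}$ with the sum over ordered pairs, which is the normalization suggested by the definition of the observable diameter; this form and the function names are assumptions of the sketch.

```python
from itertools import permutations

def ascents(sigma):
    """Number of positions j with sigma(j+1) > sigma(j)."""
    return sum(1 for a, b in zip(sigma, sigma[1:]) if b > a)

def swap_positions(sigma, i, j):
    """Return sigma tau_ij, i.e. sigma with the values at positions i and j exchanged."""
    s = list(sigma)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

def max_transposition_effect(n):
    """Largest change of the ascent count under a single transposition."""
    return max(abs(ascents(s) - ascents(swap_positions(s, i, j)))
               for s in permutations(range(1, n + 1))
               for i in range(n) for j in range(n))

def max_gamma_squared(n):
    """Maximum over sigma of the assumed Gamma(f)(sigma)^2 for f = number of ascents."""
    best = 0.0
    for s in permutations(range(1, n + 1)):
        base = ascents(s)
        total = sum((base - ascents(swap_positions(s, i, j))) ** 2
                    for i in range(n) for j in range(n))
        best = max(best, total / n)
    return best
```

The value $2$ is attained, for instance, by exchanging the first and last entries of the identity permutation, so the Lipschitz constant in the example cannot be lowered.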

Example 2.5.

A closely related statistic is given by the sum of the ascents defined as $f(\sigma)=\sum_{i=1}^{n-1}(\sigma(i+1)-\sigma(i))_{+}$ . A short calculation shows

[TABLE]

Indeed, if we let $\Delta_{i,j}\coloneqq(\sigma(i)-\sigma(j))_{+}$ , then

[TABLE]

Now each of the terms $\Delta_{i,i+1}+\Delta_{i+1,i}$ , $\Delta_{j,j-1}+\Delta_{j+1,j}$ is less than $(n-1)$ , and the same holds true for the two other sums. Therefore this yields

[TABLE]

[Cla09] has calculated the variance of the sum of ascents, and it is of order $n^{3}$ , which is in good accordance with the concentration inequality (again, up to the factor).

Example 2.6.

Given a matrix $a=(a_{ij})$ of real numbers satisfying $a_{ij}\in[0,1]$ , define $f(\sigma)=\sum_{i=1}^{n}a_{i,\sigma(i)}$ . By elementary computations one can show $\Gamma(f)^{2}\leq 4f+4\operatorname{\mathbb{E}}_{\pi_{n}}f$ , i. e. $f$ is self-bounding. As a consequence, Proposition 1.5 leads to

[TABLE]

Concentration inequalities for $f$ have been proven using the exchangeable pair approach in [Cha05, Proposition 3.10] (see also [Cha07, Theorem 1.1]), with the denominator being $4\operatorname{\mathbb{E}}_{\pi_{n}}f+2t$ .

For example, if $a$ is the identity matrix, $f$ is the number of fixed points of a random permutation, which satisfies $\operatorname{\mathbb{E}}_{\pi_{n}}f=1$ for all $n\in\operatorname{\mathbb{N}}$ . In this case, $f$ converges to a Poisson distribution with mean $1$ as $n\to\infty$ (see e. g. [Dia88]).
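The self-bounding property $\Gamma(f)^{2}\leq 4f+4\operatorname{\mathbb{E}}_{\pi_{n}}f$ can be tested numerically for small $n$. The Python sketch below does so for a random matrix with entries in $[0,1]$ and for the identity matrix. It assumes $\Gamma(f)(\sigma)^{2}=n^{-1}\sum_{i,j}(f(\sigma)-f(\sigma\tau_{ij}))^{2}$ (sum over ordered pairs) and uses $\operatorname{\mathbb{E}}_{\pi_{n}}f=n^{-1}\sum_{i,k}a_{ik}$; permutations are 0-indexed, and all names are illustrative.

```python
import random
from itertools import permutations

def swap_positions(sigma, i, j):
    """Return sigma tau_ij, i.e. sigma with the values at positions i and j exchanged."""
    s = list(sigma)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

def linear_statistic(a, sigma):
    """f(sigma) = sum_i a[i][sigma(i)] for a 0-indexed permutation sigma."""
    return sum(a[i][sigma[i]] for i in range(len(sigma)))

def gamma_squared(f, sigma):
    """Assumed form n^{-1} * sum over ordered pairs (i, j) of (f(sigma) - f(sigma tau_ij))^2."""
    n = len(sigma)
    base = f(sigma)
    return sum((base - f(swap_positions(sigma, i, j))) ** 2
               for i in range(n) for j in range(n)) / n

def self_bounding_gap(a):
    """max over sigma of Gamma(f)^2 - 4 f - 4 E f; non-positive if f is self-bounding."""
    n = len(a)
    mean_f = sum(sum(row) for row in a) / n  # E f under the uniform measure on S_n
    f = lambda s: linear_statistic(a, s)
    return max(gamma_squared(f, s) - 4 * f(s) - 4 * mean_f
               for s in permutations(range(n)))

rng = random.Random(0)  # fixed seed so that the illustration is reproducible
n = 4
a_random = [[rng.random() for _ in range(n)] for _ in range(n)]
a_identity = [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]
gap_random = self_bounding_gap(a_random)
gap_identity = self_bounding_gap(a_identity)
```

For the identity matrix, where $f$ counts fixed points, the gap is exactly zero: the bound is attained at a product of two disjoint transpositions.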

Example 2.7.

Finally, consider the random variable $f(\sigma)=g(\sigma)+g(\sigma^{-1})$ , where $g(\sigma)=\sum_{i=1}^{n-1}\mathbbm{1}_{\sigma(i+1)<\sigma(i)}$ is the number of descents. In [CD17] the authors calculated the expectation and variance of $f$ and proved a central limit theorem. As in the above example one can easily see that $\Gamma(g)^{2}\leq 4(n-1)$ , as well as $\Gamma(g\circ\mathrm{inv})^{2}\leq 4(n-1)$ , where $\mathrm{inv}:S_{n}\to S_{n}$ denotes the inverse map. Since $\Gamma(h_{1}+h_{2})^{2}\leq 2\Gamma(h_{1})^{2}+2\Gamma(h_{2})^{2}$ holds true for any functions $h_{1},h_{2}$ , we also have $\Gamma(f)^{2}\leq 16(n-1)$ , implying for any $t\geq 0$

[TABLE]

Again, the fluctuations are of order $\sqrt{n}$ , so that the inequality is consistent with the CLT.

2.2. Multilinear polynomials in $[0,1]$ -random variables

The aim of this section is to show Bernstein-type concentration inequalities for a class of polynomials in independent random variables with values in $[0,1]$ . The functions we consider are constructed as follows: Let $H=(V,E,(w_{e})_{e\in E})$ be a weighted hypergraph, such that every $e\in E$ consists of at most $k$ vertices, assume that $(X_{v})_{v\in V}$ are independent, $[0,1]$ -valued random variables, and set

[TABLE]

Define the maximum first order partial derivative $\mathrm{ML}(f)$ as

[TABLE]

Proposition 2.8.

Let $(X_{v})_{v\in V}$ be independent, $[0,1]$ -valued random variables and $f:[0,1]^{V}\to\operatorname{\mathbb{R}}$ given as in (2.1). Assume that $w_{e}\geq 0$ and $\lvert e\rvert\leq k$ for all $e\in E$ . We have for any $t\geq 0$

[TABLE]

Furthermore, for $t\in[0,\operatorname{\mathbb{E}}f]$ it holds

[TABLE]

A slight modification of the proof of Proposition 2.8 also allows us to prove deviation inequalities for suprema of such homogeneous polynomials. For example, this can be used to prove the following concentration inequalities for maxima or $\ell^{p}$ norms of linear forms.

Proposition 2.9.

Let $(X_{v})_{v\in V}$ be independent, $[0,1]$ -valued random variables, $\mathcal{F}\subset\{a\in\operatorname{\mathbb{R}}^{V}:a_{i}\in[0,1]\}$ and define $f_{\mathcal{F}}(X)\coloneqq\sup_{a\in\mathcal{F}}\sum_{i\in V}a_{i}X_{i}$ . For any $t\geq 0$ we have

[TABLE]

In particular, for any $p\in[1,\infty]$ it holds

[TABLE]

One possible application of Proposition 2.8 is to understand the finite $n$ concentration properties of the so-called $d$ -runs on the line.

Proposition 2.10.

Let $d\in\operatorname{\mathbb{N}},n>d$ , $(X_{i})_{i=1,\ldots,n}$ be independent, identically distributed random variables with values in $[0,1]$ and mean $\eta\coloneqq\operatorname{\mathbb{E}}X_{1}>0$ . Define the random variable $f_{d}(X)\coloneqq\sum_{i=1}^{n}X_{i}\cdots X_{i+d-1}$ , where the indices are to be understood modulo $n$ . For any $t\geq 0$ it holds

[TABLE]

In [RR09, Theorem 4.1], the authors prove a CLT for the $d$ -runs on the line for Bernoulli random variables $X_{i}$ with success probability $p$ , by normalizing $f$ by $n^{-1/2}p^{-d/2}$ . This is also the reason for the choice $n^{1/2}\eta^{d/2}t$ in inequality (2.4). In other words, under the assumption $n\eta^{d}\to\infty$ as $n\to\infty$ , Proposition 2.10 yields sub-Gaussian tails for $n^{-1/2}\eta^{-d/2}f$ . This is in good accordance with the aforementioned CLT.

Moreover, note that in this example, our methodology leads to better results than the usual bounded difference inequality. Indeed, the latter only yields

[TABLE]

suggesting an (inaccurate) normalization of $f(X)$ by $n^{-1/2}$ .
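The difference between the two normalizations can be made concrete for small $n$ . The following sketch is an illustration under assumptions not made in the text (Bernoulli entries, $n=10$ , $d=2$ ): it computes the exact variance of $f_{d}$ by enumeration and compares it with $n\eta^{d}$ , the scaling behind the CLT, and with $n$ , the scaling suggested by the bounded difference inequality.

```python
from fractions import Fraction
from itertools import product


def d_runs(x, d):
    """f_d(x) = sum_i x_i * ... * x_{i+d-1}, indices taken modulo n."""
    n = len(x)
    total = 0
    for i in range(n):
        term = 1
        for j in range(d):
            term *= x[(i + j) % n]
        total += term
    return total


def exact_moments(n, d, p):
    """Exact mean and variance of f_d for i.i.d. Bernoulli(p) entries, by enumeration."""
    mean = second = Fraction(0)
    for x in product((0, 1), repeat=n):
        ones = sum(x)
        weight = p**ones * (1 - p) ** (n - ones)
        value = d_runs(x, d)
        mean += weight * value
        second += weight * value * value
    return mean, second - mean**2


if __name__ == "__main__":
    n, d = 10, 2  # illustrative sizes, small enough for full enumeration
    for p in (Fraction(1, 2), Fraction(1, 5), Fraction(1, 20)):
        mean, var = exact_moments(n, d, p)
        # First ratio: variance / (n * eta^d); second ratio: variance / n.
        print(float(p), float(var / (n * p**d)), float(var / n))
```

As $p$ decreases, the first ratio stays close to $1$ while the second tends to $0$ , which is the sense in which the normalization by $n^{-1/2}$ is too coarse.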

Example 2.11.

If $(X_{v})_{v\in E(K_{n})}$ is the Erdös–Rényi model with parameter $p$ , for any fixed graph $H$ with $\lvert V\rvert$ vertices and $\lvert E\rvert$ edges, the subgraph counting statistic $T_{H}$ can be written in the form (2.1) with $w_{e}=1$ , and $k=\lvert E\rvert$ . Furthermore, it is easy to see that $\mathrm{ML}(f)\leq n^{\Delta-1}$ for the maximum degree $\Delta$ , so that Proposition 2.8 yields

[TABLE]

For example, this gives nontrivial bounds in the triangle case whenever $n^{2}p^{3}\to\infty$ as $n\to\infty$ . This bound is suboptimal, as the optimal decay is known to be $np\to\infty$ , see [Cha12, DK12]. However, it is better than the bound obtained by the bounded differences inequality. In general, if we consider subgraph counting statistics for some subgraph $H$ with $v$ vertices and $e$ edges on an Erdös–Rényi model $(X_{v})_{v\in E(K_{n})}$ , the bounded difference inequality yields the estimate

[TABLE]

Thus, to obtain non-trivial estimates in the limit $n\to\infty$ via the bounded difference inequality, one has to assume that $n^{\lvert V\rvert-\Delta}p^{\lvert E\rvert}\to\infty$ . With the inequality obtained from Proposition 2.8, this can be weakened to $n^{\lvert V\rvert-\Delta+1}p^{\lvert E\rvert}\to\infty$ .
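To compare the two density conditions, one may write $p=n^{-\beta}$ and ask for which $\beta$ each expression diverges. The helper below is a small sketch of this arithmetic only; the function name and the parametrization by $\beta$ are introduced here for illustration.

```python
from fractions import Fraction


def density_thresholds(num_vertices, num_edges, max_degree):
    """Critical exponents beta for p = n^(-beta); each condition holds for smaller beta.

    bounded difference inequality: n^(|V| - Delta)     * p^|E| -> infinity
    Proposition 2.8:               n^(|V| - Delta + 1) * p^|E| -> infinity
    """
    bounded_difference = Fraction(num_vertices - max_degree, num_edges)
    proposition_2_8 = Fraction(num_vertices - max_degree + 1, num_edges)
    return bounded_difference, proposition_2_8


if __name__ == "__main__":
    # Triangle: |V| = 3, |E| = 3, Delta = 2.
    print(density_thresholds(3, 3, 2))
    # 4-cycle: |V| = 4, |E| = 4, Delta = 2.
    print(density_thresholds(4, 4, 2))
```

For triangles this gives $\beta<1/3$ versus $\beta<2/3$ , to be compared with the optimal range $\beta<1$ corresponding to $np\to\infty$ .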

2.3. Derivations

If $\Gamma$ satisfies the chain rule, i. e. for all differentiable $u:\operatorname{\mathbb{R}}\to\operatorname{\mathbb{R}}$ and $f\in\mathcal{A}$ such that $u\circ f\in\mathcal{A}$ we have $\Gamma(u\circ f)=\lvert u^{\prime}\circ f\rvert\Gamma(f)$ , then (1.1) is equivalent to the usual logarithmic Sobolev inequality (in short: $\Gamma\mathrm{-LSI}(\rho)$ )

[TABLE]

Using this, one can derive second order concentration inequalities similar to the ones given in [BCG17] from Proposition 1.3. Let $S^{n-1}\coloneqq\{x\in\operatorname{\mathbb{R}}^{n}:\lvert x\rvert=1\}$ be the unit sphere equipped with the uniform measure $\sigma_{n-1}$ . It is known that for $\rho_{n}\coloneqq(n-1)^{-1}$

[TABLE]

holds for all Lipschitz functions $f$ and the spherical gradient $\nabla_{S}f$ (see [BCG17, Formula (3.1)] for the logarithmic Sobolev inequality, from which the modified one follows as above). To state our next result, we introduce the following notation (which we will stick to for the rest of this paper): if $A$ is an $n\times n$ matrix, we denote by $\lVert A\rVert_{\mathrm{HS}}$ its Hilbert–Schmidt and by $\lVert A\rVert_{\mathrm{op}}$ its operator norm.

Proposition 2.12.

Consider $S^{n-1}$ equipped with the uniform measure $\sigma_{n-1}$ and let $f:S^{n-1}\to\operatorname{\mathbb{R}}$ be a $C^{2}$ function satisfying $\sup_{\theta\in S^{n-1}}\lVert f_{S}^{\prime\prime}(\theta)\rVert_{\mathrm{op}}\leq 1$ . For any $t\geq 0$

[TABLE]

This follows immediately from Proposition 1.3 and the inequality $\lvert\nabla_{S}\lvert\nabla_{S}f\rvert\rvert\leq\lVert f_{S}^{\prime\prime}\rVert_{\mathrm{op}}$ proven in [BCG17, Lemma 3.1]. Now, if $f$ is $C^{2}$ and orthogonal to all affine functions (in $L^{2}(\sigma_{n-1})$ ), [BCG17, Proposition 5.1] shows $\operatorname{\mathbb{E}}_{\sigma_{n-1}}\lvert\nabla_{S}f\rvert^{2}\leq\rho_{n}\operatorname{\mathbb{E}}_{\sigma_{n-1}}\lVert f^{\prime\prime}_{S}\rVert_{\mathrm{HS}}^{2}.$ So, if we additionally have $\operatorname{\mathbb{E}}_{\sigma_{n-1}}\lVert f^{\prime\prime}_{S}\rVert_{\mathrm{HS}}^{2}\leq b^{2}$ , the estimate

[TABLE]

follows.

In a similar manner, one may address open subsets of $\mathbb{R}^{n}$ equipped with some probability measure $\mu$ satisfying a logarithmic Sobolev inequality (with respect to the usual gradient $\nabla$ ). This situation has been sketched in [BCG17, Remark 5.3] and was discussed in more detail in [GS20]. Here we easily obtain the following result:

Proposition 2.13.

Let $G\subseteq\mathbb{R}^{n}$ be an open set, equipped with a probability measure $\mu$ which satisfies a $\nabla\mathrm{-LSI}(\rho)$ , and let $f:G\to\operatorname{\mathbb{R}}$ be a $C^{2}$ function satisfying $\sup_{x\in G}\lVert f^{\prime\prime}(x)\rVert_{\mathrm{op}}\leq 1$ . For any $t\geq 0$

[TABLE]

For the proof it only remains to note that $|\nabla|\nabla f||\leq\lVert f^{\prime\prime}\rVert_{\mathrm{op}}$ , cf. [GS20, Lemma 7.2]. As above, if we require the first order partial derivatives $\partial_{i}f$ to be centered (which translates into orthogonality to linear functions if $\mu$ is the standard Gaussian measure, for instance), a simple application of the Poincaré inequality yields $\operatorname{\mathbb{E}}_{\mu}\lvert\nabla f\rvert^{2}\leq\rho\operatorname{\mathbb{E}}_{\mu}\lVert f^{\prime\prime}\rVert_{\mathrm{HS}}^{2}.$ In particular, we have the following corollary which immediately follows from Proposition 1.3 and the Poincaré inequality.

Corollary 2.14.

Let $G\subseteq\operatorname{\mathbb{R}}^{n}$ be an open set, equipped with a probability measure $\mu$ satisfying a $\nabla\mathrm{-LSI}(\rho)$ , and $f:G\to\operatorname{\mathbb{R}}$ be a $C^{2}$ function with

[TABLE]

For any $t\geq 0$ we have

[TABLE]

Thus, if we recenter a function and its derivatives, the two conditions on the Hessian ensure two-level concentration inequalities. For functions $f(X,Y)$ of independent Gaussian vectors, two-level concentration inequalities have been studied in [Wol13] using the Hoeffding decomposition instead of a recentering of the partial derivatives.

Note that (2.6) and Corollary 2.14 do not only recover [BCG17, Theorem 1.1] and [GS20, Theorem 1.4], but even strengthen these results by providing two-level bounds. To illustrate this, we discuss one of the examples from [GS20] in more detail.

Example 2.15 (Eigenvalues of Wigner matrices).

Let $\{\xi_{jk},1\leq j\leq k\leq N\}$ be a family of independent real-valued random variables whose distributions all satisfy a $\nabla\mathrm{-LSI}(\rho)$ for a fixed $\rho>0$ . Putting $\xi_{jk}=\xi_{kj}$ for $1\leq k<j\leq N$ , we define the random matrix $\Xi=(\xi_{jk}/\sqrt{N})_{1\leq j,k\leq N}$ . Then, by a simple argument using the Hoffman–Wielandt theorem, the joint distribution $\mu^{(N)}=\mu$ of its ordered eigenvalues $\lambda_{1}\leq\ldots\leq\lambda_{N}$ on $\mathbb{R}^{N}$ (in fact, $\lambda_{1}<\ldots<\lambda_{N}$ a.s.) satisfies a $\nabla\mathrm{-LSI}(\rho_{N})$ with constant $\rho_{N}=2\rho/N$ (see for instance [BG10]).

Now consider a $\mathcal{C}^{2}$ -smooth function $g\colon\mathbb{R}^{2}\to\mathbb{R}$ with first order (partial) derivatives in $L^{1}(\mu)$ and second order derivatives bounded by some constant $\gamma$ . Considering a quadratic statistic $\sum_{j\neq k}g(\lambda_{j},\lambda_{k})$ and recentering according to Corollary 2.14, we shall study

[TABLE]

where $g_{x},g_{y}$ denote partial derivatives. For instance, if $g(x,y)\coloneqq xy$ , we have $Q_{N}=\sum_{j\neq k}(\lambda_{j}-\mu[\lambda_{j}])(\lambda_{k}-\mu[\lambda_{k}])$ . Simple calculations show that $\lVert Q_{N}^{\prime\prime}\rVert_{\mathrm{op}}\leq c\gamma N$ as well as $\int\lVert Q_{N}^{\prime\prime}\rVert_{\mathrm{HS}}^{2}d\mu\leq c\gamma^{2}N^{3}$ . Here, by $c>0$ we denote suitable absolute constants which may vary from line to line. Following [GS20, Proposition 8.5], this leads to the exponential moment bound

[TABLE]

By Chebyshev’s inequality, $\mu(|Q_{N}|\geq t)\leq 2\exp(-ct/(\rho\gamma N^{1/2}))$ for all $t\geq 0$ , thus yielding subexponential fluctuations of order $\mathcal{O}_{P}(N^{1/2})$ .

By contrast, Corollary 2.14 leads to

[TABLE]

which is much better for large $t$ . In particular, the fluctuations in the subexponential regime are of order $\mathcal{O}_{P}(1)$ now. This can be interpreted as an extension of the self-normalizing property of linear eigenvalue statistics to a second order situation on the level of fluctuations (cf. the discussion of [GS20, Proposition 8.5]). Note that in [GS20], a comparable result could be achieved for the special case of $g(x,y)\coloneqq xy$ only.

2.4. Weakly dependent measures

To continue the discussion of the previous section for a larger class of measures, we will now consider applications of Theorem 1.1 for functions of weakly dependent random variables (which, in our case, essentially means that a certain mLSI with respect to a suitable difference operator is satisfied). Throughout this section, we shall consider probability measures $\mu$ on a product of Polish spaces $\mathcal{X}=\otimes_{i=1}^{n}\mathcal{X}_{i}$ . For a vector $x=(x_{i})_{i\in I}$ and $i\in I$ we let $x_{i^{c}}=(x_{j})_{j\in I\backslash\{i\}}$ , and for $y\in\operatorname{\mathbb{R}}$ we write $y_{+}=\max(y,0)$ . Now we define difference operators on $L^{\infty}(\mu)$ via

[TABLE]

Here, the suprema over $x_{i}^{\prime}$ (and $x_{i}$ ) are to be understood with respect to the support of $\mu$ . Clearly, $\lvert\mathfrak{d}f\rvert\leq\lvert\mathfrak{h}f\rvert$ and $\lvert\mathfrak{d}^{+}f\rvert\leq\lvert\mathfrak{h}^{+}f\rvert$ . Moreover, we need a second order version of the difference operator $\mathfrak{h}$ . To this end, for any $i\neq j$ , define

[TABLE]

and let $\mathfrak{h}^{(2)}f(x)$ be the matrix (“Hessian”) with zero diagonal and entries $h_{ij}f(x)$ on the off-diagonal.

We now have the following second order result in presence of a $\mathfrak{d}\mathrm{-mLSI}$ :

Proposition 2.16.

Let $\mu$ be a probability measure on a product of Polish spaces $\mathcal{X}=\otimes_{i=1}^{n}\mathcal{X}_{i}$ satisfying a $\mathfrak{d}\mathrm{-mLSI}(\sigma^{2})$ , and let $f:\mathcal{X}\to\operatorname{\mathbb{R}}$ be a bounded measurable function. If $\lvert\mathfrak{d}^{+}\lvert\mathfrak{d}f\rvert\rvert\leq b$ , we have for any $t\geq 0$

[TABLE]

On the other hand, if $\lVert\mathfrak{h}^{(2)}f\rVert_{\mathrm{op}}\leq b$ for all $x\in\mathcal{X}$ , we have for all $t\geq 0$

[TABLE]

Proposition 2.16 implies many second order results from previous articles. For instance, it is well-known (and we will check again below) that any product probability measure $\mu$ satisfies a $\mathfrak{d}\mathrm{-mLSI}(1)$ . Therefore, from (2.7) it is easily possible to obtain results similar to [GS20, Theorem 1.2]. To see this, it suffices to note that for functions with Hoeffding decomposition $f=\sum_{k=2}^{n}f_{k}$ , one may apply [GS20, Proposition 5.2] to upper bound $\operatorname{\mathbb{E}}_{\mu}|\mathfrak{d}f|^{2}$ by $\operatorname{\mathbb{E}}_{\mu}\lVert\mathfrak{d}^{(2)}f\rVert_{\mathrm{HS}}^{2}$ . Unlike in [GS20], Proposition 2.16 yields two-level (or Bernstein-type) inequalities, which can be regarded as an advantage of the present approach.

Similarly, we may retrieve (and sharpen) some of the results from further articles such as [GSS18a] for $d=2$ . On the other hand, it seems that requiring modified logarithmic Sobolev inequalities instead of usual logarithmic Sobolev inequalities extends the class of measures to which our results apply, in particular in non-independent situations. We will discuss the $\mathfrak{d}\mathrm{-mLSI}$ property and provide some sufficient conditions in more detail below.

For some classes of functions, we can obtain variants of Proposition 2.16 which are especially adapted to the properties of the functions under consideration. In particular, we may show deviation inequalities for suprema of quadratic forms in the spirit of [KZ18] for the weakly dependent case.

Proposition 2.17.

Let $\mu$ be supported in $[-1,+1]^{n}$ and satisfy a $\mathfrak{d}\mathrm{-mLSI}(\sigma^{2})$ . Let $\mathcal{A}$ be a countable class of symmetric matrices, bounded in operator norm and with zeroes on their diagonals. Define $h(x)\coloneqq\sup_{A\in\mathcal{A}}\langle x,Ax\rangle$ , $f_{\mathcal{A}}(x)\coloneqq\sup_{A\in\mathcal{A}}\lVert Ax\rVert$ and $\Sigma\coloneqq\sup_{A\in\mathcal{A}}\lVert A\rVert_{\mathrm{op}}$ . We have for any $t>0$

[TABLE]

Note that while in general, we only obtain deviation inequalities here, for a single symmetric matrix $A$ with zeroes on its diagonal and the quadratic form $f(x)=\langle x,Ax\rangle$ similar arguments as in the proof of Proposition 2.16 do lead to concentration inequalities for $f$ .

If $\mu$ is a product measure, the result of Proposition 2.17 is well-known and has been proven various times, see for example [Tal96, Theorem 1.2] for concentration inequalities in Rademacher random variables, [Led97, Theorem 3.1] for the upper tail inequalities and random variables satisfying $\lvert X_{i}\rvert\leq 1$ , [BLM03, Theorem 17] for the upper bound and Rademacher random variables and [BBLM05, Corollary 4]. More recent results include [HKZ12, RV13, Ada15, AKPS19, KZ18, GSS18b].

To understand which classes of measures may be addressed by Propositions 2.16 and 2.17, let us study the $\mathfrak{d}\mathrm{-mLSI}$ property in more detail. First, we show that it is implied by another functional inequality. Assume that a probability measure $\mu$ on a product of Polish spaces $\mathcal{X}=\otimes_{i=1}^{n}\mathcal{X}_{i}$ satisfies

[TABLE]

where $\mu(\cdot\mid x_{i^{c}})$ denotes the regular conditional probability. This functional inequality is (also) known as a modified logarithmic Sobolev inequality in the framework of Markov processes, and it is equivalent to exponential decay of the relative entropy along the Glauber semigroup, see for example [BT06] or [CMT15].

Proposition 2.18.

If $\mu$ satisfies (2.10), then a $\mathfrak{d}\mathrm{-mLSI}(\sigma^{2})$ and a $\mathfrak{d}^{+}\mathrm{-mLSI}(2\sigma^{2})$ hold. Consequently, for any $f:\mathcal{X}\to\operatorname{\mathbb{R}}$ and any $\alpha>\sigma^{2}/2$ we have

[TABLE]

The same is true for $\mathfrak{d}^{+}$ with $\sigma^{2}$ replaced by $2\sigma^{2}$ . This especially holds for product measures $\mu=\mu_{1}\otimes\ldots\otimes\mu_{n}$ with $\sigma^{2}=1$ .

Here, choosing $\alpha=\sigma^{2}$ or $\alpha=2\sigma^{2}$ respectively leads to the exponential inequalities

[TABLE]

The first inequality might be considered as a generalization of [Mas00, Lemma 8], which in turn is based on arguments in [Led97, Theorem 1.2]. The second inequality involving $\lvert\mathfrak{d}f\rvert^{2}$ is well-known in the case of the discrete cube, cf. [BG99, Corollary 2.4], where it is stated with a better constant. On the other hand, the proof presented herein is remarkably short and does not rely on any special properties of the measure $\mu$ beyond (2.10).

Proposition 2.18 implies [BLM03, Theorem 2], as product measures satisfy (2.10) with $\sigma^{2}=1$ . Indeed, taking the logarithms on both sides of (2.11) gives for any $\alpha>1$ and $\lambda\geq 0$

[TABLE]

It remains to choose some fixed $\theta>0$ and set $\alpha=(\lambda\theta)^{-1}$ .

The property (2.10) is satisfied for a large class containing non-product measures. Note that a sufficient condition (due to Jensen’s inequality) for (2.10) is the approximate tensorization property

[TABLE]

Establishing (2.12) is subject to ongoing research, and we especially want to highlight two possible approaches.

The first one is akin to the perturbation argument of Holley and Stroock as outlined in [HS87] (see also [Roy07, Proposition 3.1.18] for a similar reasoning). Assume that $d\mu=Z^{-1}e^{f}d\nu$ , where $f:\mathcal{X}\to\operatorname{\mathbb{R}}$ is a measurable function, $\nu=\otimes_{i=1}^{n}\nu_{i}$ is some product measure and $Z=\operatorname{\mathbb{E}}_{\nu}e^{f}$ . If we require $f$ to be bounded, we clearly have $\mathrm{osc}(f)<\infty$ for its (maximal) oscillation $\mathrm{osc}(f)=\sup_{x\in\mathcal{X}}f(x)-\inf_{x\in\mathcal{X}}f(x)$ . Under these assumptions, $\mu$ satisfies (2.12) with $\sigma^{2}=\exp(2\mathrm{osc}(f))$ .

Furthermore, under weak dependence conditions on the local specifications of some measure $\mu$ on a product space $\mathcal{X}$ , (2.12) was proven in [Mar13, Mar15, CMT15].

2.5. Bernstein inequality

As a final application, let us demonstrate how to recover the classical Bernstein inequality for independent bounded random variables by means of Theorem 1.1 (up to constants). In fact, as in some previous works we may remove the boundedness assumption.

There are various extensions of Bernstein’s inequality to unbounded random variables. For instance, [Ada08, Theorem 4] proves deviation inequalities for empirical processes in independent random variables with finite $\Psi_{\alpha}$ norm for some $\alpha\in(0,1]$ , which in particular includes concentration inequalities for sums of random variables with finite $\Psi_{\alpha}$ norm. Moreover, [BLM13, Theorem 2.10] requires a certain control of the moments of the random variables, which is in essence a condition on the $\Psi_{1}$ norms. Thirdly, [Ver18, Theorem 2.8.1] provides a Bernstein inequality for random variables with bounded $\Psi_{1}$ norms. However, note that the Gaussian term in the last two mentioned works is a sum of the $\Psi_{1}$ norm instead of the variance. By our methods, we obtain a version of Bernstein’s inequality for sub-Gaussian random variables with the variance of the sum in the Gaussian term, with a reasonable constant.

Theorem 2.19.

There exists an absolute constant $c^{\prime}>0$ such that the following holds. For any set of independent random variables $X_{1},\ldots,X_{n}$ satisfying $\lVert X_{i}\rVert_{\Psi_{2}}<\infty$ , we have for any $t\geq 0$

[TABLE]

In particular, if $\lvert X_{i}\rvert\leq M$ almost surely for all $i\in\{1,\ldots,n\}$ and some $M>0$ , then for all $t\geq 0$ it holds

[TABLE]

We want to give three concluding remarks on Theorem 2.19. Firstly, note that it is not possible to prove an inequality

[TABLE]

for some absolute constant $c>0$ in the class of all sub-Gaussian random variables. This can be easily seen in the case $n=1$ and by choosing $X\sim\mathrm{Ber}(p)$ for $p\to 0$ . Thus, to obtain a sub-Gaussian tail with the variance parameter, one has to limit the range of $t$ for which one can expect sub-Gaussian behaviour.
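The case $n=1$ can be quantified in a few lines. The sketch below assumes that the inequality in question has the shape $\mathbb{P}(X-\operatorname{\mathbb{E}}X\geq t)\leq C\exp(-ct^{2}/\mathrm{Var}(X))$ with prefactor $C=2$ ; both the shape and the prefactor are assumptions made for illustration. For $X\sim\mathrm{Ber}(p)$ and $t=1-p$ the left hand side equals $p$ , which yields an explicit upper bound on any admissible $c$ .

```python
from math import log


def largest_admissible_c(p, prefactor=2.0):
    """Largest c with P(X - EX >= t) <= prefactor * exp(-c t^2 / Var X) at t = 1 - p.

    Here X ~ Ber(p), so the probability on the left equals p.
    The prefactor 2 is an assumed value, not taken from the text.
    """
    t = 1.0 - p
    variance = p * (1.0 - p)
    return variance * log(prefactor / p) / t**2


if __name__ == "__main__":
    for p in (1e-1, 1e-2, 1e-3, 1e-4):
        print(p, largest_admissible_c(p))
```

The admissible constant behaves like $p\log(1/p)$ and hence tends to $0$ as $p\to 0$ , so no absolute constant can work.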

Secondly, one cannot replace $\lVert\max_{i}\lvert X_{i}\rvert\rVert_{\Psi_{2}}$ by $\max_{i}\lVert X_{i}\rVert_{\Psi_{2}}$ in (2.13), i. e. there cannot be an inequality of the form

[TABLE]

This, again, follows by choosing $X_{i}\sim\mathrm{Ber}(p)$ for $p=\lambda/n$ , $\lambda>0$ . In this case, the sum converges (weakly) to a Poisson random variable, whereas the sub-Gaussian range extends to $t\in\operatorname{\mathbb{R}}_{+}$ for $n\to\infty$ , giving a contradiction.
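The Poisson-type tail behind this counterexample can be inspected numerically. The following sketch computes the exact upper tail of a $\mathrm{Bin}(n,\lambda/n)$ variable and divides $-\log$ of the tail by $t^{2}/\mathrm{Var}$ ; a sub-Gaussian bound valid for all $t$ would keep this ratio bounded away from $0$ . The values $n=1000$ and $\lambda=1$ are illustrative choices.

```python
from math import ceil, exp, fsum, lgamma, log, log1p


def binomial_upper_tail(n, p, m):
    """P(S >= m) for S ~ Bin(n, p), with each term evaluated in log-space."""
    def log_pmf(k):
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log1p(-p))
    return fsum(exp(log_pmf(k)) for k in range(m, n + 1))


def gaussian_exponent_ratio(n, lam, t):
    """-log P(S - ES >= t) divided by t^2 / Var(S), for S ~ Bin(n, lam / n)."""
    p = lam / n
    variance = n * p * (1.0 - p)
    tail = binomial_upper_tail(n, p, ceil(lam + t))
    return -log(tail) / (t**2 / variance)


if __name__ == "__main__":
    for t in (10, 20, 40):
        print(t, gaussian_exponent_ratio(1000, 1.0, t))
```

The ratio decreases as $t$ grows, reflecting a tail of order $\exp(-ct\log t)$ rather than $\exp(-ct^{2})$ .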

Thirdly, it is well known that the $\Psi_{2}$ norm of the maximum of $\Psi_{2}$ random variables (bounded by some constant, say $K$ ) grows at most logarithmically in the dimension. For example, if we consider i. i. d. random variables $X_{i}$ with unit variance, we have the sub-Gaussian estimate for $t$ of order (at least) $n/\log(n)$ .

3. Proofs and auxiliary results

We begin by proving Theorem 1.1. Before we start, let us recall [BG99, Theorem 2.1], relating the exponential moments of $f-\operatorname{\mathbb{E}}_{\mu}f$ to those of $\Gamma(f)^{2}$ .

Theorem 3.1.

Assume that $(\Omega,\mu,\Gamma)$ satisfies (1.1) with constant $\rho>0$ . Then for any $f\in\mathcal{A}$ and any $\alpha>\frac{\rho}{2}$ we have

[TABLE]

Note that formally, Theorem 3.1 and our own results like Theorem 1.1 are valid for bounded functions $f$ only, since $\Gamma$ was defined on a subset of bounded functions. However, it is not hard to see that our proofs can usually be extended to a suitable larger class of functions $\widetilde{\mathcal{A}}\supset\mathcal{A}$ . One possible approach is first to truncate the random variable $f$ under consideration, and then prove bounds which are independent of the truncation level. As this is somewhat situational and depends on the difference operator $\Gamma$ , we stick to the boundedness assumption for the sake of a clearer presentation of the arguments. Nevertheless, we can prove Theorem 1.1 under the assumption that $\Gamma$ can be suitably defined for the function $f$ at hand, and that $\Gamma(f)\leq g$ for some sub-Gaussian function.

Furthermore, we need an elementary inequality to adjust the constants in concentration or deviation inequalities: for any two constants $c_{1}>c_{2}>1$ we have for all $r\geq 0$ and $c>0$

[TABLE]

whenever the left hand side is smaller than or equal to $1$ .

Proof of Theorem 1.1.

Assume that $\rho=1$ , which can always be achieved by defining a new difference operator $\Gamma_{\rho}(f)=\sqrt{\rho}\Gamma(f)$ . The general inequality follows by straightforward modifications from the $\rho=1$ case.

Making use of Theorem 3.1 in the first and $a^{2}\leq 2(a-b)_{+}^{2}+2b^{2}$ for any $a,b\geq 0$ in the second inequality, we obtain for all $\lambda\geq 0$

[TABLE]

The sub-Gaussian condition (1.3) leads to

[TABLE]

whenever $4\lambda^{2}K^{2}<1$ . Consequently, for all $\lambda\in[0,(2K)^{-1})$ we obtain by Markov’s inequality

[TABLE]

Now we distinguish the two cases $t\leq\frac{c^{2}}{K}$ and $t>\frac{c^{2}}{K}$ . In the first case, set $\lambda\coloneqq\frac{t}{4c^{2}}$ (which implies $4\lambda^{2}K^{2}\leq 1/4$ and thus is in the range) to obtain

[TABLE]

using the monotonicity of $\frac{1}{1-x}$ . In the second case, we simply set $\lambda\coloneqq\frac{1}{4K}$ (implying $4\lambda^{2}K^{2}=1/4$ ) and observe that

[TABLE]

Combining (3.3) and (3.4) finishes the proof of (1.4).

Finally, (1.5) follows by considering $-f$ instead of $f$ , which yields

[TABLE]

The constant can be adjusted using (3.1). ∎

Proof of Corollary 1.2.

Using the $\Gamma\mathrm{-mLSI}(\rho)$ , by applying Theorem 3.1 to $f:=\lambda g$ , Markov’s inequality and optimizing it can be shown that for all $t\geq 0$

[TABLE]

Here, to obtain the factor $2$ in the denominator, one has to let $\alpha\to\infty$ in Theorem 3.1. Thus, the corollary follows easily from Theorem 1.1. ∎

Proof of Proposition 1.3.

We assume $b=1$ which can be done by rescaling.

First, observe that [BG99, equation (2.4)] holds for any positive function $g$ , since the inequality $\Gamma(g^{2})\leq 2g\Gamma(g)$ is sufficient to apply the argument given therein. Thus, for any positive function $g$ satisfying $\Gamma(g)\leq 1$ it holds for $\lambda\in[0,(2\rho)^{-1})$

[TABLE]

So, by applying Theorem 3.1 (with $\alpha=\rho$ ) we have

[TABLE]

which can also be applied to $\lambda f$ and $\lambda g$ instead of $f$ and $g$ , for $\lambda\in[0,1]$ . Thus, by Markov’s inequality, for any $\lambda\in[0,1]$

[TABLE]

The claim follows by putting $\lambda=\min(\frac{t}{2\operatorname{\mathbb{E}}_{\mu}g^{2}},1)$ and noting that if $t/(2\operatorname{\mathbb{E}}_{\mu}g^{2})\geq 1$ , we have $t-\operatorname{\mathbb{E}}_{\mu}g^{2}\geq t/2$ . ∎

Proof of Proposition 1.5.

Choosing $\alpha=\rho$ in Theorem 3.1, applying the inequality to $\lambda f$ and using the monotonicity leads to

[TABLE]

Thus for $\lambda\in(0,(a\rho)^{-1})$ , by Jensen’s inequality (applied to the concave function $x\mapsto x^{\lambda\rho a}$ ) we have

[TABLE]

Finally, Markov’s inequality and [BLM03, Lemma 11] yield the first inequality.

To see the second inequality, note that for any $\lambda>0$ such that $\lambda a\rho<1$ , by Theorem 3.1 and concavity of $x\mapsto x^{\lambda a\rho}$ , it holds

[TABLE]

Finally, applying the estimates from the first part we obtain

[TABLE]

The concentration inequality follows as in the first part. ∎

Proof of Proposition 1.6.

Using and rewriting [GQ03, Theorem 1] we obtain for any $f:S_{n}\to\operatorname{\mathbb{R}}$

[TABLE]

Now, the inequality $(a-b)(e^{a}-e^{b})\leq\frac{1}{2}(e^{a}+e^{b})(a-b)^{2}$ and the fact that $\sigma\mapsto\sigma\tau_{ij}$ is a bijection of $S_{n}$ lead to the $\Gamma\mathrm{-mLSI}(1)$ . The $\Gamma^{+}\mathrm{-mLSI}(2)$ follows in the same manner from the inequality $(a-b)_{+}(e^{a}-e^{b})\leq(a-b)_{+}^{2}e^{a}$ . ∎
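Both elementary inequalities used in this proof are easy to test numerically. The sketch below checks them on a grid of pairs $(a,b)$ ; the grid and the tolerance are arbitrary choices, and a grid check is of course no substitute for the one-line proofs (the first reduces to $\tanh(x/2)\leq x/2$ , the second to $1-e^{-x}\leq x$ for $x\geq 0$ ).

```python
from math import exp


def check_elementary_inequalities(grid, tol=1e-12):
    """Return True if both inequalities hold for all pairs (a, b) from the grid."""
    for a in grid:
        for b in grid:
            diff = a - b
            # (a - b)(e^a - e^b) <= (1/2)(e^a + e^b)(a - b)^2
            lhs_sym = diff * (exp(a) - exp(b))
            rhs_sym = 0.5 * (exp(a) + exp(b)) * diff**2
            # (a - b)_+ (e^a - e^b) <= (a - b)_+^2 e^a
            pos = max(diff, 0.0)
            lhs_pos = pos * (exp(a) - exp(b))
            rhs_pos = pos**2 * exp(a)
            if lhs_sym > rhs_sym + tol * (1.0 + abs(rhs_sym)):
                return False
            if lhs_pos > rhs_pos + tol * (1.0 + abs(rhs_pos)):
                return False
    return True


if __name__ == "__main__":
    grid = [k / 10.0 for k in range(-50, 51)]
    print(check_elementary_inequalities(grid))
```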

Proof of Theorem 1.7.

By Proposition 1.6 and Theorem 3.1 we have for any $f:S_{n}\to\operatorname{\mathbb{R}}$ , any $\lambda\in\operatorname{\mathbb{R}}$ and any $\alpha>1/2$ the inequality

[TABLE]

If $f$ is locally Lipschitz with respect to $d$ , an easy calculation shows that we can upper bound $\Gamma(f)^{2}\leq\mathrm{ObsDiam}(S_{n},d)$ , so that from the above inequality in combination with $\alpha\to\infty$ we get

[TABLE]

The sub-Gaussian estimate follows by Markov’s inequality and the variance bound from integration by parts. ∎

In order to prove Proposition 1.8, we first need to establish the following lemma:

Lemma 3.2.

Let $f:S_{n}\to\operatorname{\mathbb{R}}$ be a non-negative function such that

(1) $\Gamma^{+}(f)^{2}\leq f$ ,

(2) $\lvert f(\sigma)-f(\sigma\tau_{ij})\rvert\leq 1$ for all $\sigma,i,j$ .

Then for all $t\in[0,\operatorname{\mathbb{E}}_{\pi_{n}}f]$ we have

[TABLE]

Especially we have

[TABLE]

In particular, this holds for $f(\sigma)=\frac{1}{16}d_{T}(\sigma,A)^{2}$ , where $A\subset S_{n}$ is any set.

Proof of Lemma 3.2.

Rewriting [GQ03, Theorem 1], we have that for any positive function $g$ ,

[TABLE]

Using this, we obtain for any $\lambda\in[0,1]$

[TABLE]

where $\Psi(x)\coloneqq e^{x}-1$ . By a Taylor expansion it can easily be seen that $\Psi(x)\leq 2x$ for all $x\in[0,1]$ , so that (recall that by $(2)$ we have $f(\sigma)-f(\sigma\tau_{ij})\leq 1$ , and $f(\sigma)-f(\sigma\tau_{ij})\geq 0$ due to the positive part)

[TABLE]

Chebyshev’s association inequality yields

[TABLE]

In other terms, if we set $h(\lambda)\coloneqq\operatorname{\mathbb{E}}_{\pi_{n}}e^{-\lambda f}$ , we have

[TABLE]

which by the fundamental theorem of calculus implies for all $\lambda\in[0,1]$

[TABLE]

So, for any $t\in[0,\operatorname{\mathbb{E}}_{\pi_{n}}f]$ , by Markov’s inequality and setting $\lambda=\frac{t}{4\operatorname{\mathbb{E}}_{\pi_{n}}f}$

[TABLE]

The second part follows by nonnegativity and $t=\operatorname{\mathbb{E}}_{\pi_{n}}f$ .

It remains to show that $f(\sigma)=\frac{1}{16}d_{T}(\sigma,A)^{2}$ satisfies the two conditions of this lemma. To this end, we first need to show that $\Gamma^{+}(d_{T}(\cdot,A))^{2}\leq 4$ . Writing $g(\sigma)\coloneqq d_{T}(\sigma,A)$ , it is well known (see [BLM03]) that we have

[TABLE]

where $\mathcal{M}(A)$ is the set of all probability measures on $A$ . To estimate $\Gamma^{+}(g)^{2}(\sigma)$ , one has to compare $g(\sigma)$ and $g(\sigma\tau_{ij})$ . To this end, for any $\sigma\in S_{n}$ fixed, let $\widetilde{\alpha},\widetilde{\nu}$ be parameters for which the value $g(\sigma)$ is attained, and let $\hat{\nu}=\hat{\nu}_{ij}$ be a minimizer of $\inf_{\nu\in\mathcal{M}(A)}\sum_{k=1}^{n}\widetilde{\alpha}_{k}\nu(\sigma^{\prime}:\sigma^{\prime}_{k}\neq(\sigma\tau_{ij})_{k})$ . This leads to

[TABLE]

Using this and the non-negativity of $d_{T}(\cdot,A)$ , we have

[TABLE]

To show the second property, we proceed similarly to [BLM09, Proof of Lemma 1]. By (3.7) and the Cauchy–Schwarz inequality, we have

[TABLE]

Assuming without loss of generality that $f(\sigma)\geq f(\sigma\tau_{ij})$ , choose $\hat{\nu}=\hat{\nu}_{ij}\in\mathcal{M}(A)$ such that the value of $f(\sigma\tau_{ij})$ is attained. It follows that

[TABLE]

which finishes the proof. ∎

The proof of Proposition 1.8 is now easily completed:

Proof of Proposition 1.8.

The difference operator $\Gamma^{+}$ satisfies $\Gamma^{+}(g^{2})\leq 2g\Gamma^{+}(g)$ for all positive functions $g$ , as well as an $\mathrm{mLSI}(2)$ . Moreover, as seen in the proof of Lemma 3.2, we have $\Gamma^{+}(d_{T}(\cdot,A))\leq 2$ . Thus, by (3.6) it holds for $\lambda\in[0,1/4)$

[TABLE]

Furthermore, Lemma 3.2 shows that

[TABLE]

So, for $\lambda=1/36$ we have

[TABLE]

∎

Proof of Proposition 1.9.

Again, the proof mimics the proof given for independent random variables in [BLM03]. As stated in Proposition 1.6, the uniform measure $\pi_{n}$ on $S_{n}$ satisfies a $\Gamma^{+}\mathrm{-mLSI}(2)$ with respect to

[TABLE]

Writing $f_{A}(\sigma)\coloneqq d_{T}(\sigma,A)$ , we have $\Gamma^{+}(f_{A})(\sigma)^{2}\leq 4$ as seen in the proof of Lemma 3.2. Hence, by similar arguments as in the proof of Theorem 1.1 we have for any $\lambda\geq 0$

[TABLE]

implying the sub-Gaussian estimate $\pi_{n}(f_{A}-\operatorname{\mathbb{E}}_{\pi_{n}}f_{A}\geq t)\leq\exp(-t^{2}/16).$ Fix a set $A\subseteq S_{n}$ satisfying $\pi_{n}(A)\geq 1/2$ . As a $\Gamma\mathrm{-mLSI}(1)$ implies a Poincaré inequality (see [BT06, Proposition 3.5] or [DS96]), we also have (by Chebyshev’s inequality)

[TABLE]

which evaluated at $t=\operatorname{\mathbb{E}}_{\pi_{n}}f_{A}$ yields $(\operatorname{\mathbb{E}}_{\pi_{n}}f_{A})^{2}\leq 16$ . Thus, for any $t\geq 4$ it holds

[TABLE]

where the last inequality follows from $(t-4)^{2}\geq t^{2}/2-16$ for any $t\geq 0$ and (3.1). For $t\leq 4$ the inequality (3.9) holds trivially. ∎

The proofs of the results for slices of the hypercube work in a very similar way.

Proof of Proposition 1.10.

It follows from [GQ03, Theorem 1] that we have for any $f:C_{n,r}\to\operatorname{\mathbb{R}}$

[TABLE]

From here, we may proceed as in the proof of Proposition 1.6. ∎

For the proof of Proposition 1.11, we need to establish the following analogue of Lemma 3.2:

Lemma 3.3.

Let $f:C_{n,r}\to\operatorname{\mathbb{R}}$ be a non-negative function such that

(1) $\Gamma^{+}(f)^{2}\leq f$ ,

(2) $\lvert f(\eta)-f(\tau_{ij}\eta)\rvert\leq 1$ for all $\eta,i,j$ .

Then for all $t\in[0,\operatorname{\mathbb{E}}_{\mu_{n,r}}f]$ we have

[TABLE]

Especially we have

[TABLE]

In particular, this holds for $f(\eta)=\frac{1}{32}d_{T}(\eta,A)^{2}$ , where $A\subset C_{n,r}$ is any set.

Proof of Lemma 3.3.

Rewriting [GQ03, Theorem 1], we have that for any positive function $g$ ,

[TABLE]

From here, we may mimic the proof of Lemma 3.2.

Last, we need to show that $f(\eta)=\frac{1}{32}d_{T}(\eta,A)^{2}$ satisfies the two conditions of this lemma. As compared to the proof of Lemma 3.2, some of the constants will change because of the different normalization of the difference operators. However, we may argue similarly and show that $\Gamma^{+}(d_{T}(\cdot,A))^{2}\leq 8$ . Using this and the non-negativity of $d_{T}(\cdot,A)$ yields

[TABLE]

Finally, by arguing as above it is easily seen that $\lvert f(\eta)-f(\tau_{ij}\eta)\rvert\leq 2/32$ . ∎

Proof of Proposition 1.11.

As the difference operator $\Gamma^{+}$ satisfies $\Gamma^{+}(g^{2})\leq 2g\Gamma^{+}(g)$ for all positive functions $g$, and an $\mathrm{mLSI}(2)$ holds, it remains to adapt the proof of Proposition 1.8 to the different constants appearing in Lemma 3.3. As noted in the proof of Lemma 3.3, we have $\Gamma^{+}(d_{T}(\cdot,A))\leq\sqrt{8}$. Thus, by (3.6) it holds for $\lambda\in[0,1/4)$

[TABLE]

Furthermore, Lemma 3.3 shows that

[TABLE]

So, for $\lambda=1/68$ we have

[TABLE]

∎
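The estimate $\Gamma^{+}(g^{2})\leq 2g\Gamma^{+}(g)$ used in the proof above can be checked term by term. The sketch below assumes that $\Gamma^{+}(g)(\eta)^{2}$ is a sum, with non-negative weights $c_{ij}$, of the squared positive parts $(g(\eta)-g(\tau_{ij}\eta))_{+}^{2}$. The weights $c_{ij}$ are a placeholder for the normalization, which is not restated in this section.

```latex
% Assumption (hypothetical normalization c_{ij} >= 0):
%   \Gamma^+(g)(\eta)^2 = \sum_{i,j} c_{ij} (g(\eta) - g(\tau_{ij}\eta))_+^2 .
% For g > 0 the positive part vanishes unless g(\tau_{ij}\eta) < g(\eta), so
\[
  \big(g(\eta)^2 - g(\tau_{ij}\eta)^2\big)_+
  = \big(g(\eta) - g(\tau_{ij}\eta)\big)_+ \big(g(\eta) + g(\tau_{ij}\eta)\big)
  \le 2\, g(\eta)\, \big(g(\eta) - g(\tau_{ij}\eta)\big)_+ .
\]
% Squaring, multiplying by c_{ij}, summing over i,j and taking square roots:
\[
  \Gamma^+(g^2)(\eta) \le 2\, g(\eta)\, \Gamma^+(g)(\eta).
\]
```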

Finally, we present the proofs of Section 2.

Proof of Proposition 2.8.

We show that $f$ is weakly $(k\mathrm{ML}(f),0)$-self-bounding in the language of [BLM09]. To see this, for any $v\in V$ let $f_{v}(X_{v^{c}})\coloneqq\sum_{e\in E:v\notin e}w_{e}X_{e}=f(X_{v^{c}},0)$. Now we have

[TABLE]

Here, the first inequality follows from $X_{v}\in[0,1]$ and the last one is a consequence of Euler’s homogeneous function theorem and the fact that all quantities involved are positive. Consequently, [BLM09, Theorem 1] yields for any $t\geq 0$

[TABLE]

For the lower bound, apply [BLM09, Theorem 1] to $\widetilde{f}=\mathrm{ML}(f)^{-1}f$, which satisfies $0\leq\widetilde{f}(x)-\widetilde{f}_{v}(x_{v^{c}})\leq 1$ for all $v\in V$ and $x\in[0,1]^{V}$ and is weakly $(k\mathrm{ML}(f)^{-1},0)$-self-bounding. ∎
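The role of Euler's homogeneous function theorem in the proof of Proposition 2.8 can be made explicit as follows. This is a sketch under the assumption that $f$ has the form $f(x)=\sum_{e\in E}w_{e}\prod_{v\in e}x_{v}$ with $w_{e}\geq 0$ and $\lvert e\rvert=k$ for all $e\in E$ (the form of (2.1) is not restated in this section), and that $\mathrm{ML}(f)=\max_{v}\max_{x\in[0,1]^{V}}\partial_{v}f(x)$.

```latex
% Since f is multilinear, removing the terms containing v gives
\[
  f(x) - f_v(x_{v^c}) = \sum_{e \ni v} w_e \prod_{u \in e} x_u = x_v\, \partial_v f(x)
  \in [0, \mathrm{ML}(f)] .
\]
% Euler's identity for a k-homogeneous function (each e is counted |e| = k times):
\[
  \sum_{v \in V} x_v\, \partial_v f(x) = \sum_{e \in E} \lvert e \rvert\, w_e \prod_{u \in e} x_u = k f(x).
\]
% Combining both with x_v in [0,1] and 0 <= \partial_v f <= ML(f):
\[
  \sum_{v \in V} \big(f(x) - f_v(x_{v^c})\big)^2
  = \sum_{v \in V} x_v^2\, (\partial_v f(x))^2
  \le \mathrm{ML}(f) \sum_{v \in V} x_v\, \partial_v f(x)
  = k\, \mathrm{ML}(f)\, f(x).
\]
```

Dividing $f$ by $\mathrm{ML}(f)$ rescales the first display to $[0,1]$ and the last bound to $k\mathrm{ML}(f)^{-1}\widetilde{f}(x)$, in line with the constants used for the lower bound.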

Proof of Proposition 2.9.

The first part follows as above. As for the second part, if we choose $\mathcal{F}=\mathcal{F}_{q}=\{a\in\operatorname{\mathbb{R}}^{V}:a_{v}\geq 0,\lVert a\rVert_{q}\leq 1\}$ for some $q\in[1,\infty]$ this leads to

[TABLE]

for the Hölder conjugate $p$ , which is due to the nonnegativity of the $X_{i}$ and the dual formulation of the $L^{p}$ norm in $\operatorname{\mathbb{R}}^{V}$ . ∎
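The duality step can be spelled out as follows. The computation is a sketch for a fixed non-zero vector $x\in[0,\infty)^{V}$ and $1<p<\infty$ with $1/p+1/q=1$. The maximizer $a^{*}$ is introduced here for illustration only.

```latex
% Upper bound: Hoelder's inequality, for every a in F_q.
\[
  \langle a, x \rangle \le \lVert a \rVert_q\, \lVert x \rVert_p \le \lVert x \rVert_p .
\]
% Lower bound: the choice a^*_v = x_v^{p-1} / ||x||_p^{p-1} is non-negative because x_v >= 0,
% and it lies in F_q since (p-1) q = p:
\[
  \lVert a^* \rVert_q^q = \frac{\sum_{v} x_v^{(p-1)q}}{\lVert x \rVert_p^{(p-1)q}}
  = \frac{\sum_v x_v^p}{\lVert x \rVert_p^p} = 1,
  \qquad
  \langle a^*, x \rangle = \frac{\sum_v x_v^p}{\lVert x \rVert_p^{p-1}} = \lVert x \rVert_p .
\]
% Hence the supremum over the non-negative part of the unit ball equals the p-norm:
\[
  \sup_{a \in \mathcal{F}_q} \langle a, x \rangle = \lVert x \rVert_p .
\]
```

The endpoint cases are similar: for $p=1$ take $a^{*}=(1,\ldots,1)$, and for $p=\infty$ take $a^{*}$ to be the unit vector at a coordinate where $x_{v}$ is maximal.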

Proof of Proposition 2.10.

Clearly, $f_{d}$ is $d$-homogeneous and has positive weights in the sense of (2.1) if we set $V=[n]$, $E=\{\{j,j+1,\ldots,j+d-1\}:j=1,\ldots,n\}$ and $w_{e}=1$. Furthermore, the partial derivatives are easily bounded: for any fixed $l\in[n]$ there are exactly $d$ terms which depend on $X_{l}$, and each product is bounded by $1$. Consequently, $\mathrm{ML}(f_{d})=\max_{l\in[n]}\max_{x\in[0,1]^{n}}\partial_{l}f_{d}(x)=d.$ Thus, Proposition 2.8 yields for all $t\geq 0$

[TABLE]

The assertion now follows, if we note that $\operatorname{\mathbb{E}}f_{d}(X)=n\eta^{d}$ . ∎
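For concreteness, the two computations in the proof of Proposition 2.10 can be written out as follows. This is a sketch assuming that $f_{d}(x)=\sum_{j=1}^{n}\prod_{i=j}^{j+d-1}x_{i}$ with indices read modulo $n$ and $d\leq n$, and that the $X_{i}$ are independent with common mean $\eta$. Neither assumption is restated in this section.

```latex
% Exactly d of the n windows {j, ..., j+d-1} contain a fixed index l, so
\[
  \partial_l f_d(x) = \sum_{j \,:\, l \in \{j, \ldots, j+d-1\}} \;
    \prod_{\substack{i = j \\ i \neq l}}^{j+d-1} x_i \;\le\; d
  \qquad \text{for } x \in [0,1]^n ,
\]
% with equality at x = (1, ..., 1), which gives ML(f_d) = d.
% By independence, the expectation factorizes over each window:
\[
  \mathbb{E} f_d(X) = \sum_{j=1}^{n} \prod_{i=j}^{j+d-1} \mathbb{E} X_i = n \eta^d .
\]
```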

Let us now prove the results from Section 2.4. To this end, we first need to establish some basic properties of modified logarithmic Sobolev inequalities with respect to the difference operators we use.

Lemma 3.4.

Let $\mu$ be a probability measure on a product of Polish spaces $\mathcal{X}=\otimes_{i=1}^{n}\mathcal{X}_{i}$ which satisfies a $\mathfrak{d}\mathrm{-mLSI}(\sigma^{2})$ . Then, $\mu$ also satisfies a $\mathfrak{d}^{+}\mathrm{-mLSI}(2\sigma^{2})$ .

Proof.

Let $(\Omega,\mathcal{F},\nu)$ be a probability space and $g$ a measurable function on it. Then,

[TABLE]

Applying this to $\nu=\mu(\cdot\mid x_{i^{c}})$ and $g=f(x_{i^{c}},\cdot)$ for any $i=1,\ldots,n$ yields

[TABLE]

which finishes the proof. ∎

Also note that by monotonicity a $\mathfrak{d}\mathrm{-mLSI}(\sigma^{2})$ implies an $\mathfrak{h}\mathrm{-mLSI}(\sigma^{2})$, and the same holds for $\mathfrak{d}^{+}$ and $\mathfrak{h}^{+}$. Moreover, we recall the duality formula $\lvert x\rvert=\sup_{y\in S^{n-1}}\langle x,y\rangle$.

Proof of Proposition 2.16.

First, (2.7) follows by applying Theorem 1.1 to $g=\lvert\mathfrak{d}f\rvert$ and noting that $\lvert\mathfrak{d}(af)\rvert=\lvert a\rvert\lvert\mathfrak{d}f\rvert$ for all $a\in\operatorname{\mathbb{R}}$ . To see that $g$ is sub-Gaussian with parameter $K=\sqrt{2\sigma^{2}}b$ and $C=1$ , note that by Lemma 3.4, $\mu$ satisfies a $\mathfrak{d}^{+}\mathrm{-mLSI}(2\sigma^{2})$ , so that we can use (3.5).

The same arguments are valid for $\mathfrak{h}^{+}$ and $\mathfrak{h}$ respectively. Here, we additionally use the estimate $\lvert\mathfrak{h}^{+}\lvert\mathfrak{h}f\rvert\rvert\leq\lvert\mathfrak{h}^{(2)}f\rvert_{\mathrm{op}}$ (cf. [GSS18b, Lemma 3.2]). ∎

Proof of Proposition 2.17.

Let us bound $\lvert\mathfrak{d}^{+}h\rvert^{2}$ . Choose the matrix $\widetilde{A}\in\mathcal{A}$ maximizing $\sup_{A\in\mathcal{A}}\langle x,Ax\rangle$ and use the monotonicity of $y\mapsto y_{+}$ to obtain

[TABLE]

Furthermore, we have for some maximizer $\widetilde{A}\in\mathcal{A}$ of $\sup_{A\in\mathcal{A}}\lVert Ax\rVert$ and $\widetilde{v}\in S^{n-1}$ for $\sup_{v\in S^{n-1}}\langle\widetilde{A}x,v\rangle$

[TABLE]

Here, the suprema over $v$ and $w$ are taken over the unit sphere $S^{n-1}$. We can now apply Corollary 1.2 to $\Gamma=\mathfrak{d}^{+}$, $\rho=2\sigma^{2}$, $g=4f_{\mathcal{A}}$ and $b=8\Sigma$ to finish the proof. ∎

Proof of Proposition 2.18.

The idea of the proof of the mLSIs is already present in [BG07]. Let $(\Omega,\mathcal{F},\nu)$ be any probability space. For any function $g$, the inequality $(a-b)_{+}(e^{a}-e^{b})_{+}\leq\frac{1}{2}(a-b)_{+}^{2}(e^{a}+e^{b})$ (valid for all $a,b\in\operatorname{\mathbb{R}}$) gives

[TABLE]

Applying this to $\nu=\mu(\cdot\mid x_{i^{c}})$ and $g=f(x_{i^{c}},\cdot)$ and using (2.10) yields

[TABLE]

To see that $\mu$ also satisfies a $\mathfrak{d}^{+}\mathrm{-mLSI}(2\sigma^{2})$, it remains to apply Lemma 3.4. The exponential inequalities are a consequence of Theorem 3.1. ∎
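The elementary inequality $(a-b)_{+}(e^{a}-e^{b})_{+}\leq\frac{1}{2}(a-b)_{+}^{2}(e^{a}+e^{b})$ used at the start of the proof reduces to $\tanh u\leq u$ for $u\geq 0$. The following short verification is a sketch. The auxiliary variables $u$ and $m$ are introduced here for illustration.

```latex
% For a <= b both sides vanish. For a > b set u = (a-b)/2 > 0 and m = (a+b)/2, so that
\[
  e^a - e^b = 2 e^m \sinh u, \qquad e^a + e^b = 2 e^m \cosh u, \qquad a - b = 2u .
\]
% The claimed inequality then reads
\[
  2u \cdot 2 e^m \sinh u \;\le\; \tfrac{1}{2}\,(2u)^2 \cdot 2 e^m \cosh u
  \quad\Longleftrightarrow\quad
  \tanh u \le u ,
\]
% which holds for u >= 0 because tanh(0) = 0 and (tanh)' = 1 - tanh^2 <= 1.
```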

Proof of Theorem 2.19.

Write $X=(X_{1},\ldots,X_{n})$ . Let us assume that $\operatorname{\mathbb{E}}X_{i}=0$ for all $i\in\{1,\ldots,n\}$ , from which the general case follows easily using the inequality

[TABLE]

Since the $X_{i}$ are independent, it follows from Proposition 2.18 that their joint distribution $\mathbb{P}_{X}$ satisfies a $\mathfrak{d}\mathrm{-mLSI}(1)$ , and we can calculate

[TABLE]

To apply Theorem 1.1, it remains to show that we may set $c=\operatorname{\mathbb{E}}\lVert X\rVert_{2}+\sqrt{\mathrm{Var}(\sum_{i}X_{i})}$ and $K=\lVert\max_{i}\lvert X_{i}\rvert\rVert_{\Psi_{2}}$ . This is seen by noting that

[TABLE]

where the last step follows from [KZ18, Lemma 1.4], as $X\mapsto\lVert X\rVert_{2}$ is a convex and $1$ -Lipschitz function. Note that although [KZ18, Lemma 1.4] is formulated for $t\geq t_{0}>0$ , one can easily find an estimate for all $t\geq 0$ , by first multiplying the right hand side by $2$ , and then adjusting the constant in the exponential. ∎

Recall that, as discussed above, Theorem 3.1 can only be applied to bounded functions, so an additional truncation step is needed. Instead of applying Theorem 3.1 to $f(X)=\sum_{i}(X_{i}-\operatorname{\mathbb{E}}X_{i})$, one applies it to the sum of the random variables $Y_{i}\coloneqq g_{R}(X_{i})-\operatorname{\mathbb{E}}g_{R}(X_{i})$ with $g_{R}(x)=\min(R,\max(x,-R))$ for a suitable truncation level $R>0$. As the right hand side of (2.13) can be chosen to be independent of $R$, the theorem follows for unbounded random variables by letting $R\to\infty$.
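The properties of the truncation $g_{R}$ that make this step work are elementary. They are collected below as a sketch. The integrability condition $\operatorname{\mathbb{E}}\lvert X_{i}\rvert<\infty$ in the last line is an assumption made here for the limiting statement.

```latex
% g_R clips its argument to the interval [-R, R] and is 1-Lipschitz.
\[
  \lvert g_R(x) \rvert \le \min(\lvert x \rvert, R), \qquad
  \lvert g_R(x) - g_R(y) \rvert \le \lvert x - y \rvert .
\]
% Hence the centered truncated variables are bounded:
\[
  \lvert Y_i \rvert = \lvert g_R(X_i) - \mathbb{E}\, g_R(X_i) \rvert \le 2R .
\]
% The truncation disappears in the limit: pointwise, and in L^1 by dominated
% convergence if E|X_i| is finite (dominating function |X_i|).
\[
  g_R(x) \to x \quad (R \to \infty), \qquad
  \mathbb{E}\, \lvert g_R(X_i) - X_i \rvert \to 0 .
\]
```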

Bibliography

[Ada08] Radosław Adamczak “A tail inequality for suprema of unbounded empirical processes with applications to Markov chains” In Electron. J. Probab. 13, 2008, no. 34, pp. 1000–1034. DOI: 10.1214/EJP.v13-521
[Ada15] Radosław Adamczak “A note on the Hanson-Wright inequality for random vectors with dependencies” In Electron. Commun. Probab. 20, 2015, no. 72, 13 pp. DOI: 10.1214/ECP.v20-3829
[ABW17] Radosław Adamczak, Witold Bednorz and Paweł Wolff “Moment estimates implied by modified log-Sobolev inequalities” In ESAIM Probab. Stat. 21, 2017, pp. 467–494. DOI: 10.1051/ps/2016030
[AKPS19] Radosław Adamczak, Michał Kotowski, Bartłomiej Polaczyk and Michał Strzelecki “A note on concentration for polynomials in the Ising model” In Electron. J. Probab. 24, 2019, no. 42, pp. 1–22. DOI: 10.1214/19-EJP280
[BCG17] Sergey G. Bobkov, Gennadiy P. Chistyakov and Friedrich Götze “Second-order concentration on the sphere” In Commun. Contemp. Math. 19.5, 2017. DOI: 10.1142/S0219199716500589
[BG99] Sergey G. Bobkov and Friedrich Götze “Exponential integrability and transportation cost related to logarithmic Sobolev inequalities” In J. Funct. Anal. 163.1, 1999, pp. 1–28. DOI: 10.1006/jfan.1998.3326
[BG07] Sergey G. Bobkov and Friedrich Götze “Concentration inequalities and limit theorems for randomized sums” In Probab. Theory Related Fields 137.1-2, 2007, pp. 49–81. DOI: 10.1007/s00440-006-0500-9
[BG10] Sergey G. Bobkov and Friedrich Götze “Concentration of empirical distribution functions with applications to non-i.i.d. models” In Bernoulli 16.4, 2010, pp. 1385–1414. DOI: 10.3150/10-BEJ254
