On the convex Poincaré inequality and weak transportation inequalities

Radosław Adamczak; Michał Strzelecki

arXiv:1703.01765·math.PR·June 18, 2019

TL;DR

This paper establishes an equivalence between the convex Poincaré inequality and weak transportation inequalities with quadratic-linear cost for probability measures on R^n, extending previous results and introducing new concentration inequalities.

Contribution

It generalizes the equivalence between the convex Poincaré inequality and weak transportation inequalities from the real line to arbitrary dimension, and it establishes modified logarithmic Sobolev inequalities for convex and concave functions.

Findings

01. Proves the equivalence for $\mathbb{R}^{n}$

02. Derives refined concentration inequalities for convex functions

03. Extends previous one-dimensional results

Abstract

We prove that for a probability measure on $\mathbb{R}^{n}$ , the Poincaré inequality for convex functions is equivalent to the weak transportation inequality with a quadratic-linear cost. This generalizes recent results by Gozlan et al. and Feldheim et al., concerning probability measures on the real line. The proof relies on modified logarithmic Sobolev inequalities of Bobkov-Ledoux type for convex and concave functions, which are of independent interest. We also present refined concentration inequalities for general (not necessarily Lipschitz) convex functions, complementing recent results by Bobkov, Nayar and Tetali.

Equations

$$\operatorname{Var}f(X)\leq\frac{1}{\lambda}\mathbb{E}|\nabla f(X)|^{2},$$

$$|\nabla f(x)|=\limsup_{y\to x}\frac{|f(y)-f(x)|}{|y-x|}.$$

$$\mu^{\otimes N}(A+tB_{2}^{Nn})\geq 1-2\exp(-ct),$$

$$\mathbb{P}(f(X_{1},\ldots,X_{N})\geq\operatorname{Med}f(X_{1},\ldots,X_{N})+t)\leq 2e^{-ct}$$

$$\mathcal{T}_{\theta}(\nu,\mu)=\inf_{\pi}\int_{\mathbb{R}^{n}}\int_{\mathbb{R}^{n}}\theta(x-y)\,\pi(dxdy),$$

$$H(\nu|\mu)=\int_{\mathbb{R}^{n}}\log\frac{d\nu}{d\mu}\,d\nu,$$

$$\mathcal{T}_{\theta_{C,D}}(\nu,\mu)\leq H(\nu|\mu),$$

$$\theta_{C,D}(x)=\begin{cases}\frac{|x|^{2}}{2C}&\text{for }|x|\leq CD,\\ D|x|-\frac{CD^{2}}{2}&\text{for }|x|>CD.\end{cases}$$

$$\overline{\mathcal{T}}_{\theta}(\nu|\mu)=\inf_{\pi}\int_{\mathbb{R}^{n}}\theta\Bigl(x-\int_{\mathbb{R}^{n}}y\,p_{x}(dy)\Bigr)\mu(dx),$$

$$\overline{\mathcal{T}}_{\theta}(\nu|\mu)=\inf_{(X,Y)}\mathbb{E}\,\theta(X-\mathbb{E}(Y|X)),$$

$$\overline{\mathcal{T}}_{\theta}(\nu|\mu)\leq H(\nu|\mu),$$

$$\overline{\mathcal{T}}_{\theta}(\mu|\nu)\leq H(\nu|\mu),$$

$$\mathbb{E}(f(X)-\operatorname{Med}f(X))^{2}\leq\frac{2}{\lambda}\mathbb{E}|\nabla f(X)|^{2}.$$

$$(\mathbb{E}Z-\operatorname{Med}Z)^{2}\leq(\mathbb{E}|Z-\operatorname{Med}Z|)^{2}\leq(\mathbb{E}|Z-\mathbb{E}Z|)^{2}\leq\operatorname{Var}Z.$$

$$\mathbb{E}(Z-\operatorname{Med}Z)^{2}=\operatorname{Var}Z+(\mathbb{E}Z-\operatorname{Med}Z)^{2}\leq 2\operatorname{Var}Z$$

$$\mathbb{P}(f(X)\geq\operatorname{Med}f(X)+t)\leq 8e^{-0.52\sqrt{\lambda}t/L}.$$

$$\frac{1}{\lambda}\mathbb{P}(|X|\geq a)\geq\frac{1}{2}\mathbb{E}(Y-Y^{\prime})^{2}\bigl(\mathbf{1}_{\{Y>0\}}\mathbf{1}_{\{Y^{\prime}=0\}}+\mathbf{1}_{\{Y=0\}}\mathbf{1}_{\{Y^{\prime}>0\}}\bigr)\geq\frac{1}{4}\mathbb{E}Y^{2}\mathbf{1}_{\{Y>0\}}\geq\frac{2}{\lambda}\mathbb{P}\bigl(|X|>a+2\sqrt{2/\lambda}\bigr)$$

$$\mathbb{E}(f(X)-\operatorname{Med}f(X))_{+}^{p}\leq\frac{p^{2}}{2\lambda}\bigl(\mathbb{E}(f(X)-\operatorname{Med}f(X))_{+}^{p}\bigr)^{1-2/p}\bigl(\mathbb{E}|\nabla f(X)|^{p}\bigr)^{2/p},$$

$$\bigl(\mathbb{E}(f(X)-\operatorname{Med}f(X))_{+}^{p}\bigr)^{1/p}\leq\frac{p}{\sqrt{2\lambda}}\bigl(\mathbb{E}|\nabla f(X)|^{p}\bigr)^{1/p},$$

$$\mathbb{P}\Bigl(f(X)\geq\operatorname{Med}f(X)+e\frac{p}{\sqrt{2\lambda}}\bigl(\mathbb{E}|\nabla f(X)|^{p}\bigr)^{1/p}\Bigr)\leq e^{2-p}$$

$$\mathbb{P}(f(X)\geq\operatorname{Med}f(X)+t)\leq\exp\Bigl(2-\frac{\sqrt{2\lambda}}{e}t\Bigr)\leq 8e^{-0.52\sqrt{\lambda}t}.$$

$$M(s)-M(s/2)^{2}=\operatorname{Var}(e^{sf(X)/2})\leq\frac{1}{4\lambda}\mathbb{E}s^{2}|\nabla f(X)|^{2}e^{sf(X)}\leq\frac{L^{2}s^{2}}{4\lambda}M(s).$$

$$\text{Convex Poincaré inequality}\implies\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta_{C,D}}^{+}.$$

$$\mathbb{P}(f(X)\leq\operatorname{Med}f(X)-t)\leq 8e^{-t\sqrt{\lambda}/(32\mathbb{E}|\nabla f(X)|)}.$$

$$\mathbb{P}(|X-\mathbb{E}X|\geq M+t)\leq 8e^{-0.5t\sqrt{\lambda}},\quad t\geq 0.$$

$$\mathbb{P}(|X-\mathbb{E}X|\leq M)\geq 3/4,$$

$$\mathbb{P}(|\nabla f(X)|<8\mathbb{E}|\nabla f(X)|)\geq 7/8.$$

$$\tilde{f}(x)=f(x_{0})+\langle u,x-x_{0}\rangle,\quad x\in\mathbb{R}^{n},$$

$$\mathbb{P}(f(X)\leq-t)$$
Full text

On the convex Poincaré inequality and weak transportation inequalities

Radosław Adamczak

Institute of Mathematics, University of Warsaw, Banacha 2, 02–097 Warsaw, Poland.

and

Michał Strzelecki

Institute of Mathematics, University of Warsaw, Banacha 2, 02–097 Warsaw, Poland.

(Date: Last changes : March 27, 2017.)

Abstract.

We prove that for a probability measure on $\mathbb{R}^{n}$ , the Poincaré inequality for convex functions is equivalent to the weak transportation inequality with a quadratic-linear cost. This generalizes recent results by Gozlan et al. and Feldheim et al., concerning probability measures on the real line.

The proof relies on modified logarithmic Sobolev inequalities of Bobkov-Ledoux type for convex and concave functions, which are of independent interest.

We also present refined concentration inequalities for general (not necessarily Lipschitz) convex functions, complementing recent results by Bobkov, Nayar, and Tetali.

Key words and phrases:

Concentration of measure, convex functions, Poincaré inequality, weak transport-entropy inequalities.

2010 Mathematics Subject Classification:

Primary: 60E15. Secondary: 26B25, 26D10.

Research partially supported by the National Science Centre, Poland, grants no. 2015/18/E/ST1/00214 (R.A.) and 2015/19/N/ST1/00891 (M.St.).

1. Introduction

Over the last thirty years a substantial body of research has been devoted to the interplay between functional inequalities, the theory of transportation of measure, and the concentration of measure phenomenon, revealing intimate connections between them. While most investigations have been carried out in the setting of general Lipschitz functions, concentration inequalities restricted to the class of convex Lipschitz functions have also been considered by many authors, starting from the seminal work of Talagrand in the 1990s ([30, 31], see also [21, 24, 28, 29] and the monograph [22] for subsequent developments). A crucial feature of these results is that they hold under much less restrictive assumptions on the regularity of the underlying probability measure than inequalities valid for all Lipschitz functions. Even though the theory of concentration of measure for convex functions to some extent parallels the classical theory, there are subtle differences: convexity is not preserved under general contractions, or even under a change of sign, which creates difficulties in the proofs and invalidates many well-known arguments established in the classical context. As a consequence, the theory of concentration of measure for convex functions has not yet reached a satisfactory level of completeness. Nevertheless, several important results have been obtained in recent years, connecting dimension-free concentration inequalities for convex functions with the convex Poincaré inequality [19] and a new type of weak transportation cost inequalities [16, 17]. We will now briefly describe these developments, which will allow us to formulate our main result.

Let $|\cdot|$ stand for the standard Euclidean norm on $\mathbb{R}^{n}$ . Let $\mu$ be a Borel probability measure on $\mathbb{R}^{n}$ and let $X$ be a random vector with law $\mu$ . We say that $\mu$ (equivalently $X$ ) satisfies the convex Poincaré inequality with constant $\lambda>0$ if for all convex functions $f\colon\mathbb{R}^{n}\to\mathbb{R}$ we have

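$$\operatorname{Var}f(X)\leq\frac{1}{\lambda}\mathbb{E}|\nabla f(X)|^{2},\tag{1.1}$$

where by $|\nabla f(x)|$ we mean the length of the gradient at $x$ , defined as

$$|\nabla f(x)|=\limsup_{y\to x}\frac{|f(y)-f(x)|}{|y-x|}.\tag{1.2}$$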

Note that this coincides with the length of the ‘true’ gradient provided $f$ is differentiable at $x$ . Also, it is enough to assume that (1.1) holds for convex Lipschitz functions, since an arbitrary convex function can be pointwise approximated by convex Lipschitz functions.

It follows from the results by Gozlan, Roberto, and Samson [19] that $\mu$ satisfies the convex Poincaré inequality if and only if there exists a constant $c>0$ such that for any $N$ , any convex set $A\subseteq(\mathbb{R}^{n})^{N}$ with $\mu^{\otimes N}(A)\geq 1/2$ , and any $t>0$ ,

$$\mu^{\otimes N}(A+tB_{2}^{Nn})\geq 1-2\exp(-ct),\tag{1.3}$$

where $B_{2}^{k}$ denotes the unit Euclidean ball in $\mathbb{R}^{k}$ and $+$ stands for the Minkowski addition.

It is not difficult to see that (1.3) is equivalent to the one-sided deviation inequality for convex $1$ -Lipschitz functions, i.e.

$$\mathbb{P}(f(X_{1},\ldots,X_{N})\geq\operatorname{Med}f(X_{1},\ldots,X_{N})+t)\leq 2e^{-ct}\tag{1.4}$$

for all $t\geq 0$ , where $X_{1},\ldots,X_{N}$ are i.i.d. copies of $X$ , and $\operatorname{Med}Y$ denotes the median of the random variable $Y$ , i.e. $\operatorname{Med}Y=\inf\{t\in\mathbb{R}\colon\mathbb{P}(Y\leq t)\geq 1/2\}$ .

Thus the convex Poincaré inequality is equivalent to a dimension-free deviation inequality for the upper tail of convex Lipschitz functions.
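The convention $\operatorname{Med}Y=\inf\{t\colon\mathbb{P}(Y\leq t)\geq 1/2\}$ picks the smallest median when the median is not unique. A minimal sketch for the empirical law of a finite sample follows; the function name `median_inf` is ours, not the paper's.

```python
def median_inf(sample):
    """Med Y = inf{t : P(Y <= t) >= 1/2} for the empirical law of `sample`.

    For n equally weighted points this is the order statistic y_(k)
    with k = ceil(n / 2), i.e. the lower median.
    """
    ys = sorted(sample)
    n = len(ys)
    k = (n + 1) // 2  # ceil(n / 2) for positive integers n
    return ys[k - 1]
```

For an even sample size this returns the lower of the two middle values rather than their average, which is what the infimum in the definition requires.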

Let us now pass to the connections between the Poincaré inequality and transportation inequalities. Let $\theta\colon\mathbb{R}^{n}\to[0,\infty]$ be a measurable function with $\theta(0)=0$ . Recall that the optimal transport cost between two probability measures $\mu$ and $\nu$ on $\mathbb{R}^{n}$ , induced by $\theta$ is given by

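$$\mathcal{T}_{\theta}(\nu,\mu)=\inf_{\pi}\int_{\mathbb{R}^{n}}\int_{\mathbb{R}^{n}}\theta(x-y)\,\pi(dxdy),\tag{1.5}$$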

where the infimum is taken over all couplings between $\mu$ and $\nu$ , i.e. over all probability measures on $(\mathbb{R}^{n})^{2}$ such that $\pi(dx\times\mathbb{R}^{n})=\mu(dx)$ , $\pi(\mathbb{R}^{n}\times dy)=\nu(dy)$ . Recall also that the relative entropy $H(\nu|\mu)$ is defined as

$$H(\nu|\mu)=\int_{\mathbb{R}^{n}}\log\frac{d\nu}{d\mu}\,d\nu,\tag{1.6}$$

if $\nu$ is absolutely continuous with respect to $\mu$ and $H(\nu|\mu)=\infty$ otherwise.
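For measures supported on a common finite set, the relative entropy reduces to a finite sum. The sketch below assumes `nu` and `mu` are lists of point masses indexed by the same atoms; the function name and this discrete setting are illustrative assumptions, not part of the paper.

```python
import math


def relative_entropy(nu, mu):
    """H(nu | mu) = sum_i nu_i * log(nu_i / mu_i) for finite measures.

    Returns math.inf when nu is not absolutely continuous w.r.t. mu,
    i.e. when some atom has nu_i > 0 but mu_i == 0.
    """
    total = 0.0
    for q, p in zip(nu, mu):
        if q == 0.0:
            continue  # convention: 0 * log 0 = 0
        if p == 0.0:
            return math.inf
        total += q * math.log(q / p)
    return total
```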

It has been proved in [9] that $\mu$ satisfies the Poincaré inequality (1.1) for all smooth functions if and only if there exist constants $C,D$ such that for all probability measures $\nu$ ,

$$\mathcal{T}_{\theta_{C,D}}(\nu,\mu)\leq H(\nu|\mu),\tag{1.7}$$

where

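$$\theta_{C,D}(x)=\begin{cases}\frac{|x|^{2}}{2C}&\text{for }|x|\leq CD,\\ D|x|-\frac{CD^{2}}{2}&\text{for }|x|>CD.\end{cases}\tag{1.8}$$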
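The quadratic-linear cost $\theta_{C,D}$ equals $|x|^{2}/(2C)$ for $|x|\leq CD$ and $D|x|-CD^{2}/2$ for $|x|>CD$; the two branches agree at $|x|=CD$, where both equal $CD^{2}/2$. A small sketch (the name `theta_cd` and the list-of-coordinates input are our choices for illustration):

```python
import math


def theta_cd(x, C, D):
    """Quadratic-linear cost theta_{C,D} evaluated at a point x of R^n.

    `x` is a sequence of coordinates; C, D > 0 are the parameters.
    """
    r = math.sqrt(sum(c * c for c in x))  # Euclidean norm |x|
    if r <= C * D:
        return r * r / (2.0 * C)          # quadratic regime near the origin
    return D * r - C * D * D / 2.0        # linear regime far from the origin
```

With $C=2$, $D=1$ the crossover sits at $|x|=2$: `theta_cd([2.0], 2.0, 1.0)` evaluates the quadratic branch and `theta_cd([3.0], 2.0, 1.0)` the linear one.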

Recently Gozlan, Roberto, Samson, Shu, and Tetali [17] formulated a similar characterization of the convex Poincaré inequality on the real line. In order to formulate their result we need to introduce the weak transport cost between probability measures and corresponding transportation inequalities as defined in [16, 17].

In what follows, by $\mathcal{P}_{1}(\mathbb{R}^{n})$ we denote the class of all probability measures $\nu$ on $\mathbb{R}^{n}$ such that $\int_{\mathbb{R}^{n}}|x|d\nu(x)<\infty$ .

Definition 1.1.

Let $\mu$ and $\nu$ be probability measures on $\mathbb{R}^{n}$ . Assume that $\nu\in\mathcal{P}_{1}(\mathbb{R}^{n})$ . For a convex, lower semicontinuous function $\theta\colon\mathbb{R}^{n}\to[0,\infty]$ , such that $\theta(0)=0$ define the weak transport cost between $\mu$ and $\nu$ as

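$$\overline{\mathcal{T}}_{\theta}(\nu|\mu)=\inf_{\pi}\int_{\mathbb{R}^{n}}\theta\Bigl(x-\int_{\mathbb{R}^{n}}y\,p_{x}(dy)\Bigr)\mu(dx),$$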

where the infimum is taken over all couplings between $\mu$ and $\nu$ and for $x\in\mathbb{R}^{n}$ , $p_{x}(\cdot)$ is the conditional measure defined ( $\mu$ almost surely) by $\pi(dxdy)=p_{x}(dy)\mu(dx)$ .

Note that in the probabilistic notation one can write

$$\overline{\mathcal{T}}_{\theta}(\nu|\mu)=\inf_{(X,Y)}\mathbb{E}\,\theta(X-\mathbb{E}(Y|X)),$$

where the infimum is taken over all pairs of random vectors $(X,Y)$ with values in $\mathbb{R}^{n}\times\mathbb{R}^{n}$ , such that $X$ is distributed according to $\mu$ and $Y$ according to $\nu$ .
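To see how the weak cost differs from the classical one, it may help to evaluate the quantity under the infimum for one fixed coupling of two finitely supported measures on the real line. The sketch below does only that, so it gives an upper bound on the weak transport cost and not the infimum itself; the name `barycentric_cost` and the matrix encoding of the coupling are assumptions made for illustration.

```python
def barycentric_cost(xs, ys, pi, theta):
    """Evaluate sum_x mu(x) * theta(x - sum_y y * p_x(y)) for one coupling.

    xs, ys : atoms of mu and nu on the real line
    pi     : coupling matrix, pi[i][j] = mass sent from xs[i] to ys[j]
    theta  : cost function of one real variable
    """
    total = 0.0
    for x, row in zip(xs, pi):
        mass = sum(row)  # mu({x}), the first marginal at x
        if mass == 0.0:
            continue
        barycenter = sum(y * w for y, w in zip(ys, row)) / mass  # E(Y | X = x)
        total += mass * theta(x - barycenter)
    return total
```

For example, with $\mu=\delta_{0}$, $\nu=\frac{1}{2}(\delta_{-1}+\delta_{1})$ and $\theta(z)=z^{2}$ the only coupling has conditional mean $0$, so the expression vanishes, whereas the classical cost $\mathcal{T}_{\theta}$ between the same measures equals $1$.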

Due to the asymmetry between $\mu$ and $\nu$ , one can now introduce three different inequalities related to the cost $\overline{\mathcal{T}}_{\theta}$ .

Definition 1.2.

Let $\mu\in\mathcal{P}_{1}(\mathbb{R}^{n})$ and $\theta\colon\mathbb{R}^{n}\to[0,\infty]$ be a convex lower semicontinuous function with $\theta(0)=0$ . We will say that $\mu$ satisfies the inequality

• $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta}^{+}$ if for every probability measure $\nu\in\mathcal{P}_{1}(\mathbb{R}^{n})$ ,

$$\overline{\mathcal{T}}_{\theta}(\nu|\mu)\leq H(\nu|\mu),$$

• $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta}^{-}$ if for every probability measure $\nu\in\mathcal{P}_{1}(\mathbb{R}^{n})$ ,

$$\overline{\mathcal{T}}_{\theta}(\mu|\nu)\leq H(\nu|\mu),$$

• $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta}$ if $\mu$ satisfies both $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta}^{+}$ and $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta}^{-}$ .

The definition of these inequalities in [16] differs formally from the one presented above (which is taken from [17]). It is not difficult to see that the definitions in the two articles are equivalent up to universal constants; the version above is more convenient for our purposes.

The authors of [17] proved that a probability measure $\mu$ on the real line satisfies the convex Poincaré inequality for some constant $\lambda>0$ if and only if it satisfies the transportation inequality $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta_{C,D}}$ for some $C,D>0$ . In a dual formulation (expressed in terms of infimum convolution inequalities), this result has also been obtained in [14].

Our main result is an extension of this equivalence to arbitrary dimension.

Theorem 1.3.

Let $\mu$ be a probability measure on $\mathbb{R}^{n}$ . Then the following conditions are equivalent:

(i) There exists $\lambda>0$ such that $\mu$ satisfies the convex Poincaré inequality (1.1).

(ii) There exist $C,D>0$ such that $\mu$ satisfies the transportation inequality $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta_{C,D}}$ .

Remark 1.4.

The implication (ii) $\implies$ (i) is standard; in this case $\lambda=\frac{1}{C}$ . In our proof the constants $C,D$ in the implication (i) $\implies$ (ii) depend not only on $\lambda$ but also on certain quantiles related to the measure $\mu$ (which are always finite but may be of the order of up to $\sqrt{n}$ ). This is related to the inequality $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta_{C,D}}^{+}$ , which is responsible for the lower tail of convex functions and is usually more difficult to deal with than the upper tail. We suspect that this is an artefact of our proof and that one should be able to obtain $\overline{\mathbf{T}}\vphantom{\mathbf{T}}^{+}_{\theta_{C,D}}$ with $C,D$ depending only on $\lambda$ . As for $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta_{C,D}}^{-}$ , our argument does yield it with $C,D$ depending only on $\lambda$ (see Corollary 4.3 below for details).

Remark 1.5.

Thanks to well-known tensorization properties of the inequality $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta_{C,D}}$ , Theorem 1.3 implies that the convex Poincaré inequality is equivalent to an improved two-level dimension-free concentration inequality for convex functions (see Example 6.9 below for a precise formulation). In the class of Lipschitz functions, inequalities of this type were first obtained by Talagrand [30] in the case of the product exponential distribution (with an alternate proof, using infimum-convolution inequalities, by Maurey [24]). The fact that they are consequences of the Poincaré inequality for smooth functions was established by Bobkov and Ledoux [6]. By results due to Gozlan et al. [19] this can be regarded as a self-improvement of dimension-free concentration properties of Lipschitz functions. Our result shows that similar self-improvements are also present in the setting of convex concentration.

Remark 1.6.

In [10] Bobkov and Götze provide a simple characterization of measures on $\mathbb{R}$ which satisfy the convex Poincaré inequality for some $\lambda>0$ (and thus also the inequality $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta_{C,D}}$ ) in terms of the probability distribution function. A similar characterization for larger $n$ seems to be a non-trivial open problem.

The organization of the article is as follows. First, in Section 2, we present preliminary properties of measures satisfying the convex Poincaré inequality and weak transportation inequalities, to be used in the proofs. Section 3 contains our most important technical result, i.e. modified log-Sobolev inequalities for convex and concave functions, which in Section 4 are combined with the Hamilton-Jacobi equations giving the proof of Theorem 1.3.

Next, in Section 5 we briefly discuss operations preserving the convex Poincaré inequality, which may be used to provide new non-trivial examples of measures satisfying it.

In Section 6 we present refined concentration of measure inequalities, which are consequences of weak transportation inequalities. We consider there more general cost functions than the one corresponding to the convex Poincaré inequality and discuss applications both to the Lipschitz and non-Lipschitz setting.

Finally, in Section 7 we state a few open questions. The Appendix contains basic facts concerning Hamilton-Jacobi equations, which are used in the proof of Theorem 1.3.

2. Preliminaries on the convex Poincaré inequality and weak transportation inequalities

In this section we present basic concentration of measure properties implied by the convex Poincaré inequality and the dual formulations of weak transportation inequalities. They will be needed in the proof of our main result.

We begin with a simple reformulation of the convex Poincaré inequality.

Lemma 2.1.

Let $X$ be a random vector in $\mathbb{R}^{n}$ satisfying the convex Poincaré inequality (1.1). Then for every convex function $f\colon\mathbb{R}^{n}\to\mathbb{R}$ ,

$$\mathbb{E}(f(X)-\operatorname{Med}f(X))^{2}\leq\frac{2}{\lambda}\mathbb{E}|\nabla f(X)|^{2}.$$

Proof.

Note that for every random variable $Z$ , thanks to the fact that the median minimizes the mean absolute deviation, we have

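$$(\mathbb{E}Z-\operatorname{Med}Z)^{2}\leq(\mathbb{E}|Z-\operatorname{Med}Z|)^{2}\leq(\mathbb{E}|Z-\mathbb{E}Z|)^{2}\leq\operatorname{Var}Z.$$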

Thus

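$$\mathbb{E}(Z-\operatorname{Med}Z)^{2}=\operatorname{Var}Z+(\mathbb{E}Z-\operatorname{Med}Z)^{2}\leq 2\operatorname{Var}Z$$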

and it is enough to set $Z=f(X)$ and apply (1.1). ∎
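The two facts used in this proof, the decomposition $\mathbb{E}(Z-\operatorname{Med}Z)^{2}=\operatorname{Var}Z+(\mathbb{E}Z-\operatorname{Med}Z)^{2}$ and the bound $(\mathbb{E}Z-\operatorname{Med}Z)^{2}\leq\operatorname{Var}Z$ , can be checked on any finite sample with equal weights. The helper names below are ours and the sample is an arbitrary illustration.

```python
def mean(zs):
    return sum(zs) / len(zs)


def variance(zs):
    m = mean(zs)
    return sum((z - m) ** 2 for z in zs) / len(zs)


def median_inf(zs):
    """Smallest t with empirical P(Z <= t) >= 1/2 (the lower median)."""
    ys = sorted(zs)
    return ys[(len(ys) + 1) // 2 - 1]


def second_moment_about_median(zs):
    med = median_inf(zs)
    return sum((z - med) ** 2 for z in zs) / len(zs)
```

For the sample `[0, 0, 0, 4]` the mean is $1$, the median is $0$, the variance is $3$, and the second moment about the median is $4=3+1^{2}\leq 2\cdot 3$ .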

2.1. Concentration inequalities

Let us start with the already mentioned (see (1.4)) upper tail estimate for convex Lipschitz functions implied by the convex Poincaré inequality. The proposition below can also be obtained from abstract results in [19], but we would like to provide an alternative derivation based on moments (the possibility of such a proof was suggested in [19]). Our strategy mimics a well-known approach from the general Lipschitz case (see e.g. Proposition 2.5 in [25]); however, we have to deal with some small difficulties related to the fact that in the convex setting we cannot truncate the function, as this operation does not preserve convexity.

Proposition 2.2.

Assume that $X$ is a random vector in $\mathbb{R}^{n}$ , satisfying the convex Poincaré inequality (1.1). Then for any $L$ -Lipschitz convex function $f\colon\mathbb{R}^{n}\to\mathbb{R}$ and any $t>0$ ,

$$\mathbb{P}(f(X)\geq\operatorname{Med}f(X)+t)\leq 8e^{-0.52\sqrt{\lambda}t/L}.$$

Proof of Proposition 2.2.

Consider the random variable $Y=(|X|-a)_{+}$ , where $a\in\mathbb{R}_{+}$ is arbitrary such that $\mathbb{P}(|X|\leq a)>1/4$ , and let $Y^{\prime}$ be an independent copy of $Y$ . Since the function $\varphi(x)=(|x|-a)_{+}$ is convex,

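$$\frac{1}{\lambda}\mathbb{P}(|X|\geq a)\geq\frac{1}{2}\mathbb{E}(Y-Y^{\prime})^{2}\bigl(\mathbf{1}_{\{Y>0\}}\mathbf{1}_{\{Y^{\prime}=0\}}+\mathbf{1}_{\{Y=0\}}\mathbf{1}_{\{Y^{\prime}>0\}}\bigr)\geq\frac{1}{4}\mathbb{E}Y^{2}\mathbf{1}_{\{Y>0\}}\geq\frac{2}{\lambda}\mathbb{P}\bigl(|X|>a+2\sqrt{2/\lambda}\bigr),$$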

and so $\mathbb{P}(|X|\geq a+2\sqrt{2/\lambda})\leq 2^{-1}\mathbb{P}(|X|\geq a)$ , which implies that $|X|$ is exponentially integrable. In particular for every Lipschitz function $f$ and all $p>0$ , $\mathbb{E}|f(X)|^{p}<\infty$ .
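The halving step can be iterated, since every larger threshold $a^{\prime}\geq a$ also satisfies $\mathbb{P}(|X|\leq a^{\prime})>1/4$ . After $k$ steps one gets $\mathbb{P}(|X|\geq a+2k\sqrt{2/\lambda})\leq 2^{-k}$ , which is the exponential decay behind the integrability claim. The sketch below turns this into an explicit (and deliberately crude) bound; the function name is hypothetical.

```python
import math


def iterated_tail_bound(t, lam):
    """Upper bound on P(|X| >= a + t) obtained by iterating the halving step.

    Each increase of the threshold by 2*sqrt(2/lam) at least halves the tail,
    so floor(t / step) full steps give a factor 2**(-k); the initial tail
    P(|X| >= a) is bounded trivially by 1.
    """
    step = 2.0 * math.sqrt(2.0 / lam)
    k = math.floor(t / step)
    return 0.5 ** k
```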

Assume now that $f\colon\mathbb{R}^{n}\to\mathbb{R}$ is convex. Then for all $p\geq 2$ , applying Lemma 2.1 to the convex function $x\mapsto(f(x)-\operatorname{Med}f(X))_{+}^{p/2}$ (note that its median is zero and $|\nabla(f(x)-\operatorname{Med}f(X))_{+}|\leq|\nabla f(x)|$ ), we obtain

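$$\mathbb{E}(f(X)-\operatorname{Med}f(X))_{+}^{p}\leq\frac{p^{2}}{2\lambda}\mathbb{E}(f(X)-\operatorname{Med}f(X))_{+}^{p-2}|\nabla f(X)|^{2}\leq\frac{p^{2}}{2\lambda}\bigl(\mathbb{E}(f(X)-\operatorname{Med}f(X))_{+}^{p}\bigr)^{1-2/p}\bigl(\mathbb{E}|\nabla f(X)|^{p}\bigr)^{2/p},$$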

where we used Hölder’s inequality with exponents $p/(p-2)$ , $p/2$ . If we additionally assume that $f$ is Lipschitz, so that $\mathbb{E}(f(X)-\operatorname{Med}f(X))_{+}^{p}<\infty$ , we get

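$$\bigl(\mathbb{E}(f(X)-\operatorname{Med}f(X))_{+}^{p}\bigr)^{1/p}\leq\frac{p}{\sqrt{2\lambda}}\bigl(\mathbb{E}|\nabla f(X)|^{p}\bigr)^{1/p},$$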

which via Chebyshev’s inequality in $L_{p}$ implies

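$$\mathbb{P}\Bigl(f(X)\geq\operatorname{Med}f(X)+e\frac{p}{\sqrt{2\lambda}}\bigl(\mathbb{E}|\nabla f(X)|^{p}\bigr)^{1/p}\Bigr)\leq e^{2-p}$$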

for $p\geq 0$ . Now, if the Lipschitz constant of $f$ equals one, the above inequality yields for $t>0$ ,

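$$\mathbb{P}(f(X)\geq\operatorname{Med}f(X)+t)\leq\exp\Bigl(2-\frac{\sqrt{2\lambda}}{e}t\Bigr)\leq 8e^{-0.52\sqrt{\lambda}t}.$$ ∎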
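The numerical constants in the final bound come from choosing $p=\sqrt{2\lambda}\,t/e$ in the moment estimate, which gives $\exp(2-\sqrt{2\lambda}\,t/e)$ , and then rounding: $e^{2}\approx 7.39\leq 8$ and $\sqrt{2}/e\approx 0.5203\geq 0.52$ . A quick check of this arithmetic, assuming the rate is $0.52\sqrt{\lambda}$ (function names are ours):

```python
import math


def sharp_bound(t, lam, lip=1.0):
    """exp(2 - sqrt(2*lam) * t / (e * lip)), the bound before rounding constants."""
    return math.exp(2.0 - math.sqrt(2.0 * lam) * t / (math.e * lip))


def rounded_bound(t, lam, lip=1.0):
    """8 * exp(-0.52 * sqrt(lam) * t / lip), the bound with rounded constants."""
    return 8.0 * math.exp(-0.52 * math.sqrt(lam) * t / lip)
```

For every $t\geq 0$ and $\lambda>0$ the first expression is at most the second, so nothing is lost except a slightly worse constant.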

Remark 2.3.

Another possible approach is based on the Laplace transform: assume without loss of generality that $\mathbb{E}f(X)=0$ and denote $M(s)=\mathbb{E}e^{sf(X)}$ for $s\geq 0$ . Since the function $e^{sf(\cdot)/2}$ is convex, the Poincaré inequality yields

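$$M(s)-M(s/2)^{2}=\operatorname{Var}(e^{sf(X)/2})\leq\frac{1}{4\lambda}\mathbb{E}s^{2}|\nabla f(X)|^{2}e^{sf(X)}\leq\frac{L^{2}s^{2}}{4\lambda}M(s).$$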

The idea would be now to regroup the expressions appearing in the above inequality, repeat the procedure (with $s/2$ instead of $s$ ), and—after a simple limit argument—obtain a bound on $\mathbb{E}e^{sf(X)}$ . After that we could use Markov’s inequality and optimize in $s$ to obtain an estimate of the upper tail of $f$ . However a delicate issue emerges: we have to a priori know that (for reasonable choices of the parameter $s$ ) $e^{sf(X)}$ is integrable (in the setting of smooth functions one overcomes this problem simply by truncating $f$ , for convex functions one would need e.g. to repeat the beginning of the proof of Proposition 2.2); cf. the remark following Theorem 6.8 in [19].

We do not know if the convex Poincaré inequality implies similar tail estimates—which depend only on $\lambda$ and the Lipschitz constant of the function—for the lower tail of convex Lipschitz functions, i.e. for $\mathbb{P}(f(X)\leq\operatorname{Med}f(X)-t)$ , $t>0$ (cf. Question 7.3 below).

Nonetheless, we can easily get estimates in terms of $\lambda$ and certain quantiles of $X$ . They will be crucial in the proof of the implication

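$$\text{convex Poincaré inequality}\implies\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta_{C,D}}^{+}.$$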

Lemma 2.4.

Let $X$ be a random vector in $\mathbb{R}^{n}$ satisfying the convex Poincaré inequality (1.1) and let $M$ be any number such that $\mathbb{P}(|X-\mathbb{E}X|\leq M)\geq 3/4$ . Then for every convex $f\colon\mathbb{R}^{n}\to\mathbb{R}$ and for any $t>32M\mathbb{E}|\nabla f(X)|$ ,

$$\mathbb{P}(f(X)\leq\operatorname{Med}f(X)-t)\leq 8e^{-t\sqrt{\lambda}/(32\mathbb{E}|\nabla f(X)|)}.$$

Proof.

By Proposition 2.2 (note that the function $x\mapsto|x-\mathbb{E}X|$ is convex and $1$ -Lipschitz),

$$\mathbb{P}(|X-\mathbb{E}X|\geq M+t)\leq 8e^{-0.5t\sqrt{\lambda}},\quad t\geq 0.\tag{2.3}$$

Let $f\colon\mathbb{R}^{n}\to\mathbb{R}$ be a convex function. Without loss of generality we may assume $\operatorname{Med}f(X)=0$ . We have $\mathbb{P}(f(X)\geq 0)\geq 1/2$ ,

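$$\mathbb{P}(|X-\mathbb{E}X|\leq M)\geq 3/4,\qquad\mathbb{P}(|\nabla f(X)|<8\mathbb{E}|\nabla f(X)|)\geq 7/8.$$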

Thus there exists $x_{0}$ such that $f(x_{0})\geq 0$ , $|x_{0}-\mathbb{E}X|\leq M$ , and $|\nabla f(x_{0})|<8\mathbb{E}|\nabla f(X)|$ . Define

$$\tilde{f}(x)=f(x_{0})+\langle u,x-x_{0}\rangle,\quad x\in\mathbb{R}^{n},$$

where $u$ is any subgradient of $f$ at $x_{0}$ , so that $\tilde{f}(x)\leq f(x)$ for all $x\in\mathbb{R}^{n}$ . Taking $x=x_{0}+\varepsilon u$ with $\varepsilon\to 0$ we see that $|u|\leq|\nabla f(x_{0})|\leq 8\mathbb{E}|\nabla f(X)|$ , and thus we have

[TABLE]

If now $t/(16\mathbb{E}|\nabla f|)\geq 2M$ , we can conclude from (2.3) that

[TABLE]

which ends the proof. ∎
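For the reader's convenience, here is one way to write out the chain of inequalities behind the last two steps. It is a sketch consistent with the quantities defined in the proof, not necessarily the authors' exact display; it assumes $\mathbb{E}|\nabla f(X)|>0$ , uses the affine minorant $\tilde{f}(x)=f(x_{0})+\langle u,x-x_{0}\rangle\leq f(x)$ with $f(x_{0})\geq 0$ , $|u|\leq 8\mathbb{E}|\nabla f(X)|$ , $|x_{0}-\mathbb{E}X|\leq M$ , and takes the tail bound for $|X-\mathbb{E}X|$ to have rate $0.5\sqrt{\lambda}$ , as Proposition 2.2 gives.

```latex
\begin{align*}
\mathbb{P}(f(X)\le -t)
  &\le \mathbb{P}(\tilde f(X)\le -t)
   \le \mathbb{P}\bigl(\langle u, X-x_0\rangle \le -t\bigr)
   \le \mathbb{P}\bigl(|u|\,|X-x_0| \ge t\bigr) \\
  &\le \mathbb{P}\Bigl(|X-x_0| \ge \frac{t}{8\,\mathbb{E}|\nabla f(X)|}\Bigr)
   \le \mathbb{P}\Bigl(|X-\mathbb{E}X| \ge \frac{t}{8\,\mathbb{E}|\nabla f(X)|} - M\Bigr).
\end{align*}
% If t / (16 E|\nabla f(X)|) >= 2M, then
%   t / (8 E|\nabla f(X)|) - M >= M + t / (16 E|\nabla f(X)|),
% so the tail bound for |X - EX| applied at level t / (16 E|\nabla f(X)|) gives
\[
\mathbb{P}(f(X)\le -t)
  \le 8\exp\Bigl(-\frac{0.5\sqrt{\lambda}\,t}{16\,\mathbb{E}|\nabla f(X)|}\Bigr)
  = 8\exp\Bigl(-\frac{t\sqrt{\lambda}}{32\,\mathbb{E}|\nabla f(X)|}\Bigr).
\]
```

The condition $t/(16\mathbb{E}|\nabla f(X)|)\geq 2M$ is the same as $t\geq 32M\mathbb{E}|\nabla f(X)|$ , which matches the range of $t$ in the statement of the lemma.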

2.2. Infimum convolution. Dual formulation of transportation inequalities

We will rely on the following lemma proved in [17] (and in a slightly different version also in [16]). The proof in [17] is presented for the real line, but it is not difficult to see that it generalizes to arbitrary dimension.

Lemma 2.5.

Let $\theta\colon\mathbb{R}^{n}\to\mathbb{R}_{+}$ be a convex cost function, $\theta(0)=0$ , $\lim_{x\to\infty}\theta(x)=\infty$ . For all functions $f\colon\mathbb{R}^{n}\to\mathbb{R}$ bounded from below, $x\in\mathbb{R}^{n}$ , and $t>0$ set

[TABLE]

Then

(i) $\mu$ satisfies $\overline{\mathbf{T}}\vphantom{\mathbf{T}}^{+}_{\theta}$ if and only if for all convex $f\colon\mathbb{R}^{n}\to\mathbb{R}$ , bounded from below,

[TABLE]

(ii) $\mu$ satisfies $\overline{\mathbf{T}}\vphantom{\mathbf{T}}^{-}_{\theta}$ if and only if for all convex $f\colon\mathbb{R}^{n}\to\mathbb{R}$ , bounded from below,

[TABLE]

(iii) if $\mu$ satisfies $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta}$ , then for all convex $f\colon\mathbb{R}^{n}\to\mathbb{R}$ , bounded from below,

[TABLE]

holds with $t=2$ . Conversely, if $\mu$ satisfies (2.6) for some $t>0$ , then it satisfies $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\tilde{\theta}}$ with $\tilde{\theta}(\cdot)=t\theta(\cdot/t)$ .

Moreover, the inequality (2.4) (resp. (2.5)) for all convex, Lipschitz functions bounded from below is a sufficient condition for $\overline{\mathbf{T}}\vphantom{\mathbf{T}}^{+}_{\theta}$ (resp. $\overline{\mathbf{T}}\vphantom{\mathbf{T}}^{-}_{\theta}$ ).

The inequality (2.6) was introduced by Maurey in [24] and the relation with transportation cost inequalities was first observed in [7].

3. From convex Poincaré to modified log-Sobolev inequalities for convex and concave functions

In this section we present modified log-Sobolev inequalities for convex and concave functions which are implied by the convex Poincaré inequality. Our approach builds heavily on the arguments introduced by Bobkov and Ledoux in [6] for arbitrary Lipschitz functions, however some non-trivial modifications will be necessary in order to handle the difficulties imposed by the restriction of the Poincaré inequality to convex functions.

In what follows for a nonnegative random variable $Y$ , we define its entropy as

[TABLE]

if $\mathbb{E}Y\log Y<\infty$ and $\operatorname{Ent}Y=\infty$ otherwise. We refer to e.g. [5, 22] for basic properties of entropy and log-Sobolev inequalities.

Throughout this section we assume that $\mu$ is a probability measure on $\mathbb{R}^{n}$ satisfying the convex Poincaré inequality (1.1) and that $X$ is a random vector with law $\mu$ , which will not be explicitly stated in all the theorems.

3.1. Modified log-Sobolev inequalities for convex functions

Theorem 3.1.

Let $f\colon\mathbb{R}^{n}\to\mathbb{R}$ be convex with $|\nabla f(x)|\leq c\leq 0.5\sqrt{\lambda}$ for all $x\in\mathbb{R}^{n}$ . Then

[TABLE]

where

[TABLE]

Our constants are slightly worse than in [6], basically because we need to work with the median rather than the mean. However the argument (which works also in the classical case) seems to slightly simplify the technicalities of [6]. The proof relies on two propositions.

Proposition 3.2.

Let $f\colon\mathbb{R}^{n}\to\mathbb{R}$ be convex with $\operatorname{Med}f(X)=0$ and $|\nabla f(x)|\leq c\leq 0.5\sqrt{\lambda}$ for all $x\in\mathbb{R}^{n}$ . Then

[TABLE]

where $C_{1}=C_{1}(c,\lambda)=\bigl{(}\sqrt{\lambda/2}-c/2\bigr{)}^{-2}$ .

Proof.

For $x\in\mathbb{R}$ we define $\Psi(x)=xe^{x/2}$ and

[TABLE]

One easily checks that $|\Psi(x)|\leq|\Phi(x)|$ , $|\Phi^{\prime}(x)|\leq|\Psi^{\prime}(x)|$ , and $\Phi$ is convex nondecreasing.

Denote $a^{2}=\mathbb{E}|\Phi(f(X))|^{2}$ and $b^{2}=\mathbb{E}|\nabla f(X)|^{2}e^{f(X)}$ (where $a,b\geq 0$ ). The function $\Phi(f)$ is convex, moreover $\operatorname{Med}\Phi(f(X))=0$ . Hence, by Lemma 2.1,

[TABLE]

Note that $a<\infty$ (by Proposition 2.2 and since $c\leq 0.5\sqrt{\lambda}$ ). Thus $a(\sqrt{\lambda/2}-c/2)\leq b$ and the assertion follows. ∎

Proposition 3.3.

Let $f\colon\mathbb{R}^{n}\to\mathbb{R}$ be either convex or concave, with $\operatorname{Med}f(X)=0$ and $|\nabla f(x)|\leq c$ for all $x\in\mathbb{R}^{n}$ . Then

[TABLE]

where $C_{2}=C_{2}(c,\lambda)=\exp({c\sqrt{2/\lambda}})$ . Consequently,

[TABLE]

Proof.

If $|\nabla f(X)|$ vanishes with probability one, there is nothing to prove. Otherwise, denote by $\widetilde{\mathbb{E}}$ the expectation with respect to the probability measure with density $|\nabla f(X)|^{2}/\mathbb{E}|\nabla f(X)|^{2}$ relative to $\mathbb{P}$ . By Jensen’s inequality,

[TABLE]

Thus, using the trivial inequality $-|f|\leq f$ , we conclude that

[TABLE]

But since

[TABLE]

we can bound $\widetilde{\mathbb{E}}|f(X)|$ by $c\sqrt{2/\lambda}$ . This yields the assertion of the proposition. ∎

Proof of Theorem 3.1.

Without loss of generality assume $\operatorname{Med}f(X)=0$ . Denote $F(t)=\mathbb{E}f(X)^{2}e^{tf(X)}$ , $t\in[0,1]$ . By the formula $\int_{0}^{1}ta^{2}e^{ta}dt=ae^{a}-e^{a}+1$ and the convexity of $t\mapsto F(t)$ ,

[TABLE]

(note that for this argument to work we do not need the expectation of $f(X)$ to vanish). Thus Propositions 3.2 and 3.3 imply the assertion of the theorem. ∎
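The elementary identity $\int_{0}^{1}ta^{2}e^{ta}\,dt=ae^{a}-e^{a}+1$ follows from the antiderivative $t\mapsto(at-1)e^{ta}$ ; applied with $a=f(X)$ and integrated, it expresses $\mathbb{E}(f(X)e^{f(X)}-e^{f(X)}+1)$ as $\int_{0}^{1}tF(t)\,dt$ . A numerical sanity check with composite Simpson quadrature (the helper names and the choice of quadrature are ours):

```python
import math


def integral_lhs(a, n=2000):
    """Composite Simpson approximation of int_0^1 t * a^2 * exp(t*a) dt (n even)."""
    h = 1.0 / n
    g = lambda t: t * a * a * math.exp(t * a)
    total = g(0.0) + g(1.0)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(i * h)
    return total * h / 3.0


def closed_form_rhs(a):
    """a*e^a - e^a + 1, the claimed value of the integral."""
    return a * math.exp(a) - math.exp(a) + 1.0
```

The identity holds for every real $a$ , including negative values, which is why the argument does not need $f(X)$ to be centred.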

3.2. Modified log-Sobolev inequalities for concave functions

Theorem 3.4.

Let $f\colon\mathbb{R}^{n}\to\mathbb{R}$ be convex with $|\nabla f(x)|\leq c<\sqrt{\lambda}/64$ for all $x\in\mathbb{R}^{n}$ . Assume that $M\in\mathbb{R}_{+}$ satisfies $\mathbb{P}(|X-\mathbb{E}X|\leq M)\geq 3/4$ . Then

[TABLE]

where $C=C(\lambda,c,M)$ is a constant depending only on $\lambda,c,M$ .

Remark 3.5.

If we denote by $X_{1},\ldots,X_{n}$ the coordinates of $X$ , then by the Poincaré inequality we have

[TABLE]

and hence, by the Chebyshev inequality, $M=2\sqrt{n/\lambda}$ satisfies $\mathbb{P}(|X-\mathbb{E}X|\leq M)\geq 3/4$ . Thus in fixed dimension $n$ and for say $c=\sqrt{\lambda}/128$ , the constant $C$ in Theorem 3.4 can be bounded uniformly over all probability measures satisfying the convex Poincaré inequality with constant $\lambda$ .
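The arithmetic behind the choice $M=2\sqrt{n/\lambda}$ is short. Assuming the second-moment bound $\mathbb{E}|X-\mathbb{E}X|^{2}\leq n/\lambda$ (each coordinate function is convex with gradient of length one), Chebyshev's inequality gives $\mathbb{P}(|X-\mathbb{E}X|>M)\leq(n/\lambda)/M^{2}=1/4$ . In code, with hypothetical helper names:

```python
import math


def quantile_radius(n, lam):
    """M = 2 * sqrt(n / lam), the radius suggested by Chebyshev's inequality."""
    return 2.0 * math.sqrt(n / lam)


def chebyshev_tail(n, lam, radius):
    """Chebyshev bound (n / lam) / radius^2 on P(|X - EX| > radius),
    assuming E|X - EX|^2 <= n / lam."""
    return (n / lam) / (radius * radius)
```

The radius grows like $\sqrt{n}$ , which is the dimension dependence mentioned in Remark 1.4.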

Proof of Theorem 3.4.

We start as in the proof of Theorem 3.1. Denote $g=-f$ (this is a concave function). Without loss of generality assume $\operatorname{Med}g(X)=0$ . Denote $F(t)=\mathbb{E}g(X)^{2}e^{tg(X)}$ , $t\in[0,1]$ . By the convexity of $t\mapsto F(t)$ ,

[TABLE]

We have

[TABLE]

By Proposition 3.3, $F(0)\leq\frac{2}{\lambda}\exp(c\sqrt{2/\lambda})\mathbb{E}|\nabla g(X)|^{2}e^{g(X)}$ , so it remains to estimate $\mathbb{E}g_{+}(X)^{2}e^{g_{+}(X)}$ .

Integration by parts and Lemma 2.4 yield

[TABLE]

if only $c<\sqrt{\lambda}/64$ . Similarly (using Lemma 2.4 in its full strength),

[TABLE]

for some $D_{2}=D_{2}(\lambda,M)$ . Thus, by Proposition 3.3,

[TABLE]

This, together with (3.2) and (3.3), ends the proof:

[TABLE]

4. Proof of the main result

We will now present the proof of Theorem 1.3. As already mentioned, the implication (ii) $\implies$ (i) is standard; we provide a sketch of its proof just for the sake of completeness. The proof of the implication (i) $\implies$ (ii) follows the arguments first introduced in [9], which are based on the analysis of the Hamilton-Jacobi equations. A crucial element of the proof will be the modified log-Sobolev inequalities obtained in Section 3.

Lemma 4.1.

Let $X$ be a random vector in $\mathbb{R}^{n}$ . Assume that there exist $C<\infty$ and $L>0$ such that

[TABLE]

and the inequality

[TABLE]

holds for every convex (respectively: concave) $L$ -Lipschitz function $f\colon\mathbb{R}^{n}\to\mathbb{R}$ . Then, for every convex Lipschitz function $f\colon\mathbb{R}^{n}\to\mathbb{R}$ bounded from below,

[TABLE]

where $Q_{t}^{\alpha}f(x)=\inf_{y\in\mathbb{R}^{n}}\{f(x-y)+t\alpha(|y|/t)\}$ , $t>0$ , is the infimum convolution operator with the cost function

[TABLE]

Remark 4.2.

The condition (4.1) is introduced to exclude heavy-tailed measures for which the only exponentially integrable convex functions are constants. Note that in this case the inequality (4.2) is trivially satisfied, while the transportation inequality cannot hold (as it implies the existence of exponential moments).

If we recall the dual formulations of the weak transport-entropy inequalities $\overline{\mathbf{T}}\vphantom{\mathbf{T}}^{-}$ and $\overline{\mathbf{T}}\vphantom{\mathbf{T}}^{+}$ (see Lemma 2.5), the definition of $\theta_{C,D}$ from (1.8), and the results of the preceding section (namely, Theorems 3.1 and 3.4), we immediately obtain the following corollaries.

Corollary 4.3.

Let $X$ be a random vector in $\mathbb{R}^{n}$ satisfying the convex Poincaré inequality (1.1). Then, for any $c\leq 0.5\sqrt{\lambda}$ , the law of $X$ satisfies the inequality $\overline{\mathbf{T}}\vphantom{\mathbf{T}}^{-}_{\theta_{2C,c}}$ with

[TABLE]

Corollary 4.4.

Let $X$ be a random vector in $\mathbb{R}^{n}$ satisfying the convex Poincaré inequality (1.1) and let $M$ be any number such that $\mathbb{P}(|X-\mathbb{E}X|\leq M)\geq 3/4$ . Then, for any $c<\sqrt{\lambda}/64$ , the law of $X$ satisfies the inequality $\overline{\mathbf{T}}\vphantom{\mathbf{T}}^{+}_{\theta_{2C,c}}$ for some constant $C=C(\lambda,c,M)$ depending only on $\lambda$ , $c$ , and $M$ .

Proof of Lemma 4.1.

Suppose that the log-Sobolev inequality (4.2) holds for all convex and $L$ -Lipschitz functions. We first present a perturbation argument which allows us to work with random vectors with an absolutely continuous law. We then shall follow the approach of [17, Proof of Theorem 1.5].

Let $G$ be a Gaussian random vector in $\mathbb{R}^{n}$ , independent of $X$ , with the covariance matrix being a sufficiently small multiple of identity, so that it satisfies the usual log-Sobolev inequality with constant $C$ ,

[TABLE]

for all Lipschitz functions $f\colon\mathbb{R}^{n}\to\mathbb{R}$ (see e.g. Theorem 5.1 in [22] for an equivalent formulation).

Then, by the tensorization property of entropy (see e.g. Proposition 5.6 in [22]), the random vector $(X,G)$ on $\mathbb{R}^{n}\times\mathbb{R}^{n}$ satisfies the modified log-Sobolev inequality

[TABLE]

for all convex functions $F\colon\mathbb{R}^{n}\times\mathbb{R}^{n}\to\mathbb{R}$ which are $L$ -Lipschitz with respect to the first coordinate (here $|\nabla_{X}F|$ and $|\nabla_{G}F|$ denote partial lengths of gradients with respect to the first and second variable, with the other variable fixed).

Let $f\colon\mathbb{R}^{n}\to\mathbb{R}$ be a convex $L$ -Lipschitz function and consider $\varepsilon>0$ . Applying the inequality (4.4) to the function defined by the formula $F(x,y)=f(x+\varepsilon y)$ for $x,y\in\mathbb{R}^{n}$ (which is $L$ -Lipschitz with respect to the first variable), we see that the random vector $X_{\varepsilon}=X+\varepsilon G$ satisfies the modified log-Sobolev inequality

[TABLE]

where $C_{\varepsilon}=C(1+\varepsilon^{2})$ . Note that the law of $X_{\varepsilon}$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^{n}$ , and so almost surely $X_{\varepsilon}$ is a differentiability point of $f$ and $|\nabla f(X_{\varepsilon})|$ coincides with the Euclidean length of the ‘true’ gradient $\nabla f(X_{\varepsilon})$ .
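The value of $C_{\varepsilon}$ comes from the chain rule. The following sketch assumes that the right-hand side of (4.4) involves the sum $|\nabla_{X}F|^{2}+|\nabla_{G}F|^{2}$ with the common constant $C$ , and it is written at points where $f$ is differentiable.

```latex
% Partial gradients of F(x,y) = f(x + eps*y):
\[
\nabla_{x}F(x,y)=\nabla f(x+\varepsilon y),
\qquad
\nabla_{y}F(x,y)=\varepsilon\,\nabla f(x+\varepsilon y),
\]
% hence the two partial lengths of gradient combine into
\[
|\nabla_{x}F(x,y)|^{2}+|\nabla_{y}F(x,y)|^{2}
=(1+\varepsilon^{2})\,|\nabla f(x+\varepsilon y)|^{2},
\]
% which, evaluated at (X,G), yields the constant C_eps = C(1+eps^2) for X_eps = X + eps*G.
```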

Moreover, (4.5) can be rewritten in the form

[TABLE]

where

[TABLE]

is the Legendre transform of $\alpha_{\varepsilon}(s)=\min\{\frac{s^{2}}{4C_{\varepsilon}},L|s|-L^{2}C_{\varepsilon}\}$ .

If $f\colon\mathbb{R}^{n}\to\mathbb{R}$ is convex, Lipschitz (with arbitrary Lipschitz constant) and bounded from below, then $Q_{t}^{\alpha_{\varepsilon}}f$ is well defined, convex (as an infimum convolution of two convex functions), and $L$ -Lipschitz for $t\in(0,1]$ (since $Q_{t}^{\alpha_{\varepsilon}}f(x)=\inf_{y\in\mathbb{R}^{n}}\{f(y)+t\alpha_{\varepsilon}(|y-x|/t)\}$ and the function $x\mapsto t\alpha_{\varepsilon}(|y-x|/t)$ is $L$ -Lipschitz for $t\in(0,1]$ ).

Moreover, the function $u(t,x)=Q_{t}^{\alpha_{\varepsilon}}f(x)$ is Lipschitz on $(0,\infty)\times\mathbb{R}^{n}$ and satisfies the Hamilton-Jacobi equation

[TABLE]

(see Proposition A.1 in Appendix A). Set

[TABLE]

(Note that $F(t)<\infty$ since $Q_{t}^{\alpha_{\varepsilon}}f$ is $L$ -Lipschitz.) Using the integrability properties of $X$ (and, as a consequence, of $X_{\varepsilon}$ ), together with the Lipschitz property of $u$ , it is not difficult to see that $F$ is locally Lipschitz and for Lebesgue almost all $t\in(0,1)$ ,

[TABLE]

where we used (4.6), the definition of $\alpha_{\varepsilon}^{*}$ , and the fact that $Q_{t}^{\alpha_{\varepsilon}}f$ is $L$ -Lipschitz. Thus

[TABLE]

or, in other words,

[TABLE]

It is easy to see that by taking $\varepsilon\to 0$ we arrive at the assertion of the lemma (recall that $f$ and $Q_{1}^{\alpha_{\varepsilon}}f$ are Lipschitz and $\alpha_{\varepsilon}\leq\alpha$ ).

Suppose now that the log-Sobolev inequality (4.2) holds for all concave and $L$ -Lipschitz functions. As before, we pass to the random vector $X_{\varepsilon}$ which has an absolutely continuous distribution. Let $g\colon\mathbb{R}^{n}\to\mathbb{R}$ be convex and bounded from below. Then the function $f=-Q_{1}^{\alpha_{\varepsilon}}g$ is concave and $L$ -Lipschitz. The same calculation as above yields

[TABLE]

or equivalently

[TABLE]

We stress that now, in order to prove the Hamilton-Jacobi equations via Proposition A.1, we need to use the $L$ -Lipschitz property of $f$ , since in general $f$ is not bounded from below.

Since

[TABLE]

(to verify the inequality take $z=x$ ), a limit argument yields the assertion. ∎

We are now ready for the proof of our main result.

Proof of Theorem 1.3.

The implication (i) $\implies$ (ii) follows immediately from Corollaries 4.3 and 4.4, and the definition of $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta_{2C,c}}$ . To obtain the reverse implication one can use a standard Taylor expansion argument. Assume that $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta_{C,D}}$ holds. Let $f\colon\mathbb{R}^{n}\to\mathbb{R}$ be convex, Lipschitz, and bounded from below. For $x\in\mathbb{R}^{n}$ denote

[TABLE]

where $u_{x}$ is any subgradient of $f$ at $x$ , so that $f^{x}\leq f$ on $\mathbb{R}^{n}$ . Taking $z=x+\varepsilon u_{x}$ with $\varepsilon\to 0$ we see that $|u_{x}|\leq|\nabla f(x)|$ .
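The bound $|u_{x}|\leq|\nabla f(x)|$ can be checked directly. The sketch below assumes that $f^{x}$ is the affine minorant $f^{x}(z)=f(x)+\langle u_{x},z-x\rangle$ and that $u_{x}\neq 0$ (the case $u_{x}=0$ is trivial).

```latex
% Subgradient inequality at z = x + eps*u_x, eps > 0:
\[
f(x+\varepsilon u_{x})-f(x)\geq\langle u_{x},\varepsilon u_{x}\rangle=\varepsilon|u_{x}|^{2},
\qquad
|z-x|=\varepsilon|u_{x}|,
\]
% so the difference quotient along this direction is at least |u_x|:
\[
\frac{|f(z)-f(x)|}{|z-x|}\geq\frac{\varepsilon|u_{x}|^{2}}{\varepsilon|u_{x}|}=|u_{x}|,
\qquad\text{hence}\qquad
|\nabla f(x)|=\limsup_{y\to x}\frac{|f(y)-f(x)|}{|y-x|}\geq|u_{x}|.
\]
```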

For sufficiently small $\varepsilon$ we have $\varepsilon|\nabla f(x)|\leq D$ for all $x\in\mathbb{R}^{n}$ , and hence

[TABLE]

(recall that $|u_{x}|\leq|\nabla f(x)|$ ). We now substitute $\varepsilon f$ into the dual formulation (2.6) and use the above estimate. An inspection of the Taylor expansions up to order $\varepsilon^{2}$ yields

[TABLE]

This ends the proof. ∎

5. Examples of measures satisfying the convex Poincaré inequality

We will now discuss several tools which allow one to construct measures satisfying the convex Poincaré inequality. To shorten the notation we will denote by $\mathbb{E}_{\mu}f$ and $\operatorname{Var}_{\mu}f$ respectively the mean and variance of $f$ seen as a random variable on $\mathbb{R}^{n}$ equipped with the probability measure $\mu$ .

Let us start with the well known tensorization property of variance (see e.g. [5, Proposition 1.4.1]), which asserts that whenever $\mu_{i}$ are probability measures on $\mathcal{X}_{i}$ , $i=1,\ldots,n$ , then the product measure $\mu=\mu_{1}\otimes\cdots\otimes\mu_{n}$ on $\mathcal{X}_{1}\times\cdots\times\mathcal{X}_{n}$ , satisfies the inequality

[TABLE]

for every function $f\colon\mathcal{X}_{1}\times\cdots\times\mathcal{X}_{n}\to\mathbb{R}$ , where $\operatorname{Var}_{\mu_{i}}f$ denotes the variance of $f$ treated as a function on $\mathcal{X}_{i}$ , with the other coordinates fixed.

This immediately implies the tensorization property for the convex Poincaré inequality, namely if $\mu_{i}$ ( $i=1,\ldots,N$ ) is a probability measure on $\mathbb{R}^{n_{i}}$ , satisfying the convex Poincaré inequality with constant $\lambda$ , then the product measure $\mu=\mu_{1}\otimes\cdots\otimes\mu_{N}$ on $\mathbb{R}^{n_{1}+\cdots+n_{N}}$ satisfies

[TABLE]

for every convex function $f\colon\mathbb{R}^{n_{1}+\cdots+n_{N}}\to\mathbb{R}$ , where $|\nabla_{i}f|$ denotes the ‘partial length of gradient’ along $\mathbb{R}^{n_{i}}$ . If the measures $\mu_{i}$ are absolutely continuous with respect to the Lebesgue measure, then by Rademacher’s theorem locally Lipschitz functions are almost everywhere differentiable; in particular the right-hand side of the above inequality coincides with $\lambda^{-1}\mathbb{E}|\nabla f|^{2}$ and so we obtain that $\mu$ satisfies the convex Poincaré inequality with constant $\lambda$ . The situation is more delicate for measures which are not absolutely continuous; however, thanks to results by Gozlan, Roberto and Samson [19], we can obtain the following simple proposition.

Proposition 5.1.

Assume that $\mu_{i}$ are probability measures on $\mathbb{R}^{n_{i}}$ , $i=1,\ldots,N$ , satisfying the convex Poincaré inequality with constant $\lambda$ . Then the measure $\mu=\mu_{1}\otimes\cdots\otimes\mu_{N}$ on $\mathbb{R}^{n_{1}+\cdots+n_{N}}$ satisfies the convex Poincaré inequality with constant $\lambda/C$ for some universal constant $C$ .

Proof.

We provide only a sketch of the proof, leaving some computational details to the Reader. Denote $n=n_{1}+\cdots+n_{N}$ and consider an arbitrary convex smooth 1-Lipschitz function $f$ on $\mathbb{R}^{nk}$ , $k\geq 1$ . By (5.1) we have $\operatorname{Var}_{\mu^{\otimes k}}f\leq\lambda^{-1}\mathbb{E}_{\mu^{\otimes k}}|\nabla f|^{2}\leq\lambda^{-1}$ . Arguing as in the proof of Proposition 2.2 (for $p>2$ , to remain in the smooth setting) we arrive at

[TABLE]

for all 1-Lipschitz smooth convex functions. We can extend this inequality to arbitrary 1-Lipschitz convex functions (approximating them with 1-Lipschitz smooth convex functions, e.g. by convolving them with Gaussian densities, see [28, p. 429]), so in particular we get that for any convex set $A\subseteq\mathbb{R}^{nk}$ , with $\mu^{\otimes k}(A)\geq 1/2$ , and all $t>0$ ,

[TABLE]

where $B_{2}^{nk}$ is the unit Euclidean ball in $\mathbb{R}^{nk}$ . Recall the notation

[TABLE]

By [19, Theorem 6.7], the dimension free subexponential concentration for convex sets of the form (5.2) implies that $\mu$ satisfies the Poincaré inequality

[TABLE]

for all convex functions $f$ , where

[TABLE]

where $\bar{\Phi}$ is the Gaussian tail function. Using the estimate $\bar{\Phi}(x)\geq\frac{1}{2}e^{-x^{2}}$ and performing some elementary calculations, we arrive at the assertion of the proposition. ∎

Remark 5.2.

The above argument shows that if $\mu$ satisfies the Poincaré inequality (1.1) then it also satisfies the formally stronger inequality (5.3) with $\lambda^{\prime}=\lambda/C$ . We remark that in the category of all Lipschitz functions it is known that the Poincaré inequalities with the length of gradients $|\nabla^{-}f|$ and $|\nabla f|$ are equivalent and the involved constants do not change (cf. [19, Remark 1.1]).

Tensorization allows one, in particular, to pass from one-dimensional measures satisfying the convex Poincaré inequality (characterized in [10]) to product measures in higher dimensions. Another standard tool for producing new examples is perturbation: if $\mu$ satisfies the convex Poincaré inequality with constant $\lambda$ and $\nu$ is a measure with density $e^{U}$ with respect to $\mu$ , then $\nu$ satisfies the convex Poincaré inequality with constant $\lambda\exp(\inf U-\sup U)$ . For the proof see e.g. [5, Chapter 3.4] (the proof therein is written in the context of Markov processes and Dirichlet forms but it is based only on the elementary observation that $\operatorname{Var}f=\inf_{a\in\mathbb{R}}\mathbb{E}|f-a|^{2}$ and works in exactly the same way in the convex setting).
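For the reader's convenience, the perturbation argument can be sketched in a few lines. The sketch assumes that $U$ is bounded and that $\nu$ is a probability measure; $f$ is a convex function and $a=\mathbb{E}_{\mu}f$ .

```latex
% Step 1: variance is the minimal mean squared deviation, and e^U <= e^{sup U}.
\[
\operatorname{Var}_{\nu}f
\leq\int|f-a|^{2}e^{U}\,d\mu
\leq e^{\sup U}\int|f-a|^{2}\,d\mu
=e^{\sup U}\operatorname{Var}_{\mu}f .
\]
% Step 2: convex Poincare inequality for mu, then d\mu = e^{-U} d\nu <= e^{-inf U} d\nu.
\[
\operatorname{Var}_{\mu}f
\leq\frac{1}{\lambda}\int|\nabla f|^{2}\,d\mu
\leq\frac{e^{-\inf U}}{\lambda}\int|\nabla f|^{2}\,d\nu .
\]
% Combining both steps:
\[
\operatorname{Var}_{\nu}f
\leq\frac{1}{\lambda\exp(\inf U-\sup U)}\int|\nabla f|^{2}\,d\nu .
\]
```

Only the convexity of $f$ itself is used (to apply the inequality for $\mu$ ), which is why the argument carries over to the convex setting without change.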

Perturbation and tensorization are tools that appeared for the first time in the ‘classical’ theory of Poincaré and log-Sobolev inequalities for smooth (or locally Lipschitz) functions. The next proposition does not have a counterpart in the classical setting and significantly extends the set of tools for creating new examples. Namely, we will show that the convex Poincaré inequality passes to mixtures of measures. Note that this cannot be the case for the classical Poincaré inequality since it clearly cannot hold for measures with disconnected support. We note however that the preservation of the Poincaré and log-Sobolev inequalities by mixtures of measures with overlapping supports has been investigated by Chafaï and Malrieu in [11]. In particular, Proposition 5.3 below was inspired by the calculation in Section 4.1 therein.

Let $\mathcal{T}_{2}(\mu_{0},\mu_{1})$ stand for the usual Kantorovich transport cost between $\mu_{1}$ and $\mu_{0}$ (defined by taking $\theta(x)=|x|^{2}$ in (1.5)), in other words the square of the Kantorovich-Wasserstein distance $W_{2}$ .

Proposition 5.3.

Let $\mu_{0}$ , $\mu_{1}$ be probability measures on $\mathbb{R}^{n}$ which satisfy the convex Poincaré inequality (1.1) with constants $\lambda_{0}$ and $\lambda_{1}$ respectively. Then the measure $\mu_{p}=p\mu_{1}+(1-p)\mu_{0}$ , $p\in(0,1)$ , satisfies the convex Poincaré inequality (1.1) with constant

[TABLE]

Proof.

If $f\colon\mathbb{R}^{n}\to\mathbb{R}$ is a convex function, then

[TABLE]

and it suffices to estimate the last term.
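For orientation, recall the standard variance decomposition for a two-component mixture, written here in the notation of Proposition 5.3. We assume (this is our reading, not a quotation) that the ‘last term’ above is the squared difference of means appearing in it.

```latex
% With m_j = E_{\mu_j} f, the mean of the mixture is p*m_1 + (1-p)*m_0, and
\[
\operatorname{Var}_{\mu_{p}}f
=p\operatorname{Var}_{\mu_{1}}f+(1-p)\operatorname{Var}_{\mu_{0}}f
+p(1-p)\bigl(\mathbb{E}_{\mu_{1}}f-\mathbb{E}_{\mu_{0}}f\bigr)^{2}.
\]
% Check of the cross term:
%   p*m_1^2 + (1-p)*m_0^2 - (p*m_1 + (1-p)*m_0)^2 = p(1-p)(m_1 - m_0)^2.
```

The first two terms are controlled by the convex Poincaré inequalities for $\mu_{1}$ and $\mu_{0}$ , so only the difference of means requires a new argument.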

Let $X$ and $Y$ be random vectors in $\mathbb{R}^{n}$ with laws $\mu_{1}$ and $\mu_{0}$ respectively. By convexity of $f$ ,

[TABLE]

Thus,

[TABLE]

Taking the infimum over all realizations of $X$ and $Y$ implies the assertion. ∎

6. Refined concentration of measure derived from infimum convolution inequalities

In this section we explain what concentration inequalities for convex functions can be obtained from general infimum convolution inequalities of the form (2.6). While some parts of our derivation are well known and are included only for the sake of completeness, we also provide new inequalities valid beyond the setting of Lipschitz functions. Their proofs are elementary, but to the best of our knowledge they have not been noted in the literature before.

Throughout this section $\theta\colon\mathbb{R}^{n}\to[0,\infty)$ is a convex function. We also assume the following conditions:

•

$\theta(x)=\theta(-x)$ for all $x\in\mathbb{R}^{n}$ ,

•

$\theta(x)=0$ if and only if $x=0$ (in particular, by convexity, $\lim_{|x|\to\infty}\theta(x)=\infty$ ).

We remark that at the cost of some technical work one can obtain the results we present below for more general cost functions (e.g. taking the value $\infty$ or not satisfying the symmetry condition). We restrict to the smaller class to simplify the presentation.

In what follows, for a function $f\colon\mathbb{R}^{n}\to\mathbb{R}$ , bounded from below, we set

[TABLE]

We also denote

[TABLE]

6.1. Enlargements of sets and concentration for Lipschitz functions

Let us start with the classical description of concentration of measure in terms of enlargements of sets. The following proposition goes back to [24].

Proposition 6.1.

Assume that $\mu$ is a probability measure on $\mathbb{R}^{n}$ , satisfying

[TABLE]

for all convex functions $f\colon\mathbb{R}^{n}\to\mathbb{R}$ , bounded from below. Then for all convex subsets $A\subseteq\mathbb{R}^{n}$ and $r>0$ , we have

[TABLE]

Proof.

Consider $f=\infty 1_{(\operatorname{cl}A)^{c}}$ and note that $Qf(x)<r$ if and only if there exists $y\in A$ such that $\theta(x-y)<r$ . Applying the inequality (6.1) to $f$ (which can be justified by monotone approximation), we obtain

[TABLE]

To formulate corollaries to the above proposition we need to introduce new notation, which at first may seem rather abstract. However, as the examples presented in the subsequent parts of this section will show, it will prove useful in providing a uniform framework for concentration inequalities, especially in the non-Lipschitz case.

Definition 6.2.

Define the norm $|\cdot|_{\frac{1}{p}\theta}$ on $\mathbb{R}^{n}$ , as the Orlicz norm corresponding to the function $x\mapsto\frac{1}{p}\theta(x)$ , i.e.

[TABLE]

Define also the norm $|\cdot|_{\theta,p}$ on $\mathbb{R}^{n}$ as the dual to $|\cdot|_{\frac{1}{p}\theta}$ , i.e.

[TABLE]

The norm $|x|_{\theta,p}$ is equivalent (up to universal constants) to the Orlicz norm $|\cdot|_{\theta^{\ast}_{p}}$ related to the function $\theta^{\ast}_{p}(x)=\frac{1}{p}\theta^{\ast}(px)$ , explicitly given by

[TABLE]

It was observed by Gluskin and Kwapień in [15] that norms of this type play an important role in moment estimates for sums of independent random variables. Recently it has been noted [3, 1] that they also appear in moment estimates for smooth functions of random vectors satisfying modified log-Sobolev inequalities. Since in the context of transportation or infimum convolution inequalities one starts from the function $\theta$ and not from $\theta^{\ast}$ (which is the case in the corresponding log-Sobolev setting) it is more convenient to work with $|\cdot|_{\theta,p}$ rather than with the equivalent norm $|\cdot|_{\theta^{\ast}_{p}}$ used in [3, 1].

In what follows we will need the following simple inequality which follows from convexity of $\theta$ and the assumption $\theta(0)=0$ . For $x\in\mathbb{R}^{n}$ , $p>0$ , and $t\geq 1$ ,

[TABLE]

The following corollary to Proposition 6.1 is again based on by now standard arguments, written however in the language of the norms $|\cdot|_{\theta,p}$ .

Corollary 6.3.

Let $X$ be a random vector with law $\mu$ , satisfying (6.1) for all convex functions $f\colon\mathbb{R}^{n}\to\mathbb{R}$ bounded from below. Then for any smooth convex Lipschitz function $f\colon\mathbb{R}^{n}\to\mathbb{R}$ and $p\geq 0$ ,

[TABLE]

Remark 6.4.

It is easy to see that if the inequality (6.3) holds for all smooth convex Lipschitz functions, then one can apply it to an arbitrary convex Lipschitz function, replacing $\sup_{x\in\mathbb{R}^{n}}|\nabla f(x)|_{\theta,p}$ by the Lipschitz constant of $f$ with respect to the norm $|\cdot|_{\frac{1}{p}\theta}$ . To verify this it is enough to consider convolutions of $f$ with a sequence of Gaussian densities converging to Dirac’s mass at zero: they are smooth, have the same Lipschitz constant as $f$ , and converge to $f$ uniformly (see e.g. [28, p. 429]).

Proof of Corollary 6.3.

Let $A=\{y\in\mathbb{R}^{n}\colon f(y)\leq\operatorname{Med}f(X)\}$ , so that $\mathbb{P}(X\in A)\geq 1/2$ . Then by convexity, for any $y\in A$ ,

[TABLE]

Thus

[TABLE]

where in the second inequality we used Proposition 6.1.

Let now $A=\{y\in\mathbb{R}^{n}\colon f(y)<\operatorname{Med}f(X)-\sup_{x\in\mathbb{R}^{n}}|\nabla f(x)|_{\theta,p}\}$ . Similarly as above, we obtain

[TABLE]

which shows that

[TABLE]

Combining the last inequality with (6.5) proves the corollary. ∎

6.2. Concentration inequalities for general convex functions

We are now ready to state the main result of this section, contained in the following theorem, dealing with general (not necessarily Lipschitz) convex functions. In its formulation we adopt the convention $\frac{0}{0}=0$ . The proof of the theorem as well as of its corollary is postponed to Section 6.3.

We would like to emphasize that in the theorem we assume only (6.3), which is strictly weaker than the infimum-convolution inequality (6.1).

Theorem 6.5.

Let $X$ be a random vector satisfying (6.3) for all smooth convex Lipschitz functions $f\colon\mathbb{R}^{n}\to\mathbb{R}$ . Then for any smooth convex function $f\colon\mathbb{R}^{n}\to\mathbb{R}$ , the following properties hold.

(i)

For any $p\geq 1$ ,

[TABLE]

(ii)

Let $p>0$ , $q\in(1/2,1]$ and let $M_{p,q}\in\mathbb{R}$ satisfy $\mathbb{P}(|\nabla f(X)|_{\theta,p}\leq M_{p,q})\geq q$ . Then

[TABLE]

In particular for $p\geq 0$ ,

[TABLE]

(iii)

For all $p>0$ ,

[TABLE]

Remark 6.6.

As will become clear in the proof, the part (i) of the above theorem holds in fact under one-sided concentration, i.e. it is enough to assume that

[TABLE]

Let us now illustrate the above theorem with a few concrete examples and a corollary. In particular we will show what the norms $|\cdot|_{\theta,p}$ look like for different choices of the cost function $\theta$ .

Example 6.7.

If $\theta(x)=c|x|^{r}$ for some $r\geq 1$ and $c>0$ , then $|x|_{\theta,p}=c^{-1/r}p^{1/r}|x|$ and (6.3) is equivalent to

[TABLE]

for all 1-Lipschitz convex functions (in particular for $r=2$ we get the subgaussian concentration). The first part of Theorem 6.5 gives then the following inequality for all (not necessarily Lipschitz) convex functions and $p\geq 1$ ,

[TABLE]

Thus by the $L^{p}$ -Chebyshev inequality, with $p=ct^{r}/(3e)^{r}$ we obtain for $t\geq 0$ ,

[TABLE]

(the additional factor $e$ on the right-hand side is introduced artificially to encompass all $t\geq 0$ , also those for which $p<1$ ; note that in this case the right-hand side exceeds one). We remark that similar self-normalized inequalities are known e.g. in the theory of empirical processes (see [12]).

The lower tail inequality gives

[TABLE]

Moreover, using the full strength of part (ii) of Theorem 6.5, one can replace $\mathbb{E}|\nabla f(X)|$ by $4^{-1}M_{3/4}$ , where $M_{3/4}$ is the $3/4$ quantile of $|\nabla f(X)|$ . Thus no integrability conditions on the gradient are in fact required.
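The formula $|x|_{\theta,p}=c^{-1/r}p^{1/r}|x|$ used in this example can be verified by a short computation. The sketch below assumes that the Orlicz norm of Definition 6.2 is taken in the usual Luxemburg form $|x|_{\frac{1}{p}\theta}=\inf\{a>0\colon\frac{1}{p}\theta(x/a)\leq 1\}$ .

```latex
% Orlicz (Luxemburg) norm for theta(x) = c|x|^r:
%   (1/p) * c * |x/a|^r <= 1   <=>   a >= (c/p)^{1/r} |x|.
\[
|x|_{\frac{1}{p}\theta}=\Bigl(\frac{c}{p}\Bigr)^{1/r}|x| .
\]
% The dual of a multiple k|.| of the Euclidean norm is (1/k)|.|, hence
\[
|x|_{\theta,p}
=\sup\bigl\{\langle x,y\rangle\colon|y|_{\frac{1}{p}\theta}\leq 1\bigr\}
=\Bigl(\frac{p}{c}\Bigr)^{1/r}|x|
=c^{-1/r}p^{1/r}|x| .
\]
```

In particular, for $r=2$ the norm scales like $\sqrt{p}$ , while for $r=1$ it scales like $p$ , which matches the subgaussian and subexponential regimes respectively.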

Remark 6.8.

Let us note that inequalities similar to (6.11) were previously known with the quantity $(\mathbb{E}|\nabla f(X)|^{2})^{1/2}$ instead of the quantile or $\mathbb{E}|\nabla f(X)|$ (see [28] or [23, Chapter 3.3]). Very recently, Paouris and Valettas [26] have proved that the standard Gaussian vector in $\mathbb{R}^{n}$ satisfies a similar inequality (for $r=2$ ) with $\mathbb{E}|f(X)-\operatorname{Med}f(X)|$ in place of $\mathbb{E}|\nabla f(X)|$ . Their proof uses in a crucial way isoperimetric properties of Gaussian measures. The version with $\mathbb{E}|\nabla f(X)|$ follows simply by an application of the (1,1)-Poincaré inequality for the Gaussian measure, i.e. $\mathbb{E}|f(X)-\operatorname{Med}f(X)|\leq C\mathbb{E}|\nabla f(X)|$ (see e.g. [27, 25]). In fact the proof in [26] gives also inequalities in terms of quantiles of $|f(X)-M|$ . We do not know if they are comparable to our estimates (specialized to the standard Gaussian measure) in terms of quantiles of $|\nabla f(X)|$ .

Note also that (6.9) for $r=1$ is a consequence of the convex Poincaré inequality (however we do not know if (1.1) implies (6.9) with $c$ depending only on $\lambda$ and not on the dimension $n$ , see Question 7.3 below).

Example 6.9.

Let us now consider a measure $\mu$ on $\mathbb{R}^{n}$ satisfying the convex Poincaré inequality with constant $\lambda$ . Then, by Theorem 3.1 it satisfies the convex Bobkov-Ledoux inequality (3.1) with constants $C$ and $c$ depending only on $\lambda$ . By the classical Herbst argument it follows (see e.g. [6, 2]) that for each $N\geq 1$ , if $X$ is an $Nn$ -dimensional random vector with law $\mu^{\otimes N}$ , then for any smooth convex function $f\colon\mathbb{R}^{Nn}\to\mathbb{R}$ and any $t>0$ ,

[TABLE]

where for $x=(x_{1},\ldots,x_{N})\in(\mathbb{R}^{n})^{N}=\mathbb{R}^{Nn}$ , $\nabla_{i}f(x)$ denotes the partial gradient with respect to $x_{i}$ .

Moreover, by the Poincaré inequality

[TABLE]

which, at the cost of changing the constant, allows one to replace the mean by the median in the above inequality. Thus we obtain that for some constant $c^{\prime\prime}(\lambda)$ and $p>0$ ,

[TABLE]

It is easy to see that up to universal constants $c^{\prime\prime}(\lambda)(\sqrt{p}|x|+p\max_{i\leq N}|x_{i}|)$ is equivalent to $|x|_{\theta,p}$ , where

[TABLE]

More precisely

[TABLE]

Thus, the first part of Theorem 6.5 together with Remark 6.6 gives, for an arbitrary smooth convex function $f$ on $\mathbb{R}^{Nn}$ , the inequality

[TABLE]

for $p\geq 1$ , where $c^{\prime\prime\prime}(\lambda)$ depends only on $\lambda$ . By Chebyshev’s inequality this implies that

[TABLE]

for $t\geq 1$ (note that contrary to (6.10) this time $t$ cannot be removed from the denominator).

As for the lower tail, by Theorem 1.3, Remark 1.4, Lemma 2.5 and tensorization properties of infimum convolution inequalities (see Lemma 5 in [24]) we obtain that $X$ satisfies (6.1) and thus also (6.3) with $\theta(x)=K(\lambda,n)\sum_{i=1}^{N}\min(|x_{i}|^{2},|x_{i}|)$ , where $K(\lambda,n)$ depends only on $\lambda$ and the dimension $n$ . Thus, by the second part of Theorem 6.5,

[TABLE]

or equivalently (up to constants depending only on $\lambda,n$ ),

[TABLE]

We stress that all the above inequalities are dimension-free in the sense that the constants do not depend on the number $N$ but just on the initial dimension $n$ (cf. Remark 1.5).

Example 6.10.

Finally, we remark that general cost functions $\theta$ lead to other concentration profiles, which have been studied in the literature. One can for instance consider products of measures on $\mathbb{R}$ , satisfying (6.1) with

[TABLE]

for $r\geq 1$ (such measures are characterized thanks to results in [17]). If we denote for $x\in\mathbb{R}^{n}$ , $|x|_{r}=(|x_{1}|^{r}+\cdots+|x_{n}|^{r})^{1/r}$ and let $r^{\ast}$ be the Hölder conjugate of $r$ , then such costs correspond for $r\in[1,2]$ to norms of the form $|x|_{\theta,p}\simeq\sqrt{p}|x|+p^{1/r}|x|_{r^{\ast}}$ (the case $r=1$ has been discussed above), while for $r>2$ to

[TABLE]

where $(x_{i}^{\ast})_{i=1}^{n}$ is the non-increasing rearrangement of the sequence $(|x_{i}|)_{i=1}^{n}$ .

We will now present a corollary to Theorem 6.5, providing concentration inequalities for non-Lipschitz convex functions, in the spirit of recent results due to Bobkov, Nayar, and Tetali [8].

Corollary 6.11.

Under the assumptions of Theorem 6.5 for all convex functions $f\colon\mathbb{R}^{n}\to\mathbb{R}$ ,

[TABLE]

Moreover, for any $p\geq 1$ ,

[TABLE]

Let us note that inequalities of the form (6.12) have been obtained in [1] for all smooth functions of random vectors satisfying modified log-Sobolev inequalities (assumed to hold for all smooth functions). Therein, the function $\theta$ had to satisfy some appropriate growth condition.

Example 6.12.

In particular for $\theta(x)=c|x|^{2}$ , the above corollary gives

[TABLE]

By substituting $p=\frac{ct^{2}}{(3e)^{2}L^{2}}$ and adjusting the constant we obtain

[TABLE]

where $c^{\prime}$ is positive and depends only on $c$ . The factor 2 in the above inequality is introduced for notational simplicity to allow the whole range of $L>0$ in the infimum (note that for large $L$ we have $p<1$ and we cannot apply Corollary 6.11, on the other hand the above inequality becomes then trivial, as the right-hand side exceeds one).
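The effect of the substitution $p=\frac{ct^{2}}{(3e)^{2}L^{2}}$ can be made explicit. The sketch below uses the identity $|x|_{\theta,p}=\sqrt{p/c}\,|x|$ for $\theta(x)=c|x|^{2}$ (Example 6.7 with $r=2$ ) and the threshold $t/(3e)$ on $|\nabla f(X)|_{\theta,p}$ that appears in the proof of Corollary 6.11.

```latex
% With p = c t^2 / ((3e)^2 L^2) we have sqrt(p/c) = t / (3 e L), so
\[
|\nabla f(X)|_{\theta,p}\leq\frac{t}{3e}
\iff
\frac{t}{3eL}\,|\nabla f(X)|\leq\frac{t}{3e}
\iff
|\nabla f(X)|\leq L ,
\]
% and the exponential term becomes Gaussian in t/L:
\[
e^{-p}=\exp\Bigl(-\frac{c\,t^{2}}{9e^{2}L^{2}}\Bigr).
\]
% The requirement p >= 1 of Corollary 6.11 reads L <= sqrt(c) t / (3e).
```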

Recall also the second part of Theorem 6.5 which for $q=3/4$ gives in this case

[TABLE]

where $M_{3/4}=\inf\{x\in\mathbb{R}\colon\mathbb{P}(|\nabla f(X)|\leq x)\geq 3/4\}$ and $c^{\prime\prime}$ again depends only on $c$ .

The above inequalities should be compared with a recent result in [8], which asserts that for some positive constant $c^{\prime\prime\prime}$ depending only on $c$ ,

[TABLE]

where $Y$ is an independent copy of $X$ .

It is not difficult to see that in the regime of $t$ for which the above inequalities are of interest, i.e. the right-hand sides are small, (6.13) gives estimates on the upper tail which (up to numerical constants) are comparable to those implied by (6.15), whereas for the lower tail, the inequality (6.14) improves over (6.15).

Example 6.13.

Consider now $\theta(x)=\sum_{i=1}^{N}\min(|x_{i}/c|^{2},|x_{i}/c|)$ , which we have already discussed in Example 6.9. From Corollary 6.11 we get

[TABLE]

By substituting $p=\min\{\frac{t^{2}}{(2c^{\prime})^{2}L^{2}},\frac{t}{2c^{\prime}M}\}$ and using the union bound we obtain

[TABLE]

with $c^{\prime\prime}$ depending only on $c$ . As in the preceding example, the factor 2 is introduced to allow for all positive values of $L,M$ .

Remark 6.14.

Let us note that another way of obtaining estimates on the upper tail of non-Lipschitz functions under the convex Poincaré inequality is to use the estimates (2.1) and (2.2). By approximating arbitrary convex functions with Lipschitz ones we can easily see that they hold in fact for all convex functions. Thus, if one controls the moments of $|\nabla f(X)|$ , one can obtain tail estimates beyond the Lipschitz case. Such inequalities are however different than those of the above example as they are of exponential type and not of mixed exponential or Gaussian type. On the other hand, the weak transportation inequality with cost $\theta(x)=c\sum_{i=1}^{n}\min(|x_{i}|^{2},|x_{i}|)$ arises usually as a consequence of tensorization, so in order to apply it we need some additional product structure of the measure.

6.3. Proofs of Theorem 6.5 and Corollary 6.11

Proof of Theorem 6.5.

Let us start with (i), the proof of which is quite similar to the proof of Corollary 6.3. Let us again define $A=\{x\in\mathbb{R}^{n}\colon f(x)\leq\operatorname{Med}f(X)\}$ . Using (6.2) and (6.4), we can write for $t\geq 1$ ,

[TABLE]

Hence for $t\geq 1$ ,

[TABLE]

where we used the fact that the function $g(x)=\inf_{y\in A}|x-y|_{\frac{1}{tp}\theta}$ is convex, 1-Lipschitz with respect to $|\cdot|_{\frac{1}{tp}\theta}$ and $\operatorname{Med}g(X)=0$ , together with Corollary 6.3 and Remark 6.4. We can now integrate by parts and get

[TABLE]

(the integrand is pointwise non-increasing with respect to $p\geq 1$ , as the computation of the derivative with respect to $p$ reveals), which proves the first part of the theorem.

Let us now pass to the second part. Assume without loss of generality that $\operatorname{Med}f(X)=0$ . Consider the set $B=\{x\in\mathbb{R}^{n}\colon|\nabla f(x)|_{\theta,p}\leq M_{p,q}\}$ . By the definition of $M_{p,q}$ , we have $\mathbb{P}(X\in B)\geq q$ . Let $\tilde{f}\colon\mathbb{R}^{n}\to\mathbb{R}$ be defined as

[TABLE]

Then $\tilde{f}$ is convex, moreover by convexity of $f$ we have $\tilde{f}\leq f$ pointwise and $\tilde{f}=f$ on $B$ . By the definition of the set $B$ and inequality (6.2), for any $t\geq 1$ all linear functionals $x\mapsto\langle\nabla f(y),x\rangle$ , $y\in B$ , are $(tM_{p,q})$ -Lipschitz with respect to $|\cdot|_{\frac{1}{tp}\theta}$ and therefore so is $\tilde{f}$ . By Corollary 6.3 and Remark 6.4 this implies that for any $t\geq 1$ ,

[TABLE]

We also have $\mathbb{P}(\tilde{f}(X)\geq 0)\geq\mathbb{P}(f(X)\geq 0\;\textrm{and}\;X\in B)\geq q-1/2$ . Therefore, the above inequality applied with $t\searrow\log(8/(2q-1))>1$ gives

[TABLE]

which by another application of (6.16) implies

[TABLE]

This proves the first inequality of part (ii).

The second inequality of part (ii) follows from the first one by specializing to $q=3/4$ , $M_{p,q}=4\mathbb{E}|\nabla f(X)|_{\theta,p}$ and some elementary calculations.
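Two elementary probabilistic steps used in the proof of part (ii) can be written out as follows; both are standard, and the notation ( $B$ , $M_{p,q}$ , $\operatorname{Med}f(X)=0$ ) is that of the proof above.

```latex
% (a) Intersection bound: P(f(X) >= 0) >= 1/2 since Med f(X) = 0, and P(X in B) >= q, so
\[
\mathbb{P}\bigl(f(X)\geq 0\ \text{and}\ X\in B\bigr)
\geq\mathbb{P}(f(X)\geq 0)+\mathbb{P}(X\in B)-1
\geq q-\tfrac{1}{2}.
\]
% (b) Admissibility of M = 4 E|\nabla f(X)|_{theta,p} for q = 3/4, by Markov's inequality:
\[
\mathbb{P}\bigl(|\nabla f(X)|_{\theta,p}>4\,\mathbb{E}|\nabla f(X)|_{\theta,p}\bigr)\leq\tfrac{1}{4},
\qquad\text{i.e.}\qquad
\mathbb{P}\bigl(|\nabla f(X)|_{\theta,p}\leq M_{p,3/4}\bigr)\geq\tfrac{3}{4}.
\]
```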

As for part (iii), using again (6.2) and (6.7), we get for $t\geq 16\mathbb{E}|\nabla f(X)|_{\theta,p}$

[TABLE]

Now, again by integration by parts,

[TABLE]

which ends the proof. ∎

Proof of Corollary 6.11.

To prove the first inequality it is enough to note that if $|\nabla f(X)|_{\theta,p}\leq t/(3e)$ and $f(X)-\operatorname{Med}f(X)\geq t$ , then

[TABLE]

where the last inequality follows from (6.6). The assertion follows thus from Chebyshev’s inequality: $\mathbb{P}(Z\geq e\|Z\|_{p})\leq e^{-p}$ .
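The form of Chebyshev's inequality used here is just Markov's inequality applied to the $p$ -th power. A short verification, assuming $Z\geq 0$ , $p>0$ and $0<\|Z\|_{p}<\infty$ :

```latex
\[
\mathbb{P}\bigl(Z\geq e\|Z\|_{p}\bigr)
=\mathbb{P}\bigl(Z^{p}\geq e^{p}\|Z\|_{p}^{p}\bigr)
\leq\frac{\mathbb{E}Z^{p}}{e^{p}\,\mathbb{E}Z^{p}}
=e^{-p}.
\]
```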

As for the second inequality, we apply the first one with $t=3e^{2}\||\nabla f(X)|_{\theta,p}\|_{p}$ and combine it with the estimate (6.7). ∎

7. Further questions

Let us conclude with some open questions, which seem natural in view of our results.

As already mentioned in the introduction, in our proof of the implication

[TABLE]

the constants $C,D$ do not depend just on $\lambda$ , but also on certain quantiles of the measure $\mu$ . In fact, the issue comes from the inequality $\overline{\mathbf{T}}\vphantom{\mathbf{T}}^{+}$ , since the constants in $\overline{\mathbf{T}}\vphantom{\mathbf{T}}^{-}$ do depend only on $\lambda$ (see Corollary 4.3). This gives rise to our first question.

Question 7.1.

Does the Poincaré inequality with constant $\lambda$ imply the weak transportation inequality $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta_{C,D}}$ with constants $C,D$ depending only on $\lambda$ ?

An inspection of our proof shows that in order to answer the above question in the affirmative, it is enough to remove the restriction on $t$ in Lemma 2.4. An improved version of this lemma, valid for all $t>0$ , would follow from part (ii) of Theorem 6.5, provided that one can show that the convex Poincaré inequality with constant $\lambda$ implies subexponential concentration for convex 1-Lipschitz functions, with constants depending only on $\lambda$ . The problem lies in the lower tail (as the upper one is handled by Proposition 2.2). More precisely, we have the following result.

Theorem 7.2.

Assume that $\mu$ is a probability measure on $\mathbb{R}^{n}$ , satisfying the convex Poincaré inequality (1.1) with constant $\lambda$ and $c$ is a positive constant, such that for all $1$ -Lipschitz convex functions $f\colon\mathbb{R}^{n}\to\mathbb{R}$ and all $t>0$ ,

[TABLE]

Then $\mu$ satisfies the inequality $\overline{\mathbf{T}}\vphantom{\mathbf{T}}_{\theta_{C,D}}$ with $C,D$ depending only on $\lambda$ and $c$ .

This motivates the following question, which is clearly of interest also in its own right.

Question 7.3.

Does the convex Poincaré inequality (1.1) with constant $\lambda$ imply subexponential estimates for the lower-tail of convex 1-Lipschitz functions, with constants depending only on $\lambda$ ? Specifically, is it true that whenever $\mu$ is a probability measure on $\mathbb{R}^{n}$ satisfying (1.1), then for every convex $1$ -Lipschitz function $f\colon\mathbb{R}^{n}\to\mathbb{R}$ ,

[TABLE]

where the constant $c(\lambda)$ depends only on $\lambda$ ?

The inequality provided by Lemma 2.4 introduces an additional dependence on $n$ , which carries over to the dependence of constants in Theorem 1.3. Let us point out that all the proofs of lower-tail estimates based on the Poincaré inequality and available for the category of all smooth functions, which we have been able to find in the literature, seem to break down in the convex setting (see e.g. the arguments in [20, 4, 19]).

Appendix A Facts related to Hamilton-Jacobi equations

We will now present some basic properties of Hamilton-Jacobi equations related to infimum convolution operators with the cost $\theta(x)=\alpha(|x|)$ , where $\alpha$ is given by (4.3), which have been exploited in the proof of Lemma 4.1. All the facts we rely on are quite standard; however, in the literature they are usually stated under slightly different sets of assumptions, which makes it difficult to find an off-the-shelf result applicable to our situation. We will briefly indicate how the arguments from [13, Chapter 3] can be modified to yield the properties we need. Alternatively, as in [17], one could rely on a modification of the results from [18], where the theory of Hamilton-Jacobi equations is extended to the setting of metric spaces.

Proposition A.1.

Let $C,L$ be positive constants and let $\alpha$ be defined by (4.3). Assume that $f\colon\mathbb{R}^{n}\to\mathbb{R}$ is either bounded from below or $L$ -Lipschitz and let $u\colon(0,\infty)\times\mathbb{R}^{n}\to\mathbb{R}$ be given by $u(t,x)=Q_{t}^{\alpha}f(x)$ , where

$$Q_{t}^{\alpha}f(x)=\inf_{y\in\mathbb{R}^{n}}\bigl\{f(y)+t\alpha(|x-y|/t)\bigr\}.$$

Then the following conditions hold.

(a)

For every $s,t>0$ and every $x\in\mathbb{R}^{n}$ , $Q_{t}Q_{s}f(x)=Q_{t+s}f(x)$ .

(b)

The function $u$ is Lipschitz on $(0,\infty)\times\mathbb{R}^{n}$ .

(c)

At every point $(t,x)\in(0,\infty)\times\mathbb{R}^{n}$ of differentiability of $u$ , one has

$$\frac{\partial}{\partial t}u(t,x)+\alpha^{\ast}(|\nabla_{x}u(t,x)|)=0,$$

where $\alpha^{\ast}$ is the Legendre transform of $\alpha$ , given explicitly by the formula

[TABLE]

Sketch of proof.

Let us note that if $f$ is bounded from below or $L$ -Lipschitz, then $Q_{t}f$ is well defined.

Ad (a). To show the semigroup property one can repeat the argument from the proof of [13, Chapter 3.3.2, Lemma 1]; however, in our setting one needs to work with infima rather than minima.

Ad (b). For fixed $t$ , $u$ is $L$ -Lipschitz as a function of $x$ , being an infimum of $L$ -Lipschitz functions. Indeed, for each $y$ , the function $x\mapsto t\alpha(|x-y|/t)$ is $L$ -Lipschitz. As for the Lipschitz property with respect to $t$ , the argument in the proof of [13, Chapter 3.3.2, Lemma 2] shows that if $f$ is $L$ -Lipschitz, then for any $x$ ,

[TABLE]

where $M=\max_{|x|\leq L}\alpha^{\ast}(x)=CL^{2}$ . Now the Lipschitz condition with respect to $t>0$ (for general $f$ , which may not be $L$ -Lipschitz) follows from the semigroup property and the fact that $Q_{t}f$ is an $L$ -Lipschitz function of $x$ .
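The value of $M$ can be checked by a direct computation. The sketch below assumes that (4.3), which is not reproduced in this section, defines the quadratic-linear function $\alpha(s)=s^{2}/(4C)$ for $0\leq s\leq 2CL$ and $\alpha(s)=Ls-CL^{2}$ for $s>2CL$; this assumed form is consistent with the threshold $2CLt$ and the value $CL^{2}$ appearing in this proof, but it should be compared with the actual definition.

```latex
% Assumed form of (4.3): alpha(s) = s^2/(4C) on [0, 2CL], alpha(s) = Ls - CL^2 for s > 2CL.
% Case 0 <= u <= L: on [0, 2CL] the map s -> us - s^2/(4C) is maximized at s = 2Cu,
% while for s > 2CL the map s -> (u - L)s + CL^2 is nonincreasing and bounded by
% 2CLu - CL^2 <= Cu^2, since C(u - L)^2 >= 0.
\alpha^{\ast}(u) = \sup_{s \geq 0}\bigl\{us - \alpha(s)\bigr\}
  = 2Cu^{2} - \frac{(2Cu)^{2}}{4C} = Cu^{2}, \qquad 0 \leq u \leq L,
% Case u > L: the linear part gives (u - L)s + CL^2 -> infinity as s -> infinity.
\alpha^{\ast}(u) = +\infty, \qquad u > L,
% Consequently:
\max_{0 \leq u \leq L}\alpha^{\ast}(u) = CL^{2}.
```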

Ad (c). Using again the fact that $Q_{t}f$ is $L$ -Lipschitz, it is enough to consider the case when so is $f$ . One can then repeat the proof of [13, Chapter 3.3.2, Theorem 5], provided that one can prove that the infimum in the definition of $Q_{t}f$ is in fact achieved. To this end, it is enough to note that whenever $|y-x|>2CLt$ we have, denoting $z=x+2CLt(y-x)/|x-y|$ ,

[TABLE]

where the inequality holds by the Lipschitz property of $f$ and the last equality follows from the definition of $\alpha$ (and the fact that $z$ lies on the interval with endpoints $x$ and $y$ ). Thus $Q_{t}f(x)=\inf_{|y-x|\leq 2CLt}\{f(y)+t\alpha(|y-x|/t)\}$ and the existence of the minimizer follows from compactness and continuity of $f$ and $\alpha$ . ∎
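Properties (a) and (b) can be illustrated numerically in dimension one. The following is a minimal sketch, not part of the original argument: it assumes that (4.3) has the quadratic-linear form $\alpha(s)=s^{2}/(4C)$ for $s\leq 2CL$ and $\alpha(s)=Ls-CL^{2}$ otherwise, and it replaces the infimum over $\mathbb{R}$ by a minimum over a finite grid. The names `alpha`, `inf_conv` and `max_slope` are hypothetical helpers introduced only for this illustration.

```python
def alpha(s, C=1.0, L=1.0):
    """Assumed form of (4.3): quadratic up to 2CL, then linear with slope L."""
    s = abs(s)
    if s <= 2.0 * C * L:
        return s * s / (4.0 * C)
    return L * s - C * L * L


def inf_conv(values, grid, t, C=1.0, L=1.0):
    """Grid version of Q_t f(x) = inf_y { f(y) + t * alpha(|x - y| / t) }."""
    return [
        min(fy + t * alpha((x - y) / t, C, L) for y, fy in zip(grid, values))
        for x in grid
    ]


def max_slope(values, grid):
    """Largest difference quotient between neighbouring grid points."""
    return max(
        abs(values[i + 1] - values[i]) / (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )


if __name__ == "__main__":
    h = 0.05
    grid = [-3.0 + h * i for i in range(121)]
    f = [x * x for x in grid]  # bounded from below, not 1-Lipschitz

    q_half = inf_conv(f, grid, 0.5)
    q_twice = inf_conv(q_half, grid, 0.5)  # Q_{1/2} Q_{1/2} f
    q_one = inf_conv(f, grid, 1.0)         # Q_1 f

    gap = max(abs(a - b) for a, b in zip(q_twice, q_one))
    print("max |Q_t Q_s f - Q_{t+s} f| on the grid:", gap)
    print("largest slope of f:", max_slope(f, grid))
    print("largest slope of Q_1 f:", max_slope(q_one, grid))
```

On the grid the semigroup identity holds only up to a discretization error (at most $Lh$ for $s=t$, since the optimal intermediate point may fall between two grid points), whereas the bound $Q_{t}f\leq f$ and the $L$ -Lipschitz property of $Q_{t}f$ in $x$ hold exactly for the discrete minimum.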

Bibliography

[1] Radosław Adamczak, Witold Bednorz, and Paweł Wolff, Moment estimates implied by modified log-Sobolev inequalities, to appear in ESAIM: Probability and Statistics.
[2] Radosław Adamczak and Michał Strzelecki, Modified log-Sobolev inequalities for convex functions on the real line. Sufficient conditions, Studia Math. 230 (2015), no. 1, 59–93. MR 3456588
[3] Radosław Adamczak and Paweł Wolff, Concentration inequalities for non-Lipschitz functions with bounded derivatives of higher order, Probab. Theory Related Fields 162 (2015), no. 3-4, 531–586. MR 3383337
[4] S. Aida and D. Stroock, Moment estimates derived from Poincaré and logarithmic Sobolev inequalities, Math. Res. Lett. 1 (1994), no. 1, 75–86. MR 1258492
[5] Cécile Ané, Sébastien Blachère, Djalil Chafaï, Pierre Fougères, Ivan Gentil, Florent Malrieu, Cyril Roberto, and Grégory Scheffer, Sur les inégalités de Sobolev logarithmiques, Panoramas et Synthèses [Panoramas and Syntheses], vol. 10, Société Mathématique de France, Paris, 2000, With a preface by Dominique Bakry and Michel Ledoux. MR 1845806
[6] S. Bobkov and M. Ledoux, Poincaré’s inequalities and Talagrand’s concentration phenomenon for the exponential distribution, Probab. Theory Related Fields 107 (1997), no. 3, 383–400. MR 1440138
[7] S. G. Bobkov and F. Götze, Exponential integrability and transportation cost related to logarithmic Sobolev inequalities, J. Funct. Anal. 163 (1999), no. 1, 1–28. MR 1682772
[8] Sergey Bobkov, Piotr Nayar, and Prasad Tetali, Concentration Properties of Restricted Measures with Applications to Non-Lipschitz Functions, To appear in GAFA Seminar Notes (2015), arXiv:1506.06174.
