Stability of the Shannon-Stam inequality via the Föllmer process

Ronen Eldan; Dan Mikulincer

arXiv:1903.07140·cs.IT·September 8, 2020


TL;DR

This paper establishes stability estimates for the Shannon-Stam inequality for log-concave vectors, linking the deficit to entropy and transportation metrics, using a novel stochastic control approach.

Contribution

It provides the first stability estimates for general log-concave vectors in the Shannon-Stam inequality, with dimension-free bounds for uniformly log-concave cases.

Findings

01

Bound the Shannon-Stam deficit by relative entropy with respect to Gaussian.

02

First stability estimate for general log-concave vectors.

03

Dimension-free bounds for uniformly log-concave vectors.




Ronen Eldan Weizmann Institute of Science. Incumbent of the Elaine Blond career development chair. Supported by a European Research Council Starting Grant (ERC StG) and by an Israel Science Foundation grant no. 715/16.

Dan Mikulincer Weizmann Institute of Science. Supported by an Azrieli foundation fellowship.

Abstract

We prove stability estimates for the Shannon-Stam inequality (also known as the entropy-power inequality) for log-concave random vectors in terms of entropy and transportation distance. In particular, we give the first stability estimate for general log-concave random vectors in the following form: for log-concave random vectors $X,Y\in\mathbb{R}^{d}$ , the deficit in the Shannon-Stam inequality is bounded from below by the expression

$$C\left(\mathrm{D}(X||G)+\mathrm{D}(Y||G)\right),$$

where $\mathrm{D}\left(\cdot~{}||G\right)$ denotes the relative entropy with respect to the standard Gaussian and the constant $C$ depends only on the covariance structures and the spectral gaps of $X$ and $Y$ . In the case of uniformly log-concave vectors our analysis gives dimension-free bounds. Our proofs are based on a new approach which uses an entropy-minimizing process from stochastic control theory.

1 Introduction

Let $\mu$ be a probability measure on $\mathbb{R}^{d}$ and $X\sim\mu$ . Denote by $\mathrm{h}(\mu)$ , the differential entropy of $\mu$ which is defined to be

$$\mathrm{h}(\mu):=\mathrm{h}(X)=-\int\limits_{\mathbb{R}^{d}}\ln\left(\frac{d\mu}{dx}\right)d\mu.$$

One of the fundamental results of information theory is the celebrated Shannon-Stam inequality which asserts that for independent vectors $X$ , $Y$ and $\lambda\in(0,1)$

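$$\mathrm{h}\left(\sqrt{\lambda}X+\sqrt{1-\lambda}Y\right)\geq\lambda\mathrm{h}(X)+(1-\lambda)\mathrm{h}(Y).\tag{1}$$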

We remark that Stam [18] actually proved the equivalent statement

$$e^{\frac{2\mathrm{h}(X+Y)}{d}}\geq e^{\frac{2\mathrm{h}(X)}{d}}+e^{\frac{2\mathrm{h}(Y)}{d}},\tag{2}$$

first observed by Shannon in [17], and known today as the entropy power inequality. To state yet another equivalent form of the inequality, for any positive-definite matrix, $\Sigma$ , we set $\gamma_{\Sigma}$ as the centered Gaussian measure on $\mathbb{R}^{d}$ with density

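$$\frac{d\gamma_{\Sigma}(x)}{dx}=\frac{e^{-\frac{\langle x,\Sigma^{-1}x\rangle}{2}}}{\sqrt{\det(2\pi\Sigma)}}.$$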

For the case where the covariance matrix is the identity, $\mathrm{I}_{d}$ , we will also write $\gamma:=\gamma_{\mathrm{I}_{d}}$ . If $Y\sim\nu$ we set the relative entropy of $X$ with respect to $Y$ as

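$$\mathrm{D}(\mu||\nu):=\mathrm{D}(X||Y)=\int\limits_{\mathbb{R}^{d}}\ln\left(\frac{d\mu}{d\nu}\right)d\mu.$$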

For $G\sim\gamma$ , the differential entropy is related to the relative entropy by

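$$\mathrm{h}(X)=-\mathrm{D}(X||G)+\frac{1}{2}\mathbb{E}\left[\left\lVert X\right\rVert_{2}^{2}\right]+\frac{d}{2}\ln(2\pi).$$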

Thus, when $X$ and $Y$ are independent and centered the statement

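$$\mathrm{D}\left(\sqrt{\lambda}X+\sqrt{1-\lambda}Y\big|\big|G\right)\leq\lambda\mathrm{D}(X||G)+(1-\lambda)\mathrm{D}(Y||G)\tag{3}$$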

is equivalent to (1). Shannon noted that in the case that $X$ and $Y$ are Gaussians with proportional covariance matrices, both sides of (2) are equal. Later, in [18] it was shown that this is actually a necessary condition for the equality case. We define the deficit in (3) as

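$$\delta_{EPI,\lambda}(\mu,\nu):=\delta_{EPI,\lambda}(X,Y)=\Bigl(\lambda\mathrm{D}(X||G)+(1-\lambda)\mathrm{D}(Y||G)\Bigr)-\mathrm{D}\left(\sqrt{\lambda}X+\sqrt{1-\lambda}Y\big|\big|G\right),$$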

and are led to the question: what can be said about $X$ and $Y$ when $\delta_{EPI,\lambda}(X,Y)$ is small? One might expect that, in light of the equality cases, a small deficit in (3) should imply that $X$ and $Y$ are both close, in some sense, to a Gaussian. A recent line of works has focused on an attempt to make this intuition precise (see e.g., [20, 6]), which is also our main goal in the present work. In particular, we give the first stability estimate in terms of relative entropy. A good starting point is the work of Courtade, Fathi and Pananjady ([6]) which considers stability in terms of the Wasserstein distance (also known as quadratic transportation). The Wasserstein distance is defined by

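$$\mathcal{W}_{2}(\mu,\nu)=\sqrt{\inf_{\pi}\int\limits_{\mathbb{R}^{2d}}\left\lVert x-y\right\rVert_{2}^{2}d\pi(x,y)},$$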

where the infimum is taken over all couplings $\pi$ whose marginal laws are $\mu$ and $\nu$ . A crucial observation made in their work is that without further assumptions on the measures $\mu$ and $\nu$ , one should not expect meaningful stability results to hold. Indeed, for any $\lambda\in(0,1)$ they show that there exists a family of measures $\{\mu_{\varepsilon}\}_{\varepsilon>0}$ such that $\delta_{EPI,\lambda}(\mu_{\varepsilon},\mu_{\varepsilon})<\varepsilon$ and such that for any Gaussian measure $\gamma_{\Sigma}$ , $\mathcal{W}_{2}(\mu_{\varepsilon},\gamma_{\Sigma})\geq\frac{1}{3}$ . Moreover, one may take $\mu_{\varepsilon}$ to be a mixture of Gaussians. Thus, in order to derive quantitative bounds it is necessary to consider a more restricted class of measures. We focus on the class of log-concave measures which, as our method demonstrates, turns out to be natural in this context.
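As a sanity check on the definition of the deficit, the following minimal Python sketch evaluates $\delta_{EPI,\lambda}$ for independent centered one-dimensional Gaussians, where every term has a closed form. It assumes the standard formula $\mathrm{D}(N(0,s)||N(0,1))=\frac{1}{2}(s-1-\ln s)$ for a Gaussian with variance $s$; the function names are illustrative and not part of the original text. The deficit vanishes when the two variances coincide and is strictly positive otherwise, in line with the equality cases recalled above.

```python
import math


def kl_to_std_gauss(var: float) -> float:
    """Relative entropy D(N(0, var) || N(0, 1)) in dimension one."""
    return 0.5 * (var - 1.0 - math.log(var))


def epi_deficit(var_x: float, var_y: float, lam: float) -> float:
    """Deficit for independent centered Gaussians X, Y with the given variances."""
    # sqrt(lam) X + sqrt(1 - lam) Y is again a centered Gaussian with this variance.
    var_mix = lam * var_x + (1.0 - lam) * var_y
    return (lam * kl_to_std_gauss(var_x)
            + (1.0 - lam) * kl_to_std_gauss(var_y)
            - kl_to_std_gauss(var_mix))


equal_case = epi_deficit(2.0, 2.0, 0.3)    # proportional covariances
unequal_case = epi_deficit(0.5, 2.0, 0.3)  # different variances
```

In this toy case the deficit reduces to $\frac{1}{2}\left(\ln(\lambda s_{X}+(1-\lambda)s_{Y})-\lambda\ln s_{X}-(1-\lambda)\ln s_{Y}\right)$, which is non-negative by concavity of the logarithm.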

Our Contribution

A measure is called log-concave if it is supported on some subspace of $\mathbb{R}^{d}$ and, relative to the Lebesgue measure of that subspace, it has a density $f$ for which

$$-\nabla^{2}\ln(f(x))\succeq 0\quad\text{for all }x,$$

where $\nabla^{2}$ denotes the Hessian matrix, and we consider the inequality in the sense of positive definite matrices. Our first result will rely on a slightly stronger condition known as uniform log-concavity. If there exists $\xi>0$ such that

$$-\nabla^{2}\ln(f(x))\succeq\xi\mathrm{I}_{d}\quad\text{for all }x,$$

then we say that the measure is $\xi$ -uniformly log-concave.

Theorem 1.

Let $X$ and $Y$ be $1$ -uniformly log-concave centered vectors, and denote by $\sigma^{2}_{X},\sigma^{2}_{Y}$ the respective minimal eigenvalues of their covariance matrices. Then there exist Gaussian vectors $G_{X}$ and $G_{Y}$ such that for any $\lambda\in(0,1)$ ,

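$$\delta_{EPI,\lambda}(X,Y)\geq\frac{\lambda(1-\lambda)}{2}\left(\sigma_{X}^{4}\mathrm{D}(X||G_{X})+\sigma_{Y}^{4}\mathrm{D}(Y||G_{Y})+\frac{\sigma_{X}^{4}}{2}\mathrm{D}(G_{X}||G_{Y})+\frac{\sigma_{Y}^{4}}{2}\mathrm{D}(G_{Y}||G_{X})\right).$$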

To compare this with the main result of [6] we recall the transportation-entropy inequality due to Talagrand ([19]) which states that

$$\mathcal{W}_{2}^{2}(X,G)\leq 2\mathrm{D}(X||G).$$

As a conclusion we get

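$$\delta_{EPI,\lambda}(X,Y)\geq C_{\sigma_{X},\sigma_{Y}}\frac{\lambda(1-\lambda)}{2}\left(\mathcal{W}_{2}^{2}(X,G_{X})+\mathcal{W}_{2}^{2}(Y,G_{Y})+\mathcal{W}_{2}^{2}(G_{X},G_{Y})\right),$$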

where $C_{\sigma_{X},\sigma_{Y}}$ depends only on $\sigma_{X}$ and $\sigma_{Y}$ . Up to this constant, this is precisely the main result of [6]. In fact, our method can reproduce their exact result, which we present as a warm up in the next section. We remark that as the underlying inequality is of information-theoretic nature, it is natural to expect that stability estimates are expressed in terms of relative entropy.
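Talagrand's inequality, $\mathcal{W}_{2}^{2}(X,G)\leq 2\mathrm{D}(X||G)$, is what converts an entropy bound into a transportation bound. The short Python sketch below checks it for one-dimensional centered Gaussians $N(0,\sigma^{2})$. It relies on two standard closed forms that are assumptions of this example rather than statements from the text: the optimal coupling is $x\mapsto\sigma x$, so $\mathcal{W}_{2}^{2}=(\sigma-1)^{2}$, and $\mathrm{D}=\frac{1}{2}(\sigma^{2}-1-2\ln\sigma)$.

```python
import math


def w2_squared_to_std_gauss(sigma: float) -> float:
    """Squared Wasserstein distance between N(0, sigma^2) and N(0, 1)."""
    return (sigma - 1.0) ** 2


def kl_sigma_to_std_gauss(sigma: float) -> float:
    """Relative entropy D(N(0, sigma^2) || N(0, 1))."""
    return 0.5 * (sigma ** 2 - 1.0 - 2.0 * math.log(sigma))


sigmas = [0.1, 0.5, 1.0, 2.0, 10.0]
# Small tolerance only to absorb floating-point rounding at sigma = 1.
talagrand_holds = all(
    w2_squared_to_std_gauss(s) <= 2.0 * kl_sigma_to_std_gauss(s) + 1e-12
    for s in sigmas
)
```

Here the inequality is equivalent to $\ln\sigma\leq\sigma-1$, so it holds for every $\sigma>0$ with equality only at $\sigma=1$.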

A random vector is isotropic if it is centered and its covariance matrix is the identity. By a re-scaling argument the above theorem can be restated for uniformly log-concave isotropic random vectors.

Corollary 2.

Let $X$ and $Y$ be $\xi$ -uniformly log-concave and isotropic random vectors, then there exist Gaussian vectors $G_{X}$ and $G_{Y}$ such that for any $\lambda\in(0,1)$

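$$\delta_{EPI,\lambda}(X,Y)\geq\frac{\lambda(1-\lambda)}{2}\xi^{2}\left(\mathrm{D}(X||G_{X})+\mathrm{D}(Y||G_{Y})+\frac{1}{2}\mathrm{D}(G_{X}||G_{Y})+\frac{1}{2}\mathrm{D}(G_{Y}||G_{X})\right).$$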

In our estimate for general log-concave vectors, the dependence on the parameter $\xi$ will be replaced by the spectral gap of the measures. We say that a random vector $X$ satisfies a Poincaré inequality if there exists a constant $C>0$ such that

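$$\mathrm{Var}\left[\psi(X)\right]\leq C\,\mathbb{E}\left[\left\lVert\nabla\psi(X)\right\rVert_{2}^{2}\right],\quad\text{for all test functions }\psi.$$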

We define $C_{p}(X)$ to be the smallest number such that the above equation holds with $C=C_{p}(X)$ , and refer to this quantity as the Poincaré constant of $X$ . The inverse quantity, $C_{p}(X)^{-1}$ is referred to as the spectral gap of $X$ .

Theorem 3.

Let $X$ and $Y$ be centered log-concave vectors with $\sigma^{2}_{X}$ , $\sigma_{Y}^{2}$ denoting the minimal eigenvalues of their covariance matrices. Assume that $\mathrm{Cov}(X)+\mathrm{Cov}(Y)=2\mathrm{I}_{d}$ and set $\max\left(\frac{\mathrm{C_{p}}(X)}{\sigma_{X}^{2}},\frac{\mathrm{C_{p}}(Y)}{\sigma^{2}_{Y}}\right)=\mathrm{C_{p}}$ . Then, if $G$ denotes the standard Gaussian, for every $\lambda\in(0,1)$

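$$\delta_{EPI,\lambda}(X,Y)\geq K\lambda(1-\lambda)\left(\frac{\min(\sigma_{Y}^{2},\sigma_{X}^{2})}{\mathrm{C_{p}}}\right)^{3}\left(\mathrm{D}(X||G)+\mathrm{D}(Y||G)\right),$$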

where $K>0$ is a numerical constant, which can be made explicit.

Remark 4.

For $\xi$ -uniformly log-concave vectors, we have the relation, $\mathrm{C_{p}}(X)\leq\frac{1}{\xi}$ (this is a consequence of the Brascamp-Lieb inequality [3], for instance). Thus, considering Corollary 2, one might have expected that the term $\mathrm{C^{3}_{p}}$ could have been replaced by $\mathrm{C^{2}_{p}}$ in Theorem 3. We do not know if either result is tight.

Remark 5.

Bounding the Poincaré constant of an isotropic log-concave measure is the object of the long-standing Kannan-Lovász-Simonovits (KLS) conjecture (see [12, 13] for more information). The conjecture asserts that there exists a constant $K>0$ , independent of the dimension, such that for any isotropic log-concave vector $X$ , $\mathrm{C_{p}}(X)\leq K$ . The best known bound is due to Lee and Vempala, who showed in [14] that if $X$ is a $d$ -dimensional log-concave vector, $\mathrm{C_{p}}(X)=O\left(\sqrt{d}\right).$

Concerning the assumptions of Theorem 3, note that as the EPI is invariant under linear transformations, there is no loss of generality in assuming $\mathrm{Cov}(X)+\mathrm{Cov}(Y)=2\mathrm{I}_{d}$ . Remark that $\mathrm{C_{p}}(X)$ is, approximately, proportional to the maximal eigenvalue of $\mathrm{Cov}(X)$ . Thus, for ill-conditioned covariance matrices $\frac{\mathrm{C_{p}}(X)}{\sigma_{X}^{2}},\frac{\mathrm{C_{p}}(Y)}{\sigma^{2}_{Y}}$ will not be on the same scale. It seems plausible to conjecture that the dependence on the minimal eigenvalue and Poincaré constant could be replaced by a quantity which would take into consideration all eigenvalues.

Some other known stability results, both for log-concave vectors and for other classes of measures, may be found in [20, 5, 6]. The reader is referred to [6, Section 2.2] for a complete discussion. Let us mention one important special case, which is relevant to our results; the so-called entropy jump, first proved for the one dimensional case by Ball, Barthe and Naor ([1]) and then generalized by Ball and Nguyen to arbitrary dimensions in [2]. According to the latter result, if $X$ is a log-concave and isotropic random vector, then

$$\delta_{EPI,\frac{1}{2}}(X,X)\geq\frac{1}{8\mathrm{C_{p}}(X)}\mathrm{D}(X||G),$$

where $\mathrm{C_{p}}(X)$ is the Poincaré constant of $X$ and $G$ is the standard Gaussian. This should be compared to both Corollary 2 and Theorem 3. That is, in the special case of two identical measures and $\lambda=\frac{1}{2}$ , their result gives a better dependence on the Poincaré constant than the one afforded by our results.

Ball and Nguyen ([2]) also give an interesting motivation for this type of inequality: they show that if for some constant $\kappa>0$ ,

$$\delta_{EPI,\frac{1}{2}}(X,X)\geq\kappa\mathrm{D}(X||G),$$

then the density $f_{X}$ of $X$ satisfies, $f_{X}(0)\leq e^{\frac{2d}{\kappa}}$ . The isotropic constant of $X$ is defined by $L_{X}:=f_{X}(0)^{\frac{1}{d}}$ , and is the main subject of the slicing conjecture, which hypothesizes that $L_{X}$ is uniformly bounded by a constant, independent of the dimension, for every isotropic log-concave vector $X$ . Ball and Nguyen observed that using the above fact in conjunction with an entropy jump estimate gives a bound on the isotropic constant in terms of the Poincaré constant, and in particular the slicing conjecture is implied by the KLS conjecture.

Our final results give improved bounds under the assumption that $X$ and $Y$ are already close to being Gaussian, in terms of relative entropy, or if one of them is a Gaussian. We record these results in the following theorems.

Theorem 6.

Let $X,Y$ be isotropic log-concave vectors such that $\mathrm{C_{p}}(X),\mathrm{C_{p}}(Y)\leq\mathrm{C_{p}}$ for some $\mathrm{C_{p}}<\infty$ . Suppose further that $\mathrm{D}(X||G),\mathrm{D}(Y||G)\leq\frac{1}{4}$ , then

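$$\delta_{EPI,\lambda}(X,Y)\geq\frac{\lambda(1-\lambda)}{36\mathrm{C_{p}}}\left(\mathrm{D}(X||G)+\mathrm{D}(Y||G)\right).$$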

The following gives an improved bound in the case that one of the random vectors is a Gaussian, and holds in full generality with respect to the other vector, without a log-concavity assumption.

Theorem 7.

Let $X$ be a centered random vector with finite Poincaré constant, $\mathrm{C_{p}}(X)<\infty$ . Then

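$$\delta_{EPI,\lambda}(X,G)\geq\left(\lambda-\frac{\lambda(\mathrm{C_{p}}(X)-1)-\ln\left(\lambda(\mathrm{C_{p}}(X)-1)+1\right)}{\mathrm{C_{p}}(X)-\ln(\mathrm{C_{p}}(X))-1}\right)\mathrm{D}(X||G).$$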

Remark 8.

When $\mathrm{C_{p}}(X)\geq 1$ , the following inequality holds

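$$\left(\lambda-\frac{\lambda(\mathrm{C_{p}}(X)-1)-\ln\left(\lambda(\mathrm{C_{p}}(X)-1)+1\right)}{\mathrm{C_{p}}(X)-\ln(\mathrm{C_{p}}(X))-1}\right)\geq\frac{\lambda(1-\lambda)}{\mathrm{C_{p}}(X)}.$$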
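The comparison in Remark 8 between the coefficient of Theorem 7 and the simpler quantity $\frac{\lambda(1-\lambda)}{\mathrm{C_{p}}(X)}$ is easy to probe numerically. The Python sketch below evaluates both sides on a small grid; the function names and the grid are illustrative choices, and the formula for the coefficient is the one stated in Theorem 7. A grid check is of course only evidence, not a proof, and the value $\mathrm{C_{p}}(X)=1$ is excluded because the coefficient is then a $0/0$ expression.

```python
import math


def theorem7_coefficient(lam: float, cp: float) -> float:
    """lam - (lam*(cp-1) - ln(lam*(cp-1)+1)) / (cp - ln(cp) - 1), for cp > 1."""
    a = cp - 1.0
    numerator = lam * a - math.log1p(lam * a)
    denominator = a - math.log1p(a)  # equals cp - ln(cp) - 1
    return lam - numerator / denominator


def remark8_lower_bound(lam: float, cp: float) -> float:
    """The simpler lower bound lam * (1 - lam) / cp."""
    return lam * (1.0 - lam) / cp


grid = [(k / 20.0, cp) for k in range(1, 20) for cp in (1.5, 2.0, 5.0, 10.0, 100.0)]
remark8_holds = all(
    theorem7_coefficient(lam, cp) >= remark8_lower_bound(lam, cp)
    for lam, cp in grid
)
```

Both sides vanish at $\lambda=0$ and $\lambda=1$, which is why the grid stays in the open interval.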

Remark 9.

Theorem 7 was already proved in [6] by using a slightly different approach. Denote by $\mathrm{I}(X||G)$ , the relative Fisher information of the random vector $X$ . In [9] the authors prove the following improved log-Sobolev inequality.

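$$\mathrm{I}(X||G)\geq 2\mathrm{D}(X||G)\frac{(1-\mathrm{C_{p}}(X))^{2}}{\mathrm{C_{p}}(X)\left(\mathrm{C_{p}}(X)-\ln(\mathrm{C_{p}}(X))-1\right)}.$$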

The theorem follows by integrating the inequality along the Ornstein-Uhlenbeck semi-group.

Acknowledgments

We are grateful to Alex Zhai for several enlightening exchanges of ideas, and are thankful to Bo’az Klartag and Max Fathi for useful discussions. We would also like to thank Tom Courtade for his thoughtful comments concerning a preliminary draft and for suggesting that we generalize the proof of Theorem 3 for arbitrary covariance structures.

2 Bounding the deficit via martingale embeddings

Our approach is based on ideas somewhat related to the ones which appear in [8]: the very high-level plan of the proof is to embed the variables $X,Y$ as the terminal points of some martingales and express the entropies of $X,Y$ and $X+Y$ as functions of the associated quadratic co-variation processes. One of the main benefits in using such an embedding is that the co-variation process of $X+Y$ can be easily expressed in terms of the ones of $X,Y$ , as demonstrated below. In [8] these ideas were used to produce upper bounds for the entropic central limit theorem, so it stands to reason that related methods may be useful here. It turns out, however, that in order to produce meaningful bounds for the Shannon-Stam inequality, one needs a more intricate analysis, since this inequality corresponds to a second-derivative phenomenon: whereas for the CLT one only needs to produce upper bounds on the relative entropy, here we need to be able to compare, in a non-asymptotic way, two relative entropies.

In particular, our martingale embedding is constructed using the entropy minimizing technique developed by Föllmer ([10, 11]) and later Lehec ([15]). This construction has several useful features, one of which is that it allows us to express the relative entropy of a measure in $\mathbb{R}^{d}$ in terms of a variational problem on the Wiener space. In addition, upon attaining a slightly different point of view on this process, that we introduce here, the behavior of this variational expression turns out to be tractable with respect to convolutions.

In order to outline the argument, fix centered measures $\mu$ and $\nu$ on $\mathbb{R}^{d}$ with finite second moment. Let $X\sim\mu$ , $Y\sim\nu$ be random vectors and $G\sim\gamma$ a standard Gaussian random vector.

An entropy-minimizing drift. Let $B_{t}$ be a standard Brownian motion on $\mathbb{R}^{d}$ and denote by $\mathcal{F}_{t}$ its natural filtration. In the sequel, the following process plays a fundamental role:

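$$v_{t}^{X}=\arg\min_{u_{t}}\frac{1}{2}\int\limits_{0}^{1}\mathbb{E}\left[\left\lVert u_{t}\right\rVert_{2}^{2}\right]dt,\tag{4}$$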

where the minimum is taken with respect to all processes $u_{t}$ adapted to $\mathcal{F}_{t}$ , such that

$$B_{1}+\int\limits_{0}^{1}u_{t}dt\sim\mu.$$

Amazingly, under mild assumptions on $\mu$ , and in particular in the case that $\mu$ is log-concave, there exists a unique minimizer to Equation (4), from which we construct the process

$$X_{t}:=B_{t}+\int\limits_{0}^{t}v_{s}^{X}ds,$$

also known as the Föllmer process, with $v_{t}^{X}$ being the associated Föllmer drift. We refer the reader to [15] for proofs of the existence and uniqueness of the process, as well as of a few other facts summarized below.

It turns out that the process $v_{t}^{X}$ is a martingale (which goes together with the fact that it minimizes a quadratic form) which is given by the equation

$$v_{t}^{X}=\nabla_{x}\ln\left(P_{1-t}f_{X}(X_{t})\right),\tag{5}$$

where $f_{X}$ is the density of $X$ with respect to the standard Gaussian and $P_{1-t}$ denotes the heat semi-group. In fact, Girsanov’s formula gives a very useful relation between the energy of the drift and the entropy of $X$ , namely,

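$$\mathrm{D}(X||G)=\frac{1}{2}\int\limits_{0}^{1}\mathbb{E}\left[\left\lVert v_{t}^{X}\right\rVert_{2}^{2}\right]dt.\tag{6}$$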

This gives the following alternative interpretation for the process: suppose that the Wiener space is equipped with an underlying probability measure $P$ , with respect to which the process $B_{t}$ is a Brownian motion as above. Let $Q$ be a measure on Wiener space such that

$$\frac{dP}{dQ}=\frac{d\mu}{d\gamma}(X_{1}),$$

then the process $X_{t}$ is a Brownian motion with respect to the measure $Q$ . By the representation theorem for the Brownian bridge, this tells us that the process $X_{t}$ conditioned on $X_{1}$ is a Brownian bridge between $0$ and $X_{1}$ . In particular, we have

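$$X_{t}\stackrel{d}{=}tX_{1}+\sqrt{t(1-t)}G,\tag{7}$$

where $G$ is a standard Gaussian independent of $X_{1}$ .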
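The relation between the energy of the Föllmer drift and the relative entropy, $\mathrm{D}(X||G)=\frac{1}{2}\int_{0}^{1}\mathbb{E}\left[\lVert v_{t}^{X}\rVert_{2}^{2}\right]dt$, can be verified by hand in the one-dimensional Gaussian case. The Python sketch below assumes the target $\mu=N(0,s)$ with variance $s$. A direct computation with the heat semi-group then gives the linear drift $v_{t}(x)=\frac{ax}{1-a(1-t)}$ with $a=1-\frac{1}{s}$, and since $X_{t}$ has the law of $tX_{1}+\sqrt{t(1-t)}G$ one obtains $\mathbb{E}[v_{t}^{2}]=\frac{b^{2}t}{1+bt}$ with $b=s-1$. These closed forms are derived here for illustration only and do not appear in the original text; the function names are likewise hypothetical.

```python
import math


def drift_energy(var: float, t: float) -> float:
    """E[v_t^2] for the target N(0, var), using the closed form b^2 t / (1 + b t)."""
    b = var - 1.0
    return b * b * t / (1.0 + b * t)


def entropy_from_drift(var: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation of (1/2) * int_0^1 E[v_t^2] dt."""
    h = 1.0 / steps
    return 0.5 * h * sum(drift_energy(var, (k + 0.5) * h) for k in range(steps))


def kl_to_std_gauss(var: float) -> float:
    """Closed form of D(N(0, var) || N(0, 1))."""
    return 0.5 * (var - 1.0 - math.log(var))


gap_wide = abs(entropy_from_drift(4.0) - kl_to_std_gauss(4.0))
gap_narrow = abs(entropy_from_drift(0.25) - kl_to_std_gauss(0.25))
```

Note that $t\mapsto\mathbb{E}[v_{t}^{2}]$ is non-decreasing in this example, as it must be for a martingale.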

Lehec’s proof of the Shannon-Stam inequality. For the sake of intuition, we now repeat Lehec’s argument to reproduce the Shannon-Stam inequality (3) using this process. Let $X_{t}:=B^{X}_{t}+\int\limits_{0}^{t}v^{X}_{s}ds$ and $Y_{t}:=B^{Y}_{t}+\int\limits_{0}^{t}v^{Y}_{s}ds$ be the Föllmer processes associated to $X$ and $Y$ , where $B_{t}^{X}$ and $B_{t}^{Y}$ are independent Brownian motions. For $\lambda\in(0,1)$ , define the new processes

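$$w_{t}:=\sqrt{\lambda}v_{t}^{X}+\sqrt{1-\lambda}v_{t}^{Y}$$

and

$$\tilde{B}_{t}:=\sqrt{\lambda}B_{t}^{X}+\sqrt{1-\lambda}B_{t}^{Y}.$$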

By the independence of $B_{t}^{X}$ and $B_{t}^{Y}$ , $\tilde{B}_{t}$ is a Brownian motion and

$$\sqrt{\lambda}X_{1}+\sqrt{1-\lambda}Y_{1}=\tilde{B}_{1}+\int\limits_{0}^{1}w_{t}dt.$$

Note that, as $v_{t}^{X}$ is a martingale, we have for every $t\in[0,1]$ ,

[TABLE]

Using equations (4) and (6) and recalling that the processes are independent, we finally have

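$$\mathrm{D}\left(\sqrt{\lambda}X_{1}+\sqrt{1-\lambda}Y_{1}\big|\big|G\right)\leq\frac{1}{2}\int\limits_{0}^{1}\mathbb{E}\left[\left\lVert w_{t}\right\rVert_{2}^{2}\right]dt=\frac{\lambda}{2}\int\limits_{0}^{1}\mathbb{E}\left[\left\lVert v_{t}^{X}\right\rVert_{2}^{2}\right]dt+\frac{1-\lambda}{2}\int\limits_{0}^{1}\mathbb{E}\left[\left\lVert v_{t}^{Y}\right\rVert_{2}^{2}\right]dt=\lambda\mathrm{D}(X||G)+(1-\lambda)\mathrm{D}(Y||G).$$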

This recovers the Shannon-Stam inequality in the form (3).

An alternative point of view: Replacing the drift by a varying diffusion coefficient. Lehec’s proof gives rise to the following idea: Suppose the processes $v_{t}^{X}$ and $v_{t}^{Y}$ could be coupled in a way such that the variance of the resulting process $\sqrt{\lambda}v_{t}^{X}+\sqrt{1-\lambda}v_{t}^{Y}$ was smaller than that of $w_{t}$ above. Such a coupling would improve on (3) and that is the starting point of this work.

As it turns out, however, it is easier to get tractable bounds by working with a slightly different interpretation of the above processes, in which the role of the drift is taken by an adapted diffusion coefficient of a related process.

The idea is as follows: Suppose that $M_{t}:=\int\limits_{0}^{t}F_{s}dB_{s}$ is a martingale, where $F_{t}$ is some positive-definite matrix valued process adapted to $\mathcal{F}_{t}$ . Consider the drift defined by

$$u_{t}:=\int\limits_{0}^{t}\frac{F_{s}-\mathrm{I}_{d}}{1-s}dB_{s}.\tag{8}$$

We then claim that $B_{1}+\int\limits_{0}^{1}u_{t}dt=M_{1}$ . To show this, we use the stochastic Fubini Theorem ([21]) to write

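$$\int\limits_{0}^{1}u_{t}dt=\int\limits_{0}^{1}\int\limits_{0}^{t}\frac{F_{s}-\mathrm{I}_{d}}{1-s}dB_{s}dt=\int\limits_{0}^{1}\int\limits_{s}^{1}\frac{F_{s}-\mathrm{I}_{d}}{1-s}dt\,dB_{s}=\int\limits_{0}^{1}\left(F_{s}-\mathrm{I}_{d}\right)dB_{s}=M_{1}-B_{1}.$$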

Since we now expressed the random variable $M_{1}$ as the terminal point of a standard Brownian motion with an adapted drift, the minimality property of the Föllmer drift together with equation (6) immediately produce a bound on its entropy. Namely, by using Itô’s isometry and Fubini’s theorem we have the bound

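$$\mathrm{D}(M_{1}||G)\leq\frac{1}{2}\int\limits_{0}^{1}\mathbb{E}\left[\left\lVert u_{t}\right\rVert_{2}^{2}\right]dt=\frac{1}{2}\int\limits_{0}^{1}\int\limits_{0}^{t}\frac{\mathrm{Tr}\left(\mathbb{E}\left[\left(F_{s}-\mathrm{I}_{d}\right)^{2}\right]\right)}{(1-s)^{2}}ds\,dt=\frac{1}{2}\int\limits_{0}^{1}\frac{\mathrm{Tr}\left(\mathbb{E}\left[\left(F_{t}-\mathrm{I}_{d}\right)^{2}\right]\right)}{1-t}dt.\tag{9}$$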

This hints at the following possible scheme of proof: in order to give an upper bound for the expression $\mathrm{D}(\sqrt{\lambda}X_{1}+\sqrt{1-\lambda}Y_{1}||G)$ , it suffices to find martingales $M_{t}^{X}$ and $M_{t}^{Y}$ such that $M_{1}^{X},M_{1}^{Y}$ have the laws of $X$ and $Y$ , respectively, and such that the $\lambda$ -average of the covariance processes is close to the identity.

The Föllmer process gives rise to a natural martingale: Consider $\mathbb{E}\left[X_{1}|\mathcal{F}_{t}\right]$ , the associated Doob martingale. By the martingale representation theorem ([16, Theorem 4.3.3]) there exists a uniquely defined adapted matrix valued process $\Gamma_{t}^{X}$ , for which

$$\mathbb{E}\left[X_{1}|\mathcal{F}_{t}\right]=\int\limits_{0}^{t}\Gamma_{s}^{X}dB_{s}^{X}.\tag{10}$$

By following the construction in (8) and considering the process $\tilde{v}_{t}^{X}:=\int\limits_{0}^{t}\frac{\Gamma^{X}_{s}-\mathrm{I}_{d}}{1-s}dB^{X}_{s}$ , it is immediate that $B_{1}+\int\limits_{0}^{1}\tilde{v}_{t}^{X}dt=X_{1}$ . Observe that $v_{t}-\tilde{v}_{t}$ is a martingale and that for every $t\in[0,1]$ , $\int\limits_{t}^{1}(v_{s}^{X}-\tilde{v}_{s}^{X})ds|\mathcal{F}_{t}=0,$ almost surely. It thus follows that $v_{t}^{X}$ and $\tilde{v}_{t}^{X}$ are almost surely the same process. We conclude the following representation for the Föllmer drift,

$$v_{t}^{X}=\int\limits_{0}^{t}\frac{\Gamma_{s}^{X}-\mathrm{I}_{d}}{1-s}dB_{s}^{X}.\tag{11}$$

The matrix $\Gamma_{t}^{X}$ turns out to be positive definite almost surely (in fact, it has an explicit simple representation, see Proposition 1 below), which yields, by combining (6) with the same calculation as in (9),

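$$\mathrm{D}(X||G)=\frac{1}{2}\int\limits_{0}^{1}\frac{\mathrm{Tr}\left(\mathbb{E}\left[\left(\Gamma_{t}^{X}-\mathrm{I}_{d}\right)^{2}\right]\right)}{1-t}dt.\tag{12}$$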
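The representation of the relative entropy through the diffusion coefficient can also be tested on a one-dimensional Gaussian. The Python sketch below takes the identity in the form $\mathrm{D}(X||G)=\frac{1}{2}\int_{0}^{1}\frac{\mathbb{E}[(\Gamma_{t}-1)^{2}]}{1-t}dt$, which is what combining (6) with the calculation in (9) yields, and assumes that for the target $N(0,s)$ the process $\Gamma_{t}$ is the deterministic function $\frac{s}{1-t+ts}$. This closed form follows from the Gaussian conditioning formula together with $\Gamma_{t}=\frac{\mathrm{Cov}(X_{1}|\mathcal{F}_{t})}{1-t}$ and is an assumption of the example, as are the function names.

```python
import math


def gamma_gauss(var: float, t: float) -> float:
    """Gamma_t for the target N(0, var): Cov(X_1 | F_t) / (1 - t)."""
    return var / (1.0 - t + t * var)


def entropy_from_gamma(var: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation of (1/2) * int_0^1 (Gamma_t - 1)^2 / (1 - t) dt."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h  # midpoints never hit t = 1
        total += (gamma_gauss(var, t) - 1.0) ** 2 / (1.0 - t)
    return 0.5 * h * total


def kl_to_std_gauss(var: float) -> float:
    """Closed form of D(N(0, var) || N(0, 1))."""
    return 0.5 * (var - 1.0 - math.log(var))


gamma_gap_wide = abs(entropy_from_gamma(4.0) - kl_to_std_gauss(4.0))
gamma_gap_narrow = abs(entropy_from_gamma(0.25) - kl_to_std_gauss(0.25))
```

The integrand stays bounded near $t=1$ because $\Gamma_{t}-1$ vanishes linearly there, so a plain midpoint rule is sufficient.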

Given the processes $\Gamma_{t}^{X}$ and $\Gamma_{t}^{Y}$ , we are now in position to express $\sqrt{\lambda}X+\sqrt{1-\lambda}Y$ as the terminal point of a martingale, towards using (9), which would lead to a bound on $\delta_{EPI,\lambda}$ . We define

[TABLE]

and a martingale $\tilde{B}_{t}$ which satisfies

[TABLE]

Since $\Gamma_{t}^{X}$ and $\Gamma_{t}^{Y}$ are invertible almost surely and independent, it holds that

[TABLE]

where $[\tilde{B}]_{t}$ denotes the quadratic co-variation of $\tilde{B}_{t}$ . Thus, by Lévy’s characterization, $\tilde{B}_{t}$ is a standard Brownian motion and we have the following equality in law

[TABLE]

We can now invoke (9) to get

[TABLE]

Combining this with the identity (12) finally gives a bound on the deficit in the Shannon-Stam inequality, in the form

[TABLE]

The following technical lemma will allow us to give a lower bound for the right hand side in terms of the variances of the processes $\Gamma_{t}^{X},\Gamma_{t}^{Y}$ . Its proof is postponed to the end of the section.

Lemma 1.

Let $A$ and $B$ be positive definite matrices and denote

[TABLE]

Then

[TABLE]

Combining the lemma with the estimate obtained in (13) produces the following result, which will be our main tool in studying $\delta_{EPI,\lambda}$ .

Lemma 2.

Let $X$ and $Y$ be centered random vectors on $\mathbb{R}^{d}$ with finite second moment, and let $\Gamma_{t}^{X},\Gamma_{t}^{Y}$ be defined as above. Then,

[TABLE]

The expression on the right-hand side of (14) may seem unwieldy, however, in many cases it can be simplified. For example, if it can be shown that, almost surely, $\Gamma_{t}^{X},\Gamma_{t}^{Y}\preceq c_{t}\mathrm{I}_{d}$ for some deterministic $c_{t}>0$ , then we obtain the more tractable inequality

[TABLE]

As we will show, this is the case when the random vectors are log-concave.

Proof of Lemma 1.

We have

[TABLE]

As

[TABLE]

we have the equality

[TABLE]

Finally, as the trace is invariant under any permutation of three symmetric matrices we have that

[TABLE]

and

[TABLE]

Thus,

[TABLE]

as required.

2.1 The Föllmer process associated to log-concave random vectors

In this section, we collect several results pertaining to the Föllmer process. Throughout the section, we fix a random vector $X$ in $\mathbb{R}^{d}$ and associate to it the Föllmer process $X_{t}$ , defined in the previous section, as well as the process $\Gamma^{X}_{t}$ , defined in equation (10) above. The next result lists some of its basic properties, and we refer to [8, 7] for proofs.

Proposition 1.

For $t\in(0,1)$ define

[TABLE]

where $f_{X}$ is the density of $X$ with respect to the standard Gaussian and $Z_{t,X}$ is a normalizing constant defined so that $\int\limits_{\mathbb{R}^{d}}f_{X}^{t}=1$ . Then

• $f_{X}^{t}$ is the density of the random measure $\mu_{t}:=X_{1}|\mathcal{F}_{t}$ with respect to the standard Gaussian and $\Gamma^{X}_{t}=\frac{\mathrm{Cov}\left(\mu_{t}\right)}{1-t}$ .

• $\Gamma^{X}_{t}$ is almost surely a positive definite matrix, in particular, it is invertible.

• For all $t\in(0,1)$ , we have

[TABLE]

• The following identity holds

[TABLE]

for all $t\in[0,1]$ . In particular, if $\mathrm{Cov}(X)\preceq\mathrm{I}_{d}$ , then $\mathbb{E}\left[\Gamma^{X}_{t}\right]\preceq\mathrm{I}_{d}$ .

In what follows, we restrict ourselves to the case that $X$ is log-concave. Using this assumption we will establish several important properties for the matrix $\Gamma_{t}$ . For simplicity, we will write $\Gamma_{t}:=\Gamma_{t}^{X}$ and $v_{t}:=v_{t}^{X}$ . The next result shows that the matrix $\Gamma_{t}$ is bounded almost surely.

Lemma 3.

Suppose that $X$ is log-concave, then for every $t\in(0,1)$

$$\Gamma_{t}\preceq\frac{1}{t}\mathrm{I}_{d}.$$

Moreover, if for some $\xi>0$ , $X$ is $\xi$ -uniformly log-concave then

$$\Gamma_{t}\preceq\frac{1}{(1-t)\xi+t}\mathrm{I}_{d}.$$

Proof.

By Proposition 1, $\mu_{t}$ , the law of $X_{1}|\mathcal{F}_{t}$ has a density $\rho_{t}$ , with respect to the Lebesgue measure, proportional to

[TABLE]

Consequently, since $-\nabla^{2}f_{X}\succeq 0$ ,

[TABLE]

It follows that, almost surely, $\mu_{t}$ is $\frac{t}{1-t}$ -uniformly log-concave. According to the Brascamp-Lieb inequality ([3]) $\alpha$ -uniform log-concavity implies a spectral gap of $\alpha$ , and in particular $\textrm{Cov}(\mu_{t})\preceq\frac{1-t}{t}\mathrm{I}_{d}$ and so, $\Gamma_{t}=\frac{\mathrm{Cov}(\mu_{t})}{1-t}\preceq\frac{1}{t}\mathrm{I}_{d}$ . If, in addition, $X$ is $\xi$ -uniformly log-concave, so that $-\nabla^{2}f_{X}\succeq\xi\mathrm{I}_{d}$ , then we may write

[TABLE]

and the arguments given above show $\textrm{Cov}(\mu_{t})\preceq\frac{(1-t)}{(1-t)\xi+t}\mathrm{I}_{d}$ . Thus,

$$\Gamma_{t}=\frac{\mathrm{Cov}(\mu_{t})}{1-t}\preceq\frac{1}{(1-t)\xi+t}\mathrm{I}_{d}.$$
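The two bounds just proved, $\Gamma_{t}\preceq\frac{1}{t}\mathrm{I}_{d}$ in general and $\Gamma_{t}\preceq\frac{1}{(1-t)\xi+t}\mathrm{I}_{d}$ under $\xi$ -uniform log-concavity, can be compared with an explicit example. The Python sketch below again uses a one-dimensional Gaussian target $N(0,s)$, which is $\frac{1}{s}$ -uniformly log-concave, and assumes the closed form $\Gamma_{t}=\frac{s}{1-t+ts}$ obtained from Gaussian conditioning (an assumption of this example, not a formula from the text). In this case the uniform bound is attained with equality.

```python
def gamma_gauss(var: float, t: float) -> float:
    """Gamma_t for the target N(0, var): Cov(X_1 | F_t) / (1 - t)."""
    return var / (1.0 - t + t * var)


def log_concave_bound(t: float) -> float:
    """Upper bound 1/t valid for every log-concave target."""
    return 1.0 / t


def uniform_bound(xi: float, t: float) -> float:
    """Upper bound 1 / ((1 - t) * xi + t) for xi-uniformly log-concave targets."""
    return 1.0 / ((1.0 - t) * xi + t)


times = [k / 10.0 for k in range(1, 10)]
variances = [0.25, 1.0, 4.0]

general_bound_holds = all(
    gamma_gauss(v, t) <= log_concave_bound(t) + 1e-12
    for v in variances for t in times
)
# N(0, v) is (1/v)-uniformly log-concave, and the bound is tight for it.
uniform_bound_tight = all(
    abs(gamma_gauss(v, t) - uniform_bound(1.0 / v, t)) < 1e-12
    for v in variances for t in times
)
```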

Our next goal is to use the formulas given in the above lemma in order to bound from below the expectation of $\Gamma_{t}$ . We begin with a simple corollary.

Corollary 10.

Suppose that $X$ is $1$ -uniformly log-concave, then for every $t\in[0,1]$

[TABLE]

Proof.

By (16), we have

[TABLE]

By Lemma 3, $\Gamma_{t}\preceq\mathrm{I}_{d}$ , which shows

[TABLE]

Thus, for every $t$ ,

[TABLE]

To produce similar bounds for general log-concave random vectors, we require more intricate arguments. Recall that $\mathrm{C_{p}}(X)$ denotes the Poincaré constant of $X$ .

Lemma 4.

If $X$ is centered and has a finite Poincaré constant $\mathrm{C_{p}}(X)<\infty$ , then

[TABLE]

Proof.

Recall that, by equation (7), we know that $X_{t}$ has the same law as $tX_{1}+\sqrt{t(1-t)}G$ , where $G$ is a standard Gaussian independent of $X_{1}$ . Since $\mathrm{C_{p}}(tX)=t^{2}\mathrm{C_{p}}(X)$ and since the Poincaré constant is sub-additive with respect to convolution ([4]) we get

$$\mathrm{C_{p}}(X_{t})\leq t^{2}\mathrm{C_{p}}(X)+t(1-t).$$

The drift, $v_{t}$ , is a function of $X_{t}$ and $\mathbb{E}\left[v_{t}\right]=0$ . Equation (5) implies that $\nabla_{x}v_{t}(X_{t})$ is a symmetric matrix, hence the Poincaré inequality yields

[TABLE]

As $v_{t}(X_{t})$ is a martingale, by Itô’s lemma we have

[TABLE]

An application of Itô’s isometry then shows

[TABLE]

where we have again used the fact that $\nabla_{x}v_{t}(X_{t})$ is symmetric.

Using the last lemma, we can deduce lower bounds on the matrix $\Gamma_{t}^{X}$ in terms of the Poincaré constant.

Corollary 11.

Suppose that $X$ is log-concave and that $\sigma^{2}$ is the minimal eigenvalue of $\mathrm{Cov}(X)$ . Then,

•

For every $t\in\left[0,\frac{1}{2\frac{\mathrm{C_{p}}(X)}{\sigma^{2}}+1}\right]$ , $\mathbb{E}\left[\Gamma_{t}\right]\succeq\frac{\min(1,\sigma^{2})}{3}\mathrm{I}_{d}.$

•

For every $t\in\left[\frac{1}{2\frac{\mathrm{C_{p}}(X)}{\sigma^{2}}+1},1\right]$ , $\mathbb{E}\left[\Gamma_{t}\right]\succeq\frac{\min(1,\sigma^{2})}{3}\frac{1}{t\left(2\frac{\mathrm{C_{p}}(X)}{\sigma^{2}}+1\right)}\mathrm{I}_{d}$ .

Proof.

Using Equation (11), Itô’s isometry and the fact that $\Gamma_{t}$ is symmetric, we deduce that

[TABLE]

Combining this with equation (17) and using Lemma 4, we get

[TABLE]

In the case where $X$ is log-concave, by Lemma 3, $\Gamma_{t}\preceq\frac{1}{t}\mathrm{I}_{d}$ almost surely, therefore $\mathbb{E}\left[\Gamma_{t}^{2}\right]\preceq\frac{1}{t}\mathbb{E}\left[\Gamma_{t}\right]$ . The above inequality then becomes

[TABLE]

Rearranging the inequality shows

[TABLE]

As long as $t\leq\frac{1}{2\left(\frac{\mathrm{C_{p}}(X)}{\sigma^{2}}\right)+1}$ , we have

[TABLE]

which gives the first bound. By (10), we also have the bound

[TABLE]

The differential equation

[TABLE]

has a unique solution given by

[TABLE]

Using Grönwall’s inequality, we conclude that for every $t\in\left[\frac{1}{2\frac{\mathrm{C_{p}}(X)}{\sigma^{2}}+1},1\right]$,

[TABLE]
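The two regimes in the statement of Corollary 11 can be evaluated directly. The Python sketch below encodes the scalar $c(t)$ for which the corollary reads $\mathbb{E}\left[\Gamma_{t}\right]\succeq c(t)\mathrm{I}_{d}$; the function and argument names are illustrative choices, with `poincare` standing for $\mathrm{C_{p}}(X)$ and `sigma2` for $\sigma^{2}$.

```python
def corollary11_lower_bound(t, poincare, sigma2):
    """Scalar c(t) such that Corollary 11 reads E[Gamma_t] >= c(t) * I_d."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    base = min(1.0, sigma2) / 3.0            # value on the first interval
    ratio = 2.0 * poincare / sigma2 + 1.0    # 2 * C_p(X) / sigma^2 + 1
    threshold = 1.0 / ratio                  # point where the regimes meet
    if t <= threshold:
        return base
    return base / (t * ratio)

if __name__ == "__main__":
    # Hypothetical isotropic example with C_p(X) = 2 and sigma^2 = 1.
    for t in (0.0, 0.2, 0.5, 1.0):
        print(t, corollary11_lower_bound(t, poincare=2.0, sigma2=1.0))
```

The two expressions agree at the threshold, and the second is decreasing in $t$, so the smallest value over $[0,1]$ is attained at $t=1$ and equals $\frac{\min(1,\sigma^{2})}{3}\cdot\frac{1}{2\frac{\mathrm{C_{p}}(X)}{\sigma^{2}}+1}$.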

We conclude this section with a comparison lemma that will allow us to control the values of $\mathbb{E}\left[\left\lVert v_{t}\right\rVert_{2}^{2}\right]$.

Lemma 5.

Let $t_{0}\in[0,1]$ and suppose that $X$ is centered with a finite Poincaré constant $\mathrm{C_{p}}(X)<\infty$ . Then

•

For $t_{0}\leq t\leq 1,$

[TABLE]

•

For $0\leq t\leq t_{0},$

[TABLE]

Proof.

Consider the differential equation

[TABLE]

It has a unique solution given by

[TABLE]

The bounds follow by applying Grönwall’s inequality combined with the result of Lemma 4.
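The comparison principle behind both bounds can be stated generically. In the sketch below, $g$ and $a$ are placeholder names introduced here, not notation from the text: $g$ stands for a differentiable function such as $t\mapsto\mathbb{E}\left[\left\lVert v_{t}\right\rVert_{2}^{2}\right]$, and $a\geq 0$ for the rate in a differential inequality $g^{\prime}\geq a\,g$. Under this assumption the function $t\mapsto g(t)\exp\left(-\int_{t_{0}}^{t}a(r)\,dr\right)$ is non-decreasing, and so

$$g(t)\geq g(t_{0})\exp\left(\int_{t_{0}}^{t}a(r)\,dr\right)\ \text{ for } t\geq t_{0},\qquad g(t)\leq g(t_{0})\exp\left(-\int_{t}^{t_{0}}a(r)\,dr\right)\ \text{ for } t\leq t_{0}.$$

This is consistent with the two cases of the lemma: a differential inequality of this type propagates a lower bound to the right of $t_{0}$ and an upper bound to the left of it.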

3 Stability for $1$ -uniformly log-concave random vectors

In this section, we assume that $X$ and $Y$ are both $1$ -uniformly log-concave. Let $B_{t}^{X},B_{t}^{Y}$ be independent standard Brownian motions and consider the associated processes $\Gamma_{t}^{X},\Gamma_{t}^{Y}$ defined as in Section 2.

The key fact that makes the uniform log-concave case easier is Lemma 3, which implies that $\Gamma_{t}^{X},\Gamma_{t}^{Y}\preceq\mathrm{I}_{d}$ almost surely. In this case, Lemma 2 simplifies to

[TABLE]

where we have used the fact that

[TABLE]

Consider the two Gaussian random vectors defined as

[TABLE]

and observe that

[TABLE]

This induces a coupling between $X$ and $G_{X}$ from which we obtain, using Itô’s isometry,

[TABLE]

and an analogous estimate also holds for $Y$ . We may now use $\mathbb{E}\left[\Gamma_{t}^{X}\right]$ and $\mathbb{E}\left[\Gamma_{t}^{Y}\right]$ as the diffusion coefficients for the same Brownian motion to establish

[TABLE]

Plugging these estimates into (19) reproves the following bound, which is identical to Theorem 1 in [6].

Theorem 12.

Let $X$ and $Y$ be $1$ -uniformly log-concave centered vectors and let $G_{X},G_{Y}$ be defined as above. Then,

[TABLE]

To obtain a bound for the relative entropy towards the proof of Theorem 1, we will require a slightly more general version of inequality (9). This is the content of the next lemma, whose proof is similar to the argument presented above. The main difference comes from applying Girsanov’s theorem to a re-scaled Brownian motion, from which we obtain an expression analogous to (6). The reader is referred to [8, Lemma 2] for a complete proof.

Lemma 6.

Let $F_{t}$ and $E_{t}$ be two $\mathcal{F}_{t}$-adapted matrix-valued processes and let $X_{t}$, $M_{t}$ be two processes defined by

[TABLE]

Suppose that for every $t\in[0,1]$, $E_{t}\succeq c\mathrm{I}_{d}$ for some deterministic $c>0$. Then

[TABLE]

Proof of Theorem 1.

By Corollary 10

[TABLE]

We invoke Lemma 6 with $E_{t}=\mathbb{E}\left[\Gamma_{t}^{X}\right]$ and $F_{t}=\Gamma_{t}^{X}$ to obtain

[TABLE]

Repeating the same argument for $Y$ gives

[TABLE]

By invoking Lemma 6 with $F_{t}=\mathbb{E}\left[\Gamma_{t}^{X}\right]$ and $E_{t}=\mathbb{E}\left[\Gamma_{t}^{Y}\right]$ and then one more time after switching between $F_{t}$ and $E_{t}$ , and summing the results, we get

[TABLE]

Plugging the above inequalities into (19) concludes the proof.

4 Stability for general log-concave random vectors

Fix $X,Y$ , centered log-concave random vectors in $\mathbb{R}^{d}$ , such that

[TABLE]

with $\sigma_{X}^{2},\sigma_{Y}^{2}$ the corresponding minimal eigenvalues of $\mathrm{Cov}(X)$ and $\mathrm{Cov}(Y)$ . Assume further that $\frac{\mathrm{C_{p}}(Y)}{\sigma_{Y}^{2}},\frac{\mathrm{C_{p}}(X)}{\sigma_{X}^{2}}\leq\mathrm{C_{p}}$ , for some $\mathrm{C_{p}}>1$ . Again, let $B_{t}^{X}$ and $B_{t}^{Y}$ be independent Brownian motions and consider the associated processes $\Gamma_{t}^{X},\Gamma_{t}^{Y}$ defined as in Section 2.

The general log-concave case, in comparison with the case where $X$ and $Y$ are uniformly log-concave, gives rise to two essential difficulties. Recall that the results in the previous section used the fact that an upper bound for the matrices $\Gamma_{t}^{X},\Gamma_{t}^{Y}$ , combined with equation (14) gives the simpler bound (19). Unfortunately, in the general log-concave case, there is no upper bound uniform in $t$ , which creates the first problem. The second issue has to do with the lack of respective lower bounds for $\mathbb{E}[\Gamma_{t}^{X}]$ and $\mathbb{E}[\Gamma_{t}^{Y}]$ : in view of Lemma 6, one needs such bounds in order to obtain estimates on the entropies.

The solution to the second issue lies in Corollary 11, which gives a lower bound for the processes in terms of the Poincaré constants. We denote $\xi=\frac{1}{(2\mathrm{C_{p}}+1)}\frac{\min(\sigma_{Y}^{2},\sigma_{X}^{2})}{3}$, so that the corollary gives

[TABLE]

Thus, we are left with the issue arising from the lack of a uniform upper bound for the matrices $\Gamma_{t}^{X},\Gamma_{t}^{Y}$ . Note that Lemma 3 gives $\Gamma_{t}^{X}\preceq\frac{1}{t}\mathrm{I}_{d}$ , a bound which is not uniform in $t$ . To illustrate how one may overcome this issue, suppose that there exists an $\varepsilon>0$ , such that

[TABLE]

In such a case, Lemma 2 would imply

[TABLE]

Towards finding an $\varepsilon$ such that the above holds, note that since $v_{t}^{X}$ is a martingale, and using (6) we have for every $t_{0}\in[0,1],$

[TABLE]

Observe that

[TABLE]

Using the relation in (11), Fubini’s theorem shows

[TABLE]

Combining the last two displays gives

[TABLE]

Using (17), we have the identities:

[TABLE]

and

[TABLE]

from which we deduce

[TABLE]

Let $\{w_{i}\}_{i=1}^{d}$ be an orthonormal basis of eigenvectors corresponding to the eigenvalues $\{\lambda_{i}\}_{i=1}^{d}$ of $\mathrm{I}_{d}-\mathbb{E}\left[\Gamma_{t}^{X}\right]$. The following observation, which follows from the above identities, is crucial: if $\lambda_{i}\leq 0$ then necessarily $\langle w_{i},\mathrm{Cov}(X)w_{i}\rangle\geq 1$. In this case, by assumption (20), $\langle w_{i},\mathrm{Cov}(Y)w_{i}\rangle\leq 1$ and

[TABLE]

Our aim is to bound (4) from below; thus, in the calculation of the trace in the RHS, we may disregard all $w_{i}$ corresponding to negative $\lambda_{i}$ . Moreover, if $\lambda_{i}\geq 0$ , we need only consider the cases where

[TABLE]

as well. Since,

[TABLE]

under the assumptions taken on $w_{i}$ , we see that all the terms are positive. Using the estimate (21), the previous equation is bounded from above by

[TABLE]

where we have used (20). Summing over all the relevant $w_{i}$ we get

[TABLE]

Plugging this into (4) and using (22) we have thus shown

[TABLE]

This suggests that it may be useful to bound $\mathbb{E}\left[\left\lVert v^{X}_{t_{0}}\right\rVert_{2}^{2}\right]$ from above, for small values of $t_{0}$ , which is the objective of the next lemma.

Lemma 7.

If $X$ is centered and has a finite Poincaré constant $\mathrm{C_{p}}(X)<\infty$ , then for every $s\leq\frac{1}{3(2\mathrm{C_{p}}(X)+1)}$ the following holds

[TABLE]

Proof.

Suppose to the contrary that $\mathbb{E}\left[\left\lVert v_{s^{2}}^{X}\right\rVert^{2}_{2}\right]\geq\frac{s}{4}\cdot\mathrm{D}(X||G)$ . Invoking Lemma 5 with $t_{0}=s^{2}$ gives

[TABLE]

whenever $t\geq s^{2}$ . Thus,

[TABLE]

Note now that for $s\leq\frac{1}{3(2\mathrm{C_{p}}(X)+1)}$

[TABLE]

and in particular we may substitute $s=\frac{1}{3(2\mathrm{C_{p}}(X)+1)}$ in (4). In this case, a straightforward calculation yields

[TABLE]

which contradicts the identity (6), and concludes the proof by contradiction.

We would like to use the lemma with the choice $s=\xi^{2}$. In order to verify the condition of the lemma, which amounts to $\xi^{2}\leq\frac{1}{3(2\mathrm{C_{p}}(X)+1)}$, we first remark that if $\sigma_{X}^{2}\leq 1$, then it is clear that $\xi\leq\frac{1}{3(2\mathrm{C_{p}}(X)+1)}$. Otherwise, $\sigma_{X}^{2}\geq 1$ and

[TABLE]
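The first case can be spelled out using only the definition of $\xi$ and the standing assumption $\frac{\mathrm{C_{p}}(X)}{\sigma_{X}^{2}}\leq\mathrm{C_{p}}$. If $\sigma_{X}^{2}\leq 1$, then $\mathrm{C_{p}}(X)\leq\sigma_{X}^{2}\,\mathrm{C_{p}}\leq\mathrm{C_{p}}$, and hence

$$\xi=\frac{1}{2\mathrm{C_{p}}+1}\cdot\frac{\min(\sigma_{Y}^{2},\sigma_{X}^{2})}{3}\leq\frac{\sigma_{X}^{2}}{3(2\mathrm{C_{p}}+1)}\leq\frac{1}{3(2\mathrm{C_{p}}(X)+1)}.$$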

As the same reasoning is also true for $Y$, we now choose $t_{0}=\xi^{2}$, which allows us to invoke the previous lemma in (4) and to establish:

[TABLE]

We are finally ready to prove the main theorem.

Proof of Theorem 3.

Denote $\xi=\frac{1}{(2\mathrm{C_{p}}+1)}\frac{\min(\sigma_{Y}^{2},\sigma_{X}^{2})}{3}$ . Since $X$ and $Y$ are log-concave, by Lemma 3, $\Gamma_{t}^{X},\Gamma_{t}^{Y}\preceq\frac{1}{t}\mathrm{I}_{d}$ almost surely. Thus, Lemma 2 gives

[TABLE]

By noting that $\mathrm{C_{p}}\geq 1$ , the bound (26) gives

[TABLE]

for some numerical constant $K>0$ .

5 Further results

5.1 Stability for low entropy log-concave measures

In this section we focus on the case where $X$ and $Y$ are log-concave and isotropic. Similar to the previous section, we set $\xi_{X}=\frac{1}{3(2\mathrm{C_{p}}(X)+1)}$ , so that by Corollary 11,

[TABLE]
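To see how Corollary 11 leads to the constant $\xi_{X}$, note that for isotropic $X$ we have $\sigma^{2}=1$, so the threshold in the corollary is $t^{*}=\frac{1}{2\mathrm{C_{p}}(X)+1}$ (the symbol $t^{*}$ is shorthand introduced here). The two regimes then give

$$\mathbb{E}\left[\Gamma_{t}\right]\succeq\frac{1}{3}\mathrm{I}_{d}\succeq\xi_{X}\mathrm{I}_{d}\ \text{ for } t\leq t^{*},\qquad\mathbb{E}\left[\Gamma_{t}\right]\succeq\frac{t^{*}}{3t}\mathrm{I}_{d}\succeq\frac{t^{*}}{3}\mathrm{I}_{d}=\xi_{X}\mathrm{I}_{d}\ \text{ for } t^{*}\leq t\leq 1,$$

so the lower bound $\xi_{X}\mathrm{I}_{d}$ holds uniformly for $t\in[0,1]$.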

Towards the proof of Theorem 6, we first need an analogue of Lemma 7, for which we sketch the proof here.

Lemma 8.

If $X$ is centered and has a finite Poincaré constant $\mathrm{C_{p}}(X)<\infty$, then

[TABLE]

Proof.

Assume by contradiction that $\mathbb{E}\left[\left\lVert v_{\xi_{X}}\right\rVert_{2}^{2}\right]\geq\frac{1}{4}\mathrm{D}(X||G)$ . In this case, Lemma 5 implies, for every $t\geq\xi_{X}$ ,

[TABLE]

A calculation then shows that

[TABLE]

which is a contradiction to (6).

Proof of Theorem 6.

Since $v_{t}^{X}$ is a martingale, $\mathbb{E}\left[\left\lVert v_{t}^{X}\right\rVert_{2}^{2}\right]$ is an increasing function. By (6) we deduce the elementary inequality

[TABLE]

which holds for every $s\in[0,1]$ . For isotropic $X$ , Equation (17) shows that, for all $t\in[0,1]$ ,

[TABLE]

where the second inequality is by assumption. Note that Equation (17) also shows that $\mathbb{E}\left[\Gamma_{t}^{X}\right]\preceq\mathrm{I}_{d}$ which yields, for every $t\in[0,1]$

[TABLE]

Applying this to $Y$ as well produces the bound

[TABLE]

Set $\xi=\min(\xi_{X},\xi_{Y})$ . Repeating the same calculation as in (4) and using the above gives that

[TABLE]

Lemma 8 implies

[TABLE]

Finally, by Lemma 3, $\Gamma_{t}^{X},\Gamma_{t}^{Y}\preceq\frac{1}{t}\mathrm{I}_{d}$ almost surely for all $t\in[0,1]$ . We now invoke Lemma 2 to obtain

[TABLE]

5.2 Stability under convolution with a Gaussian

Proof of Theorem 7.

Fix $\lambda\in(0,1)$. By (7), we have

[TABLE]

As the relative entropy is affine invariant, this implies

[TABLE]

Lemma 5 yields,

[TABLE]

and

[TABLE]

Denote

[TABLE]

A calculation shows

[TABLE]

as well as

[TABLE]

Thus, the above bounds give

[TABLE]

and

[TABLE]

Now, since the expression $\frac{\alpha}{\alpha+\beta}$ is monotone increasing with respect to $\alpha$ and decreasing with respect to $\beta$ whenever $\alpha,\beta>0$ , those two inequalities together with (27) imply that

[TABLE]

Rewriting the above in terms of the deficit in the Shannon-Stam inequality, we have established

[TABLE]

Bibliography


[1] Ball, K., Barthe, F., Naor, A., et al. Entropy jumps in the presence of a spectral gap. Duke Mathematical Journal 119, 1 (2003), 41–63.
[2] Ball, K., and Nguyen, V. Entropy jumps for isotropic log-concave random vectors and spectral gap. Studia Mathematica 1, 213 (2012), 81–96.
[3] Brascamp, H. J., and Lieb, E. H. On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. Journal of Functional Analysis 22, 4 (1976), 366–389.
[4] Courtade, T. A. Bounds on the Poincaré constant for convolution measures. To appear in Annales de l’Institut Henri Poincaré (B) Probabilités et Statistiques (2018).
[5] Courtade, T. A. A quantitative entropic CLT for radially symmetric random vectors. In 2018 IEEE International Symposium on Information Theory (ISIT) (2018), IEEE, pp. 1610–1614.
[6] Courtade, T. A., Fathi, M., and Pananjady, A. Quantitative stability of the entropy power inequality. IEEE Trans. Inform. Theory 64, 8 (2018), 5691–5703.
[7] Eldan, R., and Lee, J. R. Regularization under diffusion and anticoncentration of the information content. Duke Math. J. 167, 5 (2018), 969–993.
[8] Eldan, R., Mikulincer, D., and Zhai, A. The CLT in high dimensions: quantitative bounds via martingale embedding. arXiv preprint arXiv:1806.09087 (2018).
