Diffusion Hypercontractivity via Generalized Density Manifold

Wuchen Li

arXiv:1907.12546·cs.IT·October 31, 2019

TL;DR

This paper connects diffusion hypercontractivity with Log-Sobolev, Poincaré and Talagrand inequalities through a generalized density manifold, using a mean-field Bakry-Émery calculus and generalized optimal transport metrics.

Contribution

It develops a one-parameter family of diffusion hypercontractivity results and introduces a novel PH^{-1}I inequality linking divergence, transport, and information measures.

Findings

01. Established a new family of diffusion hypercontractivity inequalities.

02. Derived the PH^{-1}I inequality connecting divergence, transport, and Fisher information.

03. Presented a mean-field Bakry-Émery calculus and Yano's volume measure formula.

Abstract

We prove a one-parameter family of diffusion hypercontractivity and present the associated Log-Sobolev, Poincaré and Talagrand inequalities. A mean-field type Bakry-Émery iterative calculus and volume measure based integration formula (Yano's formula) are presented. Our results are based on the interpolation among divergence functional, generalized diffusion process, and generalized optimal transport metric. As a result, an inequality among Pearson divergence (P), negative Sobolev metric (H^{-1}) and generalized Fisher information functional (I), named PH^{-1}I inequality, is derived.

Equations

d X_{γ, t} = - \frac{γ}{γ - 1} \nabla μ (X_{γ, t})^{γ - 1} d t + \sqrt{2 μ (X_{γ, t})^{γ - 1}} d B_{t},

L_{γ} Φ = (\frac{γ}{γ - 1} \nabla μ^{γ - 1}, \nablaΦ) + μ^{γ - 1} ΔΦ, Φ \in C^{\infty} (M) .

D_{γ} (ρ ∥ μ) = \int f (\frac{ρ}{μ}) μ d x,

f (ρ) = \begin{cases} \frac{1}{( 1 - γ ) ( 2 - γ )} (ρ^{2 - γ} - 1) & γ \neq 1, γ \neq 2 \\ ρ \log ρ & γ = 1 \\ - \log ρ & γ = 2. \end{cases}

I_{γ} (ρ ∥ μ) = \int ∥\nabla \log \frac{ρ}{μ} ∥^{2} ρ^{γ} μ^{2 γ - 2} d x .

\mathcal{W}_{\gamma}(\rho,\mu)=\inf_{\Phi}\Big{\{}\int_{0}^{1}\sqrt{\int\|\nabla\Phi_{t}\|^{2}\rho_{t}^{\gamma}dx}dt\colon\partial_{t}\rho_{t}+\nabla\cdot(\rho_{t}^{\gamma}\nabla\Phi_{t})=0,~{}\rho_{0}=\rho,~{}\rho_{1}=\mu\Big{\}},

μ^{γ - 1} Ric - \frac{1}{γ - 1} Hess μ^{γ - 1} - Δ μ^{γ - 1} + \frac{1}{8} γ (γ - 1) ∥\nabla \log μ ∥^{2} μ^{γ - 1} ⪰ κ .

D_{γ} (ρ_{t} ∥ μ) \leq e^{- 2 κ t} D_{γ} (ρ_{0} ∥ μ) .

D_{γ} (ρ ∥ μ) \leq \frac{1}{2 κ} I_{γ} (ρ ∥ μ),

W_{γ} (ρ, μ) \leq \sqrt{\frac{2 D _{γ} ( ρ ∥ μ )}{κ}} .

Ric - Hess \log μ ⪰ κ, κ > 0,

\int ρ \log \frac{ρ}{μ} d x \leq \frac{1}{2 κ} \int ∥\nabla \log \frac{ρ}{μ} ∥^{2} ρ d x;

W_{1} (ρ, μ) \leq \sqrt{\frac{2 D _{1} ( ρ ∥ μ )}{κ}} .

μ^{- 1} Ric + Hess μ^{- 1} - Δ μ^{- 1} ⪰ κ, κ > 0.

\frac{1}{2} \int (\frac{ρ}{μ} - 1)^{2} μ d x \leq \frac{1}{2 κ} \int ∥\nabla \log \frac{ρ}{μ} ∥^{2} μ^{- 2} d x .

\begin{split}&\int\mu^{-1}\Big{(}\nabla\cdot(\mu^{\gamma}\nabla\Phi)\Big{)}^{2}dx\\ =&\int\mu^{\gamma}\Big{\{}(\mu^{\gamma-1}\textrm{Ric}-\Delta\mu^{\gamma-1}-\frac{1}{\gamma-1}\textrm{Hess}\mu^{\gamma-1})(\nabla\Phi,\nabla\Phi)+\mu^{\gamma-1}\|\textrm{Hess}\Phi\|^{2}\\ &\qquad+\gamma(\gamma-1)\mu^{\gamma-1}\Big{(}(\nabla\log\mu,\nabla\Phi)^{2}-\|\nabla\log\mu\|^{2}\|\nabla\Phi\|^{2}\Big{)}\Big{\}}dx.\end{split}

\int(\Delta\Phi)^{2}dx=\int\Big{\{}\textrm{Ric}(\nabla\Phi,\nabla\Phi)+\|\textrm{Hess}\Phi\|^{2}\Big{\}}dx.

\int\mu^{-1}\Big{(}\nabla\cdot(\mu\nabla\Phi)\Big{)}^{2}dx=\int\mu\Big{\{}\textrm{Ric}(\nabla\Phi,\nabla\Phi)+\|\textrm{Hess}\Phi\|^{2}\Big{\}}dx.

\begin{split}\int\mu^{-1}(\Delta\Phi)^{2}dx=&\int\Big{\{}(\mu^{-1}\textrm{Ric}-\Delta\mu^{-1}+\textrm{Hess}\mu^{-1})(\nabla\Phi,\nabla\Phi)+\mu^{-1}\|\textrm{Hess}\Phi\|^{2}\Big{\}}dx.\end{split}

μ^{γ - 1} Ric - Δ μ^{γ - 1} - \frac{1}{γ - 1} Hess μ^{γ - 1} ⪰ λ,

μ^{γ - 1} Ric - Δ μ^{γ - 1} - \frac{1}{γ - 1} Hess μ^{γ - 1} - γ (γ - 1) ∥\nabla \log μ ∥^{2} μ^{γ - 1} ⪰ λ,

\int f^{2} μ d x \leq \frac{1}{λ} \int ∥\nabla f ∥^{2} μ^{γ} d x,

\int f^{2} μ d x \leq \frac{1}{λ} \int ∥\nabla f ∥^{2} μ d x .

D_{γ} (μ + ϵ h ∥ μ) = \frac{ϵ ^{2}}{2} \int f^{2} μ d x + o (ϵ^{2}) .

I_{γ} (μ + ϵ h ∥ μ) = ϵ^{2} \int (\nabla f)^{2} μ^{γ} d x + o (ϵ^{2}) .

μ Ric - Δ μ - Hess μ - 2∥\nabla \log μ ∥^{2} μ ⪰ λ, λ > 0,

\int f^{2} μ d x \leq \frac{1}{λ} \int ∥\nabla f ∥^{2} μ^{2} d x,

W_{0} (ρ, μ) = H^{- 1} (ρ, μ),

H^{-1}(\rho,\mu)=\sqrt{\int\big{(}\rho-\mu,\Delta^{-1}(\rho-\mu)\big{)}dx}.

D_{0} (ρ ∥ μ) \leq \sqrt{I_{0} (ρ ∥ μ)} H^{- 1} (ρ, μ) - \frac{κ}{2} H^{- 1} (ρ, μ)^{2} .

Full text

Key words and phrases:

Information theory; Mean-field Bakry-Émery calculus; Generalized Log-Sobolev inequality; Generalized Poincaré inequality; Generalized Talagrand inequality; Generalized Yano’s formula.

Wuchen Li is supported by AFOSR MURI (Grant No. 18RT0073).

1. Introduction

Diffusion hypercontractivity plays an essential role in functional inequalities [2, 10] and information theory [6, 7]. It is often used to estimate the convergence rates of Markov chain Monte Carlo methods. Among these studies, the Bakry–Émery criteria [1] provide sufficient conditions for establishing convergence rates of diffusion processes and related inequalities. More recently, optimal transport has provided another viewpoint on this topic [22]. In this viewpoint, the probability density space is equipped with an infinite-dimensional Riemannian metric, named the Wasserstein metric [19]. The density space with the Wasserstein metric is named the density manifold [13]. Following this metric viewpoint, diffusion hypercontractivity, in particular the Bakry–Émery criteria, can be derived from Hessian operators of divergence functionals in the density manifold [20]. This study has been extended to general base metric spaces [18, 21]. On the other hand, the relation between the local behavior of diffusion hypercontractivity (such as the Poincaré inequality) and an integral formula, known as Yano’s formula [23], has been discovered in [5, 14]. It reveals the connection between the integration formula on the base manifold and Riemannian calculus in the density manifold.

In this paper, we study the hypercontractivity of generalized diffusion processes, named Hessian transport stochastic differential equations [17]. See the related information theory background in [25]. Following the generalized (mobility) density manifold proposed in [4, 8], the density manifold’s Riemannian calculus [15] and the geometric insights into inequalities provided in [20], we introduce the generalized Bakry–Émery criteria in Theorem 1. These criteria provide sufficient conditions for establishing convergence rates of generalized diffusion processes and for proving generalized Log-Sobolev and Talagrand inequalities. In addition, a generalized Yano’s formula is derived in Theorem 2, which provides an integral formula that depends on the reference measure. Using Yano’s formula, we establish the generalized Poincaré inequality in Corollary 3. More importantly, a P $H^{-1}$ I inequality is presented in Theorem 4.

In the literature, the generalized optimal transport metric was proposed in [8], and many groups have studied the associated generalized functional inequalities [3, 9]. Firstly, [9] studies functional inequalities for the classical Kolmogorov-Fokker-Planck equation, where the Bakry–Émery criteria are classical and the generalized optimal transport metrics depend on the reference measure. Here we study the stochastic processes associated with the transport metrics and then build new functional inequalities. We introduce new Bakry–Émery criteria, in which the metric of the density manifold in the inequalities does not depend on the reference measure. For example, we obtain several functional inequalities related to the classical $H^{-1}$ metric. In addition, [3] formulates divergence functional related inequalities for a type of drift-diffusion process. In that study, the classical Bakry-Émery iterative calculus is applied, while in this paper we introduce a new mean-field Bakry-Émery iterative calculus.

This paper is organized as follows. In section 2, we state the main result of this paper. We establish hypercontractivity for generalized drift-diffusion processes and prove several functional inequalities. A generalized Yano’s formula is also derived. In section 3, we formulate the primary tool of the proof. In section 4, the proof is presented. In section 5, a generalized Bakry-Émery iterative calculus is presented.

2. Main result

Let $(M,(\cdot,\cdot))$ be a compact and smooth Riemannian manifold without boundary. Denote its volume form by $dx$ , the Ricci curvature tensor by Ric, the gradient, divergence, Laplacian operators by $\nabla$ , $\nabla\cdot$ , $\Delta$ respectively, and the Hessian operator by Hess.

Given a reference probability density function $\mu\in C^{\infty}(M)$ with $\inf_{x\in M}\mu(x)>0$ , consider the $\gamma$ –drift diffusion process

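$$dX_{\gamma,t}=-\frac{\gamma}{\gamma-1}\nabla\mu(X_{\gamma,t})^{\gamma-1}dt+\sqrt{2\mu(X_{\gamma,t})^{\gamma-1}}\,dB_{t},$$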

where $B_{t}$ is the standard Brownian motion in $M$ with the infinitesimal generator

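$$L_{\gamma}\Phi=\Big(\frac{\gamma}{\gamma-1}\nabla\mu^{\gamma-1},\nabla\Phi\Big)+\mu^{\gamma-1}\Delta\Phi,\qquad\Phi\in C^{\infty}(M).$$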

Consider the $\gamma$ –divergence functional (it is often named the $\alpha$ –divergence with $\gamma=\frac{3-\alpha}{2}$ ; we use the notation $\gamma$ for simplicity of presentation)

$$\mathcal{D}_{\gamma}(\rho\|\mu)=\int f\Big(\frac{\rho}{\mu}\Big)\mu\,dx,$$

where $f\colon[0,\infty)\rightarrow\mathbb{R}$ has the form

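$$f(\rho)=\begin{cases}\frac{1}{(1-\gamma)(2-\gamma)}(\rho^{2-\gamma}-1)&\gamma\neq 1,\ \gamma\neq 2\\ \rho\log\rho&\gamma=1\\ -\log\rho&\gamma=2.\end{cases}$$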
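The three branches of $f$ can be sanity-checked numerically. The following is a minimal sketch, assuming a uniform grid on the circle $[0,2\pi)$ as a stand-in for $M$ and two hypothetical test densities; the names `f_gamma` and `gamma_divergence` are illustrative and do not come from the paper. It evaluates $\mathcal{D}_{\gamma}$ by a Riemann sum for $\gamma=1$ (Kullback–Leibler), $\gamma=0$ (Pearson) and $\gamma=2$ (reverse Kullback–Leibler).

```python
import numpy as np

def f_gamma(r, gamma):
    """Generator f of the gamma-divergence, evaluated at the ratio r = rho / mu."""
    r = np.asarray(r, dtype=float)
    if gamma == 1:
        return r * np.log(r)
    if gamma == 2:
        return -np.log(r)
    return (r ** (2.0 - gamma) - 1.0) / ((1.0 - gamma) * (2.0 - gamma))

def gamma_divergence(rho, mu, dx, gamma):
    """Riemann-sum approximation of D_gamma(rho || mu) = int f(rho / mu) mu dx."""
    return float(np.sum(f_gamma(rho / mu, gamma) * mu) * dx)

# Hypothetical test densities on the circle, both normalized to total mass 1.
n = 2048
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = 2.0 * np.pi / n
mu = np.exp(0.5 * np.cos(x))
mu /= mu.sum() * dx
rho = np.exp(0.3 * np.sin(2.0 * x))
rho /= rho.sum() * dx

kl = gamma_divergence(rho, mu, dx, gamma=1)          # int rho log(rho/mu) dx
pearson = gamma_divergence(rho, mu, dx, gamma=0)     # (1/2) int (rho/mu - 1)^2 mu dx
reverse_kl = gamma_divergence(rho, mu, dx, gamma=2)  # -int mu log(rho/mu) dx
```

For $\gamma=0$ the generator is $\frac{1}{2}(\rho^{2}-1)$, and the value agrees with $\frac{1}{2}\int(\frac{\rho}{\mu}-1)^{2}\mu\,dx$ because both densities integrate to one.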

Consider the $\gamma$ –Fisher information functional

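$$\mathcal{I}_{\gamma}(\rho\|\mu)=\int\Big\|\nabla\log\frac{\rho}{\mu}\Big\|^{2}\rho^{\gamma}\mu^{2\gamma-2}\,dx.$$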

Consider the $\gamma$ –Wasserstein distance (this notion of Wasserstein distance was first studied in [8]; when $\gamma=1$ , note that $\mathcal{W}_{1}$ denotes the classical $L^{2}$ -Wasserstein distance, not the $L^{1}$ -Wasserstein distance)

$$\mathcal{W}_{\gamma}(\rho,\mu)=\inf_{\Phi}\Big\{\int_{0}^{1}\sqrt{\int\|\nabla\Phi_{t}\|^{2}\rho_{t}^{\gamma}dx}\,dt\colon\partial_{t}\rho_{t}+\nabla\cdot(\rho_{t}^{\gamma}\nabla\Phi_{t})=0,\ \rho_{0}=\rho,\ \rho_{1}=\mu\Big\},$$

where $\rho_{t}=\rho(t,x)$ , $\Phi_{t}=\Phi(t,x)$ and the infimum is over all potential functions $\Phi\colon[0,1]\times M\rightarrow\mathbb{R}$ .

We next provide sufficient conditions for the hypercontractivity of $\gamma$ –drift diffusion process and functional inequalities among $\gamma$ –divergence, $\gamma$ –Fisher information and $\gamma$ –Wasserstein distance.

Theorem 1 (Generalized hypercontractivity).

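Let $\gamma\in[0,1]$ and suppose there exists a constant $\kappa>0$ such that

$$\mu^{\gamma-1}\textrm{Ric}-\frac{1}{\gamma-1}\textrm{Hess}\mu^{\gamma-1}-\Delta\mu^{\gamma-1}+\frac{1}{8}\gamma(\gamma-1)\|\nabla\log\mu\|^{2}\mu^{\gamma-1}\succeq\kappa.\tag{1}$$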

Let $\rho_{0}$ be a smooth initial distribution and $\rho_{t}$ be the probability density function of $\gamma$ –drift diffusion process, then

$$\mathcal{D}_{\gamma}(\rho_{t}\|\mu)\leq e^{-2\kappa t}\mathcal{D}_{\gamma}(\rho_{0}\|\mu).$$

Moreover, for any smooth probability density function $\rho\in C^{\infty}(M)$ with $\inf_{x\in M}\rho(x)>0$ , the generalized Log-Sobolev inequality holds

$$\mathcal{D}_{\gamma}(\rho\|\mu)\leq\frac{1}{2\kappa}\mathcal{I}_{\gamma}(\rho\|\mu),\tag{3}$$

and the generalized Talagrand inequality holds

$$\mathcal{W}_{\gamma}(\rho,\mu)\leq\sqrt{\frac{2\mathcal{D}_{\gamma}(\rho\|\mu)}{\kappa}}.\tag{4}$$

Example 1 (Kullback–Leibler divergence).

Consider $\gamma=1$ , $f(\rho)=\rho\log\rho$ . Here $\mathcal{D}_{1}=\int\rho\log\frac{\rho}{\mu}dx$ is the classical Kullback–Leibler divergence function (relative entropy), $\mathcal{I}_{1}=\int\|\nabla\log\frac{\rho}{\mu}\|^{2}\rho dx$ is the classical relative Fisher information, $dX_{1,t}=-\nabla\log\mu(X_{1,t})dt+\sqrt{2}dB_{t}$ is the classical Langevin process, and $\mathcal{W}_{1}$ is the classical $L^{2}$ -Wasserstein distance. Here condition (1) becomes

$$\textrm{Ric}-\textrm{Hess}\log\mu\succeq\kappa,\qquad\kappa>0,$$

which is the classical Bakry–Émery criterion. Under this condition, the distribution of drift diffusion process $X_{1,t}$ converges to $\mu$ ; the Log–Sobolev inequality (3) holds

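$$\int\rho\log\frac{\rho}{\mu}dx\leq\frac{1}{2\kappa}\int\Big\|\nabla\log\frac{\rho}{\mu}\Big\|^{2}\rho\,dx;$$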

and the Talagrand inequality holds

$$\mathcal{W}_{1}(\rho,\mu)\leq\sqrt{\frac{2\mathcal{D}_{1}(\rho\|\mu)}{\kappa}}.$$

Here, if $M$ is Ricci flat, i.e. $\textrm{Ric}=0$ , and we denote $\mu(x)=e^{-V(x)}$ , condition (1) becomes $\textrm{Hess}V\succeq\kappa$ .
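The classical case can be illustrated with Gaussians, where every quantity has a closed form. The sketch below is an illustration only and steps outside the compact setting of this paper: it assumes $M=\mathbb{R}$, $\mu=N(0,\sigma^{2})$ (so $V''=1/\sigma^{2}=\kappa$) and $\rho=N(m,s^{2})$. The helper names are hypothetical. It compares both sides of the Log–Sobolev and Talagrand inequalities of Example 1.

```python
import math

def gaussian_kl(m, s, sigma):
    """KL( N(m, s^2) || N(0, sigma^2) ) in closed form."""
    return math.log(sigma / s) + (s * s + m * m) / (2.0 * sigma * sigma) - 0.5

def gaussian_fisher(m, s, sigma):
    """Relative Fisher information int |d/dx log(rho/mu)|^2 rho dx for the same pair."""
    return (m / sigma**2) ** 2 + s * s * (1.0 / sigma**2 - 1.0 / s**2) ** 2

def gaussian_w2(m, s, sigma):
    """L2-Wasserstein distance between N(m, s^2) and N(0, sigma^2)."""
    return math.sqrt(m * m + (s - sigma) ** 2)

def check(m, s, sigma):
    """Return (KL, Log-Sobolev bound, W2, Talagrand bound) with kappa = 1 / sigma^2."""
    kappa = 1.0 / sigma**2
    kl = gaussian_kl(m, s, sigma)
    return (kl,
            gaussian_fisher(m, s, sigma) / (2.0 * kappa),
            gaussian_w2(m, s, sigma),
            math.sqrt(2.0 * kl / kappa))

# A few hypothetical parameter choices (mean, standard deviation, reference std).
cases = [(0.5, 0.7, 1.0), (-1.0, 2.0, 1.5), (0.0, 0.3, 0.8), (2.0, 1.0, 1.0)]
results = [check(m, s, sigma) for (m, s, sigma) in cases]
```

In each tuple the first entry should not exceed the second, and the third should not exceed the fourth. When $s=\sigma$ both inequalities become equalities, which is the familiar extremal case of pure translations.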

Example 2 (Pearson divergence).

Consider $\gamma=0$ , $f(\rho)=\frac{1}{2}(\rho^{2}-1)$ . Here $\mathcal{D}_{0}=\frac{1}{2}\int(\frac{\rho}{\mu}-1)^{2}\mu dx$ is named the Pearson divergence function, $\mathcal{I}_{0}=\int\|\nabla\log\frac{\rho}{\mu}\|^{2}\mu^{-2}dx$ is the $0$ –Fisher information and $dX_{0,t}=\sqrt{2\mu^{-1}(X_{0,t})}dB_{t}$ is the $0$ –drift diffusion process. Condition (1) becomes

$$\mu^{-1}\textrm{Ric}+\textrm{Hess}\mu^{-1}-\Delta\mu^{-1}\succeq\kappa,\qquad\kappa>0.$$

Under this condition, the distribution of drift diffusion process $X_{0,t}$ converges to $\mu$ and the generalized Log–Sobolev inequality (3) holds

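$$\frac{1}{2}\int\Big(\frac{\rho}{\mu}-1\Big)^{2}\mu\,dx\leq\frac{1}{2\kappa}\int\Big\|\nabla\log\frac{\rho}{\mu}\Big\|^{2}\mu^{-2}\,dx.$$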
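The convergence statement for $\gamma=0$ can be observed in a small simulation. The sketch below rests on assumptions that are not taken from the paper: $M$ is the circle $[0,2\pi)$, the density of $X_{0,t}$ is evolved through the forward equation $\partial_{t}\rho=\Delta(\rho/\mu)$ (the $L^{2}$ adjoint of the generator $\mu^{-1}\Delta$ ), and the discretization is a plain explicit finite-difference scheme with a hypothetical step size rule. It tracks the Pearson divergence along the flow.

```python
import numpy as np

def pearson(rho, mu, h):
    """Pearson divergence D_0(rho || mu) = (1/2) int (rho/mu - 1)^2 mu dx."""
    return 0.5 * float(np.sum((rho / mu - 1.0) ** 2 * mu) * h)

def laplacian_periodic(u, h):
    """Second-order centered Laplacian on a periodic grid."""
    return (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / h**2

def run_flow(rho0, mu, h, steps):
    """Explicit Euler steps for d rho / dt = Laplacian(rho / mu)."""
    dt = 0.25 * mu.min() * h**2  # small enough for the explicit scheme to be stable
    rho = rho0.copy()
    history = [pearson(rho, mu, h)]
    for _ in range(steps):
        rho = rho + dt * laplacian_periodic(rho / mu, h)
        history.append(pearson(rho, mu, h))
    return rho, np.array(history)

# Hypothetical reference and initial densities, normalized to mass 1.
n = 128
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
h = 2.0 * np.pi / n
mu = np.exp(0.5 * np.cos(x))
mu /= mu.sum() * h
rho0 = np.exp(0.3 * np.sin(2.0 * x))
rho0 /= rho0.sum() * h

rho_final, history = run_flow(rho0, mu, h, steps=20000)
```

The recorded values in `history` decrease monotonically and the total mass stays equal to one, which matches the picture of the flow as a descent direction for $\mathcal{D}_{0}$ . The sketch does not attempt to verify the specific rate $e^{-2\kappa t}$ .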

We also show a new integration identity, which is found in the proof of Theorem 1.

Theorem 2 (Generalized Yano’s formula).

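Let $\Phi\in C^{\infty}(M)$ . Then

$$\begin{split}&\int\mu^{-1}\Big(\nabla\cdot(\mu^{\gamma}\nabla\Phi)\Big)^{2}dx\\ =&\int\mu^{\gamma}\Big\{(\mu^{\gamma-1}\textrm{Ric}-\Delta\mu^{\gamma-1}-\frac{1}{\gamma-1}\textrm{Hess}\mu^{\gamma-1})(\nabla\Phi,\nabla\Phi)+\mu^{\gamma-1}\|\textrm{Hess}\Phi\|^{2}\\ &\qquad+\gamma(\gamma-1)\mu^{\gamma-1}\Big((\nabla\log\mu,\nabla\Phi)^{2}-\|\nabla\log\mu\|^{2}\|\nabla\Phi\|^{2}\Big)\Big\}dx.\end{split}$$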

Remark 1.

When $\mu(x)$ is a uniform measure, i.e. $\mu(x)=1$ , the above formula is the classical Yano’s formula [23]:

$$\int(\Delta\Phi)^{2}dx=\int\Big\{\textrm{Ric}(\nabla\Phi,\nabla\Phi)+\|\textrm{Hess}\Phi\|^{2}\Big\}dx.$$

When $\gamma=1$ , it is the generalized Yano’s formula studied in [5, 14]

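$$\int\mu^{-1}\Big(\nabla\cdot(\mu\nabla\Phi)\Big)^{2}dx=\int\mu\Big\{\textrm{Ric}(\nabla\Phi,\nabla\Phi)+\|\textrm{Hess}\Phi\|^{2}\Big\}dx.$$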

Our derivation extends these classical Yano’s formulas with general volume measure $\mu$ and its power $\gamma$ . For example, when $\gamma=0$ , we obtain

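$$\int\mu^{-1}(\Delta\Phi)^{2}dx=\int\Big\{(\mu^{-1}\textrm{Ric}-\Delta\mu^{-1}+\textrm{Hess}\mu^{-1})(\nabla\Phi,\nabla\Phi)+\mu^{-1}\|\textrm{Hess}\Phi\|^{2}\Big\}dx.$$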
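The generalized Yano's formula of Theorem 2 can be checked numerically on a flat example. The sketch below assumes the flat two-dimensional torus $[0,2\pi)^{2}$ (so $\textrm{Ric}=0$ ), Fourier spectral derivatives, and hypothetical smooth choices of $\mu$ and $\Phi$ ; the function names are illustrative. It evaluates both sides of the identity for several values of $\gamma\neq 1$ .

```python
import numpy as np

n = 64
grid = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(grid, grid, indexing="ij")
k = np.fft.fftfreq(n, d=1.0 / n)          # integer wavenumbers
KX, KY = np.meshgrid(k, k, indexing="ij")

def ddx(u):
    """Spectral derivative in the first coordinate."""
    return np.real(np.fft.ifft2(1j * KX * np.fft.fft2(u)))

def ddy(u):
    """Spectral derivative in the second coordinate."""
    return np.real(np.fft.ifft2(1j * KY * np.fft.fft2(u)))

def integral(u):
    """Integral over the torus (spectrally accurate for smooth periodic data)."""
    return float(u.mean() * (2.0 * np.pi) ** 2)

def yano_sides(mu, phi, gamma):
    """Return (left-hand side, right-hand side) of the generalized Yano's formula, Ric = 0."""
    px, py = ddx(phi), ddy(phi)
    pxx, pxy, pyy = ddx(px), ddy(px), ddy(py)
    a = mu ** gamma
    lhs = integral((ddx(a * px) + ddy(a * py)) ** 2 / mu)
    w = mu ** (gamma - 1.0)
    wx, wy = ddx(w), ddy(w)
    wxx, wxy, wyy = ddx(wx), ddy(wx), ddy(wy)
    lx, ly = ddx(np.log(mu)), ddy(np.log(mu))
    grad_sq = px**2 + py**2
    hess_w = wxx * px**2 + 2.0 * wxy * px * py + wyy * py**2   # Hess(mu^(gamma-1))(grad Phi, grad Phi)
    hess_phi_sq = pxx**2 + 2.0 * pxy**2 + pyy**2               # squared Frobenius norm of Hess Phi
    cross = (lx * px + ly * py) ** 2 - (lx**2 + ly**2) * grad_sq
    integrand = a * (-(wxx + wyy) * grad_sq - hess_w / (gamma - 1.0)
                     + w * hess_phi_sq + gamma * (gamma - 1.0) * w * cross)
    return lhs, integral(integrand)

# Hypothetical smooth positive density and test function.
mu = np.exp(0.3 * np.cos(X) + 0.2 * np.sin(Y))
mu /= integral(mu)
phi = np.sin(X) * np.cos(2.0 * Y) + 0.5 * np.cos(X + Y)

sides = {g: yano_sides(mu, phi, g) for g in (0.0, 0.5, 2.0)}
```

The value $\gamma=1$ is excluded here only because of the factor $\frac{1}{\gamma-1}$ ; that case is covered by the limiting formula stated in Remark 1.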

Later on, using the generalized Yano’s formula, we prove the following Poincaré inequality.

Corollary 3 (Generalized Poincaré inequality).

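If there exists a constant $\lambda>0$ , such that when $\gamma\in[0,1]$ ,

$$\mu^{\gamma-1}\textrm{Ric}-\Delta\mu^{\gamma-1}-\frac{1}{\gamma-1}\textrm{Hess}\mu^{\gamma-1}\succeq\lambda,$$

or when $\gamma\in[1,\infty)\cup(-\infty,0]$ ,

$$\mu^{\gamma-1}\textrm{Ric}-\Delta\mu^{\gamma-1}-\frac{1}{\gamma-1}\textrm{Hess}\mu^{\gamma-1}-\gamma(\gamma-1)\|\nabla\log\mu\|^{2}\mu^{\gamma-1}\succeq\lambda,$$

then

$$\int f^{2}\mu\,dx\leq\frac{1}{\lambda}\int\|\nabla f\|^{2}\mu^{\gamma}\,dx,\tag{5}$$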

for any $f\in C^{\infty}(M)$ with $\int f\mu dx=0$ .

Remark 2.

When $\gamma=1$ , Corollary 3 recovers the classical Poincaré inequality

$$\int f^{2}\mu\,dx\leq\frac{1}{\lambda}\int\|\nabla f\|^{2}\mu\,dx.$$

Remark 3.

Here we derive the formulation of the generalized Poincaré inequality by linearizing the generalized Log-Sobolev inequality. In detail, denote $\rho=\mu+\epsilon h$ , where $h=f\mu$ and $\int hdx=0$ . The L.H.S. of (5) comes from the Hessian metric tensor for the $\gamma$ –divergence $\mathcal{D}_{\gamma}(\rho\|\mu)$ :

$$\mathcal{D}_{\gamma}(\mu+\epsilon h\|\mu)=\frac{\epsilon^{2}}{2}\int f^{2}\mu\,dx+o(\epsilon^{2}).$$

The R.H.S. of (5) is the second order approximation in terms of $\epsilon$ for the $\gamma$ –relative Fisher information:

$$\mathcal{I}_{\gamma}(\mu+\epsilon h\|\mu)=\epsilon^{2}\int(\nabla f)^{2}\mu^{\gamma}\,dx+o(\epsilon^{2}).$$
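The second-order expansion of the $\gamma$ –divergence in Remark 3 can be observed numerically. The sketch below is only an illustration under assumptions not in the paper: a uniform grid on the circle $[0,2\pi)$ , a hypothetical reference density $\mu$ , and a hypothetical perturbation $f$ with $\int f\mu\,dx=0$ . It compares $\mathcal{D}_{\gamma}(\mu+\epsilon f\mu\|\mu)/\epsilon^{2}$ with $\frac{1}{2}\int f^{2}\mu\,dx$ for several $\gamma$ .

```python
import numpy as np

def f_gamma(r, gamma):
    """Generator of the gamma-divergence at the ratio r = rho / mu."""
    r = np.asarray(r, dtype=float)
    if gamma == 1:
        return r * np.log(r)
    if gamma == 2:
        return -np.log(r)
    return (r ** (2.0 - gamma) - 1.0) / ((1.0 - gamma) * (2.0 - gamma))

def gamma_divergence(rho, mu, dx, gamma):
    """Riemann-sum approximation of D_gamma(rho || mu)."""
    return float(np.sum(f_gamma(rho / mu, gamma) * mu) * dx)

n = 1024
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = 2.0 * np.pi / n
mu = np.exp(0.5 * np.cos(x))
mu /= mu.sum() * dx

# Perturbation direction h = f * mu with int f mu dx = 0, so that int h dx = 0.
f = np.cos(x) + 0.5 * np.sin(3.0 * x)
f -= np.sum(f * mu) * dx

eps = 1e-3
target = 0.5 * float(np.sum(f**2 * mu) * dx)
ratios = {g: gamma_divergence(mu * (1.0 + eps * f), mu, dx, g) / eps**2
          for g in (0.0, 0.5, 1, 2, 3.0)}
```

Every entry of `ratios` should be close to `target`, independently of $\gamma$ , because the generator $f$ of the divergence satisfies $f(1)=0$ and $f''(1)=1$ for each branch.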

Example 3 (Reverse Kullback–Leibler divergence).

Consider $\gamma=2$ , $f(\rho)=-\log\rho$ . Here $\mathcal{D}_{2}(\rho\|\mu)=-\int\mu\log\frac{\rho}{\mu}dx$ is named reverse Kullback–Leibler divergence function or Cross entropy. In this case, the condition in Corollary 3 forms

$$\mu\textrm{Ric}-\Delta\mu-\textrm{Hess}\mu-2\|\nabla\log\mu\|^{2}\mu\succeq\lambda,\qquad\lambda>0.$$

Under this condition, the generalized Poincaré inequality holds

$$\int f^{2}\mu\,dx\leq\frac{1}{\lambda}\int\|\nabla f\|^{2}\mu^{2}\,dx,$$

where $f\in C^{\infty}(M)$ and $\int f\mu dx=0$ . Again, if $M$ is Ricci flat, i.e. $\textrm{Ric}=0$ , then the condition in Corollary 3 forms $-\Delta\mu-\textrm{Hess}\mu-2\|\nabla\log\mu\|^{2}\mu\succeq\lambda$ .

Last, notice the fact that when $\gamma=0$ , the $\gamma$ -Wasserstein distance is exactly the $H^{-1}$ distance:

$$\mathcal{W}_{0}(\rho,\mu)=H^{-1}(\rho,\mu),$$

where $H^{-1}$ is the negative Sobolev distance between $\rho$ and $\mu$ , i.e.

$$H^{-1}(\rho,\mu)=\sqrt{\int\big(\rho-\mu,\Delta^{-1}(\rho-\mu)\big)dx}.$$
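On a periodic domain the $H^{-1}$ distance is easy to compute in Fourier space. The sketch below assumes the circle of length $2\pi$ and uses the positive operator $(-\Delta)^{-1}$ on mean-zero functions, so that the quantity under the square root is nonnegative; this sign convention and the function name `h_minus_one` are assumptions made for the illustration.

```python
import numpy as np

def h_minus_one(rho, mu, length=2.0 * np.pi):
    """H^{-1} distance between two densities sampled on a uniform periodic grid."""
    n = rho.size
    dx = length / n
    h_hat = np.fft.fft(rho - mu) * dx              # approximates int h(x) exp(-i k x) dx
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)      # angular wavenumbers
    nonzero = k != 0                               # rho - mu has zero mean, so k = 0 is dropped
    # Parseval: int h (-Laplacian)^{-1} h dx = (1 / length) * sum_k |h_hat_k|^2 / k^2
    value = np.sum(np.abs(h_hat[nonzero]) ** 2 / k[nonzero] ** 2) / length
    return float(np.sqrt(value))

# Hypothetical example: uniform reference density and a single Fourier mode perturbation.
n = 256
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
mu = np.full(n, 1.0 / (2.0 * np.pi))
rho = mu + 0.05 * np.cos(3.0 * x)
distance = h_minus_one(rho, mu)
```

For a perturbation $\epsilon\cos(kx)$ the exact value is $\epsilon\sqrt{\pi}/k$ , which gives a quick check of the normalization.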

We next show an inequality among Pearson divergence (P), $H^{-1}$ metric and $0$ –Fisher information (I), named $PH^{-1}I$ inequality.

Theorem 4 (Inequalities for $H^{-1}$ metric).

Suppose $\mu^{-1}\textrm{Ric}+\textrm{Hess}\mu^{-1}-\Delta\mu^{-1}\succeq\kappa$ , where $\kappa\in\mathbb{R}$ , then the $PH^{-1}I$ inequality holds

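$$\mathcal{D}_{0}(\rho\|\mu)\leq\sqrt{\mathcal{I}_{0}(\rho\|\mu)}\,H^{-1}(\rho,\mu)-\frac{\kappa}{2}H^{-1}(\rho,\mu)^{2}.$$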

In addition, if $\kappa\geq 0$ , then the $H^{-1}$ -Talagrand inequality holds

[TABLE]

Remark 4.

The P $H^{-1}$ I inequality is an analog of inequalities among $\mathcal{D}_{1}$ (H), Wasserstein-2 metric and $1$ –Fisher information, known as HWI inequality; see details in [20].

Remark 5.

If $\kappa>0$ , the P $H^{-1}$ I inequality shows

[TABLE]

In addition, using the fact that $H^{-1}(\rho,\mu)\leq\sqrt{\frac{2\mathcal{D}_{0}(\rho\|\mu)}{\kappa}}$ and $\mathcal{D}_{0}(\rho\|\mu)\leq\frac{1}{2\kappa}\mathcal{I}_{0}(\rho\|\mu)$ , we have

[TABLE]

In the next sections, we apply geometric tools in probability space to prove the above inequalities and integral formulas.

3. Generalized Density manifold

In this section, we introduce the main tool used to prove the above results. We first review a class of Riemannian metrics in probability space, induced by the $\gamma$ –Wasserstein distance. We then present its Riemannian calculus, including gradient and Hessian operators. By using gradient operators in this metric, we connect the $\gamma$ –divergence, the $\gamma$ –Fisher information and the $\gamma$ –drift diffusion process.

3.1. Density manifold and its Riemannian calculus

Consider the set of smooth and strictly positive densities

[TABLE]

The tangent space of $\mathcal{P}$ at $\rho\in\mathcal{P}$ is given by

[TABLE]

Consider the $\gamma$ –Wasserstein metric tensor in the probability space as follows.

Definition 5.

The inner product $g_{\rho}\colon{T_{\rho}}\mathcal{P}\times{T_{\rho}}\mathcal{P}\rightarrow\mathbb{R}$ is defined as for any $\sigma_{1}$ and $\sigma_{2}\in T_{\rho}\mathcal{P}$ :

[TABLE]

where $\Delta_{\rho^{\gamma}}=\nabla\cdot(\rho^{\gamma}\nabla)$ is a weighted elliptic operator. In addition, denote $\Phi_{1}$ , $\Phi_{2}\in C^{\infty}(M)/\mathbb{R}=T^{*}_{\rho}\mathcal{P}$ , with $\sigma_{i}=-\Delta_{\rho^{\gamma}}\Phi_{i}$ , $i=1,2$ , then

[TABLE]

Remark 6.

An observation is that if $\gamma=0$ , the proposed $\mathcal{W}_{0}$ metric is the $H^{-1}$ metric [8]. If $\gamma=1$ , the proposed $\mathcal{W}_{1}$ metric is the $L^{2}$ -Wasserstein metric [13, 19].
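The metric can be made concrete in one dimension. The sketch below assumes the circle $[0,2\pi)$ and evaluates the squared norm $\int\|\nabla\Phi\|^{2}\rho^{\gamma}dx$ of a tangent vector $\sigma=-\nabla\cdot(\rho^{\gamma}\nabla\Phi)$ , which is the integrand appearing in the definition of $\mathcal{W}_{\gamma}$ . In one dimension the flux $\rho^{\gamma}\Phi'$ is a primitive of $-\sigma$ up to a constant fixed by periodicity of $\Phi$ . The function name and the midpoint discretization are choices made for this illustration, not the paper's.

```python
import numpy as np

def metric_norm_sq(sigma, rho, gamma, h):
    """Approximate int |Phi'|^2 rho^gamma dx, where -(rho^gamma Phi')' = sigma on the circle."""
    w = rho ** gamma
    w_half = 0.5 * (w + np.roll(w, -1))    # mobility rho^gamma at the midpoints x_i + h/2
    S = np.cumsum(sigma) * h               # primitive of sigma, located at the same midpoints
    # Periodicity of Phi requires int Phi' dx = int (c - S) / rho^gamma dx = 0, which fixes c.
    c = np.sum(S / w_half) / np.sum(1.0 / w_half)
    flux = c - S                           # flux = rho^gamma * Phi'
    return float(np.sum(flux**2 / w_half) * h)

n = 4096
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
h = 2.0 * np.pi / n
sigma = np.cos(x)                          # tangent vector: integrates to zero

uniform = np.full(n, 1.0 / (2.0 * np.pi))
g0 = metric_norm_sq(sigma, uniform, gamma=0.0, h=h)   # H^{-1} case
g1 = metric_norm_sq(sigma, uniform, gamma=1.0, h=h)   # L^2-Wasserstein case

# A hypothetical non-uniform density, to show that the norm depends on rho when gamma != 0.
rho = np.exp(0.5 * np.cos(x))
rho /= rho.sum() * h
g1_nonuniform = metric_norm_sq(sigma, rho, gamma=1.0, h=h)
g0_nonuniform = metric_norm_sq(sigma, rho, gamma=0.0, h=h)
```

For $\gamma=0$ the result does not depend on $\rho$ and equals the squared $H^{-1}$ norm of $\sigma$ (here $\pi$ ), consistent with Remark 6. For uniform $\rho=\frac{1}{2\pi}$ and $\gamma=1$ the value is $2\pi^{2}$ .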

We note that the characterization of geodesics in $(\mathcal{P},g)$ has been studied in [4, 8]. In this paper, we focus on the Riemannian calculus for density manifold $(\mathcal{P},g)$ , using both $(\rho,\sigma)$ in tangent bundle and $(\rho,\Phi)$ in cotangent bundle.

Proposition 6.

The Christoffel symbol $\Gamma_{\rho}\colon T_{\rho}\mathcal{P}\times T_{\rho}\mathcal{P}\rightarrow T_{\rho}\mathcal{P}$ in $(\mathcal{P},g)$ satisfies

[TABLE]

where $\sigma_{i}=-\Delta_{\rho^{\gamma}}\Phi_{i}$ , $i=1,2$ , and

[TABLE]

Proof.

The proof follows the study in [15]. We derive the Christoffel symbol by using the Lagrangian formulation of geodesics. Consider the minimization of the geometric action functional in density space

[TABLE]

where $\rho_{t}=\rho(t,x)$ is a density path with fixed boundary points $\rho_{0}$ , $\rho_{1}$ . The geodesics in $(\mathcal{P},g)$ satisfies the Euler-Lagrange equation

[TABLE]

i.e.

[TABLE]

where $C(t)$ is a function depending only on $t$ . Using the fact that

[TABLE]

then equation (6) forms

[TABLE]

By multiplying both sides by $\Delta_{\rho_{t}^{\gamma}}$ and comparing with the geodesics equation,

[TABLE]

we derive the Christoffel symbol. Similarly, we can formulate the Christoffel symbol (raised Christoffel symbol) in terms of $\Phi$ . ∎

Proposition 7.

The geodesics equation in $(\mathcal{P},g)$ satisfies

[TABLE]

Denote Legendre transform $\Phi_{t}=(-\Delta_{\rho_{t}^{\gamma}})^{-1}\partial_{t}\rho_{t}$ , then the co-geodesics equation satisfies

[TABLE]

Proof.

The geodesics equation follows $\partial_{tt}\rho_{t}+\Gamma_{\rho_{t}}(\partial_{t}\rho_{t},\partial_{t}\rho_{t})=0$ . We next demonstrate the Hamiltonian formulation of geodesics flow. Consider the Legendre transform in $(\mathcal{P},g)$ :

[TABLE]

Then $\Phi_{t}=-\Delta_{\rho_{t}^{\gamma}}^{-1}\partial_{t}\rho_{t}$ , and

[TABLE]

Then the co-geodesic flow satisfies

[TABLE]

which is the equation pair (7). ∎

For completeness of this paper, we also present the Lagrangian coordinates of geodesics in generalized density manifold.

Proposition 8 (Lagrangian coordinates).

Denote $\rho_{t}=X_{t}\#\rho^{0}$ , where $X_{t}\colon M\rightarrow M$ is the diffeomorphism and $\#$ is the push-forward operator. Then the Lagrangian coordinates of geodesic equation (7) satisfies

[TABLE]

Remark 7.

Here we present three examples of geodesics in Lagrangian coordinates for generalized density manifold.

(i)

If $\gamma=1$ , the $\mathcal{W}_{1}$ geodesic equation forms

[TABLE]

which is a well-known result in optimal transport.

(ii)

If $\gamma=2$ , the $\mathcal{W}_{2}$ geodesic equation satisfies

[TABLE]

(iii)

If $\gamma=0$ , the $\mathcal{W}_{0}$ , i.e. $H^{-1}$ , geodesic equation satisfies

[TABLE]

Proof.

Denote

[TABLE]

where $(\rho_{t},\Phi_{t})$ satisfies (7). Then

[TABLE]

Notice the fact that

[TABLE]

where the equality holds since $\partial_{t}\rho+\nabla\cdot(\rho^{\gamma}\nabla\Phi)=0$ . Substituting the above and $\partial_{t}\Phi+\frac{\gamma}{2}\|\nabla\Phi\|^{2}\rho^{\gamma-1}=0$ into (8), we obtain

[TABLE]

where the second-to-last equality uses the facts $(\gamma-1)\rho^{\gamma-1}\nabla\log\rho=\nabla\rho^{\gamma-1}$ and $\frac{d}{dt}X_{t}=\rho^{\gamma-1}\nabla\Phi$ . ∎

Proposition 9.

Consider a functional $\mathcal{F}\colon\mathcal{P}\rightarrow\mathbb{R}$ .

(i)

The Riemannian gradient operator of $\mathcal{F}$ in $(\mathcal{P},g)$ satisfies

[TABLE]

where $\delta$ is the $L^{2}$ first variation. The squared norm of gradient operator forms

[TABLE]

(ii)

The Riemannian Hessian operator of $\mathcal{F}$ in $(\mathcal{P},g)$ satisfies

[TABLE]

where $\sigma_{i}=-\Delta_{\rho^{\gamma}}\Phi_{i}$ , $i=1,2$ , and $\delta^{2}$ is the $L^{2}$ second variation operator.

Remark 8.

Several interesting examples of Hessian operators of $\mathcal{F}$ have been studied in [4], including linear and interaction potential energies.

Proof.

(i) The Riemannian gradient operator satisfies

[TABLE]

Then

[TABLE]

(ii) The Riemannian Hessian operator satisfies

[TABLE]

We next formulate the terms $h_{1}$ , $h_{2}$ separately. Notice the fact that

[TABLE]

where the second and third equalities are shown by integration by parts with respect to $x$ , $y$ . In addition, we estimate three terms in (h2).

[TABLE]

where the first and second equalities hold by integration by parts with respect to $x$ . Similarly,

[TABLE]

And

[TABLE]

Combining the above three terms in (h2), we finish the proof. ∎

3.2. Gradient systems and $\gamma$ –drift diffusion process

In this subsection, we present the relation among Riemannian gradient operators in $(\mathcal{P},g)$ , the $\gamma$ –divergence functional, the $\gamma$ –Fisher information and the $\gamma$ –drift diffusion process; see details in [17].

Given a $\gamma$ –divergence functional, the Kolmogorov forward operator of the $\gamma$ –drift diffusion process is the negative gradient descent direction in $(\mathcal{P},g)$ . Moreover, the squared gradient norm of the $\gamma$ –divergence functional in $(\mathcal{P},g)$ is the $\gamma$ –Fisher information functional.

Lemma 10.

The following statements hold.

(i)

[TABLE]

where $L^{*}_{\gamma}$ is the adjoint operator of $L_{\gamma}$ in $L^{2}(\rho)$ .

(ii)

[TABLE]

Proof.

We first prove (i). On the one hand, the Kolmogorov forward operator forms

[TABLE]

We need to show

[TABLE]

Notice the fact

[TABLE]

where the second equality uses the fact $\nabla\cdot(\mu^{\gamma-1}\nabla\Phi)=(\nabla\mu^{\gamma-1},\nabla\Phi)+\mu^{\gamma-1}\Delta\Phi$ and the fourth equality applies the fact $\nabla\frac{\rho}{\mu}=\mu^{-1}\nabla\rho-\mu^{-2}\rho\nabla\mu$ . On the other hand, the negative gradient operator of $\mathcal{D}_{\gamma}(\rho\|\mu)$ in $(\mathcal{P},g)$ satisfies

[TABLE]

Comparing the above, we finish the proof.

We next prove (ii). Notice the fact that

[TABLE]

where the second equality uses the fact that $\frac{1}{1-\gamma}\nabla(\frac{\rho}{\mu})^{1-\gamma}=(\frac{\rho}{\mu})^{-\gamma}\nabla\frac{\rho}{\mu}=(\frac{\rho}{\mu})^{1-\gamma}\nabla\log\frac{\rho}{\mu}$ .

∎

Shortly, we shall apply the above two geometric relations to give sufficient conditions for hypercontractivity of $\gamma$ –diffusion process and generalized functional inequalities.

4. Proof

Here we present the proof of generalized diffusion hypercontractivity using the geometric tools built in the previous section.

4.1. Sketch of proof

Consider the gradient flow of $\gamma$ –divergence functional

[TABLE]

where $\rho_{t}$ is the probability density function at time $t$ . Then the first time derivative of $\gamma$ –divergence along the gradient flow forms

[TABLE]

And the second time derivative of $\gamma$ –divergence becomes

[TABLE]

If we can bound the ratio between the first and second derivative, i.e.

[TABLE]

we prove Theorem 1. Indeed, integrating (10) on both sides over $[t,\infty)$ gives

[TABLE]

By Grönwall’s inequality, we obtain the hypercontractivity of $\gamma$ –drift diffusion process

[TABLE]

In addition, notice the fact that $\frac{d}{dt}\mathcal{D}_{\gamma}(\rho_{t}\|\mu)=-\mathcal{I}_{\gamma}(\rho_{t}\|\mu)$ , then inequality (11) forms

[TABLE]

By choosing $t=0$ with arbitrary $\rho_{0}\in\mathcal{P}$ , the Log-Sobolev inequality (3) is proven.
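The decay produced by the Grönwall step can be illustrated in the classical case $\gamma=1$ with an explicit example. The sketch below assumes the Ornstein–Uhlenbeck process $dX_{t}=-X_{t}dt+\sqrt{2}dB_{t}$ on $\mathbb{R}$ with $\mu=N(0,1)$ and $\kappa=1$ , which lies outside the compact setting of this paper and is used only because Gaussian initial data stay Gaussian with known mean and variance. The function names are hypothetical.

```python
import math

def kl_to_standard_normal(mean, var):
    """KL( N(mean, var) || N(0, 1) ) in closed form."""
    return 0.5 * (var + mean * mean - 1.0 - math.log(var))

def ou_moments(mean0, var0, t):
    """Mean and variance at time t of dX = -X dt + sqrt(2) dB started from N(mean0, var0)."""
    decay = math.exp(-t)
    return mean0 * decay, 1.0 + (var0 - 1.0) * decay**2

def decay_table(mean0, var0, times, kappa=1.0):
    """List of (t, KL at time t, exp(-2 kappa t) * KL at time 0)."""
    kl0 = kl_to_standard_normal(mean0, var0)
    rows = []
    for t in times:
        mean_t, var_t = ou_moments(mean0, var0, t)
        rows.append((t, kl_to_standard_normal(mean_t, var_t), math.exp(-2.0 * kappa * t) * kl0))
    return rows

# Hypothetical initial data: one wide and one narrow Gaussian.
times = [0.0, 0.1, 0.5, 1.0, 2.0, 5.0]
table_wide = decay_table(mean0=1.5, var0=4.0, times=times)
table_narrow = decay_table(mean0=-0.5, var0=0.2, times=times)
```

In every row the relative entropy at time $t$ stays below $e^{-2\kappa t}$ times its initial value, which is the exponential bound obtained from Grönwall's inequality.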

From the above arguments, the proof boils down to estimating the ratio in (10). Here the formulation (10) is equivalent to

[TABLE]

Next, our goal is to derive the Hessian operators of $\gamma$ –divergence in $(\mathcal{P},g)$ .

4.2. Hessian operator estimation

Lemma 11 (Hessian of $\gamma$ –divergence in $(\mathcal{P},g)$ ).

Denote $\sigma=-\Delta_{\rho^{\gamma}}\Phi$ , then

[TABLE]

Remark 9.

In fact, there are several interesting cases for Hessian operators of $\gamma$ –divergence in density manifold. Denote $\sigma=-\Delta_{\rho^{\gamma}}\Phi$ .

(i)

If $\gamma=1$ , then

[TABLE]

(ii)

If $\mu=1$ is a uniform measure [4], then

[TABLE]

(iii)

If $\gamma=0$ , then

[TABLE]

Proof.

From Proposition 9, we can compute the Hessian operator of the $\gamma$ –divergence directly. For readers who are not familiar with geometric computations, the following direct method is also given. The Hessian operator is given by taking the second order time derivative of $\mathcal{D}_{\gamma}(\rho\|\mu)$ along the co-geodesics flow (7). Consider the first order time derivative

[TABLE]

And the second order time derivative satisfies

[TABLE]

We estimate (a), (b) and (c) separately.

For (a), we denote $\delta^{2}\mathcal{D}(\rho)=\frac{\partial^{2}}{\partial\rho\partial\rho}(f(\frac{\rho}{\mu})\mu)(x)$ . Then

[TABLE]

We next estimate (b).

[TABLE]

Here we derive (b1), (b2), (b3), (b4) more explicitly. Notice the fact that

[TABLE]

and

[TABLE]

In addition,

[TABLE]

and

[TABLE]

We last derive (c).

[TABLE]

We estimate (c1), (c2), (c3), (c4) explicitly. Notice the fact that

[TABLE]

and

[TABLE]

In addition,

[TABLE]

We now summarize all the formulas.

[TABLE]

Notice the fact that

[TABLE]

where the last equality is from Bochner’s formula, i.e.

[TABLE]

By substituting (d) into (12), we obtain

[TABLE]

We lastly reformulate the term (e):

[TABLE]

Notice that (e1) has a formulation

[TABLE]

where the second equality holds by the fact that $\textrm{Hess}\Phi\nabla\Phi=\nabla\nabla\Phi\nabla\Phi=\frac{1}{2}\nabla(\nabla\Phi)^{2}$ .

Substituting (e1) into (e), we obtain

[TABLE]

Substituting the above formula into (13), we derive

[TABLE]

∎

We observe that the Hessian operator in $(\mathcal{P},g)$ is more complicated than the one with $\gamma=1$ , since when $\gamma=1$ there is no interaction bilinear term between the Hessian operator and the squared gradient norm. We overcome this by the following estimates. Denote the bilinear form:

[TABLE]

Lemma 12.

Denote $\delta\mathcal{D}_{\gamma}(\rho\|\mu)=\frac{1}{1-\gamma}(\frac{\rho}{\mu})^{1-\gamma}$ , then for any $\rho\in\mathcal{P}$ ,

[TABLE]

Proof.

The proof is based on an estimation for the bilinear form $J$ . Notice

[TABLE]

Then

[TABLE]

and

[TABLE]

Denote $\nabla\log\frac{\rho}{\mu}=a$ , $\nabla\log\mu=a_{0}$ , then $\nabla\log\rho=a+a_{0}$ , and thus

[TABLE]

We further denote $\cos\theta=\frac{(a_{0},a)}{\|a\|\|a_{0}\|}$ , then

[TABLE]

which finishes the proof. ∎

4.3. Proof

Proof of Theorem 1.

Firstly, following Lemma 11 and Lemma 12, we prove that condition (1) implies both the convergence result (10) and the functional inequality (3).

Secondly, the generalized Talagrand inequality (4) follows directly from the gradient flow interpolation of inequality in Proposition $1$ of [20]. For completeness of this paper, we present it here. Consider the real value function

[TABLE]

where $\rho_{t}=\rho(t,\cdot)$ is the density function at time $t$ . Notice that $\Psi(0)=\mathcal{W}(\rho_{0},\mu)$ and $\lim_{t\rightarrow\infty}\Psi(t)=\sqrt{\frac{2\mathcal{D}_{\gamma}(\rho_{t}\|\mu)}{\kappa}}$ , since $\mathcal{D}_{\gamma}(\rho_{t}\|\mu)\rightarrow 0$ following (10).

We next claim $\frac{d}{dt}\Psi(t)\leq 0$ . If so, we finish the proof. To prove it, we show that

[TABLE]

Notice the fact that

[TABLE]

and along the gradient flow $\partial_{t}\rho=-\textrm{grad}_{g}\mathcal{D}_{\gamma}(\rho\|\mu)$ ,

[TABLE]

In addition

[TABLE]

Thus

[TABLE]

which finishes the proof. ∎

Proof of Theorem 2.

We prove the equality by using the Hessian operator of $\mathcal{D}_{\gamma}(\rho\|\mu)$ in $(\mathcal{P},g)$ at the point $\rho=\mu$ . Notice that for any $\sigma\in T_{\rho}\mathcal{P}$ , then

[TABLE]

Following the Hessian operator formula in (9), denote $\sigma=-\nabla\cdot(\rho^{\gamma}\nabla\Phi)$ , then

[TABLE]

where the second equality uses the fact that $\Gamma_{\rho}(\sigma,\sigma)\in T_{\rho}\mathcal{P}$ and (14). Comparing the above with the formula at $\rho=\mu$ in Lemma 11, we prove the equality. ∎

Proof of Corollary 3.

We first prove the following claim.

Claim:

[TABLE]

Proof of Claim.

The first equality holds by the definition of the Hessian operator at $\rho=\mu$, as in the proof of Theorem 2. We next focus on the second equality. Denote $\sigma_{1}=-\nabla\cdot(\mu^{\gamma}\nabla\Phi)$. Then the minimization in the second equality of (15) takes the form

[TABLE]

The minimizer of the above minimization satisfies the following eigenvalue problem

[TABLE]

i.e.

[TABLE]

In other words, $\lambda_{1}=\lambda_{\textrm{min}}(-\Delta_{\mu^{\gamma}}\frac{1}{\mu})$ , where $\lambda_{\min}$ represents the smallest non-zero eigenvalue.

On the other hand, denote $\sigma_{2}=f\mu$; then the minimization in the third equality of (15) takes the form

[TABLE]

Similarly, the minimizer of the above minimization satisfies the following eigenvalue problem

[TABLE]

i.e.

[TABLE]

Thus $\lambda_{2}=\lambda_{\textrm{min}}(-\Delta_{\mu^{\gamma}}\frac{1}{\mu})$ . From the above, we have $\lambda_{1}=\lambda_{2}$ , which finishes the proof of claim. ∎

From the above claim, the smallest eigenvalue of Hessian operator of $\mathcal{D}_{\gamma}$ in $(\mathcal{P},g)$ at $\rho=\mu$ is precisely the lower bound for the Poincaré inequality. Here from the generalized Yano’s formula, we have

[TABLE]

where

[TABLE]

Thus

[TABLE]

From the above, we can estimate the smallest eigenvalue of Hessian operator, which finishes the proof. ∎
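The eigenvalue problem in the claim above can be explored numerically. The following is a minimal sketch, not part of the paper: it assumes that $\Delta_{\mu^{\gamma}}$ denotes the weighted Laplacian $\nabla\cdot(\mu^{\gamma}\nabla)$, so that with $\sigma=f\mu$ the problem reads $-\nabla\cdot(\mu^{\gamma}\nabla f)=\lambda\mu f$, and it discretizes this on a periodic one-dimensional grid. The function name `poincare_constant` and the finite-difference scheme are hypothetical choices made for illustration.

```python
import numpy as np

def poincare_constant(mu, gamma, length=1.0):
    """Smallest non-zero eigenvalue of -d/dx(mu^gamma df/dx) = lam * mu * f
    on a periodic uniform grid (illustrative 1D discretization)."""
    mu = np.asarray(mu, dtype=float)
    n = mu.size
    h = length / n
    # Mobility mu^gamma on the edge between nodes i and i+1 (periodic).
    w = (0.5 * (mu + np.roll(mu, -1))) ** gamma
    # Stiffness matrix K with f^T K f = sum_i w_i (f_{i+1} - f_i)^2 / h^2.
    K = np.zeros((n, n))
    for i in range(n):
        j = (i + 1) % n
        K[i, i] += w[i]
        K[j, j] += w[i]
        K[i, j] -= w[i]
        K[j, i] -= w[i]
    K /= h ** 2
    # Turn the generalized problem K f = lam * diag(mu) f into a symmetric
    # standard problem with the same eigenvalues.
    s = 1.0 / np.sqrt(mu)
    A = s[:, None] * K * s[None, :]
    eigs = np.linalg.eigvalsh(A)  # returned in ascending order
    return eigs[1]  # eigs[0] is ~0 and corresponds to constant f

# Example: a positive density on the unit circle, for several gamma.
x = np.arange(200) / 200.0
mu = 1.0 + 0.5 * np.cos(2 * np.pi * x)
for gamma in (0.0, 0.5, 1.0):
    print(gamma, poincare_constant(mu, gamma))
```

For a constant density $\mu\equiv 1$ on the unit circle, the value returned is close to $4\pi^{2}$ for every $\gamma$, which is the first non-zero eigenvalue of $-\Delta$ on the circle.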

Proof of Theorem 4.

We first prove the $PH^{-1}I$ inequality. Let $\rho_{t}$ be a geodesic curve of least energy in $\mathcal{P}$ with the $H^{-1}$ metric, where $\rho_{0}=\mu$ and $\rho_{1}=\rho$. Then from Proposition 7, $\partial_{tt}\rho_{t}=0$, i.e. $\rho_{t}=(1-t)\rho_{0}+t\rho_{1}$. Thus

[TABLE]

By taking the Taylor expansion of $\mathcal{D}_{0}(\rho\|\mu)$ in $(\mathcal{P},H^{-1})$ at $\rho=\mu$ , we obtain

[TABLE]

where $\mathcal{D}_{0}(\mu\|\mu)=0$ . From the Cauchy-Schwarz inequality, we have

[TABLE]

In addition, the condition $\mu^{-1}\textrm{Ric}+\textrm{Hess}\mu^{-1}-\Delta\mu^{-1}\succeq\kappa$ implies $\textrm{Hess}_{H^{-1}}\mathcal{D}_{0}(\rho\|\mu)(\rho-\mu,\rho-\mu)\geq\kappa(\rho-\mu,\rho-\mu)_{H^{-1}}$ , thus

[TABLE]

Substituting (17) and (18) into (16), we prove the $PH^{-1}I$ inequality. In addition, the $H^{-1}$ -Talagrand inequality follows directly from Theorem 1. ∎
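The $H^{-1}$ geometry used in the proof of Theorem 4 can be illustrated numerically. The sketch below is an assumption-laden example rather than the paper's construction: it takes $M$ to be the flat unit circle, uses the standard squared norm $\|\sigma\|_{H^{-1}}^{2}=\int\sigma(-\Delta)^{-1}\sigma\,dx$ for mean-zero $\sigma$, and evaluates it with the FFT. The helper name `h_minus_one_sq` is hypothetical.

```python
import numpy as np

def h_minus_one_sq(sigma, length=1.0):
    """Squared H^{-1} norm, int sigma * (-Laplacian)^{-1} sigma dx, of a
    mean-zero periodic function sampled on a uniform grid."""
    sigma = np.asarray(sigma, dtype=float)
    n = sigma.size
    h = length / n
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=h)  # angular wave numbers
    s_hat = np.fft.fft(sigma)
    inv = np.zeros(n)
    inv[1:] = 1.0 / k[1:] ** 2  # invert -Laplacian on non-zero modes only
    phi = np.real(np.fft.ifft(inv * s_hat))  # phi solves -phi'' = sigma
    return h * np.sum(sigma * phi)

# Along the linear geodesic rho_t = (1 - t) * mu + t * rho, the H^{-1}
# distance to mu grows linearly in t.
x = np.arange(256) / 256.0
mu = np.ones_like(x)
rho = 1.0 + 0.3 * np.cos(2 * np.pi * x)
full = np.sqrt(h_minus_one_sq(rho - mu))
for t in (0.25, 0.5, 1.0):
    rho_t = (1 - t) * mu + t * rho
    print(t, np.sqrt(h_minus_one_sq(rho_t - mu)), t * full)
```

The two printed columns agree because the norm is quadratic in $\rho_{t}-\mu=t(\rho-\mu)$, which is consistent with $\rho_{t}$ being a constant-speed geodesic in this flat setting.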

Remark 10.

The current method fails when $\gamma>1$ or $\gamma<0$. In these cases, there is no finite lower bound for the bilinear form and squared gradient norm for any $\rho\in\mathcal{P}$. One cannot obtain a finite ratio between $\frac{d}{dt}\mathcal{D}_{\gamma}(\rho_{t}\|\mu)$ and $\frac{d^{2}}{dt^{2}}\mathcal{D}_{\gamma}(\rho_{t}\|\mu)$. Thus we cannot establish the exponential decay results in terms of the $\gamma$–divergence.

However, the failure of the current method does not mean that we cannot find a convergence guarantee condition for $\gamma$–diffusion processes when $\gamma>1$. In fact, we can always formulate the Fokker–Planck equation of the $\gamma$–diffusion as the gradient flow of the 1-divergence (relative entropy) w.r.t. the density manifold metric $\Big(-\nabla\cdot(\rho\mu^{\gamma-1}\nabla)\Big)^{-1}$. In this case, the study of diffusion hypercontractivity reduces to the classical Bakry–Émery method. In other words, one can always apply the entropy method or the entropy–entropy production method as in [24] to find the associated diffusion hypercontractivity and convergence rate in the 1-divergence. See related details in [3].
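The gradient flow reformulation mentioned in this remark can be checked in a few lines. The computation below is a sketch for $\gamma\neq 1$. It assumes the convention $\textrm{grad}_{g}\mathcal{F}=-\nabla\cdot(\rho^{\gamma}\nabla\delta\mathcal{F})$, consistent with the tangent vectors $\sigma=-\nabla\cdot(\rho^{\gamma}\nabla\Phi)$ used above. It also uses $\delta\mathcal{D}_{\gamma}(\rho\|\mu)=\frac{1}{1-\gamma}(\frac{\rho}{\mu})^{1-\gamma}$ from Lemma 12 and $\delta\mathcal{D}_{1}(\rho\|\mu)=\log\frac{\rho}{\mu}$ up to an additive constant.

```latex
\begin{aligned}
\partial_t\rho
&= -\mathrm{grad}_g\mathcal{D}_\gamma(\rho\|\mu)
 = \nabla\cdot\big(\rho^{\gamma}\nabla\delta\mathcal{D}_\gamma(\rho\|\mu)\big) \\
&= \nabla\cdot\Big(\rho^{\gamma}\big(\tfrac{\rho}{\mu}\big)^{-\gamma}\nabla\tfrac{\rho}{\mu}\Big)
 = \nabla\cdot\Big(\mu^{\gamma}\nabla\tfrac{\rho}{\mu}\Big) \\
&= \nabla\cdot\Big(\rho\,\mu^{\gamma-1}\nabla\log\tfrac{\rho}{\mu}\Big)
 = \nabla\cdot\Big(\rho\,\mu^{\gamma-1}\nabla\delta\mathcal{D}_1(\rho\|\mu)\Big).
\end{aligned}
```

The last expression is the negative gradient of the relative entropy with respect to the metric $\big(-\nabla\cdot(\rho\mu^{\gamma-1}\nabla)\big)^{-1}$, whose mobility $\rho\mu^{\gamma-1}$ is linear in $\rho$.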

Remark 11.

We comment on the proof of different types of inequalities. (i) For the Log-Sobolev and Talagrand inequalities, we only need the Hessian operator along the gradient flow to have a lower bound. (ii) For the Poincaré inequality, we require the Hessian operator at the equilibrium measure $\mu$ to have a lower bound. (iii) For the divergence, metric and information inequality, such as the HWI or $PH^{-1}I$ inequality, we require the Hessian operator to have a lower bound for any tangent direction in density manifold. Interestingly, the above three conditions coincide in the cases $\gamma=0,1$.

5. Generalized Bakry–Émery Calculus

In this section, we propose the generalized Bakry–Émery iterative calculus. This definition follows the connection of Hessian operator in density manifold with the generator (Kolmogorov backward operator) of $\gamma$ –drift diffusion process.

We first define the generalized iterative Bakry–Émery Gamma operators.

Definition 13 ( $\gamma$ –Bakry–Émery calculus).

Denote the $\gamma$ –Gamma one operator $\Gamma_{\gamma,1}\colon C^{\infty}(M)\times C^{\infty}(M)\times\mathcal{P}\rightarrow C^{\infty}(M)$ by

[TABLE]

where $\Phi_{1}$ , $\Phi_{2}\in C^{\infty}(M)$ .

Denote the $\gamma$ –Gamma two operator $\Gamma_{\gamma,2}\colon C^{\infty}(M)\times C^{\infty}(M)\times\mathcal{P}\rightarrow C^{\infty}(M)$ by

[TABLE]

where $\Phi_{1}$ , $\Phi_{2}\in C^{\infty}(M)$ .

Remark 12.

We note that when $\gamma=1$ , we recover the classical iterative Bakry–Émery operators. Here $\Gamma_{1,1}$ and $\Gamma_{1,2}$ are independent of $\rho$ with

[TABLE]

where $L_{1}=(\nabla\log\mu,\nabla\cdot)+\Delta$ is the generator of the classical Langevin drift diffusion process. In addition, when $\gamma\neq 1$, the generalized Bakry–Émery Gamma one and Gamma two operators depend on the current density $\rho$. In other words, they are mean-field formulations of Gamma operators.
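For the classical case $\gamma=1$ in Remark 12, the curvature content of the Gamma two operator can be made explicit. The following is a sketch under assumptions not stated in this remark: it takes the standard definitions $\Gamma_{1,1}(\Phi,\Phi)=(\nabla\Phi,\nabla\Phi)$ and $\Gamma_{1,2}(\Phi,\Phi)=\frac{1}{2}L_{1}\Gamma_{1,1}(\Phi,\Phi)-\Gamma_{1,1}(L_{1}\Phi,\Phi)$, a flat manifold $M$ (zero Ricci curvature), and writes $\|\cdot\|_{F}$ for the Frobenius norm.

```latex
\begin{aligned}
\tfrac{1}{2}L_{1}\|\nabla\Phi\|^{2}
&= \mathrm{Hess}\,\Phi(\nabla\log\mu,\nabla\Phi)
 + \|\mathrm{Hess}\,\Phi\|_{F}^{2} + (\nabla\Delta\Phi,\nabla\Phi), \\
(\nabla L_{1}\Phi,\nabla\Phi)
&= \mathrm{Hess}\log\mu(\nabla\Phi,\nabla\Phi)
 + \mathrm{Hess}\,\Phi(\nabla\log\mu,\nabla\Phi) + (\nabla\Delta\Phi,\nabla\Phi), \\
\Gamma_{1,2}(\Phi,\Phi)
&= \|\mathrm{Hess}\,\Phi\|_{F}^{2} - \mathrm{Hess}\log\mu(\nabla\Phi,\nabla\Phi).
\end{aligned}
```

Under these assumptions, $\Gamma_{1,2}(\Phi,\Phi)\geq\kappa\Gamma_{1,1}(\Phi,\Phi)$ holds for all $\Phi$ exactly when $-\textrm{Hess}\log\mu\succeq\kappa$, which is the classical Bakry–Émery condition.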

We next prove an equality to bridge generalized Bakry–Émery calculus and Hessian operator of $\gamma$ –divergence in density manifold.

Proposition 14.

[TABLE]

where $\sigma_{i}=-\nabla\cdot(\rho^{\gamma}\nabla\Phi_{i})\in T_{\rho}\mathcal{P},$ and $\Phi_{i}\in C^{\infty}(M)$ with $i=1,2$ .

Proof.

For simplicity of presentation, in the proof, we omit the notation of $\rho$ with the generalized Gamma operators, e.g. $\Gamma_{\gamma,1}(\Phi_{1},\Phi_{2}):=\Gamma_{\gamma,1}(\Phi_{1},\Phi_{2},\rho)$ .

Let us recalculate the Hessian operator of $\mathcal{D}_{\gamma}$ in $(\mathcal{P},g)$ by using (9) directly. Using the generalized iterative operators, we reformulate (9) as follows:

[TABLE]

We next rewrite (19) in three terms. First, we prove the following claim.

Claim 1:

[TABLE]

Proof of Claim 1.

Notice

[TABLE]

and

[TABLE]

The above two facts show that

[TABLE]

Using the facts that $\delta^{2}\mathcal{D}_{\gamma}=\rho^{-\gamma}\mu^{\gamma-1}$ and $\nabla\delta\mathcal{D}_{\gamma}=\rho^{-\gamma}\mu^{\gamma-1}\nabla\rho-\rho^{1-\gamma}\mu^{\gamma-2}\nabla\mu$, we have

[TABLE]

where the second last equality holds by the integration by parts formula. ∎

Secondly, by switching $\Phi_{1}$ and $\Phi_{2}$ in Claim 1, we have

[TABLE]

Thirdly, we show the following claim.

Claim 2:

[TABLE]

Proof of Claim 2.

Here

[TABLE]

where the second equality uses the fact that

[TABLE]

and the last equality holds because $L^{*}_{\gamma}$ is the adjoint operator of $L_{\gamma}$ in $L^{2}(\rho)$. ∎

By summing (20), (21) and $\frac{\gamma}{2}$ times (22) and using (19), we have

[TABLE]

∎

We last point out that the generalized Bakry–Émery iterative calculus implies generalized hypercontractivity.

Proposition 15 (Generalized Bakry–Émery criterion).

If there exists a constant $\kappa>0$ , such that

[TABLE]

for $\Phi=\frac{1}{1-\gamma}(\frac{\rho}{\mu})^{1-\gamma}$ and any $\rho\in\mathcal{P}$, then the generalized hypercontractivity (2) and the generalized Log-Sobolev inequality (3) hold.

Remark 13.

Our generalized Bakry–Émery operators follow the proof in Proposition 19 of [15]. In other words, when the divergence functional is the relative entropy, i.e. $\gamma=1$, we have the classical Bakry–Émery iterative calculus. For generalized divergence functionals, we introduce the generalized iterative Bakry–Émery calculus.

Remark 14.

When $\gamma=1$ or $\gamma=0$, the ratio between the generalized Gamma two operator and Gamma one operator gives the bound in (23). This is not the case for $\gamma\neq 1,0$. In general, we need to apply the mean field (integral formula w.r.t. $\rho$) of the Gamma two operator to bound the Gamma one operator, and then derive the related Log-Sobolev inequalities.

Proof.

The proof applies Proposition 14 and the gradient flow formulation (10), following the proof of Theorem 1. ∎

In summary, we show the generalized Bakry–Émery criterion (23) here, and estimate its precise bound in Theorem 1. Besides, we comment on the major differences and difficulties between the generalized Bakry–Émery criteria and the classical ones. The Hessian operator in generalized density manifold involves an additional quadratic form $J(\Phi,\Phi)$. Thus the smallest eigenvalue of the Hessian operator in density manifold is not enough to provide a lower bound for the convergence rate of generalized drift-diffusion processes. Here, we carefully derive the global behavior of the dynamics, in order to control the additional quadratic form along the gradient flow for any $\rho$. Besides, a local viewpoint is provided for establishing the Poincaré inequality, which follows the local behavior of the dynamics, i.e., the Hessian operator in density manifold at the minimizer $\mu$. This local property relates to the integral formula known as Yano’s formula.

6. Discussion

In this paper, we study the diffusion hypercontractivity for the $\gamma$–drift-diffusion process, and prove generalized Log-Sobolev, Poincaré and Talagrand inequalities. Firstly, using $\mathcal{D}_{\gamma}$ as the Lyapunov function, we present the global exponential convergence of the $\gamma$–drift-diffusion process for $\gamma\in[0,1]$. This requires estimating the smallest eigenvalue of the Hessian operator in density manifold along the gradient flow for any $\rho\in\mathcal{P}$. Secondly, we show the local behavior of the $\gamma$–drift-diffusion process for any $\gamma\in\mathbb{R}$. This requires estimating the smallest eigenvalue of the Hessian operator at the reference measure $\mu$. This local property allows us to obtain a class of generalized Poincaré inequalities. Besides, we connect the generalized Poincaré inequality with Yano’s formula. Lastly, our approach can be formulated in terms of the generalized Bakry–Émery iterative operators. Here, the Gamma one and Gamma two operators are mean-field based, i.e., they depend on the current density; see related studies in probability models [16]. In future work, we will study more general mean-field Bakry–Émery conditions for related diffusion hypercontractivity and functional inequalities.

Notations

We apply the following notations in this paper.

[TABLE]

Bibliography

[1] D. Bakry and M. Émery. Diffusions hypercontractives. Séminaire de probabilités de Strasbourg, 19:177–206, 1985.
[2] D. Bakry, I. Gentil, and M. Ledoux. Logarithmic Sobolev Inequalities. In: Analysis and Geometry of Markov Diffusion Operators. Grundlehren der mathematischen Wissenschaften (A Series of Comprehensive Studies in Mathematics), vol 348. Springer, Cham, 2014.
[3] F. Bolley and I. Gentil. Phi-entropy inequalities for diffusion semigroups. Journal de Mathématiques Pures et Appliquées, 93(5):449–473, 2010.
[4] J. A. Carrillo, S. Lisini, G. Savaré, and D. Slepčev. Nonlinear mobility continuity equations and generalized displacement convexity. Journal of Functional Analysis, 258(4):1273–1309, 2009.
[5] S.-N. Chow, W. Li, and H. Zhou. Entropy dissipation of Fokker-Planck equations on graphs. Discrete & Continuous Dynamical Systems, series A, 38(10):4929–4950, 2018.
[6] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley Series in Telecommunications. Wiley, New York, 1991.
[7] I. Csiszár and P. C. Shields. Information Theory and Statistics: A Tutorial. Foundations and Trends™ in Communications and Information Theory, 1(4):417–528, 2004.
[8] J. Dolbeault, B. Nazaret, and G. Savaré. A new class of transport distances. Calculus of Variations and Partial Differential Equations, (2):193–231, 2009.
