Entropic curvature and convergence to equilibrium for mean-field dynamics on discrete spaces

Matthias Erbar; Max Fathi; André Schlichting

arXiv:1908.03397·math.PR·June 4, 2020

TL;DR

This paper introduces a notion of entropic curvature for mean-field dynamics on discrete spaces, linking curvature bounds to convergence rates and establishing explicit bounds for classical models.

Contribution

It extends the concept of curvature bounds from linear Markov chains to non-linear mean-field dynamics, providing new tools for analyzing convergence to equilibrium.

Findings

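1. Positive curvature bounds imply functional inequalities (a modified logarithmic Sobolev inequality and an entropy-transport inequality) that control convergence to equilibrium.

2. Explicit curvature bounds are derived for mean-field limits of classical statistical mechanics models, including the Curie-Weiss model.

3. The framework generalizes the entropic curvature notion for linear Markov chains to non-linear mean-field systems.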

Abstract

We consider non-linear evolution equations arising from mean-field limits of particle systems on discrete spaces. We investigate a notion of curvature bounds for these dynamics based on convexity of the free energy along interpolations in a discrete transportation distance related to the gradient flow structure of the dynamics. This notion extends the one for linear Markov chain dynamics studied by Erbar and Maas. We show that positive curvature bounds entail several functional inequalities controlling the convergence to equilibrium of the dynamics. We establish explicit curvature bounds for several examples of mean-field limits of various classical models from statistical mechanics.

Equations

$$\partial_{t}\rho=\nabla\cdot\bigl[\rho\,\nabla\bigl(S'(\rho)+V+W*\rho\bigr)\bigr]$$

$$\dot{\mu}(t)=\mu(t)\,Q\bigl(\mu(t)\bigr)$$

$$\mathcal{F}(\mu)=\sum_{x\in\mathcal{X}}\mu_{x}\log\mu_{x}+U(\mu)\;,\quad\text{with}\quad U(\mu)=\sum_{z\in\mathcal{X}}\mu_{z}K_{z}(\mu)$$

$$\mathcal{F}(\mu_{t})\leq(1-t)\mathcal{F}(\mu_{0})+t\mathcal{F}(\mu_{1})-\frac{\kappa}{2}t(1-t)\mathcal{W}(\mu_{0},\mu_{1})^{2}$$

$$\frac{1}{2}\frac{\mathrm{d}^{+}}{\mathrm{d}t}\mathcal{W}(\mu_{t},\nu)^{2}+\frac{\kappa}{2}\mathcal{W}(\mu_{t},\nu)^{2}\leq\mathcal{F}(\nu)-\mathcal{F}(\mu_{t})$$

$$\mathcal{I}(\mu)=\frac{1}{2}\sum_{x,y}\Theta\bigl(\mu_{x}Q_{xy}(\mu),\mu_{y}Q_{yx}(\mu)\bigr)\;,\qquad\Theta(a,b)=(a-b)(\log a-\log b)$$

$$\mathcal{F}_{*}(\mu)\leq\frac{1}{2\lambda}\mathcal{I}(\mu)$$

$$\mathcal{F}_{*}(\mu_{t})\leq e^{-2\lambda t}\mathcal{F}_{*}(\mu_{0})$$

$$\mathcal{W}(\mu,\pi^{*})\leq\sqrt{\frac{2}{\lambda}\mathcal{F}_{*}(\mu)}$$

$$\forall x\neq y:\quad Q_{xy}\geq 0\quad\text{and}\quad Q_{xx}=-\sum_{y\neq x}Q_{xy}$$

$$\pi_{x}(\mu)=\frac{1}{Z(\mu)}\exp\bigl(-H_{x}(\mu)\bigr)$$

$$H_{x}(\mu)=\frac{\partial}{\partial\mu_{x}}U(\mu)\quad\text{and}\quad U(\mu)=\sum_{x\in\mathcal{X}}\mu_{x}K_{x}(\mu)$$

$$\pi_{x}(\mu)Q_{xy}(\mu)=\pi_{y}(\mu)Q_{yx}(\mu)$$

$$\dot{\mu}(t)=\mu(t)\,Q\bigl(\mu(t)\bigr)$$

$$\mathcal{K}[\mu]\psi(x):=-\frac{1}{2}\sum_{y}\Lambda\bigl(\mu_{x}Q_{xy}(\mu),\mu_{y}Q_{yx}(\mu)\bigr)\bigl(\psi(y)-\psi(x)\bigr)$$

$$\frac{\mathrm{d}\mu_{t}}{\mathrm{d}t}=-\mathcal{K}[\mu_{t}]\,D\mathcal{F}(\mu_{t})$$

$$\mathcal{W}(\mu_{0},\mu_{1}):=\inf_{(\mu,\psi)\in\operatorname{CE}}\left(\int_{0}^{1}\mathcal{A}(\mu_{t},\psi_{t})\,\mathrm{d}t\right)^{1/2}$$

$$\dot{\mu}_{t}=\mathcal{K}[\mu_{t}]\psi_{t}$$

$$\mathcal{A}(\mu,\psi)=\langle\psi,\mathcal{K}[\mu]\psi\rangle=\frac{1}{2}\sum_{x,y}\bigl(\psi(y)-\psi(x)\bigr)^{2}\,\Lambda\bigl(\mu_{x}Q_{xy}(\mu),\mu_{y}Q_{yx}(\mu)\bigr)$$

$$\mathcal{F}(\mu_{t})+\int_{0}^{t}\mathcal{I}(\mu_{s})\,\mathrm{d}s=\mathcal{F}(\mu_{0})\quad\text{for any }t>0$$

$$\mathcal{I}(\mu)=\begin{cases}\frac{1}{2}\sum\limits_{(x,y)\in E_{\mu}}\Theta\bigl(\mu_{x}Q_{xy}(\mu),\mu_{y}Q_{yx}(\mu)\bigr)\;,&\text{for }\mu\in\mathcal{P}^{*}(\mathcal{X})\\ +\infty\;,&\text{else}\end{cases}$$

$$\mathcal{I}(\mu)=\langle D\mathcal{F}(\mu),\mathcal{K}[\mu]D\mathcal{F}(\mu)\rangle$$

$$\nabla\psi_{xy}=\psi_{y}-\psi_{x}$$

$$(\nabla\cdot\Psi)_{x}=\frac{1}{2}\sum_{y\in\mathcal{X}}\bigl(\Psi_{xy}-\Psi_{yx}\bigr)$$

$$\langle\psi,\phi\rangle=\sum_{x\in\mathcal{X}}\psi_{x}\phi_{x}\;,\qquad\langle\Psi,\Phi\rangle=\frac{1}{2}\sum_{x,y\in\mathcal{X}}\Psi_{xy}\Phi_{xy}$$

$$\langle\psi,\nabla\cdot\Phi\rangle=-\langle\nabla\psi,\Phi\rangle$$

$$\dot{\mu}_{t}+\nabla\cdot\bigl(\Lambda(\mu_{t})\cdot\nabla\psi_{t}\bigr)=0\;,\qquad\mathcal{A}(\mu,\psi)=\langle\nabla\psi,\Lambda(\mu)\cdot\nabla\psi\rangle$$

$$\sum_{x\in\mathcal{X}}\pi^{*}_{x}Q_{xy}(\pi^{*})=\sum_{x\in\mathcal{X}}\pi_{x}(\pi^{*})Q_{xy}(\pi^{*})=\sum_{x\in\mathcal{X}}\pi_{y}(\pi^{*})Q_{yx}(\pi^{*})=0$$

$$\left.\frac{\mathrm{d}}{\mathrm{d}s}\mathcal{F}(\mu_{s})\right|_{s=0}=\sum_{x\in\mathcal{X}}\Bigl(\log\mu_{x}-1+\partial_{\mu_{x}}U(\mu)\Bigr)\left(\nu_{x}-\mu_{x}\right)=\sum_{x\in\mathcal{X}}\log\frac{\mu_{x}}{\pi_{x}(\mu)}\left(\nu_{x}-\mu_{x}\right)$$

$$\omega(\mu)=\left\{\nu\in\mathcal{P}(\mathcal{X}):\mu_{t_{j}}\to\nu\text{ for some sequence }t_{j}\to\infty\right\}$$

Full text

Entropic curvature and convergence to equilibrium for mean-field dynamics on discrete spaces

Matthias Erbar

Institut für Angewandte Mathematik, Endenicher Allee 60, Universität Bonn

Max Fathi

CNRS & Institut de Mathématiques de Toulouse, Université de Toulouse, 118 route de Narbonne, 31062 Toulouse, France

André Schlichting

Institut für Angewandte Mathematik, Endenicher Allee 60, Universität Bonn

Abstract.

We consider non-linear evolution equations arising from mean-field limits of particle systems on discrete spaces. We investigate a notion of curvature bounds for these dynamics based on convexity of the free energy along interpolations in a discrete transportation distance related to the gradient flow structure of the dynamics. This notion extends the one for linear Markov chain dynamics studied in [21]. We show that positive curvature bounds entail several functional inequalities controlling the convergence to equilibrium of the dynamics. We establish explicit curvature bounds for several examples of mean-field limits of various classical models from statistical mechanics.

1. Introduction

This work is about the long-time behavior of mean-field systems on discrete spaces. Mean-field equations describe the large-scale limit of interacting particle systems in which the total force exerted on any given (tagged) particle is the average of the forces exerted on it by all other particles. They are used to describe collective behavior in many areas of science. Examples include the modeling of granular flows in physics [2] and collective behavior and self-organization in groups of animals [16, 11]. We refer to [36] for an introduction to the mathematical theory.

One of the important questions in the mathematical analysis of these equations is their long-time behavior. In [9], Carrillo, McCann and Villani obtained quantitative bounds on the rate of convergence to equilibrium for McKean-Vlasov equations in a continuous setting of the form

$$\partial_{t}\rho=\nabla\cdot\bigl[\rho\,\nabla\bigl(S'(\rho)+V+W*\rho\bigr)\bigr]$$

under strong convexity assumptions on the potentials $S$, $V$ and $W$. The core idea underlying their method was the fact that the PDE has a gradient flow structure, i.e. it can be recast as a gradient descent equation $\partial_{t}\rho=-\nabla F(\rho)$ of the free energy functional $F(\rho)=\int S(\rho)+\int V\,\mathrm{d}\rho+\int W*\rho\,\mathrm{d}\rho$ in the space of probability measures with respect to the Kantorovich-Wasserstein distance $W_{2}$, which has a formal Riemannian description via Otto calculus [32, 33]. The use of such structures in the study of long-time behavior comes from the fact that as soon as the driving functional satisfies some uniform convexity property (with respect to the particular metric structure), it must decay exponentially fast towards its minimal value along solutions of the evolution equation. Moreover, we can use convexity to derive strong functional inequalities relating the distance, the entropy functional and the entropy dissipation functional [33].

1.1. Setup and main results

Our main motivation here is to adapt the approach of [9] to mean-field equations in a discrete setting. We consider discrete mean-field dynamics of the form

$$\dot{\mu}(t)=\mu(t)\,Q\bigl(\mu(t)\bigr)\;,\tag{1.1}$$

where $\mu$ is a flow of probability measures on a finite set $\mathcal{X}$ and $(Q(\mu)_{xy})_{x,y\in\mathcal{X}}$ is a parametrized collection of Markov kernels. These dynamics naturally arise as scaling limits of interacting particle systems on graphs where the interaction only depends on the normalized empirical measure of the system (which indeed corresponds to mean-field interactions). They generalize linear Markov chains on discrete spaces, which correspond to the case where $Q$ is a constant Markov kernel, independent of $\mu$.

While the Wasserstein gradient flow approach works well on continuous spaces, it fails in the discrete setting, since the Wasserstein $L^{2}$ -transport distance does not admit any non-trivial absolutely continuous curve. In our previous work [19], we derived a gradient flow structure for (1.1) by replacing the role of the Wasserstein distance with a distance $\mathcal{W}$ constructed via a suitable modification of the Benamou-Brenier formula for optimal transport, extending similar earlier results for linear reversible Markov chains obtained in [29, 31, 13]. Under the condition that the rates $Q$ are Gibbs with a potential $K:\mathcal{P}(\mathcal{X})\times\mathcal{X}\to\mathbf{R}$ (see Assumption 2.1), i.e. $Q(\mu)$ is reversible with respect to a local Gibbs measure of the form $\pi_{x}(\mu)=Z(\mu)^{-1}\exp\bigl{(}-H_{x}(\mu)\bigr{)}$ , with $H_{x}$ given in terms of the potential $K$ , we showed that this dynamic is the gradient flow of the free energy functional

$$\mathcal{F}(\mu)=\sum_{x\in\mathcal{X}}\mu_{x}\log\mu_{x}+U(\mu)\;,\quad\text{with}\quad U(\mu)=\sum_{z\in\mathcal{X}}\mu_{z}K_{z}(\mu)\;,\tag{1.2}$$

with respect to the distance $\mathcal{W}$ on the simplex of probability measures on $\mathcal{X}$, see Proposition 2.2. This built on previous works [4, 5] that showed that $\mathcal{F}$ is indeed a Lyapunov functional for the flow. An archetypical example, which we shall discuss in some detail later on, is the classical Curie-Weiss model, which corresponds to a mean-field dynamic on a two-point space. Already this simple model exhibits interesting behavior, such as a phase transition at an explicit critical value of a temperature parameter.

In the present work, we exploit this gradient flow structure to analyze the long-term behaviour of (1.1), inspired by the approach in [9], by investigating convexity properties of the free energy along discrete optimal transport paths for a non-linear Markov triple $(\mathcal{X},Q,\pi)$ as above. Following the works of Lott, Sturm and Villani [28, 35] for metric measure spaces and [21, 31] for linear Markov chains, we make the following definition.

Definition 1.1 (Entropic Ricci curvature lower bound).

We say that $(\mathcal{X},Q,\pi)$ has Ricci curvature bounded below by $\kappa\in\mathbf{R}$ (for short $\operatorname{Ric}(\mathcal{X},Q)\geq\kappa$ ) if for any $\mathcal{W}$ -geodesic $(\mu_{t})_{t\in[0,1]}$ :

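$$\mathcal{F}(\mu_{t})\leq(1-t)\mathcal{F}(\mu_{0})+t\mathcal{F}(\mu_{1})-\frac{\kappa}{2}t(1-t)\mathcal{W}(\mu_{0},\mu_{1})^{2}\;.$$

We will show, see Theorem 3.7, that Ricci curvature lower bounds can be characterized in terms of a discrete Bochner-type inequality by deriving the Hessian of $\mathcal{F}$ in the Riemannian structure $\mathcal{W}$, as well as in terms of the Evolution Variational Inequality $\mathrm{EVI}_{\kappa}$ for the solutions to (1.1):

$$\frac{1}{2}\frac{\mathrm{d}^{+}}{\mathrm{d}t}\mathcal{W}(\mu_{t},\nu)^{2}+\frac{\kappa}{2}\mathcal{W}(\mu_{t},\nu)^{2}\leq\mathcal{F}(\nu)-\mathcal{F}(\mu_{t})\;.$$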

Further, we will show that a positive lower bound on the Ricci curvature entails a number of functional inequalities that control the convergence to equilibrium of the mean-field systems. These involve a discrete Fisher information functional $\mathcal{I}:\mathcal{P}(\mathcal{X})\rightarrow[0,\infty]$ given by

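$$\mathcal{I}(\mu)=\frac{1}{2}\sum_{x,y}\Theta\bigl(\mu_{x}Q_{xy}(\mu),\mu_{y}Q_{yx}(\mu)\bigr)\;,\qquad\Theta(a,b)=(a-b)(\log a-\log b)\;,$$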

which arises from the dissipation of $\mathcal{F}$ along solutions to (1.1) as $\frac{d}{dt}\mathcal{F}(\mu_{t})=-\mathcal{I}(\mu_{t})$ . One of our main results is the following theorem which can be seen as a discrete analog of [9, Thm. 2.1].

Theorem 1.2.

Assume that $\operatorname{Ric}(\mathcal{X},Q,\pi)\geq\lambda$ for some $\lambda>0$ . Then the following hold:

(i) there exists a unique stationary point $\pi^{*}$ for the evolution (1.1), which is the unique minimizer of $\mathcal{F}$. Let $\mathcal{F}_{*}(\cdot):=\mathcal{F}(\cdot)-\mathcal{F}(\pi^{*})$;

(ii) the modified logarithmic Sobolev inequality with constant $\lambda>0$ holds, i.e. for all $\mu\in\mathcal{P}(\mathcal{X})$,

$$\mathcal{F}_{*}(\mu)\leq\frac{1}{2\lambda}\mathcal{I}(\mu)\;;$$

(iii) for any solution $(\mu_{t})_{t\geq 0}$ to (1.1) we have exponential decay of the free energy:

$$\mathcal{F}_{*}(\mu_{t})\leq e^{-2\lambda t}\mathcal{F}_{*}(\mu_{0})\;;$$

(iv) the entropy-transport inequality with constant $\lambda>0$ holds, i.e. for all $\mu\in\mathcal{P}(\mathcal{X})$,

$$\mathcal{W}(\mu,\pi^{*})\leq\sqrt{\frac{2}{\lambda}\mathcal{F}_{*}(\mu)}\;.$$
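
The step from the modified logarithmic Sobolev inequality $\mathcal{F}_{*}\leq\frac{1}{2\lambda}\mathcal{I}$ in (ii) to the exponential decay in (iii) can be sketched with a standard Grönwall argument. The following derivation is an illustration rather than the authors' full proof; it assumes only the dissipation identity $\frac{d}{dt}\mathcal{F}(\mu_{t})=-\mathcal{I}(\mu_{t})$ stated before the theorem and the fact that $\mathcal{F}(\pi^{*})$ does not depend on $t$:

```latex
\frac{\mathrm{d}}{\mathrm{d}t}\mathcal{F}_{*}(\mu_{t})
  = -\mathcal{I}(\mu_{t})
  \le -2\lambda\,\mathcal{F}_{*}(\mu_{t})
\;\Longrightarrow\;
\frac{\mathrm{d}}{\mathrm{d}t}\Bigl(e^{2\lambda t}\,\mathcal{F}_{*}(\mu_{t})\Bigr)\le 0
\;\Longrightarrow\;
\mathcal{F}_{*}(\mu_{t})\le e^{-2\lambda t}\,\mathcal{F}_{*}(\mu_{0}).
```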

1.2. Examples

We will establish explicit curvature bounds for several examples of (relatively simple) mean-field dynamics, such as the Curie-Weiss model, zero-range mean-field dynamics and misanthrope processes. We compute a formula for the second derivative of the entropy along geodesics, and generalize techniques developed in [22, 20] to the present non-linear situation in order to bound this second derivative. The nonlinearity of the dynamic gives rise to several extra terms when computing the Hessian of the free energy functional, which complicates the analysis.

In the case of the Curie–Weiss model, we will show that a positive lower curvature bound holds down to the critical temperature, see Section 5.1.

Another family of dynamics we shall be interested in arises when the flux of particles from some site $x$ to a site $y$ is a function of the particle density at site $y$, that is $f(c_{y})$. In the situation where $f$ is constant, this would correspond to the scaling limit of independent particles on the complete graph. As in [22, 20], our approach is in some sense perturbative in nature: we shall consider rates of the form $f(r)=T+g(r)$, and show that if $g$ is not too large in some sense, relative to $T$, then we can derive a rate of convergence to equilibrium. This is inspired by recent work of Villemonais [39], who proved that the $N$ particle system has a positive Ollivier-Ricci (or coarse Ricci) curvature (another notion of curvature, corresponding to a contraction rate for the Markovian dynamic) independently of the system size, and hence converges to equilibrium in $L^{2}$ distance, via a uniform estimate on the Poincaré constant of the dynamic. Our approach has the advantage of yielding rates of convergence in relative entropy via Theorem 1.2, which is a strictly stronger notion of convergence.

1.3. Connection to the literature

The approach of [9] was later extended to other potentials [10, 3]. Other approaches developed later include using uniform convergence estimates for a stochastic particle approximation [12] and coupling arguments [17, 18]. Without convexity, deriving rates of convergence can be quite delicate, since there may be multiple equilibria [38], unlike what happens for linear diffusions.

Our approach developed here builds on earlier work [29, 31, 13] constructing gradient flow structures for linear Markov chain dynamics and [21] studying Ricci curvature and its impact on functional inequalities in this context. It is also related to, but different from, the one developed in [26], which uses convexity of the entropy along a different type of paths, the so-called entropic interpolations, rather than geodesic paths, to establish functional inequalities involving relative entropy. In the continuous setting, entropic interpolations are regularizations of geodesics in Wasserstein space, but in the discrete case it seems that the entropic interpolations of [26] are related to a gradient flow structure different from the one of [19] we use here.

Organization

The plan of the paper is as follows. Section 2 introduces the mathematical framework we shall work in. Section 3 introduces the notion of curvature bounds in our setting, and contains the computation of the Hessian for general dynamics. Section 4 investigates the consequences of Ricci curvature bounds in terms of functional inequalities and convergence to equilibrium for the nonlinear dynamics. Finally, Section 5 investigates curvature bounds for several examples of mean-field dynamics inspired by classical models of statistical physics.

2. Setup

2.1. Gradient-flow formulation

The main definitions and results from [19] on which this work will build are collected in this section. The gradient flow structure of (1.1) is based on the existence of a suitable potential, which is ensured by the following constraint, which we shall assume to hold throughout the article. We recall that a rate matrix $Q$ of a Markov chain in the continuous time setting satisfies

$$\forall x\neq y:\quad Q_{xy}\geq 0\quad\text{and}\quad Q_{xx}=-\sum_{y\neq x}Q_{xy}\;.$$

Assumption 2.1.

Let $K:\mathcal{P}(\mathcal{X})\times\mathcal{X}\rightarrow\mathbf{R}$ be such that for each $x\in\mathcal{X}$, $K_{x}:\mathcal{P}(\mathcal{X})\rightarrow\mathbf{R}$ is twice continuously differentiable. Let $\{Q(\mu)\in\mathbf{R}^{\mathcal{X}\times\mathcal{X}}\}_{\mu\in\mathcal{P}(\mathcal{X})}$ be a family of rate matrices that is Gibbs with respect to the potential function $K$, i.e. for each $\mu\in\mathcal{P}(\mathcal{X})$, $Q(\mu)$ is the rate matrix of an irreducible, reversible ergodic Markov chain with respect to the probability measure

$$\pi_{x}(\mu)=\frac{1}{Z(\mu)}\exp\bigl(-H_{x}(\mu)\bigr)\;,\tag{2.1}$$

with

$$H_{x}(\mu)=\frac{\partial}{\partial\mu_{x}}U(\mu)\quad\text{and}\quad U(\mu)=\sum_{x\in\mathcal{X}}\mu_{x}K_{x}(\mu)\;.\tag{2.2}$$

In particular $Q(\mu)$ satisfies the detailed balance condition with respect to $\pi(\mu)$, that is for all $x,y\in\mathcal{X}$

$$\pi_{x}(\mu)Q_{xy}(\mu)=\pi_{y}(\mu)Q_{yx}(\mu)\tag{2.3}$$

holds. Moreover, we assume that for each $x,y\in\mathcal{X}$ the map $\mu\mapsto Q_{xy}(\mu)$ is Lipschitz continuous over $\mathcal{P}(\mathcal{X})$ .

We will refer to the triple $(\mathcal{X},Q,\pi)$ as above for short as a non-linear Markov triple.
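
To make Assumption 2.1 concrete, the following minimal sketch builds one family of Gibbs rates on a three-point space and checks the rate-matrix property and detailed balance (2.3) numerically. Everything specific here is an assumption made for illustration and is not taken from the paper: the potential data `V` and `W`, the quadratic interaction energy $U(\mu)=\sum_{x}V_{x}\mu_{x}+\frac{1}{2}\sum_{x,y}W_{xy}\mu_{x}\mu_{y}$, and the rate choice $Q_{xy}(\mu)=\sqrt{\pi_{y}(\mu)/\pi_{x}(\mu)}$, which is only one of many rate families reversible with respect to $\pi(\mu)$.

```python
import math

# Hypothetical model data (not from the paper): 3 states, linear potential V,
# symmetric pair interaction W.
V = [0.0, 0.3, -0.2]
W = [[0.0, 0.5, 0.1],
     [0.5, 0.0, 0.4],
     [0.1, 0.4, 0.0]]

def H(mu):
    """H_x(mu) = dU/dmu_x for U(mu) = sum_x V_x mu_x + 1/2 sum_{x,y} W_xy mu_x mu_y."""
    n = len(mu)
    return [V[x] + sum(W[x][y] * mu[y] for y in range(n)) for x in range(n)]

def gibbs(mu):
    """Local Gibbs measure pi_x(mu) = exp(-H_x(mu)) / Z(mu), as in (2.1)."""
    w = [math.exp(-h) for h in H(mu)]
    Z = sum(w)
    return [v / Z for v in w]

def rates(mu):
    """Assumed Gibbs rates: Q_xy = sqrt(pi_y / pi_x) for x != y, zero row sums."""
    pi = gibbs(mu)
    n = len(mu)
    Q = [[math.sqrt(pi[y] / pi[x]) if x != y else 0.0 for y in range(n)]
         for x in range(n)]
    for x in range(n):
        Q[x][x] = -sum(Q[x])  # diagonal is still 0 here, so this sums off-diagonals
    return Q

mu = [0.2, 0.5, 0.3]
pi, Q = gibbs(mu), rates(mu)

# Largest violation of detailed balance pi_x Q_xy = pi_y Q_yx.
db_error = max(abs(pi[x] * Q[x][y] - pi[y] * Q[y][x])
               for x in range(3) for y in range(3))
row_sums = [sum(row) for row in Q]
```

Both `db_error` and the entries of `row_sums` should vanish up to floating-point rounding, for any choice of `mu` in the simplex.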

The specific form of (2.1) with (2.2) emerges from the detailed balance condition of an underlying $N$-particle system, from which the dynamics we are interested in arise in the limit $N\to\infty$ (see [19]). Associated to a non-linear Markov triple $(\mathcal{X},Q,\pi)$ is the non-linear master equation

$$\dot{\mu}(t)=\mu(t)\,Q\bigl(\mu(t)\bigr)\;,\tag{2.4}$$

which is the deterministic evolution equation describing the mean-field limit of the underlying particle system. Based on the above assumption a gradient flow formulation of (2.4) is established in [19, Proposition 2.13] as we shall briefly recall.

Consider the Onsager operator $\mathcal{K}[\mu]:T_{\mu}^{*}\mathcal{P}(\mathcal{X})\to T_{\mu}\mathcal{P}(\mathcal{X})$ given by

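$$\mathcal{K}[\mu]\psi(x):=-\frac{1}{2}\sum_{y}\Lambda\bigl(\mu_{x}Q_{xy}(\mu),\mu_{y}Q_{yx}(\mu)\bigr)\bigl(\psi(y)-\psi(x)\bigr)\;,$$

where $\Lambda(a,b)=\int_{0}^{1}a^{1-s}b^{s}\,\mathrm{d}s=\frac{a-b}{\log a-\log b}$ is the logarithmic mean. Then the master equation can be written in gradient flow form using the functional $\mathcal{F}$ from (1.2):

$$\frac{\mathrm{d}\mu_{t}}{\mathrm{d}t}=-\mathcal{K}[\mu_{t}]\,D\mathcal{F}(\mu_{t})\;.\tag{2.5}$$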

In other words, (2.4) is the gradient flow of $\mathcal{F}$ with respect to the Riemannian structure on $\mathcal{P}(\mathcal{X})$ induced by the metric tensor $\mathcal{K}[\mu]^{-1}$ . Since this Riemannian metric degenerates at the boundary of $\mathcal{P}(\mathcal{X})$ we note the following characterization in metric terms. We consider the distance function on $\mathcal{P}(\mathcal{X})$ that is formally induced by the Riemannian metric $\mathcal{K}[\mu]^{-1}$ , i.e. for $\mu_{0},\mu_{1}\in\mathcal{P}(\mathcal{X})$ we set

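$$\mathcal{W}(\mu_{0},\mu_{1}):=\inf_{(\mu,\psi)\in\operatorname{CE}}\left(\int_{0}^{1}\mathcal{A}(\mu_{t},\psi_{t})\,\mathrm{d}t\right)^{1/2}\;,$$

where $\operatorname{CE}$ is the set of curves $(\mu_{t},\psi_{t})_{t\in[0,1]}$ with $t\mapsto\mu_{t}$ continuous, $\psi$ measurable and integrable in time, and satisfying the continuity equation

$$\dot{\mu}_{t}=\mathcal{K}[\mu_{t}]\psi_{t}\tag{2.6}$$

in distribution sense, and the action functional $\mathcal{A}$ is given by

$$\mathcal{A}(\mu,\psi)=\langle\psi,\mathcal{K}[\mu]\psi\rangle=\frac{1}{2}\sum_{x,y}\bigl(\psi(y)-\psi(x)\bigr)^{2}\,\Lambda\bigl(\mu_{x}Q_{xy}(\mu),\mu_{y}Q_{yx}(\mu)\bigr)\;.\tag{2.7}$$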

Proposition 2.2 (Gradient flow structure of the mean-field system).

Let $(\mathcal{X},Q,\pi)$ be a non-linear Markov triple satisfying Assumption 2.1. Then any solution to (2.4) is a gradient flow of $\mathcal{F}$ with respect to the distance $\mathcal{W}$ .

The distance $\mathcal{W}$ and the above gradient flow structure are extensions of the discrete transport distance constructed in [29] and the gradient flow structure of linear Markov chains to the non-linear case. See [19, Section 2.3] for more background on the construction of the distance $\mathcal{W}$ .
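
The logarithmic mean $\Lambda$ is the weight that enters both the Onsager operator and the action functional. As a small sanity check, the sketch below compares its closed form with a midpoint-rule approximation of the integral $\int_{0}^{1}a^{1-s}b^{s}\,\mathrm{d}s$. The sample values `a`, `b` and the number of quadrature nodes are arbitrary choices for illustration.

```python
import math

def log_mean(a, b):
    """Logarithmic mean (a - b) / (log a - log b), extended by Lambda(a, a) = a."""
    if a == b:
        return a
    return (a - b) / (math.log(a) - math.log(b))

def log_mean_quadrature(a, b, n=20000):
    """Midpoint rule for the integral of a^(1-s) * b^s over s in [0, 1]."""
    h = 1.0 / n
    return h * sum(a ** (1.0 - (k + 0.5) * h) * b ** ((k + 0.5) * h)
                   for k in range(n))

a, b = 0.7, 2.5
closed = log_mean(a, b)
numeric = log_mean_quadrature(a, b)
```

The two values agree up to quadrature error, and `closed` lies between the geometric mean $\sqrt{ab}$ and the arithmetic mean $(a+b)/2$, a classical property of the logarithmic mean.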

An immediate consequence of the gradient flow formulation (2.5) is the free energy dissipation relation established in [19, Remark 2.14]:

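$$\mathcal{F}(\mu_{t})+\int_{0}^{t}\mathcal{I}(\mu_{s})\,\mathrm{d}s=\mathcal{F}(\mu_{0})\quad\text{for any }t>0\;.\tag{2.8}$$

Here, the discrete Fisher information or dissipation $\mathcal{I}:\mathcal{P}(\mathcal{X})\rightarrow[0,\infty]$ is defined by

$$\mathcal{I}(\mu)=\begin{cases}\frac{1}{2}\sum\limits_{(x,y)\in E_{\mu}}\Theta\bigl(\mu_{x}Q_{xy}(\mu),\mu_{y}Q_{yx}(\mu)\bigr)\;,&\text{for }\mu\in\mathcal{P}^{*}(\mathcal{X})\\ +\infty\;,&\text{else}\end{cases}\;,\tag{2.9}$$

with $\Theta:\mathbf{R}_{+}\times\mathbf{R}_{+}\to\mathbf{R}_{+}$ defined by $\Theta(a,b)=(a-b)(\log a-\log b)$. In this framework, the Fisher information can be reinterpreted as the squared modulus of the gradient of the entropy with respect to the discrete transport metric $\mathcal{W}$, i.e. we have

$$\mathcal{I}(\mu)=\langle D\mathcal{F}(\mu),\mathcal{K}[\mu]D\mathcal{F}(\mu)\rangle\;.$$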
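
The identification of the Fisher information with the action evaluated at $\psi=D\mathcal{F}(\mu)$ rests on the pointwise identity $\Theta(a,b)=\Lambda(a,b)(\log a-\log b)^{2}$ together with detailed balance. The sketch below checks this numerically on a two-point space by computing $\mathcal{I}(\mu)$ once through $\Theta$ and once through the explicit sum in the action formula (2.7). The data `V`, `W`, the quadratic interaction energy, the rate choice $Q_{xy}=\sqrt{\pi_{y}/\pi_{x}}$ and the coordinate expression $D\mathcal{F}(\mu)_{x}=\log\mu_{x}+1+H_{x}(\mu)$ are assumptions made for this illustration.

```python
import math

# Hypothetical two-state data (illustration only, not taken from the paper).
V = [0.0, 0.4]
W = [[0.0, 0.6], [0.6, 0.0]]
n = 2

def H(mu):
    """H_x(mu) = V_x + sum_y W_xy mu_y, the derivative of the assumed energy U."""
    return [V[x] + sum(W[x][y] * mu[y] for y in range(n)) for x in range(n)]

def gibbs(mu):
    w = [math.exp(-h) for h in H(mu)]
    Z = sum(w)
    return [v / Z for v in w]

def rates(mu):
    """Assumed rates Q_xy = sqrt(pi_y / pi_x); only off-diagonal entries are used."""
    pi = gibbs(mu)
    return [[math.sqrt(pi[y] / pi[x]) if x != y else 0.0 for y in range(n)]
            for x in range(n)]

def log_mean(a, b):
    return a if a == b else (a - b) / (math.log(a) - math.log(b))

def fisher_theta(mu):
    """I(mu) = 1/2 sum_{x,y} Theta(mu_x Q_xy, mu_y Q_yx)."""
    Q = rates(mu)
    total = 0.0
    for x in range(n):
        for y in range(n):
            if x != y:
                a, b = mu[x] * Q[x][y], mu[y] * Q[y][x]
                total += (a - b) * (math.log(a) - math.log(b))
    return 0.5 * total

def fisher_action(mu):
    """1/2 sum_{x,y} (DF_y - DF_x)^2 Lambda(mu_x Q_xy, mu_y Q_yx)."""
    Q, h = rates(mu), H(mu)
    DF = [math.log(mu[x]) + 1.0 + h[x] for x in range(n)]
    total = 0.0
    for x in range(n):
        for y in range(n):
            if x != y:
                a, b = mu[x] * Q[x][y], mu[y] * Q[y][x]
                total += (DF[y] - DF[x]) ** 2 * log_mean(a, b)
    return 0.5 * total

mu = [0.3, 0.7]
I_theta, I_action = fisher_theta(mu), fisher_action(mu)
```

The two numbers coincide up to rounding, and they are strictly positive here because the chosen `mu` is not a fixed point of $\mu\mapsto\pi(\mu)$.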

2.2. Notation

We will use the following notation throughout the paper.

Given a function $\psi\in\mathbf{R}^{\mathcal{X}}$ we will denote by $\nabla\psi\in\mathbf{R}^{\mathcal{X}\times\mathcal{X}}$ its discrete gradient, given by

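$$\nabla\psi_{xy}=\psi_{y}-\psi_{x}\;.$$

For a function $\Psi\in\mathbf{R}^{\mathcal{X}\times\mathcal{X}}$ we denote by $\nabla\cdot\Psi$ its discrete divergence, given by

$$(\nabla\cdot\Psi)_{x}=\frac{1}{2}\sum_{y\in\mathcal{X}}\bigl(\Psi_{xy}-\Psi_{yx}\bigr)\;.$$

For $\psi,\phi\in\mathbf{R}^{\mathcal{X}}$ and $\Psi,\Phi\in\mathbf{R}^{\mathcal{X}\times\mathcal{X}}$ we will denote the Euclidean inner products by

$$\langle\psi,\phi\rangle=\sum_{x\in\mathcal{X}}\psi_{x}\phi_{x}\;,\qquad\langle\Psi,\Phi\rangle=\frac{1}{2}\sum_{x,y\in\mathcal{X}}\Psi_{xy}\Phi_{xy}\;.$$

Then we have the integration by parts formula

$$\langle\psi,\nabla\cdot\Phi\rangle=-\langle\nabla\psi,\Phi\rangle\;.$$

For functions $\Phi,\Psi$ in $\mathbf{R}^{\mathcal{X}\times\mathcal{X}}$, we denote by $\Phi\cdot\Psi$ the componentwise product. Using the shorthand notation $\Lambda(\mu)_{xy}:=\Lambda\bigl(\mu_{x}Q(\mu)_{xy},\mu_{y}Q(\mu)_{yx}\bigr)$ we can thus write the continuity equation (2.6) and the action functional (2.7) compactly as

$$\dot{\mu}_{t}+\nabla\cdot\bigl(\Lambda(\mu_{t})\cdot\nabla\psi_{t}\bigr)=0\;,\qquad\mathcal{A}(\mu,\psi)=\langle\nabla\psi,\Lambda(\mu)\cdot\nabla\psi\rangle\;.$$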

We will switch freely between notations for the components of functions $\psi\in\mathbf{R}^{\mathcal{X}}$ , $\Psi\in\mathbf{R}^{\mathcal{X}\times\mathcal{X}}$ as $\psi_{x},\Psi_{xy}$ or $\psi(x),\Psi(x,y)$ depending on what is more readable in the presence of other indices, e.g. a time parameter $t$ .
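
The discrete gradient, divergence and inner products of this section can be checked against the integration by parts formula $\langle\psi,\nabla\cdot\Phi\rangle=-\langle\nabla\psi,\Phi\rangle$ on random data. The following sketch uses plain Python lists; the state-space size, the seed and the function names are arbitrary choices for illustration.

```python
import random

random.seed(0)
n = 4  # size of the (hypothetical) state space

psi = [random.uniform(-1.0, 1.0) for _ in range(n)]
Phi = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

def grad(f):
    """Discrete gradient: (grad f)_{xy} = f_y - f_x."""
    return [[f[y] - f[x] for y in range(n)] for x in range(n)]

def div(F):
    """Discrete divergence: (div F)_x = 1/2 sum_y (F_xy - F_yx)."""
    return [0.5 * sum(F[x][y] - F[y][x] for y in range(n)) for x in range(n)]

def inner_vertex(a, b):
    """Inner product on functions of one variable."""
    return sum(a[x] * b[x] for x in range(n))

def inner_edge(A, B):
    """Inner product on functions of two variables, with the factor 1/2."""
    return 0.5 * sum(A[x][y] * B[x][y] for x in range(n) for y in range(n))

lhs = inner_vertex(psi, div(Phi))
rhs = -inner_edge(grad(psi), Phi)
```

The values `lhs` and `rhs` agree up to rounding for any `psi` and any (not necessarily antisymmetric) `Phi`.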

2.3. Equilibria and qualitative longtime behavior

From the gradient flow formulation, it is straightforward to obtain the following characterization of stationary states, which is completely analogous to the one for the McKean-Vlasov equation on $\mathbf{R}^{n}$ [8, Proposition 2.4 and Corollary 2.5].

Proposition 2.3 (Characterization of stationary points).

Let $(\mathcal{X},Q,\pi)$ be a non-linear Markov triple satisfying Assumption 2.1. Then, the following statements are equivalent:

(1) $\pi^{*}$ is a stationary solution to (1.1), that is $\pi^{*}Q(\pi^{*})=0$.

(2) $\pi^{*}$ is a fixed point of the map $\mu\mapsto\pi(\mu)$ from (2.1), that is $\pi^{*}=\pi(\pi^{*})$.

(3) $\pi^{*}$ is a critical point of $\mathcal{F}$ (1.2) on $\mathcal{P}(\mathcal{X})$.

(4) $\pi^{*}$ is a global minimizer of $\mathcal{I}$ (2.9), that is $\mathcal{I}(\pi^{*})=0$.

The set of all stationary points $\pi^{*}$ is denoted by $\varPi^{*}$ .

Moreover, it holds that $\varPi^{*}\subset\mathcal{P}^{*}(\mathcal{X})$ , i.e. each stationary point has strictly positive density.

Proof.

(1) $\Leftrightarrow$ (2): Let $\pi^{*}Q(\pi^{*})=0$ . The rate matrix $Q^{*}=Q(\pi^{*})$ is by assumption the rate matrix of an irreducible reversible Markov chain with unique reversible measure $\pi(\pi^{*})$ . In particular, it is also the unique stationary solution to $\pi(\pi^{*})Q^{*}=0$ and hence $\pi^{*}=\pi(\pi^{*})$ . If $\pi^{*}=\pi(\pi^{*})$ , we calculate using the local detailed balance condition (2.3) and find

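$$\sum_{x\in\mathcal{X}}\pi^{*}_{x}Q_{xy}(\pi^{*})=\sum_{x\in\mathcal{X}}\pi_{x}(\pi^{*})Q_{xy}(\pi^{*})=\sum_{x\in\mathcal{X}}\pi_{y}(\pi^{*})Q_{yx}(\pi^{*})=0\;,$$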

since $Q$ is a rate matrix.

(2) $\Leftrightarrow$ (3): Take $\mu\in\mathcal{P}^{*}(\mathcal{X})$ and any $\nu\in\mathcal{P}(\mathcal{X})$. Let $\mu_{s}=(1-s)\mu+s\nu$ be the standard linear interpolation. Then it holds that

$$\left.\frac{\mathrm{d}}{\mathrm{d}s}\mathcal{F}(\mu_{s})\right|_{s=0}=\sum_{x\in\mathcal{X}}\Bigl(\log\mu_{x}-1+\partial_{\mu_{x}}U(\mu)\Bigr)\left(\nu_{x}-\mu_{x}\right)=\sum_{x\in\mathcal{X}}\log\frac{\mu_{x}}{\pi_{x}(\mu)}\left(\nu_{x}-\mu_{x}\right)\;,$$

where we used the relations (2.1) and (2.2). Now, if $\mu=\pi^{*}=\pi(\pi^{*})$, the right hand side is zero and hence $\pi^{*}$ is a critical point of $\mathcal{F}$. On the other hand, if the right hand side is zero for all $\nu\in\mathcal{P}(\mathcal{X})$, it follows that $\mu_{x}=C\pi_{x}(\mu)$ for a constant $C$. Since $\mu,\pi(\mu)\in\mathcal{P}^{*}(\mathcal{X})$, we have that $C=1$ and hence critical points are fixed points.

(2) $\Leftrightarrow$ (4): Let $\pi^{*}=\pi(\pi^{*})$. Since $\mathcal{I}(\mu)\geq 0$ for all $\mu\in\mathcal{P}(\mathcal{X})$, we immediately find from the local detailed balance condition (2.3) that $\mathcal{I}(\pi^{*})=0$. Likewise, any global minimizer $\pi^{*}$ satisfies by the definition of $\mathcal{I}$ that $\pi^{*}_{x}Q_{xy}(\pi^{*})=\pi^{*}_{y}Q_{yx}(\pi^{*})$, that is the local detailed balance condition (2.3). Since again by assumption $Q(\pi^{*})$ has the unique reversible measure $\pi(\pi^{*})$, we conclude that $\pi^{*}=\pi(\pi^{*})$.

Finally, the positivity follows from the definition of $\pi(\mu)$ in (2.1) and the assumptions on $K$ implying that $H$ is finite. Hence, $\pi(\mu)\in\mathcal{P}^{*}(\mathcal{X})$ for all $\mu\in\mathcal{P}(\mathcal{X})$ implies in particular that $\pi(\pi^{*})=\pi^{*}\in\mathcal{P}^{*}(\mathcal{X})$ . ∎
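
The equivalence of (1) and (2) in Proposition 2.3 suggests a simple numerical experiment: look for a fixed point of $\mu\mapsto\pi(\mu)$ and verify that it is stationary for the master equation. The sketch below does this by plain fixed-point iteration. The paper does not claim that this iteration converges in general; it does here because the hypothetical interaction matrix `W` is weak enough for the map to be a contraction. The model data, the quadratic interaction energy and the rate choice $Q_{xy}=\sqrt{\pi_{y}/\pi_{x}}$ are assumptions made for illustration.

```python
import math

# Hypothetical weak-interaction data on three states (not from the paper).
V = [0.0, 0.3, -0.2]
W = [[0.0, 0.5, 0.1],
     [0.5, 0.0, 0.4],
     [0.1, 0.4, 0.0]]
n = 3

def gibbs(mu):
    """pi_x(mu) proportional to exp(-H_x(mu)), with H_x(mu) = V_x + sum_y W_xy mu_y."""
    w = [math.exp(-(V[x] + sum(W[x][y] * mu[y] for y in range(n))))
         for x in range(n)]
    Z = sum(w)
    return [v / Z for v in w]

def rates(mu):
    """Assumed Gibbs rates Q_xy = sqrt(pi_y / pi_x), diagonal fixed by zero row sums."""
    pi = gibbs(mu)
    Q = [[math.sqrt(pi[y] / pi[x]) if x != y else 0.0 for y in range(n)]
         for x in range(n)]
    for x in range(n):
        Q[x][x] = -sum(Q[x])
    return Q

# Iterate mu <- pi(mu), starting from the uniform measure.
mu = [1.0 / n] * n
for _ in range(200):
    mu = gibbs(mu)

pi_star = mu
fixed_point_gap = max(abs(a - b) for a, b in zip(pi_star, gibbs(pi_star)))

# Residual of the stationarity condition pi* Q(pi*) = 0 (row vector times matrix).
Q = rates(pi_star)
stationarity = max(abs(sum(pi_star[x] * Q[x][y] for x in range(n)))
                   for y in range(n))
```

Both `fixed_point_gap` and `stationarity` are at the level of rounding error, and all entries of `pi_star` are strictly positive, in line with the last statement of the proposition.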

Another useful piece of information provided by the gradient flow formulation is the free energy dissipation relation (2.8), which immediately shows that $\mathcal{F}$ is a Lyapunov function for the evolution (1.1). By standard theory, we can conclude the following qualitative longtime behavior.

Proposition 2.4 (Convergence to stationary points).

Let $Q$ satisfy Assumption 2.1. Then any solution $(\mu_{t})_{t\geq 0}$ to (1.1) satisfies $\mu_{t}\to\pi^{*}$ for some $\pi^{*}\in\varPi^{*}$ as $t\to\infty$.

Proof.

The proof follows along standard arguments from the theory of dynamical systems (see for instance [37, Section 6]).

By Assumption 2.1, $Q$ is Lipschitz on $\mathcal{P}(\mathcal{X})$ , which implies by standard well-posedness for ODEs, that the solutions $(\mu_{t})_{t\geq 0}$ to (1.1) are globally defined and generate a semigroup on $\mathcal{P}(\mathcal{X})$ . The $\omega$ -limit is given by

$$\omega(\mu)=\left\{\nu\in\mathcal{P}(\mathcal{X}):\mu_{t_{j}}\to\nu\text{ for some sequence }t_{j}\to\infty\right\}\;.$$

Since $\mathcal{P}(\mathcal{X})$ is compact, each orbit $\mathcal{O}^{+}(\mu_{0})=\bigcup_{t\geq 0}\mu_{t}$ for any $\mu_{0}\in\mathcal{P}(\mathcal{X})$ is also compact in $\mathcal{P}(\mathcal{X})$ and the $\omega$-limit is non-empty and quasi-invariant, that is for $\nu\in\omega(\mu_{0})$ it holds $\mathcal{O}^{+}(\nu)\subseteq\omega(\mu_{0})$. Moreover, again thanks to the compactness of $\mathcal{P}(\mathcal{X})$, it follows for any $\mu_{0}\in\mathcal{P}(\mathcal{X})$ that $\operatorname{dist}_{\mathcal{P}(\mathcal{X})}(\mu_{t},\omega(\mu_{0}))\to 0$ as $t\to\infty$ (see also [37, Lemma 6.7]).

Since the free energy functional $\mathcal{F}$ is continuous on $\mathcal{P}(\mathcal{X})$ and monotone along the flow, it follows that $\omega(\mu_{0})$ consists of complete orbits along which $\mathcal{F}$ has the constant value $\mathcal{F}^{\infty}=\lim_{t\to\infty}\mathcal{F}(\nu_{t})$ with $\nu_{0}\in\omega(\mu_{0})$ . By the free energy dissipation relation (2.8), it follows that for any $\nu_{0}\in\omega(\mu_{0})$ and any $t>0$ we have

[TABLE]

and hence the nonnegativity of $\mathcal{I}$ and continuity of trajectories imply $\mathcal{I}(\nu_{s})=0$ for all $s\in[0,t]$ . Hence, $\omega(\mu_{0})$ consists of all states $\nu$ such that $\mathcal{I}(\nu)=0$ , which by Proposition 2.3 entails $\nu\in\varPi^{*}$ and moreover also that $\nu$ is a stationary solution $\nu Q(\nu)=0$ . ∎
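
The qualitative picture of Proposition 2.4 can be observed numerically: along a discretized solution of the master equation the free energy decreases and the state approaches a stationary point. The sketch below uses an explicit Euler scheme, which is a swapped-in approximation and not a method from the paper. The model data, the quadratic interaction energy, the rate choice $Q_{xy}=\sqrt{\pi_{y}/\pi_{x}}$, the step size and the horizon are all assumptions made for illustration.

```python
import math

# Hypothetical three-state model (not from the paper).
V = [0.0, 0.3, -0.2]
W = [[0.0, 0.5, 0.1],
     [0.5, 0.0, 0.4],
     [0.1, 0.4, 0.0]]
n = 3

def gibbs(mu):
    w = [math.exp(-(V[x] + sum(W[x][y] * mu[y] for y in range(n))))
         for x in range(n)]
    Z = sum(w)
    return [v / Z for v in w]

def rates(mu):
    """Assumed Gibbs rates Q_xy = sqrt(pi_y / pi_x), diagonal fixed by zero row sums."""
    pi = gibbs(mu)
    Q = [[math.sqrt(pi[y] / pi[x]) if x != y else 0.0 for y in range(n)]
         for x in range(n)]
    for x in range(n):
        Q[x][x] = -sum(Q[x])
    return Q

def free_energy(mu):
    """F(mu) = sum_x mu_x log mu_x + U(mu) for the assumed quadratic energy U."""
    entropy = sum(m * math.log(m) for m in mu)
    U = sum(V[x] * mu[x] for x in range(n)) + 0.5 * sum(
        W[x][y] * mu[x] * mu[y] for x in range(n) for y in range(n))
    return entropy + U

def drift(mu):
    """Right-hand side mu Q(mu) of the master equation."""
    Q = rates(mu)
    return [sum(mu[x] * Q[x][y] for x in range(n)) for y in range(n)]

dt, steps = 0.01, 2000
mu = [0.8, 0.1, 0.1]
energies = [free_energy(mu)]
for _ in range(steps):
    v = drift(mu)
    mu = [mu[x] + dt * v[x] for x in range(n)]  # explicit Euler step
    energies.append(free_energy(mu))

largest_increase = max(energies[k + 1] - energies[k] for k in range(steps))
residual = max(abs(r) for r in drift(mu))
```

With this small step size the recorded free energies are non-increasing up to rounding, and `residual` measures how close the final state is to solving $\mu Q(\mu)=0$.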

Our purpose in this work can be summarized as giving sufficient conditions for which the above statement on convergence to equilibrium can be made quantitative (but which shall automatically enforce that $\varPi^{*}$ contains a single element).

3. Curvature for non-linear Markov chains

In this section, we introduce a notion of Ricci curvature lower bounds for non-linear Markov chains based on geodesic convexity of the entropy. This generalizes the notion of curvature for linear Markov chains developed in [21], inspired by the approach of Lott, Sturm and Villani [28, 35] to a synthetic notion of lower bounds on Ricci curvature for geodesic metric measure spaces.

Let $(\mathcal{X},Q,\pi)$ be a non-linear Markov chain according to Assumption 2.1 and let $\mathcal{F}$ be the associated free energy functional (1.2) and $\mathcal{W}$ the associated transport distance.

Definition 3.1 (Entropic Ricci curvature lower bound).

We say that $(\mathcal{X},Q,\pi)$ has Ricci curvature bounded below by $\kappa\in\mathbf{R}$ (for short $\operatorname{Ric}(\mathcal{X},Q)\geq\kappa$ ) if for any $\mathcal{W}$ -geodesic $(\mu_{t})_{t\in[0,1]}$ :

$$\mathcal{F}(\mu_{t})\leq(1-t)\mathcal{F}(\mu_{0})+t\mathcal{F}(\mu_{1})-\frac{\kappa}{2}t(1-t)\mathcal{W}(\mu_{0},\mu_{1})^{2}\;.$$

We will show that a lower bound on the Ricci curvature can be characterized equivalently by a lower bound on the Hessian of the free energy functional $\mathcal{F}$ with respect to the Riemannian structure on $\mathcal{P}_{*}(\mathcal{X})$ induced by $\mathcal{W}$, or via an Evolution Variational Inequality for the non-linear Markov dynamics.

To this end, we first derive the geodesic equation for the distance $\mathcal{W}$ as well as an expression for the first variation of the free energy.

Lemma 3.2 (Geodesic equation).

Let $(\mu_{t})_{t\in[0,1]}$ be a constant speed geodesic contained in $\mathcal{P}_{*}(\mathcal{X})$ . Then the unique potential $(\psi_{t})_{t\in[0,1]}$ such that $(\mu,\psi)\in\operatorname{CE}$ solves

[TABLE]

or explicitly

[TABLE]

where $\partial_{\mu(z)}$ is the derivative with respect to $\mu(z)$ .

*Remark 3.3**.*

In the case of a linear Markov chain, where $Q$ is independent of $\mu$ , the expression (3.1) simplifies to

[TABLE]

recovering the geodesic equation derived in [21, Prop. 3.4].

Proof.

Since $\mathcal{P}_{*}(\mathcal{X})$ is a smooth Riemannian manifold, uniqueness and smoothness of geodesics imply that the curve $\mu_{t}$ is smooth, and that there exists a unique (up to constants) potential $\psi_{t}$ such that $(\mu,\psi)\in\operatorname{CE}$ and which attains the infimum in the action

[TABLE]

and moreover $\psi$ is then also a smooth curve. We will derive (3.1) as the corresponding Euler–Lagrange equation. So let $\mu^{s}_{t}\in\mathcal{P}_{*}(\mathcal{X})$ for $s\in[-\varepsilon,\varepsilon]$ be a smooth perturbation of $\mu$ such that $\mu^{s}_{0}=\mu_{0}$ and $\mu^{s}_{1}=\mu_{1}$ for all $s$. Let $\psi^{s}_{t}$ be the unique potentials such that $(\mu^{s}_{\cdot},\psi^{s}_{\cdot})\in\operatorname{CE}$. Note that $\psi^{s}_{t}$ is smooth in $s$ and $t$. Then we have

[TABLE]

We compute

[TABLE]

From the continuity equation we infer that for any $\phi\in\mathbf{R}^{\mathcal{X}}$

[TABLE]

Plugging this into (3.2) for $s=0$ and integrating by parts in $t$ yields:

[TABLE]

The claim then follows by noting that

[TABLE]

and using that the perturbation $\partial_{s}\mu^{s}_{t}$ was arbitrary. ∎

In order to give convenient expressions for the first and second variation of the free energy $\mathcal{F}$ along a geodesic, we introduce the following notation.

We set

[TABLE]

and note that $\langle L_{\mu}\psi,\sigma\rangle=\langle\hat{L}_{\mu}\sigma,\psi\rangle$ , so $\hat{L}_{\mu}$ is the adjoint of $L_{\mu}$ . The master equation (1.1) then reads $\dot{\mu}_{t}=\hat{L}_{\mu_{t}}\mu_{t}$ . Note further that we can write

[TABLE]

where we set $\bigl(Q(\mu)\pi(\mu)\bigr)_{xy}=Q(\mu)_{xy}\pi(\mu)_{x}$, which is symmetric in $x,y$.
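The adjointness relation $\langle L_{\mu}\psi,\sigma\rangle=\langle\hat{L}_{\mu}\sigma,\psi\rangle$ can be checked numerically. The following is a minimal sketch, assuming the usual forms $L_{\mu}\psi(x)=\sum_{y}Q(\mu;x,y)\bigl(\psi(y)-\psi(x)\bigr)$ and $\hat{L}_{\mu}\sigma(x)=\sum_{y}\bigl[\sigma(y)Q(\mu;y,x)-\sigma(x)Q(\mu;x,y)\bigr]$, which are compatible with $\dot{\mu}_{t}=\mu_{t}Q(\mu_{t})$. The rate matrix `q` stands for $Q(\mu)$ at a frozen $\mu$; its numerical values and all function names are hypothetical.

```python
def generator(q, psi):
    """(L psi)(x) = sum over y != x of q[x][y] * (psi[y] - psi[x])."""
    n = len(psi)
    return [sum(q[x][y] * (psi[y] - psi[x]) for y in range(n) if y != x)
            for x in range(n)]


def adjoint(q, sigma):
    """(L^ sigma)(x) = sum over y != x of sigma[y] * q[y][x] - sigma[x] * q[x][y]."""
    n = len(sigma)
    return [sum(sigma[y] * q[y][x] - sigma[x] * q[x][y] for y in range(n) if y != x)
            for x in range(n)]


def pairing(u, v):
    """Euclidean pairing <u, v> on R^X."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical off-diagonal rates on three states (diagonal entries are unused).
q = [[0.0, 1.0, 2.0],
     [3.0, 0.0, 4.0],
     [5.0, 6.0, 0.0]]
psi = [1.0, -2.0, 0.5]
sigma = [0.2, 0.3, 0.5]

lhs = pairing(generator(q, psi), sigma)   # <L psi, sigma>
rhs = pairing(adjoint(q, sigma), psi)     # <L^ sigma, psi>
mass_change = sum(adjoint(q, sigma))      # the adjoint conserves total mass
```

Here `lhs` and `rhs` agree up to rounding, and `mass_change` vanishes, reflecting that the master equation preserves the total mass of $\mu$.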

Lemma 3.4 (First variation of the free energy).

Let $(\mu,\psi)\in\operatorname{CE}$ be a solution to the continuity equation. Then it holds

[TABLE]

Note that when the curve is a solution to the gradient flow equation, the right-hand side is indeed the discrete Fisher information, in accordance with (2.8).

Proof.

Starting from the expression

[TABLE]

recalling that $\partial_{\mu_{x}}U(\mu)=H_{x}(\mu)=-\log\pi_{x}(\mu)-\log Z(\mu)$ , and setting $\rho_{t}=\mu_{t}/\pi(\mu_{t})$ , we obtain from the continuity equation

[TABLE]

Here, we have also used in the last step that

[TABLE]

and integrated by parts. ∎

To give an expression of the second variation of $\mathcal{F}$ , we further introduce the following notation.

Let $\partial_{\mu_{z}}Q(\mu;x,y)$ denote the partial derivative of $Q(\cdot;x,y)$ with respect to $\mu_{z}$ . Then we write

[TABLE]

Furthermore, let us write

[TABLE]

Then, we set

[TABLE]

Finally, we can define the following quantity:

[TABLE]

*Remark 3.5**.*

Note that in the case of a linear Markov chain, the last two terms in the definition of $\mathcal{B}$ vanish and we recover the formula of [21] for the second derivative of the entropy along geodesics.

Lemma 3.6 (Second variation of the free energy).

Let $(\mu_{t})_{t}$ be a $\mathcal{W}$ -geodesic contained in $\mathcal{P}_{*}(\mathcal{X})$ and let $(\psi_{t})$ be the unique potential such that $(\mu,\psi)\in\operatorname{CE}$ . Then it holds

[TABLE]

Proof.

From (3.4) we get

[TABLE]

To calculate $I_{1}$ , first note that

[TABLE]

where $\delta_{xz}$ denotes the Kronecker delta. Hence, we infer from the geodesic equation (3.1) and (3.5) that

[TABLE]

The continuity equation $\dot{\mu}_{t}=-\nabla\cdot\bigl(\Lambda(\mu_{t})\cdot\nabla\psi_{t}\bigr)$ readily yields that

[TABLE]

To calculate $I_{3}$ , note that for any $\phi$ we have

[TABLE]

while for any $\mu$ and $\psi$ we have

[TABLE]

Thus, we get $I_{3}=\langle\nabla\psi_{t},M(\mu_{t})\nabla\psi_{t}\rangle$ . As $I_{1}+I_{2}+I_{3}=\mathcal{B}(\mu_{t},\psi_{t})$ , this yields the claim. ∎

We can now state the following equivalent characterizations of lower Ricci bounds:

Theorem 3.7.

Let $\kappa\in\mathbf{R}$ . For a non-linear Markov triple $(\mathcal{X},Q,\pi)$ the following assertions are equivalent:

(1)

$\operatorname{Ric}(\mathcal{X},Q,\pi)\geq\kappa$;

(2)

For all $\mu\in\mathcal{P}_{*}(\mathcal{X})$ and $\psi\in\mathbf{R}^{\mathcal{X}}$ we have

$$\mathcal{B}(\mu,\psi)\geq\kappa\,\mathcal{A}(\mu,\psi);$$

(3)

The following Evolution Variational Inequality $\operatorname{EVI}_{\kappa}$ holds: for all $\mu,\nu\in\mathcal{P}(\mathcal{X})$ and all $t\geq 0$:

$$\frac{1}{2}\frac{\mathrm{d}^{+}}{\mathrm{d}t}\mathcal{W}(\mu_{t},\nu)^{2}+\frac{\kappa}{2}\mathcal{W}(\mu_{t},\nu)^{2}\leq\mathcal{F}(\nu)-\mathcal{F}(\mu_{t}),$$

where $\mu_{t}$ denotes the solution to the non-linear Fokker–Planck equation starting from $\mu$, i.e. $\dot{\mu}_{t}=\hat{L}_{\mu_{t}}\mu_{t}=\mu_{t}Q(\mu_{t})$ and $\mu_{0}=\mu$.

By Lemma 3.6, (2) corresponds to a lower bound $\kappa$ on the Hessian of $\mathcal{F}$ in the Riemannian structure on $\mathcal{P}_{*}(\mathcal{X})$ induced by $\mathcal{W}$ . Note that the equivalence of (1) and (2) is a non-trivial assertion, since the Riemannian metric degenerates at the boundary of $\mathcal{P}(\mathcal{X})$ .

Proof.

The proof is based on an argument of Daneri and Savaré [15] suitably adapted to the discrete setting. We can follow verbatim the proof of [21, Thm. 4.5] where the analogue of Thm. 3.7 is proven for linear Markov chains. The core of the argument is a variation of the action along the evolution equation, [21, Lem. 4.6]. To accommodate the additional terms arising from the non-linear structure in the present situation, we have to replace that lemma with Lemma 3.8 below. ∎

Lemma 3.8.

Let $\{\mu^{s}\}_{s\in[0,1]}$ be a smooth curve in $\mathcal{P}_{*}(\mathcal{X})$ . For each $t\geq 0,$ let $\mu_{t}^{s}$ denote the solution of the non-linear Fokker–Planck equation at time $s+t$ starting from $\mu^{s}$ and let $\{\psi_{t}^{s}\}_{s\in[0,1]}$ be a smooth curve in $\mathbf{R}^{\mathcal{X}}$ satisfying the continuity equation

[TABLE]

Then the identity

[TABLE]

holds for every $s\in[0,1]$ and $t\geq 0$ .

Proof.

First of all, setting $\rho^{s}_{t}=\frac{\mu^{s}_{t}}{\pi(\mu^{s}_{t})}$ we compute as in Lemma 3.4 that

[TABLE]

Furthermore,

[TABLE]

In order to further manipulate $\bar{I}_{1}$ we first note that

[TABLE]

Further, we observe that for any $\phi\in\mathbf{R}^{\mathcal{X}}$

[TABLE]

To show (3.9), note that the left-hand side equals $\partial_{t}\partial_{s}\langle\mu_{t}^{s},\phi\rangle$ , while the right-hand side equals $\partial_{s}\partial_{t}\langle\mu_{t}^{s},\phi\rangle$ . Integrating by parts repeatedly and using (3.9) we obtain

[TABLE]

Thus, we arrive at

[TABLE]

To conclude, it suffices to note that

[TABLE]

further remark that for any $\phi$ we have

[TABLE]

and then use again (3.8). ∎

To end this section, we use Theorem 3.7 to give an expression for the optimal lower Ricci bound on the two-point space.

Lemma 3.9 (Two-point space).

Let $\bigl(\{0,1\},Q,\pi\bigr)$ be a non-linear Markov triple on the base space $\mathcal{X}=\{0,1\}$ and let $p(\mu):=Q(\mu;0,1)$ and $q(\mu):=Q(\mu;1,0)$ as well as $p^{\prime}(\mu)=[\partial_{\mu_{0}}-\partial_{\mu_{1}}]p(\mu)$ and $q^{\prime}(\mu)=[\partial_{\mu_{1}}-\partial_{\mu_{0}}]q(\mu)$. Then, the optimal constant $\kappa$ such that $\operatorname{Ric}(\{0,1\},Q,\pi)\geq\kappa$ is given by

[TABLE]

*Remark 3.10**.*

Note that in the case of a linear Markov chain, where $p$ and $q$ are independent of $\mu$ , and in particular $p^{\prime}\equiv 0\equiv q^{\prime}$ , we recover the formula in [29, Remark 2.11].

Proof.

First, we compute from (3.7) for any $\mu\in\mathcal{P}_{*}(\{0,1\})$ and non-constant $\psi$ :

[TABLE]

Now, note that $\hat{L}_{\mu}\mu(0)=-\hat{L}_{\mu}\mu(1)=\mu(1)q(\mu)-\mu(0)p(\mu)$ , yielding

[TABLE]

Furthermore, $\mathcal{A}(\mu,\psi)=\Lambda(\mu)(0,1)(\psi(1)-\psi(0))^{2}$. Thus, by Theorem 3.7, we get the optimal curvature bound $\kappa_{\text{opt}}$ by dividing the above identity by $\Lambda(\mu)(0,1)$ and minimizing over $\mu$. Now, we use the identities

[TABLE]

to get rid of the partial derivatives and obtain after some further simplifications the result (3.10). ∎

4. Consequences of Ricci bounds

In this section we derive consequences of Ricci curvature lower bounds for non-linear Markov chains in terms of functional inequalities and the trend to equilibrium for the dynamics. Throughout this section, let $(\mathcal{X},Q,\pi)$ be a non-linear Markov triple satisfying Assumption 2.1.

We first note the following expansion bound for the transport distance between solutions to the non-linear Markov dynamics.

Proposition 4.1.

Assume that $\operatorname{Ric}(\mathcal{X},Q)\geq\kappa$ for some $\kappa\in\mathbf{R}$ . Then for any two solutions $(\mu_{t}^{i})_{t\geq 0}$ to the non-linear evolution equation $\dot{\mu}^{i}_{t}=\mu^{i}_{t}Q(\mu^{i}_{t})$ , $i=1,2$ we have

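$$\mathcal{W}(\mu_{t}^{1},\mu_{t}^{2})\leq e^{-\kappa t}\,\mathcal{W}(\mu_{0}^{1},\mu_{0}^{2})\quad\text{for all }t\geq 0.$$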

In particular, when $\kappa>0$ , solutions with different initial data get closer at an exponential speed.

Proof.

This is a consequence of the EVIκ. It follows from [15, Prop. 3.1] applied to the functional $\mathcal{F}$ on the metric space $(\mathcal{P}(\mathcal{X}),\mathcal{W})$ . ∎

Next we prove some consequences of Ricci bounds in terms of different functional inequalities. These results can be seen as non-linear discrete analogues of classical results of Bakry and Émery [1] and of Otto and Villani [33]. They extend results that have been obtained in [21] for linear Markov chains, and are reminiscent of results of Carrillo, McCann and Villani [9] obtained for McKean–Vlasov equations in a continuous setting.

Let $\mathcal{F}$ be the free energy functional associated with $(\mathcal{X},Q,\pi)$ given by

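$$\mathcal{F}(\mu)=\sum_{x\in\mathcal{X}}\mu_{x}\log\mu_{x}+U(\mu),\qquad U(\mu)=\sum_{z\in\mathcal{X}}\mu_{z}K_{z}(\mu),$$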

and recall that $\mathcal{F}$ attains its minimum on $\mathcal{P}(\mathcal{X})$ . We set

$$\mathcal{F}_{*}(\mu):=\mathcal{F}(\mu)-\min_{\nu\in\mathcal{P}(\mathcal{X})}\mathcal{F}(\nu),$$

so that $\min\mathcal{F}_{*}=0$ . Recall that $\mathcal{I}$ is the discrete Fisher information, given by

[TABLE]

provided $\mu\in\mathcal{P}_{*}(\mathcal{X})$ and $\mathcal{I}(\mu)=+\infty$ else. Recall that $\mathcal{I}$ gives the dissipation of $\mathcal{F}$ along a solution $(\mu_{t})$ to the non-linear Fokker–Planck equation $\dot{\mu}_{t}=\mu_{t}Q(\mu_{t})$ . More precisely, we have

$$\frac{\mathrm{d}}{\mathrm{d}t}\mathcal{F}(\mu_{t})=-\mathcal{I}(\mu_{t}).$$

Note further that with $\rho=\mu/\pi(\mu)$ we have the expression $\mathcal{I}(\mu)=\mathcal{A}(\mu,-\log\rho)$ . The next result relates $\mathcal{F}$ , $\mathcal{I}$ and the transport distance $\mathcal{W}$ under a Ricci bound.

Theorem 4.2.

Assume that $\operatorname{Ric}(\mathcal{X},Q,\pi)\geq\kappa$ for some $\kappa\in\mathbf{R}$ . Then the $\mathcal{F}\mathcal{W}\mathcal{I}$ inequality holds with constant $\kappa\in\mathbf{R}$ , i.e. for all $\mu,\nu\in\mathcal{P}(\mathcal{X})$ ,

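$$\mathcal{F}_{*}(\mu)\leq\mathcal{F}_{*}(\nu)+\mathcal{W}(\mu,\nu)\sqrt{\mathcal{I}(\mu)}-\frac{\kappa}{2}\mathcal{W}(\mu,\nu)^{2}.$$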

Proof.

Fix $\mu,\nu\in\mathcal{P}(\mathcal{X})$ and assume without restriction that $\mu\in\mathcal{P}_{*}(\mathcal{X})$ since otherwise there is nothing to prove. Denote by $\mu_{t}$ the solution to $\dot{\mu}_{t}=\mu_{t}Q(\mu_{t})$ with $\mu_{0}=\mu$ and set $\rho_{t}=\mu_{t}/\pi(\mu_{t})$ . Theorem 3.7 yields that EVIκ holds, so in particular for $t=0$ :

[TABLE]

From the triangle inequality and the fact that $t\mapsto\mu_{t}$ is continuous with respect to $\mathcal{W}$ we obtain

[TABLE]

Now, note that since $(\mu_{t},-\log\rho_{t})\in\operatorname{CE}$ we can estimate

[TABLE]

Since $t\mapsto\mathcal{I}(\mu_{t})$ is a continuous function, we obtain

[TABLE]

which yields the claim. ∎

Theorem 4.3.

Assume that $\operatorname{Ric}(\mathcal{X},Q,\pi)\geq\lambda$ for some $\lambda>0$ . Then the following hold:

(i)

there exists a unique stationary point $\pi_{*}$, and it is the unique minimizer of $\mathcal{F}$;

(ii)

the modified logarithmic Sobolev inequality with constant $\lambda>0$ holds, i.e. for all $\mu\in\mathcal{P}(\mathcal{X})$ ,

$$\mathcal{F}_{*}(\mu)\leq\frac{1}{2\lambda}\mathcal{I}(\mu);\tag{$\operatorname{MLSI}(\lambda)$}$$

(iii)

for any solution $(\mu_{t})_{t\geq 0}$ to $\dot{\mu}_{t}=\mu_{t}Q(\mu_{t})$ we have exponential decay of the free energy:

$$\mathcal{F}_{*}(\mu_{t})\leq e^{-2\lambda t}\mathcal{F}_{*}(\mu_{0});\tag{4.2}$$

(iv)

the transport-entropy inequality with constant $\lambda>0$ holds, i.e. for all $\mu\in\mathcal{P}(\mathcal{X})$ ,

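$$\mathcal{W}(\mu,\pi_{*})\leq\sqrt{\frac{2}{\lambda}\mathcal{F}_{*}(\mu)}.\tag{$\operatorname{ET}(\lambda)$}$$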

Proof.

(i) From Proposition 2.3 we know that the set $\Pi_{*}$ of stationary points is non-empty and that it coincides with the set of local minimizers of $\mathcal{F}$. Assume for contradiction that $\mathcal{F}$ has two distinct local minima at points $\mu_{0}$ and $\mu_{1}$, with $\mathcal{F}(\mu_{0})\leq\mathcal{F}(\mu_{1})$, and let $(\mu_{s})_{s\in[0,1]}$ be a constant speed geodesic connecting $\mu_{0}$ and $\mu_{1}$. Then we infer from $\operatorname{Ric}(\mathcal{X},Q,\pi)\geq\lambda$ that

[TABLE]

Since $\mu_{1}$ is a local minimum, there is an $\epsilon>0$ such that $\mathcal{F}(\mu_{1-\epsilon})\geq\mathcal{F}(\mu_{1})$ . This leads to

[TABLE]

a contradiction. Hence, $\Pi_{*}=\{\pi_{*}\}$ is a singleton and $\pi_{*}$ is the unique global minimizer of $\mathcal{F}$ .

(ii) By Theorem 4.2, we have that $\mathcal{F}\mathcal{W}\mathcal{I}(\lambda)$ holds. Applying $\mathcal{F}\mathcal{W}\mathcal{I}(\lambda)$ with $\mu\in\mathcal{P}(\mathcal{X})$ and $\nu=\pi_{*}$, noting that $\mathcal{F}_{*}(\pi_{*})=0$, and using Young’s inequality

$$xy\leq cx^{2}+\frac{1}{4c}y^{2}\quad\text{for all }x,y\in\mathbf{R}\text{ and }c>0,$$

with $x=\mathcal{W}(\mu,\pi_{*})$ , $y=\sqrt{\mathcal{I}(\mu)}$ and $c=\lambda/2$ yields the claim.
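Written out, this step is the following short chain of inequalities. It is a sketch under the assumption that the $\mathcal{F}\mathcal{W}\mathcal{I}$ inequality of Theorem 4.2 reads $\mathcal{F}_{*}(\mu)\leq\mathcal{F}_{*}(\nu)+\mathcal{W}(\mu,\nu)\sqrt{\mathcal{I}(\mu)}-\frac{\lambda}{2}\mathcal{W}(\mu,\nu)^{2}$ and that Young's inequality is used in the form $xy\leq cx^{2}+y^{2}/(4c)$.

```latex
\begin{align*}
\mathcal{F}_*(\mu)
  &\le \mathcal{W}(\mu,\pi_*)\sqrt{\mathcal{I}(\mu)}
       - \frac{\lambda}{2}\,\mathcal{W}(\mu,\pi_*)^2
  && \text{(FWI with } \nu=\pi_*,\ \mathcal{F}_*(\pi_*)=0\text{)} \\
  &\le \frac{\lambda}{2}\,\mathcal{W}(\mu,\pi_*)^2
       + \frac{1}{2\lambda}\,\mathcal{I}(\mu)
       - \frac{\lambda}{2}\,\mathcal{W}(\mu,\pi_*)^2
  && \text{(Young with } c=\lambda/2\text{)} \\
  &= \frac{1}{2\lambda}\,\mathcal{I}(\mu).
\end{align*}
```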

(iii) From $\operatorname{MLSI}(\lambda)$ we infer that for a solution $(\mu_{t})_{t}$ we have

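$$\frac{\mathrm{d}}{\mathrm{d}t}\mathcal{F}_{*}(\mu_{t})=-\mathcal{I}(\mu_{t})\leq-2\lambda\mathcal{F}_{*}(\mu_{t}),$$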

and we obtain (4.2) as a consequence of Gronwall’s lemma.
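Spelled out, the Gronwall step is the computation below. It uses only the dissipation identity $\frac{\mathrm{d}}{\mathrm{d}t}\mathcal{F}_{*}(\mu_{t})=-\mathcal{I}(\mu_{t})$ and $\operatorname{MLSI}(\lambda)$, assumed here in the normalization $\mathcal{F}_{*}\leq\frac{1}{2\lambda}\mathcal{I}$ that results from the Young step in (ii).

```latex
\begin{align*}
\frac{\mathrm{d}}{\mathrm{d}t}\Bigl(e^{2\lambda t}\,\mathcal{F}_*(\mu_t)\Bigr)
  &= e^{2\lambda t}\Bigl(2\lambda\,\mathcal{F}_*(\mu_t) - \mathcal{I}(\mu_t)\Bigr) \le 0, \\
\text{hence}\quad \mathcal{F}_*(\mu_t)
  &\le e^{-2\lambda t}\,\mathcal{F}_*(\mu_0)
  \quad\text{for all } t \ge 0.
\end{align*}
```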

(iv) It suffices to establish $\operatorname{ET}(\lambda)$ for any $\mu\in\mathcal{P}_{*}(\mathcal{X})$. The inequality for general $\mu$ can then be obtained by approximation, taking into account the continuity of $\mathcal{W}$ with respect to the Euclidean metric on $\mathcal{P}(\mathcal{X})$. So fix $\mu\in\mathcal{P}_{*}(\mathcal{X})$, and let $\mu_{t}$ be the solution to the non-linear Fokker–Planck equation starting from $\mu$. From Proposition 2.4 we have that $\mu_{t}\to\pi_{*}$ as $t\to\infty$ and that

[TABLE]

The last property follows from the continuity of $\mathcal{W}$ with respect to the Euclidean distance. We now define the function $G:\mathbf{R}_{+}\to\mathbf{R}_{+}$ by

$$G(t):=\mathcal{W}(\mu,\mu_{t})+\sqrt{\frac{2}{\lambda}\mathcal{F}_{*}(\mu_{t})}.$$

Obviously we have $G(0)=\sqrt{\frac{2}{\lambda}\mathcal{F}_{*}(\mu)}$ and by (4.3) we have that $G(t)\to\mathcal{W}(\mu,\pi_{*})$ as $t\to\infty$ . Hence it is sufficient to show that $G$ is non-increasing. To this end we show that its upper right derivative is non-positive. If $\mu_{t}\neq\pi_{*}$ we deduce from (4.1) that

[TABLE]

where we used $\operatorname{MLSI}(\lambda)$ in the last inequality. If $\mu_{t}=\pi_{*}$ , then the relation also holds true, since this implies that $\mu_{r}=\pi_{*}$ for all $r\geq t$ . ∎

5. Some examples of curvature bounds

We shall now compute lower bounds on the curvature for several examples of mean-field dynamics, inspired by classical models of statistical physics.

5.1. Curie-Weiss model

Let us consider the following example, also mentioned in [5, Example 4.2], which is the infinite-particle limit of the classical Curie-Weiss model, one of the simplest examples of Markovian dynamics exhibiting a phase transition. Let us take $\mathcal{X}=\{0,1\}$ and define $K$ for $\beta>0$ by

[TABLE]

with $V\equiv 0,\,W(0,0)=0=W(1,1),$ and $W(0,1)=1=W(1,0)$ . Hence, we have

[TABLE]

The free energy $\mathcal{F}(\mu)$ for the Curie-Weiss model is given by

[TABLE]

Since $\mu(0)+\mu(1)=1$ , we have that the free energy is essentially given by the function $f_{\beta}:[0,1]\to\mathbf{R}$

[TABLE]

Hence, $f_{\beta}$ is convex on $[0,1]$ for $\beta\in[0,1]$ and non-convex for $\beta>1$ .
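The threshold $\beta=1$ is also visible at the level of stationary states. As a minimal sketch, assume the Glauber rates $p(\mu)=\exp(-\beta(2u-1))=1/q(\mu)$ with $u=\mu(0)$, as used in the proof of Proposition 5.1. The stationarity condition $u\,p(\mu)=(1-u)\,q(\mu)$ then becomes $x=\tanh(\beta x)$ with $x=2u-1$. The helper name `nonzero_fixed_point` and the bisection scheme are illustrative choices, not taken from the paper.

```python
import math


def nonzero_fixed_point(beta: float, tol: float = 1e-12):
    """Return the positive root of tanh(beta * x) = x, or None if beta <= 1."""
    if beta <= 1.0:
        # tanh(beta * x) < x for every x in (0, 1], so x = 0 is the only root.
        return None

    def h(x: float) -> float:
        return math.tanh(beta * x) - x

    # For beta > 1, h is positive just right of 0 and h(1) = tanh(beta) - 1 < 0.
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


subcritical = nonzero_fixed_point(0.5)    # no symmetry-breaking state
supercritical = nonzero_fixed_point(2.0)  # magnetization x of the state u = (1 + x) / 2
```

For $\beta\leq 1$ only $x=0$, the uniform state, is stationary. For $\beta>1$ two further states $\pm x$ appear, so by Theorem 4.3 (i) no positive curvature bound can hold in that regime.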

The local detailed balance state $\pi(\mu)$ (2.1) is given by

[TABLE]

Therefore, it holds

[TABLE]

We use Glauber rates and set

[TABLE]

With this choice, we can estimate the Ricci curvature of the limit with the help of Lemma 3.9.

Proposition 5.1 ( $\lambda$ -Convexity of Curie-Weiss model with Glauber rates).

It holds for $\beta\in[0,1]$

[TABLE]

As a consequence of this curvature bound, one can derive the modified logarithmic Sobolev inequality for the nonlinear dynamics. This inequality could also be derived from a logarithmic Sobolev inequality for the particle Gibbs sampler of [30] by passing to the limit in the number of particles. In [26], the mLSI was also derived via convexity of the entropy, but along a different family of interpolations of probability measures. At a technical level, the proof of [26] requires differentiating the entropy three times rather than twice, which involves more technical estimates (this is not much of an issue for a two-point system like Curie-Weiss, but becomes much more complicated for more involved systems, such as the ones we shall see later in this section).
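The trend to equilibrium behind this inequality can be observed numerically. The following sketch integrates the two-state evolution $\dot{u}=(1-u)\,q(\mu)-u\,p(\mu)$ for $u=\mu(0)$ with an explicit Euler scheme, using the rates $p(\mu)=\exp(-\beta(2u-1))=1/q(\mu)$ from the proof below. The step size, the time horizon and the function names are illustrative assumptions.

```python
import math


def glauber_rates(u: float, beta: float):
    """Rates p = Q(mu; 0, 1) and q = Q(mu; 1, 0), where u = mu(0)."""
    p = math.exp(-beta * (2.0 * u - 1.0))
    return p, 1.0 / p


def simulate(u0: float, beta: float, dt: float = 1e-3, steps: int = 20000) -> float:
    """Explicit Euler for u' = (1 - u) q - u p; returns u at time dt * steps."""
    u = u0
    for _ in range(steps):
        p, q = glauber_rates(u, beta)
        u += dt * ((1.0 - u) * q - u * p)
    return u


high_temperature = simulate(0.9, beta=0.5)  # relaxes to the uniform state u = 1/2
low_temperature = simulate(0.9, beta=2.0)   # stays in the symmetry-broken phase
```

In the variable $x=2u-1$ the equation reads $\dot{x}=2\bigl(\sinh(\beta x)-x\cosh(\beta x)\bigr)$, whose linearization at $x=0$ is $\dot{x}=-2(1-\beta)x$. This is consistent with the value $\kappa(0)=2(1-\beta)$ computed in the proof.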

Proof.

We set $\mu(0)=u$ and $\mu(1)=1-u$, for which the rates become $p(\mu)=\exp(-\beta(2u-1))=1/q(\mu)$. First, we note that, with the notation of Lemma 3.9, we have $p^{\prime}(\mu)=-2\beta p(\mu)$ and $q^{\prime}(\mu)=-2\beta q(\mu)$.

The expression in the infimum of (3.10) to optimize becomes

[TABLE]

It will be convenient to do the variable substitution $x=2u-1$ . For obtaining the expression in a compact manner, we introduce two auxiliary functions

[TABLE]

We then obtain, using the identities $up+(1-u)q=g_{1}(x)$ , $up-(1-u)q=-g_{2}(x)$ and $\operatorname{arctanh}(x)=\frac{1}{2}\log\frac{1+x}{1-x}$ and after some rewriting,

[TABLE]

A simple evaluation yields $\kappa(0)=2(1-\beta)$ , where we note $\frac{g_{2}(x)}{\beta x-\operatorname{arctanh}(x)}\to 1$ as $x\to 0$ .

For the lower bound, we proceed in several steps. We first observe that

[TABLE]

Now, the claim follows once we have shown

[TABLE]

Indeed, the last term in (5.2), combined with the above estimate, is bounded from below by $1-(1-x^{2})\beta\geq 1-\beta$ , which proves (5.1). To prove (5.3), we do another substitution and set $x=\tanh(y)$ . Therefore, the function $g_{2}$ becomes after transformations by hyperbolic trigonometric identities

[TABLE]

and we can estimate the left hand side of (5.3) by

[TABLE]

where we used the bound $\sinh(z)/z\geq 1$ for all $z\in\mathbf{R}$ . This can be further estimated by observing that $\cosh(y)\geq 1$ for all $y\in\mathbf{R}$ and by using the identity $1-\tanh^{2}y=1/\cosh^{2}y$ , to obtain

[TABLE]

by the substitution $x=\tanh(y)$ , which proves (5.3). ∎

5.2. General zero-range/misanthrope processes

In this section, we consider mean-field limits of particle systems with rate matrix of the form

[TABLE]

These systems generalize usual linear Markov chains encoded in $p(x,y)$ by an additional dependency of the jump rate on the population density of the departure and arrival site of the jump. This model, first introduced in [14], incorporates many examples, such as for instance the zero range process, for which $c(\mu_{x},\mu_{y})=b(\mu_{x})$ , but also interacting agent/voter models [39], for which $c(\mu_{x},\mu_{y})=a(\mu_{y})$ .

Since our method in this section is perturbative in nature, we restrict to the complete graph as underlying graph, that is, $p(x,y)=1$ for all $x\neq y$. In this case the mean-field limit from the $N$-particle system was derived in [23] and the limit equation was investigated in [34]. Since positive curvature is known in the case of independent particles on the complete graph [21], we expect that for $c(\mu_{x},\mu_{y})=T+\tilde{c}(\mu_{x},\mu_{y})$ with bounded $\tilde{c}:\mathcal{P}(\mathcal{X})\times\mathcal{P}(\mathcal{X})\to[0,\infty)$, we should also obtain positive entropic curvature for the nonlinear models when $T$ is sufficiently large.

To have a gradient flow formulation, the chain has to satisfy the local detailed balance condition (2.3)

[TABLE]

For the further analysis, we focus on the separable case, where for some $a,b:[0,1]\to\mathbf{R}_{+}$ we have

[TABLE]

It is easy to verify that (5.4) is satisfied for

[TABLE]

This is of the form (2.2) for a potential $U$ given e.g. by

[TABLE]

i.e. for $K$ given by $K_{x}(\mu)=u(\mu_{x})/\mu_{x}$ .
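The local detailed balance condition can be verified numerically in the separable case. The sketch below assumes that the separable rates have the product form $Q(\mu;x,y)=b(\mu_{x})\,a(\mu_{y})$ for $x\neq y$ on the complete graph, and that the local equilibrium is $\pi_{x}(\mu)\propto a(\mu_{x})/b(\mu_{x})$. Both are assumptions of this illustration, since the displayed formulas are not reproduced here. The functions `a` and `b` below are hypothetical examples.

```python
def local_equilibrium(mu, a, b):
    """pi_x(mu) proportional to a(mu_x) / b(mu_x), normalised to total mass one."""
    weights = [a(m) / b(m) for m in mu]
    z = sum(weights)
    return [w / z for w in weights]


def detailed_balance_defect(mu, a, b):
    """Largest |pi_x Q(mu; x, y) - pi_y Q(mu; y, x)| for Q(mu; x, y) = b(mu_x) a(mu_y)."""
    pi = local_equilibrium(mu, a, b)
    n = len(mu)
    return max(
        abs(pi[x] * b(mu[x]) * a(mu[y]) - pi[y] * b(mu[y]) * a(mu[x]))
        for x in range(n) for y in range(n) if x != y
    )


# Hypothetical positive rate functions on [0, 1].
a = lambda r: 1.0 + r
b = lambda r: 2.0 + r * r

mu = [0.2, 0.3, 0.5]
pi = local_equilibrium(mu, a, b)
defect = detailed_balance_defect(mu, a, b)
```

The defect vanishes up to rounding because $\pi_{x}(\mu)\,b(\mu_{x})\,a(\mu_{y})$ is proportional to $a(\mu_{x})\,a(\mu_{y})$, which is symmetric in $x$ and $y$.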

Example 5.2.

There are two subclasses of models of particular interest:

[TABLE]

Both models satisfy the local detailed balance condition (2.3) for

[TABLE]

and

[TABLE]

For the first, the interacting agent model from [39] is recovered by setting $a(\mu_{y})=T/d+f(\mu_{y})$, where $d=\lvert\mathcal{X}\rvert$ is the (constant) degree of the complete graph. For these models, [39] proves a spectral gap via another notion of discrete curvature, which, however, is not strong enough to derive the mLSI. In the second case, the dynamics corresponds to (a scaling limit of) a zero-range process. This type of particle system is commonly used in statistical physics as a toy model for understanding various large-scale features of interacting systems (scaling limits, long-time behavior, phase transitions). We refer to [27] for an overview. The long-time behavior of the $N$-particle system in various situations was studied for example in [7, 6, 22, 24]. Recently, Hermon and Salez [25] significantly improved on the state of the art using a combination of the Lu-Yau martingale method and a monotone coupling argument, establishing a modified logarithmic Sobolev inequality independent of the number of particles for mean-field zero-range processes in a non-perturbative setting, even in some inhomogeneous situations where the curvature approach cannot work.
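To make the first subclass concrete, the sketch below integrates the mean-field equation $\dot{\mu}=\mu Q(\mu)$ on the complete graph, assuming separable rates of the product form $Q(\mu;x,y)=b(\mu_{x})\,a(\mu_{y})$ for $x\neq y$. The choice $a(r)=T/d+r$ and $b\equiv 1$, that is $f(r)=r$, is a hypothetical instance of the interacting agent model. The explicit Euler scheme, step size and function names are illustrative.

```python
def misanthrope_rhs(mu, a, b):
    """Right-hand side of mu' = mu Q(mu) with Q(mu; x, y) = b(mu_x) a(mu_y), x != y."""
    n = len(mu)
    out = [0.0] * n
    for x in range(n):
        for y in range(n):
            if x != y:
                gain = mu[y] * b(mu[y]) * a(mu[x])   # jumps y -> x
                loss = mu[x] * b(mu[x]) * a(mu[y])   # jumps x -> y
                out[x] += gain - loss
    return out


def evolve(mu0, a, b, dt=1e-3, steps=10000):
    """Explicit Euler integration up to time dt * steps."""
    mu = list(mu0)
    for _ in range(steps):
        rhs = misanthrope_rhs(mu, a, b)
        mu = [m + dt * r for m, r in zip(mu, rhs)]
    return mu


# Hypothetical parameters: d = 3 sites, T = 1, f(r) = r.
T, d = 1.0, 3
a = lambda r: T / d + r
b = lambda r: 1.0

mu_final = evolve([0.7, 0.2, 0.1], a, b)
```

For this particular choice the right-hand side reduces to $\dot{\mu}_{x}=-T(\mu_{x}-1/d)$, so the relaxation to the uniform state is exactly exponential with rate $T$. Other choices of $f$ genuinely require the numerical scheme.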

In this separable case, we can prove the following statement.

Theorem 5.3 (Curvature for separable kernels).

Assume the rates are separable, given by

[TABLE]

Suppose that

[TABLE]

Moreover, assume that

[TABLE]

Then $\operatorname{Ric}\geq\kappa$ in the sense of Theorem 3.7 with $\kappa$ given by

[TABLE]

In particular, in the regime $\frac{\max\{\operatorname{Lip}a,\operatorname{Lip}b\}}{\min\{\underline{a},\underline{b}\}}=:\eta\ll 1$ it holds

[TABLE]

Proof.

First, we evaluate some of the quantities occurring in the derivation of the curvature estimate. Let us start with (3.3), for which we have

[TABLE]

and

[TABLE]

The next quantity (3.5) becomes

[TABLE]

The last quantity is (3.6)

[TABLE]

from which after symmetrization, we obtain the identity

[TABLE]

We will use the following identity for the logarithmic mean

[TABLE]

To compensate off-diagonal terms, we need the following estimate for the logarithmic mean [22, Lemma A.2]

[TABLE]

The above basic identities shall be used to estimate the four terms in (3.7), which we denote by $\operatorname{I},\operatorname{II},\operatorname{III}$ and $\operatorname{IV}$ in this order of occurrence.
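For orientation, the logarithmic mean in its standard form is $\theta(s,t)=(s-t)/(\log s-\log t)$ for $s\neq t$ and $\theta(s,s)=s$. The sketch below does not reproduce (5.11) or (5.12). It only illustrates two well-known properties of $\theta$: one-homogeneity, and the fact that $\theta$ lies between the geometric and the arithmetic mean. The function name `log_mean` and the sample values are illustrative choices.

```python
import math


def log_mean(s: float, t: float) -> float:
    """Logarithmic mean (s - t) / (log s - log t) for s, t > 0; equals s when s == t."""
    if s == t:
        return s
    return (s - t) / (math.log(s) - math.log(t))


s, t = 2.0, 5.0
theta = log_mean(s, t)
geometric = math.sqrt(s * t)
arithmetic = 0.5 * (s + t)
scaled = log_mean(3.0 * s, 3.0 * t)   # one-homogeneity: equals 3 * theta
```

For nearly equal arguments the quotient suffers from cancellation, so a production implementation would switch to a series expansion there. That refinement is omitted in this sketch.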

First term $\operatorname{I}$ : Let us start estimating the first term in (3.7) and use the identity (5.11)

[TABLE]

where we introduced $A(\mu)=\sum_{z}a(\mu_{z})$ . Although $\operatorname{I}_{1}$ is non-negative, we will keep it to compensate for terms from $\operatorname{II}$ and $\operatorname{IV}$ . To do so, we compactify notation further by introducing the tilted measure

[TABLE]

With this definition and with the one-homogeneity of $\Lambda$ , we can rewrite

[TABLE]

and likewise for the zero-homogeneous derivatives $(i=1,2)$

[TABLE]

With this notation we want to employ the estimate (5.12) in the form

[TABLE]

Now, we can bound $\operatorname{I}_{1}$ from below by

[TABLE]

Second term $\operatorname{II}$ : Let us continue with the second term in (3.7) for which we use (5.8), symmetrize the sum and obtain

[TABLE]

where we used the Young inequality $uv\leq u^{2}/2+v^{2}/2$ .

Third term $\operatorname{III}$ : For estimating the third term in (3.7), denoted by $\operatorname{III}$ , we use (5.9), do a crude estimate to again apply (5.11)

[TABLE]

To bound the infimum, we observe that

[TABLE]

Hence, in total we obtain

[TABLE]

Fourth term $\operatorname{IV}$ : For estimating the fourth term in (3.7), denoted by $\operatorname{IV}$ , we use (5.10) and compensate it partly by $\operatorname{I}_{2}$ from (5.14)

[TABLE]

*Conclusion:* We combine all the estimates of the individual terms in (3.7), using the decomposition $\mathcal{B}=\operatorname{I}+\operatorname{II}+\operatorname{III}+\operatorname{IV}$. There is one small catch: after applying the first bound (5.13) to $\operatorname{I}$, we split, for $\lambda\in(0,1)$, the non-negative part $\operatorname{I}_{1}$ into $(1-\lambda)\operatorname{I}_{1}$ and $\lambda\operatorname{I}_{1}$, and apply the bound (5.14) only to the second term $\lambda\operatorname{I}_{1}$. The other three estimates (5.15), (5.16) and (5.17) are applied in a straightforward manner to $\operatorname{II}$, $\operatorname{III}$ and $\operatorname{IV}$, respectively, to arrive at the lower bound

[TABLE]

If $\lambda$ is chosen according to (5.5), then using $\operatorname{I}_{1}\geq 0$ we arrive at the bound $\mathcal{B}(\mu,\psi)\geq\kappa\mathcal{A}(\mu,\psi)$ with $\kappa$ given in (5.6). The final statement (5.7) follows by simple calculus from the bound $\bar{a}\leq\underline{a}+\operatorname{Lip}a$, similarly $\bar{b}\leq\underline{b}+\operatorname{Lip}b$, and from observing that $\lambda=O(\eta)$ in this case. ∎

Acknowledgments: This work was supported by the PHC Procope project “Entropic Ricci curvature bounds of interacting particle systems and their mean-field limits”. MF was additionally supported by ANR-11-LABX-0040-CIMI within the program ANR-11-IDEX-0002-02, as well as Projects EFI (ANR-17-CE40-0030) and MESA (ANR-18-CE40-006) of the French National Research Agency (ANR). ME and AS were additionally supported by the German Research Foundation (DFG) through the “Hausdorff Center for Mathematics” and the CRC 1060 “The Mathematics of Emergent Effects”.

Bibliography

[1] Dominique Bakry and Michel Émery. Diffusions hypercontractives. In Séminaire de probabilités, XIX, 1983/84, volume 1123 of Lecture Notes in Math., pages 177–206. Springer, Berlin, 1985.
[2] Dario Benedetto, Emanuele Caglioti, and Mario Pulvirenti. A kinetic equation for granular media. RAIRO Modél. Math. Anal. Numér., 31(5):615–641, 1997.
[3] François Bolley, Ivan Gentil, and Arnaud Guillin. Uniform convergence to equilibrium for granular media. Arch. Ration. Mech. Anal., 208(2):429–445, 2013.
[4] Amarjit Budhiraja, Paul Dupuis, Markus Fischer, and Kavita Ramanan. Limits of relative entropies associated with weakly interacting particle systems. Electron. J. Probab., 20:no. 80, 22, 2015.
[5] Amarjit Budhiraja, Paul Dupuis, Markus Fischer, and Kavita Ramanan. Local stability of Kolmogorov forward equations for finite state nonlinear Markov processes. Electron. J. Probab., 20:no. 81, 30, 2015.
[6] Pietro Caputo, Paolo Dai Pra, and Gustavo Posta. Convex entropy decay via the Bochner-Bakry-Emery approach. Ann. Inst. Henri Poincaré Probab. Stat., 45(3):734–753, 2009.
[7] Pietro Caputo and Gustavo Posta. Entropy dissipation estimates in a zero-range dynamics. Probab. Theory Related Fields, 139(1-2):65–87, 2007.
[8] J. A. Carrillo, R. S. Gvalani, G. A. Pavliotis, and A. Schlichting. Long-time behaviour and phase transitions for the McKean–Vlasov equation on the torus. Archive for Rational Mechanics and Analysis, Jul 2019.
