Equilibrium large deviations for mean-field systems with translation invariance

Julien Reygner (CERMICS)

arXiv:1706.08780·math.PR·April 25, 2019

TL;DR

This paper establishes large deviation principles for mean-field particle systems with translation invariance, covering McKean-Vlasov and rank-based diffusions, with applications to capital distribution analysis.

Contribution

It introduces a framework for large deviations in translation-invariant mean-field systems, including new results for systems without external potential and in orbit spaces.

Findings

01. Large deviation principles are proved for equilibrium empirical measures.

02. Results apply to systems with and without external potential.

03. Application to atypical capital distribution is demonstrated.

Abstract

We consider particle systems with mean-field interactions whose distribution is invariant by translations. Under the assumption that the system seen from its centre of mass be reversible with respect to a Gibbs measure, we establish large deviation principles for its empirical measure at equilibrium. Our study covers the cases of McKean-Vlasov particle systems without external potential, and systems of rank-based interacting diffusions. Depending on the strength of the interaction, the large deviation principles are stated in the space of centered probability measures endowed with the Wasserstein topology of appropriate order, or in the orbit space of the action of translations on probability measures. An application to the study of atypical capital distribution is detailed.

Equations

$$\int_{x\in\mathbb{R}^{d}}f(x)\,\mathrm{d}\tau_{y}\mu(x)=\int_{x\in\mathbb{R}^{d}}f(x+y)\,\mathrm{d}\mu(x),$$

$$\mathrm{T}\mu=\tau_{-\xi}\mu,\qquad\xi:=\int_{x\in\mathbb{R}^{d}}x\,\mathrm{d}\mu(x),$$

$$\mathscr{W}:\mathcal{P}(\mathbb{R}^{d})\to[0,+\infty]$$

$$\forall\widetilde{\mu}\in\widetilde{\mathcal{P}}_{\ell}(\mathbb{R}^{d}),\qquad\mathscr{W}[\widetilde{\mu}]\geq\kappa_{\ell}\int_{x\in\mathbb{R}^{d}}|x|^{\ell}\,\mathrm{d}\widetilde{\mu}(x).$$

$$W_{n}(\mathbf{x}):=\mathscr{W}[\pi_{n}(\mathbf{x})],$$

$$\pi_{n}(\mathbf{x}):=\frac{1}{n}\sum_{i=1}^{n}\delta_{x_{i}}\in\mathcal{P}(\mathbb{R}^{d})$$

$$M_{d,n}:=\{\mathbf{x}=(x_{1},\ldots,x_{n})\in(\mathbb{R}^{d})^{n}:x_{1}+\cdots+x_{n}=0\},$$

$$\widetilde{Z}_{n}:=\int_{\widetilde{\mathbf{x}}\in M_{d,n}}\exp\left(-\frac{2n}{\sigma^{2}}W_{n}(\widetilde{\mathbf{x}})\right)\mathrm{d}\widetilde{\mathbf{x}}\in(0,+\infty),$$

$$\sum_{i=1}^{n}|\widetilde{x}_{i}|^{\ell}\geq\sum_{i=1}^{n-1}|\widetilde{x}_{i}|^{\ell},$$

$$\widetilde{Z}_{n}\leq\int_{\widetilde{\mathbf{x}}\in M_{d,n}}\exp\left(-\frac{2\kappa_{\ell}}{\sigma^{2}}\sum_{i=1}^{n-1}|\widetilde{x}_{i}|^{\ell}\right)\mathrm{d}\widetilde{\mathbf{x}},$$

$$\widetilde{p}_{n}(\widetilde{\mathbf{x}}):=\frac{1}{\widetilde{Z}_{n}}\exp\left(-\frac{2n}{\sigma^{2}}W_{n}(\widetilde{\mathbf{x}})\right)$$

$$\mathrm{d}X_{i}(t)=-n\nabla_{x_{i}}W_{n}(X_{1}(t),\ldots,X_{n}(t))\,\mathrm{d}t+\sigma\,\mathrm{d}\beta_{i}(t),$$

$$\mathscr{W}[\mu]=\frac{1}{2}\iint_{x,y\in\mathbb{R}^{d}}W(x-y)\,\mathrm{d}\mu(x)\,\mathrm{d}\mu(y)$$

$$\mathrm{d}X_{i}(t)=-\frac{1}{n}\sum_{j=1}^{n}\nabla W(X_{i}(t)-X_{j}(t))\,\mathrm{d}t+\sigma\,\mathrm{d}\beta_{i}(t).$$

$$B(0)=B(1)=0,$$

$$\mathscr{W}[\mu]=\int_{x\in\mathbb{R}}B(F_{\mu}(x))\,\mathrm{d}x,$$

$$\mathrm{d}X_{i}(t)=\sum_{k=1}^{n}b_{n}(k)\mathbf{1}_{\{X_{i}(t)=X_{(k)}(t)\}}\,\mathrm{d}t+\sigma\,\mathrm{d}\beta_{i}(t),$$

$$b_{n}(k):=n\left(B\left(\frac{k}{n}\right)-B\left(\frac{k-1}{n}\right)\right).$$

$$\mathscr{W}[\mu]=\frac{1}{2}\iint_{x,y\in\mathbb{R}}|x-y|\,\mathrm{d}\mu(x)\,\mathrm{d}\mu(y)=\int_{x\in\mathbb{R}}F_{\mu}(x)(1-F_{\mu}(x))\,\mathrm{d}x,$$

$$\Xi(t):=\frac{1}{n}\sum_{i=1}^{n}X_{i}(t)$$

$$\widetilde{X}_{i}(t):=X_{i}(t)-\Xi(t),$$

$$\widetilde{\mathbb{P}}_{n}:=\widetilde{P}_{n}\circ\pi_{n}^{-1},$$

$$\mathscr{S}[\mu]:=\int_{x\in\mathbb{R}^{d}}p(x)\log p(x)\,\mathrm{d}x$$

$$\mathscr{S}[\mu]:=+\infty$$

$$\mathscr{F}[\mu]:=\mathscr{S}[\mu]+\frac{2}{\sigma^{2}}\mathscr{W}[\mu],$$

$$\mathscr{F}_{\mathrm{phys}}=-T\mathscr{S}_{\mathrm{phys}}+\mathscr{W},$$

$$\mathscr{F}_{\star}:=\inf_{\mu\in\mathcal{P}(\mathbb{R}^{d})}\mathscr{F}[\mu]\in\mathbb{R}.$$

$$\lim_{n\to+\infty}\mathbf{E}[W_{n}(Y_{1},\ldots,Y_{n})]=\mathscr{W}[\mu].$$

$$\liminf_{n\to+\infty}\mathbf{E}[W_{n}(Y_{1},\ldots,Y_{n})]\geq\mathscr{W}[\mu],$$

$$\widetilde{\mathscr{I}}[\widetilde{\mu}]:=\mathscr{F}[\widetilde{\mu}]-\mathscr{F}_{\star}.$$

Full text

Equilibrium large deviations for mean-field systems with translation invariance

Julien Reygner

Université Paris-Est, CERMICS (ENPC), F-77455 Marne-la-Vallée

Key words and phrases:

Large deviations, mean-field systems, McKean-Vlasov particle systems, rank-based interacting diffusions, free energy.

2010 Mathematics Subject Classification:

60F10, 60J60, 60K35

This work is partially supported by: the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement number 614492; the Chaire Risques Financiers, Fondation du Risque; and the French National Research Agency (ANR) under the programs ANR-12-BLAN Stab and ANR-17-CE40-0030 EFI.

1. Introduction

This work is dedicated to the study of the large deviations of the empirical measure of particle systems at equilibrium exhibiting the following formal features:

(a) they are reversible with respect to an explicit Gibbs measure;

(b) the particles are coupled through mean-field interactions;

(c) their distribution is invariant under spatial translations.

The typical models that we aim to study include McKean-Vlasov particle systems without external potential, whose mean-field limit allows one to approximate the granular media equation [3, 4, 37, 14], and systems of one-dimensional diffusions interacting through their rank, which arise in the probabilistic interpretation of scalar nonlinear conservation laws [9, 10, 31, 44, 32]. Both models also appear in mathematical finance, in the modelling of inter-bank borrowing and lending [27] and of stable equity markets [26, 33], respectively.

For McKean-Vlasov particle systems with an external potential, which in general satisfy the conditions (a) and (b) but not (c), the large deviations of the empirical measure of the particle system under its equilibrium measure are governed by the free energy functional, which combines entropic and energetic contributions. Prefiguring the interpretation by Otto [29, 39] and Carrillo, McCann and Villani [12, 13] of (nonlinear) Fokker-Planck equations as functional gradient flows, Dawson and Gärtner [18, 19, 20] showed that, for such systems, the free energy plays the role of a quasipotential, in the sense of the Freidlin-Wentzell theory. Thus it not only describes the static large scale properties of the particle system, such as typical configurations or possible phase transitions, but it also sheds light on its large scale dynamics, providing both typical paths and fluctuation rates.

For models satisfying the condition (c), translation invariance generally prevents ergodicity, so that there is no equilibrium measure for the original particle system. Still it was noted in [37] for McKean-Vlasov systems, and in [30, 40] for rank-based interacting diffusions, that under suitable assumptions on the interactions between the particles, a stationary behaviour can be observed for the particle system seen from its centre of mass. Centering the particle system induces a conserved quantity in its evolution, and the purpose of this article is to understand the effect of this constraint on its equilibrium large deviations. To the best of the author’s knowledge, this is the first study in this direction.

For such systems, a free energy functional can still be defined, with an energetic contribution depending only on the interaction between the particles. Thus it may be expected that, under the assumption that the centered particle system be ergodic, the large deviations of its empirical measure at equilibrium be described by this free energy functional, restricted to the space of centered probability measures. The first result of this article, Theorem 2.14, provides a rigorous formulation of this assertion; however, it only holds under the assumption that the interaction between particles be strong enough, in a sense to be made precise below — typically, for McKean-Vlasov systems with an interaction potential growing faster than linearly. In contrast, when this assumption is not satisfied, which turns out to be the case for systems of rank-based interacting diffusions, we show that the rate function may fail to have compact level sets, so that the expected large deviation principle does not hold. This is formally explained by the following two facts: the topology on which a large deviation principle can be expected to hold depends on the strength of the interaction; and on too weak topologies, the space of centered probability measures is not closed.

In order to connect the free energy functional to the equilibrium large deviations of the particle system without restriction on the strength of the interaction, and thereby cover the case of rank-based interacting diffusions, we avoid resorting to the notion of centered probability measures, and rather work at the level of the orbit of the empirical measure of the particle system at equilibrium, under the action of translations. This provides an equivalent description of the particle system; however, the quotient topology on the orbit space becomes weak enough for a large deviation principle to hold without any assumption on the strength of the interaction. This is the second main result of the article, Theorem 2.16, which is weaker than the first in the sense that it is implied by the latter, but holds under less restrictive assumptions.

The adaptations of these results to the specific examples of McKean-Vlasov particle systems and systems of rank-based interacting diffusions are stated as corollaries. For the latter example, the large deviation principle allows one to associate a notion of free energy to scalar nonlinear conservation laws, which complements, at the level of the stationary measure, the results by Dembo, Shkolnikov, Varadhan and Zeitouni on finite time intervals [21]. As an application, we discuss at the end of the article the estimation of the probability of an atypical capital distribution in the framework of Fernholz’ Stochastic Portfolio Theory [26].

Outline of the article

The notations and main results of the article are presented in Section 2. The proof of our two main theorems is based on the approximation of the particle system without external potential by a particle system with a small external potential. The large deviation results for this approximating system are presented in Section 3, and the control of these results when the external potential vanishes is studied in Section 4. The application of the main results to the particular cases of McKean-Vlasov particle systems, and systems of rank-based interacting diffusions, is detailed in Section 5. A technical result on the metrisability of the quotient topology is proved in Appendix A.

2. Notations and main results

2.1. Spaces of probability measures

For $d\geq 1$ , we denote by $\mathcal{P}(\mathbb{R}^{d})$ the space of Borel probability measures on $\mathbb{R}^{d}$ . It is endowed with the topology of weak convergence [7, Chapter 1, p. 7], which makes it a Polish space [7, Theorem 6.8, p. 73].

For all $y\in\mathbb{R}^{d}$ , we define the translation by $y$ as the operator $\tau_{y}:\mathcal{P}(\mathbb{R}^{d})\to\mathcal{P}(\mathbb{R}^{d})$ such that, for all $\mu\in\mathcal{P}(\mathbb{R}^{d})$ ,

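$$\int_{x\in\mathbb{R}^{d}}f(x)\,\mathrm{d}\tau_{y}\mu(x)=\int_{x\in\mathbb{R}^{d}}f(x+y)\,\mathrm{d}\mu(x),$$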

for all measurable and bounded functions $f:\mathbb{R}^{d}\to\mathbb{R}$ . It is known that the operator $\tau_{y}$ is continuous on $\mathcal{P}(\mathbb{R}^{d})$ .

For all $p\geq 1$ , we denote by $\mathcal{P}_{p}(\mathbb{R}^{d})$ the space of Borel probability measures on $\mathbb{R}^{d}$ with a finite $p$ -th order moment. It is endowed with the Wasserstein topology of order $p$ [45, Definition 6.8, p. 96], which makes it a Polish space [45, Theorem 6.18, p. 104].

The Wasserstein topology is stronger than the topology induced on $\mathcal{P}_{p}(\mathbb{R}^{d})$ by the topology of weak convergence on $\mathcal{P}(\mathbb{R}^{d})$ , so that for any $y\in\mathbb{R}^{d}$ , the translation $\tau_{y}$ is continuous on $\mathcal{P}_{p}(\mathbb{R}^{d})$ .

We denote by $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ the subset of centered probability measures with a finite $p$ -th order moment, and define the centering operator $\mathrm{T}:\mathcal{P}_{p}(\mathbb{R}^{d})\to\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ by

$$\mathrm{T}\mu=\tau_{-\xi}\mu,\qquad\xi:=\int_{x\in\mathbb{R}^{d}}x\,\mathrm{d}\mu(x),$$

for all $\mu\in\mathcal{P}_{p}(\mathbb{R}^{d})$ . It is easily checked that $\mathrm{T}$ is continuous on $\mathcal{P}_{p}(\mathbb{R}^{d})$ , and that $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ is a closed subset of $\mathcal{P}_{p}(\mathbb{R}^{d})$ , hence it is a Polish space itself.
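For an empirical measure with finitely many atoms, translation and centering reduce to shifting the atoms. The following Python sketch assumes that a measure is represented by a list of equally weighted points in $\mathbb{R}^{d}$; the names `translate`, `barycentre` and `centre` are illustrative and do not come from the article.

```python
def translate(points, y):
    """Atoms of tau_y mu: with the convention int f d(tau_y mu) = int f(x + y) d mu,
    each atom x of mu is moved to x + y."""
    return [tuple(xk + yk for xk, yk in zip(x, y)) for x in points]


def barycentre(points):
    """Mean xi = int x d mu(x) of the equally weighted empirical measure."""
    n = len(points)
    d = len(points[0])
    return tuple(sum(x[k] for x in points) / n for k in range(d))


def centre(points):
    """Atoms of T mu = tau_{-xi} mu, which has zero mean by construction."""
    xi = barycentre(points)
    return translate(points, tuple(-c for c in xi))
```

In this representation, `centre(translate(points, y))` returns the same atoms as `centre(points)` up to rounding, which is the discrete counterpart of the identity $\mathrm{T}\tau_{y}\mu=\mathrm{T}\mu$.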

In the sequel of this article, we shall consider probability measures defined on the respective Borel $\sigma$ -fields of the topological spaces $\mathcal{P}(\mathbb{R}^{d})$ , $\mathcal{P}_{p}(\mathbb{R}^{d})$ and $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ .

2.2. Energy functional and Gibbs measure

Throughout the article, the temperature parameter $\sigma^{2}>0$ is fixed.

The physical systems which we aim to study are described by an energy functional

$$\mathscr{W}:\mathcal{P}(\mathbb{R}^{d})\to[0,+\infty]$$

satisfying the following set of conditions:

(TI) translation invariance: for all $y\in\mathbb{R}^{d}$, for all $\mu\in\mathcal{P}(\mathbb{R}^{d})$, $\mathscr{W}[\tau_{y}\mu]=\mathscr{W}[\mu]$;

($\sigma$F) $\sigma$-finiteness: if $\mu$ has compact support, then $\mathscr{W}[\mu]<+\infty$;

(LSC) lower semicontinuity: the function $\mathscr{W}$ is lower semicontinuous on $\mathcal{P}(\mathbb{R}^{d})$;

(GC) growth control: there exist $\ell\geq 1$ and $\kappa_{\ell}>0$ such that $\mathscr{W}[\mu]=+\infty$ if $\mu\not\in\mathcal{P}_{\ell}(\mathbb{R}^{d})$, and

$$\forall\widetilde{\mu}\in\widetilde{\mathcal{P}}_{\ell}(\mathbb{R}^{d}),\qquad\mathscr{W}[\widetilde{\mu}]\geq\kappa_{\ell}\int_{x\in\mathbb{R}^{d}}|x|^{\ell}\,\mathrm{d}\widetilde{\mu}(x).$$

For all $n\geq 2$ , the energy of a configuration $\mathbf{x}=(x_{1},\ldots,x_{n})\in(\mathbb{R}^{d})^{n}$ of a system with $n$ particles is defined by

$$W_{n}(\mathbf{x}):=\mathscr{W}[\pi_{n}(\mathbf{x})],\tag{1}$$

where

$$\pi_{n}(\mathbf{x}):=\frac{1}{n}\sum_{i=1}^{n}\delta_{x_{i}}\in\mathcal{P}(\mathbb{R}^{d})$$

is the empirical measure of the configuration $\mathbf{x}$ . Notice that Assumption ( $\sigma$ F) ensures that $W_{n}(\mathbf{x})<+\infty$ for any configuration $\mathbf{x}\in(\mathbb{R}^{d})^{n}$ .

The Gibbs density $\exp(-\frac{2n}{\sigma^{2}}W_{n}(\mathbf{x}))$ naturally associated with the energy function $W_{n}$ is never integrable on $(\mathbb{R}^{d})^{n}$ , because Assumption (TI) implies that $W_{n}(\mathbf{x})$ is invariant under the translations $(x_{1},\ldots,x_{n})\mapsto(x_{1}+\zeta,\ldots,x_{n}+\zeta)$ , $\zeta\in\mathbb{R}^{d}$ . However, introducing the linear subspace

$$M_{d,n}:=\{\mathbf{x}=(x_{1},\ldots,x_{n})\in(\mathbb{R}^{d})^{n}:x_{1}+\cdots+x_{n}=0\},$$

and denoting by $\mathrm{d}\widetilde{\mathbf{x}}$ the Lebesgue measure on $M_{d,n}$ , we get the following first result.

Lemma 2.1 (Finiteness of the partition function).

Let $\mathscr{W}:\mathcal{P}(\mathbb{R}^{d})\to[0,+\infty]$ be an energy functional satisfying Assumptions (TI), ( $\sigma$ F), (LSC) and (GC). For all $n\geq 2$ , we have

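$$\widetilde{Z}_{n}:=\int_{\widetilde{\mathbf{x}}\in M_{d,n}}\exp\left(-\frac{2n}{\sigma^{2}}W_{n}(\widetilde{\mathbf{x}})\right)\mathrm{d}\widetilde{\mathbf{x}}\in(0,+\infty),$$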

where the function $W_{n}$ is defined by (1).

Proof.

The combination of Assumptions ( $\sigma$ F) and (LSC) ensures that the function $\widetilde{\mathbf{x}}\mapsto\exp(-\frac{2n}{\sigma^{2}}W_{n}(\widetilde{\mathbf{x}}))$ is positive and measurable, so that $\widetilde{Z}_{n}$ is well-defined as an element of $(0,+\infty]$ . Using (3), Assumption (GC) and the trivial bound

$$\sum_{i=1}^{n}|\widetilde{x}_{i}|^{\ell}\geq\sum_{i=1}^{n-1}|\widetilde{x}_{i}|^{\ell},$$

we get the inequality

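$$\widetilde{Z}_{n}\leq\int_{\widetilde{\mathbf{x}}\in M_{d,n}}\exp\left(-\frac{2\kappa_{\ell}}{\sigma^{2}}\sum_{i=1}^{n-1}|\widetilde{x}_{i}|^{\ell}\right)\mathrm{d}\widetilde{\mathbf{x}},$$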

whose right-hand side is proven to be finite by using the parametrisation of $\widetilde{\mathbf{x}}=(\widetilde{x}_{1},\ldots,\widetilde{x}_{n})\in M_{d,n}$ by $(\widetilde{x}_{1},\ldots,\widetilde{x}_{n-1})\in(\mathbb{R}^{d})^{n-1}$ . ∎

Definition 2.2 (Gibbs measure).

Under the assumptions of Lemma 2.1, we denote by $\widetilde{P}_{n}$ the probability measure on $(\mathbb{R}^{d})^{n}$ with density

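$$\widetilde{p}_{n}(\widetilde{\mathbf{x}}):=\frac{1}{\widetilde{Z}_{n}}\exp\left(-\frac{2n}{\sigma^{2}}W_{n}(\widetilde{\mathbf{x}})\right)$$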

with respect to the Lebesgue measure $\mathrm{d}\widetilde{\mathbf{x}}$ on $M_{d,n}$ .

By definition, for all $n\geq 2$ , the probability measure $\widetilde{P}_{n}$ gives full weight to the subspace $M_{d,n}$ .

2.3. Two specific examples

When $W_{n}$ is smooth enough to ensure the well-posedness of the system of stochastic differential equations

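$$\mathrm{d}X_{i}(t)=-n\nabla_{x_{i}}W_{n}(X_{1}(t),\ldots,X_{n}(t))\,\mathrm{d}t+\sigma\,\mathrm{d}\beta_{i}(t),$$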

with $\beta_{1},\ldots,\beta_{n}$ independent standard $\mathbb{R}^{d}$ -valued Brownian motions, the Gibbs measure $\widetilde{P}_{n}$ of Definition 2.2 is related to the long time behaviour of the diffusion process $(X_{1}(t),\ldots,X_{n}(t))_{t\geq 0}$ . We first give two explicit examples of such processes, for which the energy functional satisfies Assumption (TI).

Example 2.3 (mv-model).

Given a smooth, nonnegative and even interaction potential $W:\mathbb{R}^{d}\to\mathbb{R}$ , the energy functional

$$\mathscr{W}[\mu]=\frac{1}{2}\iint_{x,y\in\mathbb{R}^{d}}W(x-y)\,\mathrm{d}\mu(x)\,\mathrm{d}\mu(y)$$

leads to the McKean-Vlasov particle system without external potential

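$$\mathrm{d}X_{i}(t)=-\frac{1}{n}\sum_{j=1}^{n}\nabla W(X_{i}(t)-X_{j}(t))\,\mathrm{d}t+\sigma\,\mathrm{d}\beta_{i}(t).$$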

This particle system arises, for instance, in the probabilistic approximation of the granular media equation [3, 4, 37, 14], for which the choice $W(x)=|x|^{3}$ is of particular physical interest [6, 5].
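As an illustration of Example 2.3, the following Python sketch evaluates the $n$-particle energy $W_{n}$ and the drift of the mv-model in dimension $d=1$. The function names and the restriction to $d=1$ are assumptions made for this sketch; the cubic potential is the choice $W(x)=|x|^{3}$ mentioned above.

```python
def cubic_potential(r):
    """Interaction potential W(r) = |r|^3 (even and nonnegative)."""
    return abs(r) ** 3


def cubic_gradient(r):
    """Derivative W'(r) = 3 r |r| of the cubic potential (an odd function)."""
    return 3.0 * r * abs(r)


def mv_energy(xs, potential):
    """W_n(x) = (1 / (2 n^2)) * sum_{i,j} W(x_i - x_j), the energy of the empirical measure."""
    n = len(xs)
    return sum(potential(a - b) for a in xs for b in xs) / (2.0 * n * n)


def mv_drift(xs, gradient):
    """Drift of particle i: -(1/n) * sum_j W'(x_i - x_j)."""
    n = len(xs)
    return [-sum(gradient(a - b) for b in xs) / n for a in xs]
```

Since only the differences $x_{i}-x_{j}$ enter, `mv_energy` returns the same value after a common shift of all particles, in line with Assumption (TI). Because $W'$ is odd, the drifts returned by `mv_drift` sum to zero, so the drift does not move the centre of mass.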

Example 2.4 (rb-model).

In dimension $d=1$ , given a $C^{1}$ and nonnegative function $B:[0,1]\to\mathbb{R}$ such that

$$B(0)=B(1)=0,$$

the energy functional

$$\mathscr{W}[\mu]=\int_{x\in\mathbb{R}}B(F_{\mu}(x))\,\mathrm{d}x,$$

where $F_{\mu}$ denotes the cumulative distribution function of $\mu$ , is associated with the system of rank-based interacting diffusions

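$$\mathrm{d}X_{i}(t)=\sum_{k=1}^{n}b_{n}(k)\mathbf{1}_{\{X_{i}(t)=X_{(k)}(t)\}}\,\mathrm{d}t+\sigma\,\mathrm{d}\beta_{i}(t),$$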

where for all $t\geq 0$ , $X_{(1)}(t)\leq\cdots\leq X_{(n)}(t)$ denotes the order statistics of $X_{1}(t),\ldots,X_{n}(t)$ , and

$$b_{n}(k):=n\left(B\left(\frac{k}{n}\right)-B\left(\frac{k-1}{n}\right)\right).$$

This particle system serves as a model for large equity markets, and is also related to the probabilistic interpretation of nonlinear scalar conservation laws [26, 42]. For the latter reason, we shall call $B$ a flux function.
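The rank-dependent drift coefficients can be tabulated directly from the flux function. The sketch below is a minimal illustration; the name `rank_drifts` and the example flux `quadratic_flux` (the function $B(u)=u(1-u)$) are choices made here, not notation from the article.

```python
def quadratic_flux(u):
    """Example flux function B(u) = u (1 - u), which satisfies B(0) = B(1) = 0."""
    return u * (1.0 - u)


def rank_drifts(flux, n):
    """Return [b_n(1), ..., b_n(n)] with b_n(k) = n * (B(k/n) - B((k-1)/n))."""
    return [n * (flux(k / n) - flux((k - 1) / n)) for k in range(1, n + 1)]
```

For $n=4$ and this flux, the coefficients are $0.75, 0.25, -0.25, -0.75$: the lowest-ranked particles are pushed upwards and the highest-ranked ones downwards. The sum telescopes to $n(B(1)-B(0))=0$, so the drift does not move the centre of mass.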

Remark 2.5 (Intersection between both classes of models).

Taking $d=1$ and $W(x)=|x|$ in the mv-model yields the energy functional

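$$\mathscr{W}[\mu]=\frac{1}{2}\iint_{x,y\in\mathbb{R}}|x-y|\,\mathrm{d}\mu(x)\,\mathrm{d}\mu(y)=\int_{x\in\mathbb{R}}F_{\mu}(x)(1-F_{\mu}(x))\,\mathrm{d}x,$$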

so that this model coincides with the rb-model for $B(u)=u(1-u)$ .
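The identity of Remark 2.5 can be checked numerically on empirical measures, for which the cumulative distribution function is piecewise constant. The sketch below uses hypothetical helper names, and assumes equally weighted atoms on the real line.

```python
def pairwise_energy(xs):
    """(1/2) * double integral of |x - y| against the empirical measure of xs."""
    n = len(xs)
    return sum(abs(a - b) for a in xs for b in xs) / (2.0 * n * n)


def cdf_energy(xs, flux):
    """Integral of B(F_mu(x)) dx: F_mu equals k/n between the k-th and (k+1)-th order statistics."""
    ys = sorted(xs)
    n = len(ys)
    return sum(flux(k / n) * (ys[k] - ys[k - 1]) for k in range(1, n))


def quadratic_flux(u):
    """Flux function B(u) = u (1 - u) of Remark 2.5."""
    return u * (1.0 - u)
```

For the atoms $0, 1, 3$, both `pairwise_energy` and `cdf_energy` with `quadratic_flux` evaluate to $2/3$.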

For both the mv-model and the rb-model, it is quickly observed that the centre of mass of the system

$$\Xi(t):=\frac{1}{n}\sum_{i=1}^{n}X_{i}(t)$$

is a Brownian motion in $\mathbb{R}^{d}$ , which prevents $(X_{1}(t),\ldots,X_{n}(t))$ from converging to an equilibrium probability measure. Following the remark made in [37] for the mv-model and in [30, 40] for the rb-model, we define the diffusion process $(\widetilde{X}_{1}(t),\ldots,\widetilde{X}_{n}(t))_{t\geq 0}$ on the linear subspace $M_{d,n}$ by

$$\widetilde{X}_{i}(t):=X_{i}(t)-\Xi(t),$$

which describes the particle system seen from its centre of mass. Under the assumptions of Lemma 2.1, this process turns out to be reversible with respect to the Gibbs measure $\widetilde{P}_{n}$ .
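To make the centering step concrete, here is a minimal Euler-Maruyama sketch for the rb-model followed by the subtraction of the centre of mass. It is only an illustration under stated assumptions: the time discretisation, the tie-breaking rule (ties are resolved by Python's stable sort) and all function names are choices made here, not part of the article.

```python
import math
import random


def simulate_rank_based(x0, flux, sigma, dt, steps, seed=0):
    """Euler-Maruyama discretisation of dX_i = b_n(rank of i) dt + sigma d beta_i."""
    rng = random.Random(seed)
    n = len(x0)
    # b[k - 1] = b_n(k) = n * (B(k/n) - B((k-1)/n))
    b = [n * (flux(k / n) - flux((k - 1) / n)) for k in range(1, n + 1)]
    xs = list(x0)
    for _ in range(steps):
        # order[r] is the index of the particle with rank r + 1
        order = sorted(range(n), key=lambda i: xs[i])
        drift = [0.0] * n
        for rank, i in enumerate(order):
            drift[i] = b[rank]
        noise = [sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) for _ in range(n)]
        xs = [xs[i] + drift[i] * dt + noise[i] for i in range(n)]
    return xs


def seen_from_centre_of_mass(xs):
    """Subtract the centre of mass, giving a configuration in M_{1,n}."""
    centre = sum(xs) / len(xs)
    return [x - centre for x in xs]
```

The raw positions drift away with the Brownian centre of mass, whereas the output of `seen_from_centre_of_mass` always sums to zero.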

2.4. Free energy and large deviations

Under the assumptions of Lemma 2.1, the central object of our study is the sequence of probability measures $\widetilde{\mathbb{P}}_{n}$ defined by

$$\widetilde{\mathbb{P}}_{n}:=\widetilde{P}_{n}\circ\pi_{n}^{-1},$$

which describe the distribution of the empirical measure of the particle system, seen from its centre of mass, at equilibrium. Notice that, for all $n\geq 2$ , the restriction of $\pi_{n}$ to $M_{d,n}$ defines a continuous mapping from $M_{d,n}$ to either $\mathcal{P}(\mathbb{R}^{d})$ or $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ , for any $p\geq 1$ ; in particular, it is measurable for both the topology of weak convergence and the Wasserstein topology. As a consequence, for all $n\geq 2$ , the probability measure $\widetilde{\mathbb{P}}_{n}$ is well-defined on both the Borel $\sigma$ -field of $\mathcal{P}(\mathbb{R}^{d})$ and the Borel $\sigma$ -field of $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ .

In order to study the large deviations of the sequence $\widetilde{\mathbb{P}}_{n}$ , we first introduce the following two functionals on $\mathcal{P}(\mathbb{R}^{d})$ .

Definition 2.6 (Boltzmann’s entropy).

For all $\mu\in\mathcal{P}(\mathbb{R}^{d})$ , we let

$$\mathscr{S}[\mu]:=\int_{x\in\mathbb{R}^{d}}p(x)\log p(x)\,\mathrm{d}x$$

if $\mu\in\mathcal{P}_{1}(\mathbb{R}^{d})$ and has a density $p(x)$ with respect to the Lebesgue measure on $\mathbb{R}^{d}$ , and

$$\mathscr{S}[\mu]:=+\infty$$

otherwise.

Remark 2.7 (On the moment condition).

The requirement that $\mu\in\mathcal{P}_{1}(\mathbb{R}^{d})$ ensures that the negative part of $p\log p$ is integrable [1, Remark 9.3.7, p. 212], and therefore ensures that $\mathscr{S}[\mu]$ is well-defined as an element of $(-\infty,+\infty]$ .

Definition 2.8 (Free energy).

The free energy $\mathscr{F}$ associated with an energy functional $\mathscr{W}:\mathcal{P}(\mathbb{R}^{d})\to[0,+\infty]$ is defined by

$$\mathscr{F}[\mu]:=\mathscr{S}[\mu]+\frac{2}{\sigma^{2}}\mathscr{W}[\mu],\tag{14}$$

for all $\mu\in\mathcal{P}(\mathbb{R}^{d})$ .
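As a worked illustration of this definition, consider the mv-model in dimension $d=1$ with the quadratic potential $W(x)=x^{2}$, which is an assumption made for this example only. For a centered Gaussian measure with variance $s^{2}$, the entropy is $\mathscr{S}=-\frac{1}{2}\log(2\pi\mathrm{e}s^{2})$ and the energy is $\mathscr{W}=\frac{1}{2}\mathbf{E}[(X-Y)^{2}]=s^{2}$. The Python sketch below evaluates the resulting free energy and minimises it over a grid of variances; the function names are hypothetical.

```python
import math


def gaussian_free_energy(variance, sigma2):
    """F = S + (2 / sigma^2) W for a centered Gaussian, assuming W(x) = x^2 and d = 1."""
    entropy = -0.5 * math.log(2.0 * math.pi * math.e * variance)
    energy = variance  # (1/2) E[(X - Y)^2] = s^2 for independent copies X, Y
    return entropy + (2.0 / sigma2) * energy


def best_variance_on_grid(sigma2, points=8000):
    """Minimise the free energy over the grid sigma2 * j / 4000, j = 1, ..., points."""
    grid = [sigma2 * j / 4000.0 for j in range(1, points + 1)]
    return min(grid, key=lambda v: gaussian_free_energy(v, sigma2))
```

Within this Gaussian family, the minimiser is $s^{2}=\sigma^{2}/4$, as obtained by setting the derivative $-\frac{1}{2s^{2}}+\frac{2}{\sigma^{2}}$ to zero. This is only a restricted minimisation and does not by itself identify the infimum of $\mathscr{F}$ over all probability measures.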

Remark 2.9 (Physical free energy).

In statistical physics, $\sigma^{2}$ is usually assigned the value $2\mathrm{k}T$ , where $\mathrm{k}$ is the Boltzmann constant and $T>0$ is the temperature, and Boltzmann’s entropy is rather defined by $\mathscr{S}_{\mathrm{phys}}=-\mathrm{k}\mathscr{S}$ . Therefore, to be consistent with the classical definition of the free energy

$$\mathscr{F}_{\mathrm{phys}}=-T\mathscr{S}_{\mathrm{phys}}+\mathscr{W},$$

one should rather define the free energy as $\frac{\sigma^{2}}{2}\mathscr{S}+\mathscr{W}$. The difference with (14) merely lies in the multiplicative constant, and we shall keep the latter definition as it alleviates some computations throughout the article.

If the energy functional satisfies the assumptions of Lemma 2.1, the free energy possesses the following properties.

Lemma 2.10 (Bounds on the free energy).

Let $\mathscr{W}:\mathcal{P}(\mathbb{R}^{d})\to[0,+\infty]$ be an energy functional satisfying the assumptions of Lemma 2.1.

(i) There exists $\mu\in\mathcal{P}(\mathbb{R}^{d})$ such that $\mathscr{F}[\mu]<+\infty$.

(ii) $\mathscr{F}$ is bounded from below on $\mathcal{P}(\mathbb{R}^{d})$.

Remark 2.11.

It is easily checked that the uniform distribution on any compact set has a finite Boltzmann entropy, which by Assumption ( $\sigma$ F) yields the statement (i) of Lemma 2.10.

The statement (ii) of Lemma 2.10 is proved in Subsection 4.1.

Under the assumptions of Lemma 2.10, we may define

$$\mathscr{F}_{\star}:=\inf_{\mu\in\mathcal{P}(\mathbb{R}^{d})}\mathscr{F}[\mu]\in\mathbb{R}.$$

This quantity is sometimes referred to as Gibbs’ free energy [20].

Before stating our first result, we introduce two further assumptions on the energy functional $\mathscr{W}$ :

(SH) subhomogeneity: for all $\epsilon\in(0,1)$, for all $\mathbf{x}\in(\mathbb{R}^{d})^{n}$, $(1-\epsilon)W_{n}(\mathbf{x})\geq W_{n}((1-\epsilon)\mathbf{x})$;

(CC) chaos compatibility: for all $\mu\in\mathcal{P}(\mathbb{R}^{d})$, if $(Y_{n})_{n\geq 1}$ is a sequence of independent random variables with identical distribution $\mu$ on some probability space $(\Omega,\mathcal{A},\mathbf{P})$, then

$$\lim_{n\to+\infty}\mathbf{E}[W_{n}(Y_{1},\ldots,Y_{n})]=\mathscr{W}[\mu].$$

Remark 2.12 (On Assumption (SH)).

Unlike the remainder of the assumptions, Assumption (SH) is quite technical and is only employed once in the article, namely in the proof of the exponential estimates of Lemma 4.4. It may certainly be replaced by a variety of other similar assumptions, as long as they allow one to obtain the same exponential estimates, but we believe that the present formulation achieves a reasonable balance between the generality of the models that it covers, and the relative simplicity of the computations that it requires to prove Lemma 4.4.

Remark 2.13 (On Assumptions (CC) and (LSC)).

If $(Y_{n})_{n\geq 1}$ is a sequence of independent random variables with identical distribution $\mu\in\mathcal{P}(\mathbb{R}^{d})$ , then by the Glivenko-Cantelli Lemma, the empirical measure $\pi_{n}(Y_{1},\ldots,Y_{n})$ converges to $\mu$ in $\mathcal{P}(\mathbb{R}^{d})$ , $\mathbf{P}$ -almost surely. As a consequence, Assumption (LSC) and Fatou’s Lemma yield

$$\liminf_{n\to+\infty}\mathbf{E}[W_{n}(Y_{1},\ldots,Y_{n})]\geq\mathscr{W}[\mu],$$

so that Assumption (CC) merely involves the limit superior of $\mathbf{E}[W_{n}(Y_{1},\ldots,Y_{n})]$ .

We are now ready to state the first main result of the article. We recall that, on a metric space, a good rate function is a proper function with compact level sets, and refer to [22] for introductory material on large deviation principles.

Theorem 2.14 (LDP for $\widetilde{\mathbb{P}}_{n}$ in Wasserstein spaces).

Let $\mathscr{W}:\mathcal{P}(\mathbb{R}^{d})\to[0,+\infty]$ be an energy functional satisfying Assumptions (TI), ( $\sigma$ F), (LSC), (GC), (SH) and (CC). If the index $\ell\geq 1$ given by Assumption (GC) is such that $\ell>1$ , then for all $p\in[1,\ell)$ , the sequence $\widetilde{\mathbb{P}}_{n}$ satisfies a large deviation principle on $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ with good rate function

$$\widetilde{\mathscr{I}}[\widetilde{\mu}]:=\mathscr{F}[\widetilde{\mu}]-\mathscr{F}_{\star}.$$

Notice that the large deviation principle holds only in Wasserstein topologies with order strictly smaller than the index $\ell$ of Assumption (GC), which for the mv-model coincides with the order of polynomial growth of the interaction potential $W$ . Furthermore, since the Wasserstein topology is stronger than the topology of weak convergence, the Contraction Principle [22, Theorem 4.2.1, p. 126] implies that under the assumptions of Theorem 2.14, the large deviation principle for $\widetilde{\mathbb{P}}_{n}$ also holds on the space $\mathcal{P}(\mathbb{R}^{d})$ , with good rate function $\mathscr{I}$ defined by

[TABLE]

As far as the role of the topology in the large deviation principle is concerned, a parallel can be drawn with Sanov’s Theorem. Indeed, let $\mathbb{Q}_{n}$ denote the law of the empirical measure of independent random variables in $\mathbb{R}^{d}$ , with identical distribution $\nu$ , where $\nu$ has a density proportional to $\exp(-2|x|^{\ell}/\sigma^{2})$ , with $\ell\geq 1$ . The standard Sanov Theorem [22, Theorem 6.2.10, p. 263] asserts that the sequence $\mathbb{Q}_{n}$ satisfies a large deviation principle on $\mathcal{P}(\mathbb{R}^{d})$ , and it was proved by Wang, Wang and Wu [46] that, if $\ell>1$ , then the large deviation principle actually holds on $\mathcal{P}_{p}(\mathbb{R}^{d})$ , for $p\in[1,\ell)$ — but not for $p=\ell$ .

Keeping the analogy between Sanov’s Theorem and Theorem 2.14 in mind, one may therefore wonder, if Assumption (GC) in the latter theorem is only satisfied with $\ell=1$ , whether the large deviation principle continues to hold on $\mathcal{P}(\mathbb{R}^{d})$ , with the rate function $\mathscr{I}$ defined by (15), for want of holding in a Wasserstein topology. We show that the answer is negative, by exhibiting an example for which the level sets of the function $\mathscr{I}$ fail to be compact on $\mathcal{P}(\mathbb{R}^{d})$ , which prevents the large deviation principle from holding. As should be clear from the example, this is related to the lack of continuity of the centering operator $\mathrm{T}$ on $\mathcal{P}(\mathbb{R}^{d})$ .

Example 2.15 (Counter-example to Theorem 2.14 when $\ell=1$ ).

We assume that $d=1$ and take the energy functional

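$$\mathscr{W}[\mu]=\frac{1}{2}\iint_{x,y\in\mathbb{R}}|x-y|\,\mathrm{d}\mu(x)\,\mathrm{d}\mu(y)=\int_{x\in\mathbb{R}}F_{\mu}(x)(1-F_{\mu}(x))\,\mathrm{d}x$$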

of Remark 2.5. It will be checked in Subsection 5.2 that this energy functional satisfies the assumptions of Theorem 2.14, except that Assumption (GC) is only satisfied with $\ell=1$ . This in fact occurs for any instance of the rb-model, and not only for the case $B(u)=u(1-u)$ corresponding to the energy functional chosen here.

Let $\varphi$ be the density of the standard Gaussian distribution on $\mathbb{R}$ , and for all $\theta\in(0,1)$ , let us define the density

[TABLE]

For all $\theta\in(0,1)$ , the probability measure $\mu_{\theta}$ with density $p_{\theta}$ is centered, and we have

[TABLE]

due to the convexity of $r\mapsto r\log r$ , while

[TABLE]

where we have used the triangle inequality twice. As a consequence, the collection $\{\mu_{\theta},\theta\in(0,1)\}$ is contained in a level set of the rate function $\mathscr{I}$ defined by (15). But on the other hand, $\mu_{\theta}$ converges weakly, when $\theta$ vanishes, to the Gaussian distribution centered in $-1$ , at which $\mathscr{I}$ takes the value $+\infty$ . Therefore the level sets of $\mathscr{I}$ are not closed, whence not compact, in $\mathcal{P}(\mathbb{R}^{d})$ .
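The mechanism behind this example, namely that centered measures can converge weakly to a non-centered limit while their interaction energy stays bounded, can be illustrated on a simpler family. The sketch below uses a hypothetical two-atom measure $(1-\theta)\delta_{-1}+\theta\delta_{(1-\theta)/\theta}$, which is not the density $p_{\theta}$ of the example: being atomic, it has infinite Boltzmann entropy, so it only illustrates the behaviour of the mean and of the energy.

```python
def two_atom_measure(theta):
    """Atoms and weights of (1 - theta) * delta_{-1} + theta * delta_{(1 - theta) / theta}."""
    return [(-1.0, 1.0 - theta), ((1.0 - theta) / theta, theta)]


def mean(measure):
    """First moment of a weighted atomic measure given as (atom, weight) pairs."""
    return sum(x * w for x, w in measure)


def interaction_energy(measure):
    """(1/2) * double integral of |x - y|, the energy functional of Remark 2.5."""
    return 0.5 * sum(wa * wb * abs(xa - xb) for xa, wa in measure for xb, wb in measure)


def mass_near(measure, point, radius):
    """Total weight of the atoms within the given radius of a point."""
    return sum(w for x, w in measure if abs(x - point) <= radius)
```

For every $\theta\in(0,1)$ the mean is zero and the energy equals $1-\theta$. As $\theta$ vanishes, almost all the mass sits at $-1$, so the weak limit is $\delta_{-1}$, whose mean is $-1$: the small mass escaping to infinity carries the first moment away.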

2.5. Large deviations in the quotient space

Let us denote by $\overline{\mathcal{P}}(\mathbb{R}^{d})$ the orbit space of the group action

[TABLE]

and define

[TABLE]

the associated orbit map. The space $\overline{\mathcal{P}}(\mathbb{R}^{d})$ is endowed with the quotient topology, which is defined as the strongest topology making the map $\rho$ continuous. It is proved in Appendix A that this topology is metrisable.

If a functional $\mathscr{G}$ on $\mathcal{P}(\mathbb{R}^{d})$ is translation invariant, then it is constant on orbits and we may define the functional $\overline{\mathscr{G}}$ on $\overline{\mathcal{P}}(\mathbb{R}^{d})$ by

[TABLE]

for any $\mu\in\mathcal{P}(\mathbb{R}^{d})$ . Under the assumptions of Lemma 2.10, the functionals $\mathscr{W}$ , $\mathscr{S}$ and $\mathscr{F}$ are translation invariant, and it is immediate that

[TABLE]

For all $n\geq 2$ , we define the probability measure

[TABLE]

on the Borel $\sigma$ -field of $\overline{\mathcal{P}}(\mathbb{R}^{d})$ . The next theorem is the second main result of this article.

Theorem 2.16 (LDP for $\overline{\mathbb{P}}_{n}$ in the quotient space).

Let $\mathscr{W}:\mathcal{P}(\mathbb{R}^{d})\to[0,+\infty]$ be an energy functional satisfying Assumptions (TI), ( $\sigma$ F), (LSC), (GC), (SH) and (CC). The sequence $\overline{\mathbb{P}}_{n}$ satisfies a large deviation principle on $\overline{\mathcal{P}}(\mathbb{R}^{d})$ with good rate function

[TABLE]

Of course, in the case $\ell>1$ , Theorem 2.16 can be obtained by contraction from Theorem 2.14, but we will not take advantage of this remark and we will rather prove both theorems simultaneously.

Remark 2.17 (Large deviations in $\overline{\mathcal{P}}(\mathbb{R}^{d})$ ).

Let $\beta$ be a standard $\mathbb{R}^{d}$ -valued Brownian motion, and consider the occupation measure

[TABLE]

Because of the lack of ergodicity of the Brownian motion, the large deviations of $L_{t}$, when $t\to+\infty$, are not covered by the standard Donsker-Varadhan theory. Recently, Mukherjee and Varadhan [38] introduced a suitable compactification of the space $\overline{\mathcal{P}}(\mathbb{R}^{d})$, in which a large deviation principle can be stated for the orbit of $L_{t}$. This result also allows one to obtain estimates on translation invariant functionals, such as probability measures on the space of sample paths with density proportional to

[TABLE]

with respect to the Wiener measure, for some interaction potential $W$ .

Although we rely on the same idea of working in the orbit space in order to compensate for the lack of ergodicity of our original process, our topological construction is quite distinct. In particular, no compactification of the orbit space is required for Theorem 2.16 to hold.

2.6. Sketch of the proof of Theorems 2.14 and 2.16

Our two main theorems are proved simultaneously. In Section 3, we first state a large deviation principle for the law $\mathbb{P}^{\eta}_{n}$ of the empirical measure of a system with energy functional $\mathscr{W}$ and a confining functional $\mathscr{V}^{\eta}$ , the magnitude of which depends on a small parameter $\eta>0$ . This result can be considered standard, and our proof closely follows the lines of [24, Theorem 1.5]. For consistency when $\eta$ vanishes, we choose the external potential associated with $\mathscr{V}^{\eta}$ to grow as $|x|^{\ell}$ , where $\ell\geq 1$ is given by Assumption (GC). As a result, the large deviation principle for $\mathbb{P}^{\eta}_{n}$ holds on $\mathcal{P}(\mathbb{R}^{d})$ , and if $\ell>1$ , on $\mathcal{P}_{p}(\mathbb{R}^{d})$ for any $p\in[1,\ell)$ .

By contraction, we then obtain large deviation principles for the respective pushforward measures $\overline{\mathbb{P}}^{\eta}_{n}$ and $\widetilde{\mathbb{P}}^{\eta}_{n}$ of $\mathbb{P}^{\eta}_{n}$ by $\rho$ and $\mathrm{T}$ , respectively on $\overline{\mathcal{P}}(\mathbb{R}^{d})$ , and if $\ell>1$ , on $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ for any $p\in[1,\ell)$ . The end of the proof, detailed in Section 4, then consists in checking that, when $\eta$ vanishes, $\overline{\mathbb{P}}^{\eta}_{n}$ and $\widetilde{\mathbb{P}}^{\eta}_{n}$ provide sufficiently good approximations of $\overline{\mathbb{P}}_{n}$ and $\widetilde{\mathbb{P}}_{n}$ , at the level of large deviations. This part can be considered as the main original contribution of the article.

2.7. Large deviations for the mv-model and the rb-model

We come back to the specific examples of the mv-model and rb-model introduced in Subsection 2.3, and state large deviation principles for these models which come as corollaries of Theorems 2.14 and 2.16.

2.7.1. mv-model

Let $W:\mathbb{R}^{d}\to[0,+\infty)$ be an interaction potential which possesses the decomposition

[TABLE]

where the functions $W^{\sharp}:\mathbb{R}^{d}\to[0,+\infty)$ and $W^{\flat}:\mathbb{R}^{d}\to\mathbb{R}$ satisfy the following respective assumptions.

(mv- $\sharp$ )

The function $W^{\sharp}$ is even, lower semicontinuous on $\mathbb{R}^{d}$ , there exists $\ell\geq 1$ and $\kappa_{\ell}>0$ such that, for all $x\in\mathbb{R}^{d}$ , $W^{\sharp}(x)\geq 2\kappa_{\ell}|x|^{\ell}$ , and for all $\epsilon\in(0,1)$ , for all $x\in\mathbb{R}^{d}$ , $(1-\epsilon)W^{\sharp}(x)\geq W^{\sharp}((1-\epsilon)x)$ .

(mv- $\flat$ )

The function $W^{\flat}$ is even, continuous on $\mathbb{R}^{d}$ and, with $\ell\geq 1$ given by Assumption (mv- $\sharp$ ) on $W^{\sharp}$ :

•

if $\ell=1$ , then $W^{\flat}$ is bounded;

•

if $\ell>1$ , then there exists $\ell^{\prime}\in[0,\ell)$ such that $W^{\flat}(x)/(1+|x|^{\ell^{\prime}})$ is bounded on $\mathbb{R}^{d}$ .

Any polynomial function of $|x|$ , with nonnegative but possibly fractional powers, degree greater than or equal to $1$ , and positive leading coefficient, satisfies this set of assumptions, up to renormalisation of the constant term in order to ensure nonnegativity. This is in particular the case of the cubic potential $W(x)=|x|^{3}$ corresponding to the granular media equation [6, 5]. However, singular potentials such as those involved in the particle approximation of the Keller-Segel equation [28, 15], or in the study of Coulomb gases [35], do not satisfy our set of assumptions.
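The two quantitative requirements of Assumption (mv- $\sharp$ ) can be checked numerically on a grid for the cubic potential in dimension one. The sketch below is only a sanity check, not a proof; the function names and the choice $\kappa_{\ell}=1/2$ are assumptions made for the example. It also shows that a potential with sublinear growth, such as $|x|^{1/2}$ , fails the scaling condition.

```python
def W_cubic(x):
    """Cubic potential W(x) = |x|^3 in dimension one."""
    return abs(x) ** 3

def W_sqrt(x):
    """Sublinear potential |x|^(1/2), used as a counterexample."""
    return abs(x) ** 0.5

ELL = 3        # growth index for the cubic potential
KAPPA = 0.5    # illustrative constant: 2 * KAPPA * |x|^ELL = |x|^3

def growth_ok(W, x):
    """Growth condition W(x) >= 2 * kappa * |x|^ell."""
    return W(x) >= 2 * KAPPA * abs(x) ** ELL

def subhomogeneous_ok(W, x, eps):
    """Scaling condition (1 - eps) * W(x) >= W((1 - eps) * x)."""
    return (1 - eps) * W(x) >= W((1 - eps) * x)

xs = [k / 10 for k in range(-50, 51)]
eps_values = [k / 20 for k in range(1, 20)]   # values of eps in (0, 1)

all_growth = all(growth_ok(W_cubic, x) for x in xs)
all_subhom = all(subhomogeneous_ok(W_cubic, x, e)
                 for x in xs for e in eps_values)
sqrt_fails = not subhomogeneous_ok(W_sqrt, 1.0, 0.5)
```

For the cubic potential the scaling condition reduces to $(1-\epsilon)^{3}\leq 1-\epsilon$ , which holds for every $\epsilon\in(0,1)$ ; for $|x|^{1/2}$ it would require $(1-\epsilon)^{1/2}\leq 1-\epsilon$ , which is false.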

Corollary 2.18 (LDP for the mv-model).

Let $W:\mathbb{R}^{d}\to[0,+\infty)$ be an interaction potential possessing the decomposition (17), with functions $W^{\sharp}$ and $W^{\flat}$ satisfying the respective Assumptions (mv- $\sharp$ ) and (mv- $\flat$ ). Let us define the energy functional $\mathscr{W}$ by the identity (7). The sequence of associated probability measures $\widetilde{P}_{n}$ is well-defined, and letting $\widetilde{\mathbb{P}}_{n}$ , $\overline{\mathbb{P}}_{n}$ be defined by (12) and (16), respectively, we have the following results.

(i)

The sequence $\overline{\mathbb{P}}_{n}$ satisfies a large deviation principle on $\overline{\mathcal{P}}(\mathbb{R}^{d})$ with good rate function $\overline{\mathscr{I}}$ defined by Theorem 2.16.

(ii)

If the index $\ell\geq 1$ of Assumptions (mv- $\sharp$ ) and (mv- $\flat$ ) is such that $\ell>1$ , then for all $p\in[1,\ell)$ , the sequence $\widetilde{\mathbb{P}}_{n}$ satisfies a large deviation principle on $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ with good rate function $\widetilde{\mathscr{I}}$ defined by Theorem 2.14.

The proof of Corollary 2.18 is presented in Subsection 5.1. If $W^{\flat}\equiv 0$ , then the energy functional $\mathscr{W}$ actually satisfies the assumptions of Theorems 2.14 and 2.16, so that the result of Corollary 2.18 is straightforward. The case $W^{\flat}\not\equiv 0$ is treated as a perturbation of the previous case, thanks to the Laplace-Varadhan Lemma.
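For the reader's convenience, we recall the standard form of the tilting argument in the simplest setting, where the perturbation is continuous and bounded. The notation $X$ , $\mu_{n}$ , $I$ and $G$ below is generic and introduced only for this sketch; the case of an unbounded perturbation, which arises when $\ell>1$ , requires additional moment conditions and is not covered by this statement.

```latex
% Tilted large deviation principle (bounded continuous perturbation).
% X: Polish space; (\mu_n)_{n \geq 1}: probability measures on X satisfying
% a large deviation principle with good rate function I;
% G: X \to \mathbb{R} continuous and bounded.
\frac{\mathrm{d}\mu_n^{G}}{\mathrm{d}\mu_n}(x)
  = \frac{\mathrm{e}^{-nG(x)}}{\int_X \mathrm{e}^{-nG}\,\mathrm{d}\mu_n}
\quad\Longrightarrow\quad
(\mu_n^{G})_{n \geq 1} \text{ satisfies a large deviation principle
with good rate function } I^{G} := I + G - \inf_{X}\,(I+G).
```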

2.7.2. rb-model

Let $B:[0,1]\to[0,+\infty)$ be a $C^{1}$ flux function satisfying the condition (8), which ensures that the energy functional $\mathscr{W}$ defined by (9) is not identically equal to $+\infty$ . It is known [42] that the condition

[TABLE]

which is called Oleinik’s entropy condition in the vocabulary of conservation laws, ensures the ergodicity of the centered particle system introduced in Subsection 2.3. The combination of (8) and (18) implies that $B^{\prime}(0)\geq 0\geq B^{\prime}(1)$ , and the stronger condition

[TABLE]

which is called Lax’ entropy condition, generally ensures better ergodic properties of both the particle system and its mean-field limit [30, 32, 33, 43]. Notice that if $B$ is assumed to be concave, then Oleinik’s and Lax’ conditions are equivalent, and hold as soon as $B$ is not identically zero.

We shall check in Subsection 5.2 that this set of conditions implies that the energy functional $\mathscr{W}$ satisfies the assumptions of Theorem 2.16, and in particular Assumption (GC) with $\ell=1$ , which allows us to define the sequence $\overline{\mathbb{P}}_{n}$ associated with $\mathscr{W}$ and leads to the following result.

Corollary 2.19 (LDP for the rb-model).

Let $B:[0,1]\to[0,+\infty)$ be a $C^{1}$ flux function satisfying the conditions (8), (18) and (19). Let $\mathscr{W}$ be the energy functional associated with $B$ by (9). The sequence $\overline{\mathbb{P}}_{n}$ associated with $\mathscr{W}$ is well-defined, and it satisfies a large deviation principle on $\overline{\mathcal{P}}(\mathbb{R})$ , with good rate function $\overline{\mathscr{I}}$ given by Theorem 2.16.

In mathematical finance, systems of rank-based interacting diffusions are employed to model the evolution of the logarithmic capitalisations of stocks on an equity market [26, 2, 33]. In Subsection 5.3, we present an application of Corollary 2.19 to the study of atypical capital distribution in this framework.

3. Large deviations with a small external potential

Throughout this section, $\mathscr{W}:\mathcal{P}(\mathbb{R}^{d})\to[0,+\infty]$ is an energy functional satisfying Assumptions (TI), ( $\sigma$ F), (LSC), (GC) and (CC), and $\ell\geq 1$ is the index given by Assumption (GC). We do not repeat these assumptions in the statements of our results.

We first introduce a few notations. For all $\eta>0$ , we define

[TABLE]

for all $x\in\mathbb{R}^{d}$ , and let

[TABLE]

for all $\mu\in\mathcal{P}(\mathbb{R}^{d})$ , as well as

[TABLE]

for all $\mathbf{x}\in(\mathbb{R}^{d})^{n}$ . Let

[TABLE]

and let $\nu^{\eta}$ be the probability measure on $\mathbb{R}^{d}$ with density $(z^{\eta})^{-1}\exp(-2V^{\eta}/\sigma^{2})$ with respect to the Lebesgue measure on $\mathbb{R}^{d}$ .
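To get a feeling for the normalising constant, the sketch below computes it in dimension one under the assumption, made here for illustration only, that the external potential is $V^{\eta}(x)=\eta|x|^{\ell}$ (the text only states that it grows as $|x|^{\ell}$ ). Under this assumption the integral has the closed form $2\Gamma(1+1/\ell)(2\eta/\sigma^{2})^{-1/\ell}$ , which is compared with a midpoint quadrature. The function names are hypothetical.

```python
import math

def z_closed_form(eta, ell, sigma):
    """Integral over R of exp(-2 * eta * |x|**ell / sigma**2).
    Assumes V(x) = eta * |x|**ell, which is an illustrative choice."""
    c = 2.0 * eta / sigma ** 2
    return 2.0 * math.gamma(1.0 + 1.0 / ell) * c ** (-1.0 / ell)

def z_numeric(eta, ell, sigma, half_width=40.0, steps=200_000):
    """Midpoint rule for the same integral on [-half_width, half_width]."""
    c = 2.0 * eta / sigma ** 2
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        total += math.exp(-c * abs(x) ** ell)
    return total * h
```

Under the same assumption, the constant behaves like $\eta^{-1/\ell}$ and therefore diverges when $\eta$ vanishes, which reflects the fact that no normalisable reference measure is available without confinement.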

3.1. Relative entropy and Sanov’s Theorem

We recall that the relative entropy of $\mu\in\mathcal{P}(\mathbb{R}^{d})$ with respect to $\nu\in\mathcal{P}(\mathbb{R}^{d})$ is defined by

[TABLE]

The following lemma is straightforward.

Lemma 3.1 (From Boltzmann’s entropy to relative entropy).

For all $\mu\in\mathcal{P}(\mathbb{R}^{d})$ ,

[TABLE]

The identity (24) holds in $[0,+\infty]$ , in the sense that if $\mu\not\in\mathcal{P}_{\ell}(\mathbb{R}^{d})$ , then $\mathscr{R}[\mu|\nu]=+\infty$ ; while if $\mu\in\mathcal{P}_{\ell}(\mathbb{R}^{d})$ , then $\mathscr{S}[\mu]$ and $\mathscr{R}[\mu|\nu]$ are simultaneously finite or equal to $+\infty$ .
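The computation behind this lemma is short. The sketch below assumes that $\mathscr{S}[\mu]$ denotes the Boltzmann entropy $\int p\log p$ of a measure $\mu$ with density $p$ , and uses only the definition of the density of $\nu^{\eta}$ given above; it is meant as a reading aid and not as a restatement of (24).

```latex
% mu has density p; nu^eta has density q^eta = (z^eta)^{-1} exp(-2 V^eta / sigma^2).
\mathscr{R}[\mu|\nu^{\eta}]
  = \int_{\mathbb{R}^{d}} p(x)\,\log\frac{p(x)}{q^{\eta}(x)}\,\mathrm{d}x
  = \int_{\mathbb{R}^{d}} p(x)\log p(x)\,\mathrm{d}x
    + \frac{2}{\sigma^{2}}\int_{\mathbb{R}^{d}} V^{\eta}(x)\,\mathrm{d}\mu(x)
    + \log z^{\eta}.
```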

With the notations introduced above, let us define

[TABLE]

Proposition 3.2 (Sanov’s Theorem).

For all $\eta>0$ , the sequence $\mathbb{Q}^{\eta}_{n}$ satisfies a large deviation principle on $\mathcal{P}(\mathbb{R}^{d})$ , with good rate function $\mathscr{R}[\cdot|\nu^{\eta}]$ . If $\ell>1$ , the large deviation principle holds on $\mathcal{P}_{p}(\mathbb{R}^{d})$ for all $p\in[1,\ell)$ , with the same rate function.

The statement of the large deviation principle on $\mathcal{P}(\mathbb{R}^{d})$ is the usual formulation of Sanov’s Theorem [22, Theorem 6.2.10, p. 263]. Its extension to $\mathcal{P}_{p}(\mathbb{R}^{d})$ is due to Wang, Wang and Wu [46].
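The content of Sanov's Theorem can be observed numerically in the simplest possible setting, which is not the one of the article: independent Bernoulli variables, for which the empirical measure is described by the frequency of ones. The sketch below, with hypothetical function names, compares the exact exponential decay rate of the probability that this frequency exceeds $7/10$ with the relative entropy of the Bernoulli law of parameter $7/10$ with respect to the law of parameter $p=1/2$ .

```python
import math

def kl_bernoulli(a, p):
    """Relative entropy of Bernoulli(a) with respect to Bernoulli(p)."""
    out = 0.0
    if a > 0.0:
        out += a * math.log(a / p)
    if a < 1.0:
        out += (1.0 - a) * math.log((1.0 - a) / (1.0 - p))
    return out

def log_binom_pmf(n, k, p):
    """Logarithm of the Binomial(n, p) probability of the value k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def log_tail(n, k_min, p):
    """log P(K >= k_min) for K ~ Binomial(n, p), via log-sum-exp."""
    logs = [log_binom_pmf(n, k, p) for k in range(k_min, n + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

p = 0.5
rate = kl_bernoulli(0.7, p)
# -(1/n) log P(frequency of ones >= 7/10), for increasing n
decay = {n: -log_tail(n, 7 * n // 10, p) / n for n in (100, 1000, 10000)}
```

The values in `decay` remain above `rate` and approach it as $n$ grows, in agreement with the large deviation upper and lower bounds.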

3.2. Large deviations in the interacting case

Owing to Assumption ( $\sigma$ F), we have

[TABLE]

and we denote by $P^{\eta}_{n}$ the probability measure on $(\mathbb{R}^{d})^{n}$ with density

[TABLE]

with respect to the Lebesgue measure on $(\mathbb{R}^{d})^{n}$ . We finally let

[TABLE]

and define the free energy functional $\mathscr{F}^{\eta}$ by

[TABLE]

Large deviation principles for equilibrium mean-field systems with an external potential may be considered to be standard results in the literature [36, 20, 16, 24]. We however give a complete proof of the next statement, which is adapted to our assumptions on the energy functional $\mathscr{W}$ , and follows closely the arguments of Dupuis, Laschos and Ramanan [24, Theorem 1.5].

Proposition 3.3 (LDP for the sequence $\mathbb{P}^{\eta}_{n}$ ).

For all $\eta>0$ , the sequence $\mathbb{P}^{\eta}_{n}$ satisfies a large deviation principle on $\mathcal{P}(\mathbb{R}^{d})$ with good rate function

[TABLE]

where

[TABLE]

If $\ell>1$ , the large deviation principle holds on $\mathcal{P}_{p}(\mathbb{R}^{d})$ for all $p\in[1,\ell)$ , with the same rate function.

Notice that the same arguments as in Remark 2.11 show that $\mathscr{F}^{\eta}_{\star}<+\infty$ . On the other hand, combining Lemma 3.1 with (27) yields

[TABLE]

so that the nonnegativity of both the relative entropy and the energy functional ensures that $\mathscr{F}^{\eta}_{\star}>-\infty$ .

We may now proceed to the proof of Proposition 3.3.

Proof.

The proof relies on the so-called weak convergence approach to large deviations developed by Dupuis and Ellis [23]. Throughout the proof, we use the notation $\mathcal{P}_{*}(\mathbb{R}^{d})$ to refer to either of the topological spaces $\mathcal{P}(\mathbb{R}^{d})$ or $\mathcal{P}_{p}(\mathbb{R}^{d})$ , if $\ell>1$ and $p\in[1,\ell)$ . We recall that both spaces are Polish.

As a first step, we invoke [23, Theorem 1.2.3, p. 7] to reduce the proof of Proposition 3.3 to the verification of the following two facts:

(i)

the function $\mathscr{I}^{\eta}$ has compact level sets on $\mathcal{P}_{*}(\mathbb{R}^{d})$ ;

(ii)

for any continuous and bounded functional $\mathscr{G}:\mathcal{P}_{*}(\mathbb{R}^{d})\to\mathbb{R}$ , the Laplace principle

[TABLE]

holds.
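A Laplace principle of this type can be tested numerically in a toy setting which is not the one of the article: the empirical frequency of ones among $n$ independent Bernoulli variables of parameter $p$ , whose rate function is the relative entropy between Bernoulli laws. The test functional `G` and all function names below are assumptions made for the example.

```python
import math

def kl_bernoulli(a, p):
    """Relative entropy of Bernoulli(a) with respect to Bernoulli(p)."""
    out = 0.0
    if a > 0.0:
        out += a * math.log(a / p)
    if a < 1.0:
        out += (1.0 - a) * math.log((1.0 - a) / (1.0 - p))
    return out

def log_binom_pmf(n, k, p):
    """Logarithm of the Binomial(n, p) probability of the value k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def G(a):
    """A bounded continuous test functional of the empirical frequency."""
    return 2.0 * abs(a - 0.9)

def prelimit(n, p):
    """-(1/n) log E[exp(-n G(K/n))] with K ~ Binomial(n, p)."""
    logs = [log_binom_pmf(n, k, p) - n * G(k / n) for k in range(n + 1)]
    m = max(logs)
    return -(m + math.log(sum(math.exp(v - m) for v in logs))) / n

def variational_limit(p, grid=10_000):
    """Grid approximation of the infimum of G + rate function over [0, 1]."""
    return min(G(j / grid) + kl_bernoulli(j / grid, p)
               for j in range(grid + 1))
```

For instance, `prelimit(2000, 0.5)` and `variational_limit(0.5)` differ by a quantity of order $\log(n)/n$ .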

Proof of (i). Using Lemma 3.1, we rewrite

[TABLE]

so that it suffices to show that $\mathscr{R}[\cdot|\nu^{\eta}]+\frac{2}{\sigma^{2}}\mathscr{W}$ has compact level sets. As a consequence of Proposition 3.2, $\mathscr{R}[\cdot|\nu^{\eta}]$ is a good rate function on $\mathcal{P}_{*}(\mathbb{R}^{d})$ and therefore has compact level sets. Since the functional $\mathscr{W}$ is nonnegative and satisfies Assumption (LSC), the sum $\mathscr{R}[\cdot|\nu^{\eta}]+\frac{2}{\sigma^{2}}\mathscr{W}$ is lower semicontinuous, so that any of its level sets is a closed subset of a level set of $\mathscr{R}[\cdot|\nu^{\eta}]$ , and therefore is compact.

Reformulation of (28). Let us first remark that, on account of the definitions of $\mathbb{P}^{\eta}_{n}$ and $\mathbb{Q}^{\eta}_{n}$ ,

[TABLE]

As a consequence, the prelimit in (28) can be rewritten as

[TABLE]

so that it suffices to compute the limit of the first term in the right-hand side, and deduce the limit of the second by taking $\mathscr{G}\equiv 0$ . The computation of such quantities is typically the object of Varadhan’s Lemma, which cannot be directly applied here since the functional $\mathscr{W}$ is not assumed to be continuous and bounded.

Lower bound in the Laplace principle. Using the fact that $\mathscr{W}$ is bounded from below and satisfies Assumption (LSC), the combination of Proposition 3.2 with the variant of Varadhan’s Lemma [22, Lemma 4.3.6, p. 138] provides the lower bound

[TABLE]

Upper bound in the Laplace principle. In order to obtain an upper bound of the same order as (32), we first introduce a few notations. For all $\mathbf{x}\in(\mathbb{R}^{d})^{n}$ , we define

[TABLE]

and for all $M\geq 0$ ,

[TABLE]

The function $\Psi^{M}_{n}$ is measurable and bounded on $(\mathbb{R}^{d})^{n}$ , so that the representation formula [23, Proposition 1.4.2, p. 27] (or dually the Donsker-Varadhan variational characterisation of the relative entropy [23, Lemma 1.4.3, p. 29]) shows that, for all probability measures $R_{n}$ on $(\mathbb{R}^{d})^{n}$ ,

[TABLE]

where the definition of the relative entropy of probability measures on $(\mathbb{R}^{d})^{n}$ is the same as (23) for probability measures on $\mathbb{R}^{d}$ . Using the trivial bound $\Psi^{M}_{n}(\mathbf{x})\geq\Psi_{n}(\mathbf{x})$ on the one hand, and the fact that since $\Psi_{n}$ is bounded from above on $(\mathbb{R}^{d})^{n}$ , the Dominated Convergence Theorem yields

[TABLE]

on the other hand, we deduce that

[TABLE]

which can be rewritten as

[TABLE]
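The variational representation invoked in this step can be checked by hand on a finite set, where it reads $-\log\sum_{i}q_{i}\mathrm{e}^{-\psi_{i}}=\min_{r}\{\mathscr{R}[r|q]+\sum_{i}r_{i}\psi_{i}\}$ , the minimum being attained at the Gibbs measure $r^{\star}_{i}\propto q_{i}\mathrm{e}^{-\psi_{i}}$ . The sketch below uses arbitrary illustrative values for $q$ and $\psi$ ; the sign convention is the one of this finite formula and the function names are hypothetical.

```python
import math
import random

def rel_entropy(r, q):
    """Relative entropy sum_i r_i log(r_i / q_i) on a finite set (q_i > 0)."""
    return sum(ri * math.log(ri / qi) for ri, qi in zip(r, q) if ri > 0.0)

def objective(r, q, psi):
    """Relative entropy of r with respect to q, plus the mean of psi under r."""
    return rel_entropy(r, q) + sum(ri * pi for ri, pi in zip(r, psi))

def gibbs(q, psi):
    """Gibbs measure with weights proportional to q_i * exp(-psi_i)."""
    w = [qi * math.exp(-pi) for qi, pi in zip(q, psi)]
    z = sum(w)
    return [wi / z for wi in w]

def random_probability(rng, k):
    """A random probability vector of length k."""
    w = [rng.random() for _ in range(k)]
    s = sum(w)
    return [wi / s for wi in w]

q = [0.1, 0.2, 0.3, 0.4]            # reference measure (illustrative)
psi = [1.0, -0.5, 0.25, 2.0]        # bounded function (illustrative)

lhs = -math.log(sum(qi * math.exp(-pi) for qi, pi in zip(q, psi)))
value_at_gibbs = objective(gibbs(q, psi), q, psi)

rng = random.Random(1)
others = [objective(random_probability(rng, 4), q, psi) for _ in range(1000)]
```

The value at the Gibbs measure coincides with `lhs`, while every other probability vector gives a larger value of the objective. This is the mechanism that allows one to bound the left-hand side from above by evaluating the objective at a well-chosen product measure.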

Let $\epsilon>0$ , and let $\mu_{\epsilon}\in\mathcal{P}_{*}(\mathbb{R}^{d})$ be such that

[TABLE]

We evaluate the right-hand side of (33) with $R_{n}=\mu_{\epsilon}^{\otimes n}$ . On the one hand, it is easily seen that

[TABLE]

while on the other hand,

[TABLE]

where $Y_{1},\ldots,Y_{n}$ are independent random variables in $\mathbb{R}^{d}$ with identical distribution $\mu_{\epsilon}$ on some probability space $(\Omega,\mathcal{A},\mathbf{P})$ . By Assumption (CC),

[TABLE]

whereas to justify the convergence of $\mathbf{E}[\mathscr{G}[\pi_{n}(Y_{1},\ldots,Y_{n})]]$ to $\mathscr{G}[\mu_{\epsilon}]$ , we now show that

[TABLE]

and conclude by the Dominated Convergence Theorem using the fact that $\mathscr{G}$ is continuous and bounded on $\mathcal{P}_{*}(\mathbb{R}^{d})$ .

•

If $\mathcal{P}_{*}(\mathbb{R}^{d})$ refers to the topological space $\mathcal{P}(\mathbb{R}^{d})$ , then (34) is the Glivenko-Cantelli Lemma.

•

If $\ell>1$ and $\mathcal{P}_{*}(\mathbb{R}^{d})$ refers to the topological space $\mathcal{P}_{p}(\mathbb{R}^{d})$ , with $p\in[1,\ell)$ , then by the strong Law of Large Numbers,

[TABLE]

which, combined with the Glivenko-Cantelli Lemma, implies the $\mathbf{P}$ -almost sure convergence in $\mathcal{P}_{p}(\mathbb{R}^{d})$ of $\pi_{n}(Y_{1},\ldots,Y_{n})$ to $\mu_{\epsilon}$ [45, Definition 6.8, p. 96].
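The almost sure convergence of empirical measures can be visualised in dimension one, where the Wasserstein distance of order $1$ between two probability measures is the $L^{1}$ distance between their cumulative distribution functions. The sketch below is an illustration under assumptions of our choosing (uniform samples on $[0,1]$ , a fixed random seed, a Riemann sum for the integral) and is not part of the proof.

```python
import random

def w1_to_uniform(sample, grid=20_000):
    """Approximate W_1 between the empirical measure of `sample` and the
    uniform law on [0, 1], using W_1 = integral over [0, 1] of |F_n(x) - x|."""
    xs = sorted(sample)
    n = len(xs)
    total, i = 0.0, 0
    for j in range(grid):
        x = (j + 0.5) / grid
        while i < n and xs[i] <= x:   # i counts the sample points <= x
            i += 1
        total += abs(i / n - x)
    return total / grid

rng = random.Random(0)
distances = {n: w1_to_uniform([rng.random() for _ in range(n)])
             for n in (100, 10_000)}
```

The distance is expected to decrease at the rate $n^{-1/2}$ in this example.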

As a consequence, we finally get

[TABLE]

Conclusion of the proof. Letting $\epsilon\downarrow 0$ in (35) and combining the latter inequality with (32), we conclude that

[TABLE]

so that, taking (31) into account,

[TABLE]

By (29), the right-hand side above can be rewritten as $\inf_{\mu\in\mathcal{P}_{*}(\mathbb{R}^{d})}\{\mathscr{G}[\mu]+\mathscr{I}^{\eta}[\mu]\}$ , which yields (28) and completes the proof. ∎

3.3. The measures $\overline{\mathbb{P}}^{\eta}_{n}$ and $\widetilde{\mathbb{P}}^{\eta}_{n}$

Let us define the functional $\vartheta:\mathcal{P}(\mathbb{R}^{d})\to[0,+\infty]$ by

[TABLE]

Notice that $\vartheta[\mu]<+\infty$ if and only if $\mu\in\mathcal{P}_{\ell}(\mathbb{R}^{d})$ , and that $\vartheta$ is translation invariant.

For all $\eta>0$ , we define the probability measures $\overline{\mathbb{P}}^{\eta}_{n}$ and $\widetilde{\mathbb{P}}^{\eta}_{n}$ , respectively on the Borel $\sigma$ -fields of the topological spaces $\overline{\mathcal{P}}(\mathbb{R}^{d})$ and $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ , for any $p\geq 1$ , by the identities

[TABLE]

Since the operators $\rho:\mathcal{P}(\mathbb{R}^{d})\to\overline{\mathcal{P}}(\mathbb{R}^{d})$ and $\mathrm{T}:\mathcal{P}_{p}(\mathbb{R}^{d})\to\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ are continuous, the following result is obtained from Proposition 3.3 by means of the Contraction Principle [22, Theorem 4.2.1, p. 126].

Corollary 3.4 (LDP for $\overline{\mathbb{P}}^{\eta}_{n}$ and $\widetilde{\mathbb{P}}^{\eta}_{n}$ ).

For all $\eta>0$ , the sequence $\overline{\mathbb{P}}^{\eta}_{n}$ satisfies a large deviation principle on $\overline{\mathcal{P}}(\mathbb{R}^{d})$ with good rate function

[TABLE]

In addition, if $\ell>1$ , then for all $p\in[1,\ell)$ , for all $\eta>0$ , the sequence $\widetilde{\mathbb{P}}^{\eta}_{n}$ satisfies a large deviation principle on $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ with good rate function

[TABLE]

3.4. Alternative expression for $\widetilde{\mathbb{P}}^{\eta}_{n}$

We denote by $\mathrm{t}_{n}$ the orthogonal projection of $(\mathbb{R}^{d})^{n}$ onto the subspace $M_{d,n}$ , and for all $\eta>0$ , we define the probability measure $\widetilde{P}^{\eta}_{n}$ on $(\mathbb{R}^{d})^{n}$ by

[TABLE]

Notice that $\widetilde{P}^{\eta}_{n}(M_{d,n})=1$ . We also define the function $\widehat{V}_{n}^{\eta}:M_{d,n}\to\mathbb{R}$ by the identity

[TABLE]

where for all $\zeta\in\mathbb{R}^{d}$ , we denote by $\vec{\zeta}=(\zeta,\ldots,\zeta)$ the corresponding element of $(\mathbb{R}^{d})^{n}$ .

Lemma 3.5 (Relation between $\widetilde{\mathbb{P}}^{\eta}_{n}$ and $\widetilde{P}^{\eta}_{n}$ ).

For all $\eta>0$ ,

[TABLE]

and the probability measure $\widetilde{P}^{\eta}_{n}$ defined by (37) possesses the density

[TABLE]

with respect to the Lebesgue measure $\mathrm{d}\widetilde{\mathbf{x}}$ on $M_{d,n}$ . Besides, the probability measure $\widetilde{\mathbb{P}}^{\eta}_{n}$ defined by (36) satisfies

[TABLE]

Proof.

Let $B$ be a Borel subset of $(\mathbb{R}^{d})^{n}$ . By (37) and (25),

[TABLE]

Any $\mathbf{x}\in(\mathbb{R}^{d})^{n}$ admits the orthogonal decomposition $\mathbf{x}=\widetilde{\mathbf{x}}+\vec{\zeta}$ , with $\widetilde{\mathbf{x}}=\mathrm{t}_{n}(\mathbf{x})\in M_{d,n}$ and $\vec{\zeta}=(\zeta,\ldots,\zeta)$ for some $\zeta\in\mathbb{R}^{d}$ . As a consequence, $\widetilde{P}^{\eta}_{n}(B)$ can be rewritten as

[TABLE]

where we have used the fact that $W_{n}(\widetilde{\mathbf{x}}+\vec{\zeta})=W_{n}(\widetilde{\mathbf{x}})$ , thanks to Assumption (TI), and the definition (38) of $\widehat{V}_{n}^{\eta}$ . This shows (39) and the fact that $\widetilde{P}^{\eta}_{n}$ possesses the density $\widetilde{p}^{\eta}_{n}(\widetilde{\mathbf{x}})$ . Last, (40) follows from the elementary relation $\mathrm{T}\circ\pi_{n}=\pi_{n}\circ\mathrm{t}_{n}$ on $(\mathbb{R}^{d})^{n}$ . ∎
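The orthogonal decomposition used in this proof is elementary to implement. The sketch below assumes that $M_{d,n}$ is the subspace of configurations with zero centre of mass, so that the projection subtracts the empirical mean from each particle; the function names are hypothetical.

```python
def t_n(x):
    """Subtract the centre of mass from every particle.
    x is a list of n points of R^d, given as tuples.
    Returns the centred configuration and the centre of mass zeta."""
    n, d = len(x), len(x[0])
    zeta = tuple(sum(p[k] for p in x) / n for k in range(d))
    centred = [tuple(p[k] - zeta[k] for k in range(d)) for p in x]
    return centred, zeta

def inner(u, v):
    """Euclidean inner product on (R^d)^n."""
    return sum(a * b for p, q in zip(u, v) for a, b in zip(p, q))

x = [(1.0, 2.0), (3.0, -1.0), (-2.0, 5.0), (4.0, 0.0)]
x_tilde, zeta = t_n(x)
zeta_vec = [zeta] * len(x)            # the constant configuration (zeta, ..., zeta)
rebuilt = [tuple(a + b for a, b in zip(p, q))
           for p, q in zip(x_tilde, zeta_vec)]
```

Here `x_tilde` has zero centre of mass, is orthogonal to `zeta_vec`, and `rebuilt` coincides with `x`, which is the decomposition $\mathbf{x}=\widetilde{\mathbf{x}}+\vec{\zeta}$ . Computing the empirical measure of `x_tilde` amounts to centring the empirical measure of `x`.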

In Section 4, we shall rely on the following bounds on the function $\widehat{V}_{n}^{\eta}$ .

Lemma 3.6 (Bounds on $\widehat{V}_{n}^{\eta}$ ).

Let $n\geq 2$ and $\eta>0$ . For all $\widetilde{\mathbf{x}}\in M_{d,n}$ ,

[TABLE]

where we recall the definition (22) of $z^{\eta}$ .

Proof.

The upper bound follows from the convexity inequality

[TABLE]

while the lower bound follows from Jensen’s inequality

[TABLE]

since $\widetilde{\mathbf{x}}\in M_{d,n}$ . ∎

4. Proof of Theorems 2.14 and 2.16

This section is dedicated to the proof of the large deviation principles contained in Theorems 2.14 and 2.16. We first check in Subsection 4.1 that, under the respective assumptions of these theorems, the functionals $\widetilde{\mathscr{I}}$ and $\overline{\mathscr{I}}$ are good rate functions. In Subsection 4.2, we obtain auxiliary results on the respective approximation of $\widetilde{\mathbb{P}}_{n}$ and $\overline{\mathbb{P}}_{n}$ by the measures $\widetilde{\mathbb{P}}^{\eta}_{n}$ and $\overline{\mathbb{P}}^{\eta}_{n}$ introduced in Section 3. These results allow us to prove large deviation upper and lower bounds in Subsection 4.3, thereby completing the proof of Theorems 2.14 and 2.16.

4.1. Rate functions

The purpose of this subsection is to prove the following result.

Lemma 4.1 (Goodness of rate functions).

Under the assumptions of Lemma 2.10, the functional $\overline{\mathscr{F}}$ has compact level sets on $\overline{\mathcal{P}}(\mathbb{R}^{d})$ , and if the index $\ell\geq 1$ given by Assumption (GC) is such that $\ell>1$ , then for all $p\in[1,\ell)$ , the functional $\mathscr{F}$ has compact level sets on $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ .

Combining the results of Lemmas 2.10 and 4.1, we conclude that, under the respective assumptions of Theorems 2.14 and 2.16, the functionals $\widetilde{\mathscr{I}}$ and $\overline{\mathscr{I}}$ are good rate functions, respectively on $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ and $\overline{\mathcal{P}}(\mathbb{R}^{d})$ . We first state an auxiliary result.

Lemma 4.2 (Level sets on $\mathcal{P}(\mathbb{R}^{d})$ ).

Under the assumptions of Lemma 2.10, for all $a\in\mathbb{R}$ , the set

[TABLE]

is closed in $\mathcal{P}(\mathbb{R}^{d})$ . Besides, letting $\ell\geq 1$ be given by Assumption (GC), we have $A\subset\mathcal{P}_{\ell}(\mathbb{R}^{d})$ and there exists $a^{\prime}\in\mathbb{R}$ such that

[TABLE]

Proof.

Since, by Remark 2.7, neither $\mathscr{S}[\mu]$ nor $\mathscr{W}[\mu]$ can take the value $-\infty$ , any $\mu\in A$ satisfies $\mathscr{S}[\mu]<+\infty$ and $\mathscr{W}[\mu]<+\infty$ , which by Assumption (GC) ensures that $A\subset\mathcal{P}_{\ell}(\mathbb{R}^{d})$ .

Let us now fix $\mu\in A$ and define $\widetilde{\mu}=\mathrm{T}\mu\in\widetilde{\mathcal{P}}_{\ell}(\mathbb{R}^{d})$ . For all $\eta>0$ , we recall the definitions of $z^{\eta}$ and $\nu^{\eta}$ from Section 3. By the translation invariance of $\mathscr{F}$ , Lemma 3.1 and the definition (21) of $\mathscr{V}^{\eta}$ ,

[TABLE]

Using the fact that the relative entropy is nonnegative and then Assumption (GC), we deduce that

[TABLE]

so that taking $\eta=\kappa_{\ell}/2$ and recalling that $\mu\in A$ yields

[TABLE]

which provides (41).

In order to show that $A$ is closed in $\mathcal{P}(\mathbb{R}^{d})$ , let us take a sequence $(\mu_{n})_{n\geq 1}$ in $A$ , which converges to some $\mu$ in $\mathcal{P}(\mathbb{R}^{d})$ , and prove that

[TABLE]

which implies $\mu\in A$ . As a first step, we note that, according to the first part of the proof, $\mu_{n}\in\mathcal{P}_{\ell}(\mathbb{R}^{d})$ for all $n\geq 1$ , which allows us to define $\widetilde{\mu}_{n}=\mathrm{T}\mu_{n}$ and notice that $\mathscr{F}[\widetilde{\mu}_{n}]=\mathscr{F}[\mu_{n}]$ ; besides, by (41), the sequence of $\ell$ -th order moments of $\widetilde{\mu}_{n}$ is bounded. Since the functional $\mathscr{W}$ is nonnegative, the sequence $\mathscr{F}[\widetilde{\mu}_{n}]$ is also bounded. Denoting by $\widetilde{p}_{n}$ the density of $\widetilde{\mu}_{n}$ , we then obtain from standard arguments [29, pp. 7-8] the existence of a probability density $\widetilde{q}$ toward which $\widetilde{p}_{n}$ converges weakly in $L^{1}(\mathbb{R}^{d})$ , at least along a subsequence, and such that

[TABLE]

where we denote by $\widetilde{\nu}$ the probability measure with density $\widetilde{q}$ . Finally, since the orbit map $\rho:\mathcal{P}(\mathbb{R}^{d})\to\overline{\mathcal{P}}(\mathbb{R}^{d})$ is continuous, the series of identities

[TABLE]

in $\overline{\mathcal{P}}(\mathbb{R}^{d})$ implies that $\mathscr{F}[\mu]=\mathscr{F}[\widetilde{\nu}]$ , whence the conclusion. ∎

The inequality (42) shows that $\mathscr{F}$ is bounded from below on $\mathcal{P}(\mathbb{R}^{d})$ , which proves the statement (ii) of Lemma 2.10. We may now complete the proof of Lemma 4.1.

Proof of Lemma 4.1.

We fix $a\in\mathbb{R}$ and first prove that the set

[TABLE]

is compact in $\overline{\mathcal{P}}(\mathbb{R}^{d})$ . By Lemma A.1, this set is closed if and only if $\rho^{-1}(\overline{A})$ is closed in $\mathcal{P}(\mathbb{R}^{d})$ , which is the case since $\rho^{-1}(\overline{A})$ is easily seen to coincide with the set $A$ of Lemma 4.2. We now proceed to show that this set is sequentially compact. Let $(\overline{\mu}_{n})_{n\geq 1}$ be a sequence of elements of $\overline{A}$ . By Lemma 4.2, for all $n\geq 1$ there exists $\widetilde{\mu}_{n}\in A\cap\widetilde{\mathcal{P}}_{\ell}(\mathbb{R}^{d})$ such that $\rho(\widetilde{\mu}_{n})=\overline{\mu}_{n}$ , and we have the moment control

[TABLE]

given by (41). Markov’s inequality implies that the sequence $(\widetilde{\mu}_{n})_{n\geq 1}$ is tight, so that by Prohorov’s Theorem [7, Theorem 5.1, p. 59], it possesses a converging subsequence. The continuity of the map $\rho$ then ensures that the sequence $(\overline{\mu}_{n})_{n\geq 1}$ possesses a converging subsequence as well, which shows the sequential compactness of $\overline{A}$ . Since we prove in Lemma A.3 that the quotient topology on $\overline{\mathcal{P}}(\mathbb{R}^{d})$ is metrisable, [22, Theorem B.2, p. 345] allows us to conclude that $\overline{A}$ is compact and obtain the first part of Lemma 4.1.
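The tightness argument can be made explicit as follows, where $C$ denotes any uniform bound on the $\ell$ -th order moments of the sequence (the symbol $C$ is introduced for this sketch only).

```latex
% C := \sup_{n \geq 1} \int_{\mathbb{R}^d} |x|^{\ell} \, d\widetilde{\mu}_n(x) < +\infty.
\sup_{n\geq 1}\,\widetilde{\mu}_{n}\bigl(\{x\in\mathbb{R}^{d}:|x|>R\}\bigr)
  \leq \frac{1}{R^{\ell}}\,\sup_{n\geq 1}\int_{\mathbb{R}^{d}}|x|^{\ell}\,\mathrm{d}\widetilde{\mu}_{n}(x)
  \leq \frac{C}{R^{\ell}}
  \xrightarrow[R\to+\infty]{} 0,
% and closed balls of \mathbb{R}^d are compact, whence the tightness.
```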

We now assume that $\ell>1$ , fix $p\in[1,\ell)$ , and prove that the set

[TABLE]

is compact in $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ . Since the Wasserstein topology is stronger than the topology of weak convergence, Lemma 4.2 implies that $A$ is closed in $\mathcal{P}_{p}(\mathbb{R}^{d})$ , and therefore $\widetilde{A}$ is closed in $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ . Now for all sequences $(\widetilde{\mu}_{n})_{n\geq 1}$ of elements of $\widetilde{A}$ , the moment control (43) ensures that $(\widetilde{\mu}_{n})_{n\geq 1}$ possesses a subsequence, that we still index by $n$ for convenience, which converges to some $\mu$ in $\mathcal{P}(\mathbb{R}^{d})$ . To prove that the convergence actually holds in $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ , we remark that since $p<\ell$ , the moment control (43) also ensures the uniform integrability of the $p$ -th order moment of $\widetilde{\mu}_{n}$ , so that by [45, Definition 6.8, p. 96], $\widetilde{\mu}_{n}$ converges to $\mu$ in $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ , therefore $\widetilde{A}$ is sequentially compact in $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ . By [22, Theorem B.2, p. 345] again, we conclude that $\widetilde{A}$ is compact in $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ , whence the second part of Lemma 4.1. ∎
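The uniform integrability claim used in the second part of the proof follows from a one-line estimate, in which $C$ again denotes a uniform bound on the $\ell$ -th order moments (a symbol introduced for this sketch only).

```latex
% For p < \ell and |x| > R, one has |x|^p = |x|^{\ell} |x|^{p-\ell} \leq R^{p-\ell} |x|^{\ell}.
\sup_{n\geq 1}\int_{\{|x|>R\}}|x|^{p}\,\mathrm{d}\widetilde{\mu}_{n}(x)
  \leq R^{p-\ell}\,\sup_{n\geq 1}\int_{\mathbb{R}^{d}}|x|^{\ell}\,\mathrm{d}\widetilde{\mu}_{n}(x)
  \leq \frac{C}{R^{\ell-p}}
  \xrightarrow[R\to+\infty]{} 0.
```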

4.2. Exponential comparisons

This subsection contains two auxiliary results which will be used in the proof of the large deviation upper and lower bounds.

Lemma 4.3 (Exponential tilting of $\widetilde{P}^{\eta}_{n}$ ).

Let $\mathscr{W}:\mathcal{P}(\mathbb{R}^{d})\to[0,+\infty]$ be an energy functional satisfying Assumptions (TI), ( $\sigma$ F), (LSC) and (GC), and let $\eta>0$ .

(i)

For all $p\geq 1$ , for all Borel sets $\widetilde{B}$ of $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ ,

[TABLE]

and

[TABLE]

(ii)

For all Borel sets $\overline{B}$ of $\overline{\mathcal{P}}(\mathbb{R}^{d})$ ,

[TABLE]

and

[TABLE]

Proof.

We first address the proof of the identities (44) and (46). The equality (44) is a straightforward consequence of the definition (36) of $\widetilde{\mathbb{P}}^{\eta}_{n}$ . To check the validity of (46), we recall that the respective definitions (36) and (26) of $\overline{\mathbb{P}}^{\eta}_{n}$ and $\mathbb{P}^{\eta}_{n}$ yield

[TABLE]

Besides, since for all $\mathbf{x}\in(\mathbb{R}^{d})^{n}$ ,

[TABLE]

we have $\rho\circ\pi_{n}=\rho\circ\pi_{n}\circ\mathrm{t}_{n}$ on $(\mathbb{R}^{d})^{n}$ . Hence we may substitute $\pi_{n}^{-1}\circ\rho^{-1}$ with $\mathrm{t}_{n}^{-1}\circ\pi_{n}^{-1}\circ\rho^{-1}$ in (48) to obtain

[TABLE]

thanks to (37). This equality immediately leads to (46).

We now address the proof of (45) and (47). For all $p\geq 1$ , for all Borel sets $\widetilde{B}$ of $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ , (12) yields

[TABLE]

so that (45) follows from Lemma 3.5. Likewise, for all Borel sets $\overline{B}$ of $\overline{\mathcal{P}}(\mathbb{R}^{d})$ , (47) is obtained by the same chain of arguments, but starting with (16) in place of (12). ∎

Lemma 4.4 (Exponential moment control).

Let $\mathscr{W}:\mathcal{P}(\mathbb{R}^{d})\to[0,+\infty]$ be an energy functional satisfying Assumptions (TI), ( $\sigma$ F), (LSC), (GC) and (SH). For all $q\in[1,+\infty)$ ,

[TABLE]

Proof.

Let us fix $q\in[1,+\infty)$ and $\epsilon\in(0,1)$ . The proof is divided into three steps.

Step 1. In this step, we construct $\eta_{0}>0$ , depending on $\epsilon$ , such that for all $\eta\leq\eta_{0}$ , there exists $n_{0}\geq 2$ which depends on $\eta$ such that, for all $n\geq n_{0}$ , for all $\widetilde{\mathbf{x}}\in M_{d,n}$ , if

[TABLE]

then

[TABLE]

We first rewrite (51) under the equivalent formulation

[TABLE]

On the one hand, the upper bound of Lemma 3.6 yields, for all $\widetilde{\mathbf{x}}\in M_{d,n}$ ,

[TABLE]

with

[TABLE]

on the other hand, Assumption (GC) yields, for all $\widetilde{\mathbf{x}}\in M_{d,n}$ ,

[TABLE]

We deduce that (51) holds as soon as

[TABLE]

With the latter condition at hand, let us define

[TABLE]

and notice that, for all $\eta\leq\eta_{0}$ ,

[TABLE]

so that there exists $n_{0}\geq 2$ , depending on $\eta$ , such that, for all $n\geq n_{0}$ ,

[TABLE]

and therefore

[TABLE]

As a conclusion, for all $n\geq n_{0}$ , if $\widetilde{\mathbf{x}}\in M_{d,n}$ satisfies (50) then (52) holds, which leads to (51).

Step 2. Let us fix $\eta\leq\eta_{0}$ and $n\geq n_{0}$ , where $\eta_{0}$ and $n_{0}$ are given by Step 1. In this step, we give an upper bound on

[TABLE]

by studying this integral separately on the domain

[TABLE]

and on its complement. By the upper bound of Lemma 3.6,

[TABLE]

On the other hand, using Lemma 3.5 and Step 1 we obtain the chain of inequalities

[TABLE]

By Assumption (SH), $(1-\epsilon)W_{n}(\widetilde{\mathbf{x}})\geq W_{n}((1-\epsilon)\widetilde{\mathbf{x}})$ . We now derive a similar bound for $\widehat{V}^{\eta(1-\epsilon)}_{n}(\widetilde{\mathbf{x}})$ . The definition (38) yields

[TABLE]

where we have performed the change of variable $\zeta=(1-\epsilon)\xi$ and used the fact that $(1-\epsilon)^{\ell}\leq 1-\epsilon$ . Thus,

[TABLE]

thanks to the change of variable $\widetilde{\mathbf{y}}=(1-\epsilon)\widetilde{\mathbf{x}}$ . Inserting this inequality at the end of (53), we obtain

[TABLE]

so that we may conclude this step by stating that

[TABLE]

Step 3. We complete the proof by studying the asymptotic behaviour of $I^{\eta}_{n}$ . By Step 2 and the standard asymptotic subadditivity argument,

[TABLE]

from which we then deduce that

[TABLE]

We may now complete the proof of (49) by letting $\epsilon$ vanish. ∎

4.3. Large deviation upper and lower bounds

In this subsection, we complete the proof of Theorems 2.14 and 2.16 by addressing the large deviation upper and lower bounds.

Lemma 4.5 (Large deviation upper bound).

Let $\mathscr{W}:\mathcal{P}(\mathbb{R}^{d})\to[0,+\infty]$ be an energy functional satisfying Assumptions (TI), ( $\sigma$ F), (LSC), (GC), (SH) and (CC).

(i)

For all closed sets $\overline{B}$ of $\overline{\mathcal{P}}(\mathbb{R}^{d})$ ,

[TABLE]

(ii)

If the index $\ell\geq 1$ given by Assumption (GC) is such that $\ell>1$ , then for all $p\in[1,\ell)$ , for all closed sets $\widetilde{B}$ of $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ ,

[TABLE]

Proof.

We shall prove both statements at once. Let $B^{\prime}$ refer to either $\overline{B}\subset\overline{\mathcal{P}}(\mathbb{R}^{d})$ or $\widetilde{B}\subset\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ , $\mathbb{P}^{\prime}_{n}$ (resp. $\mathbb{P}^{\prime\eta}_{n}$ ) refer to either $\overline{\mathbb{P}}_{n}$ (resp. $\overline{\mathbb{P}}^{\eta}_{n}$ ) or $\widetilde{\mathbb{P}}_{n}$ (resp. $\widetilde{\mathbb{P}}^{\eta}_{n}$ ), and so on. By Lemma 4.3, for all $\eta>0$ , for all $n\geq 2$ ,

[TABLE]

where $B^{\prime\prime}$ refers to either $\rho^{-1}(\overline{B})$ or $\widetilde{B}$ .

By (39) and the lower bound of Lemma 3.6,

[TABLE]

whence

[TABLE]

for any $\eta>0$ .

Let us now fix $q,q^{\prime}\in(1,+\infty)$ such that $1/q+1/q^{\prime}=1$ . By Hölder’s inequality, for all $\eta>0$ ,

[TABLE]

By Lemma 4.3, for all $\eta>0$ ,

[TABLE]

and by Corollary 3.4,

[TABLE]

with $\mathscr{I}^{\prime\eta}$ referring to either $\overline{\mathscr{I}}^{\eta}$ or $\widetilde{\mathscr{I}}^{\eta}$ . Using Lemma 4.4, we thus deduce that

[TABLE]

from which we deduce that

[TABLE]

thanks to Lemma 4.7 stated below. Since $q^{\prime}$ is arbitrarily close to $1$ , the proof is completed. ∎

Lemma 4.6 (Large deviation lower bound).

Let $\mathscr{W}:\mathcal{P}(\mathbb{R}^{d})\to[0,+\infty]$ be an energy functional satisfying Assumptions (TI), ( $\sigma$ F), (LSC), (GC), (SH) and (CC).

(i)

For all open sets $\overline{B}$ of $\overline{\mathcal{P}}(\mathbb{R}^{d})$ ,

[TABLE]

(ii)

If the index $\ell\geq 1$ given by Assumption (GC) is such that $\ell>1$ , then for all $p\in[1,\ell)$ , for all open sets $\widetilde{B}$ of $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ ,

[TABLE]

Proof.

We shall prove both statements at once, and use the same shortcut notations as in the proof of Lemma 4.5. Once again, we start from the fact that $\mathbb{P}^{\prime}_{n}(B^{\prime})$ satisfies the identity (54). Noting that

[TABLE]

and then using Lemma 4.4 with $q=1$ , we first obtain

[TABLE]

We now combine the lower bound of Lemma 3.6 with Lemma 4.3 to write

[TABLE]

from which we deduce that

[TABLE]

thanks to Corollary 3.4. The conclusion follows from the application of Lemma 4.7, which is stated below. ∎

Lemma 4.7 (Convergence of rate functions).

Under the assumptions of either Theorem 2.16 or Theorem 2.14, let $\mathscr{I}^{\prime}$ (resp. $\mathscr{I}^{\prime\eta}$ , for $\eta>0$ ) refer to either $\overline{\mathscr{I}}$ (resp. $\overline{\mathscr{I}}^{\eta}$ ) or $\widetilde{\mathscr{I}}$ (resp. $\widetilde{\mathscr{I}}^{\eta}$ ). Then for any subset $B^{\prime}$ of either $\overline{\mathcal{P}}(\mathbb{R}^{d})$ or $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ ,

[TABLE]

Proof.

The functions $\mathscr{I}^{\prime\eta}$ and $\mathscr{I}^{\prime}$ can be written as

[TABLE]

with obvious notations for $\mathscr{F}^{\prime}$ and $\vartheta^{\prime}$ , and

[TABLE]

Thus, it is sufficient to prove that

[TABLE]

The fact that $\vartheta^{\prime}[\mu^{\prime}]\geq 0$ immediately yields

[TABLE]

Furthermore, for any $\nu^{\prime}\in B^{\prime}$ ,

[TABLE]

and letting $\eta$ vanish on both sides of the inequality yields

[TABLE]

so that taking the infimum of the right-hand side over $\nu^{\prime}$ yields

[TABLE]

which completes the proof. ∎
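The two inequalities used in this proof can be checked numerically on a toy example. The following Python sketch is only an illustration under stated assumptions: the set $B^{\prime}$ is replaced by a finite grid on the real line, and the arrays `F` and `theta` are hypothetical stand-ins for $\mathscr{F}^{\prime}$ and $\vartheta^{\prime}$ (a double-well function and a nonnegative quadratic penalty), not the functionals of the article.

```python
import numpy as np

def penalised_infimum(F, theta, eta):
    """Infimum over a finite set of F + eta * theta (values given as arrays)."""
    return float(np.min(F + eta * theta))

# Hypothetical stand-ins: a grid plays the role of B', F is a double-well
# function and theta a nonnegative penalty.
grid = np.linspace(-2.0, 2.0, 4001)
F = (grid**2 - 1.0) ** 2
theta = grid**2

etas = [1.0, 0.1, 0.01, 0.001]
values = [penalised_infimum(F, theta, eta) for eta in etas]
inf_F = float(np.min(F))

for eta, value in zip(etas, values):
    print(f"eta={eta:<6} inf(F + eta*theta)={value:.6f}  inf F={inf_F:.6f}")
```

Since `theta` is nonnegative, each penalised infimum stays above `inf_F`, and evaluating at a minimiser of `F` bounds it from above by `inf_F + eta * theta`, which is the squeeze used in the proof.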

5. Application to McKean-Vlasov and rank-based models

5.1. mv-model

This subsection presents the proof of Corollary 2.18. We first assume that, in the decomposition (17), $W^{\flat}\equiv 0$ .

Lemma 5.1 (Case $W^{\flat}\equiv 0$ ).

Let $W^{\sharp}:\mathbb{R}^{d}\to[0,+\infty)$ be an interaction potential satisfying Assumption (mv- $\sharp$ ). Then the associated energy functional $\mathscr{W}^{\sharp}$ defined by (7) with $W=W^{\sharp}$ satisfies Assumptions (TI), ( $\sigma$ F), (LSC), (GC), (SH) and (CC); besides, Assumption (GC) holds with the index $\ell\geq 1$ given by Assumption (mv- $\sharp$ ).

Proof.

Assumptions (TI) and ( $\sigma$ F) are straightforward. The continuity of the mapping $\mu\mapsto\mu\otimes\mu$ on $\mathcal{P}(\mathbb{R}^{d})$ , combined with the fact that, by Assumption (mv- $\sharp$ ), $W^{\sharp}$ is nonnegative and lower semicontinuous, and with Fatou’s Lemma, yields Assumption (LSC).

Let $\ell\geq 1$ be given by Assumption (mv- $\sharp$ ). By (7), for all $\mu\in\mathcal{P}(\mathbb{R}^{d})$ ,

[TABLE]

which, by the Fubini-Tonelli Theorem, implies that $\mathscr{W}^{\sharp}[\mu]=+\infty$ if $\mu\not\in\mathcal{P}_{\ell}(\mathbb{R}^{d})$ . On the other hand, if $\widetilde{\mu}\in\widetilde{\mathcal{P}}_{\ell}(\mathbb{R}^{d})$ , then by Jensen’s Inequality,

[TABLE]

so that

[TABLE]

and $\mathscr{W}^{\sharp}$ satisfies Assumption (GC).

Assumption (SH) is a straightforward consequence of Assumption (mv- $\sharp$ ).

We finally let $\mu\in\mathcal{P}(\mathbb{R}^{d})$ and take a sequence of independent random variables $(Y_{n})_{n\geq 1}$ on some probability space $(\Omega,\mathcal{A},\mathbf{P})$ with identical distribution $\mu$ . For all $n\geq 2$ ,

[TABLE]

which leads to Assumption (CC) and completes the proof. ∎

We now address the general case $W=W^{\sharp}+W^{\flat}$ , with $W^{\flat}\not\equiv 0$ . We decompose the energy functional $\mathscr{W}$ , defined by (7), as

[TABLE]

with obvious definitions for $\mathscr{W}^{\sharp}$ and $\mathscr{W}^{\flat}$ . By Lemma 5.1 and Theorems 2.16 and 2.14, the sequences $\overline{\mathbb{P}}^{\sharp}_{n}$ and $\widetilde{\mathbb{P}}^{\sharp}_{n}$ associated with $\mathscr{W}^{\sharp}$ satisfy the large deviation principles of Corollary 2.18, with respective rate functions denoted by $\overline{\mathscr{I}}^{\sharp}$ and $\widetilde{\mathscr{I}}^{\sharp}$ . On the other hand,

[TABLE]

with an obvious definition for $\widetilde{Z}^{\sharp}_{n}$ .

If $\ell=1$ , then by Assumption (mv- $\flat$ ), $\overline{\mathscr{W}}^{\flat}$ is a bounded and continuous functional on $\overline{\mathcal{P}}(\mathbb{R}^{d})$ , so that the application of the Laplace-Varadhan Lemma [25, Theorem II.7.2, p. 52] is straightforward and yields the first part of Corollary 2.18.

Remark 5.2 (On the Laplace-Varadhan Lemma).

The statement of the Laplace-Varadhan Lemma in [25, Theorem II.7.2, p. 52] requires the state space to be Polish, which is not proved for $\overline{\mathcal{P}}(\mathbb{R}^{d})$ in the present article. However, a careful examination of the proof of this theorem shows that this assumption is in fact not necessary. More generally, we refer to [22, Section 4.3] for an exposition of Varadhan’s Lemma and various developments on regular (and in particular metric) topological spaces, which are not necessarily Polish.

Let us now assume that $\ell>1$ , and fix $p\in[\max(1,\ell^{\prime}),\ell)$ , where $\ell^{\prime}\in[0,\ell)$ is given by Assumption (mv- $\flat$ ). The functional $\mathscr{W}^{\flat}$ is continuous on $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ , but not necessarily bounded, so that following [25, Theorem II.7.2, p. 52] and [22, Lemma 4.3.8, p. 138], we shall check the exponential moment condition

[TABLE]

for some $\gamma>1$ — in fact, since any multiple of $\mathscr{W}^{\flat}$ also satisfies Assumption (mv- $\flat$ ), this condition should hold for any $\gamma\in\mathbb{R}$ .

Taking (55) for granted, the Laplace-Varadhan Lemma [25, Theorem II.7.2, p. 52] allows us to transfer the large deviation principle from $\widetilde{\mathbb{P}}^{\sharp}_{n}$ to $\widetilde{\mathbb{P}}_{n}$ on $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ , for any $p\in[\max(1,\ell^{\prime}),\ell)$ . This result is then extended to $\widetilde{\mathcal{P}}_{p}(\mathbb{R}^{d})$ , for any $p\in[1,\ell)$ , and to $\overline{\mathcal{P}}(\mathbb{R}^{d})$ , by the use of the Contraction Principle [22, Theorem 4.2.1, p. 126], which completes the proof of Corollary 2.18.

Proof of (55).

The argument is similar to the proof of Lemma 4.4. Let us fix $\gamma>1$ , and rewrite

[TABLE]

Assumption (mv- $\flat$ ) and Jensen’s Inequality imply that there exists $C\geq 0$ such that, for all $\widetilde{\mathbf{x}}\in M_{d,n}$ ,

[TABLE]

By Hölder’s Inequality and Assumption (mv- $\sharp$ ),

[TABLE]

so that

[TABLE]

and for any $\epsilon\in(0,1)$ , there exists $L\geq 0$ such that, for all $n\geq 2$ , for all $\widetilde{\mathbf{x}}\in M_{d,n}$ , the condition

[TABLE]

implies that

[TABLE]

Studying the integral in the numerator of the right-hand side of (56) separately on the domain $\{W_{n}^{\sharp}(\widetilde{\mathbf{x}})<L\}$ and on its complement, we get the bound

[TABLE]

and the same change of variable as in the proof of Lemma 4.4 allows us to complete the proof of (55). ∎

5.2. rb-model

The next lemma allows us to deduce Corollary 2.19 from a straightforward application of Theorem 2.16.

Lemma 5.3 (Assumptions of Theorem 2.16 for the rb-model).

If the flux function $B$ satisfies the assumptions of Corollary 2.19, then the associated energy functional $\mathscr{W}$ defined by (9) satisfies Assumptions (TI), ( $\sigma$ F), (LSC), (GC) with $\ell=1$ , (SH) and (CC).

Proof.

Assumptions (TI) and ( $\sigma$ F) are straightforward.

To check Assumption (LSC), we recall that the weak convergence of probability measures implies the convergence $\mathrm{d}x$ -almost everywhere of their cumulative distribution functions, so that the lower semicontinuity of $\mathscr{W}$ follows from Fatou’s Lemma and the fact that $B$ is continuous and nonnegative.

Let us now define

[TABLE]

The combination of the conditions (8), (18) and (19) implies that $\kappa\in(0,+\infty)$ . As a consequence, for all $\mu\in\mathcal{P}(\mathbb{R})$ ,

[TABLE]

which by Lemma 5.1 implies Assumption (GC) with $\ell=1$ .

Assumption (SH) follows from the remark that

[TABLE]

so that $(1-\epsilon)W_{n}(\mathbf{x})=W_{n}((1-\epsilon)\mathbf{x})$ .
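The homogeneity and the comparison with the pairwise interaction energy can be illustrated numerically. The Python sketch below is not taken from the article and rests on two assumptions: that the energy of an empirical measure is the integral of $B$ composed with its cumulative distribution function (the form suggested by the proof of Assumption (LSC) above, possibly up to a factor depending on $n$ in the definition of $W_{n}$), and a hypothetical flux function $B(u)=u(1-u)$. The names `rank_based_energy` and `B` are illustrative only.

```python
import numpy as np

def rank_based_energy(x, B):
    """Integral of B(F_n(t)) dt for the empirical CDF F_n of the sample x.

    F_n equals k/n on [x_(k), x_(k+1)), so the integral is a finite sum
    over the gaps between consecutive order statistics.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    levels = np.arange(1, n) / n   # k/n for k = 1, ..., n-1
    gaps = np.diff(xs)             # x_(k+1) - x_(k)
    return float(np.sum(B(levels) * gaps))

def B(u):
    """Hypothetical flux function, nonnegative on [0, 1] and vanishing at 0 and 1."""
    return u * (1.0 - u)

rng = np.random.default_rng(0)
x = rng.normal(size=50)
eps = 0.3

energy = rank_based_energy(x, B)
scaled = rank_based_energy((1.0 - eps) * x, B)   # dilation by 1 - eps
shifted = rank_based_energy(x + 5.0, B)          # translation
# For B(u) = u(1-u), the energy is half the mean pairwise distance.
pairwise = float(np.abs(x[:, None] - x[None, :]).sum() / (2 * x.size**2))

print(energy, scaled / (1.0 - eps), shifted, pairwise)
```

With this choice of $B$, all four printed numbers agree up to rounding: the energy is invariant by translation, homogeneous of degree $1$ under dilations, and coincides with the pairwise energy associated with the interaction potential $|x|$.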

We finally let $\mu\in\mathcal{P}(\mathbb{R})$ and take a sequence of independent random variables $(Y_{n})_{n\geq 1}$ on some probability space $(\Omega,\mathcal{A},\mathbf{P})$ with identical distribution $\mu$ . If $\mu\not\in\mathcal{P}_{1}(\mathbb{R})$ , then by Remark 2.13,

[TABLE]

On the contrary, if $\mu\in\mathcal{P}_{1}(\mathbb{R})$ , let us write

[TABLE]

where $\pi_{n}$ is a short notation for $\pi_{n}(Y_{1},\ldots,Y_{n})$ . Denoting $C=\sup_{u\in[0,1]}|B^{\prime}(u)|$ , we get

[TABLE]

where $\mathrm{W}_{1}$ is the Wasserstein distance of order $1$ . That this distance coincides with the $L^{1}$ distance between cumulative distribution functions is a specific feature of probability measures on the real line, see [8, Theorem 2.9, p. 16]. By [8, Theorem 2.14, p. 20], $\mathbf{E}[\mathrm{W}_{1}(\pi_{n},\mu)]$ converges to $0$ when $n$ grows to infinity, which shows that $\mathscr{W}$ satisfies Assumption (CC). ∎
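The identity between $\mathrm{W}_{1}$ and the $L^{1}$ distance of cumulative distribution functions can be checked on empirical measures. The following Python sketch is an illustration only: it assumes two samples with the same number of atoms, for which the monotone (sorted) coupling is optimal on the real line, and the function names are hypothetical.

```python
import numpy as np

def w1_sorted(x, y):
    """W_1 between two empirical measures with the same number of atoms,
    computed through the monotone coupling of the sorted samples."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def w1_cdf(x, y):
    """L^1 distance between the two empirical CDFs, integrated exactly
    since both functions are piecewise constant."""
    xs, ys = np.sort(x), np.sort(y)
    pts = np.sort(np.concatenate([xs, ys]))
    # Value of each CDF on the interval [pts[i], pts[i+1]).
    Fx = np.searchsorted(xs, pts[:-1], side="right") / xs.size
    Fy = np.searchsorted(ys, pts[:-1], side="right") / ys.size
    return float(np.sum(np.abs(Fx - Fy) * np.diff(pts)))

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = rng.exponential(size=200)
print(w1_sorted(x, y), w1_cdf(x, y))
```

Both functions return the same value up to rounding, which is the one-dimensional feature used in the last step of the proof.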

5.3. Application to the study of atypical capital distribution

It was proved in [43] that under the assumptions of Corollary 2.19, $\widetilde{\mathbb{P}}_{n}$ converges weakly, on $\widetilde{\mathcal{P}}_{p}(\mathbb{R})$ for any $p\geq 1$ , to the Dirac mass $\delta_{\widetilde{\mu}_{\infty}}$ , where $\widetilde{\mu}_{\infty}$ is the unique centered stationary measure of the nonlinear diffusion process describing the mean-field limit of (10) — we refer to [44, 32, 21] for associated propagation of chaos results in the space of sample-paths. This measure satisfies the stationary nonlinear Fokker-Planck equation

[TABLE]

which implies that it possesses a density $\widetilde{p}_{\infty}$ with respect to the Lebesgue measure on $\mathbb{R}$ , which solves the fixed point relation

[TABLE]

As a consequence, if we let $(\widetilde{X}_{1},\ldots,\widetilde{X}_{n})$ be a random vector with distribution $\widetilde{P}_{n}$ , and $\widetilde{X}_{\infty,1},\ldots,\widetilde{X}_{\infty,n}$ be independent random variables with identical distribution $\widetilde{\mu}_{\infty}$ , then $\pi_{n}(\widetilde{X}_{1},\ldots,\widetilde{X}_{n})$ and $\pi_{n}(\widetilde{X}_{\infty,1},\ldots,\widetilde{X}_{\infty,n})$ satisfy the same weak law of large numbers, and converge to $\widetilde{\mu}_{\infty}$ . However, the large deviations of these random empirical measures are respectively described by Corollary 2.19 (in the orbit space $\overline{\mathcal{P}}(\mathbb{R})$ ), and by Sanov’s Theorem. We first examine the difference between the associated rate functions, and then detail an application of this result to the estimation of the probability of atypical capital distribution in the context of Stochastic Portfolio Theory.

5.3.1. Difference between rate functions

With the notations introduced above, let us define

[TABLE]

By Sanov’s Theorem and the Contraction Principle, the sequence $\overline{\mathbb{P}}_{\infty,n}$ satisfies a large deviation principle on $\overline{\mathcal{P}}(\mathbb{R})$ , with good rate function

[TABLE]

Lemma 5.4 (Comparison of rate functions).

Under the assumptions of Corollary 2.19, we have, for all $\overline{\mu}\in\overline{\mathcal{P}}(\mathbb{R})$ , for all $\mu\in\mathcal{P}(\mathbb{R})$ such that $\rho(\mu)=\overline{\mu}$ ,

[TABLE]

where

[TABLE]

As a consequence, if $B$ is concave, then for all $\overline{\mu}\in\overline{\mathcal{P}}(\mathbb{R})$ ,

[TABLE]

Proof.

As a preliminary remark, we observe that the convergence of $\widetilde{\mathbb{P}}_{n}$ to $\delta_{\widetilde{\mu}_{\infty}}$ implies that $\overline{\mathbb{P}}_{n}$ converges weakly to $\delta_{\overline{\mu}_{\infty}}$ , with $\overline{\mu}_{\infty}:=\rho(\widetilde{\mu}_{\infty})$ , on $\overline{\mathcal{P}}(\mathbb{R})$ . Combining this weak law of large numbers with Corollary 2.19, we get that $\overline{\mu}_{\infty}$ is the unique zero of $\overline{\mathscr{I}}$ , and therefore that

[TABLE]

Notice that (57) is the optimality condition associated with the definition of $\mathscr{F}_{\star}$ .

We now let $\overline{\mu}\in\overline{\mathcal{P}}(\mathbb{R})$ and $\mu\in\mathcal{P}(\mathbb{R})$ be such that $\rho(\mu)=\overline{\mu}$ . If $\mu\not\in\mathcal{P}_{1}(\mathbb{R})$ , then by Assumption (GC), $\mathscr{F}[\mu]=+\infty$ so that $\overline{\mathscr{I}}[\overline{\mu}]=+\infty$ ; besides, it is known that $\widetilde{\mu}_{\infty}$ has exponential tails [32, 33] so that $\overline{\mathscr{I}}_{\infty}[\overline{\mu}]=+\infty$ . Likewise, if $\mu$ is not absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$ , then both $\overline{\mathscr{I}}[\overline{\mu}]$ and $\overline{\mathscr{I}}_{\infty}[\overline{\mu}]$ are infinite.

We now assume that $\mu\in\mathcal{P}_{1}(\mathbb{R})$ and has a density $p$ with respect to the Lebesgue measure, and write

[TABLE]

Besides,

[TABLE]

By (58),

[TABLE]

which after the use of Fubini’s Theorem yields

[TABLE]

and leads to (59).

If $B$ is concave, then $\Gamma(u,v)\leq 0$ for all $u,v\in[0,1]$ , so that, for all $\overline{\mu}\in\overline{\mathcal{P}}(\mathbb{R})$ , (59) yields

[TABLE]

for all $\mu\in\mathcal{P}(\mathbb{R})$ such that $\rho(\mu)=\overline{\mu}$ . Taking the infimum over $\mu$ of the right-hand side results in (60) and completes the proof. ∎

5.3.2. Capital distribution curves

In the framework of Stochastic Portfolio Theory [26, 2], systems of rank-based interacting diffusions of the form (10) serve as first-order approximations of stable equity markets, in the sense that on a market with $n$ companies, the process $X_{i}(t)$ provides a good representation of the behaviour of the logarithmic capitalisation of the $i$ -th company. Thus, the proportion of the total capital held by this company is given by its market weight

[TABLE]

Using the reverse order statistics notation

[TABLE]

the capital distribution curve is defined as the log-log plot of the mapping $m\mapsto\mu_{[m]}(t)$ and summarises in which manner the whole capital of a market is spread among the companies.

Notice that the market weights are invariant by translation of $X_{1}(t),\ldots,X_{n}(t)$ , so that the vector $(\mu_{[1]}(t),\ldots,\mu_{[n]}(t))$ (and therefore the associated capital distribution curve) only depends on $\rho(\pi_{n}(X_{1}(t),\ldots,X_{n}(t)))$ . Besides, empirical studies (see for instance [26, Figure 5.1]) show that the capital distribution curves are remarkably stable over long times. These remarks suggest studying the statistical distribution of the vector $(\mu_{[1]},\ldots,\mu_{[n]})$ under the probability measure $\overline{\mathbb{P}}_{n}$ [2, 17, 33].
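The translation invariance of the market weights is easy to verify numerically. The Python sketch below is not part of the article: it assumes the usual convention of Stochastic Portfolio Theory, namely that the market weight of the $i$ -th company is $\mathrm{e}^{X_{i}}$ divided by the sum of the $\mathrm{e}^{X_{j}}$ , and the function names are hypothetical.

```python
import numpy as np

def market_weights(log_caps):
    """Market weights exp(X_i) / sum_j exp(X_j), computed stably by
    subtracting the maximum before exponentiating."""
    z = np.asarray(log_caps, dtype=float)
    w = np.exp(z - z.max())
    return w / w.sum()

def capital_distribution_curve(log_caps):
    """Return (log rank, log weight), with weights sorted in decreasing
    order (reverse order statistics)."""
    ranked = np.sort(market_weights(log_caps))[::-1]
    ranks = np.arange(1, ranked.size + 1)
    return np.log(ranks), np.log(ranked)

rng = np.random.default_rng(2)
log_caps = rng.normal(size=10)
log_rank, log_weight = capital_distribution_curve(log_caps)
# Translating all logarithmic capitalisations leaves the curve unchanged.
_, log_weight_shifted = capital_distribution_curve(log_caps + 3.0)
print(np.max(np.abs(log_weight - log_weight_shifted)))
```

The printed discrepancy is of the order of rounding errors, in line with the fact that the curve only depends on the orbit of the empirical measure under translations.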

When $n$ grows to infinity, the law of large numbers for $\overline{\mathbb{P}}_{n}$ prescribes a deterministic form for the (suitably rescaled) capital distribution curve, which was observed to fit empirical data in [33]. This defines a distribution of capital which we call typical. If one wants to study the capital distribution without having to sample the high-dimensional vector $(\widetilde{X}_{1},\ldots,\widetilde{X}_{n})$ from the distribution $\widetilde{P}_{n}$ , then the discussion above shows that using independent random variables $\widetilde{X}_{\infty,1},\ldots,\widetilde{X}_{\infty,n}$ identically distributed according to $\widetilde{\mu}_{\infty}$ as a surrogate model provides correct results concerning this typical behaviour. Such a surrogate model was for instance employed in [33] to evaluate the performance of diversity-weighted portfolios, and in [11, Section 3] to study hitting times and rank-rank correlations.

On the contrary, Lemma 5.4 shows that the fluctuations of the capital distribution far from its typical behaviour, due to finite-size effects, and which are described by the large deviations of $\overline{\mathbb{P}}_{n}$ , are not correctly captured by the surrogate model in general. In short: both sequences $\overline{\mathbb{P}}_{n}$ and $\overline{\mathbb{P}}_{\infty,n}$ concentrate around $\overline{\mu}_{\infty}$ , but their rate functions differ. On a more quantitative level, if the flux function $B$ is concave, then at the level of large deviations, the probability of an atypical distribution of the capital is always underestimated by $\overline{\mathbb{P}}_{\infty,n}$ (the surrogate model) with respect to $\overline{\mathbb{P}}_{n}$ . In other words, the interaction between the stocks increases the probability of an atypical capital distribution.

For related studies of the fluctuations of mean-field rank-based interacting diffusions around, or far from, their typical behaviour, we refer to the works by Kolli and Shkolnikov [34], and Dembo, Shkolnikov, Varadhan and Zeitouni [21], respectively. We also mention that inequalities between rate functions for sequences of probability measures having the same law of large numbers, such as in Lemma 5.4, naturally provide comparisons between asymptotic variances in Monte-Carlo numerical methods. For more details in this direction, we refer to the work by Rey-Bellet and Spiliopoulos [41] and the references therein.

Appendix A Metrisability of the quotient topology on $\overline{\mathcal{P}}(\mathbb{R}^{d})$

By definition, the quotient topology on $\overline{\mathcal{P}}(\mathbb{R}^{d})$ is the strongest topology making the orbit map $\rho$ continuous. The purpose of this appendix is to prove that this topology is metrisable, which is in general not the case for quotient topologies.

We first note that the definition of the quotient topology implies the following characterisation of open and closed sets.

Lemma A.1 (Open and closed sets in $\overline{\mathcal{P}}(\mathbb{R}^{d})$ ).

A subset $\overline{A}$ of $\overline{\mathcal{P}}(\mathbb{R}^{d})$ is open (respectively closed) if and only if the set $\rho^{-1}(\overline{A})$ is open (respectively closed) in $\mathcal{P}(\mathbb{R}^{d})$ .

Our construction of a metric on $\overline{\mathcal{P}}(\mathbb{R}^{d})$ is based on the Prohorov metric on $\mathcal{P}(\mathbb{R}^{d})$ , which following [7, Theorem 6.9, p. 74] can be defined by

[TABLE]

where the infimum is taken over all the couplings $\pi$ of $\mu$ and $\nu$ . We recall that a sequence of probability measures $\mu_{n}$ converges weakly to $\mu$ in $\mathcal{P}(\mathbb{R}^{d})$ if and only if $d_{\mathrm{P}}(\mu_{n},\mu)$ converges to $0$ , so that the metric topology associated with the Prohorov metric coincides with the topology of weak convergence [7, Theorem 6.8, p. 73].

The following property of the Prohorov metric is immediate.

Lemma A.2 (Translation invariance of the Prohorov metric).

For all $\mu,\nu\in\mathcal{P}(\mathbb{R}^{d})$ , for all $y\in\mathbb{R}^{d}$ ,

$$d_{\mathrm{P}}(\tau_{y}\mu,\tau_{y}\nu)=d_{\mathrm{P}}(\mu,\nu).$$

For all $\overline{\mu},\overline{\nu}\in\overline{\mathcal{P}}(\mathbb{R}^{d})$ , let us define

[TABLE]

For any $\mu,\nu\in\mathcal{P}(\mathbb{R}^{d})$ such that $\rho(\mu)=\overline{\mu}$ and $\rho(\nu)=\overline{\nu}$ , it is a consequence of Lemma A.2 that $\overline{d}_{\mathrm{P}}(\overline{\mu},\overline{\nu})$ rewrites

[TABLE]

Lemma A.3 (Metrisability of $\overline{\mathcal{P}}(\mathbb{R}^{d})$ ).

The function $\overline{d}_{\mathrm{P}}$ is a metric on $\overline{\mathcal{P}}(\mathbb{R}^{d})$ , and the associated metric topology is the same as the quotient topology.

We call $\overline{d}_{\mathrm{P}}$ the quotient Prohorov metric.

Proof.

It is obvious that $\overline{d}_{\mathrm{P}}$ is symmetric. To show that it satisfies the triangle inequality, we take $\overline{\mu},\overline{\nu},\overline{\lambda}\in\overline{\mathcal{P}}(\mathbb{R}^{d})$ and fix $\mu,\nu,\lambda\in\mathcal{P}(\mathbb{R}^{d})$ such that $\rho(\mu)=\overline{\mu}$ , $\rho(\nu)=\overline{\nu}$ and $\rho(\lambda)=\overline{\lambda}$ . By (62) and the triangle inequality for $d_{\mathrm{P}}$ , for all $x,y\in\mathbb{R}^{d}$ ,

[TABLE]

so that taking the infimum of the right-hand side of the inequality over $x,y\in\mathbb{R}^{d}$ and using (63), we obtain

[TABLE]

We now take $\overline{\mu},\overline{\nu}\in\overline{\mathcal{P}}(\mathbb{R}^{d})$ such that $\overline{d}_{\mathrm{P}}(\overline{\mu},\overline{\nu})=0$ . Let $\mu,\nu\in\mathcal{P}(\mathbb{R}^{d})$ be such that $\rho(\mu)=\overline{\mu}$ and $\rho(\nu)=\overline{\nu}$ . By (63), for all $n\geq 1$ there exists $y_{n}\in\mathbb{R}^{d}$ such that

[TABLE]

therefore $\tau_{y_{n}}\nu$ converges to $\mu$ . By Ulam’s Theorem [7, Theorem 1.3, p. 8], $\nu$ is tight, hence there exists a centered ball $B(0,r)$ , $r\geq 0$ , such that

[TABLE]

Likewise, by Prohorov’s Theorem [7, Theorem 5.2, p. 60], the family $(\tau_{y_{n}}\nu)_{n\geq 1}$ is tight, so that there exists $s\geq 0$ such that

[TABLE]

Assume that there exists an extracted sequence $(n_{k})_{k\geq 1}$ such that $|y_{n_{k}}|$ diverges to $+\infty$ : then for $k$ large enough, the balls $B(0,r)$ and $B(y_{n_{k}},s)$ are disjoint, so that the combination of (64) and (65) yields

[TABLE]

which is absurd. As a consequence, the sequence $(y_{n})_{n\geq 1}$ is bounded and therefore possesses a converging subsequence, which we still index by $n$ for convenience, and the limit of which is denoted $y_{*}$ . Using the continuity of the mapping $y\mapsto\tau_{y}\nu$ , we get

[TABLE]

which implies that $\overline{\mu}=\rho(\mu)=\rho(\tau_{y_{*}}\nu)=\overline{\nu}$ and completes the proof that $\overline{d}_{\mathrm{P}}$ is a metric.

As an immediate consequence of the definition (62) of $\overline{d}_{\mathrm{P}}$ , we have the inequality

[TABLE]

which implies that $\rho$ is continuous for the metric topology induced on $\overline{\mathcal{P}}(\mathbb{R}^{d})$ by $\overline{d}_{\mathrm{P}}$ , so by definition of the quotient topology, the latter is stronger than the former. Now let $\overline{A}$ be an open set in the quotient topology. By the definition of the quotient topology, the set $A:=\rho^{-1}(\overline{A})$ is open in $\mathcal{P}(\mathbb{R}^{d})$ , so that for all $\mu\in A$ , there exists $r_{\mu}>0$ such that $\mathcal{B}(\mu,r_{\mu})\subset A$ , whence

[TABLE]

Since $\rho(\tau_{y}\mu)=\rho(\mu)\in\overline{A}$ for any $y\in\mathbb{R}^{d}$ and $\mu\in A$ , we may rewrite

[TABLE]

Introducing the notation

[TABLE]

we deduce from (63) that, for all $\mu\in A$ ,

[TABLE]

so that

[TABLE]

As a consequence,

[TABLE]

therefore $\overline{A}$ is an open set in the metric topology and the proof is completed. ∎
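The construction of a quotient metric by taking an infimum over translations can be made concrete on the real line. The Python sketch below is an illustration only and deliberately swaps the Prohorov metric, which is awkward to compute, for the Wasserstein distance of order $1$ , which is also invariant by translation. It assumes two empirical measures with the same number of atoms; in that case the infimum over shifts is attained at the median of the differences between sorted samples. The function names are hypothetical.

```python
import numpy as np

def w1(x, y):
    """W_1 between two empirical measures on the line with equally many atoms."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def quotient_w1(x, y):
    """Infimum over shifts s of W_1 between x and the sample y translated by s.

    The map s -> mean |d_i - s| is minimised at a median of the differences
    d_i between the sorted samples.
    """
    d = np.sort(x) - np.sort(y)
    return float(np.mean(np.abs(d - np.median(d))))

x = np.array([0.0, 1.0, 4.0])
z = np.array([0.0, 2.0, 3.0])
print(quotient_w1(x, x + 10.0))              # translates are at distance zero
print(quotient_w1(x, z), w1(x, z))           # quotient distance is at most W_1
print(quotient_w1(x, z + 3.0))               # unchanged when z is translated
```

Two samples that differ by a translation are at quotient distance zero, and the quotient distance does not depend on the representatives chosen in each orbit, which mirrors the rewriting (63).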

Acknowledgements

This work was motivated by several discussions with Freddy Bouchet on the large deviations of mean-field particle systems. The author is grateful to Cyril Labbé for his careful reading of this manuscript, and thanks the referee for correcting a mistake in the proof of Lemma A.3.

Bibliography

[1] L. Ambrosio, N. Gigli, and G. Savaré. Gradient flows in metric spaces and in the space of probability measures. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel, second edition, 2008.
[2] A. D. Banner, R. Fernholz, and I. Karatzas. Atlas models of equity markets. Ann. Appl. Probab., 15(4):2296–2330, 2005.
[3] S. Benachour, B. Roynette, D. Talay, and P. Vallois. Nonlinear self-stabilizing processes. I. Existence, invariant probability, propagation of chaos. Stochastic Process. Appl., 75(2):173–201, 1998.
[4] S. Benachour, B. Roynette, and P. Vallois. Nonlinear self-stabilizing processes. II. Convergence to invariant probability. Stochastic Process. Appl., 75(2):203–224, 1998.
[5] D. Benedetto, E. Caglioti, J. A. Carrillo, and M. Pulvirenti. A non-Maxwellian steady distribution for one-dimensional granular media. J. Statist. Phys., 91(5-6):979–990, 1998.
[6] D. Benedetto, E. Caglioti, and M. Pulvirenti. A kinetic equation for granular media. RAIRO Modél. Math. Anal. Numér., 31(5):615–641, 1997.
[7] P. Billingsley. Convergence of probability measures. Wiley Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons Inc., New York, second edition, 1999. A Wiley-Interscience Publication.
[8] S. Bobkov and M. Ledoux. One-dimensional empirical measures, order statistics and Kantorovich transport distances. To appear in Mem. Amer. Math. Soc.
