Large deviations of empirical measures of diffusions in weighted topologies

Grégoire Ferré; Gabriel Stoltz

arXiv:1906.09411·math.PR·September 23, 2020

TL;DR

This paper establishes large deviation principles for empirical measures of diffusion processes, linking spectral gap conditions to a generalized Cramér condition, and analyzes the Donsker-Varadhan rate functional in various dynamics.

Contribution

It introduces new conditions for large deviations of empirical measures in diffusion processes, extending classical results to unbounded functions and degenerate diffusions.

Findings

01. Large deviations principle (LDP) for unbounded functions in diffusion processes.

02. Spectral gap condition related to a generalized Cramér condition.

03. Application of results to Langevin dynamics in unbounded spaces.

Abstract

We consider large deviations of empirical measures of diffusion processes. In a first part, we present conditions to obtain a large deviations principle (LDP) for a precise class of unbounded functions. This provides an analogue to the standard Cramér condition in the context of diffusion processes, which turns out to be related to a spectral gap condition for a Witten-Schrödinger operator. Secondly, we study more precisely the properties of the Donsker-Varadhan rate functional associated with the LDP. We revisit and generalize some standard duality results as well as a more original decomposition of the rate functional with respect to the symmetric and antisymmetric parts of the dynamics. Finally, we apply our results to overdamped and underdamped Langevin dynamics, showing the applicability of our framework for degenerate diffusions in unbounded configuration spaces.

Equations

L_{t}:=\frac{1}{t}\int_{0}^{t}\delta_{X_{s}}\,ds,\quad t\geqslant 0,

\mathbb{P}\big{(}L_{t}\in\Gamma\big{)}\asymp\mathrm{e}^{-t\inf_{\nu\in\Gamma}\,I(\nu)},

\|f\|_{B^{\infty}_{\kappa}}:=\sup_{x\in\mathcal{X}}\frac{|f(x)|}{\kappa(x)}<+\infty.

\Psi:=-\frac{\mathcal{L}W}{W}

\forall\,\nu\in\mathcal{P}(\mathcal{X}),\quad I(\nu)=\sup_{\|f\|_{B^{\infty}_{\kappa}}<+\infty}\ \big{\{}\nu(f)-\lambda(f)\big{\}},

\lambda(f)=\lim_{t\to+\infty}\frac{1}{t}\log\mathbb{E}\Big[\mathrm{e}^{\int_{0}^{t}f(X_{s})\,ds}\Big],

P_{t}^{f}:\varphi\mapsto\mathbb{E}\Big[\varphi(X_{t})\,\mathrm{e}^{\int_{0}^{t}f(X_{s})\,ds}\Big].

M_{t}=W(X_{t})\,\mathrm{e}^{-\int_{0}^{t}\frac{\mathcal{L}W}{W}(X_{s})\,ds}

\forall\,\nu\in\mathcal{P}(\mathcal{X}),\quad I(\nu)=\sup\left\{-\int_{\mathcal{X}}\frac{\mathcal{L}u}{u}\,d\nu,\ u\in\mathcal{D}^{+}(\mathcal{L})\right\},

\lambda(f)=\sup_{\nu\in\mathcal{P}(\mathcal{X})}\big{\{}\nu(f)-I(\nu)\big{\}}.

I(\nu)=\frac{1}{4}\left|\log\frac{d\nu}{d\mu}\right|_{\mathscr{H}^{1}(\nu)}^{2}+\frac{1}{4}\left|\mathcal{L}_{\mathrm{A}}\Big(\log\frac{d\nu}{d\mu}\Big)\right|_{\mathscr{H}^{-1}(\nu)}^{2},

dX_{t}=b(X_{t})\,dt+\sigma(X_{t})\,dB_{t},

\mathcal{L}=b\cdot\nabla+S:\nabla^{2},\quad\text{with}\quad S=\frac{\sigma\sigma^{T}}{2},

\mathscr{C}(\varphi,\psi)=\frac{1}{2}\big{(}\mathcal{L}(\varphi\psi)-\varphi\mathcal{L}\psi-\psi\mathcal{L}\varphi\big{)}=\nabla\varphi\cdot S\nabla\psi.

\mathscr{S}=\left\{\varphi\in C^{\infty}(\mathcal{X})\ \middle|\ \forall\,\alpha\in\mathbb{N}^{d},\ \exists\,N>0\ \text{such that}\ \sup_{x\in\mathcal{X}}\frac{|\partial^{\alpha}\varphi(x)|}{(1+|x|^{2})^{N}}<+\infty\right\},

\|\varphi\|_{B^{\infty}}=\sup_{x\in\mathcal{X}}|\varphi(x)|.

B^{\infty}_{W}(\mathcal{X})=\left\{\varphi:\mathcal{X}\to\mathbb{R}\ \text{measurable}\ \middle|\ \|\varphi\|_{B^{\infty}_{W}}:=\sup_{x\in\mathcal{X}}\frac{|\varphi(x)|}{W(x)}<+\infty\right\},

\mathcal{P}_{W}(\mathcal{X})=\Big{\{}\nu\in\mathcal{P}(\mathcal{X})\ \Big{|}\ \nu(W)<+\infty\Big{\}}.

\forall\,\nu,\eta\in\mathcal{P}_{W}(\mathcal{X}),\quad d_{W}(\nu,\eta)=\sup_{\|\varphi\|_{B^{\infty}_{W}}\leqslant 1}\left\{\int_{\mathcal{X}}\varphi\,d\nu-\int_{\mathcal{X}}\varphi\,d\eta\right\}=\int_{\mathcal{X}}W(x)\,|\nu-\eta|(dx),

\forall\,\varphi\in B^{\infty}(\mathcal{X}),\quad(P_{t}\varphi)(x)=\mathbb{E}\big[\varphi\big(X_{t}^{(x)}\big)\big],

(\mu P_{t})(\varphi)=\mu(P_{t}\varphi)=\int_{\mathcal{X}}\mathbb{E}\big[\varphi\big(X_{t}^{(x)}\big)\big]\,\mu(dx).

L^{2}(\mu)=\left\{\varphi\ \text{measurable}\ \middle|\ \int_{\mathcal{X}}|\varphi|^{2}\,d\mu<+\infty\right\}.

|\varphi|_{\mathscr{H}^{1}(\mu)}^{2}=\int_{\mathcal{X}}\mathscr{C}(\varphi,\varphi)\,d\mu,

|\varphi|_{\mathscr{H}^{-1}(\mu)}^{2}=\sup_{\psi\in C_{\mathrm{c}}^{\infty}(\mathcal{X})}\left\{2\int_{\mathcal{X}}\varphi\psi\,d\mu-|\psi|_{\mathscr{H}^{1}(\mu)}^{2}\right\}.

|\varphi|_{\mathscr{H}^{1}(\mu)}^{2}=\int_{\mathcal{X}}\varphi\,(-\mathcal{L}\varphi)\,d\mu.

|\varphi|_{\mathscr{H}^{1}(\mu)}^{2}=\int_{\mathcal{X}}|\nabla\varphi|^{2}\,d\mu.

\psi(x)=\begin{cases}1,&\text{if }|x|\leqslant 1,\\0,&\text{if }|x|\geqslant 2,\end{cases}

|\varphi|_{\mathscr{H}^{-1}(\mu)}\geqslant 2n\int_{|x|\leqslant 2n}\psi\,\varphi\,d\mu-C.

\int_{|x|\leqslant 2n}\psi\,\varphi\,d\mu\xrightarrow[n\to+\infty]{}\int_{\mathcal{X}}\varphi\,d\mu\geqslant 0.

\big{\{}x\in\mathcal{X}\,\big{|}\,f(x)\leqslant M\big{\}}

Full text

Large deviations of empirical measures of diffusions in weighted topologies

Grégoire Ferré and Gabriel Stoltz

Université Paris-Est, CERMICS (ENPC), Inria, F-77455 Marne-la-Vallée, France

Abstract

We consider large deviations of empirical measures of diffusion processes. In a first part, we present conditions to obtain a large deviations principle (LDP) for a precise class of unbounded functions. This provides an analogue to the standard Cramér condition in the context of diffusion processes, which turns out to be related to a spectral gap condition for a Witten–Schrödinger operator. Secondly, we study more precisely the properties of the Donsker–Varadhan rate functional associated with the LDP. We revisit and generalize some standard duality results as well as a more original decomposition of the rate functional with respect to the symmetric and antisymmetric parts of the dynamics. Finally, we apply our results to overdamped and underdamped Langevin dynamics, showing the applicability of our framework for degenerate diffusions in unbounded configuration spaces.

1 Introduction

Empirical averages of diffusion processes and their convergence are commonly studied in statistical mechanics, probability theory and machine learning. In statistical physics, an observable averaged along the trajectory of a diffusion typically converges to the expectation with respect to its stationary distribution, which provides some macroscopic information on the system [74, 84]. For reversible dynamics, this convergence is known to be characterized by an entropy functional [106, 7], which generalizes results for small fluctuations such as the central limit theorem [75] or Berry-Esseen type inequalities [91]. It has been shown for some time that the approach can be extended to nonequilibrium systems by considering generalized entropy and free energy functionals, as provided by the theory of large deviations [28, 45, 106]. From a more computational perspective, studying the convergence of empirical averages is an important problem for the efficiency of Markov chain Monte Carlo methods [1, 100, 98, 36].

Since its initiation by Cramér in the 1930s [25, 108], large deviations theory has been extended in many directions. The theory originates in the study of fluctuations for sums of independent variables, leading to the celebrated Sanov theorem [29]. Interestingly, the necessity of Cramér’s exponential moment condition for Sanov’s theorem to hold in a Wasserstein topology has been proved only recently [111].

Given the above-mentioned applications, it is natural to try to apply such a theory to diffusions, or more generally to Markovian dynamics. This is useful for instance in statistical physics, when considering Gallavotti–Cohen fluctuation relations for irreversible dynamics [52, 79, 78], as well as for characterizing dynamical phase transitions in physical systems [54, 3, 89, 92]. From a more computational perspective, studying the rate function associated with a given dynamics is interesting for designing better sampling strategies [40, 98, 99], which is important for instance in a Bayesian framework [19, 14] or for molecular dynamics [82, 83]. The approach can also be used for deriving concentration results such as Bernstein-type inequalities [53, 13] and uncertainty quantification bounds [73, 57].

However, proving a large deviations principle for correlated processes turns out to be a difficult task. A milestone in the theory is the series of papers by Donsker and Varadhan [31, 32, 34, 35] and the dual approach followed by Gärtner and Ellis [55, 44]. The strategy of the former works is to build explicitly lower and upper large deviations bounds from the Girsanov theorem and the Tchebychev inequality [109]. On the other hand, the Gärtner–Ellis theorem relies on the existence and regularity of a free energy functional. This technique has been later related to optimal control problems through the so-called weak convergence approach [38, 39].

Whichever strategy is chosen, proving large deviations principles for empirical measures of diffusions in unbounded configuration spaces remains difficult. Indeed, studying the stability of unbounded Markov processes is already challenging, and often relies on Lyapunov function techniques [87, 86, 97, 60]. Such a Lyapunov function can be interpreted as an energy associated with the system, which decreases on average and provides a control on the excursions of the process far away from the origin. This technique can be used for proving LDPs, see for instance [109, Section 9] and [30, 115, 39]. However, the LDPs of the above-mentioned works are stated in the so-called strong (resp. weak) topology, i.e. with respect to the topology on measures associated with the convergence of measurable bounded (resp. continuous bounded) functions. To the best of our knowledge, convergence in Wasserstein-like topologies (i.e. associated with unbounded functions) for diffusions has only been addressed in [76] and [115, Section 2.2]. Unfortunately, the nonlinear approach of [76] does not allow one to characterize precisely the set of functions for which the LDP holds, while [115] considers a particular system (Langevin dynamics). In both cases, the rate function is not related to the standard Donsker–Varadhan theory [33]. Our first result is to derive the LDP in a weak topology associated with unbounded functions, under very natural conditions, and to express the rate function in duality with a free energy. From a practical point of view, this allows one to compute the rate function from the free energy, a standard procedure [56, 106, 23, 88, 48].

Once a large deviations principle has been derived, providing alternative expressions of the rate function is an important problem. This can be useful for computing this function more efficiently, or for interpreting some key aspects of the dynamics (such as irreversibility for physical systems). Our first contribution in this direction is to derive a variational representation of the rate function similar to the Donsker–Varadhan formula [33]. This provides a variational representation of the principal eigenvalue for any non-symmetric linear second order differential operator associated with a diffusion, under confinement and regularity conditions. To the best of our knowledge, there is no such formula in an unbounded setting, a fortiori for unbounded functions. Finally, it has been shown in a pioneering work [15], for a specific choice of dynamics, that the above-mentioned duality allows one to decompose the rate function into two parts: one corresponding to a “reversible” part and the other to an “irreversible” part of the dynamics. We extend these results to general diffusions by using Sobolev seminorms, a feature inspired by the small fluctuations framework developed in [75]. This decomposition turns out to be useful for various purposes. For illustration, we apply it to study more precisely the rate function of the Langevin dynamics, in particular its dependence on the friction both in the Hamiltonian and overdamped limits.

We now sketch the main results of the paper, the precise setting being presented in Section 2.1.

Main results.

Consider a diffusion process $(X_{t})_{t\geqslant 0}$ over a state space $\mathcal{X}\subset\mathbb{R}^{d}$ with generator $\mathcal{L}$ , invariant probability measure $\mu$ , and empirical measure

$$L_{t}:=\frac{1}{t}\int_{0}^{t}\delta_{X_{s}}\,ds,\quad t\geqslant 0,\tag{1}$$

where $\delta_{x}$ is the Dirac measure at $x\in\mathcal{X}$ .

Our first contribution is to prove a large deviations principle for the empirical measure $(L_{t})_{t\geqslant 0}$ in a weak topology associated with an unbounded function $\kappa:\mathcal{X}\to[1,+\infty)$ . That is, we prove the following type of long time scaling: for $\Gamma\subset\mathcal{P}(\mathcal{X})$ ,

$$\mathbb{P}\big(L_{t}\in\Gamma\big)\asymp\mathrm{e}^{-t\inf_{\nu\in\Gamma}I(\nu)},\tag{2}$$

where $I$ is a rate function. Here, $\mathcal{P}(\mathcal{X})$ denotes the set of probability measures on $\mathcal{X}$ , and the above scaling holds for the weak topology on $\mathcal{P}(\mathcal{X})$ associated with measurable functions $f$ satisfying

$$\|f\|_{B^{\infty}_{\kappa}}:=\sup_{x\in\mathcal{X}}\frac{|f(x)|}{\kappa(x)}<+\infty.\tag{3}$$

As is standard for LDPs on unbounded state spaces [109, 115], our result relies on the existence of a twice differentiable Lyapunov function $W:\mathcal{X}\to[1,+\infty)$ such that

$$\Psi:=-\frac{\mathcal{L}W}{W}\tag{4}$$

has compact level sets (in other words, it goes to infinity at infinity). Unlike previous works, where this condition implies the asymptotic equivalence (2) in the weak topology corresponding to the convergence of measures tested against bounded test functions [109, 39, 115], we show in Section 2 that the LDP holds for the weak topology associated with any cost function $\kappa$ controlled by $\Psi$ (see Section 2.1 for details). Moreover, the associated rate function $I:\mathcal{P}(\mathcal{X})\to[0,+\infty]$ , also called entropy, reads

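$$\forall\,\nu\in\mathcal{P}(\mathcal{X}),\quad I(\nu)=\sup_{\|f\|_{B^{\infty}_{\kappa}}<+\infty}\big\{\nu(f)-\lambda(f)\big\},$$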

where

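$$\lambda(f)=\lim_{t\to+\infty}\frac{1}{t}\log\mathbb{E}\Big[\mathrm{e}^{\int_{0}^{t}f(X_{s})\,ds}\Big]\tag{5}$$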

is the cumulant or free energy function.
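As a hedged illustration that is not taken from the paper, consider the one-dimensional Ornstein–Uhlenbeck process $dX_{t}=-X_{t}\,dt+\sqrt{2}\,dB_{t}$, with generator $\mathcal{L}=-x\,\partial_{x}+\partial_{x}^{2}$, and the unbounded linear observable $f_{\theta}(x)=\theta x$. Assuming that $\lambda(f_{\theta})$ is the principal eigenvalue of $\mathcal{L}+f_{\theta}$ (the generator of the Feynman–Kac semigroup), the positive function $h_{\theta}(x)=\mathrm{e}^{\theta x}$ gives the free energy in closed form:

```latex
\begin{aligned}
\mathcal{L}h_{\theta} &= -x\,h_{\theta}'+h_{\theta}''=(-\theta x+\theta^{2})\,h_{\theta},\\
(\mathcal{L}+f_{\theta})\,h_{\theta} &= \theta^{2}\,h_{\theta}
\quad\Longrightarrow\quad\lambda(f_{\theta})=\theta^{2},\\
\sup_{\theta\in\mathbb{R}}\big\{\theta m-\lambda(f_{\theta})\big\} &= \sup_{\theta\in\mathbb{R}}\big\{\theta m-\theta^{2}\big\}=\frac{m^{2}}{4}.
\end{aligned}
```

The last line is the Legendre transform restricted to this one-parameter family of observables. In this Gaussian example it gives the rate $m^{2}/4$ at which the time average $\frac{1}{t}\int_{0}^{t}X_{s}\,ds$ deviates towards the value $m$.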

We mention that our strategy relies on the Gärtner–Ellis theorem, according to which the existence and regularity of (5) implies the large deviations principle. We actually show that (5) is well-defined because it matches the principal eigenvalue of the Feynman–Kac operator

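$$P_{t}^{f}:\varphi\mapsto\mathbb{E}\Big[\varphi(X_{t})\,\mathrm{e}^{\int_{0}^{t}f(X_{s})\,ds}\Big].\tag{6}$$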

A key remark for defining the above operator is that the process

$$M_{t}=W(X_{t})\,\mathrm{e}^{-\int_{0}^{t}\frac{\mathcal{L}W}{W}(X_{s})\,ds}$$

is a local martingale, as noted by Wu in [115]. This allows us to define (6) for functions $\varphi$ such that $\|\varphi\|_{B^{\infty}_{W}}<+\infty$, as soon as $f$ is dominated by the function $\Psi$ defined in (4). As a result, for any such $f$, the operator (6) can be shown to be compact over the space of functions controlled by $W$ (see [55, 47]), and the functional (5) is obtained as the largest eigenvalue of the operator (6) through a generalized Perron–Frobenius theorem (the Krein–Rutman theorem [27]).
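The link between the free energy and a principal eigenvalue can be checked numerically on a toy case. The sketch below is not the paper's method: it assumes the one-dimensional Ornstein–Uhlenbeck generator $\mathcal{L}=-x\,\partial_{x}+\partial_{x}^{2}$, the observable $f(x)=\theta x$, a finite-difference discretization of $\mathcal{L}+f$ on a truncated interval with Dirichlet boundary conditions, and a power iteration on $I+\tau(\mathcal{L}+f)$. The function name `principal_eigenvalue` and all numerical parameters are hypothetical. For this example the exact value is $\theta^{2}$, since $(\mathcal{L}+\theta x)\,\mathrm{e}^{\theta x}=\theta^{2}\mathrm{e}^{\theta x}$.

```python
def principal_eigenvalue(theta, half_width=6.0, h=0.1, tau=0.004, t_final=20.0):
    """Approximate the principal eigenvalue of L + theta*x for L = -x d/dx + d^2/dx^2.

    Assumptions (illustrative only): centered finite differences on
    [-half_width, half_width], zero Dirichlet boundary values, and power
    iteration on the nonnegative matrix I + tau * A.
    """
    n = int(round(2 * half_width / h)) + 1
    xs = [-half_width + i * h for i in range(n)]
    # Tridiagonal coefficients of A, the discretization of L + theta * x.
    lower = [1.0 / h**2 + x / (2 * h) for x in xs]   # multiplies u[i-1]
    diag = [-2.0 / h**2 + theta * x for x in xs]     # multiplies u[i]
    upper = [1.0 / h**2 - x / (2 * h) for x in xs]   # multiplies u[i+1]

    v = [1.0 / n] * n          # positive initial vector, entries sum to 1
    growth = 1.0
    for _ in range(int(t_final / tau)):
        padded = [0.0] + v + [0.0]   # Dirichlet values outside the grid
        w = [
            padded[i + 1]
            + tau * (lower[i] * padded[i]
                     + diag[i] * padded[i + 1]
                     + upper[i] * padded[i + 2])
            for i in range(n)
        ]
        growth = sum(w)              # converges to 1 + tau * lambda
        v = [wi / growth for wi in w]
    return (growth - 1.0) / tau
```

Calling `principal_eigenvalue(0.5)` should return a value close to $0.25$, up to discretization and truncation errors. The time step is chosen so that $I+\tau A$ has nonnegative entries, which is what makes the power iteration converge to the Perron eigenvector in this sketch.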

The second part of our work consists in rewriting the rate function $I$ . For this, we first show that

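$$\forall\,\nu\in\mathcal{P}(\mathcal{X}),\quad I(\nu)=\sup\left\{-\int_{\mathcal{X}}\frac{\mathcal{L}u}{u}\,d\nu,\ u\in\mathcal{D}^{+}(\mathcal{L})\right\},$$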

where $\mathcal{D}^{+}(\mathcal{L})$ is an appropriate domain defined in Section 3. This formula is similar to the one proved in [33], but differs by additional growth conditions in the definition of $\mathcal{D}^{+}(\mathcal{L})$ . This result leads to a variational formula for the largest eigenvalue $\lambda(f)$ of the operator $P_{t}^{f}$ defined on a suitable functional space through

$$\lambda(f)=\sup_{\nu\in\mathcal{P}(\mathcal{X})}\big\{\nu(f)-I(\nu)\big\}.$$

We mention that the proof of (8) relies on the spectral problem associated with the Feynman–Kac operator (6), and uses tools from the recent work [47].

Finally, the variational representation (8) allows us to generalize the results of [15] by splitting $I$ into two parts. More specifically, denoting by $\mathcal{L}=\mathcal{L}_{\mathrm{S}}+\mathcal{L}_{\mathrm{A}}$ the decomposition into symmetric and antisymmetric parts of the generator considered on $L^{2}(\mu)$, we obtain, for any $\nu\ll\mu$:

$$I(\nu)=\frac{1}{4}\left|\log\frac{d\nu}{d\mu}\right|_{\mathscr{H}^{1}(\nu)}^{2}+\frac{1}{4}\left|\mathcal{L}_{\mathrm{A}}\Big(\log\frac{d\nu}{d\mu}\Big)\right|_{\mathscr{H}^{-1}(\nu)}^{2},$$

where $|\cdot|_{\mathscr{H}^{1}(\nu)}$ and $|\cdot|_{\mathscr{H}^{-1}(\nu)}$ refer to Sobolev seminorms defined in Section 2.1. Interestingly, the proof relies on a generalized Witten transform performed in the variational representation (8), which we may therefore call a variational Witten transform. This shows that, for a given invariant measure, an irreversible dynamics ($\mathcal{L}_{\mathrm{A}}\neq 0$) produces more entropy than a reversible one, in accordance with the second law of thermodynamics. This decomposition is useful for instance to study the entropy production of the Langevin dynamics, which is irreversible but has a particular structure. In this case, there is a natural identification of the effect of the reversible and irreversible parts of the dynamics on fluctuations.
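To make the reversible part of this decomposition concrete, here is a worked example under assumptions that are ours rather than the paper's: the one-dimensional Ornstein–Uhlenbeck process $dX_{t}=-X_{t}\,dt+\sqrt{2}\,dB_{t}$, which is reversible with respect to $\mu=\mathcal{N}(0,1)$ (so $\mathcal{L}_{\mathrm{A}}=0$ and $S=1$), and the shifted Gaussian $\nu_{m}=\mathcal{N}(m,1)$. We use that the $\mathscr{H}^{1}(\nu)$ seminorm is the integral of the carré du champ, which reduces to $\int|\varphi'|^{2}\,d\nu$ here.

```latex
\begin{aligned}
\log\frac{d\nu_{m}}{d\mu}(x) &= -\frac{(x-m)^{2}}{2}+\frac{x^{2}}{2}=mx-\frac{m^{2}}{2},\\
\left|\log\frac{d\nu_{m}}{d\mu}\right|_{\mathscr{H}^{1}(\nu_{m})}^{2}
&= \int_{\mathbb{R}}\Big|\partial_{x}\Big(mx-\frac{m^{2}}{2}\Big)\Big|^{2}\,d\nu_{m}=m^{2},\\
I(\nu_{m}) &= \frac{1}{4}\,m^{2}.
\end{aligned}
```

In this example the entropy of $\nu_{m}$ grows quadratically with the shift $m$, and only the symmetric term contributes.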

Organization of the work.

The paper is organized as follows. In Section 2 we prove the large deviations principle under Lyapunov and regularity conditions. In Section 3 we rewrite the rate function and give its decomposition into symmetric and antisymmetric parts. Some examples of application are given in Section 4, in particular for overdamped and underdamped Langevin dynamics. Section 5 discusses possible extensions and connections with related works. Finally, most of the proofs are postponed to Section 6.

2 Large deviations principle

2.1 Setting

This section introduces the main notation used throughout the paper. We consider a diffusion process $(X_{t})_{t\geqslant 0}$ evolving in $\mathcal{X}=\mathbb{R}^{d}$ with $d\in\mathbb{N}\setminus\{0\}$ , and satisfying the following stochastic differential equation (SDE):

$$dX_{t}=b(X_{t})\,dt+\sigma(X_{t})\,dB_{t},\tag{9}$$

where $b:\mathcal{X}\to\mathbb{R}^{d}$ , $\sigma:\mathcal{X}\to\mathbb{R}^{d\times m}$ and $(B_{t})_{t\geqslant 0}$ is a $m$ -dimensional Brownian motion for some $m\in\mathbb{N}^{*}$ .
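A minimal sketch of how the empirical measure of such a diffusion, tested against a function $\varphi$, can be approximated numerically. It is not taken from the paper: the Euler–Maruyama scheme, the restriction to dimension one, the helper names `empirical_average` and `ou_second_moment`, and the Ornstein–Uhlenbeck test case are illustrative assumptions.

```python
import math
import random


def empirical_average(b, sigma, phi, x0, t_final, dt, seed=0):
    """Euler-Maruyama estimate of (1/t) * int_0^t phi(X_s) ds in dimension 1."""
    rng = random.Random(seed)
    n_steps = int(round(t_final / dt))
    sqrt_dt = math.sqrt(dt)
    x = x0
    running_integral = 0.0
    for _ in range(n_steps):
        running_integral += phi(x) * dt   # left-point Riemann sum of phi(X_s)
        # One Euler-Maruyama step for dX = b(X) dt + sigma(X) dB.
        x += b(x) * dt + sigma(x) * sqrt_dt * rng.gauss(0.0, 1.0)
    return running_integral / (n_steps * dt)


def ou_second_moment(t_final=500.0, dt=0.01, seed=0):
    """Hypothetical test case: dX = -X dt + sqrt(2) dB, invariant law N(0, 1).

    The empirical average of x^2 should approach mu(x^2) = 1 for large t.
    """
    return empirical_average(
        b=lambda x: -x,
        sigma=lambda x: math.sqrt(2.0),
        phi=lambda x: x * x,
        x0=0.0,
        t_final=t_final,
        dt=dt,
        seed=seed,
    )
```

For large `t_final` the returned value fluctuates around $1$. Large deviations theory quantifies how unlikely it is for such an average to stay away from its limit over long times.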

Remark 1.

The analysis can easily be extended with appropriate modifications to other spaces $\mathcal{X}$ such as $\mathcal{X}=\mathbb{T}^{d}$ or $\mathcal{X}=\mathbb{T}^{d}\times\mathbb{R}^{d}$ , where $\mathbb{T}^{d}$ is the $d$ -dimensional torus. The last case is motivated by applications to the Langevin equation, where $\mathbb{T}^{d}$ would be a bounded position space and $\mathbb{R}^{d}$ the unbounded momentum space (see Section 4.2).

The generator of the dynamics (9), denoted by $\mathcal{L}$ , reads

$$\mathcal{L}=b\cdot\nabla+S:\nabla^{2},\quad\text{with}\quad S=\frac{\sigma\sigma^{T}}{2},\tag{10}$$

where $\sigma^{T}$ denotes the transpose of the matrix $\sigma$ and $\cdot$ is the scalar product on $\mathbb{R}^{d}$ . Moreover, $\nabla^{2}$ stands for the Hessian matrix, and for two matrices $A,B$ belonging to $\mathbb{R}^{d\times d}$ , we write $A:B=\mathrm{Tr}(A^{T}B)$ . The conditions on $b$ and $\sigma$ will be made precise in Section 2.2. The function $S$ takes values in the set of symmetric positive matrices (not necessarily definite). We also introduce the carré du champ operator [5] associated with $\mathcal{L}$ defined by, for two regular functions $\varphi$ , $\psi$ :

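$$\mathscr{C}(\varphi,\psi)=\frac{1}{2}\big(\mathcal{L}(\varphi\psi)-\varphi\mathcal{L}\psi-\psi\mathcal{L}\varphi\big)=\nabla\varphi\cdot S\nabla\psi.$$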
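For readers who want the intermediate step, the gradient form $\mathscr{C}(\varphi,\psi)=\nabla\varphi\cdot S\nabla\psi$ of the carré du champ follows from the product rule, assuming only that $\mathcal{L}=b\cdot\nabla+S:\nabla^{2}$ with $S$ symmetric. The first-order part of $\mathcal{L}$ satisfies the Leibniz rule and cancels, so only the Hessian term contributes:

```latex
\begin{aligned}
\nabla^{2}(\varphi\psi) &= \varphi\,\nabla^{2}\psi+\psi\,\nabla^{2}\varphi
+\nabla\varphi\otimes\nabla\psi+\nabla\psi\otimes\nabla\varphi,\\
\mathcal{L}(\varphi\psi) &= \varphi\,\mathcal{L}\psi+\psi\,\mathcal{L}\varphi
+2\,\nabla\varphi\cdot S\nabla\psi,\\
\mathscr{C}(\varphi,\psi) &= \tfrac{1}{2}\big(\mathcal{L}(\varphi\psi)
-\varphi\,\mathcal{L}\psi-\psi\,\mathcal{L}\varphi\big)=\nabla\varphi\cdot S\nabla\psi.
\end{aligned}
```

The second line uses $S:(a\otimes b)=a\cdot Sb$ together with the symmetry of $S$.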

We will use the space $C_{\mathrm{c}}^{\infty}(\mathcal{X})$ (resp. $C_{\mathrm{b}}(\mathcal{X})$ ) of smooth functions with compact support (resp. continuous and bounded functions), as well as the space of smooth functions growing at most polynomially and whose derivatives also grow at most polynomially:

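$$\mathscr{S}=\left\{\varphi\in C^{\infty}(\mathcal{X})\ \middle|\ \forall\,\alpha\in\mathbb{N}^{d},\ \exists\,N>0\ \text{such that}\ \sup_{x\in\mathcal{X}}\frac{|\partial^{\alpha}\varphi(x)|}{(1+|x|^{2})^{N}}<+\infty\right\},$$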

where $\partial^{\alpha}=\partial_{x_{1}}^{\alpha_{1}}\dots\partial_{x_{d}}^{\alpha_{d}}$ with $\alpha=(\alpha_{1},\dots\alpha_{d})$ .

The space of bounded measurable functions, denoted by $B^{\infty}(\mathcal{X})$ , is endowed with the norm

$$\|\varphi\|_{B^{\infty}}=\sup_{x\in\mathcal{X}}|\varphi(x)|.$$

Moreover, we will need weighted function spaces and the corresponding probability measure spaces, which commonly appear in Markov chain theory [87, 76, 60]. For any measurable function $W:\mathcal{X}\to[1,+\infty)$ we define

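$$B^{\infty}_{W}(\mathcal{X})=\left\{\varphi:\mathcal{X}\to\mathbb{R}\ \text{measurable}\ \middle|\ \|\varphi\|_{B^{\infty}_{W}}:=\sup_{x\in\mathcal{X}}\frac{|\varphi(x)|}{W(x)}<+\infty\right\},\tag{12}$$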

and the associated space of probability measures (see [102, Chapter 2] for duality results on measure spaces):

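$$\mathcal{P}_{W}(\mathcal{X})=\Big\{\nu\in\mathcal{P}(\mathcal{X})\ \Big|\ \nu(W)<+\infty\Big\}.\tag{13}$$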

The associated weighted total variation distance is (see for instance [60]):

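$$\forall\,\nu,\eta\in\mathcal{P}_{W}(\mathcal{X}),\quad d_{W}(\nu,\eta)=\sup_{\|\varphi\|_{B^{\infty}_{W}}\leqslant 1}\left\{\int_{\mathcal{X}}\varphi\,d\nu-\int_{\mathcal{X}}\varphi\,d\eta\right\}=\int_{\mathcal{X}}W(x)\,|\nu-\eta|(dx),$$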

where $|\nu-\eta|$ denotes the total variation measure associated to $\nu-\eta$ , see [102, Chapter 6].
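On a finite state space the weighted total variation distance reduces to a weighted sum, and the supremum over test functions with $\|\varphi\|_{B^{\infty}_{W}}\leqslant 1$ is attained at $\varphi=W\,\mathrm{sign}(\nu-\eta)$. The following sketch illustrates this on discrete measures; the finite state space and the helper names `weighted_tv`, `dual_value` and `optimal_test_function` are assumptions made for illustration.

```python
def weighted_tv(nu, eta, weight):
    """d_W(nu, eta) = sum_x W(x) * |nu(x) - eta(x)| on a finite state space."""
    return sum(w * abs(p - q) for p, q, w in zip(nu, eta, weight))


def dual_value(nu, eta, phi):
    """Integral of phi against nu minus integral of phi against eta."""
    return sum(f * (p - q) for p, q, f in zip(nu, eta, phi))


def optimal_test_function(nu, eta, weight):
    """phi = W * sign(nu - eta), which satisfies |phi(x)| <= W(x) everywhere."""
    return [w if p >= q else -w for p, q, w in zip(nu, eta, weight)]


# Hypothetical example on a three-point space, with weight W >= 1.
nu = [0.5, 0.3, 0.2]
eta = [0.2, 0.3, 0.5]
weight = [1.0, 2.0, 5.0]
phi_star = optimal_test_function(nu, eta, weight)
```

Here `weighted_tv(nu, eta, weight)` and `dual_value(nu, eta, phi_star)` coincide, while any other test function bounded by `weight` in absolute value gives a smaller or equal dual value.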

Remark 2.

Note that the spaces (12) and (13) are defined for an arbitrary measurable function $W\geqslant 1$ . It is possible to weaken the assumption $W\geqslant 1$ but we will not need these refinements in this paper.

We denote by $\tau$ -topology the weak topology on $\mathcal{P}(\mathcal{X})$ associated with the convergence of measures tested against functions belonging to $B^{\infty}(\mathcal{X})$ (we may also use the notation $\sigma(\mathcal{P}(\mathcal{X}),B^{\infty})$ ); see [30]. This means that for a sequence $(\nu_{n})_{n\in\mathbb{N}}$ in $\mathcal{P}(\mathcal{X})$ , $\nu_{n}\to\nu$ in the $\tau$ -topology if $\nu_{n}(\varphi)\to\nu(\varphi)$ for any $\varphi\in B^{\infty}(\mathcal{X})$ . Recall that the $\tau$ -topology is stronger than the usual weak topology $\sigma(\mathcal{P}(\mathcal{X}),C_{\mathrm{b}}(\mathcal{X}))$ on $\mathcal{P}(\mathcal{X})$ , which corresponds to the convergence $\nu_{n}(\varphi)\to\nu(\varphi)$ for any $\varphi\in C_{\mathrm{b}}(\mathcal{X})$ . The $\tau$ -topology can be extended to account for convergence of measures tested against the larger class of functions $\varphi\in B^{\infty}_{W}(\mathcal{X})$ . We denote by $\tau^{W}$ the associated topology $\sigma(\mathcal{P}_{W}(\mathcal{X}),B^{\infty}_{W}(\mathcal{X}))$ , see [115, 76].

We associate to the dynamics $(X^{(x)}_{t})_{t\geqslant 0}$ started from $X_{0}^{(x)}=x\in\mathcal{X}$ the semigroup $(P_{t})_{t\geqslant 0}$ defined through

$$\forall\,\varphi\in B^{\infty}(\mathcal{X}),\quad(P_{t}\varphi)(x)=\mathbb{E}\big[\varphi\big(X_{t}^{(x)}\big)\big],$$

where $\mathbb{E}$ stands for the expectation with respect to all realizations of the Brownian motion in (9). Let us mention that, with some abuse of notation but for the sake of readability, we will not write out explicitly the dependence of $X_{t}$ on $x$ in the proofs presented in Section 6, see the discussion at the beginning of that section. We say that $\mu\in\mathcal{P}(\mathcal{X})$ is invariant with respect to the dynamics $(X_{t}^{(x)})_{t\geqslant 0}$ if $(\mu P_{t})(\varphi)=\mu(\varphi)$ for any $\varphi\in C_{\mathrm{b}}(\mathcal{X})$, with the notation

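$$(\mu P_{t})(\varphi)=\mu(P_{t}\varphi)=\int_{\mathcal{X}}\mathbb{E}\big[\varphi\big(X_{t}^{(x)}\big)\big]\,\mu(dx).$$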

This implies in particular that $\mu(\mathcal{L}\varphi)=0$ for $\varphi\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$ , see [46, Proposition 9.2].

We now follow the path of [75, Chapter 2] for defining other useful functional spaces. For any probability measure $\mu\in\mathcal{P}(\mathcal{X})$ , let

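$$L^{2}(\mu)=\left\{\varphi\ \text{measurable}\ \middle|\ \int_{\mathcal{X}}|\varphi|^{2}\,d\mu<+\infty\right\}.$$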

For $\varphi\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$ , we introduce the seminorm

$$|\varphi|_{\mathscr{H}^{1}(\mu)}^{2}=\int_{\mathcal{X}}\mathscr{C}(\varphi,\varphi)\,d\mu,$$

and the equivalence relation $\sim_{1}$ through: $\varphi\sim_{1}\psi$ if and only if $|\varphi-\psi|_{\mathscr{H}^{1}(\mu)}=0$ . We denote by $\mathscr{H}^{1}(\mu)$ the closure of $C_{\mathrm{c}}^{\infty}(\mathcal{X})$ quotiented by $\sim_{1}$ for the norm $|\cdot|_{\mathscr{H}^{1}(\mu)}$ . Note that $\mathscr{H}^{1}(\mu)$ and $L^{2}(\mu)$ are not subspaces of each other in general, but $\mathscr{H}^{1}(\mu)\subset L^{2}(\mu)$ for instance if $\mu$ satisfies a Poincaré inequality and $S$ is positive definite. The difference between $L^{2}(\mu)$ and $\mathscr{H}^{1}(\mu)$ is however important for degenerate dynamics, see the application in Section 4.2. We now construct a space dual to $\mathscr{H}^{1}(\mu)$ with the same density argument by introducing the seminorm: for $\varphi\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$ ,

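$$|\varphi|_{\mathscr{H}^{-1}(\mu)}^{2}=\sup_{\psi\in C_{\mathrm{c}}^{\infty}(\mathcal{X})}\left\{2\int_{\mathcal{X}}\varphi\psi\,d\mu-|\psi|_{\mathscr{H}^{1}(\mu)}^{2}\right\}.$$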

We define similarly the equivalence relation $\sim_{-1}$ on $C_{\mathrm{c}}^{\infty}(\mathcal{X})$ by $\varphi\sim_{-1}\psi$ if and only if $|\varphi-\psi|_{\mathscr{H}^{-1}(\mu)}=0$ . The space $\mathscr{H}^{-1}(\mu)$ is then the closure of $C_{\mathrm{c}}^{\infty}(\mathcal{X})$ quotiented by $\sim_{-1}$ . This is actually the dual space of $\mathscr{H}^{1}(\mu)$ , see [75, Section 2.2, Claim F].

Let us relate $\mathscr{H}^{1}(\mu)$ to the more standard $H^{1}(\mu)$ Sobolev space. If $\mu$ is invariant with respect to $\mathcal{L}$ then, for $\varphi\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$ , it holds (using that $\mathcal{L}(\varphi^{2})=2\varphi\mathcal{L}\varphi+2\mathscr{C}(\varphi,\varphi)$ )

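$$|\varphi|_{\mathscr{H}^{1}(\mu)}^{2}=\int_{\mathcal{X}}\mathscr{C}(\varphi,\varphi)\,d\mu=\int_{\mathcal{X}}\varphi\,(-\mathcal{L}\varphi)\,d\mu.$$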

In particular, when $S=\mathrm{Id}$ we have

$$|\varphi|_{\mathscr{H}^{1}(\mu)}^{2}=\int_{\mathcal{X}}|\nabla\varphi|^{2}\,d\mu.$$

In this case, $|\cdot|_{\mathscr{H}^{1}(\mu)}$ is the standard $H^{1}(\mu)$ Sobolev seminorm [83]. An in-depth discussion on the space $\mathscr{H}^{1}(\mu)$ and its use for proving central limit theorems for Markov processes is provided in [75, Chapter 2].

Remark 3.

The space $\mathscr{H}^{-1}(\mu)$ has a role comparable to the subspace $L_{0}^{2}(\mu)$ of functions in $L^{2}(\mu)$ with average zero with respect to $\mu$ since $\mathscr{H}^{-1}(\mu)\cap L^{2}(\mu)\subset L_{0}^{2}(\mu)$ (but of course the functions of $\mathscr{H}^{-1}(\mu)$ do not belong to $L^{2}(\mu)$ in general). Assume indeed that $\varphi\in L^{2}(\mu)$ (so $\varphi\in L^{1}(\mu)$ ), $\int_{\mathcal{X}}\varphi\,d\mu\geqslant 0$ (which is not restrictive upon considering $-\varphi$ ) and $|\varphi|_{\mathscr{H}^{-1}}<+\infty$ . We may choose $\psi\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$ such that

$$\psi(x)=\begin{cases}1,&\text{if }|x|\leqslant 1,\\0,&\text{if }|x|\geqslant 2,\end{cases}$$

and set $\psi_{n}(x)=n\psi(x/n)$ , so $|\psi_{n}|_{\mathscr{H}^{1}(\mu)}\leqslant C$ for some constant $C>0$ independent of $n$ . The definition of $\mathscr{H}^{-1}(\mu)$ shows that

$$|\varphi|_{\mathscr{H}^{-1}(\mu)}\geqslant 2n\int_{|x|\leqslant 2n}\psi\,\varphi\,d\mu-C.$$

By the dominated convergence theorem it holds

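$$\int_{|x|\leqslant 2n}\psi\,\varphi\,d\mu\xrightarrow[n\to+\infty]{}\int_{\mathcal{X}}\varphi\,d\mu\geqslant 0.$$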

Since $|\varphi|_{\mathscr{H}^{-1}(\mu)}<+\infty$ , we obtain by letting $n\to+\infty$ that $\mu(\varphi)=0$ .

We also introduce some notation concerning the growth of functions. A function $f:\mathcal{X}\to\mathbb{R}$ is said to have compact level sets if for any $M\in\mathbb{R}$ the set

$$\big\{x\in\mathcal{X}\,\big|\,f(x)\leqslant M\big\}$$

is compact (with the convention that $\emptyset$ is compact). A function $g$ is said to be negligible with respect to $f$ (denoted by $g\ll f$ ) if $f/g$ has compact level sets, and $g$ is said to be equivalent to $f$ (denoted by $g\sim f$ ) if there exist constants $c,c^{\prime}>0$ and $R,R^{\prime}\in\mathbb{R}$ such that

[TABLE]

Remark 4.

The above definitions are useful when the state space $\mathcal{X}$ is unbounded. A sufficient condition for $f$ to have compact level sets in this case is for this function to be lower semicontinuous and to go to infinity at infinity (i.e. to be coercive). If $\mathcal{X}$ were bounded, all these criteria would be automatically met for smooth functions.

Finally, we denote by $\underline{\lim}$ and $\overline{\lim}$ the inferior and superior limits respectively, while for a subset $A\subset\mathcal{Y}$ of a topological space $\mathcal{Y}$ , $\mathring{A}$ and $\bar{A}$ denote the interior and closure of $A$ for the chosen topology on $\mathcal{Y}$ . The function $\mathds{1}_{A}$ denotes the indicator function of the set $A$ , i.e. $\mathds{1}_{A}(x)=1$ if $x\in A$ and $\mathds{1}_{A}(x)=0$ otherwise. For a Banach space $E$ , $\mathcal{B}(E)$ refers to the Banach space of bounded linear operators over $E$ with the usual norm. We recall some elements of large deviations theory in Appendix A for the reader’s convenience.

2.2 Statement of the main results

The large deviations principle relies on three standard assumptions: hypoellipticity of the generator, irreducibility of the dynamics, and a Lyapunov condition.

We start with our hypoellipticity assumption (which could certainly be relaxed for particular applications, see for instance [115]). It will be useful for proving regularity of the Feynman–Kac semigroup in Lemma 5. We denote by $A^{\dagger}$ the adjoint of a (closed) operator $A$ considered on $L^{2}(dx)$ .

Assumption 1 (Hypoellipticity).

The functions $b$ and $\sigma$ in (9) belong to $\mathscr{S}^{d}$ and $\mathscr{S}^{d\times m}$ , respectively, and the generator $\mathcal{L}$ defined in (10) satisfies the hypoelliptic Hörmander condition. More precisely, $\mathcal{L}$ can be written as

[TABLE]

where $(A_{i})_{i=0}^{d}$ are first order differential operators with coefficients belonging to $\mathscr{S}$ such that the family

[TABLE]

spans $\mathbb{R}^{d}$ at any $x\in\mathcal{X}$ for a finite number of commutators $n_{x}\in\mathbb{N}$ .

This assumption is natural in practical situations, as illustrated in the applications of Section 4 covering elliptic and hypoelliptic diffusions, see [65, 43, 97] for details. Note that excluding the operator $A_{0}$ from the first family means that, if $\mathcal{L}$ satisfies Assumption 1, $\partial_{t}+\mathcal{L}$ is hypoelliptic and the transition kernel of $(X_{t})_{t\geqslant 0}$ has a smooth density for any $t>0$ .

The regularity requirement comes together with a controllability condition (recall that $\sigma$ takes values in $\mathbb{R}^{d\times m}$ ).

Assumption 2 (Controllability).

For any $x,y\in\mathcal{X}$ and $T>0$ , there exists a control $u\in C^{0}([0,T],\mathbb{R}^{m})$ such that the path $\phi\in C^{0}([0,T],\mathcal{X})$ defined as

[TABLE]

is well-defined and satisfies $\phi(T)=y$ .
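In the elliptic one-dimensional case, such a control can be written down explicitly. The sketch below assumes, as is standard for this kind of controllability condition, that the controlled path solves $\dot{\phi}(t)=b(\phi(t))+\sigma\,u(t)$ with $\phi(0)=x$. It further assumes a constant scalar $\sigma\neq 0$ and the hypothetical drift $b(z)=-z^{3}$. The helper names `steering_control` and `integrate_path` are ours. The control is chosen so that the path is the straight line from $x$ to $y$.

```python
import math


def steering_control(b, sigma, x, y, horizon):
    """Control u(t) making the straight line from x to y solve phi' = b(phi) + sigma * u."""
    velocity = (y - x) / horizon

    def u(t):
        target = x + velocity * t              # point of the straight line at time t
        return (velocity - b(target)) / sigma  # cancels the drift, adds constant speed

    return u


def integrate_path(b, sigma, u, x, horizon, n_steps=1000):
    """Explicit Euler integration of phi' = b(phi) + sigma * u(t), phi(0) = x."""
    dt = horizon / n_steps
    phi = x
    for k in range(n_steps):
        phi += dt * (b(phi) + sigma * u(k * dt))
    return phi


# Hypothetical example: steer from x = -1 to y = 2 in time T = 1.
drift = lambda z: -z**3
noise = math.sqrt(2.0)
control = steering_control(drift, noise, -1.0, 2.0, 1.0)
endpoint = integrate_path(drift, noise, control, -1.0, 1.0)
```

The computed `endpoint` agrees with the target $y=2$ up to rounding. For degenerate noise, where $\sigma$ is not invertible, no such direct formula is available, and the control has to exploit the drift.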

Assumption 2 together with Assumption 1 implies that the process is irreducible, i.e. that the transition density of $(X_{t})_{t\geqslant 0}$ is everywhere positive (by adapting the argument of [97, Proposition 8.1]), which will be used in Lemma 6. Note that constructing a control $u\in C^{0}([0,T],\mathbb{R}^{m})$ may be difficult in general [70]. However, for the overdamped and underdamped Langevin dynamics we are interested in, building such a control turns out to be genuinely feasible, see [86, 105, 97, 83, 85] and references therein. Let us mention that the above two assumptions are standard for proving LDPs [109, 115].

A recurrent idea when studying Markov chain stability and large deviations on an unbounded state space is to reduce the analysis to a compact set and to control the excursions of the dynamics out of this set with a Lyapunov function [87, 115]. Our Witten–Lyapunov condition for the dynamics reads as follows (for the terminology, see Remark 6 below).

Assumption 3 (Witten–Lyapunov condition).

There exists a function $W:\mathcal{X}\to[1,+\infty)$ of class $C^{2}(\mathcal{X})$ , with compact level sets and such that

[TABLE]

has compact level sets. Moreover, there exists a $C^{2}(\mathcal{X})$ function $\mathscr{W}:\mathcal{X}\to[1,+\infty)$ such that, for some constants $C_{1}>0$ , $C_{2}\in\mathbb{R}$ ,

[TABLE]

In all that follows, we consider an arbitrary function $\kappa:\mathcal{X}\to[1,+\infty)$ belonging to $\mathscr{S}$ such that:

• $\kappa\ll\Psi$;

• either (i) $\kappa$ is bounded, or (ii) $\kappa$ has compact level sets and there exists $c\in\mathbb{R}$ such that

[TABLE]

Remark 5.

Note that the condition $\mathscr{W}^{2}\leqslant C_{1}W$ implies in particular that $\mathscr{W}\ll W$. In addition, since $\kappa\ll\Psi$ and $\Psi\sim-\frac{\mathcal{L}\mathscr{W}}{\mathscr{W}}$, it holds $\kappa\ll-\frac{\mathcal{L}\mathscr{W}}{\mathscr{W}}$. These facts will be frequently used in the proofs. Moreover, the conditions (21) are not restrictive for exponential-like Lyapunov functions, as shown in Proposition 1 below (the idea being that $\mathscr{W}$ can be set to $\sqrt{W}$). The condition (22) also typically holds because $W$ is chosen of exponential type while $\kappa$ is a polynomial. In practice, the auxiliary function $\mathscr{W}$ is used to obtain some control in the proofs of Lemmas 3 and 5 (in particular to apply a Grönwall lemma). Assumption 3 could certainly be phrased differently, possibly with weaker conditions on the functions at stake.

Although we stated Assumption 3 in order to fit standard conditions when considering large deviations on unbounded state spaces [109, 115], in practice it can be obtained from a non-linear Lyapunov condition in the spirit of [76] and [39, Condition 2.2]. This is the purpose of the next proposition, whose proof is postponed to Appendix B.

Proposition 1.

Assume that there exists $V\in\mathscr{S}$ such that:

• $V$ has compact level sets;

• $|\sigma^{T}\nabla V|$ has compact level sets;

• for any $\theta\in(0,1)$,

[TABLE]

Then Assumption 3 is satisfied with

[TABLE]

for $\theta\in(0,1)$ and $\varepsilon<\theta/2$ small enough. In this case it holds

[TABLE]

Moreover, condition (22) holds true for any function $\kappa:\mathcal{X}\to[1,+\infty)$ of class $\mathscr{S}$ such that either (i) $\kappa$ is bounded or (ii) $\kappa$ has compact level sets, satisfies $\kappa\ll\Psi$ and there exists $C\geqslant 0$ with

[TABLE]

Note that (23) means that the term $-\mathcal{L}V$ coming from the dynamics must compensate the quadratic loss proportional to $|\sigma^{T}\nabla V|^{2}$ . We also mention that the condition (24) is not restrictive in general since it is typically satisfied by polynomial-like functions $\kappa$ .

A first consequence of Assumptions 1 to 3 is the ergodicity of the dynamics, whatever the initial distribution for $X_{0}$ .

Proposition 2.

Under Assumptions 1, 2 and 3, (9) has a global strong solution, and the process $(X_{t}^{(x)})_{t\geqslant 0}$ admits a unique invariant probability measure $\mu\in\mathcal{P}_{W}(\mathcal{X})$ . This measure has a positive $C^{\infty}(\mathcal{X})$ -density with respect to the Lebesgue measure: there exists $\rho^{\mu}\in C^{\infty}(\mathcal{X})$ with $\rho^{\mu}>0$ such that $\mu(dy)=\rho^{\mu}(y)\,dy$ . Moreover, the dynamics is ergodic with respect to $\mu$ : there exist $C,c>0$ such that

[TABLE]

Equivalently,

[TABLE]

Proof.

The existence of a unique local strong solution is standard when Assumption 1 holds, see [96, Chapter IX, Exercise 2.10]. Assumption 3 then implies the existence of $a>0$ , $b\in\mathbb{R}$ such that

[TABLE]

and global existence can be deduced from the above Lyapunov inequality [97]. The end of the proof is a direct application of [97, Theorem 8.9] since Assumption 2 together with Assumption 1 ensures irreducibility. ∎

We can now present the large deviations principle associated with the empirical measure of the process $(X_{t}^{(x)})_{t\geqslant 0}$ with respect to its invariant measure $\mu$ . Recall that the empirical measure of the process is defined by

$$L_{t}:=\frac{1}{t}\int_{0}^{t}\delta_{X_{s}}\,ds,\tag{25}$$

where $\delta_{y}$ denotes the Dirac mass at $y\in\mathcal{X}$ . When one considers large deviations principles for empirical averages of the form (25), the topology on probability measures has to be specified. As mentioned in the introduction, most of the LDPs are stated in topologies associated with bounded measurable functions (resp. continuous bounded), the so-called strong topology or $\tau$ -topology (resp. weak topology). We now prove that, in our setting, a LDP holds in the $\tau^{\kappa}$ -topology defined in Section 2.1, for any function $\kappa$ satisfying Assumption 3. The proof of Theorem 1 is presented in Section 6.1. We recall that a rate function is said to be good if its level sets are compact.
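To make the empirical measure concrete, here is a minimal sketch (not taken from the paper) that approximates $L_{t}(f)=\frac{1}{t}\int_{0}^{t}f(X_{s})\,ds$ along a single Euler–Maruyama path. The test dynamics, a one-dimensional Ornstein–Uhlenbeck process $dX_{t}=-X_{t}\,dt+\sqrt{2}\,dB_{t}$ with invariant measure $\mathcal{N}(0,1)$, the step size and the helper name `empirical_average` are all illustrative assumptions.

```python
import math
import random


def empirical_average(f, drift, t_final, dt=0.01, x0=0.0, seed=0):
    """Approximate L_t(f) = (1/t) * int_0^t f(X_s) ds on one Euler-Maruyama
    path of dX = drift(X) dt + sqrt(2) dB (illustrative discretisation)."""
    rng = random.Random(seed)
    n_steps = int(round(t_final / dt))
    noise_scale = math.sqrt(2.0 * dt)
    x = x0
    total = 0.0
    for _ in range(n_steps):
        total += f(x) * dt  # left-point Riemann sum of the time integral
        x += drift(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
    return total / (n_steps * dt)


def ou_drift(x):
    # Assumed test case: V(x) = x^2 / 2, so the drift is -V'(x) = -x.
    return -x


mean_x = empirical_average(lambda x: x, ou_drift, t_final=200.0)
mean_x2 = empirical_average(lambda x: x * x, ou_drift, t_final=200.0)
print(mean_x, mean_x2)
```

For large $t$ the two averages should be close to the first and second moments of $\mathcal{N}(0,1)$, namely $0$ and $1$; the LDP quantifies how unlikely it is for them to stay away from these values.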

Theorem 1.

Suppose that Assumptions 1, 2 and 3 hold true, and consider a function $\kappa$ as in Assumption 3 and $x\in\mathcal{X}$ fixed. Then, the functional

[TABLE]

does not depend on $x$ , is well-defined, convex and finite, and $(L_{t}^{(x)})_{t\geqslant 0}$ satisfies a LDP in the $\tau^{\kappa}$ -topology with the good rate function defined by:

[TABLE]

More precisely, for any $\tau^{\kappa}$ -measurable set $\Gamma\subset\mathcal{P}(\mathcal{X})$ and any $x\in\mathcal{X}$ , it holds

[TABLE]

where the interior and closure of $\Gamma$ are taken with respect to the $\tau^{\kappa}$ -topology. Finally, for any $\nu\in\mathcal{P}(\mathcal{X})$ , it holds $I(\nu)=0$ if and only if $\nu=\mu$ ; and, for any sequence $(t_{n})_{n\geqslant 1}$ such that $t_{n}/\log(n)\to+\infty$ as $n\to+\infty$ , it holds

[TABLE]

almost surely in the $\tau^{\kappa}$ -topology.

Our conclusion is in essence close to that of [76], but the conditions to reach it seem more natural to us and correspond to usual conditions for proving large deviations principles in an unbounded state space, see [115, 39] and [109, Section 9]. In particular, they allow us to derive the duality representation (27), and we do not need to consider non-linear operators. Our strategy (presented in Section 6.1) relies on the Gärtner–Ellis theorem [55, 44, 45, 28], for which the existence of the free-energy (26) is a key element. The originality of our work is to make use of the local martingale (7) introduced by Wu [115] in order to solve the spectral problem associated with the Feynman–Kac operator, which proves the existence of the limit in (26). This directly provides the LDP in the $\tau^{\kappa}$ -topology by duality. However, there may be cases in which a LDP holds although the conditions of the Gärtner–Ellis theorem are not satisfied, for instance in the framework of the Sanov theorem [111], so our conditions may not be necessary.

Let us also mention that, in addition to (29), we also show for completeness in the proof of Theorem 1 that $(L_{t}^{(x)})_{t\geqslant 0}$ almost surely spends a time of finite Lebesgue measure outside any $\tau^{\kappa}$ -open set around $\mu$ .

Another advantage of our approach is to characterize precisely the set of functions for which a LDP holds from the standard condition on $\Psi$ defined in (20), as in [31, 109]. This condition is also used in [115, Corollary 2.3] for proving a level 1 LDP for Langevin dynamics. We present below a clear connection with a spectral gap condition for the Witten–Schrödinger operator in the reversible case. The comparison with Cramér’s condition for independent variables highlights the effect of correlations on fluctuations.

Remark 6 (Reversible processes, Witten Laplacian and Cramér’s condition).

Consider the following reversible diffusion

$$dX_{t}=-\nabla V(X_{t})\,dt+\sqrt{2}\,dB_{t},$$

where $V:\mathcal{X}\to\mathbb{R}$ is a smooth potential with compact level sets. The generator of this dynamics is $\mathcal{L}=-\nabla V\cdot\nabla+\Delta$ and its invariant probability measure reads $\mu(dx)=Z^{-1}\,\mathrm{e}^{-V(x)}dx$ , where we assume that

[TABLE]

Define

[TABLE]

for some $\theta\in(0,1)$ . This is a standard choice for obtaining compactness of the evolution operator [97, Section 8], and optimal control representations of rate functions [39], see also Proposition 1. An easy computation shows that

[TABLE]

However, we also know [112] that the generator $\mathcal{L}$ considered on $L^{2}(\mu)$ is unitarily equivalent to the operator

[TABLE]

defined on $L^{2}(dx)$ (a procedure also called symmetrization [107, Section 4.3]), which is actually the opposite of the Witten Laplacian [112, 62]:

[TABLE]

In this case, the condition for (30) to have compact level sets when $\theta=1/2$ is actually equivalent to a confinement condition (or spectral gap condition [63]) for the Witten–Schrödinger operator $\widetilde{\mathcal{L}}$ defined in (31). In that sense, Assumption 3 is a natural generalization of a spectral gap condition for the Witten Laplacian in the case of possibly non-reversible dynamics. This is why we call Assumption 3 a Witten–Lyapunov condition.
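The link between the Lyapunov quotient and the Witten–Schrödinger potential can be checked by a short computation. The following is a sketch under the assumptions that $W=\mathrm{e}^{\theta V}$, that (30) denotes $\Psi=-\mathcal{L}W/W$, and that the symmetrization is the conjugation $g\mapsto\mathrm{e}^{-V/2}\mathcal{L}(\mathrm{e}^{V/2}g)$; it uses only the generator $\mathcal{L}=-\nabla V\cdot\nabla+\Delta$ given above.

```latex
\begin{aligned}
\nabla W &= \theta\,W\,\nabla V,
\qquad
\Delta W = \big(\theta\,\Delta V+\theta^{2}|\nabla V|^{2}\big)\,W,\\
-\frac{\mathcal{L}W}{W}
 &= \theta|\nabla V|^{2}-\theta\,\Delta V-\theta^{2}|\nabla V|^{2}
  = \theta(1-\theta)\,|\nabla V|^{2}-\theta\,\Delta V,\\
\mathrm{e}^{-V/2}\,\mathcal{L}\big(\mathrm{e}^{V/2}g\big)
 &= \Delta g-\Big(\tfrac{1}{4}|\nabla V|^{2}-\tfrac{1}{2}\Delta V\Big)\,g .
\end{aligned}
```

For $\theta=1/2$ the second line gives $\tfrac{1}{4}|\nabla V|^{2}-\tfrac{1}{2}\Delta V$, which is exactly the multiplication potential in the third line. This is why compact level sets of the Lyapunov quotient at $\theta=1/2$ correspond to a confining potential for the Schrödinger-type operator.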

We now compare this Witten–Lyapunov condition to Cramér’s exponential moment condition in the case of independent variables of law $\mu$ . Consider a smooth potential $V(x)$ which behaves as $|x|^{q}$ for $q>1$ outside a ball $B(0,r)$ centered on the origin. Assumption 3 is thus satisfied by application of Proposition 1. The standard Cramér condition in the case of independent variables $(X_{i})_{i\geqslant 0}$ states that the empirical measure

[TABLE]

satisfies a large deviations principle in the $\tau^{\kappa}$ -topology if and only if [111, Theorem 1.1]:

[TABLE]

For $\mu(dx)=Z^{-1}\mathrm{e}^{-V(x)}dx$ , a sufficient condition for the above condition to hold is to choose a smooth function $\kappa$ behaving as $1+|x|^{\alpha}$ with $0\leqslant\alpha<q$ . On the other hand, the Witten–Lyapunov potential (30) reads in this case

[TABLE]

so that we may choose $\kappa(x)$ behaving as $1+|x|^{\alpha}$ for $0\leqslant\alpha<2(q-1)$ . When comparing the two conditions, we obtain the following different situations depending on $q$ :

• $q>2$ (super-Gaussian case): $2(q-1)>q$, the Witten–Lyapunov condition is less restrictive than Cramér’s condition;

• $q=2$ (Gaussian case): $2(q-1)=q$, the two conditions are equivalent;

• $q\in(1,2)$ (sub-Gaussian case): $2(q-1)<q$, the Witten–Lyapunov condition is more restrictive than Cramér’s condition.

This simple example shows that considering a correlated system instead of independent variables has a non-trivial effect on the stability of the system. Depending on the confinement potential, the Witten–Lyapunov condition for (30) to have compact level sets can be more or less restrictive than Cramér’s condition for independent variables distributed according to the invariant measure $\mu$. Finally, we remark that for $q\in(1,3/2)$, the process is heavy-tailed in the sense that $2(q-1)<1$, so that the observable $f(x)=x$ (assuming $d=1$) is not covered by our LDP. In other words, the average position of the process defined by

[TABLE]

cannot be shown to satisfy a large deviations principle at speed $t$ with our arguments.
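The comparison of admissible growth exponents in this remark can be tabulated with a few lines of code. This is only an illustration of the two sufficient ranges quoted above ($\alpha<q$ for Cramér, $\alpha<2(q-1)$ for the Witten–Lyapunov condition); the function name and the returned fields are hypothetical.

```python
def compare_conditions(q):
    """Compare the growth exponents alpha allowed for kappa ~ 1 + |x|^alpha
    when the potential V behaves like |x|^q at infinity (q > 1)."""
    if q <= 1:
        raise ValueError("the comparison in the text assumes q > 1")
    cramer_bound = q            # Cramer condition: 0 <= alpha < q
    witten_bound = 2 * (q - 1)  # Witten-Lyapunov condition: 0 <= alpha < 2(q - 1)
    if witten_bound > cramer_bound:
        regime = "super-Gaussian"
    elif witten_bound == cramer_bound:
        regime = "Gaussian"
    else:
        regime = "sub-Gaussian"
    return {
        "cramer_bound": cramer_bound,
        "witten_bound": witten_bound,
        "regime": regime,
        # f(x) = x needs alpha = 1 to be admissible, i.e. 2(q - 1) > 1.
        "position_covered": witten_bound > 1,
    }


for q in (1.25, 2.0, 4.0):
    print(q, compare_conditions(q))
```

For instance, $q=1.25$ gives the bound $2(q-1)=0.5<1$, so the position observable falls outside the admissible class, in line with the heavy-tailed regime $q\in(1,3/2)$ discussed above.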

We finally mention that, in the case where the observable $f$ grows faster at infinity than the potential $\Psi$ , it seems possible to derive a level 1 large deviations principle at a speed smaller than $t$ . We refer to [90] for a recent account dealing with the case of an Ornstein–Uhlenbeck process, and to [16, 2] for related issues.

We close this section with a practical corollary of Theorem 1 which generalizes the level 1 LDP proved in [115, Corollary 2.3].

Corollary 1 (Level 1 large deviations principle).

Suppose that Assumptions 1, 2 and 3 hold true and consider a function $f\in B^{\infty}_{\kappa}(\mathcal{X})$ . Fix $x\in\mathcal{X}$ . Then, the function

[TABLE]

is well-defined and differentiable, and does not depend on $x$ . Moreover, $L_{t}^{(x)}(f)$ satisfies a large deviations principle in $\mathbb{R}$ at speed $t$ with good rate function given by

[TABLE]

where $I$ is defined in (27). Finally, it holds

[TABLE]

Corollary 1 is useful for practical applications, since (34) is a natural way to estimate the rate function $I_{f}$ associated with an observable $f$ , see for instance [56, 101, 104, 23, 48].
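As an illustration of how such an estimate can be carried out, the sketch below computes a Legendre–Fenchel transform on a grid. It assumes that (34) has the usual Gärtner–Ellis form $I_{f}(x)=\sup_{a\in\mathbb{R}}\{ax-\lambda_{f}(a)\}$, and it uses a test case that is not in the paper: for the Ornstein–Uhlenbeck generator $\mathcal{L}=\partial_{x}^{2}-x\partial_{x}$ and $f(x)=x$, the function $h=\mathrm{e}^{ax}$ satisfies $(\mathcal{L}+ax)h=a^{2}h$, so that $\lambda_{f}(a)=a^{2}$ and the exact rate function is $x^{2}/4$. The grid bounds and function names are arbitrary choices.

```python
def legendre_transform(cumulant, x, a_min=-10.0, a_max=10.0, n_grid=20001):
    """Grid approximation of sup_a { a * x - cumulant(a) } over [a_min, a_max]."""
    step = (a_max - a_min) / (n_grid - 1)
    best = float("-inf")
    for i in range(n_grid):
        a = a_min + i * step
        best = max(best, a * x - cumulant(a))
    return best


def ou_cumulant(a):
    # Assumed closed form for the Ornstein-Uhlenbeck test case with f(x) = x.
    return a * a


def rate(x):
    return legendre_transform(ou_cumulant, x)


for x in (0.0, 1.0, 2.0):
    print(x, rate(x), x * x / 4.0)
```

In practice `ou_cumulant` would be replaced by a numerical estimate of $\lambda_{f}$, and the grid must be wide enough to contain the maximiser (here $a=x/2$).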

Proof.

For $f\in B^{\infty}_{\kappa}(\mathcal{X})$, the map $L_{t}^{(x)}\in\mathcal{P}_{\kappa}(\mathcal{X})\mapsto L_{t}^{(x)}(f)\in\mathbb{R}$ is continuous in the $\tau^{\kappa}$ -topology [30, Lemma 3.3.8]. Therefore, $L_{t}^{(x)}(f)$ obeys a large deviations principle in $\mathbb{R}$ by the contraction principle [28, Theorem 4.2.1], with good rate function given by (33). Moreover, one can redo the proofs leading to Theorem 1 and show that $\lambda_{f}$ defined in (32) is smooth and well-defined on $\mathbb{R}$. This implies that a LDP with good rate function (34) holds through the Gärtner–Ellis theorem applied in $\mathbb{R}$. Since the rate function is unique, the expressions (33) and (34) coincide. ∎

3 Decomposition of the rate function

Our goal in this section is to rewrite $I$ in various ways, which is useful for theoretical understanding and practical purposes. In Section 3.1, we first show an extension of the standard Donsker–Varadhan formulation for $I$ . This result is obtained by making use of the spectral analysis of the operator $P_{t}^{f}$ for $f\in B^{\infty}_{\kappa}(\mathcal{X})$ , which is presented in Section 6.1. We then apply this result to obtain a variational representation for the principal eigenvalue $\mathrm{e}^{t\lambda(f)}$ of $P_{t}^{f}$ . Next, in Section 3.2, we split the expression of the rate function according to the symmetric and antisymmetric parts of the dynamics, extending the work [15] to general diffusions. Such a decomposition will prove useful in Section 4 to compare the entropy of overdamped and underdamped Langevin dynamics. Most of the proofs of this section are postponed to Section 6.2.

3.1 Donsker–Varadhan variational formula

We start with the variational representation of the entropy. Our proof, which can be found in Section 6.2.2, is an adaptation of [30, Lemma 4.2.35] relying on the Feynman–Kac semigroup and its spectral elements. In order to state the result, we need to make sense of $\mathcal{L}u$ for functions $u\in B^{\infty}_{W}(\mathcal{X})$ . It turns out that the appropriate notion to this end is the extended domain $\mathcal{D}(\mathcal{L})$ of the generator $\mathcal{L}$ considered as an operator on $B^{\infty}_{W}(\mathcal{X})$ , defined in the following way: a function $\varphi\in B^{\infty}_{W}(\mathcal{X})$ belongs to $\mathcal{D}(\mathcal{L})$ if and only if there exists a measurable function $\phi:\mathcal{X}\to\mathbb{R}$ such that, for any $x\in\mathcal{X}$ ,

[TABLE]

and

[TABLE]

In this case we write $\phi=\mathcal{L}\varphi$ (with some abuse of notation in view of the definition of $\mathcal{L}$ as a differential operator in (10), but of course the expressions coincide when $\varphi$ is a smooth test function with compact support).

When the $\tau$ -topology is considered, such extended domains were already considered for instance in [114, 115, 76], see also [26, Chapter I, Definition 14.15]. For the unbounded functions we consider, one should think of $\phi=\mathcal{L}u$ as an element of $B^{\infty}_{\kappa W}(\mathcal{X})$ (see the proof of Lemma 10 below, as well as the comments following Proposition 3). The integrability condition (35) is reasonable in this context since $(P_{t})_{t\geqslant 0}$ is a well defined semigroup on $B^{\infty}_{\kappa W}(\mathcal{X})$ in view of the Lyapunov condition (22).

We can now present the main result of this section.

Proposition 3.

The rate function defined in (27) admits the following representation:

[TABLE]

where

[TABLE]

In particular, the functional defined in (37) is equal to $+\infty$ if $\nu\notin\mathcal{P}_{\kappa}(\mathcal{X})$ or $\nu$ is not absolutely continuous with respect to $\mu$ .

This result is standard when $\mathcal{X}$ is compact [33], but does not seem to be known for an unbounded space $\mathcal{X}$ and for the $\tau^{\kappa}$ -topology we consider. In this situation the space $\mathcal{D}^{+}(\mathcal{L})$ has to be designed with some caution. Note that $\mathcal{D}^{+}(\mathcal{L})$ is not empty since it contains the functions of the form $u=\mathrm{e}^{\psi}$ for $\psi\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$ . Note also that the last statement of Proposition 3 is consistent with the Fenchel definition (27) of the rate function. In order to get some intuition on the formula (37), let us mention that the proof formally relies on replacing the maximum over functions $u\in\mathcal{D}^{+}(\mathcal{L})$ by the supremum over eigenfunctions $h_{f}$ satisfying

[TABLE]

for $f\in B^{\infty}_{\kappa}(\mathcal{X})$ . The above equation rewrites, since $h_{f}>0$ (see Lemmas 7 and 10),

[TABLE]

By integrating with respect to a measure $\nu\in\mathcal{P}_{\kappa}(\mathcal{X})$ we find (37) on the left-hand side, and the Fenchel transform (27) on the right-hand side. The functional spaces associated with $f$ and $h_{f}$ motivate the choice of $\mathcal{D}^{+}(\mathcal{L})$, in particular the fact that $\mathcal{L}h_{f}=\lambda(f)h_{f}-fh_{f}\in B^{\infty}_{\kappa W}(\mathcal{X})$ (as the sum of an element in $B^{\infty}_{W}(\mathcal{X})$ and the product of a function in $B^{\infty}_{W}(\mathcal{X})$ and another one in $B^{\infty}_{\kappa}(\mathcal{X})$), which allows us to define $\mathcal{L}h_{f}$ in the weak sense (36).

A natural consequence of Proposition 3 is the following variational representation for the cumulant function. The proof, postponed to Section 6.2.3, relies on the convexity of the cumulant function to invert the Fenchel transform (27).

Corollary 2.

Suppose that Assumptions 1, 2 and 3 hold true, and consider $f\in B^{\infty}_{\kappa}(\mathcal{X})$ . Then,

[TABLE]

where $I$ is defined in (37).

Corollary 2 may seem anecdotal, but it provides a variational representation for the principal eigenvalue of non-symmetric diffusion operators, as pioneered by Donsker and Varadhan in their seminal paper [33] for a compact space $\mathcal{X}$ . To the best of our knowledge, this formula had not been shown in an unbounded setting, for which we need to introduce the “generalized domain” $\mathcal{D}^{+}(\mathcal{L})$ defined in (38). However, our set of assumptions implies that $\lambda(f)$ can be thought of as the largest eigenvalue of $\mathcal{L}+f$ , and turns out to be isolated for any $f$ (because of the compactness of the resolvent provided by Lemma 7), whereas in [33], (39) may be the supremum of the essential spectrum of the operator. This suggests that (39) holds under weaker assumptions. A possible approach for generalizing our results may be to consider different methods for studying the long time behaviour of unnormalized semigroups, see for instance [20, 6, 21], or to resort to more subtle spectral analysis tools [113, 116, 53, 13].
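A worked example, which is not part of the original text, may help to see the variational representation at work. We take the one-dimensional Ornstein–Uhlenbeck generator $\mathcal{L}=\partial_{x}^{2}-x\partial_{x}$ (so $V(x)=x^{2}/2$ and $\mu=\mathcal{N}(0,1)$) and $f(x)=ax$. We assume that (39) reads $\lambda(f)=\sup_{\nu}\{\int f\,d\nu-I(\nu)\}$, and that for this reversible dynamics $I(\nu)=\int|\nabla\sqrt{\rho}|^{2}\,d\mu$ with $\rho=d\nu/d\mu$, the usual Fisher information normalisation for this generator. Restricting the supremum to the shifted Gaussians $\nu_{m}=\mathcal{N}(m,1)$ gives:

```latex
\begin{aligned}
&(\mathcal{L}+ax)\,\mathrm{e}^{ax}
  =\big(a^{2}-ax+ax\big)\,\mathrm{e}^{ax}=a^{2}\,\mathrm{e}^{ax}
  \quad\Longrightarrow\quad \lambda(ax)=a^{2},\\
&\rho_{m}=\frac{d\nu_{m}}{d\mu}=\mathrm{e}^{mx-m^{2}/2},
  \qquad
  I(\nu_{m})=\frac{1}{4}\int_{\mathbb{R}}|\partial_{x}\log\rho_{m}|^{2}\,d\nu_{m}
  =\frac{m^{2}}{4},\\
&\sup_{m\in\mathbb{R}}\Big\{\int_{\mathbb{R}}ax\,d\nu_{m}-I(\nu_{m})\Big\}
  =\sup_{m\in\mathbb{R}}\Big\{am-\frac{m^{2}}{4}\Big\}=a^{2}.
\end{aligned}
```

The supremum over this one-parameter family already reaches the principal eigenvalue, with maximiser $m=2a$. This agrees with the fact that, in the reversible case, the optimal measure is proportional to $h_{f}^{2}\,\mu$, here $\mathrm{e}^{2ax-x^{2}/2}\,dx$, that is $\mathcal{N}(2a,1)$.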

3.2 Entropy decomposition: symmetry and antisymmetry

Our goal is now to provide refined expressions for the rate function $I$ in terms of symmetric and antisymmetric parts of the dynamics, inspired in particular by [15]. In the following, for any closed operator $T$ , we denote by $T^{*}$ its adjoint on $L^{2}(\mu)$ , where $\mu$ is the invariant probability measure of the process, as obtained in Proposition 2. Considering the generator $\mathcal{L}$ of the diffusion (9), we can always decompose it into symmetric and antisymmetric parts with respect to $\mu$ through

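$$\mathcal{L}=\mathcal{L}_{\mathrm{S}}+\mathcal{L}_{\mathrm{A}},\qquad\mathcal{L}_{\mathrm{S}}=\frac{\mathcal{L}+\mathcal{L}^{*}}{2},\qquad\mathcal{L}_{\mathrm{A}}=\frac{\mathcal{L}-\mathcal{L}^{*}}{2}.\tag{40}$$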

It is important to note that $\mathcal{L}_{\mathrm{A}}$ is a first order differential operator (and therefore obeys the chain rule of first order differentiation). We assume here that the operators $\mathcal{L},\mathcal{L}_{\mathrm{A}},\mathcal{L}_{\mathrm{S}}$ admit $C_{\mathrm{c}}^{\infty}(\mathcal{X})$ as a common core (but the domains of these operators may be different).

The decomposition (40) allows us to separate the rate function (37) into two parts. This is the purpose of the next key result, whose proof can be found in Section 6.2.4. It is inspired by the computations in [15, Proposition 2], which we simplify and generalize here through a variational Witten transform and the use of the Sobolev spaces introduced in Section 2.1. The algebra of the proof also suggests considering $I(\nu)$ for probability measures $\nu$ of the form $d\nu=\mathrm{e}^{v}\,d\mu$.

Theorem 2.

Suppose that Assumptions 1, 2 and 3 hold true, consider a measure $\nu\in\mathcal{P}_{\kappa}(\mathcal{X})$ such that $d\nu=\mathrm{e}^{v}\,d\mu$ with $v\in\mathscr{H}^{1}(\nu)$ and $\mathcal{L}_{\mathrm{A}}v\in\mathscr{H}^{-1}(\nu)$ . Then, the rate function $I$ defined in (37) admits the following decomposition:

[TABLE]

where

[TABLE]

and

[TABLE]

Theorem 2 expresses the rate function as the sum of dual norms of the symmetric and antisymmetric parts of the dynamics. Note also that we consider a measure of the form $d\nu=\mathrm{e}^{v}\,d\mu$ , that is the Radon–Nikodym derivative of $\nu$ with respect to $\mu$ is positive. However, we believe that we can consider more general measures $\nu$ , see Remark 10 in the proof. Since the measure $\nu$ at hand appears both inside the norms and in the definition of the norms themselves, a possibly clearer rewriting is the following:

[TABLE]

Moreover, the symmetric part of the rate function (42) can be written as a Fisher information for the invariant measure $\mu$ , a standard result [55]: denoting by $\rho=d\nu/d\mu$ , it holds

[TABLE]

The next corollary builds upon (43) by rewriting $I_{\mathrm{A}}$ using a Poisson equation, which can be manipulated more easily. The proof can be found in Section 6.2.5.

Corollary 3.

Suppose that Assumptions 1, 2 and 3 hold true, and consider a measure $\nu\in\mathcal{P}_{\kappa}(\mathcal{X})$ such that $d\nu=\mathrm{e}^{v}d\mu$ with $v\in\mathscr{H}^{1}(\nu)$ and $\mathcal{L}_{\mathrm{A}}v\in\mathscr{H}^{-1}(\nu)$ . Then, the antisymmetric part of the rate function (43) reads

[TABLE]

where $\psi_{v}$ is the unique solution in $\mathscr{H}^{1}(\nu)$ to the Poisson equation

[TABLE]

the symmetric matrix $S$ being defined in (10) and $\widetilde{\nabla}$ denoting the adjoint of the gradient operator in $L^{2}(\nu)$ .

It has been known for a long time [33] that the rate function of a reversible process is a Fisher information as in (42). The antisymmetric part of the rate function has been less investigated, although an expression like (44) already appears in [55] (see also [98, 15]). However, our setting provides natural well-posedness conditions for both parts of the rate function to be finite. Moreover, the uniqueness of $\psi_{v}$ is a consequence of the definition of $\mathscr{H}^{1}(\nu)$ through equivalence classes, see Section 2.1.

Interestingly, the solution $\psi_{v}$ of (45) can be formally represented through [83]

[TABLE]

where $\mathcal{L}_{\nu}=-\widetilde{\nabla}(S\nabla\,\cdot\,)$ . The stochastic process $(X_{t}^{\nu})_{t\geqslant 0}$ associated with $\mathcal{L}_{\nu}$ is reversible with respect to $\nu$ . Denoting by $\mathrm{e}^{-V_{\nu}}$ the density of $\nu$ with respect to the Lebesgue measure, $(X_{t}^{\nu})_{t\geqslant 0}$ is solution to the following SDE:

[TABLE]

Finally (44) takes the form

[TABLE]

The antisymmetric part of the entropy is therefore the autocorrelation of $\mathcal{L}_{\mathrm{A}}v$ along a reversible process that realizes the fluctuation corresponding to the measure $\nu$. From a mathematical point of view, it seems interesting to relate (46) to the so-called level 2.5 of large deviations [7, 24], since this approach consists in considering joint fluctuations of the empirical measure and the associated empirical current. In this case, the large deviations function is explicit: this reflects the fact that a Markov process is characterized entirely by its density and current. Exploring further the connection between (46) and level 2.5 large deviations is an interesting direction for future work.

Remark 7.

It is also possible to consider the adjoint $\mathcal{L}^{*}$ not with respect to the invariant measure $\mu$ (whose analytical expression may be unknown), but instead with respect to a reference measure $\mu_{\mathrm{ref}}$ with a known analytical expression such that $\mathcal{L}^{*}=\mathcal{L}_{1}-\mathcal{L}_{2}+\xi$ for some measurable function $\xi$ (with $\mathcal{L}=\mathcal{L}_{1}+\mathcal{L}_{2}$). This leads to an additional term $-\int_{\mathcal{X}}\xi\,d\nu$ in the expression of the rate function (41), as can be readily checked by a straightforward adaptation of the proof. The operators $\mathcal{L}_{1}$ and $\mathcal{L}_{2}$ are the counterparts of the symmetric and antisymmetric parts of the generator in this decomposition. A typical situation to apply this strategy is provided by systems subject to a small external nonequilibrium forcing, the reference measure usually being chosen as the invariant measure at equilibrium, in the absence of external forcing. Atom chains in contact with an inhomogeneous heat bath were studied with this approach in [15], $\mu_{\mathrm{ref}}$ being the Gibbs measure associated with a fixed temperature profile.

4 Applications

4.1 Overdamped Langevin dynamics

In this section, we come back to the setting of Remark 6 by considering a diffusion process over $\mathcal{X}=\mathbb{R}^{d}$ subject to

$$dX_{t}=b(X_{t})\,dt+\sqrt{2}\,dB_{t},\tag{47}$$

where $b:\mathbb{R}^{d}\to\mathbb{R}^{d}$ is a smooth function and $(B_{t})_{t\geqslant 0}$ is a $d$ -dimensional Brownian motion. This corresponds to (9) with $\sigma=\sqrt{2}$ , in which case the generator reads

$$\mathcal{L}=b\cdot\nabla+\Delta.$$

We will treat both the reversible case $b=-\nabla V$, for a smooth potential $V$, and the non-reversible case $b=-\nabla V+F$, for a smooth vector field $F$ such that $\nabla\cdot(F\mathrm{e}^{-V})=0$. In both cases, the invariant probability measure $\mu$ of the process is (assuming $\mathrm{e}^{-V}\in L^{1}(\mathcal{X})$)

$$\mu(dx)=Z^{-1}\,\mathrm{e}^{-V(x)}\,dx,\qquad Z=\int_{\mathcal{X}}\mathrm{e}^{-V}.$$

The dynamics (47) is reversible (i.e. $\mathcal{L}^{*}=\mathcal{L}$, where $\mathcal{L}^{*}$ denotes the adjoint of $\mathcal{L}$ in $L^{2}(\mu)$) if and only if $b=-\nabla V$. We now give a standard condition on $V$ under which the framework developed in Sections 2 and 3 applies.

Assumption 4.

The potential $V\in\mathscr{S}$ has compact level sets, satisfies $\mathrm{e}^{-V}\in L^{1}(\mathcal{X})$ and, for any $\theta\in(0,1)$ , it holds

[TABLE]

This assumption is satisfied for smooth potentials growing like $|x|^{q}$ for $q>1$ at infinity, and it also implies that the invariant probability measure $\mu$ satisfies a Poincaré inequality [4]. Similar conditions are derived in [76] in the context of large deviations. The next proposition is a direct application of Propositions 1 and 2, Theorem 1 and Corollary 3.

Proposition 4.

Under Assumption 4, the process (47) with $b=-\nabla V$ admits the function

[TABLE]

for any $\theta\in(0,1)$ as a Lyapunov function in the sense of Assumption 3. For any fixed $\theta\in(0,1)$ , there exist $C,c>0$ such that for any initial measure $\nu\in\mathcal{P}_{W}(\mathcal{X})$ ,

[TABLE]

Moreover,

[TABLE]

has compact level sets and, for any $\kappa:\mathcal{X}\to[1,+\infty)$ belonging to $\mathscr{S}$ , bounded or with compact level sets and such that

[TABLE]

the empirical measure

[TABLE]

satisfies a large deviations principle in the $\tau^{\kappa}$ -topology. The good rate function is defined by: for all $\nu\in\mathcal{P}_{\kappa}(\mathcal{X})$ with $d\nu=\rho\,d\mu=\mathrm{e}^{v}\,d\mu$ ,

[TABLE]

and $I(\nu)=+\infty$ otherwise.

In this reversible example, we see that the rate function is only defined through its symmetric part (42), as shown in Theorem 2. We now consider a modification of this dynamics when a divergence-free drift is added. The next proposition is an extension of the examples proposed in [98] to the unbounded state space case.

Proposition 5.

Suppose that Assumption 4 holds and consider the diffusion process solution to:

$$dX_{t}=\big(-\nabla V(X_{t})+F(X_{t})\big)\,dt+\sqrt{2}\,dB_{t},$$

with $F$ a smooth vector field such that $\nabla\cdot(F\mathrm{e}^{-V})=0$ and

[TABLE]

where $\Psi$ is defined in (50). Then, with the notation of Section 3.2 it holds $\mathcal{L}_{\mathrm{S}}=-\nabla V\cdot\nabla+\Delta$ and $\mathcal{L}_{\mathrm{A}}=F\cdot\nabla$ . Moreover

[TABLE]

and $(X_{t})_{t\geqslant 0}$ satisfies a LDP in the $\tau^{\kappa}$ -topology for any function $\kappa$ belonging to $\mathscr{S}$ , bounded or with compact level sets and such that

[TABLE]

The associated rate function $I_{\mathrm{F}}$ reads: for any $\nu$ such that $d\nu=\mathrm{e}^{v}\,d\mu$ with $v\in\mathscr{H}^{1}(\nu)$ and $F\cdot\nabla v\in\mathscr{H}^{-1}(\nu)$ ,

[TABLE]

where $\psi_{v}$ is the unique $\mathscr{H}^{1}(\nu)$ -solution to

[TABLE]

Proposition 5 shows that, in this simple case, the equilibrium and nonequilibrium dynamics admit a LDP for the same class of functions but with different rate functions, the irreversible dynamics producing more entropy. It is therefore an extension of the case treated in [98, Theorem 2.2]. As for this result, Proposition 5 can be used to design algorithms with accelerated convergence to equilibrium, see also [66, 67, 37]. A setting in which Proposition 5 typically applies is when $V(x)$ behaves as $|x|^{q}$ for some $q>1$ outside an open set centered on the origin, and $F=A\nabla V$ with $A\in\mathbb{R}^{d\times d}$ such that $A^{T}=-A$ (see [98]). The latter condition implies in particular that $F\cdot\nabla V=0$ so (52) immediately holds.
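The two structural properties of $F=A\nabla V$ used here, namely $F\cdot\nabla V=0$ and $\nabla\cdot(F\mathrm{e}^{-V})=0$, can be checked numerically. The sketch below does so in dimension $d=2$ with central finite differences; the potential $V(x,y)=\tfrac{1}{4}(x^{2}+y^{2})^{2}+\tfrac{1}{2}xy$ and the matrix $A=\begin{pmatrix}0&1\\-1&0\end{pmatrix}$ are illustrative choices, not examples from the paper.

```python
import math


def V(x, y):
    # Illustrative quartic potential (behaves like |x|^4 at infinity).
    return 0.25 * (x * x + y * y) ** 2 + 0.5 * x * y


def grad_V(x, y):
    r2 = x * x + y * y
    return (r2 * x + 0.5 * y, r2 * y + 0.5 * x)


def F(x, y):
    # F = A grad V with the antisymmetric matrix A = [[0, 1], [-1, 0]].
    gx, gy = grad_V(x, y)
    return (gy, -gx)


def weighted_divergence(x, y, h=1e-4):
    """Central finite-difference approximation of div(F * exp(-V)) at (x, y)."""
    def flux(u, v, i):
        return F(u, v)[i] * math.exp(-V(u, v))
    d_dx = (flux(x + h, y, 0) - flux(x - h, y, 0)) / (2.0 * h)
    d_dy = (flux(x, y + h, 1) - flux(x, y - h, 1)) / (2.0 * h)
    return d_dx + d_dy


def orthogonality(x, y):
    fx, fy = F(x, y)
    gx, gy = grad_V(x, y)
    return fx * gx + fy * gy


for point in ((0.3, -0.7), (1.1, 0.4)):
    print(point, weighted_divergence(*point), orthogonality(*point))
```

Both quantities vanish up to discretisation error. Analytically, $\nabla\cdot(A\nabla V\,\mathrm{e}^{-V})=\mathrm{e}^{-V}\big(A:\nabla^{2}V-A\nabla V\cdot\nabla V\big)$, and each term is zero because $A$ is antisymmetric while the Hessian is symmetric.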

4.2 Underdamped Langevin dynamics

We now apply our framework to the underdamped Langevin dynamics. A first nice feature of our results is that, compared to [115], we obtain a stronger result under similar assumptions: our LDP for the empirical measure holds in a finer topology than the one associated with bounded measurable functions. Note however that [115, Corollary 2.3] obtains results similar to ours for a contraction of the rate function. In addition, Theorem 2 and Corollary 3 allow us to obtain precise results on the dependence of the rate function on the friction parameter $\gamma$.

We start by describing the Langevin equation in Section 4.2.1, before stating the large deviations principle in Section 4.2.2. Finally Section 4.2.3 provides asymptotics on the rate function depending on the friction.

4.2.1 Description of the dynamics

The dynamics is set on $\mathcal{X}=\mathbb{R}^{d}\times\mathbb{R}^{d}$ , with $(X_{t})_{t\geqslant 0}=(q_{t},p_{t})_{t\geqslant 0}\in\mathbb{R}^{d}\times\mathbb{R}^{d}$ evolving as

[TABLE]

where $\gamma>0$ is a friction parameter, $V:\mathbb{R}^{d}\to\mathbb{R}$ is a smooth potential, and $(B_{t})_{t\geqslant 0}$ is a $d$ -dimensional Brownian motion. We could also consider the easier case where the position space is bounded ( $q\in\mathbb{T}^{d}$ ) but leave this simple modification to the reader. The generator of the dynamics is

[TABLE]

where

[TABLE]

The operator $\mathcal{L}_{\gamma}$ leaves invariant the measure

[TABLE]

The invariant measure (56) can be written

[TABLE]

where

[TABLE]

is the Hamiltonian of the system, and we assume that the normalization constant $Z$ in (57) is finite (which is indeed the case when $\mathrm{e}^{-V}\in L^{1}(\mathbb{R}^{d})$). In (55), the Liouville operator $\mathcal{L}_{\mathrm{ham}}$ corresponding to the Hamiltonian part of the dynamics is antisymmetric in $L^{2}(\mu)$. On the other hand, the fluctuation-dissipation part with generator $\mathcal{L}_{\mathrm{FD}}$ is symmetric in $L^{2}(\mu)$, so that $\mathcal{L}_{\mathrm{A}}=\mathcal{L}_{\mathrm{ham}}$ and $\mathcal{L}_{\mathrm{S}}=\gamma\mathcal{L}_{\mathrm{FD}}$ with the notation of Section 3.2.

Before turning to the LDP associated with the Langevin dynamics (54), we give some intuition on the behaviour of the process as $\gamma$ varies. First, it is clear that in the small $\gamma$ limit, (54) becomes the Hamiltonian dynamics

[TABLE]

To be more precise, we introduce the process $(Q_{t}^{\gamma},P_{t}^{\gamma})=(q_{t/\gamma},p_{t/\gamma})$ where $(q_{t},p_{t})_{t\geqslant 0}$ is solution to (54). It can then be shown that, in the limit $\gamma\to 0$ , the Hamiltonian $H(Q_{t}^{\gamma},P_{t}^{\gamma})$ converges to an effective diffusion on a graph [51, 49, 50, 61]. In particular the relevant time scale in the underdamped limit is $\gamma^{-1}t$ .

On the other hand, in the limit $\gamma\to+\infty$ and under an appropriate time rescaling, we recover the overdamped dynamics studied in Section 4.1. To see this, we integrate the second line in (54) to obtain

[TABLE]

By introducing now $Q_{t}^{\gamma}=q_{\gamma t}$ and $P_{t}^{\gamma}=p_{\gamma t}$ , the latter equality becomes

[TABLE]

When $\gamma\to+\infty$ , we observe that $Q_{t}^{\gamma}$ converges formally towards the solution of (47), see [93, Section 6.5]. The relevant time scale in the overdamped limit is therefore $\gamma t$ . These remarks will be of interest below when studying the rate function associated with the dynamics (54).
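The time rescaling can be made explicit. The following computation is a sketch which assumes that (54) has the form $dq_{t}=p_{t}\,dt$, $dp_{t}=-\nabla V(q_{t})\,dt-\gamma p_{t}\,dt+\sqrt{2\gamma}\,dB_{t}$; the rescaled Brownian motion $\widetilde{B}$ is a notation introduced here. Using $\gamma p_{t}\,dt=\gamma\,dq_{t}$ in the momentum equation and then setting $Q_{t}^{\gamma}=q_{\gamma t}$, $P_{t}^{\gamma}=p_{\gamma t}$ gives

```latex
\begin{aligned}
q_{t}-q_{0}
  &= -\frac{1}{\gamma}\int_{0}^{t}\nabla V(q_{s})\,ds
     -\frac{1}{\gamma}\,(p_{t}-p_{0})
     +\sqrt{\frac{2}{\gamma}}\,B_{t}, \\
Q_{t}^{\gamma}-Q_{0}^{\gamma}
  &= -\int_{0}^{t}\nabla V(Q_{s}^{\gamma})\,ds
     -\frac{1}{\gamma}\,(P_{t}^{\gamma}-P_{0}^{\gamma})
     +\sqrt{2}\,\widetilde{B}_{t},
  \qquad \widetilde{B}_{t}:=\gamma^{-1/2}B_{\gamma t}.
\end{aligned}
```

By Brownian scaling, $\widetilde{B}$ is again a standard Brownian motion, and the term $\gamma^{-1}(P_{t}^{\gamma}-P_{0}^{\gamma})$ formally vanishes as $\gamma\to+\infty$. What remains is an overdamped equation with drift $-\nabla V$ and noise intensity $\sqrt{2}$.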

4.2.2 Large deviations

In order to obtain a large deviations principle for (54), let us make the following classical assumption on the growth of the potential [115, 86, 77, 83].

Assumption 5.

The potential $V\in\mathscr{S}$ has compact level sets, satisfies $\mathrm{e}^{-V}\in L^{1}(\mathcal{X})$ and there exist $c_{V}>0$ , $C_{V}\in\mathbb{R}$ such that

[TABLE]

We can now find a Lyapunov function for (54) by following e.g. [115, 105, 86], as made precise in Appendix C. Recall that the Hamiltonian $H$ is defined in (58).

Lemma 1.

Suppose that $(X_{t})_{t\geqslant 0}=(q_{t},p_{t})_{t\geqslant 0}$ solves (54) where $V$ satisfies Assumption 5. Then for any $\gamma>0$ and $\theta\in(0,1)$ , there exists $\varepsilon>0$ such that

[TABLE]

is a Lyapunov function in the sense of Assumption 3. More precisely, for any $\gamma>0$ and $\theta\in(0,1)$ , there exist $\varepsilon>0$ and $a,b,C>0$ such that

[TABLE]

The Lyapunov function (59) can be adapted in cases where $V$ has singularities, see [64, 85]. We can now deduce our main theorem on the Langevin dynamics since Assumptions 1 and 2 are readily satisfied, see for instance [86].

Theorem 3.

Assume that $(X_{t})_{t\geqslant 0}=(q_{t},p_{t})_{t\geqslant 0}$ solves (54) where $V$ satisfies Assumption 5, and consider a smooth function $\kappa$ with $\kappa(q,p)=1+|q|^{\alpha}+|p|^{\beta}$ for $|q|+|p|\geqslant 1$ and $\alpha\in[0,2)$ , $\beta\in[0,2)$ . Then $(X_{t})_{t\geqslant 0}$ is ergodic with respect to the measure $\mu$ in the sense of Proposition 2, with Lyapunov function defined in (59). Moreover, the empirical measure

$$L_{t}:=\frac{1}{t}\int_{0}^{t}\delta_{X_{s}}\,ds$$

satisfies a LDP in the $\tau^{\kappa}$ -topology. Finally, for any $\nu\in\mathcal{P}_{\kappa}(\mathcal{X})$ such that $d\nu=\mathrm{e}^{v}\,d\mu$ with $v\in\mathscr{H}^{1}(\nu)$ and $\mathcal{L}_{\mathrm{ham}}v\in\mathscr{H}^{-1}(\nu)$ , the rate function reads

[TABLE]

where $\psi$ is the unique solution in $\mathscr{H}^{1}(\nu)$ to the Poisson problem:

[TABLE]

The proof of Theorem 3 is a direct application of the results of Sections 2 and 3. For the expression of the rate function, we use (45) and (55) together with the fact that in this case, the matrix $S$ defined in Section 2.1 reads

[TABLE]

While $\kappa$ can be chosen independently of the friction $\gamma$ , it is interesting to note the dependency of the rate function (60) with respect to this parameter. We discuss more precisely the scaling of the rate function with respect to $\gamma$ in the next section, depending on the form of $\nu$ .

4.2.3 Low and large friction asymptotics of the rate function

The next corollary shows how the decomposition (60) allows us to identify the most likely fluctuations in the overdamped and underdamped limits. By this we mean that, when $\gamma\to 0$ or $\gamma\to+\infty$, most fluctuations become exponentially rare in $1/\gamma$ or $\gamma$ respectively, but some of them are associated with rate functions that vanish as $\gamma\to 0$ or $\gamma\to+\infty$. The expression of these typical fluctuations is motivated by the discussion on the overdamped and underdamped limits in Section 4.2.1, from which the scalings of the rate function appear natural. Recall the definition of the marginal in position $\bar{\mu}$ in (56).

Corollary 4.

Suppose that the assumptions of Theorem 3 hold true.

• Overdamped limit $\gamma\to+\infty$: Consider a measure $\nu\in\mathcal{P}_{\kappa}(\mathcal{X})$ with $d\nu=\mathrm{e}^{v}\,d\mu$ equilibrated in the velocity variable, i.e. such that $v(q,p)=v(q)$ with $v\in\mathscr{H}^{1}(\nu)$ and $p\cdot\nabla_{q}v\in\mathscr{H}^{-1}(\nu)$. Then, for any $\gamma>0$,

[TABLE]

where $\bar{\nu}=\mathrm{e}^{v}\bar{\mu}$ .

• Hamiltonian limit $\gamma\to 0$: Consider a Hamiltonian fluctuation, i.e. $d\nu=\mathrm{e}^{v}\,d\mu$ with $v(q,p)=g(H(q,p))\in\mathscr{H}^{1}(\nu)$ for $g\in C^{1}(\mathbb{R})$, where $H$ is defined in (58). Then, for any $\gamma>0$,

[TABLE]

The proof is an immediate consequence of (60).

Proof.

Consider first the case where $d\nu=\mathrm{e}^{v}\,d\mu$ with $v(q,p)=v(q)$ . We have

[TABLE]

Next, (61) becomes

[TABLE]

The solution to this equation is $\psi(q,p)=-p\cdot\nabla_{q}v(q)$ which indeed belongs to $\mathscr{H}^{1}(\nu)$ since $\mathcal{L}_{\mathrm{ham}}v\in\mathscr{H}^{-1}(\nu)$ (in fact we may add to $\psi$ any function depending on $q$ only but the solutions would be equivalent by definition of the space $\mathscr{H}^{1}(\nu)$ in Section 2.1). Plugging this solution into (60) leads to (62).

Assume now that $v(q,p)=g(H(q,p))$ belongs to $\mathscr{H}^{1}(\nu)$ with $g\in C^{1}(\mathbb{R})$ . It holds

[TABLE]

As a result, the solution $\psi$ to (61) is $\psi=0$ (again, up to a function of $q$ only), from which (63) follows since $v\in\mathscr{H}^{1}(\nu)$ . ∎
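The two algebraic facts behind this proof can be checked numerically. The sketch below assumes that $\mathcal{L}_{\mathrm{ham}}=p\cdot\nabla_{q}-\nabla V\cdot\nabla_{p}$ and $\mathcal{L}_{\mathrm{FD}}=-p\cdot\nabla_{p}+\Delta_{p}$ in dimension one. The quartic potential, the choice $g=\sin$, the test point and the finite-difference steps are hypothetical. It checks that $\mathcal{L}_{\mathrm{ham}}\,g(H)=0$, and that $p\,v'(q)$ is an eigenfunction of $\mathcal{L}_{\mathrm{FD}}$ with eigenvalue $-1$, which is why the solution of the Poisson problem for a position-only perturbation is proportional to $p\cdot\nabla_{q}v$.

```python
import math


def grad_V(q):
    # Hypothetical potential V(q) = q^4/4 + q^2/2, so V'(q) = q^3 + q.
    return q**3 + q


def hamiltonian(q, p):
    return 0.25 * q**4 + 0.5 * q**2 + 0.5 * p**2


def L_ham(v, q, p, h=1e-5):
    """Central-difference approximation of (p d/dq - V'(q) d/dp) v."""
    dv_dq = (v(q + h, p) - v(q - h, p)) / (2.0 * h)
    dv_dp = (v(q, p + h) - v(q, p - h)) / (2.0 * h)
    return p * dv_dq - grad_V(q) * dv_dp


def L_FD(v, q, p, h=1e-3):
    """Central-difference approximation of (-p d/dp + d^2/dp^2) v."""
    dv_dp = (v(q, p + h) - v(q, p - h)) / (2.0 * h)
    d2v_dp2 = (v(q, p + h) - 2.0 * v(q, p) + v(q, p - h)) / (h * h)
    return -p * dv_dp + d2v_dp2


q0, p0 = 0.7, -1.3  # arbitrary test point

# Hamiltonian fluctuation v = g(H) with g = sin: L_ham v should vanish.
ham_residual = L_ham(lambda q, p: math.sin(hamiltonian(q, p)), q0, p0)


# Position-only fluctuation v(q) = sin(q): psi = p v'(q) gives L_FD psi = -psi.
def psi(q, p):
    return p * math.cos(q)


fd_residual = L_FD(psi, q0, p0) + psi(q0, p0)
```

Both residuals vanish up to finite-difference and rounding errors. The sign of $\psi$ is immaterial for the rate function, which only involves $|\nabla_{p}\psi|^{2}$.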

Corollary 4 characterizes the dominant fluctuations in the small and large friction regimes. In the overdamped limit $\gamma\to+\infty$ the dominant fluctuations are in position only, and the rate function is actually that of the limiting overdamped dynamics (51) up to the time rescaling $t\mapsto\gamma t$, which is consistent with the discussion on the overdamped limit in Section 4.2.1. On the other hand, in the Hamiltonian limit $\gamma\to 0$, the dominant fluctuations are Hamiltonian, with the inverse time rescaling $t\mapsto\gamma^{-1}t$. This is consistent with the small temperature limit of Hamiltonian systems [49].

Although Corollary 4 provides interesting information, its structure is quite rigid. For instance, in the overdamped limit, we consider only position-dependent perturbations, which is not realistic. We now refine the asymptotics by considering the next order correction in $\gamma$ for the perturbation in both regimes, which shows the robustness of the analysis. In the result stated below, we consider a family of probability measures $\nu_{\gamma}$ indexed by $\gamma>0$ , and simply denote by $\nu$ the probability measure $\nu_{0}$ .

Corollary 5.

Suppose that the assumptions of Theorem 3 hold true.

• Overdamped limit $\gamma\to+\infty$: Consider the measure $\nu_{\gamma}\in\mathcal{P}_{\kappa}(\mathcal{X})$ defined by $\nu_{\gamma}=\mathrm{e}^{v_{\gamma}}d\mu$ with $v_{\gamma}(q,p)=v(q)+\gamma^{-1}\tilde{v}(q,p)$ where $\mathcal{L}_{\mathrm{ham}}v\in\mathscr{H}^{-1}(\nu)$, and $\tilde{v}\in\mathscr{H}^{1}(\nu)$ is bounded and satisfies $\nabla_{q}v\cdot\nabla_{p}\tilde{v}\in\mathscr{H}^{-1}(\nu)$ and $\mathcal{L}_{\mathrm{ham}}\tilde{v}\in\mathscr{H}^{-1}(\nu)$. Then

[TABLE]

where $\bar{\nu}=\mathrm{e}^{v}\bar{\mu}$ .

• Hamiltonian limit $\gamma\to 0$: Consider $\nu_{\gamma}=\mathrm{e}^{v_{\gamma}}d\mu$ with $v_{\gamma}(q,p)=g(H(q,p))+\gamma\tilde{v}(q,p)$, where $g\in C^{1}(\mathbb{R})$, $g(H)\in\mathscr{H}^{1}(\nu)$, and $\tilde{v}\in\mathscr{H}^{1}(\nu)$ is bounded and satisfies $\mathcal{L}_{\mathrm{ham}}\tilde{v}\in\mathscr{H}^{-1}(\nu)$. Then

[TABLE]

where $\tilde{\psi}$ is the unique solution in $\mathscr{H}^{1}(\nu)$ to

[TABLE]

We believe that it is also instructive to mention the relation between the rate function (60) and the asymptotic variance of the Langevin dynamics. Indeed, when considering small perturbations of the invariant measure, Corollary 5 shows that

[TABLE]

On the other hand, the resolvent estimates in [82, Section 2.1] and [59, 61, 68] show that the asymptotic variance $\sigma_{\gamma}^{2}$ scales like

[TABLE]

Since we expect the asymptotic variance to be the inverse of the rate function around the invariant measure [29, 98], the scalings (67) and (68) are consistent. However, as (60) suggests, this scaling is no longer true for general fluctuations. We now present the proof of Corollary 5.

Proof.

We first consider the overdamped limit $\gamma\to+\infty$ . Since $\tilde{v}$ is bounded we have, for any $\gamma\geqslant 1$ and $\psi\in\mathscr{H}^{1}(\nu_{\gamma})$ ,

[TABLE]

Thus, the norms of $\mathscr{H}^{1}(\nu_{\gamma})$ and $\mathscr{H}^{1}(\nu)$ are equivalent for any fixed $\gamma\geqslant 1$, and the two spaces contain the same functions (we repeatedly use this fact below, and we will use a similar argument when $\gamma\leqslant 1$). A similar conclusion holds for the corresponding dual norms. This consequence of the boundedness of $\tilde{v}$ makes the analysis simpler.
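The mechanism behind this comparison is a pointwise bound on the density of $\nu_{\gamma}$ with respect to $\nu$. As a sketch, take $d\nu=\mathrm{e}^{v}\,d\mu$ as in Corollary 4, so that $d\nu_{\gamma}=\mathrm{e}^{\tilde{v}/\gamma}\,d\nu$, and let $F\geqslant 0$ be any measurable function (a symbol introduced here for illustration). Then, for $\gamma\geqslant 1$,

```latex
\mathrm{e}^{-\|\tilde{v}\|_{L^{\infty}}}\int_{\mathcal{X}}F\,d\nu
\;\leqslant\;
\int_{\mathcal{X}}F\,d\nu_{\gamma}
=\int_{\mathcal{X}}F\,\mathrm{e}^{\tilde{v}/\gamma}\,d\nu
\;\leqslant\;
\mathrm{e}^{\|\tilde{v}\|_{L^{\infty}}}\int_{\mathcal{X}}F\,d\nu .
```

Applying this bound with $F$ equal to the integrand of the squared $\mathscr{H}^{1}$ seminorm gives a two-sided comparison with constants independent of $\gamma\geqslant 1$.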

Recall that we consider $v_{\gamma}(q,p)=v(q)+\gamma^{-1}\tilde{v}(q,p)$ in the overdamped limit. The symmetric part of the rate function is easily computed since $v$ only depends on the position variable, namely

[TABLE]

where we used that $\tilde{v}$ belongs to $\mathscr{H}^{1}(\nu)$ and is bounded to expand the exponential. For the antisymmetric part, by (61), we have to consider the solution $\psi_{\gamma}\in\mathscr{H}^{1}(\nu_{\gamma})$ to

[TABLE]

Corollary 4 suggests that at leading order in $\gamma$ it holds $\psi_{\gamma}=\psi+\mathrm{O}(\gamma^{-1})$ where $\psi(q,p)=p\cdot\nabla v(q)$ . In order to make this idea more precise we compute

[TABLE]

In what follows, we denote by $u=\mathcal{L}_{\mathrm{ham}}\tilde{v}+\nabla_{q}v\cdot\nabla_{p}\tilde{v}$ the right hand side of the above equation. Since $\nabla_{q}v\cdot\nabla_{p}\tilde{v}\in\mathscr{H}^{-1}(\nu_{\gamma})$ and $\mathcal{L}_{\mathrm{ham}}\tilde{v}\in\mathscr{H}^{-1}(\nu_{\gamma})$ by assumption, it holds $u\in\mathscr{H}^{-1}(\nu_{\gamma})$ . Thus, multiplying by $\psi_{\gamma}-\psi$ and integrating with respect to $\nu_{\gamma}$ we obtain

[TABLE]

Using the duality between $\mathscr{H}^{1}(\nu_{\gamma})$ and $\mathscr{H}^{-1}(\nu_{\gamma})$ (see [75, Section 2.2 Claim F]) and (69) we find

[TABLE]

where $C$ is some constant independent of $\gamma$ . This shows that $\psi_{\gamma}=\psi+\gamma^{-1}\tilde{\psi}_{\gamma}$ with $|\tilde{\psi}_{\gamma}|_{\mathscr{H}^{1}(\nu)}\leqslant C^{\prime}$ for a constant $C^{\prime}>0$ and all $\gamma\geqslant 1$ . Plugging this estimate into (60) and using that $\nabla_{p}\psi=\nabla_{q}v$ , we obtain the second term on the right hand side of (64).

The arguments to prove the limit $\gamma\to 0$ follow a similar path, so we only sketch the proof. First, the boundedness of $\tilde{v}$ again allows us to compare the Sobolev norms associated with $\nu$ and $\nu_{\gamma}$ for any $\gamma\leqslant 1$ (by writing the counterpart of (69) in this regime). The first term on the right hand side of (65) is easily obtained as in Corollary 4 using that $g(H)\in\mathscr{H}^{1}(\nu)$ and $\tilde{v}$ is bounded. Concerning the antisymmetric part, (61) now reads

[TABLE]

since $\mathcal{L}_{\mathrm{ham}}g(H(q,p))=0$. Because of the scaling in $\gamma$ on the right hand side of the above equation, the solution $\psi_{\gamma}$ can be expanded as $\psi_{\gamma}=\gamma\tilde{\psi}+\mathrm{O}(\gamma^{2})$ in $\mathscr{H}^{1}(\nu)$, where $\tilde{\psi}$ is the solution to

[TABLE]

This reasoning can be made rigorous by a precise asymptotic analysis as above. Plugging this expansion into (60) provides the second term on the right hand side of (65). ∎

5 Conclusion and perspectives

The goal of this paper was twofold. Our first aim was to provide, given a diffusion process, a precise class of unbounded functions for which a large deviations principle holds. This question is answered in Section 2, where we prove a LDP for the empirical measure in a topology associated with unbounded functions, in relation with a Witten–Lyapunov condition. In particular, a comparison with Cramér’s condition for independent variables shows the effect of correlations on the stability of the SDE at hand. These results extend in several directions and refine results from previous works [115, 76]. However, the necessity of our Lyapunov condition for a LDP to hold is still an open problem, whereas the necessity of a similar condition is known for the Sanov theorem [111].

Our second concern was to provide finer expressions of the rate function governing the LDP, in particular in order to study Langevin dynamics, which appear for instance in molecular simulation. We answer this question in two ways in Section 3. We first provide an alternative variational formula for the rate function in Section 3.1, which gives as a by-product a very general representation formula for the principal eigenvalue of second order differential operators, without symmetry assumption. This extends the important work of Donsker and Varadhan [33] to an unbounded setting. In Section 3.2, we show a general decomposition of the rate function into symmetric and antisymmetric parts of the dynamics based on the computations in [15]. Interestingly, the proof of the result relies on a Witten-like transform in the above mentioned variational representation of the rate function. These results allow us to describe precisely the rate function of an irreversible overdamped Langevin dynamics in Section 4.1, revisiting results from [98] in an unbounded setting. More interestingly, we provide in Section 4.2, for Langevin dynamics, asymptotics of the rate function for the overdamped and the underdamped limits. We thus characterize the most likely fluctuations in both regimes with a natural physical interpretation. Considering piecewise deterministic processes [11, 41, 42] (which lack regularity) instead of the Langevin dynamics is also an interesting problem.

We would like to mention several interesting directions for future work. A first natural issue is to rephrase our results in the optimal control framework developed e.g. in [18, 38, 39]. This is particularly interesting for numerical purposes, since the optimal control representation can be learned on the fly with stochastic approximation methods [17, 9, 10, 48]. We believe that such results can be obtained by harvesting the contraction principle provided by Corollary 1.

On a more theoretical ground, dual Sobolev norms have recently attracted attention in the optimal control community due to the so-called optimal matching problem, see for instance [80, 81] and references therein. With these works in mind, the dual Sobolev norm in the antisymmetric part of the rate function described in Section 3.2 could be interpreted as an infinitesimal transport cost related to the antisymmetric part of the dynamics, which is an alluring interpretation of irreversibility. Note that the relations between optimal transport and large deviations theory have a fruitful history, see e.g. [58].

It has been known for some time in the physics literature that the empirical density of a diffusion may not contain enough information to describe its fluctuations in an irreversible regime. It is actually more relevant to consider the fluctuations of both the empirical density and current, a procedure sometimes called level 2.5 large deviations [24, 7]. This framework can be used to provide a clear description of the rate function of irreversible dynamics. As shown in [7], such large deviations results can be derived by Krein–Rutman arguments like those used in the present paper. Therefore, we believe that our results can be extended to prove level 2.5 large deviations principles and characterize precisely the class of admissible currents.

Finally, it is important to understand the behaviour of observables which are not covered by our analysis. It has been recently shown [90] in the case of the Ornstein–Uhlenbeck process that observables growing too fast at infinity with respect to the confinement are characterized by a heavy tail behaviour. This leads to a level 1 large deviations principle at an anomalous speed with a localization in time of the fluctuation, and the Krein–Rutman strategy developed in the present paper does not apply. We therefore believe there are several interesting open questions in this direction.

Acknowledgments

The authors warmly thank Hugo Touchette for reading an early version of the manuscript as well as the first preprint, and providing useful comments; as well as the referees, whose suggestions helped us make various aspects of this work more precise. The authors are grateful to Ofer Zeitouni for an interesting discussion about scalings in large deviations theory, as well as to Jianfeng Lu for pointing out the work [15]. We also thank Julien Reygner for general discussions on large deviations. The PhD of Grégoire Ferré was supported by the Labex Bézout ANR-10-LABX-58-01. The work of Gabriel Stoltz was funded in part by the Agence Nationale de la Recherche, under grant ANR-14-CE23-0012 (COSMOS), and by the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement number 614492. We also benefited from the scientific environment of the Laboratoire International Associé between the Centre National de la Recherche Scientifique and the University of Illinois at Urbana-Champaign.

6 Proofs

In all the proofs below, for conciseness, we write $\mathbb{E}_{x},\mathbb{P}_{x}$ , etc, with some abuse of notation, to indicate that the expectations we consider are taken with respect to all realizations of the dynamics (9) started from $X_{0}=x$ ; and do not indicate explicitly the dependence of $X_{t}$ on $x$ , in contrast to the convention used in Section 2.

6.1 Proof of the large deviations principle

As mentioned after Theorem 1, our proof relies on the Gärtner–Ellis theorem [28], for which we need several preliminary results. The key object is the functional

[TABLE]

Roughly speaking, the Gärtner–Ellis theorem (Theorem 4 in Appendix A) states that if this functional is finite and Gateaux-differentiable over $B^{\infty}_{\kappa}(\mathcal{X})$ and $(L_{t})_{t\geqslant 0}$ defined in (25) is exponentially tight for the $\tau^{\kappa}$-topology, then $(L_{t})_{t\geqslant 0}$ satisfies a LDP in the dual space of $B^{\infty}_{\kappa}(\mathcal{X})$. A reminder of this theorem and some elements of analysis are given in Appendix A.

However, studying the range of functions $f$ for which the functional $\lambda$ is finite and Gateaux-differentiable is not an easy task. Formally, our strategy is to prove that $r(f)$, the element of the spectrum of the operator $\mathcal{L}+f$ with the largest modulus, is a real eigenvalue for any function $f\in B^{\infty}_{\kappa}(\mathcal{X})$, and to show that it is actually equal to the cumulant function $\lambda(f)$ defined in (26). This amounts to showing the well-posedness and regularity of a family of spectral problems. For this, we use several ideas from [47], which shows that under Lyapunov and irreducibility conditions, the eigenvalue problem to which $\lambda$ is associated is well defined. In order to avoid technical difficulties related to unbounded operators, we study the semigroup $(P_{t}^{f})_{t\geqslant 0}$ rather than its generator $\mathcal{L}+f$, see Remark 9 below for more details. The seminal paper by Gärtner [55, Section 3] provides useful technical tools, as well as [44, 115].

In all of this section, we suppose that Assumptions 1, 2 and 3 hold true and consider a function $\kappa:\mathcal{X}\to[1,+\infty)$ of class $\mathscr{S}$ as in Assumption 3, i.e. such that $\kappa\ll\Psi$ and either $\kappa$ is bounded or has compact level sets and satisfies (22). We repeatedly use that $\kappa\ll-\frac{\mathcal{L}\mathscr{W}}{\mathscr{W}}$ in view of (21). We start with important properties of key martingales that appear regularly in the proofs of the required technical results.

Lemma 2.

If $(X_{t})_{t\geqslant 0}$ is a solution to (9), then the stochastic processes defined by

[TABLE]

are continuous non-negative local martingales, hence supermartingales. Moreover, it holds almost surely

[TABLE]

where $C_{1}>0$ and $C_{2}\in\mathbb{R}$ are the constants from Assumption 3.

Proof.

First, Itô’s formula gives

[TABLE]

Since $W$ is $C^{2}(\mathcal{X})$ and $\sigma$ is continuous, $M_{t}$ is a continuous local martingale [71]. Since it is non-negative, it is a supermartingale by Fatou’s lemma, and the same conclusion holds for $\mathscr{M}_{t}$ . On the other hand, (21) shows that

[TABLE]

which concludes the proof. ∎

The use of the martingale $M_{t}$ is inspired by [115] where it is considered to control return times to compact sets. Here, it allows us to define the Feynman–Kac semigroup associated with the dynamics $(X_{t})_{t\geqslant 0}$ with weight function $f\in B^{\infty}_{\kappa}(\mathcal{X})$.

Lemma 3.

Fix $f\in B^{\infty}_{\kappa}(\mathcal{X})$ . For any $t\geqslant 0$ , the Feynman–Kac operator

[TABLE]

is well defined. Moreover, $(P_{t}^{f})_{t\geqslant 0}$ is a semigroup of bounded operators on $B^{\infty}_{W}(\mathcal{X})$ . Finally, for any $t>0$ and any $a>0$ , there exist $c_{a,t}\geqslant 0$ and a compact subset $K_{a,t}\subset\mathcal{X}$ such that

[TABLE]

Proof.

We first show that for any $f\in B^{\infty}_{\kappa}(\mathcal{X})$ , $(P_{t}^{f})_{t\geqslant 0}$ is a semigroup of bounded operators on $B^{\infty}_{W}(\mathcal{X})$ , before turning to the proof of (73). For a fixed $f\in B^{\infty}_{\kappa}(\mathcal{X})$ , since $\kappa\ll\Psi$ , there exists $c>0$ such that, for any $t>0$ ,

[TABLE]

Using Lemma 2, the supermartingale property leads to

[TABLE]

Therefore, for all $\varphi\in B^{\infty}_{W}(\mathcal{X})$ ,

[TABLE]

and hence

[TABLE]

As a result $(P_{t}^{f})_{t\geqslant 0}$ is a semigroup of bounded operators over $B^{\infty}_{W}(\mathcal{X})$ .

We next prove (73) for a fixed $f\in B^{\infty}_{\kappa}(\mathcal{X})$ , which we assume non-zero without loss of generality. Note that

[TABLE]

Since $\Psi$ has compact level sets and $\kappa\ll\Psi$ , for any $a>0$ there exists a compact set $K_{a}\subset\mathcal{X}$ and a constant $b_{0,a}$ such that

[TABLE]

where $\alpha>0$ is a constant to be chosen later on. This implies that

[TABLE]

with $b_{a}=b_{0,a}\sup_{K_{a}}W<+\infty$ since $W\in C^{2}(\mathcal{X})$ . Therefore (by some standard approximation arguments relying on stopping times, as discussed for instance in [97])

[TABLE]

We can now bound the right hand side of the above equation with a technique similar to the one used in [47, Section 2.3]. Indeed, for any $x\in\mathcal{X}$ ,

[TABLE]

Since $\kappa\ll-\frac{\mathcal{L}\mathscr{W}}{\mathscr{W}}$ , there exists a constant $c\geqslant 0$ depending on $f$ such that

[TABLE]

Plugging this estimate into (75) and using that $\mathscr{W}\geqslant 1$ leads to

[TABLE]

where the last bound is due to Lemma 2.

Using this estimate to bound the right hand side of (74), we end up with

[TABLE]

Integrating with respect to time leads to

[TABLE]

Since $\mathscr{W}\ll W$ , there exists a compact set $K_{a,t}\subset\mathcal{X}$ such that $\tilde{b}_{a}\mathscr{W}\leqslant\mathrm{e}^{-(a+\alpha)t}W$ outside $K_{a,t}$ , so that we have

[TABLE]

We can now assume that we chose from the beginning $\alpha>\log(2)/t$ (recall that $t$ is fixed). Setting $c_{a,t}=\tilde{b}_{a}\sup_{K_{a,t}}\mathscr{W}$, this leads to

[TABLE]

which proves (73). ∎

Lemma 3 is crucial for obtaining the compactness of the evolution operator $P_{t}^{f}$, as noted in [47] (a result inspired by [97, Theorem 8.9]). Note however that $(P_{t}^{f})_{t\geqslant 0}$ is a priori not a strongly continuous semigroup on $B^{\infty}_{W}(\mathcal{X})$, see the discussion in [114, Proposition B13] and Remark 9 below for more details.

Another key ingredient is the regularization property of the evolution. The following bound on the Feynman–Kac semigroup depending on the weight function $f$ is one element in this direction.

Lemma 4.

Suppose that Assumptions 1, 2 and 3 hold true, and fix $f,g\in B^{\infty}_{\kappa}(\mathcal{X})$ . Then, for any $t>0$ , any $\varphi\in B^{\infty}_{W}(\mathcal{X})$ and any $x\in\mathcal{X}$ , it holds

[TABLE]

Proof.

Using the inequality $|\mathrm{e}^{a}-\mathrm{e}^{b}|\leqslant|a-b|\,\mathrm{e}^{|a|+|b|}$ for $a,b\in\mathbb{R}$ , we have, for $x\in\mathcal{X}$ ,

[TABLE]

which is the desired conclusion. ∎
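The elementary inequality used at the start of this proof follows from the mean value theorem. A short verification, included here for completeness:

```latex
\mathrm{e}^{a}-\mathrm{e}^{b}=(a-b)\,\mathrm{e}^{c}
\quad\text{for some } c \text{ between } a \text{ and } b,
\qquad\text{so that}\qquad
\big|\mathrm{e}^{a}-\mathrm{e}^{b}\big|
\leqslant |a-b|\,\mathrm{e}^{\max(|a|,|b|)}
\leqslant |a-b|\,\mathrm{e}^{|a|+|b|}.
```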

We can now use Lemma 4 to show an important regularization property of the Feynman–Kac semigroup.

Lemma 5.

For any $f\in B^{\infty}_{\kappa}(\mathcal{X})$ , $\varphi\in B^{\infty}_{W}(\mathcal{X})$ , any $t>0$ and any compact $K\subset\mathcal{X}$ , the function $P_{t}^{f}(\varphi\mathds{1}_{K})$ is continuous.

Let us insist on the fact that the statement of Lemma 5 is a consequence of Hörmander’s theorem [43, Theorem 4.1] when $f$ has polynomial growth and is smooth. However, the result is more difficult to obtain when $f$ is irregular. Note for instance that we cannot rely on the continuity property proved in Section 6.2.3 below since the space of smooth functions with compact support is not dense in $B^{\infty}_{W}(\mathcal{X})$ . The idea of the proof is to use the local martingales introduced in Lemma 2 to show that the regularization property of Hörmander’s theorem is preserved when $f$ is irregular but does not grow too fast.

Proof.

We use Assumption 1 to revisit [55, pages 34-35] in an unbounded setting and with a hypoelliptic flavour. First, we note that for $f\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$ , the result is a direct application of Assumption 1 combined with Hörmander’s theorem, since the evolution operator $P_{t}^{f}$ can be shown to be an integral operator with a transition probability which admits a density $p^{f}(t,x,y)$ belonging to $C^{\infty}((0,+\infty)\times\mathcal{X}\times\mathcal{X})$ (see for instance [69] for $f=0$ , which can easily be extended to $f\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$ with the hypoelliptic result of [43, Theorem 4.1]). In particular, $P_{t}^{f}(\varphi\mathds{1}_{K})$ is continuous.

We now use an approximation argument inspired by [55, Section 3] for a generic function $f\in B^{\infty}_{\kappa}(\mathcal{X})$ . Consider a sequence $(f_{n})_{n\in\mathbb{N}}$ of functions belonging to $C_{\mathrm{c}}^{\infty}(\mathcal{X})$ with $\|f_{n}\|_{B^{\infty}_{\kappa}}\leqslant\|f\|_{B^{\infty}_{\kappa}}$ for any $n\in\mathbb{N}$ , and such that $f_{n}\to f$ almost everywhere as $n\to+\infty$ (such a sequence exists by Lusin’s theorem, see [102, Chapter 2]). By modifying the proof of Lemma 4, and since $\|f_{n}\|_{B^{\infty}_{\kappa}}\leqslant\|f\|_{B^{\infty}_{\kappa}}$ , we have for any $\varphi\in B^{\infty}_{W}(\mathcal{X})$ , $n\in\mathbb{N}$ and $x\in\mathcal{X}$ ,

[TABLE]

with $\delta=2\|f\|_{B^{\infty}_{\kappa}}$ .

Our goal is now to show that $P_{t}^{f_{n}}(\varphi\mathds{1}_{K})$ converges uniformly over any compact $K^{\prime}$ to $P_{t}^{f}(\varphi\mathds{1}_{K})$, by proving that the right hand side of (77) goes uniformly to $0$ over $K^{\prime}$. This will conclude the proof since a uniform limit of continuous functions is continuous.

We introduce to this end the events

[TABLE]

and fix a compact set $K^{\prime}\subset\mathcal{X}$ . The right hand side of (77) can then be split into two terms

[TABLE]

for which we show convergence to $0$, uniformly for $x\in K^{\prime}$, starting with $(A)$. Since $\kappa\ll-\mathcal{L}\mathscr{W}/\mathscr{W}$, there exists $c>0$ such that

[TABLE]

Moreover, $\|f_{n}\|_{B^{\infty}_{\kappa}}\leqslant\|f\|_{B^{\infty}_{\kappa}}$ , $a\leqslant\mathrm{e}^{a}$ , and $\mathscr{W}\geqslant 1$ , so that

[TABLE]

By definition of $\mathscr{M}_{t}$ in (70) we have

[TABLE]

The Cauchy–Schwarz inequality then shows that

[TABLE]

By (71) it holds $\sqrt{\mathbb{E}_{x}[\mathscr{M}_{t}^{2}]}\leqslant\sqrt{C_{1}}\,\mathrm{e}^{C_{2}t/2}\sqrt{W(x)}$ . Next, by Tchebychev’s inequality and since $W\geqslant 1$ ,

[TABLE]

As a result, we obtain, for $x\in K^{\prime}$ ,

[TABLE]

Therefore, for any $\varepsilon>0$ , we can choose $m\geqslant 0$ such that $(A)\leqslant\varepsilon$ .

Let us now control $(B)$ , introducing $g_{n}=|f-f_{n}|$ . Since $\kappa\ll\Psi$ , it holds for some $c^{\prime}\geqslant 0$ ,

[TABLE]

Using the definition (78) we have

[TABLE]

where $B_{R}$ is the ball of center $0$ and radius $R>0$ to be chosen. Let us first bound $(B^{\prime})$, which retains only the parts of the trajectories performing excursions out of $B_{R}$. Using $\kappa\ll\Psi$, for $\varepsilon>0$ and $m\geqslant 0$ as fixed above, there exist $R>0$, $C_{R}>0$ such that

[TABLE]

We fix $R>0$ and $C_{R}>0$ such that the above inequality holds true. Using again $g_{n}\leqslant 2\|f\|_{B^{\infty}_{\kappa}}\kappa$ , we are led to

[TABLE]

where the last line follows from the definition (78) of $\mathscr{E}_{m}$ . Therefore, once $m$ is fixed, there exists $R>0$ such that for any $n\geqslant 1$ and $x\in K^{\prime}$ , it holds $(B^{\prime})\leqslant\varepsilon$ . It remains to control $(B^{\prime\prime})$ in order to obtain the uniform convergence to zero of (77) over $K^{\prime}$ as $n\to+\infty$ . In fact,

[TABLE]

where $(P_{s})_{s\geqslant 0}$ is the evolution semigroup defined in (15). Since $(\mathds{1}_{B_{R}}g_{n})_{n\geqslant 1}$ is a sequence of bounded functions converging almost everywhere to zero and the transition kernel $P_{s}$ has a smooth density for $s>0$ , it follows that $(P_{s}(g_{n}\mathds{1}_{B_{R}}))_{n\geqslant 1}$ goes uniformly to zero over compact sets for any $s>0$ as $n\to+\infty$ , see e.g. [55, 97]. Moreover, it can be shown that

[TABLE]

which goes to zero when $\eta\to 0$ , uniformly in $x\in K^{\prime}$ and $n\in\mathbb{N}$ . Therefore, for $\varepsilon>0$ , $R>0$ and $m\geqslant 0$ fixed as above, and choosing

[TABLE]

there exists $n^{\prime}\in\mathbb{N}$ such that for all $n\geqslant n^{\prime}$ and $x\in K^{\prime}$ ,

[TABLE]

Then, for any $n\geqslant n^{\prime}$ , $x\in K^{\prime}$ , it holds

[TABLE]

Let us summarize the various approximations: for any $\varepsilon>0$ , we first fix $m\geqslant 0$ so that $(A)\leqslant\varepsilon$ . Then, we choose $R>0$ large enough so that $(B^{\prime})\leqslant\varepsilon$ . Finally, we take $\eta$ small enough and $n$ large enough in (79) so that $(B^{\prime\prime})\leqslant\varepsilon$ for $n\geqslant n^{\prime}$ . As a result, for any $\varepsilon>0$ there is $n^{\prime}\geqslant 0$ such that for $n\geqslant n^{\prime}$ and $x\in K^{\prime}$ , it holds $(A)+(B)\leqslant 3\varepsilon$ .

In conclusion, the right hand side of (77) goes to zero uniformly as $n\to+\infty$ over any compact set $K^{\prime}$ . Therefore $P_{t}^{f_{n}}(\varphi\mathds{1}_{K})$ is continuous and converges uniformly over $K^{\prime}$ to $P_{t}^{f}(\varphi\mathds{1}_{K})$ , which is therefore continuous over $K^{\prime}$ . Since the compact $K^{\prime}\subset\mathcal{X}$ is arbitrary, $P_{t}^{f}(\varphi\mathds{1}_{K})$ is continuous over $\mathcal{X}$ , which concludes the proof. ∎

Before presenting the main result concerning the spectral properties of the operator $P_{t}^{f}$ and its consequences on the definition of the cumulant function $\lambda(f)$ , we need the following “irreducibility” lemma, which relies on Assumption 2.

Lemma 6.

For any time $t>0$ , $x\in\mathcal{X}$ and any Borel set $A\subset\mathcal{X}$ with non-empty interior, it holds that

[TABLE]

Proof.

Take $x\in\mathcal{X}$ and $y\in\mathring{A}$ (which is possible since $A$ has non-empty interior). By Assumption 2, there exists a $C^{1}$ -path $(\phi_{s})_{s\in[0,t]}$ solving (19) such that $\phi_{0}=x$ and $\phi_{t}=y$ . We can then use the proof of the Stroock–Varadhan support theorem, see [97, Theorem 6.1] for an overview. In particular, Assumption 2 implies that [103, Eq. (5.5)] is satisfied. Therefore, [103, Eq. (5.3)] ensures that, for any $\varepsilon>0$ ,

[TABLE]

Moreover, since $\phi_{t}=y\in\mathring{A}$ , we may assume upon reducing $\varepsilon>0$ that $B(y,\varepsilon)\subset A$ , where $B(y,\varepsilon)$ denotes the ball of center $y$ and radius $\varepsilon$ . Recalling that $f\in B^{\infty}_{\kappa}(\mathcal{X})$ , we then obtain

[TABLE]

where we denote by $S_{\phi,\varepsilon}$ the $\varepsilon$ -tube around the path $(\phi_{s})_{s\in[0,t]}$ , namely

[TABLE]

Since $S_{\phi,\varepsilon}$ is a bounded set and $\kappa$ is continuous over $\mathcal{X}$ , it holds

[TABLE]

The combination of (81) and (82) leads to the desired result (80). ∎

At this stage, we follow the spectral analysis path developed in [47]. However, we have to prove that the assumptions used in [47] are fulfilled in our context. In particular the irreducibility is granted by Lemma 6.

Lemma 7.

For any $f\in B^{\infty}_{\kappa}(\mathcal{X})$ and any $t>0$ , the operator $P_{t}^{f}$ considered over $B^{\infty}_{W}(\mathcal{X})$ has a real largest eigenvalue $\mathrm{e}^{tr(f)}$ with eigenspace of dimension one, and an associated continuous eigenvector $h_{f}\in B^{\infty}_{W}(\mathcal{X})$ such that $h_{f}(x)>0$ for any $x\in\mathcal{X}$ . Moreover, $h_{f}$ is the only positive eigenvector of $P_{t}^{f}$ (up to multiplication by a positive constant). Finally, $r(f)$ is equal to the cumulant function defined in (26):

[TABLE]

The result of Lemma 7 is twofold: it entails the well-posedness of the principal eigenproblem associated with $P_{t}^{f}$ for any $f\in B^{\infty}_{\kappa}(\mathcal{X})$ and $t>0$ , and then identifies this principal eigenvalue with the free energy function (26). Another consequence of this lemma is that $h_{f}$ is in fact the principal eigenvector of $\mathcal{L}+f$ , see Lemma 10 below for a more precise statement.
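The content of Lemma 7 can be illustrated on a finite-state analogue, which is not part of the original argument. In this sketch the generator $\mathcal{L}$ is replaced by a hypothetical $4\times 4$ rate matrix `L`, the function $f$ by a vector `f`, and $P_{t}^{f}$ by the matrix exponential of `L + diag(f)`; the Perron–Frobenius theorem then plays the role of the Krein–Rutman theorem. All numerical values are arbitrary assumptions. The code computes the principal eigenpair and compares $r(f)$ with $\frac{1}{t}\log(P_{t}^{f}\mathds{1})(x)$ for a large time.

```python
import numpy as np

# Hypothetical irreducible 4-state generator: off-diagonal rates >= 0, rows sum to 0.
L = np.array([
    [-1.0, 0.7, 0.3, 0.0],
    [0.2, -0.9, 0.5, 0.2],
    [0.0, 0.4, -1.0, 0.6],
    [0.5, 0.0, 0.5, -1.0],
])
f = np.array([0.3, -1.0, 0.8, 0.1])   # arbitrary bounded observable (assumption)
A = L + np.diag(f)                     # finite-state analogue of the tilted generator

# Principal eigenpair: eigenvalue with largest real part and its eigenvector.
w, V = np.linalg.eig(A)
k = np.argmax(w.real)
r = w[k].real
h = V[:, k].real
h = h / h[np.argmax(np.abs(h))]        # normalize so that the largest entry is 1

def feynman_kac_of_one(t):
    """Apply exp(t * A) to the constant function 1 (A is assumed diagonalizable)."""
    expm = (V @ np.diag(np.exp(t * w)) @ np.linalg.inv(V)).real
    return expm @ np.ones(len(f))

t = 200.0
cumulant_estimate = np.log(feynman_kac_of_one(t)) / t   # one value per initial state

print("r(f)              =", r)
print("(1/t) log P_t^f 1 =", cumulant_estimate)
print("h_f               =", h)
```

In this toy setting the estimate is independent of the initial state up to an $O(1/t)$ error, which mirrors the identification of $r(f)$ with the cumulant function.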

Proof.

We follow the general strategy of [47] and split the proof into several steps.

Step 1: Compactness of the evolution operator.

We first show that, for given $t>0$ and $f\in B^{\infty}_{\kappa}(\mathcal{X})$ , the operator $P_{t}^{f}$ defined in Lemma 3 is compact when considered on $B^{\infty}_{W}(\mathcal{X})$ . For any compact set $K\subset\mathcal{X}$ we have the decomposition

[TABLE]

We first consider the compact sets $K_{a}$ from (73) for $a>0$ and time $t/3$ (omitting the dependence on $t$ in the notation since the time is fixed here) and note that $\mathds{1}_{K_{a}^{c}}P_{t/3}^{f}$ converges to $0$ in operator norm as $a\to+\infty$ . Indeed, for any $\varphi\in B^{\infty}_{W}(\mathcal{X})$ , (73) leads to

[TABLE]

so that $\left\|\mathds{1}_{K_{a}^{c}}P_{t}^{f}\right\|_{\mathcal{B}(B^{\infty}_{W})}\to 0$ when $a\to+\infty$ .

We next show that $P_{t/3}^{f}\mathds{1}_{K}P_{t/3}^{f}\mathds{1}_{K}$ is compact over $B^{\infty}_{W}(\mathcal{X})$ for any compact set $K\subset\mathcal{X}$ . Consider a sequence $(\varphi_{k})_{k\in\mathbb{N}}$ bounded in $B^{\infty}_{W}(\mathcal{X})$ . Following the first step of the proof of [47, Lemma 2] and using our strong Feller result, Lemma 5, we see that $P_{t/3}^{f}\mathds{1}_{K}$ is a strong Feller operator, so $P_{t/3}^{f}\mathds{1}_{K}P_{t/3}^{f}\mathds{1}_{K}$ is ultra-Feller (see [47, Lemma 6]). This means that the operator $P_{t/3}^{f}\mathds{1}_{K}P_{t/3}^{f}\mathds{1}_{K}$ is continuous in total variation norm, so that the family $(P_{t/3}^{f}\mathds{1}_{K}P_{t/3}^{f}\mathds{1}_{K}\varphi_{k})_{k\in\mathbb{N}}$ is uniformly equicontinuous. We used here that since $\varphi\in B^{\infty}_{W}(\mathcal{X})$ and $W$ is continuous, it holds $\mathds{1}_{K}\varphi\in B^{\infty}(\mathcal{X})$ . The sequence $(P_{t/3}^{f}\mathds{1}_{K}P_{t/3}^{f}\mathds{1}_{K}\varphi_{k})_{k\in\mathbb{N}}$ therefore converges in $B^{\infty}(\mathcal{X})$ up to extraction by the Ascoli theorem [102, Theorem 11.28], and in $B^{\infty}_{W}(\mathcal{X})$ since $W\geqslant 1$ . Therefore, the operator $P_{t/3}^{f}\mathds{1}_{K}P_{t/3}^{f}\mathds{1}_{K}$ sends a bounded sequence into a convergent one (up to extraction), so it is compact in $B^{\infty}_{W}(\mathcal{X})$ [95]. The decomposition (84) and the bound (85) then show that $P_{t}^{f}$ is the limit in operator norm of the compact operators $P_{t/3}^{f}\mathds{1}_{K_{a}}P_{t/3}^{f}\mathds{1}_{K_{a}}P_{t/3}^{f}$ as $a\to+\infty$ , so it is compact in $B^{\infty}_{W}(\mathcal{X})$ (see e.g. [95, Theorem VI.12]).

Step 2: Existence of the principal eigenvalue.

We can now use the Krein–Rutman theorem on the (closed) total cone $\mathbb{K}_{W}=\{\varphi\in B^{\infty}_{W}\,|\,\varphi\geqslant 0\}$ (see [27, 47] for definitions). For $t>0$ , it is clear that $P_{t}^{f}$ leaves this cone invariant. We next show that $P_{t}^{f}$ has a non-zero spectral radius

[TABLE]

To this end, fix a compact set $K$ with non-empty interior. We have shown in Lemma 6 that

[TABLE]

Since $P_{t}^{f}\mathds{1}_{K}$ is continuous by Lemma 5, this shows that

[TABLE]

Therefore, for any $x\in K$ ,

[TABLE]

so that $\mathds{1}_{K}(x)\left((P_{t}^{f})^{2}\mathds{1}_{K}\right)(x)\geqslant\alpha_{K}^{2}\mathds{1}_{K}(x)$ for $x\in\mathcal{X}$ . Iterating the procedure for any $n\geqslant 1$ we get

[TABLE]

As a result, since $1\leqslant\inf_{K}W<+\infty$ , we obtain in the large $n$ limit the following lower bound for the spectral radius:

[TABLE]

which shows that $R_{t}(f)$ is positive. Since $P_{t}^{f}$ is compact, [27, Theorem 19.2] ensures that $R_{t}(f)$ is a real eigenvalue of $P_{t}^{f}$ with associated eigenvector $h_{f}\in\mathbb{K}_{W}$ (in particular, $h_{f}\geqslant 0$ ). Using the semigroup property of $P_{t}^{f}$ and standard arguments (see [94, Theorem 2.4]), we can show that there exists $r(f)\in\mathbb{R}$ such that $R_{t}(f)=\mathrm{e}^{r(f)t}$ and

$$P_{t}^{f}h_{f}=\mathrm{e}^{r(f)t}h_{f}. \tag{87}$$

Step 3: Properties of $h_{f}$ .

For the remainder of the proof, we write for simplicity $r:=r(f)$ and $h:=h_{f}$ (the function $f$ being fixed). We show here that $h$ is continuous and positive. For any compact $K\subset\mathcal{X}$ and $t>0$ , (87) leads to

[TABLE]

Using Lemma 3 we obtain that, for any $a>0$ , there exists a compact set $K_{a}$ such that

[TABLE]

so that $h$ is continuous as the uniform limit of continuous functions (since $P_{t}^{f}(\mathds{1}_{K_{a}}h)$ is continuous by Lemma 5). Finally, since $h\geqslant 0$ and $h$ is not identically equal to $0$ , there exists $x_{0}\in\mathcal{X}$ such that $h(x_{0})>0$ . Moreover $h$ is continuous, so there is $\varepsilon>0$ for which $h>0$ on $B(x_{0},\varepsilon)$ . By (87) it holds, for any $x\in\mathcal{X}$ ,

[TABLE]

Since $h>0$ on $B(x_{0},\varepsilon)$ and $h$ is continuous, $\inf_{B(x_{0},\varepsilon)}h>0$ . Moreover $(P_{t}^{f}\mathds{1}_{B(x_{0},\varepsilon)})(x)>0$ for any $x\in\mathcal{X}$ by Lemma 6, so the previous lower bound shows that $h(x)>0$ for all $x\in\mathcal{X}$ .

Step 4: Properties of eigenspaces and eigenfunctions.

We now show that the eigenspace associated with $h$ is of dimension one, and that any other eigenvector vanishes somewhere in $\mathcal{X}$ . For this, we introduce the so-called $h$ -transform [76, 101, 23, 47]. A key element here is the fact that $h(x)>0$ for all $x\in\mathcal{X}$ , which allows us to define the following Markov operator, for an arbitrary time $t>0$ :

$$Q_{h}=\mathrm{e}^{-rt}\,h^{-1}P_{t}^{f}h, \tag{88}$$

where $h$ and $h^{-1}$ refer here to the multiplication operators by the functions $h$ and $h^{-1}$ respectively. We now prove that $Q_{h}$ is ergodic by first noting that $Q_{h}$ admits $Wh^{-1}$ as a Lyapunov function (using (73) and the normalization $\|h\|_{B^{\infty}_{W}}=1$ which implies that $Wh^{-1}\geqslant 1$ ). Using Assumption 3, we can also show that $Wh^{-1}$ has compact level sets, see [47, Appendix E] for details.
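To make the $h$-transform concrete, here is a minimal finite-state sketch. It assumes that (88) has the standard form $Q_{h}=\mathrm{e}^{-rt}h^{-1}P_{t}^{f}h$, and it uses a hypothetical $4\times 4$ rate matrix `L` and vector `f` in place of the diffusion generator and the function $f$ (both are illustrative assumptions, not objects from the paper). The point is only to check that the transformed kernel is a genuine Markov kernel: nonnegative entries and unit row sums.

```python
import numpy as np

# Hypothetical irreducible generator and observable (illustrative values only).
L = np.array([
    [-1.0, 0.7, 0.3, 0.0],
    [0.2, -0.9, 0.5, 0.2],
    [0.0, 0.4, -1.0, 0.6],
    [0.5, 0.0, 0.5, -1.0],
])
f = np.array([0.3, -1.0, 0.8, 0.1])
A = L + np.diag(f)

# Principal eigenvalue r and positive eigenvector h of the tilted generator.
w, V = np.linalg.eig(A)
k = np.argmax(w.real)
r, h = w[k].real, V[:, k].real
h = h / h[np.argmax(np.abs(h))]

# Finite-state analogue of P_t^f: the matrix exponential exp(t * A).
t = 0.5
P_t = (V @ np.diag(np.exp(t * w)) @ np.linalg.inv(V)).real

# h-transform: Q_h[i, j] = exp(-r t) * P_t[i, j] * h[j] / h[i].
Q_h = np.exp(-r * t) * (P_t * h[None, :]) / h[:, None]

print("row sums of Q_h :", Q_h.sum(axis=1))
print("min entry of Q_h:", Q_h.min())
```

The unit row sums are exactly the eigenvalue equation $P_{t}^{f}h=\mathrm{e}^{rt}h$ read componentwise, which is why strict positivity of $h$ is needed for the construction.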

Moreover, we can prove that $Q_{h}$ satisfies a minorization condition on any compact set. For this, we first use that $P_{t}^{f}\geqslant P_{t}^{-\|f\|_{B^{\infty}_{\kappa}}\kappa}$ . Then, for any $t>0$ and $\alpha\geqslant 0$ , the operator $P_{t}^{-\alpha\kappa}$ has a smooth transition density by hypoellipticity (because $\kappa$ and the coefficients of $\mathcal{L}$ belong to the class $\mathscr{S}$ , see [43, Theorem 4.1]), which is positive in view of Lemma 6 by an argument similar to the one sketched after Assumption 2 (see for instance the proof of [97, Proposition 8.1]). Therefore, for any $K\subset\mathcal{X}$ compact with non-empty interior, and denoting by $\eta_{K}$ the uniform Lebesgue measure on $K$ , there is $a_{K}>0$ such that, for any measurable set $A\subset\mathcal{X}$ ,

[TABLE]

Since $h$ is continuous, this implies that, for any measurable $\varphi\geqslant 0$ ,

[TABLE]

where both the minimum and maximum above are finite and non-zero (recall that $|K|>0$ is the Lebesgue measure of $K$ ). This shows that $Q_{h}$ satisfies a minorization condition [60] over any compact set.

Therefore, the Markovian dynamics with kernel $Q_{h}$ admits a unique invariant probability measure $\mu_{h}$ , with respect to which it is ergodic in $B^{\infty}_{Wh^{-1}}(\mathcal{X})$ . By this we mean that (in view of [60, Theorem 1.2]) there exist $\bar{\alpha}>0$ and $C>0$ such that for any $\varphi\in B^{\infty}_{Wh^{-1}}(\mathcal{X})$ ,

[TABLE]

and it holds $\mu_{h}(W/h)<+\infty$ .

We can now use this ergodic behaviour to show that the eigenspace associated with $r$ has dimension one and that $P_{t}^{f}$ cannot have another positive eigenvector with norm $1$ in $B^{\infty}_{W}(\mathcal{X})$ . Indeed, if there were another eigenvector $\tilde{h}\in B^{\infty}_{W}(\mathcal{X})$ associated with $r$ , then the fact that $\tilde{h}/h\in B^{\infty}_{Wh^{-1}}(\mathcal{X})$ together with (89) ensure that

[TABLE]

This shows that $h$ and $\tilde{h}$ would be proportional, which proves the claim that the eigenspace associated with $r$ has dimension $1$ . Assume now that there is another real eigenvalue $\tilde{r}<r$ with real eigenvector $\tilde{h}\in B^{\infty}_{W}(\mathcal{X})$ such that $\tilde{h}(x)>0$ for all $x\in\mathcal{X}$ . Noting again that $\tilde{h}/h\in B^{\infty}_{Wh^{-1}}(\mathcal{X})$ and since $\tilde{h}>0$ , (89) shows that, for any $x\in\mathcal{X}$ ,

[TABLE]

However it now holds, for any $x\in\mathcal{X}$ ,

[TABLE]

where we used that $h>0$ and $\tilde{r}<r$ . Combining the two equations above shows that

[TABLE]

which contradicts (90). As a result, there cannot be another eigenvalue with a positive eigenvector.

Step 5: The principal eigenvalue is the cumulant function.

Proving (83) now follows by a simple rewriting. For $x\in\mathcal{X}$ and $t_{0}>0$ fixed, it holds, for any $n\in\mathbb{N}^{*}$ ,

[TABLE]

so that

[TABLE]

By (89) (since $h^{-1}\in B^{\infty}_{Wh^{-1}}(\mathcal{X})$ ), we see that $\left(h(Q_{h})^{n}h^{-1}\right)(x)$ converges to $\mu_{h}(h^{-1})h(x)$ (with $x$ fixed), so that

[TABLE]

We have chosen to work with an arbitrary time $t_{0}>0$ for convenience, so a priori the above limit depends on $t_{0}$ . To conclude the proof, it remains to show that the limit actually does not depend on the specific choice of $t_{0}$ and that

[TABLE]

This extension from $t_{0}>0$ fixed to any $t>0$ follows by standard arguments not reproduced here (see e.g. [64, 47]). ∎
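The rewriting of Step 5 can be checked numerically in a finite-state analogue. This is a sketch under stated assumptions: a hypothetical rate matrix `L` and vector `f`, the $h$-transform in the standard form $Q_{h}=\mathrm{e}^{-rt_{0}}h^{-1}P_{t_{0}}^{f}h$, and an invariant measure $\mu_{h}$ proportional to the product of the left and right principal eigenvectors. The code verifies that $\mathrm{e}^{-rnt_{0}}\big((P_{t_{0}}^{f})^{n}\mathds{1}\big)(x)$ approaches $\mu_{h}(h^{-1})h(x)$.

```python
import numpy as np

# Hypothetical irreducible generator and observable (illustrative values only).
L = np.array([
    [-1.0, 0.7, 0.3, 0.0],
    [0.2, -0.9, 0.5, 0.2],
    [0.0, 0.4, -1.0, 0.6],
    [0.5, 0.0, 0.5, -1.0],
])
f = np.array([0.3, -1.0, 0.8, 0.1])
A = L + np.diag(f)

# Right principal eigenpair (r, h) and left principal eigenvector ell.
w, V = np.linalg.eig(A)
k = np.argmax(w.real)
r, h = w[k].real, V[:, k].real
h = h / h[np.argmax(np.abs(h))]
wl, U = np.linalg.eig(A.T)
ell = U[:, np.argmax(wl.real)].real
ell = ell / ell[np.argmax(np.abs(ell))]

# Invariant probability measure of the h-transformed kernel.
mu_h = ell * h / np.sum(ell * h)
limit = np.sum(mu_h / h) * h            # mu_h(h^{-1}) * h(x), one value per x

# exp(-r n t0) * (P_{t0}^f)^n applied to the constant function 1.
t0, n = 1.0, 200
P_t0 = (V @ np.diag(np.exp(t0 * w)) @ np.linalg.inv(V)).real
scaled = np.exp(-r * t0 * n) * (np.linalg.matrix_power(P_t0, n) @ np.ones(4))

print("scaled iterate:", scaled)
print("predicted limit:", limit)
```

Taking the logarithm and dividing by $nt_{0}$ then gives $r$ in the limit, since the prefactor $\mu_{h}(h^{-1})h(x)$ is positive and does not depend on $n$.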

An important ingredient for the lower bound of the LDP is the Gâteaux-differentiability of the cumulant functional, which we prove below.

Lemma 8.

The functional

[TABLE]

is convex and Gâteaux-differentiable.

Proof.

The convexity of $\lambda$ is a standard consequence of Hölder’s inequality. Concerning Gâteaux-differentiability, we follow the strategy of [55, Section 3] for a compact state space, relying on results of Kato [72]. For this, we interpret the cumulant function (91) as the largest eigenvalue of the tilted generator, $r(f)$ , as shown in Lemma 7. More precisely, for $f,g\in B^{\infty}_{\kappa}(\mathcal{X})$ and $\alpha\in\mathbb{R}$ , $\lambda(f+\alpha g)$ is associated with the largest eigenvalue of the operator $P_{t}^{f+\alpha g}$ in $B^{\infty}_{W}(\mathcal{X})$ through

[TABLE]

so that differentiability in $\alpha$ can be shown through the differentiability of the spectrum of a bounded operator. We thus show that the operator-valued function $\alpha\mapsto P_{t}^{f+\alpha g}$ is differentiable in operator norm.

To this end, we fix $C>0$ , and prove that for $|\alpha|\leqslant C$ , there exists $K\in\mathbb{R}_{+}$ such that

[TABLE]

where

[TABLE]

Note that the operator $Q_{t}^{f,g}$ is bounded on $B^{\infty}_{W}(\mathcal{X})$ by the same martingale estimate used to prove Lemma 4. In order to prove (92), we use the identity

[TABLE]

to obtain, for any $\varphi\in B^{\infty}_{W}(\mathcal{X})$ and $x\in\mathcal{X}$ ,

[TABLE]

where we used the inequality $z^{2}/2\leqslant\mathrm{e}^{z}$ for $z\geqslant 0$ in the last line. By manipulations similar to the ones used to prove Lemma 4, we can bound the latter expectation by $\mathrm{e}^{ct}W(x)$ for some constant $c>0$ , which leads to (92) with $K=\mathrm{e}^{ct}$ .

Equation (92) shows that $\alpha\mapsto P_{t}^{f+\alpha g}$ is differentiable in operator norm, and that

[TABLE]

Thus, the principal eigenvalue $\lambda(f+\alpha g)$ , which is always isolated, is differentiable, see [72, Chapter II, Theorem 5.4] and [72, Chapter IV, Theorem 3.5]. This concludes the proof of Gâteaux-differentiability. ∎
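In finite dimension, the differentiability of an isolated simple eigenvalue comes with an explicit first-order perturbation formula, which gives a quick numerical check of the mechanism behind Lemma 8. The sketch below is an illustration on a hypothetical $4$-state rate matrix `L` with arbitrary vectors `f` and `g` (assumptions, not the paper's setting). It compares a centered finite difference of $\alpha\mapsto\lambda(f+\alpha g)$ at $\alpha=0$ with the ratio $\sum_{i}\ell_{i}g_{i}h_{i}/\sum_{i}\ell_{i}h_{i}$, where $\ell$ and $h$ are the left and right principal eigenvectors, and it also tests convexity through a second difference.

```python
import numpy as np

# Hypothetical irreducible generator and directions f, g (illustrative values only).
L = np.array([
    [-1.0, 0.7, 0.3, 0.0],
    [0.2, -0.9, 0.5, 0.2],
    [0.0, 0.4, -1.0, 0.6],
    [0.5, 0.0, 0.5, -1.0],
])
f = np.array([0.3, -1.0, 0.8, 0.1])
g = np.array([1.0, 0.5, -0.5, 2.0])

def principal(A):
    """Principal eigenvalue of A with right and left eigenvectors."""
    w, V = np.linalg.eig(A)
    wl, U = np.linalg.eig(A.T)
    k, kl = np.argmax(w.real), np.argmax(wl.real)
    return w[k].real, V[:, k].real, U[:, kl].real

def lam(alpha):
    """Finite-state analogue of lambda(f + alpha * g)."""
    return principal(L + np.diag(f + alpha * g))[0]

# First-order perturbation formula; the ratio does not depend on the
# normalization or sign of the eigenvectors.
r, h, ell = principal(L + np.diag(f))
perturbative = np.sum(ell * g * h) / np.sum(ell * h)

eps = 1e-5
finite_difference = (lam(eps) - lam(-eps)) / (2 * eps)
second_difference = lam(1.0) - 2.0 * lam(0.0) + lam(-1.0)

print("perturbation formula:", perturbative)
print("finite difference   :", finite_difference)
print("second difference   :", second_difference)
```

The ratio can be read as the average of $g$ under the normalized measure with weights $\ell_{i}h_{i}$, the finite-state counterpart of the invariant measure of the $h$-transformed dynamics.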

Remark 8.

By pursuing further the Taylor expansion (92) in the proof of Lemma 8, we can actually show that, for any $f,g\in B^{\infty}_{\kappa}(\mathcal{X})$ , the function

$$\alpha\in\mathbb{R}\mapsto\lambda(f+\alpha g)$$

is analytic (this analyticity was already proven in [76] using a different argument that can be simplified with our tools). This relies on the simple inequality $a^{n}/n!\leqslant\mathrm{e}^{a}$ for any $a\geqslant 0$ , together with the series expansion of the exponential and martingale estimates as in the proof of Lemma 8. Indeed, our proof, based on martingales, shows that for any $t>0$ , the function

[TABLE]

is analytic. Moreover, it is finite on $\mathbb{R}$ and converges pointwise to a finite valued function as $t\to+\infty$ , as shown in Lemma 7. Therefore, the convergence holds uniformly on any compact as $t\to+\infty$ (see [45, Theorem VI.3.3]). Since a locally uniform limit of analytic functions is analytic (see [102, Theorem 10.28]), the function $\alpha\mapsto\lambda(f+\alpha g)$ is analytic.

The last step before proving the large deviations principle itself is an exponential tightness result, see [28, Section 1.2]. At this stage, the finiteness of $\lambda(f)$ together with the Gâteaux-differentiability of $f\in B^{\infty}_{\kappa}(\mathcal{X})\mapsto\lambda(f)$ already provides the upper bound over compact sets and the lower bound in (28). In order to extend the upper bound to all closed sets, we prove exponential tightness in the $\tau^{\kappa}$ -topology, see Appendix A for some definitions (this exponential tightness is not explicitly stated in [76]).

Lemma 9.

The family of probability measures $t\mapsto\mathbb{P}_{x}(L_{t}\in\cdot\,)$ over $\mathcal{P}(\mathcal{X})$ is exponentially tight in the $\tau^{\kappa}$ -topology.

Proof.

We adapt the strategy of [115, Corollary 2.3] and [111, Section 2.2] by introducing the family of sets

[TABLE]

For $N>0$ , the sets $\Gamma_{N}$ are subsets of $\mathcal{P}_{\kappa}(\mathcal{X})$ since $\kappa\ll\Psi$ . We show that they are actually precompact in the $\tau^{\kappa}$ -topology.

Let us first show that $\Gamma_{N}$ is precompact in the usual weak topology for any $N>0$ . Consider for this the compact sets $K_{\beta}=\{x\in\mathcal{X}\,|\,\Psi(x)\leqslant\beta\}\subset\mathcal{X}$ for $\beta>0$ (recall that $\Psi$ has compact level sets). Then, for any $\nu\in\Gamma_{N}$ , we have

[TABLE]

This shows that for any $\beta>0$ and any $\nu\in\Gamma_{N}$ ,

[TABLE]

hence (upon choosing $\beta$ sufficiently large) for any $N>0$ the family of measures $\Gamma_{N}$ is tight, so it is precompact for the weak topology by the Prohorov theorem [12]. Now, if $\kappa$ is bounded, $\Gamma_{N}$ is tight for the $\tau^{\kappa}$ -topology and the lemma is proved, so we may assume that $\kappa$ has compact level sets (see Assumption 3). To prove precompactness in our weighted topology, we show that $\kappa$ is uniformly integrable over $\Gamma_{N}$ in order to use [110, Theorem 7.12]. Since $\kappa\ll\Psi$ , the set

[TABLE]

is compact for any $n\geqslant 1$ . Moreover, since we assume $\kappa$ to be continuous with compact level sets, for any $n\geqslant 1$ there exists $m_{n}\geqslant n$ such that

[TABLE]

with $m_{n}\to+\infty$ when $n\to+\infty$ . Therefore, for any $\nu\in\Gamma_{N}$ and $n\geqslant 1$ ,

[TABLE]

Taking the supremum over $\nu\in\Gamma_{N}$ in the above equation and recalling that $m_{n}\to+\infty$ when $n\to+\infty$ we obtain

[TABLE]

We can then conclude that $\Gamma_{N}$ is precompact for the $\tau^{\kappa}$ -topology. Consider indeed a sequence $(\nu_{n})_{n\in\mathbb{N}}\subset\Gamma_{N}$ . By Prohorov’s theorem, $(\nu_{n})_{n\in\mathbb{N}}$ has a subsequence weakly converging towards a measure $\nu$ , i.e. $\nu_{n}(\varphi)\to\nu(\varphi)$ for any $\varphi\in C_{\mathrm{b}}(\mathcal{X})$ . Then, by [110, Theorem 7.12], (93) ensures that $\nu\in\mathcal{P}_{\kappa}(\mathcal{X})$ and for any $f\in B^{\infty}_{\kappa}(\mathcal{X})$ , $\nu_{n}(f)\to\nu(f)$ as $n\to+\infty$ . In other words, $\Gamma_{N}$ is precompact for the $\tau^{\kappa}$ -topology.

We can now prove the $\tau^{\kappa}$ -exponential tightness of the empirical distribution $(L_{t})_{t\geqslant 0}$ in $\mathcal{P}(\mathcal{X})$ . Indeed, for any $N,t>0$ , Chebyshev’s inequality leads to

[TABLE]

Renormalizing at log scale leads to

[TABLE]

The right-hand side of the above inequality may look infinite since $\Psi$ grows faster than $\kappa$ . However, using again the martingale $M_{t}$ defined in Lemma 2 we obtain, for any $t>0$ ,

[TABLE]

Thus it holds

[TABLE]

As a result, (94) becomes

[TABLE]

Since $\Gamma_{N}$ is precompact in the $\tau^{\kappa}$ -topology for any $N>0$ , and $N$ can be chosen arbitrarily large, this proves the exponential tightness of the family of empirical distributions in the $\tau^{\kappa}$ -topology. ∎

We are now in a position to prove Theorem 1.

Proof of Theorem 1.

The previous lemmas make it possible to apply the Gärtner–Ellis theorem (recalled in Appendix A). The function $\Lambda$ in Theorem 4 of Appendix A is the cumulant function

[TABLE]

The topological dual of $(\mathcal{M}_{\kappa}(\mathcal{X}),\tau^{\kappa})$ is $B^{\infty}_{\kappa}(\mathcal{X})$ , where $\mathcal{M}_{\kappa}(\mathcal{X})$ is the set of measures over $\mathcal{X}$ integrating $\kappa$ (see [102, 76] and [30, Lemma 3.3.8] for details). We have proved that $\lambda$ is well defined, Gâteaux-differentiable, and that the family of measures

[TABLE]

is exponentially tight in the $\tau^{\kappa}$ -topology. Therefore, $(\pi_{t})_{t\geqslant 0}$ satisfies a large deviations principle in the $\tau^{\kappa}$ -topology with good rate function given by

$$I(\nu)=\sup_{f\in B^{\infty}_{\kappa}(\mathcal{X})}\big\{\nu(f)-\lambda(f)\big\}. \tag{95}$$

Note first that $I(\nu)\geqslant 0$ . We next observe that $I(\nu)=+\infty$ if $\nu$ is not normalized to 1 (take $f$ to be constant in the supremum (95)), so we may consider $I$ over $\mathcal{P}(\mathcal{X})$ . Moreover, choosing $f=\kappa$ in (95) and noting that $\lambda(\kappa)<+\infty$ by Lemma 7, we get $I(\nu)=+\infty$ if $\nu\notin\mathcal{P}_{\kappa}(\mathcal{X})$ . If $\nu$ is not absolutely continuous with respect to $\mu$ , there exists a measurable set $A\subset\mathcal{X}$ such that $\mu(A)=0$ and $\nu(A)>0$ . Since $\mu$ has a positive density with respect to the Lebesgue measure, this means that $A$ has zero Lebesgue measure. Consider then $f_{a}=a\mathds{1}_{A}\in B^{\infty}_{\kappa}(\mathcal{X})$ for $a\in\mathbb{R}$ . Since $A$ has zero Lebesgue measure and $(X_{t})_{t\geqslant 0}$ has a smooth density for all $t>0$ (as a consequence of Assumption 1) it holds, for all $t>0$ ,

[TABLE]

Therefore, the process

[TABLE]

satisfies $\mathbb{E}_{x}[Z_{t}]=0$ for all $t>0$ . Since $Z_{t}\geqslant 0$ , it holds $Z_{t}=0$ almost surely, for any $t>0$ . As a consequence we obtain

[TABLE]

This shows that $\lambda(f_{a})=0$ , so that from (95) we obtain

$$I(\nu)\geqslant\nu(f_{a})-\lambda(f_{a})=a\,\nu(A),$$

with $\nu(A)>0$ . By letting $a\to+\infty$ we are led to $I(\nu)=+\infty$ .

Finally, we show that $I(\nu)=0$ if and only if $\nu=\mu$ , and that $(L_{t_{n}})_{n\geqslant 0}$ converges almost surely to $\mu$ in the $\tau^{\kappa}$ -topology for any sequence $(t_{n})_{n\geqslant 0}$ such that $t_{n}/\log(n)\to+\infty$ (see [28, Appendix B] for the definition of this almost-sure convergence). Define

$$\mathscr{I}=\big\{\nu\in\mathcal{P}(\mathcal{X})\,\big|\,I(\nu)=0\big\}.$$

Since $I$ has compact level sets (because it is a good rate function, see Theorem 4), $\mathscr{I}$ is a non-empty closed subset of $\mathcal{P}(\mathcal{X})$ for the $\tau^{\kappa}$ -topology. Moreover, in order for the LDP upper bound to make sense, it holds $\inf_{\mathcal{P}(\mathcal{X})}\,I=0$ . If $\mathscr{I}_{\delta}$ denotes an open neighborhood of $\mathscr{I}$ , the lower semicontinuity of $I$ implies that

[TABLE]

Therefore, by the large deviations upper bound we have, for any $t\geqslant 0$ ,

[TABLE]

for some constant $C>0$ . Consider now a sequence $(t_{n})_{n\geqslant 1}$ such that $t_{n}/\log(n)\to+\infty$ as $n\to+\infty$ . In particular, there exists $n_{\star}\in\mathbb{N}$ such that $t_{n}\inf_{\mathscr{I}_{\delta}^{c}}I\geqslant 2\log(n)$ for $n\geqslant n_{\star}$ , which implies

[TABLE]

This shows that $(L_{t_{n}})_{n\geqslant 0}$ converges almost surely to $\mathscr{I}$ in the $\tau^{\kappa}$ -topology, by the Borel-Cantelli lemma (and by definition of convergence in a topological space [28, Appendix B]). However, we know by Proposition 2 that the only possible limit for $(L_{t_{n}})_{n\geqslant 0}$ is $\mu$ , hence $\mathscr{I}=\{\mu\}$ and $(L_{t_{n}})_{n\geqslant 0}$ almost surely converges to $\mu$ .

We finally show for completeness that $(L_{t})_{t\geqslant 0}$ almost surely spends a finite Lebesgue time outside $\mathscr{I}_{\delta}$ . For this we introduce the random subset of $\mathbb{R}_{+}$ of times $t\geqslant 0$ for which $L_{t}$ does not belong to $\mathscr{I}_{\delta}$ , namely $T=\{t\geqslant 0\,|\,L_{t}\notin\mathscr{I}_{\delta}\}$ . Since

[TABLE]

we have, by Fubini’s theorem, for any $t>0$ ,

[TABLE]

By using (96) and the dominated convergence theorem, we obtain

[TABLE]

As a result, $|T|<+\infty$ almost surely. This means that, for any neighborhood $\mathscr{I}_{\delta}$ of $\mathscr{I}$ in the $\tau^{\kappa}$ -topology, the empirical measure $(L_{t})_{t\geqslant 0}$ almost surely spends a finite Lebesgue measure time outside $\mathscr{I}_{\delta}$ , and this concludes the proof. ∎

6.2 Proofs of Section 3

We start by providing a preliminary technical result in Section 6.2.1, which shows that the eigenvectors $h_{f}$ considered in Lemma 7 belong to the generalized domain $\mathcal{D}^{+}(\mathcal{L})$ defined in (38). We then turn to the proofs of Proposition 3 (see Section 6.2.2) and Corollary 2 (see Section 6.2.3).

6.2.1 A preliminary technical result

Lemma 10.

Fix $f\in B^{\infty}_{\kappa}(\mathcal{X})$ . The function $h_{f}\in B^{\infty}_{W}(\mathcal{X})$ defined in Lemma 7 belongs to $\mathcal{D}^{+}(\mathcal{L})$ and satisfies

$$\mathcal{L}h_{f}=(\lambda(f)-f)h_{f}. \tag{97}$$

Proof.

We already know by Lemma 7 that $h_{f}\in C^{0}(\mathcal{X})$ and $h_{f}>0$ . It suffices therefore to show that $h_{f}\in\mathcal{D}(\mathcal{L})$ and to obtain the representation (97) for $\mathcal{L}h_{f}$ . We combine to this end elements from [30, Theorem 4.2.25] and [114, Proposition B13].

We start by noting that, since $h_{f}$ is an eigenvector of the operator $P_{t}^{f}$ with eigenvalue $\mathrm{e}^{\lambda(f)t}$ , it holds

[TABLE]

Therefore,

[TABLE]

where the last equality comes from Fubini’s theorem and

[TABLE]

Note that we can indeed apply Fubini’s theorem since there exist $K,c>0$ such that

[TABLE]

and (since we are integrating nonnegative functions)

[TABLE]

where the last expression is finite by manipulations similar to the ones performed in the proof of Lemma 2.

We can next use (98) at initial time $s\in[0,t]$ together with a conditioning argument to write

[TABLE]

This finally shows that (99) becomes

[TABLE]

Since $(\lambda(f)-f)h_{f}$ is in $B^{\infty}_{\kappa W}(\mathcal{X})$ (as the product of functions in $B^{\infty}_{W}(\mathcal{X})$ and $B^{\infty}_{\kappa}(\mathcal{X})$ ) and $(P_{t})_{t\geqslant 0}$ is a semigroup of bounded operators on $B^{\infty}_{\kappa W}(\mathcal{X})$ by (22), it holds

[TABLE]

so that (35) is satisfied. As a result, $h_{f}\in\mathcal{D}(\mathcal{L})$ and $\mathcal{L}h_{f}=(\lambda(f)-f)h_{f}$ in the weak sense defined by (36). ∎

Remark 9.

It is actually possible to make more general statements about the domains of the generators of $(P_{t}^{f})_{t\geqslant 0}$ for $f\in B^{\infty}_{\kappa}(\mathcal{X})$ , similarly to [114, 115]. For this, one considers the (closed) subset of functions $\varphi\in B^{\infty}_{W}(\mathcal{X})$ for which $P_{t}^{f}\varphi\to\varphi$ in $B^{\infty}_{W}(\mathcal{X})$ when $t\to 0$ , see [96, Exercise 1.16]. We can then define a generator $\mathcal{L}_{f}$ with domain $D(\mathcal{L}_{f})$ for this semigroup. By manipulations similar to those of Lemma 10, we can show that $D(\mathcal{L}_{f})\subset\mathcal{D}(\mathcal{L})$ when we define $\mathcal{D}(\mathcal{L})$ as in (36). In this case we obtain the representation $\mathcal{L}_{f}=\mathcal{L}-f$ which could be expected. This procedure allows one to define a common domain for the operators $\mathcal{L}_{f}$ with $f\in B^{\infty}_{\kappa}(\mathcal{X})$ .

Here we bypass the approach sketched above because, for the proof of Proposition 3 given below, we can restrict our attention to the eigenvectors $h_{f}$ for $f\in B^{\infty}_{\kappa}(\mathcal{X})$ . In this case, it is clear that $P_{t}^{f}h_{f}\to h_{f}$ in $B^{\infty}_{W}(\mathcal{X})$ when $t\to 0$ , and we have the simple representation formula $\mathcal{L}h_{f}=(\lambda(f)-f)h_{f}$ , which can be seen as a reformulation of the eigenvalue equation $(\mathcal{L}+f)h_{f}=\lambda(f)h_{f}$ .

6.2.2 Proof of Proposition 3

For the proof, which is partly inspired by [30, Lemma 4.1.36], we denote by $I_{\mathrm{F}}$ the rate function given by the Fenchel transform in (27) and by $I_{\mathrm{V}}$ the Varadhan functional on the right-hand side of (37). We repeatedly use the results of Lemmas 7 and 10.

We first show that $I_{\mathrm{V}}(\nu)=+\infty$ if $\nu$ is not absolutely continuous with respect to $\mu$ or does not belong to $\mathcal{P}_{\kappa}(\mathcal{X})$ . Assume first that $\nu\ll\mu$ does not hold: there exists a set $A\subset\mathcal{X}$ such that $\nu(A)>0$ and $\mu(A)=0$ . For any $a\in\mathbb{R}$ we introduce $f_{a}=a\mathds{1}_{A}$ and denote by $h_{a}$ the eigenvector associated with the principal eigenvalue $\mathrm{e}^{t\lambda(f_{a})}$ of $P_{t}^{f_{a}}$ for some $t>0$ . Recall that $h_{a}\in\mathcal{D}^{+}(\mathcal{L})$ by Lemma 10. As shown in the proof of Theorem 1, it holds $\lambda(f_{a})=0$ , so that (97) can be rewritten as

[TABLE]

Therefore,

[TABLE]

By letting $a\to+\infty$ , we conclude that $I_{\mathrm{V}}(\nu)=+\infty$ when $\nu$ is not absolutely continuous with respect to $\mu$ . Next, if $\nu\notin\mathcal{P}_{\kappa}(\mathcal{X})$ , since $\kappa\geqslant 1$ it holds $\nu(\kappa)=+\infty$ . We may then choose $f=\kappa\in B^{\infty}_{\kappa}(\mathcal{X})$ . By Lemma 10, the principal eigenvector $h_{\kappa}$ belongs to $\mathcal{D}^{+}(\mathcal{L})$ with $\lambda(\kappa)<+\infty$ , so we have

[TABLE]

i.e. $I_{\mathrm{V}}(\nu)=+\infty$ if $\nu\notin\mathcal{P}_{\kappa}(\mathcal{X})$ . This shows that $I_{\mathrm{F}}(\nu)=I_{\mathrm{V}}(\nu)$ when $\nu$ is not absolutely continuous with respect to $\mu$ or $\nu\notin\mathcal{P}_{\kappa}(\mathcal{X})$ . We next show that $I_{\mathrm{F}}=I_{\mathrm{V}}$ when $\nu\ll\mu$ and $\nu\in\mathcal{P}_{\kappa}(\mathcal{X})$ , which we assume until the end of the proof.

Let us first show that $I_{\mathrm{F}}\geqslant I_{\mathrm{V}}$ . For this, we consider $u\in\mathcal{D}^{+}(\mathcal{L})$ and introduce

[TABLE]

Because of the definition (38) of $\mathcal{D}^{+}(\mathcal{L})$ , we know that $f_{u}\in B^{\infty}_{\kappa}(\mathcal{X})$ . We can then write, since $\nu\in\mathcal{P}_{\kappa}(\mathcal{X})$ ,

[TABLE]

We now show that $\lambda(f_{u})\leqslant 0$ . By computations similar to the ones in the proof of Lemma 2, and using the continuity of $u\in\mathcal{D}^{+}(\mathcal{L})$ (see also [115, Corollary 2.2]), we obtain by the local martingale property that

[TABLE]

Therefore, recalling the definition (88) of the $h$ -transformed evolution operator with a time $t>0$ fixed (with $r(f_{u})=\lambda(f_{u})$ in view of Lemma 7), and denoting by $h_{u}>0$ the eigenvector associated with $f_{u}$ in Lemma 7, (101) becomes

[TABLE]

where the limit $n\to+\infty$ follows from (89) (noting that $u/h_{u}\in B^{\infty}_{Wh_{u}^{-1}}(\mathcal{X})$ ). The latter limit is positive since $u/h_{u}$ is continuous and positive, which implies that $\lambda(f_{u})\leqslant 0$ . Therefore, (100) leads to

[TABLE]

Since $u\in\mathcal{D}^{+}(\mathcal{L})$ is arbitrary, taking the supremum shows that $I_{\mathrm{F}}(\nu)\geqslant I_{\mathrm{V}}(\nu)$ for any $\nu\in\mathcal{P}_{\kappa}(\mathcal{X})$ with $\nu\ll\mu$ .

We finally turn to the inequality $I_{\mathrm{F}}\leqslant I_{\mathrm{V}}$ . Consider, for an arbitrary $f\in B^{\infty}_{\kappa}(\mathcal{X})$ , the eigenvector $h_{f}\in B^{\infty}_{W}(\mathcal{X})$ defined in Lemma 7. By Lemma 10, this eigenvector belongs to $\mathcal{D}^{+}(\mathcal{L})$ and satisfies $\mathcal{L}h_{f}=(\lambda(f)-f)h_{f}$ . Thus, since $\nu\in\mathcal{P}_{\kappa}(\mathcal{X})$ , we have

[TABLE]

Given that, in the above equation, $f$ is an arbitrary function belonging to $B^{\infty}_{\kappa}(\mathcal{X})$ , taking the supremum leads to

[TABLE]

This finally shows that $I_{\mathrm{F}}(\nu)=I_{\mathrm{V}}(\nu)$ for all $\nu\in\mathcal{P}_{\kappa}(\mathcal{X})$ with $\nu\ll\mu$ and concludes the proof.
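The equality $I_{\mathrm{F}}=I_{\mathrm{V}}$ can be observed by hand on the simplest possible toy model, a two-state jump process, which is an illustration and not the diffusion setting of the paper. The sketch below assumes that (37) has the standard Donsker–Varadhan form $I_{\mathrm{V}}(\nu)=\sup_{u>0}-\nu(\mathcal{L}u/u)$; the rates `a`, `b` and the grids are arbitrary choices. For this model both suprema reduce to one-dimensional optimizations, and a direct computation gives the closed form $\big(\sqrt{a\,\nu_{0}}-\sqrt{b\,\nu_{1}}\big)^{2}$.

```python
import numpy as np

# Hypothetical two-state chain with jump rates a (0 -> 1) and b (1 -> 0).
a, b = 1.0, 2.0
L = np.array([[-a, a], [b, -b]])

def lam(f):
    """Finite-state cumulant function: principal eigenvalue of L + diag(f)."""
    return np.max(np.linalg.eigvals(L + np.diag(f)).real)

def rate_fenchel(nu):
    # sup_f { nu(f) - lambda(f) }. Since lambda(f + c) = lambda(f) + c and nu is a
    # probability vector, it is enough to scan f = (0, s).
    s_grid = np.linspace(-20.0, 20.0, 8001)
    return max(nu[1] * s - lam(np.array([0.0, s])) for s in s_grid)

def rate_varadhan(nu):
    # sup_{u > 0} -nu(L u / u). The ratio is scale invariant, so scan u = (1, x).
    x_grid = np.exp(np.linspace(-5.0, 5.0, 8001))
    values = []
    for x in x_grid:
        u = np.array([1.0, x])
        values.append(-np.sum(nu * (L @ u) / u))
    return max(values)

def rate_closed_form(nu):
    return (np.sqrt(a * nu[0]) - np.sqrt(b * nu[1])) ** 2

nu = np.array([0.2, 0.8])            # a non-invariant probability vector
mu = np.array([b, a]) / (a + b)      # invariant probability vector of L

for name, m in [("nu", nu), ("mu", mu)]:
    print(name, rate_fenchel(m), rate_varadhan(m), rate_closed_form(m))
```

The three values agree up to the grid resolution, and all vanish at the invariant measure, consistent with the characterization of the zero set of the rate function in the proof of Theorem 1.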

6.2.3 Proof of Corollary 2

Since $I$ is the Fenchel transform of $\lambda$ , the result follows if we can show that the map $\lambda$ defined on $B^{\infty}_{\kappa}(\mathcal{X})$ is stable under bi-Fenchel conjugacy. The convexity and finiteness of $\lambda$ show that a (necessary and) sufficient condition for $\lambda$ to be bi-Fenchel stable is for the functional $f\mapsto\lambda(f)$ to be lower-semicontinuous (see [8, Theorem 2.22]). We show below that it is actually continuous: for any sequence $(f_{n})_{n\geqslant 0}$ in $B^{\infty}_{\kappa}(\mathcal{X})$ such that $\|f_{n}-f\|_{B^{\infty}_{\kappa}}\to 0$ for some $f\in B^{\infty}_{\kappa}(\mathcal{X})$ , it holds $\lambda(f_{n})\to\lambda(f)$ as $n\to+\infty$ . We shall use for this a stability result from [22].

Consider a sequence $(f_{n})_{n\geqslant 0}$ converging to $f$ in $B^{\infty}_{\kappa}(\mathcal{X})$ . Using Lemma 4, for any $\varphi\in B^{\infty}_{W}(\mathcal{X})$ , $t>0$ , $x\in\mathcal{X}$ and $n\in\mathbb{N}$ , it holds (using again the inequality $a\leqslant\mathrm{e}^{a}$ for $a\geqslant 0$ )

[TABLE]

for some constant $C>0$ depending on $t>0$ , $\|f\|_{B^{\infty}_{\kappa}}$ and $\sup_{n\geqslant 0}\|f_{n}\|_{B^{\infty}_{\kappa}}$ . We used Lemma 2 and the supermartingale property of $M_{t}$ to obtain the last line. This leads to

[TABLE]

We know by Lemma 7 that $\lambda(f)$ and $\lambda(f_{n})$ are associated with the isolated largest eigenvalue of the operators $P_{t}^{f}$ and $P_{t}^{f_{n}}$ respectively. Therefore, (102) shows that the approximation is strongly stable (we refer to [22], in particular the definitions in Section 2.2 and Proposition 2.11), so [22, Proposition 2.2] ensures that $\lambda(f_{n})\to\lambda(f)$ as $n\to+\infty$ . This shows that the function $\lambda:B^{\infty}_{\kappa}(\mathcal{X})\to\mathbb{R}$ is continuous and concludes the proof.
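For intuition, the continuity of $\lambda$ has an elementary quantitative form in a finite-state analogue with $\kappa\equiv 1$: the principal eigenvalue of `L + diag(f)` is nondecreasing in `f` and shifts by $c$ when a constant $c$ is added, so $|\lambda(f+\delta)-\lambda(f)|\leqslant\|\delta\|_{\infty}$. This is a property of the toy model only, and it does not replace the strongly stable approximation argument of [22] used above. The sketch checks the bound on a hypothetical $4$-state rate matrix with a few seeded random perturbations.

```python
import numpy as np

# Hypothetical irreducible generator and observable (illustrative values only).
L = np.array([
    [-1.0, 0.7, 0.3, 0.0],
    [0.2, -0.9, 0.5, 0.2],
    [0.0, 0.4, -1.0, 0.6],
    [0.5, 0.0, 0.5, -1.0],
])
f = np.array([0.3, -1.0, 0.8, 0.1])

def lam(func):
    """Finite-state cumulant function: principal eigenvalue of L + diag(func)."""
    return np.max(np.linalg.eigvals(L + np.diag(func)).real)

rng = np.random.default_rng(0)       # fixed seed for reproducibility
results = []
for _ in range(5):
    delta = rng.uniform(-0.5, 0.5, size=4)
    change = abs(lam(f + delta) - lam(f))
    results.append((change, np.max(np.abs(delta))))

for change, sup_norm in results:
    print(f"|lambda(f + delta) - lambda(f)| = {change:.4f} <= {sup_norm:.4f}")
```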

6.2.4 Proof of Theorem 2

The proof, inspired by [15], relies on two ideas: performing a Witten transform inside the variational representation (37) and separating the symmetric and antisymmetric parts of the generator $\mathcal{L}$ . We write $d\nu=\rho\,d\mu=\mathrm{e}^{v}\,d\mu$ and assume first that $v\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$ instead of $\mathscr{H}^{1}(\nu)$ . Starting from (37), we consider a function $u$ of the form

[TABLE]

We call this choice a “variational Witten transform” for its similarity with the standard Witten transform [112, 62, 83] and its use in the variational formula (37) satisfied by $I$. Since $u=\mathrm{e}^{\frac{\psi}{2}+\frac{v}{2}}$ with $v,\psi\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$, we claim that $u\in\mathcal{D}^{+}(\mathcal{L})$. Indeed, using the shorthand notation $w=\psi/2+v/2\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$, we have

[TABLE]

Moreover, $u=\mathrm{e}^{w}>0$ and $u$ is constant outside a compact set, so $u\in B^{\infty}_{W}(\mathcal{X})$ and therefore $u\in\mathcal{D}^{+}(\mathcal{L})$.

We now rewrite the expression in (37) for $u$ given by (103), using again the notation $w=\psi/2+v/2$ :

[TABLE]

Recalling that $S=\sigma\sigma^{T}/2$ and expanding $w=\psi/2+v/2$ , we obtain

[TABLE]

We now decompose $\mathcal{L}$ into symmetric and antisymmetric parts. First, it holds

[TABLE]

On the other hand, using that $\mathcal{L}_{\mathrm{A}}$ is a first order differential operator satisfying $\mathcal{L}_{\mathrm{A}}^{*}\mathds{1}=-\mathcal{L}_{\mathrm{A}}\mathds{1}=0$ , we obtain

[TABLE]

As a result

[TABLE]

By plugging (105)-(106) into (104), we obtain

[TABLE]

The first term in the above equation reads (recalling that $\rho=\mathrm{e}^{v}$ )

[TABLE]

By density of $C_{\mathrm{c}}^{\infty}(\mathcal{X})$ in $\mathscr{H}^{1}(\mu)$, the above expression is valid for any $\rho$ such that $\sqrt{\rho}\in\mathscr{H}^{1}(\mu)$. The above computation shows that this condition is equivalent to assuming that $v\in\mathscr{H}^{1}(\nu)$, and

[TABLE]

which does not involve the function $\psi\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$ . Moreover, since $\mathcal{L}_{\mathrm{A}}$ is a first order differential operator, antisymmetric on $L^{2}(\mu)$ , it holds

[TABLE]

As a result, (107) rewrites

[TABLE]

and this expression is finite for any $\psi\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$ .

Our goal is now to take the supremum over functions $\psi\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$ in (108), and prove that this is enough to obtain the supremum over $\mathcal{D}^{+}(\mathcal{L})$ . We consider for this the terms depending on $\psi$ in (108) and, using the duality between $\mathscr{H}^{1}(\nu)$ and $\mathscr{H}^{-1}(\nu)$ (see [75, Section 2, Claim F]) we obtain

[TABLE]

where we used Young’s inequality with $\varepsilon<1$ to obtain the second line. Since $\mathcal{L}_{\mathrm{A}}v\in\mathscr{H}^{-1}(\nu)$, the supremum over the functions $\psi\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$ takes the value $-\infty$ when $\psi\notin\mathscr{H}^{1}(\nu)$. Therefore, by density of $C_{\mathrm{c}}^{\infty}(\mathcal{X})$ in $\mathscr{H}^{1}(\nu)$, the supremum over the functions of the form (103) for $\psi\in C_{\mathrm{c}}^{\infty}(\mathcal{X})$ recovers the supremum over $\mathcal{D}^{+}(\mathcal{L})$ and it holds

[TABLE]

by definition of the $\mathscr{H}^{-1}(\nu)$ -norm in Section 2.1, which concludes the proof.
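The last step identifies a supremum of a quadratic functional with a dual norm. A finite-dimensional analogue, assuming the $\mathscr{H}^{-1}(\nu)$-norm is defined variationally as a dual norm, is the identity $\sup_{\psi}\{2\langle g,\psi\rangle-\langle\psi,A\psi\rangle\}=\langle g,A^{-1}g\rangle$ for a symmetric positive definite matrix $A$. The sketch below checks it for a diagonal $A$; the vector `g` and the entries `diag_a` are hypothetical and only serve the illustration.

```python
def quadratic_objective(g, diag_a, psi):
    """Value of 2 <g, psi> - <psi, A psi> for the diagonal matrix A = diag(diag_a)."""
    return sum(2.0 * gi * p - ai * p * p for gi, ai, p in zip(g, diag_a, psi))

def dual_norm_squared(g, diag_a):
    """Closed form <g, A^{-1} g>, the supremum of the objective above."""
    return sum(gi * gi / ai for gi, ai in zip(g, diag_a))

g = (1.0, -2.0, 0.5)         # hypothetical "source term"
diag_a = (2.0, 1.0, 4.0)     # hypothetical positive diagonal of A

# The maximizer solves the linear equation A psi = g.
psi_star = tuple(gi / ai for gi, ai in zip(g, diag_a))
best = quadratic_objective(g, diag_a, psi_star)

# Any perturbation of the maximizer lowers the objective.
perturbed = quadratic_objective(g, diag_a, tuple(p + 0.1 for p in psi_star))
```

The maximizer solves a linear equation, which in the functional setting becomes an elliptic problem of Poisson type in $\mathscr{H}^{1}(\nu)$.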

Remark 10.

We have proved our result for measures of the form $d\nu=\mathrm{e}^{v}\,d\mu$. Considering more general measures $\nu\ll\mu$ is more delicate because the Radon–Nikodym derivative $\rho=d\nu/d\mu$ may vanish on some region of $\mathcal{X}$, hence the definition of $\mathcal{L}_{\mathrm{A}}(\log\rho)$ is not clear. Given (109), we see that we can make sense of our computations provided $\mathcal{L}_{\mathrm{A}}(\log\rho)$ defines a linear form on $\mathscr{H}^{1}(\nu)$, namely: there exists $C>0$ such that

[TABLE]

We nevertheless find it clearer to work directly with exponential perturbations of the invariant measure $\mu$.

6.2.5 Proof of Corollary 3

The proof follows from the variational formulation of Theorem 2. Indeed, let us rewrite (43) as

[TABLE]

where $\nu$ is fixed and satisfies the assumptions of the theorem, and

[TABLE]

By [75, Section 2, Claim F], we can identify $\mathscr{H}^{-1}(\nu)$ with the dual of $\mathscr{H}^{1}(\nu)$ , so that $\mathcal{I}_{\nu}$ reads

[TABLE]

Denoting by $\widetilde{\nabla}$ the adjoint of the gradient operator in $L^{2}(\nu)$ , standard results of calculus of variations show that the minimum in (111) is attained at a unique $\psi_{v}\in\mathscr{H}^{1}(\nu)$ solution to

[TABLE]

Inserting $\psi_{v}$ solution to (112) in (111) leads to

[TABLE]

which concludes the proof.

Appendix A Tools for large deviations principles

In this section, we recall some concepts from large deviations theory (using the abuse of notation discussed at the beginning of Section 6 for denoting expectations and probabilities). For a Polish space $\mathcal{Y}$, we denote by $\mathcal{Y}^{\prime}$ its topological dual (the set of continuous linear functionals over $\mathcal{Y}$). We first recall the definition of an exponentially tight family of measures. A family of measures $(\pi_{t})_{t\geqslant 0}$ over a Polish space $\mathcal{Y}$ is called exponentially tight if, for any $N<+\infty$, there exists a (pre)compact set $\Gamma_{N}\subset\mathcal{Y}$ such that

[TABLE]

In words, exponential tightness means that the measures $(\pi_{t})_{t\geqslant 0}$ concentrate exponentially fast over compact sets. This property is used in large deviations to turn an upper bound over compact sets into an upper bound over all closed sets.
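As a hedged illustration, consider the toy family $\pi_{t}=\mathcal{N}(0,1/t)$ on $\mathbb{R}$ (not an example from this work), and assume the usual form of the definition, namely $\limsup_{t\to+\infty}\frac{1}{t}\log\pi_{t}(\Gamma_{N}^{c})<-N$. The mass outside $[-R,R]$ is $\mathrm{erfc}(R\sqrt{t/2})$, which decays at exponential rate $R^{2}/2$, so any $R$ with $R^{2}/2>N$ provides a suitable compact set. The names `log_mass_outside` and `radius` are hypothetical.

```python
import math

def log_mass_outside(t, radius):
    """(1/t) * log pi_t(|y| > radius) for the toy family pi_t = N(0, 1/t)."""
    # For Z ~ N(0, 1/t), P(|Z| > r) = erfc(r * sqrt(t / 2)).
    return math.log(math.erfc(radius * math.sqrt(t / 2.0))) / t

radius = 2.0
# Moderate values of t only: erfc underflows to 0.0 for very large arguments.
rates = [log_mass_outside(t, radius) for t in (5.0, 20.0, 50.0)]
```

The values in `rates` stay below $-R^{2}/2=-2$ and approach it as $t$ grows, so the interval $[-2,2]$ can serve as $\Gamma_{N}$ for any level $N<2$ in this example.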

We now define the cumulant function. Consider a family of measures $(\pi_{t})_{t\geqslant 0}$ over a Polish space $\mathcal{Y}$ . The logarithmic moment generating function is defined as in [28, Section 4.5]: for any $t\geqslant 0$ , $f\in\mathcal{Y}^{\prime}$ and a random variable $Z_{t}$ distributed according to $\pi_{t}$ ,

[TABLE]

The scaled cumulant generating function is defined by

[TABLE]

Let us relate this quantity to the objects introduced in Section 2. In our situation, we consider fluctuations of the empirical measure $L_{t}\in\mathcal{M}(\mathcal{X})$ (where $\mathcal{M}(\mathcal{X})$ is the space of measures with finite mass), so $\mathcal{Y}=\mathcal{M}(\mathcal{X})$ and for $\Gamma\subset\mathcal{M}(\mathcal{X})$,

[TABLE]

On the other hand, $f$ belongs to a space of functions, typically $\mathcal{Y}^{\prime}=\mathcal{M}(\mathcal{X})^{\prime}=B^{\infty}(\mathcal{X})$ when the $\tau$-topology is considered. In practice we may restrict ourselves to probability measures because the rate function is infinite otherwise. Considering $L_{t}\in\mathcal{P}_{\kappa}(\mathcal{X})$ then leads to choosing $f\in B^{\infty}_{\kappa}(\mathcal{X})$. In any case, the duality relation (114) reads

[TABLE]

so that $\bar{\Lambda}_{t}(f)$ coincides with the argument of the limit in (26). With these preliminaries, we are in a position to state the key theorem for the results in this work, which goes back to [55, 45] and is presented for instance in [28, Corollary 4.6.14]. We recall that a rate function is said to be good if its level sets are compact for the considered topology.

Theorem 4 (Projective limit - Gärtner–Ellis).

Let $(\pi_{t})_{t\geqslant 0}$ be an exponentially tight family of probability measures on a Polish space $\mathcal{Y}$ . Assume that

[TABLE]

is finite valued over $\mathcal{Y}^{\prime}$ and Gâteaux-differentiable. Then $(\pi_{t})_{t\geqslant 0}$ satisfies a large deviations principle over $\mathcal{Y}$ with good rate function $\Lambda^{*}$, the Legendre–Fenchel transform of $\Lambda$.
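To make the Legendre–Fenchel transform concrete, here is a minimal one-dimensional sketch under stated assumptions. For the Ornstein–Uhlenbeck generator $\mathcal{L}=-x\,\partial_{x}+\partial_{x}^{2}$ and the observable $f(x)=kx$, the function $h(x)=\mathrm{e}^{kx}$ satisfies $(\mathcal{L}+kx)h=k^{2}h$, which suggests the cumulant function $\lambda(k)=k^{2}$. This toy case is not computed in this work, and the helper names `legendre_fenchel` and `ou_cumulant` are hypothetical. The transform is approximated by a maximum over a finite grid.

```python
def legendre_fenchel(cumulant, a, k_grid):
    """Approximate sup_k { k * a - cumulant(k) } by a maximum over a finite grid."""
    return max(k * a - cumulant(k) for k in k_grid)

def ou_cumulant(k):
    # Toy cumulant function lambda(k) = k^2 (assumption explained in the lead-in).
    return k * k

# Grid on [-5, 5] with step 0.001; it must contain the maximizer k = a / 2.
k_grid = [i * 0.001 for i in range(-5000, 5001)]
rate = {a: legendre_fenchel(ou_cumulant, a, k_grid) for a in (0.0, 1.0, 2.0)}
```

The exact transform is $a^{2}/4$, which the grid values in `rate` reproduce up to discretization error. The grid maximum is only reliable when the maximizer $k=a/2$ lies inside the grid range.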

Appendix B Proof of Proposition 1

The proposition is a consequence of the equality

[TABLE]

Since $|\sigma^{T}\nabla V|$ has compact level sets and $\Psi\sim|\sigma^{T}\nabla V|^{2}$ by (23), $\Psi$ has compact level sets. Since $V$ has compact level sets, for $\varepsilon<\theta/2$ it holds $\mathscr{W}\ll W$ and $\mathscr{W}^{2}\leqslant C_{1}W$ for some constant $C_{1}>0$ . Moreover, outside a compact set, the function

[TABLE]

is bounded above and below since the numerator and denominator are both equivalent to $|\sigma^{T}\nabla V|^{2}$ , so the second condition in (21) holds. Finally,

[TABLE]

Since $\Psi\sim|\sigma^{T}\nabla V|^{2}$, we may choose $\varepsilon$ small enough so as to obtain

[TABLE]

for some constant $C_{2}\in\mathbb{R}$ . This proves the third item of (21).

We finally turn to the proof of (22). For this we compute

[TABLE]

Hence, using that $W(x)=\mathrm{e}^{\theta V(x)}$ , for any $\eta>0$ it holds

[TABLE]

Since $\Psi\sim|\sigma^{T}\nabla V|^{2}$ at infinity and (24) holds, this shows that (22) is satisfied when choosing $\eta>0$ sufficiently small.

Appendix C Proof of Lemma 1

The proof relies on manipulations similar to those of [86]. A simple computation shows that

[TABLE]

For any $\eta>0$ it holds

[TABLE]

As a result, Assumption 5 leads to

[TABLE]

Since $\theta>0$ , it holds

[TABLE]

with

[TABLE]

The claim follows for $\theta\in(0,1)$ by choosing $\eta<2c_{V}/\gamma$ and $\varepsilon>0$ sufficiently small.

Bibliography

[1] S. Asmussen and P. W. Glynn. Stochastic Simulation: Algorithms and Analysis, volume 57 of Stochastic Modelling and Applied Probability. Springer Science & Business Media, 2007.
[2] F. Augeri. On heavy-tail phenomena in some large deviations problems. Comm. Pure Appl. Math., 73(8):1599–1659, 2020.
[3] Y. Baek, Y. Kafri, and V. Lecomte. Dynamical phase transitions in the current distribution of driven diffusive channels. J. Phys. A, 51(10):105001, 2018.
[4] D. Bakry, F. Barthe, P. Cattiaux, and A. Guillin. A simple proof of the Poincaré inequality for a large class of probability measures. Electron. Commun. Probab., 13:60–66, 2008.
[5] D. Bakry, I. Gentil, and M. Ledoux. Analysis and Geometry of Markov Diffusion Operators, volume 348 of Grundlehren der mathematischen Wissenschaften. Springer Science & Business Media, 2013.
[6] V. Bansaye, B. Cloez, P. Gabriel, and A. Marguet. A non-conservative Harris’ ergodic theorem. arXiv:1903.03946, 2019.
[7] A. C. Barato and R. Chetrite. A formal view on level 2.5 large deviations and fluctuation relations. J. Stat. Phys., 160(5):1154–1172, 2015.
[8] V. Barbu and T. Precupanu. Convexity and Optimization in Banach Spaces, volume 10 of Mathematics and its Applications. Springer Science & Business Media, 2012.
