Large deviations of empirical measures of diffusions in weighted topologies
Grégoire Ferré, Gabriel Stoltz

TL;DR
This paper establishes large deviation principles for empirical measures of diffusion processes, linking spectral gap conditions to a generalized Cramér condition, and analyzes the Donsker-Varadhan rate functional in various dynamics.
Contribution
It introduces new conditions for large deviations of empirical measures in diffusion processes, extending classical results to unbounded functions and degenerate diffusions.
Findings
Large deviations principle (LDP) for unbounded functions in diffusion processes.
Spectral gap condition related to a generalized Cramér condition.
Application of results to Langevin dynamics in unbounded spaces.
Large deviations of empirical measures of diffusions in weighted topologies
Grégoire Ferré and Gabriel Stoltz
Université Paris-Est, CERMICS (ENPC), Inria, F-77455 Marne-la-Vallée, France
Abstract
We consider large deviations of empirical measures of diffusion processes. In a first part, we present conditions to obtain a large deviations principle (LDP) for a precise class of unbounded functions. This provides an analogue to the standard Cramér condition in the context of diffusion processes, which turns out to be related to a spectral gap condition for a Witten–Schrödinger operator. Secondly, we study more precisely the properties of the Donsker–Varadhan rate functional associated with the LDP. We revisit and generalize some standard duality results as well as a more original decomposition of the rate functional with respect to the symmetric and antisymmetric parts of the dynamics. Finally, we apply our results to overdamped and underdamped Langevin dynamics, showing the applicability of our framework for degenerate diffusions in unbounded configuration spaces.
1 Introduction
Empirical averages of diffusion processes and their convergence are commonly studied in statistical mechanics, probability theory and machine learning. In statistical physics, an observable averaged along the trajectory of a diffusion typically converges to the expectation with respect to its stationary distribution, which provides some macroscopic information on the system [74, 84]. For reversible dynamics, this convergence is known to be characterized by an entropy functional [106, 7], which generalizes results for small fluctuations such as the central limit theorem [75] or Berry–Esseen type inequalities [91]. It has been shown for some time that the approach can be extended to nonequilibrium systems by considering generalized entropy and free energy functionals, as provided by the theory of large deviations [28, 45, 106]. From a more computational perspective, studying the convergence of empirical averages is an important problem for the efficiency of Markov chain Monte Carlo methods [1, 100, 98, 36].
Since its initiation by Cramér in the 30s [25, 108], large deviations theory has been given many extensions. The theory takes its origin in the study of fluctuations for sums of independent variables, leading to the celebrated Sanov theorem [29]. Interestingly, the necessity of Cramér’s exponential moment condition for Sanov’s theorem to hold in a Wasserstein topology has been proved only recently [111].
Due to the above mentioned applications, it is natural to try to apply such a theory to diffusions, or more generally Markovian dynamics. This is useful for instance in statistical physics, when considering Gallavotti–Cohen fluctuation relations for irreversible dynamics [52, 79, 78], as well as for characterizing dynamical phase transitions in physical systems [54, 3, 89, 92]. From a more computational perspective, studying the rate function associated with a given dynamics is interesting for designing better sampling strategies [40, 98, 99], which is important for instance in a Bayesian framework [19, 14] or for molecular dynamics [82, 83]. The approach can also be used for deriving concentration results such as Bernstein-type inequalities [53, 13] and uncertainty quantification bounds [73, 57].
However, proving a large deviations principle for correlated processes turns out to be a difficult task. A milestone in the theory is the series of papers by Donsker and Varadhan [31, 32, 34, 35] and the dual approach followed by Gärtner and Ellis [55, 44]. The strategy of the former works is to build explicitly lower and upper large deviations bounds from the Girsanov theorem and the Tchebychev inequality [109]. On the other hand, the Gärtner–Ellis theorem relies on the existence and regularity of a free energy functional. This technique has been later related to optimal control problems through the so-called weak convergence approach [38, 39].
Whichever strategy is chosen, proving large deviations principles for empirical measures of diffusions in unbounded configuration spaces remains difficult. Indeed, studying the stability of unbounded Markov processes is already challenging, and often relies on Lyapunov function techniques [87, 86, 97, 60]. Such a Lyapunov function can be interpreted as an energy associated with the system, which decreases on average and provides a control on the excursions of the process far away from the origin. This technique can be used for proving LDPs, see for instance [109, Section 9] and [30, 115, 39]. However, the LDPs of the above mentioned works are stated in the so-called strong (resp. weak) topology, i.e. with respect to the topology on measures associated with the convergence of measurable bounded (resp. continuous bounded) functions. To the best of our knowledge, convergence in Wasserstein-like topologies (i.e. associated with unbounded functions) for diffusions has only been addressed in [76] and [115, Section 2.2]. Unfortunately, the nonlinear approach of [76] does not allow one to characterize precisely the set of functions for which the LDP holds, while [115] considers a particular system (Langevin dynamics). In both cases, the rate function is not related to the standard Donsker–Varadhan theory [33]. Our first result is to derive the LDP in a weak topology associated with unbounded functions, under very natural conditions, and to express the rate function in duality with a free energy. From a practical point of view, this allows one to compute the rate function from the free energy, a standard procedure [56, 106, 23, 88, 48].
Once a large deviations principle has been derived, providing alternative expressions of the rate function is an important problem. This can be useful for computing this function more efficiently, or for interpreting some key aspects of the dynamics (such as irreversibility for physical systems). Our first contribution in this direction is to derive a variational representation of the rate function similar to the Donsker–Varadhan formula [33]. This provides a variational representation of the principal eigenvalue for any non-symmetric linear second order differential operator associated with a diffusion, under confinement and regularity conditions. To the best of our knowledge, there is no such formula in an unbounded setting, a fortiori for unbounded functions. Finally, it has been shown in a pioneering work [15], for a specific choice of dynamics, that the above mentioned duality allows one to decompose the rate function into two parts: one corresponding to a “reversible” part and the other to an “irreversible” part of the dynamics. We extend these results to general diffusions by using Sobolev seminorms, a feature inspired by the small fluctuations framework developed in [75]. This decomposition turns out to be useful for various purposes. For illustration, we apply it to study more precisely the rate function of the Langevin dynamics, in particular its dependence on the friction both in the Hamiltonian and overdamped limits.
We now sketch the main results of the paper, the precise setting being presented in Section 2.1.
Main results.
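Consider a diffusion process $(X_t)_{t\geq 0}$ over a state space $\mathcal{X}$ with generator $\mathcal{L}$, invariant probability measure $\mu$, and empirical measure
$$L_t := \frac{1}{t}\int_0^t \delta_{X_s}\,ds, \tag{1}$$
where $\delta_x$ is the Dirac measure at $x\in\mathcal{X}$.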
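As a minimal numerical sketch of an empirical measure tested against an observable, the following simulates a one-dimensional diffusion with an Euler–Maruyama scheme and returns the time average of an observable along the path. The dynamics $dX_t = -V'(X_t)\,dt + \sqrt{2}\,dB_t$ with $V(x) = x^2/2$, the time step, the seed and the function name `empirical_average` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def empirical_average(f, grad_V, x0=0.0, dt=0.01, n_steps=200_000, seed=0):
    """Approximate (1/t) * int_0^t f(X_s) ds along one Euler-Maruyama path
    of dX = -grad_V(X) dt + sqrt(2) dB (an assumed, illustrative dynamics)."""
    rng = random.Random(seed)
    x = x0
    total = 0.0
    noise_scale = math.sqrt(2.0 * dt)
    for _ in range(n_steps):
        total += f(x) * dt                      # Riemann sum of f along the path
        x += -grad_V(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
    return total / (n_steps * dt)


# Assumed example: V(x) = x^2 / 2, so the invariant measure is a standard Gaussian
# and the time average of f(x) = x^2 should be close to 1 for long trajectories.
avg = empirical_average(lambda x: x * x, lambda x: x)
print(avg)
```

For long trajectories the printed value should be close to the stationary expectation of the observable; large deviations theory quantifies how unlikely it is to observe a value far from it.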
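Our first contribution is to prove a large deviations principle for the empirical measure $(L_t)_{t\geq 0}$ in a weak topology associated with an unbounded function $\kappa:\mathcal{X}\to[1,+\infty)$. That is, we prove the following type of long time scaling: for $\Gamma\subset\mathcal{P}(\mathcal{X})$,
$$\mathbb{P}\left(L_t\in\Gamma\right) \asymp \exp\left(-t\inf_{\nu\in\Gamma} I(\nu)\right), \tag{2}$$
where $I$ is a rate function. Here, $\mathcal{P}(\mathcal{X})$ denotes the set of probability measures on $\mathcal{X}$, and the above scaling holds for the weak topology on $\mathcal{P}(\mathcal{X})$ associated with measurable functions $f$ satisfying
$$\|f\|_{B^\infty_\kappa} := \sup_{x\in\mathcal{X}}\frac{|f(x)|}{\kappa(x)} < +\infty. \tag{3}$$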
As is standard for LDPs on unbounded state spaces [109, 115], our result relies on the existence of a twice differentiable Lyapunov function $W:\mathcal{X}\to[1,+\infty)$ such that
$$\Psi := -\frac{\mathcal{L}W}{W} \tag{4}$$
has compact level sets (in other words, it goes to infinity at infinity). Unlike previous works, where this condition implies the asymptotic equivalence (2) in the weak topology corresponding to the convergence of measures tested against bounded test functions [109, 39, 115], we show in Section 2 that the LDP holds for the weak topology associated with any cost function $\kappa$ controlled by $\Psi$ (see Section 2.1 for details). Moreover, the associated rate function $I$, also called entropy, reads
$$I(\nu) = \sup_{f}\left\{\int_{\mathcal{X}} f\,d\nu - \lambda(f)\right\},$$
where the supremum runs over the functions $f$ satisfying (3), and where
$$\lambda(f) = \lim_{t\to+\infty}\frac{1}{t}\log\mathbb{E}\left[\mathrm{e}^{\int_0^t f(X_s)\,ds}\right] \tag{5}$$
is the cumulant or free energy function.
We mention that our strategy relies on the Gärtner–Ellis theorem, according to which the existence and regularity of (5) implies the large deviations principle. We actually show that (5) is well-defined because it matches the principal eigenvalue of the Feynman–Kac operator
$$\left(P_t^f\varphi\right)(x) = \mathbb{E}_x\left[\varphi(X_t)\,\mathrm{e}^{\int_0^t f(X_s)\,ds}\right]. \tag{6}$$
A key remark for defining the above operator is that the process
$$M_t = W(X_t)\,\exp\left(-\int_0^t \frac{\mathcal{L}W}{W}(X_s)\,ds\right) \tag{7}$$
is a local martingale, as noted by Wu in [115]. This allows one to define (6) for functions $\varphi$ such that $\sup_{\mathcal{X}}|\varphi|/W<+\infty$, as soon as $f$ is dominated by the function $\Psi$ defined in (4). As a result, for any such $f$, the operator (6) can be shown to be compact over the space of functions controlled by $W$ (see [55, 47]), and the functional (5) is obtained as the largest eigenvalue of the operator (6) through a generalized Perron–Frobenius theorem (the Krein–Rutman theorem [27]).
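To make the link between the free energy and a principal eigenvalue concrete, here is a short formal check on an example that is not taken from the text: a one-dimensional Ornstein–Uhlenbeck process with a linear observable. The choice of dynamics, the observable $f(x)=\theta x$ and the trial function $u_\theta$ are assumptions made for illustration only.

```latex
% Illustrative assumption: dX_t = -X_t\,dt + \sqrt{2}\,dB_t,
% with generator \mathcal{L} = -x\,\partial_x + \partial_x^2,
% observable f(x) = \theta x for a fixed \theta \in \mathbb{R},
% and positive trial function u_\theta(x) = \mathrm{e}^{\theta x}.
\begin{align*}
\mathcal{L}u_\theta(x) &= -\theta x\,\mathrm{e}^{\theta x} + \theta^2\,\mathrm{e}^{\theta x}, \\
\left(\mathcal{L} + \theta x\right)u_\theta(x) &= \theta^2\,u_\theta(x).
\end{align*}
% Hence u_\theta is a positive eigenfunction of \mathcal{L} + f with eigenvalue \theta^2,
% which suggests that the free energy of this observable is \theta^2.
```

The same value is obtained from a Gaussian computation, since the stationary autocovariance of this process is $\mathrm{e}^{-|s|}$ and the asymptotic variance of the time average is therefore $2$.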
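The second part of our work consists in rewriting the rate function $I$. For this, we first show that
$$I(\nu) = \sup\left\{-\int_{\mathcal{X}}\frac{\mathcal{L}u}{u}\,d\nu,\ u\in\mathcal{D}^+(\mathcal{L})\right\}, \tag{8}$$
where $\mathcal{D}^+(\mathcal{L})$ is an appropriate domain defined in Section 3. This formula is similar to the one proved in [33], but differs by additional growth conditions in the definition of $\mathcal{D}^+(\mathcal{L})$. This result leads to a variational formula for the largest eigenvalue $\lambda(f)$ of the operator $\mathcal{L}+f$ defined on a suitable functional space through
$$\lambda(f) = \sup_{\nu}\,\inf_{u\in\mathcal{D}^+(\mathcal{L})}\int_{\mathcal{X}}\left(f + \frac{\mathcal{L}u}{u}\right)d\nu.$$
We mention that the proof of (8) relies on the spectral problem associated with the Feynman–Kac operator (6), and uses tools from the recent work [47].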
Finally, the variational representation (8) allows us to generalize the results of [15] by splitting $I$ into two parts. More specifically, denoting by $\mathcal{L} = \mathcal{L}_S + \mathcal{L}_A$ the decomposition into symmetric and antisymmetric parts of the generator considered on $L^2(\mu)$, we obtain, for any $\nu\in\mathcal{P}(\mathcal{X})$ with $d\nu = \mathrm{e}^{v}\,d\mu$:
$$I(\nu) = \frac{1}{4}|v|^2_{\mathscr{H}^1(\nu)} + \frac{1}{4}|\mathcal{L}_A v|^2_{\mathscr{H}^{-1}(\nu)},$$
where $|\cdot|_{\mathscr{H}^1(\nu)}$ and $|\cdot|_{\mathscr{H}^{-1}(\nu)}$ refer to Sobolev seminorms defined in Section 2.1. Interestingly, the proof relies on a generalized Witten transform performed in the variational representation (8), which we may therefore call variational Witten transform. This shows that, for a given invariant measure, an irreversible dynamics ($\mathcal{L}_A\neq 0$) produces more entropy than a reversible one, in accordance with the second law of thermodynamics. This decomposition is useful for instance to study the entropy production of the Langevin dynamics, which is irreversible but has a particular structure. In this case, there is a natural identification of the effect of the reversible and irreversible parts of the dynamics on fluctuations.
Organization of the work.
The paper is organized as follows. In Section 2 we prove the large deviations principle under Lyapunov and regularity conditions. In Section 3 we rewrite the rate function and give its decomposition into symmetric and antisymmetric parts. Some examples of application are given in Section 4, in particular for overdamped and underdamped Langevin dynamics. Section 5 discusses possible extensions and connections with related works. Finally, most of the proofs are postponed to Section 6.
2 Large deviations principle
2.1 Setting
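This section introduces the main notation used throughout the paper. We consider a diffusion process $(X_t)_{t\geq 0}$ evolving in $\mathcal{X}=\mathbb{R}^d$ with $d\geq 1$, and satisfying the following stochastic differential equation (SDE):
$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t, \tag{9}$$
where $b:\mathcal{X}\to\mathbb{R}^d$, $\sigma:\mathcal{X}\to\mathbb{R}^{d\times m}$ and $(B_t)_{t\geq 0}$ is an $m$-dimensional Brownian motion for some $m\geq 1$.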
Remark 1.
The analysis can easily be extended with appropriate modifications to other spaces such as $\mathcal{X}=\mathbb{T}^d$ or $\mathcal{X}=\mathbb{T}^d\times\mathbb{R}^d$, where $\mathbb{T}^d$ is the $d$-dimensional torus. The last case is motivated by applications to the Langevin equation, where $\mathbb{T}^d$ would be a bounded position space and $\mathbb{R}^d$ the unbounded momentum space (see Section 4.2).
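The generator of the dynamics (9), denoted by $\mathcal{L}$, reads
$$\mathcal{L} = b\cdot\nabla + S:\nabla^2, \qquad S = \frac{\sigma\sigma^T}{2}, \tag{10}$$
where $\sigma^T$ denotes the transpose of the matrix $\sigma$ and $\cdot$ is the scalar product on $\mathbb{R}^d$. Moreover, $\nabla^2$ stands for the Hessian matrix, and for two matrices $A$, $B$ belonging to $\mathbb{R}^{d\times d}$, we write $A:B = \mathrm{Tr}(A^T B)$. The conditions on $b$ and $\sigma$ will be made precise in Section 2.2. The function $S$ takes values in the set of symmetric positive matrices (not necessarily definite). We also introduce the carré du champ operator $\mathscr{C}$ [5] associated with $\mathcal{L}$ defined by, for two regular functions $\varphi$, $\psi$:
$$\mathscr{C}(\varphi,\psi) = \frac{1}{2}\Big(\mathcal{L}(\varphi\psi) - \varphi\,\mathcal{L}\psi - \psi\,\mathcal{L}\varphi\Big) = \nabla\varphi\cdot S\nabla\psi.$$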
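The carré du champ identity can be checked numerically in one dimension. The sketch below assumes the normalization $\frac{1}{2}\big(\mathcal{L}(\varphi\psi) - \varphi\mathcal{L}\psi - \psi\mathcal{L}\varphi\big) = S\,\varphi'\psi'$ for a generator $\mathcal{L} = b\,\partial_x + S\,\partial_x^2$; the drift, diffusion coefficient, test functions and helper names are illustrative choices, and derivatives are approximated by central finite differences.

```python
import math


def generator(g, b, S, x, h=1e-3):
    """Apply L g = b g' + S g'' at the point x using central finite differences."""
    d1 = (g(x + h) - g(x - h)) / (2.0 * h)
    d2 = (g(x + h) - 2.0 * g(x) + g(x - h)) / (h * h)
    return b(x) * d1 + S(x) * d2


def carre_du_champ(phi, psi, b, S, x, h=1e-3):
    """Evaluate (1/2) * (L(phi psi) - phi L psi - psi L phi) at the point x."""
    product = lambda y: phi(y) * psi(y)
    return 0.5 * (generator(product, b, S, x, h)
                  - phi(x) * generator(psi, b, S, x, h)
                  - psi(x) * generator(phi, b, S, x, h))


# Assumed example: drift b(x) = -x and a position-dependent coefficient S >= 0.
b = lambda x: -x
S = lambda x: 1.0 + 0.5 * math.sin(x) ** 2
phi = math.sin
psi = lambda x: x ** 3

x = 0.7
lhs = carre_du_champ(phi, psi, b, S, x)
rhs = S(x) * math.cos(x) * 3.0 * x ** 2   # S * phi' * psi' computed by hand
print(lhs, rhs)
```

The drift cancels in the combination, which is why only the diffusion matrix appears in the closed form.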
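We will use the space $C_c^\infty(\mathcal{X})$ (resp. $C_b(\mathcal{X})$) of smooth functions with compact support (resp. continuous and bounded functions), as well as the space of smooth functions growing at most polynomially and whose derivatives also grow at most polynomially:
$$\mathcal{S} = \left\{\varphi\in C^\infty(\mathcal{X}) \ :\ \forall\,\alpha\in\mathbb{N}^d,\ \exists\,N>0,\ C>0 \text{ such that } |\partial^\alpha\varphi(x)|\leq C\left(1+|x|^2\right)^N \text{ for all } x\in\mathcal{X}\right\},$$
where $\partial^\alpha = \partial_{x_1}^{\alpha_1}\cdots\partial_{x_d}^{\alpha_d}$ with $\alpha=(\alpha_1,\dots,\alpha_d)$.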
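The space of bounded measurable functions, denoted by $B^\infty(\mathcal{X})$, is endowed with the norm
$$\|\varphi\|_{B^\infty} = \sup_{x\in\mathcal{X}}|\varphi(x)|.$$
Moreover, we will need weighted function spaces and the corresponding probability measure spaces, which commonly appear in Markov chain theory [87, 76, 60]. For any measurable function $W:\mathcal{X}\to[1,+\infty)$ we define
$$B^\infty_W(\mathcal{X}) = \left\{\varphi:\mathcal{X}\to\mathbb{R} \text{ measurable} \ :\ \|\varphi\|_{B^\infty_W} := \sup_{x\in\mathcal{X}}\frac{|\varphi(x)|}{W(x)} < +\infty\right\}, \tag{12}$$
and the associated space of probability measures (see [102, Chapter 2] for duality results on measure spaces):
$$\mathcal{P}_W(\mathcal{X}) = \left\{\nu\in\mathcal{P}(\mathcal{X}) \ :\ \int_{\mathcal{X}} W\,d\nu < +\infty\right\}. \tag{13}$$
The associated weighted total variation distance is (see for instance [60]):
$$d_W(\nu,\eta) = \sup_{\|\varphi\|_{B^\infty_W}\leq 1}\int_{\mathcal{X}}\varphi\,(d\nu - d\eta) = \int_{\mathcal{X}} W(x)\,|\nu-\eta|(dx),$$
where $|\nu-\eta|$ denotes the total variation measure associated to $\nu-\eta$, see [102, Chapter 6].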
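For intuition, the weighted total variation distance can be evaluated exactly for measures with finite support. The sketch below assumes the form $\int W\,d|\nu-\eta|$ for the distance, with a weight $W\geq 1$; the dictionary representation of measures and the function name `weighted_tv` are hypothetical choices for illustration.

```python
def weighted_tv(nu, eta, W):
    """Weighted total variation distance between two finitely supported measures,
    given as dicts {point: mass}: sum over points of W(x) * |nu(x) - eta(x)|."""
    support = set(nu) | set(eta)
    return sum(W(x) * abs(nu.get(x, 0.0) - eta.get(x, 0.0)) for x in support)


# Two probability measures on the integers that differ only far from the origin.
nu = {0: 0.5, 1: 0.5}
eta = {0: 0.5, 2: 0.5}

W = lambda x: 1.0 + x * x                           # assumed weight, W >= 1
d_weighted = weighted_tv(nu, eta, W)                # mass moved from 1 to 2, weighted
d_plain = weighted_tv(nu, eta, lambda x: 1.0)       # W = 1: usual total variation
print(d_weighted, d_plain)
```

With a growing weight, discrepancies located far from the origin cost more, which is what makes the associated topology finer than the one induced by bounded test functions.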
Remark 2.
Note that the spaces (12) and (13) are defined for an arbitrary measurable function $W:\mathcal{X}\to[1,+\infty)$. It is possible to weaken the assumption $W\geq 1$ but we will not need these refinements in this paper.
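We denote by $\tau$-topology the weak topology on $\mathcal{P}(\mathcal{X})$ associated with the convergence of measures tested against functions belonging to $B^\infty(\mathcal{X})$ (we may also use the notation $\sigma(\mathcal{P}(\mathcal{X}), B^\infty(\mathcal{X}))$); see [30]. This means that for a sequence $(\nu_n)_{n\in\mathbb{N}}$ in $\mathcal{P}(\mathcal{X})$, $\nu_n\to\nu$ in the $\tau$-topology if $\nu_n(\varphi)\to\nu(\varphi)$ for any $\varphi\in B^\infty(\mathcal{X})$. Recall that the $\tau$-topology is stronger than the usual weak topology on $\mathcal{P}(\mathcal{X})$, which corresponds to the convergence $\nu_n(\varphi)\to\nu(\varphi)$ for any $\varphi\in C_b(\mathcal{X})$. The $\tau$-topology can be extended to account for convergence of measures tested against the larger class of functions $B^\infty_W(\mathcal{X})$. We denote by $\tau^W$ the associated topology $\sigma(\mathcal{P}_W(\mathcal{X}), B^\infty_W(\mathcal{X}))$, see [115, 76].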
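We associate to the dynamics $(X_t)_{t\geq 0}$ started from $x\in\mathcal{X}$ the semigroup $(P_t)_{t\geq 0}$ defined through
$$(P_t\varphi)(x) = \mathbb{E}_x\left[\varphi(X_t)\right],$$
where $\mathbb{E}_x$ stands for the expectation with respect to all realizations of the Brownian motion in (9). Let us mention that, with some abuse of notation but for the sake of readability, we will not write out explicitly the dependence of $\mathbb{E}_x$ on $x$ in the proofs presented in Section 6, see the discussion at the beginning of this section. We say that $\mu\in\mathcal{P}(\mathcal{X})$ is invariant with respect to the dynamics if $\mu(P_t\varphi)=\mu(\varphi)$ for any $\varphi\in B^\infty(\mathcal{X})$ and $t\geq 0$, with the notation
$$\mu(\varphi) = \int_{\mathcal{X}}\varphi\,d\mu.$$
This implies in particular that $\mu(\mathcal{L}\varphi)=0$ for $\varphi\in C_c^\infty(\mathcal{X})$, see [46, Proposition 9.2].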
We now follow the path of [75, Chapter 2] for defining other useful functional spaces. For any probability measure , let
[TABLE]
For , we introduce the seminorm
[TABLE]
and the equivalence relation through: if and only if . We denote by the closure of quotiented by for the norm . Note that and are not subspaces of each other in general, but for instance if satisfies a Poincaré inequality and is positive definite. The difference between and is however important for degenerate dynamics, see the application in Section 4.2. We now construct a space dual to with the same density argument by introducing the seminorm: for ,
[TABLE]
We define similarly the equivalence relation on by if and only if . The space is then the closure of quotiented by . This is actually the dual space of , see [75, Section 2.2, Claim F].
Let us relate to the more standard Sobolev space. If is invariant with respect to then, for , it holds (using that )
[TABLE]
In particular, when we have
[TABLE]
In this case, is the standard Sobolev seminorm [83]. An in-depth discussion on the space and its use for proving central limit theorems for Markov processes is provided in [75, Chapter 2].
Remark 3.
The space has a role comparable to the subspace of functions in with average zero with respect to since (but of course the functions of do not belong to in general). Assume indeed that (so ), (which is not restrictive upon considering ) and . We may choose such that
[TABLE]
and set , so for some constant independent of . The definition of shows that
[TABLE]
By the dominated convergence theorem it holds
[TABLE]
Since , we obtain by letting that .
We also introduce some notation concerning the growth of functions. A function is said to have compact level sets if for any the set
[TABLE]
is compact (with the convention that is compact). A function is said to be negligible with respect to (denoted by ) if has compact level sets, and is said to be equivalent to (denoted by ) if there exist constants and such that
[TABLE]
Remark 4.
The above definitions are useful when the state space is unbounded. A sufficient condition for to have compact level sets in this case is for this function to be lower semicontinuous and to go to infinity at infinity (i.e. to be coercive). If was bounded, all these criteria would be automatically met for smooth functions.
Finally, we denote by and the inferior and superior limits respectively, while for a subset of a topological space , and denote the interior and closure of for the chosen topology on . The function denotes the indicator function of the set , i.e. if and otherwise. For a Banach space , refers to the Banach space of bounded linear operators over with the usual norm. We recall some elements of large deviations theory in Appendix A for the reader’s convenience.
2.2 Statement of the main results
The large deviations principle relies on three standard assumptions: hypoellipticity of the generator, irreducibility of the dynamics, and a Lyapunov condition.
We start with our hypoellipticity assumption (which could certainly be relaxed for particular applications, see for instance [115]). It will be useful for proving regularity of the Feynman–Kac semigroup in Lemma 5. We denote by the adjoint of a (closed) operator considered on .
Assumption 1 (Hypoellipticity).
The functions and in (9) belong to and , respectively, and the generator defined in (10) satisfies the hypoelliptic Hörmander condition. More precisely, can be written as
[TABLE]
where are first order differential operators with coefficients belonging to such that the family
[TABLE]
spans at any for a finite number of commutators .
This assumption is natural in practical situations, as illustrated in the applications of Section 4 covering elliptic and hypoelliptic diffusions, see [65, 43, 97] for details. Note that excluding the operator from the first family means that, if satisfies Assumption 1, is hypoelliptic and the transition kernel of has a smooth density for any .
The regularity requirement comes together with a controllability condition (recall that takes values in ).
Assumption 2 (Controllability).
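For any $x,y\in\mathcal{X}$ and $T>0$, there exists a control $u\in C^0([0,T],\mathbb{R}^m)$ such that the path $\phi$ defined as
$$\phi(0)=x, \qquad \dot{\phi}(t) = b(\phi(t)) + \sigma(\phi(t))\,u(t),$$
is well-defined and satisfies $\phi(T)=y$.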
Assumption 2 together with Assumption 1 implies that the process $(X_t)_{t\geq 0}$ is irreducible, i.e. that the transition density of $X_t$ is everywhere positive (by adapting the argument of [97, Proposition 8.1]), which will be used in Lemma 6. Note that constructing a control may be difficult in general [70]. However, for the overdamped and underdamped Langevin dynamics we are interested in, building such a control turns out to be feasible, see [86, 105, 97, 83, 85] and references therein. Let us mention that the above two assumptions are standard for proving LDPs [109, 115].
A recurrent idea when studying Markov chain stability and large deviations on an unbounded state space is to reduce the analysis to a compact set and to control the excursions of the dynamics out of this set with a Lyapunov function [87, 115]. Our Witten–Lyapunov condition for the dynamics reads as follows (for the terminology, see Remark 6 below).
Assumption 3 (Witten–Lyapunov condition).
There exists a function $W:\mathcal{X}\to[1,+\infty)$ of class $C^2(\mathcal{X})$, with compact level sets and such that
$$\Psi := -\frac{\mathcal{L}W}{W} \tag{20}$$
has compact level sets. Moreover, there exists a function such that, for some constants , ,
[TABLE]
In all that follows, we consider an arbitrary function $\kappa:\mathcal{X}\to[1,+\infty)$ belonging to $\mathcal{S}$ such that:
- $\kappa\ll\Psi$;
- either (i) $\kappa$ is bounded, or (ii) $\kappa$ has compact level sets and there exists a constant such that
[TABLE]
Remark 5.
Note that the condition implies in particular that . In addition, since and , it holds . These facts will be frequently used in the proofs. Moreover the conditions (21) are not restrictive for exponential-like Lyapunov function as shown in Proposition 1 below – the idea being that can be set to . The condition (22) also typically holds because is chosen of exponential type while is a polynomial. In practice, the auxiliary function is used to obtain some control in the proofs of Lemmas 3 and 5 (in particular to apply a Grönwall lemma). Assumption 3 could certainly be phrased differently, possibly with weaker conditions on the functions at stake.
Although we stated Assumption 3 in order to fit standard conditions when considering large deviations on unbounded state spaces [109, 115], in practice it can be obtained from a non-linear Lyapunov condition in the spirit of [76] and [39, Condition 2.2]. This is the purpose of the next proposition, whose proof is postponed to Appendix B.
Proposition 1.
Assume that there exists such that:
- has compact level sets;
- has compact level sets;
- for any ,
[TABLE]
Then Assumption 3 is satisfied with
[TABLE]
for and small enough. In this case it holds
[TABLE]
Moreover, condition (22) holds true for any function of class such that either (i) is bounded or (ii) has compact level sets, satisfies and there exists with
[TABLE]
Note that (23) means that the term coming from the dynamics must compensate the quadratic loss proportional to . We also mention that the condition (24) is not restrictive in general since it is typically satisfied by polynomial-like functions .
A first consequence of Assumptions 1 to 3 is the ergodicity of the dynamics, whatever the initial distribution for .
Proposition 2.
Under Assumptions 1, 2 and 3, (9) has a global strong solution, and the process admits a unique invariant probability measure . This measure has a positive -density with respect to the Lebesgue measure: there exists with such that . Moreover, the dynamics is ergodic with respect to : there exist such that
[TABLE]
Equivalently,
[TABLE]
Proof.
The existence of a unique local strong solution is standard when Assumption 1 holds, see [96, Chapter IX, Exercise 2.10]. Assumption 3 then implies the existence of , such that
[TABLE]
and global existence can be deduced from the above Lyapunov inequality [97]. The end of the proof is a direct application of [97, Theorem 8.9] since Assumption 2 together with Assumption 1 ensures irreducibility. ∎
We can now present the large deviations principle associated with the empirical measure of the process $(X_t)_{t\geq 0}$ with respect to its invariant measure $\mu$. Recall that the empirical measure of the process $(X_t)_{t\geq 0}$ is defined by
$$L_t := \frac{1}{t}\int_0^t \delta_{X_s}\,ds, \tag{25}$$
where $\delta_x$ denotes the Dirac mass at $x\in\mathcal{X}$. When one considers large deviations principles for empirical averages of the form (25), the topology on probability measures has to be specified. As mentioned in the introduction, most of the LDPs are stated in topologies associated with bounded measurable functions (resp. continuous bounded), the so-called strong topology or $\tau$-topology (resp. weak topology). We now prove that, in our setting, an LDP holds in the $\tau^\kappa$-topology defined in Section 2.1, for any function $\kappa$ satisfying Assumption 3. The proof of Theorem 1 is presented in Section 6.1. We recall that a rate function is said to be good if its level sets are compact.
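Theorem 1.
Suppose that Assumptions 1, 2 and 3 hold true, and consider a function $\kappa$ as in Assumption 3 and $X_0=x\in\mathcal{X}$ fixed. Then, the functional
$$f\in B^\infty_\kappa(\mathcal{X}) \mapsto \lambda(f) := \lim_{t\to+\infty}\frac{1}{t}\log\mathbb{E}_x\left[\mathrm{e}^{\int_0^t f(X_s)\,ds}\right] \tag{26}$$
does not depend on $x$, is well-defined, convex and finite, and $(L_t)_{t\geq 0}$ satisfies an LDP in the $\tau^\kappa$-topology with the good rate function defined by:
$$I(\nu) = \sup_{f\in B^\infty_\kappa(\mathcal{X})}\Big\{\nu(f) - \lambda(f)\Big\}. \tag{27}$$
More precisely, for any $\tau^\kappa$-measurable set $\Gamma\subset\mathcal{P}(\mathcal{X})$ and any $x\in\mathcal{X}$, it holds
$$-\inf_{\nu\in\mathring{\Gamma}} I(\nu) \leq \liminf_{t\to+\infty}\frac{1}{t}\log\mathbb{P}_x\left(L_t\in\Gamma\right) \leq \limsup_{t\to+\infty}\frac{1}{t}\log\mathbb{P}_x\left(L_t\in\Gamma\right) \leq -\inf_{\nu\in\overline{\Gamma}} I(\nu), \tag{28}$$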
where the interior and closure of are taken with respect to the -topology. Finally, for any , it holds if and only if ; and, for any sequence such that as , it holds
[TABLE]
almost surely in the -topology.
Our conclusion is in essence close to that of [76], but the conditions to reach it seem more natural to us and correspond to usual conditions for proving large deviations principles in an unbounded state space, see [115, 39] and [109, Section 9]. In particular, they allow us to derive the duality representation (27), and we do not need to consider non-linear operators. Our strategy (presented in Section 6.1) relies on the Gärtner–Ellis theorem [55, 44, 45, 28], for which the existence of the free energy (26) is a key element. The originality of our work is to make use of the local martingale (7) introduced by Wu [115] in order to solve the spectral problem associated with the Feynman–Kac operator, which proves the existence of the limit in (26). This directly provides the LDP in the $\tau^\kappa$-topology by duality. However, there may be cases in which an LDP holds although the conditions of the Gärtner–Ellis theorem are not satisfied, for instance in the framework of the Sanov theorem [111], so our conditions may not be necessary.
Let us also mention that, in addition to (29), we also show for completeness in the proof of Theorem 1 that almost surely spends a time of finite Lebesgue measure outside any -open set around .
Another advantage of our approach is to characterize precisely the set of functions for which an LDP holds from the standard condition on $\Psi$ defined in (20), like in [31, 109]. This condition is also used in [115, Corollary 2.3] for proving a level 1 LDP for Langevin dynamics. We present below a clear connection with a spectral gap condition for the Witten–Schrödinger operator in the reversible case. The comparison with Cramér’s condition for independent variables highlights the effect of correlations on fluctuations.
Remark 6 (Reversible processes, Witten Laplacian and Cramér’s condition).
Consider the following reversible diffusion
$$dX_t = -\nabla V(X_t)\,dt + \sqrt{2}\,dB_t,$$
where $V$ is a smooth potential with compact level sets. The generator of this dynamics is $\mathcal{L} = -\nabla V\cdot\nabla + \Delta$ and its invariant probability measure reads $\mu(dx) = Z^{-1}\mathrm{e}^{-V(x)}\,dx$, where we assume that
$$Z = \int_{\mathcal{X}}\mathrm{e}^{-V(x)}\,dx < +\infty.$$
Define
$$W = \mathrm{e}^{\theta V}$$
for some $\theta\in(0,1)$. This is a standard choice for obtaining compactness of the evolution operator [97, Section 8], and optimal control representations of rate functions [39], see also Proposition 1. An easy computation shows that
$$\Psi = -\frac{\mathcal{L}W}{W} = \theta\Big((1-\theta)|\nabla V|^2 - \Delta V\Big). \tag{30}$$
However, we also know [112] that the generator $\mathcal{L}$ considered on $L^2(\mu)$ is unitarily equivalent to the operator
$$\widetilde{\mathcal{L}} = \mathrm{e}^{-V/2}\,\mathcal{L}\,\mathrm{e}^{V/2} = \Delta - \left(\frac{1}{4}|\nabla V|^2 - \frac{1}{2}\Delta V\right) \tag{31}$$
defined on $L^2(dx)$ (a procedure also called symmetrization [107, Section 4.3]), which is actually the opposite of the Witten Laplacian [112, 62]:
$$-\widetilde{\mathcal{L}} = -\Delta + \frac{1}{4}|\nabla V|^2 - \frac{1}{2}\Delta V.$$
In this case, the condition for (30) to have compact level sets when $\theta=1/2$ is actually equivalent to a confinement condition (or spectral gap condition [63]) for the Witten–Schrödinger operator defined in (31). In that sense, Assumption 3 is a natural generalization of a spectral gap condition for the Witten Laplacian in the case of possibly non-reversible dynamics. This is why we call Assumption 3 a Witten–Lyapunov condition.
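The "easy computation" behind the Witten–Lyapunov potential of this remark can be spelled out as follows. The block assumes the generator $\mathcal{L} = -\nabla V\cdot\nabla + \Delta$ and the exponential Lyapunov function $W = \mathrm{e}^{\theta V}$ with $\theta\in(0,1)$; it is a worked sketch under these assumptions rather than a reproduction of the authors' derivation.

```latex
% Assumptions: \mathcal{L} = -\nabla V\cdot\nabla + \Delta and W = \mathrm{e}^{\theta V}.
\begin{align*}
\nabla W &= \theta\,W\,\nabla V, \\
\Delta W &= \theta\,W\,\Delta V + \theta^2\,W\,|\nabla V|^2, \\
\mathcal{L}W &= -\theta\,W\,|\nabla V|^2 + \theta\,W\,\Delta V + \theta^2\,W\,|\nabla V|^2, \\
-\frac{\mathcal{L}W}{W} &= \theta(1-\theta)\,|\nabla V|^2 - \theta\,\Delta V.
\end{align*}
% For \theta = 1/2 this gives \frac{1}{4}|\nabla V|^2 - \frac{1}{2}\Delta V,
% which is the potential part of the Witten Laplacian
% -\Delta + \frac{1}{4}|\nabla V|^2 - \frac{1}{2}\Delta V.
```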
We now compare this Witten–Lyapunov condition to Cramér’s exponential moment condition in the case of independent variables of law $\mu$. Consider a smooth potential $V$ which behaves as $|x|^q$ for $q>1$ outside a ball centered on the origin. Assumption 3 is thus satisfied by application of Proposition 1. The standard Cramér condition in the case of independent variables $(X_i)_{i\geq 1}$ states that the empirical measure
$$L_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}$$
satisfies a large deviations principle in the $\tau^\kappa$-topology if and only if [111, Theorem 1.1]:
$$\forall\,\theta>0, \qquad \int_{\mathcal{X}}\mathrm{e}^{\theta\kappa}\,d\mu < +\infty.$$
For $V(x)=|x|^q$, a sufficient condition for the above condition to hold is to choose a smooth function $\kappa$ behaving as $|x|^\alpha$ with $0\leq\alpha<q$. On the other hand, the Witten–Lyapunov potential (30) reads in this case
$$\Psi(x) = \theta(1-\theta)\,q^2|x|^{2(q-1)} - \theta\,q(q+d-2)|x|^{q-2},$$
so that we may choose $\kappa$ behaving as $|x|^\alpha$ for $0\leq\alpha<2(q-1)$. When comparing the two conditions, we obtain the following different situations depending on $q$:
- $q>2$ (super-Gaussian case): $2(q-1)>q$, the Witten–Lyapunov condition is less restrictive than Cramér’s condition;
- $q=2$ (Gaussian case): $2(q-1)=q$, the two conditions are equivalent;
- $1<q<2$ (sub-Gaussian case): $2(q-1)<q$, the Witten–Lyapunov condition is more restrictive than Cramér’s condition.
This simple example shows that considering a correlated system instead of independent variables has a non-trivial effect on the stability of the system. Depending on the confinement potential, the Witten–Lyapunov condition for (30) to have compact level sets can be more or less restrictive than Cramér’s condition for independent variables distributed according to the invariant measure . Finally, we remark that for , the process is heavy-tailed in the sense that and the observable (assuming ) does not satisfy a LDP. In other words, the average position of the process defined by
[TABLE]
cannot be shown to satisfy a large deviations principle at speed with our arguments.
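The comparison between the two conditions reduces to comparing two growth exponents. The sketch below assumes a potential growing like $|x|^q$ with $q>1$, so that Cramér's condition allows observables growing like $|x|^\alpha$ with $\alpha<q$ while the Witten–Lyapunov condition allows $\alpha<2(q-1)$; these thresholds and the function names are assumptions made for illustration.

```python
def cramer_bound(q):
    """Assumed upper bound on the growth exponent alpha under Cramer's condition
    for independent variables with law proportional to exp(-|x|^q): alpha < q."""
    return q


def witten_lyapunov_bound(q):
    """Assumed upper bound on alpha under the Witten-Lyapunov condition,
    from a potential Psi growing like |x|^(2(q-1)): alpha < 2(q-1)."""
    return 2.0 * (q - 1.0)


def compare_conditions(q):
    """Say whether the Witten-Lyapunov condition is less restrictive than,
    equivalent to, or more restrictive than Cramer's condition."""
    if q <= 1.0:
        raise ValueError("this sketch assumes q > 1")
    wl, cr = witten_lyapunov_bound(q), cramer_bound(q)
    if wl > cr:
        return "less restrictive"
    if wl < cr:
        return "more restrictive"
    return "equivalent"


results = {q: compare_conditions(q) for q in (1.5, 2.0, 3.0)}
print(results)
```

The three sample values of `q` correspond to the sub-Gaussian, Gaussian and super-Gaussian regimes respectively.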
We finally mention that, in the case where the observable grows faster at infinity than the potential , it seems possible to derive a level 1 large deviations principle at a speed smaller than . We refer to [90] for a recent account dealing with the case of an Ornstein–Uhlenbeck process, and to [16, 2] for related issues.
We close this section with a practical corollary of Theorem 1 which generalizes the level 1 LDP proved in [115, Corollary 2.3].
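Corollary 1 (Level 1 large deviations principle).
Suppose that Assumptions 1, 2 and 3 hold true and consider a function $f\in B^\infty_\kappa(\mathcal{X})$. Fix $x\in\mathcal{X}$. Then, the function
$$\theta\in\mathbb{R} \mapsto \lambda(\theta f) = \lim_{t\to+\infty}\frac{1}{t}\log\mathbb{E}_x\left[\mathrm{e}^{\theta\int_0^t f(X_s)\,ds}\right] \tag{32}$$
is well-defined and differentiable, and does not depend on $x$. Moreover, $L_t(f) = \frac{1}{t}\int_0^t f(X_s)\,ds$ satisfies a large deviations principle in $\mathbb{R}$ at speed $t$ with good rate function given by
$$I_f(a) = \inf\Big\{I(\nu) \ :\ \nu\in\mathcal{P}(\mathcal{X}),\ \nu(f)=a\Big\}, \tag{33}$$
where $I$ is defined in (27). Finally, it holds
$$I_f(a) = \sup_{\theta\in\mathbb{R}}\Big\{\theta a - \lambda(\theta f)\Big\}. \tag{34}$$
Corollary 1 is useful for practical applications, since (34) is a natural way to estimate the rate function associated with an observable $f$, see for instance [56, 101, 104, 23, 48].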
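In practice, a level 1 rate function is often obtained by a numerical Legendre transform of the free energy over a grid of parameters. The sketch below assumes a known free energy $\theta\mapsto\theta^2$, which corresponds to the illustrative case of a one-dimensional Ornstein–Uhlenbeck process $dX_t=-X_t\,dt+\sqrt{2}\,dB_t$ with observable $f(x)=x$ (an example not taken from the paper); the grid and the function name `legendre_transform` are also hypothetical.

```python
def legendre_transform(cumulant, a, thetas):
    """Approximate sup over theta of (theta * a - cumulant(theta)) on a finite grid."""
    return max(theta * a - cumulant(theta) for theta in thetas)


# Assumed free energy for the illustrative Ornstein-Uhlenbeck example: theta^2.
cumulant = lambda theta: theta ** 2

# Grid of tilting parameters on [-5, 5] with step 0.001.
thetas = [i * 0.001 for i in range(-5000, 5001)]

rate_at_one = legendre_transform(cumulant, 1.0, thetas)    # exact value is 1/4
rate_at_zero = legendre_transform(cumulant, 0.0, thetas)   # exact value is 0
print(rate_at_one, rate_at_zero)
```

For this quadratic free energy the exact transform is $a^2/4$, so the rate function vanishes only at the stationary mean $a=0$. When the free energy is only known through noisy estimates, the grid range and resolution limit the values of $a$ for which the transform is reliable.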
Proof.
For $f\in B^\infty_\kappa(\mathcal{X})$, the mapping $\nu\mapsto\nu(f)$ is continuous in the $\tau^\kappa$-topology [30, Lemma 3.3.8]. Therefore, $(L_t(f))_{t\geq 0}$ obeys a large deviations principle in $\mathbb{R}$ by the contraction principle [28, Theorem 4.2.1], with good rate function given by (33). Moreover, one can redo the proofs leading to Theorem 1 and show that the function $\theta\mapsto\lambda(\theta f)$ defined in (32) is smooth and well-defined on $\mathbb{R}$. This implies that an LDP with good rate function (34) holds through the Gärtner–Ellis theorem applied in $\mathbb{R}$. Since the rate function is unique, the expressions (33) and (34) coincide. ∎
3 Decomposition of the rate function
Our goal in this section is to rewrite in various ways, which is useful for theoretical understanding and practical purposes. In Section 3.1, we first show an extension of the standard Donsker–Varadhan formulation for . This result is obtained by making use of the spectral analysis of the operator for , which is presented in Section 6.1. We then apply this result to obtain a variational representation for the principal eigenvalue of . Next, in Section 3.2, we split the expression of the rate function according to the symmetric and antisymmetric parts of the dynamics, extending the work [15] to general diffusions. Such a decomposition will prove useful in Section 4 to compare the entropy of overdamped and underdamped Langevin dynamics. Most of the proofs of this section are postponed to Section 6.2.
3.1 Donsker–Varadhan variational formula
We start with the variational representation of the entropy. Our proof, which can be found in Section 6.2.2, is an adaptation of [30, Lemma 4.2.35] relying on the Feynman–Kac semigroup and its spectral elements. In order to state the result, we need to make sense of for functions . It turns out that the appropriate notion to this end is the extended domain of the generator considered as an operator on , defined in the following way: a function belongs to if and only if there exists a measurable function such that, for any ,
[TABLE]
and
[TABLE]
In this case we write (with some abuse of notation in view of the definition of as a differential operator in (10), but of course the expressions coincide when is a smooth test function with compact support).
When the -topology is considered, such extended domains were already considered for instance in [114, 115, 76], see also [26, Chapter I, Definition 14.15]. For the unbounded functions we consider, one should think of as an element of (see the proof of Lemma 10 below, as well as the comments following Proposition 3). The integrability condition (35) is reasonable in this context since is a well defined semigroup on in view of the Lyapunov condition (22).
We can now present the main result of this section.
Proposition 3.
The rate function defined in (27) admits the following representation:
[TABLE]
where
[TABLE]
In particular, the functional defined in (37) is equal to if or is not absolutely continuous with respect to .
This result is standard when is compact [33], but does not seem to be known for an unbounded space and for the -topology we consider. In this situation the space has to be designed with some caution. Note that is not empty since it contains the functions of the form for . Note also that the last statement of Proposition 3 is consistent with the Fenchel definition (27) of the rate function. In order to get some intuition on the formula (37), let us mention that the proof formally relies on replacing the maximum over functions by the supremum over eigenfunctions satisfying
[TABLE]
for . The above equation rewrites, since (see Lemmas 7 and 10),
[TABLE]
By integrating with respect to a measure we find (37) on the left hand side, and the Fenchel transform (27) on the right hand side. The functional spaces associated with and motivate the choice of , in particular the fact that (as the sum of an element in and the product of a function in and another one in ), which allows us to define in the weak sense (36).
A natural consequence of Proposition 3 is the following variational representation for the cumulant function. The proof, postponed to Section 6.2.3, relies on the convexity of the cumulant function to invert the Fenchel transform (27).
Corollary 2.
Suppose that Assumptions 1, 2 and 3 hold true, and consider . Then,
[TABLE]
where is defined in (37).
Corollary 2 may seem anecdotal, but it provides a variational representation for the principal eigenvalue of non-symmetric diffusion operators, as pioneered by Donsker and Varadhan in their seminal paper [33] for a compact space . To the best of our knowledge, this formula had not been shown in an unbounded setting, for which we need to introduce the “generalized domain” defined in (38). However, our set of assumptions implies that can be thought of as the largest eigenvalue of , and turns out to be isolated for any (because of the compactness of the resolvent provided by Lemma 7), whereas in [33], (39) may be the supremum of the essential spectrum of the operator. This suggests that (39) holds under weaker assumptions. A possible approach for generalizing our results may be to consider different methods for studying the long time behaviour of unnormalized semigroups, see for instance [20, 6, 21], or to resort to more subtle spectral analysis tools [113, 116, 53, 13].
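The variational representation of the principal eigenvalue can be checked numerically on a toy case. The sketch below is an illustration under assumptions not taken from the text: it uses the Ornstein–Uhlenbeck generator L = -x d/dx + d²/dx² with tilt f(x) = kx, for which the principal eigenvalue of L + f is k². Conjugating by exp(-x²/4) turns L + f into the symmetric Schrödinger-type operator d²/dx² - x²/4 + 1/2 + kx, which is discretized by finite differences. The variational side is restricted to the Gaussian family N(m, 1), whose rate function is m²/4 in this model; this family happens to attain the supremum here. All function names are hypothetical.

```python
import numpy as np

def principal_eigenvalue(k, half_width=12.0, n=1201):
    """Largest eigenvalue of d^2/dx^2 - x^2/4 + 1/2 + k*x on a truncated grid."""
    x = np.linspace(-half_width, half_width, n)
    h = x[1] - x[0]
    potential = -x ** 2 / 4 + 0.5 + k * x
    # Second-order finite differences with Dirichlet boundary conditions.
    main = -2.0 / h ** 2 + potential
    off = np.full(n - 1, 1.0 / h ** 2)
    matrix = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(matrix)[-1]  # eigenvalues are sorted ascending

def variational_value(k, m_max=10.0, n=4001):
    """Maximum over Gaussian shifts N(m, 1) of k*m - m^2/4 (restricted family)."""
    m = np.linspace(-m_max, m_max, n)
    return np.max(k * m - m ** 2 / 4)

for k in (0.0, 0.5, 1.0):
    print(f"k = {k:3.1f}  eigenvalue = {principal_eigenvalue(k):.5f}  "
          f"variational = {variational_value(k):.5f}  k^2 = {k ** 2:.5f}")
```

For a general model the supremum runs over all admissible measures, so a restricted family only provides a lower bound on the eigenvalue.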
3.2 Entropy decomposition: symmetry and antisymmetry
Our goal is now to provide refined expressions for the rate function in terms of symmetric and antisymmetric parts of the dynamics, inspired in particular by [15]. In the following, for any closed operator , we denote by its adjoint on , where is the invariant probability measure of the process, as obtained in Proposition 2. Considering the generator of the diffusion (9), we can always decompose it into symmetric and antisymmetric parts with respect to through
[TABLE]
It is important to note that is a first order differential operator (and therefore obeys the chain rule of first order differentiation). We assume here that the operators admit as a common core (but the domains of these operators may be different).
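A finite-state analogue may help fix ideas about this decomposition. The sketch below is a discrete illustration only, not the diffusion setting of the text: it takes a hypothetical 3-state generator matrix Q, computes its invariant measure μ, forms the adjoint in L²(μ) as D⁻¹QᵀD with D = diag(μ), and splits Q into symmetric and antisymmetric parts. In the discrete case the antisymmetric part is of course not a first order differential operator.

```python
import numpy as np

# Hypothetical 3-state generator (rows sum to zero) with a cyclic bias.
Q = np.array([[-3.0, 2.0, 1.0],
              [0.5, -1.5, 1.0],
              [2.0, 1.0, -3.0]])

def invariant_measure(Q):
    """Solve mu^T Q = 0 together with sum(mu) = 1 by least squares."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu

mu = invariant_measure(Q)
D = np.diag(mu)
Q_adj = np.linalg.inv(D) @ Q.T @ D   # adjoint of Q in L^2(mu)
Q_sym = 0.5 * (Q + Q_adj)            # symmetric (reversible) part
Q_anti = 0.5 * (Q - Q_adj)           # antisymmetric part

print("invariant measure:", mu)
print("D @ Q_sym symmetric:     ", np.allclose(D @ Q_sym, (D @ Q_sym).T))
print("D @ Q_anti antisymmetric:", np.allclose(D @ Q_anti, -(D @ Q_anti).T))
```

The symmetry of D @ Q_sym is the detailed balance condition for the symmetric part with respect to μ.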
The decomposition (40) allows us to separate the rate function (37) into two parts. This is the purpose of the next key result, whose proof can be found in Section 6.2.4. It is inspired by the computations in [15, Proposition 2], which we simplify and generalize here through a variational Witten transform and the use of the Sobolev spaces introduced in Section 2.1. The algebra of the proof also suggests considering for probability measures of the form .
Theorem 2.
Suppose that Assumptions 1, 2 and 3 hold true, consider a measure such that with and . Then, the rate function defined in (37) admits the following decomposition:
[TABLE]
where
[TABLE]
and
[TABLE]
Theorem 2 expresses the rate function as the sum of dual norms of the symmetric and antisymmetric parts of the dynamics. Note also that we consider a measure of the form , that is the Radon–Nikodym derivative of with respect to is positive. However, we believe that we can consider more general measures , see Remark 10 in the proof. Since the measure at hand appears both inside the norms and in the definition of the norms themselves, a possibly clearer rewriting is the following:
[TABLE]
Moreover, the symmetric part of the rate function (42) can be written as a Fisher information for the invariant measure , a standard result [55]: denoting by , it holds
[TABLE]
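As a sanity check of the Fisher information form, consider the following sketch under explicit assumptions that are not part of the statement above: the reversible dynamics dX = -V'(X) dt + √2 dW in dimension one with V(x) = x²/2, so that μ = N(0, 1). With this normalization the Donsker–Varadhan functional of a measure ν with density e^v with respect to μ is known to reduce to (1/4)∫|v'|² dν. For the shifted Gaussian ν = N(m, 1) one has v(x) = mx - m²/2, hence the value m²/4. The function name and the quadrature grid are illustrative.

```python
import numpy as np

def fisher_rate(m, half_width=12.0, n=20001):
    """(1/4) * integral of |v'|^2 dnu for nu = N(m, 1) and mu = N(0, 1)."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    log_mu = -x ** 2 / 2 - 0.5 * np.log(2 * np.pi)
    log_nu = -(x - m) ** 2 / 2 - 0.5 * np.log(2 * np.pi)
    v = log_nu - log_mu                  # v = log(dnu/dmu)
    dv = np.gradient(v, dx)              # numerical derivative of v
    # Riemann sum of |v'|^2 against the density of nu.
    return 0.25 * np.sum(dv ** 2 * np.exp(log_nu)) * dx

for m in (0.0, 1.0, 2.0):
    print(f"m = {m:3.1f}  numerical value = {fisher_rate(m):.6f}  m^2/4 = {m ** 2 / 4:.6f}")
```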
The next corollary builds upon (43) by rewriting using a Poisson equation, which can be manipulated more easily. The proof can be found in Section 6.2.5.
Corollary 3.
Suppose that Assumptions 1, 2 and 3 hold true, and consider a measure such that with and . Then, the antisymmetric part of the rate function (43) reads
[TABLE]
where is the unique solution in to the Poisson equation
[TABLE]
the symmetric matrix being defined in (10) and denoting the adjoint of the gradient operator in .
It has been known for a long time [33] that the rate function of a reversible process is a Fisher information as in (42). The antisymmetric part of the rate function has been less investigated, although an expression like (44) already appears in [55] (see also [98, 15]). However, our setting provides natural well-posedness conditions for both parts of the rate function to be finite. Moreover, the uniqueness of is a consequence of the definition of through equivalence classes, see Section 2.1.
Interestingly, the solution of (45) can be formally represented through [83]
[TABLE]
where . The stochastic process associated with is reversible with respect to . Denoting by the density of with respect to the Lebesgue measure, is solution to the following SDE:
[TABLE]
Finally (44) takes the form
[TABLE]
The antisymmetric part of the entropy is therefore the autocorrelation of along a reversible process that realizes the fluctuation corresponding to the measure . From a mathematical point of view, it seems interesting to relate (46) to the so-called level 2.5 of large deviations [7, 24], since this approach consists in considering joint fluctuations of the empirical measure and the associated empirical current. In this case, the large deviations function is explicit: this reflects the fact that a Markov process is characterized entirely by its density and current. Exploring further the connection between (46) and level 2.5 large deviations is an interesting direction for future works.
Remark 7.
It is also possible to consider the adjoint not with respect to the invariant measure (whose analytical expression may be unknown), but instead with respect to a reference measure with a known analytical expression such that for some measurable function (with ). This leads to an additional term in the expression of the rate function (41), as can be readily checked by a straightforward adaptation of the proof. The operators and are the counterparts of the symmetric and antisymmetric parts of the generator in this decomposition. A typical situation to apply this strategy is provided by systems subject to a small external nonequilibrium forcing, the reference measure usually being chosen as the invariant measure at equilibrium, in the absence of external forcing. Atom chains in contact with an inhomogeneous heat bath were studied with this approach in [15], being the Gibbs measure associated with a fixed temperature profile.
4 Applications
4.1 Overdamped Langevin dynamics
In this section, we come back to the setting of Remark 6 by considering a diffusion process over subject to
[TABLE]
where is a smooth function and is a -dimensional Brownian motion. This corresponds to (9) with , in which case the generator reads
[TABLE]
We will treat the reversible case where for a smooth potential , and for a smooth function such that . In both cases, the invariant probability measure of the process is (assuming )
[TABLE]
The dynamics (47) is reversible (i.e. , where denotes the adjoint of in ) if and only if . We now give a standard condition on under which the framework developed in Sections 2 and 3 applies.
Assumption 4.
The potential has compact level sets, satisfies and, for any , it holds
[TABLE]
This assumption is satisfied for smooth potentials growing like for at infinity, and it also implies that the invariant probability measure satisfies a Poincaré inequality [4]. Similar conditions are derived in [76] in the context of large deviations. The next proposition is a direct application of Propositions 1 and 2, Theorem 1 and Corollary 3.
Proposition 4.
Under Assumption 4, the process (47) with admits the function
[TABLE]
for any as a Lyapunov function in the sense of Assumption 3. For any fixed , there exist such that for any initial measure ,
[TABLE]
Moreover,
[TABLE]
has compact level sets and, for any belonging to , bounded or with compact level sets and such that
[TABLE]
the empirical measure
[TABLE]
satisfies a large deviations principle in the -topology. The good rate function is defined by: for all with ,
[TABLE]
and otherwise.
In this reversible example, we see that the rate function is only defined through its symmetric part (42), as shown in Theorem 2. We now consider a modification of this dynamics when a divergence-free drift is added. The next proposition is an extension of the examples proposed in [98] to the unbounded state space case.
Proposition 5.
Suppose that Assumption 4 holds and consider the diffusion process solution to:
[TABLE]
with a smooth vector field such that and
[TABLE]
where is defined in (50). Then, with the notation of Section 3.2 it holds and . Moreover
[TABLE]
and satisfies a LDP in the -topology for any function belonging to , bounded or with compact level sets and such that
[TABLE]
The associated rate function reads: for any such that with and ,
[TABLE]
where is the unique -solution to
[TABLE]
Proposition 5 shows that, in this simple case, the equilibrium and nonequilibrium dynamics admit a LDP for the same class of functions but with different rate functions, the irreversible dynamics producing more entropy. It is therefore an extension of the case treated in [98, Theorem 2.2]. As for this result, Proposition 5 can be used to design algorithms with accelerated convergence to equilibrium, see also [66, 67, 37]. A setting in which Proposition 5 typically applies is when behaves as for some outside an open set centered on the origin, and with such that (see [98]). The latter condition implies in particular that so (52) immediately holds.
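The structural conditions on the nonreversible drift can be checked numerically on an example. The sketch below assumes a drift of the form F = A∇V with A a constant antisymmetric matrix, and a hypothetical two-dimensional potential growing like |x|⁴ at infinity; both choices are illustrative assumptions. For such a drift, F·∇V = 0 and div F = 0, so that div(F e^{-V}) = 0 and the measure proportional to e^{-V} remains invariant.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])   # antisymmetric matrix (A^T = -A)

def V(x, y):
    """Hypothetical confining potential growing like |x|^4 at infinity."""
    return 0.25 * (x ** 2 + y ** 2) ** 2 + 0.5 * x * y

def grad_V(x, y):
    """Exact gradient of V."""
    r2 = x ** 2 + y ** 2
    return np.array([r2 * x + 0.5 * y, r2 * y + 0.5 * x])

def F(x, y):
    """Nonreversible drift F = A grad V."""
    return A @ grad_V(x, y)

def weighted_divergence(x, y, h=1e-4):
    """Central-difference approximation of div(F exp(-V)) at the point (x, y)."""
    def g(s, t):
        return F(s, t) * np.exp(-V(s, t))
    d1 = (g(x + h, y)[0] - g(x - h, y)[0]) / (2 * h)
    d2 = (g(x, y + h)[1] - g(x, y - h)[1]) / (2 * h)
    return d1 + d2

for point in [(0.7, -0.3), (1.2, 0.4), (-0.5, 1.1)]:
    orth = float(np.dot(F(*point), grad_V(*point)))
    print(point, "F.gradV =", orth, " div(F e^-V) ~", weighted_divergence(*point))
```

The divergence is only approximated by finite differences, so the printed values are small but not exactly zero.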
4.2 Underdamped Langevin dynamics
We now apply our framework to the underdamped Langevin dynamics. A first nice feature of our results is that, compared to [115], we obtain a stronger result with similar assumptions – that is our LDP for the empirical measure holds for a finer topology than the one associated with bounded measurable functions. Note however that [115, Corollary 2.3] obtains results similar to ours for a contraction of the rate function. In addition, Theorem 2 and Corollary 3 allow us to obtain precise results on the dependency of the rate function on the friction parameter .
We start by describing the Langevin equation in Section 4.2.1, before stating the large deviations principle in Section 4.2.2. Finally Section 4.2.3 provides asymptotics on the rate function depending on the friction.
4.2.1 Description of the dynamics
The dynamics is set on , with evolving as
[TABLE]
where is a friction parameter, is a smooth potential, and is a -dimensional Brownian motion. We could also consider the easier case where the position space is bounded () but leave this simple modification to the reader. The generator of the dynamics is
[TABLE]
where
[TABLE]
The operator leaves invariant the measure
[TABLE]
The invariant measure (56) can be written
[TABLE]
where
[TABLE]
is the Hamiltonian of the system, and we assume that the normalization constant in (57) is finite (which is indeed the case when ). In (55), the Liouville operator corresponding to the Hamiltonian part of the dynamics is antisymmetric in . On the other hand, the fluctuation-dissipation part with generator is symmetric in , so that and with the notation of Section 3.2.
Before turning to the LDP associated with the Langevin dynamics (54), we give some intuition on the behaviour of the process as varies. First, it is clear that in the small limit, (54) becomes the Hamiltonian dynamics
[TABLE]
To be more precise, we introduce the process where is solution to (54). It can then be shown that, in the limit , the Hamiltonian converges to an effective diffusion on a graph [51, 49, 50, 61]. In particular the relevant time scale in the underdamped limit is .
On the other hand, in the limit and under an appropriate time rescaling, we recover the overdamped dynamics studied in Section 4.1. To see this, we integrate the second line in (54) to obtain
[TABLE]
By introducing now and , the latter equality becomes
[TABLE]
When , we observe that converges formally towards the solution of (47), see [93, Section 6.5]. The relevant time scale in the overdamped limit is therefore . These remarks will be of interest below when studying the rate function associated with the dynamics (54).
4.2.2 Large deviations
In order to obtain a large deviations principle for (54), let us make the following classical assumption on the growth of the potential [115, 86, 77, 83].
Assumption 5.
The potential has compact level sets, satisfies and there exist , such that
[TABLE]
We can now find a Lyapunov function for (54) by following e.g. [115, 105, 86], as made precise in Appendix C. Recall that the Hamiltonian is defined in (58).
Lemma 1.
Suppose that solves (54) where satisfies Assumption 5. Then for any and , there exists such that
[TABLE]
is a Lyapunov function in the sense of Assumption 3. More precisely, for any and , there exist and such that
[TABLE]
The Lyapunov function (59) can be adapted in cases where has singularities, see [64, 85]. We can now deduce our main theorem on the Langevin dynamics since Assumptions 1 and 2 are readily satisfied, see for instance [86].
Theorem 3.
Assume that solves (54) where satisfies Assumption 5, and consider a smooth function with for and , . Then is ergodic with respect to the measure in the sense of Proposition 2, with Lyapunov function defined in (59). Moreover, the empirical measure
[TABLE]
satisfies a LDP in the -topology. Finally, for any such that with and , the rate function reads
[TABLE]
where is the unique solution in to the Poisson problem:
[TABLE]
The proof of Theorem 3 is a direct application of the results of Sections 2 and 3. For the expression of the rate function, we use (45) and (55) together with the fact that in this case, the matrix defined in Section 2.1 reads
[TABLE]
While can be chosen independently of the friction , it is interesting to note the dependency of the rate function (60) with respect to this parameter. We discuss more precisely the scaling of the rate function with respect to in the next section, depending on the form of .
4.2.3 Low and large friction asymptotics of the rate function
The next corollary shows how the decomposition (60) allows us to identify the most likely fluctuations in the overdamped and underdamped limits. By this we mean that, when or , most fluctuations become exponentially rare in or , but some of them are associated with rate functions that vanish as and . The expression of these typical fluctuations is motivated by the discussion on the overdamped and underdamped limits in Section 4.2.1, from which the scalings of the rate function appear natural. Recall the definition of the marginal in position in (56).
Corollary 4.
Suppose that the assumptions of Theorem 3 hold true.
- Overdamped limit : Consider a measure with equilibrated in the velocity variable, i.e. such that with and . Then, for any ,
[TABLE]
where .
- Hamiltonian limit : Consider a Hamiltonian fluctuation, i.e. with for , where is defined in (58). Then, for any ,
[TABLE]
The proof is an immediate consequence of (60).
Proof.
Consider first the case where with . We have
[TABLE]
Next, (61) becomes
[TABLE]
The solution to this equation is which indeed belongs to since (in fact we may add to any function depending on only but the solutions would be equivalent by definition of the space in Section 2.1). Plugging this solution into (60) leads to (62).
Assume now that belongs to with . It holds
[TABLE]
As a result, the solution to (61) is (again, up to a function of only), from which (63) follows since . ∎
Corollary 4 characterizes the dominant fluctuations in the small and large friction regimes. In the overdamped limit the dominant fluctuations are in position only, and the rate function is actually that of the limiting overdamped dynamics (51) up to a time rescaling in , which agrees with the discussion on the overdamped limit in Section 4.2.1. On the other hand, in the Hamiltonian limit , the dominant fluctuations are Hamiltonian, with the inverse time rescaling . This is consistent with the small temperature limit of Hamiltonian systems [49].
Although Corollary 4 provides interesting information, its structure is quite rigid. For instance, in the overdamped limit, we consider only position-dependent perturbations, which is not realistic. We now refine the asymptotics by considering the next order correction in for the perturbation in both regimes, which shows the robustness of the analysis. In the result stated below, we consider a family of probability measures indexed by , and simply denote by the probability measure .
Corollary 5.
Suppose that the assumptions of Theorem 3 hold true.
- Overdamped limit : Consider the measure defined by with where , and is bounded and satisfies and . Then
[TABLE]
where .
- Hamiltonian limit : Consider with , where , , and is bounded and satisfies . Then
[TABLE]
where is the unique solution in to
[TABLE]
We believe that it is also instructive to mention the relation between the rate function (60) and the asymptotic variance of the Langevin dynamics. Indeed, when considering small perturbations of the invariant measure, Corollary 5 shows that
[TABLE]
On the other hand, the resolvent estimates in [82, Section 2.1] and [59, 61, 68] show that the asymptotic variance scales like
[TABLE]
Since we expect the asymptotic variance to be the inverse of the rate function around the invariant measure [29, 98], the scalings (67) and (68) are consistent. However, as (60) suggests, this scaling is no longer true for general fluctuations. We now present the proof of Corollary 5.
Proof.
We first consider the overdamped limit . Since is bounded we have, for any and ,
[TABLE]
Thus, the norms and are equivalent for any fixed , and the functions of and coincide (we repeatedly use this fact below, and we will use a similar argument when ). A similar conclusion holds for the corresponding dual norms. This consequence of the boundedness of makes the analysis simpler.
Recall that we consider in the overdamped limit. The symmetric part of the rate function is easily computed since only depends on the position variable, namely
[TABLE]
where we used that belongs to and is bounded to expand the exponential. For the antisymmetric part, by (61), we have to consider the solution to
[TABLE]
Corollary 4 suggests that at leading order in it holds where . In order to make this idea more precise we compute
[TABLE]
In what follows, we denote by the right hand side of the above equation. Since and by assumption, it holds . Thus, multiplying by and integrating with respect to we obtain
[TABLE]
Using the duality between and (see [75, Section 2.2 Claim F]) and (69) we find
[TABLE]
where is some constant independent of . This shows that with for a constant and all . Plugging this estimate into (60) and using that , we obtain the second term on the right hand side of (64).
The arguments to prove the limit follow a similar path, so we only sketch the proof. First, the boundedness of again allows us to compare the Sobolev norms associated with and for any (by writing the counterpart of (69) in this regime). The first term on the right hand side of (65) is easily obtained as in Corollary 4 using that and is bounded. Concerning the antisymmetric part, (61) now reads
[TABLE]
since . Because of the scaling in on the right hand side of the above equation, the solution can be expanded as in , where is solution to
[TABLE]
This reasoning can be made rigorous by a precise asymptotic analysis as above. Plugging this expansion into (60) provides the second term on the right hand side of (65). ∎
5 Conclusion and perspectives
The goal of this paper was twofold. Our first aim was to provide, given a diffusion process, a precise class of unbounded functions for which a large deviations principle holds. This question is answered in Section 2 where we prove a LDP for the empirical measure in a topology associated with unbounded functions, in relation with a Witten–Lyapunov condition. In particular, a comparison with Cramér’s condition for independent variables shows the effect of correlations on the stability of the SDE at hand. These results extend in several directions and refine results from previous works [115, 76]. However, the necessity of our Lyapunov condition for a LDP to hold is still an open problem – whereas the necessity of a similar condition is known for the Sanov theorem [111]. Our second concern was to provide finer expressions of the rate function governing the LDP, in particular in order to study Langevin dynamics which appear for instance in molecular simulation. We answer this question in two ways in Section 3. We first provide an alternative variational formula for the rate function in Section 3.1, which gives as a by-product a very general representation formula for the principal eigenvalue of second order differential operators, without symmetry assumption. This extends the important work of Donsker and Varadhan [33] in an unbounded setting. In Section 3.2, we show a general decomposition of the rate function into symmetric and antisymmetric parts of the dynamics based on the computations in [15]. Interestingly, the proof of the result relies on a Witten-like transform in the above-mentioned variational representation of the rate function. These results allow us to describe precisely the rate function of an irreversible overdamped Langevin dynamics in Section 4.1, revisiting results from [98] in an unbounded setting. More interestingly we provide in Section 4.2, for Langevin dynamics, asymptotics of the rate function for the overdamped and the underdamped limits. We thus characterize the most likely fluctuations in both regimes with a natural physical interpretation. Considering piecewise deterministic processes [11, 41, 42] (which lack regularity) instead of the Langevin dynamics is also an interesting problem.
We would like to mention several interesting directions for future works. A first natural issue is to rephrase our results in the optimal control framework developed e.g. in [18, 38, 39]. This is particularly interesting for numerical purposes, since the optimal control representation can be learned on the fly with stochastic approximation methods [17, 9, 10, 48]. We believe that such results can be obtained by harvesting the contraction principle provided by Corollary 1.
On a more theoretical ground, dual Sobolev norms have recently attracted attention in the optimal control community due to the so-called optimal matching problem, see for instance [80, 81] and references therein. With these works in mind, the dual Sobolev norm in the antisymmetric part of the rate function described in Section 3.2 could be interpreted as an infinitesimal transport cost related to the antisymmetric part of the dynamics, which is an alluring interpretation of irreversibility. Note that the relations between optimal transport and large deviations theory have a fruitful history, see e.g. [58].
It has been known for some time in the physics literature that the empirical density of a diffusion may not contain enough information to describe its fluctuations in an irreversible regime. It is actually more relevant to consider the fluctuations of both the empirical density and current, a procedure sometimes called level 2.5 large deviations [24, 7]. This framework can be used to provide a clear description of the rate function of irreversible dynamics. As shown in [7], such large deviations results can be derived by Krein–Rutman arguments like those used in the present paper. Therefore, we believe that our results can be extended to prove level 2.5 large deviations principles and characterize precisely the class of admissible currents.
Finally, it is important to understand the behaviour of observables which are not covered by our analysis. It has been recently shown [90] in the case of the Ornstein–Uhlenbeck process that observables growing too fast at infinity with respect to the confinement are characterized by a heavy tail behaviour. This leads to a level 1 large deviations principle at an anomalous speed with a localization in time of the fluctuation, and the Krein–Rutman strategy developed in the present paper does not apply. We therefore believe there are several interesting open questions in this direction.
Acknowledgments
The authors warmly thank Hugo Touchette for reading an early version of the manuscript as well as the first preprint, and providing useful comments; as well as the referees, whose suggestions helped us make various aspects of this work more precise. The authors are grateful to Ofer Zeitouni for an interesting discussion about scalings in large deviations theory, as well as to Jianfeng Lu for pointing out the work [15]. We also thank Julien Reygner for general discussions on large deviations. The PhD of Grégoire Ferré was supported by the Labex Bézout ANR-10-LABX-58-01. The work of Gabriel Stoltz was funded in part by the Agence Nationale de la Recherche, under grant ANR-14-CE23-0012 (COSMOS), and by the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement number 614492. We also benefited from the scientific environment of the Laboratoire International Associé between the Centre National de la Recherche Scientifique and the University of Illinois at Urbana-Champaign.
6 Proofs
In all the proofs below, for conciseness, we write , etc, with some abuse of notation, to indicate that the expectations we consider are taken with respect to all realizations of the dynamics (9) started from ; and do not indicate explicitly the dependence of on , in contrast to the convention used in Section 2.
6.1 Proof of the large deviations principle
As mentioned after Theorem 1, our proof relies on the Gärtner–Ellis theorem [28], for which we need several preliminary results. The key object is the functional
[TABLE]
Roughly speaking, the Gärtner–Ellis theorem (Theorem 4 in Appendix A) states that if this functional is finite and Gateaux-differentiable over and defined in (25) is exponentially tight for the -topology, then satisfies a LDP in the dual space of . A reminder of this theorem and some elements of analysis are given in Appendix A.
However, studying the range of functions for which the functional is finite and Gateaux-differentiable is not an easy task. Formally, our strategy is to prove that , the element of the spectrum of the operator with the largest modulus, is a real eigenvalue for any function , and to show that it is actually equal to the cumulant function defined in (26). This amounts to showing the well-posedness and regularity of a family of spectral problems. For this, we use several ideas from [47], which shows that under Lyapunov and irreducibility conditions, the eigenvalue problem to which is associated is well defined. In order to avoid technical difficulties related to unbounded operators, we study the semigroup rather than its generator , see Remark 9 below for more details. The seminal paper by Gärtner [55, Section 3] provides useful technical tools, as well as [44, 115].
In all of this section, we suppose that Assumptions 1, 2 and 3 hold true and consider a function of class as in Assumption 3, i.e. such that and either is bounded or has compact level sets and satisfies (22). We repeatedly use that in view of (21). We start with important properties of key martingales that appear regularly in the proofs of the required technical results.
Lemma 2.
If is a solution to (9), then the stochastic processes defined by
[TABLE]
are continuous non-negative local martingales, hence supermartingales. Moreover, it holds almost surely
[TABLE]
where and are the constants from Assumption 3.
Proof.
First, Itô's formula gives
[TABLE]
Since is and is continuous, is a continuous local martingale [71]. Since it is non-negative, it is a supermartingale by Fatou’s lemma, and the same conclusion holds for . On the other hand, (21) shows that
[TABLE]
which concludes the proof. ∎
The use of the martingale is inspired by [115] where it is considered to control return times to compact sets. Here, it allows us to define the Feynman–Kac semigroup associated with the dynamics with weight function .
Lemma 3.
Fix . For any , the Feynman–Kac operator
[TABLE]
is well defined. Moreover, is a semigroup of bounded operators on . Finally, for any and any , there exist and a compact subset such that
[TABLE]
Proof.
We first show that for any , is a semigroup of bounded operators on , before turning to the proof of (73). For a fixed , since , there exists such that, for any ,
[TABLE]
Using Lemma 2, the supermartingale property leads to
[TABLE]
Therefore, for all ,
[TABLE]
and hence
[TABLE]
As a result is a semigroup of bounded operators over .
We next prove (73) for a fixed , which we assume non-zero without loss of generality. Note that
[TABLE]
Since has compact level sets and , for any there exists a compact set and a constant such that
[TABLE]
where is a constant to be chosen later on. This implies that
[TABLE]
with since . Therefore (by some standard approximation arguments relying on stopping times, as discussed for instance in [97])
[TABLE]
We can now bound the right hand side of the above equation with a technique similar to the one used in [47, Section 2.3]. Indeed, for any ,
[TABLE]
Since , there exists a constant depending on such that
[TABLE]
Plugging this estimate into (75) and using that leads to
[TABLE]
where the last bound is due to Lemma 2.
Using this estimate to bound the right hand side of (74), we end up with
[TABLE]
Integrating with respect to time leads to
[TABLE]
Since , there exists a compact set such that outside , so that we have
[TABLE]
We can now assume that we chose from the beginning (recall that is fixed). Setting , this leads to
[TABLE]
which proves (73). ∎
Lemma 3 is crucial for obtaining the compactness of the evolution operator , as noted in [47] (a result inspired by [97, Theorem 8.9]). Note however that is a priori not a strongly continuous semigroup on , see the discussion in [114, Proposition B13] and Remark 9 below for more details.
Another key ingredient is the regularization property of the evolution. The following bound on the Feynman–Kac semigroup depending on the weight function is one element in this direction.
Lemma 4.
Suppose that Assumptions 1, 2 and 3 hold true, and fix . Then, for any , any and any , it holds
[TABLE]
Proof.
Using the inequality for , we have, for ,
[TABLE]
which is the desired conclusion. ∎
We can now use Lemma 4 to show an important regularization property of the Feynman–Kac semigroup.
Lemma 5.
For any , , any and any compact , the function is continuous.
Let us emphasize that the statement of Lemma 5 is a consequence of Hörmander’s theorem [43, Theorem 4.1] when has polynomial growth and is smooth. However, the result is more difficult to obtain when is irregular. Note for instance that we cannot rely on the continuity property proved in Section 6.2.3 below since the space of smooth functions with compact support is not dense in . The idea of the proof is to use the local martingales introduced in Lemma 2 to show that the regularization property of Hörmander’s theorem is preserved when is irregular but does not grow too fast.
Proof.
We use Assumption 1 to revisit [55, pages 34-35] in an unbounded setting and with a hypoelliptic flavour. First, we note that for , the result is a direct application of Assumption 1 combined with Hörmander’s theorem, since the evolution operator can be shown to be an integral operator with a transition probability which admits a density belonging to (see for instance [69] for , which can easily be extended to with the hypoelliptic result of [43, Theorem 4.1]). In particular, is continuous.
We now use an approximation argument inspired by [55, Section 3] for a generic function . Consider a sequence of functions belonging to with for any , and such that almost everywhere as (such a sequence exists by Lusin’s theorem, see [102, Chapter 2]). By modifying the proof of Lemma 4, and since , we have for any , and ,
[TABLE]
with .
Our goal is now to show that converges uniformly over any compact to , by proving that the right hand side of (77) goes uniformly to zero over . This will conclude the proof since a uniform limit of continuous functions is continuous.
We introduce to this end the events
[TABLE]
and fix a compact set . The right hand side of (77) can then be split into two terms
[TABLE]
for which we show convergence to zero, uniformly for , starting with . Since , there exists such that
[TABLE]
Moreover, , , and , so that
[TABLE]
By definition of in (70) we have
[TABLE]
The Cauchy–Schwarz inequality then shows that
[TABLE]
By (71) it holds . Next, by Chebyshev’s inequality and since ,
[TABLE]
As a result, we obtain, for ,
[TABLE]
Therefore, for any , we can choose such that .
Let us now control , introducing . Since , it holds for some ,
[TABLE]
Using the definition (78) we have
[TABLE]
where is the ball of center 0 and radius to be chosen. Let us first bound , which retains only the parts of the trajectories performing excursions out of . Using , for and as fixed above, there exist , such that
[TABLE]
We fix and such that the above inequality holds true. Using again , we are led to
[TABLE]
where the last line follows from the definition (78) of . Therefore, once is fixed, there exists such that for any and , it holds . It remains to control in order to obtain the uniform convergence to zero of (77) over as . In fact,
[TABLE]
where is the evolution semigroup defined in (15). Since is a sequence of bounded functions converging almost everywhere to zero and the transition kernel has a smooth density for , it follows that goes uniformly to zero over compact sets for any as , see e.g. [55, 97]. Moreover, it can be shown that
[TABLE]
which goes to zero when , uniformly in and . Therefore, for , and fixed as above, and choosing
[TABLE]
there exists such that for all and ,
[TABLE]
Then, for any , , it holds
[TABLE]
Let us summarize the various approximations: for any , we first fix so that . Then, we choose large enough so that . Finally, we take small enough and large enough in (79) so that for . As a result, for any there is such that for and , it holds .
In conclusion, the right hand side of (77) goes to zero uniformly as over any compact set . Therefore is continuous and converges uniformly over to , which is therefore continuous over . Since the compact is arbitrary, is continuous over , which concludes the proof. ∎
Before presenting the main result concerning the spectral properties of the operator and its consequences on the definition of the cumulant function , we need the following “irreducibility” lemma, which relies on Assumption 2.
Lemma 6.
For any time , and any Borel set with non-empty interior, it holds that
[TABLE]
Proof.
Take and (which is possible since has non-empty interior). By Assumption 2, there exists a -path solving (19) such that and . We can then use the proof of the Stroock–Varadhan support theorem, see [97, Theorem 6.1] for an overview. In particular, Assumption 2 implies that [103, Eq. (5.5)] is satisfied. Therefore, [103, Eq. (5.3)] ensures that, for any ,
[TABLE]
Moreover, since and upon reducing we may assume that , where denotes the ball of center and radius . Recalling that , we then obtain
[TABLE]
where we denote by the -tube around the path , namely
[TABLE]
Since is a bounded set and is continuous over , it holds
[TABLE]
The combination of (81) and (82) leads to the desired result (80). ∎
At this stage, we follow the spectral analysis path developed in [47]. However, we have to prove that the assumptions used in [47] are fulfilled in our context. In particular the irreducibility is granted by Lemma 6.
Lemma 7.
For any and any , the operator considered over has a real largest eigenvalue with eigenspace of dimension one, and an associated continuous eigenvector such that for any . Moreover, is the only positive eigenvector of (up to multiplication by a positive constant). Finally, is equal to the cumulant function defined in (26):
[TABLE]
The result of Lemma 7 is twofold: it entails the well-posedness of the principal eigenproblem associated with for any and , and then identifies this principal eigenvalue with the free energy function (26). Another consequence of this lemma is that is in fact the principal eigenvector of , see Lemma 10 below for a more precise statement.
Proof.
We follow the general strategy of [47] and split the proof into several steps.
Step 1: Compactness of the evolution operator.
We first show that, for given and , the operator defined in Lemma 3 is compact when considered on . For any compact set we have the decomposition
[TABLE]
We first consider the compact sets from (73) for and time (omitting the dependence on in the notation since the time is fixed here) and note that converges to zero in operator norm as . Indeed, for any , (73) leads to
[TABLE]
so that when .
We next show that is compact over for any compact set . Consider a sequence bounded in . Following the first step of the proof of [47, Lemma 2] and using our strong Feller result, Lemma 5, we see that is a strong Feller operator, so is ultra-Feller (see [47, Lemma 6]). This means that the operator is continuous in total variation norm, so that the family is uniformly equicontinuous. We used here that since and is continuous, it holds . The sequence therefore converges in up to extraction by the Ascoli theorem [102, Theorem 11.28], and in since . Therefore, the operator sends a bounded sequence into a convergent one (up to extraction), so it is compact in [95]. The decomposition (84) and the bound (85) then show that is the limit in operator norm of the compact operators as , so it is compact in (see e.g. [95, Theorem VI.12]).
Step 2: Existence of the principal eigenvalue.
We can now use the Krein–Rutman theorem on the (closed) total cone (see [27, 47] for definitions). For , it is clear that leaves this cone invariant. We next show that has a non-zero spectral radius
[TABLE]
To this end, fix a compact set with non-empty interior. We have shown in Lemma 6 that
[TABLE]
Since is continuous by Lemma 5, this shows that
[TABLE]
Therefore, for any ,
[TABLE]
so that for . Iterating the procedure for any we get
[TABLE]
As a result, since , we obtain in the large limit the following lower bound for the spectral radius:
[TABLE]
which shows that is positive. Since is compact, [27, Theorem 19.2] ensures that is a real eigenvalue of with associated eigenvector (in particular, ). Using the semigroup property of and standard arguments (see [94, Theorem 2.4]), we can show that there exists such that and
[TABLE]
Step 3: Properties of .
For the remainder of the proof, we write for simplicity and (the function being fixed). We show here that is continuous and positive. For any compact and , (87) leads to
[TABLE]
Using Lemma 3 we obtain that, for any , there exists a compact set such that
[TABLE]
so that is continuous as the uniform limit of continuous functions (since is continuous by Lemma 5). Finally, since and is not identically equal to zero, there exists such that . Moreover is continuous, so there is for which on . By (87) it holds, for any ,
[TABLE]
Since on and is continuous, . Moreover for any by Lemma 6, so the previous lower bound shows that for all .
Step 4: Properties of eigenspaces and eigenfunctions.
We now show that the eigenspace associated with is of dimension one, and that any other eigenvector vanishes somewhere in . For this, we introduce the so-called -transform [76, 101, 23, 47]. A key element here is the fact that for all , which allows us to define the following Markov operator, for an arbitrary time :
[TABLE]
where and refer here to the multiplication operators by the functions and respectively. We now prove that is ergodic by first noting that admits as a Lyapunov function (using (73) and the normalization which implies that ). Using Assumption 3, we can also show that has compact level sets, see [47, Appendix E] for details.
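To make the transform above concrete, here is a minimal sketch in a finite-state, discrete-time setting, which is an assumption made purely for illustration (the paper works with diffusions and weighted function spaces). The stochastic matrix `P`, the vector `f`, and the function name `h_transform` are all hypothetical. The sketch builds the tilted kernel, extracts its Perron eigenpair, and conjugates by the positive eigenvector so that the result is again a Markov kernel.

```python
import numpy as np

def h_transform(P, f):
    """Finite-state analogue of the h-transform (illustrative assumption).

    P : (n, n) stochastic matrix with positive entries.
    f : (n,) weight function.
    Returns the principal eigenvalue rho of the tilted kernel
    A = diag(exp(f)) P, a positive eigenvector h, and the Markov kernel
    Q(x, y) = A(x, y) h(y) / (rho h(x)).
    """
    A = np.exp(f)[:, None] * P            # tilted (Feynman-Kac) kernel
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)           # Perron eigenvalue is real and dominant
    rho = eigvals[k].real
    h = np.abs(eigvecs[:, k].real)        # Perron vector, made positive
    Q = A * h[None, :] / (rho * h[:, None])
    return rho, h, Q
```

Each row of `Q` sums to one because `A @ h = rho * h`, which is the finite-dimensional counterpart of the normalization that makes the transformed operator Markovian.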
Moreover, we can prove that satisfies a minorization condition on any compact set. For this, we first use that . Then, for any and , the operator has a smooth transition density by hypoellipticity (because and the coefficients of belong to the class , see [43, Theorem 4.1]), which is positive in view of Lemma 6 by an argument similar to the one sketched after Assumption 2 (see for instance the proof of [97, Proposition 8.1]). Therefore, for any compact with non-empty interior, and denoting by the uniform Lebesgue measure on , there is such that, for any measurable set ,
[TABLE]
Since is continuous, this implies that, for any measurable ,
[TABLE]
where both the minimum and maximum above are finite and non-zero (recall that is the Lebesgue measure of ). This shows that satisfies a minorization condition [60] over any compact set.
Therefore, the Markovian dynamics with kernel admits a unique invariant probability measure , with respect to which it is ergodic in . By this we mean that (in view of [60, Theorem 1.2]) there exist and such that for any ,
[TABLE]
and it holds .
We can now use this ergodic behaviour to show that the eigenspace associated with has dimension one and that cannot have another positive eigenvector with norm in . Indeed, if there were another eigenvector associated with , then the fact that together with (89) ensure that
[TABLE]
This shows that and would be proportional, and proves the claim that the eigenspace associated with has dimension one. Assume now that there is another real eigenvalue with real eigenvector such that for all . Noting again that and since , (89) shows that, for any ,
[TABLE]
However it now holds, for any ,
[TABLE]
where we used that and . Combining the two equations above shows that
[TABLE]
which contradicts (90). As a result, there cannot be another eigenvalue with a positive eigenvector.
Step 5: The principal eigenvalue is the cumulant function.
Proving (83) now follows by a simple rewriting. For and fixed, it holds, for any ,
[TABLE]
so that
[TABLE]
By (89) (since ), we see that converges to (with fixed), so that
[TABLE]
We have chosen to work with an arbitrary time for convenience, so a priori the above limit depends on . To conclude the proof, it remains to show that the limit actually does not depend on the specific choice of and that
[TABLE]
This extension from fixed to any follows by standard arguments not reproduced here (see e.g. [64, 47]). ∎
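The identification of the principal eigenvalue with the cumulant function can be checked numerically in a toy setting. The sketch below assumes a finite-state, discrete-time Markov chain with positive transition matrix `P` (an illustrative assumption, not the setting of the paper); the names `tilted_kernel`, `principal_eigenvalue` and `finite_time_cumulant` are hypothetical. It compares the logarithm of the spectral radius of the tilted kernel with the finite-time quantity (1/n) log E_x[exp(f(X_0) + ... + f(X_{n-1}))].

```python
import numpy as np

def tilted_kernel(P, f):
    """Discrete-time Feynman-Kac kernel: (A u)(x) = exp(f(x)) * sum_y P(x, y) u(y)."""
    return np.exp(f)[:, None] * P

def principal_eigenvalue(A):
    """Spectral radius of a matrix with positive entries (real by Perron-Frobenius)."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

def finite_time_cumulant(P, f, n):
    """(1/n) log E_x[exp(sum_{k<n} f(X_k))] for every initial state x.

    The expectation equals (A^n 1)(x); the vector is rescaled at each step
    and the scaling factors are accumulated in log form to avoid overflow.
    """
    A = tilted_kernel(P, f)
    v = np.ones(len(f))
    log_norm = 0.0
    for _ in range(n):
        v = A @ v
        s = v.max()
        v = v / s
        log_norm += np.log(s)
    return (log_norm + np.log(v)) / n
```

In this toy model the finite-time values approach the logarithm of the principal eigenvalue at rate O(1/n), independently of the initial state, mirroring Step 5 of the proof.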
An important ingredient for the lower bound of the LDP is the Gâteaux-differentiability of the cumulant functional, which we prove below.
Lemma 8.
The functional
[TABLE]
is convex and Gâteaux-differentiable.
Proof.
The convexity of is a standard consequence of Hölder’s inequality. Concerning Gâteaux-differentiability, we follow the strategy of [55, Section 3] for a compact state space, relying on results of Kato [72]. For this, we interpret the cumulant function (91) as the largest eigenvalue of the tilted generator, , as shown in Lemma 7. More precisely, for and , is associated with the largest eigenvalue of the operator in through
[TABLE]
so that differentiability in can be shown through the differentiability of the spectrum of a bounded operator. We thus show that the operator-valued function is differentiable in operator norm.
To this end, we fix , and prove that for , there exists such that
[TABLE]
where
[TABLE]
Note that the operator is bounded on by the same martingale estimate used to prove Lemma 4. In order to prove (92), we use the identity
[TABLE]
to obtain, for any and ,
[TABLE]
where we used the inequality for in the last line. By manipulations similar to the ones used to prove Lemma 4, we can bound the latter expectation by for some constant , which leads to (92) with .
Equation (92) shows that is differentiable in operator norm, and that
[TABLE]
Thus, the principal eigenvalue , which is always isolated, is differentiable, see [72, Chapter II, Theorem 5.4] and [72, Chapter IV, Theorem 3.5]. This concludes the proof of Gâteaux-differentiability. ∎
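The perturbation argument above has a transparent finite-dimensional counterpart, sketched below under the assumption of a finite-state, discrete-time chain with positive transition matrix `P` (an illustration only; all function names are hypothetical). For an isolated simple eigenvalue, first-order perturbation theory gives the directional derivative of the log principal eigenvalue in a direction `g` as the integral of `g` against the probability vector proportional to the product of the left and right Perron eigenvectors. The sketch compares this formula with a central finite difference.

```python
import numpy as np

def log_principal_eigenvalue(P, f):
    """Log spectral radius of the tilted kernel diag(exp(f)) P."""
    A = np.exp(f)[:, None] * P
    return float(np.log(np.max(np.abs(np.linalg.eigvals(A)))))

def tilted_invariant_measure(P, f):
    """Probability vector proportional to l(x) h(x), where l and h are the
    left and right Perron eigenvectors of the tilted kernel."""
    A = np.exp(f)[:, None] * P
    w, V = np.linalg.eig(A)
    h = np.abs(V[:, np.argmax(w.real)].real)      # right Perron vector
    wl, U = np.linalg.eig(A.T)
    l = np.abs(U[:, np.argmax(wl.real)].real)     # left Perron vector
    nu = l * h
    return nu / nu.sum()

def directional_derivative(P, f, g, eps=1e-6):
    """Central finite difference of f -> log_principal_eigenvalue in direction g."""
    return (log_principal_eigenvalue(P, f + eps * g)
            - log_principal_eigenvalue(P, f - eps * g)) / (2.0 * eps)
```

For instance, `directional_derivative(P, f, g)` and `tilted_invariant_measure(P, f) @ g` should agree up to discretization error for any direction `g`.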
Remark 8.
By pursuing further the Taylor expansion (92) in the proof of Lemma 8, we can actually show that, for any , the function
[TABLE]
is analytic (this analyticity was already proven in [76] using a different argument that can be simplified with our tools). This relies on the simple inequality for any , together with the series expansion of the exponential and martingale estimates as in the proof of Lemma 8. Indeed, our proof, based on martingales, shows that for any , the function
[TABLE]
is analytic. Moreover, it is finite on and converges pointwise to a finite valued function as , as shown in Lemma 7. Therefore, the convergence holds uniformly on any compact as (see [45, Theorem VI.3.3]). Since a locally uniform limit of analytic functions is analytic (see [102, Theorem 10.28]), the function is analytic.
The last step before proving the large deviations principle itself is an exponential tightness result, see [28, Section 1.2]. At this stage, the finiteness of together with the Gâteaux-differentiability of already provides the upper bound over compact sets and the lower bound in (28). In order to extend the upper bound to all closed sets, we prove exponential tightness in the -topology, see Appendix A for some definitions (this exponential tightness is not explicitly stated in [76]).
Lemma 9.
The family of probability measures over is exponentially tight in the -topology.
Proof.
We adapt the strategy of [115, Corollary 2.3] and [111, Section 2.2] by introducing the family of sets
[TABLE]
For , the sets are subsets of since . We show that they are actually precompact in the -topology.
Let us first show that is precompact in the usual weak topology for any . Consider for this the compact sets for (recall that has compact level sets). Then, for any , we have
[TABLE]
This shows that for any and any ,
[TABLE]
hence (upon choosing sufficiently large) for any the family of measures is tight, so it is precompact for the weak topology by the Prohorov theorem [12]. Now, if is bounded, is tight for the -topology and the lemma is proved, so we may assume that has compact level sets (see Assumption 3). To prove compactness in our weighted topology, we show that is uniformly integrable over in order to use [110, Theorem 7.12]. Since , the set
[TABLE]
is compact for any . Moreover, since we assume to be continuous with compact level sets, for any there exists such that
[TABLE]
with when . Therefore, for any and ,
[TABLE]
Taking the supremum over in the above equation and recalling that when we obtain
[TABLE]
We can then conclude that is precompact for the -topology. Consider indeed a sequence . By Prohorov’s theorem, has a subsequence weakly converging towards a measure , i.e. for any . Then, by [110, Theorem 7.12], (93) ensures that and for any , as . In other words, is precompact for the -topology.
We can now prove the -exponential tightness of the empirical distribution in . Indeed, for any , Chebyshev’s inequality leads to
[TABLE]
Renormalizing at log scale leads to
[TABLE]
The right hand side of the above quantity may look infinite since grows faster than . However, using again the martingale defined in Lemma 2 we obtain, for any ,
[TABLE]
Thus it holds
[TABLE]
As a result, (94) becomes
[TABLE]
Since is precompact in the -topology for any , and can be chosen arbitrarily large, this proves the exponential tightness of the family of empirical distributions in the -topology. ∎
We are now in a position to prove Theorem 1.
Proof of Theorem 1.
The previous lemmas make it possible to apply the Gärtner–Ellis theorem (recalled in Appendix A). The function in Theorem 4 of Appendix A is the cumulant function
[TABLE]
The topological dual of is , where is the set of measures over integrating (see [102, 76] and [30, Lemma 3.3.8] for details). We have proved that is well defined, Gâteaux-differentiable, and that the family of measures
[TABLE]
is exponentially tight in the -topology. Therefore, satisfies a large deviations principle in the -topology with good rate function given by
[TABLE]
Note first that . We next observe that if is not normalized to 1 (take to be constant in the supremum (95)), so we may consider over . Moreover, choosing in (95) and noting that by Lemma 7, we get if . If is not absolutely continuous with respect to , there exists a measurable set such that and . Since has a positive density with respect to the Lebesgue measure, this means that has zero Lebesgue measure. Consider then for . Since has zero Lebesgue measure and has a smooth density for all (as a consequence of Assumption 1) it holds, for all ,
[TABLE]
Therefore, the process
[TABLE]
satisfies for all . Since , it holds almost surely, for any . As a consequence we obtain
[TABLE]
This shows that , so that from (95) we obtain
[TABLE]
with . By letting we are led to .
Finally, we show that if and only if , and that converges almost surely to in the -topology for any sequence such that (see [28, Appendix B] for the definition of this almost-sure convergence). Define
[TABLE]
Since has compact level sets (because it is a good rate function, see Theorem 4), is a non-empty closed subset of for the -topology. Moreover, in order for the LDP upper bound to make sense, it holds . If denotes an open neighborhood of , the lower semicontinuity of implies that
[TABLE]
Therefore, by the large deviations upper bound we have, for any ,
[TABLE]
for some constant . Consider now a sequence such that as . In particular, there exists such that for , which implies
[TABLE]
This shows that converges almost surely to in the -topology, by the Borel-Cantelli lemma (and by definition of convergence in a topological space [28, Appendix B]). However, we know by Proposition 2 that the only possible limit for is , hence and almost surely converges to .
We finally show for completeness that almost surely spends a finite Lebesgue time outside . For this we introduce the random subset of of times for which does not belong to , namely . Since
[TABLE]
we have, by Fubini’s theorem, for any ,
[TABLE]
By using (96) and the dominated convergence theorem, we obtain
[TABLE]
As a result, almost surely. This means that, for any neighborhood of in the -topology, the empirical measure almost surely spends a finite Lebesgue measure time outside , and this concludes the proof. ∎
6.2 Proofs of Section 3
We start by providing a preliminary technical result in Section 6.2.1, which shows that the eigenvectors considered in Lemma 7 belong to the generalized domain defined in (38). We then turn to the proofs of Proposition 3 (see Section 6.2.2) and Corollary 2 (see Section 6.2.3).
6.2.1 A preliminary technical result
Lemma 10.
Fix . The function defined in Lemma 7 belongs to and satisfies
[TABLE]
Proof.
We already know by Lemma 7 that and . It suffices therefore to show that and to obtain the representation (97) for . We combine to this end elements from [30, Theorem 4.2.25] and [114, Proposition B13].
We start by noting that, since is an eigenvector of the operator with eigenvalue , it holds
[TABLE]
Therefore,
[TABLE]
where the last equality comes from Fubini’s theorem and
[TABLE]
Note that we can indeed apply Fubini’s theorem since there exist such that
[TABLE]
and (since we are integrating nonnegative functions)
[TABLE]
where the last expression is finite by manipulations similar to the ones performed in the proof of Lemma 2.
We can next use (98) at initial time together with a conditioning argument to write
[TABLE]
This finally shows that (99) becomes
[TABLE]
Since is in (as the product of functions in and ) and is a semigroup of bounded operators on by (22), it holds
[TABLE]
so that (35) is satisfied. As a result, and in the weak sense defined by (36). ∎
Remark 9.
It is actually possible to make more general statements about the domains of the generators of for , similarly to [114, 115]. For this, one considers the (closed) subset of functions for which in when , see [96, Exercise 1.16]. We can then define a generator with domain for this semigroup. By manipulations similar to those of Lemma 10, we can show that when we define as in (36). In this case we obtain the representation which could be expected. This procedure allows us to define a common domain for the operators with .
Here we bypass the approach sketched above because, for the proof of Proposition 3 given below, we can restrict our attention to the eigenvectors for . In this case, it is clear that in when , and we have the simple representation formula , which can be seen as a reformulation of the eigenvalue equation .
6.2.2 Proof of Proposition 3
For the proof, which is partly inspired by [30, Lemma 4.1.36], we denote by the rate function given by the Fenchel transform in (27) and for the Varadhan functional on the right hand side of (37). We repeatedly use the results of Lemmas 7 and 10.
We first show that if is not absolutely continuous with respect to or does not belong to . Assume first that does not hold: there exists a set such that and . For any we introduce and denote by the eigenvector associated with the principal eigenvalue of for some . Recall that by Lemma 10. As shown in the proof of Theorem 1, it holds , so that (97) can be rewritten as
[TABLE]
Therefore,
[TABLE]
By letting , we conclude that when is not absolutely continuous with respect to . Next, if , since it holds . We may then choose . By Lemma 10, the principal eigenvector belongs to with , so we have
[TABLE]
i.e. if . This shows that when is not absolutely continuous with respect to or . We next show that when and , which we assume until the end of the proof.
Let us first show that . For this, we consider and introduce
[TABLE]
Because of the definition (38) of , we know that . We can then write, since ,
[TABLE]
We now show that . By computations similar to the ones in the proof of Lemma 2, and using the continuity of (see also [115, Corollary 2.2]), we obtain by the local martingale property that
[TABLE]
Therefore, recalling the definition (88) of the -transformed evolution operator with a time fixed (with in view of Lemma 7), and denoting by the eigenvector associated with in Lemma 7, (101) becomes
[TABLE]
where the limit follows from (89) (noting that ). The latter limit is positive since is continuous and positive, which implies that . Therefore, (100) leads to
[TABLE]
Since is arbitrary, taking the supremum shows that for any with .
We finally turn to the inequality . Consider for any arbitrary the eigenvector defined in Lemma 7. By Lemma 10, this eigenvector belongs to and satisfies . Thus, since , we have
[TABLE]
Given that, in the above equation, is an arbitrary function belonging to , taking the supremum leads to
[TABLE]
This finally shows that for all with and concludes the proof.
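The equality between the Fenchel-transform form of the rate function and the Varadhan-type variational form can be illustrated on a two-state, discrete-time Markov chain. This toy model is an assumption made for illustration only, and its variational formula is the classical discrete-time analogue rather than formula (37) itself; the matrix `P` and all function names are hypothetical. Both suprema are approximated by a one-dimensional grid search, which is possible because each objective is invariant under a one-parameter family (adding a constant to the test function, or rescaling the positive test vector).

```python
import numpy as np

# Hypothetical two-state chain, chosen only for illustration.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def cumulant(f):
    """Log spectral radius of the tilted kernel diag(exp(f)) P."""
    A = np.exp(f)[:, None] * P
    return np.log(np.max(np.abs(np.linalg.eigvals(A))))

def rate_fenchel(nu, grid):
    """sup_f { nu(f) - cumulant(f) }, restricted to f = (s, 0).

    Adding a constant to f leaves nu(f) - cumulant(f) unchanged when nu is
    a probability vector, so this restriction loses nothing.
    """
    return max(nu[0] * s - cumulant(np.array([s, 0.0])) for s in grid)

def rate_varadhan(nu, grid):
    """sup_{u > 0} { -sum_x nu(x) log((P u)(x) / u(x)) }, with u = (exp(r), 1).

    The objective is invariant under rescaling of u.
    """
    best = -np.inf
    for r in grid:
        u = np.array([np.exp(r), 1.0])
        best = max(best, -np.sum(nu * np.log((P @ u) / u)))
    return best
```

With `grid = np.linspace(-15.0, 15.0, 3001)`, the two functions should return nearly identical values for a given probability vector `nu`, and both should vanish (up to rounding) at the invariant measure of `P`.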
6.2.3 Proof of Corollary 2
Since is the Fenchel transform of , the result follows if we can show that the map defined on is stable under bi-Fenchel conjugacy. The convexity and finiteness of show that a (necessary and) sufficient condition for to be bi-Fenchel stable is for the functional to be lower-semicontinuous (see [8, Theorem 2.22]). We show below that it is actually continuous: for any sequence in such that for some , it holds as . We shall use for this a stability result from [22].
Consider a sequence converging to in . Using Lemma 4, for any , , and , it holds (using again the inequality for )
[TABLE]
for some constant depending on , and . We used Lemma 2 and the supermartingale property of to obtain the last line. This leads to
[TABLE]
We know by Lemma 7 that and are associated with the isolated largest eigenvalue of the operators and respectively. Therefore, (102) shows that the approximation is strongly stable (we refer to [22], in particular the definitions in Section 2.2 and Proposition 2.11), so [22, Proposition 2.2] ensures that as . This shows that the function is continuous and concludes the proof.
6.2.4 Proof of Theorem 2
The proof, inspired by [15], relies on two ideas: performing a Witten transform inside the variational representation (37) and separating the symmetric and antisymmetric parts of the generator . We write and assume first that instead of . Starting from (37), we consider a function of the form
[TABLE]
We call this choice “variational Witten transform” for its similarity with the standard Witten transform [112, 62, 83] and its use in the variational formula (37) satisfied by . Since with it is clear that . This follows by noting that, using the shorthand notation , we have
[TABLE]
Moreover, it holds and is constant outside a compact set, so and it holds .
We now rewrite the expression in (37) for given by (103), using again the notation :
[TABLE]
Recalling that and expanding , we obtain
[TABLE]
We now decompose into symmetric and antisymmetric parts. First, it holds
[TABLE]
On the other hand, using that is a first order differential operator satisfying , we obtain
[TABLE]
As a result
[TABLE]
By plugging (105)-(106) into (104), we obtain
[TABLE]
The first term in the above equation reads (recalling that )
[TABLE]
By density of in , the above expression is valid for any such that . The above computation shows that this condition is equivalent to assuming that , and
[TABLE]
which does not involve the function . Moreover, since is a first order differential operator, antisymmetric on , it holds
[TABLE]
As a result, (107) rewrites
[TABLE]
and this expression is finite for any .
Our goal is now to take the supremum over functions in (108), and prove that this is enough to obtain the supremum over . We consider for this the terms depending on in (108) and, using the duality between and (see [75, Section 2, Claim F]) we obtain
[TABLE]
where we used Young’s inequality with to obtain the second line. Since , the supremum over the functions takes the value when . Therefore, by density of in , the supremum over the functions of the form (103) for recovers the supremum over and it holds
[TABLE]
by definition of the -norm in Section 2.1, which concludes the proof.
Remark 10.
We have proved our result for measures of the form . Considering more general measures is made difficult because the Radon–Nikodym derivative may vanish on some region of , hence the definition of is not clear. Given (109), we see that we can give a sense to our computations provided defines a linear form on , namely: there exists such that
[TABLE]
We find it however clearer to work directly with exponential perturbations of the invariant measure .
6.2.5 Proof of Corollary 3
The proof follows from the variational formulation of Theorem 2. Indeed, let us rewrite (43) as
[TABLE]
where is fixed and satisfies the assumptions of the theorem, and
[TABLE]
By [75, Section 2, Claim F], we can identify with the dual of , so that reads
[TABLE]
Denoting by the adjoint of the gradient operator in , standard results of calculus of variations show that the minimum in (111) is attained at a unique solution to
[TABLE]
Inserting solution to (112) in (111) leads to
[TABLE]
which concludes the proof.
Appendix A Tools for large deviations principles
In this section, we recall some large deviations concepts (using the abuse of notation discussed at the beginning of Section 6 for denoting expectations and probabilities). For a Polish space , we denote by its topological dual (the set of continuous linear functionals over ). We first recall the definition of an exponentially tight family of measures. A family of measures over a Polish space is called exponentially tight if for any , there exists a (pre)compact set such that
[TABLE]
In words, exponential tightness means that the measures concentrate exponentially fast over compact sets. This property is used in large deviations to turn an upper bound over compact sets into an upper bound over all closed sets.
We now define the cumulant function. Consider a family of measures over a Polish space . The logarithmic moment generating function is defined as in [28, Section 4.5]: for any , and a random variable distributed according to ,
[TABLE]
The scaled cumulant generating function is defined by
[TABLE]
Let us relate this quantity with the objects introduced in Section 2. In our situation, we consider fluctuations of the empirical measure (where is the space of measures with finite mass), so and for ,
[TABLE]
On the other hand, belongs to a space of functions, typically when the -topology is considered. In practice we may restrict ourselves to probability measures because the rate function is infinite otherwise. We see that considering leads to choosing . In any case the duality relation (114) reads in this case
[TABLE]
so that coincides with the argument of the limit in (26). With these preliminaries, we are in a position to state the key theorem for the results in this work, which goes back to [55, 45] and is presented for instance in [28, Corollary 4.6.14]. We recall that a rate function is said to be good if its level sets are compact for the considered topology.
Theorem 4 (Projective limit - Gärtner–Ellis).
Let be an exponentially tight family of probability measures on a Polish space . Assume that
[TABLE]
is finite valued over and Gâteaux-differentiable. Then satisfies a large deviations principle over with good rate function , the Legendre–Fenchel transform of .
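The Legendre–Fenchel transform appearing in the theorem can be illustrated numerically in a scalar setting. The sketch below is a simplification under stated assumptions: the dual variable is a real number rather than an element of a function space, the cumulant function is taken to be $\Lambda(\lambda) = \lambda^2$ purely for illustration, and the helper name `legendre_fenchel` and the grids are hypothetical. For this choice the exact transform is $I(x) = x^2/4$.

```python
import numpy as np

def legendre_fenchel(lam_grid, cgf_values, x_grid):
    """Approximate I(x) = sup_lam (lam * x - Lambda(lam)) over a finite grid.

    lam_grid   : 1D array of values of the dual variable lam
    cgf_values : Lambda evaluated on lam_grid
    x_grid     : points x at which the rate function is evaluated
    """
    # Rows index x, columns index lam; the supremum is taken over lam.
    return np.max(np.outer(x_grid, lam_grid) - cgf_values[None, :], axis=1)

lam_grid = np.linspace(-10.0, 10.0, 4001)
cgf_values = lam_grid**2            # illustrative assumption: Lambda(lam) = lam^2
x_grid = np.linspace(-2.0, 2.0, 9)

rate = legendre_fenchel(lam_grid, cgf_values, x_grid)
for x, value in zip(x_grid, rate):
    print(x, value, x**2 / 4)       # numerical value next to the exact x^2 / 4
```

The grid for the dual variable must be wide enough to contain the maximizer, here $\lambda = x/2$; otherwise the computed supremum underestimates the rate function.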
Appendix B Proof of Proposition 1
The proposition is a consequence of the equality
[TABLE]
Since has compact level sets and by (23), has compact level sets. Since has compact level sets, for it holds and for some constant . Moreover, outside a compact set, the function
[TABLE]
is bounded above and below since the numerator and denominator are both equivalent to , so the second condition in (21) holds. Finally,
[TABLE]
Since , we may choose small enough so as to obtain
[TABLE]
for some constant . This proves the third item of (21).
We finally turn to the proof of (22). For this we compute
[TABLE]
Hence, using that , for any it holds
[TABLE]
Since at infinity and (24) holds, this shows that (22) is satisfied when choosing sufficiently small.
Appendix C Proof of Lemma 1
The proof relies on manipulations similar to those of [86]. A simple computation shows that
[TABLE]
For any it holds
[TABLE]
As a result, Assumption 5 leads to
[TABLE]
Since , it holds
[TABLE]
with
[TABLE]
The claim follows for by choosing and sufficiently small.
References
- [1] S. Asmussen and P. W. Glynn. Stochastic Simulation: Algorithms and Analysis, volume 57 of Stochastic Modelling and Applied Probability. Springer Science & Business Media, 2007.
- [2] F. Augeri. On heavy-tail phenomena in some large deviations problems. Comm. Pure Appl. Math., 73(8):1599–1659, 2020.
- [3] Y. Baek, Y. Kafri, and V. Lecomte. Dynamical phase transitions in the current distribution of driven diffusive channels. J. Phys. A, 51(10):105001, 2018.
- [4] D. Bakry, F. Barthe, P. Cattiaux, and A. Guillin. A simple proof of the Poincaré inequality for a large class of probability measures. Electron. Commun. Probab., 13:60–66, 2008.
- [5] D. Bakry, I. Gentil, and M. Ledoux. Analysis and Geometry of Markov Diffusion Operators, volume 348 of Grundlehren der mathematischen Wissenschaften. Springer Science & Business Media, 2013.
- [6] V. Bansaye, B. Cloez, P. Gabriel, and A. Marguet. A non-conservative Harris' ergodic theorem. arXiv:1903.03946, 2019.
- [7] A. C. Barato and R. Chetrite. A formal view on level 2.5 large deviations and fluctuation relations. J. Stat. Phys., 160(5):1154–1172, 2015.
- [8] V. Barbu and T. Precupanu. Convexity and Optimization in Banach Spaces, volume 10 of Mathematics and its Applications. Springer Science & Business Media, 2012.
