Revisiting classical and quantum disordered systems from the unifying perspective of large deviations

Cecile Monthus

arXiv:1903.05899·cond-mat.dis-nn·May 12, 2021


TL;DR

This paper explores classical and quantum disordered systems through the lens of large deviations theory, providing a unified framework to understand both typical and rare events across different scales.

Contribution

It offers a pedagogical review that unifies the analysis of classical and quantum disordered systems using large deviations, highlighting common underlying mechanisms.

Findings

01. Unified perspective on classical and quantum disordered systems

02. Large deviations effectively describe typical and rare events

03. Highlights common mechanisms across different disordered systems

Abstract

The theory of large deviations is already the natural language for the statistical physics of equilibrium and non-equilibrium. In the field of disordered systems, the analysis via large deviations is even more useful to describe within a unified perspective the typical events and the rare events that occur on various scales. In the present pedagogical introduction, we revisit various emblematic classical and quantum disordered systems in order to highlight the common underlying mechanisms from the point of view of large deviations.

Equations (453)

$$\sum_{v} p_{v} = 1$$

$$P_{L}[v(.)] \equiv p_{v(1)}\, p_{v(2)} \cdots p_{v(L)} = \prod_{x=1}^{L} p_{v(x)}$$

$${\cal T}_{L}[v(.)] \equiv \mathrm{Trace}\left[T_{v(L)}\, T_{v(L-1)} \cdots T_{v(2)}\, T_{v(1)}\right]$$

$$\lambda[v(.)] \equiv \frac{\ln|{\cal T}_{L}[v(.)]|}{L}$$

$${\cal P}_{L}(\lambda) \equiv \sum_{[v(.)]} P_{L}[v(.)]\, \delta\!\left(\lambda - \frac{\ln|{\cal T}_{L}[v(.)]|}{L}\right) \mathop{\simeq}_{L \to +\infty} e^{-L I(\lambda)}$$

$$I(\lambda^{typ}) = 0 = I'(\lambda^{typ})$$

$$\lambda^{typ} = \phi'(k=0)$$

$$\tau_{L}[v(.)] = t_{v(L)}\, t_{v(L-1)} \cdots t_{v(2)}\, t_{v(1)} = \prod_{x=1}^{L} t_{v(x)}$$

$$C(x_{0}, x_{0}+r) = \prod_{x=x_{0}}^{x_{0}+r-1} \tanh(\beta J(x))$$

$$C(x_{0}, x_{0}+r) = J(x_{0}) \prod_{x=x_{0}+1}^{x_{0}+r-1} \frac{J(x)}{h(x)}$$

$$Q_{v_{1}}[v(.)] \equiv \frac{1}{L} \sum_{x=1}^{L} \delta_{v_{1}, v(x+1)}$$

$$Q_{v_{r} \ldots v_{2} v_{1}}[v(.)] \equiv \frac{1}{L} \sum_{x=1}^{L} \delta_{v_{r}, v(x+r)} \cdots \delta_{v_{2}, v(x+2)}\, \delta_{v_{1}, v(x+1)}$$

$$Q_{v_{L} v_{L-1} \ldots v_{2} v_{1}}[v(.)] \equiv \frac{1}{L} \sum_{x=1}^{L} \delta_{v_{L}, v(x+L)}\, \delta_{v_{L-1}, v(x+L-1)} \cdots \delta_{v_{2}, v(x+2)}\, \delta_{v_{1}, v(x+1)}$$

$$\tau_{L}[v(.)] = \prod_{x=1}^{L} t_{v(x)} = \prod_{v_{1}} \left(t_{v_{1}}\right)^{L Q_{v_{1}}[v(.)]}$$

$$C^{SpAv}(r) \equiv \frac{1}{L} \sum_{x=1}^{L} C(x, x+r) = \frac{1}{L} \sum_{x=1}^{L} \prod_{y=x}^{x+r-1} \tanh(\beta J(y)) = \sum_{J_{r}} \cdots \sum_{J_{1}} \left(\prod_{j=1}^{r} \tanh(\beta J_{j})\right) Q_{J_{r} \ldots J_{2} J_{1}}[J(.)]$$

$$A[v(.)] = A\left(Q_{(rpoints)}[v(.)]\right)$$

$${\cal P}_{L}(A) \equiv \sum_{[v(.)]} P_{L}[v(.)]\, \delta\!\left(A - A\left(Q_{v_{r} \ldots v_{2} v_{1}}[v(.)]\right)\right) = \sum_{[Q_{(rpoints)}]} P_{L}[Q_{(rpoints)}]\, \delta\!\left(A - A(Q_{(rpoints)})\right)$$

$$|\tau_{L}[v(.)]| = \prod_{x=1}^{L} |t_{v(x)}|$$

$$\lambda[v(.)] \equiv \frac{\ln|\tau_{L}[v(.)]|}{L} = \frac{1}{L} \sum_{x=1}^{L} \ln|t_{v(x)}|$$

$$\phi(k) = \frac{\ln \overline{|\tau_{L}[v(.)]|^{k}}}{L} = \ln\left[\overline{|t_{v}|^{k}}\right]$$

$$t_{v(x)} = e^{\beta v(x)}$$

$$\tau_{L}[v(.)] = \prod_{x=1}^{L} e^{\beta v(x)}$$

$$\lambda[v(.)] = \beta\, \frac{1}{L} \sum_{x=1}^{L} v(x)$$

$$p^{Gauss}(v) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{v^{2}}{2\sigma^{2}}}$$


Full text

Revisiting classical and quantum disordered systems from the unifying perspective of large deviations

Cécile Monthus

Institut de Physique Théorique, Université Paris Saclay, CNRS, CEA, 91191 Gif-sur-Yvette, France


I Introduction

Just like Mr Jourdain discovering that he has been speaking in prose all his life without knowing it, physicists working in statistical physics become aware at some point that they have been using the theory of large deviations without realizing it since their very first acquaintance with the Boltzmann notion of entropy and the Gibbs theory of ensembles. This language of large deviations has turned out to be very powerful for unifying the statistical physics of equilibrium, non-equilibrium and dynamical systems (see the reviews [1, 2, 3] and references therein) and for formulating an appropriate statistical physics approach to dynamical trajectories for various Markovian processes (see the reviews [4, 5, 6, 7, 8, 9, 10] and the PhD Theses [11, 12, 13, 14] and the HDR Thesis [15]).

In the field of disordered systems, the presence of random disorder variables induces many subtle effects for the probabilities of interesting observables. Physicists have understood from the very beginning that some observables are non-self-averaging, i.e. their disorder-averaged value is completely different from their typical value (see the books [16, 17] and references therein). It was also realized very early that in each large typical sample, there will nevertheless occur rare anomalous regions of a certain size that may dominate some observables: famous examples are the Lifshitz essential singularities of the density of states near spectrum edges in Anderson localization models [18, 19, 20, 21, 16], the Griffiths singularities for the statics [22, 23] and the dynamics [24, 25, 26] of random classical models, and the Griffiths phases in random quantum models (see the reviews [27, 28] and references therein). Finally, at critical points, multifractal properties were found to appear, for instance for the inverse participation ratios of eigenfunctions at Anderson localization transitions (see the reviews [29, 30] and references therein) or for correlation functions in random classical spin models [31, 32, 33, 34, 35, 36, 37, 38], while at Infinite Disorder fixed points, many observables are even more broadly distributed [27, 28]. These few examples indicate that the language of large deviations is even more useful in the presence of disorder, in order to describe within a unified perspective all these phenomena involving typical and rare events on various scales.

The aim of the present pedagogical introduction is thus to explain to physicists how the general theory of large deviations is the natural language to analyze the properties of various well-known classical and quantum random models. It is of course not meant for mathematicians, who have been using the large deviation framework for a very long time (see the books [39, 40, 41, 42, 43, 44] and references therein), in particular in the area of disordered systems (see the books [45, 46, 47], the review [48] and references therein). This pedagogical introduction is thus intended only for physicists who are disheartened by the technical vocabulary used in the mathematical literature on large deviations (like Polish space, Borel sigma-field, cadlag function, … ).

The paper is organized as follows. In section II, we introduce the generic notations for one-dimensional random models and describe how observables can be classified according to the order of the empirical histogram of the disorder configuration that determines them. We then analyze the various levels of this hierarchy: the ’Level 1’ of large deviations allows one to study the properties of observables given by products of random variables (section III); the ’Level 2’ of large deviations corresponds to the fluctuations of the empirical 1-point histogram of the disorder configuration (section IV); the ’Level 2.5’ of large deviations corresponds to the fluctuations of the empirical 2-point histogram of the disorder configuration (section V); finally, the ’Level 3’ of large deviations corresponds to the whole series of empirical histograms of arbitrary order (section VI). In section VII, we turn to random models defined on Cayley trees and analyze their properties in terms of large deviations of branches. Our conclusions are summarized in section VIII. In Appendix A, we describe an alternative classification of one-dimensional disorder configurations in terms of empirical intervals where the disorder remains the same.

II Classification of observables in one-dimensional random models

II.1 Transfer-matrix formulation of one-dimensional random models

Many classical and quantum disordered models in one dimension can be reformulated in terms of the product of random matrices (see the books [16, 17] and references therein). To have generic notations, it will be convenient to denote by $v(x)$ the disorder variable at point $x$ that is drawn independently with some probability distribution $p_{v}$ normalized to unity

$$\sum_{v} p_{v} = 1 \tag{1}$$

that should be translated into $\int dv\,p_{v}=1$ whenever the disorder $v$ is a continuous random variable. In this paper, the general equations are written for the case of discrete disorder $v$ (Eq. 1) without systematically translating them into the continuous case, although some of the examples will involve continuous disorder.

A disorder configuration $[v(.)]\equiv[v(x)]_{x=1,2,..,L}$ on a sample of $L$ sites occurs with the factorized probability

$$P_{L}[v(.)] \equiv p_{v(1)}\, p_{v(2)} \cdots p_{v(L)} = \prod_{x=1}^{L} p_{v(x)} \tag{2}$$

In this disordered sample, various physical observables can then be obtained by considering the product of the $L$ corresponding transfer matrices $T_{v(x)}$ [16, 17]. One of the most important observables is the trace of this product

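$${\cal T}_{L}[v(.)] \equiv \mathrm{Trace}\left[T_{v(L)}\, T_{v(L-1)} \cdots T_{v(2)}\, T_{v(1)}\right] \tag{3}$$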

The exponential growth with $L$ of its modulus $|{\cal T}_{L}[v(.)]|$ can then be measured by the finite-size Lyapunov exponent

$$\lambda[v(.)] \equiv \frac{\ln|{\cal T}_{L}[v(.)]|}{L} \tag{4}$$

Of course a more complete analysis would involve the whole Lyapunov spectrum [17] of the product of matrices but will not be discussed here.
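To make Eqs. 3 and 4 concrete, here is a minimal sketch that estimates the finite-size Lyapunov exponent of a product of random 2×2 matrices. The choice of matrices is an assumption made only for illustration: the Anderson tight-binding transfer matrices with unit hopping, $T_{\epsilon}=\begin{pmatrix} E-\epsilon & -1\\ 1 & 0\end{pmatrix}$, with $\epsilon$ drawn uniformly in $[-W,W]$. The running product is rescaled at each step and the logarithms of the scale factors are accumulated, so that $\ln|{\cal T}_{L}|$ can be computed for large $L$ without overflow. All function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples ((a11, a12), (a21, a22))."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def transfer_matrix(eps, energy):
    """Hypothetical choice of T_v: Anderson transfer matrix with unit hopping."""
    return ((energy - eps, -1.0), (1.0, 0.0))


def finite_size_lyapunov(disorder, energy=0.0):
    """ln|Trace(T_{v(L)} ... T_{v(1)})| / L, rescaling the running product to avoid overflow."""
    prod = ((1.0, 0.0), (0.0, 1.0))
    log_scale = 0.0
    for eps in disorder:
        # the new site multiplies from the left, as in Eq. 3
        prod = matmul(transfer_matrix(eps, energy), prod)
        norm = max(abs(m) for row in prod for m in row)
        prod = tuple(tuple(m / norm for m in row) for row in prod)
        log_scale += math.log(norm)
    trace = prod[0][0] + prod[1][1]
    return (log_scale + math.log(abs(trace))) / len(disorder)


random.seed(1)
W = 1.0  # assumed half-width of the box distribution of eps
samples = [
    finite_size_lyapunov([random.uniform(-W, W) for _ in range(2000)]) for _ in range(20)
]
print(sum(samples) / len(samples))
```

Without disorder and at $E=3$, outside the band, the same function returns $\ln\frac{3+\sqrt{5}}{2}$, the logarithm of the largest eigenvalue of the single transfer matrix, which provides a simple sanity check of the rescaling.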

II.2 Statistics of the Lyapunov exponent $\lambda$ over the disorder configurations

For large $L$ , the probability distribution ${\cal P}_{L}(\lambda)$ of the finite-size Lyapunov exponent $\lambda$ of Eq. 4 over the disorder configurations $[v(.)]$ drawn with the probabilities of Eq. 2 is expected to follow the large deviation form [16, 17]

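$${\cal P}_{L}(\lambda) \equiv \sum_{[v(.)]} P_{L}[v(.)]\, \delta\!\left(\lambda - \frac{\ln|{\cal T}_{L}[v(.)]|}{L}\right) \mathop{\simeq}_{L \to +\infty} e^{-L I(\lambda)} \tag{5}$$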

where $I(\lambda)$ is called the ’rate function’ in the field of large deviations: it is non-negative, $I(\lambda)\geq 0$, and vanishes only at its minimum, corresponding to the typical value $\lambda^{typ}$ that is realized with probability one in the thermodynamic limit $L\to+\infty$.

$$I(\lambda^{typ}) = 0 = I'(\lambda^{typ}) \tag{6}$$

All other values $\lambda\neq\lambda^{typ}$ appear with a probability ${\cal P}_{L}(\lambda)$ that is exponentially small in $L$ in Eq. 5, but they are nevertheless important to understand the behavior of the moments of non-integer order $k$ of the trace of Eq. 3, as a consequence of their evaluation via the Laplace saddle-point method of the following integral over $\lambda$

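$$\overline{|{\cal T}_{L}[v(.)]|^{k}} = \int d\lambda\, {\cal P}_{L}(\lambda)\, e^{kL\lambda} \mathop{\simeq}_{L \to +\infty} \int d\lambda\, e^{L[k\lambda - I(\lambda)]} \mathop{\simeq}_{L \to +\infty} e^{L\phi(k)} \tag{7}$$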

The function $\phi(k)$ governing their exponential growth in $L$ is called the ’scaled cumulant generating function’ in the field of large deviations. It corresponds to the Legendre transform of the rate function $I(\lambda)$ of Eq. 5 as a consequence of the saddle-point evaluation of Eq. 7

$$\phi(k) = \max_{\lambda}\left[k\lambda - I(\lambda)\right] \tag{8}$$

with the reciprocal Legendre transform

$$I(\lambda) = \max_{k}\left[k\lambda - \phi(k)\right] \tag{9}$$

For $k=0$ where $\phi(k=0)=0$ as a consequence of the normalization in Eq. 7, one obtains that the typical value $\lambda^{typ}$ where the rate function vanishes (Eq. 6) corresponds to the derivative

$$\lambda^{typ} = \phi'(k=0) \tag{10}$$

while all moments of order $k\neq 0$ are dominated by non-typical values of the Lyapunov exponent in the saddle-point calculation of Eq. 7.
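The Legendre pair of Eqs. 8 and 9 and the relation of Eq. 10 can be checked numerically in the scalar case discussed in section III. The sketch below assumes a hypothetical two-valued modulus $|t_v|$ (the values and probabilities are arbitrary), computes $\phi(k)=\ln\sum_v p_v|t_v|^k$ in closed form, obtains $I(\lambda)$ by maximizing $k\lambda-\phi(k)$ over a grid of $k$, and estimates $\lambda^{typ}=\phi'(0)$ by a central finite difference.

```python
import math

# Hypothetical two-valued modulus |t_v| with probabilities p_v (an assumption for illustration)
values = [0.5, 2.0]
probs = [0.7, 0.3]


def phi(k):
    """Scaled cumulant generating function phi(k) = ln( sum_v p_v |t_v|^k ) of the scalar case."""
    return math.log(sum(p * t ** k for p, t in zip(probs, values)))


# Reciprocal Legendre transform I(lam) = max_k [k lam - phi(k)] evaluated on a grid of k
k_grid = [-20.0 + 0.001 * i for i in range(40001)]
phi_grid = [phi(k) for k in k_grid]


def rate_function(lam):
    return max(k * lam - f for k, f in zip(k_grid, phi_grid))


h = 1e-5
lam_typ = (phi(h) - phi(-h)) / (2 * h)  # lambda_typ = phi'(k = 0), central difference
exact_typ = sum(p * math.log(t) for p, t in zip(probs, values))
print(lam_typ, exact_typ, rate_function(lam_typ), rate_function(0.3))
```

For a two-valued variable, $I(\lambda)$ can also be written as the relative entropy between the empirical and the true frequencies of the two values, which anticipates the Level-2 description of section IV.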

Since the typical Lyapunov exponent $\lambda^{typ}$ appears with probability one in the thermodynamic limit $L\to+\infty$, one of the main goals in the field of products of random matrices has been to compute it in various models via the Dyson-Schmidt invariant measure method [16, 17, 49]. In the present paper, our goal will instead be to focus on the simplest cases where the whole large deviation rate function $I(\lambda)$ can be obtained explicitly.

II.3 Examples of observables corresponding to products of random variables

The simplest case of Eq. 3 is when the transfer matrices $T_{v}$ are replaced by numbers $t_{v}$

$$\tau_{L}[v(.)] = t_{v(L)}\, t_{v(L-1)} \cdots t_{v(2)}\, t_{v(1)} = \prod_{x=1}^{L} t_{v(x)} \tag{11}$$

This case occurs in various disordered models, either exactly or approximately in some region of parameters, as displayed by the following examples.

II.3.1 Examples of observables that are exactly given by products of random variables

(1-a) In the classical Ising chain with random couplings $J(x)$ , the two-spin correlation function reads [50, 16, 17]

$$C(x_{0}, x_{0}+r) = \prod_{x=x_{0}}^{x_{0}+r-1} \tanh(\beta J(x)) \tag{12}$$
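The product form of the Ising correlation already illustrates the non-self-averaging mentioned in the Introduction. Since the couplings are independent, the disorder average $\overline{C}=[\overline{\tanh(\beta J)}]^{r}$ and the typical value $C^{typ}=e^{r\,\overline{\ln\tanh(\beta J)}}$ differ exponentially in $r$. The sketch below, which assumes a hypothetical two-valued coupling distribution, checks this by exact enumeration of all $2^{r}$ coupling configurations.

```python
import math
from itertools import product

beta = 1.0
couplings = [0.2, 2.0]  # hypothetical values of the random coupling J
probs = [0.5, 0.5]      # their probabilities
r = 10                  # distance between the two spins


def correlation(js):
    """C(x0, x0 + r) = product over the r bonds of tanh(beta J(x))."""
    return math.prod(math.tanh(beta * j) for j in js)


avg_c = 0.0      # disorder average of C
avg_log_c = 0.0  # disorder average of ln C
for config in product(range(len(couplings)), repeat=r):
    weight = math.prod(probs[i] for i in config)
    c = correlation([couplings[i] for i in config])
    avg_c += weight * c
    avg_log_c += weight * math.log(c)

typical_c = math.exp(avg_log_c)
print(avg_c, typical_c)
```

With these numbers the average exceeds the typical value by more than an order of magnitude already at $r=10$, because it is dominated by the rare configurations where most couplings are strong.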

(1-b) In the random quantum spin chains corresponding to free Majorana fermions, the possible edge Majorana zero modes that characterize the topological phases are given in terms of products of random variables in the simplest cases (see [51] and references therein for various examples).

II.3.2 Observables that can be approximated by products of random variables in certain regions of parameters

(2-a) For the Anderson Localization tight-binding model with hopping $V$ and random on-site-energy $\epsilon(x)$ , the eigenfunction $\psi_{x_{0}}$ localized on site $x_{0}$ for $V=0$ can be approximated at lowest order in the hopping $V$ in the so-called Forward Approximation [52, 53, 54, 55] by the product

[TABLE]

(2-b) For the quantum Ising chain with random couplings $J(x)$ and random transverse fields $h(x)$ , the two-spin correlation function is given at lowest order in perturbation in the couplings by the product

$$C(x_{0}, x_{0}+r) = J(x_{0}) \prod_{x=x_{0}+1}^{x_{0}+r-1} \frac{J(x)}{h(x)} \tag{14}$$

This form can also be understood from the Strong Disorder RG approach [27, 28] when only sites are decimated, or from the Cavity approach [56, 57, 58].

II.4 Classification of observables in terms of empirical histograms of the disorder configuration

For each disorder configuration $[v(x)]_{x=1,2,..,L}$ with periodic boundary conditions $v(L+x)=v(x)$ , the empirical 1-point histogram

$$Q_{v_{1}}[v(.)] \equiv \frac{1}{L} \sum_{x=1}^{L} \delta_{v_{1}, v(x+1)} \tag{15}$$

measures the frequencies of the possible values $v_{1}$ of the disorder variable. More generally, the empirical r-point histogram

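$$Q_{v_{r} \ldots v_{2} v_{1}}[v(.)] \equiv \frac{1}{L} \sum_{x=1}^{L} \delta_{v_{r}, v(x+r)} \cdots \delta_{v_{2}, v(x+2)}\, \delta_{v_{1}, v(x+1)} \tag{16}$$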

measures the frequencies of the occurrence of the r consecutive values $(v_{r},...v_{2},v_{1})$ in the disordered sample. This hierarchy can be constructed up to the maximal value $r_{max}=L$ that corresponds to the total length $L$ of the disorder configuration

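$$Q_{v_{L} v_{L-1} \ldots v_{2} v_{1}}[v(.)] \equiv \frac{1}{L} \sum_{x=1}^{L} \delta_{v_{L}, v(x+L)}\, \delta_{v_{L-1}, v(x+L-1)} \cdots \delta_{v_{2}, v(x+2)}\, \delta_{v_{1}, v(x+1)} \tag{17}$$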

i.e. this represents the average over the $L$ translations via $x=1,2,..,L$ of the initial disorder configuration.

The observables of the disordered models can then be classified according to the order $r$ of the empirical r-point histogram that allows one to reconstruct them. For instance, the product of Eq. 11 can be rewritten in terms of the empirical 1-point histogram $Q_{v_{1}}[v(.)]$ of Eq. 15 as

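$$\tau_{L}[v(.)] = \prod_{x=1}^{L} t_{v(x)} = \prod_{v_{1}} \left(t_{v_{1}}\right)^{L Q_{v_{1}}[v(.)]} \tag{18}$$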

The physical interpretation is that the product of random variables is not sensitive to the order of appearance of the disorder variables $v(x)$ , but depends only on the global frequencies of the possible values $v_{1}$ that are summarized in the empirical 1-point histogram $Q_{v_{1}}[v(.)]$ .
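The empirical histograms of Eqs. 15 and 16 and the identity of Eq. 18 are easy to check on a single configuration. The sketch below uses a hypothetical helper `empirical_histogram`, a hypothetical configuration and hypothetical values of $t_v$, with periodic boundary conditions. It verifies that the product of Eq. 11 only depends on the 1-point histogram, and that summing the 2-point histogram over its first index returns the 1-point histogram.

```python
import math
from collections import Counter


def empirical_histogram(config, r):
    """Empirical r-point histogram of Eq. 16 with periodic boundary conditions.

    config[0], ..., config[L-1] stand for v(1), ..., v(L); each key is the tuple
    (v(x+r), ..., v(x+2), v(x+1)) and each value its frequency over the L shifts x.
    """
    L = len(config)
    counts = Counter(
        tuple(config[(x + s) % L] for s in range(r, 0, -1)) for x in range(L)
    )
    return {key: count / L for key, count in counts.items()}


config = [0, 1, 1, 0, 2, 1, 0, 0]  # hypothetical disorder configuration
t = {0: 0.5, 1: 2.0, 2: 3.0}       # hypothetical elementary variables t_v

Q1 = empirical_histogram(config, 1)
Q2 = empirical_histogram(config, 2)

direct = math.prod(t[v] for v in config)  # product over sites, Eq. 11
via_q1 = math.prod(t[v1] ** (len(config) * q) for (v1,), q in Q1.items())  # Eq. 18
print(Q1, direct, via_q1)
```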

An example of an observable that depends only on the empirical r-point histogram of Eq. 16 is the spatial average, within a given sample, of the 2-point correlation function at distance $r$ of Eq. 12

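$$C^{SpAv}(r) \equiv \frac{1}{L} \sum_{x=1}^{L} C(x, x+r) = \frac{1}{L} \sum_{x=1}^{L} \prod_{y=x}^{x+r-1} \tanh(\beta J(y)) = \sum_{J_{r}} \cdots \sum_{J_{1}} \left(\prod_{j=1}^{r} \tanh(\beta J_{j})\right) Q_{J_{r} \ldots J_{2} J_{1}}[J(.)] \tag{19}$$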

whose statistics is discussed in [50] to stress that it coincides with the disorder-averaged correlation function only for small distances $r\leq(cst)\ln L$. Finally, the most general observables depend on the empirical L-point histogram of Eq. 17, which contains the complete information on the disorder configuration.
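As a check of Eq. 19, the sketch below computes the spatial average $C^{SpAv}(r)$ of a random Ising chain in two ways: directly from the sliding products of $\tanh(\beta J)$, and from the empirical r-point histogram of the couplings alone. The coupling values, the inverse temperature and the function names are assumptions made for illustration; with periodic boundary conditions the two results agree up to rounding.

```python
import math
import random
from collections import Counter


def spatial_average_direct(J, r, beta):
    """(1/L) sum_x prod_{y=x}^{x+r-1} tanh(beta J(y)), periodic boundary conditions."""
    L = len(J)
    return sum(
        math.prod(math.tanh(beta * J[(x + s) % L]) for s in range(r)) for x in range(L)
    ) / L


def spatial_average_from_histogram(J, r, beta):
    """Same quantity built only from the empirical r-point histogram of the couplings."""
    L = len(J)
    Q = Counter(tuple(J[(x + s) % L] for s in range(r, 0, -1)) for x in range(L))
    return sum(
        math.prod(math.tanh(beta * j) for j in key) * count / L for key, count in Q.items()
    )


random.seed(0)
J = [random.choice([0.3, 1.0, 2.5]) for _ in range(200)]  # hypothetical couplings
beta, r = 1.0, 5
print(spatial_average_direct(J, r, beta), spatial_average_from_histogram(J, r, beta))
```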

The usefulness of this classification is that once one has identified that an observable $A[v(.)]$ depends on the disorder configuration $[v(.)]$ only via its empirical r-point histogram $Q_{(rpoints)}\left[v(.)\right]$ of Eq. 16

$$A[v(.)] = A\left(Q_{(rpoints)}[v(.)]\right) \tag{20}$$

then its probability distribution over the disorder configurations drawn with Eq. 2 depends only on the probability distribution $P_{L}[Q_{(rpoints)}]$ of the empirical r-point histogram

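$${\cal P}_{L}(A) \equiv \sum_{[v(.)]} P_{L}[v(.)]\, \delta\!\left(A - A\left(Q_{v_{r} \ldots v_{2} v_{1}}[v(.)]\right)\right) = \sum_{[Q_{(rpoints)}]} P_{L}[Q_{(rpoints)}]\, \delta\!\left(A - A(Q_{(rpoints)})\right) \tag{21}$$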

In the theory of large deviations, the probability distributions $P_{L}[Q_{(rpoints)}]$ of the empirical r-point histograms of various orders $r$ are labelled by levels as follows [1, 3]: the Level 2 corresponds to the empirical 1-point histogram $Q_{.}$, the Level 2.5 corresponds to the empirical 2-point histogram $Q_{..}$, and the Level 3 corresponds to the full hierarchy of arbitrary $r$ up to the limit $r\to+\infty$. In the following sections, we will thus describe this hierarchy, starting with the Level 1, which corresponds to the large deviation properties of sums of random variables that are needed to fully characterize the statistics of products of random variables.

III Product of random variables as the level-1 of large deviations

In this section, we focus on the product of random variables corresponding to the modulus of Eq 11

$$|\tau_{L}[v(.)]| = \prod_{x=1}^{L} |t_{v(x)}| \tag{22}$$

and on the corresponding finite-size Lyapunov exponent of Eq. 4

$$\lambda[v(.)] \equiv \frac{\ln|\tau_{L}[v(.)]|}{L} = \frac{1}{L} \sum_{x=1}^{L} \ln|t_{v(x)}| \tag{23}$$

As explained in detail in the previous section, this is the simplest problem that occurs in the field of disordered systems. In the language of large deviations, the sum of random variables of Eq. 23 is also the simplest example, corresponding to the so-called ’Level-1’ description [1, 3].

III.1 Moments of non-integer order $k$

The moments of non-integer order $k$ of the product in Eq. 22 can be directly computed as a consequence of the independence of the disorder variables $v(x)$ on the $L$ sites (Eq. 2)

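$$\overline{|\tau_{L}[v(.)]|^{k}} = \sum_{[v(.)]} P_{L}[v(.)] \prod_{x=1}^{L} |t_{v(x)}|^{k} = \prod_{x=1}^{L} \left(\sum_{v} p_{v} |t_{v}|^{k}\right) = \left[\overline{|t_{v}|^{k}}\right]^{L} \tag{24}$$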

So the scaled cumulant generating function $\phi(k)$ governing their exponential growth in $L$ (Eq 7) is given, actually even for any finite $L$ , by the simple expression

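$$\phi(k) = \frac{\ln \overline{|\tau_{L}[v(.)]|^{k}}}{L} = \ln\left[\overline{|t_{v}|^{k}}\right] \tag{25}$$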

in terms of the moments $\overline{|t_{v}|^{k}}$ of the elementary variable $|t_{v}|$ .
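The statement that $\phi(k)$ is exact for any finite $L$ can be verified by brute force. The sketch below assumes a hypothetical three-valued distribution of $|t_v|$, enumerates all $3^{L}$ disorder configurations weighted by Eq. 2, and compares $\frac{1}{L}\ln\overline{|\tau_L|^{k}}$ for a non-integer $k$ with $\ln\overline{|t_v|^{k}}$.

```python
import math
from itertools import product

t_abs = [0.3, 1.5, 4.0]  # hypothetical values of |t_v|
p = [0.2, 0.5, 0.3]      # hypothetical probabilities p_v


def moment_by_enumeration(k, L):
    """Exact disorder average of |tau_L|^k, summing over all configurations weighted by Eq. 2."""
    total = 0.0
    for config in product(range(len(t_abs)), repeat=L):
        weight = math.prod(p[v] for v in config)
        total += weight * math.prod(t_abs[v] for v in config) ** k
    return total


def phi(k):
    """phi(k) = ln of the single-site moment of |t_v|^k."""
    return math.log(sum(pv * tv ** k for pv, tv in zip(p, t_abs)))


k = 0.7
for L in (1, 3, 5):
    print(L, math.log(moment_by_enumeration(k, L)) / L, phi(k))
```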

III.2 Rate function $I(\lambda)$ governing the large deviations of the Lyapunov exponent $\lambda$

The rate function $I(\lambda)$ governing the large deviations (Eq 5) of the Lyapunov exponent $\lambda$ of Eq. 23 can be computed either directly if the probability distribution of the sum of Eq. 23 is known or it can be obtained via the reciprocal Legendre transform (Eq. 9) from the knowledge of the function $\phi(k)$ of Eq. 25. Let us now recall some simple examples that will be useful later (in section VII).

III.3 Examples for the equilibrium of disordered classical models

In the field of disordered classical models, the simplest example is when the variable $t_{v(x)}$ corresponds to the Boltzmann weight at inverse temperature $\beta$ of the random potential $v(x)$

$$t_{v(x)} = e^{\beta v(x)} \tag{26}$$

Then Eq. 22 represents the Boltzmann weight of the $L$ sites

$$\tau_{L}[v(.)] = \prod_{x=1}^{L} e^{\beta v(x)} \tag{27}$$

and Eq. 23 corresponds to the energy per site (up to the factor $\beta$ )

$$\lambda[v(.)] = \beta\, \frac{1}{L} \sum_{x=1}^{L} v(x) \tag{28}$$

For instance if the distribution of the potential $v$ is Gaussian of zero mean

$$p^{Gauss}(v) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{v^{2}}{2\sigma^{2}}} \tag{29}$$

then both the rate function $I(\lambda)$ and the scaled cumulant generating function $\phi(k)$ are simply quadratic

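$$I(\lambda) = \frac{\lambda^{2}}{2\beta^{2}\sigma^{2}}, \qquad \phi(k) = \frac{\beta^{2}\sigma^{2}}{2}\, k^{2} \tag{30}$$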
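The quadratic forms follow from a Gaussian integral and a one-line Legendre transform. With $|t_v|=e^{\beta v}$ and the Gaussian distribution of the potential, a short derivation sketch reads:

```latex
\phi(k) = \ln \int dv\, \frac{e^{-\frac{v^{2}}{2\sigma^{2}}}}{\sqrt{2\pi\sigma^{2}}}\, e^{k\beta v}
        = \frac{\beta^{2}\sigma^{2}}{2}\, k^{2},
\qquad
I(\lambda) = \max_{k}\left[k\lambda - \frac{\beta^{2}\sigma^{2}}{2}k^{2}\right]
           = \frac{\lambda^{2}}{2\beta^{2}\sigma^{2}}
\quad \text{(optimum at } k^{*} = \lambda/(\beta^{2}\sigma^{2})\text{)} .
```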

Another example is when the distribution of the potential $v$ is the Bernoulli distribution

[TABLE]

then the rate function $I(\lambda)$ and the scaled cumulant generating function $\phi(k)$ read

[TABLE]

So it is important to stress here that the large deviation properties depend on all the details of the disorder distribution $p_{v}$, in contrast to the small deviations region described by the Central-Limit-Theorem, which corresponds to the expansion at lowest order of the rate function $I(\lambda)$ around its vanishing minimum at the typical value $\lambda^{typ}$ of Eq. 6

[TABLE]
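The link with the Central-Limit-Theorem can be made explicit by expanding $\phi(k)$ of Eq. 25 to second order around $k=0$, using Eq. 10, and performing the reciprocal Legendre transform of Eq. 9, whose optimum lies at $k^{*}\simeq(\lambda-\lambda^{typ})/\phi''(0)$. This is a standard derivation sketch rather than a reproduction of the original equation:

```latex
\phi(k) = \lambda^{typ} k + \frac{\phi''(0)}{2}\, k^{2} + O(k^{3})
\quad\Longrightarrow\quad
I(\lambda) \simeq \frac{(\lambda-\lambda^{typ})^{2}}{2\,\phi''(0)},
\qquad
\phi''(0) = \overline{(\ln|t_{v}|)^{2}} - \left(\overline{\ln|t_{v}|}\right)^{2} .
```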

III.4 Examples for disordered quantum models

For the Anderson Localization model in the Forward approximation of Eq. 13, it is usual to consider the box distribution of width $(2W)$ for the random on-site energy $\epsilon(x)$

[TABLE]

The elementary variable $t_{\epsilon(x)}$ in the product in Eq. 13 at the center of the band $\epsilon(x_{0})=0$

[TABLE]

then has finite moments only in the region $k<1$

[TABLE]

So the scaled cumulant generating function $\phi(k)$ of Eq. 25 reads

[TABLE]

with the corresponding rate function

[TABLE]
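For the box distribution, the moments can be computed explicitly. Assuming that only the modulus $|t_{\epsilon}|=V/|\epsilon|$ of the elementary variable at the band center matters (so that whatever sign convention is used for $t_{\epsilon(x)}$ is irrelevant), $|\epsilon|/W$ is uniform on $[0,1]$ and $\overline{|t_\epsilon|^{k}}=(V/W)^{k}/(1-k)$ for $k<1$. This gives $\phi(k)=k\ln(V/W)-\ln(1-k)$ and, via Eq. 9, $I(\lambda)=y-1-\ln y$ with $y=\lambda-\ln(V/W)>0$, so that $\lambda^{typ}=\ln(V/W)+1$. These closed forms are derived here under the stated assumption, and the sketch below checks them with a Monte Carlo estimate and a numerical Legendre transform; the values of $V$ and $W$ are arbitrary.

```python
import math
import random

V, W = 0.1, 1.0  # assumed hopping V and half-width W of the box distribution
a = math.log(V / W)


def phi_exact(k):
    """phi(k) = ln[(V/W)^k / (1 - k)] for |t| = V/|eps|, finite only for k < 1."""
    return k * a - math.log(1.0 - k)


def rate_exact(lam):
    """Legendre transform of phi_exact: I = y - 1 - ln y with y = lam - ln(V/W) > 0."""
    y = lam - a
    return y - 1.0 - math.log(y)


def rate_numeric(lam):
    """I(lam) = max over a grid of k < 1 of [k lam - phi(k)]."""
    ks = [-30.0 + 0.0005 * i for i in range(61999)]
    return max(k * lam - phi_exact(k) for k in ks)


random.seed(0)
k, n = 0.3, 200_000
mc_moment = sum((V / abs(random.uniform(-W, W))) ** k for _ in range(n)) / n
print(math.log(mc_moment), phi_exact(k))
print(rate_numeric(a + 2.5), rate_exact(a + 2.5))
```

The Monte Carlo check is restricted to $k<1/2$ so that the estimator of the moment has a finite variance; for $k\geq 1$ the moments diverge, consistently with the restriction $k<1$ above.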

IV Empirical 1-point histogram as the level-2 of large deviations

In this section, we focus on the probability of the empirical 1-point histogram of Eq. 15 over the disorder configurations $v(.)$ drawn with Eq. 2

[TABLE]

Of course the typical value of this histogram is the ’true’ probability distribution $p_{v}$ of the disorder (Eq. 1)

$$Q_{v}^{typ} = p_{v} \tag{40}$$

but here the goal is to describe its fluctuations for large $L$ . In the language of large deviations [1, 2, 3], this is known as the ’Level-2 description of the empirical measure’. The essential result is the large deviation form for large $L$

[TABLE]

where

$$C_{1}[Q_{.}] \equiv 1 - \sum_{v} Q_{v} \tag{42}$$

represents the normalization constraint of the empirical histogram (the notation $\delta(Y)$ represents the discrete Kronecker symbol $\delta_{0,Y}$ but has been chosen here for better readability of the argument $Y$ ), while the rate function is the relative entropy of the empirical 1-point histogram $Q_{v}$ with respect to the true probability distribution $p_{v}$ of the disorder

$$S^{rel}(Q_{.}|p_{.}) \equiv \sum_{v} Q_{v} \ln\left(\frac{Q_{v}}{p_{v}}\right) \tag{43}$$

This result is known as the Sanov theorem in the field of large deviations [1, 2, 3] and can be considered as the true cornerstone of the whole theory, with many further generalizations for the higher levels. It is thus important to fully understand its origin and its physical meaning, via the following three derivations.

IV.1 First approach via the multinomial distribution

Since each disorder value $v(x)$ is drawn with probability $p_{v(x)}$ independently on each of the $L$ sites $x=1,2,..,L$ (Eq 2), computing the probability of the empirical 1-point histogram $Q_{.}$ of Eq. 15 amounts to analyzing the integer numbers $(LQ_{v})$ of occurrences of each value $v$, and is thus given by the multinomial distribution

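$$P_{L}[Q_{.}] = \delta\left(C_{1}[Q_{.}]\right) \frac{L!}{\prod_{v} (LQ_{v})!} \prod_{v} p_{v}^{LQ_{v}} \tag{44}$$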

Stirling’s approximation for the factorials $m!\simeq\sqrt{2\pi m}\ m^{m}e^{-m}$ then yields the large deviation form of Eq. 41 with the relative entropy of Eq. 43. This derivation, based on applying Stirling’s approximation to the multinomial distribution of Eq. 44, goes back to Boltzmann [2] and appears in all statistical physics lectures.
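Sanov's result can be checked directly on the multinomial distribution. The sketch below assumes a hypothetical three-valued disorder distribution $p_v$ and a fixed atypical histogram $Q_v$, evaluates $\ln P_L[Q_.]$ exactly with `math.lgamma` (so that no Stirling approximation is made), and compares $-\frac{1}{L}\ln P_L[Q_.]$ with $\sum_v Q_v\ln(Q_v/p_v)$ as $L$ grows. The remaining difference decays like $(\ln L)/L$ and comes from the $\sqrt{2\pi m}$ prefactors in Stirling's formula.

```python
import math

p = [0.5, 0.3, 0.2]  # hypothetical disorder distribution p_v
Q = [0.2, 0.3, 0.5]  # hypothetical atypical empirical histogram Q_v


def log_multinomial_probability(L):
    """Exact ln P_L[Q.] from the multinomial distribution, using lgamma(m + 1) = ln m!."""
    counts = [round(L * q) for q in Q]
    if sum(counts) != L:
        raise ValueError("L * Q_v must be integers summing to L")
    log_prob = math.lgamma(L + 1)
    for m, pv in zip(counts, p):
        log_prob += m * math.log(pv) - math.lgamma(m + 1)
    return log_prob


relative_entropy = sum(q * math.log(q / pv) for q, pv in zip(Q, p))
for L in (10, 100, 1000, 10000):
    print(L, -log_multinomial_probability(L) / L, relative_entropy)
```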

IV.2 Second approach via the generating function

Another derivation is based on the generating function of the empirical 1-point histogram of Eq. 39

[TABLE]

This factorized form is valid already for any finite $L$ and the corresponding scaled cumulant generating function $\Phi[\nu_{.}]$ governing the exponential growth with $L$

[TABLE]

is given in terms of the generating function of the disorder distribution $p_{v}$

[TABLE]

where the analogy with Eq. 25 is clear. It is now useful to show the link with the relative entropy of Eq. 43 via the Legendre transform and the reciprocal Legendre transform, respectively.

IV.2.1 Link with the relative entropy via the Legendre transform

The generating function of Eq 45 can be rewritten in terms of Eq. 41 as

[TABLE]

Laplace’s saddle-point method for large $L$ implies that one should optimize over $Q_{.}$ the function in the exponential in the presence of the normalization constraint $\left(1-\sum_{v}Q_{v}\right)$ in order to obtain the function $\Phi[\nu_{.}]$ of Eq. 46

[TABLE]

Taking into account the constraint via some Lagrange multiplier $\eta$ , one needs to optimize the functional

[TABLE]

over the values $Q_{v}$

[TABLE]

One obtains the optimal solution

[TABLE]

where the Lagrange multiplier $\eta$ is fixed by the constraint

[TABLE]

The optimal value of the functional of Eq. 50

[TABLE]

indeed coincides with the result of Eq. 47.
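For completeness, here is one way to write the constrained optimization explicitly. The symbol ${\cal F}$ for the functional and the sign convention for the Lagrange multiplier $\eta$ are our own choices and may differ from the original notation; the optimal histogram turns out to be the tilted distribution $Q_v^{opt}=p_ve^{\nu_v}/\sum_{v'}p_{v'}e^{\nu_{v'}}$:

```latex
\begin{aligned}
{\cal F}[Q_{.};\eta] &= \sum_{v} \nu_{v} Q_{v} - \sum_{v} Q_{v} \ln\frac{Q_{v}}{p_{v}} + \eta\Big(1 - \sum_{v} Q_{v}\Big), \\
\frac{\partial {\cal F}}{\partial Q_{v}} &= \nu_{v} - \ln\frac{Q_{v}}{p_{v}} - 1 - \eta = 0
\quad\Longrightarrow\quad Q_{v}^{opt} = p_{v}\, e^{\nu_{v} - 1 - \eta}, \\
\sum_{v} Q_{v}^{opt} = 1 &\quad\Longrightarrow\quad e^{1+\eta} = \sum_{v} p_{v}\, e^{\nu_{v}}, \\
{\cal F}[Q^{opt}_{.};\eta] &= (1+\eta) \sum_{v} Q_{v}^{opt} = \ln\Big(\sum_{v} p_{v}\, e^{\nu_{v}}\Big) .
\end{aligned}
```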

IV.2.2 Link with the relative entropy via the reciprocal Legendre transform

The reciprocal Legendre transform of Eq. 49 reads

[TABLE]

The optimization over $\nu_{v}$

[TABLE]

yields the optimal solution

[TABLE]

and the optimal value of the functional of Eq. 55

[TABLE]

coincides with the relative entropy as it should.
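The reciprocal Legendre transform can also be checked numerically. In the sketch below (hypothetical $p_v$ and $Q_v$), the choice $\nu_v=\ln(Q_v/p_v)$ reaches the relative entropy, while randomly drawn $\nu_.$ never exceed it, as expected from the concavity of $\sum_v\nu_vQ_v-\Phi[\nu_.]$. Shifting all $\nu_v$ by the same constant leaves the objective unchanged, so the optimum is only defined up to such a shift.

```python
import math
import random

p = [0.5, 0.3, 0.2]  # hypothetical disorder distribution p_v
Q = [0.2, 0.3, 0.5]  # hypothetical empirical histogram Q_v


def Phi(nu):
    """Phi[nu] = ln( sum_v p_v e^{nu_v} )."""
    return math.log(sum(pv * math.exp(x) for pv, x in zip(p, nu)))


def objective(nu):
    """sum_v nu_v Q_v - Phi[nu], maximized over nu in the reciprocal Legendre transform."""
    return sum(x * q for x, q in zip(nu, Q)) - Phi(nu)


relative_entropy = sum(q * math.log(q / pv) for q, pv in zip(Q, p))
nu_opt = [math.log(q / pv) for q, pv in zip(Q, p)]

random.seed(0)
best_random = max(objective([random.gauss(0.0, 2.0) for _ in Q]) for _ in range(1000))
print(objective(nu_opt), relative_entropy, best_random)
```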

These calculations based on generating functions, Laplace’s saddle-point method with constraints taken into account via Lagrange multipliers, and Legendre transforms are very standard both in statistical physics and in the theory of large deviations.

IV.3 Third approach via some appropriate change of measure

The third approach via some appropriate change of measure is very common in the whole field of large deviations, but appears to be less well known among physicists. It seems thus useful to explain it here in more physical terms than usual. The starting point is that the probability of the disorder configuration $[v(x)]_{x=1,2,..,L}$ of Eq. 2 can be rewritten only in terms of the empirical 1-point histogram of Eq. 15

$$P_{L}[v(.)] = \prod_{x=1}^{L} p_{v(x)} = \prod_{v} p_{v}^{L Q_{v}[v(.)]} \tag{59}$$

So all the disorder configurations that have the same empirical 1-point histogram $Q_{.}$ have the same probability in Eq. 59. As a consequence, the normalization of Eq. 59 over all disorder configurations $[v(.)]$ can be rewritten as a sum over the possible empirical 1-point histogram $Q_{.}$

[TABLE]

where

[TABLE]

counts the number of disorder configurations that are associated to the same value $Q_{.}$ of the empirical histogram. So the probability $P_{L}[Q_{.}]$ of Eq 39 to observe the empirical histogram $Q_{.}$ reads

[TABLE]

When the empirical 1-point histogram takes its typical value $p_{v}$ of Eq. 40, the probability of Eq. 62

[TABLE]

should not decay exponentially in $L$ , so that $\Omega_{L}[p_{.}]$ should grow exponentially in $L$ in order to compensate exactly the other exponential factor

[TABLE]

To obtain the behavior of $\Omega_{L}[Q_{.}]$ when the empirical 1-point histogram $Q_{.}$ is different from its typical value $Q_{.}^{typ}=p_{.}$ , we may consider a modified model where the disorder is drawn with the modified probability ${\tilde{p}}_{v}=Q_{v}$ that will make $Q_{v}$ typical for this modified model, and one obtains

[TABLE]

where

$$S_{1}[Q_{.}] \equiv -\sum_{v} Q_{v} \ln Q_{v} \tag{66}$$

represents the entropy of the empirical 1-point histogram $Q_{.}$ . Plugging this result into Eq. 62 yields that the large deviation behavior of the probability of the empirical 1-point histogram

[TABLE]

involves again the relative entropy $S^{rel}(Q_{.}|p_{.})$ as it should to recover Eq 41 and Eq. 43.

This idea of evaluating the large deviation properties of atypical values of an empirical observable, via a modified model that makes this empirical observable typical, is used extensively in the field of large deviations for the following two reasons. From the conceptual point of view, this way of thinking is very illuminating because it shows very clearly why the entropy $S_{1}[Q_{.}]$ appears in Eq. 65 and why the relative entropy $S^{rel}(Q_{.}|p_{.})$ appears in Eq 67. From the technical point of view, it is extremely powerful, since it allows one to obtain the results directly without any actual computation: one does not need to use combinatorics to enumerate the appropriate configurations in finite size as in Eq. 44, nor to compute the generating function of Eq. 45 and perform the reciprocal Legendre transform; the rate function follows directly from simple considerations. In the following sections concerning the more complicated cases of empirical histograms of higher orders, as well as in the Appendix, we will see how this approach can be adapted to each case in order to obtain the appropriate rate functions directly without any calculation.
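The counting behind the change of measure can be illustrated numerically. For the 1-point histogram, the number of configurations with a given $Q_.$ is the multinomial coefficient $\Omega_L[Q_.]=L!/\prod_v(LQ_v)!$, and the sketch below (hypothetical $Q_v$) checks that $\frac{1}{L}\ln\Omega_L[Q_.]$ approaches the entropy $S_1[Q_.]=-\sum_vQ_v\ln Q_v$, which is exactly what the change-of-measure argument obtains without any combinatorics.

```python
import math

Q = [0.2, 0.3, 0.5]  # hypothetical empirical histogram Q_v


def log_omega(L):
    """ln of the number of length-L configurations with 1-point histogram Q (multinomial coefficient)."""
    counts = [round(L * q) for q in Q]
    return math.lgamma(L + 1) - sum(math.lgamma(m + 1) for m in counts)


S1 = -sum(q * math.log(q) for q in Q)
for L in (10, 100, 1000, 10000):
    print(L, log_omega(L) / L, S1)
```

For $L=10$ the count is $10!/(2!\,3!\,5!)=2520$, and the convergence towards $S_1$ is again slowed down by $(\ln L)/L$ corrections.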

V Empirical 2-point histogram as the Level 2.5 of large deviations

In this section, we focus on the probability of the empirical 2-point histogram of Eq. 16 for $r=2$ over the disorder configurations $[v(.)]$ drawn with Eq. 2

[TABLE]

Its large deviations properties have been analyzed in the context of Markov chains [11, 59, 3]. Together with its analog formulations for Markov jump processes in continuous time [11, 60, 61, 62, 63, 64, 65, 66, 15, 67, 68, 69] and for diffusion processes [63, 70, 64, 71, 15], it is nowadays called the ’Level 2.5’ in the field of large deviations.

V.1 Constraints on the empirical 2-point histogram $Q_{..}$

Since the empirical 1-point histogram $Q_{.}$ can be reconstructed by summing over the last or the first value of the empirical 2-point histogram $Q_{..}$ , it is convenient to introduce the following notation to summarize these constraints

[TABLE]

while the empirical 1-point histogram $Q_{.}$ should of course still satisfy the normalization constraint $C_{1}[Q_{.}]$ of Eq. 42.

V.2 Generalized Markovian model for the disorder

In order to analyze the statistical properties of the empirical 2-point histogram, it is useful to introduce a generalized model where the disorder configurations are generated by a Markov chain where the transition probability matrix $W_{v^{\prime}v}$ to go from $v$ to $v^{\prime}$ is normalized to unity

$$\sum_{v'} W_{v'v} = 1 \tag{70}$$

The probability of Eq. 2 for a disorder configuration $[v(x)]_{x=1,2,..,L}$ is thus replaced by the product of the transition probabilities along the configuration (up to boundary terms that become negligible for large $L\to+\infty$ )

$$P_{L}[v(.)] \simeq \prod_{x=1}^{L} W_{v(x+1)\, v(x)} \tag{71}$$

It is also useful to introduce the stationary state $\rho_{v}$ of this Markov chain satisfying

[TABLE]

with the normalization

[TABLE]

For this generalized Markovian model, the typical value of the empirical 1-point histogram of Eq. 15 is simply the stationary state $\rho_{v}$ introduced in Eq. 72

[TABLE]

while the typical value of the empirical 2-point histogram is given by the corresponding flow appearing in Eq. 72

[TABLE]

which satisfies the constraints of Eqs. 69 and 42.
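The stationary state of Eq. 72 and the typical 2-point histogram of Eq. 75 are easy to obtain numerically. The sketch below assumes a small chain stored as a nested list with `W[vp][v]` standing for $W_{v^{\prime}v}$ (columns summing to one) and uses plain power iteration; it then checks that both marginals of the flow $W_{v^{\prime}v}\rho_{v}$ reproduce $\rho_{v}$, as required by the constraints of Eq. 69. The two-state matrix is a hypothetical example.

```python
def stationary_state(W, tol=1e-14, max_iter=100_000):
    """Power iteration for rho_{v'} = sum_v W[v'][v] rho_v (W column-stochastic)."""
    n = len(W)
    rho = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(W[vp][v] * rho[v] for v in range(n)) for vp in range(n)]
        if max(abs(a - b) for a, b in zip(new, rho)) < tol:
            return new
        rho = new
    raise RuntimeError("power iteration did not converge")

def typical_flow(W, rho):
    """Typical empirical 2-point histogram Q^typ_{v'v} = W_{v'v} rho_v."""
    n = len(W)
    return [[W[vp][v] * rho[v] for v in range(n)] for vp in range(n)]

# Hypothetical two-state chain: W[vp][v] is the probability to go from v to vp.
W = [[0.9, 0.4],
     [0.1, 0.6]]
rho = stationary_state(W)
Q2 = typical_flow(W, rho)
marg_vp = [sum(Q2[vp][v] for v in range(2)) for vp in range(2)]  # summed over the departure value v
marg_v = [sum(Q2[vp][v] for vp in range(2)) for v in range(2)]   # summed over the arrival value v'
print(rho, marg_vp, marg_v)
```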

Since the probability of Eq. 71 can be rewritten only in terms of the empirical 2-point histogram $Q_{v^{\prime}v}$ as

[TABLE]

the normalization over disorder configurations can be rewritten as a sum over the empirical 1-point and 2-point histograms, subject to their constraints of Eqs. 42 and 69, as

[TABLE]

where $\Omega_{L}[Q_{..},Q_{.}]$ counts the number of disorder configurations that have the empirical observables $[Q_{..},Q_{.}]$ and is thus the direct generalization of Eq. 61, while the probability to observe these empirical observables reads

[TABLE]

For the typical values of the empirical observables given in Eqs. 74 and 75, this probability should not be exponentially small in $L$, so that $\Omega_{L}[Q^{typ}_{..},Q^{typ}_{.}]$ should exactly compensate the other exponential factor in Eq. 78

[TABLE]

For other values of the empirical observables, one may consider a modified Markov transition matrix ${\tilde{W}}_{v^{\prime}v}$ that would make these empirical histograms typical: Eqs. 74 and 75 imply that the appropriate choice is

[TABLE]

so that Eq 79 becomes

[TABLE]

where $S_{2}[Q_{..}]$ represents the entropy of the empirical 2-point histogram $Q_{..}$

[TABLE]

while $S_{1}[Q_{.}]$ is the entropy of the empirical 1-point histogram $Q_{.}$ introduced in Eq 66.

Plugging Eq. 81 into Eq 78 yields the large deviation form [11, 59, 3]

[TABLE]

which is nowadays called the ’Level 2.5’ for Markov chains. The rate function can be interpreted as the relative entropy for Markov chains [11, 59, 3]. Analogous results have been studied extensively for Markov jump processes in continuous time [11, 60, 61, 62, 63, 64, 65, 66, 15, 67, 68, 69] and for diffusion processes [63, 70, 64, 71, 15].
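Combining the growth $\Omega_{L}\simeq e^{L(S_{2}[Q_{..}]-S_{1}[Q_{.}])}$ of Eq. 81 with the weight $e^{L\sum_{v^{\prime},v}Q_{v^{\prime}v}\ln W_{v^{\prime}v}}$ of a Markov configuration, the exponent of the Level 2.5 should reduce to the relative entropy $\sum_{v^{\prime},v}Q_{v^{\prime}v}\ln\frac{Q_{v^{\prime}v}}{W_{v^{\prime}v}Q_{v}}$. The sketch below evaluates this rate function for 2-point histograms satisfying the constraints of Eq. 69 and checks numerically both the identity with the entropies and its vanishing at the typical flow; the matrices are hypothetical examples and `W[vp][v]` again stands for $W_{v^{\prime}v}$.

```python
import math

def one_point(Q2):
    """Q_v = sum_{v'} Q_{v'v}; under the constraints of Eq. 69 it also equals sum_{v'} Q_{v v'}."""
    n = len(Q2)
    return [sum(Q2[vp][v] for vp in range(n)) for v in range(n)]

def entropy(values):
    """-sum q ln q over the entries of a histogram."""
    return -sum(q * math.log(q) for q in values if q > 0)

def level_2_5_rate(Q2, W):
    """Rate function sum_{v',v} Q_{v'v} ln[Q_{v'v} / (W_{v'v} Q_v)]."""
    n = len(Q2)
    Q1 = one_point(Q2)
    return sum(Q2[vp][v] * math.log(Q2[vp][v] / (W[vp][v] * Q1[v]))
               for vp in range(n) for v in range(n) if Q2[vp][v] > 0)

def rate_from_entropies(Q2, W):
    """Same exponent written as -(S_2 - S_1) - sum Q_{v'v} ln W_{v'v}."""
    n = len(Q2)
    S2 = entropy([Q2[vp][v] for vp in range(n) for v in range(n)])
    S1 = entropy(one_point(Q2))
    cross = sum(Q2[vp][v] * math.log(W[vp][v])
                for vp in range(n) for v in range(n) if Q2[vp][v] > 0)
    return -(S2 - S1) - cross

W = [[0.9, 0.4],
     [0.1, 0.6]]          # hypothetical chain, W[vp][v] = W_{v'v}
Q_typ = [[0.72, 0.08],
         [0.08, 0.12]]    # typical flow W_{v'v} rho_v with rho = (0.8, 0.2)
Q_other = [[0.5, 0.1],
           [0.1, 0.3]]    # another histogram obeying Eq. 69 (hypothetical)
print(level_2_5_rate(Q_typ, W), level_2_5_rate(Q_other, W), rate_from_entropies(Q_other, W))
```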

V.3 Return to the initial disorder of Eq. 2

The initial disorder model of Eq. 2 corresponds to the special case where the Markov matrix of Eq. 70 reduces to

[TABLE]

Then Eq 83 simplifies into

[TABLE]

In the last expression, one recognizes the probability $P_{L}[Q_{.}]$ of the empirical 1-point histogram $Q_{.}$ of Eq. 67. It follows that the conditional probability to observe the empirical 2-point histogram $Q_{..}$, once the empirical 1-point histogram $Q_{.}$ is given, reads

[TABLE]

In particular, once the empirical 1-point histogram $Q_{.}$ is given, the typical value of the empirical 2-point histogram $Q_{..}$ is simply the product

[TABLE]

as it should, while Eq. 86 describes the large deviations away from this typical value.
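For the i.i.d. disorder of Eq. 2, where $W_{v^{\prime}v}=p_{v^{\prime}}$ does not depend on the departure value $v$, the Level 2.5 exponent splits into the relative entropy of the 1-point histogram and the exponent of the conditional probability of Eq. 86, $\sum_{v^{\prime},v}Q_{v^{\prime}v}\ln\frac{Q_{v^{\prime}v}}{Q_{v^{\prime}}Q_{v}}$, which vanishes exactly for the product $Q_{v^{\prime}}Q_{v}$. The check below is a minimal numerical sketch with hypothetical values of $p_{v}$ and $Q_{v^{\prime}v}$.

```python
import math

def one_point(Q2):
    n = len(Q2)
    return [sum(Q2[vp][v] for vp in range(n)) for v in range(n)]

def iid_level_2_5_rate(Q2, p):
    """Level-2.5 exponent with W_{v'v} = p_{v'}: sum Q_{v'v} ln[Q_{v'v} / (p_{v'} Q_v)]."""
    n = len(Q2)
    Q1 = one_point(Q2)
    return sum(Q2[vp][v] * math.log(Q2[vp][v] / (p[vp] * Q1[v]))
               for vp in range(n) for v in range(n) if Q2[vp][v] > 0)

def conditional_rate(Q2):
    """Exponent of the probability of Q_.. given Q_. : sum Q_{v'v} ln[Q_{v'v} / (Q_{v'} Q_v)]."""
    n = len(Q2)
    Q1 = one_point(Q2)  # valid for both indices because Q2 obeys the constraints of Eq. 69
    return sum(Q2[vp][v] * math.log(Q2[vp][v] / (Q1[vp] * Q1[v]))
               for vp in range(n) for v in range(n) if Q2[vp][v] > 0)

def relative_entropy(Q1, p):
    return sum(q * math.log(q / pv) for q, pv in zip(Q1, p) if q > 0)

p = [0.7, 0.3]                       # hypothetical disorder probabilities
Q2 = [[0.5, 0.1],
      [0.1, 0.3]]                    # hypothetical consistent 2-point histogram
Q1 = one_point(Q2)
Q2_product = [[Q1[vp] * Q1[v] for v in range(2)] for vp in range(2)]
total = iid_level_2_5_rate(Q2, p)
print(total, conditional_rate(Q2) + relative_entropy(Q1, p), conditional_rate(Q2_product))
```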

VI Empirical higher order histograms as the level 3 of large deviations

In the language of large deviations, the Level 3 actually denotes the empirical process that can be constructed from the knowledge of the empirical r-point histogram in the limit $r\to+\infty$ [1, 3]. In this section, we will not be interested in taking this limit; instead, we wish to analyze the hierarchy of the empirical r-point histograms of arbitrary order $r$ up to the maximal value $r_{max}=L$ (Eq 17), in order to characterize the sample-to-sample fluctuations for a disordered ring of large size $L$. So, strictly speaking, this section lies between the Level 2.5 of the previous section and the Level 3 concerning the limit $r\to+\infty$.

VI.1 Large deviations properties of the empirical r-point histograms of arbitrary order $r$

In the two previous sections, we have described in detail the large deviation properties of the empirical 1-point histogram $Q_{.}$ and of the 2-point histogram $Q_{..}$. By iteration, one may analyze similarly the properties of the empirical r-point histogram $Q_{(rpoints)}$ of Eq. 16 of arbitrary order $r$. Since the empirical $(r-1)$-point histogram $Q_{((r-1)points)}$ can be reconstructed by summing over the last or the first value of the r-point histogram $Q_{(rpoints)}$, it is convenient to introduce the following notation, analogous to Eq 69, to summarize these constraints

[TABLE]

The final result is that the probability $P_{L}(Q_{(rpoints)},...,Q_{..},Q_{.})$ to observe the empirical histograms up to the $r$ -point histogram $Q_{(rpoints)}$ normalized to unity

[TABLE]

follows the large deviation form

[TABLE]

that generalizes Eq. 85. Besides the consistency constraints $(C_{1},..,C_{r})$ up to order $r$ (Eq 88) and besides the disorder configuration weight $e^{L\sum_{v}Q_{v}\ln(p_{v})}$ of Eq. 59 that only involves the empirical 1-point histogram $Q_{v}$ , the remaining factor corresponds to the exponential growth of the number of configurations that have some empirical r-point histogram $Q_{(rpoints)}$

[TABLE]

in terms of the entropy of the empirical r-point histogram $Q_{(rpoints)}$

[TABLE]

Equivalently, Eq. 90 means that the conditional probability to observe the empirical $r$ -point histogram $Q_{(rpoints)}$ once the empirical $(r-1)$ -point histogram $Q_{((r-1)points)}$ is given reads

[TABLE]

which is the generalization of Eq. 86.
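The empirical $r$-point histograms and the consistency constraints of Eq. 88 can be computed directly for a given ring configuration. In the sketch below, which assumes the periodic ring convention of Eq. 16, a histogram is stored as a dictionary keyed by the window $(v(x),v(x+1),..,v(x+r-1))$ read from left to right along the ring (the text lists the same values in the order $v_{r}...v_{2}v_{1}$, and we assume $v_{1}$ sits at the first site of the window); the function names and the configuration are arbitrary examples.

```python
import math
from collections import Counter

def empirical_histogram(config, r):
    """Fraction of sites x of the ring whose window (v(x), ..., v(x+r-1)) takes each value."""
    L = len(config)
    counts = Counter(tuple(config[(x + k) % L] for k in range(r)) for x in range(L))
    return {window: c / L for window, c in counts.items()}

def drop_value(hist, which):
    """Sum an r-point histogram over its first or last value, giving an (r-1)-point histogram."""
    out = Counter()
    for window, q in hist.items():
        out[window[1:] if which == "first" else window[:-1]] += q
    return dict(out)

def entropy(hist):
    """S_r = -sum Q ln Q of an empirical histogram."""
    return -sum(q * math.log(q) for q in hist.values() if q > 0)

def same(h1, h2, tol=1e-12):
    return all(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) < tol for k in set(h1) | set(h2))

config = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]   # arbitrary ring of L = 10 sites
hists = {r: empirical_histogram(config, r) for r in range(1, 5)}
for r in range(2, 5):
    # constraints C_r: both marginals of Q_(r points) give back Q_((r-1) points)
    print(r, same(drop_value(hists[r], "first"), hists[r - 1]),
          same(drop_value(hists[r], "last"), hists[r - 1]), entropy(hists[r]))
```

The entropies $S_{r}$ are non-decreasing with $r$, since each lower-order histogram is a marginal of the higher-order one.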

VI.2 Analysis of the hierarchy in the backward direction via contraction

Up to now we have described the hierarchy of empirical histograms by considering successively higher orders $r$. But it is also useful to see how one goes backwards in this hierarchy, via the notion of ’contraction’, which is the generic name in the field of large deviations for the operation needed to go from a higher to a lower level of description. In our present case, the contraction consists in finding the optimal empirical $r$-point histogram that maximizes the conditional probability of Eq. 93 when all the lower-order empirical histograms are given. One needs to maximize the exponential factor in Eq. 93 in the presence of the constraints $C_{r}[Q_{(rpoints)},Q_{(r-1)points}]$ of Eq. 88, which can be taken into account via Lagrange multipliers. So one considers the following functional of $Q_{(rpoints)}$

[TABLE]

The optimization with respect to $Q_{v_{r}...v_{2}v_{1}}$

[TABLE]

yields the optimal solution

[TABLE]

where the Lagrange multipliers $f_{v_{r}...v_{2}}$ and $g_{v_{r-1}...v_{1}}$ have to be chosen to satisfy the constraints

[TABLE]

A further consequence is thus the following constraint involving the empirical histogram of order $(r-2)$

[TABLE]

The last four equations imply that the optimal solution of Eq. 96 can be simply rewritten as the product of the two empirical observables of order $(r-1)$ of Eq. 97 divided by the empirical observable of order $(r-2)$ of Eq. 98

[TABLE]

One then needs to evaluate the entropy of Eq. 92 of this optimal solution $Q_{v_{r}...v_{2}v_{1}}^{*}$

[TABLE]

So the functional of Eq. 94 vanishes for this optimal solution $Q^{*}_{(rpoints)}$

[TABLE]

i.e. the conditional probability of Eq. 93 does not decay exponentially in $L$ for this optimal solution $Q^{*}_{(rpoints)}$ , that represents the typical value of $Q_{v_{r}...v_{2}v_{1}}$ once all the empirical histograms of lower order are given

[TABLE]

The probability of all other values is described by the large deviation form of Eq. 93.
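For $r=3$, the contraction result can be checked directly: starting from a consistent 2-point histogram, the product $Q^{*}_{v_{3}v_{2}v_{1}}=Q_{v_{3}v_{2}}Q_{v_{2}v_{1}}/Q_{v_{2}}$ reproduces both 2-point marginals, and its entropy satisfies $S_{3}-S_{2}=S_{2}-S_{1}$, i.e. the entropy increment is the same at consecutive orders. Reading the exponent of Eq. 93 as the difference between these two increments (our assumption, since that equation is not reproduced here), this is the statement that the functional vanishes at the optimum. The sketch stores histograms as dictionaries keyed by windows of consecutive values read along the ring, uses a hypothetical 2-point histogram, and also checks that a perturbation preserving both marginals lowers the entropy.

```python
import math

def entropy(hist):
    return -sum(q * math.log(q) for q in hist.values() if q > 0)

def contract_optimum(Q2, Q1):
    """Optimal 3-point histogram Q*(a, b, c) = Q2(a, b) Q2(b, c) / Q1(b)."""
    Q3 = {}
    for (a, b), q_ab in Q2.items():
        for (b2, c), q_bc in Q2.items():
            if b2 == b and Q1[b] > 0:
                Q3[(a, b, c)] = q_ab * q_bc / Q1[b]
    return Q3

def marginal(Q3, drop):
    """Sum a 3-point histogram over its first or last value."""
    out = {}
    for (a, b, c), q in Q3.items():
        key = (b, c) if drop == "first" else (a, b)
        out[key] = out.get(key, 0.0) + q
    return out

# Hypothetical consistent 2-point histogram (keys: (v at x, v at x+1)) and its 1-point marginal
Q2 = {(0, 0): 0.72, (0, 1): 0.08, (1, 0): 0.08, (1, 1): 0.12}
Q1 = {0: 0.8, 1: 0.2}
Q3 = contract_optimum(Q2, Q1)

# A perturbation that leaves both 2-point marginals unchanged
eps = 1e-3
perturbed = dict(Q3)
for key, sign in (((0, 0, 0), 1), ((0, 0, 1), -1), ((1, 0, 0), -1), ((1, 0, 1), 1)):
    perturbed[key] += sign * eps

S1, S2, S3 = entropy(Q1), entropy(Q2), entropy(Q3)
print(S3 - S2, S2 - S1, entropy(perturbed) - S3)
```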

VII Random models on the Cayley tree from large deviations of branches

Many random models have been studied on the geometry of the Cayley tree, where the absence of loops allows one to write exact recurrences for probability distributions: two famous examples are the Directed Polymer on the Cayley tree [72, 73] and Anderson Localization on the Cayley tree [74, 75, 76, 77]. In the Cayley tree of branching ratio $K$ around the central root $O$, the number of sites at distance $r$

[TABLE]

grows exponentially with the distance $r$ , in contrast to the power-law growth as $r^{d-1}$ in any finite dimension $d$ . The Cayley tree is thus considered as an appropriate way to define the mean-field version of random models in infinite dimensionality $d=\infty$ .

It is interesting to compare the properties of the same random model defined in the two following geometries :

(i) in the finite Cayley tree of branching ratio $K$ with $L$ generations around the central root $O$ , where the number of leaves is given by Eq. 103 for $r=L$

[TABLE]

(ii) in the star geometry, where the central root $O$ is linked to $K^{L}$ independent one-dimensional lattices of $L$ sites, so that the number of sites at distance $r$ is actually independent of $r$

[TABLE]

but the number of leaves at $r=L$ displays the same exponential behavior in $L$ as Eq. 104.

Although (ii) may look like an extremely crude approximation of (i), the properties of some random models defined on (i) and (ii) have turned out to be very close, as exemplified by the exact solutions of (i) the Directed Polymer on the Cayley tree [72, 73] and of (ii) the Directed Polymer in the star geometry, which coincides with the Random Energy Model [78] (a model that had been introduced earlier with completely different motivations coming from mean-field spin glasses). The differences between the two appear only in the finite-size scaling properties of the freezing transition [73].

In the star geometry (ii), it is clear that the random model will be governed by the large deviations properties of the corresponding one-dimensional model of length $L$ that appear on the $K^{L}$ independent branches. In this section, the goal is thus to describe how the large deviations properties of one-dimensional models that have been discussed in the previous sections can be used to analyze the properties of the same model on this star geometry (ii).

VII.1 Model on the star geometry where each branch corresponds to a product of random variables

We wish to analyze the star geometry (ii) above, where each of the $K^{L}$ independent branches, labelled by $b=1,2,..,K^{L}$, can be described by a product of $L$ random variables as in Eq. 11

[TABLE]

with its corresponding finite-size Lyapunov exponent of Eq. 23

[TABLE]

whose large deviations properties for large $L$ are described by some rate function $I(\lambda)$

[TABLE]

Each disorder configuration on the star geometry can then be characterized by the empirical histogram of the Lyapunov exponents $\lambda_{b}$ of Eq. 107 over the $K^{L}$ independent branches

[TABLE]

while the empirical number of branches having the Lyapunov exponent $\lambda$ reads

[TABLE]

In various models, an interesting class of observables is given by the sums, over the $K^{L}$ independent branches, of the powers of non-integer order $k$ of the products $\tau_{L}[v_{b}(.)]$ of Eq. 106

[TABLE]

that can be rewritten in terms of the Lyapunov exponents $\lambda_{b}$ (Eq 107) of the $K^{L}$ branches or in terms of the empirical observables of Eqs 109 and 110 as

[TABLE]

VII.2 Statistical properties of the empirical histogram ${\cal Q}_{L}(\lambda)$ of the Lyapunov exponent

The typical value of the empirical histogram of Eq. 109 is given by the true probability of the Lyapunov exponent of Eq. 108

[TABLE]

so that in a given sample, the empirical number of branches of Eq. 110 has for typical value

[TABLE]

The typical value $\lambda^{typ}$ of the one-dimensional model corresponding to the vanishing of the rate function $I(\lambda^{typ})=0$ will thus appear in an extensive number of the branches

[TABLE]

while all the other values in the interval $\lambda^{-}<\lambda<\lambda^{+}$ where

[TABLE]

will appear in a sub-extensive number $e^{L\left[\ln K-I(\lambda)\right]}$ of branches. Finally, the values of the Lyapunov exponent outside this interval, i.e. in the two regions $\lambda<\lambda^{-}$ and $\lambda>\lambda^{+}$ where the rate function satisfies $I(\lambda)>\ln K$ are too rare to appear in a typical sample of the star geometry, so that Eq. 114 should be rewritten more precisely for a typical sample as

[TABLE]
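The boundaries $\lambda^{\pm}$, defined by $I(\lambda^{\pm})=\ln K$ on either side of $\lambda^{typ}$ (Eq. 116), can be found by bisection once the rate function is known. As a hypothetical test case we take a Gaussian rate function $I(\lambda)=(\lambda-\mu)^{2}/(2\sigma^{2})$, for which $\lambda^{\pm}=\mu\pm\sigma\sqrt{2\ln K}$; the sketch assumes that $I$ is convex, vanishes at $\lambda^{typ}$ and increases away from it.

```python
import math

def bisect_level(f, a, b, level, tol=1e-13, max_iter=200):
    """Find x between a and b with f(x) = level, assuming f(a) < level < f(b)."""
    for _ in range(max_iter):
        m = 0.5 * (a + b)
        if f(m) < level:
            a = m
        else:
            b = m
        if abs(b - a) < tol:
            break
    return 0.5 * (a + b)

def boundaries(I, lam_typ, K, span=100.0):
    """lambda^- < lam_typ < lambda^+ with I(lambda^pm) = ln K (I convex, I(lam_typ) = 0)."""
    lnK = math.log(K)
    return (bisect_level(I, lam_typ, lam_typ - span, lnK),
            bisect_level(I, lam_typ, lam_typ + span, lnK))

mu, sigma, K = -0.3, 0.5, 2          # hypothetical Gaussian example
I = lambda lam: (lam - mu) ** 2 / (2 * sigma ** 2)
lam_minus, lam_plus = boundaries(I, mu, K)
exact = (mu - sigma * math.sqrt(2 * math.log(K)), mu + sigma * math.sqrt(2 * math.log(K)))
print(lam_minus, lam_plus, exact)
```

Values of $\lambda$ outside $[\lambda^{-},\lambda^{+}]$ have $\ln K-I(\lambda)<0$ and are therefore absent from a typical sample.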

However the values $\lambda<\lambda^{-}$ and $\lambda>\lambda^{+}$ that do not appear in a typical sample may appear in atypical samples, and it is thus interesting to consider the large deviations of the empirical histogram ${\cal Q}_{L}(.)$ of Eq. 109 with respect to its typical value ${\cal Q}^{typ}_{L}(.)={\cal P}_{L}(.)$ of Eq. 113 : since the $K^{L}$ branches are independent, one may directly adapt the Sanov result of Eq. 41 to our present notations : the probability to observe the empirical histogram ${\cal Q}_{L}(.)$ follows the large deviation form with respect to the size $K^{L}$

[TABLE]

where the rate function corresponds to the relative entropy

[TABLE]

of the empirical histogram ${\cal Q}_{L}(.)$ with respect to the true probability distribution ${\cal P}_{L}(.)$ of the Lyapunov exponent (Eq. 108). As explained in detail in section IV.2, the Sanov result of Eq. 118 is equivalent to the following expression of the generating function that is valid for any finite $L$ (Eq. 45 as adapted to our present context)

[TABLE]

In particular, the successive derivatives with respect to $\nu(\lambda)$

[TABLE]

give, after setting $\nu(.)=0$, the integer moments of the number ${\cal N}_{L}(\lambda)=K^{L}{\cal Q}_{L}(\lambda)$ of branches with some Lyapunov exponent $\lambda$ (Eq 110). The first moment

[TABLE]

coincides with the typical value ${\cal N}^{typ}_{L}(\lambda)$ of Eq. 114. The second moment

[TABLE]

can be rewritten in terms of the typical value ${\cal N}^{typ}_{L}(\lambda)$ of Eq. 114 as

[TABLE]

and will thus change behavior at the values $\lambda^{\pm}$ introduced in Eq. 116. In the region $\lambda^{-}<\lambda<\lambda^{+}$, where ${\cal N}^{typ}_{L}(\lambda)$ is exponentially large, the second term dominates over the first term, which corresponds to a small fluctuation. In the other regions, where ${\cal N}^{typ}_{L}(\lambda)$ is exponentially small, the first term dominates and actually represents the very small probability of having a single rare event

[TABLE]

This result can be generalized to arbitrary moments, as described in the context of the Random Energy Model [78].
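Since the $K^{L}$ branches are independent, the number ${\cal N}_{L}(\lambda)$ of branches falling in a small bin around $\lambda$ is binomially distributed with $K^{L}$ trials, which makes the structure of the second moment explicit: $\overline{{\cal N}_{L}^{2}}={\cal N}^{typ}_{L}+(1-K^{-L})({\cal N}^{typ}_{L})^{2}$. The sketch below uses hypothetical bin probabilities and a small $K^{L}$, so that the binomial sum can be done exactly, and checks this formula together with the two regimes discussed above.

```python
import math

def binomial_moments(M, P):
    """First two moments of N ~ Binomial(M, P), computed by explicit summation."""
    m1 = m2 = 0.0
    for n in range(M + 1):
        w = math.comb(M, n) * P ** n * (1 - P) ** (M - n)
        m1 += n * w
        m2 += n * n * w
    return m1, m2

def second_moment(N_typ, M):
    """Closed form: E[N^2] = N_typ + (1 - 1/M) N_typ^2 with N_typ = M P."""
    return N_typ + (1 - 1 / M) * N_typ ** 2

M = 2 ** 6                    # K^L branches, e.g. K = 2 and L = 6 (hypothetical)
for P in (0.5, 1e-4):         # a bin inside [lambda^-, lambda^+] and a rare bin (hypothetical)
    m1, m2 = binomial_moments(M, P)
    print(P, m1, m2, second_moment(M * P, M))
```

For the rare bin, the second moment is essentially equal to the first moment (a single rare branch), while for the typical bin it is essentially $({\cal N}^{typ}_{L})^{2}$.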

VII.3 Statistical properties of the empirical sums ${\cal S}_{L}(k)$ of Eq. 112

The disorder-averaged value of the empirical sum ${\cal S}_{L}(k)$ of Eq. 111 reads

[TABLE]

where the moments $\overline{|\tau_{L}[v(.)]|^{k}}$ of non-integer order $k$ for the product of random variables have already been discussed in Eq. 24

[TABLE]

in terms of the scaled cumulant generating function $\phi(k)$

[TABLE]

that corresponds to the Legendre transform of the rate function $I(\lambda)$ .
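Since the moments $\overline{|\tau_{L}[v(.)]|^{k}}$ are governed by a saddle point, the scaled cumulant generating function can be obtained numerically as $\phi(k)=\max_{\lambda}[k\lambda-I(\lambda)]$, the maximizer being the saddle-point value $\lambda_{k}$. The grid-based sketch below is only illustrative: it uses a hypothetical Gaussian rate function, for which $\phi(k)=k\mu+\sigma^{2}k^{2}/2$ and $\lambda_{k}=\mu+\sigma^{2}k$.

```python
def legendre(I, k, grid):
    """phi(k) = max over the grid of [k*lam - I(lam)], together with the maximizer lambda_k."""
    lam_k = max(grid, key=lambda lam: k * lam - I(lam))
    return k * lam_k - I(lam_k), lam_k

mu, sigma = -0.3, 0.5                                   # hypothetical Gaussian rate function
I = lambda lam: (lam - mu) ** 2 / (2 * sigma ** 2)
grid = [-5.0 + 1e-4 * i for i in range(100_001)]        # lambda in [-5, 5]
for k in (-2.0, 0.0, 1.5):
    phi, lam_k = legendre(I, k, grid)
    print(k, phi, k * mu + 0.5 * sigma ** 2 * k ** 2, lam_k)
```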

On the other hand, Eq. 112 yields that the sum ${\cal S}_{L}(k)$ in a typical sample can be computed from the empirical histogram in a typical sample (Eq. 117)

[TABLE]

So the only difference with the averaged value (Eqs 126 and 127)

[TABLE]

lies in the boundaries $\lambda^{-}\leq\lambda\leq\lambda^{+}$ of the integration over the Lyapunov exponent, which appear in the typical-sample value (Eq 129) but are absent from the averaged value of Eq. 130. As a consequence, one needs to locate the saddle-point value $\lambda_{k}$ that dominates the integral for the averaged value of Eq. 130

[TABLE]

with respect to the two boundaries $\lambda^{\pm}$ of the integral governing the typical-sample value of Eq. 129. It is thus useful to introduce the two values $k^{\pm}$ satisfying $\lambda_{k^{\pm}}=\lambda^{\pm}$ i.e.

[TABLE]

and to distinguish the three following cases :

(a) In the region $k^{-}<k<k^{+}$ , the saddle-point value $\lambda_{k}$ of Eq. 131 is in the interval

[TABLE]

The typical-sample value ${\cal S}^{TypicalSample}_{L}(k)$ of Eq. 129 has then the same exponential behavior in $L$ as the averaged value $\overline{{\cal S}_{L}(k)}$ involving the Legendre transform $\phi(k)$ (Eqs 127 and 25 ) of $I(\lambda)$

[TABLE]

(b) In the region $k>k^{+}$ , the saddle-point value $\lambda_{k}$ of Eq. 131 is bigger than $\lambda^{+}$

[TABLE]

The typical-sample value ${\cal S}^{TypicalSample}_{L}(k)$ of Eq. 129 is then governed by the saddle-point evaluation frozen at the boundary $\lambda_{+}$ satisfying $I(\lambda_{+})=\ln K$ (Eq 116)

[TABLE]

(c) In the region $k<k^{-}$ , the saddle-point value $\lambda_{k}$ of Eq. 131 is smaller than $\lambda^{-}$

[TABLE]

The typical-sample value ${\cal S}^{TypicalSample}_{L}(k)$ of Eq. 129 is then governed by the saddle-point evaluation frozen at the boundary $\lambda_{-}$ satisfying $I(\lambda_{-})=\ln K$ (Eq 116)

[TABLE]
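The three regimes can be compared numerically. Writing the typical-sample value as $\frac{1}{L}\ln{\cal S}^{TypicalSample}_{L}(k)\simeq\max_{\lambda^{-}\leq\lambda\leq\lambda^{+}}[\ln K-I(\lambda)+k\lambda]$ and the averaged value as $\ln K+\phi(k)$ (our reading of Eqs. 129 and 130), the sketch below evaluates both for a hypothetical Gaussian rate function, for which $k^{\pm}=\pm\sqrt{2\ln K}/\sigma$; outside $k^{-}<k<k^{+}$ the typical exponent freezes at $k\lambda^{\pm}$.

```python
import math

mu, sigma, K = -0.3, 0.5, 2                  # hypothetical Gaussian rate function
lnK = math.log(K)
I = lambda lam: (lam - mu) ** 2 / (2 * sigma ** 2)
lam_minus = mu - sigma * math.sqrt(2 * lnK)  # I(lambda^-) = ln K
lam_plus = mu + sigma * math.sqrt(2 * lnK)   # I(lambda^+) = ln K
k_plus = math.sqrt(2 * lnK) / sigma          # I'(lambda^+) = k^+

def typical_exponent(k, n=100_000):
    """max over [lambda^-, lambda^+] of ln K - I(lam) + k lam, on a uniform grid."""
    step = (lam_plus - lam_minus) / n
    return max(lnK - I(lam) + k * lam
               for lam in (lam_minus + i * step for i in range(n + 1)))

def averaged_exponent(k):
    """ln K + phi(k), with phi(k) = k mu + sigma^2 k^2 / 2 in the Gaussian case."""
    return lnK + k * mu + 0.5 * sigma ** 2 * k ** 2

for k in (-5.0, 1.0, 0.5 * k_plus, 2 * k_plus):
    print(k, typical_exponent(k), averaged_exponent(k))
```

The typical exponent never exceeds the averaged one, and they coincide only in region (a).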

VII.4 Sample-to-sample fluctuations in the frozen phase $k>k^{+}$

In the frozen phase $k>k^{+}$, the sample-dependent version of Eq. 136 is that the sum ${\cal S}_{L}(k)$ in a given sample is actually governed by the largest Lyapunov exponents available among the $K^{L}$ branches. It is thus convenient to relabel, in each sample, the Lyapunov exponents according to their magnitudes

[TABLE]

and to analyze the statistics of the largest terms in the sum of Eq. 112

[TABLE]

and in particular the first one that involves the maximal Lyapunov exponent $\lambda_{1}$

[TABLE]

VII.4.1 Probability distribution of the maximal Lyapunov exponent $\lambda_{1}$ in each sample

The maximal Lyapunov exponent $\lambda_{1}$ is typically of order $\lambda^{+}$ , but here we wish to analyze its probability distribution $R(\lambda_{1})$ over the samples. The corresponding cumulative distribution reads in terms of ${\cal P}_{L}(\lambda)$ of Eq. 108

[TABLE]

The change of variables

[TABLE]

centered around the value $\lambda^{+}$ where $I(\lambda^{+})=\ln K$ and $I^{\prime}(\lambda^{+})=k^{+}$ (Eq. 132) leads to the Taylor expansion of the rate function

[TABLE]

Plugging this expansion into Eq 142

[TABLE]

yields the convergence towards the Gumbel distribution (well-known as one of the three universality classes for the extreme-value statistics of independent random variables [82, 83])

[TABLE]

for the $O(1)$ random variable $u$ introduced in Eq. 143.
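The emergence of the Gumbel law can be seen on the finite-$K^{L}$ formula itself. Assuming, as the linearization of the rate function near $\lambda^{+}$ suggests, that each of the $M=K^{L}$ branches exceeds the level labelled by $u$ with probability $e^{-u}/M$ (prefactors ignored; the identification $u=Lk^{+}(\lambda_{1}-\lambda^{+})$ is our assumption, consistent with Eq. 147), the cumulative distribution of the maximum is $(1-e^{-u}/M)^{M}$, which tends to $\exp(-e^{-u})$. The sketch also checks that the mean of the Gumbel density $e^{-u-e^{-u}}$ is the Euler constant $0.5772...$.

```python
import math

def max_cdf(u, M):
    """Prob(largest of M branches < u) when each branch exceeds u with probability e^{-u}/M."""
    return (1.0 - math.exp(-u) / M) ** M

def gumbel_cdf(u):
    return math.exp(-math.exp(-u))

def gumbel_pdf(u):
    return math.exp(-u - math.exp(-u))

M = 2 ** 20  # number of branches K^L, e.g. K = 2 and L = 20 (hypothetical)
for u in (-2.0, 0.0, 1.0, 3.0):
    print(u, max_cdf(u, M), gumbel_cdf(u))

# Mean of the Gumbel density by a simple Riemann sum (tails outside [-5, 40] are negligible)
h = 1e-3
mean_u = sum((-5.0 + i * h) * gumbel_pdf(-5.0 + i * h) * h for i in range(45_001))
print(mean_u)
```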

VII.4.2 Probability distribution of ${\cal S}^{first}_{L}(k)=e^{Lk\lambda_{1}}$ over the samples

Using the change of variables of Eq. 143, Eq. 141 yields for its logarithm

[TABLE]

where $u$ is distributed with the Gumbel distribution of Eq. 146. This means that the probability distribution of $\left(\ln{\cal S}^{first}_{L}(k)\right)$ propagates as a traveling wave as $L$ grows : the first term $Lk\lambda^{+}$ corresponds to a motion with the non-random velocity $(k\lambda^{+})$ with respect to $L$ , while the second term $\frac{k}{k^{+}}u$ is random and independent of $L$ , i.e. its probability distribution corresponds to the fixed shape of the traveling wave. This notion of traveling wave has been stressed here because it plays a major role in the analysis of random models defined on Cayley trees, as first discovered with the exact solution of the Directed Polymer on the Cayley tree [72].

Eq. 147 translates into

[TABLE]

where ${\cal S}^{TypicalSample}_{L}(k)=e^{Lk\lambda^{+}}$ is the value in a typical sample introduced in Eq. 136, while

[TABLE]

is an $O(1)$ positive random variable, whose distribution reads in terms of the Gumbel distribution $G(u)$ of Eq. 146

[TABLE]

where the exponent

[TABLE]

governs the power-law decay of Eq. 150 for large $X$

[TABLE]

The exponent $\mu_{k}$ decreases continuously in the frozen phase $k\geq k^{+}$, from the value $\mu_{k=k^{+}}=1$ to vanishing values $\mu_{(k\to+\infty)}\to 0$. Since it does not exceed one in the whole frozen phase $k\geq k^{+}$

[TABLE]

the averaged value of the variable $X$ is infinite

[TABLE]

i.e. the averaged value $\overline{{\cal S}_{L}^{first}(k)}$ in Eq. 148 has an exponential behavior in $L$ different from that of the typical value ${\cal S}^{TypicalSample}_{L}(k)$, consistently with the discussion around Eq. 130.
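The power-law tail of $X$ follows from the Gumbel law by a change of variables. Assuming $X=e^{(k/k^{+})u}$ with $u$ Gumbel distributed (which is what the logarithm of ${\cal S}^{first}_{L}(k)$ in Eq. 147 suggests), one has ${\rm Prob}(X>x)=1-\exp(-x^{-\mu_{k}})\simeq x^{-\mu_{k}}$ with $\mu_{k}=k^{+}/k$, consistent with the values $\mu_{k^{+}}=1$ and $\mu_{k\to\infty}\to 0$ stated above. The sketch checks the log-log slope of this tail; the value of $k^{+}$ is an arbitrary illustration.

```python
import math

def tail_probability(x, mu):
    """Prob(X > x) = 1 - exp(-x^{-mu}) for X = exp(u / mu) with u Gumbel distributed."""
    return -math.expm1(-x ** (-mu))

def local_slope(x, mu):
    """d ln Prob(X > x) / d ln x, estimated from the points x and 2x."""
    return (math.log(tail_probability(2 * x, mu)) - math.log(tail_probability(x, mu))) / math.log(2)

k_plus = 2.0                       # hypothetical location of the freezing point
for k in (2.0, 4.0, 8.0):          # frozen phase k >= k^+
    mu_k = k_plus / k
    print(k, mu_k, local_slope(1e12, mu_k))
```

Since $\mu_{k}\leq 1$, the integral $\int^{\infty}dx\,{\rm Prob}(X>x)$ that gives $\overline{X}$ diverges, in agreement with the infinite averaged value.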

VII.5 Application to the Directed Polymer and the Random Energy Model

With respect to the generic notations of section VII.1, the Random Energy Model [78] corresponds to $K=2$ and to the case where $\lambda$ is an energy distributed with a Gaussian distribution of Eq. 29 so that the rate function $I(\lambda)$ and the scaled cumulant generating function $\phi(k)$ are quadratic

[TABLE]

The empirical number ${\cal N}_{L}(\lambda)$ of Eq. 110 corresponds to the number of accessible states in the microcanonical ensemble where the energy density $\lambda$ is fixed, and its value ${\cal N}_{L}(\lambda)$ in a typical sample (Eq. 117) yields that the function in the exponential corresponds to the entropy as a function of the energy density $\lambda$ in the microcanonical ensemble [78]

[TABLE]

with the boundaries

[TABLE]

With the change of notation $k\to\beta$, the empirical sum ${\cal S}_{L}(k)$ of Eqs. 111 and 112 corresponds to the partition function $Z_{L}(\beta)$ in the canonical ensemble at inverse temperature $\beta$

[TABLE]

with its disorder-averaged value (Eqs 126 and 127)

[TABLE]

while its value in a typical sample (Eq 129) involves the microcanonical entropy of Eq. 156

[TABLE]

Since the inverse temperature is positive, $\beta>0$ (whereas $k$ above could have either sign), the critical inverse temperature $\beta_{c}$ of the freezing transition corresponds to the solution $k^{+}$ of Eq. 132

[TABLE]

The two phases are [78]

(a) the high-temperature phase $\beta<\beta_{c}$ where the partition function in a typical sample (Eq 129) coincides with the averaged value of Eq. 159.

(b) the low-temperature frozen phase $\beta>\beta_{c}$, where the partition function in a typical sample differs from the averaged value of Eq. 159 because it is governed by the boundary $\lambda_{+}$ (Eq. 136)

[TABLE]

In this frozen phase, the exponent of Eq. 151

[TABLE]

of the heavy-tailed distribution of Eq. 152 allows one to analyze further the statistics of overlaps in terms of the weights of individual terms within a Lévy sum of random variables distributed with heavy tails [79, 80, 81].

VII.6 Application to Anderson Localization

The notations for the Anderson Localization model have been explained in the subsection III.4 with the rate function $I(\lambda)$ and the scaled cumulant generating function $\phi(k)$ given by Eqs 37 and 38

[TABLE]

Here the analysis concerns the localized phase in the regime of small hopping $V$ where the forward perturbation formula of Eq. 13 is valid, so it will be possible to use this approach up to the critical hopping $V_{c}$ of the delocalization transition only if the branching ratio $K$ is large $K\gg 1$ .

The empirical number of Eq 110 counts the number of leaves (among the $K^{L}$ branches) where the wave-function $|\psi_{b}(L)|$ is of order $e^{L\lambda}$ with respect to the finite wave-function at the center. The empirical number in a typical sample (Eq. 117) reads

[TABLE]

where the boundaries $\lambda^{\pm}$ are given by Eq. 116

[TABLE]

For large $K\gg 1$ , the upper boundary is given by

[TABLE]

The localized phase corresponds to the region $\lambda_{+}<0$, where the wave-function decays exponentially on all the $K^{L}$ branches, while the delocalization transition occurs when $\lambda^{+}$ vanishes

[TABLE]

so the critical hopping $V_{c}$ for the delocalization transition is given for large $K\gg 1$ by

[TABLE]

At this critical point $V=V_{c}$ , the inverse participation ratios

[TABLE]

correspond to the empirical sums of Eq. 111 with the change of notation $k=2q$, so that their disorder-averaged values (Eqs 126 and 127) read for $q<\frac{1}{2}$

[TABLE]

where the exponents $\tau_{q}^{av}$ defined with respect to the number $K^{L}$ of sites read for large $K\gg 1$

[TABLE]

Using Eq. 167 and Eq. 169, Eq. 132 yields the boundary value $k^{+}=2q^{+}$

[TABLE]

is close to unity for large $K$ , so that the inverse participation ratios in a typical sample

[TABLE]

involve essentially the same exponents as the averaged values of Eq. 171

[TABLE]

These exponents are known as the ’Strong Multifractality spectrum’ in the field of Anderson transitions [30], where they appear either in the limit of infinite dimensionality $d\to+\infty$ or in related long-ranged power-law hoppings in one dimension [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98], or more recently in toy models of Many-Body-Localization [99, 100]. Although the freezing transitions at the values $q^{\pm}$ are not very important for this ’Strong Multifractality spectrum’, they have been much discussed in the general theory of multifractality at Anderson transitions in finite dimension $d$ [29, 30, 88].

VII.7 Application to the Quantum Ising Model

As a third and final example, let us mention the random transverse-field spin-glass model on the Cayley tree, which has been studied recently via real-space renormalization and where the large deviation properties of the one-dimensional model play a major role [101]. Here the difference with the two previous examples of the Random Energy Model and of Anderson Localization is that the one-dimensional model already has its own phase transition between the spin-glass phase and the paramagnetic phase, whose exact critical properties have been obtained by the Strong Disorder renormalization approach [27, 28]. As a consequence, one obtains three phases that can be explained as follows in the star geometry (ii) of Eq. 105, where the center $O$ is linked to $K^{L}$ independent chains of length $L$ [101], i.e. each branch $b=1,..,K^{L}$ is characterized by the Lyapunov exponent (Eq 14)

[TABLE]

(a) the star is in its paramagnetic phase if all the $K^{L}$ chains are in their paramagnetic state $\lambda_{b}<0$ , i.e. the boundary value $\lambda_{+}$ of Eq. 116 should be negative $\lambda_{+}<0$ .

(b) the star is in its spin-glass phase with an extensive spin-glass order if an extensive number of the $K^{L}$ chains are in their spin-glass phase $\lambda_{b}>0$ i.e. the typical Lyapunov exponent should be positive $\lambda_{typ}>0$ .

(c) in between, i.e. in the region $\lambda_{typ}<0<\lambda_{+}$, the star is in a spin-glass phase with a sub-extensive spin-glass order, because only a sub-extensive number of chains are in their spin-glass phase $\lambda_{b}>0$, while an extensive number of chains are in their paramagnetic state $\lambda_{b}<0$.

VIII Conclusion

In this pedagogical introduction, we have explained why the general theory of large deviations is the natural language to analyze the properties of disordered systems, since it offers a unified perspective on the typical events and on the rare events that occur on various scales. We have first focused on one-dimensional random models in order to emphasize the various levels of description. We have recalled how the Level 1 allows one to analyze the properties of observables given by products of random variables that occur in many classical or quantum models. We have then described how a finer analysis in terms of the whole hierarchy of empirical histograms allows one to classify the set of disorder configurations into subsets that have the same empirical properties up to a certain order. We have then turned our attention to random models defined on Cayley trees, in order to analyze their properties in terms of the large deviations of branches. We have taken as examples various emblematic classical and quantum disordered systems in order to highlight the common underlying mechanisms from the point of view of large deviations.

The large deviation analysis of disordered systems in finite dimension $1<d<+\infty$ clearly goes beyond the scope of the present introduction. Although some notions can be directly applied, like the Sanov theorem for the empirical 1-point histogram, or the multifractal analysis at Anderson transitions [29, 30] or at phase transitions of random classical models [31, 32, 33, 34, 35, 36, 37, 38], one should be aware that qualitatively new phenomena may also occur. For instance, the large deviation properties that have been computed exactly [102, 103, 104] for the Directed Polymer in dimension $d=2$ display an asymmetry between values larger and smaller than the typical value, with two different scalings with respect to the length $L$ of the polymer: an ’anomalously good’ ground state energy requires only $L$ anomalously good on-site energies along the polymer, while an ’anomalously bad’ ground state energy requires $L^{2}$ bad on-site energies in the two-dimensional sample. So this single example already shows that some properties of random systems in finite dimensions $d$ call for a much broader large deviation theory with two different scalings for values larger or smaller than the typical value, as discussed in more detail in the recent preprint [105].

Appendix A Alternative classification of disorder configurations in terms of empirical intervals

In the text, we have described the classification of one-dimensional disorder configurations in terms of the hierarchy of the empirical r-point histograms. In this Appendix, we discuss an alternative classification in terms of the empirical intervals during which the disorder keeps a constant value, since this framework is more appropriate to analyze the Lifshitz and the Griffiths singularities as we now recall.

A.1 Observables corresponding to products of contributions from intervals of random lengths

After the product of random variables discussed in section II.3, the next simplest case of Eq. 3 is the one where the disorder variable can take only two values, labelled by $v=\pm$. It is then useful to replace the disorder configuration $[v(x)]_{x=1,2,..,L}$ by its decomposition into intervals during which the disorder keeps the same value. For a model defined on a ring of $L$ sites (i.e. with periodic boundary conditions $L+x=x$), there will be an even empirical number $(2N)$ of intervals, where the $N$ odd intervals $(2i-1)$ of lengths $l_{2i-1}$ are associated with the value $v=-$, while the $N$ even intervals $(2i)$ of lengths $l_{2i}$ are associated with the value $v=+$. The lengths $l_{i}$ satisfy the sum rule

[TABLE]
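The decomposition into intervals is a run-length encoding on a ring, in which the run wrapping around the boundary must be glued back together. The sketch below uses a hypothetical helper `ring_intervals`, with values written as the characters '+' and '-'; it starts the list at a '-' interval so that odd positions carry $v=-$ and even positions carry $v=+$, as in the text, checks the sum rule, and assumes that both values occur in the configuration.

```python
def ring_intervals(config):
    """Split a ring configuration of '+'/'-' values into maximal constant intervals.

    Returns [(value, length), ...] starting with a '-' interval, so that the intervals
    alternate '-', '+', '-', '+', ... around the ring. Assumes both values occur."""
    L = len(config)
    # a '-' interval starts at x if v(x) = '-' and v(x-1) = '+' (x-1 taken mod L)
    start = next(x for x in range(L) if config[x] == "-" and config[x - 1] == "+")
    rotated = config[start:] + config[:start]
    intervals = []
    for v in rotated:
        if intervals and intervals[-1][0] == v:
            intervals[-1][1] += 1
        else:
            intervals.append([v, 1])
    return [tuple(interval) for interval in intervals]

config = "--++-+++-"   # ring of L = 9 sites; the last '-' joins the first two
intervals = ring_intervals(config)
print(intervals)       # [('-', 1), ('+', 3), ('-', 3), ('+', 2)]
```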

When the disorder configuration $[v(x)]_{x=1,2,..,L}$ is replaced by the list $\left[l_{i}\right]_{i=1,2,..,2N}$ of the lengths of the intervals, the trace of Eq. 3 becomes

[TABLE]

To analyze the Lifshitz and the Griffiths singularities mentioned in the Introduction, various models have been studied in the regime where the value $v=-$ corresponds to a very strong disorder value where the associated transfer matrix $T_{-}$ can be approximated by a projector on some state $|0>$ with some eigenvalue $t_{-}$ [16]

[TABLE]

Then Eq. 178 simplifies into the product of the contributions of the intervals

[TABLE]

where the contribution of an interval $v=-$ of length $l$ is simply

[TABLE]

while the contribution of an interval $v=+$ of length $l$ corresponds to the pure model $v=+$ with the boundary conditions $|0>$ fixed by the projector form of Eq. 179

[TABLE]

Various examples concerning Anderson Localization models and classical spin chains are described in the book [16], while an example concerning random DNA is analyzed in [23].
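The factorization into interval contributions is easy to verify numerically. The sketch below assumes $2\times 2$ transfer matrices, takes $T_{-}=t_{-}|0><0|$ exactly (the projector form of Eq. 179) together with an arbitrary hypothetical matrix for $T_{+}$, and compares the trace of Eq. 3 with the product of $t_{-}^{l}$ over the intervals $v=-$ and of $<0|T_{+}^{l}|0>$ over the intervals $v=+$. All helper names and numerical values are illustrative.

```python
def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(2)) for j in range(2)] for i in range(2)]

def matpow(A, l):
    R = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(l):
        R = matmul(A, R)
    return R

def ring_trace(config, T):
    """Trace[T_{v(L)} ... T_{v(2)} T_{v(1)}] for a ring configuration."""
    M = [[1.0, 0.0], [0.0, 1.0]]
    for v in config:          # v(1) acts first, later sites multiply on the left
        M = matmul(T[v], M)
    return M[0][0] + M[1][1]

def runs(config):
    """Maximal constant intervals on the ring, as [value, length], starting at a '-' after a '+'."""
    start = next(x for x in range(len(config)) if config[x] == "-" and config[x - 1] == "+")
    rotated = config[start:] + config[:start]
    out = []
    for v in rotated:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

t_minus = 0.3                                   # hypothetical eigenvalue t_-
T = {"-": [[t_minus, 0.0], [0.0, 0.0]],          # T_- = t_- |0><0|
     "+": [[0.9, 0.3], [0.2, 0.5]]}              # hypothetical pure-model matrix T_+
config = "--++-+++-"

product = 1.0
for v, l in runs(config):
    product *= t_minus ** l if v == "-" else matpow(T["+"], l)[0][0]
print(ring_trace(config, T), product)
```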

A.2 Empirical 1-interval observables with their constraints

The observables of the form of Eq. 180 suggest that it is appropriate to analyze the disorder configurations in terms of the empirical 1-interval observables

[TABLE]

The summation over the length $l$ corresponds to the density $\frac{N}{L}$ of intervals $v=+$ or $v=-$

[TABLE]

while the total length $L$ of the disorder configurations fixes the normalization (Eq. 177)

[TABLE]

It is thus useful to introduce the following notation to summarize these constraints on the empirical 1-interval observables $n^{\pm}(.)$

[TABLE]

where again the notation $\delta(X)$ is introduced for better readability of the arguments $X$ but actually represents the Kronecker symbol $\delta_{0,X}$ .

A.3 Typical values of the empirical 1-interval observables

Since the probability of a disorder configuration is given by Eq. 2 with $p_{+}+p_{-}=1$, the probability distributions of the lengths $l$ of the intervals $v=\pm$ are given by the geometric distributions

[TABLE]

with the normalization

[TABLE]

and the averaged lengths

[TABLE]

As a consequence, the typical density $\frac{N^{typ}}{L}$ of the intervals reads

[TABLE]

and the typical values of the empirical 1-interval observables are

[TABLE]
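The typical values can be made explicit under the natural reading of the equations above (which are not reproduced here): an interval $v=\pm$ continues with probability $p_{\pm}$ at each step, so that $p^{geo}_{\pm}(l)=(p_{\pm})^{l-1}(1-p_{\pm})$ with mean length $\langle l\rangle_{\pm}=1/p_{\mp}$, hence $\frac{N^{typ}}{L}=\frac{1}{\langle l\rangle_{+}+\langle l\rangle_{-}}=p_{+}p_{-}$ and $n^{typ}_{\pm}(l)=p_{+}p_{-}\,p^{geo}_{\pm}(l)$. The Python sketch below checks these formulas against the constraints and against a Monte Carlo estimate of the interval density; the truncation `lmax`, the seed, and the parameter values are arbitrary choices.

```python
import math
import random

def p_geo(l, p_same):
    """Geometric law of an interval length: continue w.p. p_same, stop w.p. 1 - p_same."""
    return p_same ** (l - 1) * (1.0 - p_same)

def typical_observables(p_plus, lmax=200):
    """n^typ_{+-}(l) = p_+ p_- p^geo_{+-}(l), truncated at l = lmax."""
    p_minus = 1.0 - p_plus
    density = p_plus * p_minus   # N^typ / L = 1 / (<l>_+ + <l>_-)
    n_plus = {l: density * p_geo(l, p_plus) for l in range(1, lmax + 1)}
    n_minus = {l: density * p_geo(l, p_minus) for l in range(1, lmax + 1)}
    return n_plus, n_minus

def sampled_density(p_plus, L, seed=0):
    """Fraction of sites that start a '+' interval in a random periodic chain."""
    rng = random.Random(seed)
    v = ['+' if rng.random() < p_plus else '-' for _ in range(L)]
    return sum(1 for i in range(L) if v[i] == '+' and v[i - 1] == '-') / L

p_plus = 0.7
n_plus, n_minus = typical_observables(p_plus)
print(sum(n_plus.values()), sum(n_minus.values()),
      sum(l * (n_plus[l] + n_minus[l]) for l in n_plus),
      sampled_density(p_plus, 200_000, seed=3))
```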

A.4 Large deviations of empirical 1-interval observables

In order to analyze the large deviations of the empirical 1-interval observables, one needs to introduce a generalized semi-Markovian model for the disorder, where the lengths $l_{i}$ of the intervals are drawn from some general distributions $p_{\pm}(l)$ (instead of the geometric distributions of Eq. 187). The probability of a given configuration of the intervals then reads (up to boundary terms that can be neglected for $L\to+\infty$)

[TABLE]

where the action in the exponential is a function of the empirical 1-interval observables introduced in Eq. 183

[TABLE]

while $c_{1}[n_{+}(.);n_{-}(.)]$ has been introduced in Eq. 186 to summarize the constraints. In this semi-Markovian model, all the disorder configurations that have the same empirical 1-interval observables $n^{\pm}(.)$ have the same probability. As a consequence, the probability $P_{L}[n_{+}(.);n_{-}(.)]$ to see these empirical observables is given by

[TABLE]

where $\omega_{L}[n_{+}(.);n_{-}(.)]$ counts the number of disorder configurations that correspond to these empirical observables, while the normalization reads

[TABLE]

When the empirical 1-interval observables take their typical values for this generalized semi-Markovian model (adapted from Eqs. 190 and 191)

[TABLE]

the probability $P_{L}[n^{typ}_{+}(.);n^{typ}_{-}(.)]$ should remain finite as $L\to+\infty$. So the factor $\omega_{L}[n^{typ}_{+}(.);n^{typ}_{-}(.)]$ must exactly compensate the exponential factor of Eq. 194, i.e., it must display the exponential growth

[TABLE]
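The exponential growth of $\omega_{L}$ can be made plausible by a direct count. Assuming that a configuration is described by the alternating sequence of its intervals (a $+$ interval, then a $-$ one, and so on), the number of sequences with prescribed numbers $N_{\pm}(l)=L\,n_{\pm}(l)$ of intervals of length $l$ is a product of two multinomial coefficients, up to factors that are only polynomial in $L$ (such as the choice of the starting site on the ring). Stirling's formula then gives $\frac{1}{L}\ln\omega_{L}\to-\sum_{\pm}\sum_{l}n_{\pm}(l)\ln\frac{n_{\pm}(l)}{n}$ with $n=\sum_{l}n_{\pm}(l)$, which at the typical values indeed cancels the exponential factor of Eq. 194. The Python sketch below compares both sides for a hypothetical histogram scaled up to large $L$.

```python
import math

def log_multinomial(counts):
    """ln( (sum_l N(l))! / prod_l N(l)! ) computed with lgamma."""
    total = sum(counts.values())
    return math.lgamma(total + 1) - sum(math.lgamma(c + 1) for c in counts.values())

def log_omega(N_plus, N_minus):
    """ln of the number of alternating interval sequences with the given histograms."""
    return log_multinomial(N_plus) + log_multinomial(N_minus)

def entropy_rate(n_plus, n_minus):
    """-sum_{+-} sum_l n(l) ln(n(l) / n), with n = sum_l n(l)."""
    s = 0.0
    for n_s in (n_plus, n_minus):
        dens = sum(n_s.values())
        s -= sum(x * math.log(x / dens) for x in n_s.values() if x > 0)
    return s

# hypothetical histogram: 3 intervals of each sign, total length L0 = 11
base_plus, base_minus = {1: 2, 2: 1}, {1: 1, 3: 2}
L0 = sum(l * c for l, c in base_plus.items()) + sum(l * c for l, c in base_minus.items())
s_exact = entropy_rate({l: c / L0 for l, c in base_plus.items()},
                       {l: c / L0 for l, c in base_minus.items()})

def scaled_log_omega(m):
    """(1/L) ln omega_L when every count of the base histogram is multiplied by m."""
    N_plus = {l: m * c for l, c in base_plus.items()}
    N_minus = {l: m * c for l, c in base_minus.items()}
    return log_omega(N_plus, N_minus) / (m * L0)

for m in (10, 1_000, 100_000):
    print(m * L0, scaled_log_omega(m), s_exact)
```

The gap between the two columns shrinks like $(\ln L)/L$, as expected from the subleading Stirling corrections.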

When the empirical observables $[n_{+}(.);n_{-}(.)]$ differ from their typical values $[n^{typ}_{+}(.);n^{typ}_{-}(.)]$, we may consider a modified semi-Markovian model, with modified probability distributions ${\tilde{p}}_{\pm}(l)$ for the lengths of the intervals, for which the empirical observables $[n_{+}(.);n_{-}(.)]$ would be typical. Equations 196 then show that the modified probability distributions ${\tilde{p}}_{\pm}(l)$ should be chosen as

[TABLE]

where the two denominators coincide as a consequence of the constraints of Eq. 186.

Then Eq. 197 translates for this modified model into

[TABLE]

Plugging this result into Eq. 194 yields the large deviation form

[TABLE]

with the rate function

[TABLE]

Related studies on large deviations properties of various semi-Markov processes in continuous time can be found in [11, 106, 107, 108, 109].

Returning now to the initial disorder model with the geometric distributions $p^{geo}_{\pm}(l)$ of Eq. 187, the result of Eq. 201, derived for the generalized semi-Markov model with arbitrary distributions $p_{\pm}(l)$ of the interval lengths, becomes

[TABLE]
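For concreteness, the rate function can be evaluated numerically. The Python sketch below assumes that Eq. 202 takes the relative-entropy form obtained by combining the count of $\omega_{L}$ with the exponential factor of Eq. 194, namely $J^{geo}[n_{+}(.);n_{-}(.)]=\sum_{\pm}\sum_{l}n_{\pm}(l)\ln\frac{n_{\pm}(l)}{n\,p^{geo}_{\pm}(l)}$ with $n=\sum_{l}n_{\pm}(l)$; this is a reconstruction, since the displayed equations are not reproduced here. It checks that $J^{geo}$ vanishes at the typical values $n^{typ}_{\pm}(l)=p_{+}p_{-}\,p^{geo}_{\pm}(l)$ and is positive for a hypothetical atypical histogram satisfying the constraints.

```python
import math

def p_geo(l, p_same):
    """Geometric law of the interval lengths of the initial disorder model."""
    return p_same ** (l - 1) * (1.0 - p_same)

def rate_function_geo(n_plus, n_minus, p_plus):
    """J^geo = sum_{+-} sum_l n(l) ln[n(l) / (n p^geo(l))], n = sum_l n(l) (assumed form)."""
    J = 0.0
    for n_s, p_same in ((n_plus, p_plus), (n_minus, 1.0 - p_plus)):
        dens = sum(n_s.values())
        J += sum(x * math.log(x / (dens * p_geo(l, p_same)))
                 for l, x in n_s.items() if x > 0)
    return J

def typical(p_plus, lmax=200):
    """Typical observables n^typ(l) = p_+ p_- p^geo(l), truncated at lmax."""
    p_minus = 1.0 - p_plus
    dens = p_plus * p_minus
    return ({l: dens * p_geo(l, p_plus) for l in range(1, lmax + 1)},
            {l: dens * p_geo(l, p_minus) for l in range(1, lmax + 1)})

p_plus = 0.5
J_typ = rate_function_geo(*typical(p_plus), p_plus)
# hypothetical atypical histogram obeying the constraints (density 3/11, total length 1)
J_other = rate_function_geo({1: 2 / 11, 2: 1 / 11}, {1: 1 / 11, 3: 2 / 11}, p_plus)
print(J_typ, J_other)
```

Written this way, $J^{geo}$ is a sum of relative entropies weighted by the density $n$, which makes its positivity away from the typical values manifest.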

A.5 Large deviations for observables given by the product of the intervals contributions

The modulus of Eq. 180 can be rewritten in terms of the empirical 1-interval observables of Eq. 183 as

[TABLE]

So the corresponding finite-size Lyapunov exponent of Eq. 4 is a linear function of the empirical 1-interval observables

[TABLE]

Its typical value can be obtained from the typical values of the empirical 1-interval observables of Eq. 191

[TABLE]
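Since $\lambda$ is linear in the empirical observables, its typical value is a simple weighted sum. The Python sketch below assumes the reading $\lambda[n_{+}(.);n_{-}(.)]=\sum_{l}\left[n_{+}(l)\ln|\theta_{+}(l)|+n_{-}(l)\ln|\theta_{-}(l)|\right]$ of the linear relation stated above, together with $n^{typ}_{\pm}(l)=p_{+}p_{-}\,p^{geo}_{\pm}(l)$, and checks the special case $\theta_{\pm}(l)=(t_{\pm})^{l}$, where one should recover $\lambda^{typ}=p_{+}\ln|t_{+}|+p_{-}\ln|t_{-}|$ for a product of independent random variables. The functions $\ln|\theta_{\pm}(l)|$ are passed directly to avoid underflow for long intervals; names and parameter values are hypothetical.

```python
import math

def p_geo(l, p_same):
    return p_same ** (l - 1) * (1.0 - p_same)

def lyapunov(n_plus, n_minus, log_theta_plus, log_theta_minus):
    """lambda[n_+; n_-] = sum_l [n_+(l) ln|theta_+(l)| + n_-(l) ln|theta_-(l)|]."""
    return (sum(x * log_theta_plus(l) for l, x in n_plus.items())
            + sum(x * log_theta_minus(l) for l, x in n_minus.items()))

def typical_lyapunov(p_plus, log_theta_plus, log_theta_minus, lmax=400):
    """lambda evaluated at n^typ(l) = p_+ p_- p^geo(l), truncated at lmax."""
    p_minus = 1.0 - p_plus
    dens = p_plus * p_minus
    n_plus = {l: dens * p_geo(l, p_plus) for l in range(1, lmax + 1)}
    n_minus = {l: dens * p_geo(l, p_minus) for l in range(1, lmax + 1)}
    return lyapunov(n_plus, n_minus, log_theta_plus, log_theta_minus)

# hypothetical parameters; theta_{+-}(l) = (t_{+-})^l
p_plus, t_plus, t_minus = 0.7, 1.5, 0.2

def log_theta_plus(l):
    return l * math.log(abs(t_plus))

def log_theta_minus(l):
    return l * math.log(abs(t_minus))

lam_typ = typical_lyapunov(p_plus, log_theta_plus, log_theta_minus)
print(lam_typ, p_plus * math.log(t_plus) + (1 - p_plus) * math.log(t_minus))
```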

In terms of the probability $P_{L}[n_{+}(.);n_{-}(.)]$ of Eq. 200, the moments of non-integer order $k$ of Eq. 203 read

[TABLE]

One thus needs to optimize the function $\left[k\lambda[n_{+}(.);n_{-}(.)]-J^{geo}[n_{+}(.);n_{-}(.)]\right]$ in the exponential, subject to the constraints $c_{1}[n_{+}(.);n_{-}(.)]$ of Eq. 186, which can be taken into account via Lagrange multipliers. It is technically more convenient to introduce the empirical density of intervals $\pm$, which appears both in the constraints $c_{1}[n_{+}(.);n_{-}(.)]$ and in the rate function $J^{geo}[n_{+}(.);n_{-}(.)]$ of Eq. 202,

[TABLE]

via another constraint.

So we will consider the functional

[TABLE]

The optimization with respect to the empirical 1-interval observable $n_{\pm}(l)$

[TABLE]

yields the forms

[TABLE]

The constraints

[TABLE]

determine the Lagrange multipliers $\chi_{\pm}$ as functions of the other parameters

[TABLE]

The optimization with respect to the interval density $n$

[TABLE]

yields, together with Eq. 212, the condition that fixes the value of the Lagrange multiplier $\varphi$

[TABLE]

while the remaining constraint

[TABLE]

determines the value of the density $n$.

The value of the functional of Eq. 208 for the optimal solution satisfying the constraints

[TABLE]

actually reduces to the Lagrange multiplier $\varphi$.
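The chain of steps above (stationarity in $n_{\pm}(l)$, the constraints fixing $\chi_{\pm}$, stationarity in $n$ fixing $\varphi$, the remaining constraint fixing $n$, and the value of the functional at the optimum) can be made explicit as follows. Since the displayed equations are not reproduced here, the precise form of the functional and the sign conventions for $\chi_{\pm}$ and $\varphi$ are assumptions, and $W_{\pm}(k,\varphi)$ is a hypothetical shorthand; the resulting condition $W_{+}W_{-}=1$ does not depend on these conventions.

```latex
\begin{aligned}
{\cal F} &= \sum_{\pm}\sum_{l} n_{\pm}(l)\left[k\ln|\theta_{\pm}(l)|-\ln\frac{n_{\pm}(l)}{n\,p^{geo}_{\pm}(l)}\right]
+\sum_{\pm}\chi_{\pm}\Big[n-\sum_{l}n_{\pm}(l)\Big]
+\varphi\Big[1-\sum_{\pm}\sum_{l}l\,n_{\pm}(l)\Big]\\
\frac{\partial{\cal F}}{\partial n_{\pm}(l)}=0
&\;\Rightarrow\; n_{\pm}(l)=n\,p^{geo}_{\pm}(l)\,|\theta_{\pm}(l)|^{k}\,e^{-1-\chi_{\pm}-\varphi l}\\
\sum_{l}n_{\pm}(l)=n
&\;\Rightarrow\; e^{1+\chi_{\pm}}=W_{\pm}(k,\varphi)\equiv\sum_{l}p^{geo}_{\pm}(l)\,|\theta_{\pm}(l)|^{k}\,e^{-\varphi l}\\
\frac{\partial{\cal F}}{\partial n}=\sum_{\pm}\sum_{l}\frac{n_{\pm}(l)}{n}+\chi_{+}+\chi_{-}=2+\chi_{+}+\chi_{-}=0
&\;\Rightarrow\; W_{+}(k,\varphi)\,W_{-}(k,\varphi)=1\\
\sum_{\pm}\sum_{l}l\,n_{\pm}(l)=1
&\;\Rightarrow\; \frac{1}{n}=\sum_{\pm}\frac{1}{W_{\pm}(k,\varphi)}\sum_{l}l\,p^{geo}_{\pm}(l)\,|\theta_{\pm}(l)|^{k}\,e^{-\varphi l}\\
{\cal F}\big|_{\rm opt}
&=\sum_{\pm}(1+\chi_{\pm})\,n+\varphi\sum_{\pm}\sum_{l}l\,n_{\pm}(l)=n\,(2+\chi_{+}+\chi_{-})+\varphi=\varphi
\end{aligned}
```

The last line uses the stationarity condition to replace the bracket in ${\cal F}$ by $1+\chi_{\pm}+\varphi l$, while the constraint terms vanish at the optimum.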

In summary, the scaled cumulant generating function $\varphi(k)$ governing the exponential growth of the moments of Eq. 206

[TABLE]

is the solution of Eq. 214

[TABLE]

which involves the distributions $p^{geo}_{\pm}(l)$ of the lengths of the intervals of the disorder configurations (Eq. 187) and the functions $\theta_{\pm}(l)$ of Eq. 180 characterizing the observable under study. One can check that the expansion to first order in $k$ around $k=0$, using Eq. 10,

[TABLE]

recovers the typical value $\lambda^{typ}$ of Eq. 205, while the special case $\theta_{\pm}(l)=(t_{\pm})^{l}$ recovers the scaled cumulant generating function $\phi(k)$ of Eq. 25 for the simpler case of products of random variables.
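The self-consistency condition for $\varphi(k)$ can be solved numerically. Under our reading of Eq. 214 (the displayed equation is not reproduced here, so this form is an assumption), it reads $W_{+}(k,\varphi)\,W_{-}(k,\varphi)=1$ with $W_{\pm}(k,\varphi)=\sum_{l}p^{geo}_{\pm}(l)|\theta_{\pm}(l)|^{k}e^{-\varphi l}$, which is also the standard renewal form for products over intervals of random lengths. The Python sketch below finds the root by bisection in log space and checks the two properties quoted in the text: for $\theta_{\pm}(l)=(t_{\pm})^{l}$ it should reproduce $\ln(p_{+}|t_{+}|^{k}+p_{-}|t_{-}|^{k})$, and its slope at $k=0$ should equal $\lambda^{typ}$. It also evaluates the optimal density $n$ from the constraint $\sum_{\pm}\sum_{l}l\,n_{\pm}(l)=1$, assuming the exponentially tilted form $n_{\pm}(l)\propto p^{geo}_{\pm}(l)|\theta_{\pm}(l)|^{k}e^{-\varphi l}$; at $k=0$ it should return $p_{+}p_{-}$. Function names, the truncation `lmax`, and the bisection bracket are hypothetical choices.

```python
import math

def log_terms(k, phi, p_same, log_theta, lmax):
    """ln[p_geo(l) |theta(l)|^k e^{-phi l}] for l = 1..lmax (geometric p_geo)."""
    return [(l - 1) * math.log(p_same) + math.log(1.0 - p_same)
            + k * log_theta(l) - phi * l for l in range(1, lmax + 1)]

def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def scgf(k, p_plus, log_theta_plus, log_theta_minus, lmax=400, lo=-50.0, hi=50.0):
    """Bisection for ln W_+(k,phi) + ln W_-(k,phi) = 0; the left side decreases
    with phi. Assumes the root lies in [lo, hi]."""
    p_minus = 1.0 - p_plus
    def G(phi):
        return (log_sum_exp(log_terms(k, phi, p_plus, log_theta_plus, lmax))
                + log_sum_exp(log_terms(k, phi, p_minus, log_theta_minus, lmax)))
    while hi - lo > 1e-13:
        mid = 0.5 * (lo + hi)
        if G(mid) > 0.0:
            lo = mid   # the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def optimal_density(k, p_plus, log_theta_plus, log_theta_minus, lmax=400):
    """n from sum_{+-} sum_l l n_{+-}(l) = 1 with the assumed tilted optimum
    n_{+-}(l) = n p_geo(l) |theta(l)|^k e^{-phi l} / W_{+-}."""
    phi = scgf(k, p_plus, log_theta_plus, log_theta_minus, lmax)
    inv_n = 0.0
    for p_same, log_theta in ((p_plus, log_theta_plus), (1.0 - p_plus, log_theta_minus)):
        logs = log_terms(k, phi, p_same, log_theta, lmax)
        log_w = log_sum_exp(logs)
        inv_n += sum(l * math.exp(x - log_w) for l, x in enumerate(logs, start=1))
    return 1.0 / inv_n

# hypothetical parameters; theta_{+-}(l) = (t_{+-})^l
p_plus, t_plus, t_minus = 0.5, 1.5, 0.8

def lt_plus(l):
    return l * math.log(abs(t_plus))

def lt_minus(l):
    return l * math.log(abs(t_minus))

for k in (-1.0, 0.5, 1.0, 2.0):
    closed_form = math.log(p_plus * abs(t_plus) ** k + (1 - p_plus) * abs(t_minus) ** k)
    print(k, scgf(k, p_plus, lt_plus, lt_minus), closed_form)
```

Working with logarithms keeps the truncated sums finite even when the bisection probes values of $\varphi$ for which $W_{\pm}$ would diverge; in that region the truncated left side is simply large and positive, which preserves the sign information needed by the bisection.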

Bibliography


[1] Y. Oono, Progress of Theoretical Physics Supplement 99, 165 (1989).
[2] R.S. Ellis, Physica D 133, 106 (1999).
[3] H. Touchette, Phys. Rep. 478, 1 (2009).
[4] B. Derrida, J. Stat. Mech. P07023 (2007).
[5] R.J. Harris and G.M. Schütz, J. Stat. Mech. P07020 (2007).
[6] E.M. Sevick, R. Prabhakar, S.R. Williams, and D.J. Searles, Ann. Rev. Phys. Chem. 59, 603 (2008).
[7] H. Touchette and R.J. Harris, chapter "Large deviation approach to nonequilibrium systems" of the book "Nonequilibrium Statistical Physics of Small Systems: Fluctuation Relations and Beyond", Wiley (2013).
[8] L. Bertini, A. De Sole, D. Gabrielli, G. Jona-Lasinio, and C. Landim, Rev. Mod. Phys. 87, 593 (2015).

