Revisiting classical and quantum disordered systems from the unifying perspective of large deviations
Cécile Monthus

TL;DR
This paper explores classical and quantum disordered systems through the lens of large deviations theory, providing a unified framework to understand both typical and rare events across different scales.
Contribution
It offers a pedagogical review that unifies the analysis of classical and quantum disordered systems using large deviations, highlighting common underlying mechanisms.
Findings
Unified perspective on classical and quantum disordered systems
Large deviations effectively describe typical and rare events
Highlights common mechanisms across different disordered systems
Abstract
The theory of large deviations is already the natural language for the statistical physics of equilibrium and non-equilibrium. In the field of disordered systems, the analysis via large deviations is even more useful to describe within a unified perspective the typical events and the rare events that occur on various scales. In the present pedagogical introduction, we revisit various emblematic classical and quantum disordered systems in order to highlight the common underlying mechanisms from the point of view of large deviations.
Revisiting classical and quantum disordered systems
from the unifying perspective of large deviations
Cécile Monthus
Institut de Physique Théorique, Université Paris Saclay, CNRS, CEA, 91191 Gif-sur-Yvette, France
I Introduction
Just like Mr Jourdain discovering that he has been speaking in prose all his life without knowing it, physicists working in statistical physics become aware at some point that they have been using the theory of large deviations without realizing it since their very first acquaintance with the Boltzmann notion of entropy and the Gibbs theory of ensembles. This language of large deviations has turned out to be very powerful to unify the statistical physics of equilibrium, non-equilibrium and dynamical systems (see the reviews [1, 2, 3] and references therein) and to formulate an appropriate statistical physics approach of dynamical trajectories for various Markovian processes (see the reviews [4, 5, 6, 7, 8, 9, 10] and the PhD Theses [11, 12, 13, 14] and the HDR Thesis [15]).
In the field of disordered systems, the presence of random disorder variables induces a lot of subtle effects for the probabilities of interesting observables. Physicists have understood from the very beginning that some observables are non-self-averaging, i.e. their disorder-averaged value is completely different from their typical value (see the books [16, 17] and references therein). It was also realized very early that in each large typical sample, there will nevertheless occur rare anomalous regions of a certain size that may dominate some observables: famous examples are the Lifshitz essential singularities of the density of states near spectrum edges in Anderson localization models [18, 19, 20, 21, 16], the Griffiths singularities for the statics [22, 23] and the dynamics [24, 25, 26] of random classical models, and the Griffiths phases in random quantum models (see the reviews [27, 28] and references therein). Finally at critical points, it was found that multifractal properties appear, for instance for the inverse participation ratios of eigenfunctions at Anderson localization transitions (see the reviews [29, 30] and references therein) or for correlation functions in random classical spin models [31, 32, 33, 34, 35, 36, 37, 38], while at Infinite Disorder fixed points, many observables are even more broadly distributed [27, 28]. These few examples indicate that the language of large deviations is even more useful in the presence of disorder in order to describe within a unified perspective all these phenomena involving typical and rare events on various scales.
The aim of the present pedagogical introduction is thus to explain to physicists how the general theory of large deviations is the natural language to analyze the properties of various well-known classical and quantum random models. It is of course not meant for mathematicians who have been using the large deviation framework for a very long time (see the books [39, 40, 41, 42, 43, 44] and references therein), in particular in the area of disordered systems (see the books [45, 46, 47], the review [48] and references therein). This pedagogical introduction is thus intended only for physicists who are disheartened by the technical vocabulary used in the mathematical literature on large deviations (like Polish space, Borel sigma-field, càdlàg function, …).
The following sections are organized as follows. In section II, we introduce the generic notations for one-dimensional random models and describe how observables can be classified according to the order of the empirical property of the disorder configuration that determines them. We then analyze the various levels of this hierarchy: the ’Level 1’ of large deviations allows one to study the properties of observables given by products of random variables (section III); the ’Level 2’ of large deviations corresponds to the fluctuations of the empirical 1-point histogram of the disorder configuration (section IV); the ’Level 2.5’ of large deviations corresponds to the fluctuations of the empirical 2-point histogram of the disorder configuration (section V); finally the ’Level 3’ of large deviations corresponds to the whole series of empirical histograms of arbitrary order (section VI). In Section VII, we turn to random models defined on Cayley trees to analyze their properties in terms of large deviations of branches. Our conclusions are summarized in section VIII. In Appendix A, we describe an alternative classification of one-dimensional disorder configurations in terms of empirical intervals where the disorder remains the same.
II Classification of observables in one-dimensional random models
II.1 Transfer-matrix formulation of one-dimensional random models
Many classical and quantum disordered models in one dimension can be reformulated in terms of products of random matrices (see the books [16, 17] and references therein). To have generic notations, it will be convenient to denote by $\omega_i$ the disorder variable at point $i$, drawn independently with some probability distribution $p(\omega)$ normalized to unity
$$\sum_{\omega} p(\omega) = 1 \qquad (1)$$
that should be translated into $\int d\omega \, p(\omega) = 1$ whenever the disorder is a continuous random variable. In this paper, we have chosen to write the general equations for the case of discrete disorder (Eq. 1) without systematically translating them into the case of continuous disorder, but some examples of application will involve continuous disorder.
A disorder configuration $(\omega_1, \ldots, \omega_L)$ on a sample of $L$ sites occurs with the factorized probability
$$P(\omega_1, \ldots, \omega_L) = \prod_{i=1}^{L} p(\omega_i) \qquad (2)$$
In this disordered sample, various physical observables can then be obtained by considering the product of the corresponding transfer matrices $M(\omega_i)$ [16, 17]. One of the most important observables is the trace of this product
$$T_L = \mathrm{Tr} \left( \prod_{i=1}^{L} M(\omega_i) \right) \qquad (3)$$
The exponential growth with $L$ of its modulus can then be measured by the finite-size Lyapunov exponent
$$\lambda_L = \frac{\ln |T_L|}{L} \qquad (4)$$
Of course a more complete analysis would involve the whole Lyapunov spectrum [17] of the product of matrices, but this will not be discussed here.
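As an illustration of the finite-size Lyapunov exponent, here is a minimal Python sketch. The specific transfer matrix used, the standard one for the Anderson tight-binding chain with unit hopping, is an assumption made for this example and is not taken from the text; the function name, the energy `E` and the width `W` are likewise hypothetical. The running product is rescaled at each step so that the logarithm of the trace can be accumulated without overflow.

```python
import math
import random

def finite_size_lyapunov(energies, E=0.0):
    """Return ln|Tr(M_L ... M_1)| / L for M_i = [[E - eps_i, -1], [1, 0]].

    The matrix form is an illustrative assumption (Anderson chain, unit hopping).
    """
    a, b, c, d = 1.0, 0.0, 0.0, 1.0          # running product P = [[a, b], [c, d]]
    log_scale = 0.0                          # logarithm of the factors divided out
    for eps in energies:
        x = E - eps
        a, b, c, d = x * a - c, x * b - d, a, b          # P <- M P
        norm = max(abs(a), abs(b), abs(c), abs(d))
        a, b, c, d = a / norm, b / norm, c / norm, d / norm
        log_scale += math.log(norm)
    return (log_scale + math.log(abs(a + d))) / len(energies)

random.seed(0)
W = 2.0                                      # hypothetical width of the box disorder
sample = [random.uniform(-W / 2, W / 2) for _ in range(5000)]
print("disordered sample:", finite_size_lyapunov(sample, E=0.5))

# Check without disorder: for E = 3 the largest eigenvalue of M is (3 + sqrt(5)) / 2
clean = finite_size_lyapunov([0.0] * 200, E=3.0)
print("pure chain:", clean, math.log((3 + math.sqrt(5)) / 2))
```

Repeating the first call over many independent samples would give an empirical estimate of the distribution of the finite-size Lyapunov exponent over the disorder configurations.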
II.2 Statistics of the Lyapunov exponent over the disorder configurations
For large $L$, the probability distribution $P_L(\lambda)$ of the finite-size Lyapunov exponent of Eq. 4 over the disorder configurations drawn with the probabilities of Eq. 2 is expected to follow the large deviation form [16, 17]
$$P_L(\lambda) \mathop{\simeq}_{L \to +\infty} e^{- L I(\lambda)} \qquad (5)$$
where $I(\lambda)$ is called the ’rate function’ in the field of large deviations: it is non-negative and vanishes only at its minimum, corresponding to the typical value $\lambda^{\rm typ}$ that will be realized with probability one in the thermodynamic limit $L \to +\infty$
$$I(\lambda^{\rm typ}) = 0 \qquad (6)$$
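All other values $\lambda \neq \lambda^{\rm typ}$ appear with a probability that is exponentially small in $L$ in Eq. 5, but they are nevertheless important to understand the behavior of the moments of non-integer order $k$ of the trace of Eq. 3, as a consequence of their evaluation via the Laplace saddle-point method of the following integral over $\lambda$
$$\overline{ |T_L|^k } = \int d\lambda \, P_L(\lambda) \, e^{L k \lambda} \mathop{\simeq}_{L \to +\infty} \int d\lambda \, e^{L [ k \lambda - I(\lambda) ]} \mathop{\simeq}_{L \to +\infty} e^{L \phi(k)} \qquad (7)$$
The function $\phi(k)$ governing their exponential growth in $L$ is called the ’scaled cumulant generating function’ in the field of large deviations. It corresponds to the Legendre transform of the rate function $I(\lambda)$ of Eq. 5 as a consequence of the saddle-point evaluation of Eq. 7
$$\phi(k) = \max_{\lambda} \left[ k \lambda - I(\lambda) \right] \qquad (8)$$
with the reciprocal Legendre transform
$$I(\lambda) = \max_{k} \left[ k \lambda - \phi(k) \right] \qquad (9)$$
For $k=0$ where $\phi(0)=0$ as a consequence of the normalization in Eq. 7, one obtains that the typical value $\lambda^{\rm typ}$ where the rate function vanishes (Eq. 6) corresponds to the derivative
$$\lambda^{\rm typ} = \phi'(0) \qquad (10)$$
while all moments of order $k \neq 0$ are dominated by non-typical values of the Lyapunov exponent in the saddle-point calculation of Eq. 7.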
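The Legendre structure described above can be checked numerically. The following minimal sketch assumes a purely quadratic rate function with hypothetical parameters `m` (typical value) and `s2` (variance parameter), computes the scaled cumulant generating function by brute-force maximization on a grid, and recovers the typical value from its derivative at the origin.

```python
def legendre(f, grid):
    """Return the function y -> max_x [x * y - f(x)], evaluated on a grid of x."""
    return lambda y: max(x * y - f(x) for x in grid)

m, s2 = 0.3, 0.5                                   # hypothetical parameters
rate = lambda lam: (lam - m) ** 2 / (2 * s2)       # assumed quadratic rate function
grid = [i * 0.001 for i in range(-5000, 5001)]     # values of lambda in [-5, 5]
scgf = legendre(rate, grid)                        # phi(k) = max over lambda

for k in (-1.0, 0.0, 0.5, 2.0):
    exact = m * k + s2 * k * k / 2                 # known transform of a quadratic
    print(k, scgf(k), exact)

# The typical value is the slope of the scaled cumulant generating function at k = 0
h = 1e-3
typical = (scgf(h) - scgf(-h)) / (2 * h)
print("typical value:", typical)
```

A grid maximization is deliberately naive; it only serves to make the saddle-point statement concrete for a case where the answer is known in closed form.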
Since the typical Lyapunov exponent appears with probability one in the thermodynamic limit $L \to +\infty$, one of the main goals in the field of products of random matrices has been to compute it in various models via the Dyson-Schmidt invariant measure method [16, 17, 49]. In the present paper, our goal will be instead to focus on the simplest cases where the whole large deviations rate function can be explicitly obtained.
II.3 Examples of observables corresponding to products of random variables
It is clear that the simplest case of Eq. 3 is when the transfer matrices $M(\omega_i)$ are replaced by numbers $t(\omega_i)$
$$T_L = \prod_{i=1}^{L} t(\omega_i) \qquad (11)$$
This case occurs in various disordered models, either exactly or approximately in some region of parameters, as displayed by the following examples.
II.3.1 Examples of observables that are exactly given by products of random variables
(1-a) In the classical Ising chain with random couplings , the two-spin correlation function reads [50, 16, 17]
[TABLE]
(1-b) In the random quantum spin chains corresponding to free Majorana fermions, the possible edge Majorana zero modes that characterize the topological phases are given in terms of products of random variables in the simplest cases (see [51] and references therein for various examples).
II.3.2 Observables that can be approximated by products of random variables in certain regions of parameters
(2-a) For the Anderson Localization tight-binding model with hopping and random on-site-energy , the eigenfunction localized on site for can be approximated at lowest order in the hopping in the so-called Forward Approximation [52, 53, 54, 55] by the product
[TABLE]
(2-b) For the quantum Ising chain with random couplings and random transverse fields , the two-spin correlation function is given at lowest order in perturbation in the couplings by the product
[TABLE]
This form can also be understood from the Strong Disorder RG approach [27, 28] when only sites are decimated, or from the Cavity approach [56, 57, 58].
II.4 Classification of observables in terms of empirical histograms of the disorder configuration
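For each disorder configuration $(\omega_1, \ldots, \omega_L)$ with periodic boundary conditions $\omega_{L+i} \equiv \omega_i$, the empirical 1-point histogram
$$Q_1(\omega) = \frac{1}{L} \sum_{i=1}^{L} \delta_{\omega_i, \omega} \qquad (15)$$
measures the frequencies of the possible values $\omega$ of the disorder variable. More generally, the empirical r-point histogram
$$Q_r(\omega^{(1)}, \ldots, \omega^{(r)}) = \frac{1}{L} \sum_{i=1}^{L} \prod_{j=1}^{r} \delta_{\omega_{i+j-1}, \omega^{(j)}} \qquad (16)$$
measures the frequencies of the occurrence of the r consecutive values $(\omega^{(1)}, \ldots, \omega^{(r)})$ in the disordered sample. This hierarchy can be constructed up to the maximal value $r = L$ that corresponds to the total length of the disorder configuration
$$Q_L(\omega^{(1)}, \ldots, \omega^{(L)}) = \frac{1}{L} \sum_{i=1}^{L} \prod_{j=1}^{L} \delta_{\omega_{i+j-1}, \omega^{(j)}} \qquad (17)$$
i.e. this represents the average over the $L$ translations of the initial disorder configuration.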
The observables of the disordered models can then be classified according to the order $r$ of the empirical r-point histogram that allows one to reconstruct them. For instance, the product of Eq. 11 of the random numbers $t(\omega_i)$ on the $L$ sites can be rewritten in terms of the empirical 1-point histogram $Q_1(\omega)$ of Eq. 15 as
$$\prod_{i=1}^{L} t(\omega_i) = \prod_{\omega} \left[ t(\omega) \right]^{L Q_1(\omega)} \qquad (18)$$
The physical interpretation is that the product of random variables is not sensitive to the order of appearance of the disorder variables $\omega_i$, but depends only on the global frequencies of the possible values that are summarized in the empirical 1-point histogram $Q_1(\omega)$.
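This insensitivity to the order of appearance is easy to verify numerically. The sketch below uses a hypothetical set of three positive values and a short random configuration: it builds the empirical 1-point histogram, recomputes the product from the histogram alone, and checks that a shuffled configuration gives the same product.

```python
import math
import random
from collections import Counter

random.seed(0)
values = [0.5, 1.5, 3.0]                    # hypothetical values of the random numbers
config = random.choices(values, k=12)       # one disorder configuration
L = len(config)

# Empirical 1-point histogram: frequency of each value in the sample
Q = {v: n / L for v, n in Counter(config).items()}

direct = math.prod(config)
via_histogram = math.exp(L * sum(q * math.log(v) for v, q in Q.items()))

shuffled = config[:]
random.shuffle(shuffled)                    # same histogram, different ordering

print(direct, via_histogram, math.prod(shuffled))
```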
An example of an observable that depends only on the empirical r-point histogram of Eq. 16 is the spatial average, within a given sample, of the 2-point correlation function of Eq. 12 at a given distance
[TABLE]
whose statistics is discussed in [50] to stress that it will coincide with the disorder-averaged correlation function only for the small sizes . Finally, the most general observables depend on the empirical L-point histogram of Eq. 17 that contains the complete information on the disorder configuration.
The usefulness of this classification is that once one has identified that an observable depends on the disorder configuration only via its empirical r-point histogram of Eq. 16
[TABLE]
then its probability distribution over the disorder configurations drawn with Eq. 2 depends only on the probability distribution of the empirical r-point histogram
[TABLE]
In the theory of large deviations, the probability distributions of the empirical r-point histograms of various orders have been labelled by levels as follows [1, 3]: Level 2 corresponds to the empirical 1-point histogram, Level 2.5 corresponds to the empirical 2-point histogram, and Level 3 corresponds to the full hierarchy of arbitrary order $r$ up to the limit $r \to +\infty$. In the following sections, we will thus describe this hierarchy, starting with Level 1, which corresponds to the large deviations properties of sums of random variables that are important to fully characterize the statistics of products of random variables.
III Product of random variables as the level-1 of large deviations
In this section, we focus on the product of random variables corresponding to the modulus of Eq 11
[TABLE]
and on the corresponding finite-size Lyapunov exponent of Eq. 4
[TABLE]
As explained in detail in the previous section, this is the simplest problem that occurs in the field of disordered systems. In the language of large deviations, the sum of random variables of Eq. 23 is also the simplest example, corresponding to the so-called ’Level-1’ description [1, 3].
III.1 Moments of non-integer order
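The moments of non-integer order $k$ of the product in Eq. 22 can be directly computed as a consequence of the independence of the disorder variables on the $L$ sites (Eq. 2)
$$\overline{ \prod_{i=1}^{L} |t(\omega_i)|^k } = \left[ \sum_{\omega} p(\omega) \, |t(\omega)|^k \right]^L \qquad (24)$$
So the scaled cumulant generating function $\phi(k)$ governing their exponential growth in $L$ (Eq. 7) is given, actually even for any finite $L$, by the simple expression
$$\phi(k) = \ln \left[ \sum_{\omega} p(\omega) \, |t(\omega)|^k \right] \qquad (25)$$
in terms of the moments of the elementary variable $|t(\omega)|$.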
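The statement that the factorization holds for any finite size can be checked by exhaustive enumeration. The following sketch assumes a hypothetical two-valued disorder and a small sample, sums over all configurations, and compares the resulting growth rate with the logarithm of the single-site moment.

```python
import itertools
import math

p = {0.5: 0.3, 2.0: 0.7}      # hypothetical disorder: value of the random number -> probability
L, k = 6, 0.37                # small sample size and a non-integer order

# Disorder average of the k-th moment of the product, by enumeration of all configurations
moment = 0.0
for config in itertools.product(p, repeat=L):
    prob = math.prod(p[t] for t in config)
    moment += prob * math.prod(config) ** k

# Scaled cumulant generating function from the single-site moment
phi = math.log(sum(w * t ** k for t, w in p.items()))

print(math.log(moment) / L, phi)
```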
III.2 Rate function governing the large deviations of the Lyapunov exponent
The rate function governing the large deviations (Eq 5) of the Lyapunov exponent of Eq. 23 can be computed either directly if the probability distribution of the sum of Eq. 23 is known or it can be obtained via the reciprocal Legendre transform (Eq. 9) from the knowledge of the function of Eq. 25. Let us now recall some simple examples that will be useful later (in section VII).
III.3 Examples for the equilibrium of disordered classical models
In the field of disordered classical models, the simplest example is when the variable corresponds to the Boltzmann weight at inverse temperature of the random potential
[TABLE]
Then Eq. 22 represents the Boltzmann weight of the sites
[TABLE]
and Eq. 23 corresponds to the energy per site (up to the factor )
[TABLE]
For instance if the distribution of the potential is Gaussian of zero mean
[TABLE]
then both the rate function and the scaled cumulant generating function are simply quadratic
[TABLE]
Another example is when the distribution of the potential is the Bernoulli distribution
[TABLE]
then the rate function and the scaled cumulant generating function read
[TABLE]
So it is important to stress here that the large deviations properties depend on all the details of the disorder distribution , in contrast to the small deviations region described by the Central-Limit-Theorem that corresponds to the expansion at lowest order of the rate function around its vanishing minimum at the typical value of Eq. 6
[TABLE]
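To make the contrast between large deviations and the Central-Limit-Theorem region concrete, here is a minimal sketch for the empirical mean of independent Bernoulli variables taking the value 1 with a hypothetical probability `c`. It is formulated directly for the empirical mean, which is an assumption of this example and may differ from the variables used in the equations above. The rate function is the standard Cramér result for Bernoulli variables; it is compared with the numerical Legendre transform of the scaled cumulant generating function and with its quadratic expansion around the typical value.

```python
import math

c = 0.3   # hypothetical probability of the value 1

def rate(x):
    """Rate function of the empirical mean x of Bernoulli(c) variables, for 0 < x < 1."""
    return x * math.log(x / c) + (1 - x) * math.log((1 - x) / (1 - c))

def scgf(k):
    """Scaled cumulant generating function ln(1 - c + c e^k)."""
    return math.log(1 - c + c * math.exp(k))

def rate_from_scgf(x, lo=-30.0, hi=30.0, n=60001):
    """Reciprocal Legendre transform by brute-force maximization over k."""
    ks = (lo + (hi - lo) * i / (n - 1) for i in range(n))
    return max(k * x - scgf(k) for k in ks)

for x in (0.1, 0.3, 0.6):
    print(x, rate(x), rate_from_scgf(x))

# Small deviations: quadratic expansion with the variance c (1 - c)
x = c + 0.01
gaussian = (x - c) ** 2 / (2 * c * (1 - c))
print(rate(x), gaussian)
```

Far from the typical value the full rate function departs from the quadratic form, which is where the details of the disorder distribution matter.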
III.4 Examples for disordered quantum models
For the Anderson Localization model in the Forward approximation of Eq. 13, it is usual to consider the box distribution of width for the random on-site energy
[TABLE]
The elementary variable in the product in Eq. 13 at the center of the band
[TABLE]
has then moments only in the region
[TABLE]
So the scaled cumulant generating function of Eq. 25 reads
[TABLE]
with the corresponding rate function
[TABLE]
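The box distribution can be explored numerically under explicit assumptions. The sketch below assumes that the elementary variable at the band center is the ratio of a hopping `V` to an on-site energy drawn uniformly on an interval of width `W` (both values hypothetical), so that its moments of order k exist only for k < 1. Under this assumption the logarithm of the inverse rescaled energy is exponentially distributed, which gives the closed forms used in `scgf` and `rate`; these are derived for this example and are not copied from the equations above.

```python
import math
import random

random.seed(3)
V, W = 1.0, 10.0          # hypothetical hopping and box width

def scgf(k):
    """ln of the k-th moment of |V / eps| for eps uniform on [-W/2, W/2]; requires k < 1."""
    return k * math.log(2 * V / W) - math.log(1 - k)

def rate(lam):
    """Legendre transform of scgf; x = lam - ln(2V/W) must be positive."""
    x = lam - math.log(2 * V / W)
    return x - 1 - math.log(x)

# Monte Carlo estimate of one moment inside the region of existence
k = 0.3
samples = [abs(V / random.uniform(-W / 2, W / 2)) ** k for _ in range(200_000)]
mc = math.log(sum(samples) / len(samples))
print(mc, scgf(k))

# Numerical Legendre transform over k in [-20, 1) compared with the closed form
lam = 0.5
numeric = max(q * lam - scgf(q) for q in (i * 1e-4 - 20 for i in range(209_990)))
print(numeric, rate(lam))
```

In this parametrization the rate function vanishes at `1 + ln(2V/W)`, which is the slope of `scgf` at the origin.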
IV Empirical 1-point histogram as the level-2 of large deviations
In this section, we focus on the probability of the empirical 1-point histogram of Eq. 15 over the disorder configurations drawn with Eq. 2
[TABLE]
Of course the typical value of this histogram is the ’true’ probability distribution of the disorder (Eq. 1)
[TABLE]
but here the goal is to describe its fluctuations for large $L$. In the language of large deviations [1, 2, 3], this is known as the ’Level-2 description of the empirical measure’. The essential result is the large deviation form for large $L$
$$P_L[Q_1] \mathop{\simeq}_{L \to +\infty} C_1[Q_1] \, e^{- L I_2[Q_1]} \qquad (41)$$
where
$$C_1[Q_1] = \delta \left( \sum_{\omega} Q_1(\omega) - 1 \right) \qquad (42)$$
represents the normalization constraint of the empirical histogram (the notation $\delta(\cdot)$ represents the discrete Kronecker symbol but has been chosen here for better readability of the argument), while the rate function is the relative entropy of the empirical 1-point histogram $Q_1(\omega)$ with respect to the true probability distribution $p(\omega)$ of the disorder
$$I_2[Q_1] = \sum_{\omega} Q_1(\omega) \ln \left( \frac{Q_1(\omega)}{p(\omega)} \right) \qquad (43)$$
This result is known as the Sanov theorem in the field of large deviations [1, 2, 3] and can be considered as the true cornerstone of the whole theory, with many further generalizations for the higher levels. It is thus important to fully understand its origin and its physical meaning, via the three following different derivations.
IV.1 First approach via the multinomial distribution
Since each disorder value $\omega$ is drawn with probability $p(\omega)$ independently on each of the $L$ sites (Eq. 2), computing the probability of the empirical 1-point histogram $Q_1(\omega)$ of Eq. 15 amounts to analyzing the integer numbers $n_\omega = L Q_1(\omega)$ of occurrences of each value $\omega$, and this probability is thus given by the multinomial distribution
$$P_L[Q_1] = \frac{L!}{\prod_{\omega} n_\omega !} \prod_{\omega} \left[ p(\omega) \right]^{n_\omega} \quad \text{with} \quad \sum_{\omega} n_\omega = L \qquad (44)$$
Stirling’s approximation for the factorials then yields the large deviation form of Eq. 41 with the relative entropy of Eq. 43. This derivation, based on the application of Stirling’s approximation to the multinomial distribution of Eq. 44, goes back to Boltzmann [2] and appears in all statistical physics lectures.
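The convergence of the multinomial distribution towards the relative entropy can be observed directly. The following minimal sketch uses a hypothetical three-valued disorder distribution `p` and a fixed atypical histogram `Q`; the exact logarithm of the multinomial probability is evaluated with `math.lgamma` and compared with the relative entropy as the sample size grows.

```python
import math

p = [0.5, 0.3, 0.2]    # hypothetical disorder distribution
Q = [0.2, 0.3, 0.5]    # hypothetical atypical empirical histogram

def log_multinomial_prob(counts, p):
    """Exact logarithm of the multinomial probability of the occupation numbers."""
    out = math.lgamma(sum(counts) + 1)
    for n, w in zip(counts, p):
        out += n * math.log(w) - math.lgamma(n + 1)
    return out

def relative_entropy(Q, p):
    """Relative entropy of Q with respect to p."""
    return sum(q * math.log(q / w) for q, w in zip(Q, p) if q > 0)

for L in (10, 100, 1000, 10000):
    counts = [round(q * L) for q in Q]
    print(L, -log_multinomial_prob(counts, p) / L, relative_entropy(Q, p))
```

The difference between the two columns decays like (ln L)/L, which is the subleading correction neglected in the large deviation form.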
IV.2 Second approach via the generating function
Another derivation is based on the generating function of the empirical 1-point histogram of Eq. 39
[TABLE]
This factorized form is valid already for any finite and the corresponding scaled cumulant generating function governing the exponential growth with
[TABLE]
is given in terms of the generating function of the disorder distribution
[TABLE]
where the analogy with Eq. 25 is clear. It is now useful to show the link with the relative entropy of Eq. 43 via the Legendre transform and the reciprocal Legendre transform, respectively.
IV.2.1 Link with the relative entropy via the Legendre transform
The generating function of Eq 45 can be rewritten in terms of Eq. 41 as
[TABLE]
Laplace’s saddle-point method for large $L$ implies that one should optimize the exponent over the empirical histogram, in the presence of the normalization constraint, in order to obtain the function of Eq. 46
[TABLE]
Taking into account the constraint via some Lagrange multiplier , one needs to optimize the functional
[TABLE]
over the values
[TABLE]
One obtains the optimal solution
[TABLE]
where the Lagrange multiplier is fixed by the constraint
[TABLE]
The optimal value of the functional of Eq. 50
[TABLE]
indeed coincides with the result of Eq. 47.
IV.2.2 Link with the relative entropy via the reciprocal Legendre transform
The reciprocal Legendre transform of Eq. 49 reads
[TABLE]
The optimization over
[TABLE]
yields the optimal solution
[TABLE]
and the optimal value of the functional of Eq. 55
[TABLE]
coincides with the relative entropy as it should.
These calculations based on generating functions, Laplace’s saddle-point method with constraints taken into account via Lagrange multipliers, and Legendre transforms are very standard both in statistical physics and in the theory of large deviations.
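The variational structure of these two subsections can be tested numerically. In the sketch below, `p` is a hypothetical disorder distribution and `nu` a hypothetical conjugate field; the generating function is assumed to be the logarithm of the average of the exponential of `nu`, the optimal histogram is the corresponding tilted distribution, and random normalized histograms are used to check that none of them exceeds the optimum.

```python
import math
import random

random.seed(1)
p = [0.5, 0.3, 0.2]      # hypothetical disorder distribution
nu = [0.7, -1.2, 0.4]    # hypothetical conjugate field, one value per disorder value

# Scaled cumulant generating function of the empirical histogram
phi = math.log(sum(w * math.exp(v) for w, v in zip(p, nu)))

# Optimal (tilted) histogram; the normalization plays the role of the Lagrange multiplier
Q_opt = [w * math.exp(v - phi) for w, v in zip(p, nu)]

def functional(Q):
    """sum of Q * nu minus the relative entropy of Q with respect to p."""
    return sum(q * v - q * math.log(q / w) for q, v, w in zip(Q, nu, p))

best_random = -math.inf
for _ in range(2000):
    x = [random.random() + 1e-12 for _ in p]
    Q = [xi / sum(x) for xi in x]              # random normalized histogram
    best_random = max(best_random, functional(Q))

print(phi, functional(Q_opt), best_random)
```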
IV.3 Third approach via some appropriate change of measure
The third approach via some appropriate change of measure is very common in the whole field of large deviations, but appears to be less well known among physicists. It seems thus useful to explain it here in more physical terms than usual. The starting point is that the probability of the disorder configuration of Eq. 2 can be rewritten only in terms of the empirical 1-point histogram of Eq. 15
[TABLE]
So all the disorder configurations that have the same empirical 1-point histogram have the same probability in Eq. 59. As a consequence, the normalization of Eq. 59 over all disorder configurations can be rewritten as a sum over the possible empirical 1-point histograms
[TABLE]
where
[TABLE]
counts the number of disorder configurations that are associated to the same value of the empirical histogram. So the probability of Eq 39 to observe the empirical histogram reads
[TABLE]
When the empirical 1-point histogram takes its typical value of Eq. 40, the probability of Eq. 62
[TABLE]
should not decay exponentially in , so that should grow exponentially in in order to compensate exactly the other exponential factor
[TABLE]
To obtain the behavior of when the empirical 1-point histogram is different from its typical value , we may consider a modified model where the disorder is drawn with the modified probability that will make typical for this modified model, and one obtains
[TABLE]
where
[TABLE]
represents the entropy of the empirical 1-point histogram . Plugging this result into Eq. 62 yields that the large deviation behavior of the probability of the empirical 1-point histogram
[TABLE]
involves again the relative entropy as it should to recover Eq 41 and Eq. 43.
This idea of evaluating the large deviations properties of the untypical values of the empirical observable via the introduction of a modified model that makes this empirical observable typical is used extensively in the field of large deviations for the two following reasons. From the conceptual point of view, this way of thinking is very illuminating because it shows very clearly why the entropy appears in Eq. 65 and why the relative entropy appears in Eq. 67. From the technical point of view, it is extremely powerful, since it allows one to obtain the results directly without any actual computations: indeed, one does not need to use combinatorics to enumerate the appropriate configurations in finite size as in Eq. 44, and one does not need either to compute the generating function of Eq. 45 and to perform the reciprocal Legendre transform, but one obtains the rate function directly from simple considerations. In the following sections concerning the more complicated cases of empirical histograms of higher orders, as well as in the Appendix, we will see how this approach can be adapted to each purpose in order to obtain the appropriate rate functions directly without any calculation.
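The counting argument behind the change of measure can be illustrated on a small example. The sketch below uses a hypothetical three-letter alphabet and a sample of nine sites: it enumerates all configurations, counts how many share each empirical histogram, and checks that in the modified model where the chosen histogram is the true distribution, the probability of observing it is not exponentially small, i.e. the number of configurations compensates the exponential of minus the size times the entropy.

```python
import itertools
import math
from collections import Counter

alphabet, L = "abc", 9                      # hypothetical disorder values and sample size

# Number of configurations associated with each set of occupation numbers
omega = Counter()
for config in itertools.product(alphabet, repeat=L):
    counts = tuple(config.count(a) for a in alphabet)
    omega[counts] += 1

def entropy(counts):
    """Entropy of the empirical histogram defined by the occupation numbers."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c)

counts = (3, 3, 3)
# Probability of this histogram in the modified model where it is the true distribution
prob_under_Q = omega[counts] * math.exp(-L * entropy(counts))
print(omega[counts], math.log(omega[counts]) / L, entropy(counts), prob_under_Q)
```

For such a small size the logarithm of the count divided by the size is still visibly below the entropy; the gap is the power-law prefactor that the large deviation form neglects.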
V Empirical 2-point histogram as the Level 2.5 of large deviations
In this section, we focus on the probability of the empirical 2-point histogram of Eq. 16 for over the disorder configurations drawn with Eq. 2
[TABLE]
Its large deviations properties have been analyzed in the context of Markov chains [11, 59, 3]. Together with its analog formulations for Markov jump processes in continuous time [11, 60, 61, 62, 63, 64, 65, 66, 15, 67, 68, 69] and for diffusion processes [63, 70, 64, 71, 15], it is nowadays called the ’Level 2.5’ in the field of large deviations.
V.1 Constraints on the empirical 2-point histogram
Since the empirical 1-point histogram can be reconstructed by summing over the last or the first value of the empirical 2-point histogram , it is convenient to introduce the following notation to summarize these constraints
[TABLE]
while the empirical 1-point histogram should of course still satisfy the normalization constraint of Eq. 42.
V.2 Generalized Markovian model for the disorder
In order to analyze the statistical properties of the empirical 2-point histogram, it is useful to introduce a generalized model where the disorder configurations are generated by a Markov chain where the transition probability matrix to go from to is normalized to unity
[TABLE]
The probability of Eq. 2 for a disorder configuration is thus replaced by the product of the transition probabilities along the configuration (up to boundary terms that become negligible for large )
[TABLE]
It is also useful to introduce the stationary state of this Markov chain satisfying
[TABLE]
with the normalization
[TABLE]
For this generalized Markovian model, the typical value of the empirical 1-point histogram of Eq. 15 is simply the stationary state introduced in Eq. 72
[TABLE]
while the typical value of the empirical 2-point histogram is given by the corresponding flow appearing in Eq. 72
[TABLE]
which satisfies the constraints of Eq. 69 and Eq. 42.
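These typical values can be checked on a simulated chain. The following sketch assumes a hypothetical two-state transition matrix `W`, with `W[a][b]` the probability to go from `a` to `b`; it obtains the stationary state by power iteration, builds the corresponding flow, and compares it with the empirical 2-point histogram of one long simulated configuration.

```python
import random
from collections import Counter

random.seed(2)
# Hypothetical transition matrix: W[a][b] = probability to go from a to b
W = [[0.9, 0.1],
     [0.4, 0.6]]

# Stationary state by power iteration: pi(b) = sum over a of pi(a) W[a][b]
pi = [0.5, 0.5]
for _ in range(1000):
    pi = [sum(pi[a] * W[a][b] for a in range(2)) for b in range(2)]

# Stationary flow, i.e. the typical value of the empirical 2-point histogram
flow = [[pi[a] * W[a][b] for b in range(2)] for a in range(2)]

# Empirical 2-point histogram of one simulated configuration
L = 200_000
state, pairs = 0, Counter()
for _ in range(L):
    nxt = 0 if random.random() < W[state][0] else 1
    pairs[(state, nxt)] += 1
    state = nxt
Q2 = [[pairs[(a, b)] / L for b in range(2)] for a in range(2)]

print(pi)
print(flow)
print(Q2)
```

Summing the flow over its second or its first argument returns the stationary state in both cases, which is the consistency constraint on the empirical 2-point histogram.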
Since the probability of Eq. 71 can be rewritten only in terms of the empirical 2-point histogram as
[TABLE]
the normalization over disorder configurations can be rewritten as a sum over the empirical 1-point and 2-point histograms with their constraints of Eq. 42 and Eq 69 as
[TABLE]
where counts the number of disorder configurations that have the empirical observables and is thus the direct generalization of Eq. 61, while the probability to observe these empirical observables reads
[TABLE]
For the typical values of Eq. 74 and Eq 75 of the empirical observables, this probability should not be exponentially small in so that should exactly compensate the other exponential factor in Eq. 78
[TABLE]
For other values of the empirical observables, one may consider a modified Markov transition matrix that would make these empirical histograms typical: Eqs 74 and 75 yield that the appropriate choice is
[TABLE]
so that Eq 79 becomes
[TABLE]
where represents the entropy of the empirical 2-point histogram
[TABLE]
while is the entropy of the empirical 1-point histogram introduced in Eq 66.
Plugging Eq. 81 into Eq 78 yields the large deviation form [11, 59, 3]
[TABLE]
that is called nowadays the ’Level 2.5’ for Markov chains. The rate function can be interpreted as the relative entropy for Markov chains [11, 59, 3]. Analogous results have been much studied for Markov jump processes in continuous time [11, 60, 61, 62, 63, 64, 65, 66, 15, 67, 68, 69] and for diffusion processes [63, 70, 64, 71, 15].
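As a minimal sketch of this rate function, the code below implements the standard relative-entropy form for Markov chains, namely the sum over pairs of the empirical 2-point histogram times the logarithm of its ratio to the product of the empirical 1-point histogram and the transition probability. The matrices are hypothetical, and the explicit formula is written here from the standard result rather than copied from the equation above. It vanishes on the stationary flow and is positive for another consistent histogram.

```python
import math

def rate_2_5(Q2, W):
    """Relative entropy of the empirical 2-point histogram Q2 for the Markov matrix W."""
    n = len(W)
    Q1 = [sum(Q2[a]) for a in range(n)]        # empirical 1-point histogram (row sums)
    return sum(Q2[a][b] * math.log(Q2[a][b] / (Q1[a] * W[a][b]))
               for a in range(n) for b in range(n) if Q2[a][b] > 0)

W = [[0.9, 0.1], [0.4, 0.6]]                   # hypothetical transition matrix
typical = [[0.72, 0.08], [0.08, 0.12]]         # its stationary flow
other = [[0.50, 0.10], [0.10, 0.30]]           # another histogram with equal row and column sums

print(rate_2_5(typical, W), rate_2_5(other, W))
```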
V.3 Return to the initial disorder of Eq. 2
The initial disorder model of Eq. 2 corresponds to the special case where the Markov matrix of Eq. 70 reduces to
[TABLE]
Then Eq 83 simplifies into
[TABLE]
In the last expression, one recognizes the probability of the empirical 1-point histogram of Eq. 67. This yields that the conditional probability to observe the empirical 2-point histogram once the empirical 1-point histogram is given reads
[TABLE]
In particular, once the empirical 1-point histogram is given, the typical value of the empirical 2-point histogram is simply the product
[TABLE]
as it should, while Eq. 86 describes the large deviations away from this typical value.
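The factorization into the probability of the empirical 1-point histogram and a conditional part can be verified on a small example. The sketch below assumes a hypothetical independent disorder distribution `p` and a hypothetical consistent 2-point histogram `Q2`; it checks that the full exponent splits into the relative entropy of the 1-point histogram plus a conditional term that measures the correlations between consecutive values and vanishes for the product form.

```python
import math

p = [0.7, 0.3]                          # hypothetical independent disorder distribution
Q2 = [[0.50, 0.10], [0.10, 0.30]]       # hypothetical empirical 2-point histogram
Q1 = [sum(row) for row in Q2]           # empirical 1-point histogram (rows and columns agree)
idx = range(2)

# Full exponent when the Markov matrix reduces to W[a][b] = p[b]
full = sum(Q2[a][b] * math.log(Q2[a][b] / (Q1[a] * p[b])) for a in idx for b in idx)
# Level 2 part: relative entropy of the 1-point histogram
level2 = sum(Q1[a] * math.log(Q1[a] / p[a]) for a in idx)
# Conditional part: distance of Q2 from the product of its marginals
conditional = sum(Q2[a][b] * math.log(Q2[a][b] / (Q1[a] * Q1[b])) for a in idx for b in idx)

product_form = [[Q1[a] * Q1[b] for b in idx] for a in idx]
conditional_typ = sum(product_form[a][b] * math.log(product_form[a][b] / (Q1[a] * Q1[b]))
                      for a in idx for b in idx)

print(full, level2 + conditional, conditional_typ)
```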
VI Empirical higher order histograms as the level 3 of large deviations
In the language of large deviations, the Level 3 actually denotes the empirical process that can be constructed from the knowledge of the empirical r-point histogram in the limit $r \to +\infty$ [1, 3]. In this section, we will not be interested in taking this limit, but we wish to analyze the hierarchy of the empirical r-point histograms of arbitrary order $r$ up to the maximal value $r = L$ (Eq. 17), in order to characterize the sample-to-sample fluctuations for a disordered ring of large size $L$. So strictly speaking, this section is between the Level 2.5 of the previous section and the Level 3 concerning the limit $r \to +\infty$.
VI.1 Large deviations properties of the empirical r-point histograms of arbitrary order
In the two previous sections, we have described in detail the large deviations properties of the empirical 1-point histogram and 2-point histogram . Via iteration, one may analyze similarly the properties of the empirical r-point histogram of Eq. 16 of arbitrary order . Since the empirical -point histogram can be reconstructed by summing over the last or the first value of the r-point histogram , it is convenient to introduce the following notation analogous to Eq 69 to summarize them
[TABLE]
The final result is that the probability to observe the empirical histograms up to the -point histogram normalized to unity
[TABLE]
follows the large deviation form
[TABLE]
that generalizes Eq. 85. Besides the consistency constraints up to order (Eq 88) and besides the disorder configuration weight of Eq. 59 that only involves the empirical 1-point histogram , the remaining factor corresponds to the exponential growth of the number of configurations that have some empirical r-point histogram
[TABLE]
in terms of the entropy of the empirical r-point histogram
[TABLE]
Equivalently, Eq. 90 means that the conditional probability to observe the empirical -point histogram once the empirical -point histogram is given reads
[TABLE]
which is the generalization of Eq. 86.
VI.2 Analysis of the hierarchy in the backward direction via contraction
Up to now we have described the hierarchy of empirical histograms by considering successively higher and higher order . But it is also useful to see now how one goes backwards in this hierarchy, via the notion of ’contraction’ which is the generic name in the field of large deviations for the operation needed to go from a higher to a lower level of description. In our present case, the contraction consists in finding the optimal empirical -point histogram that maximizes the conditional probability Eq 93 when all the lower-order empirical histograms are given. One needs to maximize the exponential factor in Eq 93 in the presence of the constraints of Eq. 88 that can be taken into account via Lagrange multipliers. So one considers the following functional of
[TABLE]
The optimization with respect to
[TABLE]
yields the optimal solution
[TABLE]
where the Lagrange multipliers and have to be chosen to satisfy the constraints
[TABLE]
A further consequence is thus the following constraint involving the empirical histogram of order
[TABLE]
These last four equations imply that the optimal solution of Eq. 96 can be simply rewritten as the product of the two empirical observables of Eq. 97 divided by the empirical observable of Eq. 98
[TABLE]
One then needs to evaluate the entropy of Eq. 92 of this optimal solution
[TABLE]
So the functional of Eq. 94 vanishes for this optimal solution
[TABLE]
i.e. the conditional probability of Eq. 93 does not decay exponentially in for this optimal solution , that represents the typical value of once all the empirical histograms of lower order are given
[TABLE]
The probability of all other values is described by the large deviation form of Eq. 93.
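The contraction can be illustrated in the lowest non-trivial case, going from a 3-point histogram back to a given 2-point histogram. The sketch below assumes a hypothetical consistent 2-point histogram `Q2`, builds the candidate optimal 3-point histogram as the product of two overlapping 2-point histograms divided by the shared 1-point histogram, and checks that it has the right marginals and that its entropy satisfies the relation that makes the exponent of the conditional probability vanish.

```python
import math

Q2 = [[0.50, 0.10], [0.10, 0.30]]       # hypothetical consistent 2-point histogram
Q1 = [sum(row) for row in Q2]           # its 1-point marginal
n = len(Q1)

# Candidate optimal 3-point histogram: two overlapping pairs divided by the shared site
Q3 = {(a, b, c): Q2[a][b] * Q2[b][c] / Q1[b]
      for a in range(n) for b in range(n) for c in range(n)}

def entropy(values):
    return -sum(q * math.log(q) for q in values if q > 0)

S1 = entropy(Q1)
S2 = entropy(q for row in Q2 for q in row)
S3 = entropy(Q3.values())

# Marginals over the last and over the first value should both return Q2
last = [[sum(Q3[(a, b, c)] for c in range(n)) for b in range(n)] for a in range(n)]
first = [[sum(Q3[(a, b, c)] for a in range(n)) for c in range(n)] for b in range(n)]

print(S3, 2 * S2 - S1)
```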
VII Random models on the Cayley tree from large deviations of branches
Many random models have been studied on the geometry of the Cayley tree, where the absence of loops allows one to write exact recurrences for probability distributions: two famous examples are the Directed Polymer on the Cayley tree [72, 73] and Anderson Localization on the Cayley tree [74, 75, 76, 77]. In the Cayley tree of branching ratio around the central root , the number of sites at distance
[TABLE]
grows exponentially with the distance , in contrast to the power-law growth as in any finite dimension . The Cayley tree is thus considered as an appropriate way to define the mean-field version of random models in infinite dimensionality .
It is interesting to compare the properties of the same random model defined in the following two geometries:
(i) in the finite Cayley tree of branching ratio with generations around the central root , where the number of leaves is given by Eq. 103 for
[TABLE]
(ii) in the star geometry, where the central root is linked to independent one-dimensional lattices of sites, so that the number of sites at distance is actually independent of
[TABLE]
but the number of leaves at displays the same exponential behavior in as Eq. 104.
Although (ii) may look like an extremely crude approximation of (i), the properties of some random models defined on (i) and (ii) have turned out to be very close, as exemplified by the exact solutions of (i) the Directed Polymer on the Cayley tree [72, 73] and of (ii) the Directed Polymer in the star geometry, which coincides with the Random Energy Model [78] (a model that had been introduced earlier with completely different motivations coming from mean-field spin-glasses). The differences between the two only appear in the finite-size scaling properties of the freezing transition [73].
In the star geometry (ii), it is clear that the random model will be governed by the large deviations properties of the corresponding one-dimensional model of length that appear on the independent branches. In this section, the goal is thus to describe how the large deviations properties of one-dimensional models that have been discussed in the previous sections can be used to analyze the properties of the same model on this star geometry (ii).
VII.1 Model on the star geometry where each branch corresponds to a product of random variables
We wish to analyze the star geometry (ii) above, where each of the independent branches labelled by can be described by a product of random variables as in Eq. 11
[TABLE]
with its corresponding finite-size Lyapunov exponent of Eq. 23
[TABLE]
whose large deviations properties for large are described by some rate function
[TABLE]
Each disordered configuration on the star geometry can be then characterized by the empirical histogram of the Lyapunov exponent of Eq. 107 for the independent branches
[TABLE]
while the empirical number of branches having the Lyapunov exponent reads
[TABLE]
In various models, an interesting class of observables is given by the sums over the independent branches of the powers of non-integer order of the products of Eq. 106
[TABLE]
that can be rewritten in terms of the Lyapunov exponents (Eq 107) of the branches or in terms of the empirical observables of Eqs 109 and 110 as
[TABLE]
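To make the objects of this subsection concrete, here is a minimal sketch of one disordered sample of the star geometry. It assumes that the finite-size Lyapunov exponent of a branch is the logarithm of the product divided by the branch length, that the number of branches is K to the power L, and that the random factors are uniform on [0.5, 1.5]; these choices and all names (`sample_star`, `empirical_sum`, `empirical_counts`) are illustrative assumptions.

```python
import math
import random

def sample_star(num_branches, length, rng):
    """Return the finite-size Lyapunov exponent of each independent branch."""
    exponents = []
    for _ in range(num_branches):
        # Product of `length` positive random factors, handled through its logarithm.
        log_product = sum(math.log(rng.uniform(0.5, 1.5)) for _ in range(length))
        exponents.append(log_product / length)
    return exponents

def empirical_sum(exponents, length, q):
    """Sum over the branches of the q-th power of the branch product."""
    return sum(math.exp(q * length * lam) for lam in exponents)

def empirical_counts(exponents, bin_width):
    """Number of branches whose Lyapunov exponent falls in each bin of width bin_width."""
    counts = {}
    for lam in exponents:
        k = math.floor(lam / bin_width)
        counts[k] = counts.get(k, 0) + 1
    return counts

K, L = 2, 10                      # assumed branching ratio and branch length
M = K ** L                        # assumed number of independent branches
rng = random.Random(1)
exponents = sample_star(M, L, rng)
counts = empirical_counts(exponents, bin_width=0.05)
for q in (0.0, 0.5, 2.0):
    print(q, empirical_sum(exponents, L, q))
```

For q = 0 the empirical sum reduces to the number of branches, which gives a quick consistency check of the rewriting in terms of Lyapunov exponents.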
VII.2 Statistical properties of the empirical histogram of the Lyapunov exponent
The typical value of the empirical histogram of Eq. 109 is given by the true probability of the Lyapunov exponent of Eq. 108
[TABLE]
so that in a given sample, the empirical number of branches of Eq. 110 has for typical value
[TABLE]
The typical value of the one-dimensional model corresponding to the vanishing of the rate function will thus appear in an extensive number of the branches
[TABLE]
while all the other values in the interval where
[TABLE]
will appear in a sub-extensive number of branches. Finally, the values of the Lyapunov exponent outside this interval, i.e. in the two regions and where the rate function satisfies are too rare to appear in a typical sample of the star geometry, so that Eq. 114 should be rewritten more precisely for a typical sample as
[TABLE]
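The boundaries of the interval that is populated in a typical sample can be located numerically for any convex rate function. The sketch below assumes that the number of branches grows as K to the power L, so that the boundaries are the two solutions of "rate function equals ln K", and it uses a Gaussian (quadratic) rate function purely as a stand-in; the function names and the search window `span` are hypothetical.

```python
import math

def rate_gaussian(lam, mean=0.0, variance=1.0):
    """Quadratic rate function used here as an illustrative stand-in."""
    return (lam - mean) ** 2 / (2.0 * variance)

def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def boundaries(rate, lam_typ, log_branching, span=50.0):
    """Solve rate(lam) = ln K on each side of the typical value lam_typ."""
    g = lambda lam: rate(lam) - log_branching
    lam_minus = bisect(g, lam_typ - span, lam_typ)
    lam_plus = bisect(g, lam_typ, lam_typ + span)
    return lam_minus, lam_plus

def log_typical_count_per_length(rate, lam, log_branching):
    """(1/L) ln of the typical number of branches, or None outside the populated interval."""
    value = log_branching - rate(lam)
    return value if value >= 0 else None

lam_minus, lam_plus = boundaries(rate_gaussian, 0.0, math.log(2.0))
print(lam_minus, lam_plus)
print(log_typical_count_per_length(rate_gaussian, 0.5, math.log(2.0)))
print(log_typical_count_per_length(rate_gaussian, 2.0, math.log(2.0)))
```

With this quadratic stand-in and K = 2, the boundaries are plus and minus the square root of 2 ln 2, and the value 2.0 lies outside the populated interval.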
However the values and that do not appear in a typical sample may appear in atypical samples, and it is thus interesting to consider the large deviations of the empirical histogram of Eq. 109 with respect to its typical value of Eq. 113 : since the branches are independent, one may directly adapt the Sanov result of Eq. 41 to our present notations : the probability to observe the empirical histogram follows the large deviation form with respect to the size
[TABLE]
where the rate function corresponds to the relative entropy
[TABLE]
of the empirical histogram with respect to the true probability distribution of the Lyapunov exponent (Eq. 108). As explained in detail in section IV.2, the Sanov result of Eq. 118 is equivalent to the following expression of the generating function that is valid for any finite (Eq. 45 as adapted to our present context)
[TABLE]
In particular, the successive derivatives with respect to
[TABLE]
give the integer moments of the number of branches with some Lyapunov exponent (Eq. 110) by taking . The first moment
[TABLE]
coincides with the typical value of Eq. 114. The second moment
[TABLE]
can be rewritten in terms of the typical value of Eq. 114 as
[TABLE]
and thus changes behavior at the values introduced in Eq. 116. In the region where is exponentially large, the second term dominates over the first term, which corresponds to a small fluctuation. In the other regions where is exponentially small, the first term dominates and actually represents the very small probability of having a single rare event
[TABLE]
This result can be generalized to arbitrary moments, as described in the context of the Random Energy Model [78].
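The competition between the two terms of the second moment can be illustrated with elementary binomial statistics. The sketch below assumes that the number of branches falling in a small window of Lyapunov exponents is binomial, with a per-branch probability taken as the exponential of minus L times a quadratic rate function (prefactors dropped); these modelling choices and the function names are assumptions made for the illustration.

```python
import math

def binomial_moments(num_branches, p):
    """First and second moments of a binomial count with parameters (num_branches, p)."""
    mean = num_branches * p
    second = num_branches * p * (1.0 - p) + mean ** 2
    return mean, second

def exact_second_moment(num_branches, p):
    """Second moment from the explicit binomial sum (only practical for small sizes)."""
    return sum(
        k * k * math.comb(num_branches, k) * p ** k * (1.0 - p) ** (num_branches - k)
        for k in range(num_branches + 1)
    )

def dominant_term(num_branches, p):
    """Tell which contribution controls the second moment."""
    mean, _ = binomial_moments(num_branches, p)
    return "squared mean" if mean ** 2 > mean * (1.0 - p) else "single rare event"

K, L = 2, 40                                  # assumed branching ratio and length
M = K ** L
for lam in (0.5, 1.5):                        # inside and outside the populated interval
    p = math.exp(-L * 0.5 * lam * lam)        # assumed quadratic rate function
    print(lam, binomial_moments(M, p), dominant_term(M, p))
```

For the inner value the mean is exponentially large and its square dominates; for the outer value the mean is exponentially small and the second moment is essentially the probability of finding one such branch.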
VII.3 Statistical properties of the empirical sums of Eq. 112
The disorder-averaged value of the empirical sum of Eq. 111 reads
[TABLE]
where the moments of non-integer order for the product of random variables have already been discussed in Eq. 24
[TABLE]
in terms of the scaled cumulant generating function
[TABLE]
that corresponds to the Legendre transform of the rate function .
On the other hand, Eq. 112 yields that the sum in a typical sample can be computed from the empirical histogram in a typical sample (Eq. 117)
[TABLE]
So the only difference with the averaged value (Eqs 126 and 127)
[TABLE]
lies in the boundaries for the integration over the Lyapunov exponent that appear for the value in a typical sample (Eq. 129) but that are absent in the averaged value of Eq. 130. As a consequence, one needs to discuss the position of the saddle-point value that dominates the integral for the averaged value of Eq. 130
[TABLE]
with respect to the two boundaries of the integral governing the typical-sample value of Eq. 129. It is thus useful to introduce the two values satisfying i.e.
[TABLE]
and to distinguish the following three cases:
(a) In the region , the saddle-point value of Eq. 131 is in the interval
[TABLE]
The typical-sample value of Eq. 129 has then the same exponential behavior in as the averaged value involving the Legendre transform (Eqs 127 and 25 ) of
[TABLE]
(b) In the region , the saddle-point value of Eq. 131 is bigger than
[TABLE]
The typical-sample value of Eq. 129 is then governed by the saddle-point evaluation frozen at the boundary satisfying (Eq 116)
[TABLE]
(c) In the region , the saddle-point value of Eq. 131 is smaller than
[TABLE]
The typical-sample value of Eq. 129 is then governed by the saddle-point evaluation frozen at the boundary satisfying (Eq 116)
[TABLE]
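The three cases can be reproduced by maximizing the exponent over the allowed range of Lyapunov exponents. The sketch below assumes K = 2 and a quadratic rate function, so that the exponent of the sum is "ln K minus the rate function plus q times the Lyapunov exponent"; the typical-sample value restricts the maximization to the populated interval, while the averaged value does not. The constants, the grid maximization and the function names are illustrative assumptions.

```python
import math

LOG_K = math.log(2.0)                 # assumed branching ratio K = 2
LAM_PLUS = math.sqrt(2.0 * LOG_K)     # solves rate(lam) = ln K for the quadratic stand-in
LAM_MINUS = -LAM_PLUS

def rate(lam):
    """Quadratic rate function used as an illustrative stand-in."""
    return 0.5 * lam * lam

def exponent(q, lo, hi, steps=40001):
    """Maximum over lam in [lo, hi] of ln K - rate(lam) + q * lam, on a uniform grid."""
    best = -math.inf
    for i in range(steps):
        lam = lo + (hi - lo) * i / (steps - 1)
        best = max(best, LOG_K - rate(lam) + q * lam)
    return best

def typical_exponent(q):
    """Exponent of the sum in a typical sample: maximization restricted to the populated interval."""
    return exponent(q, LAM_MINUS, LAM_PLUS)

def averaged_exponent(q):
    """Exponent of the disorder average: a wide window stands in for the unrestricted integral."""
    return exponent(q, -20.0, 20.0)

for q in (-2.0, 0.5, 2.0):
    print(q, typical_exponent(q), averaged_exponent(q))
```

In this stand-in the saddle point sits at the Lyapunov exponent equal to q, so the two exponents agree for q = 0.5 and differ for q = 2 and q = -2, where the typical-sample value is frozen at the upper and lower boundary respectively.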
VII.4 Sample-to-sample fluctuations in the frozen phase
In the frozen phase , the sample-dependent version of Eq. 136 is that the sum in a given sample will actually be governed by the biggest Lyapunov exponents available among the branches. It is thus convenient to relabel in each sample the Lyapunov exponents according to their magnitudes
[TABLE]
and to analyze the statistics of the first biggest terms in the sum of Eq. 112
[TABLE]
and in particular the first one that involves the maximal Lyapunov exponent
[TABLE]
VII.4.1 Probability distribution of the maximal Lyapunov exponent in each sample
The maximal Lyapunov exponent is typically of order , but here we wish to analyze its probability distribution over the samples. The corresponding cumulative distribution reads in terms of of Eq. 108
[TABLE]
The change of variables
[TABLE]
centered around the value where and (Eq. 132) leads to the Taylor expansion of the rate function
[TABLE]
Plugging this expansion into Eq 142
[TABLE]
yields the convergence towards the Gumbel distribution (well-known as one of the three universality classes for the extreme-value statistics of independent random variables [82, 83])
[TABLE]
for the random variable introduced in Eq. 143.
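The convergence towards the Gumbel distribution can be checked without any sampling by comparing the exact cumulative distribution of a maximum with the Gumbel form. The sketch below uses independent standard Gaussian variables as a stand-in for the branch variables, centers the maximum at the threshold where one exceedance is expected, and rescales by the slope of the quadratic exponent at that threshold; the Gaussian choice, the number of variables and the function names are assumptions made for the illustration.

```python
import math

def gaussian_tail(t):
    """Probability that a standard Gaussian variable exceeds t."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def typical_maximum(num_variables):
    """Threshold a such that num_variables * gaussian_tail(a) = 1, found by bisection."""
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if num_variables * gaussian_tail(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cdf_of_maximum(t, num_variables):
    """Exact cumulative distribution of the maximum of independent standard Gaussians."""
    return math.exp(num_variables * math.log1p(-gaussian_tail(t)))

def gumbel_cdf(x):
    """Cumulative Gumbel distribution exp(-exp(-x))."""
    return math.exp(-math.exp(-x))

M = 10 ** 6
a = typical_maximum(M)
for x in (-1.0, 0.0, 1.0, 2.0):
    t = a + x / a                 # change of variables centered at the typical maximum
    print(x, cdf_of_maximum(t, M), gumbel_cdf(x))
```

The two columns agree to within a few percent for this size; for Gaussian variables the residual difference is known to shrink only slowly, as the logarithm of the number of variables grows.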
VII.4.2 Probability distribution of over the samples
Eq 141 yields that its logarithm reads with the change of variables of Eq. 143
[TABLE]
where is distributed with the Gumbel distribution of Eq. 146. This means that the probability distribution of propagates as a traveling wave as grows : the first term corresponds to a motion with the non-random velocity with respect to , while the second term is random and independent of , i.e. its probability distribution corresponds to the fixed shape of the traveling wave. This notion of traveling wave has been stressed here because it plays a major role in the analysis of random models defined on Cayley trees, as first discovered with the exact solution of the Directed Polymer on the Cayley tree [72].
Eq. 147 translates into
[TABLE]
where is the value in a typical sample introduced in Eq. 136, while
[TABLE]
is a positive random variable, whose distribution reads in terms of the Gumbel distribution of Eq. 146
[TABLE]
where the exponent
[TABLE]
governs the power-law decay of Eq. 150 for large
[TABLE]
The exponent decays continuously in the frozen phase from the value to vanishing values . Since it remains smaller than one in the whole frozen phase
[TABLE]
the averaged value of the variable is infinite
[TABLE]
i.e. the averaged value in Eq. 148 has a different exponential behavior in than the typical value , consistent with the discussion around Eq. 130.
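The link between a heavy-tail exponent smaller than one and an infinite average can be made explicit numerically. The sketch below assumes that the positive random variable is the exponential of a Gumbel variable divided by an exponent `mu`, so that its survival function is one minus the exponential of minus v to the power minus `mu`; this parametrization and the function names are assumptions chosen for the illustration.

```python
import math

def survival(v, mu):
    """P(variable > v) when variable = exp(x / mu) and x is Gumbel distributed."""
    return -math.expm1(-v ** (-mu))

def local_tail_exponent(v, mu, factor=10.0):
    """Minus the log-slope of the survival function between v and factor * v."""
    return -(math.log(survival(factor * v, mu)) - math.log(survival(v, mu))) / math.log(factor)

def truncated_mean(mu, v_max, steps=20000):
    """Integral of the survival function from 0 to v_max (trapezoid rule in s = ln v)."""
    s_min, s_max = -40.0, math.log(v_max)
    h = (s_max - s_min) / steps
    total = 0.0
    for i in range(steps + 1):
        s = s_min + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * survival(math.exp(s), mu) * math.exp(s)
    return total * h

print(local_tail_exponent(1e6, 0.5))                       # close to mu = 0.5
print(truncated_mean(0.5, 1e6), truncated_mean(0.5, 1e8))  # keeps growing with the cutoff
print(truncated_mean(2.0, 1e8), math.gamma(0.5))           # converges when mu > 1
```

For `mu` below one the truncated mean grows as a power of the cutoff, whereas for `mu` above one it converges to a finite value (the Gamma function of 1 - 1/mu for this parametrization).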
VII.5 Application to the Directed Polymer and the Random Energy Model
With respect to the generic notations of section VII.1, the Random Energy Model [78] corresponds to and to the case where is an energy distributed with a Gaussian distribution of Eq. 29 so that the rate function and the scaled cumulant generating function are quadratic
[TABLE]
The empirical number of Eq. 110 corresponds to the number of accessible states in the microcanonical ensemble where the energy density is fixed, and its value in a typical sample (Eq. 117) yields that the function in the exponential corresponds to the entropy as a function of the energy density in the microcanonical ensemble [78]
[TABLE]
with the boundaries
[TABLE]
With the change of notation , the empirical sum of Eq. 111 and 112 corresponds to the partition function in the canonical ensemble at inverse temperature
[TABLE]
with its disorder-averaged value (Eqs 126 and 127)
[TABLE]
while its value in a typical sample (Eq 129) involves the microcanonical entropy of Eq. 156
[TABLE]
Since the inverse temperature is positive (instead of of arbitrary sign above), the critical temperature of the freezing transition corresponds to the solution of Eq 132
[TABLE]
The two phases are [78]
(a) the high-temperature phase where the partition function in a typical sample (Eq 129) coincides with the averaged value of Eq. 159.
(b) the low-temperature frozen phase where the partition function in a typical sample is different from the averaged value of Eq. 159 because it is governed by the boundary (Eq. 136)
[TABLE]
In this frozen phase, the exponent of Eq. 151
[TABLE]
of the heavy-tail distribution of Eq. 152 allows one to analyze further the statistics of overlaps in terms of the weights of individual terms within a Lévy sum of random variables distributed with heavy tails [79, 80, 81].
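The two phases can be summarized in closed form once a normalization is fixed. The sketch below assumes the commonly used convention for the Random Energy Model with 2 to the power N levels and Gaussian energies of variance N/2, which may differ from the normalization of Eq. 29; with this convention the entropy per spin is ln 2 minus the squared energy density and the critical inverse temperature is twice the square root of ln 2. The tail exponent is taken as the ratio of the critical to the actual inverse temperature, which is also an assumption of this sketch.

```python
import math

LN2 = math.log(2.0)
BETA_C = 2.0 * math.sqrt(LN2)      # critical inverse temperature in the assumed convention

def entropy(e):
    """Microcanonical entropy per spin, or None where no level is expected in a typical sample."""
    value = LN2 - e * e
    return value if value >= 0 else None

def log_z_per_spin_averaged(beta):
    """(1/N) ln of the disorder-averaged partition function."""
    return LN2 + beta ** 2 / 4.0

def log_z_per_spin_typical(beta):
    """(1/N) ln of the partition function in a typical sample."""
    if beta <= BETA_C:
        return LN2 + beta ** 2 / 4.0          # high-temperature phase
    return beta * math.sqrt(LN2)              # frozen phase, governed by the lowest energies

def tail_exponent(beta):
    """Heavy-tail exponent in the frozen phase (None in the high-temperature phase)."""
    return BETA_C / beta if beta > BETA_C else None

for beta in (1.0, BETA_C, 3.0):
    print(beta, log_z_per_spin_typical(beta), log_z_per_spin_averaged(beta), tail_exponent(beta))
```

The typical and averaged values coincide up to the critical point and separate beyond it, where the tail exponent drops below one.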
VII.6 Application to Anderson Localization
The notations for the Anderson Localization model have been explained in the subsection III.4 with the rate function and the scaled cumulant generating function given by Eqs 37 and 38
[TABLE]
Here the analysis concerns the localized phase in the regime of small hopping where the forward perturbation formula of Eq. 13 is valid, so it will be possible to use this approach up to the critical hopping of the delocalization transition only if the branching ratio is large .
The empirical number of Eq 110 counts the number of leaves (among the branches) where the wave-function is of order with respect to the finite wave-function at the center. The empirical number in a typical sample (Eq. 117) reads
[TABLE]
where the boundaries are given by Eq. 116
[TABLE]
For large , the upper boundary is given by
[TABLE]
The localized phase corresponds to the region , where the wave-function decays exponentially on all the branches, while the delocalization transition occurs when vanishes
[TABLE]
so the critical hopping for the delocalization transition is given for large by
[TABLE]
At this critical point , the inverse participation ratios
[TABLE]
correspond to the empirical sums of Eq. 111 with the change of notation , so that their disorder-averaged values (Eqs 126 and 127) read for
[TABLE]
where the exponents defined with respect to the number of sites read for large
[TABLE]
Eq 132 yields that the boundary value using Eq. 167 and Eq. 169
[TABLE]
is close to unity for large , so that the inverse participation ratios in a typical sample
[TABLE]
involve essentially the same exponents as the averaged values of Eq. 171
[TABLE]
These exponents are known as the ’Strong Multifractality spectrum’ in the field of Anderson transitions [30], where they appear either in the limit of infinite dimensionality or in related long-ranged power-law hoppings in one dimension [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98], or more recently in toy models of Many-Body-Localization [99, 100]. Although the freezing transitions at the values are not very important for this ’Strong Multifractality spectrum’, they have been much discussed in the general theory of multifractality at Anderson transitions in finite dimension [29, 30, 88].
VII.7 Application to the Quantum Ising Model
As a third and final example, let us mention the case of the random transverse field spin-glass model on the Cayley tree that has been studied recently via real-space renormalization and where the large deviations properties of the one-dimensional model play a major role [101]. Here the difference with the two previous examples of the Random Energy Model and of Anderson Localization is that the one-dimensional model already has its own phase transition between the spin-glass phase and the paramagnetic phase, where the exact critical properties have been obtained by the Strong Disorder renormalization approach [27, 28]. As a consequence, one obtains three phases that can be explained as follows in the star-geometry (ii) of Eq. 105, where one considers that the center is linked to independent chains of length [101], i.e. each branch is characterized by the Lyapunov exponent (Eq. 14)
[TABLE]
(a) the star is in its paramagnetic phase if all the chains are in their paramagnetic state , i.e. the boundary value of Eq. 116 should be negative .
(b) the star is in its spin-glass phase with an extensive spin-glass order if an extensive number of the chains are in their spin-glass phase i.e. the typical Lyapunov exponent should be positive .
(c) in between, i.e. in the region , the star is in a spin-glass phase with a sub-extensive spin-glass order, because only a sub-extensive number of chains are in their spin-glass phase , while an extensive number of chains are in their paramagnetic state .
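The three-phase criterion above depends only on the signs of two numbers, which the following sketch encodes. It assumes that a chain is in its spin-glass phase when its Lyapunov exponent is positive, and it takes as inputs the typical Lyapunov exponent and the upper boundary value of the populated interval; the function name `star_phase`, the argument names and the treatment of the marginal cases (assigned to the intermediate phase) are hypothetical choices.

```python
def star_phase(lam_typ, lam_plus):
    """Classify the star geometry from the typical and the largest typical-sample Lyapunov exponents."""
    if lam_plus < lam_typ:
        raise ValueError("the upper boundary cannot lie below the typical value")
    if lam_plus < 0:
        # Even the most favorable chains are paramagnetic.
        return "paramagnetic"
    if lam_typ > 0:
        # An extensive number of chains are in their spin-glass phase.
        return "spin-glass (extensive order)"
    # Only the rare chains between zero and the upper boundary are ordered.
    return "spin-glass (sub-extensive order)"

for lam_typ, lam_plus in [(-2.0, -0.5), (-0.5, 0.7), (0.3, 1.5)]:
    print(lam_typ, lam_plus, star_phase(lam_typ, lam_plus))
```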
VIII Conclusion
In this pedagogical introduction, we have explained why the general theory of large deviations is the natural language to analyze the properties of disordered systems in order to offer a unified perspective on the typical events and on the rare events that occur on various scales. We have first focused on one-dimensional random models in order to emphasize the various levels of description. We have first recalled how the Level 1 allows one to analyze the properties of observables given by products of random variables that occur in many classical or quantum models. We have then described how a finer analysis in terms of the whole hierarchy of empirical histograms allows one to classify the set of disorder configurations into subsets that have the same empirical properties up to a certain order. We have then turned our attention to random models defined on Cayley trees, in order to analyze their properties in terms of the large deviations of branches. We have taken as examples various emblematic classical and quantum disordered systems in order to highlight the common underlying mechanisms from the point of view of large deviations.
The large deviation analysis of disordered systems in finite dimension clearly goes beyond the scope of the present introduction. Although some notions can be directly applied, like the Sanov theorem for the empirical 1-point histogram, or the multifractal analysis at Anderson transitions [29, 30] or at phase transitions of random classical models [31, 32, 33, 34, 35, 36, 37, 38], one should be aware that qualitatively new phenomena may also occur. For instance the large deviations properties that have been exactly computed [102, 103, 104] for the Directed Polymer in dimension display an asymmetry between values bigger or smaller than the typical value, with two different scalings with respect to the length of the polymer: an ’anomalously good’ ground state energy requires only anomalously good on-site energies along the polymer, while an ’anomalously bad’ ground state energy requires bad on-site energies in the two-dimensional sample. So this single example already shows that some properties of random systems in finite dimensions call for a much broader large deviation theory with two different scalings for values bigger or smaller than the typical value, as discussed in more detail in the recent preprint [105].
Appendix A Alternative classification of disorder configurations in terms of empirical intervals
In the text, we have described the classification of one-dimensional disorder configurations in terms of the hierarchy of the empirical r-point histograms. In this Appendix, we discuss an alternative classification in terms of the empirical intervals during which the disorder keeps a constant value, since this framework is more appropriate to analyze the Lifshitz and the Griffiths singularities as we now recall.
A.1 Observables corresponding to products of contributions from intervals of random lengths
After the product of random variables discussed in section II.3, the next simplest case of Eq. 3 is the one where the disorder variable can take only two values that will be labelled by . It is then useful to replace the disorder configuration by its decomposition into intervals during which the disorder keeps the same value. For a model defined on a ring of sites (i.e. with periodic boundary conditions ) there will be an empirical even number of intervals, where the odd intervals of lengths are associated with the value , while the even intervals of lengths are associated with the value . The lengths satisfy the sum rule
[TABLE]
When the disorder configuration is replaced by the list of the lengths of the intervals, the trace of Eq. 3 becomes
[TABLE]
To analyze the Lifshitz and the Griffiths singularities mentioned in the Introduction, various models have been studied in the regime where the value corresponds to a very strong disorder value where the associated transfer matrix can be approximated by a projector on some state with some eigenvalue [16]
[TABLE]
Then Eq. 178 simplifies into the product of the contributions of the intervals
[TABLE]
where the contribution of an interval of length is simply
[TABLE]
while the contribution of an interval of length corresponds to the pure model with the boundary conditions fixed by the projector form of Eq. 179
[TABLE]
Various examples concerning Anderson Localization models and classical spin chains are described in the book [16], while an example concerning random DNA is analyzed in [23].
A.2 Empirical 1-interval observables with their constraints
The form of the observables of Eq. 180 suggests that it is appropriate to analyze the disorder configurations in terms of the empirical 1-interval observables
[TABLE]
The summation over the length corresponds to the density of intervals or
[TABLE]
while the total length of the disorder configurations fixes the normalization (Eq. 177)
[TABLE]
It is thus useful to introduce the following notation to summarize these constraints on the empirical 1-interval observables
[TABLE]
where again the notation is introduced for better readability of the arguments but actually represents the Kronecker symbol .
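The decomposition into intervals and the resulting constraints can be illustrated on a small ring. The sketch below assumes a two-valued configuration stored as a list of +1 and -1, and defines the empirical 1-interval observable as the number of intervals of a given value and length divided by the total number of sites; this normalization and the function names are assumptions made for the illustration.

```python
from collections import Counter

def ring_intervals(config):
    """Split a periodic two-valued configuration into maximal constant intervals (value, length)."""
    n = len(config)
    if all(x == config[0] for x in config):
        return [(config[0], n)]
    # Rotate the ring so that position 0 is the first site of an interval.
    start = next(i for i in range(n) if config[i] != config[i - 1])
    rotated = config[start:] + config[:start]
    intervals = []
    for x in rotated:
        if intervals and intervals[-1][0] == x:
            intervals[-1] = (x, intervals[-1][1] + 1)
        else:
            intervals.append((x, 1))
    return intervals

def one_interval_observables(config):
    """Number of intervals of each (value, length), per site of the ring."""
    n = len(config)
    counts = Counter(ring_intervals(config))
    return {key: c / n for key, c in counts.items()}

config = [1, 1, -1, 1, -1, -1, -1, 1]
intervals = ring_intervals(config)
obs = one_interval_observables(config)
density_plus = sum(w for (value, _), w in obs.items() if value == 1)
density_minus = sum(w for (value, _), w in obs.items() if value == -1)
normalization = sum(length * w for (_, length), w in obs.items())
print(intervals)
print(density_plus, density_minus, normalization)
```

On a ring the two values alternate, so the two interval densities coincide, and the lengths weighted by the observables sum to one, which are the two constraints discussed in this subsection.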
A.3 Typical values of the empirical 1-interval observables
Since the probability of a disorder configuration is given by Eq. 2 with , the probability distributions of the lengths of the intervals are given by the geometric distributions
[TABLE]
with the normalization
[TABLE]
and the averaged lengths
[TABLE]
As a consequence, the typical density of the intervals reads
[TABLE]
and the typical values of the empirical 1-interval observables are
[TABLE]
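These typical values can be evaluated explicitly for independent two-valued disorder. The sketch below assumes that each site takes the value + with probability `p_plus` and the value - otherwise, so that an interval of a given value continues with the probability of that value and stops otherwise; the parametrization and the function names are assumptions made for the illustration.

```python
def length_distribution(p_value, length):
    """Geometric law: the value persists for length - 1 further sites, then changes."""
    return p_value ** (length - 1) * (1.0 - p_value)

def mean_length(p_value):
    """Average length of an interval of a value that occurs with probability p_value."""
    return 1.0 / (1.0 - p_value)

def typical_interval_density(p_plus):
    """Typical number of intervals of each value per site: one pair of intervals per mean pair length."""
    p_minus = 1.0 - p_plus
    return 1.0 / (mean_length(p_plus) + mean_length(p_minus))

def typical_one_interval_observable(p_value, p_plus, length):
    """Typical density of intervals of the given value and length."""
    return typical_interval_density(p_plus) * length_distribution(p_value, length)

p_plus = 0.3
p_minus = 1.0 - p_plus
density = typical_interval_density(p_plus)
total_length = sum(
    length * (typical_one_interval_observable(p_plus, p_plus, length)
              + typical_one_interval_observable(p_minus, p_plus, length))
    for length in range(1, 2001)
)
print(density, p_plus * p_minus)   # both equal 0.21 up to rounding
print(total_length)                # the normalization constraint gives 1
```

With this parametrization the typical interval density reduces to the product of the two probabilities, and the normalization by the total length is satisfied.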
A.4 Large deviations of empirical 1-interval observables
In order to analyze the large deviations of empirical 1-interval observables, one needs to introduce a generalized semi-Markovian model for the disorder, where the lengths of the intervals are drawn with some general distributions (instead of the geometric distributions of Eq. 187). The probability of some configuration of the intervals then reads (up to boundary terms that can be neglected for
[TABLE]
where the action in the exponential is a function of the empirical 1-interval observables introduced in Eq. 183
[TABLE]
while has been introduced in Eq. 186 to summarize the constraints. In this semi-Markovian model, all the disorder configurations that have the same empirical 1-interval observables have the same probability. As a consequence, the probability to see these empirical observables is given by
[TABLE]
where counts the number of disorder configurations that correspond to these empirical observables, while the normalization reads
[TABLE]
When the empirical 1-interval observables take their typical values for this semi-Markovian generalized model (adapted from Eqs 190 and 191 )
[TABLE]
the probability should remain finite as . So the factor should compensate exactly the exponential factor of Eq. 194, i.e. it should display the exponential growth
[TABLE]
When the empirical observables are different from their typical values , we may consider a modified semi-Markovian model with modified probability distributions for the lengths of the intervals that would make the empirical observables typical for this modified model. Equations 196 yield that the modified probability distributions should be chosen as
[TABLE]
where the two denominators coincide as a consequence of the constraints of Eq. 186.
Then Eq. 197 translates for this modified model into
[TABLE]
Plugging this result into Eq. 194 yields the large deviation form
[TABLE]
with the rate function
[TABLE]
Related studies on large deviations properties of various semi-Markov processes in continuous time can be found in [11, 106, 107, 108, 109].
Here we wish to return to the initial disorder model corresponding to the geometric distributions of Eq. 187, where the result of Eq. 201, concerning the generalized semi-Markov model of disorder configurations with arbitrary distributions for the lengths of the intervals, becomes
[TABLE]
A.5 Large deviations for observables given by the product of the intervals contributions
The modulus of Eq. 180 can be rewritten in terms of the empirical 1-interval observables of Eq. 183 as
[TABLE]
So the corresponding finite-size Lyapunov exponent of Eq. 4 is a linear function of the empirical 1-interval observables
[TABLE]
Its typical value can be obtained from the typical values of the empirical 1-interval observables of Eq. 191
[TABLE]
The moments of non-integer order of Eq 203 read in terms of the probability of Eq 200
[TABLE]
One thus needs to optimize the function in the exponential in the presence of the constraints of Eq. 186 that can be taken into account via Lagrange multipliers. It is technically more convenient to introduce the empirical density of intervals that appears in the constraints and in the rate function of Eq. 202
[TABLE]
via another constraint.
So we will consider the functional
[TABLE]
The optimization with respect to the empirical 1-interval observable
[TABLE]
yields the forms
[TABLE]
The constraints
[TABLE]
determine the Lagrange multipliers as a function of the other parameters
[TABLE]
The optimization with respect to the interval density
[TABLE]
yields together with Eq. 212 that the value of the Lagrange multiplier is fixed by the condition
[TABLE]
while the remaining constraint
[TABLE]
determines the value of the density .
The value of the functional of Eq. 208 for the optimal solution satisfying the constraints
[TABLE]
actually reduces to the Lagrange multiplier .
In summary, the scaled cumulant generating function governing the exponential growth of the moments of Eq. 206
[TABLE]
is the solution of Eq 214
[TABLE]
that involves the distribution of the lengths of the intervals of the disorder configurations (Eq 187) and the functions of Eq. 180 of the observable under study. One can check that the expansion at first order in around with Eq. 10
[TABLE]
allows one to recover the typical value of Eq. 205, while the special case allows one to recover the scaled cumulant generating function of Eq. 25 concerning the simpler case of products of random variables.
References
- [1] Y. Oono, Progress of Theoretical Physics Supplement 99, 165 (1989).
- [2] R.S. Ellis, Physica D 133, 106 (1999).
- [3] H. Touchette, Phys. Rep. 478, 1 (2009).
- [4] B. Derrida, J. Stat. Mech. P07023 (2007).
- [5] R.J. Harris and G.M. Schütz, J. Stat. Mech. P07020 (2007).
- [6] E.M. Sevick, R. Prabhakar, S.R. Williams, D.J. Searles, Ann. Rev. of Phys. Chem. 59, 603 (2008).
- [7] H. Touchette and R.J. Harris, chapter “Large deviation approach to nonequilibrium systems” of the book “Nonequilibrium Statistical Physics of Small Systems: Fluctuation Relations and Beyond”, Wiley 2013.
- [8] L. Bertini, A. De Sole, D. Gabrielli, G. Jona-Lasinio, and C. Landim, Rev. Mod. Phys. 87, 593 (2015).
