On heavy-tail phenomena in some large deviations problems
Fanny Augeri

Institut de Mathématiques de Toulouse, France. E-mail: [email protected]
Abstract
In this paper, we revisit the proof of the large deviations principle of Wiener chaoses partially given by Borell [20], and then by Ledoux [31] in its full form. We show that some heavy-tail phenomena observed in large deviations can be explained by the same mechanism as for the Wiener chaoses, meaning that the deviations are created, in a sense, by translations. More precisely, we prove a general large deviations principle for a certain class of functionals $f_n:\mathbb{R}^n\to\mathcal{X}$, where $\mathcal{X}$ is some metric space, under the $n$-fold probability measure $\nu_\alpha^n$, where $\nu_\alpha=Z_\alpha^{-1}e^{-|x|^\alpha}dx$, $\alpha\in(0,2]$, for which the large deviations are due to translations. We retrieve, as an application, the large deviations principles known for the Wigner matrices without Gaussian tails in [19], [4], [5] of the empirical spectral measure, the largest eigenvalue, and traces of polynomials. We also apply our large deviations result to the last-passage time, which yields a large deviations principle when the weights follow the law $Z_\alpha^{-1}e^{-x^\alpha}\mathbb{1}_{x\geq 0}dx$, with $\alpha\in(0,1)$.
1 Introduction
In [31], Ledoux proposed a large deviations principle for the Wiener chaoses based on the approach Borell gave in [20] for estimating their tail distribution. The main feature which stands out of the proof is that the large deviations of Wiener chaoses are due to translations by elements of the Cameron-Martin space. The lower bound consists in an application of the Cameron-Martin formula, whereas the upper bound relies on the Gaussian isoperimetric inequality.
More precisely, let be an abstract Wiener space, where is a separable Banach space, is a Gaussian measure on , and the reproducing kernel (see [33] or [23, chapter 4] for proper definitions). Let also be a homogeneous Wiener chaos of degree taking values in some Banach space , that is, a random variable in the subspace spanned in by Hermite polynomials of degree . From [31], we know that follows a large deviations principle with speed and good rate function defined by,
[TABLE]
where denotes the norm of the reproducing kernel , and
[TABLE]
We believe that Borell and Ledoux’s approach is extremely fruitful, and that it can shed new light on heavy-tail phenomena appearing in the large deviations of certain models, where the large deviations are also created, in a sense, by translations. We already used this approach in a previous work [5] to deal with the question of the large deviations of traces of powers of Gaussian Wigner matrices. Indeed, this problem can be reformulated as understanding the large deviations of Gaussian chaoses defined on spaces with growing dimension. Although this problem cannot be solved directly by using the large deviations principle of Wiener chaoses, the same outline of proof was carried out in this case, and yields a rate function with a structure similar to (1).
We would like here to push this approach further in a more general setting, and give some elements showing that heavy-tail phenomena in the large deviations of certain models can be understood using the paradigm of the Wiener chaoses. To this end, we propose a general large deviations result for a certain class of functionals $f_n:\mathbb{R}^n\to\mathcal{X}$, where $\mathcal{X}$ is some metric space, under the $n$-fold probability measure $\nu_\alpha^n$, where $\nu_\alpha=Z_\alpha^{-1}e^{-|x|^\alpha}dx$, with $\alpha\in(0,2]$, for which the large deviations are governed by translations.
As an application of this result, we will retrieve the large deviations principles of different spectral functionals of the so-called Wigner matrices without Gaussian tails. Introduced in [19] by Bordenave and Caputo, the model of Wigner matrices without Gaussian tails designates Wigner matrices whose entries have tail distributions behaving like , with , and . This model gives rise to a heavy-tail phenomenon which enables one to derive full large deviations principles for the spectral measure [19] (see [26] in the Wishart matrix case), the largest eigenvalue [4], and the traces of powers [5].
In the more restricted setting where we assume that the entries have a density with respect to Lebesgue measure which is proportional to , with , and , the large deviations principles of these spectral functionals will fall in a unified way from our general large deviation result.
Another application of this result will consist in a large deviations principle for the last-passage time when the weights are independent and have a density on proportional to for .
2 Main results
Let us present the main results of this paper. For , we denote by the probability measure on with density with respect to Lebesgue measure, and its -fold product measure on . Similarly, we define the probability measure on with density . We will denote for any ,
[TABLE]
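To make the reference measure concrete, here is a minimal sampling sketch. It assumes that the one-dimensional law has density proportional to exp(-|x|^alpha), and it uses the fact that |X|^alpha then follows a Gamma(1/alpha, 1) law. The function names `sample_nu_alpha` and `empirical_variance` are hypothetical helpers introduced for illustration only.

```python
import random

def sample_nu_alpha(alpha, n, rng):
    """Draw n i.i.d. samples with density proportional to exp(-|x|**alpha).

    Assumption of this sketch: if X has this density, then |X|**alpha follows
    a Gamma(1/alpha, 1) law, so we draw a Gamma variable, raise it to the
    power 1/alpha and attach an independent random sign.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    samples = []
    for _ in range(n):
        g = rng.gammavariate(1.0 / alpha, 1.0)  # shape 1/alpha, scale 1
        sign = 1.0 if rng.random() < 0.5 else -1.0
        samples.append(sign * g ** (1.0 / alpha))
    return samples

def empirical_variance(xs):
    """Plain (biased) empirical variance of a list of floats."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(0)
gaussian_like = sample_nu_alpha(2.0, 20000, rng)  # density ~ exp(-x^2)
laplace_like = sample_nu_alpha(1.0, 20000, rng)   # density ~ exp(-|x|)
```

With alpha = 2 the samples are centered Gaussian with variance 1/2, and with alpha = 1 they follow the symmetric exponential law with variance 2, which gives a quick sanity check on the sampler.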
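We recall that a sequence of random variables $(Z_n)_{n\in\mathbb{N}}$ taking values in some topological space $\mathcal{X}$ equipped with the Borel $\sigma$-field $\mathcal{B}$, follows a large deviations principle (LDP) with speed $v(n)$, and rate function $I:\mathcal{X}\to[0,+\infty]$, if $I$ is lower semicontinuous, $v(n)$ increases to infinity, and for all $B\in\mathcal{B}$,
$$-\inf_{B^{\circ}} I\leq\liminf_{n\to+\infty}\frac{1}{v(n)}\log\mathbb{P}\left(Z_n\in B\right)\leq\limsup_{n\to+\infty}\frac{1}{v(n)}\log\mathbb{P}\left(Z_n\in B\right)\leq-\inf_{\overline{B}} I,$$
where $B^{\circ}$ denotes the interior of $B$ and $\overline{B}$ the closure of $B$. We recall that $I$ is lower semicontinuous if its $t$-level sets $\{x\in\mathcal{X}: I(x)\leq t\}$ are closed, for any $t\geq 0$. Furthermore, if all the level sets are compact, then we say that $I$ is a good rate function.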
The purpose of the general large deviations result we will present, is to identify a class of functionals , where is some metric space, for which the large deviations are created by translations. Let us describe first informally the assumptions we will make. Let follow the law . We will assume that admits a kind of deterministic equivalent under additive deformations, given by a certain function , that is,
[TABLE]
in probability, for any sequence , , where will eventually be the speed of deviations. It is convenient to think of as a deterministic equivalent of , where we took the large limit on the variable . Under this assumption, we will show that a large deviations lower bound for at speed , holds with rate function,
[TABLE]
where
[TABLE]
This rate function can be interpreted by saying that to make a deviation around some , needs to make a translation by , which one pays at the exponential scale by .
For the upper bound, we will further assume that for any , the deterministic equivalent (3) holds uniformly in . The upper bound will rely on sharp large deviation inequalities for , where we will need, except in the Gaussian case, to neglect the Euclidean enlargements that appear naturally. We thus make the assumption that has a small, in expectation, local Lipschitz constant with respect to when . Finally, under some compactness property of , we will prove that a large deviations upper bound holds for with speed and rate function,
[TABLE]
Thus, if we moreover assume that the upper bound rate function matches the lower rate function, we will get a full large deviations principle with speed . More precisely, we will prove the following result.
Theorem 2.1.
Let be a metric space. Let and an infinite subset. Let be a random variable with law . Let be measurable functions. Let be a sequence going to . Define for and , the function
[TABLE]
We set
[TABLE]
We assume:
(i) (Uniform deterministic equivalent). For any ,
[TABLE]
in probability.
(ii) (Control of the Lipschitz constant). If , then for any and , there is a sequence such that,
[TABLE]
with
[TABLE]
satisfying,
[TABLE]
(iii) (Compactness). For any , is relatively compact.
(iv) (Upper bound = lower bound). For any ,
[TABLE]
Then satisfies a LDP with speed and good rate function .
Let us make some remarks on the assumptions of this theorem.
Remarks 2.2.
(a). We will prove that under the assumption that for any sequence , , such that ,
[TABLE]
in probability, the lower bound of the LDP holds with the rate function (6).
(b). The assumption that the approximation (7) holds uniformly in is crucial for deriving the upper bound of the LDP with rate function (4), and is one of the most constraining assumptions of Theorem 2.1. In the applications we develop when , this is proven by some concentration inequality and chaining arguments, which can be carried out successfully due to the “sparsity” of the ball .
(c). The formulation of assumption (ii) on the Lipschitz constant of is specially designed to include polynomial functionals , such as the trace of a polynomial of random matrices. In other words, it says that the “local” Lipschitz constant of is small enough uniformly on the set . Note that when is -Lipschitz with respect to , a sufficient condition for assumption (ii) to be fulfilled is
[TABLE]
This assumption ensures that the deviations of are explained by a heavy-tail phenomenon. For example, it fails to hold for empirical means under when .
(d). The compactness assumption of is made to ensure that is a good rate function. As one can observe in the proof, without it, the upper bound of the LDP holds only for compact sets.
(e). The rate function can be simplified in certain cases. Define the function by,
[TABLE]
One can see that,
[TABLE]
Thus if is lower semi-continuous, then .
The proof is in line with the ideas and the framework developed by Borell and Ledoux in [20], [21] and [31], [23], for the large deviations for Wiener chaoses. To make a parallel with their approach, one can observe that the first step in their proof is to show some deterministic equivalent for the Wiener chaoses when deformed in a direction of the reproducing kernel, that is, by [23, chapter 5 (5.7)], for any ,
[TABLE]
in probability, and even uniformly in the unit ball of , along a discretization of by [23, chapter 5 (5.9)], where defined in (2). Similarly, we make the assumption that a uniform deterministic equivalent holds for the functionals .
For the lower bound, we replace the use of the Cameron-Martin formula, used in the context of abstract Wiener space, with a lower bound estimate of the probability of translated events, that is,
[TABLE]
for a given sequence , subsets such that , and where is some weight function. In the Gaussian case , the translation formula of the Gaussian measure gives this estimate with . When , one can mimic the Gaussian case to get such an estimate (10) with , whereas when , we believe that there is a competition between the speed and the dimension which is not workable in the applications.
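For orientation, the Gaussian translation formula mentioned above can be written out in a few lines. The display below is only a sketch, under the assumption that $\nu_2$ has density $Z_2^{-1}e^{-x^2}$ on $\mathbb{R}$; the symbols $A$ and $h$ and the symmetry assumption $A=-A$ are ours and serve only to illustrate the mechanism.

```latex
\begin{align*}
\nu_2^n(A+h)
  &= Z_2^{-n}\int_{A} e^{-\|x+h\|_2^2}\,dx
   = e^{-\|h\|_2^2}\int_{A} e^{-2\langle x,h\rangle}\,d\nu_2^n(x),\\
\nu_2^n(A+h)
  &\geq e^{-\|h\|_2^2}\,\nu_2^n(A)
  \qquad\text{if } A=-A .
\end{align*}
```

The second line follows from the first by symmetrizing the integrand over $A=-A$, which replaces $e^{-2\langle x,h\rangle}$ by $\cosh(2\langle x,h\rangle)\geq 1$. In this sketch, translating a symmetric event by $h$ therefore costs at most $\|h\|_2^2$ at the exponential scale.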
Whereas the Gaussian isoperimetric inequality is used in the proof of the upper bound of the deviations of Wiener chaoses, ours will rely on sharp large deviation inequalities for with respect to the weight function , that is
[TABLE]
for some “large enough” subsets . We will show that we can take , which together with (10) will allow us to make the upper and lower bounds match. In the Gaussian case, this is due to the Gaussian isoperimetric inequality, whereas when , we will have to call on sharp inf-convolution inequalities for . This is in particular where assumption (ii) plays its role, since it enables us, when , to neglect the Euclidean balls which come naturally in the deviation inequality of , and consider subsets which are indeed large enough.
These two estimates (10) and (11) are behind the limitation in Theorem 2.1 to the probability measures for . For example, if one replaces the measure by the probability measure on with density , one can show that (10) holds provided has all its coordinates non-negative (and if ). But then, we will have to prove (11) with if the coordinates of are non-negative, and otherwise, which we do not know how to obtain for the subsets we are dealing with in the proof.
This said, we can give a version of Theorem 2.1 for the probability measure , with density , which will be sufficient to prove a LDP result for the last-passage time.
Theorem 2.3.
Let and an infinite subset. Let be a random variable distributed according to . Let be measurable functions. Let be a sequence going to . Define as in (4), and for and ,
[TABLE]
Assume (i)-(iii) from Theorem 2.1, and,
(iv)’ For any ,
[TABLE]
Then satisfies a LDP with speed and good rate function .
Remark 2.4.
We only state this result for because for , we know how to get the lower bound (10) for a sequence only under the additional assumption on the speed that . But this condition and the requirement cannot be met simultaneously in the applications we will present.
2.1 Applications to Wigner matrices
We present now the applications of Theorem 2.1 to Wigner matrices. We denote by the set of Hermitian matrices when , and symmetric matrices when , of size . We define the class of Wigner matrices whose law is of density with respect to the Lebesgue measure on , where
[TABLE]
for some , and where is the normalizing constant.
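We will denote by $\mu_X$ the empirical spectral measure of a matrix $X$, that is,
$$\mu_X=\frac{1}{n}\sum_{i=1}^{n}\delta_{\lambda_i},$$
where $\lambda_1,\ldots,\lambda_n$ are the eigenvalues of $X$, and we will denote by $\lambda_X$ the largest eigenvalue of $X$.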
We will say that is a Wigner matrix if is a random Hermitian matrix with independent coefficients (up to the symmetry) such that are identically distributed and are identically distributed. If , then by Wigner’s theorem (see [2, Theorem 2.1.1, Exercise 2.1.16], [6, Theorem 2.5]), almost surely,
[TABLE]
where denotes the weak convergence, and is the semi-circular law defined by,
[TABLE]
If we assume furthermore that and , then we know by [7], [6, Theorem 5.1],
[TABLE]
in probability.
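As a rough numerical illustration of Wigner’s theorem, the sketch below compares the second and fourth moments of the empirical spectral measure of a normalized matrix with those of the semi-circular law on [-2, 2], which are 1 and 2. It assumes real symmetric matrices with standard Gaussian entries and the normalization by the square root of the size; these choices, and the helper names `wigner_matrix` and `spectral_moments`, are ours and are not the class of laws studied in the paper.

```python
import random

def wigner_matrix(n, rng):
    """Real symmetric matrix with standard Gaussian entries (illustrative choice)."""
    x = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            x[i][j] = x[j][i] = rng.gauss(0.0, 1.0)
    return x

def spectral_moments(x):
    """Second and fourth moments of the empirical spectral measure of x / sqrt(n).

    For a symmetric matrix Y, tr(Y^2) is the sum of the squared entries of Y
    and tr(Y^4) is the sum of the squared entries of Y^2, so no eigenvalue
    solver is needed.
    """
    n = len(x)
    y = [[v / n ** 0.5 for v in row] for row in x]
    m2 = sum(v * v for row in y for v in row) / n
    y2 = [[sum(y[i][k] * y[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    m4 = sum(v * v for row in y2 for v in row) / n
    return m2, m4

rng = random.Random(1)
m2, m4 = spectral_moments(wigner_matrix(60, rng))
```

Working with traces rather than eigenvalues keeps the example free of external libraries; it also mirrors the way traces of powers appear as spectral functionals later in the text.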
As a consequence of Theorem 2.1, we have the following large deviations principles, originally proven in [19], in the case of the empirical spectral measure and in [4] for the largest eigenvalue.
Theorem 2.5.
Let . Assume is in the class such that . follows a LDP with respect to the weak topology with speed , and good rate function , defined for any probability measure on by,
[TABLE]
where is a distance compatible with the weak topology, stands for the free convolution (see [2, section 2.3.3] for a definition), and is the semi-circular law.
Remark 2.6.
In [19], the rate function is computed explicitly for measures , where is a symmetric probability measure, for which we have
[TABLE]
Theorem 2.7.
Let . Assume that is in the class such that . follows a LDP with speed and good rate function , defined for any by,
[TABLE]
with
[TABLE]
and where denotes the Stieltjes transform of , that is,
[TABLE]
Remark 2.8.
The constant can be computed explicitly, we refer the reader to [4, section 8] for more details.
If is a collection of independent centered Wigner matrices such that for any , and with entries having finite moments of order , then for any non-commutative polynomial of total degree , we know by [2, Theorem 5.4.2],
[TABLE]
in probability, where and is a free family of semi-circular variables in a non-commutative probability space (see [2, section 5.3] for a definition).
Concerning the large deviations of such normalized traces of polynomials in independent matrices in the class , with we have the following result.
Theorem 2.9.
Let and , . Assume is a collection of independent Wigner matrices in the class , such that for , . We assume that is distributed according to , where is of the form (12). Let be a non-commutative polynomial of total degree . We denote by the state on . The sequence
[TABLE]
satisfies a LDP with speed $n^{\alpha\left(\frac{1}{2}+\frac{1}{d}\right)}$ and good rate function , defined for all by
[TABLE]
where for any ,
[TABLE]
where and is the homogeneous part of degree of .
Remark 2.10.
Unlike the previous results on deviations of the spectral measure and the largest eigenvalue, this one allows us to consider Gaussian matrices. As we will see in the proof, the mechanism of deviations of traces of polynomials is the same in both cases , and . This is essentially due to the fact that, even in the Gaussian case, a heavy-tail phenomenon appears when the degree of the polynomial is strictly greater than , since there are no exponential moments.
This large deviations principle is an extension, although in a more restricted setting, of the large deviations principle proven in [5], in the case where and for some , for Gaussian matrices and Wigner matrices without Gaussian tails.
2.2 Application to last-passage percolation
Let , . We denote by the subset of vectors of with non-negative coordinates. Let be a collection of weights. We will call a directed path a path in which at each step, one coordinate is increased by . For , we denote by the set of directed paths from to . We will identify a path with the set of its vertices. We define the last-passage time , by
[TABLE]
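The definition above can be evaluated by dynamic programming. The following is a minimal sketch in dimension 2 only, assuming that the weight of a path is the sum of the weights of all its vertices, both endpoints included; the function name `last_passage_time` and the grid encoding are hypothetical.

```python
def last_passage_time(weights):
    """Maximal total weight over directed paths from (0, 0) to the opposite corner.

    weights[i][j] is the weight at vertex (i, j); a directed path increases
    exactly one coordinate by 1 at each step, and the weight of a path is
    the sum of the weights of its vertices (both endpoints included).
    """
    rows, cols = len(weights), len(weights[0])
    best = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                prev = 0.0
            elif i == 0:
                prev = best[i][j - 1]
            elif j == 0:
                prev = best[i - 1][j]
            else:
                # a directed path reaches (i, j) from the left or from below
                prev = max(best[i - 1][j], best[i][j - 1])
            best[i][j] = prev + weights[i][j]
    return best[-1][-1]

small = last_passage_time([[1, 2], [3, 4]])
centre = last_passage_time([[1, 1, 1], [1, 5, 1], [1, 1, 1]])
```

In the second grid, every optimal path is routed through the single heavy vertex, which is the kind of behaviour one expects a few large weights to produce.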
We know by a work of Martin [34], that if the weights are i.i.d. random variables with common distribution function satisfying,
[TABLE]
then for any ,
[TABLE]
where is a continuous function on .
As an application of Theorem 2.3, we will get the following LDP for the last-passage time.
Theorem 2.11.
Let . For any , we set . Let be a family of i.i.d. random variables distributed according to . The sequence satisfies a LDP with speed and good rate function , defined by
[TABLE]
2.3 Concentration inequalities
In order to prove that assumption holds in the context of Wigner matrices in the class when for the largest eigenvalue and the empirical spectral measure, we will prove some concentration inequalities for Wigner matrices which we would like to present as they can be of independent interest.
To derive such concentration inequalities for functions of the spectrum of random matrices, we will follow the classical argument which consists in considering our functionals as functions of the entries, and taking advantage of the concentration property of the law of the underlying random matrix. This approach is made possible in the setting where the spectrum is a smooth function of the entries, which will be our case as we will work with Hermitian matrices.
For Wigner matrices with bounded entries, or satisfying a Log-Sobolev inequality, or also for certain unitarily or orthogonally invariant models, concentration inequalities for Lipschitz (convex) linear statistics of the eigenvalues and for the largest eigenvalue, have been extensively studied by Guionnet-Zeitouni [28], Guionnet [27, Part II], and Ledoux [32, Chapter 8 §8.5] (see also [2, sections 2.3, 4.4]).
More precisely, we will provide concentration inequalities for the linear statistics, the spectral measure and the largest eigenvalue of random Hermitian matrices satisfying a certain concentration property which will be indexed by some . As we will see, this concentration property will capture the gradation of speeds of large deviations for the spectral functionals we are interested in, as it has been observed in Theorems 2.5 and 2.7.
We now present the concentration property with which we will be working.
Definition 2.12.
Let . We will say in the following that a Wigner matrix satisfies the concentration property , if there is a constant , such that for any Borel subset of , such that , and any ,
[TABLE]
if , and
[TABLE]
if , where for any ,
[TABLE]
with
[TABLE]
When , the motivation for defining this concentration property comes from Talagrand’s famous two-level deviation inequality [39] for the measure , which says that there is a constant such that for any , any Borel subset of with , and ,
[TABLE]
and similarly for .
In particular, the Wigner matrices in the class for satisfy the concentration property with some depending on the parameters of the law of (see (12)). More generally, we know by the results of Bobkov-Ledoux [16, Corollary 3.2], and Gozlan [25, Proposition 1.2] that if is a Wigner matrix with entries satisfying a certain Poincaré-type inequality, where the underlying metric on , , is the following,
[TABLE]
where , standing for the sign of , then satisfies the concentration property with some constant depending on the spectral gap. We will go into more detail in section 5 about this functional inequality, and present some workable criteria available for a Wigner matrix to satisfy when .
When , the concentration property of the law of Wigner matrices in the class differs significantly from the case where . We know by Talagrand [38, Proposition 5.1] that as does not have exponential tails, cannot satisfy a dimension-free concentration inequality. Transporting onto , we will prove the following deviation inequality.
Proposition 2.13.
Let , . There is a constant depending on , such that for any , Borel subset of , and such that ,
[TABLE]
We will discuss in remark 5.4 in section 5.2 the optimality of such a deviation inequality for . The above proposition justifies the definition of the concentration property in the case where , as it implies that Wigner matrices in the class satisfy this property when .
Regarding the linear statistics of Wigner matrices having concentration , we will consider different families of functions depending on whether or . To this end, we define the set of finite signed measures such that its total variation has a finite -moment. Following [37, Chapter 2 §5.1], we define when , the fractional integrals of order of , by
[TABLE]
This definition interpolates for non-integer order the usual iterated integral (see [37, Chapter 1 §2.3] for more details). With these definitions, we will prove the following deviation inequalities.
Proposition 2.14.
Let . Let be a Wigner matrix having concentration with some . There is a constant such that if and is some -Lipschitz function, then for any ,
[TABLE]
if , is -Lipschitz and moreover for some such that , then for any ,
[TABLE]
where denotes a median of .
Remark 2.15.
The reason for considering the class of functions in the case , comes from the fact that we only understand the stability of the empirical spectral measure with respect to , by using a certain distance which controls this class of functions (see section 5.4 for more details).
Still in the case , note that one cannot expect the above concentration inequality to be true for all Lipschitz functions, since a change of large deviations speed may occur as the entries of do not have exponential tails. Indeed, for example if is in the class , Theorem 2.9 tells us the speed of large deviations of is .
Remark 2.16.
One can identify the image , by a minor change of [37, Theorem 6.3]. To ease the notation, we will only describe . For any , one can define the fractional integral of order by,
[TABLE]
The function above is well-defined almost everywhere as is integrable on a neighborhood of [math] for almost all by Fubini’s theorem. With this definition, the set consists of the functions such that there is some and , such that
[TABLE]
Remark 2.17.
Note also that the exponential bound can be simplified in the case if , where is a constant independent of . One gets then, for any ,
[TABLE]
In order to state our concentration inequality for the spectral measure, we will work with the following distance defined on the set of probability measures on , denoted by , in order to quantify the deviations:
[TABLE]
where is a compact subset of with an accumulation point, such that , and with the Stieltjes transform of , that is,
[TABLE]
where . This distance metrizes the weak topology on by [2, Theorem 2.4.4].
We will prove the following concentration inequalities for the empirical spectral measure and the largest eigenvalues of Wigner matrices having concentration .
Proposition 2.18.
Let . Let be a Wigner matrix satisfying with some . There exists a constant , depending on , such that for any ,
[TABLE]
where $\delta_{n}=O\big(\kappa n^{-1}(\log n)^{(1/\alpha-1)_{+}}\big)$, and where for ,
[TABLE]
whereas for
[TABLE]
Proposition 2.19.
Let . Let be a Wigner matrix satisfying for some . There is a constant , such that for any ,
[TABLE]
where
[TABLE]
if , and
[TABLE]
if , and where , uniformly in .
2.4 Spectral variation inequalities
We would also like to highlight some spectral variation inequalities, which are not particularly new, but which are perhaps less well known in the form we will propose. Indeed, to obtain the concentration inequality of Proposition 2.18, we need to understand the stability of the spectrum of Hermitian matrices with respect to the distance for or when .
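For $p\geq 1$, define the $p$-Wasserstein distance on the set of probability measures on $\mathbb{R}$ with finite $p$-moment by,
$$W_p(\mu,\nu)=\Big(\inf_{\pi}\int|x-y|^{p}\,d\pi(x,y)\Big)^{1/p},$$
where the infimum is over all couplings $\pi$ between $\mu$ and $\nu$, two probability measures on $\mathbb{R}$ with finite $p$-moment.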
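For two empirical measures on the real line with the same number of atoms, such as two empirical spectral measures of matrices of the same size, the Wasserstein distance has a closed form. The sketch below assumes the usual definition of the p-Wasserstein distance with p at least 1, for which pairing the sorted atoms is an optimal coupling. The function name `wasserstein_1d` is hypothetical.

```python
def wasserstein_1d(xs, ys, p=1.0):
    """p-Wasserstein distance between two empirical measures on the real line.

    Both measures put mass 1/n on each of their n atoms. For p >= 1 the
    monotone coupling, which pairs the sorted atoms, is optimal.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    if p < 1:
        raise ValueError("the sorted coupling is only used here for p >= 1")
    pairs = zip(sorted(xs), sorted(ys))
    cost = sum(abs(x - y) ** p for x, y in pairs) / len(xs)
    return cost ** (1.0 / p)

shift = wasserstein_1d([0.0, 1.0], [2.0, 1.0], p=1.0)
quad = wasserstein_1d([0.0, 0.0], [4.0, 3.0], p=2.0)
```

Applied to the ordered eigenvalues of two Hermitian matrices, this is the quantity that spectral variation inequalities bound in terms of a norm of the difference of the matrices.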
When , we get as a mere consequence of Lidskii’s theorem (see [15, Theorem III.4.1]) the following lemma.
Lemma 2.20.
Let , and .
[TABLE]
As a consequence,
[TABLE]
Whereas for , we obtain by Rotfel’d’s inequality (see [15, Theorem IV.2.14] or [41]) the following.
Lemma 2.21.
Let . Let . For any ,
[TABLE]
where denote the eigenvalues of , and similarly for . Furthermore, there is a positive constant , such that for any ,
[TABLE]
with
[TABLE]
Acknowledgements
I would like to thank my supervisor Charles Bordenave for his inspiring advice and the many fruitful conversations which helped me build the present paper. I am also grateful to Franck Barthe and Michel Ledoux for valuable conversations and references, as well as Guillaume Aubrun for pointing out to me the result of [24, Proposition 3.2.2]. I would also like to thank IMPA for its welcome, where this work was partially carried out.
2.5 Organization of the paper
In section 3, we prove some inf-convolution inequalities for . As the large deviations of our functional are governed by translates, we will need some sharp deviation inequalities with respect to the metric (or when ). We will provide a family of weights which captures the asymptotics of the tail distribution of , that is, behaving like when . This will be done by transporting and tensoring the family of optimal weights known for the exponential law due to Talagrand [40, Theorem 1.2].
In section 4, we give a proof of Theorems 2.1 and 2.3. The upper bound relies on Proposition 4.1 which gives a large deviations sharp upper bound for with respect to the metric using the inf-convolution inequalities proven in section 3. The lower bound is given by Proposition 4.4 which estimates at the exponential scale the probability, under , of an event translated by some element .
The rest of the paper is devoted to applications to Wigner matrices and the last-passage time.
In section 5, we prove the concentration inequalities of Propositions 2.18 and 2.19 for the largest eigenvalue, linear statistics and empirical spectral measure of Wigner matrices satisfying the concentration property defined in (15) and (16). To do so, we will prove and discuss the spectral variation inequalities of Lemmas 2.20 and 2.21 in section 5.4.
In section 6, we show some uniform deterministic equivalents for the spectral measure, largest eigenvalue and traces of non-commutative polynomials of deformed Wigner matrices in the class . To make the equivalents for the spectral measure and largest eigenvalue of hold uniformly for , we make use of the concentration inequalities we proved in section 5, and perform a classical chaining argument.
In section 7, we provide a deterministic equivalent for the last-passage time under additive deformations of the weights. The strategy to make our equivalent hold uniformly will be the same as for the case of the spectral measure and largest eigenvalue of Wigner matrices in the class , meaning that it will rely on concentration and chaining arguments.
In section 8, we apply Theorem 2.1 in the setting of Wigner matrices in the class , to the spectral measure, the largest eigenvalue (for ) and to traces of non-commutative polynomials (for ). Using the uniform deterministic equivalents we proved in section 6, we give a proof of Theorems 2.5, 2.7, and 2.9.
Finally we prove in section 9, the large deviations principle for the last-passage time of Theorem 2.11 by applying Theorem 2.3 and using the uniform deterministic equivalent proved in section 7.
3 Inf-convolution inequalities for
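Let $\mu$ be a probability measure on $\mathbb{R}^n$, and let $w$ be a measurable function on $\mathbb{R}^n$ taking non-negative values. Following Maurey (see [35]), we will say that $(\mu,w)$ satisfies the $\tau$-property if for any non-negative measurable function $f$ on $\mathbb{R}^n$,
$$\int e^{f\Box w}\,d\mu\int e^{-f}\,d\mu\leq 1,\qquad(23)$$
where $f\Box w$ denotes the inf-convolution, that is,
$$f\Box w(x)=\inf_{y\in\mathbb{R}^n}\big\{f(y)+w(x-y)\big\}.$$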
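To fix ideas about the inf-convolution, here is a small discrete sketch. It assumes the usual definition, namely that the inf-convolution of f and w at x is the infimum over y of f(y) + w(x - y), and restricts the infimum to a finite grid. The function name `inf_convolution` and the choice of grid are hypothetical.

```python
def inf_convolution(f, w, grid):
    """Discrete inf-convolution: x -> min over y in grid of f(y) + w(x - y)."""
    return {x: min(f(y) + w(x - y) for y in grid) for x in grid}

# f(y) = |y| and a quadratic weight w(t) = t^2 on the integers -5, ..., 5
grid = range(-5, 6)
result = inf_convolution(abs, lambda t: t * t, grid)
```

Since w vanishes at 0, taking y = x shows that the inf-convolution never exceeds f, which is a convenient sanity check on any implementation.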
The -property is closely linked to transportation-cost inequalities. By the Kantorovitch duality (see [42, Theorem 5.10]), and the duality of the entropy (see [22, Lemma 6.2.13]), it is known that, under mild assumptions on , the following general inf-convolution inequality,
[TABLE]
satisfied for any non-negative measurable function is equivalent to the following transportation-cost inequality: for any probability measure on ,
[TABLE]
where is the relative entropy of with respect to , and
[TABLE]
In particular, under the assumption that is upper semi-continuous, Kantorovitch duality is valid by [42, Theorem 5.10], so that the equivalence above between (24) and (25) holds.
One can observe that if satisfies the -property, then by Jensen’s inequality, it satisfies also the general inf-convolution inequality (24), and therefore satisfies the transportation-cost inequality (25) with cost function .
Conversely, according to [25, Proposition 4.13], if satisfies the transportation-cost inequality (25) with cost function , then satisfies the -property. If moreover is sub-additive, then one can see that and thus satisfies the -property. Whereas if is convex, then so that satisfies the -property. This remark will be useful later when we will need to translate a transportation-cost inequality into a -property.
More importantly for us, the -property yields deviations bounds with respect to enlargements by the weight . We know from [35, Lemma 4], that if satisfies the -property, then for any Borel subset of , and any ,
[TABLE]
We define another form of inf-convolution inequality, designed to enable us to get the best constants in our weight functions, (and also to deal with the measure when ), which we will call the truncated -property. More precisely, we will say that a measure on with the weight function , satisfies the -truncated -property, where is a Borel subset of , if (23) is true for any non-negative measurable function such that on .
This -truncated -property yields a deviation inequality with respect to enlargement by the weight of the following form: for any Borel subset of such that , and any ,
[TABLE]
The goal of this section is to find, for the measure , when , a family of weights for which a truncated -property is satisfied, and which captures the asymptotics of the tail distribution of . More precisely, we will prove the following proposition.
3.1 Proposition**.**
Let . If , then for any , satisfies the -property with
[TABLE]
where
[TABLE]
If , there are some constants and such that for and , satisfies the -truncated -property, where
[TABLE]
with
[TABLE]
The rest of this section will be devoted to proving the above proposition. We will reduce the problem in a first phase to the one-dimensional case, and to an estimation of the monotone rearrangement of onto .
Like the usual -property (see [35, Lemma 1]), the truncated version of the -property tensorizes in the following way.
3.2 Lemma**.**
Let be a probability measure defined on some measurable space , be some measurable subset of and be a measurable function, for .
If satisfies the -truncated -property for , then satisfies the -truncated -property with
[TABLE]
Since we are dealing with the product measure , we can focus on studying the -property for the one-dimensional marginal .
For the exponential measure, we have the following result due to Talagrand, which gives a family of optimal weights .
3.3 Proposition** ([39, Theorem 1.2]).**
Let . Define the weight function for any by,
[TABLE]
For any , satisfies a transportation-cost inequality (25) with cost function .
Note that, . Thus, when , captures the exact asymptotics of the tail distribution of the exponential law.
For technical reasons, we prefer to work with a different family of weights than the one defined in Proposition 3.3. In the following corollary, we reformulate Talagrand’s result for the symmetric exponential measure .
3.4 Corollary**.**
Let . We define the weight function , for any , by
[TABLE]
For any , satisfies the -property. As a consequence, satisfies the -property, with defined in Proposition 3.1.
This reformulation reveals in particular the structure of the enlargements given by the weights which consist in a mixture of and -balls.
Proof.
As is a convex function, we know by [25, Proposition 4.13] that satisfies the -property. To prove Corollary 3.4, it suffices to prove that for any . Since both functions are even, it is sufficient to prove the inequality on . Let . By Taylor’s formula
[TABLE]
for some . If and , we get
[TABLE]
If , we have
[TABLE]
Thus, for .
After tensorization (see [35, Lemma 1]), we obtain that satisfies the -property with defined in Proposition 3.1.
∎
For , the general strategy is to transport this -property of the symmetric exponential law to obtain a -property for . The following lemma extends a result of Maurey [35, Lemma 2] to our setting of the truncated -property.
3.5 Lemma**.**
Let be a Borel subset of . Let be a probability measure on and let be a bijective measurable map. Assume satisfies the -property. Let be a Borel subset of and let be a weight function such that,
[TABLE]
Then, satisfies the -truncated -property.
Proof.
Let be a measurable non-negative function being on . Applying the -property of to , we get
[TABLE]
But, as is a bijection and on ,
[TABLE]
From the assumption on , we deduce
[TABLE]
Therefore,
[TABLE]
∎
In particular, in the one-dimensional case, if satisfies the -property and is even and non-decreasing on , then satisfies the -truncated -property with any even weight function such that
[TABLE]
where is defined for any by,
[TABLE]
If and are two probability measures on , we define the monotone rearrangement of onto by,
[TABLE]
This defines a unique non-decreasing map if the distribution function of is invertible, which sends to .
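As an illustration of the definition above, here is a minimal numerical sketch of a monotone rearrangement in a toy one-sided case. It is an assumption made for illustration, not the pair of measures studied in the paper: the source law is the standard exponential law on the half-line and the target law has tail `exp(-t**alpha)`. The function names (`exp_tail`, `weibull_tail`, `rearrange`) are hypothetical. In this toy case the map has the closed form `x ** (1 / alpha)`, which the bisection reproduces.

```python
import math

def exp_tail(x):
    """Tail P(X > x) of the one-sided exponential law, x >= 0."""
    return math.exp(-x)

def weibull_tail(t, alpha):
    """Tail P(Y > t) = exp(-t**alpha) of the toy target law, t >= 0."""
    return math.exp(-(t ** alpha))

def rearrange(x, alpha, upper=1e6, tol=1e-12):
    """Monotone rearrangement T(x): solve weibull_tail(T, alpha) = exp_tail(x).

    Matching tails is the same as matching distribution functions, and it is
    numerically more stable far out in the tail. Bisection is used because the
    target tail is decreasing in t.
    """
    level = exp_tail(x)
    lo, hi = 0.0, upper
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if weibull_tail(mid, alpha) > level:
            lo = mid  # target tail still too heavy at mid: T(x) lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0):
        for x in (1.0, 2.0, 4.0):
            # Numerical value next to the closed form x ** (1 / alpha).
            print(alpha, x, rearrange(x, alpha), x ** (1.0 / alpha))
```

In this toy case the map grows like a power larger than one when `alpha < 1`, so it is not Lipschitz, whereas for `alpha >= 1` its increments stay bounded by a constant for large arguments.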
Let be the monotone rearrangement of onto . One can easily check that is an odd function, and that its restriction on satisfies,
[TABLE]
where is the normalizing constant of , so that is the monotone rearrangement of onto . Thus, we are reduced to understanding the behavior of the map and how it deforms the weights of Proposition 3.3.
3.1 Behavior of the monotone rearrangement
When , we have the following estimate on the monotone rearrangement due to Talagrand [39].
3.6 Lemma** ([39, Lemma 2.5]).**
Let . Let be the monotone rearrangement sending to . Denote by the function defined for any by,
[TABLE]
There is a constant depending on such that for any ,
[TABLE]
3.7 Remark*.*
In [39, Lemma 2.5], this estimate is derived for the monotone rearrangement of onto . But since,
[TABLE]
one easily deduces the same estimate for , together with the fact that if have opposite signs,
[TABLE]
where is some constant and where we used the fact that .
To get the exact asymptotics of the tail distribution of , we will need the following finer estimate on the monotone rearrangement.
3.8 Lemma**.**
Let . Define for any ,
[TABLE]
There is a constant depending on , such that for any , and ,
[TABLE]
Proof.
By definition of , we have for any ,
[TABLE]
where is the normalizing constant of . Let and such that , and . If and have the same signs, we can assume without loss of generality, that both . As , we have . Thus,
[TABLE]
We have, on one hand, as ,
[TABLE]
And on the other hand,
[TABLE]
Therefore, as ,
[TABLE]
for some constant . Now, if and have opposite signs, we can assume without loss of generality that and . Then, so that,
[TABLE]
Thus, we can find some constant such that .
∎
3.9 Remark*.*
The truncation we performed here is made to ensure we get the best constant (that is ) in the estimate of the large increments of the monotone rearrangement. Indeed, defining as in (30), we would get for ,
[TABLE]
with .
When , we get the following estimate on the monotone rearrangement of onto . Note that as does not have an exponential tail, the rearrangement map cannot be a Lipschitz function.
3.10 Lemma**.**
Let . Let be the monotone rearrangement of onto . There is a constant depending on such that for any ,
[TABLE]
Proof.
This proof is very much in the spirit of [39, Lemma 2.5]. We begin by bounding from above
[TABLE]
when . The change of variable gives,
[TABLE]
Let . Integrating by parts times, we get
[TABLE]
As , we deduce for any ,
[TABLE]
where is some constant depending on which will vary along the proof. Therefore, for any ,
[TABLE]
By definition satisfies for any ,
[TABLE]
This implies that is an increasing homeomorphism of . For , we have
[TABLE]
From (34), we see that is differentiable, and satisfies for any ,
[TABLE]
Thus by (35), we get for ,
[TABLE]
Dividing by and integrating on we get
[TABLE]
for any . Hence,
[TABLE]
for . By (36) we deduce
[TABLE]
Since is continuous, at the price of taking larger, we have
[TABLE]
Let , and such that . If ,
[TABLE]
Whereas if ,
[TABLE]
Now, if ,
[TABLE]
In conclusion, for any , ,
[TABLE]
The mean value theorem yields
[TABLE]
Using the convexity of , if , or its sub-additivity, when , we get
[TABLE]
with . Together with (38), this gives the claim. ∎
As in the case , we can refine the estimate of Lemma 3.10 to get the following result.
3.11 Lemma**.**
Let . Let be the monotone rearrangement of onto . Let . Define the function by,
[TABLE]
There is some constant , such that
[TABLE]
Proof.
Since and are linked by the relation (31), the same estimate as in Lemma 3.10 holds for the Brenier map . Therefore, we have for any , and ,
[TABLE]
with . Fix , and . We have
[TABLE]
But we know from (37) that for , , with some constant . Thus, for , there is a constant , which will vary along the proof without changing name, such that
[TABLE]
We deduce that for ,
[TABLE]
Let . Assume now . Proceeding as in the proof of Lemma 3.8 in the case , we assume first that . As , we must have . Then,
[TABLE]
On one hand, as , we have using the sub-additivity of ,
[TABLE]
and on the other hand, by (33),
[TABLE]
where is some constant depending on . Thus,
[TABLE]
As for , we deduce that
[TABLE]
If and have opposite signs, we can assume and , thus and we get,
[TABLE]
As , we deduce
[TABLE]
which ends the proof of the claim. ∎
3.2 A family of weights for
Using transport arguments, we obtain in this section a family of weights for which capture its exact tail distribution.
3.12 Proposition**.**
Let , , and . There exist some constants depending on such that for any , satisfies the -truncated -property where,
[TABLE]
Proof.
Let and . Let such that
[TABLE]
With this choice of , we will prove that for ,
[TABLE]
with the appropriate constants and , defined in Corollary 3.4, and where is as in (32). Using the result of Lemma 3.5, this will yield the claim.
Let be small enough such that is non-decreasing. This is possible since . Let . If is small enough, we have by Lemma 3.8 or 3.11,
[TABLE]
If , then by Lemma 3.8 we get, as ,
[TABLE]
for some constant which will vary along the proof. Similarly, when , we get by Lemma 3.11,
[TABLE]
Now let . Assume . By Lemma 3.6 and the fact that is non-decreasing, we have
[TABLE]
where is some positive constant. Without loss of generality, we can assume . Then, as , we have , so that we get
[TABLE]
Using the fact that , for some constant , we get the claim in the case . Assume now . From Lemma 3.11 and the fact that is non-decreasing, we deduce
[TABLE]
Without loss of generality, we can assume that . As and , we have
[TABLE]
Thus,
[TABLE]
with some . But, we can find some constant such that
[TABLE]
which, recalling that gives the claim.
∎
We can now give a proof of Proposition 3.1.
Proof of Proposition 3.1.
As satisfies the -truncated -property for , for some and any by Proposition 3.12, we deduce by the tensorization property of the -property (see Lemma 3.2) that satisfies the -truncated -property with defined as in (29).
∎
4 Large deviations
We will prove in this section Theorem 2.1. As sketched in the introduction, the proof will consist, in a first phase, in looking for large deviations inequalities for and lower bound estimates on the probability of translates.
As a consequence of the truncated -property of Proposition 3.1, satisfied by and the weight functions , we deduce an isoperimetric-type bound for with respect to the metric (or in the case ). This estimate will be of paramount importance to derive the upper bound of Theorem 2.1.
4.1 Proposition**.**
Let , . Let . Let , be two sequences going to as goes to . Let and be Borel subsets of such that
[TABLE]
For , we assume that
[TABLE]
whereas for , we assume . Then,
[TABLE]
4.2 Remark*.*
For , the Gaussian isoperimetric inequality (see [32, Theorem 2.5]) entails the same result without any further assumption on the speed or the set than .
Proof.
Before going into the proof per se, we need to relate the enlargements by the weights , for which we know that satisfies the -property, and therefore a deviation inequality of the type (28), to the -balls. This is the subject of the following lemma.
4.3 Lemma**.**
Let . With the notation of Proposition 3.1, for any , and ,
[TABLE]
with . Moreover, there is a function , such that
[TABLE]
Proof.
We will prove only the first statement, the proof for the second one being similar. Let . By cutting the entries of , we can find , such that , for any , , and
[TABLE]
By the very definition of ,
[TABLE]
and
[TABLE]
Thus, if we let
[TABLE]
and if , then , and .
∎
With this lemma proven, we can now give the proof of Proposition 4.1. We start with the case . As , for large enough, we have . Then, by Lemma 4.3, we have
[TABLE]
But by assumption, . Thus,
[TABLE]
We deduce that,
[TABLE]
As satisfies the -property by Corollary 3.4, we have the following deviation inequality (see (27)),
[TABLE]
As , we get
[TABLE]
Letting going to [math], we get the claim.
Let now . Let and set , with some which is to be chosen later. By Lemma 4.3
[TABLE]
From the assumption that we deduce that for large enough,
[TABLE]
In particular for large enough,
[TABLE]
Put in another way
[TABLE]
Thus,
[TABLE]
As by assumption , we get
[TABLE]
As satisfies the -truncated -property by Proposition 3.1, we deduce the following deviation inequality (see (28)),
[TABLE]
But,
[TABLE]
Let , defined by , where is the monotone rearrangement map sending to . Then sends to , so that,
[TABLE]
From (37), we deduce
[TABLE]
for some constant . But , for some constant . Therefore,
[TABLE]
Thus by Markov’s inequality,
[TABLE]
since we chose . As by assumption, we deduce that for large enough,
[TABLE]
Therefore,
[TABLE]
which gives the claim by taking . ∎
We show in the next proposition that we can bound from below the probability of translates under .
4.4 Proposition**.**
Let . Let be a sequence going to as goes to . Fix some . Let be some Borel subset of such that
[TABLE]
(i). For any sequence of elements of ,
[TABLE]
(ii). If , then for any sequence ,
[TABLE]
4.5 Remark*.*
One can obtain the estimate when for the measures with the additional assumption on the speed, which is actually very restrictive in the applications we have in mind. This is one of the reasons for the limitation of Theorem 2.3 to the case , since we do not know how to produce a meaningful lower bound of such translated sets in this case. Similarly, when , one can see, at least for integer, that the estimate does not hold unless .
Proof.
The proof will essentially follow the lines of [23, Theorem 5.1]. Indeed, in the Gaussian case , this lower bound is derived from the translation formula of the Gaussian measure. The proof for will consist in mimicking the Gaussian case.
If the in the right-hand side of is infinite, then the statement is trivial. If it is finite, we take some , such that , for all . Let for any , . Then, we have,
[TABLE]
where denotes the Lebesgue measure on , and is the normalizing factor. If , then for any ,
[TABLE]
Thus,
[TABLE]
Therefore,
[TABLE]
which gives the claim in the case . Note that the same argument for instead of gives without changes the estimate .
Now, if , we have for any ,
[TABLE]
where stands for the sign of . Thus, for any ,
[TABLE]
where
[TABLE]
and . We have,
[TABLE]
Jensen’s inequality yields,
[TABLE]
But, by Cauchy-Schwarz inequality,
[TABLE]
But for any , since and is symmetric. Thus,
[TABLE]
Using the fact that , we get,
[TABLE]
where is some constant. As , we have
[TABLE]
Note that it was actually very important that we did not bound by in (41), so that is of mean [math] under , and is not too big. When one replaces by , this is exactly where one needs to make an assumption on the speed to identify the leading term.
By assumption, we know that there is some such that for large enough, . Thus, we get for large enough,
[TABLE]
Taking the at the exponential scale , we get the claim.
∎
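The Gaussian mechanism that the proof above mimics can be checked numerically in dimension one. The sketch below is only an illustration under stated assumptions: it uses the standard Gaussian measure (density proportional to `exp(-x**2 / 2)`, which may differ from the paper's normalization) and a symmetric interval `[-a, a]`. The translation formula together with the symmetry of the set gives that the measure of the translate by `h` is at least `exp(-h**2 / 2)` times the measure of the set. The function names are hypothetical.

```python
import math

def gauss_cdf(t):
    """Distribution function of the standard Gaussian law."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def gauss_interval(lo, hi):
    """Standard Gaussian measure of the interval [lo, hi]."""
    return gauss_cdf(hi) - gauss_cdf(lo)

def translate_vs_bound(a, h):
    """Return (measure of [-a, a] + h, exp(-h^2 / 2) * measure of [-a, a])."""
    translated = gauss_interval(h - a, h + a)
    lower_bound = math.exp(-0.5 * h * h) * gauss_interval(-a, a)
    return translated, lower_bound

if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0):
        for h in (0.5, 1.0, 2.0, 3.0):
            translated, lower_bound = translate_vs_bound(a, h)
            print(a, h, translated, lower_bound, translated >= lower_bound)
```

The shift is kept moderate in the demo because the difference of two distribution-function values loses floating-point accuracy far out in the tail.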
We can now give a proof of Theorem 2.1. We will essentially follow the proof of the LDP of Wiener chaoses (see [31]), replacing the use of the Cameron-Martin formula by Proposition 4.4, and the Gaussian isoperimetric inequality with Proposition 4.1.
Proof of Theorem 2.1.
Without loss of generality we can and will assume that . **Property of the rate function:** By assumption , for any ,
[TABLE]
This formulation shows that if and only if there is a sequence , such that
[TABLE]
Thus, , for some fixed , if and only if is a limit point of a sequence such that . Therefore, is lower semi-continuous. Moreover,
[TABLE]
As by assumption the set on the right-hand side is compact, we conclude that is a good rate function.
**Lower bound:** Let such that . By assumption , there is a sequence such that
[TABLE]
Let . For large enough,
[TABLE]
Let
[TABLE]
Note that
[TABLE]
By assumption , goes to as goes to . From Proposition 4.4, we deduce
[TABLE]
**Upper bound:** Let be a closed subset of . We can assume without loss of generality that . Let such that . Put in another way,
[TABLE]
As is a good rate function, we can find a such that
[TABLE]
where denotes the -neighborhood for the distance . Thus,
[TABLE]
Let
[TABLE]
Define, similarly as for the lower bound, the event
[TABLE]
By assumption , we know that goes to as goes to . We claim that
[TABLE]
Indeed, if and , then , from the definition (4) of , and
[TABLE]
so that . With this observation we get,
[TABLE]
If , we get by the Gaussian isoperimetric inequality (see [32, Theorem 2.5]) for any large enough so that ,
[TABLE]
which gives the upper bound.
Let now , and , where is given by assumption . With the notation of Theorem 2.1 define,
[TABLE]
By Markov’s inequality and assumption , we deduce
[TABLE]
From assumption , we deduce that . Furthermore, we claim that
[TABLE]
Recall that
[TABLE]
Now, if and , then by definition of , for all
[TABLE]
which yields (43) by the triangle inequality. Thus the requirements of Proposition 4.1 are met, and we get
[TABLE]
As this inequality is true for any , we get the upper bound.
∎
We will end this section with the proof of Theorem 2.3.
Proof of Theorem 2.3.
We will follow the same steps as for the proof of Theorem 2.1. The compactness assumption , and the assumption yield that is a good rate function. As shown in the proof of Theorem 2.1, a large deviations upper bound holds with speed and rate function , under the assumptions . Thus, we only have to prove the lower bound. Let such that . We know that there is a sequence such that
[TABLE]
Proceeding as in the proof of Theorem 2.1, if , then for large enough,
[TABLE]
Let
[TABLE]
Note that
[TABLE]
By assumption , goes to as goes to . From Proposition 4.4, we deduce
[TABLE]
which ends the proof of the lower bound. Due to assumption the lower bound and upper bound rate functions match so that a full LDP holds. ∎
5 Concentration inequalities
We will prove in this section the concentration inequalities of Propositions 2.14, 2.18 and 2.19 for the linear statistics, the empirical spectral measure and largest eigenvalue of Wigner matrices satisfying the concentration property introduced by definition 2.12.
5.1 Some examples of Wigner matrices satisfying
Before going into the proofs, we will review some workable criteria for a Wigner matrix to satisfy the concentration property when . The case of normal concentration has drawn most of the attention, and we refer the reader to [32, section 8.5], [28] or also [27, Part II] for a presentation of the different examples of classical models of random matrices having normal concentration.
When we introduce the notion of Poincaré-type inequalities in the finite-dimensional setting. Let be some distance on . For a smooth function , we define the length of the gradient of with respect to the distance by,
[TABLE]
We say that a probability measure satisfies a Poincaré-type inequality on if there is some , such that for any smooth ,
[TABLE]
where the length of the gradient is taken with respect to .
Following Gozlan [25, Definition 1.1], we will say that a probability measure on satisfies if it satisfies the Poincaré-type inequality on with spectral gap , where is the distance defined in (18).
By the results of Bobkov-Ledoux [16, Corollary 3.2], and Gozlan [25, Proposition 1.2], we know that if a Wigner matrix has entries satisfying , then it satisfies a two-level deviations inequality: for any Borel subset of such that , and ,
[TABLE]
where only depends on , and, by [25, Proposition 1.2], can be taken as
[TABLE]
with for any . In particular, such a Wigner matrix has concentration .
5.1 Remark*.*
We note that when , the Poincaré-type inequality yields a different deviation inequality (the one above is also true for but not sharp) where the mixed enlargement is replaced by (see [25] for more details).
A workable criterion for a probability measure on of the form is given by Gozlan [25, Proposition 1.2] in terms of a growth condition of the potential . More precisely, if
[TABLE]
then satisfies on . We mention also that a criterion is available in higher dimension (although more intricate) in [25, Proposition 3.5], which one may use for the complex entries of Wigner matrices.
In the case of the classical Poincaré inequality, we know by Bobkov [17] (or by Bakry, Barthe, Cattiaux, and Guillin [8]) that any log-concave law on satisfies a Poincaré inequality with a certain spectral gap depending on the dimension. Thus, any Wigner matrix with entries whose laws are log-concave will satisfy .
When , the concentration property is equivalent (see [32, Proposition 1.3]) to the following deviation inequality of Lipschitz functions around their medians, which will be useful in the applications.
5.2 Lemma**.**
Let . Let be a Wigner matrix with entries satisfying for some . Let be a function respectively -Lipschitz and -Lipschitz with respect to , and . Then, for any ,
[TABLE]
where denotes the median of .
5.2 A deviation inequality for ,
In the case , we will show that the Wigner matrices in the class satisfy the concentration property . This fact will follow from the study of the concentration property of the product measures and . It can be shown that the probability measure satisfies a weak Poincaré inequality (see [9, Chapter 7 §7.5]). The derivation of a deviations inequality from the weak Poincaré inequality has been investigated by Barthe, Cattiaux and Roberto [11], and yields a concentration inequality with respect to Euclidean enlargements. We will follow another path which consists, as it was the case for , in transporting Talagrand’s deviation inequality for the symmetric exponential law (17) onto with , using the estimate on the monotone rearrangement map proved in Lemma 3.10. We start with the one-sided probability measure .
5.3 Proposition**.**
Let , , and . There is a constant depending on , such that for any , Borel subset of , and such that ,
[TABLE]
5.4 Remark*.*
This deviation inequality is not optimal in the sense that it fails to capture the Gaussian fluctuations of empirical means from the central limit theorem. This is due to the factor in front of the -ball, which comes from the fact that the increasing rearrangement from to is not a Lipschitz function.
But on the other hand, the factor seems to be sharp, since it yields a non-trivial deviation inequality for
[TABLE]
where is the median of the maximum function under . But from the extreme value theory (see [30, Theorem 1.6.2, Corollary 1.6.3]),
[TABLE]
converges in law to the Gumbel distribution , where
[TABLE]
for some constant . Moreover, as the Gumbel distribution has a right-tail behaving like , we see that the part in the enlargement of the deviations inequality of Proposition 5.3 is justified.
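The extreme value statement used in this remark can be illustrated in the simplest case. The sketch below is an assumption made for illustration: it takes i.i.d. standard exponential variables with centering `log n`, not the paper's measure or normalizing sequences. There the distribution function of the centered maximum is explicit, and it can be compared with the Gumbel distribution function `exp(-exp(-x))`, whose right tail behaves like `exp(-x)`. The function names are hypothetical.

```python
import math

def centered_max_cdf(x, n):
    """Exact P(max of n i.i.d. standard exponentials - log n <= x)."""
    u = math.exp(-x) / n  # P(one variable exceeds x + log n)
    return (1.0 - u) ** n if u < 1.0 else 0.0

def gumbel_cdf(x):
    """Distribution function of the Gumbel law."""
    return math.exp(-math.exp(-x))

def gumbel_tail_ratio(x):
    """Ratio of the Gumbel right tail 1 - exp(-exp(-x)) to exp(-x)."""
    t = math.exp(-x)
    return -math.expm1(-t) / t  # expm1 keeps accuracy when t is tiny

if __name__ == "__main__":
    n = 10 ** 6
    for x in (-1.0, 0.0, 1.0, 3.0):
        print(x, centered_max_cdf(x, n), gumbel_cdf(x))
    for x in (2.0, 5.0, 10.0):
        print(x, gumbel_tail_ratio(x))  # approaches 1 as x grows
```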
Proof of Proposition 5.3.
Let , defined by , which sends to . Let , and be a measurable subset of such that . In a first step, we will use Lemma 3.10 to see how the map transforms the set . Actually, to transport the deviation inequality of it is sufficient to understand how deforms for a well-chosen subset of such that . To this end, define
[TABLE]
where is some constant which will be chosen later. Let , , and . By Lemma 3.10, we have
[TABLE]
where the inequality has to be understood coordinate-wise, the functions being applied coordinate by coordinate to the vectors in , and where is a constant depending on which will vary in the rest of the proof without changing name. Thus,
[TABLE]
For , we have
[TABLE]
Once again by Lemma 3.10, we get
[TABLE]
where again this inequality is valid coordinate-wise. Using the convexity of the power function , or its sub-additivity, we get
[TABLE]
Note that Hölder’s inequality implies
[TABLE]
with . Thus,
[TABLE]
Therefore,
[TABLE]
We now simplify the enlargement on the right-hand side. Observe that for any ,
[TABLE]
Indeed, if , then
[TABLE]
and
[TABLE]
Thus, , with and . Therefore, as , , and ,
[TABLE]
Thus,
[TABLE]
Applying the deviation inequality (17) of , we get
[TABLE]
where is some constant independent of . But, since
[TABLE]
for some numerical constant , we have by Markov’s inequality
[TABLE]
Thus,
[TABLE]
But, as , and is a bijection,
[TABLE]
Using (47), we deduce
[TABLE]
Adjusting the constant we get the claim. ∎
As observed in remark 3.7, the monotone rearrangement of onto , satisfies the same estimate of Lemma 3.10 as . Therefore, the same arguments as for the proof of Proposition 5.3 can be carried out, and yield a similar deviation inequality for which we stated in Proposition 2.13.
In view of this deviation inequality for , we see that a Wigner matrix in the class when satisfies the concentration property .
As for the case where , the concentration property can be translated into a deviation inequality for Lipschitz or Hölder functions when , as stated in the following lemma.
5.5 Lemma**.**
Let . Assume satisfies the concentration property for some . Let be a function respectively -Lipschitz and -Lipschitz with respect to , and . There is a constant depending on , such that if is moreover -Lipschitz with respect to , then for any ,
[TABLE]
whereas if
[TABLE]
for some , then for any ,
[TABLE]
where is the median of .
5.3 Concentration inequalities for the largest eigenvalue
We will prove in this section Proposition 2.19. We will see that it follows easily from Weyl’s inequality [15, Theorem III.2.1], as it enables one to compute the Lipschitz constants of the largest eigenvalue function with respect to the distances when and when on .
Proof of Proposition 2.19.
Let . Let be a Wigner matrix satisfying the concentration property for some . By Weyl’s inequality [15, Theorem III.2.1], the function
[TABLE]
is -Lipschitz with respect to the -Schatten (pseudo-)norm for any , which is defined by
[TABLE]
Let denote the median of , and . As , we have by [43, Theorem 3.32]. Thus, is also -Lipschitz with respect to . Applying Lemmas 5.2 and 5.5 successively to and , we deduce that for any ,
[TABLE]
with defined in Proposition 2.19, and where is some constant depending on . Integrating the above inequality (49), we get
[TABLE]
if , and
[TABLE]
if , which gives the claim.
∎
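The Lipschitz property of the largest eigenvalue used in the proof above can be checked numerically in the smallest non-trivial case. The sketch below is a hedged illustration on real symmetric 2x2 matrices, encoded as triples `(a, b, c)` for `[[a, b], [b, c]]`; this encoding and the function names are assumptions of the sketch, not notation from the paper. It compares the gap between largest eigenvalues with the p-Schatten (quasi-)norm of the difference, including an exponent below 1.

```python
import math
import random

def eigenvalues(a, b, c):
    """Eigenvalues (largest first) of the real symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean + radius, mean - radius

def schatten(a, b, c, p):
    """p-Schatten (quasi-)norm: the l^p (quasi-)norm of the eigenvalues."""
    return sum(abs(lam) ** p for lam in eigenvalues(a, b, c)) ** (1.0 / p)

def weyl_gap(A, B, p):
    """Return (|lambda_max(A) - lambda_max(B)|, p-Schatten norm of A - B)."""
    diff = tuple(x - y for x, y in zip(A, B))
    gap = abs(eigenvalues(*A)[0] - eigenvalues(*B)[0])
    return gap, schatten(*diff, p)

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        A = tuple(rng.uniform(-2, 2) for _ in range(3))
        B = tuple(rng.uniform(-2, 2) for _ in range(3))
        for p in (0.5, 1.0, 2.0):
            gap, norm = weyl_gap(A, B, p)
            print(p, gap, norm, gap <= norm + 1e-12)
```

The comparison holds for every positive exponent because the operator norm, which controls the gap by Weyl's inequality, is the largest absolute eigenvalue and is therefore dominated by any p-Schatten (quasi-)norm.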
5.4 Two lemmas on spectral variation of Hermitian matrices
In view of Lemmas 5.2 and 5.5, proving the concentration inequalities of Propositions 2.14 and 2.18 requires computing the Lipschitz constants of the empirical spectral measure of Hermitian matrices, with respect to when , and when , and a well-chosen distance on .
We will prove and discuss in this subsection Lemmas 2.20 and 2.21. For , we denote by the -Wasserstein distance, defined for any probability measures , on with finite -moments by,
[TABLE]
if and by,
[TABLE]
if , where the infimum is taken over all couplings between and .
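For two empirical measures on the real line with the same number of atoms, the infimum over couplings can be explored by hand. The sketch below is a minimal illustration under stated assumptions: equal weights `1/n`, and transport costs returned as the average of `|x - y| ** p` without taking any root, so that no normalization convention is imposed. The function names are hypothetical. For a convex cost (`p >= 1`) the sorted, monotone coupling attains the minimum over permutations, while for a concave cost (`p < 1`) it may fail to.

```python
from itertools import permutations

def cost_sorted(xs, ys, p):
    """Average of |x - y|**p under the monotone (sorted) coupling."""
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) ** p for x, y in pairs) / len(xs)

def cost_bruteforce(xs, ys, p):
    """Minimum of the same average cost over all permutation couplings (small n only)."""
    n = len(xs)
    return min(
        sum(abs(x - ys[sigma[i]]) ** p for i, x in enumerate(xs)) / n
        for sigma in permutations(range(n))
    )

if __name__ == "__main__":
    xs = [0.3, -1.2, 2.5, 0.0]
    ys = [1.1, 0.4, -0.7, 3.0]
    for p in (1, 2, 3):
        # Convex cost: the two values agree.
        print(p, cost_sorted(xs, ys, p), cost_bruteforce(xs, ys, p))
    # Concave cost: leaving the shared atom in place beats the sorted coupling.
    print(cost_sorted([0, 1], [1, 2], 0.5), cost_bruteforce([0, 1], [1, 2], 0.5))
```

Restricting the brute-force search to permutations is enough here, since for equal weights the extreme points of the set of couplings are permutation couplings.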
We begin with the proof of Lemma 2.20.
Proof of Lemma 2.20.
By Lidskii’s theorem (see [15, Corollary III 4.2]), we have
[TABLE]
where denotes the vector of eigenvalues of in decreasing order, and the majorisation relation between vectors of (see [15, Chapter II] for a proper definition). Thus, by [15, Theorem II.3.1] we get, since is convex as ,
[TABLE]
Using the decreasing coupling between the spectra of and , we get
[TABLE]
where denotes the -Schatten norm, defined in (48). But as , we have by [43, Theorem 3.32],
[TABLE]
which ends the proof of the first inequality of Lemma 2.20.
As a consequence of the Kantorovitch-Rubinstein duality (see [42, Particular case 5.16]), we have
[TABLE]
where is as in (20). Besides, Jensen’s inequality yields for any ,
[TABLE]
Therefore,
[TABLE]
which gives the second claim of the lemma. ∎
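The first step of the proof above (Lidskii's theorem followed by the decreasing coupling of the spectra) can be sanity-checked numerically. The sketch below is a hedged illustration on unnormalized real symmetric 2x2 matrices, encoded as triples `(a, b, c)` for `[[a, b], [b, c]]`; the paper's matrix normalization is not reproduced, and the function names are hypothetical. It checks that, for `p >= 1`, the p-Wasserstein distance between the two empirical spectral measures is at most `n ** (-1 / p)` times the p-Schatten norm of the difference, with `n = 2`.

```python
import math
import random

def eigenvalues(a, b, c):
    """Eigenvalues (decreasing order) of the real symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean + radius, mean - radius

def schatten(a, b, c, p):
    """p-Schatten norm: the l^p norm of the eigenvalues."""
    return sum(abs(lam) ** p for lam in eigenvalues(a, b, c)) ** (1.0 / p)

def spectral_wasserstein(A, B, p):
    """Cost of the decreasing coupling of the two spectra, to the power 1/p (p >= 1)."""
    pairs = zip(eigenvalues(*A), eigenvalues(*B))
    return (sum(abs(x - y) ** p for x, y in pairs) / 2.0) ** (1.0 / p)

def spectral_variation(A, B, p):
    """Return (W_p of the spectral measures, 2**(-1/p) * p-Schatten norm of A - B)."""
    diff = tuple(x - y for x, y in zip(A, B))
    return spectral_wasserstein(A, B, p), 2.0 ** (-1.0 / p) * schatten(*diff, p)

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        A = tuple(rng.uniform(-2, 2) for _ in range(3))
        B = tuple(rng.uniform(-2, 2) for _ in range(3))
        for p in (1.0, 2.0, 3.0):
            w, bound = spectral_variation(A, B, p)
            print(p, w, bound, w <= bound + 1e-12)
```

For `p = 2` this is the Hoffman-Wielandt inequality; the restriction to `p >= 1` mirrors the convexity used in the proof.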
5.6 Remark*.*
When , the inequality for ,
[TABLE]
is no longer true, since for it amounts to (52), which is false when , by taking , where is the constant vector.
When , one may hope for the inequality
[TABLE]
to hold. But taking formally , would yield
[TABLE]
where denote the set of eigenvalues of and . But one can see that changing entry to a matrix can change the whole spectrum, which disproves (55).
The moral of remark 5.6 is that one cannot have (54) with a constant on the right-hand side. As the cost function behaves quite badly when as it is not convex (see [36] for this transportation problem with concave costs), in particular, the optimal transport map is not necessarily the monotone rearrangement contrary to the case , we will not investigate further the question of having a spectral variation inequality involving the -Wasserstein distance. We prefer to deal with another distance on , the set of probability measures on with finite moments, which induces the same topology as and dominates . This distance is chosen so that, applied to empirical spectral measures, it will be controlled by in the case where .
To this end, let and define for any ,
[TABLE]
Taking formally to [math], we retrieve the Kolmogorov-Smirnov distance . Recall that by integrating by parts, we can write
[TABLE]
where NBV denotes the set of normalized functions with bounded variations, that is, functions which are the integrals of finite signed measures, and
[TABLE]
whenever is the distribution function of the finite signed measure , and is the total variation of .
We can actually have a similar formulation for , by introducing the fractional integrals of order on , the set of finite signed measures such that has a finite -moment, which we defined in (19). We recall that fractional integrals enjoy the following integration by parts formula (see [37, (5.16)]): for ,
[TABLE]
Thus, we can write
[TABLE]
where the supremum is taken on all , such that . The inequality (58) is the consequence of the integration by parts formula (57), whereas the equality is given by taking , for . We investigate now the link between the distances , defined in (20), and when .
5.7 Proposition**.**
Let . Then, , defined in (56), is a distance on , and metrizes the weak topology. More precisely, there is a constant such that
[TABLE]
for all . One can choose
[TABLE]
Furthermore,
[TABLE]
5.8 Remark*.*
We actually do not know if the distances and are comparable, meaning that the reversed inequality is true for some . We do know however, by Remark 5.6, that such an inequality cannot hold with some constant staying bounded when .
Proof.
In view of the formulation of as (58), the stake behind (59) is to represent the function as the fractional integral of order of some function. The constant will arise as a bound on the norm of this function as , over .
The fractional integral of order of the function is given in [37], which we state in the next lemma.
5.9 Lemma** ([37, Chapter 2 (5.25)]).**
Let . For any , , we have
[TABLE]
with
[TABLE]
where is the principal branch of the -root on .
Let and as in (62). We have
[TABLE]
where we used . Therefore,
[TABLE]
But, one can recognize an Euler integral of the first kind in the definition of , by making successively the changes of variables , and , which yields,
[TABLE]
Therefore by [3, (2.13)], we deduce the value for claimed in (60).
Inequality (61) is the consequence of the sub-additivity of the function on . More precisely, for any ,
[TABLE]
Integrating the above inequality under a coupling of two probability measures with finite -moment yields the claim.
From (59), we deduce that the topology induced by on is finer than the weak topology, and by (61) that it is coarser than the one induced by . But induces the weak topology on by [42, Theorem 6.9] (as is a metric on for ), therefore induces the weak topology on this set. ∎
We finally prove that the distance we introduced, when applied to spectral measures of Hermitian matrices, is dominated by for , this will directly imply the result of Lemma 2.21.
5.10 Lemma**.**
Let . Let .
[TABLE]
where is defined in (56). In particular,
[TABLE]
where is as in (60).
5.11 Remark*.*
Defining the distance
[TABLE]
for any , we see that we have a similar representation as for , that is,
[TABLE]
where run in such that . Moreover, we clearly get the same inequality as (63) for .
Proof.
As , the second inequality of (63) is due to [43, Theorem 3.32]. To prove the first inequality, we begin by recalling an inequality due to Rotfel’d originally, and then to Thompson [41] (for an extension and a simpler proof). Let be a concave symmetric function. Then for any positive semi-definite,
[TABLE]
where denotes the vector of eigenvalues of a Hermitian matrix . Note that since is symmetric, there is no ambiguity in the writing. Let . We have,
[TABLE]
In particular, if we denote the eigenvalues of some Hermitian matrix , then by Weyl’s inequality [15, Theorem III.2.1], for any ,
[TABLE]
Therefore,
[TABLE]
Define
[TABLE]
Since are Hermitian,
[TABLE]
As is non-decreasing coordinate-wise,
[TABLE]
Rotfel’d inequality gives
[TABLE]
Thus,
[TABLE]
Applying this inequality with , instead of and , we get the first claim. The inequality (63) is just a reformulation of the above inequality and a use of the comparison (52) between -(quasi)-norm and -Schatten (quasi)-norm. Finally, using Proposition 5.7, we deduce that (64) is true. ∎
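As a numerical sanity check of the Rotfel’d inequality used in this proof, the following sketch tests its classical scalar form, tr f(A+B) <= tr f(A) + tr f(B) for positive semi-definite A and B, with the concave choice f(t) = t^p, 0 < p < 1. The choice of f, the matrix size, the random model and the helper names `trace_power` and `random_psd` are illustrative assumptions and are not taken from the paper, which states the inequality for general concave symmetric functions of the eigenvalue vector.

```python
import numpy as np

def trace_power(M, p):
    """Return tr(M^p) for a positive semi-definite matrix M."""
    eig = np.linalg.eigvalsh(M)
    eig = np.clip(eig, 0.0, None)  # remove tiny negative round-off
    return float(np.sum(eig ** p))

def random_psd(n, rng):
    """Hypothetical test matrix: G G^T is positive semi-definite."""
    G = rng.standard_normal((n, n))
    return G @ G.T

rng = np.random.default_rng(0)
p = 0.5  # any exponent in (0, 1) makes t -> t^p concave on [0, inf)
gaps = []
for _ in range(20):
    A = random_psd(6, rng)
    B = random_psd(6, rng)
    # Rotfel'd: tr f(A + B) <= tr f(A) + tr f(B), so each gap should be >= 0
    gaps.append(trace_power(A, p) + trace_power(B, p) - trace_power(A + B, p))

print(min(gaps))
```

A non-negative minimum gap is consistent with the inequality; it is of course not a proof.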
With Lemmas 5.10 and 2.20, we can now give a proof of Propositions 2.14 and 2.18.
Proof of Proposition 2.14.
Let and let be a Wigner matrix satisfying the concentration property with some . Lemma 2.20 and Hölder’s inequality show that if is -Lipschitz, then the function
[TABLE]
where denote the eigenvalues of , is -Lipschitz with respect to for any . Thus, using Lemma 5.2, we deduce the concentration inequality for the linear statistics of Lipschitz functions of Proposition 2.14 in the case .
Assume now that and that is -Lipschitz and can moreover be written for some such that . Then by Lemma 5.10 (and Remark 5.11), we know that the map (65) is -Lipschitz with respect to . Thus we can deduce from Lemma 5.5 the second concentration inequality of Proposition 2.14.
∎
We now prove Proposition 2.18.
Proof of Proposition 2.18.
Fix some . Let denote the function on defined by,
[TABLE]
As , we see that the function is -Lipschitz. Moreover, we know by Lemma 5.9 that when ,
[TABLE]
with , where is as in (60). Let be the median of . Let also . We deduce by Proposition 2.14, and using Remark 2.17 in the case , that there is a constant depending on such that,
[TABLE]
where is defined in the statement of Proposition 2.18. Integrating this inequality, we get
[TABLE]
with , uniformly in , . With this notation, we get for any ,
[TABLE]
Let be a -net of . As is -Lipschitz on , we have
[TABLE]
As is a subset of of diameter less than , we can find a -net such that . Thus,
[TABLE]
which, adjusting the constant , gives the claim.
∎
6 Deterministic equivalents for Wigner matrices
We will prove in this section some uniform deterministic equivalents for the spectral measure and largest eigenvalue of deformed Wigner matrices having concentration for (see Definition 2.12), using the inequalities proved in the preceding section. We will also prove a deterministic equivalent for traces of polynomials of deformed Wigner matrices, which will not rely on concentration arguments. In particular, these deterministic equivalents will entail that assumption of Theorem 2.1 holds for the spectral measure, the largest eigenvalue and the traces of polynomials of Wigner matrices in . More precisely, we will prove the following propositions.
6.1 Proposition**.**
Let . Let be a Wigner matrix such that and satisfying the concentration property . For any ,
[TABLE]
in probability, where is the distance defined in (20).
6.2 Remark*.*
This statement fails when since is in for some , with positive probability uniform in . Indeed, on the one hand, by Wigner’s theorem (see [2])
[TABLE]
in probability, where for any ,
[TABLE]
On the other hand, by continuity of the free convolution (see [14, Proposition 4.13]),
[TABLE]
in probability, and we have by [2, Example 5.3.26].
6.3 Proposition**.**
Let . Let be a centered Wigner matrix satisfying the concentration property such that . Define the function by,
[TABLE]
For any ,
[TABLE]
in probability.
For the traces of polynomials of independent Wigner matrices we will prove the next proposition.
6.4 Proposition**.**
Let . Let be a non-commutative polynomial of total degree . Let be a family of independent centered Wigner matrices with entries having finite -moments, such that for any . For any ,
[TABLE]
in probability, where is the homogeneous part of degree of , is a free family of semi-circular variables in a non-commutative probability space and,
[TABLE]
It is interesting to note that, for polynomials, we are able to make the approximation hold uniformly in , which is why we can consider the Gaussian case in our large deviations principle of Theorem 2.9.
6.1 Deterministic equivalents in expectation
Our approach to prove Propositions 6.1 and 6.3 consists in showing, in a first step, the proposed uniform deterministic equivalents in expectation, and then making use of the concentration inequalities of Section 5 together with a chaining argument to show that these equivalents hold uniformly in probability.
For the empirical spectral measure, we have such a uniform deterministic equivalent in expectation by the following result of Bordenave and Caputo [19].
6.5 Theorem** ([19, Theorem 2.6]).**
Let be a Wigner matrix such that , , and . There exists a universal constant such that for any ,
[TABLE]
where is defined for any ,
[TABLE]
where and denote the Stieltjes transforms of and .
For the largest eigenvalue, we will prove the following proposition.
6.6 Proposition**.**
Let . Let be a centered Wigner matrix such that and . For any ,
[TABLE]
where is the function defined in (67).
Proof.
In a first step, we will perform a truncation and convolution argument similar to the one used in [18, Proposition 4.1, step 1], in order to reduce the problem to the case where the entries of satisfy a Poincaré inequality. Let and let be a GUE matrix, that is, where is a matrix with i.i.d. complex Gaussian entries with covariance , independent from . We set to be the Hermitian matrix with -entry,
[TABLE]
and . By [10, Theorem 1.2], has entries satisfying a Poincaré inequality .
We know by [29, Theorem 2] that there is some constant such that for any centered Wigner matrix ,
[TABLE]
This inequality yields as the entries of have finite fourth moments,
[TABLE]
But, using Weyl’s inequality [15, Theorem III.2.1], and the fact that is -Lipschitz, we see that is -Lipschitz with respect to . Thus, we can focus on proving Proposition 6.6 when has entries satisfying a Poincaré inequality. We now make another reduction of the statement to a convergence in probability and to the case where the supremum is taken over the set of matrices, which we denote by , consisting of -sparse matrices (meaning at most entries are non-zero) with spectral radius bounded by , for some fixed .
Note that by Weyl’s inequality and (52), we have for any ,
[TABLE]
As converges in by [2, Theorem 2.1.22, 27], we deduce that, uniformly in , is uniformly integrable. Therefore it suffices to prove that for any ,
[TABLE]
Let , and be the values in non-increasing order. We have,
[TABLE]
Let now and the locations of the largest values of . Define to be the matrix,
[TABLE]
As , we deduce,
[TABLE]
Thus, again by Weyl’s inequality, it is sufficient to prove for any fixed , , and ,
[TABLE]
To prove this claim, we will follow a rather classical argument relying on the Frobenius formula used in the study of finite rank perturbations as in [13] for example, to determine the behavior of the largest eigenvalue of deformed models.
Diagonalize , with of size such that . By Frobenius formula (see [13, section 4.1]), is either in the spectrum of , denoted , or the largest zero of the function,
[TABLE]
Our main task consists in proving that this function is uniformly close on any compact subset of to the following deterministic limit function,
[TABLE]
6.7 Lemma**.**
Let and define,
[TABLE]
For all subset compactly included in and ,
[TABLE]
where and are defined in (70), (71).
Assume for the moment that this lemma is true. Note that the functions , , form a normal family of holomorphic functions on . By [1, Chapter 5, Theorem 2], it is thus a pre-compact family in the space of holomorphic functions on . We deduce by Hurwitz’s theorem [1, Chapter 5, Theorem 10] that for any and open subset compactly included in , there is some such that for any holomorphic function defined on a neighborhood of , and such that , either does not have any zeros in and therefore neither does , or to any zero of in there corresponds a zero of in which is -close.
Let . We set
[TABLE]
Let also be some open subset compactly included in such that . We deduce that for any there is a , such that,
[TABLE]
As this does not depend on , we get from Lemma 6.7
[TABLE]
It remains to show that goes to [math] as uniformly in . Note that almost surely (taking an arbitrary coupling of the matrices ), we have by the Hoffman-Wielandt inequality (51),
[TABLE]
Thus, by Wigner’s theorem, almost surely, converges weakly towards uniformly in . By lower-semicontinuity of the map
[TABLE]
we deduce that
[TABLE]
almost surely. Using the above convergence, (68) and the convergence of the largest eigenvalue of to in probability, we can conclude that
[TABLE]
which gives the claim of Proposition 6.6. Thus, we are reduced to showing Lemma 6.7.
∎
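The two eigenvalue perturbation bounds invoked repeatedly in this proof, Weyl’s inequality and the Hoffman-Wielandt inequality, can be checked numerically on a small example. The sketch below uses their standard forms for Hermitian matrices with eigenvalues sorted in the same order: the maximal eigenvalue displacement is bounded by the operator norm of the difference, and the l2 distance between the eigenvalue vectors is bounded by the Hilbert-Schmidt norm of the difference. The matrix size, the random model and the helper name `random_hermitian` are illustrative assumptions.

```python
import numpy as np

def random_hermitian(n, rng):
    """Hypothetical test matrix: Hermitian part of a complex Gaussian matrix."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (G + G.conj().T) / 2

rng = np.random.default_rng(1)
n = 8
A = random_hermitian(n, rng)
B = random_hermitian(n, rng)

# eigvalsh returns real eigenvalues in ascending order for both matrices
lam_A = np.linalg.eigvalsh(A)
lam_B = np.linalg.eigvalsh(B)

diff = A - B
op_norm = float(np.linalg.norm(diff, 2))      # largest singular value
hs_norm = float(np.linalg.norm(diff, "fro"))  # Hilbert-Schmidt norm

weyl_lhs = float(np.max(np.abs(lam_A - lam_B)))          # Weyl: <= op_norm
hw_lhs = float(np.sqrt(np.sum((lam_A - lam_B) ** 2)))    # Hoffman-Wielandt: <= hs_norm

print(weyl_lhs, op_norm, hw_lhs, hs_norm)
```

Both bounds express that the ordered eigenvalue map is 1-Lipschitz for the corresponding norm, which is the form in which they enter the Lipschitz estimates above.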
Proof of Lemma 6.7.
Let and as in the statement of Lemma 6.7. Let and a Lipschitz function such that
[TABLE]
Let be some unit vectors and . We set
[TABLE]
where . By Weyl’s inequality, this defines a -Lipschitz function with respect to , where is a constant depending on the set . As the entries of satisfy a Poincaré inequality, has concentration . We deduce from Lemma 5.2 that for large enough,
[TABLE]
where . Note that defines a -Lipschitz function in . As is relatively compact, we deduce by an -net argument that for any ,
[TABLE]
In the following lemma, we show an isotropic-like property.
6.8 Lemma**.**
Let and a subset compactly included in . Let be a Wigner matrix satisfying the assumptions of Proposition 6.6. For any ,
[TABLE]
where denotes the set of unit -sparse vectors, meaning with at most non-zero entries, is as in (72), and .
Proof.
By polarization, it is sufficient to prove this lemma when the supremum ranges over vectors . Moreover, by symmetry, it is enough to show this statement for . Because , as a function of , is a Lipschitz function on , we only need to show for any ,
[TABLE]
with . Let . For any , we have on one hand,
[TABLE]
On the other hand, expanding the scalar product,
[TABLE]
As converges to in probability, we are reduced to prove for any ,
[TABLE]
Even though this is a classical estimate of random matrix theory, for the sake of completeness we give a proof here. We start with the case of the off-diagonal entries. We set and we write as a shorthand for . Let . We have the following resolvent identity (see [12, Lemma 3.5]),
[TABLE]
where is the resolvent of the matrix where we removed the -row and -column, and means that the summation is over . By Cauchy-Schwarz inequality we have
[TABLE]
But, as is independent of and are centered and independent,
[TABLE]
Recall Ward’s identity (see [12, (3.6)]),
[TABLE]
Thus,
[TABLE]
To deal with the diagonal entries, we start from the Schur complement formula (see [2, Lemma 2.4.6]),
[TABLE]
where denotes the -column of where the entry is removed. Let be the -algebra generated by the variables for . We find,
[TABLE]
where and . Introducing the missing diagonal terms, using Ward’s identity again and the fact that , we find,
[TABLE]
where is some positive constant depending on . This yields,
[TABLE]
From Wigner’s theorem, we know that converges to in probability for any . Note that are identically distributed for . We deduce from (73) and the fact that (see [2, Example 5.3.26]),
[TABLE]
which yields,
[TABLE]
for any . As the functions and are -Lipschitz on , we can extend this convergence, by an -net argument, uniformly to any bounded subset of , for any .
∎
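The two algebraic identities used in the proof of Lemma 6.8, Ward’s identity and the Schur complement formula, hold for any Hermitian matrix and can be verified directly. The sketch below assumes the resolvent convention G(z) = (H - z)^{-1} with Im z > 0 (with the opposite convention (z - H)^{-1} the imaginary part in Ward’s identity changes sign); the matrix size, the point z and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
G0 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (G0 + G0.conj().T) / (2 * np.sqrt(n))  # hypothetical Hermitian test matrix
z = 0.3 + 0.5j                             # spectral parameter with Im z > 0

# Resolvent, assumed convention G(z) = (H - z)^{-1}
G = np.linalg.inv(H - z * np.eye(n))

# Ward's identity: sum_j |G_ij|^2 = Im(G_ii) / Im(z) for every row i
ward_lhs = np.sum(np.abs(G) ** 2, axis=1)
ward_rhs = G.diagonal().imag / z.imag

# Schur complement formula for the diagonal entry G_ii (here i = 0):
# G_ii = 1 / (H_ii - z - a^* G^{(i)} a), where a is the i-th column of H
# with entry i removed and G^{(i)} is the resolvent of the minor.
i = 0
keep = [k for k in range(n) if k != i]
H_minor = H[np.ix_(keep, keep)]
a = H[keep, i]
G_minor = np.linalg.inv(H_minor - z * np.eye(n - 1))
schur = 1.0 / (H[i, i] - z - a.conj() @ G_minor @ a)

print(np.max(np.abs(ward_lhs - ward_rhs)), abs(schur - G[i, i]))
```

Both printed discrepancies should be at the level of floating-point round-off.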
We come back now to the proof of Lemma 6.7. The above lemma yields that for any ,
[TABLE]
Note that -sparse matrices have -sparse eigenvectors. Using the fact that the spectral radius of matrices in is bounded and a union bound, we deduce that for any ,
[TABLE]
where , for any matrix . As the matrices , , form a pre-compact subset of , the continuity of the determinant on , allows us to conclude the proof of Lemma 6.7.
∎
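The determinant characterisation of outlier eigenvalues behind the Frobenius formula can be illustrated on a finite-rank perturbation. The sketch below assumes the following standard form, which may differ in normalisation from the function defined in (70): if A = U Theta U^T has rank k and z is not in the spectrum of X, then z is an eigenvalue of X + A exactly when det(I_k - Theta U^T (z - X)^{-1} U) = 0. The real symmetric model, the values in `theta` and the function name `f` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 2

# Hypothetical normalised real symmetric Wigner matrix (off-diagonal variance 1/n)
G0 = rng.standard_normal((n, n))
X = (G0 + G0.T) / np.sqrt(2 * n)

# Rank-k perturbation A = U Theta U^T with orthonormal columns in U
U, _ = np.linalg.qr(rng.standard_normal((n, k)))
theta = np.diag([3.0, 2.5])
A = U @ theta @ U.T

lam_max = float(np.linalg.eigvalsh(X + A)[-1])
dist_to_spec = float(np.min(np.abs(np.linalg.eigvalsh(X) - lam_max)))

def f(z):
    """k x k determinant whose zeros outside spec(X) are eigenvalues of X + A."""
    R = np.linalg.inv(z * np.eye(n) - X)
    return float(np.linalg.det(np.eye(k) - theta @ U.T @ R @ U))

value_at_top = f(lam_max)
print(lam_max, dist_to_spec, value_at_top)
```

The reduction from an n x n eigenvalue problem to a k x k determinant is what makes the comparison with a deterministic limit function tractable once the quadratic forms of the resolvent are controlled.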
6.2 A chaining argument
We will now give a proof of Propositions 6.1 and 6.3. As it will rely on a chaining argument, we will need the following lemma.
6.9 Lemma**.**
Let and let denote the -ball of for any . Fix some . We denote by , the covering number of by , that is, the minimal number of translates of needed to cover . There is a constant depending on , such that for ,
[TABLE]
Proof.
This estimate is a consequence of the upper bound on entropy numbers of embeddings of in given in [24, Proposition 3.2.2]. Let . Denote by the space equipped with the (quasi)-norm . We define, for ,
[TABLE]
From [24, Proposition 3.2.2], we know that there is a constant such that for ,
[TABLE]
Thus, if we set , for some such that , we deduce the following rough bound,
[TABLE]
for some constant . Let now and set such that . The above inequality tells us that if , then there are balls covering , that is,
[TABLE]
which yields the claim. ∎
We are now ready to give a proof of Propositions 6.1 and 6.3.
Proof of Proposition 6.1.
Let . As satisfies for some constant , we see that also satisfies with the same constant . We know from Proposition 2.18 and Theorem 6.5, that for any ,
[TABLE]
with defined in Proposition 2.18 and $\varepsilon_{n}=O\big(n^{-1/2}(\log n)^{(1/\alpha-1)_{+}}\big)$, uniformly in . Note that the map
[TABLE]
is -Lipschitz with respect to by Lemma 2.20. We deduce using an -net argument that for large enough,
[TABLE]
where denotes the covering number of by . But, the homogeneity of the norm gives,
[TABLE]
with . We get from Lemma 6.9 applied with ,
[TABLE]
This shows that the covering number is negligible with respect to the speed of the deviations, which concludes the chaining argument. ∎
We finally give a proof of Proposition 6.3.
Proof of Proposition 6.3.
Let . As in the proof of Proposition 6.1, we deduce from Propositions 2.19 and 6.6 that for any and ,
[TABLE]
where is defined in Proposition 2.19, uniformly in , and is as in (67).
Note that the map is -Lipschitz. From Weyl’s inequality [15, Theorem III.2.1], we deduce that
[TABLE]
is -Lipschitz with respect to the Hilbert-Schmidt norm on . Using an -net argument as in the proof of Proposition 6.1, it is sufficient to prove that for any fixed , the covering number is negligible at the exponential scale , that is
[TABLE]
But from Lemma 6.9, we know that,
[TABLE]
which ends the proof of the claim. ∎
6.3 Traces of polynomials of deformed Wigner matrices
We will now prove Proposition 6.4. Contrary to the spectral measure or the largest eigenvalue, the proof will consist in a simple moment computation.
Proof of Proposition 6.4.
By linearity it is sufficient to show the statement when is a monomial, which we will assume from now on. We can write , with . Define the matrix with coefficients in , by
[TABLE]
Observe that by cyclicity of the trace, for any , . Therefore,
[TABLE]
Write and . We know from the proof of [5, Lemma 2.1] that,
[TABLE]
Let us define the -Schatten (quasi-)norm on , for any by,
[TABLE]
Note that for any ,
[TABLE]
Thus, for any ,
[TABLE]
As , . Without loss of generality we can assume . Thus,
[TABLE]
But we know from Wigner’s theorem (see [2, Lemma 2.1.6]), that there is a constant , such that
[TABLE]
Besides,
[TABLE]
By Jensen’s inequality, we deduce
[TABLE]
Therefore,
[TABLE]
We deduce from (75) and (77) that
[TABLE]
uniformly in and where . It is now sufficient to prove that converges to [math] uniformly in , as soon as . Assume first . Using the non-commutative Hölder’s inequality (see [15, Corollary IV.2.6]), we get
[TABLE]
The arithmetic-geometric mean inequality yields,
[TABLE]
As , we deduce
[TABLE]
We conclude that when ,
[TABLE]
If , then and . By Jensen’s inequality,
[TABLE]
Thus, as ,
[TABLE]
Besides, we know by [2, Theorem 5.4.2], that
[TABLE]
where s are a family of free semi-circular variables defined on a non-commutative probability space . This ends the proof of the proposition. ∎
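The convergence of normalised traces of polynomials in independent Wigner matrices to moments of free semi-circular variables can be observed numerically on low-degree monomials. The sketch below uses real symmetric Gaussian matrices without deformation, which is an illustrative assumption, and compares four normalised traces with the classical free moments tau(s1^2) = 1, tau(s1^4) = 2, tau(s1^2 s2^2) = 1 and tau(s1 s2 s1 s2) = 0. The helper names `wigner` and `ntr` are hypothetical.

```python
import numpy as np

def wigner(n, rng):
    """Hypothetical real symmetric Wigner matrix with off-diagonal variance 1/n."""
    G = rng.standard_normal((n, n))
    return (G + G.T) / np.sqrt(2 * n)

def ntr(M):
    """Normalised trace (1/n) tr(M)."""
    return float(np.trace(M)) / M.shape[0]

rng = np.random.default_rng(4)
n = 300
X1, X2 = wigner(n, rng), wigner(n, rng)

m_2 = ntr(X1 @ X1)               # free limit: 1
m_4 = ntr(X1 @ X1 @ X1 @ X1)     # free limit: 2 (second Catalan number)
m_aabb = ntr(X1 @ X1 @ X2 @ X2)  # free limit: 1
m_abab = ntr(X1 @ X2 @ X1 @ X2)  # free limit: 0 (only a crossing pairing)

print(m_2, m_4, m_aabb, m_abab)
```

The vanishing of the alternating moment is the signature of freeness: a classical independent commuting pair would give the same value for both mixed moments.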
7 Deterministic equivalent for the last-passage time
We will prove in this section the analogue of the results for Wigner matrices of the preceding section, for the last-passage time. More precisely, we will provide a deterministic equivalent for the last-passage time when the matrix of weights is deformed by some matrix , where is bounded for some .
Let denote the set of finite vectors , which we will call admissible, such that , , , and for any , , where denotes the lexicographic order. With this definition we set, for any , where ,
[TABLE]
where for some , where is as in (14), and where we denote here, for better readability, the positive part of (, our former notation). With this notation, we will prove the following proposition.
7.1 Proposition**.**
Let . Let be a family of i.i.d random variables following the law . For any ,
[TABLE]
in probability, where denotes the multi-matrix .
We will follow the same arguments as for the proof of the uniform deterministic equivalent of the empirical spectral measure and the largest eigenvalue of Wigner matrices. We will begin by showing that the deterministic equivalent (80) we propose, holds uniformly in expectation. This is the object of the following lemma.
7.2 Lemma**.**
Let . Let be a family of i.i.d non-negative random variables with common distribution function satisfying (13). For any ,
[TABLE]
where is as in (80).
Proof.
Let denote the subset of vectors of of size less than or equal to , and define by,
[TABLE]
and ,
[TABLE]
where for some , and is as in (14). We begin by proving that there is some constant depending on , such that for any ,
[TABLE]
In the following will denote a constant which will depend only on and which will vary along the lines of the proof. Let be an optimal path for the last-passage time , and denote by the largest values of on the path , sorted in lexicographic order. Add and , to get . We have
[TABLE]
As , we deduce
[TABLE]
Now observe that if are the values of (or of ) along in decreasing order, we have since , for any ,
[TABLE]
Therefore,
[TABLE]
for some constant . This proves the upper bound of (81). On the other hand, let . Considering the optimal paths from to in the last-passage time , for and their concatenation , we get,
[TABLE]
Indeed, if , then
[TABLE]
by considering the cases whether or ( and ) or ( and ). Turning our attention to the first sum in (83), we deduce by bounding the first largest weights of by , and using the bound (82) for the rest of the terms,
[TABLE]
By (40) we have,
[TABLE]
for some constant . We thus proved,
[TABLE]
On the other hand, focusing now on the second term of (83),
[TABLE]
But , thus
[TABLE]
Therefore,
[TABLE]
which concludes the proof of the lower bound of (81). Comparing and , we get using the translation invariance in law (by vectors of ) of ,
[TABLE]
As is coordinate-wise non-decreasing as a function of , and converges to which is continuous by [34, Theorem 2.3], we deduce that converges uniformly to on by Dini’s Theorem. Thus,
[TABLE]
where when .
Now, using the same argument as for the upper bound of (81), we see that
[TABLE]
for any . Indeed, if achieves the supremum in , then taking the largest values of on , we get
[TABLE]
Thus, using (82), we get the claim. To summarize, we got by (81), (84), and (85),
[TABLE]
for some constant and for any , which gives finally the claim by taking the as , and then as . ∎
We can now give a proof of Proposition 7.1.
Proof of Proposition 7.1.
Let . Note that is -Lipschitz with respect to on . As since , we deduce that is also -Lipschitz with respect to . Moreover by Hölder’s inequality, is -Lipschitz with respect to . We get by Lemma 5.5, for any ,
[TABLE]
where is the median of , is some strictly positive constant, and
[TABLE]
Integrating this inequality we get,
[TABLE]
uniformly in . Using the result of Lemma 7.2, we deduce that for large enough,
[TABLE]
where . Let now . Note that
[TABLE]
is -Lipschitz with respect to on . Besides, by Lemma 6.9 for any , the covering number of by -balls of radii satisfies,
[TABLE]
Since this estimate is negligible with respect to the concentration bound (86), we deduce using an -net argument as in the proofs of Propositions 6.1 and 6.3, that
[TABLE]
which ends the proof of the claim. ∎
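For readers who want to experiment with the quantity studied in this section, the following sketch computes a last-passage time by dynamic programming and checks an elementary stability property. It assumes the usual directed model, namely the maximum over up-right paths across a grid of the sum of the weights met along the path, with an additive deformation of the weights; the exact definition in (14) and (80) may differ. The Weibull weights with shape 0.5, the uniform deformation `h` and the function name `last_passage_time` are illustrative assumptions.

```python
import random

def last_passage_time(w):
    """Maximal total weight of an up-right path from cell (0, 0) to the last cell."""
    n, m = len(w), len(w[0])
    T = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            prev = []
            if i > 0:
                prev.append(T[i - 1][j])
            if j > 0:
                prev.append(T[i][j - 1])
            # best path into (i, j) extends the best path into a predecessor
            T[i][j] = w[i][j] + (max(prev) if prev else 0.0)
    return T[n - 1][m - 1]

random.seed(0)
n = 12
# Hypothetical heavy-tailed weights: Weibull with scale 1 and shape 0.5
w = [[random.weibullvariate(1.0, 0.5) for _ in range(n)] for _ in range(n)]
# Hypothetical non-negative deformation of the weights
h = [[random.uniform(0.0, 1.0) for _ in range(n)] for _ in range(n)]
w_def = [[w[i][j] + h[i][j] for j in range(n)] for i in range(n)]

l1_dist = sum(h[i][j] for i in range(n) for j in range(n))
gap = abs(last_passage_time(w_def) - last_passage_time(w))
print(gap, l1_dist)
```

Since a maximum of path sums moves by at most the total change of the weights, `gap` never exceeds `l1_dist`; this is the simplest instance of the Lipschitz properties used in the proof above.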
8 Applications to Wigner matrices
In this section we apply Theorem 2.1 in the setting of Wigner matrices and derive the LDPs of Theorems 2.5, 2.7 and 2.9. Throughout this section, will designate a Wigner matrix in the class for some . It is clear that Theorem 2.1 remains valid in the context of Wigner matrices in the class , after making the corresponding change in the rate function , namely replacing the weight function by , which defines the law of a Wigner matrix in (see (12)).
8.1 Large deviations of the empirical spectral measure
Proof of Theorem 2.5.
From Proposition 6.1, we know that assumption of Theorem 2.1 is satisfied with
[TABLE]
and
[TABLE]
where is the (real) dimension of , with the metric on defined in (20), and .
By Lemma 2.20, we see that is -Lipschitz with respect to on and on . By Remark 2.2 (c) and the fact that , we deduce that assumption of Theorem 2.1 holds. Besides, as , we have by [43, Theorem 3.32]
[TABLE]
Thus for any ,
[TABLE]
which shows that is relatively compact by Prokhorov’s theorem, and that is verified.
To prove it is sufficient to show that for a fixed , there is a sequence , , such that
[TABLE]
Let for any , . We have , as for any , and
[TABLE]
Now, if , with and , we define
[TABLE]
We have,
[TABLE]
Thus,
[TABLE]
Besides,
[TABLE]
As , and , we get the claim (87).
∎
8.2 Large deviations of the largest eigenvalue
Proof of Theorem 2.7.
We begin by giving back to its variational form. We claim that for any ,
[TABLE]
where is the function
[TABLE]
Let us prove first that
[TABLE]
When , both sides of (89) are infinite. If , we denote by the right-hand side of (89). The function is the inverse of the Stieltjes transform of on (see [2, Example 5.3.26]). Thus, we can write
[TABLE]
As is -homogeneous, and , for any , we get
[TABLE]
Thus, . As is clearly lower semi-continuous, the equality (88) holds by Remark 2.2 (e).
We check now the assumptions of Theorem 2.1. Assumption of Theorem 2.1 is met by the result of Proposition 6.3, with
[TABLE]
where as before is the dimension of , and . Weyl’s inequality [15, Theorem III.2.1] shows that is -Lipschitz with respect to , and thus assumption is satisfied as by Remark 2.2 (c). Besides, note that for any ,
[TABLE]
where we used in the second inequality the fact that and [43, Theorem 3.32]. As is non-decreasing, we deduce for any that,
[TABLE]
which proves that is satisfied. To show that holds, it suffices to observe that if , and if we set for any ,
[TABLE]
then , and provided , we have , so that in particular . ∎
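The role played by the inverse of the Stieltjes transform of the semi-circular law can be illustrated in the classical rank-one case. The sketch below assumes a real symmetric Gaussian Wigner matrix deformed by theta times the projection on the first coordinate, with theta > 1, and compares the top eigenvalue with the classical prediction theta + 1/theta, which is the point where the Stieltjes transform of the semi-circular law takes the value 1/theta. This special case, the values of `n` and `theta`, and the variable names are illustrative assumptions; the function defined in (67) is more general.

```python
import numpy as np

rng = np.random.default_rng(5)
n, theta = 600, 2.0

# Hypothetical normalised real symmetric Wigner matrix (off-diagonal variance 1/n)
G = rng.standard_normal((n, n))
X = (G + G.T) / np.sqrt(2 * n)

# Rank-one deformation theta * e e^T along the first coordinate vector
e = np.zeros(n)
e[0] = 1.0
top = float(np.linalg.eigvalsh(X + theta * np.outer(e, e))[-1])

# Classical prediction for theta > 1, and the Stieltjes transform of the
# semi-circular law, g(z) = (z - sqrt(z^2 - 4)) / 2 for z > 2, evaluated there
predicted = theta + 1.0 / theta
g_at_predicted = (predicted - np.sqrt(predicted ** 2 - 4.0)) / 2.0

print(top, predicted, g_at_predicted)
```

At finite size the top eigenvalue fluctuates around the prediction at scale of order n^{-1/2}, so only approximate agreement should be expected.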
8.3 Large deviations of non-commutative polynomials
Finally, we give a proof of Theorem 2.9.
Proof of Theorem 2.9.
By a homogeneity argument similar to the one in the proof of Theorem 2.7, we get for any ,
[TABLE]
where denotes the homogeneous part of degree of . As is lower semi-continuous, we get from Remark 2.2 (e) that
[TABLE]
Assumption of Theorem 2.1 is a consequence of Proposition 6.4 with the speed and
[TABLE]
where is the real dimension of .
Let us now prove assumption . Note that by linearity, it suffices to prove assumption when is a monomial of total degree less than or equal to , which we will assume from now on. If , then there are two cases to consider. First we see by Hölder’s inequality that is -Lipschitz with respect to . If then , so that as in this case. We conclude by Remark 2.2 (c) that assumption holds. If and , then we deduce again by Remark 2.2 (c) that assumption is fulfilled as .
In the case , we will need to understand the stability of the function with respect to the Euclidean norm. This is the object of the following lemma.
8.1 Lemma**.**
There is a constant depending on and , such that for any monomial of total degree , and ,
[TABLE]
where for any , denotes the -Schatten norm on , defined in (76).
Proof.
Let
[TABLE]
By the mean value theorem, we have
[TABLE]
Note that if is a monomial of degree in X, then by (79), we have
[TABLE]
As is the sum of at most monomials of degree in X, we get by the triangle inequality and the above observation,
[TABLE]
Thus,
[TABLE]
As is convex, we get
[TABLE]
As , we have
[TABLE]
This inequality, together with (91), yields the claim of Lemma 8.1. ∎
We now come back to the proof of assumption of Theorem 2.1. Let .
Let , and set . As we assumed is a monomial of total degree , from the preceding Lemma 8.1, we have for any ,
[TABLE]
where is some constant depending on and . Using the fact that for any and , we get,
[TABLE]
Let and . For ,
[TABLE]
With the notation of Theorem 2.1, we have
[TABLE]
where is the dimension of . By convexity, we deduce
[TABLE]
But by Wigner’s theorem (see [2, Lemma 2.1.6]),
[TABLE]
for some constant . As with , we deduce as ,
[TABLE]
Thus,
[TABLE]
where is some positive constant depending on and . This shows that assumption is satisfied.
We show now that assumption holds. Using (79) for , we get
[TABLE]
where is some constant depending on . This proves condition of Theorem 2.1. To show that the last assumption is met, it suffices to observe that for any fixed , with the same construction as in (90), there is a sequence , for , such that
[TABLE]
and .
∎
9 Application to last-passage time
We prove in this last section Theorem 2.11.
Proof of Theorem 2.11.
We will verify the assumptions of Theorem 2.3. Assumption holds due to Proposition 7.1 with , and
[TABLE]
where is defined in (80), denotes the matrix with coefficients , and is the dimension of . As
[TABLE]
is -Lipschitz with respect to , assumption is satisfied by Remark 2.2 (c).
Using the fact that when , on , we see that the condition of Theorem 2.3 is met. To prove , we first observe that
[TABLE]
Indeed, since the function is superadditive by [34, Proposition 2.1], we deduce that
[TABLE]
for any . Therefore, both sides of (92) are infinite if . Now if , and is such that , then denoting the element of achieving the supremum in (80), we get,
[TABLE]
Using the superadditivity of , it yields
[TABLE]
with equality for the matrix whose entries are all zero except . This proves the equality (92). In particular, is lower semi-continuous and therefore by Remark 2.2 (e), we deduce,
[TABLE]
As the matrices with , achieve (92) for any , we deduce,
[TABLE]
Finally, as , where is the matrix , we get
[TABLE]
This proves the last assumption of Theorem 2.3.
∎
- [1] L. Ahlfors. Complex analysis: An introduction to the theory of analytic functions of one complex variable. Second edition. McGraw-Hill Book Co., New York-Toronto-London, 1966.
- [2] G. W. Anderson, A. Guionnet, and O. Zeitouni. An introduction to random matrices, volume 118 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2010.
- [3] E. Artin. The gamma function. Translated by Michael Butler. Athena Series: Selected Topics in Mathematics. Holt, Rinehart and Winston, New York-Toronto-London, 1964.
- [4] F. Augeri. Large deviations principle for the largest eigenvalue of Wigner matrices without Gaussian tails. Electron. J. Probab., 21:Paper No. 32, 49, 2016.
- [5] F. Augeri. On the large deviations of traces of random matrices. arXiv:1605.03894, accepted for publication in the Annales de l’Institut Henri Poincaré, May 2016.
- [6] Z. Bai and J. W. Silverstein. Spectral analysis of large dimensional random matrices. Springer Series in Statistics. Springer, New York, second edition, 2010.
- [7] Z. D. Bai and Y. Q. Yin. Necessary and sufficient conditions for almost sure convergence of the largest eigenvalue of a Wigner matrix. Ann. Probab., 16(4):1729–1741, 1988.
- [8] D. Bakry, F. Barthe, P. Cattiaux, and A. Guillin. A simple proof of the Poincaré inequality for a large class of probability measures including the log-concave case. Electron. Commun. Probab., 13:60–66, 2008.
