Gaussian Concentration bound for potentials satisfying Walters condition with subexponential continuity rates
J.-R. Chazottes, J. Moles, E. Ugalde

TL;DR
This paper proves a Gaussian concentration bound for equilibrium states of certain potentials with Walters condition, leading to new results on fluctuations, convergence rates, and an almost-sure CLT in symbolic dynamics.
Contribution
It establishes a Gaussian concentration inequality for a class of potentials with subexponential variation decay, independent of the Lipschitz functions involved.
Findings
Bound on fluctuations of empirical frequencies
Speed of convergence of empirical measures
Almost-sure central limit theorem
J.-R. Chazottes. Centre de Physique Théorique, Ecole Polytechnique, CNRS, 91128 Palaiseau, France. (JRC benefited from an ECOS Nord project for his stay at San Luis Potosí.)
Email: [email protected]
J. Moles. Instituto de Física, Universidad Autónoma de San Luis Potosí, S.L.P., 78290 México. (JM benefited from an ECOS Nord project for his stay at Ecole Polytechnique.)
Emails: [email protected], [email protected]
E. Ugalde. Instituto de Física, Universidad Autónoma de San Luis Potosí, S.L.P., 78290 México. (EU acknowledges Ecole Polytechnique for financial support (one-month stay).)
Emails: [email protected], [email protected]
Abstract
We consider the full shift $T:\Omega\to\Omega$ where $\Omega=A^{\mathds{N}}$, $A$ being a finite alphabet. For a class of potentials which contains in particular potentials $\phi$ with variation decreasing like $O(n^{-\alpha})$ for some $\alpha>2$, we prove that their corresponding equilibrium state $\mu_{\phi}$ satisfies a Gaussian concentration bound. Namely, we prove that there exists a constant $C>0$ such that, for all $n$ and for all separately Lipschitz functions $K(x_{0},\ldots,x_{n-1})$, the exponential moment of $K(x,\ldots,T^{n-1}x)-\int K(y,\ldots,T^{n-1}y)\,\mathrm{d}\mu_{\phi}(y)$ is bounded by $\exp\big(C\sum_{i=0}^{n-1}\mathrm{Lip}_{i}(K)^{2}\big)$. The crucial point is that $C$ is independent of $K$ and $n$. We then derive various consequences of this inequality. For instance, we obtain bounds on the fluctuations of the empirical frequency of blocks, the speed of convergence of the empirical measure, and the speed of Markov approximation of $\mu_{\phi}$. We also derive an almost-sure central limit theorem.
Keywords: concentration inequalities, empirical measure, Kantorovich distance, Wasserstein distance, d-bar distance, relative entropy, Markov approximation, almost-sure central limit theorem.
1 Introduction
We consider the full shift $T:\Omega\to\Omega$ where $\Omega=A^{\mathds{N}}$, $A$ being a finite alphabet. Given an ergodic measure $\mu$ on $\Omega$ and a continuous observable $f:\Omega\to\mathds{R}$, we know by Birkhoff’s ergodic theorem that $n^{-1}S_{n}f(x)$ converges, for $\mu$-almost every $x$, to $\int f\,\mathrm{d}\mu$. (We use the standard notation $S_{n}f(x)=f(x)+f(Tx)+\cdots+f(T^{n-1}x)$.) To refine this result, we need more assumptions on $\mu$ and $f$. For instance, if $\mu=\mu_{\phi}$ is the equilibrium state for a Lipschitz potential $\phi$ and $f$ is also a Lipschitz function, then the following central limit theorem holds:
[TABLE]
for all , where $\sigma_{f}^{2}$ is the variance of the process where is distributed according to . (Footnote 1: $\sigma_{f}^{2}=\int f^{2}\,\mathrm{d}\mu_{\phi}-\big(\int f\,\mathrm{d}\mu_{\phi}\big)^{2}+2\sum_{\ell\geq 1}\Big(\int f\cdot f\circ T^{\ell}\,\mathrm{d}\mu_{\phi}-\big(\int f\,\mathrm{d}\mu_{\phi}\big)^{2}\Big)$.) This result says in essence that the fluctuations of are with high probability of order , when . Fluctuations of order , referred to as ‘large deviations’, are unlikely to appear. Indeed, for instance one has
[TABLE]
where and is the so-called ‘rate function’ which is (strictly) convex, such that , and equal to outside a certain finite interval . (Footnote 2: For two positive sequences , means that .) Of course, both the central limit theorem and the large deviation asymptotics have been obtained for more general potentials, and for more general ‘chaotic’ dynamical systems. For a fairly recent review on probabilistic properties of nonuniformly hyperbolic dynamical systems modeled by Young towers, we refer to [4].
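In this paper, we are interested in concentration inequalities which describe the fluctuations of observables of the form $K(x,Tx,\ldots,T^{n-1}x)$ around their average. The only restriction on $K$ is that it has to be separately Lipschitz. By this we mean that, for all $i=0,\ldots,n-1$, there exists a constant $\mathrm{Lip}_{i}(K)$ with
\[ |K(x_{0},\ldots,x_{i},\ldots,x_{n-1})-K(x_{0},\ldots,x_{i}',\ldots,x_{n-1})|\leq \mathrm{Lip}_{i}(K)\,d_{\theta}(x_{i},x_{i}') \]
for all points $x_{0},\ldots,x_{n-1},x_{i}'$ in $\Omega$, where $d_{\theta}$ is the usual distance on $\Omega$ (see (2.1)). So $K$ can be nonlinear and implicitly defined. Of course, such a class contains partial sums of Lipschitz functions, namely functions of the form $K(x_{0},\ldots,x_{n-1})=f(x_{0})+\cdots+f(x_{n-1})$, for which $\mathrm{Lip}_{i}(K)=\mathrm{Lip}(f)$ for all $i$. Besides covering very general observables, the other essential characteristic of concentration inequalities is that they are valid for all $n$, contrary to the above two results, which are valid only in the limit $n\to\infty$. More precisely, we shall prove the following ‘Gaussian concentration bound’. There exists a constant $C>0$ such that, for all $n$ and for all separately Lipschitz functions $K$, we have
\[ \int \exp\Big(K(x,Tx,\ldots,T^{n-1}x)-\int K(y,Ty,\ldots,T^{n-1}y)\,\mathrm{d}\mu_{\phi}(y)\Big)\,\mathrm{d}\mu_{\phi}(x)\leq \exp\Big(C\sum_{i=0}^{n-1}\mathrm{Lip}_{i}(K)^{2}\Big). \tag{1.3} \]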
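The crucial point is that $C$ is independent of $K$ and $n$. By a standard argument (see below), the previous inequality implies that for all $u>0$
\[ \mu_{\phi}\Big\{x\in\Omega: K(x,Tx,\ldots,T^{n-1}x)-\int K(y,Ty,\ldots,T^{n-1}y)\,\mathrm{d}\mu_{\phi}(y)>u\Big\}\leq \exp\Big(-\frac{u^{2}}{4C\sum_{i=0}^{n-1}\mathrm{Lip}_{i}(K)^{2}}\Big). \tag{1.4} \]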
The Gaussian concentration bound (1.3) is known for Lipschitz potentials [7]. We shall prove that it remains true for a large subclass of potentials satisfying Walters condition. For instance, the bound holds for a potential whose variation is $O(n^{-\alpha})$ for some $\alpha>2$. The proof of our result relies on two main ingredients. First, we start with a classical decomposition of the observable as a telescopic sum of martingale differences. Second, we need a further telescoping in order to use Ruelle’s Perron-Frobenius operator. Unlike in [7] (the case of Lipschitz potentials), we no longer have a spectral gap. Instead, we use a result of V. Maume-Deschamps [18] based on Birkhoff cones.
We apply the Gaussian concentration bound and its consequences, like (1.4), to various observables. On the one hand, we obtain concentration bounds for previously studied observables. We get the same bounds, but they are no longer limited to equilibrium states with Lipschitz potentials. On the other hand, we consider observables not considered before. Even when , we get a non-trivial bound. We then obtain a control on the fluctuations of the empirical frequency of blocks around , uniformly in . We then consider an estimator of the entropy based on hitting times. The next application is about the speed of convergence of the empirical measure towards $\mu_{\phi}$ in Wasserstein distance. Then we obtain an upper bound for the $\bar{d}$-distance between any shift-invariant probability measure and $\mu_{\phi}$. This distance is bounded by the square root of their relative entropy, times a constant. A consequence of this inequality is a bound for the speed of convergence of the Markov approximation of $\mu_{\phi}$ in $\bar{d}$-distance. Then we quantify the ‘shadowing’ of an orbit by another one which has to start in a subset of with -measure , say. Finally, we prove an almost-sure version of the central limit theorem. This application shows in particular that concentration inequalities can also be used to obtain limit theorems.
2 Setting and preliminary results
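Let $\Omega=A^{\mathds{N}}$ where $A$ is a finite set. We denote by $x=(x^{0},x^{1},\ldots)$ the elements of $\Omega$ (hence $x^{i}\in A$), and by $T$ the shift map: $(Tx)^{i}=x^{i+1}$, $i\in\mathds{N}$. (We use upper indices instead of lower indices because we will need to consider bunches of points in $\Omega$, e.g., $x_{0},x_{1},\ldots,x_{n-1}$, $x_{i}\in\Omega$.) We use the classical distance
\[ d_{\theta}(x,y)=\theta^{\inf\{k\,:\,x^{k}\neq y^{k}\}} \tag{2.1} \]
where $\theta\in(0,1)$ is some fixed number. Probability measures are defined on the Borel sigma-algebra of $\Omega$ which is generated by cylinder sets. Let $\phi:\Omega\to\mathds{R}$ be a continuous potential, which means that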
[TABLE]
The sequence is the modulus of continuity of and it is called the ‘variation’ of in our context. By the way, we denote by the Banach space of real-valued continuous functions on equipped with the supremum norm . We put further restrictions on , namely that it must satisfy the Walters condition [22]. For in let
[TABLE]
We assume that exists and that there exists such that
[TABLE]
Now for let
[TABLE]
Definition 2.1**.**
* is said to satisfy Walters’ condition if is a strictly positive sequence and decreases to $0$ as .*
We now make several remarks on Walters’ condition. First, observe that locally constant potentials do not satisfy this condition because for all larger than some . But one can in fact work with any strictly positive sequence decreasing to zero such that for all , e.g., for some fixed . Second, one easily checks that
[TABLE]
Hence the set of potentials satisfying Walters’ condition contains the set of potentials with summable variation. In particular, is bounded above by a geometric sequence if and only if is also bounded above by a geometric sequence. This corresponds to the case of Lipschitz or Hölder potentials (with respect to ).
Now define Ruelle’s Perron-Frobenius operator as
\[ \mathcal{L}_{\phi}f(x)=\sum_{y\,:\,Ty=x}\operatorname{e}^{\phi(y)}f(y). \]
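To make the action of this operator concrete, here is a minimal Python sketch. It assumes the standard form of Ruelle's operator, which sums $\operatorname{e}^{\phi(y)}f(y)$ over the preimages $y=ax$, $a\in A$, of $x$ under the shift. The names `ruelle_operator`, `Point`, and the example functions are illustrative choices, and points of the shift space are stood in for by finite tuples; none of this is taken from the paper.

```python
import math
from typing import Callable, Sequence, Tuple

# Assumption: a finite tuple of symbols stands in for a point of the shift space.
Point = Tuple[int, ...]


def ruelle_operator(
    phi: Callable[[Point], float],
    f: Callable[[Point], float],
    alphabet: Sequence[int],
) -> Callable[[Point], float]:
    """Return the function x -> sum over a in alphabet of exp(phi(ax)) * f(ax)."""

    def transformed(x: Point) -> float:
        total = 0.0
        for a in alphabet:
            y = (a,) + tuple(x)  # y = ax is a preimage of x under the shift
            total += math.exp(phi(y)) * f(y)
        return total

    return transformed


# Hypothetical example on the alphabet {0, 1} with a constant potential.
ALPHABET = (0, 1)


def constant_potential(y: Point) -> float:
    return -math.log(len(ALPHABET))


def constant_one(y: Point) -> float:
    return 1.0


def first_symbol(y: Point) -> float:
    return float(y[0])


image_of_one = ruelle_operator(constant_potential, constant_one, ALPHABET)
image_of_first_symbol = ruelle_operator(constant_potential, first_symbol, ALPHABET)
```

With the constant potential $-\log|A|$ each preimage carries weight $1/|A|$, so the operator maps the constant function 1 to itself and averages `first_symbol` over the two symbols.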
The next step is to define a function space preserved by and on which it has good spectral properties. We take the space of Lipschitz functions with respect to a new distance built out of as follows.
Definition 2.2** (The distance ).**
For let
[TABLE]
and .
Now define
[TABLE]
and
[TABLE]
One can then define a norm on , making it a Banach space, by setting
[TABLE]
Remark 2.1**.**
The usual Banach space of Lipschitz functions is defined as follows. Let
[TABLE]
and
[TABLE]
The canonical norm making a Banach space is .
In view of (2.3), if we have , then . If we now have, for instance, for some , then we get a bigger space which contains in particular all functions such that with .
The following result is instrumental to this article. In brief, it tells us that a potential satisfying Walters’ condition has a unique equilibrium state, which will be denoted by , and gives a speed of convergence for the properly normalized iterates of the associated Ruelle’s Perron-Frobenius operator. The first part of the theorem is due to Walters, while the second one is due to Maume-Deschamps and can be found in her PhD thesis [18, Chapter I.2]. Unfortunately, her result was not published even though it is much sharper than the result in [16].
Theorem 2.1** ([22], [18]).**
Let satisfy Walters’ condition as above. Then the following holds.
- A. There exists a unique triplet such that and is strictly positive, , , a fully supported probability measure such that . Moreover, and , and has a unique equilibrium state which is mixing. (Footnote 3: This means that for any pair of cylinders . In particular is ergodic.)
- B. There exists a positive sequence converging to zero, such that, for any ,
[TABLE]
Moreover, one has the following behaviors:
1. If for some , then there exists such that .
2. If for some , then .
3. If for some and , then, for any , .
4. If for some and , then there exists such that $\epsilon_{n}=O\big(\operatorname{e}^{-c^{\prime}n^{\frac{\alpha}{\alpha+1}}}\big)$.
The fact that is an equilibrium state means that it maximizes the functional over the set of shift-invariant probability measures on , where is the entropy of , and the maximum is equal to the topological pressure of (see e.g. [15]), and we have .
Let us give examples of potentials. First consider and , and define
[TABLE]
One can check that . This is the analog of the so-called long-range Ising model on . Let us now take and let . Let be a monotone decreasing sequence of real numbers converging to $0$ and define
[TABLE]
One can check that . This example is taken from [19].
Remark 2.2**.**
Let us briefly explain how we can interpret an equilibrium state for a non Lipschitz potential as an absolutely continuous invariant measure of a piecewise expanding map of the unit interval with a Markov partition. It is well-known that a uniformly expanding map of the unit interval with a finite Markov partition which is piecewise , for some , can be coded by a subshift of finite type over a finite alphabet. Then, induces a potential on which is Lipschitz (with respect to ). The pullback of is then the unique absolutely continuous invariant probability measure for . In [10], the authors showed that, given which is not Lipschitz, one can construct a uniformly expanding map of the unit interval with a finite Markov partition which is piecewise , but not piecewise for any , and such that the pullback of is the Lebesgue measure.
3 Main result and applications
3.1 Gaussian concentration bound
We can now state our main theorem, whose proof is deferred to Section 4. We start with the definition of separately -Lipschitz functions.
Definition 3.1**.**
A function is said to be separately -Lipschitz if, for all , there exists a constant with
[TABLE]
for all points in .
Theorem 3.1**.**
Suppose that satisfies one of the following conditions:
1. * (that is, is -Lipschitz);*
2. * for some ;*
3. * for some and ;*
4. * for some and .*
Then the process , with distributed according to , satisfies the following Gaussian concentration bound. There exists such that for any and for any separately -Lipschitz function , we have
[TABLE]
Three remarks are in order. First, we conjecture that this theorem is valid under the condition . Second, it would be useful to have an explicit formula for in (3.1). Unfortunately, this constant is proportional to (see Theorem 2.1) which is cumbersome since it involves the eigendata of . Third, for the sake of simplicity, we considered the full shift . In fact, our results remain true if is a topologically mixing one-sided subshift of finite type. Moreover, one can extend Theorem 3.1 to bilateral subshifts of finite type by a trick used in [7].
We now give some corollaries of our main theorem that will be used in the section on applications. First, by (2.3) we immediately obtain the following corollary.
Corollary 3.2**.**
If there exists such that
[TABLE]
then we have the Gaussian concentration bound (3.1).
Next, we get the following concentration inequalities from (3.1).
Corollary 3.3**.**
For all , we have
[TABLE]
and
[TABLE]
Proof.
Inequality (3.2) follows by a well-known trick referred to as Chernoff’s bounding method [2]. Let us give the proof for completeness. Let . For any random variable , Markov’s inequality tells us that for all . Now let
[TABLE]
Using (3.1) and optimizing over , we get (3.2). Inequality (3.3) follows by applying (3.2) to and then summing up the two bounds. ∎
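For convenience, the optimization step can be spelled out as follows. This is a sketch under the assumption that (3.1) has the form announced in the abstract, namely that the exponential moment of the centered observable is at most $\exp\big(C\sum_i \mathrm{Lip}_i(K)^2\big)$. The shorthand $Z$ and $S$ is ours and does not appear in the paper.

```latex
% Shorthand (ours): Z = K(x,Tx,\ldots,T^{n-1}x) - \int K \,\mathrm{d}\mu_\phi,
%                   S = \sum_{i=0}^{n-1} \mathrm{Lip}_i(K)^2.
% Since \mathrm{Lip}_i(\lambda K) = \lambda\,\mathrm{Lip}_i(K) for \lambda > 0,
% Markov's inequality and (3.1) applied to \lambda K give, for every \lambda > 0,
\begin{align*}
\mu_\phi\{Z > u\}
  &\le \operatorname{e}^{-\lambda u}\int \operatorname{e}^{\lambda Z}\,\mathrm{d}\mu_\phi
   \le \exp\big(-\lambda u + C\lambda^{2} S\big).
\end{align*}
% The exponent is a quadratic polynomial in \lambda, minimized at \lambda = u/(2CS):
\begin{align*}
\min_{\lambda>0}\big(-\lambda u + C\lambda^{2} S\big)
  &= -\frac{u^{2}}{2CS} + \frac{u^{2}}{4CS}
   = -\frac{u^{2}}{4CS},
\end{align*}
% hence \mu_\phi\{Z > u\} \le \exp\big(-u^{2}/(4CS)\big).
```

Applying the same computation to $-K$ and adding the two estimates gives the two-sided version with an extra factor 2.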
The last corollary we want to state is about the variance of any separately -Lipschitz function.
Corollary 3.4**.**
We have
[TABLE]
Proof.
To lighten notation, we simply write instead of , instead of , and so on. Applying (3.1) to where is any real number different from $0$, we get
[TABLE]
Now by Taylor expansion we get
[TABLE]
Dividing by on both sides and then taking the limit , we obtain the desired inequality. ∎
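The Taylor expansion argument can be sketched as follows, again assuming that (3.1) has the form announced in the abstract. The shorthand $Z$ and $S$ is ours; note that $Z$ is bounded because a separately Lipschitz function on a compact space is bounded, which justifies the expansion.

```latex
% Shorthand (ours): Z = K - \int K\,\mathrm{d}\mu_\phi (so \int Z\,\mathrm{d}\mu_\phi = 0),
%                   S = \sum_{i=0}^{n-1}\mathrm{Lip}_i(K)^2.
% Expand both sides of \int e^{\lambda Z} d\mu_\phi \le e^{C\lambda^2 S} as \lambda \to 0:
\begin{align*}
\int \operatorname{e}^{\lambda Z}\,\mathrm{d}\mu_\phi
  &= 1 + \frac{\lambda^{2}}{2}\int Z^{2}\,\mathrm{d}\mu_\phi + o(\lambda^{2}),
&
\operatorname{e}^{C\lambda^{2} S}
  &= 1 + C\lambda^{2} S + o(\lambda^{2}).
\end{align*}
% Subtracting 1, dividing by \lambda^2 and letting \lambda \to 0 yields
\[
\int Z^{2}\,\mathrm{d}\mu_\phi \le 2\,C\,S .
\]
```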
Although we were not able to prove the Gaussian concentration bound for separately -Lipschitz functions, for many applications separately -Lipschitz functions are more natural. Furthermore there is a notable class of separately -Lipschitz functions, namely Birkhoff sums of the potential itself, for which our theorem holds. Indeed, when , the function is obviously separately -Lipschitz and for all . We have the following result.
Theorem 3.5**.**
Under the hypotheses of Theorem 3.1, there exists such that, for any , for all , and for all , we have
[TABLE]
The proof is left to the reader. The main (simple) modification lies in the proof of Lemma 4.3 in which considering a Birkhoff sum of a -Lipschitz function works fine, whereas we are stuck for a general separately -Lipschitz function.
We will apply this result with to derive concentration bounds for hitting times. Note that under the assumptions of this theorem, satisfies the central limit theorem [18, Chapter 2].
3.2 Related works
The novelty here is to prove a Gaussian concentration bound for potentials with a variation decaying subexponentially. When the potential is Lipschitz, Theorem 3.1 was proved in [7]. The main goal of [7] was then to deal with nonuniformly hyperbolic systems modeled by a Young tower. For a tower with a return-time to the base with exponential tails, the authors of [7] proved a Gaussian concentration bound. For polynomial tails, they proved moment concentration bounds. For maps of the unit interval with an indifferent fixed point, which are thus nonuniformly expanding, we are in the latter situation. In view of Remark 2.2 above, we deal here with maps whose derivative is not Hölder continuous, but which are still uniformly expanding.
Let us also mention the paper [14] in which the authors prove a Gaussian concentration bound for of summable variation (whereas we need a bit more than summable). Their proof is based on coupling. However, they consider functions on , not on $\big(A^{\mathds{N}}\big)^{n}=\Omega^{n}$ as in this paper. For such functions, the analogue of is . It is clear that a Gaussian concentration bound for functions $K:\big(A^{\mathds{N}}\big)^{n}\to\mathds{R}$ implies a Gaussian concentration bound for functions , but the converse is not true.
3.3 Applications
We now give several applications of the Gaussian concentration bound (3.1) and its corollaries. Throughout this section, is the equilibrium state for a potential satisfying one of the conditions 1-4 in Theorem 3.1.
3.3.1 Birkhoff sums
Let be a -Lipschitz function and define
[TABLE]
whence is the Birkhoff sum of . Clearly, for all . Applying Corollary 3.3 we immediately get
[TABLE]
for all and , where
[TABLE]
This bound can be compared with the large deviation asymptotics (1.2). We see that it has the right behavior in . Replacing by in (3.6) we get
[TABLE]
for all and . This can be compared with the central limit theorem (1.1). We can see that the previous bound is consistent with that theorem. Note that the central limit theorem is about convergence in law, whereas here we obtain a (non-asymptotic) bound from which one cannot deduce a convergence in law.
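To see how such a non-asymptotic bound is used in practice, here is a small Python sketch. It assumes the two-sided bound takes the form $2\exp\big(-nu^{2}/(4C\,\mathrm{Lip}(f)^{2})\big)$, which is what the two-sided inequality of Corollary 3.3 gives for a Birkhoff sum if (3.1) has the form announced in the abstract. The constant `C` is not explicit in the paper, so any numerical value passed below is purely hypothetical, as are the function names.

```python
import math


def birkhoff_deviation_bound(n: int, u: float, lip_f: float, C: float) -> float:
    """Assumed bound 2*exp(-n*u^2 / (4*C*lip_f^2)) on the measure of the set
    where |S_n f / n - integral of f| exceeds u, capped at 1."""
    if n <= 0 or u <= 0 or lip_f <= 0 or C <= 0:
        raise ValueError("n, u, lip_f and C must be positive")
    exponent = n * u * u / (4.0 * C * lip_f * lip_f)
    return min(1.0, 2.0 * math.exp(-exponent))


def sample_size_for(u: float, delta: float, lip_f: float, C: float) -> int:
    """Smallest n for which the assumed bound is at most delta (0 < delta < 1)."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    if u <= 0 or lip_f <= 0 or C <= 0:
        raise ValueError("u, lip_f and C must be positive")
    return math.ceil(4.0 * C * lip_f * lip_f * math.log(2.0 / delta) / (u * u))


# Hypothetical values: C = 1 and Lip(f) = 1 are placeholders, not values from the paper.
n_needed = sample_size_for(u=0.1, delta=0.05, lip_f=1.0, C=1.0)
bound_at_n = birkhoff_deviation_bound(n_needed, 0.1, 1.0, 1.0)
```

The required sample size grows like $u^{-2}\log(1/\delta)$, which reflects the Gaussian shape of the bound.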
3.3.2 Empirical frequency of blocks
Take where
[TABLE]
is a given -cylinder. Let
[TABLE]
This is the ‘empirical frequency’ of the block in the orbit of up to time . By Birkhoff’s ergodic theorem, we know that, for each , goes to for -almost all . The next theorem quantifies this asymptotic statement. Notice that we can control the fluctuations of around uniformly in .
Theorem 3.6**.**
For all , for all and for all we have
[TABLE]
where . Moreover, if for some , then
[TABLE]
where .
Proof.
Define the function by
[TABLE]
where
[TABLE]
It is left to the reader to check that , so we get immediately from 3.6
[TABLE]
for all and . To complete the proof, we need a good upper bound for . Actually, this can be done by using again the Gaussian concentration bound. Using (3.1) and Jensen’s inequality we get for any
[TABLE]
The third inequality is obtained by using the trivial inequality
[TABLE]
Taking logarithms on both sides and then dividing by , we have
[TABLE]
There is a unique minimizing the right-hand side, hence
[TABLE]
where we used that . Hence we get the desired estimate. ∎
Note that is the topological entropy of the full shift with alphabet .
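The empirical frequency of a block is easy to compute on a finite sample. The following Python sketch assumes the usual definition, namely the proportion of times $j\in\{0,\ldots,n-1\}$ at which the block occurs in the sequence starting at position $j$. The function name and the finite-list representation of the orbit are our own choices.

```python
from typing import Sequence


def empirical_block_frequency(x: Sequence[int], block: Sequence[int], n: int) -> float:
    """Fraction of positions j in 0..n-1 at which `block` occurs in x starting at j.

    The sample x must contain at least n + len(block) - 1 symbols so that every
    window x[j : j + len(block)] with j < n is fully observed.
    """
    k = len(block)
    if k == 0 or n <= 0:
        raise ValueError("block must be non-empty and n must be positive")
    if len(x) < n + k - 1:
        raise ValueError("sample too short for the requested n and block length")
    target = tuple(block)
    hits = sum(1 for j in range(n) if tuple(x[j:j + k]) == target)
    return hits / n


# Hypothetical sample over the alphabet {0, 1}.
sample = [0, 1, 0, 1, 1, 0, 1, 0]
frequency_01 = empirical_block_frequency(sample, (0, 1), n=7)
```

In this sample the block `(0, 1)` occurs at positions 0, 2 and 5 among the 7 admissible positions.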
3.3.3 Hitting times and entropy
For , let
[TABLE]
This is the first time that the first symbols of appear in . We assume that satisfies
[TABLE]
One can prove (see [9]) that
[TABLE]
Roughly, this means that, if we pick and independently, each one according to , then the time it takes to see the first symbols of appearing in for the first time is .
Theorem 3.7**.**
If satisfies (3.7), then there exist strictly positive constants and such that, for all and for all ,
[TABLE]
and
[TABLE]
These bounds were obtained in [8] when is Lipschitz. Observe that the probability of being above is bounded above by , whereas the probability of being below is bounded above by . The proof of this theorem being very similar to that given in [8], we omit the details and only sketch it. We cannot directly deal with but we have $\log T_{x^{0,n-1}}(y)=\log\big(T_{x^{0,n-1}}(y)\,\mu_{\phi}([x^{0,n-1}])\big)-\log\mu_{\phi}([x^{0,n-1}])$. Then we use Theorem 3.5 for , assuming (without loss of generality) that , that is, , because we can control uniformly in the approximation . To control the other term, we use that the law of is well approximated by an exponential law.
Another estimator of is the so-called plug-in estimator. We could also obtain concentration bounds for it in the spirit of [8].
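The hitting-time estimator can be sketched in a few lines of Python. We assume here that the hitting time is the smallest $j\geq 1$ such that the first $n$ symbols of $x$ appear in $y$ starting at position $j$; whether the paper starts the search at $j=0$ or $j=1$ is not recoverable from the text above, so this convention, like the function names, is an assumption.

```python
import math
from typing import Optional, Sequence


def hitting_time(x: Sequence[int], y: Sequence[int], n: int) -> Optional[int]:
    """Smallest j >= 1 with y[j : j + n] == x[:n], or None if the block
    x[:n] is not seen in the finite sample y."""
    if n <= 0 or len(x) < n:
        raise ValueError("need 0 < n <= len(x)")
    target = tuple(x[:n])
    for j in range(1, len(y) - n + 1):
        if tuple(y[j:j + n]) == target:
            return j
    return None


def entropy_estimate(x: Sequence[int], y: Sequence[int], n: int) -> Optional[float]:
    """The quantity (1/n) * log(hitting time), or None if the block is not seen."""
    t = hitting_time(x, y, n)
    return None if t is None else math.log(t) / n


# Hypothetical samples over the alphabet {0, 1}.
x_sample = [1, 0, 1, 1]
y_sample = [1, 0, 0, 1, 1, 0, 1]
first_hit = hitting_time(x_sample, y_sample, n=2)
```

On a finite sample the block may never be observed, which is why the sketch returns `None` instead of a time in that case.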
3.3.4 Speed of convergence of the empirical measure
Instead of looking at the frequency of a block we can consider a global object, namely the empirical measure
[TABLE]
For -almost every , we know that
[TABLE]
where the convergence is in the weak topology on the space of probability measures on . This is a consequence of Birkhoff’s ergodic theorem. The natural question is: how fast does this convergence take place? We can answer this question by using the Kantorovich distance which metrizes the weak topology on :
[TABLE]
We have the following result.
Theorem 3.8**.**
For all and all we have
[TABLE]
where .
Proof.
Let
[TABLE]
Of course, . It is left to the reader to check that
[TABLE]
The result follows at once by applying inequality (3.3). ∎
It is natural to ask for a good upper bound for because this would give a control on the fluctuations of around [math]. Getting such a bound turns out to be difficult. In [5, Section 8] it is proved that
[TABLE]
For two positive sequences , means that . One could in principle get a non-asymptotic but messy bound.
3.3.5 Relative entropy, $\bar{d}$-distance and speed of Markov approximation
Given and the (non normalized) Hamming distance between and is
[TABLE]
Now, given two shift-invariant probability measures on , denote by and their projections on , and define their -distance by
[TABLE]
where the infimum is taken over all the joint shift-invariant probability distributions on such that and . By [20, Theorem I.9.6, p. 92], the following limit exists:
[TABLE]
and defines a distance on the set of shift-invariant probability measures. It induces a finer topology than the weak topology and, in particular, the $\bar{d}$-limit of ergodic measures is ergodic, and the entropy is $\bar{d}$-continuous on the class of ergodic measures. (Footnote 4: These two properties are false in the weak topology.)
Next, given and a shift-invariant probability measure on , define the -block relative entropy of with respect to by
[TABLE]
One can easily prove that the following limit exists and defines the relative entropy of with respect to :
[TABLE]
where is the topological pressure of :
[TABLE]
This limit exists for any continuous . (To prove (3.11), we use that there exists a positive sequence going to $0$ such that, for any and any , is bounded below by and above by .) By the variational principle, with equality if and only if (recall that is the unique equilibrium state of ). We refer to [21] for details. We can now formulate the first theorem of this section.
Theorem 3.9**.**
For every shift-invariant probability measure on and for all , we have
[TABLE]
where . In particular
[TABLE]
Proof.
For a function , define for each
[TABLE]
We obviously have that for all
[TABLE]
A function such that , is -Lipschitz for the Hamming distance (3.9). We now consider the set of functions
[TABLE]
We can identify a function with a function in a natural way: where is defined by . We obviously have and it is easy to check that , . Therefore we can apply the Gaussian concentration bound (3.1) to get
[TABLE]
We now apply an abstract result [1, Theorem 3.1] which says that (3.14) is equivalent to
[TABLE]
Hence (3.12) is proved. To get (3.13), divide by on both sides and take the limit and use (3.10) and (3.11). ∎
We now give an application of inequality (3.13). Let
[TABLE]
The equilibrium state for is a -step Markov measure. One can prove that in the weak topology converges to , but one cannot get any speed of convergence. We get the following upper bound on the speed of convergence of to in the finer topology.
Corollary 3.10**.**
Assume, without loss of generality, that is normalized in the sense that
[TABLE]
Then there exists such that, for all , we have
[TABLE]
where
[TABLE]
More details on how to normalize a potential are given in Subsection 4.1.
Proof.
Using (3.11) and the variational principle we get
[TABLE]
Indeed, since and are normalized, we have in particular that , and by the variational principle . Now
[TABLE]
where we used the inequality for all . Now using the shift-invariance of and replacing by we get
[TABLE]
where we used that . Combining (3.13), (3.16), (3.17) and (3.18) we thus obtain
[TABLE]
It remains to estimate in terms of . We have
[TABLE]
provided that , where we used the inequality valid for . Finally, since , we define to be the smallest integer such that and we can take
[TABLE]
We thus proved (3.15). ∎
Let us mention the paper [13] in which the authors obtain the same bound for the speed of convergence of Markov approximation, up to the constant. Their approach is a direct estimation of by using a coupling method. The point here is to obtain the same speed of convergence as an easy corollary of inequality (3.13). Let us remark that from (4.8) we get a worse result since we end up with a bound proportional to . The trick which leads to the correct bound was communicated to us by Daniel Takahashi.
3.3.6 Shadowing of orbits
Let be a Borel subset of such that and define for all
[TABLE]
A basic example of such a set is a cylinder set . The quantity , which lies between $0$ and , measures how we can trace, in the best possible way, the orbit of some initial condition not in by an orbit starting in .
Theorem 3.11**.**
For any Borel subset such that , for any and for any
[TABLE]
where
[TABLE]
We give a shorter and simpler proof than in [11].
Proof.
Let . One can easily check that
[TABLE]
It follows from (3.2) that
[TABLE]
for all and for all . We now need an upper bound for . We simply observe that by (3.1) and the definition of
[TABLE]
for all . Hence
[TABLE]
Optimizing this bound over gives
[TABLE]
The theorem follows at once. ∎
3.3.7 Almost-sure central limit theorem
It was proved in [18, Chapter 2] that satisfies the central limit theorem for the class of -Lipschitz functions such that , that is, for any such the process satisfies
[TABLE]
where
[TABLE]
If , denotes the law of a Gaussian random variable with mean $0$ and variance , that is,
[TABLE]
When we set , the Dirac mass at zero.
Remark 3.1**.**
In fact, a more general statement was proved in [18, Chapter I.2]. Namely, (3.19) holds when is such that and .
Now, for each and , define the probability measure
[TABLE]
where and where, as usual, is the Dirac mass at point . Of course, . Notice that is a random probability measure. Finally, the Wasserstein distance between two probability measures , on the Borel sigma-algebra is
[TABLE]
where the infimum is taken over all probability measures such that
[TABLE]
for any Borel subset of . By the Kantorovich-Rubinstein duality theorem, is equal to the Kantorovich distance which is the supremum of over the set of -Lipschitz functions . We refer to [12] for background and proofs.
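On the real line this distance is easy to evaluate for finitely supported measures, because it equals the integral of the absolute difference of the two distribution functions. The Python sketch below uses that classical identity. The helper `logarithmic_weights` assumes that the random measure of the almost-sure central limit theorem puts weight proportional to $1/k$ on its $k$-th atom, normalized by the partial harmonic sum; that normalization and all function names are assumptions on our part.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def wasserstein1_on_line(
    atoms_p: Sequence[float],
    weights_p: Sequence[float],
    atoms_q: Sequence[float],
    weights_q: Sequence[float],
) -> float:
    """W1 distance between two finitely supported probability measures on R,
    computed as the integral of |F_p - F_q| over the real line."""
    if len(atoms_p) != len(weights_p) or len(atoms_q) != len(weights_q):
        raise ValueError("each atom needs exactly one weight")
    if abs(sum(weights_p) - 1.0) > 1e-9 or abs(sum(weights_q) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    mass_p: Dict[float, float] = defaultdict(float)
    mass_q: Dict[float, float] = defaultdict(float)
    for atom, weight in zip(atoms_p, weights_p):
        mass_p[atom] += weight
    for atom, weight in zip(atoms_q, weights_q):
        mass_q[atom] += weight
    points = sorted(set(mass_p) | set(mass_q))
    cdf_p = cdf_q = 0.0
    distance = 0.0
    # Both distribution functions are constant between consecutive atoms.
    for left, right in zip(points, points[1:]):
        cdf_p += mass_p[left]
        cdf_q += mass_q[left]
        distance += abs(cdf_p - cdf_q) * (right - left)
    return distance


def logarithmic_weights(n: int) -> List[float]:
    """Weights (1/k) / (1 + 1/2 + ... + 1/n) for k = 1..n (assumed normalization)."""
    if n <= 0:
        raise ValueError("n must be positive")
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    return [1.0 / (k * harmonic) for k in range(1, n + 1)]


# Two small examples: Dirac masses at 0 and 1, and a split mass versus a Dirac mass.
distance_diracs = wasserstein1_on_line([0.0], [1.0], [1.0], [1.0])
distance_split = wasserstein1_on_line([0.0, 2.0], [0.5, 0.5], [1.0], [1.0])
```

Comparing against the Gaussian limit would require discretizing it, which this sketch does not attempt.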
Now we can formulate the almost-sure central limit theorem.
Theorem 3.12**.**
Let be a -Lipschitz function. Then, for almost every , we have
[TABLE]
We make several comments. Recall that the Wasserstein distance metrizes the weak topology on the set of probability measures on . Moreover, if is a sequence of probability measures on and a probability measure on , then
[TABLE]
where “” means weak convergence of probability measures on .
To compare with (3.19), observe that Theorem 3.12 implies that for -almost every , , which in turn implies that
[TABLE]
Therefore, the expectation with respect to in (3.19) is replaced by a pathwise logarithmic average in the almost-sure central limit theorem.
Proof.
The proof follows from an abstract theorem proved in [6]. In words, that theorem says the following. Let be a stochastic stationary process where the ’s are random variables taking values in . Assume that if is -Lipschitz and such that , then it satisfies the central limit theorem, that is, for all ,
[TABLE]
where is assumed to be . Moreover, assume that the process satisfies the following variance inequality: There exists such that for all separately -Lipschitz functions for some distance on ,
[TABLE]
Then, the conclusion is that, almost surely,
[TABLE]
converges in Wasserstein distance (or, equivalently, in Kantorovich distance) to . We apply this abstract theorem to the process where is distributed according to with and . Since we have (3.19) and (3.4), the theorem follows. ∎
Remark 3.2**.**
The previous result relies only upon the variance inequality (3.4), which is much weaker than the Gaussian concentration bound of Theorem 3.1. On the one hand, the variance inequality (3.4) should be true for less regular potentials than the ones we consider here. On the other hand, the Gaussian concentration bound should provide a strengthening of Theorem 3.12, namely a speed of convergence.
4 Proof of Theorem 3.1
We follow the proof given in [7] with the appropriate modifications to go beyond Lipschitz potentials.
4.1 Some preparatory results
It is convenient to normalize the potential or, equivalently, the operator in the following way. We use the notations of Theorem 2.1. Let
[TABLE]
Thus
[TABLE]
Let denote the inverse of the Jacobian of , and the inverse of the Jacobian of , that is,
[TABLE]
(Of course .) Therefore we have
[TABLE]
Estimate (2.4) now takes the form
[TABLE]
for any . Finally, we will need the following distortion estimate. Let such that for and such that and . Then it is easy to check (see [18, Chapter 2]) that, for any ,
[TABLE]
for some constant depending only on .
We will use the following inequality relating the distances and .
Lemma 4.1**.**
Suppose that , , or
[TABLE]
Then there exists
[TABLE]
or, equivalently,
[TABLE]
for all .
Proof.
The statement is trivial when . If (4.5) holds, then there exists such that for all
[TABLE]
hence . Then the desired inequalities follow easily from the definitions. ∎
4.2 Proof of Theorem 3.1
Fix a separately -Lipschitz function . It is convenient to think of it as a function on depending only on the first coordinates, therefore for . We endow with the measure obtained as the limit when of the measure on given by . On , let be the -algebra of events depending only on the coordinates (this is a decreasing sequence of -fields). We want to write the function as a sum of reverse martingale differences with respect to this sequence. Therefore, let and . More precisely,
[TABLE]
The function is -measurable and . Moreover
[TABLE]
We then apply the Azuma-Hoeffding inequality (see e.g. [17, Page 68]) which says that
[TABLE]
Therefore, the point is to obtain a good bound on . This is the claim of the following lemma.
Lemma 4.2**.**
There exists , depending only on , such that for any one has
[TABLE]
Using this lemma and applying Young’s inequality for convolutions [3, p. 316] twice we obtain
[TABLE]
Remark 4.1**.**
If and are sequences of reals, their convolution is given by . Young’s inequality tells us that if , and with , then
[TABLE]
We used it twice with , and .
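Young's inequality for convolutions can be checked numerically on finitely supported sequences. The Python sketch below does this for arbitrary admissible exponents; the exponents used in the example ($p=1$, $q=2$, hence $r=2$) are an illustrative choice and are not claimed to be the ones used in the proof. All names are ours.

```python
import math
from typing import List, Sequence


def convolve(u: Sequence[float], v: Sequence[float]) -> List[float]:
    """Convolution (u * v)_n = sum over k of u_k * v_{n-k} of two finite sequences."""
    w = [0.0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            w[i + j] += a * b
    return w


def lp_norm(seq: Sequence[float], p: float) -> float:
    """The l^p norm of a finite sequence, with p = math.inf giving the sup norm."""
    if p == math.inf:
        return max(abs(s) for s in seq)
    return sum(abs(s) ** p for s in seq) ** (1.0 / p)


def young_exponent(p: float, q: float) -> float:
    """The exponent r determined by 1 + 1/r = 1/p + 1/q."""
    inv_r = 1.0 / p + 1.0 / q - 1.0
    if p < 1 or q < 1 or inv_r < -1e-12:
        raise ValueError("need p, q >= 1 and 1/p + 1/q >= 1")
    return math.inf if inv_r < 1e-12 else 1.0 / inv_r


def young_holds(u: Sequence[float], v: Sequence[float], p: float, q: float) -> bool:
    """Check ||u * v||_r <= ||u||_p * ||v||_q up to a small rounding tolerance."""
    r = young_exponent(p, q)
    return lp_norm(convolve(u, v), r) <= lp_norm(u, p) * lp_norm(v, q) + 1e-12


# Hypothetical finite sequences.
u_seq = [1.0, 0.5, 0.25]
v_seq = [3.0, -1.0, 2.0]
check = young_holds(u_seq, v_seq, p=1.0, q=2.0)
```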
Notice that by assumption and by Theorem 2.1 we have . Therefore, using (4.7) at a fixed index and then letting tend to infinity, we get by the dominated convergence theorem
[TABLE]
which is, in view of (4.6), exactly (3.1) with
[TABLE]
Now we are going to prove Lemma 4.2 by proving that is close to an integral quantity. This is the content of the following lemma which is the core of the proof.
Lemma 4.3**.**
There exists , depending only on , such that, for all ,
[TABLE]
where
[TABLE]
Proof of Lemma 4.2.
Applying Lemma 4.3 yields
[TABLE]
Averaging over the preimages of we get exactly , hence the previous bound holds for , proving the lemma. ∎
Proof of Lemma 4.3.
Let us fix a point in and decompose as
[TABLE]
For fixed , we can group together those points which have the same image under , splitting the sum as . Since the Jacobian is multiplicative, one has . Let us define two functions and as follows:
[TABLE]
Bearing in mind (4.2), we obtain
[TABLE]
Now we want to prove that to use (4.3). First observe that for any
[TABLE]
since and . Hence
[TABLE]
We now estimate the -Lipschitz norm of . We write
[TABLE]
where and are two points in the same partition element, and their respective preimages , are paired according to the cylinder of length they belong to. Using the distortion control (4.4) we have
[TABLE]
hence the first sum in (4.8) is bounded in absolute value by
[TABLE]
For the second sum, substituting successively each with , we have
[TABLE]
where we used Lemma 4.1 for the third inequality.
Summing over the different preimages of , we deduce that
[TABLE]
Therefore we can apply (4.3) to get
[TABLE]
Summing those bounds, one obtains
[TABLE]
Finally, when one computes the sum of the integrals of , there are again cancelations, leaving only . ∎
- [1] S. G. Bobkov and F. Götze. Exponential integrability and transportation cost related to logarithmic Sobolev inequalities. J. Funct. Anal., 163(1):1–28, 1999.
- [2] S. Boucheron, G. Lugosi, and P. Massart. Concentration inequalities. A nonasymptotic theory of independence. With a foreword by Michel Ledoux. Oxford University Press, Oxford, 2013.
- [3] P. Bullen. Dictionary of inequalities. Monographs and Research Notes in Mathematics. CRC Press, Boca Raton, FL, second edition, 2015.
- [4] J.-R. Chazottes. Fluctuations of observables in dynamical systems: from limit theorems to concentration inequalities. In Nonlinear dynamics new directions, volume 11 of Nonlinear Syst. Complex., pages 47–85. Springer, Cham, 2015.
- [5] J.-R. Chazottes, P. Collet, and F. Redig. On concentration inequalities and their applications for Gibbs measures in lattice systems. J. Stat. Phys., 169(3):504–546, 2017.
- [6] J.-R. Chazottes, P. Collet, and B. Schmitt. Statistical consequences of the Devroye inequality for processes. Applications to a class of non-uniformly hyperbolic dynamical systems. Nonlinearity, 18(5):2341–2364, 2005.
- [7] J.-R. Chazottes and S. Gouëzel. Optimal concentration inequalities for dynamical systems. Comm. Math. Phys., 316(3):843–889, 2012.
- [8] J.-R. Chazottes and C. Maldonado. Concentration bounds for entropy estimation of one-dimensional Gibbs measures. Nonlinearity, 24(8):2371–2381, 2011.
