Gaussian Concentration bound for potentials satisfying Walters condition with subexponential continuity rates

J.-R. Chazottes; J. Moles; E. Ugalde

arXiv:1902.07146·math.DS·February 19, 2020


TL;DR

This paper proves a Gaussian concentration bound for equilibrium states of a class of potentials satisfying Walters' condition, leading to new results on fluctuations, convergence rates, and an almost-sure central limit theorem in symbolic dynamics.

Contribution

It establishes a Gaussian concentration inequality for a class of potentials with subexponential variation decay, with a constant that is independent of the number of variables $n$ and of the separately Lipschitz function $K$ involved.

Findings

01 Bound on fluctuations of empirical frequencies

02 Speed of convergence of empirical measures

03 Almost-sure central limit theorem


Equations

$$\lim_{n\to\infty}\mu_{\phi}\left\{x:\frac{S_{n}f(x)-n\int f\,{\mathrm{d}}\mu_{\phi}}{\sqrt{n}}\leq u\right\}=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{u}\operatorname{e}^{-\frac{\xi^{2}}{2\sigma^{2}}}\,{\mathrm{d}}\xi \tag{1.1}$$

$$\mu_{\phi}\left\{x:\frac{1}{n}S_{n}f(x)\geq\int f\,{\mathrm{d}}\mu_{\phi}+u\right\}\asymp\operatorname{e}^{-nI\left(u+\int f\,{\mathrm{d}}\mu_{\phi}\right)} \tag{1.2}$$

$$\left|K(x_{0},\ldots,x_{i},\ldots,x_{n-1})-K(x_{0},\ldots,x^{\prime}_{i},\ldots,x_{n-1})\right|\leq\mathrm{Lip}_{i}(K)\,d(x_{i},x^{\prime}_{i}).$$

$$\int\exp\big(K(x,Tx,\ldots,T^{n-1}x)\big)\,{\mathrm{d}}\mu_{\phi}(x)\leq\exp\Big(\int K(x,Tx,\ldots,T^{n-1}x)\,{\mathrm{d}}\mu_{\phi}(x)\Big)\exp\Big(C\sum_{j=0}^{n-1}\mathrm{Lip}_{j}(K)^{2}\Big). \tag{1.3}$$

$$\mu_{\phi}\Big\{x:K(x,Tx,\ldots,T^{n-1}x)-\int K(y,Ty,\ldots,T^{n-1}y)\,{\mathrm{d}}\mu_{\phi}(y)\geq u\Big\}\leq\exp\bigg(-\frac{u^{2}}{4C\sum_{i=0}^{n-1}\mathrm{Lip}_{i}(K)^{2}}\bigg). \tag{1.4}$$

$$d_{\theta}(x,y)=\theta^{\inf\{k:\,x^{k}\neq y^{k}\}} \tag{2.1}$$

$$\mathrm{var}_{n}(\phi):=\sup\{|\phi(x)-\phi(y)|:x^{i}=y^{i},\,0\leq i\leq n-1\}\xrightarrow[n\to\infty]{}0.$$

$$W(\phi,x,y)=\sup_{n\in\mathds{N}}\,\sup_{a\in A^{n}}\left|S_{n}\phi(ax)-S_{n}\phi(ay)\right|.$$

$$\sup_{x,y\in\Omega}W(\phi,x,y)\leq W(\phi).$$

$$W_{p}(\phi):=\sup\{W(\phi,x,y):x^{i}=y^{i},\,0\leq i\leq p-1\}.$$

$$\mathrm{var}_{p+1}(\phi)\leq W_{p}(\phi)\leq\sum_{k=p+1}^{\infty}\mathrm{var}_{k}(\phi),\quad p\in\mathds{N}. \tag{2.3}$$

$$P_{\phi}f(x)=\sum_{Ty=x}f(y)\operatorname{e}^{\phi(y)}.$$

$$d_{\phi}(x,y)=W_{p}(\phi)\quad\text{if}\quad d_{\theta}(x,y)=\theta^{p}$$

$$\mathcal{L}_{\phi}=\{f\in\mathscr{C}(\Omega):\exists M>0\ \text{such that}\ \mathrm{var}_{n}(f)\leq MW_{n}(\phi),\ n=1,2,\ldots\}$$

$$\mathrm{Lip}_{\phi}(f)=\sup\left\{\frac{|f(x)-f(y)|}{d_{\phi}(x,y)}:x\neq y\right\}=\sup\left\{\frac{\mathrm{var}_{n}(f)}{W_{n}(\phi)}:n\in\mathds{N}\right\}.$$

$$\|f\|_{\mathcal{L}_{\phi}}=\|f\|_{\infty}+\mathrm{Lip}_{\phi}(f).$$

$$\mathcal{L}_{\theta}=\{f\in\mathscr{C}(\Omega):\exists M>0\ \text{such that}\ \mathrm{var}_{n}(f)\leq M\theta^{n},\ n=1,2,\ldots\}$$

$$\mathrm{Lip}_{\theta}(f)=\sup\left\{\frac{|f(x)-f(y)|}{d_{\theta}(x,y)}:x\neq y\right\}=\sup\left\{\frac{\mathrm{var}_{n}(f)}{\theta^{n}}:n\in\mathds{N}\right\}.$$

$$\left\|\frac{P_{\phi}^{n}f}{\lambda_{\phi}^{n}}-h_{\phi}\int f\,{\mathrm{d}}\nu_{\phi}\right\|_{\infty}\leq C_{(2.1)}\,\epsilon_{n}\|f\|_{\mathcal{L}_{\phi}},\quad\forall n\in\mathds{N}.$$

$$\phi(x)=-\sum_{n\geq 2}\frac{x^{0}x^{n-1}}{n^{p}}.$$

$$\phi(x)=\begin{cases}v_{k}&\text{if}\ x\in[0^{k}1]\\0&\text{if}\ x=(0,0,\ldots)\\0&\text{otherwise}.\end{cases}$$

$$\left|K(x_{0},\ldots,x_{i},\ldots,x_{n-1})-K(x_{0},\ldots,x^{\prime}_{i},\ldots,x_{n-1})\right|\leq\mathrm{Lip}_{\theta,i}(K)\,d_{\theta}(x_{i},x^{\prime}_{i}).$$

$$\int\exp\big(K(x,Tx,\ldots,T^{n-1}x)\big)\,{\mathrm{d}}\mu_{\phi}(x)\leq\exp\Big(\int K(x,Tx,\ldots,T^{n-1}x)\,{\mathrm{d}}\mu_{\phi}(x)\Big)\exp\Big(C_{(3.1)}\sum_{j=0}^{n-1}\mathrm{Lip}_{\theta,j}(K)^{2}\Big). \tag{3.1}$$

$$\mathrm{var}_{n}(\phi)=O\Big(\frac{1}{n^{\alpha}}\Big)$$

$$\mu_{\phi}\Big\{x\in\Omega:K(x,Tx,\ldots,T^{n-1}x)-\int K(y,Ty,\ldots,T^{n-1}y)\,{\mathrm{d}}\mu_{\phi}(y)\geq u\Big\}\leq\exp\bigg(-\frac{u^{2}}{4C_{(3.1)}\sum_{i=0}^{n-1}\mathrm{Lip}_{\theta,i}(K)^{2}}\bigg) \tag{3.2}$$

$$\mu_{\phi}\Big\{x\in\Omega:\Big|K(x,Tx,\ldots,T^{n-1}x)-\int K(y,Ty,\ldots,T^{n-1}y)\,{\mathrm{d}}\mu_{\phi}(y)\Big|\geq u\Big\}\leq 2\exp\bigg(-\frac{u^{2}}{4C_{(3.1)}\sum_{i=0}^{n-1}\mathrm{Lip}_{\theta,i}(K)^{2}}\bigg). \tag{3.3}$$

$$Y=K(x,Tx,\ldots,T^{n-1}x)-\int K(y,Ty,\ldots,T^{n-1}y)\,{\mathrm{d}}\mu_{\phi}(y).$$

$$\int\Big(K(x,Tx,\ldots,T^{n-1}x)-\int K(y,Ty,\ldots,T^{n-1}y)\,{\mathrm{d}}\mu_{\phi}(y)\Big)^{2}\,{\mathrm{d}}\mu_{\phi}(x)$$


Full text

Gaussian Concentration bound for potentials satisfying Walters condition with subexponential continuity rates

J.-R. Chazottes. Centre de Physique Théorique, Ecole Polytechnique, CNRS, 91128 Palaiseau, France. JRC benefited from an ECOS Nord project for his stay at San Luis Potosí.

J. Moles. Instituto de Física, Universidad Autónoma de San Luis Potosí, S.L.P., 78290 México. JM benefited from an ECOS Nord project for his stay at Ecole Polytechnique.

E. Ugalde. Instituto de Física, Universidad Autónoma de San Luis Potosí, S.L.P., 78290 México. EU acknowledges Ecole Polytechnique for financial support (one-month stay).

Abstract

We consider the full shift $T:\Omega\to\Omega$ where $\Omega=A^{\mathds{N}}$ , $A$ being a finite alphabet. For a class of potentials which contains in particular potentials $\phi$ with variation decreasing like $O(n^{-\alpha})$ for some $\alpha>2$ , we prove that their corresponding equilibrium state $\mu_{\phi}$ satisfies a Gaussian concentration bound. Namely, we prove that there exists a constant $C>0$ such that, for all $n$ and for all separately Lipschitz functions $K(x_{0},\ldots,x_{n-1})$ , the exponential moment of $K(x,\ldots,T^{n-1}x)-\int K(y,\ldots,T^{n-1}y)\,{\mathrm{d}}\mu_{\phi}(y)$ is bounded by $\exp\big{(}C\sum_{i=0}^{n-1}\mathrm{Lip}_{i}(K)^{2}\big{)}$ . The crucial point is that $C$ is independent of $n$ and $K$ . We then derive various consequences of this inequality. For instance, we obtain bounds on the fluctuations of the empirical frequency of blocks, the speed of convergence of the empirical measure, and speed of Markov approximation of $\mu_{\phi}$ . We also derive an almost-sure central limit theorem.

Keywords: concentration inequalities, empirical measure, Kantorovich distance, Wasserstein distance, d-bar distance, relative entropy, Markov approximation, almost-sure central limit theorem.

1 Introduction
2 Setting and preliminary results
3 Main result and applications
3.1 Gaussian concentration bound
3.2 Related works
3.3 Applications
3.3.1 Birkhoff sums
3.3.2 Empirical frequency of blocks
3.3.3 Hitting times and entropy
3.3.4 Speed of convergence of the empirical measure
3.3.5 Relative entropy, $\bar{d}$ -distance and speed of Markov approximation
3.3.6 Shadowing of orbits
3.3.7 Almost-sure central limit theorem
4 Proof of Theorem 3.1
4.1 Some preparatory results
4.2 Proof of Theorem 3.1

1 Introduction

We consider the full shift $T:\Omega\to\Omega$ where $\Omega=A^{\mathds{N}}$ , $A$ being a finite alphabet. Given an ergodic measure $\mu$ on $\Omega$ and a continuous observable $f:\Omega\to\mathds{R}$ , we know by Birkhoff’s ergodic theorem that $n^{-1}S_{n}f(x)$ converges, for $\mu$ -almost every $x$ , to $\int f{\mathrm{d}}\mu$ . (We use the standard notation $S_{n}f=f+f\circ T+\cdots+f\circ T^{n-1}$ .) To refine this result, we need more assumptions on $\mu$ and $f$ . For instance, if $\mu=\mu_{\phi}$ is the equilibrium state for a Lipschitz potential $\phi:\Omega\to\mathds{R}$ and $f$ is also a Lipschitz function, then the following central limit theorem holds:

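$$\lim_{n\to\infty}\mu_{\phi}\left\{x:\frac{S_{n}f(x)-n\int f\,{\mathrm{d}}\mu_{\phi}}{\sqrt{n}}\leq u\right\}=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{u}\operatorname{e}^{-\frac{\xi^{2}}{2\sigma^{2}}}\,{\mathrm{d}}\xi \tag{1.1}$$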

for all $u\in\mathds{R}$, where $\sigma^{2}=\sigma_{f}^{2}$ is the variance of the process $\{f(T^{n}x)\}_{n\geq 0}$ where $x$ is distributed according to $\mu_{\phi}$. (Footnote 1: $\sigma_{f}^{2}=\int f^{2}{\mathrm{d}}\mu_{\phi}-\Big(\int f{\mathrm{d}}\mu_{\phi}\Big)^{2}+2\sum_{\ell\geq 1}\left(\int f\cdot f\circ T^{\ell}{\mathrm{d}}\mu_{\phi}-\Big(\int f{\mathrm{d}}\mu_{\phi}\Big)^{2}\right)$.) This result says in essence that the fluctuations of $S_{n}f(x)-n\int f{\mathrm{d}}\mu_{\phi}$ are, with high probability, of order $\sqrt{n}$ as $n\to\infty$. Fluctuations of order $n$, referred to as ‘large deviations’, are unlikely to appear. Indeed, for instance, one has

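$$\mu_{\phi}\left\{x:\frac{1}{n}S_{n}f(x)\geq\int f\,{\mathrm{d}}\mu_{\phi}+u\right\}\asymp\operatorname{e}^{-nI\left(u+\int f\,{\mathrm{d}}\mu_{\phi}\right)} \tag{1.2}$$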

where $u\geq 0$ and $I(u)\geq 0$ is the so-called ‘rate function’, which is (strictly) convex, such that $I(\int f{\mathrm{d}}\mu_{\phi})=0$, and equal to $+\infty$ outside a certain finite interval $(\underline{u}_{f},\bar{u}_{f})$. (Footnote 2: for two positive sequences $(a_{n}),(b_{n})$, $a_{n}\asymp b_{n}$ means that $\lim_{n}(1/n)\log a_{n}=\lim_{n}(1/n)\log b_{n}$.) Of course, both the central limit theorem and the large deviation asymptotics have been obtained for more general potentials, and for more general ‘chaotic’ dynamical systems. For a fairly recent review on probabilistic properties of nonuniformly hyperbolic dynamical systems modeled by Young towers, we refer to [4].

In this paper, we are interested in concentration inequalities which describe the fluctuations of observables of the form $K(x,Tx,\ldots,T^{n-1}x)$ around their average. The only restriction on $K$ is that it has to be separately Lipschitz. By this we mean that, for all $i=0,\ldots,n-1$ , there exists a constant $\mathrm{Lip}_{i}(K)$ with

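$$\left|K(x_{0},\ldots,x_{i},\ldots,x_{n-1})-K(x_{0},\ldots,x^{\prime}_{i},\ldots,x_{n-1})\right|\leq\mathrm{Lip}_{i}(K)\,d(x_{i},x^{\prime}_{i})$$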

for all points $x_{0},\dots,x_{i},\dots,x_{n-1},x^{\prime}_{i}$ in $\Omega$, where $d$ is the usual distance on $\Omega$ (see (2.1)). So $K$ can be nonlinear and implicitly defined. Of course, such a class contains partial sums of Lipschitz functions, namely functions of the form $K(x_{0},\ldots,x_{n-1})=f(x_{0})+\cdots+f(x_{n-1})$, for which $\mathrm{Lip}_{i}(K)=\mathrm{Lip}(f)$ for all $i$. Besides applying to very general observables, the other essential characteristic of concentration inequalities is that they are valid for all $n$, in contrast to the above two results, which are valid only in the limit $n\to\infty$. More precisely, we shall prove the following ‘Gaussian concentration bound’. There exists a constant $C$ such that, for all $n$ and for all separately Lipschitz functions $K(x_{0},\ldots,x_{n-1})$, we have

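$$\int\exp\big(K(x,Tx,\ldots,T^{n-1}x)\big)\,{\mathrm{d}}\mu_{\phi}(x)\leq\exp\Big(\int K(x,Tx,\ldots,T^{n-1}x)\,{\mathrm{d}}\mu_{\phi}(x)\Big)\exp\Big(C\sum_{j=0}^{n-1}\mathrm{Lip}_{j}(K)^{2}\Big). \tag{1.3}$$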

The crucial point is that $C$ is independent of $n$ and $K$ . By a standard argument (see below), the previous inequality implies that for all $u>0$

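$$\mu_{\phi}\Big\{x:K(x,Tx,\ldots,T^{n-1}x)-\int K(y,Ty,\ldots,T^{n-1}y)\,{\mathrm{d}}\mu_{\phi}(y)\geq u\Big\}\leq\exp\bigg(-\frac{u^{2}}{4C\sum_{i=0}^{n-1}\mathrm{Lip}_{i}(K)^{2}}\bigg). \tag{1.4}$$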
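To make the roles of $u$, $C$ and the Lipschitz constants concrete, here is a minimal Python sketch that evaluates a deviation bound of the form $\exp\big(-u^{2}/(4C\sum_{i}\mathrm{Lip}_{i}(K)^{2})\big)$. The function name `gaussian_tail_bound` and the numerical value passed for `C` are illustrative assumptions; the paper gives no explicit value for $C$.

```python
import math

def gaussian_tail_bound(u, lipschitz_constants, C):
    """Evaluate exp(-u^2 / (4 C sum_i Lip_i^2)) for a deviation u > 0.

    `C` stands for the (unknown) concentration constant; any value used
    here is a placeholder chosen for illustration only.
    """
    total = sum(lip * lip for lip in lipschitz_constants)
    return math.exp(-u * u / (4.0 * C * total))

# Birkhoff-sum-like case: n identical Lipschitz constants (hypothetical values).
n = 4
bound = gaussian_tail_bound(u=2.0, lipschitz_constants=[1.0] * n, C=0.25)
```

The bound depends on $K$ only through the sum of its squared Lipschitz constants, which is the sense in which $C$ is independent of $n$ and $K$.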

The Gaussian concentration bound (1.3) is known for Lipschitz potentials [7]. We shall prove that it remains true for a large subclass of potentials $\phi$ satisfying Walters’ condition. For instance, the bound holds for a potential whose variation is $O(n^{-\alpha})$ for some $\alpha>2$. The proof of our result relies on two main ingredients. First, we start with a classical decomposition of $K-\int K$ as a telescopic sum of martingale differences. Second, we perform a further telescoping in order to use Ruelle’s Perron-Frobenius operator. Unlike in [7] (the case of Lipschitz potentials), we no longer have a spectral gap. Instead, we use a result of V. Maume-Deschamps [18] based on Birkhoff cones.

We apply the Gaussian concentration bound and its consequences, like (1.4), to various observables. On the one hand, we obtain concentration bounds for previously studied observables. We get the same bounds, but they are no longer limited to equilibrium states with Lipschitz potentials. On the other hand, we consider observables not considered before. Even when $K(x,\ldots,T^{n-1}x)=S_{n}f(x)$, we get a non-trivial bound. We then obtain a control on the fluctuations of the empirical frequency of blocks $a^{0},\ldots,a^{k-1}$ around $\mu([a^{0},\ldots,a^{k-1}])$, uniformly in $a^{0},\ldots,a^{k-1}\in A^{k}$. We then consider an estimator of the entropy of $\mu_{\phi}$ based on hitting times. The next application is about the speed of convergence of the empirical measure $(1/n)\sum_{i=0}^{n-1}\delta_{T^{i}x}$ towards $\mu_{\phi}$ in Wasserstein distance. Then we obtain an upper bound for the $\bar{d}$-distance between any shift-invariant probability measure and $\mu_{\phi}$. This distance is bounded by the square root of their relative entropy, times a constant. A consequence of this inequality is a bound for the speed of convergence of the Markov approximation of $\mu_{\phi}$ in $\bar{d}$-distance. Then we quantify the ‘shadowing’ of an orbit by another one which has to start in a subset of $\Omega$ with $\mu_{\phi}$-measure $1/3$, say. Finally, we prove an almost-sure version of the central limit theorem. This application shows in particular that concentration inequalities can also be used to obtain limit theorems.

2 Setting and preliminary results

Let $\Omega=A^{\mathds{N}}$ where $A$ is a finite set. We denote by $x=x^{0}x^{1}\dots$ the elements of $\Omega$ (hence $x^{i}\in A$ ), and by $T$ the shift map: $(Tx)^{k}=x^{k+1}$ , $k\in\mathds{N}$ . (We use upper indices instead of lower indices because we will need to consider bunches of points in $\Omega$ , e.g., $x_{0},x_{1},\dots,x_{p}$ , $x_{i}\in\Omega$ .) We use the classical distance

$$d_{\theta}(x,y)=\theta^{\inf\{k:\,x^{k}\neq y^{k}\}} \tag{2.1}$$

where $\theta\in(0,1)$ is some fixed number. Probability measures are defined on the Borel sigma-algebra of $\Omega$ which is generated by cylinder sets. Let $\phi:\Omega\to\mathds{R}$ be a continuous potential, which means that

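$$\mathrm{var}_{n}(\phi):=\sup\{|\phi(x)-\phi(y)|:x^{i}=y^{i},\,0\leq i\leq n-1\}\xrightarrow[n\to\infty]{}0.$$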

The sequence $(\mathrm{var}_{n}(\phi))_{n\geq 1}$ is the modulus of continuity of $\phi$ and it is called the ‘variation’ of $\phi$ in our context. By the way, we denote by $\mathscr{C}(\Omega)$ the Banach space of real-valued continuous functions on $\Omega$ equipped with the supremum norm $\|\cdot\|_{\infty}$ . We put further restrictions on $\phi$ , namely that it must satisfy the Walters condition [22]. For $x,y$ in $\Omega$ let

$$W(\phi,x,y)=\sup_{n\in\mathds{N}}\,\sup_{a\in A^{n}}\left|S_{n}\phi(ax)-S_{n}\phi(ay)\right|.$$

We assume that $W(\phi,x,y)$ exists and that there exists $W(\phi)>0$ such that

$$\sup_{x,y\in\Omega}W(\phi,x,y)\leq W(\phi).$$

Now for $p\in\mathds{N}$ let

$$W_{p}(\phi):=\sup\{W(\phi,x,y):x^{i}=y^{i},\,0\leq i\leq p-1\}.$$

Definition 2.1.

$\phi$ is said to satisfy Walters’ condition if $(W_{p}(\phi))_{p\in\mathds{N}}$ is a strictly positive sequence and decreases to $0$ as $p\to\infty$.

We now make several remarks on Walters’ condition. First, observe that locally constant potentials do not satisfy this condition because $W_{p}(\phi)=0$ for all $p$ larger than some $p_{0}$ . But one can in fact work with any strictly positive sequence $(\widetilde{W}_{p}(\phi))_{p\in\mathds{N}}$ decreasing to zero such that $W_{p}(\phi)\leq\widetilde{W}_{p}(\phi)$ for all $p$ , e.g., $\max(W_{p}(\phi),\eta^{p})$ for some fixed $\eta\in(0,1)$ . Second, one easily checks that

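$$\mathrm{var}_{p+1}(\phi)\leq W_{p}(\phi)\leq\sum_{k=p+1}^{\infty}\mathrm{var}_{k}(\phi),\quad p\in\mathds{N}. \tag{2.3}$$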

Hence the set of potentials satisfying Walters’ condition contains the set of potentials with summable variation. In particular, $(W_{p}(\phi))_{p}$ is bounded above by a geometric sequence if and only if $(\mathrm{var}_{p}(\phi))_{p}$ is also bounded above by a geometric sequence. This corresponds to the case of Lipschitz or Hölder potentials (with respect to $d_{\theta}$ ).
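The distance $d_{\theta}$ and the variation $\mathrm{var}_{n}$ can be explored numerically on finite words. The following Python sketch is only an illustration under stated assumptions: finite tuples stand in for points of $\Omega$, the function `phi` is a hypothetical potential that reads only its first `depth` symbols, and `variation` computes $\mathrm{var}_{n}$ by brute force over all such words.

```python
from itertools import product

def d_theta(x, y, theta=0.5):
    """theta**k, where k is the first index at which the finite words x and y differ.

    Returns 0.0 when no difference is found in the compared prefix.
    """
    for k, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return theta ** k
    return 0.0

def variation(phi, n, alphabet, depth):
    """var_n(phi) for a phi depending only on the first `depth` symbols.

    Fixes the first n symbols (the words agree there) and takes the
    largest oscillation of phi over the remaining free symbols.
    """
    shared = min(n, depth)
    worst = 0.0
    for prefix in product(alphabet, repeat=shared):
        values = [phi(prefix + tail)
                  for tail in product(alphabet, repeat=depth - shared)]
        worst = max(worst, max(values) - min(values))
    return worst

def phi(word):
    """Hypothetical potential: symbol k contributes with weight 2**-(k+1)."""
    return sum(symbol * 2.0 ** -(k + 1) for k, symbol in enumerate(word))

var_2 = variation(phi, 2, (0, 1), depth=6)
```

For this particular `phi` the variation decays geometrically, which corresponds to the Lipschitz case; a potential with weights decaying polynomially in `k` would give the subexponential regime.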

Now define Ruelle’s Perron-Frobenius operator $P_{\phi}:\mathscr{C}(\Omega)\to\mathscr{C}(\Omega)$ as

$$P_{\phi}f(x)=\sum_{Ty=x}f(y)\operatorname{e}^{\phi(y)}.$$
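On the full shift the preimages of $x$ under $T$ are the points $ax$ with $a\in A$, so the operator sums over one prepended symbol. Here is a minimal Python sketch of this sum acting on functions of finite words. The helper name `transfer` and the potential `phi` (which reads only the first symbol) are assumptions made for illustration.

```python
import math

def transfer(phi, f, alphabet):
    """Return the function P_phi f, with (P_phi f)(x) = sum over a of f(ax) * exp(phi(ax))."""
    def Pf(x):
        x = tuple(x)
        return sum(f((a,) + x) * math.exp(phi((a,) + x)) for a in alphabet)
    return Pf

alphabet = (0, 1)

def phi(word):
    """Hypothetical normalized potential: exp(phi) is 1/3 on [0] and 2/3 on [1]."""
    return math.log(1.0 / 3.0) if word[0] == 0 else math.log(2.0 / 3.0)

def one(word):
    """The constant function 1."""
    return 1.0

P_one = transfer(phi, one, alphabet)
P2_one = transfer(phi, P_one, alphabet)
```

For this choice of `phi` the weights sum to one over the preimages, so the constant function is fixed by the operator. A general potential satisfying Walters' condition need not be normalized in this way.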

The next step is to define a function space preserved by $P_{\phi}$ and on which it has good spectral properties. We take the space of Lipschitz functions with respect to a new distance $d_{\phi}$ built out of $\phi$ as follows.

Definition 2.2 (The distance $d_{\phi}$ ).

For $x,y\in\Omega$ let

$$d_{\phi}(x,y)=W_{p}(\phi)\quad\text{if}\quad d_{\theta}(x,y)=\theta^{p}$$

and $d_{\phi}(x,x)=0$ .

Now define

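$$\mathcal{L}_{\phi}=\{f\in\mathscr{C}(\Omega):\exists M>0\ \text{such that}\ \mathrm{var}_{n}(f)\leq MW_{n}(\phi),\ n=1,2,\ldots\}$$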

and

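$$\mathrm{Lip}_{\phi}(f)=\sup\left\{\frac{|f(x)-f(y)|}{d_{\phi}(x,y)}:x\neq y\right\}=\sup\left\{\frac{\mathrm{var}_{n}(f)}{W_{n}(\phi)}:n\in\mathds{N}\right\}.$$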

One can then define a norm on $\mathcal{L}_{\phi}$ , making it a Banach space, by setting

$$\|f\|_{\mathcal{L}_{\phi}}=\|f\|_{\infty}+\mathrm{Lip}_{\phi}(f).$$

Remark 2.1.

The usual Banach space of Lipschitz functions is defined as follows. Let

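$$\mathcal{L}_{\theta}=\{f\in\mathscr{C}(\Omega):\exists M>0\ \text{such that}\ \mathrm{var}_{n}(f)\leq M\theta^{n},\ n=1,2,\ldots\}$$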

and

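$$\mathrm{Lip}_{\theta}(f)=\sup\left\{\frac{|f(x)-f(y)|}{d_{\theta}(x,y)}:x\neq y\right\}=\sup\left\{\frac{\mathrm{var}_{n}(f)}{\theta^{n}}:n\in\mathds{N}\right\}.$$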

The canonical norm making $\mathcal{L}_{\theta}$ a Banach space is $\|f\|_{\mathcal{L}_{\theta}}=\|f\|_{\infty}+\mathrm{Lip}_{\theta}(f)$ .

In view of (2.3), if we have $W_{n}(\phi)=O(\theta^{n})$ , then $\mathcal{L}_{\theta}=\mathcal{L}_{\phi}$ . If we now have, for instance, $W_{n}(\phi)=O(n^{-q})$ for some $q>0$ , then we get a bigger space which contains in particular all functions $f$ such that $\mathrm{var}_{n}(f)=O(n^{-r})$ with $r\geq q$ .

The following result is instrumental to this article. In brief, it tells us that a potential $\phi$ satisfying Walters’ condition has a unique equilibrium state, which will be denoted by $\mu_{\phi}$ , and gives a speed of convergence for the properly normalized iterates of the associated Ruelle’s Perron-Frobenius operator. The first part of the theorem is due to Walters, while the second one is due to Maume-Deschamps and can be found in her PhD thesis [18, Chapter I.2]. Unfortunately, her result was not published even though it is much sharper than the result in [16].

Theorem 2.1 ([22], [18]).

Let $\phi:\Omega\to\mathds{R}$ satisfy Walters’ condition as above. Then the following holds.

A.

There exists a unique triplet $(h_{\phi},\lambda_{\phi},\nu_{\phi})$ such that $h_{\phi}\in\mathcal{L}_{\phi}$ is strictly positive, $\|\log h_{\phi}\|_{\infty}<\infty$, $\lambda_{\phi}>0$, and $\nu_{\phi}$ is a fully supported probability measure such that $\int h_{\phi}\,{\mathrm{d}}\nu_{\phi}=1$. Moreover, $P_{\phi}h_{\phi}=\lambda_{\phi}h_{\phi}$ and $P_{\phi}^{*}\nu_{\phi}=\lambda_{\phi}\nu_{\phi}$, and $\phi$ has a unique equilibrium state $\mu_{\phi}=h_{\phi}\nu_{\phi}$ which is mixing. (Footnote 3: this means that for any pair of cylinders $B,B^{\prime}$, $\lim_{n\to\infty}\mu_{\phi}(B\cap T^{-n}B^{\prime})=\mu_{\phi}(B)\mu_{\phi}(B^{\prime})$. In particular $\mu_{\phi}$ is ergodic.)

B.

There exists a positive sequence $(\epsilon_{n})_{n\in\mathds{N}}$ converging to zero, such that, for any $f\in\mathcal{L}_{\phi}$ ,

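$$\left\|\frac{P_{\phi}^{n}f}{\lambda_{\phi}^{n}}-h_{\phi}\int f\,{\mathrm{d}}\nu_{\phi}\right\|_{\infty}\leq C_{(2.1)}\,\epsilon_{n}\|f\|_{\mathcal{L}_{\phi}},\quad\forall n\in\mathds{N}.$$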

Moreover, one has the following behaviors:

1. If $W_{n}(\phi)=O(\eta^{n})$ for some $\eta\in(0,1)$, then there exists $\eta^{\prime}\in(0,1)$ such that $\epsilon_{n}=O({\eta^{\prime}}^{n})$.

2. If $W_{n}(\phi)=O(n^{-\alpha})$ for some $\alpha>0$, then $\epsilon_{n}=O(n^{-\alpha})$.

3. If $W_{n}(\phi)=O(\theta^{(\log n)^{\alpha}})$ for some $\theta\in(0,1)$ and $\alpha>1$, then, for any $\epsilon>0$, $\epsilon_{n}=O(\theta^{(\log n)^{\alpha-\epsilon}})$.

4. If $W_{n}(\phi)=O(\operatorname{e}^{-cn^{\alpha}})$ for some $c>0$ and $\alpha\in(0,1)$, then there exists $c^{\prime}>0$ such that $\epsilon_{n}=O\big(\operatorname{e}^{-c^{\prime}n^{\frac{\alpha}{\alpha+1}}}\big)$.

The fact that $\mu_{\phi}$ is an equilibrium state means that it maximizes the functional $\nu\mapsto h(\nu)+\int\phi\,{\mathrm{d}}\nu$ over the set of shift-invariant probability measures on $\Omega$, where $h(\nu)$ is the entropy of $\nu$. The maximum is equal to the topological pressure $P(\phi)$ of $\phi$ (see e.g. [15]), and we have $P(\phi)=\log\lambda_{\phi}$.

Let us give examples of potentials. First consider $A=\{-1,1\}$ and $p>1$ , and define

$$\phi(x)=-\sum_{n\geq 2}\frac{x^{0}x^{n-1}}{n^{p}}.$$

One can check that $W_{n}(\phi)=O(n^{-p+2})$. This is the analog of the so-called long-range Ising model on $\mathds{N}$. Let us now take $A=\{0,1\}$ and let $[0^{k}1]=\{x\in\Omega:x^{i}=0,\ 0\leq i\leq k-1,\ \text{and}\ x^{k}=1\}$. Let $(v_{n})$ be a monotone decreasing sequence of real numbers converging to $0$ and define

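$$\phi(x)=\begin{cases}v_{k}&\text{if}\ x\in[0^{k}1]\\0&\text{if}\ x=(0,0,\ldots)\\0&\text{otherwise}.\end{cases}$$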

One can check that $\mathrm{var}_{n}(\phi)=v_{n}$ . This example is taken from [19].

Remark 2.2.

Let us briefly explain how we can interpret an equilibrium state for a non Lipschitz potential as an absolutely continuous invariant measure of a piecewise expanding map of the unit interval with a Markov partition. It is well-known that a uniformly expanding map $S$ of the unit interval with a finite Markov partition which is piecewise $C^{1+\eta}$ , for some $\eta>0$ , can be coded by a subshift of finite type $(\Omega,T)$ over a finite alphabet. Then, $-\log|S^{\prime}|$ induces a potential $\phi$ on $\Omega$ which is Lipschitz (with respect to $d_{\theta}$ ). The pullback of $\mu_{\phi}$ is then the unique absolutely continuous invariant probability measure for $S$ . In [10], the authors showed that, given $\phi$ which is not Lipschitz, one can construct a uniformly expanding map of the unit interval with a finite Markov partition which is piecewise $C^{1}$ , but not piecewise $C^{1+\eta}$ for any $\eta>0$ , and such that the pullback of $\mu_{\phi}$ is the Lebesgue measure.

3 Main result and applications

3.1 Gaussian concentration bound

We can now state our main theorem, whose proof is deferred to Section 4. We start with the definition of separately $d_{\theta}$-Lipschitz functions.

Definition 3.1.

A function $K:\Omega^{n}\to\mathds{R}$ is said to be separately $d_{\theta}$ -Lipschitz if, for all $i$ , there exists a constant $\mathrm{Lip}_{\theta,i}(K)$ with

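$$\left|K(x_{0},\ldots,x_{i},\ldots,x_{n-1})-K(x_{0},\ldots,x^{\prime}_{i},\ldots,x_{n-1})\right|\leq\mathrm{Lip}_{\theta,i}(K)\,d_{\theta}(x_{i},x^{\prime}_{i})$$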

for all points $x_{0},\dots,x_{i},\dots,x_{n-1},x^{\prime}_{i}$ in $\Omega$ .

Theorem 3.1.

Suppose that $\phi$ satisfies one of the following conditions:

1. $W_{n}(\phi)=O(\theta^{n})$ (that is, $\phi$ is $d_{\theta}$-Lipschitz);

2. $W_{n}(\phi)=O(n^{-\alpha})$ for some $\alpha>1$;

3. $W_{n}(\phi)=O(\theta^{(\log n)^{\alpha}})$ for some $\theta\in(0,1)$ and $\alpha>1$;

4. $W_{n}(\phi)=O(\operatorname{e}^{-cn^{\alpha}})$ for some $c>0$ and $\alpha\in(0,1)$.

Then the process $(x,Tx,\ldots)$, with $x$ distributed according to $\mu_{\phi}$, satisfies the following Gaussian concentration bound. There exists $C_{(3.1)}>0$ such that for any $n\in\mathds{N}$ and for any separately $d_{\theta}$-Lipschitz function $K:\Omega^{n}\to\mathds{R}$, we have

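$$\int\exp\big(K(x,Tx,\ldots,T^{n-1}x)\big)\,{\mathrm{d}}\mu_{\phi}(x)\leq\exp\Big(\int K(x,Tx,\ldots,T^{n-1}x)\,{\mathrm{d}}\mu_{\phi}(x)\Big)\exp\Big(C_{(3.1)}\sum_{j=0}^{n-1}\mathrm{Lip}_{\theta,j}(K)^{2}\Big). \tag{3.1}$$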

Three remarks are in order. First, we conjecture that this theorem is valid under the condition $\sum_{n}\mathrm{var}_{n}(\phi)<\infty$. Second, it would be useful to have an explicit formula for $C_{(3.1)}$ in (3.1). Unfortunately, this constant is proportional to $C_{(2.1)}$ (see Theorem 2.1), which is cumbersome since it involves the eigendata of $P_{\phi}$. Third, for the sake of simplicity, we considered the full shift $A^{\mathds{N}}$. In fact, our results remain true if $\Omega\subset A^{\mathds{N}}$ is a topologically mixing one-sided subshift of finite type. Moreover, one can extend Theorem 3.1 to bilateral subshifts of finite type by a trick used in [7].

We now give some corollaries of our main theorem that will be used in the section on applications. First, by (2.3) we immediately obtain the following corollary.

Corollary 3.2.

If there exists $\alpha>2$ such that

$$\mathrm{var}_{n}(\phi)=O\Big(\frac{1}{n^{\alpha}}\Big)$$

then we have the Gaussian concentration bound (3.1).
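The step from a polynomially decaying variation to the hypothesis of Theorem 3.1 can be written out as follows. This is a sketch of the standard comparison between a tail sum and an integral; the constant $M$ is introduced here only for illustration and stands for any constant with $\mathrm{var}_{k}(\phi)\leq Mk^{-\alpha}$.

```latex
\begin{align*}
W_{p}(\phi)
  &\le \sum_{k=p+1}^{\infty}\mathrm{var}_{k}(\phi)
   \le M\sum_{k=p+1}^{\infty}k^{-\alpha} \\
  &\le M\int_{p}^{\infty}t^{-\alpha}\,\mathrm{d}t
   = \frac{M}{\alpha-1}\,p^{-(\alpha-1)},
   \qquad p\ge 1 .
\end{align*}
```

Hence $W_{p}(\phi)=O(p^{-(\alpha-1)})$, and $\alpha>2$ gives an exponent $\alpha-1>1$, which is the polynomial condition of Theorem 3.1.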

Next, we get the following concentration inequalities from (3.1).

Corollary 3.3.

For all $u>0$ , we have

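$$\mu_{\phi}\Big\{x\in\Omega:K(x,Tx,\ldots,T^{n-1}x)-\int K(y,Ty,\ldots,T^{n-1}y)\,{\mathrm{d}}\mu_{\phi}(y)\geq u\Big\}\leq\exp\bigg(-\frac{u^{2}}{4C_{(3.1)}\sum_{i=0}^{n-1}\mathrm{Lip}_{\theta,i}(K)^{2}}\bigg) \tag{3.2}$$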

and

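$$\mu_{\phi}\Big\{x\in\Omega:\Big|K(x,Tx,\ldots,T^{n-1}x)-\int K(y,Ty,\ldots,T^{n-1}y)\,{\mathrm{d}}\mu_{\phi}(y)\Big|\geq u\Big\}\leq 2\exp\bigg(-\frac{u^{2}}{4C_{(3.1)}\sum_{i=0}^{n-1}\mathrm{Lip}_{\theta,i}(K)^{2}}\bigg). \tag{3.3}$$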

Proof.

Inequality (3.2) follows by a well-known trick referred to as Chernoff’s bounding method [2]. Let us give the proof for completeness. Let $u>0$ . For any random variable $Y$ , Markov’s inequality tells us that $\mathds{P}(Y\geq u)\leq\operatorname{e}^{-\xi u}\mathds{E}\left(\operatorname{e}^{\xi Y}\right)$ for all $\xi>0$ . Now let

$$Y=K(x,Tx,\ldots,T^{n-1}x)-\int K(y,Ty,\ldots,T^{n-1}y)\,{\mathrm{d}}\mu_{\phi}(y).$$

Using (3.1) and optimizing over $\xi$ , we get (3.2). Inequality (3.3) follows by applying (3.2) to $-K$ and then summing up the two bounds. ∎
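For readers who want the optimization over $\xi$ spelled out, here is a sketch. The shorthand $V=\sum_{i=0}^{n-1}\mathrm{Lip}_{\theta,i}(K)^{2}$ and the letter $C$ for the constant of Theorem 3.1 are introduced here for brevity. Applying the Gaussian concentration bound to $\xi K$, whose Lipschitz constants are $\xi\,\mathrm{Lip}_{\theta,i}(K)$, gives $\mathds{E}(\operatorname{e}^{\xi Y})\leq\operatorname{e}^{C\xi^{2}V}$.

```latex
\begin{align*}
\mathds{P}(Y\ge u)
  &\le \operatorname{e}^{-\xi u}\,\mathds{E}\big(\operatorname{e}^{\xi Y}\big)
   \le \exp\big(-\xi u + C\xi^{2}V\big), \qquad \xi>0,\\
\frac{\mathrm{d}}{\mathrm{d}\xi}\big(-\xi u + C\xi^{2}V\big)=0
  &\iff \xi^{*}=\frac{u}{2CV},\\
-\xi^{*}u + C(\xi^{*})^{2}V
  &= -\frac{u^{2}}{2CV}+\frac{u^{2}}{4CV}
   = -\frac{u^{2}}{4CV}.
\end{align*}
```

The exponent is a convex quadratic in $\xi$, so $\xi^{*}$ is its unique minimizer, which accounts for the factor $4$ in the denominator of the deviation bound.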

The last corollary we want to state is about the variance of any separately $d_{\theta}$ -Lipschitz function.

Corollary 3.4.

We have

[TABLE]

Proof.

To lighten the notation, we simply write $K$ instead of $K\left(x,Tx,\dots,T^{n-1}x\right)$, $\int K$ instead of $\int K\left(y,Ty,\dots,T^{n-1}y\right){\mathrm{d}}\mu_{\phi}(y)$, and so on. Applying (3.1) to $\xi K$ where $\xi$ is any real number different from $0$, we get

[TABLE]

Now by Taylor expansion we get

[TABLE]

Dividing by $\xi^{2}$ on both sides and then taking the limit $\xi\to 0$ , we obtain the desired inequality. ∎
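The Taylor expansion argument can be sketched as follows, writing $Y=K-\int K$ (so that $\int Y=0$), $V=\sum_{i=0}^{n-1}\mathrm{Lip}_{\theta,i}(K)^{2}$ and $C$ for the constant of Theorem 3.1. These shorthands are introduced here for illustration, and the constant $2C$ in the last line is what this particular computation yields.

```latex
\begin{align*}
\int \operatorname{e}^{\xi Y}\,\mathrm{d}\mu_{\phi}
  &= 1 + \frac{\xi^{2}}{2}\int Y^{2}\,\mathrm{d}\mu_{\phi} + o(\xi^{2}),
  &
\operatorname{e}^{C\xi^{2}V}
  &= 1 + C\xi^{2}V + o(\xi^{2}),\\
\frac{\xi^{2}}{2}\int Y^{2}\,\mathrm{d}\mu_{\phi}
  &\le C\xi^{2}V + o(\xi^{2})
  &
\Longrightarrow\quad
\int Y^{2}\,\mathrm{d}\mu_{\phi}
  &\le 2CV .
\end{align*}
```

The expansion of the left-hand side is legitimate because $K$ is continuous on a compact space, hence bounded.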

Although we were not able to prove the Gaussian concentration bound for separately $d_{\phi}$ -Lipschitz functions, for many applications separately $d_{\theta}$ -Lipschitz functions are more natural. Furthermore there is a notable class of separately $d_{\phi}$ -Lipschitz functions, namely Birkhoff sums of the potential itself, for which our theorem holds. Indeed, when $\phi\in\mathcal{L}_{\phi}$ , the function $K(x,\ldots,T^{n-1}x)=S_{n}\phi(x)$ is obviously separately $d_{\phi}$ -Lipschitz and $\mathrm{Lip}_{\phi,j}(K)=\mathrm{Lip}_{\phi}(\phi)$ for all $j$ . We have the following result.

Theorem 3.5.

Under the hypotheses of Theorem 3.1, there exists $C_{(3.5)}>0$ such that, for any $\psi\in\mathcal{L}_{\phi}$, for all $u>0$, and for all $n\in\mathds{N}$, we have

[TABLE]

The proof is left to the reader. The main (simple) modification lies in the proof of Lemma 4.3 in which considering a Birkhoff sum of a $d_{\phi}$ -Lipschitz function works fine, whereas we are stuck for a general separately $d_{\phi}$ -Lipschitz function.

We will apply this result with $\psi=-\phi$ to derive concentration bounds for hitting times. Note that under the assumptions of this theorem, $\{\psi(T^{n}x)\}_{n\geq 0}$ satisfies the central limit theorem [18, Chapter 2].

3.2 Related works

The novelty here is to prove a Gaussian concentration bound for potentials with a variation decaying subexponentially. When $\phi$ is Lipschitz, Theorem 3.1 was proved in [7]. The main goal of [7] was then to deal with nonuniformly hyperbolic systems modeled by a Young tower. For a tower with a return-time to the base with exponential tails, the authors of [7] proved a Gaussian concentration bound. For polynomial tails, they proved moment concentration bounds. For $C^{1+\eta}$ maps of the unit interval with an indifferent fixed point, which are thus nonuniformly expanding, we are in the latter situation. In view of Remark 2.2 above, we deal here with maps whose derivative is not Hölder continuous, but which are still uniformly expanding.

Let us also mention the paper [14] in which the authors prove a Gaussian concentration bound for $\phi$ of summable variation (whereas we need a bit more than summable). Their proof is based on coupling. However, they consider functions $K$ on $A^{n}$ , not on $\big{(}A^{\mathds{N}}\big{)}^{n}=\Omega^{n}$ as in this paper. For such functions, the analogue of $\mathrm{Lip}_{\theta,i}(K)$ is $\delta_{i}(K)=\sup\{|K(a^{0},\ldots,a^{i},\ldots,a^{n-1})-K(b^{0},\ldots,b^{i},\ldots,b^{n-1})|:a^{j}=b^{j},\forall j\neq i\}$ . It is clear that a Gaussian concentration bound for functions $K:\big{(}A^{\mathds{N}}\big{)}^{n}\to\mathds{R}$ implies a Gaussian concentration bound for functions $K:A^{n}\to\mathds{R}$ , but the converse is not true.

3.3 Applications

We now give several applications of the Gaussian concentration bound (3.1) and its corollaries. Throughout this section, $\mu_{\phi}$ is the equilibrium state for a potential $\phi$ satisfying one of the conditions 1-4 in Theorem 3.1.

3.3.1 Birkhoff sums

Let $f:\Omega\to\mathds{R}$ be a $d_{\theta}$ -Lipschitz function and define

[TABLE]

whence $K(x,Tx,\ldots,T^{n-1}x)=f(x)+f(Tx)+\cdots+f(T^{n-1}x):=S_{n}f(x)$ is the Birkhoff sum of $f$ . Clearly, $\mathrm{Lip}_{\theta,i}(K)=\mathrm{Lip}_{\theta}(f)$ for all $i=0,\ldots,n-1$ . Applying Corollary 3.3 we immediately get

[TABLE]

for all $n\geq 1$ and $u\in\mathds{R}_{+}$ , where

[TABLE]

This bound can be compared with the large deviation asymptotics (1.2). We see that it has the right behavior in $n$ . Replacing $u$ by $u/\sqrt{n}$ in (3.6) we get

[TABLE]

for all $n$ and $u>0$. This can be compared with the central limit theorem (1.1). We can see that the previous bound is consistent with that theorem. Note that the central limit theorem is about convergence in law, whereas here we obtain a (non-asymptotic) bound from which one cannot deduce a convergence in law.
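As a worked sketch of how the two-sided bound of Corollary 3.3 specializes to Birkhoff sums: with $\mathrm{Lip}_{\theta,i}(K)=\mathrm{Lip}_{\theta}(f)$ for all $i$, the sum of squared Lipschitz constants equals $n\,\mathrm{Lip}_{\theta}(f)^{2}$, and $\int S_{n}f\,\mathrm{d}\mu_{\phi}=n\int f\,\mathrm{d}\mu_{\phi}$ by invariance. The letter $C$ below stands for the constant of Theorem 3.1; the grouping of constants is chosen here for illustration and may differ from the paper's.

```latex
\begin{align*}
\mu_{\phi}\Big\{x:\Big|\tfrac{1}{n}S_{n}f(x)-\int f\,\mathrm{d}\mu_{\phi}\Big|\ge u\Big\}
  &= \mu_{\phi}\Big\{x:\Big|S_{n}f(x)-n\int f\,\mathrm{d}\mu_{\phi}\Big|\ge nu\Big\}\\
  &\le 2\exp\Big(-\frac{(nu)^{2}}{4C\,n\,\mathrm{Lip}_{\theta}(f)^{2}}\Big)
   = 2\exp\Big(-\frac{n\,u^{2}}{4C\,\mathrm{Lip}_{\theta}(f)^{2}}\Big),\\
\mu_{\phi}\Big\{x:\frac{\big|S_{n}f(x)-n\int f\,\mathrm{d}\mu_{\phi}\big|}{\sqrt{n}}\ge u\Big\}
  &\le 2\exp\Big(-\frac{u^{2}}{4C\,\mathrm{Lip}_{\theta}(f)^{2}}\Big).
\end{align*}
```

The first bound decays exponentially in $n$ for fixed $u$, matching the large deviation scale, while the second is uniform in $n$, matching the central limit scale.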

3.3.2 Empirical frequency of blocks

Take $f(x)=\mathds{1}_{[a^{0,k-1}]}(x)$ where

[TABLE]

is a given $k$ -cylinder. Let

[TABLE]

This is the ‘empirical frequency’ of the block $a^{0,k-1}\in A^{k}$ in the orbit of $x$ up to time $n-k$ . By Birkhoff’s ergodic theorem, we know that, for each $a^{0,k-1}$ , $\mathfrak{f}_{n}(x,a^{0,k-1})$ goes to $\mu_{\phi}([a^{0,k-1}])$ for $\mu_{\phi}$ -almost all $x$ . The next theorem quantifies this asymptotic statement. Notice that we can control the fluctuations of $\mathfrak{f}_{n}(x,a^{0,k-1})$ around $\mu_{\phi}([a^{0,k-1}])$ uniformly in $a^{0,k-1}$ .
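A minimal Python sketch of the empirical block frequency on a finite word follows. It assumes the normalization $1/(n-k+1)$ over the $n-k+1$ admissible starting positions, and the function name `empirical_block_frequency` is hypothetical. A finite tuple stands in for the first $n$ symbols of $x$.

```python
def empirical_block_frequency(word, block):
    """Fraction of positions j in 0..n-k at which `block` occurs in `word`.

    Assumes the normalization 1 / (n - k + 1), where n = len(word) and
    k = len(block); this choice is an assumption of the sketch.
    """
    n, k = len(word), len(block)
    if not 1 <= k <= n:
        raise ValueError("block length must satisfy 1 <= k <= n")
    target = tuple(block)
    hits = sum(1 for j in range(n - k + 1) if tuple(word[j:j + k]) == target)
    return hits / (n - k + 1)

frequency = empirical_block_frequency((0, 1, 0, 1, 0), (0, 1))
```

Changing one symbol of `word` affects at most `k` of the windows, which is the mechanism behind the small Lipschitz constants of this observable.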

Theorem 3.6.

For all $n\in\mathds{N}$ , for all $1\leq k\leq n$ and for all $u>0$ we have

[TABLE]

where $c=2\sqrt{2C_{(3.1)}\log|A|}$. Moreover, if $k=k(n)=\zeta\log n$ for some $\zeta>0$, then

[TABLE]

where $c^{\prime}=2\sqrt{2\zeta C_{(3.1)}\log|A|}$.

Proof.

Define the function $K:\Omega^{n-k+1}\to\mathds{R}$ by

[TABLE]

where

[TABLE]

It is left to the reader to check that $\mathrm{Lip}_{\theta,j}(K)=\frac{\mathrm{Lip}_{\theta}(f)}{n-k+1}=\frac{1}{\theta^{k}(n-k+1)}$, so we get immediately from (3.6)

[TABLE]

for all $n\geq 1$ and $u>0$ . To complete the proof, we need a good upper bound for $\int K\left(y,Ty,\dots,T^{n-k-1}y\right){\mathrm{d}}\mu_{\phi}(y)$ . Actually, this can be done by using again the Gaussian concentration bound. Using (3.1) and Jensen’s inequality we get for any $\xi>0$

[TABLE]

The third inequality is obtained by using the trivial inequality

[TABLE]

Taking logarithms on both sides and then dividing by $\xi$ , we have

[TABLE]

There is a unique $\xi>0$ minimizing the right-hand side, hence

[TABLE]

where we used that $\log 2\leq\log|A|$ . Hence we get the desired estimate. ∎

Note that $\log|A|$ is the topological entropy of the full shift with alphabet $A$ .

3.3.3 Hitting times and entropy

For $x,y\in\Omega$ , let

[TABLE]

This is the first time that the $n$ first symbols of $x$ appear in $y$ . We assume that $\phi$ satisfies

[TABLE]

One can prove (see [9]) that

[TABLE]

Roughly, this means that, if we pick $x$ and $y$ independently, each one according to $\mu_{\phi}$ , then the time it takes to see the first $n$ symbols of $x$ appearing in $y$ for the first time is $\approx\operatorname{e}^{nh(\mu_{\phi})}$ .
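The hitting time and the associated entropy estimate $\frac{1}{n}\log T_{x^{0,n-1}}(y)$ can be illustrated on finite samples. The following Python sketch assumes the convention that the hitting time is the smallest $j\geq 1$ such that $y^{j}\cdots y^{j+n-1}=x^{0}\cdots x^{n-1}$; whether the search starts at $0$ or $1$ is a convention, exposed here through the hypothetical parameter `start`. The function names are illustrative, not taken from the paper.

```python
import math


def hitting_time(x, y, n, start=1):
    """Smallest j >= start with y[j:j+n] == x[:n], or None if not found.

    x, y are finite samples (strings or lists); None means that the block
    does not occur in the available part of y.
    """
    target = x[:n]
    if len(target) < n:
        raise ValueError("x must contain at least n symbols")
    for j in range(start, len(y) - n + 1):
        if y[j:j + n] == target:
            return j
    return None


def entropy_estimate(x, y, n):
    """(1/n) log of the hitting time, or None if the block is not found."""
    t = hitting_time(x, y, n)
    return None if t is None else math.log(t) / n
```

Since the hitting time is typically of order $\operatorname{e}^{nh(\mu_{\phi})}$, the sample of $y$ has to be exponentially long in $n$ for the block to be found.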

Theorem 3.7.

If $\phi$ satisfies (3.7), then there exist strictly positive constants $c_{1},c_{2}$ and $u_{0}$ such that, for all $n$ and for all $u>u_{0}$ ,

[TABLE]

and

[TABLE]

These bounds were obtained in [8] when $\phi$ is Lipschitz. Observe that the probability of being above $h(\mu_{\phi})$ is bounded above by $c_{1}\operatorname{e}^{-c_{2}nu^{2}}$ , whereas the probability of being below $h(\mu_{\phi})$ is bounded above by $c_{1}\operatorname{e}^{-c_{2}nu}$ . The proof of this theorem being very similar to that given in [8], we omit the details and only sketch it. We cannot directly deal with $T_{x^{0,n-1}}(y)$ but we have $\log T_{x^{0,n-1}}(y)=\log\big{(}T_{x^{0,n-1}}(y)\mu_{\phi}([x^{0,n-1}])\big{)}-\log\mu_{\phi}([x^{0,n-1}])$ . Then we use Theorem 3.5 for $\psi=-\phi$ , assuming (without loss of generality) that $P(\phi)=0$ , that is, $h(\mu_{\phi})=-\int\phi\,{\mathrm{d}}\mu_{\phi}$ , because we can control uniformly in $x$ the approximation $-\log\mu_{\phi}([x^{0,n-1}])\approx S_{n}(-\phi)(x)$ . To control the other term, we use that the law of $T_{x^{0,n-1}}(y)\mu_{\phi}([x^{0,n-1}])$ is well approximated by an exponential law.

Another estimator of $h(\mu_{\phi})$ is the so-called plug-in estimator. We could also obtain concentration bounds for it in the spirit of [8].

3.3.4 Speed of convergence of the empirical measure

Instead of looking at the frequency of a block $a^{0,k-1}$ we can consider a global object, namely the empirical measure

$$\mathscr{E}_{n}(x)=\frac{1}{n}\sum_{j=0}^{n-1}\delta_{T^{j}x}.$$

For $\mu_{\phi}$ -almost every $x$ , we know that

$$\mathscr{E}_{n}(x)\xrightarrow[n\to\infty]{}\mu_{\phi},$$

where the convergence is in the weak topology on the space of probability measures $\mathscr{M}(\Omega)$ on $\Omega$. This is a consequence of Birkhoff’s ergodic theorem. The natural question is: how fast does this convergence take place? We can answer this question by using the Kantorovich distance $d_{{\scriptscriptstyle K}}$, which metrizes the weak topology on $\mathscr{M}(\Omega)$:

[TABLE]

We have the following result.

Theorem 3.8.

For all $u>0$ and all $n\geq 1$ we have

[TABLE]

where $c_{\mathrm{(\ref{concentrationkanto})}}=(4C_{\mathrm{(\ref{maintheo})}})^{-1}$ .

Proof.

Let

[TABLE]

Of course, $K(x,Tx,\ldots,T^{n-1}x)=d_{{\scriptscriptstyle K}}(\mathscr{E}_{n}(x),\mu_{\phi})$ . It is left to the reader to check that

[TABLE]

The result follows at once by applying inequality (3.3). ∎

It is natural to ask for a good upper bound for $\int d_{{\scriptscriptstyle K}}(\mathscr{E}_{n}(y),\mu_{\phi}){\mathrm{d}}\mu_{\phi}(y)$ because this would give a control on the fluctuations of $d_{{\scriptscriptstyle K}}(\mathscr{E}_{n}(x),\mu_{\phi})$ around $0$. Getting such a bound turns out to be difficult. In [5, Section 8] it is proved that

[TABLE]

For two positive sequences $(a_{n}),(b_{n})$ , $a_{n}\preceq b_{n}$ means that $\limsup_{n}\frac{\log a_{n}}{\log b_{n}}\leq 1$ . One could in principle get a non-asymptotic but messy bound.

3.3.5 Relative entropy, $\bar{d}$ -distance and speed of Markov approximation

Given $n\in\mathds{N}$ and $x^{0,n-1},y^{0,n-1}\in A^{n}$, the (non-normalized) Hamming distance between $x^{0,n-1}$ and $y^{0,n-1}$ is

[TABLE]
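For readers who want to experiment, a minimal Python sketch of the non-normalized Hamming distance follows. It assumes the standard definition, namely the number of coordinates at which the two words of length $n$ differ; the function name `hamming` is hypothetical.

```python
def hamming(x, y):
    """Number of positions i in 0..n-1 at which x[i] != y[i]."""
    if len(x) != len(y):
        raise ValueError("both words must have the same length n")
    return sum(1 for a, b in zip(x, y) if a != b)
```

Changing a single coordinate of a word changes this distance to any fixed word by at most $1$, which is the property used later when functions that are $1$-Lipschitz for the Hamming distance are considered.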

Now, given two shift-invariant probability measures $\mu,\nu$ on $\Omega$ , denote by $\mu_{n}$ and $\nu_{n}$ their projections on $A^{n}$ , and define their $\bar{d}_{n}$ -distance by

[TABLE]

where the infimum is taken over all the joint shift-invariant probability distributions $\mathbb{P}_{n}$ on $A^{n}\times A^{n}$ such that $\sum_{y^{0,n-1}\in A^{n}}\mathbb{P}_{n}(x^{0,n-1},y^{0,n-1})=\mu_{n}(x^{0,n-1})$ and $\sum_{x^{0,n-1}\in A^{n}}\mathbb{P}_{n}(x^{0,n-1},y^{0,n-1})=\nu_{n}(y^{0,n-1})$. By [20, Theorem I.9.6, p. 92], the following limit exists:

[TABLE]

and defines a distance on the set of shift-invariant probability measures. It induces a finer topology than the weak topology and, in particular, the $\bar{d}$-limit of ergodic measures is ergodic, and the entropy is $\bar{d}$-continuous on the class of ergodic measures. (These two properties are false in the weak topology.)

Next, given $n\in\mathds{N}$ and a shift-invariant probability measure $\nu$ on $\Omega$ , define the $n$ -block relative entropy of $\nu$ with respect to $\mu_{\phi}$ by

[TABLE]

One can easily prove that the following limit exists and defines the relative entropy of $\nu$ with respect to $\mu_{\phi}$ :

[TABLE]

where $P(\phi)$ is the topological pressure of $\phi$ :

[TABLE]

This limit exists for any continuous $\phi$. (To prove (3.11), we use that there exists a positive sequence $(\varepsilon_{n})_{n}$ going to $0$ such that, for any $a^{0,n-1}\in A^{n}$ and any $x\in[a^{0,n-1}]$, $\mu_{\phi}([a^{0,n-1}])/\exp(-nP(\phi)+S_{n}\phi(x))$ is bounded below by $\exp(-n\varepsilon_{n})$ and above by $\exp(n\varepsilon_{n})$.) By the variational principle, $h(\nu|\mu_{\phi})\geq 0$ with equality if and only if $\nu=\mu_{\phi}$ (recall that $\mu_{\phi}$ is the unique equilibrium state of $\phi$). We refer to [21] for details. We can now formulate the first theorem of this section.

Theorem 3.9.

For every shift-invariant probability measure $\nu$ on $\Omega$ and for all $n\in\mathds{N}$ , we have

[TABLE]

where $c_{\mathrm{(\ref{theo-pinsker})}}=\sqrt{2C_{\mathrm{(\ref{maintheo})}}}$ . In particular

[TABLE]

Proof.

For a function $f:A^{n}\to\mathds{R}$ , define for each $i=0,\ldots,n-1$

[TABLE]

We obviously have that for all $a^{0,n-1},b^{0,n-1}\in A^{n}$

[TABLE]

A function $f:A^{n}\to\mathds{R}$ such that $\delta_{j}(f)=1$, $j=0,\ldots,n-1$, is $1$-Lipschitz for the Hamming distance (3.9). We now consider the set of functions

[TABLE]

We can identify a function $f\in\mathcal{H}(n,\phi)$ with a function $\tilde{f}:\Omega^{n}\to\mathds{R}$ in a natural way: $\tilde{f}(x_{0},\ldots,x_{n-1})=f(\pi(x_{0}),\ldots,\pi(x_{n-1}))$ where $\pi:\Omega\to A$ is defined by $\pi(x)=x^{0}$ . We obviously have $\int\tilde{f}\,{\mathrm{d}}\mu_{\phi}=0$ and it is easy to check that $\mathrm{Lip}_{j}(\tilde{f})=\delta_{j}(f)=1$ , $j=0,\ldots,n-1$ . Therefore we can apply the Gaussian concentration bound (3.1) to get

[TABLE]

We now apply an abstract result [1, Theorem 3.1] which says that (3.14) is equivalent to

[TABLE]

Hence (3.12) is proved. To get (3.13), divide by $n$ on both sides and take the limit $n\to\infty$ and use (3.10) and (3.11). ∎

We now give an application of inequality (3.13). Let

[TABLE]

The equilibrium state for $\phi_{n}$ is an $(n-1)$-step Markov measure. One can prove that in the weak topology $(\mu_{\phi_{n}})_{n}$ converges to $\mu_{\phi}$, but one cannot get any speed of convergence. We get the following upper bound on the speed of convergence of $(\mu_{\phi_{n}})_{n}$ to $\mu_{\phi}$ in the finer $\bar{d}$ topology.

Corollary 3.10.

Assume, without loss of generality, that $\phi$ is normalized in the sense that

[TABLE]

Then there exists $n_{\phi}\geq 1$ such that, for all $n\geq n_{\phi}$ , we have

[TABLE]

where

[TABLE]

More details on how to normalize a potential are given in Subsection 4.1.

Proof.

Using (3.11) and the variational principle we get

[TABLE]

Indeed, since $\phi$ and $\phi_{n}$ are normalized, we have in particular that $P(\phi)=P(\phi_{n})=0$ , and by the variational principle $h(\mu_{\phi_{n}})=-\int\phi_{n}{\mathrm{d}}\mu_{\phi_{n}}$ . Now

[TABLE]

where we used the inequality $\log(1+u)\leq u$ for all $u>-1$ . Now using the shift-invariance of $\mu_{\phi_{n}}$ and replacing $\operatorname{e}^{\phi_{n}}$ by $\operatorname{e}^{\phi_{n}}-\operatorname{e}^{\phi}+\operatorname{e}^{\phi}$ we get

[TABLE]

where we used that $\sum_{a\in A}(\operatorname{e}^{\phi_{n}(ax)}-\operatorname{e}^{\phi(ax)})=0$ . Combining (3.13), (3.16), (3.17) and (3.18) we thus obtain

[TABLE]

It remains to estimate $\|\operatorname{e}^{\phi_{n}}-\operatorname{e}^{\phi}\|_{\infty}$ in terms of $\mathrm{var}_{n}(\phi)$ . We have

[TABLE]

provided that $\|\phi-\phi_{n}\|_{\infty}<1$, where we used the inequality $|\operatorname{e}^{u}-1|\leq(\operatorname{e}-1)|u|$ valid for $|u|<1$. Finally, since $\|\phi-\phi_{n}\|_{\infty}\leq\mathrm{var}_{n}(\phi)$, we define $n_{\phi}$ to be the smallest integer such that $\mathrm{var}_{n}(\phi)<1$ and we can take

[TABLE]

We thus proved (3.15). ∎
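The two elementary inequalities used in this proof, $\log(1+u)\leq u$ for $u>-1$ and $|\operatorname{e}^{u}-1|\leq(\operatorname{e}-1)|u|$ for $|u|<1$, can be sanity-checked numerically. The Python sketch below is only such a check on a grid, not a proof; the function names, the grid and the small floating-point tolerance `1e-15` are choices made for this illustration.

```python
import math


def check_log_inequality(u):
    """True if log(1 + u) <= u (up to rounding); requires u > -1."""
    return math.log1p(u) <= u + 1e-15


def check_exp_inequality(u):
    """True if |e^u - 1| <= (e - 1) |u| (up to rounding); meant for |u| < 1."""
    return abs(math.expm1(u)) <= (math.e - 1) * abs(u) + 1e-15


# Grid of points in (-1, 1) with step 1/1000.
grid = [k / 1000 for k in range(-999, 1000)]
all_ok = all(check_log_inequality(u) and check_exp_inequality(u) for u in grid)
```

The second inequality follows from the convexity of $u\mapsto\operatorname{e}^{u}$ on $[0,1]$ and from $1-\operatorname{e}^{u}\leq|u|$ on $[-1,0]$.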

Let us mention the paper [13] in which the authors obtain the same bound for the speed of convergence of Markov approximation, up to the constant. Their approach is a direct estimation of $\bar{d}(\mu_{\phi_{n}},\mu_{\phi})$ by using a coupling method. The point here is to obtain the same speed of convergence as an easy corollary of inequality (3.13). Let us remark that from (4.8) we get a worse result since we end up with a bound proportional to $\sqrt{\mathrm{var}_{n}(\phi)}$. The trick which leads to the correct bound was communicated to us by Daniel Takahashi.

3.3.6 Shadowing of orbits

Let $A$ be a Borel subset of $\Omega$ such that $\mu_{\phi}(A)>0$ and define for all $n\in\mathds{N}$

[TABLE]

A basic example of a set $A$ is a cylinder set $[a^{0,k-1}]$. The quantity $\mathcal{S}_{A}(x,n)$, which lies between $0$ and $1$, measures how well we can trace, in the best possible way, the orbit of some initial condition not in $A$ by an orbit starting in $A$.

Theorem 3.11.

For any Borel subset $A\subset\Omega$ such that $\mu_{\phi}(A)>0$ , for any $n\in\mathds{N}$ and for any $u>0$

[TABLE]

where

[TABLE]

We give a shorter and simpler proof than in [11].

Proof.

Let $K(x_{0},\ldots,x_{n-1})=\frac{1}{n}\inf_{y\in A}\sum_{j=0}^{n-1}d_{\theta}(x_{j},T^{j}y)$ . One can easily check that

[TABLE]

It follows from (3.2) that

[TABLE]

for all $n\geq 1$ and for all $u>0$ . We now need an upper bound for $\int\mathcal{S}_{A}(y,n){\mathrm{d}}\mu_{\phi}(y)$ . We simply observe that by (3.1) and the definition of $\mathcal{S}_{A}(\cdot,n)$

[TABLE]

for all $\xi>0$ . Hence

[TABLE]

Optimizing this bound over $\xi>0$ gives

[TABLE]

The theorem follows at once. ∎

3.3.7 Almost-sure central limit theorem

It was proved in [18, Chapter 2] that $(\Omega,T,\mu_{\phi})$ satisfies the central limit theorem for the class of $d_{\theta}$-Lipschitz functions $f:\Omega\to\mathds{R}$ such that $\int f{\mathrm{d}}\mu_{\phi}=0$, that is, for any such $f$ the process $\{f\circ T^{n}\}_{n\geq 0}$ satisfies

[TABLE]

where

[TABLE]

If $\sigma^{2}(f)>0$, $G_{0,\sigma^{2}}$ denotes the law of a Gaussian random variable with mean $0$ and variance $\sigma^{2}(f)$, that is,

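$$G_{0,\sigma^{2}}((-\infty,u])=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{u}\operatorname{e}^{-\frac{\xi^{2}}{2\sigma^{2}}}\,{\mathrm{d}}\xi,\quad u\in\mathds{R}.$$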

When $\sigma^{2}(f)=0$ we set $G_{0,0}=\delta_{0}$ , the Dirac mass at zero.

Remark 3.1.

In fact, a more general statement was proved in [18, Chapter I.2]. Namely, (3.19) holds when $\phi$ is such that $\sum_{k}\epsilon_{k}<+\infty$ and $f\in\mathcal{L}_{\phi}$ .

Now, for each $N\geq 1$ and $x\in\Omega$ , define the probability measure

$$\mathcal{A}_{N}(x)=\frac{1}{L_{N}}\sum_{n=1}^{N}\frac{1}{n}\,\delta_{S_{n}f(x)/\sqrt{n}},$$

where $L_{N}=\sum_{n=1}^{N}\frac{1}{n}$ and where, as usual, $\delta_{u}$ is the Dirac mass at point $u\in\mathds{R}$. Of course, $L_{N}=\log N+\mathcal{O}(1)$. Notice that $\mathcal{A}_{N}$ is a random probability measure. Finally, the Wasserstein distance between two probability measures $\nu$, $\nu^{\prime}$ on the Borel sigma-algebra $\mathscr{B}(\mathds{R})$ is

[TABLE]

where the infimum is taken over all probability measures such that

[TABLE]

for any Borel subset of $\mathds{R}$ . By the Kantorovich-Rubinstein duality theorem, $W_{1}(\nu,\nu^{\prime})$ is equal to the Kantorovich distance which is the supremum of $\int\ell\,{\mathrm{d}}\nu-\int\ell\,{\mathrm{d}}\nu^{\prime}$ over the set of $1$ -Lipschitz functions $\ell:\mathds{R}\to\mathds{R}$ . We refer to [12] for background and proofs.
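To make the logarithmic average concrete, the following Python sketch builds the atoms and weights of $\mathcal{A}_{N}(x)$ from the values $f(x),f(Tx),\ldots,f(T^{N-1}x)$. It assumes that $\mathcal{A}_{N}(x)$ puts mass $\frac{1}{nL_{N}}$ at the point $S_{n}f(x)/\sqrt{n}$ for $n=1,\ldots,N$, which is what the description of $L_{N}$ and $\delta_{u}$ above suggests. The function names and the input data are hypothetical.

```python
import math


def harmonic(N):
    """L_N = sum_{n=1}^{N} 1/n."""
    return sum(1.0 / n for n in range(1, N + 1))


def log_averaged_atoms(f_values):
    """Atoms and weights of A_N given [f(x), f(Tx), ..., f(T^{N-1} x)].

    Returns a list of pairs (atom, weight) with atom = S_n f / sqrt(n)
    and weight = 1 / (n * L_N), for n = 1, ..., N.
    """
    N = len(f_values)
    L = harmonic(N)
    atoms = []
    partial_sum = 0.0
    for n, value in enumerate(f_values, start=1):
        partial_sum += value  # S_n f(x)
        atoms.append((partial_sum / math.sqrt(n), 1.0 / (n * L)))
    return atoms


# Hypothetical centered observable along an orbit.
atoms = log_averaged_atoms([1.0, -1.0, 1.0, -1.0])
```

The weights sum to $1$ by construction, and `harmonic(N) - math.log(N)` stays bounded, in line with $L_{N}=\log N+\mathcal{O}(1)$.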

Now we can formulate the almost-sure central limit theorem.

Theorem 3.12.

Let $f:\Omega\to\mathds{R}$ be a $d_{\theta}$ -Lipschitz function. Then, for $\mu_{\phi}$ almost every $x\in\Omega$ , we have

[TABLE]

We make several comments. Recall that the Wasserstein distance metrizes the weak topology on the set of probability measures $\nu$ on $\mathscr{B}(\mathds{R})$ . Moreover, if $(\nu_{n})_{n\geq 1}$ is a sequence of probability measures on $\mathscr{B}(\mathds{R})$ and $\nu$ a probability measure on $\mathscr{B}(\mathds{R})$ , then

[TABLE]

where “ $\mathop{}\!\xrightarrow[]{\mathrm{\scriptscriptstyle{law}}}$ ” means weak convergence of probability measures on $\mathscr{B}(\mathds{R})$ .

To compare with (3.19), observe that Theorem 3.12 implies that for $\mu_{\phi}$ -almost every $x$ , $\mathcal{A}_{N}(x)\mathop{}\!\xrightarrow[]{\mathrm{\scriptscriptstyle{law}}}G_{0,\sigma^{2}(f)}$ , which in turn implies that

[TABLE]

Therefore, the expectation with respect to $\mu_{\phi}$ in (3.19) is replaced by a pathwise logarithmic average in the almost-sure central limit theorem.

Proof.

The proof follows from an abstract theorem proved in [6]. In words, that theorem says the following. Let $(X_{n})_{n\geq 0}$ be a stationary stochastic process where the $X_{n}$’s are random variables taking values in $\Omega$. Assume that if $f:\Omega\to\mathds{R}$ is $d$-Lipschitz and such that $\mathds{E}[f(X_{0})]=0$, then it satisfies the central limit theorem, that is, for all $u\in\mathds{R}$,

[TABLE]

where $\sigma^{2}(f):=\mathds{E}[f^{2}(X_{0})]+2\sum_{\ell\geq 1}\mathds{E}[f(X_{0})f(X_{\ell})]$ is assumed to be $\neq 0$ . Moreover, assume that the process $(X_{n})_{n\geq 0}$ satisfies the following variance inequality: There exists $C>0$ such that for all separately $d$ -Lipschitz functions $K:\Omega^{n}\to\mathds{R}$ for some distance $d$ on $\Omega$ ,

[TABLE]

Then, the conclusion is that, almost surely,

[TABLE]

converges in Wasserstein distance (or, equivalently, in Kantorovich distance) to $G_{0,\sigma^{2}(f)}((-\infty,u])$ . We apply this abstract theorem to the process $(x,Tx,\ldots)$ where $x\in\Omega$ is distributed according to $\mu_{\phi}$ with $\Omega=A^{\mathds{N}}$ and $d=d_{\theta}$ . Since we have (3.19) and (3.4), the theorem follows. ∎

Remark 3.2.

The previous result relies only upon the variance inequality (3.4), which is much weaker than the Gaussian concentration bound of Theorem 3.1. On the one hand, the variance inequality (3.4) should be true for less regular potentials than the ones we consider here. On the other hand, the Gaussian concentration bound should provide a strengthening of Theorem 3.12, namely a speed of convergence.

4 Proof of Theorem 3.1

We follow the proof given in [7] with the appropriate modifications to go beyond Lipschitz potentials.

4.1 Some preparatory results

It is convenient to normalize the potential $\phi$ or, equivalently, the operator $P_{\phi}$ in the following way. We use the notations of Theorem 2.1. Let

[TABLE]

Thus

[TABLE]

Let $g$ denote the inverse of the Jacobian of $T$ , and $g^{(k)}$ the inverse of the Jacobian of $T^{k}$ , that is,

[TABLE]

(Of course $g=g^{(1)}$ .) Therefore we have

[TABLE]

Estimate (2.4) now takes the form

[TABLE]

for any $f\in\mathcal{L}_{\phi}$ . Finally, we will need the following distortion estimate. Let $x,y\in\Omega$ such that $x^{i}=y^{i}$ for $i=0,\dotsc,n-1$ and $x^{\prime},y^{\prime}\in\Omega$ such that $T^{k}x^{\prime}=x$ and $T^{k}y^{\prime}=y$ . Then it is easy to check (see [18, Chapter 2]) that, for any $k$ ,

[TABLE]

for some constant $c_{\mathrm{(\ref{distortion})}}>0$ depending only on $\phi$ .

We will use the following inequality relating the distances $d_{\theta}$ and $d_{\phi}$ .

Lemma 4.1.

Suppose that $W_{n}(\phi)=O(\theta^{n})$ , $n\geq 1$ , or

[TABLE]

Then there exists $c_{\mathrm{(\ref{comparaisondistances})}}>0$ such that

[TABLE]

or, equivalently,

[TABLE]

for all $x,y$ .

Proof.

The statement is trivial when $W_{n}(\phi)=O(\theta^{n})$ . If (4.5) holds, then there exists $n_{0}$ such that for all $n\geq n_{0}$

[TABLE]

hence $W_{n}(\phi)\geq\theta^{n-n_{0}}W_{n_{0}}(\phi)$ . Then the desired inequalities follow easily from the definitions. ∎

4.2 Proof of Theorem 3.1

Fix a separately $d_{\theta}$ -Lipschitz function $K:\Omega^{n}\to\mathds{R}$ . It is convenient to think of it as a function on $\Omega^{\mathds{N}}$ depending only on the first $n$ coordinates, therefore $\mathrm{Lip}_{\theta,i}(K)=0$ for $i\geq n$ . We endow $\Omega^{\mathds{N}}$ with the measure $\mu^{\infty}$ obtained as the limit when $k\to\infty$ of the measure $\mu^{\infty}_{k}$ on $\Omega^{k}$ given by ${\mathrm{d}}\mu^{\infty}_{k}(x_{0},\dotsc,x_{k-1})={\mathrm{d}}\mu_{\phi}(x_{0})\delta_{x_{1}=Tx_{0}}\dotsm\delta_{x_{k-1}=Tx_{k-2}}$ . On $\Omega^{\mathds{N}}$ , let $\mathcal{F}_{p}$ be the $\sigma$ -algebra of events depending only on the coordinates $(x_{j})_{j\geq p}$ (this is a decreasing sequence of $\sigma$ -fields). We want to write the function $K$ as a sum of reverse martingale differences with respect to this sequence. Therefore, let $K_{p}=\mathds{E}(K|\mathcal{F}_{p})$ and $D_{p}=K_{p}-K_{p+1}$ . More precisely,

[TABLE]

The function $D_{p}$ is $\mathcal{F}_{p}$ -measurable and $\mathds{E}(D_{p}|\mathcal{F}_{p+1})=0$ . Moreover

[TABLE]

We then apply the Azuma-Hoeffding inequality (see e.g. [17, Page 68]), which says that

[TABLE]

Therefore, the point is to obtain a good bound on $D_{p}$ . This is the claim of the following lemma.

Lemma 4.2.

There exists $C_{\mathrm{(\ref{mainlemma})}}>0$ , depending only on $\phi$ , such that for any $p\in\mathds{N}$ one has

[TABLE]

Using this lemma and applying Young’s inequality for convolutions [3, p. 316] twice we obtain

[TABLE]

Remark 4.1.

If $u=(u_{n})_{n}$ and $v=(v_{n})_{n}$ are sequences of reals, their convolution $u\star v$ is given by $(u\star v)_{n}=\sum_{k=0}^{n}u_{k}v_{n-k}$. Young’s inequality tells us that if $u\in\ell^{p}(\mathds{N})$, $v\in\ell^{q}(\mathds{N})$ and $1\leq p,q,r\leq\infty$ with $r^{-1}+1=p^{-1}+q^{-1}$, then

$$\|u\star v\|_{\ell^{r}(\mathds{N})}\leq\|u\|_{\ell^{p}(\mathds{N})}\,\|v\|_{\ell^{q}(\mathds{N})}.$$

We used it twice with $r=2$ , $p=2$ and $q=1$ .
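Young’s inequality in the case $r=2$, $p=2$, $q=1$ can be checked numerically on finitely supported sequences. The Python sketch below computes the convolution $(u\star v)_{n}=\sum_{k=0}^{n}u_{k}v_{n-k}$ and the relevant $\ell^{p}$ norms; the sequences `u` and `v` are arbitrary illustrative data, and the helper names are hypothetical.

```python
import math


def convolve(u, v):
    """Full convolution of two non-empty, finitely supported sequences."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            out[i + j] += a * b  # contributes u_i * v_j to (u * v)_{i+j}
    return out


def lp_norm(u, p):
    """The l^p norm of a finite sequence, for 1 <= p <= infinity."""
    if math.isinf(p):
        return max(abs(a) for a in u)
    return sum(abs(a) ** p for a in u) ** (1.0 / p)


u = [1.0, 2.0, 3.0]
v = [0.5, 0.25]
conv = convolve(u, v)
lhs = lp_norm(conv, 2)                 # || u * v ||_2
rhs = lp_norm(u, 2) * lp_norm(v, 1)    # || u ||_2 || v ||_1
```

Here `lhs` is about $2.52$ and `rhs` about $2.81$, so the inequality holds with room to spare for this particular pair.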

Notice that by assumption and by Theorem 2.1 we have $\sum_{k\geq 1}\epsilon_{k}<+\infty$ . Therefore, using (4.7) at a fixed index $P$ and then letting $P$ tend to infinity, we get by the dominated convergence theorem

[TABLE]

which is, in view of (4.6), exactly (3.1) with

[TABLE]

Now we are going to prove Lemma 4.2 by proving that $K_{p}$ is close to an integral quantity. This is the content of the following lemma which is the core of the proof.

Lemma 4.3.

There exists $C_{\mathrm{(\ref{lem_controle_Kp})}}>0$ , depending only on $\phi$ , such that, for all $p\in\mathds{N}$ ,

[TABLE]

where

[TABLE]

Proof of Lemma 4.2.

Applying Lemma 4.3 yields

[TABLE]

Averaging $K_{p}(x^{\prime}_{p},x_{p+1},\dotsc)$ over the preimages of $x^{\prime}_{p}$ we get exactly $K_{p+1}(x_{p+1},\dotsc)$ , hence the previous bound holds for $|D_{p}|$ , proving the lemma. ∎

Proof of Lemma 4.3.

Let us fix a point $x_{*}$ in $\Omega$ and decompose $K_{p}$ as

[TABLE]

For fixed $i$, we can group together those points $y\in T^{-p}(x_{p})$ which have the same image under $T^{i}$, splitting the sum $\sum_{T^{p}(y)=x_{p}}$ as $\sum_{T^{p-i}(z)=x_{p}}\sum_{T^{i}(y)=z}$. Since the Jacobian is multiplicative, one has $g^{(p)}(y)=g^{(i)}(y)g^{(p-i)}(z)$. Let us define two functions $f_{i}$ and $H$ as follows:

[TABLE]

Bearing in mind (4.2), we obtain

[TABLE]

Now we want to prove that $f_{i}\in\mathcal{L}_{\phi}$ to use (4.3). First observe that for any $z\in\Omega$

[TABLE]

since $d_{\theta}(x_{*},T^{i}y)\leq 1$ and $\sum_{T^{i}y=z}g^{(i)}(y)=1$ . Hence

[TABLE]

We now estimate the $d_{\phi}$ -Lipschitz norm of $f_{i}$ . We write

[TABLE]

where $z$ and $z^{\prime}$ are two points in the same partition element, and their respective preimages $y$, $y^{\prime}$ are paired according to the cylinder of length $i$ they belong to. Using the distortion control (4.4) we have

[TABLE]

hence the first sum in (4.8) is bounded in absolute value by

[TABLE]

For the second sum, substituting successively each $T^{j}y$ with $T^{j}y^{\prime}$ , we have

[TABLE]

where we used Lemma 4.1 for the third inequality.

Summing over the different preimages of $z$ , we deduce that

[TABLE]

Therefore we can apply (4.3) to get

[TABLE]

Summing those bounds, one obtains

[TABLE]

Finally, when one computes the sum of the integrals of $f_{i}$ , there are again cancelations, leaving only $\int K(y,\dotsc,T^{p-1}y,x_{p},\dotsc)\,{\mathrm{d}}\mu_{\phi}(y)$ . ∎
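The two facts about $g^{(k)}$ used repeatedly in the proof above, namely the normalization $\sum_{T^{i}y=z}g^{(i)}(y)=1$ and the multiplicativity $g^{(p)}(y)=g^{(i)}(y)\,g^{(p-i)}(T^{i}y)$, can be checked by hand in the simplest possible case. The Python sketch below is a toy example under explicit assumptions: it takes a Bernoulli measure with hypothetical weights `P`, for which the normalized potential is $\phi(x)=\log P(x^{0})$ and $g=\operatorname{e}^{\phi}$, so that $g^{(k)}(y)$ is the product of the weights of the first $k$ symbols of $y$. This case is far simpler than the potentials treated in the paper and is meant only to illustrate the bookkeeping.

```python
import itertools
import math

# Hypothetical Bernoulli weights on the alphabet {"0", "1"}.
P = {"0": 0.3, "1": 0.7}


def g_k(prefix):
    """g^{(k)}(y) for a point y whose first k symbols are `prefix`."""
    return math.prod(P[s] for s in prefix)


def preimage_prefixes(k):
    """The words w in A^k; the preimages of z under T^k are the points wz."""
    return ["".join(w) for w in itertools.product(P, repeat=k)]


# Normalization: summing g^{(4)} over all preimages under T^4.
total = sum(g_k(w) for w in preimage_prefixes(4))

# Multiplicativity with p = 4 and i = 2 for the prefix "0110".
lhs = g_k("0110")
rhs = g_k("01") * g_k("10")
```

In this toy case `total` equals $1$ and `lhs` equals `rhs` up to rounding; for a general potential $g^{(k)}$ depends on the whole point $y$, not just on its first $k$ symbols.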

Bibliography

[1] S. G. Bobkov and F. Götze. Exponential integrability and transportation cost related to logarithmic Sobolev inequalities. J. Funct. Anal., 163(1):1–28, 1999.
[2] S. Boucheron, G. Lugosi, and P. Massart. Concentration inequalities. Oxford University Press, Oxford, 2013. A nonasymptotic theory of independence, with a foreword by Michel Ledoux.
[3] P. Bullen. Dictionary of inequalities. Monographs and Research Notes in Mathematics. CRC Press, Boca Raton, FL, second edition, 2015.
[4] J.-R. Chazottes. Fluctuations of observables in dynamical systems: from limit theorems to concentration inequalities. In Nonlinear dynamics new directions, volume 11 of Nonlinear Syst. Complex., pages 47–85. Springer, Cham, 2015.
[5] J.-R. Chazottes, P. Collet, and F. Redig. On concentration inequalities and their applications for Gibbs measures in lattice systems. Journal of Statistical Physics, 169(3):504–546, Nov 2017.
[6] J.-R. Chazottes, P. Collet, and B. Schmitt. Statistical consequences of the Devroye inequality for processes. Applications to a class of non-uniformly hyperbolic dynamical systems. Nonlinearity, 18(5):2341–2364, 2005.
[7] J.-R. Chazottes and S. Gouëzel. Optimal concentration inequalities for dynamical systems. Comm. Math. Phys., 316(3):843–889, 2012.
[8] J.-R. Chazottes and C. Maldonado. Concentration bounds for entropy estimation of one-dimensional Gibbs measures. Nonlinearity, 24(8):2371–2381, 2011.
