A temporal Central Limit Theorem for real-valued cocycles over rotations

Michael Bromberg; Corinna Ulcigrai

arXiv:1705.06484·math.DS·May 23, 2017

TL;DR

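This paper proves a Temporal Central Limit Theorem for deterministic random walks driven by irrational rotations by $α$, assuming that $α$ is badly approximable and that the jump point $β$ is badly approximable with respect to $α$. It extends previous results, which required $α$ quadratic irrational and $β$ rational, using continued fraction renormalization and symbolic coding.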

Contribution

It extends the Temporal CLT to irrational values of $β$ using continued fraction and Ostrowski expansions, generalizing prior results for quadratic irrational $α$ and rational $β$.

Findings

1. Occupancy variables converge to Gaussian distribution
2. Extension of CLT to irrational skewing cocycles
3. Application of continued fraction renormalization

Equations

$$S_{n}\left(T,f,x\right):=\sum_{k=0}^{n-1}f\circ T^{k}\left(x\right)$$

$$\frac{S_{n}-A_{n}}{B_{n}}\overset{dist}{\longrightarrow}Y.$$

$$\nu_{n}\left(F\right):=\frac{1}{n}\#\left\{1\leq k\leq n:\,S_{k}\left(x_{0}\right)\in F\right\}$$

$$\frac{1}{n}\#\left\{1\leq k\leq n:\,\frac{S_{k}\left(x_{0}\right)-A_{n}}{B_{n}}<a\right\}\underset{n\to\infty}{\longrightarrow}\mathrm{Prob}\left(Y<a\right)$$

$$R_{\alpha}\left(x\right)=x+\alpha\mod 1.$$

$$f_{\beta}\left(x\right)=1_{[0,\beta)}\left(x\right)-\beta.$$

$$\frac{1}{n}\#\left\{1\leq k\leq n:\,\frac{S_{k}\left(R_{\alpha},f_{\beta},0\right)-C_{1}\log n}{C_{2}\sqrt{\log n}}\in\left[a,b\right]\right\}\to\frac{1}{\sqrt{2\pi}}\int_{a}^{b}e^{-\frac{x^{2}}{2}}dx.$$

$$N_{k}\left(\alpha,\beta\right):=\#\left\{0\leq j<k\mid 0\leq j\alpha\mod 1<\beta\right\},$$

$$T_{f_{\beta}}\left(x,y\right)=\left(R_{\alpha}\left(x\right),y+f_{\beta}\left(x\right)\right),\qquad\left(x,y\right)\in\mathbb{T}\times\mathbb{R},$$

$$\frac{1}{n}\#\left\{1\leq k\leq n:\,\frac{S_{k}\left(R_{\alpha},f_{\beta},x\right)-A_{n}}{B\sqrt{\log n}}\in\left[a,b\right]\right\}\to\frac{1}{\sqrt{2\pi}}\int_{a}^{b}e^{-\frac{x^{2}}{2}}dx$$

$$\left|q\alpha-\beta-p\right|>\frac{C}{\left|q\right|}\quad\text{for all }p\in\mathbb{Z},\ q\in\mathbb{Z}\setminus\left\{0\right\}.$$

$$\frac{1}{n}\#\left\{1\leq k\leq n:\,\frac{S_{k}\left(R_{\alpha},f_{\beta},x\right)-A_{n}}{B_{n}}\in\left[a,b\right]\right\}\to\frac{1}{\sqrt{2\pi}}\int_{a}^{b}e^{-\frac{x^{2}}{2}}dx.$$

$$\alpha=\cfrac{1}{a_{0}+\cfrac{1}{a_{1}+\dots}}$$

$$\cfrac{1}{a_{0}+\cfrac{1}{a_{1}+\dots+\cfrac{1}{a_{n}}}}=\frac{p_{n}}{q_{n}}.$$

$$\sup\left\{\left|f\circ R_{\alpha}^{q_{n}}\left(x\right)\right|:\,x\in\mathbb{T}\right\}\leq\bigvee_{\mathbb{T}}f$$

$$T_{\alpha}\left(x\right)=\begin{cases}x+\alpha & x\in[-1,0)\\ x-1 & x\in[0,\alpha)\end{cases}$$

$$T_{\alpha}^{\prime}\left(x\right)=\begin{cases}x+\alpha & x\in(-1,0];\\ x-1 & x\in[0,\alpha);\end{cases}$$

$$\alpha_{0}:=\frac{\alpha}{1-\alpha}$$

$$\beta_{0}:=\left(\alpha+1\right)\beta-1.$$

$$\varphi\left(x\right)=1_{[-1,\beta_{0})}\left(x\right)-\frac{\beta_{0}+1}{\alpha_{0}+1}$$

$$\varphi_{n}\left(x\right)=\sum_{k=0}^{n-1}\varphi\left(T_{\alpha_{0}}^{k}\left(x\right)\right).$$

$$a_{n}=\left[1/\alpha_{n}\right],\qquad\alpha_{n}^{\prime}=1-a_{n}\alpha_{n}$$

$$b_{n}:=\begin{cases}\left[\left(\beta_{n}-\left(-1\right)\right)/\alpha_{n}\right]+1=\left[\left(1+\beta_{n}\right)/\alpha_{n}\right]+1 & \text{if }\beta_{n}\in[-1+\left(b_{n}-1\right)\alpha_{n},-1+b_{n}\alpha_{n})\\ 0 & \text{if }\beta_{n}\in[-\alpha_{n}^{\prime},\alpha_{n}).\end{cases}$$

$$x_{n}:=\begin{cases}-1+\left(b_{n}-1\right)\alpha_{n} & \text{if }b_{n}\geq 1\\ 0 & \text{if }b_{n}=0,\end{cases}\qquad\beta_{n}^{\prime}:=\begin{cases}\beta_{n}+1-\left(b_{n}-1\right)\alpha_{n} & \text{if }b_{n}\geq 1\\ \beta_{n} & \text{if }b_{n}=0.\end{cases}$$

$$\alpha_{n+1}:=\frac{\alpha_{n}^{\prime}}{\alpha_{n}}=\frac{1-a_{n}\alpha_{n}}{\alpha_{n}}=\frac{1}{\alpha_{n}}-\left[\frac{1}{\alpha_{n}}\right]=\mathcal{G}\left(\alpha_{n}\right)$$

$$\beta_{n+1}:=-\frac{\beta_{n}^{\prime}}{\alpha_{n}}=-\frac{\beta_{n}-x_{n}}{\alpha_{n}},$$

$$\beta_{n}=\begin{cases}-1+\left(b_{n}-1\right)\alpha_{n}-\alpha_{n}\beta_{n+1}=x_{n}-\alpha_{n}\beta_{n+1} & b_{n}\geq 1\\ -\alpha_{n}\beta_{n+1} & b_{n}=0.\end{cases}$$

$$\beta_{n+1}=\mathcal{H}\left(\alpha_{n},\beta_{n}\right):=\begin{cases}-\left\{\frac{\beta_{n}+1}{\alpha_{n}}\right\} & \text{if }b_{n}\geq 1\\ -\frac{\beta_{n}}{\alpha_{n}} & \text{if }b_{n}=0.\end{cases}$$

$$I^{\left(n\right)}:=\begin{cases}(-\alpha^{\left(n-1\right)},\alpha^{\left(n\right)}] & \text{if }n\text{ is odd;}\\ [-\alpha^{\left(n\right)},\alpha^{\left(n-1\right)}) & \text{if }n\text{ is even.}\end{cases}$$

$$\beta_{0}=\sum_{n=0}^{\infty}x^{\left(n\right)},\quad\text{where}\quad x^{\left(n\right)}=\psi_{n}\left(x_{n}\right)=\begin{cases}\left(-1\right)^{n}\alpha^{\left(n-1\right)}\left(-1+\left(b_{n}-1\right)\alpha_{n}\right) & 1\leq b_{n}\leq a_{n}\\ 0 & b_{n}=0\end{cases}$$

Abstract.

We consider deterministic random walks on the real line driven by irrational rotations, or equivalently, skew product extensions of a rotation by $\alpha$ where the skewing cocycle is a piecewise constant mean zero function with a jump by one at a point $\beta$. When $\alpha$ is badly approximable and $\beta$ is badly approximable with respect to $\alpha$, we prove a *Temporal Central Limit Theorem* (in the terminology recently introduced by D. Dolgopyat and O. Sarig), namely we show that for any fixed initial point, the occupancy random variables, suitably rescaled, converge to a Gaussian random variable. This result generalizes and extends a theorem by J. Beck for the special case when $\alpha$ is quadratic irrational, $\beta$ is rational and the initial point is the origin, recently reproved and then generalized to cover any initial point using geometric renormalization arguments by Avila-Dolgopyat-Duryev-Sarig (Israel J., 2015) and Dolgopyat-Sarig (J. Stat. Physics, 2016). We also use renormalization, but in order to treat irrational values of $\beta$, instead of geometric arguments, we use the renormalization associated to the continued fraction algorithm and dynamical Ostrowski expansions. This yields a suitable symbolic coding framework which allows us to reduce the main result to a CLT for non-homogeneous Markov chains.

1. Introduction and results

The main result of this article is a temporal distributional limit theorem (see Section 1.1 below) for certain functions over an irrational rotation (Theorem 1.1 below). In order to introduce and motivate this result, in the first section we define two types of distributional limit theorems in the study of dynamical systems, namely spatial and temporal. Temporal limit theorems in dynamics are the focus of the recent paper [19] by D. Dolgopyat and O. Sarig; we refer the interested reader to [19] and the references therein for a comprehensive introduction to the subject, as well as for a list of examples of dynamical systems known to date to satisfy temporal distributional limit theorems. In Section 1.2 we then focus on irrational rotations, which are one of the most basic examples of low complexity dynamical systems, and recall previous results on temporal limit theorems for rotations, in particular Beck’s temporal CLT. Our main result is stated in Section 1.3, followed by a description of the structure of the rest of the paper in Section 1.4.

1.1. Temporal and Spatial Limits in dynamics.

Distributional limit theorems appear often in the study of dynamical systems as follows. Let $X$ be a complete separable metric space, $m$ a Borel probability measure on $X$, and denote by $\mathcal{B}$ the Borel $\sigma$-algebra on $X$. Let $T:\,X\rightarrow X$ be a Borel measurable map. We call the quadruple $\left(X,\mathcal{B},m,T\right)$ a probability preserving dynamical system and assume that $T$ is ergodic with respect to $m$. Let $f:\,X\rightarrow\mathbb{R}$ be a Borel measurable function and set

$$S_{n}\left(T,f,x\right):=\sum_{k=0}^{n-1}f\circ T^{k}\left(x\right).$$

We will also use the notation $S_{n}\left(x\right)$, or $S_{n}\left(f,x\right)$, instead of $S_{n}\left(T,f,x\right)$ when the underlying transformation or function is clear from the context. The function $S_{n}(x)$ is called the $n^{th}$ Birkhoff sum (or ergodic sum) of the function $f$ over the transformation $T$. The study of Birkhoff sums, their growth and their behavior is one of the central themes in ergodic theory. When the transformation $T$ is ergodic with respect to $m$, by the Birkhoff ergodic theorem, for any $f\in L^{1}(X,m)$ and for $m$-almost every $x\in X$, $S_{n}(f,x)/n$ converges to $\int fdm$ as $n$ grows; equivalently, one can say that the random variables $X_{n}:=f\circ T^{n}(x)$, where $x$ is chosen randomly according to the measure $m$, satisfy the strong law of large numbers. We now introduce some limit theorems which allow one to study the error term in the Birkhoff ergodic theorem.

The function $f$ is said to satisfy a spatial distributional limit theorem (spatial DLT) if there exists a random variable with no atoms $Y$ , and sequences of constants $A_{n},B_{n}\in\mathbb{R}$ , $B_{n}\rightarrow\infty$ , such that the random variables $\frac{S_{n}\left(x\right)-A_{n}}{B_{n}}$ , where $x$ is chosen randomly according to the measure $m$ , converge in distribution to $Y$ . In this case we write

$$\frac{S_{n}-A_{n}}{B_{n}}\overset{dist}{\longrightarrow}Y.$$

Many *hyperbolic* dynamical systems, under some regularity conditions on $f$, satisfy a spatial DLT with the limit being a Gaussian random variable. In the cases that we have in mind, the rate of mixing of the sequence of random variables $X_{n}:=f\circ T^{n}$ is sufficiently fast for them to satisfy the Central Limit Theorem (CLT). On the other hand, in many classical examples of dynamical systems with zero entropy, for which the random variables $X_{n}:=f\circ T^{n}$ are highly correlated, the spatial DLT fails if $f$ is sufficiently regular. For example, this is the case when $T$ is an irrational rotation and $f$ is of bounded variation.

Perhaps surprisingly, many examples of dynamical systems with zero entropy satisfy a CLT when instead of averaging over the space $X$ , one considers the Birkhoff sums $S_{n}\left(x_{0}\right)$ over a single orbit of some fixed initial condition $x_{0}\in X$ . Fix an initial point $x_{0}\in X$ and consider its orbit under $T$ . One can define a sequence of occupation measures on $\mathbb{R}$ by

$$\nu_{n}\left(F\right):=\frac{1}{n}\#\left\{1\leq k\leq n:\,S_{k}\left(x_{0}\right)\in F\right\}$$

for every Borel measurable $F\subset\mathbb{R}$ . One can interpret the quantity $\nu_{n}\left(F\right)$ as the fraction of time that the Birkhoff sums $S_{k}\left(x_{0}\right)$ spend in the set $F$ , up to time $n$ . Let $Y_{n}$ be a sequence of random variables distributed according to $\nu_{n}$ . We say that the pair $(T,f)$ satisfies a temporal distributional limit theorem (temporal DLT) along the orbit of $x_{0}$ , if there exists a random variable with no atoms $Y$ , and two sequences $A_{n}\in\mathbb{R}$ and $B_{n}\rightarrow\infty$ such that $(Y_{n}-A_{n})/B_{n}$ converges in distribution to $Y$ . In other words, the pair $\left(T,f\right)$ satisfies a temporal DLT along the orbit of $x_{0}$ , if

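$$\frac{1}{n}\#\left\{1\leq k\leq n:\,\frac{S_{k}\left(x_{0}\right)-A_{n}}{B_{n}}<a\right\}\underset{n\to\infty}{\longrightarrow}\mathrm{Prob}\left(Y<a\right)$$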

for every $a\in\mathbb{R}$. If the limit $Y$ is a Gaussian random variable, we call this type of behavior a temporal CLT along the orbit of $x_{0}$. Note that this type of result may be interpreted as convergence in distribution of a sequence of normalized random variables, obtained by considering the Birkhoff sums $S_{k}\left(x_{0}\right)$ for $k=1,\dots,n$ and choosing $k$ uniformly at random.
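The definition above can be made concrete with a small numerical sketch. The helper names `birkhoff_sums` and `temporal_cdf` are hypothetical (they do not come from the paper), and the normalizing constants `A` and `B` are simply passed in by the caller; the sketch only evaluates the left-hand side of the temporal DLT for a finite $n$ and says nothing about convergence.

```python
from typing import Callable, List


def birkhoff_sums(T: Callable[[float], float],
                  f: Callable[[float], float],
                  x0: float,
                  n: int) -> List[float]:
    """Return [S_1(x0), ..., S_n(x0)], where S_k(x0) = sum_{j<k} f(T^j(x0))."""
    sums = []
    total = 0.0
    x = x0
    for _ in range(n):
        total += f(x)   # add f(T^j(x0)) for the current j
        x = T(x)        # move one step along the orbit
        sums.append(total)
    return sums


def temporal_cdf(sums: List[float], A: float, B: float, a: float) -> float:
    """Fraction of times k in {1, ..., n} with (S_k - A) / B < a."""
    return sum(1 for s in sums if (s - A) / B < a) / len(sums)
```

With `T` the identity and `f` constantly equal to one, the Birkhoff sums are `1, 2, ..., n`, which gives an easy sanity check of the counting.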

1.2. Beck’s temporal CLT and its generalizations

One example of the occurrence of a temporal CLT in dynamical systems with zero entropy is the following result by Beck, generalizations of which are the main topic of this paper. Let us denote by $R_{\alpha}$ the rotation on the circle $\mathbb{T}=\mathbb{R}/\mathbb{Z}$ by an irrational number $\alpha\in\mathbb{R}$, given by

$$R_{\alpha}\left(x\right)=x+\alpha\mod 1.$$

Let $f_{\beta}:\,\mathbb{T}\to\mathbb{R}$ be the indicator of the interval $[0,\beta)$ where $0<\beta<1$ , rescaled to have mean zero with respect to the Lebesgue measure on $\mathbb{T}$ , namely

$$f_{\beta}\left(x\right)=1_{[0,\beta)}\left(x\right)-\beta.$$

The sequence $\left\{S_{n}\right\}$ of random variables given by the Birkhoff sums $S_{n}(x)=S_{n}\left(R_{\alpha},f_{\beta},x\right)$, where $x$ is taken uniformly with respect to the Lebesgue measure, is sometimes referred to in the literature as the *deterministic random walk* driven by an irrational rotation (see for example [6]).

Beck proved [7, 8] that if $\alpha$ is a quadratic irrational, and $\beta$ is rational, then the pair $(R_{\alpha},f_{\beta})$ satisfies a temporal DLT along the orbit of $x_{0}=0$ . More precisely, he shows that there exist constants $C_{1}$ and $C_{2}$ such that for all $a,b\in\mathbb{R}$ , $a<b$

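$$\frac{1}{n}\#\left\{1\leq k\leq n:\,\frac{S_{k}\left(R_{\alpha},f_{\beta},0\right)-C_{1}\log n}{C_{2}\sqrt{\log n}}\in\left[a,b\right]\right\}\to\frac{1}{\sqrt{2\pi}}\int_{a}^{b}e^{-\frac{x^{2}}{2}}dx.$$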

Beck’s CLT relates to the theory of discrepancy in number theory as follows. If $\alpha\in\mathbb{R}$ is irrational, by unique ergodicity of the rotation $R_{\alpha}$, the sequence $\left\{j\alpha\right\}$ is equidistributed modulo one, i.e. in particular, for any $\beta\in\left[0,1\right]$, if we set

$$N_{k}\left(\alpha,\beta\right):=\#\left\{0\leq j<k\mid 0\leq j\alpha\mod 1<\beta\right\},$$

then $N_{k}\left(\alpha,\beta\right)/k$ converges to $\beta$, or, equivalently, $N_{k}\left(\alpha,\beta\right)=k\beta+o(k)$. Discrepancy theory concerns the study of the error term in the expression $N_{k}\left(\alpha,\beta\right)=k\beta+o(k)$. Beck’s result hence says that, when $\alpha$ is a quadratic irrational and $\beta$ is *rational*, the error term $\overline{N_{k}}\left(\alpha,\beta\right):=N_{k}\left(\alpha,\beta\right)-k\beta$, when $k$ is chosen uniformly in $\left\{1,\dots,n\right\}$, can be normalized so that it converges to the standard Gaussian distribution as $n$ grows to infinity.
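As an illustration of the error term discussed above, the following sketch computes $N_{k}(\alpha,\beta)$ and $N_{k}(\alpha,\beta)-k\beta$ in floating point. The function names are hypothetical, and the choice of the golden mean for $\alpha$ with $\beta=1/2$ is only an example of a quadratic irrational with rational $\beta$; floating-point rounding near the endpoint $\beta$ is ignored.

```python
import math
from typing import List


def visit_counts(alpha: float, beta: float, n: int) -> List[int]:
    """Return [N_1, ..., N_n], where N_k = #{0 <= j < k : (j * alpha mod 1) < beta}."""
    counts = []
    hits = 0
    for j in range(n):
        if (j * alpha) % 1.0 < beta:
            hits += 1
        counts.append(hits)  # this entry is N_{j+1}
    return counts


def discrepancy_errors(alpha: float, beta: float, n: int) -> List[float]:
    """Return the error terms N_k - k * beta for k = 1, ..., n."""
    counts = visit_counts(alpha, beta, n)
    return [N_k - k * beta for k, N_k in enumerate(counts, start=1)]


golden = (math.sqrt(5) - 1) / 2          # example quadratic irrational
errors = discrepancy_errors(golden, 0.5, 1000)
print(max(abs(e) for e in errors))       # stays small compared with n = 1000
```

Feeding `errors` into an empirical distribution function (with centering and scaling chosen by the reader) gives a quick way to look at the temporal distribution described in the text.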

Let us also remark that the Birkhoff sums in the statement of Beck’s theorem are related to the dynamics of the map $T_{f_{\beta}}:\mathbb{T}\times\mathbb{R}\rightarrow\mathbb{T\times R}$ , defined by

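$$T_{f_{\beta}}\left(x,y\right)=\left(R_{\alpha}\left(x\right),y+f_{\beta}\left(x\right)\right),\qquad\left(x,y\right)\in\mathbb{T}\times\mathbb{R},$$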

since one can see that the iterates of $T_{f_{\beta}}$ have the form $T_{f_{\beta}}^{n}\left(x,y\right)=\left(R_{\alpha}^{n}\left(x\right),y+S_{n}\left(f_{\beta},x\right)\right)$. This skew product map has been studied as one basic example in infinite ergodic theory and there is a long history of results on it, starting from ergodicity (see for example [30, 15, 3, 27, 6, 2]).

Recently, in [6], a new proof of Beck’s theorem was given for the special case where $\beta=\frac{1}{2}$, which uses dynamical and geometrical renormalization tools. It is crucially based on the interpretation of the corresponding skew-product map $T_{f_{1/2}}$ as the Poincaré map of a flow on the staircase periodic surface, which was noticed and pointed out in [22]. In [19] this method is generalized to show that for any initial point $x$, any quadratic irrational $\alpha$ and any *rational* $\beta$, there exists a sequence $A_{n}:=A_{n}\left(\alpha,\beta,x\right)$ and a constant $B:=B\left(\alpha,\beta\right)$ such that

$$\frac{1}{n}\#\left\{1\leq k\leq n:\,\frac{S_{k}\left(R_{\alpha},f_{\beta},x\right)-A_{n}}{B\sqrt{\log n}}\in\left[a,b\right]\right\}\to\frac{1}{\sqrt{2\pi}}\int_{a}^{b}e^{-\frac{x^{2}}{2}}dx$$

for all $a,b\in\mathbb{R}$, $a<b$. Dolgopyat and Sarig showed us how to use the staircase method to prove the temporal CLT also in the case when $\alpha$ is badly approximable, for a.e. $x$ and $\beta=1/2$ (private communication), but their methods do not apply to the more general class of values of $\beta$ that we treat in this paper. They also informed us that they can show that the temporal CLT does not hold for a.e. $(\alpha,x)$ and $\beta=1/2$.

1.3. Main result and comments

The main result of this paper is the following generalization of Beck’s temporal CLT, in which we consider certain irrational values of $\beta$ and badly approximable values of $\alpha$. Let us recall that $\alpha$ is *badly approximable* (or equivalently, $\alpha$ is of *bounded type*) if there exists a constant $c>0$ such that $\left|\alpha-p/q\right|\geq c/q^{2}$ for any integers $p,q$ with $q\neq 0$. Equivalently, $\alpha$ is badly approximable if the continued fraction entries of $\alpha$ are uniformly bounded. For $\alpha\in\left(0,1\right)\setminus\mathbb{Q}$ let us say that $\beta$ is badly approximable with respect to $\alpha$ if there exists a constant $C>0$ such that

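$$\left|q\alpha-\beta-p\right|>\frac{C}{\left|q\right|}\quad\text{for all }p\in\mathbb{Z},\ q\in\mathbb{Z}\setminus\left\{0\right\}.$$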

One can show that, given a badly approximable $\alpha$, the set of $\beta$ which are badly approximable with respect to $\alpha$ has full Hausdorff dimension.
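The condition above can be probed numerically, with the caveat that a finite search can only suggest, never certify, that a constant $C>0$ exists. The sketch below computes the minimum of $|q|\cdot\operatorname{dist}(q\alpha-\beta,\mathbb{Z})$ over $0<|q|\leq Q$; the function name and the cutoff `Q` are assumptions made for illustration, and floating-point arithmetic limits how large `Q` can sensibly be.

```python
import math


def scaled_distance_minimum(alpha: float, beta: float, Q: int) -> float:
    """Minimum of |q| * dist(q * alpha - beta, Z) over nonzero integers |q| <= Q."""
    best = math.inf
    for q in range(-Q, Q + 1):
        if q == 0:
            continue
        t = q * alpha - beta
        dist = abs(t - round(t))       # distance from t to the nearest integer p
        best = min(best, abs(q) * dist)
    return best


golden = (math.sqrt(5) - 1) / 2
# Rational beta with a badly approximable alpha: the minimum stays away from zero.
print(scaled_distance_minimum(golden, 0.5, 2000))
# beta = alpha lies on the orbit of 0 (take q = 1, p = 0), so the condition fails.
print(scaled_distance_minimum(golden, golden, 10))
```

The second call illustrates why points $\beta$ of the form $q\alpha-p$ are excluded by the definition.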

Theorem 1.1.

Let $0<\alpha<1$ be a badly approximable irrational number. For every $\beta$ badly approximable with respect to $\alpha$ and every $x\in\mathbb{T}$ there exists a sequence of centralizing constants $A_{n}:=A_{n}\left(\alpha,\beta,x\right)$ and a sequence of normalizing constants $B_{n}:=B_{n}\left(\alpha,\beta\right)$ such that for all $a<b$

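$$\frac{1}{n}\#\left\{1\leq k\leq n:\,\frac{S_{k}\left(R_{\alpha},f_{\beta},x\right)-A_{n}}{B_{n}}\in\left[a,b\right]\right\}\to\frac{1}{\sqrt{2\pi}}\int_{a}^{b}e^{-\frac{x^{2}}{2}}dx.$$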

In other words, for every badly approximable $\alpha$ and any $\beta$ badly approximable with respect to $\alpha$, the pair $\left(R_{\alpha},f_{\beta}\right)$ satisfies the temporal CLT along the orbit of any $x\in\mathbb{T}$. Note that the centralizing constants depend on $x$, while the normalizing constants do not. We will see in Section 4.1 that badly approximable numbers with respect to $\alpha$ can be explicitly described in terms of their Ostrowski expansion, using an adaptation of the continued fraction algorithm in the context of non-homogeneous Diophantine approximation. Let us recall that quadratic irrationals are in particular badly approximable. Moreover, when $\alpha$ is badly approximable, it follows from the definition that any rational number $\beta$ is badly approximable with respect to $\alpha$. Thus, this theorem, already in the special case in which $\alpha$ is assumed to be a quadratic irrational, gives a strict generalization of the results mentioned above, since it includes irrational values of $\beta$. As we already pointed out, the temporal limit theorem fails to hold for almost every value of $\alpha$. It would be interesting to see whether a temporal CLT holds for a larger class of values of $\beta$.

While the proof of Theorem 1.1 was inspired and motivated by an insight of Dolgopyat and Sarig and is based, as theirs, on renormalization, we stress that our renormalization scheme and the formalism that we develop are different. As remarked in the previous section, the proof of Beck’s theorem in [6, 19] exploits a geometric renormalization which is based on the link with the staircase flow and the existence of affine diffeomorphisms which renormalize certain directions of directional flows on this surface. This geometrical insight, unfortunately, as well as the interpretation of the map $T_{f}$ as the Poincaré map of a staircase flow, breaks down when $\beta$ is not rational. Our proof does not rely on this geometric picture, but uses only the more classical renormalization given by the continued fraction algorithm for rotations, with the additional information encoded by Ostrowski expansions in the context of non-homogeneous Diophantine approximation (see Section 2). This renormalization allows one to encode the dynamics symbolically and reduce it to the formalism of adic and Vershik maps [37].

There is a large literature of results on limiting distributions for entropy zero dynamical systems, see for example [13, 14, 12, 18, 20, 35]. Let us mention two recent results in the context of substitution systems which are related to our work. Bressaud, Bufetov and Hubert proved in [11] a spatial CLT for substitutions with eigenvalues of modulus one along a subsequence of times. In the same context (substitutions with eigenvalues of modulus one), Paquette and Son [28] recently also proved a *temporal* CLT. In [1] a temporal CLT over quadratic irrational rotations and $\mathbb{R}^{d}$-valued, piecewise constant functions with rational discontinuities is shown to hold along subsequences.

While we wrote this paper specifically for deterministic random walks driven by rotations, there are other entropy zero dynamical systems where this formalism applies and for which one can prove temporal limit theorems using similar techniques. For example, in work in progress, we can prove temporal limit theorems also for certain linear flows on infinite translation surfaces and some cocycles over interval exchange transformations, and more generally for certain $\mathcal{S}$-adic systems (which are non-stationary generalizations of substitution systems, see [9]).

1.4. Proof tools and sketch and outline of the paper

In Section 2 we introduce the renormalization algorithm that we use as a key tool in the proofs: this is essentially the classical multiplicative continued fraction algorithm, with additional data which records the relative position of the break point $\beta$ of the function $f_{\beta}$ under renormalization. This renormalization acts on the underlying parameter space, to be defined in what follows, as a (skew-product) extension of the Gauss map, and it produces simultaneously the continued fraction expansion entries of $\alpha$ and the Ostrowski expansion entries of $\beta$. Variations on this skew product have been studied by several authors (see in particular [5, 33]) and it is well known that it is related to a section of the diagonal flow on the space of affine lattices (as explained in detail in [5]). In Sections 2.4 and 2.5 we explain how the renormalization algorithm provides a way of encoding dynamics symbolically in terms of a Markov chain. More precisely, the dynamics of the map $R_{\alpha}$ we are interested in translates in symbolic language to the adic or Vershik dynamics (on a Bratteli diagram given by the Markov chain), as explained in section. The original function $f_{\beta}$ defines under renormalization a sequence of induced functions (which correspond to Birkhoff sums of the function $f_{\beta}$ at first return times, called special Birkhoff sums in the terminology introduced by [25]). The Birkhoff sums of the function $f_{\beta}$ can then be decomposed into sums of special Birkhoff sums. This formalism and the symbolic coding allow one to translate the study of the temporal visit distribution random variable to the study of a non-homogeneous Markov chain, see Section 2.6. In Section 3 we provide sufficient conditions for a non-homogeneous Markov chain to satisfy the CLT. Finally, in Section 4 we prove that these conditions are satisfied for the Markov chain modeling the temporal distribution random variables.

2. Renormalization

2.1. Preliminaries on continued fraction expansions and circle rotations

Let $\mathcal{G}:(0,1)\to(0,1)$ be the Gauss map, given by $\mathcal{G}(x)=\left\{1/x\right\}$, where $\{\cdot\}$ denotes the fractional part. Recall that a regular continued fraction expansion of $\alpha\in\left(0,1\right)\setminus\mathbb{Q}$ is given by

$$\alpha=\cfrac{1}{a_{0}+\cfrac{1}{a_{1}+\dots}}$$

where $a_{i}:=a\left(\alpha_{i}\right)=\left[\frac{1}{\alpha_{i}}\right]$ and $\alpha_{i}:=\mathcal{G}^{i}(\alpha)=\left\{\frac{1}{\alpha_{i-1}}\right\}$. In this case we write $\alpha=\left[a_{0},a_{1},...\right]$. Setting $q_{-1}=1$, $q_{0}=a_{0}$, $q_{n}=a_{n}q_{n-1}+q_{n-2}$ for $n\geq 1$, and $p_{-1}=0$, $p_{0}=1$, $p_{n}=a_{n}p_{n-1}+p_{n-2}$ for $n\geq 1$, we have $\gcd\left(p_{n},q_{n}\right)=1$ and

$$\cfrac{1}{a_{0}+\cfrac{1}{a_{1}+\dots+\cfrac{1}{a_{n}}}}=\frac{p_{n}}{q_{n}}.$$

Let $\alpha\in\left(0,1\right)\setminus\mathbb{Q}$, $\mathbb{T}:=\mathbb{R}/\mathbb{Z}$ and $R_{\alpha}:\mathbb{T}\rightarrow\mathbb{T}$ be the irrational rotation given by $R_{\alpha}(x):=x+\alpha\mod 1$. Then the Denjoy-Koksma inequality [21, 24] states that if $f:\mathbb{T}\rightarrow\mathbb{R}$ is a function of bounded variation, then for any $n\in\mathbb{N}$,

$$\sup\left\{\left|f\circ R_{\alpha}^{q_{n}}\left(x\right)\right|:\,x\in\mathbb{T}\right\}\leq\bigvee_{\mathbb{T}}f$$

where $\bigvee_{\mathbb{T}}f$ is the variation of $f$ on $\mathbb{T}$ .
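The recursion for the convergents recalled at the start of this section can be checked with exact rational arithmetic. This is only a sketch: the function names are hypothetical, and a rational number is used as a stand-in for $\alpha$ (so the expansion terminates), since an irrational $\alpha$ cannot be represented exactly. The indexing follows the convention above, in which the first entry is $a_{0}$ and $p_{0}/q_{0}=1/a_{0}$.

```python
from fractions import Fraction
from typing import List


def continued_fraction_entries(alpha: Fraction, max_entries: int) -> List[int]:
    """Entries a_0, a_1, ... with alpha = 1/(a_0 + 1/(a_1 + ...)), for alpha in (0, 1)."""
    entries = []
    x = alpha
    while x != 0 and len(entries) < max_entries:
        inverse = 1 / x
        a = inverse.numerator // inverse.denominator  # integer part [1/x]
        entries.append(a)
        x = inverse - a                               # Gauss map: fractional part {1/x}
    return entries


def convergents(entries: List[int]) -> List[Fraction]:
    """Convergents p_n/q_n from q_{-1}=1, q_0=a_0, p_{-1}=0, p_0=1 and the usual recursion."""
    p_prev, q_prev = 0, 1          # p_{-1}, q_{-1}
    p, q = 1, entries[0]           # p_0, q_0
    result = [Fraction(p, q)]
    for a in entries[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append(Fraction(p, q))
    return result
```

For a rational input the last convergent reproduces the input exactly, which gives a simple consistency check of the indexing.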

In this section we define the dynamical renormalization algorithm we use in this paper, which is an extension of the classical continued fraction algorithm and hence of the Gauss map. This algorithm gives a dynamical interpretation of the notion of Ostrowski expansion of $\beta$ relative to $\alpha$ in non-homogeneous Diophantine approximation. We mostly follow the conventions of the paper [5] by Arnoux and Fisher, in which the connection between this renormalization and homogeneous dynamics (in particular the geodesic flow on the space of lattices with a marked point, which is also known as the scenery flow) is highlighted. As in [5] we use a different convention for rotations on the circle. Let $\alpha\in\left(0,1\right)\setminus\mathbb{Q}$, $I=\left[-1,\alpha\right)$ and let $T_{\alpha}:\left[-1,\alpha\right)\rightarrow\left[-1,\alpha\right)$ be defined by

$$T_{\alpha}\left(x\right)=\begin{cases}x+\alpha & x\in[-1,0)\\ x-1 & x\in[0,\alpha)\end{cases}\tag{2.2}$$

Note that $T_{\alpha}$ may also be viewed as a rotation on the circle $\mathbb{R}/\sim$ where the equivalence relation $\sim$ on $\mathbb{R}$ is given by $x\sim y\iff x-y\in\left(1+\alpha\right)\mathbb{Z}$. It is conjugate to the standard rotation $R_{\alpha^{\prime}}$ on $\mathbb{T}$, where $\alpha^{\prime}=\frac{\alpha}{1+\alpha}$, by the map $\psi(x)=\left(\alpha+1\right)x-1$ which maps the unit interval $[0,1]$ to the interval $[-1,\alpha]$.
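The conjugacy can be verified on sample points with exact arithmetic. The sketch below assumes the affine map $\psi(x)=(1+\alpha)x-1$, that is, the increasing affine map sending $[0,1]$ onto $[-1,\alpha]$, and uses a rational value as a stand-in for $\alpha$ so that equality can be tested exactly; all function names are hypothetical.

```python
from fractions import Fraction


def rotation_unit(x: Fraction, alpha_prime: Fraction) -> Fraction:
    """Standard rotation R_{alpha'} on [0, 1)."""
    y = x + alpha_prime
    return y - 1 if y >= 1 else y


def rotation_shifted(x: Fraction, alpha: Fraction) -> Fraction:
    """T_alpha on [-1, alpha): x + alpha on [-1, 0), x - 1 on [0, alpha)."""
    return x + alpha if x < 0 else x - 1


def psi(x: Fraction, alpha: Fraction) -> Fraction:
    """Affine map sending [0, 1] onto [-1, alpha] (assumed form of the conjugacy)."""
    return (1 + alpha) * x - 1


def conjugacy_holds(alpha: Fraction, denominator: int = 200) -> bool:
    """Check psi(R_{alpha'}(x)) == T_alpha(psi(x)) on the grid x = i / denominator."""
    alpha_prime = alpha / (1 + alpha)
    for i in range(denominator):
        x = Fraction(i, denominator)
        if psi(rotation_unit(x, alpha_prime), alpha) != rotation_shifted(psi(x, alpha), alpha):
            return False
    return True
```

A grid check like this is not a proof, but it catches sign or endpoint mistakes in the formulas quickly.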

*Remark 2.1.*

In what follows, we slightly abuse notation by not distinguishing between the transformation $T_{\alpha}$ and the transformation defined similarly on the interval $\left(-1,\alpha\right]$ by

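$$T_{\alpha}^{\prime}\left(x\right)=\begin{cases}x+\alpha & x\in(-1,0];\\ x-1 & x\in[0,\alpha);\end{cases}$$

when viewed as transformations on the circle, $T_{\alpha}$ and $T_{\alpha}^{\prime}$ coincide.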

Note that given an irrational rotation $R_{\alpha}$ , we can assume without loss of generality that $\alpha<\frac{1}{2}$ (otherwise consider the inverse rotation by $1-\alpha$ ). If we set

$$\alpha_{0}:=\frac{\alpha}{1-\alpha}$$

then $\mathcal{G}^{i}(\alpha)=\mathcal{G}^{i}(\alpha_{0})$ for any $i\in\mathbb{N}$ and thus, apart from the first entry, the continued fraction entries of $\alpha$ and $\alpha_{0}$ coincide. If $a_{0}$, $a_{0}^{\prime}$ are respectively the first entries in the expansions of $\alpha_{0}$ and $\alpha$, then $a_{0}^{\prime}=a_{0}+1$ (since $1/\alpha_{0}=1/\alpha-1$). Furthermore, given $\beta\in\left(0,1\right)$, let

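$$\beta_{0}:=\left(\alpha+1\right)\beta-1.$$

Then the mean zero function with a discontinuity at $\beta_{0}$, given by

$$\varphi\left(x\right)=1_{[-1,\beta_{0})}\left(x\right)-\frac{\beta_{0}+1}{\alpha_{0}+1}$$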

is the function that corresponds to the function $f_{\beta}$ in the introduction under the conjugation between $R_{\alpha}$ and $T_{\alpha_{0}}$ . Therefore, we are interested in the Birkhoff sums

$$\varphi_{n}\left(x\right)=\sum_{k=0}^{n-1}\varphi\left(T_{\alpha_{0}}^{k}\left(x\right)\right).$$

Henceforth, unless explicitly stated otherwise, we work with the transformation $T_{\alpha_{0}}$ . The sequences $\left(a_{n}\right)_{n=0}^{\infty}$ , $\left(\frac{p_{n}}{q_{n}}\right)_{n=0}^{\infty}$ will correspond to the sequence of entries and the sequence of partial convergents in the continued fraction expansion of $\alpha=\frac{\alpha_{0}}{1+\alpha_{0}}$ .

We denote by $\lambda$ the Lebesgue measure on $\left[-1,\alpha_{0}\right)$ normalized to have total mass $1$ .

2.2. Continued fraction renormalization and Ostrowski expansion

The renormalization procedure is an inductive procedure, where at each stage we induce the original transformation $T_{\alpha_{0}}$ onto a subinterval of the interval we induced upon at the previous stage. We denote by $I^{\left(n\right)}$ the nested sequence of intervals which we induce upon, and by $T^{\left(n\right)}$ the first return map of $T_{\alpha_{0}}$ onto $I^{\left(n\right)}$ . The nested sequence of intervals $I^{\left(n\right)}$ is chosen in such a way that the induced transformations $T^{\left(n\right)}$ are all irrational rotations. The next paragraph describes a step of induction given an irrational rotation $T_{\alpha_{n}}$ on the interval $I_{n}=\left[-1,\alpha_{n}\right)$ defined by (2.2). The procedure is then iterated recursively by rescaling and performing the induction step once again. In general, we keep to the convention that we use $n$ as a superscript to denote objects related to the non-rescaled $n$ th step of renormalization, and as a subscript for the rescaled version.

One step of renormalization

For an irrational $\alpha_{n}\in\left(0,1\right)$ let $I_{n}:=[-1,\alpha_{n})$, let $T_{\alpha_{n}}:I_{n}\rightarrow I_{n}$ be defined by the formula in (2.2) and let $\beta_{n}\in I_{n}$. Then $T_{\alpha_{n}}$ is an exchange of two intervals of lengths $\alpha_{n}$ and $1$ respectively (namely $\left[0,\alpha_{n}\right)$ and $\left[-1,0\right)$). The renormalization step consists of inducing $T_{\alpha_{n}}$ onto an interval $I_{n}^{\prime}$, where $I_{n}^{\prime}$ is obtained by cutting a half-open interval of size $\alpha_{n}$ from the left endpoint of the interval $I_{n}$, i.e. $-1$, as many times as possible in order to obtain an interval of the form $[-\alpha^{\prime}_{n},\alpha_{n})$ containing zero. More precisely, let

$$a_{n}=\left[1/\alpha_{n}\right],\qquad\alpha_{n}^{\prime}=1-a_{n}\alpha_{n}$$

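so that $[-1,0)$ contains exactly $a_{n}$ intervals of length $\alpha_{n}$ plus an additional remainder of length $0<\alpha^{\prime}_{n}<\alpha_{n}$ (see Figure 2.1). If $\beta_{n}\in[-1,-\alpha^{\prime}_{n})$, let $1\leq b_{n}\leq a_{n}$ be such that $\beta_{n}$ belongs to the $b_{n}^{th}$ copy of the interval which is cut, otherwise set $b_{n}:=0$, i.e. define

$$b_{n}:=\begin{cases}\left[\left(\beta_{n}-\left(-1\right)\right)/\alpha_{n}\right]+1=\left[\left(1+\beta_{n}\right)/\alpha_{n}\right]+1 & \text{if }\beta_{n}\in[-1+\left(b_{n}-1\right)\alpha_{n},-1+b_{n}\alpha_{n})\\ 0 & \text{if }\beta_{n}\in[-\alpha_{n}^{\prime},\alpha_{n}).\end{cases}\tag{2.7}$$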

For $b_{n}\geq 1$ , let us define $x_{n}$ to be the left endpoint of the copy of the interval which contains $\beta_{n}$ , otherwise, if $b_{n}=0$ , set $x_{n}:=0$ ; let also $\beta_{n}^{\prime}:=\beta_{n}-x_{n}$ , so that if $b_{n}\geq 1$ then $\beta_{n}^{\prime}$ is the distance of $\beta_{n}$ from the left endpoint of the interval which contains it (Figure 2.1). In formulas

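$$x_{n}:=\begin{cases}-1+\left(b_{n}-1\right)\alpha_{n} & \text{if }b_{n}\geq 1\\ 0 & \text{if }b_{n}=0,\end{cases}\qquad\beta_{n}^{\prime}:=\begin{cases}\beta_{n}+1-\left(b_{n}-1\right)\alpha_{n} & \text{if }b_{n}\geq 1\\ \beta_{n} & \text{if }b_{n}=0.\end{cases}$$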

Notice that $x_{n}=T_{\alpha_{n}}^{b_{n}}(0)$ and hence in particular it belongs to the segment $\left\{0,T_{\alpha_{n}}(0),\dots,T_{\alpha_{n}}^{a_{n}}(0)\right\}$ of the orbit of $0$ under $T_{\alpha_{n}}$.

Let $I_{n}^{\prime}=[-\alpha^{\prime}_{n},\alpha_{n})$ and note that $\beta_{n}^{\prime}\in I_{n}^{\prime}$ and that the induced transformation obtained as the first return map of $T_{\alpha_{n}}$ on $I_{n}^{\prime}$ is again an exchange of two intervals, a short one $\left[-\alpha_{n}^{\prime},0\right)$ and a long one $\left[0,\alpha_{n}\right)$. Hence, if we renormalize and *flip* the picture by multiplying by $-\alpha_{n}$, the interval $I_{n}^{\prime}$ is mapped to $I_{n+1}:=\left(-1,\alpha_{n+1}\right]$, where

$$\alpha_{n+1}:=\frac{\alpha_{n}^{\prime}}{\alpha_{n}}=\frac{1-a_{n}\alpha_{n}}{\alpha_{n}}=\frac{1}{\alpha_{n}}-\left[\frac{1}{\alpha_{n}}\right]=\mathcal{G}\left(\alpha_{n}\right)$$

and the first return map of $T_{\alpha_{n}}$ on the interval $I_{n}^{\prime}$ is conjugated to $T_{\alpha_{n+1}}$. We then set

[TABLE]

so that $\beta_{n+1}\in I_{n+1}.$ Thus we have defined $\alpha_{n+1}$ , $\beta_{n+1}$ and $T_{\alpha_{n+1}}$ and completed the description of the step of induction.
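The induction step can be sketched numerically as follows. This is only a minimal sketch under stated assumptions, not the authors' code: the closed forms for $a_{n}$, $b_{n}$, $x_{n}$ and $\beta_{n+1}$ are read off from the verbal description above (in particular $x_{n}=-1+(b_{n}-1)\alpha_{n}$ when $b_{n}\geq 1$, and the flip and rescaling is taken to be division by $-\alpha_{n}$), the half-open conventions at the endpoints are ignored, and the name `renormalization_step` is hypothetical.

```python
from fractions import Fraction
from math import floor


def renormalization_step(alpha, beta):
    """One step (alpha_n, beta_n) -> (alpha_{n+1}, beta_{n+1}).

    Assumes 0 < alpha < 1 and -1 <= beta < alpha.
    Returns the tuple (a, b, x, alpha_next, beta_next).
    """
    a = floor(1 / alpha)              # number of copies of length alpha inside [-1, 0)
    alpha_rem = 1 - a * alpha         # remainder alpha'_n, with 0 < alpha'_n < alpha
    if beta < -alpha_rem:
        # beta lies in the b-th copy [-1 + (b - 1) * alpha, -1 + b * alpha)
        b = floor((beta + 1) / alpha) + 1
        x = -1 + (b - 1) * alpha      # left endpoint of that copy
    else:
        b, x = 0, 0                   # beta already lies in I'_n
    beta_rem = beta - x               # beta'_n, a point of I'_n
    alpha_next = alpha_rem / alpha    # equals 1/alpha - a
    beta_next = -beta_rem / alpha     # flip and rescale
    return a, b, x, alpha_next, beta_next
```

For instance, `renormalization_step(Fraction(2, 5), Fraction(-1, 2))` can be evaluated exactly; a rational $\alpha_{n}$ is used there only to keep the arithmetic exact for a single step, whereas the text assumes $\alpha_{n}$ irrational.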

Notice that by definition of $\beta_{n}^{\prime}$ and $\beta_{n+1}$ we have that

[TABLE]

and hence, by using equation (2.7), we get

[TABLE]

Repeating the described procedure, one can prove by induction the assertions summarized in the next proposition.

Proposition 2.2.

Let $\alpha^{\left(n\right)}:=\alpha_{0}\cdot...\cdot\alpha_{n}$ where $\alpha_{i}$ are defined inductively from $\alpha_{0}$ by $\alpha_{n}=\mathcal{G}\left(\alpha_{n-1}\right)$ and set $\alpha^{\left(-1\right)}=1$ . Define a sequence of nested intervals $I^{\left(n\right)}$ , $n=0,1,...$ , by $I^{\left(0\right)}:=\left[-1,\alpha^{\left(0\right)}\right)$ , and

[TABLE]

The induced map $T^{\left(n\right)}$ of $T_{\alpha_{0}}$ on $I^{\left(n\right)}$ is conjugated to $T_{\alpha_{n}}$ on the interval $I_{n}=\left[-1,\alpha_{n}\right)$ if $n$ is even or to $T_{\alpha_{n}}$ on $I_{n}=\left(-1,\alpha_{n}\right]$ if $n$ is odd, where the conjugacy is given by $\psi_{n}:I_{n}\rightarrow I^{\left(n\right)}$ , $\psi_{n}\left(x\right)=\left(-1\right)^{n}\alpha^{\left(n-1\right)}\left(x\right)$ .

Let $\beta_{0}\in I^{\left(0\right)}$ and let $(b_{n})_{n}$ and $(\beta_{n})_{n}$ be the sequences defined inductively by the formulas (2.7) and (2.10). (Note that, given $\beta_{n}$, formulas (2.7) and (2.10) determine first $b_{n}$ and then, as a function of $\beta_{n}$ and $b_{n}$, also $\beta_{n+1}$ and hence $b_{n+1}$.) Then we have

[TABLE]

and the remainders are given by

[TABLE]

The expansion in (2.11) is an Ostrowski type expansion for $\beta_{0}$ in terms of $\alpha_{0}$ . We call the integers $b_{n}$ the entries in the Ostrowski expansion of $\beta_{0}$ .
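The expansion can be explored numerically by iterating the renormalization step. The sketch below is an illustration under assumptions rather than the paper's algorithm: the step formulas are reconstructed from the prose of the previous subsection, the terms are taken to be $x^{(n)}=(-1)^{n}\alpha^{(n-1)}x_{n}$ (the image of $x_{n}$ under $\psi_{n}$), and the identity checked is the telescoping one that follows from $\beta_{n}=x_{n}+\beta_{n}^{\prime}$ and $\beta_{n}^{\prime}=-\alpha_{n}\beta_{n+1}$. Floating point is used, so boundary cases and long iterations are not reliable; the name `ostrowski_data` is hypothetical.

```python
from math import floor, sqrt


def ostrowski_data(alpha0, beta0, steps):
    """Iterate the renormalization; return (entries, partial_sum, remainder).

    entries     : list of pairs (a_n, b_n) for n = 0, ..., steps - 1
    partial_sum : sum of (-1)^n * alpha^(n-1) * x_n over the same n
    remainder   : (-1)^steps * alpha^(steps-1) * beta_steps
    """
    alpha, beta = alpha0, beta0
    scale = 1.0                       # (-1)^n * alpha^(n-1), with alpha^(-1) = 1
    entries, partial_sum = [], 0.0
    for _ in range(steps):
        a = floor(1 / alpha)
        alpha_rem = 1 - a * alpha
        if beta < -alpha_rem:
            b = floor((beta + 1) / alpha) + 1
            x = -1 + (b - 1) * alpha
        else:
            b, x = 0, 0.0
        entries.append((a, b))
        partial_sum += scale * x
        beta = -(beta - x) / alpha    # beta_{n+1}
        scale *= -alpha               # update sign and product of the alphas
        alpha = alpha_rem / alpha     # alpha_{n+1}
    return entries, partial_sum, scale * beta


# alpha_0 = sqrt(2) - 1 has all continued fraction entries equal to 2
entries, partial_sum, remainder = ostrowski_data(sqrt(2) - 1, -0.3, 8)
```

By construction `partial_sum + remainder` reproduces $\beta_{0}$ up to rounding, and the remainder is bounded in absolute value by the product $\alpha_{0}\cdots\alpha_{7}$, which is what makes the partial sums converge.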

*Remark 2.3.*

Partial approximations in the Ostrowski expansions have the following dynamical interpretation. It is well known that, for any $n\in\mathbb{N}$, the finite segment $\left\{T_{\alpha_{0}}^{i}\left(0\right):\ i=0,...,q_{n}+q_{n-1}-1\right\}$ of the orbit of $0$ under $T_{\alpha_{0}}$ (which can be thought of as a rotation on a circle) induces a partition of $[-1,\alpha_{0})$ into intervals of two lengths (see for example [34]; these partitions correspond to the classical Rokhlin-Kakutani representation of a rotation as two towers over an induced rotation given by the Gauss map, see also Section 2.3 and Remark 2.6). The finite Ostrowski approximation $\sum_{k=0}^{n}x^{\left(k\right)}$ gives one of the endpoints of the unique interval of this partition which contains $\beta_{0}$ (whether it is the left or the right one depends on the parity of $n$ as well as on whether $b_{n}$ is zero or not). In particular, we have that

[TABLE]

*Remark 2.4.*

Since the points $\alpha^{\left(n\right)}$ are all in the orbit of the point $0$ under the rotation $T_{\alpha_{0}}$, it follows from the correspondence between $T_{\alpha_{0}}$ and $R_{\alpha}$ that the Ostrowski expansion of $\beta_{0}$ appearing in the previous proposition is finite, i.e. $\beta_{0}=\sum_{n=0}^{N}x^{\left(n\right)}$ for some $N\in\mathbb{N}$, if and only if $\beta\in\left\{n\alpha\ \mod 1:\ n\in\mathbb{Z}\right\}$. This condition is well known to be equivalent to the function $f_{\beta}$ (and hence also $\varphi$) being a coboundary (see [29]).

It follows from the description of the renormalization algorithm that

[TABLE]

where the function $\mathcal{H}$ is defined by (2.10). The ergodic properties of a variation on the map

[TABLE]

were studied among others in [33].

Introduce the functions $a,b:X\to\mathbb{N}$ defined by

[TABLE]

The functions are defined so that the sequences $\left(a_{n}\right)_{n}$ and $\left(b_{n}\right)_{n}$ of continued fractions and Ostrowski entries are respectively given by $a_{n}=a\left(\hat{\mathcal{G}}^{n}\left(\alpha_{0},\beta_{0}\right)\right)$ , $b_{n}=b\left(\hat{\mathcal{G}}^{n}\left(\alpha_{0},\beta_{0}\right)\right)$ for any $n\in\mathbb{N}.$

By Remark 2.4, the restriction of the space $X$ to

[TABLE]

is invariant with respect to $\hat{\mathcal{G}}$ and we partition this space into three sets $X_{G},$ $X_{B_{-}}$, $X_{B_{+}}\subset\tilde{X}$ defined by

[TABLE]

Explicitly, in terms of the relative position of $\alpha,\beta$ , these sets are given by

[TABLE]

The reason for the choice of names $G$, $B_{-}$, $B_{+}$ for the three parts of parameter space, which stand for Good ($G$) and Bad ($B$), where Bad has two subcases, $B_{-}$ and $B_{+}$ (according to whether $\beta$ is negative or positive), will be made clear in Section 4.1.

2.3. Description of the Kakutani-Rokhlin towers obtained from renormalization.

We assume throughout the present Section and Sections 2.4, 2.5 that we are given a fixed pair $\left(\alpha_{0},\beta_{0}\right)\in\tilde{X}$ . The symbols $q_{n}$ used in this Section refer to the denominators of the $n^{th}$ convergent in the continued fraction expansion of $\alpha$ , where $\alpha$ is related to $\alpha_{0}$ via (2.3).

The renormalization algorithm described above defines a nested sequence of intervals $I^{\left(n\right)}$. We describe below how the original transformation $T_{\alpha_{0}}$ can be represented as a union of *towers* in a Kakutani skyscraper (the definition is given below) with base $I^{\left(n\right)}$; the tower structure of the skyscraper corresponding to the $(n+1)^{th}$ stage of renormalization is obtained from the towers of the previous skyscraper, corresponding to the $n^{th}$ stage, by a cutting and stacking procedure. We will use these towers to describe what we call an *adic* symbolic coding of the interval $I=\left[-1,\alpha_{0}\right)$ (see Section 2.4). In what follows, we give a detailed description of the tower structure and the coding.

Let us first recall that if a measurable set $B\subset[-1,\alpha_{0})$ and a natural integer $h$ are such that the union $\bigcup_{i=0}^{h-1}T_{\alpha_{0}}^{i}B$ is disjoint, we say that the union is a (dynamical) tower of base $B$ and height $h$ . The union can indeed be represented as a tower with $h$ floors, namely $T_{\alpha_{0}}^{i}B$ for $i=0,\dots,h-1,$ so that $T_{\alpha_{0}}$ acts by mapping each point in each level except the last one, to the point directly above it. A disjoint union of towers is called a skyscraper (see for example [26]). A subtower of a tower of base $B$ and height $h$ is a tower with the same height whose base is a subset $B^{\prime}\subset B$ .

As explained in the previous section, the induced map of $T_{\alpha_{0}}$ on $I^{(n)}$ is an exchange of two intervals, a *long* and a *short* one. If $n$ is even, the long one is given by $\left[-\alpha^{\left(n-1\right)},0\right)$ and the short one by $\left[0,\alpha^{\left(n\right)}\right)$. If $n$ is odd, the long and short intervals are respectively given by $\left[0,\alpha^{\left(n-1\right)}\right)$ and $\left[-\alpha^{\left(n\right)},0\right)$. In both cases, these are the images of the intervals $\left[-1,0\right)$ and $\left[0,\alpha_{n}\right)$ under the conjugacy map $\psi_{n}:I_{n}\rightarrow I^{\left(n\right)}$ given in Proposition 2.2. Notice also that $\beta^{\left(n\right)}=\psi_{n}\left(\beta_{n}\right)$, the non-rescaled marked point corresponding to the point $\beta_{n}\in I_{n}$, further divides the two mentioned subintervals of $I^{\left(n\right)}$ into three, by cutting either the long or the short one into two subintervals. We denote these three intervals by $I_{M}^{(n)},I_{L}^{(n)}$ and $I_{S}^{(n)}$, where the letters $M,L,S$ respectively stand for *middle* (M), *long* (L) and *short* (S): $I_{M}^{(n)}$ denotes the middle interval, while $I_{L}^{(n)}$ and $I_{S}^{(n)}$ denote (what is left of) the long one and the short one after removing the middle interval. Explicitly, it is convenient to describe the intervals in terms of the partition $X_{G}$, $X_{B_{-}}$, $X_{B_{+}}$ defined at the end of the previous section. Thus, set

[TABLE]

We claim that the first return time of $T_{\alpha_{0}}$ to the interval $I^{\left(n\right)}$ is constant on the subintervals $I_{L}^{\left(n\right)}$, $I_{M}^{\left(n\right)}$ and $I_{S}^{\left(n\right)}$. Moreover, the first return times over $I_{L}^{\left(n\right)}$ and $I_{S}^{\left(n\right)}$ equal $q_{n}$ and $q_{n-1}$ respectively, while the first return time over $I_{M}^{\left(n\right)}$ equals either $q_{n}$ or $q_{n-1}$, depending on whether $\beta^{\left(n\right)}\in\left[-\alpha^{\left(n-1\right)},0\right)$ or $\beta^{\left(n\right)}\in\left[0,\alpha^{\left(n\right)}\right)$ and hence on whether the middle interval was cut from the long or the short interval respectively. For $J\in\left\{L,M,S\right\}$, let us denote by $h_{J}^{\left(n\right)}$ the first return time of $I_{J}^{\left(n\right)}$ to $I^{\left(n\right)}$ under $T_{\alpha_{0}}$ and let us denote by $Z_{J}^{\left(n\right)}$ the tower with base $I_{J}^{\left(n\right)}$ and height $h_{J}^{\left(n\right)}$.
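The claim on return times can be checked by brute force for a concrete parameter. The following is a minimal sketch under assumptions: $T_{\alpha_{0}}$ is taken to be the exchange of $[-1,0)$ and $[0,\alpha_{0})$ given by $x\mapsto x+\alpha_{0}$ on $[-1,0)$ and $x\mapsto x-1$ on $[0,\alpha_{0})$ (formula (2.2) is not reproduced in this section), the marked point $\beta^{(n)}$ is ignored, and the function names are hypothetical. The parameter $\alpha_{0}=\sqrt{2}-1$ is fixed by the Gauss map, so $\alpha^{(n)}=\alpha_{0}^{n+1}$ and, for even $n$, $I^{(n)}=[-\alpha^{(n-1)},\alpha^{(n)})$.

```python
from math import sqrt


def rotation_map(x, alpha0):
    """Exchange of [-1, 0) and [0, alpha0): assumed form of T_{alpha_0}."""
    return x + alpha0 if x < 0 else x - 1.0


def first_return_time(x, alpha0, left, right, max_iter=10**6):
    """Number of iterates needed for the orbit of x to re-enter [left, right)."""
    y = rotation_map(x, alpha0)
    for k in range(1, max_iter + 1):
        if left <= y < right:
            return k
        y = rotation_map(y, alpha0)
    raise RuntimeError("no return within max_iter iterates")


alpha0 = sqrt(2) - 1
left, right = -alpha0**2, alpha0**3                      # I^(2)
t_long = first_return_time(-0.1, alpha0, left, right)    # point of the long interval
t_short = first_return_time(0.03, alpha0, left, right)   # point of the short interval
```

For this choice the two return times are $7$ and $3$, which are consecutive denominators of the convergents of $\alpha=\alpha_{0}/(1+\alpha_{0})=[3,2,2,\dots]$.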

Let us now describe how the tower structure at stage $n+1$ of the renormalization is related to the tower structure at stage $n$. We will describe in detail as an example the particular case where $n$ is odd and $\beta^{\left(n\right)}\in\left[-\alpha^{\left(n-1\right)},-\alpha^{\left(n\right)}\right)$ (i.e. $\beta^{\left(n\right)}\notin I^{\left(n+1\right)}$), or equivalently $\left(\alpha_{n},\beta_{n}\right)\in X_{G}$ (see also Figure 2.2). The other cases are summarized in Proposition 2.5 below. In the considered case, the heights $h_{J}^{\left(n\right)}$ of the three towers $Z_{J}^{\left(n\right)}$, $J\in\left\{L,M,S\right\},$ at stage $n$ are given by $h_{J}^{\left(n\right)}=q_{n}$ for $J\in\left\{M,L\right\}$ and $h_{S}^{\left(n\right)}=q_{n-1}$. By the structure of the first return map $T^{\left(n\right)}$, the intervals $\left(T^{\left(n\right)}\right)^{i}\left(I_{S}^{\left(n\right)}\right)$, $i=1,...,a_{n}$, partition the interval $\left[-\alpha^{\left(n-1\right)},-\alpha^{\left(n\right)}\right)=I^{\left(n\right)}\setminus I^{\left(n+1\right)}$ into intervals of equal length, and it follows that the first return time of $T_{\alpha_{0}}$ is constant on $I_{S}^{\left(n\right)}$ and equals

[TABLE]

It also follows that the tower over $I_{S}^{\left(n\right)}\subset I^{\left(n+1\right)}$ at stage $n+1$ is obtained by stacking the subtowers over the intervals $\left(T^{\left(n\right)}\right)^{i}\left(I_{S}^{\left(n\right)}\right)$ on top of the tower $Z_{S}^{\left(n\right)}$ (as shown in Figure 2.2). By construction, the point $\beta^{\left(n+1\right)}$ is obtained by vertically projecting the point $\beta^{\left(n\right)}$ from its location in the tower over $I_{S}^{\left(n\right)}$ down to the interval $I_{S}^{\left(n\right)}$. According to our definitions, $\beta^{\left(n+1\right)}$ divides $I_{S}^{\left(n\right)}$ into $I_{L}^{\left(n+1\right)}=\left[\beta^{\left(n\right)},\alpha^{\left(n-1\right)}\right)$ and $I_{M}^{\left(n+1\right)}=\left[0,\beta^{\left(n\right)}\right)$. As we have seen, the height of the towers at stage $n+1$ over the intervals $I_{M}^{\left(n+1\right)}$ and $I_{L}^{\left(n+1\right)}$ is the same and equals $q_{n+1}$, but the composition of the towers is different. The tower $Z_{M}^{\left(n+1\right)}$ is obtained by stacking, on top of the bottom tower $Z_{S}^{\left(n\right)}$, first $b_{n}$ subtowers of $Z_{L}^{\left(n\right)}$ and then $a_{n}-b_{n}$ subtowers of $Z_{M}^{\left(n\right)}$ on top of them; $Z_{L}^{\left(n+1\right)}$ has a similar structure, with the tower $Z_{S}^{\left(n\right)}$ at the bottom, but with $b_{n}-1$ subtowers of $Z_{L}^{\left(n\right)}$ on top and then $a_{n}-b_{n}+1$ subtowers of $Z_{M}^{\left(n\right)}$ stacked over them (see Figure 2.2). The tower over $I_{S}^{\left(n+1\right)}=I_{M}^{\left(n\right)}$ remains unchanged, i.e. $Z_{S}^{\left(n+1\right)}=Z_{M}^{\left(n\right)}$.

It is convenient to describe the tower structure in the language of substitutions. Let us recall that a substitution $\tau$ on a finite alphabet $\mathcal{A}$ is a map which associates to each letter of $\mathcal{A}$ a finite word in the alphabet $\mathcal{A}$ . To each $(\alpha,\beta)$ with $\beta$ rational or $\alpha,\beta,1$ linearly independent over $\mathbb{Q}$ , we associate a sequence $(\tau_{n})_{n}$ of substitutions over the alphabet $\{L,M,S\}$ , where for $J\in\left\{L,M,S\right\}$ ,

[TABLE]

if and only if the tower $Z_{J}^{\left(n+1\right)}$ consists of subtowers of $Z_{J_{i}}^{\left(n\right)}$ , $i=0,...,k$ stacked on top of each other in the specified order, i.e. the subtower of $Z_{J_{i+1}}^{\left(n\right)}$ is stacked on top of $Z_{J_{i}}^{\left(n\right)}$ . More formally,

[TABLE]

For example, in the case discussed above, since the tower $Z_{M}^{\left(n+1\right)}$ is obtained by stacking, on top of each other, in order, $Z_{S}^{\left(n\right)}$ , then $b_{n}$ subtowers of $Z_{L}^{\left(n\right)}$ and then $a_{n}-b_{n}$ subtowers of $Z_{M}^{\left(n\right)}$ , we have

[TABLE]

We will use the convention of writing $J^{n}$ for the block $J\cdots J$ where the symbol $J$ is repeated $n$ times. With this convention, the above substitution can be written $\tau_{n}(M)=SL^{b_{n}}M^{a_{n}-b_{n}}.$
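The substitution in the case discussed above can be written down directly as strings. The sketch below is illustrative only: it encodes the three words read off from the description of the case $\left(\alpha_{n},\beta_{n}\right)\in X_{G}$ (namely $\tau_{n}(M)=SL^{b_{n}}M^{a_{n}-b_{n}}$, $\tau_{n}(L)=SL^{b_{n}-1}M^{a_{n}-b_{n}+1}$ and $\tau_{n}(S)=M$), and the function names are hypothetical.

```python
def substitution_good_case(a, b):
    """Words tau_n(J) when (alpha_n, beta_n) is in X_G, assuming 1 <= b <= a."""
    return {
        "L": "S" + "L" * (b - 1) + "M" * (a - b + 1),
        "M": "S" + "L" * b + "M" * (a - b),
        "S": "M",
    }


def stacked_heights(tau, heights):
    """Height of each new tower: sum of the heights of the stacked subtowers."""
    return {J: sum(heights[K] for K in word) for J, word in tau.items()}


tau = substitution_good_case(2, 1)
new_heights = stacked_heights(tau, {"L": 3, "M": 3, "S": 1})
```

Reading a word from left to right corresponds to reading the stacked subtowers from bottom to top, so the length of the word is the number of subtowers and the new height is the sum of the old heights along the word.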

If $\omega$ is a word $\omega=J_{0}J_{1}\cdots J_{k}$, we will denote by $\omega_{i}$ the letter indexed by $0\leq i<\left|\omega\right|$. Using this notation, we can rewrite (2.16) as

[TABLE]

We summarize the tower structure and the associated sequence of substitutions in the following proposition. The substitution $\tau_{n}$ is determined by the location of $\beta^{\left(n\right)}\in I^{\left(n\right)}$, or equivalently, by the rescaled parameters $\left(\alpha_{n},\beta_{n}\right)$, and one can check that there are three separate cases corresponding to the parameters being in $X_{G}$, $X_{B_{-}}$ or $X_{B_{+}}$. One of the cases was analyzed in the discussion above, while the other cases can be deduced similarly, and the proof of the proposition is a straightforward induction on $n$.

Proposition 2.5.

The first return time function of $T_{\alpha_{0}}$ to $I^{\left(n\right)}$ is constant on each of the three intervals $I_{J}^{\left(n\right)}$ , $J\in\left\{L,M,S\right\}$ . Thus, for $n=0,1,2,...$ ,

[TABLE]

where $h_{J}^{\left(n\right)}$ is the value of the first return time function on $I_{J}^{\left(n\right)}$, which is given by

[TABLE]

The sequence of substitutions associated to the pair $\left(\alpha,\beta\right)$ is given by the following formulas, according to the following cases:

• If $\left(\alpha_{n},\beta_{n}\right)\in X_{G}$:

[TABLE]

• If $\left(\alpha_{n},\beta_{n}\right)\in X_{B_{-}}$:

[TABLE]

• If $\left(\alpha_{n},\beta_{n}\right)\in X_{B_{+}}$:

[TABLE]

*Remark 2.6.*

It can be shown that due to irrationality of $\alpha$ , the levels of the towers $Z_{J}^{\left(n\right)}$ , $J\in\left\{L,M,S\right\}$ form an increasing sequence of partitions that separates points and hence generates the Borel $\sigma$ -algebra on $\left[-1,\alpha_{0}\right)$ (see for example [34]).

Let $A_{n}$, $n\in\mathbb{N}$, be the $3\times 3$ incidence matrix of the substitution $\tau_{n}$ with entries indexed by $\{L,M,S\}$, where the entry indexed by $\left(J_{1},J_{2}\right)$, which we will denote by $(A_{n})_{J_{1},J_{2}}$, gives the number of subtowers contained in $Z_{J_{2}}^{(n)}$ among the subtowers of level $n$ which are stacked to form the tower $Z_{J_{1}}^{(n+1)}$. Equivalently, the entry $(A_{n})_{J_{1},J_{2}}$ gives the number of occurrences of the letter $J_{2}$ in the word $\tau_{n}\left(J_{1}\right)$. If we adopt the convention that the order of rows/columns of $A_{n}$ corresponds to $L,M,S$, it follows from Proposition 2.5 that these matrices are explicitly given by:

[TABLE]

In particular, if we denote by $h^{(n)}$ the column vector of tower heights, i.e. the transpose of $\left(h_{L}^{(n)},h_{M}^{(n)},h_{S}^{(n)}\right)$, it satisfies the recursive relations

[TABLE]
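As an illustration of the definition of the incidence matrix, the following sketch counts letters in the words of a substitution and applies the resulting matrix to a height vector. It assumes that the recursion takes the form $h^{(n+1)}=A_{n}h^{(n)}$, which is what the definition of $A_{n}$ suggests; the example words are those of the $X_{G}$ case with the hypothetical values $a_{n}=3$, $b_{n}=2$, and the helper names are not from the paper.

```python
LETTERS = ("L", "M", "S")


def incidence_matrix(tau):
    """Entry (J1, J2) counts the occurrences of the letter J2 in tau[J1]."""
    return [[tau[J1].count(J2) for J2 in LETTERS] for J1 in LETTERS]


def apply_matrix(A, v):
    """Plain matrix-vector product, used here for h^(n+1) = A_n h^(n)."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in A]


# Hypothetical example: the X_G words with a_n = 3 and b_n = 2
tau = {"L": "SLMM", "M": "SLLM", "S": "M"}
A = incidence_matrix(tau)
h_next = apply_matrix(A, [7, 7, 3])   # heights ordered as (L, M, S)
```

Rows and columns follow the order $L,M,S$ adopted in the text, so each row sum is the number of subtowers stacked to form the corresponding tower.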

*Remark 2.7.*

We remark briefly, for the readers familiar with the Vershik adic map and the $S$-adic formalism (even though it will play no role in the rest of this paper), that the sequence $(\tau_{n})_{n}$ also allows one to represent the map $T_{\alpha_{0}}$ as a Vershik adic map. The associated Bratteli diagram is a non-stationary diagram, whose vertex sets $V_{n}$ are always indexed by $\{L,M,S\}$, with $(A_{n})_{J_{1},J_{2}}$ edges from $J_{1}$ to $J_{2}$; the ordering of the edges which enter the vertex $J$ at level $n$ is given exactly by the substitution word $\tau_{n}(J)$. We refer the interested reader to the works by Vershik [37] and to the paper by Berthé and Delecroix [9] for further information on Vershik maps, Bratteli diagrams and $S$-adic formalism.

2.3.1. Special Birkhoff sums

Let us now consider the function $\varphi$ defined by (2.5), which has a discontinuity at $0$ and at $\beta_{0}$. In order to study its Birkhoff sums $\varphi_{n}$ (defined in (2.6)), we will use the renormalization algorithm described in the previous section. Under the assumption that $\left(\alpha,\beta\right)\in\tilde{X}$, $\varphi$ determines a sequence of functions $\varphi^{\left(n\right)}$, where $\varphi^{(n)}$ is a real valued function defined on $I^{(n)}$ obtained by inducing $\varphi$ on $I^{(n)}$, i.e. by setting

[TABLE]

The function $\varphi^{(n)}$ is what Marmi-Moussa-Yoccoz in [25] started calling special Birkhoff sums: the value $\varphi^{(n)}(x)$ gives the Birkhoff sum of the function $\varphi$ along the orbit of $x\in I_{J}^{(n)}$ until its first return to $I^{(n)}$ , i.e. it represents the Birkhoff sum of the function along an orbit which goes from the bottom to the top of the tower $Z_{J}^{(n)}$ .
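A special Birkhoff sum can be computed by brute force as a sum along the orbit until the first return. The sketch below is generic and makes several assumptions: the function $\varphi$ of (2.5) is not reproduced here, so `phi` is an arbitrary callable supplied by the user; $T_{\alpha_{0}}$ is taken to be the exchange of $[-1,0)$ and $[0,\alpha_{0})$; the base $I^{(n)}$ is passed as a half-open interval `[left, right)`; and the name `special_birkhoff_sum` is hypothetical.

```python
from math import sqrt


def special_birkhoff_sum(phi, x, alpha0, left, right, max_iter=10**6):
    """Sum phi along the orbit of x until its first return to [left, right)."""
    total, y = 0.0, x
    for _ in range(max_iter):
        total += phi(y)
        # assumed form of T_{alpha_0}: exchange of [-1, 0) and [0, alpha0)
        y = y + alpha0 if y < 0 else y - 1.0
        if left <= y < right:
            return total
    raise RuntimeError("no return within max_iter iterates")


alpha0 = sqrt(2) - 1
base = (-alpha0**2, alpha0**3)    # I^(2) for this alpha_0
height = special_birkhoff_sum(lambda y: 1.0, -0.1, alpha0, *base)
visits = special_birkhoff_sum(lambda y: 1.0 if y >= 0 else 0.0, -0.1, alpha0, *base)
```

With the constant function $1$ the special Birkhoff sum is simply the height of the tower over the starting point, which gives a quick sanity check; with an indicator function it counts the visits of the orbit segment to the corresponding set.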

One can see that since $\varphi^{(0)}:=\varphi$ has mean zero and a discontinuity with a jump of $1$ at $\beta^{(0)}:=\beta_{0}$ , its special Birkhoff sums $\varphi^{(n)}$ , $n\in\mathbb{N}$ , again have mean zero and a discontinuity with a jump of $1$ . The points $\beta^{(n)}$ , $n\in\mathbb{N}$ , are defined in the renormalization procedure exactly so that $\varphi^{(n)}$ has a jump of one at $\beta^{(n)}$ . Moreover, the function $\varphi$ is constant on each level of the towers $Z_{J}^{\left(n\right)}$ , $J\in\left\{L,M,S\right\}$ , $n\in\mathbb{N}\cup\left\{0\right\}$ , and therefore, it is completely determined by a sequence of vectors

[TABLE]

where $\varphi_{J}^{\left(n\right)}=\varphi^{\left(n\right)}\left(x\right)$, for any $x\in I_{J}^{\left(n\right)}$. It then follows immediately from the recursive structure of the towers (see equation (2.16)) that the functions $\varphi^{\left(n\right)}$ also satisfy the following recursive formulas given by the substitutions in Proposition 2.5:

[TABLE]

We finish this section with a few simple observations on the heights of the towers and on special Birkhoff sums along these towers that we will need for the proof of the main result. Let $\left(\alpha_{0},\beta_{0}\right)\in\tilde{X}$ be the parameters associated to a given pair $\left(\alpha,\beta\right)$ via the relations (2.3) and (2.4). Under the assumption that $\alpha$ is badly approximable, since the heights of the towers appearing in the renormalization procedure satisfy the recursive relations for $h^{(n)}$ stated above and the entries $0\leq b_{n}\leq a_{n}$ are bounded, there exists a constant $C$ such that

[TABLE]

It follows that for any $m\in\mathbb{N}$ , there exists a constant $M=M\left(m\right)$ , such that if $\left|k-n\right|\leq m$ , then

[TABLE]

Moreover, by (2.1), the special Birkhoff sums $\varphi_{J}^{\left(n\right)}$ are uniformly bounded, i.e.

[TABLE]

2.4. The (adic) symbolic coding

The renormalization algorithm and the formalism defined above lead to the symbolic coding of the dynamics of $T_{\alpha_{0}}$ described in the present section. This coding is exploited in Section 2.5 to build an array of non-homogeneous Markov chains which models the dynamics.

Definition 2.8.

(Markov compactum) Let $\left(\mathcal{S}_{n}\right)_{n=1}^{\infty}$ be a sequence of finite sets with $\sup_{i}\left|\mathcal{S}_{i}\right|<\infty$ and let $\left(A^{\left(n\right)}\right)_{n=1}^{\infty}$ be a sequence of matrices, such that $A^{\left(n\right)}$ is an $\left|\mathcal{S}_{n}\right|\times\left|\mathcal{S}_{n+1}\right|$ matrix whose entries $A_{s,t}^{\left(n\right)}\in\left\{0,1\right\}$ for any $(s,t)\in\mathcal{S}_{n}\times\mathcal{S}_{n+1}$ . The Markov compactum determined by $A^{\left(n\right)}$ is the space

[TABLE]

To describe the coding, recall that for each $n\in\mathbb{N}$, each tower $Z_{J}^{(n)}$, where $J\in\{L,M,S\}$, is obtained by stacking at most $a_{n}+1$ *subtowers* of the towers $Z_{K}^{(n-1)}$ (the type and order of the subtowers is completely determined by the word $\tau_{n-1}(J)$ given by the substitution $\tau_{n-1}$ as described in Proposition 2.5). We will label these subtowers by $(J,i)$, where the index $i$ satisfies $0\leq i\leq a_{n}$ and indexes the subtowers from bottom to top: more formally, $(J,i)$ is the label of the subtower of $Z_{J}^{(n)}$, with base $(T^{(n-1)})^{i}(I_{J}^{(n)})$, which is the $(i+1)^{th}$ subtower from the bottom (see Figure 2.3). Thus, for a fixed $n$, denoting by $\left|\tau_{n-1}\left(J\right)\right|$ the length of the word $\tau_{n-1}\left(J\right)$, the labels of the subtowers belong to

[TABLE]

When $\alpha=\frac{\alpha_{0}}{1+\alpha_{0}}=\left[a_{0},a_{1},\dots,a_{n},\dots\right]$ is badly approximable, let $a_{max}$ be the largest of its continued fraction entries and consider the alphabet

[TABLE]

*Remark 2.9.*

It is not necessary for $\alpha$ to be badly approximable in order for the construction of the present section and the next section to be valid. If $\alpha$ is not badly approximable, define $E=\left\{L,M,S\right\}\times\left\{0,1,...,n,...\right\}$ . This definition would make all statements of this and the following sections valid, without any further changes.

Definition 2.10.

Given $x\in\left[-1,\alpha_{0}\right)$, for each $n\in\mathbb{N}$, $x$ is contained in a unique tower $Z_{J_{n}(x)}^{(n)}$ for some $J_{n}\left(x\right)\in\{L,M,S\}$, and furthermore in a unique subtower of stage $n-1$ inside it, labeled by $\left(J_{n}\left(x\right),j_{n}\left(x\right)\right)$ where $0\leq j_{n}\left(x\right)\leq a_{n}$. Let $\Psi:\left[-1,\alpha_{0}\right)\rightarrow E^{\mathbb{N}}$ be the coding map defined by

[TABLE]

Recall that for a word $\omega$ in the alphabet $E$ we denote by $\omega_{i}$ the letter in the word which is indexed by $0\leq i<\left|\omega\right|$.

Proposition 2.11.

The image of $\Psi$ is contained in the subspace $\Sigma\subset E^{\mathbb{N}}$ defined by

[TABLE]

The preimage under $\Psi$ of any cylinder $\left[\left(J_{1},j_{1}\right),...,\left(J_{n},j_{n}\right)\right]:=\left\{\omega\in\Sigma:\ \omega_{i}=\left(J_{i},j_{i}\right),\ i=1,...,n\right\}$ satisfying the constraints $\left(\tau_{i}\left(J_{i+1}\right)\right)_{j_{i+1}}=J_{i},\ i=1,2,...,n-1$ is the set of all points on some level of the tower $Z_{J_{n}}^{\left(n\right)}$ , i.e. there exists $0\leq i<h_{J_{n}}^{\left(n\right)}$ such that

[TABLE]

Moreover, $\Psi$ is a Borel isomorphism between $\left[-1,\alpha_{0}\right)$ and its image, where the Borel structure on the image of $\Psi$ is inherited from the natural Borel structure on $E^{\mathbb{N}}$ arising from the product topology on $E^{\mathbb{N}}$ .
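The compatibility constraints on cylinders quoted in the proposition are easy to test mechanically. The following sketch checks only those constraints, $\left(\tau_{i}\left(J_{i+1}\right)\right)_{j_{i+1}}=J_{i}$ for $i=1,\dots,n-1$, and nothing else that the full definition of $\Sigma$ may require; substitutions are represented as dictionaries from letters to words, which is a representation chosen here for illustration, and the name `is_admissible` is hypothetical.

```python
def is_admissible(word, taus):
    """Check (tau_i(J_{i+1}))_{j_{i+1}} = J_i for i = 1, ..., n - 1.

    word : list of labels [(J_1, j_1), ..., (J_n, j_n)]
    taus : dict mapping the index i to tau_i, itself a dict letter -> word
    """
    for i in range(1, len(word)):
        J_i, _ = word[i - 1]
        J_next, j_next = word[i]
        image = taus[i][J_next]
        # the position j_{i+1} must exist in the word and carry the letter J_i
        if not 0 <= j_next < len(image) or image[j_next] != J_i:
            return False
    return True


# Hypothetical stationary example: the X_G words with a = 2, b = 1 at every stage
tau = {"L": "SMM", "M": "SLM", "S": "M"}
taus = {1: tau, 2: tau}
ok = is_admissible([("S", 0), ("M", 0), ("L", 2)], taus)
bad = is_admissible([("L", 0), ("M", 0)], taus)
```

In the first word each letter $J_{i}$ is exactly the letter found at position $j_{i+1}$ of $\tau_{i}(J_{i+1})$, whereas in the second word $\tau_{1}(M)$ starts with $S$ rather than $L$.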

Let

[TABLE]

be the set of symbols which appear as $n^{th}$ coordinate in some admissible word in $\Sigma$, and note that the definition of $\Sigma$ shows that $\Sigma$ is a Markov compactum with state space $\prod_{i=1}^{\infty}\mathcal{S}_{i}$, given by a sequence of matrices $\left(A^{(n)}\right)^{T}$ indexed by $\mathcal{S}_{n}\times\mathcal{S}_{n+1}$ such that $A_{\left(K,k\right),\left(J,j\right)}^{\left(n\right)}=1$ if and only if $\left(\tau_{n}\left(J\right)\right)_{j}=K$. Although we do not need it in what follows, one can explicitly describe the image $\Sigma^{\prime}\subset\Sigma$ of the coding map $\Psi$ and show that it is obtained from $\Sigma$ by removing countably many sequences. We observed in Remark 2.7 that $T_{\alpha_{0}}$ is conjugated to a Vershik adic map. Let us add that the map $\Psi$ provides the measure theoretical conjugacy.

Proof of Proposition 2.11.

First we prove that the image of $\Psi$ is contained in $\Sigma$ . To see this, note that for $x\in\left[-1,\alpha_{0}\right)$ , $(J_{n}\left(x\right),j_{n}\left(x\right))=(K,k)$ means that $x$ belongs to $Z_{K}^{(n)}$ (since $J_{n}\left(x\right)=K$ ) and $(J_{n+1}\left(x\right),j_{n+1}\left(x\right))=(J,j)$ means that $x$ belongs to the $j^{th}$ subtower of $Z_{J}^{(n+1)}$ . Hence the $j^{th}$ subtower of $Z_{J}^{(n+1)}$ must be contained in $Z_{K}^{(n)}$ . Recalling the definition of the substitutions $(\tau_{n})_{n},$ this implies exactly the relation $\left(\tau_{n}\left(J\right)\right)_{j}=K$ , which in turn implies that $\Psi\left(x\right)\in\Sigma$ .

To prove the second statement, namely that cylinders correspond to floors of towers, note that according to our labeling of the towers, the set of all $x$ such that $\left(J_{n}\left(x\right),j_{n}\left(x\right)\right)=\left(J_{n},j_{n}\right)$ consists exactly of all points which belong to $h_{K}^{\left(n-1\right)}$ levels of the tower $Z_{J_{n}}^{\left(n\right)}$ , where $K=\left(\tau_{n-1}\left(J_{n}\right)\right)_{j_{n}}$ . Proceeding by induction, one sees that for any $k=n-1,\dots,1$ , the set $\left\{x:\ \left(J_{i}\left(x\right),j_{i}\left(x\right)\right)=\left(J_{i},j_{i}\right),\ i=k,...,n\right\}$ is the set of all points contained in precisely $h_{K}^{\left(k-1\right)}$ levels of the tower $Z_{J_{n}}^{\left(n\right)}$ , where $K=\left(\tau_{k-1}\left(J_{k}\right)\right)_{j_{k}}$ . Thus, since $h_{K}^{(0)}=1$ for any $K\in\left\{L,M,S\right\}$ , $\Psi^{-1}\left(\left[\left(J_{1},j_{1}\right),...,\left(J_{n},j_{n}\right)\right]\right)$ is the set of all points on a single level of the tower $Z_{J_{n}}^{\left(n\right)}$ . This argument shows that the levels of the towers $Z_{J}^{\left(n\right)}$ , $J\in\left\{L,M,S\right\},$ are in bijective correspondence under the map $\Psi$ with cylinders of length $n$ in $\Sigma$ .

Finally, injectivity and bi-measurability of $\Psi$ follow since the sequence of partitions induced by the tower structure generates the Borel sets of the space $\left[-1,\alpha_{0}\right)$ and separates points (see Remark 2.6). ∎

2.5. The Markov chain modeling towers.

In what follows, we denote by $\mu$ the push-forward by the map $\Psi$ of the normalized Lebesgue measure $\lambda$ on $\left[-1,\alpha_{0}\right)$, i.e. the measure given by

[TABLE]

Moreover, for $J\in\left\{L,M,S\right\}$ , let us define also the conditional measures

[TABLE]

We denote by $\Sigma_{n}$ and $E_{n}$ respectively the restrictions of $\Sigma$ and $E^{\mathbb{N}}$ to the first $n$ coordinates, and we endow these sets with the $\sigma$-algebras inherited from the Borel $\sigma$-algebra on $E^{\mathbb{N}}$. Let $\mathcal{S}_{n}$ be the set of states appearing in the $n^{th}$ coordinate of $\Sigma$, defined by (2.25).

We define a sequence of transition probabilities, or equivalently in this discrete case, stochastic matrices $p_{\left(J,j\right),\left(K,k\right)}^{\left(n\right)}$ , where $\left(J,j\right)\in\mathcal{S}_{n+1}$ and $\left(K,k\right)\in\mathcal{S}_{n}$ , and a sequence of probability distributions $\pi_{n}$ on $\mathcal{S}_{n}$ which are used to define a sequence of Markovian measures on $\Sigma_{n}$ that model the dynamical renormalization procedure. We refer to the sequence $p^{\left(n\right)}$ as the sequence of* transition matrices associated to the pair* $\left(\alpha_{0},\beta_{0}\right)\in\tilde{X}$ .

Definition 2.12.

For any $n\in\mathbb{N}$ and $J,\,K\in\left\{L,M,S\right\}$ , if

[TABLE]

we define

[TABLE]

Moreover, for any $\left(L,l\right)\in E$ , we set

[TABLE]

*Remark 2.13.*

The rationale behind the definition of $\pi_{n}$ is that $\pi_{n}\left(K,k\right)$ is defined to be the $\lambda$-measure of the piece of the tower $Z_{K}^{(n)}$ labeled by $(K,k)$; similarly $p_{\left(J,j\right),\left(K,k\right)}^{\left(n\right)}$ is non-zero exactly when the $k^{th}$ subtower inside $Z_{J}^{(n)}$ is contained in $Z_{K}^{(n-1)}$, in which case it gives the proportion of this $k^{th}$ subtower which is contained in the subtower of $Z_{K}^{(n-1)}$ labeled by $(J,j)$.

The following Proposition identifies the measures $\mu_{n}$ and $\mu_{n}^{J}$ as Markovian measures on $\Sigma_{n}$ generated by the transition matrices and initial distributions indicated in the previous definition.

Proposition 2.14.

For every $n\in\mathbb{N}$ , $J\in\left\{L,M,S\right\}$ and every word $\left(\left(J_{1},j_{1}\right),...,\left(J_{n},j_{n}\right)\right)\in\Sigma_{n}$ we have

[TABLE]

and

[TABLE]

Proof.

By Proposition 2.11, $\Psi^{-1}\left(\left[\left(J_{1},j_{1}\right),...,\left(J_{n},j_{n}\right)\right]\right)$ is non-empty if and only if the sequence $\left(J_{1},j_{1}\right),...,\left(J_{n},j_{n}\right)$ satisfies the conditions $\left(\tau_{i}\left(J_{i+1}\right)\right)_{j_{i+1}}=J_{i}$ for $i=1,...,n-1$, in which case it consists of the set of all points on a certain level of the tower $Z_{J_{n}}^{\left(n\right)}$, i.e.

[TABLE]

It follows from the definition (2.26) of the measure $\mu$ that

[TABLE]

Moreover, we get that the conditional measures $\mu_{n}^{J}$ given by (2.27) satisfy

[TABLE]

Equations (2.28) now follow by definition of $\pi_{n}$ and $p^{\left(n\right)},$ which give that

[TABLE]

Hence, by the conditions $\left(\tau_{i}\left(J_{i+1}\right)\right)_{j_{i+1}}=J_{i}$ for $i=1,...,n-1$ and recalling that $J_{n}=J$ and $h_{K}^{(0)}=1$ for any $K\in\left\{L,M,S\right\}$ , we have that

[TABLE]

Equation (2.29) follows in the same way by using the definition of $\pi_{n}^{J}$ instead of $\pi_{n}$.

Finally, if the sequence $\left(J_{1},j_{1}\right),...,\left(J_{n},j_{n}\right)$ does not satisfy the conditions $\left(\tau_{i}\left(J_{i+1}\right)\right)_{j_{i+1}}=J_{i}$ for $i=1,...,n-1$, then $\Psi^{-1}\left(\left[\left(J_{1},j_{1}\right),...,\left(J_{n},j_{n}\right)\right]\right)=\emptyset$ (by Proposition 2.11, as recalled above) and, by definition of $p^{\left(n\right)}$ and $\pi_{n}$, $\pi_{n}^{J}$, the right hand sides in (2.28) and (2.29) are both zero, so equations (2.28) and (2.29) hold in this case too. This completes the proof. ∎

For $\omega\in\Sigma$, $n\in\mathbb{N}$, we define the *coordinate random variables*

[TABLE]

Since all cylinders of the form $\left[\left(J_{1},j_{1}\right),...,\left(J_{n},j_{n}\right)\right]$, with $\left(\left(J_{1},j_{1}\right),...,\left(J_{n},j_{n}\right)\right)\in\Sigma_{n}$, generate the $\sigma$-algebra of $\Sigma_{n}$, it immediately follows from Proposition 2.14 that for every $n\in\mathbb{N}$, $X_{n},...,X_{1}$ form a Markov chain on $\Sigma_{n}$ with respect to the measures $\mu$, $\mu_{n}^{J}$ with transition probabilities $p^{\left(i\right)}$, $i=1,...,n-1$, and initial distributions $\pi_{n}$, $\pi_{n}^{J}$, respectively.

2.6. The functions over the Markov chain modeling the Birkhoff sums.

Let us now define a sequence of functions $\xi_{n}:\,\mathcal{S}_{n}\rightarrow\mathbb{R}$ that enables us to model the distribution of Birkhoff sums. In Section 2.3.1 we introduced the notion of special Birkhoff sums of $\varphi$, i.e. Birkhoff sums of $\varphi$ along the orbit of a point $x$ in the base of a renormalization tower $Z_{J}^{(n)}$ up to the height of the tower, see (2.21). We will consider in this section intermediate Birkhoff sums along a tower (or for short, intermediate Birkhoff sums), namely Birkhoff sums of a point at the base of a tower $Z_{J}^{\left(n\right)}$ up to an intermediate height, i.e. sums of the form

[TABLE]

The crucial Proposition 2.16 shows that intermediate Birkhoff sums can be expressed as sums of the following functions $\left\{\xi_{n}\right\}$ over the Markov chain $\left(X_{n}\right)$.

Definition 2.15.

For $n\in\mathbb{N}$ , $\left(J,j\right)\in\mathcal{S}_{n}$ such that $\tau_{n-1}\left(J\right)=J_{0}\dots J_{l}$ (note that this forces $0\leq j\leq l=\left|\tau_{n-1}\left(J\right)\right|$ ), if $n\geq 2$ set

[TABLE]

where, by convention, a sum with $i$ that runs from $0$ to $-1$ is equal to zero. If $n=1$ , set

[TABLE]

We then have the following proposition.

Proposition 2.16.

Let $J\in\left\{L,M,S\right\}$ and let $x_{J}\in I_{J}^{\left(n\right)}$ . Then for any $A\in\mathcal{B}\left(\mathbb{R}\right)$ ,

[TABLE]

Proof.

We show by induction on $n$ that, for any $J\in\left\{L,M,S\right\}$ , any $x\in I_{J}^{\left(n\right)}$ and $0\leq l\leq h_{J}^{\left(n\right)}-1$ , we have that

[TABLE]

where $\left[\omega\right]=\left[\left(J_{1},j_{1}\right),...,\left(J_{n},j_{n}\right)\right]$ is the (unique) cylinder containing $T_{\alpha_{0}}^{l}x$ . To see this, note first that for $n=1$ , $J\in\left\{L,M,S\right\}$ , $x\in I_{J}^{\left(1\right)}$ and $0\leq l\leq h_{J}^{\left(1\right)}-1$ , we have $T^{l}x\in\Psi^{-1}\left(\left[\left(J,l\right)\right]\right)$ and by definition of $\xi_{1}$ ,

[TABLE]

which proves the claim for $n=1$ . Now, assume that (2.31) holds for some $n\in\mathbb{N}$ . Then for $J\in\left\{L,M,S\right\}$ , $x\in I_{J}^{\left(n+1\right)}$ , and $0\leq l\leq h_{J}^{\left(n+1\right)}-1$ , let $\left[\omega\right]=\left[\left(J_{1},j_{1}\right),...,\left(J_{n+1},j_{n+1}\right)\right]$ be the unique cylinder such that $T_{\alpha_{0}}^{l}\left(x\right)\in\Psi^{-1}\left(\left[\omega\right]\right)$ . Then $J_{n}=J$ and by Proposition 2.11 $\Psi^{-1}\left(\left[\omega\right]\right)=T_{\alpha_{0}}^{l}\left(I_{J}\right)$ . It follows from definition of the map $\Psi$ , that $J_{n+1}=J$ and

[TABLE]

Thus, setting $l^{\prime}=l-\sum_{i=1}^{j_{n+1}-1}h_{\left(\tau\left(J\right)\right)_{i}}^{\left(n\right)}$ , $x^{\prime}=\left(T^{\left(n\right)}\right)^{j_{n+1}}\left(x\right)$ and using the definition of $\xi_{n+1}$ , we may write

[TABLE]

The previous equality is obtained by splitting the Birkhoff sum up to $l$ of a point at the base of the tower $Z_{J_{n+1}}^{\left(n+1\right)}$ into special Birkhoff sums over towers obtained at the $n^{th}$ stage of the renormalization procedure and a remainder given by $\sum_{k=0}^{l^{\prime}}\varphi\left(T_{\alpha_{0}}^{k}\left(x^{\prime}\right)\right)$ . Now, by definition of the coding map $\Psi$ , $T_{\alpha_{0}}^{l^{\prime}}\left(x^{\prime}\right)\in\Psi^{-1}\left(\left[\left(J_{1},j_{1}\right),...,\left(J_{n},j_{n}\right)\right]\right)$ . Thus, if for $y\in\mathbb{R}$ , we let $A-y$ denote the set $\left\{a-y:\ a\in A\right\}$ , (2.32) implies,

[TABLE]

and the equality (2.31) now follows from the hypothesis of induction, which gives

[TABLE]

Since by Proposition 2.14, $\mu_{n}^{J}\left(\left[\omega\right]\right)=\frac{1}{h_{J}^{\left(n\right)}}$ for any $\omega\in\Sigma_{n}$ , and since by the proof of Proposition 2.11, the levels of the tower $Z_{J}^{\left(n\right)}$ are in bijective correspondence with cylinders of length $n$ in $\Sigma_{n}$ , the proof is complete. ∎

3. The CLT for Markov chains

In the previous section we established that the study of intermediate Birkhoff sums can be reduced to the study of (in general) non-homogeneous Markov chains. In this section we establish some (mostly well-known) statements about such Markov chains which we use in the proof of our temporal CLT. The main result which we need is the CLT for non-homogeneous Markov chains. To the best of our knowledge, this was initially established by Dobrushin [16, 17] (see also [32] for a proof using martingale approximations). Dobrushin’s CLT is not directly valid in our case (since it assumes that the contraction coefficient is strictly less than $1$ for every transition matrix in the underlying chain, while under our assumptions this is only valid for a product of a constant number of matrices). While the proof of Dobrushin’s theorem can be reworked to apply to our assumptions, we do not do it here, and instead use a general CLT for $\varphi$ -mixing triangular arrays of random variables by Utev.

3.1. Contraction coefficients, mixing properties and CLT for Markov chains

In this section we collect some probability theory results for (arrays of) non-homogeneous Markov chains that we will use in the next section.

Let $\left(\Omega,\mathcal{B},P\right)$ be a probability space and let $\mathcal{F}$ , $\mathcal{G}$ be two sub $\sigma$ -algebras of $\mathcal{B}$ . For any $\sigma$ -algebra $\mathcal{A}\subset\mathcal{B}$ , denote by $\mathcal{L}^{2}\left(\mathcal{A}\right)$ the space of square integrable, real functions on $\Omega$ , which are measurable with respect to $\mathcal{A}$ . We use two measures of dependence between $\mathcal{F}$ and $\mathcal{G}$ , the so called $\varphi$ -coefficient and $\rho$ -coefficient, defined by

[TABLE]

and

[TABLE]

It is a well-known fact (see [10]) that

[TABLE]

In what follows, let $Y=\left\{Y_{1}^{\left(n\right)},...,Y_{n}^{\left(n\right)}:\,n\geq 1\right\}$ be a triangular array of mean zero, square integrable random variables such that the random variables in each row are defined on the same probability space $\left(\Omega,\mathcal{B},P\right)$ . For any set $\mathcal{Y}$ of random variables defined on $\left(\Omega,\mathcal{B},P\right)$ , let us denote by $\sigma\left(\mathcal{Y}\right)$ the $\sigma$ -algebra generated by all the random variables in $\mathcal{Y}.$

Set $S_{n}=\sum_{k=1}^{n}Y_{k}^{\left(n\right)}$ and $e_{n}=E\left(S_{n}\right)$ , $\sigma_{n}=\sqrt{Var\left(S_{n}\right)}$ . For any $n,k\in\mathbb{N}$ let

[TABLE]

The array $Y$ is said to be $\varphi$ -mixing if $\varphi\left(k\right)\rightarrow 0$ as $k$ tends to infinity.

The following CLT for $\varphi$ -mixing arrays of random variables, which follows from a more general CLT for such arrays in [36], is the main result that we use to prove our distributional CLT.

Theorem 3.1.

Let $Y$ be a $\varphi$ -mixing array of square integrable random variables and assume that

[TABLE]

for every $\epsilon>0$ . Then

[TABLE]

converges in law to the standard normal distribution.

Let $\mathcal{T},\mathcal{S}$ be finite sets and $P$ a stochastic matrix with entries indexed by $\mathcal{T}\times\mathcal{S}$ . The contraction coefficient of $P$ is defined by

[TABLE]

It is not difficult to see that $\tau\left(P\right)=0$ if and only if the entry $P_{s,t}$ does not depend on $s$ and that

$$\tau\left(PQ\right)\leq\tau\left(P\right)\tau\left(Q\right)\qquad(3.4)$$

for any pair of stochastic matrices $P$ and $Q$ such that their product is defined.
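The two properties just stated can be checked numerically. The sketch below assumes the standard Dobrushin form of the contraction coefficient, $\tau\left(P\right)=\frac{1}{2}\max_{s,s^{\prime}}\sum_{t}\left|P_{s,t}-P_{s^{\prime},t}\right|$ , which may differ from the normalization in (3.3); the function name `contraction_coefficient` and the matrices `V`, `P`, `Q` are hypothetical illustrations, not objects from the paper.

```python
import numpy as np

def contraction_coefficient(P):
    """Half the largest L1 distance between two rows of P (assumed Dobrushin form)."""
    P = np.asarray(P, dtype=float)
    # diffs[s, s2, t] = P[s, t] - P[s2, t]
    diffs = P[:, None, :] - P[None, :, :]
    return 0.5 * np.abs(diffs).sum(axis=2).max()

# A rank-one stochastic matrix (identical rows): its entries do not depend on the row.
V = np.array([[0.2, 0.8],
              [0.2, 0.8]])

# Two stochastic matrices whose product is defined (2x3 times 3x2).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.2, 0.7]])
Q = np.array([[0.9, 0.1],
              [0.3, 0.7],
              [0.5, 0.5]])

tau_V = contraction_coefficient(V)
tau_P = contraction_coefficient(P)
tau_Q = contraction_coefficient(Q)
tau_PQ = contraction_coefficient(P @ Q)

print(tau_V, tau_P, tau_Q, tau_PQ)
```

Under this assumed definition the coefficient of `V` vanishes and the coefficient of the product is bounded by the product of the coefficients.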

For $n\in\mathbb{N}$ , let $X_{1}^{\left(n\right)},...,X_{n}^{\left(n\right)}$ be a Markov chain with each $X_{i}^{\left(n\right)}$ taking values in a finite state space $\mathcal{S}_{i}$ , determined by an initial distribution $\pi_{n}$ and transition matrices $P_{i}^{\left(n\right)}$ , $i=1,...n$ (thus, each matrix $P_{i}^{\left(n\right)}$ has dimension $\left|\mathcal{S}_{i}\right|\times\left|\mathcal{S}_{i+1}\right|$ ).

Proposition 3.2.

Assume that there exist $0\leq\delta<1$ and $s\in\mathbb{N}$ such that for every $n\in\mathbb{N}$

[TABLE]

Then $X=X_{1}^{\left(n\right)},...,X_{n}^{\left(n\right)}$ is $\varphi$ -mixing and $\varphi\left(k\right)$ tends to $0$ as $k\rightarrow\infty$ at an exponential rate.

Proof.

This is a direct consequence of the inequality

[TABLE]

(see relation (1.1.2) and Proposition 1.2.5 in [23]) and the fact that $\tau\left(P_{j}^{\left(n\right)}...P_{j+k}^{\left(n\right)}\right)\leq\delta^{\left[\frac{k}{s}\right]}$ , which immediately follows from the assumption and (3.4). ∎
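The role of the block assumption can be illustrated on a toy example: a single matrix may have contraction coefficient equal to $1$ while a product of $s=2$ copies contracts, after which submultiplicativity gives geometric decay. The sketch below again assumes the standard Dobrushin form of $\tau$ ; the matrix `P` is a hypothetical homogeneous example and is not one of the matrices $p^{\left(n\right)}$ of the paper.

```python
import numpy as np

def contraction_coefficient(P):
    """Half the largest L1 distance between two rows of P (assumed Dobrushin form)."""
    P = np.asarray(P, dtype=float)
    diffs = P[:, None, :] - P[None, :, :]
    return 0.5 * np.abs(diffs).sum(axis=2).max()

# Rows 0 and 1 have disjoint supports, so tau(P) = 1: one step does not contract.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])

s = 2                                   # block length
delta = contraction_coefficient(P @ P)  # coefficient of one block, here 0.75

# Compare tau of a product of m matrices with the bound delta ** floor(m / s).
taus, bounds = [], []
product = np.eye(3)
for m in range(1, 13):
    product = product @ P
    taus.append(contraction_coefficient(product))
    bounds.append(delta ** (m // s))

for m, (t, b) in enumerate(zip(taus, bounds), start=1):
    print(m, round(t, 6), round(b, 6))
```

This is the situation described at the start of Section 3: a condition on every single transition matrix fails, but a condition on products of a fixed number of matrices holds.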

Now, let $\xi_{i}^{\left(n\right)}:\mathcal{S}_{i}\rightarrow\mathbb{R}$ , with $1\leq i\leq n$ for any $n\in\mathbb{N}$ , be an array of functions and set $Y_{i}^{\left(n\right)}=\xi_{i}^{\left(n\right)}\left(X_{i}^{\left(n\right)}\right)$ . Henceforth, we assume that

[TABLE]

An application of Theorem 3.1 yields the following corollary.

Corollary 3.3.

Under the conditions of Proposition 3.2, assume further that $\sigma_{n}\rightarrow\infty$ . Then $\frac{S_{n}-e_{n}}{\sigma_{n}}$ converges in law to the standard normal distribution.

Proof.

It is enough to remark that condition (3.2) in Theorem 3.1 holds trivially for $n$ large by virtue of the bound in (3.5), since by assumption $\sigma_{n}\rightarrow\infty.$ ∎

Let now $\tilde{\pi}_{n}$ be a sequence of probability distributions on $\mathcal{S}_{1}$ , and let $\tilde{X}_{1}^{\left(n\right)},...,\tilde{X}_{n}^{\left(n\right)}$ be an array of Markov chains generated by initial distributions $\tilde{\pi}_{n}$ and transition matrices $P_{i}^{\left(n\right)}$ . Let $\tilde{S}_{n}=\sum_{k=1}^{n}\xi_{k}^{\left(n\right)}\left(\tilde{X}_{k}^{\left(n\right)}\right)$ and let $\tilde{e}_{n}=E\left(\tilde{S}_{n}\right)$ , $\tilde{\sigma}_{n}=\sqrt{Var\left(\tilde{S}_{n}\right)}$ .

Proposition 3.4.

Under the conditions of Proposition 3.2, there exists a constant $C$ , independent of the sequences $\pi_{n}$ and $\tilde{\pi}_{n}$ , such that $\left|e_{n}-\tilde{e}_{n}\right|\leq C$ and $\left|\sigma_{n}^{2}-\tilde{\sigma}_{n}^{2}\right|\leq C$ for all $n\in\mathbb{N}$ .

Proof.

The assumption implies that there exists a constant $M$ and a sequence of rank $1$ stochastic matrices (i.e. stochastic matrices with all rows being identical) $V_{i}^{\left(n\right)}$ such that

[TABLE]

(see [31, Chapter 4, Cor. 2]), where for two matrices $P$ , $Q$ indexed by $S\times T$ , $\left\|P-Q\right\|=\max\left\{\left|P_{s,t}-Q_{s,t}\right|:\,\left(s,t\right)\in S\times T\right\}$ . Using (3.5) it follows that there exists a constant $\tilde{C}$ which depends only on the array of matrices $P_{i}^{\left(n\right)}$ and functions $\xi_{i}^{\left(n\right)}$ , such that

[TABLE]

Since the right hand side of the last inequality is a general term of a summable geometric series, we have proved that there exists a constant $C$ , such that $\left|e_{n}-\tilde{e}_{n}\right|\leq C$ for all $n\in\mathbb{N}$ .

To prove the inequality for the variances, we first note that it follows from (3.1) and (3.5) that there exists a constant $C^{\prime}$ independent of $\pi_{n}$ , such that

[TABLE]

for all $n\in\mathbb{N}$ . An analogous inequality hence holds also for the array $\tilde{X}_{i}^{\left(n\right)}$ instead of $X_{i}^{\left(n\right)}$ , so that

[TABLE]

Moreover, since $\sup_{n}\left|e_{n}-\tilde{e}_{n}\right|<\infty$ , one can also prove that

[TABLE]

Now, write

[TABLE]

The proof of the Proposition hence follows by (3.7) and (3.8). ∎
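The content of Proposition 3.4 can be observed on a small example by computing the mean and variance of the additive functional exactly for two different initial distributions. The sketch below is only an illustration under simplifying assumptions: the chain is homogeneous with a hypothetical $2\times 2$ matrix `P`, and `xi`, `mean_and_variance` are names introduced here. The moments are propagated by a forward recursion rather than by the argument of the proof.

```python
import numpy as np

def mean_and_variance(pi, P, xi, n):
    """Exact mean and variance of S_n = xi(X_1) + ... + xi(X_n) for a homogeneous chain."""
    pi, P, xi = (np.asarray(a, dtype=float) for a in (pi, P, xi))
    p = pi.copy()        # p[s]  = Prob(X_i = s)
    m1 = p * xi          # m1[s] = E[S_i ; X_i = s]
    m2 = p * xi**2       # m2[s] = E[S_i^2 ; X_i = s]
    for _ in range(n - 1):
        p_next = p @ P
        m1_prev = m1 @ P  # E[S_i ; X_{i+1} = t], by the Markov property
        m2_prev = m2 @ P  # E[S_i^2 ; X_{i+1} = t]
        # S_{i+1} = S_i + xi(X_{i+1}); expand the square on the event X_{i+1} = t.
        m2 = m2_prev + 2 * xi * m1_prev + xi**2 * p_next
        m1 = m1_prev + xi * p_next
        p = p_next
    mean = m1.sum()
    return mean, m2.sum() - mean**2

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])     # hypothetical transition matrix
xi = np.array([1.0, -1.0])     # hypothetical bounded function on the state space

mean_gaps, var_gaps, variances = [], [], []
for n in [1, 2, 5, 10, 50, 100, 200]:
    e, v = mean_and_variance([1.0, 0.0], P, xi, n)      # start in state 0
    e_t, v_t = mean_and_variance([0.0, 1.0], P, xi, n)  # start in state 1
    mean_gaps.append(abs(e - e_t))
    var_gaps.append(abs(v - v_t))
    variances.append(v)

print(mean_gaps[-1], var_gaps[-1], variances[-1])
```

In this example the variance itself grows linearly in $n$ , while the gaps between the two initial distributions stay bounded.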

4. Proof of the temporal CLT

In this section we give the proof of Theorem 1.1. We need to show that we can apply the results on Markov chains summarized in the previous section (and in particular Corollary 3.3) to the Markov chains that model the dynamics. In order to check that the required assumptions are verified, we first show, in section 4.1 a result on positivity of the product of finitely many transition matrices, which follows from the assumption that $\alpha$ is badly approximable and $\beta$ is badly approximable with respect to $\alpha$ . Then, in section 4.2 we prove that the variance grows. Finally, the proof of the Theorem is given in section 4.3.

4.1. Positivity of products of incidence matrices

Let us recall that in Section 2.2 we described a renormalization procedure that, to a pair of parameters $\left(\alpha,\beta\right)$ (under the assumption that $\left(\alpha,\beta\right)\in\tilde{X}$ ), in particular associates a sequence $\left(A_{n}\right)_{n}$ of matrices (given by equations (2.17), (2.18) and (2.19) respectively), which are the incidence matrices of the sequence of substitutions $\left(\tau_{n}\right)_{n}$ which describe the tower structure. In this section, we develop conditions on the pair $\left(\alpha,\beta\right)\in\tilde{X}$ that ensure that we may split the sequence of incidence matrices $\left(A_{n}\right)_{n}$ associated to $\left(\alpha,\beta\right)$ into consecutive blocks of uniformly bounded length, so that the product of matrices in each block is strictly positive. This fact is used for showing that the Markov chain associated to $\left(\alpha,\beta\right)$ satisfies the assumption of the previous section needed to prove the CLT.

Under the assumption that $\left(\alpha,\beta\right)\in\tilde{X}$ , the orbit $\hat{G}^{n}\left(\alpha,\beta\right)$ of the point $\left(\alpha,\beta\right)$ under the transformation $\hat{G}$ defined in (2.13) is infinite and one can consider its *itinerary* with respect to the partition $\left\{X_{G},X_{B_{-}},X_{B_{+}}\right\}$ defined in Section 2.2: the itinerary is the sequence $\left(s_{n}\right)_{n}\in\mathcal{S}^{\mathbb{N}\cup\left\{0\right\}},$ where $\mathcal{S}:=\left\{G,B_{-},B_{+}\right\},$ defined by

[TABLE]

We will call $\mathcal{S}:=\left\{G,B_{-},B_{+}\right\}$ the set of states and we will call $s\left(\alpha,\beta\right):=\left(s_{n}\right)_{n}\in\mathcal{S}^{\mathbb{N}\cup\left\{0\right\}}$ the infinite sequence of states associated to $\left(\alpha,\beta\right)\in\tilde{X}$ . From the definitions in Section 2.2, $s_{n}=G$ (or $B_{-},B_{+}$ respectively) if and only if the incidence matrix $A_{n}$ is of the form (2.17) (or (2.18), (2.19) respectively). It can be easily deduced from the description of the renormalization procedure that not all sequences in $\mathcal{S}^{\mathbb{N}}$ are images of some pair $\left(\alpha,\beta\right)\in\tilde{X}$ . The sequences $s\in\mathcal{S}^{\mathbb{N}\cup\left\{0\right\}}$ such that $s=s\left(\alpha,\beta\right)$ for some $\left(\alpha,\beta\right)\in\tilde{X}$ form a stationary Markov compactum $\tilde{\mathcal{S}}\subset\mathcal{S}^{\mathbb{N}\cup\left\{0\right\}}$ with state space determined by the graph,

[TABLE]

namely $s=\left(s_{n}\right)_{n}\in\tilde{\mathcal{S}}$ if and only if for any $n\geq 0$ there is an oriented edge from the state $s_{n}\in\mathcal{S}$ to the state $s_{n+1}\in\mathcal{S}$ in the graph above.

Since at this point we are interested solely in positivity of the incidence matrices and not in the values themselves, we define a function $F:\mathcal{S}\rightarrow M_{3}\left(\mathbb{Z}\right)$ , where $M_{3}\left(\mathbb{Z}\right)$ are $3\times 3$ matrices, by

[TABLE]

Note that $F$ is defined in such a way that an entry of the matrix $F\left(s_{n}\right)$ is $1$ if and only if the corresponding entry of the incidence matrix $A_{n}$ which corresponds to the state $s\left(\alpha,\beta\right):=\left(s_{n}\right)_{n}$ has a non-zero value, independently of $a_{n}$ and $b_{n}$ (for example $a_{n}$ and $a_{n}-b_{n}+1$ are always greater than $1$ or $b_{n}\geq 1$ when $\left(\alpha,\beta\right)\in G$ ). Note that the other implication is not necessarily true, namely some entries of $F(s)$ could be $0$ even if the corresponding entries of the incidence matrices are positive (in such cases the positivity depends on the values of $a_{n}$ and $b_{n}$ , for example $a_{n}-b_{n}$ is zero if $a_{n}=b_{n}$ ). Thus, for any $n,k\in\mathbb{N}\bigcup\left\{0\right\}$ ,

[TABLE]

It immediately follows from the topology of the transition graph that every itinerary $s\in\mathcal{S}^{\mathbb{N}\cup\left\{0\right\}}$ can be written in the form

[TABLE]

where $W_{k}$ , $k\in\mathbb{N}$ are words in the alphabet $\mathcal{S}$ which do not contain $B_{-}$ (i.e. they are words in $G$ and $B_{+}$ ), and $W_{k}$ is not empty for $k\geq 2$ . Note that it may be that the number of appearances of $B_{-}$ in the above representation is finite. This means that there exists $K$ such that $n_{k}=0$ for $k\geq K$ and in this case the above representation reduces to

[TABLE]

where the length of $W_{K+1}$ is infinite.

Definition 4.1.

Let $\left(\alpha,\beta\right)\in\tilde{X}$ . We say that $\beta$ *is of Ostrowski bounded type with respect to* $\alpha$ if the decomposition of $s\left(\alpha,\beta\right)\in\mathcal{S}^{\mathbb{N}}$ given by (4.3) or (4.4) satisfies $\sup\left\{n_{k}\right\}=M<\infty$ , where the supremum is taken over $k\in\mathbb{N}$ in the first case, and over $k\in\left\{1,...,K\right\}$ in the second case. We say in both cases that $\beta$ *is of Ostrowski bounded type of order* $M$ .
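For a finite prefix of an itinerary, the exponents $n_{k}$ in the decomposition can be read off mechanically. The sketch below is a minimal illustration, assuming the states are encoded as the strings `"G"`, `"B-"`, `"B+"` (a hypothetical encoding) and that every `"B-"` is immediately followed by `"B+"`, as in the decomposition above. The example prefix is made up and has not been checked against the transition graph.

```python
def pair_run_lengths(itinerary):
    """Lengths n_k of the maximal runs of consecutive ("B-", "B+") pairs in a finite prefix."""
    runs = []
    current = 0
    i = 0
    while i < len(itinerary):
        if itinerary[i] == "B-":
            # In the decomposition every B- is followed by B+ (unless the prefix is cut here).
            if i + 1 < len(itinerary) and itinerary[i + 1] != "B+":
                raise ValueError("B- must be followed by B+")
            current += 1
            i += 2
        else:
            # A symbol of some word W_k (G or B+) closes the current run of pairs.
            if current:
                runs.append(current)
                current = 0
            i += 1
    if current:
        runs.append(current)
    return runs

# Hypothetical prefix: W_1 = GG, (B-B+)^2, W_2 = GG, (B-B+)^1, then the start of W_3.
prefix = ["G", "G", "B-", "B+", "B-", "B+", "G", "G", "B-", "B+", "G", "G"]
runs = pair_run_lengths(prefix)
order_on_prefix = max(runs, default=0)
print(runs, order_on_prefix)
```

The quantity `order_on_prefix` is only a lower bound for the order $M$ , since the supremum in the definition is taken over the whole infinite itinerary.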

Proposition 4.2.

Let $\beta$ be of Ostrowski bounded type of order $M$ with respect to $\alpha$ and let $(A_{i})_{i}$ be the sequence of incidence matrices associated to $(\alpha,\beta)$ by the Ostrowski renormalization. Then for any $k$ , and any $n\geq 5M$ , we have that $A_{k+n}A_{k+n-1}...A_{k}>0$ .

Proof.

Let $W_{1}\left(B_{-}B_{+}\right)^{n_{1}}W_{2}\left(B_{-}B_{+}\right)^{n_{2}}...W_{k}\left(B_{-}B_{+}\right)^{n_{k}}...$ be the decomposition of $s\left(\alpha,\beta\right)$ described above. Direct calculation gives that the product of matrices which corresponds to an admissible word of length $5$ (or more) which does not contain $B_{-}$ is strictly positive. Also, any word of length $5$ which starts with $B_{-}B_{+}G$ gives a transition matrix which is strictly positive. Note that it follows from the transition graph that each $W_{i}$ , $i\geq 2$ must start with $G$ and must be of length strictly greater than $1$ . Since any subword of length greater than $5M$ must contain a block of the form $B_{-}B_{+}W_{i}B_{-}$ , or a block of length at least $5$ where there is no occurrence of $B_{-}$ , the claim follows. ∎

Lemma 4.3.

If $0<\alpha<\frac{1}{2}$ is badly approximable and $\beta\in\left(0,1\right)$ is badly approximable with respect to $\alpha$ , then the pair $\left(\alpha_{0},\beta_{0}\right),$ related to $\left(\alpha,\beta\right)$ via equations (2.3) and (2.4), satisfies $\left(\alpha_{0},\beta_{0}\right)\in\tilde{X}$ and $\beta_{0}$ is of Ostrowski bounded type with respect to $\alpha_{0}$ .

Proof.

Let $\sum_{k=0}^{\infty}x^{\left(k\right)}$ be the Ostrowski expansion of $\beta_{0}$ in terms of $\alpha_{0}$ given by Proposition 2.2. Then by Remark 2.3 $\sum_{k=0}^{n}x^{\left(k\right)}\in\left\{T_{\alpha_{0}}^{j}\left(0\right):\ 0\leq j\leq q_{n-1}+q_{n}\right\}\bigcup\left\{\alpha_{0}\right\}$ where $q_{n}$ is the denominator of the $n^{th}$ convergent in the continued fraction expansion of $\alpha$ . Since under the conjugacy $\psi$ between $T_{\alpha_{0}}$ and $R_{\alpha}$ (where both maps are viewed as rotations on a circle), the (equivalence classes of the) points $0$ and $\alpha_{0}$ in the domain of $T_{\alpha_{0}}$ correspond respectively to the (equivalence classes of the) points $1-\alpha$ and $1$ in the domain of $R_{\alpha}$ , we obtain that $\psi^{-1}\left(\sum_{k=0}^{n}x^{\left(k\right)}\right)\in\left\{R_{\alpha}^{j}\left(1-\alpha\right):\ 0\leq j\leq q_{n-1}+q_{n}-1\right\}$ . It follows that the Ostrowski expansion of $\beta_{0}$ is infinite, since otherwise, if there exists an $n$ such that $\beta_{0}=\sum_{k=0}^{n}x^{\left(k\right)}$ , we would get that $\beta=\psi^{-1}\left(\beta_{0}\right)=1-\alpha+j\alpha\mod 1$ for some $j\in\mathbb{N}\bigcup\left\{0\right\}$ , which obviously contradicts (1.1). Thus, $\left(\alpha_{0},\beta_{0}\right)\in\tilde{X}$ .

Fix $M\in\mathbb{N}$ and let $s=s\left(\alpha_{0},\beta_{0}\right)$ be defined by (4.1). We claim that, if for some $n\in\mathbb{N}$ , $s_{n+i}\in\left\{B_{-},B_{+}\right\}$ for all $1\leq i\leq M$ , then there exist a constant $C$ , which does not depend on $n$ , and $0\leq k\leq q_{n}+q_{n-1}$ , $p\in\mathbb{Z}$ , such that

[TABLE]

The second assertion of the Lemma follows immediately from this and the fact that $q_{n+M}\rightarrow\infty$ as $M$ tends to $\infty$ .

To see that the claim holds, suppose that $s_{n+i}\in\left\{B_{-},B_{+}\right\}$ for all $0\leq i\leq M$ . Recalling the description of the renormalization procedure in section 2.2, this is equivalent to $x^{\left(n+i\right)}=0$ for all $0\leq i\leq M$ , so that $\sum x^{\left(k\right)}=\sum_{k=0}^{n+M}x^{\left(k\right)}$ . Thus, by the estimate of the remainder in an Ostrowski expansion given by Proposition 2.2, we obtain that

[TABLE]

Since $\alpha$ is badly approximable, $\alpha^{\left(n\right)}=\mathcal{G}^{n}\left(\alpha\right)\leq\frac{C}{q_{n}}$ for all $n$ , where $C$ is a constant which depends only on $\alpha$ . Since the conjugacy map $\psi$ is affine, the previous inequality yields that there exists a constant $C$ , such that

[TABLE]

Since $\psi^{-1}\left(\sum_{k=0}^{n}x^{\left(k\right)}\right)\in\left\{R_{\alpha}^{j}\left(1-\alpha\right):\ 0\leq j<q_{n-1}+q_{n}\right\}$ , we obtain that

[TABLE]

where $0\leq k<q_{n}+q_{n-1}$ , and $p\in\mathbb{Z}$ . Thus, combining the last two equations, we proved (4.5). This completes the proof of the Lemma. ∎
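The hypothesis on $\alpha$ used in this proof can be made concrete numerically. It is a standard fact that $\alpha$ is badly approximable exactly when its continued fraction entries are bounded, equivalently when $q_{n}\left\|q_{n}\alpha\right\|$ stays bounded away from zero, where $\left\|\cdot\right\|$ denotes the distance to the nearest integer. The sketch below checks this in floating point for the quadratic irrational $\alpha=\left(3-\sqrt{5}\right)/2<\frac{1}{2}$ , chosen here only as an example; the helper names are hypothetical, and only the first few entries are reliable in double precision.

```python
import math

def partial_quotients(alpha, count):
    """First `count` continued fraction entries a_1, a_2, ... of alpha in (0, 1)."""
    digits, x = [], alpha
    for _ in range(count):
        x = 1.0 / x
        a = math.floor(x)
        digits.append(a)
        x -= a          # apply the Gauss map
    return digits

def convergent_denominators(digits):
    """q_0 = 1, q_1 = a_1 and q_n = a_n * q_{n-1} + q_{n-2}."""
    qs = [1, digits[0]]
    for a in digits[1:]:
        qs.append(a * qs[-1] + qs[-2])
    return qs

def distance_to_integers(x):
    return abs(x - round(x))

alpha = (3 - math.sqrt(5)) / 2
digits = partial_quotients(alpha, 12)
qs = convergent_denominators(digits)
scaled = [q * distance_to_integers(q * alpha) for q in qs]

print(digits)
print(qs)
print(min(scaled))
```

For this $\alpha$ the entries are $2,1,1,\dots$ and the denominators are Fibonacci numbers, so the scaled distances stay bounded below on the computed range.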

Let $0<\alpha<\frac{1}{2}$ be badly approximable, let $\beta\in\left(0,1\right)$ be badly approximable with respect to $\alpha$ and let $\left(\alpha_{0},\beta_{0}\right)$ be related to $\left(\alpha,\beta\right)$ via equations (2.3) and (2.4). Since by Lemma 4.3 $\left(\alpha_{0},\beta_{0}\right)\in\tilde{X}$ , the sequence of transition matrices $p^{\left(n\right)}$ associated to the pair $\left(\alpha_{0},\beta_{0}\right)$ given by Definition 2.12 is well defined. Recall that $\tau\left(P\right)$ , where $P$ is a stochastic matrix, denotes the contraction coefficient defined by (3.3).

Corollary 4.4.

Let $0<\alpha<\frac{1}{2}$ be badly approximable, $\beta\in\left(0,1\right)$ be badly approximable with respect to $\alpha$ and let $\left(\alpha_{0},\beta_{0}\right)$ be related to $\left(\alpha,\beta\right)$ via equations (2.3) and (2.4). Then if $p^{\left(n\right)}$ is the sequence of transition matrices associated to $\left(\alpha_{0},\beta_{0}\right)$ (see Definition 2.12), there exist $M\in\mathbb{N}$ , and $0\leq\delta<1$ , such that

[TABLE]

Proof.

Lemma 4.3 implies that $\beta_{0}$ is of Ostrowski bounded type. By definition of the transition matrices $p^{\left(n\right)}$ (see Definition 2.12), for any $\left(K,k\right)\in\mathcal{S}_{n+M+1}$ , $\left(J,j\right)\in\mathcal{S}_{n}$

[TABLE]

if and only if

[TABLE]

This should be interpreted as the statement that the probability to pass from a state $\left(K,k\right)\in\mathcal{S}_{M+n+1}$ to some state $\left(J,j\right)\in\mathcal{S}_{n}$ is positive if and only if the intersection of the tower $Z_{J}^{\left(n\right)}$ with the subtower of $Z_{K}^{\left(n+M+1\right)}$ labelled by $\left(K,k\right)$ is non-empty. Thus, Proposition 4.2 implies that there exists $M\in\mathbb{N}$ such that $p^{\left(n+M\right)}\cdot...\cdot p^{\left(n\right)}$ is strictly positive for any $n\in\mathbb{N}$ . From $\alpha$ being badly approximable (see inequality (2.23)) and by the fact that by definition, every positive entry of $p^{\left(n+M\right)}\cdot...\cdot p^{\left(n\right)}$ is a ratio between the heights of tower at the $\left(n+M\right)^{th}$ and $n^{th}$ stage of the renormalization, it follows that there exists $\delta>0$ which is independent of $n$ , such that every entry of $p^{\left(n+M\right)}\cdot...\cdot p^{\left(n\right)}$ is not less than $\delta$ . Note that it follows from the definition of the coefficient $\tau$ (see (3.3)) that if $P_{n\times m}$ is a stochastic matrix such that there exists $\delta>0$ , for which $P_{i,j}>\delta$ , for all $1\leq i\leq n$ , $1\leq j\leq m$ , then $\tau\left(P\right)<1-\delta$ . Thus, the proof is complete. ∎

4.2. Growth of the variance

In this section we consider the random variables $\xi_{k}\left(X_{k}\right),\,k\in\mathbb{N}$ , constructed in Section 2.6 (see equation (2.30) therein). Recall that the array is well defined for any given pair of parameters $\left(\alpha_{0},\beta_{0}\right)\in\tilde{X}$ and, by the key Proposition 2.16, models Birkhoff sums over the transformation $T_{\alpha_{0}}$ of the function $\varphi$ defined by (2.5), which has a jump at $\beta_{0}$ . The goal in the present section is to show that if $\varphi$ is not a coboundary, then the variance $Var_{\mu_{n}}\left(\sum_{k=1}^{n}\xi_{k}\left(X_{k}\right)\right)$ tends to infinity as $n$ tends to infinity, where $Var_{\mu_{n}}\left(\sum_{k=1}^{n}\xi_{k}\left(X_{k}\right)\right)$ is the variance of $\sum_{k=1}^{n}\xi_{k}\left(X_{k}\right)$ with respect to the measure $\mu_{n}$ .

Let us first recall the definition of tightness and a criterion which characterizes coboundaries.

Definition 4.5.

Let $\left(\Omega,\mathcal{B},P\right)$ be a probability space. A sequence of random variables $\left\{Y_{n}\right\}$ defined on $\Omega$ and taking values in a Polish space $\mathcal{P}$ is tight if for every $\epsilon>0$ , there exists a compact set $C\subseteq\mathcal{P}$ such that $\forall n\in\mathbb{N}$ , $P\left(Y_{n}\in C\right)>1-\epsilon$ .

Let $\left(X,\mathcal{B},m,T\right)$ be a probability preserving system and let $f:X\rightarrow\mathbb{R}$ be a measurable function. We say that $f$ is a coboundary if there exists a measurable function $g:X\rightarrow\mathbb{R}$ such that the equality $f\left(x\right)=g\left(x\right)-g\circ T\left(x\right)$ holds almost surely. Let us recall the following characterization of coboundaries on $\mathbb{R}$ (see [4]).

Theorem 4.6.

The sequence $\left\{\sum_{k=0}^{n-1}f\circ T^{k}\right\}$ is tight if and only if $f$ is a coboundary.
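The easy direction of this criterion is a telescoping computation: if $f=g-g\circ T$ then $\sum_{k=0}^{n-1}f\circ T^{k}=g-g\circ T^{n}$ , so the Birkhoff sums inherit the tightness of $g$ . The sketch below illustrates this for a circle rotation with the bounded transfer function $g\left(x\right)=\sin\left(2\pi x\right)$ ; the rotation number and $g$ are assumptions chosen for the example and do not come from the paper.

```python
import math

ALPHA = (3 - math.sqrt(5)) / 2   # assumed rotation number

def rotate(x):
    """Circle rotation T(x) = x + ALPHA mod 1."""
    return (x + ALPHA) % 1.0

def g(x):
    """Assumed bounded transfer function."""
    return math.sin(2 * math.pi * x)

def f(x):
    """Coboundary f = g - g o T."""
    return g(x) - g(rotate(x))

def birkhoff_sums(x, n):
    """Return S_1(x), ..., S_n(x) together with the points T x, ..., T^n x."""
    sums, points, total = [], [], 0.0
    for _ in range(n):
        total += f(x)
        x = rotate(x)
        sums.append(total)
        points.append(x)
    return sums, points

x0 = 0.3
sums, points = birkhoff_sums(x0, 5000)

# S_n(x0) should agree with g(x0) - g(T^n x0) up to rounding.
telescoping_error = max(abs(s - (g(x0) - g(p))) for s, p in zip(sums, points))
largest_sum = max(abs(s) for s in sums)
print(telescoping_error, largest_sum)
```

Since $\left|g\right|\leq 1$ , all the sums lie in $\left[-2,2\right]$ , which is a compact set as required by Definition 4.5.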

Set $e_{n}=E_{\mu_{n}}\left(\sum_{k=1}^{n}\xi_{k}\left(X_{k}\right)\right)$ , $\sigma_{n}=\sqrt{Var_{\mu_{n}}\left(\sum_{k=1}^{n}\xi_{k}\left(X_{k}\right)\right)}$ . We will now prove the following lemma.

Lemma 4.7.

Assume that there exists a strictly increasing sequence of positive integers $\left\{n_{j}\right\}_{j=1}^{\infty}$ such that

[TABLE]

Then the sequence $\varphi_{n}=\sum_{k=0}^{n-1}\varphi\circ T_{\alpha_{0}}^{k}$ is tight.

Thus, combining Theorem 4.6 and Lemma 4.7 we have the following.

Corollary 4.8.

If $\sigma_{n}$ does not tend to infinity as $n\rightarrow\infty$ , then $\varphi$ must be a coboundary.

Proof of Lemma 4.7.

Fix $\epsilon>0$ . By Markov’s inequality the assumption that $\sup\left\{\sigma_{n_{j}}:\ j=1,2,...\right\}<\infty$ , implies that there exists a constant $A$ such that for every $j\in\mathbb{N}$ ,

[TABLE]

Let $n\in\mathbb{N}$ and fix $j$ such that $n<\epsilon h_{J}^{\left(n_{j}\right)}$ for any $J\in\left\{L,M,S\right\}$ (this is possible since the heights of the towers $h_{J}^{\left(n\right)}$ tend to infinity with $n$ ). Let $x$ be any point on level $l$ of the tower $Z_{J}^{\left(n_{j}\right)}$ and consider the Birkhoff sums $\varphi_{n}(x)$ . Then there exists a point $x_{0}=x_{0}(x)$ in the base of the tower $I_{J}^{\left(n_{j}\right)}$ such that $\varphi_{n}\left(x\right)=\varphi_{n+l}\left(x_{0}\right)-\varphi_{l}\left(x_{0}\right)$ . Since the values of $\varphi_{l}(x_{0})$ for $x_{0}\in I_{J}^{\left(n_{j}\right)}$ and $0\leq l\leq h_{J}^{\left(n_{j}\right)}$ do not depend on $x_{0},$ we can choose any point $x_{J}$ $\in I_{J}^{\left(n_{j}\right)}$ and by the triangle inequality we have that $\left|\varphi_{n}\left(x\right)\right|>2A$ implies that $\left|\varphi_{n+l}\left(x_{J}\right)-e_{n_{j}}\right|>A$ or $\left|\varphi_{l}\left(x_{J}\right)-e_{n_{j}}\right|>A$ for any point $x$ on level $l$ of the tower $Z_{J}^{\left(n_{j}\right)}$ with $0\leq l<h_{J}^{\left(n_{j}\right)}-n$ . Thus,

[TABLE]

where the last inequality follows by using that $\lambda\left(I_{J}^{\left(n_{j}\right)}\right)h_{J}^{\left(n_{j}\right)}=\lambda\left(Z_{J}^{\left(n_{j}\right)}\right)\leq 1$ and recalling that by choice of $n_{j}$ we have that $n/h_{J}^{\left(n_{j}\right)}<\epsilon$ . Furthermore, by a change of indexes,

[TABLE]

where the last equality follows from Proposition 2.16. Therefore, from the relation between the measures $\mu_{n}^{J}$ and $\mu_{n}$ (see Definition 2.12) it follows that

[TABLE]

It follows from (4.6) that $\lambda\left(\left\{x:\left|\varphi_{n}\left(x\right)\right|>2A\right\}\right)<5\epsilon$ . Since $\epsilon$ was chosen arbitrarily, this shows that $\varphi_{n}$ is tight. ∎
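The identity $\varphi_{n}\left(x\right)=\varphi_{n+l}\left(x_{0}\right)-\varphi_{l}\left(x_{0}\right)$ for $x=T^{l}x_{0}$ , used in the proof above, is the cocycle property of Birkhoff sums and holds for any map and any function. The sketch below checks it for the rotation $R_{\alpha}$ on $\left[0,1\right)$ and the mean zero step function $1_{\left[0,\beta\right)}-\beta$ described in the abstract, rather than for $T_{\alpha_{0}}$ and $\varphi$ themselves; the parameter values and the point `x0` are arbitrary assumptions, and `x0` is not claimed to lie in the base of a renormalization tower.

```python
import math

ALPHA = (3 - math.sqrt(5)) / 2   # assumed rotation number
BETA = 0.5                       # assumed location of the jump

def rotate(x):
    """Circle rotation R(x) = x + ALPHA mod 1."""
    return (x + ALPHA) % 1.0

def f_beta(x):
    """Mean zero step function: indicator of [0, BETA) minus BETA."""
    return (1.0 if x < BETA else 0.0) - BETA

def orbit_point(x, steps):
    """Return R^steps(x)."""
    for _ in range(steps):
        x = rotate(x)
    return x

def birkhoff_sum(x, n):
    """Return the sum of f_beta along x, R x, ..., R^(n-1) x."""
    total = 0.0
    for _ in range(n):
        total += f_beta(x)
        x = rotate(x)
    return total

x0, level, n = 0.1, 37, 250
x = orbit_point(x0, level)       # plays the role of a point on level l above x0

lhs = birkhoff_sum(x, n)
rhs = birkhoff_sum(x0, n + level) - birkhoff_sum(x0, level)
print(lhs, rhs)
```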

4.3. Proof of Theorem 1.1.

We begin this section with a few observations that summarize the results obtained in the preceding sections in the form that is used in order to prove Theorem 4.9 below from which the main theorem follows.

Let $0<\alpha<\frac{1}{2}$ be badly approximable and $\beta\in\left(0,1\right)$ be badly approximable with respect to $\alpha$ . By Lemma 4.3 the pair $\left(\alpha_{0},\beta_{0}\right)$ related to $\left(\alpha,\beta\right)$ via equations (2.3) and (2.4), satisfies $\left(\alpha_{0},\beta_{0}\right)\in\tilde{X}$ . To each such pair, in Section 2.5 we associated a Markov compactum given by a sequence of transition matrices $\left\{A_{n}\right\}$ (which are incidence matrices for the substitutions $\left\{\tau_{n}\right\}$ which describe the Rokhlin tower structure) and Markov measures $\left\{\mu_{n}\right\}$ with transition matrices $\left\{p^{\left(n\right)}\right\}$ (defined in (2.26) and Definition 2.12 respectively). Let $\left\{X_{k}\right\}$ be the coordinate functions on the Markov compactum (see (2.30)) and $\left\{\xi_{k}\right\}$ be the functions defined in Definition 2.15, which can be used to study the behavior of Birkhoff sums of the function $\varphi$ defined by (2.5) over $T_{\alpha_{0}}$ by virtue of Proposition 2.16. We set

[TABLE]

where the subscript $\mu_{n}$ in $E_{\mu_{n}}$ and $Var_{\mu_{n}}$ means that all integrals are taken with respect to the measure $\mu_{n}$ .

Since the function $\varphi$ defined by (2.5) is not a coboundary (see Remark 2.4), Corollary 4.8 implies that $\sigma_{n}\rightarrow\infty$ . By definition of $\xi_{k}$ , combining the assumption that $\alpha$ is badly approximable with the inequality (2.24), we obtain that

[TABLE]

Finally, for any $n\in\mathbb{N}$ , set $\xi_{k}^{\left(n\right)}:=\xi_{k}$ , $X_{k}^{\left(n\right)}:=X_{k}$ , for $k=n,...,1$ . Let us then define a Markov array $\left\{X_{k}^{\left(n\right)}:\ n\in\mathbb{N},\ k=n,...,1\right\}$ , where $Prob\left(\left(X_{1}^{\left(n\right)},...,X_{n}^{\left(n\right)}\right)\in A\right)=\mu_{n}\left(A\right)$ for every set $A$ in the Borel $\sigma$ -algebra of the space $\Sigma_{n}$ . The observations above together with Corollary 4.4 show that all assumptions of Corollary 3.3 hold for this array. Thus

[TABLE]

Moreover, by Proposition 3.4 (and the fact that $\sigma_{n}\rightarrow\infty$ ), (4.7) holds with $\mu_{n}$ replaced by $\mu_{n}^{J}$ , for any $J\in\left\{L,M,S\right\}$ (where $\mu_{n}^{J}$ are the conditional measures defined by (2.27)).

We can now deduce the temporal CLT for Birkhoff sums. Fix $x\in\left[-1,\alpha_{0}\right)$ . Let us first define the centralizing and normalizing constants for the Birkhoff sums $\varphi_{n}(x).$ For $n\in\mathbb{N}$ , let $N=N\left(n\right):=\min\left\{k:\ n\leq h_{S}^{\left(k\right)}\right\}$ . Let $Z_{J}^{\left(N\right)}$ be the tower at stage $N$ of the renormalization which contains the point $x$ and let $l_{n}$ be the level of the tower $Z_{J}^{\left(N\right)}$ which contains $x$ , i.e. $l_{n}$ satisfies $x\in T^{l_{n}}\left(I_{J}^{\left(N\right)}\right)$ . Set $c_{n}\left(x\right):=\varphi_{l_{n}}\left(x^{\prime}\right)$ where $x^{\prime}$ is any point in $I_{J}^{\left(N\right)}$ , i.e. $c_{n}\left(x\right)$ is the Birkhoff sum over the tower $Z_{J}^{\left(N\right)}$ from the bottom of the tower and up to the level that contains $x$ .

We will prove the following temporal DLT, from which Theorem 1.1 follows immediately recalling the correspondence between $R_{\alpha}$ and $T_{\alpha_{0}}$ and the functions $f_{\beta}$ and $\varphi$ (refer to the beginning of Section 2.2).

Theorem 4.9.

For any $a<b$ ,

[TABLE]

The above formulation, in particular, shows that the centralizing constants depend on the point $x$ and have a very clear dynamical meaning. The proof of this Theorem, which will take the rest of the section, is based on a quite standard decomposition of Birkhoff sums into special Birkhoff sums. For each intermediate Birkhoff sum along a tower, we then exploit the connection with the Markov chain given by Proposition 2.16 and the convergence given by (4.7).
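The occupancy quantities in a temporal limit theorem can be explored numerically by fixing one initial point and recording the Birkhoff sums along its orbit. The sketch below is only an experiment under stated assumptions: it uses the rotation $R_{\alpha}$ and the step function $1_{\left[0,\beta\right)}-\beta$ from the abstract with $\alpha$ quadratic irrational, $\beta$ rational and initial point the origin, and it centers and rescales with the empirical mean and standard deviation of the recorded sums instead of the constants of Theorem 4.9. The function names are hypothetical.

```python
import numpy as np

ALPHA = (3 - np.sqrt(5)) / 2   # assumed quadratic irrational rotation number
BETA = 0.5                     # assumed rational jump location

def temporal_birkhoff_sums(x, n):
    """Return phi_0(x), ..., phi_{n-1}(x), where phi_k sums f_beta over the first k orbit points."""
    orbit = (x + ALPHA * np.arange(n)) % 1.0
    values = (orbit < BETA).astype(float) - BETA
    # phi_0 = 0 is the empty sum; phi_k is the sum of the first k values.
    return np.concatenate(([0.0], np.cumsum(values)[:-1]))

def occupancy_fraction(sums, a, b):
    """Fraction of times k at which the empirically standardized sum lies in [a, b]."""
    z = (sums - sums.mean()) / sums.std()
    return float(np.mean((z >= a) & (z <= b)))

sums = temporal_birkhoff_sums(0.0, 100_000)
fraction = occupancy_fraction(sums, -2.0, 2.0)
print(fraction)
```

A histogram of the standardized sums can be compared by eye with a Gaussian density. Note that the variance of the sums grows slowly with the orbit length, so agreement at moderate lengths is not guaranteed.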

Proof.

Fix $0<\epsilon<1$ , $a,b\in\mathbb{R}$ , $a<b$ and let $n\in\mathbb{N}$ . By definition of $N=N\left(n\right)$ , the points $\left\{x,...,T^{n-1}x\right\}$ are contained in at most two towers obtained at the $N^{th}$ level of the renormalization. Let $K$ be defined by $K:=K\left(n\right)=\max\left\{k:\ h_{L}^{\left(k\right)}\leq\epsilon n\right\}$ . Evidently, $K\leq N$ , and by (2.22) there exists $C>0$ which depends on $\epsilon$ but not on $n$ , such that $N-K\leq C$ .

Thus, since towers of level $N$ are decomposed into towers of level $K$, we can decompose the orbit $\left\{x,...,T^{n-1}x\right\}$ into blocks, each of which is contained in a tower of level $K$. More precisely, as shown in Figure 4.1, there exist $0=k_{0}\leq k_{1}<\dots<k_{t}\leq n$ and towers $\left(Z_{J_{k_{i}}}^{\left(K\right)}\right)_{i=0}^{t}$ appearing at the $K^{th}$ stage of renormalization such that, with the convention $k_{t+1}:=n$, $\left\{T^{k_{i}}x,...,T^{k_{i+1}-1}x\right\}\subseteq Z_{J_{k_{i}}}^{\left(K\right)}$ for $i=0,...,t$. Moreover, for $i=1,...,t-1$, the set $\left\{T^{k_{i}}x,...,T^{k_{i+1}-1}x\right\}$ contains exactly $h_{J_{k_{i}}}^{\left(K\right)}$ points, i.e. $k_{i+1}-k_{i}=h_{J_{k_{i}}}^{\left(K\right)}$, and the point $T^{k_{i}+j}x$, $j=0,...,k_{i+1}-k_{i}-1$, belongs to the $(j+1)$-th level of the tower $Z_{J_{k_{i}}}^{\left(K\right)}$. Since the orbit segment is contained in at most two towers of level $N$ and each tower of level $N$ contains at most $h_{L}^{\left(N\right)}/h_{S}^{\left(K\right)}$ towers of level $K$, we have $t=t\left(n\right)\leq 2h_{L}^{\left(N\right)}/h_{S}^{\left(K\right)}$, and hence $t$ is uniformly bounded in $n$.

It follows from this decomposition that, for any interval $I\subset\mathbb{R}$ ,

[TABLE]

where the last inequality follows from the fact that $h_{J_{k_{0}}}^{\left(K\right)}$ and $h_{J_{k_{t}}}^{\left(K\right)}$ are both at most $\epsilon n$. Evidently, we also have the opposite inequality

[TABLE]

For $i=1,...,t-1$ , and $k_{i}<k\leq k_{i+1}$ , write

[TABLE]

where $x^{\prime}$ is any point in $I_{J_{k_{i}}}^{\left(K\right)}$ .
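For reference, splitting a Birkhoff sum at time $k_{i}$ rests on the additivity (cocycle) property. Assuming the usual convention $\varphi_{m}(y)=\sum_{j=0}^{m-1}\varphi\left(T^{j}y\right)$, one has, for all $m,r\geq 0$ and every point $y$,

$$\varphi_{m+r}\left(y\right)=\varphi_{m}\left(y\right)+\varphi_{r}\left(T^{m}y\right).$$

Taking $y=x$, $m=k_{i}$ and $r=k-k_{i}$ gives $\varphi_{k}\left(x\right)=\varphi_{k_{i}}\left(x\right)+\varphi_{k-k_{i}}\left(T^{k_{i}}x\right)$. Taking instead $y=x_{0}$, $m=l_{n}$ and $r=k_{i}$, where $x_{0}$ denotes the point of $I_{J}^{\left(N\right)}$ with $T^{l_{n}}x_{0}=x$ (so that $c_{n}\left(x\right)=\varphi_{l_{n}}\left(x_{0}\right)$ by definition), gives

$$\varphi_{l_{n}+k_{i}}\left(x_{0}\right)=c_{n}\left(x\right)+\varphi_{k_{i}}\left(x\right).$$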

By definition of $c_{n}\left(x\right)$, $\varphi_{k_{i}}\left(x\right)+c_{n}\left(x\right)=\varphi_{l_{n}+k_{i}}\left(x_{0}\right)$, where $x_{0}$ is the point in the base $I^{\left(N\right)}$ with $T^{l_{n}}x_{0}=x$ (see Figure 4.1). Thus $\varphi_{k_{i}}\left(x\right)+c_{n}\left(x\right)$ is a sum of special Birkhoff sums over subtowers of $Z_{J}^{\left(K\right)}$, $J\in\left\{L,M,S\right\}$. Hence,

[TABLE]

by (2.24), there exists a constant $\tilde{C}:=\tilde{C}\left(\epsilon\right)$ which does not depend on $n$ , such that $\left|\varphi_{k_{i}}\left(x\right)+c_{n}\left(x\right)\right|\leq\tilde{C}$ . It follows from Proposition 2.16 that

[TABLE]

Since $\left|N-K\right|=\left|N\left(n\right)-K\left(n\right)\right|<C$, we have that $\sup_{n}\left\{\left|e_{N}-e_{K}\right|\right\}<\infty$ and $\frac{\sigma_{N}}{\sigma_{K}}\underset{n\rightarrow\infty}{\longrightarrow}1$. Moreover, since $\left|\varphi_{k_{i}}\left(x\right)+c_{n}\left(x\right)\right|\leq\tilde{C}$, it follows from (4.7) that, for any $J\in\left\{L,M,S\right\}$,

[TABLE]

Let $n_{0}$ be such that for all $n>n_{0}$ and any $J\in\left\{L,M,S\right\}$ ,

[TABLE]

Then, if $n>n_{0}$, by (4.8) and (4.10), recalling that $\sum_{i=1}^{t-1}h_{J_{k_{i}}}^{\left(K\right)}\leq n$,

[TABLE]

Similarly, by (4.9), if $n>n_{0}$, using this time that $\sum_{i=1}^{t-1}h_{J_{k_{i}}}^{\left(K\right)}\geq n(1-2\epsilon)$, we obtain the lower bound

[TABLE]
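The way the two bounds combine can be written schematically as follows. The shorthand $F_{n}(a,b)$ for the rescaled occupancy fraction in Theorem 4.9, $G(a,b)$ for its Gaussian limit, and the constant $C^{\prime}$ are introduced here only for illustration and are not notation from the text. Assuming the upper and lower bounds have the form $F_{n}(a,b)\leq G(a,b)+C^{\prime}\epsilon$ and $F_{n}(a,b)\geq G(a,b)-C^{\prime}\epsilon$ for all $n>n_{0}$, with $C^{\prime}$ independent of $n$ and $\epsilon$, one gets

$$G(a,b)-C^{\prime}\epsilon\;\leq\;\liminf_{n\to\infty}F_{n}(a,b)\;\leq\;\limsup_{n\to\infty}F_{n}(a,b)\;\leq\;G(a,b)+C^{\prime}\epsilon,$$

and, since $\epsilon\in(0,1)$ was arbitrary, $\lim_{n\to\infty}F_{n}(a,b)=G(a,b)$.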

This completes the proof. ∎

Acknowledgments.

We would like to thank Jon Aaronson, Dima Dolgopyat, Jens Marklof and Omri Sarig for useful discussions and for their interest in our work. Both authors are supported by the ERC Starting Grant ChaParDyn. C. U. is also supported by the Leverhulme Trust through a Leverhulme Prize. The research leading to these results has received funding from the European Research Council under the European Union Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement n. 335989.

