Equidistribution of random walks on compact groups

Bence Borda

arXiv:1906.09432·math.PR·April 15, 2021


TL;DR

This paper establishes conditions under which random walks on compact groups become uniformly distributed, and proves related limit theorems including the law of large numbers, the law of the iterated logarithm, and the central limit theorem.

Contribution

It provides necessary and sufficient conditions for equidistribution of random walks on compact groups and extends classical limit theorems to this setting.

Findings

01

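The random walk equidistributes in any given Borel set with probability $1$ if and only if its step distribution is not supported on any proper closed subgroup and $S_{k}$ has an absolutely continuous component for some $k \geq 1$.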

02

Strong law of large numbers and law of the iterated logarithm hold for sums of functions along the walk.

03

Central limit theorem with remainder term is established for sums along the walk.

Abstract

Let $X_{1}, X_{2}, \dots$ be independent, identically distributed random variables taking values from a compact metrizable group $G$. We prove that the random walk $S_{k} = X_{1} X_{2} \cdots X_{k}$, $k = 1, 2, \dots$ equidistributes in any given Borel subset of $G$ with probability $1$ if and only if $X_{1}$ is not supported on any proper closed subgroup of $G$, and $S_{k}$ has an absolutely continuous component for some $k \geq 1$. More generally, the sum $\sum_{k = 1}^{N} f (S_{k})$, where $f : G \to \mathbb{R}$ is Borel measurable, is shown to satisfy the strong law of large numbers and the law of the iterated logarithm. We also prove the central limit theorem with remainder term for the same sum, and construct an almost sure approximation of the process $\sum_{k \leq t} f (S_{k})$ by a Wiener process provided $S_{k}$ converges to the Haar measure in the total variation metric.

Equations

$$\lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^{N}f(S_{k})=\int_{G}f\,\mathrm{d}\mu\quad\text{a.s.}$$

$$\limsup_{N\to\infty}\frac{\left|\sum_{k=1}^{N}f(S_{k})-N\int_{G}f\,\mathrm{d}\mu\right|}{\sqrt{N\log\log N}}<\infty\quad\text{a.s.}$$

$$\frac{\sum_{k=1}^{N}f(S_{k})}{\sqrt{C(f,\nu)N}}\overset{d}{\longrightarrow}\mathcal{N}(0,1)$$

$$\int_{G}f(x)\,\mathrm{d}\mu(x)=\int_{G}f(xy)\,\mathrm{d}\mu(x)=\int_{G}f(yx)\,\mathrm{d}\mu(x).$$

$$\|\vartheta\|_{\mathrm{TV}}=\sup\left\{\int_{G}f\,\mathrm{d}\vartheta\,:\,f:G\to\mathbb{R}\text{ Borel measurable},\ \sup_{G}|f|\leq 1\right\}.$$

$$\int_{G}f\,\mathrm{d}(\vartheta_{1}*\vartheta_{2})=\int_{G}\int_{G}f(xy)\,\mathrm{d}\vartheta_{1}(x)\,\mathrm{d}\vartheta_{2}(y).$$

$$\int_{G}fg\,\mathrm{d}\mu=\sum_{\pi\in\hat{G}}d_{\pi}\,\mathrm{tr}\left(\hat{f}(\pi)\hat{g}(\pi)^{*}\right),$$

$$\|\nu^{*(k+1)}-\mu\|_{\mathrm{TV}}=\|(\nu^{*k}-\mu)*\nu\|_{\mathrm{TV}}\leq\|\nu^{*k}-\mu\|_{\mathrm{TV}}\cdot\|\nu\|_{\mathrm{TV}}=\|\nu^{*k}-\mu\|_{\mathrm{TV}},$$

$$q=\max\left\{\sup_{\pi\in\hat{G}\setminus\{\pi_{0}\}}\rho(\hat{\nu}(\pi)),\ \inf_{k\geq 1}\|(\nu^{*k})_{\mathrm{sing}}\|_{\mathrm{TV}}^{1/k}\right\},$$

$$\lim_{N\to\infty}\frac{\sum_{k=1}^{N}f(S_{k})}{\varphi_{m,\varepsilon}(N)^{1/p}}=0\quad\text{a.s.}$$

$$\limsup_{N\to\infty}\frac{\left|\sum_{k=1}^{N}f(S_{k})\right|}{\sqrt{N\log\log N}}<\infty\quad\text{a.s.}$$

$$C(f,\nu)=\mathbb{E}f(U)^{2}+2\sum_{k=1}^{\infty}\mathbb{E}f(U)f(US_{k}),$$

$$\sup_{x\in\mathbb{R}}\left|\Pr\left(\frac{\sum_{k=1}^{N}f(S_{k})}{\sqrt{C(f,\nu)N}}<x\right)-\Phi(x)\right|\ll K\frac{\log^{\delta/(1+\delta)}N}{N^{\delta/(2+2\delta)}},$$

$$\left|\mathbb{E}g(S_{J})-\mathbb{E}g(U)\right|=\left|\int_{G}g\,\mathrm{d}(\nu^{*|J|}-\mu)\right|\leq\sup_{G}|g|\cdot\Delta_{|J|}.$$

$$\mathbb{E}\left(\sum_{k=1}^{N}f(US_{k})\right)^{2}=C(f,\nu)N+O_{\nu}\left(\|f\|_{2}^{2}\right)\leq\|f\|_{2}^{2}\Delta N.$$

$$A_{k}=\int_{G}\int_{G}f(u)f(ux)\,\mathrm{d}\mu(u)\,\mathrm{d}\nu^{*k}(x).$$

$$\mathbb{E}\left(\sum_{k=1}^{N}f(US_{k})\right)^{2}=\sum_{k=1}^{N}\mathbb{E}f(US_{k})^{2}+2\sum_{1\leq k<\ell\leq N}\mathbb{E}f(US_{k})f(US_{\ell}).$$

$$\begin{aligned}\mathbb{E}\left(\sum_{k=1}^{N}f(US_{k})\right)^{2}&=N\mathbb{E}f(U)^{2}+2\sum_{1\leq k<\ell\leq N}\mathbb{E}f(U)f(US_{\ell-k})\\ &=N\mathbb{E}f(U)^{2}+2\sum_{d=1}^{N-1}(N-d)A_{d}\\ &=C(f,\nu)N+O\left(\sum_{d=1}^{N-1}d|A_{d}|+N\sum_{d=N}^{\infty}|A_{d}|\right)\\ &\leq N\mathbb{E}f(U)^{2}+2N\sum_{d=1}^{N-1}|A_{d}|.\end{aligned}$$

$$C(f,\nu)=\sum_{\pi\in\hat{G}\setminus\{\pi_{0}\}}d_{\pi}\,\mathrm{tr}\left(\hat{f}(\pi)^{*}\hat{f}(\pi)B_{\nu}(\pi)\right),$$

$$A_{k}=\int_{G}f(u)h_{k}(u)\,\mathrm{d}\mu(u)=\sum_{\pi\in\hat{G}}d_{\pi}\,\mathrm{tr}\left(\hat{f}(\pi)\hat{\nu}(\pi)^{k}\hat{f}(\pi)^{*}\right).$$

$$\sum_{k=1}^{\infty}\sum_{\pi\in\hat{G}\setminus\{\pi_{0}\}}d_{\pi}\left|\mathrm{tr}\left(\hat{f}(\pi)\hat{\nu}(\pi)^{k}\hat{f}(\pi)^{*}\right)\right|\leq\|f\|_{2}^{2}\sum_{k=1}^{\infty}\Delta_{k}<\infty,$$

$$C(f,\nu)=\|f\|_{2}^{2}+2\sum_{k=1}^{\infty}A_{k}=\sum_{\pi\in\hat{G}\setminus\{\pi_{0}\}}d_{\pi}\,\mathrm{tr}\left(\hat{f}(\pi)^{*}\hat{f}(\pi)\left(I_{d_{\pi}}+2(I_{d_{\pi}}-\hat{\nu}(\pi))^{-1}\hat{\nu}(\pi)\right)\right).$$

$$C(f,\nu)=\sum_{\pi\in\hat{G}\setminus\{\pi_{0}\}}\mathrm{tr}\left(\hat{f}(\pi)^{*}\hat{f}(\pi)\right)\mathrm{tr}\,B_{\nu}(\pi).$$

$$\mathrm{tr}\,B_{\nu}(\pi)=\sum_{i=1}^{d_{\pi}}\left(1+\frac{\lambda_{i}}{1-\lambda_{i}}+\frac{\overline{\lambda_{i}}}{1-\overline{\lambda_{i}}}\right)=\sum_{i=1}^{d_{\pi}}\frac{1-|\lambda_{i}|^{2}}{|1-\lambda_{i}|^{2}}.$$

$$B_{\nu}(\pi)v_{i}=\left(1+\frac{\lambda_{i}}{1-\lambda_{i}}+\frac{\overline{\lambda_{i}}}{1-\overline{\lambda_{i}}}\right)v_{i}=\frac{1-|\lambda_{i}|^{2}}{|1-\lambda_{i}|^{2}}v_{i}.$$

$$\frac{1-q}{1+q}\,\mathrm{tr}\left(\hat{f}(\pi)^{*}\hat{f}(\pi)\right)\leq\mathrm{tr}\left(\hat{f}(\pi)^{*}\hat{f}(\pi)B_{\nu}(\pi)\right)\leq\frac{1+q}{1-q}\,\mathrm{tr}\left(\hat{f}(\pi)^{*}\hat{f}(\pi)\right),$$

$$\left\|\sum_{k=1}^{N}f(US_{k})\right\|_{p}\ll\begin{cases}\|f\|_{p}\left(\Delta N\right)^{1/p}&\text{if }1\leq p<2,\\ \|f\|_{p}\sqrt{\Delta N}&\text{if }2\leq p\leq 4.\end{cases}$$

$$\left\|\sum_{k=1}^{N}f(US_{k})\right\|_{4}\ll\|f\|_{2}\sqrt{\Delta N}+\|f\|_{4}\Delta^{3/4}N^{1/4}.$$

$$\mathbb{E}\left(\sum_{k=1}^{N}f(US_{k})\right)^{4}\ll\sum_{1\leq k_{1}\leq k_{2}\leq k_{3}\leq k_{4}\leq N}\left|\mathbb{E}f(US_{k_{1}})f(US_{k_{2}})f(US_{k_{3}})f(US_{k_{4}})\right|.$$


Full text


Equidistribution of random walks on compact groups

Bence Borda

Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences

1053 Budapest, Reáltanoda u. 13–15, Hungary

Email: [email protected]

Keywords: empirical distribution, ergodic theorem, strong law of large numbers, law of the iterated logarithm, central limit theorem, Wiener process

Mathematics Subject Classification (2010): 60G50, 60B15


1 Introduction

Let $G$ be a compact Hausdorff group with normalized Haar measure $\mu$ , and let $X_{1},X_{2},\dots$ be a sequence of independent, identically distributed (i.i.d. for short) $G$ -valued random variables. The random walk $S_{k}=\prod_{j=1}^{k}X_{j}=X_{1}X_{2}\cdots X_{k}$ is a classical object in probability theory. Throughout we assume that the distribution of $X_{1}$ is a regular Borel probability measure $\nu$ on $G$ ; the distribution of $S_{k}$ is thus $\nu^{*k}$ , the $k$ -fold convolution of $\nu$ . Generalizing results of Lévy [10] on the circle group $G=\mathbb{R}/\mathbb{Z}$ and Kawada and Itô [8] on compact metrizable groups, it was Urbanik [17] and Kloss [9] who proved that if $\nu^{*k}$ is weakly convergent, then its weak limit is the normalized Haar measure of a closed subgroup of $G$ . Stromberg [16] gave the following necessary and sufficient condition for the weak limit to be the Haar measure on $G$ itself. We shall say that $\nu$ is adapted if the support of $\nu$ is not contained in any proper closed subgroup of $G$ , and that $\nu$ is strictly aperiodic if the support of $\nu$ is not contained in a coset of any proper closed normal subgroup of $G$ .

Theorem A.

Let $G$ be a compact Hausdorff group, and let $\nu$ be a regular Borel probability measure on $G$ . The following are equivalent.

(i) $\nu$ is adapted and strictly aperiodic.

(ii) $\nu^{*k}\to\mu$ weakly as $k\to\infty$.
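To make the two support conditions concrete, here is a minimal Python sketch on the finite cyclic group $\mathbb{Z}/n$, which is a compact metrizable group written additively (so products become sums mod $n$). The choice of group, the helper names `is_adapted`, `is_strictly_aperiodic` and `conv_power`, and the two example measures are illustrative assumptions, not taken from the paper.

```python
from functools import reduce
from math import gcd


def is_adapted(support, n):
    """True if the support lies in no proper subgroup of Z/n."""
    # Subgroups of Z/n are dZ/n with d | n, so the support generates Z/n
    # exactly when gcd(n, s_1, s_2, ...) == 1.
    return reduce(gcd, support, n) == 1


def is_strictly_aperiodic(support, n):
    """True if the support lies in no coset of a proper subgroup of Z/n."""
    base = support[0]
    diffs = [(s - base) % n for s in support]
    return reduce(gcd, diffs, n) == 1


def convolve(a, b):
    """Convolution of two measures on Z/n given as lists of point masses."""
    n = len(a)
    return [sum(a[x] * b[(z - x) % n] for x in range(n)) for z in range(n)]


def conv_power(nu, k):
    """k-fold convolution power of nu, i.e. the distribution of S_k."""
    out = nu
    for _ in range(k - 1):
        out = convolve(out, nu)
    return out


# Uniform on {1, 3} in Z/4: adapted, but supported on the coset 1 + {0, 2}.
periodic = [0.0, 0.5, 0.0, 0.5]
# Uniform on {0, 1} in Z/4: adapted and strictly aperiodic.
lazy = [0.5, 0.5, 0.0, 0.0]

print(is_adapted([1, 3], 4), is_strictly_aperiodic([1, 3], 4))
print(conv_power(periodic, 10), conv_power(periodic, 11))
print(conv_power(lazy, 40))
```

In this toy example the convolution powers of the first measure alternate between the uniform distributions on $\{0,2\}$ and $\{1,3\}$, so they do not converge, while those of the second measure approach the uniform distribution on $\mathbb{Z}/4$, in line with Theorem A.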

A similar classical result gives a necessary and sufficient condition for convergence in the total variation metric $\|\cdot\|_{\mathrm{TV}}$ . The Lebesgue decomposition of $\nu^{*k}$ with respect to the Haar measure $\mu$ will be written as $\nu^{*k}=(\nu^{*k})_{\mathrm{abs}}+(\nu^{*k})_{\mathrm{sing}}$ , where $(\nu^{*k})_{\mathrm{abs}}$ is absolutely continuous and $(\nu^{*k})_{\mathrm{sing}}$ is singular with respect to $\mu$ .

Theorem B.

Let $G$ be a compact Hausdorff group, and let $\nu$ be a regular Borel probability measure on $G$ . The following are equivalent.

(i) $\nu$ is adapted and strictly aperiodic, and $(\nu^{*k})_{\mathrm{abs}}\neq 0$ for some $k\geq 1$.

(ii) $\|\nu^{*k}-\mu\|_{\mathrm{TV}}\to 0$ as $k\to\infty$.

Moreover, if these equivalent conditions hold, then the convergence in (ii) is exponentially fast.

Special cases of Theorem B were proved by Bhattacharya [5]. For the general case and the history of related results see [1] and [13]. We mention that if $G$ is connected, then the assumption that $(\nu^{*k})_{\mathrm{abs}}\neq 0$ for some $k\geq 1$ implies that $\nu$ is adapted and strictly aperiodic. This follows from the fact that in a connected, compact Hausdorff group any proper closed subgroup has Haar measure $0$.

Theorems A and B concern the distribution of $S_{k}$ for a given $k\geq 1$ . We can also view $S_{k}$ , $k=1,2,\dots$ as a random sequence in $G$ , and consider the empirical distribution of the terms $S_{1},S_{2},\dots,S_{N}$ for some $N\geq 1$ . Under the technical assumption that $G$ is metrizable, Berger and Evans [2, Corollary 3.1] proved the following.

Theorem C.

Let $G$ be a compact metrizable group. Let $X_{1},X_{2},\dots$ be i.i.d. $G$ -valued random variables with distribution $\nu$ , and set $S_{k}=\prod_{j=1}^{k}X_{j}$ . The following are equivalent.

(i) $\nu$ is adapted.

(ii) For any continuous function $f:G\to\mathbb{R}$

$$\lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^{N}f(S_{k})=\int_{G}f\,\mathrm{d}\mu\quad\text{a.s.}\tag{1}$$

Note that a.s. (almost surely) means that the given relation holds with probability $1$ . Since the Banach space of continuous, real-valued functions on $G$ (or indeed, on any compact metric space) is separable, condition (ii) in the previous theorem is equivalent to the property that with probability $1$ , (1) holds for all continuous functions $f:G\to\mathbb{R}$ simultaneously. A (deterministic) sequence $a_{k}$ , $k=1,2,\dots$ in $G$ is called uniformly distributed if $\lim_{N\to\infty}(1/N)\sum_{k=1}^{N}f(a_{k})=\int_{G}f\,\mathrm{d}\mu$ for any continuous function $f:G\to\mathbb{R}$ . Theorem C thus states that the random sequence $S_{k}$ , $k=1,2,\dots$ is uniformly distributed with probability $1$ if and only if $\nu$ is adapted. See [3], [4] and [14] for related results on the circle group $G=\mathbb{R}/\mathbb{Z}$ , and [2] for the case of continuous time processes.
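The almost sure convergence of the averages $(1/N)\sum_{k=1}^{N}f(S_{k})$ can be observed in a short simulation. The sketch below is a hypothetical example on $\mathbb{Z}/4$ (written additively) with steps $\pm 1$; the function names, the seed and the sample size are assumptions chosen for illustration. The step distribution is adapted but not strictly aperiodic, so the distribution of $S_{k}$ itself does not converge, yet the empirical averages along one path still approach the Haar integral.

```python
import random


def empirical_average(f, n, steps, weights, N, seed=0):
    """Return (1/N) * sum_{k=1}^{N} f(S_k) for a simulated walk on Z/n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    position, total = 0, 0.0
    for _ in range(N):
        step = rng.choices(steps, weights=weights)[0]
        position = (position + step) % n  # S_k = X_1 + ... + X_k mod n
        total += f(position)
    return total / N


def indicator_of_zero(x):
    return 1.0 if x == 0 else 0.0


# Steps +1 and -1 (which is 3 mod 4) with equal probability.
average = empirical_average(indicator_of_zero, 4, [1, 3], [0.5, 0.5], N=20000)
print(average)  # should be close to mu({0}) = 1/4
```

On a finite group every function is continuous, so this run illustrates Theorem C only in its simplest setting.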

In this paper we consider $\sum_{k=1}^{N}f(S_{k})$ for Borel measurable functions $f:G\to\mathbb{R}$ , and we prove the following analogue of Theorem C.

Theorem 1.

Let $G$ be a compact metrizable group. Let $X_{1},X_{2},\dots$ be i.i.d. $G$ -valued random variables with distribution $\nu$ , and set $S_{k}=\prod_{j=1}^{k}X_{j}$ . The following are equivalent.

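(i) $\nu$ is adapted, and $(\nu^{*k})_{\mathrm{abs}}\neq 0$ for some $k\geq 1$.

(ii) For any bounded, Borel measurable function $f:G\to\mathbb{R}$

$$\lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^{N}f(S_{k})=\int_{G}f\,\mathrm{d}\mu\quad\text{a.s.}\tag{2}$$

(iii) For any bounded, Borel measurable function $f:G\to\mathbb{R}$

$$\limsup_{N\to\infty}\frac{\left|\sum_{k=1}^{N}f(S_{k})-N\int_{G}f\,\mathrm{d}\mu\right|}{\sqrt{N\log\log N}}<\infty\quad\text{a.s.}$$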

The implications (iii) $\Rightarrow$ (ii) $\Rightarrow$ (i) are straightforward. Condition (ii) is in fact equivalent to the assumption that (2) holds for the indicator function $f=I_{B}$ of any Borel set $B\subseteq G$ ; indeed, a bounded, Borel measurable function can be uniformly approximated by finite linear combinations of such indicator functions. The equivalence (i) $\Leftrightarrow$ (ii) in Theorem 1 thus states that the random sequence $S_{k}$ , $k=1,2,\dots$ equidistributes in any given Borel set with probability $1$ if and only if $\nu$ is adapted, and $(\nu^{*k})_{\mathrm{abs}}\neq 0$ for some $k\geq 1$ . Equidistribution of a random sequence in any given Borel set with probability $1$ is sometimes called the “strong uniform distribution” property. In contrast, (ordinary) uniform distribution means equidistribution in any Borel set $B\subseteq G$ such that $\mu(\partial B)=0$ . Note that equidistribution in all Borel sets simultaneously is impossible; in particular, no deterministic sequence satisfies the strong uniform distribution property (unless $G$ is finite).

In Theorems C and 1 we did not assume that $\nu$ is strictly aperiodic, whereas in Theorems A and B strict aperiodicity is required for the convergence of $\nu^{*k}$ . In the proof of the implication (i) $\Rightarrow$ (iii) in Theorem 1 we will thus first assume that $\nu$ is strictly aperiodic. In case the support of $\nu$ is contained in a coset of a closed normal subgroup $H\lhd G$ , we will see that the factor group $G/H$ is finite and cyclic, and we will argue by induction on the index $|G:H|$ . Surprisingly, in Theorem 1 the strong law of large numbers (condition (ii)) and the law of the iterated logarithm (condition (iii)) are equivalent. This is a consequence of the fact that whenever $\nu^{*k}$ converges to the Haar measure $\mu$ in the total variation metric, the convergence is necessarily exponentially fast. This fact does not have an analogue for weak convergence. We also prove the following central limit theorem under the technical assumption that $\nu$ is a central measure. Note that condition (ii) below expresses convergence in distribution to the standard normal distribution.

Theorem 2.

Let $G$ be a compact metrizable group. Let $X_{1},X_{2},\dots$ be i.i.d. $G$ -valued random variables with distribution $\nu$ , and set $S_{k}=\prod_{j=1}^{k}X_{j}$ . Assume that $\nu$ is central. The following are equivalent.

(i) $\nu$ is adapted and strictly aperiodic, and $(\nu^{*k})_{\mathrm{abs}}\neq 0$ for some $k\geq 1$.

(ii) For any bounded, Borel measurable function $f:G\to\mathbb{R}$ such that $\int_{G}f\,\mathrm{d}\mu=0$ and $f$ is not $\mu$-a.e. zero, we have

$$\frac{\sum_{k=1}^{N}f(S_{k})}{\sqrt{C(f,\nu)N}}\overset{d}{\longrightarrow}\mathcal{N}(0,1)$$

with some constant $C(f,\nu)>0$ depending only on $f$ and $\nu$.

Recall that a compact Hausdorff topological space is metrizable if and only if it is second countable. We mention that in the proofs of Theorems 1 and 2 the choice of the metric on $G$ is irrelevant; we only use the second countability of $G$ . Whether Theorems 1 and 2 are true for compact Hausdorff groups remains open.

2 Results

2.1 Preliminaries

For the rest of the paper we assume that $G$ is a compact metrizable group. Let $\mu$ denote the Haar measure on $G$ normalized so that $\mu(G)=1$ . We will write $L^{p}(G)=L^{p}(G,\mu)$ for the Lebesgue space of real-valued, Borel measurable functions with respect to $\mu$ , and $\|f\|_{p}=\|f\|_{L^{p}(G,\mu)}$ . In addition, $\|\cdot\|_{p}$ will also denote the $L^{p}$ -norm of (real-valued) random variables. Recall that $\mu$ is both left and right invariant; that is, for any Borel set $B\subseteq G$ and any $y\in G$ we have $\mu(yB)=\mu(By)=\mu(B)$ . Therefore for any $f\in L^{1}(G)$ we have

$$\int_{G}f(x)\,\mathrm{d}\mu(x)=\int_{G}f(xy)\,\mathrm{d}\mu(x)=\int_{G}f(yx)\,\mathrm{d}\mu(x).$$

The total variation of a finite, signed Borel measure $\vartheta$ on $G$ is defined as

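$$\|\vartheta\|_{\mathrm{TV}}=\sup\left\{\int_{G}f\,\mathrm{d}\vartheta\,:\,f:G\to\mathbb{R}\text{ Borel measurable},\ \sup_{G}|f|\leq 1\right\}.$$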

Given two finite, signed Borel measures $\vartheta_{1}$ and $\vartheta_{2}$ on $G$ , their convolution $\vartheta_{1}*\vartheta_{2}$ is the unique finite, signed Borel measure such that for any bounded, Borel measurable function $f:G\to\mathbb{R}$

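$$\int_{G}f\,\mathrm{d}(\vartheta_{1}*\vartheta_{2})=\int_{G}\int_{G}f(xy)\,\mathrm{d}\vartheta_{1}(x)\,\mathrm{d}\vartheta_{2}(y).$$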

It is easy to see that $\|\vartheta_{1}*\vartheta_{2}\|_{\mathrm{TV}}\leq\|\vartheta_{1}\|_{\mathrm{TV}}\cdot\|\vartheta_{2}\|_{\mathrm{TV}}$ . A finite, signed Borel measure $\vartheta$ on $G$ is called central if $\vartheta(y^{-1}By)=\vartheta(B)$ for all Borel sets $B\subseteq G$ and all $y\in G$ . Similarly, a Borel measurable function $f:G\to\mathbb{R}$ is called a class function if $f(y^{-1}xy)=f(x)$ for all $x,y\in G$ . Note that for any Borel probability measure $\nu$ on $G$ we have $\nu*\mu=\mu*\nu=\mu$ . If $\nu_{1}$ and $\nu_{2}$ are Borel probability measures on $G$ , then $\|\nu_{1}-\nu_{2}\|_{\mathrm{TV}}=2\sup_{B}|\nu_{1}(B)-\nu_{2}(B)|$ , where the supremum is over all Borel sets $B\subseteq G$ . The support of a Borel probability measure $\nu$ on $G$ , denoted by $\mathrm{supp}\,\nu$ , is the smallest closed set $F\subseteq G$ such that $\nu(F)=1$ .
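The identities in this paragraph can be checked numerically on a finite group. The following sketch works on $\mathbb{Z}/5$ (written additively), where a signed measure is just a list of point masses and the normalized Haar measure is uniform; the group, the function names and the sample measures are assumptions made for illustration only.

```python
from itertools import combinations


def tv_norm(theta):
    """Total variation of a signed measure on Z/n (list of point masses)."""
    # The supremum over |f| <= 1 is attained at f = sign(theta).
    return sum(abs(t) for t in theta)


def convolve(t1, t2):
    """(t1 * t2)(z) = sum over x + y = z of t1(x) * t2(y)."""
    n = len(t1)
    return [sum(t1[x] * t2[(z - x) % n] for x in range(n)) for z in range(n)]


def sup_over_sets(nu1, nu2):
    """Brute-force supremum over all subsets B of |nu1(B) - nu2(B)|."""
    n = len(nu1)
    return max(
        abs(sum(nu1[x] - nu2[x] for x in B))
        for r in range(n + 1)
        for B in combinations(range(n), r)
    )


n = 5
mu = [1.0 / n] * n                           # normalized Haar measure on Z/5
nu1 = [0.5, 0.3, 0.2, 0.0, 0.0]              # a probability measure
nu2 = [0.1, 0.1, 0.2, 0.3, 0.3]              # another probability measure
theta1 = [a - b for a, b in zip(nu1, nu2)]   # a signed measure
theta2 = [0.4, -0.1, 0.0, 0.2, -0.5]         # another signed measure

# Submultiplicativity of the total variation norm under convolution.
print(tv_norm(convolve(theta1, theta2)), tv_norm(theta1) * tv_norm(theta2))
# Haar measure absorbs any probability measure: nu * mu = mu.
print(convolve(nu1, mu))
# ||nu1 - nu2||_TV equals twice the supremum over Borel sets.
print(tv_norm(theta1), 2 * sup_over_sets(nu1, nu2))
```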

Remark.

Every finite Borel measure on $G$ (or indeed, on any Polish space) is regular. Therefore in the definitions of total variation and convolution we could have equivalently used continuous functions $f:G\to\mathbb{R}$ instead of bounded, Borel measurable ones. The existence and uniqueness of the convolution thus follows from the Riesz representation theorem.

A $G$ -valued random variable is a Borel measurable map $X$ from a probability space to $G$ . Let $\nu_{X}$ denote the distribution of $X$ ; that is, $\nu_{X}(B)=\Pr(X\in B)$ for all Borel sets $B\subseteq G$ . The variable $X$ is called uniformly distributed if $\nu_{X}=\mu$ . If $X$ and $Y$ are independent $G$ -valued random variables, then $\nu_{XY}=\nu_{X}*\nu_{Y}$ . We shall write $X\overset{d}{=}Y$ if the (real-valued or $G$ -valued) random variables $X$ and $Y$ have the same distribution.

Let $\hat{G}$ denote the unitary dual of $G$ ; that is, a complete set of pairwise unitarily inequivalent, irreducible unitary representations of $G$ . Recall that every such representation is finite dimensional, and let $d_{\pi}$ denote the dimension of $\pi\in\hat{G}$ . Thus $\pi(x)$ is a $d_{\pi}\times d_{\pi}$ unitary matrix with complex entries for any given $x\in G$ . Let $\pi_{0}\in\hat{G}$ , $\pi_{0}=1$ denote the trivial representation. Given $f\in L^{1}(G)$ and $\pi\in\hat{G}$ , let $\hat{f}(\pi)=\int_{G}f(x)\pi(x)^{*}\,\mathrm{d}\mu(x)$ denote the Fourier coefficients of $f$ . Here $\pi(x)^{*}$ denotes the conjugate transpose of $\pi(x)$ , and the integral is taken entrywise. The Fourier coefficients of a finite, signed Borel measure $\vartheta$ on $G$ are defined similarly as $\hat{\vartheta}(\pi)=\int_{G}\pi(x)^{*}\,\mathrm{d}\vartheta(x)$ , $\pi\in\hat{G}$ . The Parseval formula states that for any $f,g\in L^{2}(G)$ we have

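$$\int_{G}fg\,\mathrm{d}\mu=\sum_{\pi\in\hat{G}}d_{\pi}\,\mathrm{tr}\left(\hat{f}(\pi)\hat{g}(\pi)^{*}\right),$$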

where $\mathrm{tr}$ denotes trace. Given $\pi\in\hat{G}$, the complex conjugate $\bar{\pi}$ is also an irreducible unitary representation of $G$, called the contragredient of $\pi$. The contragredient $\bar{\pi}$ may or may not be unitarily equivalent to $\pi$. For the theory of Fourier analysis on compact groups we refer the reader to [6].
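In the abelian case all irreducible representations are one-dimensional, so the Fourier coefficients are scalars and the Parseval formula can be verified directly. The sketch below does this on $\mathbb{Z}/6$ with the characters $\pi_{r}(x)=e^{2\pi\mathrm{i}rx/n}$; the group, the function names and the test vectors are assumptions for illustration.

```python
import cmath


def character(r, x, n):
    """The r-th irreducible (one-dimensional) representation of Z/n at x."""
    return cmath.exp(2j * cmath.pi * r * x / n)


def fourier_coeff(f, r):
    """hat f(pi_r): integral of f(x) * conj(pi_r(x)) against Haar measure."""
    n = len(f)
    return sum(f[x] * character(r, x, n).conjugate() for x in range(n)) / n


def measure_coeff(theta, r):
    """hat theta(pi_r): integral of conj(pi_r(x)) against the measure theta."""
    n = len(theta)
    return sum(theta[x] * character(r, x, n).conjugate() for x in range(n))


def parseval_sides(f, g):
    """Both sides of the Parseval formula for real functions f, g on Z/n."""
    n = len(f)
    lhs = sum(f[x] * g[x] for x in range(n)) / n
    # d_pi = 1 and the trace of a 1x1 matrix is its single entry.
    rhs = sum(fourier_coeff(f, r) * fourier_coeff(g, r).conjugate() for r in range(n))
    return lhs, rhs


f = [2.0, -1.0, 0.5, 3.0, -4.5, 1.0]
g = [1.0, 1.0, -2.0, 0.0, 0.5, 2.5]
lhs, rhs = parseval_sides(f, g)
print(lhs, rhs)  # rhs is complex with negligible imaginary part
```

The same helper shows that the Fourier coefficients of the Haar measure vanish at every nontrivial representation, a fact used repeatedly in the proofs.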

The notations $a_{n}\ll b_{n}$ and $a_{n}=O(b_{n})$ mean that there exists an (implied) constant $K\geq 0$ such that $|a_{n}|\leq Kb_{n}$ for all $n\geq 1$. We write $a_{n}=\Theta(b_{n})$ if $a_{n}\ll b_{n}\ll a_{n}$. We will use subscripts to denote dependence of the implied constant on certain parameters; thus e.g. $a_{n}\ll_{f,\nu}b_{n}$ and $a_{n}=O_{f,\nu}(b_{n})$ mean that the implied constant may depend on $f$ and $\nu$. We emphasize that in the statements of all theorems, propositions and lemmas the implied constants in the notations $\ll$ and $O$ are universal; in particular, they do not even depend on the group $G$.

2.2 The main theorems

Let $G$ be a compact metrizable group with normalized Haar measure $\mu$ , let $X_{1},X_{2},\ldots$ be i.i.d. $G$ -valued random variables with distribution $\nu$ , and set $S_{k}=\prod_{j=1}^{k}X_{j}$ . Let $\Delta_{k}=\|\nu^{*k}-\mu\|_{\mathrm{TV}}=2\sup_{B}|\Pr(S_{k}\in B)-\mu(B)|$ denote the total variation distance of the distribution of $S_{k}$ from the uniform distribution. Note that

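$$\|\nu^{*(k+1)}-\mu\|_{\mathrm{TV}}=\|(\nu^{*k}-\mu)*\nu\|_{\mathrm{TV}}\leq\|\nu^{*k}-\mu\|_{\mathrm{TV}}\cdot\|\nu\|_{\mathrm{TV}}=\|\nu^{*k}-\mu\|_{\mathrm{TV}},$$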

hence $\Delta_{k+1}\leq\Delta_{k}$ .

The precise rate of convergence in the total variation metric was found by Anoussis and Gatzouras [1, Theorem 4.1]. Let

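$$q=\max\left\{\sup_{\pi\in\hat{G}\setminus\{\pi_{0}\}}\rho(\hat{\nu}(\pi)),\ \inf_{k\geq 1}\|(\nu^{*k})_{\mathrm{sing}}\|_{\mathrm{TV}}^{1/k}\right\},$$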

where $\rho(\hat{\nu}(\pi))$ denotes the spectral radius of the matrix $\hat{\nu}(\pi)$ . Then $\lim_{k\to\infty}\Delta_{k}^{1/k}=\inf_{k\geq 1}\Delta_{k}^{1/k}=q$ ; moreover, $q<1$ if and only if $\nu$ is adapted and strictly aperiodic, and $(\nu^{*k})_{\mathrm{abs}}\neq 0$ for some $k\geq 1$ . Thus, as already stated in Theorem B, whenever $\|\nu^{*k}-\mu\|_{\mathrm{TV}}\to 0$ , the convergence is necessarily exponentially fast; more precisely, we have $q^{k}\leq\Delta_{k}$ for every $k\geq 1$ , and $\Delta_{k}\leq(q+\varepsilon)^{k}$ for every $k\geq k_{0}(\nu,\varepsilon)$ . Let $\Delta=1+2\sum_{k=1}^{\infty}\Delta_{k}$ , and observe $1/(1-q)\leq\Delta$ .
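The quantities $\Delta_{k}$, $q$ and $\Delta$ can be computed exactly on a finite group. The sketch below uses $\mathbb{Z}/5$ (written additively) with $\nu$ uniform on $\{0,1\}$; these choices and the function names are assumptions for illustration. On a finite group every measure is absolutely continuous with respect to Haar measure, so the singular term in $q$ vanishes, and since all irreducible representations are one-dimensional the spectral radius is simply the modulus of the Fourier coefficient.

```python
import cmath


def convolve(a, b):
    """Convolution of two measures on Z/n given as lists of point masses."""
    n = len(a)
    return [sum(a[x] * b[(z - x) % n] for x in range(n)) for z in range(n)]


def deltas(nu, K):
    """Return [Delta_1, ..., Delta_K], Delta_k = ||nu^{*k} - mu||_TV on Z/n."""
    n = len(nu)
    power, out = nu, []
    for _ in range(K):
        out.append(sum(abs(p - 1.0 / n) for p in power))
        power = convolve(power, nu)
    return out


def rate_q(nu):
    """Largest modulus of a nontrivial Fourier coefficient of nu."""
    n = len(nu)
    return max(
        abs(sum(nu[x] * cmath.exp(-2j * cmath.pi * r * x / n) for x in range(n)))
        for r in range(1, n)
    )


nu = [0.5, 0.5, 0.0, 0.0, 0.0]   # uniform on {0, 1} in Z/5
d = deltas(nu, 200)
q = rate_q(nu)
big_delta = 1 + 2 * sum(d)       # truncation of Delta = 1 + 2 * sum_k Delta_k

print(q, d[29] ** (1 / 30), big_delta)
```

In this example one can observe that $\Delta_{k}$ is nonincreasing, that $q^{k}\leq\Delta_{k}$, that $\Delta_{k}^{1/k}$ is already close to $q$ for moderate $k$, and that $1/(1-q)\leq\Delta$.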

We now give a more quantitative form of Theorem 1. For any $m\geq 1$ and $\varepsilon>0$ let $\varphi_{m,\varepsilon}(N)=N\left(\prod_{i=1}^{m-1}\log_{i}N\right)\left(\log_{m}N\right)^{1+\varepsilon}$ , where $\log_{1}N=\log N$ and $\log_{i}N=\log\log_{i-1}N$ denotes the $i$ -fold iterated logarithm.
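As a quick illustration of the normalizing function $\varphi_{m,\varepsilon}$, here is a small Python sketch that evaluates it directly from the definition. The function names are assumptions, and $N$ has to be large enough for all iterated logarithms involved to be positive.

```python
import math


def iterated_log(N, i):
    """log_i N: the natural logarithm applied i times to N."""
    value = float(N)
    for _ in range(i):
        value = math.log(value)
    return value


def phi(N, m, eps):
    """phi_{m,eps}(N) = N * prod_{i<m} log_i N * (log_m N)^(1 + eps)."""
    product = 1.0
    for i in range(1, m):
        product *= iterated_log(N, i)
    return N * product * iterated_log(N, m) ** (1 + eps)


N = 10 ** 6
print(phi(N, 1, 0.5))   # N * (log N)^1.5
print(phi(N, 2, 0.5))   # N * log N * (log log N)^1.5
```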

Theorem 3.

Suppose that $\nu$ is adapted, and that $(\nu^{*k})_{\mathrm{abs}}\neq 0$ for some $k\geq 1$ . Let $f:G\to\mathbb{R}$ be Borel measurable such that $\int_{G}f\,\mathrm{d}\mu=0$ .

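(i) If $\sup_{c\in G}\mathbb{E}|f(cX_{1})|^{p}<\infty$ for some $1\leq p\leq 2$, then for any $m\geq 1$ and $\varepsilon>0$

$$\lim_{N\to\infty}\frac{\sum_{k=1}^{N}f(S_{k})}{\varphi_{m,\varepsilon}(N)^{1/p}}=0\quad\text{a.s.}$$

(ii) If $\sup_{c\in G}\mathbb{E}|f(cX_{1})|^{p}<\infty$ for some $p>2$, then

$$\limsup_{N\to\infty}\frac{\left|\sum_{k=1}^{N}f(S_{k})\right|}{\sqrt{N\log\log N}}<\infty\quad\text{a.s.}\tag{4}$$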

Remark.

Note that $\sup_{c\in G}\mathbb{E}|f(cX_{1})|^{p}<\infty$ clearly implies $f\in L^{p}(G)$ . To mention a sufficient condition, suppose that $\nu$ is absolutely continuous with density $\frac{\mathrm{d}\nu}{\mathrm{d}\mu}$ . If there exists a Hölder conjugate pair $r,s\in[1,\infty]$ , $1/r+1/s=1$ such that $f\in L^{pr}(G)$ and $\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\in L^{s}(G)$ , then $\sup_{c\in G}\mathbb{E}|f(cX_{1})|^{p}\leq\|f\|_{pr}^{p}\|\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\|_{s}<\infty$ .
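The sufficient condition in the Remark is an application of Hölder's inequality, and it can be checked numerically on a finite group. The sketch below works on $\mathbb{Z}/5$ (written additively), where the density of $\nu$ with respect to the normalized Haar measure is $n\,\nu(x)$; the group, the function names and the sample data are assumptions for illustration.

```python
import math


def lp_norm(h, p):
    """L^p norm on Z/n with respect to normalized Haar (uniform) measure."""
    if p == math.inf:
        return max(abs(v) for v in h)
    return (sum(abs(v) ** p for v in h) / len(h)) ** (1.0 / p)


def sup_shifted_moment(f, nu, p):
    """sup over c of E|f(c + X_1)|^p, where X_1 has distribution nu."""
    n = len(f)
    return max(
        sum(abs(f[(c + x) % n]) ** p * nu[x] for x in range(n))
        for c in range(n)
    )


def holder_bound(f, nu, p, r):
    """||f||_{pr}^p * ||d nu / d mu||_s with 1/r + 1/s = 1."""
    n = len(f)
    if r == 1:
        s = math.inf
    elif r == math.inf:
        s = 1.0
    else:
        s = r / (r - 1.0)
    density = [n * w for w in nu]   # d nu / d mu on Z/n
    return lp_norm(f, p * r) ** p * lp_norm(density, s)


f = [3.0, -1.0, 0.0, 2.0, -4.0]
nu = [0.5, 0.25, 0.25, 0.0, 0.0]
for r in (1, 2, 3, math.inf):
    print(r, sup_shifted_moment(f, nu, 2), holder_bound(f, nu, 2, r))
```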

Under the extra condition that $\nu$ is strictly aperiodic, we will approximate $\sum_{k=1}^{N}f(S_{k})$ by a sum of independent random variables. Following Strassen [15], we can even construct an almost sure approximation by a Wiener process. To state the precise form of this result, let us introduce the following technical definition. Given a function $E(t)$ positive on $(t_{0},\infty)$ for some $t_{0}$, we shall say that two stochastic processes $Y(t)$ and $Z(t)$ in the Skorokhod space $D[0,\infty)$, possibly defined on different probability spaces, are $o(E(t))$-equivalent if there exist finitely many processes $Y_{1}(t),Y_{2}(t),\dots,Y_{m}(t)$ in $D[0,\infty)$ such that $Y_{1}(t)=Y(t)$, $Y_{m}(t)=Z(t)$, and for all $1\leq i\leq m-1$ one of the following holds:

(i) The processes $Y_{i}(t)$ and $Y_{i+1}(t)$, possibly defined on different probability spaces, have the same distribution.

(ii) The processes $Y_{i}(t)$ and $Y_{i+1}(t)$ are defined on the same probability space, and $\lim_{t\to\infty}(Y_{i}(t)-Y_{i+1}(t))/E(t)=0$ a.s.

Roughly speaking (ignoring the different probability spaces), $Y(t)$ and $Z(t)$ being $o(E(t))$ -equivalent thus means $Y(t)=Z(t)+o(E(t))$ a.s. Given $f\in L^{2}(G)$ with $\int_{G}f\,\mathrm{d}\mu=0$ , let

$$C(f,\nu)=\mathbb{E}f(U)^{2}+2\sum_{k=1}^{\infty}\mathbb{E}f(U)f(US_{k}),\tag{5}$$

where $U$ is a uniformly distributed $G$ -valued random variable, independent of $X_{1},X_{2},\ldots$ . As we shall see, the series in (5) is absolutely convergent, and $C(f,\nu)\geq 0$ .

Theorem 4.

Suppose that $\nu$ is adapted and strictly aperiodic, and that $(\nu^{*k})_{\mathrm{abs}}\neq 0$ for some $k\geq 1$ . Let $f:G\to\mathbb{R}$ be Borel measurable such that $\int_{G}f\,\mathrm{d}\mu=0$ . If $\sup_{c\in G}\mathbb{E}|f(cX_{1})|^{2+\delta}<\infty$ for some $0<\delta<2$ , then the processes $\sum_{k\leq t}f(S_{k})$ and $\sqrt{C(f,\nu)}W(t)$ are $o(t^{1/2-\delta/20})$ -equivalent, where $W(t)$ is a standard Wiener process.

The almost sure approximation by a Wiener process yields even more precise asymptotics than those in Theorem 3; for instance, it shows that for strictly aperiodic $\nu$ the value of the limsup in (4) is $\sqrt{2C(f,\nu)}$ . The almost sure asymptotics as well as the limit distribution of continuous functionals of the process $\sum_{k\leq t}f(S_{k})$ also follow. Instead of the random step functions $\sum_{k\leq t}f(S_{k})$ , we could have also used the piecewise linear functions $\sum_{k\leq\lfloor t\rfloor}f(S_{k})+\left(t-\lfloor t\rfloor\right)f(S_{\lfloor t\rfloor+1})$ . In that case the $o(t^{1/2-\delta/20})$ -equivalence holds in the space $C[0,\infty)$ as well.

In Theorem 4 in general we only know $C(f,\nu)\geq 0$ ; in the case $C(f,\nu)=0$ the result simply states that $\sum_{k\leq t}f(S_{k})=o(t^{1/2-\delta/20})$ a.s. The natural question whether $C(f,\nu)>0$ is surprisingly delicate. We shall prove a necessary and sufficient condition in terms of irreducible unitary representations of $G$ , see Proposition 7 below. As this condition is rather cumbersome to use, we also give simpler criteria to ensure $C(f,\nu)>0$ . In particular, we will show that under mild technical assumptions (e.g. $f$ is a class function or $\nu$ is a central measure) we have $C(f,\nu)=0$ if and only if $f=0$ $\mu$ -a.e. We work out the details in Section 3.

It clearly follows from Theorem 4 that $N^{-1/2}\sum_{k=1}^{N}f(S_{k})$ has a (possibly degenerate) Gaussian limit distribution. Under slightly weaker assumptions than those in Theorem 4 we also prove a Lyapunov-type bound on the remainder term in the central limit theorem. Let $\Phi(x)=\int_{-\infty}^{x}(2\pi)^{-1/2}e^{-t^{2}/2}\,\mathrm{d}t$ denote the standard normal distribution function.

Theorem 5.

Suppose that $\nu$ is adapted and strictly aperiodic, and that $(\nu^{*k})_{\mathrm{abs}}\neq 0$ for some $k\geq 1$ . Let $f:G\to\mathbb{R}$ be Borel measurable such that $\int_{G}f\,\mathrm{d}\mu=0$ , and assume $f\in L^{2+\delta}(G)$ for some $0<\delta\leq 1$ and $\sup_{c\in G}\mathbb{E}f(cX_{1})^{2}<\infty$ .

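(i) If $C(f,\nu)>0$, then for any integer $N\geq N_{0}(f,\nu,\delta)$

$$\sup_{x\in\mathbb{R}}\left|\Pr\left(\frac{\sum_{k=1}^{N}f(S_{k})}{\sqrt{C(f,\nu)N}}<x\right)-\Phi(x)\right|\ll K\frac{\log^{\delta/(1+\delta)}N}{N^{\delta/(2+2\delta)}},\tag{6}$$

where $K=\Delta\left(\|f\|_{2+\delta}/\sqrt{C(f,\nu)}\right)^{(2+\delta)/(1+\delta)}$.

(ii) If $C(f,\nu)=0$, then $N^{-1/2}\sum_{k=1}^{N}f(S_{k})\to 0$ in $L^{2}$.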

Remark.

If $f\in L^{3}(G)$ , then the right hand side of (6) becomes $KN^{-1/4}\log^{1/2}N$ with $K=\Delta\|f\|_{3}^{3/2}/C(f,\nu)^{3/4}$ . We mention that if $f\in L^{4}(G)$ , then here $\|f\|_{3}$ can be replaced by $\|f\|_{2}$ (see the end of Section 5.2). As we will see in Proposition 8 below, if $f$ is a class function or $\nu$ is a central measure, then $C(f,\nu)\geq\|f\|_{2}^{2}/(2\Delta)$ . Thus in this case $K\leq 2\Delta^{7/4}$ , so the right hand side of (6) does not depend on $f$ . ( $N_{0}(f,\nu,\delta)$ always depends on $f$ , however.)
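The arithmetic in this Remark can be checked with a few lines of Python. The sketch below evaluates the $N$-dependence $\log^{\delta/(1+\delta)}N/N^{\delta/(2+2\delta)}$ of the remainder term and the constant $K$ from Theorem 5, and confirms the special case $\delta=1$ and the bound $K\leq 2\Delta^{7/4}$ under $C(f,\nu)\geq\|f\|_{2}^{2}/(2\Delta)$, with $\|f\|_{3}$ replaced by $\|f\|_{2}$ as the Remark allows for $f\in L^{4}(G)$. The function names and the numerical inputs are hypothetical.

```python
import math


def remainder_rate(N, delta):
    """log^{delta/(1+delta)} N / N^{delta/(2+2*delta)}."""
    return math.log(N) ** (delta / (1 + delta)) / N ** (delta / (2 + 2 * delta))


def constant_K(big_delta, norm_f, C, delta):
    """K = Delta * (norm_f / sqrt(C))^{(2+delta)/(1+delta)}."""
    return big_delta * (norm_f / math.sqrt(C)) ** ((2 + delta) / (1 + delta))


N = 10 ** 6
print(remainder_rate(N, 1.0))            # equals N^(-1/4) * log(N)^(1/2)

# Hypothetical values of Delta and of the norm of f, for illustration only.
big_delta, norm_f = 3.0, 2.0
worst_C = norm_f ** 2 / (2 * big_delta)  # smallest C allowed by the lower bound
print(constant_K(big_delta, norm_f, worst_C, 1.0), 2 * big_delta ** 1.75)
```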

3 Moment estimates

Throughout this section we assume that $\nu$ is adapted and strictly aperiodic, and that $(\nu^{*k})_{\mathrm{abs}}\neq 0$ for some $k\geq 1$ . Further, we fix a Borel measurable function $f:G\to\mathbb{R}$ such that $\int_{G}f\,\mathrm{d}\mu=0$ , and a uniformly distributed $G$ -valued random variable $U$ independent of $X_{1},X_{2},\ldots$ . We now prove moment estimates for the modified sum $\sum_{k=1}^{N}f(US_{k})$ . In Section 4 we shall give the counterparts of these estimates for shifted sums $\sum_{k=M+1}^{M+N}f(S_{k})$ .

For every nonempty, finite interval of positive integers $J\subset\mathbb{N}$ let $S_{J}=\prod_{j\in J}X_{j}$ . Note that $S_{J}$ has distribution $\nu^{*|J|}$ , hence by definition for any bounded, Borel measurable function $g:G\to\mathbb{R}$

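$$\left|\mathbb{E}g(S_{J})-\mathbb{E}g(U)\right|=\left|\int_{G}g\,\mathrm{d}(\nu^{*|J|}-\mu)\right|\leq\sup_{G}|g|\cdot\Delta_{|J|}.\tag{7}$$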

Proposition 6.

Assume $f\in L^{2}(G)$ . The series in (5) is absolutely convergent, and $0\leq C(f,\nu)\leq\|f\|_{2}^{2}\Delta$ . Further, for any integer $N\geq 1$

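$$\mathbb{E}\left(\sum_{k=1}^{N}f(US_{k})\right)^{2}=C(f,\nu)N+O_{\nu}\left(\|f\|_{2}^{2}\right)\leq\|f\|_{2}^{2}\Delta N.$$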

Proof.

Let $A_{k}=\mathbb{E}f(U)f(US_{k})$ . Since $U$ is independent of $S_{k}$ , we have

$$A_{k}=\int_{G}\int_{G}f(u)f(ux)\,\mathrm{d}\mu(u)\,\mathrm{d}\nu^{*k}(x).\tag{8}$$

The function $g(x)=\int_{G}f(u)f(ux)\,\mathrm{d}\mu(u)$ satisfies $\int_{G}g\,\mathrm{d}\mu=0$ and $\sup_{G}|g|\leq\|f\|_{2}^{2}$ . Applying (7) to $g$ we thus obtain $\left|A_{k}\right|\leq\|f\|_{2}^{2}\Delta_{k}$ . Since $\Delta_{k}\to 0$ exponentially fast, the series in (5) is absolutely convergent. As $\mathbb{E}f(U)^{2}=\|f\|_{2}^{2}$ , we have $C(f,\nu)\leq\|f\|_{2}^{2}\Delta$ . Finally, $C(f,\nu)\geq 0$ will follow from the second claim.

Expanding the square we have

[TABLE]

Let us write $US_{\ell}=US_{k}S_{[k+1,\ell]}$ . Since $\mu*\nu^{*k}=\mu$ , the variable $US_{k}$ is uniformly distributed on $G$ and independent of $S_{[k+1,\ell]}$ ; moreover, $S_{[k+1,\ell]}\overset{d}{=}S_{\ell-k}$ . Thus $\mathbb{E}f(US_{k})^{2}=\mathbb{E}f(U)^{2}$ and $\mathbb{E}f(US_{k})f(US_{\ell})=\mathbb{E}f(U)f(US_{\ell-k})$ , so (9) simplifies as

[TABLE]

The second claim thus follows from $|A_{d}|\leq\|f\|_{2}^{2}\Delta_{d}$ and the fact that $\Delta_{d}\to 0$ exponentially fast. ∎
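As a numerical illustration (not part of the original argument), the reduction of the second moment to the quantities $A_{d}$ can be checked by brute force on a small finite group, where every measure is absolutely continuous with respect to the Haar (counting) measure. The sketch below assumes $G=\mathbb{Z}_{5}$ written additively, a mean-zero $f$ and $\nu$ uniform on $\{1,2\}$; these choices and all function names are hypothetical.

```python
import itertools
import numpy as np

def conv_power(nu, d):
    """Return nu^{*d} on Z_m as a probability vector (nu^{*0} is the point mass at 0)."""
    m = len(nu)
    out = np.zeros(m)
    out[0] = 1.0
    for _ in range(d):
        out = np.array([sum(out[(x - y) % m] * nu[y] for y in range(m)) for x in range(m)])
    return out

def autocov(f, nu, d):
    """A_d = E f(U) f(U S_d), with U uniform on Z_m and S_d distributed as nu^{*d}."""
    m = len(f)
    p = conv_power(nu, d)
    return sum(f[u] * f[(u + x) % m] * p[x] for u in range(m) for x in range(m)) / m

def second_moment_expansion(f, nu, N):
    """N*A_0 + 2*sum_{d=1}^{N-1} (N-d)*A_d, obtained by expanding the square."""
    return N * autocov(f, nu, 0) + 2 * sum((N - d) * autocov(f, nu, d) for d in range(1, N))

def second_moment_bruteforce(f, nu, N):
    """E (sum_{k=1}^N f(U S_k))^2, enumerating every starting point and every path."""
    m = len(f)
    support = [y for y in range(m) if nu[y] > 0]
    total = 0.0
    for u in range(m):
        for steps in itertools.product(support, repeat=N):
            prob, pos, s = 1.0 / m, u, 0.0
            for y in steps:
                prob *= nu[y]
                pos = (pos + y) % m
                s += f[pos]
            total += prob * s * s
    return total

# Toy data (illustrative assumptions): Z_5, mean-zero f, nu uniform on {1, 2}.
f = np.array([2.0, -1.0, 0.0, -1.0, 0.0])
nu = np.array([0.0, 0.5, 0.5, 0.0, 0.0])
N = 6
lhs = second_moment_bruteforce(f, nu, N)
rhs = second_moment_expansion(f, nu, N)
# C(f, nu) = A_0 + 2 * sum_{d>=1} A_d, truncated after the terms have decayed.
C = autocov(f, nu, 0) + 2 * sum(autocov(f, nu, d) for d in range(1, 200))
```

Here `lhs` and `rhs` should agree up to rounding, and the truncated value `C` should be nonnegative, in line with Proposition 6.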

We now study the question whether the normalizing factor $C(f,\nu)$ in the variance is zero or positive. To this end, we derive an alternative formula for $C(f,\nu)$ in the form of an infinite series with nonnegative terms. Next, we will consider the special case when $f$ is a class function or $\nu$ is a central measure. As we shall see, the behavior of $C(f,\nu)$ then simplifies considerably, allowing for effective lower bounds.

Proposition 7.

Assume $f\in L^{2}(G)$ . We have

[TABLE]

where $B_{\nu}(\pi)=I_{d_{\pi}}+(I_{d_{\pi}}-\hat{\nu}(\pi))^{-1}\hat{\nu}(\pi)+(I_{d_{\pi}}-\hat{\nu}(\pi)^{*})^{-1}\hat{\nu}(\pi)^{*}$ and $I_{d_{\pi}}$ denotes the $d_{\pi}\times d_{\pi}$ identity matrix. The series in (10) has nonnegative terms and is convergent. In particular, $C(f,\nu)>0$ if and only if at least one term in (10) is nonzero.

Proof.

Let $A_{k}=\mathbb{E}f(U)f(US_{k})$ , and recall (8). The function $h_{k}(u)=\int_{G}f(ux)\,\mathrm{d}\nu^{*k}(x)$ is in $L^{2}(G)$ , and its Fourier coefficients are $\widehat{h_{k}}(\pi)=\hat{f}(\pi)\left(\hat{\nu}(\pi)^{k}\right)^{*}$ . Applying the Parseval formula in (8) we thus obtain

[TABLE]

Here $\hat{f}(\pi_{0})=0$ as $\int_{G}f\,\mathrm{d}\mu=0$ . Fix $\pi\in\hat{G}$ , $\pi\neq\pi_{0}$ . For any $v\in\mathbb{C}^{d_{\pi}}$ we have $\langle\hat{\nu}(\pi)^{k}v,v\rangle=\int_{G}\langle\pi(x)^{*}v,v\rangle\,\mathrm{d}\nu^{*k}(x)$ , where $\langle a,b\rangle=\sum_{i=1}^{d_{\pi}}a_{i}\overline{b_{i}}$ denotes the standard sesquilinear form on $\mathbb{C}^{d_{\pi}}$ . Since $\pi(x)^{*}$ is unitary, the integrand $g(x)=\langle\pi(x)^{*}v,v\rangle$ satisfies $\sup_{G}|g|\leq|v|^{2}$ . Further, $\int_{G}g\,\mathrm{d}\mu=\langle\hat{\mu}(\pi)v,v\rangle=0$ because $\pi\neq\pi_{0}$ . Applying (7) to $g$ we thus obtain $|\langle\hat{\nu}(\pi)^{k}v,v\rangle|\leq|v|^{2}\Delta_{k}$ . In particular, for any $v\in\mathbb{C}^{d_{\pi}}$ we have $|\langle\hat{f}(\pi)\hat{\nu}(\pi)^{k}\hat{f}(\pi)^{*}v,v\rangle|\leq\langle\hat{f}(\pi)\hat{f}(\pi)^{*}v,v\rangle\Delta_{k}$ . Summing this over an orthonormal basis of $\mathbb{C}^{d_{\pi}}$ we get $\left|\mathrm{tr}\left(\hat{f}(\pi)\hat{\nu}(\pi)^{k}\hat{f}(\pi)^{*}\right)\right|\leq\mathrm{tr}\left(\hat{f}(\pi)\hat{f}(\pi)^{*}\right)\Delta_{k}$ . Hence

[TABLE]

justifying a change in the order of summation. Since $\rho(\hat{\nu}(\pi))\leq q<1$ , we have $\sum_{k=1}^{\infty}\hat{\nu}(\pi)^{k}=(I_{d_{\pi}}-\hat{\nu}(\pi))^{-1}\hat{\nu}(\pi)$ in operator norm. We thus obtain

[TABLE]

As $C(f,\nu)$ is clearly real, we can take the real part of the series in the previous line, resulting in formula (10).
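The matrix Neumann series used above can be sanity-checked numerically. The following sketch takes a random complex matrix `A` as a stand-in for $\hat{\nu}(\pi)$, rescaled to spectral radius $0.8$ (an arbitrary illustrative value), compares partial sums of $\sum_{k\geq 1}A^{k}$ with $(I-A)^{-1}A$, and forms the matrix $B=I+(I-A)^{-1}A+(I-A^{*})^{-1}A^{*}$ from Proposition 7.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
# Rescale so that the spectral radius equals 0.8 < 1.
A *= 0.8 / max(abs(np.linalg.eigvals(A)))

I = np.eye(d)
closed_form = np.linalg.solve(I - A, A)          # (I - A)^{-1} A

partial = np.zeros((d, d), dtype=complex)
power = I.astype(complex)
for _ in range(400):
    power = power @ A
    partial += power                              # running value of sum_{k=1}^{K} A^k

# B = I + (I - A)^{-1} A + (I - A^*)^{-1} A^*, where A^* is the conjugate transpose.
Ah = A.conj().T
B = I + closed_form + np.linalg.solve(I - Ah, Ah)
```

Since $(I-A^{*})^{-1}A^{*}$ is the adjoint of $(I-A)^{-1}A$, the matrix `B` is Hermitian, which is consistent with taking real parts in the derivation of (10).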

Finally, we prove that every term in (10) is nonnegative. Fix $\pi\in\hat{G}\backslash\{\pi_{0}\}$ . First, suppose that $\pi$ and $\overline{\pi}$ are unitarily inequivalent. Then we may assume that $\pi,\overline{\pi}\in\hat{G}$ . Since $f$ and $\nu$ are real-valued, we have $\hat{f}(\overline{\pi})=\overline{\hat{f}(\pi)}$ and $\hat{\nu}(\overline{\pi})=\overline{\hat{\nu}(\pi)}$ . Hence $B_{\nu}(\overline{\pi})=\overline{B_{\nu}(\pi)}$ , and the terms in (10) indexed by $\pi$ and $\overline{\pi}$ are equal. Let $F$ be the orthogonal projection of $f$ in $L^{2}(G)$ to the linear subspace spanned by the matrix elements $\{\pi_{ij}:1\leq i,j\leq d_{\pi}\}\cup\{\overline{\pi}_{ij}:1\leq i,j\leq d_{\pi}\}$ ; that is, $F(x)=d_{\pi}\mathrm{tr}\left(\hat{f}(\pi)\pi(x)\right)+d_{\pi}\mathrm{tr}\left(\hat{f}(\overline{\pi})\overline{\pi}(x)\right)$ . Note that $F$ is real-valued, $\hat{F}(\pi)=\hat{f}(\pi)$ , $\hat{F}(\overline{\pi})=\hat{f}(\overline{\pi})$ and $\hat{F}(\pi^{\prime})=0$ for all $\pi^{\prime}\neq\pi,\overline{\pi}$ . Therefore the terms in (10) indexed by $\pi$ and $\overline{\pi}$ are both $C(F,\nu)/2$ . But $C(F,\nu)\geq 0$ from Proposition 6, and we are done. Next, suppose that $\pi$ and $\overline{\pi}$ are unitarily equivalent. Let $F$ be the orthogonal projection of $f$ in $L^{2}(G)$ to the linear subspace spanned by the matrix elements $\{\pi_{ij}:1\leq i,j\leq d_{\pi}\}$ ; note that $\{\overline{\pi}_{ij}:1\leq i,j\leq d_{\pi}\}$ span the same linear subspace. Thus $F(x)=d_{\pi}\mathrm{tr}\left(\hat{f}(\pi)\pi(x)\right)=d_{\pi}\mathrm{tr}\left(\hat{f}(\overline{\pi})\overline{\pi}(x)\right)$ . Again, $F$ is real-valued, $\hat{F}(\pi)=\hat{f}(\pi)$ and $\hat{F}(\pi^{\prime})=0$ for all $\pi^{\prime}\neq\pi$ . Therefore the term in (10) indexed by $\pi$ is $C(F,\nu)\geq 0$ . ∎

Proposition 8.

Assume $f\in L^{2}(G)$ , and let $\nu^{*}(B)=\nu(B^{-1})$ ( $B\subseteq G$ Borel) denote the distribution of $X_{1}^{-1}$ . Suppose at least one of the following holds.

(i) $f$ is a class function.

(ii) $\nu*\nu^{*}=\nu^{*}*\nu$ .

(iii) $\nu$ is a central measure.

Then $\frac{1-q}{1+q}\|f\|_{2}^{2}\leq C(f,\nu)\leq\frac{1+q}{1-q}\|f\|_{2}^{2}$ . In particular, $C(f,\nu)=0$ if and only if $f=0$ $\mu$ -a.e.

Proof.

First, assume (i). It follows from Schur’s lemma that $\hat{f}(\pi)$ is a scalar multiple of the identity matrix. Hence (10) simplifies as

[TABLE]

Let $\lambda_{1},\lambda_{2},\dots,\lambda_{d_{\pi}}$ denote the eigenvalues of $\hat{\nu}(\pi)$ . Then

[TABLE]

Since $\rho(\hat{\nu}(\pi))\leq q<1$ , we have $|\lambda_{i}|\leq q$ , and so $d_{\pi}\frac{1-q}{1+q}\leq\mathrm{tr}\,B_{\nu}(\pi)\leq d_{\pi}\frac{1+q}{1-q}$ . The claim thus follows from the Parseval formula.

Next, assume (ii). Since $\widehat{\nu^{*}}(\pi)=\hat{\nu}(\pi)^{*}$ , the condition $\nu*\nu^{*}=\nu^{*}*\nu$ implies that the matrix $\hat{\nu}(\pi)$ is normal. Therefore there exists an orthonormal basis $v_{1},v_{2},\dots,v_{d_{\pi}}$ of $\mathbb{C}^{d_{\pi}}$ comprised of eigenvectors of $\hat{\nu}(\pi)$ ; say, $\hat{\nu}(\pi)v_{i}=\lambda_{i}v_{i}$ . It follows that $\hat{\nu}(\pi)^{*}v_{i}=\overline{\lambda_{i}}v_{i}$ , and hence

[TABLE]

The eigenvalues of $B_{\nu}(\pi)$ again satisfy $\frac{1-q}{1+q}\leq\frac{1-|\lambda_{i}|^{2}}{|1-\lambda_{i}|^{2}}\leq\frac{1+q}{1-q}$ . It is now easy to see that

[TABLE]

and the claim follows. Finally, note that condition (iii) implies condition (ii). ∎
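The scalar computation behind the eigenvalue bounds in cases (i) and (ii) can be written out as follows. This is a worked sketch for a single eigenvalue $\lambda$ of $\hat{\nu}(\pi)$ with $|\lambda|\leq q<1$ ; it uses only the definition of $B_{\nu}(\pi)$ and the triangle inequality $1-|\lambda|\leq|1-\lambda|\leq 1+|\lambda|$ .

```latex
\begin{align*}
1+\frac{\lambda}{1-\lambda}+\frac{\overline{\lambda}}{1-\overline{\lambda}}
 &=\frac{(1-\lambda)(1-\overline{\lambda})+\lambda(1-\overline{\lambda})+\overline{\lambda}(1-\lambda)}{|1-\lambda|^{2}}
  =\frac{1-|\lambda|^{2}}{|1-\lambda|^{2}},\\[4pt]
\frac{1-q}{1+q}\leq\frac{1-|\lambda|}{1+|\lambda|}
 &=\frac{1-|\lambda|^{2}}{(1+|\lambda|)^{2}}
  \leq\frac{1-|\lambda|^{2}}{|1-\lambda|^{2}}
  \leq\frac{1-|\lambda|^{2}}{(1-|\lambda|)^{2}}
  =\frac{1+|\lambda|}{1-|\lambda|}
  \leq\frac{1+q}{1-q}.
\end{align*}
```

The outer inequalities use that $t\mapsto(1-t)/(1+t)$ is decreasing and $t\mapsto(1+t)/(1-t)$ is increasing on $[0,1)$ .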

We conclude this section with an estimate of the $L^{p}$ -norm of $\sum_{k=1}^{N}f(US_{k})$ for $1\leq p\leq 4$ . These estimates, combined with the Erdős–Stechkin and the Rademacher–Menshov inequalities, will help us bound the fluctuations of $\sum_{k=1}^{N}f(S_{k})$ as $N$ runs in a short interval. We will also use them to verify the Lyapunov condition in the proof of Theorem 5.

Proposition 9.

Assume $f\in L^{p}(G)$ for some $1\leq p\leq 4$ . For any integer $N\geq 1$

[TABLE]

In the case $p=4$ we also have

[TABLE]

Proof.

First, assume $p=4$ . Expanding the fourth power we get

[TABLE]

Fix $1\leq k_{1}\leq k_{2}\leq k_{3}\leq k_{4}\leq N$ . Since $US_{k_{1}}$ is uniformly distributed on $G$ and independent of $X_{k_{1}+1},X_{k_{1}+2},\ldots$ , we have

[TABLE]

Here we use the convention that $\nu^{*0}$ is the Dirac measure concentrated on the identity element of $G$ , and $\Delta_{0}=\|\nu^{*0}-\mu\|_{\mathrm{TV}}\leq 2$ . Let

[TABLE]

As $\int_{G}g(z)\,\mathrm{d}\mu(z)=0$ , applying (7) to $g$ we obtain

[TABLE]

Fix $z\in G$ , and let $h_{z}(y)=\int_{G}\int_{G}f(u)f(ux)f(uxy)f(uxyz)\,\mathrm{d}\mu(u)\mathrm{d}\nu^{*(k_{2}-k_{1})}(x)$ . Note that

[TABLE]

where $w_{z}=\int_{G}f(uxy)f(uxyz)\mathrm{d}\mu(y)=\int_{G}f(y)f(yz)\mathrm{d}\mu(y)$ does not depend on $u$ and $x$ . Applying (7) to $h_{z}$ we get

[TABLE]

Here $|w_{z}|\leq\|f\|_{2}^{2}$ , and the double integral in the previous line is $\leq\|f\|_{2}^{2}\Delta_{k_{2}-k_{1}}$ , as seen in the proof of Proposition 6. Hence

[TABLE]

Now fix $y,z\in G$ , and let $r_{y,z}(x)=\int_{G}f(u)f(ux)f(uxy)f(uxyz)\,\mathrm{d}\mu(u)$ . Note that $\sup_{G}|r_{y,z}|\leq\|f\|_{4}^{4}$ , and that $\int_{G}r_{y,z}(x)\,\mathrm{d}\mu(x)=0$ . Applying (7) we thus get

[TABLE]

Combining (14), (15) and (16) we finally obtain

[TABLE]

Summing over $1\leq k_{1}\leq k_{2}\leq k_{3}\leq k_{4}\leq N$ , (12) follows.

On the other hand, we can use $\Delta_{k_{3}-k_{2}}\leq 2$ to deduce the simpler estimate

[TABLE]

and by summing over $1\leq k_{1}\leq k_{2}\leq k_{3}\leq k_{4}\leq N$ we get $\left\|\sum_{k=1}^{N}f(US_{k})\right\|_{4}\ll\|f\|_{4}\sqrt{\Delta N}$ . Proposition 6 shows that if $f\in L^{2}(G)$ , the same estimate holds with $\|\cdot\|_{4}$ replaced by $\|\cdot\|_{2}$ on both sides. Moreover, we also have the trivial estimate $\left\|\sum_{k=1}^{N}f(US_{k})\right\|_{1}\leq\|f\|_{1}N$ for any $f\in L^{1}(G)$ . This settles the endpoints of the intervals $1\leq p\leq 2$ and $2\leq p\leq 4$ . The cases $1<p<2$ and $2<p<4$ follow from the Riesz–Thorin interpolation theorem applied to the linear operator $f\mapsto\sum_{k=1}^{N}f(US_{k})-N\int_{G}f\,\mathrm{d}\mu$ . ∎

4 Approximation by independent variables

Assume again that $\nu$ is adapted and strictly aperiodic, and that $(\nu^{*k})_{\mathrm{abs}}\neq 0$ for some $k\geq 1$ . Fix a Borel measurable function $f:G\to\mathbb{R}$ such that $\int_{G}f\,\mathrm{d}\mu=0$ . In this section we approximate the shifted sum $\sum_{k=M+1}^{M+N}f(S_{k})$ by a sum of independent random variables. The main tool of this approximation is a coupling between $\nu^{*k}$ and $\mu$ , which we will construct using Strassen’s theorem. We mention that this is the only step of the proof where we use the fact that $G$ is metrizable. A similar approach was used by Schatte [14] on the circle group $G=\mathbb{R}/\mathbb{Z}$ , with a different type of coupling based on the Kolmogorov metric instead of the total variation metric.

Recall that for any two Borel probability measures $\nu_{1}$ and $\nu_{2}$ on $G$ (or indeed, on any Polish space) we have $\|\nu_{1}-\nu_{2}\|_{\mathrm{TV}}=2\inf_{\vartheta}\vartheta\left(\{(x,y)\in G\times G:x\neq y\}\right)$ , where the infimum is over all Borel probability measures $\vartheta$ on $G\times G$ whose marginals are $\vartheta(B\times G)=\nu_{1}(B)$ and $\vartheta(G\times B)=\nu_{2}(B)$ . This fact follows from Strassen’s theorem, which in turn is a special case of the Kantorovich duality theorem in the theory of optimal transportation (see e.g. [18, Chapter 1]). In particular, for any $k\geq 1$ there exists a Borel probability measure $\vartheta_{k}$ on $G\times G$ with marginals $\nu^{*k}$ and $\mu$ such that $\vartheta_{k}\left(\{(x,y)\in G\times G:x\neq y\}\right)\leq\Delta_{k}$ . After a suitable extension of the probability space, for any nonempty, finite interval of positive integers $J\subseteq\mathbb{N}$ we may therefore introduce auxiliary $G$ -valued random variables $T_{J},U_{J}$ whose joint distribution is $\vartheta_{|J|}$ ; that is, $T_{J}\overset{d}{=}S_{J}$ , $U_{J}$ is uniformly distributed on $G$ , and $\Pr(T_{J}\neq U_{J})\leq\Delta_{|J|}$ . Moreover, we may assume $(T_{J},U_{J})$ , $J\subseteq\mathbb{N}$ and $X_{1},X_{2},\ldots$ are independent. The independence of the approximating variables will follow from the following observation.

Lemma 1.

Let $G$ be a compact metrizable group, and let $(S,\mathcal{A})$ be a measurable space. Let $U$ be a $G$ -valued, and let $V$ be an $S$ -valued random variable. If $U$ and $V$ are independent and $U$ is uniformly distributed on $G$ , then for any Borel measurable function $g:S\to G$ the variables $g(V)U$ and $V$ are also independent.

Proof.

Note that $g(V)U$ is uniformly distributed on $G$ . Let $\gamma$ denote the distribution of $V$ . For any Borel set $B\subseteq G$ and any $A\in\mathcal{A}$ we have

[TABLE]

∎
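The coupling characterization of the total variation norm recalled before Lemma 1 can be illustrated on a finite set. The sketch below builds the standard maximal coupling of two probability vectors `p` and `q` (with `q` uniform, playing the role of $\mu$ ) and compares its off-diagonal mass with $\|p-q\|_{\mathrm{TV}}=\sum_{x}|p(x)-q(x)|$ , the normalization used in the text. The function name and the data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def maximal_coupling(p, q):
    """Joint law theta on {0..n-1}^2 with marginals p and q and minimal off-diagonal mass."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    common = np.minimum(p, q)               # mass that can be placed on the diagonal
    theta = np.diag(common)
    rest = 1.0 - common.sum()               # equals half of sum |p - q|
    if rest > 0:
        # p - common and q - common have disjoint supports, so this adds no diagonal mass.
        theta += np.outer(p - common, q - common) / rest
    return theta

p = np.array([0.5, 0.3, 0.2, 0.0])
q = np.array([0.25, 0.25, 0.25, 0.25])      # uniform distribution
theta = maximal_coupling(p, q)
off_diagonal = theta.sum() - np.trace(theta)   # theta({(x, y) : x != y})
tv = np.abs(p - q).sum()                       # total variation norm of p - q
```

For this coupling the off-diagonal mass equals `tv / 2`, so the infimum in the coupling formula is attained in the finite setting.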

We construct the approximating variables as follows. Fix an integer $M\geq 0$ , and let us decompose the infinite set $\{M+1,M+2,\ldots\}$ into consecutive, nonempty, finite intervals of integers $H_{1},J_{1},H_{2},J_{2},\ldots$ . For all $i\geq 1$ and $k\in J_{i}$ let

[TABLE]

Similarly, for all $i\geq 2$ and $k\in H_{i}$ let

[TABLE]

Note that here the case $i=1$ is excluded to ensure that $H_{i}$ is preceded by an interval $J_{i-1}$ . Let us also introduce the variables

[TABLE]

Observe that the random sequence $W_{k}$ , $k\in\bigcup_{i=1}^{\infty}J_{i}$ has the same distribution as $S_{k}$ , $k\in\bigcup_{i=1}^{\infty}J_{i}$ . Similarly, $W_{k}$ , $k\in\bigcup_{i=2}^{\infty}H_{i}$ has the same distribution as $S_{k}$ , $k\in\bigcup_{i=2}^{\infty}H_{i}$ .

For every $R\geq 1$ let $N_{R}$ be such that $M+N_{R}=\max J_{R}$ . Then $\sum_{k=M+1}^{M+N}f(S_{k})$ along the subsequence $N_{R}$ satisfies

[TABLE]

Here the sequence $\sum_{i=1}^{R}\sum_{k\in J_{i}}f(S_{k})$ , $R=1,2,\dots$ has the same distribution as $\sum_{i=1}^{R}Y_{i}$ , $R=1,2,\dots$ ; similarly, the sequence $\sum_{i=2}^{R}\sum_{k\in H_{i}}f(S_{k})$ , $R=2,3,\dots$ has the same distribution as $\sum_{i=2}^{R}Z_{i}$ , $R=2,3,\dots$ . The main idea is to replace $Y_{i}$ by $Y_{i}^{*}$ , and $Z_{i}$ by $Z_{i}^{*}$ . First, we establish the properties of the approximating variables $Y_{i}^{*}$ and $Z_{i}^{*}$ , then we estimate the error committed.

Lemma 2.

$Y_{1}^{*},Y_{2}^{*},\ldots$ are independent, and $\mathbb{E}Y_{i}^{*}=0$ .

(i) If $f\in L^{2}(G)$ , then $\mathbb{E}(Y_{i}^{*})^{2}=C(f,\nu)|J_{i}|+O_{\nu}(\|f\|_{2}^{2})$ .

(ii) If $f\in L^{p}(G)$ for some $1\leq p\leq 4$ , then for any $0\leq R<S$

[TABLE]

In the case $p=4$ we also have

[TABLE]

The same holds for $Z_{2}^{*},Z_{3}^{*},\ldots$ with $|J_{i}|$ replaced by $|H_{i}|$ .

Proof.

To see that $Y_{1}^{*},Y_{2}^{*},\ldots$ are independent, it will be enough to prove that $Y_{i}^{*}$ is independent of the random vector $(Y_{1}^{*},Y_{2}^{*},\dots,Y_{i-1}^{*})$ for all $i\geq 2$ . Let $W$ be the random vector whose coordinates are the variables $X_{k}$ , $k\in[1,M]\cup J_{1}\cup\cdots\cup J_{i-1}$ and $T_{H_{1}},U_{H_{1}},T_{H_{2}},U_{H_{2}},\dots,T_{H_{i-1}},U_{H_{i-1}}$ . Further, let $W^{\prime}$ be the random vector with coordinates $X_{k}$ , $k\in J_{i}$ . Applying Lemma 1 to $V=(W,W^{\prime})$ and $U=U_{H_{i}}$ we get that $(W,W^{\prime})$ and $g(W,W^{\prime})U_{H_{i}}$ are independent for any Borel measurable function $g$ . But $W$ and $W^{\prime}$ are also independent, therefore $W$ , $W^{\prime}$ , $g(W,W^{\prime})U_{H_{i}}$ are independent as well. Note that $(Y_{1}^{*},Y_{2}^{*},\dots,Y_{i-1}^{*})$ is a function of $W$ , whereas $Y_{i}^{*}$ is a function of $W^{\prime}$ and $g(W,W^{\prime})U_{H_{i}}$ for some $g$ (in fact, $g(W,W^{\prime})$ is simply the product of certain components of $W$ ). The independence thus follows.

Now fix $i\geq 1$ . Note that $S_{M}\prod_{j=1}^{i-1}\left(T_{H_{j}}S_{J_{j}}\right)U_{H_{i}}$ is uniformly distributed on $G$ and independent of $X_{k}$ , $k\in J_{i}$ . Hence $Y_{i}^{*}=\sum_{k\in J_{i}}f(W_{k}^{*})\overset{d}{=}\sum_{k=1}^{|J_{i}|}f(US_{k})$ . Here $US_{k}$ is uniformly distributed on $G$ ; in particular, $\mathbb{E}Y_{i}^{*}=\sum_{k=1}^{|J_{i}|}\mathbb{E}f(US_{k})=0$ .

Claim (i) follows from Proposition 6. Now fix $0\leq R<S$ , and let us prove (ii). The case $p=1$ follows from $\|Y_{i}^{*}\|_{1}\leq\sum_{k=1}^{|J_{i}|}\|f(US_{k})\|_{1}=\|f\|_{1}|J_{i}|$ . If $p=2$ , Proposition 6 gives $\|Y_{i}^{*}\|_{2}=\left\|\sum_{k=1}^{|J_{i}|}f(US_{k})\right\|_{2}\leq\|f\|_{2}\sqrt{\Delta|J_{i}|}$ , hence the claim follows from independence. Now assume $p=4$ . The independence of $Y_{1}^{*},Y_{2}^{*},\dots$ implies

[TABLE]

Proposition 9 shows $\|Y_{i}^{*}\|_{4}=\left\|\sum_{k=1}^{|J_{i}|}f(US_{k})\right\|_{4}\ll\|f\|_{2}\sqrt{\Delta|J_{i}|}+\|f\|_{4}\Delta^{3/4}|J_{i}|^{1/4}$ , yielding (17). On the other hand, Proposition 9 also gives $\|Y_{i}^{*}\|_{4}\ll\|f\|_{4}\sqrt{\Delta|J_{i}|}$ , and so $\left\|\sum_{i=R+1}^{S}Y_{i}^{*}\right\|_{4}\ll\|f\|_{4}\sqrt{\Delta\sum_{i=R+1}^{S}|J_{i}|}$ follows as well. This settles the endpoints of the intervals $1\leq p\leq 2$ and $2\leq p\leq 4$ .

Observe that for a given integer $M\geq 0$ , given intervals $H_{1},J_{1},\dots$ and given $0\leq R<S$ , the sum $\sum_{i=R+1}^{S}Y_{i}^{*}$ is linear in $f$ . Applying the Riesz–Thorin interpolation theorem to the linear operator $f\mapsto\sum_{i=R+1}^{S}Y_{i}^{*}-\left(\sum_{i=R+1}^{S}|J_{i}|\right)\int_{G}f\,\mathrm{d}\mu$ , the cases $1<p<2$ and $2<p<4$ follow. The proof for $Z_{2}^{*},Z_{3}^{*},\ldots$ is analogous. ∎

Lemma 3.

If $L_{p}=\sup_{c\in G}\mathbb{E}|f(cX_{1})|^{p}<\infty$ for some $p\geq 1$ , then $\|Y_{i}-Y_{i}^{*}\|_{p}\leq 2|J_{i}|\left(L_{p}\Delta_{|H_{i}|}\right)^{1/p}$ and $\|Z_{i}-Z_{i}^{*}\|_{p}\leq 2|H_{i}|\left(L_{p}\Delta_{|J_{i}|}\right)^{1/p}$ .

Proof.

We have $\|Y_{i}-Y_{i}^{*}\|_{p}\leq\sum_{k\in J_{i}}\|f(W_{k})-f(W_{k}^{*})\|_{p}$ . Let $\mathcal{F}$ be the $\sigma$ -algebra generated by $S_{M}\prod_{j=1}^{i-1}\left(T_{H_{j}}S_{J_{j}}\right)$ , $T_{H_{i}}$ , $U_{H_{i}}$ and $X_{\ell}$ , $\ell\in J_{i}$ , $\ell<k$ . Then $W_{k}=aX_{k}$ and $W_{k}^{*}=a^{*}X_{k}$ with some $\mathcal{F}$ -measurable random variables $a,a^{*}$ . Note that if $T_{H_{i}}=U_{H_{i}}$ , then $W_{k}=W_{k}^{*}$ . Therefore $\mathbb{E}\left(\left|f(W_{k})-f(W_{k}^{*})\right|^{p}\mid\mathcal{F}\right)\leq 2^{p}L_{p}I_{\{T_{H_{i}}\neq U_{H_{i}}\}}$ . Taking the (total) expectation we get $\mathbb{E}\left|f(W_{k})-f(W_{k}^{*})\right|^{p}\leq 2^{p}L_{p}\Pr(T_{H_{i}}\neq U_{H_{i}})$ $\leq 2^{p}L_{p}\Delta_{|H_{i}|}$ , and the result follows. The proof for $\|Z_{i}-Z_{i}^{*}\|_{p}$ is analogous. ∎
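The two steps compressed into "Therefore" and "the result follows" in the proof of Lemma 3 can be spelled out as follows. This is a sketch that assumes the elementary inequality $|x-y|^{p}\leq 2^{p-1}(|x|^{p}+|y|^{p})$ for $p\geq 1$ , and uses that $X_{k}$ is independent of $\mathcal{F}$ while $a$ , $a^{*}$ and the indicator of $\{T_{H_{i}}\neq U_{H_{i}}\}$ are $\mathcal{F}$ -measurable.

```latex
\begin{align*}
\mathbb{E}\left(\left|f(W_{k})-f(W_{k}^{*})\right|^{p}\mid\mathcal{F}\right)
 &\leq 2^{p-1}\,I_{\{T_{H_{i}}\neq U_{H_{i}}\}}\,
   \mathbb{E}\left(|f(aX_{k})|^{p}+|f(a^{*}X_{k})|^{p}\mid\mathcal{F}\right)\\
 &\leq 2^{p-1}\cdot 2L_{p}\,I_{\{T_{H_{i}}\neq U_{H_{i}}\}}
  =2^{p}L_{p}\,I_{\{T_{H_{i}}\neq U_{H_{i}}\}},\\[4pt]
\|Y_{i}-Y_{i}^{*}\|_{p}
 &\leq\sum_{k\in J_{i}}\left(2^{p}L_{p}\Delta_{|H_{i}|}\right)^{1/p}
  =2|J_{i}|\left(L_{p}\Delta_{|H_{i}|}\right)^{1/p}.
\end{align*}
```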

As a simple application of the approximating variables constructed above, we deduce moment estimates for shifted sums $\sum_{k=M+1}^{M+N}f(S_{k})$ from the results of Section 3.

Corollary 10.

(i) If $\sup_{c\in G}\mathbb{E}f(cX_{1})^{2}<\infty$ , then for any integers $M\geq 0$ and $N\geq 1$

[TABLE]

(ii) If $\sup_{c\in G}\mathbb{E}\left|f(cX_{1})\right|^{p}<\infty$ for some $1\leq p\leq 4$ , then for any integers $M\geq 0$ and $N\geq 1$

[TABLE]

with some constant $K_{f,\nu,p}>0$ . In the case $p=4$ we also have

[TABLE]

Proof.

We may assume that $N$ is large enough in terms of $f$ , $\nu$ and $p$ . Let us decompose the index set $[M+1,M+N]$ into two consecutive intervals of integers $H_{1}$ and $J_{1}$ such that $|H_{1}|=\lceil 4\Delta\log N\rceil$ . We then have $\sum_{k=M+1}^{M+N}f(S_{k})=\sum_{k\in H_{1}}f(S_{k})+\sum_{k\in J_{1}}f(S_{k})$ , where $\sum_{k\in J_{1}}f(S_{k})\overset{d}{=}Y_{1}$ . To see (i), let us write

[TABLE]

By Lemma 2 (i), here $\|Y_{1}^{*}\|_{2}=\sqrt{C(f,\nu)N}+O_{f,\nu}(1)$ . Since, say, $\Delta_{k}^{1/k}\leq(1+q)/2$ for $k\geq k_{0}(\nu)$ and $((1+q)/2)^{\Delta}\leq((1+q)/2)^{1/(1-q)}\leq e^{-1/2}$ , we have $\Delta_{|H_{1}|}\leq N^{-2}$ . Lemma 3 thus gives $\|Y_{1}-Y_{1}^{*}\|_{2}\ll_{f,\nu}1$ . Finally, note that $\sup_{c\in G}\mathbb{E}f(cX_{1})^{2}<\infty$ implies $\sup_{k\geq 1}\mathbb{E}f(S_{k})^{2}<\infty$ . Hence $\sum_{k\in H_{1}}\|f(S_{k})\|_{2}\ll_{f,\nu}\log N$ , and (i) follows. If we use Lemma 2 (ii) instead of Lemma 2 (i), similar arguments show (ii). ∎
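The arithmetic behind $\Delta_{|H_{1}|}\leq N^{-2}$ and $\|Y_{1}-Y_{1}^{*}\|_{2}\ll_{f,\nu}1$ can be made explicit. The sketch below assumes $N$ is large enough that $|H_{1}|\geq k_{0}(\nu)$ , writes $L_{2}=\sup_{c\in G}\mathbb{E}f(cX_{1})^{2}$ as in Lemma 3, and uses $|H_{1}|\geq 4\Delta\log N$ and $|J_{1}|\leq N$ .

```latex
\begin{align*}
\log\frac{1+q}{2}&=\log\left(1-\frac{1-q}{2}\right)\leq-\frac{1-q}{2}
 \quad\Longrightarrow\quad
 \left(\frac{1+q}{2}\right)^{1/(1-q)}\leq e^{-1/2},\\[4pt]
\Delta_{|H_{1}|}&\leq\left(\frac{1+q}{2}\right)^{|H_{1}|}
 \leq\left(\left(\frac{1+q}{2}\right)^{\Delta}\right)^{4\log N}
 \leq e^{-2\log N}=N^{-2},\\[4pt]
\|Y_{1}-Y_{1}^{*}\|_{2}&\leq 2|J_{1}|\left(L_{2}\Delta_{|H_{1}|}\right)^{1/2}
 \leq 2N\cdot L_{2}^{1/2}N^{-1}=2L_{2}^{1/2}.
\end{align*}
```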

Remark.

We could easily improve the error term $K_{f,\nu,p}\log(N+1)$ in (ii) by decomposing $[M+1,M+N]$ into more than $2$ consecutive intervals of exponentially increasing sizes.

5 Proof of the theorems

We prove Theorem 3 (i) for strictly aperiodic measures and Theorem 4 in Section 5.1; the general case of Theorem 3 and Theorem 1 in Section 5.2; finally, Theorem 5 and the Remark thereafter, and Theorem 2 in Section 5.3.

5.1 Almost sure asymptotics, strictly aperiodic case

Suppose that $\nu$ is adapted and strictly aperiodic, and that $(\nu^{*k})_{\mathrm{abs}}\neq 0$ for some $k\geq 1$ . Let $f:G\to\mathbb{R}$ be Borel measurable such that $\int_{G}f\,\mathrm{d}\mu=0$ , and assume $\sup_{c\in G}\mathbb{E}|f(cX_{1})|^{p}<\infty$ . In this section we prove the strong law of large numbers (3) in the case $1\leq p\leq 2$ , and the almost sure approximation by a Wiener process in Theorem 4 in the case $2<p<4$ . For the sake of brevity, in the proofs of this section implied constants are allowed to depend on $f$ , $\nu$ and $p$ .

First, assume $1\leq p\leq 2$ . We start by estimating the fluctuations. Recall that $\log_{m}$ denotes the $m$ -fold iterated logarithm.

Lemma 4.

For any integers $m\geq 1$ , $M\geq 0$ and $N\geq N_{0}(f,\nu,p,m)$ , and any $\lambda>\lambda_{0}(f,\nu,p,m)$

[TABLE]

Proof.

We use induction on $m$ . Corollary 10 (ii) and the Rademacher–Menshov inequality [11, Theorem F] give $\left\|\max_{1\leq n\leq N}\left|\sum_{k=M+1}^{M+n}f(S_{k})\right|\right\|_{p}\ll N^{1/p}\log(N+1)$ . The $m=1$ case thus follows from the Markov inequality. Now assume the claim holds for some $m\geq 1$ , and let us prove it for $m+1$ . Let us decompose the index set $[M+1,M+N]$ into consecutive intervals of integers $H_{1},J_{1},H_{2},J_{2},\dots,H_{R},J_{R}$ , as in Section 4, such that $|H_{i}|,|J_{i}|\geq 4\Delta\log N$ for all $i$ , and $R=\Theta(N/\log N)$ . Similarly to the proof of Corollary 10 we have $\Delta_{|H_{i}|},\Delta_{|J_{i}|}\leq N^{-2}$ . Let $M+n_{r}=\max J_{r}$ , and recall that for any $2\leq r\leq R$

[TABLE]

Here the variables $\sum_{i=2}^{r}\sum_{k\in J_{i}}f(S_{k})$ , $2\leq r\leq R$ have the same joint distribution as $\sum_{i=2}^{r}Y_{i}$ , $2\leq r\leq R$ ; similarly, $\sum_{i=2}^{r}\sum_{k\in H_{i}}f(S_{k})$ , $2\leq r\leq R$ have the same joint distribution as $\sum_{i=2}^{r}Z_{i}$ , $2\leq r\leq R$ . Let us introduce the random events

[TABLE]

The event in the claim of the lemma is a subset of $A\cup B\cup\bigcup_{i=1}^{R}C_{i}$ . Applying the inductive hypothesis on the interval $H_{i}\cup J_{i}$ of length $\ll\log N$ we get $\Pr(C_{i})\ll\lambda^{-p}(\log N/N)(\log_{m}\log N)^{p}$ , and hence $\Pr\left(\bigcup_{i=1}^{R}C_{i}\right)\ll\lambda^{-p}(\log_{m+1}N)^{p}$ .

Recall from Lemma 2 (ii) that $\left\|\sum_{i=r+1}^{s}Y_{i}^{*}\right\|_{p}\ll\left(\sum_{i=r+1}^{s}|J_{i}|\right)^{1/p}\ll N^{1/p}$ for any $1\leq r<s\leq R$ . It follows that the median of $\sum_{i=r+1}^{s}Y_{i}^{*}$ is also $\ll N^{1/p}$ , and hence from Lévy’s inequality (see e.g. [12, p. 51]) we get

[TABLE]

provided $\lambda$ is large enough. On the other hand, Lemma 3 gives $\mathbb{E}|Y_{i}-Y_{i}^{*}|^{p}\ll|J_{i}|^{p}\Delta_{|H_{i}|}\ll N^{-2}(\log N)^{p}$ , and thus

[TABLE]

Therefore $\Pr\left(\sum_{i=2}^{R}|Y_{i}-Y_{i}^{*}|\geq(\lambda/8)N^{1/p}\right)\ll\lambda^{-p}$ . This relation, together with (18) shows $\Pr(A)\ll\lambda^{-p}$ . Repeating the same arguments for $Z_{i}$ and $Z_{i}^{*}$ we get $\Pr(B)\ll\lambda^{-p}$ . Hence $\Pr\left(A\cup B\cup\bigcup_{i=1}^{R}C_{i}\right)\ll\lambda^{-p}(\log_{m+1}N)^{p}$ , as claimed. ∎

We are now ready to prove (3). Fix $m\geq 1$ and $\varepsilon>0$ . Let us decompose the set of positive integers into consecutive intervals of integers $H_{1},J_{1},H_{2},J_{2},\dots$ , as in Section 4 (with the choice $M=0$ ), such that, say, $|H_{i}|=|J_{i}|=i$ for all $i\geq 1$ . Similarly to the proof of Corollary 10 we have $i\geq 16\Delta\log i$ , and so $\Delta_{|H_{i}|}=\Delta_{|J_{i}|}\leq i^{-8}$ for all integers $i$ large enough in terms of $\nu$ .

Consider (3) along the subsequence $N_{R}=\max J_{R}=\Theta(R^{2})$ . We have

[TABLE]

Here the sequences $\sum_{i=2}^{R}\sum_{k\in J_{i}}f(S_{k})$ and $\sum_{i=2}^{R}\sum_{k\in H_{i}}f(S_{k})$ , $R=2,3,\dots$ have the same distribution as $\sum_{i=2}^{R}Y_{i}$ and $\sum_{i=2}^{R}Z_{i}$ , $R=2,3,\dots$ , respectively. Using Lemma 2 (ii) we get $\sum_{i=i_{0}(m)}^{\infty}\mathbb{E}|Y_{i}^{*}|^{p}/(i\varphi_{m,\varepsilon}(i))\ll\sum_{i=i_{0}(m)}^{\infty}1/\varphi_{m,\varepsilon}(i)<\infty$ . By a classical form of the strong law of large numbers (see e.g. [12, p. 209]) and $R^{1/p}\varphi_{m,\varepsilon}(R)^{1/p}=\Theta\left(\varphi_{m,\varepsilon}(N_{R})^{1/p}\right)$ , we have

[TABLE]

Lemma 3 gives $\mathbb{E}|Y_{i}-Y_{i}^{*}|^{p}\ll|J_{i}|^{p}\Delta_{|H_{i}|}\ll i^{p-8}$ , and hence $\Pr\left(|Y_{i}-Y_{i}^{*}|\geq 1/i^{2}\right)\ll i^{3p-8}\leq i^{-2}$ . By the Borel–Cantelli lemma $\sum_{i=2}^{\infty}|Y_{i}-Y_{i}^{*}|<\infty$ a.s., and consequently (19) remains true if we replace $Y_{i}^{*}$ by $Y_{i}$ . Repeating the same arguments for $Z_{i}$ and $Z_{i}^{*}$ , we obtain (3) along the subsequence $N_{R}=\max J_{R}$ .
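For the choice $|H_{i}|=|J_{i}|=i$ and $M=0$ , the exponents used in this step can be verified directly. The following worked lines are a sketch under the same assumptions as above ( $1\leq p\leq 2$ , and $i$ large enough in terms of $\nu$ ), with Markov's inequality supplying the probability bound.

```latex
\begin{align*}
N_{R}&=\sum_{i=1}^{R}\left(|H_{i}|+|J_{i}|\right)=\sum_{i=1}^{R}2i=R(R+1)=\Theta(R^{2}),\\[4pt]
\Delta_{|H_{i}|}=\Delta_{|J_{i}|}&\leq\left(\left(\frac{1+q}{2}\right)^{\Delta}\right)^{16\log i}
 \leq e^{-8\log i}=i^{-8},\\[4pt]
\Pr\left(|Y_{i}-Y_{i}^{*}|\geq i^{-2}\right)
 &\leq i^{2p}\,\mathbb{E}|Y_{i}-Y_{i}^{*}|^{p}
 \ll i^{2p}\cdot i^{p-8}=i^{3p-8}\leq i^{-2}.
\end{align*}
```

Since $\sum_{i}i^{-2}<\infty$ , the Borel–Cantelli lemma gives $|Y_{i}-Y_{i}^{*}|<i^{-2}$ for all but finitely many $i$ almost surely, which is what makes $\sum_{i}|Y_{i}-Y_{i}^{*}|$ finite.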

On the other hand, applying Lemma 4 with $m+2$ on the interval $H_{R}\cup J_{R}$ of length $2R$ , we get

[TABLE]

The Borel–Cantelli lemma shows that with probability $1$ , for any $R\geq 1$ and any $N\in H_{R}\cup J_{R}$ the fluctuation satisfies $\left|\sum_{k=\min H_{R}}^{N}f(S_{k})\right|\ll_{\omega}\varphi_{m+1,\varepsilon}(N)^{1/p}$ with an implied constant depending on the point $\omega$ of the probability space. Therefore (3) holds along all $N$ . This finishes the proof of Theorem 3 (i) under the extra condition that $\nu$ is strictly aperiodic.

Proof of Theorem 4.

Assume $p=2+\delta$ for some $0<\delta<2$ , and $C(f,\nu)>0$ . Let us decompose the set of positive integers into consecutive intervals of integers $H_{1},J_{1},H_{2},J_{2},\ldots$ , as in Section 4 (with the choice $M=0$ ), such that $|H_{i}|=\lceil 22\Delta\log(i+1)\rceil$ and $|J_{i}|=\lceil i^{\delta/(4+2\delta)}\rceil$ for all $i\geq 1$ . As before, we have $\Delta_{|H_{i}|}\leq i^{-11}$ and $\Delta_{|J_{i}|}\leq i^{-11}$ for all integers $i$ large enough in terms of $\nu$ and $\delta$ .

Corollary 10 (ii) and the Erdős–Stechkin inequality [11, Theorem A] give $\left\|\max_{1\leq n\leq N}\left|\sum_{k=M+1}^{M+n}f(S_{k})\right|\right\|_{2+\delta}\ll\sqrt{N}$ for any $M\geq 0$ and $N\geq 1$ . Therefore for any $R\geq 1$ we have

[TABLE]

The Borel–Cantelli lemma shows that with probability $1$ , for any $R\geq 1$ and any $N\in H_{R}\cup J_{R}$ the fluctuation satisfies $\left|\sum_{k=\min H_{R}}^{N}f(S_{k})\right|\ll_{\omega}R^{1/2}$ with an implied constant depending on the point $\omega$ of the probability space. For any $t\geq 1$ let $R(t)$ denote the positive integer for which $\lfloor t\rfloor\in H_{R(t)}\cup J_{R(t)}$ . Summing over $\min J_{1}\leq k\leq\max J_{R(t)}$ instead of $1\leq k\leq t$ , we thus obtain

[TABLE]

Here $\sum_{i=1}^{R(t)}\sum_{k\in J_{i}}f(S_{k})\overset{d}{=}\sum_{i=1}^{R(t)}Y_{i}$ and $\sum_{i=2}^{R(t)}\sum_{k\in H_{i}}f(S_{k})\overset{d}{=}\sum_{i=2}^{R(t)}Z_{i}$ in the Skorokhod space $D[0,\infty)$ . From Lemma 3 we get $\mathbb{E}|Y_{i}-Y_{i}^{*}|^{2+\delta}\ll|J_{i}|^{2+\delta}\Delta_{|H_{i}|}\ll i^{-11+\delta/2}$ . Hence $\Pr(|Y_{i}-Y_{i}^{*}|\geq 1/i^{2})\ll i^{-7+5\delta/2}\leq i^{-2}$ , so by the Borel–Cantelli lemma $\sum_{i=1}^{\infty}|Y_{i}-Y_{i}^{*}|<\infty$ a.s. Clearly the same holds for $Z_{i}-Z_{i}^{*}$ .

By Lemma 2 we have $\|Z_{i}^{*}\|_{2+\delta}\ll\sqrt{\log i}$ , and $\sum_{i=2}^{R}\mathbb{E}|Z_{i}^{*}|^{2}=\Theta(R\log R)$ . It follows (see e.g. [12, p. 246]) that $\sum_{i=2}^{R}Z_{i}^{*}$ satisfies the law of the iterated logarithm; in particular, $\left|\sum_{i=2}^{R}Z_{i}^{*}\right|\ll_{\omega}\sqrt{R\log R\log\log R}$ a.s. Note that $R(t)^{1/2}\ll t^{(2+\delta)/(4+3\delta)}$ and $(2+\delta)/(4+3\delta)<1/2-\delta/20$ whenever $0<\delta<2$ . Thus the second double sum on the right hand side of (20) is $o(t^{1/2-\delta/20})$ a.s., and consequently the processes $\sum_{k\leq t}f(S_{k})$ and $\sum_{i=1}^{R(t)}Y_{i}^{*}$ are $o\left(t^{1/2-\delta/20}\right)$ -equivalent.

A special case of a theorem of Strassen [15, Theorem 4.4] states the following. Given independent random variables $\zeta_{i}$ , $i=1,2,\dots$ with $\mathbb{E}\zeta_{i}=0$ and $V_{R}=\sum_{i=1}^{R}\mathbb{E}|\zeta_{i}|^{2}\to\infty$ , for any $t\geq V_{1}$ let $R^{\prime}(t)$ denote the positive integer for which $V_{R^{\prime}(t)}\leq t<V_{R^{\prime}(t)+1}$ . If $\sum_{i=1}^{\infty}\mathbb{E}|\zeta_{i}|^{p}/V_{i}^{\theta p/2}<\infty$ for some $p>2$ and $0\leq\theta\leq 1$ , then the processes $\sum_{i=1}^{R^{\prime}(t)}\zeta_{i}$ and $W(t)$ are $o\left(t^{(1+\theta)/4}\log t\right)$ -equivalent, where $W(t)$ is a standard Wiener process.

We apply Strassen’s theorem to $\zeta_{i}=Y_{i}^{*}/\sqrt{C(f,\nu)}$ , $i=1,2,\dots$ . By Lemma 2 we have $V_{R}=\sum_{i=1}^{R}\mathbb{E}|\zeta_{i}|^{2}=\sum_{i=1}^{R}|J_{i}|+O(R)=\Theta\left(R^{1+\delta/(4+2\delta)}\right)$ and $\mathbb{E}|\zeta_{i}|^{2+\delta}\ll|J_{i}|^{1+\delta/2}\ll i^{\delta/4}$ . Hence $\sum_{i=1}^{\infty}\mathbb{E}|\zeta_{i}|^{2+\delta}/V_{i}^{\theta(1+\delta/2)}<\infty$ for any $\theta>(4+\delta)/(4+3\delta)$ . Choosing $\theta$ close enough to $(4+\delta)/(4+3\delta)$ , we have $(1+\theta)/4<1/2-\delta/20$ , and so the processes $\sum_{i=1}^{R^{\prime}(t)}Y_{i}^{*}/\sqrt{C(f,\nu)}$ and $W(t)$ are $o(t^{1/2-\delta/20})$ -equivalent; clearly so are $\sum_{i=1}^{R^{\prime}(t)}Y_{i}^{*}$ and $\sqrt{C(f,\nu)}W(t)$ .
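The exponent bookkeeping in this application of Strassen's theorem can be checked line by line. The sketch below uses only $|J_{i}|=\lceil i^{\delta/(4+2\delta)}\rceil$ , $V_{i}=\Theta\left(i^{(4+3\delta)/(4+2\delta)}\right)$ and $0<\delta<2$ .

```latex
\begin{align*}
|J_{i}|^{1+\delta/2}&\asymp i^{\frac{\delta}{4+2\delta}\cdot\frac{2+\delta}{2}}=i^{\delta/4},
\qquad
V_{i}^{\theta(1+\delta/2)}\asymp i^{\theta\cdot\frac{4+3\delta}{4+2\delta}\cdot\frac{2+\delta}{2}}
 =i^{\theta(4+3\delta)/4},\\[4pt]
\sum_{i=1}^{\infty}i^{\delta/4-\theta(4+3\delta)/4}<\infty
 &\iff\theta\,\frac{4+3\delta}{4}-\frac{\delta}{4}>1
 \iff\theta>\frac{4+\delta}{4+3\delta},\\[4pt]
\left.\frac{1+\theta}{4}\right|_{\theta=\frac{4+\delta}{4+3\delta}}&=\frac{2+\delta}{4+3\delta},
\qquad
\frac{1}{2}-\frac{2+\delta}{4+3\delta}=\frac{\delta}{2(4+3\delta)}>\frac{\delta}{20}
 \iff\delta<2.
\end{align*}
```

The last inequality is strict, which leaves room both for choosing $\theta$ slightly above $(4+\delta)/(4+3\delta)$ and for absorbing the factor $\log t$ in Strassen's error term.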

Finally, we show that the processes $Y(t)=\sum_{i=1}^{R^{\prime}(t)}Y_{i}^{*}$ and $\sum_{i=1}^{R(t)}Y_{i}^{*}$ are $o(t^{1/2-\delta/20})$ -equivalent. Clearly $\max J_{R}=\sum_{i=1}^{R}(|H_{i}|+|J_{i}|)$ , and recall that $V_{R}=\sum_{i=1}^{R}|J_{i}|+O(R)$ . Therefore for all large enough integer $r$ , on the interval $V_{r}\leq t<V_{r+1}$ we have $R^{\prime}(t)=r$ and $R(t)=R^{\prime}(V_{r}-s)$ for some $0\leq s\ll V_{r}^{(4+2\delta)/(4+3\delta)}\log V_{r}$ , and hence $\sum_{i=1}^{R^{\prime}(t)}Y_{i}^{*}=Y(V_{r})$ and $\sum_{i=1}^{R(t)}Y_{i}^{*}=Y(V_{r}-s)$ . Letting $K_{r}=cV_{r}^{(4+2\delta)/(4+3\delta)}\log V_{r}$ with a large enough constant $c>0$ , it will thus be enough to prove that

[TABLE]

Recalling the distribution of the running maximum of a Wiener process, we have

[TABLE]

Choosing, say, $\lambda=2\sqrt{\log V_{r}}$ and noting $(2+\delta)/(4+3\delta)<1/2-\delta/20$ , the Borel–Cantelli lemma shows that the process $W(t)$ satisfies the property in (21); clearly so does $\sqrt{C(f,\nu)}W(t)$ . Since (21) is invariant under $o(t^{1/2-\delta/20})$ -equivalence, $Y(t)$ also satisfies (21). This finishes the proof in the case $C(f,\nu)>0$ .

If $C(f,\nu)=0$ , the proof is much simpler. In this case Lemma 2 gives $\mathbb{E}|Y_{i}^{*}|^{2}\ll 1$ . Therefore $\sum_{i=1}^{\infty}\mathbb{E}|Y_{i}^{*}|^{2}/i^{1+2\varepsilon}<\infty$ for any $\varepsilon>0$ , and by the strong law of large numbers $\sum_{i=1}^{R(t)}Y_{i}^{*}=o\left(R(t)^{1/2+\varepsilon}\right)$ a.s. Similarly, $\sum_{i=2}^{R(t)}Z_{i}^{*}=o\left(R(t)^{1/2+\varepsilon}\right)$ a.s. Using these relations instead of the law of the iterated logarithm and Strassen’s theorem and noting that $R(t)^{1/2+\varepsilon}\ll t^{1/2-\delta/20}$ for small enough $\varepsilon>0$ , we get $\sum_{k\leq t}f(S_{k})=o\left(t^{1/2-\delta/20}\right)$ a.s., as claimed. ∎

5.2 Almost sure asymptotics, general case

Proof of Theorem 3.

Under the extra condition that $\nu$ is strictly aperiodic, we proved claim (i) in Section 5.1, whereas claim (ii) follows from Theorem 4. We now show that the condition of strict aperiodicity can be removed, and prove the general case of Theorem 3.

Assume that the pair $(G,\nu)$ satisfies the conditions of Theorem 3; that is, $G$ is a compact metrizable group, and $\nu$ is a Borel probability measure on $G$ such that $\nu$ is adapted, and $(\nu^{*k})_{\mathrm{abs}}\neq 0$ for some $k\geq 1$ . We shall use the notation $\mu_{G}$ for the normalized Haar measure on $G$ . It is not difficult to see that if $\nu_{1}$ and $\nu_{2}$ are Borel probability measures on $G$ , then $\mathrm{supp}\,(\nu_{1}*\nu_{2})=(\mathrm{supp}\,\nu_{1})(\mathrm{supp}\,\nu_{2})$ (see e.g. [17, Lemma 2]). Therefore $\mathrm{supp}\,\nu^{*k}=(\mathrm{supp}\,\nu)^{k}$ , where we use the notation $A^{k}=\{a_{1}a_{2}\cdots a_{k}:a_{1},a_{2},\dots,a_{k}\in A\}$ . In particular, $\mathrm{supp}\,\nu^{*(k+1)}$ contains a translate of $\mathrm{supp}\,\nu^{*k}$ , so the sequence $\mu_{G}(\mathrm{supp}\,\nu^{*k})$ is nondecreasing. Let $\alpha(G,\nu)=\lim_{k\to\infty}\mu_{G}(\mathrm{supp}\,\nu^{*k})$ . Note that $\alpha(G,\nu)>0$ because $(\nu^{*k})_{\mathrm{abs}}\neq 0$ for some $k\geq 1$ . The following simple observation is a special case of [7, Theorem 14]. For the sake of completeness we include a short proof.

Lemma 5.

Let $G$ be a compact metrizable group. If $K\subseteq G$ is nonempty and closed, and $K^{2}\subseteq K$, then $K$ is a subgroup.

Proof.

Let $a\in K$ be arbitrary. By assumption $a^{n}\in K$ for all $n\geq 1$ . Using the compactness of $K$ we have $a^{n_{k}}\to b\in K$ as $k\to\infty$ for some subsequence $a^{n_{k}}$ . For any fixed $n\geq 1$ we have $a^{n_{k}-n}\to a^{-n}b\in K$ as $k\to\infty$ . After replacing $n_{k}$ by another subsequence we may assume that $a^{n_{k}}\to b\in K$ and $a^{-n_{k}}b\to c\in K$ as $k\to\infty$ for some $c$ . Then $b=a^{n_{k}}a^{-n_{k}}b\to bc$ , hence $c=1\in K$ . It remains to prove that for any $a\in K$ we have $a^{-1}\in K$ . But $aK$ is also nonempty and closed, and $(aK)^{2}\subseteq aK$ . By the previous argument $1\in aK$ , therefore $a^{-1}\in K$ . ∎
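As a sanity check of Lemma 5 in the simplest possible setting, the following sketch enumerates all nonempty subsets $K$ of a small finite group (compact and metrizable in the discrete topology) and confirms that $K^{2}\subseteq K$ forces $K$ to be a subgroup. The choice of the cyclic group $\mathbb{Z}_{12}$, written additively, and all function names are illustrative assumptions and do not come from the paper.

```python
from itertools import combinations

def closed_under_product(K, n):
    """True if K + K is contained in K (group operation: addition mod n)."""
    return all((a + b) % n in K for a in K for b in K)

def is_subgroup(K, n):
    """True if K contains the identity 0 and is closed under inverses and sums."""
    return 0 in K and all((-a) % n in K for a in K) and closed_under_product(K, n)

def check_lemma5(n):
    """Return every nonempty K in Z_n with K^2 inside K, asserting each is a subgroup."""
    closed_sets = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            K = frozenset(subset)
            if closed_under_product(K, n):
                assert is_subgroup(K, n)  # the conclusion of Lemma 5
                closed_sets.append(K)
    return closed_sets

closed_sets = check_lemma5(12)
print(sorted(sorted(K) for K in closed_sets))
```

In a finite group the compactness argument of the proof reduces to the fact that every element has finite order, so the sets found are exactly the subgroups of $\mathbb{Z}_{12}$.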

Assume now that there exists a proper closed normal subgroup $H\lhd G$ such that $\mathrm{supp}\,\nu\subseteq aH$ for some coset $aH$. Since $H$ is normal, we have $\mathrm{supp}\,\nu^{*k}\subseteq a^{k}H$ for all $k\geq 1$. Thus $\mu_{G}(H)=\mu_{G}(a^{k}H)\geq\mu_{G}(\mathrm{supp}\,\nu^{*k})$, and so $\mu_{G}(H)\geq\alpha(G,\nu)>0$. Since the cosets of $H$ are pairwise disjoint and all have Haar measure $\mu_{G}(H)>0$, the index of $H$ in $G$ is finite; in particular, $aH$ has finite order $d$ in the factor group $G/H$. Since $\bigcup_{i=1}^{d}a^{i}H$ is a closed subgroup of $G$ containing $\mathrm{supp}\,\nu$, and $\nu$ is assumed to be adapted, we have $G=\bigcup_{i=1}^{d}a^{i}H$ and $|G:H|=d$. As $\mathrm{supp}\,\nu^{*d}\subseteq H$, we can view $\nu^{*d}$ as a Borel probability measure on the compact metrizable group $H$. Note that $\mu_{H}(B)=d\cdot\mu_{G}(B)$ ($B\subseteq H$ Borel) is the normalized Haar measure on $H$. Clearly $(\nu^{*d})^{*k}$ has an absolutely continuous component with respect to $\mu_{H}$ for some $k\geq 1$. It is also not difficult to see that $\nu^{*d}$ is adapted on $H$. Indeed, suppose $K<H$ is a proper closed subgroup for which $\mathrm{supp}\,\nu^{*d}\subseteq K$. Consider $C=\bigcup_{i=1}^{d}\mathrm{supp}\,\nu^{*i}K$, and note that here $\mathrm{supp}\,\nu^{*i}K\subseteq a^{i}H$; in particular, $C\neq G$. On the other hand, writing an arbitrary integer $k\geq 1$ in the form $k=nd+i$, $1\leq i\leq d$, we have $\mathrm{supp}\,\nu^{*k}=(\mathrm{supp}\,\nu^{*i})(\mathrm{supp}\,\nu^{*d})^{n}\subseteq\mathrm{supp}\,\nu^{*i}K$. Therefore the topological closure $\overline{\bigcup_{k=1}^{\infty}\mathrm{supp}\,\nu^{*k}}$ is a subset of $C\neq G$. Using Lemma 5 we get that $\overline{\bigcup_{k=1}^{\infty}\mathrm{supp}\,\nu^{*k}}$ is a proper closed subgroup of $G$, contradicting the adaptedness of $\nu$. Altogether, we find that the pair $(H,\nu^{*d})$ satisfies the conditions of Theorem 3. Observe, moreover, that $\alpha(H,\nu^{*d})=d\cdot\alpha(G,\nu)$.
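The coset structure described above can be made concrete in a toy example. The sketch below works in $G=\mathbb{Z}_{6}$ (written additively) with $H=\{0,2,4\}$, $a=1$ and $\mathrm{supp}\,\nu=\{1,3\}\subseteq a+H$; these choices, and the helper names, are assumptions made purely for illustration. It computes $\mathrm{supp}\,\nu^{*k}$ as a $k$-fold sumset and checks that it lies in the coset $ka+H$, that $\mu_{G}(\mathrm{supp}\,\nu^{*k})$ is nondecreasing, and that $\alpha(H,\nu^{*d})=d\cdot\alpha(G,\nu)$ with $d=2$.

```python
def support_of_power(support, k, n):
    """Support of the k-fold convolution power in Z_n, i.e. the k-fold sumset."""
    current = set(support)
    for _ in range(k - 1):
        current = {(x + y) % n for x in current for y in support}
    return current

n = 6                 # G = Z_6, an illustrative choice
H = {0, 2, 4}         # proper closed normal subgroup of index d = 2
a = 1
supp_nu = {1, 3}      # contained in the coset a + H = {1, 3, 5}
d = n // len(H)

def haar_G(A):
    """Normalized Haar (counting) measure on G."""
    return len(A) / n

def haar_H(A):
    """Normalized Haar (counting) measure on H; equals d * haar_G on subsets of H."""
    return len(A) / len(H)

supports = [support_of_power(supp_nu, k, n) for k in range(1, 9)]
measures = [haar_G(S) for S in supports]

# On a finite group the nondecreasing sequence of measures stabilizes,
# so the last computed value plays the role of the limit alpha.
alpha_G = measures[-1]
alpha_H = haar_H(support_of_power(supp_nu, d * 4, n))

print(measures, alpha_G, alpha_H)
```

Here $\mathrm{supp}\,\nu^{*k}$ alternates between $H$ and $1+H$ from $k=2$ onward, which is exactly the periodic behaviour that the reduction to $(H,\nu^{*d})$ removes.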

Assume in addition that the pair $(H,\nu^{*d})$ satisfies the claims of Theorem 3. We now prove that under all these assumptions $(G,\nu)$ also satisfies the claims of Theorem 3. Fix a Borel measurable function $f:G\to\mathbb{R}$ such that $\sup_{c\in G}\mathbb{E}|f(cX_{1})|^{p}<\infty$ for some $p\geq 1$. It will be enough to prove that for any $1\leq i\leq d$ we have

[TABLE]

for any $m\geq 1$ and $\varepsilon>0$ in the case $1\leq p\leq 2$, and

[TABLE]

in the case $p>2$ . Fix $1\leq i\leq d$ , and let $\mathcal{F}_{i}$ denote the $\sigma$ -algebra generated by $X_{1},X_{2},\dots,X_{i}$ . Letting $Y_{n}=\prod_{j=i+(n-1)d+1}^{i+nd}X_{j}$ , the variables $Y_{1},Y_{2},\dots$ are i.i.d. $H$ -valued random variables with distribution $\nu^{*d}$ , independent of $\mathcal{F}_{i}$ . Let $b=X_{1}X_{2}\cdots X_{i}$ , and note $b\in a^{i}H$ a.s. Let $g:H\to\mathbb{R}$ , $g(x)=f(bx)$ , and observe $\sup_{c\in H}\mathbb{E}\left(|g(cY_{1})|^{p}\mid\mathcal{F}_{i}\right)<\infty$ a.s. and $\int_{H}g\,\mathrm{d}\mu_{H}=d\int_{a^{i}H}f\,\mathrm{d}\mu_{G}$ a.s. We thus have

[TABLE]

By the assumption that $(H,\nu^{*d})$ satisfies the claims of Theorem 3, we have

[TABLE]

in the case $1\leq p\leq 2$ , and

[TABLE]

in the case $p>2$ . Taking the (total) probability, (22) and (23) follow.

Finally, we prove that the pair $(G,\nu)$ satisfies the claims of Theorem 3. Let $H_{0}=G$. If $\nu$ is not strictly aperiodic in $H_{0}$, then let $H_{1}\lhd H_{0}$ be a proper closed normal subgroup such that $\mathrm{supp}\,\nu$ is contained in a coset of $H_{1}$, and let $d_{1}=|H_{0}:H_{1}|$. As seen above, the pair $(H_{1},\nu^{*d_{1}})$ satisfies the conditions of Theorem 3, hence we can iterate this procedure. We obtain a sequence $H_{0}\rhd H_{1}\rhd\cdots\rhd H_{j}$, where $H_{i}$ is a proper closed normal subgroup of $H_{i-1}$ with index $d_{i}=|H_{i-1}:H_{i}|$, and $\mathrm{supp}\,\nu^{*(d_{1}\cdots d_{i-1})}$ is contained in a coset of $H_{i}$ for all $1\leq i\leq j$. The procedure ends after step $j$ if $\nu^{*(d_{1}\cdots d_{j})}$ is strictly aperiodic in $H_{j}$. Note that $1\geq\alpha(H_{i},\nu^{*(d_{1}\cdots d_{i})})=d_{1}\cdots d_{i}\alpha(G,\nu)$. Since each $H_{i}$ is a proper subgroup of $H_{i-1}$, we have $d_{i}\geq 2$, and therefore the procedure terminates after finitely many steps. We prove the claims by induction on $j$. If $j=0$, that is, $\nu$ is strictly aperiodic, the claims have already been proved. To prove the inductive step from $j-1$ to $j$, we first apply the inductive hypothesis to $(H_{1},\nu^{*d_{1}})$, then the arguments above to conclude that $(G,\nu)$ satisfies the claims of Theorem 3. ∎

Proof of Theorem 1.

The implication (iii) $\Rightarrow$ (ii) is trivial, whereas (i) $\Rightarrow$ (iii) is a special case of Theorem 3. Let us finally prove (ii) $\Rightarrow$ (i). First, suppose that $\nu^{*k}$ is singular with respect to $\mu$ for every $k\geq 1$. Taking the union over $k$ of $\mu$-null Borel sets carrying $\nu^{*k}$, we obtain a Borel set $B\subseteq G$ such that $\mu(B)=0$ and $\Pr(S_{k}\in B)=1$ for every $k\geq 1$. Hence the indicator function $f=I_{B}$ does not satisfy (ii), giving a contradiction. Suppose next that $\nu$ is not adapted; that is, there exists a proper closed subgroup $H<G$ such that $\Pr(X_{1}\in H)=1$. Then $\Pr(S_{k}\in H)=1$ for all $k\geq 1$. Since $G\setminus H$ is a nonempty open set, and every nonempty open subset of $G$ has positive Haar measure, we have $\mu(H)<1$. Therefore $f=I_{H}$ does not satisfy (ii), giving a contradiction. ∎
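The failure of equidistribution for a non-adapted $\nu$ can be seen in a small simulation. The sketch below is only an illustration under assumed choices: $G=\mathbb{Z}_{6}$, $H=\{0,2,4\}$, uniform step distributions on the listed step sets, and a fixed random seed. It compares the fraction of time the walk spends in $H$ when the steps lie in $H$ with the same fraction for a step set that generates $\mathbb{Z}_{6}$ and is not contained in a coset of a proper subgroup.

```python
import random

def fraction_of_time_in(B, steps, num_steps, n, seed=0):
    """Fraction of k <= num_steps with S_k in B, for a walk on Z_n with uniform steps."""
    rng = random.Random(seed)
    position, hits = 0, 0
    for _ in range(num_steps):
        position = (position + rng.choice(steps)) % n
        hits += position in B
    return hits / num_steps

n = 6
H = {0, 2, 4}                 # proper subgroup with Haar measure 1/2

# Steps supported on H: the walk never leaves H, so the average of I_H is 1.
non_adapted = fraction_of_time_in(H, (2, 4), 10_000, n)

# Steps {1, 2}: adapted and strictly aperiodic, so the average should be near 1/2.
adapted = fraction_of_time_in(H, (1, 2), 10_000, n)

print(non_adapted, adapted)
```

In the first case the time average of $I_{H}(S_{k})$ is identically $1$, whereas $\mu(H)=1/2$, matching the contradiction used in the proof.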

5.3 Central limit theorem

Proof of Theorem 5.

In this proof implied constants will be universal. Claim (ii) follows from Corollary 10 (i). To see (i), fix a positive integer $N$ large enough in terms of $f$ , $\nu$ and $\delta$ , and let us prove (6). Let $E_{N}=N^{-\delta/(2+2\delta)}\log^{\delta/(1+\delta)}N$ and $K=\Delta\left(\|f\|_{2+\delta}/\sqrt{C(f,\nu)}\right)^{(2+\delta)/(1+\delta)}$ . Let us decompose the set $\{1,2,\dots,N\}$ into consecutive intervals of integers $H_{1},J_{1},\dots,H_{R},J_{R}$ , as in Section 4 (with the choice $M=0$ ), such that $|H_{i}|=\lceil 4\Delta\log N\rceil$ and $|J_{i}|=\Theta\left((\Delta/K^{2})N^{\delta/(1+\delta)}\log^{2/(1+\delta)}N\right)$ for all $1\leq i\leq R$ . As in the proof of Corollary 10, we have $\Delta_{|H_{i}|}\leq N^{-2}$ , and clearly the same holds for $\Delta_{|J_{i}|}$ .

Recall that

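$$\sum_{k=1}^{N}f(S_{k})=\sum_{i=1}^{R}\sum_{k\in J_{i}}f(S_{k})+\sum_{i=2}^{R}\sum_{k\in H_{i}}f(S_{k})+\sum_{k\in H_{1}}f(S_{k}),$$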

where $\sum_{i=1}^{R}\sum_{k\in J_{i}}f(S_{k})\overset{d}{=}\sum_{i=1}^{R}Y_{i}$ and $\sum_{i=2}^{R}\sum_{k\in H_{i}}f(S_{k})\overset{d}{=}\sum_{i=2}^{R}Z_{i}$ . From Lemma 2 and the classical Lyapunov condition (see e.g. [12, p. 154]) we get

[TABLE]

Here $\sum_{i=1}^{R}\mathbb{E}(Y_{i}^{*})^{2}=C(f,\nu)N+O_{f,\nu}(N^{1/(1+\delta)})$ , therefore the error of replacing the normalizing factor on the left hand side of (24) by $\sqrt{C(f,\nu)N}$ is $o(E_{N})$ . Similarly, $\sum_{i=2}^{R}Z_{i}^{*}$ also satisfies the central limit theorem with remainder term $O(KE_{N})$ . In particular,

[TABLE]

Applying this with $x=\sqrt{\log N}$ and noting that $1-\Phi(\sqrt{\log N})=O(N^{-1/2})=o(E_{N})$ , we obtain

[TABLE]
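The Gaussian tail estimate $1-\Phi(\sqrt{\log N})=O(N^{-1/2})=o(E_{N})$ used in the previous step is easy to check numerically. The sketch below evaluates $1-\Phi(x)=\frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$ at $x=\sqrt{\log N}$ and compares it with $N^{-1/2}$ and with $E_{N}$; the value $\delta=1$ and the sample values of $N$ are arbitrary illustrative assumptions.

```python
import math

def gaussian_tail(x):
    """1 - Phi(x) for the standard normal law, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def E(N, delta):
    """E_N = N^{-delta/(2+2*delta)} * (log N)^{delta/(1+delta)}, as in the proof."""
    return N ** (-delta / (2 + 2 * delta)) * math.log(N) ** (delta / (1 + delta))

delta = 1.0  # illustrative value; any delta > 0 gives an exponent below 1/2
rows = []
for N in (10**2, 10**4, 10**6, 10**8):
    tail = gaussian_tail(math.sqrt(math.log(N)))
    rows.append((N, tail, N ** -0.5, tail / E(N, delta)))

for N, tail, bound, ratio in rows:
    print(f"N={N:>9}  tail={tail:.3e}  N^(-1/2)={bound:.3e}  tail/E_N={ratio:.3e}")
```

The first comparison reflects the standard bound $1-\Phi(x)\leq e^{-x^{2}/2}$ for $x\geq 0$, and the ratio to $E_{N}$ tends to zero because $\delta/(2+2\delta)<1/2$.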

From Lemma 3 we get $\|\sum_{i=1}^{R}(Y_{i}-Y_{i}^{*})\|_{2}\ll_{f,\nu}\sum_{i=1}^{R}|J_{i}|\sqrt{\Delta_{|H_{i}|}}\ll_{f,\nu}1$ , hence the Chebyshev inequality gives

[TABLE]

We similarly deduce

[TABLE]

Finally, note that $\sup_{c\in G}\mathbb{E}f(cX_{1})^{2}<\infty$ implies $\sup_{k\geq 1}\mathbb{E}f(S_{k})^{2}<\infty$ . Therefore $\|\sum_{k\in H_{1}}f(S_{k})\|_{2}\ll_{f,\nu}\log N$ , and the Chebyshev inequality gives

[TABLE]

Combining (24)–(28) we thus have

[TABLE]

∎

We now prove the Remark made after Theorem 5. If $f\in L^{4}(G)$ , then instead of $\|Y_{i}^{*}\|_{3}\ll\|f\|_{3}\sqrt{\Delta|J_{i}|}$ , Lemma 2 gives the slightly better estimate $\|Y_{i}^{*}\|_{3}\leq\|Y_{i}^{*}\|_{4}\ll\|f\|_{2}\sqrt{\Delta|J_{i}|}+\|f\|_{4}\Delta^{3/4}|J_{i}|^{1/4}$ . Therefore if in the definition of $K$ we replace $\|f\|_{2+\delta}$ by $\|f\|_{2}$ , the Lyapunov condition gives that (24) and (25) hold with error terms $O(KE_{N})+o(E_{N})=O(KE_{N})$ . The rest of the proof remains unchanged.

Proof of Theorem 2.

The implication (i) $\Rightarrow$ (ii) follows from Theorem 5 and Proposition 8. The latter is needed to ensure $C(f,\nu)>0$. We now prove (ii) $\Rightarrow$ (i). The facts that $\nu$ is adapted, and that $(\nu^{*k})_{\mathrm{abs}}\neq 0$ for some $k\geq 1$, follow similarly to the proof of Theorem 1. Suppose that $\mathrm{supp}\,\nu$ is contained in a coset $aH$ of some proper closed normal subgroup $H\lhd G$. We have seen in Section 5.2 that the index $d=|G:H|$ is finite, and $G=\bigcup_{i=1}^{d}a^{i}H$. Note that if $k=nd+i$ for some $1\leq i\leq d$, then $S_{k}\in a^{i}H$ a.s. Let $f=I_{H}-\mu(H)$, where $\mu(H)=1/d$. Over any $d$ consecutive steps the walk visits each coset of $H$ exactly once, so the contributions of $f$ over a full period cancel, and we thus have $\left|\sum_{k=1}^{N}f(S_{k})\right|\leq 1-1/d$ a.s. Hence $N^{-1/2}\sum_{k=1}^{N}f(S_{k})$ cannot have a nondegenerate limit distribution. ∎
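The bounded partial sums in the last step can be observed directly. The following sketch simulates a walk that is adapted but not strictly aperiodic, under assumed illustrative choices: $G=\mathbb{Z}_{6}$, $H=\{0,2,4\}$, so $d=2$, and steps uniform on $\{1,3\}\subseteq 1+H$. It records the partial sums of $f=I_{H}-\mu(H)$ along the walk; the function name and the seed are hypothetical.

```python
import random

def partial_sums_periodic_walk(num_steps, seed=0):
    """Partial sums of f = I_H - mu(H) along a walk on Z_6 with steps in the coset 1 + H."""
    rng = random.Random(seed)
    n, H, steps = 6, {0, 2, 4}, (1, 3)
    mu_H = len(H) / n                      # Haar measure of H, equal to 1/d with d = 2
    position, total, sums = 0, 0.0, []
    for _ in range(num_steps):
        position = (position + rng.choice(steps)) % n
        total += (1.0 if position in H else 0.0) - mu_H
        sums.append(total)
    return sums

sums = partial_sums_periodic_walk(1000)
print(max(abs(s) for s in sums))           # stays within 1 - 1/d = 0.5
```

Whatever the random steps are, $S_{k}$ lies in $1+H$ for odd $k$ and in $H$ for even $k$, so the partial sums never leave $\{-1/2,0\}$ and $N^{-1/2}\sum_{k=1}^{N}f(S_{k})\to 0$.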

Bibliography

[1] M. Anoussis and D. Gatzouras: A spectral radius formula for the Fourier transform on compact groups and applications to random walks. Adv. Math. 188 (2004), no. 2, 425–443.
[2] A. Berger and S. N. Evans: A limit theorem for occupation measures of Lévy processes in compact groups. Stoch. Dyn. 13 (2013), no. 1, 1250008, 16 pp.
[3] I. Berkes and B. Borda: On the law of the iterated logarithm for random exponential sums. Trans. Amer. Math. Soc. 371 (2019), no. 5, 3259–3280.
[4] I. Berkes and M. Raseta: On the discrepancy and empirical distribution function of $\{n_{k}\alpha\}$. Unif. Distrib. Theory 10 (2015), no. 1, 1–17.
[5] R. N. Bhattacharya: Speed of convergence of the $n$-fold convolution of a probability measure on a compact group. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 25 (1972/73), 1–10.
[6] G. B. Folland: A Course in Abstract Harmonic Analysis. Second edition. CRC Press, Boca Raton, FL, 2016.
[7] B. Gelbaum, G. K. Kalisch and J. M. H. Olmsted: On the embedding of topological semigroups and integral domains. Proc. Amer. Math. Soc. 2 (1951), 807–821.
[8] Y. Kawada and K. Itô: On the probability distribution on a compact group. I. Proc. Phys.-Math. Soc. Japan (3) 22 (1940), 977–998.
