Random walks on the circle and Diophantine approximation
István Berkes, Bence Borda

TL;DR
This paper studies random walks on a circle and how they behave differently when the step size is rational or irrational.
Contribution
The paper introduces a new phenomenon in random walks on compact groups involving phase transitions in convergence rates.
Findings
Random walks with irrational span exhibit different behavior compared to rational spans.
A phase transition from polynomial to exponential decay occurs in the rational case after ≈q² steps.
Convergence rates differ between the Kolmogorov and total variation metrics.
Abstract
Random walks on the circle group R/Z whose elementary steps are lattice variables with span α∉Q or p/q∈Q taken mod Z exhibit delicate behavior. In the rational case, we have a random walk on the finite cyclic subgroup Z_q, and the central limit theorem and the law of the iterated logarithm follow from classical results on finite state space Markov chains. In this paper, we extend these results to random walks with irrational span α, and explicitly describe the transition of these Markov chains from finite to general state space as p/q→α along the sequence of best rational approximations. We also consider the rate of weak convergence to the stationary distribution in the Kolmogorov metric, and in the rational case observe a phase transition from polynomial to exponential decay after ≈q² steps. This seems to be a new phenomenon in the theory of random walks on compact groups. In contrast,…
Funding
- Nemzeti Kutatási Fejlesztési és Innovációs Hivatal (funder ID 10.13039/501100011019)
- Austrian Science Fund (funder ID 10.13039/501100002428)
Taxonomy
Topics: Stochastic processes and statistical mechanics · Geometric and Algebraic Topology · Mathematical Dynamics and Fractals
1 INTRODUCTION
Convergence of random walks on compact groups to the uniform distribution (Haar measure) is a classical topic in probability theory. Such convergence takes place under very general assumptions, but its finer properties are rather intricate. A classical example is the cutoff phenomenon of Aldous and Diaconis [2, 3, 16, Chapter 4] for random walks on the finite symmetric group corresponding to card shuffling: the distance from uniformity in the total variation metric remains close to 1 for a long time, and then drops almost immediately to near 0. In this paper, we consider random walks on the circle group R/Z, which also exhibit surprisingly delicate phenomena.
Let X_1, X_2, … be independent, identically distributed (i.i.d.) nondegenerate integer‐valued random variables, and put S_k = ∑_{j=1}^k X_j. Given an irrational α, the sequence S_kα (mod Z) is a random walk on R/Z whose asymptotic behavior depends sensitively on the Diophantine approximation properties of α. In the terminology of probability theory, S_kα (mod Z) is a discrete time Markov chain on a general state space, with the uniform distribution on R/Z as its stationary distribution. Note that the chain is irreversible, and, as we will see, its terms are polynomially weakly dependent. We can also consider a discrete analog by choosing a reduced fraction p/q instead of an irrational α. In this case, S_k p/q (mod Z) is a random walk (in particular, a discrete time Markov chain) on the finite cyclic group Z_q := {0, 1/q, …, (q−1)/q}. The main purpose of this paper is to study the asymptotic behavior of these random walks, and to explicitly describe the transition from finite to general state space as p/q→α along the sequence of best rational approximations to a given irrational α.
Starting with the irrational case, let
ψ(k) = sup_{0⩽x⩽1} |Pr({S_kα} ⩽ x) − x|
denote the rate of weak convergence to the uniform distribution in the Kolmogorov metric, where {·} denotes fractional part. Improving results of Schatte [25], Diaconis [16, Section 3.C], Su [29], and Hensley and Su [17], in [6], we found the precise relationship between ψ(k) and the Diophantine approximation properties of α. In particular, assuming that inf_{h∈Z∖{0}} |h|·∥hα∥ > 0, where ∥·∥ denotes distance from the nearest integer (such irrationals are called badly approximable), and that E X_1^2 < ∞, we proved that k^{−1/2} ≪ ψ(k) ≪ k^{−1/2}. See also [6] for sharp results under more general Diophantine conditions and for heavy‐tailed X_1. For instance, assuming that inf_{h∈Z∖{0}} |h|^γ ∥hα∥ > 0 with some γ⩾1, we have ψ(k) ≪ k^{−1/(2γ)}, and this is also sharp. For similar results on the real line instead of the circle group, we refer to Bobkov [10, 11].
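When X_1 is finitely supported, S_k takes finitely many values, so the Kolmogorov distance ψ(k) = sup_x |Pr({S_kα} ⩽ x) − x| can be computed exactly for small k. The following is a minimal Python sketch of that computation. The step distribution (X_1 uniform on {1, 2}), the choice α = (√5−1)/2, and the helper names `pmf_of_sum` and `kolmogorov_distance` are illustrative assumptions, not part of the paper.

```python
import math

def pmf_of_sum(step_pmf, k):
    """Exact pmf of S_k = X_1 + ... + X_k for i.i.d. integer steps (dict: value -> prob)."""
    dist = {0: 1.0}
    for _ in range(k):
        nxt = {}
        for s, ps in dist.items():
            for x, px in step_pmf.items():
                nxt[s + x] = nxt.get(s + x, 0.0) + ps * px
        dist = nxt
    return dist

def kolmogorov_distance(step_pmf, alpha, k):
    """psi(k) = sup over 0 <= x <= 1 of |Pr({S_k alpha} <= x) - x|."""
    atoms = sorted(((n * alpha) % 1.0, w) for n, w in pmf_of_sum(step_pmf, k).items())
    best, cdf = 0.0, 0.0
    for y, w in atoms:
        best = max(best, abs(cdf - y))  # left limit of the CDF at the atom
        cdf += w
        best = max(best, abs(cdf - y))  # value of the CDF at the atom
    return best

# Assumed example: X_1 uniform on {1, 2}; alpha is the golden ratio conjugate,
# a standard badly approximable number.
step = {1: 0.5, 2: 0.5}
alpha = (math.sqrt(5) - 1) / 2
for k in (1, 4, 16, 64):
    print(k, kolmogorov_distance(step, alpha, k))
```

The supremum only needs to be checked at the atoms, because between atoms the distribution function is constant while x increases. Multiplying the printed values by k^{1/2} gives a rough numerical view of the k^{−1/2} rate; floating-point rounding of nα limits the sketch to moderate k.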
In the rational case, it is natural to define the distance from uniformity in the “discrete Kolmogorov metric” as
where the maximum is over integer values of a. Note that the discrete Kolmogorov metric metrizes weak convergence on Z_q, and so, ψ_disc(k)→0 if and only if the maximal span of X_1 (the largest integer D such that D ∣ (X_1−X_2) a.s.) is relatively prime to q. Our first result establishes a precise phase transition from polynomial to exponential decay in the discrete setting, which is largely independent of the underlying distribution. This behavior has not yet been described for a random walk on any compact group in any probability metric.
Theorem 1. Assume that E X_1^2 < ∞, and let φ(x) = E e^{ixX_1} denote the characteristic function of X_1. There exists a constant q_0 depending only on the distribution of X_1 with the following property. If p/q is a reduced fraction such that q ⩾ q_0, the maximal span D of X_1 and q are relatively prime, and min_{0<|h|⩽q/2} |h|·∥hp/q∥ ⩾ A > 0 with some constant A>0, then
and
The implied constants in the lower bounds depend only on the distribution of X1. The implied constants in the upper bounds depend, in addition, on A.
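The Diophantine condition of Theorem 1, min over 0 < |h| ⩽ q/2 of |h|·∥hp/q∥ ⩾ A, can be evaluated exactly for any given reduced fraction. The Python sketch below does so with rational arithmetic; the function names and the two sample fractions (a ratio of consecutive Fibonacci numbers, and 1/89) are assumptions chosen for illustration only.

```python
from fractions import Fraction

def dist_to_nearest_int(x):
    """||x||: distance from the rational x to the nearest integer, computed exactly."""
    frac = x - (x.numerator // x.denominator)  # fractional part in [0, 1)
    return min(frac, 1 - frac)

def diophantine_constant(p, q):
    """min of |h| * ||h p / q|| over 0 < |h| <= q/2 (even in h, so h > 0 suffices).

    Requires q >= 2 so that the range of h is nonempty.
    """
    return min(h * dist_to_nearest_int(Fraction(h * p, q)) for h in range(1, q // 2 + 1))

# 55/89 has small partial quotients; 1/89 has a single large one.
print(diophantine_constant(55, 89))
print(diophantine_constant(1, 89))
```

Comparing the two printed values illustrates why the constant A is tied to the size of the partial quotients rather than to q itself: both fractions have the same denominator, but only the first keeps the minimum bounded away from 0.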
Note that |φ(2π/(Dq))|^k is roughly exp(−ck/q^2) with c = 2π^2 (Var X_1)/D^2. The value of A>0 depends only on the maximal partial quotient in the continued fraction expansion of p/q, but not on its length. In particular, if p/q is a best rational approximation to a given badly approximable irrational α, this condition is satisfied with an A>0 depending only on α. In this case, for the first few steps, the Markov chains S_kα (mod Z) and S_k p/q (mod Z) are practically indistinguishable. Theorem 1 describes with striking precision that it takes constant times† q^2 steps for S_k p/q (mod Z) to start to behave like a Markov chain on a finite state space; exponentially fast convergence to the stationary distribution is a hallmark property of finite state space Markov chains. We have an explicitly described transition from rational to irrational behavior: as p/q→α, the time of transition, constant times q^2, goes to infinity, and in the limiting case of S_kα (mod Z), we end up with the polynomial decay k^{−1/2} for all k⩾1. Obvious modifications of the proof of Theorem 1 yield similar results under more general Diophantine conditions on p/q, and for heavy‐tailed X_1.
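The approximation of |φ(2π/(Dq))|^k by exp(−ck/q^2) with c = 2π^2 (Var X_1)/D^2 is easy to check numerically. The sketch below is only an illustration under assumed inputs: the step distribution, the values of q and k, and the helper names `char_fn`, `maximal_span` and `decay_factors` are not from the paper.

```python
import cmath
import math
from functools import reduce

def char_fn(step_pmf, t):
    """Characteristic function phi(t) = E exp(i t X_1) of a finitely supported step."""
    return sum(p * cmath.exp(1j * t * x) for x, p in step_pmf.items())

def maximal_span(step_pmf):
    """D = gcd of all differences of support points (support must have >= 2 points)."""
    support = sorted(step_pmf)
    return reduce(math.gcd, (x - support[0] for x in support[1:]))

def decay_factors(step_pmf, q, k):
    """Return (|phi(2 pi / (D q))|^k, exp(-c k / q^2)) with c = 2 pi^2 Var(X_1) / D^2."""
    D = maximal_span(step_pmf)
    mean = sum(p * x for x, p in step_pmf.items())
    var = sum(p * (x - mean) ** 2 for x, p in step_pmf.items())
    exact = abs(char_fn(step_pmf, 2 * math.pi / (D * q))) ** k
    c = 2 * math.pi ** 2 * var / D ** 2
    return exact, math.exp(-c * k / q ** 2)

# Assumed example: X_1 uniform on {1, 2}, q = 101, and k = q^2 steps.
print(decay_factors({1: 0.5, 2: 0.5}, 101, 101 ** 2))
```

Taking k of order q^2 is the natural regime to look at, since that is where the exponential factor stops being close to 1.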
In contrast, for any q⩾q_0, the distance ψ_TV(k) of the distribution of {S_k p/q} from uniformity in the total variation metric satisfies
where q_0 and the implied constants depend only on the distribution of X_1, provided that E X_1^2 < ∞ and that D and q are relatively prime. The rate does not depend on the Diophantine properties of the fraction p/q. Indeed, multiplication by an integer p relatively prime to q is a bijection of Z_q, and thus, does not change the distance in total variation. In particular, it takes constant times q^2 steps to get close to uniformity in the total variation metric as well; however, there is no transition from polynomial to exponential decay. Note that in the irrational setting, S_kα (mod Z) does not converge to the uniform distribution on R/Z in total variation. Special cases of (2) were first proved by Chung, Diaconis, and Graham [15], see also [3, 16, Section 3.C].
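The invariance of the total variation distance under multiplication by p can be seen directly on a small example. The following Python sketch computes the exact distribution of S_k p (mod q) by repeated cyclic convolution and compares two values of p; the step distribution, q = 13, and the function names are illustrative assumptions.

```python
import math

def walk_distribution(step_pmf, p, q, k):
    """Distribution of S_k * p mod q; entry a is the mass at the point a/q of Z_q."""
    dist = [0.0] * q
    dist[0] = 1.0
    for _ in range(k):
        nxt = [0.0] * q
        for a, pa in enumerate(dist):
            if pa:
                for x, px in step_pmf.items():
                    nxt[(a + x * p) % q] += pa * px
        dist = nxt
    return dist

def total_variation(dist):
    """Total variation distance from the uniform distribution on Z_q."""
    q = len(dist)
    return 0.5 * sum(abs(m - 1.0 / q) for m in dist)

# Assumed example: X_1 uniform on {1, 2}, q = 13, and p = 1 versus p = 5.
step = {1: 0.5, 2: 0.5}
for k in (5, 20, 80):
    tv1 = total_variation(walk_distribution(step, 1, 13, k))
    tv5 = total_variation(walk_distribution(step, 5, 13, k))
    print(k, tv1, tv5)
```

The two distributions are permutations of each other, so the two printed distances agree up to rounding for every k.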
In addition to weak convergence, we also study the empirical distribution by considering additive functionals, that is, sums of the form ∑_{k=1}^N f(S_kα). Let‡ F denote the set of all one‐periodic functions f: R→R such that f is of bounded variation on [0,1], let E(f) = ∫_0^1 f(x) dx, and set
where f̄ = f − E(f), and U is a random variable uniformly distributed on R/Z, independent of X_1, X_2, …. Our second result is the central limit theorem (CLT) and the law of the iterated logarithm (LIL) for the sum ∑_{k=1}^N f(S_kα).
Theorem 2. Assume that X_1>0 a.s., and that E X_1<∞. Let α be irrational such that inf_{h∈Z∖{0}} |h|^γ ∥hα∥ > 0 with some constant 1 ⩽ γ < (3+√5)/4 ≈ 1.309. Then for any f∈F the series in (3) is convergent, and C(α,f)⩾0 with equality if and only if f̄=0 a.e. For any f∈F, the sum ∑_{k=1}^N f(S_kα) satisfies the CLT
and the LIL
with σ=C(α,f).
Note that (4) expresses convergence in distribution to the mean zero normal distribution with variance σ^2 (interpreted as the constant 0 if σ=0). In addition to badly approximable irrationals, the Diophantine condition is satisfied by all algebraic irrationals and also by a.e. real number in the sense of the Lebesgue measure. We also prove an almost sure approximation of the same sum by a Wiener process.
Theorem 3. Assume that the conditions of Theorem 2 hold, and let f∈F. After a suitable extension of the probability space, there exists a stochastic process ζ(t) in the Skorokhod space D[0,∞) with the same distribution as ∑_{1⩽k⩽t} f(S_kα) such that
with σ=C(α,f), a standard Wiener process W(t), and some constant ε>0 depending only on γ.
The almost sure approximation in Theorem 3 immediately implies the CLT and the LIL in Theorem 2, as well as the almost sure asymptotics and the limit distribution of more general functionals of the process ∑_{1⩽k⩽t} f(S_kα). Using the piecewise linear functions ∑_{1⩽k⩽⌊t⌋} f(S_kα) + (t−⌊t⌋) f(S_{⌊t⌋+1}α) instead, Theorem 3 holds in the space of continuous functions C[0,∞).
In a previous paper [6], we observed a transition in the behavior of the chain from weak to strong dependence as γ passes the critical value 2: the sequence S_kα (mod Z) behaves, from the point of view of discrepancy, like independent random variables if 1⩽γ<2, but not when γ>2, see Section 2. For this reason, we conjecture that Theorems 2 and 3 in fact hold for all 1⩽γ<2.
Starting the chain from its stationary distribution corresponds to U+S_kα (mod Z), where U is as in (3). This stationary sequence exhibits the same transition from weak to strong dependence: the sum ∑_{k=1}^N f(U+S_kα), f∈F, satisfies the CLT and the LIL if 1⩽γ<2, but not when γ⩾2. The same holds in the quenched setting, that is, the sum ∑_{k=1}^N f(x+S_kα), f∈F, satisfies the CLT and the LIL for a.e. x∈R if 1⩽γ<2, but not when γ⩾2. A detailed proof of these results will be given in an upcoming paper. The novelty of Theorems 2 and 3 lies in the fact that the chain is started from a nonstationary distribution; in fact, from the specific point x=0 instead of a typical x∈R. Using x=0 as starting point makes the problem considerably harder, and requires a blend of arithmetic and Fourier analytic arguments to complement classical methods of probability theory.
The rational case is, of course, much simpler. Indeed, if the maximal span of X_1 is relatively prime to q, then S_k p/q (mod Z) is an irreducible Markov chain on Z_q, and from classical theorems for finite state space Markov chains [14, Chapter 16], it follows that for any f∈F, the CLT
and the LIL
hold with σ = C(p/q,f). The only difference is that E(f) = q^{−1} ∑_{a=0}^{q−1} f(a/q) is the average of f on Z_q, and that in the definition of C(p/q,f), the variable U is uniformly distributed on Z_q. In this case, C(p/q,f)⩾0 with equality if and only if f̄=0 on Z_q.
Informally, the Markov chain S_k p/q (mod Z), whose state space is finite but of increasing size as q→∞, begins to behave more and more like the Markov chain S_kα (mod Z) with an irrational α. What we can formally say is that key parameters of these Markov chains, such as the expected value and the variance, converge:
as p/q→α along the sequence of best rational approximations to a given irrational α, see Proposition 4. Once again, we have an explicitly described transition from rational to irrational behavior, with Theorem 2 corresponding to the limiting case.
The rest of the paper is organized as follows. Several previous results related to Theorems 2 and 3 are listed in Section 2. We prove Theorems 2 and 3 in Section 3, and the remarks made about the rational case at the end of Section 3.3. The proof of Theorem 1 and (2) is given in Section 4.
2 RELATED RESULTS
Given a sequence of i.i.d. R/Z‐valued random variables ζ_1, ζ_2, …, the qualitative behavior of the random walk Z_k = ∑_{j=1}^k ζ_j (mod Z) is fairly straightforward. A classical result of Lévy [19] states that Z_k converges in distribution to the uniform distribution on R/Z if and only if the distribution of ζ_1 is not supported on a translate of a finite cyclic subgroup. The corresponding ergodic theorem is due to Robbins [24]: we have
for all continuous functions f: R/Z→R if and only if the distribution of ζ_1 is not supported on a finite cyclic subgroup. Note that (8) expresses weak convergence of the empirical distribution N^{−1} ∑_{k=1}^N δ_{Z_k}, with δ denoting the Dirac measure, in other words, the equidistribution of the random sequence Z_k. Both facts generalize to compact groups.
Quantitative forms of the ergodic theorem (8) are more involved. In [8], we proved that
holds with some constant 0<σ<∞ if and only if ζ_1 is nondegenerate. A remarkable perturbation method first used by Schatte [26, 27] allows more general test functions. Improving results of Schatte, in [7], we showed that the sum ∑_{k=1}^N f(Z_k) satisfies the CLT and the LIL for any function f∈F, provided that the rate of weak convergence of Z_k to uniformity in the Kolmogorov metric
satisfies ψ(k) ≪ k^{−(1+ε)} with some ε>0. We conjecture that the assumption on the rate is optimal. Similar results are known for p‐Hölder functions (resp. bounded Borel measurable functions) with the Kolmogorov metric replaced by the p‐Wasserstein metric (resp. total variation metric). In fact, these hold on any compact group [12, 13].
The assumption ψ(k) ≪ k^{−(1+ε)} holds for a wide class of distributions (e.g., under Cramér's condition), but not when ζ_1 is a real‐valued lattice variable with finite expectation taken mod Z as in Theorems 2 and 3. Indeed, in the latter case by the Markov inequality and the pigeonhole principle, the distribution of Z_k has an atom of weight ≫ k^{−1}, and thus, ψ(k) ≫ k^{−1}; under a finite variance condition, we even have ψ(k) ≫ k^{−1/2}. The main goal of this paper is to establish quantitative ergodic theorems for such random walks. Instead of fast enough convergence in the Kolmogorov metric, the crucial assumption is E|X_1|<∞ and E X_1≠0. For technical reasons, in Theorems 2 and 3, we assume the slightly stronger condition X_1>0 a.s. and E X_1<∞. Proving the CLT (4) and the LIL (5) solely under a condition on the expected value presents considerable arithmetic and analytic difficulties; consequently, the proof of Theorems 2 and 3 is much more technical than under ψ(k) ≪ k^{−(1+ε)} in [7]. We do not know whether these methods generalize to other compact groups.
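The Markov inequality and pigeonhole argument for the atom can be made concrete as follows: Pr(|S_k| ⩽ 2kE|X_1|) ⩾ 1/2, and the interval [−2kE|X_1|, 2kE|X_1|] contains at most 4kE|X_1|+1 integers, so S_k has an atom of weight at least 1/(2(4kE|X_1|+1)). The Python sketch below compares this bound with the exact largest atom; the step distribution and function names are assumptions made for illustration.

```python
def pmf_of_sum(step_pmf, k):
    """Exact pmf of S_k = X_1 + ... + X_k for i.i.d. integer steps (dict: value -> prob)."""
    dist = {0: 1.0}
    for _ in range(k):
        nxt = {}
        for s, ps in dist.items():
            for x, px in step_pmf.items():
                nxt[s + x] = nxt.get(s + x, 0.0) + ps * px
        dist = nxt
    return dist

def largest_atom(step_pmf, k):
    """Weight of the heaviest atom of S_k."""
    return max(pmf_of_sum(step_pmf, k).values())

def markov_pigeonhole_bound(step_pmf, k):
    """Lower bound 1 / (2 (4 k E|X_1| + 1)) from Markov's inequality and pigeonhole."""
    mean_abs = sum(p * abs(x) for x, p in step_pmf.items())
    return 1.0 / (2.0 * (4.0 * k * mean_abs + 1.0))

# Assumed example: a two-point step distribution with nonzero mean.
step = {-1: 0.3, 2: 0.7}
for k in (1, 10, 40):
    print(k, largest_atom(step, k), markov_pigeonhole_bound(step, k))
```

For irrational α the map n ↦ {nα} is injective on the integers, so each atom of S_k gives an atom of the same weight for the walk on the circle.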
From a broader perspective, our results fit into the subject of subsequences {n_kα} of the classical {nα} sequence. For given n_k and α, quantitative equidistribution results are notoriously difficult to prove, and are known only in very special cases. Considerable effort has been made to understand the case of a randomly chosen α. R. Baker [4] showed that for any strictly increasing sequence of positive integers n_k, the mod 1 discrepancy
satisfies D_N(n_kα) ≪ N^{1/2}(log N)^{3/2+ε} for a.e. α, and this is known to be sharp up to factors of log N. For a fixed f∈F with ∫_0^1 f(x) dx = 0, Lewko and Radziwiłł [20] improved this to
which is known to be sharp up to factors of log log N. See also [1, 9]. For lacunary sequences n_k, both D_N(n_kα) and ∑_{k=1}^N f(n_kα) with a fixed f∈F satisfy a sharp LIL for a.e. α, see Philipp [23].
In contrast, our results concern the case when n_k is random and α is deterministic. Note that under X_1>0 a.s., {S_kα} is a random subsequence of {nα}. In [6], we found the discrepancy D_N(S_kα) up to logarithmic factors for a large class of distributions, as an analog of Baker's theorem. Theorems 2 and 3 in the present paper represent an improvement for a fixed f∈F similar to (9). In particular, for certain special distributions with E|X_1|<∞, E X_1≠0 and an irrational α satisfying 0 < lim inf_{h→∞} h^γ ∥hα∥ < ∞, in [6] we showed that D_N(S_kα) is, up to logarithmic factors, N^{max{1/2, 1−1/γ}} a.s. We thus have a transition from weak dependence to strong dependence at γ=2. We also mention that in [7], we actually proved that D_N(Z_k) satisfies the LIL and found the nondegenerate limit distribution of N^{−1/2} D_N(Z_k) under the assumption ψ(k) ≪ k^{−(1+ε)}. The assumptions of Theorem 2, however, seem not to be strong enough to find the sharp asymptotics of the discrepancy.
The growth rate of the integer sequence S_k plays an important role in our setup as well. Indeed, under the condition X_1>0 a.s. and E X_1<∞ of Theorems 2 and 3, S_k is a linearly increasing sequence of integers, and we have the CLT (4) and the LIL (5) with a fixed f∈F. If X_1 has heavy tails Pr(|X_1|⩾x) ∼ c x^{−β} with some 0<β<1 and c>0 instead, and inf_{h∈Z∖{0}} |h|^γ ∥hα∥ > 0 with some γ<1/β, then by the results in [5], we have ψ(k) ≪ k^{−1/(βγ)} ≪ k^{−(1+ε)}, and consequently, the precise asymptotics of D_N(S_kα) is also known. Note that in this heavy‐tailed case, S_k grows, in a stochastic sense, roughly at the rate k^{1/β}. In particular, for a random version of Philipp's LIL for the discrepancy, polynomial growth suffices instead of lacunarity.
3 PROOF OF THEOREMS 2 AND 3
Throughout this section, X_1, X_2, … is a sequence of i.i.d. nondegenerate integer‐valued random variables with characteristic function φ(x) = E e^{ixX_1}, and S_k = ∑_{j=1}^k X_j. Further, e(x) = e^{2πix}, and f̂(h) = ∫_0^1 f(x) e(−hx) dx, h∈Z are the Fourier coefficients of f. Finally, V(f) denotes the total variation of f on [0,1].
3.1 A lemma on characteristic functions
In this section, we prove a technical lemma on the characteristic function φ. Let supp X_1 = {n∈Z : Pr(X_1=n)>0} denote the support of (the distribution of) X_1. Further, let gcd(A) denote the positive greatest common divisor, and A−A = {a−b : a,b∈A} the set of differences of a (finite or infinite) set A⊆Z.
Lemma 1. Assume that E|X_1|<∞ and E X_1≠0, and let d = gcd(supp X_1) and D = gcd(supp X_1 − supp X_1).
- (i) For any integer N⩾1 and any x,y∈R such that d(x−y)∉Z,
In particular, for any x∈R, we have |1−φ(2πx)| ≫ ∥dx∥.
- (ii) For any positive integers B, B′, both divisible by D/d, such that min{B,B′} ⩽ √(B+B′), and any x∈R such that Dx∉Z,
The implied constants in (i) and (ii) depend only on the distribution of X1.
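The two arithmetic quantities d = gcd(supp X_1) and D = gcd(supp X_1 − supp X_1) in Lemma 1 control where the characteristic function reaches the unit circle: |φ(2πx)| = 1 exactly when Dx is an integer. A minimal Python sketch, assuming a finitely supported step distribution (the distribution and helper names below are illustrative, not from the paper):

```python
import cmath
import math
from functools import reduce

def support_gcds(step_pmf):
    """Return (d, D): gcd of the support and gcd of its pairwise differences."""
    support = sorted(step_pmf)
    d = reduce(math.gcd, (abs(x) for x in support))
    D = reduce(math.gcd, (x - support[0] for x in support[1:]))
    return d, D

def char_fn(step_pmf, t):
    """Characteristic function phi(t) = E exp(i t X_1)."""
    return sum(p * cmath.exp(1j * t * x) for x, p in step_pmf.items())

# Assumed example: support {1, 4, 7}, so d = 1 and every step is 1 mod D = 3.
step = {1: 0.2, 4: 0.3, 7: 0.5}
d, D = support_gcds(step)
omega = cmath.exp(2j * math.pi / D)  # primitive D-th root of unity e(a/D) with a = 1
for n in range(D):
    print(n, char_fn(step, 2 * math.pi * n / D), omega ** n)
```

In this example the printed pairs agree up to rounding, matching the description of the curve φ(2πx) touching the unit circle at the Dth roots of unity, while at other points such as x = 1/2 the modulus is strictly less than 1.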
Throughout this proof, constants and implied constants depend only on the distribution of X_1. Replacing X_1, X_2, … by X_1/d, X_2/d, …, we may assume that d=1. In particular, the smallest period of φ(2πx) is 1. Note that X_1−X_2 has characteristic function |φ|^2, and supp(X_1−X_2) = supp X_1 − supp X_1. It is thus easy to deduce that |φ(2πx)| is 1/D‐periodic, and that |φ(2πx)|=1 if and only if x=n/D with some integer n. In fact, we have the estimate
Indeed, note that
Since sin^2(πx(X_1−X_2)) ⩾ 4x^2(X_1−X_2)^2 whenever |x(X_1−X_2)| ⩽ 1/2, we get
Here, E(X_1−X_2)^2 > 0 (possibly infinite) as X_1 is nondegenerate; therefore, (10) holds in an open neighborhood of 0. By periodicity, (10) holds for all x ∈ ⋃_{n∈Z} (n/D−η, n/D+η) with a small η>0. On the other hand, for η⩽x⩽1/D−η, the function |φ(2πx)| is bounded away from 1 by continuity. This establishes (10) for all x∈R.
Furthermore, there exists an integer a such that X_1 ≡ a (mod D) a.s., and the assumption d=1 ensures that a and D are relatively prime. Hence, for any integer n, we have e(nX_1/D) = ω^n a.s. with the primitive Dth root of unity ω = e(a/D). This means that within a period x∈[0,1), the curve φ(2πx) touches the unit circle at each Dth root of unity exactly once. Moreover, the derivative at these points is nonzero. Indeed, we have
We now prove (i). Note that
The assumption E|X1|<∞ implies that the derivative φ′ is uniformly continuous, which, in turn, ensures that the convergence
is uniform as |x−y|→0. Thus, it is not difficult to see that there exists a constant δ>0 such that for any x,y ∈ J = ⋃_{n∈Z} (n/D−δ, n/D+δ), we have
Since |φ(2πx)|≠1 outside J, using the compactness of [0,1]∖J and the periodicity of |φ|, we also have |φ(2πx)| ⩽ 1−r for every x∈R∖J with some constant r>0. Let U = {x∈R : |φ(2πx)| > 1−r/2} ⊆ J.
Now let x,y∈R be such that x−y∉Z. If x,y∈U, then by (12), we have
If x,y∉U, then |φ(2πx)|⩽1−r/2 and |φ(2πy)|⩽1−r/2, hence
Finally, suppose that, say, x∈U and y∉U. If y∈J, then (12) and (13) still hold. If y∉J, then |φ(2πx)|>1−r/2 and |φ(2πy)|⩽1−r, therefore
This finishes the proof of the first claim in (i). The second claim in (i) follows, for example, from setting y=0 and letting N→∞ in the first claim (although it would not be difficult to give a direct proof).
To prove (ii), first note that B and B′ play almost symmetric roles, that is,
Thus, we may assume that B⩽B′, and so, B ⩽ √(B+B′). From (11), we deduce the asymptotics
Let, say, K=3π|EX1|, and recall (10). There exists a constant δ′>0 such that for any x∈J′=⋃n∈Z(n/D−δ′,n/D+δ′) the o(1) term in (14) has absolute value less than 1/2, and so,
with some constant L>0, where n=n(x) is the integer for which |x−n/D|<δ′. We may assume Kδ′<1. As in the proof of (i), for any x∈R∖J′, we have |φ(2πx)|⩽1−r′ with some constant r′>0. To proceed, we will also need the simple estimate
which holds with implied constant e − 2. (This can be seen, e.g., by expanding (1+(z−1))^m.)
Now let x∈R be such that Dx∉Z. If x∈R∖J′, then |1−φ(2πx)^{B+B′}| ⩾ r′, and we are done. If δ′/√(B+B′) ⩽ |x−n/D| < δ′ for some integer n, then from (15), we get
and again, we are done.
Next, assume δ′/(B+B′) ⩽ |x−n/D| < δ′/√(B+B′) for some integer n. From (15), we similarly get
From (15), we also have |φ(2πx)−ω^n| ⩽ K|x−n/D| < Kδ′/√(B+B′) < 1/B. Hence, (16) with z = ω^{−n}φ(2πx) and m=B gives
Here, ω^{−nB} = 1 because B was assumed to be divisible by D. Therefore, we have |1−φ(2πx)^B| / |1−φ(2πx)^{B+B′}| ≪ B, and we are done.
Finally, assume 0 < |x−n/D| < δ′/(B+B′) for some integer n. Then, (15) gives the stronger bound |φ(2πx)−ω^n| ⩽ K|x−n/D| < 1/(B+B′). Applying (16) with z = ω^{−n}φ(2πx) and m=B, m=B+B′, respectively, we get
Here ω^{−nB} = ω^{−n(B+B′)} = 1; therefore, |1−φ(2πx)^B| / |1−φ(2πx)^{B+B′}| ≪ B/(B+B′), and we are done.□
3.2 An exponential sum
In this section, we approximate a general sum involving f by an exponential sum.
Lemma 2. For any f∈F, any x_1, x_2, …, x_N ∈ R and any y∈[−1/2,1/2],
where the supremum and the infimum are over all intervals J⊆R of length λ(J)=|y|, and I_J^*(x) = ∑_{n∈Z} I_J(x+n) denotes the indicator of J extended with period 1.
For any f∈F and any random variables X and Y,
where {·} denotes fractional part. This fact is usually stated when the distribution of X is finitely supported with equal weights, and Y is uniformly distributed on [0,1], see Koksma's inequality [18, p. 143]. The general case formally follows from integration by parts. For a detailed proof, see [7, Lemma 1].
Let us apply (17) to the random variables X and Y with distribution N^{−1} ∑_{k=1}^N δ_{x_k} and N^{−1} ∑_{k=1}^N δ_{x_k−y}, respectively. If 0⩽y⩽1/2, then for any x∈[0,1], we have
A similar argument shows that the same holds if −1/2⩽y⩽0, and the claim follows.□
Lemma 3. Let f∈F, and let x_1, x_2, …, x_N ∈ R be such that ∥x_k−x_ℓ∥ ⩾ r > 0 for all k≠ℓ. For any integer H>1,
with a universal implied constant.
Let F_H(x) = ∑_{|h|<H} (1−|h|/H) e(hx) denote the Fejér kernel, and recall the convolution identity
Applying this with x=x1,x2,⋯,xN and using the fact that the total integral of FH on [−1/2,1/2] is 1, the error term in the claim can be written in the explicit form
The assumption ∥x_k−x_ℓ∥ ⩾ r, k≠ℓ and the pigeonhole principle imply that the periodic extension of any interval of length |y| contains at most |y|/r+1 of the points x_1, x_2, …, x_N. According to Lemma 2, we thus have
The claim then follows from the estimate ∫_{−1/2}^{1/2} |y| F_H(y) dy ≪ (log H)/H, which can be seen directly from the definition of F_H.□
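The Fejér kernel estimate used at the end of this proof can be checked numerically. The sketch below uses the standard closed form F_H(y) = sin^2(πHy)/(H sin^2(πy)) and a midpoint rule on [−1/2, 1/2]; the function names, the number of nodes, and the sample values of H are assumptions made for illustration.

```python
import math

def fejer_kernel(H, y):
    """F_H(y) = sum over |h| < H of (1 - |h|/H) e(hy), via its closed form."""
    s = math.sin(math.pi * y)
    if abs(s) < 1e-15:
        return float(H)  # limiting value at integer y
    return (math.sin(math.pi * H * y) / s) ** 2 / H

def fejer_moments(H, n_nodes=20000):
    """Midpoint-rule values of the integrals of F_H and |y| F_H over [-1/2, 1/2]."""
    step = 1.0 / n_nodes
    total = first_abs = 0.0
    for i in range(n_nodes):
        y = -0.5 + (i + 0.5) * step
        w = fejer_kernel(H, y) * step
        total += w
        first_abs += abs(y) * w
    return total, first_abs

for H in (8, 32, 128):
    total, first_abs = fejer_moments(H)
    # The normalized ratio stays bounded, consistent with the (log H)/H estimate.
    print(H, total, first_abs * H / math.log(H))
```

The first integral should come out as 1 up to rounding, and the normalized first absolute moment should remain bounded as H grows.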
3.3 The variance
In this section, we prove two lemmas closely related to the variance of ∑k=1Nf(Skα). In particular, we find the variance of the corresponding stationary process ∑k=1Nf(U+Skα), and prove the properties of the constant C(α,f). At the end of the section, we prove the remarks made in the Introduction about the rational case.
We will need the fact that the distance from the nearest integer function is symmetric and subadditive, that is, ∥−x∥=∥x∥ and ∥x+y∥⩽∥x∥+∥y∥ for any x,y∈R. Further, we will need the classical Diophantine estimate that states that for any irrational α such that infh∈Z∖{0}|h|γ∥hα∥>0 and any integer H>1,
with implied constants depending only on α and γ. A similar estimate claims ∑_{h=1}^H 1/(h^2 ∥hα∥^2) ≪ log H if γ=1. For a detailed proof of (18), we refer to [6, Corollary 4.3]. For related results on Diophantine sums, see [18, Chapter 2].
Lemma 4. Assume that E|X_1|<∞ and E X_1≠0, and let α be irrational such that inf_{h∈Z∖{0}} |h|^γ ∥hα∥ > 0 with some constant γ⩾1. Further, let H>1 be an integer, and let c_h∈C, 0<|h|<H be a finite sequence such that |c_h| ⩽ 1/|h| for every h. For any integers M⩾0 and N⩾1,
with implied constants depending only on the distribution of X1, α, and γ.
Let d=gcd(suppX1). Replacing X1,X2,⋯ by X1/d,X2/d,⋯ and α by dα, we may assume that d=1. Let V denote the left‐hand side of the claim. Expanding the square, we get
Here,
First, we estimate the off‐diagonal terms j≠h in (19). Let a=φ(−2πhα), b=φ(2πjα), and c=φ(2π(j−h)α). Summing over M+1⩽k,ℓ⩽M+N, we obtain
Lemma 1 (i) shows that |∑_{k=0}^{N−1} c^k a^{N−k−1}| ≪ 1/∥jα∥, |∑_{k=0}^{N−1} c^k b^{N−k−1}| ≪ 1/∥hα∥, 1/|1−a| ≪ 1/∥hα∥, 1/|1−b| ≪ 1/∥jα∥, and 1/|1−c| ≪ 1/∥(j−h)α∥. The contribution of all off‐diagonal terms in V is thus
Let E_H = log^4 H if γ=1, and E_H = H^{2γ−2} if γ>1 denote the error term in the claim. The classical estimate (18) shows that here the contribution of 1/(|j|·∥jα∥·|h|·∥hα∥) is indeed O(E_H). Note that if ∥jα∥ ⩽ ∥hα∥/2, then ∥(j−h)α∥ ⩾ ∥hα∥/2; hence, the contribution of all such terms is also O(E_H). A similar claim holds if ∥hα∥ ⩽ ∥jα∥/2. Thus, it is enough to consider the terms for which ∥jα∥ and ∥hα∥ are equal up to a factor of 2. In particular, we need to estimate
This is symmetric in j,h; thus, it is enough to consider the terms for which, say, |j|⩽|h|. For any such term, |h| ⩾ |j−h|/2. Hence, summing over j and i=j−h≠0 instead of j and h, we get from (18) that (21) is
The total contribution of all off‐diagonal terms j≠h in (19) is thus O(E_H).
Next, we estimate the diagonal terms j=h in (19). Using (20) and the fact that φ(−2πhα) is the complex conjugate of φ(2πhα), after some simplification, we get
Finally, applying Lemma 1(i) and the classical estimate (18), we get that the total contribution of this error term in (19) also satisfies
□
Lemma 5. Assume that E|X_1|<∞ and E X_1≠0, and let U be uniformly distributed on R/Z, independent of X_1, X_2, …. Further, let α be irrational such that inf_{h∈Z∖{0}} |h|^γ ∥hα∥ > 0 with some constant 1⩽γ<2.
- (i) For any f∈F, the infinite series in (3) is convergent, and C(α,f)⩾0 with equality if and only if f̄=0 a.e.
- (ii) For any f∈F and any integer N⩾1,
with implied constants depending only on the distribution of X1, α, and γ.
Let d=gcd(suppX1) and D=gcd(suppX1−suppX1). Replacing X1,X2,⋯ by X1/d,X2/d,⋯ and α by dα, we may assume that d=1. We may also assume that E(f)=0. Since the variable U is independent of X1,X2,⋯, we have
where g(x) = ∫_0^1 f(u) f(u+x) du. Note that ĝ(h) = |f̂(h)|^2 for any integer h, and that f̂(0)=0. Further, integration by parts shows that |f̂(h)| ⩽ V(f)/(2π|h|) for any integer h≠0. In particular, the Fourier series of g is absolutely convergent. Clearly g is continuous; hence, the Fourier series of g converges uniformly to g. Therefore, we can write
First, we prove (i). Recall from the proof of Lemma 1 that φ(2πx)=1 if and only if x∈Z. Also, |φ(2πx)|=1 if and only if Dx∈Z. In particular, |φ(2πhα)|<1 for any h≠0. For any positive integer K, we thus have
Using Lemma 1(i), the classical estimate (18), and a dyadic decomposition, in the case 1<γ<2, we get
A similar estimate holds if γ=1. We can thus take the limit as K→∞ in (23) to obtain
and the convergence of the series in (3) follows. Combining the h and −h terms in the previous formula and using E f(U)^2 = ∑_{h≠0} |f̂(h)|^2, we get
Here 0 < (1−|φ(2πhα)|^2)/|1−φ(2πhα)|^2 ≪ 1/∥hα∥, and thus C(α,f)⩾0. Further, C(α,f)=0 if and only if f̂(h)=0 for all integers h≠0. The latter condition is equivalent to f̄=0 a.e.
Next, we prove (ii). We may assume V(f)=1. Let V = E(∑_{k=1}^N f(U+S_kα))^2. Expanding the square, we get
Let us write U+S_ℓα = U+S_kα+(S_ℓ−S_k)α. Since U+S_kα (mod Z) is uniformly distributed on R/Z and independent of (S_ℓ−S_k)α, we have E f(U+S_kα)^2 = E f(U)^2 = ∥f∥_2^2 and, for k<ℓ, E f(U+S_kα) f(U+S_ℓα) = E f(U) f(U+S_{ℓ−k}α). Hence, the previous formula simplifies as V = N∥f∥_2^2 + 2 ∑_{k=1}^{N−1} (N−k) E f(U) f(U+S_kα). Using (22), we thus have
Here, the inner sum is
By Lemma 1(i), the second term on the right‐hand side of the previous line has absolute value at most
From |f̂(h)| ⩽ 1/|h|, the classical estimate (18) and a dyadic decomposition, in the case 1<γ<2, we get
therefore V = N∥f∥_2^2 + 2N ∑_{h≠0} |f̂(h)|^2 φ(2πhα)/(1−φ(2πhα)) + O(N^{2−2/γ}). In the case γ=1, the same holds with error term O(log N). Combining the h and −h terms, and using ∥f∥_2^2 = ∑_{h≠0} |f̂(h)|^2, we finally obtain
as claimed.□
If E X_1^2 < ∞, then lim_{x→0} (1−|φ(2πx)|^2)/|1−φ(2πx)|^2 = Var X_1/(E X_1)^2 is finite. In particular, (1−|φ(2πhα)|^2)/|1−φ(2πhα)|^2 ≪ 1, and hence, C(α,f) ≪ ∥f̄∥_2^2. Also, the zeroes of the function above are at points x such that Dx∈Z but x∉Z. In the special case D=1, we thus also have (1−|φ(2πhα)|^2)/|1−φ(2πhα)|^2 ≫ 1, and so, C(α,f) ≫ ∥f̄∥_2^2.
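The limit of (1−|φ(2πx)|^2)/|1−φ(2πx)|^2 as x→0 and the location of its zeroes can both be checked on small examples. The Python sketch below does this for two assumed step distributions; the distributions and the helper names `variance_ratio` and `limit_value` are illustrative choices only.

```python
import cmath
import math

def char_fn(step_pmf, t):
    """Characteristic function phi(t) = E exp(i t X_1)."""
    return sum(p * cmath.exp(1j * t * x) for x, p in step_pmf.items())

def variance_ratio(step_pmf, x):
    """(1 - |phi(2 pi x)|^2) / |1 - phi(2 pi x)|^2, for x not an integer."""
    phi = char_fn(step_pmf, 2 * math.pi * x)
    return (1 - abs(phi) ** 2) / abs(1 - phi) ** 2

def limit_value(step_pmf):
    """Var(X_1) / (E X_1)^2, the claimed limit of the ratio as x -> 0."""
    mean = sum(p * x for x, p in step_pmf.items())
    var = sum(p * (x - mean) ** 2 for x, p in step_pmf.items())
    return var / mean ** 2

step_a = {1: 0.5, 2: 0.5}          # assumed example with D = 1
step_b = {1: 0.2, 4: 0.3, 7: 0.5}  # assumed example with D = 3
print(variance_ratio(step_a, 1e-4), limit_value(step_a))
print(variance_ratio(step_b, 1 / 3))  # Dx is an integer but x is not: a zero of the ratio
```

The first line compares the ratio near 0 with Var X_1/(E X_1)^2, and the second evaluates the ratio at a point where Dx∈Z but x∉Z, where it should vanish up to rounding.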
We now prove the remarks made in the Introduction about the rational case. Let p/q be a reduced fraction, and assume that D = gcd(supp X_1 − supp X_1) is relatively prime to q. Let f̂_q(h) = q^{−1} ∑_{a=0}^{q−1} f(a/q) e(−ha/q), h = 0, 1, …, q−1 be the Fourier coefficients of f on Z_q = {0, 1/q, …, (q−1)/q}, and let E(f) = f̂_q(0) and f̄ = f − E(f). Finally, let U be a random variable uniformly distributed on Z_q, independent of X_1, X_2, …, and define
Following the proof of Lemma 5(i) using Fourier analysis on Z_q instead of R/Z, we get
In particular, C(p/q,f)⩾0 with equality if and only if f̄=0 on Z_q. For a far‐reaching generalization of these Fourier analytic expressions for C(α,f) and C(p/q,f) to compact groups, we refer to [12]. We also note that the condition gcd(D,q)=1 is equivalent to the distribution of X_1 p/q (mod Z) not being supported on a translate of a proper subgroup of Z_q. Consequently, the CLT (6) and the LIL (7) are special cases of the quantitative ergodic theorems for random walks on compact groups in the same paper.
Proposition 4. Assume that E|X_1|<∞ and E X_1≠0, and let α be irrational such that inf_{h∈Z∖{0}} |h|^γ ∥hα∥ > 0 with some constant 1⩽γ<2. For any f∈F,
as p/q→α along the sequence of best rational approximations, provided that D=gcd(suppX1−suppX1) is relatively prime to all but finitely many q in this sequence.
By (24) and (25), we need to prove that
Since f is Riemann integrable and φ is continuous, we have term by term convergence for any fixed h≠0.
Note that f is of bounded variation also on Z_q in the sense that
From summation by parts, we thus get that
remains true for the Zq‐Fourier transform for all 0<|h|⩽q/2. On the other hand, by Lemma 1(i),
Here,
for all 0<|h|⩽q/2, where the second inequality follows from the best rational approximation property ∥hα∥⩾∥qα∥. Therefore,
and, as we have seen before, ∑_{h≠0} 1/(h^2 ∥hα∥) < ∞ is ensured by the assumption γ<2. The convergence of the series in (26) thus follows, for example, from the dominated convergence theorem.□
3.4 Approximation by independent variables
The main idea of the proof of Theorems 2 and 3 is that applying suitable small perturbations to the terms of ∑_{k=1}^N f(S_kα) introduces independence, and the CLT and the LIL then follow from classical results of probability theory. This method goes back to Schatte [26, 27]. We improved his approach in [7], and generalized it to compact groups in [12, 13]. In our setup, the source of independence is Lemma 6 below. We approximate the error of the perturbations in Lemma 7, and then finish the proof of Theorems 2 and 3 at the end of the section.
Fix an irrational $\alpha$ such that $\inf_{h\in\mathbb{Z}\setminus\{0\}}|h|^{\gamma}\|h\alpha\|>0$ with some constant $1\leqslant\gamma<2$, and let $\psi(k)=\sup_{0\leqslant x\leqslant 1}|\Pr(\{S_k\alpha\}\leqslant x)-x|$, as before. For any integer $n\geqslant 0$, let us decompose the finite set of integers $[2^n,2^{n+1})$ into consecutive blocks $H_{n,1},J_{n,1},H_{n,2},J_{n,2},\ldots,H_{n,r_n},J_{n,r_n}$. This way we obtain a block decomposition $H_{n,1},J_{n,1},\ldots,H_{n,r_n},J_{n,r_n}$, $n=0,1,\ldots$ of the set of positive integers. Further, let us introduce auxiliary random variables $\xi_{n,2},\xi_{n,3},\ldots,\xi_{n,r_n-1}$, $n=0,1,\ldots$ which are independent, uniformly distributed on $\mathbb{R}/\mathbb{Z}$, and independent of $X_1,X_2,\ldots$. The following lemma is implicit in Schatte [26, 27]. For a formal proof, see [7].

Lemma 6 (Schatte). There exists a sequence of random variables $\delta_{n,2},\delta_{n,3},\ldots,\delta_{n,r_n-1}$, $n=0,1,\ldots$ with the following properties. First, $\delta_{n,i}$ is measurable with respect to $X_k$, $k\in J_{n,i-1}$ and $\xi_{n,i}$, and $|\delta_{n,i}|\leqslant\psi(|J_{n,i-1}|)$ for all $n\geqslant 0$ and $2\leqslant i<r_n$. Second, the random vectors $(S_k\alpha-\delta_{n,i} \pmod{\mathbb{Z}}:k\in H_{n,i})$ are independent and have uniformly distributed coordinates. Similarly, there exists a sequence of random variables $\delta'_{n,2},\delta'_{n,3},\ldots,\delta'_{n,r_n-1}$, $n=0,1,\ldots$ with the following properties. First, $\delta'_{n,i}$ is measurable with respect to $X_k$, $k\in H_{n,i}$ and $\xi_{n,i}$, and $|\delta'_{n,i}|\leqslant\psi(|H_{n,i}|)$ for all $n\geqslant 0$ and $2\leqslant i<r_n$. Second, the random vectors $(S_k\alpha-\delta'_{n,i} \pmod{\mathbb{Z}}:k\in J_{n,i})$ are independent and have uniformly distributed coordinates.
As before, let d=gcd(suppX1) and D=gcd(suppX1−suppX1). Let c>0 be a small constant, to be chosen, and let Bn=(D/d)2⌈2(1/2−c)n⌉ and Bn′=(D/d)⌈2cn⌉. We choose the sizes of the blocks so that |Hn,i|=Bn and |Jn,i|=Bn′ for all 1⩽i<rn, and |Hn,rn|+|Jn,rn|<Bn+Bn′ is the remainder of 2n modulo Bn+Bn′. By the nondegeneracy of X1, we have ψ(k)≪k−1/(2γ)⩽k−1/4, see [5, Proposition 2.1]. In particular, |δn,i|≪(Bn′)−1/4 and |δn,i′|≪Bn−1/4.
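The alternating block structure can be illustrated with a short Python sketch. The function name `block_decomposition` and the rule for splitting the final, shorter pair (fill the H block first, then the J block) are assumptions made here for illustration; the text only fixes the sizes of the full blocks. The block sizes are passed in as plain integers rather than derived from c, D and d.

```python
def block_decomposition(n, B, B_prime):
    """Split the integers in [2**n, 2**(n+1)) into alternating blocks
    H_1, J_1, H_2, J_2, ... with |H_i| = B and |J_i| = B_prime.

    Assumption (not fixed by the text): the leftover integers at the end
    go to the last H block first, and only then to the last J block.
    """
    start, end = 2 ** n, 2 ** (n + 1)
    H, J = [], []
    pos = start
    while pos < end:
        h_end = min(pos + B, end)
        H.append(range(pos, h_end))          # long block H_i
        j_end = min(h_end + B_prime, end)
        J.append(range(h_end, j_end))        # short block J_i (may be empty at the end)
        pos = j_end
    return H, J


if __name__ == "__main__":
    H, J = block_decomposition(10, 20, 3)
    print(len(H), "pairs; last H block has", len(H[-1]), "elements")
```

With n = 10, B = 20 and B' = 3 the interval has 1024 = 44·23 + 12 integers, so the sketch produces 44 full pairs followed by one truncated pair.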
Now fix f∈F with E(f)=∫01f(x)dx=0, and consider the block sums Tn,i=∑k∈Hn,if(Skα) and Dn,i=∑k∈Jn,if(Skα). If N=maxJn,R for some n⩾0 and 1⩽R⩽rn, then
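Let $T^*_{n,i}=\sum_{k\in H_{n,i}}f(S_k\alpha-\delta_{n,i})$ and $D^*_{n,i}=\sum_{k\in J_{n,i}}f(S_k\alpha-\delta'_{n,i})$ denote the corresponding perturbed block sums. By Lemma 6, we have that $T^*_{n,i}$, $n\geqslant 0$, $2\leqslant i<r_n$ are independent and $\mathbb{E}T^*_{n,i}=0$. Similarly, $D^*_{n,i}$, $n\geqslant 0$, $2\leqslant i<r_n$ are independent and $\mathbb{E}D^*_{n,i}=0$. Further, $T^*_{n,i}\overset{d}{=}\sum_{k=1}^{|H_{n,i}|}f(U+S_k\alpha)$ and $D^*_{n,i}\overset{d}{=}\sum_{k=1}^{|J_{n,i}|}f(U+S_k\alpha)$; therefore, by Lemma 5, the variance is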
with implied constants depending only on the distribution of $X_1$, $\alpha$, and $\gamma$. The same holds for $\mathbb{E}(D^*_{n,i})^2$ with $|H_{n,i}|$ replaced by $|J_{n,i}|$.

Lemma 7. Assume that $X_1>0$ a.s. and $\mathbb{E}X_1<\infty$, and also that $1<\gamma<(3+\sqrt{5})/4\approx 1.309$. There exists a small enough constant $c>0$ depending only on $\gamma$, such that for any $n\geqslant 0$ and any real $t\geqslant 1$,
with implied constants depending only on the distribution of X1, α, γ, and c.
We only give a proof for ∑i=2R(Tn,i−Tn,i∗), as the proof for ∑i=2R(Dn,i−Dn,i∗) is analogous. Replacing X1,X2,⋯ by X1/d,X2/d,⋯ and α by dα, we may assume that d=1. We may also assume that V(f)=1. Since n⩾0 is fixed, for the sake of simplicity, we can write Ti=Tn,i, Hi=Hn,i, Ji=Jn,i, B=Bn, B′=Bn′, r=rn, ξi=ξn,i, and δi=δn,i. Let c>0 be a small enough constant to be chosen, and note that min{B,B′}=B′⩽B+B′. Further, let H>1 be an integer, to be chosen. We start by approximating Ti and Ti∗ by two exponential sums. For every 2⩽i<r, let
and let us write $T_i=A_i+E_i$ and $T_i^*=A_i^*+E_i^*$.

Fix $2\leqslant R<r$, and let us apply Lemma 3 to the points $S_k\alpha$, $k\in H_2\cup\cdots\cup H_R$. Since $X_1$ attains positive integers only, these points satisfy
for any k≠ℓ, and hence, |∑i=2REi|≪logH/H(∑j=2n2n+1Xj)γ+1. The same holds for the maximum over 2⩽R<r; thus, by EX1<∞ and the Markov inequality,
The last term in the exponent satisfies $c(2-\gamma)/(8\gamma)\leqslant c$.

Using the fact that the vectors $(S_k\alpha-\delta_i \pmod{\mathbb{Z}}:k\in H_i)$, $2\leqslant i<r$ are independent and their coordinates are uniformly distributed on $\mathbb{R}/\mathbb{Z}$, we get that $E_i^*$, $2\leqslant i<r$ are independent and $\mathbb{E}E_i^*=0$. Letting
we have $E_i^*=T_i^*-A_i^*=\sum_{k\in H_i}g(S_k\alpha-\delta_i)\overset{d}{=}\sum_{k=1}^{|H_i|}g(U+S_k\alpha)$. Note that $\hat{g}(h)=|h|\hat{f}(h)/H$ if $0<|h|<H$, and $\hat{g}(h)=\hat{f}(h)$ if $|h|\geqslant H$. Using the nonnegativity of the Fejér kernel, it is also not difficult to see that the Cesàro mean in the previous line has total variation $\leqslant V(f)$, and hence, $V(g)\leqslant 2V(f)=2$. From the results seen in the proof of Lemma 5, it thus follows that
and hence, by the Kolmogorov inequality, after simplifying the exponent,
provided that c>0 is small enough. Combining the previous formula and (28), we can estimate the error of replacing Ti by Ai and Ti∗ by Ai∗, and we obtain
Next, fix 1⩽R<S<r, and let us estimate E|∑i=R+1S(Ai−Ai∗)|2. We start with the diagonal term E|Ai−Ai∗|2. Let Yi=∑k∈Ji−1Xk. We have
where the factor $(e(hY_i\alpha)-e(h(Y_i\alpha-\delta_i)))$ is independent of the sum over $k\in H_i$. Let $\mathcal{F}_i$ denote the σ‐algebra generated by $X_k$, $k\in J_{i-1}$ and $\xi_i$, and note that $Y_i$ and $\delta_i$ are $\mathcal{F}_i$‐measurable. We can apply Lemma 4 to the i.i.d. sequence obtained from $X_1,X_2,\ldots$ by deleting the terms $X_k$, $k\in J_{i-1}$ to estimate the conditional expectation with respect to $\mathcal{F}_i$ as
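Here, $|H_i|\ll 2^{(1/2-c)n}$, $|\hat{f}(h)|^2\ll 1/h^2$ and $(1-|\varphi(2\pi h\alpha)|^2)/|1-\varphi(2\pi h\alpha)|^2\ll 1/\|h\alpha\|$. On the one hand, $|1-e(-h\delta_i)|^2\ll 1$. On the other hand, using $|\delta_i|\ll(B')^{-1/4}\ll 2^{-cn/4}$, we also have $|1-e(-h\delta_i)|^2\ll h^2|\delta_i|^2\ll h^2 2^{-cn/2}$. The main term of (31) is therefore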
Taking the (total) expectation of (31) and summing over R+1⩽i⩽S, we obtain
Now fix the indices R+1⩽i<j⩽S, and let us estimate the off‐diagonal term E(Ai−Ai∗)(Aj¯−Aj∗¯). Let Yi=∑k∈Ji−1Xk and Yj=∑k∈Jj−1Xk. Note that (30) still holds. We can thus derive a factorization of Aj−Aj∗ similar to (30), from which we altogether get
We now take the expected value of (33). Note that the factor in terms of Yi,δi,Yj,δj is independent of the sum over k,ℓ. Observe also, that
depends only on h1,h2 but not on i,j. Indeed, since we chose the sizes of the blocks Ji−1,Jj−1 defining Yi,δi,Yj,δj to be the same for all i,j (within the interval [2n,2n+1)), the random variable in (34) has the same distribution for all i<j. We have
Let a=φ(2π(h1−h2)α) and b=φ(−2πh2α). Summing the previous formula over k∈Hi and ℓ∈Hj, we get
From (33), (34), and the estimates |c(h1,h2)|⩽4, |f^(h1)|⩽1/|h1|, |f^(h2)|⩽1/|h2|, we get by fixing R+1⩽i<S and summing over j=i+1,i+2,⋯,S that
Lemma 1(i) shows that here $1/|1-b|\ll 1/\|h_2\alpha\|$ and $\left|\sum_{k=0}^{B-1}a^kb^{B-k-1}\right|\ll 1/\|h_1\alpha\|$, while Lemma 1(ii) gives $|1-b^B|/|1-b^{B+B'}|\ll 2^{cn}$. Hence,
Applying the classical Diophantine estimate (18) and then summing over R+1⩽i<S, we can thus estimate the contribution of the off‐diagonal terms i<j. The terms j<i can be estimated similarly, and we finally get
Combining the previous formula and (32), we get that for any 1⩽R<S<r,
and thus, by the Rademacher–Menshov inequality [22, Theorem F],
Note that $r\ll 2^{(1/2+c)n}$. Applying the Chebyshev inequality, from (29), we finally deduce
Choosing $H=\lceil 2^{(3\gamma-1)n/(4\gamma^2-4\gamma+2)}\rceil$, the first two error terms have roughly the same order of magnitude. The estimate then simplifies to $\ll 2^{(-\tau+3c)n}\,n\,t^{-1/\gamma}$ with some $\tau$ depending only on $\gamma$. Moreover, we have $\tau>0$ whenever $1<\gamma<(3+\sqrt{5})/4$. Therefore, if $c>0$ is small enough depending only on $\gamma$, the estimate is $\ll t^{-1/\gamma}$, as claimed. □
Proof of Theorem 2. The properties of $C(\alpha,f)$ were proved in Lemma 5(i). It remains to prove the CLT (4) and the LIL (5). We may assume that $E(f)=0$ and $V(f)=1$, and that $1<\gamma<(3+\sqrt{5})/4$. Let us fix a small enough constant $c>0$ for which the claim of Lemma 7 holds. For any integer $N\geqslant 1$, there exist integers $n=n(N)\geqslant 0$ and $2\leqslant R=R(N)<r_n$ such that $2^n\leqslant N<2^{n+1}$ and $|N-\max J_{n,R}|\ll 2^{(1/2-c)n}$. Further, $T_{m,1}$, $D_{m,1}$, $T_{m,r_m}$, and $D_{m,r_m}$ are all $O(2^{(1/2-c)m})$. Hence,
and consequently,
Applying Lemma 7 with t=m−12εm where, say, ε=c(2−γ)/16, we get
and thus, by the Borel–Cantelli lemma
Summing over $m=0,1,\ldots,n-1$, we see that replacing $T_{m,i}$ by $T^*_{m,i}$ and $D_{m,i}$ by $D^*_{m,i}$, the double sum on the right‐hand side of (35) changes by $O(2^{(1/2-\varepsilon)n})=O(N^{1/2-\varepsilon})$. The same holds if we replace $T_{n,i}$ by $T^*_{n,i}$ and $D_{n,i}$ by $D^*_{n,i}$ in the second sum on the right‐hand side of (35), and so, we get
Recall that the variables $D^*_{m,i}$, $2\leqslant i<r_m$, $m=0,1,\ldots$, viewed as a single sequence, are independent, mean zero random variables with variance $\mathbb{E}(D^*_{m,i})^2\ll|J_{m,i}|\ll 2^{cm}$, see (27). By the strong law of large numbers, the contribution of $D^*_{m,i}$ is negligible:
and so,
Here $T^*_{m,i}$, $2\leqslant i<r_m$, $m=0,1,\ldots$, viewed as a single sequence, are also independent, mean zero random variables with variance $\mathbb{E}(T^*_{m,i})^2\sim C(\alpha,f)|H_{m,i}|$, see (27). Since $\sum_{m=0}^{n-1}\sum_{i=2}^{r_m-1}|H_{m,i}|+\sum_{i=2}^{R}|H_{n,i}|\sim N$, and $|T^*_{m,i}|=O(N^{1/2-c})$, $|T^*_{n,i}|=O(N^{1/2-c})$, the Lindeberg condition [21, p. 292], as well as Kolmogorov's condition for the LIL [21, p. 272] are satisfied, and consequently,
and
with $\sigma=\sqrt{C(\alpha,f)}$. By (36), the CLT (4) and the LIL (5) follow. □
Proof of Theorem 3. We may assume that $E(f)=0$ and $V(f)=1$, and that $1<\gamma<(3+\sqrt{5})/4$. In the proof of Theorem 2, we wrote $\sum_{1\leqslant k\leqslant t}f(S_k\alpha)$ in the form $\sum_{1\leqslant k\leqslant t}f(S_k\alpha)=X(t)+Y(t)$ with stochastic processes
and $Y(t)=O(t^{1/2-\varepsilon})$ a.s., where $\varepsilon>0$ is a constant depending only on $\gamma$, and $n(t)\geqslant 0$ and $2\leqslant R(t)<r_{n(t)}$ are integers such that $|t-\max J_{n(t),R(t)}|\ll t^{1/2-\varepsilon}$. Note that $X(t)$ and $Y(t)$ are measurable functions of the variables $X_k$ and the auxiliary variables $\xi_{n,i}$.

Applying a theorem of Strassen on the almost sure approximation of sums of independent variables by a Wiener process [28, Theorem 4.4], we get that after a suitable extension of the probability space, there exists a stochastic process $\tilde{X}(t)$ in the Skorokhod space $D[0,\infty)$ with the same distribution as $X(t)$, such that $\tilde{X}(t)=\sigma W(t)+O(t^{1/2-\varepsilon'})$ a.s. with $\sigma=\sqrt{C(\alpha,f)}$, a standard Wiener process $W(t)$ and some constant $\varepsilon'>0$ depending only on $\gamma$. After another extension of the probability space, we can introduce independent variables $\tilde{X}_k\overset{d}{=}X_k$ and $\tilde{\xi}_{n,i}\overset{d}{=}\xi_{n,i}$ such that $\tilde{X}(t)$ is the same measurable function of the $\tilde{X}_k$ and $\tilde{\xi}_{n,i}$ as $X(t)$ is of the $X_k$ and $\xi_{n,i}$. Let $\tilde{Y}(t)\overset{d}{=}Y(t)$ be the same measurable function of the $\tilde{X}_k$ and $\tilde{\xi}_{n,i}$ as $Y(t)$ is of the $X_k$ and the $\xi_{n,i}$. Then $\zeta(t):=\tilde{X}(t)+\tilde{Y}(t)$ has the same distribution as $X(t)+Y(t)\overset{d}{=}\sum_{1\leqslant k\leqslant t}f(S_k\alpha)$, and
with ε′′=min{ε,ε′}>0, as claimed.□
4 PROOF OF THEOREM 1
As before, $X_1,X_2,\ldots$ is a sequence of i.i.d. nondegenerate integer‐valued random variables with maximal span $D=\gcd(\operatorname{supp}X_1-\operatorname{supp}X_1)$ and characteristic function $\varphi$, and $S_k=\sum_{j=1}^{k}X_j$. Further, $p/q$ is a reduced fraction, and $\mathbb{Z}_q=\{0,1/q,\ldots,(q-1)/q\}$ is the cyclic group of order $q$. Let $\psi_{\mathrm{disc}}$ be as in (1).
First of all, note that gcd(D,q)=1 is equivalent to the distribution of X1p/q(modZ) not being supported on a translate of a proper subgroup of Zq. In particular, Skp/q(modZ) converges in distribution to the uniform distribution on Zq if and only if gcd(D,q)=1. This is further equivalent to ψdisc(k)→0 as k→∞.
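The role of the condition gcd(D,q)=1 can be checked numerically on small examples. The following Python sketch computes D from a finite support and iterates the distribution of S_k p/q on the residues 0,…,q−1 (residue a stands for the point a/q). The function names and the example distribution are hypothetical choices made here, not taken from the paper.

```python
from functools import reduce
from math import gcd


def maximal_span(support):
    """D = gcd(supp X1 - supp X1), computed from differences to one base point."""
    base = support[0]
    return reduce(gcd, (abs(s - base) for s in support), 0)


def step_distribution(pmf, p, q):
    """Distribution of X1 * p mod q; pmf maps integer values to probabilities."""
    dist = [0.0] * q
    for value, prob in pmf.items():
        dist[(value * p) % q] += prob
    return dist


def convolve_mod_q(u, v):
    """Convolution of two distributions on the cyclic group of order q."""
    q = len(u)
    out = [0.0] * q
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[(i + j) % q] += ui * vj
    return out


def walk_distribution(pmf, p, q, k):
    """Distribution of S_k * p mod q after k steps, starting from S_0 = 0."""
    step = step_distribution(pmf, p, q)
    dist = [1.0] + [0.0] * (q - 1)
    for _ in range(k):
        dist = convolve_mod_q(dist, step)
    return dist


def max_deviation_from_uniform(dist):
    q = len(dist)
    return max(abs(x - 1.0 / q) for x in dist)


if __name__ == "__main__":
    pmf = {1: 0.5, 4: 0.5}                       # hypothetical example with D = 3
    print(maximal_span(list(pmf)))               # 3
    print(max_deviation_from_uniform(walk_distribution(pmf, 2, 5, 200)))  # tiny
    print(max_deviation_from_uniform(walk_distribution(pmf, 1, 6, 200)))  # stays large
```

In the example, D = 3 is coprime to q = 5, and the walk equidistributes on Z_5; for q = 6 the walk stays on a coset of the subgroup of order 2 at every step, so it never approaches the uniform distribution.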
Consider now
where the maximum is over all cyclic intervals J⊆Zq, and note that ψdisc(k)⩽ψdisc∗(k)⩽2ψdisc(k). The rate of convergence in ψdisc and ψdisc∗ is thus the same. Using all cyclic intervals is in a sense more natural, however. Indeed, since the family of all cyclic intervals is translation invariant, adding an arbitrary integer to X1 does not change the value of ψdisc∗(k). It is also not difficult to see that ψdisc∗(k) is nonincreasing in k.
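The comparison between the two discrepancies can be explored with a brute-force Python sketch. It assumes that ψ_disc(k) in (1) is the maximum over 0 ⩽ a < q of |Pr({S_k p/q} ⩽ a/q) − (a+1)/q|, which is how the quantity is used later in the proof of Lemma 9; the function names and the example step distribution are hypothetical.

```python
def convolve_mod_q(u, v):
    """Convolution of two distributions on the cyclic group of order q."""
    q = len(u)
    out = [0.0] * q
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[(i + j) % q] += ui * vj
    return out


def walk_distribution(step, k):
    """k-fold convolution of the step distribution (list of length q)."""
    dist = [1.0] + [0.0] * (len(step) - 1)
    for _ in range(k):
        dist = convolve_mod_q(dist, step)
    return dist


def psi_disc(dist):
    """Assumed form of (1): sup over initial segments {0, 1/q, ..., a/q}."""
    q = len(dist)
    cumulative, worst = 0.0, 0.0
    for a, prob in enumerate(dist):
        cumulative += prob
        worst = max(worst, abs(cumulative - (a + 1) / q))
    return worst


def psi_disc_star(dist):
    """Sup over all cyclic intervals J of |Pr(J) - |J|/q| (O(q^2) brute force)."""
    q = len(dist)
    worst = 0.0
    for start in range(q):
        mass = 0.0
        for length in range(1, q + 1):
            mass += dist[(start + length - 1) % q]
            worst = max(worst, abs(mass - length / q))
    return worst


if __name__ == "__main__":
    step = [0.5, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]   # hypothetical step on Z_7
    for k in (1, 5, 25):
        d = walk_distribution(step, k)
        print(k, psi_disc(d), psi_disc_star(d))
```

Because `psi_disc_star` ranges over all cyclic intervals, rotating the input list leaves its value unchanged, which mirrors the translation invariance noted above.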
Most importantly, to prove Theorem 1, we may assume that D=1. Indeed, we can first translate X1 so that Pr(X1=0)>0. This changes neither |φ| nor the order of magnitude of ψdisc(k). But then D∣X1 a.s., and so, we can replace X1,X2,⋯ by X1/D,X2/D,⋯, and p/q by Dp/q. Note that the characteristic function of X1/D is φ(x/D), and that min0<|h|⩽q/2|h|·∥hDp/q∥⩾A/D>0 follows from the assumption gcd(D,q)=1. This reduces the general case of Theorem 1 to the special case D=1.
4.1 Upper bounds
The second upper bound in Theorem 1 will follow from the following general upper estimate.

Lemma 8. Assume that $D=1$, and that $|\varphi(2\pi x)|\leqslant g(x)\leqslant e^{-cx^2}$ for all $x\in[0,r]$ with some constants $c>0$ and $0<r\leqslant 1/2$, and some function $g$ satisfying $g(x+y)\leqslant g(x)g(y)$ whenever $x,y,x+y\in[0,r]$. Assume further, that $\min_{0<|h|\leqslant q/2}|h|\cdot\|hp/q\|\geqslant A>0$. Then, for all $k\geqslant q^2$,
with some constant $\tau>0$ and an implied constant depending only on the distribution of $X_1$, $c$, $r$, $g$ and $A$. If $r=1/2$, then the term $e^{-\tau k}\log q$ can be removed.
Before we give the proof, let us make a few observations about the possible choices of the function g; one could call it a submultiplicative upper envelope of |φ(2πx)|. Note first that since we necessarily have g(0)=1, log‐concavity on some interval [0,r] implies submultiplicativity on the same interval.
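The implication from log‐concavity to submultiplicativity can be spelled out in one line. The following is a sketch under the additional assumption that $g>0$ on $[0,r]$, with the auxiliary notation $\ell=\log g$ introduced here.

```latex
% Assumptions: g > 0 on [0,r], \ell = \log g is concave on [0,r], and \ell(0) = 0.
% For x, y > 0 with x + y \le r, write x and y as convex combinations of 0 and x+y:
\[
  \ell(x) \;\geqslant\; \frac{x}{x+y}\,\ell(x+y) + \frac{y}{x+y}\,\ell(0)
          \;=\; \frac{x}{x+y}\,\ell(x+y),
  \qquad
  \ell(y) \;\geqslant\; \frac{y}{x+y}\,\ell(x+y).
\]
% Adding the two inequalities gives
\[
  \ell(x) + \ell(y) \;\geqslant\; \ell(x+y),
  \qquad\text{that is,}\qquad
  g(x+y) \;\leqslant\; g(x)\,g(y).
\]
```

The cases $x=0$ or $y=0$ are trivial because $g(0)=1$.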
Recalling the estimate $1-|\varphi(2\pi x)|\gg\|Dx\|^2$ from (10) (which holds under the sole assumption of nondegeneracy without any moment condition), in the case $D=1$, we have $|\varphi(2\pi x)|\leqslant e^{-cx^2}$ for all $x\in[0,1/2]$ with some constant $c>0$. Because of its log‐concavity, we can thus choose $g(x)=e^{-cx^2}$. In particular, Lemma 8 applies to an arbitrary nondegenerate $X_1$.
Assuming $\mathbb{E}X_1^2<\infty$, we can even choose $g(x)=|\varphi(2\pi x)|$ with suitable constants $c>0$ and $0<r\leqslant 1/2$. Indeed, in this case, $X_1-X_2$ is a mean zero random variable with finite variance and characteristic function $\phi:=|\varphi|^2$. In particular, $\phi$ is twice continuously differentiable, $\phi'(0)=0$ and $\phi''(0)=-\operatorname{Var}(X_1-X_2)<0$. Therefore, $(\log\phi)''(0)=(\phi''(0)\phi(0)-\phi'(0)^2)/\phi(0)^2<0$. By continuity, $\phi$ is log‐concave, and hence, submultiplicative, in an open neighborhood of $0$.
Proof of Lemma 8. Let $0\leqslant a<q$ be an integer, and let $w:\mathbb{Z}_q\to\{0,1\}$ be the indicator function of the set $\{0,1/q,\ldots,a/q\}$. Its $\mathbb{Z}_q$‐Fourier coefficients satisfy
for any integer 0<|h|⩽q/2. Using the Fourier series expansion of w, we thus get
and consequently, we have the Berry–Esseen‐type inequality
For the rest of the proof, constants and implied constants will depend only on the distribution of X1, c,r,g, and A. By the assumption D=1, the function |φ(2πx)| is even and has smallest period 1. In addition, |φ(2πx)|=1 if and only if x∈Z. Therefore, |φ(2πx)|⩽e−τ whenever ∥x∥⩾r/2 with some constant τ>0, and hence,
Consider now the continued fraction representation p/q=[a0;a1,⋯,aM], and let pm/qm=[a0;a1,⋯,am] denote the convergents. Since p/q is reduced, we have pM=p and qM=q. Let us now estimate
Let J0=(−∥qmp/q∥,∥qmp/q∥), Jℓ=[ℓ∥qmp/q∥,(ℓ+1)∥qmp/q∥), ℓ=1,2,⋯ and Jℓ=((ℓ−1)∥qmp/q∥,ℓ∥qmp/q∥], ℓ=−1,−2,⋯. By the best rational approximation property, for any 0<h<qm+1, we have ∥hp/q∥⩾∥qmp/q∥. Consequently, each interval Jℓ contains at most one of the points hp/q(modZ), qm⩽h<qm+1, and the interval J0 is empty. If we only consider those values of h for which ∥hp/q∥<r/2, then Jℓ is also empty for all |ℓ|⩾r/(2∥qmp/q∥). Note that the submultiplicative assumption on g in particular implies that g(x) is nonincreasing on [0,r], and that all nonempty Jℓ is a subset of [−r,r]. Therefore,
By the assumptions on g, consecutive terms in the previous sum satisfy
The terms thus decay exponentially fast, hence
and by summing over m, we get
Here, the last term dominates. Indeed, by the assumptions on g,
In the last step, we used that $k\geqslant q^2$ and $\|q_mp/q\|\geqslant\|q_{M-1}p/q\|\geqslant 1/q$. Using $\|q_mp/q\|\geqslant 1/(q_{m+1}+q_m)$, it is not difficult to see that the terms of the last sum decay exponentially fast as $m$ decreases, showing that the sum is $\ll 1$. In the main factor, we have $p/q=p_M/q_M$. Recalling the identity $p_mq_{m-1}-p_{m-1}q_m=(-1)^{m-1}$ from the theory of continued fractions, we get that $\|q_{M-1}p_M/q_M\|=1/q_M$; hence, we altogether deduce
The last relation together with (38) shows
and the claim follows from the Berry–Esseen‐type inequality (37).

In the case $r=1/2$ (when $g$ is submultiplicative on the whole interval $[0,1/2]$), we can repeat the same proof without separating the terms $\|hp/q\|\geqslant r/2$ and $\|hp/q\|<r/2$, and obtain $\psi_{\mathrm{disc}}(k)\ll g(1/q)^k/q$. □
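The continued fraction facts used in the proof are easy to verify in exact arithmetic. The Python sketch below computes the partial quotients and convergents of a reduced fraction and checks the identity ∥q_{M−1}p/q∥ = 1/q together with the best rational approximation property; the function names and the example 13/31 are illustrative choices made here.

```python
from fractions import Fraction


def continued_fraction(p, q):
    """Partial quotients [a0; a1, ..., aM] of p/q via the Euclidean algorithm."""
    digits = []
    while q:
        a, remainder = divmod(p, q)
        digits.append(a)
        p, q = q, remainder
    return digits


def convergents(digits):
    """Convergents (p_m, q_m), m = 0..M, from the standard recurrences."""
    p_prev, q_prev = 1, 0
    p_cur, q_cur = digits[0], 1
    out = [(p_cur, q_cur)]
    for a in digits[1:]:
        p_prev, p_cur = p_cur, a * p_cur + p_prev
        q_prev, q_cur = q_cur, a * q_cur + q_prev
        out.append((p_cur, q_cur))
    return out


def dist_to_nearest_integer(x):
    """||x|| for an exact rational x."""
    frac = x - (x.numerator // x.denominator)
    return min(frac, 1 - frac)


def has_best_approximation_property(p, q):
    """Check ||h p/q|| >= ||q_m p/q|| for all 0 < h < q_{m+1} and all m < M."""
    denominators = [qm for _, qm in convergents(continued_fraction(p, q))]
    alpha = Fraction(p, q)
    for m in range(len(denominators) - 1):
        bound = dist_to_nearest_integer(denominators[m] * alpha)
        for h in range(1, denominators[m + 1]):
            if dist_to_nearest_integer(h * alpha) < bound:
                return False
    return True


if __name__ == "__main__":
    p, q = 13, 31
    conv = convergents(continued_fraction(p, q))
    print(conv)                                   # ends with (13, 31)
    print(dist_to_nearest_integer(conv[-2][1] * Fraction(p, q)))  # 1/31
```

Exact rationals from `fractions.Fraction` are used so that the comparisons are not affected by floating-point rounding.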
4.2 Lower bounds
The lower bounds in Theorem 1 will follow from the following lemma.

Lemma 9. We have
Assuming that $\mathbb{E}X_1^2<\infty$, we also have
To prove the first claim, we use a discrete version of Koksma's inequality: for any Zq‐valued random variables X and Y, and any f:Zq→C,
where $V_q(f)=\sum_{a=0}^{q-2}|f((a+1)/q)-f(a/q)|$ is the $\mathbb{Z}_q$‐total variation of $f$. The inequality (39) follows from a simple summation by parts. Now let $0<h<q$ be an integer, and note that $f(x/q)=e(hx/q)$ has $\mathbb{Z}_q$‐total variation
Applying (39) with X={Skp/q} and a uniformly distributed Y, and noting that Ee(hY)=0, we get
Choosing $h$ to be the multiplicative inverse of $p$ modulo $q$, we finally obtain $|\varphi(2\pi/q)|^k=|\mathbb{E}e(S_k/q)|\leqslant 2(q-1)\psi_{\mathrm{disc}}(k)$, as claimed.

Next, assume that $\mathbb{E}X_1^2<\infty$, and let $1\leqslant k\leqslant(q-3)^2/(108\operatorname{Var}X_1)$. By the Chebyshev inequality, $S_k$ lies in an interval of length $2\sqrt{3(\operatorname{Var}X_1)k}$ with probability at least $2/3$. There are at most $2\sqrt{3(\operatorname{Var}X_1)k}+1$ integers in this interval, hence by the pigeonhole principle $\Pr(S_k=n)\geqslant(3\sqrt{3(\operatorname{Var}X_1)k}+3/2)^{-1}$ for some $n$. It follows that the distribution of $\{S_kp/q\}$ has an atom, say $a/q\in\mathbb{Z}_q$, of weight $\geqslant(3\sqrt{3(\operatorname{Var}X_1)k}+3/2)^{-1}$. Letting $F(a/q)=\Pr(\{S_kp/q\}\leqslant a/q)$, we thus have
Here $1/q$ is at most half of the left‐hand side, and consequently, either $|F(a/q)-(a+1)/q|$ or $|F((a-1)/q)-a/q|$ is at least $(12\sqrt{3(\operatorname{Var}X_1)k}+6)^{-1}$. In particular, $\psi_{\mathrm{disc}}(k)\geqslant(12\sqrt{3(\operatorname{Var}X_1)k}+6)^{-1}$, as claimed. □
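The first half of the argument can be checked numerically. The Python sketch below assumes that ψ_disc(k) is the maximum over 0 ⩽ a < q of |F(a/q) − (a+1)/q|, with F as in the proof, and verifies the Koksma-type bound |E e(hX)| ⩽ (q−1)|e(h/q)−1| ψ_disc(k) for every 0 < h < q, as well as the consequence |φ(2π/q)|^k ⩽ 2(q−1)ψ_disc(k). The function names and the example distribution are hypothetical.

```python
import cmath
import math


def walk_distributions(pmf, p, q, k_max):
    """Yield (k, distribution of S_k * p mod q) for k = 1..k_max."""
    step = [0.0] * q
    for value, prob in pmf.items():
        step[(value * p) % q] += prob
    dist = [1.0] + [0.0] * (q - 1)
    for k in range(1, k_max + 1):
        new = [0.0] * q
        for i, di in enumerate(dist):
            for j, sj in enumerate(step):
                new[(i + j) % q] += di * sj
        dist = new
        yield k, dist


def psi_disc(dist):
    """Assumed definition: max over a of |F(a/q) - (a+1)/q|."""
    q = len(dist)
    cumulative, worst = 0.0, 0.0
    for a, prob in enumerate(dist):
        cumulative += prob
        worst = max(worst, abs(cumulative - (a + 1) / q))
    return worst


def fourier_coefficient(dist, h):
    """E e(h X) for X distributed on {0, 1/q, ..., (q-1)/q} according to dist."""
    q = len(dist)
    return sum(prob * cmath.exp(2j * math.pi * h * a / q)
               for a, prob in enumerate(dist))


def char_fn_abs(pmf, x):
    """|phi(x)| = |E exp(i x X1)|."""
    return abs(sum(prob * cmath.exp(1j * x * value) for value, prob in pmf.items()))


def koksma_and_lemma9_hold(pmf, p, q, k_max, tol=1e-12):
    base = char_fn_abs(pmf, 2 * math.pi / q)
    for k, dist in walk_distributions(pmf, p, q, k_max):
        psi = psi_disc(dist)
        for h in range(1, q):
            variation = (q - 1) * abs(cmath.exp(2j * math.pi * h / q) - 1)
            if abs(fourier_coefficient(dist, h)) > variation * psi + tol:
                return False
        if base ** k > 2 * (q - 1) * psi + tol:
            return False
    return True


if __name__ == "__main__":
    print(koksma_and_lemma9_hold({0: 0.2, 1: 0.5, 3: 0.3}, 3, 11, 100))
```

The tolerance only absorbs floating-point rounding; the inequalities themselves are exact.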
Proof of Theorem 1. As observed at the beginning of Section 4, we may assume that $D=1$. First, we prove the upper bounds, starting with the case $k\leqslant q^2$. In [5], we showed that if $\alpha$ is a badly approximable irrational and $\mathbb{E}X_1^2<\infty$, then $\sup_{0\leqslant x\leqslant 1}|\Pr(\{S_k\alpha\}\leqslant x)-x|\ll k^{-1/2}$ with implied constant depending only on the maximal partial quotient in the continued fraction of $\alpha$. In fact, the same proof works for a rational $p/q$ in place of $\alpha$, provided that $k\leqslant q^2$. In particular, $\sup_{0\leqslant x\leqslant 1}|\Pr(\{S_kp/q\}\leqslant x)-x|\ll k^{-1/2}$. Since the uniform distribution on $\mathbb{Z}_q$ also has distance $\ll q^{-1}\ll k^{-1/2}$ from the Lebesgue measure in the (continuous) Kolmogorov metric, it follows that $\psi_{\mathrm{disc}}(k)\ll k^{-1/2}$.

Next, let $k>q^2$. As observed, we can apply Lemma 8 with $g(x)=|\varphi(2\pi x)|$ and suitable constants $c>0$ and $0<r\leqslant 1/2$ to obtain
It is easy to see that there exists a constant $q_0$ depending only on the distribution of $X_1$ such that for all $q\geqslant q_0$ and all $k>q^2$, the second term is negligible compared to the first one. In particular, $\psi_{\mathrm{disc}}(k)\ll|\varphi(2\pi/q)|^k/q$, as claimed.

Finally, we prove the lower bounds. Lemma 9 immediately shows that the claim holds for any $k\leqslant Cq^2$ with some constant $C>0$, and also for all $k>q^2$. To see the claim on the remaining interval $Cq^2\leqslant k\leqslant q^2$, simply note that the asymptotics $|\varphi(2\pi x)|^2=1-4\pi^2(\operatorname{Var}X_1)x^2(1+o(1))$ as $x\to 0$ shows that in this case,
provided that q is large enough.□
4.3 Total variation metric
By similar arguments as those at the beginning of Section 4, we have ψTV(k)→0 if and only if gcd(D,q)=1. In addition, to prove (2), we may assume that D=1. By the so‐called “upper bound lemma” in [15],
In the second step, we used the fact that multiplication by p is a bijection of all nonzero remainders modulo q. As observed in the paragraph after Lemma 8, the function |φ(2πx)| is submultiplicative on [0,r], and |φ(2πx)|⩽e−τ whenever ∥x∥⩾r with suitable constants 0<r⩽1/2 and τ>0. Following the methods in the proof of Lemma 8, we get
and
Therefore,
It is now easy to see that for all large enough $q$ and $k\geqslant 1$, we have $\psi_{\mathrm{TV}}(k)\ll|\varphi(2\pi/q)|^k$, as claimed. Indeed, if, say, $|\varphi(2\pi/q)|^{2k}\geqslant 1/2$, then the claim follows from the trivial estimate $\psi_{\mathrm{TV}}(k)\leqslant 1$. If $|\varphi(2\pi/q)|^{2k}<1/2$, then we necessarily have $k\gg q^2$, and consequently, $e^{-2\tau k}q\ll|\varphi(2\pi/q)|^{2k}$ for all large enough $q$. The claim then follows from (40). This finishes the proof of the upper bound in (2).
To see the lower bound in (2), simply note that f(x/q)=e(hx/q), where h is the multiplicative inverse of p modulo q, has maximum norm 1 on Zq. Hence,
as claimed.
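Both total variation bounds can be tested on small examples. The Python sketch below assumes the normalization ψ_TV(k) = max over sets A of |Pr(A) − |A|/q|, which equals half the ℓ¹ distance to the uniform distribution (the paper's normalization may differ by a factor of 2). With this convention the upper bound lemma reads 4ψ_TV(k)² ⩽ Σ_{h≠0}|φ(2πh/q)|^{2k}, and testing against a character gives |φ(2π/q)|^k ⩽ 2ψ_TV(k). The function names and the example distribution are hypothetical.

```python
import cmath
import math


def walk_distributions(pmf, p, q, k_max):
    """Yield (k, distribution of S_k * p mod q) for k = 1..k_max."""
    step = [0.0] * q
    for value, prob in pmf.items():
        step[(value * p) % q] += prob
    dist = [1.0] + [0.0] * (q - 1)
    for k in range(1, k_max + 1):
        new = [0.0] * q
        for i, di in enumerate(dist):
            for j, sj in enumerate(step):
                new[(i + j) % q] += di * sj
        dist = new
        yield k, dist


def psi_tv(dist):
    """Assumed normalization: half of the l1 distance to the uniform distribution."""
    q = len(dist)
    return 0.5 * sum(abs(x - 1.0 / q) for x in dist)


def char_fn_abs(pmf, x):
    """|phi(x)| = |E exp(i x X1)|."""
    return abs(sum(prob * cmath.exp(1j * x * value) for value, prob in pmf.items()))


def tv_bounds_hold(pmf, p, q, k_max, tol=1e-12):
    """Check |phi(2 pi/q)|^k <= 2 psi_TV(k) and 4 psi_TV(k)^2 <= sum_{h != 0} |phi(2 pi h/q)|^(2k)."""
    moduli = [char_fn_abs(pmf, 2 * math.pi * h / q) for h in range(1, q)]
    for k, dist in walk_distributions(pmf, p, q, k_max):
        tv = psi_tv(dist)
        if moduli[0] ** k > 2 * tv + tol:
            return False
        if 4 * tv ** 2 > sum(m ** (2 * k) for m in moduli) + tol:
            return False
    return True


if __name__ == "__main__":
    print(tv_bounds_hold({0: 0.2, 1: 0.5, 3: 0.3}, 3, 11, 100))
```

The sum over h does not depend on p because multiplication by p permutes the nonzero residues modulo q, which is the bijection used in the second step above.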
JOURNAL INFORMATION
The Journal of the London Mathematical Society is wholly owned and managed by the London Mathematical Society, a not‐for‐profit Charity registered with the UK Charity Commission. All surplus income from its publishing programme is used to support mathematicians and mathematics research in the form of research grants, conference grants, prizes, initiatives for early career researchers and the promotion of mathematics.
REFERENCES
- [1] C. Aistleitner, I. Berkes, and K. Seip, GCD sums from Poisson integrals and systems of dilated functions, J. Eur. Math. Soc. 17 (2015), 1517–1546.
- [2] D. Aldous, Random walks on finite groups and rapidly mixing Markov chains, Seminar on probability, XVII, Lecture Notes in Math., vol. 986, Springer, Berlin, 1983, pp. 243–297.
- [3] D. Aldous and P. Diaconis, Shuffling cards and stopping times, Amer. Math. Monthly 93 (1986), 333–348.
- [4] R. Baker, Metric number theory and the large sieve, J. Lond. Math. Soc. 24 (1981), 34–40.
- [5] I. Berkes and B. Borda, Berry–Esseen bounds and Diophantine approximation, Anal. Math. 44 (2018), 149–161.
- [6] I. Berkes and B. Borda, On the discrepancy of random subsequences of {nα}, Acta Arith. 191 (2019), 383–415.
- [7] I. Berkes and B. Borda, On the discrepancy of random subsequences of {nα}, II, Acta Arith. 199 (2021), 303–330.
- [8] I. Berkes and B. Borda, On the law of the iterated logarithm for random exponential sums, Trans. Amer. Math. Soc. 371 (2019), 3259–3280.
