Sarnak's Conjecture for Sequences of Almost Quadratic Word Growth
Redmond McNamara

TL;DR
This paper proves the logarithmic Sarnak conjecture for sequences with subquadratic word growth. As a consequence, the Liouville function has at least quadratically many sign patterns, and unpretentious multiplicative functions do not locally correlate with such sequences. A separate conditional result assumes a Fourier uniformity conjecture.
Contribution
It establishes the logarithmic Sarnak conjecture for sequences of subquadratic word growth and introduces a conditional result linking Fourier uniformity conjectures to the behavior of the Liouville function.
Findings
Liouville function has at least quadratically many sign patterns
Sequences with subquadratic word growth do not locally correlate with multiplicative functions
Conditional results depend on Fourier uniformity conjectures
Abstract.
We prove the logarithmic Sarnak conjecture for sequences of subquadratic word growth. In particular, we show that the Liouville function has at least quadratically many sign patterns. We deduce the main theorem from a variant which bounds the correlations between multiplicative functions and sequences with subquadratically many words which occur with positive logarithmic density. This allows us to actually prove that our multiplicative functions do not locally correlate with sequences of subquadratic word growth. We also prove a conditional result which shows that if the -Fourier uniformity conjecture holds then the Liouville function does not correlate with sequences with many words of length where . We prove a variant of the -Fourier uniformity conjecture where the frequencies are restricted to any set of box dimension .
1. Introduction
The prime number theorem states that
$$\frac{1}{N}\sum_{n \le N} \Lambda(n) = 1 + o(1),$$
where $\Lambda(n) = \log p$ if $n$ is a power of a prime $p$ and $\Lambda(n) = 0$ otherwise is the von Mangoldt function. (We refer the reader to Section 1.1 for an explanation of the notation.) This is equivalent to the estimate
$$\frac{1}{N}\sum_{n \le N} \lambda(n) = o(1),$$
where $\lambda(n) = (-1)^{\#\text{ of prime factors of } n}$, with prime factors counted with multiplicity, is the Liouville function. Dirichlet’s theorem on prime numbers in arithmetic progressions morally follows from the estimate
[TABLE]
for any and . Taking linear combinations, we find that for any periodic function ,
[TABLE]
Equivalently, for any function and any rational angle ,
[TABLE]
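As a numerical illustration of the estimates above, the following minimal sketch computes the Liouville function with a smallest-prime-factor sieve and evaluates its average, both on its own and against the indicator of a residue class. The function names, the cutoff `N = 100_000` and the residue class 1 mod 3 are choices made for this illustration only, and a finite computation is no substitute for the asymptotic statements.

```python
from math import isqrt


def liouville_upto(N):
    """Return a list lam with lam[n] = lambda(n) for 1 <= n <= N (lam[0] is unused)."""
    if N < 1:
        raise ValueError("N must be at least 1")
    # spf[n] will hold the smallest prime factor of n.
    spf = list(range(N + 1))
    for p in range(2, isqrt(N) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, N + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (N + 1)
    lam[1] = 1
    for n in range(2, N + 1):
        # Removing one prime factor flips the sign.
        lam[n] = -lam[n // spf[n]]
    return lam


def average_against(lam, f):
    """Plain (non-logarithmic) average of lambda(n) * f(n) over 1 <= n <= N."""
    N = len(lam) - 1
    return sum(lam[n] * f(n) for n in range(1, N + 1)) / N


N = 100_000  # illustrative cutoff
lam = liouville_upto(N)
mean_lambda = average_against(lam, lambda n: 1)
mean_on_class = average_against(lam, lambda n: 1 if n % 3 == 1 else 0)
print(mean_lambda, mean_on_class)
```

Both printed averages should be small in absolute value, in line with the cancellation that the prime number theorem and its analogue in arithmetic progressions predict.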
The analogous estimate when is irrational and is a continuous function was proved by Vinogradov and was a key ingredient in his proof that any sufficiently large odd number is the sum of three primes. Green and Tao proved that
[TABLE]
where is a nilpotent Lie group, is an element of , is a cocompact lattice and is a continuous function . A version of this statement was a key ingredient in their proof with Tamar Ziegler that counts the solutions to almost any system of linear equations over the primes. This motivates the following conjecture, due to Sarnak:
Conjecture 1.1 (Sarnak, see [MR3014544]).
For any topological dynamical system $(X, T)$ with zero entropy, any continuous function $f \colon X \to \mathbb{C}$ and any point $x$ in $X$,
$$\frac{1}{N}\sum_{n \le N} \lambda(n) f(T^{n} x) = o(1).$$
Tao introduced the following variant,
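Conjecture 1.2 (Logarithmically Averaged Sarnak Conjecture).
For any topological dynamical system $(X, T)$ with zero entropy, any continuous function $f \colon X \to \mathbb{C}$ and any point $x$ in $X$,
$$\frac{1}{\log N}\sum_{n \le N} \frac{\lambda(n) f(T^{n} x)}{n} = o(1).$$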
Many instances of Sarnak’s conjecture have been proven. We give a few examples but stress that this is an incomplete list: [MR3043150] [MR2986954] [MR3095150] [MR3459905] [MR3415586] [MR3622068] [MR3819999] [MR3631324] [MR3347317] [MR3263939] [MR3724218] [MR3859364] [MR3702497] [MR3660308].
Definition 1.3.
A word of length is an element of . Let be a natural number, let and let . We say that occurs as a word of if there exists a natural number such that for all . We say that occurs with (upper) logarithmic density if
[TABLE]
In this paper, when we refer to log-density we mean upper logarithmic density. A word whose entries are all $\pm 1$ is called a sign pattern. We say that has subquadratic word growth if takes finitely many possible values and the number of words of length that occur with positive upper logarithmic density is .
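The notions in Definition 1.3 can be explored on finite prefixes of a sequence. The sketch below is only a finite proxy: upper logarithmic density is a limsup, whereas the code computes a single logarithmic average, normalized by the harmonic sum rather than by a logarithm. The function names and the test sequence (the parity sequence) are assumptions of this illustration.

```python
def words_of_length(seq, k):
    """Set of distinct words (as tuples) of length k occurring in the finite list seq."""
    return {tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def log_density_of_word(seq, word):
    """Finite logarithmic average of the set of starting positions of `word`.

    seq[0] plays the role of the first term of the sequence, so position
    n = i + 1 receives weight 1 / n. The result is
    (sum of 1/n over matching n) / (sum of 1/n over all admissible n).
    """
    word = tuple(word)
    k = len(word)
    hits = 0.0
    total = 0.0
    for i in range(len(seq) - k + 1):
        weight = 1.0 / (i + 1)
        total += weight
        if tuple(seq[i:i + k]) == word:
            hits += weight
    return hits / total


# Illustrative periodic sequence: b(n) = n mod 2 for n = 1, ..., 100000.
b = [n % 2 for n in range(1, 100_001)]
words = words_of_length(b, 3)
density = log_density_of_word(b, (1, 0, 1))
print(sorted(words), density)
```

For this periodic example only two words of length 3 occur, and each has logarithmic density 1/2 in the limit; the finite average converges to that value slowly, at a rate of roughly one over the logarithm of the cutoff.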
Then a particular case of Sarnak’s conjecture predicts that, for any bounded sequence with subexponential word growth,
[TABLE]
Because correlates with itself, this in particular implies that the number of sign patterns of of length is exponential in . [FrantzikinakisHost] proved the special case where has linear word growth. In this paper, we prove the following special case:
Theorem 1.4.
Let be a bounded sequence with subquadratic word growth. Then
[TABLE]
Previously [Hildebrand] showed that all 8 sign patterns of length 3 occur infinitely often. [MRT2] showed all 8 sign patterns of length 3 occur with positive density. [TJ] proved that all 16 sign patterns of length 4 occur with positive density using an argument communicated to them by Matomäki and Sawin. [TJ] also showed the number of sign patterns of length is at least for . [FrantzikinakisHost] showed that the number of sign patterns is superlinear. In particular, Theorem 1.4 implies that does not have subquadratically many sign patterns. We actually prove something slightly stronger.
Theorem 1.5.
There is a constant such that has at least many sign patterns of length .
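A quick experiment makes the sign-pattern counts discussed above concrete. The sketch below counts the distinct sign patterns of the Liouville function that appear among the first `N` integers; the cutoff `N = 1000` and the helper names are choices for this illustration. Note that occurring in a finite range is a much weaker property than occurring with positive upper logarithmic density, which is what the theorems concern.

```python
from math import isqrt


def liouville_upto(N):
    """Return a list lam with lam[n] = lambda(n) for 1 <= n <= N (lam[0] is unused)."""
    if N < 1:
        raise ValueError("N must be at least 1")
    spf = list(range(N + 1))  # smallest prime factor table
    for p in range(2, isqrt(N) + 1):
        if spf[p] == p:
            for m in range(p * p, N + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (N + 1)
    lam[1] = 1
    for n in range(2, N + 1):
        lam[n] = -lam[n // spf[n]]
    return lam


def count_sign_patterns(lam, k):
    """Number of distinct windows (lam[n], ..., lam[n + k - 1]) with 1 <= n <= N - k + 1."""
    N = len(lam) - 1
    return len({tuple(lam[n:n + k]) for n in range(1, N - k + 2)})


lam = liouville_upto(1000)  # illustrative cutoff
counts = {k: count_sign_patterns(lam, k) for k in range(1, 7)}
for k, c in counts.items():
    print(k, c, 2 ** k)
```

In this range every sign pattern of length 3 and of length 4 already appears, consistent with the results of [Hildebrand], [MRT2] and [TJ] quoted above.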
[TaoEquiv] showed that the Sarnak conjecture is equivalent to the following Fourier uniformity conjecture for every natural number .
Conjecture 1.6 (-Fourier uniformity).
Let be a nilpotent Lie group of step , a cocompact lattice and a continuous function. Then
[TABLE]
[TaoEquiv] also showed that this is equivalent to the log-Chowla conjecture for every .
Conjecture 1.7 (Logarithmic Chowla Conjecture).
For every natural number and every distinct natural numbers , we have
[TABLE]
A function is said to be unpretentious, nonpretentious or strongly aperiodic if there exists a function from to such that, for all natural numbers , for all Dirichlet characters of period at most , for all sufficiently large natural numbers and for all real numbers , we have
[TABLE]
and as . The main goal of this paper is to prove the following theorems.
Theorem 1.8.
Let be an unpretentious completely multiplicative function taking values in the unit circle. Let be a finite-valued 1-bounded function. Suppose further that for any there are infinitely many such that the number of words of of length that occur with positive upper logarithmic density is at most . Then
[TABLE]
We also obtain a conditional version of this result.
Theorem 1.9.
Let be a natural number. Set . Let be an unpretentious completely multiplicative function taking values in the unit circle so that the local -Fourier uniformity conjecture holds for . Let be a finite-valued 1-bounded function. Suppose further that for some there are infinitely many such that the number of words of of length that occur with positive upper logarithmic density is at most . Then
[TABLE]
We note that this result matches the numerology in [Sawin] and may be almost the best possible result one can obtain with purely dynamical methods. We also note that even the Fourier uniformity conjecture is still unknown and so this theorem currently has no unconditional content. We also obtain a version of the theorem where need not take only finitely many values and we only have information about the number of “approximate” words.
Definition 1.10.
We say a sequence has at most words of length up to rounding if there exists a set of words of length such that for all there is an in such that for all and the cardinality of is at most . We say has at most words of length that occur with positive logarithmic density up to rounding if we only require for a set of of lower logarithmic density .
Theorem 1.11.
Let and . Then if is sufficiently small depending on then the following holds: Let be an unpretentious completely multiplicative function taking values in the unit circle. Let be a 1-bounded function with entropy zero. Suppose further that for every there are infinitely many such that the number of words of of length that occur with positive logarithmic density up to rounding is at most . Then
[TABLE]
In fact, this works for any satisfying .
We list a few new applications of this theorem.
Proof of Theorem 1.5. Apply Theorem 1.8 to .
Theorem 1.12.
If is a finite set of sequences of subquadratic word growth and is an unpretentious completely multiplicative function taking values in the unit circle then
[TABLE]
Remark 1.13.
We remark that since the set is finite, it is enough to show that any one function does not locally correlate with . However, we also remark that it is generally harder to show that does not locally correlate with than it is to show that does not correlate with . For Theorem 1.12, we need to use that Theorem 1.8 allows us to handle the case where may have many words which occur with $0$ log-density but still only subquadratically many words which occur with positive log-density. Theorem 1.12 in the linear word growth case seems to follow implicitly from [alex2019mbius].
Proof.
For convenience, we will assume that [math] is in . Let . We aim to show that
[TABLE]
Suppose not. We will now use an argument of [TaoEquiv] (see Section 5 of that paper) to show that must correlate with a “ticker tape” function. We define to be the set of sequences of the form where is an element of and is a rational number with denominator . By the pigeonhole principle, for any in and any natural numbers and in there exists a rational number with denominator such that
[TABLE]
Therefore, we may assume for the sake of contradiction that for some in
[TABLE]
By a diagonalization argument, we may find a sequence and of natural numbers both tending to infinity and functions such that and
[TABLE]
Since the functions in and are bounded, for sufficiently large there exists a set of natural numbers of lower logarithmic density in the interval such that for in ,
[TABLE]
By a greedy algorithm, we can select a subset of of upper logarithmic density at least in that is at least separated (meaning distinct points of differ by at least ). Now define the “ticker tape” function as follows:
[TABLE]
for all in between and and . If is not of the form for in between and and then we set . Thus,
[TABLE]
Now we aim to show that has subquadratically many words of length that occur with positive upper logarithmic density. Let be a natural number and let be a word of length which occurs in with positive upper logarithmic density. Consider the set of natural numbers such that is within of an element of or for some . Then since elements of are at least separated, the upper logarithmic density of in is at most which clearly tends to $0$ as tends to infinity. Since , we may assume that the log-density of in the interval is also . Thus, has log-density $0$. Therefore, if occurs with positive log-density then for a positive density set of not in . Since has only finitely many members, we get that there exists in such that for a positive upper logarithmic density set of , for all . Thus, has subquadratic word growth and does not correlate with by Theorem 1.8, which gives a contradiction. ∎
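The greedy selection step used in the proof above, which extracts a well-separated subset from a set of natural numbers, can be sketched as follows. The function name and the sample data are hypothetical, and the sketch says nothing about logarithmic densities; it only shows the selection rule.

```python
def greedy_separated(points, gap):
    """Scan the points in increasing order and keep each one that lies
    at least `gap` beyond the most recently kept point."""
    chosen = []
    for n in sorted(points):
        if not chosen or n - chosen[-1] >= gap:
            chosen.append(n)
    return chosen


sample = [1, 2, 3, 10, 11, 25]  # illustrative data
selected = greedy_separated(sample, 5)
print(selected)  # [1, 10, 25]
```

Each kept point causes only the points in the window of length `gap` after it to be skipped, which is the heuristic reason a fixed proportion of the original set survives the selection.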
Theorem 1.14.
Let be a subset of of upper box dimension . Then if is an unpretentious completely multiplicative function taking values in the unit circle
[TABLE]
Remark 1.15.
In particular, this implies that if is the middle thirds Cantor set then
[TABLE]
Of course, the result also applies to a large family of other fractals. The author does not know of any results in the literature where this is established for any infinite set. He does not know of any proof for any set of positive box dimension which does not use Theorem 1.11.
Proof.
Suppose the upper box dimension of is . Let . As in the proof of Theorem 1.12, we assume that
[TABLE]
and derive a contradiction. As before, there is a ticker tape function such that
[TABLE]
of the following form: there exist sequences of natural numbers and tending to infinity with , a sequence of -separated sets , and for some rational of denominator at most , some in and for all in some set and . We set for all natural numbers not of this form. As before, for any natural number , the set of natural numbers that are within of a number in or has log-density $0$.
Let be a natural number sufficiently large depending on and . Let . Then because has upper box dimension there exists a collection of at most intervals of length covering . If two points on the circle and differ by at most then by the triangle inequality, for all , we have that . Therefore, the number of sign patterns of that occur with positive log-density up to rounding is sublinear. In particular, for any , there are fewer than many sign patterns that occur with positive log-density up to rounding. By Theorem 1.11, we get a contradiction. ∎
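The covering step in the proof above can be made concrete for the middle thirds Cantor set of Remark 1.15. The sketch below counts how many triadic intervals of length $3^{-k}$ meet the endpoints of a level-`m` approximation, using exact integer arithmetic; the function names and the levels chosen are assumptions of this illustration. The same $2^k$ triadic intervals, taken closed, cover the full Cantor set, which gives box dimension $\log 2 / \log 3 < 1$.

```python
from math import log


def cantor_points(m):
    """Integers a such that a / 3**m is a left endpoint of an interval in the
    level-m stage of the middle thirds construction (ternary digits in {0, 2})."""
    pts = [0]
    for _ in range(m):
        pts = [3 * a + d for a in pts for d in (0, 2)]
    return pts


def covering_number(m, k):
    """Number of intervals [j / 3**k, (j + 1) / 3**k) meeting the level-m endpoints (k <= m)."""
    scale = 3 ** (m - k)
    return len({a // scale for a in cantor_points(m)})


for k in range(1, 8):
    n_k = covering_number(10, k)
    # Box-counting estimate: log(number of boxes) / log(1 / box length).
    print(k, n_k, log(n_k) / log(3 ** k))
```

The estimate printed in the last column equals $\log 2/\log 3 \approx 0.63$ at every scale, so the number of intervals needed grows like a power of the inverse length with exponent strictly below 1.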
On the surface, this argument appears to be very close to the -Fourier uniformity conjecture, which Tao introduced in [TaoEquiv] and proved was equivalent to the log-Chowla and log-Sarnak conjectures. (For recent significant progress on the Fourier uniformity conjecture, see [MRT3]). If you wanted to prove the Fourier uniformity conjecture in the case , namely that
[TABLE]
the ticker tape functions that you would need to be orthogonal to have many sign patterns of length up to rounding. Thus, one might hope that a simple argument could adjust the constants in Theorem 1.11 and thereby prove the Fourier uniformity conjecture. However, there is a major theoretical obstacle to further progress. [FrantzikinakisHost] introduced the dynamical system where . [Sawin] showed that this dynamical system with some additional structure is a dynamical model for the Liouville function (a notion which we will precisely define later). This is an obstruction to solving the Fourier uniformity conjecture purely with dynamical methods and without any new input from number theory. [Sawin] further showed that there are dynamical models for the Liouville function which have only polynomially many sign patterns. Explicitly, consider the following function which behaves almost like a multiplicative function: we partition the natural numbers into intervals with the length of the intervals slowly tending to infinity. For instance, we could split all the numbers between and into blocks of length . Then on each interval we pick a random phase in uniformly and independently. Then we set to be the function obtained by rounding the function which sends for in . In formulas, we set for in . We remark that the dynamical model for this sequence is isomorphic to the product of the dynamical system introduced by [FrantzikinakisHost] with (again, we defer the precise definition until later). Clearly, is not multiplicative. However, it is “statistically” multiplicative in the sense that, with high probability, for any sign pattern of length , for any and for large
[TABLE]
This function clearly does not satisfy the -Fourier uniformity conjecture and [Sawin] showed that it has quadratically many sign patterns that occur with positive upper logarithmic density even though it does satisfy a version of [MRT]. If we had used a random -degree polynomial instead of a random linear polynomial, we would get a function which is again statistically multiplicative but which fails the -Fourier uniformity conjecture and [Sawin] showed that it has many sign patterns of length . However, the author is unaware of any “dynamical” techniques that distinguish these statistically multiplicative functions from the Liouville function. This is made precise with Definition 1.18.
We give one last application.
Theorem 1.16.
Again, suppose that is an unpretentious completely multiplicative function taking values in the unit circle. There is a set of Hausdorff dimension 1 such that
[TABLE]
Proof.
The main idea is to combine Theorem 1.14 with a diagonalization argument. For a disjoint collection of intervals and a natural number we define to be the set of intervals obtained by taking each , removing a ball of diameter around the center of the interval , taking the two remaining intervals, then taking the union over all in .
We construct inductively as follows. Start with any interval and set . Assume inductively that we have constructed Then we apply again and again. Let
[TABLE]
Since has box dimension , we know by Theorem 1.14 that there exists a natural number such that if then
[TABLE]
Then define
[TABLE]
We set
[TABLE]
Clearly, the Hausdorff dimension of is at least for every and therefore the Hausdorff dimension is precisely . Now we verify that has the desired property. For each natural number , by enlarging the set we are maximizing over, we have that
[TABLE]
Since was arbitrary, we obtain the desired result. ∎
Remark 1.17.
We have stated our main theorems in the case that is completely multiplicative and takes values in the unit circle. We remark that these assumptions can be weakened to include all multiplicative functions taking values in the unit disk. The reduction from multiplicative functions taking values in the unit disk to multiplicative functions taking values in the unit circle is essentially due to Tao (see [TaoChowla], Proposition 2.1). The reduction from multiplicative functions to completely multiplicative functions (say both taking values in the unit circle) is carried out in the second appendix to this paper. The argument is rather short and was essentially communicated to me by Tao. However, it may be more broadly known and I make no claim of originality.
We now sketch an outline of an argument that is morally very similar to the main argument in this paper. However, for the moment we will work in a more concrete setting. To make this argument rigorous, it is much easier to pass to the dynamical context. Suppose that is a sequence with quadratic word growth rate and that
[TABLE]
Then we can fix a natural number and average over translates,
[TABLE]
Fix a large natural number with . Because also has a multiplicative symmetry, we can average over dilates
[TABLE]
Moving the absolute values inside and crudely replacing by the worst word of length , we get
[TABLE]
where the supremum is taken over all words of . Tao’s entropy decrement argument, introduced in [TaoChowla], allows us to replace by .
[TABLE]
Now if behaves randomly, then we already know that is orthogonal to . Therefore, if correlates with it must have some structure. Morally, [FrantzikinakisHost] says we can break up into a structured part and a random part, and that all the correlation comes from the structured part. [HostKra] proves that the structured part must take the form of a nilsequence. For the purposes of this sketch, we will focus on the case that there exists and irrational such that
[TABLE]
By Hölder’s inequality,
[TABLE]
By the pigeonhole principle, since there are only many sign patterns, there is a sign pattern such that,
[TABLE]
Expanding everything out and using that
[TABLE]
When or then for large, by the circle method
[TABLE]
The analogue of the circle method for more general nilpotent Lie groups was introduced in [GT1], [GT2] and [GTZ]. The analogue of the step where we conclude that the sums of powers is [math] for more general nilpotent Lie groups is an argument of [Frantzikinakis]. Thus the only contribution is from the terms where and . But it is easily seen from Newton’s identities for symmetric polynomials that this only happens for the “diagonal” terms. Thus, we get
[TABLE]
which of course provides a contradiction. For the proof of Theorem 1.9, we need to not only use the theory of symmetric polynomials but also use [MR3548534].
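The exact equalities in the sketch above are not reproduced in this text, so the following is only an illustration of the general fact behind the appeal to Newton's identities: the first $k$ power sums of a $k$-tuple of numbers determine the tuple up to reordering, while fewer power sums do not. The brute-force check below verifies this for pairs and triples drawn from a small list of primes; the helper names, the list of primes and the parameter `num_sums` are hypothetical.

```python
from itertools import combinations_with_replacement


def power_sums(values, num_sums):
    """Tuple (p_1, ..., p_num_sums) with p_j = sum of v**j over values."""
    return tuple(sum(v ** j for v in values) for j in range(1, num_sums + 1))


def power_sums_separate_multisets(pool, size, num_sums):
    """True if distinct multisets of the given size drawn from `pool`
    always have distinct tuples of the first `num_sums` power sums."""
    seen = set()
    # combinations_with_replacement on a sorted pool yields each multiset once.
    for multiset in combinations_with_replacement(sorted(set(pool)), size):
        key = power_sums(multiset, num_sums)
        if key in seen:
            return False
        seen.add(key)
    return True


primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # illustrative pool
print(power_sums_separate_multisets(primes, 2, 2))  # two power sums pin down a pair
print(power_sums_separate_multisets(primes, 3, 3))  # three power sums pin down a triple
print(power_sums_separate_multisets(primes, 2, 1))  # the sum alone does not: 3 + 7 = 5 + 5
```

This matches the role of the "diagonal" terms in the sketch: once all the relevant power sums agree, the two collections of primes must coincide up to order.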
1.1. Background and notation
Suppose is a 1-bounded, unpretentious multiplicative function with for all . Let be a sequence where only or many sign patterns occur with positive log-density. The usual construction of a Furstenberg system (see [MR670131]) for proceeds as follows: consider the point in the space of pairs of sequences. Then apply a random shift to this deterministic variable, . This gives a random variable in the space of pairs of sequences. The distribution of this random variable is then a shift invariant measure on the space of pairs of sequences. Furthermore, if is the function on the space of pairs of sequences that evaluates the first sequence at and is the function which evaluates the second sequence at then
[TABLE]
which is the sequence whose average value we care about. Therefore, if the average of is greater than in absolute value then
[TABLE]
as well. Of course, it does not really make sense to take a random natural number. Instead, one must shift by a random natural number in a large but finite interval whose length tends to infinity, then find a subsequence of the random variables that converges in distribution. This corresponds to taking a weak-* limit of the corresponding measures.
However, we take a slightly modified approach. The reason is that the function has some additional symmetry, namely . As such, the probability that some word occurs i.e., that for and for randomly chosen between and is the same as the probability that for and for chosen randomly between and . That’s the same as times the probability that for a randomly chosen between and one has for and divides . Just flipping everything around, the probability that a random between and satisfies and is divisible by is times the probability that a random between and satisfies . We want our dynamical system to capture this symmetry. There are two difficulties which arise when we want to translate this symmetry to our dynamical system. The first is that the interval keeps changing: the distribution of might be very different on the intervals from to and from to so when we take a weak limit along a subsequence of intervals, the distribution for shifts in one interval might approximate our invariant measure while shifts along the other interval might not. The fix for this problem is to use logarithmic averaging. After we weight each natural number by , the probability that a random will be between and is which tends to $0$ as tends to infinity. Therefore, the distribution of for a random between and is very close to the distribution of for a random between and as long as we choose randomly using logarithmic weights. The other problem is that our dynamical system does not have a good notion of “being divisible” by a number. To remedy this, we make use of the profinite completion of the integers
[TABLE]
where is always restricted to be a prime and is the -adic integers i.e. the inverse limit of for all . For each natural number , we get an element of by reducing for every prime and every natural number . Then to build our dynamical system, we take the space of triples consisting of two sequences and a profinite integer and for a logarithmically randomly chosen integer we consider the random variable in this space. The distribution of this random variable is a shift invariant measure. Furthermore, we have the following symmetry: let and . Define the function
[TABLE]
by projecting onto the coordinate in ,
[TABLE]
Define the function
[TABLE]
by “zooming in” by a factor of and multiplying by on the first factor and dividing by on the second,
[TABLE]
where is the unique element of such that . Then if is our invariant measure on and is its first marginal then pushes forward restricted to to . Formally, we make the following definition:
Definition 1.18.
Let be a dynamical system, let be a measurable function, let be a measurable function and for each let be a measurable function. We say is a dynamical model for if,
- almost everywhere.
- almost everywhere in .
- pushes forward the measure restricted to to . Symbolically, for any function in we have
[TABLE]
- almost everywhere in .
- For all and , almost everywhere in .
We also ask for the following property that [Sawin] does not impose.
- For any natural number and any measurable subset of ,
[TABLE]
where denotes upper logarithmic density. We remark that we can also fix a Banach limit extending the usual limit functional and require that equality in the previous equation holds for any limit taken with respect to that Banach limit. For more details, see [TaoBlog].
Let be a joining of two dynamical systems and . Suppose that is the first marginal and is a dynamical model for . Let be a measurable function on which is measurable. We say is a joining of a dynamical model of with if we also have that, for any natural number and any measurable subset of ,
[TABLE]
where denotes upper logarithmic density. We could also require that the joint statistics of agree with the joint statistics of but this is not necessary for our argument.
Remark 1.19.
The preceding definition was used first in [TaoBlog] and generalized in [Sawin].
We abuse notation and denote all transformations by the letter . We also remark that, in the proofs of Theorems 1.8 and 1.9, takes only finitely many values.
We now specify some notation used in the main argument:
- We fix an unpretentious 1-bounded multiplicative function . (For the definition of unpretentious, see [MRT]; we will only really use that is unpretentious in Theorem 2.1; we remark that the Liouville function is unpretentious). We fix constants , and . We fix a 1-bounded function with at most or many words of length occurring with positive upper logarithmic density for all where is some fixed infinite set. We suppose that
[TABLE]
We fix such that
[TABLE]
- We use the following theorem of [FrantzikinakisHost2].
Theorem 2.15 ([FrantzikinakisHost2] Theorem 1.5).
There exists a joining of a dynamical model for with , satisfying
[TABLE]
and if is the first marginal then the ergodic components are isomorphic to products of Bernoulli systems with the Host Kra factor of .
Because the statement here is slightly different than Theorem 1.5 in [FrantzikinakisHost2], we will go through the details in the first appendix. We fix such a system. We will always denote by the first marginal of . We also fix ergodic decompositions and . We define the words of length of to be those words of length such that the set of such that for all has positive measure. We note that the set of words of is a subset of the set of words of that occur with positive upper density: after all, if then by definition of a joining of a dynamical model with ,
[TABLE]
where denotes upper logarithmic density.
- will always refer to a nilpotent Lie group. will always refer to the step in the lower central series. will always refer to a cocompact lattice in , meaning that is compact for every . and will always refer to group elements. will always refer to the Borel sigma algebra. We will fix a particular , and following Corollary 2.20. For more on this see [GT3].
- For a nonempty, finite set and , we denote . For , we denote
[TABLE]
This notation is due to Frantzikinakis (see [Frantzikinakis]). We always restrict to be prime. By definition a nilsystem is a dynamical system where is a nilpotent Lie group, is a cocompact subgroup, is Haar measure, there exists such that and is the Borel sigma algebra. A nilsequence is a sequence of the form where is a nilpotent Lie group, is a cocompact lattice in , is an element in and and is a continuous function. Suppose is an -step nilpotent Lie group so that is an abelian group and is a compact abelian group. Then a nilcharacter is a function such that there exists a character called the frequency of such that, for all in and in we have . We will abuse notation and identify with the function on that maps . We say is nontrivial if there exists in such that . We say is nontrivial on the identity component if we can find a in the identity component of such that .
- For Theorem 2.14, we will adopt conventions from the theory of Shannon entropy. In particular, will denote the Shannon entropy of and will denote the mutual information between and . For more details, see [TaoChowla].
- We will always denote by the smallest sigma algebra on generated by the union of the sigma algebras corresponding to each of the Host Kra factors. We will denote
[TABLE]
Since whether is in depends only on , we will abuse notation and also use $B=\{y\in Y\colon f^{\prime}(T^{n}y)\text{ is eventually periodic as a function of } n\}$.
- For complex numbers , a set and a real number we say and if there exists a constant depending on but not and such that . If there are more subscripts we mean that the constant may depend on more parameters. For instance, by we mean that the implied constant can depend on , and .
1.2. Acknowledgments
Special thanks to Terence Tao for sharing many ideas on an earlier version of this paper and for his many helpful comments. Also, special thanks to Nikos Frantzikinakis for pointing out a number of ways to strengthen the main theorem of this paper. I would also like to thank Tim Austin, Björn Bringmann, Alex Dobner, Asgar Jamneshan, Bernard Host, Gyu Eun Lee, Zane Li, Adam Lott, Maksym Radziwiłł, Bar Roytman, Chris Shriver, Joni Teräväinen and Alex Wertheim for many helpful discussions. Thanks to Will Sawin for suggesting I use Vinogradov’s mean value theorem to improve an earlier version of Theorem 1.9. I would lastly like to thank the anonymous reviewer for their extremely helpful comments. Some of this work was completed while the author was at the American Institute for Mathematics workshop on Sarnak’s conjecture.
2. Main Argument
In this section, we prove Theorem 1.8 and Theorem 1.9. In Section 3, we explain how to adapt the proof to handle Theorem 1.11.
We remark that much of the notation, including and was defined in Subsection 1.1.
We start off with a theorem by [MRT], relying on work in [MR]. This is a special case of our theorem, so it is no surprise that we need this result.
Theorem 2.1 ([MRT] Theorem 1.7; see also [MR]).
Let be a bounded, non-pretentious multiplicative function. Let be a periodic sequence. Then
[TABLE]
This theorem says that does not locally correlate with periodic functions. Eventually, we plan to use a local argument. In particular, our argument will only work for those points where does not behave locally like a periodic function. Therefore, we need to exclude any contribution to the integral coming from points where behaves like a periodic function. That is the content of the following corollary.
Corollary 2.2.
Let $B=\{(x,y)\in X\times Y\colon f^{\prime}(T^{n}y)\text{ is eventually periodic as a function of } n\}$. Then
[TABLE]
Proof.
In this proof, we introduce some notation which will not be used in the rest of the paper. Because preserves and because is -invariant, we can average over shifts:
[TABLE]
We know takes only finitely many values. There are only countably many different periodic sequences taking values in a finite alphabet. Therefore, it suffices to prove that if is the set of points on which is eventually equal to the periodic function then
[TABLE]
as desired. ∎
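As a purely numerical illustration of the phenomenon behind Theorem 2.1, the following minimal sketch measures how strongly the Liouville function correlates with a periodic sequence on short windows. It uses the Liouville function as the example of a non-pretentious multiplicative function and a plain (not logarithmic) average over window positions; the function names, the window length and the choice of averaging are assumptions made for this sketch and are not taken from the paper.

```python
import math


def liouville_sieve(n_max):
    """Return lam with lam[n] = Liouville function of n for 1 <= n <= n_max."""
    # spf[n] is the smallest prime factor of n.
    spf = list(range(n_max + 1))
    for p in range(2, math.isqrt(n_max) + 1):
        if spf[p] == p:
            for m in range(p * p, n_max + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (n_max + 1)
    lam[1] = 1
    for n in range(2, n_max + 1):
        # Removing one prime factor flips the sign.
        lam[n] = -lam[n // spf[n]]
    return lam


def local_correlation(lam, period_values, window):
    """Average over x of |mean_{0 <= h < window} lam[x+h] * a(x+h)|.

    Here a(n) = period_values[n % len(period_values)] is a periodic sequence.
    """
    n_max = len(lam) - 1
    q = len(period_values)
    # Prefix sums of lam[n] * a(n) make each window sum an O(1) lookup.
    prefix = [0j] * (n_max + 1)
    for n in range(1, n_max + 1):
        prefix[n] = prefix[n - 1] + lam[n] * period_values[n % q]
    total = 0.0
    positions = 0
    for x in range(1, n_max - window + 2):
        window_sum = prefix[x + window - 1] - prefix[x - 1]
        total += abs(window_sum) / window
        positions += 1
    return total / positions


if __name__ == "__main__":
    lam = liouville_sieve(200_000)
    for window in (10, 100, 1000):
        print(window, local_correlation(lam, [1, -1, 1j], window))
```

The printed values can be compared across window lengths; a finite computation of this kind only illustrates the statement and proves nothing about the limit.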
We will also need the following result later. It states that does not correlate locally with periodic functions.
Corollary 2.3**.**
Let be an ergodic decomposition of . For almost every , for all -bounded function such that, for almost every , is periodic in we have
[TABLE]
Proof.
Let be a natural number. Then we claim that,
[TABLE]
where is the set of periodic, 1-bounded functions. Since the supremum is over a finite set, this directly follows from Theorem 2.1. Let . For sufficiently large,
[TABLE]
Therefore, as in the proof of Corollary 2.2,
[TABLE]
Since this is true for all , we get that
[TABLE]
Therefore, there exists of full measure such that for in , we have
[TABLE]
Now let be an element of for all and let be a 1-bounded function such that is periodic in for -almost every . Suppose that there exists such that
[TABLE]
Then by translation invariance, we know
[TABLE]
Let be the set of all points such that is periodic with period at most . Note by assumption that . Then by dominated convergence, there exists such that
[TABLE]
Since is periodic for every in , this integral is bounded by
[TABLE]
which gives a contradiction. ∎
For the proof of Theorem 1.9, we also need an upgraded version of Corollary 2.3 under the assumption that the -Fourier uniformity conjecture holds.
Proposition 2.4**.**
Suppose that the -Fourier uniformity conjecture holds i.e., for every nilpotent Lie group of step , every cocompact lattice and every continuous function
[TABLE]
Then for almost every , we have the following property: for every nilpotent Lie group of step , every cocompact lattice , every continuous function and every function on such that for almost every there exists in and in we have for all in we have that
[TABLE]
Proof.
In the proof of this proposition, we will introduce some notation which will not be used in the rest of the paper. By, for instance, [HostKra2, Chapter 10, Theorem 28] there are only countably many pairs up to isomorphism of . Thus, we can fix a sequence of nilpotent Lie groups of step and cocompact lattices such that, for any nilpotent Lie group of step and for any cocompact lattice there exists a natural number and a Lie group isomorphism such that . By Stone-Weierstrass, there exists a countable, uniformly dense subset of the continuous functions on . Fix such a subset and call it . We are assuming the -Fourier uniformity conjecture:
[TABLE]
depends measurably on . Thus, there exists some set in such that
[TABLE]
Remember that implicitly depends on and . By the definition of the ergodic decomposition, we have that
[TABLE]
Therefore, by another application of Chebyshev’s inequality, we find that
[TABLE]
We call this set . Of course depends on and . Define
[TABLE]
Since , we know that for any , we have and therefore .
Now we check that has the desired properties. Thus, fix in , a measurable function on , a nilpotent Lie group of step , a cocompact lattice and a function on . Suppose that for almost every in , there exists in such that for some in . Fix . We aim to show
[TABLE]
Fix in the natural numbers such that is isomorphic to . Fix an isomorphism such that . Fix in such that . Then is in so is in and therefore there exists such that is in . Therefore, for some ,
[TABLE]
Bounding the exceptional points by the norm, we get that:
[TABLE]
This completes the proof. ∎
Proposition 2.5**.**
Let be a (topologically) compact, invertible, not necessarily ergodic dynamical system. Let be an ergodic decomposition. Recall that, for each , the Host Kra factor is defined up to sets of -measure 0. For each , fix such a Host Kra factor. For instance, one could use any definition of the Host Kra factor and then add all sets of -measure 0 to obtain the complete Host Kra factor. Then there exists a sigma algebra on such that, for any measurable set , is measurable if and only if there exists a full measure subset such that for all in , is measurable. This implies that a function in is measurable if and only if there exists a full measure subset such that is measurable for every in .
Proof.
Let be the set of measurable subsets of such that there exists a full measure set such that for all in , is measurable. For each such set, fix such an . Let be a countable list of sets in . Consider
[TABLE]
Because is the intersection of countably many full measure sets, it has full measure. Let be an element of . Then for every natural number , is measurable. Because is a sigma algebra, that implies the countable intersection and countable union of the sets are also measurable. Thus the intersection and union are both measurable for a full measure subset and thus, by definition of , is closed under countable unions and intersections. If is in , then is in for every in . Since is a sigma algebra, the complement is also in for every in . By definition of , we conclude that is closed under complements. Obviously and are in so is a sigma algebra.
Lastly, we check that a function in is measurable if and only if it is measurable for a full measure set of . First, suppose there exists a full measure subset such that, for in , is measurable. Let be a measurable subset of . Then since is measurable for any in , is in for any in . Therefore, by definition of , is in so is measurable.
Now suppose is measurable. We approximate by simple functions . For instance, we can take if is between and for any natural number . Then in and also in for any by the dominated convergence theorem. For each , the function has only finitely many distinct level sets. Because is measurable, the level sets of are measurable. Therefore, there exists a full measure subset such that is measurable for all in . Let
[TABLE]
Then since is the intersection of sets of full measure, has full measure. For each in , is measurable for all natural numbers . But in so the limit is also measurable for all in . ∎
Definition 2.6**.**
By Proposition 2.5, there exists a sigma algebra such that a function is measurable if and only if it is measurable for almost every in . We fix such a sigma algebra and call it the Host Kra sigma algebra for .
Proposition 2.7**.**
Let be a function in . Then there exists a set of full measure in such that for in ,
[TABLE]
almost everywhere.
Proof.
First, we need the following quick ergodic theoretic fact. The space can be essentially partitioned into pieces where each piece carries all the mass of an ergodic component. More precisely, there exists a map such that
[TABLE]
for any integrable . For instance, in the usual construction of an ergodic decomposition, one can take to be the set of atoms of with respect to the invariant sigma algebra . Then let where is any point in the atom . In this case the map just sends to the atom containing .
By Proposition 2.5, there is a set of full measure such that is measurable for every in . We also ask that for in that
[TABLE]
which holds for a full measure set of . Fix such an . Since is compact, there exists a countable uniformly dense subset of the space of continuous functions. Fix such a subset and fix an order on that subset . Again by Proposition 2.5, there exists a full measure subset of such that for in , the function is measurable. Let
[TABLE]
Since each has full measure and there are only countably many choices of , we conclude that has full measure. Now let be an element of and suppose for the sake of contradiction that
[TABLE]
meaning equality does not hold up to sets of measure 0. The conditional expectation is uniquely defined by two properties, namely that is measurable and that
[TABLE]
for any measurable function in . If satisfies the same properties then equals -almost everywhere. We know since is in which is contained in that is measurable. Therefore, there exists in such that
[TABLE]
By subtracting off the appropriate multiple of , we may assume that is -orthogonal to . Multiplying by a scalar we may assume that is a positive real number greater than 1.
For each in such that , we showed there exists a measurable function such that and . Let be such a function. Suppose for the moment that . Since are dense in , for any and for any power we can find an such that . This implies, by Cauchy-Schwarz, that
[TABLE]
and
[TABLE]
We also need a quantitative way of saying that is close to being measurable. One option is to use the Host Kra norms defined for an ergodic system in [HostKra] section 3.5. Let denote the Host Kra norm. The key feature of the Host Kra norms is that a function is measurable if and only if for all (see [HostKra] Lemma 4.3). We claim that is a measurable function of . After all, by definition is the integral of some fixed function on , namely with respect to some measure (namely defined in [HostKra] section 3.1) which depends measurably on . Thus, we can find also with
[TABLE]
for all . If we also know that
[TABLE]
for some constant then also by the triangle inequality,
[TABLE]
Fix a constant . Now we define a function as follows: Let be the first index such that all four inequalities (2)-(5) are satisfied with if such an exists and otherwise. Note that implicitly depends on . Let denote the set of such that is finite for all . In particular, if then is in this set for some choice of . Thus, we may assume for the sake of contradiction that the measure of is positive. Let
[TABLE]
Since, for all
[TABLE]
we can take a weak-* limit for some subsequence of epsilons tending to 0. By a diagonalization argument, we can ensure that this weak-* limit exists for all . By (4), we conclude that is measurable for each in . If is not in , then on a set of full measure so is measurable with respect to for a full measure set of in so by definition is measurable. Furthermore, by (3),
[TABLE]
so we conclude that
[TABLE]
by integrating in . On the other hand, by (2),
[TABLE]
This contradicts the definition of . Thus,
[TABLE]
for almost every in . ∎
A crucial input is the following theorem of [FrantzikinakisHost2]. This theorem says that if correlates with then it does so for some algebraic reason. In particular, any correlation between and is due solely to some locally algebraic structure in .
Theorem 2.8** ([FrantzikinakisHost2] Theorem 1.5; see also the first appendix to this paper).**
Let be the first marginal of corresponding to the factor . Then the ergodic components of are isomorphic to the product of a Bernoulli system with the Host Kra factor of ).
To use this theorem, we need the following result, which essentially appears in [FrantzikinakisHost]:
Lemma 2.9** ([FrantzikinakisHost]; see the proof of Lemma 6.2).**
Suppose that where is a Bernoulli system, is a zero entropy system and is the first marginal of . Then for any function and any function we have
[TABLE]
where denotes the conditional expectation of with respect to the measure and the sigma algebra of -measurable functions.
Proof.
By density, it suffices to consider the case . Because any joining of the Bernoulli system and the zero entropy system is trivial, i.e. is equipped with the product measure, we can break up the integral
[TABLE]
∎
We also need the following result, which says that conditional expectation is essentially local.
Putting everything together gives the following corollary.
Corollary 2.10**.**
Let , , , , , and be as in Subsection 1.1. Let be as in Corollary 2.2. Then
[TABLE]
Proof.
Recall that
[TABLE]
This completes the proof. ∎
Now we forget everything about the joining of and and reduce to the worst case scenario, where we choose the worst possible in for each in .
Corollary 2.11**.**
Let , , , , , , and be as in Subsection 1.1. Let be as in Definition 2.6. Let be as in Corollary 2.2. Since whether only depends on , we abuse notation and write to mean for some . Then,
[TABLE]
where the supremum is an essential supremum taken with respect to the second marginal of .
We will need the following lemma, which states that conditioning with respect to a conditional measure is essentially the same as conditioning with respect to the original measure.
Lemma 2.12**.**
Let be a positive measure set in and denote . Then for any measurable function ,
[TABLE]
almost everywhere, i.e. for -almost every point in .
Proof.
Let be another set in . Then
[TABLE]
This is the defining property of . Since conditional expectation is well defined up to sets of measure 0, we obtain the result. ∎
The system possesses an extra symmetry that most dynamical systems do not have, a dilation symmetry. In fact, it possesses a whole family of dilation symmetries. It is not obvious which dilation makes the problem easiest. Therefore, instead of choosing a particular dilation, we use a random dilation.
Proposition 2.13**.**
Let be any natural number. Then
[TABLE]
where is always restricted to be prime.
Proof.
By Corollary 2.11 we have
[TABLE]
Now we use that pushes forward to for every and average in .
[TABLE]
Because for almost every in we have that,
[TABLE]
Next we use the standard fact that
[TABLE]
-almost everywhere, where is the pushforward of . Since , we can replace by . Note that defines a factor map between and . Since Host Kra factors are functorial, the Host Kra factor for factors onto the Host Kra factor for . Thus, is contained in the Host Kra factor of some dynamical system and thus corresponds to an inverse limit of nilsystems. This is all we actually need for our purposes. However, for the sake of avoiding notation, we also prove that
[TABLE]
That follows from the definition of . If denotes the Host Kra factor for and denotes the Host Kra factor for , then any invariant subset of the cube is an element of the Kronecker factor, i.e. the first Host Kra factor for (where is the measure on the cube defined in section 3 of [HostKra]). Since the Host Kra factor of an ergodic system is the smallest sigma algebra generating the invariant factor on the cube, we conclude that so . In fact, as in the appendix, the Host Kra factor for is a joining of the Host Kra factor on the space of sequences and . On the second factor, acts by division by . On the first factor, and so on each ergodic component of the first factor, acts by multiplication by up to a possible translation. Multiplication by is a local isomorphism of any nilmanifold that does not contain torsion. However, by Corollary 2.3, is already orthogonal to all torsion. Thus,
[TABLE]
Combined with Lemma 2.12 and the fact that is invariant and therefore an element of we get,
[TABLE]
Recall that for all . Thus, merely gets absorbed into the absolute value.
[TABLE]
∎
2.1. The Entropy Decrement Argument
Next, we use the entropy decrement method to replace by its average, . This is essentially due to Tao but because our statement is slightly different we reproduce the argument. For the definitions of entropy, conditional entropy, mutual information and conditional mutual information see [TaoChowla].
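For readers who want the information-theoretic quantities in concrete form, the following minimal sketch computes Shannon entropy, conditional entropy and mutual information for a finite joint distribution, using the standard identities H(X|Y) = H(X,Y) - H(Y) and I(X;Y) = H(X) + H(Y) - H(X,Y). The helper names and the dictionary representation of a joint distribution are assumptions made for this sketch; the definitions used in the paper are those of [TaoChowla].

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy (natural logarithm) of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)


def marginal(joint, axis):
    """Marginal distribution of coordinate `axis` of a joint pmf keyed by tuples."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[axis]] += p
    return dict(out)


def conditional_entropy(joint):
    """H(X | Y) = H(X, Y) - H(Y) for a joint pmf keyed by pairs (x, y)."""
    return entropy(joint) - entropy(marginal(joint, 1))


def mutual_information(joint):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for a joint pmf keyed by pairs (x, y)."""
    return entropy(marginal(joint, 0)) + entropy(marginal(joint, 1)) - entropy(joint)


if __name__ == "__main__":
    independent_bits = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
    identical_bits = {(0, 0): 0.5, (1, 1): 0.5}
    print(mutual_information(independent_bits))
    print(mutual_information(identical_bits))
```

Independent variables have zero mutual information, while two identical fair bits share log 2 of information; the entropy decrement argument looks for a scale at which the mutual information between the two random variables introduced below is small.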
Let be a random variable distributed according to and fix a natural number . From this, we get the following two random variables. Set where and set in by . Denote so that . Note that is uniformly distributed in and that the distribution of is the same as the distribution of for any because is translation invariant. Technically, if takes infinitely many values then we will have to round so that each takes values in a finite set but this slightly annoying detail may be delayed for the moment. We want to study the following integral:
[TABLE]
By translation invariance, this is equal to
[TABLE]
Notice that this is the expected value of some function of and . In particular, we are interested in
[TABLE]
where is defined by the formula
[TABLE]
Define
[TABLE]
Thus, we are interested in
[TABLE]
We would like to say that and are very close to independent for some large choice of . Let be a random variable with the same distribution as but which is independent of . We would like to say that
[TABLE]
A property like this actually holds in a more general setting, which we take the liberty of stating now.
Theorem 2.14**.**
[[TaoChowla] Section 3; see also [Blog2], [TJ] Lemma 3.4 and Proposition 3.5 and [TJ2] Section 4] Let be a finite set and let be a natural number. For each power of two , let be a sequence of random variables with taking values in and let be a random variable that is uniformly distributed in . We write where . We further assume that for different values of , the random variables are jointly independent meaning is uniformly distributed in for all powers of two . Suppose that, for any natural numbers and such that we have that the distribution of is equal to the distribution of . Furthermore, suppose that for any and any element in and any measurable subset of ,
[TABLE]
For each with , let be a -bounded function and let . Let be a random variable with the same distribution as but which is independent of . Then
[TABLE]
Proof.
Fix a large power of two and . By replacing by we may assume that for all . To prove the theorem, first we need a very good understanding of the case when and are independent. In that case, even if we know the exact value of , is still a sum of independent random variables and therefore exhibits concentration. This is formalized in Hoeffding’s inequality, which says that large collections of independent random variables exhibit concentration.
Lemma 2.15** (Hoeffding’s Inequality).**
Suppose are independent random variables taking values in . Then
[TABLE]
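The concentration phenomenon can be checked empirically. The sketch below uses the textbook two-sided form of Hoeffding's inequality for real variables with values in [-1, 1], namely P(|mean - E mean| >= t) <= 2 exp(-n t^2 / 2); the normalization of the lemma as stated in the paper may differ (for instance for complex-valued variables), so the constant here is an assumption. The function names are hypothetical.

```python
import math
import random


def hoeffding_bound(n, t):
    """Two-sided Hoeffding bound for the mean of n independent variables in [-1, 1]."""
    # Each variable has range length 2, so the general bound
    # 2 * exp(-2 * (n t)^2 / (n * 2^2)) simplifies to 2 * exp(-n t^2 / 2).
    return 2.0 * math.exp(-n * t * t / 2.0)


def empirical_tail(n, t, trials, seed=0):
    """Fraction of trials in which the mean of n random signs has absolute value >= t."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        mean = sum(rng.choice((-1, 1)) for _ in range(n)) / n
        if abs(mean) >= t:
            exceed += 1
    return exceed / trials


if __name__ == "__main__":
    n, t = 400, 0.1
    print("empirical tail:", empirical_tail(n, t, trials=2000))
    print("Hoeffding bound:", hoeffding_bound(n, t))
```

Random signs have mean zero, so the empirical tail frequency should fall below the bound, usually by a wide margin since Hoeffding's inequality is not sharp.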
Let be an element of . We apply Hoeffding’s inequality to the random variables . (We remind the reader that there are roughly many such terms, by the prime number theorem).
[TABLE]
Next, we aim to show that if is not necessarily independent of but nearly independent of , we still can obtain a good bound. To do this, we use a Pinsker-type inequality.
Lemma 2.16**.**
[[TJ2] Lemma 3.4] Let be a random variable taking values in a finite set, let be a uniformly distributed random variable on the same set and let be a set. Then
[TABLE]
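One common form of such a Pinsker-type bound is P(Y in E) <= (H(U) - H(Y) + log 2) / log(1 / P(U in E)), valid for a nonempty proper subset E of the finite set. Since the display of the lemma is not reproduced here, treating this as the intended normalization is an assumption. The following minimal sketch checks that form numerically on random distributions; all names are hypothetical.

```python
import math
import random


def shannon_entropy(probabilities):
    """Shannon entropy (natural logarithm) of a probability vector."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def event_probability_bound(probabilities, event):
    """Upper bound (H(U) - H(Y) + log 2) / log(1 / P(U in E)) for P(Y in E).

    `probabilities` is the distribution of Y on {0, ..., n-1} and `event`
    is a nonempty proper subset E of that index set.
    """
    n = len(probabilities)
    m = len(event)
    if not 0 < m < n:
        raise ValueError("event must be a nonempty proper subset")
    entropy_deficit = math.log(n) - shannon_entropy(probabilities)
    # For uniform U, P(U in E) = m / n, so log(1 / P(U in E)) = log(n / m).
    return (entropy_deficit + math.log(2)) / math.log(n / m)


def random_probability_vector(n, rng):
    """A random, fairly non-uniform probability vector of length n."""
    weights = [rng.random() ** 3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    rng = random.Random(1)
    pmf = random_probability_vector(16, rng)
    event = set(rng.sample(range(16), 3))
    print(sum(pmf[i] for i in event), event_probability_bound(pmf, event))
```

The bound is only useful when Y has nearly maximal entropy and E is a small set under the uniform distribution, which is the regime in which it is applied here.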
Let be an element of . Let be the set of in such that . By (6), we know
[TABLE]
Applying Lemma 2.16 to , we find
[TABLE]
Note that
[TABLE]
where the last equality follows since the two random variables have the same distribution. Therefore, summing over , we get
[TABLE]
If
[TABLE]
then
[TABLE]
This would complete the proof. Let . Fix an element of . Then we may repeat the previous argument with to conclude:
[TABLE]
and therefore if
[TABLE]
then
[TABLE]
and therefore
[TABLE]
Let be a power of two. We will try to show that there exists such that
[TABLE]
This would complete the proof. Suppose not. Then
[TABLE]
for all . By definition of mutual information,
[TABLE]
where . Since has the same distribution as , for any set in and for any in
[TABLE]
Since the entropy of a random variable only depends on its distribution, we conclude that, for all
[TABLE]
Since is uniformly distributed, for all ,
[TABLE]
Therefore, summing in ,
[TABLE]
[TABLE]
where and where . Applying this argument inductively, if then
[TABLE]
However, so for large
[TABLE]
Combining this with (7),
[TABLE]
which is impossible.
∎
Applying Theorem 2.14 to our situation yields the following corollary.
Corollary 2.17**.**
Let , , , , , , , and be as in Subsection 1.1. Let be as in Definition 2.6. Let be as in Corollary 2.2. We have
[TABLE]
Proof.
Recall that, for all natural numbers ,
[TABLE]
By translation invariance, for all natural numbers ,
[TABLE]
Let be a random variable with distribution . Fix small. We will ask that . Let be a measurable function on which uniformly approximates i.e.
[TABLE]
For instance, could be obtained by rounding to the closest element of . By the triangle inequality
[TABLE]
For each natural number , let where and where . Let where . For each natural number , let
[TABLE]
Define
[TABLE]
Unpacking definitions, for every natural number ,
[TABLE]
Now we check the hypotheses of Theorem 2.14. Because takes only finitely many values, takes values in a finite set. For all natural numbers , since the distribution of is a invariant measure on , it must be the uniform distribution. Since is translation invariant, for any natural numbers and and any subset of
[TABLE]
Applying this to the preimage under of an arbitrary subset of reveals that the distribution of is the same as the distribution of . Similarly, if is an element in if is the preimage under of an arbitrary set intersected with the set of points in such that then we conclude
[TABLE]
For each natural number and each prime , for at most two values of is it true that . Therefore, at most two terms in the sum
[TABLE]
are nonzero. Therefore is bounded by . Let be a random variable with the same distribution as . Then by Theorem 2.14
[TABLE]
Since, for any natural number ,
[TABLE]
We conclude that
[TABLE]
Unpacking definitions, this proves
[TABLE]
By the triangle inequality
[TABLE]
Since was arbitrary
[TABLE]
By translation invariance
[TABLE]
This completes the proof. ∎
2.2. Nilsystems and Algebraic Structure
Now we want to use [HostKra] to show that has some local algebraic structure. This algebraic structure makes much easier to understand than .
Proposition 2.18**.**
Let be an element of such that
[TABLE]
Then for almost all such choices for , there exists a collection of nilsystems , -bounded functions and factor maps so that is a nilcharacter on with frequency nontrivial on the identity component and such that, after identifying with a function on , we have that satisfies and
[TABLE]
Proof.
We are given that
[TABLE]
Recall that by Proposition 2.7, we know that
[TABLE]
By [HostKra] Theorem 10.1, is isomorphic to an inverse limit of nilsystems. Therefore, there exists a nilsystem, a factor map such that
[TABLE]
We denote . By a Fourier decomposition, we may write as a sum of nilcharacters, . For each , either is nontrivial on the identity component of or is trivial on the identity component. If is trivial on the identity component and the step of is , then is actually trivial on . That is because, for any in , the multiplication by map is continuous so it takes components to components. Let $\sigma_{*}\colon\text{components of }G\rightarrow\text{components of }G$ be the induced map on components and let be any other element of . Then if and are in the same component of then for any in , multiplication by on the right is also continuous, so is in the same component as so . We return to the general case where and are not necessarily in the same component. Also note that, for any element in , . Pick , , and such that is in the same component as and is in the same component as . Thus where is an element of higher order. Of course and by induction we get that is the identity and therefore is in the identity component for some . Therefore, if , the function descends to a function of on . By induction, we can almost prove the theorem, namely we can find a collection of nilsystems and functions and factor maps so that is a nilcharacter on with frequency nontrivial on the identity component or is abelian and such that, after identifying with a function on , we have that satisfies and
[TABLE]
It remains to observe that the case of a locally constant function on an abelian group cannot occur by Corollary 2.3 as follows: we can think of the as all functions on the group with some additional equivariance properties; by construction the different ’s have different frequencies so if is a locally constant function on an abelian group and thus is locally periodic, meaning is a periodic function of . Then by Corollary 2.3,
[TABLE]
∎
Remark 2.19*.*
Note that if we also know the -Fourier uniformity conjecture then the step of all nilpotent Lie groups is by Proposition 2.4 (plugging in in the statement of that proposition).
Corollary 2.20**.**
There exists a natural number independent of , a nilpotent Lie group of step , a cocompact subgroup , an ergodic element in and a nilcharacter with nontrivial frequency even when restricted to the identity component,
[TABLE]
where as defined in Subsection 1.1 is an infinite set such that for in , the number of words of length of is if or if for some .
Proof.
By Corollary 2.17,
[TABLE]
Thus, for a positive measure set of ,
[TABLE]
By Proposition 2.18, we know that for almost every ,
[TABLE]
where is as in Proposition 2.18. Fix such an .
Since the sum converges in , there exists a natural number independent of such that . By the triangle inequality
[TABLE]
The second term is bounded by
[TABLE]
using that is shift invariant and is 1-bounded. By Cauchy-Schwarz, this term is bounded by . Thus, by the triangle inequality,
[TABLE]
By the pigeonhole principle, there exists some such that
[TABLE]
Renaming everything gives the conclusion. We remark that the corollary just stated that such an , , , and exist and therefore the statement of the corollary allows to depend on , and all the other data that comes from . The remainder of the argument essentially takes place inside a single ergodic component and so how the constants vary from component to component is not important for our purposes. ∎
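The word-count hypothesis appearing in Corollary 2.20 can be made concrete with a small experiment. The sketch below counts the distinct words of a given length that occur in a finite prefix of a sequence. This is only a proxy for the quantity used in the paper, which counts words occurring with positive upper logarithmic density; the function names and the Fibonacci-word example are choices made for this sketch.

```python
def word_count(sequence, length):
    """Number of distinct words (contiguous blocks) of the given length in `sequence`."""
    return len({
        tuple(sequence[i:i + length])
        for i in range(len(sequence) - length + 1)
    })


def fibonacci_word(n_letters):
    """Prefix of the Fibonacci word, the fixed point of the substitution 0 -> 01, 1 -> 0."""
    word = [0]
    while len(word) < n_letters:
        word = [b for a in word for b in ((0, 1) if a == 0 else (0,))]
    return word[:n_letters]


if __name__ == "__main__":
    sturmian = fibonacci_word(2000)
    periodic = [0, 1, 2] * 200
    for k in range(1, 8):
        print(k, word_count(sturmian, k), word_count(periodic, k))
```

A periodic sequence has boundedly many words of each length, while the Fibonacci word has exactly k + 1 words of length k, the slowest growth possible for a sequence that is not eventually periodic. Both are far below the quadratic threshold considered in this paper.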
For the remainder of the proof, we fix , , and . We let . For the next few pages, we fix an integer in such that
[TABLE]
We will later send to infinity. The following lemma does two things: First, it uses Hölder’s inequality to raise the exponent of $\Big|\mathbb{E}_{h\leq k}\Phi(g^{ph}x)\cdot f'(T^{h}y)\Big|$. We want this term raised to an even power because we want to expand out the product and get rid of the absolute values which are less “algebraic” and therefore harder to understand directly using the theory of nilpotent Lie groups. We also want this even power to be larger the more oscillatory our function is. This is because the more oscillates, the more cancellation we expect in larger and larger products. The larger the power we use, the smaller the fraction of terms which do not exhibit cancellation is. Second, we use the pigeonhole principle. This lemma and the following lemma are where we make essential use of our bound on the word growth rate of .
Lemma 2.21**.**
Recall that has at most words of length occurring with positive upper logarithmic density if for in or has many words of length that occur with positive upper logarithmic density if . Fix a constant that is small even when compared to . Then for each in there is a word of length such that
[TABLE]
for and when , we have
[TABLE]
Proof.
We know that
[TABLE]
By Hölder’s inequality, we have
[TABLE]
Because each term is nonnegative, we can replace the essential by a sum over words that occur with positive -density.
[TABLE]
We assumed that the number of words occurring with positive logarithmic density and therefore the number of terms in the sum is at most when or when . By the pigeonhole principle, when there is a word such that
[TABLE]
and similarly for , we have
[TABLE]
which completes the proof. ∎
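The two generic steps used in this proof can be summarized schematically as follows. The symbols here are placeholders introduced only for this sketch and are not the notation of the paper: $\mu$ is a probability measure, $F$ is a measurable function, $m \geq 1$ is an integer, $c > 0$ is a constant and $a_1, \dots, a_M$ are nonnegative real numbers.

```latex
% Step 1 (Hoelder): raising to an even power loses only a power of the constant.
\[
  \int |F| \, d\mu \ge c
  \quad\Longrightarrow\quad
  \int |F|^{2m} \, d\mu \;\ge\; \Big( \int |F| \, d\mu \Big)^{2m} \;\ge\; c^{2m}.
\]
% Step 2 (pigeonhole): a sum of at most M nonnegative terms has a large term.
\[
  \sum_{i=1}^{M} a_i \ge c^{2m}
  \quad\Longrightarrow\quad
  \max_{1 \le i \le M} a_i \;\ge\; \frac{c^{2m}}{M}.
\]
```

In the lemma, the role of $M$ is played by the number of words occurring with positive logarithmic density, which is why a bound on the word growth rate enters at this point.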
We need a slightly different estimate for the abelian case. The key to the next lemma is the idea that if correlates with for then also must correlate with translates of of size . Thus, in the abelian case, the previous lemma is rather lossy. When we replace the by a sum, we should gain an extra power of .
Lemma 2.22**.**
For , for all in , there is a word such that
[TABLE]
Proof.
Again, we know that
[TABLE]
Again, by Hölder’s inequality
[TABLE]
Again we want to replace by a sum over words. Let be a number satisfying
[TABLE]
Let be the set of such that
[TABLE]
Therefore, the measure of is at least . We want to show that for -almost every in , there are at least many distinct words of such that
[TABLE]
Let be an element of such that the words of are words of and such that
[TABLE]
Denote by the word of length whose entry is . If the words are distinct for then by the triangle inequality
[TABLE]
Suppose for a moment that instead the words are not distinct for . Then there exists a minimal such that are not distinct. Fix such a for the remainder of the proof. Thus, there exists some such that . We claim that is -periodic: that is because . Furthermore, if then since for all , we clearly have and is not minimal. For the rest of the proof, let be the minimum number such that and is not periodic. For not in , we can find such an because is not eventually periodic. Since is not periodic but is periodic and is equal to for some between and , we have that but for all other . We claim that the words and are all distinct. The reason is that for all between and , we have that for all and precisely no larger range of and for all between and we have that for all and precisely no larger range of . For between and , is periodic but because was the minimal natural number such that are not distinct, we have that the for between and are still distinct. For between and and between and we have that the intervals and meet so the previous argument shows that . A similar triangle inequality computation shows that for between and we still have
[TABLE]
This proves the claim that for in there are at least many distinct words of such that
[TABLE]
Summing over words we get that for almost every in ,
[TABLE]
Next, we use that .
[TABLE]
Sending to infinity and using the pigeonhole principle, we deduce that for some word ,
[TABLE]
∎
Remark 2.23*.*
For abelian and therefore a character, we have
[TABLE]
Therefore, by choosing sufficiently small, in the abelian case, we get
[TABLE]
The next theorem contradicts the previous two lemmas and proves Theorem 1.8. In its proof, we rely heavily on [MR3548534], [Frantzikinakis], [GT2], [GT1] and [GTZ].
Theorem 2.24**.**
Recall that, after Corollary 2.20, we fixed a nilpotent Lie group , a cocompact lattice , a nilcharacter with nontrivial frequency on the identity component and an element which acts ergodically on , such that . Recall that the step of is at least where . Let be a sequence of words implicitly depending on . Let . Then
[TABLE]
If then we do not need the epsilon loss and instead get the estimate
[TABLE]
This contradicts Lemmas 2.21 and 2.22 as follows. When is abelian and thus , Lemma 2.22 states that there is a word of length such that
[TABLE]
for any we choose so long as is chosen from the set of natural numbers such that has fewer than many words of length . Thus, picking small, (in particular, smaller than say ), we find that
[TABLE]
contradicting Theorem 2.24. (Note that we have replaced by the same expression without the shift in as in Remark 2.23). Similarly, if the group is not abelian, Lemma 2.21 states that there exists a word of length such that
[TABLE]
for and when , we have
[TABLE]
When , again picking small, this time smaller than , we find that
[TABLE]
again contradicting Theorem 2.24. Finally, when ,
[TABLE]
contradicts
[TABLE]
from Theorem 2.24.
Thus, the rest of this section will be devoted to showing that Theorem 2.24 is true. Suppose not and for the moment fix in such that
[TABLE]
The first step is to replace averages over primes by uniform averages over natural numbers. To do this, we need the machinery of Green-Tao [GT1] [GT2] and Green-Tao-Ziegler [GTZ]. By the triangle inequality, we may replace averages over primes by averages weighted by the von Mangoldt function.
[TABLE]
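To make the passage from prime averages to von Mangoldt weighted averages concrete, the following minimal sketch computes the von Mangoldt function by a sieve and compares an unweighted average over primes with the corresponding weighted average over all integers. The function names and the test function are assumptions made for this sketch; the comparison is a finite numerical illustration and is not the estimate used in the proof.

```python
import math


def von_mangoldt_and_primes(n_max):
    """Return (lam, primes) with lam[n] = log p if n is a power of a prime p, else 0.0."""
    lam = [0.0] * (n_max + 1)
    primes = []
    is_composite = [False] * (n_max + 1)
    for p in range(2, n_max + 1):
        if is_composite[p]:
            continue
        primes.append(p)
        for m in range(p * p, n_max + 1, p):
            is_composite[m] = True
        # Mark every power of p with the weight log p.
        q = p
        while q <= n_max:
            lam[q] = math.log(p)
            q *= p
    return lam, primes


def prime_average(func, primes):
    """Unweighted average of func over the given primes."""
    return sum(func(p) for p in primes) / len(primes)


def mangoldt_average(func, lam):
    """(1 / N) * sum_{n <= N} Lambda(n) * func(n), where N = len(lam) - 1."""
    n_max = len(lam) - 1
    return sum(lam[n] * func(n) for n in range(1, n_max + 1)) / n_max


if __name__ == "__main__":
    lam, primes = von_mangoldt_and_primes(100_000)
    phase = lambda n: math.cos(2 * math.pi * math.sqrt(2) * n)
    print(prime_average(phase, primes), mangoldt_average(phase, lam))
    print(mangoldt_average(lambda n: 1.0, lam))
```

The last printed value is the normalized Chebyshev function, which the prime number theorem says tends to 1; this is what makes the von Mangoldt weight a natural replacement for the indicator of the primes.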
We denote . We expand:
[TABLE]
where is a phase given by the formula .
We say is diagonal if for all . We say solves the Vinogradov mean value problem if, for all between and , we have
[TABLE]
Every diagonal also solves Vinogradov’s mean value problem. We rely on the following theorem of Bourgain, Demeter and Guth, which says that those account for “most” solutions, up to a constant.
Theorem 2.25** ([MR3548534] Theorem 1.1).**
For all and there exists a constant such that the number of solutions to the Vinogradov mean value problem is less than where .
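For small parameters the count of solutions can be computed by brute force. The sketch below counts pairs of tuples in the classical Vinogradov system, where the power sums of degrees 1 through k agree, and compares this with the number of pairs in which one tuple is a rearrangement of the other. Reading "diagonal" as "rearrangement", and the parameter names n, s, k, are assumptions made for this sketch; the variables in the paper are set up differently.

```python
from collections import Counter
from itertools import product


def vinogradov_count(n, s, k):
    """Number of pairs (x, y) in [1, n]^s x [1, n]^s with
    x_1^j + ... + x_s^j = y_1^j + ... + y_s^j for every 1 <= j <= k."""
    power_sums = Counter(
        tuple(sum(x ** j for x in xs) for j in range(1, k + 1))
        for xs in product(range(1, n + 1), repeat=s)
    )
    # Two tuples solve the system exactly when their power-sum vectors agree.
    return sum(c * c for c in power_sums.values())


def rearrangement_count(n, s):
    """Number of pairs (x, y) in [1, n]^s x [1, n]^s where y is a rearrangement of x."""
    multisets = Counter(
        tuple(sorted(xs)) for xs in product(range(1, n + 1), repeat=s)
    )
    return sum(c * c for c in multisets.values())


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(n, rearrangement_count(n, 3), vinogradov_count(n, 3, 2))
```

When k is at least s the two counts coincide, since the first s power sums determine a multiset of size s. For s = 3 and k = 2 there are extra solutions such as (1, 4, 4) and (2, 2, 5), which share the sum 9 and the sum of squares 33.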
We will show that if does not solve Vinogradov’s mean value problem then does not contribute to the sum. Thus, fix which does not solve Vinogradov’s mean value problem and suppose that
[TABLE]
We denote
[TABLE]
Fix a subsequence such that
[TABLE]
where is some infinite subset of the natural numbers and where the implied constant may depend on . Fix a large number , a product of many small primes. We will later choose exactly how large must be. We pass to a subsequence where the following limit exists for each ,
[TABLE]
where is an infinite subset of . We may do this by a diagonalization argument. By the triangle inequality,
[TABLE]
where the implied constant does not depend on . Note that because , we miss at most one term by changing the bounds of the sum from to . Since is much smaller than , this is an acceptable error. Note that if is not coprime to , then
[TABLE]
because is never prime. By the pigeonhole principle, there exists such that
[TABLE]
where again the implied constant does not depend on and where is Euler’s totient function, the function which counts the number of residue classes mod that are coprime to . Denote . Then we can write our expression as a sum of two terms
[TABLE]
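The reduction to residue classes coprime to the modulus can be illustrated numerically. The sketch below is an illustration only: the modulus 30 and the cutoff 2000 are arbitrary choices of ours, and the helper names are hypothetical. It counts primes larger than the modulus in each residue class, confirming that classes not coprime to the modulus contain none and that some coprime class receives at least its pigeonhole share.

```python
from math import gcd, isqrt


def totient(W: int) -> int:
    """Count residue classes mod W that are coprime to W."""
    return sum(1 for b in range(1, W + 1) if gcd(b, W) == 1)


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))


def primes_by_residue(W: int, N: int) -> dict:
    """Count primes p with W < p <= N in each residue class mod W."""
    counts = {b: 0 for b in range(W)}
    for n in range(W + 1, N + 1):
        if is_prime(n):
            counts[n % W] += 1
    return counts


if __name__ == "__main__":
    W, N = 30, 2000
    counts = primes_by_residue(W, N)
    total = sum(counts.values())
    # Only the totient(W) coprime classes can be nonempty, so by pigeonhole
    # one of them holds at least total / totient(W) primes.
    print(totient(W), total, max(counts.values()))
```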
To handle the first term, we need the following theorems of Green-Tao and Green-Tao-Ziegler.
**Theorem 2.26 ([GT2] Proposition 11.2).**
Let be a degree filtered nilmanifold, and let . Suppose that is a bounded nilsequence on with Lipschitz constant at most , where is a function on , is an element of and is a point in . Let and a large natural number. Then we may decompose
[TABLE]
where is a sequence with Lipschitz constant and obeying the dual norm bound
[TABLE]
while obeys the uniform bound
[TABLE]
Note that the bound is uniform in the element . We also need the following theorem of Green-Tao-Ziegler. The proof of this theorem is spread out over [GTZ], [GT2] and [GT1], making it somewhat hard to give a specific theorem number. Essentially, if the Gowers norm were big, then the Inverse Conjecture for the Gowers Norms would imply that the Möbius function correlates with a nilsequence, which it does not by the Möbius-Nilsequence Conjecture. In [GT2], Theorem 7.2 states that the theorem follows from the Möbius-Nilsequence Conjecture and the Inverse Conjecture for the Gowers Norms. The first of these conjectures is an immediate consequence of Theorem 1.1 in [GT1]. The second of these conjectures is Theorem 1.3 in [GTZ].
**Theorem 2.27 ([GTZ]; see also [GT1] and [GT2]).**
With all the notation as before,
[TABLE]
Thus, our nilsequence can be written as a sum where and implicitly depend on and enjoy the following properties. is uniformly small so
[TABLE]
can be estimated by simply moving the absolute values inside. The remaining term is bounded in dual norm so
[TABLE]
which tends to [math]. For a similar argument, see the proof of Proposition 10.2 in [GT2]. It may also be possible to circumvent the use of [GTZ] by using Theorem 7.1 in [GT1]. Putting this together, we get that
[TABLE]
As such for sufficiently large, by the triangle inequality
[TABLE]
So far we exploited cancellation in the term and simply boundedness in the term. Next, we will try to exploit cancellation in to obtain a contradiction. To exploit this cancellation we interpret the average as an integral over a complicated nilmanifold, then use the fact that the frequency of is nontrivial on the identity component of and therefore nontrivial on every component of . Let be the product of with itself many times and let be the element of whose coordinate is . For any in let be the element of whose entries are all and let be the set of all the elements of the form . Define
[TABLE]
the closure of the group generated by , and inside . Our sequence is a nilsequence on . Consider the sequence of “empirical” measures on ,
[TABLE]
where is the Haar measure on and where denotes the pushforward. By construction, if is defined by
[TABLE]
then
[TABLE]
By the Banach-Alaoglu theorem, there is a further subsequence along which the empirical measures converge weakly,
[TABLE]
where is an infinite subset of . Note that, by summation by parts, is almost invariant by in the following sense:
[TABLE]
Therefore is actually invariant. Since is an average of invariant measures, is also invariant. Of course is also invariant because acts trivially on . Since stabilizers of measures are closed, is invariant under . By the classification of invariant measures, we know that is actually (a translate of) Haar measure on some nilmanifold . Next we need the following result essentially due to Frantzikinakis [Frantzikinakis].
**Lemma 2.28 ([Frantzikinakis]; see section 5.7 and especially the proof of Proposition 5.7).**
With all the notation as before, for any and , we have .
We include the proof for completeness and because our result differs very slightly from the way it was stated in [Frantzikinakis].
Proof.
We split Lemma 2.28 into three claims:
**Claim 2.29.**
Let and be natural numbers. If is in and and are in then there exists in such that
[TABLE]
Moreover, depends continuously on , and . In fact, this holds for any nilpotent Lie group, not just .
**Claim 2.30.**
For any between and and any element in and in the identity component (which is automatic for ), there exists an element in such that
[TABLE]
**Claim 2.31.**
For any natural number between and and any natural number between and and for any in there exists an element in such that
[TABLE]
We remark that taking in Claim 2.31 gives Lemma 2.28.
Proof of Claim 2.29.
The proof is simply a computation. For any , and as above
[TABLE]
∎
Proof of Claim 2.30.
We prove Claim 2.30 by induction on . First, suppose . Consider the torus . Let be the projection map . Then since acts ergodically on , we know is an ergodic element in . Therefore, for any in , note that is in the orbit of . By the definition of , is an element of . Thus, for any in , is an element of so by definition of the quotient
[TABLE]
for some in .
Next, assume by induction that Claim 2.30 holds for . We will try to prove the claim for . We begin with the case where is the commutator of two elements of the following form. Suppose that there exists in and in such that . By assumption, there exists in and in such that
[TABLE]
Since is a group, we conclude that the commutator is in .
[TABLE]
Using Claim 2.29 repeatedly, this is
[TABLE]
for some in .
Finally, we note that commutators generate so it suffices to show that if and are elements of that satisfy Claim 2.30 then so does their product. After all, if
[TABLE]
where and are in then
[TABLE]
where and are in . This completes the proof of Claim 2.30. ∎
Proof of Claim 2.31.
First, if then we are done by Claim 2.30. Thus, we will assume .
Second, we check that if and are in and satisfy Claim 2.31 then so does their product. By assumption, we may write
[TABLE]
where is an element of and . Then the product is given by
[TABLE]
Therefore, it suffices to prove Claim 2.31 in the case that where is in and is in because such commutators generate as a group up to higher order corrections.
By Claim 2.30, there exists in such that
[TABLE]
We also know, because contains diagonal elements, that is an element of . We conclude that
[TABLE]
By Claim 2.29, this is given by
[TABLE]
for some in . ∎
This completes the proof of Lemma 2.28 by plugging in . ∎
Since the frequency of is nontrivial on the identity component, there exists an element in the identity component of such that is irrational. Fix such a . Now since does not solve Vinogradov’s mean value problem there exists such that . Fix such an . Then the map given by has image both open and closed so is in the image. For more details, see [Frantzikinakis]. Fix a such that . Then by Lemma 2.28, . As such
[TABLE]
This gives a contradiction. We conclude that the terms which do not solve Vinogradov’s mean value problem do not contribute to our sum.
For every -tuple in , we have that for all
[TABLE]
simply using a trivial bound. For every -tuple which does not solve Vinogradov’s mean value problem we have
[TABLE]
Therefore, the average is bounded by the fraction of terms which solve Vinogradov’s mean value problem. There are no more than such solutions by Bourgain-Demeter-Guth (Theorem 2.25). Thus
[TABLE]
in the case and
[TABLE]
in the case . After all, since diagonal solutions are the only solutions to Vinogradov’s mean value problem in the case of two variables and one equation i.e. , there is no loss when . Thus, we obtain Theorem 2.24 and in turn Theorem 1.8 and Theorem 1.9.
3. Proof of Theorem 1.11
The proof of Theorem 1.11 is essentially the proof of Theorem 1.8 with a few minor simplifications. As before, suppose not. Then, as before, we can find a joining such that
[TABLE]
As before, we can apply [FrantzikinakisHost2] such that
[TABLE]
Unlike before, we do not need to restrict the integral to . As before, we can average over translates
[TABLE]
As before, we can take an essential supremum over
[TABLE]
As before, we can apply the entropy decrement argument: for some , we have
[TABLE]
We can use the Cauchy-Schwarz inequality
[TABLE]
This time, we would like to replace by a sum over words of length up to rounding. In the no-rounding case, we knew that words of were words of . We double check that a similar result holds for words up to constant rounding. In particular, fix such that there are at most words of length that occur with positive density up to rounding. Thus, we can fix a set of words of length such that and for all outside a set of [math] density there exists an in such that . Translating this to the dynamical setting,
[TABLE]
Therefore, we can replace by a sum over words as before.
[TABLE]
Notice that this time, when we replace by a word, we incur an error of . Now the rest of the argument runs exactly the same as before. In fact, after pigeonholing, any dependence on completely drops out of the argument.
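The notion of word growth used throughout can be explored numerically. The following sketch counts the distinct words of a given length appearing in a finite window of a sequence, using the Liouville function as the test case. This is a simplification and should be read as such: it counts words that occur at least once in the window, not words that occur with positive logarithmic density, and it does not model rounding. The window size and the function names are our own choices.

```python
def liouville(n: int) -> int:
    """Return (-1)^Omega(n), where Omega counts prime factors with multiplicity."""
    sign = 1
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            sign = -sign
        d += 1
    if n > 1:
        # Whatever remains is a single prime factor.
        sign = -sign
    return sign


def count_words(seq, k: int) -> int:
    """Count distinct length-k words (contiguous blocks) appearing in seq."""
    return len({tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)})


if __name__ == "__main__":
    lam = [liouville(n) for n in range(1, 20001)]
    for k in range(1, 9):
        # A sequence over two symbols has at most 2**k words of length k.
        print(k, count_words(lam, k), 2 ** k)
```

A periodic sequence has boundedly many words of each length, which is the opposite extreme from the growth one expects for the Liouville function.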
Appendix A Frantzikinakis-Host and dynamical models
[TaoBlog] shows that there is a joining of a dynamical model for with where is the space of sequences in the unit disk with the product topology, is the shift map on and on , is the evaluation at [math] map, is projection onto the second factor and whenever is in . Call the pushforward of onto . Of course, factors onto by projection onto the first factor. Call the pushforward of onto . [FrantzikinakisHost2] (Proposition 4.2 in that paper) showed that is a factor of a system where , is the shift map and there exists a natural number so that if is the set of primes which are then
[TABLE]
where is any natural number, the functions are any bounded measurable functions depending only on the coordinate and by [FrantzikinakisHost2] the limit always exists. We fix such a . By [FrantzikinakisHost2] (see Theorem 4.5 in that paper), each ergodic component of is isomorphic to a product of a Bernoulli system with an inverse limit of nilsystems. Thus, we get a joining of with over their common factor . Call this joining . We also get a joining of and over their common factor , which we call . Explicitly, this joining is defined as follows. A point in can be thought of as a triple of points with in and in . Since , we have that for some in and in . The measure is supported on triples where so we will often forget and simply write a point in as a triple with in , in and in . The measure is given explicitly by the following formula: if is a natural number, are bounded measurable functions on depending only on the coordinate, is a bounded measurable function on and is a bounded measurable function on then
[TABLE]
We will proceed to check that has all the desired properties. We define by taking an element with in and in to . Let be an element of . We will write for a sequence of elements in and write for the element of the sequence . Let We define . Explicitly
[TABLE]
whenever is in . We define by the formula . This is just the pullback of under the factor map . We define by pulling back under the same factor map i.e. . Now we check
- •
- •
We have
[TABLE]
for any and whenever is in .
- •
Let be a natural number and be a sequence of bounded measurable functions depending only on [math]. Let be a function which is measurable with respect to . Then for any ,
[TABLE]
- •
For any natural number and any in , we have
[TABLE]
- •
Clearly, for any natural numbers and ,
[TABLE]
for any in .
- •
Since and are pulled back from , the “statistics” of will be the same as the statistics of and similarly for .
Therefore, is a joining of a dynamical model for with .
Let be an ergodic component of which joins the corresponding ergodic component of with . Note that is already an ergodic inverse limit of nilsystems: after all it is an inverse limit of the ergodic systems of the form and the inverse limit of ergodic systems is ergodic. By [FrantzikinakisHost2], there is a Bernoulli system and an inverse limit of nilsystems such that . Therefore is isomorphic to where is some mystery measure and where is just the product transformation. We can think of this system as a joining of with or we can think of this system as a joining of with where is some unknown measure given by pushing forward onto . Next, we claim that any ergodic joining of two inverse limits of nilsystems is in fact isomorphic to an inverse limit of nilsystems. After all, if and are two nilsystems and is an ergodic invariant measure on , then is a translate of Haar measure on some closed subgroup by measure classification for nilsystems. Thus for some nilsystem . Taking inverse limits, is isomorphic to an inverse limit of nilsystems . Because is an inverse limit of nilsystems, it has zero entropy so the only possible joining of with the Bernoulli system is the trivial joining i.e. is the product measure . Lastly, we claim that is isomorphic to the Host Kra factor of . Since the Host Kra factor is isomorphic to an inverse limit of nilsystems, it has zero entropy, so any factor map from to where is Bernoulli necessarily factors through . Thus factors onto . Of course, since factors onto , the Host Kra factor for factors onto the Host Kra factor for . Implicitly in [HostKra] and explicitly, for instance, in [HostKra2] chapter 12, for any nilsystem the Host Kra factor of is . Thus, taking inverse limits gives that the Host Kra factor of is so . This completes the proof.
Appendix B Reduction to the completely multiplicative case
We have stated our main theorems in the case that is completely multiplicative. In this appendix, we show that these assumptions can be weakened to include all multiplicative functions. For example, we will show that Theorem 1.8 holds in this generality. The same argument works for Theorem 1.9 and Theorem 1.11 (although in this last case, the way that depends on gets worse). The argument here will be entirely formal, using nothing of the proof of Theorem 1.8 and only the result. However, we remark that the interested reader could check that the proof we give can be adapted to the more general case of multiplicative functions. The main difference is that now the dynamical model for does not satisfy the identity that the push forward of restricted to is but instead we incur a error i.e. for all in satisfying we have
[TABLE]
This introduces an error term of size in Corollary 2.17 which tends to [math] as tends to infinity.
However, here we proceed just using the statement of Theorem 1.8. Famously, we can write
[TABLE]
where is the Liouville function and is the Möbius function, which agrees with the Liouville function on squarefree numbers and vanishes on numbers which are not squarefree. Of course,
[TABLE]
but we write it this way to suggest that the convolution identity
[TABLE]
where may be generalized. In fact, for any multiplicative function taking values on the unit circle, we may write
[TABLE]
where is some completely multiplicative function taking values on the unit circle and is a (possibly unbounded) multiplicative function supported on numbers of the form for some natural numbers and with . To prove this is possible, it suffices to check it is possible on prime powers since both sides are multiplicative. For any prime , we define
[TABLE]
and so
[TABLE]
We also want,
[TABLE]
Since whether is unpretentious or not depends only on the behavior of at primes, clearly if is unpretentious then so is .
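The classical instance of this factorization can be verified directly. In the sketch below we assume the standard identities: the Liouville function is the Dirichlet convolution of the Möbius function with the indicator of the perfect squares, and conversely the Möbius function is the convolution of the Liouville function with the function `h` that equals the Möbius function of m at n = m^2 and vanishes off the squares. The identification of `h` with the function intended in the text is our assumption, and all helper names are hypothetical.

```python
from math import isqrt


def factorize(n: int) -> dict:
    """Return the prime factorization of n as a dict {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def liouville(n: int) -> int:
    """Completely multiplicative: -1 at every prime."""
    return (-1) ** sum(factorize(n).values())


def mobius(n: int) -> int:
    """Agrees with liouville on squarefree n and vanishes elsewhere."""
    f = factorize(n)
    return 0 if any(e > 1 for e in f.values()) else (-1) ** len(f)


def is_square(n: int) -> int:
    """Indicator function of the perfect squares."""
    return 1 if isqrt(n) ** 2 == n else 0


def h(n: int) -> int:
    """Assumed correction factor: mobius(m) if n = m^2, and 0 otherwise."""
    return mobius(isqrt(n)) if is_square(n) else 0


def dirichlet(f, g, n: int) -> int:
    """Dirichlet convolution (f * g)(n) = sum over d | n of f(d) g(n / d)."""
    return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)


if __name__ == "__main__":
    N = 300
    print(all(dirichlet(liouville, h, n) == mobius(n) for n in range(1, N + 1)))
    print(all(dirichlet(mobius, is_square, n) == liouville(n) for n in range(1, N + 1)))
```

Note that `h` vanishes at every prime and is supported on prime powers with exponent at least 2, consistent with the support condition described above.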
Informally, the probability that a random number is divisible by is roughly . Thus, the expected number of times that any number of the form for divides a random natural number is at most
[TABLE]
which is summable. Thus the tails
[TABLE]
and
[TABLE]
tend to zero as tends to infinity. Let be the set of natural numbers for which divides for implies . The previous analysis says most numbers are in . Fix a function as in the statement of Theorem 1.8, that is a bounded function such that for any there are infinitely many such that the number of words of of length that occur with positive upper logarithmic density is at most . Our goal will be to show that for large,
[TABLE]
is small, say less than a constant times some small positive number . If is sufficiently large depending on but still very small compared to , we may modify on the set of numbers outside . In particular, is given by the formula
[TABLE]
We conclude that the formula
[TABLE]
is bounded and agrees with all but at most
[TABLE]
of the time. Thus, it suffices to show
[TABLE]
Fix a natural number . Notice that every word of length of the function embeds in a word of of length . Thus, it is easy to check that still satisfies the conditions of Theorem 1.8. Therefore, as tends to infinity, the previous expression tends to [math].
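The heuristic from earlier in this appendix, that few integers are divisible by a high power of a large prime, can also be checked numerically. The sketch below works under an assumption of ours about the shape of the exceptional set: an integer is called exceptional when it is divisible by the square of some prime exceeding a threshold `w` (divisibility by any higher power implies this). It compares the observed proportion of exceptional integers up to `N` with the union bound given by the sum of the reciprocals of the squares of the relevant primes. The names and parameter values are illustrative only.

```python
from math import isqrt


def primes_up_to(n: int) -> list:
    """Sieve of Eratosthenes returning all primes p <= n."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, isqrt(n) + 1):
        if sieve[p]:
            for m in range(p * p, n + 1, p):
                sieve[m] = False
    return [p for p in range(2, n + 1) if sieve[p]]


def exceptional_fraction(N: int, w: int) -> float:
    """Proportion of 1 <= n <= N divisible by p^2 for some prime p > w."""
    bad = [False] * (N + 1)
    for p in primes_up_to(isqrt(N)):
        if p > w:
            for m in range(p * p, N + 1, p * p):
                bad[m] = True
    return sum(bad) / N


def tail_bound(N: int, w: int) -> float:
    """Union bound: sum of 1/p^2 over primes w < p <= sqrt(N)."""
    return sum(1.0 / (p * p) for p in primes_up_to(isqrt(N)) if p > w)


if __name__ == "__main__":
    N = 10 ** 5
    for w in (2, 10, 100):
        # The observed proportion never exceeds the union bound, and both
        # shrink as the threshold w grows.
        print(w, exceptional_fraction(N, w), tail_bound(N, w))
```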
References
