
TL;DR
This paper establishes an asymptotic count for solutions to systems of linear inequalities in primes with fewer variables than previous methods, extending the Green-Tao-Ziegler theorem.
Contribution
It improves the variable requirement from 2m+1 to m+2 for m inequalities and generalizes existing results on linear equations in primes.
Findings
Proves asymptotic formula for solutions in primes
Reduces variable count needed for such solutions
Suggests a conjecture on sieve weights pseudorandomness
Abstract
In this paper we prove an asymptotic formula for the number of solutions in prime numbers to systems of simultaneous linear inequalities with algebraic coefficients. For $m$ simultaneous inequalities we require at least $m+2$ variables, improving upon existing methods, which generically require at least $2m+1$ variables. Our result also generalises the theorem of Green-Tao-Ziegler on linear equations in primes. Many of the methods presented apply for arbitrary coefficients, not just for algebraic coefficients, and we formulate a conjecture concerning the pseudorandomness of sieve weights which, if resolved, would remove the algebraicity assumption entirely.
Linear inequalities in primes
Aled Walker
Trinity College, Cambridge, CB2 1TQ, United Kingdom
1. Introduction
Fourier analysis is a vital tool in the study of diophantine problems. In recent years, however, new tools have been developed which can prove asymptotic formulae for the number of solutions to certain systems even when the Fourier-analytic approach is not known to succeed. In particular, in [13] Green and Tao established an asymptotic formula for the number of prime solutions to generic systems of simultaneous linear equations in at least variables. Their result was conditional on various conjectures, but these conjectures were later proved by the same authors and Ziegler, in the series of papers [14], [15] and [16].
Theorem 1.1 (Theorem 1.8, [13], Green-Tao-Ziegler).
Let , , and be natural numbers, with , and let be a positive constant. Let be an -by- matrix with integer coefficients, with rank , and assume the non-degeneracy condition that the only element of the row-space of over with two or fewer non-zero entries is the zero vector. Let , and suppose that and that for all and . Let be a convex set. Then
[TABLE]
where the local densities are given, for each prime , by
[TABLE]
and the global factor is given by
[TABLE]
Here and throughout, denotes a prime, denotes a vector all of whose coordinates are prime, and denotes a vector all of whose coordinates are integers . The expression denotes the greatest common divisor of and .
To give a concrete example to which this result may be applied, by considering
[TABLE]
one may deduce an asymptotic formula for the number of four-term arithmetic progressions of primes that are less than .
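The four-term progression example can be checked numerically for small values of the cut-off. The following Python sketch is an illustration only: it counts progressions by brute force rather than through the asymptotic formula, it counts only progressions with positive common difference, and the names `primes_below` and `count_four_term_aps` are hypothetical helpers introduced here.

```python
def primes_below(n):
    """Return a list of all primes p < n, via the sieve of Eratosthenes."""
    if n < 3:
        return []
    is_prime = bytearray([1]) * n
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # Cross out the multiples of p, starting from p*p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, n, p)))
    return [p for p in range(n) if is_prime[p]]


def count_four_term_aps(n):
    """Count quadruples (p, p+d, p+2d, p+3d) of primes below n with d > 0."""
    primes = primes_below(n)
    prime_set = set(primes)
    count = 0
    for i, p in enumerate(primes):
        for q in primes[i + 1:]:
            d = q - p
            if p + 3 * d >= n:
                break  # larger q only gives a larger last term
            if p + 2 * d in prime_set and p + 3 * d in prime_set:
                count += 1
    return count


print(count_four_term_aps(30))
```

Below 30 the only such progressions are 5, 11, 17, 23 and 11, 17, 23, 29.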
For , Theorem 1.1 is stronger than any similar statement that may be proved using the Fourier transform alone. Indeed, notwithstanding Balog’s example [2, Corollary 3] of a certain non-generic class of equations in prime variables, generically the Fourier transform approach needs at least prime variables in order to succeed. The proof of Theorem 1.1 rests on many creative innovations, in particular the authors’ use of Gowers norms and their inverse theory, which is a subject that is now referred to as ‘higher order Fourier analysis’. The object of the present paper is to use certain aspects of this machinery to establish, in a related setting, an analogous reduction in the number of variables that are required to prove an asymptotic formula.
We will be concerned with diophantine inequalities, a topic that we first considered in [21]. Before giving our first main result (Theorem 1.7) let us briefly review some previous results concerning diophantine inequalities in the primes. Consider the following classical theorem of Baker. (Footnote: In fact Baker proved a slightly different result, writing in the cited paper that the result we quote here followed easily from the then existing methods. Vaughan proved a similar result in [20].)
Theorem 1.2 ([1], Baker).
Let , and let be three non-zero reals that are not all of the same sign. Furthermore, suppose that for all the relation holds. Then there exist infinitely many triples of primes satisfying
[TABLE]
Remark 1.3.
The condition concerning the signs of is clearly a necessary one, as otherwise there exist only finitely many solutions to (1.2) in the positive integers (and so certainly there exist only finitely many solutions in the primes). Regarding the other condition, the conclusion of Theorem 1.2 may hold even if there exists some for which
[TABLE]
But then one is required to solve
[TABLE]
which, if is small enough, is equivalent to solving
[TABLE]
Theorem 1.1 can then affirm that there are infinitely many solutions, provided that , , and satisfy certain local properties. This issue, of when an inequality can encode a certain equation with rational coefficients, will be an important theme of the paper.
The classical approach to proving results such as Theorem 1.2 involves Fourier analysis over , after having replaced the characteristic function of the interval with a smoother cut-off function. This approach is known as the Davenport-Heilbronn method, it having originated in a paper [5] of those two authors. For a variety of technical reasons this method was, until relatively recently, unable to give an asymptotic formula for the number of solutions to (1.2) that satisfied , or even give a lower bound of the expected order of magnitude (at least for arbitrary ). However, certain advances of Freeman [6, 7] enabled Parsell to achieve the second of these two goals.
Theorem 1.4 (Theorem 1, [18], Parsell).
Let , and let be three non-zero reals that are not all of the same sign. Furthermore, suppose that for all the relation holds. Then the number of prime triples satisfying and
[TABLE]
is
Since [18] was published, it has been understood that a very minor modification to Parsell’s analytic method can be used to obtain an asymptotic expression for the number of solutions to (1.3), namely
[TABLE]
for some positive constant . Furthermore, in the case of $m$ simultaneous (rationally independent) inequalities of the form (1.3), Parsell’s method can calculate an asymptotic formula for the number of solutions in primes provided the number of variables is at least $2m+1$. In Appendix B we take the opportunity to record the details of both the statement and the proof of this result.
In the main theorems of this paper (Theorem 1.7 and Theorem 1.16) we specialise to the case of algebraic coefficients and reduce the number of variables that are required from $2m+1$ to $m+2$. Our first result does not concern the most general type of diophantine inequality, but nonetheless it enjoys several applications. To state it, we recall the notion of the dual degeneracy variety, which we defined in Definition 2.3 of [21] in order to manipulate the non-degeneracy conditions more succinctly.
Definition 1.5 (Dual degeneracy variety, [21]).
Let be natural numbers satisfying . Let denote the set of all -by- matrices with real coefficients that contain a non-zero row-vector in their row-space over that has two or fewer non-zero co-ordinates. We call the dual degeneracy variety.
For example, the matrix
[TABLE]
is in , since the vector lies in its row space. As is explained at length in [21], if one wishes to count solutions to an inequality given by using a method involving Gowers norms then one can only possibly succeed if . Returning to Theorem 1.1, we observe that the non-degeneracy condition in the statement of that theorem is exactly the condition that . If , non-degeneracy in this sense is easy to detect. Indeed, if and only if the determinants of all the -by- submatrices of are non-vanishing.
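Membership of the dual degeneracy variety can be tested mechanically for small examples. The sketch below is an illustration under simplifying assumptions: it works only for matrices with rational entries (the matrices of interest in this paper have irrational algebraic entries, for which exact arithmetic in a number field would be needed), and the helper names `rank` and `in_dual_degeneracy_variety` are hypothetical. It uses the observation that the row-space contains a non-zero vector vanishing on a set of columns exactly when deleting the other columns lowers the rank, applied to every set of all but two columns.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Rank of a matrix with rational entries, by exact Gaussian elimination."""
    mat = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(mat[0]) if mat else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(mat)) if mat[i][c] != 0), None)
        if pivot is None:
            continue
        mat[r], mat[pivot] = mat[pivot], mat[r]
        for i in range(r + 1, len(mat)):
            factor = mat[i][c] / mat[r][c]
            mat[i] = [a - factor * b for a, b in zip(mat[i], mat[r])]
        r += 1
    return r


def in_dual_degeneracy_variety(L):
    """True if the row-space of L contains a non-zero vector with at most
    two non-zero coordinates (rational entries only)."""
    d = len(L[0])
    full_rank = rank(L)
    # A non-zero row-space vector vanishes on the columns in `cols`
    # exactly when restricting to those columns drops the rank.
    for cols in combinations(range(d), max(d - 2, 0)):
        sub = [[row[j] for j in cols] for row in L]
        if rank(sub) < full_rank:
            return True
    return False


# Hypothetical examples, not taken from the paper.
print(in_dual_degeneracy_variety([[1, -2, 1, 0], [0, 1, -2, 1]]))  # four-term progressions
print(in_dual_degeneracy_variety([[1, -1, 0, 0], [0, 0, 1, 1]]))   # first row has two non-zero entries
```

When the number of columns exceeds the number of rows by exactly two, the loop runs over all square submatrices of full height, so the test reduces to checking that every such determinant is non-zero.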
Remark 1.6.
The above notion is ‘dual’ to the notion of finite Cauchy-Schwarz complexity (see Definition 5.5), in the sense that is in the dual degeneracy variety if and only if may be parametrised by a system of linear forms with finite Cauchy-Schwarz complexity. In [21] we also introduced a degeneracy variety in order to manipulate quantitative versions of this fact, but this will not be necessary here. For more on these issues, we invite the reader to consult Sections 6 and 7 of [21].
We are now ready to state our first main result. In the statement below, refers to the set and the function refers to the indicator function of the set .
Theorem 1.7 (Main theorem, purely irrational version).
Let be natural numbers, with , and let be positive constants. Let be an -by- real matrix with algebraic coefficients and rank . Suppose that . Suppose further that for all one has , i.e. suppose that is purely irrational in the sense of Definition 2.4 of [21]. Let be any vector satisfying . Then
[TABLE]
as .
Remark 1.8.
One notes that in the asymptotic formula (1.4) there is not a contribution from any non-archimedean local factors. In Theorem 1.16 below, we will remove the supposition that there does not exist any non-zero vector for which . Once these potential rational relations are permitted, one does indeed observe a contribution from local factors.
Remark 1.9.
When , it is straightforward to show (see Lemma A.2) that the main term in (1.4) is equal to
[TABLE]
where is a constant depending only on . The positivity of may be determined in practice.
Remark 1.10.
The reader may note that Theorem 1.7 insists upon a fixed matrix , rather than a matrix with bounded coefficients (as appeared in Theorem 1.1). In our previous work [21, Theorem 2.10], performed in the context of linear inequalities weighted by bounded functions, we proved a result that enabled to vary, as long as the coefficients of were bounded and was bounded away from . In the present paper there are many auxiliary linear equalities , which will also need to enjoy such a quantitative non-degeneracy. We found keeping track of these features throughout the whole argument to be extremely complicated, but in principle it should be possible to do so.
Remark 1.11.
Theorem 1.7 strengthens Theorem B.1 of Parsell, in the sense that the number of variables has been reduced (from $2m+1$ to $m+2$). But unfortunately this has been achieved at the cost of imposing an algebraicity assumption on the coefficients of . The situation is regrettable as, under this assumption, the classical Davenport-Heilbronn method alone is adequate to count the number of prime solutions to simultaneous linear inequalities in variables, without needing the developments of Parsell. We should stress that most of our method does not rely on the algebraicity assumption. Indeed, the conclusions of Theorems 1.7 and 1.16 do in fact hold for some explicit set of matrices that has full Lebesgue measure (see Remark 9.7). Unfortunately, owing to the intricacy of the linear-algebraic manipulations in Section 15, we have not been able to formulate a clean or enlightening characterisation of this full-measure set. We have decided to clarify the exposition of the paper by working with algebraic coefficients throughout.
Let us give a concrete example of a linear inequality to which Theorem 1.7 applies but the Davenport-Heilbronn method does not.
Example 1.12.
Let . Then the number of prime quadruples satisfying
[TABLE]
is equal to , for some positive constant .
Proof.
Taking
[TABLE]
certainly satisfies the hypotheses of Theorem 1.7, since all the -by- submatrices have non-zero determinant and surds of primes are rationally independent. Taking , one may therefore apply Theorem 1.7.
This yields an asymptotic expression for the number of solutions to (1.12) with the main term in the form of an integral. Since , by Remark 1.9 we may express the main term as for some constant . Explicitly, from Lemma A.2 and expression (A.6) therein,
[TABLE]
where
[TABLE]
By a computation, we satisfy ourselves that is positive. ∎
Theorem 1.7 may also be used to count prime solutions to other systems.
Corollary 1.13.
Let be a real vector with algebraic coefficients. Suppose that there does not exist any that satisfies . Let denote the set of primes. Then
[TABLE]
for some positive constant .
Here denotes the floor function of , i.e. the greatest integer that is at most .
Proof.
We can expand the left-hand side of (1.6) as
[TABLE]
Observe that the equation has no solutions, since is irrational by assumption. So the above is equal to
[TABLE]
and this in turn is equal to
[TABLE]
where is the vector with every coordinate equal to , and .
Let be the -by- matrix
[TABLE]
Then (1.7) is equal to
[TABLE]
where .
One sees that satisfies the hypotheses of Theorem 1.7. Indeed, note first that if there exists some for which then by considering the final coordinates of it follows that such an must have integer coordinates. But by considering the second coordinate of it follows that , which contradicts our assumptions on . Secondly, if were in then either for some index , or for two different indices and . Both of these possibilities are precluded by the assumptions on .
Therefore we may apply Theorem 1.7, and by Remark 1.9 we get a main term of the form . Explicitly, using Lemma A.2 and expression (A.6) as above, we have
[TABLE]
For any vector this integral is positive, and so the corollary is proved. ∎
Let us now present a theorem which does not require to be purely irrational. This is Theorem 1.16 below, and we consider it to be our main result.
For ease of notation, we introduce the following definition.
Definition 1.14.
Let be natural numbers, and let be a linear map. Let and be functions with compact support. Let . Then, for functions , we define
[TABLE]
It will be convenient to introduce a logarithmic weighting to the primes. To this end, following [13], we define the function by
[TABLE]
The von Mangoldt function will not be needed in this paper.
Another notion from [13] will be useful.
Definition 1.15 (Local von Mangoldt function).
For , the local von Mangoldt function is the -periodic function defined by
[TABLE]
We let denote the restriction of to the non-negative reals, namely the function .
The local von Mangoldt function, when is the product of small primes, can be viewed as a model for the function . This model (essentially the modified Cramér random model) is intimately connected to a technical device known as the $W$-trick, which we recall in Section 7.
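To make the local von Mangoldt function concrete, the following Python sketch evaluates it for a small modulus. It assumes the definition used in [13], namely that the function takes the value $q/\varphi(q)$ at integers coprime to the modulus $q$ and the value $0$ elsewhere; the displayed definition above is not reproduced in this copy, so this is an assumption rather than a quotation. The function names are hypothetical.

```python
from fractions import Fraction
from math import gcd


def euler_phi(q):
    """Euler's totient function, by direct counting (fine for small q)."""
    return sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)


def local_von_mangoldt(q, n):
    """Assumed definition from [13]: q/phi(q) if gcd(n, q) = 1, else 0."""
    if gcd(n, q) != 1:
        return Fraction(0)
    return Fraction(q, euler_phi(q))


q = 6
values = [local_von_mangoldt(q, n) for n in range(q)]
print(values)
print(sum(values) / q)  # average over one period
```

Under this assumed definition the function is periodic with period $q$ and has average value $1$ over a period, which is the sense in which it models a density-one weight supported on residues that primes can occupy.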
For a function we define the Lipschitz constant of to be
[TABLE]
and call Lipschitz if this value is finite.
We may now state the main theorem.
Theorem 1.16 (Main theorem).
Let be natural numbers with , and let be positive real parameters. Let be a surjective linear map with algebraic coefficients, and suppose that . Let be any vector that satisfies . Let and be compactly supported Lipschitz functions with Lipschitz constants at most , and assume that is supported on and is supported on . Let , assuming that is large enough for this function to be well defined, and let . Then
[TABLE]
as .
Remark 1.17.
If is supported on , we have
[TABLE]
We will prove an asymptotic formula for later, in Lemma 9.11 and Remark 9.12. For example, if and
[TABLE]
say, and and are smooth functions supported on and respectively, one may use Lemma 9.11 and Remark 9.12 to show that
[TABLE]
where
[TABLE]
and
[TABLE]
where
[TABLE]
The constant is in fact equal to
[TABLE]
where are the coordinate maps for .
It takes some effort to establish precisely what the map should be for a given . What’s more, the asymptotic formula in the general case is not just a product of a local factor and a global factor but rather a finite sum of products of local factors and global factors, and we will need to introduce an abundance of additional notation in order to be able to state these terms properly. Thus, in the interests of readability, we choose not to include this formula as part of the statement of Theorem 1.16.
Remark 1.18.
If has rational coefficients (or more generally if has rational dimension , see Definition 5.2 below), then Theorem 1.16 reduces to a statement on linear equations in primes (a reduction which we will make precise in Remark 5.7 below). In this sense, our work is a generalisation of Green-Tao-Ziegler.
Remark 1.19.
We have phrased Theorem 1.16 with Lipschitz cut-offs and . In Section 17 we will demonstrate how these cut-offs may be removed when is ‘purely irrational’, and in doing so will demonstrate how Theorem 1.16 implies Theorem 1.7. The same methods may be applied when is not purely irrational, but they will not always succeed, due to the rational degeneracy introduced in those cases. Unfortunately we have not been able to formulate what we regard to be a satisfactory general condition for saying when (1.9) holds with sharp cut-offs and . Note in particular how the proof of Lemma A.2 relies heavily on the convex sets and being axis-parallel boxes. Therefore we do not present a version of the theorem in which summation is over a general convex set , as is done in Theorem 1.1. However, if the reader wishes to apply a specific instance of Theorem 1.16 with sharp cut-offs, the methods of Section 17 and Appendix A will almost certainly suffice for the purpose.
Remark 1.20.
The reader will observe that, as in Theorem 1.7, we do not determine the nature of the dependence of the error term in (1.9) on the map . We discussed this feature in Remark 1.10.
We conjecture that the conclusion of Theorem 1.16 holds for all , provided grows slowly enough in terms of .
Conjecture 1.21 (Transcendental case).
Let , , , and be as in the statement of Theorem 1.16, but do not assume that necessarily has algebraic coefficients. Then there is some function , with as , such that (1.9) holds with .
In Section 9 we will formulate a statement involving smoothed sieve weights (namely Conjecture 9.6) which, if resolved, would settle Conjecture 1.21.
Acknowledgments. During the writing of this paper we benefited greatly from the supervision of Ben Green, and had helpful conversations with Sam Chow, Trevor Wooley, Yufei Zhao, Joni Teräväinen and Kaisa Matomäki. We would like to thank an anonymous referee for an exceptionally detailed reading of the manuscript and for many helpful corrections and comments. The majority of the work was carried out while the author was supported by EPSRC grant no. EP/M50659X/1, continued while the author was a Program Associate at the Mathematical Sciences Research Institute in Berkeley, and finished while the author was supported by a Junior Research Fellowship at Trinity College Cambridge.
2. The structure of the argument
In this section we discuss our approach to proving Theorem 1.16, and describe the geography of the paper as a whole.
Initially, one might hope that Theorem 1.16 could be proved by replacing the coefficients of with some rational approximations, by considering the corresponding linear equation with rational coefficients, and then by appealing directly to Theorem 1.1 on linear equations in primes. However, unless the coefficients of are extremely well-approximable by rationals (and in particular are transcendental), such an approach does not seem to succeed. Indeed, let and let be a rational approximation to , with being the corresponding approximation to . In order for the comparison of with to be meaningful, we will need for all relevant , and in the general situation where all coordinates of have magnitude this requires to be . Hence the numerator and denominator of must grow rapidly with , unless is extremely well-approximable. Yet Theorem 1.1 requires the coefficients of the associated affine linear equations to have height (excepting the constant term, which may be ). In [3] Bienvenu offers a slight improvement, but even with this refinement it does not seem that we can apply an existing result on linear equations in primes as a black box.
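The obstruction described above can be seen numerically in a toy model with a single irrational coefficient. The sketch below is an illustration under assumptions that are not taken from the paper: the coefficient is $\sqrt{2}$, the variables have size about $N$, and the rational approximation is required to be within $\varepsilon/N$ with $\varepsilon = 0.1$, so that the approximate and true linear forms differ by at most about $\varepsilon$. Since $|\sqrt{2} - p/q|$ is of order at least $1/q^2$ for every rational $p/q$, the denominator is forced to grow like a power of $N$.

```python
from math import sqrt


def first_convergent_within(tol):
    """Return the first continued-fraction convergent (p, q) of sqrt(2)
    with |sqrt(2) - p/q| < tol (floating-point comparison)."""
    # sqrt(2) = [1; 2, 2, 2, ...], so consecutive convergents satisfy
    # p_next = 2*p + p_prev and q_next = 2*q + q_prev.
    p_prev, q_prev = 1, 0
    p, q = 1, 1
    while abs(p / q - sqrt(2)) >= tol:
        p_prev, q_prev, p, q = p, q, 2 * p + p_prev, 2 * q + q_prev
    return p, q


# Hypothetical parameters: variables of size N, target accuracy 0.1.
for N in (10**2, 10**4, 10**6):
    p, q = first_convergent_within(0.1 / N)
    print(N, p, q)
```

The denominators printed grow roughly like the square root of $N$, which is already far larger than the bounded height that the text says Theorem 1.1 requires of its coefficients.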
Instead, we will follow a similar approach to that which we used in our work [21], a paper that considered diophantine inequalities in the setting of bounded functions. Namely, we replace the function by a suitable convolution , designed to ensure the validity of the approximation
[TABLE]
The integral may be manipulated by certain reparametrisations (Lemma 14.3), yielding expressions of the form
[TABLE]
where parametrises and are certain functions. By applying the Gowers-Cauchy-Schwarz inequality, in a manner strongly resembling [13, Appendix C], such expressions may be bounded by the Gowers norm , for some . A qualitative bound on this Gowers norm is known by the work of Green-Tao-Ziegler (see Lemma 7.5), and so Theorem 1.16 follows.
The novel aspect of this manipulation, over the work of [13] and [21], is the appearance of various auxiliary linear inequalities, weighted by upper bound sieve weights. These enter in a manner that is somewhat analogous to the way in which the so-called ‘linear forms condition’ arises in [13]. Asymptotics for the number of solutions to these auxiliary inequalities underpin the argument, and this leads to a ‘linear inequalities condition’
[TABLE]
for a sieve weight , which is our corresponding notion of pseudorandomness (made precise in Definition 9.1). We are unable to verify this pseudorandomness condition in full generality, but we succeed in the case when has algebraic coefficients. Our key technical tool is a bound for the number of solutions to a diophantine inequality restricted to a lattice, which we prove using the Davenport-Heilbronn method. This is the only part of the entire argument that uses the fact that the coefficients are assumed to be algebraic.
There is a final technical manoeuvre that we employ, one which has no direct analogue in [13] or [21]. It will transpire that passing to the local von Mangoldt function introduces certain singular expressions, which arise from the fact that we are dealing with inequalities rather than equations. To circumvent this issue we find it necessary to work at two different ‘local scales’, introducing functions and . By careful manoeuvring one can ensure that the singular expressions are only introduced by the scale, and so, provided grows slowly enough compared to , these singularities may be offset by the decay in the Gowers norm expressions involving . This further complicates the analysis of the expressions, and in fact our final choice of function will be non-effective.
The structure of the paper is as follows. The main elements of the proof of Theorem 1.16 take place in Part V, and the reader may wish to begin with this section. It is here that we reduce matters to bounding certain systems by Gowers norms (Section 12), prove the approximation (2.1) (Section 13), and apply the Gowers-Cauchy-Schwarz inequality (Section 15).
However, the arguments of this part rely heavily on lemmas that are proved earlier in the paper, and these lemmas split naturally into four types. There are those results that are standard properties of smooth functions, and these are recorded in Section 3. We also have lemmas whose proofs involve manipulation of a purely linear algebraic nature, in order to reduce inequalities to ones that are ‘purely irrational’ or to put linear equations into ‘normal form’. We describe these notions in Part II. The definition of pseudorandomness for an enveloping sieve weight is contained in Part III, as is our proof that a certain weight satisfies this pseudorandomness condition. Also in this part one may find Conjecture 9.6, which, if resolved, would remove the algebraicity assumptions. Part IV is reserved for those lemmas that involve the (somewhat tedious) manipulation of integrals into more pleasant forms. One of these lemmas is Lemma 11.1, which is the lemma that introduces the second local scale that we mentioned above.
The first appendix is concerned with elementary estimates relating to the integral that appears in the global factor of Theorem 1.16. As we have already said, Appendix B presents a Fourier-analytic argument which is essentially due to Parsell.
Finally, let us mention that, to help to streamline the statements of various propositions and lemmas in the paper as a whole, we have found it useful to introduce certain notational conventions that are unique to this paper. We describe these in Section 4.
Part I Preliminaries
3. Smooth functions
Smooth functions will play a significant role in the paper, and in this section we collect together those notions and lemmas that will be necessary for our forthcoming manipulations.
Following [17, Section 2], given a natural number and a compactly supported smooth function , we define to be the corresponding value of , to be the smallest such that is supported on , and for every non-negative integer we define
[TABLE]
Then, if is any set, we shall define to be the set of those smooth functions for which
[TABLE]
can be bounded above by quantities that depend only on the elements of . For example, let be the function given by
[TABLE]
and then for a positive parameter let be defined by . Then , as is proved rather succinctly in [4, Lemma 9], say.
In order to shorten some of the statements in the main part of the paper, it will be convenient to consider all functions on to be smooth (with derivatives equal to $0$).
Let us record a standard proposition on smooth majorants and minorants.
Lemma 3.1.
Let be a real number in the range . Then there exist two smooth functions , with , satisfying
[TABLE]
for all .
Proof.
Let be as above, and let . Then one may define
[TABLE]
and
[TABLE]
The fact that follows from differentiating under the integral (which is easily justified by the mean value theorem). ∎
Lemma 3.2 (Smooth partition of unity).
Let be a real number in the range . Then there exists a natural number , satisfying , and functions such that
(1) for each , ;
(2) for each , is supported on an interval of length at most ;
(3) for all , ;
(4) for all , is contained in the support of at most of the functions .
Proof.
Let , and write
[TABLE]
where
[TABLE]
Then define
[TABLE]
The desired properties are immediate. ∎
Lemma 3.3 (Approximating Lipschitz functions by smooth boxes).
Let be positive real parameters, with in the range . Let be a natural number, and let be a Lipschitz function supported on with Lipschitz constant at most . Then there exists a natural number , satisfying , and functions such that
(1) ;
(2) for each , is supported on a box with side length ;
(3) there is a natural number , satisfying , and functions , satisfying , such that
[TABLE]
for each , for some element and some constant .
Proof.
We have
[TABLE]
where the functions are those constructed by applying Lemma 3.2 with this value of . This manipulation is indeed valid, since for any for which
[TABLE]
Swapping the product and summation, (3) equals
[TABLE]
Let be any point at which is non-zero. Then the above is equal to
[TABLE]
by the Lipschitz properties of and the limited support of the functions (which was part (2) of Lemma 3.2).
Define
[TABLE]
These functions satisfy properties (2) and (3) of Lemma 3.3. Finally note that, by part (4) of Lemma 3.2, each is contained in the support of at most of the functions , and hence , as required. ∎
The Fourier transform of smooth functions will be an important tool in Section 8. We choose the following convention. If is a compactly supported smooth function, we define the Fourier transform by the formula
[TABLE]
Lemma 3.4.
Let be a set of parameters and suppose . Then for every and every non-negative integer one has
[TABLE]
Proof.
This follows from integration by parts. ∎
Finally, we recall the definition of dual lattices and the version of the Poisson summation formula that we will use.
Definition 3.5 (Dual lattice).
Let be a natural number and let be a lattice of rank . Then the dual lattice is defined by
[TABLE]
It is easily seen that if is an -by- matrix whose columns are a lattice basis for , then is an -by- matrix whose columns are a lattice basis for .
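The relationship between a lattice basis and a basis of the dual lattice can be verified directly in small cases. The sketch below assumes the standard convention that the dual lattice consists of the vectors having integer inner product with every lattice vector, and the standard fact that if the columns of an invertible matrix $M$ form a basis of the lattice then the columns of $(M^{-1})^{T}$ form a basis of the dual. The matrix `M` and the helper names are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def inverse(matrix):
    """Inverse of a square rational matrix, by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for c in range(n):
        # Raises StopIteration if the matrix is singular.
        pivot = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pivot_value = aug[c][c]
        aug[c] = [x / pivot_value for x in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                factor = aug[i][c]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def dual_basis(M):
    """Columns of the result form a basis of the dual lattice."""
    return transpose(inverse(M))


def pairing(M, D):
    """Matrix of inner products between columns of M and columns of D."""
    n = len(M)
    return [[sum(M[k][i] * D[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


M = [[2, 1], [0, 3]]   # columns (2, 0) and (1, 3) span the lattice
D = dual_basis(M)
print(D)
print(pairing(M, D))   # identity matrix: all inner products are integers
```

Because the matrix of inner products is the identity, every integer combination of the columns of `M` has integer inner product with every integer combination of the columns of `D`.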
Lemma 3.6 (Poisson summation).
Let be a natural number and let be a lattice of rank . Let be a smooth compactly supported function. Then
[TABLE]
Proof.
This is a standard result. The version in which appears as [8, Theorem 3.1.17], with the extension to general full-rank lattices following from a change of variables. ∎
4. Notation and Conventions
For the most part the notation used in this paper is very standard, and any usage that could be viewed as somewhat unusual will be introduced as and when it is required. However, there are a few particular points that will apply to the paper as a whole which we believe to be important to address now.
We will use the Bachmann-Landau asymptotic notation , , and , but we do not, as is sometimes the convention, for a function and a positive function choose to write if there exists a constant such that for sufficiently large. Rather we require the inequality to hold for all in some pre-specified range. If is a natural number, the range is always assumed to be unless otherwise specified. It will be a convenient shorthand to use these symbols in conjunction with minus signs, whenever they appear in exponents. For example, refers to a term , where is some positive quantity bounded away from $0$ as the asymptotic parameter tends to infinity.
The Vinogradov symbol will be used, where for a function and a positive function we write if and only if . We write if and . If an implied constant or a term depends on other parameters, we will denote these by subscripts, e.g. , or . However, if the implied constants depend on the underlying dimensions (denoted by , , and occasionally by , , and ) we will not record this fact explicitly, as this would render most of the expressions unreadable.
The notation , which was introduced in the previous section for compactly supported smooth functions , will also be used when is not smooth.
In order to keep track of which variables are scalars and which are vectors, we will use boldface to denote any where could be at least . In order to describe certain integrals over many variables, the following notational convention will be useful. If and if and are two subscripts with , we use to denote the vector .
With a view to trying to shorten some of the statements and proofs to follow, there are certain functions that we will fix throughout the paper, namely , , , and . From now on, the function will always be defined by
[TABLE]
Whenever is a quantity that we have defined, we write for and let
[TABLE]
The empty product is considered to be equal to . Whenever other functions occur, and a natural number is given, we will define analogously.
The following definition (a smooth version of [21, Definition 5.2]) will be a useful way to control certain functions that are required in the argument.
Definition 4.1 (-supported).
Let be a smooth function, and let be a positive parameter. We say that is -supported if is supported on and for all .
It follows from Lemma 3.1 that -supported functions exist. From now on we fix a smooth function
[TABLE]
that is -supported. We think of as an element of . Whenever a positive parameter is defined we also define
[TABLE]
by the relation . The function is -supported, and satisfies .
We finish this section with some pieces of notation of a more standard nature. If for some , we define
[TABLE]
If is the singleton , we write for . We let denote the topological boundary of (though the symbol will also be used for partial differentiation, as usual). If and are two sets with , we let denote the indicator function of . The relevant set will usually be obvious from context. If is some event, e.g. a divisor condition, we will also use for the indicator function of this event. For we adopt the standard shorthand to mean . The Möbius function will be denoted by , though in Section 15 the symbol will also be used to denote a measure. In Section 9 we will use for Euler’s -function, and for two natural numbers and we use the shorthand to denote their greatest common divisor.
Part II Linear algebra
In [21] we developed an armoury of linear-algebraic methods, which enabled us to manipulate linear inequalities into certain desired forms. The same manipulation is necessary here. We have chosen not to consign this material to an appendix, nor simply to cite [21], since the result of Lemma 5.6 below will be very important during subsequent sections. We will also need a few results (on the vector below) that were not required in our previous work, and so citing [21] won’t quite do.
Fortunately, as we do not seek to determine exactly how the error term in Theorem 1.16 depends on , we can offer a significant simplification over the work that was presented in [21]. This is another reason to include this material.
Before starting, we remind the reader of some of the central definitions from the theory of dual vector spaces and dual linear maps, which will be used liberally throughout. Let be a finite-dimensional vector space over a field . Then denotes the dual vector space, i.e. the vector space of all linear maps under pointwise addition and scalar multiplication. If is a linear map between two finite-dimensional vector spaces, the dual map is defined by the relation for all and . Given a basis for , the dual basis for is defined by extending linearly the relations
[TABLE]
Finally, given a set the annihilator is defined by
[TABLE]
5. Dimension reduction
We begin with a generalisation of Definition 1.8. Note that the case is permitted below.
Definition 5.1.
Let be natural numbers, and let be a non-negative integer. Let be a linear map, and let be a linear map with integer coefficients. Let and be functions with compact support. Let and . Then for we define
[TABLE]
where is the coordinate of .
The reader might notice that this definition is subtly different from the similar definition that appeared in [21], namely Definition 4.3 of that paper, in which the function was treated as an arbitrary function . When dealing with quantitative aspects of smooth functions (a feature of this paper that is not required in [21]) it is convenient to preserve the internal structure of this particular function, and so we have modified Definition 5.1 accordingly.
Recall the notion of rational maps from [21].
Definition 5.2 (Rational dimension, rational map, purely irrational).
Let and be natural numbers, with . Let be a surjective linear map. Let denote the largest integer for which there exists a surjective linear map for which . We call the rational dimension of , and we call any map with the above property a rational map for . We say that is purely irrational if .
Remark 5.3.
If (the matrix of) has algebraic coefficients, then there exists a rational map for that also has algebraic coefficients.
Purely irrational linear maps are those that we may analyse most easily using the Davenport-Heilbronn method (see Section 8). However, even when proving Theorem 1.7, whose statement concerns only purely irrational linear maps, we will be forced to consider auxiliary linear maps that are not purely irrational. It is necessary therefore to develop a rudimentary theory of these maps. Readers desiring more detail and motivating examples concerning rational maps and rational dimension may consult Sections 2, 4, and 6 of [21].
Our key tool will be Lemma 5.6, which is a version of Lemma 4.10 from [21]. This lemma will enable us to ‘quotient out’ the rational relations that are present in a diophantine inequality, leaving behind a purely irrational linear map between spaces of a lower dimension. In particular, we will show that
[TABLE]
where is purely irrational, and the vectors and , the linear map and the function are objects that we may control.
To state the lemma we need to recall explicitly the notion from [13] that was mentioned in Remark 1.6, namely finite Cauchy-Schwarz complexity for linear maps. (In [21] a notion of degeneracy for pairs of linear maps was useful, but we have structured the present paper in such a way as to avoid requiring this complicated notion.)
Definition 5.4 (Finite Cauchy-Schwarz complexity).
Let be natural numbers, and let be a linear map. We say that has infinite Cauchy-Schwarz complexity if there are two distinct indices and , and some , for which . If no such and exist we say that has finite Cauchy-Schwarz complexity.
There is an equivalent definition, which will be more convenient for algebraic manipulations.
Definition 5.5 (Finite Cauchy-Schwarz complexity, equivalent definition).
Let be natural numbers. Let denote the standard basis vectors of , and let denote the dual basis of . Then let denote the set of all linear maps for which there exist two indices , and some real number , such that is non-zero and . If , we say that has finite Cauchy-Schwarz complexity.
The equivalence of these definitions is elementary.
For more background on the notion of finite Cauchy-Schwarz complexity, the reader may consult Section 1 of [13] or Section 6 of [21].
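As a minimal sketch of Definition 5.4, the following Python check reads the definition as saying that a system has infinite Cauchy-Schwarz complexity exactly when one coordinate form is a scalar multiple of another. The function names and the two example systems are hypothetical illustrations, not objects from the paper, and the check assumes integer (or exact rational) coefficients so that the vanishing of minors can be tested exactly.

```python
from itertools import combinations


def are_parallel(u, v):
    """True if u and v are linearly dependent, i.e. all 2x2 minors vanish."""
    return all(
        u[i] * v[j] - u[j] * v[i] == 0
        for i in range(len(u))
        for j in range(i + 1, len(u))
    )


def has_finite_cs_complexity(forms):
    """forms: one coefficient vector per coordinate form of the linear map."""
    return not any(are_parallel(u, v) for u, v in combinations(forms, 2))


# Hypothetical example: the forms x, x+h1, x+h2, x+h1+h2 in variables (x, h1, h2).
cube = [(1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1)]
# Hypothetical example: x+y and 2x+2y in variables (x, y); the second is twice the first.
degenerate = [(1, 1), (2, 2)]

print(has_finite_cs_complexity(cube))
print(has_finite_cs_complexity(degenerate))
```

With irrational coefficients an exact test of this kind is not available in floating point, so a symbolic representation of the coefficients would be needed instead.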
Now we may state and prove the important lemma, which provides the ‘dimension reduction’ of the section title.
Lemma 5.6 (Generating a purely irrational map).
Let be natural numbers, with , and let be positive parameters. Let be a surjective linear map with algebraic coefficients. Let be the rational dimension of . Let and be compactly supported functions. Assume that is smooth, , and moreover that for some set of parameters . Let be a vector with . Then there exists a surjective linear map , a surjective linear map , an injective linear map , a finite subset , a vector , and, for each , a compactly supported function , such that
(1) is a rational map for with algebraic coefficients;
(2) has integer coefficients, depends only on , and satisfies and ;
(3) satisfies , and for all ;
(4) for all , the function is smooth, , and ;
(5) satisfies ;
(6) for all natural numbers , and for all functions , one has
[TABLE]
(7) is purely irrational, depends only on , and has algebraic coefficients;
(8) if then has finite Cauchy-Schwarz complexity.

The above properties suffice for Section 9, but three additional properties also hold. We will need these additional properties in Section 11.

(9) Letting denote the standard basis of , there is a set for which
[TABLE]
is a basis for and a lattice basis for . Furthermore, and is a lattice basis for ;
(10) if is small enough in terms of , and if for some , then and is a vector that minimises over all ;
(11) for all and one has
[TABLE]
Proof.
Parts (1) and (2): Choose to be a rational map for that has algebraic coefficients. By rank-nullity is a dimensional subspace of , and also the matrix of has integer coefficients. Combining these two facts, we see that is a dimensional lattice, and (by the standard algorithms) one can find a lattice basis that satisfies for every .
Let denote the standard basis of , and then define by
[TABLE]
for all . Then satisfies part (2) of the lemma.
Parts (3), (9), and (10): There is a set of vectors that is an integer basis for the lattice and for which for each . Furthermore there exists a set of vectors such that for each , and . By Lemma 4.8 of [21],
[TABLE]
is a basis for and a lattice basis for .
Now, if and then . Recall that and that . It follows that there are at most possible vectors for which there exists a vector for which both and . Let denote the set of all such vectors . Observe that, for all , .
For each , there exists a unique vector such that . Note that . Letting denote the set of these , we see that satisfies part (3).
If is small enough in terms of , then has size at most . Indeed, if and are two different vectors in , with respective and , then and . Hence . Yet (which is a contradiction). In this instance, writing in the form , we may pick to be an element in that minimises over all .
Parts (4), (5), (6), and (11): By the definition of , and the fact that , we have that is equal to
[TABLE]
This is very close to being of the form required for part (6), and indeed it can be massaged into exactly the required form.
To do this, note that
[TABLE]
and so there exists an invertible linear map with algebraic coefficients such that
[TABLE]
For all we have
[TABLE]
We also note that , and that .
Now, write for the projection map onto the final coordinates. Define by
[TABLE]
where is the extension of by 0 in the first coordinates. Then satisfies the desired properties of part (3), since and are orthogonal.
Then (5.4) is equal to
[TABLE]
Let
[TABLE]
Then is surjective, and
[TABLE]
This resolves parts (5) and (6). But furthermore, by the construction of , part (10) is also satisfied.
Part (7): This is immediate from Lemma 4.10 of [21]. To spell it out, suppose for contradiction that there exists some surjective linear map with , i.e. with . Then define the map by
[TABLE]
Then is surjective, and . This second fact is immediately seen by writing with respect to the lattice basis from (5.3). This contradicts the assumption that has rational dimension . So is purely irrational.
Part (8): Suppose and suppose for contradiction that has infinite Cauchy-Schwarz complexity. Letting denote the standard basis of , this means there exists and a non-zero vector such that . But . Hence , which implies that , contradicting our hypothesis.
The lemma is proved. ∎
Remark 5.7.
Applying Lemma 5.6 with for all , and when has rational dimension , it is evident that estimating is equivalent to counting solutions to systems of linear equations given by . This is handled by the Main Theorem of [13]. In this sense, one may see how our work in this paper generalises Green-Tao’s work in [13] to the cases in which the rational dimension is not equal to .
6. Normal form
In this section we describe, very briefly, what it means for a linear map to be in -normal form. For a more complete discussion we refer the reader to [13] and [21].
Definition 6.1 (Normal form).
Let be natural numbers, let be a non-negative integer, and let be a linear map. We say that is in -normal form if for every there exists a collection of basis vectors of cardinality such that is non-zero for and vanishes otherwise.
The notion of normal form is intimately connected with the notion of finite Cauchy-Schwarz complexity (Definition 5.5). The key proposition was proved in [13]. (In [21] we were forced to prove a delicate quantitative version, but this will not be necessary here.)
Lemma 6.2 (Normal form extensions).
Let be natural numbers, and let be a linear map with finite Cauchy-Schwarz complexity. Then there is a linear map such that:
- ;
- for some vectors that satisfy for every , the map is of the form
[TABLE]
for all ;
- is in -normal form, for some .
Proof.
In [13, Lemma 4.4] this lemma was proved for a linear map over a -vector space. The proof over is identical. Alternatively one can iterate [21, Proposition 6.7] over all . ∎
Remark 6.3.
In Lemma 6.2 one may take to be the Cauchy-Schwarz complexity of . This notion will not be used in this paper, save for the ‘finite versus infinite’ dichotomy already given in Definition 5.5.
Part III Pseudorandomness
Notions of pseudorandomness are crucial to the theory of higher order Fourier analysis. A small Gowers norm is one such notion, as is satisfying the ‘linear forms condition’ of [11] and [13]. In this part we review what is known about Gowers norms in relation to the primes, and then formulate a ‘linear inequalities condition’, which will be the analogous notion of pseudorandomness for this paper.
7. The -trick and Gowers norms
To begin with, let us recall the definition of the Gowers norm over a cyclic group and over . Given a function , and a natural number , one defines the Gowers norm to be the unique non-negative solution to the equation
[TABLE]
where , , is the complex-conjugation operator, and the summation is over . It is not immediately obvious why the right-hand side of (7.1) is always a non-negative real, nor why the norms are genuine norms if , but both facts are true. There are many expositions of the standard theory of these norms available in the literature, for example [19, Chapter 11] and [10]. For the most general treatment, the reader may consider Appendices B and C of [13].
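For readers who prefer a computation, here is a brute-force sketch of the second-order case. It assumes that (7.1) is the standard definition, in which the fourth power of the U^2 norm is the average over x, h1, h2 of f(x) times the conjugates of f(x+h1) and f(x+h2) times f(x+h1+h2). It then compares the result with the well-known identity expressing this quantity as the sum of fourth powers of the normalised Fourier coefficients, which in particular shows that the right-hand side is a non-negative real. The function names and the test function are hypothetical.

```python
import cmath
from itertools import product


def gowers_u2_fourth_power(f):
    """Average of f(x) conj f(x+h1) conj f(x+h2) f(x+h1+h2) over Z/nZ."""
    f = [complex(v) for v in f]
    n = len(f)
    total = 0
    for x, h1, h2 in product(range(n), repeat=3):
        total += (
            f[x]
            * f[(x + h1) % n].conjugate()
            * f[(x + h2) % n].conjugate()
            * f[(x + h1 + h2) % n]
        )
    return total / n**3


def fourier_l4_fourth_power(f):
    """Sum over r of |hat f(r)|^4, with hat f(r) the average of f(x) e(-rx/n)."""
    n = len(f)
    out = 0.0
    for r in range(n):
        coeff = sum(
            f[x] * cmath.exp(-2j * cmath.pi * r * x / n) for x in range(n)
        ) / n
        out += abs(coeff) ** 4
    return out


# Hypothetical test function on Z/7Z.
f = [1, -1, 2, 0, 1j, 3, -2]
lhs = gowers_u2_fourth_power(f)
rhs = fourier_l4_fourth_power(f)
print(lhs, rhs)
```

The triple loop costs n^3 operations, so this is only meant for small moduli; higher-order norms would need one further loop per extra shift variable.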
In the sequel we will be considering functions defined on rather than on . However, the Gowers norm of such functions may be easily defined by reference to the cyclic group case. Indeed, if , and is a natural number, one chooses a natural number and then considers as an initial segment of (viewing as a set of representative classes for ). One then defines
[TABLE]
which is independent of provided is large enough in terms of .
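The independence from the auxiliary modulus can be checked numerically in the second-order case. The sketch below assumes that (7.2) is the usual normalisation, namely the cyclic Gowers norm of the function extended by zero, divided by the cyclic Gowers norm of the indicator of the interval. The function names, the sample values, and the two moduli are hypothetical choices; for the U^2 norm any modulus of size at least twice the interval length avoids wrap-around.

```python
from itertools import product


def u2_fourth_power_cyclic(values, modulus):
    """Fourth power of the U^2 norm of `values`, extended by zero to Z/modulus."""
    g = [complex(0)] * modulus
    for i, v in enumerate(values):
        g[i] = complex(v)
    total = 0
    for x, h1, h2 in product(range(modulus), repeat=3):
        total += (
            g[x]
            * g[(x + h1) % modulus].conjugate()
            * g[(x + h2) % modulus].conjugate()
            * g[(x + h1 + h2) % modulus]
        )
    return (total / modulus**3).real


def u2_fourth_power_interval(values, modulus):
    """Assumed normalisation: divide by the same quantity for the indicator."""
    ones = [1] * len(values)
    return u2_fourth_power_cyclic(values, modulus) / u2_fourth_power_cyclic(
        ones, modulus
    )


# Hypothetical function on an interval of length 5, embedded in two moduli.
f = [1, -1, 2, 0.5, -3]
small = u2_fourth_power_interval(f, 12)
large = u2_fourth_power_interval(f, 20)
print(small, large)
```

Both calls count the same parallelograms inside the interval, and the factor depending on the modulus cancels in the ratio.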
This is as much background as we will give here, and the reader is invited to consult the aforementioned references for more detail. A Gowers norm over will also appear later on in this paper, but will be introduced in Section 13 as and when it is needed.
We move our consideration to the primes. Given some fixed modulus the primes are not uniformly distributed across arithmetic progressions modulo (as almost all the primes are coprime to ), and this lack of uniformity is an obstacle when trying to count solutions to equations in primes. Fortunately, there is a technical device, known as the -trick, that has long been used to manage this difficulty.
This device is usually introduced via the following function.
Definition 7.1.
Let be a natural number, and let be as in Section 4. For any natural number with , let be defined by
[TABLE]
The idea from [13], going back to [9] and [11], is that the function
[TABLE]
should act as a proxy for , while each enjoys strong pseudorandomness properties. For example we have the following deep result, which is a crucial component of the proof of Theorem 1.1 on linear equations in primes.
Theorem 7.2 ([13, Theorem 7.2]).
Let be natural numbers, and let be any function that satisfies as and for all . Let be a natural number that satisfies and . Then
[TABLE]
as , where the term may depend on the function chosen (but is independent of the choice of ).
We remind the reader that is a dimension parameter, and so dependence on is not denoted explicitly in our implied constants.
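The effect of passing to a fixed residue class can be seen in a small numerical experiment. The sketch below assumes the Green-Tao normalisation, in which the modified von Mangoldt function at n equals (phi(W)/W) log(Wn+b) when Wn+b is prime and 0 otherwise; the modulus W = 30, the cut-off N = 2000, and all function names are hypothetical choices made for illustration. It checks that primes larger than 5 avoid every residue class not coprime to 30, and that the normalised function has average close to 1 in each coprime class.

```python
from math import gcd, log


def primes_up_to(limit):
    """Sieve of Eratosthenes; entry p is 1 exactly when p is prime."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return sieve


def w_tricked_average(W, b, N, is_prime):
    """Average over 1 <= n <= N of (phi(W)/W) * log(Wn+b) * 1[Wn+b prime]."""
    phi_W = sum(1 for a in range(1, W + 1) if gcd(a, W) == 1)
    total = sum(log(W * n + b) for n in range(1, N + 1) if is_prime[W * n + b])
    return (phi_W / W) * total / N


W, N = 30, 2000  # assumed values; the paper lets W grow slowly with N
is_prime = primes_up_to(W * N + W)
averages = {
    b: w_tricked_average(W, b, N, is_prime)
    for b in range(1, W)
    if gcd(b, W) == 1
}
for b, avg in sorted(averages.items()):
    print(b, round(avg, 3))
```

This only illustrates the first-order statement that the average is close to 1; Theorem 7.2 is the much deeper assertion that the deviation from 1 is small in every Gowers norm.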
Remark 7.3.
In [13] Theorem 7.2 is proved conditionally, relying on two other conjectures. But, as we intimated in the introduction, these conjectures were later settled in joint work of Green-Tao and Green-Tao-Ziegler [14, 15, 16].
Remark 7.4.
We will use Theorem 7.2 to prove Theorem 1.16. Unfortunately it seems that this cannot be done in the same manner as in [13], i.e. by splitting into arithmetic progressions modulo at an early stage and then performing subsequent manipulations with the functions .
As a heuristic, instead of considering an inequality such as
[TABLE]
for some with irrational coefficients and some positive , [13] considers (7.4) for some with rational coefficients and sets equal to 0. Under those assumptions one may rescale the variables by a factor of , as required in Definition 7.1, without fundamentally altering the problem. However, in the more general scenario of Theorem 1.16, where is strictly positive, rescaling the variable by a factor of means we must replace by , and we cannot afford this loss, as the manipulations in Section 13 lose some powers of . As far as we have been able to tell, this means that we cannot perform the -trick in this manner.
To circumvent this issue of scaling, we will manipulate with the local von Mangoldt functions throughout, saving our rescaling for the very end of the argument. Regarding the control on Gowers norms, the following lemma is therefore the more appropriate bound.
Lemma 7.5.
Let be natural numbers. Then
[TABLE]
as .
The proof is a standard deduction from results of [13], achieved by splitting into arithmetic progressions modulo . We would however like to thank the anonymous referee for suggesting a simplification to our original argument.
Proof.
Let denote the linear map giving the Gowers norm, i.e. where each is of the form . From expression (7.2), we then have
[TABLE]
where
[TABLE]
It is immediate that .
We now split into arithmetic progressions modulo . To this end let be the set defined by
[TABLE]
Then the right-hand side of (7.5) is
[TABLE]
plus an error of magnitude at most
[TABLE]
This error is .
By the linearity of , and recalling the definition of from Definition 7.1, we have that expression (7.6) is equal to
[TABLE]
Observe that
[TABLE]
where
[TABLE]
is the local factor associated to the system of forms . Since has finite Cauchy-Schwarz complexity, we have the bound (by [13, Lemma 1.3]). This means that the lemma would follow from the bound
[TABLE]
for each fixed . What’s more, expression (7.7) is an immediate consequence of the Gowers-Cauchy-Schwarz inequality when combined with Theorem 7.2.
To spell out some of the details, let and let be a natural number with large enough in terms of . Then, recalling the definition of the set , the left-hand side of (7.7) is equal to
[TABLE]
Taking the term as read, this is
[TABLE]
Now, by the Gowers-Cauchy-Schwarz inequality (see [19, Expression (11.6)]), expression (7.9) is at most
[TABLE]
By expression (7.2), this is bounded above by a constant times
[TABLE]
Expression (7.10) is directly amenable to Theorem 7.2, with the only wrinkle being the fact that Theorem 7.2 only applied to functions with . But this is easy to deal with. Indeed, for natural numbers and we have the identity
[TABLE]
and so one establishes that if is in the range then
[TABLE]
where and , and where the error term is at most a constant times
[TABLE]
We have and therefore, by Theorem 7.2, expression (7.10) is . The lemma follows. ∎
8. Inequalities in lattices
This section will be devoted to proving the following technical lemma. This is the only part of the paper in which we pay especial attention to the quantitative aspects of the smooth cut-off functions, as the lemma will be applied in contexts where the functions and depend on the asymptotic parameter (albeit tamely).
Lemma 8.1 (Inequalities in lattices).
Let be natural numbers, with , and let be a positive constant. Suppose that . Let be an additional set of parameters. Let be an injective linear map with integer coefficients and let be a purely irrational surjective linear map with algebraic coefficients. Let and let . Let and suppose that for all . Let and be functions in . Then, provided that is small enough in terms of , for all positive we have
[TABLE]
where is the local factor
[TABLE]
Remark 8.2.
If and if is the identity map, then the Chinese Remainder Theorem guarantees that . In the general case, the local factors are the same objects as those factors considered in [13, Page 1831].
Proof of Lemma 8.1.
We assume throughout that is small enough in terms of , and that is large enough in terms of the dimensions , , and .
By applying Fourier inversion to , we see that the left-hand side of (8.1) is equal to
[TABLE]
To bound this integral, we split into three ranges. Let be a small positive parameter to be chosen later, which we assume to be small enough in terms of . We then define the so-called ‘trivial arc’ by
[TABLE]
the ‘minor arc’ by
[TABLE]
and the ‘major arc’ by
[TABLE]
Trivial arc: By Lemma 3.4, . Therefore, applying the trivial bound to the inner sum, we have
[TABLE]
Minor arc: Choose to satisfy the simultaneous divisor conditions for every . If there is no such then (8.1) is trivially true. Further, we may assume that satisfies . Let denote the lattice
[TABLE]
Then
[TABLE]
Using this reformulation, we apply Poisson summation (Lemma 3.6) to the inner sum of (8.3). Then the contribution to (8.3) from the minor arc is equal to
[TABLE]
where is the lattice that is dual to (see Definition 3.5).
We need the following obvious lemma.
Lemma 8.3.
There is a natural number , of size at most , such that .
Proof.
There is an -dimensional sublattice of , namely . Therefore, we may choose a lattice basis for all of whose elements satisfy . Let be the -by- matrix that has these basis vectors as its columns. Then the columns of the matrix are a lattice basis for the dual lattice . The entries in are rational numbers with numerator and denominator at most . Clearing denominators, the lemma follows. ∎
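The linear algebra in this proof can be illustrated with exact rational arithmetic. The sketch below assumes a full-rank integer lattice and the convention that the dual lattice consists of the vectors having integer inner product with every lattice vector (Definition 3.5 may use a different normalisation). It computes the inverse transpose of the basis matrix, confirms the duality relation, and clears denominators. The matrix M and all function names are hypothetical.

```python
from fractions import Fraction
from math import lcm


def inverse(matrix):
    """Exact inverse of a square rational matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def dual_basis(basis_columns):
    """Columns of the returned matrix form a basis of the dual lattice."""
    return transpose(inverse(basis_columns))


# Hypothetical lattice with basis vectors (2, 0) and (1, 3) as columns.
M = [[2, 1], [0, 3]]
D = dual_basis(M)
# Common denominator of the dual basis entries.
q = lcm(*(entry.denominator for row in D for entry in row))
print(D, q)
```

In this example the common denominator equals the determinant of M, which is the general mechanism behind the size bound: the entries of the inverse are cofactors divided by the determinant.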
Let denote some that minimises the expression . We claim that the only term in (8.5) that cannot be easily absorbed into the error term comes from .
Indeed, let be the quantity provided by Lemma 8.3, and let denote the second closest point to in the lattice . If more than one such point exists, choose arbitrarily. Then
[TABLE]
where is some positive constant which depends only on . By the triangle inequality and dyadic pigeonholing, one then has
[TABLE]
By Lemma 8.3 we also have the estimate
[TABLE]
which holds for all . Using (8.8), Lemma 3.4, and the bound , the quantity (8.7) is seen to be
[TABLE]
This implies that the contribution from these lattice points to (8.5) is at most
[TABLE]
Since and are small enough, (8) is
[TABLE]
which may be absorbed into the error term of (8.1) after adjusting the implied constant appropriately.
It remains to estimate
[TABLE]
We have the following key lemma.
Lemma 8.4.
Under the assumption that and are suitably small in terms of ,
[TABLE]
Remark 8.5.
The proof of this lemma uses the algebraicity of the coefficients of . One should note that the bound (8.14) below, which holds for matrices with algebraic coefficients, also holds for almost all matrices. It is this fact which ultimately leads to our observation in the introduction that the main theorems of this paper hold for almost all matrices (as well as for matrices with algebraic coefficients, as stated).
Proof.
Certainly, by rescaling and using Lemmas 3.4 and 8.3,
[TABLE]
The quantity
[TABLE]
encodes information about diophantine approximations to the coefficients of . For example, since is purely irrational, by definition (see Definition 5.2) we have for any . Therefore, since the function
[TABLE]
is continuous, (8.13) is always non-zero. We will need a quantitative refinement of this fact.
Fortunately, in [21] we extensively analysed expressions such as (8.13). Consider Definition 2.8 of [21] in particular, in which we defined the approximation function . (We stress that the notation is unrelated to the parameter from this section.) In this language, (8.13) is equal to
[TABLE]
Therefore, since is purely irrational and has algebraic coefficients, Lemma E.1 of [21] tells us that
[TABLE]
Since and are small enough in terms of , and since , (8) implies that
[TABLE]
as claimed. ∎
The lemma above implies that (8.11) has size
[TABLE]
which is thus our bound for the total contribution from the minor arc .
Major arc: Performing the same Poisson summation argument as in the minor arc case, the main term on the left-hand side of (8.1) is equal to
[TABLE]
For one has , and so . Therefore (8.16) is equal to
[TABLE]
Since is injective, one has . Therefore (8.17) is equal to
[TABLE]
which, after the obvious manipulations, equals
[TABLE]
Fixing suitably small and , and combining the contribution from the trivial, minor, and major arc, we deduce that
[TABLE]
By adjusting the implied constant appropriately, the error term from (8) is for all positive real . The final observation is that, considering the definition of the local factor in (8.2) and the fact that we assumed is non-zero,
[TABLE]
The lemma follows. ∎
The following estimate will also be needed.
Lemma 8.6.
Under the same hypotheses as Lemma 8.1, for all positive
[TABLE]
where is as in (8.2).
Proof.
By applying Poisson summation, the left-hand side of (8.19) is equal to
[TABLE]
where and are as in (8.5). By applying estimates (8.7) and (8.9), one shows that the main term of (8.19) comes from the term above. After the obvious manipulations, this concludes the lemma. ∎
9. The linear inequalities condition
In [13], the key notion of pseudorandomness is the so-called ‘linear forms condition’ (see Definition 6.2 of that paper). The upshot is that in order to understand the number of solutions to a particular linear equation in primes, it is enough to understand the number of solutions to certain auxiliary linear equations weighted by a sieve weight . In this paper an analogous philosophy holds. Indeed we will show that, in order to understand the number of solutions to a particular linear inequality in primes, it is enough to understand the number of solutions to certain auxiliary linear inequalities weighted by a sieve weight .
Let us proceed with the formal definition. The reader is reminded that (see Section 4).
Definition 9.1 (Linear inequalities condition).
Let be natural numbers, and let be a linear map. For each natural number , let be a function. We say that the family of functions is -pseudorandom if the following holds. For all positive constants and for all sets of parameters , for all compactly supported smooth functions and such that , and for all functions that each satisfy as and for all , for all satisfying , and for functions such that each equals either or ,
[TABLE]
as , where the term may depend on the family , on , , , and on the functions .
Remark 9.2.
Equation (9.1) might seem to be a slightly curious formulation of a pseudorandomness principle, as it does not claim that the weight behaves like the constant function but rather behaves like the local von Mangoldt function. However, referring to Remark 7.4, let us reiterate the comment that we are not performing the -trick in the same manner as [13].
The aim of this section is to introduce a sieve weight , and to prove that it is -pseudorandom for a large class of linear maps . We begin by introducing the sieve weight from [13, Appendix D].
Definition 9.3 (Smooth sieve weight).
Let be a natural number, be a positive real, and define . Let be the smooth -supported function fixed in Section 4. Define the function by the formula
[TABLE]
for non-negative integers , and then by the obvious extension to negative integers.
We now define the family of majorants themselves.
Definition 9.4 (Pseudorandom majorant).
Let be a natural number, let be a positive real, and let . Define the constant
[TABLE]
Then define the weight by
[TABLE]
Note that also depends on , but we suppress that dependence from the notation (as we fixed in Section 4).
We now state our main new result on the pseudorandomness of this sieve weight.
Theorem 9.5 (Pseudorandomness of sieve weights).
Let be natural numbers, with . Let be a surjective linear map, and suppose that and that the coefficients of are algebraic. Assume that is a positive parameter that is small enough in terms of . Then is -pseudorandom.
Temporarily dropping the convention that , we speculate that the following general result holds.
Conjecture 9.6 (Pseudorandomness conjecture).
Let be natural numbers, with . Let be a surjective linear map, and suppose that . Then there is some value of and some function , satisfying as , for which is -pseudorandom.
Unfortunately we have not been able to resolve Conjecture 9.6, but we strongly believe it to be true. If is large enough in terms of then the analytic methods of Parsell (see [18] and Appendix B) can be used to show that is -pseudorandom without any algebraicity assumptions. But these methods seem harder to apply in the range , and we have not been able to establish the appropriate mean value estimate. Resolving Conjecture 9.6 would, after a straightforward adaptation of the methods of this paper, enable one to remove the algebraicity assumption from Theorem 1.7 and Theorem 1.16.
Remark 9.7.
The proof of Theorem 9.5 is the only moment during the proof of the main theorems Theorem 1.7 and Theorem 1.16 when we use the fact that the coefficients of the original linear map are algebraic. Furthermore, we will ultimately only ever appeal to the linear inequalities condition for a certain finite collection of linear maps, which includes the original linear map itself as well as some auxiliary linear maps that are generated from applications of the Cauchy-Schwarz inequality. Since only the diophantine approximation properties of algebraic numbers are used (witness Lemma 8.4 and [21, Lemma E.1]), and since these properties are satisfied by almost all real numbers, one may show that Theorems 1.7 and 1.16 remain true for some explicit set of maps that has full Lebesgue measure.
To demonstrate our approach to proving Theorem 9.5, we first give the argument under the simplifying additional assumption that is purely irrational (see Definition 5.2).
Lemma 9.8.
Suppose that , , , and the functions all satisfy the conditions in Definition 9.1. Suppose in addition that is surjective, purely irrational, and has algebraic coefficients. Then for all positive we have
[TABLE]
where is the singular integral
[TABLE]
Proof.
We have the identity
[TABLE]
Then the expression is equal to
[TABLE]
by applying Lemma 8.1 to the inner sum, where in the statement of that lemma we take , the map to be the identity, and . The local factor is equal to in this instance.
Sum the error term in (9) over all . The bound that comes from the prime number theorem controls the resulting error term (with room to spare), and the main term of (9.3) follows from the identity
[TABLE]
∎
To finish the proof of Theorem 9.5 (in the case when is purely irrational, that is) it now suffices to show that
[TABLE]
where each is either or . By multiplying out the left-hand side of (9.8), we see that it is sufficient to prove that
[TABLE]
where each equals either or (recall ).
After our analysis in Lemma 8.1, it turns out that the estimate (9.9) will follow almost immediately from the sieve calculation performed in [13, Theorem D.3]. To describe the details, it will be useful to introduce the following notation. Let
[TABLE]
and
[TABLE]
We may assume that , as otherwise the estimate (9.9) follows from the estimate (9.3).
Each may be expressed as a divisor sum, either using Definition 9.3 or expression (9.5). Doing this, and swapping orders of summations, we have that the left-hand side of expression (9.9) is equal to
[TABLE]
where , and if we write for the least common multiple . Using the compact support of the function , when analysing the inner sum one may assume that each is at most .
We apply Lemma 8.1. Therefore, provided is small enough,
[TABLE]
as in (9). By the bounds on and , the error term from (9.11) may be summed over all and remain acceptable. We also have the identity
[TABLE]
Therefore expression (9.9) would follow from the asymptotic
[TABLE]
But this is just expression D.4 of [13], applied to the identity map . Note that the quantity in expression D.4 of [13] is zero, as if are the linear maps given by for all then there are no primes for which there exist two forms and that are linearly dependent modulo . This proves (9.9), and hence resolves Theorem 9.5 in the case when is purely irrational.
We now present the detailed proof of Theorem 9.5 in full generality.
Proof of Theorem 9.5.
Let be the rational dimension of (see Definition 5.2). Apply Lemma 5.6 to both the expression and the expression . Writing , where is the rational dimension of , and renaming as , as , as , and as , we see that it suffices to prove the following theorem.
Theorem 9.9.
Let be natural numbers, and let be a non-negative integer. Suppose that . Let be positive parameters, and let be a set of additional parameters. Let be a surjective purely irrational linear map with algebraic coefficients, and let be an injective linear map with integer coefficients. Assume that is small enough in terms of . Let be a vector with , and let be a vector with . Let and be in . Let be functions that each satisfy as and for all .
These conditions will be referred to as ‘the hypotheses of Theorem 9.9’.
Then, if has finite Cauchy-Schwarz complexity,
[TABLE]
where each equals either or .
Proof of Theorem 9.9.
Let have coordinate maps . Let
[TABLE]
be the singular series, where denotes the coordinate of . Let
[TABLE]
be the singular integral.
Lemma 9.10.
Under the hypotheses of Theorem 9.9, if has finite Cauchy-Schwarz complexity then the singular series and singular integral satisfy the bounds
[TABLE]
and
[TABLE]
The reader may find the definition of and in Section 3.
Proof.
Since has finite Cauchy-Schwarz complexity, no two of the forms are parallel. Hence by [13, Lemma 1.3] the singular series converges, and the size may be bounded by a constant depending only on .
The bound on follows directly from Lemma A.1. ∎
We continue with the following lemma, which is a more general version of Lemma 9.8.
Lemma 9.11.
Under the hypotheses of Theorem 9.9 we have, for every positive real ,
[TABLE]
If has finite Cauchy-Schwarz complexity, then
[TABLE]
Proof.
We proceed as in the proof of Lemma 9.8. Then is equal to
[TABLE]
by applying Lemma 8.1 to the inner sum, where
[TABLE]
If , one should apply Lemma 8.6 in place of Lemma 8.1.
By using the identity (9.5) again one obtains
[TABLE]
This settles the first part of the lemma.
For the second part, by the Chinese Remainder Theorem we have that (9) is equal to
[TABLE]
where denotes the product over those for which .
Since has finite Cauchy-Schwarz complexity there is no pair of forms and that are parallel. Therefore we may apply the analysis of local factors in [13, Lemma 1.3] to conclude that the first bracket in (9) is equal to , and that the second bracket is equal to . Combining these bounds with Lemma 9.10 gives the second part of the present lemma. ∎
Remark 9.12.
As we intimated earlier, in Remark 1.17, one can use Lemma 9.11 to establish an asymptotic expression for in the general case. Indeed, one applies the rational parametrisation process of Lemma 5.6 and then the asymptotic in Lemma 9.11 to obtain
[TABLE]
Now, Theorem 9.9 will be settled if we can show that the left-hand side of (9.14) enjoys the same asymptotic expression as the one present in (9.18). By multiplying out the left-hand side of (9.14), we see that it is sufficient to prove the following lemma.
Lemma 9.13.
Under the hypotheses of Theorem 9.9, if has finite Cauchy-Schwarz complexity then
[TABLE]
where each equals either or
Proof of Lemma 9.13.
The first half of the proof of this lemma comprises manipulations that are very similar to those that have appeared previously in this section. Indeed, as before, it will be useful to let
[TABLE]
and
[TABLE]
We may assume that , as otherwise the estimate (9.22) follows from Lemma 9.11.
Considering (9.5) again, and expressing each as a divisor sum, we have that the left-hand side of expression (9.22) is equal to
[TABLE]
where if we write for the least common multiple . Using the compact support of the function , when analysing the inner sum one may assume that each is at most .
We apply Lemma 8.1 (or, if , we apply Lemma 8.6). Therefore
[TABLE]
where
[TABLE]
Therefore expression (9.22) (and hence the entirety of Theorem 9.9) would follow from the asymptotic expression
[TABLE]
Note that this expression concerns linear forms with integer coefficients. We have removed the irrational information entirely.
Expression (9) follows from the sieve calculation [13, Theorem D.3], after restricting to suitable arithmetic progressions. Indeed, let
[TABLE]
Then the left-hand side of (9) is equal to
[TABLE]
The expression following the summation in is amenable to the estimate (D.4) from [13], applied with and affine linear forms
[TABLE]
In order to apply this estimate we note first that (since we have previously assumed that ). We also note again that, by the finite Cauchy-Schwarz complexity assumption, no two of the forms are rational multiples of each other.
So, applying the estimate (D.4) from [13] we have that the expression in (9) following the summation in is equal to
[TABLE]
where
[TABLE]
and
[TABLE]
where is the set of ‘exceptional’ primes, i.e. those primes for which there exist and for which the forms and are affinely related modulo .
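To make the notion of an exceptional prime concrete, the following Python sketch tests whether two integer coefficient vectors become linearly dependent modulo a given prime, by checking that every 2×2 minor vanishes modulo that prime. The function names and the sample forms are illustrative assumptions and are not taken from [13]; constant terms of the forms are ignored, which is an assumption about what "affinely related" is meant to detect.

```python
from itertools import combinations


def parallel_mod_p(u, v, p):
    """Return True if the integer vectors u and v are linearly dependent mod p.

    Two vectors span a space of dimension at most one over the field with p
    elements exactly when all of their 2x2 minors vanish modulo p.
    """
    return all(
        (u[i] * v[j] - u[j] * v[i]) % p == 0
        for i, j in combinations(range(len(u)), 2)
    )


def exceptional_primes(forms, primes):
    """List the primes modulo which some pair of forms becomes dependent."""
    return [
        p
        for p in primes
        if any(parallel_mod_p(u, v, p) for u, v in combinations(forms, 2))
    ]


# Hypothetical example: four forms in two variables, given by coefficient vectors.
sample_forms = [(1, 0), (0, 1), (1, 1), (1, -2)]
small_primes = [2, 3, 5, 7, 11]
sample_exceptional = exceptional_primes(sample_forms, small_primes)

# The coordinate forms alone are independent modulo every prime.
identity_exceptional = exceptional_primes([(1, 0), (0, 1)], small_primes)
```

In the sample, the pair (1, 0), (1, -2) has minor -2 and the pair (1, 1), (1, -2) has minor -3, so only the primes 2 and 3 are exceptional.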
Remark 9.14.
The reader may have noticed that expression (9.28) is not exactly what was proved in estimate (D.4) of [13]. Rather than having an error term depending on and , that expression has an error term depending on the linear maps which, one notes, have coefficients that depend on and that are therefore unbounded. Fortunately, the dependence of the error term on the size of the coefficients is only polynomial, and so any contribution from powers of may be absorbed into the factor.
This technical manoeuvre is also required in [13] (in the application of Theorem D.3 that follows expression (D.24)), although it is not explicitly stated by the authors.
Following on from (9.28) and assuming that is large enough in terms of , we see that any satisfies (as has finite Cauchy-Schwarz complexity). Since , the error in (9.28) is therefore . Furthermore, by [13, Lemma 1.3] we have , and so . Finally, if then
[TABLE]
Therefore expression (9), up to an error term of , is equal to
[TABLE]
where
[TABLE]
where denotes the product over all for which .
By invoking [13, Lemma 1.3] again we conclude that and also that the first part of expression (9) is equal to . Hence, as in the conclusion of the proof of Lemma 9.11, we conclude that expression (9) is equal to . This establishes expression (9), and so Lemma 9.13 is proved. ∎
Therefore Theorem 9.9 is resolved. ∎
Hence Theorem 9.5 is settled as well, i.e. we conclude that the weight is -pseudorandom. ∎
We finish this section by noting a corollary of the theorems above, which will be useful in its own right.
Corollary 9.15 (Upper bound for linear inequalities).
Let be natural numbers, with , and let be positive reals. Let be a surjective linear map with algebraic coefficients, and suppose that and that the coefficients of are algebraic. Let be the rational dimension of . Let be functions that satisfy as for all and satisfy for all and for all . If is small enough in terms of , then for all functions supported on , for all functions supported on , and for all satisfying , one has
[TABLE]
as , where each equals either or . The term may depend on , , , , and the choice of functions .
Proof.
Using Lemma 3.1, replace both and by compactly supported smooth majorants and for which
[TABLE]
and
[TABLE]
We have and . Then, by Theorem 9.5,
[TABLE]
where the error term may depend on , , , , and the functions .
In Remark 9.12, we noted that
[TABLE]
where the error term depends on the parameters mentioned above, and where and are of the form (9.15) and (9.16). The corollary then follows from the bounds in Lemma 9.10. ∎
This result is to be compared with the following statement.
Lemma 9.16 (Weak upper bound).
Let be natural numbers, with , and let be positive parameters. Let be a surjective linear map. Then, for all functions supported on , for all functions supported on , for all , and for all functions ,
[TABLE]
The bound in Lemma 9.16 is weaker than the bound in Corollary 9.15, but has the advantage of holding for all surjective maps , which is a situation that will be needed later.
Proof.
This is essentially identical to Lemma 3.2 of [21]. Indeed, one sees immediately that
[TABLE]
Since is surjective, without loss of generality we may assume that the first columns of form an invertible matrix. If the variables to are fixed, there are only possible choices for for which the inequality is satisfied. Summing over to , the lemma follows. ∎
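The counting step in the proof above can be checked numerically in a small case. The sketch below is only an illustration, under the assumption of a single inequality in three variables with the hypothetical coefficients 1, √2 and −√3: once the last two variables are fixed, the first variable is confined to an interval of bounded length, so the total count is at most a constant times the square of the side length of the box.

```python
import math


def count_solutions(coeffs, box, eps):
    """Count integer triples (x1, x2, x3) in [1, box]^3 with
    |c1*x1 + c2*x2 + c3*x3| <= eps, by fixing (x2, x3) and then
    counting the admissible values of x1 directly."""
    c1, c2, c3 = coeffs
    total = 0
    for x2 in range(1, box + 1):
        for x3 in range(1, box + 1):
            # x1 must lie within eps/|c1| of this centre.
            centre = -(c2 * x2 + c3 * x3) / c1
            lo = max(1, math.ceil(centre - eps / abs(c1)))
            hi = min(box, math.floor(centre + eps / abs(c1)))
            if hi >= lo:
                total += hi - lo + 1
    return total


# Hypothetical parameters chosen for illustration only.
coeffs = (1.0, math.sqrt(2), -math.sqrt(3))
box = 20
eps = 0.5

solution_count = count_solutions(coeffs, box, eps)

# An interval of length 2*eps/|c1| contains at most floor(2*eps/|c1|) + 1 integers.
per_fibre_bound = math.floor(2 * eps / abs(coeffs[0])) + 1
trivial_bound = per_fibre_bound * box ** 2
```

The comparison of `solution_count` with `trivial_bound` mirrors the final summation over the free variables in the proof.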
Part IV The structure of inequalities
Before embarking upon this part of the argument, we remind the reader of the following basic notion from functional analysis. A linear map between two normed spaces will be called a bounded operator if there exists a constant such that for all one has . It is a standard fact that all linear maps between two finite dimensional normed spaces are bounded.
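As a concrete instance of this standard fact, the following Python sketch computes an explicit constant for a linear map given by a matrix, with both spaces equipped with the supremum norm. The choice of norm, the function names, and the sample matrix are assumptions made for illustration; for this norm the largest absolute row sum is a valid constant.

```python
import random


def sup_norm(vector):
    """Supremum norm of a finite real vector."""
    return max(abs(t) for t in vector)


def apply_matrix(matrix, vector):
    """Apply the linear map given by a list of rows to a vector."""
    return [sum(a * t for a, t in zip(row, vector)) for row in matrix]


def operator_bound(matrix):
    """Largest absolute row sum: a constant C with |Ax| <= C|x| in sup norm."""
    return max(sum(abs(a) for a in row) for row in matrix)


# Hypothetical 2x3 matrix used only as an example.
sample_matrix = [[1.0, -2.0, 0.5], [3.0, 0.0, -1.0]]
sample_constant = operator_bound(sample_matrix)

# Spot check of the inequality on random vectors.
rng = random.Random(0)
bound_holds = all(
    sup_norm(apply_matrix(sample_matrix, x)) <= sample_constant * sup_norm(x) + 1e-12
    for x in ([rng.uniform(-5, 5) for _ in range(3)] for _ in range(1000))
)
```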
10. An alternative formulation
So far all of our theorems and lemmas have been phrased in terms of linear inequalities that are written in the form . In Section 14 the auxiliary inequalities will appear in a different form, but, as is shown in Lemma 10.1 below, these different forms are more-or-less equivalent. The statement of this lemma is unfortunately rather technical, but the proof is straightforward. The reader may wish in the first instance to consider the special case in which and is injective.
Lemma 10.1 (Alternative formulation).
Let be natural numbers, with , and let be positive parameters. Let be another set of parameters. Let be a non-negative integer, and suppose that is small enough in terms of , , and . Let be a linear map, and suppose that . Let and be smooth functions, where and . Assume that the Lipschitz constant of is at most and assume further that . Then
- (1) there exists a surjective linear map such that and . If has algebraic coefficients then can be chosen to have algebraic coefficients.
- (2) for any satisfying part (1), if has finite Cauchy-Schwarz complexity then .
- (3) for any satisfying part (1), if is small enough in terms of and then there exist smooth functions and , with , and , such that for all , , and natural numbers ,
[TABLE]
where is an error term of size at most
[TABLE]
Proof.
Part (1) of the lemma is immediate. Indeed, one has the quotient map . Choosing an isomorphism , we may define . If has algebraic coefficients then choosing such an with algebraic coefficients gives a suitable with algebraic coefficients.
For part (2), suppose that has finite Cauchy-Schwarz complexity. If were in then there would exist and a real number for which is non zero and , which would imply , which would imply that has infinite Cauchy-Schwarz complexity, contradicting the hypothesis.
It remains to prove part (3). Let be an orthonormal basis for , and extend this to an orthonormal basis for . Then define the linear map by
[TABLE]
By changing variables, we have that the left-hand side of (10.1) is equal to
[TABLE]
which equals
[TABLE]
Recall, from Section 4, that we use the notation to refer to the vector , etcetera.
We make some observations. Firstly, we observe that (10.2) is equal to zero unless . Indeed, if then for all that give a non-zero contribution to (10.2) we have
[TABLE]
if is small enough. This means that
[TABLE]
which if is large enough in terms of and means that
[TABLE]
for all . [Note that and are orthogonal.]
Secondly, we observe that
[TABLE]
for all that give a non-zero contribution to the integral (10.2). Write , where and . By orthogonality, we conclude that
[TABLE]
Since is a bounded linear map, this in turn means that
[TABLE]
Since is Lipschitz, with Lipschitz constant at most , this all means that (10.2) is equal to
[TABLE]
plus an error of size at most
[TABLE]
We proceed to analyse the terms of (10) separately. Firstly, by shifting the variables we see that the first bracket of (10) is equal to
[TABLE]
Now let be any surjective linear map that satisfies . Note that is an injective linear map, and thus (10.4) is equal to
[TABLE]
Differentiating inside the integral, one sees that this expression is equal to for some smooth compactly supported function satisfying . Moreover, is supported on , since and are orthogonal. Note that , so the expression is equal to .
We move to the second term of (10). Choose to be an isomorphism with . Then the second term of (10) is equal to for some smooth function satisfying . Note that is indeed compactly supported, since and are orthogonal vectors.
In summary, we have shown that (10.1) is equal to
[TABLE]
plus an error of size
[TABLE]
By the construction of , this error is bounded by
[TABLE]
The term is not quite of the required form, since is not compactly supported as a function of . However, it may be easily massaged into this form. Indeed, from the above discussion we know that implies that , for some constant that satisfies . Let be a -supported function (in the sense of Definition 4.1), and let be defined by . Then let be defined by
[TABLE]
Then , and if we have
[TABLE]
The lemma is proved. ∎
This reformulation allows us to deduce Corollary 10.3 below. This is a corollary of Theorem 9.5 and is the result on inequalities and sieve weights that we will actually use in Section 15. In order to state this inequality, we introduce the following convention.
Definition 10.2 (Convolution).
If has finite support, and is a measurable function, we may define the convolution by
[TABLE]
Recall from Section 4 that, for some positive parameter , the function denotes a fixed -supported function.
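For orientation, here is a minimal Python sketch of a convolution of a finitely supported function on the integers with a function on the reals. It assumes the usual convention that the value at x is the sum over integers n of f(n)χ(x − n); the displayed formula is not reproduced here, so this convention is an assumption. The tent function is a hypothetical, non-smooth stand-in for a narrowly supported bump and is not the function fixed in Section 4.

```python
def convolve(f, chi):
    """Return the function x -> sum over n of f[n] * chi(x - n).

    f is a dict mapping finitely many integers to real values, and chi is
    any real-valued function of a real variable.
    """
    def convolution(x):
        return sum(value * chi(x - n) for n, value in f.items())

    return convolution


def tent(eta):
    """A tent of total mass one supported on [-eta, eta] (illustrative only)."""
    def chi(x):
        return max(0.0, 1.0 - abs(x) / eta) / eta

    return chi


# Hypothetical finitely supported function: f(2) = 1 and f(5) = 3.
f = {2: 1.0, 5: 3.0}
smoothed = convolve(f, tent(0.5))

value_at_2 = smoothed(2.0)      # only the point mass at 2 contributes
value_at_5_25 = smoothed(5.25)  # only the point mass at 5 contributes
value_at_3_5 = smoothed(3.5)    # outside the support of every translate
```

With a narrow bump the convolution is supported in small neighbourhoods of the support of f, which is the feature exploited when passing between sums over integers and integrals over the reals.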
Corollary 10.3 (Switching functions).
Let be natural numbers, with , and let be a non-negative integer. Let be positive parameters, and let be a set of further parameters. Suppose that is small enough in terms of , , and . Let be a linear map with algebraic coefficients, and suppose that . Suppose that has finite Cauchy-Schwarz complexity. Let be a smooth function in . For , let be any functions with for all and for which as . For each let the function be equal to either or . Let be any vector satisfying .
Then, if is small enough in terms of , the expression
[TABLE]
is independent of the choices of the functions , up to an error of size as . This term may depend on , , , , , and on the functions .
Proof.
Expanding out the definition of , one observes that the left-hand side of (10.6) is equal to
[TABLE]
By applying Lemma 10.1 to the inner integral, we get a surjective linear map with algebraic coefficients, and smooth functions and supported on and respectively and with , such that (10.7) is equal to
[TABLE]
plus an error of size
[TABLE]
Furthermore, .
Now apply Theorem 9.5 to the main term (10.8). As written this theorem applies to functions and that take values in , but by the obvious rescaling we may nonetheless apply the theorem to the present functions and . This shows immediately that (10.8) is independent of the particular choices of , up to an error of size . The term has the appropriate dependencies.
For the error term (10.9), we apply the upper bound in Corollary 9.15. This shows that (10.9) is , so may be absorbed into the term above. Corollary 10.3 is proved. ∎
An upper bound in this setting will also be convenient.
Corollary 10.4.
Under the same hypotheses as Corollary 10.3,
[TABLE]
where the implied constant may depend on , , , , , and on the functions .
Proof.
Proceed as in the previous proof to get to expression (10.8). Then apply the upper bound in Corollary 9.15. ∎
11. Variation in parameters
This section will be devoted to proving Lemma 11.1 below. This technical lemma shows that the number of solutions to certain inequalities, weighted by the local von Mangoldt function, is a quantity that behaves well when the underlying parameters are perturbed. The slightly esoteric notation, in which we introduce a dimension only to consider , is designed to correspond to the moment in Section 15 in which this lemma will be applied.
Lemma 11.1.
Let be natural numbers, with , and let be positive parameters. Let and be linear maps with algebraic coefficients. Let be a set of parameters, and let be an arbitrary smooth function. Let be a function such that as and for all . Let be a vector satisfying . For , define
[TABLE]
where is the coordinate of . Then, if is sufficiently small in terms of and , there is a function , satisfying , such that
[TABLE]
Here , though it may also depend on and .
None of the methods required to prove this lemma will be particularly deep, but the technical manoeuvres will be a little intricate. In particular, we will need to apply the approximation in Lemma 10.1 multiple times within the same argument.
The proof of Lemma 11.1 will require the preliminary result below, namely Lemma 11.2. To state this lemma, we define a metric on by the formula
[TABLE]
Lipschitz constants of functions will be considered with respect to this metric.
Lemma 11.2.
Let be natural numbers, and let be positive parameters. Let be a surjective linear map with integer coefficients, and let be a Lipschitz function supported on , with Lipschitz constant at most . Let be any function for which
[TABLE]
for all and all .
For each , define to be some vector with integer coordinates for which
[TABLE]
Then, provided is small enough in terms of , the function
[TABLE]
- depends only on the value of modulo ;
- is Lipschitz when viewed as a function on , with Lipschitz constant at most
[TABLE]
Remark 11.3.
The expression is well-defined, since is a lattice.
Proof.
To prove the first part of the lemma, let and first suppose that there is a unique vector for which
[TABLE]
In this case, by the uniqueness of , we have . By translation, we know that
[TABLE]
for all , and hence
[TABLE]
for all . Hence
[TABLE]
and so the function
[TABLE]
depends only on the value of modulo . Furthermore, if ,
[TABLE]
by the invariance properties of . Hence the function (11.2) only depends on the value of modulo .
Now suppose that there were two distinct vectors , for which
[TABLE]
for . Then in fact . Indeed, if this were not the case then we would have , which is impossible if is small enough, since and are two distinct elements of . By translation, we may also conclude that for all . So again, the function (11.2) depends only on the value of modulo .
Regarding the second part of the lemma, the idea of the proof is similar to the above. Indeed, the only aspect of the function (11.2) that could lead to a large Lipschitz constant is of course the term , which could, one fears, jump sharply for small changes in . However, when such jumps occur, the function is always equal to zero.
Let us proceed with the full proof. Indeed, let , and suppose first that
[TABLE]
By choosing suitable coset representatives, without loss of generality we may assume that
[TABLE]
Then either or . If then
[TABLE]
Therefore
[TABLE]
That resolves the lemma in this case.
If on the other hand , we may conclude that both
[TABLE]
and
[TABLE]
Indeed, if , say, then
[TABLE]
If is small enough, this implies that must be the unique element of for which
[TABLE]
and hence that , contradicting the assumption.
If is small enough, expressions (11.4) and (11.5) imply that
[TABLE]
and so
[TABLE]
That resolves the lemma in this case.
The only remaining case to consider is when
[TABLE]
In this case we bound the Lipschitz constant very crudely, as , which is , since . This settles the lemma. ∎
We are now ready to prove Lemma 11.1.
Proof of Lemma 11.1.
For this proof we make the following conventions. Any implied constant may depend on , and , and we will use the notation , , etc. to denote a function in , that may change from line to line.
The first part of the proof will involve establishing an asymptotic formula for , namely the expression in (11.11) below. Indeed, expanding out the definition of (see Definition 10.2) we have
[TABLE]
where is defined by . Let , and note that .
The inner integral of (11.7) may be analysed using Lemma 10.1. The following table indicates which objects in (11.7) play which role in Lemma 10.1.
[TABLE]
So, applying Lemma 10.1, one sees that (11.7) is equal to
[TABLE]
Here, is a surjective linear map with algebraic coefficients, that depends only on , , and the error term may be bounded above by
[TABLE]
where the summation denotes summation over the set
[TABLE]
The error term is easy to bound. Indeed, by Lemma 9.16, expression (11.9) may be bounded by . Since , this is an error.
It remains to analyse the main term in (11.8), which we will do with the help of Lemma 5.6. The reader is invited to consult Section 5 for the statement of this result, and for the definitions of rational map, rational dimension, etcetera.
Now, let be the rational dimension of , and let be a rational map for with algebraic coefficients. Then, there exists an injective linear map with integer coefficients, satisfying , and a vector , such that the main term of (11.8) is equal to
[TABLE]
where is the coordinate of . Note how we’ve appealed to part (11) of Lemma 5.6 for the particular form of the argument of . Note also how, since is sufficiently small, we have been able to apply part (10) of the lemma to establish that consists of a single element .
Moreover, from part (10) of the lemma again, we have that is an element of for which
[TABLE]
From part (9) of Lemma 5.6, letting be the standard basis vectors of , we have a set
[TABLE]
which is a lattice basis for and for which is a lattice basis for . Letting , we have that .
By applying the first part of Lemma 9.11 to expression (11), one immediately derives
[TABLE]
where
[TABLE]
and is equal to
[TABLE]
Note that
[TABLE]
The remainder of the proof of Lemma 11.1 will consist of analysing expressions (11.12) and (11.13) for and .
We begin with , aiming for expression (11.18). Letting , we have that . For any vector let and be the components in and respectively. Then we have that
[TABLE]
since
[TABLE]
By the bound on the Lipschitz constant of , we may replace with in (11.13), up to an error of . Also, note that
[TABLE]
by Lemma A.1. This is , since by (11.14). Therefore we may replace (11.13) by the expression
[TABLE]
plus an error of size .
The expression (11.16) is in a form that is amenable to Lemma 10.1. The following table indicates which objects from our present discussion play which role in the notation of Lemma 10.1.
[TABLE]
This is a valid application of Lemma 10.1, since and the final two functions in the right-hand column are compactly supported smooth functions of their arguments (as is injective, , and is an algebraic complement to ). Recalling that has algebraic coefficients, by the third part of Lemma 10.1 we may therefore replace (11.16) by an expression of the form
[TABLE]
where .
The argument of the function above doesn’t depend smoothly on , but this may be easily rectified. Indeed, by (11.15) and the fact that is Lipschitz and is bounded, (11.17) is equal to
[TABLE]
i.e. is equal to
[TABLE]
where .
In summary then, since we have shown that
[TABLE]
The function
[TABLE]
is of the form considered in Lemma 11.2 in expression (11.2). Indeed, one first notes that (11.20) is a well-defined mapping, since is determined only by and depends on and only through the value of (see (11.12)). Then, one takes the map from Lemma 11.2 to be the map here, one takes from that lemma to be here, one takes the map from that lemma to be here, and one takes the map from that lemma to be
[TABLE]
here. The definition of is valid since is indeed a bijection, and by part (9) of Lemma 5.6 we have . Consulting expression (11.12) for , one sees that
[TABLE]
and so (11.20) is indeed of the form (11.2) as we have claimed. The only hypothesis of Lemma 11.2 that we haven’t already verified is the invariance of under translation by elements of , but this is immediate from the definition of , since is linear and is -periodic. Therefore, by applying Lemma 11.2, we conclude that the function (11.20) is Lipschitz on , with Lipschitz constant
The proof of Lemma 11.1 is nearly complete, since Lipschitz functions enjoy good approximation by short exponential sums. Indeed, by Lemma A.9 of [12], for all there exists a function such that and
[TABLE]
equals
[TABLE]
Then, picking to be a suitably large power of , Lemma 11.1 follows. ∎
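The principle that Lipschitz functions are well approximated by short exponential sums can be illustrated in one variable. The sketch below uses Fejér means, which are one standard construction; whether Lemma A.9 of [12] proceeds this way is an assumption, and the test function, sample sizes, and function names are all hypothetical.

```python
import cmath
import math


def fourier_coefficient(f, k, samples=2048):
    """Riemann-sum approximation to the k-th Fourier coefficient of a
    1-periodic function f."""
    return sum(
        f(j / samples) * cmath.exp(-2j * math.pi * k * j / samples)
        for j in range(samples)
    ) / samples


def fejer_approximation(f, K, samples=2048):
    """Return the K-th Fejer mean of f: an exponential sum with 2K + 1 terms."""
    coeffs = {k: fourier_coefficient(f, k, samples) for k in range(-K, K + 1)}

    def approx(x):
        return sum(
            (1 - abs(k) / (K + 1)) * c * cmath.exp(2j * math.pi * k * x)
            for k, c in coeffs.items()
        ).real

    return approx


def sup_error(f, K, grid=200):
    """Largest error of the K-th Fejer mean on an evenly spaced grid."""
    g = fejer_approximation(f, K)
    return max(abs(f(j / grid) - g(j / grid)) for j in range(grid))


def kink(x):
    """A 1-periodic function with Lipschitz constant 1 (illustrative)."""
    return abs((x % 1.0) - 0.5)


error_short = sup_error(kink, 4)
error_long = sup_error(kink, 32)
```

Increasing the length of the sum reduces the uniform error, at a rate governed by the Lipschitz constant; this is the trade-off behind choosing the length to be a suitable power of the relevant parameter.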
Part V The main argument
Having completed all the preparatory material, the main thrust of the proof can begin in earnest.
12. Controlling by Gowers norms
In this section we state a type of result that has become known as a ‘generalised von Neumann theorem’, which uses Gowers norms to bound the number of solutions to a diophantine inequality. For readers familiar with [13], the procedure is routine. We will then show that this result implies the main theorem (Theorem 1.16).
Theorem 12.1 (Generalised von Neumann Theorem).
Let be natural numbers, satisfying , and let be positive parameters. Let be a surjective linear map with algebraic coefficients, and assume that and that is small enough (depending on ). Let satisfy . Let and be functions with Lipschitz constants at most , and suppose that is supported on and is supported on . Let be arbitrary functions, satisfying for all and for all .
Then there exists an such that, if
[TABLE]
as , then
[TABLE]
as . The second term may also depend on , , , , , and the rate of decay of the first term.
Proof of Theorem 1.16 assuming Theorem 12.1.
Assume the hypotheses of Theorem 1.16. By telescoping we have that
[TABLE]
is equal to
[TABLE]
Since is supported on , we may restrict the functions and to without altering the size of expression (12).
By the construction of the sieve weight we have
[TABLE]
for all . Therefore, after rescaling, we may apply Theorem 12.1 in this setting.
Recall that, by Lemma 7.5,
[TABLE]
as , for all . So, applying Theorem 12.1 to each term of (12) separately, we derive
[TABLE]
as . By fixing a suitably small value of , we conclude Theorem 1.16. ∎
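The telescoping step used in this proof rests on an elementary identity: a difference of two products equals a sum of terms, each containing exactly one difference of corresponding factors. The following Python sketch checks this identity on hypothetical integer data; the ordering of the factors in each term is one common convention and is an assumption about the form of the displayed expression.

```python
from math import prod


def telescoping_terms(a, b):
    """Return the terms b_1...b_{j-1} * (a_j - b_j) * a_{j+1}...a_d,
    whose sum equals prod(a) - prod(b)."""
    return [
        prod(b[:j]) * (a[j] - b[j]) * prod(a[j + 1:])
        for j in range(len(a))
    ]


# Hypothetical factor values, chosen as integers so the check is exact.
a_values = [2, 3, 5, 7]
b_values = [1, 4, 6, 2]

terms = telescoping_terms(a_values, b_values)
difference = prod(a_values) - prod(b_values)
```

Each term contains a single factor of the form a_j − b_j, which is why it suffices to control expressions in which one function has small Gowers norm.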
13. Transferring from to
In this section we begin the proof of Theorem 12.1. Following the programme set out in [21], our first step will be to transfer the problem from the setting of functions on to functions on .
Definition 13.1.
Let be natural numbers. Let be a linear map, let , and let and be compactly supported measurable functions. Then, for all bounded measurable functions we define
[TABLE]
We now state the key lemma. For the definition of , where is the function we determined in Section 4, the reader may consult Definition 10.2.
Lemma 13.2 (Transfer).
Let be natural numbers, with , and let , , , , be positive constants. Let be a surjective linear map, and let be a vector satisfying . Let and be compactly supported Lipschitz functions, with Lipschitz constants at most . Suppose that is supported on , and is supported on . Then there exists some positive real number , satisfying , such that the following holds. Let be arbitrary functions that satisfy for all and for all . Assume that and that is small enough depending on . Then
[TABLE]
as . The implied constant in the term may depend on , , and , and the term may depend on all these parameters together with and .
Proof.
The proof is very similar to the proof of Lemma 5.4 in [21], although we do have to insert various estimates that are only proved in this paper.
Indeed, let denote the function . We choose
[TABLE]
Since is -supported, . Then, expanding the definition of the convolutions ,
[TABLE]
equals
[TABLE]
This is equal to
[TABLE]
Indeed, the inner integrand is only non-zero when , and has Lipschitz constant .
Continuing, expression (13.4) is equal to
[TABLE]
where
[TABLE]
and is a certain error, which may be bounded above by a constant times
[TABLE]
Let us deal with the first term of (13.5), in which we wish to replace with . We therefore consider
[TABLE]
which is
[TABLE]
Observe that . Indeed,
[TABLE]
by the definition of and the Lipschitz property of . The function is compactly supported, with .
Of course needn’t be smooth, but we may nonetheless apply Corollary 9.15, concluding that expression (13.7) is at most
[TABLE]
Turning to the error from (13.5), we’ve already remarked that it may be bounded above by expression (13.6). Applying Corollary 9.15 again, expression (13.6) is (with the appropriate dependencies on , , etc.).
The lemma then follows. ∎
We will need to show that the operation of replacing by is compatible with Gowers norms.
Firstly, if is a bounded measurable function, we define the Gowers norm over the reals by
[TABLE]
More detail about this quantity may be found in Appendix A of [21].
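For readers who prefer a computational picture, the sketch below evaluates the simplest Gowers norm, the U² norm, for a real-valued function on a finite cyclic group by brute force. This is the discrete analogue rather than the real-variable norm defined above, and the normalisation used (an average over the group) is an assumption made for illustration.

```python
def gowers_u2(f):
    """U^2 norm of a real-valued function on Z/NZ, given as a list of length N.

    The fourth power of the norm is the average over x, h1, h2 of
    f(x) f(x+h1) f(x+h2) f(x+h1+h2), with indices taken modulo N.
    """
    n = len(f)
    total = 0.0
    for x in range(n):
        for h1 in range(n):
            for h2 in range(n):
                total += (
                    f[x]
                    * f[(x + h1) % n]
                    * f[(x + h2) % n]
                    * f[(x + h1 + h2) % n]
                )
    # Guard against a tiny negative value caused by rounding.
    return max(total / n ** 3, 0.0) ** 0.25


constant_norm = gowers_u2([1.0] * 8)         # constant function
character_norm = gowers_u2([1.0, -1.0] * 4)  # the character (-1)^x on Z/8Z
point_mass_norm = gowers_u2([1.0] + [0.0] * 7)
```

Constants and characters have norm one, whereas the indicator of a single point has norm N^(-3/4), reflecting that the norm measures correlation along additive parallelograms.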
Secondly, we note that and may be related.
Lemma 13.3 (Relating different Gowers norms).
Let be a natural number, and assume that is a positive parameter that is small enough in terms of . Let be a natural number, and let be an arbitrary function. Then we have
[TABLE]
Proof.
This is Lemma 5.5 of [21]. ∎
14. Parametrising the kernel
In this section we will convert the expression into an expression that is tailored to the subsequent manipulations. We begin with a lemma that is very similar to Proposition 8.2 of [21].
Lemma 14.1 (Separating out the kernel).
Let be natural numbers, with , and let be positive constants. Let be a surjective linear map with algebraic coefficients, and assume further that . Let be a Lipschitz function supported on , with Lipschitz constant at most , and let be any function supported on . Then there exists an injective linear map with algebraic coefficients (depending only on ), and a Lipschitz function with Lipschitz constant and with , such that, if are arbitrary bounded measurable functions,
[TABLE]
where, for each , is some real number that satisfies .
Furthermore, has finite Cauchy-Schwarz complexity (see Definition 5.5).
Proof of Lemma 14.1.
For ease of notation, let
[TABLE]
Noting that is a vector space of dimension , define to be an orthonormal basis for consisting of vectors with algebraic coordinates. Then the map , defined by
[TABLE]
is an injective map that parametrises . Furthermore has finite Cauchy-Schwarz complexity, since otherwise there would exist and a real number such that , i.e. . This implies that , which, by definition, implies that , contradicting our hypotheses.
Now, extend the orthonormal basis for to an orthonormal basis for . By implementing a change of basis, we may rewrite
[TABLE]
where is the coordinate of .
We wish to remove the presence of the variables . To set this up, note that, by the choice of the vectors ,
[TABLE]
The vector is in and so, since is a bounded invertible operator, is equal to zero unless , for some domain of diameter and satisfying .
We can use this observation to bound the right-hand side of (14.3). Indeed, we have
[TABLE]
See Section 4 for an explanation of the notation. So there exists some fixed vector in such that
[TABLE]
Define the function by
[TABLE]
and for each at most a shift
[TABLE]
Then
[TABLE]
and and satisfy the conclusions of the lemma. ∎
The next lemma is essentially identical to an argument that appears at the end of Section 8 of [21]. Unfortunately that argument is not in an easily citable form, and so we have found it necessary to state and prove the precise version that we need here. For readers unfamiliar with the notion of normal form, we have included a brief summary in Section 6.
Lemma 14.2 (Parametrising by normal form).
Following on from above, there exists a , a linear map with algebraic coefficients that is in -normal form for some , and a Lipschitz function with Lipschitz constant and with such that
[TABLE]
is bounded above by a constant times
[TABLE]
Proof.
We apply Lemma 6.2 to . Therefore, there is a natural number such that, for any real numbers , (14.7) is equal to
[TABLE]
where
- are some vectors that satisfy for each at most ;
- for each at most , is linear, and is defined by
[TABLE]
- is in -normal form, for some .
We remark that the right-hand side of expression (14.9) is independent of , as it was obtained by applying the change of variables .
Now, with as fixed in Section 4, let be defined by
[TABLE]
Integrating over , we have that (14.9) is at most a constant times
[TABLE]
where the function is defined by
[TABLE]
Notice in (14.10) that we were able to move the absolute value signs outside the integral, as is positive and the integral over is independent of (so in particular has constant sign).
Letting , the lemma is proved. ∎
15. Gowers-Cauchy-Schwarz argument
This section will be devoted to proving the following theorem, which lies at the heart of the proof of our main results.
Theorem 15.1 (Gowers-Cauchy-Schwarz argument).
Let be natural numbers, and let be positive constants. Let be fixed real numbers that satisfy for all . Let be a linear map with algebraic coefficients, which is in -normal form. Let be a Lipschitz function supported on and with Lipschitz constant at most . Let be any bounded measurable functions that satisfy for all . Suppose that
[TABLE]
as . Then if and are small enough in terms of and the dimensions , , and ,
[TABLE]
as , where the error term can depend on , , , , , and the first term.
For the definition of , the reader may consult expression (13.8).
Theorem 15.1 is closely analogous to [13, Proposition ], and the first half of our proof will follow the proof of that proposition closely (and in particular will contain no new ideas). However, new technicalities will become apparent as the argument progresses. In particular it will become important to understand the structure of a function that we will come to denote by , and this will not be easy, in that we will have to appeal to the highly technical Lemma 11.1. This observation and the subsequent analysis constitute the main new elements of the proof of Theorem 15.1.
Proof.
We begin by replacing with a cut-off function that will be easier to work with during the subsequent manipulations. Indeed, let us pick a positive parameter . By Lemma 3.3 there is some parameter and some smooth functions such that
[TABLE]
and each is of the form
[TABLE]
where and the functions are smooth, supported on , and satisfy .
Therefore, we may write the left-hand side of (15.1) as the sum of expressions of the form
[TABLE]
plus an error of size at most
[TABLE]
Since is in -normal form, for some finite , it follows that has finite Cauchy-Schwarz complexity (see Definition 5.5). Therefore, by Corollary 10.4, expression (15.3) has size .
We now arrange our notation for the rest of the proof, in part to mimic the notation that is used in the proof of [13, Proposition ]. This will hopefully increase the readability for those who are familiar with [13]. Indeed, without loss of generality we may assume that
[TABLE]
Since is in -normal form there is a set of standard basis vectors with and for which vanishes for and is nonzero for . By the nested property of Gowers norms we may assume that , and by reordering the variables we can assume without loss of generality that vanishes for and is nonzero for . It will be useful to rename the first variables and the remainder as . If then the variable is trivial. Note that the coefficients are non-zero for all , so, by rescaling the variables , we may assume that
[TABLE]
For , let denote (this is the notation used in [13]; in this paper it will never risk being confused with the meaning of in asymptotic notation) the set
[TABLE]
Note that and for .
Now, for any set and vector , we define the vector to be the restriction of to the coordinates in . Then, for any set and vector , we define
[TABLE]
where we have abused notation slightly in viewing only as a function of those variables on which it depends.
We also use (for some implied dimension parameter ) to denote a smooth function in . The exact function may change from line to line.
With this notation, by picking to be a suitably slowly decaying function of we see that Theorem 15.1 would follow from the upper bound
[TABLE]
Our entire task is now to establish (15). From this point onwards, we will allow any error term or implied constant to depend on , , , , and , without indicating this explicitly.
We proceed by considering the following version of [13, Corollary B.4].
Proposition 15.2 (The weighted generalised von Neumann theorem).
Let be a finite set, and let be a finite collection of compactly supported Borel probability measures on . For every , let denote the product measure on , and let and be integrable functions such that for all . Then
[TABLE]
where for any and we define to be the unique nonnegative real number satisfying
[TABLE]
Here, as before, we use to denote the restriction of to .
Proof.
The proof is identical to the proof of [13, Corollary B.4], replacing all summations with integrals, and is a consequence of the Gowers-Cauchy-Schwarz inequality. ∎
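For readers who want a concrete feel for the norms behind the Gowers-Cauchy-Schwarz inequality, the following is a minimal sketch in the simplest discrete setting, namely the U^2 norm of a function on the cyclic group Z/nZ. This is an assumption made for illustration only: the paper works with a real-variable analogue (see expression (13.8)), and the function names `u2_norm_direct` and `u2_norm_fourier` are hypothetical. The sketch checks the standard identity expressing the fourth power of the U^2 norm as the sum of fourth powers of Fourier coefficients, and the simplest instance of the nested property, namely that the mean of f is bounded by its U^2 norm.

```python
import cmath


def u2_norm_direct(f):
    """U^2 norm of f on Z/nZ, computed from the averaging definition.

    ||f||_{U^2}^4 = E_{x,h1,h2} f(x) conj(f(x+h1)) conj(f(x+h2)) f(x+h1+h2).
    """
    n = len(f)
    total = 0j
    for x in range(n):
        for h1 in range(n):
            for h2 in range(n):
                total += (
                    f[x]
                    * f[(x + h1) % n].conjugate()
                    * f[(x + h2) % n].conjugate()
                    * f[(x + h1 + h2) % n]
                )
    # The average is real and non-negative; clamp tiny negative rounding errors.
    return max(total.real / n**3, 0.0) ** 0.25


def u2_norm_fourier(f):
    """Same norm via the identity ||f||_{U^2}^4 = sum_r |fhat(r)|^4."""
    n = len(f)
    fourth_powers = 0.0
    for r in range(n):
        coeff = sum(
            f[x] * cmath.exp(-2j * cmath.pi * r * x / n) for x in range(n)
        ) / n
        fourth_powers += abs(coeff) ** 4
    return fourth_powers ** 0.25


if __name__ == "__main__":
    f = [complex(v) for v in (1, 0, 1, 1, 0, 1, 0, 0)]
    print(u2_norm_direct(f), u2_norm_fourier(f))
```

Both routines should agree up to rounding error; the direct version costs n^3 operations and is only meant for very small n.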
We now apply this proposition to the left-hand side of (15) above. Observe that we have the pointwise bounds , where
[TABLE]
Therefore, applying Proposition 15.2 by taking to be the set , each to be proportional to , and to be the function , we establish that the left-hand side of (15) is
[TABLE]
Observe that
[TABLE]
and so all the functions other than have been eliminated. Experienced readers will note that, so far, we have been following [13, Appendix C] almost verbatim.
After applying Hölder’s inequality to (15), we see that to establish (15) it suffices to prove
[TABLE]
and, for all ,
[TABLE]
These two expressions correspond respectively to expressions (C.10) and (C.11) of [13].
Establishing (15.9) is straightforward. Indeed, we expand the left-hand side, yielding (up to a multiplicative constant factor) the expression
[TABLE]
As noted in [13, p. 1824], the system of forms given by
[TABLE]
for each , and such that , has finite Cauchy-Schwarz complexity (since does). We may therefore apply the upper bound in Corollary 10.4 to expression (15), and this immediately yields (15.9).
It remains to prove (15.8), which will be a far more substantial undertaking. We introduce some space-saving notation, namely for any subset we define the indexing set
[TABLE]
If a product is taken over triples , we interpret , and as coming from the triple . For notational expedience we will also identify the space with the space .
With this notation, the left-hand side of (15.8) expands as
[TABLE]
We make the substitution and . Given , , and one can recover , and , so the change of variables is invertible. Therefore we may bound (15) above by a constant (the Jacobian of the change of variables) times
[TABLE]
where is equal to
[TABLE]
for some linear functions .
To be precise, if then the expression is equal to
[TABLE]
where
[TABLE]
This expression is analogous to expression (C.14) of [13]. We let denote the vector . Most fortunately, the exact structure of the linear maps , save for the fact that they form a system with finite Cauchy-Schwarz complexity, will be unimportant.
Following the philosophy of [11] and [13], our next manoeuvre will be to replace with a simpler function. To that end, let be a function for which as and for all . Recall from Section 4 that .
Lemma 15.3 (Comparing and ).
Define to be equal to
[TABLE]
where here denotes the same function as is present in (15.13). Then expression (15.12) is equal to
[TABLE]
where the may depend on the function .
Proof.
Considering the upper bound , it suffices to show that
[TABLE]
is . By Cauchy-Schwarz, it then suffices to show that both
[TABLE]
and
[TABLE]
The bound (15.16) is immediate from Corollary 10.4. To prove (15.17), expanding out the square we must consider three expressions. One of them is
[TABLE]
When multiplied out, (15.18) is equal to the large expression
[TABLE]
By applying Corollary 10.3 to the above expression, we may replace the functions with , up to an error.
It is worth noting why the application of Corollary 10.3 is valid. Indeed, the underlying set of linear forms is given by (for each )
[TABLE]
We need this linear map to have algebraic coefficients and to have finite Cauchy-Schwarz complexity. Algebraicity follows by the assumptions in the statement of Theorem 15.1. Establishing finite Cauchy-Schwarz complexity is rather involved, but fortunately this has already been done by Green and Tao, on pages 1826 and 1827 of [13], in the analysis of expression (C.14).
Replacing (15.18) with one of the other two terms that arises from expanding out the square in (15.17), and performing the same estimation, the lemma follows. ∎
Let us take stock. As a reminder, we are trying to establish that (15.8) holds. Lemma 15.3 above reduces matters to choosing some function that tends to infinity for which the bound
[TABLE]
holds. If were identically equal to , then expression (15.20) would be of the order of , and hence be by the hypotheses of Theorem 15.1. Of course is not identically equal to , but we do observe that is a function of the form considered in Lemma 11.1. Indeed, consulting the definition of in (15.14), the following table shows which objects in Lemma 11.1 correspond to which objects concerning the definition of .
[TABLE]
From Lemma 11.1, we therefore know that there exists some function satisfying for which
[TABLE]
Therefore, one gets an upper bound for the left-hand side of (15.20), namely
[TABLE]
plus an error of size
[TABLE]
By Corollary 10.4, the size of term (15.22) is . To analyse (15) we apply Lemma B.4 of [21]. Since the function is Lipschitz this means that for all there exists a complex valued function such that and for all one has
[TABLE]
Choosing to be a suitably large power of , (15) may be bounded above by
[TABLE]
plus an error of size
[TABLE]
Using Corollary 10.4 as above, expression (15.24) is .
The term (15) may be analysed using standard methods. Indeed, by shifting the variable (and noting that is a linear function of and ) we may assume that . Then, by spreading the exponential functions across the different instances of , we see it suffices to show that
[TABLE]
where each function is of the form
[TABLE]
for some .
The argument is nearly complete. Considering expression (13.8), for each we observe that
[TABLE]
So, by the Gowers-Cauchy-Schwarz inequality (recorded in this setting as Proposition A.4 of [21]), the left-hand side of expression (15.25) is
[TABLE]
If grows slowly enough, this expression is .
We have therefore established the upper bound (15.20), and so, by our long sequence of deductions, Theorem 15.1 is finally proved. ∎
16. Combining the lemmas
With all the previous lemmas in hand, we may finally prove Theorem 12.1 (and hence prove Theorem 1.16).
Proof of Theorem 12.1.
Assume the hypotheses of the theorem, fixing a suitably small value of .
By applying Proposition 14.1 and Lemma 14.2, we conclude that there is some and some for which is
[TABLE]
where is in -normal form, has Lipschitz constant and , and each satisfies . Taking this value of in the hypotheses of Theorem 12.1, without loss of generality we may assume that
[TABLE]
as .
Then we may apply Theorem 15.1 to expression (16.1). Indeed, by rescaling the variable we may assume that is supported on . For each we set
[TABLE]
Provided is small enough, by combining (16.2) and Lemma 13.3 we deduce that
[TABLE]
as . So Theorem 15.1 may indeed be applied, which yields
[TABLE]
as .
But then, combining the estimate (16.3) with Lemma 13.2, one derives the bound
[TABLE]
Choosing to be a function tending to zero suitably slowly with , we conclude that
[TABLE]
This is the conclusion of Theorem 12.1, and we are done. ∎
From the work in Section 12, this means that Theorem 1.16, the main result of this paper, is finally settled. ∎
Part VI Final deductions
17. Removing Lipschitz cut-offs
In this section we assume Theorem 1.16, and deduce Theorem 1.7. This deduction will be a routine matter of removing Lipschitz cut-offs.
Lemma 17.1.
Assume the hypotheses of Theorem 1.7. Let be a real number in the range and let be an interval of length . Then
[TABLE]
The reader will note that this lemma is a slight refinement of Corollary 9.15.
Proof.
Fix some . Let be a smooth function in , supported on , that majorises the indicator function of the set . Let be some smooth function in , supported on , that majorises . Let be small enough in terms of . Then, by Theorem 9.5 and Lemma 9.11,
[TABLE]
Since , for all of the coordinate subspaces of dimension the map is surjective. We may therefore apply Lemma A.3, and conclude that expression (17) is . The lemma is proved, after having fixed a suitable . ∎
Lemma 17.2.
Under the hypotheses of Theorem 1.7,
[TABLE]
Proof.
Let be a positive parameter in the range , to be chosen later. Let us first consider
[TABLE]
Let be two Lipschitz functions satisfying
[TABLE]
with Lipschitz constants depending only on . Let be two Lipschitz functions satisfying
[TABLE]
with Lipschitz constants depending only on . (The existence of such functions is immediate by interpolating linearly, or by appealing to the results of Section 3.) Then we have
[TABLE]
By Theorem 1.16, the lower bound in (17) is equal to
[TABLE]
since we may replace with as is supported on . By Lemma 9.11, and the properties of the support of and , this is at least
[TABLE]
Note that the singular series is equal to in this instance, since is purely irrational. By Lemma A.4, expression (17.3) is at least
[TABLE]
By performing an analogous manipulation with the upper bound, we may conclude that
[TABLE]
is equal to
[TABLE]
Therefore, by Lemma 17.1, we have that
[TABLE]
is equal to
[TABLE]
Letting be a function of , tending to zero suitably slowly as tends to infinity, the lemma follows. ∎
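The Lipschitz functions used in the proof above can be obtained "by interpolating linearly". The following is a minimal sketch of one such choice in one dimension, under the assumption that the functions are meant to sandwich the indicator of an interval [a, b] and to differ from it only within distance delta of the endpoints. The names `lower_cutoff`, `upper_cutoff`, `a`, `b`, and `delta` are hypothetical, and the exact conditions imposed in the paper may differ.

```python
def indicator(x, a, b):
    """Indicator function of the closed interval [a, b]."""
    return 1.0 if a <= x <= b else 0.0


def lower_cutoff(x, a, b, delta):
    """Trapezoid minorant: 1 on [a + delta, b - delta], 0 outside [a, b].

    Linear in between, so the Lipschitz constant is 1 / delta.
    """
    return max(0.0, min(1.0, (x - a) / delta, (b - x) / delta))


def upper_cutoff(x, a, b, delta):
    """Trapezoid majorant: 1 on [a, b], 0 outside [a - delta, b + delta].

    Linear in between, so the Lipschitz constant is 1 / delta.
    """
    return max(0.0, min(1.0, (x - a + delta) / delta, (b + delta - x) / delta))


if __name__ == "__main__":
    a, b, delta = 0.0, 1.0, 0.1
    for x in (-0.05, 0.0, 0.05, 0.5, 0.95, 1.0, 1.05):
        print(x, lower_cutoff(x, a, b, delta), indicator(x, a, b),
              upper_cutoff(x, a, b, delta))
```

Both cut-offs have Lipschitz constant 1/delta, which depends only on delta, and the region where they differ from the indicator has measure at most 2*delta each.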
To establish Theorem 1.7 as given, i.e. to establish Lemma 17.2 without the log weighting, is standard. To spell it out, Lemma 17.2 implies that, for any in the range ,
[TABLE]
But also, from expression (17.4)
[TABLE]
By Lemma A.1,
[TABLE]
Hence, choosing to be a function of tending to zero suitably slowly, combining bounds (17) and (17) establishes Theorem 1.7. ∎
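The idea behind removing a log weighting can be illustrated numerically. The sketch below is only the standard heuristic, stated under assumptions, and is not the paper's precise argument: for primes p in the range delta*n <= p <= n the weight log p lies within log(1/delta) of log n, while the primes below delta*n form a small proportion of all primes up to n. The function name `log_weight_spread` and the parameters `n` and `delta` are hypothetical.

```python
import math


def primes_up_to(n):
    """Return all primes p <= n by trial division (adequate for modest n)."""
    return [
        p for p in range(2, n + 1)
        if all(p % d for d in range(2, math.isqrt(p) + 1))
    ]


def log_weight_spread(n, delta):
    """Compare log p with log n on the range delta * n <= p <= n.

    Returns (min_ratio, small_fraction), where min_ratio is the minimum of
    log p / log n over primes p in [delta * n, n], and small_fraction is the
    proportion of primes up to n that lie below delta * n.
    """
    ps = primes_up_to(n)
    large = [p for p in ps if p >= delta * n]
    min_ratio = min(math.log(p) for p in large) / math.log(n)
    small_fraction = (len(ps) - len(large)) / len(ps)
    return min_ratio, small_fraction


if __name__ == "__main__":
    print(log_weight_spread(10**5, 0.01))
```

In this sketch the minimum ratio is at least 1 - log(1/delta)/log n, which tends to 1 as n grows provided delta tends to zero slowly enough.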
Part VII Appendices
Appendix A Estimating integrals
In this appendix we include the lemmas that help us estimate the ‘global factor’ from Theorem 1.7, namely
[TABLE]
Lemma A.1 (Upper bound).
Let be a natural number, let be a non-negative integer, and let be positive constants. Let be a surjective linear map. Let and be any compactly supported measurable functions, and assume that is supported on a box of the form and is supported on a box of the form . Then
[TABLE]
Proof.
Split as a direct sum . Observe that is an injective linear map, so has bounded inverse. Hence the integrand in (A.1) is zero unless is contained within a region which has volume . The integrand is also zero unless is contained within a region which has volume . Together, these observations combine to give the required bound. ∎
Lemma A.2.
Let , , be natural numbers, with , and let be a positive parameter. Let be a surjective purely irrational linear map. Let be any vector. Then there exists a parameter satisfying such that
[TABLE]
Furthermore, if then there exists a constant , independent of and , for which
[TABLE]
Proof.
Since has rank , without loss of generality we may assume that the first columns of form an invertible submatrix . So
[TABLE]
The first columns of form the identity matrix, and so this expression is equal to
[TABLE]
where is the column of the matrix .
If the vector
[TABLE]
lies in , then unless it lies close to the boundary of we have
[TABLE]
More precisely, letting be a constant that is sufficiently large in terms of , we have that expression (A.3) is equal to
[TABLE]
plus an error term of size at most
[TABLE]
where indicates integration over those for which
[TABLE]
We remind the reader that refers to the topological boundary.
Define
[TABLE]
and
[TABLE]
Then certainly . To prove the first part of the lemma it then suffices to control the error term (A.5). Let denote the map
[TABLE]
For all , let denote projection onto the coordinate. Then the size of the error term (A.5) is
[TABLE]
where the supremum is over intervals .
Since is purely irrational (and so is also purely irrational), for all the linear map is non-zero. From this we conclude that (A.5) has size at most
[TABLE]
from which the first part of the lemma follows.
For the second part, assume that . Then note that
[TABLE]
where indicates integration over those for which
[TABLE]
One can estimate (A.7) by exactly the same procedure as was used to estimate (A.5), and thereby conclude that (A.7) is . This settles the second part of the lemma. ∎
The next lemma concerns the global factor when one of the variables is restricted to a short interval.
Lemma A.3.
Let be natural numbers, with , and let be positive parameters. Let be a surjective linear map, and assume that for all of the coordinate subspaces of dimension , the map is also surjective. (A coordinate subspace of is a subspace generated by a subset of the standard basis vectors.) Let be any vector, and let be an interval of length . Fix a coordinate . Then
[TABLE]
Proof.
By the assumptions of the surjectivity of the restrictions of , without loss of generality we may assume that and that the first columns of form an invertible matrix . Then, by integrating over , one has
[TABLE]
since is invertible. ∎
The final lemma of this section details what occurs when one permutes the parameters in the global factor.
Lemma A.4.
Let , , be natural numbers, with , and let be positive parameters. Assume further that . Let be a surjective purely irrational linear map, and assume that for all of the coordinate subspaces of dimension , the map is also surjective. Let be any vector. Then
[TABLE]
The proof of this lemma is very similar to the proof of Lemma A.2. We merely sketch the relevant changes.
Proof.
By Lemma A.3 one may replace the left-hand side of (A.4) by
[TABLE]
Let be a suitably large constant (depending on ). Following the procedure in the proof of Lemma A.2, and with the same notation for and , one establishes that the left-hand side of (A.4) is equal to
[TABLE]
plus an error of size at most
[TABLE]
where indicates integration over those for which
[TABLE]
The error (A.10) may be bounded above by , by the same method as we used to bound (A.5).
The main term (A.9), from the work in Lemma A.2, is equal to
[TABLE]
Bounding this integral using Lemma A.1, the present lemma follows. ∎
Remark A.5.
In the proofs above, we used, in a critical way, the fact that the convex domains and are axis-parallel boxes.
Appendix B An analytic argument
We take the opportunity to record a rather more direct argument which yields an asymptotic formula for expressions of the form
[TABLE]
in the case when is a linear map with at least (and certain irrationality conditions hold). This method is a simple elaboration on Parsell’s ideas [18], and can handle more general coefficients than Theorem 1.7 (although it requires more variables, of course). We suspect that this result has been obvious to the experts for fifteen years or more, but we feel that it should appear explicitly in the literature.
Theorem B.1.
Let be natural numbers, with , and let be a positive parameter. Let be a surjective linear map. Assume further that, when written as a matrix with respect to the standard bases, all the -by- sub-matrices of have non-zero determinant. Assume also that there does not exist a vector such that (i.e. in the language of Definition 5.2 assume that is purely irrational). Then, for all vectors ,
[TABLE]
Remark B.2.
The asymptotic size of the main term may easily be established using Lemma A.2.
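To make the counting problem concrete, here is a minimal brute-force sketch for the special case of a single inequality |c_1 p_1 + ... + c_d p_d| <= eps in primes p_i <= n. This special case, the function name `count_prime_solutions`, and the parameters `coeffs` and `eps` are assumptions made for illustration; the sketch merely enumerates prime tuples and has nothing to do with the analytic method of the proof.

```python
import math
from itertools import product


def primes_up_to(n):
    """Return all primes p <= n by trial division (adequate for small n)."""
    return [
        p for p in range(2, n + 1)
        if all(p % d for d in range(2, math.isqrt(p) + 1))
    ]


def count_prime_solutions(n, coeffs, eps):
    """Count tuples of primes (p_1, ..., p_d), each at most n, such that
    |coeffs[0] * p_1 + ... + coeffs[d-1] * p_d| <= eps.
    """
    ps = primes_up_to(n)
    count = 0
    for tup in product(ps, repeat=len(coeffs)):
        value = sum(c * p for c, p in zip(coeffs, tup))
        if abs(value) <= eps:
            count += 1
    return count


if __name__ == "__main__":
    # Irrational coefficients, in the spirit of a purely irrational map.
    coeffs = (1.0, math.sqrt(2), -math.sqrt(3))
    print(count_prime_solutions(60, coeffs, 0.5))
```

The enumeration costs pi(n)^d operations, so it is only useful as a sanity check for very small n and d.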
Sketch proof.
We sketch the argument, referring heavily to estimates from [7] and [18]. Define
[TABLE]
Let be a function that tends to infinity as tends to infinity, to be defined later, and let be a function (depending on the function ) that tends to zero suitably slowly as tends to infinity.
By Lemma 3.1, there exist smooth functions for which and
[TABLE]
By Fourier inversion we see that
[TABLE]
where . We estimate the integrals by splitting the range of integration in three regions.
[TABLE]
for some large constant . See Section 8 for another instance of this technique.
Much of the estimation relies on the following tight mean value bound.
Lemma B.3.
Let be a domain. Let and let be a positive real number satisfying
[TABLE]
Then
[TABLE]
Proof.
We write the left-hand side of (B.3) as
[TABLE]
which is
[TABLE]
by Hölder’s inequality. Note that .
Now recall the bound from [18, Lemma 3], namely
[TABLE]
Note that for each fixed the -by- submatrix of given by is invertible. Therefore, by applying an invertible change of variables and splitting into boxes, (B.4) is
[TABLE]
This implies the lemma. ∎
Trivial arc: The estimation on the trivial arc proceeds very similarly to page 8 of [18]. Indeed, by the bound in Lemma 3.4 we have
[TABLE]
which is
[TABLE]
by Lemma B.3, provided decays slowly enough.
Minor arc: The following is the natural higher-dimensional version of the argument in [18].
Lemma B.4.
For any positive and ,
[TABLE]
Proof.
Note that by the prime number theorem one has the trivial bound . Assuming for contradiction that the lemma is false, there exists some , some , and some positive such that, for infinitely many , there exists a vector satisfying and
[TABLE]
Then by [18, Lemma 1] it follows that for each such , and for all , there exist integers and such that and
[TABLE]
where . We observe that, if is large enough, we have the bound . Since and are fixed we may (by taking a subsequence of ) assume that both and are independent of . We call these integers and respectively.
Now, suppose that for all . Then
[TABLE]
Since the map is injective, there exists an inverse linear map
[TABLE]
which must necessarily be bounded. Hence , which is a contradiction for large enough , since . Therefore there exists some for which .
Finally, the sequence is contained in a compact domain, and so it must have a convergent subsequence with limit , say. Taking this limit, we observe that
[TABLE]
and so
[TABLE]
Hence there exists a vector such that , contradicting the assumptions of Theorem B.1. This proves the lemma. ∎
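The kind of cancellation that Lemma B.4 captures can be observed numerically. The sketch below assumes that the relevant exponential sum has the standard log-weighted form S(alpha) = sum over primes p <= n of (log p) e(alpha p), where e(t) = exp(2 pi i t); this form, and the function name `prime_exponential_sum`, are assumptions made for illustration rather than the paper's exact definition.

```python
import cmath
import math


def primes_up_to(n):
    """Return all primes p <= n by trial division (adequate for small n)."""
    return [
        p for p in range(2, n + 1)
        if all(p % d for d in range(2, math.isqrt(p) + 1))
    ]


def prime_exponential_sum(n, alpha):
    """Log-weighted exponential sum over primes p <= n at frequency alpha."""
    return sum(
        math.log(p) * cmath.exp(2j * math.pi * alpha * p)
        for p in primes_up_to(n)
    )


if __name__ == "__main__":
    n = 2000
    trivial = prime_exponential_sum(n, 0.0)        # no cancellation at all
    irrational = prime_exponential_sum(n, math.sqrt(2))
    print(abs(trivial), abs(irrational))
```

At alpha = 0 the sum equals the sum of log p over primes up to n, which is the trivial bound; at an irrational frequency the phases no longer align and the modulus is strictly smaller.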
In the usual fashion, one may use Lemma B.4 to deduce that there is some slowly growing function such that as such that
[TABLE]
the details being given in Section 3 of [18]. Defining the minor arc using this function , we have exactly
[TABLE]
Therefore, picking some positive parameter that is small enough such that taking satisfies the hypotheses of Lemma B.3,
[TABLE]
is
[TABLE]
if grows slowly enough.
Major arc: The analysis of the contribution from the major arc is routine, given the lemmas we established in Appendix A. Let be a small positive constant whose exact value may change between each line. By the estimate (7) from [18] one has, for ,
[TABLE]
where
[TABLE]
Since the measure of is , we have
[TABLE]
Since
[TABLE]
we may extend the above integral to all of at the cost of an error of . In other words, the contribution from the major arcs is
[TABLE]
which is
[TABLE]
Fixing a large value of , since this expression is equal to
[TABLE]
by Lemma A.4. This completes the proof of the theorem. ∎
- [1] A. Baker. On some diophantine inequalities involving primes. J. Reine Angew. Math., 228:166–181, 1967.
- [2] Antal Balog. Linear equations in primes. Mathematika, 39(2):367–378, 1992.
- [3] P.-Y. Bienvenu. A higher-dimensional Siegel-Walfisz theorem. Acta Arith., 179(1):79–100, 2017.
- [4] E. Bombieri, J. B. Friedlander, and H. Iwaniec. Primes in arithmetic progressions to large moduli. II. Math. Ann., 277(3):361–393, 1987.
- [5] H. Davenport and H. Heilbronn. On indefinite quadratic forms in five variables. J. London Math. Soc., 21:185–193, 1946.
- [6] D. E. Freeman. Asymptotic lower bounds for Diophantine inequalities. Mathematika, 47(1-2):127–159, 2000.
- [7] D. E. Freeman. Asymptotic lower bounds and formulas for Diophantine inequalities. In Number theory for the millennium, II (Urbana, IL, 2000), pages 57–74. A K Peters, Natick, MA, 2002.
- [8] L. Grafakos. Classical Fourier analysis, volume 249 of Graduate Texts in Mathematics. Springer, New York, second edition, 2008.
