Linear inequalities in primes

Aled Walker

arXiv:1901.04855·math.NT·October 22, 2019

TL;DR

This paper establishes an asymptotic count for solutions to systems of linear inequalities in primes with fewer variables than previous methods, extending the Green-Tao-Ziegler theorem.

Contribution

It improves the variable requirement from 2m+1 to m+2 for m inequalities and generalizes existing results on linear equations in primes.

Findings

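01. Proves an asymptotic formula for the number of solutions in primes to systems of linear inequalities with algebraic coefficients.

02. Reduces the number of variables needed for $m$ inequalities from $2m+1$ to $m+2$.

03. Formulates a conjecture on the pseudorandomness of sieve weights which, if resolved, would remove the algebraicity assumption.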

Abstract

In this paper we prove an asymptotic formula for the number of solutions in prime numbers to systems of simultaneous linear inequalities with algebraic coefficients. For $m$ simultaneous inequalities we require at least $m + 2$ variables, improving upon existing methods, which generically require at least $2 m + 1$ variables. Our result also generalises the theorem of Green-Tao-Ziegler on linear equations in primes. Many of the methods presented apply for arbitrary coefficients, not just for algebraic coefficients, and we formulate a conjecture concerning the pseudorandomness of sieve weights which, if resolved, would remove the algebraicity assumption entirely.

Equations

|\{\mathbf{p}\in K:L\mathbf{p}=\mathbf{b}\}|=\Big(\alpha_{\infty}\prod_{p}\alpha_{p}\Big)(\log N)^{-d}+o_{C,d,m}(N^{d-m}(\log N)^{-d}),

\alpha_{p}:=\lim_{M\rightarrow\infty}\frac{1}{(2M)^{d}}\sum_{\substack{\mathbf{n}\in[-M,M]^{d}\\ L\mathbf{n}=\mathbf{b}\\ (n_{i},p)=1\,\text{for all }i}}\Big(1+\frac{1}{p-1}\Big)^{d}

\alpha_{\infty}:=|\{\mathbf{n}\in\mathbb{Z}^{d}:\mathbf{n}\in K,\ L\mathbf{n}=\mathbf{b},\ n_{i}\geqslant 0\text{ for all }i\}|.

L=\begin{pmatrix}1&-2&1&0\\0&1&-2&1\end{pmatrix},\qquad\mathbf{b}=\mathbf{0}

|\lambda_{1}p_{1}+\lambda_{2}p_{2}+\lambda_{3}p_{3}|\leqslant\varepsilon.

(\alpha\lambda_{1},\alpha\lambda_{2},\alpha\lambda_{3})=(q_{1},q_{2},q_{3})\in\mathbb{Z}^{3}.

|q_{1}p_{1}+q_{2}p_{2}+q_{3}p_{3}|\leqslant\varepsilon\alpha,

q_{1}p_{1}+q_{2}p_{2}+q_{3}p_{3}=0.

|\lambda_{1}p_{1}+\lambda_{2}p_{2}+\lambda_{3}p_{3}|\leqslant\varepsilon

C_{\lambda_{1},\lambda_{2},\lambda_{3}}\,\varepsilon N^{2}\log^{-3}N+o_{\lambda_{1},\lambda_{2},\lambda_{3}}(N^{2}\log^{-3}N),

\begin{pmatrix}1&-2&1&0\\2&-4&0&\sqrt{3}\end{pmatrix}

\sum_{\mathbf{p}\in[N]^{d}}1_{[-\varepsilon,\varepsilon]^{m}}(L\mathbf{p}+\mathbf{v})=\frac{1}{\log^{d}N}\int_{\mathbf{x}\in[0,N]^{d}}1_{[-\varepsilon,\varepsilon]^{m}}(L\mathbf{x}+\mathbf{v})\,d\mathbf{x}+o_{C,L,\varepsilon}(N^{d-m}(\log N)^{-d})

C_{L}\varepsilon^{m}N^{d-m}(\log N)^{-d}+o_{L,\varepsilon}(N^{d-m}(\log N)^{-d}),

|p_{1}+p_{3}\sqrt{2}-p_{4}\sqrt{3}|\leqslant\varepsilon,\qquad|p_{2}+p_{3}\sqrt{5}-p_{4}\sqrt{7}|\leqslant\varepsilon

L=\begin{pmatrix}1&0&\sqrt{2}&-\sqrt{3}\\0&1&\sqrt{5}&-\sqrt{7}\end{pmatrix},

\frac{C_{L}}{4}=\int_{\substack{0\leqslant x_{1},x_{2}\leqslant 1\\ x_{1}\mathbf{v}^{(1)}+x_{2}\mathbf{v}^{(2)}\in[0,1]^{2}}}1\,dx_{1}\,dx_{2},

\mathbf{v}^{(1)}=\begin{pmatrix}-\sqrt{2}\\-\sqrt{5}\end{pmatrix},\qquad\mathbf{v}^{(2)}=\begin{pmatrix}\sqrt{3}\\\sqrt{7}\end{pmatrix}.

\sum_{p_{1},p_{2}\leqslant N}\prod_{j=1}^{d}1_{\mathcal{P}\cap[N]}(\lfloor p_{1}+p_{2}\theta_{j}\rfloor)=C_{\boldsymbol{\theta}}\frac{N^{2}}{\log^{d}N}+o_{\boldsymbol{\theta}}\Big(\frac{N^{2}}{\log^{d}N}\Big),

\sum_{p_{1},p_{2}\leqslant N}\ \sum_{p_{3},\dots,p_{d+2}\leqslant N}\ \prod_{j=3}^{d+2}1_{[0,1)}(p_{1}+p_{2}\theta_{j-2}-p_{j}).

\sum_{p_{1},p_{2}\leqslant N}\ \sum_{p_{3},\dots,p_{d+2}\leqslant N}\ \prod_{j=3}^{d+2}1_{[0,1]}(p_{1}+p_{2}\theta_{j-2}-p_{j}),

\sum_{p_{1},p_{2}\leqslant N}\ \sum_{p_{3},\dots,p_{d+2}\leqslant N}1_{[0,1]^{d}}(p_{1}\mathbf{1}+p_{2}\boldsymbol{\theta}-\mathbf{p_{3}^{d+2}}),

L=\begin{pmatrix}\mathbf{1}&\boldsymbol{\theta}&-I\end{pmatrix}.

\sum_{\mathbf{p}\in[N]^{d+2}}1_{[-\frac{1}{2},\frac{1}{2}]^{d}}(L\mathbf{p}+\mathbf{v}),

C_{\boldsymbol{\theta}}=\int_{\substack{0\leqslant x_{1},x_{2}\leqslant 1\\ 0\leqslant x_{1}+\theta_{i}x_{2}\leqslant 1\text{ for all }i}}1\,dx_{1}\,dx_{2}.

T^{L,\mathbf{v}}_{F,G,N}(f_{1},\dots,f_{d}):=\frac{1}{N^{d-m}}\sum_{\mathbf{n}\in\mathbb{Z}^{d}}\Big(\prod_{j=1}^{d}f_{j}(n_{j})\Big)F(\mathbf{n}/N)G(L\mathbf{n}+\mathbf{v}).

\Lambda^{\prime}(n):=\begin{cases}\log n & n\text{ is prime}\\ 0 & \text{otherwise.}\end{cases}

\Lambda_{\mathbb{Z}/q\mathbb{Z}}(n)=\begin{cases}\frac{q}{\varphi(q)} & (n,q)=1\\ 0 & \text{otherwise.}\end{cases}

\sup_{\mathbf{x},\mathbf{y}\in\mathbb{R}^{d}}\frac{\|F(\mathbf{x})-F(\mathbf{y})\|_{\infty}}{\|\mathbf{x}-\mathbf{y}\|_{\infty}},

T_{F,G,N}^{L,\mathbf{v}}(\Lambda^{\prime},\dots,\Lambda^{\prime})=T_{F,G,N}^{L,\mathbf{v}}(\Lambda_{\mathbb{Z}/W\mathbb{Z}}^{+},\dots,\Lambda_{\mathbb{Z}/W\mathbb{Z}}^{+})+o_{C,L,\varepsilon,\sigma}(1)

T_{F,G,N}^{L,\mathbf{v}}(\Lambda_{\mathbb{Z}/W\mathbb{Z}}^{+},\dots,\Lambda_{\mathbb{Z}/W\mathbb{Z}}^{+})=T_{F,G,N}^{L,\mathbf{v}}(\Lambda_{\mathbb{Z}/W\mathbb{Z}},\dots,\Lambda_{\mathbb{Z}/W\mathbb{Z}}).

Full text

Linear inequalities in primes

Aled Walker

Trinity College, Cambridge, CB2 1TQ, United Kingdom

1 Introduction
2 The structure of the argument
I Preliminaries
3 Smooth functions
4 Notation and Conventions
II Linear algebra
5 Dimension reduction
6 Normal form
III Pseudorandomness
7 The $W$ -trick and Gowers norms
8 Inequalities in lattices
9 The linear inequalities condition
IV The structure of inequalities
10 An alternative formulation
11 Variation in parameters
V The main argument
12 Controlling by Gowers norms
13 Transferring from $\mathbb{Z}$ to $\mathbb{R}$
14 Parametrising the kernel
15 Gowers-Cauchy-Schwarz argument
16 Combining the lemmas
VI Final deductions
17 Removing Lipschitz cut-offs
VII Appendices
A Estimating integrals
B An analytic argument

1. Introduction

Fourier analysis is a vital tool in the study of diophantine problems. In recent years, however, new tools have been developed which can prove asymptotic formulae for the number of solutions to certain systems even when the Fourier-analytic approach is not known to succeed. In particular, in [13] Green and Tao established an asymptotic formula for the number of prime solutions to generic systems of $m$ simultaneous linear equations in at least $m+2$ variables. Their result was conditional on various conjectures, but these conjectures were later proved by the same authors and Ziegler, in the series of papers [14], [15] and [16].

Theorem 1.1 (Theorem 1.8, [13], Green-Tao-Ziegler).

Let $N$ , $m$ , and $d$ be natural numbers, with $d\geqslant m+2$ , and let $C$ be a positive constant. Let $L=(\lambda_{ij})_{i\leqslant m,j\leqslant d}$ be an $m$ -by- $d$ matrix with integer coefficients, with rank $m$ , and assume the non-degeneracy condition that the only element of the row-space of $L$ over $\mathbb{Q}$ with two or fewer non-zero entries is the zero vector. Let $\mathbf{b}\in\mathbb{Z}^{m}$ , and suppose that $\|\mathbf{b}\|_{\infty}\leqslant CN$ and that $|\lambda_{ij}|\leqslant C$ for all $i$ and $j$ . Let $K\subset[-N,N]^{d}$ be a convex set. Then

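$$|\{\mathbf{p}\in K:L\mathbf{p}=\mathbf{b}\}|=\Big(\alpha_{\infty}\prod_{p}\alpha_{p}\Big)(\log N)^{-d}+o_{C,d,m}(N^{d-m}(\log N)^{-d}),$$

where the local densities $\alpha_{p}$ are given, for each prime $p$, by

$$\alpha_{p}:=\lim_{M\rightarrow\infty}\frac{1}{(2M)^{d}}\sum_{\substack{\mathbf{n}\in[-M,M]^{d}\\ L\mathbf{n}=\mathbf{b}\\ (n_{i},p)=1\,\text{for all }i}}\Big(1+\frac{1}{p-1}\Big)^{d}$$

and the global factor $\alpha_{\infty}$ is given by

$$\alpha_{\infty}:=|\{\mathbf{n}\in\mathbb{Z}^{d}:\mathbf{n}\in K,\ L\mathbf{n}=\mathbf{b},\ n_{i}\geqslant 0\text{ for all }i\}|.$$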

Here and throughout, $p$ denotes a prime, $\mathbf{p}$ denotes a vector all of whose coordinates are prime, and $\mathbf{n}$ denotes a vector all of whose coordinates are integers $n_{i}$ . The expression $(n_{i},p)$ denotes the greatest common divisor of $n_{i}$ and $p$ .

To give a concrete example to which this result may be applied, by considering

$$L=\begin{pmatrix}1&-2&1&0\\0&1&-2&1\end{pmatrix},\qquad\mathbf{b}=\mathbf{0}$$

one may deduce an asymptotic formula for the number of four-term arithmetic progressions of primes that are less than $N$ .
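As a small sanity check of how a matrix equation $L\mathbf{p}=\mathbf{0}$ encodes arithmetic progressions, the following sketch enumerates prime solutions by brute force. It assumes the standard encoding in which both rows of $L$ are second differences, namely $(1,-2,1,0)$ and $(0,1,-2,1)$. The helper names are illustrative and do not come from the paper.

```python
from itertools import product

# Assumed encoding of four-term progressions:
#   n1 - 2*n2 + n3 = 0  and  n2 - 2*n3 + n4 = 0.
L = [(1, -2, 1, 0), (0, 1, -2, 1)]


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n (for n >= 1)."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_p in enumerate(sieve) if is_p]


def prime_solutions(n):
    """All prime vectors p in [n]^4 with L p = 0.

    This includes constant and decreasing progressions as well.
    """
    ps = primes_up_to(n)
    return [p for p in product(ps, repeat=4)
            if all(sum(a * x for a, x in zip(row, p)) == 0 for row in L)]


def increasing_progressions(n):
    """Only the strictly increasing solutions."""
    return [p for p in prime_solutions(n) if p[0] < p[1]]


if __name__ == "__main__":
    print(increasing_progressions(30))
```

The brute-force search is exponential in the number of variables and is only meant to make the encoding concrete for small $N$; it says nothing about the asymptotic formula itself.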

For $m\geqslant 2$ , Theorem 1.1 is stronger than any similar statement that may be proved using the Fourier transform alone. Indeed, notwithstanding Balog’s example [2, Corollary 3] of a certain non-generic class of $m$ equations in $m+\lceil\sqrt{2m}\rceil$ prime variables, generically the Fourier transform approach needs at least $2m+1$ prime variables in order to succeed. The proof of Theorem 1.1 rests on many creative innovations, in particular the authors’ use of Gowers norms and their inverse theory, which is a subject that is now referred to as ‘higher order Fourier analysis’. The object of the present paper is to use certain aspects of this machinery to establish, in a related setting, an analogous reduction in the number of variables that are required to prove an asymptotic formula.

We will be concerned with diophantine inequalities, a topic that we first considered in [21]. Before giving our first main result (Theorem 1.7) let us briefly review some previous results concerning diophantine inequalities in the primes. Consider the following classical theorem of Baker. (Footnote 1: In fact Baker proved a slightly different result, writing in the cited paper that the result we quote here followed easily from the then existing methods. Vaughan proved a similar result in [20].)

Theorem 1.2 ([1], Baker).

Let $\varepsilon>0$ , and let $\lambda_{1},\lambda_{2},\lambda_{3}\in\mathbb{R}\setminus\{0\}$ be three non-zero reals that are not all of the same sign. Furthermore, suppose that for all $\alpha\in\mathbb{R}\setminus\{0\}$ the relation $(\alpha\lambda_{1},\alpha\lambda_{2},\alpha\lambda_{3})\notin\mathbb{Z}^{3}$ holds. Then there exist infinitely many triples of primes $(p_{1},p_{2},p_{3})$ satisfying

$$|\lambda_{1}p_{1}+\lambda_{2}p_{2}+\lambda_{3}p_{3}|\leqslant\varepsilon. \tag{1.2}$$

Remark 1.3.

The condition concerning the signs of $\lambda_{1},\lambda_{2},\lambda_{3}$ is clearly a necessary one, as otherwise there exist only finitely many solutions to (1.2) in the positive integers (and so certainly there exist only finitely many solutions in the primes). Regarding the other condition, the conclusion of Theorem 1.2 may hold even if there exists some $\alpha\in\mathbb{R}\setminus\{0\}$ for which

$$(\alpha\lambda_{1},\alpha\lambda_{2},\alpha\lambda_{3})=(q_{1},q_{2},q_{3})\in\mathbb{Z}^{3}.$$

But then one is required to solve

$$|q_{1}p_{1}+q_{2}p_{2}+q_{3}p_{3}|\leqslant\varepsilon\alpha,$$

which, if $\varepsilon$ is small enough, is equivalent to solving

$$q_{1}p_{1}+q_{2}p_{2}+q_{3}p_{3}=0.$$

Theorem 1.1 can then affirm that there are infinitely many solutions, provided that $q_{1}$, $q_{2}$, and $q_{3}$ satisfy certain local properties. This issue, of when an inequality can encode a certain equation with rational coefficients, will be an important theme of the paper.
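The equivalence claimed in this remark can be spelled out in one line. The following worked step assumes, as a normalisation not stated in the text, that $\alpha>0$ (which can be arranged by replacing $\alpha$ with $-\alpha$), and takes "small enough" to mean $\varepsilon\alpha<1$.

```latex
% Assumptions: \alpha > 0 and \varepsilon\alpha < 1.
% The quantity q_1 p_1 + q_2 p_2 + q_3 p_3 is an integer, and the only
% integer of absolute value less than 1 is 0, so
\[
  |q_{1}p_{1}+q_{2}p_{2}+q_{3}p_{3}| \leqslant \varepsilon\alpha < 1
  \quad\Longrightarrow\quad
  q_{1}p_{1}+q_{2}p_{2}+q_{3}p_{3} = 0 .
\]
```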

The classical approach to proving results such as Theorem 1.2 involves Fourier analysis over $\mathbb{R}$ , after having replaced the characteristic function of the interval $[-\varepsilon,\varepsilon]$ with a smoother cut-off function. This approach is known as the Davenport-Heilbronn method, it having originated in a paper [5] of those two authors. For a variety of technical reasons this method was, until relatively recently, unable to give an asymptotic formula for the number of solutions to (1.2) that satisfied $1\leqslant p_{1},p_{2},p_{3}\leqslant N$ , or even give a lower bound of the expected order of magnitude (at least for arbitrary $N$ ). However, certain advances of Freeman [6, 7] enabled Parsell to achieve the second of these two goals.

Theorem 1.4 (Theorem 1, [18], Parsell).

Let $\varepsilon>0$ , and let $\lambda_{1},\lambda_{2},\lambda_{3}\in\mathbb{R}\setminus\{0\}$ be three non-zero reals that are not all of the same sign. Furthermore, suppose that for all $\alpha\in\mathbb{R}\setminus\{0\}$ the relation $(\alpha\lambda_{1},\alpha\lambda_{2},\alpha\lambda_{3})\notin\mathbb{Z}^{3}$ holds. Then the number of prime triples $(p_{1},p_{2},p_{3})$ satisfying $1\leqslant p_{1},p_{2},p_{3}\leqslant N$ and

$$|\lambda_{1}p_{1}+\lambda_{2}p_{2}+\lambda_{3}p_{3}|\leqslant\varepsilon \tag{1.3}$$

is $\Omega_{\lambda_{1},\lambda_{2},\lambda_{3}}(\varepsilon N^{2}(\log N)^{-3}).$

Since [18] was published, it has been understood that a very minor modification to Parsell’s analytic method can be used to obtain an asymptotic expression for the number of solutions to (1.3), namely

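$$C_{\lambda_{1},\lambda_{2},\lambda_{3}}\,\varepsilon N^{2}\log^{-3}N+o_{\lambda_{1},\lambda_{2},\lambda_{3}}(N^{2}\log^{-3}N),$$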

for some positive constant $C_{\lambda_{1},\lambda_{2},\lambda_{3}}$ . Furthermore, in the case of $m$ simultaneous (rationally independent) inequalities of the form (1.3), Parsell’s method can calculate an asymptotic formula for the number of solutions in primes provided the number of variables is at least $2m+1$ . In Appendix B we take the opportunity to record the details of both the statement and the proof of this result.

In the main theorems of this paper (Theorem 1.7 and Theorem 1.16) we specialise to the case of algebraic coefficients and reduce the number of variables that are required from $2m+1$ to $m+2$ . Our first result does not concern the most general type of diophantine inequality, but nonetheless it enjoys several applications. To state it, we recall the notion of the dual degeneracy variety, which we defined in Definition 2.3 of [21] in order to manipulate the non-degeneracy conditions more succinctly.

Definition 1.5 (Dual degeneracy variety, [21]).

Let $m,d$ be natural numbers satisfying $d\geqslant m+2$ . Let $V^{*}_{\operatorname{degen}}(m,d)$ denote the set of all $m$ -by- $d$ matrices with real coefficients that contain a non-zero row-vector in their row-space over $\mathbb{R}$ that has two or fewer non-zero co-ordinates. We call $V^{*}_{\operatorname{degen}}(m,d)$ the dual degeneracy variety.

For example, the matrix

$$\begin{pmatrix}1&-2&1&0\\2&-4&0&\sqrt{3}\end{pmatrix}$$

is in $V_{\operatorname{degen}}^{*}(2,4)$ , since the vector $(0,0,-2,\sqrt{3})$ lies in its row space. As is explained at length in [21], if one wishes to count solutions to an inequality given by $L$ using a method involving Gowers norms then one can only possibly succeed if $L\notin V_{\operatorname{degen}}^{*}(m,d)$ . Returning to Theorem 1.1, we observe that the non-degeneracy condition in the statement of that theorem is exactly the condition that $L\notin V_{\operatorname{degen}}^{*}(m,d)$ . If $d=m+2$ , non-degeneracy in this sense is easy to detect. Indeed, $L\notin V_{\operatorname{degen}}^{*}(m,d)$ if and only if the determinants of all the $m$ -by- $m$ submatrices of $L$ are non-vanishing.
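The minor criterion for $d=m+2$ is easy to test numerically. The sketch below checks every $m$-by-$m$ minor of a small matrix in floating point. The tolerance `tol`, the function names, and both sample matrices are assumptions made for illustration: the first matrix uses square roots of the primes 2, 3, 5, 7, and the second is a matrix whose row space contains $(0,0,-2,\sqrt{3})$.

```python
import math
from itertools import combinations


def det(a):
    """Determinant by Laplace expansion along the first row (fine for small m)."""
    if len(a) == 1:
        return a[0][0]
    total = 0.0
    for j, entry in enumerate(a[0]):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def is_nondegenerate(L, tol=1e-9):
    """For an m-by-(m+2) matrix: True iff every m-by-m minor exceeds tol in size."""
    m, d = len(L), len(L[0])
    if d != m + 2:
        raise ValueError("the minor criterion is stated only for d = m + 2")
    return all(
        abs(det([[row[j] for j in cols] for row in L])) > tol
        for cols in combinations(range(d), m)
    )


# Illustrative matrices (assumptions, see the lead-in).
GENERIC = [[1.0, 0.0, math.sqrt(2), -math.sqrt(3)],
           [0.0, 1.0, math.sqrt(5), -math.sqrt(7)]]
DEGENERATE = [[1.0, -2.0, 1.0, 0.0],
              [2.0, -4.0, 0.0, math.sqrt(3)]]

if __name__ == "__main__":
    print(is_nondegenerate(GENERIC), is_nondegenerate(DEGENERATE))
```

A floating-point test can only suggest non-degeneracy; for algebraic entries an exact check would need symbolic arithmetic, which this sketch does not attempt.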

Remark 1.6.

The above notion is ‘dual’ to the notion of finite Cauchy-Schwarz complexity (see Definition 5.5), in the sense that $L$ is not in the dual degeneracy variety if and only if $\ker L$ may be parametrised by a system of linear forms with finite Cauchy-Schwarz complexity. In [21] we also introduced a degeneracy variety in order to manipulate quantitative versions of this fact, but this will not be necessary here. For more on these issues, we invite the reader to consult Sections 6 and 7 of [21].

We are now ready to state our first main result. In the statement below, $[N]$ refers to the set $\mathbb{N}\cap[1,N]$ and the function $1_{[-\varepsilon,\varepsilon]^{m}}$ refers to the indicator function of the set $[-\varepsilon,\varepsilon]^{m}$ .

Theorem 1.7 (Main theorem, purely irrational version).

Let $N,m,d$ be natural numbers, with $d\geqslant m+2$ , and let $C,\varepsilon$ be positive constants. Let $L$ be an $m$ -by- $d$ real matrix with algebraic coefficients and rank $m$ . Suppose that $L\notin V^{*}_{\operatorname{degen}}(m,d)$ . Suppose further that for all $\boldsymbol{\alpha}\in\mathbb{R}^{m}\setminus\{\mathbf{0}\}$ one has $L^{T}\boldsymbol{\alpha}\notin\mathbb{Z}^{d}$ , i.e. suppose that $L$ is purely irrational in the sense of Definition 2.4 of [21]. Let $\mathbf{v}\in\mathbb{R}^{m}$ be any vector satisfying $\|\mathbf{v}\|_{\infty}\leqslant CN$ . Then

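$$\sum_{\mathbf{p}\in[N]^{d}}1_{[-\varepsilon,\varepsilon]^{m}}(L\mathbf{p}+\mathbf{v})=\frac{1}{\log^{d}N}\int_{\mathbf{x}\in[0,N]^{d}}1_{[-\varepsilon,\varepsilon]^{m}}(L\mathbf{x}+\mathbf{v})\,d\mathbf{x}+o_{C,L,\varepsilon}(N^{d-m}(\log N)^{-d}) \tag{1.4}$$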

as $N\rightarrow\infty$ .

Remark 1.8.

One notes that in the asymptotic formula (1.4) there is not a contribution from any non-archimedean local factors. In Theorem 1.16 below, we will remove the supposition that there does not exist any non-zero vector $\boldsymbol{\alpha}\in\mathbb{R}^{m}\setminus\{\mathbf{0}\}$ for which $L^{T}\boldsymbol{\alpha}\in\mathbb{Z}^{d}$. Once these potential rational relations are permitted, one does indeed observe a contribution from local factors.

Remark 1.9.

When $\mathbf{v}=\mathbf{0}$, it is straightforward to show (see Lemma A.2) that the main term in (1.4) is equal to

$$C_{L}\varepsilon^{m}N^{d-m}(\log N)^{-d}+o_{L,\varepsilon}(N^{d-m}(\log N)^{-d}),$$

where $C_{L}$ is a constant depending only on $L$. The positivity of $C_{L}$ may be determined in practice.

Remark 1.10.

The reader may note that Theorem 1.7 insists upon a fixed matrix $L$, rather than a matrix $L$ with bounded coefficients (as appeared in Theorem 1.1). In our previous work [21, Theorem 2.10], performed in the context of linear inequalities weighted by bounded functions, we proved a result that enabled $L$ to vary, as long as the coefficients of $L$ were bounded and $L$ was bounded away from $V_{\operatorname{degen}}^{*}(m,d)$. In the present paper there are many auxiliary linear equalities $L^{\prime}$, which will also need to enjoy such a quantitative non-degeneracy. We found keeping track of these features throughout the whole argument to be extremely complicated, but in principle it should be possible to do so.

Remark 1.11.

Theorem 1.7 strengthens Theorem B.1 of Parsell, in the sense that the number of variables has been reduced (from $2m+1$ to $m+2$). But unfortunately this has been achieved at the cost of imposing an algebraicity assumption on the coefficients of $L$. The situation is regrettable as, under this assumption, the classical Davenport-Heilbronn method alone is adequate to count the number of prime solutions to $m$ simultaneous linear inequalities in $2m+1$ variables, without needing the developments of Parsell. We should stress that most of our method does not rely on the algebraicity assumption. Indeed, the conclusions of Theorems 1.7 and 1.16 do in fact hold for some explicit set of matrices $L$ that has full Lebesgue measure (see Remark 9.7). Unfortunately, owing to the intricacy of the linear-algebraic manipulations in Section 15, we have not been able to formulate a clean or enlightening characterisation of this full-measure set. We have decided to clarify the exposition of the paper by working with algebraic coefficients throughout.

Let us give a concrete example of a linear inequality to which Theorem 1.7 applies but the Davenport-Heilbronn method does not.

Example 1.12.

Let $\varepsilon>0$ . Then the number of prime quadruples $(p_{1},p_{2},p_{3},p_{4})\in[N]^{4}$ satisfying

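$$|p_{1}+p_{3}\sqrt{2}-p_{4}\sqrt{3}|\leqslant\varepsilon,\qquad|p_{2}+p_{3}\sqrt{5}-p_{4}\sqrt{7}|\leqslant\varepsilon$$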

is equal to $C\varepsilon^{2}N^{2}(\log N)^{-4}+o_{\varepsilon}(N^{2}(\log N)^{-4})$ , for some positive constant $C$ .

Proof.

Taking

$$L=\begin{pmatrix}1&0&\sqrt{2}&-\sqrt{3}\\0&1&\sqrt{5}&-\sqrt{7}\end{pmatrix},$$

$L$ certainly satisfies the hypotheses of Theorem 1.7, since all the $2$ -by- $2$ submatrices have non-zero determinant and surds of primes are rationally independent. Taking $\mathbf{v}=\mathbf{0}$ , one may therefore apply Theorem 1.7.

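This yields an asymptotic expression for the number of solutions to the system of inequalities in Example 1.12, with the main term in the form of an integral. Since $\mathbf{v}=\mathbf{0}$, by Remark 1.9 we may express the main term as $C_{L}\varepsilon^{2}N^{2}(\log N)^{-4}$ for some constant $C_{L}$. Explicitly, from Lemma A.2 and expression (A.6) therein,

$$\frac{C_{L}}{4}=\int_{\substack{0\leqslant x_{1},x_{2}\leqslant 1\\ x_{1}\mathbf{v}^{(1)}+x_{2}\mathbf{v}^{(2)}\in[0,1]^{2}}}1\,dx_{1}\,dx_{2},$$

where

$$\mathbf{v}^{(1)}=\begin{pmatrix}-\sqrt{2}\\-\sqrt{5}\end{pmatrix},\qquad\mathbf{v}^{(2)}=\begin{pmatrix}\sqrt{3}\\\sqrt{7}\end{pmatrix}.$$

By a computation, we satisfy ourselves that $C_{L}\approx 1.394$ is positive. ∎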
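The numerical value of $C_{L}$ can be reproduced with a short quadrature. The sketch below assumes that $C_{L}/4$ is the area of the set of $(x_{1},x_{2})\in[0,1]^{2}$ with $x_{1}\mathbf{v}^{(1)}+x_{2}\mathbf{v}^{(2)}\in[0,1]^{2}$, where $\mathbf{v}^{(1)}=(-\sqrt{2},-\sqrt{5})^{T}$ and $\mathbf{v}^{(2)}=(\sqrt{3},\sqrt{7})^{T}$ are read off from the kernel of $L$. The midpoint rule and the step count are arbitrary choices for illustration, not the author's method.

```python
import math

# Assumed kernel vectors (see the lead-in).
V1 = (-math.sqrt(2), -math.sqrt(5))
V2 = (math.sqrt(3), math.sqrt(7))


def slice_length(x1):
    """Length of {x2 in [0,1] : x1*V1 + x2*V2 in [0,1]^2} for a fixed x1."""
    lo, hi = 0.0, 1.0
    for a, b in zip(V1, V2):
        # Each coordinate requires 0 <= a*x1 + b*x2 <= 1, and b > 0 here.
        lo = max(lo, (0.0 - a * x1) / b)
        hi = min(hi, (1.0 - a * x1) / b)
    return max(0.0, hi - lo)


def estimate_c_l(steps=20000):
    """Midpoint-rule estimate of C_L = 4 * area of the admissible region."""
    h = 1.0 / steps
    area = sum(slice_length((k + 0.5) * h) for k in range(steps)) * h
    return 4.0 * area


if __name__ == "__main__":
    print(round(estimate_c_l(), 3))
```

Because the slice length is piecewise linear in $x_{1}$, the midpoint rule converges quickly here; a finer grid changes the estimate only in later decimal places.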

Theorem 1.7 may also be used to count prime solutions to other systems.

Corollary 1.13.

Let $(\theta_{1},\dots,\theta_{d})^{T}=\boldsymbol{\theta}\in\mathbb{R}^{d}$ be a real vector with algebraic coefficients. Suppose that there does not exist any $\mathbf{k}\in\mathbb{Z}^{d}\setminus\{\mathbf{0}\}$ that satisfies $\mathbf{k}\cdot\boldsymbol{\theta}\in\mathbb{Z}$ . Let $\mathcal{P}$ denote the set of primes. Then

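$$\sum_{p_{1},p_{2}\leqslant N}\prod_{j=1}^{d}1_{\mathcal{P}\cap[N]}(\lfloor p_{1}+p_{2}\theta_{j}\rfloor)=C_{\boldsymbol{\theta}}\frac{N^{2}}{\log^{d}N}+o_{\boldsymbol{\theta}}\Big(\frac{N^{2}}{\log^{d}N}\Big), \tag{1.6}$$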

for some positive constant $C_{\boldsymbol{\theta}}$ .

Here $\lfloor x\rfloor$ denotes the floor function of $x$ , i.e. the greatest integer that is at most $x$ .

Proof.

We can expand the left-hand side of (1.6) as

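$$\sum_{p_{1},p_{2}\leqslant N}\ \sum_{p_{3},\dots,p_{d+2}\leqslant N}\ \prod_{j=3}^{d+2}1_{[0,1)}(p_{1}+p_{2}\theta_{j-2}-p_{j}).$$

Observe that the equation $p_{1}+p_{2}\theta_{j-2}-p_{j}=1$ has no solutions, since $\theta_{j-2}$ is irrational by assumption. So the above is equal to

$$\sum_{p_{1},p_{2}\leqslant N}\ \sum_{p_{3},\dots,p_{d+2}\leqslant N}\ \prod_{j=3}^{d+2}1_{[0,1]}(p_{1}+p_{2}\theta_{j-2}-p_{j}),$$

and this in turn is equal to

$$\sum_{p_{1},p_{2}\leqslant N}\ \sum_{p_{3},\dots,p_{d+2}\leqslant N}1_{[0,1]^{d}}(p_{1}\mathbf{1}+p_{2}\boldsymbol{\theta}-\mathbf{p_{3}^{d+2}}), \tag{1.7}$$

where $\mathbf{1}\in\mathbb{R}^{d}$ is the vector with every coordinate equal to $1$, and $\mathbf{p_{3}^{d+2}}:=(p_{3},\dots,p_{d+2})^{T}$.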

Let $L$ be the $d$ -by- $(d+2)$ matrix

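$$L=\begin{pmatrix}\mathbf{1}&\boldsymbol{\theta}&-I\end{pmatrix}.$$

Then (1.7) is equal to

$$\sum_{\mathbf{p}\in[N]^{d+2}}1_{[-\frac{1}{2},\frac{1}{2}]^{d}}(L\mathbf{p}+\mathbf{v}),$$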

where $\mathbf{v}:=(-1/2,\dots,-1/2)^{T}$ .

One sees that $L$ satisfies the hypotheses of Theorem 1.7. Indeed, note first that if there exists some $\boldsymbol{\alpha}\in\mathbb{R}^{d}\setminus\{\mathbf{0}\}$ for which $L^{T}\boldsymbol{\alpha}\in\mathbb{Z}^{d+2}$ then by considering the final $d$ coordinates of $L^{T}\boldsymbol{\alpha}$ it follows such an $\boldsymbol{\alpha}$ must have integer coordinates. But by considering the second coordinate of $L^{T}\boldsymbol{\alpha}$ it follows that $\boldsymbol{\alpha}\cdot\boldsymbol{\theta}\in\mathbb{Z}$ , which is a contradiction to our assumptions on $\boldsymbol{\theta}$ . Secondly, if $L$ were in $V_{\operatorname{degen}}^{*}(d,d+2)$ then either $\theta_{i}=0$ for some index $i$ , or $\theta_{i}=\theta_{j}$ for two different indices $i$ and $j$ . Both of these possibilities are precluded by the assumptions on $\boldsymbol{\theta}$ .

Therefore we may apply Theorem 1.7, and by Remark 1.9 we get a main term of the form $C_{\boldsymbol{\theta}}N^{2}(\log N)^{-d}$ . Explicitly, using Lemma A.2 and expression (A.6) as above, we have

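$$C_{\boldsymbol{\theta}}=\int_{\substack{0\leqslant x_{1},x_{2}\leqslant 1\\ 0\leqslant x_{1}+\theta_{i}x_{2}\leqslant 1\text{ for all }i}}1\,dx_{1}\,dx_{2}.$$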

For any vector $\boldsymbol{\theta}$ this integral is positive, and so the corollary is proved. ∎
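To see the positivity concretely, here is a worked evaluation in the simplest case. It assumes that $C_{\boldsymbol{\theta}}$ is the area of the region $\{0\leqslant x_{1},x_{2}\leqslant 1:\ 0\leqslant x_{1}+\theta_{i}x_{2}\leqslant 1\text{ for all }i\}$, and it takes $d=1$ with a single $\theta\geqslant 1$; the choice $\theta=\sqrt{2}$ is purely illustrative.

```latex
% Assumptions: d = 1 and \theta \geqslant 1, so the constraint x_1 + \theta x_2 \leqslant 1
% forces x_2 \leqslant 1/\theta \leqslant 1 and x_1 \leqslant 1 - \theta x_2.
\[
  C_{\theta}
  = \int_{0}^{1/\theta} (1-\theta x_{2})\,dx_{2}
  = \frac{1}{\theta}-\frac{\theta}{2\theta^{2}}
  = \frac{1}{2\theta},
  \qquad\text{e.g. } C_{\sqrt{2}}=\frac{1}{2\sqrt{2}}\approx 0.354 .
\]
```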

Let us now present a theorem which does not require $L$ to be purely irrational. This is Theorem 1.16 below, and we consider it to be our main result.

For ease of notation, we introduce the following definition.

Definition 1.14.

Let $N,m,d$ be natural numbers, and let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a linear map. Let $F:\mathbb{R}^{d}\rightarrow\mathbb{R}$ and $G:\mathbb{R}^{m}\rightarrow\mathbb{R}$ be functions with compact support. Let $\mathbf{v}\in\mathbb{R}^{m}$ . Then, for functions $f_{1},\dots,f_{d}:\mathbb{Z}\longrightarrow\mathbb{R}$ , we define

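$$T^{L,\mathbf{v}}_{F,G,N}(f_{1},\dots,f_{d}):=\frac{1}{N^{d-m}}\sum_{\mathbf{n}\in\mathbb{Z}^{d}}\Big(\prod_{j=1}^{d}f_{j}(n_{j})\Big)F(\mathbf{n}/N)G(L\mathbf{n}+\mathbf{v}).$$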
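For small parameters the quantity in Definition 1.14 can be evaluated directly. The sketch below assumes the form $T^{L,\mathbf{v}}_{F,G,N}(f_{1},\dots,f_{d})=N^{-(d-m)}\sum_{\mathbf{n}}\prod_{j}f_{j}(n_{j})\,F(\mathbf{n}/N)\,G(L\mathbf{n}+\mathbf{v})$ and that $F$ is supported on $[-1,1]^{d}$, so that only $\mathbf{n}\in[-N,N]^{d}$ contribute. The sharp indicator cut-offs and the three-variable example are illustrative choices, not taken from the paper.

```python
from itertools import product


def T(L, v, F, G, N, fs):
    """Brute-force evaluation of the weighted count, assuming supp F is in [-1,1]^d."""
    d, m = len(fs), len(L)
    total = 0.0
    for n in product(range(-N, N + 1), repeat=d):
        f_weight = F([x / N for x in n])
        if f_weight == 0:
            continue
        image = [sum(a * x for a, x in zip(row, n)) + vi for row, vi in zip(L, v)]
        g_weight = G(image)
        if g_weight == 0:
            continue
        term = f_weight * g_weight
        for f, x in zip(fs, n):
            term *= f(x)
        total += term
    return total / N ** (d - m)


# Illustrative data: three-term progressions n1 - 2*n2 + n3 = 0 with trivial weights.
def box_indicator(x):
    """Indicator of (0, 1]^d."""
    return 1.0 if all(0 < t <= 1 for t in x) else 0.0


def half_indicator(y):
    """Indicator of [-1/2, 1/2]^m."""
    return 1.0 if all(abs(t) <= 0.5 for t in y) else 0.0


def one(_n):
    return 1.0


if __name__ == "__main__":
    print(T([(1, -2, 1)], [0.0], box_indicator, half_indicator, 10, [one] * 3))
```

With integer coefficients and $\mathbf{v}=\mathbf{0}$, the indicator of $[-1/2,1/2]$ simply detects the equation $n_{1}-2n_{2}+n_{3}=0$, which is why the example reduces to counting progressions in $[1,10]^{3}$.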

It will be convenient to introduce a logarithmic weighting to the primes. To this end, following [13], we define the function $\Lambda^{\prime}:\mathbb{Z}\longrightarrow\mathbb{R}$ by

$$\Lambda^{\prime}(n):=\begin{cases}\log n & n\text{ is prime}\\ 0 & \text{otherwise.}\end{cases}$$

The von Mangoldt function $\Lambda$ will not be needed in this paper.

Another notion from [13] will be useful.

Definition 1.15 (Local von Mangoldt function).

For $q\geqslant 2$ , the local von Mangoldt function $\Lambda_{\mathbb{Z}/q\mathbb{Z}}:\mathbb{Z}\longrightarrow\mathbb{R}$ is the $q$ -periodic function defined by

$$\Lambda_{\mathbb{Z}/q\mathbb{Z}}(n)=\begin{cases}\frac{q}{\varphi(q)} & (n,q)=1\\ 0 & \text{otherwise.}\end{cases}$$

We let $\Lambda_{\mathbb{Z}/q\mathbb{Z}}^{+}:\mathbb{Z}\longrightarrow\mathbb{R}$ denote the restriction of $\Lambda_{\mathbb{Z}/q\mathbb{Z}}$ to the non-negative reals, namely the function $\Lambda_{\mathbb{Z}/q\mathbb{Z}}1_{[0,\infty)}$.

The local von Mangoldt function, when $q$ is the product of small primes, can be viewed as a model for the function $\Lambda^{\prime}$. This model (essentially the modified Cramér random model) is intimately connected to a technical device known as the $W$-trick, which we recall in Section 7.
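The two weight functions are simple to write down in code. The sketch below assumes the definitions $\Lambda^{\prime}(n)=\log n$ for prime $n$ (and $0$ otherwise) and $\Lambda_{\mathbb{Z}/q\mathbb{Z}}(n)=q/\varphi(q)$ when $(n,q)=1$ (and $0$ otherwise). Function names are illustrative. The final check shows that the local model has mean $1$ over a full period, which is the normalisation that makes it comparable to $\Lambda^{\prime}$ on average.

```python
import math


def is_prime(n):
    """Trial division; adequate for small illustrative inputs."""
    if n < 2:
        return False
    return all(n % k for k in range(2, math.isqrt(n) + 1))


def lambda_prime(n):
    """Logarithmically weighted indicator of the primes."""
    return math.log(n) if is_prime(n) else 0.0


def euler_phi(q):
    """Euler's totient by direct counting."""
    return sum(1 for a in range(1, q + 1) if math.gcd(a, q) == 1)


def local_von_mangoldt(n, q):
    """q-periodic model: q/phi(q) on residues coprime to q, zero elsewhere."""
    return q / euler_phi(q) if math.gcd(n, q) == 1 else 0.0


def mean_over_period(q):
    """Average of the local model over one full period."""
    return sum(local_von_mangoldt(n, q) for n in range(q)) / q


if __name__ == "__main__":
    print(mean_over_period(30))
```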

For a function $F:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ we define the Lipschitz constant of $F$ to be

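$$\sup_{\mathbf{x},\mathbf{y}\in\mathbb{R}^{d}}\frac{\|F(\mathbf{x})-F(\mathbf{y})\|_{\infty}}{\|\mathbf{x}-\mathbf{y}\|_{\infty}},$$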

and call $F$ Lipschitz if this value is finite.

We may now state the main theorem.

Theorem 1.16 (Main theorem).

Let $N,m,d$ be natural numbers with $d\geqslant m+2$ , and let $C,\varepsilon,\sigma$ be positive real parameters. Let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a surjective linear map with algebraic coefficients, and suppose that $L\notin V_{\operatorname{degen}}^{*}(m,d)$ . Let $\mathbf{v}\in\mathbb{R}^{m}$ be any vector that satisfies $\|\mathbf{v}\|_{\infty}\leqslant CN$ . Let $F:\mathbb{R}^{d}\longrightarrow[0,1]$ and $G:\mathbb{R}^{m}\longrightarrow[0,1]$ be compactly supported Lipschitz functions with Lipschitz constants at most $\sigma^{-1}$ , and assume that $F$ is supported on $[-1,1]^{d}$ and $G$ is supported on $[-\varepsilon,\varepsilon]^{m}$ . Let $w=w(N):=\log\log\log N$ , assuming that $N$ is large enough for this function to be well defined, and let $W=W(N):=\prod\limits_{p\leqslant w}p$ . Then

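$$T_{F,G,N}^{L,\mathbf{v}}(\Lambda^{\prime},\dots,\Lambda^{\prime})=T_{F,G,N}^{L,\mathbf{v}}(\Lambda_{\mathbb{Z}/W\mathbb{Z}}^{+},\dots,\Lambda_{\mathbb{Z}/W\mathbb{Z}}^{+})+o_{C,L,\varepsilon,\sigma}(1) \tag{1.9}$$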

as $N\rightarrow\infty$ .
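To get a feel for how slowly the parameters in Theorem 1.16 grow, the following sketch evaluates $w(N)=\log\log\log N$ and $W(N)=\prod_{p\leqslant w}p$ for very large $N$. It assumes natural logarithms and requires $N>e$ so that the triple logarithm is defined; the function names are illustrative.

```python
import math


def w_of(N):
    """w(N) = log log log N (natural logarithms; requires N > e)."""
    return math.log(math.log(math.log(N)))


def W_of(N):
    """W(N) = product of all primes p <= w(N); an empty product equals 1."""
    w = w_of(N)
    W = 1
    for p in range(2, math.floor(w) + 1):
        if all(p % k for k in range(2, math.isqrt(p) + 1)):
            W *= p
    return W


if __name__ == "__main__":
    for exponent in (100, 1000):
        N = 10 ** exponent
        print(exponent, w_of(N), W_of(N))
```

Even for $N=10^{1000}$ the threshold $w(N)$ only just exceeds $2$, which illustrates that the statement is genuinely asymptotic in $N$.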

Remark 1.17.

If $F$ is supported on $[0,1]^{d}$ , we have

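$$T_{F,G,N}^{L,\mathbf{v}}(\Lambda_{\mathbb{Z}/W\mathbb{Z}}^{+},\dots,\Lambda_{\mathbb{Z}/W\mathbb{Z}}^{+})=T_{F,G,N}^{L,\mathbf{v}}(\Lambda_{\mathbb{Z}/W\mathbb{Z}},\dots,\Lambda_{\mathbb{Z}/W\mathbb{Z}}).$$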

We will prove an asymptotic formula for $T_{F,G,N}^{L,\mathbf{v}}(\Lambda_{\mathbb{Z}/W\mathbb{Z}},\dots,\Lambda_{\mathbb{Z}/W\mathbb{Z}})$ later, in Lemma 9.11 and Remark 9.12. For example, if $\mathbf{v}=\mathbf{0}$ and

[TABLE]

say, and $F$ and $G$ are smooth functions supported on $[0,1]^{4}$ and $[-1/2,1/2]^{2}$ respectively, one may use Lemma 9.11 and Remark 9.12 to show that

[TABLE]

where

[TABLE]

and

[TABLE]

where

[TABLE]

The constant $\mathfrak{S}$ is in fact equal to

[TABLE]

where $(\xi_{1},\xi_{2},\xi_{3},\xi_{4})$ are the coordinate maps for $\Xi$.

It takes some effort to establish precisely what the map $\Xi$ should be for a given $L$. What’s more, the asymptotic formula in the general case is not just a product of a local factor and a global factor but rather a finite sum of products of local factors and global factors, and we will need to introduce an abundance of additional notation in order to be able to state these terms properly. Thus, in the interests of readability, we choose not to include this formula as part of the statement of Theorem 1.16.

Remark 1.18.

If $L$ has rational coefficients (or, more generally, if $L$ has rational dimension $m$; see Definition 5.2 below), then Theorem 1.16 reduces to a statement on linear equations in primes (a reduction which we will make precise in Remark 5.7 below). In this sense, our work is a generalisation of Green-Tao-Ziegler.

Remark 1.19.

We have phrased Theorem 1.16 with Lipschitz cut-offs $F$ and $G$. In Section 17 we will demonstrate how these cut-offs may be removed when $L$ is ‘purely irrational’, and in doing so will demonstrate how Theorem 1.16 implies Theorem 1.7. The same methods may be applied when $L$ is not purely irrational, but they will not always succeed, due to the rational degeneracy introduced in those cases. Unfortunately we have not been able to formulate what we regard to be a satisfactory general condition for saying when (1.9) holds with sharp cut-offs $F$ and $G$. Note in particular how the proof of Lemma A.2 relies heavily on the convex sets $[-\varepsilon,\varepsilon]^{m}$ and $[0,N]^{d}$ being axis-parallel boxes. Therefore we do not present a version of the theorem in which summation is over a general convex set $K$, as is done in Theorem 1.1. However, if the reader wishes to apply a specific instance of Theorem 1.16 with sharp cut-offs, the methods of Section 17 and Appendix A will almost certainly suffice for the purpose.

Remark 1.20.

The reader will observe that, as in Theorem 1.7, we do not determine the nature of the dependence of the error term in (1.9) on the map $L$. We discussed this feature in Remark 1.10.

We conjecture that the conclusion of Theorem 1.16 holds for all $L\notin V_{\operatorname{degen}}^{*}(m,d)$ , provided $w$ grows slowly enough in terms of $L$ .

Conjecture 1.21 (Transcendental case).

Let $L$ , $\mathbf{v}$ , $F$ , and $G$ be as in the statement of Theorem 1.16, but do not assume that $L$ necessarily has algebraic coefficients. Then there is some function $w:\mathbb{N}\longrightarrow\mathbb{R}_{\geqslant 0}$ , with $w(N)\rightarrow\infty$ as $N\rightarrow\infty$ , such that (1.9) holds with $W=\prod\limits_{p\leqslant w}p$ .

In Section 9 we will formulate a statement involving smoothed sieve weights (namely Conjecture 9.6) which, if resolved, would settle Conjecture 1.21.

Acknowledgments. During the writing of this paper we benefited greatly from the supervision of Ben Green, and had helpful conversations with Sam Chow, Trevor Wooley, Yufei Zhao, Joni Teräväinen and Kaisa Matomäki. We would like to thank an anonymous referee for an exceptionally detailed reading of the manuscript and for many helpful corrections and comments. The majority of the work was carried out while the author was supported by EPSRC grant no. EP/M50659X/1, continued while the author was a Program Associate at the Mathematical Sciences Research Institute in Berkeley, and finished while the author was supported by a Junior Research Fellowship at Trinity College Cambridge.

2. The structure of the argument

In this section we discuss our approach to proving Theorem 1.16, and describe the geography of the paper as a whole.

Initially, one might hope that Theorem 1.16 could be proved by replacing the coefficients of $L$ with some rational approximations, by considering the corresponding linear equation with rational coefficients, and then by appealing directly to Theorem 1.1 on linear equations in primes. However, unless the coefficients of $L$ are extremely well-approximable by rationals (and in particular are transcendental), such an approach does not seem to succeed. Indeed, let $L=(\lambda_{ij})_{i\leqslant m,j\leqslant d}$ and let $\lambda^{\prime}_{ij}$ be a rational approximation to $\lambda_{ij}$ , with $L^{\prime}$ being the corresponding approximation to $L$ . In order for the comparison of $L$ with $L^{\prime}$ to be meaningful, we will need $\|L\mathbf{n}-L^{\prime}\mathbf{n}\|_{\infty}=O(1)$ for all relevant $\mathbf{n}$ , and in the general situation where all coordinates of $\mathbf{n}$ have magnitude $\Omega(N)$ this requires $|\lambda^{\prime}_{ij}-\lambda_{ij}|$ to be $O(N^{-1})$ . Hence the numerator and denominator of $\lambda^{\prime}_{ij}$ must grow rapidly with $N$ , unless $\lambda_{ij}$ is extremely well-approximable. Yet Theorem 1.1 requires the coefficients of the associated affine linear equations to have height $O(1)$ (excepting the constant term, which may be $O(N)$ ). In [3] Bienvenu offers a slight improvement, but even with this refinement it does not seem that we can apply an existing result on linear equations in primes as a black box.
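The obstruction described above can be seen numerically. The sketch below searches for the smallest denominator $q$ admitting a fraction $a/q$ within $1/N$ of a given real $\lambda$. The choice $\lambda=\sqrt{2}$ and the function name are illustrative assumptions, and the search is done in floating point, so it is only reliable for moderate $N$.

```python
def min_denominator(lam, N):
    """Smallest q >= 1 such that some a/q satisfies |lam - a/q| <= 1/N."""
    q = 1
    while True:
        a = round(lam * q)  # nearest numerator for this denominator
        if abs(lam - a / q) <= 1.0 / N:
            return q
        q += 1


if __name__ == "__main__":
    for N in (10 ** 2, 10 ** 4, 10 ** 6):
        print(N, min_denominator(2 ** 0.5, N))
```

For $\sqrt{2}$ the required denominator grows roughly like $\sqrt{N}$, so the height of the approximating coefficients is unbounded as $N\rightarrow\infty$, in contrast with the $O(1)$ height required by Theorem 1.1.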

Instead, we will follow a similar approach to that which we used in our work [21], a paper that considered diophantine inequalities in the setting of bounded functions. Namely, we replace the function $\Lambda^{\prime}:\mathbb{Z}\rightarrow\mathbb{R}$ by a suitable convolution $\Lambda^{\prime}\ast\chi:\mathbb{R}\rightarrow\mathbb{R}$ , designed to ensure the validity of the approximation

[TABLE]

The integral may be manipulated by certain reparametrisations (Lemma 14.3), yielding expressions of the form

[TABLE]

where $(\psi_{1},\dots,\psi_{d})=\Psi:\mathbb{R}^{d-m}\longrightarrow\mathbb{R}^{d}$ parametrises $\ker L$ and $g_{1},\dots,g_{d}$ are certain functions. By applying the Gowers-Cauchy-Schwarz inequality, in a manner strongly resembling [13, Appendix C], such expressions may be bounded by the Gowers norm $\|\Lambda^{\prime}-\Lambda_{\mathbb{Z}/W\mathbb{Z}}\|_{U^{s+1}[N]}$ , for some $s=O(1)$ . A qualitative bound on this Gowers norm is known by the work of Green-Tao-Ziegler (see Lemma 7.5), and so Theorem 1.16 follows.

The novel aspect of this manipulation, over the work of [13] and [21], is the appearance of various auxiliary linear inequalities, weighted by upper bound sieve weights. These enter in a manner that is somewhat analogous to the way in which the so-called ‘linear forms condition’ arises in [13]. Asymptotics for the number of solutions to these auxiliary inequalities underpin the argument, and this leads to a ‘linear inequalities condition’

[TABLE]

for a sieve weight $\nu$ , which is our corresponding notion of pseudorandomness (made precise in Definition 9.1). We are unable to verify this pseudorandomness condition in full generality, but we succeed in the case when $L$ has algebraic coefficients. Our key technical tool is a bound for the number of solutions to a diophantine inequality restricted to a lattice, which we prove using the Davenport-Heilbronn method. This is the only part of the entire argument that uses the fact that the coefficients are assumed to be algebraic.

There is a final technical manoeuvre that we employ, one which has no direct analogue in [13] or [21]. It will transpire that passing to the local von Mangoldt function $\Lambda_{\mathbb{Z}/W\mathbb{Z}}$ introduces certain singular expressions, which arise from the fact that we are dealing with inequalities rather than equations. To circumvent this issue we find it necessary to work at two different ‘local scales’, introducing functions $\Lambda_{\mathbb{Z}/W^{*}\mathbb{Z}}$ and $\Lambda_{\mathbb{Z}/W\mathbb{Z}}$ . By careful manoeuvring one can ensure that the singular expressions are only introduced by the $W^{*}$ scale, and so, provided $W^{*}$ grows slowly enough compared to $W$ , these singularities may be offset by the decay in the Gowers norm expressions involving $W$ . This further complicates the analysis of the expressions, and in fact our final choice of function $W^{*}$ will be non-effective.

The structure of the paper is as follows. The main elements of the proof of Theorem 1.16 take place in Part V, and the reader may wish to begin with that part. It is here that we reduce matters to bounding certain systems by Gowers norms (Section 12), prove the approximation (2.1) (Section 13), and apply the Gowers-Cauchy-Schwarz inequality (Section 15).

However, the arguments of this part rely heavily on lemmas that are proved earlier in the paper, and these lemmas split naturally into four types. There are those results that are standard properties of smooth functions, and these are recorded in Section 3. We also have lemmas whose proofs involve manipulation of a purely linear algebraic nature, in order to reduce inequalities to ones that are ‘purely irrational’ or to put linear equations into ‘normal form’. We describe these notions in Part II. The definition of pseudorandomness for an enveloping sieve weight is contained in Part III, as is our proof that a certain weight satisfies this pseudorandomness condition. Also in this part one may find Conjecture 9.6, which, if resolved, would remove the algebraicity assumptions. Part IV is reserved for those lemmas that involve the (somewhat tedious) manipulation of integrals into more pleasant forms. One of these lemmas is Lemma 11.1, which is the lemma that introduces the second local scale $W^{*}$ that we mentioned above.

The first appendix is concerned with elementary estimates relating to the integral that appears in the global factor of Theorem 1.16. As we have already said, Appendix B presents a Fourier-analytic argument which is essentially due to Parsell.

Finally, let us mention that, to help to streamline the statements of various propositions and lemmas in the paper as a whole, we have found it useful to introduce certain notational conventions that are unique to this paper. We describe these in Section 4.

Part I Preliminaries

3. Smooth functions

Smooth functions will play a significant role in the paper, and in this section we collect together those notions and lemmas that will be necessary for our forthcoming manipulations.

Following [17, Section 2], given a natural number $d$ and a compactly supported smooth function $F:\mathbb{R}^{d}\longrightarrow\mathbb{R}$ , we define $d(F)$ to be the corresponding value of $d$ , $\operatorname{Rad}(F)$ to be the smallest $R$ such that $F$ is supported on $[-R,R]^{d}$ , and for every non-negative integer $j$ we define

[TABLE]

Then, if $P$ is any set, we shall define $\mathcal{C}(P)$ to be the set of those smooth functions $F$ for which

[TABLE]

can be bounded above by quantities that depend only on the elements of $P$ . For example, let $g:\mathbb{R}\longrightarrow\mathbb{R}$ be the function given by

[TABLE]

and then for a positive parameter $\delta$ let $g_{\delta}:\mathbb{R}\longrightarrow\mathbb{R}$ be defined by $g_{\delta}(x):=g(x/\delta)$ . Then $g_{\delta}\in\mathcal{C}(\delta)$ , as is proved rather succinctly in [4, Lemma 9], say.

In order to shorten some of the statements in the main part of the paper, it will be convenient to consider all functions on $\mathbb{R}^{0}$ to be smooth (with derivatives equal to $0$).

Let us record a standard proposition on smooth majorants and minorants.

Lemma 3.1.

Let $\delta$ be a real number in the range $0<\delta<1$ . Then there exist two smooth functions $f^{+\delta},f^{-\delta}:\mathbb{R}\rightarrow[0,1]$ , with $f^{+\delta},f^{-\delta}\in\mathcal{C}(\delta)$ , satisfying

[TABLE]

for all $x\in\mathbb{R}$ .

Proof.

Let $g$ be as above, and let $C:=\int g(x)\,dx$ . Then one may define

[TABLE]

and

[TABLE]

The fact that $f^{+\delta},f^{-\delta}\in\mathcal{C}(\delta)$ follows from differentiating under the integral (which is easily justified by the mean value theorem). ∎

Lemma 3.2 (Smooth partition of unity).

Let $\delta$ be a real number in the range $0<\delta<1$ . Then there exists a natural number $t$ , satisfying $t=O(\delta^{-1})$ , and functions $f_{1},\dots,f_{t}:\mathbb{R}\longrightarrow[0,1]$ such that

(1) for each $i\leqslant t$ , $f_{i}\in\mathcal{C}(\delta)$ ;

(2) for each $i\leqslant t$ , $f_{i}$ is supported on an interval of length at most $2\delta$ ;

(3) for all $x\in\mathbb{R}$ , $1_{[-1+\delta,1-\delta]}(x)\leqslant\sum\limits_{i=1}^{t}f_{i}(x)\leqslant 1_{[-1-\delta,1+\delta]}(x)$ ;

(4) for all $x\in\mathbb{R}$ , $x$ is contained in the support of at most $2$ of the functions $f_{i}$ .

Proof.

Let $t=\lceil 4\delta^{-1}\rceil$ , and write

[TABLE]

where

[TABLE]

Then define

[TABLE]

The desired properties are immediate. ∎
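For readers who wish to experiment, here is a minimal numerical sketch of one way to build functions with the four properties of Lemma 3.2. It is not the construction used in the proof above (whose details are not reproduced here): we take differences of translates of a smooth step, so that the sum telescopes. The names `smooth_step` and `partition`, and the choice of breakpoints, are our own assumptions.

```python
import math

def smooth_step(x: float) -> float:
    """Smooth function equal to 0 for x <= 0 and to 1 for x >= 1."""
    def h(u: float) -> float:
        return math.exp(-1.0 / u) if u > 0 else 0.0
    return h(x) / (h(x) + h(1.0 - x))

def partition(delta: float):
    """Return f_1, ..., f_t whose sum is a smoothed indicator of [-1, 1]."""
    t = math.ceil(2.0 / delta)   # number of pieces, O(1/delta)
    width = 2.0 / t              # spacing of the breakpoints, at most delta
    eps = width / 2.0            # length of each transition zone
    edges = [-1.0 + k * width for k in range(t + 1)]

    def rise(y: float) -> float:
        # climbs smoothly from 0 at y = -eps/2 to 1 at y = eps/2
        return smooth_step(y / eps + 0.5)

    def make(i: int):
        # f_i is supported on an interval of length width + eps <= 2 * delta
        return lambda x: rise(x - edges[i]) - rise(x - edges[i + 1])

    return [make(i) for i in range(t)]

delta = 0.25
fs = partition(delta)
xs = [-1.5 + 3.0 * k / 600 for k in range(601)]  # sample points in [-1.5, 1.5]
```

Because the transition zones are shorter than the spacing of the breakpoints, at most two of the functions are non-zero at any point, and the sum equals $1$ on $[-1+\delta,1-\delta]$ and vanishes outside $[-1-\delta,1+\delta]$.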

Lemma 3.3 (Approximating Lipschitz functions by smooth boxes).

Let $\delta,\sigma,N$ be positive real parameters, with $\delta,\sigma$ in the range $0<\delta,\sigma<1/2$ . Let $d$ be a natural number, and let $F:\mathbb{R}^{d}\longrightarrow[0,1]$ be a Lipschitz function supported on $[-N,N]^{d}$ with Lipschitz constant at most $(\sigma N)^{-1}$ . Then there exists a natural number $k$ , satisfying $k=O(\delta^{-d})$ , and functions $F_{1},\dots,F_{k}:\mathbb{R}^{d}\longrightarrow[0,1]$ such that

(1) $\|F-\sum\limits_{i=1}^{k}F_{i}\|_{\infty}=O(\delta\sigma^{-1})$ ;

(2) for each $i\leqslant k$ , $F_{i}$ is supported on a box with side length $O(\delta)$ ;

(3) there is a natural number $t$ , satisfying $t=O(\delta^{-1})$ , and functions $f_{1},\dots,f_{t}:\mathbb{R}\longrightarrow[0,1]$ , satisfying $f_{1},\dots,f_{t}\in\mathcal{C}(\delta)$ , such that

[TABLE]

for each $i\leqslant k$ , for some element $S^{(i)}\in[t]^{d}$ and some constant $c_{i,F}\in[0,1]$ .

Proof.

We have

[TABLE]

where the functions $f_{1},\dots,f_{t}$ are those constructed by applying Lemma 3.2 with this value of $\delta$ . This manipulation is indeed valid, since $F(\mathbf{x})=0$ for any $\mathbf{x}$ for which

[TABLE]

Swapping the product and summation, (3) equals

[TABLE]

Let $\mathbf{x}^{(S)}\in\mathbb{R}^{d}$ be any point at which $\prod\limits_{j=1}^{d}f_{S_{j}}(x^{(S)}_{j}/2N)$ is non-zero. Then the above is equal to

[TABLE]

by the Lipschitz properties of $F$ and the limited support of the functions $f_{1},\dots,f_{t}$ (which was part (2) of Lemma 3.2).

Define

[TABLE]

These functions satisfy properties (2) and (3) of Lemma 3.3. Finally note that, by part (4) of Lemma 3.2, each $\mathbf{x}\in\mathbb{R}^{d}$ is contained in the support of at most $O(1)$ of the functions $F_{S}$ , and hence $\|F-\sum\limits_{S\in[t]^{d}}F_{S}\|_{\infty}=O(\delta\sigma^{-1})$ , as required. ∎

The Fourier transform of smooth functions will be an important tool in Section 8. We choose the following convention. If $F:\mathbb{R}^{d}\longrightarrow\mathbb{R}$ is a compactly supported smooth function, we define the Fourier transform $\widehat{F}:\mathbb{R}^{d}\longrightarrow\mathbb{C}$ by the formula

[TABLE]

Lemma 3.4.

Let $P$ be a set of parameters and suppose $F\in\mathcal{C}(P)$ . Then for every $\boldsymbol{\alpha}$ and every non-negative integer $K$ one has

[TABLE]

Proof.

This follows from integration by parts. ∎

Finally, we recall the definition of dual lattices and the version of the Poisson summation formula that we will use.

Definition 3.5 (Dual lattice).

Let $h$ be a natural number and let $\Gamma\leqslant\mathbb{R}^{h}$ be a lattice of rank $h$ . Then the dual lattice $\Gamma^{*}$ is defined by

[TABLE]

It is easily seen that if $M$ is an $h$ -by- $h$ matrix whose columns are a lattice basis for $\Gamma$ , then $(M^{-1})^{T}$ is an $h$ -by- $h$ matrix whose columns are a lattice basis for $\Gamma^{*}$ .
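The relationship between the two bases can be checked by hand in small cases. The sketch below assumes the usual definition of the dual lattice, namely the set of $\mathbf{y}$ with $\mathbf{x}\cdot\mathbf{y}\in\mathbb{Z}$ for all $\mathbf{x}\in\Gamma$, and uses a rank $2$ lattice of our own choosing. It computes $(M^{-1})^{T}$ exactly and verifies that the pairings between the two bases form the identity matrix. The helper names are hypothetical.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Exact inverse of a 2-by-2 matrix given as a list of rows."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("columns do not span a rank-2 lattice")
    return [[d / det, -b / det], [-c / det, a / det]]

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

M = [[2, 1], [0, 3]]                # columns (2,0) and (1,3): a basis of Gamma
D = transpose(inverse_2x2(M))       # columns: a basis of the dual lattice
pairings = matmul(transpose(M), D)  # entry (i, j) = <i-th basis vector, j-th dual vector>
```

Since every pairing of basis vectors is an integer, the same holds for all integer combinations, which is the containment of the lattice generated by the columns of $(M^{-1})^{T}$ in $\Gamma^{*}$.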

Lemma 3.6 (Poisson summation).

Let $h$ be a natural number and let $\Gamma\leqslant\mathbb{R}^{h}$ be a lattice of rank $h$ . Let $F:\mathbb{R}^{h}\longrightarrow\mathbb{C}$ be a smooth compactly supported function. Then

[TABLE]

Proof.

This is a standard result. The version in which $\Gamma=\mathbb{Z}^{h}$ appears as [8, Theorem 3.1.17], with the extension to general full-rank lattices following from a change of variables. ∎

4. Notation and Conventions

For the most part the notation used in this paper is very standard, and any usage that could be viewed as somewhat unusual will be introduced as and when it is required. However, there are a few particular points that will apply to the paper as a whole which we believe to be important to address now.

We will use the Bachmann-Landau asymptotic notation $O$ , $o$ , and $\Omega$ , but with one departure from a common convention. For a function $f$ and a positive function $g$ , one sometimes writes $f=O(g)$ if there exists a constant $C$ such that $|f(N)|\leqslant Cg(N)$ for $N$ sufficiently large. We do not do this. Rather, we require the inequality to hold for all $N$ in some pre-specified range. If $N$ is a natural number, the range is always assumed to be $\mathbb{N}$ unless otherwise specified. It will be a convenient shorthand to use these symbols in conjunction with minus signs, whenever they appear in exponents. For example, $N^{-\Omega(1)}$ refers to a term $N^{-c}$ , where $c$ is some positive quantity bounded away from $0$ as the asymptotic parameter tends to infinity.

The Vinogradov symbol $\ll$ will be used, where for a function $f$ and a positive function $g$ we write $f\ll g$ if and only if $f=O(g)$ . We write $f\asymp g$ if $f\ll g$ and $g\ll f$ . If an implied constant or a $o(1)$ term depends on other parameters, we will denote these by subscripts, e.g. $O_{c,C,\varepsilon}(1)$ , or $f\asymp_{\varepsilon}g$ . However, if the implied constants depend on the underlying dimensions (denoted by $m$ , $d$ , and occasionally by $h$ , $s$ , and $t$ ) we will not record this fact explicitly, as this would render most of the expressions unreadable.

The notation $\operatorname{Rad}(F)$ , which was introduced in the previous section for compactly supported smooth functions $F$ , will also be used when $F$ is not smooth.

In order to keep track of which variables are scalars and which are vectors, we will use boldface $\mathbf{x}$ to denote any $\mathbf{x}\in\mathbb{R}^{d}$ where $d$ could be at least $2$ . In order to describe certain integrals over many variables, the following notational convention will be useful. If $\mathbf{x}\in\mathbb{R}^{d}$ and if $a$ and $b$ are two subscripts with $1\leqslant a\leqslant b\leqslant d$ , we use $\mathbf{x_{a}^{b}}$ to denote the vector $(x_{a},x_{a+1},\cdots,x_{b})^{T}\in\mathbb{R}^{b-a+1}$ .

With a view to trying to shorten some of the statements and proofs to follow, there are certain functions that we will fix throughout the paper, namely $w$ , $W$ , $\rho$ , and $\chi$ . From now on, the function $w:\mathbb{N}\longrightarrow\mathbb{R}_{\geqslant 0}$ will always be defined by

[TABLE]

Whenever $N$ is a quantity that we have defined, we write $w$ for $w(N)$ and let

[TABLE]

The empty product is considered to be equal to $1$ . Whenever other functions $w_{1},\dots,w_{d},w^{*}:\mathbb{N}\longrightarrow\mathbb{R}_{\geqslant 0}$ occur, and a natural number $N$ is given, we will define $W_{1},\dots,W_{d},W^{*}$ analogously.

The following definition (a smooth version of [21, Definition 5.2]) will be a useful way to control certain functions that are required in the argument.

Definition 4.1 ( $\eta$ -supported).

Let $\chi:\mathbb{R}\longrightarrow[0,1]$ be a smooth function, and let $\eta$ be a positive parameter. We say that $\chi$ is $\eta$ -supported if $\chi$ is supported on $[-\eta,\eta]$ and $\chi(x)\equiv 1$ for all $x\in[-\eta/2,\eta/2]$ .

It follows from Lemma 3.1 that $1$ -supported functions exist. From now on we fix a smooth function

[TABLE]

that is $1$ -supported. We think of $\rho$ as an element of $\mathcal{C}(\emptyset)$ . Whenever a positive parameter $\eta$ is defined we also define

[TABLE]

by the relation $\chi(x):=\rho(x/\eta)$ . The function $\chi$ is $\eta$ -supported, and satisfies $\chi\in\mathcal{C}(\eta)$ .
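As an illustration of Definition 4.1, the following sketch writes down one explicit $1$ -supported function and its rescalings. It is only an example under our own assumptions: the paper fixes some $\rho$ whose existence follows from Lemma 3.1, not necessarily this one, and the names `smooth_step`, `rho` and `make_chi` are hypothetical.

```python
import math

def smooth_step(x: float) -> float:
    """Smooth function equal to 0 for x <= 0 and to 1 for x >= 1."""
    def h(u: float) -> float:
        return math.exp(-1.0 / u) if u > 0 else 0.0
    return h(x) / (h(x) + h(1.0 - x))

def rho(x: float) -> float:
    """A 1-supported function: equal to 1 on [-1/2, 1/2], zero outside [-1, 1]."""
    return 1.0 - smooth_step(2.0 * abs(x) - 1.0)

def make_chi(eta: float):
    """The rescaling chi(x) := rho(x / eta), which is eta-supported."""
    return lambda x: rho(x / eta)

chi = make_chi(0.25)
```

The function `rho` is constant near the origin, so the absolute value causes no loss of smoothness there.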

We finish this section with some pieces of notation of a more standard nature. If $X,Y\subset\mathbb{R}^{d}$ for some $d$ , we define

[TABLE]

If $X$ is the singleton $\{x\}$ , we write $\operatorname{dist}(x,Y)$ for $\operatorname{dist}(\{x\},Y)$ . We let $\partial(X)$ denote the topological boundary of $X$ (though the symbol $\partial$ will also be used for partial differentiation, as usual). If $A$ and $B$ are two sets with $A\subseteq B$ , we let $1_{A}:B\longrightarrow\{0,1\}$ denote the indicator function of $A$ . The relevant set $B$ will usually be obvious from context. If $E$ is some event, e.g. a divisor condition, we will also use $1_{E}$ for the indicator function of this event. For $\theta\in\mathbb{R}$ we adopt the standard shorthand $e(\theta)$ to mean $e^{2\pi i\theta}$ . The Möbius function will be denoted by $\mu$ , though in Section 15 the symbol $\mu$ will also be used to denote a measure. In Section 9 we will use $\varphi$ for Euler’s $\varphi$ -function, and for two natural numbers $a$ and $b$ we use the shorthand $(a,b)$ to denote their greatest common divisor.

Part II Linear algebra

In [21] we developed an armoury of linear-algebraic methods, which enabled us to manipulate linear inequalities into certain desired forms. The same manipulation is necessary here. We have chosen not to consign this material to an appendix, nor simply to cite [21], since the result of Lemma 5.6 below will be very important during subsequent sections. We will also need a few results (on the vector $\widetilde{\mathbf{r}}$ below) that were not required in our previous work, and so citing [21] won’t quite do.

Fortunately, as we do not seek to determine exactly how the error term in Theorem 1.16 depends on $L$ , we can offer a significant simplification over the work that was presented in [21]. This is another reason to include this material.

Before starting, we remind the reader of some of the central definitions from the theory of dual vector spaces and dual linear maps, which will be used liberally throughout. Let $V$ be a finite-dimensional vector space over a field $\mathbb{F}$ . Then $V^{*}$ denotes the dual vector space, i.e. the vector space of all linear maps $\omega:V\longrightarrow\mathbb{F}$ under pointwise addition and scalar multiplication. If $L:V\longrightarrow W$ is a linear map between two finite-dimensional vector spaces, the dual map $L^{*}:W^{*}\longrightarrow V^{*}$ is defined by the relation $(L^{*}(\boldsymbol{\omega}))(\mathbf{v}):=\boldsymbol{\omega}(L(\mathbf{v}))$ for all $\boldsymbol{\omega}\in W^{*}$ and $\mathbf{v}\in V$ . Given a basis $\mathbf{e_{1}},\dots,\mathbf{e_{n}}$ for $V$ , the dual basis $\mathbf{e_{1}^{*}},\dots,\mathbf{e_{n}^{*}}$ for $V^{*}$ is defined by extending linearly the relations

[TABLE]

Finally, given a set $S\subset V$ the annihilator $S^{0}\subset V^{*}$ is defined by

[TABLE]

5. Dimension reduction

We begin with a generalisation of Definition 1.8. Note that the case $m=0$ is permitted below.

Definition 5.1.

Let $N,d,h$ be natural numbers, and let $m$ be a non-negative integer. Let $L:\mathbb{R}^{h}\longrightarrow\mathbb{R}^{m}$ be a linear map, and let $(\xi_{1},\dots,\xi_{d})=\Xi:\mathbb{R}^{h}\longrightarrow\mathbb{R}^{d}$ be a linear map with integer coefficients. Let $F:\mathbb{R}^{d}\longrightarrow\mathbb{R}$ and $G:\mathbb{R}^{m}\longrightarrow\mathbb{R}$ be functions with compact support. Let $\mathbf{v}\in\mathbb{R}^{m}$ and $\widetilde{\mathbf{r}}\in\mathbb{Z}^{d}$ . Then for $f_{1},\dots,f_{d}:\mathbb{Z}\longrightarrow\mathbb{R}$ we define

[TABLE]

where $\widetilde{r}_{j}$ is the $j^{th}$ coordinate of $\widetilde{\mathbf{r}}$ .

The reader might notice that this definition is subtly different from the similar definition that appeared in [21], namely Definition 4.3 of that paper, in which the function $\mathbf{n}\mapsto F((\Xi(\mathbf{n})+\widetilde{\mathbf{r}})/N)$ was treated as an arbitrary function $F_{1}:\mathbb{R}^{h}\longrightarrow[0,1]$ . When dealing with quantitative aspects of smooth functions (a feature of this paper that is not required in [21]) it is convenient to preserve the internal structure of this particular function, and so we have modified Definition 5.1 accordingly.

Recall the notion of rational maps from [21].

Definition 5.2 (Rational dimension, rational map, purely irrational).

Let $m$ and $d$ be natural numbers, with $d\geqslant m$ . Let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a surjective linear map. Let $u$ denote the largest integer for which there exists a surjective linear map $\Theta:\mathbb{R}^{m}\longrightarrow\mathbb{R}^{u}$ for which $\Theta L(\mathbb{Z}^{d})\subseteq\mathbb{Z}^{u}$ . We call $u$ the rational dimension of $L$ , and we call any map $\Theta$ with the above property a rational map for $L$ . We say that $L$ is purely irrational if $u=0$ .

Remark 5.3.

If (the matrix of) $L$ has algebraic coefficients, then there exists a rational map for $L$ that also has algebraic coefficients.

Purely irrational linear maps are those that we may analyse most easily using the Davenport-Heilbronn method (see Section 8). However, even when proving Theorem 1.7, whose statement concerns only purely irrational linear maps, we will be forced to consider auxiliary linear maps that are not purely irrational. It is necessary therefore to develop a rudimentary theory of these maps. Readers desiring more detail and motivating examples concerning rational maps and rational dimension may consult Sections 2, 4, and 6 of [21].

Our key tool will be Lemma 5.6, which is a version of Lemma 4.10 from [21]. This lemma will enable us to ‘quotient out’ the rational relations that are present in a diophantine inequality, leaving behind a purely irrational linear map between spaces of a lower dimension. In particular, we will show that

[TABLE]

where $L^{\prime}$ is purely irrational, and the vectors $\mathbf{v^{\prime}}$ and $\widetilde{\mathbf{r}}$ , the linear map $\Xi$ and the function $G_{\widetilde{\mathbf{r}}}$ are objects that we may control.

To state the lemma we need to recall explicitly the notion from [13] that was mentioned in Remark 1.6, namely finite Cauchy-Schwarz complexity for linear maps. (In [21] a notion of degeneracy for pairs of linear maps was useful, but we have structured the present paper in such a way as to avoid requiring this complicated notion.)

Definition 5.4 (Finite Cauchy-Schwarz complexity).

Let $d,h$ be natural numbers, and let $(\xi_{1},\dots,\xi_{d}):=\Xi:\mathbb{R}^{h}\longrightarrow\mathbb{R}^{d}$ be a linear map. We say that $\Xi$ has infinite Cauchy-Schwarz complexity if there are two distinct indices $i$ and $j$ , and some $\lambda\in\mathbb{R}$ , for which $\xi_{i}=\lambda\xi_{j}$ . If no such $i$ and $j$ exist we say that $\Xi$ has finite Cauchy-Schwarz complexity.

There is an equivalent definition, which will be more convenient for algebraic manipulations.

Definition 5.5 (Finite Cauchy-Schwarz complexity, equivalent definition).

Let $d,h$ be natural numbers. Let $\mathbf{e_{1}},\dots,\mathbf{e_{d}}$ denote the standard basis vectors of $\mathbb{R}^{d}$ , and let $\mathbf{e_{1}}^{\ast},\dots,\mathbf{e_{d}}^{\ast}$ denote the dual basis of $(\mathbb{R}^{d})^{\ast}$ . Then let $V_{\operatorname{degen}}(h,d)$ denote the set of all linear maps $\Xi:\mathbb{R}^{h}\longrightarrow\mathbb{R}^{d}$ for which there exist two indices $i,j\leqslant d$ , and some real number $\lambda$ , such that $\mathbf{e_{i}}-\lambda\mathbf{e_{j}}$ is non-zero and $\mathbf{e_{i}}^{*}-\lambda\mathbf{e_{j}}^{*}\in\ker(\Xi^{*})$ . If $\Xi\notin V_{\operatorname{degen}}(h,d)$ , we say that $\Xi$ has finite Cauchy-Schwarz complexity.

The equivalence of these definitions is elementary.

For more background on the notion of finite Cauchy-Schwarz complexity, the reader may consult Section 1 of [13] or Section 6 of [21].
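For integer-valued $\Xi$ , Definition 5.4 can be tested mechanically: two forms $\xi_{i},\xi_{j}$ satisfy $\xi_{i}=\lambda\xi_{j}$ or $\xi_{j}=\lambda\xi_{i}$ for some real $\lambda$ exactly when all $2$ -by- $2$ minors of the pair of coefficient vectors vanish. The sketch below implements this check. The function names and the two example systems are our own, chosen purely for illustration.

```python
from itertools import combinations

def proportional(u, v) -> bool:
    """True if u and v span a space of dimension at most 1 (all 2x2 minors vanish)."""
    return all(u[a] * v[b] - u[b] * v[a] == 0
               for a, b in combinations(range(len(u)), 2))

def has_finite_cs_complexity(rows) -> bool:
    """rows[i] holds the integer coefficients of the linear form xi_i."""
    return not any(proportional(u, v) for u, v in combinations(rows, 2))

# The forms n, n + k, n + 2k in the variables (n, k): no two are proportional.
ap3 = [(1, 0), (1, 1), (1, 2)]
# Here the second form is twice the first, so the complexity is infinite.
bad = [(1, 2), (2, 4), (0, 1)]

print(has_finite_cs_complexity(ap3), has_finite_cs_complexity(bad))
```

Note that a system containing the zero form (together with at least one other form) has infinite Cauchy-Schwarz complexity, since one may take $\lambda=0$ .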

Now we may state and prove the important lemma, which provides the ‘dimension reduction’ of the section title.

Lemma 5.6 (Generating a purely irrational map).

Let $m,d$ be natural numbers, with $d\geqslant m+2$ , and let $C,\eta$ be positive parameters. Let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a surjective linear map with algebraic coefficients. Let $u$ be the rational dimension of $L$ . Let $F:\mathbb{R}^{d}\longrightarrow[0,1]$ and $G:\mathbb{R}^{m}\longrightarrow[0,1]$ be compactly supported functions. Assume that $G$ is smooth, $\operatorname{Rad}(G)\leqslant\eta$ , and moreover that $G\in\mathcal{C}(P,\eta)$ for some set of parameters $P$ . Let $\mathbf{v}\in\mathbb{R}^{m}$ be a vector with $\|\mathbf{v}\|_{\infty}\leqslant CN$ . Then there exists a surjective linear map $\Theta:\mathbb{R}^{m}\longrightarrow\mathbb{R}^{u}$ , a surjective linear map $L^{\prime}:\mathbb{R}^{d-u}\longrightarrow\mathbb{R}^{m-u}$ , an injective linear map $\Xi:\mathbb{R}^{d-u}\longrightarrow\mathbb{R}^{d}$ , a finite subset $\widetilde{R}\subset\mathbb{Z}^{d}$ , a vector $\mathbf{v}^{\prime}\in\mathbb{R}^{m-u}$ , and, for each $\widetilde{\mathbf{r}}\in\widetilde{R}$ , a compactly supported function $G_{\widetilde{\mathbf{r}}}:\mathbb{R}^{m-u}\longrightarrow[0,1]$ , such that

(1) $\Theta$ is a rational map for $L$ with algebraic coefficients;

(2) $\Xi$ has integer coefficients, depends only on $L$ , and satisfies $\operatorname{Im}\Xi=\ker\Theta L$ and $\Xi(\mathbb{Z}^{d-u})=\mathbb{Z}^{d}\cap\operatorname{Im}\Xi$ ;

(3) $\widetilde{R}$ satisfies $|\widetilde{R}|=O_{L,\eta}(1)$ , and $\|\widetilde{\mathbf{r}}\|_{\infty}=O_{C,L,\eta}(N)$ for all $\widetilde{\mathbf{r}}\in\widetilde{R}$ ;

(4) for all $\widetilde{\mathbf{r}}\in\widetilde{R}$ , the function $G_{\widetilde{\mathbf{r}}}$ is smooth, $\operatorname{Rad}(G_{\widetilde{\mathbf{r}}})=O_{L}(\eta)$ , and $G_{\widetilde{\mathbf{r}}}\in\mathcal{C}(L,P,\eta)$ ;

(5) $\mathbf{v}^{\prime}$ satisfies $\|\mathbf{v}^{\prime}\|_{\infty}=O_{C,L}(N)$ ;

(6) for all natural numbers $N$ , and for all functions $f_{1},\dots,f_{d}:\mathbb{Z}\longrightarrow\mathbb{R}$ , one has

[TABLE]

(7) $L^{\prime}$ is purely irrational, depends only on $L$ , and has algebraic coefficients;

(8) if $L\notin V_{\operatorname{degen}}^{*}(m,d)$ then $\Xi$ has finite Cauchy-Schwarz complexity.

The above properties suffice for Section 9, but three additional properties also hold. We will need these additional properties in Section 11.

(9) Letting $\mathbf{e_{1}},\dots,\mathbf{e_{d-u}}$ denote the standard basis of $\mathbb{R}^{d-u}$ , there is a set $\{\mathbf{x_{i}}:i\leqslant u\}\subset\mathbb{R}^{d}$ for which

[TABLE]

is a basis for $\mathbb{R}^{d}$ and a lattice basis for $\mathbb{Z}^{d}$ . Furthermore, $\widetilde{R}\subset\operatorname{span}(\mathbf{x_{i}}:i\leqslant u)$ and $\{\Theta L\mathbf{x_{i}}:i\leqslant u\}$ is a lattice basis for $\Theta L\mathbb{Z}^{d}$ ;

(10) if $\eta$ is small enough in terms of $L$ , and if $\mathbf{v}=L\mathbf{a}$ for some $\mathbf{a}\in\mathbb{R}^{d}$ , then $|\widetilde{R}|=1$ and $\widetilde{\mathbf{r}}\in\widetilde{R}$ is a vector that minimises $\|\Theta L(\widetilde{\mathbf{r}}+\mathbf{a})\|_{\infty}$ over all $\widetilde{\mathbf{r}}\in\mathbb{Z}^{d}$ ;

(11) for all $\widetilde{\mathbf{r}}\in\widetilde{R}$ and $\mathbf{x}\in\mathbb{R}^{d-u}$ one has

[TABLE]

Proof.

Parts (1) and (2): Choose $\Theta:\mathbb{R}^{m}\longrightarrow\mathbb{R}^{u}$ to be a rational map for $L$ that has algebraic coefficients. By rank-nullity $\ker(\Theta L)$ is a $d-u$ dimensional subspace of $\mathbb{R}^{d}$ , and also the matrix of $\Theta L$ has integer coefficients. Combining these two facts, we see that $\ker(\Theta L)\cap\mathbb{Z}^{d}$ is a $d-u$ dimensional lattice, and (by the standard algorithms) one can find a lattice basis $\mathbf{v_{1}},\dots,\mathbf{v_{d-u}}\in\mathbb{Z}^{d}$ that satisfies $\|\mathbf{v_{i}}\|_{\infty}=O_{L}(1)$ for every $i$ .

Let $\mathbf{e_{1}},\dots,\mathbf{e_{d-u}}$ denote the standard basis of $\mathbb{R}^{d-u}$ , and then define $\Xi:\mathbb{R}^{d-u}\longrightarrow\mathbb{R}^{d}$ by

[TABLE]

for all $i\leqslant d-u$ . Then $\Xi$ satisfies part (2) of the lemma.

Parts (3), (9), and (10): There is a set of vectors $\{\mathbf{a_{1}},\dots,\mathbf{a_{u}}\}\subset\mathbb{Z}^{u}$ that is an integer basis for the lattice $\Theta L(\mathbb{Z}^{d})$ and for which $\|\mathbf{a_{i}}\|_{\infty}=O_{L}(1)$ for each $i$ . Furthermore there exists a set of vectors $\{\mathbf{x_{1}},\dots,\mathbf{x_{u}}\}\subset\mathbb{Z}^{d}$ such that $\Theta L(\mathbf{x_{i}})=\mathbf{a_{i}}$ for each $i$ , and $\|\mathbf{x_{i}}\|_{\infty}=O_{L}(1)$ . By Lemma 4.8 of [21],

[TABLE]

is a basis for $\mathbb{R}^{d}$ and a lattice basis for $\mathbb{Z}^{d}$ .
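A toy instance may help to fix ideas. In the sketch below we take $d=3$ and $u=1$ , and we suppose (purely for illustration, this is not an example from the paper) that $\Theta L$ is the integer row $(1,1,1)$ . We then exhibit a lattice basis $\mathbf{v_{1}},\mathbf{v_{2}}$ of $\ker(\Theta L)\cap\mathbb{Z}^{3}$ , a vector $\mathbf{x_{1}}$ with $\Theta L\mathbf{x_{1}}=1$ , and check that together they have determinant $\pm 1$ , which is the statement that they form a lattice basis for $\mathbb{Z}^{3}$ . All names in the code are hypothetical.

```python
def det3(A):
    """Determinant of a 3-by-3 integer matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def apply(row, vec):
    """Evaluate the linear form with coefficient vector `row` at `vec`."""
    return sum(r * c for r, c in zip(row, vec))

theta_L = (1, 1, 1)               # toy integer row playing the role of Theta L
v1, v2 = (1, -1, 0), (0, 1, -1)   # lattice basis of ker(Theta L) in Z^3
x1 = (0, 0, 1)                    # Theta L x1 = 1 generates Theta L(Z^3) = Z

# Matrix whose columns are v1, v2, x1.
basis_columns = [list(row) for row in zip(v1, v2, x1)]

def decompose(n):
    """Write n = m1*v1 + m2*v2 + r with r an integer multiple of x1."""
    k = apply(theta_L, n)
    r = tuple(k * c for c in x1)
    rest = tuple(a - b for a, b in zip(n, r))  # lies in ker(Theta L)
    m1, m2 = rest[0], rest[0] + rest[1]
    return (m1, m2), r

print(det3(basis_columns), decompose((3, 5, 7)))
```

The decomposition mirrors the way in which the sum over $\mathbf{n}\in\mathbb{Z}^{d}$ is split according to the value of $\Theta L\mathbf{n}$ , with the kernel part parametrised by $\Xi$ .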

Now, if $\mathbf{z}\in\mathbb{R}^{m}$ and $\Theta(\mathbf{z})=\mathbf{r}$ then $\|\mathbf{z}\|_{\infty}=\Omega_{L}(\|\mathbf{r}\|_{\infty})$ . Recall that $\operatorname{Rad}(G)\leqslant\eta$ and that $\Theta L(\mathbb{Z}^{d})\subseteq\mathbb{Z}^{u}$ . It follows that there are at most $O_{L,\eta}(1)$ possible vectors $\mathbf{r}\in\mathbb{Z}^{u}$ for which there exists a vector $\mathbf{n}\in\mathbb{Z}^{d}$ for which both $G(L\mathbf{n}+\mathbf{v})\neq 0$ and $\Theta L\mathbf{n}=\mathbf{r}$ . Let $R$ denote the set of all such vectors $\mathbf{r}$ . Observe that, for all $\mathbf{r}\in R$ , $\|\mathbf{r}\|_{\infty}=O_{C,L,\eta}(N)$ .

For each $\mathbf{r}\in R$ , there exists a unique vector $\widetilde{\mathbf{r}}\in\operatorname{span}(\mathbf{x_{i}}:i\leqslant u)$ such that $\Theta L\widetilde{\mathbf{r}}=\mathbf{r}$ . Note that $\|\widetilde{\mathbf{r}}\|_{\infty}=O_{C,L,\eta}(N)$ . Letting $\widetilde{R}$ denote the set of these $\widetilde{\mathbf{r}}$ , we see that $\widetilde{R}$ satisfies part (3).

If $\eta$ is small enough in terms of $L$ , then $R$ has size at most $1$ . Indeed, if $\mathbf{r^{(1)}}$ and $\mathbf{r^{(2)}}$ are two different vectors in $R$ , with respective $\widetilde{\mathbf{r}}^{\mathbf{(1)}}$ and $\widetilde{\mathbf{r}}^{\mathbf{(2)}}$ , then $G(L\widetilde{\mathbf{r}}^{\mathbf{(1)}}+\mathbf{v})\neq 0$ and $G(L\widetilde{\mathbf{r}}^{\mathbf{(2)}}+\mathbf{v})\neq 0$ . Hence $\|L(\widetilde{\mathbf{r}}^{\mathbf{(1)}}-\widetilde{\mathbf{r}}^{\mathbf{(2)}})\|_{\infty}\ll\eta$ . Yet $\|\Theta(L(\widetilde{\mathbf{r}}^{\mathbf{(1)}}-\widetilde{\mathbf{r}}^{\mathbf{(2)}}))\|_{\infty}=\|\mathbf{r^{(1)}}-\mathbf{r^{(2)}}\|_{\infty}\gg 1$ (which is a contradiction). In this instance, writing $\mathbf{v}$ in the form $L\mathbf{a}$ , we may pick $\widetilde{\mathbf{r}}\in\mathbb{Z}^{d}$ to be an element in $\operatorname{span}(\mathbf{x_{i}}:i\leqslant u)$ that minimises $\|\Theta L(\widetilde{\mathbf{r}}+\mathbf{a})\|_{\infty}$ over all $\widetilde{\mathbf{r}}\in\mathbb{Z}^{d}$ .

Parts (4), (5), (6), and (11): By the definition of $\widetilde{R}$ , and the fact that $\Xi(\mathbb{Z}^{d-u})=\mathbb{Z}^{d}\cap\ker(\Theta L)$ , we have that $T_{F,G,N}^{L,\mathbf{v}}(f_{1},\dots,f_{d})$ is equal to

[TABLE]

This is very close to being of the form required for part (6), and indeed it can be massaged into exactly the required form.

To do this, note that

[TABLE]

and so there exists an invertible linear map $Q:\mathbb{R}^{m}\longrightarrow\mathbb{R}^{m}$ with algebraic coefficients such that

[TABLE]

For all $\mathbf{x}\in\mathbb{R}^{d-u}$ we have

[TABLE]

We also note that $QL\Xi(\mathbf{x})\in\{0\}^{u}\times\mathbb{R}^{m-u}$ , and that $QL\widetilde{\mathbf{r}}\in\mathbb{R}^{u}\times\{0\}^{m-u}$ .

Now, write $\pi_{m-u}:\mathbb{R}^{m}\longrightarrow\mathbb{R}^{m-u}$ for the projection map onto the final $m-u$ coordinates. Define $G_{\widetilde{\mathbf{r}}}:\mathbb{R}^{m-u}\longrightarrow[0,1]$ by

[TABLE]

where $\mathbf{x_{0}}$ is the extension of $\mathbf{x}$ by $0$ in the first $u$ coordinates. Then $G_{\widetilde{\mathbf{r}}}$ satisfies the desired properties of part (4), since $\mathbf{x_{0}}$ and $QL\widetilde{\mathbf{r}}+Q\mathbf{v}-(\pi_{m-u}Q\mathbf{v})_{\mathbf{0}}$ are orthogonal.

Then (5.4) is equal to

[TABLE]

Let

[TABLE]

Then $L^{\prime}:\mathbb{R}^{d-u}\longrightarrow\mathbb{R}^{m-u}$ is surjective, and

[TABLE]

This resolves parts (5) and (6). But furthermore, by the construction of $G_{\widetilde{\mathbf{r}}}$ , part (11) is also satisfied.

Part (7): This is immediate from Lemma 4.10 of [21]. To spell it out, suppose for contradiction that there exists some surjective linear map $\varphi:\mathbb{R}^{m-u}\longrightarrow\mathbb{R}$ with $\varphi L^{\prime}(\mathbb{Z}^{d-u})\subseteq\mathbb{Z}$ , i.e. with $\varphi\pi_{m-u}QL\Xi(\mathbb{Z}^{d-u})\subseteq\mathbb{Z}$ . Then define the map $\Theta^{\prime}:\mathbb{R}^{m}\longrightarrow\mathbb{R}^{u+1}$ by

[TABLE]

Then $\Theta^{\prime}$ is surjective, and $\Theta^{\prime}L(\mathbb{Z}^{d})\subseteq\mathbb{Z}^{u+1}$ . This second fact is immediately seen by writing $\mathbb{Z}^{d}$ with respect to the lattice basis $\mathcal{B}$ from (5.3). This contradicts the assumption that $L$ has rational dimension $u$ . So $L^{\prime}$ is purely irrational.

Part (8): Suppose $L\notin V_{\operatorname{degen}}^{*}(m,d)$ and suppose for contradiction that $\Xi$ has infinite Cauchy-Schwarz complexity. Letting $\mathbf{e_{1}},\dots,\mathbf{e_{d}}$ denote the standard basis of $\mathbb{R}^{d}$ , this means there exist $i,j\leqslant d$ and a non-zero vector $\mathbf{e_{i}}-\lambda\mathbf{e_{j}}$ such that $\mathbf{e_{i}}^{*}-\lambda\mathbf{e_{j}}^{*}\in\ker(\Xi^{*})$ . But $\ker(\Xi^{*})=(\operatorname{Im}\Xi)^{0}=(\ker\Theta L)^{0}=\operatorname{Im}(L^{*}\Theta^{*})$ , and $\operatorname{Im}(L^{*}\Theta^{*})\subseteq\operatorname{Im}L^{*}$ . Hence $\mathbf{e_{i}}^{*}-\lambda\mathbf{e_{j}}^{*}\in\operatorname{Im}L^{*}$ , which implies that $L\in V_{\operatorname{degen}}^{*}(m,d)$ , contradicting our hypothesis.

The lemma is proved. ∎

Remark 5.7.

Applying Lemma 5.6 with $f_{j}=\Lambda^{\prime}$ for all $j$ , and when $L$ has rational dimension $m$ , it is evident that estimating $T_{F,G,N}^{L,\mathbf{v}}(\Lambda^{\prime},\dots,\Lambda^{\prime})$ is equivalent to counting solutions to $|\widetilde{R}|$ systems of linear equations given by $\Xi$ . This is handled by the Main Theorem of [13]. In this sense, one may see how our work in this paper generalises Green-Tao’s work in [13] to the cases in which the rational dimension is not equal to $m$ .

6. Normal form

In this section we describe, very briefly, what it means for a linear map $(\psi_{1},\dots,\psi_{t})=\Psi:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{t}$ to be in $s$ -normal form. For a more complete discussion we refer the reader to [13] and [21].

Definition 6.1 (Normal form).

Let $d,t$ be natural numbers, let $s$ be a non-negative integer, and let $(\psi_{1},\dots,\psi_{t})=\Psi:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{t}$ be a linear map. We say that $\Psi$ is in $s$ -normal form if for every $i\in[t]$ there exists a collection $J_{i}\subseteq\{\mathbf{e_{1}},\dots,\mathbf{e_{d}}\}$ of basis vectors of cardinality $|J_{i}|\leqslant s+1$ such that $\prod_{\mathbf{e}\in J_{i}}\psi_{i^{\prime}}(\mathbf{e})$ is non-zero for $i^{\prime}=i$ and vanishes otherwise.
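As a concrete reading of Definition 6.1, the following Python sketch tests $s$-normal form by brute force for a map given as a $t\times d$ coefficient matrix whose $(i,k)$ entry is $\psi_{i}(\mathbf{e_{k}})$. The function name `is_normal_form` and the sample systems are illustrative assumptions and do not appear in the paper.

```python
from itertools import combinations


def is_normal_form(psi, s):
    """Brute-force check of s-normal form.

    psi is a t x d matrix with psi[i][k] = psi_i(e_k).  For each i we look
    for a set J of at most s + 1 basis vectors on which psi_i never vanishes
    while every other form psi_i' vanishes on at least one member of J.
    """
    t, d = len(psi), len(psi[0])
    for i in range(t):
        found = False
        for size in range(0, s + 2):  # |J| <= s + 1, empty set included
            for J in combinations(range(d), size):
                own_nonzero = all(psi[i][k] != 0 for k in J)
                others_vanish = all(
                    any(psi[other][k] == 0 for k in J)
                    for other in range(t)
                    if other != i
                )
                if own_nonzero and others_vanish:
                    found = True
                    break
            if found:
                break
        if not found:
            return False
    return True


# Hypothetical examples (not from the paper).
IDENTITY3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# A parametrisation of three-term progressions: psi_1 - 2 psi_2 + psi_3 = 0.
AP3 = [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]]
# The forms x, x + h1, x + h2, x + h1 + h2 defining the U^2 norm.
GOWERS_U2 = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
```

Under these assumptions the progression system `AP3` is in $1$-normal form but not in $0$-normal form, whereas the forms in `GOWERS_U2` are not in $s$-normal form for any $s$ as written, since every form is non-zero on the first basis vector. This is the kind of situation that a normal form extension is designed to repair by adding variables.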

The notion of normal form is intimately connected with the notion of finite Cauchy-Schwarz complexity (Definition 5.5). The key proposition was proved in [13]. (In [21] we were forced to prove a delicate quantitative version, but this will not be necessary here.)

Lemma 6.2 (Normal form extensions).

Let $d,t$ be natural numbers, and let $(\psi_{1},\dots,\psi_{t})=\Psi:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{t}$ be a linear map with finite Cauchy-Schwarz complexity. Then there is a linear map $\Psi^{\prime}:\mathbb{R}^{d^{\prime}}\longrightarrow\mathbb{R}^{t}$ such that:

• $d^{\prime}=O(1)$ ;

• for some vectors $\mathbf{f_{k}}\in\mathbb{R}^{d}$ that satisfy $\|\mathbf{f_{k}}\|_{\infty}=O_{\Psi}(1)$ for every $k$ , the map $\Psi^{\prime}$ is of the form

[TABLE]

for all $\mathbf{u}\in\mathbb{R}^{d}$ ;

• $\Psi^{\prime}$ is in $s$ -normal form, for some $s=O(1)$ .

Proof.

In [13, Lemma 4.4] this lemma was proved for a linear map over a $\mathbb{Q}$ -vector space. The proof over $\mathbb{R}$ is identical. Alternatively one can iterate [21, Proposition 6.7] over all $i\leqslant t$ . ∎

Remark 6.3.

In Lemma 6.2 one may take $s$ to be the Cauchy-Schwarz complexity of $\Psi$ . This notion will not be used in this paper, save for the ‘finite versus infinite’ dichotomy already given in Definition 5.5.

Part III Pseudorandomness

Notions of pseudorandomness are crucial to the theory of higher order Fourier analysis. A small Gowers norm is one such notion, as is satisfying the ‘linear forms condition’ of [11] and [13]. In this part we review what is known about Gowers norms in relation to the primes, and then formulate a ‘linear inequalities condition’, which will be the analogous notion of pseudorandomness for this paper.

7. The $W$ -trick and Gowers norms

To begin with, let us recall the definition of the Gowers norm over a cyclic group and over $[N]$ . Given a function $f:\mathbb{Z}/N\mathbb{Z}\longrightarrow\mathbb{C}$ , and a natural number $d$ , one defines the Gowers $U^{d}$ norm $\|f\|_{U^{d}(N)}$ to be the unique non-negative solution to the equation

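$$\|f\|_{U^{d}(N)}^{2^{d}}=\frac{1}{N^{d+1}}\sum_{x,\mathbf{h}}\prod_{\boldsymbol{\omega}\in\{0,1\}^{d}}\mathscr{C}^{|\boldsymbol{\omega}|}f(x+\mathbf{h}\cdot\boldsymbol{\omega}),\qquad\text{(7.1)}$$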

where $|\boldsymbol{\omega}|=\sum_{i}\omega_{i}$ , $\mathbf{h}=(h_{1},\cdots,h_{d})$ , $\mathscr{C}$ is the complex-conjugation operator, and the summation is over $x,h_{1},\cdots,h_{d}\in\mathbb{Z}/N\mathbb{Z}$ . It is not immediately obvious why the right-hand side of (7.1) is always a non-negative real, nor why the $U^{d}$ norms are genuine norms if $d\geqslant 2$ , but both facts are true. There are many expositions of the standard theory of these norms available in the literature, for example [19, Chapter 11] and [10]. For the most general treatment, the reader may consider Appendices B and C of [13].
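For small $N$ the definition can be evaluated directly. The sketch below assumes the standard normalisation $\|f\|_{U^{d}(N)}^{2^{d}}=N^{-(d+1)}\sum_{x,\mathbf{h}}\prod_{\boldsymbol{\omega}\in\{0,1\}^{d}}\mathscr{C}^{|\boldsymbol{\omega}|}f(x+\boldsymbol{\omega}\cdot\mathbf{h})$ used in [13]. The function name `gowers_norm` and the test functions are hypothetical, and the running time $O(2^{d}N^{d+1})$ makes this suitable only for toy inputs.

```python
import cmath
import math
from itertools import product


def gowers_norm(f, d):
    """Brute-force U^d norm of f on Z/NZ, with f given as a list of length N.

    Assumes the normalisation ||f||^(2^d) = N^-(d+1) * sum over x, h of the
    product over omega in {0,1}^d of C^|omega| f(x + omega . h).
    """
    N = len(f)
    total = 0
    for x in range(N):
        for h in product(range(N), repeat=d):
            term = 1
            for omega in product((0, 1), repeat=d):
                shift = sum(o * hi for o, hi in zip(omega, h))
                value = complex(f[(x + shift) % N])
                if sum(omega) % 2 == 1:  # conjugate when |omega| is odd
                    value = value.conjugate()
                term *= value
            total += term
    average = total / N ** (d + 1)
    # The average is a non-negative real in exact arithmetic; clamp rounding noise.
    return max(average.real, 0.0) ** (1.0 / 2 ** d)


# Hypothetical test functions on Z/5Z.
N_DEMO = 5
ONES = [1] * N_DEMO
LINEAR_PHASE = [cmath.exp(2j * math.pi * x / N_DEMO) for x in range(N_DEMO)]
QUADRATIC_PHASE = [cmath.exp(2j * math.pi * x * x / N_DEMO) for x in range(N_DEMO)]
```

With these inputs the linear phase has $U^{1}$ norm $0$ and $U^{2}$ norm $1$, while the quadratic phase has $U^{2}$ norm $N^{-1/4}$ (the product of the four shifted values collapses to $e(2h_{1}h_{2}/N)$) and $U^{3}$ norm $1$. This illustrates how each successive norm detects phases of one higher degree.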

In the sequel we will be considering functions defined on $[N]$ rather than on $\mathbb{Z}/N\mathbb{Z}$ . However, the Gowers norm of such functions may be easily defined by reference to the cyclic group case. Indeed, if $f:[N]\longrightarrow\mathbb{C}$ , and $d$ is a natural number, one chooses a natural number $N^{\prime}>N$ and then considers $[N]$ as an initial segment of $\mathbb{Z}/N^{\prime}\mathbb{Z}$ (viewing $[N^{\prime}]$ as a set of representative classes for $\mathbb{Z}/N^{\prime}\mathbb{Z}$ ). One then defines

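$$\|f\|_{U^{d}[N]}:=\frac{\|f1_{[N]}\|_{U^{d}(N^{\prime})}}{\|1_{[N]}\|_{U^{d}(N^{\prime})}},\qquad\text{(7.2)}$$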

which is independent of $N^{\prime}$ provided $N^{\prime}/N$ is large enough in terms of $d$ .

This is as much background as we will give here, and the reader is invited to consult the aforementioned references for more detail. A Gowers norm over $\mathbb{R}$ will also appear later on in this paper, but will be introduced in Section 13 as and when it is needed.

We move our consideration to the primes. Given some fixed modulus $q$ the primes are not uniformly distributed across arithmetic progressions modulo $q$ (as almost all the primes are coprime to $q$ ), and this lack of uniformity is an obstacle when trying to count solutions to equations in primes. Fortunately, there is a technical device, known as the $W$ -trick, that has long been used to manage this difficulty.
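The lack of uniformity, and the effect of passing to a progression, can be seen numerically. The sketch below uses the toy modulus $6$ in place of the growing modulus $W$, and the shift $b=1$; these choices, and all function names, are illustrative assumptions rather than the paper's parameters.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Cross out p*p, p*p + p, ... up to n.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [k for k in range(n + 1) if sieve[k]]


def residue_counts(values, q):
    """Number of values in each residue class modulo q."""
    counts = [0] * q
    for v in values:
        counts[v % q] += 1
    return counts


PRIMES = primes_up_to(120_001)

# Primes up to 10^4 modulo 6: only the classes 1 and 5 are populated,
# apart from the primes 2 and 3 themselves.
RAW_MOD_6 = residue_counts([p for p in PRIMES if p <= 10_000], 6)

# Toy W-trick with W = 6, b = 1: the set of n with 6n + 1 prime.
TRICKED = [(p - 1) // 6 for p in PRIMES if p % 6 == 1]
TRICKED_MOD_3 = residue_counts(TRICKED, 3)
```

In this toy setting the set of $n$ with $6n+1$ prime is close to equidistributed modulo $3$, but it is still biased modulo $5$, since $n\equiv 4\,(\text{mod }5)$ forces $5\mid 6n+1$. Removing the bias at every small prime is the reason the modulus $W$ is taken to be a product over all primes up to a slowly growing threshold.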

This device is usually introduced via the following function.

Definition 7.1.

Let $N$ be a natural number, and let $W$ be as in Section 4. For any natural number $b$ with $(b,W)=1$ , let $\Lambda_{b,W}^{\prime}:\mathbb{Z}\longrightarrow\mathbb{R}_{\geqslant 0}$ be defined by

[TABLE]

The idea from [13], going back to [9] and [11], is that the function

[TABLE]

should act as a proxy for $\Lambda^{\prime}$ , while each $\Lambda_{b,W}^{\prime}$ enjoys strong pseudorandomness properties. For example we have the following deep result, which is a crucial component of the proof of Theorem 1.1 on linear equations in primes.

Theorem 7.2.

[13, Theorem 7.2]

Let $N,s$ be natural numbers, and let $w^{*}:\mathbb{N}\longrightarrow\mathbb{R}_{\geqslant 0}$ be any function that satisfies $w^{*}(n)\longrightarrow\infty$ as $n\rightarrow\infty$ and $w^{*}(n)\leqslant\frac{1}{2}\log\log n$ for all $n$ . Let $b=b(N)$ be a natural number that satisfies $b\leqslant W^{*}$ and $(b,W^{*})=1$ . Then

[TABLE]

as $N\rightarrow\infty$ , where the $o(1)$ term may depend on the function $w^{*}$ chosen (but is independent of the choice of $b$ ).

We remind the reader that $s$ is a dimension parameter, and so dependence on $s$ is not denoted explicitly in our implied constants.

Remark 7.3.

In [13] Theorem 7.2 is proved conditionally, relying on two other conjectures. But, as we intimated in the introduction, these conjectures were later settled in joint work of Green-Tao and Green-Tao-Ziegler [14, 15, 16].

Remark 7.4.

We will use Theorem 7.2 to prove Theorem 1.16. Unfortunately it seems that this cannot be done in the same manner as in [13], i.e. by splitting $[N]$ into arithmetic progressions modulo $W$ at an early stage and then performing subsequent manipulations with the functions $\Lambda^{\prime}_{b,W}$ .

As a heuristic, instead of considering an inequality such as

[TABLE]

for some $L$ with irrational coefficients and some positive $\varepsilon$ , [13] considers (7.4) for some $L$ with rational coefficients and sets $\varepsilon$ equal to $0$ . Under those assumptions one may rescale the variables $\mathbf{n}$ by a factor of $W$ , as required in Definition 7.1, without fundamentally altering the problem. However, in the more general scenario of Theorem 1.16, where $\varepsilon$ is strictly positive, rescaling the variable $\mathbf{n}$ by a factor of $W$ means we must replace $\varepsilon$ by $\varepsilon/W$ , and we cannot afford this loss, as the manipulations in Section 13 lose some powers of $\varepsilon$ . As far as we have been able to tell, this means that we cannot perform the $W$ -trick in this manner.

To circumvent this issue of scaling, we will manipulate with the local von Mangoldt functions $\Lambda_{\mathbb{Z}/W\mathbb{Z}}$ throughout, saving our rescaling for the very end of the argument. Regarding the control on Gowers norms, the following lemma is therefore the more appropriate bound.

Lemma 7.5.

Let $N,s$ be natural numbers. Then

[TABLE]

as $N\rightarrow\infty$ .

The proof is a standard deduction from results of [13], achieved by splitting into arithmetic progressions modulo $W$ . We would however like to thank the anonymous referee for suggesting a simplification to our original argument.

Proof.

Let $(\psi_{\boldsymbol{\omega}})_{\boldsymbol{\omega}\in\{0,1\}^{s+1}}=\Psi:\mathbb{R}^{s+2}\longrightarrow\mathbb{R}^{2^{s+1}}$ denote the linear map giving the Gowers norm, i.e. where each $\psi_{\boldsymbol{\omega}}$ is of the form $\psi_{\boldsymbol{\omega}}(x,\mathbf{h})=x+\boldsymbol{\omega}\cdot\mathbf{h}$ . From expression (7.2), we then have

[TABLE]

where

[TABLE]

It is immediate that $|Z|\asymp N^{s+2}$ .

We now split into arithmetic progressions modulo $W$ . To this end let $A\subset[W]^{s+2}$ be the set defined by

[TABLE]

Then the right-hand side of (7.5) is

[TABLE]

plus an error of magnitude at most

[TABLE]

This error is $o(1)$ .

By the linearity of $\Psi$ , and recalling the definition of $\Lambda_{b,W}^{\prime}$ from Definition 7.1, we have that expression (7.6) is equal to

[TABLE]

Observe that

[TABLE]

where

[TABLE]

is the local factor associated to the system of forms $\Psi$ . Since $\Psi$ has finite Cauchy-Schwarz complexity, we have the bound $\beta_{W}=O(1)$ (by [13, Lemma 1.3]). This means that the lemma would follow from the bound

[TABLE]

for each fixed $\mathbf{a}\in A$ . What’s more, expression (7.7) is an immediate consequence of the Gowers-Cauchy-Schwarz inequality when combined with Theorem 7.2.

To spell out some of the details, let $M:=\lfloor N/W\rfloor$ and let $M^{\prime}>M$ be a natural number with $M^{\prime}/M$ large enough in terms of $s$ . Then, recalling the definition of the set $Z$ , the left-hand side of (7.7) is equal to

[TABLE]

Taking the $o(1)$ term as read, this is

[TABLE]

Now, by the Gowers-Cauchy-Schwarz inequality (see [19, Expression (11.6)]), expression (7.9) is at most

[TABLE]

By expression (7.2), this is bounded above by a constant times

[TABLE]

Expression (7.10) is directly amenable to Theorem 7.2, with the only wrinkle being the fact that Theorem 7.2 only applies to functions $\Lambda^{\prime}_{b,W}$ with $b\leqslant W$ . But this is easy to deal with. Indeed, for natural numbers $n$ and $k$ we have the identity

[TABLE]

and so one establishes that if $b$ is in the range $1\leqslant b\leqslant(s+2)W$ then

[TABLE]

where $b^{\prime}\in[W]$ and $b^{\prime}\equiv b\,(\text{mod }W)$ , and where the error term $E$ is at most a constant times

[TABLE]

We have $E=o(1)$ and therefore, by Theorem 7.2, expression (7.10) is $o(1)$ . The lemma follows. ∎

8. Inequalities in lattices

This section will be devoted to proving the following technical lemma. This is the only part of the paper in which we pay especial attention to the quantitative aspects of the smooth cut-off functions, as the lemma will be applied in contexts where the functions $F$ and $G$ depend on the asymptotic parameter $N$ (albeit tamely).

Lemma 8.1 (Inequalities in lattices).

Let $N,m,d,h$ be natural numbers, with $d\geqslant h\geqslant m+1$ , and let $\gamma$ be a positive constant. Suppose that $N>2^{\frac{1}{\gamma}}$ . Let $P$ be an additional set of parameters. Let $(\xi_{1},\dots,\xi_{d})=\Xi:\mathbb{R}^{h}\longrightarrow\mathbb{R}^{d}$ be an injective linear map with integer coefficients and let $L:\mathbb{R}^{h}\longrightarrow\mathbb{R}^{m}$ be a purely irrational surjective linear map with algebraic coefficients. Let $\mathbf{v}\in\mathbb{R}^{m}$ and let $\widetilde{\mathbf{r}}\in\mathbb{Z}^{d}$ . Let $e_{1},\dots,e_{d}\in\mathbb{N}$ and suppose that $e_{j}<N^{\gamma}$ for all $j$ . Let $F:\mathbb{R}^{h}\longrightarrow[0,1]$ and $G:\mathbb{R}^{m}\longrightarrow[0,1]$ be functions in $\mathcal{C}(P)$ . Then, provided that $\gamma$ is small enough in terms of $L$ , for all positive $K$ we have

[TABLE]

where $\alpha_{\mathbf{e},\widetilde{\mathbf{r}}}$ is the local factor

[TABLE]

Remark 8.2.

If $h=d$ and if $\Xi:\mathbb{R}^{h}\longrightarrow\mathbb{R}^{d}$ is the identity map, then the Chinese Remainder Theorem guarantees that $\alpha_{\mathbf{e},\widetilde{\mathbf{r}}}=(e_{1}\dots e_{d})^{-1}$ . In the general case, the local factors $\alpha_{\mathbf{e},\widetilde{\mathbf{r}}}$ are the same objects as those factors $\alpha_{m_{1},\dots,m_{t}}$ considered in [13, Page 1831].
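The remark can be checked on small examples. The sketch below assumes that the local factor is the density of those $\mathbf{n}\in\mathbb{Z}^{h}$ satisfying $e_{j}\mid\xi_{j}(\mathbf{n})+\widetilde{r}_{j}$ for every $j$; this reading of (8.2) is an assumption, as is the function name `local_factor`. The density is computed exactly by counting over one full period.

```python
from fractions import Fraction
from itertools import product
from math import lcm


def local_factor(xi, e, r):
    """Density of n in Z^h with e_j | xi_j(n) + r_j for all j.

    xi is a d x h integer matrix (row j holds the coefficients of xi_j),
    e is a list of d positive moduli, r a list of d integer shifts.
    The conditions are periodic modulo lcm(e), so counting over the box
    [0, lcm(e))^h gives the exact density.
    """
    h = len(xi[0])
    period = lcm(*e)
    hits = 0
    for n in product(range(period), repeat=h):
        if all(
            (sum(c * x for c, x in zip(row, n)) + rj) % ej == 0
            for row, ej, rj in zip(xi, e, r)
        ):
            hits += 1
    return Fraction(hits, period ** h)


# Identity map on R^2 (hypothetical example): the factor is 1 / (e_1 * e_2).
IDENTITY_CASE = local_factor([[1, 0], [0, 1]], [2, 3], [0, 1])

# Two parallel forms on R^1: the conditions are no longer independent.
PARALLEL_CASE = local_factor([[1], [1]], [2, 4], [0, 0])

# Incompatible congruences: no solutions, so the factor vanishes.
EMPTY_CASE = local_factor([[1], [1]], [2, 2], [0, 1])
```

The parallel example gives $1/4$ rather than $1/8$, which shows why the product formula is specific to the identity map, and the empty example corresponds to the case in which no $\mathbf{x}\in\mathbb{Z}^{h}$ satisfies the divisor conditions.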

Proof of Lemma 8.1.

We assume throughout that $\gamma$ is small enough in terms of $L$ , and that $K$ is large enough in terms of the dimensions $m$ , $d$ , and $h$ .

By applying Fourier inversion to $G$ , we see that the left-hand side of (8.1) is equal to

[TABLE]

To bound this integral, we split $\mathbb{R}^{m}$ into three ranges. Let $\eta$ be a small positive parameter to be chosen later, which we assume to be small enough in terms of $L$ . We then define the so-called ‘trivial arc’ by

[TABLE]

the ‘minor arc’ by

[TABLE]

and the ‘major arc’ by

[TABLE]

Trivial arc: By Lemma 3.4, $|\widehat{G}(\boldsymbol{\alpha})|\ll_{K,P}\|\boldsymbol{\alpha}\|_{\infty}^{-K}$ . Therefore, applying the trivial $O_{P}(N^{h})$ bound to the inner sum, we have

[TABLE]

Minor arc: Choose $\mathbf{x}\in\mathbb{Z}^{h}$ to satisfy the simultaneous divisor conditions $e_{j}|\xi_{j}(\mathbf{x})+\widetilde{r}_{j}$ for every $j\leqslant d$ . If there is no such $\mathbf{x}\in\mathbb{Z}^{h}$ then (8.1) is trivially true. Further, we may assume that $\mathbf{x}$ satisfies $\|\mathbf{x}\|_{\infty}\leqslant e_{1}\dots e_{d}$ . Let $\Gamma_{\Xi,\mathbf{e}}$ denote the lattice

[TABLE]

Then

[TABLE]

Using this reformulation, we apply Poisson summation (Lemma 3.6) to the inner sum of (8.3). Then the contribution to (8.3) from the minor arc $\mathfrak{m}$ is equal to

[TABLE]

where $\Gamma_{\Xi,\mathbf{e}}^{*}$ is the lattice that is dual to $\Gamma_{\Xi,\mathbf{e}}$ (see Definition 3.5).

We need the following obvious lemma.

Lemma 8.3.

There is a natural number $A$ , of size at most $O(N^{O(\gamma)})$ , such that $A\Gamma_{\Xi,\mathbf{e}}^{*}\subset\mathbb{Z}^{h}$ .

Proof.

There is an $h$ -dimensional sublattice of $\Gamma_{\Xi,\mathbf{e}}$ , namely $(e_{1}\dots e_{d}\mathbb{Z})^{h}$ . Therefore, we may choose a lattice basis for $\Gamma_{\Xi,\mathbf{e}}$ all of whose elements $\mathbf{b}$ satisfy $\|\mathbf{b}\|_{\infty}=O(N^{O(\gamma)})$ . Let $M$ be the $h$ -by- $h$ matrix that has these basis vectors as its columns. Then the columns of the matrix $(M^{T})^{-1}$ are a lattice basis for the dual lattice $\Gamma_{\Xi,\mathbf{e}}^{*}$ . The entries in $(M^{T})^{-1}$ are rational numbers with numerator and denominator at most $O(N^{O(\gamma)})$ . Clearing denominators, the lemma follows. ∎
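The construction in this proof is easy to carry out explicitly. The following sketch computes the dual basis $(M^{T})^{-1}$ in exact rational arithmetic and the least natural number that clears its denominators. It assumes the usual definition of the dual lattice as the set of vectors with integer inner product against every lattice point, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import lcm


def inverse(matrix):
    """Exact inverse of a square rational matrix by Gauss-Jordan elimination.

    Raises StopIteration if the matrix is singular.
    """
    n = len(matrix)
    aug = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dual_basis(M):
    """Matrix (M^T)^(-1), whose columns form a basis of the dual lattice."""
    transpose = [list(col) for col in zip(*M)]
    return inverse(transpose)


def clearing_integer(D):
    """Least natural number A such that A * D has integer entries."""
    return lcm(*(entry.denominator for row in D for entry in row))


# Hypothetical example: the lattice {n in Z^2 : 2 | n_1 + n_2},
# with basis vectors (1, 1) and (2, 0) as the columns of M.
M_DEMO = [[1, 2], [1, 0]]
DUAL_DEMO = dual_basis(M_DEMO)
A_DEMO = clearing_integer(DUAL_DEMO)
```

Here the dual basis vectors are $(0,1)$ and $(\tfrac{1}{2},-\tfrac{1}{2})$, and $A=2$ suffices. In the lemma the entries of $M$ have size $O(N^{O(\gamma)})$, so the same bound carries over to the determinant and hence to $A$.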

Let $\langle L^{T}\boldsymbol{\alpha}\rangle$ denote some $\mathbf{c}\in\Gamma_{\Xi,\mathbf{e}}^{*}$ that minimises the expression $\|\mathbf{c}-L^{T}\boldsymbol{\alpha}\|_{\infty}$ . We claim that the only term in (8.5) that cannot be easily absorbed into the error term comes from $\mathbf{c}=\langle L^{T}\boldsymbol{\alpha}\rangle$ .

Indeed, let $A$ be the quantity provided by Lemma 8.3, and let $\langle L^{T}\boldsymbol{\alpha}\rangle_{2}$ denote the second closest point to $L^{T}\boldsymbol{\alpha}$ in the lattice $\Gamma_{\Xi,\mathbf{e}}^{*}$ . If more than one such point exists, choose arbitrarily. Then

[TABLE]

where $a$ is some positive constant which depends only on $h$ . By the triangle inequality and dyadic pigeonholing, one then has

[TABLE]

By Lemma 8.3 we also have the estimate

[TABLE]

which holds for all $R>0$ . Using (8.8), Lemma 3.4, and the bound $A=O(N^{O(\gamma)})$ , the quantity (8.7) is seen to be

[TABLE]

This implies that the contribution from these lattice points to (8.5) is at most

[TABLE]

Since $\gamma$ and $\eta$ are small enough, this bound is

[TABLE]

which may be absorbed into the error term of (8.1) after adjusting the implied constant appropriately.

It remains to estimate

[TABLE]

We have the following key lemma.

Lemma 8.4.

Under the assumption that $\eta$ and $\gamma$ are suitably small in terms of $L$ ,

[TABLE]

Remark 8.5.

The proof of this lemma uses the algebraicity of the coefficients of $L$ . One should note that the bound (8.14) below, which holds for matrices with algebraic coefficients, also holds for almost all matrices. It is this fact which ultimately leads to our observation in the introduction that the main theorems of this paper hold for almost all matrices (as well as for matrices with algebraic coefficients, as stated).

Proof.

Certainly, by rescaling $\boldsymbol{\alpha}$ and using Lemmas 3.4 and 8.3,

[TABLE]

The quantity

[TABLE]

encodes information about diophantine approximations to the coefficients of $L$ . For example, since $L$ is purely irrational, by definition (see Definition 5.2) we have $L^{T}\boldsymbol{\beta}\notin\mathbb{Z}^{h}$ for any $\boldsymbol{\beta}\neq\mathbf{0}$ . Therefore, since the function

[TABLE]

is continuous, (8.13) is always non-zero. We will need a quantitative refinement of this fact.
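To indicate the flavour of such a quantitative refinement, here is a toy one-dimensional instance, which is not the approximation function from [21]. For the algebraic number $\sqrt{2}$, writing $p$ for the nearest integer to $q\sqrt{2}$, one has $|q\sqrt{2}-p|=|2q^{2}-p^{2}|/(q\sqrt{2}+p)\geqslant 1/(q\sqrt{2}+p)$, which gives $\|q\sqrt{2}\|>1/(3q)$ for every natural number $q$, where $\|\cdot\|$ denotes the distance to the nearest integer. The sketch below checks this numerically; the function names are hypothetical.

```python
import math


def dist_to_nearest_integer(x):
    """Distance from the real number x to the nearest integer."""
    return abs(x - round(x))


def min_scaled_distance(alpha, Q):
    """Minimum of q * ||q * alpha|| over natural numbers q <= Q.

    A positive lower bound that is uniform in Q expresses that alpha is
    badly approximable; a value of 0 means some q * alpha is an integer.
    """
    return min(q * dist_to_nearest_integer(q * alpha) for q in range(1, Q + 1))


# Quadratic irrational: the minimum stays above 1/3 (attained near q = 2).
SQRT2_BOUND = min_scaled_distance(math.sqrt(2), 10_000)

# Rational number: the minimum is 0, attained at q = 2.
RATIONAL_BOUND = min_scaled_distance(0.5, 10)
```

For general algebraic coefficients one cannot expect a lower bound as strong as $1/(3q)$, but a power-type lower bound in terms of the height of $\boldsymbol{\beta}$ is the kind of statement that is combined with the bound $A=O(N^{O(\gamma)})$ in the argument that follows.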

Fortunately, in [21] we extensively analysed expressions such as (8.13). Consider Definition 2.8 of [21] in particular, in which we defined the approximation function $A_{L}$ . (We stress that the notation $A_{L}$ is unrelated to the parameter $A$ from this section.) In this language, (8.13) is equal to

[TABLE]

Therefore, since $L$ is purely irrational and has algebraic coefficients, Lemma E.1 of [21] tells us that

[TABLE]

Since $\eta$ and $\gamma$ are small enough in terms of $L$ , and since $A=O(N^{O(\gamma)})$ , (8) implies that

[TABLE]

as claimed. ∎

The lemma above implies that (8.11) has size

[TABLE]

which is thus our bound for the total contribution from the minor arc $\mathfrak{m}$ .

Major arc: Performing the same Poisson summation argument as in the minor arc case, the main term on the left-hand side of (8.1) is equal to

[TABLE]

For $\boldsymbol{\alpha}\in\mathfrak{M}$ one has $\|L^{T}\boldsymbol{\alpha}\|_{\infty}\ll_{L}N^{-1+\eta}$ , and so $\langle L^{T}\boldsymbol{\alpha}\rangle=\mathbf{0}$ . Therefore (8.16) is equal to

[TABLE]

Since $L^{T}:\mathbb{R}^{m}\longrightarrow\mathbb{R}^{h}$ is injective, one has $\|L^{T}\boldsymbol{\alpha}\|_{\infty}\gg_{L}\|\boldsymbol{\alpha}\|_{\infty}$ . Therefore (8.17) is equal to

[TABLE]

which, after the obvious manipulations, equals

[TABLE]

Fixing suitably small $\eta$ and $\gamma$ , and combining the contribution from the trivial, minor, and major arc, we deduce that

[TABLE]

By adjusting the implied constant appropriately, the error term in the preceding estimate is $O_{K,L,P}(N^{-K})$ for all positive real $K$ . The final observation is that, considering the definition of the local factor $\alpha_{\mathbf{e},\widetilde{\mathbf{r}}}$ in (8.2) and the fact that we assumed $\alpha_{\mathbf{e},\widetilde{\mathbf{r}}}$ is non-zero,

[TABLE]

The lemma follows. ∎

The following estimate will also be needed.

Lemma 8.6.

Under the same hypotheses as Lemma 8.1, for all positive $K$

[TABLE]

where $\alpha_{\mathbf{e},\widetilde{\mathbf{r}}}$ is as in (8.2).

Proof.

By applying Poisson summation, the left-hand side of (8.19) is equal to

[TABLE]

where $\mathbf{x}$ and $\Gamma^{*}_{\Xi,\mathbf{e}}$ are as in (8.5). By applying estimates (8.7) and (8.9), one shows that the main term of (8.19) comes from the $\mathbf{c}=\mathbf{0}$ term above. After the obvious manipulations, this concludes the lemma. ∎

9. The linear inequalities condition

In [13], the key notion of pseudorandomness is the so-called ‘linear forms condition’ (see Definition 6.2 of that paper). The upshot is that in order to understand the number of solutions to a particular linear equation in primes, it is enough to understand the number of solutions to certain auxiliary linear equations weighted by a sieve weight $\nu$ . In this paper an analogous philosophy holds. Indeed we will show that, in order to understand the number of solutions to a particular linear inequality in primes, it is enough to understand the number of solutions to certain auxiliary linear inequalities weighted by a sieve weight $\nu$ .

Let us proceed with the formal definition. The reader is reminded that $W_{j}=\prod_{p\leqslant w_{j}(N)}p$ (see Section 4).

Definition 9.1 (Linear inequalities condition).

Let $m,d$ be natural numbers, and let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a linear map. For each natural number $N$ , let $\nu_{N}:\mathbb{Z}\longrightarrow\mathbb{R}$ be a function. We say that the family of functions $(\nu_{N})_{N=1}^{\infty}$ is $(L,w)$ -pseudorandom if the following holds. For all positive constants $C$ , for all sets of parameters $P$ , for all compactly supported smooth functions $F:\mathbb{R}^{d}\longrightarrow[0,1]$ and $G:\mathbb{R}^{m}\longrightarrow[0,1]$ such that $F,G\in\mathcal{C}(P)$ , for all functions $w_{1},\dots,w_{d}:\mathbb{N}\longrightarrow\mathbb{R}_{\geqslant 0}$ that each satisfy $w_{j}(n)\rightarrow\infty$ as $n\rightarrow\infty$ and $w_{j}(n)\leqslant w(n)$ for all $n$ , for all $\mathbf{v}\in\mathbb{R}^{m}$ satisfying $\|\mathbf{v}\|_{\infty}\leqslant CN$ , and for all functions $f_{1},\dots,f_{d}:\mathbb{Z}\longrightarrow\mathbb{R}$ such that each $f_{j}$ equals either $\nu_{N}$ or $\Lambda_{\mathbb{Z}/W_{j}\mathbb{Z}}$ , we have

[TABLE]

as $N\rightarrow\infty$ , where the $o(1)$ term may depend on the family $(\nu_{N})_{N=1}^{\infty}$ , on $C$ , $L$ , $P$ , and on the functions $w_{1},\dots,w_{d}$ .

Remark 9.2.

Equation (9.1) might seem to be a slightly curious formulation of a pseudorandomness principle, as it does not claim that the weight $\nu_{N}$ behaves like the constant $1$ function but rather behaves like the local von Mangoldt function. However, referring to Remark 7.4, let us reiterate the comment that we are not performing the $W$ -trick in the same manner as [13].

The aim of this section is to introduce a sieve weight $\nu_{N,w}^{\gamma}$ , and to prove that it is $(L,w)$ -pseudorandom for a large class of linear maps $L$ . We begin by introducing the sieve weight from [13, Appendix D].

Definition 9.3 (Smooth sieve weight).

Let $N$ be a natural number, $\gamma$ be a positive real, and define $R:=N^{\gamma}$ . Let $\rho\in\mathcal{C}(\emptyset)$ be the smooth $1$ -supported function fixed in Section 4. Define the function $\Lambda_{\rho,R,2}:\mathbb{Z}\longrightarrow\mathbb{R}_{\geqslant 0}$ by the formula

[TABLE]

for non-negative integers $n$ , and then by the obvious extension to negative integers.

We now define the family of majorants themselves.

Definition 9.4 (Pseudorandom majorant).

Let $N$ be a natural number, let $\gamma$ be a positive real, and let $R:=N^{\gamma}$ . Define the constant

[TABLE]

Then define the weight $\nu_{N,w}^{\gamma}:\mathbb{Z}\rightarrow\mathbb{R}_{\geqslant 0}$ by

[TABLE]

Note that $\nu_{N,w}^{\gamma}$ also depends on $\rho$ , but we suppress that dependence from the notation (as we fixed $\rho$ in Section 4).

We now state our main new result on the pseudorandomness of this sieve weight.

Theorem 9.5 (Pseudorandomness of sieve weights).

Let $m,d$ be natural numbers, with $d\geqslant m+2$ . Let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a surjective linear map, and suppose that $L\notin V_{\operatorname{degen}}^{*}(m,d)$ and that the coefficients of $L$ are algebraic. Assume that $\gamma$ is a positive parameter that is small enough in terms of $L$ . Then $\nu_{N,w}^{\gamma}$ is $(L,w)$ -pseudorandom.

Temporarily dropping the convention that $w(N)=\max(1,\log\log\log N)$ , we speculate that the following general result holds.

Conjecture 9.6 (Pseudorandomness conjecture).

Let $m,d$ be natural numbers, with $d\geqslant m+2$ . Let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a surjective linear map, and suppose that $L\notin V^{*}_{\operatorname{degen}}(m,d)$ . Then there is some value of $\gamma$ and some function $w:\mathbb{N}\longrightarrow\mathbb{R}_{\geqslant 0}$ , satisfying $w(N)\rightarrow\infty$ as $N\rightarrow\infty$ , for which $\nu_{N,w}^{\gamma}$ is $(L,w)$ -pseudorandom.

Unfortunately we have not been able to resolve Conjecture 9.6, but we strongly believe it to be true. If $d$ is large enough in terms of $m$ then the analytic methods of Parsell (see [18] and Appendix B) can be used to show that $\nu_{N,w}^{\gamma}$ is $(L,w)$ -pseudorandom without any algebraicity assumptions. But these methods seem harder to apply in the range $d\geqslant m+2$ , and we have not been able to establish the appropriate mean value estimate. Resolving Conjecture 9.6 would, after a straightforward adaptation of the methods of this paper, enable one to remove the algebraicity assumption from Theorem 1.7 and Theorem 1.16.

Remark 9.7.

The proof of Theorem 9.5 is the only moment during the proof of the main theorems Theorem 1.7 and Theorem 1.16 when we use the fact that the coefficients of the original linear map $L$ are algebraic. Furthermore, we will ultimately only ever appeal to the linear inequalities condition for a certain finite collection of linear maps, which includes the original linear map $L$ itself as well as some auxiliary linear maps that are generated from applications of the Cauchy-Schwarz inequality. Since only the diophantine approximation properties of algebraic numbers are used (witness Lemma 8.4 and [21, Lemma E.1]), and since these properties are satisfied by almost all real numbers, one may show that Theorems 1.7 and 1.16 remain true for some explicit set of maps $L$ that has full Lebesgue measure.

To demonstrate our approach to proving Theorem 9.5, we first give the argument under the simplifying additional assumption that $L$ is purely irrational (see Definition 5.2).

Lemma 9.8.

Suppose that $F$ , $G$ , $L$ , $\mathbf{v}$ and the functions $w_{1},\dots,w_{d}$ all satisfy the conditions in Definition 9.1. Suppose in addition that $L$ is surjective, purely irrational, and has algebraic coefficients. Then for all positive $K$ we have

[TABLE]

where $J$ is the singular integral

[TABLE]

Proof.

We have the identity

[TABLE]

Then the expression $T_{F,G,N}^{L,\mathbf{v}}(\Lambda_{\mathbb{Z}/W_{1}\mathbb{Z}},\dots,\Lambda_{\mathbb{Z}/W_{d}\mathbb{Z}})$ is equal to

[TABLE]

by applying Lemma 8.1 to the inner sum, where in the statement of that lemma we take $h=d$ , the map $\Xi:\mathbb{R}^{h}\longrightarrow\mathbb{R}^{d}$ to be the identity, and $\widetilde{\mathbf{r}}=\mathbf{0}$ . The local factor $\alpha_{\mathbf{e},\widetilde{\mathbf{r}}}$ is equal to $\prod_{j\leqslant d}e_{j}^{-1}$ in this instance.

Sum the error term in the expression above over all $e_{j}$ . The bound $W_{j}=O(\log\log N)$ that comes from the prime number theorem controls the resulting error term (with room to spare), and the main term of (9.3) follows from the identity

[TABLE]

∎
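The divisor-sum manipulation in this proof can be tested on a small modulus. The sketch below assumes that $\Lambda_{\mathbb{Z}/W\mathbb{Z}}(n)$ equals $W/\varphi(W)$ when $(n,W)=1$ and $0$ otherwise, and that the identity (9.5) is the Möbius expansion $\Lambda_{\mathbb{Z}/W\mathbb{Z}}(n)=\frac{W}{\varphi(W)}\sum_{e\mid(n,W)}\mu(e)$. Both are assumptions about displays not reproduced here, and the function names are hypothetical.

```python
from fractions import Fraction
from math import gcd


def mobius(n):
    """Moebius function of a positive integer, by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:  # p^2 divided the original n
                return 0
            result = -result
        p += 1
    if n > 1:  # one remaining prime factor
        result = -result
    return result


def euler_phi(n):
    """Euler totient, by direct counting (fine for small n)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def local_von_mangoldt(n, W):
    """Assumed definition: W / phi(W) if gcd(n, W) = 1, else 0."""
    return Fraction(W, euler_phi(W)) if gcd(n, W) == 1 else Fraction(0)


def local_von_mangoldt_divisor_sum(n, W):
    """The same function written as a sum of mu(e) over e dividing gcd(n, W)."""
    g = gcd(n, W)
    total = sum(mobius(e) for e in range(1, g + 1) if g % e == 0)
    return Fraction(W, euler_phi(W)) * total


W_DEMO = 30  # toy value; in the paper W_j grows slowly with N
PERIOD_AVERAGE = sum(local_von_mangoldt(n, W_DEMO) for n in range(1, W_DEMO + 1)) / W_DEMO
```

Writing the weight as a divisor sum is what allows the sum over $\mathbf{n}$ to be moved inside, so that each choice of divisors $e_{1},\dots,e_{d}$ leads to a lattice-point count of the kind handled by Lemma 8.1.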

To finish the proof of Theorem 9.5 (in the case when $L$ is purely irrational, that is) it now suffices to show that

[TABLE]

where each $f_{j}$ is either $\nu_{N,w}^{\gamma}$ or $\Lambda_{\mathbb{Z}/W_{j}\mathbb{Z}}$ . By multiplying out the left-hand side of (9.8), we see that it is sufficient to prove that

[TABLE]

where each $\nu_{j}$ equals either $c_{\rho,2}^{-1}\Lambda_{\rho,R,2}$ or $\Lambda_{\mathbb{Z}/W_{j}\mathbb{Z}}$ (recall $R:=N^{\gamma}$ ).

After our analysis in Lemma 8.1, it turns out that the estimate (9.9) will follow almost immediately from the sieve calculation performed in [13, Theorem D.3]. To describe the details, it will be useful to introduce the following notation. Let

[TABLE]

and

[TABLE]

We may assume that $S\neq\emptyset$ , as otherwise the estimate (9.9) follows from the estimate (9.3).

Each $\nu_{j}$ may be expressed as a divisor sum, either using Definition 9.3 or expression (9.5). Doing this, and swapping orders of summations, we have that the left-hand side of expression (9.9) is equal to

[TABLE]

where $R=N^{\gamma}$ , and if $j\in S$ we write $e_{j}$ for the least common multiple $[e_{j,1},e_{j,2}]$ . Using the compact support of the function $\rho$ , when analysing the inner sum one may assume that each $e_{j}$ is at most $N^{2\gamma}$ .

We apply Lemma 8.1. Therefore, provided $\gamma$ is small enough,

[TABLE]

as in the proof of Lemma 9.8. By the bounds on $W_{j}$ and $e_{j}$ , the error term from (9.11) may be summed over all $e_{j}$ and remain acceptable. We also have the identity

[TABLE]

Therefore expression (9.9) would follow from the asymptotic

[TABLE]

But this is just expression D.4 of [13], applied to the identity map $\Psi:\mathbb{R}^{|S|}\longrightarrow\mathbb{R}^{|S|}$ . Note that the quantity $X$ in expression D.4 of [13] is zero, as if $\psi_{1},\dots,\psi_{|S|}:\mathbb{R}^{|S|}\longrightarrow\mathbb{R}$ are the linear maps given by $\psi_{j}(\mathbf{x}):=x_{j}$ for all $j\leqslant|S|$ then there are no primes $p$ for which there exist two forms $\psi_{i}$ and $\psi_{j}$ that are linearly dependent modulo $p$ . This proves (9.9), and hence resolves Theorem 9.5 in the case when $L$ is purely irrational.

We now present the detailed proof of Theorem 9.5 in full generality.

Proof of Theorem 9.5.

Let $u$ be the rational dimension of $L$ (see Definition 5.2). Apply Lemma 5.6 to both the expression $T_{F,G,N}^{L,\mathbf{v}}(f_{1},\dots,f_{d})$ and the expression

$T_{F,G,N}^{L,\mathbf{v}}(\Lambda_{\mathbb{Z}/W_{1}\mathbb{Z}},\dots,\Lambda_{\mathbb{Z}/W_{d}\mathbb{Z}})$ . Writing $h:=d-u$ , where $u$ is the rational dimension of $L$ , and renaming $m-u$ as $m$ , $L^{\prime}$ as $L$ , $\mathbf{v^{\prime}}$ as $\mathbf{v}$ , and $G_{\widetilde{\mathbf{r}}}$ as $G$ , we see that it suffices to prove the following theorem.

Theorem 9.9.

Let $N,d,h$ be natural numbers, and let $m$ be a non-negative integer. Suppose that $d\geqslant h\geqslant m+2$ . Let $C,\gamma$ be positive parameters, and let $P$ be a set of additional parameters. Let $L:\mathbb{R}^{h}\longrightarrow\mathbb{R}^{m}$ be a surjective purely irrational linear map with algebraic coefficients, and let $\Xi:\mathbb{R}^{h}\longrightarrow\mathbb{R}^{d}$ be an injective linear map with integer coefficients. Assume that $\gamma$ is small enough in terms of $L$ . Let $\mathbf{v}\in\mathbb{R}^{m}$ be a vector with $\|\mathbf{v}\|_{\infty}\leqslant CN$ , and let $\widetilde{\mathbf{r}}\in\mathbb{Z}^{d}$ be a vector with $\|\widetilde{\mathbf{r}}\|_{\infty}\leqslant CN$ . Let $F:\mathbb{R}^{d}\longrightarrow[0,1]$ and $G:\mathbb{R}^{m}\longrightarrow[0,1]$ be in $\mathcal{C}(P)$ . Let $w_{1},\dots,w_{d}:\mathbb{N}\longrightarrow\mathbb{R}_{\geqslant 0}$ be functions that each satisfy $w_{j}(n)\rightarrow\infty$ as $n\rightarrow\infty$ and $w_{j}(n)\leqslant w(n)$ for all $n$ .

These conditions will be referred to as ‘the hypotheses of Theorem 9.9’.

Then, if $\Xi$ has finite Cauchy-Schwarz complexity,

[TABLE]

where each $f_{j}$ equals either $\nu_{N,w}^{\gamma}$ or $\Lambda_{\mathbb{Z}/W_{j}\mathbb{Z}}$ .

Proof of Theorem 9.9.

Let $\Xi:\mathbb{R}^{h}\longrightarrow\mathbb{R}^{d}$ have coordinate maps $(\xi_{1},\dots,\xi_{d})$ . Let

[TABLE]

be the singular series, where $\widetilde{r_{j}}$ denotes the $j^{th}$ coordinate of $\widetilde{\mathbf{r}}$ . Let

[TABLE]

be the singular integral.

Lemma 9.10.

Under the hypotheses of Theorem 9.9, if $\Xi$ has finite Cauchy-Schwarz complexity then the singular series and singular integral satisfy the bounds

[TABLE]

and

[TABLE]

The reader may find the definition of $\operatorname{Rad}(F)$ and $\operatorname{Rad}(G)$ in Section 3.

Proof.

Since $\Xi$ has finite Cauchy-Schwarz complexity, no two of the forms $\xi_{1},\dots,\xi_{d}$ are parallel. Hence by [13, Lemma 1.3] the singular series $\mathfrak{S}_{\widetilde{\mathbf{r}}}$ converges, and the size may be bounded by a constant depending only on $\Xi$ .

The bound on $J_{\widetilde{\mathbf{r}}}$ follows directly from Lemma A.1. ∎

We continue with the following lemma, which is a more general version of Lemma 9.8.

Lemma 9.11.

Under the hypotheses of Theorem 9.9 we have, for every positive real $K$ ,

[TABLE]

If $\Xi$ has finite Cauchy-Schwarz complexity, then

[TABLE]

Proof.

We proceed as in the proof of Lemma 9.8. Then $T_{F,G,N}^{L,\mathbf{v},\Xi,\widetilde{\mathbf{r}}}(\Lambda_{\mathbb{Z}/W_{1}\mathbb{Z}},\dots,\Lambda_{\mathbb{Z}/W_{d}\mathbb{Z}})$ is equal to

[TABLE]

by applying Lemma 8.1 to the inner sum, where

[TABLE]

If $m=0$ , one should apply Lemma 8.6 in place of Lemma 8.1.

By using the identity (9.5) again one obtains

[TABLE]

This settles the first part of the lemma.

For the second part, by the Chinese Remainder Theorem we have that (9) is equal to

[TABLE]

where $\prod^{*}$ denotes the product over those $j\leqslant d$ for which $p\leqslant w_{j}$ .

Since $\Xi$ has finite Cauchy-Schwarz complexity, no two of the forms $\xi_{i}$ and $\xi_{j}$ are parallel. Therefore we may apply the analysis of local factors in [13, Lemma 1.3] to conclude that the first bracket in (9) is equal to $\mathfrak{S}_{\widetilde{\mathbf{r}}}(1+O_{\Xi}((\min_{j}w_{j}(N))^{-1}))$ , and that the second bracket is equal to $(1+O_{\Xi}((\min_{j}w_{j}(N))^{-1}))$ . Combining these bounds with Lemma 9.10 gives the second part of the present lemma. ∎

Remark 9.12.

As we intimated earlier, in Remark 1.17, one can use Lemma 9.11 to establish an asymptotic expression for $T_{F,G,N}^{L,\mathbf{v}}(\Lambda_{\mathbb{Z}/W\mathbb{Z}},\dots,\Lambda_{\mathbb{Z}/W\mathbb{Z}})$ in the general case. Indeed, one applies the rational parametrisation process of Lemma 5.6 and then the asymptotic in Lemma 9.11 to obtain

[TABLE]

Now, Theorem 9.9 will be settled if we can show that the left-hand side of (9.14) enjoys the same asymptotic expression as the one present in (9.18). By multiplying out the left-hand side of (9.14), we see that it is sufficient to prove the following lemma.

Lemma 9.13.

Under the hypotheses of Theorem 9.9, if $\Xi$ has finite Cauchy-Schwarz complexity then

[TABLE]

where each $\nu_{j}$ equals either $c_{\rho,2}^{-1}\Lambda_{\rho,R,2}$ or $\Lambda_{\mathbb{Z}/W_{j}\mathbb{Z}}.$

Proof of Lemma 9.13.

The first half of the proof of this lemma comprises manipulations that are very similar to those that have appeared previously in this section. Indeed, as before, it will be useful to let

[TABLE]

and

[TABLE]

We may assume that $S\neq\emptyset$ , as otherwise the estimate (9.22) follows from Lemma 9.11.

Considering (9.5) again, and expressing each $\nu_{j}$ as a divisor sum, we have that the left-hand side of expression (9.22) is equal to

[TABLE]

where if $j\in S$ we write $e_{j}$ for the least common multiple $[e_{j,1},e_{j,2}]$ . Using the compact support of the function $\rho$ , when analysing the inner sum one may assume that each $e_{j}$ is at most $N^{2\gamma}$ .

We apply Lemma 8.1 (or, if $m=0$ , we apply Lemma 8.6). Therefore

[TABLE]

where

[TABLE]

Therefore expression (9.22) (and hence the entirety of Theorem 9.9) would follow from the asymptotic expression

[TABLE]

Note that this expression concerns linear forms with integer coefficients. We have removed the irrational information entirely.

Expression (9) follows from the sieve calculation [13, Theorem D.3], after restricting to suitable arithmetic progressions. Indeed, let

[TABLE]

Then the left-hand side of (9) is equal to

[TABLE]

The expression following the summation in $\mathbf{a}$ is amenable to the estimate (D.4) from [13], applied with $t=|S|$ and affine linear forms

[TABLE]

In order to apply this estimate we note first that $t\neq 0$ (since we have previously assumed that $S\neq\emptyset$ ). We also note again that, by the finite Cauchy-Schwarz complexity assumption, no two of the forms $\psi_{j}$ are rational multiples of each other.

So, applying the estimate (D.4) from [13] we have that the expression in (9) following the summation in $\mathbf{a}$ is equal to

[TABLE]

where

[TABLE]

and

[TABLE]

where $P_{\Xi}$ is the set of ‘exceptional’ primes, i.e. those primes $p$ for which there exist $i$ and $j$ for which the forms $\xi_{i}(W\mathbf{m}+\mathbf{a})+\widetilde{r}_{i}$ and $\xi_{j}(W\mathbf{m}+\mathbf{a})+\widetilde{r}_{j}$ are affinely related modulo $p$ .

Remark 9.14.

The reader may have noticed that expression (9.28) is not exactly what was proved in estimate (D.4) of [13]. Rather than having an error term depending on $\Xi$ and $C$ , that expression has an error term depending on the linear maps $\mathbf{m}\mapsto\xi_{j}(W\mathbf{m}+\mathbf{a})+\widetilde{r}_{j}$ which, one notes, have coefficients that depend on $W$ and that are therefore unbounded. Fortunately, the dependence of the error term on the size of the coefficients is only polynomial, and so any contribution from powers of $W$ may be absorbed into the $\log^{-\frac{1}{20}}R$ factor.

This technical manoeuvre is also required in [13] (in the application of Theorem D.3 that follows expression (D.24)), although it is not explicitly stated by the authors.

Following on from (9.28) and assuming that $N$ is large enough in terms of $\Xi$ , we see that any $p\in P_{\Xi}$ satisfies $p\leqslant w$ (as $\Xi$ has finite Cauchy-Schwarz complexity). Since $w(N)=\max(1,\log\log\log N)$ , the error in (9.28) is therefore $O_{C,\Xi,\gamma}(\log^{-\Omega(1)}N)$ . Furthermore, by [13, Lemma 1.3] we have $\beta_{p,\mathbf{a}}=1+O(p^{-2})$ , and so $\prod\limits_{p>w}\beta_{p,\mathbf{a}}=1+O(w^{-1})$ . Finally, if $p\leqslant w$ then

[TABLE]

Therefore expression (9), up to an error term of $O_{C,\Xi,\gamma}(w^{-1/2})$ , is equal to

[TABLE]

where

[TABLE]

where $\prod^{*}$ denotes the product over all $j\in S^{\prime}$ for which $p\leqslant w_{j}$ .

By invoking [13, Lemma 1.3] again we conclude that $\widetilde{\beta_{p}}=1+O(p^{-2})$ and also that the first part of expression (9) is equal to $c_{\rho,2}^{|S|}\mathfrak{S}_{\widetilde{\mathbf{r}}}(1+O_{C,\Xi}(\min_{j}w_{j}^{-1}))$ . Hence, as in the conclusion of the proof of Lemma 9.11, we conclude that expression (9) is equal to $c_{\rho,2}^{|S|}\mathfrak{S}_{\widetilde{\mathbf{r}}}+O_{C,\Xi,\gamma}(\min_{j}w_{j}^{-1/2})$ . This establishes expression (9), and so Lemma 9.13 is proved. ∎

Therefore Theorem 9.9 is resolved. ∎

Hence Theorem 9.5 is settled as well, i.e. we conclude that the weight $\nu_{N,w}^{\gamma}$ is $(L,w)$ -pseudorandom. ∎

We finish this section by noting a corollary of the theorems above, which will be useful in its own right.

Corollary 9.15 (Upper bound for linear inequalities).

Let $N,m,d$ be natural numbers, with $d\geqslant m+2$ , and let $C,\varepsilon,\gamma$ be positive reals. Let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a surjective linear map with algebraic coefficients, and suppose that $L\notin V_{\operatorname{degen}}^{*}(m,d)$ . Let $u$ be the rational dimension of $L$ . Let $w_{1},\dots,w_{d}:\mathbb{N}\longrightarrow\mathbb{R}_{\geqslant 0}$ be functions that satisfy $w_{j}(n)\rightarrow\infty$ as $n\rightarrow\infty$ for all $j$ and satisfy $w_{j}(n)\leqslant w(n)$ for all $j$ and for all $n$ . If $\gamma$ is small enough in terms of $L$ , then for all functions $F:\mathbb{R}^{d}\longrightarrow[0,1]$ supported on $[-C,C]^{d}$ , for all functions $G:\mathbb{R}^{m}\longrightarrow[0,1]$ supported on $[-\varepsilon,\varepsilon]^{m}$ , and for all $\mathbf{v}\in\mathbb{R}^{m}$ satisfying $\|\mathbf{v}\|_{\infty}\leqslant CN$ , one has

[TABLE]

as $N\rightarrow\infty$ , where each $f_{j}$ equals either $\nu_{N,w}^{\gamma}$ or $\Lambda_{\mathbb{Z}/W_{j}\mathbb{Z}}$ . The $o(1)$ term may depend on $C$ , $L$ , $\varepsilon$ , $\gamma$ , and the choice of functions $w_{1},\dots,w_{d}$ .

Proof.

Using Lemma 3.1, replace both $F$ and $G$ by compactly supported smooth majorants $F_{1}$ and $G_{1}$ for which

[TABLE]

and

[TABLE]

We have $F_{1}\in\mathcal{C}(C_{1})$ and $G_{1}\in\mathcal{C}(\varepsilon)$ . Then, by Theorem 9.5,

[TABLE]

where the error term may depend on $C$ , $L$ , $\varepsilon$ , $\gamma$ , and the functions $w_{1},\dots,w_{d}$ .

In Remark 9.12, we noted that

[TABLE]

where the error term depends on the parameters mentioned above, and where $\mathfrak{S}_{\widetilde{\mathbf{r}}}$ and $J_{\widetilde{\mathbf{r}}}$ are of the form (9.15) and (9.16). The corollary then follows from the bounds in Lemma 9.10. ∎

This result is to be compared with the following statement.

Lemma 9.16 (Weak upper bound).

Let $N,m,d$ be natural numbers, with $d\geqslant m$ , and let $C,\varepsilon$ be positive parameters. Let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a surjective linear map. Then, for all functions $F:\mathbb{R}^{d}\longrightarrow[0,1]$ supported on $[-C,C]^{d}$ , for all functions $G:\mathbb{R}^{m}\longrightarrow[0,1]$ supported on $[-\varepsilon,\varepsilon]^{m}$ , for all $\mathbf{v}\in\mathbb{R}^{m}$ , and for all functions $f_{1},\dots,f_{d}:\mathbb{Z}\longrightarrow\mathbb{R}$ ,

[TABLE]

The bound in Lemma 9.16 is weaker than the bound in Corollary 9.15, but has the advantage of holding for all surjective maps $L$ , which is a situation that will be needed later.

Proof.

This is essentially identical to Lemma 3.2 of [21]. Indeed, one sees immediately that

[TABLE]

Since $L$ is surjective, without loss of generality we may assume that the first $m$ columns of $L$ form an invertible matrix. If the variables $n_{m+1}$ to $n_{d}$ are fixed, there are only $O_{\varepsilon,L}(1)$ possible choices for $n_{1},\dots,n_{m}$ for which the inequality $\|L\mathbf{n}+\mathbf{v}\|_{\infty}\leqslant\varepsilon$ is satisfied. Summing over $n_{m+1}$ to $n_{d}$ , the lemma follows. ∎
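The counting step in this proof can be checked numerically in a toy case. The sketch below is illustrative only: it takes $m=1$, $d=3$, the hypothetical map $L\mathbf{n}=n_{1}+\sqrt{2}n_{2}+\sqrt{3}n_{3}$ (chosen here, not taken from the paper), and counts unweighted integer points in $[-N,N]^{3}$, ignoring the functions $F$, $G$ and $f_{j}$. For fixed $n_{2},n_{3}$ the variable $n_{1}$ is confined to an interval of length $2\varepsilon$, so the count is at most $(\lfloor 2\varepsilon\rfloor+1)(2N+1)^{2}=O_{\varepsilon}(N^{d-m})$.

```python
import math

def count_solutions(coeffs, v, eps, N):
    """Count integer n in [-N, N]^3 with |n1 + c2*n2 + c3*n3 + v| <= eps.

    The first coefficient is normalised to 1, mirroring the step in the
    proof where the first m columns of L are assumed invertible (here m = 1).
    """
    c2, c3 = coeffs
    total = 0
    for n2 in range(-N, N + 1):
        for n3 in range(-N, N + 1):
            # With n2, n3 fixed, n1 must lie in a closed interval of length 2*eps.
            centre = -(c2 * n2 + c3 * n3 + v)
            lo = max(-N, math.ceil(centre - eps))
            hi = min(N, math.floor(centre + eps))
            total += max(0, hi - lo + 1)
    return total

N, eps = 40, 0.25
count = count_solutions((math.sqrt(2), math.sqrt(3)), 0.0, eps, N)
# At most floor(2*eps) + 1 choices of n1 for each of the (2N+1)^2 pairs (n2, n3).
bound = (math.floor(2 * eps) + 1) * (2 * N + 1) ** 2
print(count, bound)
```

The bound uses no arithmetic information about the coefficients, which is why it holds for every surjective $L$ but is weaker than Corollary 9.15.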

Part IV The structure of inequalities

Before embarking upon this part of the argument, we remind the reader of the following basic notion from functional analysis. A linear map $L:(V,\|\cdot\|_{V})\longrightarrow(W,\|\cdot\|_{W})$ between two normed spaces will be called a bounded operator if there exists a constant $C_{L}$ such that for all $\mathbf{v}\in V$ one has $\|L\mathbf{v}\|_{W}\leqslant C_{L}\|\mathbf{v}\|_{V}$ . It is a standard fact that all linear maps between two finite dimensional normed spaces are bounded.
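For the $\|\cdot\|_{\infty}$ norms used throughout, the constant $C_{L}$ can be made explicit: for a matrix, the largest absolute row sum works and is attained. The following sketch checks this on a small example; the matrix and the sample vectors are arbitrary choices made for illustration.

```python
def sup_norm(v):
    """The norm ||v||_infty of a finite vector."""
    return max(abs(x) for x in v)

def apply(matrix, v):
    """Matrix-vector product, with the matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

def bound_constant(matrix):
    """Largest absolute row sum; a valid choice of C_L for the sup norms."""
    return max(sum(abs(a) for a in row) for row in matrix)

# A hypothetical map L: R^3 -> R^2 with some irrational coefficients.
L = [[1.0, 2.0 ** 0.5, -3.0], [0.5, -1.0, 3.0 ** 0.5]]
C_L = bound_constant(L)

samples = [[1, 1, 1], [1, -1, -1], [0.3, -2.0, 5.0], [-7, 0, 2]]
ok = all(sup_norm(apply(L, v)) <= C_L * sup_norm(v) + 1e-12 for v in samples)

# The sign pattern of the heaviest row attains the constant.
extremal = [1, 1, -1]
print(ok, sup_norm(apply(L, extremal)), C_L)
```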

10. An alternative formulation

So far all of our theorems and lemmas have been phrased in terms of linear inequalities that are written in the form $T_{F,G,N}^{L,\mathbf{v}}(f_{1},\dots,f_{d})$ . In Section 14 the auxiliary inequalities will appear in a different form, but, as is shown in Lemma 10.1 below, these different forms are more-or-less equivalent. The statement of this lemma is unfortunately rather technical, but the proof is straightforward. The reader may wish in the first instance to consider the special case in which $l=0$ and $\Phi$ is injective.

Lemma 10.1 (Alternative formulation).

Let $m,d,l$ be natural numbers, with $d\geqslant m$ , and let $C,\sigma,\eta$ be positive parameters. Let $P$ be another set of parameters. Let $k$ be a non-negative integer, and suppose that $\eta$ is small enough in terms of $m$ , $d$ , $k$ and $l$ . Let $\Phi:\mathbb{R}^{d-m+k}\longrightarrow\mathbb{R}^{d}$ be a linear map, and suppose that $k=\dim\ker\Phi$ . Let $I:\mathbb{R}^{d}\longrightarrow[0,1]$ and $H:\mathbb{R}^{d-m+k+l}\longrightarrow[0,1]$ be smooth functions, where $\operatorname{Rad}(I)\leqslant\eta$ and $\operatorname{Rad}(H)\leqslant C$ . Assume that the Lipschitz constant of $H$ is at most $\sigma^{-1}$ and assume further that $H,I\in\mathcal{C}(P)$ . Then

(1) there exists a surjective linear map $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ such that $\ker L=\operatorname{Im}\Phi$ and $\|L\|_{\infty}=O_{\Phi}(1)$ . If $\Phi$ has algebraic coefficients then $L$ can be chosen to have algebraic coefficients.

(2) for any $L$ satisfying part (1), if $\Phi$ has finite Cauchy-Schwarz complexity then $L\notin V_{\operatorname{degen}}^{*}(m,d)$ .

(3) for any $L$ satisfying part (1), if $\eta$ is small enough in terms of $L$ and $\Phi$ then there exist smooth functions $F:\mathbb{R}^{d+l}\longrightarrow\mathbb{R}_{\geqslant 0}$ and $G:\mathbb{R}^{m}\longrightarrow\mathbb{R}_{\geqslant 0}$ , with $F\in\mathcal{C}(P,\Phi)$ , $G\in\mathcal{C}(L,P,\Phi)$ and $\operatorname{Rad}(G)\ll_{L}\eta$ , such that for all $\mathbf{v}\in\mathbb{R}^{l}$ , $\mathbf{z}\in\mathbb{R}^{d}$ , and natural numbers $N$ ,

[TABLE]

where $E(\mathbf{z},N)$ is an error term of size at most

[TABLE]

Proof.

Part (1) of the lemma is immediate. Indeed, one has the quotient map $\pi:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{d}/\operatorname{Im}\Phi$ . Choosing an isomorphism $\iota:\mathbb{R}^{d}/\operatorname{Im}\Phi\cong\mathbb{R}^{m}$ , we may define $L:=\iota\circ\pi$ . If $\Phi$ has algebraic coefficients then choosing such an $\iota$ with algebraic coefficients gives a suitable $L$ with algebraic coefficients.

For part (2), suppose that $\Phi$ has finite Cauchy-Schwarz complexity. If $L$ were in $V_{\operatorname{degen}}^{*}(m,d)$ then there would exist $i,j\leqslant d$ and a real number $\lambda$ for which $\mathbf{e_{i}}-\lambda\mathbf{e_{j}}$ is non-zero and $\mathbf{e_{i}^{*}}-\lambda\mathbf{e_{j}^{*}}\in L^{*}((\mathbb{R}^{m})^{*})$ , which would imply $\mathbf{e_{i}^{*}}-\lambda\mathbf{e_{j}^{*}}\in(\operatorname{Im}\Phi)^{0}$ , which would imply that $\Phi$ has infinite Cauchy-Schwarz complexity, contradicting the hypothesis.

It remains to prove part (3). Let $\{\mathbf{u^{(1)}},\dots,\mathbf{u^{(k)}}\}\subset\mathbb{R}^{d-m+k}$ be an orthonormal basis for $\ker\Phi$ , and extend this to an orthonormal basis $\{\mathbf{u^{(1)}},\dots,\mathbf{u^{(d-m+k)}}\}$ for $\mathbb{R}^{d-m+k}$ . Then define the linear map $\Psi:\mathbb{R}^{d-m+k}\longrightarrow\mathbb{R}^{d-m+k}$ by

[TABLE]

By changing variables, we have that the left-hand side of (10.1) is equal to

[TABLE]

which equals

[TABLE]

Recall, from Section 4, that we use the notation $\mathbf{y_{1}^{k}}$ to refer to the vector $(y_{1},\dots,y_{k})^{T}\in\mathbb{R}^{k}$ , etcetera.

We make some observations. Firstly, we observe that (10.2) is equal to $0$ unless $\|\mathbf{z}\|_{\infty}=O_{C,\Phi}(N)$ . Indeed, if $\|\mathbf{z}\|_{\infty}\geqslant C_{1}N$ then for all $y_{k+1},\dots,y_{d-m+k}$ that give a non-zero contribution to (10.2) we have

[TABLE]

if $\eta$ is small enough. This means that

[TABLE]

which if $C_{1}$ is large enough in terms of $C$ and $\Phi$ means that

[TABLE]

for all $y_{1},\dots,y_{k}$ . [Note that $\Psi(\mathbf{y_{1}^{k}},\mathbf{0})$ and $\Psi(\mathbf{0},\mathbf{y_{k+1}^{d-m+k}})$ are orthogonal.]

Secondly, we observe that

[TABLE]

for all $y_{k+1},\dots,y_{d-m+k}$ that give a non-zero contribution to the integral (10.2). Write $\mathbf{z}=\mathbf{z_{1}}+\mathbf{z_{2}}$ , where $\mathbf{z_{1}}\in\operatorname{Im}\Phi$ and $\mathbf{z_{2}}\in(\operatorname{Im}\Phi)^{\perp}$ . By orthogonality, we conclude that

[TABLE]

Since $(\Phi|_{(\ker\Phi)^{\perp}})^{-1}:\operatorname{Im}\Phi\longrightarrow(\ker\Phi)^{\perp}$ is a bounded linear map, this in turn means that

[TABLE]

Since $H$ is Lipschitz, with Lipschitz constant at most $\sigma^{-1}$ , this all means that (10.2) is equal to

[TABLE]

plus an error of size at most

[TABLE]

We proceed to analyse the terms of (10) separately. Firstly, by shifting the variables $y_{k+1},\dots,y_{d-m+k}$ we see that the first bracket of (10) is equal to

[TABLE]

Now let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be any surjective linear map that satisfies $\ker L=\operatorname{Im}\Phi$ . Note that $L|_{(\operatorname{Im}\Phi)^{\perp}}$ is an injective linear map, and thus (10.4) is equal to

[TABLE]

Differentiating inside the integral, one sees that this expression is equal to $G(L\mathbf{z_{2}})$ for some smooth compactly supported function $G:\mathbb{R}^{m}\longrightarrow\mathbb{R}_{\geqslant 0}$ satisfying $G\in\mathcal{C}(L,P,\Phi)$ . Moreover, $G$ is supported on $[-O_{L}(\eta),O_{L}(\eta)]^{m}$ , since $(L|_{(\operatorname{Im}\Phi)^{\perp}})^{-1}L\mathbf{z_{2}}$ and $\Phi(\Psi(\mathbf{0},\mathbf{y_{k+1}^{d-m+k}}))$ are orthogonal. Note that $L\mathbf{z_{2}}=L\mathbf{z}$ , so the expression is equal to $G(L\mathbf{z})$ .

We move to the second term of (10). Choose $\iota:\operatorname{Im}\Phi\longrightarrow\mathbb{R}^{d-m}$ to be an isomorphism with $\|\iota(\mathbf{x})\|_{\infty}\asymp_{\Phi}\|\mathbf{x}\|_{\infty}$ . Then the second term of (10) is equal to $F_{1}((\mathbf{v},\iota(\mathbf{z_{1}}))/N)$ for some smooth function $F_{1}:\mathbb{R}^{d-m+l}\longrightarrow\mathbb{R}_{\geqslant 0}$ satisfying $F_{1}\in\mathcal{C}(P,\Phi)$ . Note that $F_{1}$ is indeed compactly supported, since $(\Phi|_{(\ker\Phi)^{\perp}})^{-1}(\mathbf{z_{1}})$ and $\Psi(\mathbf{y_{1}^{k}},\mathbf{0})$ are orthogonal vectors.

In summary, we have shown that (10.1) is equal to

[TABLE]

plus an error of size

[TABLE]

By the construction of $L$ , this error is bounded by

[TABLE]

The term $F_{1}(\mathbf{v},\iota(\mathbf{z_{1}})/N)G(L\mathbf{z})$ is not quite of the required form, since $F_{1}(\mathbf{v},\iota(\mathbf{z_{1}})/N)$ is not compactly supported as a function of $\mathbf{z}$ . However, it may be easily massaged into this form. Indeed, from the above discussion we know that $G(L\mathbf{z})\neq 0$ implies that $\|\mathbf{z_{2}}\|_{\infty}\leqslant C_{1}\eta$ , for some constant $C_{1}$ that satisfies $C_{1}=O_{L,\Phi}(1)$ . Let $b:\mathbb{R}\longrightarrow[0,1]$ be a $1/2$ -supported function (in the sense of Definition 4.1), and let $B:\mathbb{R}^{d}\longrightarrow[0,1]$ be defined by $B(\mathbf{x})=\prod_{j=1}^{d}b(x_{j})$ . Then let $F:\mathbb{R}^{d}\longrightarrow\mathbb{R}_{\geqslant 0}$ be defined by

[TABLE]

Then $F\in\mathcal{C}(P,\Phi)$ , and if $\eta\leqslant 1/2C_{1}$ we have

[TABLE]

The lemma is proved. ∎
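Part (1) of Lemma 10.1 is constructive once a basis of the annihilator of $\operatorname{Im}\Phi$ is known. The sketch below is a minimal illustration under a simplifying assumption that is not in the lemma: $\Phi$ is given by a matrix with rational entries, so that exact arithmetic with `Fraction` applies (the lemma allows algebraic coefficients). The function names and the example map $\Phi(x,y)=(x,y,x+y)$ are hypothetical. The rows of the returned $L$ form a basis of $\{\ell:\ell\Phi=0\}$, so $L$ is surjective with $\ker L=\operatorname{Im}\Phi$.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over the rationals; returns (matrix, pivot columns)."""
    a = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(a[0])):
        # Look for a row at or below r with a non-zero entry in column c.
        p = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]
        piv = a[r][c]
        a[r] = [x / piv for x in a[r]]
        for i in range(len(a)):
            if i != r and a[i][c] != 0:
                f = a[i][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
        if r == len(a):
            break
    return a, pivots

def null_space(rows):
    """Basis of {v : rows . v = 0}, one basis vector per free column."""
    a, pivots = rref(rows)
    ncols = len(rows[0])
    basis = []
    for f in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[f] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -a[i][f]
        basis.append(v)
    return basis

def annihilator_map(phi):
    """Rows span the left kernel of the d x n matrix phi, so ker L = Im phi."""
    transpose = [list(col) for col in zip(*phi)]
    return null_space(transpose)

# Phi(x, y) = (x, y, x + y): here d = 3, k = 0 and m = 1.
phi = [[1, 0], [0, 1], [1, 1]]
L = annihilator_map(phi)
print(L)
```

In this example the single row of $L$ is a multiple of $(1,1,-1)$, recording the relation $z_{3}=z_{1}+z_{2}$ that cuts out $\operatorname{Im}\Phi$.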

This reformulation allows us to deduce Corollary 10.3 below. This is a corollary of Theorem 9.5 and is the result on inequalities and sieve weights that we will actually use in Section 15. In order to state this inequality, we introduce the following convention.

Definition 10.2 (Convolution).

If $f:\mathbb{Z}\longrightarrow\mathbb{R}$ has finite support, and $g:\mathbb{R}\longrightarrow[0,1]$ is a measurable function, we may define the convolution $(f\ast g)(x):\mathbb{R}\longrightarrow\mathbb{R}$ by

[TABLE]

Recall from Section 4 that, for some positive parameter $\eta$ , the function $\chi:\mathbb{R}\longrightarrow[0,1]$ denotes a fixed $\eta$ -supported function.

Corollary 10.3 (Switching functions).

Let $N,m,d$ be natural numbers, with $d\geqslant m+2$ , and let $k$ be a non-negative integer. Let $C,\gamma,\eta$ be positive parameters, and let $P$ be a set of further parameters. Suppose that $\eta$ is small enough in terms of $m$ , $d$ , and $k$ . Let $(\varphi_{1},\dots,\varphi_{d})=\Phi:\mathbb{R}^{d-m+k}\longrightarrow\mathbb{R}^{d}$ be a linear map with algebraic coefficients, and suppose that $k=\dim\ker\Phi$ . Suppose that $\Phi$ has finite Cauchy-Schwarz complexity. Let $H:\mathbb{R}^{d-m+k}\longrightarrow[0,1]$ be a smooth function in $\mathcal{C}(P)$ . Let $w_{1},\dots,w_{d}$ be any functions such that, for each $j\leqslant d$ , one has $w_{j}(n)\leqslant w(n)$ for all $n$ and $w_{j}(n)\rightarrow\infty$ as $n\rightarrow\infty$ . For each $j\leqslant d$ let the function $f_{j}:\mathbb{Z}\longrightarrow\mathbb{R}_{\geqslant 0}$ be equal to either $\nu_{N,w}^{\gamma}$ or $\Lambda_{\mathbb{Z}/W_{j}\mathbb{Z}}$ . Let $\mathbf{r}\in\mathbb{R}^{d}$ be any vector satisfying $\|\mathbf{r}\|_{\infty}\leqslant CN$ .

Then, if $\gamma$ is small enough in terms of $\Phi$ , the expression

[TABLE]

is independent of the choices of the functions $f_{j}$ , up to an error of size $o(1)$ as $N\rightarrow\infty$ . This $o(1)$ term may depend on $C$ , $P$ , $\Phi$ , $\eta$ , $\gamma$ , and on the functions $w_{1},\dots,w_{d}$ .

Proof.

Expanding out the definition of $f_{j}\ast\chi$ , one observes that the left-hand side of (10.6) is equal to

[TABLE]

By applying Lemma 10.1 to the inner integral, we get a surjective linear map $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ with algebraic coefficients, and smooth functions $F:\mathbb{R}^{d}\longrightarrow\mathbb{R}_{\geqslant 0}$ and $G:\mathbb{R}^{m}\longrightarrow\mathbb{R}_{\geqslant 0}$ supported on $[-O_{P,\Phi}(1),O_{P,\Phi}(1)]^{d}$ and $[-O_{\Phi}(\eta),O_{\Phi}(\eta)]^{m}$ respectively and with $F,G\in\mathcal{C}(P,\Phi,\eta)$ , such that (10.7) is equal to

[TABLE]

plus an error of size

[TABLE]

Furthermore, $L\notin V_{\operatorname{degen}}^{*}(m,d)$ .

Now apply Theorem 9.5 to the main term (10.8). As written this theorem applies to functions $F$ and $G$ that take values in $[0,1]$ , but by the obvious rescaling we may nonetheless apply the theorem to the present functions $F$ and $G$ . This shows immediately that (10.8) is independent of the particular choices of $f_{1},\dots,f_{d}$ , up to an error of size $o(1)$ . The $o(1)$ term has the appropriate dependencies.

For the error term (10.9), we apply the upper bound in Corollary 9.15. This shows that (10.9) is $o(N^{-1})$ , so may be absorbed into the $o(1)$ term above. Corollary 10.3 is proved. ∎

An upper bound in this setting will also be convenient.

Corollary 10.4.

Under the same hypotheses as Corollary 10.3,

[TABLE]

where the implied constant may depend on $C$ , $P$ , $\Phi$ , $\eta$ , $\gamma$ , and on the functions $w_{1},\dots,w_{d}$ .

Proof.

Proceed as in the previous proof to get to expression (10.8). Then apply the upper bound in Corollary 9.15. ∎

11. Variation in parameters

This section will be devoted to proving Lemma 11.1 below. This technical lemma shows that the number of solutions to certain inequalities, weighted by the local von Mangoldt function, is a quantity that behaves well when the underlying parameters are perturbed. The slightly esoteric notation, in which we introduce a dimension $d$ only to consider $\mathbf{x}\in\mathbb{R}^{d-1}$ , is designed to correspond to the moment in Section 15 in which this lemma will be applied.

Lemma 11.1.

Let $d,l,N,s$ be natural numbers, with $d\geqslant 2$ , and let $C,\eta$ be positive parameters. Let $(\varphi_{1},\dots,\varphi_{l})=\Phi:\mathbb{R}^{d-1}\longrightarrow\mathbb{R}^{l}$ and $(\psi_{1},\dots,\psi_{l})=\Psi:\mathbb{R}^{s+2}\longrightarrow\mathbb{R}^{l}$ be linear maps with algebraic coefficients. Let $P$ be a set of parameters, and let $b\in\mathcal{C}(C,P,\eta,\Phi,\Psi)$ be an arbitrary smooth function. Let $w^{*}:\mathbb{N}\longrightarrow\mathbb{R}$ be a function such that $w^{*}(n)\rightarrow\infty$ as $n\rightarrow\infty$ and $w^{*}(n)\leqslant w(n)$ for all $n$ . Let $\mathbf{a}\in\mathbb{R}^{l}$ be a vector satisfying $\|\mathbf{a}\|_{\infty}\leqslant CN$ . For $\mathbf{y}\in\mathbb{R}^{s+1}$ , define

[TABLE]

where $a_{j}$ is the $j^{th}$ coordinate of $\mathbf{a}$ . Then, if $\eta$ is sufficiently small in terms of $\Phi$ and $\Psi$ , there is a function $f_{1}:\mathbb{Z}^{l}\longrightarrow\mathbb{C}$ , satisfying $\|f_{1}\|_{\infty}\ll(\log\log W^{*})^{O(1)}$ , such that

[TABLE]

Here $b_{\mathbf{a},N}\in\mathcal{C}(C,P,\eta,\Phi,\Psi)$ , though it may also depend on $\mathbf{a}$ and $N$ .
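The size of the weights in Lemma 11.1 can be made concrete. The sketch below assumes the standard convention of [13] for the local von Mangoldt function, which this section does not restate: $\Lambda_{\mathbb{Z}/W\mathbb{Z}}(n)=W/\varphi(W)$ if $(n,W)=1$ and $0$ otherwise, with $W$ the product of the primes up to a threshold. The threshold $7$ is an arbitrary illustrative value. Under that assumption the weight has mean exactly $1$ over a full period and supremum $W/\varphi(W)$.

```python
from fractions import Fraction
from math import gcd

def primes_up_to(w):
    """Primes p <= w by trial division (adequate for tiny thresholds)."""
    return [p for p in range(2, int(w) + 1)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def primorial_data(w):
    """Return (W, phi(W)) where W is the product of the primes up to w."""
    W, phi = 1, 1
    for p in primes_up_to(w):
        W *= p
        phi *= p - 1
    return W, phi

def local_von_mangoldt(W, phi_W):
    """Assumed convention: W/phi(W) on residues coprime to W, zero elsewhere."""
    ratio = Fraction(W, phi_W)
    return lambda n: ratio if gcd(n, W) == 1 else Fraction(0)

W, phi_W = primorial_data(7)            # W = 2 * 3 * 5 * 7
Lam = local_von_mangoldt(W, phi_W)
mean = sum(Lam(n) for n in range(W)) / W
sup = max(Lam(n) for n in range(W))
print(W, phi_W, mean, sup)
```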

None of the methods required to prove this lemma will be particularly deep, but the technical manoeuvres will be a little intricate. In particular, we will need to apply the approximation in Lemma 10.1 multiple times within the same argument.

The proof of Lemma 11.1 will require the preliminary result below, namely Lemma 11.2. To state this lemma, we define a metric on $\mathbb{R}^{d}/K\mathbb{Z}^{d}$ by the formula

[TABLE]

Lipschitz constants of functions $\mathfrak{F}:\mathbb{R}^{d}/K\mathbb{Z}^{d}\longrightarrow\mathbb{R}$ will be considered with respect to this metric.

Lemma 11.2.

Let $d,m,K$ be natural numbers, and let $\eta,\sigma$ be positive parameters. Let $S:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a surjective linear map with integer coefficients, and let $G:\mathbb{R}^{m}\longrightarrow[0,1]$ be a Lipschitz function supported on $[-\eta,\eta]^{m}$ , with Lipschitz constant at most $\sigma^{-1}$ . Let $\mathfrak{F}:S\mathbb{Z}^{d}\longrightarrow\mathbb{R}$ be any function for which

[TABLE]

for all $\mathbf{x}\in\mathbb{Z}^{d}$ and all $\mathbf{n}\in K\mathbb{Z}^{d}$ .

For each $\mathbf{a}\in\mathbb{R}^{d}$ , define $\widetilde{\mathbf{a}}\in\mathbb{Z}^{d}$ to be some vector with integer coordinates for which

[TABLE]

Then, provided $\eta$ is small enough in terms of $S$ , the function

[TABLE]

•

depends only on the value of $\mathbf{a}$ modulo $K\mathbb{Z}^{d}$ ;

•

is Lipschitz when viewed as a function on $\mathbb{R}^{d}/K\mathbb{Z}^{d}$ , with Lipschitz constant at most

[TABLE]

Remark 11.3.

The expression $\min\limits_{\mathbf{n}\in\mathbb{Z}^{d}}\|S(\mathbf{n}-\mathbf{a})\|_{\infty}$ is well-defined, since $S\mathbb{Z}^{d}$ is a lattice.

Proof.

To prove the first part of the lemma, let $\mathbf{a}\in\mathbb{R}^{d}$ and first suppose that there is a unique vector $\mathbf{x}\in S\mathbb{Z}^{d}$ for which

[TABLE]

In this case, by the uniqueness of $\mathbf{x}$ , we have $\mathbf{x}=S\widetilde{\mathbf{a}}$ . By translation, we know that

[TABLE]

for all $\mathbf{n}\in\mathbb{Z}^{d}$ , and hence

[TABLE]

for all $\mathbf{n}\in\mathbb{Z}^{d}$ . Hence

[TABLE]

and so the function

[TABLE]

depends only on the value of $\mathbf{a}$ modulo $\mathbb{Z}^{d}$ . Furthermore, if $\mathbf{n}\in K\mathbb{Z}^{d}$ ,

[TABLE]

by the invariance properties of $\mathfrak{F}$ . Hence the function (11.2) only depends on the value of $\mathbf{a}$ modulo $K\mathbb{Z}^{d}$ .

Now suppose that there were two distinct vectors $\mathbf{x_{1}}$ , $\mathbf{x_{2}}\in S\mathbb{Z}^{d}$ for which

[TABLE]

for $i=1,2$ . Then in fact $G(S(\widetilde{\mathbf{a}}-\mathbf{a}))=0$ . Indeed, if this were not the case then we would have $\|\mathbf{x_{1}}-\mathbf{x_{2}}\|_{\infty}\leqslant O(\eta)$ , which is impossible if $\eta$ is small enough, since $\mathbf{x_{1}}$ and $\mathbf{x_{2}}$ are two distinct elements of $\mathbb{Z}^{m}$ . By translation, we may also conclude that $G(S(\widetilde{\mathbf{a}+\mathbf{n}}-(\mathbf{a}+\mathbf{n})))=0$ for all $\mathbf{n}\in\mathbb{Z}^{d}$ . So again, the function (11.2) depends only on the value of $\mathbf{a}$ modulo $K\mathbb{Z}^{d}$ .

Regarding the second part of the lemma, the idea of the proof is similar to the above. Indeed, the only aspect of the function (11.2) that could lead to a large Lipschitz constant is of course the term $S\widetilde{\mathbf{a}}$ , which could, one fears, jump sharply for small changes in $\mathbf{a}$ . However, when such jumps occur, the function $G(S(\widetilde{\mathbf{a}}-\mathbf{a}))$ is always equal to zero.

Let us proceed with the full proof. Indeed, let $\mathbf{a_{0}}$ , $\mathbf{a_{1}}\in\mathbb{R}^{d}$ and suppose first that

[TABLE]

By choosing suitable coset representatives, without loss of generality we may assume that

[TABLE]

Then either $S\widetilde{\mathbf{a_{0}}}=S\widetilde{\mathbf{a_{1}}}$ or $S\widetilde{\mathbf{a_{0}}}\neq S\widetilde{\mathbf{a_{1}}}$ . If $S\widetilde{\mathbf{a_{0}}}=S\widetilde{\mathbf{a_{1}}}$ then

[TABLE]

Therefore

[TABLE]

That resolves the lemma in this case.

If on the other hand $S\widetilde{\mathbf{a_{0}}}\neq S\widetilde{\mathbf{a_{1}}}$ , we may conclude that both

[TABLE]

and

[TABLE]

Indeed, if $\|S\widetilde{\mathbf{a_{0}}}-S\mathbf{a_{0}}\|_{\infty}\leqslant 10\eta$ , say, then

[TABLE]

If $\eta$ is small enough, this implies that $S\widetilde{\mathbf{a_{0}}}$ must be the unique element of $S\mathbb{Z}^{d}$ for which

[TABLE]

and hence that $S\widetilde{\mathbf{a_{0}}}=S\widetilde{\mathbf{a_{1}}}$ , contradicting the assumption.

If $\eta$ is small enough, expressions (11.4) and (11.5) imply that

[TABLE]

and so

[TABLE]

That resolves the lemma in this case.

The only remaining case to consider is when

[TABLE]

In this case we bound the Lipschitz constant very crudely, as $O(\eta^{-1}\|\mathfrak{F}\|_{\infty}\|G\|_{\infty})$ , which is $O(\|\mathfrak{F}\|_{\infty}\eta^{-1})$ , since $\|G\|_{\infty}\leqslant 1$ . This settles the lemma. ∎
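The mechanism behind Lemma 11.2 is visible in a one-dimensional toy model. The sketch below takes $d=m=1$ and $S$ the identity, so that $\widetilde{a}$ is the nearest integer to $a$; the periodic values standing in for $\mathfrak{F}$, the period $K=3$, and the tent function standing in for $G$ are all hypothetical choices, and the Lipschitz constant checked is the elementary one for this model rather than the bound stated in the lemma. The nearest integer jumps at half-integers, but the cut-off vanishes there, so the product is periodic and Lipschitz.

```python
import math

def nearest_integer(a):
    """Round to the nearest integer (ties go upwards; the cut-off is zero there)."""
    return math.floor(a + 0.5)

def tent(x, eta):
    """Lipschitz cut-off supported on [-eta, eta], with constant 1/eta."""
    return max(0.0, 1.0 - abs(x) / eta)

def toy_function(a, values, eta):
    """a -> F(nearest integer to a) * G(nearest integer - a), F periodic mod K."""
    K = len(values)
    n = nearest_integer(a)
    return values[n % K] * tent(n - a, eta)

values = [2.0, -1.0, 0.5]   # F on Z/3Z, so K = 3
eta = 0.25
grid = [i / 100 for i in range(-300, 300)]
f = [toy_function(a, values, eta) for a in grid]

# Largest observed difference quotient over all pairs of grid points.
lip = max(abs(f[i] - f[j]) / abs(grid[i] - grid[j])
          for i in range(len(grid)) for j in range(i))
print(lip, max(abs(v) for v in values) / eta)
```

The observed difference quotient stays below $\|\mathfrak{F}\|_{\infty}/\eta$ in this model, even for pairs of points whose nearest integers differ.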

We are now ready to prove Lemma 11.1.

Proof of Lemma 11.1.

For this proof we make the following conventions. Any implied constant may depend on $C$ , $\Phi$ and $\Psi$ , and we will use the notation $b$ , $b_{1}$ , $b_{2}$ etc. to denote a function in $\mathcal{C}(C,P,\eta,\Phi,\Psi)$ , that may change from line to line.

The first part of the proof will involve establishing an asymptotic formula for $Q_{\mathbf{a},N}(\mathbf{y})$ , namely the expression $Q_{\mathbf{a},N}(\mathbf{y})=\mathfrak{S}_{\mathbf{a},N}(\mathbf{y})I_{\mathbf{a},N}(\mathbf{y})+o_{P,\eta}(1)$ in (11.11) below. Indeed, expanding out the definition of $\Lambda_{\mathbb{Z}/W^{*}\mathbb{Z}}\ast\chi$ (see Definition 10.2) we have

[TABLE]

where $\boldsymbol{\chi}:\mathbb{R}^{l}\longrightarrow[0,1]$ is defined by $\boldsymbol{\chi}(\mathbf{z}):=\prod_{j\leqslant l}\chi(z_{j})$ . Let $k:=\dim\operatorname{Im}\Phi$ , and note that $k\leqslant d-1$ .

The inner integral of (11.7) may be analysed using Lemma 10.1. The following table indicates which objects in (11.7) play which role in Lemma 10.1.

[TABLE]

So, applying Lemma 10.1, one sees that (11.7) is equal to

[TABLE]

Here, $L:\mathbb{R}^{l}\longrightarrow\mathbb{R}^{l-k}$ is a surjective linear map with algebraic coefficients, that depends only on $\Phi$ , $\operatorname{Rad}(b_{2})=O(\eta)$ , and the error term $E$ may be bounded above by

[TABLE]

where the summation $\sum^{*}$ denotes summation over the set

[TABLE]

The error term $E$ is easy to bound. Indeed, by Lemma 9.16, expression (11.9) may be bounded by $O_{P}(N^{k-d}(\log\log W^{*})^{d})$ . Since $k\leqslant d-1$ , this is an $o_{P}(1)$ error.

It remains to analyse the main term in (11.8), which we will do with the help of Lemma 5.6. The reader is invited to consult Section 5 for the statement of this result, and for the definitions of rational map, rational dimension, etcetera.

Now, let $u$ be the rational dimension of $L$ , and let $\Theta:\mathbb{R}^{l-k}\longrightarrow\mathbb{R}^{u}$ be a rational map for $L$ with algebraic coefficients. Then, there exists an injective linear map $(\xi_{1},\dots,\xi_{l})=\Xi:\mathbb{R}^{l-u}\longrightarrow\mathbb{R}^{l}$ with integer coefficients, satisfying $\Xi\mathbb{Z}^{l-u}=\mathbb{Z}^{l}\cap\ker\Theta L$ , and a vector $\widetilde{\mathbf{r}}(\mathbf{a},\mathbf{y})\in\mathbb{Z}^{l}$ , such that the main term of (11.8) is equal to

[TABLE]

where $\widetilde{r}(\mathbf{a},\mathbf{y})_{j}$ is the $j^{th}$ coordinate of $\mathbf{\widetilde{r}}(\mathbf{a},\mathbf{y})$ . Note how we’ve appealed to part (11) of Lemma 5.6 for the particular form of the argument of $b_{2}$ . Note also how, since $\eta$ is sufficiently small, we have been able to apply part (10) of the lemma to establish that $\widetilde{R}$ consists of a single element $\mathbf{\widetilde{r}}(\mathbf{a},\mathbf{y})$ .

Moreover, from part (10) of the lemma again, we have that $\widetilde{\mathbf{r}}(\mathbf{a},\mathbf{y})$ is an element of $\mathbb{Z}^{l}$ for which

[TABLE]

From part (9) of Lemma 5.6, letting $\{\mathbf{e_{1}},\dots,\mathbf{e_{l-u}}\}$ be the standard basis vectors of $\mathbb{R}^{l-u}$ , we have a set

[TABLE]

which is a lattice basis for $\mathbb{Z}^{l}$ and for which $\{\Theta L\mathbf{x_{i}}:i\leqslant u\}$ is a lattice basis for $\Theta L\mathbb{Z}^{l}$ . Letting $U=\operatorname{span}(\{\mathbf{x_{i}}:i\leqslant u\})$ , we have that $\widetilde{\mathbf{r}}(\mathbf{a},\mathbf{y})\in U$ .

By applying the first part of Lemma 9.11 to expression (11), one immediately derives

[TABLE]

where

[TABLE]

and $I_{\mathbf{a},N}(\mathbf{y})$ is equal to

[TABLE]

Note that

[TABLE]

The remainder of the proof of Lemma 11.1 will consist of analysing expressions (11.12) and (11.13) for $\mathfrak{S}_{\mathbf{a},N}(\mathbf{y})$ and $I_{\mathbf{a},N}(\mathbf{y})$ .

We begin with $I_{\mathbf{a},N}(\mathbf{y})$ , aiming for expression (11.18). Letting $V=\operatorname{Im}\Xi$ , we have that $\mathbb{R}^{l}=U\oplus V$ . For any vector $\mathbf{v}\in\mathbb{R}^{l}$ let $\mathbf{v}|_{U}$ and $\mathbf{v}|_{V}$ be the components in $U$ and $V$ respectively. Then we have that

[TABLE]

since

[TABLE]

By the bound on the Lipschitz constant of $b_{1}$ , we may replace $b_{1}((\mathbf{y},\Xi(\mathbf{x})+\widetilde{\mathbf{r}}(\mathbf{a},\mathbf{y}))/N)$ with $b_{1}((\mathbf{y},\Xi(\mathbf{x})+\Psi(\mathbf{y})|_{U}+\mathbf{a}|_{U})/N)$ in (11.13), up to an error of $O_{P,\eta}(N^{-1})$ . Also, note that

[TABLE]

by Lemma A.1. This is $O_{P,\eta}(1)$ , since $\dim\ker L\Xi\leqslant d-1$ by (11.14). Therefore we may replace (11.13) by the expression

[TABLE]

plus an error of size $o_{P,\eta}(1)$ .

The expression (11.16) is in a form that is amenable to Lemma 10.1. The following table indicates which objects from our present discussion play which role in the notation of Lemma 10.1.

[TABLE]

This is a valid application of Lemma 10.1, since $\ker\Theta=\operatorname{Im}L\Xi$ and the final two functions in the right-hand column are compactly supported smooth functions of their arguments (as $\Xi$ is injective, $\Xi(\mathbf{x})\in V$ , and $V$ is an algebraic complement to $U$ ). Recalling that $\Theta$ has algebraic coefficients, by the third part of Lemma 10.1 we may therefore replace (11.16) by an expression of the form

[TABLE]

where $\operatorname{Rad}(b_{2})=O(\eta)$ .

The argument of the function $b_{1}$ above doesn’t depend smoothly on $\mathbf{y}$ , but this may be easily rectified. Indeed, by (11.15) and the fact that $b_{1}$ is Lipschitz and $b_{2}$ is bounded, (11.17) is equal to

[TABLE]

i.e. is equal to

[TABLE]

where $\operatorname{Rad}(b_{2})=O(\eta)$ .

In summary then, since $\mathfrak{S}_{\mathbf{a},N}(\mathbf{y})\ll(\log\log W^{*})^{O(1)}$ we have shown that

[TABLE]

The function

[TABLE]

is of the form considered in Lemma 11.2 in expression (11.2). Indeed, one first notes that (11.20) is a well-defined mapping, since $\widetilde{\mathbf{r}}(\mathbf{a},\mathbf{y})$ is determined only by $\Psi(\mathbf{y})+\mathbf{a}$ and $\mathfrak{S}_{\mathbf{a},N}(\mathbf{y})$ depends on $\mathbf{a}$ and $\mathbf{y}$ only through the value of $\widetilde{\mathbf{r}}(\mathbf{a},\mathbf{y})$ (see (11.12)). Then one takes the map $S$ from Lemma 11.2 to be the map $\Theta L:\mathbb{R}^{l}\longrightarrow\mathbb{R}^{u}$ here, one takes $K$ from that lemma to be $W^{*}$ here, one takes the map $G$ from that lemma to be $b_{2}$ here, and one takes the map $\mathfrak{F}:\Theta L\mathbb{Z}^{l}\longrightarrow\mathbb{R}$ from that lemma to be

[TABLE]

here. The definition of $\mathfrak{F}$ is valid since $\Theta L|_{U}:U\longrightarrow\mathbb{R}^{u}$ is indeed a bijection, and by part (9) of Lemma 5.6 we have $(\Theta L|_{U})^{-1}(\Theta L(\mathbb{Z}^{l}))=\mathbb{Z}^{l}\cap U$ . Consulting expression (11.12) for $\mathfrak{S}_{\mathbf{a},N}(\mathbf{y})$ , one sees that

[TABLE]

and so (11.20) is indeed of the form (11.2) as we have claimed. The only hypothesis of Lemma 11.2 that we haven’t already verified is the invariance of $\mathfrak{F}$ under translation by elements of $\Theta L(W^{*}\mathbb{Z}^{l})$ , but this is immediate from the definition of $\mathfrak{F}$ , since $(\Theta L|_{U})^{-1}:\mathbb{R}^{u}\longrightarrow U$ is linear and $\Lambda_{\mathbb{Z}/W^{*}\mathbb{Z}}$ is $W^{*}$ -periodic. Therefore, by applying Lemma 11.2, we conclude that the function (11.20) is Lipschitz on $\mathbb{R}^{l}/W^{*}\mathbb{Z}^{l}$ , with Lipschitz constant $O_{P,\eta}((\log\log W^{*})^{O(1)}).$
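The $W^{*}$ -periodicity invoked above can be checked numerically on a small example. The sketch below is illustrative only. It assumes the standard definition of the local von Mangoldt function, namely $\Lambda_{\mathbb{Z}/q\mathbb{Z}}(n)=q/\varphi(q)$ when $(n,q)=1$ and $0$ otherwise (the paper's own definition is given in Section 4 and is not reproduced here), and it takes $W^{*}$ to be the product of the primes up to a small hypothetical threshold. The helper names `primorial` and `local_von_mangoldt` are ours.

```python
from fractions import Fraction
from math import gcd


def primorial(w_star: int) -> int:
    """Product of all primes p <= w_star (trial division; fine for small inputs)."""
    result = 1
    for n in range(2, w_star + 1):
        if all(n % p for p in range(2, int(n ** 0.5) + 1)):
            result *= n
    return result


def local_von_mangoldt(n: int, q: int) -> Fraction:
    """Assumed definition: q / phi(q) on integers coprime to q, and 0 elsewhere."""
    if gcd(n, q) != 1:
        return Fraction(0)
    phi = sum(1 for r in range(1, q + 1) if gcd(r, q) == 1)
    return Fraction(q, phi)


W_STAR = primorial(7)  # hypothetical threshold w*(N) = 7, so W* = 2 * 3 * 5 * 7
values = [local_von_mangoldt(n, W_STAR) for n in range(1, 2 * W_STAR + 1)]

# Periodicity: the value at n agrees with the value at n + W*.
period_ok = all(values[i] == values[i + W_STAR] for i in range(W_STAR))

# Normalisation: the average over one full period is exactly 1.
mean_over_period = sum(values[:W_STAR]) / W_STAR
```

Exact rational arithmetic is used so that both the periodicity and the mean-one normalisation can be tested with equality rather than with a tolerance.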

The proof of Lemma 11.1 is nearly complete, since Lipschitz functions enjoy good approximation by short exponential sums. Indeed, by Lemma A.9 of [12], for all $X>2$ there exists a function $f_{1}:\mathbb{Z}^{l}\longrightarrow\mathbb{C}$ such that $\|f_{1}\|_{\infty}\ll(\log\log W^{*})^{O(1)}$ and

[TABLE]

equals

[TABLE]

Then, picking $X$ to be a suitably large power of $\log\log W^{*}$ , Lemma 11.1 follows. ∎
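The principle used at the end of this proof, that a Lipschitz function on a torus is well approximated by a short exponential sum, can be illustrated in one dimension. The sketch below is not Lemma A.9 of [12]. It is a minimal stand-in under our own assumptions: the function is a $1$ -periodic tent function, the approximant is the Fejér mean with frequencies $|k|<X$ , and the Fourier coefficients are computed by a Riemann sum. All names in the code are hypothetical.

```python
import cmath
import math


def tent(x: float) -> float:
    """A 1-periodic function with Lipschitz constant 1."""
    return abs((x % 1.0) - 0.5)


def fourier_coefficient(f, k: int, samples: int = 1024) -> complex:
    """Riemann-sum approximation to the k-th Fourier coefficient of f on [0, 1)."""
    return sum(
        f(j / samples) * cmath.exp(-2j * math.pi * k * j / samples)
        for j in range(samples)
    ) / samples


def fejer_approximation(f, X: int):
    """Return x -> sum over |k| < X of (1 - |k|/X) f_hat(k) e(kx)."""
    coeffs = {
        k: (1 - abs(k) / X) * fourier_coefficient(f, k)
        for k in range(-X + 1, X)
    }
    return lambda x: sum(
        c * cmath.exp(2j * math.pi * k * x) for k, c in coeffs.items()
    )


def sup_error(f, X: int, grid: int = 200) -> float:
    """Largest deviation between f and its Fejer mean on an equally spaced grid."""
    approx = fejer_approximation(f, X)
    return max(abs(f(j / grid) - approx(j / grid)) for j in range(grid))


# The error shrinks as the number of frequencies grows.
errors = {X: sup_error(tent, X) for X in (4, 16, 64)}
```

The Fejér mean is chosen here because its kernel is non-negative, which makes the sup-norm error easy to control for Lipschitz functions; the lemma cited in the text may use a different construction.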

Part V The main argument

Having completed all the preparatory material, the main thrust of the proof can begin in earnest.

12. Controlling by Gowers norms

In this section we state a type of result that has become known as a ‘generalised von Neumann theorem’, which uses Gowers norms to bound the number of solutions to a diophantine inequality. For readers familiar with [13], the procedure is routine. We will then show that this result implies the main theorem (Theorem 1.16).

Theorem 12.1 (Generalised von Neumann Theorem).

Let $N,m,d$ be natural numbers, satisfying $d\geqslant m+2$ , and let $C,\gamma,\varepsilon$ be positive parameters. Let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a surjective linear map with algebraic coefficients, and assume that $L\notin V_{\operatorname{degen}}^{*}(m,d)$ and that $\gamma$ is small enough (depending on $L$ ). Let $\mathbf{v}\in\mathbb{R}^{m}$ satisfy $\|\mathbf{v}\|_{\infty}\leqslant CN$ . Let $F:\mathbb{R}^{d}\longrightarrow[0,1]$ and $G:\mathbb{R}^{m}\longrightarrow[0,1]$ be functions with Lipschitz constants at most $\sigma^{-1}$ , and suppose that $F$ is supported on $[-1,1]^{d}$ and $G$ is supported on $[-\varepsilon,\varepsilon]^{m}$ . Let $f_{1},\dots,f_{d}:[N]\longrightarrow\mathbb{R}$ be arbitrary functions, satisfying $|f_{j}(n)|\leqslant\nu_{N,w}^{\gamma}(n)$ for all $j\leqslant d$ and for all $n\leqslant N$ .

Then there exists an $s=O(1)$ such that, if

[TABLE]

as $N\rightarrow\infty$ , then

[TABLE]

as $N\rightarrow\infty$ . The second $o(1)$ term may also depend on $C$ , $L$ , $\gamma$ , $\varepsilon$ , $\sigma$ , and the rate of decay of the first $o(1)$ term.

Proof of Theorem 1.16 assuming Theorem 12.1.

Assume the hypotheses of Theorem 1.16. By telescoping we have that

[TABLE]

is equal to

[TABLE]

Since $F$ is supported on $[-1,1]^{d}$ , we may restrict the functions $\Lambda^{\prime}$ and $\Lambda^{+}_{\mathbb{Z}/W\mathbb{Z}}$ to $[N]$ without altering the size of expression (12).

By the construction of the sieve weight $\nu_{N,w}^{\gamma}$ we have

[TABLE]

for all $n\leqslant N$ . Therefore, after rescaling, we may apply Theorem 12.1 in this setting.

Recall that, by Lemma 7.5,

[TABLE]

as $N\rightarrow\infty$ , for all $s\leqslant d-2$ . So, applying Theorem 12.1 to each term of (12) separately, we derive

[TABLE]

as $N\rightarrow\infty$ . By fixing a suitably small value of $\gamma$ , we conclude Theorem 1.16. ∎
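The telescoping step rests on the elementary identity $\prod_{j}a_{j}-\prod_{j}b_{j}=\sum_{j}\big(\prod_{i<j}b_{i}\big)(a_{j}-b_{j})\big(\prod_{i>j}a_{i}\big)$ . The precise ordering of factors in the paper's display is not recoverable from the text here, so the form below is one standard choice and should be read as an assumption. The sketch checks the identity in exact arithmetic; the function name `telescoping_terms` is ours.

```python
from fractions import Fraction
from math import prod


def telescoping_terms(a, b):
    """Terms (prod_{i<j} b_i) * (a_j - b_j) * (prod_{i>j} a_i), one for each index j."""
    one = Fraction(1)
    return [
        prod(b[:j], start=one) * (a[j] - b[j]) * prod(a[j + 1:], start=one)
        for j in range(len(a))
    ]


# Arbitrary rational test values standing in for the two families of factors.
a = [Fraction(3, 2), Fraction(-1, 4), Fraction(5), Fraction(2, 7)]
b = [Fraction(1), Fraction(1, 3), Fraction(-2), Fraction(4)]

terms = telescoping_terms(a, b)
lhs = prod(a, start=Fraction(1)) - prod(b, start=Fraction(1))
rhs = sum(terms)
```

Each term contains exactly one difference factor $a_{j}-b_{j}$ , which is why a bound that applies whenever a single input function has small Gowers norm can be applied to every term separately.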

13. Transferring from $\mathbb{Z}$ to $\mathbb{R}$

In this section we begin the proof of Theorem 12.1. Following the programme set out in [21], our first step will be to transfer the problem from the setting of functions on $\mathbb{Z}$ to functions on $\mathbb{R}$ .

Definition 13.1.

Let $N,m,d$ be natural numbers. Let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a linear map, let $\mathbf{v}\in\mathbb{R}^{m}$ , and let $F:\mathbb{R}^{d}\rightarrow[0,1]$ and $G:\mathbb{R}^{m}\rightarrow[0,1]$ be compactly supported measurable functions. Then, for all bounded measurable functions $g_{1},\dots,g_{d}:\mathbb{R}\longrightarrow\mathbb{R}$ we define

[TABLE]

We now state the key lemma. For the definition of $f\ast\chi$ , where $\chi$ is the function we determined in Section 4, the reader may consult Definition 10.2.

Lemma 13.2 (Transfer).

Let $N,m,d$ be natural numbers, with $d\geqslant m+2$ , and let $C$ , $\varepsilon$ , $\gamma$ , $\eta$ , $\sigma$ be positive constants. Let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a surjective linear map, and let $\mathbf{v}\in\mathbb{R}^{m}$ be a vector satisfying $\|\mathbf{v}\|_{\infty}\leqslant CN$ . Let $F:\mathbb{R}^{d}\longrightarrow[0,1]$ and $G:\mathbb{R}^{m}\longrightarrow[0,1]$ be compactly supported Lipschitz functions, with Lipschitz constants at most $\sigma^{-1}$ . Suppose that $F$ is supported on $[-1,1]^{d}$ , and $G$ is supported on $[-\varepsilon,\varepsilon]^{m}$ . Then there exists some positive real number $C_{\chi}$ , satisfying $C_{\chi}\asymp 1$ , such that the following holds. Let $f_{1},\dots,f_{d}:[N]\longrightarrow\mathbb{R}$ be arbitrary functions that satisfy $|f_{j}(n)|\leqslant\nu_{N,w}^{\gamma}(n)$ for all $j\leqslant d$ and for all $n\leqslant N$ . Assume that $\eta\leqslant\min(1,\varepsilon)$ and that $\gamma$ is small enough depending on $L$ . Then

[TABLE]

as $N\rightarrow\infty$ . The implied constant in the $O(\eta\sigma^{-1})$ term may depend on $C$ , $L$ , and $\varepsilon$ , and the $o(1)$ term may depend on all these parameters together with $\gamma$ and $\sigma$ .

Proof.

The proof is very similar to the proof of Lemma 5.4 in [21], although we do have to insert various estimates that are only proved in this paper.

Indeed, let $\boldsymbol{\chi}:\mathbb{R}^{d}\longrightarrow[0,1]$ denote the function $\mathbf{x}\mapsto\prod\limits_{j=1}^{d}\chi(x_{j})$ . We choose

[TABLE]

Since $\chi$ is $\eta$ -supported, $C_{\chi}\asymp 1$ . Then, expanding the definition of the convolutions $f_{i}\ast\chi$ ,

[TABLE]

equals

[TABLE]

This is equal to

[TABLE]

Indeed, the inner integrand is only non-zero when $\|\mathbf{y}-\mathbf{n}\|_{\infty}\leqslant\eta$ , and $F$ has Lipschitz constant $O(\sigma^{-1})$ .

Continuing, expression (13.4) is equal to

[TABLE]

where

[TABLE]

and $E$ is a certain error, which may be bounded above by a constant times

[TABLE]

Let us deal with the first term of (13.5), in which we wish to replace $H$ with $G$ . We therefore consider

[TABLE]

which is

[TABLE]

Observe that $\|G-H\|_{\infty}=O(\eta\sigma^{-1})$ . Indeed,

[TABLE]

by the definition of $C_{\chi}$ and the Lipschitz property of $G$ . The function $|G-H|$ is compactly supported, with $\operatorname{Rad}(|G-H|)\ll\varepsilon+\eta\ll\varepsilon$ .

Of course $|G-H|$ needn’t be smooth, but we may nonetheless apply Corollary 9.15, concluding that expression (13.7) is at most

[TABLE]

Turning to the error $E$ from (13.5), we’ve already remarked that it may be bounded above by expression (13.6). Applying Corollary 9.15 again, expression (13.6) is $o(1)$ (with the appropriate dependencies on $C$ , $L$ , etc.).

The lemma then follows. ∎
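The estimate $\|G-H\|_{\infty}=O(\eta\sigma^{-1})$ reflects a general fact: averaging a function with Lipschitz constant $\sigma^{-1}$ over a window of radius $\eta$ moves its values by at most $\eta\sigma^{-1}$ . The sketch below illustrates this in one dimension. It assumes an unweighted window average in place of the paper's $H$ (which is built from $\chi$ and $C_{\chi}$ and is not reproduced here), and the names `make_tent` and `window_average` are hypothetical.

```python
def make_tent(eps: float, sigma: float):
    """A [0, 1]-valued function supported on [-eps, eps] with Lipschitz constant 1/sigma."""
    def G(x: float) -> float:
        return max(0.0, min(1.0, (eps - abs(x)) / sigma))
    return G


def window_average(G, x: float, eta: float, samples: int = 201) -> float:
    """Average of G over equally spaced points of [x - eta, x + eta]."""
    points = [x - eta + 2 * eta * j / (samples - 1) for j in range(samples)]
    return sum(G(p) for p in points) / samples


eps, sigma, eta = 0.5, 0.1, 0.01
G = make_tent(eps, sigma)

# Compare G with its smoothed version on a grid covering the support.
grid = [-1 + 2 * j / 2000 for j in range(2001)]
sup_diff = max(abs(G(x) - window_average(G, x, eta)) for x in grid)
```

Every sample point lies within $\eta$ of $x$ , so each sampled value differs from $G(x)$ by at most $\eta\sigma^{-1}$ , and the same bound passes to the average.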

We will need to show that the operation of replacing $f$ by $f\ast\chi$ is compatible with Gowers norms.

Firstly, if $g:[-N,N]\longrightarrow\mathbb{R}$ is a bounded measurable function, we define the Gowers norm over the reals $\|g\|_{U^{d}(\mathbb{R},N)}$ by

[TABLE]

More detail about this quantity may be found in Appendix A of [21].

Secondly, we note that $\|f\|_{U^{d}[N]}$ and $\|f\ast\chi\|_{U^{d}(\mathbb{R},2N)}$ may be related.

Lemma 13.3 (Relating different Gowers norms).

Let $s$ be a natural number, and assume that $\eta$ is a positive parameter that is small enough in terms of $s$ . Let $N$ be a natural number, and let $f:[N]\longrightarrow\mathbb{R}$ be an arbitrary function. Then we have

[TABLE]

Proof.

This is Lemma 5.5 of [21]. ∎
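For orientation, the simplest discrete relative of these norms can be computed directly. The sketch below works on the cyclic group $\mathbb{Z}/N\mathbb{Z}$ with the usual averaging normalisation, which is an assumption made for illustration: the norms $\|f\|_{U^{d}[N]}$ and $\|g\|_{U^{d}(\mathbb{R},N)}$ used in this paper are normalised differently (see (13.8) and [21]). It verifies the classical identity $\|f\|_{U^{2}}^{4}=\sum_{r}|\widehat{f}(r)|^{4}$ ; the function names are ours.

```python
import cmath
import math
from itertools import product


def u2_norm_fourth_power(f):
    """E_{x,h1,h2} f(x) conj(f(x+h1)) conj(f(x+h2)) f(x+h1+h2) over Z/NZ."""
    N = len(f)
    total = sum(
        f[x]
        * f[(x + h1) % N].conjugate()
        * f[(x + h2) % N].conjugate()
        * f[(x + h1 + h2) % N]
        for x, h1, h2 in product(range(N), repeat=3)
    )
    return (total / N ** 3).real


def fourier_l4_fourth_power(f):
    """sum_r |f_hat(r)|^4, where f_hat(r) = E_x f(x) e(-rx/N)."""
    N = len(f)
    f_hat = [
        sum(f[x] * cmath.exp(-2j * math.pi * r * x / N) for x in range(N)) / N
        for r in range(N)
    ]
    return sum(abs(c) ** 4 for c in f_hat)


# An arbitrary complex-valued test function on Z/8Z.
signal = [complex(math.cos(x), 0.3 * x) for x in range(8)]
direct = u2_norm_fourth_power(signal)
via_fourier = fourier_l4_fourth_power(signal)
```

The direct sum costs $O(N^{3})$ operations and is only meant for small $N$ ; the Fourier side shows why the $U^{2}$ norm is controlled by the largest Fourier coefficient.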

14. Parametrising the kernel

In this section we will convert the expression $T_{F,G,N}^{L,\mathbf{v}}(f_{1},\dots,f_{d})$ into an expression that is tailored to the subsequent manipulations. We begin with a lemma that is very similar to Proposition 8.2 of [21].

Lemma 14.1 (Separating out the kernel).

Let $N,m,d$ be natural numbers, with $d\geqslant m+2$ , and let $C,\varepsilon,\sigma$ be positive constants. Let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a surjective linear map with algebraic coefficients, and assume further that $L\notin V_{\operatorname{degen}}^{*}(m,d)$ . Let $F:\mathbb{R}^{d}\longrightarrow[0,1]$ be a Lipschitz function supported on $[-1,1]^{d}$ , with Lipschitz constant at most $\sigma^{-1}$ , and let $G:\mathbb{R}^{m}\longrightarrow[0,1]$ be any function supported on $[-\varepsilon,\varepsilon]^{m}$ . Then there exists an injective linear map $(\psi_{1},\dots,\psi_{d})=\Psi:\mathbb{R}^{d-m}\longrightarrow\mathbb{R}^{d}$ with algebraic coefficients (depending only on $L$ ), and a Lipschitz function $F_{1}:\mathbb{R}^{d-m}\longrightarrow[0,1]$ with Lipschitz constant $O_{L}(\sigma^{-1})$ and with $\operatorname{Rad}(F_{1})=O_{C,L,\varepsilon}(1)$ , such that, if $g_{1},\dots,g_{d}:\mathbb{R}\longrightarrow\mathbb{R}$ are arbitrary bounded measurable functions,

[TABLE]

where, for each $j$ , $a_{j}$ is some real number that satisfies $a_{j}=O_{C,L,\varepsilon}(N)$ .

Furthermore, $\Psi$ has finite Cauchy-Schwarz complexity (see Definition 5.5).

Proof of Lemma 14.1.

For ease of notation, let

[TABLE]

Noting that $\ker L$ is a vector space of dimension $d-m$ , define $\{\mathbf{u^{(1)}},\dots,\mathbf{u^{(d-m)}}\}\subset\mathbb{R}^{d}$ to be an orthonormal basis for $\ker L$ consisting of vectors with algebraic coordinates. Then the map $(\psi_{1},\dots,\psi_{d})=\Psi:\mathbb{R}^{d-m}\longrightarrow\mathbb{R}^{d}$ , defined by

[TABLE]

is an injective map that parametrises $\ker L$ . Furthermore $\Psi$ has finite Cauchy-Schwarz complexity, since otherwise there would exist $i\neq j\leqslant d$ and a real number $\lambda$ such that $\mathbf{e_{i}}^{*}-\lambda\mathbf{e_{j}}^{*}\in(\operatorname{Im}\Psi)^{0}$ , i.e. $\mathbf{e_{i}}^{*}-\lambda\mathbf{e_{j}}^{*}\in(\ker L)^{0}$ . This implies that $\mathbf{e_{i}}^{*}-\lambda\mathbf{e_{j}}^{*}\in L((\mathbb{R}^{m})^{*})$ , which, by definition, implies that $L\in V_{\operatorname{degen}}^{*}(m,d)$ , contradicting our hypotheses.

Now, extend the orthonormal basis $\{\mathbf{u^{(1)}},\dots,\mathbf{u^{(d-m)}}\}$ for $\ker L$ to an orthonormal basis $\{\mathbf{u^{(1)}},\dots,\mathbf{u^{(d)}}\}$ for $\mathbb{R}^{d}$ . By implementing a change of basis, we may rewrite

[TABLE]

where $u^{(i)}_{j}$ is the $j^{th}$ coordinate of $\mathbf{u^{(i)}}$ .

We wish to remove the presence of the variables $x_{d-m+1},\dots,x_{d}$ . To set this up, note that, by the choice of the vectors $\mathbf{u^{(i)}}$ ,

[TABLE]

The vector $\sum_{i=d-m+1}^{d}x_{i}\mathbf{u^{(i)}}$ is in $(\ker L)^{\perp}$ and so, since $L|_{(\ker L)^{\perp}}$ is a bounded invertible operator, $G(L(\sum_{i=d-m+1}^{d}x_{i}\mathbf{u^{(i)}})+\mathbf{v})$ is equal to zero unless $(x_{d-m+1},\dots,x_{d})^{T}\in D$ , for some domain $D\subseteq\mathbb{R}^{m}$ of diameter $O_{L}(\varepsilon)$ and satisfying $\sup_{\mathbf{x}\in D}\|\mathbf{x}\|_{\infty}=O_{C,L}(N+\varepsilon)$ .

We can use this observation to bound the right-hand side of (14.3). Indeed, we have

[TABLE]

See Section 4 for an explanation of the $\mathbf{x_{1}^{d-m}}$ notation. So there exists some fixed vector $(x_{d-m+1},\dots,x_{d})^{T}$ in $D$ such that

[TABLE]

Define the function $F_{1}:\mathbb{R}^{d-m}\longrightarrow[0,1]$ by

[TABLE]

and for each $j$ at most $d$ a shift

[TABLE]

Then

[TABLE]

and $F_{1}$ and $a_{j}$ satisfy the conclusions of the lemma. ∎
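The two structural facts used in this proof, that $\Psi$ parametrises $\ker L$ and that no two coordinate forms $\psi_{i}$ , $\psi_{j}$ are proportional, can be checked mechanically on small examples. The sketch below makes simplifying assumptions: $L$ has rational entries, and the kernel basis produced by elimination is not orthonormal. Proportionality of two coordinate forms does not depend on the choice of basis for $\ker L$ , so this does not affect the check. The matrices and function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def kernel_basis(L):
    """Basis of ker L for a rational m x d matrix, via Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in L]
    d = len(rows[0])
    pivots, r = [], 0
    for c in range(d):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                rows[i] = [x - rows[i][c] * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free_col in (c for c in range(d) if c not in pivots):
        v = [Fraction(0)] * d
        v[free_col] = Fraction(1)
        for i, pivot_col in enumerate(pivots):
            v[pivot_col] = -rows[i][free_col]
        basis.append(v)
    return basis


def coordinate_forms(basis):
    """Row j holds the coefficients of psi_j, the j-th coordinate of x -> sum_i x_i u^(i)."""
    return [list(col) for col in zip(*basis)]


def proportional(u, v):
    """True if u and v are linearly dependent (all 2 x 2 minors vanish)."""
    return all(u[a] * v[b] == u[b] * v[a] for a, b in combinations(range(len(u)), 2))


def pairwise_non_proportional(forms):
    """The criterion used in the proof: no psi_i is a multiple of another psi_j."""
    return not any(proportional(u, v) for u, v in combinations(forms, 2))


L_good = [[1, 1, -2, 0], [0, 1, 1, -2]]  # hypothetical non-degenerate example
L_bad = [[1, -1, 0, 0], [0, 0, 1, 1]]    # row space contains e_1^* - e_2^*

basis_good = kernel_basis(L_good)
in_kernel = all(
    sum(a * b for a, b in zip(row, v)) == 0 for row in L_good for v in basis_good
)
good_ok = pairwise_non_proportional(coordinate_forms(basis_good))
bad_ok = pairwise_non_proportional(coordinate_forms(kernel_basis(L_bad)))
```

In the second example the row space of $L$ contains $\mathbf{e_{1}}^{*}-\mathbf{e_{2}}^{*}$ , so $\psi_{1}=\psi_{2}$ on the kernel and the check fails, matching the degeneracy condition excluded in the hypotheses.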

The next lemma is essentially identical to an argument that appears at the end of Section 8 of [21]. Unfortunately that argument is not in an easily citable form, and so we have found it necessary to state and prove the precise version that we need here. For readers unfamiliar with the notion of normal form, we have included a brief summary in Section 6.

Lemma 14.2 (Parametrising by normal form).

Following on from above, there exists a $d^{\prime}=O(1)$ , a linear map $(\psi_{1}^{\prime},\dots,\psi_{d}^{\prime}):=\Psi^{\prime}:\mathbb{R}^{d^{\prime}}\longrightarrow\mathbb{R}^{d}$ with algebraic coefficients that is in $s$ -normal form for some $s=O(1)$ , and a Lipschitz function $F_{2}:\mathbb{R}^{d^{\prime}}\longrightarrow[0,1]$ with Lipschitz constant $O_{L}(\sigma^{-1})$ and with $\operatorname{Rad}(F_{2})=O_{C,L,\varepsilon}(1)$ such that

[TABLE]

is bounded above by a constant times

[TABLE]

Proof.

We apply Lemma 6.2 to $\Psi$ . Therefore, there is a natural number $k=O(1)$ such that, for any real numbers $y_{1},\dots,y_{k}$ , (14.7) is equal to

[TABLE]

where

• $\mathbf{f_{1}},\cdots,\mathbf{f_{k}}\in\mathbb{R}^{d-m}$ are some vectors that satisfy $\|\mathbf{f_{i}}\|_{\infty}=O_{\Psi}(1)$ for each $i$ at most $k$ ;

• for each $j$ at most $d$ , $\psi^{\prime}_{j}:\mathbb{R}^{k}\times\mathbb{R}^{d-m}\longrightarrow\mathbb{R}$ is linear, and $(\psi_{1}^{\prime},\cdots,\psi^{\prime}_{d}):=\Psi^{\prime}:\mathbb{R}^{k}\times\mathbb{R}^{d-m}\longrightarrow\mathbb{R}^{d}$ is defined by

[TABLE]

• $\Psi^{\prime}$ is in $s$ -normal form, for some $s=O(1)$ .

We remark that the right-hand side of expression (14.9) is independent of $\mathbf{y}$ , as it was obtained by applying the change of variables $\mathbf{x}\mapsto\mathbf{x}+\sum_{i=1}^{k}y_{i}\mathbf{f_{i}}$ .

Now, with $\rho$ as fixed in Section 4, let $P:\mathbb{R}^{k}\longrightarrow[0,1]$ be defined by

[TABLE]

Integrating over $\mathbf{y}$ , we have that (14.9) is at most a constant times

[TABLE]

where the function $F_{2}:\mathbb{R}^{d-m+k}\longrightarrow[0,1]$ is defined by

[TABLE]

Notice in (14.10) that we were able to move the absolute value signs outside the integral, as $P$ is positive and the integral over $\mathbf{x}$ is independent of $\mathbf{y}$ (so in particular has constant sign).

Letting $d^{\prime}:=d-m+k$ , the lemma is proved. ∎

15. Gowers-Cauchy-Schwarz argument

This section will be devoted to proving the following theorem, which lies at the heart of the proof of our main results.

Theorem 15.1 (Gowers-Cauchy-Schwarz argument).

Let $N,t,d,s$ be natural numbers, and let $\gamma,\eta,\sigma,C$ be positive constants. Let $a_{1},\dots,a_{t}$ be fixed real numbers that satisfy $|a_{j}|\leqslant CN$ for all $j$ . Let $(\psi_{1},\dots,\psi_{t})=\Psi:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{t}$ be a linear map with algebraic coefficients, which is in $s$ -normal form. Let $F:\mathbb{R}^{d}\longrightarrow[0,1]$ be a Lipschitz function supported on $[-1,1]^{d}$ and with Lipschitz constant at most $\sigma^{-1}$ . Let $g_{1},\dots,g_{t}:[-2N,2N]\longrightarrow\mathbb{R}$ be any bounded measurable functions that satisfy $|g_{j}(x)|\leqslant(\nu_{N,w}^{\gamma}\ast\chi)(x+a_{j})$ for all $x$ . Suppose that

[TABLE]

as $N\rightarrow\infty$ . Then if $\eta$ and $\gamma$ are small enough in terms of $\Psi$ and the dimensions $t$ , $d$ , and $s$ ,

[TABLE]

as $N\rightarrow\infty$ , where the error term can depend on $C$ , $\sigma$ , $\eta$ , $\gamma$ , $\Psi$ , and the first $o(1)$ term.

For the definition of $\|g_{j}\|_{U^{s+1}(\mathbb{R},2N)}$ , the reader may consult expression (13.8).

Theorem 15.1 is closely analogous to [13, Proposition $7.1^{\prime\prime}$ ], and the first half of our proof will follow the proof of that proposition closely (and in particular will contain no new ideas). However, new technicalities will become apparent as the argument progresses. In particular it will become important to understand the structure of a function that we will come to denote by $Q_{\mathbf{a},N}(z,\mathbf{h})$ , and this will not be easy, in that we will have to appeal to the highly technical Lemma 11.1. This observation and the subsequent analysis constitute the main new elements of the proof of Theorem 15.1.

Proof.

We begin by replacing $F$ with a cut-off function that will be easier to work with during the subsequent manipulations. Indeed, let us pick a positive parameter $\delta\in(0,1]$ . By Lemma 3.3 there is some parameter $k=O(\delta^{-d})$ and some smooth functions $F_{1},\dots,F_{k}:\mathbb{R}^{d}\longrightarrow[0,1]$ such that

[TABLE]

and each $F_{i}$ is of the form

[TABLE]

where $|c_{i,F}|\leqslant 1$ and the functions $b_{j}^{i}:\mathbb{R}\longrightarrow[0,1]$ are smooth, supported on $[-2,2]$ , and satisfy $b_{j}^{i}\in\mathcal{C}(\delta)$ .

Therefore, we may write the left-hand side of (15.1) as the sum of $O(\delta^{-d})$ expressions of the form

[TABLE]

plus an error of size at most

[TABLE]

Since $\Psi$ is in $s$ -normal form, for some finite $s$ , it follows that $\Psi$ has finite Cauchy-Schwarz complexity (see Definition 5.5). Therefore, by Corollary 10.4, expression (15.3) has size $O_{C,\eta,\gamma}(\delta\sigma^{-1})$ .

We now arrange our notation for the rest of the proof, in part to mimic the notation that is used in the proof of [13, Proposition $7.1^{\prime\prime}$ ]. This will hopefully increase the readability for those who are familiar with [13]. Indeed, without loss of generality we may assume that

[TABLE]

Since $\Psi$ is in $s$ -normal form there is a set $J\subset\{\mathbf{e_{1}},\dots,\mathbf{e_{d}}\}$ of standard basis vectors with $|J|\leqslant s+1$ and for which $\prod_{j\in J}\psi_{i}(\mathbf{e_{j}})$ vanishes for $i\neq 1$ and is nonzero for $i=1$ . By the nested property of Gowers norms we may assume that $|J|=s+1$ , and by reordering the variables we can assume without loss of generality that $\prod_{j=1}^{s+1}\psi_{i}(\mathbf{e_{j}})$ vanishes for $i\neq 1$ and is nonzero for $i=1$ . It will be useful to rename the first $s+1$ variables $\mathbf{x}$ and the remainder as $\mathbf{y}$ . If $d=s+1$ then the variable $\mathbf{y}$ is trivial. Note that the coefficients $\psi_{1}(\mathbf{e_{j}})$ are non-zero for all $j\in[s+1]$ , so, by rescaling the variables $\mathbf{x}$ , we may assume that

[TABLE]

Following the notation of [13] (in this paper it will never risk being confused with the meaning of $\Omega$ in asymptotic notation), for $i\leqslant t$ let $\Omega(i)$ denote the set

[TABLE]

Note that $\Omega(1)=[s+1]$ and $\Omega(i)\neq[s+1]$ for $i=2,\dots,t$ .

Now, for any set $B\subseteq[s+1]$ and vector $\mathbf{x}\in\mathbb{R}^{s+1}$ , we define the vector $\mathbf{x_{B}}$ to be the restriction of $\mathbf{x}$ to the coordinates in $B$ . Then, for any set $B\subseteq[s+1]$ and vector $\mathbf{y}\in\mathbb{R}^{d-s-1}$ , we define

[TABLE]

where we have abused notation slightly in viewing $\psi_{i}$ only as a function of those variables $x_{j}$ on which it depends.

We also use $b:\mathbb{R}^{a}\longrightarrow\mathbb{R}$ (for some implied dimension parameter $a$ ) to denote a smooth function in $\mathcal{C}(C,\delta,\eta,\gamma,\Psi)$ . The exact function may change from line to line.

With this notation, by picking $\delta$ to be a suitably slowly decaying function of $N$ we see that Theorem 15.1 would follow from the upper bound

[TABLE]

Our entire task is now to establish (15). From this point onwards, we will allow any error term or implied constant to depend on $C$ , $\delta$ , $\eta$ , $\gamma$ , and $\Psi$ , without notating so explicitly.

We proceed by considering the following version of [13, Corollary B.4].

Proposition 15.2 (The weighted generalised von Neumann theorem).

Let $A$ be a finite set, and let $(\mu_{\alpha})_{\alpha\in A}$ be a finite collection of compactly supported Borel probability measures on $\mathbb{R}$ . For every $B\subseteq A$ , let $\mu_{B}$ denote the product measure $\bigotimes\limits_{\alpha\in B}\mu_{\alpha}$ on $\mathbb{R}^{B}$ , and let $f_{B}:\mathbb{R}^{B}\longrightarrow\mathbb{C}$ and $\theta_{B}:\mathbb{R}^{B}\longrightarrow\mathbb{R}_{\geqslant 0}$ be integrable functions such that $|f_{B}(\mathbf{x_{B}})|\leqslant\theta_{B}(\mathbf{x_{B}})$ for all $\mathbf{x_{B}}\in\mathbb{R}^{B}$ . Then

[TABLE]

where for any $B\subseteq A$ and $h_{B}:\mathbb{R}^{B}\longrightarrow\mathbb{C}$ we define $\|h_{B}\|_{\square^{B}(\theta;\mu_{B})}$ to be the unique nonnegative real number satisfying

[TABLE]

Here, as before, we use $\mathbf{x_{C}}$ to denote the restriction of $\mathbf{x_{B}}$ to $\mathbb{R}^{C}$ .

Proof.

The proof is identical to the proof of [13, Corollary B.4], replacing all summations with integrals, and is a consequence of the Gowers-Cauchy-Schwarz inequality. ∎

We now apply this proposition to the left-hand side of (15) above. Observe that we have the pointwise bounds $|G_{B,\mathbf{y}}(\mathbf{x_{B}})|\ll\theta_{B,\mathbf{y}}(\mathbf{x_{B}})$ , where

[TABLE]

Therefore, applying Proposition 15.2 by taking $A$ to be the set $[s+1]$ , each $\mu_{\alpha}$ to be proportional to $(1/N)b_{j}(x_{j}/N)dx_{j}$ , and $\theta_{B}$ to be the function $\theta_{B,\mathbf{y}}$ , we establish that the left-hand side of (15) is

[TABLE]

Observe that

[TABLE]

and so all the functions $g_{j}$ other than $g_{1}$ have been eliminated. Experienced readers will note that, so far, we have been following [13, Appendix C] almost verbatim.

After applying Hölder’s inequality to (15), we see that to establish (15) it suffices to prove

[TABLE]

and, for all $B\subsetneq[s+1]$ ,

[TABLE]

These two expressions correspond respectively to expressions (C.10) and (C.11) of [13].

Establishing (15.9) is straightforward. Indeed, we expand the left-hand side, yielding (up to a multiplicative constant factor) the expression

[TABLE]

As noted in [13, p. 1824], the system of forms given by

[TABLE]

for each $C\subseteq B$ , $\boldsymbol{\omega}_{\mathbf{C}}\in\{0,1\}^{C}$ and $i\in[t]$ such that $\Omega(i)=C$ , has finite Cauchy-Schwarz complexity (since $\Psi$ does). We may therefore apply the upper bound in Corollary 10.4 to expression (15), and this immediately yields (15.9).

It remains to prove (15.8), which will be a considerably more substantial undertaking. We introduce some space-saving notation, namely for any subset $B\subseteq[s+1]$ we define the indexing set

[TABLE]

If a product is taken over triples $\mathfrak{t}\in I_{B}$ , we interpret $C$ , $\boldsymbol{\omega}_{C}$ and $i$ as coming from the triple $\mathfrak{t}=(C,\boldsymbol{\omega}_{C},i)$ . For notational expedience we will also identify the space $\mathbb{R}^{I_{B}}$ with the space $\mathbb{R}^{|I_{B}|}$ .

With this notation, the left-hand side of (15.8) expands as

[TABLE]

We make the substitution $\mathbf{h}:=\mathbf{x_{[s+1]}^{(1)}}-\mathbf{x_{[s+1]}^{(0)}}$ and $z:=x_{1}^{(0)}+\dots+x_{s+1}^{(0)}+\psi_{1}(\mathbf{0},\mathbf{y})$ . Given $\mathbf{h}$ , $z$ , $\mathbf{x_{[s]}^{(0)}}$ and $\mathbf{y}$ one can recover $\mathbf{x_{[s+1]}^{(0)}}$ , $\mathbf{x_{[s+1]}^{(1)}}$ and $\mathbf{y}$ , so the change of variables is invertible. Therefore we may bound (15) above by a constant (the Jacobian of the change of variables) times

[TABLE]

where $P_{\mathbf{a},N}(z,\mathbf{h})$ is equal to

[TABLE]

for some linear functions $\varphi_{\mathfrak{t}}:\mathbb{R}^{d+s+1}\longrightarrow\mathbb{R}$ .

To be precise, if $\mathfrak{t}=(C,\boldsymbol{\omega},i)$ then the expression $\varphi_{\mathfrak{t}}(z,\mathbf{h},\mathbf{x_{[s]}^{(0)}},\mathbf{y})$ is equal to

[TABLE]

where

[TABLE]

This expression is analogous to expression (C.14) of [13]. We let $\mathbf{c}(z,\mathbf{h})\in\mathbb{R}^{I_{[s+1]}}$ denote the vector $(c(z,\mathbf{h})_{\mathfrak{t}})_{\mathfrak{t}\in I_{[s+1]}}$ . Most fortunately, the exact structure of the linear maps $\varphi_{\mathfrak{t}}$ , save for the fact that they form a system with finite Cauchy-Schwarz complexity, will be unimportant.
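The invertibility of the substitution $\mathbf{h}:=\mathbf{x_{[s+1]}^{(1)}}-\mathbf{x_{[s+1]}^{(0)}}$ , $z:=x_{1}^{(0)}+\dots+x_{s+1}^{(0)}+\psi_{1}(\mathbf{0},\mathbf{y})$ used above can be made concrete. The sketch below relies on the normalisation that $\psi_{1}$ has coefficient $1$ on each of the first $s+1$ variables, and it represents $\psi_{1}(\mathbf{0},\mathbf{y})$ by a hypothetical coefficient vector `y_coeffs`. All numerical values are arbitrary.

```python
from fractions import Fraction


def psi1_at_zero(y, y_coeffs):
    """psi_1(0, y): the part of psi_1 that depends on y (hypothetical coefficients)."""
    return sum(c * v for c, v in zip(y_coeffs, y))


def forward(x0, x1, y, y_coeffs):
    """Map (x^(0), x^(1), y) to (z, h, first s coordinates of x^(0), y)."""
    h = [b - a for a, b in zip(x0, x1)]
    z = sum(x0) + psi1_at_zero(y, y_coeffs)
    return z, h, x0[:-1], y


def inverse(z, h, x0_head, y, y_coeffs):
    """Recover (x^(0), x^(1), y) from (z, h, first s coordinates of x^(0), y)."""
    last = z - psi1_at_zero(y, y_coeffs) - sum(x0_head)
    x0 = list(x0_head) + [last]
    x1 = [a + d for a, d in zip(x0, h)]
    return x0, x1, y


# Example with s + 1 = 3 and a two-dimensional y.
x0 = [Fraction(1, 2), Fraction(-3), Fraction(2)]
x1 = [Fraction(4), Fraction(1, 3), Fraction(-1)]
y = [Fraction(2), Fraction(5)]
y_coeffs = [Fraction(1, 7), Fraction(-2)]

z, h, x0_head, y_out = forward(x0, x1, y, y_coeffs)
recovered = inverse(z, h, x0_head, y_out, y_coeffs)
```

The last coordinate of $\mathbf{x^{(0)}}$ is the only one that has to be solved for, and it enters $z$ with coefficient $1$ , which is why the change of variables is linear and invertible with constant Jacobian.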

Following the philosophy of [11] and [13], our next manoeuvre will be to replace $P_{\mathbf{a},N}(z,\mathbf{h})$ with a simpler function. To that end, let $w^{*}:\mathbb{N}\longrightarrow\mathbb{R}_{\geqslant 0}$ be a function for which $w^{*}(N)\rightarrow\infty$ as $N\rightarrow\infty$ and $w^{*}(n)\leqslant w(n)$ for all $n$ . Recall from Section 4 that $W^{*}=W^{*}(N)=\prod_{p\leqslant w^{*}(N)}p$ .

Lemma 15.3 (Comparing $P_{\mathbf{a},N}(z,\mathbf{h})$ and $Q_{\mathbf{a},N}(z,\mathbf{h})$ ).

Define $Q_{\mathbf{a},N}(z,\mathbf{h})$ to be equal to

[TABLE]

where $b((z,\mathbf{h},\mathbf{x_{[s]}^{(0)}},\mathbf{y})/N)$ here denotes the same function as is present in (15.13). Then expression (15.12) is equal to

[TABLE]

where the $o(1)$ may depend on the function $w^{*}$ .

Proof.

Considering the upper bound $|g_{1}(x)|\leqslant(\nu_{N,w}^{\gamma}\ast\chi)(x+a_{1})$ , it suffices to show that

[TABLE]

is $o(1)$ . By Cauchy-Schwarz, it then suffices to show that both

[TABLE]

and

[TABLE]

The bound (15.16) is immediate from Corollary 10.4. To prove (15.17), expanding out the square we must consider three expressions. One of them is

[TABLE]

When multiplied out, (15.18) is equal to the large expression

[TABLE]

By applying Corollary 10.3 to the above expression, we may replace the functions $\nu_{N,w}^{\gamma}\ast\chi$ with $\Lambda_{\mathbb{Z}/W^{*}\mathbb{Z}}\ast\chi$ , up to an $o(1)$ error.

It is worth noting why the application of Corollary 10.3 is valid. Indeed, the underlying set of linear forms is given by (for each $\mathfrak{t}\in I_{[s+1]}$ )

[TABLE]

We need this linear map to have algebraic coefficients and to have finite Cauchy-Schwarz complexity. Algebraicity follows by the assumptions in the statement of Theorem 15.1. Establishing finite Cauchy-Schwarz complexity is rather involved, but fortunately this has already been done by Green and Tao, on pages 1826 and 1827 of [13], in the analysis of expression (C.14).

Replacing (15.18) with one of the other two terms that arises from expanding out the square in (15.17), and performing the same estimation, the lemma follows. ∎

Let us take stock. As a reminder, we are trying to establish that (15.8) holds. Lemma 15.3 above reduces matters to choosing some function $w^{*}$ that tends to infinity for which the bound

[TABLE]

holds. If $Q_{\mathbf{a},N}(z,\mathbf{h})$ were identically equal to $1$ , then expression (15.20) would be of the order of $\|g_{1}\|_{U^{s+1}(\mathbb{R},2N)}$ , and hence be $o(1)$ by the hypotheses of Theorem 15.1. Of course $Q_{\mathbf{a},N}(z,\mathbf{h})$ is not identically equal to $1$ , but we do observe that $Q_{\mathbf{a},N}(z,\mathbf{h})$ is a function of the form considered in Lemma 11.1. Indeed, consulting the definition of $Q_{\mathbf{a},N}(z,\mathbf{h})$ in (15.14), the following table shows which objects in Lemma 11.1 correspond to which objects concerning the definition of $Q_{\mathbf{a},N}(z,\mathbf{h})$ .

[TABLE]

From Lemma 11.1, we therefore know that there exists some function $f_{1}:\mathbb{Z}^{|I_{[s+1]}|}\longrightarrow\mathbb{C}$ satisfying $\|f_{1}\|_{\infty}\ll(\log\log W^{*})^{O(1)}$ for which

[TABLE]

Therefore, one gets an upper bound for the left-hand side of (15.20), namely

[TABLE]

plus an error of size

[TABLE]

By Corollary 10.4, the size of term (15.22) is $o(1)$ . To analyse (15) we apply Lemma B.4 of [21]. Since the function $b_{\mathbf{a},N}$ is Lipschitz this means that for all $Y>2$ there exists a complex valued function $f_{\mathbf{a},N,2}$ such that $\|f_{\mathbf{a},N,2}\|_{\infty}\ll 1$ and for all $(z,\mathbf{h})$ one has

[TABLE]

Choosing $Y$ to be a suitably large power of $\log\log W^{*}$ , (15) may be bounded above by

[TABLE]

plus an error of size

[TABLE]

Using Corollary 10.4 as above, expression (15.24) is $o(1)$ .

The term (15) may be analysed using standard methods. Indeed, by shifting the variable $z$ (and noting that $\mathbf{c}(z,\mathbf{h})$ is a linear function of $z$ and $\mathbf{h}$ ) we may assume that $a_{1}=0$ . Then, by spreading the exponential functions across the different instances of $g_{1}$ , we see it suffices to show that

[TABLE]

where each function $g_{\boldsymbol{\omega}}$ is of the form

[TABLE]

for some $\lambda_{\boldsymbol{\omega}}\in\mathbb{R}$ .

The argument is nearly complete. Considering expression (13.8), for each $\boldsymbol{\omega}$ we observe that

[TABLE]

So, by the Gowers-Cauchy-Schwarz inequality (recorded in this setting as Proposition A.4 of [21]), the left-hand side of expression (15.25) is

[TABLE]

If $w^{*}$ grows slowly enough, this expression is $o(1)$ .
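The reason linear phases cause no loss in this step is that Gowers norms of order at least two are unchanged when a function is multiplied by $e(\lambda x)$ . The sketch below checks this for a discrete $U^{2}$ -type sum over the integers. It assumes, as the surrounding text suggests, that each $g_{\boldsymbol{\omega}}$ is $g_{1}$ multiplied by a linear phase, and it uses a finitely supported sequence in place of a function on $\mathbb{R}$ . The names `u2_sum` and `modulate` are ours.

```python
import cmath
import math


def u2_sum(f):
    """Sum over integers x, h1, h2 of f(x) conj(f(x+h1)) conj(f(x+h2)) f(x+h1+h2).

    Here f is a finitely supported function on Z, given by its values on 0..N-1.
    """
    N = len(f)

    def at(n):
        return f[n] if 0 <= n < N else 0

    shifts = range(-N + 1, N)
    total = sum(
        at(x) * at(x + h1).conjugate() * at(x + h2).conjugate() * at(x + h1 + h2)
        for x in range(N)
        for h1 in shifts
        for h2 in shifts
    )
    return total.real


def modulate(f, lam: float):
    """Multiply f(n) by the linear phase e(lam * n) = exp(2 pi i lam n)."""
    return [v * cmath.exp(2j * math.pi * lam * n) for n, v in enumerate(f)]


f = [1.0, 0.5, -2.0, 3.0, 0.25, -1.0, 2.0, 1.5]
g = modulate(f, math.sqrt(2))  # an arbitrary real frequency

unmodulated = u2_sum(f)
modulated = u2_sum(g)
```

The four phases attached to $x$ , $x+h_{1}$ , $x+h_{2}$ and $x+h_{1}+h_{2}$ cancel exactly, because $x-(x+h_{1})-(x+h_{2})+(x+h_{1}+h_{2})=0$ , so the two sums agree up to rounding.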

We have therefore established the upper bound (15.20), and so, by our long sequence of deductions, Theorem 15.1 is finally proved. ∎

16. Combining the lemmas

With all the previous lemmas in hand, we may finally prove Theorem 12.1 (and hence prove Theorem 1.16).

Proof of Theorem 12.1.

Assume the hypotheses of the theorem, fixing a suitably small value of $\gamma$ .

By applying Lemma 14.1 and Lemma 14.2, we conclude that there is some $s=O(1)$ and some $d^{\prime}=O(1)$ for which $|\widetilde{T}_{F,G,N}^{L,\mathbf{v}}(f_{1}\ast\chi,\dots,f_{d}\ast\chi)|$ is

[TABLE]

where $(\psi_{1}^{\prime},\dots,\psi_{d}^{\prime})=\Psi^{\prime}:\mathbb{R}^{d^{\prime}}\longrightarrow\mathbb{R}^{d}$ is in $s$ -normal form, $F_{2}:\mathbb{R}^{d^{\prime}}\longrightarrow[0,1]$ has Lipschitz constant $O_{L}(\sigma^{-1})$ and $\operatorname{Rad}(F_{2})=O_{C,L,\varepsilon}(1)$ , and each $a_{j}$ satisfies $|a_{j}|=O_{C,L,\varepsilon}(N)$ . Taking this value of $s$ in the hypotheses of Theorem 12.1, without loss of generality we may assume that

[TABLE]

as $N\rightarrow\infty$ .

Then we may apply Theorem 15.1 to expression (16.1). Indeed, by rescaling the variable $\mathbf{x}$ we may assume that $F_{2}$ is supported on $[-1,1]^{d^{\prime}}$ . For each $j\in[d]$ we set

[TABLE]

Provided $\eta$ is small enough, by combining (16.2) and Lemma 13.3 we deduce that

[TABLE]

as $N\rightarrow\infty$ . So Theorem 15.1 may indeed be applied, which yields

[TABLE]

as $N\rightarrow\infty$ .

But then, combining the estimate (16.3) with Lemma 13.2, one derives the bound

[TABLE]

Choosing $\eta=\eta(N)$ to be a function tending to zero suitably slowly with $N$ , we conclude that

[TABLE]

This is the conclusion of Theorem 12.1, and we are done. ∎

From the work in Section 12, this means that Theorem 1.16, the main result of this paper, is finally settled. ∎

Part VI Final deductions

17. Removing Lipschitz cut-offs

In this section we assume Theorem 1.16, and deduce Theorem 1.7. This deduction will be a routine matter of removing Lipschitz cut-offs.

Lemma 17.1.

Assume the hypotheses of Theorem 1.7. Let $\delta$ be a real number in the range $0<\delta<1/2$ and let $I\subset[0,1]$ be an interval of length $\delta$ . Then

[TABLE]

The reader will note that this lemma is a slight refinement of Corollary 9.15.

Proof.

Fix some $i\leqslant d$ . Let $F:\mathbb{R}^{d}\longrightarrow[0,1]$ be a smooth function in $\mathcal{C}(\delta)$ , supported on $\{\mathbf{x}\in[-1,2]^{d}:x_{i}\in I+[-\delta,\delta]\}$ , that majorises the indicator function of the set $\{\mathbf{x}\in[0,1]^{d}:x_{i}\in I\}$ . Let $G:\mathbb{R}^{m}\longrightarrow[0,1]$ be some smooth function in $\mathcal{C}(\varepsilon)$ , supported on $[-2\varepsilon,2\varepsilon]^{m}$ , that majorises $1_{[-\varepsilon,\varepsilon]^{m}}$ . Let $\gamma$ be small enough in terms of $L$ . Then, by Theorem 9.5 and Lemma 9.11,

[TABLE]

Since $L\notin V_{\operatorname{degen}}^{\ast}(m,d)$ , for all $d$ of the coordinate subspaces $U\leqslant\mathbb{R}^{d}$ of dimension $d-1$ the map $L|_{U}:U\longrightarrow\mathbb{R}^{m}$ is surjective. We may therefore apply Lemma A.3, and conclude that expression (17) is $O_{L}(\delta\varepsilon^{m})+o_{C,L,\delta,\varepsilon,\gamma}(1)$ . The lemma is proved, after having fixed a suitable $\gamma$ . ∎

Lemma 17.2.

Under the hypotheses of Theorem 1.7,

[TABLE]

Proof.

Let $\delta$ be a positive parameter in the range $(0,1/2)$ , to be chosen later. Let us first consider

[TABLE]

Let $F^{\pm\delta}:\mathbb{R}^{d}\longrightarrow[0,1]$ be two Lipschitz functions satisfying

[TABLE]

with Lipschitz constants depending only on $\delta$ . Let $G^{\pm\delta}:\mathbb{R}^{m}\longrightarrow[0,1]$ be two Lipschitz functions satisfying

[TABLE]

with Lipschitz constants depending only on $\delta$. (The existence of such functions is immediate by interpolating linearly, or by appealing to the results of Section 3.) Then we have

[TABLE]

By Theorem 1.16, the lower bound in (17) is equal to

[TABLE]

since we may replace $\Lambda_{\mathbb{Z}/W\mathbb{Z}}^{+}$ with $\Lambda_{\mathbb{Z}/W\mathbb{Z}}$ as $F$ is supported on $[0,1]^{d}$ . By Lemma 9.11, and the properties of the support of $F^{-\delta}$ and $G^{-\delta}$ , this is at least

[TABLE]

Note that the singular series $\mathfrak{S}$ is equal to $1$ in this instance, since $L$ is purely irrational. By Lemma A.4, expression (17.3) is at least

[TABLE]

By performing an analogous manipulation with the upper bound, we may conclude that

[TABLE]

is equal to

[TABLE]

Therefore, by Lemma 17.1, we have that

[TABLE]

is equal to

[TABLE]

Letting $\delta$ be a function of $N$ , tending to zero suitably slowly as $N$ tends to infinity, the lemma follows. ∎

Establishing Theorem 1.7 as given, i.e. establishing Lemma 17.2 without the log weighting, is standard. To spell it out, Lemma 17.2 implies that, for any $\delta$ in the range $0<\delta<1/2$,

[TABLE]

But also, from expression (17.4)

[TABLE]

By Lemma A.1,

[TABLE]

Hence, choosing $\delta$ to be a function of $N$ tending to zero suitably slowly, combining bounds (17) and (17) establishes Theorem 1.7. ∎
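The passage from the log-weighted count to the unweighted count rests on the fact that $\log p$ is nearly constant on $[\delta N,N]$. The following worked inequality is a sketch of that standard observation, in our own notation; it assumes only that each prime $p_{i}$ lies in $[\delta N,N]$ and that $N>1/\delta$.

```latex
% Sketch (our notation): primes p_1,\dots,p_d in [\delta N, N], with 0<\delta<1/2 and N>1/\delta.
\log N-\log(1/\delta)\;\leqslant\;\log p_{i}\;\leqslant\;\log N
\qquad\Longrightarrow\qquad
\Big(1-\frac{\log(1/\delta)}{\log N}\Big)^{d}(\log N)^{d}
\;\leqslant\;\prod_{i=1}^{d}\log p_{i}\;\leqslant\;(\log N)^{d}.
```

If $\delta=\delta(N)$ tends to zero slowly enough that $\log(1/\delta)=o(\log N)$, the left-hand factor is $1-o(1)$, so on this range the weight $\prod_{i}\log p_{i}$ may be replaced by $(\log N)^{d}$ at the cost of a factor $1+o(1)$. Tuples with some $p_{i}<\delta N$ must be bounded separately.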

Part VII Appendices

Appendix A Estimating integrals

In this appendix we include the lemmas that help us estimate the ‘global factor’ from Theorem 1.7, namely

[TABLE]

Lemma A.1 (Upper bound).

Let $h$ be a natural number, let $m$ be a non-negative integer, and let $C,K$ be positive constants. Let $L:\mathbb{R}^{h}\longrightarrow\mathbb{R}^{m}$ be a surjective linear map. Let $F:\mathbb{R}^{h}\longrightarrow[0,1]$ and $G:\mathbb{R}^{m}\longrightarrow[0,1]$ be any compactly supported measurable functions, and assume that $F$ is supported on a box of the form $\mathbf{x^{(0)}}+[-C,C]^{h}$ and $G$ is supported on a box of the form $\mathbf{y^{(0)}}+[-K,K]^{m}$ . Then

[TABLE]

Proof.

Split $\mathbb{R}^{h}$ as a direct sum $(\ker L)\oplus(\ker L)^{\perp}$ . Observe that $L|_{(\ker L)^{\perp}}$ is an injective linear map, so has bounded inverse. Hence the integrand in (A.1) is zero unless $\mathbf{x}|_{(\ker L)^{\perp}}$ is contained within a region which has volume $O_{L}(K^{m})$ . The integrand is also zero unless $\mathbf{x}|_{(\ker L)}$ is contained within a region which has volume $O_{L}(C^{h-m})$ . Together, these observations combine to give the required bound. ∎
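As a sanity check on the shape of this bound, here is a minimal worked instance with $h=2$ and $m=1$. It assumes that the integral in (A.1) has the form $\int F(\mathbf{x})G(L\mathbf{x})\,d\mathbf{x}$, as the proof suggests; the specific map and the indicator functions are illustrative choices, not taken from the paper.

```latex
% Illustrative choice: L(x_1,x_2)=x_1+\sqrt{2}\,x_2, F=1_{[-C,C]^2}, G=1_{[-K,K]}.
\int_{\mathbb{R}^{2}}F(\mathbf{x})\,G(L\mathbf{x})\,d\mathbf{x}
=\int_{-C}^{C}\Big|\big\{x_{1}\in[-C,C]:\ |x_{1}+\sqrt{2}\,x_{2}|\leqslant K\big\}\Big|\,dx_{2}
\leqslant\int_{-C}^{C}2K\,dx_{2}
=4CK .
```

This has the same shape as the volume count in the proof, namely $O_{L}(C^{h-m}K^{m})$ with $h-m=m=1$.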

Lemma A.2.

Let $N$ , $m$ , $d$ be natural numbers, with $d\geqslant m+1$ , and let $\varepsilon$ be a positive parameter. Let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a surjective purely irrational linear map. Let $\mathbf{v}\in\mathbb{R}^{m}$ be any vector. Then there exists a parameter $C_{L,\mathbf{v}/N}$ satisfying $|C_{L,\mathbf{v}/N}|=O_{L}(1)$ such that

[TABLE]

Furthermore, if $\|\mathbf{v}\|_{\infty}=o(N)$ then there exists a constant $C_{L}$ , independent of $\mathbf{v}$ and $N$ , for which

[TABLE]

Proof.

Since $L$ has rank $m$ , without loss of generality we may assume that the first $m$ columns of $L$ form an invertible submatrix $M$ . So

[TABLE]

The first $m$ columns of $M^{-1}L$ form the identity matrix, and so this expression is equal to

[TABLE]

where $\mathbf{a^{(j)}}\in\mathbb{R}^{m}$ is the $j^{th}$ column of the matrix $M^{-1}L$ .
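To make the normalisation concrete, the following toy computation carries it out for one map with $m=2$ and $d=3$. The matrix below is a hypothetical example chosen for easy arithmetic, not one considered in the paper.

```latex
% Hypothetical example with m=2, d=3; M is the submatrix formed by the first two columns of L.
L=\begin{pmatrix}1&1&\sqrt{2}\\ 1&2&\sqrt{3}\end{pmatrix},\qquad
M=\begin{pmatrix}1&1\\ 1&2\end{pmatrix},\qquad
M^{-1}=\begin{pmatrix}2&-1\\ -1&1\end{pmatrix},
\qquad
M^{-1}L=\begin{pmatrix}1&0&2\sqrt{2}-\sqrt{3}\\ 0&1&\sqrt{3}-\sqrt{2}\end{pmatrix}.
```

Here $\mathbf{a^{(3)}}=(2\sqrt{2}-\sqrt{3},\ \sqrt{3}-\sqrt{2})^{T}$, and both of its coordinates are non-zero. This $L$ is purely irrational: if $L^{T}\boldsymbol{\alpha}\in\mathbb{Z}^{3}$ then the first two coordinates force $\alpha_{1},\alpha_{2}\in\mathbb{Z}$, and then $\sqrt{2}\,\alpha_{1}+\sqrt{3}\,\alpha_{2}\in\mathbb{Z}$ forces $\boldsymbol{\alpha}=\mathbf{0}$.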

If the vector

[TABLE]

lies in $[0,1]^{m}$ , then unless it lies close to the boundary of $[0,1]^{m}$ we have

[TABLE]

More precisely, letting $C_{1}$ be a constant that is sufficiently large in terms of $L$ , we have that expression (A.3) is equal to

[TABLE]

plus an error term of size at most

[TABLE]

where $\int^{*}$ indicates integration over those $\mathbf{x_{m+1}^{d}}\in[0,1]^{d-m}$ for which

[TABLE]

We remind the reader that $\partial$ refers to the topological boundary.

Define

[TABLE]

and

[TABLE]

Then certainly $|C_{L,\mathbf{v}/N}|=O_{L}(1)$ . To prove the first part of the lemma it then suffices to control the error term (A.5). Let $\Phi:\mathbb{R}^{d-m}\longrightarrow\mathbb{R}^{m}$ denote the map

[TABLE]

For all $i\leqslant m$ , let $\pi_{i}:\mathbb{R}^{m}\longrightarrow\mathbb{R}$ denote projection onto the $i^{th}$ coordinate. Then the size of the error term (A.5) is

[TABLE]

where the supremum is over intervals $I$ .

Since $L$ is purely irrational (and so $M^{-1}L$ is also purely irrational), for all $i\leqslant m$ the linear map $\pi_{i}\circ\Phi:\mathbb{R}^{d-m}\longrightarrow\mathbb{R}$ is non-zero. From this we conclude that (A.5) has size at most

[TABLE]

from which the first part of the lemma follows.
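The step from "non-zero linear map" to a bound proportional to the length of $I$ can be spelled out as follows. This is a sketch for a general non-zero linear form $\ell$ on $\mathbb{R}^{n}$ (with $n=d-m$ in the application); the symbols $\ell$, $c_{k}$ and $k_{0}$ are introduced here for illustration only.

```latex
% Sketch: \ell(\mathbf{x})=\sum_{k\leqslant n}c_{k}x_{k} with c_{k_0}\neq 0, and I an interval of length |I|.
\operatorname{vol}\big\{\mathbf{x}\in[0,1]^{n}:\ \ell(\mathbf{x})\in I\big\}
=\int_{[0,1]^{n-1}}\Big|\Big\{x_{k_{0}}\in[0,1]:\ c_{k_{0}}x_{k_{0}}\in I-\sum_{k\neq k_{0}}c_{k}x_{k}\Big\}\Big|\,\prod_{k\neq k_{0}}dx_{k}
\leqslant\frac{|I|}{|c_{k_{0}}|}.
```

The bound is uniform over all intervals $I$ of a given length, which is what the supremum over $I$ requires, and the implied constant depends only on the coefficients of $\ell$.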

For the second part, assume that $\|\mathbf{v}\|_{\infty}=o(N)$ . Then note that

[TABLE]

where $\int^{**}$ indicates integration over those $\mathbf{x_{m+1}^{d}}\in[0,1]^{d-m}$ for which

[TABLE]

One can estimate (A.7) by exactly the same procedure as was used to estimate (A.5), and thereby conclude that (A.7) is $o_{L}(1)$ . This settles the second part of the lemma. ∎

The next lemma concerns the global factor when one of the variables is restricted to a short interval.

Lemma A.3.

Let $N,m,d$ be natural numbers, with $d\geqslant m+1$, and let $\varepsilon,\delta$ be positive parameters. Let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a surjective linear map, and assume that for all $d$ of the coordinate subspaces $U\leqslant\mathbb{R}^{d}$ of dimension $d-1$ (a coordinate subspace of $\mathbb{R}^{d}$ being a subspace generated by a subset of the standard basis vectors), the map $L|_{U}:U\longrightarrow\mathbb{R}^{m}$ is also surjective. Let $\mathbf{v}\in\mathbb{R}^{m}$ be any vector, and let $I\subset\mathbb{R}$ be an interval of length $\delta N$. Fix a coordinate $j\in[d]$. Then

[TABLE]

Proof.

By the assumptions of the surjectivity of the restrictions of $L$ , without loss of generality we may assume that $j=d$ and that the first $m$ columns of $L$ form an invertible matrix $M$ . Then, by integrating over $x_{m+1},\dots,x_{d}$ , one has

[TABLE]

since $M$ is invertible. ∎
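A small example may help to show why surjectivity on the coordinate subspaces is needed. It assumes that the lemma bounds the volume of the set of $\mathbf{x}\in[0,N]^{d}$ with $x_{j}\in I$ and $\|L\mathbf{x}+\mathbf{v}\|_{\infty}\leqslant\varepsilon$; this reading, and the two maps chosen, are illustrative assumptions.

```latex
% Assumed setting: d=2, m=1, j=2, v=0, I=[0,\delta N] with \delta N\leqslant\varepsilon<N.
% (i) L(x_1,x_2)=x_1+\sqrt{2}\,x_2: the restriction to \{x_2=0\} is surjective.
\operatorname{vol}\{\mathbf{x}\in[0,N]^{2}:\ x_{2}\in I,\ |x_{1}+\sqrt{2}\,x_{2}|\leqslant\varepsilon\}
\leqslant\int_{I}2\varepsilon\,dx_{2}=2\delta\varepsilon N .
% (ii) L(x_1,x_2)=x_2: the restriction to \{x_2=0\} is zero, so not surjective.
\operatorname{vol}\{\mathbf{x}\in[0,N]^{2}:\ x_{2}\in I,\ |x_{2}|\leqslant\varepsilon\}
=N\cdot\delta N=\delta N^{2} .
```

In case (i) the volume is $O(\delta\varepsilon^{m}N^{d-m})$, which is consistent with how the lemma is applied in the proof of Lemma 17.1. In case (ii) the volume exceeds $\delta\varepsilon N$ by a factor $N/\varepsilon$, so no bound of that shape can hold without the surjectivity assumption.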

The final lemma of this section details what occurs when one permutes the parameters in the global factor.

Lemma A.4.

Let $N$ , $m$ , $d$ be natural numbers, with $d\geqslant m+1$ , and let $\delta,\varepsilon$ be positive parameters. Assume further that $\delta<1/2$ . Let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a surjective purely irrational linear map, and assume that for all $d$ of the coordinate subspaces $U\leqslant\mathbb{R}^{d}$ of dimension $d-1$ , the map $L|_{U}:U\longrightarrow\mathbb{R}^{m}$ is also surjective. Let $\mathbf{v}\in\mathbb{R}^{m}$ be any vector. Then

[TABLE]

The proof of this lemma is very similar to the proof of Lemma A.2. We merely sketch the relevant changes.

Proof.

By Lemma A.3 one may replace the left-hand side of (A.4) by

[TABLE]

Let $C_{1}$ be a suitably large constant (depending on $L$ ). Following the procedure in the proof of Lemma A.2, and with the same notation for $M$ and $\mathbf{a^{(j)}}$ , one establishes that the left-hand side of (A.4) is equal to

[TABLE]

plus an error of size at most

[TABLE]

where $\int^{*}$ indicates integration over those $\mathbf{x_{m+1}^{d}}\in[-1,2]^{d-m}$ for which

[TABLE]

The error (A.10) may be bounded above by $O_{L}((\delta+\varepsilon/N)\varepsilon^{m}N^{d-m})$, by the same method as we used to bound (A.5).

The main term (A.9), from the work in Lemma A.2, is equal to

[TABLE]

Bounding this integral using Lemma A.1, the present lemma follows. ∎

Remark A.5.

In the proofs above, we used, in a critical way, the fact that the convex domains $[-\varepsilon,\varepsilon]^{m}$ and $[0,N]^{d}$ are axis-parallel boxes.

Appendix B An analytic argument

We take the opportunity to record a rather more direct argument which yields an asymptotic formula for expressions of the form

[TABLE]

in the case when $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ is a linear map with $d$ at least $2m+1$ (and certain irrationality conditions hold). This method is a simple elaboration on Parsell’s ideas [18], and can handle more general coefficients than Theorem 1.7 (although it requires more variables, of course). We suspect that this result has been obvious to the experts for fifteen years or more, but we feel that it should appear explicitly in the literature.

Theorem B.1.

Let $N,m,d$ be natural numbers, with $d\geqslant 2m+1$, and let $\varepsilon$ be a positive parameter. Let $L:\mathbb{R}^{d}\longrightarrow\mathbb{R}^{m}$ be a surjective linear map. Assume further that, when written as a matrix with respect to the standard bases, all the $m$-by-$m$ sub-matrices of $L$ have non-zero determinant. Assume also that there does not exist a vector $\boldsymbol{\alpha}\in\mathbb{R}^{m}\setminus\{\mathbf{0}\}$ such that $L^{T}\boldsymbol{\alpha}\in\mathbb{Z}^{d}$ (i.e. in the language of Definition 5.2, assume that $L$ is purely irrational). Then, for all vectors $\mathbf{v}\in\mathbb{R}^{m}$,

[TABLE]
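The irrationality hypothesis is not implied by the entries of $L$ being irrational. The two maps below, with $m=1$ and $d=3$, are illustrative examples (not from the paper) on either side of the condition.

```latex
% Illustrative examples with m=1, d=3 (so d\geqslant 2m+1).
% Purely irrational: L_1^{T}\alpha\in\mathbb{Z}^{3} forces \alpha\in\mathbb{Z} and \sqrt{2}\,\alpha\in\mathbb{Z}, hence \alpha=0.
L_{1}=\begin{pmatrix}1&\sqrt{2}&\sqrt{3}\end{pmatrix},\qquad
L_{1}^{T}\alpha=\begin{pmatrix}\alpha\\ \sqrt{2}\,\alpha\\ \sqrt{3}\,\alpha\end{pmatrix}.
% Not purely irrational: \alpha=1/\sqrt{2} gives an integer vector.
L_{2}=\begin{pmatrix}\sqrt{2}&2\sqrt{2}&3\sqrt{2}\end{pmatrix},\qquad
L_{2}^{T}\cdot\tfrac{1}{\sqrt{2}}=\begin{pmatrix}1\\ 2\\ 3\end{pmatrix}\in\mathbb{Z}^{3}.
```

Both maps are surjective and have all $1$-by-$1$ sub-matrices non-zero, so only the final hypothesis of the theorem distinguishes them. For $L_{2}$, the inequality $|L_{2}\mathbf{p}+v|\leqslant\varepsilon$ is a rescaling of an inequality for the integer form $p_{1}+2p_{2}+3p_{3}$.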

Remark B.2.

The asymptotic size of the main term may easily be established using Lemma A.2.

Sketch proof.

We sketch the argument, referring heavily to estimates from [7] and [18]. Define

[TABLE]

Let $T(N)$ be a function that tends to infinity as $N$ tends to infinity, to be defined later, and let $\delta=\delta(N)$ be a function (depending on the function $T$ ) that tends to zero suitably slowly as $N$ tends to infinity.

By Lemma 3.1, there exist smooth functions $G_{\pm}:\mathbb{R}^{m}\longrightarrow[0,1]$ for which $G_{\pm}\in\mathcal{C}(\delta)$ and

[TABLE]

By Fourier inversion we see that

[TABLE]

where $L=(\lambda_{ij})_{i\leqslant m,\,j\leqslant d}$. We estimate the integrals by splitting the range of integration into three regions:

[TABLE]

for some large constant $B$ . See Section 8 for another instance of this technique.

Much of the estimation relies on the following tight mean value bound.

Lemma B.3.

Let $U\subset\mathbb{R}^{m}$ be a domain. Let $d>2m$ and let $l$ be a positive real number satisfying

[TABLE]

Then

[TABLE]

Proof.

We write the left-hand side of (B.3) as

[TABLE]

which is

[TABLE]

by Hölder’s inequality. Note that $dl/m=\left(\begin{smallmatrix}d\\ m\end{smallmatrix}\right)l/\left(\begin{smallmatrix}d-1\\ m-1\end{smallmatrix}\right)>2$ .
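The exponent identity comes from regrouping a product over indices $j\leqslant d$ as a product over $m$-element subsets $S\subset[d]$. The following sketch records the counting, written for arbitrary non-negative reals $x_{1},\dots,x_{d}$, which are a notational stand-in for the factors in (B.3).

```latex
% Each j\in[d] lies in exactly \binom{d-1}{m-1} of the \binom{d}{m} subsets S with |S|=m.
\prod_{j=1}^{d}x_{j}^{\,l}
=\prod_{\substack{S\subset[d]\\ |S|=m}}\ \prod_{j\in S}x_{j}^{\,l/\binom{d-1}{m-1}},
\qquad
\binom{d}{m}=\frac{d}{m}\binom{d-1}{m-1}
\ \Longrightarrow\
\frac{\binom{d}{m}\,l}{\binom{d-1}{m-1}}=\frac{dl}{m}.
```

Hölder's inequality over the $\binom{d}{m}$ factors indexed by $S$ then raises each inner product to the power $\binom{d}{m}$, producing the exponent $dl/m$ on each $x_{j}$. For instance, with $d=5$ and $m=2$ one has $\binom{5}{2}=10$ and $\binom{4}{1}=4$, so the exponent is $10l/4=5l/2$.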

Now recall the bound from [18, Lemma 3], namely

[TABLE]

Note that for each fixed $S$ the $m$ -by- $m$ submatrix of $L$ given by $(\lambda_{i,j})_{i\leqslant m,\,j\in S}$ is invertible. Therefore, by applying an invertible change of variables and splitting $U$ into boxes, (B.4) is

[TABLE]

This implies the lemma. ∎

Trivial arc: The estimation on the trivial arc proceeds very similarly to page 8 of [18]. Indeed, by the bound in Lemma 3.4 we have

[TABLE]

which is

[TABLE]

by Lemma B.3, provided $\delta(N)$ decays slowly enough.

Minor arc: The following is the natural higher-dimensional version of the argument in [18].

Lemma B.4.

For any positive $A$ and $B$ ,

[TABLE]

Proof.

Note that by the prime number theorem one has the trivial bound $|f(\alpha)|\ll N$ . Assuming for contradiction that the lemma is false, there exists some $A$ , some $B$ , and some positive $\varepsilon$ such that, for infinitely many $N$ , there exists a vector $\boldsymbol{\alpha}^{(N)}\in\mathbb{R}^{m}$ satisfying $\frac{\log^{B}N}{N}\leqslant\|\boldsymbol{\alpha}^{(N)}\|_{\infty}\leqslant A$ and

[TABLE]

Then by [18, Lemma 1] it follows that for each such $N$ , and for all $j\in[d]$ , there exist integers $q_{j}^{(N)}$ and $a_{j}^{(N)}$ such that $1\leqslant q_{j}^{(N)}\ll_{\varepsilon}1$ and

[TABLE]

where $|\theta_{j}^{(N)}|\ll_{\varepsilon}N^{-1}$ . We observe that, if $N$ is large enough, we have the bound $a_{j}^{(N)}\ll_{A,L,\varepsilon}1$ . Since $A$ and $\varepsilon$ are fixed we may (by taking a subsequence of $N$ ) assume that both $q_{j}^{(N)}$ and $a_{j}^{(N)}$ are independent of $N$ . We call these integers $q_{j}$ and $a_{j}$ respectively.

Now, suppose that $a_{j}=0$ for all $j$ . Then

[TABLE]

Since the map $L^{T}:\mathbb{R}^{m}\longrightarrow\mathbb{R}^{d}$ is injective, there exists an inverse linear map

[TABLE]

which must necessarily be bounded. Hence $\|\boldsymbol{\alpha}^{(N)}\|_{\infty}\ll_{\varepsilon}N^{-1}$ , which is a contradiction for large enough $N$ , since $\|\boldsymbol{\alpha}^{(N)}\|_{\infty}\gg(\log^{B}N)/N$ . Therefore there exists some $j$ for which $a_{j}\neq 0$ .

Finally, the sequence $\alpha_{i}^{(N)}$ is contained in a compact domain, and so it must have a convergent subsequence with limit $\alpha_{i}$ , say. Taking this limit, we observe that

[TABLE]

and so

[TABLE]

Hence there exists a vector $\boldsymbol{\beta}\in\mathbb{R}^{m}\setminus\{\mathbf{0}\}$ such that $L^{T}\boldsymbol{\beta}\in\mathbb{Z}^{d}$ , contradicting the assumptions of Theorem B.1. This proves the lemma. ∎

In the usual fashion, one may use Lemma B.4 to deduce that there is some slowly growing function $T(N)$, with $T(N)\rightarrow\infty$ as $N\rightarrow\infty$, such that

[TABLE]

the details being given in Section 3 of [18]. Defining the minor arc $\mathfrak{m}$ using this function $T(N)$ , we have exactly

[TABLE]

Therefore, picking some positive parameter $\eta$ that is small enough such that taking $l=1-\eta$ satisfies the hypotheses of Lemma B.3,

[TABLE]

is

[TABLE]

if $T(N)$ grows slowly enough.

Major arc: The analysis of the contribution from the major arc is routine, given the lemmas we established in Appendix A. Let $c$ be a small positive constant whose exact value may change between each line. By the estimate (7) from [18] one has, for $\boldsymbol{\alpha}\in\mathfrak{M}$ ,

[TABLE]

where

[TABLE]

Since the measure of $\mathfrak{M}$ is $O((\log^{mB}N)N^{-m})$ , we have

[TABLE]

Since

[TABLE]

we may extend the above integral to all of $\mathbb{R}^{m}$ at the cost of an error of $O(N^{d-m}\log^{-B}N)$. In other words, the contribution from the major arcs is

[TABLE]

which is

[TABLE]

Fixing a large value of $B$ , since $\delta=o(1)$ this expression is equal to

[TABLE]

by Lemma A.4. This completes the theorem. ∎

Bibliography

[1] A. Baker. On some diophantine inequalities involving primes. J. Reine Angew. Math., 228:166–181, 1967.
[2] Antal Balog. Linear equations in primes. Mathematika, 39(2):367–378, 1992.
[3] P.-Y. Bienvenu. A higher-dimensional Siegel-Walfisz theorem. Acta Arith., 179(1):79–100, 2017.
[4] E. Bombieri, J. B. Friedlander, and H. Iwaniec. Primes in arithmetic progressions to large moduli. II. Math. Ann., 277(3):361–393, 1987.
[5] H. Davenport and H. Heilbronn. On indefinite quadratic forms in five variables. J. London Math. Soc., 21:185–193, 1946.
[6] D. E. Freeman. Asymptotic lower bounds for Diophantine inequalities. Mathematika, 47(1-2):127–159, 2000.
[7] D. E. Freeman. Asymptotic lower bounds and formulas for Diophantine inequalities. In Number theory for the millennium, II (Urbana, IL, 2000), pages 57–74. A K Peters, Natick, MA, 2002.
[8] L. Grafakos. Classical Fourier analysis, volume 249 of Graduate Texts in Mathematics. Springer, New York, second edition, 2008.
