Good Bounds in Certain Systems of True Complexity One

Freddie Manners

arXiv:1705.06801·math.NT·December 31, 2018


TL;DR

This paper proves that multilinear averages associated with certain linear systems in finite fields are controlled by the Gowers U^2 norm, with polynomial bounds, strengthening previous results and avoiding inverse Gowers norm theory.

Contribution

It establishes polynomial bounds for multilinear averages in systems with true complexity 1 and Cauchy-Schwarz complexity 2, using only Cauchy-Schwarz inequalities.

Findings

01

Multilinear averages are controlled by the U^2 norm with polynomial dependence.

02

The bounds strengthen previous results by Gowers and Wolf.

03

The dependence of the controlling constant on the system's coefficients is necessary.

Abstract

Let $\Phi = (\phi_{1}, \dots, \phi_{6})$ be a system of $6$ linear forms in $3$ variables, i.e. $\phi_{i} \colon \mathbb{Z}^{3} \to \mathbb{Z}$ for each $i$. Suppose also that $\Phi$ has Cauchy–Schwarz complexity $2$ and true complexity $1$, in the sense defined by Gowers and Wolf; in fact this is true generically in this setting. Finally let $G = \mathbb{F}_{p}^{n}$ for any $p$ prime and $n \geq 1$. Then we show that multilinear averages by $\Phi$ are controlled by the $U^{2}$-norm, with a polynomial dependence; i.e. if $f_{1}, \dots, f_{6} \colon G \to \mathbb{C}$ are functions with $\|f_{i}\|_{\infty} \leq 1$ for each $i$, then for each $j$, $1 \leq j \leq 6$: \[ \left| \mathbb{E}_{x_1,x_2,x_3 \in G} f_1(\phi_1(x_1,x_2,x_3)) \dots f_6(\phi_6(x_1,x_2,x_3)) \right| \le \|f_j\|_{U^2}^{1/C} \] for some $C > 0$ depending on $\Phi$. This recovers and strengthens a result of Gowers and Wolf in these cases.…

Equations

\[ \left| \mathbb{E}_{x_{1},x_{2},x_{3}\in G} f_{1}(\phi_{1}(x_{1},x_{2},x_{3})) \dots f_{6}(\phi_{6}(x_{1},x_{2},x_{3})) \right| \leq \|f_{j}\|_{U^{2}}^{1/C} \]

\[ \Lambda_{\Phi}(f_{1},\dots,f_{r}) = \mathbb{E}_{x_{1},\dots,x_{d}\in G} f_{1}(\phi_{1}(x_{1},\dots,x_{d})) \dots f_{r}(\phi_{r}(x_{1},\dots,x_{d})). \]

\[ \Lambda_{\Phi}(1_{X},\dots,1_{X}) = \alpha^{r} + o(1) \]

\[ |\Lambda_{\Phi}(f_{1},\dots,f_{r})| = o(1); \]

\[ |\Lambda_{k\mathrm{AP}}(f_{1},\dots,f_{k})| \leq \|f_{j}\|_{U^{k-1}} \]

$f_{1}(x)$, $f_{2}(x)$

\[ |\Lambda_{\Phi}(f_{1},\dots,f_{r})| \leq \|f_{j}\|_{U^{s+1}}. \]

\[ \Lambda_{\Phi}(f_{1},\dots,f_{r}) = 1 \]

\[ |\Lambda_{\Phi}(f_{1},\dots,f_{6})| \leq \|f_{j}\|_{U^{2}(G)}^{1/C} \]

\[ |\Lambda_{\Phi}(f_{1},\dots,f_{6})| \geq 10^{-12} \]

\[ \|f_{j}\|_{U^{2}} \leq p^{-1/8} \]

\[ |\Lambda_{\Phi}(f_{1},\dots,f_{6})| \leq \|f_{j}\|_{U^{2}} \]

\[ |\Lambda_{\Phi}(f_{1},\dots,f_{6})| \leq \|f_{1}\|_{U^{2}}^{1/2}. \]

\[ \Lambda_{\Phi}(f_{1},\dots,f_{6}) = \mathbb{E}_{x\in\mathbb{F}_{p}} f_{6}(x)\, \mathbb{E}_{y,z\in\mathbb{F}_{p}} f_{1}(\phi_{1}(x,y,z)) \dots f_{5}(\phi_{5}(x,y,z)). \]

\[ |\Lambda_{\Phi}(f_{1},\dots,f_{6})| \leq \left(\mathbb{E}_{x} |f_{6}(x)|^{2}\right)^{1/2} \left(\mathbb{E}_{x} \left| \mathbb{E}_{y,z} f_{1}(\phi_{1}(x,y,z)) \dots f_{5}(\phi_{5}(x,y,z)) \right|^{2}\right)^{1/2}. \]

\[ \mathbb{E}_{x,y,y',z,z'} f_{1}(\phi_{1}(x,y,z)) \dots f_{5}(\phi_{5}(x,y,z))\, \overline{f_{1}(\phi_{1}(x,y',z')) \dots f_{5}(\phi_{5}(x,y',z'))} \]

$S$, $T$

\[ \Lambda_{\Phi'}(f_{1},\dots,f_{5},\overline{f_{1}},\dots,\overline{f_{5}}) \leq \|f_{1}\|_{U^{2}} \]

\[ \operatorname{span}(\phi_{2_{0}},\phi_{4_{0}},\phi_{1_{1}},\phi_{2_{1}},\phi_{3_{1}}) = \operatorname{span}(\phi_{2_{0}},\phi_{4_{0}},\phi_{1_{1}},\phi_{2_{1}}) \]

\[ M_{S} = \begin{pmatrix} a_{1} & a_{2} & a_{4} & a_{1} & a_{2} \\ b_{1} & b_{2} & b_{4} & 0 & 0 \\ 0 & 0 & 0 & b_{1} & b_{2} \\ c_{1} & c_{2} & c_{4} & 0 & 0 \\ 0 & 0 & 0 & c_{1} & c_{2} \end{pmatrix} \]

\[ \det M_{S} = \pm \det \begin{pmatrix} a_{1} & a_{2} & a_{4} \\ b_{1} & b_{2} & b_{4} \\ c_{1} & c_{2} & c_{4} \end{pmatrix} \det \begin{pmatrix} 1 & a_{1} & a_{2} \\ 0 & b_{1} & b_{2} \\ 0 & c_{1} & c_{2} \end{pmatrix}. \]

\[ M_{T} = \begin{pmatrix} a_{1} & a_{3} & a_{5} & a_{4} & a_{5} \\ b_{1} & b_{3} & b_{5} & 0 & 0 \\ 0 & 0 & 0 & b_{4} & b_{5} \\ c_{1} & c_{3} & c_{5} & 0 & 0 \\ 0 & 0 & 0 & c_{4} & c_{5} \end{pmatrix} \]

\[ \det M_{T} = \pm \det \begin{pmatrix} a_{1} & a_{3} & a_{5} \\ b_{1} & b_{3} & b_{5} \\ c_{1} & c_{3} & c_{5} \end{pmatrix} \det \begin{pmatrix} 1 & a_{4} & a_{5} \\ 0 & b_{4} & b_{5} \\ 0 & c_{4} & c_{5} \end{pmatrix} \]

$\phi_{1}(x,y,z)$, $\phi_{2}(x,y,z)$, $\phi_{3}(x,y,z)$

\[ \begin{aligned} &\mathbb{E}_{x,y,z} f_{1}(x) f_{2}(y) f_{3}(z) f_{4}(x+y+z) f_{5}(2x+3y+5z) f_{6}(ax+by+cz) \\ &\quad= \mathbb{E}_{x} f_{1}(x)\, \mathbb{E}_{y,z} f_{2}(y) f_{3}(z) f_{4}(x+y+z) f_{5}(2x+3y+5z) f_{6}(ax+by+cz) \\ &\quad\leq \left(\mathbb{E}_{x} |f_{1}(x)|^{2}\right)^{1/2} \Bigg( \mathbb{E}_{x,y_{0},y_{1},z_{0},z_{1}} \begin{aligned}[t] &f_{2}(y_{0}) f_{3}(z_{0}) f_{4}(x+y_{0}+z_{0}) f_{5}(2x+3y_{0}+5z_{0}) f_{6}(ax+by_{0}+cz_{0}) \\ &\overline{f_{2}(y_{1}) f_{3}(z_{1}) f_{4}(x+y_{1}+z_{1}) f_{5}(2x+3y_{1}+5z_{1}) f_{6}(ax+by_{1}+cz_{1})} \Bigg)^{1/2} \end{aligned} \end{aligned} \]

\[ \mathbb{E}_{y_{0},y_{1}} f_{2}(y_{0}) \overline{f_{2}(y_{1})}\, \mathbb{E}_{x,z_{0},z_{1}} f_{3}(z_{0}) f_{4}(x+y_{0}+z_{0}) f_{5}(2x+3y_{0}+5z_{0}) f_{6}(ax+by_{0}+cz_{0})\, \overline{f_{3}(z_{1}) f_{4}(x+y_{1}+z_{1}) f_{5}(2x+3y_{1}+5z_{1}) f_{6}(ax+by_{1}+cz_{1})} \]

\[ \leq \left(\mathbb{E}_{y_{0},y_{1}} |f_{2}(y_{0})|^{2} |f_{2}(y_{1})|^{2}\right)^{1/2} \Bigg( \mathbb{E}_{\substack{y_{0},y_{1},\\ x_{0},x_{1},\\ z_{00},z_{01},\\ z_{10},z_{11}}} \begin{aligned} &f_{3}(z_{00}) f_{4}(x_{0}+y_{0}+z_{00}) f_{5}(2x_{0}+3y_{0}+5z_{00}) f_{6}(ax_{0}+by_{0}+cz_{00}) \\ &\overline{f_{3}(z_{10}) f_{4}(x_{0}+y_{1}+z_{10}) f_{5}(2x_{0}+3y_{1}+5z_{10}) f_{6}(ax_{0}+by_{1}+cz_{10})} \\ &\overline{f_{3}(z_{01}) f_{4}(x_{1}+y_{0}+z_{01}) f_{5}(2x_{1}+3y_{0}+5z_{01}) f_{6}(ax_{1}+by_{0}+cz_{01})} \\ &f_{3}(z_{11}) f_{4}(x_{1}+y_{1}+z_{11}) f_{5}(2x_{1}+3y_{1}+5z_{11}) f_{6}(ax_{1}+by_{1}+cz_{11}) \end{aligned} \Bigg)^{1/2}. \]


Full text

Abstract

Let $\Phi=(\phi_{1},\dots,\phi_{6})$ be a system of $6$ linear forms in $3$ variables, i.e. $\phi_{i}\colon\mathbb{Z}^{3}\to\mathbb{Z}$ for each $i$ . Suppose also that $\Phi$ has Cauchy–Schwarz complexity $2$ and true complexity $1$ , in the sense defined by Gowers and Wolf; in fact this is true generically in this setting. Finally let $G=\mathbb{F}_{p}^{n}$ for any $p$ prime and $n\geq 1$ . Then we show that multilinear averages by $\Phi$ are controlled by the $U^{2}$ -norm, with a polynomial dependence; i.e. if $f_{1},\dots,f_{6}\colon G\to\mathbb{C}$ are functions with $\|f_{i}\|_{\infty}\leq 1$ for each $i$ , then for each $j$ , $1\leq j\leq 6$ :

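\[ \left| \mathbb{E}_{x_{1},x_{2},x_{3}\in G} f_{1}(\phi_{1}(x_{1},x_{2},x_{3})) \dots f_{6}(\phi_{6}(x_{1},x_{2},x_{3})) \right| \leq \|f_{j}\|_{U^{2}}^{1/C} \]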

for some $C>0$ depending on $\Phi$ . This recovers and strengthens a result of Gowers and Wolf in these cases. Moreover, the proof uses only multiple applications of the Cauchy–Schwarz inequality, avoiding appeals to the inverse theory of the Gowers norms.

We also show that some dependence of $C$ on $\Phi$ is necessary; that is, the constant $C$ can unavoidably become large as the coefficients of $\Phi$ grow.

Published in Discrete Analysis, 2018, number 21. Received 27 September 2017; published 28 December 2018. doi:10.19086/da.6814.

1 Introduction

Let $G$ be a finite abelian group and $X\subseteq G$ a subset. Many problems of interest in additive combinatorics and related fields involve counting solutions to some system of equations within $X$. For instance, we might wish to count Schur triples $(x,y,x+y)$ all of whose coordinates lie in $X$, or $4$-term arithmetic progressions $(x,x+h,x+2h,x+3h)$ where each term lies in $X$, etc.

The most general case of this kind of question is as follows: given a tuple of $r$ linear forms $\Phi=(\phi_{1},\dots,\phi_{r})$ where $\phi_{i}\colon\mathbb{Z}^{d}\to\mathbb{Z}$ , and $r$ functions $f_{1},\dots,f_{r}\colon G\to\mathbb{C}$ , estimate the quantity

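\[ \Lambda_{\Phi}(f_{1},\dots,f_{r}) = \mathbb{E}_{x_{1},\dots,x_{d}\in G} f_{1}(\phi_{1}(x_{1},\dots,x_{d})) \dots f_{r}(\phi_{r}(x_{1},\dots,x_{d})). \]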

(Here we have abused notation to let $\phi_{i}\colon\mathbb{Z}^{d}\to\mathbb{Z}$ induce a function $G^{d}\to G$ , in the obvious way.)

So, in our examples above, we take $f_{1}=\dots=f_{r}=1_{X}$ , the indicator function of $X$ ; however it is convenient to allow more general functions in the definition of $\Lambda_{\Phi}$ as they arise in intermediate computations. In our examples, $\Phi$ is as follows:

• in the case of Schur triples, $r=3$, $d=2$, $\phi_{1}(x,y)=x$, $\phi_{2}(x,y)=y$ and $\phi_{3}(x,y)=x+y$;

• in the case of $4$-term arithmetic progressions, $r=4$, $d=2$, $\phi_{1}(x,y)=x$, $\phi_{2}(x,y)=x+y$, $\phi_{3}(x,y)=x+2y$ and $\phi_{4}(x,y)=x+3y$.
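As a concrete illustration of the quantity $\Lambda_{\Phi}$ in these two examples, the following minimal Python sketch evaluates it by brute force over $G=\mathbb{Z}/p\mathbb{Z}$. The function name `multilinear_average`, the encoding of each form as a tuple of integer coefficients, the set $X=\{0,1,2\}$ and the choice $p=7$ are all assumptions made for illustration; the enumeration takes $p^{d}$ steps, so it is only practical for very small $p$.

```python
from itertools import product


def multilinear_average(forms, funcs, p):
    """Brute-force Lambda_Phi(f_1, ..., f_r) over G = Z/pZ.

    forms: list of r coefficient tuples of length d (hypothetical encoding).
    funcs: list of r callables on {0, ..., p-1}.
    """
    d = len(forms[0])
    total = 0
    for xs in product(range(p), repeat=d):
        term = 1
        for coeffs, f in zip(forms, funcs):
            value = sum(c * x for c, x in zip(coeffs, xs)) % p
            term *= f(value)
        total += term
    return total / p ** d


SCHUR = [(1, 0), (0, 1), (1, 1)]             # x, y, x + y
FOUR_AP = [(1, 0), (1, 1), (1, 2), (1, 3)]   # x, x + y, x + 2y, x + 3y


def indicator(subset):
    """Indicator function 1_X of a subset X of Z/pZ."""
    members = set(subset)
    return lambda x: 1 if x in members else 0


p = 7                                        # illustrative prime
one_X = indicator({0, 1, 2})                 # illustrative set X
schur_density = multilinear_average(SCHUR, [one_X] * 3, p)
ap_density = multilinear_average(FOUR_AP, [one_X] * 4, p)
```

Multiplying either density by $|G|^{d}=p^{2}$ recovers the raw count of solutions in $X$.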

A fundamental observation in much recent progress in such questions (as applied to Szemerédi-type theorems, or counting solutions to linear equations in the primes), originally due to Gowers [3], is that averages $\Lambda_{\Phi}$ are controlled by Gowers uniformity norms. [Footnote 1: We will assume the reader is familiar with the definition of Gowers norms and some related concepts; for an introduction, see e.g. [9, Appendix B], [15, Chapter 11], or [14].]

A weak statement of this type is that if $X$ has density $\alpha$ and $X$ is suitably quasirandom in the sense that $\|1_{X}-\alpha\|_{U^{s+1}}=o(1)$ , where $s$ is some positive integer, then

\[ \Lambda_{\Phi}(1_{X},\dots,1_{X}) = \alpha^{r} + o(1) \tag{1} \]

i.e. the number of solutions to the system $\Phi$ in $X$ is roughly the same as the expected count in a random set, namely $\alpha^{r}|G|^{d}$. A stronger type of statement one could make is that if $f_{1},\dots,f_{r}\colon G\to\mathbb{C}$ are any functions with $\|f_{i}\|_{\infty}\leq 1$ for all $i$, and $\|f_{j}\|_{U^{s+1}}=o(1)$ for any one $j\in\{1,\dots,r\}$, then

\[ |\Lambda_{\Phi}(f_{1},\dots,f_{r})| = o(1); \tag{2} \]

and indeed this kind of statement implies the previous one.
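To spell out why the stronger statement implies the weaker one, here is a short worked derivation. It writes $g=1_{X}-\alpha$ (notation introduced here for convenience) and uses only the multilinearity of $\Lambda_{\Phi}$, together with the observation that $\Lambda_{\Phi}(\alpha,\dots,\alpha)=\alpha^{r}$ for the constant function $\alpha$.

```latex
\begin{align*}
\Lambda_{\Phi}(1_{X},\dots,1_{X}) - \alpha^{r}
  &= \Lambda_{\Phi}(1_{X},\dots,1_{X}) - \Lambda_{\Phi}(\alpha,\dots,\alpha) \\
  &= \sum_{k=1}^{r} \Lambda_{\Phi}\bigl(\underbrace{\alpha,\dots,\alpha}_{k-1},\; g,\;
     \underbrace{1_{X},\dots,1_{X}}_{r-k}\bigr).
\end{align*}
```

In the $k$-th term every function is bounded by $1$ in absolute value and the function in position $k$ satisfies $\|g\|_{U^{s+1}}=o(1)$, so the stronger statement applied with $j=k$ shows that the term is $o(1)$. Since $r$ is fixed, the whole sum is $o(1)$.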

The remaining question is when one has such a statement for a system of linear forms $\Phi$ , and if so, how small the positive integer $s$ can be; i.e. how far one has to go in the hierarchy of Gowers norms to control $\Lambda_{\Phi}$ . For instance, for $k$ -term arithmetic progressions, Gowers [3] showed that a statement of type (2) holds for $s=k-2$ , and with a good bound; specifically,

\[ |\Lambda_{k\mathrm{AP}}(f_{1},\dots,f_{k})| \leq \|f_{j}\|_{U^{k-1}} \]

whenever $\|f_{i}\|_{\infty}\leq 1$ for each $i$ , and for any $1\leq j\leq k$ . The proof is $k-1$ applications of the Cauchy–Schwarz inequality.

Moreover, Gowers gave examples to show that $s=k-2$ cannot be improved. For instance, when $k=4$ and $G=\mathbb{Z}/p\mathbb{Z}$ , we can consider functions

[TABLE]

for some nonzero $a\in\mathbb{Z}/p\mathbb{Z}$, and observe that $f_{1}(x)f_{2}(x+h)f_{3}(x+2h)f_{4}(x+3h)=1$ pointwise for any $x,h\in G$. So, $\Lambda_{4\mathrm{AP}}(f_{1},\dots,f_{4})=1$, but one can show that $\|f_{j}\|_{U^{2}}=O(p^{-1/4})$. This rules out a statement of type (2) for $s=1$; taking appropriate level sets of these functions rules out (1) also.
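The following Python sketch checks an example of this kind numerically. It assumes the standard quadratic-phase choice $f_{k}(x)=e(c_{k}ax^{2}/p)$ with coefficients $(c_{1},\dots,c_{4})=(1,-3,3,-1)$; this specific choice, the values $p=31$ and $a=1$, and all function names are assumptions made for illustration and may differ in detail from the functions intended in the text. The sketch verifies that the product is identically $1$ on progressions and evaluates the $U^{2}$-norm directly from its definition.

```python
import cmath


def e(t):
    """e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * cmath.pi * t)


def quadratic_phase(coeff, p):
    """Table of x -> e(coeff * x^2 / p) on Z/pZ (exponent reduced mod p)."""
    return [e(((coeff * x * x) % p) / p) for x in range(p)]


def u2_norm(values, p):
    """Gowers U^2 norm of a function on Z/pZ given as a table of values."""
    total = 0
    for x in range(p):
        for h in range(p):
            for k in range(p):
                total += (values[x]
                          * values[(x + h) % p].conjugate()
                          * values[(x + k) % p].conjugate()
                          * values[(x + h + k) % p])
    return (total / p ** 3).real ** 0.25


p, a = 31, 1                      # illustrative choices, not from the text
COEFFS = (1, -3, 3, -1)           # assumed coefficients c_1, ..., c_4
tables = [quadratic_phase(c * a, p) for c in COEFFS]


def product_on_progression(x, h):
    """f_1(x) f_2(x+h) f_3(x+2h) f_4(x+3h)."""
    out = 1
    for k, table in enumerate(tables):
        out *= table[(x + k * h) % p]
    return out


max_deviation = max(abs(product_on_progression(x, h) - 1)
                    for x in range(p) for h in range(p))
norms = [u2_norm(table, p) for table in tables]
```

The pointwise identity rests on $x^{2}-3(x+h)^{2}+3(x+2h)^{2}-(x+3h)^{2}=0$, so `max_deviation` should be zero up to floating-point error, while every entry of `norms` is well below $1$.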

The first systematic approach to this question for general systems of linear forms was given by Green and Tao [9] in the course of their work on linear equations in primes. The following is essentially implicit as a much easier case of results from that paper, and was isolated in [4]; however, the terminology we use is slightly different to both.

Proposition 1.1 (Essentially from [9]).

Given a prime $p$ , a system $\Phi=(\phi_{1},\dots,\phi_{r})$ of linear forms $\mathbb{Z}^{d}\to\mathbb{Z}$ , and an index $j$ , $1\leq j\leq r$ , we say $\Phi$ has Cauchy–Schwarz complexity $\leq{}s$ at $j$ , modulo $p$ , if the following holds: the indices $\{1,\dots,r\}\setminus\{j\}$ can be partitioned into $s+1$ classes $I_{1},\dots,I_{s+1}$ such that $\phi_{j}$ modulo $p$ , considered as a linear form $\mathbb{F}_{p}^{d}\to\mathbb{F}_{p}$ , is not contained in $\operatorname{span}_{\mathbb{F}_{p}}(\phi_{i}\colon i\in I_{k})$ for any $1\leq k\leq s+1$ .

Let $G=\mathbb{F}_{p}^{n}$, where $p$, $n$ may be any size (including say $n=1$). If $\Phi$ has Cauchy–Schwarz complexity $\leq s$ at $j$ modulo $p$, then for any functions $f_{1},\dots,f_{r}\colon G\to\mathbb{C}$ with $\|f_{i}\|_{\infty}\leq 1$ for each $i$, we have

\[ |\Lambda_{\Phi}(f_{1},\dots,f_{r})| \leq \|f_{j}\|_{U^{s+1}}. \]

If $\Phi$ has Cauchy–Schwarz complexity $\leq s$ at every index, modulo $p$ , we could just say the system has Cauchy–Schwarz complexity $\leq s$ modulo $p$ , and write $s_{\operatorname{CS}}(\Phi)$ for the smallest $s$ for which this holds (where $s_{\operatorname{CS}}$ implicitly depends on $p$ ). If $p$ is very large, taking the span of $\phi_{1},\dots,\phi_{r}$ as linear forms $\mathbb{F}_{p}^{d}\to\mathbb{F}_{p}$ is essentially equivalent to working over $\mathbb{Q}$ and the value of $s_{\operatorname{CS}}$ stabilizes.

As the name suggests, the proof of Proposition 1.1 is $s+1$ applications of Cauchy–Schwarz, as in Gowers’ work. The content of the proposition is really in establishing the linear algebra condition that guarantees this Cauchy–Schwarz argument will work.
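The linear-algebra condition in Proposition 1.1 is a finite check, so for small systems it can be tested by exhaustive search. The sketch below is one possible brute-force implementation, not a method taken from the paper. The names `rank_mod_p`, `in_span` and `cs_complexity_at` are hypothetical, classes of the partition are allowed to be empty (an assumption about how to read the definition), the prime $p=101$ is arbitrary, and the sixth form $x+2y+3z$ in `SIX_FORMS` is an illustrative choice of coefficients.

```python
from itertools import product


def rank_mod_p(rows, p):
    """Rank of an integer matrix over F_p, by Gaussian elimination."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)          # modular inverse (Python 3.8+)
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(u - factor * v) % p for u, v in zip(m[r], m[rank])]
        rank += 1
    return rank


def in_span(vectors, target, p):
    """True if target lies in the F_p-span of vectors."""
    if not vectors:
        return all(v % p == 0 for v in target)
    return rank_mod_p(vectors + [target], p) == rank_mod_p(vectors, p)


def cs_complexity_at(forms, j, p):
    """Smallest s such that the forms other than forms[j] split into s + 1
    classes, none of whose spans contains forms[j]; None if no s works."""
    others = [f for i, f in enumerate(forms) if i != j]
    for classes in range(1, len(others) + 1):
        for labels in product(range(classes), repeat=len(others)):
            parts = [[f for f, lab in zip(others, labels) if lab == c]
                     for c in range(classes)]
            if all(not in_span(part, forms[j], p) for part in parts):
                return classes - 1
    return None


P = 101                                        # illustrative prime
SCHUR = [(1, 0), (0, 1), (1, 1)]
FOUR_AP = [(1, 0), (1, 1), (1, 2), (1, 3)]
# x, y, z, x+y+z, 2x+3y+5z, and an assumed sixth form x+2y+3z
SIX_FORMS = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 3, 5), (1, 2, 3)]

schur_cs = [cs_complexity_at(SCHUR, j, P) for j in range(3)]
four_ap_cs = [cs_complexity_at(FOUR_AP, j, P) for j in range(4)]
six_forms_cs = [cs_complexity_at(SIX_FORMS, j, P) for j in range(6)]
```

The search is exponential in $r$, which is harmless for $r\leq 6$; the value $s_{\operatorname{CS}}(\Phi)$ is then the maximum of the per-index results.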

Following this, Gowers and Wolf, in a series of papers [4, 5, 6, 7], considered the question: is the value of $s$ given by Cauchy–Schwarz complexity optimal? It is natural to try to adapt the examples given by Gowers for $k$ -term progressions to the general case to give a lower bound. The task comes down to finding phase polynomials $f_{1},\dots,f_{r}\colon G\to\mathbb{C}$ of degree $D$ , i.e. functions of the form $f_{j}(x)=\exp(2\pi iP_{j}(x))$ where $P_{j}\colon G\to\mathbb{R}/\mathbb{Z}$ is a degree $D$ polynomial (in a natural sense) for each $j$ , such that

\[ \Lambda_{\Phi}(f_{1},\dots,f_{r}) = 1 \]

i.e. the multilinear average is equal to $1$ pointwise. By contrast $\|f_{j}\|_{U^{D}}$ will typically be very small when $f_{j}$ is a degree $D$ phase polynomial, so this rules out a statement of type (2) or (1) for $s\leq D-1$ .

It turns out that this is possible if and only if $\phi_{1}^{D},\dots,\phi_{r}^{D}\in((\mathbb{F}_{p}^{d})^{\ast})^{\otimes D}$ are linearly dependent, where $\phi_{i}^{D}=\phi_{i}^{\otimes D}$ are interpreted as symmetric multilinear forms over $\mathbb{F}_{p}^{d}$. In other words, $\Lambda_{\Phi}$ can only be fully controlled by the $U^{s+1}$-norm if $\phi_{1}^{s+1},\dots,\phi_{r}^{s+1}$ are linearly independent elements of $((\mathbb{F}_{p}^{d})^{\ast})^{\otimes (s+1)}$.

It also turns out that this lower bound, arising from explicit phase polynomials, and the upper bound coming from Cauchy–Schwarz complexity, do not agree in general. Gowers and Wolf conjectured that the lower bound is the truth; that is, if the true complexity of $\Phi$ over $\mathbb{F}_{p}$ is defined to be the smallest $s$ such that $\phi_{1}^{s+1},\dots,\phi_{r}^{s+1}$ are linearly independent, then (2) holds for this $s$ and any $1\leq j\leq r$. By our previous discussion, such a statement would be (qualitatively) best possible in $s$. [Footnote 2: In fact Gowers and Wolf set up the definitions slightly differently: they define true complexity to be the smallest $s$ such that (1) holds, and conjecture it is equal to the algebraic quantity we have just defined. Since this conjecture is known to be true in cases of interest, defining things the other way round should hopefully not cause too much confusion.]

In what follows, we write $s=s(\Phi)$ to denote this notion of the true complexity of a system of linear forms (over $\mathbb{F}_{p}$ ), unless otherwise stated.
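The algebraic definition of true complexity can be evaluated mechanically for small systems. The following sketch forms each tensor power $\phi_{i}^{\otimes(s+1)}$ as an explicit coefficient vector of length $d^{s+1}$ and tests linear independence over $\mathbb{F}_{p}$. The function names, the cut-off `max_s`, the prime $p=101$ and the sixth form $x+2y+3z$ are assumptions made for illustration only.

```python
from itertools import product
from math import prod


def rank_mod_p(rows, p):
    """Rank of an integer matrix over F_p, by Gaussian elimination."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)          # modular inverse (Python 3.8+)
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(u - factor * v) % p for u, v in zip(m[r], m[rank])]
        rank += 1
    return rank


def tensor_power(form, D):
    """Coefficients of the D-fold tensor power of a linear form."""
    d = len(form)
    return [prod(form[i] for i in idx) for idx in product(range(d), repeat=D)]


def true_complexity(forms, p, max_s=6):
    """Smallest s <= max_s with the (s+1)-th tensor powers linearly
    independent over F_p; None if there is no such s up to the cut-off."""
    for s in range(max_s + 1):
        rows = [tensor_power(f, s + 1) for f in forms]
        if rank_mod_p(rows, p) == len(forms):
            return s
    return None


P = 101                                        # illustrative prime
SCHUR = [(1, 0), (0, 1), (1, 1)]
FOUR_AP = [(1, 0), (1, 1), (1, 2), (1, 3)]
# x, y, z, x+y+z, 2x+3y+5z, and an assumed sixth form x+2y+3z
SIX_FORMS = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 3, 5), (1, 2, 3)]

results = {
    "schur": true_complexity(SCHUR, P),
    "four_ap": true_complexity(FOUR_AP, P),
    "six_forms": true_complexity(SIX_FORMS, P),
}
```

For this particular six-form system the computed value is $1$, whereas a partition search of the kind described in Proposition 1.1 gives Cauchy–Schwarz complexity $2$, so it is an instance of the gap between the two notions.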

This conjecture has now been resolved in essentially all cases of interest. The original paper [4] by Gowers and Wolf proved the case where $s=1$ , $s_{\operatorname{CS}}=2$ , and $G=\mathbb{F}_{p}^{n}$ for $p$ fixed and $n$ large. This case was proved again in [6] (also by Gowers and Wolf) with an improved quantitative bound on $|\Lambda_{\Phi}(f_{1},\dots,f_{r})|$ in terms of $\|f_{j}\|_{U^{s+1}}$ . Still when $G=\mathbb{F}_{p}^{n}$ for $p$ fixed, but not too small, the general case (i.e. arbitrary finite $s$ and $s_{\operatorname{CS}}$ ) was proven in another paper [5] by the same authors. They also showed the case $s_{\operatorname{CS}}=2$ and $s=1$ for $G=\mathbb{Z}/p\mathbb{Z}$ , where this time $p$ is large, in [7]. The general result in the cyclic setting $G=\mathbb{Z}/p\mathbb{Z}$ for $p$ large was shown by Green and Tao [8] as an application of their nilsequence-based arithmetic regularity lemma.

Later, Hatami and Lovett [11] extended the results of [5] to the asymmetric case, where $\phi_{1}^{s+1},\dots,\phi_{r}^{s+1}$ may be linearly dependent but not all of these multilinear forms are in the linear span of the others, which corresponds to (2) holding for some choices of $j$ but not others. Finally, Hatami, Hatami and Lovett [10] removed the requirement in the case $G=\mathbb{F}_{p}^{n}$ for $p$ fixed that $p$ be not too small.

We comment only very briefly on the proofs, as they will not play a large role in the current work. We focus on the simplest case of $p$ fixed, $s=1$ and $s_{\operatorname{CS}}=2$ . By the assumption on $s_{\operatorname{CS}}$ and Proposition 1.1, we are free to discard small $U^{3}$ errors at any point. By the inverse theorem for the Gowers $U^{3}$ -norm, this means we are free to assume that each $f_{i}$ is a linear combination of a few phase polynomials of degree at most $2$ . We would like to argue that when $s=1$ , the quadratic terms do not contribute much to $\Lambda_{\Phi}$ ; however, this requires a more robust version of our assumption that $\phi_{1}^{2},\dots,\phi_{r}^{2}$ are linearly independent (in effect, we need that no non-trivial linear combination over $((\mathbb{F}_{p}^{nd})^{\ast})^{\otimes 2}$ has low rank). Bridging this gap between the robust and non-robust statements is the heart of the argument.

At least qualitatively, these works verify all the central conjectures concerning true complexity. Nonetheless, there are some unresolved questions of interest.

Question 1.2.

What are the best possible bounds in the true complexity statement? That is, how small must $\delta$ be in terms of $\varepsilon$ to ensure that $\|f_{j}\|_{U^{s+1}}\leq\delta$ implies $|\Lambda_{\Phi}(f_{1},\dots,f_{r})|\leq\varepsilon$?

In the case $s=1$ and $s_{\operatorname{CS}}=2$, Gowers and Wolf [6, 7] obtained a doubly exponential dependence, i.e. $\delta\approx\exp(-\exp(O(\varepsilon^{-C})))$. In all other cases where $s\neq s_{\operatorname{CS}}$ the best known bounds are ineffective, or as good as ineffective, as they rely on the inverse theorems for the $U^{k}$-norms for $k\geq 4$ for which no good bounds are known. [Footnote 3: One of these exponentials can probably be removed using subsequent improved bounds in the inverse theorem for the $U^{3}$-norm that follow from work of Sanders [13]. The author is grateful to the anonymous reviewer for pointing this out.]

In [5, Problem 7.8], Gowers and Wolf suggested that the dependence cannot be too good, and specifically, not polynomial; that is, they asked whether one could find a counterexample ruling out $\delta\approx\varepsilon^{C}$ .

This is closely related to the following question.

Question 1.3.

In cases where the true complexity and Cauchy–Schwarz complexity differ, could a true complexity bound be proven by elementary means, e.g. by many applications of the Cauchy–Schwarz inequality; or is some appeal to the structural theory of higher order Fourier analysis essential? Is there some qualitative feature which separates the elementary and non-elementary cases?

The primary motivation behind Gowers and Wolf’s appeal for counterexamples to good bounds is that this would rule out a proof based only on complicated applications of the Cauchy–Schwarz inequality, as that would surely give a polynomial bound.

Our final question is at first appearance more eccentric but we will see its relevance shortly.

Question 1.4.

Working over $G=\mathbb{Z}/p\mathbb{Z}$ for $p$ a large prime, and when $s\neq s_{\operatorname{CS}}$ , all the known bounds on $\varepsilon$ in terms of $\delta$ depend on the coefficients of the linear forms in $\Phi$ , not just on the values of $s$ and $s_{\operatorname{CS}}$ . In practice, these results are only effective if the coefficients are essentially bounded.

By contrast, the Cauchy–Schwarz complexity bound is completely uniform in the coefficients, provided the hypotheses are satisfied.

Is this restriction to bounded coefficients necessary, whenever $s\neq s_{\operatorname{CS}}$ ?

Working over $\mathbb{F}_{p}^{n}$ for $p$ fixed and $n$ large, there are only finitely many choices of linear forms, so any dependence on the coefficients can be removed. In this setting, the analogous question is whether the bounds should genuinely depend on $p$ .

In this paper, we consider what is in some sense the smallest non-trivial case where $s\neq s_{\operatorname{CS}}$ , which concerns systems of $6$ linear forms in $3$ variables (i.e., $r=6$ and $d=3$ ). Indeed, when $d=2$ it is always the case that $s=s_{\operatorname{CS}}$ , and similarly for $d=3$ and $r\leq 5$ . However, a generic system $\Phi$ with $r=6$ and $d=3$ will have $s=1$ but $s_{\operatorname{CS}}=2$ (see Section 2 for a discussion).

In this limited setting, we are able to give fairly complete answers to the questions above. We now outline the main results.

Theorem 1.5.

Let $\Phi=(\phi_{1},\dots,\phi_{6})$ be a system of $6$ linear forms in $3$ variables, let $p$ be a prime (not necessarily small), and let $G=\mathbb{F}_{p}^{n}$ for any $n\geq 1$ .

Then, provided the system $\Phi$ has true complexity $1$ over $\mathbb{F}_{p}$ , for any functions $f_{1},\dots,f_{6}\colon G\to\mathbb{C}$ with $\|f_{i}\|_{\infty}\leq 1$ for each $i$ , and for any $j$ , $1\leq j\leq 6$ , we have the bound

\[ |\Lambda_{\Phi}(f_{1},\dots,f_{6})| \leq \|f_{j}\|_{U^{2}(G)}^{1/C} \]

where $C=C(\Phi,j)>0$ is some constant depending on the coefficients of $\Phi$ , and perhaps $j$ , but crucially not on $p$ or $n$ .

Moreover, the above inequality can be derived only using multiple applications of the Cauchy–Schwarz inequality. However, the number of applications used in the proof increases without bound as the coefficients of $\Phi$ grow.

Note that, since no restrictions are placed on $n$ and $p$ , this encompasses the cases $\mathbb{Z}/p\mathbb{Z}$ for $p$ a large prime as well as $\mathbb{F}_{p}^{n}$ for $p$ fixed and $n$ large. In intermediate cases where $p$ and $n$ are both large, even qualitatively the result may officially be new, although these cases are rarely of interest.
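In the language of Question 1.2, a bound of the shape $|\Lambda_{\Phi}(f_{1},\dots,f_{6})|\leq\|f_{j}\|_{U^{2}(G)}^{1/C}$ gives a polynomial dependence of $\delta$ on $\varepsilon$. The one-line check below makes this explicit, with $0<\varepsilon\leq 1$ and the same constant $C=C(\Phi,j)$ as in the theorem.

```latex
\[
\|f_{j}\|_{U^{2}(G)} \leq \delta = \varepsilon^{C}
\quad\Longrightarrow\quad
|\Lambda_{\Phi}(f_{1},\dots,f_{6})|
  \leq \|f_{j}\|_{U^{2}(G)}^{1/C}
  \leq \bigl(\varepsilon^{C}\bigr)^{1/C}
  = \varepsilon .
\]
```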

The key observation underlying the proof is the following.

Slogan 1.6.

Cauchy–Schwarz complexity is not preserved under applying the Cauchy–Schwarz inequality.

By this we mean the following. If we start with a system of linear forms $\Phi$ and apply the Cauchy–Schwarz inequality to one of the functions, what we get can be thought of as a new linear system $\Phi^{\prime}$ with $2(r-1)$ forms and $2d-1$ variables. It is not always true that if $s_{\operatorname{CS}}(\Phi)>1$ then $s_{\operatorname{CS}}(\Phi^{\prime})>1$ ; so in some cases we can now apply the Cauchy–Schwarz complexity bound (Proposition 1.1) to $\Lambda_{\Phi^{\prime}}$ to bound it by $\|f_{j}\|_{U^{2}}$ , meaning that in turn $\Lambda_{\Phi}$ is bounded by $\|f_{j}\|_{U^{2}}^{1/2}$ . More generally, we can hope to apply Cauchy–Schwarz repeatedly and systematically, eventually arriving at a system with Cauchy–Schwarz complexity $1$ .
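To make the passage from $\Phi$ to $\Phi^{\prime}$ concrete, here is a sketch of a single Cauchy–Schwarz step in the case $r=6$, $d=3$. It assumes, as a normalisation made here for illustration, that the variables have been chosen so that $\phi_{6}(x,y,z)=x$; the primed variables $y',z'$ are fresh copies of $y,z$.

```latex
\begin{align*}
\Lambda_{\Phi}(f_{1},\dots,f_{6})
  &= \mathbb{E}_{x} f_{6}(x)\, \mathbb{E}_{y,z} \prod_{i=1}^{5} f_{i}(\phi_{i}(x,y,z)), \\
|\Lambda_{\Phi}(f_{1},\dots,f_{6})|^{2}
  &\leq \Bigl(\mathbb{E}_{x} |f_{6}(x)|^{2}\Bigr)\,
        \mathbb{E}_{x} \Bigl| \mathbb{E}_{y,z} \prod_{i=1}^{5} f_{i}(\phi_{i}(x,y,z)) \Bigr|^{2} \\
  &\leq \mathbb{E}_{x,y,y',z,z'} \prod_{i=1}^{5}
        f_{i}(\phi_{i}(x,y,z))\, \overline{f_{i}(\phi_{i}(x,y',z'))} \\
  &= \Lambda_{\Phi'}(f_{1},\dots,f_{5},\overline{f_{1}},\dots,\overline{f_{5}}).
\end{align*}
```

The second inequality uses $\|f_{6}\|_{\infty}\leq 1$ and expands the square. The system $\Phi^{\prime}$ consists of the $10=2(r-1)$ forms $(x,y,y',z,z')\mapsto\phi_{i}(x,y,z)$ and $(x,y,y',z,z')\mapsto\phi_{i}(x,y',z')$ for $1\leq i\leq 5$, in $5=2d-1$ variables. If $\Lambda_{\Phi'}$ can be bounded by $\|f_{j}\|_{U^{2}}$, taking square roots gives $|\Lambda_{\Phi}|\leq\|f_{j}\|_{U^{2}}^{1/2}$.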

On the other hand, we show that the quantity $C(\Phi)$ , which quantifies the number of times the Cauchy–Schwarz inequality is used, must necessarily grow without bound as $\Phi$ varies.

Theorem 1.7.

For any sufficiently large prime $p$ , $p\equiv\pm 1\pmod{8}$ , there exist a system $\Phi$ of $6$ linear forms in $3$ variables with $s(\Phi)=1$ , and functions $f_{1},\dots,f_{6}\colon\mathbb{Z}/p\mathbb{Z}\to\mathbb{C}$ with $\|f_{i}\|_{\infty}\leq 1$ for each $i$ , such that

\[ |\Lambda_{\Phi}(f_{1},\dots,f_{6})| \geq 10^{-12} \]

but

\[ \|f_{j}\|_{U^{2}} \leq p^{-1/8} \]

for each $j$ .

Unlike in Theorem 1.5, here the system $\Phi$ is allowed to change as $p$ grows, with no control on the size of its coefficients. The condition $p\equiv\pm 1\pmod{8}$ is an inessential one related to the precise construction used and could be removed without too much added difficulty.

Remark 1.8.

This negative result perhaps sheds some light on where the obstructions to a very straightforward proof of Theorem 1.5 lie.

The difficulty turns out not to be that the Cauchy–Schwarz inequality is insufficiently powerful, or too blunt to detect the algebraic nature of the boundary between systems with $s=1$ and $s=2$ ; in fact it handles such considerations surprisingly easily.

Instead, the issue is that the Cauchy–Schwarz steps used must necessarily be tailor-made to the system $\Phi$ being considered. The task of describing a mapping from systems to Cauchy–Schwarz arguments could be likened to that of building a primitive computer using only the Cauchy–Schwarz inequality. Setting up the technical machinery required to achieve this will occupy most of the paper.

Remark 1.9.

The value of $C(\Phi)$ given by the proof of Theorem 1.5 is completely explicit but in many cases unreasonably large. No serious attempt has been made to optimize it, although minor changes would probably produce only minor improvements.

For large $p$ , the worst-case behavior given by the proof is something like $\exp(O(K^{O(1)}))$ where $K$ is the size of the largest (integer) coefficient appearing in $\Phi$ . Although typically one expects not to hit the worst case, nonetheless in practice for integer coefficients of size about $10$ values such as $C\approx 2^{400}$ are not unusual. It seems likely such values are not best possible.

When $p$ is fixed, we may state a bound in terms of $p$ rather than the size of the coefficients. Here the method gives $C(p)=O(p^{O(1)})$ . It is possible one could modify the argument to improve this to $O(\log p)$ , which would be best possible up to absolute constants. However, significant additional technical challenges arise, and so we will not attempt this here.

Remark 1.10.

The general case of Questions 1.2, 1.3 and 1.4, for $d>3$ or $r>6$ , remains open. It seems reasonable to speculate that Theorem 1.5 (and Theorem 1.7) have analogues in this level of generality. There is no immediately apparent obstruction to the overall approach of repeated application of the Cauchy–Schwarz inequality succeeding in general, but conversely it is not obvious how to generalize the specific strategies used when $d=3$ and $r=6$ to the general case. Therefore, this is left to possible future work.

1.1 Outline of the paper

In Section 2 we present some preliminaries concerning the case of $6$ forms in $3$ variables. In particular, we will deal with some initial degenerate cases where, in a technical and slightly disingenuous sense, we will see that applying the Cauchy–Schwarz inequality causes $s_{\operatorname{CS}}$ to decrease. We will need these cases in what follows, but this also serves as an introduction to the general approach behind the proof of Theorem 1.5 without the notational complexities.

In Section 3 we introduce formalisms to keep track of the effects of multiple applications of Cauchy–Schwarz in a systematic manner. This has the effect of reducing the proof of Theorem 1.5 in any given instance, to winning a Cauchy–Schwarz “game” which has a well-defined set of possible moves and which can readily be simulated on a computer.

Section 4 addresses the core problem of solving this game in general. This comes down to finding sequences of moves which have the effect of implementing predictable arithmetic operations on the system $\Phi$ , and using them to walk $\Phi$ to a degenerate configuration of the type considered in Section 2.

Finally, Section 5 gives the proof of the negative result, Theorem 1.7.

1.2 Notation

We use $O(1)$ to denote any quantity bounded above by an absolute constant, and $O(X)$ to mean $O(1)X$. The notation $[m]$ for $m$ a positive integer denotes the set $\{1,2,\dots,m\}$. For a real parameter $x$, $e(x)$ denotes $\exp(2\pi ix)$. The notation $[A=B]$ (for example) denotes the indicator function of the event $A=B$. If $W$ is a finite-dimensional vector space over $\mathbb{F}_{p}$, we write $W^{\ast}$ for its dual space and $\mathbb{P}(W)$ for the corresponding projective space (i.e. the space of $1$-dimensional subspaces of $W$). Also, $\mathbb{P}^{k}=\mathbb{P}^{k}(\mathbb{F}_{p})$ means the same as $\mathbb{P}\big(\mathbb{F}_{p}^{k+1}\big)$. Given $w\in W\setminus\{0\}$, we write $[w]$ for the corresponding element of $\mathbb{P}(W)$. If $U\subseteq W$ is a subspace of $W$, we write $U^{\perp}\subseteq W^{\ast}$ for the perpendicular subspace, i.e. the set of all $\phi\in W^{\ast}$ that vanish on $U$.

1.3 Acknowledgements

The author would like to thank Sean Eberhard, Ben Green, Rudi Mrazović and Julia Wolf for discussions on these topics at various times.

2 Preliminaries concerning six forms in three variables

We start by giving a brief analysis of the different cases that can arise concerning a system of six forms $\Phi=(\phi_{1},\dots,\phi_{6})$ in three variables, and the associated Cauchy–Schwarz complexity and true complexity. Throughout this section we write $V=\mathbb{F}_{p}^{3}$ , so $\phi_{i}$ (modulo $p$ ) can be thought of as linear functionals $V\to\mathbb{F}_{p}$ , always assumed to be non-zero.

It is clear that nothing substantial changes when we replace $\phi_{i}$ by a non-zero scalar multiple $\lambda\phi_{i}$ . Indeed, the quantities $f_{i}(\phi_{i}(v))$ and $f_{i}(\lambda\phi_{i}(v))$ are essentially the same, up to replacing $f_{i}$ with a dilate of itself, and so this has no effect on the conclusion of Theorem 1.5; and by inspection our definitions of true complexity and Cauchy–Schwarz complexity are also unchanged.

Therefore it makes sense to think of the forms $\phi_{i}\colon V\to\mathbb{F}_{p}$ as points $[\phi_{i}]$ in the projective plane $\mathbb{P}(V^{\ast})\cong\mathbb{P}^{2}(\mathbb{F}_{p})$ , quotienting out by the action of scalar multiplication. This allows us to phrase the different cases geometrically.

We have said that $\Phi$ has true complexity $s=1$ if the symmetric bilinear forms $\phi_{1}^{2},\dots,\phi_{6}^{2}\in(V^{\ast})^{\otimes 2}$ are linearly independent. Note that this space of symmetric bilinear forms on $V$ has dimension $(3\cdot 4)/2=6$, and there are six forms, so we expect this to be true generically. Indeed, a dependence relation on $\phi_{1}^{2},\dots,\phi_{6}^{2}$ exists if and only if there is a non-zero linear functional $\mu\colon\{\sigma\in(V^{\ast})^{\otimes 2}\colon\sigma=\sigma^{T}\}\to\mathbb{F}_{p}$ which evaluates to $0$ on each $\phi_{i}^{2}$; and this in turn is the same thing as a non-zero quadratic form $V^{\ast}\to\mathbb{F}_{p}$ which vanishes at each $\phi_{i}$; i.e., a conic in the projective plane $\mathbb{P}(V^{\ast})$ containing $[\phi_{i}]$ for each $i$. [Footnote 4: Note that this argument is still valid as stated when $p=2$.]

In other words, we have shown the following.

Slogan 2.1.

Six forms $\phi_{1},\dots,\phi_{6}$ on $\mathbb{F}_{p}^{3}$ have true complexity at least $2$ , if and only if $[\phi_{1}],\dots,[\phi_{6}]$ all lie on a (possibly degenerate) conic in $\mathbb{P}(V^{\ast})$ .

By a degenerate conic, we mean the union of two lines. In particular, if $[\phi_{1}],[\phi_{2}],[\phi_{3}]$ are collinear and so are $[\phi_{4}],[\phi_{5}],[\phi_{6}]$ , then this system has true complexity at least $2$ .
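The criterion in Slogan 2.1 is a finite linear algebra check, and can be sketched as follows. This is only an illustration: the function names `rank_mod_p`, `quadratic_monomials` and `lies_on_conic`, and the sample forms, are our own choices and do not come from the text. Each form $(x,y,z)$ is sent to the row of its six quadratic monomials, and the six forms lie on a conic exactly when the resulting $6\times 6$ matrix over $\mathbb{F}_{p}$ is singular.

```python
from itertools import combinations_with_replacement


def rank_mod_p(rows, p):
    """Rank of a matrix over F_p, computed by Gaussian elimination."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                c = rows[r][col]
                rows[r] = [(x - c * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def quadratic_monomials(phi):
    """Values of x^2, xy, xz, y^2, yz, z^2 at phi = (x, y, z)."""
    return [phi[i] * phi[j] for i, j in combinations_with_replacement(range(3), 2)]


def lies_on_conic(forms, p):
    """True iff some non-zero quadratic form vanishes at all six forms."""
    return rank_mod_p([quadratic_monomials(phi) for phi in forms], p) < 6


# Hypothetical sample system, chosen only for illustration.
forms = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (1, 2, 3), (1, 3, 2)]
print(lies_on_conic(forms, 7), lies_on_conic(forms, 11))
```

For this sample system the six points lie on the conic $xy+xz-2yz=0$ modulo $7$ but on no conic modulo $11$, so whether the true complexity is $1$ can depend on $p$.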

It is possible for the true complexity to be greater than $2$ : for instance, if five of the points lie on a line in $\mathbb{P}^{2}$ , in which case the true complexity is $3$ ; or if $[\phi_{i}]=[\phi_{j}]$ for some $i\neq j$ , in which case the system has infinite complexity.

However, all such cases may be fully analyzed in terms of Cauchy–Schwarz complexity, which gives a bound $|\Lambda_{\Phi}(f_{1},\dots,f_{6})|\leq\|f_{j}\|_{U^{s_{j}+1}}$ for each $j$ where the values $s_{j}$ are best possible, even when they vary with $j$ . The details are an uninteresting check that will not be relevant to the argument, so are omitted.

We therefore restrict our attention to the cases with $s=1$ . In particular, we can henceforth make the following assumptions:

(i) the points $[\phi_{1}],\dots,[\phi_{6}]$ are all distinct;

(ii) no four are collinear; and

(iii) if some three of the points are collinear, the remaining three are not collinear.

It is clear that if the six forms are in general position, meaning no three are collinear, then $s_{\operatorname{CS}}=2$ . Indeed, any way we partition all but one of the forms into two classes, one of the classes will contain three forms and so their span will be all of $V^{\ast}$ ; hence $s_{\operatorname{CS}}>1$ . Conversely any $2-2-1$ split achieves $s_{\operatorname{CS}}\leq 2$ .

Our remaining task in this section is to consider the case where (i)–(iii) hold, but nonetheless $[\phi_{1}],\dots,[\phi_{6}]$ are not in general position. This is a setting in which Cauchy–Schwarz complexity has some purchase, but nonetheless there is a subtlety meaning, technically speaking, that typically $s_{\operatorname{CS}}\neq 1$ .

Proposition 2.2.

Suppose throughout that a system of forms $\Phi=(\phi_{1},\dots,\phi_{6})$ on $\mathbb{F}_{p}^{3}$ is given, with no four of $[\phi_{1}],\dots,[\phi_{6}]$ collinear and no two the same.

(i) Suppose that $[\phi_{1}],[\phi_{2}],[\phi_{3}]$ are collinear but $[\phi_{4}],[\phi_{5}],[\phi_{6}]$ are not collinear. Then for functions $f_{1},\dots,f_{6}\colon\mathbb{F}_{p}^{n}\to\mathbb{C}$, with $\|f_{i}\|_{\infty}\leq 1$ for each $i$, and for any $j=4,5,6$ we have a bound

\[ |\Lambda_{\Phi}(f_{1},\dots,f_{6})|\leq\|f_{j}\|_{U^{2}} \]

coming from Proposition 1.1.

(ii) Under the same conditions as (i), the system has true complexity $s=1$. In particular, by results of Gowers and Wolf, $|\Lambda_{\Phi}(f_{1},\dots,f_{6})|$ is bounded in terms of $\|f_{j}\|_{U^{2}}$ for $j=1,2,3$ (at least for $n=1$ or $p$ fixed).

(iii) Now suppose further that $[\phi_{1}],[\phi_{2}],[\phi_{3}]$ is the only collinear triple. Then for $j=1,2,3$ there is no way to partition $\{1,\dots,6\}\setminus\{j\}$ into two pieces such that $\phi_{j}$ is in the span of neither piece, and hence $s_{\operatorname{CS}}=2$ for this system.

Proof.

For (i), say when $j=6$ , we can partition $\{1,\dots,5\}$ into $\{1,2,3\}$ and $\{4,5\}$ . By our assumptions, it is clear that $\phi_{6}$ is in neither $\operatorname{span}(\phi_{1},\phi_{2},\phi_{3})=\operatorname{span}(\phi_{1},\phi_{2})\subseteq V^{\ast}$ nor $\operatorname{span}(\phi_{4},\phi_{5})\subseteq V^{\ast}$ , and so the bound indeed follows from Proposition 1.1. The other choices of $j$ are analogous.

For (ii), we note that a conic containing three distinct collinear points must be degenerate, but $[\phi_{1}],\dots,[\phi_{6}]$ are not contained in the union of any two lines. Hence the points do not lie on a conic, and so $s=1$ .

For (iii), when say $j=1$ , given any partition of $\{2,\dots,6\}$ into two pieces, one of the pieces contains three of the forms. Since that triple is not $\phi_{1},\phi_{2},\phi_{3}$ , they are not collinear and so their span is all of $V^{\ast}$ . ∎
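The partition conditions in Proposition 2.2 can be checked by brute force for any explicit system. The sketch below assumes the description used in this section: the Cauchy–Schwarz complexity at index $j$ is the least $s$ such that the other forms split into $s+1$ classes, none of whose spans contains $\phi_{j}$. The names `cs_complexity_at` and `in_span`, and the sample system, are hypothetical.

```python
from itertools import product


def rank_mod_p(rows, p):
    """Rank of a matrix over F_p, computed by Gaussian elimination."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                c = rows[r][col]
                rows[r] = [(x - c * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def in_span(phi, others, p):
    """True iff phi lies in the F_p-span of the forms in `others`."""
    others = [list(o) for o in others]
    return rank_mod_p(others + [list(phi)], p) == rank_mod_p(others, p)


def cs_complexity_at(forms, j, p):
    """Least s such that the forms other than forms[j] split into s + 1
    classes, none of whose spans contains forms[j] (None if no such s)."""
    rest = [f for i, f in enumerate(forms) if i != j]
    for s in range(len(rest)):
        for labels in product(range(s + 1), repeat=len(rest)):
            classes = [[f for f, c in zip(rest, labels) if c == k]
                       for k in range(s + 1)]
            if not any(in_span(forms[j], cls, p) for cls in classes):
                return s
    return None  # forms[j] is a multiple of another form


# Hypothetical system over F_11: the first three forms are collinear and
# no other triple is.
forms = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 2, 1), (1, 3, 3)]
print([cs_complexity_at(forms, j, 11) for j in range(6)])
```

For this sample system the output is `[2, 2, 2, 1, 1, 1]`, matching parts (i) and (iii): the indices on the collinear triple need three classes, the others only two.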

So, this is a case where $s$ and $s_{\operatorname{CS}}$ differ, albeit for what feels like a bad reason. Indeed, it is not too challenging to recover a good bound on $\Lambda_{\Phi}(f_{1},\dots,f_{6})$ in terms of $\|f_{1}\|_{U^{2}}$ in this setting, for instance by decomposing $f_{6}$ into two parts corresponding to its large and small Fourier coefficients, bounding away the uniform contribution and treating what is left as essentially a system of five forms.

Instead, we will now recover such a bound purely by using the Cauchy–Schwarz inequality, and thereby provide the first (admittedly unimpressive) instantiation of Slogan 1.6.

Proposition 2.3.

Let $\Phi=(\phi_{1},\dots,\phi_{6})$ be a system of linear forms on $\mathbb{F}_{p}^{3}$ , such that no four of $[\phi_{1}],\dots,[\phi_{6}]$ are collinear and no two are the same; $[\phi_{1}],[\phi_{2}],[\phi_{3}]$ are collinear; and $[\phi_{4}],[\phi_{5}],[\phi_{6}]$ are not collinear.

Then for functions $f_{1},\dots,f_{6}\colon\mathbb{F}_{p}^{n}\to\mathbb{C}$ , with $\|f_{i}\|_{\infty}\leq 1$ for each $i$ , we have a bound

\[ |\Lambda_{\Phi}(f_{1},\dots,f_{6})|\leq\|f_{1}\|_{U^{2}}^{1/2}. \]

Proof.

By applying a suitable change of basis to $V=\mathbb{F}_{p}^{3}$ , we may assume without loss of generality that $\phi_{6}(x,y,z)=x$ ; this is not essential but eases the notation. So,

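\[ \Lambda_{\Phi}(f_{1},\dots,f_{6})=\mathbb{E}_{x\in\mathbb{F}_{p}^{n}}f_{6}(x)\,\mathbb{E}_{y,z\in\mathbb{F}_{p}^{n}}\prod_{i=1}^{5}f_{i}(\phi_{i}(x,y,z)). \]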

We can apply the Cauchy–Schwarz inequality to obtain

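\[ |\Lambda_{\Phi}(f_{1},\dots,f_{6})|^{2}\leq\mathbb{E}_{x\in\mathbb{F}_{p}^{n}}\left|\mathbb{E}_{y,z\in\mathbb{F}_{p}^{n}}\prod_{i=1}^{5}f_{i}(\phi_{i}(x,y,z))\right|^{2}. \]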

Now, the term on the right expands to

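\[ \mathbb{E}_{x,y,y^{\prime},z,z^{\prime}\in\mathbb{F}_{p}^{n}}\prod_{i=1}^{5}f_{i}(\phi_{i}(x,y,z))\,\overline{f_{i}(\phi_{i}(x,y^{\prime},z^{\prime}))} \]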

and we can think of this as $\Lambda_{\Phi^{\prime}}(f_{1},\dots,f_{5},\overline{f_{1}},\dots,\overline{f_{5}})$ where $\Phi^{\prime}$ is the system of $10$ forms in the five variables $x,y,y^{\prime},z,z^{\prime}$ , given by $\phi_{i_{0}}(x,y,y^{\prime},z,z^{\prime})=\phi_{i}(x,y,z)$ and $\phi_{i_{1}}(x,y,y^{\prime},z,z^{\prime})=\phi_{i}(x,y^{\prime},z^{\prime})$ for $1\leq i\leq 5$ , each thought of as a linear functional $\mathbb{F}_{p}^{5}\to\mathbb{F}_{p}$ .

We claim that, under our hypotheses, it is possible to partition the nine forms $\phi_{2_{0}}$ , $\phi_{3_{0}}$ , $\dots$ , $\phi_{5_{0}}$ , $\phi_{1_{1}}$ , $\dots$ , $\phi_{5_{1}}$ into two classes such that $\phi_{1_{0}}$ is not in the span of either class. Specifically, we will take

\[ S=\{2_{0},4_{0},1_{1},2_{1},3_{1}\},\qquad T=\{3_{0},5_{0},4_{1},5_{1}\} \]

to be the sets of indices in each class. If this claim holds, then by the standard Cauchy–Schwarz complexity bound (Proposition 1.1) again we have

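\[ \left|\Lambda_{\Phi^{\prime}}(f_{1},\dots,f_{5},\overline{f_{1}},\dots,\overline{f_{5}})\right|\leq\|f_{1}\|_{U^{2}} \]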

and the result follows.

We now verify the claim. We first note that, since $[\phi_{1}],[\phi_{2}],[\phi_{3}]$ are collinear by hypothesis,

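\[ \operatorname{span}(\phi_{2_{0}},\phi_{4_{0}},\phi_{1_{1}},\phi_{2_{1}},\phi_{3_{1}})=\operatorname{span}(\phi_{2_{0}},\phi_{4_{0}},\phi_{1_{1}},\phi_{2_{1}}) \]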

as $\phi_{3_{1}}\in\operatorname{span}(\phi_{1_{1}},\phi_{2_{1}})$ . This makes the claim plausible for dimension reasons: it is reasonable to expect the span of four linear forms on $\mathbb{F}_{p}^{5}$ not to contain a fifth, unless something untoward happens. However, something untoward could genuinely happen if too many of the original forms are collinear, and more generally we need to show that all bad cases are ruled out by our hypotheses. This is the technical part of the calculation, and may be skipped on first (or subsequent) reading.

Recall $\phi_{6}(x,y,z)=x$ and write $\phi_{i}(x,y,z)=a_{i}x+b_{i}y+c_{i}z$ for $1\leq i\leq 5$ . To show $\phi_{1_{0}}$ is not in the span of $\phi_{2_{0}},\phi_{4_{0}},\phi_{1_{1}},\phi_{2_{1}}$ , it would suffice to show that $\phi_{1_{0}}$ together with the other four form a basis for $\left(\mathbb{F}_{p}^{5}\right)^{\ast}$ ; equivalently, that the matrix

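\[ M_{S}=\begin{pmatrix}a_{1}&b_{1}&0&c_{1}&0\\ a_{2}&b_{2}&0&c_{2}&0\\ a_{4}&b_{4}&0&c_{4}&0\\ a_{1}&0&b_{1}&0&c_{1}\\ a_{2}&0&b_{2}&0&c_{2}\end{pmatrix} \]

(whose rows correspond to $\phi_{1_{0}},\phi_{2_{0}},\phi_{4_{0}},\phi_{1_{1}},\phi_{2_{1}}$ and whose columns correspond to $x,y,y^{\prime},z,z^{\prime}$ respectively) is non-singular. However, it is not hard to see that

\[ \det M_{S}=-\det\begin{pmatrix}a_{1}&b_{1}&c_{1}\\ a_{2}&b_{2}&c_{2}\\ a_{4}&b_{4}&c_{4}\end{pmatrix}\det\begin{pmatrix}1&0&0\\ a_{1}&b_{1}&c_{1}\\ a_{2}&b_{2}&c_{2}\end{pmatrix}. \]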

The determinants on the right hand side are zero precisely when, respectively, $[\phi_{1}],[\phi_{2}],[\phi_{4}]$ or $[\phi_{6}],[\phi_{1}],[\phi_{2}]$ are collinear. Under our assumptions, neither can be true (as then four points would lie on a line) and so $M_{S}$ is non-singular.

The argument for $T$ is very similar. We define

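\[ M_{T}=\begin{pmatrix}a_{1}&b_{1}&0&c_{1}&0\\ a_{3}&b_{3}&0&c_{3}&0\\ a_{5}&b_{5}&0&c_{5}&0\\ a_{4}&0&b_{4}&0&c_{4}\\ a_{5}&0&b_{5}&0&c_{5}\end{pmatrix} \]

which is non-singular if and only if $\phi_{1_{0}},\phi_{3_{0}},\phi_{5_{0}},\phi_{4_{1}},\phi_{5_{1}}$ form a basis. Then

\[ \det M_{T}=-\det\begin{pmatrix}a_{1}&b_{1}&c_{1}\\ a_{3}&b_{3}&c_{3}\\ a_{5}&b_{5}&c_{5}\end{pmatrix}\det\begin{pmatrix}1&0&0\\ a_{4}&b_{4}&c_{4}\\ a_{5}&b_{5}&c_{5}\end{pmatrix} \]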

and again this is zero if and only if either $[\phi_{1}],[\phi_{3}],[\phi_{5}]$ or $[\phi_{6}],[\phi_{4}],[\phi_{5}]$ are collinear. Again, both of these are explicitly ruled out by our hypotheses, and this proves the claim. ∎
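For a concrete system the claim in this proof is a rank computation over $\mathbb{F}_{p}$, which the following sketch carries out. It assumes, as in the proof, that $\phi_{6}(x,y,z)=x$ and that the lifted forms are written in the coordinates $x,y,y^{\prime},z,z^{\prime}$; the names `lift` and `claim_holds` and the sample coefficients are our own.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix over F_p, computed by Gaussian elimination."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                c = rows[r][col]
                rows[r] = [(x - c * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def lift(phi, copy):
    """phi_{i_0} or phi_{i_1} as a row in the coordinates (x, y, y', z, z')."""
    a, b, c = phi
    return [a, b, 0, c, 0] if copy == 0 else [a, 0, b, 0, c]


def claim_holds(forms, p):
    """forms = [phi_1, ..., phi_5], with phi_6 = x implicit.
    Checks that phi_{1_0} is in the span of neither class S nor class T."""
    target = lift(forms[0], 0)
    S = [lift(forms[1], 0), lift(forms[3], 0)] + [lift(forms[i], 1) for i in (0, 1, 2)]
    T = [lift(forms[2], 0), lift(forms[4], 0), lift(forms[3], 1), lift(forms[4], 1)]
    return all(rank_mod_p(cls + [target], p) > rank_mod_p(cls, p) for cls in (S, T))


# Hypothetical example over F_11: phi_1, phi_2, phi_3 lie on the line x = 0.
good = [(0, 1, 0), (0, 0, 1), (0, 1, 1), (1, 1, 2), (1, 2, 1)]
# Violates the hypotheses: phi_4 also lies on x = 0, so four points are collinear.
bad = [(0, 1, 0), (0, 0, 1), (0, 1, 1), (0, 1, 2), (1, 2, 1)]
print(claim_holds(good, 11), claim_holds(bad, 11))
```

The second system shows the kind of failure the hypotheses rule out: with four collinear points, $\phi_{1_{0}}$ falls into the span of the first class.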

Remark 2.4.

One way to think of this proof on a high level is as a combinatorial analogue of the method we sketched above: namely, first observing that $\Lambda_{\Phi}$ is controlled by $\|f_{6}\|_{U^{2}}$ , then noting this allows us to essentially eliminate $f_{6}$ by replacing it with the sum of its large Fourier coefficients, and finally applying Cauchy–Schwarz on the remaining five forms.

What we do here is first make two copies of the original system, joined by $\phi_{6}$; on the right, we decompose the remaining forms as if we were attempting to prove a Cauchy–Schwarz complexity bound in $\|f_{6}\|_{U^{2}}$, as in Proposition 2.2; and on the left we decompose as if we were trying to prove a Cauchy–Schwarz complexity bound in $\|f_{1}\|_{U^{2}}$ and the form $\phi_{6}$ didn't exist.

So, the initial Cauchy–Schwarz allows us to somehow substitute the information gained from the former argument into the latter.

Remark 2.5.

As we have said, this is an application of Slogan 1.6, but not a very convincing one. Before embarking on the programme in full generality, we briefly sketch an example of six forms in general position, having $s=1$ (but necessarily $s_{\operatorname{CS}}=2$ ), where we nonetheless get a bound using only Cauchy–Schwarz.

Consider the forms

[TABLE]

where $a,b,c\in\mathbb{F}_{p}$ are arbitrary subject to the condition that the forms be in general position. For concreteness one could substitute $a=7$ , $b=11$ , $c=13$ .

We can apply Cauchy–Schwarz to $\phi_{1},\dots,\phi_{6}$ twice as follows:

[TABLE]

and then again

[TABLE]

We now claim that this last system of $16$ linear forms in $8$ variables has Cauchy–Schwarz complexity $1$ with respect to $f_{3}(z_{00})$ , if and only if the original system has true complexity $1$ . That is, we can partition the remaining forms into two classes:

[TABLE]

such that $z_{00}$ lies in the span of one of the classes, if and only if $[\phi_{1}],\dots,[\phi_{6}]$ lie on a conic. Verifying this claim is left as an exercise for the interested reader. We stress that for particular choices of $a,b,c$ this is an elementary finite computation.

The fact that these kinds of linear algebra conditions can detect whether the points lie on a conic should perhaps not be surprising in light of Pascal’s hexagon theorem.

3 Formalisms for iterated Cauchy–Schwarz

The purpose of this section is to introduce some formalisms necessary to keep track of what happens when we apply the Cauchy–Schwarz inequality repeatedly. The notational overhead here is high, but preferable to handling yet larger explicit calculations in the style of the previous section.

3.1 Linear data

Although the central objects of study are systems of linear forms, it will be convenient to use a natural generalization of this notion, which handles the objects that arise in intermediate stages of the calculation. We introduce the relevant definitions now.

Definition 3.1.

Let a prime $p$ be fixed. By a linear datum, we mean a tuple $\Psi=\left(V,(W_{i})_{i\in I},(\psi_{i})_{i\in I}\right)$ , where $I$ is some finite index set, and

• $V$ and $W_{i}$ for $i\in I$ are finite-dimensional vector spaces over $\mathbb{F}_{p}$;

• $\psi_{i}\colon V\to W_{i}$ are surjective linear maps.

Given a positive integer $n$ , we abuse notation to write $\psi_{i}\colon V^{n}\to W_{i}^{n}$ for the map that applies $\psi_{i}$ to each coordinate. Now, for a collection of functions $(f_{i})_{i\in I}$ with $f_{i}\colon W_{i}^{n}\to\mathbb{C}$ , we define

\[ \Lambda_{\Psi}\left((f_{i})_{i\in I}\right)=\mathbb{E}_{v\in V^{n}}\prod_{i\in I}f_{i}(\psi_{i}(v)). \]

It is clear that in the special case that $\dim W_{i}=1$ for each $i$ , this is essentially the same information as a system of linear forms on $V\cong\mathbb{F}_{p}^{d}$ for some $d$ . The reader should always imagine $\dim V$ as being small, even when we are working over $G=\mathbb{F}_{p}^{n}$ for some large $n$ : the $n$ is taken care of in the definition of $\Lambda_{\Psi}$ , not of $\Psi$ .
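In the special case of linear forms and $n=1$, the average $\Lambda_{\Psi}$ and the $U^{2}$-norm can be computed directly for small $p$. The sketch below assumes the standard definition $\|f\|_{U^{2}}^{4}=\mathbb{E}_{x,h,k}f(x)\overline{f(x+h)}\,\overline{f(x+k)}f(x+h+k)$; the names `lambda_avg` and `u2_norm`, the sample forms and the choice of a quadratic phase are all ours.

```python
import cmath
from itertools import product


def lambda_avg(forms, fs, p):
    """Average over v in F_p^d of prod_i f_i(phi_i(v)): the case n = 1,
    with each f_i given as a list of p complex values."""
    d = len(forms[0])
    total = 0
    for v in product(range(p), repeat=d):
        term = 1
        for phi, f in zip(forms, fs):
            term *= f[sum(a * x for a, x in zip(phi, v)) % p]
        total += term
    return total / p ** d


def u2_norm(f, p):
    """U^2 norm of f: F_p -> C, from the four-fold average over x, h, k."""
    total = 0
    for x, h, k in product(range(p), repeat=3):
        total += (f[x] * f[(x + h) % p].conjugate()
                  * f[(x + k) % p].conjugate() * f[(x + h + k) % p])
    return max(total.real / p ** 3, 0.0) ** 0.25


p = 5
forms = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 2, 1), (1, 3, 3)]
ones = [1] * p
quad = [cmath.exp(2j * cmath.pi * x * x / p) for x in range(p)]  # quadratic phase
lam = lambda_avg(forms, [ones] * 5 + [quad], p)
print(abs(lam), u2_norm(quad, p))
```

With all other functions equal to $1$, the average reduces to a normalized Gauss sum of modulus $p^{-1/2}$, while the quadratic phase has $U^{2}$-norm $p^{-1/4}$; so this particular choice is consistent with a bound by $\|f_{6}\|_{U^{2}}$.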

Attempting to analyse linear data in general exposes hard problems; see [1, 2]. Since the linear data we will consider ultimately come from systems of linear forms, these subtleties will not arise here.

Remark 3.2.

Typically we are not too concerned by replacing $W_{i}$ by isomorphic vector spaces, or by the exact form of the linear map $\psi_{i}$ : for instance, as we have said the difference between $f(\psi_{i}(v))$ and $f(2\psi_{i}(v))$ is usually immaterial.

As such, the only really important information is the collection of subspaces $\ker\psi_{i}$ of $V$ , as we can always recover $W_{i}$ up to isomorphism as $V/\ker\psi_{i}$ . One can interpret $\ker\psi_{i}$ as the subspace of $V$ that the function $f_{i}$ cannot depend on.

Alternatively, we could think about the perpendicular subspaces $(\ker\psi_{i})^{\perp}\subseteq V^{\ast}$ , corresponding to the span of all linear forms derived from $\psi_{i}$ . This is consistent with the geometric picture from Section 2: such subspaces correspond to points, lines, planes etc. in $\mathbb{P}(V^{\ast})$ .

For technical reasons it is useful to keep track of the linear maps $\psi_{i}$ explicitly; but the reader will rarely lose anything, and possibly gain something, by thinking of a linear datum as simply a collection of subspaces of $V$ or $V^{\ast}$ .

We need some notion of when one linear datum bounds another; for instance, but not exclusively, because one is obtained by applying the Cauchy–Schwarz inequality to the other.

Definition 3.3.

Suppose we have two linear data $\Psi=\left(V,(W_{i})_{i\in I},(\psi_{i})_{i\in I}\right)$ and $\Psi^{\prime}=\left(V^{\prime},(W^{\prime}_{i})_{i\in I^{\prime}},(\psi^{\prime}_{i})_{i\in I^{\prime}}\right)$ . Suppose further that for some pair $j\in I$ , $j^{\prime}\in I^{\prime}$ the subspaces $W^{\prime}_{j^{\prime}}=W_{j}$ are identified. Finally, let $c>0$ be a positive real number.

We say $\Psi^{\prime}$ dominates $\Psi$ respecting $(j,j^{\prime})$ with exponent $c$ if the following holds: for $n\geq 1$ and any collection of functions $(f_{i})_{i\in I}$ , $f_{i}\colon W_{i}^{n}\to\mathbb{C}$ with $\|f_{i}\|_{\infty}\leq 1$ , there exist functions $(g_{i})_{i\in I^{\prime}}$ , $g_{i}\colon{W^{\prime}_{i}}^{n}\to\mathbb{C}$ , $\|g_{i}\|_{\infty}\leq 1$ , such that $g_{j^{\prime}}=f_{j}$ , and

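\[ \left|\Lambda_{\Psi}\left((f_{i})_{i\in I}\right)\right|\leq\left|\Lambda_{\Psi^{\prime}}\left((g_{i})_{i\in I^{\prime}}\right)\right|^{c}. \]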

It is clear that domination is transitive: if $\Psi^{\prime}$ dominates $\Psi$ respecting $(j,j^{\prime})$ with exponent $c$ , and $\Psi^{\prime\prime}$ dominates $\Psi^{\prime}$ respecting $(j^{\prime},j^{\prime\prime})$ with exponent $c^{\prime}$ , then $\Psi^{\prime\prime}$ dominates $\Psi$ respecting $(j,j^{\prime\prime})$ with exponent $cc^{\prime}$ .

Some straightforward examples of domination include (i) replacing $\Psi$ by an isomorphic system (i.e. reparameterizing); (ii) augmenting $\Psi$ by introducing further averaging, or by replacing $\ker\psi_{i}$ by a strictly larger subspace for some $i$ ; or (iii) taking a supremum over some part of the average. All of these are subsumed in the following general proposition.

Proposition 3.4.

Suppose $\Psi=\left(V,(W_{i})_{i\in I},(\psi_{i})_{i\in I}\right)$ and $\Psi^{\prime}=\left(V^{\prime},(W^{\prime}_{i})_{i\in I},(\psi^{\prime}_{i})_{i\in I}\right)$ are two linear data on the same index set $I$ , and that we are given linear maps $\theta\colon V^{\prime}\to V$ and $\sigma_{i}\colon W_{i}^{\prime}\to W_{i}$ such that $\psi_{i}\circ\theta=\sigma_{i}\circ\psi^{\prime}_{i}$ for each $i\in I$ (i.e. a morphism of linear data). If $j\in I$ is some index such that $W_{j}=W^{\prime}_{j}$ and $\sigma_{j}$ is the identity, then $\Psi^{\prime}$ dominates $\Psi$ respecting $(j,j)$ , with exponent $1$ .

Proof.

Let $v_{1},\dots,v_{M}\in V^{n}$ be a complete set of coset representatives of $\theta(V^{\prime})^{n}$ in $V^{n}$ . By our hypotheses, we have $\psi_{j}(\theta(V^{\prime}))=\psi^{\prime}_{j}(V^{\prime})=W_{j}$ and so $\ker\psi_{j}+\theta(V^{\prime})=V$ ; hence we may insist that $v_{1},\dots,v_{M}$ all lie in $(\ker\psi_{j})^{n}$ .

For any collection of functions $f_{i}\colon W_{i}^{n}\to\mathbb{C}$ , $i\in I$ , we have

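\[ \Lambda_{\Psi}\left((f_{i})_{i\in I}\right)=\mathbb{E}_{\ell\in[M]}\,\mathbb{E}_{v^{\prime}\in{V^{\prime}}^{n}}\prod_{i\in I}f_{i}\big(\psi_{i}(\theta(v^{\prime}))+\psi_{i}(v_{\ell})\big),\quad\text{and so}\quad\left|\Lambda_{\Psi}\left((f_{i})_{i\in I}\right)\right|\leq\max_{\ell\in[M]}\left|\mathbb{E}_{v^{\prime}\in{V^{\prime}}^{n}}\prod_{i\in I}f_{i}\big(\psi_{i}(\theta(v^{\prime}))+\psi_{i}(v_{\ell})\big)\right|. \]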

Now fix $\ell$ to be any maximal choice, and define $g_{i}\colon{W^{\prime}_{i}}^{n}\to\mathbb{C}$ by

\[ g_{i}(w)=f_{i}\big(\sigma_{i}(w)+\psi_{i}(v_{\ell})\big). \]

We deduce that $\left|\Lambda_{\Psi}((f_{i})_{i\in I})\right|\leq\left|\Lambda_{\Psi^{\prime}}((g_{i})_{i\in I})\right|$ . Moreover, it follows from our assumptions that $g_{j}=f_{j}$ , and so the conditions of Definition 3.3 are satisfied. ∎

Definition 3.5.

If $(\Psi,\Psi^{\prime})$ obey the hypotheses of Proposition 3.4, we say $\Psi^{\prime}$ dominates $\Psi$ trivially at index $j$ . Replacing $\Psi$ by $\Psi^{\prime}$ is termed a $\operatorname{TRIVIAL}$ operation.

We now consider how to describe an application of the Cauchy–Schwarz inequality in this language.

Proposition 3.6.

Suppose $\Psi=\left(V,(W_{i})_{i\in I},(\psi_{i})_{i\in I}\right)$ is a linear datum, and some $j\in I$ is given. Let $\Psi^{\prime}=\left(V^{\prime},(W^{\prime}_{i})_{i\in I^{\prime}},(\psi^{\prime}_{i})_{i\in I^{\prime}}\right)$ be the linear datum defined as follows:

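• $V^{\prime}$ is the fiber product of $V$ with itself over $W_{j}$, i.e.:

\[ V^{\prime}=\{(v_{0},v_{1})\in V\times V\colon\psi_{j}(v_{0})=\psi_{j}(v_{1})\}; \]

• $I^{\prime}$ is the disjoint union of two copies of $I\setminus\{j\}$, denoted

\[ I^{\prime}=\{i_{0}\colon i\in I\setminus\{j\}\}\cup\{i_{1}\colon i\in I\setminus\{j\}\}; \]

• for each $i\in I\setminus\{j\}$, $W^{\prime}_{i_{0}}=W^{\prime}_{i_{1}}=W_{i}$; and

• for each $i\in I\setminus\{j\}$ and $v^{\prime}=(v_{0},v_{1})\in V^{\prime}$,

\[ \psi^{\prime}_{i_{0}}(v^{\prime})=\psi_{i}(v_{0}),\qquad\psi^{\prime}_{i_{1}}(v^{\prime})=\psi_{i}(v_{1}). \]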

Then for any $i\in I\setminus\{j\}$ , $\Psi^{\prime}$ dominates $\Psi$ respecting $(i,i_{0})$ and with exponent $1/2$ .

We note that $\psi^{\prime}_{i_{0}}$ , $\psi^{\prime}_{i_{1}}$ are surjective, e.g. by observing that $\{(v,v)\colon v\in V\}$ is a subspace of $V^{\prime}$ .

Proof.

As promised, this is just the statement of the Cauchy–Schwarz inequality as it applies in this context. Given $f_{i}\colon W_{i}^{n}\to\mathbb{C}$ , we have

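\[ \left|\Lambda_{\Psi}\left((f_{i})_{i\in I}\right)\right|^{2}=\left|\mathbb{E}_{w\in W_{j}^{n}}f_{j}(w)\,\mathbb{E}_{v\in V^{n},\,\psi_{j}(v)=w}\prod_{i\neq j}f_{i}(\psi_{i}(v))\right|^{2}\leq\left(\mathbb{E}_{w\in W_{j}^{n}}|f_{j}(w)|^{2}\right)\left(\mathbb{E}_{w\in W_{j}^{n}}\left|\mathbb{E}_{v\in V^{n},\,\psi_{j}(v)=w}\prod_{i\neq j}f_{i}(\psi_{i}(v))\right|^{2}\right) \]

by Cauchy–Schwarz, and

\[ \mathbb{E}_{w\in W_{j}^{n}}\left|\mathbb{E}_{v\in V^{n},\,\psi_{j}(v)=w}\prod_{i\neq j}f_{i}(\psi_{i}(v))\right|^{2}=\mathbb{E}_{(v_{0},v_{1})\in{V^{\prime}}^{n}}\prod_{i\neq j}f_{i}(\psi_{i}(v_{0}))\,\overline{f_{i}(\psi_{i}(v_{1}))}. \]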

Defining $(g_{i})_{i\in I^{\prime}}$ in the obvious way, and provided $\|f_{i}\|_{\infty}\leq 1$ for each $i\in I$ , we get the desired inequality. ∎

Definition 3.7.

We denote the system $\Psi^{\prime}$ defined in Proposition 3.6 by $\operatorname{CS}_{j}(\Psi)$ .
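A single $\operatorname{CS}_{j}$ step can be tested numerically for linear forms with $n=1$ by enumerating the fiber product explicitly. The sketch below is an illustration under those assumptions; the function name `cauchy_schwarz_sides`, the sample forms over $\mathbb{F}_{3}$ and the random unimodular functions are hypothetical.

```python
import cmath
import random
from itertools import product


def evaluate(phi, v, p):
    """Value of the linear form phi at v, modulo p."""
    return sum(a * x for a, x in zip(phi, v)) % p


def cauchy_schwarz_sides(forms, fs, j, p):
    """Return (Lambda_Psi, Lambda_{CS_j(Psi)}, |V'|) for n = 1, where V' is
    the fiber product {(v0, v1) : phi_j(v0) = phi_j(v1)}."""
    V = list(product(range(p), repeat=len(forms[0])))
    lam = 0
    for v in V:
        term = 1
        for phi, f in zip(forms, fs):
            term *= f[evaluate(phi, v, p)]
        lam += term
    lam /= len(V)

    others = [i for i in range(len(forms)) if i != j]
    fiber_product = [(v0, v1) for v0 in V for v1 in V
                     if evaluate(forms[j], v0, p) == evaluate(forms[j], v1, p)]
    lam_cs = 0
    for v0, v1 in fiber_product:
        term = 1
        for i in others:
            # f_i on the copy i_0, and its conjugate on the copy i_1
            term *= (fs[i][evaluate(forms[i], v0, p)]
                     * fs[i][evaluate(forms[i], v1, p)].conjugate())
        lam_cs += term
    lam_cs /= len(fiber_product)
    return lam, lam_cs, len(fiber_product)


p = 3
forms = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 2, 1), (1, 0, 2)]
random.seed(1)
fs = [[cmath.exp(2j * cmath.pi * random.random()) for _ in range(p)] for _ in forms]
lam, lam_cs, size = cauchy_schwarz_sides(forms, fs, 5, p)
print(abs(lam) ** 2, lam_cs.real, size)
```

Here the fiber product has $3^{5}$ elements, the second average is real and non-negative, and it bounds $|\Lambda_{\Psi}|^{2}$ from above, which is the exponent $1/2$ statement for this step.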

Often we need to apply Cauchy–Schwarz not just to one function, but to several at a time. The preferred way of formalizing this for our purposes is in two steps. First, we merge all the functions being considered for Cauchy–Schwarz into a single function. That is, we forget that they are separate functions, and consider their product as just one function of all the variables they collectively depend on. For instance, we might merge $f_{1}(x)$ and $f_{2}(x+y)$ into $\mathcal{F}(x,y)$ . Next, we apply the Cauchy–Schwarz inequality in the form of Proposition 3.6 to the new function $\mathcal{F}$ .

In fact we will want to apply this merging operation in other contexts as well, because doing so is one way to eliminate redundant information. Having this ability is one of the main motivations for working in this more general language of linear data.

Again, we encode this operation with a proposition.

Proposition 3.8.

Let $\Psi=\left(V,(W_{i})_{i\in I},(\psi_{i})_{i\in I}\right)$ be a linear datum, let $J$ be a finite set, and let $\tau\colon I\to J$ be a surjective function. Define a new linear datum $\Psi^{\prime}=\left(V,(W^{\prime}_{j})_{j\in J},(\psi^{\prime}_{j})_{j\in J}\right)$ on the same underlying space $V$ , as follows:

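• for each $j\in J$, define

\[ \psi^{\prime}_{j}\colon V\to\bigoplus_{i\in\tau^{-1}(j)}W_{i},\qquad\psi^{\prime}_{j}(v)=\left(\psi_{i}(v)\right)_{i\in\tau^{-1}(j)}; \]

• define $W^{\prime}_{j}=\operatorname{im}\psi^{\prime}_{j}$; and

• by abuse of notation consider $\psi^{\prime}_{j}$ as a map $V\to W^{\prime}_{j}$.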

Then for any $i\in I$ and $j\in J$ with $\tau^{-1}(j)=\{i\}$ , $\Psi^{\prime}$ dominates $\Psi$ respecting $(i,j)$ and with exponent $1$ .

Proof.

Given $(f_{i})_{i\in I}$ , for each $j\in J$ let

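\[ g_{j}\colon\bigoplus_{i\in\tau^{-1}(j)}W_{i}^{n}\to\mathbb{C},\qquad g_{j}\left((w_{i})_{i\in\tau^{-1}(j)}\right)=\prod_{i\in\tau^{-1}(j)}f_{i}(w_{i}) \]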

and then restrict this function to the subspace ${W^{\prime}_{j}}^{n}$ . Then it is easy to see that

\[ \Lambda_{\Psi}\left((f_{i})_{i\in I}\right)=\Lambda_{\Psi^{\prime}}\left((g_{j})_{j\in J}\right) \]

and so the necessary inequality is in fact an equality. Moreover, if $\tau^{-1}(j)=\{i\}$ is a singleton then $W_{i}=W^{\prime}_{j}$ and $g_{j}=f_{i}$ , so the conditions of Definition 3.3 are met. ∎

Remark 3.9.

The definitions of $\psi^{\prime}_{j}$ and $W^{\prime}_{j}$ are somewhat involved. A more natural characterization in terms of subspaces of $V$ is that

\[ \ker\psi^{\prime}_{j}=\bigcap_{i\in\tau^{-1}(j)}\ker\psi_{i}, \]

i.e. merging functions corresponds to intersecting the corresponding subspaces. Dually, we have

\[ (\ker\psi^{\prime}_{j})^{\perp}=\sum_{i\in\tau^{-1}(j)}(\ker\psi_{i})^{\perp}, \]

so merging takes spans of the relevant subspaces of $V^{\ast}$ . Again, either of these allows us to reconstruct $\psi^{\prime}_{j}$ and $W^{\prime}_{j}$ up to isomorphism.

Definition 3.10.

We denote the linear datum $\Psi^{\prime}$ defined as in Proposition 3.8 by $\operatorname{MERGE}_{\tau}(\Psi)$ .

In a slight abuse of notation, we may omit any indices that are unchanged by $\tau$ from the description of $\tau$ . For instance, the operation that merges indices $4$ and $7$ and labels the new combined index $A$ might be denoted $\operatorname{MERGE}_{\{4,7\}\mapsto A}$ .
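When every $\psi_{i}$ being merged is a linear form, the dual description of merging in Remark 3.9 says that the merged map is determined by the span of those forms in $V^{\ast}$. The sketch below computes a reduced basis of that span; the name `merge_forms` is hypothetical, and the merged target $W^{\prime}_{j}$ is only recovered up to isomorphism, as its dimension.

```python
def merge_forms(forms, p):
    """Reduced row-echelon basis over F_p of the span of the given forms.
    Its length is dim W'_j for the merged index j."""
    rows = [[x % p for x in phi] for phi in forms]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                c = rows[r][col]
                rows[r] = [(x - c * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rows[:rank]


# Merging f_1(x) and f_2(x + y), as forms on F_11^3 in variables (x, y, z):
print(merge_forms([(1, 0, 0), (1, 1, 0)], 11))
```

The result `[[1, 0, 0], [0, 1, 0]]` reflects that the merged function is an arbitrary function of $(x,y)$. Merging three collinear forms gives a target of dimension $2$ rather than $3$, which is one sense in which merging discards redundant information.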

We are now in a position to state a version of Theorem 1.5 coded in this language.

Lemma 3.11.

Let $p$ be a prime, and let $\phi_{1},\dots,\phi_{6}\colon\mathbb{Z}^{3}\to\mathbb{Z}$ be six linear forms in three variables. Let $V=\mathbb{F}_{p}^{3}$ , and by abuse of notation let $\phi_{i}\colon V\to\mathbb{F}_{p}$ for $1\leq i\leq 6$ denote the same forms reduced modulo $p$ , assumed to be non-zero. Write $W_{i}=\mathbb{F}_{p}$ for $1\leq i\leq 6$ , set $I=[6]$ , and hence define the linear datum $\Psi=\left(V,(W_{i})_{i\in I},(\phi_{i})_{i\in I}\right)$ .

Suppose $[\phi_{1}],\dots,[\phi_{6}]$ do not lie on a conic in $\mathbb{P}(V^{\ast})$ . Then there is some sequence of operations $\operatorname{TRIVIAL}$ , $\operatorname{CS}_{j}$ and $\operatorname{MERGE}_{\tau}$ which can be applied to $\Psi$ in turn to produce a final linear datum $\Psi^{\prime}$ , such that:

• $\Psi^{\prime}=(V,(W_{i})_{i\in I},(\phi^{\prime}_{i})_{i\in I})$ where $I$, $V$, $W_{i}$ are unchanged and $\phi_{i}=\phi_{i}^{\prime}$ for $1\leq i\leq 4$; i.e. $\Psi^{\prime}$ again corresponds to $6$ linear forms in $3$ variables over $\mathbb{F}_{p}$, where only $\phi^{\prime}_{5}$ and $\phi^{\prime}_{6}$ may have changed;

• by applying Propositions 3.4, 3.6 or 3.8 as appropriate as we go, we can deduce that $\Psi^{\prime}$ dominates $\Psi$ respecting $(1,1)$ and with exponent $2^{-m}$ where $m$ is the number of $\operatorname{CS}$ steps;

• $m$ is bounded by $O(K^{O(1)})$ where $K$ is the size of the largest coefficient of $\Phi$; or alternatively by $O(\log p)$; and

• the points $[\phi^{\prime}_{1}],\dots,[\phi^{\prime}_{6}]$ do not lie on a conic, but some three are collinear.

This last condition means that one of Proposition 2.2 or Proposition 2.3 applies to the forms $\phi^{\prime}_{1},\dots,\phi^{\prime}_{6}$ , and so

\[ \left|\Lambda_{\Psi^{\prime}}(f_{1},\dots,f_{6})\right|\leq\|f_{1}\|_{U^{2}}^{1/2} \]

for any $f_{i}\colon W_{i}\to\mathbb{C}$ with $\|f_{i}\|_{\infty}\leq 1$ . Combining this with the domination statement allows us to deduce Theorem 1.5 (at least for $j=1$ ; the other cases follow by relabelling the indices).

Remark 3.12.

The combinatorial operations $\operatorname{CS}_{j}$ , $\operatorname{MERGE}_{\tau}$ describe the heart of any strategy, whereas $\operatorname{TRIVIAL}$ steps are really just book-keeping to aid with proofs. One could in principle delay all $\operatorname{TRIVIAL}$ steps to the end of the argument, or perhaps remove them completely, without fundamentally changing the approach.

3.2 Graphs of vector spaces

One remaining difficulty in reasoning about the effect of repeated invocations of $\operatorname{CS}_{j}$ is finding a good notation for discussing the iterated fiber products that arise in the definition of the ambient vector space $V^{\prime}$ .

At the expense of yet further notational overhead, we introduce one more tool to help with this. This subsection has very little content beyond allowing us to draw certain diagrams and make sense of what they mean.

Definition 3.13.

Let $\mathcal{G}=(X,E)$ be a (multi)-graph with vertex set $X$ and edge set $E$ , and let $V$ be any vector space over $\mathbb{F}_{p}$ . Suppose that to every edge $e=(x,y)\in E$ is associated a subspace $H_{e}$ of $V$ . Then the vector space $\mathcal{G}(V,(H_{e})_{e\in E})$ associated to this set-up is the subspace of $V^{X}$ given by

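\[ \mathcal{G}(V,(H_{e})_{e\in E})=\left\{(v_{x})_{x\in X}\in V^{X}\colon v_{x}-v_{y}\in H_{e}\text{ for each edge }e=(x,y)\in E\right\}. \]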

In other words, we place a copy of $V$ at every vertex and impose a compatibility restriction for every edge.

We will always apply this when each subspace $H_{e}$ is one of $\ker\psi_{i}$ for $1\leq i\leq 6$ , where $\psi_{i}\colon V\to W_{i}$ is part of some original linear datum with underlying space $V$ . It then makes sense to label each edge $e$ with a number $i\in[6]$ , in place of the subspace $H_{e}=\ker\psi_{i}$ .

The useful feature of this set-up is that $\operatorname{CS}_{j}$ steps correspond to simple combinatorial operations on the graph $\mathcal{G}$ : we replace $X$ by two copies $X_{0},X_{1}$ , keeping all the edges in each half; and we add an edge between $X_{0}$ and $X_{1}$ for every linear form involved in that Cauchy–Schwarz step (which is applied to the merge of some linear forms).

This is best illustrated by example. Suppose we start with a linear datum $\Psi=\left(V,(W_{i})_{i\in[6]},(\psi_{i})_{i\in[6]}\right)$ . At this point, the graph $\mathcal{G}$ consists of a single vertex and no edges.

If we now apply $\operatorname{CS}_{6}$ , we get a linear datum whose underlying vector space corresponds to the following graph:

[Diagram: two vertices, $0$ and $1$, joined by a single edge labelled $6$.]

Indeed, the definition of $\mathcal{G}(V,(H_{e})_{e\in E})$ in this case gives exactly the fiber product from Proposition 3.6. Recall that the indices / forms in scope are now called $1_{0},2_{0},3_{0},4_{0},5_{0}$ and $1_{1},2_{1},3_{1},4_{1},5_{1}$ and are associated to the left vertex and the right vertex respectively.

Suppose we now apply $\operatorname{MERGE}_{\{4_{0},5_{1}\}\mapsto A}$ (recalling that by convention we assume the other indices are sent to themselves) and then apply $\operatorname{CS}_{A}$ . The new graph is:

[Diagram: four vertices $00$, $10$, $01$, $11$, with edges labelled $6$, $6$, $4$ and $5$.]

Finally, we might apply $\operatorname{CS}_{5_{01}}$ to obtain:

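[Diagram: eight vertices $000$, $100$, $010$, $110$, $001$, $101$, $011$, $111$; each half carries edges labelled $6$, $6$, $4$ and $5$, and one further edge labelled $5$ joins the two halves.]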

Any formal justification of this general pattern would be tedious and unreadable, so we will not attempt one. The reader may, if they wish, treat all such diagrams as visual aids having no formal impact on the proofs.
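For linear forms, the space attached to a labelled graph in Definition 3.13 is cut out by one linear equation per edge, so its dimension is a rank computation. The sketch below assumes each edge label $i$ stands for $H_{e}=\ker\psi_{i}$ with $\psi_{i}$ a linear form; the name `graph_space_dim`, the sample forms, and the placement of the edges in the four-vertex example (our reading of the duplication rule described in this subsection) are assumptions.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix over F_p, computed by Gaussian elimination."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                c = rows[r][col]
                rows[r] = [(x - c * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def graph_space_dim(vertices, edges, forms, p):
    """Dimension of {(v_x) in V^X : psi_i(v_x) = psi_i(v_y) for every edge
    (x, y, i)}, where forms[i] is the linear form psi_i on V = F_p^d."""
    d = len(next(iter(forms.values())))
    index = {x: k for k, x in enumerate(vertices)}
    constraints = []
    for x, y, i in edges:
        row = [0] * (d * len(vertices))
        for t, a in enumerate(forms[i]):
            row[d * index[x] + t] += a   # +psi_i on the copy of V at x
            row[d * index[y] + t] -= a   # -psi_i on the copy of V at y
        constraints.append(row)
    return d * len(vertices) - rank_mod_p(constraints, p)


forms = {4: (1, 1, 2), 5: (1, 2, 1), 6: (1, 0, 0)}  # hypothetical forms over F_11
one_edge = graph_space_dim(["0", "1"], [("0", "1", 6)], forms, 11)
square = graph_space_dim(
    ["00", "10", "01", "11"],
    [("00", "10", 6), ("01", "11", 6), ("00", "01", 4), ("10", "11", 5)],
    forms, 11)
print(one_edge, square)
```

The single-edge graph gives dimension $5$, matching the five variables $x,y,y^{\prime},z,z^{\prime}$ that appear after one Cauchy–Schwarz step in Section 2; under the stated reading, the four-vertex graph gives dimension $8$.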

4 The detailed strategy for Theorem 1.5

The formalism of the previous section gives us very significant freedom to make radical changes to a linear datum $\Psi$ . However, to prove a general result, what we want is to find a sequence of operations that changes $\Psi$ as conservatively as possible, ideally giving back another datum of the same form with a small predictable change to some of the parameters.

Our task in this section then splits into two parts:

(i) to describe such a sequence of operations – henceforth called a block – and analyze and verify the change it produces; and

(ii) to show how to chain these blocks together to reach sufficiently arbitrary points in the parameter space.

We will approach these tasks in reverse order.

4.1 The effect of the block construction

We again write $V=\mathbb{F}_{p}^{3}$ . Suppose $X_{1},\dots,X_{6}$ are six points in $\mathbb{P}(V^{\ast})$ , corresponding to some system of six linear forms. Suppose furthermore that $X_{1},\dots,X_{4}$ are in general position, and that $X_{5},X_{6}$ lie on some given line $\ell$ but $X_{1},\dots,X_{4}$ do not lie on $\ell$ . Note we allow that, say, $X_{1}X_{2}X_{5}$ be collinear, or even that $X_{5}=X_{6}$ ; in the latter case, $\ell$ forms part of the data of the set-up since it cannot be recovered from $X_{1},\dots,X_{6}$ .

We will now describe an operation that modifies this collection of points. Specifically, it will leave $X_{1},\dots,X_{4}$ unchanged, and replace $X_{5},X_{6}$ with two different points $X^{\prime}_{5},X^{\prime}_{6}$ that both lie on the unchanged line $\ell$ .

The points $X^{\prime}_{5},X^{\prime}_{6}$ are constructed as follows. Let $Y$ be the point at the intersection of the lines $X_{1}X_{5}$ and $X_{3}X_{4}$ ; then $X^{\prime}_{5}$ is the intersection of $X_{2}Y$ and $\ell$ . Similarly, letting $Z$ be the intersection of $X_{2}X_{6}$ and $X_{3}X_{4}$ , the point $X^{\prime}_{6}$ is the intersection of $X_{1}Z$ and $\ell$ . This construction is shown in Figure 1.

Given our hypotheses on $X_{1},\dots,X_{6}$ , this definition always makes sense.

We call this construction a block operation $B_{1\to 2}$, i.e. $B_{1\to 2}(X_{1},\dots,X_{6};\ell)=(X_{1},\dots,X_{4},X^{\prime}_{5},X^{\prime}_{6};\ell)$. By exchanging the roles of $X_{1},X_{2},X_{3},X_{4}$ we create a family of $12$ operations $B_{i\to j}$, one for each ordered pair $i,j\in[4]$ with $i\neq j$. Swapping $X_{3}$ and $X_{4}$ while leaving $X_{1}$ and $X_{2}$ fixed gives the same construction, which accounts for the fact that there are $12$ operations rather than $4!=24$, and for the choice of notation.
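The construction of $X^{\prime}_{5},X^{\prime}_{6}$ can be carried out in homogeneous coordinates, where both the line through two points and the intersection point of two lines are given by a cross product. The sketch below is only an illustration of this: the names `cross`, `same_point` and `block_1_to_2`, and the sample points over $\mathbb{F}_{11}$, are our own, and points and lines are represented by coordinate triples.

```python
def cross(a, b, p):
    """Cross product mod p: the line through two points, or dually the
    intersection point of two lines, in homogeneous coordinates."""
    return ((a[1] * b[2] - a[2] * b[1]) % p,
            (a[2] * b[0] - a[0] * b[2]) % p,
            (a[0] * b[1] - a[1] * b[0]) % p)


def same_point(a, b, p):
    """True iff a and b are non-zero and proportional modulo p."""
    return (any(x % p for x in a) and any(x % p for x in b)
            and cross(a, b, p) == (0, 0, 0))


def block_1_to_2(X, ell, p):
    """Apply B_{1->2} to X = (X1, ..., X6), with X5 and X6 on the line ell."""
    X1, X2, X3, X4, X5, X6 = X
    line34 = cross(X3, X4, p)
    Y = cross(cross(X1, X5, p), line34, p)        # X1X5 meets X3X4
    X5_new = cross(cross(X2, Y, p), ell, p)       # X2Y meets ell
    Z = cross(cross(X2, X6, p), line34, p)        # X2X6 meets X3X4
    X6_new = cross(cross(X1, Z, p), ell, p)       # X1Z meets ell
    return (X1, X2, X3, X4, X5_new, X6_new)


p = 11
ell = (1, 2, 3)  # the line x + 2y + 3z = 0
X = ((1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (1, 9, 1), (1, 4, 8))
new = block_1_to_2(X, ell, p)
print(new[4], new[5])
```

In this example $X_{1},\dots,X_{4}$ are in general position and off $\ell$, and the new points again lie on $\ell$, as the construction requires.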

In Section 4.2, we will implement a sequence of $\operatorname{CS}$, $\operatorname{MERGE}$ and $\operatorname{TRIVIAL}$ operations whose overall effect is equivalent to this $B_{1\to 2}$ move. That is, starting with a linear datum corresponding (indirectly) to forms $\phi_{1},\dots,\phi_{6}$ and applying this sequence of operations, we obtain a linear datum corresponding to $B_{1\to 2}([\phi_{1}],\dots,[\phi_{6}])$. We will not state this result precisely yet, as there are some technical subtleties to do with the case $[\phi_{5}]=[\phi_{6}]$, where the previous sentence does not even make sense and the datum has to be modified to encode the line $\ell$ as well as the points. (Handling this case correctly is an irritating source of complexity in the argument, but seems to be slightly less irritating than avoiding it.)

For the time being we will consider the operations $B_{i\to j}$ as a black box. In order to prove Lemma 3.11, we broadly need to show that some sequence of moves $B_{i\to j}$ takes the original $X_{5},X_{6}$ to some final $X^{\prime\prime}_{5},X^{\prime\prime}_{6}$ , with the property that one of $X_{i},X_{j},X^{\prime\prime}_{5}$ or $X_{i},X_{j},X^{\prime\prime}_{6}$ are collinear for some choice of $1\leq i<j\leq 4$ , but the complementary triple are not collinear.

The second part of this is guaranteed by the following lemma, which shows that we never lose control of true complexity by applying $B_{1\to 2}$ (and symmetrically $B_{i\to j}$ for other pairs $(i,j)$ ).

Lemma 4.1.

Let $(X_{1},\dots,X_{6};\ell)$ be as above, and suppose:

• if $X_{5}\neq X_{6}$, that $X_{1},\dots,X_{6}$ do not lie on a (possibly degenerate) conic; or

• if $X_{5}=X_{6}$, that $X_{1},\dots,X_{5}$ do not lie on a (possibly degenerate) conic that is tangent to $\ell$ at $X_{5}$.

Then the same is true of $B_{1\to 2}(X_{1},\dots,X_{6};\ell)$ .

Note that in the degenerate case, saying a degenerate conic consisting of two lines $\mu_{1}$ , $\mu_{2}$ is “tangent” to $\ell$ translates algebraically to saying that $\mu_{1},\mu_{2},\ell$ are concurrent.

Proof.

For any point $Y$ on $\ell$, there is a unique (possibly degenerate) conic $C$ passing through $X_{1},\dots,X_{4}$ and $Y$. This conic meets $\ell$ again at precisely one other point, counting multiplicity, which we denote by $\tau(Y)$. So, $\tau$ is an involution $\tau\colon\ell\to\ell$ of the points of $\ell$.

It is clear without doing detailed calculations that $\tau$ is a birational map $\mathbb{P}^{1}(\mathbb{F}_{p})\to\mathbb{P}^{1}(\mathbb{F}_{p})$ , and so it must be a Möbius transformation (or one can check this directly). Moreover, if we write $A_{ij}$ for $1\leq i<j\leq 4$ for the intersection point of the lines $X_{i}X_{j}$ and $\ell$ , then $\tau$ is characterized by

\[ \tau(A_{12})=A_{34},\qquad \tau(A_{13})=A_{24},\qquad \tau(A_{14})=A_{23} \tag{3} \]

since in all of these cases the conic $C$ is degenerate and so $\tau(Y)$ can be found by inspection.

Let $\sigma_{1\to 2}$ be the map $\ell\to\ell$ sending $P\in\ell$ to $QX_{2}\cap\ell$ where $Q=X_{1}P\cap X_{3}X_{4}$ (i.e., how we obtained $X^{\prime}_{5}$ from $X_{5}$ in the definition of $B_{1\to 2}$ ). Then $\sigma_{1\to 2}$ is also a Möbius transformation $\ell\to\ell$ , since it is the composition of two perspectivities, the first sending $\ell$ to $X_{3}X_{4}$ via $X_{1}$ and the second sending $X_{3}X_{4}$ to $\ell$ via $X_{2}$ . Moreover, it is characterized by

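\[ \sigma_{1\to 2}(A_{12})=A_{12},\qquad \sigma_{1\to 2}(A_{13})=A_{23},\qquad \sigma_{1\to 2}(A_{14})=A_{24},\qquad \sigma_{1\to 2}(A_{34})=A_{34} \tag{4} \]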

as again the image point in these cases is immediate by inspection. Let $\sigma_{i\to j}$ for $1\leq i\neq j\leq 4$ be defined similarly by permuting the roles of $X_{1},\dots,X_{4}$ ; so (4) holds analogously for these under a suitable permutation of indices. Note that $B_{1\to 2}(X_{1},\dots,X_{6};\ell)=(X_{1},\dots,X_{4},X^{\prime}_{5},X^{\prime}_{6};\ell)$ where $X^{\prime}_{5}=\sigma_{1\to 2}(X_{5})$ and $X^{\prime}_{6}=\sigma_{2\to 1}(X_{6})$ .

Now, the hypothesis on $X_{1},\dots,X_{6}$ in the statement holds if and only if $\tau(X_{5})\neq X_{6}$ . We wish to deduce that $\tau(X^{\prime}_{5})\neq X^{\prime}_{6}$ ; equivalently, that $\tau\sigma_{1\to 2}(X_{5})\neq\sigma_{2\to 1}(X_{6})$ . It would suffice to show that $\tau\sigma_{1\to 2}=\sigma_{2\to 1}\tau$ as Möbius transformations $\ell\to\ell$ . However, this is immediate because

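\[ \tau\sigma_{1\to 2}(A_{13})=A_{14}=\sigma_{2\to 1}\tau(A_{13}),\qquad \tau\sigma_{1\to 2}(A_{14})=A_{13}=\sigma_{2\to 1}\tau(A_{14}),\qquad \tau\sigma_{1\to 2}(A_{34})=A_{12}=\sigma_{2\to 1}\tau(A_{34}) \]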

(using (3) and (4)) and because it follows from our general position assumptions that $A_{13}$ , $A_{14}$ , $A_{34}$ are distinct points of $\ell$ , and therefore uniquely determine a Möbius transformation. ∎
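The identity $\tau\sigma_{1\to 2}=\sigma_{2\to 1}\tau$ can also be checked numerically in a single configuration. The following is a minimal sketch, not part of the paper's argument: it assumes the prime $p=101$, the coordinates used later in the proof of Lemma 4.2 (with $\ell=\{z=0\}$), and the illustrative choice $X_{4}=[2:3:1]$. The helper names (`cross`, `sigma`, `tau`, `commutes`) are hypothetical. Here `tau` finds the second intersection with $\ell$ of the conic through $X_{1},\dots,X_{4}$ and $Y$ by writing that conic in the pencil spanned by the line pairs $X_{1}X_{2}\cup X_{3}X_{4}$ and $X_{1}X_{3}\cup X_{2}X_{4}$.

```python
P = 101  # demo prime (an assumption; the lemma holds for any prime p)

def cross(u, v):
    """Cross product mod P: line through two points, or meet of two lines."""
    return ((u[1] * v[2] - u[2] * v[1]) % P,
            (u[2] * v[0] - u[0] * v[2]) % P,
            (u[0] * v[1] - u[1] * v[0]) % P)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % P

def normalize(u):
    """Scale a nonzero vector so that its first nonzero coordinate is 1."""
    for c in u:
        if c % P:
            inv = pow(c, -1, P)
            return tuple((x * inv) % P for x in u)
    raise ValueError("zero vector is not a projective point")

# Coordinates from the proof of Lemma 4.2; X4 = [2:3:1] is an illustrative choice.
X = {1: (0, 0, 1), 2: (1, 0, 1), 3: (0, P - 1, 1), 4: (2, 3, 1)}
ELL = (0, 0, 1)  # the line z = 0

def A(i, j):
    """A_ij: intersection of the line X_i X_j with ell."""
    return normalize(cross(cross(X[i], X[j]), ELL))

def sigma(i, j, pt):
    """sigma_{i->j}: project pt from X_i onto the line through the other two
    points, then project back onto ell from X_j."""
    k, l = [m for m in (1, 2, 3, 4) if m not in (i, j)]
    q = cross(cross(X[i], pt), cross(X[k], X[l]))
    return normalize(cross(cross(q, X[j]), ELL))

def tau(pt):
    """Second intersection with ell of the conic through X1..X4 and pt."""
    l12, l34 = cross(X[1], X[2]), cross(X[3], X[4])
    l13, l24 = cross(X[1], X[3]), cross(X[2], X[4])
    lam = dot(l13, pt) * dot(l24, pt) % P
    mu = -dot(l12, pt) * dot(l34, pt) % P

    def q(x, y):  # the conic restricted to z = 0
        v = (x, y, 0)
        return (lam * dot(l12, v) * dot(l34, v)
                + mu * dot(l13, v) * dot(l24, v)) % P

    alpha, gamma = q(1, 0), q(0, 1)
    beta = (q(1, 1) - alpha - gamma) % P
    x0, y0 = pt[0] % P, pt[1] % P
    # Vieta: q(x, y) = c (y0 x - x0 y)(y1 x - x1 y); solve for [x1 : y1].
    if x0:
        inv = pow(x0, -1, P)
        u = gamma * inv % P
        w = -(beta + y0 * u) * inv % P
    else:
        inv = pow(y0, -1, P)
        w = alpha * inv % P
        u = -beta * inv % P
    return normalize((u, w, 0))

def points_on_ell():
    return [(1, t, 0) for t in range(P)] + [(0, 1, 0)]

def commutes():
    """Check tau . sigma_{1->2} == sigma_{2->1} . tau at every point of ell."""
    return all(tau(sigma(1, 2, pt)) == sigma(2, 1, tau(pt))
               for pt in points_on_ell())
```

Calling `commutes()` tests the identity at all $p+1$ points of $\ell$ rather than only at $A_{13}$, $A_{14}$, $A_{34}$; this is of course only evidence for one configuration, whereas the proof above covers all of them.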

This means that if we can find a sequence of operations $B_{i\to j}$ that takes $X_{5}$ to some $X^{\prime\prime}_{5}$ such that some triple $X_{k},X_{\ell},X^{\prime\prime}_{5}$ for $1\leq k<\ell\leq 4$ is collinear, then the complementary triple cannot be collinear as then $X_{1},\dots,X_{4},X^{\prime\prime}_{5},X^{\prime\prime}_{6}$ would lie on a degenerate conic. So, we can largely forget about $X_{6}$ in what follows and concentrate on the action of $B_{i\to j}$ on $X_{5}$ , which corresponds to the Möbius transformations $\sigma_{i\to j}$ defined in the proof of Lemma 4.1.

The following lemma explains how to take $X_{5}$ to an arbitrary point on $\ell$ , relatively efficiently, using multiple transformations $\sigma_{i\to j}$ .

Lemma 4.2.

We continue the notation from the proof of Lemma 4.1. Suppose we identify $\ell$ with $\mathbb{P}^{1}(\mathbb{F}_{p})$ (i.e., choose coordinates) by identifying $A_{12}$ with $\infty$ , $A_{13}$ with $0$ and $A_{23}$ with $1$ (again noting these are guaranteed to be distinct). Every Möbius transformation of $\ell$ now corresponds to a $2\times 2$ matrix in $\operatorname{PGL}_{2}(\mathbb{F}_{p})$ .

Then the following hold:

[TABLE]

and:

[TABLE]

Consequently, the action of $\sigma_{i\to j}$ on $\mathbb{P}^{1}(\mathbb{F}_{p})$ is transitive, and more specifically any point $[r:s]\in\mathbb{P}^{1}(\mathbb{F}_{p})$ , where $r,s\in\mathbb{Z}$ , may be mapped to $[1:1]$ using a word of length $O(|r|+|s|)$ or $O(\log p)$ in $\sigma_{i\to j}$ .

Proof.

The first three identities may be verified using only (4) (and the corresponding statement for the other $\sigma_{i\to j}$ ) to deduce the action on $A_{12}$ , $A_{13}$ , $A_{23}$ (which correspond to $\infty,0,1$ ).

For the next four, that approach does not appear to be sufficient. Our strategy is just to pick coordinates and compute explicitly.

We can choose projective coordinates for $\mathbb{P}(V^{\ast})$ such that $X_{1}=[0:0:1]$ , $X_{2}=[1:0:1]$ , $X_{3}=[0:-1:1]$ and $\ell=\{[x:y:z]\colon z=0\}$ . (Footnote 6: To see this is possible, note that we can certainly choose a projective transformation sending $X_{1}$ , $X_{2}$ , $X_{3}$ to $[1:0:0]$ , $[0:1:0]$ and $[0:0:1]$ as $X_{1},X_{2},X_{3}$ are not collinear. In these coordinates, $\ell=\{(x,y,z)\in\mathbb{P}(V^{\ast})\colon ax+by+cz=0\}$ for some $a,b,c\in\mathbb{F}_{p}\setminus\{0\}$ (as none of $X_{1},X_{2},X_{3}$ lie on $\ell$ ); by further rescaling we can ensure $a=b=c=1$ . This convention differs from the one above, which is somewhat more convenient, by a further fixed change of coordinates.)

We write $X_{4}=[a:b:1]$ for some $a,b\in\mathbb{F}_{p}$ : by our collinearity assumptions, this is possible, and furthermore $a\neq 0$ , $b\neq 0$ and $a-b-1\neq 0$ .

Using the information (4), we can compute the matrices of $\sigma_{i\to j}$ explicitly as

[TABLE]

with the other six following from the relation $\sigma_{i\to j}=\sigma_{j\to i}^{-1}$ . Verifying the remaining formulae is now just an exercise in multiplying (projective) matrices.

By a modified version of Euclid’s algorithm, any point $[r:s]\in\mathbb{P}^{1}$ may be reduced to one of $\infty,0,1$ using the matrices $\begin{pmatrix}1&\pm 2\\ 0&1\end{pmatrix}$ , $\begin{pmatrix}1&0\\ \pm 2&1\end{pmatrix}$ , in $O(|r|+|s|)$ steps (the worst case being something like $[r:1]$ for $r$ a large integer; the typical case is better). Alternatively, this can be done in $O(\log p)$ steps, since the Cayley graph on $\operatorname{PSL}_{2}(\mathbb{F}_{p})$ with these generators has diameter $O(\log p)$ , as a corollary of celebrated results concerning expander graphs; see [12]. Finally, one of the first three matrices moves the end point to $[1:1]$ , if necessary. ∎
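One possible reading of the "modified Euclid's algorithm" is sketched below; the paper does not spell the procedure out, so the details are an assumption. The hypothetical function `reduce_point` works with integer representatives $(r,s)$ of $[r:s]$, with $[1:0]=\infty$, $[0:1]=0$ and $[1:1]=1$ (an assumed convention). Each move replaces $r$ by $r\pm 2s$ or $s$ by $s\pm 2r$, which is the action of one of the four matrices on the column vector $(r,s)$, and the sign is chosen to shrink the larger entry.

```python
def reduce_point(r, s):
    """Reduce [r:s] to one of 'inf' = [1:0], '0' = [0:1], '1' = [1:1].

    Returns (label, moves), where each move is ('r', sign) for
    r -> r + 2*sign*s, or ('s', sign) for s -> s + 2*sign*r.
    """
    if r == 0 and s == 0:
        raise ValueError("[0:0] is not a projective point")
    moves = []
    # While the entries differ in absolute value and neither is zero,
    # shrink the larger one; |r| + |s| strictly decreases at each step.
    while r != 0 and s != 0 and abs(r) != abs(s):
        if abs(r) > abs(s):
            sign = -1 if abs(r - 2 * s) < abs(r + 2 * s) else 1
            r += 2 * sign * s
            moves.append(("r", sign))
        else:
            sign = -1 if abs(s - 2 * r) < abs(s + 2 * r) else 1
            s += 2 * sign * r
            moves.append(("s", sign))
    if s == 0:
        return "inf", moves
    if r == 0:
        return "0", moves
    if r != s:
        # Here r = -s, i.e. the point [1:-1]; one more move gives [1:1].
        r += 2 * s
        moves.append(("r", 1))
    return "1", moves
```

For example, `reduce_point(7, 1)` uses the moves $7\to 5\to 3\to 1$ on the first coordinate, which illustrates why points of the form $[r:1]$ are the slow case: each step lowers $|r|$ by only $2$.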

We should also verify that the values of $[r:s]$ representing the original point $X_{5}$ are not too large.

Lemma 4.3.

Suppose $X_{i}=[x_{i}:y_{i}:z_{i}]$ for $1\leq i\leq 6$ where $x_{i},y_{i},z_{i}$ are integers, and $X_{5}\neq X_{6}$ . Then in the coordinates on $\ell$ described in Lemma 4.2, the point $X_{5}\in\ell$ is identified with $[r:s]$ where $r,s$ are integers with $|r|,|s|=O(\max\{|x_{i}|,|y_{i}|,|z_{i}|\colon 1\leq i\leq 6\}^{6})$ .

Proof.

Write $|X_{i}X_{j}X_{k}|$ for the determinant of the $3\times 3$ matrix whose columns are the vectors $(x_{i},y_{i},z_{i})$ , $(x_{j},y_{j},z_{j})$ , $(x_{k},y_{k},z_{k})$ . Then we set

[TABLE]

and claim that $[r:s]$ are the coordinates of $X_{5}$ in the coordinate system of Lemma 4.2. One can verify that the definition of $[r:s]$ is invariant under rescaling any $(x_{i},y_{i},z_{i})$ , or under a change of projective coordinates on $\mathbb{P}^{2}$ . Also, it is straightforward to verify this claim when $X_{1}=[0:0:1]$ , $X_{2}=[1:0:1]$ , $X_{3}=[0:-1:1]$ and $\ell=\{[x:y:z]\colon z=0\}$ (i.e. in the coordinates on $\mathbb{P}^{2}$ used in the proof of Lemma 4.2). It follows that the claim holds in general. ∎

We have one further technical issue to consider. It will be convenient to construct the block that implements $B_{i\to j}$ only in cases where none of the triples $X_{i},X_{j},X_{5}$ or $X_{i},X_{j},X_{6}$ for $1\leq i<j\leq 4$ is collinear. This is not too onerous, as the first time we obtain a set of points where one of these triples is collinear, we can just stop and the conclusion of Lemma 3.11 will be satisfied. However, for this to work we need to check that, when this happens, we are not in one of the degenerate cases where $X_{5}=X_{6}$ .

We therefore check the following lemma.

Lemma 4.4.

Suppose $X_{1},\dots,X_{6}$ and $\ell$ are as above and $X^{\prime}_{5}$ , $X^{\prime}_{6}$ are the points returned by the block move $B_{1\to 2}(X_{1},\dots,X_{6};\ell)$ . Suppose also that no triple $X_{i},X_{j},X_{5}$ or $X_{i},X_{j},X_{6}$ for $1\leq i<j\leq 4$ is collinear, but that some triple $X_{i},X_{j},X^{\prime}_{5}$ or $X_{i},X_{j},X^{\prime}_{6}$ is collinear. Then $X^{\prime}_{5}\neq X^{\prime}_{6}$ .

Proof.

Suppose for contradiction that $X^{\prime}_{5}=X^{\prime}_{6}=A_{ij}$ for some $1\leq i<j\leq 4$ . Then $X_{5}=\sigma_{1\to 2}^{-1}(A_{ij})$ and $X_{6}=\sigma_{2\to 1}^{-1}(A_{ij})$ . By (4), if either $i=2$ or $j=2$ , or $i=3$ and $j=4$ , then $X_{5}=A_{i^{\prime}j^{\prime}}$ for some $1\leq i^{\prime}<j^{\prime}\leq 4$ , which is a contradiction. Similarly, if either $i=1$ or $j=1$ , or $i=3$ and $j=4$ , then $X_{6}=A_{i^{\prime}j^{\prime}}$ for some $1\leq i^{\prime}<j^{\prime}\leq 4$ , which is again a contradiction. Since these cases exhaust all possible pairs $i,j$ , the result follows. ∎

4.2 Implementing a block move

In this subsection we describe a sequence of $\operatorname{CS}$ , $\operatorname{MERGE}$ and $\operatorname{TRIVIAL}$ operations that have the effect of a block transformation $B_{1\to 2}$ . This is the last but most central ingredient in the proof of Theorem 1.5.

In the course of the argument in Section 4.1, we may need to consider intermediate configurations $(X_{1},\dots,X_{6})\in\mathbb{P}^{2}$ for which $X_{5}=X_{6}$ . Typically we do not expect this case to arise, but it would be onerous to try to avoid it in general. Also, it is not true that in such cases we are immediately done by some easy Cauchy–Schwarz technique as in Section 2: it appears this degeneracy is not one we can use to our advantage.

To handle this, we need to build more slack into our linear datum. We again set $V=\mathbb{F}_{p}^{3}$ .

Definition 4.5.

Let $\phi_{1},\dots,\phi_{6}\colon V\to\mathbb{F}_{p}$ be non-zero linear forms, with $[\phi_{1}],\dots,[\phi_{4}]$ in general position, and let $\ell$ be a subspace of dimension $2$ in $V^{\ast}$ containing $\phi_{5}$ , $\phi_{6}$ but none of $\phi_{1},\dots,\phi_{4}$ . An augmented datum representing $(\phi_{1},\dots,\phi_{6};\ell)$ is a linear datum $\Psi=\left(V\oplus\mathbb{F}_{p},(W_{i})_{i\in I},(\psi_{i})_{i\in I}\right)$ where $I=[6]$ , $W_{i}=\mathbb{F}_{p}$ for each $1\leq i\leq 4$ and $W_{5}=W_{6}=\mathbb{F}_{p}^{2}$ , and $\psi_{i}\colon V\oplus\mathbb{F}_{p}\to W_{i}$ satisfy:

•

for $1\leq i\leq 4$ and any $(v,t)\in V\oplus\mathbb{F}_{p}$ we have $\psi_{i}(v,t)=\phi_{i}(v)$ ;

•

for $i=5,6$ and any $(v,t)\in V\oplus\mathbb{F}_{p}$ we have $\psi_{i}(v,t)=(\phi_{i}(v),t+\chi_{i}(v))$ for some $\chi_{i}\in\ell$ .

Also, the standard datum representing $(\phi_{1},\dots,\phi_{6})$ is just $\Psi^{\prime}=\left(V,(W^{\prime}_{i})_{i\in I},(\phi_{i})_{i\in I}\right)$ where $I=[6]$ and $W^{\prime}_{i}=\mathbb{F}_{p}$ for each $i$ , as in Lemma 3.11.

There are many roughly equally cryptic ways to phrase this rigorously. Geometrically, what has happened is that we have embedded our projective plane $\mathbb{P}^{2}=\mathbb{P}(V^{\ast})$ in a three-dimensional space $\mathbb{P}^{3}=\mathbb{P}((V\oplus\mathbb{F}_{p})^{\ast})$ , and each of $\psi_{1},\dots,\psi_{4}$ corresponds to the respective point $[\phi_{1}],\dots,[\phi_{4}]$ in the embedded copy of $\mathbb{P}^{2}$ . Meanwhile, $\psi_{5}$ and $\psi_{6}$ correspond to lines in $\mathbb{P}^{3}$ whose intersections with the embedded $\mathbb{P}^{2}$ are $[\phi_{5}],[\phi_{6}]$ , and whose canonical projections onto the embedded $\mathbb{P}^{2}$ are both (contained in, but secretly equal to) the line $\ell$ .

In the case $[\phi_{5}]\neq[\phi_{6}]$ , we don’t really need this extra dimension but it does no harm, as the following lemma will show. When $[\phi_{5}]=[\phi_{6}]$ , the situation has genuinely changed because the augmented datum retains information about $\ell$ whereas the standard one would not.

Lemma 4.6.

Let $\phi_{1},\dots,\phi_{6}\colon V\to\mathbb{F}_{p}$ be linear forms as in Definition 4.5, and suppose further that $[\phi_{5}]\neq[\phi_{6}]$ and $\ell=\operatorname{span}(\phi_{5},\phi_{6})$ . Let $\Psi_{1}=(V\oplus\mathbb{F}_{p},(W_{i})_{i\in I},(\psi_{i})_{i\in I})$ be an augmented datum representing $(\phi_{1},\dots,\phi_{6};\ell)$ and let $\Psi_{2}=(V,(W^{\prime}_{i})_{i\in I},(\phi_{i})_{i\in I})$ be the standard datum representing $(\phi_{1},\dots,\phi_{6})$ . Then $\Psi_{1}$ dominates $\Psi_{2}$ respecting $(j,j)$ and $\Psi_{2}$ dominates $\Psi_{1}$ respecting $(j,j)$ , for any $1\leq j\leq 4$ , and with exponent $1$ in each case.

Proof.

Both directions are by $\operatorname{TRIVIAL}$ steps (i.e. Proposition 3.4). First we show that $\Psi_{1}$ dominates $\Psi_{2}$ trivially (respecting $1\leq j\leq 4$ ). Indeed, we may consider the surjective maps $\theta\colon V\oplus\mathbb{F}_{p}\to V$ given by $\theta(v,t)=v$ , and $\sigma_{i}\colon W_{i}\to W^{\prime}_{i}$ given by the identity if $1\leq i\leq 4$ and $\sigma_{5}(x,y)=\sigma_{6}(x,y)=x$ . It is immediate from our hypotheses that $\phi_{i}\circ\theta=\sigma_{i}\circ\psi_{i}$ for each $i$ , and the claim follows.

To show $\Psi_{2}$ dominates $\Psi_{1}$ trivially (respecting $1\leq j\leq 4$ ), we consider an injective map $\imath\colon V\to V\oplus\mathbb{F}_{p}$ given by $\imath(v)=(v,\mu(v))$ for some $\mu\in V^{\ast}$ which will have to be chosen carefully, together with $\tau_{i}\colon W^{\prime}_{i}\to W_{i}$ given by the identity if $1\leq i\leq 4$ and to be specified when $i=5,6$ . It suffices to show that $\psi_{i}\circ\imath=\tau_{i}\circ\phi_{i}$ for each $1\leq i\leq 6$ , under suitable choices. Note that this is already immediate for $1\leq i\leq 4$ , given our hypotheses. For $i=5,6$ , we need precisely that

\[ \tau_{i}(\phi_{i}(v))=\big(\phi_{i}(v),\,\mu(v)+\chi_{i}(v)\big) \]

for any $v\in V$ . If $\mu+\chi_{i}\in\operatorname{span}(\phi_{i})$ for $i=5,6$ as elements of $V^{\ast}$ , so $\mu+\chi_{i}=\gamma_{i}\phi_{i}$ for some $\gamma_{5},\gamma_{6}\in\mathbb{F}_{p}$ , we could define $\tau_{i}(x)=(x,\gamma_{i}x)$ and the equation would be satisfied.

We check that this holds for an appropriate choice of $\mu$ . Because $[\phi_{5}]\neq[\phi_{6}]$ , $\phi_{5},\phi_{6}$ is a basis for $\ell$ and so we may write

\[ \chi_{5}-\chi_{6}=\alpha\phi_{5}+\beta\phi_{6} \]

for some $\alpha,\beta\in\mathbb{F}_{p}$ . Define $\mu=\alpha\phi_{5}-\chi_{5}=-\beta\phi_{6}-\chi_{6}$ ; hence, $\chi_{5}+\mu\in\operatorname{span}(\phi_{5})$ and $\chi_{6}+\mu\in\operatorname{span}(\phi_{6})$ as required. ∎
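The choice of $\mu$ can be made concrete as follows. This is a minimal sketch under stated assumptions: elements of $\ell$ are stored as coefficient pairs $(a,b)$ standing for $a\phi_{5}+b\phi_{6}$ (valid because $\phi_{5},\phi_{6}$ is a basis of $\ell$), the prime $p=7$ is arbitrary, and the function name `choose_mu` is hypothetical. It computes $\alpha,\beta$ from $\chi_{5}-\chi_{6}=\alpha\phi_{5}+\beta\phi_{6}$, forms $\mu$ both ways, and returns the scalars $\gamma_{5},\gamma_{6}$ with $\mu+\chi_{i}=\gamma_{i}\phi_{i}$.

```python
P = 7  # demo prime (an assumption)

def choose_mu(chi5, chi6):
    """chi5, chi6: pairs (a, b) meaning a*phi5 + b*phi6 in ell.

    Returns (mu, gamma5, gamma6) with mu + chi_i = gamma_i * phi_i.
    """
    alpha = (chi5[0] - chi6[0]) % P   # phi5-coefficient of chi5 - chi6
    beta = (chi5[1] - chi6[1]) % P    # phi6-coefficient of chi5 - chi6
    mu = ((alpha - chi5[0]) % P, (-chi5[1]) % P)          # alpha*phi5 - chi5
    mu_alt = ((-chi6[0]) % P, (-beta - chi6[1]) % P)      # -beta*phi6 - chi6
    if mu != mu_alt:
        raise AssertionError("the two expressions for mu should agree")
    gamma5 = (mu[0] + chi5[0]) % P    # mu + chi5 = gamma5 * phi5
    gamma6 = (mu[1] + chi6[1]) % P    # mu + chi6 = gamma6 * phi6
    return mu, gamma5, gamma6
```

In these coordinates one finds $\gamma_{5}=\alpha$ and $\gamma_{6}=-\beta$, so the maps $\tau_{i}(x)=(x,\gamma_{i}x)$ from the proof are completely explicit.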

Again it may be instructive to think about this geometrically. In the first part, we used the canonical embedding of $\mathbb{P}^{2}$ into $\mathbb{P}^{3}$ discussed above to get our morphism of linear data. In the second part, we chose a particular non-standard projection $\mathbb{P}^{3}\to\mathbb{P}^{2}$ that collapses the line corresponding to $\psi_{5}$ onto $[\phi_{5}]$ and that corresponding to $\psi_{6}$ onto $[\phi_{6}]$ .

Finally, we can state a lemma which is the workhorse of the whole argument.

Lemma 4.7.

Let $\phi_{1},\dots,\phi_{6}$ and $\ell$ be as in Definition 4.5, and let $\Psi$ be an augmented datum representing $(\phi_{1},\dots,\phi_{6};\ell)$ . Also, let $\phi^{\prime}_{5},\phi^{\prime}_{6}\colon V\to\mathbb{F}_{p}$ be linear forms such that

[TABLE]

Then there exists an augmented datum $\Psi^{\prime}$ representing $(\phi_{1},\dots,\phi_{4},\phi^{\prime}_{5},\phi^{\prime}_{6};\ell)$ , such that $\Psi^{\prime}$ dominates $\Psi$ respecting $(j,j)$ for any $1\leq j\leq 4$ , and with exponent $1/16$ .

The argument has roughly two phases. In the first phase, our goal is to build a datum corresponding to the following graph of vector spaces over $V\oplus\mathbb{F}_{p}$ :

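[Diagram: a graph of vector spaces with seven vertices. One vertex is labelled $A$ ; the other six carry the annotations $12\mathbin{/\mkern-6.0mu/}34$ , $2\mathbin{/\mkern-6.0mu/}1$ , $12\mathbin{/\mkern-6.0mu/}34$ , $56\mathbin{/\mkern-6.0mu/}12$ , $15\mathbin{/\mkern-6.0mu/}26$ and $56\mathbin{/\mkern-6.0mu/}12$ . The eight edges are labelled $6$ , $6$ , $5$ , $5$ , $3$ , $3$ , $4$ , $4$ .]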

Here the numbers next to the vertices denote two classes corresponding to those indices that get merged into $\psi^{\prime}_{5}$ and those that get merged into $\psi^{\prime}_{6}$ respectively. The indices at vertex $A$ will turn into $\psi^{\prime}_{1},\dots,\psi^{\prime}_{4}$ .

It is not possible to construct this graph directly using $\operatorname{CS}$ steps, so we have to build a larger graph using $\operatorname{CS}$ steps and then prune it back using $\operatorname{MERGE}$ and $\operatorname{TRIVIAL}$ steps.

In the second phase, we need to apply a carefully chosen $\operatorname{TRIVIAL}$ operation to reduce this to a system $\Psi^{\prime}$ defined on a single copy of $V\oplus\mathbb{F}_{p}$ .

Proof of Lemma 4.7.

We abbreviate $V\oplus\mathbb{F}_{p}$ to $V^{\prime}$ . Beginning with the augmented datum $\Psi=\left(V^{\prime},(W_{i})_{i\in[6]},(\psi_{i})_{i\in[6]}\right)$ , we first apply $\operatorname{CS}_{6}$ :

[Diagram: two vertices $0$ and $1$ joined by an edge labelled $6$ .]

and then $\operatorname{MERGE}_{\{5_{0},5_{1}\}\mapsto R}$ followed by $\operatorname{CS}_{R}$ to get:

[Diagram: a square on the vertices $00$ , $10$ , $01$ , $11$ , with two edges labelled $6$ and two edges labelled $5$ .]

Now we do $\operatorname{CS}_{3_{01}}$ to get:

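[Diagram: eight vertices forming two squares, one on $000$ , $100$ , $010$ , $110$ and one on $001$ , $101$ , $011$ , $111$ , each with two edges labelled $6$ and two labelled $5$ , joined by a single edge labelled $3$ .]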

followed by $\operatorname{MERGE}_{\{4_{010},4_{011}\}\mapsto R}$ and then $\operatorname{CS}_{R}$ to get:

[Diagram: sixteen vertices indexed by $\{0,1\}^{4}$ , forming four squares (each with two edges labelled $6$ and two labelled $5$ ), together with two edges labelled $3$ and two edges labelled $4$ joining the squares.]

Note that in these diagrams, the set of indices of the corresponding linear data are $i_{\omega}$ where $i\in\{1,\dots,6\}$ , $\omega\in\{0,1\}^{k}$ and there is no edge labelled $i$ incident to vertex $\omega$ . In other words, there is a surviving linear form attached to each vertex $\omega$ (which has not been Cauchy–Schwarzed away) for each $i\in\{1,\dots,6\}$ which is not a vertex label at $\omega$ .

Denote this last datum by $\Psi_{1}=\left(\mathcal{V},\big{(}W_{i}^{(1)}\big{)}_{i\in I_{1}},\big{(}\psi_{i}^{(1)}\big{)}_{i\in I_{1}}\right)$ . Explicitly: $\mathcal{V}$ is the subspace of $V^{\prime\{0,1\}^{4}}$ determined by the above graph of vector spaces; the index set is

[TABLE]

the vector spaces $W^{(1)}_{i}$ for $i\in I_{1}$ are all just $\mathbb{F}_{p}$ ; and $\psi^{(1)}_{i}\colon\mathcal{V}\to W^{(1)}_{i}$ are given by

[TABLE]

Our next task is to prune back all of the $(5,6)$ squares apart from the bottom right one using $\operatorname{MERGE}$ and $\operatorname{TRIVIAL}$ steps. We will need the following standard linear algebra fact.

Lemma 4.8.

Let $L,L_{1},L_{2}$ be vector spaces and $s_{i}\colon L\to L_{i}$ for $i=1,2$ be linear maps. Then there exist maps $\mathfrak{s}_{1},\mathfrak{s}_{2}\colon L\to L$ such that

•

$s_{i}\circ\mathfrak{s}_{i}=s_{i}$ for $i=1,2$ ;

•

$\ker\mathfrak{s}_{i}=\ker s_{i}$ for $i=1,2$ ; and

•

the maps $\mathfrak{s}_{1}$ , $\mathfrak{s}_{2}$ commute.

Proof.

Pick a basis for $\ker s_{1}\cap\ker s_{2}$ , and extend it separately to a basis for $\ker s_{1}$ and $\ker s_{2}$ ; merging these gives a basis for $\ker s_{1}+\ker s_{2}$ . Finally extend this to a basis for $L$ . This gives a direct sum decomposition $L=K_{00}\oplus K_{01}\oplus K_{10}\oplus K_{11}$ where $\ker s_{1}=K_{00}+K_{01}$ and $\ker s_{2}=K_{00}+K_{10}$ . Let $\mathfrak{s}_{1}(x_{00},x_{01},x_{10},x_{11})=(0,0,x_{10},x_{11})$ and $\mathfrak{s}_{2}(x_{00},x_{01},x_{10},x_{11})=(0,x_{01},0,x_{11})$ . It is clear these maps have the desired properties. ∎
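A small hand-worked instance may help to see the construction in action. The example below is an illustration chosen for this purpose, not taken from the paper: $L=\mathbb{F}_{5}^{3}$ with $s_{1}(x,y,z)=x+y$ and $s_{2}(x,y,z)=y+z$. Following the proof, $K_{00}=\operatorname{span}((1,-1,1))$ , $K_{01}=\operatorname{span}((0,0,1))$ , $K_{10}=\operatorname{span}((1,0,0))$ and $K_{11}=0$ , which leads to $\mathfrak{s}_{1}(x,y,z)=(x+y,0,0)$ and $\mathfrak{s}_{2}(x,y,z)=(0,0,y+z)$ . The Python sketch checks the three properties by brute force; all function names are hypothetical.

```python
from itertools import product

P = 5  # demo prime (an assumption)

def s1(v):
    x, y, z = v
    return (x + y) % P

def s2(v):
    x, y, z = v
    return (y + z) % P

def frak_s1(v):
    """Kills K00 + K01 = ker s1 and fixes K10 = span((1, 0, 0))."""
    x, y, z = v
    return ((x + y) % P, 0, 0)

def frak_s2(v):
    """Kills K00 + K10 = ker s2 and fixes K01 = span((0, 0, 1))."""
    x, y, z = v
    return (0, 0, (y + z) % P)

def check_lemma():
    """Brute-force check of the three bullet points over all of F_5^3."""
    zero = (0, 0, 0)
    for v in product(range(P), repeat=3):
        if s1(frak_s1(v)) != s1(v) or s2(frak_s2(v)) != s2(v):
            return False
        if (frak_s1(v) == zero) != (s1(v) == 0):
            return False
        if (frak_s2(v) == zero) != (s2(v) == 0):
            return False
        if frak_s1(frak_s2(v)) != frak_s2(frak_s1(v)):
            return False
    return True
```

In this instance both composites $\mathfrak{s}_{1}\mathfrak{s}_{2}$ and $\mathfrak{s}_{2}\mathfrak{s}_{1}$ are zero because $K_{11}=0$ ; in general they agree with the projection onto $K_{11}$ .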

Consider e.g. the bottom left square consisting of ${0010}$ , ${0110}$ , ${1010}$ , ${1110}$ . We now merge the indices $i_{0010}$ for $1\leq i\leq 4$ into a single index $R$ , and all eight indices $i_{1010}$ , $i_{1110}$ for $1\leq i\leq 4$ into a single index $S$ ; that is, we apply

[TABLE]

to $\Psi_{1}$ to obtain the merged datum $\Psi_{2}=\left(\mathcal{V},\big{(}W^{(2)}_{i}\big{)}_{i\in I_{2}},\big{(}\psi^{(2)}_{i}\big{)}_{i\in I_{2}}\right)$ .

Let $\mathcal{V}^{\prime}$ denote the vector space associated to the following graph of vector spaces on $V^{\prime}$ :

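[Diagram: the 13-vertex graph obtained from the 16-vertex graph above by replacing the square on $0010$ , $1010$ , $0110$ , $1110$ with the single vertex $0110$ . The three remaining squares, on $0000$ , $1000$ , $0100$ , $1100$ , on $0001$ , $1001$ , $0101$ , $1101$ and on $0011$ , $1011$ , $0111$ , $1111$ , each have two edges labelled $6$ and two labelled $5$ ; the connecting edges are labelled $3$ , $3$ , $4$ , $4$ .]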

and define a datum $\Psi_{3}=\left(\mathcal{V}^{\prime},\big{(}W_{i}^{(3)}\big{)}_{i\in I_{3}},\big{(}\psi^{(3)}_{i}\big{)}_{i\in I_{3}}\right)$ where:

•

$I_{3}$ is the same as $I_{2}$ ;

•

$W^{(3)}_{i}=W^{(2)}_{i}=\mathbb{F}_{p}$ and $\psi^{(3)}_{i}=\psi^{(2)}_{i}=\psi^{(1)}_{i}$ for every $i\neq R,S$ ;

•

$W_{R}=W_{S}=\mathbb{F}_{p}^{2}$ and

[TABLE]

(It is not very important, but these are all surjective, as can be seen by considering the image of the diagonal embedding $V^{\prime}\to\mathcal{V}^{\prime}$ .)

We claim that $\Psi_{2}$ is dominated trivially by the “pruned” datum $\Psi_{3}$ . To justify this, we first apply Lemma 4.8 to $V^{\prime}$ , $\psi_{5}$ and $\psi_{6}$ to obtain maps $\mathfrak{s}_{1},\mathfrak{s}_{2}\colon V^{\prime}\to V^{\prime}$ . We can then define an injection $\imath\colon\mathcal{V}^{\prime}\to\mathcal{V}$ by

[TABLE]

For this to make sense, we need the compatibility conditions associated to the graph for $\mathcal{V}$ to hold. In particular we need

[TABLE]

and indeed these follow from the properties of $\mathfrak{s}_{1}$ and $\mathfrak{s}_{2}$ . The remaining compatibility conditions are inherited from $\mathcal{V}^{\prime}$ .

We now give the $\operatorname{TRIVIAL}$ step explicitly. For $i\neq R,S$ the map $\sigma_{i}\colon W^{(3)}_{i}\to W^{(2)}_{i}$ is just the identity. For $R$ and $S$ , consider that

[TABLE]

as the original constituent forms of $\psi^{(2)}_{R}$ depend only on $v^{\prime}_{0010}$ and those of $\psi^{(2)}_{S}$ on $(v^{\prime}_{1010},v^{\prime}_{1110})$ in $\mathcal{V}$ . It follows that there exist unique linear maps $\sigma_{R}\colon W_{R}^{(3)}\to W_{R}^{(2)}$ and $\sigma_{S}\colon W_{S}^{(3)}\to W_{S}^{(2)}$ such that $\psi^{(2)}_{R}\circ\imath=\sigma_{R}\circ\psi_{R}^{(3)}$ and $\psi^{(2)}_{S}\circ\imath=\sigma_{S}\circ\psi_{S}^{(3)}$ , respectively. Hence the conditions of Proposition 3.4 are satisfied and $\Psi_{3}$ dominates $\Psi_{2}$ trivially.

It is natural to relabel $R$ as $5_{0110}$ and $S$ as $6_{0110}$ in $\Psi_{3}$ , as these indices now behave exactly like copies of $\psi_{5}$ and $\psi_{6}$ respectively associated to the vertex $0110$ .

We can summarize the preceding argument, which took us from the $16$ -vertex graph to the $13$ -vertex graph above, informally as follows. On the dual side, we can say that we built a projection map $\imath^{\ast}\colon\mathcal{V}^{\ast}\to\mathcal{V}^{\prime\ast}$ that maps each of $\big{(}\ker\psi^{(2)}_{i_{0010}}\big{)}^{\perp}$ into $\big{(}\ker\psi^{(3)}_{R}\big{)}^{\perp}$ and each of $\big{(}\ker\psi^{(2)}_{i_{1010}}\big{)}^{\perp}$ or $\big{(}\ker\psi^{(2)}_{i_{1110}}\big{)}^{\perp}$ into $\big{(}\ker\psi^{(3)}_{S}\big{)}^{\perp}$ (for $i\in\{1,\dots,4\}$ ). Moreover, $\imath^{\ast}$ was constructed by projecting out the spare coordinates at the vertices $0010$ , $1010$ , $1110$ using $\mathfrak{s}_{1}^{\ast}$ along vertical edges and $\mathfrak{s}_{2}^{\ast}$ along horizontal edges, and we checked this made sense. This is what is meant by the following further annotated diagram (we will see more of this kind below).

[Diagram: the 16-vertex graph above, with the square on $0010$ , $1010$ , $0110$ , $1110$ further annotated: its two edges labelled $6$ carry $\mathfrak{s}_{2}^{\ast}$ , its two edges labelled $5$ carry $\mathfrak{s}_{1}^{\ast}$ , and three of its vertices carry the annotations $1234\mathbin{/\mkern-6.0mu/}\emptyset$ , $\emptyset\mathbin{/\mkern-6.0mu/}1234$ , $\emptyset\mathbin{/\mkern-6.0mu/}1234$ .]

We then repeat these same steps on the top left and top right corners. The result is a datum $\Psi_{4}=\left(\mathcal{V}^{\prime\prime},\big{(}W_{i}^{(4)}\big{)}_{i\in I_{4}},\big{(}\psi^{(4)}_{i}\big{)}_{i\in I_{4}}\right)$ corresponding to the $7$ -vertex configuration

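[Diagram: a graph on the seven vertices $0000$ , $1000$ , $0100$ , $1100$ , $0110$ , $0101$ , $0111$ . The first four form a square with two edges labelled $6$ and two labelled $5$ ; the remaining edges are labelled $3$ , $3$ , $4$ , $4$ .]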

and which dominates $\Psi_{3}$ , and hence $\Psi$ , respecting $(r_{0000},r)$ for $1\leq r\leq 4$ , with exponent $1$ . Explicitly:

•

the index set of $\Psi_{4}$ is

[TABLE]

•

the space $\mathcal{V}^{\prime\prime}$ is that associated to the above graph of vector spaces, and so is a subspace of ${V^{\prime}}^{\mathcal{A}}$ where $\mathcal{A}=\{0000,1000,0100,1100,0101,0110,0111\}$ ;

•

we have $W^{(4)}_{r_{\omega}}=\mathbb{F}_{p}$ when $1\leq r\leq 4$ or $\mathbb{F}_{p}^{2}$ when $r=5,6$ ; and

•

$\psi^{(4)}_{r_{\omega}}((v^{\prime}_{\eta})_{\eta\in\mathcal{A}})=\psi_{r}(v^{\prime}_{\omega})$ for each $r_{\omega}\in I_{4}$ .

This completes the first phase of the argument.

We now perform our remaining $\operatorname{MERGE}$ operation. This partitions all remaining forms apart from those in the $0000$ copy into two classes $A$ and $B$ , by

[TABLE]

and we call the merged datum $\Psi_{5}$ . This corresponds to the annotated diagram discussed above:

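[Diagram: the 7-vertex graph above, with the vertices annotated as follows: $1000$ by $12\mathbin{/\mkern-6.0mu/}34$ , $0100$ by $2\mathbin{/\mkern-6.0mu/}1$ , $1100$ by $12\mathbin{/\mkern-6.0mu/}34$ , $0110$ by $56\mathbin{/\mkern-6.0mu/}12$ , $0101$ by $15\mathbin{/\mkern-6.0mu/}26$ and $0111$ by $56\mathbin{/\mkern-6.0mu/}12$ ; the vertex $0000$ carries no annotation. Edge labels are as before: $6$ , $6$ , $5$ , $5$ , $3$ , $3$ , $4$ , $4$ .]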

and we note the index set is now $\{r_{0000}\colon 1\leq r\leq 4\}\cup\{A,B\}$ .

Finally, we wish to dominate $\Psi_{5}$ trivially by an augmented datum $\Psi^{\prime}$ . Recall that $\Psi^{\prime}$ is as follows:

•

its base space is $V^{\prime}$ ;

•

the index set is $\{1,\dots,6\}$ ;

•

the spaces $W^{\prime}_{i}$ are given by $W^{\prime}_{i}=\mathbb{F}_{p}$ for $1\leq i\leq 4$ and $W^{\prime}_{5},W^{\prime}_{6}=\mathbb{F}_{p}^{2}$ ; and

•

$\psi^{\prime}_{i}=\psi_{i}$ for $1\leq i\leq 4$ , and $\psi^{\prime}_{i}(v,t)=(\phi^{\prime}_{i}(v),t+\chi^{\prime}_{i}(v))$ for $i=5,6$ , where $\phi^{\prime}_{5},\phi^{\prime}_{6}$ are the given forms satisfying $[\phi^{\prime}_{5}]=X^{\prime}_{5}$ , $[\phi^{\prime}_{6}]=X^{\prime}_{6}$ , and $\chi^{\prime}_{5},\chi^{\prime}_{6}\in\ell$ are forms we may choose.

In what follows we identify indices $r_{0000}$ and $r$ for $1\leq r\leq 4$ , $A$ and $5$ , and $B$ and $6$ . Our remaining task is therefore to construct linear maps $\jmath\colon V^{\prime}\to\mathcal{V}^{\prime\prime}$ , $\nu_{5}\colon W^{\prime}_{5}\to W^{(5)}_{5}$ and $\nu_{6}\colon W^{\prime}_{6}\to W^{(5)}_{6}$ which, together with the identity maps $W^{\prime}_{i}\to W^{(5)}_{i}$ for $1\leq i\leq 4$ , satisfy the conditions of Proposition 3.4.

We fix some notation. Again write $X_{1},\dots,X_{6}$ for the points $[\phi_{1}],\dots,[\phi_{6}]$ in $\mathbb{P}(V^{\ast})$ . Let $Y$ , $Z$ , $X^{\prime}_{5}$ , $X^{\prime}_{6}$ be defined as above (see Figure 1); that is, $Y$ is the intersection of the lines $X_{1}X_{5}$ and $X_{3}X_{4}$ , $Z$ is the intersection of the lines $X_{2}X_{6}$ and $X_{3}X_{4}$ , $X^{\prime}_{5}$ is $X_{2}Y\cap\ell$ , and $X^{\prime}_{6}$ is $X_{1}Z\cap\ell$ . Write $H_{i}=\ker(\psi_{i})^{\perp}\subseteq V^{\prime\ast}$ for each $1\leq i\leq 6$ .

Also recall $V^{\prime}=V\oplus\mathbb{F}_{p}$ ; so $V$ is naturally a subspace $\{(v,0)\colon v\in V\}$ of $V^{\prime}$ , and we write $T=\operatorname{span}((0,1))$ for the other summand, so that $V^{\prime}=V\oplus T$ . Dually, we may make an identification $V^{\prime\ast}=V^{\ast}\oplus T^{\ast}$ , and thereby identify $V^{\ast}$ and $T^{\ast}$ with subspaces of $V^{\prime\ast}$ . Let $\xi\in{V^{\prime}}^{\ast}$ be the linear form $\xi(v,t)=t$ , meaning that $\operatorname{span}(\xi)=T^{\ast}$ .

We make a simplifying observation. If $Y=Z$ , then $X^{\prime}_{6}=X_{5}$ and $X^{\prime}_{5}=X_{6}$ , so the effect of the whole block move was just to swap $X_{5}$ and $X_{6}$ . In this case, the result is trivially satisfied by exchanging the indices $5$ and $6$ (and ignoring everything we’ve done up to this point). Hence we can assume $Y\neq Z$ in what follows.

We isolate a linear algebraic lemma which states concretely what is needed for this $\operatorname{TRIVIAL}$ step.

Lemma 4.9.

There exist subspaces $H^{\prime}_{5}$ , $H^{\prime}_{6}$ of $\ell+T^{\ast}$ (which is itself a subspace of $V^{\prime\ast}$ ), and linear maps $\tau_{1},\tau_{2},\tau_{3},\tau_{4}\colon V^{\prime}\to V^{\prime}$ , with the following properties:

(i)

composition with $\tau_{1}$ fixes $\psi_{3}$ and $\psi_{4}$ (i.e., $\psi_{3}\circ\tau_{1}=\psi_{3}$ and $\psi_{4}\circ\tau_{1}=\psi_{4}$ );

(ii)

composition with $\tau_{2}$ fixes $\psi_{5}$ and $\psi_{6}$ (i.e., $\psi_{5}\circ\tau_{2}=\psi_{5}$ and $\psi_{6}\circ\tau_{2}=\psi_{6}$ );

(iii)

similarly, $\psi_{3}\circ\tau_{3}=\psi_{3}$ and $\psi_{6}\circ\tau_{4}=\psi_{6}$ ;

(iv)

$\tau_{2}^{\ast}\tau_{1}^{\ast}(H_{1}+H_{5})\subseteq H^{\prime}_{5}$ and $\tau_{2}^{\ast}\tau_{1}^{\ast}(H_{2}+H_{6})\subseteq H^{\prime}_{6}$ ;

(v)

$\tau_{2}^{\ast}(H_{2})\subseteq H^{\prime}_{5}$ and $\tau_{2}^{\ast}(H_{1})\subseteq H^{\prime}_{6}$ ;

(vi)

$\tau_{2}^{\ast}\tau_{1}^{\ast}\tau_{3}^{\ast}(H_{5}+H_{6})\subseteq H^{\prime}_{5}$ and $\tau_{2}^{\ast}\tau_{1}^{\ast}\tau_{3}^{\ast}(H_{1}+H_{2})\subseteq H^{\prime}_{6}$ ;

(vii)

$\tau_{2}^{\ast}\tau_{4}^{\ast}(H_{1}+H_{2})\subseteq H^{\prime}_{5}$ and $\tau_{2}^{\ast}\tau_{4}^{\ast}(H_{3}+H_{4})\subseteq H^{\prime}_{6}$ ;

(viii)

$H^{\prime}_{5}$ and $H^{\prime}_{6}$ have dimension at most $2$ , and $H^{\prime}_{5}\cap V^{\ast}$ and $H^{\prime}_{6}\cap V^{\ast}$ are contained in the $1$ -dimensional subspaces corresponding to $X^{\prime}_{5}$ , $X^{\prime}_{6}$ respectively.

Indeed, suppose this lemma holds. We may define

[TABLE]

which is perhaps best summarized by further annotating the above diagram as follows:

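[Diagram: the annotated 7-vertex graph above (vertices $0000$ ; $1000$ with $12\mathbin{/\mkern-6.0mu/}34$ ; $0100$ with $2\mathbin{/\mkern-6.0mu/}1$ ; $1100$ with $12\mathbin{/\mkern-6.0mu/}34$ ; $0110$ with $56\mathbin{/\mkern-6.0mu/}12$ ; $0101$ with $15\mathbin{/\mkern-6.0mu/}26$ ; $0111$ with $56\mathbin{/\mkern-6.0mu/}12$ ), now further annotated with the maps $\tau_{2}^{\ast}\circ\tau_{4}^{\ast}$ , $\tau_{4}^{\ast}$ , $\tau_{2}^{\ast}$ , $\operatorname{id}$ alongside the edges labelled $6$ , $6$ , $5$ , $5$ , the map $\tau_{1}^{\ast}\circ\tau_{3}^{\ast}$ alongside the first edge labelled $3$ , and the maps $\tau_{3}^{\ast}$ , $\tau_{1}^{\ast}$ , $\operatorname{id}$ alongside the remaining edges labelled $3$ , $4$ , $4$ .]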

Statements (i)–(iii) ensure that $\jmath$ makes sense, i.e. that all the compatibility conditions in the definition of $\mathcal{V}^{\prime\prime}$ are satisfied. By the fact that $H^{\prime}_{5},H^{\prime}_{6}\subseteq\ell+T^{\ast}$ and statement (viii), for any $\phi^{\prime}_{5},\phi^{\prime}_{6}$ with $[\phi^{\prime}_{5}]=X^{\prime}_{5}$ , $[\phi^{\prime}_{6}]=X^{\prime}_{6}$ we can find $\chi^{\prime}_{5},\chi^{\prime}_{6}\in\ell$ such that $\operatorname{span}(\phi^{\prime}_{5},\xi+\chi^{\prime}_{5})\supseteq H^{\prime}_{5}$ and $\operatorname{span}(\phi^{\prime}_{6},\xi+\chi^{\prime}_{6})\supseteq H^{\prime}_{6}$ ; we use these $\chi^{\prime}_{5}$ , $\chi^{\prime}_{6}$ to complete the definition of $\psi^{\prime}_{5}$ , $\psi^{\prime}_{6}$ and thereby $\Psi^{\prime}$ . Then, statements (iv)–(vii) are precisely what we need to deduce that

[TABLE]

and as before this guarantees that there exist unique maps $\nu_{5}\colon W^{\prime}_{5}\to W^{(5)}_{A}$ and $\nu_{6}\colon W^{\prime}_{6}\to W^{(5)}_{B}$ such that $\psi^{(5)}_{A}\circ\jmath=\nu_{5}\circ\psi^{\prime}_{5}$ and $\psi^{(5)}_{B}\circ\jmath=\nu_{6}\circ\psi^{\prime}_{6}$ respectively.

The proof of this lemma is unpleasant and technical linear algebra, and will occupy the rest of the section.

Proof of Lemma 4.9.

Note that $H_{1},\dots,H_{4}$ are $1$ -dimensional subspaces of $V^{\ast}$ corresponding to $X_{1},\dots,X_{4}$ , i.e. $H_{i}=\operatorname{span}(\phi_{i})$ . Also, $H_{5},H_{6}$ are subspaces of $\ell+T^{\ast}$ of dimension $2$ such that $H_{5}\cap V^{\ast}$ and $H_{6}\cap V^{\ast}$ are the $1$ -dimensional subspaces $\operatorname{span}(\phi_{5})$ , $\operatorname{span}(\phi_{6})$ corresponding to $X_{5},X_{6}$ respectively.

We first construct the map $\tau_{1}\colon V^{\prime}\to V^{\prime}$ . Roughly speaking, $\tau_{1}^{\ast}$ is a projection from $\mathbb{P}(V^{\prime\ast})$ to the line $X_{3}X_{4}$ , which collapses the line $X_{1}X_{5}$ onto $Y$ and the line $X_{2}X_{6}$ onto $Z$ . Specifically, we want the following.

Claim.

We can find $\tau_{1}\colon V^{\prime}\to V^{\prime}$ satisfying the following conditions: $\tau_{1}^{\ast}|_{H_{3}+H_{4}}$ is the identity; and writing $H_{Y}=\tau_{1}^{\ast}(H_{1}+H_{5})$ and $H_{Z}=\tau_{1}^{\ast}(H_{2}+H_{6})$ , then $H_{Y}$ is just the $1$ -dimensional subspace corresponding to $Y$ and $H_{Z}$ the $1$ -dimensional subspace corresponding to $Z$ .

Note that the fact $\tau_{1}^{\ast}|_{H_{3}+H_{4}}=\operatorname{id}$ implies (i) from the statement.

Proof of Claim.

Let $D$ be the intersection point of the lines $X_{1}X_{5}$ and $X_{2}X_{6}$ (which exists as, say, $X_{1}X_{2}X_{5}$ are not collinear). Since we are assuming $Y\neq Z$ , it follows that $D$ does not lie on the line $X_{3}X_{4}$ .

Note $\dim(H_{1}+H_{5})=3$ and $\dim(H_{2}+H_{6})=3$, and that $(H_{1}+H_{5})\cap V^{\ast}$ and $(H_{2}+H_{6})\cap V^{\ast}$ are $2$-dimensional subspaces corresponding to the lines $X_{1}X_{5}$ and $X_{2}X_{6}$ respectively. Writing $W=(H_{1}+H_{5})\cap(H_{2}+H_{6})$, we have $\dim W=2$ (as $H_{1}+H_{5}\neq H_{2}+H_{6}$), and furthermore the intersection $W\cap V^{\ast}$ is precisely the $1$-dimensional subspace corresponding to $D=X_{1}X_{5}\cap X_{2}X_{6}$.

Hence, we may pick some non-zero $y\in W\cap V^{\ast}$ , and extend this to a basis $x,y$ for $W$ ; so $[y]=D$ and necessarily $x\notin V^{\ast}$ .

It follows that $\phi_{3},\phi_{4},x,y$ is a basis for $V^{\prime\ast}$ , and so we may define $\tau_{1}$ by:

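\[ \tau_{1}^{\ast}(\phi_{3})=\phi_{3},\qquad \tau_{1}^{\ast}(\phi_{4})=\phi_{4},\qquad \tau_{1}^{\ast}(x)=0,\qquad \tau_{1}^{\ast}(y)=0 \]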

which immediately implies that $\tau_{1}^{\ast}|_{H_{3}+H_{4}}$ is the identity.

Since $\dim(H_{1}+H_{5})=3$ and $x,y\in H_{1}+H_{5}$ are linearly independent vectors mapped to $0$ by $\tau_{1}^{\ast}$, it follows that $H_{Y}=\tau_{1}^{\ast}(H_{1}+H_{5})$ has dimension at most $1$. Moreover, any vector in the $1$-dimensional subspace corresponding to $Y$ is in $H_{1}+H_{5}$ and also in $H_{3}+H_{4}$, so is fixed by $\tau_{1}^{\ast}$. It follows that $H_{Y}$ contains the $1$-dimensional subspace corresponding to $Y$; so in fact $H_{Y}$ is exactly this subspace.

A parallel argument shows $H_{Z}$ is exactly the $1$-dimensional subspace corresponding to $Z$. ∎

The construction of $\tau_{2}$ is similar, only in reverse, i.e. projecting back onto the subspace $\ell+T^{\ast}$ .

Claim.

We can find a map $\tau_{2}\colon V^{\prime}\to V^{\prime}$ such that the following hold. Define $H^{\prime}_{5}=\tau_{2}^{\ast}(H_{Y}+H_{2})$ and $H^{\prime}_{6}=\tau_{2}^{\ast}(H_{Z}+H_{1})$. Then $H^{\prime}_{5},H^{\prime}_{6}$ are subspaces of $\ell+T^{\ast}$ of dimension at most $2$; and $H^{\prime}_{5}\cap V^{\ast}$, $H^{\prime}_{6}\cap V^{\ast}$ are contained in the $1$-dimensional subspaces corresponding to $X^{\prime}_{5}$, $X^{\prime}_{6}$ respectively.

Note this gives the definition of the subspaces $H^{\prime}_{5}$ and $H^{\prime}_{6}$ from the statement.

Proof of claim.

Let $e_{1},e_{2},e_{3}$ be any basis for $\ell+T^{\ast}$ ; so (say) $\phi_{1},e_{1},e_{2},e_{3}$ is a basis for $V^{\prime\ast}$ . Define $\tau_{2}$ by:

\[ \tau_{2}^{\ast}(\phi_{1})=\xi,\qquad \tau_{2}^{\ast}(e_{i})=e_{i}\quad (i=1,2,3) \]

So, $\tau_{2}^{\ast}$ is a projection onto the subspace $\ell+T^{\ast}$ , and in particular its image is $\ell+T^{\ast}$ . Also, $H_{1},H_{2}$ , $H_{Y},H_{Z}$ all have dimension $1$ , and so the first part of the claim follows.

Suppose $x\in H^{\prime}_{5}\cap V^{\ast}$. Necessarily $x\in\ell$, and $x=\tau_{2}^{\ast}(y)$ for some $y\in H_{Y}+H_{2}$. Since $y\in V^{\ast}$, we may write $y=\alpha\phi_{1}+z$ for some $\alpha\in\mathbb{F}_{p}$ and $z\in\ell$, and then $x=\alpha\xi+z$. But $x\in\ell$, $z\in\ell$ and $\xi\notin\ell$, so $\alpha=0$ and $x=y=z$. We deduce that $x$ lies in $H_{Y}+H_{2}$ (as $y$ does) and in $\ell$; so $x$ lies in the $1$-dimensional subspace $(H_{Y}+H_{2})\cap\ell$ which corresponds precisely to $X^{\prime}_{5}$.

A parallel argument shows that $H^{\prime}_{6}\cap V^{\ast}$ is contained in the subspace corresponding to $X^{\prime}_{6}$ . ∎

At this point properties (ii), (iv), (v) and (viii) from the statement are satisfied, in addition to (i) as discussed. Indeed, (ii) follows from the requirement $\tau_{2}^{\ast}|_{\ell+T^{\ast}}=\operatorname{id}$, since $(\ker\psi_{5})^{\perp},(\ker\psi_{6})^{\perp}\subseteq\ell+T^{\ast}$. The facts $\tau_{1}^{\ast}(H_{1}+H_{5})=H_{Y}$ and $\tau_{2}^{\ast}(H_{Y})\subseteq H^{\prime}_{5}$, and similarly $\tau_{1}^{\ast}(H_{2}+H_{6})=H_{Z}$ and $\tau_{2}^{\ast}(H_{Z})\subseteq H^{\prime}_{6}$, give (iv). Property (v) is immediate from the definition of $H^{\prime}_{5}$, $H^{\prime}_{6}$, and (viii) is contained in the previous claim.

Our final task is to construct $\tau_{3}$ and $\tau_{4}$ . Specifically, we want:

Claim.

There exist maps $\tau_{3}\colon V^{\prime}\to V^{\prime}$ and $\tau_{4}\colon V^{\prime}\to V^{\prime}$ such that

• $\psi_{3}\circ\tau_{3}=\psi_{3}$;
• $\tau_{3}^{\ast}(H_{5}+H_{6})\subseteq H_{1}+H_{5}$;
• $\tau_{3}^{\ast}(H_{1}+H_{2})\subseteq H_{2}+H_{6}$;

and

• $\psi_{6}\circ\tau_{4}=\psi_{6}$;
• $\tau_{4}^{\ast}(H_{1}+H_{2})\subseteq H_{Y}+H_{2}$;
• $\tau_{4}^{\ast}(H_{3}+H_{4})\subseteq H_{Z}+H_{1}$.

Combined with the properties of $\tau_{1}$ and $\tau_{2}$ we have already shown, this suffices for the remaining parts (iii),(vi),(vii) of the statement.

We isolate yet another linear algebra sub-claim.

Lemma 4.10.

Given a tuple $(w,U_{1},U_{2})$ where $U_{1},U_{2}$ are two distinct $2$ -dimensional subspaces of a vector space $W$ of dimension $3$ , and $0\neq w\in W$ is a point not in either subspace; and another such configuration $(w^{\prime},U^{\prime}_{1},U^{\prime}_{2})$ ; there is some isomorphism $\theta\colon W\to W$ mapping $w\mapsto w^{\prime}$ , $U_{1}\mapsto U^{\prime}_{1}$ , $U_{2}\mapsto U^{\prime}_{2}$ .

Proof.

After a change of coordinates, we may assume $U_{1}=\{(x,y,z)\in\mathbb{F}_{p}^{3}\colon x=0\}$ and $U_{2}=\{(x,y,z)\in\mathbb{F}_{p}^{3}\colon y=0\}$ (e.g. by considering two corresponding points in $W^{\ast}$ and extending to a basis). Suppose $w=(\alpha,\beta,\gamma)$ in these coordinates; so $\alpha,\beta\neq 0$ by assumption. By a further rescaling of coordinates we may therefore assume $\alpha=\beta=1$ . Finally, a change of variables $z^{\prime}=z-\gamma x$ means $w=(1,1,0)$ and $U_{1},U_{2}$ are unchanged.

Repeating this argument for $(w^{\prime},U^{\prime}_{1},U^{\prime}_{2})$ and interpreting the changes of coordinates as an isomorphism gives the result. ∎
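The normalization in this proof can be carried out explicitly. The following is a minimal sketch, assuming $W=\mathbb{F}_{p}^{3}$ and that each $2$-dimensional subspace $U_{i}$ is given as the kernel of a covector `n_i`; the names `normalizer` and `theta` are hypothetical and not taken from the paper. It builds the change of coordinates sending $w$ to $(1,1,0)$, $U_{1}$ to $\{x=0\}$ and $U_{2}$ to $\{y=0\}$, and composes two such changes to obtain $\theta$.

```python
# Sketch of Lemma 4.10 over F_p^3. Subspaces are encoded by normal covectors
# (an assumption made for this illustration): U_i = {u : n_i . u = 0}.

def dot(u, v, p):
    return sum(x * y for x, y in zip(u, v)) % p

def det3(m, p):
    (a, b, c), (d, e, f), (g, h, i) = m
    return (a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) % p

def inv3(m, p):
    # Inverse via the adjugate; pow(x, -1, p) needs Python 3.8+.
    (a, b, c), (d, e, f), (g, h, i) = m
    dinv = pow(det3(m, p), -1, p)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[(x * dinv) % p for x in row] for row in adj]

def matmul(a, b, p):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) % p for j in range(3)]
            for i in range(3)]

def matvec(m, v, p):
    return [dot(row, v, p) for row in m]

def normalizer(w, n1, n2, p):
    """Matrix M with M w = (1, 1, 0), M(ker n1) = {x = 0}, M(ker n2) = {y = 0}."""
    # Rescale so that the first two coordinates of M w equal 1 (alpha = beta = 1).
    r1 = [x * pow(dot(n1, w, p), -1, p) % p for x in n1]
    r2 = [x * pow(dot(n2, w, p), -1, p) % p for x in n2]
    for k in range(3):
        e = [int(j == k) for j in range(3)]
        if det3([r1, r2, e], p) != 0:
            # Shift the third coordinate so that it vanishes on w.
            r3 = [(e[j] - w[k] * r1[j]) % p for j in range(3)]
            return [r1, r2, r3]
    raise ValueError("n1 and n2 must be linearly independent")

def theta(config, config_prime, p):
    """Isomorphism taking (w, U1, U2) to (w', U1', U2')."""
    return matmul(inv3(normalizer(*config_prime, p), p), normalizer(*config, p), p)

if __name__ == "__main__":
    p = 7
    source = ((1, 2, 3), (1, 0, 0), (0, 1, 1))   # (w, n1, n2)
    target = ((2, 1, 1), (1, 1, 0), (1, 0, 3))   # (w', n1', n2')
    T = theta(source, target, p)
    print(T, matvec(T, source[0], p))
```

If $w$ lies in one of the subspaces, `pow(..., -1, p)` raises a `ValueError`, which mirrors the hypothesis $\alpha,\beta\neq 0$ in the proof.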

Proof of claim.

Let $\theta_{3}\colon V^{\ast}\to V^{\ast}$ be the map given by the lemma applied to the tuples $(\phi_{3},\ell,H_{1}+H_{2})$ and $(\phi_{3},(H_{1}+H_{5})\cap V^{\ast},(H_{2}+H_{6})\cap V^{\ast})$ in $V^{\ast}$ . Note that $X_{3}$ is not on $\ell$ , $X_{1}X_{2}$ , $X_{1}X_{5}$ or $X_{2}X_{6}$ by our assumptions, and $X_{1}X_{2}\neq\ell$ , $X_{1}X_{5}\neq X_{2}X_{6}$ , so the hypotheses are satisfied. Then let $\tau_{3}^{\ast}(v,t)=(\theta_{3}(v),0)$ . It follows that this has the properties claimed.

Now let $\theta_{4}\colon V^{\ast}\to V^{\ast}$ be the map given by the lemma applied to $(\phi_{6},H_{1}+H_{2},H_{3}+H_{4})$ and $(\phi_{6},H_{Y}+H_{2},H_{Z}+H_{1})$ . Again, $X_{6}$ does not lie on $X_{1}X_{2}$ or $X_{3}X_{4}$ , or on $YX_{2}$ or $ZX_{1}$ (as the latter two imply respectively that $Y=Z$ or $X_{1}X_{2}X_{6}$ are collinear); and $X_{1}X_{2}\neq X_{3}X_{4}$ and $YX_{2}\neq ZX_{1}$ (for many reasons); so this is valid.

Now let $\tau_{4}^{\ast}$ be the unique map defined by $\tau_{4}^{\ast}(v)=\theta_{4}(v)$ for any $v\in V^{\ast}$ and $\tau_{4}^{\ast}(\chi_{6},1)=(\chi_{6},1)$ , where $\chi_{6}$ is the linear form from the definition of $\Psi$ , as in Definition 4.5. (This is possible: choose a basis for $V^{\ast}$ and extend it to a basis for $V^{\prime\ast}$ by adding $(\chi_{6},1)$ ; then define $\tau_{4}^{\ast}$ suitably on this basis.) Again, this gives the desired properties. ∎

This concludes the proof of Lemma 4.9, and thereby that of Lemma 4.7. ∎

∎

This is the last ingredient in the proof of Theorem 1.5. We briefly summarize the proof as a whole, as the different parts have been spread over the last few sections.

First one calculates the point $[r:s]\in\mathbb{P}^{1}(\mathbb{F}_{p})$ , given explicitly in Lemma 4.3, corresponding to the point $X_{5}$ on $\ell$ in our chosen coordinates.

Next, we convert the standard datum given into an augmented datum (by Lemma 4.6).

In the main part of the argument, we apply Lemma 4.7 repeatedly, under various permutations of $\{1,\dots,4\}$, following the steps from Lemma 4.2 applied to the point $[r:s]$.

If at any point we arrive at a datum where $X_{i}X_{j}X_{k}$ are collinear for some $1\leq i<j\leq 4$ and $k=5,6$, we terminate this process early; but if it runs to completion, some such collinearity is guaranteed at the end. By Lemma 4.6 again (and Lemma 4.4) we dominate this by the corresponding standard datum.

Finally, we apply Proposition 2.3, or the standard Cauchy–Schwarz complexity bound (Proposition 1.1), to control this final datum by $\|f_{1}\|_{U^{2}}^{1/2}$ or $\|f_{1}\|_{U^{2}}$ respectively. By keeping track of the various domination statements, and noting in particular that we did not apply Lemma 4.7 too many times, we deduce the required bound on the original linear datum.

5 A proof of Theorem 1.7

Here we describe the construction of the counterexample described in Theorem 1.7.

As in the statement, let $p\equiv\pm 1\pmod{8}$ be a large prime. The congruence condition ensures that $2$ is a quadratic residue modulo $p$ . In what follows, we will assume that some choice of square root of $2$ in $\mathbb{F}_{p}$ has been fixed, and refer to it simply as $\sqrt{2}$ .
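The congruence condition can be checked numerically. The sketch below uses Euler's criterion and a brute-force search for a square root of $2$; the helper names are illustrative only and do not come from the paper.

```python
def is_prime(n):
    """Trial division; adequate for the small primes used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def two_is_residue(p):
    """Euler's criterion: for an odd prime p, 2 is a square mod p
    exactly when 2^((p-1)/2) is congruent to 1 mod p."""
    return pow(2, (p - 1) // 2, p) == 1

def sqrt_two(p):
    """Smallest r in [0, p) with r*r = 2 in F_p, or None if there is none."""
    for r in range(p):
        if (r * r - 2) % p == 0:
            return r
    return None

if __name__ == "__main__":
    for p in [q for q in range(3, 60) if is_prime(q)]:
        # Columns: p, p mod 8, whether 2 is a residue, a square root if any.
        print(p, p % 8, two_is_residue(p), sqrt_two(p))
```

Either square root of $2$ may be fixed as $\sqrt{2}$; the code simply returns the smaller one.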

We let $X\subseteq\mathbb{F}_{p}$ denote the two-dimensional arithmetic progression:

\[ X=\left\{a+b\sqrt{2}\colon a,b\in\mathbb{Z},\ |a|,|b|\leq\alpha p^{1/2}\right\} \]

for some small absolute constant $\alpha>0$ to be specified.

We note that any value $x\in\mathbb{F}_{p}$ has at most one representation as $x=a+b\sqrt{2}$ where $a,b$ are integers with $|a|,|b|\leq p^{1/2}/4$ . Indeed, if $x=a+b\sqrt{2}=a^{\prime}+b^{\prime}\sqrt{2}$ then $(a-a^{\prime})^{2}-2(b-b^{\prime})^{2}$ is a multiple of $p$ ; but it has absolute value at most $\max((a-a^{\prime})^{2},2(b-b^{\prime})^{2})\leq 2(p^{1/2}/2)^{2}<p$ . Hence, $(a-a^{\prime})^{2}=2(b-b^{\prime})^{2}$ which is a contradiction unless $a=a^{\prime}$ , $b=b^{\prime}$ .
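This uniqueness statement is easy to test by brute force for small primes. The following sketch enumerates all pairs $(a,b)$ with $|a|,|b|\leq p^{1/2}/4$ and groups them by the value $a+b\sqrt{2}\in\mathbb{F}_{p}$; the function names are hypothetical, and the square root of $2$ is found by exhaustive search rather than by any method from the paper.

```python
from math import isqrt

def sqrt_two(p):
    """Smallest square root of 2 in F_p, found by exhaustive search."""
    for r in range(p):
        if (r * r - 2) % p == 0:
            return r
    raise ValueError("2 is not a square modulo p")

def representations(p):
    """Map each value a + b*sqrt(2) in F_p to the pairs (a, b) producing it,
    where a, b range over integers with |a|, |b| <= sqrt(p)/4."""
    r = sqrt_two(p)
    bound = isqrt(p // 16)  # largest integer m with 16*m*m <= p
    reps = {}
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            reps.setdefault((a + b * r) % p, []).append((a, b))
    return reps

def is_unique(p):
    """True if no value of F_p has two representations in the box."""
    return all(len(pairs) == 1 for pairs in representations(p).values())

if __name__ == "__main__":
    for p in (17, 41, 97, 257, 1009):
        print(p, len(representations(p)), is_unique(p))
```

The check is only an illustration for specific primes; the argument in the text is what establishes uniqueness in general.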

Now, define the system $\Phi$ of linear forms $\phi_{1},\dots,\phi_{6}\colon\mathbb{F}_{p}^{3}\to\mathbb{F}_{p}$ by:

[TABLE]

We claim that these forms do not lie on a conic, i.e. the system has true complexity $1$ . Since $p\neq 2$ , we need not distinguish between symmetric bilinear forms and quadratic forms, so it makes sense to write

[TABLE]

and then it suffices to verify that the matrix (whose columns correspond to $x^{2},y^{2},z^{2},xy,xz,yz$ )

[TABLE]

is non-singular. In fact, the determinant of this matrix is $512\sqrt{2}$ , and so $\phi_{1},\dots,\phi_{6}$ do not lie on a conic.

Our reason for choosing these specific forms is that they nonetheless satisfy a kind of “skew conic” identity which we now describe. Suppose we define forms $\widetilde{\phi_{1}},\dots,\widetilde{\phi_{6}}\colon\mathbb{Z}[\sqrt{2}]^{3}\to\mathbb{Z}[\sqrt{2}]$ using the same coefficients as above, now thought of as elements of the ring $\mathbb{Z}[\sqrt{2}]$ . These also induce forms $\mathbb{Q}(\sqrt{2})^{3}\to\mathbb{Q}(\sqrt{2})$ in the obvious way. Write $\sigma(a+b\sqrt{2})=a-b\sqrt{2}$ for the Galois conjugate and $N(z)=z\sigma(z)$ for the norm of $z\in\mathbb{Q}(\sqrt{2})$ (so $N(a+b\sqrt{2})=a^{2}-2b^{2}$ ). Then the “skew conic” identity is the fact that:

[TABLE]

for any $x,y,z\in\mathbb{Q}(\sqrt{2})$ .

So, we can define a function

[TABLE]

for some parameter $R\in\mathbb{R}/\mathbb{Z}$ to be specified; so $f$ is supported on $X$ . We claim, for say $\alpha=1/49$ , that for every $x,y,z\in\mathbb{F}_{p}$ the expression

[TABLE]

takes value either $1$, if all six forms take values in $X$, or $0$ otherwise. This is essentially a kind of Freĭman isomorphism argument over $\mathbb{Z}[\sqrt{2}]$.

Indeed, note that tuples $(\phi_{1}(x,y,z),\dots,\phi_{6}(x,y,z))$ or $\big{(}\widetilde{\phi_{1}}(x,y,z),\dots,\widetilde{\phi_{6}}(x,y,z)\big{)}$ for $x,y,z\in\mathbb{F}_{p}$ or $x,y,z\in\mathbb{Q}(\sqrt{2})$ respectively are precisely those tuples $(r_{1},\dots,r_{6})$ satisfying the equations

[TABLE]

So, if $\phi_{1}(x,y,z),\dots,\phi_{6}(x,y,z)$ are all in $X$, write $\phi_{i}(x,y,z)=a_{i}+b_{i}\sqrt{2}$ for the unique integers $a_{i},b_{i}$ with $|a_{i}|,|b_{i}|\leq\alpha p^{1/2}$, and let $r_{i}=a_{i}+b_{i}\sqrt{2}$ be the corresponding elements of $\mathbb{Z}[\sqrt{2}]$. Then each of the left hand sides of the equations above has the form $u+v\sqrt{2}$ for integers $u,v$ with $|u|,|v|\leq 12\alpha p^{1/2}<p^{1/2}/4$, and is congruent to $0=0+0\sqrt{2}$ modulo $p$, so by our earlier uniqueness argument must be $0$. Hence $(r_{1},\dots,r_{6})=\big{(}\widetilde{\phi_{1}}(x,y,z),\dots,\widetilde{\phi_{6}}(x,y,z)\big{)}$ for some $x,y,z\in\mathbb{Q}(\sqrt{2})$, so (5) applies and the claim follows.

Given that $\phi_{1}(x,y,z),\dots,\phi_{6}(x,y,z)\in X$ for any $x,y,z$ of the form $a+b\sqrt{2}$ for integers $a,b$ with $|a|,|b|\leq p^{1/2}/199$ , we deduce that

[TABLE]

for $p$ sufficiently large, as required.

Finally we need to consider $\|f\|_{U^{2}}$ . This is a fairly standard estimate on quadratic exponential sums, but with some variations. For simplicity we use a mean value strategy.

For $x\in X$ , let $N(x)$ denote $N(a+b\sqrt{2})=a^{2}-2b^{2}$ where $a,b$ are the unique integers in $[-\alpha p^{1/2},\alpha p^{1/2}]$ with $x=a+b\sqrt{2}$ . Then for any $x,h,h^{\prime}\in\mathbb{F}_{p}$ such that $x,x+h,x+h^{\prime},x+h+h^{\prime}$ are in $X$ , we have $h=r+s\sqrt{2}$ , $h^{\prime}=r^{\prime}+s^{\prime}\sqrt{2}$ for some unique integers $r,s,r^{\prime},s^{\prime}\in[-2\alpha p^{1/2},2\alpha p^{1/2}]$ , and

[TABLE]

So, for any such $x,h,h^{\prime}$ , if we now take an average in the parameter $R$ , we get

[TABLE]

It follows that

[TABLE]

and we note that for $r,r^{\prime}$ fixed and any fixed $s^{\prime}\neq 0$ there is at most one solution in $s$ to $rr^{\prime}=2ss^{\prime}$ , so the right hand side is bounded by

[TABLE]

whenever $p$ is sufficiently large. Picking some $R$ for which $\|f\|_{U^{2}}^{4}$ is at most its mean value, the result follows.
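The mean value argument rests on the second difference of the norm form. As a sanity check, and assuming the sign convention $N(x)-N(x+h)-N(x+h^{\prime})+N(x+h+h^{\prime})$, the sketch below verifies on a small box of integers that this quantity equals $2(rr^{\prime}-2ss^{\prime})$ and does not depend on $x$, and that for fixed $r,r^{\prime}$ and $s^{\prime}\neq 0$ the equation $rr^{\prime}=2ss^{\prime}$ has at most one solution $s$. Function names are illustrative.

```python
from itertools import product

def norm(a, b):
    """Norm of a + b*sqrt(2) in Z[sqrt(2)]: N(a + b*sqrt(2)) = a^2 - 2*b^2."""
    return a * a - 2 * b * b

def second_difference(x, h, hp):
    """N(x) - N(x+h) - N(x+h') + N(x+h+h'), with x, h, h' given as integer
    pairs (a, b), (r, s), (r', s') of coefficients with respect to 1, sqrt(2)."""
    (a, b), (r, s), (rp, sp) = x, h, hp
    return (norm(a, b) - norm(a + r, b + s) - norm(a + rp, b + sp)
            + norm(a + r + rp, b + s + sp))

def solutions_in_s(r, rp, sp, bound):
    """All integers s in [-bound, bound] with r*r' = 2*s*s'."""
    return [s for s in range(-bound, bound + 1) if r * rp == 2 * s * sp]

if __name__ == "__main__":
    box = range(-2, 3)
    ok = all(second_difference((a, b), (r, s), (rp, sp)) == 2 * (r * rp - 2 * s * sp)
             for a, b, r, s, rp, sp in product(box, repeat=6))
    print("identity holds on the box:", ok)
    print(solutions_in_s(4, 3, 2, 10))
```

Averaging $e(R\,m)$ over $R\in\mathbb{R}/\mathbb{Z}$ for an integer $m$ gives $1$ if $m=0$ and $0$ otherwise, which is why only the solutions of $rr^{\prime}=2ss^{\prime}$ survive in the mean.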

