Good Bounds in Certain Systems of True Complexity One
Freddie Manners

TL;DR
This paper proves that multilinear averages associated with certain linear systems in finite fields are controlled by the Gowers U^2 norm, with polynomial bounds, strengthening previous results and avoiding inverse Gowers norm theory.
Contribution
It establishes polynomial bounds for multilinear averages in systems with true complexity 1 and Cauchy-Schwarz complexity 2, using only Cauchy-Schwarz inequalities.
Findings
Multilinear averages are controlled by the U^2 norm with polynomial dependence.
The bounds strengthen previous results by Gowers and Wolf.
The dependence of the controlling constant on the system's coefficients is necessary.
Abstract
Let $\Phi = (\phi_1,\dots,\phi_6)$ be a system of $6$ linear forms in $3$ variables, i.e. $\phi_i \colon \mathbb{Z}^3 \to \mathbb{Z}$ for each $i$. Suppose also that $\Phi$ has Cauchy–Schwarz complexity $2$ and true complexity $1$, in the sense defined by Gowers and Wolf; in fact this is true generically in this setting. Finally let $G = \mathbb{F}_p^n$ for any prime $p$ and $n \ge 1$. Then we show that multilinear averages by $\Phi$ are controlled by the $U^2$-norm, with a polynomial dependence; i.e. if $f_1,\dots,f_6$ are functions on $G$ with $\|f_i\|_\infty \le 1$ for each $i$, then for each $j$, $1 \le j \le 6$:
\[ \left| \mathbb{E}_{x_1,x_2,x_3 \in G} f_1(\phi_1(x_1,x_2,x_3)) \cdots f_6(\phi_6(x_1,x_2,x_3)) \right| \le \|f_j\|_{U^2}^{1/C} \]
for some $C > 0$ depending on $\Phi$. This recovers and strengthens a result of Gowers and Wolf in these cases. Moreover, the proof uses only multiple applications of the Cauchy–Schwarz inequality, avoiding appeals to the inverse theory of the Gowers norms.
We also show that some dependence of $C$ on $\Phi$ is necessary; that is, the constant $C$ can unavoidably become large as the coefficients of $\Phi$ grow.
Published 28 December 2018 (2018, number 21; received 27 September 2017). DOI: 10.19086/da.6814.
1 Introduction
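Let $G$ be a finite abelian group and $A \subseteq G$ a subset. Many problems of interest in additive combinatorics and related fields involve counting solutions to some system of equations within $A$. For instance, we might wish to count Schur triples $(x, y, x+y)$ all of whose coordinates lie in $A$, or $k$-term arithmetic progressions where each term lies in $A$, etc.
The most general case of this kind of question is as follows: given a tuple $\Phi = (\phi_1,\dots,\phi_t)$ of linear forms $\phi_i \colon \mathbb{Z}^d \to \mathbb{Z}$, where $t, d \ge 1$, and functions $f_1,\dots,f_t \colon G \to \mathbb{C}$, estimate the quantity
\[ T_\Phi(f_1,\dots,f_t) = \mathbb{E}_{x_1,\dots,x_d \in G} \prod_{i=1}^{t} f_i\big(\phi_i(x_1,\dots,x_d)\big). \]
(Here we have abused notation to let $\phi_i$ induce a function $G^d \to G$, in the obvious way.)
So, in our examples above, we take $f_i = 1_A$, the indicator function of $A$; however it is convenient to allow more general functions $f_i$ in the definition of $T_\Phi$ as they arise in intermediate computations. In our examples, $\Phi$ is as follows:
- in the case of Schur triples, $t = 3$, $d = 2$, $\phi_1(x_1,x_2) = x_1$, $\phi_2(x_1,x_2) = x_2$ and $\phi_3(x_1,x_2) = x_1 + x_2$;
- in the case of $k$-term arithmetic progressions, $t = k$, $d = 2$, and $\phi_i(x_1,x_2) = x_1 + (i-1)x_2$ for $1 \le i \le k$.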
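To make the multilinear average $\mathbb{E}_{x_1,\dots,x_d \in G} \prod_i f_i(\phi_i(x_1,\dots,x_d))$ concrete, here is a brute-force sketch. It assumes $G = \mathbb{Z}/p\mathbb{Z}$ (the case $n = 1$), encodes each form by its integer coefficient vector, and uses hypothetical helper names (`multilinear_average`, `indicator`, `ap_forms`); it is only practical for very small $p$ and $d$.

```python
import itertools


def multilinear_average(forms, funcs, p):
    """Brute-force T_Phi(f_1, ..., f_t) over G = Z/pZ (the case n = 1).

    forms[i] = (c_1, ..., c_d) encodes phi_i(x) = c_1 x_1 + ... + c_d x_d,
    and funcs[i] is the function f_i on {0, ..., p - 1}.
    """
    d = len(forms[0])
    total = 0
    for xs in itertools.product(range(p), repeat=d):
        term = 1
        for coeffs, f in zip(forms, funcs):
            # phi_i induces a map G^d -> G: evaluate, reduce mod p, apply f_i.
            term *= f(sum(c * x for c, x in zip(coeffs, xs)) % p)
        total += term
    return total / p ** d


def indicator(A):
    """Return the indicator function 1_A."""
    return lambda y: 1 if y in A else 0


# Schur triples (x_1, x_2, x_1 + x_2): t = 3, d = 2.
SCHUR = [(1, 0), (0, 1), (1, 1)]


def ap_forms(k):
    """k-term progressions: phi_i(x_1, x_2) = x_1 + (i - 1) x_2."""
    return [(1, i) for i in range(k)]


f = indicator({1, 2, 3})
print(multilinear_average(SCHUR, [f, f, f], 7))
```

For $A = \{1,2,3\} \subseteq \mathbb{Z}/7\mathbb{Z}$ the Schur triples in $A$ are $(1,1,2)$, $(1,2,3)$ and $(2,1,3)$, so the printed value is $3/49$.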
A fundamental observation in much recent progress in such questions (as applied to Szemerédi-type theorems, or counting solutions to linear equations in the primes), originally due to Gowers [3], is that averages of this kind are controlled by Gowers uniformity norms.¹
¹ We will assume the reader is familiar with the definition of Gowers norms and some related concepts; for an introduction, see e.g. [9, Appendix B], [15, Chapter 11], or [14].
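A weak statement of this type is that if $A \subseteq G$ has density $\alpha = |A|/|G|$ and is suitably quasirandom in the sense that $\|1_A - \alpha\|_{U^{s+1}} \le \delta$, where $s$ is some positive integer, then
\[ \mathbb{E}_{x \in G^d} \prod_{i=1}^t 1_A(\phi_i(x)) = \alpha^t + o_{\delta \to 0}(1), \tag{1} \]
i.e. the number of solutions to the system in $A$ is roughly the same as the expected count in a random set of density $\alpha$, i.e. $\alpha^t |G|^d$. A stronger type of statement one could make is that if $f_1,\dots,f_t$ are any functions with $\|f_i\|_\infty \le 1$ for all $i$, and $\|f_j\|_{U^{s+1}} \le \delta$ for any one $j$, then
\[ \Big| \mathbb{E}_{x \in G^d} \prod_{i=1}^t f_i(\phi_i(x)) \Big| = o_{\delta \to 0}(1), \tag{2} \]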
and indeed this kind of statement implies the previous one.
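The remaining question is when one has such a statement for a system of linear forms $\Phi$, and if so, how small the positive integer $s$ can be; i.e. how far one has to go in the hierarchy of Gowers norms to control these averages. For instance, for $k$-term arithmetic progressions, Gowers [3] showed that a statement of type (2) holds for $s = k-2$, and with a good bound; specifically,
\[ \Big| \mathbb{E}_{x,y \in G} f_1(x) f_2(x+y) \cdots f_k\big(x+(k-1)y\big) \Big| \le \|f_j\|_{U^{k-1}} \]
whenever $\|f_i\|_\infty \le 1$ for each $i$, and for any $j$. The proof is $k-1$ applications of the Cauchy–Schwarz inequality.
Moreover, Gowers gave examples to show that $s$ cannot be improved. For instance, when $k = 4$ and $G = \mathbb{F}_p$ with $p \ge 5$, we can consider functions
\[ f_1(x) = e(ax^2/p), \quad f_2(x) = e(-3ax^2/p), \quad f_3(x) = e(3ax^2/p), \quad f_4(x) = e(-ax^2/p), \]
where $e(\theta) = \exp(2\pi i\theta)$, for some nonzero $a \in \mathbb{F}_p$, and observe that $f_1(x) f_2(x+y) f_3(x+2y) f_4(x+3y) = 1$ pointwise for any $x, y$, since $x^2 - 3(x+y)^2 + 3(x+2y)^2 - (x+3y)^2 = 0$. So, the $4$-term progression average of $f_1,\dots,f_4$ equals $1$, but one can show that $\|f_i\|_{U^2} = p^{-1/4}$ for each $i$. This rules out a statement of type (2) for $s = 1$; taking appropriate level sets of these functions rules out (1) also.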
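The quadratic-phase example can be checked numerically. This sketch assumes $G = \mathbb{Z}/p\mathbb{Z}$ with $p = 11$ and $a = 1$, uses the functions $e(ax^2/p)$, $e(-3ax^2/p)$, $e(3ax^2/p)$, $e(-ax^2/p)$, and uses hypothetical helper names. For a phase $e(cx^2/p)$ with $p \nmid 2c$ one has $\|f\|_{U^2}^4 = \mathbb{E}_{h_1,h_2} e(2ch_1h_2/p) = 1/p$, so the code should report an average of $1$ and norms $p^{-1/4} \approx 0.549$.

```python
import cmath
import itertools


def e(theta):
    """e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * cmath.pi * theta)


def quadratic_phase(c, p):
    """The quadratic phase x -> e(c x^2 / p) on Z/pZ."""
    return lambda x: e(c * x * x / p)


def four_ap_average(fs, p):
    """E_{x,y} f_1(x) f_2(x+y) f_3(x+2y) f_4(x+3y) over Z/pZ."""
    total = 0
    for x, y in itertools.product(range(p), repeat=2):
        total += (fs[0](x) * fs[1]((x + y) % p)
                  * fs[2]((x + 2 * y) % p) * fs[3]((x + 3 * y) % p))
    return total / p ** 2


def u2_norm(f, p):
    """||f||_{U^2}, computed directly from the definition (O(p^3) terms)."""
    total = 0
    for x, h1, h2 in itertools.product(range(p), repeat=3):
        total += (f(x) * f((x + h1) % p).conjugate()
                  * f((x + h2) % p).conjugate() * f((x + h1 + h2) % p))
    # The fourth power of the norm is real and non-negative; abs() discards
    # floating-point residue in the imaginary part.
    return abs(total / p ** 3) ** 0.25


p, a = 11, 1
# a x^2 - 3a (x+y)^2 + 3a (x+2y)^2 - a (x+3y)^2 vanishes identically.
fs = [quadratic_phase(c, p) for c in (a, -3 * a, 3 * a, -a)]
print(abs(four_ap_average(fs, p)))
print([round(u2_norm(f, p), 6) for f in fs], p ** -0.25)
```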
The first systematic approach to this question for general systems of linear forms was given by Green and Tao [9] in the course of their work on linear equations in primes. The following is essentially implicit as a much easier case of results from that paper, and was isolated in [4]; however, the terminology we use is slightly different to both.
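Proposition 1.1 (Essentially from [9]).
Given a prime $p$, a system of linear forms $\Phi = (\phi_1,\dots,\phi_t)$ in $d$ variables, and an index $i$, $1 \le i \le t$, we say $\Phi$ has Cauchy–Schwarz complexity $s$ at $i$, modulo $p$, if the following holds: the indices $[t] \setminus \{i\}$ can be partitioned into $s+1$ classes such that $\phi_i$ modulo $p$, considered as a linear form $\mathbb{F}_p^d \to \mathbb{F}_p$, is not contained in the span of $\{\phi_k : k \in \mathcal{C}\}$ for any class $\mathcal{C}$.
Let $G = \mathbb{F}_p^n$, where $p$ and $n$ may be of any size (including, say, $n = 1$). If $\Phi$ has Cauchy–Schwarz complexity $s$ at $i$ modulo $p$, then for any functions $f_1,\dots,f_t \colon G \to \mathbb{C}$ with $\|f_k\|_\infty \le 1$ for each $k$, we have
\[ \Big| \mathbb{E}_{x \in G^d} \prod_{k=1}^t f_k(\phi_k(x)) \Big| \le \|f_i\|_{U^{s+1}}. \]
If $\Phi$ has Cauchy–Schwarz complexity $s$ at every index, modulo $p$, we could just say the system has Cauchy–Schwarz complexity $s$ modulo $p$, and we call the smallest such $s$ the Cauchy–Schwarz complexity of $\Phi$ (which implicitly depends on $p$). If $p$ is very large, taking the span of the $\phi_k$ as linear forms modulo $p$ is essentially equivalent to working over $\mathbb{Q}$, and the value of the Cauchy–Schwarz complexity stabilizes.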
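The linear-algebra condition in Proposition 1.1 is easy to test by brute force for small systems. The sketch below is an illustration, not the paper's code: the helper names are hypothetical, forms are integer coefficient vectors reduced mod $p$, and the search runs over all labelings of the other indices by $s+1$ classes (empty classes allowed), so it is only sensible for small $t$. One expects complexity $1$ for $3$-term progressions, $2$ for $4$-term progressions, and $2$ for six points in general position in the plane, such as $(1, t, t^2)$ for $0 \le t \le 5$ modulo $13$.

```python
import itertools


def rank_mod_p(rows, p):
    """Rank over F_p of a list of integer vectors (Gaussian elimination)."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [v * inv % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(u - factor * v) % p for u, v in zip(m[r], m[rank])]
        rank += 1
    return rank


def in_span(target, vectors, p):
    """Is target in the F_p-span of vectors? (The empty span is {0}.)"""
    return rank_mod_p(vectors + [target], p) == rank_mod_p(vectors, p)


def cs_complexity_at(forms, i, s, p):
    """Can the indices other than i be split into s + 1 classes, none of
    whose spans contains forms[i]?"""
    others = [k for k in range(len(forms)) if k != i]
    for labels in itertools.product(range(s + 1), repeat=len(others)):
        classes = [[forms[k] for k, lab in zip(others, labels) if lab == c]
                   for c in range(s + 1)]
        if not any(in_span(forms[i], cls, p) for cls in classes):
            return True
    return False


def cs_complexity(forms, p, s_max=6):
    """Smallest s such that the system has CS complexity s at every index."""
    for s in range(s_max + 1):
        if all(cs_complexity_at(forms, i, s, p) for i in range(len(forms))):
            return s
    return None


print(cs_complexity([(1, 0), (1, 1), (1, 2), (1, 3)], 7))  # 4-term APs: 2
```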
As the name suggests, the proof of Proposition 1.1 is $s+1$ applications of Cauchy–Schwarz, as in Gowers’ work. The content of the proposition is really in establishing the linear algebra condition that guarantees this Cauchy–Schwarz argument will work.
Following this, Gowers and Wolf, in a series of papers [4, 5, 6, 7], considered the question: is the value of $s$ given by Cauchy–Schwarz complexity optimal? It is natural to try to adapt the examples given by Gowers for $k$-term progressions to the general case to give a lower bound. The task comes down to finding phase polynomials of degree $s$, i.e. functions of the form $f_i = \exp(2\pi i P_i)$ where $P_i$ is a polynomial of degree $s$ (in a natural sense) for each $i$, such that
\[ \prod_{i=1}^t f_i\big(\phi_i(x_1,\dots,x_d)\big) = 1 \quad \text{for all } x_1,\dots,x_d \in G, \]
i.e. the product inside the multilinear average is equal to $1$ pointwise. By contrast $\|f_i\|_{U^s}$ will typically be very small when $f_i$ is a phase polynomial of degree $s$, so this rules out a statement of type (2) or (1) with $s$ replaced by $s-1$, i.e. control by the $U^s$-norm.
It turns out that this is possible if and only if $\phi_1^s,\dots,\phi_t^s$ are linearly dependent, where the $\phi_i^s$ are interpreted as symmetric multilinear forms over $\mathbb{F}_p$. In other words, the average can only be fully controlled by the $U^{s+1}$-norm if $\phi_1^{s+1},\dots,\phi_t^{s+1}$ are linearly independent elements of the space of symmetric $(s+1)$-linear forms.
It also turns out that this lower bound, arising from explicit phase polynomials, and the upper bound coming from Cauchy–Schwarz complexity, do not agree in general. Gowers and Wolf conjectured that the lower bound is the truth; that is, if the true complexity of $\Phi$ over $\mathbb{F}_p$ is defined to be the smallest $s$ such that $\phi_1^{s+1},\dots,\phi_t^{s+1}$ are linearly independent,² then (2) holds for this $s$ and any $j$. By our previous discussion, such a statement would be (qualitatively) best possible in $s$.
² In fact Gowers and Wolf set up the definitions slightly differently: they define true complexity to be the smallest $s$ such that (1) holds, and conjecture it is equal to the algebraic quantity we have just defined. Since this conjecture is known to be true in cases of interest, defining things the other way round should hopefully not cause too much confusion.
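The algebraic definition of true complexity can be tested in the same brute-force way. This sketch is our own illustration, not code from the paper: it works over $\mathbb{F}_p$ and represents $\phi_i^{s+1}$ by the entries of the symmetric tensor $\phi_i^{\otimes(s+1)}$ indexed by multisets of coordinates (we take this as our reading of "symmetric multilinear forms"; no multinomial coefficients appear, so the test also makes sense for small $p$). Helper names are hypothetical.

```python
import itertools
from math import prod


def rank_mod_p(rows, p):
    """Rank over F_p of a list of integer vectors (Gaussian elimination)."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)
        m[rank] = [v * inv % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(u - factor * v) % p for u, v in zip(m[r], m[rank])]
        rank += 1
    return rank


def sym_power(form, k, p):
    """Entries form[j_1] * ... * form[j_k] (mod p) of the symmetric tensor
    form^{(x)k}, one for each multiset j_1 <= ... <= j_k of coordinates."""
    idxs = itertools.combinations_with_replacement(range(len(form)), k)
    return [prod(form[j] for j in idx) % p for idx in idxs]


def true_complexity(forms, p, s_max=10):
    """Smallest s such that phi_1^(s+1), ..., phi_t^(s+1) are independent."""
    for s in range(s_max + 1):
        vecs = [sym_power(f, s + 1, p) for f in forms]
        if rank_mod_p(vecs, p) == len(forms):
            return s
    return None


print(true_complexity([(1, 0), (1, 1), (1, 2), (1, 3)], 7))  # 4-term APs: 2
```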
In what follows, we write to denote this notion of the true complexity of a system of linear forms (over ), unless otherwise stated.
This conjecture has now been resolved in essentially all cases of interest. The original paper [4] by Gowers and Wolf proved the case where the true complexity is $1$, the Cauchy–Schwarz complexity is $2$, and $G = \mathbb{F}_p^n$ for $p$ fixed and $n$ large. This case was proved again in [6] (also by Gowers and Wolf) with an improved quantitative bound. Still when $G = \mathbb{F}_p^n$ for $p$ fixed, but not too small, the general case (i.e. arbitrary finite true complexity and Cauchy–Schwarz complexity) was proven in another paper [5] by the same authors. They also showed the case of true complexity $1$ and Cauchy–Schwarz complexity $2$ for $G = \mathbb{Z}/N\mathbb{Z}$, where this time $N$ is large, in [7]. The general result in the cyclic setting $G = \mathbb{Z}/N\mathbb{Z}$ for $N$ large was shown by Green and Tao [8] as an application of their nilsequence-based arithmetic regularity lemma.
Later, Hatami and Lovett [11] extended the results of [5] to the asymmetric case, where $\phi_1^{s+1},\dots,\phi_t^{s+1}$ may be linearly dependent but not all of these multilinear forms are in the linear span of the others, which corresponds to (2) holding for some choices of $j$ but not others. Finally, Hatami, Hatami and Lovett [10] removed the requirement, in the case $G = \mathbb{F}_p^n$ for $p$ fixed, that $p$ be not too small.
We comment only very briefly on the proofs, as they will not play a large role in the current work. We focus on the simplest case of $G = \mathbb{F}_p^n$ with $p$ fixed, true complexity $1$ and Cauchy–Schwarz complexity $2$. By the assumption on the Cauchy–Schwarz complexity and Proposition 1.1, we are free to discard errors that are small in the $U^3$-norm at any point. By the inverse theorem for the Gowers $U^3$-norm, this means we are free to assume that each $f_i$ is a linear combination of a few phase polynomials of degree at most $2$. We would like to argue that when $\|f_j\|_{U^2}$ is small, the quadratic terms do not contribute much to the average; however, this requires a more robust version of our assumption that $\phi_1^2,\dots,\phi_t^2$ are linearly independent (in effect, we need that no non-trivial linear combination over $\mathbb{F}_p$ has low rank). Bridging this gap between the robust and non-robust statements is the heart of the argument.
At least qualitatively, these works verify all the central conjectures concerning true complexity. Nonetheless, there are some unresolved questions of interest.
Question 1.2.
What are the best possible bounds in the true complexity statement? That is, how small must $\delta$ be in terms of $\varepsilon$ to ensure that $\|f_j\|_{U^{s+1}} \le \delta$ implies $\big|\mathbb{E}_{x \in G^d} \prod_{i=1}^t f_i(\phi_i(x))\big| \le \varepsilon$?
In the case of true complexity $1$ and Cauchy–Schwarz complexity $2$, Gowers and Wolf [6, 7] obtained a doubly exponential dependence.³ In all other cases where the true complexity is smaller than the Cauchy–Schwarz complexity, the best known bounds are ineffective, or as good as ineffective, as they rely on the inverse theorems for the $U^k$-norms for $k \ge 4$, for which no good bounds are known.
³ One of these exponentials can probably be removed using subsequent improved bounds in the inverse theorem for the $U^3$-norm that follow from work of Sanders [13]. The author is grateful to the anonymous reviewer for pointing this out.
In [5, Problem 7.8], Gowers and Wolf suggested that the dependence cannot be too good, and specifically, not polynomial; that is, they asked whether one could find a counterexample ruling out $\delta = \varepsilon^{O(1)}$.
This is closely related to the following question.
Question 1.3.
In cases where the true complexity and Cauchy–Schwarz complexity differ, could a true complexity bound be proven by elementary means, e.g. by many applications of the Cauchy–Schwarz inequality; or is some appeal to the structural theory of higher order Fourier analysis essential? Is there some qualitative feature which separates the elementary and non-elementary cases?
The primary motivation behind Gowers and Wolf’s appeal for counterexamples to good bounds is that this would rule out a proof based only on complicated applications of the Cauchy–Schwarz inequality, as that would surely give a polynomial bound.
Our final question is at first appearance more eccentric but we will see its relevance shortly.
Question 1.4.
Working over $\mathbb{F}_p$ for $p$ a large prime, and when the true complexity is strictly smaller than the Cauchy–Schwarz complexity, all the known bounds in the true complexity statement depend on the coefficients of the linear forms in $\Phi$, not just on the values of $t$ and $d$. In practice, these results are only effective if the coefficients are essentially bounded.
By contrast, the Cauchy–Schwarz complexity bound is completely uniform in the coefficients, provided the hypotheses are satisfied.
Is this restriction to bounded coefficients necessary, whenever the true complexity is strictly smaller than the Cauchy–Schwarz complexity?
Working over $\mathbb{F}_p^n$ for $p$ fixed and $n$ large, there are only finitely many choices of linear forms, so any dependence on the coefficients can be removed. In this setting, the analogous question is whether the bounds should genuinely depend on $p$.
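In this paper, we consider what is in some sense the smallest non-trivial case where the true complexity and the Cauchy–Schwarz complexity differ, which concerns systems of $6$ linear forms in $3$ variables (i.e., $t = 6$ and $d = 3$). Indeed, when $d \le 2$ it is always the case that the two complexities coincide, and similarly for $d = 3$ and $t \le 5$. However, a generic system with $t = 6$ and $d = 3$ will have true complexity $1$ but Cauchy–Schwarz complexity $2$ (see Section 2 for a discussion).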
In this limited setting, we are able to give fairly complete answers to the questions above. We now outline the main results.
Theorem 1.5.
Let $\Phi = (\phi_1,\dots,\phi_6)$ be a system of $6$ linear forms in $3$ variables, let $p$ be a prime (not necessarily small), and let $G = \mathbb{F}_p^n$ for any $n \ge 1$.
Then, provided the system has true complexity $1$ over $\mathbb{F}_p$, for any functions $f_1,\dots,f_6 \colon G \to \mathbb{C}$ with $\|f_i\|_\infty \le 1$ for each $i$, and for any $j$, $1 \le j \le 6$, we have the bound
\[ \Big| \mathbb{E}_{x_1,x_2,x_3 \in G} \prod_{i=1}^{6} f_i\big(\phi_i(x_1,x_2,x_3)\big) \Big| \le \|f_j\|_{U^2}^{1/C}, \]
where $C$ is some constant depending on the coefficients of $\Phi$, and perhaps $p$, but crucially not on $n$ or on the functions $f_i$.
Moreover, the above inequality can be derived only using multiple applications of the Cauchy–Schwarz inequality. However, the number of applications used in the proof increases without bound as the coefficients of $\Phi$ grow.
Note that, since no restrictions are placed on $p$ and $n$, this encompasses the cases $G = \mathbb{F}_p$ for $p$ a large prime as well as $G = \mathbb{F}_p^n$ for $p$ fixed and $n$ large. In intermediate cases where $p$ and $n$ are both large, even qualitatively the result may officially be new, although these cases are rarely of interest.
The key observation underlying the proof is the following.
Slogan 1.6.
Cauchy–Schwarz complexity is not preserved under applying the Cauchy–Schwarz inequality.
By this we mean the following. If we start with a system $\Phi$ of linear forms and apply the Cauchy–Schwarz inequality to one of the functions, what we get can be thought of as a new linear system $\Phi'$, with more forms and more variables. It is not always true that if $\Phi$ has Cauchy–Schwarz complexity $2$ then so does $\Phi'$; so in some cases we can now apply the Cauchy–Schwarz complexity bound (Proposition 1.1) to the average by $\Phi'$ to bound it by $\|f_j\|_{U^2}$, meaning that in turn the average by $\Phi$ is bounded by $\|f_j\|_{U^2}^{1/2}$. More generally, we can hope to apply Cauchy–Schwarz repeatedly and systematically, eventually arriving at a system with Cauchy–Schwarz complexity $1$.
On the other hand, we show that the quantity $C$, which quantifies the number of times the Cauchy–Schwarz inequality is used, must necessarily grow without bound as $\Phi$ varies.
Theorem 1.7.
For any sufficiently large prime , , there exist a system of linear forms in variables with , and functions with for each , such that
[TABLE]
but
[TABLE]
for each .
Unlike in Theorem 1.5, here the system $\Phi$ is allowed to change as $p$ grows, with no control on the size of its coefficients. The additional condition imposed on $p$ is an inessential one related to the precise construction used and could be removed without too much added difficulty.
Remark 1.8.
This negative result perhaps sheds some light on where the obstructions to a very straightforward proof of Theorem 1.5 lie.
The difficulty turns out not to be that the Cauchy–Schwarz inequality is insufficiently powerful, or too blunt to detect the algebraic nature of the boundary between systems with true complexity $1$ and those with true complexity $2$; in fact it handles such considerations surprisingly easily.
Instead, the issue is that the Cauchy–Schwarz steps used must necessarily be tailor-made to the system being considered. The task of describing a mapping from systems to Cauchy–Schwarz arguments could be likened to that of building a primitive computer using only the Cauchy–Schwarz inequality. Setting up the technical machinery required to achieve this will occupy most of the paper.
Remark 1.9.
The value of $C$ given by the proof of Theorem 1.5 is completely explicit but in many cases unreasonably large. No serious attempt has been made to optimize it, although minor changes would probably produce only minor improvements.
For large , the worst-case behavior given by the proof is something like where is the size of the largest (integer) coefficient appearing in . Although typically one expects not to hit the worst case, nonetheless in practice for integer coefficients of size about values such as are not unusual. It seems likely such values are not best possible.
When is fixed, we may state a bound in terms of rather than the size of the coefficients. Here the method gives . It is possible one could modify the argument to improve this to , which would be best possible up to absolute constants. However, significant additional technical challenges arise, and so we will not attempt this here.
Remark 1.10.
The general case of Questions 1.2, 1.3 and 1.4, for or , remains open. It seems reasonable to speculate that Theorem 1.5 (and Theorem 1.7) have analogues in this level of generality. There is no immediately apparent obstruction to the overall approach of repeated application of the Cauchy–Schwarz inequality succeeding in general, but conversely it is not obvious how to generalize the specific strategies used when and to the general case. Therefore, this is left to possible future work.
1.1 Outline of the paper
In Section 2 we present some preliminaries concerning the case of $6$ forms in $3$ variables. In particular, we will deal with some initial degenerate cases where, in a technical and slightly disingenuous sense, we will see that applying the Cauchy–Schwarz inequality causes the Cauchy–Schwarz complexity to decrease. We will need these cases in what follows, but this also serves as an introduction to the general approach behind the proof of Theorem 1.5 without the notational complexities.
In Section 3 we introduce formalisms to keep track of the effects of multiple applications of Cauchy–Schwarz in a systematic manner. This has the effect of reducing the proof of Theorem 1.5, in any given instance, to winning a Cauchy–Schwarz “game” which has a well-defined set of possible moves and which can readily be simulated on a computer.
Section 4 addresses the core problem of solving this game in general. This comes down to finding sequences of moves which have the effect of implementing predictable arithmetic operations on the system $\Phi$, and using them to walk to a degenerate configuration of the type considered in Section 2.
Finally, Section 5 gives the proof of the negative result, Theorem 1.7.
1.2 Notation
We use $O(1)$ to denote any quantity bounded above by an absolute constant, and $X \ll Y$ to mean $X = O(Y)$. The notation $[N]$ for a positive integer $N$ denotes the set $\{1,\dots,N\}$. For a real parameter $\theta$, $e(\theta)$ denotes $\exp(2\pi i\theta)$. The notation $1_{[x \in A]}$ (for example) denotes the indicator function of the event $x \in A$. If $V$ is a finite-dimensional vector space over $\mathbb{F}_p$, we write $V^*$ for its dual space and $\mathbb{P}(V)$ for the corresponding projective space (i.e. the space of $1$-dimensional subspaces of $V$). Also, $\mathbb{P}^k$ means the same as $\mathbb{P}(\mathbb{F}_p^{k+1})$. Given $v \in V \setminus \{0\}$, we write $[v]$ for the corresponding element of $\mathbb{P}(V)$. If $U$ is a subspace of $V$, we write $U^\perp$ for the perpendicular subspace, i.e. the set of all $\lambda \in V^*$ that vanish on $U$.
1.3 Acknowledgements
The author would like to thank Sean Eberhard, Ben Green, Rudi Mrazović and Julia Wolf for discussions on these topics at various times.
2 Preliminaries concerning six forms in three variables
We start by giving a brief analysis of the different cases that can arise concerning a system of six forms in three variables, and the associated Cauchy–Schwarz complexity and true complexity. Throughout this section we write $V = \mathbb{F}_p^3$, so $\phi_1,\dots,\phi_6$ (modulo $p$) can be thought of as linear functionals $\phi_i \in V^*$, always assumed to be non-zero.
It is clear that nothing substantial changes when we replace $\phi_i$ by a non-zero scalar multiple $\lambda\phi_i$. Indeed, the average over the modified system with functions $f_1,\dots,f_6$ equals the average over the original system with $f_i$ replaced by its dilate $g_i(y) = f_i(\lambda y)$, and $\|g_i\|_{U^2} = \|f_i\|_{U^2}$ since $y \mapsto \lambda y$ is an automorphism of $G$; so this has no effect on the conclusion of Theorem 1.5, and by inspection our definitions of true complexity and Cauchy–Schwarz complexity are also unchanged.
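As a sanity check on this rescaling argument, the identity can be verified numerically. The sketch assumes $G = \mathbb{Z}/p\mathbb{Z}$ with $p = 11$, random real-valued functions with values in $[-1,1]$, an arbitrary illustrative choice of six forms, $\lambda = 3$, and rescaling of the fifth form; all names are hypothetical.

```python
import itertools
import random


def average(forms, tables, p):
    """E_x prod_i f_i(phi_i(x)) over Z/pZ, each f_i a lookup table."""
    d = len(forms[0])
    total = 0.0
    for xs in itertools.product(range(p), repeat=d):
        term = 1.0
        for coeffs, table in zip(forms, tables):
            term *= table[sum(c * x for c, x in zip(coeffs, xs)) % p]
        total += term
    return total / p ** d


def u2_norm(table, p):
    """||f||_{U^2} for a real-valued f on Z/pZ, from the definition."""
    total = 0.0
    for x, h1, h2 in itertools.product(range(p), repeat=3):
        total += (table[x] * table[(x + h1) % p]
                  * table[(x + h2) % p] * table[(x + h1 + h2) % p])
    return max(total / p ** 3, 0.0) ** 0.25


p, lam, i = 11, 3, 4
forms = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (1, 2, 3), (1, 1, 2)]
rng = random.Random(0)
tables = [[rng.uniform(-1, 1) for _ in range(p)] for _ in forms]

# Modified system: phi_i replaced by lam * phi_i, functions unchanged.
scaled = list(forms)
scaled[i] = tuple(lam * c for c in forms[i])
# Original system, with f_i replaced by its dilate g_i(y) = f_i(lam * y).
dilated = list(tables)
dilated[i] = [tables[i][lam * y % p] for y in range(p)]

lhs = average(scaled, tables, p)
rhs = average(forms, dilated, p)
print(lhs, rhs)
print(u2_norm(tables[i], p), u2_norm(dilated[i], p))
```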
Therefore it makes sense to think of the forms $\phi_i$ as points $[\phi_i]$ in the projective plane $\mathbb{P}(V^*)$, quotienting out by the action of scalar multiplication. This allows us to phrase the different cases geometrically.
We have said that $\Phi$ has true complexity $1$ if the symmetric bilinear forms $\phi_1^2,\dots,\phi_6^2$ are linearly independent. Note that the space of symmetric bilinear forms on $V$ has dimension $6$, and there are six forms, so we expect this to be true generically. Indeed, a dependence relation on $\phi_1^2,\dots,\phi_6^2$ exists if and only if there is a non-zero linear functional on this space which evaluates to $0$ on each $\phi_i^2$; and this in turn is the same thing as a non-zero quadratic form on $V^*$ which vanishes at each $\phi_i$; i.e., a conic in the projective plane $\mathbb{P}(V^*)$ containing $[\phi_i]$ for each $i$.⁴
⁴ Note that this argument is still valid as stated when $p = 2$.
In other words, we have shown the following.
Slogan 2.1.
Six forms on $V$ have true complexity at least $2$ if and only if $[\phi_1],\dots,[\phi_6]$ all lie on a (possibly degenerate) conic in $\mathbb{P}(V^*)$.
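Slogan 2.1 gives a finite test for true complexity $1$: the six points lie on a common conic exactly when the $6 \times 6$ matrix whose rows are the coordinates $(v_j v_k)_{j \le k}$ of $\phi_i^2$ is singular over $\mathbb{F}_p$. A sketch, with hypothetical function names and $p = 13$ chosen arbitrarily:

```python
def rank_mod_p(rows, p):
    """Rank over F_p of a list of integer vectors (Gaussian elimination)."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)
        m[rank] = [v * inv % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(u - factor * v) % p for u, v in zip(m[r], m[rank])]
        rank += 1
    return rank


def square_coords(v, p):
    """Coordinates (v_j * v_k mod p, j <= k) of the symmetric bilinear form
    phi^2, where phi has coefficient vector v."""
    return [v[j] * v[k] % p for j in range(3) for k in range(j, 3)]


def on_common_conic(points, p):
    """Six points of P^2(F_p) lie on a (possibly degenerate) conic iff
    phi_1^2, ..., phi_6^2 are linearly dependent."""
    if len(points) != 6:
        raise ValueError("expected six points")
    return rank_mod_p([square_coords(v, p) for v in points], p) < 6


p = 13
print(on_common_conic([(1, t, t * t) for t in range(6)], p))  # True: x0*x2 = x1^2
print(on_common_conic([(1, 0, 0), (0, 1, 0), (1, 1, 0),
                       (0, 0, 1), (1, 0, 1), (2, 0, 1)], p))  # True: y*z = 0
print(on_common_conic([(1, 0, 0), (0, 1, 0), (0, 0, 1),
                       (1, 1, 1), (1, 2, 3), (1, 1, 2)], p))  # False
```

In the third configuration the only conic through the first five points is $3xy - 4xz + yz = 0$, which takes the value $-3 \ne 0$ at $(1,1,2)$ modulo $13$, so those six forms have true complexity $1$.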
By a degenerate conic, we mean the union of two lines. In particular, if $[\phi_1],[\phi_2],[\phi_3]$ are collinear and so are $[\phi_4],[\phi_5],[\phi_6]$, then this system has true complexity at least $2$.
It is possible for the true complexity to be greater than $2$: for instance, if five of the points lie on a line in $\mathbb{P}(V^*)$, in which case the true complexity is $3$; or if $[\phi_i] = [\phi_j]$ for some $i \ne j$, in which case the system has infinite complexity.
However, all such cases may be fully analyzed in terms of Cauchy–Schwarz complexity, which gives a bound in terms of $\|f_j\|_{U^{s_j+1}}$ for each $j$, where the values $s_j$ are best possible, even when they vary with $j$. The details are an uninteresting check that will not be relevant to the argument, so are omitted.
We therefore restrict our attention to the cases with true complexity $1$. In particular, we can henceforth make the following assumptions:
- (i) the points $[\phi_1],\dots,[\phi_6]$ are all distinct;
- (ii) no four are collinear; and
- (iii) if some three of the points are collinear, the remaining three are not collinear.
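It is clear that if the six forms are in general position, meaning no three of the points are collinear, then the Cauchy–Schwarz complexity is $2$. Indeed, any way we partition all but one of the forms into two classes, one of the classes will contain three forms and so their span will be all of $V^*$; hence the Cauchy–Schwarz complexity is at least $2$. Conversely, any split of the remaining five forms into three classes of sizes $2$, $2$ and $1$ achieves Cauchy–Schwarz complexity $2$, since no line through two of the points contains a third.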
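Hypotheses (i)–(iii), and general position, are likewise mechanical to check. The sketch below (hypothetical names; each point given by an integer representative of an element of $\mathbb{P}(V^*)$) uses the fact that two points coincide exactly when their representatives have rank less than $2$, and three distinct points are collinear exactly when their representatives have rank less than $3$. In the test configuration the only collinear triple is the first three points, i.e. the situation of Proposition 2.2(iii).

```python
import itertools


def rank_mod_p(rows, p):
    """Rank over F_p of a list of integer vectors (Gaussian elimination)."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)
        m[rank] = [v * inv % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(u - factor * v) % p for u, v in zip(m[r], m[rank])]
        rank += 1
    return rank


def configuration_report(points, p):
    """Check hypotheses (i)-(iii) for six points of P^2(F_p)."""
    idx = range(len(points))
    distinct = all(rank_mod_p([points[a], points[b]], p) == 2
                   for a, b in itertools.combinations(idx, 2))
    triples = [tri for tri in itertools.combinations(idx, 3)
               if rank_mod_p([points[k] for k in tri], p) < 3]
    four_collinear = any(rank_mod_p([points[k] for k in quad], p) < 3
                         for quad in itertools.combinations(idx, 4))
    # (iii): whenever a triple is collinear, its complement is not.
    complement_ok = all(
        tuple(k for k in idx if k not in tri) not in triples for tri in triples)
    return {
        "distinct": distinct,
        "collinear_triples": triples,
        "four_collinear": four_collinear,
        "general_position": distinct and not triples,
        "satisfies_i_to_iii": distinct and not four_collinear and complement_ok,
    }


pts = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 2, 1), (2, 1, 1)]
print(configuration_report(pts, 13))
```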
Our remaining task in this section is to consider the case where (i)–(iii) hold, but nonetheless $[\phi_1],\dots,[\phi_6]$ are not in general position. This is a setting in which Cauchy–Schwarz complexity has some purchase, but nonetheless there is a subtlety meaning, technically speaking, that typically the Cauchy–Schwarz complexity is still $2$.
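Proposition 2.2.
Suppose throughout that a system $\Phi$ of $6$ forms on $V$ is given, with no four of $[\phi_1],\dots,[\phi_6]$ collinear and no two the same.
- (i) Suppose that $[\phi_1],[\phi_2],[\phi_3]$ are collinear but $[\phi_4],[\phi_5],[\phi_6]$ are not collinear. Then for functions $f_1,\dots,f_6 \colon G \to \mathbb{C}$, with $\|f_i\|_\infty \le 1$ for each $i$, and for any $j \in \{4,5,6\}$, we have a bound
\[ \Big| \mathbb{E}_{x \in G^3} \prod_{i=1}^6 f_i(\phi_i(x)) \Big| \le \|f_j\|_{U^2} \]
coming from Proposition 1.1.
- (ii) Under the same conditions as (i), the system has true complexity $1$. In particular, by results of Gowers and Wolf, the average is bounded in terms of $\|f_j\|_{U^2}$ for $j \in \{1,2,3\}$ (at least for $n = 1$ or $p$ fixed).
- (iii) Now suppose further that $\{1,2,3\}$ is the only collinear triple. Then for $j \in \{1,2,3\}$ there is no way to partition $[6] \setminus \{j\}$ into two pieces such that $\phi_j$ is in the span of neither piece, and hence the Cauchy–Schwarz complexity is $2$ for this system.
Proof.
For (i), say when $j = 4$, we can partition $\{1,2,3,5,6\}$ into $\{1,2,3\}$ and $\{5,6\}$. By our assumptions, it is clear that $\phi_4$ is in neither $\operatorname{span}(\phi_1,\phi_2,\phi_3)$ nor $\operatorname{span}(\phi_5,\phi_6)$, and so the bound indeed follows from Proposition 1.1. The other choices of $j$ are analogous.
For (ii), we note that a quadratic form vanishing at three distinct points of a line vanishes on the whole line, so any conic containing $[\phi_1],[\phi_2],[\phi_3]$ is degenerate, namely the union of that line and a second line. But $[\phi_4],[\phi_5],[\phi_6]$ do not lie on the first line (no four points are collinear) and are not collinear, so $[\phi_1],\dots,[\phi_6]$ are not contained in the union of any two lines. Hence the points do not lie on a conic, and so the true complexity is $1$.
For (iii), when say $j = 1$, given any partition of $\{2,\dots,6\}$ into two pieces, one of the pieces contains three of the forms. Since that triple is not $\{1,2,3\}$, they are not collinear and so their span is all of $V^*$. ∎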
So, this is a case where the true complexity and the Cauchy–Schwarz complexity differ, albeit for what feels like a bad reason. Indeed, it is not too challenging to recover a good bound on the average in terms of $\|f_j\|_{U^2}$, $j \in \{1,2,3\}$, in this setting, for instance by decomposing one of the functions $f_4, f_5, f_6$ into two parts corresponding to its large and small Fourier coefficients, bounding away the uniform contribution and treating what is left as essentially a system of five forms.
Instead, we will now recover such a bound purely by using the Cauchy–Schwarz inequality, and thereby provide the first (admittedly unimpressive) instantiation of Slogan 1.6.
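Proposition 2.3.
Let $\Phi$ be a system of linear forms on $V$, such that no four of $[\phi_1],\dots,[\phi_6]$ are collinear and no two are the same; $[\phi_1],[\phi_2],[\phi_3]$ are collinear; and $[\phi_4],[\phi_5],[\phi_6]$ are not collinear.
Then for functions $f_1,\dots,f_6 \colon G \to \mathbb{C}$, with $\|f_i\|_\infty \le 1$ for each $i$, we have a bound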
[TABLE]
Proof.
By applying a suitable change of basis to , we may assume without loss of generality that ; this is not essential but eases the notation. So,
[TABLE]
We can apply the Cauchy–Schwarz inequality to obtain
[TABLE]
Now, the term on the right expands to
[TABLE]
and we can think of this as where is the system of forms in the five variables , given by and for , each thought of as a linear functional .
We claim that, under our hypotheses, it is possible to partition the nine forms , , , , , , into two classes such that is not in the span of either class. Specifically, we will take
[TABLE]
to be the sets of indices in each class. If this claim holds, then by the standard Cauchy–Schwarz complexity bound (Proposition 1.1) again we have
[TABLE]
and the result follows.
We now verify the claim. We first note that, since are collinear by hypothesis,
[TABLE]
as . This makes the claim plausible for dimension reasons: it is reasonable to expect the span of four linear forms on not to contain a fifth, unless something untoward happens. However, something untoward could genuinely happen if too many of the original forms are collinear, and more generally we need to show that all bad cases are ruled out by our hypotheses. This is the technical part of the calculation, and may be skipped on first (or subsequent) reading.
Recall and write for . To show is not in the span of , it would suffice to show that together with the other four form a basis for ; equivalently, that the matrix
[TABLE]
(whose columns correspond to respectively) is non-singular. However, it is not hard to see that
[TABLE]
The determinants on the right hand side are zero precisely when, respectively, or are collinear. Under our assumptions, neither can be true (as then four points would lie on a line) and so is non-singular.
The argument for is very similar. We define
[TABLE]
which is non-singular if and only if form a basis. Then
[TABLE]
and again this is zero if and only if either or are collinear. Again, both of these are explicitly ruled out by our hypotheses, and this proves the claim. ∎
Remark 2.4.
One way to think of this proof on a high level is as a combinatorial analogue of the method we sketched above: namely, first observing that is controlled by , then noting this allows us to essentially eliminate by replacing it with the sum of its large Fourier coefficients, and finally applying Cauchy–Schwarz on the remaining five forms.
What we do here is first make two copies of the original system, joined by ; on the right, we decompose the remaining forms as if we were attempting to prove a Cauchy–Schwarz complexity bound in , as in Proposition 2.2; and on the left we decompose as if we were trying to prove a Cauchy–Schwarz complexity bound in and the form didn’t exist.
So, the initial Cauchy–Schwarz allows us to somehow substitute the information gained from the former argument into the latter.
Remark 2.5.
As we have said, this is an application of Slogan 1.6, but not a very convincing one. Before embarking on the programme in full generality, we briefly sketch an example of six forms in general position, having true complexity $1$ (but necessarily Cauchy–Schwarz complexity $2$), where we nonetheless get a bound using only Cauchy–Schwarz.
Consider the forms
[TABLE]
where are arbitrary subject to the condition that the forms be in general position. For concreteness one could substitute , , .
We can apply Cauchy–Schwarz to twice as follows:
[TABLE]
and then again
[TABLE]
We now claim that this last system of linear forms in variables has Cauchy–Schwarz complexity with respect to , if and only if the original system has true complexity . That is, we can partition the remaining forms into two classes:
[TABLE]
such that lies in the span of one of the classes, if and only if lie on a conic. Verifying this claim is left as an exercise for the interested reader. We stress that for particular choices of this is an elementary finite computation.
The fact that these kinds of linear algebra conditions can detect whether the points lie on a conic should perhaps not be surprising in light of Pascal’s hexagon theorem.
3 Formalisms for iterated Cauchy–Schwarz
The purpose of this section is to introduce some formalisms necessary to keep track of what happens when we apply the Cauchy–Schwarz inequality repeatedly. The notational overhead here is high, but preferable to handling yet larger explicit calculations in the style of the previous section.
3.1 Linear data
Although the central objects of study are systems of linear forms, it will be convenient to use a natural generalization of this notion, which handles the objects that arise in intermediate stages of the calculation. We introduce the relevant definitions now.
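Definition 3.1.
Let a prime $p$ be fixed. By a linear datum, we mean a tuple $\mathcal{L} = (V, (W_i)_{i \in I}, (\phi_i)_{i \in I})$, where $I$ is some finite index set, and
- $V$ and $W_i$ for $i \in I$ are finite-dimensional vector spaces over $\mathbb{F}_p$;
- $\phi_i \colon V \to W_i$ are surjective linear maps.
Given a positive integer $n$, we abuse notation to write $\phi_i \colon V^n \to W_i^n$ for the map that applies $\phi_i$ to each coordinate. Now, for a collection of functions $f_i \colon W_i^n \to \mathbb{C}$ with $i \in I$, we define
\[ T_{\mathcal{L}}(f_i : i \in I) = \mathbb{E}_{x \in V^n} \prod_{i \in I} f_i(\phi_i(x)). \]
It is clear that in the special case that $W_i = \mathbb{F}_p$ for each $i$, this is essentially the same information as a system of linear forms on $\mathbb{F}_p^d$ for some $d$. The reader should always imagine $V$ and the $W_i$ as being small, even when we are working over $\mathbb{F}_p^n$ for some large $n$: the $n$ is taken care of in the definition of $T_{\mathcal{L}}$, not of $\mathcal{L}$.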
Attempting to analyse linear data in general exposes hard problems; see [1, 2]. Since the linear data we will consider ultimately come from systems of linear forms, these subtleties will not arise here.
Remark 3.2.
Typically we are not too concerned by replacing $V$ or $W_i$ by isomorphic vector spaces, or by the exact form of the linear map $\phi_i$: for instance, as we have said the difference between $\phi_i$ and $\lambda\phi_i$ is usually immaterial.
As such, the only really important information is the collection of subspaces $\ker\phi_i$ of $V$, as we can always recover $W_i$ up to isomorphism as $V/\ker\phi_i$. One can interpret $\ker\phi_i$ as the subspace of $V$ that the function $f_i \circ \phi_i$ cannot depend on.
Alternatively, we could think about the perpendicular subspaces $(\ker\phi_i)^\perp \subseteq V^*$, corresponding to the span of all linear forms derived from $\phi_i$. This is consistent with the geometric picture from Section 2: such subspaces correspond to points, lines, planes etc. in $\mathbb{P}(V^*)$.
For technical reasons it is useful to keep track of the linear maps $\phi_i$ explicitly; but the reader will rarely lose anything, and possibly gain something, by thinking of a linear datum as simply a collection of subspaces of $V$ or $V^*$.
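The two viewpoints of Remark 3.2, kernels in $V$ and perpendicular subspaces in $V^*$, can both be computed with one routine: a basis of the null space of a matrix over $\mathbb{F}_p$. In this sketch (hypothetical names; $p = 7$ and a map $\phi \colon \mathbb{F}_7^3 \to \mathbb{F}_7^2$ chosen only for illustration) we check that $(\ker\phi)^\perp$ coincides with the span of the rows of the matrix of $\phi$, i.e. with the forms derived from $\phi$; projectively it is a line in $\mathbb{P}(V^*)$.

```python
def nullspace_mod_p(rows, ncols, p):
    """Basis of {v in F_p^ncols : r . v = 0 for every r in rows}."""
    m = [[a % p for a in r] for r in rows]
    pivots = []
    for col in range(ncols):
        r = len(pivots)
        piv = next((i for i in range(r, len(m)) if m[i][col]), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        inv = pow(m[r][col], -1, p)
        m[r] = [a * inv % p for a in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col]:
                factor = m[i][col]
                m[i] = [(a - factor * b) % p for a, b in zip(m[i], m[r])]
        pivots.append(col)
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [0] * ncols
        v[free] = 1
        for row, pc in enumerate(pivots):
            v[pc] = -m[row][free] % p  # solve the reduced equation for v[pc]
        basis.append(v)
    return basis


def rank_mod_p(rows, ncols, p):
    """Rank = number of columns minus nullity."""
    return ncols - len(nullspace_mod_p(rows, ncols, p))


p, d = 7, 3
A = [[1, 2, 3], [0, 1, 1]]         # matrix of phi: V = F_7^3 -> W = F_7^2
ker = nullspace_mod_p(A, d, p)      # ker(phi), a subspace of V
perp = nullspace_mod_p(ker, d, p)   # (ker phi)^perp, a subspace of V*
print(len(ker), len(perp))          # dimensions 1 and 2
```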
We need some notion of when one linear datum bounds another; for instance, but not exclusively, because one is obtained by applying the Cauchy–Schwarz inequality to the other.
Definition 3.3.
Suppose we have two linear data and . Suppose further that for some pair , the subspaces are identified. Finally, let be a positive real number.
We say dominates respecting with exponent if the following holds: for and any collection of functions , with , there exist functions , , , such that , and
[TABLE]
It is clear that domination is transitive: if dominates respecting with exponent , and dominates respecting with exponent , then dominates respecting with exponent .
Some straightforward examples of domination include (i) replacing by an isomorphic system (i.e. reparameterizing); (ii) augmenting by introducing further averaging, or by replacing by a strictly larger subspace for some ; or (iii) taking a supremum over some part of the average. All of these are subsumed in the following general proposition.
Proposition 3.4**.**
Suppose and are two linear data on the same index set , and that we are given linear maps and such that for each (i.e. a morphism of linear data). If is some index such that and is the identity, then dominates respecting , with exponent .
Proof.
Let be a complete set of coset representatives of in . By our hypotheses, we have and so ; hence we may insist that all lie in .
For any collection of functions , , we have
[TABLE]
Now fix to be any maximal choice, and define by
[TABLE]
We deduce that . Moreover, it follows from our assumptions that , and so the conditions of Definition 3.3 are satisfied. ∎
Definition 3.5**.**
If obey the hypotheses of Proposition 3.4, we say dominates trivially at index . Replacing by is termed a operation.
We now consider how to describe an application of the Cauchy–Schwarz inequality in this language.
Proposition 3.6**.**
Suppose is a linear datum, and some is given. Let be the linear datum defined as follows:
- is the fiber product of with itself over , i.e.:
[TABLE]
- is the disjoint union of two copies of , denoted
[TABLE]
- for each , ; and
- for each and ,
[TABLE]
Then for any , dominates respecting and with exponent .
We note that , are surjective, e.g. by observing that is a subspace of .
Proof.
As promised, this is just the statement of the Cauchy–Schwarz inequality as it applies in this context. Given , we have
[TABLE]
by Cauchy–Schwarz, and
[TABLE]
Defining in the obvious way, and provided for each , we get the desired inequality. ∎
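The following is a numerical sanity check of the shape of Proposition 3.6. It does not prove the general statement. It covers the simplest case: an average over V = F_P^2 of f(ψ(v))·g(v), where the projection ψ(x, y) = x stands in for the single (merged) form being Cauchy–Schwarzed. The prime P, the map `psi` and the random test functions are illustrative assumptions. The squared average is bounded by E|f|^2 times an average of g(v)·conj(g(v')) over the fiber product {(v, v') : ψ(v) = ψ(v')}. When ψ is surjective, that fiber product has |V|^2/|W| elements.

```python
import itertools
import random

P = 5  # illustrative small prime; V = F_P^2 and W = F_P


def psi(v):
    """Hypothetical surjective linear map V -> W: projection to the first coordinate."""
    return v[0]


def fiber_product(V):
    """All pairs (v, v') in V x V with psi(v) == psi(v')."""
    return [(v, w) for v in V for w in V if psi(v) == psi(w)]


def cauchy_schwarz_check(seed=0):
    rng = random.Random(seed)
    V = list(itertools.product(range(P), repeat=2))
    f = {w: complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for w in range(P)}
    g = {v: complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for v in V}

    # Left-hand side: |E_v f(psi(v)) g(v)|^2.
    lhs = abs(sum(f[psi(v)] * g[v] for v in V) / len(V)) ** 2
    # First factor on the right: E_w |f(w)|^2.
    f_norm = sum(abs(f[w]) ** 2 for w in range(P)) / P
    # Second factor: the "doubled" average over the fiber product.
    pairs = fiber_product(V)
    doubled = sum(g[v] * g[w].conjugate() for v, w in pairs) / len(pairs)
    return lhs, f_norm, doubled, len(pairs)


if __name__ == "__main__":
    lhs, f_norm, doubled, n_pairs = cauchy_schwarz_check()
    print(n_pairs, lhs <= f_norm * doubled.real + 1e-12)
```

The doubled average is real and non-negative, because it equals E_w |E_{v : ψ(v) = w} g(v)|^2. This is why the new index set in Proposition 3.6 consists of two copies of the old one, glued along the form being Cauchy–Schwarzed.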
Definition 3.7**.**
We denote the system defined in Proposition 3.6 by .
Often we need to apply Cauchy–Schwarz not just to one function, but to several at a time. The preferred way of formalizing this for our purposes is in two steps. First, we merge all the functions being considered for Cauchy–Schwarz into a single function. That is, we forget that they are separate functions, and consider their product as just one function of all the variables they collectively depend on. For instance, we might merge and into . Next, we apply the Cauchy–Schwarz inequality in the form of Proposition 3.6 to the new function .
In fact we will want to apply this merging operation in other contexts as well, because doing so is one way to eliminate redundant information. Having this ability is one of the main motivations for working in this more general language of linear data.
Again, we encode this operation with a proposition.
Proposition 3.8**.**
Let be a linear datum, let be a finite set, and let be a surjective function. Define a new linear datum on the same underlying space , as follows:
- for each , define
[TABLE]
- define ; and
- by abuse of notation consider as a map .
Then for any and with , dominates respecting and with exponent .
Proof.
Given , for each let
[TABLE]
and then restrict this function to the subspace . Then it is easy to see that
[TABLE]
and so the necessary inequality is in fact an equality. Moreover, if is a singleton then and , so the conditions of Definition 3.3 are met. ∎
Remark 3.9*.*
The definitions of and are somewhat involved. A more natural characterization in terms of subspaces of is that
[TABLE]
i.e. merging functions corresponds to intersecting the corresponding subspaces. Dually, we have
[TABLE]
so merging takes spans of the relevant subspaces of . Again, either of these allows us to reconstruct and up to isomorphism.
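Remark 3.9 can be checked by brute force in a toy case. The sketch assumes a datum given by linear forms on F_P^3. The prime and the forms are illustrative choices, not taken from the paper. Merging φ_1 and φ_2 into the single map x ↦ (φ_1(x), φ_2(x)) gives a map whose kernel is ker φ_1 ∩ ker φ_2. On the dual side, the relevant subspace is the span of φ_1 and φ_2, whose dimension is the rank of the stacked coefficient rows. The kernel therefore has P^(3 − rank) elements.

```python
import itertools

P = 3  # illustrative prime
N = 3  # number of variables


def evaluate(coeffs, x):
    """Value mod P of the linear form with the given coefficients at the point x."""
    return sum(c * xi for c, xi in zip(coeffs, x)) % P


def merged_kernel(forms):
    """Kernel of the merged map x -> (phi_1(x), ..., phi_k(x)) on F_P^N."""
    return {
        x for x in itertools.product(range(P), repeat=N)
        if all(evaluate(f, x) == 0 for f in forms)
    }


def rank_mod_p(rows):
    """Rank over F_P of the matrix with the given rows (Gauss-Jordan elimination)."""
    rows = [[v % P for v in r] for r in rows]
    rank = 0
    for col in range(N):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, P)  # modular inverse (Python 3.8+)
        rows[rank] = [(v * inv) % P for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                factor = rows[i][col]
                rows[i] = [(a - factor * b) % P for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank
```

In the test, φ_2 = 2φ_1 is redundant. Merging φ_1 with φ_2 therefore loses nothing, which illustrates how merging can eliminate redundant information.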
Definition 3.10**.**
We denote the linear datum defined as in Proposition 3.8 by .
In a slight abuse of notation, we may omit any indices that are unchanged by from the description of . For instance, the operation that merges indices and and labels the new combined index might be denoted .
We are now in a position to state a version of Theorem 1.5 coded in this language.
Lemma 3.11**.**
Let be a prime, and let be six linear forms in three variables. Let , and by abuse of notation let for denote the same forms reduced modulo , assumed to be non-zero. Write for , set , and hence define the linear datum .
Suppose do not lie on a conic in . Then there is some sequence of operations , and which can be applied to in turn to produce a final linear datum , such that:
- where , , are unchanged and for ; i.e. again corresponds to linear forms in variables over , where and only may have changed;
- by applying Propositions 3.4, 3.6 or 3.8 as appropriate as we go, we can deduce that dominates respecting and with exponent where is the number of steps;
- is bounded by where is the size of the largest coefficient of ; or alternatively by ; and
- the points do not lie on a conic, but some three are collinear.
This last condition means that one of Proposition 2.2 or Proposition 2.3 applies to the forms , and so
[TABLE]
for any with . Combining this with the domination statement allows us to deduce Theorem 1.5 (at least for ; the other cases follow by relabelling the indices).
Remark 3.12*.*
The combinatorial operations , describe the heart of any strategy, whereas steps are really just book-keeping to aid with proofs. One could in principle delay all steps to the end of the argument, or perhaps remove them completely, without fundamentally changing the approach.
3.2 Graphs of vector spaces
One remaining difficulty in reasoning about the effect of repeated invocations of is finding a good notation for discussing the iterated fiber products that arise in the definition of the ambient vector space .
At the expense of yet further notational overhead, we introduce one more tool to help with this. This subsection has very little content beyond allowing us to draw certain diagrams and make sense of what they mean.
Definition 3.13**.**
Let be a (multi)-graph with vertex set and edge set , and let be any vector space over . Suppose that to every edge is associated a subspace of . Then the vector space associated to this set-up is the subspace of given by
[TABLE]
In other words, we place a copy of at every vertex and impose a compatibility restriction for every edge.
We will always apply this when each subspace is one of for , where is part of some original linear datum with underlying space . It then makes sense to label each edge with a number , in place of the subspace .
The useful feature of this set-up is that steps correspond to simple combinatorial operations on the graph : we replace by two copies , keeping all the edges in each half; and we add an edge between and for every linear form involved in that Cauchy–Schwarz step (which is applied to the merge of some linear forms).
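Definition 3.13 and the doubling rule just described can be made concrete by brute force over a small field. The sketch makes one assumption, suggested by the fiber product in Proposition 3.6: an edge labelled by a form φ between vertices a and b imposes φ(v_a) = φ(v_b), i.e. v_a − v_b ∈ ker φ. The prime, the forms and the hypothetical helper `double_graph` are illustrative only. For a tree, the resulting space should have dimension n + Σ_e dim ker φ_e.

```python
import itertools

P = 2  # illustrative prime
N = 3  # V = F_P^N


def evaluate(coeffs, x):
    return sum(c * xi for c, xi in zip(coeffs, x)) % P


def graph_space_size(num_vertices, edges):
    """Size of {(v_a) in V^vertices : phi(v_a) == phi(v_b) for every edge (a, b, phi)}."""
    vectors = list(itertools.product(range(P), repeat=N))
    count = 0
    for assignment in itertools.product(vectors, repeat=num_vertices):
        if all(evaluate(phi, assignment[a]) == evaluate(phi, assignment[b])
               for a, b, phi in edges):
            count += 1
    return count


def double_graph(num_vertices, edges, merged):
    """Model of one Cauchy-Schwarz step: duplicate the graph, then join vertex a in the
    first copy to vertex a in the second copy by an edge labelled phi, for each
    (a, phi) in `merged` (the forms merged before applying Cauchy-Schwarz)."""
    shifted = [(a + num_vertices, b + num_vertices, phi) for a, b, phi in edges]
    bridges = [(a, a + num_vertices, phi) for a, phi in merged]
    return 2 * num_vertices, edges + shifted + bridges
```

Starting from one vertex and no edges, one doubling along a single form gives the two-vertex graph below. A second doubling along two forms gives a four-vertex square. The forms in the test are hypothetical stand-ins for those labelled 6 and 4 in the text.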
This is best illustrated by example. Suppose we start with a linear datum . At this point, the graph consists of a single vertex and no edges.
If we now apply , we get a linear datum whose underlying vector space corresponds to the following graph:
[Diagram: two vertices, 0 and 1, joined by a single edge labelled 6.]
Indeed, the definition of in this case gives exactly the fiber product from Proposition 3.6. Recall that the indices / forms in scope are now called and and are associated to the left vertex and the right vertex respectively.
Suppose we now apply (recalling that by convention we assume the other indices are sent to themselves) and then apply . The new graph is:
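[Diagram: a graph on the four vertices 00, 10, 01, 11, consisting of two copies of the previous two-vertex graph (each with an edge labelled 6), joined by edges labelled 4 and 5.]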
Finally, we might apply to obtain:
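[Diagram: a graph on the eight vertices 000, 100, 010, 110, 001, 101, 011, 111, consisting of two copies of the previous four-vertex graph (edges labelled 6, 6, 4, 5 in each), joined by one further edge labelled 5.]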
Any formal justification of this general pattern would be tedious and unreadable, so we will not attempt one. The reader may, if they wish, treat all such diagrams as visual aids having no formal impact on the proofs.
4 The detailed strategy for Theorem 1.5
The formalism of the previous section gives us very significant freedom to make radical changes to a linear datum . However, to prove a general result, what we want is to find a sequence of operations that changes as conservatively as possible, ideally giving back another datum of the same form with a small predictable change to some of the parameters.
Our task in this section then splits into two parts:
- (i) to describe such a sequence of operations – henceforth called a block – and to analyze and verify the change it produces; and
- (ii) to show how to chain these blocks together to reach sufficiently arbitrary points in the parameter space.
We will approach these tasks in reverse order.
4.1 The effect of the block construction
We again write . Suppose are six points in , corresponding to some system of six linear forms. Suppose furthermore that are in general position, and that lie on some given line but do not lie on . Note we allow that, say, be collinear, or even that ; in the latter case, forms part of the data of the set-up since it cannot be recovered from .
We will now describe an operation that modifies this collection of points. Specifically, it will leave unchanged, and replace with two different points that both lie on the unchanged line .
The points are constructed as follows. Let be the point at the intersection of the lines and ; then is the intersection of and . Similarly, letting be the intersection of and , the point is the intersection of and . This construction is shown in Figure 1.
Given our hypotheses on , this definition always makes sense.
We call this construction a block operation , i.e. . By exchanging the roles of we create a family of operations for each pair , . Swapping and but leaving and the same gives the same construction, which accounts for the fact there are operations not , and for the choice of notation.
In Section 4.2, we will implement a sequence of , and operations whose overall effect is equivalent to this move. That is, starting with a linear datum corresponding (indirectly) to forms and applying this sequence of operations, we obtain a linear datum corresponding to . We will not state this result precisely yet, as there are some technical subtleties to do with the case , where the previous sentence does not even make sense and the datum has to be modified to encode the line as well as the points. [Footnote 5: Handling this case correctly is an irritating source of complexity in the argument, but seems to be slightly less irritating than avoiding it.]
For the time being we will consider the operations as a black box. In order to prove Lemma 3.11, we broadly need to show that some sequence of moves takes the original to some final , with the property that one of or are collinear for some choice of , but the complementary triple are not collinear.
The second part of this is guaranteed by the following lemma, which shows that we never lose control of true complexity by applying (and symmetrically for other pairs ).
Lemma 4.1**.**
Let be as above, and suppose:
- if , that do not lie on a (possibly degenerate) conic; or
- if , that do not lie on a (possibly degenerate) conic that is tangent to at .
Then the same is true of .
Note that in the degenerate case, saying a degenerate conic consisting of two lines , is “tangent” to translates algebraically to saying that are concurrent.
Proof.
For any point on , there is a unique (possibly degenerate) conic passing through and . This conic meets again at precisely one other point, counting multiplicity, which we denote by . So, is an involution of the points of .
It is clear without doing detailed calculations that is a birational map , and so it must be a Möbius transformation (or one can check this directly). Moreover, if we write for for the intersection point of the lines and , then is characterized by
[TABLE]
since in all of these cases the conic is degenerate and so can be found by inspection.
Let be the map sending to where (i.e., how we obtained from in the definition of ). Then is also a Möbius transformation , since it is the composition of two perspectivities, the first sending to via and the second sending to via . Moreover, it is characterized by
[TABLE]
as again the image point in these cases is immediate by inspection. Let for be defined similarly by permuting the roles of ; so (4) holds analogously for these under a suitable permutation of indices. Note that where and .
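The argument that τ is a Möbius transformation, as a composition of two perspectivities, can be checked numerically. A perspectivity between two lines preserves cross-ratios, and cross-ratios characterize projective maps between lines. The sketch works in homogeneous coordinates on P^2(Q). In these coordinates, both the line through two points and the intersection point of two lines are given by cross products. All points, lines and centres below are made up for illustration, and the helper names are hypothetical.

```python
from fractions import Fraction


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c (scalar triple product)."""
    return sum(x * y for x, y in zip(a, cross(b, c)))


join = cross  # line through two distinct points
meet = cross  # intersection point of two distinct lines


def perspectivity(centre, target_line):
    """Send a point x to the intersection of the line through centre and x with target_line."""
    return lambda x: meet(join(centre, x), target_line)


def cross_ratio(a, b, c, d, aux):
    """Cross-ratio of four collinear points; aux is any point off their common line.
    Each point occurs once above and once below, so rescaling coordinates cancels."""
    return Fraction(det3(aux, a, c) * det3(aux, b, d), det3(aux, a, d) * det3(aux, b, c))


ell = (0, 0, 1)   # the line z = 0
m = (1, -1, 1)    # the line x - y + z = 0
first = perspectivity((1, 2, 3), m)      # centre lies on neither line
second = perspectivity((2, -1, 1), ell)  # centre lies on neither line


def tau(x):
    """Composite of two perspectivities: a map from the line ell to itself."""
    return second(first(x))


pts = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (1, 2, 0)]
before = cross_ratio(*pts, aux=(0, 0, 1))
after = cross_ratio(*(tau(p) for p in pts), aux=(0, 0, 1))
```

Equality of `before` and `after` for every choice of four points is exactly the statement that the composite acts on the line as an element of PGL_2.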
Now, the hypothesis on in the statement holds if and only if . We wish to deduce that ; equivalently, that . It would suffice to show that as Möbius transformations . However, this is immediate because
[TABLE]
(using (3) and (4)) and because it follows from our general position assumptions that ,, are distinct points of , and therefore uniquely determine a Möbius transformation. ∎
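The final step uses the standard fact that a Möbius transformation of P^1 is determined by the images of three distinct points. The same fact underlies Lemma 4.2, where three distinct points of the line are identified with standard points of P^1. A minimal sketch of the usual construction follows. Points of P^1 are pairs (x, y) standing for (x : y), and the function names are hypothetical.

```python
from fractions import Fraction

ZERO, ONE, INF = (0, 1), (1, 1), (1, 0)  # (x : y) coordinates on P^1


def mobius_from_three(p0, p1, pinf):
    """Matrix M (defined up to scale) with M(0) = p0, M(1) = p1 and M(inf) = pinf.

    Write M = [alpha * pinf | beta * p0] (as columns); then M(inf) ~ pinf and
    M(0) ~ p0 automatically, and M(1) = alpha * pinf + beta * p0 must equal p1."""
    det = pinf[0] * p0[1] - p0[0] * pinf[1]
    if det == 0:
        raise ValueError("p0 and pinf must be distinct points")
    alpha = Fraction(p1[0] * p0[1] - p0[0] * p1[1], det)  # Cramer's rule
    beta = Fraction(pinf[0] * p1[1] - p1[0] * pinf[1], det)
    if alpha == 0 or beta == 0:
        raise ValueError("p1 must differ from p0 and pinf")
    return ((alpha * pinf[0], beta * p0[0]), (alpha * pinf[1], beta * p0[1]))


def apply(M, pt):
    return (M[0][0] * pt[0] + M[0][1] * pt[1], M[1][0] * pt[0] + M[1][1] * pt[1])


def same_point(u, v):
    """Equality in P^1: the two coordinate vectors are proportional."""
    return u[0] * v[1] - u[1] * v[0] == 0
```

Uniqueness follows from the construction. Any matrix fixing 0 and ∞ is diagonal, and if it also fixes 1 it is a scalar. The test checks this on the identity case.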
This means that if we can find a sequence of operations that takes to some such that some triple for is collinear, then the complementary triple cannot be collinear as then would lie on a degenerate conic. So, we can largely forget about in what follows and concentrate on the action of on , which corresponds to the Möbius transformations defined in the proof of Lemma 4.1.
The following lemma explains how to take to an arbitrary point on , relatively efficiently, using multiple transformations .
Lemma 4.2**.**
We continue the notation from the proof of Lemma 4.1. Suppose we identify with (i.e., choose coordinates) by identifying with , with [math] and with (again noting these are guaranteed to be distinct). Every Möbius transformation of now corresponds to a matrix in .
Then the following hold:
[TABLE]
and:
[TABLE]
Consequently, the action of on is transitive, and more specifically any point , where , may be mapped to using a word of length or in .
Proof.
The first three identities may be verified using only (4) (and the corresponding statement for the other ) to deduce the action on ,, (which correspond to ).
For the next four, that approach does not appear to be sufficient; instead, we simply pick coordinates and compute explicitly.
We can choose projective coordinates for such that , , and . [Footnote 6: To see this is possible, note that we can certainly choose a projective transformation sending , , to , and as are not collinear. In these coordinates, for some (as none of lie on ); by further rescaling we can ensure . This convention now differs from the one above – which is somewhat more convenient – by a further fixed change of coordinates.] We write for some : by our collinearity assumptions, this is possible, and furthermore , and .
Using the information (4), we can compute the matrices of explicitly as
[TABLE]
with the other six following from the relation . Verifying the remaining formulae is now just an exercise in multiplying (projective) matrices.
By a modified version of Euclid’s algorithm, any point may be reduced to one of using the matrices , , in steps (the worst case being something like for a large integer; the typical case is better). Alternatively, this can be done in steps, since the Cayley graph on with these generators has diameter , as a corollary of celebrated results concerning expander graphs; see [12]. Finally, one of the first three matrices moves the end point to , if necessary. ∎
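The matrices of Lemma 4.2 are not reproduced here, so the following sketch uses standard stand-ins for the generators. It illustrates only the cost of a Euclid-style reduction by generators. The choice of generators T^-1: x ↦ x − 1 and S: x ↦ 1/x is an assumption, as are the function names. With these generators, the word length is the sum of the continued-fraction partial quotients plus the number of inversions. A point such as n/1 therefore needs n steps, while ratios of consecutive Fibonacci numbers need only logarithmically many. The logarithmic bound for every point via expanders is not illustrated.

```python
from fractions import Fraction


def reduce_to_zero(a, b):
    """Reduce the positive rational a/b to 0 using the hypothetical generators
    T^-1: x -> x - 1 and S: x -> 1/x, returning the list of generators applied."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    x = Fraction(a, b)
    word = []
    while x != 0:
        if x >= 1:
            x -= 1
            word.append("T^-1")
        else:
            x = 1 / x
            word.append("S")
    return word


def replay(word):
    """Undo the word starting from 0, recovering the original point."""
    x = Fraction(0)
    for g in reversed(word):
        x = x + 1 if g == "T^-1" else 1 / x
    return x
```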
We should also verify that the values of representing the original point are not too large.
Lemma 4.3**.**
Suppose for where are integers, and . Then in the coordinates on described in Lemma 4.2, the point is identified with where are integers with .
Proof.
Write for the determinant of the matrix whose columns are the vectors , , . Then we set
[TABLE]
and claim that are the coordinates of in the coordinate system of Lemma 4.2. One can verify that the definition of is invariant under rescaling any , or under a change of projective coordinates on . Also, it is straightforward to verify this claim when , , and (i.e. in the coordinates on used in the proof of Lemma 4.2). It follows that the claim holds in general. ∎
We have one further technical issue to consider. It will be convenient to construct the block that implements only in cases where none of the triples or for is collinear. This is not too onerous, as the first time we obtain a set of points where one of these triples is collinear, we can just stop and the conclusion of Lemma 3.11 will be satisfied. However, for this to work we need to check that, when this happens, we are not in one of the degenerate cases where .
We therefore check the following lemma.
Lemma 4.4**.**
Suppose and are as above and , are the points returned by the block move . Suppose also that no triple or for is collinear, but that some triple or is collinear. Then .
Proof.
Suppose for contradiction that for some . Then and . By (4), if either or , or and , then for some , which is a contradiction. Similarly, if either or , or and , then for some , which is again a contradiction. Since these cases exhaust all possible pairs , the result follows. ∎
4.2 Implementing a block move
In this subsection we describe a sequence of , and operations that have the effect of a block transformation . This is the last but most central ingredient in the proof of Theorem 1.5.
In the course of the argument in Section 4.1, we may need to consider intermediate configurations for which . Typically we do not expect this case to arise, but it would be onerous to try to avoid it in general. Also, it is not true that in such cases we are immediately done by some easy Cauchy–Schwarz technique as in Section 2: it appears this degeneracy is not one we can use to our advantage.
To handle this, we need to build more slack into our linear datum. We again set .
Definition 4.5**.**
Let be non-zero linear forms, with in general position, and let be a subspace of dimension in containing , but none of . An augmented datum representing is a linear datum where , for each and , and satisfy:
- for and any we have ;
- for and any we have for some .
Also, the standard datum representing is just where and for each , as in Lemma 3.11.
There are many roughly equally cryptic ways to phrase this rigorously. Geometrically, what has happened is that we have embedded our projective plane in a three-dimensional space , and each of corresponds to the respective point in the embedded copy of . Meanwhile, and correspond to lines in whose intersections with the embedded are , and whose canonical projections onto the embedded are both (contained in, but secretly equal to) the line .
In the case , we don’t really need this extra dimension but it does no harm, as the following lemma will show. When , the situation has genuinely changed because the augmented datum retains information about whereas the standard one would not.
Lemma 4.6**.**
Let be linear forms as in Definition 4.5, and suppose further that and . Let be an augmented datum representing and let be the standard datum representing . Then dominates respecting and dominates respecting , for any , and with exponent in each case.
Proof.
Both directions are by steps (i.e. Proposition 3.4). First we show that dominates trivially (respecting ). Indeed, we may consider the surjective maps given by , and given by the identity if and . It is immediate from our hypotheses that for each , and the claim follows.
To show dominates trivially (respecting ), we consider an injective map given by for some which will have to be chosen carefully, together with given by the identity if and to be specified when . It suffices to show that for each , under suitable choices. Note that this is already immediate for , given our hypotheses. For , we need precisely that
[TABLE]
for any . If for as elements of , so for some , we could define and the equation would be satisfied.
We check that this holds for an appropriate choice of . Because , is a basis for and so we may write
[TABLE]
for some . Define ; hence, and as required. ∎
Again it may be instructive to think about this geometrically. In the first part, we used the canonical embedding of into discussed above to get our morphism of linear data. In the second part, we chose a particular non-standard projection that collapses the line corresponding to onto and that corresponding to onto .
Finally, we can state a lemma which is the workhorse of the whole argument.
Lemma 4.7**.**
Let and be as in Definition 4.5, and let be an augmented datum representing . Also, let be linear forms such that
[TABLE]
Then there exists an augmented datum representing , such that dominates respecting for any , and with exponent .
The argument has roughly two phases. In the first phase, our goal is to build a datum corresponding to the following graph of vector spaces over :
[Diagram: the target graph of vector spaces, with edges labelled 6, 6, 5, 5, 3, 3, 4 and 4; one vertex carries the annotation 12 // 34 and two vertices carry the annotation 56 // 12.]
Here the numbers next to the vertices denote two classes corresponding to those indices that get merged into and those that get merged into respectively. The indices at vertex will turn into .
It is not possible to construct this graph directly using steps, so we have to build a larger graph using steps and then prune it back using and steps.
In the second phase, we need to apply a carefully chosen operation to reduce this to a system defined on a single copy of .
Proof of Lemma 4.7.
We abbreviate to . Beginning with the augmented datum , we first apply :
[Diagram: two vertices, 0 and 1, joined by a single edge labelled 6.]
and then followed by to get:
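[Diagram: a graph on the four vertices 00, 10, 01, 11, consisting of two copies of the previous two-vertex graph (each with an edge labelled 6), joined by two edges labelled 5.]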
Now we do to get:
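[Diagram: a graph on the eight vertices 000, 100, 010, 110, 001, 101, 011, 111, consisting of two copies of the previous four-vertex graph (edges labelled 6, 6, 5, 5 in each), joined by one further edge labelled 3.]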
followed by and then to get:
[Diagram: a graph on all sixteen binary strings of length 4 as vertices, consisting of two copies of the previous eight-vertex graph (distinguished by the last binary digit), joined by two further edges labelled 4.]
Note that in these diagrams, the set of indices of the corresponding linear data are where , and there is no edge labelled incident to vertex . In other words, there is a surviving linear form attached to each vertex (which has not been Cauchy–Schwarzed away) for each which is not a vertex label at .
Denote this last datum by $\Psi_{1}=\big(\mathcal{V},(W_{i}^{(1)})_{i\in I_{1}},(\psi_{i}^{(1)})_{i\in I_{1}}\big)$. Explicitly: is the subspace of determined by the above graph of vector spaces; the index set is
[TABLE]
the vector spaces for are all just ; and are given by
[TABLE]
Our next task is to prune back all of the squares apart from the bottom right one using and steps. We will need the following standard linear algebra fact.
Lemma 4.8**.**
Let be vector spaces and for be linear maps. Then there exist maps such that
- for ;
- for ; and
- the maps , commute.
Proof.
Pick a basis for , and extend it separately to a basis for and ; merging these gives a basis for . Finally extend this to a basis for . This gives a direct sum decomposition where and . Let and . It is clear these maps have the desired properties. ∎
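The basis construction in this proof can be sketched as follows. We assume that the maps in question are the two projections onto subspaces U_1 and U_2, along complements built from one adapted basis. This is our reading, since part of the statement is not recoverable here. Both projections are diagonal in the adapted basis and therefore commute. Arithmetic is exact over Q via `fractions`, and the function names and example subspaces are hypothetical.

```python
from fractions import Fraction


def inverse(M):
    """Inverse of a square matrix over Q by Gauss-Jordan elimination."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)  # fails if M is singular
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def commuting_projections(C, A, D, E):
    """C: basis of U1 ∩ U2; C + A: basis of U1; C + D: basis of U2; E completes a basis.
    Returns projections P1 onto U1 and P2 onto U2, both diagonal in the adapted basis."""
    basis = C + A + D + E
    n = len(basis)
    B = [[basis[j][i] for j in range(n)] for i in range(n)]  # basis vectors as columns
    B_inv = inverse(B)

    def projection(keep):
        diag = [[Fraction(int(i == j and i in keep)) for j in range(n)] for i in range(n)]
        return matmul(matmul(B, diag), B_inv)

    c, a, d = len(C), len(A), len(D)
    P1 = projection(set(range(c + a)))
    P2 = projection(set(range(c)) | set(range(c + a, c + a + d)))
    return P1, P2
```

In the test, U_1 = {z = 0} and U_2 = {x = z} meet in the line spanned by (0, 1, 0). Both composites P_1P_2 and P_2P_1 equal the projection onto that line.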
Consider e.g. the bottom left square consisting of , , , . We now merge the indices for into a single index , and all eight indices , for into a single index ; that is, we apply
[TABLE]
to $\Psi_{1}$ to obtain the merged datum $\Psi_{2}=\big(\mathcal{V},(W^{(2)}_{i})_{i\in I_{2}},(\psi^{(2)}_{i})_{i\in I_{2}}\big)$.
Let denote the vector space associated to the following graph of vector spaces on :
[Diagram: the previous sixteen-vertex graph with the square on 0010, 1010, 0110, 1110 pruned back to its single vertex 0110, which keeps its edge labelled 3; thirteen vertices in all.]
and define a datum $\Psi_{3}=\big(\mathcal{V}^{\prime},(W_{i}^{(3)})_{i\in I_{3}},(\psi^{(3)}_{i})_{i\in I_{3}}\big)$ where:
- is the same as ;
- and for every ;
- and
[TABLE]
(It is not very important, but these are all surjective, as can be seen by considering the image of the diagonal embedding .)
We claim that is dominated trivially by the “pruned” datum . To justify this, we first apply Lemma 4.8 to , and to obtain maps . We can then define an injection by
[TABLE]
For this to make sense, we need the compatibility conditions associated to the graph for to hold. In particular we need
[TABLE]
and indeed these follow from the properties of and . The remaining compatibility conditions are inherited from .
We now give the step explicitly. For the map is just the identity. For and , consider that
[TABLE]
as the original constituent forms of depend only on and those of on in . It follows that there exist unique linear maps and such that and , respectively. Hence the conditions of Proposition 3.4 are satisfied and dominates trivially.
It is natural to relabel as and as in , as these indices now behave exactly like copies of and respectively associated to the vertex .
We can summarize the preceding argument, which took us from the 16-vertex graph to the 13-vertex graph above, informally as follows. On the dual side, we can say that we built a projection map that maps each of $(\ker\psi^{(2)}_{i_{0010}})^{\perp}$ into $(\ker\psi^{(3)}_{R})^{\perp}$ and each of $(\ker\psi^{(2)}_{i_{1010}})^{\perp}$ or $(\ker\psi^{(2)}_{i_{1110}})^{\perp}$ into $(\ker\psi^{(3)}_{S})^{\perp}$ (for ). Moreover, was constructed by projecting out the spare coordinates at the vertices , , using along vertical edges and along horizontal edges, and we checked this made sense. This is what is meant by the following further annotated diagram (we will see more of this kind below).
[Diagram: the sixteen-vertex graph again, with the square on 0010, 1010, 0110, 1110 annotated: its two edges labelled 6 carry the map $\mathfrak{s}_{2}^{\ast}$, its two edges labelled 5 carry the map $\mathfrak{s}_{1}^{\ast}$, and its vertices carry the annotations 1234 // ∅ (once) and ∅ // 1234 (twice).]
We then repeat these same steps on the top left and top right corners. The result is a datum $\Psi_{4}=\big(\mathcal{V}^{\prime\prime},(W_{i}^{(4)})_{i\in I_{4}},(\psi^{(4)}_{i})_{i\in I_{4}}\big)$ corresponding to the 7-vertex configuration
[Diagram: a graph on the seven vertices 0000, 1000, 0100, 1100, 0110, 0101, 0111; the first four form a square with edges labelled 6, 6, 5, 5, and the remaining edges are labelled 3, 3, 4 and 4.]
and which dominates , and hence , respecting for , with exponent . Explicitly:
- the index set of is
[TABLE]
- the space is that associated to the above graph of vector spaces, and so is a subspace of where ;
- we have when or when ; and
- for each .
This completes the first phase of the argument.
We now perform our remaining operation. This partitions all remaining forms apart from those in the copy into two classes and , by
[TABLE]
and we call the merged datum . This corresponds to the annotated diagram discussed above:
[Diagram: the 7-vertex graph above, with the vertices 1000, 0100, 1100, 0110, 0101, 0111 annotated by 12 // 34, 2 // 1, 12 // 34, 56 // 12, 15 // 26 and 56 // 12 respectively.]
and we note the index set is now .
Finally, we wish to dominate trivially by an augmented datum . Recall that is as follows:
- its base space is ;
- the index set is ;
- the spaces are given by for and ; and
- for , and for , where are the given forms (which satisfy , ), and are forms we may choose.
In what follows we identify indices and for , and , and and . Our remaining task is therefore to construct linear maps , and which, together with the identity maps for , satisfy the conditions of Proposition 3.4.
We fix some notation. Again write for the points in . Let , , , be defined as above (see Figure 1); that is, is the intersection of the lines , is the intersection of the lines and , is , and is . Write for each .
Also recall ; so is naturally a subspace of , and we write for the other summand, so that . Dually, we may make an identification , and thereby identify and with subspaces of . Let be the linear form , meaning that .
We make a simplifying observation. If , then and , so the effect of the whole block move was just to swap and . In this case, the result is trivially satisfied by exchanging the indices and (and ignoring everything we’ve done up to this point). Hence we can assume in what follows.
We isolate a linear algebraic lemma which states concretely what is needed for this step.
Lemma 4.9**.**
There exist subspaces , of (which is itself a subspace of ), and linear maps , with the following properties:
- (i) composition with fixes and (i.e., and );
- (ii) composition with fixes and (i.e., and );
- (iii) similarly, and ;
- (iv) and ;
- (v) and ;
- (vi) and ;
- (vii) and ;
- (viii) and have dimension at most , and and are contained in the -dimensional subspaces corresponding to , respectively.
Indeed, suppose this lemma holds. We may define
[TABLE]
which is perhaps best summarized by further annotating the above diagram as follows:
[Diagram: the annotated 7-vertex graph above, with edges and vertices further annotated by the maps $\tau_{2}^{\ast}\circ\tau_{4}^{\ast}$, $\tau_{4}^{\ast}$, $\tau_{2}^{\ast}$, $\operatorname{id}$, $\tau_{1}^{\ast}\circ\tau_{3}^{\ast}$, $\tau_{3}^{\ast}$, $\tau_{1}^{\ast}$ and $\operatorname{id}$.]
Statements (i)–(iii) ensure that makes sense, i.e. that all the compatibility conditions in the definition of are satisfied. By the fact that and statement (viii), for any with , we can find such that and ; we use these , to complete the definition of , and thereby . Then, statements (iv)–(vii) are precisely what we need to deduce that
[TABLE]
and as before this guarantees that there exist unique maps and such that and respectively.
The proof of this lemma is unpleasant and technical linear algebra, and will occupy the rest of the section.
Proof of Lemma 4.9.
Note that are -dimensional subspaces of corresponding to , i.e. . Also, are subspaces of of dimension such that and are the -dimensional subspaces , corresponding to respectively.
We first construct the map . Roughly speaking, is a projection from to the line , which collapses the line onto and the line onto . Specifically, we want the following.
Claim**.**
We can find satisfying the following conditions: is the identity; and writing and , then is just the -dimensional subspace corresponding to and the -dimensional subspace corresponding to .
Note that the fact implies (i) from the statement.
Proof of Claim.
Let be the intersection point of the lines and (which exists as, say, are not collinear). Since we are assuming , it follows that does not lie on the line .
Note and , and that and are -dimensional subspaces corresponding to the lines and respectively. Writing , we have (as ), and furthermore the intersection is precisely the -dimensional subspace corresponding to .
Hence, we may pick some non-zero , and extend this to a basis for ; so and necessarily .
It follows that is a basis for , and so we may define by:
[TABLE]
which immediately implies that is the identity.
Since and are linearly independent vectors mapped to [math] by , it follows that has dimension at most . Moreover, any vector in the -dimensional subspace corresponding to is in and also in , so is fixed by . It follows that contains the -dimensional subspace corresponding to ; so in fact is exactly this subspace.
A parallel argument shows is exactly the -dimensional subspace . ∎
The construction of is similar, only in reverse, i.e. projecting back onto the subspace .
Claim**.**
We can find a map such that the following hold. Define and . Then are subspaces of of dimension at most ; and , are contained in the -dimensional subspaces corresponding to , respectively.
Note this gives the definition of the subspaces and from the statement.
Proof of claim.
Let be any basis for ; so (say) is a basis for . Define by:
[TABLE]
So, is a projection onto the subspace , and in particular its image is . Also, , all have dimension , and so the first part of the claim follows.
Suppose . Necessarily , and for some . Since , we may write for some and , and then . But , and , so and . We deduce that lies in (as does) and in ; so lies in the -dimensional subspace which corresponds precisely to .
A parallel argument shows that is contained in the subspace corresponding to . ∎
At this point properties (ii), (iv), (v) and (viii) from the statement are satisfied, in addition to (i) as discussed. Indeed, (ii) follows from the requirement , since . The facts and , and similarly and , give (iv). Property (v) is immediate from the definition of , , and (viii) is contained in the previous claim.
Our final task is to construct and . Specifically, we want:
Claim.
There exist maps and such that
- ;
- ;
- ;
and
- ;
- ;
- .
Combined with the properties of and we have already shown, this suffices for the remaining parts (iii), (vi) and (vii) of the statement.
We isolate yet another linear algebra sub-claim.
Lemma 4.10.
Given a tuple where are two distinct -dimensional subspaces of a vector space of dimension , and is a point lying in neither subspace, and given another such configuration , there is some isomorphism mapping , , .
Proof.
After a change of coordinates, we may assume and (e.g. by considering two corresponding points in and extending to a basis). Suppose in these coordinates; so by assumption. By a further rescaling of coordinates we may therefore assume . Finally, a change of variables means and are unchanged.
Repeating this argument for and interpreting the changes of coordinates as an isomorphism gives the result. ∎
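The change of coordinates in the proof of Lemma 4.10 can be made concrete. The sketch below assumes, for illustration only, that the two subspaces are planes in $\mathbb{F}_p^3$ (lines of the projective plane), each given as the kernel of a nonzero linear form; all function names are hypothetical. Each configuration is first sent to a normal form in which the subspaces are $\{x_1 = 0\}$ and $\{x_2 = 0\}$ and the point is $(1,1,0)$, mirroring the rescaling and final change of variables in the proof; composing one normalising map with the inverse of the other gives the required isomorphism.

```python
from typing import List, Tuple

Vec = List[int]
Mat = List[List[int]]
Config = Tuple[Vec, Vec, Vec]  # (form defining U, form defining V, point v)


def dot(a: Vec, b: Vec, p: int) -> int:
    """Evaluate the linear form a at the vector b, modulo p."""
    return sum(x * y for x, y in zip(a, b)) % p


def apply(m: Mat, v: Vec, p: int) -> Vec:
    """Apply the matrix m (a list of rows) to the column vector v, modulo p."""
    return [dot(row, v, p) for row in m]


def inverse_mod_p(m: Mat, p: int) -> Mat:
    """Invert a square matrix over F_p by Gauss-Jordan elimination."""
    n = len(m)
    aug = [[x % p for x in row] + [int(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular mod p")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)
        aug[col] = [x * inv % p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [(x - f * y) % p for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def normalising_map(cfg: Config, p: int) -> Mat:
    """Coordinates in which U = {x1 = 0}, V = {x2 = 0} and v = (1, 1, 0)."""
    l1, l2, v = cfg
    a, b = dot(l1, v, p), dot(l2, v, p)
    if a == 0 or b == 0:
        raise ValueError("v must lie in neither subspace")
    row1 = [x * pow(a, -1, p) % p for x in l1]  # rescale so row1(v) = 1
    row2 = [x * pow(b, -1, p) % p for x in l2]  # rescale so row2(v) = 1
    for k in range(3):
        e = [int(i == k) for i in range(3)]
        c = dot(e, v, p)
        row3 = [(x - c * y) % p for x, y in zip(e, row1)]  # forces row3(v) = 0
        m = [row1, row2, row3]
        try:
            inverse_mod_p(m, p)
            return m
        except ValueError:
            continue  # e_k lies in the span of l1, l2; try the next one
    raise ValueError("U and V must be distinct")


def config_isomorphism(cfg: Config, cfg2: Config, p: int) -> Mat:
    """T = B^{-1} A, which maps U -> U', V -> V' and v -> v'."""
    a = normalising_map(cfg, p)
    b_inv = inverse_mod_p(normalising_map(cfg2, p), p)
    return [[sum(b_inv[i][k] * a[k][j] for k in range(3)) % p
             for j in range(3)] for i in range(3)]
```

For example, with $p = 7$, mapping the configuration given by the forms $(1,2,3)$, $(0,1,1)$ and the point $(1,1,1)$ to the one given by $(1,1,0)$, $(0,1,2)$ and $(2,0,1)$, the resulting matrix sends $(1,1,1)$ to $(2,0,1)$ and each kernel onto the corresponding kernel.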
Proof of claim.
Let be the map given by the lemma applied to the tuples and in . Note that is not on , , or by our assumptions, and , , so the hypotheses are satisfied. Then let . It follows that this has the properties claimed.
Now let be the map given by the lemma applied to and . Again, does not lie on or , or on or (as the latter two imply respectively that or are collinear); and and (for many reasons); so this is valid.
Now let be the unique map defined by for any and , where is the linear form from the definition of , as in Definition 4.5. (This is possible: choose a basis for and extend it to a basis for by adding ; then define suitably on this basis.) Again, this gives the desired properties. ∎
This concludes the proof of Lemma 4.9, and thereby that of Lemma 4.7. ∎
∎
This is the last ingredient in the proof of Theorem 1.5. We briefly summarize the proof as a whole, as the different parts have been spread over the last few sections.
First one calculates the point , given explicitly in Lemma 4.3, corresponding to the point on in our chosen coordinates.
Next, we convert the standard datum given into an augmented datum (by Lemma 4.6).
In the main part of the argument, we apply Lemma 4.7 repeatedly, under various permutations of , following the steps from Lemma 4.2 applied to the point .
If at any point we arrive at a datum where are collinear for some and , we terminate this process early; but if it runs to completion, some such collinearity is guaranteed at the end. By Lemma 4.6 again (and Lemma 4.4) we dominate this by the corresponding standard datum.
Finally, we apply Proposition 2.3, or the standard Cauchy–Schwarz complexity bound (Proposition 1.1), to control this final datum by or respectively. By keeping track of the various domination statements, and noting in particular that we did not apply Lemma 4.7 too many times, we deduce the required bound on the original linear datum.
5 A proof of Theorem 1.7
Here we construct the counterexample described in Theorem 1.7.
As in the statement, let be a large prime. The congruence condition ensures that is a quadratic residue modulo . In what follows, we will assume that some choice of square root of in has been fixed, and refer to it simply as .
We let denote the two-dimensional arithmetic progression:
[TABLE]
for some small absolute constant to be specified.
We note that any value has at most one representation as where are integers with . Indeed, if then is a multiple of ; but it has absolute value at most . Hence, which is a contradiction unless , .
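The uniqueness argument can be checked numerically. The sketch below assumes, purely for illustration, that the fixed square root is a square root $s$ of $-1$ (so $p \equiv 1 \pmod 4$) and that the progression consists of the residues $a + bs$ with $|a|, |b| \le c\sqrt{p}$; the residue and constant actually used in Theorem 1.7 may differ, and the function names are hypothetical. Under this assumption a collision $a + bs \equiv a' + b's$ forces $(a-a')^2 + (b-b')^2 \equiv 0 \pmod p$, and this non-negative integer is at most $8c^2 p$, so any $c < 1/(2\sqrt{2})$ rules collisions out, whereas a large $c$ forces collisions by pigeonhole.

```python
import math


def sqrt_of_minus_one(p: int) -> int:
    """Brute-force some s with s^2 = -1 mod p (one exists iff p = 1 mod 4)."""
    for s in range(2, p):
        if s * s % p == p - 1:
            return s
    raise ValueError("-1 is not a quadratic residue mod p")


def progression_is_injective(p: int, c: float) -> bool:
    """Check that (a, b) -> a + b*s mod p is injective on |a|, |b| <= c*sqrt(p)."""
    s = sqrt_of_minus_one(p)
    bound = math.floor(c * math.sqrt(p))
    seen = set()
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            residue = (a + b * s) % p
            if residue in seen:
                return False
            seen.add(residue)
    return True


print(progression_is_injective(1009, 1 / 3))  # True: 8c^2 < 1, so no collisions
print(progression_is_injective(1009, 1.0))    # False: 63^2 pairs exceed 1009 residues
```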
Now, define the system of linear forms by:
[TABLE]
We claim that these forms do not lie on a conic, i.e. the system has true complexity . Since , we need not distinguish between symmetric bilinear forms and quadratic forms, so it makes sense to write
[TABLE]
and then it suffices to verify that the matrix (whose columns correspond to )
[TABLE]
is non-singular. In fact, the determinant of this matrix is , and so do not lie on a conic.
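The non-singularity test can be phrased as a small computation. Six forms, identified with their coefficient vectors $(a, b, c)$, lie on a conic exactly when some nonzero quadratic form in three variables vanishes at all six vectors, that is, when the $6 \times 6$ matrix with rows $(a^2, b^2, c^2, ab, bc, ca)$ (or its transpose) is singular modulo $p$. The sketch below computes the determinant exactly over the rationals and then reduces it modulo $p$; the two example systems and all function names are hypothetical and are not the forms of Theorem 1.7.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Form = Tuple[int, int, int]  # coefficients (a, b, c) of a*x + b*y + c*z


def conic_matrix(forms: Sequence[Form]) -> List[List[int]]:
    """Row i lists the six quadratic monomials evaluated at the i-th coefficient vector."""
    return [[a * a, b * b, c * c, a * b, b * c, c * a] for a, b, c in forms]


def det(m: List[List[int]]) -> int:
    """Exact determinant of an integer matrix via fraction-valued elimination."""
    rows = [[Fraction(x) for x in row] for row in m]
    n = len(rows)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            result = -result
        result *= rows[col][col]
        for r in range(col + 1, n):
            f = rows[r][col] / rows[col][col]
            rows[r] = [x - f * y for x, y in zip(rows[r], rows[col])]
    return int(result)


def lie_on_conic_mod_p(forms: Sequence[Form], p: int) -> bool:
    """Six forms lie on a conic over F_p iff the monomial matrix is singular mod p."""
    return det(conic_matrix(forms)) % p == 0


generic = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 1)]
on_conic = [(1, t, t * t) for t in range(6)]  # all satisfy x*z - y^2 = 0
print(det(conic_matrix(generic)), lie_on_conic_mod_p(generic, 7))    # 1 False
print(det(conic_matrix(on_conic)), lie_on_conic_mod_p(on_conic, 7))  # 0 True
```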
Our reason for choosing these specific forms is that they nonetheless satisfy a kind of “skew conic” identity which we now describe. Suppose we define forms using the same coefficients as above, now thought of as elements of the ring . These also induce forms in the obvious way. Write for the Galois conjugate and for the norm of (so ). Then the “skew conic” identity is the fact that:
[TABLE]
for any .
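To experiment with the lifted forms, one needs arithmetic in a quadratic ring $\mathbb{Z}[\sqrt{D}]$ together with the Galois conjugate $\overline{a + b\sqrt{D}} = a - b\sqrt{D}$ and the norm $N(\alpha) = \alpha\bar{\alpha} = a^2 - Db^2$. The sketch below implements these, plus reduction modulo $p$ via a fixed square root $s$ of $D$; the map $a + b\sqrt{D} \mapsto a + bs \bmod p$ is a ring homomorphism whenever $s^2 \equiv D \pmod p$. The choice $D = -1$ and the class name `QuadInt` are hypothetical stand-ins, not the ring or notation of the paper.

```python
from dataclasses import dataclass

D = -1  # hypothetical choice of non-square D; Z[sqrt(-1)] is the Gaussian integers


@dataclass(frozen=True)
class QuadInt:
    """The element a + b*sqrt(D) of Z[sqrt(D)]."""
    a: int
    b: int

    def __add__(self, other: "QuadInt") -> "QuadInt":
        return QuadInt(self.a + other.a, self.b + other.b)

    def __mul__(self, other: "QuadInt") -> "QuadInt":
        return QuadInt(self.a * other.a + D * self.b * other.b,
                       self.a * other.b + self.b * other.a)

    def conj(self) -> "QuadInt":
        """Galois conjugate: a - b*sqrt(D)."""
        return QuadInt(self.a, -self.b)

    def norm(self) -> int:
        """N(alpha) = alpha * conj(alpha) = a^2 - D*b^2."""
        return self.a * self.a - D * self.b * self.b

    def reduce(self, s: int, p: int) -> int:
        """Image in F_p under sqrt(D) -> s, assuming s*s = D (mod p)."""
        return (self.a + self.b * s) % p


alpha, beta = QuadInt(3, 1), QuadInt(-1, 4)
print((alpha * beta).norm() == alpha.norm() * beta.norm())  # True
# 5^2 = 25 = -1 (mod 13), so s = 5 is a square root of D = -1 modulo 13
print((alpha * beta).reduce(5, 13) == alpha.reduce(5, 13) * beta.reduce(5, 13) % 13)  # True
```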
So, we can define a function
[TABLE]
for some parameter to be specified; so is supported on . We claim, for say , that for every the expression
[TABLE]
takes value either , if all six forms take values in , or 0 otherwise. This is essentially a kind of Freĭman isomorphism argument over .
Indeed, note that tuples or $\big(\widetilde{\phi_1}(x,y,z),\dots,\widetilde{\phi_6}(x,y,z)\big)$ for or respectively are precisely those tuples satisfying the equations
[TABLE]
So, if are all in , write for the unique integers with , and let be the corresponding elements of . Then each of the left-hand sides of the equations above has the form for integers with , and is congruent to modulo , so by our earlier uniqueness argument must be 0. Hence $(r_1,\dots,r_6)=\big(\widetilde{\phi_1}(x,y,z),\dots,\widetilde{\phi_6}(x,y,z)\big)$ for some , so (5) applies and the claim follows.
Given that for any of the form for integers with , we deduce that
[TABLE]
for sufficiently large, as required.
Finally we need to consider . This is a fairly standard estimate on quadratic exponential sums, but with some variations. For simplicity we use a mean value strategy.
For , let denote where are the unique integers in with . Then for any such that are in , we have , for some unique integers , and
[TABLE]
So, for any such , if we now take an average in the parameter , we get
[TABLE]
It follows that
[TABLE]
and we note that for fixed and any fixed there is at most one solution in to , so the right-hand side is bounded by
[TABLE]
whenever is sufficiently large. Picking some for which is at most its mean value, the result follows.