
TL;DR
This paper provides a refined estimate for counting rational points on Grassmannian varieties with bounded height, improving classical results and extending to all points, with implications for flag varieties.
Contribution
It introduces a new counting formula for rational points on Grassmannians that counts all points, refining previous bounds and extending to flag varieties.
Findings
Derived a comprehensive estimate for rational points on Grassmannians.
Extended counting results to flag varieties.
Improved upon classical bounds by including all points.
Abstract
We prove an estimate on the number of rational points on the Grassmannian variety of bounded twisted height, refining the classical results of Schmidt ([12]) and Thunder ([20]) over the rational field: most importantly, our formula counts all points. Among the consequences are a couple of new implications on the classical subject of counting rational points on flag varieties.
Counting rational points of a Grassmannian
Seungki Kim
Key words and phrases:
rational points, Grassmannians, flag varieties, Manin’s conjecture.
2020 Mathematics Subject Classification:
11H06, 11G50, 14G05
1. Introduction
1.1. Main result
For a lattice, and , let be the number of primitive rank sublattices of of determinant less than or equal to . The purpose of this paper is to investigate the quantitative behavior of . The earliest result of this kind goes back to the mid-twentieth century, due to W. Schmidt ([12]):
Theorem 1.1 (Schmidt [12], Theorem 1).
Let
[TABLE]
where is the volume of the unit ball in and is the Riemann zeta function, except that we understand for convenience. Then
[TABLE]
where the implicit constant depends on only.
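As a sanity check on the shape of such counting formulas, the following sketch brute-forces the simplest situation: rank-one primitive sublattices of the integer lattice Z^2. It is an illustration only and not part of the paper's argument. The function name `count_primitive_rank_one` is hypothetical, and the comparison value 3H^2/π is the classical asymptotic count of primitive integer vectors of length at most H, taken up to sign.

```python
from math import gcd, pi


def count_primitive_rank_one(H):
    """Count rank-one primitive sublattices of Z^2 of determinant <= H.

    Such a sublattice is Z*v for a primitive vector v (gcd of entries 1),
    and its determinant is the Euclidean length of v. The vectors v and -v
    span the same sublattice, hence the final division by 2.
    """
    count = 0
    for x in range(-H, H + 1):
        for y in range(-H, H + 1):
            # gcd(0, 0) == 0, so the origin is excluded automatically.
            if x * x + y * y <= H * H and gcd(x, y) == 1:
                count += 1
    return count // 2


for H in (10, 50, 200):
    exact = count_primitive_rank_one(H)
    # Ratio of the brute-force count to the classical leading term.
    print(H, exact, exact / (3 * H * H / pi))
```

The printed ratio should approach 1 as H grows, with the deviation reflecting the lower-order error term.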
For of full rank, may also be understood in terms of a counting problem on the Grassmannian variety consisting of the -dimensional subspaces of . A rational point on is a -dimensional subspace such that is a rank sublattice of . Its height is given by the determinant of , and given a real matrix , its height twisted by is given by the determinant of the rank lattice
[TABLE]
(we write vectors horizontally, so that matrices multiply from the right). Observe that, if and are two matrices whose row vectors generate the same lattice , i.e. for some , then the number of rational points whose heights are bounded by is the same, regardless of whether one twists the height by or . In addition, one checks that this number is precisely as defined above. We refer the reader to Thunder ([19, Introduction], [20, Part I]) for the general definition of a twisted height, which was first introduced there.
From this perspective, Thunder ([20]) proved a vast generalization of Theorem 1.1 above, extending it to any lattice and to any number field (here -modules play the role of lattices). His result, from the 1990s, remains the state of the art to this day. We state his result in case :
Theorem 1.2 (Thunder [20], Theorem 3).
Let be a lattice of full rank. In addition to the notations in Theorem 1.1, define
[TABLE]
where are choices of linearly independent vectors in such that , where is the -th successive minimum of defined by
[TABLE]
where here is the closed ball at origin of radius in . Let be the number of rank sublattices of of determinant whose intersection with is trivial. Then
[TABLE]
where the implicit constant depends only on .
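To make the notion of successive minima concrete, here is a minimal brute-force sketch for a rank-two lattice in the plane. It assumes the usual definition (the i-th successive minimum is the smallest radius of a closed ball at the origin containing i linearly independent lattice vectors). The function name `successive_minima_2d` and the search bound `coeff_bound` are hypothetical choices made for this example.

```python
from math import sqrt


def successive_minima_2d(b1, b2, coeff_bound=10):
    """Brute-force the two successive minima of the lattice Z*b1 + Z*b2.

    Assumption: the minimizing vectors have integer coefficients in
    [-coeff_bound, coeff_bound]. This holds for the small examples used
    here but is not verified in general.
    """
    vectors = []
    for a in range(-coeff_bound, coeff_bound + 1):
        for b in range(-coeff_bound, coeff_bound + 1):
            if (a, b) != (0, 0):
                vectors.append((a * b1[0] + b * b2[0], a * b1[1] + b * b2[1]))
    vectors.sort(key=lambda v: v[0] ** 2 + v[1] ** 2)
    first = vectors[0]
    # The second minimum is attained by the shortest vector that is not
    # parallel to the first one (nonzero 2x2 determinant).
    second = next(v for v in vectors if first[0] * v[1] - first[1] * v[0] != 0)
    return sqrt(first[0] ** 2 + first[1] ** 2), sqrt(second[0] ** 2 + second[1] ** 2)


# A skewed example: the basis (1, 0), (3, 5) spans a lattice of determinant 5.
lam1, lam2 = successive_minima_2d((1, 0), (3, 5))
det = abs(1 * 5 - 0 * 3)
print(lam1, lam2, lam1 * lam2 / det)
```

In this example the product of the two minima equals the determinant exactly; in general Minkowski's second theorem only guarantees that the two quantities agree up to constants depending on the dimension.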
A notable feature of Theorem 1.2 is that it provides an explicit description of the dependence of the error term on the successive minima of (observe that by Minkowski’s second theorem). Informally speaking, it reflects the “skewness” of the lattice: in case is severely skewed, in the sense that is much smaller than for some , one expects a different behavior of the error term than in the case in which most are about equal. Theorem 1.2 may be seen as a realization of this intuition.
However, Thunder ([20]) does not provide a corresponding estimate for , remarking that it would “be a cumbersome task.” In the present paper, we introduce a method that circumvents this difficulty, and prove
Theorem 1.3’.
Continue with the notations in Theorems 1.1 and 1.2 above. Then for all ,
[TABLE]
where the implied constant depends only on .
In fact, we prove the more precise
Theorem 1.3.
Let be a lattice of full rank. For all ,
[TABLE]
where the implied constant in the big-O notation depends only on , and is a finite set of indices, of cardinality less than for . Each is associated with and of the form
[TABLE]
for some real such that . This makes the right-hand side of (1.2) scale-invariant i.e. it remains unchanged if and are replaced by and , respectively.
In particular, the leading error term is
[TABLE]
as in Theorem 1.2.
Remark.
If is of rank , we may adapt Theorem 1.3 by observing that, for any isometry , it holds that . The same applies to the results of a similar flavor that we state below.
Remark.
The estimate is an extremely crude one, provided only to assure that the sum is finite. Describing exactly from our computations would be quite a laborious task that would not yield a pretty formula and whose fruits seem unclear.
Thus one may feel that the complicated statement of Theorem 1.3 is unnecessary. However, we leave it as it is, since the precise knowledge of at least a part of the error term may be useful for certain applications. For instance, to prove Corollary 1.1 below, we really need the (close-to-)optimal version of the leading error term stated as in Theorem 1.3. If we could compute the coefficients ’s optimally for more ’s, we expect to be able to strengthen Corollary 1.1 accordingly.
We also state the subsequent results in the precise form. If one desires simplification, one may replace by an appropriate power of ’s, as in (1.1).
Before we go on to discuss a few applications of Theorem 1.3, let us present a few of its variants that may also be of use.
Theorem 1.4.
Let be the number of (not necessarily primitive) rank sublattices of of determinant . Also let
[TABLE]
Then similarly to Theorem 1.3, for a full-rank lattice we have
[TABLE]
where the implied constant in the big-O notation depends only on , and is a set of indices of cardinality at most . The description of (resp. ) is the same as that of (resp. ) in Theorem 1.3. Also, if , the leading error term is the same as in Theorem 1.3. If , then the largest for may be chosen to be , for any .
Theorem 1.5.
For a lattice , choose a primitive sublattice of rank , and let be the number of primitive rank sublattices of whose intersection with is trivial. For of full rank, we have
[TABLE]
where again the description of the error term is identical to that of Theorem 1.3, except that is now a set of cardinality less than . In particular, the leading error term is the same as in Theorem 1.3. Moreover, this formula is independent of .
The analogous statement holds for , which counts both primitive and non-primitive lattices.
1.2. Applications
Below we demonstrate a few immediate applications of Theorem 1.3 and the techniques used in its proof. Its main strength lies in the fact that it counts all the sublattices, and that it provides information regardless of how skewed the given lattice is, in particular relative to . In comparison, its predecessor, Theorem 1.2, misses the sublattices that nontrivially intersect , and thus it does not say anything about the lattices for which .
We expect there to be more uses of Theorem 1.3; for instance, see a recent work of Le Boudec ([7]), which employs the case (due to Schmidt ([12]), see (2.1) below) as the main device.
1.2.1. Rational points of flag varieties
It is natural to expect that a counting formula on Grassmannians should yield a counting formula for general flag varieties. Indeed, Thunder ([20]) derives such a formula as a relatively simple application of Theorem 1.2. We present its simplest case to initiate the discussion:
Theorem 1.6 (Thunder [20], Theorem 5).
Let be a lattice of rank , and suppose is sufficiently large. Then the number of flags of type (hence ) such that for , and , whose height twisted by is at most is
[TABLE]
where is some explicit constant depending only on , and the implicit constant in the error depends only on .
In this context, the height of the flag is the quantity ; see e.g. Thunder ([20]) for details.
In comparison, we can derive from Theorem 1.3 the following
Corollary 1.1.
Let be a lattice of rank , and be sufficiently large — more precisely, . Then the number of flags of type such that whose height twisted by is at most is
[TABLE]
where and likewise for , is the same as in (1.3), the implicit constant depends on only, is an index set of cardinality at most , and and are as in the statement of Theorem 1.3, except that .
Furthermore, the largest is , and the next largest is either
[TABLE]
if , it is always .
In order to keep the proof relatively short and simple, we had to keep some of the assumptions made by Theorem 1.6. Still, it has a couple of new implications that may be of interest. First, it shows that there must exist a gap between Theorem 1.6 and an ideal counting formula that would count all the rational flags, and that it must be at least of size
[TABLE]
For a heavily skewed , for instance if but , then this is of comparable size to the main term.
The second implication has to do with the error term in the well-known theorem of Franke, Manin, and Tschinkel ([4]) on the number of rational points on flag varieties. In the corollary to Theorem 5 therein, which says that the number of rational points on a flag variety of (“untwisted”) height bounded by is
[TABLE]
where is a polynomial of degree , they conjecture that the error term is of size with . On the other hand, when is a Grassmannian, the literature (for instance Schmidt [12], and Thunder [19] [20]) suggests that , as their analyses seem fairly sharp. Our Corollary 1.1 extends this to flag varieties of type , suggesting that we have again, at least when . In case , one may be able to estimate by duality. But in general the nature of appears rather complicated.
It is possible to prove by a similar argument an analogue of Corollary 1.1 for a flag variety of any type that is strong enough to yield these same implications. On the other hand, we expect the ideal formula that would count all points on a flag variety to require another substantial amount of effort along the lines of the present paper. As stated in the remark after Theorem 1.3, if we could find explicit expressions for every in (1.2), preferably containing large powers of , it would allow our error-bounding techniques to apply immediately. Using the methods of the present paper, it may be possible to achieve this for the first few small values of .
1.2.2. Mean value theorems over lattices
Mean value theorems over random lattices provide a method of averaging lattice-point counting formulas over the space of determinant full-rank lattices in . The first known such theorem is the famous Siegel integration formula:
Theorem 1.7 (Siegel [17], Theorem on p.341).
Let be a Borel measurable and compactly supported function. Then
[TABLE]
where is the normalized Haar measure on , and is the Lebesgue measure.
For example, if one sets to be the characteristic function of a set , then the sum on the left-hand side of (1.5) equals , and thus Theorem 1.7 implies that a random lattice sampled according to has on average nonzero vectors contained in . Another important example is the Rogers integral formula, one of the main tools in geometry of numbers (see e.g. [5], [15], [18] for some of the applications):
Theorem 1.8 (Rogers [9], (essentially) Theorem 4).
For , let be a Borel measurable and compactly supported function. Then
[TABLE]
where is the normalized Haar measure on , and each is the Lebesgue measure.
The author’s work [6], written concurrently with the present paper, develops the machinery that turns a lattice-point counting formula, such as Theorems 1.3 and 1.5, into a mean-value theorem, inspired by the argument of Rogers ([9]). As a result, the following extension of Theorem 1.8 to Grassmannians is obtained from Theorem 1.5:
Theorem 1.9 (Kim [6], Theorem 3).
Suppose , and . Define to be the number of independent primitive sublattices of of ranks and determinants bounded by , respectively. Then
[TABLE]
We note that Thunder ([21]) proved the case of this result. Theorem 1.9 has several implications on the statistics of the randomized heights of the points on Grassmannians and flag varieties, such as the following.
Corollary 1.2 (Kim [6], Corollary to Theorem 5).
Let , and . For a lattice and its flag of type , its height is defined as the quantity
[TABLE]
Let be the number of type flags of whose heights are bounded by . Then
[TABLE]
It may be interesting to compare this result with Theorem 1.6 and Corollary 1.1. The author speculates that the divergence here is related to the main term of (1.4) in the statement of Corollary 1.1 being dependent on the skewness of the lattice in question.
1.3. Method of proof
All previous works on this topic ([12], [19], [20]) count “upwards,” i.e. they construct the -dimensional sublattice from either a -dimensional sublattice or a -dimensional sublattice lying in an -dimensional ambient space. Our main idea is to take the dual approach, and count “downwards” instead: we project all the -dimensional sublattices to a hyperplane, and count the cardinality of each fiber. This lets us bypass some of the technical difficulties that arise when counting upwards.
To elaborate, we prove Theorem 1.3 by the following inductive procedure that resembles the Pascal’s triangle method of computing the binomial coefficients. In case or , the formulas are well-known. Otherwise, let be the projection of onto the orthogonal complement of a shortest nonzero vector of . Then we have
[TABLE]
where can be regarded as a certain integral transformation. For a choice of a basis of and a sublattice of rank , let us say is of d-type — “d” stands for “dual” — if the projection of onto has rank . Then the first term on the right-hand side of (1.6) is counting the sublattices of d-types , and the second term is counting those of d-types .
In comparison, Theorem 1.2 counts precisely the sublattices of d-type . The upward counting method forces one to count the sublattices of each d-type separately, which is precisely what Thunder refers to as being “cumbersome.” The downward method resolves this difficulty.
Most of this paper is devoted to explicitly writing out and estimating . Many parts of the computation can be done by slightly refining the methods of Schmidt ([12]) or Thunder ([20]). However, the fact that can be arbitrarily skewed presents a new difficulty, especially when bounding the error terms. This is resolved by comparing the gaps between the successive minima to : if for all , the lattice may be considered not so severely skewed, as the classical techniques continue to apply. If in contrast for some , we exploit this gap to finesse the desired error bound.
1.4. Organization
In Section 2, we introduce the definitions and notations used throughout the paper, and state the known formulas for and . In Section 3, we set up the induction argument, establishing the precise version of (1.6). Sections 4 and 5 are devoted to the main and error term estimates, respectively. Section 6 collects all the computations and concludes the proof of Theorem 1.3. The variants are all proved in Section 7.
1.5. Acknowledgment
Part of this work is supported by NSF grant CNS-2034176. The author thanks the anonymous referee, Lillian Pierce, Anders Södergren, and Jeffery Thunder for helpful comments and suggestions.
2. Background
2.1. Definitions, notations, and conventions
Unless mentioned otherwise, the definitions and notations of this section apply.
Generalities
The lowercase letter denotes a prime. Let us write for short. We use capital letters such as to refer to matrices, and calligraphic fonts such as to denote lattices. and are fixed integers throughout the paper.
As in the statement of Theorem 1.2, is the -th successive minimum of , and denotes (a choice of) a primitive -dimensional sublattice containing , which are linearly independent with . The -entry of a matrix is denoted by the lowercase letter of the name of the matrix indexed by . For example, if is a matrix, then . Similarly, if , then the -th entry of is denoted by .
Later, given a matrix and a vector , we will need to consider the matrix whose -th row equals . We write to describe such a matrix.
For two quantities and , means , where is a positive constant possibly depending on and but no other variables. means and . For example, Minkowski’s second theorem says that .
For two matrices and with rows, means they differ by the left multiplication by an element of . If the rows of each of and respectively span and in (whose precise definition is given below), means that .
Later in the paper, we will need a few facts from reduction theory. Let be a basis of a lattice , and be its Gram-Schmidt orthogonalization, that is, each is the projection of to the orthogonal complement of . Let us say the basis is reduced if each and for all . It is known by reduction theory (see e.g. [2, Chapter 1]) that any lattice has a reduced basis. Moreover, the LLL algorithm ([8]) outputs a reduced basis of any lattice, given any basis of that lattice.
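The Gram-Schmidt orthogonalization that underlies the notion of a reduced basis can be sketched in exact arithmetic as follows. This is only an illustration: the helper names `dot` and `gram_schmidt` are hypothetical, and the code does not test the reducedness conditions themselves. It does check the standard fact that the product of the squared Gram-Schmidt norms equals the squared determinant of the lattice.

```python
from fractions import Fraction


def dot(u, v):
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(basis):
    """Return the Gram-Schmidt orthogonalization (without normalization).

    The i-th output vector is the projection of the i-th basis vector onto
    the orthogonal complement of the span of the previous basis vectors.
    """
    ortho = []
    for v in basis:
        w = [Fraction(x) for x in v]
        for u in ortho:
            coeff = dot(v, u) / dot(u, u)  # exact, since u has Fraction entries
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        ortho.append(w)
    return ortho


basis = [(1, 0), (3, 5)]
ortho = gram_schmidt(basis)
squared_norms = [dot(u, u) for u in ortho]
print(ortho, squared_norms)
```

For the basis (1, 0), (3, 5) the squared norms multiply to 25, the square of the lattice determinant 5.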
and the determinant/height
A integral matrix is said to be primitive if can be completed to an element of . When , this agrees with the standard definition of a primitive vector. We denote the set of all primitive matrices by .
For a lattice of rank , a sublattice is said to be primitive if . We denote for the set of all rank primitive sublattices of inside . Choose a basis of , and a basis of . Let and respectively denote the and matrices, such that the -th row of is , and the -th row of is . One checks that by the fact that .
Suppose one chooses a different basis of , and let be the matrix such that the -th row of is . Then for some . Conversely, if for some and , then the rows of and span the same element of . Therefore, with a choice of whose rows span over , there exists a bijection between and the set of orbits of the action of on by left multiplication.
To make this even more explicit, recall that each element of is uniquely represented by a primitive Hermite normal form over . Thus there is also a bijection between and the set of all elements of form where is a primitive Hermite normal form over . Whenever convenient, we will use these identifications of interchangeably throughout the paper.
In order to simplify some notations, we adopt the unusual convention that all determinants are nonnegative (the groups and maintain their usual meanings, though). Specifically, for a square matrix , we write for the absolute value of its usual definition of determinant. For a non-square matrix , we define . For a lattice , we define to be its covolume within its -span. For , note that holds, where is any choice of a matrix whose row vectors form a basis of , and is any choice of an element of such that the row vectors of form a basis of .
For a matrix , we define
[TABLE]
and similarly for a lattice .
It is easy to see that, for any compactly supported function defined on a subset of , we have
[TABLE]
where is understood as the image of by the linear map induced by the matrix . Again we will switch freely between these notations as we see fit.
Orthogonality notions
Following Schmidt ([12]), we define the polar lattice of by . If , we define its orthogonal lattice by .
In addition, for a matrix whose -th row vector is denoted by , we define its polar matrix as the matrix whose -th row vector lies in and satisfies . Then the rows of generate the polar lattice of the lattice generated by the rows of .
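A small exact computation may help to picture the polar matrix. The sketch below assumes the usual characterization: the polar rows lie in the real span of the original rows and are biorthogonal to them, which amounts to the formula (X X^T)^{-1} X. The function name `polar_matrix_rank2` is hypothetical, and the code is restricted to two rows so that the Gram matrix can be inverted by hand.

```python
from fractions import Fraction


def polar_matrix_rank2(x1, x2):
    """Rows of the polar matrix of X = [x1; x2], computed as (X X^T)^{-1} X."""
    g11 = sum(a * a for a in x1)
    g12 = sum(a * b for a, b in zip(x1, x2))
    g22 = sum(b * b for b in x2)
    gram_det = Fraction(g11 * g22 - g12 * g12)
    # Inverse of the 2x2 Gram matrix is (1 / gram_det) * [[g22, -g12], [-g12, g11]].
    p1 = [(g22 * a - g12 * b) / gram_det for a, b in zip(x1, x2)]
    p2 = [(-g12 * a + g11 * b) / gram_det for a, b in zip(x1, x2)]
    return p1, p2


x1, x2 = (1, 0, 1), (0, 2, 1)
p1, p2 = polar_matrix_rank2(x1, x2)
pairings = [[sum(a * b for a, b in zip(p, x)) for x in (x1, x2)] for p in (p1, p2)]
print(p1, p2, pairings)
```

The matrix of pairings is the identity, which is the biorthogonality property assumed above.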
2.2. Base cases
In case , Theorem 1.3 is precisely Theorem 4 in [20] (also Lemma 2 of [12]), which states that
[TABLE]
Below in Lemma 3.6, we present an extension of (2.1) to an affine lattice, which we will need later.
In case , we apply the duality theorem (see Section 2 of [20]) to (2.1), which states that, for a sublattice and its orthogonal lattice ,
[TABLE]
holds, and thus
[TABLE]
Therefore (2.1) implies
[TABLE]
By the well-known facts that and (in fact, , by [1, Theorem 2.1]),
[TABLE]
so we can rewrite the above as
[TABLE]
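The duality used in this subsection can be checked by hand in a small case. The sketch below works in the integer lattice Z^3, where the classical statement is that a primitive sublattice and its orthogonal lattice have equal determinants; the normalization for a general ambient lattice is not reproduced here. The function name `gram_determinant` is hypothetical, and determinants are compared through their squares, computed as det(X X^T).

```python
from fractions import Fraction


def gram_determinant(rows):
    """det(X X^T) for the matrix X with the given rows, via exact elimination."""
    k = len(rows)
    gram = [[Fraction(sum(a * b for a, b in zip(r, s))) for s in rows] for r in rows]
    det = Fraction(1)
    for i in range(k):
        pivot = next((j for j in range(i, k) if gram[j][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            gram[i], gram[pivot] = gram[pivot], gram[i]
            det = -det
        det *= gram[i][i]
        for j in range(i + 1, k):
            factor = gram[j][i] / gram[i][i]
            gram[j] = [gjc - factor * gic for gjc, gic in zip(gram[j], gram[i])]
    return det


# Primitive rank-one sublattice of Z^3 and a basis of its orthogonal lattice
# {x in Z^3 : x . (1, 2, 3) = 0}, obtained by solving x1 = -2*x2 - 3*x3.
sublattice = [(1, 2, 3)]
orthogonal = [(-2, 1, 0), (-3, 0, 1)]
print(gram_determinant(sublattice), gram_determinant(orthogonal))
```

Both squared determinants agree, so counting rank-one sublattices and counting corank-one sublattices of bounded determinant are the same problem in this example.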
3. Division into two parts
3.1. Preliminaries
Until the end of Section 6, we fix and . We will divide into two parts, and deal with them one at a time. We induct on , assuming that has been computed for all lattices of rank .
Throughout the rest of the paper, we fix a basis of , and denote by the matrix whose -th row is . Define , and identify it with the projection of onto the subspace of orthogonal to i.e. we think of as a subset of . Let be the component of orthogonal to , so that for some and . Also let be the matrix whose -th row is .
We write
[TABLE]
where equals the number of rank primitive sublattices of of determinant whose projection to is also of rank , and equals the number of those whose projection is of rank . Equivalently, counts primitive sublattices whose -span does not contain , and counts those that do.
As discussed in Section 1.4 above, we may identify with an orbit of the left multiplication of on , for some . Also, let be the matrix whose -th row vector equals for , and for , so that
[TABLE]
Then we can also write in the form , where is the first submatrix of , and and are vectors in .
3.2. Computing
Consider first the case , so that contributes to . We may assume that is a Hermite normal form, so that is too. Because is primitive, so is , and the -th entry of the vectors and must be equal to and [math] respectively. This forces each of the other entries of to have only one choice modulo the left action of . Thus
[TABLE]
to which we can simply apply Theorem 1.3 (see the remark after its statement).
3.3. Some lemmas
Working with is much more involved. Most of the remainder of this paper is devoted to this task. The goal of this section is to derive the expression (3.5) for that is amenable to computation.
We start by recalling the standard choice of the representatives of the right cosets of in the double coset , where has determinant . Such a representative, say , is a lower triangular matrix with determinant , with the condition that for all . Of course, if and only if and have the same invariant factors.
Lemma 3.1.
Given an integral matrix with , there exists a unique triple , where is one of the right coset representatives described above, is a primitive Hermite normal form of rank , and , such that .
Proof.
By the theory of the Smith normal form, we have where is an invariant factor matrix — that is, with — is a primitive matrix of full rank, and . Write , where is the Hermite normal form of and . Then there exists and a coset representative of such that . Therefore, writing , we have .
Suppose we have another triple such that . This is possible only if the row vectors of and generate the same lattice. Since both and are in the Hermite normal form, . This in turn implies and . ∎
Lemma 3.2.
Again given an integral matrix , write , where , is an invariant factor matrix, and is primitive. Thus , where .
Then is primitive if and only if and is coprime to .
Proof.
Without loss of generality, we may assume to be the matrix which has ’s on the diagonal and 0’s elsewhere. is imprimitive if and only if there exist integers for , not all zero, such that , or equivalently .
Suppose . We claim that, for any and , for a nontrivial choice of the ’s. There exists a prime such that and , so it suffices to find a nontrivial solution to the expression . But this is clearly possible.
Next suppose . We are led to consider the condition . This is impossible if and only if , which completes the proof. ∎
Lemma 3.3.
Write . Then the necessary and sufficient condition for to be one of the standard form right coset representatives of in is as follows: is a lower triangular matrix with , where and , for , and in addition if are two indices such that and — i.e. all diagonal entries between and are trivial — then .
Proof.
Let be a coset representative of some double coset of a matrix of determinant , in the form that we chose in the beginning of this section. Then all but the last condition are automatically satisfied. For the last condition, choose the three smallest indices for which . We consider the matrix
[TABLE]
We will show that this matrix has invariant factors if and only if and are coprime to . Then the proof is complete because we can repeatedly apply this argument to to compute the invariant factors of .
If and are coprime, there exist integers such that , so that the matrix
[TABLE]
has determinant . Multiplying this on the left of (3.2), we have
[TABLE]
which, upon multiplying by suitable elements of from both sides, becomes
[TABLE]
If furthermore is coprime to , then so is , so we can use the same trick to see that (3.2) has invariant factors indeed.
Now go back to (3.2) and consider the case ; we can assume and . We restrict our attention to the upper-left corner submatrix of (3.2), and temporarily use to denote the equivalence under the left and right multiplication by . Then, by a similar argument as earlier, for an appropriate integer ,
[TABLE]
so appears as one of the invariant factors.
∎
Lemma 3.4.
Write , as in the previous lemma. Then the number of the right cosets of in equals
[TABLE]
Proof.
From the general theory of Hecke operators (see Chapter 3 of Shimura [16]), it suffices to prove the lemma for the case . We proceed by induction on .
In case , there exist coset representatives which have and for all . This exhausts all the representatives of , so the lemma holds true in this case.
For the general case, it suffices to match, to each representative of , representatives of , different for each . Suppose is the smallest number for which is a power of . Then modifying to and to , for any choice of , yields a representative of , accounting for out of total. Also, for each , replacing by , a choice of each from and of from ( cannot be [math] by the previous lemma) yields a representative of , and there are of this kind. Therefore, for each there is a total of coset representatives of constructed in this manner, as desired. It remains to show that these representatives do not overlap with those constructed from a different choice of . But this is immediate since, given a representative of , one can read off which representative of it came from, by discarding the first factor of that appears in its diagonal. ∎
3.4. A computable expression for
For , recall we defined if and 0 otherwise. Also, as in the statement of Lemma 3.4 write . Thanks to Lemmas 3.1, 3.2 and 3.4, we can rewrite as
[TABLE]
where the sum over is taken over all coset representatives of in the standard form.
Fix for a moment, and consider the innermost summation in (3.3). For some , it is equal to (cf. Lemma 3.2)
[TABLE]
where is the Möbius function, and we wrote
[TABLE]
for short. Note that is a row vector, whereas and are column vectors.
Temporarily write and . We need to compute the determinant of . First observe that
[TABLE]
because , and also
[TABLE]
This motivates the use of the matrix-determinant lemma, which asserts that for a matrix and (row) vectors , . To this end, we also need the following lemma.
Lemma 3.5.
Let be a full-rank matrix whose -th row equals . Let such that they form the basis of the polar lattice spanned by and that . Let be the matrix whose -th row equals . Then the inverse of is given by .
Proof.
Complete to an invertible matrix , such that the rows of are orthogonal to the rows of . Similarly complete to , so that the rows of form the dual basis to that formed by the rows of . Then the rows of are orthogonal to the rows of as well.
Since and are inverses of each other, we have . By abuse of language, write , and similarly with . Then
[TABLE]
and observe that the first term on the right is zero outside the first submatrix, and the second term is zero outside the “last” submatrix. This completes the proof. ∎
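The matrix-determinant lemma invoked before Lemma 3.5 is easy to verify numerically. The sketch below uses the row-vector form det(A + u^T v) = (1 + v A^{-1} u^T) det(A) for an invertible 2x2 matrix A; this form is an assumption matching the convention that vectors are written horizontally. The function names `det2` and `check_matrix_determinant_lemma` are hypothetical.

```python
from fractions import Fraction


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def check_matrix_determinant_lemma(A, u, v):
    """Return both sides of det(A + u^T v) = (1 + v A^{-1} u^T) det(A)."""
    d = Fraction(det2(A))
    # Explicit inverse of a 2x2 matrix; A is assumed invertible.
    inv = [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]
    # Rank-one perturbation: entry (i, j) of u^T v is u[i] * v[j].
    perturbed = [[A[i][j] + u[i] * v[j] for j in range(2)] for i in range(2)]
    inv_u = [inv[i][0] * u[0] + inv[i][1] * u[1] for i in range(2)]
    scalar = 1 + v[0] * inv_u[0] + v[1] * inv_u[1]
    return det2(perturbed), scalar * d


left, right = check_matrix_determinant_lemma([[2, 1], [1, 3]], (1, 2), (3, 1))
print(left, right)
```

Both sides agree, which is the identity that reduces the determinant of a rank-one perturbation to a scalar correction factor.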
Thanks to Lemma 3.5, with we compute that is the square root of
[TABLE]
In the last line, we used the fact that .
Let
[TABLE]
if , and set otherwise. We will use this notation throughout the rest of the paper. Then (3.4) becomes
[TABLE]
The lemma below ensures that the translation of the vectors by does not present any extra difficulty in our estimate of this sum.
Lemma 3.6.
Let be a lattice of rank , and . Temporarily denote by the number of points with . Then
[TABLE]
where the implicit constant depends on only.
Proof.
This is Lemma 2 in [12] generalized to an affine lattice, and is also a special case of Theorem 5.4 in [22]. We provide a proof here for completeness.
We proceed by induction on . The base case is clear. Now assume the lemma for . By adjusting , we may assume .
First consider the case . Let , , be a vector with , and consider the parallelepiped spanned by . Its diameter is , and it contains a fundamental parallelepiped of , which also has diameter .
Write for the ball in at the origin of radius . Then since , we have
[TABLE]
and thus
[TABLE]
where the second equality follows from Minkowski’s second theorem.
It remains to consider the case . Then lies in at most two translates of in the direction of . Thus the induction hypothesis implies . Also we have
[TABLE]
as above. This completes the proof.
∎
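The qualitative content of Lemma 3.6, that translating the lattice does not change the leading behavior of the count, can be observed numerically. The following sketch is only an illustration: it compares brute-force counts against the volume of the ball divided by the lattice determinant, and says nothing about the precise error term of the lemma. The function name `count_affine_points` and the search bound `coeff_bound` are hypothetical.

```python
from math import pi


def count_affine_points(b1, b2, shift, H, coeff_bound):
    """Count points x in (Z*b1 + Z*b2) + shift with |x| <= H by brute force.

    Assumption: every such point has integer coefficients in
    [-coeff_bound, coeff_bound]; the caller must choose the bound large enough.
    """
    count = 0
    for a in range(-coeff_bound, coeff_bound + 1):
        for b in range(-coeff_bound, coeff_bound + 1):
            x = a * b1[0] + b * b2[0] + shift[0]
            y = a * b1[1] + b * b2[1] + shift[1]
            if x * x + y * y <= H * H:
                count += 1
    return count


H = 30
volume_over_det = pi * H * H / 1  # area of the disc divided by det(Z^2) = 1
for shift in ((0, 0), (0.5, 0.5), (0.3, 0.7)):
    n = count_affine_points((1, 0), (0, 1), shift, H, coeff_bound=H + 1)
    print(shift, n, n / volume_over_det)
```

For each shift the ratio stays close to 1, in line with the statement that the translation does not present extra difficulty in the main term.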
It follows that (3.4) equals
[TABLE]
where here denotes the lattice spanned by the row vectors of . We have , and by (2.3). Also, , so the above sum can be rewritten as
[TABLE]
which we in turn rewrite as, for the lattice spanned by ,
[TABLE]
Summing up all our work in this section, we deduce that (3.3) equals
[TABLE]
Here is the Euler totient.
The remainder of this paper is devoted to computing (3.5). Because depends on , we cannot deal with the constant factor just yet. However, we will later use
Lemma 3.7.
For ,
[TABLE]
Proof.
We can write the expression under question multiplicatively as
[TABLE]
which then becomes
[TABLE]
∎
4. Main term of (3.5)
In this section, we estimate the intended main term of (3.5), namely
[TABLE]
for each and . We may also assume , since otherwise (4.1) is equal to 0. Our approach is essentially that of Schmidt [12], who uses summation by parts. We improve it somewhat by adopting the language of the Riemann-Stieltjes integral, in order to simplify the computation and to derive clean error terms.
Let us rewrite (4.1) as
[TABLE]
where
[TABLE]
and
[TABLE]
It is easy to check that is a twice differentiable function on , with .
Choose a with . Write with and . Also, let be the number of elements such that , and . Then for ,
[TABLE]
Write and . Since , by the summation by parts,
[TABLE]
Thus we have bounded from both sides by certain Riemann-Stieltjes sums. We need to show that those sums converge as (and thus ). First, observe that, since ’s are supported strictly away from zero by , we may assume the same of , so that is of bounded variation. Second, are clearly not continuous, but by the induction hypothesis on , we know it is bounded from both sides by a polynomial in . More precisely,
[TABLE]
where is as in Theorem 1.3, and
[TABLE]
By Theorem 6.8 of Rudin ([10]), we have shown that
[TABLE]
Since the same argument will be used repeatedly later in this paper, we summarize our discussion so far in the form of a lemma:
**Lemma 4.1.**
Assume Theorem 1.3 for , and let be a lattice of rank . Suppose is a decreasing twice differentiable function supported on . Then
[TABLE]
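The sandwiching step behind Lemma 4.1 can be illustrated numerically. The sketch below bounds a sum of f over a finite point set from both sides by lower and upper Riemann-Stieltjes sums against the counting function of that set, and shows the gap shrinking as the partition is refined. It is a toy under stated assumptions: the point set (lengths of nonzero vectors of Z^2), the test function f, and all names are hypothetical choices, not the quantities of the paper.

```python
import bisect
import math

def stieltjes_bounds(points, f, a, b, pieces):
    """Lower and upper Riemann-Stieltjes sums of a decreasing f against
    the counting function N(t) = #{x in points : x <= t} on (a, b]."""
    pts = sorted(points)
    count = lambda t: bisect.bisect_right(pts, t)  # N(t)
    lower = upper = 0.0
    for j in range(pieces):
        t0 = a + (b - a) * j / pieces
        t1 = a + (b - a) * (j + 1) / pieces
        jump = count(t1) - count(t0)  # number of points with t0 < x <= t1
        lower += f(t1) * jump         # f decreasing, so f(x) >= f(t1) there
        upper += f(t0) * jump         # and f(x) <= f(t0) there
    return lower, upper

H = 10.0
f = lambda t: (1.0 - t / H) ** 2  # decreasing on [0, H], vanishing at H
lengths = [math.hypot(x, y)
           for x in range(-10, 11) for y in range(-10, 11)
           if (x, y) != (0, 0) and x * x + y * y <= 100]

exact = sum(f(r) for r in lengths)
coarse = stieltjes_bounds(lengths, f, 0.0, H, 10)
fine = stieltjes_bounds(lengths, f, 0.0, H, 1000)
print(coarse, exact, fine)
```

In the paper the counting function is itself only known up to polynomial bounds from the induction hypothesis. Here it is computed exactly, so the sketch shows only the sandwiching mechanism.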
We return to estimating (4.3). Recall . In (4.3), for the integrals inside the -notation, there is no harm in replacing with [math] if . For the main term, we can do the same at the cost of
[TABLE]
Now the main term of contributes
[TABLE]
For the second-to-last equality, we used the identity for the beta function (see e.g. [3, Section 6.2.1])
[TABLE]
and the last equality follows from the definition of .
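The displayed beta function identity is not legible in this copy. As a hedged illustration, the sketch below checks the standard relation between the integral definition B(a, b) = integral over [0, 1] of t^(a-1) (1-t)^(b-1) dt and the gamma quotient Gamma(a) Gamma(b) / Gamma(a+b). Whether this is exactly the form used above is an assumption, and the function names and sample exponents are illustrative.

```python
import math

def beta_integral(a, b, steps=20000):
    """Midpoint-rule approximation of the beta integral, for a, b >= 1."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += t ** (a - 1) * (1.0 - t) ** (b - 1)
    return h * total

def beta_gamma(a, b):
    """Beta function expressed through the gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

for a, b in [(3, 4), (3, 2.5)]:
    print(a, b, beta_integral(a, b), beta_gamma(a, b))
```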
Now accounting for the factor of in , we obtain for the intended main term of (4.1)
[TABLE]
It is clear that this term is scale-invariant, i.e., invariant under replacing and by and for any .
The error terms of are dealt with in a similar way, only simpler. For with , the corresponding term contributes
[TABLE]
It is apparent that is scale-invariant, since both and are.
For those with , we proceed as follows:
[TABLE]
In case , we used our assumption . Also, to retain the polynomial shape of the error term, we note for any , and use this bound instead. The scale-invariance can be checked in a straightforward manner.
In conclusion, we proved that (4.1) equals
[TABLE]
where is an index set of cardinality , each is a reciprocal of products of ’s and , so that is invariant under the appropriate scaling. The leading error term is of degree .
5. Error term of (3.5)
In this section, we work on the intended error term of (3.5), namely
[TABLE]
for . Rewrite (5.1) as times
[TABLE]
which we simplify and bound from above by
[TABLE]
Our analysis of (5.2) depends on the “skewness” of and . We will first explain how to deal with (5.2) in case all are of size (i.e. is not too skewed), and then work out the general case.
In addition, for the rest of this section, we assume for simplicity. To restore the general case, one could simply replace by .
5.1. When is “not skewed”
Assume . For each , consider the restriction of the sum (5.2) to those for which is the lowest number such that
[TABLE]
(in fact, the former inequality follows from the latter and the minimality of ), where we interpret and . Such a sum is then bounded by a constant times
[TABLE]
where the sum is over all satisfying (5.3), since it follows from Minkowski’s second theorem and (5.3) that
[TABLE]
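Minkowski's second theorem is used repeatedly in this section to trade determinants for products of successive minima. For the Euclidean unit disc in rank 2 it reads 2 det(L) <= pi * lambda_1 * lambda_2 <= 4 det(L). The sketch below checks this by brute force on a few sample lattices. It is an illustration only: the search inspects coefficients in a fixed box, which suffices for the mildly skewed bases chosen here but not in general, and every name in it is hypothetical.

```python
import math

def successive_minima_2d(b1, b2, box=10):
    """Brute-force (lambda_1, lambda_2) of the lattice spanned by b1 and b2.
    Assumption: short vectors have coefficients in [-box, box]."""
    vectors = []
    for a in range(-box, box + 1):
        for b in range(-box, box + 1):
            if (a, b) != (0, 0):
                vectors.append((a * b1[0] + b * b2[0], a * b1[1] + b * b2[1]))
    vectors.sort(key=lambda v: math.hypot(v[0], v[1]))
    v1 = vectors[0]
    lam1 = math.hypot(v1[0], v1[1])
    for w in vectors:
        if abs(v1[0] * w[1] - v1[1] * w[0]) > 1e-9:  # w is independent of v1
            return lam1, math.hypot(w[0], w[1])

def minkowski_second_holds(b1, b2):
    """Check 2 det <= pi * lambda_1 * lambda_2 <= 4 det for a rank 2 lattice."""
    det = abs(b1[0] * b2[1] - b1[1] * b2[0])
    lam1, lam2 = successive_minima_2d(b1, b2)
    return 2 * det <= math.pi * lam1 * lam2 <= 4 * det

print(successive_minima_2d((1, 0), (0.5, 3)))
print(minkowski_second_holds((1, 0), (0.5, 3)))
```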
The idea for bounding (5.4) is that, because we are assuming , we can proceed as in Section 9 of Schmidt ([12]). The lemma below is a refinement of Lemma 6 of [12], so as to make explicit the dependence on the successive minima of and .
**Lemma 5.1.**
Let be an ( in our context) dimensional lattice. Fix a , and let . Then the number of such that , and is
[TABLE]
where the implicit constant here depends only on and the implied constants on the bound relating and .
Proof.
We may assume , because by Minkowski’s second
[TABLE]
We proceed by induction on . Suppose first that . Let be the orthogonal projection onto the orthogonal complement of . Then () is a rank lattice of determinant , and is a -dimensional primitive sublattice spanned by a vector whose length is . Therefore, the number of is bounded by the number of primitive vectors of of length .
If is a fundamental domain of , then is a fundamental domain of . Since we can choose an of diameter and is a contraction, has diameter . So the number of vectors of of length is bounded by a constant times
[TABLE]
Here we used the fact that .
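The base case reduces to counting (primitive) lattice vectors of bounded length, which is governed by the volume of the ball. A minimal sketch for the lattice Z^2 follows. It is a hypothetical toy that ignores the dependence on successive minima tracked in the lemma, and only shows that the nonzero count is close to the area pi R^2 and that the primitive count is a fraction of about 1/zeta(2) of it.

```python
import math

def count_vectors(radius):
    """Return (#nonzero, #primitive) vectors of Z^2 of Euclidean length <= radius."""
    r = int(math.floor(radius))
    nonzero = primitive = 0
    for x in range(-r, r + 1):
        for y in range(-r, r + 1):
            if (x, y) != (0, 0) and x * x + y * y <= radius * radius:
                nonzero += 1
                if math.gcd(x, y) == 1:  # coprime coordinates: primitive vector
                    primitive += 1
    return nonzero, primitive

R = 100
nonzero, primitive = count_vectors(R)
area = math.pi * R * R
print(nonzero / area, primitive / area, 6 / math.pi ** 2)
```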
For a general , by the inductive hypothesis what we need to estimate is
[TABLE]
where the sum is over all such that and . In addition, must satisfy say, since .
From the (proof of) case , the number of with , , and is
[TABLE]
But , so we may disregard the latter possibility.
Therefore, we can apply Lemma 4.1, the Riemann-Stieltjes argument in the previous section, and deduce that (5.6) is bounded by a constant times
[TABLE]
which turns out to be equal to a constant times
[TABLE]
as desired.
∎
We proceed to estimating (5.4). Thanks to Lemma 5.1, for some constant depending only on such that (which exists by Minkowski’s second), we can bound it by
[TABLE]
This can be handled again as in the previous section using Lemma 4.1, yielding terms of -degree at most satisfying all the miscellaneous conditions such as the scaling invariance.
5.2. The skewed case
Now assume that is the lowest number such that
[TABLE]
As earlier, we again restrict the sum (5.2) to those for which is the lowest number such that
[TABLE]
Then we must have and . There is a decomposition
[TABLE]
where is an dimensional lattice chosen as follows: take a reduced basis of such that and . Then we let . Also, let be the orthogonal projection of onto . An important fact we will use later is that by construction.
We further restrict (5.2) to those for which for a fixed , and call . Note that . We also let be the projection of onto . Clearly , and since we have .
Our considerations so far lead us to bound the restriction of (5.2) by, for some constant ,
[TABLE]
Using the induction hypothesis on our main theorem, and the fact that , we can rewrite the inner sum so that this becomes
[TABLE]
Let us look at one at a time, and consider
[TABLE]
By Lemma 5.1 as in the previous “not skewed” section, we obtain that this is
[TABLE]
Applying Lemma 4.1, it is seen that (5.8) may be bounded by at most error terms. We need to make sure that the -degrees of those terms are strictly below . Here we only discuss the terms of the highest degrees, as the rest can be dealt with in a similar fashion.
If , estimating the sum in (5.8) using Lemma 4.1 yields a term of -degree . Therefore, the -degree of (5.8) equals
[TABLE]
which attains its maximum only if and . But recall that we are assuming .
If , the sum is of size , in which case we can say that, for a small , the -degree is if , and is if .
5.3. The number of the error terms
We summarize and estimate the maximum number of error terms arising from our estimate of (5.2) so far. If is “not skewed,” then our estimate yielded error terms, where we write
[TABLE]
and we understand to be the empty set.
As for the skewed case, it is really separate cases corresponding to the parameter , and for each we obtained at most error terms. Hence, regardless of and , we are able to estimate (5.2) using at most
[TABLE]
terms.
6. Summary, and a proof of Theorem 1.3
6.1. A polynomial expression for
Summing up all our work so far, we have that
[TABLE]
where , is an index set of cardinality at most
[TABLE]
(collecting all error terms from the previous two sections), and each is a reciprocal of products of ’s and so that is scale-invariant. In this section, we will estimate the sum (6.1), and then make a choice of so that the dependence on ’s turns into dependence on ’s. This will prove our main theorem.
We treat (6.1) one monomial at a time. The highest degree term contributes
[TABLE]
The corresponding infinite series, by Lemma 3.7, equals
[TABLE]
the desired main term. It remains to bound the tail, which we can, up to a constant factor, approximate as
[TABLE]
which is of size
[TABLE]
We need to show that is bounded by a reciprocal of a product of ’s. Since and (which can be seen by projecting a -dimensional subspace of onto the orthogonal complement of ), we have , and thus . On the other hand, , which contains the factor times. For any , , so , as desired.
We return to other monomials in (6.1). For the indices with , the sum under consideration is
[TABLE]
which we can bound by the corresponding infinite series and apply Lemma 3.7, obtaining . If , the sum is of size
[TABLE]
and if , it is
[TABLE]
for any . Hence, together with the expression (3.1) of , we conclude that
[TABLE]
for some index set of cardinality at most , and where each is a product of reciprocals of ’s, ’s, and , so that is scale-invariant.
At this point, choose to be one of the shortest nonzero vectors of . Then the following lemma shows that we can replace by for each , so that would depend only on .
**Lemma 6.1.**
Recall that is the orthogonal projection of onto the complement of a vector . If we choose to be a shortest nonzero vector of , then for all .
Proof.
Let be a reduced basis of containing . Then, writing for the projection of to the complement of , is a reduced basis of . Therefore, and .
On the other hand, by the definition of a reduced basis, for some . This immediately implies , and also, since , we have , completing the proof. ∎
6.2. The number of the error terms
Let us give a quick, crude estimate of . From the above discussion, we have
[TABLE]
Recall that by definition. Also, it is clear from Section 2.2 that . We claim in general that for . Indeed, the base case is obvious, and assuming the truth for the case, it follows from the above inequality that
[TABLE]
6.3. The primary error term,
Finally, we provide an estimate on the primary error term of , again assuming . We temporarily assume , and argue the cases by duality. Tracing back our estimates so far, there are two candidates for the primary error term: one is from the estimate of the “main part” (4.1), which contributes
[TABLE]
where for corresponding to the leading error term, and the other is from the estimate of the “error part” (5.1) in case , which contributes
[TABLE]
but by rewriting everything in terms of ’s with the help of Lemma 6.1, we find that this is bounded by
[TABLE]
The reason we use this slightly inferior bound is that this possesses a convenient symmetry under duality, as we will see below.
We claim by induction that the main error term has degree , and that (6.2) is no greater than (6.3). In the base case , this is clear. In general, if , (6.2) is of degree strictly less than , and we are done. If , then by the fact that and Lemma 6.1,
[TABLE]
which shows that (6.2) has the same size as (6.3), by the inductive hypothesis on . This proves the claim.
6.4. The primary error term,
Write for short. By the duality theorem (2.2), , where . Observe that both have the same main term, that is,
[TABLE]
Moreover, from the previous section, has the leading error term of size
[TABLE]
which is equal to
[TABLE]
as desired.
7. Proofs of the variants
7.1. Formula for
An asymptotic formula for can be derived easily from that of by a standard Möbius inversion, as in Schmidt ([12, Sections 3, 4, 10]). As in [12], define inductively by
[TABLE]
It is shown in [12, Section 3] that equals the number of index sublattices of a rank lattice, and that
[TABLE]
for . From the latter it follows that
[TABLE]
where . We handle each sum over one at a time. For the main term, we have
[TABLE]
On the right-hand side, the first sum is by (7.1), which is exactly what we need. The second sum is bounded by a constant times
[TABLE]
for any .
In the error term, for those with we can replace the sum by the infinite sum , and apply (7.1). For those with , we see that
[TABLE]
for any . If , can be set small enough, so that the secondary term has -degree . If , the secondary term has degree .
The required properties of the coefficients can be checked straightforwardly, so we omit the proof.
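The inductive definition from [12] is not legible in this copy. As a hedged illustration of the quantity it describes, the sketch below counts the sublattices of index k in Z^n with the standard recursion a(n, k) = sum over d dividing k of d^(n-1) a(n-1, k/d), which comes from counting Hermite normal forms. Whether this matches the normalization of [12] is an assumption, and the function names are hypothetical.

```python
from functools import lru_cache

def divisors(k):
    """All positive divisors of k, by trial division."""
    return [d for d in range(1, k + 1) if k % d == 0]

@lru_cache(maxsize=None)
def sublattice_count(n, k):
    """Number of index-k sublattices of Z^n."""
    if n == 0:
        return 1 if k == 1 else 0  # the zero lattice has only itself
    return sum(d ** (n - 1) * sublattice_count(n - 1, k // d)
               for d in divisors(k))

# Rank 2 recovers the sum-of-divisors function; prime index p in rank n
# gives (p^n - 1) / (p - 1).
print([sublattice_count(2, k) for k in range(1, 7)])
print(sublattice_count(3, 2), sublattice_count(3, 3))
```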
*Remark.*
One may wonder what the formula for would be. In this case, the skewness of induces no subtlety at all, and simply
[TABLE]
for any . The proof is identical to the argument in Section 3 of [12]; indeed, observe that for any full-rank of covolume 1.
7.2. Formula for
Let be a sublattice of rank . By choosing the basis of so that is a basis of , and proceeding analogously as in Section 3 with instead of , we obtain an estimate of analogous to that of in (1.2), with the coefficients being a product of reciprocals of and (here we identify with the projection of onto the orthogonal complement of in ). However, the reciprocal of could be arbitrarily large, which may cause difficulties in some applications of Theorem 1.3. For instance, suppose one wants to compute
[TABLE]
Here one is eventually led to sum the multiples of the reciprocals of over sublattices of height bounded by . It seems to be a nontrivial task to show that such a sum is asymptotically small.
Fortunately, with minor modifications to our proof of Theorem 1.3, it is possible to provide a formula for independent of , avoiding the above complication altogether. In this section, we explain what modifications are to be made.
Consider first the base cases or . If , , and bounding the contribution from in terms of using (because ), we obtain the same type of estimate as in (2.1). In case , we must have , and thus for , if and only if ; hence the proof follows from the case and the duality theorem.
For other values of , we proceed by induction on , and split as in Section 3. For , we simply bound it by . As for , observe that, analogously to (3.3), we can write
[TABLE]
where denotes the lattice spanned by the row vectors of . The idea is that the main contribution of the above sum comes from those with , where is the projection of onto . Since implies , we may write
[TABLE]
where
[TABLE]
But since , it also holds that
[TABLE]
Therefore all we need to show is that is small, or equivalently, that is close to .
Estimating amounts to considering an analogous expression to (3.5) where the sum over is further restricted to those for which . With the same restriction added to all the subsequent computations, all of the arguments in Section 4 go through, including Lemma 4.1, since by the induction hypothesis and satisfy the same asymptotics on lattices of rank . As for the error terms of , we may simply bound them by those of , namely (5.2) for , and so there are no changes to make. This shows that and satisfy the same asymptotics, and hence that is bounded by terms of leading degree strictly less than .
To count the number of error terms, recall that . have error terms, and since , it has at most error terms. Thus the number of error terms in our formula for is no more than .
It remains to determine the main error term. If , the argument of Section 6.3 carries over, showing that it is the same as in Theorem 1.3. If , extend to a sublattice of rank precisely . Then
[TABLE]
on the one hand, and on the other hand,
[TABLE]
Now we can argue as in Section 6.4, using duality, to show that has the leading error term of the same size, and therefore so does . This completes the proof of Theorem 1.5; for , one may proceed as in the last section.
7.3. Flag varieties of type
Let be a lattice, and let . Our goal is to estimate the sum
[TABLE]
Here . Note that there is also the constraint
[TABLE]
coming from the definition of height.
Estimating the sum over the main term is very similar to the computation in Section 4, so we will be brief. The largest term of (7.2), obtained by applying Lemma 4.1 to , comes from the integral
[TABLE]
where and likewise for . The smaller terms can be computed similarly, and it turns out the largest error term is of -degree , and the second largest is of -degree , both coming from the leading error term of . One obtains a total of error terms from here.
The harder part of (7.2) is the sum over the error term, namely
[TABLE]
To bound this, we employ our method in Section 5 above. In order to avoid repetitive and unenlightening computations, we only show how to compute the first two largest -degree terms, and suppress the factors from the expressions for the error terms.
As in Section 5, for each , we restrict the sum in (7.4) to those for which is the smallest number such that
[TABLE]
Then we can bound (7.4) by
[TABLE]
In order to work on the inner sum, we first determine the range of . By Minkowski’s second, we have . On the other hand, again by Minkowski’s second we have , so (7.3) implies
[TABLE]
In the last line, we also used .
If , the outer sum of (7.5) is vacuous, and the inner sum can be computed as in the main term estimate above, yielding up to lower -degree terms. Similarly, we obtain the same bound in case . Each of these cases adds at most error terms to our estimate.
So assume . We will apply Lemma 5.1. To do so, it is required that , which is true provided . Thus, assuming is sufficiently large, the inner sum of (7.5) is bounded by a constant times
[TABLE]
We divide into cases according to whether or not:
- (i) If , then (7.5) is
[TABLE]
(7.6) imposes the additional constraint on this sum. Therefore, by Lemma 4.1, the contribution from the largest -degree term in our estimate (1.2) of to (7.5) is
[TABLE]
up to the two largest -degree terms, as desired. The contributions from the smaller terms of will be discussed later.
- (ii) If , then we instead have
[TABLE]
Estimating in the same manner as in Case (i) above, this is
[TABLE]
and since , we are done.
- (iii) In the rare, yet possible, case that , we proceed similarly, and bound (7.5) by
[TABLE]
In addition, in all three cases above, the contribution from the leading error term of is of size , and the number of error terms obtained is at most . This completes the error estimate. We note that the related computation in Thunder ([20]), lines 5-6 on p.185, contains a minor error: if , the integral there diverges.
In summary, we estimated (7.2) to be
[TABLE]
where is an index set of cardinality at most , ’s are appropriate inverse products of ’s, and the implied constant depends on only. The largest is , and the second largest is one of
[TABLE]
When , it is always , but otherwise both of the other two are possible.
References
- [1] W. Banaszczyk. New bounds in some transference theorems in the geometry of numbers. Math. Ann. 296 (1993), no. 4, 625-635.
- [2] A. Borel. Introduction to arithmetic groups. University Lecture Series, Vol. 73. The American Mathematical Society, Providence, RI, 2019.
- [3] P. J. Davis. 6. Gamma function and related functions, in M. Abramowitz and I. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, New York: Dover Publications, 1972.
- [4] J. Franke, Y. Manin, and Y. Tschinkel. Rational points of bounded height on Fano varieties. Invent. Math. 95 (1989), no. 2, 421-435.
- [5] S. Kim. Random lattice vectors in a set of size O(n). Int. Math. Res. Not. (2020), 2020(5): 1385-1416.
- [6] S. Kim. Mean value formulas on sublattices and flags of the random lattice. J. Number Theory, to appear.
- [7] P. Le Boudec. Height of rational points on random Fano hypersurfaces. arXiv:2006.02288v1.
- [8] A. K. Lenstra, H. W. Lenstra, Jr., and L. Lovász. Factoring polynomials with rational coefficients. Math. Ann. 261 (1982), no. 4, 515-534.
