Some new results in random matrices over finite fields
Kyle Luh, Sean Meehan, Hoi H. Nguyen

TL;DR
This paper investigates the distribution properties of random matrices over finite fields, providing new characterizations of random walks and analyzing eigenvalue and polynomial divisibility probabilities, revealing universal behaviors.
Contribution
It introduces novel characterizations of random walks with large discrepancy and extends universality results for matrix eigenvalues and polynomial divisibility over finite fields.
Findings
Distribution of ranks of random matrices over F_p analyzed
Probability of eigenvalue-free matrices characterized
Divisibility of characteristic polynomials by irreducible polynomials studied
Abstract
In this note we give various characterizations of random walks with possibly different steps that have relatively large discrepancy from the uniform distribution modulo a prime p, and use these results to study the distribution of the rank of random matrices over F_p and the equi-distribution behavior of normal vectors of random hyperplanes. We also study the probability that a random square matrix is eigenvalue-free, or when its characteristic polynomial is divisible by a given irreducible polynomial in the limit n to infinity in F_p. We show that these statistics are universal, extending results of Stong and Neumann-Praeger beyond the uniform model.
Kyle Luh
Center of Mathematical Sciences and Applications
Harvard University
20 Garden St.
Cambridge, MA 02138 USA
,
Sean Meehan
Department of Mathematics
The Ohio State University
231 W 18th Ave
Columbus, OH 43210 USA
and
Hoi H. Nguyen
Department of Mathematics
The Ohio State University
231 W 18th Ave
Columbus, OH 43210 USA
2010 Mathematics Subject Classification:
15B52, 20G40
1. Introduction
Let $q$ be a prime power, and let $M_n(\mathbb{F}_q)$ ($GL_n(\mathbb{F}_q)$) be the group of all (resp. non-singular) $n \times n$ matrices with entries in the field $\mathbb{F}_q$ of $q$ elements. Given a (class) function $f$ that depends on $M$ (such as the rank of $M$, or the factors of its characteristic polynomial, etc.), it is natural to study the behavior of $f(M)$ for a “typical” matrix $M$, such as one sampled uniformly at random; we call this the uniform model.
1.1. Some statistics for the uniform model
Our first example concerns the rank of $M$. By exposing the columns of $M$ one by one, it is not hard to show that the probability that $M$ belongs to $GL_n(\mathbb{F}_q)$ is exactly
[TABLE]
More generally, for we can show that
[TABLE]
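As a sanity check on the column-exposure argument, the following Python sketch enumerates all $n \times n$ matrices over $\mathbb{F}_p$ for tiny $n$ and prime $p$, tabulates their ranks, and compares the number of non-singular matrices with the count $\prod_{i=0}^{n-1}(p^n - p^i)$ that the argument yields. The function names are illustrative only, and the field is assumed to have prime order so that plain modular arithmetic suffices.

```python
from itertools import product


def rank_mod_p(rows, p):
    """Rank of a matrix over F_p (p prime) by Gaussian elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Look for a pivot in column c among the rows not yet used.
        pivot = next((i for i in range(rank, len(m)) if m[i][c]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][c], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][c]:
                f = m[i][c]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def rank_counts(n, p):
    """counts[k] = number of n x n matrices over F_p of rank k (tiny n, p only)."""
    counts = [0] * (n + 1)
    for entries in product(range(p), repeat=n * n):
        rows = [entries[i * n:(i + 1) * n] for i in range(n)]
        counts[rank_mod_p(rows, p)] += 1
    return counts


def nonsingular_count_formula(n, p):
    """Column exposure: the (i+1)-th column must avoid the span of the first i."""
    total = 1
    for i in range(n):
        total *= p**n - p**i
    return total
```

Dividing `nonsingular_count_formula(n, p)` by `p**(n*n)` gives the non-singularity probability of the uniform model; the exhaustive count is only feasible for very small parameters.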
Our next example is the probability that $M$ does not have an eigenvalue in $\mathbb{F}_q$ (equivalently, $M$ does not have a one-dimensional invariant subspace). Beautiful results by Stong [36] and Neumann-Praeger [25] (see also [15]) showed that this probability tends to the derangement probability of a random permutation. We have
[TABLE]
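For very small parameters the eigenvalue-free matrices can simply be counted. The sketch below is an illustration with hypothetical function names and a prime field assumed: it declares a matrix eigenvalue-free when $\det(M - \lambda I) \neq 0$ for every $\lambda \in \mathbb{F}_p$ and counts such matrices exhaustively. It does not attempt to reproduce the limiting formula.

```python
from itertools import product


def det_mod_p(m, p):
    """Determinant mod p by cofactor expansion (adequate for tiny matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0] % p
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det_mod_p(minor, p)
    return total % p


def is_eigenvalue_free(m, p):
    """True if M - lam*I is invertible for every lam in F_p."""
    n = len(m)
    for lam in range(p):
        shifted = [[(m[i][j] - (lam if i == j else 0)) % p for j in range(n)]
                   for i in range(n)]
        if det_mod_p(shifted, p) == 0:
            return False
    return True


def eigenvalue_free_count(n, p):
    """Number of n x n matrices over F_p with no eigenvalue in F_p."""
    count = 0
    for entries in product(range(p), repeat=n * n):
        m = [list(entries[i * n:(i + 1) * n]) for i in range(n)]
        if is_eigenvalue_free(m, p):
            count += 1
    return count
```

For $n = 2$ the eigenvalue-free matrices are exactly those whose characteristic polynomial is an irreducible quadratic, which gives an independent way to confirm the counts by hand.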
More generally, Stong showed that in the limit, the probability that the characteristic polynomial of $M$ factors into degree irreducible factors is the same as the probability that an element of $S_n$ factors into cycles of degree , and Hansen and Schmutz [19] also obtained similar results for joint cycle structures.
In a different direction, random matrices over finite fields are also a source of random partitions. For instance, it follows from [16] (and the references therein) that, for the uniform model,
[TABLE]
where is the partition corresponding to in the rational canonical form of a uniform matrix . We refer the reader to Section 2 for a precise definition of .
The measure above has been studied extensively in number theory in the context of the Cohen-Lenstra heuristics. Indeed, assume that is a -group, , Friedman and Washington [14] showed that for a Haar random matrix in
[TABLE]
We also refer the reader to [42, 43] and the references therein for related results.
In the spirit of (3), Fulman [15, 16] also showed for the uniform model that as , for any fixed irreducible polynomial ,
[TABLE]
and in particular,
[TABLE]
Moreover, it was also shown that these statistics are asymptotically independent for different in the sense that for any different irreducible polynomials
[TABLE]
We invite the reader to Section 2 for a useful tool to deduce Equations (2), (3) and (4).
1.2. Our main results
Motivated by the universality phenomenon in Random Matrix Theory, we wonder if the above statistics also hold for other models of $M$. While there have been many results addressing universality of random matrices in characteristic zero (to study the spectral behavior of various models of random matrices), we have not seen much in the literature addressing universality behavior in the finite field setting. In fact, to the best of our knowledge, although there have been partial results such as [1, 2, 3, 6, 10, 12, 20, 21, 35], universality results for matrices over finite fields appeared only very recently in [23, 24, 27, 30, 42, 43]. For instance, regarding the rank distribution, a simple consequence of results of Maples [23, 24] (see also [27]) and of Wood [42] is the following.
Theorem 1.3.
Let be fixed. Assume that is a random matrix where are iid copies of a random -balanced distribution in . Then we have
[TABLE]
Here we say that a random variable in $\mathbb{F}_p$ is $\alpha$-balanced (for simplicity, our notion here is weaker than those from [23, 24, 27, 30] in that $\alpha$ is fixed) if
[TABLE]
The method of [23, 24] (see also [27]) relies on a swapping technique from [9, 38, 4], and can yield an exponentially small error bound of type in Theorem 1.3. However, this technique is quite delicate and does not seem to extend to other interesting models of matrices such as symmetric matrices. The method of [42, 43] relies mainly on the moment method, which extends rather easily to matrices with entries over for composite (and to controlling other algebraic statistics besides the rank), but one has to assume to be sufficiently small.
One of the main goals of this note is to provide three alternative methods, which we will call the “arithmetic approach” (after [28, 40]), the “geometric approach” (after [32]) and the “combinatorial approach” (after [8]). Although the error bounds obtained by these methods are usually of subexponential type (rather than exponential type), we believe that the methods will be extremely useful for the study of random matrices over finite fields. For instance, the methods can be adapted to matrices with constraints, such as symmetric and antisymmetric matrices [31], to answer a question from [5]. To highlight a result of these methods, we show in Section 6 the following result.
Theorem 1.4 (Rank distribution).
Assume that for a sufficiently small constant . Assume that . Then for a random matrix with entries being iid copies of an -balanced random variable in we have
[TABLE]
where is another (sufficiently small) positive constant depending on and .
Note that we can also establish a similar rank distribution for rectangular matrices of size for a fixed by a similar method. These results are not new and are weaker than existing results in the literature (see for instance [23, 24], [27, Theorem A.4] and [30, Theorem 5.3]). However, as mentioned, our approach is new and seems to be robust. (For instance, it can be used to prove Theorems 1.5, 1.6, 1.7 and 1.8 below.) More precisely, to establish Theorem 1.4 we will analyze the normal vectors of random subspaces, for which we will show that the random sum spreads out in $\mathbb{F}_p$ uniformly very quickly.
Theorem 1.5 (Non-structure of the normal vectors).
With the same assumption as in Theorem 1.4, let be the first column vectors of . Let be any non-zero vector that is orthogonal to . Then with probability at least with respect to we have
[TABLE]
where are iid copies of .
We will prove the above result by showing that with very high probability the normal vectors do not have any structure (in any arithmetic, geometric, or combinatorial sense). On the other hand, we will also show that the normal vectors actually behave like a uniform vector in . This can be seen as a discrete analog of [29] where it was shown that normalized normal vectors of a random hyperplane in behave like a uniform vector on the unit sphere.
Theorem 1.6 (Uniformity of the normal vectors).
With the same assumption as in Theorem 1.4, and conditioning on the event that the subspace generated by has full rank, we have
- For each ,
[TABLE]
- For each , and for any we have
[TABLE]
- Furthermore, with being the number of such that , if we assume and then
[TABLE]
where is absolute.
We need to be smaller than in Eq. (6) so that is not vanishing on average. We remark that the above result holds trivially for the uniform model (when is chosen uniformly at random from ), as in this case is distributed as a uniform vector. However, it is not at all clear why also behaves like a uniform random vector even when the are sampled differently.
In the above results there is a natural connection between the ranks and the normal vectors. Somewhat more surprisingly, we show that these quantities can also be used to study the characteristic polynomials. Namely, we can obtain the following analog of Equation (4).
Theorem 1.7 (Divisibility of the characteristic polynomials).
With the same assumption as in Theorem 1.4, let be any fixed constant. For a prime and fixed polynomial such that and is irreducible over we have
[TABLE]
where and the implied constant depend on and is a constant that depends only on the degree of .
Also, we will show the following analog of Equation (2).
Theorem 1.8 (Universality for eigenvalue-free matrices).
Assume that is as in Theorem 1.4. We have
[TABLE]
Thus, for instance, our result works for the following simple-looking model of random matrices of (mean zero) integral entries. Let be a finite deterministic set of integers (such as or , etc.), and let be a prime so that the projection of onto is not a single point. Let be the image of the uniform measure on under , then with the same notations as above we have the following result.
Corollary 1.9.
Among all matrices whose entries are all in ,
- a -portion have rank in ;
- a -portion have characteristic polynomial divisible by a given irreducible polynomial for primes ;
- a -portion are eigenvalue-free as and ;
- the normal vectors satisfy Theorems 1.5 and 1.6.
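The entry model behind this corollary is easy to make concrete. The sketch below pushes the uniform measure on a finite integer set forward to $\mathbb{F}_p$ and tests a balance condition. Because the displayed definition of balancedness is not legible in this copy, the check uses the assumption $\max_x \mathbf{P}(\xi = x) \le 1 - \alpha$; both this condition and the function names are assumptions made for illustration, not the paper's statement.

```python
from fractions import Fraction


def reduce_uniform(values, p):
    """Law on F_p of a uniform pick from the integer list `values`, reduced mod p."""
    law = {}
    weight = Fraction(1, len(values))
    for v in values:
        law[v % p] = law.get(v % p, Fraction(0)) + weight
    return law


def is_alpha_balanced(law, alpha):
    """ASSUMED definition: no residue carries mass more than 1 - alpha."""
    return max(law.values()) <= 1 - alpha


def projects_to_single_point(values, p):
    """True if every element of `values` has the same residue mod p."""
    return len({v % p for v in values}) == 1
```

For example, the set $\{-1, 1\}$ reduced modulo 3 gives mass $1/2$ to each of the residues 1 and 2, whereas $\{0, 3\}$ collapses to a single point modulo 3 and is excluded by the hypothesis of the corollary.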
1.10. Notation
We write for probability and for expected value. For an event , we write for its complement. We write for the exponential function . We use to denote .
For a given index set and a vector , we write for the subvector of whose components are indexed by . Similarly, if is a subspace then is the subspace spanned by for .
For a vector we let . We will also write for the dot product . We say is a normal vector for a subspace if for every .
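To make the notion of a normal vector concrete, here is a small Python sketch that, given a non-empty list of vectors in $\mathbb{F}_p^n$ ($p$ prime), returns a non-zero vector whose dot product with each of them vanishes modulo $p$, or `None` if the vectors span the whole space. It uses ordinary row reduction; the function names are hypothetical.

```python
def dot_mod_p(u, v, p):
    """Dot product of two vectors, reduced mod p."""
    return sum(a * b for a, b in zip(u, v)) % p


def normal_vector(vectors, p):
    """Return a non-zero w in F_p^n with <w, v> = 0 for every given v, or None."""
    m = [[x % p for x in v] for v in vectors]
    n = len(m[0])
    pivot_cols = []
    row = 0
    for col in range(n):
        pivot = next((i for i in range(row, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[row], m[pivot] = m[pivot], m[row]
        inv = pow(m[row][col], -1, p)  # modular inverse (Python 3.8+)
        m[row] = [(x * inv) % p for x in m[row]]
        for i in range(len(m)):
            if i != row and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[row])]
        pivot_cols.append(col)
        row += 1
    free_cols = [c for c in range(n) if c not in pivot_cols]
    if not free_cols:
        return None  # the vectors span F_p^n, so only 0 is orthogonal to all
    free = free_cols[0]
    w = [0] * n
    w[free] = 1
    # In reduced row echelon form each pivot variable is determined by the free one.
    for r, col in enumerate(pivot_cols):
        w[col] = (-m[r][free]) % p
    return w
```

When $n - 1$ vectors in general position are supplied, the returned vector is the normal vector of the hyperplane they span, unique up to a non-zero scalar.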
For , the matrix is the submatrix of the rows and columns indexed from and respectively. Sometimes we will also write for if there is no confusion.
We write for the distance to the nearest integer. Sometimes, for a matrix we write and for its -th row and column respectively. We write , , , or if for some fixed . We also write if and .
Our paper is organized as follows. We will first discuss tools to prove Equations (2), (3) and (4) in Section 2. We will present our characterization methods in Sections 3, 4, 5, and then use these results to prove Theorem 1.5 in Section 6. We will present a short proof of Theorem 1.4 in Section 7 and of Theorem 1.6 in Section 8. The remaining two sections are reserved to prove Theorem 1.7 and Theorem 1.8 respectively.
2. The uniform model
In this part we discuss the method to prove Equations (2), (3) and (4). Although this is not the main goal of the note, we would like to present it here for pedagogical purposes, as for most of the cases the universal statistics are computed from the uniform model. We refer the reader to [16] for a comprehensive survey on the method and its other applications.
We first introduce a simple representative for (or ) modulo the conjugacy action of . To motivate the formulas, let us first introduce a simpler variant for the permutation groups . For a permutation let be the number of cycles of length of . The cycle index of a subgroup of is defined as . The function is called the cycle index generating function of the symmetric groups and Pólya’s result shows that . This formula is useful in the study of (conjugacy) class functions of permutations.
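The cycle counts that enter the cycle index are easy to compute directly. The following sketch (illustrative names, exhaustive enumeration, so only small $n$) extracts the cycle type of a permutation and counts derangements, that is, permutations with no cycle of length one, checking the count against the classical inclusion-exclusion formula $n! \sum_{k=0}^{n} (-1)^k / k!$.

```python
from itertools import permutations
from math import factorial


def cycle_type(perm):
    """Return {cycle length: number of cycles} for perm, where i maps to perm[i]."""
    seen = [False] * len(perm)
    counts = {}
    for start in range(len(perm)):
        if seen[start]:
            continue
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        counts[length] = counts.get(length, 0) + 1
    return counts


def derangement_count(n):
    """Number of permutations of n points with no fixed point (brute force)."""
    return sum(1 for perm in permutations(range(n)) if 1 not in cycle_type(perm))


def derangement_formula(n):
    """Inclusion-exclusion: n! * sum_{k=0}^{n} (-1)^k / k!, in exact integers."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))
```

The ratio `derangement_count(n) / factorial(n)` is the derangement probability that serves as the permutation-side analog of the eigenvalue-free probability.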
For matrices over , the cycle index generating functions can be described by first giving some information on the conjugacy classes. Let be a partition of some non-negative integer into integer parts . In what follows denotes the number of parts of of size , is the partition dual to , and denotes .
Recall that we define the characteristic polynomial of an matrix as . Assume that the irreducible decomposition of the characteristic polynomial of a matrix has the form . The rational canonical form of the conjugacy class containing is a matrix of form
[TABLE]
where each matrix has the form
[TABLE]
and . Also we have the constraint that . Here for , the companion matrix is defined as
[TABLE]
In other words, we have the decomposition of where the characteristic polynomial of on is , and furthermore where are cyclic subspaces with dimension .
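A companion matrix can be built and checked mechanically. The sketch below fixes one common convention (ones on the subdiagonal and the negated coefficients in the last column), which is an assumption since the displayed matrix is not legible here, and verifies via the Cayley-Hamilton theorem that the polynomial annihilates its companion matrix modulo $p$. The helper names are hypothetical.

```python
def companion(coeffs, p):
    """Companion matrix mod p of x^m + a_{m-1} x^{m-1} + ... + a_0.

    coeffs = [a_0, ..., a_{m-1}]. Assumed convention: ones on the
    subdiagonal, last column equal to (-a_0, ..., -a_{m-1}).
    """
    m = len(coeffs)
    c = [[0] * m for _ in range(m)]
    for i in range(1, m):
        c[i][i - 1] = 1
    for i in range(m):
        c[i][m - 1] = (-coeffs[i]) % p
    return c


def mat_mul(a, b, p):
    """Product of two square matrices mod p."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]


def poly_at_matrix(coeffs, mat, p):
    """Evaluate the monic polynomial with the given lower coefficients at mat (Horner)."""
    n = len(mat)
    ident = [[int(i == j) for j in range(n)] for i in range(n)]
    result = ident  # leading coefficient 1
    for a in reversed(coeffs):
        result = mat_mul(result, mat, p)
        result = [[(result[i][j] + a * ident[i][j]) % p for j in range(n)]
                  for i in range(n)]
    return result
```

Other references transpose this matrix or place the coefficients in the first row; all of these variants share the same characteristic polynomial.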
Note that in the data given above, each irreducible polynomial is assigned a partition . For example for , then , and for all other ; while for
[TABLE]
then , and for all other .
To introduce the cycle index formula for , let be variables corresponding to pairs of polynomials and partitions. Define
[TABLE]
Beautiful results of Kung [22] and Stong [36] showed that
[TABLE]
Note that one can also define similarly. The above formula allows one to study class functions for matrices over . We now use it to prove Equation (2); Equations (3) and (4) can be proved similarly by specializing the variables appropriately.
Proof.
(of Equation (2)) In the cycle index formula above, by specializing the variables we may count different subsets of . For instance, if we set we count every matrix, so
[TABLE]
We want to count matrices with no one-dimensional invariant subspace. In terms of this is the same as setting for linear and otherwise. Making this assignment and using (7), we have
[TABLE]
where is the number of derangements in . Now the coefficient of in this generating function tends to the first term in the product evaluated at , and by a result of Fine-Herstein [13] we have (with )
[TABLE]
Some cursory analysis (using the fact that the asymptotic behavior of the sum is determined by its first term) shows
[TABLE]
as desired. ∎
3. Structures of vectors in : an almost optimal characterization
Let be an (additive) abelian group. A set is a generalized arithmetic progression (GAP) of rank if it can be expressed in the form
[TABLE]
for some elements of , and for some integers and . One can think of Q as the image of an integer box under the linear map
[TABLE]
Given with a representation as above, the numbers are generators of , the numbers and are dimensions of , and is the volume of associated to this presentation (i.e. this choice of ). We say that is proper for this presentation if the above linear map is one to one, or equivalently if . For an integer , we let denote the dilation of by , i.e.
[TABLE]
and we say is -proper if is also proper. If for all and , we say that is symmetric for this presentation. A coset progression in is a set of type , where is a subgroup of .
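The definitions above can be explored on small examples. This sketch assumes the standard form of a generalized arithmetic progression, $Q = \{a_0 + x_1 a_1 + \dots + x_r a_r\}$ with each integer $x_i$ ranging over an interval, inside the group $\mathbb{Z}/p\mathbb{Z}$. It lists the elements of $Q$ and tests properness by comparing the number of distinct elements with the volume. Parameter names are hypothetical.

```python
from itertools import product


def gap_elements(base, generators, bounds, p):
    """Elements of the GAP in Z/pZ; bounds is a list of inclusive (lo, hi) pairs."""
    elems = set()
    for xs in product(*(range(lo, hi + 1) for lo, hi in bounds)):
        elems.add((base + sum(x * a for x, a in zip(xs, generators))) % p)
    return elems


def gap_volume(bounds):
    """Number of points in the integer box, i.e. the volume of the presentation."""
    vol = 1
    for lo, hi in bounds:
        vol *= hi - lo + 1
    return vol


def is_proper(base, generators, bounds, p):
    """Proper means the map from the integer box to the group is one to one."""
    return len(gap_elements(base, generators, bounds, p)) == gap_volume(bounds)


def is_symmetric(bounds):
    """Symmetric presentation: every coefficient range is of the form [-M, M]."""
    return all(lo == -hi for lo, hi in bounds)
```

With generators 1 and 10 and coefficient ranges $\{0, 1, 2\}$ modulo 101 the nine sums are distinct, while generators 1 and 2 with the same ranges produce collisions such as $2 \cdot 1 = 1 \cdot 2$, so that presentation is not proper.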
Our main result here is that, if a random walk in does not spread out evenly fast, then the steps must be arithmetically correlated (and vice versa).
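The quantity at stake, the discrepancy of the walk from the uniform distribution on $\mathbb{F}_p$, can be computed exactly for small examples by convolving the step distributions one at a time. The sketch below is illustrative only: it takes the discrepancy to be $\max_r |\mathbf{P}(S = r) - 1/p|$, which is an assumption about the normalization, and uses hypothetical function names.

```python
from fractions import Fraction


def walk_law(weights, step_law, p):
    """Exact law of sum_i x_i * w_i mod p, with x_i iid distributed as step_law.

    step_law maps integer values to probabilities (as Fractions).
    """
    law = [Fraction(0)] * p
    law[0] = Fraction(1)
    for w in weights:
        new = [Fraction(0)] * p
        for r, mass in enumerate(law):
            if mass:
                for v, prob in step_law.items():
                    new[(r + v * w) % p] += mass * prob
        law = new
    return law


def discrepancy(law):
    """Largest deviation of a point mass from the uniform value 1/p."""
    p = len(law)
    return max(abs(x - Fraction(1, p)) for x in law)
```

A weight vector with all entries zero never leaves the origin and has discrepancy $1 - 1/p$, the extreme structured case; comparing structured and generic weight vectors for the same $n$ and $p$ gives a feel for the dichotomy described above.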
Theorem 3.1 (Arithmetic structure, characterization I).
Let and be positive constants. Suppose is a random variable that is -balanced taking values in and that is such that
[TABLE]
where are independent and identically distributed copies of and is an odd prime possibly depending on . Then for any , there is a set of components such that one of the following holds.
- •
For , there exists a GAP of rank one that contains , where
[TABLE]
- •
For , there exists a proper symmetric GAP of rank that contains , where
[TABLE]
Note that our characterization is almost optimal in the sense that it nearly implies the backward direction: if satisfies the conclusion of the theorem, then . It also implies that for (for any ), recovering a result by Maples (see Theorem 4.1). This is because if a positive portion of the are non-zero, then the set must have size at least and . Our presentation here follows [30] with some modifications (in [30] we focused only on large , and on the quantity rather than on as above). We will make use of two results from [39] by Tao and Vu. The first result allows one to pass from coset progressions to proper coset progressions without any substantial loss.
Theorem 3.2 ([39, Corollary 1.18]).
There exists a positive integer such that the following statement holds. Let be a symmetric coset progression of rank and let be an integer. Then there exists a -proper symmetric coset progression of rank at most such that we have
[TABLE]
We also have the size bound
[TABLE]
The second result, which is directly relevant to us, says that as long as grows slowly compared to , then it can be contained in a structure. This is a long-range version of the Freiman-Ruzsa theorem.
Theorem 3.3 ([39, Theorem 1.21]).
There exists a positive integer such that the following statement holds: whenever and is a non-empty finite set such that
[TABLE]
then there exists a proper symmetric coset progression of rank and size and such that
[TABLE]
Note that any GAP is contained in a symmetric GAP . Thus, by combining Theorem 3.3 with Theorem 3.2 we obtain the following.
Corollary 3.4.
Whenever and is a non-empty finite set such that
[TABLE]
then there exists a 2-proper symmetric coset progression of rank and size such that
[TABLE]
Proof.
(of Theorem 3.1) First, for convenience we will pass to symmetric distributions. Let be the symmetrization and let be a lazy version of so that
[TABLE]
Notice that is symmetric as is symmetric. We can check that , and so
[TABLE]
We assume that for , and that , where for all and . Denote . Consider where is maximum (or minimum). Using the standard notation for , we have
[TABLE]
So
[TABLE]
By independence
[TABLE]
It follows that
[TABLE]
where we made the change of variable (in ) and used the triangle inequality.
By convexity, we have that for any , where is the distance of to the nearest integer. Thus,
[TABLE]
Hence for each
[TABLE]
Consequently, we obtain a key inequality
[TABLE]
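The Fourier expansion that opens this proof rests on the standard inversion identity on $\mathbb{F}_p$: for independent steps, $\mathbf{P}(S = r) = \frac{1}{p} \sum_{t \in \mathbb{F}_p} e_p(-tr) \prod_i \mathbf{E}\, e_p(t x_i w_i)$ with $e_p(x) = \exp(2\pi i x / p)$. The sketch below checks this identity numerically against direct enumeration for steps uniform on a small set of values. It illustrates the plain identity only, not the symmetrized lazy walk used in the proof, and the function names are hypothetical.

```python
import cmath
from itertools import product


def point_mass_fourier(weights, values, r, p):
    """P(sum_i x_i w_i = r mod p) via Fourier inversion; x_i uniform on `values`."""
    total = 0
    for t in range(p):
        term = cmath.exp(-2j * cmath.pi * t * r / p)
        for w in weights:
            # Characteristic function of one step x * w at frequency t.
            term *= sum(cmath.exp(2j * cmath.pi * t * v * w / p)
                        for v in values) / len(values)
        total += term
    return (total / p).real


def point_mass_direct(weights, values, r, p):
    """The same probability by enumerating all outcomes (small inputs only)."""
    hits = sum(1 for xs in product(values, repeat=len(weights))
               if sum(x * w for x, w in zip(xs, weights)) % p == r)
    return hits / len(values) ** len(weights)
```

The term $t = 0$ contributes exactly $1/p$, so the deviation from uniformity is carried entirely by the non-zero frequencies, which is why bounds on the products of characteristic functions control the discrepancy.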
Large level sets. Now we consider the level sets . We have
[TABLE]
As , there must be a large level set such that
[TABLE]
In fact, since , we can assume that . The bound guarantees that is non-empty. Now we consider two cases.
Case 1. We assume . We know that is non-empty, and hence there exists so that
[TABLE]
Set
[TABLE]
Then by definition of , we have
[TABLE]
Thus we can rewrite the above as
[TABLE]
Thus there exists an index so that , that is
[TABLE]
So, for most
[TABLE]
More precisely, by averaging, the set of satisfying (15) has size at least . We call this set . The set has size at most and this is the exceptional set that appears in Theorem 3.1. By definition, for from this set we have
[TABLE]
Hence we have seen that, after a dilation by , belongs to the arithmetic progression of rank one and of size ,
[TABLE]
Notice that in this case we don’t have to assume .
Case 2. We assume . By double-counting we have
[TABLE]
So, for most ,
[TABLE]
for some large constant .
By averaging, the set of satisfying (17) has size at least . We call this set . The set has size at most and this is the exceptional set that appears in Theorem 3.1. In the rest of the proof, we are going to show that is a dense subset of a proper GAP.
Since is a norm, by the triangle inequality, we have for any
[TABLE]
More generally, for any and
[TABLE]
Dual sets. Define
[TABLE]
where the constant is ad hoc and any sufficiently large constant would do. We have
[TABLE]
To see this, define . Using the fact that for any , we have, for any
[TABLE]
On the other hand, using the basic identity , we have (taking into account that )
[TABLE]
Equation (20) then follows from the last two estimates and averaging.
Next, for a properly chosen constant we set
[TABLE]
By (19) we have . Next, set
[TABLE]
We have . This results in the critical bound
[TABLE]
We are now in a position to apply Corollary 3.4 with as the set of distinct elements of . As ,
[TABLE]
It follows from Corollary 3.4 that is a subset of a 2-proper symmetric coset progression of rank and cardinality
[TABLE]
Now we use the special property of $\mathbb{F}_p$ that its only proper subgroup is the trivial one. As , and as , the only way that is that . Consequently, is now a subset of , a 2-proper symmetric GAP of rank and cardinality
[TABLE]
To this end, we apply the following dividing trick from [28, Lemma A.2].
Lemma 3.5.
Assume that and that is a 2-proper symmetric GAP that contains . Then .
Combining (23) and Lemma 3.5 we thus obtain a GAP that contains and
[TABLE]
concluding the proof. ∎
Before concluding the section, we record here an elementary but useful result beyond the polynomial regime.
Theorem 3.6 (degenerate case).
Let and be positive constants such that . Let be an odd prime number. Suppose is a random variable that is -balanced taking values in . Also, assume that is such that
[TABLE]
where are independent and identically distributed copies of . Then for any , there is a set of components and then a GAP of rank one that contains , where
[TABLE]
We note that the bound on above is very close to the trivial bound . The result is effective for not too large .
Proof.
We proceed as in the proof of Theorem 3.1 until (12) that
[TABLE]
We recall the level sets . We have
[TABLE]
As , there must be a large level set such that
[TABLE]
In fact, since , we can assume that . The bound guarantees that is non-empty. Our next step is almost identical to the proof of the first part of Theorem 3.1. As is non-empty, there exists so that
[TABLE]
With as in (14) we have , and we can rewrite the above as
[TABLE]
Thus there exists an index so that , that is
[TABLE]
So, with being the set of such that , the set has at least elements. By definition, for we have , and this implies that, after a dilation by , the set belongs to the arithmetic progression of rank one
[TABLE]
Notice that the size of is bounded by as desired.
∎
4. Structures of vectors in : a geometric approach
From now on, for simplicity we will assume our random variables are iid Bernoulli (taking values with probability 1/2) and ; the general $\alpha$-balanced case can be treated almost identically (see Remark 4.9).
Let be a non-zero vector in , where is a prime. We first cite a result of Erdős-Littlewood-Offord type from [23].
Theorem 4.1.
Let be a constant, and assume that
[TABLE]
Then
[TABLE]
where the implied constant depends on , and where and are iid Bernoulli.
In what follows, if not specified, we always assume our deterministic vector to satisfy the non-sparsity property (25). We remark that this non-sparsity property passes to all other dilations of in for non-zero .
As mentioned in the introduction, our treatment in this section is motivated by the work of Rudelson and Vershynin (in characteristic zero) [32], and we hope to develop a “geometric” characterization of the steps of our random walk in if the walk spreads out slowly. This task is not straightforward; as we will see, there are many simple concepts in characteristic zero for which it is hard to find natural (and equally useful) analogs in the finite field setting (for instance, the notion of compressible and incompressible vectors).
In some situations, if is a vector in , then by viewing as the interval in , we will consider the components as integers from this interval. We then write as the vector in
[TABLE]
Definition 4.2.
Let and be given. Let be a non-zero vector in where . We denote by the smallest (infimum) positive integer such that
[TABLE]
where denotes the smallest Euclidean distance from to an element of .
Throughout this paper, is an absolute constant (such as ), and , for some positive constant to be chosen.
This definition is in characteristic zero. Here we used the notion of (compared to the notion of from [32]) to emphasize that is not normalized (i.e. its -norm might not be unit). Notice that
[TABLE]
Furthermore, if then by definition we would have for all that
[TABLE]
Remark 4.3.
Note that if for some we have for all , then
[TABLE]
This is because otherwise, and hence , which cannot be smaller than by definition as .
Our result below says that if is large then the concentration probability is small. In our notation is another vector in , which again can be viewed as a vector in . We then define accordingly in this projection to characteristic zero.
Theorem 4.4 (Geometric structure, characterization II).
Let be a prime, and let be an arbitrary constant. Let be a non-zero vector in , where , and let . Then
(1) If there is no non-zero such that then we have
[TABLE]
(2) Otherwise, assume that and satisfies (25), with and
[TABLE]
Then
[TABLE]
where the implied constants depend on , and where and are iid Bernoulli in the concentration definition .
Corollary 4.5.
Assume that there exists a quantity such that . Then there exists a dilation of , where is non-zero, so that with we have and there exists such that
[TABLE]
and
[TABLE]
We next deduce another elementary but useful result, which will be used later on in the applications.
Corollary 4.6.
Assume that has at least non-zero coordinates, and . We then have
[TABLE]
Proof.
As has at least non-zero coordinates for any non-zero , we have that
[TABLE]
and we are in scenario (1) of Theorem 4.4. ∎
We now present a proof of Theorem 4.4.
Proof.
(of Theorem 4.4) Write , then for any we have
[TABLE]
So
[TABLE]
where we used the fact that for any , where is the distance of to the nearest integer, and that
[TABLE]
From here, (1) follows as
We are now in the assumption of (2). For each integer , let be the (level) set of corresponding to ,
[TABLE]
By the non-sparsity of , we can show that is not all of (we can show this by using the fact that if then ). Thus
[TABLE]
Our next claim shows that the level sets consist of well separated intervals.
Claim 4.7 (spacing of the level sets).
Assume that and and
[TABLE]
Then
[TABLE]
Consequently we have
[TABLE]
Proof.
(of Claim 4.7) Assume that , then by the triangle inequality,
[TABLE]
Thus
[TABLE]
where in the last estimate we used . Thus by the definition of we must have
[TABLE]
∎
Next, we will need a Cauchy-Davenport-type bound on the size of sumsets in . Observe that, by the Cauchy-Schwarz inequality, , and so
[TABLE]
where we view these sets as subsets of . Hence, by Cauchy-Davenport’s inequality in ([37]) we have that
[TABLE]
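The Cauchy-Davenport inequality used here states that for non-empty subsets $A, B$ of $\mathbb{Z}/p\mathbb{Z}$ with $p$ prime, $|A + B| \ge \min(p, |A| + |B| - 1)$. The sketch below (hypothetical helper names) verifies it exhaustively for small primes and shows that it can fail for a composite modulus, which is why primality matters in this step.

```python
from itertools import combinations


def sumset(a, b, m):
    """The sumset A + B inside Z/mZ."""
    return {(x + y) % m for x in a for y in b}


def cauchy_davenport_holds(a, b, m):
    """Check |A + B| >= min(m, |A| + |B| - 1) for one pair of sets."""
    return len(sumset(a, b, m)) >= min(m, len(a) + len(b) - 1)


def check_all_subsets(m):
    """Check the inequality for every pair of non-empty subsets of Z/mZ."""
    subsets = [set(c) for k in range(1, m + 1)
               for c in combinations(range(m), k)]
    return all(cauchy_davenport_holds(a, b, m) for a in subsets for b in subsets)
```

For the composite modulus 4, the subgroup $\{0, 2\}$ satisfies $A + A = A$, so the sumset has only two elements where the inequality would require three.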
Thus for all , by choosing we have
[TABLE]
where we used .
We deduce
[TABLE]
completing the proof of the theorem. ∎
Remark 4.8.
Under the assumption of (2) of Theorem 4.4, we have actually shown a stronger estimate that
[TABLE]
We note that Theorem 4.1 can be deduced from Theorem 4.4 by setting with sufficiently large ; for this there is a dilation with of order but . We then just apply (2) of Theorem 4.4, noting that .
Remark 4.9.
When are iid copies of an -balanced random variable, then by Equation (12), by convexity, and by the fact that we have
[TABLE]
It thus boils down to studying bounds for the concentration probability of , which we have done in the proof of Theorem 4.4.
4.10. Some properties of ULCD
Roughly speaking, our next result is similar to Theorem 4.4, but instead of working with the concentration event we are working with a coarser event that belongs to an arc in . We find it more convenient to write in mod 1 as follows.
Theorem 4.11 (anti-concentration modulo one).
Assume that . Assume that
[TABLE]
Then for any
[TABLE]
we have
[TABLE]
where are iid Bernoulli.
Note that we need this result because at some point we need to pass to characteristic zero, and take distance to . A key difference of this bound compared to the classical small ball estimate (say studied in [32]) is that we are looking at the balls modulo one, rather than with respect to the whole real line.
Proof.
(of Theorem 4.11) Let be the distribution of modulo one, where we can write , and where . Let . We use the Erdős-Turán inequality,
[TABLE]
As are iid Bernoulli, bounding the cosine as in the proof of Theorem 4.4 we have
[TABLE]
Now by definition of , as , for any we have
[TABLE]
and so
[TABLE]
Summing over all we have
[TABLE]
as desired. ∎
We remark that the bound above depends on , which becomes almost meaningless if is small, say of order . To avoid this situation, we will need to consider vectors that have large size and large at the same time.
Our next result roughly says that a non-sparse vector cannot have very small , at least with respect to with not too large and not too small . To be more precise, we have the following.
Remark 4.12.
As we will be working with vectors satisfying (25), we easily see that for any in
[TABLE]
Notice that this quantity is larger than if , and in this case the first part of Theorem 4.4 holds, and hence automatically
[TABLE]
As such, in what follows we will be working with
[TABLE]
Lemma 4.13 (LCD and size in fields of small order).
Assume that for a positive constant . Assume that is a prime smaller than , and is a vector satisfying (25) and such that . Then there exists so that with we have has order and either (in which case we can apply (26)) or else
[TABLE]
We remark that this result is perhaps the most important one in our treatment, as it allows us to assume that the is sufficiently large for the bounds to be meaningful. In characteristic zero, this bound is straightforward if the vector is incompressible (being far from sparse vectors).
Before proving this lemma, we first need the following simple statement.
Claim 4.14.
Assume that is a non-zero vector satisfying (25) and such that , with . Then there exists so that with we have
[TABLE]
Proof.
(of Claim 4.14) As satisfies (25) and , (1) of Theorem 4.4 does not apply, and so there is a fiber such that . If the we would be done. Otherwise we just consider the sequence , etc. By the triangle inequality (where we recall that ) we have
[TABLE]
On the other hand, by (25) has order , so there must exist a smallest such that . It then follows that . ∎
Proof.
(of Lemma 4.13) Assume that we are not in the first case, and also assume to the contrary that we are not in the second case either. We will iterate the following process, which will then result in a contradiction. Set
[TABLE]
We start from any in the fiber of with .
Step 1: Let , then . Let , then we have
[TABLE]
Step 2: If this vector has norm smaller than , then we use Claim 4.14 to dilate appropriately by so that , and set
[TABLE]
We then return to Step 1 and iterate the process. Note that while the are bounded by , we don’t have such a bound for the .
Now for each we can always write
[TABLE]
where . Indeed, to verify this we first divide by and get a remainder ; we then divide the quotient by to get a remainder , and then divide the new quotient by , etc., until the last step. Now as (this is where we require to be small), and as , we must stop the division process after steps.
Next we analyze the norm of . We write, with
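The successive-division step just described is a mixed-radix expansion. The sketch below illustrates it with hypothetical names (`k` for the integer being expanded and `moduli` for the sequence of divisors, which must be positive integers): each division contributes one remainder, and the integer is recovered from the remainders by running the process in reverse.

```python
def mixed_radix_digits(k, moduli):
    """Divide k by moduli[0], then the quotient by moduli[1], and so on.

    Returns (remainders, leftover quotient after the last division).
    """
    digits = []
    for d in moduli:
        k, r = divmod(k, d)
        digits.append(r)
    return digits, k


def reconstruct(digits, leftover, moduli):
    """Invert mixed_radix_digits: rebuild the integer from its remainders."""
    total = leftover
    for r, d in zip(reversed(digits), reversed(moduli)):
        total = total * d + r
    return total
```

Each remainder is smaller than its divisor, and the leftover quotient becomes zero once the product of the divisors exceeds `k`, which mirrors the reason the division process terminates after a bounded number of steps.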
[TABLE]
Thus by the triangle inequality, and as we have
[TABLE]
We next consider
[TABLE]
By the triangle inequality
[TABLE]
where in the last estimate we used the fact that and is the largest integer so that (where we recall that , and the role of was only to dilate this vector if its norm was much smaller than this, as in the proof of Claim 4.14). The analysis for and other terms can be done similarly.
Adding all the bounds, we hence obtain
[TABLE]
Now as this is true for all , we thus have
[TABLE]
On the other hand, by (25), as has at least non-zero entries, the left hand side can be shown to be at least , which is a contradiction if . ∎
With the same proof, we record the following corollary which will be used later.
Corollary 4.15** (ULCD cannot be small).**
Assume that and that for . Assume that , for , and has at least non-zero components. Then we have either for all , or there exists such that and that
[TABLE]
5. Structures of vectors in : a combinatorial approach
Now we present our third characterization. Let be an -balanced distribution in . For simplicity, we again assume to be Bernoulli ; the general -balanced case can be treated almost identically as in the previous two sections. Our goal here is the following.
Theorem 5.1** (Combinatorial structure, characterization III).**
Let be an integer. Let be any function such that . For any non-zero vector we have
[TABLE]
where are independent and identically distributed copies of , and where is the number of solutions to , and .
Our approach here is somewhat similar to [8], which in turn follows the original approach of Halász in [18]. However, the key difference here is that we are estimating the deviation of from rather than giving an upper bound for as in [8].
Proof.
(of Theorem 5.1) We follow the proof of Theorem 4.4 until Equation (27) that
[TABLE]
where .
Denote . Then we see that
[TABLE]
By Markov’s inequality,
[TABLE]
By expanding out the RHS and summing over instead, we can bound the RHS from above by
[TABLE]
The rest can be completed as in Theorem 4.4:
[TABLE]
as claimed. ∎
6. Non-structures of normal vectors
In this section we use the three characterizations above to establish Theorem 1.5. First, it is easy to show that normal vectors are non-sparse with high probability.
Lemma 6.1**.**
There exists an absolute constant such that with probability any normal vector of satisfies (25).
Proof.
This follows from Odlyzko’s lemma, see for instance [30, 27]. ∎
Now we use the results from Sections 3, 4, and 5 to show that the normal vectors cannot have any structure. In our first proposition, we use the structure from Section 3.
Proposition 6.2** (Normal vectors cannot have additive structures).**
Let . Let be the first columns of a matrix whose entries are iid copies of a -balanced random variable, where for some sufficiently small constant . Let be any non-zero vector that is orthogonal to . Then with probability at least , the vector cannot have structure as in the conclusion of Theorem 3.1. In particular, we have
[TABLE]
where the implied constant depends on .
Proof.
(of Proposition 6.2) First of all, from Lemma 6.1, with a loss of in probability we can assume that is not sparse. Assume that
[TABLE]
where . Also, by Corollary 4.6, it suffices to assume
For convenience, let , and so . Then by Theorem 3.1, we have a generalized arithmetic progression of rank in and of size that contains all but entries of . Note that the number of ways to choose such a is bounded by
[TABLE]
Given , the number of vectors whose components are from is at most
[TABLE]
provided that .
Given for which , the probability that is orthogonal to is bounded by
[TABLE]
provided that for some small positive constant .
Taking union bound over only choices of , we obtain the claim. ∎
We next use the result from Section 4 to show that the random normal vector does not have small .
Proposition 6.3** (Normal vectors cannot have small ULCD).**
Assume that and that for . Let be the first columns of a random Bernoulli matrix, where . Let be any non-zero vector that is orthogonal to . Then with probability at least , we have
[TABLE]
with some depending on and . In particular, Theorem 1.5 holds.
We remark that in the above proposition we assume to be a Bernoulli matrix. Our treatment also works for other integral matrices with (here the random entries of take values in , although in our results we view as a matrix of entries from (or ) via the natural map ), but it does not seem to extend to -balanced ensembles as in Proposition 6.2 (although Theorem 4.4 holds for this setting). The main reason is that at some point in the proof we pass to a net of vectors in , and then under the action of the size of this net will blow up if has large norm, see (28).
Proof.
(of Proposition 6.3) We will show that with high probability, there does not exist in the fiber of such that and that
[TABLE]
To do this, we divide this range into dyadic intervals . For , let
[TABLE]
Lemma 6.4** (Size of the approximating net).**
Let be given sufficiently small compared to (where ). accepts a -net of size if and of size if and such that .
Before proving this result by following [32], let us introduce a fact that will be useful for our nets.
Fact 6.5**.**
Assume that accepts a -net of size , then also accepts a -net such that and which has size at most .
Proof.
By throwing away vectors from if needed, we assume that each -approximates at least one vector from . Let be a collection of such (we choose an arbitrary from that is -approximated by any .) Thus . Now for any , there exists such that , and by definition there also exists such that . Thus we have , so is a -net of . ∎
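The argument in this proof is constructive and can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the set `S`, the net `net`, the metric `dist`, and the function names are all hypothetical stand-ins, and a finite set of integers with the absolute-value distance replaces the vectors considered in the text.

```python
def move_net_inside(S, net, delta, dist):
    """Given a delta-net `net` of S (its points may lie outside S), return a
    list of points of S, no longer than `net`, that forms a 2*delta-net of S."""
    inside = []
    for y in net:
        # Pick an arbitrary point of S that y delta-approximates, if any;
        # net points approximating nothing in S are thrown away.
        rep = next((x for x in S if dist(x, y) <= delta), None)
        if rep is not None:
            inside.append(rep)
    return inside


def is_net(S, net, radius, dist):
    """Check that every point of S is within `radius` of some net point."""
    return all(any(dist(x, y) <= radius for y in net) for x in S)


# Hypothetical toy data: integers on the line with the usual distance.
S = [0, 2, 4, 6, 20, 22]
net = [3, 21, 100]                      # a 3-net of S, none of it inside S
dist = lambda a, b: abs(a - b)
inner = move_net_inside(S, net, 3, dist)
print(inner, is_net(S, inner, 6, dist))
```

The doubling of the radius comes from one application of the triangle inequality, exactly as in the proof.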
Proof of Lemma 6.4.
By taking union bound over a small number of choices (at most choices) we assume that for some we have
[TABLE]
By definition, as and , there exists such that
[TABLE]
This implies that
[TABLE]
and hence
[TABLE]
Thus
[TABLE]
Now as , we also have and so
[TABLE]
Let be the collection of vectors , where ranges over choices in the set , and ranges over all integer vectors in satisfying .
Now we bound the size of based on the magnitude of .
Case 1. If , then the number of integral vectors of norm at most is known to be bounded by , and so
[TABLE]
Case 2. If , where is sufficiently small, then all but entries of are zero. So the number of such vectors is bounded by , and so
[TABLE]
Finally, we can always assume to consist of vectors from by using Fact 6.5. ∎
Now we use the obtained net to show that normal vectors in iid matrices cannot have small .
In short, the method below works as follows: for (viewed as vectors in ) we have , where . Then we approximate this vector by an element from the obtained net, and then pass to consider the probability from each net element. After approximation, we have that is close to in -norm, and so we can apply the classical Erdős-Turán bound.
Now we complete the proof of the proposition. Assume otherwise; then by the argument above, by passing to an appropriate , we can assume that , and that for some from dyadic intervals. As is orthogonal to in , we then have the following key property for
[TABLE]
where is the matrix formed by .
By Lemma 6.4, there exists such that
[TABLE]
It is well known that with probability at least (we note that this is the only place where we used to prevent the net from expanding), and so we will condition on this event. We then have
[TABLE]
Therefore,
[TABLE]
Let be this event, whose probability will be bounded shortly. By Theorem 4.11, as obviously , we have
[TABLE]
where in the last estimate we used the fact that with sufficiently small .
By Lemma A.1 we thus have for some absolute positive constant
[TABLE]
Putting together using union bound over all from the net, as , we obtain in the case a bound
[TABLE]
Note that here we have to assume at least.
Also, in the second case that , noting that
[TABLE]
assuming that is sufficiently large compared to , and that . ∎
In our last result of this subsection, by using the terminology of Section 5, we show the following.
Proposition 6.6** (Normal vectors cannot have combinatorial structure).**
Assume that and that for . Let be the first columns of a matrix whose entries are iid copies of an -balanced random variable, where for some sufficiently small constant . Let be any non-zero vector that is orthogonal to . Then with probability at least , we have that
[TABLE]
where and are constants that only depend on and .
Note that this result holds for -balanced ensembles where we don’t have to assume .
Let be any vector in . We first record the following elementary relation (where we recall from Theorems 3.1 and 4.1).
Fact 6.7**.**
For any we have
[TABLE]
Proof.
It suffices to show this for . We first write
[TABLE]
and hence
[TABLE]
where we note that both sides are non-negative.
We can bound the minimum in a similar fashion
[TABLE]
and so
[TABLE]
where we note that both sides are non-positive.
Putting this together, we thus obtain
[TABLE]
completing the proof. ∎
We next need the following key definitions and results from [8].
Definition 6.8**.**
For an , and , we define to be the number of solutions to
[TABLE]
that satisfy .
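For small parameters these solution counts can be computed by brute force. The sketch below assumes the convention of [8]: a solution is a tuple of 2k indices together with a choice of signs such that the signed sum of the corresponding coordinates vanishes modulo p, and the restricted count keeps only tuples with at least a prescribed number of distinct indices. The function name and the `min_distinct` parameter are illustrative assumptions.

```python
from itertools import product


def count_signed_solutions(w, k, p, min_distinct=0):
    """Count (indices, signs) with sum_j signs[j] * w[indices[j]] == 0 mod p,
    over all index tuples of length 2k with at least `min_distinct`
    distinct indices. Exponential in k; intended for tiny examples only."""
    n = len(w)
    count = 0
    for idx in product(range(n), repeat=2 * k):
        if len(set(idx)) < min_distinct:
            continue
        for signs in product((1, -1), repeat=2 * k):
            if sum(s * w[i] for s, i in zip(signs, idx)) % p == 0:
                count += 1
    return count


# Hypothetical example over F_5 with k = 1.
print(count_signed_solutions((1, 4), 1, 5))                  # all solutions
print(count_signed_solutions((1, 4), 1, 5, min_distinct=2))  # distinct indices only
```

Solutions with repeated indices always exist (a coordinate cancels against itself), which is why the restricted count is the informative quantity for detecting arithmetic structure.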
We will make use of the observation from [8] that is never much larger than .
Lemma 6.9** (Lemma 1.6, [8]).**
For all integers with and any prime , and ,
[TABLE]
As we will have the occasion to deal with subsets of vectors which we consider as vectors in their own right, we introduce the notation to mean the dimension of a vector . By we mean that is a truncation of . The key technical result in [8] is the following combinatorial lemma, which helps control the number of vectors with many “local” arithmetic relations.
Lemma 6.10** ([8, Theorem 1.7]).**
Denote
[TABLE]
Then
[TABLE]
At this point, we fix , and define
[TABLE]
So roughly speaking, this is the set of which are not arithmetically rich. As Lemma 6.10 suggests, this set captures most of the vectors. More precisely we have the following (see also [8]).
Corollary 6.11**.**
If and for , then for
[TABLE]
Proof.
(of Corollary 6.11) We can assume that , otherwise the statement is trivially true as the left-hand side is zero. Fix a subset with and enumerate the vectors with . By assumption, so the restriction of to the set is an element of . Therefore, Lemma 6.10 guarantees that the number of possible choices for is at most
[TABLE]
where the second inequality follows from our assumption that . We obtain the final result by summing over all subsets . ∎
The next lemma is a simple consequence of Theorem 5.1.
Lemma 6.12**.**
Suppose that . If and for , then if , there exists a constant such that
[TABLE]
Proof.
(of Lemma 6.12) Let be a subvector of with and . In the notation of Theorem 5.1, if we let then
[TABLE]
This expression is dominated by the first term by our bound on the range of and our choice of . We recall that
[TABLE]
to finish the proof. ∎
Now we complete one of our main results of the section.
Proof.
(of Proposition 6.6) We let denote the vectors in with support larger than and the set of non-zero vectors such that
[TABLE]
By Corollary 4.6, we can assume . Observe that
[TABLE]
By Lemma 6.1,
[TABLE]
Therefore, it suffices to focus on the vectors in . Note that any must reside in since .
There are two cases to consider.
Case 1. We begin with vectors in . Let be such a vector, then by definition of and by Lemma 6.12,
[TABLE]
Because of the lower bound, by Theorem 3.6 there exists a generalized arithmetic progression of rank one in and of size that contains all but entries of . Note that the number of ways to choose such a is bounded by . For a fixed , the number of vectors with at least components in is at most
[TABLE]
The probability that any with at least components in is orthogonal to is bounded by
[TABLE]
provided that for some small constant . Finally, we take a union bound over choices of to conclude the proof.
Case 2. We address the remaining vectors. Let . We show that no vector in is orthogonal to . We can now partition as
[TABLE]
where is the smallest integer such that . Clearly, . We then have
[TABLE]
Combining Corollary 6.11 and Lemma 6.12, we have (noting trivially that )
[TABLE]
where the last line follows from small enough .
Combining the above estimates, we can conclude that
[TABLE]
as desired. ∎
7. Distribution of ranks revisited
In this section we give a short proof for Theorem 1.4. We start with a high-dimensional lemma, which, in some sense, is a discrete analog of [33], where they considered the distance of a random vector to a subspace of codimension in .
Lemma 7.1**.**
Assume that is a subspace in of codimension , and such that for any we have . Then
[TABLE]
Proof.
(of Lemma 7.1) Let be a basis of . Our assumption says that for any , not all zero, we have
[TABLE]
Note that in
[TABLE]
We have
[TABLE]
Now by our assumption
[TABLE]
Thus we have
[TABLE]
completing the proof. ∎
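A small numerical check can make the mechanism behind Lemma 7.1 concrete. The sketch below assumes a random vector with independent uniform 0/1 entries (an assumption made only for this illustration) and a subspace given as the common kernel of a list of normal vectors modulo p. It compares the exact probability of landing in the subspace with the standard additive-character expansion of the indicator, in which the trivial character contributes the main term p^{-d} and the remaining characters contribute the error. All function and variable names are hypothetical.

```python
import cmath
from itertools import product


def prob_in_subspace(normals, p):
    """Exact P(X in H) for X uniform on {0,1}^n, where H is the set of
    vectors orthogonal (mod p) to every vector in `normals`."""
    n = len(normals[0])
    hits = sum(
        all(sum(a * b for a, b in zip(x, w)) % p == 0 for w in normals)
        for x in product((0, 1), repeat=n)
    )
    return hits / 2 ** n


def prob_via_characters(normals, p):
    """Same probability via p^{-d} * sum over t in F_p^d of
    E exp(2*pi*i * <X, t_1 w_1 + ... + t_d w_d> / p)."""
    n, d = len(normals[0]), len(normals)
    total = 0
    for t in product(range(p), repeat=d):
        v = [sum(ti * w[j] for ti, w in zip(t, normals)) % p for j in range(n)]
        term = 1
        for vj in v:
            # E exp(2*pi*i * x * vj / p) for x uniform on {0, 1}
            term *= (1 + cmath.exp(2j * cmath.pi * vj / p)) / 2
        total += term
    return (total / p ** d).real


normals = [(1, 2, 3, 4, 5, 6), (1, 1, 1, 1, 1, 1)]  # hypothetical normals, p = 7
print(prob_in_subspace(normals, 7), prob_via_characters(normals, 7), 7 ** -2)
```

When every non-trivial combination of the normal vectors has a small character sum, the non-trivial terms are negligible and the probability is close to p^{-d}, which is the content of the lemma.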
Now we apply Propositions 6.2, 6.3 and 6.6 to prove the following.
Lemma 7.2**.**
Assume that and that for and . There exists an event with probability such that the following holds
[TABLE]
where is the subspace generated by .
Assuming this lemma, we can complete the proof of Theorem 1.4 by direct calculations, or by applying [30, Theorem 5.3]; we leave this to the reader as an exercise.
Proof.
(of Lemma 7.2) We have seen from Propositions 6.3 and 6.6 that there is an event with such that for any we have
[TABLE]
Now conditioning on this event, if then the codimension of is , and hence by Lemma 7.1 we have
[TABLE]
as claimed. ∎
8. Equi-distribution of the normal vectors
In this section we prove Theorem 1.6. For convenience we decompose the task into two parts.
Proposition 8.1**.**
With the same assumption as in Theorem 1.6 we have
- For each , we have
[TABLE]
- For each , and for any we have
[TABLE]
Proof.
(of Proposition 8.1) We prove the first item of Proposition 8.1. It suffices to show this for . Fix . We seek to bound the probability of the event that our normal vector has under the condition that our first columns achieve full rank, i.e. Suppose we are given such a normal vector. Then restricting to the bottom rows, we see that this is equivalent to the event that the submatrix has a nontrivial nullspace. So we rewrite as is singular . We can simply view this as
By Theorem 1.4 (see also [27, Theorem A.4]), we know that
[TABLE]
Now consider the event This is the event that rows span a subspace of dimension and is not in the span of , which can be expressed as . For , we again use Theorem 1.4,
[TABLE]
For , our previous section tells us that if we condition on rows having rank equal to , then our normal vector exists and has large , i.e. . So the probability that is in the span of under this condition is
[TABLE]
Putting this all together, we have
[TABLE]
Now we prove the second item. It suffices to assume . The event that is equivalent to the event that , where is the -th row of our matrix. If is zero, we are done via the previous argument, so assume is nonzero.
Let be the span of rows , which has full rank by our rank assumption on and the fact that . Further, let be the projection to the orthogonal complement For each evaluation of rows , is deterministic and where is the deterministic normal vector. Applying this projection to the linear combination, we have
[TABLE]
Since is deterministic, each inner product takes values with probability uniformly with error by Proposition 6.3. This means that , as the ratio of the inner products, is also uniformly distributed with probability and similar error. ∎
We next prove the second part of Theorem 1.6, restated for convenience.
Proposition 8.2**.**
Assume that . Let denote the number of such that . Then for any which might depend on such that , we have
[TABLE]
Proof.
(of Proposition 8.2) Let denote the set of vectors under consideration up to scaling, and be the complement. We first have the following elementary fact.
Fact 8.3**.**
We have
[TABLE]
Proof.
First, we note that each has distribution Bin() with variance
[TABLE]
Letting and denote the mean of this distribution, the upper-tail and lower-tail Chernoff inequalities combine to form the following bound:
[TABLE]
For each in , let denote the event that . Then a trivial union bound gives
[TABLE]
Since there are different choices for , the number of non-equidistributed vectors is at most . ∎
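The concentration step can be checked numerically. The sketch below compares the exact two-sided tail of a binomial count with the standard multiplicative Chernoff bound 2 exp(-δ²μ/3), valid for 0 < δ ≤ 1. The parameters (a count with success probability q = 1/p, the values of n and δ) and the function names are assumptions chosen for illustration, not the constants used in the proof.

```python
from math import comb, exp


def binomial_two_sided_tail(n, q, delta):
    """Exact P(|X - mu| >= delta * mu) for X ~ Bin(n, q), mu = n * q."""
    mu = n * q
    return sum(
        comb(n, k) * q ** k * (1 - q) ** (n - k)
        for k in range(n + 1)
        if abs(k - mu) >= delta * mu
    )


def chernoff_bound(n, q, delta):
    """Standard two-sided Chernoff bound 2 * exp(-delta^2 * mu / 3), 0 < delta <= 1."""
    return 2 * exp(-delta ** 2 * n * q / 3)


# Hypothetical parameters: n = 200 coordinates, prime p = 5, so q = 1/5.
n, q, delta = 200, 1 / 5, 0.5
print(binomial_two_sided_tail(n, q, delta), chernoff_bound(n, q, delta))
```

Multiplying such a tail bound by the number of residues gives the union bound over all values, and multiplying by the total number of vectors gives the count of non-equidistributed vectors.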
Now we complete our result. Let be an arbitrary vector in . We seek to upper bound is normal and ). Immediately we have is normal and is bounded above by
[TABLE]
Similar to our previous sections, we may decompose the sum into classes where is sparse and is non-sparse. By Lemma 6.1, the contribution over our sparse vectors is negligible. For our non-sparse vectors, we appeal to Proposition 6.3. We can now bound the sum via:
[TABLE]
as long as , completing the proof. ∎
9. Proof of Theorem 1.7
It suffices to prove Theorem 9.1 below. Again, for simplicity we will assume to be an iid Bernoulli matrix taking values with probability 1/2 and . We recall that has degree and is sufficiently large and
[TABLE]
Let be a root of and consider the field extension . Notice that any element of this field has form
[TABLE]
More importantly, the event is equivalent to the event that has rank at most in this field . In other words, let be the subspace in generated by the first columns of the matrix (equivalently, the columns of ), then the event that has rank at most is the union of the (disjoint) events that has rank and the -th column belongs to . In what follows we will be mainly focusing on the case of , treatments for will be discussed later, and summing over these events will imply Theorem 1.7.
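For small matrices the divisibility event can be tested directly, without passing to the extension field. The sketch below computes the characteristic polynomial of an integer matrix with the Faddeev-LeVerrier recursion (all divisions are exact over the integers), reduces it modulo p, and takes the remainder modulo a monic polynomial. This is a direct check rather than the rank formulation used in the text, and the function names are illustrative assumptions.

```python
def matmul(A, B):
    """Product of two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Integer coefficients [c_0, ..., c_n] of det(xI - A), with c_n = 1,
    computed by the Faddeev-LeVerrier recursion."""
    n = len(A)
    coeffs = [0] * (n + 1)
    coeffs[n] = 1
    M = [[0] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        # M_k = A M_{k-1} + c_{n-k+1} I
        M = [[AM[i][j] + (coeffs[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        trace = sum(matmul(A, M)[i][i] for i in range(n))
        coeffs[n - k] = -trace // k  # exact division
    return coeffs


def poly_mod(a, f, p):
    """Remainder of a modulo the monic polynomial f over F_p.
    Coefficient lists are ordered from lowest to highest degree."""
    a = [c % p for c in a]
    d = len(f) - 1
    for i in range(len(a) - 1, d - 1, -1):
        c = a[i]
        if c:
            for j in range(d + 1):
                a[i - d + j] = (a[i - d + j] - c * f[j]) % p
    return a[:d]


def char_poly_divisible_by(A, f, p):
    """True if f divides the characteristic polynomial of A over F_p."""
    return not any(poly_mod(char_poly(A), f, p))


# Hypothetical example: f = x^2 + x + 1 is irreducible over F_2.
f = [1, 1, 1]
A = [[0, -1, 0], [1, -1, 0], [0, 0, 1]]  # companion block of f, plus a 1x1 block
print(char_poly(A), char_poly_divisible_by(A, f, 2))
```

Counting how often this predicate holds over many random 0/1 matrices gives an empirical estimate that can be compared with the limiting probability in Theorem 1.7.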
Theorem 9.1**.**
There exists an absolute constant such that
[TABLE]
Consider the normal vector of (i.e. the column space of the matrix ). This vector can be written as
[TABLE]
where . Notice that as for we have . So we have that
[TABLE]
This implies that
[TABLE]
Let
[TABLE]
By fixing the last coordinates of each (and with a loss of a multiplicative factor in probability), and by fixing (i.e. is the last row of ), with being the truncated vectors, we can rewrite (30) as (with )
[TABLE]
In what follows we set
[TABLE]
Conditioning on , we will show the following key lemma.
Lemma 9.2**.**
With probability with respect to the columns of the matrix , for any vector that is orthogonal to the first columns of the subspace generated by (defined as above) in cannot have -sparse vector. In other words, there do not exist coefficients , not all zero, such that
[TABLE]
We can actually prove a slightly more general version of this lemma, which will be used to control .
Lemma 9.3**.**
With probability at most with respect to the columns of the matrix , for any vector that is orthogonal to the first columns of (where ) there do not exist coefficients , not all zero, such that
[TABLE]
We remark that the above lemmas are somewhat similar to Propositions 6.2, 6.3, 6.6, but the situation here is much more complicated as the relation between and is non-trivial (for instance is not orthogonal to ), and also the diagonal entries are perturbed by .
We postpone the proof of this lemma for a moment, and let us use it to prove the following result, which automatically implies Theorem 9.1.
Theorem 9.4**.**
On the event of Lemma 9.2 we have
[TABLE]
Proof.
(of Theorem 9.4) Notice that the event implies that (by (30), using the same notations for ) for all
[TABLE]
In other words, conditioning on , and by letting and by choosing deterministic numbers appropriately we have
[TABLE]
[TABLE]
Now as is not very sparse for any non-trivial choice of , by Corollary 4.6 we have
[TABLE]
Thus we have
[TABLE]
∎
Notice that in the above proof, with , then the event can be written as
[TABLE]
where is the field trace. We just showed that
[TABLE]
In the same way, we show the following more general version of Theorem 9.1.
Theorem 9.5**.**
On the event of Lemma 9.3, as long as we have
[TABLE]
In particular,
[TABLE]
Proof.
(of Theorem 9.5) Notice that the event is equivalent to the event that this vector is orthogonal to the (orthogonal basis) of in . We have
[TABLE]
Now observe that is a non-zero vector that is orthogonal to the first columns of . By Lemma 9.3 and by the proof of Theorem 9.4 (via Equation (34)) we have
[TABLE]
Plugging this bound into Equation (9) for each non-zero tuple we complete the proof.
∎
For the rest of this section we will be focusing on Lemma 9.2. The proof of Lemma 9.3 can be done similarly. Indeed, assume that with , then can be expressed as for some , one of which is non-zero if is non-zero. Hence if is -sparse in , then are -sparse for any . Choose one index where is non-zero, and hence not all are zero.
In what follows we prove the key lemmas on non-sparsity. We will mainly focus on Lemma 9.2 because the proof for Lemma 9.3 is almost identical as long as .
9.6. Proof of Lemma 9.2
Let us assume that is an -sparse vector. We first note that by a proper “rotation”, we can assume that this vector is . This can be seen by, where and are as before,
[TABLE]
By iterating (32) we have (with and from (36), and are deterministic, being determined by from (32))
[TABLE]
Our goal is that, assuming that is -sparse, then conditioning on a realization of possible values of , we show that the probability is very small, so that after taking union bound the probability is still negligible (where we will use the assumption that and ).
We will estimate by a decoupling process, which roughly speaking allows us to pass from polynomials of to multilinear forms where the factors are sparser matrices than , but they are independent, so that we can control the probability more easily. This process, roughly speaking, can be described as follows: assume that is a matrix obtained from by replacing all entries by zero, except the -th block of rows (in general we decompose the rows of into groups, each with consecutive indices of size approximately , and our matrices are formed by rows within a group). Then we can write
[TABLE]
where and are submatrices obtained from by dividing the -th block of rows into two groups of (almost) equal size.
In general, in each step of our process, we decrease the polynomial degrees of a given matrix (the total degree remains the same), but double the number of matrices, and hence the probability. We will rely on the following well-known decoupling result (see for instance [7, 41]).
Claim 9.7**.**
Let be two random vectors in and , and let be a function. Then for any we have
[TABLE]
[TABLE]
where is an independent copy of , and is an independent copy of .
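Decoupling inequalities of this type can be verified exactly on small finite examples. The sketch below checks one standard form, as in [7]: for independent X and Y with independent copies X' and Y', and any event E(X, Y), the fourth power of P(E(X, Y)) is at most the probability that E holds simultaneously for (X, Y), (X', Y), (X, Y') and (X', Y'). Treating this event form as the relevant one is an assumption of the illustration, as are the uniform distributions and the names used.

```python
from fractions import Fraction
from itertools import product


def decoupling_sides(xs, ys, event):
    """For X uniform on xs and Y uniform on ys (independent), return
    (P(E(X,Y))^4, P(E(X,Y) and E(X',Y) and E(X,Y') and E(X',Y')))
    as exact fractions, by full enumeration."""
    total = len(xs) * len(ys)
    single = Fraction(sum(event(x, y) for x in xs for y in ys), total)
    joint = sum(
        event(x, y) and event(x2, y) and event(x, y2) and event(x2, y2)
        for x, x2 in product(xs, repeat=2)
        for y, y2 in product(ys, repeat=2)
    )
    return single ** 4, Fraction(joint, total ** 2)


# Hypothetical example: a bilinear form over F_2 in two coordinates.
vectors = list(product((0, 1), repeat=2))
bilinear_vanishes = lambda x, y: (x[0] * y[0] + x[1] * y[1]) % 2 == 0
lhs, rhs = decoupling_sides(vectors, vectors, bilinear_vanishes)
print(lhs, rhs, lhs <= rhs)
```

The inequality follows from two applications of the Cauchy-Schwarz inequality, first in Y and then in the pair (X, X'), which is why each decoupling round costs a fourth root in probability.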
In our application below , etc. will be matrices. Now we describe the process in more detail.
- We start by decomposing in (36). Factoring out, we will obtain a sum of many products of and . In the next step we will be using Claim 9.7 to remove and (i.e. the highest degree polynomial of and ) accordingly.
- In general, assume that we have , where are products of matrices that do not contain (it is possible that is just the identity matrix), and appears times; we then decompose into block matrices, expanding the products we obtain
[TABLE]
where in the total number of appearances of and are at most . We then keep , decouple by another independent matrix of the same distribution to remove , and then keep and , decouple by another independent matrix of the same distribution to remove . More precisely, by Claim 9.7 (with )
[TABLE]
It is crucial to note that by doing so, if the highest degree of (which was decomposed into ) was also , then the products having factors of (and also ) are canceled out in .
- In summary, after each round of the decoupling process, we will not create higher polynomials elsewhere but replace which appears times in the product form by four matrices which appear at most times. Hence, after steps of iterating the process, we will create a sum of many multilinear forms in which each matrix factor appears at most once. In other words they might have the form
[TABLE]
where are also multilinear forms which might also contain but the total degrees are smaller than .
Notice that in the matrix there are approximately non-zero rows.
Proof.
(of Theorem 9.4) We have seen that
[TABLE]
Set
[TABLE]
A simplified case. In order to motivate our next step, let us assume for now that our RHS (37) has only the term , so that we are estimating the probability that .
Let be fixed. We will assume to have exactly non-zero entries (noting that taking union bound over will not significantly change our bounds), then by the fact that for some absolute constant and by the fact that has iid row vectors (where ) we have that
[TABLE]
In the next step, conditioning on this event of , we consider the vector and apply the following fact (by relying on Theorem 4.1)
Claim 9.8**.**
Assume that is not -sparse, and is a random matrix with iid rows as in . Then for sufficiently large , and for
[TABLE]
Proof.
Note that if is not -sparse then by Theorem 4.1
[TABLE]
As such, as is sufficiently large the probability under consideration is bounded by
[TABLE]
where we take a union bound over all possible positions for the zero coordinates of . ∎
We remark that the above might continue to hold for small , but we will not be focusing on this case for simplicity.
We next iterate Claim 9.8, by taking union bound and with an assumption that we obtain a bound
[TABLE]
for the event that (or is -sparse).
General case. First let us record here a slightly more general variant of Claim 9.8. We say that a random vector is -free if all but at most coordinates of are determined. So for instance -sparse vectors are -free because all but at most coordinates of are zero. By an identical proof to Claim 9.8, we have
Claim 9.9**.**
Assume that is not -free, and is a random matrix with iid rows as in . Then for sufficiently large , and for
[TABLE]
To continue, recall that we would like to estimate , and here we cannot expose , and then , etc. one by one in order as in the simplified case above because might not appear exactly in the -th position of each multilinear form. However, we can adjust the process using the following observation.
Fact 9.10**.**
For any , and for any vector , the vector of has non-zero entries only in the -th block. In particular, if a vector is -sparse, then is also -sparse.
Now we describe the method. First, based on Fact 9.10 we just need to address all the multilinear forms beginning with . The vector obtained by summing over these forms is of type . To estimate the probability that this vector is not -free, we will focus only on the submatrix restricted by the columns of having the same index as the rows of , and condition on the remaining entries of . Only using the randomness from this submatrix of , it suffices to show that is not -free with high probability. To show this, assume that we already know that is non-sparse; we can then use (39) (i.e. Claim 9.9), where is allowed to depend on . Now to show that is not -free with high probability (where the randomness is on ), where is a sum of multilinear forms without and , we again focus on the submatrix of restricted by the columns having the same indices as that of , and continue forward. So in the last step of our argument (or the first step if we go backward as in the simplified case above), we just need to show that is not -free with high probability with respect to the randomness of and for some appropriate number , but this was exactly (38). ∎
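The observation is elementary and can be seen on a toy example. The sketch below zeroes out all rows of a matrix outside one block of consecutive rows and applies the result to a vector; the output is supported on that block. The block layout (equal-size consecutive groups), the sample matrix, and the function names are assumptions made for illustration.

```python
def block_row_part(M, block, num_blocks):
    """Copy of M with every row outside the given block of consecutive
    rows replaced by zeros. Returns (matrix, (lo, hi)) with rows lo..hi-1 kept."""
    n = len(M)
    size = (n + num_blocks - 1) // num_blocks  # ceiling of n / num_blocks
    lo, hi = block * size, min((block + 1) * size, n)
    part = [list(row) if lo <= i < hi else [0] * len(row)
            for i, row in enumerate(M)]
    return part, (lo, hi)


def matvec(M, v):
    """Matrix-vector product over the integers."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


# Hypothetical 4x4 matrix split into two blocks of two rows each.
M = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
v = [1, 1, 1, 1]
M1, (lo, hi) = block_row_part(M, 1, 2)
print(matvec(M1, v), (lo, hi))
```

Summing the block parts over all blocks recovers the full product, which is what allows the multilinear forms to be grouped by their leading block.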
10. Proof of Theorem 1.8
We are going to prove the following result.
Proposition 10.1** (asymptotic independence).**
Let be fixed. Then for any distinct numbers in ,
[TABLE]
where is the event that is an eigenvalue of .
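The eigenvalue events can be tested directly for small matrices: a field element is an eigenvalue exactly when the shifted matrix is singular over the field. The sketch below computes ranks by Gaussian elimination modulo a prime and lists the eigenvalues lying in the prime field. The function names are illustrative assumptions, and `pow(x, -1, p)` requires Python 3.8 or later.

```python
def rank_mod_p(A, p):
    """Rank of an integer matrix over F_p, by Gaussian elimination."""
    A = [[x % p for x in row] for row in A]
    rows, cols = len(A), len(A[0])
    rank = 0
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if A[r][c]), None)
        if pivot is None:
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        inv = pow(A[rank][c], -1, p)  # modular inverse of the pivot
        A[rank] = [(x * inv) % p for x in A[rank]]
        for r in range(rows):
            if r != rank and A[r][c]:
                factor = A[r][c]
                A[r] = [(x - factor * y) % p for x, y in zip(A[r], A[rank])]
        rank += 1
    return rank


def eigenvalues_in_field(M, p):
    """Sorted list of elements lam of F_p for which M - lam * I is singular."""
    n = len(M)
    found = []
    for lam in range(p):
        shifted = [[M[i][j] - (lam if i == j else 0) for j in range(n)]
                   for i in range(n)]
        if rank_mod_p(shifted, p) < n:
            found.append(lam)
    return found


# Hypothetical example: the same integer matrix over two different primes.
M = [[0, -1], [1, 0]]
print(eigenvalues_in_field(M, 3), eigenvalues_in_field(M, 5))
```

A matrix is eigenvalue-free over the prime field precisely when this list is empty, so sampling random matrices and recording the list gives an empirical view of the joint behavior of the events for distinct field elements.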
Proof.
(of Theorem 1.8) Let denote the random variable
[TABLE]
By Proposition 10.1, one easily has , and more generally for any fixed integer
[TABLE]
where . It thus follows that is asymptotically distributed as . In particular
[TABLE]
∎
It remains to justify the above independence lemma.
10.2. Proof of Proposition 10.1
The event is equivalent to the event that there exist such that
[TABLE]
We condition on . Let be a normal vector of . The event then implies that
[TABLE]
We are going to show that the probability of this event with the randomness on is modulo a small error term for almost all realizations of . To be more precise, we will restrict to the following event of probability close to 1.
Lemma 10.3**.**
Let be a sufficiently small constant to be chosen later. With probability at least with respect to , none of the vectors for all , not all zero, is -sparse.
Assuming this for now we can conclude our main result.
Proof.
(of Proposition 10.1) Conditioning on the event considered in Lemma 10.3, we can then just follow the proof of Theorem 9.4 applied to the events in (40). More precisely, the estimates following Equation (33) hold with for . ∎
10.4. Proof of Lemma 10.3
Set , for sufficiently small . By taking union bound (with a loss of a multiplicative factor in probability), it suffices to consider the probability that is -sparse for a fixed choice of .
Lemma 10.5**.**
With probability at least , cannot be -sparse.
As in the previous section, let denote the matrix . Our result says that the vectors can be viewed as null vectors of a polynomial of degree of . More specifically, let be the last row of .
Claim 10.6**.**
There exist coefficients and such that
[TABLE]
It is clear that, by using this claim, we can complete the proof of Lemma 10.3 by following exactly the decoupling process in the proof of Theorem 9.4 in the previous section (which in turn yields the bound , obtained at the last step of the process). This bound is clearly strong enough to absorb all union bounds of type given the range of . It thus remains to justify the result above.
Proof.
(of Claim 10.6) By fixing (and hence we lose another multiplicative factor of in probability), and is the concatenation of , by we have
[TABLE]
As such
[TABLE]
In other words, under the action by , we eliminate (or we changed it to a deterministic vector). Iterating the process, we then obtain that
[TABLE]
where depend on and . ∎
11. Remarks
First, among the three characterizations provided, Theorem 3.1 was less effective in our current applications because of the polynomial restriction, but this result is expected to have other implications beyond random matrix theory because of its near optimality. The remaining two characterizations, Theorem 4.4 and Theorem 5.1, yield sub-exponential bounds in application. The latter is more amenable to perturbations (in that we don’t have to assume ).
Second, we remark that the error bounds in Theorem 9.1 and in Proposition 10.1 are of the form , which were obtained by applying Corollary 4.6. Compared to the justification for the uniform model in Section 2, our approach seems to be natural and does explain the main terms in Theorem 1.7 (obtained from Theorem 9.1 and Theorem 9.5) and in Theorem 1.8 (obtained from Proposition 10.1). Following our treatment of Section 7, it is natural to expect that these error bounds can be made ; for this improvement, however, one has to show that the vectors in (36) or in Claim 10.6 have large ULCD or large , and this task seems to be extremely challenging.
Universality is an extremely complicated phenomenon. While we have addressed only a few universal examples for random matrices in (prime) fields in the current note, there remain many interesting and tantalizing questions. Besides the obvious (and doable) direction of extending the current results to general finite fields , we conjecture that the following statistics of the uniform model are universal in terms of iid random matrix model:
- the distributions of are asymptotically independent for different irreducible polynomials (which would then generalize (5));
- the results of Stong and of Hansen and Schmutz [19] connecting the distribution of degrees of irreducible factors of the characteristic polynomial to the cycle lengths of a random permutation.
Lastly, we conjecture that for a fixed random matrix model (such as the Bernoulli or model), the considered statistics over for different primes are asymptotically independent.
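Universality statements of this kind can be probed numerically. The following is a minimal sketch, not the method of this note: it compares the empirical frequency of nonsingular matrices over F_p under an iid entry model with the classical value for the uniform model, namely the product of (1 - p^{-i}) for i from 1 to n. All function names (`rank_mod_p`, `uniform_nonsingular_prob`, `empirical_nonsingular_freq`) and the choice of the ±1 Bernoulli sampler are illustrative assumptions.

```python
import random


def rank_mod_p(rows, p):
    """Rank over F_p (p prime) of an integer matrix given as a list of rows."""
    a = [[x % p for x in row] for row in rows]
    n_rows = len(a)
    n_cols = len(a[0]) if a else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Look for a row at or below position `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if a[r][col]), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        # Inverse of the pivot via Fermat's little theorem (valid because p is prime).
        inv = pow(a[rank][col], p - 2, p)
        a[rank] = [(x * inv) % p for x in a[rank]]
        # Clear the column in every other row.
        for r in range(n_rows):
            if r != rank and a[r][col]:
                f = a[r][col]
                a[r] = [(x - f * y) % p for x, y in zip(a[r], a[rank])]
        rank += 1
    return rank


def uniform_nonsingular_prob(n, p):
    """Probability that a uniform n x n matrix over F_p is nonsingular."""
    prob = 1.0
    for i in range(1, n + 1):
        prob *= 1 - p ** (-i)
    return prob


def empirical_nonsingular_freq(n, p, trials, sample_entry, rng):
    """Fraction of sampled n x n matrices with iid entries that have full rank mod p."""
    hits = 0
    for _ in range(trials):
        m = [[sample_entry(rng) for _ in range(n)] for _ in range(n)]
        if rank_mod_p(m, p) == n:
            hits += 1
    return hits / trials


def bernoulli_pm1(rng):
    """Hypothetical example of a non-uniform iid entry law: +1 or -1 with equal probability."""
    return rng.choice((-1, 1))
```

For instance, calling `empirical_nonsingular_freq(30, 3, 2000, bernoulli_pm1, random.Random(0))` and comparing the result with `uniform_nonsingular_prob(30, 3)` gives a rough empirical check of universality for the rank statistic; the same harness could be adapted to other statistics by replacing the full-rank test.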
Acknowledgements. The authors are thankful to J. Koenig for helpful comments. The first author is supported in part by the National Science Foundation postdoctoral fellowship DMS-1702533. The last two authors are partially supported by National Science Foundation grants DMS-1600782 and CAREER DMS-1752345.
Appendix A Tensorization lemma
The following is an analog of [32, Lemma 2.2].
Lemma A.1.
Let be given. Assume that are iid real-valued random variables and that for all . Then
[TABLE]
where is absolute.
Proof.
Assume that . By Chebyshev’s inequality
[TABLE]
On the other hand,
[TABLE]
For we use , while for we have . Thus
[TABLE]
as desired. ∎
- [1] G. V. Balakin, The distribution of the rank of random matrices over a finite field (Russian, with English summary), Teor. Veroyatn. Primen. 13 (1968), 631-641.
- [2] J. Blömer, R. Karp, and E. Welzl, The rank of sparse random matrices over finite fields, Random Structures Algorithms 10 (1997), no. 4, 407-419.
- [3] R. P. Brent and B. D. McKay, Determinants and ranks of random matrices over $\mathbb{Z}_m$, Discrete Math. 66 (1987), no. 1-2, 35-49.
- [4] J. Bourgain, V. Vu, and P. M. Wood, On the singularity probability of discrete random matrices, Journal of Functional Analysis 258 (2010), no. 2, 559-603.
- [5] J. Clancy, T. Leake, N. Kaplan, S. Payne, and M. M. Wood, On a Cohen-Lenstra heuristic for Jacobians of random graphs, Journal of Algebraic Combinatorics 42 (2015), no. 3, 701-723.
- [6] C. Cooper, On the rank of random matrices, Random Structures Algorithms 16 (2000), no. 2, 209-232.
- [7] K. Costello, T. Tao, and V. Vu, Random symmetric matrices are almost surely non-singular, Duke Math. J. 135 (2006), 395-413.
- [8] A. Ferber, V. Jain, K. Luh, and W. Samotij, On the counting problem in inverse Littlewood–Offord theory, arxiv.org/abs/1904.10425.
