Convergence Rate of Empirical Spectral Distribution of Random Matrices from Linear Codes
Chin Hei Chan, Vahid Tarokh, Maosheng Xiong

TL;DR
This paper proves that the empirical spectral distribution of certain random matrices from linear codes converges to the Marchenko-Pastur law at a rate of at least n^{-1/4}, expanding understanding of spectral convergence in coding theory.
Contribution
It establishes a quantitative convergence rate for the spectral distribution of random matrices from linear codes, under conditions on the dual code's Hamming distance.
Findings
Convergence rate of at least n^{-1/4} in probability
Spectral distribution converges to Marchenko-Pastur law
Applicable to linear codes with dual Hamming distance ≥ 5
C. Chan is with the Department of Mathematics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong. V. Tarokh is with the Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA. M. Xiong is with the Department of Mathematics, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong. The research of M. Xiong was supported by RGC grant number 16303615 from Hong Kong.
Abstract
It is known that the empirical spectral distribution of random matrices obtained from linear codes of increasing length converges to the well-known Marchenko-Pastur law, if the Hamming distance of the dual codes is at least 5. In this paper, we prove that the convergence rate in probability is at least of the order n^{-1/4}, where n is the length of the code.
Index Terms:
Group randomness, linear code, dual distance, empirical spectral measure, random matrix theory, Marchenko-Pastur law.
I Introduction
Random matrix theory is the study of matrices whose entries are random variables. Of particular interest is the study of eigenvalue statistics of random matrices such as the empirical spectral measure. It has been broadly investigated in a wide variety of areas, including statistics [24], number theory [17], economics [18], theoretical physics [23] and communication theory [22].
Most of the matrix models in the literature are random matrices with independent entries. In a recent series of works (initiated in [2] and developed further in [1, 3, 25]), the authors considered a class of sample-covariance type matrices formed randomly from linear codes over a finite field, and proved that if the Hamming distance of the dual codes is at least 5, then as the length of the codes goes to infinity, the empirical spectral distribution of the random matrices obtained in this way converges to the well-known Marchenko-Pastur (MP) law. Since truly random matrices (i.e. random matrices with i.i.d. entries) of large size satisfy this property, this can be interpreted as saying that sequences from linear codes of dual distance at least 5 behave like random sequences among themselves. This is a new pseudo-randomness test for sequences and is called a “group randomness” property [1]. It may have many potential applications.
How fast does the empirical spectral distribution converge to the MP law? This question is interesting in its own right, and important in applications, as one may wish to use linear codes of proper length to generate pseudo-random matrices. Along with proving the convergence in expectation, the authors in [25] obtained a convergence rate in terms of the length n of the code. That rate is quite unsatisfactory, as the numerical data showed clearly that the convergence is rather fast with respect to n. In this paper, we prove that the convergence rate is indeed at least of the order n^{-1/4} in probability. This substantially improves the previous result.
To introduce our main result, we need some notation.
Let C be a linear code of length n and dimension k over the finite field F_q of order q, where q is a prime power. C is called an [n, k]_q linear code for short. The most interesting case is that of binary linear codes, corresponding to q = 2. The dual code C^⊥ consists of the n-tuples in F_q^n which are orthogonal to all codewords of C under the standard inner product. Clearly, C^⊥ is also a linear code. Denote by d^⊥ the Hamming distance of C^⊥. It is called the dual distance of C.
Let ψ: F_q → ℂ^* be the standard additive character. To be more precise, if F_q has characteristic ℓ, which is a prime number, then ψ is given by ψ(a) = exp(2πi Tr(a)/ℓ), where Tr is the absolute trace mapping from F_q to F_ℓ. In particular, if q = 2, then the map ψ is defined as ψ(a) = (−1)^a. We extend ψ component-wise to F_q^n and obtain the map ψ: F_q^n → (ℂ^*)^n. Denote D = ψ(C).
Denote by a matrix whose rows are chosen from uniformly and independently. This makes the set a probability space with the uniform probability.
Let be the Gram matrix of , that is,
[TABLE]
where means the conjugate transpose of the matrix . Let be the empirical spectral measure of , that is,
[TABLE]
where are the eigenvalues of and is the Dirac measure at the point . Note that is a random measure, that is, for any interval , the value is a random variable with respect to the probability space . Our main result is as follows.
Theorem 1.
Assume that is fixed. If , then
[TABLE]
uniformly for all intervals . Here is the empirical spectral measure of the Marchenko-Pastur law whose density function is given by
[TABLE]
where the constant and are defined as
[TABLE]
and is the indicator function of the interval .
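The objects in Theorem 1 can be explored numerically. The following is a minimal sketch for the binary case, not the authors' experiment. It assumes the standard normalization of the Marchenko-Pastur density with ratio y = p/n and edges (1 ± √y)^2, and a Gram matrix of the form ΦΦ^*/n; these forms, the helper names (`mp_density`, `mp_mass`, `gram_from_code`) and the randomly drawn generator matrix are all assumptions made for illustration. In particular, the dual distance of the random code is not checked here.

```python
import numpy as np

def mp_density(x, y):
    """Marchenko-Pastur density with ratio y in (0, 1), standard normalization (assumed)."""
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = (x > a) & (x < b)
    out[inside] = np.sqrt((b - x[inside]) * (x[inside] - a)) / (2 * np.pi * y * x[inside])
    return out

def mp_mass(lo, hi, y, steps=200_000):
    """Midpoint-rule approximation of the MP mass of the interval [lo, hi]."""
    edges = np.linspace(lo, hi, steps + 1)
    mid = (edges[:-1] + edges[1:]) / 2
    return float(np.sum(mp_density(mid, y)) * (hi - lo) / steps)

def gram_from_code(gen, p, rng):
    """Draw p codewords uniformly from the binary code generated by `gen`,
    map bits to +-1 via a -> (-1)^a, and return the p x p Gram matrix."""
    k, n = gen.shape
    messages = rng.integers(0, 2, size=(p, k))   # uniform messages give uniform codewords
    codewords = messages @ gen % 2
    phi = 1.0 - 2.0 * codewords                  # entries in {+1, -1}
    return phi @ phi.T / n

rng = np.random.default_rng(0)
n, k, p = 400, 40, 100
gen = rng.integers(0, 2, size=(k, n))            # hypothetical generator matrix
gram = gram_from_code(gen, p, rng)
eigs = np.linalg.eigvalsh(gram)

# Compare the empirical spectral mass of an interval with the MP mass.
lo, hi = 0.5, 1.5
esd_mass = float(np.mean((eigs >= lo) & (eigs <= hi)))
print("empirical mass:", esd_mass, "MP mass:", mp_mass(lo, hi, p / n))
```

Replacing the random generator matrix by that of a code with dual distance at least 5 and increasing n would be the natural way to observe the behaviour that the theorem describes.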
Remark 1.
The symbol in (3) is a standard notation for “stochastic domination” in probability theory (see [8] for details). Here it means that for any and any , there is a quantity , such that whenever , we have
[TABLE]
where is the probability with respect to and the supremum can also be taken over all linear codes of length over with .
Remark 2.
Theorem 1 is reminiscent of a well-known result of Sidel’nikov ([16, 21]) which states that for any binary linear code with dual distance , one has
[TABLE]
Here is the normalized cumulative weight distribution of and
[TABLE]
Hence the “randomness” of the weight distribution of the code is ensured if the dual distance d^⊥ is sufficiently large. In this respect Theorem 1 is a little surprising, since the condition d^⊥ ≥ 5 already ensures a fast convergence rate to the MP law (see Equation (3)). We emphasize that the condition d^⊥ ≥ 5 is also optimal: the work [1] showed that the empirical spectral distribution of random matrices based on binary Simplex (shortened first-order Reed-Muller) codes, which have d^⊥ = 3, does not converge to the MP law; a similar calculation shows that the empirical spectral distribution of random matrices based on binary first-order Reed-Muller codes, which have d^⊥ = 4, does not converge to the MP law either.
Remark 3.
It seems quite possible to extend our results to nonlinear codes, where the dual distance is defined as in [16, Chapter 5]. In this paper, however, we focus only on linear codes.
Remark 4.
For application purposes, from Theorem 1, binary linear codes of dual distance 5 with large length and small dimension are desirable as they can be used to generate random matrices efficiently. Here we mention two constructions of binary linear codes with parameters and dual distance 5. The first family is the dual of primitive double-error correcting BCH codes ([13]). The second family of such codes, which includes the well-known Gold codes, can be constructed as follows: Let be a function such that . Let and be a primitive element of . Define a matrix
[TABLE]
Given a basis of over , each element of can be identified as an column vector in , hence the above can be considered as a binary matrix of size . Denote by the binary linear code obtained from as a generator matrix. Note that has length and dimension . It is known that the dual distance of is 5 if and only if is an almost perfect nonlinear (APN) function [10, 20]. Since there are many APNs when is odd, this provides a general construction of binary linear codes of dual distance 5 which may be of interest for applications.
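As a small concrete instance of the first family mentioned in this remark, the sketch below builds the dual of the binary [15, 7] double-error-correcting BCH code by brute force and checks its parameters. The generator polynomial g(x) = x^8 + x^7 + x^6 + x^4 + 1 is the standard one for this BCH code; the variable names and the exhaustive enumeration are illustrative choices that only make sense at this toy size.

```python
import itertools
import numpy as np

# Generator polynomial of the binary [15, 7] double-error-correcting BCH code:
# g(x) = x^8 + x^7 + x^6 + x^4 + 1, coefficients listed from degree 0 to degree 8.
g = [1, 0, 0, 0, 1, 0, 1, 1, 1]
n, k_bch = 15, 7

# Rows of the BCH generator matrix are the shifts x^i * g(x) for i = 0..6.
bch_gen = np.zeros((k_bch, n), dtype=int)
for i in range(k_bch):
    bch_gen[i, i:i + len(g)] = g

def span(gen):
    """All codewords of the binary code generated by the rows of `gen`."""
    msgs = np.array(list(itertools.product([0, 1], repeat=gen.shape[0])))
    return msgs @ gen % 2

bch = span(bch_gen)
# Minimum weight over the nonzero BCH codewords.
bch_min_weight = int(bch[bch.any(axis=1)].sum(axis=1).min())

# The code of interest is the dual of the BCH code: every vector of F_2^15
# that is orthogonal to all rows of bch_gen.
all_vectors = np.array(list(itertools.product([0, 1], repeat=n)))
code = all_vectors[((all_vectors @ bch_gen.T) % 2 == 0).all(axis=1)]

print("dual distance of the code:", bch_min_weight)
print("code size and length:", code.shape)
```

Since the dual of `code` is the BCH code itself, the dual distance of `code` equals the minimum distance of the BCH code. The resulting code has length 15 = 2^4 − 1 and dimension 8.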
Remark 5.
Binary linear codes can be used to construct deterministic sensing matrices which satisfy the important “statistical restricted isometry property” ([7, 11]). From an binary linear code , letting and using the same notation as in Theorem 1 for easy comparison, one obtains an matrix whose rows consist of all the codewords of under the map (so that is a matrix of entries ). The probability space is choosing distinct rows uniformly at random from to form a submatrix . Then the sensing matrix is said to have the - if
[TABLE]
Equation (6) states that the event that all the eigenvalues of the matrix lie in the interval has probability . Since the probability space in (6) is essentially the same as that in (3) of Theorem 1, it can be seen that Theorem 1 provides a strong and much more precise description about how the eigenvalues of are distributed along the real line, but Theorem 1 falls short of proving (6). It seems possible to prove (6) by considering a slightly different normalization of the random matrix from linear codes as done in ([12]). We might come back to this question in the future.
For truly random matrices with i.i.d. entries, finding the rate of convergence has been a long-standing question, starting from [4, 5, 15] in the early 1990s. Great progress has been made in the last 10 years, culminating in the optimal rate of convergence in terms of the size of the matrix (see [8, 14, 19]). The major technique is the use of the Stieltjes transform, which we also use in this paper.
The convergence rate problem for the empirical spectral distribution of large sample covariance random matrices has been studied, for example, in [5, 9], and in particular in [9] an optimal rate of convergence was obtained under quite general conditions. However, despite our best efforts, none of the techniques in [5] and [9] can be applied directly to our setting. Instead we use a combination of ideas from [5] and [9]. Moreover, it is not clear to us what the best rate of convergence is for general linear codes in terms of the dual distance. It might be interesting to find a general sufficient condition on the dual distance that guarantees the optimal rate of convergence. We hope to address this problem in the future.
The paper is organized as follows. In Section II (Preliminaries), we introduce the main tool, the Stieltjes transform, together with related formulas and lemmas which play important roles in the proof of Theorem 1. In Section III, we show how Theorem 1 can be derived directly from a major statement in terms of the Stieltjes transform (Theorem 4). While the argument is standard, it is quite technical and non-trivial. To streamline the idea of the proof, we put some of the arguments in Section V (Appendix). In Section IV, we give a detailed proof of Theorem 4.
II Preliminaries
II-A Stieltjes Transform
In this section we recall some basic facts about the Stieltjes transform. Interested readers may refer to [6, Chapter B.2] for more details.
Let be an arbitrary real function with bounded variation, and be the corresponding (signed) measure. The Stieltjes transform of (or ) is defined by
[TABLE]
where is a complex variable outside the support of (or ). In particular, is well-defined for all , the upper half complex plane. Here is the imaginary part of .
It can be verified that for all . The complex variable is commonly written as for .
The Stieltjes transform is useful because a function of bounded variation (or a signed measure) can be recovered from its Stieltjes transform via the inverse formula ([4, 15]):
[TABLE]
Here means that the real number approaches zero from the right. Moreover, unlike the method of moments, the convergence of Stieltjes transform is both necessary and sufficient for the convergence of the underlying distribution (see [6, Theorem B.9]).
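The inversion principle can be illustrated on a toy example. The sketch below assumes the common convention s(z) = ∫ dμ(x)/(x − z), under which the mass of an interval is recovered as the limit of (1/π) ∫ Im s(x + iη) dx as η decreases to zero; the atomic measure, the function names and the fixed small value of η are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# A toy probability measure: two atoms with the given weights.
atoms = np.array([0.5, 1.5])
weights = np.array([0.3, 0.7])

def stieltjes(z):
    """Stieltjes transform s(z) = sum_j w_j / (lambda_j - z) of the atomic measure."""
    z = np.asarray(z, dtype=complex)
    return np.sum(weights / (atoms - z[..., None]), axis=-1)

def recovered_mass(lo, hi, eta=1e-3, steps=200_000):
    """(1/pi) * integral over [lo, hi] of Im s(x + i*eta), by the midpoint rule."""
    edges = np.linspace(lo, hi, steps + 1)
    mid = (edges[:-1] + edges[1:]) / 2
    total = np.sum(stieltjes(mid + 1j * eta).imag)
    return float(total * (hi - lo) / (steps * np.pi))

# The interval [1, 2] contains only the atom at 1.5, so the recovered mass
# should be close to its weight once eta is small.
mass = recovered_mass(1.0, 2.0)
print("recovered mass of [1, 2]:", mass)
```

With η fixed at a small positive value the recovered mass is only approximate; the exact weight is obtained in the limit.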
II-B Resolvent Identities and Formulas for Green function entries
Let be a matrix. Denote by the Green function of , that is,
[TABLE]
where and is the identity matrix.
Given a subset , let be the matrix whose -th entry is defined by . In addition, let be the Green function of . We write and as the Green functions of and respectively. Then for , we have [9, (3.8)]
[TABLE]
where the indices vary in , and is the -th entry of the matrix .
The two Green functions and are related by the following identity ([9, Lemma 3.9]):
[TABLE]
Here is the cardinality of the set , and is the trace of the matrix .
Recall that we denote . Then we have the following eigenvalue interlacing property ([9, Lemma 3.10])
[TABLE]
where is a constant depending on the set only, and also the Ward identity (see [9, (3.14)] or [8, (3.6)])
[TABLE]
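An identity of this type is easy to check numerically. The sketch below assumes the usual form for a Hermitian matrix H with Green function G(z) = (H − z)^{-1} and z = E + iη, namely Σ_j |G_ij(z)|^2 = Im G_ii(z)/η. Whether this matches the paper's equation (10) term by term cannot be confirmed from the text here, and the ±1 matrix is only a stand-in for a code-based matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 6, 12
phi = rng.choice([-1.0, 1.0], size=(p, n))   # stand-in for a code-based matrix
h = phi @ phi.T / n                          # real symmetric, hence Hermitian

z = 0.8 + 0.05j                              # spectral parameter with eta = 0.05
green = np.linalg.inv(h - z * np.eye(p))     # Green function G(z) = (H - z)^{-1}

# Row sums of |G_ij|^2 against Im G_ii / eta.
lhs = np.sum(np.abs(green) ** 2, axis=1)
rhs = np.diag(green).imag / z.imag
print(np.max(np.abs(lhs - rhs)))
```

The identity follows from G G^* = (G − G^*)/(z − z̄), which holds for any Hermitian H.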
II-C Stieltjes Transform of the Marchenko-Pastur Law
The Stieltjes transform of the Marchenko-Pastur distribution given in (4) can be computed as (see [5])
[TABLE]
It is well-known that is the unique function that satisfies the equation of in
[TABLE]
such that whenever .
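A quick numerical check of such a self-consistent equation is sketched below. It assumes one common form for sample-covariance matrices with ratio y, namely s + 1/(z + y z s − (1 − y)) = 0, equivalently y z s^2 + (z + y − 1) s + 1 = 0, and selects the root with positive imaginary part. This form and the helper names are assumptions and may differ in presentation from the paper's Equation (12).

```python
import numpy as np

def mp_stieltjes(z, y):
    """Root with positive imaginary part of y*z*s^2 + (z + y - 1)*s + 1 = 0."""
    roots = np.roots([y * z, z + y - 1, 1])
    return roots[np.argmax(roots.imag)]

def mp_stieltjes_numeric(z, y, steps=400_000):
    """Direct midpoint-rule integration of rho(x) / (x - z) over the MP support."""
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    edges = np.linspace(a, b, steps + 1)
    x = (edges[:-1] + edges[1:]) / 2
    rho = np.sqrt((b - x) * (x - a)) / (2 * np.pi * y * x)
    return np.sum(rho / (x - z)) * (b - a) / steps

y, z = 0.25, 1.0 + 0.1j
s = mp_stieltjes(z, y)
residual = s + 1 / (z + y * z * s - (1 - y))   # should vanish up to rounding
print("closed-form root:", s)
print("numerical integral:", mp_stieltjes_numeric(z, y))
print("residual of the self-consistent equation:", abs(residual))
```

Comparing the algebraic root with the direct integral confirms that the selected root is the Stieltjes transform of the density at this value of z.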
If a function satisfies Equation 12 with a small perturbation, we then expect that should be quite close to as well. This is quantified by the following result. First, we define
[TABLE]
where and are constants given in (5) and for a fixed constant , we define
[TABLE]
Lemma 2.
[9, Lemma 4.5] Suppose the function satisfies:
1. for some fixed constant for all ;
2. is Lipschitz continuous with Lipschitz constant ;
3. for each fixed , the function is nonincreasing for .
Suppose is the Stieltjes transform of a probability measure satisfying
[TABLE]
for some .
Fix and define , where is the real part of . Suppose that
[TABLE]
Then we have
[TABLE]
where is the -dependent variable defined as in (13).
II-D Convergence of Stieltjes Transform in Probability
The following result is useful to bound the convergence rate of a random Stieltjes transform in probability.
Lemma 3.
Let be a random matrix with independent rows, , and be the Stieltjes transform of the empirical spectral distribution of . Then
[TABLE]
Proof of Lemma 3.
Note that the -th entry of is simply the inner product of the -th and -th rows of . Hence varying one row of only gives an additive perturbation of of rank at most two. Applying the resolvent identity [8, (2.3)], we see that the Green function is also only affected by an additive perturbation by a matrix of rank of at most two and operator norm at most . Then the desired result follows directly by applying the McDiarmid’s Lemma [8, Lemma F.3].
∎
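The rank-two perturbation claim in the proof above can be checked directly. The sketch below assumes a Gram matrix of the form ΦΦ^*/n with ±1 entries (a stand-in for a code-based matrix) and resamples a single row; the sizes and the row index are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 8, 20
phi = rng.choice([-1.0, 1.0], size=(p, n))

# Resample a single row and leave all others unchanged.
phi_changed = phi.copy()
phi_changed[3] = rng.choice([-1.0, 1.0], size=n)

gram = phi @ phi.T / n
gram_changed = phi_changed @ phi_changed.T / n

# Only row 3 and column 3 of the Gram matrix can change,
# so the difference has rank at most two.
rank_of_difference = np.linalg.matrix_rank(gram_changed - gram)
print("rank of the perturbation:", rank_of_difference)
```

This bounded effect of each row is the property that a bounded-differences inequality such as McDiarmid's Lemma requires.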
For the purpose of this paper, we define an -dependent event to hold with high probability if for any , there is a quantity such that for any .
III Proof of Theorem 1
From this section onwards, let be a linear code of length over with dual distance . Let be the standard additive character, extended to component-wisely. Write .
Let be a random matrix whose rows are picked from uniformly and independently. This makes a probability space. Let be fixed. Write and the Gram matrix of . Furthermore, let be the empirical spectral measure of given by (2).
Denote to be the Stieltjes transform of , which is given by
[TABLE]
where are the eigenvalues of the matrix , and is the Green function of , that is, . Note that in this setting this Stieltjes transform is itself a random variable.
Denote
[TABLE]
Here is the expectation with respect to the probability space .
III-A An equation for
In the following result, we write defined in (17) in the form of the equation (12) with a small perturbation.
Theorem 4.
For any ,
[TABLE]
where .
We remark that Theorem 4 is a major technical result regarding the expected Stieltjes transform , from which Theorem 1 can be derived directly without reference to linear codes at all. The proof of Theorem 4 is, however, quite complicated and is directly related to properties of linear codes. To streamline the idea of the proof, here we assume Theorem 4 and sketch a proof of Theorem 1. The proof of Theorem 4 is postponed to Section IV.
III-B Proof of Theorem 1
Assuming Theorem 4, we can first estimate the term , following ideas from [8] and [9].
Theorem 5.
Assume that Theorem 4 holds. Then for any fixed , we have
[TABLE]
with high probability.
Proof of Theorem 5.
We can check that all the conditions of Lemma 2 are satisfied: first by Theorem 4 we see that (15) holds for ; in addition, (16) holds for , and this function is independent of , nonincreasing in and Lipschitz continuous with Lipschitz constant . Hence by Lemma 2, we have
[TABLE]
Note that in we have . Therefore we have
[TABLE]
for all .
Now Lemma 3 implies that
[TABLE]
on , for any and large enough . Combining this with (18) completes the proof of Theorem 5. ∎
Finally, armed with Theorem 5, we can derive Theorem 1 from a standard application of the Helffer-Sjöstrand formula in random matrix theory. The argument is essentially complex analysis. Interested readers may refer to Section V Appendix for details.
IV Proof of Theorem 4
Now we give a detailed proof of Theorem 4, in which the condition d^⊥ ≥ 5 on the dual distance becomes essential.
IV-A Linear codes with d^⊥ ≥ 5
Recall the notation from the beginning of Section III. Let be a linear code of length over . First is a simple orthogonality result regarding .
Lemma 6.
Let . Then
[TABLE]
Here is the usual inner product between the vectors and .
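An orthogonality relation of this kind can be verified by brute force in the binary case. The sketch below assumes the standard character-sum identity: the average of (−1)^{a·c} over all codewords c equals 1 if a lies in the dual code and 0 otherwise. The small [5, 3] generator matrix and the function names are hypothetical and chosen only to keep the enumeration tiny.

```python
import itertools
import numpy as np

gen = np.array([[1, 0, 0, 1, 1],
                [0, 1, 0, 1, 0],
                [0, 0, 1, 0, 1]])            # hypothetical [5, 3] binary code
n = gen.shape[1]
msgs = np.array(list(itertools.product([0, 1], repeat=gen.shape[0])))
code = msgs @ gen % 2                        # all 8 codewords

def character_average(a):
    """Average of (-1)^(a . c) over all codewords c."""
    return float(np.mean((-1.0) ** ((code @ a) % 2)))

def in_dual(a):
    """True if a is orthogonal to every generator row, i.e. a lies in the dual code."""
    return bool(((gen @ a) % 2 == 0).all())

# Evaluate the character average at every vector a of F_2^5.
results = {a: character_average(np.array(a))
           for a in itertools.product([0, 1], repeat=n)}
for a, value in results.items():
    print(a, value, in_dual(np.array(a)))
```

In particular, the average vanishes at every nonzero vector of weight smaller than the dual distance, which is how a dual-distance condition enters moment computations.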
As in Section III, let be a random matrix whose rows are picked from uniformly and independently and let . Denote by the -th entry of .
Corollary 7.
Assume . Then for any ,
(a) if ;
(b) if the indices do not come in pairs. If the indices come in pairs, then .
Here is the expectation with respect to the probability space .
Proof of Corollary 7.
For simplicity, denote by e_i the vector with a 1 at the i-th entry and 0 at all other places.
(a) It is easy to see that
[TABLE]
As and , so , and the desired result follows directly from Lemma 6.
(b) Again we can check that
[TABLE]
If the indices do not come in pairs, since , we have , and the result is zero by Lemma 6; If the indices do come in pairs, noting that , we also obtain the desired estimate. This completes the proof of Corollary 7. ∎
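The mechanism behind this proof can be written out in the binary case. The following is a sketch under assumed notation, not a restatement of the corollary: $C$ is a binary code with dual distance $d^{\perp} \ge 5$, a row of the random matrix is $\psi(c) = ((-1)^{c_1}, \dots, (-1)^{c_n})$ for a uniformly chosen codeword $c$, and $\Phi_{ij}$ denotes the $j$-th entry of the $i$-th row.

```latex
% Second moment, for column indices j \neq k:
\mathbb{E}\bigl[\Phi_{ij}\,\Phi_{ik}\bigr]
  = \frac{1}{|C|} \sum_{c \in C} (-1)^{(e_j + e_k)\cdot c} = 0,
% because e_j + e_k is a nonzero vector of weight 2 < d^{\perp},
% hence it does not lie in the dual code.

% Fourth moment, for column indices j, k, l, m:
\mathbb{E}\bigl[\Phi_{ij}\,\Phi_{ik}\,\Phi_{il}\,\Phi_{im}\bigr]
  = \frac{1}{|C|} \sum_{c \in C} (-1)^{(e_j + e_k + e_l + e_m)\cdot c}.
% Over F_2 the vector e_j + e_k + e_l + e_m is zero exactly when the
% indices come in pairs, and then the expectation equals 1.
% Otherwise it is nonzero of weight 2 or 4 < 5 \le d^{\perp},
% and the character sum vanishes.
```

The weight of the index vector never exceeds 4 in these computations, which is where a dual distance of at least 5 is used.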
IV-B Resolvent identities
We start with the resolvent identity (7) for . The sum on the right of (7) can be written as
[TABLE]
where
[TABLE]
[TABLE]
where
[TABLE]
IV-C Estimates of and
We now give estimates on the (-dependent) random variable . First, given , we denote .
Lemma 8.
For any , we have
(a) ;
(b) .
Proof of Lemma 8.
(a) From the definition of in (19), we have
[TABLE]
where the first equality follows from the fact that rows of are independent, and second equality follows from statement (a) of Corollary 7. The proof of the result on is similar by replacing with .
(b) Expanding and taking expectation inside, noting that the rows of are independent, we have
[TABLE]
Since , by using statement (b) of Corollary 7 and the Ward identity (10), together with the trivial bound , we obtain
[TABLE]
Here is a generic absolute constant which may be different in each occurrence. ∎
The above estimations lead to the following estimations about .
Lemma 9.
For any , we have
(a) ;
(b) .
Proof of Lemma 9.
(a) By (21) we get
[TABLE]
where the second equality follows from (a) of Lemma 8. Using (9) we easily obtain
[TABLE]
(b) We split as
[TABLE]
where
[TABLE]
We first estimate . By the definition of in (21) and applying (a) of Lemma 8, we see that
[TABLE]
Then by (b) of Lemma 8 we obtain
[TABLE]
Next we estimate . Again by (21) and Lemma 8, we have
[TABLE]
Hence
[TABLE]
where and for . The second equality follows from applying successively the law of total variance to the rows of .
For , denote and . It is easy to check that
[TABLE]
Thus by (9) we have .
Putting this into (IV-C) yields
[TABLE]
Plugging the estimates of in statement (a), in (23) and above into the equation (22), we obtain the desired estimate of . This finishes the proof of Lemma 9. ∎
We can now complete the proof of Theorem 4.
Proof of Theorem 4.
Taking reciprocal and then expectation on both sides of (20), we get
[TABLE]
where
[TABLE]
[TABLE]
and
[TABLE]
Multiplying on both sides of (26) and using the estimate , we obtain
[TABLE]
Putting the results of Lemma 9 in (28), we get
[TABLE]
Using the fact that , we have , so that .
Substituting all these into (27) yields
[TABLE]
for all .
Then the theorem follows directly from summing both sides of (25) for all and then dividing both sides by . ∎
V Appendix
In this section, we use the Helffer-Sjöstrand formula to prove Theorem 1 from Theorem 5. This is a standard procedure well known in random matrix theory. We follow the approach of [8, Appendix C].
First we define the signed measure and its Stieltjes transform by
[TABLE]
Now fix and define . For any interval , where and are constants defined in (5), we choose a smoothed indicator function satisfying for , for , and . These imply that the supports of and have Lebesgue measure bounded by . In addition, choose a smooth even cutoff function with for for and . Throughout this section, represents a positive constant whose value may vary in each appearance.
Then by the Helffer-Sjöstrand formula, we get
[TABLE]
As LHS is real, we can write as
[TABLE]
First, by the trivial identity and the fact that is Lipschitz continuous on the compact set , we can easily extend Theorem 5 as follows:
Lemma 10.
For any fixed , we have, with high probability,
[TABLE]
for all such that and .
We may now estimate the three terms appearing in (29)-(31). First, for the term in (31), by using the fact that is even with support in , we have
[TABLE]
with high probability.
We next estimate the term in (29). Since is small, we cannot apply Lemma 10 directly. However it can be proved that for all , the function is nondecreasing for . This implies, for ,
[TABLE]
with high probability.
Hence we have
[TABLE]
For the term in (30), we have
[TABLE]
The first term is zero. As for the second, by Cauchy-Riemann equation, we have
[TABLE]
For the first term, we get
[TABLE]
For the second term, we get
[TABLE]
Putting all together, we get
[TABLE]
with high probability.
Now we have to return from the smooth function to the indicator function of . If , then we get
[TABLE]
with high probability. On the other hand, denote by (which is hence a subset of ), then we also have
[TABLE]
with high probability. Hence
[TABLE]
with high probability. As is arbitrary, we conclude that for any .
Then for a general interval , we first note that we have proved that . As is a probability measure and , we deduce that . Therefore we have
[TABLE]
where in the last step we use that for . From the calculation it is easy to see that the above estimate holds simultaneously for all (i.e. the constant absorbed by is independent of ).
This completes the proof of Theorem 1.
Acknowledgments
The second author would like to thank David Forney, Rob Calderbank and Neil Sloane for stimulating discussions. The third author would like to thank Zhigang Bao for comments and suggestions.
References
- [1] B. Babadi, S. S. Ghassemzadeh and V. Tarokh, “Group randomness properties of pseudo-noise and Gold sequences,” 2011 12th Canadian Workshop on Information Theory (CWIT), 2011, pp. 42–46.
- [2] B. Babadi and V. Tarokh, “Random frames from binary linear block codes,” 2010 44th Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, 2010, pp. 1–3.
- [3] B. Babadi and V. Tarokh, “Spectral distribution of random matrices from binary linear block codes,” IEEE Trans. Inform. Theory 57 (2011), no. 6, 3955–3962.
- [4] Z. Bai, “Convergence rate of expected spectral distributions of large random matrices. Part I. Wigner matrices,” Ann. Probab. 21 (1993), no. 2, 625–648.
- [5] Z. Bai, “Convergence rate of expected spectral distributions of large random matrices. Part II. Sample covariance matrices,” Ann. Probab. 21 (1993), no. 2, 649–672.
- [6] Z. Bai and J. W. Silverstein, Spectral analysis of large dimensional random matrices, 2nd ed. Springer Series in Statistics, 2010.
- [7] A. Barg, A. Mazumdar and R. Wang, “Restricted isometry property of random subdictionaries,” IEEE Trans. Inform. Theory 61 (2015), no. 8, 4440–4450.
- [8] F. Benaych-Georges and A. Knowles, “Local semicircle law for Wigner matrices,” in Advanced Topics in Random Matrices, Panoramas et Synthèses 53, Société Mathématique de France, Paris, 2018, ch. 1, pp. 1–90.
