Random Matrices from Linear Codes and Wigner's Semicircle Law II
Chin Hei Chan, Maosheng Xiong

TL;DR
This paper proves that the empirical spectral distribution of random matrices built from linear codes over finite fields converges to Wigner's semicircle law whenever the dual distance of the codes is at least 5, with a convergence rate that depends on the code length.
Contribution
It shows that a dual distance of at least 5 alone guarantees convergence to Wigner's semicircle law, removing an extra condition needed in previous work, and it gives an explicit convergence rate of order $n^{-\beta}$.
Findings
The empirical spectral distribution converges to Wigner's semicircle law as the code length increases.
The convergence rate is of order $n^{-\beta}$ for some $0<\beta<1$.
A dual distance of at least 5 is sufficient for convergence.
Abstract
Recently we considered a class of random matrices obtained by choosing distinct codewords at random from linear codes over finite fields and proved that under some natural algebraic conditions their empirical spectral distribution converges to Wigner's semicircle law as the length of the codes goes to infinity. One of the conditions is that the dual distance of the codes is at least 5. In this paper, employing more advanced techniques related to the Stieltjes transform, we show that the dual distance being at least 5 is sufficient to ensure the convergence, and that the convergence rate is of the form $n^{-\beta}$ for some $0<\beta<1$, where $n$ is the length of the code.
Random Matrices from Linear Codes and the Convergence to Wigner’s Semicircle Law
Chin Hei Chan and Maosheng Xiong
Index Terms:
Group randomness, linear code, dual distance, empirical spectral measure, random matrix theory, Wigner’s semicircle law.
I Introduction
Random matrix theory is the study of matrices whose entries are random variables. Of particular interest is the study of eigenvalue statistics of random matrices such as the empirical spectral measure. It has been broadly investigated in a wide variety of areas, including statistics [25], number theory [17], economics [18], theoretical physics [24] and communication theory [23].
Most of the matrix models considered in the literature have entries with independent structures. In a series of works ([2, 3, 26]), initiated in [1], the authors studied a class of matrices formed by choosing codewords at random from linear codes over finite fields and ultimately proved that the empirical spectral distribution of their Gram matrices converges to the Marchenko-Pastur law, provided the minimum Hamming distance of the dual codes is at least 5. This was the first result relating the randomness of matrices from linear codes to the algebraic properties of the underlying dual codes, and it can be interpreted as a joint randomness test for sequences from linear codes. In particular, it implies that sequences from linear codes with the desired properties behave like random sequences from the viewpoint of random matrix theory. This is called a “group randomness” property in [1] and may have many applications (see [20, 21] for a different perspective).
Recently we considered a different normalization of matrices obtained in a similar fashion from linear codes and proved that the empirical spectral distribution converges to Wigner’s semicircle law under some natural algebraic conditions on the underlying codes (see [10]). This is also a group randomness property of linear codes. In this paper we explore this new phenomenon further.
I-A Statement of Main Results
To describe our results more precisely, we need some notation. Let be a family of linear codes of length and dimension over the finite field of elements ( is called an code for short), where is a prime power. The most interesting case is binary linear codes, corresponding to . Denote by the dual code of and the Hamming distance of . is also called the dual distance of .
The standard additive character of extends component-wise to a natural mapping . For each , we choose distinct codewords from and apply the mapping . Endowing with uniform probability on the choice of the codewords, this forms a probability space. Put the distinct sequences as the rows of a random matrix . Denote
[TABLE]
where is the conjugate transpose of the matrix and define
[TABLE]
Here is the identity matrix.
For any matrix with eigenvalues , the spectral measure of is defined by
[TABLE]
where is the Dirac measure at the point . The empirical spectral distribution of is defined by
[TABLE]
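To make the matrix model concrete, here is a small standard-library Python sketch for $q = 2$, where the character is $\psi(x) = (-1)^x$. It assumes the normalization $X = \sqrt{n/p}\,(\Phi\Phi^*/n - I_p)$, which is our reading of the two displays above (they did not survive extraction). The code used is the binary cyclic $[15, 8]$ code generated by $1 + x + x^3 + x^7$, the dual of the $[15, 7, 5]$ BCH code, so its dual distance is 5. The helpers `codewords` and `code_matrix` are hypothetical names of ours.

```python
import math
import random

N = 15                                        # code length n
H_STAR = (1 << 7) | (1 << 3) | (1 << 1) | 1   # 1 + x + x^3 + x^7: generates a [15, 8] code

def codewords(gen, n):
    """All codewords of the binary cyclic code generated by `gen` (bit i = coeff of x^i)."""
    k = n - (gen.bit_length() - 1)            # dimension = n - deg(gen)
    rows = [gen << j for j in range(k)]       # gen(x) * x^j for j < k, no wrap-around
    words = []
    for mask in range(1 << k):
        w = 0
        for j in range(k):
            if mask >> j & 1:
                w ^= rows[j]
        words.append(w)
    return words

def code_matrix(code, p, n, rng):
    """p x p matrix sqrt(n/p) * (Phi Phi^T / n - I) from p distinct random codewords."""
    chosen = rng.sample(code, p)                                         # uniform, without replacement
    phi = [[-1 if c >> a & 1 else 1 for a in range(n)] for c in chosen]  # psi(c) = (-1)^c entrywise
    scale = math.sqrt(n / p)
    X = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:                        # diagonal is sqrt(n/p) * (n/n - 1) = 0
                dot = sum(phi[i][a] * phi[j][a] for a in range(n))
                X[i][j] = scale * dot / n
    return X

rng = random.Random(0)
p = 6
X = code_matrix(codewords(H_STAR, N), p, N, rng)
# Second moment of the empirical spectral distribution: (1/p) tr X^2
m2 = sum(X[i][j] ** 2 for i in range(p) for j in range(p)) / p
print(round(m2, 3))
```

The semicircle law has second moment 1. With $n = 15$ and $p = 6$ the value of `m2` is only a sanity check, not evidence of convergence; if the rows were drawn independently rather than distinct, the orthogonality relations behind Lemma 8(a) would give expected value $(p-1)/p$.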
Our first main result is as follows:
Theorem 1.
Suppose simultaneously as . If for any , then as , we have
[TABLE]
and the convergence is uniform for all intervals . Here is the spectral measure of the matrix and is the probability measure of the semicircle law whose density function is given by
[TABLE]
and is the indicator function of the interval .
We remark that in [10] the same convergence (3) was originally proved under the extra condition that there is a fixed constant, independent of , such that
[TABLE]
The condition (5) is natural as explained in [10], and when , it is equivalent to
[TABLE]
where is the Hamming weight of the codeword . Interestingly, this extra condition can be dropped, so Theorem 1 now has the same strength as the result of [26], where the condition alone suffices to ensure convergence. As in [26], the condition in Theorem 1 is optimal: Conclusion (3) fails for first-order binary Reed-Muller codes, whose dual distance is 4.
Our second main result shows that the convergence in (3) is fast with respect to the length of the codes.
Theorem 2.
Let be an code with dual distance . For fixed constants and , suppose and satisfy
[TABLE]
Then
[TABLE]
uniformly for all intervals , where is given by
[TABLE]
We remark that the symbol “” in (6) is a standard “stochastic domination” notation in probability theory (see [8] for details), which means that for any and any , there is a quantity , such that whenever , we have
[TABLE]
Here is the probability within the space of picking distinct codewords from and the supremum is taken over all intervals . Since and do not depend on , the supremum can be taken over all linear codes of length over with .
We also remark that is a very mild restriction on linear codes , and there is an abundance of binary codes that satisfy this condition, for example, the Gold codes ([15]), some families of BCH codes (see [13, 14]) and many families of cyclic and linear codes studied in the literature (see for example [12, 22]). Such binary linear codes can also be generated by almost perfect nonlinear (APN) functions [9, 19], a special class of functions with important applications in cryptography.
I-B Simulations
We illustrate Theorems 1 and 2 by numerical experiments. We focus on binary Gold codes augmented by the all-1 vector. It is known that binary Gold codes have length , dimension and dual distance 5. The augmented binary Gold codes have length , dimension and dual distance at least 5. Because the all-1 vector is present, condition (5) is not satisfied. For each triple in the set , we randomly pick codewords from the augmented binary Gold code of length , form the corresponding matrix, use Sage to compute its eigenvalues, and plot the empirical spectral distribution along with Wigner’s distribution (see Figures 1 to 4 below). We repeat this 10 times for each triple, and each time the plots are almost identical: they are all very close to Wigner’s semicircle law, and as the length increases they become less and less distinguishable from it.
To illustrate the shape of the eigenvalue distribution more clearly, we also plot a density graph, shown in Figure 5. It is based on picking codewords from a binary Gold code of length .
From (7) it is easy to see that and the upper bound is achieved when . It might be possible to improve this value and hence obtain a better convergence rate. From the simulation results, however, it is not clear to us what optimal value one may expect.
I-C Techniques and relation to previous work
This paper strengthens [10, Theorem 2] on two fronts: in Theorem 1 we obtain the same convergence without the extra condition (5), and in Theorem 2 we obtain a strong and explicit convergence rate with respect to the length of the code. Both results are supported by computer simulations.
The main technique we use in this paper is the Stieltjes transform, a well-developed and standard tool in random matrix theory; the method is essentially complex analysis. Within random matrix theory, the authors of [6, 7, 27] used the Stieltjes transform successfully to study similar matrix models. However, our matrices, which arise from general linear codes over finite fields with dual distance at least 5, have characteristics significantly different from those in [6, 7, 27]. With applications in mind, for example generating pseudo-random matrices efficiently via linear codes, our matrices are more natural and interesting. None of the methods in previous works seems to apply directly to our setting. Instead we adopt methods from [4, 5, 8] and combine several ideas to obtain our final results.
Related to this paper, the authors of [11] used the Stieltjes transform to obtain a strong convergence rate similar in nature to Theorem 2 of this paper, thereby extending [26]; some of the arguments are similar.
The paper is organized as follows. In Section II we introduce the Stieltjes transform and related formulas and lemmas which will play important roles later. The proofs of Theorems 1 and 2 share some ideas, but both are technically involved, the latter even more so. To streamline the proofs, we assume a major technical statement (Theorem 5), from which we prove Theorems 1 and 2 in Sections III and IV respectively. Finally we prove Theorem 5 in Section V.
II Preliminaries
II-A Linear codes over of dual distance at least 5
The standard additive character is given by
[TABLE]
where is the absolute trace mapping from to its prime subfield of order and is a (complex) primitive -th root of unity. In particular when , then and for . It is known that satisfies the following orthogonality relation:
[TABLE]
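For a prime $q$ the absolute trace is the identity map, so the standard additive character reduces to $\psi(x) = e^{2\pi i x/q}$. The sketch below covers only this prime case (for $q = p^m$ with $m > 1$ one would also need the trace map) and checks the orthogonality relation in its usual form, $\sum_{x \in \mathbb{F}_q} \psi(ax) = q$ if $a = 0$ and $0$ otherwise. The names `psi` and `character_sum` are ours.

```python
import cmath

def psi(x, q):
    """Additive character of the prime field F_q: exp(2*pi*i*x/q)."""
    return cmath.exp(2j * cmath.pi * (x % q) / q)

def character_sum(a, q):
    """sum_{x in F_q} psi(a*x): equals q when a = 0 mod q, and 0 otherwise."""
    return sum(psi(a * x, q) for x in range(q))

q = 7
sums = [character_sum(a, q) for a in range(q)]
print([round(abs(s), 10) for s in sums])
```

For $q = 2$ this gives $\psi(1) = e^{i\pi} = -1$, matching $\psi(x) = (-1)^x$ in the binary case.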
Let be an linear code with dual distance . By the sphere-packing bound [16, Theorem 1.12.1], we have
[TABLE]
here the implied constant in the big O-notation depends only on . From this we can obtain
[TABLE]
Since is linear, the orthogonality relation (10) further implies that for any , we have
[TABLE]
Here is the usual inner product between the vectors and in .
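The dual distance and the character-sum identity behind (12) can be checked by brute force on a small binary example. In this sketch (our choice of example, not one used in the paper) $C$ is the cyclic $[15, 8]$ code generated by $x^7 + x^3 + x + 1$, the reciprocal of $h(x) = (x^{15} - 1)/g(x)$ where $g(x) = x^8 + x^7 + x^6 + x^4 + 1$ generates the $[15, 7, 5]$ BCH code; the standard theory of cyclic codes says the BCH code is then $C^\perp$. The helper names are hypothetical.

```python
N = 15  # code length n

def cyclic_code(gen, n):
    """All codewords of the binary cyclic code generated by `gen`.

    Words are n-bit integers: bit i holds the coefficient of x^i.
    """
    k = n - (gen.bit_length() - 1)            # dimension = n - deg(gen)
    rows = [gen << j for j in range(k)]       # gen(x) * x^j, j = 0..k-1 (no wrap-around)
    words = []
    for mask in range(1 << k):
        w = 0
        for j in range(k):
            if mask >> j & 1:
                w ^= rows[j]
        words.append(w)
    return words

def weight(x):
    """Hamming weight of an n-bit word."""
    return bin(x).count("1")

def inner(x, y):
    """Inner product over F_2: parity of the common support."""
    return weight(x & y) % 2

G_BCH = (1 << 8) | (1 << 7) | (1 << 6) | (1 << 4) | 1   # x^8 + x^7 + x^6 + x^4 + 1
H_STAR = (1 << 7) | (1 << 3) | (1 << 1) | 1            # x^7 + x^3 + x + 1

C = cyclic_code(H_STAR, N)        # the [15, 8] code
C_DUAL = cyclic_code(G_BCH, N)    # the [15, 7, 5] BCH code

def char_sum(x):
    """sum over c in C of (-1)^{<c, x>}: |C| if x lies in the dual code, else 0."""
    return sum(-1 if inner(c, x) else 1 for c in C)

dual_distance = min(weight(d) for d in C_DUAL if d)
print(len(C), len(C_DUAL), dual_distance)  # 256 128 5
```

Since both spans consist of distinct words and every pair is orthogonal, the dimension count $8 + 7 = 15$ forces `C_DUAL` to be exactly $C^\perp$, so $C$ has dual distance 5.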
II-B Stieltjes Transform
In this section we recall some basic facts about the Stieltjes transform. Interested readers may refer to [5, Chapter B.2] for more details. The Stieltjes transform can be defined for any real function of bounded variation. For the case of interest to us, however, we confine ourselves to functions arising from probability theory.
Let be a probability measure and let be the corresponding cumulative distribution function. The Stieltjes transform of or is defined by
[TABLE]
where is a complex variable taking values in , the upper half complex plane. Here is the imaginary part of .
It is known that is well-defined for all and is well-behaved, satisfying the following properties:
- (i) for any ;
- (ii) is analytic in and
[TABLE]
where ;
- (iii) the probability measure can be recovered from the Stieltjes transform via the inverse formula (see [5]):
[TABLE]
- (iv) the convergence of Stieltjes transforms is equivalent to the convergence of the underlying probability measures (see for example [5, Theorem B.9]).
II-C Resolvent Identities and Formulas for Green function entries
Let be a Hermitian matrix whose -th entry is . Denote by the Green function of , that is,
[TABLE]
where . The -th entry of is .
Given any subset , let be the matrix whose -th entry is given by . In addition, let be the Green function of , that is,
[TABLE]
When is a singleton, say , it is common to further abbreviate the notation as , and similarly for other matrices.
Let denote the -th column of . For and any , we have the Schur complement formula (see [5, 8])
[TABLE]
where and is the conjugate transpose of .
We also have the following eigenvalue interlacing property (see [5, 8])
[TABLE]
where , is the trace function, and is a constant depending only on the set .
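The Schur complement formula and the interlacing bound can be checked numerically on a small random Hermitian matrix. This sketch is ours, not taken from [5, 8]: it uses a hand-written Gauss-Jordan inverse (standard library only), reads the formula in its usual form $1/G_{ii} = h_{ii} - z - h_i^* G^{(i)} h_i$ with $h_i$ the $i$-th column of $H$ with its $i$-th entry deleted, and checks the case $T = \{i\}$ of the interlacing bound, where the constant can be taken to be 1, i.e. $|\operatorname{tr} G - \operatorname{tr} G^{(i)}| \le 1/\eta$. The helpers `inverse`, `green` and `delete_index` are hypothetical names.

```python
import random

def inverse(A):
    """Inverse of a square complex matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[complex(v) for v in A[i]] + [complex(i == j) for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        lead = M[col][col]
        M[col] = [v / lead for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def green(H, z):
    """Green function G(z) = (H - z I)^{-1}."""
    n = len(H)
    return inverse([[H[a][b] - (z if a == b else 0) for b in range(n)] for a in range(n)])

def delete_index(H, i):
    """H^{(i)}: the matrix H with row i and column i removed."""
    keep = [k for k in range(len(H)) if k != i]
    return [[H[a][b] for b in keep] for a in keep]

rng = random.Random(1)
n, i, z = 6, 2, complex(0.4, 0.5)
H = [[0.0] * n for _ in range(n)]
for a in range(n):                      # random real symmetric (hence Hermitian) matrix
    for b in range(a, n):
        H[a][b] = H[b][a] = rng.gauss(0.0, 1.0)

G = green(H, z)
G_i = green(delete_index(H, i), z)
h = [H[k][i] for k in range(n) if k != i]       # column i without its i-th entry
quad = sum(h[k].conjugate() * G_i[k][l] * h[l] for k in range(n - 1) for l in range(n - 1))
schur_gap = abs(1 / G[i][i] - (H[i][i] - z - quad))
trace_gap = abs(sum(G[k][k] for k in range(n)) - sum(G_i[k][k] for k in range(n - 1)))
print(schur_gap, trace_gap, 1 / z.imag)
```

The first printed number should be at round-off level, and the second should not exceed the third.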
II-D Stieltjes Transform of the Semicircle Law
The Stieltjes transform of the semicircle distribution given in (4) can be computed as (see [5])
[TABLE]
Here and throughout this paper, we always pick the complex square root to be the one with positive imaginary part.
It is well-known that is the unique function that satisfies the equation
[TABLE]
such that whenever .
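The following sketch evaluates the semicircle Stieltjes transform in the standard closed form $s(z) = (-z + \sqrt{z^2 - 4})/2$ on $[-2, 2]$, with the square root chosen to have positive imaginary part as stated above, and checks three facts: it solves $s^2 + zs + 1 = 0$, it agrees with a direct quadrature of $\int \rho(x)/(x - z)\,dx$ for the density $\rho(x) = \sqrt{4 - x^2}/(2\pi)$, and $\frac{1}{\pi}\operatorname{Im} s(E + i\eta)$ recovers $\rho(E)$ as $\eta \to 0$. We assume the sign convention $s(z) = \int d\mu(x)/(x - z)$; the function names are ours.

```python
import cmath
import math

def semicircle_density(x):
    """Semicircle density sqrt(4 - x^2) / (2*pi) on [-2, 2], zero outside."""
    return math.sqrt(4.0 - x * x) / (2.0 * math.pi) if abs(x) <= 2.0 else 0.0

def semicircle_stieltjes(z):
    """s(z) = (-z + sqrt(z^2 - 4)) / 2, with the square root chosen to have Im > 0."""
    r = cmath.sqrt(z * z - 4)
    if r.imag < 0:
        r = -r
    return (-z + r) / 2

def stieltjes_numeric(density, z, a=-2.0, b=2.0, m=20000):
    """Midpoint-rule approximation of the integral of density(x) / (x - z) over [a, b]."""
    h = (b - a) / m
    return h * sum(density(a + (k + 0.5) * h) / (a + (k + 0.5) * h - z) for k in range(m))

z = complex(0.5, 0.3)
s = semicircle_stieltjes(z)
residual = abs(s * s + z * s + 1)                       # self-consistent equation
quad_err = abs(s - stieltjes_numeric(semicircle_density, z))
# Stieltjes inversion: (1/pi) Im s(E + i*eta) -> density(E) as eta -> 0
E = 1.0
inv_err = abs(semicircle_stieltjes(complex(E, 1e-9)).imag / math.pi - semicircle_density(E))
print(residual, quad_err, inv_err)
```

Flipping the sign of the root whenever its imaginary part is negative is what keeps $s$ in the upper half-plane on both sides of the support (for example, $s(-3 + 0.01i)$ has positive real part, as it should).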
II-E Convergence of Stieltjes Transform in Probability
In order to bound the convergence rate of a random Stieltjes transform in probability, we need the following well-known lemma of McDiarmid (see [8, Lemma F.3]).
Lemma 3 (McDiarmid).
Let be independent random variables taking values in the spaces respectively. Let
[TABLE]
be a measurable function and define the random variable . Define, for each ,
[TABLE]
where the supremum is taken over all for and . Then for any , we have
[TABLE]
We will need the following concentration inequality. A very similar inequality is proved in [8, Lemma F.4]; for the sake of completeness, we provide a detailed proof here.
Lemma 4.
Let be a random matrix with independent rows, define . Let be the Stieltjes transform of the empirical spectral distribution of . Then for any and ,
[TABLE]
Proof of Lemma 4.
Applying Lemma 3, we take to be the -th row of and the function to be the Stieltjes transform . Note that the -th entry of is a linear function of the inner product of the -th and -th rows of . Hence changing one row of only gives an additive perturbation of of rank at most two. Applying the resolvent identity [8, (2.3)], we see that the Green function is also only affected by an additive perturbation by a matrix of rank at most two and operator norm at most . Therefore the quantities in (19) can be bounded by
[TABLE]
Then the required result follows directly from inserting the above bound into (20). ∎
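To see where the bounded differences come from, here is one way to make the constants explicit. It is a sketch in our own notation ($N$ independent rows, an $N \times N$ matrix $H$, $s = N^{-1}\operatorname{tr} G$, $\eta = \operatorname{Im} z$), using the standard real-valued McDiarmid inequality on the real and imaginary parts separately; the constants need not match those in (20) or in Lemma 4 as stated.

```latex
% Changing row k changes H by a matrix of rank at most 2; by the resolvent identity
% G - G' = G (H' - H) G', so rank(G - G') <= 2, and ||G - G'|| <= ||G|| + ||G'|| <= 2/eta.
|s - s'| = \frac{1}{N}\bigl|\operatorname{tr}(G - G')\bigr|
  \le \frac{1}{N}\,\operatorname{rank}(G - G')\,\|G - G'\|
  \le \frac{4}{N\eta} =: c_k ,
\qquad \sum_{k=1}^{N} c_k^{2} = \frac{16}{N\eta^{2}} .
% Real McDiarmid: P(|Y - EY| >= u) <= 2 exp(-2u^2 / sum_k c_k^2), applied to Re s and
% Im s with u = t/sqrt(2), plus a union bound over the two parts:
\mathbb{P}\bigl(|s - \mathbb{E}s| \ge t\bigr)
  \le 4\exp\!\Bigl(-\frac{N\eta^{2}t^{2}}{16}\Bigr).
```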
III Proof of Theorem 1
Throughout the paper, let be an linear code over . We always assume that its dual distance satisfies . Denote . The standard additive character on extends component-wise to a natural mapping . Define .
III-A Problem set-up
Theorems 1 and 2 are for random matrices in the probability space of choosing distinct elements uniformly from . Denote by the probability space of choosing elements from independently and uniformly. Because , from (11) we have
[TABLE]
as . Thus to prove Theorems 1 and 2, it is equivalent to consider the larger probability space . This will simplify the proofs.
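The passage from distinct to independent codewords can be quantified by an elementary union bound. In our own notation, let $c_1, \dots, c_p$ be independent uniform picks from $C$, let $D$ be the event that they are pairwise distinct, and let $\mathbb{P}_{\mathrm{ind}}$, $\mathbb{P}_{\mathrm{dist}}$ denote the two probability measures; choosing distinct codewords uniformly is the same as conditioning the independent picks on $D$.

```latex
\mathbb{P}_{\mathrm{ind}}(D^{c}) \le \sum_{1 \le i < j \le p} \mathbb{P}(c_i = c_j)
  = \binom{p}{2}\frac{1}{|C|} \le \frac{p^{2}}{2|C|} ,
% and, since P_dist(E) = P_ind(E \cap D) / P_ind(D), for every event E
\bigl|\mathbb{P}_{\mathrm{dist}}(E) - \mathbb{P}_{\mathrm{ind}}(E)\bigr|
  = \Bigl|\frac{\mathbb{P}_{\mathrm{ind}}(E\cap D)\,\mathbb{P}_{\mathrm{ind}}(D^{c})}{\mathbb{P}_{\mathrm{ind}}(D)}
    - \mathbb{P}_{\mathrm{ind}}(E\cap D^{c})\Bigr|
  \le \mathbb{P}_{\mathrm{ind}}(D^{c}) .
```

Both terms inside the absolute value lie in $[0, \mathbb{P}_{\mathrm{ind}}(D^{c})]$, which gives the last inequality; the two spaces are therefore interchangeable whenever $p^2/|C| \to 0$.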
Now let be a random matrix whose rows are picked from uniformly and independently. Denote by the expectation with respect to the probability space . We may assume that is a function of such that as .
Let
[TABLE]
Let be the empirical spectral measure of and let be its Stieltjes transform, that is,
[TABLE]
Here are the eigenvalues of the matrix , and is the Green function of given by
[TABLE]
Note that the Stieltjes transform is itself a random variable in the space . We define
[TABLE]
Throughout the paper, the complex value is always written as
[TABLE]
For a fixed constant , we define
[TABLE]
Now we assume a result about the expected Stieltjes transform .
Theorem 5.
For any , we write
[TABLE]
Then we have
[TABLE]
We emphasize that this is one of the major technical results of this paper, and its proof is somewhat involved. It is the only result in the paper that is directly related to the properties of linear codes. It requires but not the extra condition (5) used in [10]. To streamline the presentation, we assume Theorem 5 here, from which Theorem 1 follows easily. The proof of Theorem 5 is postponed to Section V.
III-B Proof of Theorem 1
By properties of the Stieltjes transform (see [5, Theorem B.9]), to prove Theorem 1, it is equivalent to prove the following statement: For any , we have
[TABLE]
We prove Statement (25) in several steps.
First, we fix an arbitrary value . The quadratic equation (24) has two solutions
[TABLE]
As , from Theorem 5 we have , so for large enough . Since , we see that
[TABLE]
Then by the continuity of and by taking , we obtain
[TABLE]
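The root selection above can be illustrated numerically. This sketch assumes that (24), which is not reproduced above, has the form $s^2 + zs + 1 = \delta$; its roots are then $(-z \pm \sqrt{z^2 - 4 + 4\delta})/2$, and we keep the one in the upper half-plane (the other root equals $-z$ minus the first, so its imaginary part is below $-\operatorname{Im} z$). Differentiating the equation gives $ds/d\delta = 1/\sqrt{z^2 - 4 + 4\delta}$, so away from the edges $\pm 2$ the selected root moves by roughly $|\delta|/|\sqrt{z^2-4}|$. The function names are ours.

```python
import cmath

def sc_stieltjes(z):
    """Semicircle Stieltjes transform (-z + sqrt(z^2 - 4)) / 2, root with Im > 0."""
    r = cmath.sqrt(z * z - 4)
    if r.imag < 0:
        r = -r
    return (-z + r) / 2

def perturbed_root(z, delta):
    """Root of s^2 + z*s + 1 = delta with the larger imaginary part (assumed form of (24))."""
    r = cmath.sqrt(z * z - 4 + 4 * delta)
    return max(((-z + r) / 2, (-z - r) / 2), key=lambda s: s.imag)

z = complex(0.5, 0.3)
for delta in (0.0, 1e-2, 1e-4, 1e-6):
    gap = abs(perturbed_root(z, delta) - sc_stieltjes(z))
    # to first order, gap is about |delta| / |sqrt(z^2 - 4)|
    print(delta, gap)
```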
Moreover, by Lemma 4, for any fixed , as , we have
[TABLE]
This and (27) immediately imply
[TABLE]
Since (28) holds for any fixed and any , to prove (25) it remains to show that the convergence is “uniform” for all . To do this, we adopt a simple lattice argument.
For any , define the sets
[TABLE]
and
[TABLE]
It is easy to see that and
[TABLE]
For any fixed , define to be the event
[TABLE]
By (28), for any , there is an such that
[TABLE]
Here the set denotes the complement of the event . Then for any such that
[TABLE]
we have
[TABLE]
Finally we consider the event , that is,
[TABLE]
Recall from (13) that the Stieltjes transforms and are both -Lipschitz on the set , and for any , we can find one such that
[TABLE]
So for this we have
[TABLE]
This means that
[TABLE]
Therefore
[TABLE]
for any .
Hence for any , we have
[TABLE]
Taking the limit , we obtain the desired Statement (25). This completes the proof of Theorem 1.
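The lattice argument can be summarized in one inequality. In our own notation, let $D_\varepsilon \subset D$ be a finite set such that every $z \in D$ has some $w \in D_\varepsilon$ with $|z - w| \le \varepsilon$, and suppose both transforms are $L$-Lipschitz on $D$ (for Stieltjes transforms of probability measures, $L = \eta^{-2}$ works on $\{\operatorname{Im} z \ge \eta\}$).

```latex
|s_n(z) - s(z)| \le |s_n(z) - s_n(w)| + |s_n(w) - s(w)| + |s(w) - s(z)|
               \le |s_n(w) - s(w)| + 2L\varepsilon ,
% hence, by the union bound over the finite net,
\mathbb{P}\Bigl(\sup_{z\in D}|s_n(z) - s(z)| > t + 2L\varepsilon\Bigr)
  \le \sum_{w\in D_\varepsilon}\mathbb{P}\bigl(|s_n(w) - s(w)| > t\bigr).
```

For fixed $\varepsilon$ the net is finite, so pointwise convergence in probability at each net point yields the uniform statement; the same bookkeeping, with a net that may depend on $n$, underlies the proof of Theorem 7.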
IV Proof of Theorem 2
Now for fixed constants and , let us assume
[TABLE]
As in the proof of Theorem 1 in the previous section, here we assume Theorem 5. The main idea of proving Theorem 2 is then to provide a refined and quantitative version of Statement (25), so in each step of the proof we need to keep track of all the varying parameters as .
First, the upper bound for in Theorem 5 can be simplified as
[TABLE]
where the constant is explicitly given in (7).
Let us define
[TABLE]
From now on, denotes some positive constant depending only on and whose value may vary at each occurrence. We can estimate the difference as follows.
Lemma 6.
For any , we have
[TABLE]
Proof of Lemma 6.
First, for large enough , noting that
[TABLE]
we see that Equation (26) holds for all . More precisely, we have
[TABLE]
By using the fact which can be easily checked from (17), we conclude that
[TABLE]
Then Lemma 6 is proved. ∎
Next we estimate the term . An -dependent event is said to hold with high probability if for any , there is a quantity such that for any .
Theorem 7.
We have, with high probability,
[TABLE]
Proof of Theorem 7.
By the concentration inequality given in Lemma 4, we have
[TABLE]
Inequality (29) holds for any fixed . To prove Theorem 7, we need an upper bound that is uniform for all , so we apply a lattice argument again.
Let
[TABLE]
Note that the set and
[TABLE]
Also, for any and , define to be the event
[TABLE]
and the complement. Then (29) can be rewritten as
[TABLE]
So we have
[TABLE]
for any and .
Finally we consider the event , that is,
[TABLE]
Noting that for any , there is such that
[TABLE]
and that and are both -Lipschitz on , we obtain, for any ,
[TABLE]
This means that
[TABLE]
Hence by (30) we have
[TABLE]
for all .
Combining the above inequality with Lemma 6 completes the proof of Theorem 7. ∎
Proof of Theorem 2.
As a standard application of the Helffer-Sjöstrand formula via complex analysis, Theorem 2 can be derived directly from Theorem 7. This is quite well-known, and the computation is routine. Interested readers may refer to [8, Section 8] for a very similar analysis. We omit the details. ∎
V Proof of Theorem 5
In this section we give a detailed proof of Theorem 5, where the condition that plays an important role.
Recall from the beginning of Section III that is a linear code of length over with , is the standard additive character on , extended component-wise to , , and is a random matrix whose rows are selected uniformly and independently from . This makes a probability space, on which we use to denote the expectation. Let and be defined as in (21). Since all the entries of are roots of unity, the diagonal entries of are all zero.
Let be the -th entry of . The following properties of , while very simple, depend crucially on the condition that .
Lemma 8.
For any , we have
(a) if ;
(b) if the indices do not come in pairs; if the indices come in pairs, then .
Proof of Lemma 8.
(a) It is easy to see that
[TABLE]
where and . Here in the 1 and appear at the -th and -th entries respectively. Since , we have , and the desired result follows directly from (12).
(b) It is easy to see that
[TABLE]
where the vector is formed from the all-zero vector by adding s to the -th and -th entries and then adding s from the -th and -th entries. If the indices do not come in pairs, then . Since , we have by (12). The second statement of (b) is trivial since for any . ∎
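In the binary case the two parts of Lemma 8 can be verified exhaustively on a small code. This sketch reads them as statements about a uniform codeword $c \in C$: (a) $\mathbb{E}[(-1)^{c_a + c_b}] = 0$ for $a \ne b$, and (b) $\mathbb{E}[(-1)^{c_a + c_b + c_c + c_d}] = 0$ unless every index occurs an even number of times, in which case the average is 1. That is our reading of displays that did not survive extraction. As in the proof, the average equals $\frac{1}{|C|}\sum_{c}(-1)^{\langle c, v\rangle}$ for the vector $v = e_a + e_b + \cdots$ over $\mathbb{F}_2$, which vanishes because $v$ is nonzero of weight at most 4 and so cannot lie in $C^\perp$. The code is again the $[15, 8]$ cyclic code with dual distance 5; helper names are ours.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement

N = 15
H_STAR = (1 << 7) | (1 << 3) | (1 << 1) | 1   # [15, 8] cyclic code, dual distance 5

def codewords(gen, n):
    """All codewords of the binary cyclic code generated by `gen` (bit i = coeff of x^i)."""
    k = n - (gen.bit_length() - 1)
    rows = [gen << j for j in range(k)]
    words = []
    for mask in range(1 << k):
        w = 0
        for j in range(k):
            if mask >> j & 1:
                w ^= rows[j]
        words.append(w)
    return words

C = codewords(H_STAR, N)

def mean_character(indices):
    """Exact average over c in C of prod_{a in indices} (-1)^{c_a} = (-1)^{<c, v>}."""
    v = 0
    for a in indices:
        v ^= 1 << a                       # v = sum of unit vectors e_a over F_2
    return sum(-1 if bin(c & v).count("1") % 2 else 1 for c in C) / len(C)

def paired(indices):
    """True when every index occurs an even number of times."""
    return all(m % 2 == 0 for m in Counter(indices).values())

part_a = all(mean_character((a, b)) == 0 for a, b in combinations(range(N), 2))
part_b = all(mean_character(t) == (1 if paired(t) else 0)
             for t in combinations_with_replacement(range(N), 4))
print(part_a, part_b)
```

Running the same check on a code whose dual contains a weight-4 word, such as a first-order Reed-Muller code, would make part (b) fail, in line with the optimality remark after Theorem 1.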
For any , let be the matrix obtained from by changing the whole -th row to 0. Define
[TABLE]
Denote by the -th row of , and the -th column of . It is easy to see that
[TABLE]
Let
[TABLE]
be the Green functions of and respectively for the complex variable .
For the Green function , we start with the resolvent identity (15) for . Using (31), we can express the third term on the right side of (15) as
[TABLE]
By the identity
[TABLE]
the right hand side can be further expressed as
[TABLE]
where
[TABLE]
Here the indices vary in and ’s are the -th entry of the matrix given by
[TABLE]
Hence the resolvent identity (15) yields
[TABLE]
Expanding the second term on the right, we obtain
[TABLE]
where
[TABLE]
V-A Estimates of and
The random variables and depend on the complex value . For any fixed constant , recall defined in (23). Throughout this section we always assume .
Lemma 9.
Let . Then for any , we have
(a) . Here is the conditional expectation given ;
(b) .
Proof of Lemma 9.
(a) Since the rows of are independent, the entries as defined in (33) are independent with and . Hence from the definition of in (32) and statement (a) of Lemma 8, we have
[TABLE]
The proof of the result on is similar by replacing with .
(b) Expanding and taking expectation inside, noting that the rows of are independent, we have
[TABLE]
Since , by using statement (b) of Lemma 8, we find
[TABLE]
where is an absolute constant which may be different in each appearance. Using the definition of in (33) we have
[TABLE]
Expanding the terms on the right, we can easily obtain
[TABLE]
Here are the eigenvalues of , and is a positive constant depending only on whose value may vary in each occurrence. ∎
The above estimates lead to the following estimates of .
Lemma 10.
Let . Then for any , we have
(a) ;
(b) .
Proof of Lemma 10.
(a) Taking expectation on in (35) and noting that , we get
[TABLE]
By the eigenvalue interlacing property in (16) and the trivial bound , we get
[TABLE]
(b) We split as
[TABLE]
where
[TABLE]
We first estimate . Using (a) of Lemma 9, we see that
[TABLE]
Hence by (b) of Lemma 9 we obtain
[TABLE]
Next we estimate . Again by Lemma 9 we have
[TABLE]
So we have
[TABLE]
Here we denote and for any , and for any subset , we denote to be the conditional expectation given . The second equality follows from applying successively the law of total variance to the rows of .
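One standard form of the identity obtained by applying the law of total variance successively to the rows is the martingale-difference decomposition below. The notation is ours: $Y$ is a complex function of independent rows $r_1, \dots, r_N$, and $\mathbb{E}_k$ is the conditional expectation given $r_1, \dots, r_k$, so that $\mathbb{E}_0 = \mathbb{E}$ and $\mathbb{E}_N Y = Y$.

```latex
% D_k = E_k Y - E_{k-1} Y are martingale differences, so E[D_k \overline{D_l}] = 0 for k != l, and
\operatorname{Var}(Y) = \mathbb{E}\,|Y - \mathbb{E}Y|^{2}
  = \mathbb{E}\,\Bigl|\sum_{k=1}^{N} D_k\Bigr|^{2}
  = \sum_{k=1}^{N} \mathbb{E}\,\bigl|\mathbb{E}_k Y - \mathbb{E}_{k-1} Y\bigr|^{2}.
```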
For , writing , we can easily check that
[TABLE]
where . By (16) we have . Hence we obtain
[TABLE]
Plugging the estimates of in statement (a), in (37) and above into the equation (36), we obtain the desired estimate of . ∎
V-B Proof of Theorem 5
We can now complete the proof of Theorem 5.
Proof of Theorem 5.
We write (34) as
[TABLE]
where
[TABLE]
Taking expectations on both sides of (39), we can obtain
[TABLE]
where
[TABLE]
and
[TABLE]
For , since
[TABLE]
we obtain
[TABLE]
For , using the fact that and Lemma 10 we obtain
[TABLE]
for any .
Summing for all and then dividing on both sides of (40), it is easy to see that in writing
[TABLE]
the quantity satisfies the same bound as above. This completes the proof of Theorem 5. ∎
Acknowledgments
The research of M. Xiong was supported by RGC grant number 16303615 from Hong Kong.
References
- [1] B. Babadi, S. S. Ghassemzadeh and V. Tarokh, “Group randomness properties of pseudo-noise and Gold sequences,” Proc. 12th Can. Workshop Inf. Theory (CWIT) (2011), 42–46.
- [2] B. Babadi and V. Tarokh, “Spectral distribution of random matrices from binary linear block codes,” IEEE Trans. Inf. Theory 57 (2011), no. 6, 3955–3962.
- [3] B. Babadi and V. Tarokh, “Spectral distribution of product of pseudorandom matrices formed from binary block codes,” IEEE Trans. Inf. Theory 59 (2013), no. 2, 970–978.
- [4] Z. Bai, “Convergence rate of expected spectral distributions of large random matrices. Part I. Wigner matrices,” Ann. Probab. 21 (1993), no. 2, 625–648.
- [5] Z. Bai and J. W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, 2nd ed., Springer Series in Statistics. New York: Springer, 2010.
- [6] Z. Bai and Y. Yin, “Convergence to the semicircle law,” Ann. Probab. 16 (1988), no. 2, 863–875.
- [7] Z. Bao, “Strong convergence of ESD for the generalized sample covariance matrices when p/n → 0,” Statist. Probab. Lett. 82 (2012), no. 5, 894–901.
- [8] F. Benaych-Georges and A. Knowles, Lectures on the local semicircle law for Wigner matrices, 2016.
