A Uniform Bound on the Operator Norm of Sub-Gaussian Random Matrices and Its Applications
Grigory Franguridi, Hyungsik Roger Moon

TL;DR
This paper establishes a uniform bound on the operator norm of sub-Gaussian random matrices with dependent entries, useful for statistical estimation and factor analysis in high-dimensional settings.
Contribution
It provides a novel uniform bound involving Talagrand's functional for matrices with dependent sub-Gaussian entries, extending previous results to more complex data structures.
Findings
Bound applies to matrices with weakly dependent sub-Gaussian entries.
The bound incorporates the complexity of the parameter space via Talagrand's functional.
Applications include operator norm minimization in moment condition estimation and functional data factor analysis.
Abstract
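For an $N \times T$ random matrix $X(\beta)$ with weakly dependent uniformly sub-Gaussian entries $x_{it}(\beta)$ that may depend on a possibly infinite-dimensional parameter $\beta \in \mathbf{B}$, we obtain a uniform bound on its operator norm of the form $\mathbb{E} \sup_{\beta \in \mathbf{B}} \|X(\beta)\| \le C K \left( \sqrt{\max(N,T)} + \gamma_2(\mathbf{B}, d_{\mathbf{B}}) \right)$, where $C$ is an absolute constant, $K$ controls the tail behavior of (the increments of) $x_{it}(\cdot)$, and $\gamma_2(\mathbf{B}, d_{\mathbf{B}})$ is Talagrand's functional, a measure of multi-scale complexity of the metric space $(\mathbf{B}, d_{\mathbf{B}})$. We illustrate how this result may be used for estimation that seeks to minimize the operator norm of moment conditions as well as for estimation of the maximal number of factors with functional data.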
Table 1: Simulated bias and root MSE of the maximal rank estimator (20).

| | | 25 | 50 | 100 | 25 | 50 | 100 | 25 | 50 | 100 |
|---|---|---|---|---|---|---|---|---|---|---|
| 25 | Bias | 3.5 | 1.7 | 0.2 | 2.0 | 0.7 | 0.0 | 6.5 | 4.2 | 1.3 |
| | RMSE | 0.6 | 0.6 | 0.4 | 0.6 | 0.6 | 0.1 | 0.5 | 0.6 | 0.6 |
| 50 | Bias | 2.0 | 0.0 | 0.0 | 0.9 | 0.0 | 0.0 | 4.5 | 4.8 | 0.2 |
| | RMSE | 0.6 | 0.1 | 0.0 | 0.6 | 0.0 | 0.0 | 0.6 | 0.6 | 0.4 |
| 100 | Bias | 0.3 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 1.7 | 0.3 | 0.9 |
| | RMSE | 0.5 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.6 | 0.5 | 0.5 |
Acknowledgments: We appreciate valuable comments and suggestions from Victor Chernozhukov, Guido Kuersteiner (Co-editor), three anonymous referees, and the participants of the conference on econometrics celebrating Peter Phillips' 40 years at Yale.
Grigory Franguridi: Department of Economics, University of Southern California.
Hyungsik Roger Moon: Department of Economics, University of Southern California and Yonsei University.
Keywords Random Matrix Theory, Operator Norm, Uniform Bound, Operator Norm Minimizing Estimator, Functional Factor Models.
1 Introduction
Since its introduction in nuclear physics (Wigner, 1955) and mathematical statistics (Wishart, 1928), random matrix theory has been developed to understand the properties of the spectra of large-dimensional random matrices generated by various distributions. These include the asymptotic theory of the empirical distribution of the eigenvalues of large-dimensional random matrices and bounds on the extreme eigenvalues. For detailed results on these topics, see the surveys by Bai (2008), Edelman and Rao (2005), Bai and Silverstein (2010), and Tao (2012), among others.
In random matrix theory, the study of the asymptotics of the largest eigenvalue of large-dimensional random matrices goes back to Geman (1980). Suppose that $X$ is an $N \times T$ matrix consisting of random variables $x_{it}$. Many researchers have derived the limit of the largest eigenvalue of the sample covariance matrix formed from $X$ and its transpose $X'$ under various distributional assumptions on the random matrix $X$. For example, for iid entries, Geman (1980) established the almost sure limit of the largest eigenvalue under suitable moment conditions. Johnstone (2001) obtained the stronger result that the properly centered and scaled largest eigenvalue converges to the Tracy–Widom law; this was later shown to hold under more general distributional assumptions by Khorunzhiy (2012) and Tao and Vu (2011), among many others.
The aforementioned results imply that the largest eigenvalue of $XX'$ is stochastically bounded of order $\max(N,T)$ (a sequence of random variables $\xi_n$ is said to be stochastically bounded of order $a_n$, written $\xi_n = O_P(a_n)$, if for any $\varepsilon > 0$ there exists $M > 0$ such that $P(|\xi_n| > M a_n) < \varepsilon$ for all large enough values of $n$), or equivalently, that the operator norm $\|X\|$ is stochastically bounded of order $\sqrt{\max(N,T)}$. In fact, such a bound does not require the underlying distribution to be Gaussian and can be derived under much weaker conditions. For example, Latała (2005) showed that the bound holds if the $x_{it}$ are independent across $i$ and $t$ with mean zero and uniformly bounded fourth moments. Moon and Weidner (2017) extended this result to cases where the $x_{it}$ are weakly correlated across $i$ or $t$. Other papers that have established similar bounds on $\|X\|$ include Bandeira and Van Handel (2016), Guédon et al. (2017) and Latała et al. (2018).
In the case where $X$ consists of independent sub-Gaussian entries, the order $\sqrt{\max(N,T)}$ for the operator norm may be obtained using a powerful way of bounding sub-Gaussian stochastic processes called generic chaining, which was developed in Fernique (1976) and advanced later by M. Talagrand in a series of papers. Indeed, note that $\|X\| = \max_{u \in S^{N-1}} \max_{v \in S^{T-1}} u'Xv$, where the maxima are taken over the unit spheres $S^{N-1} \subset \mathbb{R}^N$ and $S^{T-1} \subset \mathbb{R}^T$, respectively. The process $(u,v) \mapsto u'Xv$ defined on $S^{N-1} \times S^{T-1}$ can be shown to be sub-Gaussian, and so we can invoke generic chaining to bound its expected maximum in terms of a certain measure of metric complexity of $S^{N-1} \times S^{T-1}$ called Talagrand's functional (see the definition in the next section). It turns out that this functional has exact order $\sqrt{\max(N,T)}$.
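As a quick numerical illustration of this order (a sketch, not part of the paper's argument), the snippet below draws matrices with independent standard normal entries, which are sub-Gaussian, and compares the largest singular value with the square root of the larger dimension. The matrix sizes, the seed, and the helper name `operator_norm` are arbitrary choices made for this example.

```python
import numpy as np

def operator_norm(x: np.ndarray) -> float:
    """Largest singular value of x, i.e. the maximum of u' x v over unit vectors u, v."""
    return float(np.linalg.norm(x, ord=2))

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
ratios = {}
for n, t in [(50, 50), (100, 400), (400, 100), (800, 800)]:
    x = rng.standard_normal((n, t))  # independent N(0, 1) entries
    # Ratio of the operator norm to sqrt(max(N, T)).
    ratios[(n, t)] = operator_norm(x) / np.sqrt(max(n, t))
    print(n, t, round(ratios[(n, t)], 2))
```

If the stated order is right, the printed ratios should stay within a bounded band as the dimensions grow rather than drifting upward.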
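In this paper we extend existing nonasymptotic bounds on the operator norm of a high-dimensional random matrix to the case of elements that are allowed to be weakly dependent and to be functions of a possibly infinite-dimensional parameter. Specifically, let $x_{it}(\beta)$ be sub-Gaussian stochastic processes that are weakly dependent over $t$ and indexed by a parameter $\beta$ belonging to a (pseudo-)metric space $(\mathbf{B}, d_{\mathbf{B}})$. Let $X(\beta)$ be the $N \times T$ matrix consisting of $x_{it}(\beta)$ and let $\gamma_2(\mathbf{B}, d_{\mathbf{B}})$ be Talagrand's functional of $\mathbf{B}$ w.r.t. $d_{\mathbf{B}}$. Our main contribution is to show that $\mathbb{E} \sup_{\beta \in \mathbf{B}} \|X(\beta)\|$ is of order $\sqrt{\max(N,T)} + \gamma_2(\mathbf{B}, d_{\mathbf{B}})$.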
We illustrate the usefulness of this uniform bound with two examples. In one, we propose a new estimator that minimizes the operator norm of a matrix of moment functions and show its consistency. In the other, we consider the generalization of the standard factor model to the case of functional data and suggest a new estimator of the maximal number of factors.
The paper is organized as follows. Section 2 introduces our uniform bound along with the techniques necessary for its derivation. Section 3 contains two applications of our theoretical result. Section 4 presents a Monte Carlo illustration. Finally, Section 5 concludes the paper. The appendix contains two technical proofs of the results in the main text.
Throughout the paper, $C$ will denote a universal positive constant that may not be the same at each occurrence but never depends on sample sizes, dimensions, or any other features of the modeling framework.
2 Uniform bound on the operator norm
2.1 Generic chaining bound
Our main result is based on the general bound on suprema of sub-Gaussian processes called the generic chaining bound. We discuss this classic technique in this section and provide a proof in the appendix for completeness.
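First, we need the following definitions. The $\psi$-Orlicz norm of a random variable $X$ is defined as

$$\|X\|_{\psi} = \inf\left\{ c > 0 : \mathbb{E}\, \psi\left( |X| / c \right) \le 1 \right\},$$

where $\psi$ is a convex function satisfying $\psi(0) = 0$ and $\psi(x) \to \infty$ as $x \to \infty$, and the convention that the infimum of an empty set is $\infty$. In this paper, we let $\psi = \psi_2$, where $\psi_2(x) = \exp(x^2) - 1$, and call $\|\cdot\|_{\psi_2}$ just "the Orlicz norm". A random variable with finite ($\psi_2$-)Orlicz norm is called sub-Gaussian.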
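Intuitively, the Orlicz norm quantifies how fast the tails of the distribution of $X$ decay. In fact, finiteness of $\|X\|_{\psi_2}$ is equivalent to the Gaussian-type tail bound (see e.g. Vershynin (2018), Proposition 2.5.2)

$$P(|X| > t) \le 2 \exp\left( -c\, t^2 / \|X\|_{\psi_2}^2 \right) \quad \text{for all } t \ge 0,$$

where $c > 0$ is an absolute constant.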
Hence, for example, Gaussian distributions and distributions with bounded support are all sub-Gaussian.
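To make the definition concrete, here is a minimal sketch that computes the $\psi_2$-Orlicz norm, $\inf\{c > 0 : \mathbb{E}\exp(X^2/c^2) \le 2\}$, of a random variable with finitely many values by bisection. The function name `psi2_norm` and the discrete-distribution interface are assumptions made for this illustration. For a Rademacher variable (values $\pm 1$ with probability $1/2$ each) the condition reads $\exp(1/c^2) = 2$, so the norm is $1/\sqrt{\log 2}$.

```python
import math

def psi2_norm(values, probs, tol=1e-12):
    """psi_2-Orlicz norm of a discrete variable taking values[i] with probability probs[i]."""
    scale = max(abs(v) for v in values)
    if scale == 0.0:
        return 0.0

    def excess(c):
        # E exp(X^2 / c^2) - 2; nonpositive exactly when c is admissible.
        return sum(p * math.exp((v / c) ** 2) for v, p in zip(values, probs)) - 2.0

    lo = hi = scale
    while excess(hi) > 0.0:   # enlarge hi until the constraint holds
        hi *= 2.0
    while excess(lo) <= 0.0:  # shrink lo until the constraint fails
        lo /= 2.0
    while hi - lo > tol * scale:  # bisection on the monotone function excess
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return hi

rademacher_norm = psi2_norm([-1.0, 1.0], [0.5, 0.5])
print(rademacher_norm, 1.0 / math.sqrt(math.log(2.0)))
```

Bounded variables always have a finite norm under this definition, which is one way to see that distributions with bounded support are sub-Gaussian.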
Note also that the last inequality implies
[TABLE]
Now let $\mathcal{T}$ be a set and $d$ be a (pseudo-)metric on this set, so that $(\mathcal{T}, d)$ is a (pseudo-)metric space (throughout the paper, "metric" can be replaced by the less restrictive notion of "pseudometric", a distinction we omit from now on). Consider a zero mean stochastic process $(X_t)_{t \in \mathcal{T}}$ indexed by the elements of $\mathcal{T}$. The process is said to have sub-Gaussian increments if there exists a constant $K \ge 0$ such that

$$\|X_t - X_s\|_{\psi_2} \le K\, d(t, s) \quad \text{for all } t, s \in \mathcal{T}. \tag{2}$$
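It has long been understood that the behavior of a sub-Gaussian process is intimately connected to the metric complexity of its index set. In particular, the conventional bound on the expected supremum of $X_t$ (see e.g. Van der Vaart and Wellner (1996), Corollary 2.2.8) is

$$\mathbb{E} \sup_{t \in \mathcal{T}} X_t \le C K \int_0^\infty \sqrt{\log N(\mathcal{T}, d, \varepsilon)}\, d\varepsilon, \tag{3}$$

where $K$ is the constant in the sub-Gaussian increment condition, $N(\mathcal{T}, d, \varepsilon)$ is the covering number of $\mathcal{T}$ (i.e. the minimal number of $\varepsilon$-balls sufficient to cover $\mathcal{T}$ in the metric $d$), and $C$ is an absolute constant. The integral on the right-hand side is sometimes called Dudley's entropy of $(\mathcal{T}, d)$ and quantifies the complexity of $\mathcal{T}$ across multiple scales.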
It turns out, however, that Dudley's entropy bound is not optimal, even for Gaussian processes. In fact, the entropy may be infinite when the expected supremum is not, rendering the bound uninformative (for an illustrative example, see Exercise 8.1.12 in Vershynin (2018)).
This led to the development of more precise ways to control suprema of sub-Gaussian processes in Fernique (1976) and Talagrand (2006). The generic chaining bound is stronger than (3) and is sharp for Gaussian processes (see Section 8.6 in Vershynin (2018)). To introduce it, we need another definition.
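For a metric space $(\mathcal{T}, d)$, a sequence of finite subsets $(T_k)_{k \ge 0}$, $T_k \subset \mathcal{T}$, is admissible if their cardinalities satisfy

$$|T_0| = 1, \qquad |T_k| \le 2^{2^k}, \quad k = 1, 2, \dots$$

Let the distance from the point $t \in \mathcal{T}$ to the set $T_k$ be

$$d(t, T_k) = \inf_{s \in T_k} d(t, s).$$

Talagrand's functional is then defined by the formula

$$\gamma_2(\mathcal{T}, d) = \inf_{(T_k)} \sup_{t \in \mathcal{T}} \sum_{k=0}^{\infty} 2^{k/2}\, d(t, T_k),$$

where the infimum is taken over all admissible sequences $(T_k)$. Note that we can restrict our attention to only those admissible sequences that eventually come arbitrarily close to any point $t \in \mathcal{T}$, which is possible provided $\mathcal{T}$ is separable (a metric space is separable if it has a countable dense subset).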
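Because Talagrand's functional is an infimum over admissible sequences, any single admissible sequence yields an upper bound on it. The sketch below builds one such sequence for a finite point set by a greedy farthest-point rule, respecting the cardinality caps 1 and $2^{2^k}$, and evaluates the corresponding supremum of weighted distances. The greedy rule, the function name `gamma2_upper_bound`, and the test set are choices made for this illustration; the result is an upper bound, not the exact value of the functional.

```python
def gamma2_upper_bound(points, dist):
    """Upper bound on Talagrand's functional of a finite, non-empty point set.

    Uses one greedily built admissible sequence T_0 c T_1 c ... with
    |T_0| = 1 and |T_k| <= 2^(2^k).
    """
    n = len(points)
    # T_0 = {points[0]}; d_to[i] is the distance from point i to the current T_k.
    d_to = [dist(p, points[0]) for p in points]
    total = list(d_to)  # k = 0 term of the sum: 2^(0/2) * d(t, T_0)
    size, k = 1, 0
    while max(d_to) > 0.0:
        k += 1
        budget = 2 ** (2 ** k)  # cardinality cap for T_k
        while size < budget and max(d_to) > 0.0:
            far = max(range(n), key=lambda i: d_to[i])  # farthest point from T_k so far
            d_to = [min(d_to[i], dist(points[i], points[far])) for i in range(n)]
            size += 1
        # Add the level-k term 2^(k/2) * d(t, T_k) for every point t.
        total = [total[i] + 2 ** (k / 2) * d_to[i] for i in range(n)]
    return max(total)  # supremum over t of the weighted sum

grid = [i / 99 for i in range(100)]  # 100 equally spaced points in [0, 1]
bound = gamma2_upper_bound(grid, lambda a, b: abs(a - b))
print(bound)
```

Once every point belongs to some $T_k$, all later terms of the sum vanish, which is why the loop stops as soon as all distances are zero.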
To understand the relation between Talagrand's functional and Dudley's entropy, we reproduce the discussion in Talagrand (2006), pp. 12–13.
Denote , for , and
[TABLE]
Note that
[TABLE]
where the second equality holds because minimizing the sum w.r.t. all admissible sequences can be performed by separately minimizing each term w.r.t. subsets satisfying .
The definition of involves choosing at most points in such that the balls with radius and centers in cover ; moreover, is the minimal such radius, i.e.
[TABLE]
It follows that if , then or . Hence we can write
[TABLE]
Since for , summation over yields
[TABLE]
where, of course, .
The term on the left hand side of this inequality satisfies
[TABLE]
Combining this with (6) and (7) yields the key relation
[TABLE]
Hence, when used as an upper bound, Talagrand’s functional is sharper than Dudley’s entropy.
We are now ready to state the generic chaining bound for sub-Gaussian processes; see e.g. Theorem 8.5.3 in Vershynin (2018).
Theorem 1 (Generic chaining).
Let $(X_t)_{t \in \mathcal{T}}$ be a mean zero random process on a separable metric space $(\mathcal{T}, d)$ with sub-Gaussian increments as in (2). Then, for some absolute constant $C$,

$$\mathbb{E} \sup_{t \in \mathcal{T}} X_t \le C K \gamma_2(\mathcal{T}, d).$$
Proof.
See Appendix A. ∎
2.2 The main result
We impose the following assumptions.
Assumption 1.
The parameter $\beta$ belongs to a separable metric space $(\mathbf{B}, d_{\mathbf{B}})$.
Assumption 2.
For each , random variables follow different MA() processes for each , viz.
[TABLE]
where are nonrandom coefficients such that, for all and ,
[TABLE]
Assumption 3.
Innovations are independent, mean zero, sub-Gaussian random variables with uniformly bounded scaling factors, i.e. there exists s.t. for all
[TABLE]
Assumption 4.
Innovations are separable stochastic processes (a stochastic process $X$ on a separable metric space with a countable dense subset $T_0$ is called separable if for every point $t$ there exists a sequence $t_n \in T_0$ such that $t_n \to t$ and $X_{t_n} \to X_t$ almost surely; non-separable stochastic processes have separable copies under very weak conditions, see Shalizi and Kontorovich (2010)) whose increments are sub-Gaussian with uniformly bounded constants, i.e. there exists a constant such that, for all indices and all pairs of parameter values,
[TABLE]
Assumption 1 is very weak and only imposes separability of the metric space, which holds for most parameter spaces encountered in practice such as Euclidean spaces and spaces of integrable functions. Assumption 2 is similar to case (ii) in Lemma S.2.1 of Moon and Weidner (2017) and allows the entries to be weakly dependent over time. Assumptions 3 and 4 impose uniform sub-Gaussianity on the innovations and on their increments, respectively. Note that Assumption 4 is equivalent to the tail bound
[TABLE]
Denote and let the matrix consisting of , , Equation (8) can be rewritten in the matrix form as
[TABLE]
Suppose for a moment that we have a bound on of the form
[TABLE]
where does not depend on . Then
[TABLE]
This shows that the bound on is, up to the absolute constant , the same as the bound on . Hence we can focus on obtaining the latter bound from now on. It will be clear from the proof that the bound will not depend on , so we consider the case and denote for brevity.
The operator norm of can be expressed as
[TABLE]
where and are unit spheres in and , respectively, and the process
[TABLE]
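The representation of the operator norm as a supremum of a bilinear form over two unit spheres can be checked numerically. The sketch below maximizes $u'Xv$ by alternating between the best unit $u$ for a given $v$ and the best unit $v$ for a given $u$ (a power-iteration scheme), then compares the result with the largest singular value reported by NumPy. The function name `bilinear_sup`, the matrix size, and the iteration count are assumptions made for this example.

```python
import numpy as np

def bilinear_sup(x: np.ndarray, n_iter: int = 5000, seed: int = 0) -> float:
    """Approximate the maximum of u' x v over unit vectors u and v."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(x.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = x @ v
        u /= np.linalg.norm(u)  # maximizer over unit u for the current v
        v = x.T @ u
        v /= np.linalg.norm(v)  # maximizer over unit v for the current u
    return float(u @ x @ v)

rng = np.random.default_rng(3)
x = rng.standard_normal((30, 20))
approx = bilinear_sup(x)
exact = float(np.linalg.norm(x, ord=2))  # largest singular value
print(approx, exact)
```

The alternating scheme can never exceed the largest singular value, and it approaches it from below as the iterations proceed.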
Define the product metric on by
[TABLE]
where denotes the standard Euclidean metric on .
To obtain a uniform bound on , we would like to apply Theorem 1 to the process defined on the metric space . Our first lemma asserts that has sub-Gaussian increments.
Lemma 1.
Under Assumptions 1, 3, 4, the process has sub-Gaussian increments w.r.t. the metric , with the constant
Proof.
For , write
[TABLE]
Recall a standard result for the norm (see e.g. equation (2.1) in Mendelson and Tomczak-Jaegermann, (2008)): there exists an absolute constant such that for all constants and independent centered variables one has
[TABLE]
Applying this inequality, we obtain
[TABLE]
This implies
[TABLE]
which completes the proof. ∎
Our second lemma establishes the bound on Talagrand’s functional of a product space in terms of Talagrand’s functionals of component spaces.
Lemma 2 (Talagrand's functional of a product space).
Consider a finite number of metric spaces and the product space with the product metric defined by
[TABLE]
Talagrand’s functional of satisfies
[TABLE]
Proof.
See Appendix B. ∎
Finally, by Lemma 1, we can apply the generic chaining bound of Theorem 1 to the process defined on the separable product metric space with the product metric. Lemma 2 then yields
[TABLE]
For the unit sphere in , its Dudley’s entropy satisfies
[TABLE]
Besides, Talagrand’s functional is bounded from above by Dudley’s entropy (e.g. Exercise 8.5.7 in Vershynin, (2018)), up to absolute constant factors.
Applying these bounds to unit spheres and gives
[TABLE]
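To see why the entropy integral of a unit sphere in $\mathbb{R}^n$ is of order $\sqrt{n}$, one can plug in the standard volumetric covering bound $N(\varepsilon) \le (1 + 2/\varepsilon)^n$ for $\varepsilon \le 2$ and note that a single ball suffices beyond that radius. This covering bound is an assumption of the sketch (the paper's own display is not reproduced here). The integral then factors as $\sqrt{n}$ times a dimension-free constant, which the code evaluates with a midpoint rule.

```python
import math

def dudley_integral_sphere(n: int, steps: int = 200_000) -> float:
    """Midpoint-rule value of the integral over (0, 2) of sqrt(n * log(1 + 2/eps))."""
    h = 2.0 / steps
    total = 0.0
    for j in range(steps):
        eps = (j + 0.5) * h  # midpoint of the j-th subinterval
        total += math.sqrt(n * math.log(1.0 + 2.0 / eps))
    return total * h

# Entropy integral divided by sqrt(n) for several dimensions.
scaled = {n: dudley_integral_sphere(n) / math.sqrt(n) for n in (10, 100, 1000)}
for n, value in scaled.items():
    print(n, round(value, 4))
```

The scaled values coincide across dimensions, so under the assumed covering bound the entropy integral grows exactly like $\sqrt{n}$.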
Finally, taking into account the inequality (10), we obtain the main theoretical result of this paper.
Theorem 2.
[TABLE]
where
Remarks
- (i)
Generic chaining yields not only the bound on the expected value, but also tail bounds and bounds on moments of , see e.g. Dirksen, (2015). In particular, it follows from Theorem 8.5.5 of Vershynin, (2018) that, for all , the event
[TABLE]
holds with probability at least , where is the diameter of in .
- (ii)
Suppose are Gaussian random variables. Then the process is Gaussian and therefore the bound (11) is sharp, up to an absolute constant, by the majorizing measure theorem, see Theorem 8.6.1 in Vershynin, (2018).
- (iii)
If is a bounded set in , the main result and majorization of Talagrand’s functional with Dudley’s entropy yield
[TABLE]
In particular, if consists of one element (so that there is no dependence on ), the bound reduces to
[TABLE]
which is a classical result in random matrix theory, see e.g. Latała, (2005).
- (iv)
The dimension of is allowed to grow with the sample size; of course, to maintain the rate for the operator norm, the dimension should not grow faster than .
- (v)
Theorem 2 can be generalized to the case of Orlicz norms with , . An important special case corresponds to sub-exponential random variables.
The bound will take the form
[TABLE]
where the generalized Talagrand’s functional is defined by
[TABLE]
The proof is similar to the case . The appropriate version of the generic chaining bound is
[TABLE]
where is a stochastic process with bounded -Orlicz increments. Also,
[TABLE]
Both results can be found in Talagrand, (2006).
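The following sketch illustrates the uniform nature of the bound in a toy setting that is not taken from the paper. The matrix depends on a scalar parameter through a rotation of two independent Gaussian matrices, so every entry is standard normal for each parameter value and the increments in the parameter are sub-Gaussian with respect to the usual distance on the line. The design, the grid, and all names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(4)
n, t = 100, 100
g = rng.standard_normal((n, t))
h = rng.standard_normal((n, t))

def x_of(theta: float) -> np.ndarray:
    # Hypothetical parametrized matrix: each entry is N(0, 1) for every theta.
    return np.cos(theta) * g + np.sin(theta) * h

thetas = np.linspace(0.0, np.pi, 61)  # grid approximating the parameter set
norms = np.array([np.linalg.norm(x_of(th), ord=2) for th in thetas])

scale = np.sqrt(max(n, t))
pointwise_ratio = float(norms[0] / scale)  # single parameter value
sup_ratio = float(norms.max() / scale)     # supremum over the grid
print(round(pointwise_ratio, 2), round(sup_ratio, 2))
```

For this low-complexity parameter set, taking the supremum over the grid should raise the scaled norm only slightly relative to a single parameter value, as a bound with an additive complexity term would suggest.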
3 Applications
3.1 Operator norm minimizing estimator
In this section, we investigate a new estimator that minimizes the operator norm of the moment function matrix. Suppose that are moment functions of such that . For simplicity, assume that . Let , the matrix of moment functions.
The conventional method of moment estimator solves
[TABLE]
where is the -vector of ones.
The new estimator we propose minimizes the operator norm of the moment function matrix ,
[TABLE]
In this section we establish consistency of using our main result of the previous section.
Assumption 5.
(i) the parameter set is a bounded subset of , (ii) the centered moment function satisfies the conditions of Assumptions 2-4, and (iii) for any , there exists such that .
Conditions (i)-(ii) of Assumption 5 ensure that satisfies Assumptions 1-4. The last condition (iii) corresponds to the identification condition of the extremum estimator.
For consistency of , it suffices to show that for any , there exists such that
[TABLE]
with probability approaching one.
First, note that, since , the triangle inequality yields
[TABLE]
On the other hand,
[TABLE]
Combine (13) and (14) to obtain
[TABLE]
Finally, choose as in Assumption 5(iii) to guarantee
[TABLE]
and note that Theorem 2 gives
[TABLE]
Then (15) implies
[TABLE]
which finishes the proof of consistency of .
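A minimal sketch of the operator norm minimizing estimator on simulated data follows. The moment function, with entries equal to the observation minus a scalar parameter, the grid search, and all variable names are hypothetical choices for illustration; the paper does not prescribe this design. The conventional method of moments estimate is taken to be the sample mean, which sets the average of the moment functions to zero in this design.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, theta0 = 100, 100, 1.5
y = theta0 + rng.standard_normal((n, t))  # hypothetical data: y_it = theta0 + noise

def moment_matrix(theta: float) -> np.ndarray:
    # Matrix of moment functions m_it(theta) = y_it - theta; mean zero at theta0.
    return y - theta

grid = np.linspace(0.0, 3.0, 301)  # candidate parameter values
objective = np.array([np.linalg.norm(moment_matrix(th), ord=2) for th in grid])

theta_onm = float(grid[np.argmin(objective)])  # operator norm minimizing estimate
theta_mm = float(y.mean())                     # method of moments estimate
print(theta_onm, theta_mm)
```

Away from the true value, the moment matrix acquires a rank-one mean component whose operator norm grows like the square root of the product of the dimensions, while at the true value the norm is only of the order of the square root of the larger dimension. That separation is what drives the minimizer toward the true value in this toy design.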
Remarks
- (i)
If are iid, then the identification condition Assumption 5 (iii) becomes the usual identification condition, that is, for any , there exists such that . This is because
- (ii)
Suppose that . Instead of the operator norm objective function, we may also consider
[TABLE]
where are weights.
- (iii)
We can also extend the objective function to be the sum of largest singular values, where is a sequence of positive integers such that while :
[TABLE]
where is the largest singular value of matrix . Since , we have
[TABLE]
3.2 Estimator of number of factors with functional data
Consider a generic factor model for functional data
[TABLE]
where belongs to a separable metric space , is the observation matrix of functional outcomes , and , such that for all the probability limits of and exist and are positive definite deterministic matrices such that
[TABLE]
The object of interest is the maximal rank .
To illustrate applicability of this model, suppose that the outcome variable is intraday pollution levels , where is the time within a day, across counties and time , as in Aue et al., (2015). It is plausible to assume that counties with higher population density and dependence on automobiles will have higher average levels of pollution. At the same time, pollution patterns on weekdays and on weekends may differ in a systematic way. Hence it is reasonable to model the intraday pollution curve as the interaction of the county fixed effect and the time effect , plus independent noise, arriving at model (16). A related approach to modeling functional time series can be found in Kargin and Onatski, (2008), whose empirical objective is to predict the contract rate curves of daily Eurodollar futures.
Of course, arguments similar to those outlined above may be applied to modeling of numerous other functional quantities, from mortality as a function of age to crop yields as a function of spatial location. For more examples and an overview of functional data analysis, see e.g. Wang et al., (2016) and Kowal et al., (2019).
Let us now show heuristically how to derive a consistent estimator of the maximal rank .
Note that the model assumptions imply
[TABLE]
If satisfies the conditions of Theorem 2, we have and so
[TABLE]
Denote the -th largest singular value of matrix . The Ky Fan inequality for singular values asserts that for
[TABLE]
Using this inequality, for a fixed we obtain
[TABLE]
Therefore, there exists a positive constant such that
[TABLE]
where the last inequality holds by (17).
On the other hand,
[TABLE]
This establishes consistency of the following natural estimator of ,
[TABLE]
where is a sequence of real numbers satisfying and .
Empirical practice calls for an automatic procedure for choosing the tuning parameter . One may consider one of the following three options, using the penalty term from Bai and Ng, (2002):
[TABLE]
where is a consistent estimator of
[TABLE]
In applications, can be replaced by the residual variance of after partialling out factors using principal component analysis, where is a pre-specified upper bound on the true maximal number of factors .
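The thresholding idea behind the maximal rank estimator can be sketched as follows: for each parameter value, count the singular values of the observation matrix that exceed a threshold, then take the maximum count over parameter values. The exact normalization and penalty used in the paper's estimator (20) are not reproduced here. The threshold below, the simulated design, and all names are assumptions; the threshold is chosen to grow faster than the square root of the larger dimension and slower than the square root of the product of the dimensions.

```python
import numpy as np

def count_large_singular_values(y: np.ndarray, threshold: float) -> int:
    """Number of singular values of y strictly above the threshold."""
    s = np.linalg.svd(y, compute_uv=False)
    return int(np.sum(s > threshold))

def max_rank_estimate(panels, threshold: float) -> int:
    """Maximum over parameter values of the thresholded singular value count."""
    return max(count_large_singular_values(y, threshold) for y in panels)

rng = np.random.default_rng(2)
n, t = 100, 100
true_ranks = [1, 3, 2]  # hypothetical ranks at three parameter values
panels = []
for r in true_ranks:
    loadings = rng.standard_normal((n, r))
    factors = rng.standard_normal((t, r))
    noise = rng.standard_normal((n, t))
    panels.append(loadings @ factors.T + noise)  # low-rank signal plus noise

# Assumed threshold: sqrt(max(N, T)) * min(N, T)^(1/4).
threshold = float(np.sqrt(max(n, t)) * min(n, t) ** 0.25)
rank_hat = max_rank_estimate(panels, threshold)
print(rank_hat)
```

In this design the signal singular values are of the order of the square root of the product of the dimensions and the noise singular values are of the order of the square root of the larger dimension, so a threshold between the two scales separates them.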
4 Monte Carlo illustration
Here we illustrate the performance of the maximal rank estimator in the functional factor model described in the previous section with a simple simulation design.
The data generating process is the functional factor model (16), where, for simplicity, we let the loadings and the factors be independent of . In scalar form, the model is
[TABLE]
where and
[TABLE]
The chosen specification for comes from a generic representation of any Gaussian stochastic process as an infinite trigonometric series, in which we only retain one term. Clearly, the error variance for all and there is nontrivial dependence of across values of . We set . The results do not change substantially when larger values of are used.
We choose the range of parameter to be and the corresponding ranks
[TABLE]
so that the true value of interest is .
The simulated bias and root MSE for the maximal rank estimator (20) are shown in Table 1. Clearly, the choice for the tuning parameter leads to poor small sample performance, which is similar to the results of Bai and Ng, (2002). However, under the other two choices , bias and RMSE are modest even in small samples and become essentially zero when .
Given these simulation results, we are convinced that our generalization (20) of the estimator of Bai and Ng, (2002) will be useful for practitioners who are interested in estimating factor models with functional data.
5 Conclusion
In this paper, we derive a novel uniform stochastic bound on the operator norm of sub-Gaussian random matrices. We use it to establish consistency of a new estimator that minimizes the operator norm of the matrix of moment functions as well as to introduce an estimator of the maximal number of factors in a functional interactive fixed effects model.
Appendix A Proof of Theorem 1
The proof follows that of Theorem 8.5.3 in Vershynin (2018).
Since is separable, we can assume for simplicity that it is finite. Let be an admissible sequence and be the best approximation to in , i.e.
[TABLE]
Now consider a chain of approximations to the point starting from some
[TABLE]
and write
[TABLE]
Sub-Gaussianity of increments implies that, for any ,
[TABLE]
where .
Note that since , , the number of possible pairs is . Applying the union bound to (22) over and pairs , we obtain
[TABLE]
The event on the left-hand side implies
[TABLE]
for a constant . Taking supremum over yields
[TABLE]
Since this event holds with probability at least , is a sub-Gaussian random variable with Orlicz norm bounded by . The conclusion then follows from (1) and the inequality
[TABLE]
Appendix B Proof of Lemma 2
We give the proof for the case of two metric spaces for simplicity. The case of an arbitrary finite number of spaces follows immediately by inspection.
Denote the two spaces by and . Consider admissible sequences and in and , respectively. To each such pair there corresponds a sequence in of the form
[TABLE]
This sequence is admissible since and for .
Fix and write
[TABLE]
The bound on the first two terms on the right-hand side is
[TABLE]
Similarly, we have
[TABLE]
Adding the two inequalities and taking suprema yields
[TABLE]
Taking infima over admissible sequences (which are functions of admissible sequences and ) yields
[TABLE]
Finally, note that is not larger than the left-hand side of the inequality above since the infimum in its definition is taken over all admissible sequences , not only those that have the form .
References
- Aue, A., Norinho, D. D., and Hörmann, S. (2015). On the prediction of stationary functional time series. Journal of the American Statistical Association, 110(509):378–392.
- Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221.
- Bai, Z. and Silverstein, J. W. (2010). Spectral analysis of large dimensional random matrices, volume 20. Springer.
- Bai, Z. D. (2008). Methodologies in spectral analysis of large dimensional random matrices, a review. In Advances In Statistics, pages 174–240. World Scientific.
- Bandeira, A. S. and Van Handel, R. (2016). Sharp nonasymptotic bounds on the norm of random matrices with independent entries. The Annals of Probability, 44(4):2479–2506.
- Dirksen, S. (2015). Tail bounds via generic chaining. Electronic Journal of Probability, 20.
- Edelman, A. and Rao, N. R. (2005). Random matrix theory. Acta Numerica, 14:233–297.
- Fernique, X. (1976). Regularité des trajectoires des fonctions aléatoires gaussiennes, pages 1–96.
