A Uniform Bound on the Operator Norm of Sub-Gaussian Random Matrices and Its Applications

Grigory Franguridi; Hyungsik Roger Moon

arXiv:1905.01096·econ.EM·December 17, 2025


TL;DR

This paper establishes a uniform bound on the operator norm of sub-Gaussian random matrices with dependent entries, useful for statistical estimation and factor analysis in high-dimensional settings.

Contribution

It provides a novel uniform bound involving Talagrand's functional for matrices with dependent sub-Gaussian entries, extending previous results to more complex data structures.

Findings

01 Bound applies to matrices with weakly dependent sub-Gaussian entries.

02 The bound incorporates the complexity of the parameter space via Talagrand's functional.

03 Applications include operator norm minimization in moment condition estimation and functional data factor analysis.

Abstract

For an $N\times T$ random matrix $X(\beta)$ with weakly dependent uniformly sub-Gaussian entries $x_{it}(\beta)$ that may depend on a possibly infinite-dimensional parameter $\beta\in\mathbf{B}$ , we obtain a uniform bound on its operator norm of the form $\mathbb{E}\sup_{\beta\in\mathbf{B}}\|X(\beta)\|\leq CK\left(\sqrt{\max(N,T)}+\gamma_{2}(\mathbf{B},d_{\mathbf{B}})\right)$ , where $C$ is an absolute constant, $K$ controls the tail behavior of (the increments of) $x_{it}(\cdot)$ , and $\gamma_{2}(\mathbf{B},d_{\mathbf{B}})$ is Talagrand's functional, a measure of multi-scale complexity of the metric space $(\mathbf{B},d_{\mathbf{B}})$ . We illustrate how this result may be used for estimation that seeks to minimize the operator norm of moment conditions as well as for estimation of the maximal number of factors with functional data.

Tables

Table 1: Performance of the maximal rank estimator under different thresholds $\psi_{NT}$ .

Threshold		$\psi_{NT,1}$			$\psi_{NT,2}$			$\psi_{NT,3}$		
$N$	Statistic \ $T$	25	50	100	25	50	100	25	50	100
25	Bias	3.5	1.7	0.2	2.0	0.7	0.0	6.5	4.2	1.3
25	RMSE	0.6	0.6	0.4	0.6	0.6	0.1	0.5	0.6	0.6
50	Bias	2.0	0.0	0.0	0.9	0.0	0.0	4.5	4.8	0.2
50	RMSE	0.6	0.1	0.0	0.6	0.0	0.0	0.6	0.6	0.4
100	Bias	0.3	0.0	0.0	0.1	0.0	0.0	1.7	0.3	0.9
100	RMSE	0.5	0.0	0.0	0.2	0.0	0.0	0.6	0.5	0.5

Equations

$$\|Y\|_{\psi}=\inf\left\{K>0\ \text{s.t.}\ \mathbb{E}\,\psi(|Y|/K)\leq 1\right\},$$

$$\mathbb{P}(|Y|\leq t)\geq 1-2e^{-t^{2}/K^{2}}\quad\text{for all }t\geq 0.$$

$$\mathbb{E}|Y|=\int_{0}^{\infty}\left(1-\mathbb{P}(|Y|\leq t)\right)dt\leq 2\int_{0}^{\infty}e^{-t^{2}/K^{2}}dt=K\sqrt{\pi}.$$

$$\|Z_{t}-Z_{s}\|_{\psi_{2}}\leq K\cdot d(t,s)\quad\text{for all }t,s\in T.$$

$$\mathbb{E}\sup_{t\in T}Z_{t}\leq CK\int_{0}^{\infty}\sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon,$$

$$|T_{0}|=1\quad\text{and}\quad|T_{k}|\leq 2^{2^{k}}\ \text{for }k\geq 1.$$

$$d(t,T_{k})=\inf_{t^{\prime}\in T_{k}}d(t,t^{\prime}).$$

$$\gamma_{2}(T,d)=\inf_{(T_{k})}\sup_{t\in T}\sum_{k=0}^{\infty}2^{k/2}d(t,T_{k}),$$

$$e_{k}(T)=\inf_{S\subset T:\,|S|\leq N_{k}}\ \sup_{t\in T}d(t,S).$$

$$\gamma_{2}(T,d)\leq\inf_{(T_{k})}\sum_{k=0}^{\infty}2^{k/2}\sup_{t\in T}d(t,T_{k})=\sum_{k=0}^{\infty}2^{k/2}e_{k}(T),$$

$$e_{k}(T)=\inf\left\{\varepsilon>0:N(T,d,\varepsilon)\leq N_{k}\right\}.$$

$$\sqrt{\log(N_{k}+1)}\left(e_{k}(T)-e_{k+1}(T)\right)\leq\int_{e_{k+1}(T)}^{e_{k}(T)}\sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon.$$

$$\sqrt{\log 2}\sum_{k=0}^{\infty}2^{k/2}\left(e_{k}(T)-e_{k+1}(T)\right)\leq\int_{0}^{e_{0}(T)}\sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon,$$

$$\sum_{k=0}^{\infty}2^{k/2}\left(e_{k}(T)-e_{k+1}(T)\right)=\sum_{k=0}^{\infty}2^{k/2}e_{k}(T)-\sum_{k=1}^{\infty}2^{(k-1)/2}e_{k}(T)\geq\left(1-2^{-1/2}\right)\sum_{k=0}^{\infty}2^{k/2}e_{k}(T).$$

$$\gamma_{2}(T,d)\leq C\int_{0}^{\text{diam}(T)}\sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon.$$

$$\mathbb{E}\sup_{t\in T}Z_{t}\leq CK\gamma_{2}(T,d).$$

$$x_{it}(\beta)=\sum_{\tau=0}^{\infty}\psi_{i\tau}(\beta)\varepsilon_{i,t-\tau}(\beta),$$

$$|\psi_{i\tau}(\beta)|\leq\theta_{\tau},\quad\text{where }\sum_{\tau=0}^{\infty}\theta_{\tau}<\infty.$$

$$\|\varepsilon_{i\tau}(\beta)\|_{\psi_{2}}\leq K_{1}.$$

$$\|\varepsilon_{i\tau}(\beta_{1})-\varepsilon_{i\tau}(\beta_{2})\|_{\psi_{2}}\leq K_{2}\cdot d_{\mathbf{B}}(\beta_{1},\beta_{2}).$$

$$\mathbb{P}\left(|\varepsilon_{it}(\beta_{1})-\varepsilon_{it}(\beta_{2})|\leq t\cdot d_{\mathbf{B}}(\beta_{1},\beta_{2})\right)\geq 1-2e^{-t^{2}/K_{2}^{2}}\quad\text{for all }t\geq 0.$$

$$X(\beta)=(x_{it}(\beta))=\sum_{\tau=0}^{\infty}\text{diag}(\psi_{\tau}(\beta))\,\Xi_{-\tau}(\beta).$$

$$\mathbb{E}\sup_{\beta}\|\Xi_{-\tau}(\beta)\|\leq\varphi(N,T,\mathbf{B}),$$

$$\begin{aligned}\mathbb{E}\sup_{\beta}\|X(\beta)\|&\leq\mathbb{E}\sum_{\tau=0}^{\infty}\sup_{\beta}\|\text{diag}(\psi_{\tau}(\beta))\cdot\Xi_{-\tau}(\beta)\|\leq\mathbb{E}\sum_{\tau=0}^{\infty}\sup_{\beta}\|\text{diag}(\psi_{\tau}(\beta))\|\cdot\|\Xi_{-\tau}(\beta)\|\\&\leq\mathbb{E}\sum_{\tau=0}^{\infty}\sup_{\beta}\|\text{diag}(\psi_{\tau}(\beta))\|\cdot\sup_{\beta}\|\Xi_{-\tau}(\beta)\|=\sum_{\tau=0}^{\infty}\sup_{\beta}\|\text{diag}(\psi_{\tau}(\beta))\|\,\mathbb{E}\sup_{\beta}\|\Xi_{-\tau}(\beta)\|\\&\leq\varphi(N,T,\mathbf{B})\sum_{\tau=0}^{\infty}\sup_{\beta}\max_{i=1,\dots,N}|\psi_{i\tau}(\beta)|\leq D\,\varphi(N,T,\mathbf{B}).\end{aligned}$$

$$\|\Xi(\beta)\|=\sup_{u\in\mathbf{U},\,v\in\mathbf{V}}Z(u,v,\beta),$$

$$Z(u,v,\beta):=u^{\prime}\Xi(\beta)v=\sum_{i=1}^{N}\sum_{t=1}^{T}u_{i}v_{t}\varepsilon_{it}(\beta).$$

$$d\left((\tilde{u},\tilde{v},\tilde{\beta}),(u,v,\beta)\right)=d_{\mathbb{R}^{N}}(\tilde{u},u)+d_{\mathbb{R}^{T}}(\tilde{v},v)+d_{\mathbf{B}}(\tilde{\beta},\beta).$$

$$Z(\tilde{u},\tilde{v},\tilde{\beta})-Z(u,v,\beta)=(\tilde{u}-u)^{\prime}\Xi(\tilde{\beta})\tilde{v}+u^{\prime}\left(\Xi(\tilde{\beta})-\Xi(\beta)\right)\tilde{v}+u^{\prime}\Xi(\beta)(\tilde{v}-v)=z_{I}+z_{II}+z_{III}.$$

$$\left\|\sum_{i=1}^{n}a_{i}\xi_{i}\right\|_{\psi_{2}}\leq c\sqrt{\sum_{i=1}^{n}a_{i}^{2}\|\xi_{i}\|_{\psi_{2}}^{2}}\leq c\,\|a\|\max_{i=1,\dots,n}\|\xi_{i}\|_{\psi_{2}}.$$


Full text

A Uniform Bound on the Operator Norm of Sub-Gaussian Random Matrices and Its Applications

(We appreciate valuable comments and suggestions from Victor Chernozhukov, Guido Kuersteiner (Co-editor), three anonymous referees, and the participants of the conference on econometrics celebrating Peter Phillips’ 40 years at Yale.)

Grigory Franguridi (Department of Economics, University of Southern California)

Hyungsik Roger Moon (Department of Economics, University of Southern California and Yonsei University)

Abstract

For an $N\times T$ random matrix $X(\beta)$ with weakly dependent uniformly sub-Gaussian entries $x_{it}(\beta)$ that may depend on a possibly infinite-dimensional parameter $\beta\in\mathbf{B}$ , we obtain a uniform bound on its operator norm of the form $\mathbb{E}\sup_{\beta\in\mathbf{B}}||X(\beta)||\leq CK\left(\sqrt{\max(N,T)}+\gamma_{2}(\mathbf{B},d_{\mathbf{B}})\right)$ , where $C$ is an absolute constant, $K$ controls the tail behavior of (the increments of) $x_{it}(\cdot)$ , and $\gamma_{2}(\mathbf{B},d_{\mathbf{B}})$ is Talagrand’s functional, a measure of multi-scale complexity of the metric space $(\mathbf{B},d_{\mathbf{B}})$ . We illustrate how this result may be used for estimation that seeks to minimize the operator norm of moment conditions as well as for estimation of the maximal number of factors with functional data.

Keywords: Random Matrix Theory, Operator Norm, Uniform Bound, Operator Norm Minimizing Estimator, Functional Factor Models.

1 Introduction

Since its introduction in nuclear physics (Wigner, 1955) and mathematical statistics (Wishart, 1928), random matrix theory has been developed to understand the properties of the spectra of large dimensional random matrices generated by various distributions. These include the asymptotic theory of the empirical distribution of the eigenvalues of large dimensional random matrices and bounds on the extreme eigenvalues. For detailed results on these topics, readers can refer to recent surveys such as Bai (2008), Edelman and Rao (2005), Bai and Silverstein (2010), and Tao (2012), among others.

In random matrix theory, the study of the asymptotics of the largest eigenvalue of large dimensional random matrices goes back to Geman (1980). Suppose that $X$ is an $N\times T$ matrix consisting of random variables $x_{it}$ . Many researchers have derived the limit of the largest eigenvalue of the sample covariance matrix, $\lambda_{1}(X^{\prime}X)$ (where $X^{\prime}$ denotes the transpose of $X$ ), under various distributional assumptions on the random matrix $X$ . For example, when $x_{it}$ are iid $N(0,1)$ and $\kappa:=\lim\frac{N}{T}$ , Geman (1980) showed that $\frac{1}{T}\lambda_{1}(X^{\prime}X)\rightarrow_{a.s.}(1+\kappa^{1/2})^{2}$ . Johnstone (2001) obtained a stronger result that the properly normalized largest eigenvalue, $\frac{\lambda_{1}(X^{\prime}X)-\mu_{NT}}{\sigma_{NT}}$ with $\mu_{NT}=(\sqrt{N-1}+\sqrt{T})^{2}$ and $\sigma_{NT}=(\sqrt{N-1}+\sqrt{T})(1/\sqrt{N-1}+1/\sqrt{T})^{1/3}$ , converges to the Tracy–Widom law; this was later shown to hold under more general distributional assumptions by Khorunzhiy (2012) and Tao and Vu (2011), among many others.

The aforementioned results imply that $\lambda_{1}(X^{\prime}X)$ is stochastically bounded of order $\max(N,T)$ , or equivalently, the operator norm $\|X\|:=\sqrt{\lambda_{1}(X^{\prime}X)}$ is stochastically bounded of order $\sqrt{\max(N,T)}$ . (A sequence of random variables $\xi_{n}$ is said to be stochastically bounded of order $a_{n}$ , written $\xi_{n}=O_{p}(a_{n})$ , if for any $\varepsilon>0$ there exists $M>0$ such that $\mathbb{P}(|\xi_{n}/a_{n}|\geq M)\leq\varepsilon$ for all large enough values of $n$ .) In fact, such a bound does not require that the underlying distribution be Gaussian and can be derived under much weaker conditions. For example, Latała (2005) showed that the bound holds if $x_{it}$ are independent across $(i,t)$ with mean zero and uniformly bounded fourth moments. Moon and Weidner (2017) extended this result to the cases where $x_{it}$ are weakly correlated across $i$ or $t$ . Other papers that have established similar bounds on $\mathbb{E}\|X\|$ include Bandeira and Van Handel (2016), Guédon et al. (2017) and Latała et al. (2018).

In the case where $X$ consists of independent sub-Gaussian entries, the $\sqrt{\max(N,T)}$ order for the operator norm may be obtained using a powerful way of bounding sub-Gaussian stochastic processes called generic chaining, which was developed in Fernique (1976) and advanced later by M. Talagrand in a series of papers. Indeed, note that $||X||=\max_{u\in\mathbf{U}}\max_{v\in\mathbf{V}}u^{\prime}Xv$ , where maxima are taken over the unit spheres $\mathbf{U}\subset\mathbb{R}^{N}$ and $\mathbf{V}\subset\mathbb{R}^{T}$ , respectively. The process $Z(u,v)=u^{\prime}Xv$ defined on $\mathbf{U}\times\mathbf{V}$ can be shown to be sub-Gaussian and so we can invoke generic chaining to get the bound for its expected maximum in terms of a certain measure of metric complexity of $\mathbf{U}\times\mathbf{V}$ called Talagrand’s functional $\gamma_{2}(\mathbf{U}\times\mathbf{V})$ (see definition in the next section). It turns out that $\gamma_{2}(\mathbf{U}\times\mathbf{V})$ has exact order $\sqrt{\max(N,T)}$ .
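As a rough numerical illustration of the two facts used in this paragraph, the sketch below checks that the bilinear form $u^{\prime}Xv$ evaluated at the leading singular vectors equals the operator norm, and that $\|X\|/\sqrt{\max(N,T)}$ stays of order one for iid standard Gaussian entries. The function names, the seed and the chosen dimensions are illustrative assumptions, not part of the paper.

```python
import numpy as np

def operator_norm(x: np.ndarray) -> float:
    """Operator (spectral) norm: the largest singular value of x."""
    return float(np.linalg.norm(x, ord=2))

def bilinear_form_at_top_vectors(x: np.ndarray) -> float:
    """Evaluate u' x v at the leading left/right singular vectors of x."""
    u, _, vt = np.linalg.svd(x, full_matrices=False)
    return float(u[:, 0] @ x @ vt[0])

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
ratios = {}
for n, t in [(50, 50), (100, 400), (400, 100)]:
    x = rng.standard_normal((n, t))  # iid N(0, 1) entries
    ratios[(n, t)] = operator_norm(x) / np.sqrt(max(n, t))
    # The maximum of u' x v over unit vectors is attained at the top singular pair.
    assert np.isclose(bilinear_form_at_top_vectors(x), operator_norm(x))

for shape, ratio in ratios.items():
    print(shape, round(ratio, 3))
```

For Gaussian entries the norm is close to $\sqrt{N}+\sqrt{T}$ , so the printed ratios should fall between 1 and roughly 2, in line with the $\sqrt{\max(N,T)}$ order.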

In this paper we extend existing nonasymptotic bounds on the operator norm of a high-dimensional random matrix to the case of elements that are allowed to be weakly dependent and to be functions of a possibly infinite-dimensional parameter. Specifically, let $x_{it}(\beta)$ be weakly dependent over $t$ , sub-Gaussian stochastic processes indexed by parameter $\beta$ belonging to a (pseudo-)metric space $(\mathbf{B},d_{\mathbf{B}})$ . Let $X(\beta)$ be the $N\times T$ matrix consisting of $x_{it}(\beta)$ and let $\gamma_{2}(\mathbf{B},d_{\mathbf{B}})$ be Talagrand’s functional of $\mathbf{B}$ w.r.t. $d_{\mathbf{B}}$ . Our main contribution is to show that $\mathbb{E}\sup_{\beta\in\mathbf{B}}\|X(\beta)\|$ is of order $\sqrt{\max(N,T)}+\gamma_{2}(\mathbf{B},d_{\mathbf{B}})$ .

We illustrate the usefulness of this uniform bound with two examples. In one, we propose a new estimator that minimizes the operator norm of a matrix of moment functions and show its consistency. In the other, we consider the generalization of the standard factor model to the case of functional data and suggest a new estimator of the maximal number of factors.

The paper is organized as follows. Section 2 introduces our uniform bound along with the techniques necessary for its derivation. Section 3 contains two applications of our theoretical result. Section 4 provides a Monte Carlo illustration. Finally, Section 5 concludes the paper. The appendix contains two technical proofs of the results in the main text.

Throughout the paper, $C$ will denote a universal positive constant that may not be the same at each occurrence, but may never depend on sample sizes, dimensions or any other features of the modeling framework.

2 Uniform bound on the operator norm

2.1 Generic chaining bound

Our main result is based on the general bound on suprema of sub-Gaussian processes called the generic chaining bound. We discuss this classic technique in this section and provide a proof in the appendix for completeness.

First, we need the following definitions. The $\psi$ -Orlicz norm of a random variable $Y$ is defined as

$$\|Y\|_{\psi}=\inf\left\{K>0\ \text{s.t.}\ \mathbb{E}\,\psi(|Y|/K)\leq 1\right\},$$

where $\psi:\mathbb{R}_{+}\to\mathbb{R}_{+}$ is a convex function satisfying $\lim_{x\to\infty}\psi(x)/x=\infty$ and $\lim_{x\to 0}\psi(x)/x=0$ , and the convention that the infimum of an empty set is $+\infty$ . In this paper, we let $\psi=\psi_{2}$ , where $\psi_{2}(x)=\exp(x^{2})-1$ , and call $||\cdot||_{\psi_{2}}$ just “the Orlicz norm”. A random variable with finite ( $\psi_{2}$ -)Orlicz norm is called sub-Gaussian.

Intuitively, the Orlicz norm quantifies the decay speed for the tails of the distribution of $Y$ . In fact, $||Y||_{\psi_{2}}\leq K$ is equivalent to (see e.g. Vershynin (2018), Proposition 2.5.2)

$$\mathbb{P}(|Y|\leq t)\geq 1-2e^{-t^{2}/K^{2}}\quad\text{for all }t\geq 0.$$

Hence, for example, Gaussian distributions and distributions with bounded support are all sub-Gaussian.

Note also that the last inequality implies

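$$\mathbb{E}|Y|=\int_{0}^{\infty}\left(1-\mathbb{P}(|Y|\leq t)\right)dt\leq 2\int_{0}^{\infty}e^{-t^{2}/K^{2}}dt=K\sqrt{\pi}.$$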
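The tail bound gives $\mathbb{E}|Y|\leq 2\int_{0}^{\infty}e^{-t^{2}/K^{2}}dt=K\sqrt{\pi}$ . The following sketch checks this integral numerically and compares the resulting bound with the exact mean $\sqrt{2/\pi}$ of $|Y|$ for a standard normal $Y$ , which satisfies the tail bound with $K=\sqrt{2}$ . The helper name `tail_integral` and the midpoint rule are illustrative choices, not part of the paper.

```python
import math
import numpy as np

def tail_integral(k: float, steps: int = 200_000) -> float:
    """Midpoint-rule value of 2 * int_0^{10k} exp(-t^2 / k^2) dt.

    The integrand is below exp(-100) beyond t = 10k, so truncation is negligible.
    """
    h = 10.0 * k / steps
    t = (np.arange(steps) + 0.5) * h
    return float(2.0 * h * np.exp(-(t / k) ** 2).sum())

k = math.sqrt(2.0)  # a standard normal Y satisfies P(|Y| > t) <= 2 exp(-t^2 / 2)
bound = tail_integral(k)
mean_abs_normal = math.sqrt(2.0 / math.pi)  # exact E|Y| for Y ~ N(0, 1)
print(bound, k * math.sqrt(math.pi), mean_abs_normal)
```

The bound is loose for the Gaussian case, which is expected: it only uses the tail inequality and not the exact distribution.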

Now let $T$ be a set and $d$ be a (pseudo-)metric on this set such that $(T,d)$ is a (pseudo-)metric space. (Throughout the paper, “metric” can be replaced by the less restrictive notion of “pseudometric”, a distinction we omit from now on.) Consider a zero mean stochastic process $(Z_{t})$ indexed by the elements of $T$ . The process $(Z_{t})$ is said to have sub-Gaussian increments if there exists a constant $K>0$ such that

$$\|Z_{t}-Z_{s}\|_{\psi_{2}}\leq K\cdot d(t,s)\quad\text{for all }t,s\in T.\tag{2}$$

It has long been understood that the behavior of a sub-Gaussian process is intimately connected to the metric complexity of its index set. In particular, the conventional bound on the expected supremum of $(Z_{t})$ (see e.g. Van Der Vaart and Wellner (1996), Corollary 2.2.8) is

$$\mathbb{E}\sup_{t\in T}Z_{t}\leq CK\int_{0}^{\infty}\sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon,\tag{3}$$

where $N(T,d,\varepsilon)$ is the covering number of $(T,d)$ (i.e. the minimal number of $\varepsilon$ -balls that is sufficient to cover $T$ in metric $d$ ) and $C$ is an absolute constant. The integral on the right hand side is sometimes called Dudley’s entropy of $(T,d)$ and quantifies complexity of $(T,d)$ across multiple scales.

It turns out, however, that Dudley’s entropy bound is not optimal, even for Gaussian processes. In fact, the entropy may be infinite when the expected supremum is not, rendering the bound uninformative (for an illustrative example, see Exercise 8.1.12 in Vershynin (2018)).

This led to the development of more precise ways to control suprema of sub-Gaussian processes in Fernique (1976) and Talagrand (2006). The generic chaining bound is stronger than (3) and is sharp for Gaussian processes (see Section 8.6 in Vershynin (2018)). To introduce it, we need another definition.

For a metric space $(T,d)$ , a sequence of finite subsets $T_{0}\subset T_{1}\subset\cdots\subset T$ is admissible if their cardinalities satisfy

$$|T_{0}|=1\quad\text{and}\quad|T_{k}|\leq 2^{2^{k}}\ \text{for }k\geq 1.$$

Let the distance from the point $t\in T$ to the set $T_{k}\subset T$ be

$$d(t,T_{k})=\inf_{t^{\prime}\in T_{k}}d(t,t^{\prime}).$$

Talagrand’s functional $\gamma_{2}$ is then defined by the formula

$$\gamma_{2}(T,d)=\inf_{(T_{k})}\sup_{t\in T}\sum_{k=0}^{\infty}2^{k/2}d(t,T_{k}),$$

where the infimum is taken over all admissible sequences $(T_{k})$ . Note that we can restrict our attention to only those admissible sequences that eventually come arbitrarily close to any point $t\in T$ , which is possible provided $(T,d)$ is separable (a metric space $(T,d)$ is separable if it has a countable subset that is dense in $T$ ).
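To make the definition concrete, the sketch below evaluates $\sup_{t}\sum_{k}2^{k/2}d(t,T_{k})$ for one particular admissible sequence on a finite metric space given by a distance matrix. Because $\gamma_{2}$ is an infimum over all admissible sequences, the returned value is only an upper bound on $\gamma_{2}$ . The greedy farthest-point construction of the sets $T_{k}$ and the function name are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def chaining_upper_bound(dist: np.ndarray) -> float:
    """Value of sup_t sum_k 2^(k/2) d(t, T_k) for a greedy admissible sequence.

    dist is an n x n matrix of pairwise distances with a zero diagonal.
    """
    n = dist.shape[0]
    centers = [0]                    # T_0 consists of a single point
    d_to_set = dist[:, 0].copy()     # d(t, T_0) for every t
    total = d_to_set.copy()          # the k = 0 term has weight 2^0 = 1
    k = 0
    while d_to_set.max() > 0:
        k += 1
        budget = min(n, 2 ** (2 ** k))   # cardinality cap |T_k| <= 2^(2^k)
        while len(centers) < budget and d_to_set.max() > 0:
            far = int(np.argmax(d_to_set))   # farthest point from the current set
            centers.append(far)              # sets are nested: T_{k-1} is kept
            d_to_set = np.minimum(d_to_set, dist[:, far])
        total += 2 ** (k / 2) * d_to_set
    return float(total.max())

# Example: 40 random points in R^3 with the Euclidean metric.
points = np.random.default_rng(0).standard_normal((40, 3))
pairwise = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
print(chaining_upper_bound(pairwise))
```

For a two-point space at distance 1 the routine returns 1, which coincides with $\gamma_{2}$ there, since $T_{0}$ can hold only one point and $T_{1}$ can hold both.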

To understand the relation between Talagrand’s functional and Dudley’s entropy, let us reproduce the discussion from Talagrand (2006), pp. 12–13.

Denote $N_{0}=1$ , $N_{k}=2^{2^{k}}$ for $k\geq 1$ , and

$$e_{k}(T)=\inf_{S\subset T:\,|S|\leq N_{k}}\ \sup_{t\in T}d(t,S).$$

Note that

$$\gamma_{2}(T,d)\leq\inf_{(T_{k})}\sum_{k=0}^{\infty}2^{k/2}\sup_{t\in T}d(t,T_{k})=\sum_{k=0}^{\infty}2^{k/2}e_{k}(T),\tag{6}$$

where the equality holds because minimizing the sum $\sum_{k=0}^{\infty}2^{k/2}\sup_{t\in T}d(t,T_{k})$ w.r.t. all admissible sequences $(T_{k})$ can be performed by separately minimizing each term $\sup_{t\in T}d(t,T_{k})$ w.r.t. subsets $T_{k}\subset T$ satisfying $|T_{k}|\leq N_{k}$ .

The definition of $e_{k}(T)$ involves choosing at most $N_{k}$ points $S$ in $T$ such that the balls with radius $e_{k}(T)$ and centers in $S$ cover $T$ ; moreover, $e_{k}(T)$ is the minimal such radius, i.e.

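$$e_{k}(T)=\inf\left\{\varepsilon>0:N(T,d,\varepsilon)\leq N_{k}\right\}.$$

It follows that if $\varepsilon<e_{k}(T)$ , then $N(T,d,\varepsilon)>N_{k}$ , i.e. $N(T,d,\varepsilon)\geq N_{k}+1$ . Hence we can write

$$\sqrt{\log(N_{k}+1)}\left(e_{k}(T)-e_{k+1}(T)\right)\leq\int_{e_{k+1}(T)}^{e_{k}(T)}\sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon.$$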

Since $\log(N_{k}+1)\geq 2^{k}\log 2$ for $k\geq 0$ , summation over $k\geq 0$ yields

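$$\sqrt{\log 2}\sum_{k=0}^{\infty}2^{k/2}\left(e_{k}(T)-e_{k+1}(T)\right)\leq\int_{0}^{e_{0}(T)}\sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon,\tag{7}$$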

where, of course, $e_{0}(T)\leq\text{diam}(T)=\sup_{t,s\in T}d(t,s)$ .

The term on the left hand side of this inequality satisfies

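$$\sum_{k=0}^{\infty}2^{k/2}\left(e_{k}(T)-e_{k+1}(T)\right)=\sum_{k=0}^{\infty}2^{k/2}e_{k}(T)-\sum_{k=1}^{\infty}2^{(k-1)/2}e_{k}(T)\geq\left(1-2^{-1/2}\right)\sum_{k=0}^{\infty}2^{k/2}e_{k}(T).$$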

Combining this with (6) and (7) yields the key relation

$$\gamma_{2}(T,d)\leq C\int_{0}^{\text{diam}(T)}\sqrt{\log N(T,d,\varepsilon)}\,d\varepsilon.$$

Hence, when used as an upper bound, Talagrand’s functional is sharper than Dudley’s entropy.

We are now ready to state the generic chaining bound for sub-Gaussian processes, see e.g. Theorem 8.5.3 in Vershynin (2018).

Theorem 1 (Generic chaining).

Let $Z_{t}$ , $t\in T$ be a mean zero random process on a separable metric space $(T,d)$ with sub-Gaussian increments as in (2). Then, for some absolute constant $C>0$ ,

$$\mathbb{E}\sup_{t\in T}Z_{t}\leq CK\gamma_{2}(T,d).$$

Proof.

See Appendix A. ∎

2.2 The main result

We impose the following assumptions.

Assumption 1.

The parameter $\beta$ belongs to a separable metric space $(\mathbf{B},d_{\mathbf{B}})$ .

Assumption 2.

For each $\beta\in\mathbf{B}$ , random variables $x_{it}(\beta)$ follow different MA( $\infty$ ) processes for each $i$ , viz.

$$x_{it}(\beta)=\sum_{\tau=0}^{\infty}\psi_{i\tau}(\beta)\varepsilon_{i,t-\tau}(\beta),\tag{8}$$

where $\psi_{i\tau}(\beta)$ are nonrandom coefficients such that, for all $i=1,\dots,N$ and $\beta\in\mathbf{B}$ ,

$$|\psi_{i\tau}(\beta)|\leq\theta_{\tau},\quad\text{where }\sum_{\tau=0}^{\infty}\theta_{\tau}<\infty.$$

Assumption 3.

Innovations $\varepsilon_{i\tau}(\beta)$ are independent, mean zero, sub-Gaussian random variables with uniformly bounded scaling factors, i.e. there exists $K_{1}>0$ s.t. for all $(i,\tau,\beta)\in\mathbb{N}\times\mathbb{Z}\times\mathbf{B}$

$$\|\varepsilon_{i\tau}(\beta)\|_{\psi_{2}}\leq K_{1}.$$

Assumption 4.

Innovations $\varepsilon_{i\tau}(\beta)$ are separable stochastic processes whose increments are sub-Gaussian with uniformly bounded constants, i.e. there exists $K_{2}>0$ s.t. for all $(i,\tau)\in\mathbb{N}\times\mathbb{Z}$ and $(\beta_{1},\beta_{2})\in\mathbf{B}\times\mathbf{B}$

$$\|\varepsilon_{i\tau}(\beta_{1})-\varepsilon_{i\tau}(\beta_{2})\|_{\psi_{2}}\leq K_{2}\cdot d_{\mathbf{B}}(\beta_{1},\beta_{2}).$$

(Let $(\mathbf{B},d_{\mathbf{B}})$ be a separable metric space with a countable dense subset $D$ . A stochastic process $\xi$ on $\mathbf{B}$ is called separable if for all $\beta\in\mathbf{B}$ , there exists a sequence $\beta_{i}\in D$ such that $\beta_{i}\to\beta$ and $\xi(\beta_{i})\to\xi(\beta)$ almost surely. Non-separable stochastic processes have separable copies under very weak conditions, see Shalizi and Kontorovich (2010).)

Assumption 1 is very weak and only imposes separability of the metric space $\mathbf{B}$ , which holds for most parameter spaces encountered in practice such as Euclidean spaces and spaces of integrable functions. Assumption 2 is similar to case (ii) in Lemma S.2.1 of Moon and Weidner (2017) and allows $x_{it}(\beta)$ to be weakly dependent over time. Assumptions 3 and 4 impose uniform sub-Gaussianity on the innovations $\varepsilon_{it}(\beta)$ and their increments $\varepsilon_{it}(\beta_{1})-\varepsilon_{it}(\beta_{2})$ , respectively. Note that Assumption 4 is equivalent to the tail bound

$$\mathbb{P}\left(|\varepsilon_{it}(\beta_{1})-\varepsilon_{it}(\beta_{2})|\leq t\cdot d_{\mathbf{B}}(\beta_{1},\beta_{2})\right)\geq 1-2e^{-t^{2}/K_{2}^{2}}\quad\text{for all }t\geq 0.$$

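Denote $\psi_{\tau}(\beta)=(\psi_{1\tau}(\beta),\dots,\psi_{N\tau}(\beta))^{\prime}$ and let $\Xi_{-\tau}(\beta)$ be the $N\times T$ matrix consisting of $\varepsilon_{it}(\beta)$ , $i=1,\dots,N$ , $t=1-\tau,\dots,T-\tau.$ Equation (8) can be rewritten in the matrix form as

$$X(\beta)=(x_{it}(\beta))=\sum_{\tau=0}^{\infty}\text{diag}(\psi_{\tau}(\beta))\,\Xi_{-\tau}(\beta).$$

Suppose for a moment that we have a bound on $\Xi_{-\tau}(\beta)$ of the form

$$\mathbb{E}\sup_{\beta}\|\Xi_{-\tau}(\beta)\|\leq\varphi(N,T,\mathbf{B}),$$

where $\varphi$ does not depend on $\tau$ . Then

$$\begin{aligned}\mathbb{E}\sup_{\beta}\|X(\beta)\|&\leq\mathbb{E}\sum_{\tau=0}^{\infty}\sup_{\beta}\|\text{diag}(\psi_{\tau}(\beta))\cdot\Xi_{-\tau}(\beta)\|\leq\mathbb{E}\sum_{\tau=0}^{\infty}\sup_{\beta}\|\text{diag}(\psi_{\tau}(\beta))\|\cdot\|\Xi_{-\tau}(\beta)\|\\&\leq\mathbb{E}\sum_{\tau=0}^{\infty}\sup_{\beta}\|\text{diag}(\psi_{\tau}(\beta))\|\cdot\sup_{\beta}\|\Xi_{-\tau}(\beta)\|=\sum_{\tau=0}^{\infty}\sup_{\beta}\|\text{diag}(\psi_{\tau}(\beta))\|\,\mathbb{E}\sup_{\beta}\|\Xi_{-\tau}(\beta)\|\\&\leq\varphi(N,T,\mathbf{B})\sum_{\tau=0}^{\infty}\sup_{\beta}\max_{i=1,\dots,N}|\psi_{i\tau}(\beta)|\leq D\,\varphi(N,T,\mathbf{B}).\end{aligned}\tag{10}$$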

This shows that the bound on $\mathbb{E}\sup_{\beta}||X(\beta)||$ is, up to the absolute constant $D$ , the same as the bound on $\mathbb{E}\sup_{\beta}||\Xi_{-\tau}(\beta)||$ . Hence we can focus on obtaining the latter bound from now on. It will be clear from the proof that the bound will not depend on $\tau$ , so we consider the case $\tau=0$ and denote $\Xi=\Xi_{0}$ for brevity.
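The reduction from $X(\beta)$ to $\Xi_{-\tau}(\beta)$ can be checked numerically for a fixed $\beta$ . The sketch below builds a moving average truncated at a finite lag `q` (a simplifying assumption, since the text allows infinitely many lags), verifies that the matrix form $\sum_{\tau}\text{diag}(\psi_{\tau})\Xi_{-\tau}$ reproduces the scalar definition of $x_{it}$ , and confirms the norm inequality $\|X\|\leq\sum_{\tau}\max_{i}|\psi_{i\tau}|\,\|\Xi_{-\tau}\|$ . All variable names and the coefficient design are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t_len, q = 6, 8, 3  # q: hypothetical truncation lag

# psi[i, tau]: geometrically damped MA coefficients, zero beyond lag q by construction
psi = rng.uniform(-1.0, 1.0, size=(n, q + 1)) * 0.5 ** np.arange(q + 1)
# eps[:, c] holds the innovation dated c - q + 1, so dates run from 1 - q to t_len
eps = rng.standard_normal((n, t_len + q))

def xi(tau: int) -> np.ndarray:
    """N x T matrix of innovations eps_{i, t - tau}, t = 1, ..., T."""
    return eps[:, q - tau : q - tau + t_len]

# Matrix form: X = sum_tau diag(psi_tau) Xi_{-tau}
x_matrix = sum(np.diag(psi[:, tau]) @ xi(tau) for tau in range(q + 1))

# Scalar form: x_it = sum_tau psi_{i tau} eps_{i, t - tau} (t is zero-based here)
x_scalar = np.zeros((n, t_len))
for i in range(n):
    for t in range(t_len):
        x_scalar[i, t] = sum(psi[i, tau] * eps[i, q - tau + t] for tau in range(q + 1))

lhs = np.linalg.norm(x_matrix, ord=2)
rhs = sum(np.abs(psi[:, tau]).max() * np.linalg.norm(xi(tau), ord=2)
          for tau in range(q + 1))
print(np.allclose(x_matrix, x_scalar), lhs <= rhs)
```

The inequality uses only the triangle inequality, submultiplicativity of the operator norm, and the fact that the operator norm of a diagonal matrix is its largest absolute entry.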

The operator norm of $\Xi(\beta)$ can be expressed as

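$$\|\Xi(\beta)\|=\sup_{u\in\mathbf{U},\,v\in\mathbf{V}}Z(u,v,\beta),$$

where $\mathbf{U}$ and $\mathbf{V}$ are unit spheres in $\mathbb{R}^{N}$ and $\mathbb{R}^{T}$ , respectively, and the process

$$Z(u,v,\beta):=u^{\prime}\Xi(\beta)v=\sum_{i=1}^{N}\sum_{t=1}^{T}u_{i}v_{t}\varepsilon_{it}(\beta).$$

Define the $L_{1}$ product metric on $\mathbf{U}\times\mathbf{V}\times\mathbf{B}$ by

$$d\left((\tilde{u},\tilde{v},\tilde{\beta}),(u,v,\beta)\right)=d_{\mathbb{R}^{N}}(\tilde{u},u)+d_{\mathbb{R}^{T}}(\tilde{v},v)+d_{\mathbf{B}}(\tilde{\beta},\beta),$$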

where $d_{\mathbb{R}^{d}}$ denotes the standard Euclidean metric on $\mathbb{R}^{d}$ .

To obtain a uniform bound on $||\Xi(\beta)||$ , we would like to apply Theorem 1 to the process $Z(\cdot)$ defined on the metric space $(\mathbf{U}\times\mathbf{V}\times\mathbf{B},d)$ . Our first lemma asserts that $Z$ has sub-Gaussian increments.

Lemma 1.

Under Assumptions 1, 3, 4, the process $Z$ has sub-Gaussian increments w.r.t. the metric $d$ , with the constant $K=\max(K_{1},K_{2}).$

Proof.

For $(\tilde{u},\tilde{v},\tilde{\beta}),(u,v,\beta)\in\mathbf{U}\times\mathbf{V}\times\mathbf{B}$ , write

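$$Z(\tilde{u},\tilde{v},\tilde{\beta})-Z(u,v,\beta)=(\tilde{u}-u)^{\prime}\Xi(\tilde{\beta})\tilde{v}+u^{\prime}\left(\Xi(\tilde{\beta})-\Xi(\beta)\right)\tilde{v}+u^{\prime}\Xi(\beta)(\tilde{v}-v)=z_{I}+z_{II}+z_{III}.$$

Recall a standard result for the $\psi_{2}$ norm (see e.g. equation (2.1) in Mendelson and Tomczak-Jaegermann (2008)): there exists an absolute constant $c>0$ such that for all constants $a_{i}$ and independent centered variables $\xi_{1},\dots,\xi_{n}$ one has

$$\left\|\sum_{i=1}^{n}a_{i}\xi_{i}\right\|_{\psi_{2}}\leq c\sqrt{\sum_{i=1}^{n}a_{i}^{2}\|\xi_{i}\|_{\psi_{2}}^{2}}\leq c\,\|a\|\max_{i=1,\dots,n}\|\xi_{i}\|_{\psi_{2}}.$$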

Applying this inequality, we obtain

[TABLE]

This implies

[TABLE]

which completes the proof. ∎

Our second lemma establishes the bound on Talagrand’s functional of a product space in terms of Talagrand’s functionals of component spaces.

Lemma 2 (Talagrand’s functional of a product space).

Consider a finite number of metric spaces $(T_{l},d_{l}),$ $l=1,\dots,L$ and the product space $T=\bigotimes_{l=1}^{L}T_{l}$ with the $L^{1}$ product metric defined by

[TABLE]

Talagrand’s functional $\gamma_{2}$ of $T$ satisfies

[TABLE]

Proof.

See Appendix B. ∎

Finally, by Lemma 1, we can apply the generic chaining bound of Theorem 1 to $Z(u,v,\beta)$ defined on the separable metric space $T=\mathbf{U}\times\mathbf{V}\times\mathbf{B}$ with the $L_{1}$ metric $d$ . Lemma 2 then yields

[TABLE]

For the unit sphere $S^{d-1}$ in $\mathbb{R}^{d}$ , its Dudley’s entropy satisfies

[TABLE]

Besides, Talagrand’s functional is bounded from above by Dudley’s entropy (e.g. Exercise 8.5.7 in Vershynin (2018)), up to absolute constant factors.

Applying these bounds to unit spheres $\mathbf{U}\subset\mathbb{R}^{N}$ and $\mathbf{V}\subset\mathbb{R}^{T}$ gives

[TABLE]

Finally, taking into account the inequality (10), we obtain the main theoretical result of this paper.

Theorem 2.

Under Assumptions 1, 2, 3, 4,

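$$\mathbb{E}\sup_{\beta\in\mathbf{B}}\|X(\beta)\|\leq CK\left(\sqrt{\max(N,T)}+\gamma_{2}(\mathbf{B},d_{\mathbf{B}})\right),$$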

where $K=\max(K_{1},K_{2}).$

Remarks

(i)

Generic chaining yields not only the bound on the expected value, but also tail bounds and bounds on moments of $\sup_{\beta\in\mathbf{B}}||X(\beta)||$ , see e.g. Dirksen (2015). In particular, it follows from Theorem 8.5.5 of Vershynin (2018) that, for all $u\geq 0$ , the event

[TABLE]

holds with probability at least $1-2\exp(-u^{2})$ , where $\text{diam}(\mathbf{B})$ is the diameter of $\mathbf{B}$ in $d_{\mathbf{B}}$ .

(ii)

Suppose $\varepsilon_{it}(\beta)$ are Gaussian random variables. Then the process $Z(u,v,\beta)$ is Gaussian and therefore the bound (11) is sharp, up to an absolute constant, by the majorizing measure theorem, see Theorem 8.6.1 in Vershynin (2018).

(iii)

If $\mathbf{B}$ is a bounded set in $\mathbb{R}^{d}$ , the main result and majorization of Talagrand’s functional with Dudley’s entropy yield

[TABLE]

In particular, if $\mathbf{B}$ consists of one element (so that there is no dependence on $\beta$ ), the bound reduces to

[TABLE]

which is a classical result in random matrix theory, see e.g. Latała (2005).

(iv)

The dimension of $\mathbf{B}$ is allowed to grow with the sample size; of course, to maintain the $\sqrt{\max(N,T)}$ rate for the operator norm, the dimension should not grow faster than $\sqrt{\max(N,T)}$ .

(v)

Theorem 2 can be generalized to the case of Orlicz norms $||\cdot||_{\psi_{\alpha}}$ with $\psi_{\alpha}(x)=\exp(x^{\alpha})-1$ , $\alpha\geq 1$ . An important special case $\alpha=1$ corresponds to sub-exponential random variables.

The bound will take the form

[TABLE]

where the generalized Talagrand’s functional is defined by

[TABLE]

The proof is similar to the case $\alpha=2$ . The appropriate version of the generic chaining bound is

[TABLE]

where $Z(\cdot)$ is a stochastic process with bounded $\psi_{\alpha}$ -Orlicz increments. Also,

[TABLE]

Both results can be found in Talagrand (2006).

3 Applications

3.1 Operator norm minimizing estimator

In this section, we investigate a new estimator that minimizes the operator norm of the moment function matrix. Suppose that $\varepsilon_{it}(\beta)\in\mathbb{R}^{L}$ are $L$ moment functions of $\beta\in\mathbf{B}\subset\mathbb{R}^{K}$ such that $\mathbb{E}(\varepsilon_{it}(\beta_{0}))=0$ . For simplicity, assume that $L=K=1$ . Let $\varepsilon(\beta)=[\varepsilon_{it}(\beta)]$ , the $N\times T$ matrix of moment functions.

The conventional method of moments estimator solves

[TABLE]

where $\mathbf{1}_{N}$ is the $N$ -vector of ones.

The new estimator we propose minimizes the operator norm of the moment function matrix $\varepsilon(\beta)$ ,

$$\widehat{\beta}=\operatorname*{argmin}_{\beta\in\mathbf{B}}\|\varepsilon(\beta)\|.$$
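A minimal sketch of how such an estimator could be computed by grid search is given below. The moment function $\varepsilon_{it}(\beta)=y_{it}-\beta$ with $y_{it}=\beta_{0}+e_{it}$ , the grid, and all names are hypothetical choices made for illustration; the paper does not prescribe a particular model or optimization routine. The sample mean is shown for comparison, assuming the conventional estimator sets the average moment to zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n, t_len, beta_0 = 100, 100, 1.0
noise = rng.standard_normal((n, t_len))
y = beta_0 + noise  # hypothetical data: y_it = beta_0 + e_it

def moment_matrix(beta: float) -> np.ndarray:
    """Hypothetical moment functions eps_it(beta) = y_it - beta (mean zero at beta_0)."""
    return y - beta

def objective(beta: float) -> float:
    """Operator norm of the N x T moment matrix at beta."""
    return float(np.linalg.norm(moment_matrix(beta), ord=2))

grid = np.linspace(-1.0, 3.0, 201)  # bounded parameter set, step 0.02
values = np.array([objective(b) for b in grid])
beta_hat = float(grid[np.argmin(values)])  # operator norm minimizer on the grid
beta_mm = float(y.mean())                  # average-moment solution, for comparison

print(beta_hat, beta_mm)
```

In this design $\varepsilon(\beta)=(\beta_{0}-\beta)\mathbf{1}_{N}\mathbf{1}_{T}^{\prime}+e$ , so the triangle inequality confines the minimizer to within about $2\|e\|/\sqrt{NT}$ of $\beta_{0}$ , a radius that shrinks at the rate $1/\sqrt{\min(N,T)}$ .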

In this section we establish consistency of $\widehat{\beta}$ using our main result of the previous section.

Assumption 5.

(i) the parameter set $\mathbf{B}$ is a bounded subset of $\mathbb{R}$ , (ii) the centered moment function $\varepsilon_{it}(\beta)-\mathbb{E}(\varepsilon_{it}(\beta))$ satisfies the conditions of Assumptions 2-4, and (iii) for any $\epsilon>0$ , there exists $\delta>0$ such that $\inf_{|\beta-\beta_{0}|\geq\epsilon}\frac{\|\mathbb{E}(\varepsilon(\beta))\|}{\sqrt{NT}}\geq 2\delta$ .

Conditions (i)-(ii) of Assumption 5 ensure that $\varepsilon_{it}(\beta)-\mathbb{E}(\varepsilon_{it}(\beta))$ satisfies Assumptions 1-4. The last condition (iii) corresponds to the identification condition of the extremum estimator.

For consistency of $\widehat{\beta}$ , it suffices to show that for any $\epsilon>0$ , there exists $\delta>0$ such that

[TABLE]

with probability approaching one.

First, note that, since $\mathbb{E}(\varepsilon(\beta_{0}))=0$ , the triangle inequality yields

[TABLE]

On the other hand,

[TABLE]

Combine (13) and (14) to obtain

[TABLE]

Finally, choose $\delta$ as in Assumption 5(iii) to guarantee

[TABLE]

and note that Theorem 2 gives

[TABLE]

Then (15) implies

[TABLE]

which finishes the proof of consistency of $\hat{\beta}$ .

Remarks

(i)

If $\varepsilon_{it}(\beta)$ are iid, then the identification condition Assumption 5 (iii) becomes the usual identification condition, that is, for any $\epsilon>0$ , there exists $\delta>0$ such that $\inf_{|\beta-\beta_{0}|\geq\epsilon}|\mathbb{E}(\varepsilon_{it}(\beta))|>2\delta$ . This is because $\frac{\|\mathbb{E}(\varepsilon(\beta))\|}{\sqrt{NT}}=|\mathbb{E}(\varepsilon_{it}(\beta))|\left\|\frac{\mathbf{1}_{N}}{\sqrt{N}}\frac{\mathbf{1}_{T}^{\prime}}{\sqrt{T}}\right\|=|\mathbb{E}(\varepsilon_{it}(\beta))|.$

(ii)

Suppose that $\varepsilon_{it}(\beta)=(\varepsilon_{1,it}(\beta),...,\varepsilon_{L,it}(\beta))^{\prime}\in\mathbb{R}^{L}$ . Instead of the operator norm objective function, we may also consider

[TABLE]

where $\omega_{l}$ are weights.

(iii)

We can also extend the objective function to be the sum of $R_{NT}$ largest singular values, where $R_{NT}$ is a sequence of positive integers such that $R_{NT}\rightarrow\infty$ while $\frac{R_{NT}}{\sqrt{\min(N,T)}}\rightarrow 0$ :

[TABLE]

where $s_{r}(A)$ is the $r^{th}$ largest singular value of matrix $A$ . Since $\|\varepsilon(\beta)-\mathbb{E}(\varepsilon(\beta))\|=s_{1}(\varepsilon(\beta)-\mathbb{E}(\varepsilon(\beta)))$ , we have

[TABLE]

3.2 Estimator of number of factors with functional data

Consider a generic factor model for functional data

[TABLE]

where $\beta$ belongs to a separable metric space $(\mathbf{B},d_{\mathbf{B}})$ , $Y(\beta)\in\mathbb{R}^{N\times T}$ is the observation matrix of functional outcomes $\beta\mapsto y_{it}(\beta)$ , and $\lambda(\beta)\in\mathbb{R}^{N\times R(\beta)}$ , $f(\beta)\in\mathbb{R}^{T\times R(\beta)}$ such that for all $\beta\in\mathbf{B}$ the probability limits of $\lambda(\beta)^{\prime}\lambda(\beta)/N$ and $f(\beta)^{\prime}f(\beta)/T$ exist and are positive definite deterministic matrices such that

[TABLE]

The object of interest is the maximal rank $R=\max_{\beta\in\mathbf{B}}R(\beta)$ .

To illustrate the applicability of this model, suppose that the outcome variable is intraday pollution levels $y_{it}(\beta)$ , where $\beta$ is the time within a day, across counties $i$ and time $t$ , as in Aue et al. (2015). It is plausible to assume that counties with higher population density and dependence on automobiles will have higher average levels of pollution. At the same time, pollution patterns on weekdays and on weekends may differ in a systematic way. Hence it is reasonable to model the intraday pollution curve $y_{it}(\cdot)$ as the interaction of the county fixed effect $\lambda_{i}(\cdot)$ and the time effect $f_{t}(\cdot)$ , plus independent noise, arriving at model (16). A related approach to modeling functional time series can be found in Kargin and Onatski (2008), whose empirical objective is to predict the contract rate curves of daily Eurodollar futures.

Of course, arguments similar to those outlined above may be applied to modeling of numerous other functional quantities, from mortality as a function of age to crop yields as a function of spatial location. For more examples and an overview of functional data analysis, see e.g. Wang et al., (2016) and Kowal et al., (2019).

Let us now show heuristically how to derive a consistent estimator of the maximal rank $R$ .

Note that the model assumptions imply

[TABLE]

If $U(\beta)$ satisfies the conditions of Theorem 2, we have $\sup_{\beta}||U(\beta)||=O_{p}\left(\sqrt{\max(N,T)}+\gamma_{2}(\mathbf{B},d_{\mathbf{B}})\right)$ and so

[TABLE]

Denote by $s_{i}(A)$ the $i$-th largest singular value of a matrix $A$. The Ky Fan inequality for singular values asserts that for $A,B\in\mathbb{R}^{N\times T}$

[TABLE]
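For readers who want to see the inequality at work, here is a small numerical sketch. It assumes the common 1-based form $s_{i+j-1}(A+B)\leq s_{i}(A)+s_{j}(B)$ for $i+j-1\leq\min(N,T)$; the indexing convention used in the display above may differ. The helper `weyl_slack` and the simulated matrices are hypothetical.

```python
import numpy as np

def weyl_slack(A, B):
    """Smallest value of s_i(A) + s_j(B) - s_{i+j-1}(A+B) over valid 1-based (i, j)."""
    sA = np.linalg.svd(A, compute_uv=False)
    sB = np.linalg.svd(B, compute_uv=False)
    sAB = np.linalg.svd(A + B, compute_uv=False)
    m = sAB.size
    # 0-based (i, j) corresponds to 1-based (i+1, j+1), so the index
    # i+j+1-1 in 1-based terms is position i+j in the 0-based array.
    return min(sA[i] + sB[j] - sAB[i + j]
               for i in range(m) for j in range(m - i))

rng = np.random.default_rng(0)
N, T, R = 30, 20, 3
signal = rng.standard_normal((N, R)) @ rng.standard_normal((R, T))  # rank R
noise = rng.standard_normal((N, T))

slack = weyl_slack(signal, noise)          # nonnegative up to rounding error

# Taking i = R + 1 and j = 1: s_{R+1}(signal) = 0, so the (R+1)-th singular
# value of signal + noise is bounded by the operator norm of the noise.
s_Y = np.linalg.svd(signal + noise, compute_uv=False)
noise_norm = np.linalg.norm(noise, 2)
tail_bounded = s_Y[R] <= noise_norm + 1e-9
```

The last check mirrors how the inequality is used in the argument: singular values beyond the rank of the factor part are controlled by the operator norm of the error matrix.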

Using this inequality, for a fixed $\beta$ we obtain

[TABLE]

Therefore, there exists a constant $C>0$ such that

[TABLE]

where the last inequality holds by (17).

On the other hand,

[TABLE]

This establishes consistency of the following natural estimator of $R$ ,

[TABLE]

where $\psi_{NT}$ is a sequence of real numbers satisfying $\psi_{NT}\to 0$ and $\psi_{NT}\sqrt{\min(N,T)}\to\infty$ .
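The heuristic above suggests a simple thresholding rule, sketched below under explicit assumptions: for each $\beta$ on a finite grid, count the singular values of $Y(\beta)/\sqrt{NT}$ that are at least $\psi_{NT}$, then take the maximum count over $\beta$. This is an assumed reading of the estimator, not a transcription of the paper's display. The function `max_rank_estimate`, the rank profile, and the threshold $\psi_{NT}=\min(N,T)^{-1/4}$ are hypothetical choices; the threshold does satisfy both rate conditions.

```python
import numpy as np

def max_rank_estimate(Y_by_beta, psi):
    """Max over beta of the number of singular values of Y(beta)/sqrt(NT) >= psi."""
    counts = []
    for Y in Y_by_beta:
        N, T = Y.shape
        s = np.linalg.svd(Y, compute_uv=False) / np.sqrt(N * T)
        counts.append(int(np.sum(s >= psi)))
    return max(counts)

rng = np.random.default_rng(1)
N = T = 200
ranks = [1, 2, 4, 3]          # hypothetical R(beta) on a four-point grid

Y_by_beta = []
for r in ranks:
    lam = rng.standard_normal((N, r))     # loadings
    f = rng.standard_normal((T, r))       # factors
    U = rng.standard_normal((N, T))       # idiosyncratic errors
    Y_by_beta.append(lam @ f.T + U)

psi = min(N, T) ** (-0.25)    # psi -> 0 and psi * sqrt(min(N, T)) -> infinity
R_hat = max_rank_estimate(Y_by_beta, psi)
```

In this design the normalized singular values of the factor part stay bounded away from zero, while those of the error matrix are of order $1/\sqrt{\min(N,T)}$, so the threshold separates the two groups.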

Empirical practice calls for an automatic procedure for choosing the tuning parameter $\psi_{NT}$. One may consider one of the following three options, using the penalty term from Bai and Ng (2002):

[TABLE]

where $\hat{\sigma}^{2}=\sup_{\beta}\hat{\sigma}^{2}(\beta)$ is a consistent estimator of

[TABLE]

In applications, $\hat{\sigma}^{2}(\beta)$ can be replaced by the residual variance of $Y(\beta)$ after partialling out $k_{\max}$ factors using principal component analysis, where $k_{\max}$ is a pre-specified upper bound on the true maximal number of factors $R$.
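A minimal sketch of this residual variance computation follows. It assumes uncentered principal components, so that removing $k_{\max}$ components amounts to subtracting the best rank-$k_{\max}$ approximation of $Y(\beta)$; whether the data are demeaned first is not specified here. The function names `residual_variance` and `sigma2_hat` are hypothetical.

```python
import numpy as np

def residual_variance(Y, k_max):
    """Mean squared residual of Y after removing its first k_max principal components."""
    N, T = Y.shape
    s = np.linalg.svd(Y, compute_uv=False)
    # Eckart-Young: the squared Frobenius norm of the residual from the best
    # rank-k_max approximation is the sum of squared trailing singular values.
    return float(np.sum(s[k_max:] ** 2) / (N * T))

def sigma2_hat(Y_by_beta, k_max):
    """Supremum over the beta grid of the residual variances."""
    return max(residual_variance(Y, k_max) for Y in Y_by_beta)

rng = np.random.default_rng(3)
N, T = 50, 60
low_rank = rng.standard_normal((N, 2)) @ rng.standard_normal((2, T))
noise = rng.standard_normal((N, T))

exact_fit = residual_variance(low_rank, 2)     # rank-2 matrix, two components removed
no_removal = residual_variance(noise, 0)       # nothing removed: plain mean square
sup_over_grid = sigma2_hat([low_rank + noise, noise], k_max=4)
```

Removing more components than there are true factors also removes some noise, so the residual variance tends to understate the error variance in small samples when $k_{\max}$ is large relative to $\min(N,T)$.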

4 Monte Carlo illustration

Here we illustrate the performance of the maximal rank estimator in the functional factor model described in the previous section with a simple simulation design.

The data generating process is the functional factor model (16), where, for simplicity, we let the loadings $\lambda(\beta)$ and the factors $f(\beta)$ be independent of $\beta$. In scalar form, the model is

[TABLE]

where $\lambda_{ir},f_{tr}\sim\text{iid}\,N(0,1)$ and

[TABLE]

The chosen specification for $u_{it}(\cdot)$ comes from a generic representation of any Gaussian stochastic process as an infinite trigonometric series, in which we only retain one term. Clearly, the error variance $\mathbf{V}(u_{it}(\beta))=\sigma$ for all $\beta$ and there is nontrivial dependence of $u_{it}(\beta)$ across values of $\beta$ . We set $\sigma=1$ . The results do not change substantially when larger values of $\sigma$ are used.
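The two properties just mentioned can be checked numerically. The sketch below assumes a hypothetical one-term trigonometric form $u_{it}(\beta)=\sqrt{\sigma}\,(\xi_{it}\cos\beta+\eta_{it}\sin\beta)$ with $\xi_{it},\eta_{it}$ iid $N(0,1)$; the exact specification used in the simulations is given in the display above and may differ in frequency or scaling. Under this assumed form the variance equals $\sigma$ for every $\beta$ and the correlation between $u_{it}(\beta_{1})$ and $u_{it}(\beta_{2})$ is $\cos(\beta_{1}-\beta_{2})$.

```python
import numpy as np

def simulate_errors(N, T, betas, sigma=1.0, rng=None):
    """Assumed error process: sqrt(sigma) * (xi * cos(beta) + eta * sin(beta)).

    Returns an array of shape (len(betas), N, T); xi and eta are shared
    across beta, which creates dependence across values of beta.
    """
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.standard_normal((N, T))
    eta = rng.standard_normal((N, T))
    b = np.asarray(betas, dtype=float)[:, None, None]
    return np.sqrt(sigma) * (xi * np.cos(b) + eta * np.sin(b))

betas = np.linspace(0.0, 1.0, 11)          # the grid {0, 0.1, ..., 1}
U = simulate_errors(200, 200, betas, sigma=1.0, rng=np.random.default_rng(2))

# Empirical variance at each beta (should be close to sigma = 1).
variances = U.reshape(len(betas), -1).var(axis=1)

# Empirical correlation between the errors at beta = 0 and beta = 1.
corr_0_1 = np.corrcoef(U[0].ravel(), U[-1].ravel())[0, 1]
theoretical_corr = np.cos(1.0)
```

Sharing the same pair of Gaussian draws across all $\beta$ is what produces the cross-$\beta$ dependence; drawing fresh noise for each $\beta$ would make the errors independent across the grid.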

We choose the range of parameter $\beta$ to be $\mathbf{B}=\{0,0.1,\dots,0.9,1\}$ and the corresponding ranks

[TABLE]

so that the true value of interest is $R=\max_{\beta}R(\beta)=4$ .

The simulated bias and root MSE for the maximal rank estimator (20) are shown in Table 1. Clearly, the choice $\psi_{NT}=\psi_{NT,3}$ for the tuning parameter leads to poor small-sample performance, which is similar to the results of Bai and Ng (2002). However, under the other two choices $\psi_{NT,1}$ and $\psi_{NT,2}$, bias and RMSE are modest even in small samples and become essentially zero when $N,T\geq 50$.

Given these simulation results, we are convinced that our generalization (20) of the estimator of Bai and Ng (2002) will be useful for practitioners who are interested in estimating factor models with functional data.

5 Conclusion

In this paper, we derive a novel uniform stochastic bound on the operator norm of sub-Gaussian random matrices. We use it to establish consistency of a new estimator that minimizes the operator norm of the matrix of moment functions as well as to introduce an estimator of the maximal number of factors in a functional interactive fixed effects model.

Appendix A Proof of Theorem 1

The following proof can be found in Vershynin (2018, Theorem 8.5.3).

Since $T$ is separable, we can assume for simplicity that it is finite. Let $(T_{k})$ be an admissible sequence and $\pi_{k}(t)$ be the best approximation to $t$ in $T_{k}$ , i.e.

[TABLE]

Now consider a chain of approximations to the point $t$ starting from some $t_{0}$

[TABLE]

and write

[TABLE]

Sub-Gaussianity of increments $Z_{\pi_{k}(t)}-Z_{\pi_{k-1}(t)}$ implies that, for any $u>0$ ,

[TABLE]

where $C\geq\sqrt{8K}$ .

Note that since $\pi_{k}(t)\in T_{k}$ and $\pi_{k-1}(t)\in T_{k-1}$, the number of possible pairs $(\pi_{k}(t),\pi_{k-1}(t))$ is at most $|T_{k}|\cdot|T_{k-1}|\leq|T_{k}|^{2}\leq 2^{2^{k+1}}$. Applying the union bound to (22) over $k\in\mathbb{N}$ and pairs $(\pi_{k}(t),\pi_{k-1}(t))$, we obtain

[TABLE]

The event on the left-hand side implies

[TABLE]

for a constant $\tilde{C}>0$ . Taking supremum over $t\in T$ yields

[TABLE]

Since this event holds with probability at least $1-2e^{-u^{2}}$ , $\sup_{t\in T}|Z_{t}-Z_{t_{0}}|$ is a sub-Gaussian random variable with Orlicz norm bounded by $\tilde{C}\gamma_{2}(T,d)$ . The conclusion then follows from (1) and the inequality

[TABLE]

Appendix B Proof of Lemma 2

We give the proof for the case of $L=2$ metric spaces for simplicity. The case of arbitrary $L$ follows immediately by inspection.

Denote the two spaces by $(X,d_{X})$ and $(Y,d_{Y})$ . Consider admissible sequences $(X_{k})$ and $(Y_{k})$ in $X$ and $Y$ , respectively. To each such pair there corresponds a sequence $(\tilde{T}_{k})$ in $T=X\times Y$ of the form

[TABLE]

This sequence is admissible since $|\tilde{T}_{0}|=|\tilde{T}_{1}|=1$ and $|\tilde{T}_{k}|=|X_{k-1}||Y_{k-1}|\leq 2^{2^{k-1}}2^{2^{k-1}}=2^{2^{k}}$ for $k\geq 2$ .
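As a quick worked instance of the cardinality bound just stated, the first two nontrivial indices and the general step read:

$$
\begin{aligned}
k=2:&\quad |\tilde{T}_{2}|\leq 2^{2^{1}}\cdot 2^{2^{1}}=4\cdot 4=16=2^{2^{2}},\\
k=3:&\quad |\tilde{T}_{3}|\leq 2^{2^{2}}\cdot 2^{2^{2}}=16\cdot 16=256=2^{2^{3}},\\
\text{general }k\geq 2:&\quad 2^{2^{k-1}}\cdot 2^{2^{k-1}}=2^{2\cdot 2^{k-1}}=2^{2^{k}}.
\end{aligned}
$$

The shift by one index in the product sequence is exactly what absorbs the squaring of the cardinality.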

Fix $(x,y)\in T$ and write

[TABLE]

The bound on the first two terms on the right-hand side is

[TABLE]

Similarly, we have

[TABLE]

Adding the two inequalities and taking suprema yields

[TABLE]

Taking infima over admissible sequences $(\tilde{T}_{k})$ (which are functions of admissible sequences $(X_{k})$ and $(Y_{k})$ ) yields

[TABLE]

Finally, note that $\gamma_{2}(T,d)$ is not larger than the left-hand side of the inequality above since the infimum in its definition is taken over all admissible sequences $(T_{k})$ , not only those that have the form $(\tilde{T}_{k})$ . $\blacksquare$

Bibliography

Aue, A., Norinho, D. D., and Hörmann, S. (2015). On the prediction of stationary functional time series. Journal of the American Statistical Association, 110(509):378–392.
Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221.
Bai, Z. and Silverstein, J. W. (2010). Spectral analysis of large dimensional random matrices, volume 20. Springer.
Bai, Z. D. (2008). Methodologies in spectral analysis of large dimensional random matrices, a review. In Advances In Statistics, pages 174–240. World Scientific.
Bandeira, A. S. and Van Handel, R. (2016). Sharp nonasymptotic bounds on the norm of random matrices with independent entries. The Annals of Probability, 44(4):2479–2506.
Dirksen, S. (2015). Tail bounds via generic chaining. Electronic Journal of Probability, 20.
Edelman, A. and Rao, N. R. (2005). Random matrix theory. Acta Numerica, 14:233–297.
Fernique, X. (1976). Regularité des trajectoires des fonctions aléatoires gaussiennes. pages 1–96.
