Operator norm upper bound for sub-Gaussian tailed random matrices

Eric Benhamou; Jamal Atif; Rida Laraki

arXiv:1812.09618·math.PR·January 23, 2019


TL;DR

This paper establishes bounds on the operator norm of sub-Gaussian tailed random matrices, extending results to matrices with row-wise independence and non-uniform variances.

Contribution

It proves that matrices with row-wise independent sub-Gaussian rows satisfy Tracy-Widom bounds, generalizing previous results that required independent coefficients.

Findings

01. Operator norm bounded by O(√n) with high probability

02. Row-wise independence suffices for Tracy-Widom bounds

03. Results extend to non-uniform variance matrices


Equations

$$\|\mathbf{X}\|_{a,b}=\sup\{\|\mathbf{X}u\|_{a}\mid\|u\|_{b}\leq 1\}.$$

$$\|\mathbf{X}\|_{op}=\sup\{\|\mathbf{X}u\|\mid\|u\|\leq 1\}.$$

$$\|\mathbf{X}\|_{2}=\sigma_{\text{max}}(\mathbf{X})=\left(\lambda_{\text{max}}(\mathbf{X}^{T}\mathbf{X})\right)^{1/2}.$$

$$\mathbb{P}(|\xi|>t)\leq B\exp(-bt^{2}).$$

$$\mathbb{P}(|\xi-\mathbb{E}[\xi]|>t)\leq 2\exp\left(-\frac{t^{2}}{2\sigma^{2}}\right).$$

$$\mathbb{P}(E)\geq 1-C_{k}/n^{k}$$

$$\mathbb{P}(E^{c})\leq C_{k}e^{-kn}$$

$$\mathbb{P}(\|M\|_{op}>A\sqrt{n})\leq C\exp(-cAn)$$

$$\mathbb{P}(\|M\|_{a,b}>A\sqrt{n})\leq C\exp(-cAn)$$

$$\mathbb{P}(\|R_{i}\|>t)\leq B\exp(-bt^{2})$$

$$\mathbb{P}(|\xi_{ij}|>t)\leq B\exp(-bt^{2})$$

$$\mathbb{P}(\|Mu\|>A\sqrt{n})\leq C\exp(-cAn)$$

$$\mathbb{P}(|\xi_{i,j}|\geq t)\leq B\exp(-bt^{2}).$$

$$\mathbb{P}(\|R_{i}\|\geq t)\leq\mathbb{P}\left(\sqrt{n}\max_{j}|\xi_{i,j}|\geq t\right)\leq B\exp\left(-\frac{b}{n}t^{2}\right).$$

$$\mathbb{P}(\|R_{i}\|\geq t)\leq B\exp(-b^{\prime}t^{2}).$$

$$\mathbb{P}(|R_{i}u|\geq t)\leq\mathbb{P}(\|R_{i}\|\geq t)\leq B\exp(-b^{\prime}t^{2}).$$

$$\mathbb{E}[e^{b|R_{i}u|^{2}}]\leq B.$$

$$\mathbb{E}[e^{b\|Mu\|^{2}}]=\mathbb{E}\left[\prod_{i=1}^{n}e^{b|R_{i}u|^{2}}\right]=\prod_{i=1}^{n}\mathbb{E}[e^{b|R_{i}u|^{2}}]\leq B^{n}.$$

$$\mathbb{P}(\|Mu\|\geq A\sqrt{n})=\mathbb{P}(e^{b\|Mu\|^{2}}\geq e^{bA^{2}n})\leq\frac{\mathbb{E}[e^{b\|Mu\|^{2}}]}{e^{bA^{2}n}}\leq Ce^{-bA^{2}n}\leq Ce^{-bCAn}$$

$$\mathbb{P}(\|Mu\|>A\sqrt{n})\leq C\exp(-cAn)$$

$$\mathbb{P}(\|M\|_{op}>\lambda)\leq\mathbb{P}\left(\bigcup_{u\in\mathcal{S}}\{\|Mu\|>\lambda\}\right)$$

$$\mathbb{P}(\|M\|_{op}>\lambda)\leq\mathbb{P}\left(\bigcup_{v\in\Sigma(\varepsilon)}\{\|Mv\|>\lambda(1-\varepsilon)\}\right)$$

$$\|Mx\|=\|M\|_{op}$$

$$\|M(x-y)\|\leq\|M\|_{op}\|x-y\|\leq\|M\|_{op}\,\varepsilon$$

$$\|M\|_{op}=\|Mx\|\leq\|M(x-y)\|+\|My\|\leq\|M\|_{op}\,\varepsilon+\|My\|$$

$$\|My\|\geq\|M\|_{op}(1-\varepsilon)$$

$$\frac{(1+\varepsilon/2)^{n}-(1-\varepsilon/2)^{n}}{(\varepsilon/2)^{n}}$$

$$\frac{(1+\varepsilon/2)^{n}-(1-\varepsilon/2)^{n}}{(\varepsilon/2)^{n}}$$

$$\mathbb{P}(\|M\|_{op}>A\sqrt{n})\leq\mathbb{P}\left(\bigcup_{v\in\Sigma(\varepsilon)}\{\|Mv\|>A\sqrt{n}(1-\varepsilon)\}\right)\leq\sum_{v\in\Sigma(\varepsilon)}\mathbb{P}(\|Mv\|>A\sqrt{n}(1-\varepsilon))$$

$$\mathbb{P}(\|Mv\|>A\sqrt{n}(1-\varepsilon))\leq C\exp(-cA(1-\varepsilon)n)$$


Full text

A short note on the operator norm upper bound for sub-Gaussian tailed random matrices

Eric Benhamou, A.I. Square Connect and Lamsade, Paris Dauphine. Email: [email protected], [email protected]

Jamal Atif. Email: [email protected]

Rida Laraki. Email: [email protected]

Abstract

This paper investigates an upper bound on the operator norm of sub-Gaussian tailed random matrices. Much attention has been paid to uniformly bounded sub-Gaussian tailed random matrices with independent coefficients. However, little has been done for sub-Gaussian tailed random matrices whose coefficients have unequal variances, or for matrices whose coefficients are not independent. This is precisely the subject of this paper. After proving that random matrices with independent, uniformly sub-Gaussian tailed coefficients satisfy the Tracy-Widom bound, that is, their operator norm remains bounded by $O(\sqrt{n})$ with overwhelming probability, we prove that a less stringent sufficient condition is that the matrix rows are independent and uniformly sub-Gaussian. In particular, this does not require all matrix coefficients to be independent, but only the rows, which is a weaker condition.

1 Introduction

Random matrices and their spectra have been under intensive study in many fields. This is the case in statistics since the work of Wishart (1928) on sample covariance matrices, in numerical analysis since their introduction by von Neumann and Goldstine (1947) in the 1940s, in physics as a consequence of the work of Wigner (1955, 1958) since the 1950s, and in Banach space theory and differential geometric analysis with the work of Grothendieck (1956) in a similar period. More recently, in machine learning, the Netflix Prize (see Wikipedia (2018a)) has attracted a lot of attention, with a large part of the community investigating recommender systems (see Wikipedia (2018b)) and collaborative filtering methods, which ultimately also rely on random matrices and their eigenvalue and singular value spectra.

In particular, an interesting and important problem in matrix completion has been to investigate where the operator norm is concentrated, in order to make reasonable assumptions about missing entries. Other important contributions have been the Tracy-Widom law, which says that for a Wigner matrix the operator norm is concentrated in the range $\left[2\sqrt{n}-O(n^{-1/6}),\,2\sqrt{n}+O(n^{-1/6})\right]$ (see Tracy and Widom (1994)), and the Marchenko–Pastur distribution, which describes the asymptotic behavior of the singular values of large rectangular random matrices (see Marčenko and Pastur (1967)).

However, most of these results have been derived under the assumption of independent and identically distributed coefficients. It is natural to ask similar questions about general random matrices whose entries may have different distributions. To make the question more concrete, we are interested in finding an upper bound on the operator norm of a random matrix whose coefficients are sub-Gaussian, and in seeing the implied consequences for the matrix coefficients. The paper is organized as follows. In section 2, we recall various definitions. In section 3, we first prove that the operator norm of square random matrices with independent and uniformly sub-Gaussian tailed coefficients satisfies the Tracy-Widom bound, that is, the operator norm for the $L_{a},L_{b}$ norms remains bounded by $O(\sqrt{n})$. We then see that a less stringent sufficient condition is that the $L_{a}$ norms of the matrix rows are uniformly sub-Gaussian and the rows are independent. This implies in particular that a matrix whose coefficients are sub-Gaussian but not necessarily independent can still satisfy an upper bound of $O(\sqrt{n})$ on its operator norm with overwhelming probability.

The condition of independence of rows has already been mentioned in Vershynin (2018), with a similar setting and proof, and appeared as early as 2017. Additionally, Benaych-Georges and Knowles (2016) provided a similar proof in the Hermitian case, and kindly pointed out to the authors these last two references, which the authors were not aware of at the time of writing. This article has at least the merit of being self-contained and of focusing only on sub-Gaussian random matrices, which makes the presentation shorter and self-consistent. For more details, we advise the reader to refer to the last two references, which cover a much wider scope and are respectively 300 and 80 pages long.

2 Some definitions

Suppose $\|\cdot\|_{a}$ and $\|\cdot\|_{b}$ are norms on $\mathbb{R}^{m}$ and $\mathbb{R}^{n}$ , respectively. We can of course generalize easily the concept to norms operating on $\mathbb{C}^{m}$ and $\mathbb{C}^{n}$ if we look at matrices with complex number coefficients.

Definition 2.1.

We define the operator norm of $\mathbf{X}\in\mathbb{R}^{m\times n}$, induced by the norms $\|\cdot\|_{a}$ and $\|\cdot\|_{b}$, as

$$\|\mathbf{X}\|_{a,b}=\sup\{\|\mathbf{X}u\|_{a}\mid\|u\|_{b}\leq 1\}.$$

We will denote this norm by $\|\cdot\|_{op}$, dropping the $a,b$ indices whenever there is no risk of confusion, which gives the following definition:

$$\|\mathbf{X}\|_{op}=\sup\{\|\mathbf{X}u\|\mid\|u\|\leq 1\}.$$

When $\|\cdot\|_{a}$ and $\|\cdot\|_{b}$ are both Euclidean norms, the operator norm of $\mathbf{X}$ is its maximum singular value, and is denoted $\|\cdot\|_{2}$:

$$\|\mathbf{X}\|_{2}=\sigma_{\text{max}}(\mathbf{X})=\left(\lambda_{\text{max}}(\mathbf{X}^{T}\mathbf{X})\right)^{1/2},$$

where $\sigma_{\text{max}}(\mathbf{X})$ is the maximum singular value of the matrix $\mathbf{X}$ and $\lambda_{\text{max}}(\mathbf{X}^{T}\mathbf{X})$ is the maximum eigenvalue of the matrix $\mathbf{X}^{T}\mathbf{X}$, also defined as $\sup\{u^{T}\mathbf{X}^{T}\mathbf{X}u\mid\|u\|_{2}=1\}$.
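The identity between the Euclidean operator norm, the largest singular value and the square root of the largest eigenvalue of $\mathbf{X}^{T}\mathbf{X}$ can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrix size and the random seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))  # arbitrary rectangular test matrix

# Largest singular value of X (singular values are returned in descending order).
sigma_max = np.linalg.svd(X, compute_uv=False)[0]
# Largest eigenvalue of the symmetric matrix X^T X (eigvalsh sorts ascending).
lambda_max = np.linalg.eigvalsh(X.T @ X)[-1]
# NumPy's built-in spectral norm, for comparison.
spectral = np.linalg.norm(X, 2)

print(sigma_max, np.sqrt(lambda_max), spectral)
```

The three printed values should agree up to floating-point rounding.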

In the rest of the paper, to simplify notation, we will assume that $a=b=2$, but all results remain the same for any $a,b\geq 1$.

*Remark 1.*

The trivial matrix consisting entirely of ones has an operator norm of exactly $n$. This can be seen easily by taking the vector $u=(1/\sqrt{n},\ldots,1/\sqrt{n})^{T}$, which gives $\|\mathbf{X}u\|_{2}=n$ and proves that the operator norm is at least $n$. The Cauchy-Schwarz inequality proves that it cannot be more than $n$. This vector is the right one to choose for the $L_{2}$ norm. Since all norms are equivalent in finite dimension (and the matrix space has finite dimension $n^{2}$), an analogous bound holds, up to the norm-equivalence constants, for any other norm.

Furthermore, the same application of the Cauchy-Schwarz inequality proves that any matrix whose coefficients are uniformly bounded by a constant $K$ has an operator norm bounded by $Kn$. In other words, using the Landau notation, any matrix whose entries are all uniformly $O(1)$ has an operator norm of $O(n)$. However, this upper bound does not take into account any possible cancellations in the matrix $M$. Indeed, intuitively, using the concentration inequalities of Hoeffding and Markov, we should expect with overwhelming probability (a notion that we will define shortly) that the operator norm is of order $\sqrt{n}$ rather than $n$ in most cases where the matrix coefficients are symmetrically distributed and have tails that decrease fast enough, a concept that we will also make more precise shortly with the notion of sub-Gaussian tails.
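The gap between the worst case $n$ and the typical order $\sqrt{n}$ can be illustrated numerically. The sketch below, assuming NumPy, compares the all-ones matrix with a matrix of independent random signs; the size `n = 200` and the seed are illustrative choices, and a single random draw is of course not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200  # illustrative size

ones = np.ones((n, n))
# Independent, symmetrically distributed entries bounded by 1.
signs = rng.choice([-1.0, 1.0], size=(n, n))

norm_ones = np.linalg.norm(ones, 2)    # no cancellation: norm equals n
norm_signs = np.linalg.norm(signs, 2)  # cancellations: norm of order sqrt(n)

print(norm_ones / n, norm_signs / np.sqrt(n))
```

Both matrices have entries bounded by $K=1$, so both satisfy the crude bound $Kn$, but only the all-ones matrix comes close to it.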

For Euclidean norms, the operator norm boils down to the maximum singular value and, for symmetric matrices, to the largest eigenvalue in absolute value, so it gives fruitful information about these two quantities.

Definition 2.2.

A random variable $\xi$ is called sub-Gaussian if there are positive constants $B,b>0$ such that for every $t>0$,

$$\mathbb{P}(|\xi|>t)\leq B\exp(-bt^{2}),$$

where $\mathbb{P}$ is the probability measure defined on a usual probability space $(\Omega,\mathcal{B},\mathbb{P})$, in which $\Omega$ is the ambient sample space, associated with a $\sigma$-algebra $\mathcal{B}$ of subsets of $\Omega$.

*Remark 2.*

Sub-Gaussianity can be defined in multiple ways. We have used the traditional definition, which states that the tails of the variable $\xi$ are dominated by the tails of a Gaussian, meaning they decay at least as fast. A more probabilistic way of defining it is to say that a random variable $\xi$ is sub-Gaussian with variance proxy $\sigma$ if

$$\mathbb{P}(|\xi-\mathbb{E}[\xi]|>t)\leq 2\exp\left(-\frac{t^{2}}{2\sigma^{2}}\right).$$

The Chernoff bound allows one to translate a bound on the moment generating function into a tail bound and vice versa. So we should expect to have equivalent definitions in terms of the moment generating function, the Laplace transform and many more criteria. Indeed, there are many equivalent definitions (which can be found for instance in Buldygin and Kozachenko (1980) or Ledoux and Talagrand (1991)):

• A random variable $\xi$ is sub-Gaussian.

• A random variable $\xi$ satisfies the $\psi_{2}$-condition, that is, there exist two positive real constants $B,b>0$ such that $\mathbb{E}[e^{b\xi^{2}}]\leq B$.

• A random variable $\xi$ satisfies the Laplace transform condition, that is, there exist two positive real constants $B,b>0$ such that $\forall\lambda\in\mathbb{R}$, $\mathbb{E}[e^{\lambda(\xi-\mathbb{E}[\xi])}]\leq Be^{\lambda^{2}b/2}$. This condition is also referred to as the moment generating condition, that is, there exist two positive real constants $B,b>0$ such that $\mathbb{E}[e^{t\xi}]\leq Be^{t^{2}b^{2}/2}$. The parameter $b$ is directly related to the variance proxy $\sigma$.

• A random variable $\xi$ satisfies the moment condition, that is, there exists a positive real constant $K>0$ such that $\forall p\geq 1$, $\left(\mathbb{E}[|\xi|^{p}]\right)^{1/p}\leq K\sqrt{p}$. It is easy to see, for instance with Gaussian variables, that $K$ can be expressed with respect to the variance proxy $\sigma$ as follows: $K=\sigma e^{1/e}$ for $p\geq 2$ and $K=\sigma\sqrt{2\pi}$ for $p=1$.

• A random variable $\xi$ satisfies the union bound condition, that is, there exists a positive real constant $c>0$ such that $\forall n\geq c$, $\mathbb{E}[\max\{|\xi_{1}-\mathbb{E}[\xi]|,\ldots,|\xi_{n}-\mathbb{E}[\xi]|\}]\leq c\sqrt{\log n}$, where $\xi_{1},\ldots,\xi_{n}$ are independent and identically distributed copies of $\xi$.

• The tail is smaller than that of a Gaussian with variance proxy $\sigma$: there exist $b>0$ and $Z\sim\mathcal{N}(0,\sigma^{2})$ such that $\mathbb{P}(|\xi|>t)\leq b\,\mathbb{P}(|Z|\geq t)$. The latter definition explains the term sub-Gaussian quite well.

Obviously, the positive real constants $B,b>0$ are not necessarily the same from one condition to the next.
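Two of the conditions above can be checked on the simplest example, a centered Gaussian variable. The sketch below is only an illustration under stated assumptions: it takes $Z\sim\mathcal{N}(0,\sigma^{2})$ with $\sigma=1$, uses the exact Gaussian tail $\mathbb{P}(|Z|>t)=\operatorname{erfc}(t/(\sigma\sqrt{2}))$, and uses the closed form $\mathbb{E}[e^{bZ^{2}}]=(1-2b\sigma^{2})^{-1/2}$, valid for $2b\sigma^{2}<1$. The value `b = 0.1`, the grid of `ts` and the sample size are arbitrary.

```python
import math
import numpy as np

sigma = 1.0

# Tail condition: exact Gaussian tail versus the bound 2 * exp(-t^2 / (2 sigma^2)).
ts = [0.5, 1.0, 2.0, 3.0, 4.0]
tails = [math.erfc(t / (sigma * math.sqrt(2.0))) for t in ts]
bounds = [2.0 * math.exp(-t * t / (2.0 * sigma**2)) for t in ts]

# psi_2 condition: Monte Carlo estimate of E[exp(b Z^2)] versus its closed form.
b = 0.1
rng = np.random.default_rng(2)
z = rng.normal(0.0, sigma, size=1_000_000)
psi2_mc = float(np.mean(np.exp(b * z**2)))
psi2_exact = (1.0 - 2.0 * b * sigma**2) ** -0.5

for t, tail, bound in zip(ts, tails, bounds):
    print(t, tail, bound)
print(psi2_mc, psi2_exact)
```

Here the $\psi_{2}$ constant $B$ can be taken as any number above `psi2_exact`, which shows how the constants change from one condition to another.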

Definition 2.3.

Following Tao, we say that an event $E$ holds with overwhelming probability if, for every fixed real constant $k>0$, we have

$$\mathbb{P}(E)\geq 1-C_{k}/n^{k}$$

for some constant $C_{k}$ independent of $n$, or equivalently $\mathbb{P}(E^{c})\leq C_{k}e^{-k\ln n}$, where $E^{c}$ denotes the complement of $E$.

*Remark 3.*

Of course, the concept of overwhelming probability can be extended to a family of events $E_{\alpha}$ depending on some parameter $\alpha$ with the condition that each event in the family holds with overwhelming probability uniformly in $\alpha$ if the constant $C_{k}$ in the definition of overwhelming probability is independent of $\alpha$ .

*Remark 4.*

Using Boole’s inequality (also referred to as the union bound in the English mathematical literature), which states that a probability measure is $\sigma$-subadditive, we trivially see that if a family of events $E_{\alpha}$ of polynomial cardinality holds with overwhelming probability, then the intersection $\bigcap\limits_{\alpha}E_{\alpha}$ of this family over $\alpha$ still holds with overwhelming probability.
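The union bound argument of this remark can be written out in one line. The following worked step is a sketch under two assumptions that are not spelled out in the text: the family has at most $n^{m}$ members for some fixed $m$ (the symbol $m$ is introduced here for illustration), and the constants $C_{k}$ are uniform in $\alpha$ in the sense of Remark 3.

```latex
% Assumptions for illustration: at most n^m events in the family (m fixed),
% and constants C_k that do not depend on alpha.
\mathbb{P}\Big(\Big(\bigcap_{\alpha}E_{\alpha}\Big)^{c}\Big)
  = \mathbb{P}\Big(\bigcup_{\alpha}E_{\alpha}^{c}\Big)
  \leq \sum_{\alpha}\mathbb{P}(E_{\alpha}^{c})
  \leq n^{m}\,\frac{C_{k+m}}{n^{k+m}}
  = \frac{C_{k+m}}{n^{k}} .
```

Since $k$ is arbitrary, the intersection satisfies the definition of overwhelming probability with the constant $C_{k+m}$ in place of $C_{k}$.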

*Remark 5.*

The previous remark on Boole’s inequality emphasizes that although the concept of overwhelming probability is not the same as that of an almost sure event, it still describes something that holds with very high probability. In the rest of the paper, we will even get a tighter bound and prove

$$\mathbb{P}(E^{c})\leq C_{k}e^{-kn}$$

which implies that the event $E$ holds with overwhelming probability.
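The implication can be made explicit with the elementary inequality $\ln n\leq n$ for $n\geq 1$; the short derivation below only uses that fact and the definition of overwhelming probability.

```latex
% For n >= 1 we have \ln n <= n, hence e^{-kn} <= e^{-k \ln n} = n^{-k}.
\mathbb{P}(E^{c}) \leq C_{k}e^{-kn} \leq C_{k}e^{-k\ln n} = \frac{C_{k}}{n^{k}},
\qquad\text{so}\qquad
\mathbb{P}(E) \geq 1-\frac{C_{k}}{n^{k}} .
```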

3 Upper bound for operator norm for sub-Gaussian tailed matrices

Equipped with these definitions, we shall prove the following statement.

Proposition 3.1.

Let $M$ be a square matrix with independent zero-mean coefficients $\xi_{i,j}$ that are uniformly sub-Gaussian. Then there exist positive real constants $C,c>0$ such that

$$\mathbb{P}(\|M\|_{op}>A\sqrt{n})\leq C\exp(-cAn)\tag{8}$$

for all $A\geq C$. In particular, we have $\|M\|_{op}=O(\sqrt{n})$ with overwhelming probability.

Proof.

See A.1. ∎
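As a sanity check of the $O(\sqrt{n})$ scaling, one can look at the ratio $\|M\|_{op}/\sqrt{n}$ for growing $n$. The sketch below assumes NumPy and uses standard Gaussian entries as one convenient example of independent, zero-mean, uniformly sub-Gaussian coefficients; the sizes and the seed are arbitrary, and one draw per size only illustrates the statement.

```python
import numpy as np

rng = np.random.default_rng(3)
ratios = {}
for n in (50, 100, 200, 400):
    # Independent zero-mean Gaussian entries: one sub-Gaussian choice among many.
    M = rng.standard_normal((n, n))
    ratios[n] = np.linalg.norm(M, 2) / np.sqrt(n)

for n, r in ratios.items():
    print(n, round(r, 3))
```

The ratio should stay of order one as $n$ grows, whereas $\|M\|_{op}/n$ would shrink towards zero.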

*Remark 6.*

This result is quite natural, as the matrix coefficients $\xi_{i,j}$ are uniformly sub-Gaussian. Indeed, the proof uses the fact that the $L_{\infty}$ norm of the coefficients $\xi_{i,j}$ is sub-Gaussian, hence that the $L_{a}$ norm of any matrix row is sub-Gaussian. But can we go further and find a less stringent sufficient condition for inequality (8) to hold? The answer is yes, and it is provided by the condition stated in proposition 3.2.

Proposition 3.2.

Let $M$ be a square matrix whose rows are independent and uniformly sub-Gaussian for the norm $L_{a}$. Then there exist positive real constants $C,c>0$ such that

$$\mathbb{P}(\|M\|_{a,b}>A\sqrt{n})\leq C\exp(-cAn)$$

for all $A\geq C$. In particular, we have $\|M\|_{a,b}=O(\sqrt{n})$ with overwhelming probability.

Proof.

See A.2. ∎
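To see what the row-wise condition allows, here is a hypothetical construction (not taken from the paper) in which rows are independent but the entries inside a row are dependent: row $i$ is $g_{i}s_{i}$, with $g_{i}\sim\mathcal{N}(0,1)$ and $s_{i}$ a uniformly random unit vector, so that $\|R_{i}\|=|g_{i}|$ has a Gaussian tail. The sketch assumes NumPy; `n = 300` and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300

# Hypothetical example: row i = g_i * s_i, so the entries of a row share the
# scalar g_i (dependence inside the row) while different rows are independent.
directions = rng.standard_normal((n, n))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
g = rng.standard_normal(n)
M = g[:, None] * directions

row_norms = np.linalg.norm(M, axis=1)   # equals |g_i| by construction
op_norm = np.linalg.norm(M, 2)
frob = np.linalg.norm(M, "fro")         # equals ||g||, an upper bound on op_norm

print(op_norm / np.sqrt(n), frob / np.sqrt(n))
```

In this example the operator norm is bounded by the Frobenius norm $\|g\|$, which concentrates around $\sqrt{n}$, in line with the $O(\sqrt{n})$ bound.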

*Remark 7.*

If a random matrix has uniformly sub-Gaussian rows, then necessarily all of its coefficients are also uniformly sub-Gaussian. This is easily seen: for a given coefficient $\xi_{ij}$, the corresponding row $R_{i}$ is sub-Gaussian, hence there are positive constants $B,b$ that do not depend on $i$ such that for every $t>0$,

$$\mathbb{P}(\|R_{i}\|>t)\leq B\exp(-bt^{2})$$

Hence, since $\|R_{i}\|\geq|\xi_{ij}|$, we have as well

$$\mathbb{P}(|\xi_{ij}|>t)\leq B\exp(-bt^{2})$$

which proves the uniform sub-Gaussian character of every matrix coefficient. The independence of the matrix rows, however, does not imply that the matrix coefficients are independent, making the condition of proposition 3.2 less stringent than that of proposition 3.1.

4 Conclusion

This paper investigated an upper bound of the operator norm for sub-Gaussian tailed random matrices. We proved here that random matrices with independent rows that are uniformly sub-Gaussian satisfy the Tracy Widom bound, that is, the matrix operator norm remains bounded by $O(\sqrt{n})$ . An interesting extension would be to see how we can generalize our result to the $(\ell_{p},\ell_{r})$ -Grothendieck problem, which seeks to maximize the bilinear form $y^{T}Ax$ for an input matrix $A\in{\mathbb{R}}^{m\times n}$ over vectors $x,y$ with $\|x\|_{p}=\|y\|_{r}=1$ . We know this problem is equivalent to computing the $p\to r^{\ast}$ operator norm of $A$ , where $\ell_{r^{*}}$ is the dual norm to $\ell_{r}$ .

Appendix A Proofs

A.1 Proof of proposition 3.1

We will carry out the proof using the three simple lemmas below, which take advantage of the uniform sub-Gaussian tail bounds and of the remarkable Lipschitz property of the map $x\rightarrow\|Mx\|$, combined with the compactness of the unit sphere.

Let us define the unit sphere $\mathcal{S}:=\{u\in\mathbb{R}^{n}\mid\|u\|=1\}$ of the vector space $\mathbb{R}^{n}$. The result is similar for matrices with complex coefficients, in which case the unit sphere becomes $\mathcal{S}:=\{u\in\mathbb{C}^{n}\mid\|u\|=1\}$ in $\mathbb{C}^{n}$. We will first prove the following lemma.

Lemma A.1.

If the coefficients $\xi_{i,j}$ of $M$ are independent and have uniformly sub-Gaussian tails, then there exist absolute constants $C,c>0$ such that for any $u\in\mathcal{S}$, we have

$$\mathbb{P}(\|Mu\|>A\sqrt{n})\leq C\exp(-cAn)$$

for all $A\geq C$.

Proof.

Let $R_{1},\ldots,R_{n}$ be the $n$ rows of the matrix $\mathbf{M}$ , then the column vector $\mathbf{M}u$ has coefficients $R_{i}u$ for $i=1,\ldots,n$ .

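The matrix coefficients $\xi_{i,j}$ are all uniformly sub-Gaussian, hence there are positive constants $B,b>0$ independent of $i,j$ such that for every $t>0$,

$$\mathbb{P}(|\xi_{i,j}|\geq t)\leq B\exp(-bt^{2}).$$

This implies in particular that $\|R_{i}\|$ also has sub-Gaussian tails, but with different constants. This is because we have

$$\mathbb{P}(\|R_{i}\|\geq t)\leq\mathbb{P}\left(\sqrt{n}\max_{j}|\xi_{i,j}|\geq t\right)\leq B\exp\left(-\frac{b}{n}t^{2}\right).$$

Hence, taking $b^{\prime}=\frac{b}{n}$, we have

$$\mathbb{P}(\|R_{i}\|\geq t)\leq B\exp(-b^{\prime}t^{2}).$$

The Cauchy-Schwarz inequality gives us that for $u\in\mathcal{S}$, we have $|R_{i}u|\leq\|R_{i}\|\|u\|=\|R_{i}\|$ as $\|u\|=1$, hence

$$\mathbb{P}(|R_{i}u|\geq t)\leq\mathbb{P}(\|R_{i}\|\geq t)\leq B\exp(-b^{\prime}t^{2}),$$

which states that $R_{i}u$ is uniformly sub-Gaussian or, equivalently, that it satisfies the $\psi_{2}$ condition: there exist two positive constants $b,B>0$ (different from the previous ones) such that

$$\mathbb{E}[e^{b|R_{i}u|^{2}}]\leq B.$$

Because the matrix coefficients are assumed independent, the random variables $R_{i}u$ are also independent, and since $\|Mu\|^{2}=\sum_{i=1}^{n}|R_{i}u|^{2}$, the vector $Mu$ also satisfies the $\psi_{2}$ condition:

$$\mathbb{E}[e^{b\|Mu\|^{2}}]=\mathbb{E}\left[\prod_{i=1}^{n}e^{b|R_{i}u|^{2}}\right]=\prod_{i=1}^{n}\mathbb{E}[e^{b|R_{i}u|^{2}}]\leq B^{n}.$$

Let us take $C=B^{n}$, $A\geq C$ and $n\geq 1$. Markov's inequality gives us

$$\mathbb{P}(\|Mu\|\geq A\sqrt{n})=\mathbb{P}(e^{b\|Mu\|^{2}}\geq e^{bA^{2}n})\leq\frac{\mathbb{E}[e^{b\|Mu\|^{2}}]}{e^{bA^{2}n}}\leq Ce^{-bA^{2}n}\leq Ce^{-bCAn}$$

Taking $c=b\,C$, we get the required inequality:

$$\mathbb{P}(\|Mu\|>A\sqrt{n})\leq C\exp(-cAn)$$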

which concludes the proof. ∎

*Remark 8.*

Expressed in terms of probability, lemma A.1 proves that for any individual unit vector $u$, the norm $\|Mu\|$ of the product of $M$ with $u$ grows at most like $\sqrt{n}$, or equivalently $\|Mu\|=O(\sqrt{n})$ with overwhelming probability.
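The statement for a single fixed direction can be illustrated by simulation. The sketch below assumes NumPy and Gaussian entries, for which $Mu$ is a standard Gaussian vector and $\|Mu\|^{2}$ follows a chi-square law with $n$ degrees of freedom; the values of `n`, `trials` and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials = 100, 2000

u = rng.standard_normal(n)
u /= np.linalg.norm(u)  # one fixed unit vector, reused for every draw of M

# For Gaussian entries, M u ~ N(0, I_n), so ||M u|| / sqrt(n) concentrates near 1.
norms = np.empty(trials)
for k in range(trials):
    M = rng.standard_normal((n, n))
    norms[k] = np.linalg.norm(M @ u)

ratio = norms / np.sqrt(n)
print(ratio.mean(), ratio.max())
```

Even the largest of the simulated ratios stays close to one, which is the behavior the exponential tail bound of lemma A.1 describes.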

*Remark 9.*

At this stage, we could imagine that, equipped with lemma A.1, we could finalize the proof of proposition 3.1. The slight difference between lemma A.1 and proposition 3.1 is the set to which they apply. Lemma A.1 states that for any individual unit vector $u$, the norm $\|Mu\|$ grows at most like $\sqrt{n}$. Proposition 3.1 states that the supremum of $\|Mu\|$ over all unit vectors $u$ of the unit sphere grows at most like $\sqrt{n}$. We could imagine going from lemma A.1 to proposition 3.1 using the simple union bound over all points of the unit sphere for the operator norm as follows:

$$\mathbb{P}(\|M\|_{op}>\lambda)\leq\mathbb{P}\left(\bigcup_{u\in\mathcal{S}}\{\|Mu\|>\lambda\}\right)$$

However, we would be stuck, as the unit sphere $\mathcal{S}$ is an uncountable set.

To solve this issue, we shall change the set in the union bound and use the usual trick of a maximal $\varepsilon$-net of the unit sphere $\mathcal{S}$, denoted by $\Sigma(\varepsilon)$. This leads to lemma A.2. As we will see shortly, a maximal $\varepsilon$-net of the sphere $\mathcal{S}$ is finite, by standard packing arguments. On this particular set, we can exploit the fact that the map $x\rightarrow\|Mx\|$ is Lipschitz with Lipschitz constant $\|M\|_{op}$. The induced continuity allows us to control the operator norm through the values $\|Mv\|$ for $v\in\Sigma(\varepsilon)$.

Lemma A.2.

Let $0<\varepsilon<1$ and let $\Sigma(\varepsilon)$ be a maximal $\varepsilon$-net of the sphere $\mathcal{S}$, that is, a set of points in $\mathcal{S}$ separated from each other by a distance of at least $\varepsilon$ and which is maximal with respect to set inclusion. Then for any $n\times n$ matrix $M$ and any $\lambda>0$, we have

$$\mathbb{P}(\|M\|_{op}>\lambda)\leq\mathbb{P}\left(\bigcup_{v\in\Sigma(\varepsilon)}\{\|Mv\|>\lambda(1-\varepsilon)\}\right)\tag{21}$$

Proof.

From the definition of the operator norm (see 2.1) as a supremum, using the fact that the map $x\rightarrow\|Mx\|$ is Lipschitz, hence continuous, and that the unit sphere $\mathcal{S}$ is compact as we are in finite dimension, we can find $x\in\mathcal{S}$ that attains the supremum (recall that a continuous function attains its supremum on a compact set):

$$\|Mx\|=\|M\|_{op}$$

We can eliminate the trivial case of $x$ belonging to $\Sigma(\varepsilon)$, as inequality (21) is easily verified in this scenario. In the other case, where $x$ does not belong to $\Sigma(\varepsilon)$, there must exist a point $y$ in $\Sigma(\varepsilon)$ whose distance to $x$ is less than $\varepsilon$ (otherwise adding $x$ to $\Sigma(\varepsilon)$ would contradict the maximality of $\Sigma(\varepsilon)$). We now use the Lipschitz property of the map $x\rightarrow\|Mx\|$, whose Lipschitz constant is $\|M\|_{op}$. Since $\|x-y\|\leq\varepsilon$, it gives us

$$\|M(x-y)\|\leq\|M\|_{op}\|x-y\|\leq\|M\|_{op}\,\varepsilon$$

The triangle inequality gives us

$$\|M\|_{op}=\|Mx\|\leq\|M(x-y)\|+\|My\|\leq\|M\|_{op}\,\varepsilon+\|My\|$$

Hence,

$$\|My\|\geq\|M\|_{op}(1-\varepsilon)$$

In particular, if $\|M\|_{op}>\lambda$, then $\|My\|>\lambda(1-\varepsilon)$, which concludes the proof. ∎
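The deterministic core of this proof, namely that some net point $y$ satisfies $\|My\|\geq(1-\varepsilon)\|M\|_{op}$, can be checked numerically in low dimension. The sketch below assumes NumPy. It does not build a maximal net of the whole sphere: the hypothetical helper `greedy_net` only extracts an $\varepsilon$-separated subset of a finite sample, to which the maximizer $x$ is appended so that $x$ is guaranteed to be within $\varepsilon$ of a net point.

```python
import numpy as np

def greedy_net(points, eps):
    """Greedily keep points that are at distance >= eps from all kept points."""
    net = np.empty((0, points.shape[1]))
    for p in points:
        if net.shape[0] == 0 or np.min(np.linalg.norm(net - p, axis=1)) >= eps:
            net = np.vstack([net, p])
    return net

rng = np.random.default_rng(6)
n, eps = 3, 0.5
M = rng.standard_normal((n, n))

# x is the top right-singular vector, so ||M x|| equals the operator norm.
_, s, vt = np.linalg.svd(M)
x, op_norm = vt[0], s[0]

# Random points on the sphere, with x appended last so that x itself is covered.
samples = rng.standard_normal((2000, n))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
net = greedy_net(np.vstack([samples, x]), eps)

# Largest value of ||M v|| over the net (rows of net @ M.T are the vectors M v).
best_on_net = np.max(np.linalg.norm(net @ M.T, axis=1))
print(len(net), best_on_net, (1 - eps) * op_norm)
```

Because every candidate that is rejected by the greedy pass lies within $\varepsilon$ of a kept point, the Lipschitz argument of the proof applies verbatim to this finite stand-in.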

*Remark 10.*

Lemma A.2 is very intuitive. The continuity of the map $x\rightarrow\|Mx\|$ implies that it attains its maximum on the compact unit sphere. By a packing argument, there is necessarily a point of the maximal net $\Sigma(\varepsilon)$ at a distance less than $\varepsilon$ from this optimum. As the map $x\rightarrow\|Mx\|$ is Lipschitz, the decrease of $\|Mx\|$ between the optimum and this point is at most $\|M\|_{op}\varepsilon$, as $\|M\|_{op}$ is the Lipschitz constant.

Last but not least, we recall with the following lemma that the cardinality of a maximal $\varepsilon$-net $\Sigma(\varepsilon)$ of the sphere $\mathcal{S}$ is at most polynomial in $1/\varepsilon$, with degree $n-1$, the dimension of the sphere.

Lemma A.3.

Let $0<\varepsilon<1$, and let $\Sigma(\varepsilon)$ be a maximal $\varepsilon$-net of the unit sphere $\mathcal{S}$. Then $\Sigma(\varepsilon)$ has cardinality at most $\frac{C}{\varepsilon^{n-1}}$ for some positive constant $C>0$ and at least $\frac{c}{\varepsilon^{n-1}}$ for some constant $c>0$.

Proof.

The proof is quite intuitive and simple. It relies on a volume packing argument. The balls of radius $\varepsilon/2$ centered at the points of $\Sigma(\varepsilon)$ are disjoint, and there are as many of them as the cardinality of $\Sigma(\varepsilon)$. By the triangle inequality, and using the fact that $\varepsilon/2<1$, all these balls are contained in the shell between the large ball of radius $(1+\varepsilon/2)$ centered at the origin and the smaller ball of radius $(1-\varepsilon/2)$ centered at the same origin. Hence (using the fact that the volume of a ball is a constant times the radius to the power of the dimension of the space), we can pack at most

$$\frac{(1+\varepsilon/2)^{n}-(1-\varepsilon/2)^{n}}{(\varepsilon/2)^{n}}$$

of these balls, which proves that the cardinality is at most $(C/\varepsilon)^{n-1}$ for some positive constant $C>0$, as the quantity above is, for small $\varepsilon$, equivalent to $\frac{n2^{n}}{\varepsilon^{n-1}}$.

Conversely, for $\varepsilon<2$, $\Sigma(\varepsilon)$ is not empty. If we sequentially pack the space between the same large and small balls of radius $(1+\varepsilon/2)$ and $(1-\varepsilon/2)$ respectively, both centered at the origin, with non-intersecting balls of radius $\varepsilon/2$ whose centers lie on the unit sphere, we can take the set of the centers of these balls. As the balls of radius $\varepsilon/2$ do not intersect, by the triangle inequality, their centers are at a distance of at least $\varepsilon$ from each other. Because $\Sigma(\varepsilon)$ is a maximal set, its cardinality should be at least equal to the number of centers created in this way. We can pack

$$\frac{(1+\varepsilon/2)^{n}-(1-\varepsilon/2)^{n}}{(\varepsilon/2)^{n}}$$

of these centers, which proves that the cardinality is at least $(c/\varepsilon)^{n-1}$ for some positive constant $c>0$. ∎
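The growth of the net size as $\varepsilon$ shrinks can be observed empirically in dimension $n=3$. The sketch below assumes NumPy and, as a simplification, extracts $\varepsilon$-separated subsets of a finite random sample rather than true maximal nets. It also compares against a cruder volume bound than the shell argument above: disjoint balls of radius $\varepsilon/2$ inside the full ball of radius $1+\varepsilon/2$ give at most $(1+2/\varepsilon)^{n}$ points. The helper name `greedy_net_size` and all numerical values are illustrative.

```python
import numpy as np

def greedy_net_size(points, eps):
    """Size of a greedily built eps-separated subset of the given points."""
    net = np.empty((0, points.shape[1]))
    for p in points:
        if net.shape[0] == 0 or np.min(np.linalg.norm(net - p, axis=1)) >= eps:
            net = np.vstack([net, p])
    return net.shape[0]

rng = np.random.default_rng(7)
n = 3
points = rng.standard_normal((3000, n))
points /= np.linalg.norm(points, axis=1, keepdims=True)  # sample of the unit sphere

sizes = {}
for eps in (0.8, 0.4, 0.2):
    sizes[eps] = greedy_net_size(points, eps)
    crude_bound = (1 + 2 / eps) ** n
    # Last column: size * eps^(n-1), which should stay of order one.
    print(eps, sizes[eps], crude_bound, sizes[eps] * eps ** (n - 1))
```

Halving $\varepsilon$ should multiply the size by roughly $2^{n-1}=4$ here, which matches the $\varepsilon^{-(n-1)}$ scaling of the lemma.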

Proof.

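We can now prove proposition 3.1 as follows. Using lemma A.2 and the union bound, we have

$$\mathbb{P}(\|M\|_{op}>A\sqrt{n})\leq\mathbb{P}\left(\bigcup_{v\in\Sigma(\varepsilon)}\{\|Mv\|>A\sqrt{n}(1-\varepsilon)\}\right)\leq\sum_{v\in\Sigma(\varepsilon)}\mathbb{P}(\|Mv\|>A\sqrt{n}(1-\varepsilon))$$

Lemma A.1 states that for $v\in\mathcal{S}$, there exist absolute constants $C,c>0$ such that

$$\mathbb{P}(\|Mv\|>A\sqrt{n}(1-\varepsilon))\leq C\exp(-cA(1-\varepsilon)n)$$

Since the cardinality of $\Sigma(\varepsilon)$ is bounded by $\frac{K}{\varepsilon^{n-1}}$, we can upper bound $\mathbb{P}(\|M\|_{op}>A\sqrt{n})$ by

$$\frac{K}{\varepsilon^{n-1}}\,C\exp(-cA(1-\varepsilon)n)$$

Fixing $\varepsilon=1/2$, denoting $C^{\prime}=KC$ and taking $c^{\prime}$ such that $c^{\prime}A=cA/2-\ln 2$, we have

$$\mathbb{P}(\|M\|_{op}>A\sqrt{n})\leq C^{\prime}\exp(-c^{\prime}An)$$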

which concludes the proof. ∎
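The arithmetic behind the choice $\varepsilon=1/2$ can be spelled out as follows. This is a worked step using only the cardinality bound $K/\varepsilon^{n-1}$ and the constants $C^{\prime}=KC$ and $c^{\prime}A=cA/2-\ln 2$ introduced above.

```latex
\frac{K}{\varepsilon^{n-1}}\,C\,e^{-cA(1-\varepsilon)n}\Big|_{\varepsilon=1/2}
  = KC\,2^{n-1}e^{-cAn/2}
  = \frac{C^{\prime}}{2}\,e^{n\ln 2-cAn/2}
  = \frac{C^{\prime}}{2}\,e^{-c^{\prime}An}
  \leq C^{\prime}e^{-c^{\prime}An}.
```

The resulting exponent is negative as soon as $cA/2>\ln 2$, that is, for $A$ large enough.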

A.2 Proof of proposition 3.2

Proof.

This is exactly the same reasoning as for proposition 3.1, but starting from the fact that the rows of the matrix $M$ are uniformly sub-Gaussian for the norm $L_{a}$. This means that there exist absolute constants $B,b>0$ such that for any $i=1,\ldots,n$ and every $t>0$,

$$\mathbb{P}(\|R_{i}\|>t)\leq B\exp(-bt^{2})$$

for the norm $L_{a}$. The independence of the rows allows us to prove the following lemma (similar to lemma A.1): under the conditions of proposition 3.2, there exist absolute constants $C,c>0$ such that for any $u\in\mathcal{S}$, we have

$$\mathbb{P}(\|Mu\|>A\sqrt{n})\leq C\exp(-cAn)$$

for all $A\geq C$. Lemmas A.2 and A.3 remain unchanged, which allows us to conclude. ∎

Bibliography

1. Benaych-Georges and Knowles (2016): Florent Benaych-Georges and Antti Knowles. Lectures on the local semicircle law for Wigner matrices. arXiv e-prints, art. arXiv:1601.04055, January 2016.
2. Buldygin and Kozachenko (1980): V. V. Buldygin and Yu. V. Kozachenko. Sub-gaussian random variables. Ukrainian Math, 32:483–489, 1980.
3. Grothendieck (1956): A. Grothendieck. Résumé de la théorie métrique des produits tensoriels topologiques. Soc. de Matemática de São Paulo, 1956.
4. Ledoux and Talagrand (1991): Michel Ledoux and Michel Talagrand. Probability in Banach Spaces. Springer-Verlag, 1991.
5. Marčenko and Pastur (1967): V. A. Marčenko and L. A. Pastur. Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik, 1(4):457–483, April 1967.
6. Tao: T. Tao. Topics in Random Matrix Theory. Graduate Studies in Mathematics. American Mathematical Soc. ISBN 9780821885079.
7. Tracy and Widom (1994): Craig A. Tracy and Harold Widom. Level-spacing distributions and the Airy kernel. Comm. Math. Phys., 159(1):151–174, 1994.
8. Vershynin (2018): Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018.
