Concentration inequalities for random matrix products

Amelia Henriksen; Rachel Ward

arXiv:1907.05833·math.PR·July 15, 2019


TL;DR

This paper establishes sharp nonasymptotic concentration inequalities for normalized products of independent bounded random matrices, with applications to stochastic algorithms like streaming PCA.

Contribution

It provides the first nonasymptotic spectral norm bounds for a broad class of random matrix products, combining matrix Bernstein inequality and combinatorial methods.

Findings

01

Spectral norm error bound of O((log n)^2 log(d/δ)/√n) with high probability

02

Convergence of normalized matrix products to matrix exponential e^{X}

03

Sharpness of the rate up to logarithmic factors

Abstract

Suppose $\{X_{k}\}_{k \in \mathbb{Z}}$ is a sequence of bounded independent random matrices with common dimension $d \times d$ and common expectation $\mathbb{E}[X_{k}] = X$. Under these general assumptions, the normalized random matrix product $Z_{n} = (I + \frac{1}{n} X_{n}) (I + \frac{1}{n} X_{n-1}) \cdots (I + \frac{1}{n} X_{1})$ converges to $e^{X}$ as $n \to \infty$. Normalized random matrix products of this form arise naturally in stochastic iterative algorithms, such as Oja's algorithm for streaming Principal Component Analysis. Here, we derive nonasymptotic concentration inequalities for such random matrix products. In particular, we show that the spectral norm error satisfies $\| Z_{n} - e^{X} \| = O((\log(n))^{2} \log(d/\delta)/\sqrt{n})$ with probability exceeding $1 - \delta$. This rate is sharp in $n$, $d$, and $\delta$, up to possibly the $\log(n)$ and $\log(d)$ …

Equations

Z_{n} = \left(I_{d} + \frac{1}{n} X_{n}\right) \left(I_{d} + \frac{1}{n} X_{n-1}\right) \cdots \left(I_{d} + \frac{1}{n} X_{1}\right)

\lim_{n \to \infty} \prod_{k=0}^{n-1} \left(1 + \frac{u_{k}}{n}\right) = e^{\mu}.

\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} A_{k} = A

Z_{n} = \left(I_{d} + \frac{1}{n} A_{1}\right) \cdots \left(I_{d} + \frac{1}{n} A_{n}\right).

\lim_{n \to \infty} Z_{n} = e^{A}.

\mathbb{E}[X_{k}] = X \quad\text{and}\quad \|X_{k}\| \leq L \quad\text{for each index } k.

Z_{n} = \left(I_{d} + \frac{1}{n} X_{n}\right) \left(I_{d} + \frac{1}{n} X_{n-1}\right) \cdots \left(I_{d} + \frac{1}{n} X_{1}\right).

\max\{3, \lceil L e^{2} \rceil\} \leq \log(n) + 1 \leq \left(\frac{16 n}{\log(d/\delta) + \log(ne)}\right)^{1/3}

\|Z_{n} - e^{X}\|

\mathbb{E}\|Z_{n} - e^{X}\| \leq (1 - 2\delta) \left(\frac{2 L e^{L} \log(n)}{n} \left(2 \log(2d/\delta) + (\log(n))^{2} + \frac{\log(n)}{n}\right)\right) + \frac{L^{2} e^{L}}{2n} + 4 \delta e^{L}

\mathbb{E}\|Z_{n} - e^{X}\| \leq \left(\frac{2 L e^{L} \log(n)}{n} \left(2 \log(2d/\delta) + (\log(n))^{2} + \frac{\log(n)}{n}\right)\right) + \frac{L^{2} e^{L}}{n}.

\mathbb{E}[S_{k}] = 0 \quad\text{and}\quad \|S_{k}\| \leq L \quad\text{for each index } k.

Z = \sum_{k} S_{k}.

v(Z) = \max\left\{ \left\| \sum_{k} \mathbb{E}[S_{k} S_{k}^{*}] \right\|, \left\| \sum_{k} \mathbb{E}[S_{k}^{*} S_{k}] \right\| \right\}.

\mathrm{Prob}\{\|Z\| \geq t\} \leq (d_{1} + d_{2}) \exp\left(\frac{-t^{2}/2}{v(Z) + Lt/3}\right).

Z_{n} = I_{d} + \sum_{k=1}^{n} Z_{n,k}

Z_{n,k} = \left(\frac{1}{n}\right)^{k} \sum_{1 \leq j_{1} < \cdots < j_{k} \leq n} X_{j_{k}} X_{j_{k-1}} \cdots X_{j_{1}}, \quad 1 \leq k \leq n.

\mathbb{E}[Z_{n,k}] = \left(\frac{1}{n}\right)^{k} \binom{n}{k} X^{k}, \quad \mathbb{E}[Z_{n}] = I_{d} + \sum_{k=1}^{n} \mathbb{E}[Z_{n,k}] = \left(I_{d} + \frac{1}{n} X\right)^{n}.

\|Z_{n} - e^{X}\| \leq \|Z_{n} - \mathbb{E}[Z_{n}]\| + \left\|\left(I_{d} + \frac{1}{n} X\right)^{n} - e^{X}\right\| \leq \sum_{k=1}^{n} \|Z_{n,k} - \mathbb{E}[Z_{n,k}]\| + \left\|\left(I_{d} + \frac{1}{n} X\right)^{n} - e^{X}\right\|

\left\|\left(I + \frac{1}{n} X\right)^{n} - e^{X}\right\| \leq \frac{\|X\|^{2}}{2n} e^{\|X\|}

\sum_{k=\lceil \log(n) \rceil}^{n} \|Z_{n,k} - \mathbb{E}(Z_{n,k})\| \leq \frac{2 L e^{2}}{n(e-1)}

k

\mathrm{Prob}\left[\|Z_{n,k} - \mathbb{E}(Z_{n,k})\| > \gamma_{k}\right] \leq \delta^{k}

\gamma_{k} = 2 \left(\frac{eL}{k-1}\right)^{k-1} \left(\frac{2L}{n} \log\left(\frac{2d\,(ne/(k-1))^{k-1}}{\delta}\right) + \frac{L(k-1)}{n}\right)

Z_{k} - \mathbb{E}[Z_{k}] = \left(\frac{1}{n}\right)^{k} \sum_{1 \leq j_{1} < \cdots < j_{k} \leq n-p} \left(X_{j_{k}} X_{j_{k-1}} \cdots X_{j_{1}} - X^{k}\right) + D_{k}.

\|D_{k}\| \leq 2(k-1) \binom{n-1}{k-1} \left(\frac{L}{n}\right)^{k} \ \text{(by Pascal's rule)} \leq 2(k-1) \left(\frac{(n-1)e}{k-1}\right)^{k-1} \left(\frac{L}{n}\right)^{k} \leq 2\,\frac{L(k-1)}{n} \left(\frac{eL}{k-1}\right)^{k-1}


Full text

Concentration inequalities for random matrix products

Amelia Henriksen and Rachel Ward, Oden Institute for Computational Engineering and Sciences, University of Texas, Austin, TX. This material is based upon work supported in part by AFOSR MURI Award N00014-17-S-F006.

Abstract

Suppose $\{\bm{X}_{k}\}_{k\in\mathbb{Z}}$ is a sequence of bounded independent random matrices with common dimension $d\times d$ and common expectation $\mathbb{E}\left[\bm{X}_{k}\right]=\bm{X}$ . Under these general assumptions, the normalized random matrix product

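$$\bm{Z}_{n}=\left(\bm{I}_{d}+\frac{1}{n}\bm{X}_{n}\right)\left(\bm{I}_{d}+\frac{1}{n}\bm{X}_{n-1}\right)\cdots\left(\bm{I}_{d}+\frac{1}{n}\bm{X}_{1}\right)$$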

converges to $\bm{Z}_{n}\rightarrow e^{\bm{X}}$ as $n\rightarrow\infty$ . Normalized random matrix products of this form arise naturally in stochastic iterative algorithms, such as Oja’s algorithm for streaming Principal Component Analysis. Here, we derive nonasymptotic concentration inequalities for such random matrix products. In particular, we show that the spectral norm error satisfies $\|\bm{Z}_{n}-e^{\bm{X}}\|=O((\log(n))^{2}\log(d/\delta)/\sqrt{n})$ with probability exceeding $1-\delta$ . This rate is sharp in $n$ , $d$ , and $\delta$ , up to possibly the $\log(n)$ and $\log(d)$ factors. The proof relies on two key points of theory: the Matrix Bernstein inequality concerning the concentration of sums of random matrices, and Baranyai’s theorem from combinatorial mathematics. Concentration bounds for general classes of random matrix products are hard to come by in the literature, and we hope that our result will inspire further work in this direction.

1 Introduction

A classical limit theorem from complex analysis reads: *Let $(u_{n})_{n\in\mathbb{N}}$ be a uniformly bounded complex sequence whose mean $\frac{1}{n}\sum_{k=0}^{n-1}u_{k}$ converges towards $\mu$. Then*

$$\lim_{n\to\infty}\prod_{k=0}^{n-1}\left(1+\frac{u_{k}}{n}\right)=e^{\mu}.$$

This result is easily verified by taking the natural logarithm of each side, and observing that $\log\left(\prod\limits_{k=0}^{n-1}\left(1+\frac{u_{k}}{n}\right)\right)\approx\frac{1}{n}\sum_{k=0}^{n-1}u_{k}\rightarrow\mu$ . A non-commutative extension of this result was recently proven by Emme and Hubert in [EH18]:
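As a quick numerical illustration of the scalar limit (a minimal sketch, not part of the original argument; the helper name `normalized_product` and the particular bounded sequence are assumptions made for this example), one can check that the product approaches $e^{\mu}$ as $n$ grows:

```python
import math
import random

def normalized_product(u):
    """Return prod_k (1 + u_k / n) for a sequence u of length n."""
    n = len(u)
    prod = 1.0
    for u_k in u:
        prod *= 1.0 + u_k / n
    return prod

random.seed(0)
mu = 0.5
errors = {}
for n in (10, 1_000, 100_000):
    # Hypothetical bounded sequence whose running mean tends to mu.
    u = [mu + random.uniform(-1.0, 1.0) for _ in range(n)]
    errors[n] = abs(normalized_product(u) - math.exp(mu))

for n, err in errors.items():
    print(n, err)
```

The printed gaps depend on the random seed; the point is only that they shrink as $n$ increases.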

Proposition 1.

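Let $(\bm{A}_{n})_{n\in\mathbb{N}}$ be a sequence of $d\times d$ complex matrices satisfying

$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\bm{A}_{k}=\bm{A}$$

and such that $(\frac{1}{n}\sum_{k=1}^{n}\|\bm{A}_{k}\|)_{n\in\mathbb{N}}$ is bounded by $\alpha$ for some norm $\|\cdot\|$. Consider the matrix product

$$\bm{Z}_{n}=\left(\bm{I}_{d}+\frac{1}{n}\bm{A}_{1}\right)\cdots\left(\bm{I}_{d}+\frac{1}{n}\bm{A}_{n}\right).$$

Then

$$\lim_{n\to\infty}\bm{Z}_{n}=e^{\bm{A}}.$$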

The proof of Proposition 1 is not a straightforward extension of the scalar result. The matrix product is non-commutative in general, $\bm{A}\bm{B}\neq\bm{B}\bm{A}$, and so the identity $\log(\bm{A}\bm{B})=\log(\bm{A})+\log(\bm{B})$ fails to hold in general.

An important special case within the framework of Proposition 1 is when the $\bm{A}_{k}$ are uniformly bounded independent random matrices with common expectation $\mathbb{E}\left[\bm{A}_{k}\right]=\bm{A}$. Then $\bm{Z}_{n}$ is also a random matrix, and has expectation $\mathbb{E}\left[\bm{Z}_{n}\right]=(\bm{I}_{d}+\frac{1}{n}\bm{A})^{n}$. Within this framework, it is natural to ask about rates of convergence of $\bm{Z}_{n}$ to $e^{\bm{A}}$. As far as we are aware, precise rates of convergence for matrix products of the form $\bm{Z}_{n}$ have not appeared in the literature before, despite such random matrix products naturally arising in stochastic iterative algorithms such as stochastic gradient descent; in particular, in Oja’s algorithm for estimating the top eigenvector of the covariance matrix of a distribution of matrices observed sequentially [Kra70, Oja82, BDF13, MCJ13, SRO15, JJK+16, AZL17]. Here, as the main content of this paper, we derive a rate of convergence for matrix products of this form.

Theorem 1 (Main Theorem).

Consider a sequence $\{\bm{X}_{k}\}_{k\in\mathbb{Z}}$ of independent (real or complex-valued) random matrices with common dimension $d\times d$ . Assume that

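$$\mathbb{E}\left[\bm{X}_{k}\right]=\bm{X}\quad\text{and}\quad\|\bm{X}_{k}\|\leq L\quad\text{for each index }k.$$

Introduce the sequence of random matrices $\{\bm{Z}_{n}\}_{n\in\mathbb{N}}$ given by

$$\bm{Z}_{n}=\left(\bm{I}_{d}+\frac{1}{n}\bm{X}_{n}\right)\left(\bm{I}_{d}+\frac{1}{n}\bm{X}_{n-1}\right)\cdots\left(\bm{I}_{d}+\frac{1}{n}\bm{X}_{1}\right).\tag{2}$$

Suppose that $L>0$, $n,d\in\mathbb{Z}$, and $\delta\in(0,1/2]$ are such that

$$\max\{3,\lceil Le^{2}\rceil\}\leq\log(n)+1\leq\left(\frac{16n}{\log(d/\delta)+\log(ne)}\right)^{1/3}.\tag{3}$$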

Then with probability exceeding $1-2\delta$ , the following holds:

[TABLE]

where $\|\cdot\|$ denotes the matrix spectral norm.
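To make the setting of the theorem concrete, the following is a minimal simulation sketch rather than anything taken from the paper. It assumes a hypothetical $2\times 2$ mean matrix `X`, entrywise uniform noise for the $\bm{X}_{k}$, a truncated Taylor series for the matrix exponential, and the Frobenius norm as a convenient upper bound on the spectral norm:

```python
import math
import random

def matmul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    d = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(d)) for j in range(d)]
            for i in range(d)]

def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

def expm_taylor(X, terms=30):
    """Matrix exponential via a truncated Taylor series (adequate for small norms)."""
    d = len(X)
    result, term = identity(d), identity(d)
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, X)]
        result = [[result[i][j] + term[i][j] for j in range(d)] for i in range(d)]
    return result

def frobenius_distance(A, B):
    """Frobenius norm of A - B, which upper-bounds the spectral norm."""
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)))

def sample_Z(X, n, noise, rng):
    """Z_n = (I + X_n/n) ... (I + X_1/n) with X_k = X + bounded mean-zero noise."""
    d = len(X)
    Z = identity(d)
    for _ in range(n):
        factor = [[(1.0 if i == j else 0.0) + (X[i][j] + rng.uniform(-noise, noise)) / n
                   for j in range(d)] for i in range(d)]
        Z = matmul(factor, Z)  # the newest factor multiplies on the left
    return Z

rng = random.Random(0)
X = [[0.3, 0.2], [-0.1, 0.4]]  # hypothetical common expectation
target = expm_taylor(X)
errors = {n: frobenius_distance(sample_Z(X, n, 0.5, rng), target)
          for n in (10, 100, 10_000)}
print(errors)
```

A single run per $n$ only gives a rough impression; averaging over many independent runs would be needed to estimate a rate.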

Theorem 1 immediately implies a bound on the expected value of $\left\lVert\bm{Z}_{n}-e^{\bm{X}}\right\rVert.$ Note that $\|\bm{Z}_{n}\|\leq e^{L}$ and $\|e^{\bm{X}}\|\leq e^{L}$ , so for any $\delta>0$ satisfying (3),

[TABLE]

In particular, setting $\delta=\frac{L^{2}}{8n}$ gives

[TABLE]

Note that the $O(\frac{1}{\sqrt{n}})$ convergence rate is unavoidable under the stated assumptions. Indeed, consider the scalar case $d=1$ , where $\{x_{j}\}_{j=1}^{n}$ is a sequence of independent real-valued mean-zero scalars, bounded uniformly by $|x_{j}|\leq L$ . In this case, as $\frac{1}{n}\sum_{j=1}^{n}x_{j}$ becomes sufficiently small, $\frac{1}{n}\sum_{j=1}^{n}x_{j}$ and $\log(\prod\limits_{j=1}^{n}(1+\frac{x_{j}}{n}))$ are nearly equivalent. Thus, applying the standard scalar Bernstein inequality to $\frac{1}{n}\sum_{j=1}^{n}x_{j}$ results in a bound of the form $|\prod\limits_{j=1}^{n}(1+\frac{x_{j}}{n})-1|\leq CL\frac{\sqrt{\log(1/\delta)}}{\sqrt{n}}$ . It remains open whether the $\log(n)$ and $\log(d)$ factors in the rate given by Theorem 1 can be removed, and also whether the dependence on $L$ can be improved.
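The scalar example can be probed numerically. The sketch below is an illustration under assumptions of our own choosing (random signs $x_{j}=\pm L$, a fixed seed, and the helper names `product_error` and `rms_error`); it estimates the root-mean-square error at two values of $n$ and compares their ratio with the $\sqrt{n}$ scaling:

```python
import math
import random

def product_error(n, L, rng):
    """|prod_j (1 + x_j/n) - 1| for independent signs x_j = +/- L (mean zero)."""
    prod = 1.0
    for _ in range(n):
        prod *= 1.0 + rng.choice((-L, L)) / n
    return abs(prod - 1.0)

def rms_error(n, L, trials, rng):
    """Root-mean-square of product_error over independent trials."""
    total = sum(product_error(n, L, rng) ** 2 for _ in range(trials))
    return math.sqrt(total / trials)

rng = random.Random(1)
L = 1.0
rms_100 = rms_error(100, L, 1000, rng)
rms_400 = rms_error(400, L, 1000, rng)
# If the error scales like 1/sqrt(n), this ratio is close to sqrt(400/100) = 2.
ratio = rms_100 / rms_400
print(rms_100, rms_400, ratio)
```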

Remark 1.

Limit laws for products of random matrices have been extensively analyzed in the context of ergodic theory or martingales on Markov chains; see for instance the book [BQ16] or the extensive survey articles [Fur02, Led01]. However, results in the form of quantitative rates of convergence of general random matrix products are quite scarce, apart from specialized cases such as products of i.i.d. Gaussian random matrices. Surprisingly, for the random matrix product $\bm{Z}_{n}$ we consider in (2), a rigorous proof of the limiting behavior $\bm{Z}_{n}\rightarrow e^{\bm{X}}$ appears to have been given only recently [EH18], even though a seemingly incomplete proof of this limiting behavior was provided as Theorem $7$ of the 1984 paper [Ber84].

Notation. Throughout, $\|\bm{X}\|$ refers to the spectral norm of the matrix $\bm{X}$ . For an integer $n\geq 1$ , we use the notation $[n]$ to refer to the set $\{1,2,\dots,n\}$ . We write ${\bf Prob}[E]$ to refer to the probability of the event $E$ .

2 Preliminaries

A crucial ingredient of the proof of Theorem 1 is the matrix Bernstein inequality, a matrix-level extension of the classical scalar Bernstein inequality describing the upper tail of a sum of independent bounded or sub-exponential random variables. The first matrix Bernstein type bound was derived by Ahlswede and Winter [AW03], and subsequently improved by Tropp [Tro10] by applying Lieb’s theorem in place of the Golden-Thompson inequality. We use the variant of the matrix Bernstein inequality of Tropp stated below.

Proposition 2 (Matrix Bernstein Inequality (Theorem 6.1.1 in [Tro15])).

Consider a finite sequence $\{\bm{S}_{k}\}$ of independent random matrices with common dimension $d_{1}\times d_{2}$ . Assume that

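$$\mathbb{E}\left[\bm{S}_{k}\right]=\bm{0}\quad\text{and}\quad\|\bm{S}_{k}\|\leq L\quad\text{for each index }k.$$

Introduce the random matrix

$$\bm{Z}=\sum_{k}\bm{S}_{k}.$$

Let $v(\bm{Z})$ be the matrix variance statistic of the sum:

$$v(\bm{Z})=\max\left\{\left\|\sum_{k}\mathbb{E}\left[\bm{S}_{k}\bm{S}_{k}^{*}\right]\right\|,\ \left\|\sum_{k}\mathbb{E}\left[\bm{S}_{k}^{*}\bm{S}_{k}\right]\right\|\right\}.$$

Then, for all $t\geq 0$,

$${\bf Prob}\left\{\|\bm{Z}\|\geq t\right\}\leq(d_{1}+d_{2})\exp\left(\frac{-t^{2}/2}{v(\bm{Z})+Lt/3}\right).$$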
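In applications one usually fixes a failure probability $\delta$ and solves the tail bound for $t$. The sketch below does this by solving the quadratic $t^{2}/2=c\,(v+Lt/3)$ with $c=\log((d_{1}+d_{2})/\delta)$; the function names are hypothetical and the numbers are only an example:

```python
import math

def bernstein_tail(t, v, L, d1, d2):
    """Right-hand side of the matrix Bernstein tail bound at level t."""
    return (d1 + d2) * math.exp(-(t * t / 2.0) / (v + L * t / 3.0))

def bernstein_threshold(delta, v, L, d1, d2):
    """Smallest t >= 0 at which the tail bound equals delta.

    With c = log((d1 + d2) / delta), the equation t^2 / 2 = c (v + L t / 3)
    has the positive root t = cL/3 + sqrt((cL/3)^2 + 2 c v).
    """
    c = math.log((d1 + d2) / delta)
    b = c * L / 3.0
    return b + math.sqrt(b * b + 2.0 * c * v)

t_star = bernstein_threshold(delta=0.01, v=2.0, L=1.0, d1=5, d2=5)
print(t_star, bernstein_tail(t_star, 2.0, 1.0, 5, 5))
```

The closed form shows the two familiar regimes: the $\sqrt{2cv}$ term dominates when the variance statistic is large, and the $cL/3$ term dominates otherwise.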

Another key theorem we rely on is Baranyai’s theorem [Bar75], stated below.

Proposition 3 (Baranyai, 1973).

Let $a_{1},\dots,a_{t}$ be natural numbers such that $\sum_{j=1}^{t}a_{j}={N\choose k}$ . Then the set of $k$ -subsets of $[N]$ can be partitioned into disjoint families $S_{1},\dots,S_{t}$ with $|S_{j}|=a_{j}$ and each $i\in[N]$ is included in exactly $\lceil{\frac{a_{j}\cdot k}{N}\rceil}$ or $\lfloor{\frac{a_{j}\cdot k}{N}\rfloor}$ elements of $S_{j}$ .
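For intuition, the special case $k=2$ with $N$ even and all $a_{j}=N/2$ says that the pairs from $[N]$ split into $N-1$ perfect matchings. The sketch below builds such a split with the classical round-robin (circle) construction; this construction is a swapped-in illustration for $k=2$ only, not Baranyai's general argument, and the function name is hypothetical:

```python
from itertools import combinations

def round_robin_partitions(N):
    """Split all 2-subsets of {1, ..., N} (N even) into N - 1 perfect matchings."""
    assert N % 2 == 0
    others = list(range(2, N + 1))  # element 1 stays fixed, the rest rotate
    rounds = []
    for _ in range(N - 1):
        line = [1] + others
        rounds.append([frozenset((line[i], line[N - 1 - i])) for i in range(N // 2)])
        others = others[-1:] + others[:-1]  # rotate by one position
    return rounds

rounds = round_robin_partitions(6)
all_pairs = {pair for matching in rounds for pair in matching}
# Every 2-subset appears, and each round uses every element exactly once.
print(len(rounds), all_pairs == {frozenset(c) for c in combinations(range(1, 7), 2)})
```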

2.1 Sketch of the proof of Theorem 1

Suppose that $\bm{X}_{k}$ , $\bm{X}$ , and $\bm{Z}_{n}$ satisfy the assumptions of Theorem 1. Write

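$$\bm{Z}_{n}=\bm{I}_{d}+\sum_{k=1}^{n}\bm{Z}_{n,k},$$

where

$$\bm{Z}_{n,k}=\left(\frac{1}{n}\right)^{k}\sum_{1\leq j_{1}<\dots<j_{k}\leq n}\bm{X}_{j_{k}}\bm{X}_{j_{k-1}}\cdots\bm{X}_{j_{1}},\quad 1\leq k\leq n.\tag{8}$$

Because the $\bm{X}_{k}$ are independent, the expected values of $\bm{Z}_{n,k}$ and $\bm{Z}_{n}$ are easily calculated:

$$\mathbb{E}\left[\bm{Z}_{n,k}\right]=\left(\frac{1}{n}\right)^{k}\binom{n}{k}\bm{X}^{k},\qquad\mathbb{E}\left[\bm{Z}_{n}\right]=\bm{I}_{d}+\sum_{k=1}^{n}\mathbb{E}\left[\bm{Z}_{n,k}\right]=\left(\bm{I}_{d}+\frac{1}{n}\bm{X}\right)^{n}.$$

We then write

$$\begin{aligned}\left\lVert\bm{Z}_{n}-e^{\bm{X}}\right\rVert&\leq\left\lVert\bm{Z}_{n}-\mathbb{E}[\bm{Z}_{n}]\right\rVert+\left\lVert\left(\bm{I}_{d}+\frac{1}{n}\bm{X}\right)^{n}-e^{\bm{X}}\right\rVert\\&\leq\sum_{k=1}^{n}\left\lVert\bm{Z}_{n,k}-\mathbb{E}[\bm{Z}_{n,k}]\right\rVert+\left\lVert\left(\bm{I}_{d}+\frac{1}{n}\bm{X}\right)^{n}-e^{\bm{X}}\right\rVert.\end{aligned}$$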

The approximation error $\left\lVert(\bm{I}_{d}+\frac{1}{n}\bm{X})^{n}-e^{\bm{X}}\right\rVert$ is bounded deterministically using standard analysis, and converges to zero at rate $O(1/n)$ , as made precise by Lemma 2. The errors $\left\lVert\bm{Z}_{n,k}-\mathbb{E}[\bm{Z}_{n,k}]\right\rVert$ decay sufficiently quickly in $k$ that the sum of all but the first $\log(n)$ many of them, $\sum_{k=\lceil{\log(n)\rceil}}^{n}\|\bm{Z}_{n,k}-\mathbb{E}[\bm{Z}_{n,k}]\|$ , is also bounded by $O(1/n)$ deterministically (Lemma 3 below). The leading error term $\|\bm{Z}_{n,1}-\mathbb{E}[\bm{Z}_{n,1}]\|$ is bounded with high probability using the Matrix Bernstein inequality. The most interesting, and most difficult, part of the proof is in bounding the intermediate terms $\|\bm{Z}_{n,k}-\mathbb{E}[\bm{Z}_{n,k}]\|$ , $k=2,\dots,\lfloor{\log(n)\rfloor}.$ To do this, we appeal to Baranyai’s theorem, which implies that each such term can be approximately written as a sum of sums of independent matrix products, so that we may apply the matrix Bernstein inequality with properly tuned parameters to each sub-sum to achieve the final bound.
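The expansion of $\bm{Z}_{n}$ into the terms $\bm{Z}_{n,k}$ underlying this sketch can be checked by brute force for small $n$. The code below is an illustrative check only; the helper names and the random test matrices are assumptions made for the example:

```python
import random
from itertools import combinations

def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

def matmul(A, B):
    d = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(d)) for j in range(d)]
            for i in range(d)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def product_form(Xs):
    """Z_n = (I + X_n/n) ... (I + X_1/n), with Xs[0] playing the role of X_1."""
    n, d = len(Xs), len(Xs[0])
    Z = identity(d)
    for X_k in Xs:
        Z = matmul(add(identity(d), scale(1.0 / n, X_k)), Z)
    return Z

def Z_nk(Xs, k):
    """(1/n)^k times the sum over j_1 < ... < j_k of X_{j_k} ... X_{j_1}."""
    n, d = len(Xs), len(Xs[0])
    total = [[0.0] * d for _ in range(d)]
    for idx in combinations(range(n), k):  # indices arrive in increasing order
        P = identity(d)
        for j in idx:
            P = matmul(Xs[j], P)  # later indices multiply on the left
        total = add(total, P)
    return scale((1.0 / n) ** k, total)

rng = random.Random(0)
n, d = 5, 2
Xs = [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)] for _ in range(n)]
expansion = identity(d)
for k in range(1, n + 1):
    expansion = add(expansion, Z_nk(Xs, k))
direct = product_form(Xs)
max_gap = max(abs(direct[i][j] - expansion[i][j]) for i in range(d) for j in range(d))
print(max_gap)  # agreement up to floating-point rounding
```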

3 Key Ingredients

The first two lemmas use standard analysis tools; we defer the proofs to appendices.

Lemma 2.

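Let $\bm{X}$ be a square real or complex matrix with spectral norm $\|\bm{X}\|$. The following holds:

$$\left\lVert\left(\bm{I}+\frac{1}{n}\bm{X}\right)^{n}-e^{\bm{X}}\right\rVert\leq\frac{\|\bm{X}\|^{2}}{2n}e^{\|\bm{X}\|}.$$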

The proof of Lemma 2 is found in Appendix B.
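In the $1\times 1$ case the lemma reads $|(1+x/n)^{n}-e^{x}|\leq\frac{x^{2}}{2n}e^{|x|}$, which is easy to spot-check numerically. The values of $x$ and $n$ below are arbitrary choices for illustration, and the helper name is hypothetical:

```python
import math

def lemma2_gap(x, n):
    """Return (error, bound) for the 1x1 case X = x of Lemma 2."""
    error = abs((1.0 + x / n) ** n - math.exp(x))
    bound = x * x / (2.0 * n) * math.exp(abs(x))
    return error, bound

checks = [lemma2_gap(x, n)
          for x in (-2.0, -0.5, 0.5, 2.0, 3.0)
          for n in (1, 5, 50, 500)]
all_hold = all(error <= bound for error, bound in checks)
print(all_hold)
```

A finite grid of checks is of course not a proof; it only confirms that the stated constant is plausible in the scalar case.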

Lemma 3.

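Suppose that $\bm{Z}_{n}$ is as in Theorem 1, and let $\bm{Z}_{n,k}$ be as defined in (8). Suppose that $\lceil\log(n)\rceil\geq\max\{3,\lceil Le^{2}\rceil\}$. Then

$$\sum_{k=\lceil\log(n)\rceil}^{n}\left\lVert\bm{Z}_{n,k}-\mathbb{E}\left(\bm{Z}_{n,k}\right)\right\rVert\leq\frac{2Le^{2}}{n(e-1)}.$$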

The proof of Lemma 3 is found in Appendix A.

Proposition 4 contains the meat of the proof. By carefully combining the Matrix Bernstein inequality and Baranyai’s theorem, we produce high probability bounds for the error terms $\|\bm{Z}_{n,k}-\mathbb{E}\left(\bm{Z}_{n,k}\right)\|$ .

Proposition 4.

Assume $\bm{X}_{1},\bm{X}_{2},\ldots\bm{X}_{n}$ are $d\times d$ matrices satisfying the assumptions in Theorem 1, and suppose that $n,k,d\in\mathbb{Z},$ and $\delta>0$ are such that

[TABLE]

where, for the $k=1$ case, we treat $0^{0}=1$ . Then

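$${\bf Prob}\left[\left\lVert\bm{Z}_{n,k}-\mathbb{E}\left(\bm{Z}_{n,k}\right)\right\rVert>\gamma_{k}\right]\leq\delta^{k},$$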

where

[TABLE]

Proof.

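For simplicity of notation, we drop the subscript $n$ in all matrix notation throughout; that is, we let $\bm{Z}=\bm{Z}_{n}$, $\bm{Z}_{k}=\bm{Z}_{n,k}$, and so on. Let $p\in\{0,1,\ldots,k-1\}$ be the unique integer such that $k$ divides $n-p$, and write

$$\bm{Z}_{k}-\mathbb{E}\left[\bm{Z}_{k}\right]=\left(\frac{1}{n}\right)^{k}\sum_{1\leq j_{1}<\dots<j_{k}\leq n-p}\left(\bm{X}_{j_{k}}\bm{X}_{j_{k-1}}\cdots\bm{X}_{j_{1}}-\bm{X}^{k}\right)+\bm{D}_{k}.$$

The random matrix $\bm{D}_{k}$ is a sum of $\binom{n}{k}-\binom{n-p}{k}$ random matrix products, each of which contains at least one of the $p$ matrices $\bm{X}_{n-p+1},\dots,\bm{X}_{n}$. Each term is bounded in norm deterministically by $2\left(\frac{L}{n}\right)^{k}$, so

$$\begin{aligned}\|\bm{D}_{k}\|&\leq 2(k-1)\binom{n-1}{k-1}\left(\frac{L}{n}\right)^{k}\quad\text{(by Pascal's rule)}\\&\leq 2(k-1)\left(\frac{(n-1)e}{k-1}\right)^{k-1}\left(\frac{L}{n}\right)^{k}\\&\leq 2\,\frac{L(k-1)}{n}\left(\frac{eL}{k-1}\right)^{k-1}.\end{aligned}$$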
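The counting step behind this bound is that $\binom{n}{k}-\binom{n-p}{k}\leq p\binom{n-1}{k-1}\leq(k-1)\binom{n-1}{k-1}$, followed by the standard estimate $\binom{m}{j}\leq(me/j)^{j}$. The sketch below spot-checks the count and the resulting chain of three bounds for a few hypothetical values of $n$, $k\geq 2$ and $L$:

```python
import math

def check_D_k_chain(n, k, L):
    """Check the term count and the three successive bounds on ||D_k|| (k >= 2)."""
    p = n % k  # the unique p in {0, ..., k-1} such that k divides n - p
    num_terms = math.comb(n, k) - math.comb(n - p, k)
    count_ok = num_terms <= (k - 1) * math.comb(n - 1, k - 1)
    first = 2 * (k - 1) * math.comb(n - 1, k - 1) * (L / n) ** k
    second = 2 * (k - 1) * ((n - 1) * math.e / (k - 1)) ** (k - 1) * (L / n) ** k
    third = 2 * (L * (k - 1) / n) * (math.e * L / (k - 1)) ** (k - 1)
    return count_ok and first <= second <= third

chain_holds = all(check_D_k_chain(n, k, L)
                  for n in (20, 57, 200)
                  for k in (2, 3, 5)
                  for L in (0.5, 1.0, 4.0))
print(chain_holds)
```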

We thus have so far that

[TABLE]

Now, as a consequence of Baranyai’s theorem, there exist $m_{k}=\frac{{n-p\choose k}}{(n-p)/k}=\binom{n-p-1}{k-1}$ partitions of $[n-p]=\{1,2,\dots,n-p\}$ into $k$-subsets, denoted by ${\cal P}_{r}$, $r=1,2,\dots,m_{k}$, such that

[TABLE]

Write

[TABLE]

Because the $\bm{X}_{j}$ are independent and because each ${\cal P}_{r}$ constitutes a partition of $[n-p]$ , each subset of random matrices $\{\bm{Y}_{r,\ell}\}_{\ell=1}^{(n-p)/k}$ forms a mutually independent set of random matrices. We can use this to bound $\left\lVert\sum_{r=1}^{m_{k}}\sum_{\ell=1}^{(n-p)/k}\bm{Y}_{r,\ell}\right\rVert$ with high probability, using the Matrix Bernstein Inequality (Proposition 2). Indeed, we will apply the Matrix Bernstein Inequality separately to each sum $\sum_{\ell=1}^{(n-p)/k}\bm{Y}_{r,\ell}$ of independent random matrices. To do this, we employ the bounds

1. $\mathbb{E}\left[\bm{Y}_{r,\ell}\right]=\left(\frac{1}{n}\right)^{k}\mathbb{E}\left[\bm{X}_{j_{k}}\cdots\bm{X}_{j_{1}}-\bm{X}^{k}\right]=\left(\frac{1}{n}\right)^{k}\left(\bm{X}^{k}-\bm{X}^{k}\right)=0$

2. $\|\bm{Y}_{r,\ell}\|=\left\lVert\left(\frac{1}{n}\right)^{k}\left(\bm{X}_{j_{k}}\cdots\bm{X}_{j_{1}}-\bm{X}^{k}\right)\right\rVert\leq\left(\frac{1}{n}\right)^{k}\left(\left\lVert\bm{X}_{j_{k}}\right\rVert\cdots\left\lVert\bm{X}_{j_{1}}\right\rVert+\left\lVert\bm{X}\right\rVert^{k}\right)\leq 2\left(\frac{L}{n}\right)^{k}$

3. [TABLE]

4. Similarly, $\left\lVert\sum_{\ell=1}^{(n-p)/k}\mathbb{E}\left[{\bm{Y}_{r,\ell}}^{*}\bm{Y}_{r,\ell}\right]\right\rVert\leq 4\left(\frac{n}{k}\right)\left(\frac{L}{n}\right)^{2k}$

We can now apply the Matrix Bernstein Inequality: for any $\tau>0$ ,

[TABLE]

We take the union bound over all $m_{k}=\binom{n-p-1}{k-1}\leq\binom{n}{k-1}\leq(\frac{ne}{k-1})^{k-1}$ sums to obtain

[TABLE]

Set $\tau=\beta_{k}(\frac{ne}{k-1})^{-(k-1)}$ (where, in case $k=1$ , we use $0^{0}=1$ ). Then

[TABLE]

Set

[TABLE]

Under the assumption that

[TABLE]

which is implied by the stated condition (12) on $k$ , it follows that

[TABLE]

and so we can continue to bound

[TABLE]

Thus, we conclude that for each $k=1,2,\dots,$ satisfying assumption (18), it holds that

[TABLE]

Recalling

[TABLE]

yields the result. ∎

4 Proof of Theorem 1

We can bound the error $\|\bm{Z}_{n}-\mathbb{E}[\bm{Z}_{n}]\|$ from Theorem 1 by combining Proposition 4 with Lemma 3.

Corollary 3.1.

Suppose that $L$ , $n,$ and $\delta\in(0,1/2]$ are such that

[TABLE]

Then with probability exceeding $1-2\delta$ ,

[TABLE]

Proof.

First, $\|\bm{Z}_{n}-\mathbb{E}[\bm{Z}_{n}]\|\leq\sum_{k=1}^{n}\left\lVert\bm{Z}_{n,k}-\mathbb{E}\left[\bm{Z}_{n,k}\right]\right\rVert$ by the triangle inequality. By Lemma 3,

[TABLE]

Now, given (19), we can apply Proposition 4 to each of $k=1,2,\dots,\lceil\log(n)\rceil$, and via the union bound, we obtain that the following holds with probability at least $1-\sum_{k=1}^{\lceil\log(n)\rceil}\delta^{k}\geq 1-\frac{\delta}{1-\delta}\geq 1-2\delta$:

[TABLE]

where in the final inequality, we use that $\left(\frac{eL}{x}\right)^{x}$ is maximized over $x>0$ at $x^{*}=L$ . We have the stated result.

∎
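Two elementary facts used in this proof are easy to sanity-check: the geometric sum $\sum_{k}\delta^{k}\leq\frac{\delta}{1-\delta}\leq 2\delta$ for $\delta\leq 1/2$, and the claim that $(eL/x)^{x}$ peaks at $x=L$ with value $e^{L}$. The sketch below uses hypothetical values of $\delta$, $n$ and $L$, and a crude grid search in place of calculus:

```python
import math

def failure_probability(delta, n):
    """Union-bound failure probability sum_{k=1}^{ceil(log n)} delta^k."""
    k_max = math.ceil(math.log(n))
    return sum(delta ** k for k in range(1, k_max + 1))

def peak_of_power(L, grid=10_000):
    """Grid search for the maximizer of (e L / x)^x over x in (0, 4L]."""
    xs = [4.0 * L * (i + 1) / grid for i in range(grid)]
    return max(xs, key=lambda x: (math.e * L / x) ** x)

delta, n, L = 0.5, 10_000, 2.0
total = failure_probability(delta, n)
x_star = peak_of_power(L)
peak_value = (math.e * L / x_star) ** x_star
print(total, x_star, peak_value, math.exp(L))
```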

Proof of Theorem 1 from Corollary 3.1. Write $\|\bm{Z}_{n}-e^{\bm{X}}\|\leq\|\bm{Z}_{n}-\mathbb{E}[\bm{Z}_{n}]\|+\|\mathbb{E}[\bm{Z}_{n}]-e^{\bm{X}}\|$ . Bound $\|\bm{Z}_{n}-\mathbb{E}[\bm{Z}_{n}]\|$ using Corollary 3.1 and bound $\|\mathbb{E}[\bm{Z}_{n}]-e^{\bm{X}}\|=\|(I+\frac{1}{n}\bm{X})^{n}-e^{\bm{X}}\|$ using Lemma 2 to arrive at the statement of Theorem 1.

5 Conclusion and Future Directions

We derived a large deviations bound for the convergence rate of a certain type of product of random matrices toward its limiting distribution. Our results are quite general and nearly sharp with respect to dependence on the matrix size $d$ and number of terms in the product, $n$ .

One particularly immediate application of our rates of convergence is in the analysis of random matrix products arising in stochastic iterative algorithms such as Oja’s algorithm for streaming principal component analysis [Oja82]. One area of future work would be to use our results to derive convergence rates for Oja’s method using minimal assumptions, an area of ongoing research (see, for example, [AL16, JJK+16]). This is particularly important because of the fundamental role streaming PCA plays in high-dimensional data analysis.
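To show how such products arise, here is a minimal sketch of the update commonly referred to as Oja's algorithm, $w\leftarrow(\bm{I}+\eta\,xx^{T})w$ followed by normalization. Unrolling the loop gives a product of the form studied here with $\bm{X}_{k}=x_{k}x_{k}^{T}$ and step $\eta=1/n$. The data distribution, starting vector and step size below are assumptions chosen for illustration, not recommendations from the paper:

```python
import math
import random

def oja_top_direction(samples, step):
    """Apply w <- normalize((I + step * x x^T) w) once per sample."""
    d = len(samples[0])
    w = [1.0 / math.sqrt(d)] * d  # hypothetical deterministic starting vector
    for x in samples:
        inner = sum(x_i * w_i for x_i, w_i in zip(x, w))  # x^T w
        w = [w_i + step * inner * x_i for w_i, x_i in zip(w, x)]
        norm = math.sqrt(sum(w_i * w_i for w_i in w))
        w = [w_i / norm for w_i in w]
    return w

rng = random.Random(0)
n = 2000
half_width = math.sqrt(3.0)  # Uniform(-sqrt(3), sqrt(3)) has unit variance
# Bounded samples with covariance diag(9, 1); the top eigenvector is (1, 0).
samples = [[3.0 * rng.uniform(-half_width, half_width),
            rng.uniform(-half_width, half_width)] for _ in range(n)]
w = oja_top_direction(samples, step=1.0 / n)
print(w)
```

Normalizing at every step does not change the final direction; it only keeps the iterates at unit length.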

Appendix A Proof of Lemma 3

See Lemma 3.

Proof.

We have that

[TABLE]

Hence it remains to show that

[TABLE]

Let $k_{0}=\lceil\log(n)\rceil$ in the remainder. First, we observe that $\left(\frac{Le}{k_{0}}\right)^{k_{0}}\leq\frac{Le}{n}$ :

[TABLE]

Since $k_{0}\geq\log(n),$ it suffices to show that

[TABLE]

We consider two cases:

Case 1: If $L\leq\frac{1}{e}$, then $(k_{0}-1)\log(Le)\leq(k_{0}-1)\log(1)=0.$ Thus we require $-k_{0}(\log(k_{0})-1)\leq 0$. This clearly holds because $k_{0}\geq e$.

Case 2: If $L>\frac{1}{e}$ , then $\log(Le)>\log(1)=0\Rightarrow-\log(Le)<0$ . Since $k_{0}\geq Le^{2},$ it follows that

[TABLE]

Now, for each $k\geq k_{0}$ ,

[TABLE]

By induction, it follows that

[TABLE]

Hence,

[TABLE]

∎

Appendix B Proof of Lemma 2

See Lemma 2.

Proof.

The proof uses only basic analytic tools and inequalities. Recall the matrix exponential: $e^{\bm{X}}:=\sum_{k=0}^{\infty}\frac{\bm{X}^{k}}{k!}$ . Let $\sigma=\|\bm{X}\|$ . Then we have

[TABLE]

where in the final inequality, we used that $\log(1+\sigma/n)\geq\frac{\sigma}{n}-\frac{1}{2}\left(\frac{\sigma}{n}\right)^{2}$ . Thus, using also that $e^{-x}\geq 1-x$ for all $x>0,$

[TABLE]

∎

Bibliography

[AL16] Z. Allen-Zhu and Y. Li. First efficient convergence for streaming k-PCA: a global, gap-free, and near-optimal rate. arXiv e-prints, July 2016.
[AW03] R. Ahlswede and A. Winter. Strong converse for identification via quantum channels. IEEE Transactions on Information Theory, 48:569–579, 2003.
[AZL17] Z. Allen-Zhu and Y. Li. First efficient convergence for streaming k-PCA: a global, gap-free, and near-optimal rate. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 487–492. IEEE, 2017.
[Bar75] Z. Baranyai. On the factorization of the complete uniform hypergraph. In Infinite and finite sets: To Paul Erdös on his 60th birthday, volume 10 of Colloquia mathematica Societatis János Bolyai, pages 91–108. North-Holland Publishing Company, 1975.
[BDF13] A. Balsubramani, S. Dasgupta, and Y. Freund. The fast convergence of incremental PCA. Advances in Neural Information Processing Systems (NIPS), pages 3174–3182, 2013.
[Ber84] M. Berger. Central limit theorem for products of random matrices. Transactions of the American Mathematical Society, 285:777–803, 1984.
[BQ16] Y. Benoist and J.-F. Quint. Random walks on reductive groups, volume 62 of Results in Mathematics and Related Areas. Springer, 2016.
[EH18] J. Emme and P. Hubert. Limit laws for random matrix products. Mathematical Research Letters, 25, 2018.
