Harmonic Means of Wishart Random Matrices

Asad Lodhia

arXiv:1905.02357·math.PR·June 21, 2019

TL;DR

This paper analyzes the spectral properties of the harmonic mean of Wishart matrices using free probability, revealing how it compares to the arithmetic mean in operator norm for different sample sizes.

Contribution

It introduces a free probability approach to characterize the harmonic mean of Wishart matrices and uncovers a size-dependent norm closeness phenomenon.

Findings

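01. The harmonic mean is closer in operator norm to the expectation than the arithmetic mean for small $n$.

02. The advantage depends on the number of matrices $n$ and disappears once $n$ exceeds a threshold $n^{*}(\gamma)$.

03. Some results extend to the case where the expectation is not the identity matrix.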

Abstract

We use free probability to compute the limiting spectral properties of the harmonic mean of $n$ i.i.d. Wishart random matrices $W_{i}$ whose limiting aspect ratio is $\gamma \in (0, 1)$ when $E[W_{i}] = I$. We demonstrate an interesting phenomenon where the harmonic mean $H$ of the $n$ Wishart matrices is closer in operator norm to $E[W_{i}]$ than the arithmetic mean $A$ for small $n$, after which the arithmetic mean is closer. We also prove some results for the general case where the expectation of the Wishart matrices is not the identity matrix.

Equations

\mathbf{W}_{i}:=\frac{\mathbf{X}_{i}\mathbf{X}_{i}^{*}}{N}

\mathbf{A}:=\frac{\sum_{i=1}^{n}\mathbf{W}_{i}}{n}.

\mathbf{A}=\frac{\big[\mathbf{X}_{1},\cdots,\mathbf{X}_{n}\big]\big[\mathbf{X}_{1},\cdots,\mathbf{X}_{n}\big]^{*}}{Nn}.

\rho_{\mathrm{MP},\gamma}(x):=\frac{\sqrt{\big((1+\sqrt{\gamma})^{2}-x\big)\big(x-(1-\sqrt{\gamma})^{2}\big)}}{2\pi\gamma x}\mathbf{1}_{\big[(1-\sqrt{\gamma})^{2},(1+\sqrt{\gamma})^{2}\big]}(x).

\|\mathbf{W}_{i}-\mathbf{I}\|\to\gamma+2\sqrt{\gamma}\quad\text{a.s.},

\|\mathbf{A}-\mathbf{I}\|\to\frac{\gamma}{n}+2\sqrt{\frac{\gamma}{n}}\quad\text{a.s.}

\mathbf{M}_{1}\preceq\mathbf{M}_{2}

k\Big(\mathbf{M}_{1}^{-1}+\cdots+\mathbf{M}_{k}^{-1}\Big)^{-1}\preceq\frac{\mathbf{M}_{1}+\cdots+\mathbf{M}_{k}}{k}.

k\Big(\mathbf{M}_{1}^{-1}+\cdots+\mathbf{M}_{k}^{-1}\Big)^{-1},

\mathbf{H}:=n\bigg(\sum_{i=1}^{n}\mathbf{W}_{i}^{-1}\bigg)^{-1}.

\mathbb{E}\big[\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{W}_{i}\mathbf{\Sigma}^{\frac{1}{2}}\big]=\mathbf{\Sigma}.

\bigg|\frac{P}{N}-\gamma\bigg|\leq\frac{K}{P^{2}},

\frac{n}{2\pi\gamma x}\sqrt{(e_{+}-x)(x-e_{-})}\,\mathbf{1}_{[e_{-},e_{+}]}(x),

e_{\pm}:=1-\gamma+\frac{2\gamma}{n}\pm 2\sqrt{\frac{\gamma}{n}}\sqrt{1-\gamma+\frac{\gamma}{n}}.

\lim_{P,N\to\infty}\|\mathbf{H}-\mathbf{I}\|=\gamma-\frac{2\gamma}{n}+2\sqrt{\frac{\gamma}{n}}\sqrt{1-\gamma+\frac{\gamma}{n}}\quad\text{a.s.}

\lim_{P,N\to\infty}\|\mathbf{H}-\mathbf{I}\|=\gamma-\frac{2\gamma}{n}+2\sqrt{\frac{\gamma}{n}}\sqrt{1-\gamma+\frac{\gamma}{n}}<\frac{\gamma}{n}+2\sqrt{\frac{\gamma}{n}}=\lim_{P,N\to\infty}\|\mathbf{A}-\mathbf{I}\|,

\sqrt{2\gamma}\sqrt{1-\frac{\gamma}{2}}<\frac{\gamma}{2}+\sqrt{2\gamma},

\lim_{P,N\to\infty}\|\mathbf{H}-\mathbf{I}\|=1-e_{-}

\mathbb{E}[\exp(tX)]\leq\exp\bigg(\frac{\sigma^{2}t^{2}}{2}\bigg)

\bigg|\frac{P}{N}-\gamma\bigg|\leq\frac{K}{P^{2}},

\limsup_{P,N\to\infty}\frac{\|\mathbf{\Sigma}\|\|\mathbf{\Sigma}^{-1}\|\|\mathbf{H}-\mathbf{I}\|}{\|\mathbf{A}-\mathbf{I}\|}<1,

\limsup_{P,N\to\infty}\frac{\big\|\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{H}\mathbf{\Sigma}^{\frac{1}{2}}-\mathbf{\Sigma}\big\|}{\big\|\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{A}\mathbf{\Sigma}^{\frac{1}{2}}-\mathbf{\Sigma}\big\|}<1\quad\text{a.s.}

\big\|\mathbf{\Sigma}^{\frac{1}{2}}\big\|^{2}=\|\mathbf{\Sigma}\|\quad\text{and}\quad\big\|\mathbf{\Sigma}^{-\frac{1}{2}}\big\|^{2}=\big\|\mathbf{\Sigma}^{-1}\big\|.

\big\|\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{H}\mathbf{\Sigma}^{\frac{1}{2}}-\mathbf{\Sigma}\big\|\leq\|\mathbf{\Sigma}\|\|\mathbf{H}-\mathbf{I}\|\leq\|\mathbf{\Sigma}\|\big\|\mathbf{\Sigma}^{-1}\big\|\frac{\|\mathbf{H}-\mathbf{I}\|}{\|\mathbf{A}-\mathbf{I}\|}\big\|\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{A}\mathbf{\Sigma}^{\frac{1}{2}}-\mathbf{\Sigma}\big\|,

\frac{\big\|\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{H}\mathbf{\Sigma}^{\frac{1}{2}}-\mathbf{\Sigma}\big\|}{\big\|\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{A}\mathbf{\Sigma}^{\frac{1}{2}}-\mathbf{\Sigma}\big\|}\leq\frac{\|\mathbf{\Sigma}\|\|\mathbf{\Sigma}^{-1}\|\|\mathbf{H}-\mathbf{I}\|}{\|\mathbf{A}-\mathbf{I}\|},

\lim_{P,N\to\infty}\frac{\|\mathbf{\Sigma}\|\|\mathbf{\Sigma}^{-1}\|\|\mathbf{H}-\mathbf{I}\|}{\|\mathbf{A}-\mathbf{I}\|}=c\bigg(\frac{\sqrt{1-\frac{\gamma}{2}}}{1+\frac{1}{2}\sqrt{\frac{\gamma}{2}}}\bigg)<1,

c<\frac{5}{4}\sqrt{\frac{4}{3}}\approx 1.44337567\dots.

m_{\mathfrak{s}\mathfrak{h}}(z)=\int_{\mathbb{R}^{+}}\frac{\operatorname{d\!}F(x)}{z-x\big(\frac{\gamma zm_{\mathfrak{s}\mathfrak{h}}(z)}{n}+1-\gamma\big)},

m_{\mathfrak{e}}(z)=\int_{\mathbb{R}^{+}}\frac{\operatorname{d\!}F(x)}{z-\frac{x}{S_{\breve{\mathfrak{h}}}(zm_{\mathfrak{e}}(z)-1)}},

\frac{\gamma z}{n}S_{\breve{\mathfrak{h}}}(z)^{2}+\gamma\bigg(\frac{1+z}{n}-1\bigg)S_{\breve{\mathfrak{h}}}(z)-1=0.

Full text

Harmonic Means of Wishart Random Matrices

Asad Lodhia

256 West Hall, 1085 South University Avenue, Ann Arbor MI, 48109-1107

Abstract.

We use free probability to compute the limiting spectral properties of the harmonic mean of $n$ i.i.d. Wishart random matrices $\mathbf{W}_{i}$ whose limiting aspect ratio is $\gamma\in(0,1)$ when $\mathbb{E}[\mathbf{W}_{i}]=\mathbf{I}$. We demonstrate an interesting phenomenon where the harmonic mean $\mathbf{H}$ of the $n$ Wishart matrices is closer in operator norm to $\mathbb{E}[\mathbf{W}_{i}]$ than the arithmetic mean $\mathbf{A}$ for small $n$, after which the arithmetic mean is closer. We also prove some results for the general case where the expectation of the Wishart matrices is not the identity matrix.

1. Introduction

Positive definite random matrices are often studied in probability theory and statistics. The most famous (and arguably most widely used) matrix model supported on the set of positive semidefinite matrices is the Wishart ensemble. Let $\{\mathbf{X}_{i}\}_{i=1}^{n}$ be a sequence of centered independent identically distributed matrices that have dimension $P\times N$ whose entries have at least two finite moments. Suppose each column of $\mathbf{X}_{i}$ is an independent $P$ -dimensional random vector. The matrices

$$\mathbf{W}_{i}:=\frac{\mathbf{X}_{i}\mathbf{X}_{i}^{*}}{N}$$

are called Wishart matrices. If the columns of each $\mathbf{X}_{i}$ are i.i.d. observations from a Gaussian distribution it suffices to specify their covariance matrix $\mathbf{\Sigma}=\mathbb{E}[\mathbf{W}_{i}]$ to obtain their distribution. In statistics the estimation of such a covariance matrix is a fundamental task. Our interest in this paper will be the mathematical study of estimates in operator norm of the covariance in the high-dimensional regime $\frac{P}{N}\to\gamma\in(0,1)$ .

The notational choice in the previous paragraph may seem odd to the reader. If the columns of $\mathbf{X}_{i}$ are drawn i.i.d., one may combine them, say by computing the arithmetic mean

$$\mathbf{A}:=\frac{\sum_{i=1}^{n}\mathbf{W}_{i}}{n}.$$

This reduces the variance by a factor of $n^{-1}$ and is equivalent to adjoining the columns of the $\mathbf{X}_{i}$ into a single $P$ -by- $Nn$ matrix, since

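$$\mathbf{A}=\frac{\big[\mathbf{X}_{1},\cdots,\mathbf{X}_{n}\big]\big[\mathbf{X}_{1},\cdots,\mathbf{X}_{n}\big]^{*}}{Nn}.$$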
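The concatenation identity can be checked by hand on a tiny example. The sketch below is only an illustration: the two $2\times 3$ real blocks `X1` and `X2` and all helper names are made up here, and exact rational arithmetic is used so that the comparison is not affected by rounding. It confirms that averaging the $\mathbf{W}_{i}$ gives the same matrix as the sample covariance of the adjoined $P$-by-$Nn$ matrix.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def scale(a, c):
    return [[c * x for x in row] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Two hypothetical 2-by-3 real data blocks (P = 2, N = 3, n = 2).
X1 = [[1, 2, 0], [0, 1, -1]]
X2 = [[2, -1, 1], [1, 0, 3]]
P, N, n = 2, 3, 2

# Arithmetic mean of the individual matrices W_i = X_i X_i^T / N.
W1 = scale(matmul(X1, transpose(X1)), Fraction(1, N))
W2 = scale(matmul(X2, transpose(X2)), Fraction(1, N))
A_mean = scale(add(W1, W2), Fraction(1, n))

# Sample covariance of the adjoined P-by-(N n) matrix [X1, X2].
X = [r1 + r2 for r1, r2 in zip(X1, X2)]
A_concat = scale(matmul(X, transpose(X)), Fraction(1, N * n))

print(A_mean == A_concat)  # prints True
```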

In the regime where $P/N\to\gamma\in(0,1)$ the sample covariance matrix $\mathbf{W}_{i}$ does not converge to its expected value $\mathbf{\Sigma}$ . Instead, when $\mathbb{E}\mathbf{W}_{i}=\mathbf{I}$ , the spectral measure of each $\mathbf{W}_{i}$ satisfies the Marčenko-Pastur Law with parameter $\gamma$ :

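$$\rho_{\mathrm{MP},\gamma}(x):=\frac{\sqrt{\big((1+\sqrt{\gamma})^{2}-x\big)\big(x-(1-\sqrt{\gamma})^{2}\big)}}{2\pi\gamma x}\mathbf{1}_{\big[(1-\sqrt{\gamma})^{2},(1+\sqrt{\gamma})^{2}\big]}(x). \tag{2}$$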
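As a quick sanity check, the Marčenko-Pastur density $\rho_{\mathrm{MP},\gamma}(x)=\sqrt{((1+\sqrt{\gamma})^{2}-x)(x-(1-\sqrt{\gamma})^{2})}/(2\pi\gamma x)$ on $[(1-\sqrt{\gamma})^{2},(1+\sqrt{\gamma})^{2}]$ can be integrated numerically. The sketch below is an illustration rather than part of the paper: the function names, the midpoint rule and the choice $\gamma=1/2$ are all assumptions made here. Both the total mass and the mean should come out close to $1$, the latter matching $\mathbb{E}\mathbf{W}_{i}=\mathbf{I}$.

```python
import math

def mp_density(x, gamma):
    """Marchenko-Pastur density with parameter gamma in (0, 1)."""
    lo = (1 - math.sqrt(gamma)) ** 2
    hi = (1 + math.sqrt(gamma)) ** 2
    if x <= lo or x >= hi:
        return 0.0
    return math.sqrt((hi - x) * (x - lo)) / (2 * math.pi * gamma * x)

def midpoint_integral(f, a, b, steps=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

gamma = 0.5
lo = (1 - math.sqrt(gamma)) ** 2
hi = (1 + math.sqrt(gamma)) ** 2

total_mass = midpoint_integral(lambda x: mp_density(x, gamma), lo, hi)
mean = midpoint_integral(lambda x: x * mp_density(x, gamma), lo, hi)
print(total_mass, mean)  # both values should be close to 1
```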

In fact, under sufficient moment conditions [16], we have the stronger result that

$$\|\mathbf{W}_{i}-\mathbf{I}\|\to\gamma+2\sqrt{\gamma}\quad\text{a.s.},$$

where $\|\mathbf{M}\|$ represents the operator norm of the matrix $\mathbf{M}$. It is important to note here that the value of the operator norm in this particular case is due to the right edge of the spectrum of the Marčenko-Pastur Law. Subtraction of the matrix $\mathbf{I}$ shifts all of the eigenvalues of $\mathbf{W}_{i}$ exactly by one and the eigenvalue with largest absolute value is at the right edge of the spectrum. Heuristically, our error is due to overestimating the largest eigenvalue. When we average the $\mathbf{W}_{i}$, the resulting operator norm bound becomes

$$\|\mathbf{A}-\mathbf{I}\|\to\frac{\gamma}{n}+2\sqrt{\frac{\gamma}{n}}\quad\text{a.s.}$$

The above limit follows from our interpretation of the arithmetic mean as a sample covariance matrix with aspect ratio $P/Nn\to\gamma/n$ . Notice that the change in the operator norm error is not simply a rescaling by $n^{-1/2}$ , even though the entrywise variance has changed by $n^{-1}$ . The purpose of this paper is to explore an alternative to the arithmetic mean that takes into account the positive definiteness of $\mathbf{W}_{i}$ when $P<N$ .
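To see concretely that the limit $\gamma/n+2\sqrt{\gamma/n}$ is not an $n^{-1/2}$ rescaling of the single-matrix limit $\gamma+2\sqrt{\gamma}$, one can tabulate both. The snippet below is a small illustration with hypothetical function names and an arbitrary choice $\gamma=1/2$; it only evaluates the two closed-form expressions and does not simulate any matrices.

```python
import math

def arithmetic_limit(gamma, n):
    """Limit of ||A - I||: gamma/n + 2*sqrt(gamma/n)."""
    r = gamma / n
    return r + 2 * math.sqrt(r)

gamma = 0.5
single = arithmetic_limit(gamma, 1)  # limit of ||W_i - I||

for n in (1, 2, 4, 16):
    exact = arithmetic_limit(gamma, n)
    naive = single / math.sqrt(n)  # what a pure n^{-1/2} rescaling would give
    print(n, round(exact, 4), round(naive, 4))
```

For $n>1$ the exact limit is strictly smaller than the naive rescaling, because the term $\gamma/n$ decays faster than $\gamma/\sqrt{n}$.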

The space of positive definite matrices is a cone and has a natural partial ordering. When $\mathbf{M}_{1}$ and $\mathbf{M}_{2}$ are $P\times P$ positive semidefinite matrices, we say

$$\mathbf{M}_{1}\preceq\mathbf{M}_{2}$$

if and only if $\mathbf{M}_{2}-\mathbf{M}_{1}$ is positive semidefinite. Under this ordering one can show various generalizations of classical inequalities. Of particular interest in this paper, if $\mathbf{M}_{1},\ldots,\mathbf{M}_{k}$ are positive definite (and therefore invertible), the classical arithmetic mean-harmonic mean (AMHM) inequality generalizes as [9, Theorem 1]

$$k\Big(\mathbf{M}_{1}^{-1}+\cdots+\mathbf{M}_{k}^{-1}\Big)^{-1}\preceq\frac{\mathbf{M}_{1}+\cdots+\mathbf{M}_{k}}{k}. \tag{3}$$

The matrix on the left,

$$k\Big(\mathbf{M}_{1}^{-1}+\cdots+\mathbf{M}_{k}^{-1}\Big)^{-1},$$

is the harmonic mean of $\mathbf{M}_{1}$ , $\ldots$ , $\mathbf{M}_{k}$ . This paper shows that $\mathbf{A}$ can give worse estimates in operator norm than the matrix harmonic mean

$$\mathbf{H}:=n\bigg(\sum_{i=1}^{n}\mathbf{W}_{i}^{-1}\bigg)^{-1}.$$

When $\mathbb{E}[\mathbf{W}_{i}]=\mathbf{I}$, we show that for any $\gamma\in(0,1)$ the limiting operator norm of $\mathbf{H}-\mathbf{I}$ is always smaller than that of $\mathbf{A}-\mathbf{I}$ when $n=2$. For general $n\geq 2$, this advantage disappears when $n$ exceeds a critical value $n^{*}(\gamma)$ that is a function only of $\gamma$.

A heuristic explanation of this result is the AMHM inequality $\mathbf{H}\preceq\mathbf{A}$ . We know from our discussion above that $\mathbf{A}$ is, in some sense, an overestimate of its expectation $\mathbf{I}$ . By taking a matrix smaller in the positive definite cone, we are compensating for this overestimation. As will be shown below, $\|\mathbf{H}-\mathbf{I}\|$ will be the absolute value of the smallest eigenvalue of $\mathbf{H}-\mathbf{I}$ , so $\mathbf{H}$ underestimates $\mathbf{I}$ . When $n$ is large, the spectral measure of $\mathbf{H}$ approaches a point mass at $(1-\gamma)$ whereas the spectral measure of $\mathbf{A}$ approaches a point mass at $1$ (the spectral measure of $\mathbf{I}$ ). This explains why eventually, for $n$ large enough, $\mathbf{A}$ is a better estimate.
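The ordering $\mathbf{H}\preceq\mathbf{A}$ can be verified exactly on a small example. The following sketch uses two arbitrary $2\times 2$ positive definite matrices (`M1`, `M2` and the helpers are illustrative choices, not taken from the paper) and checks that $\mathbf{A}-\mathbf{H}$ is positive semidefinite, using the fact that a symmetric $2\times 2$ matrix is positive semidefinite exactly when its trace and determinant are non-negative.

```python
from fractions import Fraction

def inv2(m):
    """Inverse of a 2-by-2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def add2(m1, m2):
    return tuple(tuple(x + y for x, y in zip(r1, r2)) for r1, r2 in zip(m1, m2))

def scale2(m, s):
    return tuple(tuple(s * x for x in row) for row in m)

def is_psd2(m):
    """A symmetric 2-by-2 matrix is PSD iff its trace and determinant are >= 0."""
    (a, b), (c, d) = m
    return a + d >= 0 and a * d - b * c >= 0

F = Fraction
M1 = ((F(2), F(1)), (F(1), F(3)))    # positive definite: trace 5, determinant 5
M2 = ((F(4), F(-1)), (F(-1), F(1)))  # positive definite: trace 5, determinant 3

A = scale2(add2(M1, M2), F(1, 2))                    # arithmetic mean
H = scale2(inv2(add2(inv2(M1), inv2(M2))), F(2))     # harmonic mean
gap = add2(A, scale2(H, F(-1)))                      # A - H

print(is_psd2(gap))  # prints True
```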

The analysis presented in this paper is complete for the case where $\mathbb{E}[\mathbf{W}_{i}]=\mathbf{I}$ but we will be able to comment on Wishart matrices with general non-singular covariance, by the observation that if $\mathbb{E}[\mathbf{W}_{i}]=\mathbf{I}$ , then

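$$\mathbb{E}\big[\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{W}_{i}\mathbf{\Sigma}^{\frac{1}{2}}\big]=\mathbf{\Sigma}.$$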

This fact implies that for both the arithmetic and harmonic mean we simply need to multiply on both sides by $\mathbf{\Sigma}^{1/2}$ to get the arithmetic and harmonic mean of a Wishart matrix with a general covariance $\mathbf{\Sigma}$ . With some conditions on $\mathbf{\Sigma}$ , we can ensure that the result sketched above still holds in this more general case.

Notation

In this paper $\mathbf{I}$ will be the identity matrix; its dimension will be clear from the context. For a matrix $\mathbf{M}$, $\|\mathbf{M}\|$ will always denote its operator norm and $\mathbf{M}^{*}$ its conjugate transpose. Given a set $A$, the function $\mathbf{1}_{A}(x)$ is the indicator function associated to that set. For a unital $C^{*}$-algebra $\mathcal{A}$, the norm will be denoted $\|\cdot\|_{\mathcal{A}}$, the unit element will be denoted $1_{\mathcal{A}}$ and $*$ will denote the involution.

Acknowledgements

We are grateful to Alice Guionnet, Elizaveta Levina and Jinho Baik for their helpful comments and suggestions. We are also extremely grateful to Keith Levin for reading earlier drafts of the paper and providing helpful comments. This research was supported through NSF Grant DMS-1646108.

2. Results and Outline

For what follows, we will make the following assumption on the matrices $\mathbf{X}_{i}$ that generate $\mathbf{W}_{i}$ . We need these assumptions primarily due to our application of Theorem 4.1.

Definition 1 (Matrix Model).

The matrices $\{\mathbf{X}_{i}\}_{i=1}^{n}$ are $P$ by $N$, their entries are i.i.d. standard complex Gaussians (a standard complex Gaussian is of the form $\frac{Z_{1}+\sqrt{-1}Z_{2}}{\sqrt{2}}$ where $Z_{1}$ and $Z_{2}$ are independent standard real Gaussian random variables), and

$$\bigg|\frac{P}{N}-\gamma\bigg|\leq\frac{K}{P^{2}},$$

where $K>0$ and $\gamma\in(0,1)$ are constants that do not depend on $P$, $N$ or $n$. For each $i=1,2,\dots,n$, define $\mathbf{W}_{i}=N^{-1}\mathbf{X}_{i}\mathbf{X}_{i}^{*}$.

We will prove the following result, which shows that the harmonic mean of Wishart random matrices can be closer in operator norm to the true covariance than the arithmetic mean is. See Figure 1 for a simulation.

Theorem 2.1.

Let $\mathbf{W}_{1},\ldots,\mathbf{W}_{n}$ satisfy Definition 1. Then for each fixed $n\geq 2$, the spectral measure of $\mathbf{H}$ converges weakly almost surely to the measure with density

$$\frac{n}{2\pi\gamma x}\sqrt{(e_{+}-x)(x-e_{-})}\,\mathbf{1}_{[e_{-},e_{+}]}(x),$$

where

$$e_{\pm}:=1-\gamma+\frac{2\gamma}{n}\pm 2\sqrt{\frac{\gamma}{n}}\sqrt{1-\gamma+\frac{\gamma}{n}}.$$

Further, we have the convergence:

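$$\lim_{P,N\to\infty}\|\mathbf{H}-\mathbf{I}\|=\gamma-\frac{2\gamma}{n}+2\sqrt{\frac{\gamma}{n}}\sqrt{1-\gamma+\frac{\gamma}{n}}\quad\text{a.s.}$$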
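The quantities in Theorem 2.1 are easy to evaluate numerically. The sketch below is an illustration under stated assumptions: it takes the edges to be $e_{\pm}=1-\gamma+\frac{2\gamma}{n}\pm 2\sqrt{\gamma/n}\sqrt{1-\gamma+\gamma/n}$ and the density to be $\frac{n}{2\pi\gamma x}\sqrt{(e_{+}-x)(x-e_{-})}$ on $[e_{-},e_{+}]$, and the function names, the midpoint rule and the values $\gamma=1/2$, $n=3$ are choices made here. It checks that the density has total mass close to $1$ and prints $1-e_{-}$, the limiting value of $\|\mathbf{H}-\mathbf{I}\|$.

```python
import math

def harmonic_edges(gamma, n):
    """Edges (e_minus, e_plus) of the limiting spectrum of H."""
    r = gamma / n
    centre = 1 - gamma + 2 * r
    half_width = 2 * math.sqrt(r) * math.sqrt(1 - gamma + r)
    return centre - half_width, centre + half_width

def harmonic_density(x, gamma, n):
    """Limiting spectral density of H at the point x."""
    lo, hi = harmonic_edges(gamma, n)
    if x <= lo or x >= hi:
        return 0.0
    return n * math.sqrt((hi - x) * (x - lo)) / (2 * math.pi * gamma * x)

gamma, n, steps = 0.5, 3, 20000
lo, hi = harmonic_edges(gamma, n)
h = (hi - lo) / steps

# Composite midpoint rule for the total mass of the density.
mass = h * sum(harmonic_density(lo + (i + 0.5) * h, gamma, n) for i in range(steps))
print(mass, 1 - lo)
```

With $n=1$ the edges reduce to $(1\pm\sqrt{\gamma})^{2}$, the edges of the Marčenko-Pastur law, which is a useful consistency check on the formula.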

Remark 1.

Note that for small $n$

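$$\lim_{P,N\to\infty}\|\mathbf{H}-\mathbf{I}\|=\gamma-\frac{2\gamma}{n}+2\sqrt{\frac{\gamma}{n}}\sqrt{1-\gamma+\frac{\gamma}{n}}<\frac{\gamma}{n}+2\sqrt{\frac{\gamma}{n}}=\lim_{P,N\to\infty}\|\mathbf{A}-\mathbf{I}\|,$$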

which is lost after $n$ exceeds a threshold $n^{*}(\gamma)$ . Indeed, the inequality is always true for $n=2$ , where it reads:

$$\sqrt{2\gamma}\sqrt{1-\frac{\gamma}{2}}<\frac{\gamma}{2}+\sqrt{2\gamma},$$

see Figure 2 for a comparison of these functions, and notice the improvement of $\mathbf{H}$ is larger as $\gamma$ gets closer to 1. Observe that as $n\to\infty$ , $e_{\pm}$ converge to $1-\gamma$ which suggests that $\mathbf{H}$ is somehow “shrunken” compared to $\mathbf{A}$ . As mentioned in the Introduction,

$$\lim_{P,N\to\infty}\|\mathbf{H}-\mathbf{I}\|=1-e_{-}$$

so $\mathbf{H}$ is off by the identity due to an “underestimate” of the operator norm.
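The crossover described in Remark 1 can be located numerically by comparing the two limits $\gamma-\frac{2\gamma}{n}+2\sqrt{\gamma/n}\sqrt{1-\gamma+\gamma/n}$ (harmonic) and $\gamma/n+2\sqrt{\gamma/n}$ (arithmetic). The sketch below is a rough illustration: the function names are hypothetical, and the upward scan assumes that once the harmonic mean stops being better it does not become better again, which the code does not prove. The value it returns is the last $n$ found by this scan and is offered as a stand-in for, not a definition of, $n^{*}(\gamma)$.

```python
import math

def harmonic_limit(gamma, n):
    """Limit of ||H - I|| from Theorem 2.1."""
    r = gamma / n
    return gamma - 2 * r + 2 * math.sqrt(r) * math.sqrt(1 - gamma + r)

def arithmetic_limit(gamma, n):
    """Limit of ||A - I||."""
    r = gamma / n
    return r + 2 * math.sqrt(r)

def last_n_where_harmonic_wins(gamma, n_max=10**6):
    """Scan n = 2, 3, ... and return the last n before the inequality first fails."""
    n = 2
    while n + 1 <= n_max and harmonic_limit(gamma, n + 1) < arithmetic_limit(gamma, n + 1):
        n += 1
    return n

for gamma in (0.1, 0.5, 0.9):
    print(gamma, last_n_where_harmonic_wins(gamma))
```

Since the harmonic limit tends to $\gamma$ while the arithmetic limit tends to $0$ as $n\to\infty$, the scan always terminates for $\gamma\in(0,1)$.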

Theorem 2.1 applies to matrices from Definition 1. For applications to statistics and other fields, it may be more desirable to have a model for general subgaussian real random matrices.

Definition 2 (Alternative Matrix Model).

The matrices $\{\mathbf{X}_{i}\}_{i=1}^{n}$ are $P$ by $N$, their entries are i.i.d. real subgaussian (a centered real subgaussian random variable $X$ is a random variable for which there exists a $\sigma>0$ such that

$\mathbb{E}[\exp(tX)]\leq\exp\bigg(\frac{\sigma^{2}t^{2}}{2}\bigg)$

for all $t\in\mathbb{R}$; the number $\sigma$ is often called the subgaussian parameter of $X$), and

$$\bigg|\frac{P}{N}-\gamma\bigg|\leq\frac{K}{P^{2}},$$

where $K>0$ and $\gamma\in(0,1)$ are constants that do not depend on $P$, $N$ or $n$. For each $i=1,2,\dots,n$, define $\mathbf{W}_{i}=N^{-1}\mathbf{X}_{i}\mathbf{X}_{i}^{*}$.

Some of the lemmas used to prove Theorem 2.1 carry over to the matrices in Definition 2. This strongly suggests that Theorem 2.1 should hold under more general assumptions on the matrix entries. See Remark 3 for a technical discussion that clarifies this possible extension.

Another natural question is whether the results above carry over to the case where $\mathbb{E}[\mathbf{W}_{i}]\neq\mathbf{I}$ . A simple submultiplicativity argument combined with the above Theorem gives the following result:

Corollary 2.1.1.

Assume $n\leq n^{*}(\gamma)$ . Let $\mathbf{\Sigma}$ be a sequence of deterministic $P\times P$ positive definite covariance matrices (with $P$ -dependence suppressed) such that

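$$\limsup_{P,N\to\infty}\frac{\|\mathbf{\Sigma}\|\|\mathbf{\Sigma}^{-1}\|\|\mathbf{H}-\mathbf{I}\|}{\|\mathbf{A}-\mathbf{I}\|}<1,$$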

then

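$$\limsup_{P,N\to\infty}\frac{\big\|\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{H}\mathbf{\Sigma}^{\frac{1}{2}}-\mathbf{\Sigma}\big\|}{\big\|\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{A}\mathbf{\Sigma}^{\frac{1}{2}}-\mathbf{\Sigma}\big\|}<1\quad\text{a.s.}$$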

Proof.

Since $\mathbf{\Sigma}$ are positive definite, we have

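$$\big\|\mathbf{\Sigma}^{\frac{1}{2}}\big\|^{2}=\|\mathbf{\Sigma}\|\quad\text{and}\quad\big\|\mathbf{\Sigma}^{-\frac{1}{2}}\big\|^{2}=\big\|\mathbf{\Sigma}^{-1}\big\|.$$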

Now, by submultiplicativity of the operator norm, it follows that

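$$\big\|\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{H}\mathbf{\Sigma}^{\frac{1}{2}}-\mathbf{\Sigma}\big\|\leq\|\mathbf{\Sigma}\|\|\mathbf{H}-\mathbf{I}\|\leq\|\mathbf{\Sigma}\|\big\|\mathbf{\Sigma}^{-1}\big\|\frac{\|\mathbf{H}-\mathbf{I}\|}{\|\mathbf{A}-\mathbf{I}\|}\big\|\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{A}\mathbf{\Sigma}^{\frac{1}{2}}-\mathbf{\Sigma}\big\|,$$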

since with probability one $\mathbf{A}\neq\mathbf{I}$ we know the quantity on the right is non-zero. Hence we can rearrange to obtain the inequality

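$$\frac{\big\|\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{H}\mathbf{\Sigma}^{\frac{1}{2}}-\mathbf{\Sigma}\big\|}{\big\|\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{A}\mathbf{\Sigma}^{\frac{1}{2}}-\mathbf{\Sigma}\big\|}\leq\frac{\|\mathbf{\Sigma}\|\|\mathbf{\Sigma}^{-1}\|\|\mathbf{H}-\mathbf{I}\|}{\|\mathbf{A}-\mathbf{I}\|},$$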

now taking the $\limsup$ of both sides yields the required result. ∎

Remark 2.

The quantity $\|\mathbf{\Sigma}\|\|\mathbf{\Sigma}^{-1}\|$ is the largest eigenvalue of $\mathbf{\Sigma}$ divided by the smallest eigenvalue of $\mathbf{\Sigma}$ . In applications, this is often called the condition number of $\mathbf{\Sigma}$ . Suppose that the limit of $\|\mathbf{\Sigma}\|\|\mathbf{\Sigma}^{-1}\|$ exists and is a constant $c\geq 1$ . Then, assuming $n=2$ for ease, under the assumptions of Theorem 2.1 our required inequality for the condition number is

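$$\lim_{P,N\to\infty}\frac{\|\mathbf{\Sigma}\|\|\mathbf{\Sigma}^{-1}\|\|\mathbf{H}-\mathbf{I}\|}{\|\mathbf{A}-\mathbf{I}\|}=c\bigg(\frac{\sqrt{1-\frac{\gamma}{2}}}{1+\frac{1}{2}\sqrt{\frac{\gamma}{2}}}\bigg)<1,$$

which is clearly non-vacuous; for instance, when $\gamma=\frac{1}{2}$ the inequality requires

$$c<\frac{5}{4}\sqrt{\frac{4}{3}}\approx 1.44337567\dots.$$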
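The admissible range of condition numbers for $n=2$ can be tabulated directly. Rearranging the requirement $c\,\sqrt{1-\gamma/2}\,/\,(1+\tfrac{1}{2}\sqrt{\gamma/2})<1$ gives $c<(1+\tfrac{1}{2}\sqrt{\gamma/2})/\sqrt{1-\gamma/2}$. The helper below (a hypothetical name, used only for illustration) evaluates that bound and reproduces the value $\frac{5}{4}\sqrt{\frac{4}{3}}$ at $\gamma=\frac{1}{2}$.

```python
import math

def condition_number_bound(gamma):
    """Upper bound on c for n = 2: (1 + 0.5*sqrt(gamma/2)) / sqrt(1 - gamma/2)."""
    return (1 + 0.5 * math.sqrt(gamma / 2)) / math.sqrt(1 - gamma / 2)

for gamma in (0.1, 0.5, 0.9):
    print(gamma, condition_number_bound(gamma))
```

The bound exceeds $1$ for every $\gamma\in(0,1)$, since its numerator is larger than $1$ and its denominator is smaller than $1$, so some non-identity covariances are always covered.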

In Section 6 we provide the following fixed point equation for the limiting Stieltjes transform of $\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{H}\mathbf{\Sigma}^{\frac{1}{2}}$ assuming that $\mathbf{\Sigma}$ and $\mathbf{H}$ as non-commutative random variables converge to a pair of freely independent random variables (see Section 3, Definition 3 and equation (7) for relevant definitions and terminology).

Theorem 2.2.

Suppose that $(\mathbf{H},\mathbf{\Sigma})$ as a pair of non-commutative random variables converge in the sense of distribution to a pair $(\mathfrak{h},\mathfrak{s})$ of non-commutative freely independent random variables with the law of $\mathfrak{h}$ being the spectral measure defined in Theorem 2.1 and the law of $\mathfrak{s}$ being the limiting spectral measure of $\mathbf{\Sigma}$ whose cdf we denote as $F$ . We assume $F$ is supported on the positive reals. Then we have the following limiting fixed point equation for the Stieltjes transform of $\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{H}\mathbf{\Sigma}^{\frac{1}{2}}$ , which we denote $m_{\mathfrak{s}\mathfrak{h}}(z)$

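$$m_{\mathfrak{s}\mathfrak{h}}(z)=\int_{\mathbb{R}^{+}}\frac{\operatorname{d\!}F(x)}{z-x\big(\frac{\gamma zm_{\mathfrak{s}\mathfrak{h}}(z)}{n}+1-\gamma\big)},$$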

and the limiting fixed point equation for the Stieltjes transform of $\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{H}\mathbf{\Sigma}^{\frac{1}{2}}-\mathbf{\Sigma}$ , which we denote as $m_{\mathfrak{e}}(z)$ , is

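$$m_{\mathfrak{e}}(z)=\int_{\mathbb{R}^{+}}\frac{\operatorname{d\!}F(x)}{z-\frac{x}{S_{\breve{\mathfrak{h}}}(zm_{\mathfrak{e}}(z)-1)}},$$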

where $S_{\breve{\mathfrak{h}}}(z)$ is the $S$ -transform of $\mathfrak{h}-1_{\mathcal{F}}$ which satisfies the quadratic:

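$$\frac{\gamma z}{n}S_{\breve{\mathfrak{h}}}(z)^{2}+\gamma\bigg(\frac{1+z}{n}-1\bigg)S_{\breve{\mathfrak{h}}}(z)-1=0.$$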
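The quadratic $\frac{\gamma z}{n}S^{2}+\gamma\big(\frac{1+z}{n}-1\big)S-1=0$ for the $S$-transform can be solved in closed form for any fixed $z\neq 0$. The sketch below simply returns both roots and checks them against the equation; the function names and the sample point are assumptions made here, and the code makes no claim about which root is the branch that equals $S_{\breve{\mathfrak{h}}}(z)$, since that choice is not determined by the quadratic alone.

```python
import cmath

def s_transform_candidates(z, gamma, n):
    """Both roots S of (gamma*z/n) S^2 + gamma*((1+z)/n - 1) S - 1 = 0, for z != 0."""
    a = gamma * z / n
    b = gamma * ((1 + z) / n - 1)
    c = -1
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

def residual(s, z, gamma, n):
    """Left-hand side of the quadratic evaluated at s."""
    return gamma * z / n * s * s + gamma * ((1 + z) / n - 1) * s - 1

z0 = 0.3 + 0.2j  # arbitrary sample point in the complex plane
roots = s_transform_candidates(z0, gamma=0.5, n=2)
print([abs(residual(s, z0, 0.5, 2)) for s in roots])  # both residuals should be tiny
```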

By Corollary 2.1.1, it stands to reason that the improvement of the harmonic mean over the arithmetic mean in operator norm should be true for a wide range of covariance $\mathbf{\Sigma}$ . By the above fixed point characterization, we expect this improvement should only depend on the limiting distribution $\operatorname{d\!}F$ of $\mathbf{\Sigma}$ . In future investigations we hope to characterize the role of $\operatorname{d\!}F$ in the phenomenon described in Theorem 2.1 and Remark 1.

Outline

The paper is organized as follows: Section 3 provides relevant background terminology and results from free probability theory needed to understand the proof of Theorem 2.1 and Theorem 2.2. Section 4 states and proves Lemma 4.3, which guarantees the operator norm convergence in Theorem 2.1. Section 5 gives the proof of Theorem 2.1, which is reduced to a calculation when Lemma 4.3 is taken as given. Section 6 gives the proof of Theorem 2.2.

3. Free Probability Theory

In order to prove the main results of this paper, we require some tools from the theory of free probability. Free probability is a generalization of classical probability invented by Dan Voiculescu in the 1980s for the purpose of investigating some properties of operator algebras [13]. We require this theory because the sequence of $\{\mathbf{W}_{i}\}_{i=1}^{n}$ given in Definition 1 behave as the “joint law” of a collection of non-commutative random variables (see Definition 3). In Section 5 we will use this fact to directly compute the limiting spectral measure of the harmonic mean $\mathbf{H}$ . Our primary references for the exposition in this section are [1, Chapter 5] and [6, Chapters 1–7].

Let $(\mathcal{A},\|\cdot\|_{\mathcal{A}},*)$ denote a unital $C^{*}$ -algebra with involution $*$ . This means $\mathcal{A}$ is a complex vector space equipped with a complete norm $\|\cdot\|_{\mathcal{A}}$ (i.e., $\mathcal{A}$ is a Banach space), a bilinear product

[TABLE]

and a unit element

[TABLE]

$\mathcal{A}$ is a unital Banach algebra if in addition the norm satisfies

[TABLE]

When $\mathcal{A}$ has an involution operation

[TABLE]

which satisfies for all $a$ , $b\in\mathcal{A}$ and $\lambda\in\mathbb{C}$

[TABLE]

then we say that $\mathcal{A}$ is a unital $C^{*}$ -algebra. An element $a\in\mathcal{A}$ of a $C^{*}$ -algebra is invertible if there exists a $b$ such that $ab=ba=1_{\mathcal{A}}$ . Notice that the algebraic structure of $\mathcal{A}$ allows us to consider non-commutative polynomials over elements in $\mathcal{A}$ . The subalgebra of non-commutative polynomials in formal variables $x_{1}$ , $\ldots$ , $x_{n}$ will be denoted $\mathbb{C}\langle x_{1},\ldots,x_{n}\rangle$ .

If $\mathcal{A}$ is a $C^{*}$ -algebra, then for each $a\in\mathcal{A}$ the spectrum of $a$ can be defined by

[TABLE]

we can say an element in $\mathcal{A}$ is non-negative, written $a\succeq_{\mathcal{A}}0$ , if $a^{*}=a$ and its spectrum is non-negative. Note that for the $C^{*}$ -algebra $\mathrm{Mat}_{P}(\mathbb{C})$ of $P$ -by- $P$ matrices, this is identical to the definition of a positive-semidefinite matrix.

To apply free probability to our problem of interest, we need the notion of a $C^{*}$ -probability space. A non-commutative $C^{*}$ -probability space $(\mathcal{A},\|\cdot\|_{\mathcal{A}},*,\phi)$ is the unital $C^{*}$ -algebra $(\mathcal{A},\|\cdot\|_{\mathcal{A}},*)$ equipped with a linear map

[TABLE]

satisfying $\phi(1_{\mathcal{A}})=1$ and $\phi(a)\geq 0$ whenever $a\succeq_{\mathcal{A}}0$ . Such a $\phi$ is called a state. If $\phi(ab)=\phi(ba)$ for every $a$ , $b\in\mathcal{A}$ , then $\phi$ is called a tracial state. Finally, if for every $a\in\mathcal{A}$

[TABLE]

then $\phi$ is a faithful tracial state. (For a faithful tracial state, the operator norm of any $a\in\mathcal{A}$ can be recovered by taking a limit:

$\lim_{k\to\infty}\phi\big((aa^{*})^{k}\big)^{\frac{1}{2k}}=\|a\|_{\mathcal{A}};$

see [6, Proposition 3.17] for a proof.)

Elements of $\mathcal{A}$ are called non-commutative random variables, and for any collection $a_{1},\ldots,a_{m}\in\mathcal{A}$ , their joint law is the map

[TABLE]

where $Q\in\mathbb{C}\langle x_{1},\ldots,x_{m}\rangle$ .

The most important $C^{*}$ -probability space will be $(\mathrm{Mat}_{P}(\mathbb{C}),\|\cdot\|,*,\varphi_{P})$ where

[TABLE]

When $a$ is a normal matrix, $\varphi_{P}(a)$ is the integral over the normalized spectral measure of $a$ :

[TABLE]

where $\lambda_{j}(a)\in\mathbb{C}$ are the eigenvalues of $a$ .
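For matrices, the limit $\lim_{k\to\infty}\phi\big((aa^{*})^{k}\big)^{\frac{1}{2k}}=\|a\|$ stated earlier for faithful tracial states can be observed directly. The sketch below assumes that $\varphi_{P}$ is the normalized trace and that $a$ is a normal matrix, so that $\varphi_{P}\big((aa^{*})^{k}\big)$ is the average of $|\lambda_{j}(a)|^{2k}$; the eigenvalue list and function name are made up for illustration.

```python
def normalized_trace_norm_estimate(eigenvalues, k):
    """phi((a a^*)^k)^(1/(2k)) for a normal matrix a with the given eigenvalues,
    where phi is assumed to be the normalized trace."""
    p = len(eigenvalues)
    moment = sum(abs(lam) ** (2 * k) for lam in eigenvalues) / p
    return moment ** (1.0 / (2 * k))

eigs = [0.5, -1.0, 2.0, 1.5]  # hypothetical spectrum; the operator norm is 2
for k in (1, 5, 50, 200):
    print(k, normalized_trace_norm_estimate(eigs, k))
```

The estimates increase with $k$ and approach the largest absolute eigenvalue from below, which is the operator norm of a normal matrix.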

For non-commutative random variables, there is a notion of convergence in distribution as well as an analogue of independence called free independence. Let $(\mathcal{A}_{m},\|\cdot\|_{\mathcal{A}_{m}},*,\phi_{m})$ for $m\geq 1$ and $(\mathcal{A},\|\cdot\|,*,\phi)$ be a collection of non-commutative $C^{*}$ -probability spaces. Suppose that for each $m$ , $a_{m,1}$ , $\ldots$ , $a_{m,k}\in\mathcal{A}_{m}$ is a collection of non-commutative random variables and let $a_{1}$ , $\ldots$ , $a_{k}\in\mathcal{A}$ be a fixed collection of non-commutative random variables. We say $a_{m,1}$ , $\ldots$ , $a_{m,k}$ converge in distribution to $a_{1}$ , $\ldots$ , $a_{k}$ if for every non-commutative polynomial $Q\in\mathbb{C}\langle x_{1},\ldots,x_{k}\rangle$ ,

[TABLE]

A sequence of non-commutative random variables $a_{1}$ , $\ldots$ , $a_{k}$ are freely independent if for any polynomials $Q_{1}$ , $\ldots$ , $Q_{k}$ , we have

[TABLE]

we say a sequence of non-commutative random variables $a_{m,1},\ldots,a_{m,k}\in\mathcal{A}_{m}$ is asymptotically freely independent if it converges in distribution to freely independent non-commutative random variables $a_{1},\ldots,a_{k}\in\mathcal{A}$.

The random matrices $\mathbf{W}_{i}$ , when viewed as a sequence of random variables taking values in the $C^{*}$ -probability space $(\mathrm{Mat}_{P}(\mathbb{C}),\|\cdot\|,*,\varphi_{P})$ , converge almost surely in the sense of distribution to a collection of non-commutative random variables $\mathfrak{p}_{1},\ldots,\mathfrak{p}_{n}$ :

[TABLE]

We define the $\mathfrak{p}_{j}$ and the state $\nu$ below.

Definition 3.

Let $(\mathcal{F},\|\cdot\|_{\mathcal{F}},*,\nu)$ be a $C^{*}$ -algebra with faithful tracial state $\nu$ and non-commutative random variables $\mathfrak{p}_{1}$ , $\ldots\,$ , $\mathfrak{p}_{n}\in\mathcal{F}$ that are self-adjoint, non-negative, freely independent and satisfy

[TABLE]

where $\rho_{\mathrm{MP},\gamma}$ is the Marčenko-Pastur Law with parameter $\gamma$ defined in (2). The $\mathfrak{p}_{j}$ are called free Poisson non-commutative random variables.

The $C^{*}$ -probability space defined above is guaranteed to exist due to a functional analytic construction called the free product [1, Section 5.2–5.3]. In fact, it is easier for us to assume we have this construction in hand for what follows below. Specifically, there exists a Hilbert space $\mathcal{H}$ and a subalgebra $\mathcal{F}$ in the space of bounded linear operators on $\mathcal{H}$ , denoted $B(\mathcal{H})$ , such that the $C^{*}$ -algebra in Definition 3 is $\mathcal{F}$ equipped with the operator norm and the involution is the mapping that takes an operator to its adjoint. Furthermore there is a $\zeta\in\mathcal{H}$ such that

[TABLE]

In particular, the spectral measure of each $\mathfrak{p}_{i}$ is $\rho_{\mathrm{MP},\gamma}$, see [1, Theorem 5.2.24].

In the next section, we will use a result from [3] in addition to concentration results in [12, 8] to show that the spectral measure of the harmonic mean $\mathbf{H}$ converges to the law of the non-commutative random variable

$$\mathfrak{h}:=n\bigg(\sum_{i=1}^{n}\mathfrak{p}_{i}^{-1}\bigg)^{-1}. \tag{7}$$

In addition, we will be able to show $\|\mathbf{H}-\mathbf{I}\|$ converges almost surely to $\|\mathfrak{h}-1_{\mathcal{F}}\|_{\mathcal{F}}$ . First, however, we must establish the existence of $\mathfrak{h}$ .

Lemma 3.1.

The non-commutative random variable $\mathfrak{h}$ in (7) is well-defined and can be approximated by a sequence of non-commutative polynomials in $\{\mathfrak{p}_{1},\dots,\mathfrak{p}_{n}\}$ .

Proof.

A simple proof of this property comes directly from the fact that each of our $\mathfrak{p}_{j}$ is represented as a bounded linear operator on a Hilbert space $\mathcal{H}$. Since their spectral measure is $\rho_{\mathrm{MP},\gamma}$, whose support $[(1-\sqrt{\gamma})^{2},(1+\sqrt{\gamma})^{2}]$ is bounded away from zero for $\gamma\in(0,1)$, each $\mathfrak{p}_{j}$ is invertible and so is the sum

[TABLE]

We may approximate $\mathfrak{h}$ with non-commutative polynomials in $\mathfrak{p}_{i}$ by utilizing the Neumann series. Let

[TABLE]

now consider the partial sum of the geometric series

[TABLE]

by definition of $\Delta$ , along with usual bounds on geometric series we have

[TABLE]

which goes to $0$ as $m\to\infty$. Similarly, each $\mathfrak{p}_{i}^{-1}$ can be expanded as the infinite series

[TABLE]

with similar error bounds as the expansion for $\mathfrak{h}$ . Since $\mathfrak{h}^{-1}$ is the sum of $\mathfrak{p}_{i}^{-1}$ we need only insert the truncated geometric series of $\mathfrak{p}_{i}^{-1}$ into the truncated geometric series for $\mathfrak{h}$ to get a non-commutative polynomial in $\mathfrak{p}_{i}$ that approximates $\mathfrak{h}$ in the norm $\|\cdot\|_{\mathcal{F}}$ . ∎
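The idea of approximating an inverse by polynomials through a Neumann series can be illustrated on a small matrix. The sketch below is not the paper's construction (the precise choice of $\Delta$ is not reproduced here): it assumes a positive definite $2\times 2$ matrix `M` and a hypothetical scaling constant `c` chosen so that $\|I-cM\|<1$, and uses $M^{-1}=c\sum_{k\geq 0}(I-cM)^{k}$, truncated after finitely many terms so that the approximation is a polynomial in $M$.

```python
def matmul2(a, b):
    """Product of two 2-by-2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def neumann_inverse(m, c, terms):
    """Approximate m^{-1} by c * sum_{k < terms} (I - c m)^k, a polynomial in m."""
    identity = [[1.0, 0.0], [0.0, 1.0]]
    delta = [[identity[i][j] - c * m[i][j] for j in range(2)] for i in range(2)]
    power = identity
    total = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(terms):
        total = [[total[i][j] + power[i][j] for j in range(2)] for i in range(2)]
        power = matmul2(power, delta)
    return [[c * total[i][j] for j in range(2)] for i in range(2)]

M = [[2.0, 1.0], [1.0, 3.0]]  # positive definite, eigenvalues (5 +/- sqrt(5)) / 2
c = 0.25                      # hypothetical scaling; I - c*M has norm about 0.65
approx = neumann_inverse(M, c, terms=200)
product = matmul2(M, approx)  # should be close to the identity matrix
print(product)
```

The truncation error decays geometrically in the number of terms, at a rate governed by $\|I-cM\|$, which mirrors the geometric-series bound used in the proof.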

The polynomial approximation in the proof above will be used again in the next section and is the main technical ingredient in addition to Theorem 4.1 below to establish the operator norm convergence of $\mathbf{H}$ .

4. Strong Convergence of the Harmonic Mean

The following Theorem from [3] will be our main tool for obtaining explicit formulas for the limiting operator norm of $\mathbf{H}-\mathbf{I}$ .

Theorem 4.1.

Let $\{\mathbf{W}_{i}\}_{i=1}^{n}$ satisfy Definition 1. Then $\mathbf{W}_{i}$ are asymptotically free and converge in the strong sense to freely independent Poisson random variables $\mathfrak{p}_{1},\ldots,\mathfrak{p}_{n}$ . This means, for any fixed polynomial $Q\in\mathbb{C}\langle x_{1},\ldots,x_{n}\rangle$ , in addition to the convergence

[TABLE]

we have the convergence

[TABLE]

In order for this theorem to imply our desired results, we will use the fact that $\mathbf{H}$ can be approximated by polynomials in the matrices $\mathbf{W}_{i}$ . A concentration bound on the largest eigenvalues of both $\mathbf{W}_{i}$ and $\mathbf{W}_{i}^{-1}$ is necessary before proceeding. We prove this Lemma for matrices satisfying Definition 1 and Definition 2.

Lemma 4.2.

Let $\{\mathbf{W}_{i}\}$ satisfy Definition 1 or Definition 2. Then there exists a deterministic constant $\kappa>0$ that depends only on $n$ and the subgaussian parameter of the entries of $\mathbf{X}_{i}$ such that the event

[TABLE]

satisfies

[TABLE]

Proof.

For any $t>0$ :

[TABLE]

By the AMHM inequality in (3), we have

[TABLE]

so the triangle inequality and union bound applied to $\|\mathbf{A}\|$ gives

[TABLE]

The triangle inequality and a union bound also yield

[TABLE]

so we have

[TABLE]

There are several methods to bound the first probability on the right-hand side of (8). One is an $\epsilon$-net argument described in [12, Theorem 4.4.5]; it gives a bound of the form

[TABLE]

here $C_{1}$, $c_{1}>0$ depend only on the subgaussian parameter of the entries of $\mathbf{X}_{i}$. The above bound is clearly summable for any fixed $t>0$. Note that the $\epsilon$-net argument given in [12] is for the model in Definition 2; a similar argument can be made for the model in Definition 1 with minor adjustments.

We take more care to bound the second probability on the right-hand side of (8). It suffices to bound the smallest singular value of $\mathbf{X}_{i}$ since this is equal to $N^{1/2}\|\mathbf{W}_{i}^{-1}\|^{-1/2}$ . We consider the complex Gaussian model of Definition 1 separately from the real-entried model of Definition 2.

For the model in Definition 1 we use the fact that the eigenvalues of $\mathbf{W}_{i}$ have the same distribution as the eigenvalues of $N^{-1}\mathbf{Y}\mathbf{Y}^{*}$ where $\mathbf{Y}$ is the lower-triangular matrix

[TABLE]

where the $D_{j}$ and $L_{j}$ in the above matrix are independent $\chi^{2}$-distributed random variables with $j$ degrees of freedom (see [4] for a derivation). With this representation, the same Geršgorin disk argument that yields [10, Equation (2)] yields the lower bound

[TABLE]

the rest of the arguments in [10], which bound the right-hand side of the above expression from below, carry through identically and yield a constant $\epsilon>0$ such that the probability of the event $\{\|\mathbf{W}_{i}^{-1}\|>\epsilon\}$ is summable in $P$.

For the model in Definition 2 we use [8, Theorem 1.1] which states for any $\epsilon>0$ ,

[TABLE]

where $C_{2}$ and $c_{2}>0$ depend only on the subgaussian parameter of the entries of $\mathbf{X}_{i}$ . Rearranging yields

[TABLE]

letting $\epsilon=1/(2C_{2})$ gives $\big(C_{2}\epsilon\big)^{N-P}=2^{-(N-P)}$, which is summable in $P$ since $\frac{P}{N}\to\gamma\in(0,1)$ forces $N-P$ to grow linearly in $P$.

Combining these bounds, we can select $\kappa>0$ large enough so that both tail bounds in (8) are summable. ∎
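The content of Lemma 4.2 can be illustrated numerically. The following is a minimal sketch, assuming the complex Gaussian model of Definition 1 with $\mathbf{W}_{i}=\mathbf{X}_{i}\mathbf{X}_{i}^{*}/N$; the helper names and the value `kappa = 6.0` are illustrative choices and are not the constant produced by the proof. It tracks $\max_{i}\|\mathbf{W}_{i}\|$ and $\max_{i}\|\mathbf{W}_{i}^{-1}\|$ as $N$ grows with $P/N$ fixed, and compares them with the Marčenko-Pastur edge values $(1+\sqrt{\gamma})^{2}$ and $(1-\sqrt{\gamma})^{-2}$.

```python
import numpy as np

def complex_wishart(P, N, rng):
    """Sample W = X X^* / N with i.i.d. standard complex Gaussian entries, so E[W] = I."""
    X = (rng.standard_normal((P, N)) + 1j * rng.standard_normal((P, N))) / np.sqrt(2)
    return X @ X.conj().T / N

def extreme_norms(P, N, n, rng):
    """Return (max_i ||W_i||, max_i ||W_i^{-1}||) over n independent draws."""
    top, inv_top = 0.0, 0.0
    for _ in range(n):
        eig = np.linalg.eigvalsh(complex_wishart(P, N, rng))  # real, ascending
        top = max(top, eig[-1])
        inv_top = max(inv_top, 1.0 / eig[0])
    return top, inv_top

gamma, n = 0.25, 3
kappa = 6.0                                   # illustrative threshold, not the lemma's constant
upper_edge = (1 + np.sqrt(gamma)) ** 2        # limiting value of ||W_i||
inv_lower_edge = (1 - np.sqrt(gamma)) ** -2   # limiting value of ||W_i^{-1}||
rng = np.random.default_rng(0)
for N in (200, 400, 800):
    P = int(gamma * N)
    top, inv_top = extreme_norms(P, N, n, rng)
    print(f"N={N}: max||W_i||={top:.3f}, max||W_i^-1||={inv_top:.3f}, "
          f"exceeds kappa: {max(top, inv_top) > kappa}")
```

In this sketch both norms stay well below the chosen threshold at every size, which is the behaviour the summable tail bounds describe.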

We will now use Lemma 4.2, Lemma 3.1 and Theorem 4.1 to prove the strong convergence of $\mathbf{H}$ to the non-commutative random variable $\mathfrak{h}$.

Lemma 4.3.

Assume $\{\mathbf{W}_{i}\}$ satisfy Definition 1. Then the sequence of random matrices $\mathbf{H}$ converges in distribution and in the strong sense to the non-commutative random variable $\mathfrak{h}$.

Remark 3.

Note that the proof of this Lemma is restricted to matrices satisfying Definition 1 only because it applies Theorem 4.1; Lemma 4.2 was proven for the models in both Definition 1 and Definition 2. If Theorem 4.1 is extended to the model in Definition 2, then this Lemma would automatically apply and Theorem 2.1 would also extend to the matrices in Definition 2.

Proof.

We first show for any monomial $S\in\mathbb{C}\langle x_{1}\rangle$

[TABLE]

It suffices to prove the above convergence for the monomial $S(x)=x$ since for matrices $\mathbf{M}_{1}$ and $\mathbf{M}_{2}$ :

[TABLE]

for any $k\geq 1$ so the approximation argument we use below will carry through for general $S$ .

By Lemma 4.2, on the event $B_{P}^{c}$ , $\mathbf{H}$ can be expanded as a Neumann series in $\mathbf{H}^{-1}$ :

[TABLE]

Note that on $B_{P}^{c}$ the convergence rate of the partial sum is explicit and deterministic:

[TABLE]
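The kind of explicit, deterministic rate used here can be sketched for a single matrix inverse. This is a minimal illustration under stated assumptions, not the paper's exact expansion: it assumes a Hermitian matrix $\mathbf{M}$ with spectrum in $[1/\kappa,\kappa]$ and uses the scaling $c=1/\kappa$, which is a choice made for this sketch. With $\mathbf{E}=\mathbf{I}-c\mathbf{M}$ one has $\|\mathbf{E}\|\leq 1-\kappa^{-2}$, so the partial sums of $c\sum_{k}\mathbf{E}^{k}$ are polynomials in $\mathbf{M}$ that approximate $\mathbf{M}^{-1}$ with tail at most $\kappa(1-\kappa^{-2})^{d}$.

```python
import numpy as np

def neumann_inverse(M, kappa, terms):
    """Partial Neumann sum for M^{-1}, assuming spec(M) lies in [1/kappa, kappa].

    With c = 1/kappa and E = I - c*M we have ||E|| <= 1 - kappa**-2, hence
    M^{-1} = c * sum_{k>=0} E^k, and truncating after `terms` terms leaves an
    operator-norm error of at most kappa * (1 - kappa**-2)**terms.
    """
    c = 1.0 / kappa
    identity = np.eye(M.shape[0])
    E = identity - c * M
    power, total = identity.copy(), np.zeros_like(M)
    for _ in range(terms):
        total = total + power   # add E^k
        power = power @ E       # advance to E^(k+1)
    return c * total

rng = np.random.default_rng(1)
kappa, d, size = 2.0, 60, 50
Q, _ = np.linalg.qr(rng.standard_normal((size, size)))
M = Q @ np.diag(rng.uniform(1 / kappa, kappa, size)) @ Q.T   # spectrum in [1/kappa, kappa]
err = np.linalg.norm(np.linalg.inv(M) - neumann_inverse(M, kappa, d), 2)
bound = kappa * (1 - kappa ** -2) ** d
print(f"error {err:.2e} <= deterministic bound {bound:.2e}")
```

The bound depends only on $\kappa$ and the truncation level, not on the particular matrix, which is what allows the polynomial coefficients to be chosen deterministically.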

The inverse of $\mathbf{H}$ can also be expanded into a series,

[TABLE]

and similar explicit deterministic convergence rates can be derived. It follows that there is a sequence of non-commutative polynomials $Q_{d}\in\mathbb{C}\langle x_{1},\ldots,x_{n}\rangle$ , whose coefficients depend only on $n$ and $\kappa$ , such that on the event $B_{P}^{c}$ ,

[TABLE]

This implies that on $B_{P}^{c}$ , as $d\to\infty$ , $Q_{d}(\mathbf{W}_{1},\ldots,\mathbf{W}_{n})$ converges in operator norm to $\mathbf{H}$ . Furthermore, for each $d$ , we have

[TABLE]

with probability 1 by Theorem 4.1. Note that if we select $\kappa$ larger than the value $\Delta$ defined in the proof of Lemma 3.1 we also have the bounds

[TABLE]

since the construction of the polynomial $Q_{d}$ in Lemma 3.1 is identical to the one described above and satisfies the same bounds when $\|\cdot\|$ is replaced by $\|\cdot\|_{\mathcal{F}}$ .

Next, we work on the event $B^{c}=\liminf B_{P}^{c}$, noting that the summability of $\mathbb{P}(B_{P})$ implies $\mathbb{P}(\limsup B_{P})=0$. As $d\to\infty$, $Q_{d}(\mathfrak{p}_{1},\ldots,\mathfrak{p}_{n})$ converges to $\mathfrak{h}$ in the norm $\|\cdot\|_{\mathcal{F}}$. The triangle inequality implies

[TABLE]

Since $B^{c}$ occurs with probability $1$ , the first term is bounded by $\frac{1}{d}$ as $P\to\infty$ by construction of $Q_{d}$ . The second term vanishes as $P\to\infty$ by Theorem 4.1. For any $\epsilon>0$ , there is a deterministic $d$ large enough that makes the third term smaller than $\epsilon$ in the above inequality. Therefore for arbitrary $d$ and $\epsilon>0$ :

[TABLE]

the result then follows.

The convergence

[TABLE]

follows from a similar argument. Again without loss of generality assume $S(\mathbf{H})=\mathbf{H}$ , and write

[TABLE]

The first term is bounded by $\|\mathbf{H}-Q_{d}(\mathbf{W}_{1},\ldots,\mathbf{W}_{n})\|$, which on $B^{c}$ is bounded by $\frac{1}{d}$ as $P\to\infty$. The second term goes to 0 with probability 1 as $P\to\infty$ by Theorem 4.1. By Lemma 3.1, for any $\epsilon>0$ there is a deterministic $d$ large enough that the third term is bounded by $\epsilon$. ∎

5. Harmonic Mean of Free Poisson Random Variables

In Sections 3 and 4 we proved that the limiting spectral measure of $\mathbf{H}$ is the law of the non-commutative random variable $\mathfrak{h}$ . Additionally, we proved

[TABLE]

We can conclude the proof of Theorem 2.1 by computing the distribution of $\mathfrak{h}$ and the value of $\|\mathfrak{h}-1_{\mathcal{F}}\|_{\mathcal{F}}$. Both follow from a now standard calculation in free probability theory based on the additive free convolution [14].

Let $\sigma$ be a compactly supported probability measure on $\mathbb{R}$ . The Cauchy-Stieltjes transform of $\sigma$ is denoted

[TABLE]

Let $K_{\sigma}(z)$ be the functional inverse of $m_{\sigma}(z)$ . We define the $R$ -transform of $\sigma$ as

[TABLE]

For two compactly supported probability measures $\sigma_{1}$ and $\sigma_{2}$ , on $\mathbb{R}$ , the additive free convolution of $\sigma_{1}$ and $\sigma_{2}$ , denoted $\sigma_{1}\boxplus\sigma_{2}$ , is the unique probability measure obtained by the relation

[TABLE]
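Additivity of the $R$-transform can be checked numerically on the arithmetic mean. The sketch below rests on assumptions that are not taken from the text: it uses the convention $G(z)=\int\rho(x)/(z-x)\,\mathrm{d}x$ with $R(g)=K(g)-1/g$, which may differ in sign from the paper's $m_{\sigma}$, together with the standard formula $R(g)=1/(1-\gamma g)$ for the Marčenko-Pastur law and the scaling rule $R_{ca}(g)=cR_{a}(cg)$. It verifies that the average of $n$ free copies has the $R$-transform of a Marčenko-Pastur law with ratio $\gamma/n$, consistent with $\mathbf{A}$ being a single Wishart matrix built from $Nn$ columns.

```python
import numpy as np

def mp_cauchy(z, gamma, m=200_000):
    """G(z) = int rho_MP(x) / (z - x) dx for real z to the right of the support (midpoint rule)."""
    a, b = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
    h = (b - a) / m
    x = a + (np.arange(m) + 0.5) * h
    rho = np.sqrt((b - x) * (x - a)) / (2 * np.pi * gamma * x)
    return np.sum(rho / (z - x)) * h

def r_mp(g, gamma):
    """R-transform of the Marchenko-Pastur law with ratio gamma (convention assumed above)."""
    return 1.0 / (1.0 - gamma * g)

gamma, n, g = 0.5, 4, 0.2
# R-transforms of free variables add, and R_{c a}(g) = c * R_a(c * g) with c = 1/n.
r_A = n * (1.0 / n) * r_mp(g / n, gamma)
K_A = r_A + 1.0 / g                              # functional inverse of G_A evaluated at g
residual = abs(mp_cauchy(K_A, gamma / n) - g)    # G of MP(gamma/n) should return g
print(f"R_A(g) = {r_A:.6f}, residual = {residual:.2e}")
```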

The additive free convolution is significant in free probability because if $\mu_{a}$ and $\mu_{b}$ are the laws of two freely independent non-commutative random variables $a$ and $b$ , respectively, then the law of the non-commutative random variable $a+b$ is the measure $\mu_{a}\boxplus\mu_{b}$ . Note that for notational ease in what follows, we use $R_{a}$ to denote $R$ -transform of the (compactly supported) measure that is the law of the non-commutative random variable $a$ . Here, we use the additive free convolution to compute the law of $\mathfrak{h}$ via the following steps:

(1) We use the Cauchy-Stieltjes transform of each $\mathfrak{p}_{i}$ to compute the Cauchy-Stieltjes transform of $\mathfrak{p}_{i}^{-1}$. The fixed point equation for the Cauchy-Stieltjes transform of $\mathfrak{p}_{i}$ is a quadratic equation. This results in a fixed point equation for $\mathfrak{p}_{i}^{-1}$ which is also a quadratic equation.

(2) Using the definition of the $R$-transform above, we obtain a quadratic fixed point equation for the $R$-transform of each $\mathfrak{p}_{i}^{-1}$.

(3) Because each $\mathfrak{p}_{i}$ is freely independent of any other $\mathfrak{p}_{j}$ for $i\neq j$, $\mathfrak{p}_{i}^{-1}$ is freely independent of $\mathfrak{p}_{j}^{-1}$. We may now compute the $R$-transform

[TABLE]

where $\mathfrak{p}$ has the same law as all of the $\mathfrak{p}_{i}$.

(4) With the $R$-transform of $\mathfrak{p}_{1}^{-1}+\cdots+\mathfrak{p}_{n}^{-1}$ in hand, we compute the Cauchy-Stieltjes transform of $\mathfrak{h}$. As a consequence of steps 1–3, this function satisfies a quadratic fixed point equation which can be solved.

(5) We invert the Cauchy-Stieltjes transform of $\mathfrak{h}$ using the usual Plemelj inversion formula. This gives the law of $\mathfrak{h}$, which upon shifting by $1$ and using faithfulness of the state $\nu$ yields the operator norm of $\mathfrak{h}-1_{\mathcal{F}}$.
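Steps (1) to (3) can be sanity-checked numerically. The following sketch makes assumptions that should be compared against the paper's own equations: it uses the convention $G(z)=\int\mathrm{d}\mu(x)/(z-x)$ with $R(g)=K(g)-1/g$, and the quadratic $\gamma gR^{2}-(1-\gamma)R+1=0$ for the $R$-transform of $\mathfrak{p}^{-1}$ is rederived for this illustration rather than quoted from the text. The check is independent of that derivation: the Cauchy transform of the push-forward of $\rho_{\mathrm{MP},\gamma}$ under $x\mapsto 1/x$ is computed by quadrature and compared with the functional inverse built from the candidate $R$-transform.

```python
import numpy as np

def inv_mp_cauchy(z, gamma, m=200_000):
    """G(z) = int rho_MP(x) / (z - 1/x) dx, the Cauchy transform of the law of p^{-1}."""
    a, b = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
    h = (b - a) / m
    x = a + (np.arange(m) + 0.5) * h
    rho = np.sqrt((b - x) * (x - a)) / (2 * np.pi * gamma * x)
    return np.sum(rho / (z - 1.0 / x)) * h

def r_inv_mp(g, gamma):
    """Root of gamma*g*R^2 - (1-gamma)*R + 1 = 0 with R(0) = 1/(1-gamma) (assumed form)."""
    disc = (1 - gamma) ** 2 - 4 * gamma * g
    return 2.0 / ((1 - gamma) + np.sqrt(disc))   # cancellation-free form of the small root

gamma, n, g = 0.25, 3, 0.05
K_single = r_inv_mp(g, gamma) + 1.0 / g              # steps (1)-(2): inverse of G for p^{-1}
residual = abs(inv_mp_cauchy(K_single, gamma) - g)   # should vanish if R is correct
K_sum = n * r_inv_mp(g, gamma) + 1.0 / g             # step (3): n free copies add their R-transforms
mean_of_sum = n * r_inv_mp(0.0, gamma)               # R(0) is the first moment of the sum
print(f"residual = {residual:.2e}, K_sum(g) = {K_sum:.4f}, mean of sum = {mean_of_sum:.4f}")
```

The value `mean_of_sum` equals $n/(1-\gamma)$, matching the first moment of $n$ copies of an inverse Marčenko-Pastur variable.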

Our approach in the calculations outlined above is from the paper [7], which provides a general framework for computing various transforms for non-commutative random variables whose Stieltjes transforms satisfy polynomial equations. In our case, each $m_{\mathfrak{p}}$ satisfies the following fixed point equation [5]

[TABLE]
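To make the role of such a quadratic fixed point equation concrete, here is a small sketch for the Marčenko-Pastur law itself. It is written under an assumed convention, $G(z)=\int\rho_{\mathrm{MP},\gamma}(x)/(z-x)\,\mathrm{d}x$, for which the quadratic reads $\gamma zG^{2}-(z+\gamma-1)G+1=0$; the paper's equation (10) may be stated with the opposite sign convention. The sketch selects the root with $\operatorname{Im}G<0$ for $\operatorname{Im}z>0$, recovers the density from $-\operatorname{Im}G(x+i\eta)/\pi$, and checks the decay $zG(z)\to 1$.

```python
import numpy as np

def mp_density(x, gamma):
    """Marchenko-Pastur density with ratio gamma, evaluated at a scalar x > 0."""
    a, b = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
    return np.sqrt(max((b - x) * (x - a), 0.0)) / (2 * np.pi * gamma * x)

def mp_cauchy_root(z, gamma):
    """Solve gamma*z*G^2 - (z + gamma - 1)*G + 1 = 0 and keep the root with Im G < 0.

    For Im z > 0 exactly one root maps the upper half-plane to the lower
    half-plane; that root is the Cauchy transform in the convention assumed here.
    """
    roots = np.roots([gamma * z, -(z + gamma - 1), 1.0])
    return roots[np.argmin(roots.imag)]

gamma, eta, x0 = 0.25, 1e-6, 1.0
recovered = -mp_cauchy_root(x0 + 1j * eta, gamma).imag / np.pi   # inversion just above the real axis
exact = mp_density(x0, gamma)
z_far = 1e4j
decay = z_far * mp_cauchy_root(z_far, gamma)                     # should be close to 1
print(f"recovered {recovered:.6f}, exact {exact:.6f}, z*G(z) at infinity {decay:.6f}")
```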

Proof of Theorem 2.1.

Denote the Cauchy-Stieltjes transform of the law of each $\mathfrak{p}_{i}$ by

[TABLE]

To obtain the law of $\mathfrak{h}$ , we first compute the law of

[TABLE]

which is the additive free convolution of the $n$ freely independent random variables $\{\mathfrak{p}_{i}^{-1}\}$ . Since they all have the same parameter $\gamma>0$ , we need only compute, for a fixed $\mathfrak{p}$ with the same law as $\mathfrak{p}_{i}$ , the $R$ -transform

[TABLE]

With the law of $n\mathfrak{h}^{-1}$ in hand, we simply invert and rescale to obtain the law of $\mathfrak{h}$ , which allows us to compute the value of $\|\mathfrak{h}-1_{\mathcal{F}}\|_{\mathcal{F}}$ .

The law of $\mathfrak{p}^{-1}$ is the push-forward measure of $\rho_{\mathrm{MP},\gamma}$ by the mapping $x\mapsto\frac{1}{x}$ . We denote this measure by $\mu_{\mathfrak{p}^{-1}}$ . Using the push-forward, we have

[TABLE]

rearranging and replacing $z$ with $\frac{1}{z}$ yields

[TABLE]

The fixed point equation (10) for $m_{\mathfrak{p}}$ can be rewritten as

[TABLE]

inserting (12) into (13) yields

[TABLE]

which when simplified yields

[TABLE]

Equation (14) yields the following equation for $K_{\mathfrak{p}^{-1}}(z)$

[TABLE]

substituting the $R$ -transform $R_{\mathfrak{p}^{-1}}(z)$ gives the equation

[TABLE]

and simplifying further gives

[TABLE]

Using the additive convolution formula (11) in (15) gives

[TABLE]

We will solve for $m_{n\mathfrak{h}^{-1}}$ , by reversing the procedure we performed above to obtain the $R$ -transform given the Stieltjes transform. Inserting the definition of the $R$ -transform into (16) gives

[TABLE]

simplifying this gives

[TABLE]

so that

[TABLE]

changing variables $z\mapsto\frac{1}{z}$ gives

[TABLE]

As in the pushforward calculation that gave (12) before, we have the relationship

[TABLE]

which when substituted into (17) gives

[TABLE]

which simplifies to the quadratic

[TABLE]

rescaling the law of $n^{-1}\mathfrak{h}$ gives the final equation for $m_{\mathfrak{h}}$ :

[TABLE]

The solution to the quadratic equation (18) is

[TABLE]

where the branch cut of the square root has been taken to be the positive real line. We have chosen this particular root of the quadratic due to the decay condition $m_{\mathfrak{h}}(z)\sim\frac{1}{z}$ as $z\to\infty$ and the requirement that $m_{\mathfrak{h}}(z)$ must be complex analytic off the real line. See, for example, [11, §2.4.3] for a more detailed calculation (for Wigner matrices) that explains the selection of the branch cut when solving fixed point equations for the Stieltjes transform. See [2, §3.3] for a derivation of the MP-law using these techniques. To recover the law $\mu_{\mathfrak{h}}$ from the above Stieltjes transform, we follow the usual inversion formula, which appears in [1, Theorem 2.4.3],

[TABLE]

where $a<b$ are continuity points of the measure $\mu_{\mathfrak{h}}$ . By computing directly, we get that $\mu_{\mathfrak{h}}$ is absolutely continuous with respect to Lebesgue measure with density

[TABLE]

where $e_{\pm}$ are defined in Theorem 2.1. Using faithfulness of the state $\nu$, we may conclude that the operator norm of $\mathfrak{h}-1_{\mathcal{F}}$ is the largest element in absolute value of the support of the measure $\mu_{\mathfrak{h}}$ after it has been shifted to the left by one:

[TABLE]

the choice of $-$ sign makes the absolute value largest:

[TABLE]

this concludes the proof of Theorem 2.1. ∎
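The comparison between the harmonic and arithmetic means can be explored by simulation. The sketch below assumes the complex Gaussian model of Definition 1. The limit for $\mathbf{A}$ uses the fact that $\mathbf{A}$ is a Wishart matrix with aspect ratio $\gamma/n$. The edges used for $\mathbf{H}$, namely $1-\gamma+\frac{2\gamma}{n}\pm 2\sqrt{\frac{\gamma}{n}\big(1-\gamma+\frac{\gamma}{n}\big)}$, were rederived for this illustration from the steps above and are assumed, not confirmed, to agree with $e_{\pm}$ in Theorem 2.1; all function names are hypothetical.

```python
import numpy as np

def complex_wishart(P, N, rng):
    """Sample W = X X^* / N with i.i.d. standard complex Gaussian entries."""
    X = (rng.standard_normal((P, N)) + 1j * rng.standard_normal((P, N))) / np.sqrt(2)
    return X @ X.conj().T / N

def distance_to_identity(M):
    """Operator norm of M - I for a Hermitian matrix M."""
    eig = np.linalg.eigvalsh((M + M.conj().T) / 2)   # symmetrize against rounding
    return max(abs(eig[0] - 1.0), abs(eig[-1] - 1.0))

def simulate(P, N, n, rng):
    """Return (||H - I||, ||A - I||) for one draw of n Wishart matrices."""
    Ws = [complex_wishart(P, N, rng) for _ in range(n)]
    A = sum(Ws) / n
    H = np.linalg.inv(sum(np.linalg.inv(W) for W in Ws) / n)
    return distance_to_identity(H), distance_to_identity(A)

def limit_A(gamma, n):
    """(1 + sqrt(gamma/n))^2 - 1, the top Marchenko-Pastur edge of A minus one."""
    return 2 * np.sqrt(gamma / n) + gamma / n

def limit_H(gamma, n):
    """Largest |edge - 1| for the edges assumed in the lead-in (rederived, not quoted)."""
    c = 1 - gamma + gamma / n
    edges = [c + gamma / n + s * 2 * np.sqrt(c * gamma / n) for s in (-1, 1)]
    return max(abs(e - 1.0) for e in edges)

gamma, N = 0.5, 400
P = int(gamma * N)
rng = np.random.default_rng(2)
results = {}
for n in (2, 20):
    results[n] = simulate(P, N, n, rng)
    print(f"n={n}: ||H-I||={results[n][0]:.3f} (limit {limit_H(gamma, n):.3f}), "
          f"||A-I||={results[n][1]:.3f} (limit {limit_A(gamma, n):.3f})")
```

With these parameters the harmonic mean is closer to the identity for $n=2$ and the arithmetic mean is closer for $n=20$, in line with the phenomenon described in the abstract.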

6. General Covariance Matrix

From the previous sections, we know that $\mathbf{H}$ converges in the strong sense to a non-commutative random variable $\mathfrak{h}$ whose law we computed in Section 5. As mentioned in the Introduction, we can study the harmonic mean for a general population covariance $\mathbf{\Sigma}$ through the product $\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{H}\mathbf{\Sigma}^{\frac{1}{2}}$. In this section, we will obtain a fixed point equation for both the limiting spectral measure of $\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{H}\mathbf{\Sigma}^{\frac{1}{2}}$ and its centered version $\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{H}\mathbf{\Sigma}^{\frac{1}{2}}-\mathbf{\Sigma}$ in terms of the limiting cdf $F$ of $\mathbf{\Sigma}$, assuming $(\mathbf{H},\mathbf{\Sigma})$ converges to a pair of freely independent non-commutative random variables $(\mathfrak{h},\mathfrak{s})$, where $\mathfrak{s}$ has law given by the measure $\mathrm{d}F$.

We use another tool from free probability called the multiplicative free convolution [15]. To define the $S$ -transform, for a non-commutative random variable $a$ in some non-commutative $C^{*}$ -probability space $(\mathcal{A},\|\cdot\|,*,\phi)$ define the function

[TABLE]

we will assume the law $\mu_{a}$ of $a$ is a compactly supported measure on $\mathbb{R}$. We have the relationship

[TABLE]

Assume $\phi(a)\neq 0$ so that $\ell_{a}(z)$ is guaranteed to exist and is the functional inverse of $g_{a}(z)$ :

[TABLE]

The $S$-transform of a non-commutative random variable $a$ is defined as

[TABLE]

For freely independent non-commutative random variables $a$ and $b$ with $\phi(a)\neq 0$ and $\phi(b)\neq 0$ , we have the rule

[TABLE]

Suppose the laws of $a$ and $b$, $\mu_{a}$ and $\mu_{b}$ respectively, are known. We will derive a fixed point equation for the Stieltjes transform of the product $ab$ using the formula (23). First, note that (23) can be written as

[TABLE]

replacing $z$ with $g_{ab}\big{(}\frac{1}{z}\big{)}$ gives

[TABLE]

now applying (21) to this yields

[TABLE]

rearranging yields

[TABLE]

applying $g_{a}$ on both sides yields

[TABLE]

using (21) once more gives

[TABLE]

which written in integral form is:

[TABLE]

We will use (25) to prove Theorem 2.2.
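The definitions entering this derivation can be checked numerically on the Marčenko-Pastur law. The sketch below assumes the standard forms, which are presumed to correspond to $g_{a}$, $\ell_{a}$ and (22) above: the moment series $\psi(z)=\sum_{k\geq 1}\phi(a^{k})z^{k}=\int\frac{xz}{1-xz}\,\mathrm{d}\mu_{a}(x)$, its functional inverse $\chi$, and $S(u)=\chi(u)\frac{1+u}{u}$. It also assumes the known formula $S(u)=1/(1+\gamma u)$ for the Marčenko-Pastur law with ratio $\gamma$, and verifies $\psi(\chi(u))=u$ by quadrature.

```python
import numpy as np

def mp_psi(z, gamma, m=200_000):
    """psi(z) = int x*z / (1 - x*z) rho_MP(x) dx, valid while x*z < 1 on the support."""
    a, b = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
    h = (b - a) / m
    x = a + (np.arange(m) + 0.5) * h
    rho = np.sqrt((b - x) * (x - a)) / (2 * np.pi * gamma * x)
    return np.sum(rho * x * z / (1.0 - x * z)) * h

def s_mp(u, gamma):
    """S-transform of the Marchenko-Pastur law with ratio gamma (assumed standard formula)."""
    return 1.0 / (1.0 + gamma * u)

gamma, u = 0.5, 0.3
chi = u / (1.0 + u) * s_mp(u, gamma)     # inverse of psi reconstructed from S
residual = abs(mp_psi(chi, gamma) - u)   # psi(chi(u)) should return u
print(f"chi(u) = {chi:.6f}, residual = {residual:.2e}")
```

Under the product rule for freely independent variables, an $S$-transform of this rational form is what makes the resulting fixed point equations polynomial.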

Proof of Theorem 2.2.

By assumption of the Theorem, it will suffice to study the law of

[TABLE]

where $\mathfrak{s}^{\frac{1}{2}}$ is the square root of $\mathfrak{s}$ which exists because $\mathfrak{s}$ can be realized as a positive bounded self-adjoint linear operator on a Hilbert space $\mathcal{H}$ . For notational ease, define

[TABLE]

It is clear that the state $\nu$ is tracial since it is the limit in distribution of the tracial state $\varphi_{P}$ . Therefore, deriving the law of the variables in (26) and (27) is the same as deriving the law of

[TABLE]

respectively. Furthermore, it is clear that $\nu(\mathfrak{s})>0$ and by direct computation we have

[TABLE]

so both $\nu(\breve{\mathfrak{h}})\neq 0$ and $\nu(\mathfrak{h})\neq 0$ . Hence we have the equations

[TABLE]

We derive the fixed point equation for $\mathfrak{s}\mathfrak{h}$ first. From the previous section,

[TABLE]

replacing $z$ with $\frac{1}{z}$ and applying (21) gives

[TABLE]

replacing $z$ with $\ell_{\mathfrak{h}}(z)$ gives

[TABLE]

and solving for $\ell_{\mathfrak{h}}(z)$ gives

[TABLE]

which yields the simple formula

[TABLE]

applying equation (25) gives the required result.

For the second limit equation, since

[TABLE]

it follows

[TABLE]

replacing $z$ with $\frac{1}{z}$ and applying (21) yields

[TABLE]

replacing $z$ with $\ell_{\breve{\mathfrak{h}}}(z)$ gives the polynomial

[TABLE]

rearranging this yields

[TABLE]

inserting the definition of the $S$ -transform in (22) yields

[TABLE]

since $z$ is a non-zero complex number, we can divide through by $z$ to get

[TABLE]

which concludes the proof by another application of equation (25). ∎
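A simulation gives a quick consistency check of the freeness assumption behind this section. The sketch below uses a hypothetical diagonal $\mathbf{\Sigma}$ with half of its eigenvalues equal to $1$ and half equal to $3$, and the complex Gaussian model of Definition 1. It checks that the normalized trace of $\mathbf{\Sigma}^{\frac{1}{2}}\mathbf{H}\mathbf{\Sigma}^{\frac{1}{2}}$ factorizes as the product of the normalized traces of $\mathbf{\Sigma}$ and $\mathbf{H}$, the first-order consequence of free independence. The comparison of the normalized trace of $\mathbf{H}$ with $1-\gamma+\gamma/n$ relies on a value rederived for this sketch, not quoted from the text.

```python
import numpy as np

def complex_wishart(P, N, rng):
    """Sample W = X X^* / N with i.i.d. standard complex Gaussian entries."""
    X = (rng.standard_normal((P, N)) + 1j * rng.standard_normal((P, N))) / np.sqrt(2)
    return X @ X.conj().T / N

def normalized_trace(B):
    """Real part of tr(B) / dimension."""
    return np.trace(B).real / B.shape[0]

gamma, N, n = 0.5, 400, 3
P = int(gamma * N)
rng = np.random.default_rng(3)
Ws = [complex_wishart(P, N, rng) for _ in range(n)]
H = np.linalg.inv(sum(np.linalg.inv(W) for W in Ws) / n)

sigma = np.where(np.arange(P) < P // 2, 1.0, 3.0)   # hypothetical population spectrum
root = np.sqrt(sigma)
M = root[:, None] * H * root[None, :]               # Sigma^{1/2} H Sigma^{1/2} for diagonal Sigma

lhs = normalized_trace(M)
rhs = sigma.mean() * normalized_trace(H)            # first-moment factorization under freeness
mean_H = normalized_trace(H)                        # compared with 1 - gamma + gamma/n (assumed)
spectrum = np.linalg.eigvalsh((M + M.conj().T) / 2)
print(f"tr(M)/P = {lhs:.4f}, product of traces = {rhs:.4f}, tr(H)/P = {mean_H:.4f}")
print(f"spectrum of Sigma^(1/2) H Sigma^(1/2) lies in [{spectrum[0]:.3f}, {spectrum[-1]:.3f}]")
```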

Bibliography

[1] Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. An introduction to random matrices, volume 118 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2010.
[2] Zhidong Bai and Jack W. Silverstein. Spectral analysis of large dimensional random matrices. Springer Series in Statistics. Springer, New York, second edition, 2010.
[3] M. Capitaine and C. Donati-Martin. Strong asymptotic freeness for Wigner and Wishart matrices. Indiana Univ. Math. J., 56(2):767–803, 2007.
[4] Ioana Dumitriu and Alan Edelman. Matrix models for beta ensembles. J. Math. Phys., 43(11):5830–5847, 2002.
[5] V. A. Marčenko and L. A. Pastur. Distribution of eigenvalues in certain sets of random matrices. Mat. Sb. (N.S.), 72(114):507–536, 1967.
[6] Alexandru Nica and Roland Speicher. Lectures on the combinatorics of free probability, volume 335 of London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge, 2006.
[7] N. Raj Rao and Alan Edelman. The polynomial method for random matrices. Found. Comput. Math., 8(6):649–702, 2008.
[8] Mark Rudelson and Roman Vershynin. Smallest singular value of a random rectangular matrix. Comm. Pure Appl. Math., 62(12):1707–1739, 2009.
