The middle-scale asymptotics of Wishart matrices

Didier Ch\'etelat; Martin T. Wells

arXiv:1705.03510·math.PR·May 11, 2017

TL;DR

This paper investigates the asymptotic behavior of high-dimensional Wishart matrices when the dimension grows much slower than the degrees of freedom, revealing phase transitions and new distributional tools.

Contribution

It introduces the G-transform for distributions, extends the t-distribution to symmetric matrices, and characterizes phase transitions in Wishart matrices in the middle-scale regime.

Findings

01

Existence of phase transitions at specific growth rates of p relative to n.

02

Derivation of density approximations between phase transitions.

03

Empirical spectral distribution follows a semicircle law when p/n approaches zero.

Abstract

We study the behavior of a real $p$-dimensional Wishart random matrix with $n$ degrees of freedom when $n, p \to \infty$ but $p / n \to 0$. We establish the existence of phase transitions when $p$ grows at the order $n^{(K + 1) / (K + 3)}$ for every $K \in \mathbb{N}$, and derive expressions for approximating densities between every two phase transitions. To do this, we make use of a novel tool we call the G-transform of a distribution, which is closely related to the characteristic function. We also derive an extension of the $t$-distribution to the real symmetric matrices, which naturally appears as the conjugate distribution to the Wishart under a G-transformation, and show its empirical spectral distribution obeys a semicircle law when $p / n \to 0$. Finally, we discuss how the phase transitions of the Wishart distribution might originate from changes in rates of convergence…

Tables

Table 1: Asymptotics of small normalized empirical moments of $T\sim\text{T}_{n/2}(I_{p}/8)$.

Normalized empirical moment	Limit	Asymptotics of its squared $L^{2}$ error
$\frac{1}{p} tr (\frac{4 T}{\sqrt{p}})$	0	$\frac{2}{p^{2}}$
$\frac{1}{p} tr {(\frac{4 T}{\sqrt{p}})}^{2}$	$C_{1} = 1$	$\frac{5}{p^{2}} + \frac{2}{m} + \frac{p^{2}}{m^{2}}$
$\frac{1}{p} tr {(\frac{4 T}{\sqrt{p}})}^{3}$	0	$\frac{24}{p^{2}}$
$\frac{1}{p} tr {(\frac{4 T}{\sqrt{p}})}^{4}$	$C_{2} = 2$	$\frac{97}{p^{2}} + \frac{50}{m} + \frac{25 p^{2}}{m^{2}}$

Equations

\frac{1}{n}\sum_{i=1}^{n}f(\lambda_{i})\to\int_{c_{-}}^{c_{+}}f(l)\,\frac{\sqrt{(c_{+}-l)(l-c_{-})}}{2\pi cl}\,dl\quad\text{a.s.},

\frac{1}{n}\sum_{i=1}^{n}f(\lambda_{i})\to f(1)\quad\text{a.s.},

\displaystyle\sqrt{n}\Big{[}W_{p}(n,I_{p}/n)-I_{p}\Big{]}\;\Rightarrow\;\text{GOE}(p),

\displaystyle\mathrm{d}_{\text{TV}}\bigg{(}\sqrt{n}\Big{[}W_{p}(n,I_{p}/n)-I_{p}\Big{]},\,\text{GOE}(p)\bigg{)}\rightarrow 0

\displaystyle\mathrm{d}_{\text{TV}}\bigg{(}\sqrt{n}\Big{[}W_{p}(n,I_{p}/n)-I_{p}\Big{]},\,\text{GOE}(p)\bigg{)}\nrightarrow 0.

\displaystyle\mathrm{d}_{\text{TV}}\bigg{(}\sqrt{n}\Big{[}W_{p}(n,I_{p}/n)-I_{p}\Big{]},\,F_{1}\bigg{)}\rightarrow 0,

\displaystyle f_{1}(X)\;\;\propto\;\;\bigg{|}\text{E}\bigg{[}\exp\bigg{\{}\frac{i}{\sqrt{8}}\operatorname{tr}XZ-\frac{i}{3\sqrt{2n}}\operatorname{tr}Z^{3}+\frac{1}{4n}\operatorname{tr}Z^{4}+\frac{\sqrt{2}i}{5n^{3/2}}\operatorname{tr}Z^{5}

\displaystyle-\frac{1}{3n^{2}}\operatorname{tr}Z^{6}+\frac{i(p+1)}{2\sqrt{2n}}\operatorname{tr}Z-\frac{p+1}{4n}\operatorname{tr}Z^{2}-\frac{4\sqrt{2}i(p+1)}{3n^{3/2}}\operatorname{tr}Z^{3}\bigg{\}}\bigg{]}\bigg{|}^{2},

\displaystyle\mathrm{d}_{\text{TV}}\bigg{(}\sqrt{n}\Big{[}W_{p}(n,I_{p}/n)-I_{p}\Big{]},\,F_{2}\bigg{)}\rightarrow 0,

\displaystyle f_{2}(X)\;\;\propto\;\;\bigg{|}\text{E}\bigg{[}\exp\bigg{\{}\frac{i}{\sqrt{8}}\operatorname{tr}XZ-\frac{i}{3\sqrt{2n}}\!\operatorname{tr}Z^{3}+\frac{1}{4n}\!\operatorname{tr}Z^{4}+\frac{\sqrt{2}i}{5n^{3/2}}\!\operatorname{tr}Z^{5}

\displaystyle-\frac{1}{3n^{2}}\operatorname{tr}Z^{6}-\frac{2\sqrt{2}i}{7n^{5/2}}\operatorname{tr}Z^{7}+\frac{i(p+1)}{2\sqrt{2n}}\operatorname{tr}Z-\frac{p+1}{4n}\operatorname{tr}Z^{2}-\frac{4\sqrt{2}i(p+1)}{3n^{3/2}}\operatorname{tr}Z^{3}

\displaystyle\hskip 50.0pt+\frac{p\!+\!1}{4n^{2}}\!\operatorname{tr}Z^{4}+i\frac{512(p\!+\!1)}{5n^{5/2}}\!\operatorname{tr}Z^{5}-\frac{1024(p\!+\!1)}{3n^{3}}\!\operatorname{tr}Z^{6}\bigg{\}}\bigg{]}\bigg{|}^{2},

\displaystyle f_{K}(X)\;\;\propto\;\;\Bigg{|}\text{E}\Bigg{[}\exp\Bigg{\{}\frac{i}{\sqrt{8}}\operatorname{tr}(XZ)+\frac{n}{4}{{\sum}}_{k=3}^{\begin{subarray}{c}2K+3+\\ \mathbbm{1}\!\left[K\,\mathrm{odd}\right]\end{subarray}}i^{k}\Big{(}\frac{2}{n}\Big{)}^{\frac{k}{2}}\frac{\operatorname{tr}Z^{k}}{k}

\displaystyle+\frac{p+\!1}{4}{{\sum}}_{k=1}^{\begin{subarray}{c}2K+2-\\ \mathbbm{1}\!\left[K\,\mathrm{odd}\right]\end{subarray}}i^{k}\Big{(}\frac{2}{n}\Big{)}^{\frac{k}{2}}\frac{\operatorname{tr}Z^{k}}{k}\Bigg{\}}\Bigg{]}\Bigg{|}^{2}

\displaystyle\mathrm{d}_{\text{TV}}\bigg{(}\sqrt{n}\Big{[}W_{p}(n,I_{p}/n)-I_{p}\Big{]},\,F_{K}\bigg{)}\rightarrow 0

\frac{\partial_{\text{s}}}{\partial_{\text{s}}X_{kl}}=\frac{1+\delta_{kl}}{2}\frac{\partial}{\partial X_{kl}}=\begin{cases}\dfrac{1}{2}\dfrac{\partial}{\partial X_{kl}}&\text{for }k\neq l,\\[4pt]\dfrac{\partial}{\partial X_{kk}}&\text{for }k=l.\end{cases}

\displaystyle\qquad\bigintsss_{\mathbb{S}_{p}(\mathbb{R})}f(X)\,dX=\bigintsss_{\mathbb{R}^{p(p+1)/2}}f\big{(}X\big{)}\prod_{i\leq j}^{p}\,dX_{ij}.

\displaystyle f(X)=\frac{1}{|\Sigma|^{\alpha}\Gamma_{p}\left(\alpha\right)}\big{|}X\big{|}^{\alpha-\frac{p+1}{2}}\exp\Big{\{}-\operatorname{tr}(\Sigma^{-1}X)\Big{\}}\mathbbm{1}\!\left[X>0\right],

\displaystyle\mathrm{H}(F_{1},F_{2})=\mathrm{H}(f_{1},f_{2})=\Big{(}\int\Big{|}f_{1}^{1/2}(x)-f_{2}^{1/2}(x)\Big{|}^{2}dx\Big{)}^{\frac{1}{2}}.

\frac{1}{2}\mathrm{d}_{\text{TV}}(f_{1},f_{2})\leq\mathrm{H}(f_{1},f_{2})\leq\mathrm{d}_{\text{TV}}^{1/2}(f_{1},f_{2}).

\mathcal{F}\{f\}(T)=\frac{1}{2^{\frac{p}{2}}\pi^{\frac{p(p+1)}{4}}}\bigintsss_{\mathbb{S}_{p}(\mathbb{R})}e^{-i\operatorname{tr}(TX)}f(X)\,dX.

\bigintsss_{\mathbb{S}_{p}(\mathbb{R})}f(X)\,\widebar{g}(X)\,dX=\bigintsss_{\mathbb{S}_{p}(\mathbb{R})}\mathcal{F}\{f\}(T)\,\widebar{\mathcal{F}\{g\}}(T)\,dT.

\mathcal{G}\{f\}=\mathcal{F}\{f^{1/2}\}^{2},

\displaystyle\qquad\psi(T)=\mathcal{G}\{f\}(T)=\frac{1}{2^{p}\pi^{\frac{p(p+1)}{2}}}\bigg{(}\bigintsss_{\mathbb{S}_{p}(\mathbb{R})}e^{-i\operatorname{tr}(TX)}f^{1/2}(X)\,dX\bigg{)}^{2}.

\bigintsss_{\mathbb{S}_{p}(\mathbb{R})}|\psi(T)|\,dT=\bigintsss_{\mathbb{S}_{p}(\mathbb{R})}|f^{1/2}(X)|^{2}\,dX=\bigintsss_{\mathbb{S}_{p}(\mathbb{R})}|f(X)|\,dX=1.

d_{TV} (ψ_{1}, ψ_{2})

and H (ψ_{1}, ψ_{2})

\frac{1}{2}\mathrm{d}_{\text{TV}}(\psi_{1},\psi_{2})\leq\mathrm{H}(\psi_{1},\psi_{2})\leq\mathrm{d}_{\text{TV}}^{1/2}(\psi_{1},\psi_{2}).

\mathrm{H}^{2}(f_{1},f_{2})=\bigintsss_{\mathbb{S}_{p}(\mathbb{R})}\big|\psi_{1}^{1/2}(T)-\psi_{2}^{1/2}(T)\big|^{2}\,dT=\mathrm{H}^{2}(\psi_{1},\psi_{2}).

\mathrm{H}^{2}(f_{1},f_{2})\leq\operatorname{E}\!\left[\log\frac{f_{1}(X)}{f_{2}(X)}\right]\qquad\text{for }X\sim f_{1}.

\mathrm{H}^{2}(\psi_{1},\psi_{2})\leq\operatorname{E}\!\left[\Re\operatorname{Log}\frac{\psi_{1}(T)}{\psi_{2}(T)}\right]+2\operatorname{E}\!\left[\Im\operatorname{Log}\frac{\psi_{1}(T)}{\psi_{2}(T)}\right]^{2}\qquad\text{for }T\sim F_{1}^{*},

\displaystyle\mathrm{H}^{2}(\psi_{1},\psi_{2})\;\;\leq\;\;\bigg{[}\!\!\bigintsss_{\mathbb{S}_{p}(\mathbb{R})}|\psi_{2}|(T)\,dT-1\bigg{]}+\operatorname{E}\!\left[\Re\operatorname{Log}\frac{\psi_{1}(T)}{\psi_{2}(T)}\right]

Full text

Abstract

We study the behavior of a real $p$-dimensional Wishart random matrix with $n$ degrees of freedom when $n,p\rightarrow\infty$ but $p/n\rightarrow 0$. We establish the existence of phase transitions when $p$ grows at the order $n^{(K+1)/(K+3)}$ for every $K\in\mathbb{N}$, and derive expressions for approximating densities between every two phase transitions. To do this, we make use of a novel tool we call the G-transform of a distribution, which is closely related to the characteristic function. We also derive an extension of the $t$-distribution to the real symmetric matrices, which naturally appears as the conjugate distribution to the Wishart under a G-transformation, and show its empirical spectral distribution obeys a semicircle law when $p/n\rightarrow 0$. Finally, we discuss how the phase transitions of the Wishart distribution might originate from changes in rates of convergence of symmetric $t$ statistics.

MSC classes: 60B20, 60B10, 60E10.

1 Introduction

The roots of random matrix theory lie in statistics, with the work of Wishart (1928) and Bartlett (1933), and in numerical analysis, with the work of Von Neumann and Goldstine (1947). In this early period, many well-known matrix distributions were introduced. These include the real Gaussian matrix ensemble $\text{G}(p,q)$, a $p\times q$ matrix with independent standard Gaussian entries; the Gaussian orthogonal ensemble $\text{GOE}(p)$, the distribution of a symmetric matrix $(X+X^{t})/\sqrt{2}$ with $X\sim\text{G}(p,p)$; and the Wishart (also known as Laguerre) distribution $\text{W}_{p}(n,I_{p}/n)$, the distribution of a symmetric matrix $XX^{t}/n$ with $X\sim\text{G}(p,n)$. During that time, the main concern was to derive properties of these distributions for a fixed dimension. Some asymptotics of the Wishart distribution were considered, but only as $n\rightarrow\infty$ for fixed $p$.
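As an illustration only, the three ensembles just described can be sampled directly from their definitions. The sketch below assumes NumPy; the function names are hypothetical and not part of the paper.

```python
import numpy as np

def sample_gaussian(p, q, rng):
    """G(p, q): a p x q matrix with independent standard Gaussian entries."""
    return rng.standard_normal((p, q))

def sample_goe(p, rng):
    """GOE(p): the symmetric matrix (X + X^t) / sqrt(2) with X ~ G(p, p)."""
    x = sample_gaussian(p, p, rng)
    return (x + x.T) / np.sqrt(2.0)

def sample_wishart(p, n, rng):
    """W_p(n, I_p / n): the symmetric matrix X X^t / n with X ~ G(p, n)."""
    x = sample_gaussian(p, n, rng)
    return x @ x.T / n

rng = np.random.default_rng(0)
goe = sample_goe(4, rng)
wishart = sample_wishart(4, 50, rng)
```

With this construction the diagonal entries of the GOE sample have variance 2 and the off-diagonal entries have variance 1.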

Starting with the pioneering work of Wigner (1951, 1955, 1957), Porter and Rosenzweig (1960), Gaudin (1961) and Mehta (1960a, b), researchers began investigating the asymptotics of Gaussian ensembles as their dimension grew to infinity. As a result of decades of work, the behavior of a $\text{GOE}(p)$ matrix is now well understood both in the classical setting where $p$ is fixed, and in the setting where $p\rightarrow\infty$ .

However, the asymptotics of the Wishart distribution are more complicated, as they depend on two parameters, $n$ and $p$, and initial progress was slow. The work of Marchenko and Pastur (1967) clearly established that the analogue of a Gaussian orthogonal ensemble matrix whose dimension $p$ grows to infinity is a Wishart matrix whose degrees of freedom $n$ and dimension $p$ jointly grow to infinity in such a way that $p/n\rightarrow c\in(0,1)$. Since then, we have gained a very good understanding of the behavior of Wishart matrices in this regime.

But this body of work left open the question as to what happens to a Wishart matrix when $n,p\rightarrow\infty$ with $p/n\rightarrow 0$. Since such asymptotics are middle-scale between the classical regime where $p$ is fixed as $n\rightarrow\infty$ and the high-dimensional regime where $p/n\rightarrow c\in(0,1)$, we might refer to them as middle-scale regimes. Hence, we might ask: what is the asymptotic behavior of a Wishart matrix $\text{W}_{p}(n,I_{p}/n)$ in the middle-scale regimes? This question is addressed in this article.

To gain some intuition, it is instructive to look at the eigenvalues $\lambda_{1}>\dots>\lambda_{p}>0$ of a $\text{W}_{p}(n,I_{p}/n)$ Wishart matrix. In the classical regime where $p$ is fixed as $n\rightarrow\infty$, the eigenvalues must all almost surely tend to $1$ by the strong law of large numbers. In contrast, in the high-dimensional regime where both $n,p\rightarrow\infty$ with $p/n\rightarrow c\in(0,1)$, the Marchenko-Pastur law states that for any bounded, continuous $f$,

[TABLE]

where $c_{\pm}=(1\pm\sqrt{c})^{2}$ . Thus the eigenvalues do not all tend to $1$ , but rather distribute themselves in the shape of a Marchenko-Pastur law with parameter $c$ .

What happens between these two extremes? When $c\rightarrow 0$, the Marchenko-Pastur law converges weakly to a Dirac measure with mass at $1$. This suggests that whenever $n,p\rightarrow\infty$ with $p/n\rightarrow 0$,

[TABLE]

or in other words that the eigenvalues converge almost surely to $1$ , as in the classical case.
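A small simulation can make this heuristic concrete. The sketch below (assuming NumPy, with hypothetical helper names and arbitrarily chosen sizes) holds $p$ fixed and increases $n$, then records how far the eigenvalues of $\text{W}_{p}(n,I_{p}/n)$ stray from $1$, alongside the Marchenko-Pastur edges $c_{\pm}=(1\pm\sqrt{c})^{2}$ evaluated at $c=p/n$.

```python
import numpy as np

def wishart_eigenvalues(p, n, rng):
    """Eigenvalues of X X^t / n for a p x n standard Gaussian matrix X."""
    x = rng.standard_normal((p, n))
    return np.linalg.eigvalsh(x @ x.T / n)

def mp_edges(c):
    """Support edges c_- and c_+ of the Marchenko-Pastur law with parameter c."""
    return (1.0 - np.sqrt(c)) ** 2, (1.0 + np.sqrt(c)) ** 2

rng = np.random.default_rng(1)
p = 20
spreads = []
for n in (200, 2000, 20000):
    eig = wishart_eigenvalues(p, n, rng)
    spreads.append(np.max(np.abs(eig - 1.0)))  # largest deviation from 1
    lower, upper = mp_edges(p / n)
    print(n, eig.min(), eig.max(), lower, upper)
```

As $p/n$ shrinks, the printed eigenvalue range and the Marchenko-Pastur edges both close in on $1$, which is the behavior the paragraph above describes.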

This motivates a binary view of Wishart asymptotics. It appears that the behavior of a Wishart matrix in the middle-scale regimes is the same as in the classical regime, and therefore that there really are only two regimes: low-dimensional where $p/n\rightarrow 0$ , and high-dimensional where $p/n\rightarrow c\in(0,1)$ .

This binary view has very concrete repercussions. For example, in statistics, many covariance matrix estimators have been developed that leverage high-dimensional Wishart asymptotics (see Pourahmadi (2013) for a review). When faced with a problem where $p$ is large with respect to $n$ , it has been argued that the high-dimensional asymptotics, rather than the classical, constitute the correct model. The binary view provides a useful rule of thumb: small $p$ ’s call for classical covariance estimators, while large $p$ ’s call for high-dimensional covariance estimators.

Unfortunately, recent results establish that this binary view is incorrect. In the classical regime where $p$ is fixed, the central limit theorem implies that

[TABLE]

as $n\rightarrow\infty$, where the arrow stands for weak convergence. In fact, something better is known: recent work has extended this result to the case where $p$ tends to infinity. Recall that the total variation distance between two absolutely continuous distributions $F_{1}$ and $F_{2}$ with densities $f_{1}$ and $f_{2}$ is given by $\mathrm{d}_{\text{TV}}(F_{1},F_{2})=\mathrm{d}_{\text{TV}}(f_{1},f_{2})=\int|f_{1}(x)-f_{2}(x)|dx$. With different approaches, Jiang and Li (2015) and Bubeck et al. (2016) independently established that

[TABLE]

whenever $p^{3}/n\rightarrow 0$ . Thus, when $p^{3}/n\rightarrow 0$ , the same asymptotics hold as in the $p$ fixed case, and we might regard these regimes as rightfully belonging to the classical setting.

The surprising part is that the converse is true! When $p^{3}/n\nrightarrow 0$, results of Bubeck et al. (2016) and Rácz and Richey (2016) show that

[TABLE]

Thus a phase transition occurs when $p$ is of the order $\sqrt[3]{n}$ . This begs the question: if a normal approximation fails to hold when $p$ grows faster than $\sqrt[3]{n}$ , what asymptotics hold? Is there a uniform asymptotic behavior that holds whenever $p/n\rightarrow 0$ with $p^{3}/n\nrightarrow 0$ , or are there further phase transitions as the growth rate of $p$ is increased?

The results of this paper offer a mostly complete answer to this question. Namely, we establish that when $p^{3}/n\nrightarrow 0$ but $p^{2}/n\rightarrow 0$,

[TABLE]

where $F_{1}$ is a continuous distribution on the space of real symmetric matrices whose density is given when $n\geq 3p-3$ by

[TABLE]

for a $Z\sim\text{GOE}(p)$. When $p$ grows like $\sqrt{n}$, another phase transition occurs. Namely, we establish that when $p^{2}/n\nrightarrow 0$ but $p^{5/3}/n\rightarrow 0$,

[TABLE]

where $F_{2}$ is a continuous distribution on the space of real symmetric matrices whose density is given when $n\geq 3p-3$ by

[TABLE]

again for a $Z\sim\text{GOE}(p)$ .

In general, for every $K\in\mathbb{N}$ we find a continuous distribution $F_{K}$ on the space of real symmetric matrices, with density given when $n\geq 3p-3$ by

[TABLE]

for a $Z\sim\text{GOE}(p)$ , which approximates the normalized Wishart distribution in some (but not all) middle-scale regimes. Namely, we prove the following, which can be regarded as the main result of this paper.

Theorem 1.

For any $K\in\mathbb{N}$, the total variation distance between the normalized Wishart distribution $\sqrt{n}[\text{W}_{p}(n,I_{p}/n)-I_{p}]$ and the $K^{\text{th}}$ degree density $f_{K}$ satisfies

[TABLE]

as $n\rightarrow\infty$ with $p^{K+3}/n^{K+1}\rightarrow 0$ .

The definition of $f_{K}$ and proof of Theorem 1 are found in Section 6, and follow from definitions and results from Sections 3, 4 and 5 that constitute the bulk of this paper.

The main consequence of this theorem is the existence of a countably infinite number of phase transitions, occurring when $p$ grows like $n^{(K+1)/(K+3)}$ for $K\in\mathbb{N}$. A diagram is provided in Figure 1. This naturally groups the middle-scale regimes satisfying $\lim\limits_{n\rightarrow\infty}\frac{\log p}{\log n}<1$ by the semi-open interval $\big{[}\frac{K}{K+2},\frac{K+1}{K+3}\big{)}$ to which their limit $\lim\limits_{n\rightarrow\infty}\frac{\log p}{\log n}$ belongs. We might refer to this grouping as the degree of the regime. In other words, we will say a middle-scale regime satisfying $\lim\limits_{n\rightarrow\infty}\frac{\log p}{\log n}<1$ has degree $K$ when $\lim\limits_{n\rightarrow\infty}\frac{\log p}{\log n}\in\big{[}\frac{K}{K+2},\frac{K+1}{K+3}\big{)}$.
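The grouping by degree can be turned into a small computation. Writing $a=\lim\frac{\log p}{\log n}$, the condition $\frac{K}{K+2}\leq a<\frac{K+1}{K+3}$ rearranges to $K\leq\frac{2a}{1-a}<K+1$, so $K=\lfloor 2a/(1-a)\rfloor$. The sketch below uses exact rational arithmetic; the function names are hypothetical and the closed form is this rearrangement, not a formula stated in the paper.

```python
import math
from fractions import Fraction

def transition_exponent(K):
    """Exponent (K+1)/(K+3) at which the K-th phase transition occurs."""
    return Fraction(K + 1, K + 3)

def regime_degree(a):
    """Degree K such that K/(K+2) <= a < (K+1)/(K+3), for 0 <= a < 1."""
    a = Fraction(a)
    if not 0 <= a < 1:
        raise ValueError("the degree is only defined for exponents in [0, 1)")
    # K/(K+2) <= a < (K+1)/(K+3) is equivalent to K <= 2a/(1-a) < K+1.
    return math.floor(2 * a / (1 - a))

for K in range(4):
    print(K, transition_exponent(K), regime_degree(Fraction(K, K + 2)))
```

For instance, an exponent of $2/5$ lies in $[1/3,1/2)$ and therefore has degree $1$.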

The main result of this paper, Theorem 1, may then be summarized as saying that the normalized Wishart distribution can be approximated by the distribution with density $f_{K}$ in every middle-scale regime of degree $K$ or less. The $0^{\text{th}}$ degree case corresponds to the classical setting, while the higher degrees correspond to previously unknown behavior. In fact, we show that our $0^{\text{th}}$ degree approximation $F_{0}$ is asymptotically equivalent to the Gaussian orthogonal ensemble. The results of this paper can therefore be regarded as a wide generalization of the Wishart asymptotics results of Jiang and Li (2015), Bubeck et al. (2016); Bubeck and Ganguly (2016) and Rácz and Richey (2016).

Our approach relies on a novel technical tool we call the G-transform. It turns out that to understand middle-scale regime behavior of Wishart matrices, densities are less clear than characteristic functions (that is, Fourier transforms of densities). Unfortunately, characteristic functions are difficult to relate to metrics like the total variation distance. To remedy this problem, we develop the G-transform and some associated theory in Section 3. An interesting aspect of G-transform theory is that to every distribution we can associate a closely related distribution called its G-conjugate. In fact, the G-conjugate of a Wishart matrix is essentially a generalization of the $t$ distribution to real symmetric matrices. In Section 4, we define and derive several results concerning this new distribution, including a semicircle law. From these results, we derive in Section 5 approximations to the Wishart distribution for middle-scale regimes of every degree. Since these approximations are given using the language of G-transforms, we derive in Section 6 density approximations, from which Theorem 1 follows. We briefly discuss what concrete effects the phase transitions might have on Wishart asymptotics in Section 7. Finally, we compile auxiliary results in Section 8, while we discuss in Section 9 open questions that arise from these results.

Although the results of this paper explain a large part of the behavior of Wishart matrices when $p/n\rightarrow 0$, there exist regimes for which $p/n\rightarrow 0$ yet $p\not\in O(n^{(K+1)/(K+3)})$ for all $K\in\mathbb{N}$, or in other words for which $\lim\limits_{n\rightarrow\infty}\frac{\log p}{\log n}=1$. An example is when $p$ grows at the order $n^{1-1/\sqrt{\log n}}$. Although the results of our paper characterize almost all middle-scale regimes in the sense that among those regimes satisfying $\lim\limits_{n\rightarrow\infty}\frac{\log p}{\log n}\leq 1$, those such that $\lim\limits_{n\rightarrow\infty}\frac{\log p}{\log n}=1$ represent a negligible set, they nonetheless exist. One might regard such regimes as having infinite degree. Beyond this, however, it is difficult to say anything about the behavior of Wishart matrices in these regimes. More work in that direction is clearly needed.

2 Notation and definitions

The transpose of a matrix is denoted by a superscript $t$, and the identity matrix of dimension $p$ is $I_{p}$. As is standard, we take the trace operator to have lower priority than the power operator: thus for a matrix $X$, $\operatorname{tr}X^{k}$ means the trace of $X^{k}$. We will write $\operatorname{tr}^{k}X$ when we mean the $k^{\text{th}}$ power of the trace of $X$. The Kronecker delta is the symbol $\delta_{kl}=\mathbbm{1}\!\left[k=l\right]$.

The space of all real-valued symmetric matrices is denoted $\mathbb{S}_{p}(\mathbb{R})=\{X\in\mathbb{M}_{p}(\mathbb{R})|X=X^{t}\}$ . For a symmetric matrix $X$ , we define the symmetric differentiation operator $\partial_{\text{s}}/\partial_{\text{s}}X_{kl}$ by

[TABLE]

This operator has the elegant property that $\frac{\partial_{\text{s}}}{\partial_{\text{s}}X_{kl}}\operatorname{tr}(XY)=Y_{kl}$ for any two symmetric matrices $X$, $Y$.
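This property can be checked numerically. The sketch below (assuming NumPy, with hypothetical helper names) treats the upper-triangular entries $X_{kl}$, $k\leq l$, as the free coordinates of a symmetric matrix, takes central finite differences in each, and applies the weight $(1+\delta_{kl})/2$ from the definition of the symmetric differentiation operator.

```python
import numpy as np

def trace_xy_from_upper(upper, y):
    """tr(XY), where X is the symmetric matrix built from the upper triangle."""
    x = np.triu(upper) + np.triu(upper, 1).T
    return np.trace(x @ y)

def symmetric_gradient(func, x, eps=1e-6):
    """Apply the symmetric differentiation operator entrywise by finite differences."""
    p = x.shape[0]
    grad = np.zeros((p, p))
    for k in range(p):
        for l in range(k, p):
            step = np.zeros((p, p))
            step[k, l] = eps  # perturb the free coordinate X_kl with k <= l
            plain = (func(x + step) - func(x - step)) / (2.0 * eps)
            weight = 1.0 if k == l else 0.5  # the factor (1 + delta_kl) / 2
            grad[k, l] = grad[l, k] = weight * plain
    return grad

rng = np.random.default_rng(2)
a = rng.standard_normal((3, 3))
b = rng.standard_normal((3, 3))
x = (a + a.T) / 2.0
y = (b + b.T) / 2.0
grad = symmetric_gradient(lambda u: trace_xy_from_upper(u, y), x)
```

Without the weight, the off-diagonal entries would come out as $2Y_{kl}$, because each off-diagonal coordinate appears twice in $\operatorname{tr}(XY)$.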

The space of symmetric matrices $\mathbb{S}_{p}(\mathbb{R})$ can be assimilated to $\mathbb{R}^{p(p+1)/2}$ by mapping a symmetric matrix to its upper triangle. By integration over $\mathbb{S}_{p}(\mathbb{R})$ , we mean integration with respect to the pullback Lebesgue measure under this isomorphism, that is,

[TABLE]

We say a real symmetric matrix follows the Gaussian orthogonal ensemble $\text{GOE}(p)$ distribution if $X_{kl}$ , $k\leq l$ are all independent, with diagonal elements $X_{kk}\sim\text{N}(0,2)$ and off-diagonal elements $X_{kl}\sim\text{N}(0,1)$ .

Let $X$ be an $n\times p$ matrix of i.i.d. $\text{N}(0,1)$ random variables, and let $\Sigma$ be a $p\times p$ positive-definite matrix. The Wishart distribution $\text{W}_{p}(n,\Sigma)$ is the distribution of the random matrix $\Sigma^{\frac{1}{2}}X^{t}X\Sigma^{\frac{1}{2}}$. This is a special case of the matrix gamma distribution. Following Gupta and Nagar (1999, Section 3.6), we say a positive-definite matrix $X$ has a matrix gamma distribution $\text{G}_{p}(\alpha,\Sigma)$ with shape parameter $\alpha>(p-1)/2$ and scale parameter $\Sigma$ if it has density over $\mathbb{S}_{p}(\mathbb{R})$ given by

[TABLE]

where $\Gamma_{p}$ is the multivariate gamma function. With this definition, the Wishart distribution $W_{p}(n,\Sigma)$ is a matrix gamma with shape $\frac{n}{2}$ and scale $2\Sigma$ .

While studying the Wishart distribution, the expression $n-p-1$ comes up so often that it makes sense to give it its own symbol. We will therefore write $m=n-p-1$ .

The Hellinger distance is a metric between absolutely continuous probability measures. For two distributions $F_{1}$ and $F_{2}$ with densities $f_{1}$ and $f_{2}$, their Hellinger distance is defined as

[TABLE]

The Hellinger distance is closely related to the total variation distance by the inequalities

[TABLE]

In particular, $\mathrm{H}(f_{1},f_{2})\rightarrow 0$ if and only if $\mathrm{d}_{\text{TV}}(f_{1},f_{2})\rightarrow 0$ . Thus they can be seen as inducing the same topology on absolutely continuous probability measures, called the strong topology, in contrast to the topology induced by weak convergence of measures called the weak topology. One can show that if a sequence of measures converges in the strong sense (i.e. in the $\mathrm{d}_{\text{TV}}$ or $\mathrm{H}$ metrics), then it converges weakly.
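The two-sided comparison between the Hellinger and total variation distances can be illustrated on a pair of univariate Gaussian densities. The sketch below (assuming NumPy, with arbitrarily chosen parameters) approximates both integrals by Riemann sums and follows the conventions of this section, so the total variation distance carries no factor $1/2$.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

x = np.linspace(-30.0, 30.0, 200001)
dx = x[1] - x[0]
f1 = normal_pdf(x, 0.0, 1.0)
f2 = normal_pdf(x, 0.5, 2.0)

# d_TV(f1, f2) = int |f1 - f2| dx, without a factor 1/2.
d_tv = np.sum(np.abs(f1 - f2)) * dx
# H(f1, f2) = (int |f1^{1/2} - f2^{1/2}|^2 dx)^{1/2}.
hellinger = np.sqrt(np.sum((np.sqrt(f1) - np.sqrt(f2)) ** 2) * dx)

print(0.5 * d_tv, hellinger, np.sqrt(d_tv))
```

The three printed values should appear in increasing order, matching the inequalities above.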

3 G-transforms

Our analysis of Wishart matrices relies heavily on a tool we call the G-transform of a probability measure. To define it, we first need to define the Fourier transform over symmetric matrices.

In Section 2, we clarified what we meant by integration over $\mathbb{S}_{p}(\mathbb{R})$ . For a function $f:\mathbb{S}_{p}(\mathbb{R})\rightarrow\mathbb{C}$ in $L^{1}(\mathbb{S}_{p}(\mathbb{R}))$ , we define its Fourier transform to be

[TABLE]

It is more common to define the Fourier transform on symmetric matrices with the integrand $\exp\big{\{}-i\sum_{k\leq l}T_{kl}X_{kl}\big{\}}$, but choosing $\exp\big{\{}-i\operatorname{tr}(TX)\big{\}}$ considerably simplifies our computations.

We extend this definition to $f\in L^{r}(\mathbb{S}_{p}(\mathbb{R}))$ , $1<r\leq 2$ in the usual manner. Because of the specific normalization chosen, this definition obeys a simple version of Plancherel’s theorem, namely

[TABLE]

We now define the G-transform. In itself, the definition has nothing to do with symmetric matrices and could have been perfectly well defined on any other space endowed with a Fourier transform.

Definition 1 (G-transform of a density).

Let $f$ be an integrable function $\mathbb{S}_{p}(\mathbb{R})\rightarrow\mathbb{C}$ . Its G-transform is the complex-valued function $\mathcal{G}\{f\}:\mathbb{S}_{p}(\mathbb{R})\rightarrow\mathbb{C}$ defined by

[TABLE]

where $z^{1/2}$ stands for the principal branch of the complex square root.

In the same way that the Fourier transform maps $L^{2}(\mathbb{S}_{p}(\mathbb{R}))$ to itself, the G-transform maps $L^{1}(\mathbb{S}_{p}(\mathbb{R}))$ to itself.

By extension, for an absolutely continuous distribution on $\mathbb{S}_{p}(\mathbb{R})$ with density $f$, we will define its G-transform to be the G-transform of its density. (This usage mirrors other transforms, such as the Stieltjes transform.) We will usually denote the G-transform of $f$ by $\psi$. Since a density is integrable, this is always well-defined. Moreover, $f=\mathcal{F}^{-1}\{\psi^{1/2}\}^{2}$, so the density can be recovered from the G-transform, and therefore to understand a distribution it is equivalent to study its density or its G-transform.

Two comments are in order. First, for many densities, $f^{1/2}\in L^{1}(\mathbb{S}_{p}(\mathbb{R}))$ . In this case, the G-transform can be written explicitly as

[TABLE]

Second, throughout this article we will often talk about “the” square root of a G-transform. To be clear, by $\psi^{1/2}$ we will always mean $\mathcal{F}\{f^{1/2}\}$ .

Now, in many ways, the G-transform behaves similarly to the characteristic function (Fourier transform of a density), but it has unique features. First, Plancherel’s theorem yields that

[TABLE]

Thus $|\psi|$ is itself a density, which we will call the G-conjugate of $f$ . (In particular, $\psi^{1/2}$ is much like a quantum-mechanical wavefunction.) We will also use an asterisk notation, so that the G-conjugate of a $\text{N}(0,1)$ distribution will be denoted $\text{N}(0,1)^{*}$ . For example, straightforward computations yield that $\text{N}(0,1)^{*}=\text{N}(0,1/8)$ , $\chi^{2*}_{n}=\frac{1}{\sqrt{8n}}t_{n/2}$ (where $\chi^{2}_{\nu}$ and $t_{\nu}$ are the univariate $\chi^{2}$ and $t$ distributions with $\nu$ degrees of freedom, respectively) and $(aF+b)^{*}=a^{-1}F^{*}$ for any distribution $F$ and scalars $a\not=0$ , $b\in\mathbb{R}$ . Studying the G-conjugate of the Wishart distribution will play a key part in deriving results about the Wishart distribution itself. We should note that, in general, the double G-conjugate $F^{**}$ is not the same as $F$ . For example, $\chi^{2**}_{n}$ is a density involving modified Bessel functions of the first kind, not a $\chi^{2}_{n}$ .

A second feature that distinguishes G-transforms from characteristic functions is that they are easy to relate to the Hellinger distance between probability measures. Consider two densities $f_{1},f_{2}$ with G-transforms $\psi_{1},\psi_{2}$ . By analogy, we could define the “total variation” and “Hellinger” distances of $\psi_{1}$ and $\psi_{2}$ by

[TABLE]

Since the moduli of the G-transforms integrate to one, their total variation and Hellinger distances are related to each other in the same way as in Equation (2.1) for densities, namely

[TABLE]

Thus $\mathrm{d}_{\text{TV}}(\psi_{1},\psi_{2})\rightarrow 0$ if and only if $\mathrm{H}(\psi_{1},\psi_{2})\rightarrow 0$ . But the Hellinger distance between G-transforms is much more useful. Indeed, by the Plancherel theorem, for any two densities $f_{1},f_{2}$ with G-transforms $\psi_{1},\psi_{2}$ , their Hellinger distance satisfies

[TABLE]

Thus to compute the Hellinger distance $\mathrm{H}^{2}(f_{1},f_{2})$ between two densities, we can instead compute the Hellinger distance $\mathrm{H}^{2}(\psi_{1},\psi_{2})$ of their G-transforms. In contrast, there is no explicit way to express the Hellinger distance in terms of characteristic functions. And no such connection exists between the total variation distances of densities and G-transforms.
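The identity between the two Hellinger distances can be checked numerically in the simplest case $p=1$, where the Fourier normalization $2^{p/2}\pi^{p(p+1)/4}$ reduces to $\sqrt{2\pi}$. The sketch below (assuming NumPy, with hypothetical helper names and two arbitrary Gaussian densities) computes $\psi_{j}^{1/2}=\mathcal{F}\{f_{j}^{1/2}\}$ by direct quadrature and compares the two squared distances.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def sqrt_g_transform_1d(density_values, x, t):
    """psi^{1/2} = F{f^{1/2}} for p = 1, normalized by 1 / sqrt(2 pi)."""
    dx = x[1] - x[0]
    kernel = np.exp(-1j * np.outer(t, x))  # e^{-itx} on the (t, x) grid
    return kernel @ np.sqrt(density_values) * dx / np.sqrt(2.0 * np.pi)

x = np.linspace(-30.0, 30.0, 3001)
t = np.linspace(-8.0, 8.0, 801)
dx, dt = x[1] - x[0], t[1] - t[0]
f1 = normal_pdf(x, 0.0, 1.0)
f2 = normal_pdf(x, 0.5, 2.0)

# Squared Hellinger distance between the densities.
h2_densities = np.sum((np.sqrt(f1) - np.sqrt(f2)) ** 2) * dx
# Squared Hellinger distance between the G-transforms.
root1 = sqrt_g_transform_1d(f1, x, t)
root2 = sqrt_g_transform_1d(f2, x, t)
h2_transforms = np.sum(np.abs(root1 - root2) ** 2) * dt

print(h2_densities, h2_transforms)
```

The two printed values should agree up to quadrature error, as Plancherel's theorem predicts.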

The G-transform does have some disadvantages compared to the Fourier transform. It is a non-linear transformation (and therefore not a true transform), and it does not behave well with respect to convolution. For our purposes, however, the advantages listed above outweigh these problems.

In practice, it is not always easy to control the Hellinger distance directly, and one often focuses on the Kullback-Leibler divergence instead. The two quantities are related through the well-known inequality

[TABLE]

For G-transforms, the following analog holds, which clarifies our interest in G-conjugates:

[TABLE]

where $\operatorname{Log}$ stands for the principal branch of the complex logarithm. In fact, in this article we will need a further generalization, where $\psi_{2}$ does not need to be a G-transform of a density.

Proposition 1 (Kullback-Leibler inequality for G-transforms).

Let $\psi_{1}$ be the G-transform of an absolutely continuous distribution $F_{1}$ on $\mathbb{S}_{p}(\mathbb{R})$ , and let $\psi_{2}$ be an integrable function $\mathbb{S}_{p}(\mathbb{R})\rightarrow\mathbb{C}$ . Then

[TABLE]

for $T\sim F_{1}^{*}$ , where $\operatorname{Log}$ stands for the principal branch of the complex logarithm.

Proof.

We can write

[TABLE]

Now use the inequality $-\cos(x)\leq-1+\sqrt{2|x|}$ , which holds for any $x\in\mathbb{R}$ , to bound the last quantity as

[TABLE]

In the second term, use $1-x\leq-\log(x)$ for $x\geq 0$ , while in the third term, use the Cauchy-Schwarz inequality to obtain

[TABLE]

Now use Jensen’s inequality in the second term and the algebraic identity $\exp\{-\Re\operatorname{Log}\psi_{1}(T)/\psi_{2}(T)\}\allowbreak=|\psi_{2}|(T)/|\psi_{1}|(T)$ in the third term to obtain

[TABLE]

as desired. ∎
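The two elementary inequalities used in this proof, $-\cos(x)\leq-1+\sqrt{2|x|}$ and $1-x\leq-\log(x)$ , can be checked numerically on a grid. The sketch below is only a sanity check on a finite range (the grid bounds and tolerance are arbitrary choices of ours), not a proof.

```python
import numpy as np

def check_inequalities(num=200001, tol=1e-12):
    """Check both elementary inequalities on finite grids.

    Returns a pair of booleans (cos_ok, log_ok)."""
    # -cos(x) <= -1 + sqrt(2|x|) for real x, tested on [-50, 50].
    x = np.linspace(-50.0, 50.0, num)
    cos_ok = np.all(-np.cos(x) <= -1.0 + np.sqrt(2.0 * np.abs(x)) + tol)
    # 1 - y <= -log(y), tested for y in (0, 50].
    y = np.linspace(1e-6, 50.0, num)
    log_ok = np.all(1.0 - y <= -np.log(y) + tol)
    return bool(cos_ok), bool(log_ok)
```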

Let us now compute the G-transform of the Gaussian Orthogonal Ensemble and the normalized Wishart distribution, which will be needed in our proofs. The density of a $\text{GOE}(p)$ matrix over $\mathbb{S}_{p}(\mathbb{R})$ is

[TABLE]

To compute its G-transform, we will make use of the fact that the elements of a $\text{GOE}(p)$ matrix are independent to reduce the expression to a product of characteristic functions.

Proposition 2.

The G-transform of the Gaussian Orthogonal Ensemble density on $\mathbb{S}_{p}(\mathbb{R})$ is

[TABLE]

Proof.

From Equation (3.9), $f_{\text{GOE}}^{1/2}$ is proportional to the density of the $\sqrt{2}\,\text{GOE}(p)$ distribution, so it is integrable. Therefore, we can apply Equation (3.3) to find that

[TABLE]

for $Z\sim\text{N}(0,1)$ . The characteristic function of a $\text{N}(0,1)$ is $\exp(-t^{2}/2)$ , so

[TABLE]

Squaring this result yields the desired expression for $\psi_{\text{GOE}}$ . ∎

In particular, we see that $|\psi_{\text{GOE}}|$ is the density of a $\text{GOE}(p)/4$ distribution, or in other words that $\text{GOE}(p)^{*}=\text{GOE}(p)/4$ . Thus the Gaussian Orthogonal Ensemble is its own G-conjugate, up to a constant factor.

Let us now compute the G-transform of the normalized Wishart distribution. Unlike the $\text{GOE}(p)$ case, the elements of the matrix are not independent, but the elements of its Cholesky decomposition are. By being careful about complex changes of variables, we can reduce the computation of the G-transform to the computation of characteristic functions of the Cholesky elements.

Proposition 3.

Let $n\geq p-2$ . Then the G-transform of the normalized Wishart distribution $\sqrt{n}[\text{W}_{p}(n,I_{p}/n)-I_{p}]$ density on $\mathbb{S}_{p}(\mathbb{R})$ is given by

[TABLE]

Proof.

Recall the notation $m=n-p-1$ used throughout this article. The density of a $Y\sim\text{W}_{p}(n,I_{p}/n)$ distribution is

[TABLE]

If we do a change of variables $X=\sqrt{n}(Y-I_{p})$ , so that $Y=I_{p}+X/\sqrt{n}$ and

[TABLE]

we see that the normalized Wishart distribution $\sqrt{n}[\text{W}_{p}(n,I_{p}/n)-I_{p}]$ has density

[TABLE]

Notice that $f_{\text{W}}^{1/2}$ is proportional to $\exp\{-\operatorname{tr}\!\left([\frac{4}{n}I_{p}]^{-1}Y\right)\}|Y|^{\frac{n+p+1}{4}-\frac{p+1}{2}}$ , so it must be proportional to the density of a matrix gamma distribution $\text{G}_{p}\left(\frac{n+p+1}{4},\frac{4}{n}I_{p}\right)$ when $\frac{n+p+1}{4}>\frac{p-1}{2}$ , i.e. $n\geq p-2$ . In particular, it must be integrable. As $f_{\text{NW}}$ was obtained by a linear change of variables from $f_{\text{W}}$ , $f_{\text{NW}}^{1/2}$ must be integrable too, that is

[TABLE]

Therefore, we can apply Equation (3.3) to obtain

[TABLE]

If we rewrite the expectation in terms of $Y=I_{p}+X/\sqrt{n}$ , this last expression equals

[TABLE]

Since $T$ is real symmetric, there must be a spectral decomposition $T=ODO^{t}$ with $O$ real orthogonal and $D$ real diagonal. As $O^{t}YO$ has the same distribution as $Y$ , namely $\text{W}_{p}(n,I_{p}/n)$ , we can rewrite Equation (3.12) as

[TABLE]

Now, since $Y$ is positive-definite it has a Cholesky decomposition $Y=U^{t}U$ with $U$ upper-triangular. According to Bartlett’s theorem (see Muirhead, 1982, Theorem 3.2.14), all the elements of $U$ are independent, the diagonal elements have the distribution $U_{kk}^{2}\sim\chi_{n-k+1}^{2}/n$ and the upper diagonal elements have $U_{kl}\sim\text{N}(0,1/n)$ for $k<l$ . Since

[TABLE]

and $|Y|=\prod_{k=1}^{p}U^{2}_{kk}$ , we have by independence and Equation (3.13) that

[TABLE]
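The Bartlett decomposition used above also gives a direct way to simulate $Y\sim\text{W}_{p}(n,I_{p}/n)$ . The following is a minimal sketch, assuming $n\geq p$ so that all chi-square degrees of freedom are positive; the function name and the use of NumPy are our choices and not part of the argument.

```python
import numpy as np

def sample_wishart_bartlett(n, p, rng):
    """Draw Y ~ W_p(n, I_p/n) as Y = U^t U with U upper-triangular.

    Diagonal: U_kk^2 ~ chi^2_{n-k+1}/n for k = 1..p.
    Strictly upper triangle: U_kl ~ N(0, 1/n) for k < l.
    All entries of U are drawn independently."""
    u = np.zeros((p, p))
    dof = n - np.arange(p)  # n, n-1, ..., n-p+1
    u[np.diag_indices(p)] = np.sqrt(rng.chisquare(dof) / n)
    upper = np.triu_indices(p, k=1)
    u[upper] = rng.standard_normal(len(upper[0])) / np.sqrt(n)
    return u.T @ u
```

Averaging many draws should give a matrix close to $I_{p}$ , since $\operatorname{E}[Y]=I_{p}$ for this normalization.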

We will now compute these expected values in several steps. For a given $1\leq k\leq p$ , let

[TABLE]

Since $U^{2}_{kk}\sim\chi^{2}_{n-k+1}/n$ and $m=n-p-1$ , we have

[TABLE]

Consider the truncated integrands

[TABLE]

Clearly this sequence is dominated by the integrable positive function $h$ ,

[TABLE]

Therefore, by the Dominated Convergence Theorem and Equation (3.16),

[TABLE]

By the change of variables $z=\big{(}\frac{n}{4}+\sqrt{n}D_{kk}i\big{)}x$ , this can be rewritten

[TABLE]

To compute this integral, we use a contour argument. Consider the closed path $C=C_{1}+C_{2}+C_{3}$ given by $C_{1}$ a path from $0$ to $\frac{nM}{4}$ , $C_{2}$ a path from $\frac{nM}{4}$ to $\frac{nM}{4}+\sqrt{n}D_{kk}Mi$ and finally $C_{3}$ a path from $\frac{nM}{4}+\sqrt{n}D_{kk}Mi$ to 0. A diagram is provided as Figure 2.

As $k\leq p$ , $z\mapsto e^{-z}z^{\frac{n-2k+p+3}{4}-1}$ is entire and its integral over $C$ must be zero. Therefore

[TABLE]

Do a change of variables $z=\frac{nM}{4}+y\sqrt{n}D_{kk}Mi$ , so that $y=\frac{z-\frac{nM}{4}}{\sqrt{n}D_{kk}Mi}$ is real on the path. It yields

[TABLE]

This last integral is finite, since the integrand is continuous on a bounded interval. Therefore the limit is zero and by Equation (3.17) and the previous expression,

[TABLE]
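The contour argument above is consistent with the classical identity $\int_{0}^{\infty}e^{-(a+ib)x}x^{s-1}\,dx=\Gamma(s)\,(a+ib)^{-s}$ for $a>0$ and $s>0$ , with the principal branch of the power; in our setting $a=n/4$ , $b=\sqrt{n}D_{kk}$ and $s=\frac{n-2k+p+3}{4}$ . The sketch below checks this identity numerically with a plain trapezoid rule. The truncation point, the grid size and the function names are illustrative choices of ours.

```python
import math
import numpy as np

def laplace_power_integral(a, b, s, upper=60.0, steps=200_000):
    """Trapezoid approximation of int_0^upper exp(-(a+ib)x) x^(s-1) dx.

    Assumes a > 0 and s >= 1 so that the integrand is bounded at 0 and
    the tail beyond `upper` is negligible."""
    x = np.linspace(0.0, upper, steps + 1)
    h = upper / steps
    vals = np.exp(-(a + 1j * b) * x) * x ** (s - 1.0)
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

def closed_form(a, b, s):
    """Gamma(s) (a+ib)^(-s), using the principal branch of the complex power."""
    return math.gamma(s) * complex(a, b) ** (-s)
```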

Going back to (3.14), let us now consider the expectations in the second products. For fixed $1\leq l<k\leq p$ , let

[TABLE]

Since $U_{lk}^{2}\sim\chi^{2}_{1}/n,$

[TABLE]

Consider the truncated integrands

[TABLE]

We see that they are dominated by a positive, integrable function $h(x)$ ,

[TABLE]

Therefore, by the Dominated Convergence Theorem and Equation (3.20), we conclude that

[TABLE]

A complex change of variables $z=\big{(}\frac{n}{4}+i\sqrt{n}D_{kk}\big{)}x$ yields

[TABLE]

Let’s compute this integral again using a contour integration argument. Consider the contour $C=C_{1}+C_{2}+C_{3}+C_{4}$ given by $C_{1}$ a line from $\frac{n}{4M}$ to $\frac{nM}{4}$ , $C_{2}$ a line from $\frac{nM}{4}$ to $\frac{nM}{4}+\sqrt{n}D_{kk}Mi$ , $C_{3}$ a line from $\frac{nM}{4}+\sqrt{n}D_{kk}Mi$ to $\frac{n}{4M}+\frac{\sqrt{n}D_{kk}}{M}i$ and $C_{4}$ a line from $\frac{n}{4M}+\frac{\sqrt{n}D_{kk}}{M}i$ to $\frac{n}{4M}$ . A diagram is provided as Figure 3.

Since $z\mapsto e^{-z}/\sqrt{z}$ is holomorphic away from zero,

[TABLE]

By changes of variables $z=\frac{nM}{4}+y\sqrt{n}D_{kk}Mi$ and $z=\frac{n}{4M}+y\frac{\sqrt{n}D_{kk}}{M}i$ on the two respective integrals, we get

[TABLE]

Since $\Big{|}\frac{n}{4}+y\sqrt{n}D_{kk}i\Big{|}^{-\frac{1}{2}}=\Big{(}\frac{n^{2}}{16}+y^{2}nD^{2}_{kk}\Big{)}^{-\frac{1}{4}}$ is continuous on $[0,1]$ , a bounded interval, we conclude that the integrals are finite and that the limits are zero. Therefore, by Equation (3.22),

[TABLE]

Recall the definitions of A and B at Equations (3.15) and (3.19). Combining both Equations (3.18) and (3.23) into the expression for $\psi^{1/2}_{\text{NW}}$ at Equation (3.14) provides

[TABLE]

But by Muirhead (1982, Theorem 2.1.12),

[TABLE]

so by taking a factor of $n/4$ out of the determinant, we find that

[TABLE]

Squaring this result yields the desired expression for $\psi_{\text{NW}}$ . ∎

By Proposition 3, when $n\geq p-2$ the G-conjugate of a normalized Wishart distribution must have a density on $\mathbb{S}_{p}(\mathbb{R})$ given by

[TABLE]

As mentioned in the paragraph following Equation (3), the G-conjugate of a $\chi^{2}_{n}/n$ distribution is a scaled $t_{n/2}$ . Thus, by analogy, Equation (3.24) should represent some kind of generalization of the $t$ distribution to the real symmetric matrices. Matrix-variate generalizations of the $t$ distribution have been investigated in the past, but not for symmetric matrices. Hence it appears the concept is new.

This motivates us to propose in Section 4 a candidate for a symmetric matrix variate $t$ distribution. Using that definition, the G-conjugate to the normalized Wishart could then be regarded as the $t$ distribution with $n/2$ degrees of freedom and scale matrix $I_{p}/8$ , which we denote $T_{n/2}(I_{p}/8)$ . But regardless of its name, this distribution will play a key role in our results about the middle-scale regime asymptotics of Wishart matrices, and will be investigated in depth in Section 4.

4 The symmetric matrix variate $t$ distribution

In Section 3, Equation (3.24), we proved that when $n\geq p-2$ , the G-conjugate of the normalized Wishart distribution $\sqrt{n}[\text{W}_{p}(n,I_{p}/n)-I_{p}]$ has density on $\mathbb{S}_{p}(\mathbb{R})$ given by

[TABLE]

Two remarks are in order. First, we are unaware of any matrix calculus tools that could let us integrate this expression directly. Thus, the mere fact that this expression integrates to unity, a consequence of being the G-conjugate of another distribution, seems remarkable.

Second, when $p=1$ , this is the $t_{n/2}/\sqrt{8}$ distribution. Thus, as we mentioned while discussing Equation (3.24), it is natural to interpret this distribution as the parametrization of some generalization of the $t$ distribution to $\mathbb{S}_{p}(\mathbb{R})$ , the space of real-valued symmetric matrices. The purpose of this section is to propose a candidate definition for such a generalization, as well as prove several results concerning the normalized Wishart G-conjugate.
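The $p=1$ claim can be verified by hand. The following is a sketch of the computation, assuming (as in the proofs of Propositions 2 and 3) that the G-transform is, up to normalization, the square of the Fourier transform of $f^{1/2}$ ; the sign convention of the Fourier transform does not affect the modulus, and constants are tracked only up to proportionality.

```latex
% p = 1: Y ~ chi^2_n / n has density f_W(y) \propto y^{n/2-1} e^{-ny/2} on y > 0,
% so f_W^{1/2}(y) \propto y^{n/4-1/2} e^{-ny/4}.
\begin{align*}
\int_0^\infty e^{-ity}\, y^{\frac{n}{4}-\frac{1}{2}}\, e^{-\frac{n}{4}y}\,dy
  &= \Gamma\!\Big(\frac{n}{4}+\frac{1}{2}\Big)
     \Big(\frac{n}{4}+it\Big)^{-\frac{n}{4}-\frac{1}{2}}, \\
\Big|\frac{n}{4}+it\Big|^{-\frac{n}{2}-1}
  &\propto \Big(1+\frac{16\,t^{2}}{n^{2}}\Big)^{-\frac{n/2+1}{2}} .
\end{align*}
% The normalization X = \sqrt{n}(Y-1) replaces t by \sqrt{n}\,t, up to a unimodular phase, so
\[
|\psi_{\mathrm{NW}}(t)|
  \propto \Big(1+\frac{16\,t^{2}}{n}\Big)^{-\frac{n/2+1}{2}}
  = \Big(1+\frac{(\sqrt{8}\,t)^{2}}{\nu}\Big)^{-\frac{\nu+1}{2}},
  \qquad \nu=\frac{n}{2},
\]
% which is proportional to the density of S/\sqrt{8} for S ~ t_\nu.
```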

To the best of our knowledge, no extension of the $t$ distribution to symmetric matrices has ever been proposed. However, a non-symmetric matrix variate $t$ distribution has been thoroughly investigated in the literature – see Gupta and Nagar (1999, Chapter 4) for a thorough summary. Several definitions exist. For our purposes, we say that a $p\times q$ real-valued random matrix $T$ has the matrix variate $t$ distribution with $\nu$ degrees of freedom and $q\times q$ positive-definite scale matrix $\Omega$ if it has density

[TABLE]

It is not exactly clear what should be the proper analog of this distribution for symmetric matrices. But it would be elegant if the degrees of freedom of Equation (4.1) were to be exactly $n/2$ , as in the univariate case. Thus, the following definition seems natural.

Definition 2 (Symmetric matrix variate $t$ distribution).

We say a real symmetric $p\times p$ matrix $T$ has the symmetric matrix variate $t$ distribution with $\nu\geq p/2-1$ degrees of freedom and $p\times p$ positive-definite scale matrix $\Omega$ , denoted $T_{\nu}(\Omega)$ , if it has density

[TABLE]

With this definition, the G-conjugate to the normalized Wishart distribution, whose density is given by Equation (4.1), is the $T_{n/2}(I_{p}/8)$ distribution on $\mathbb{S}_{p}(\mathbb{R})$ .

In fact, since Equation (4.1) integrates to one, we can deduce the normalization constant of Definition 2. For an arbitrary degrees of freedom parameter $\nu$ , imagine the density $|\psi_{\text{NW}}|$ of the G-conjugate of a normalized Wishart distribution with $n=2\nu\geq p-2$ . By virtue of being a G-conjugate, it must integrate to unity. Then from the change of variables $T=\Omega^{-\frac{1}{4}}S\Omega^{-\frac{1}{4}}/\sqrt{8}$ which has Jacobian $dT=8^{-\frac{p(p+1)}{4}}|\Omega|^{-\frac{p+1}{4}}dS$ , we see that

[TABLE]

Thus we must have

[TABLE]

It would be interesting to see if this distribution satisfies the properties we would expect of a $t$ distribution, to ensure our guess is the “correct” one. However, this would take us too far away from the topic of this article. Instead, we will focus in the rest of this section on proving results about $T_{n/2}(I_{p}/8)$ , the G-conjugate to the normalized Wishart distribution.

Our first result will concern the asymptotic expansion of its normalization constant. We mention that this constant is the same as the $C_{n,p}$ term appearing in the expression of the G-transform of the normalized Wishart in Proposition 3.

Lemma 1.

The normalization constant of the $T_{n/2}(I_{p}/8)$ distribution

[TABLE]

has, for every $K\in\mathbb{N}$ , the asymptotic expansion

[TABLE]

as $n\rightarrow\infty$ with $p/n\rightarrow 0$ .

Proof.

By Stirling’s approximation applied to $\log\Gamma$ , as well as Muirhead (1982, Theorem 2.1.12), we find that

[TABLE]

as $x\rightarrow\infty$ . Thus

[TABLE]

as $n\rightarrow\infty$ with $p/n\rightarrow 0$ , and so by Equation (4.3),

[TABLE]

Let us now focus on the two sums in this expression. Recall that for any $k\geq 1$ ,

[TABLE]

even for negative $x$ . Therefore,

[TABLE]

Now let $B_{k}$ denote the Bernoulli numbers, with the convention $B_{1}=\frac{1}{2}$ . Faulhaber’s formula provides

[TABLE]
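With the convention $B_{1}=\frac{1}{2}$ , Faulhaber's formula reads $\sum_{i=1}^{N}i^{k}=\frac{1}{k+1}\sum_{j=0}^{k}\binom{k+1}{j}B_{j}N^{k+1-j}$ . A minimal sketch in exact rational arithmetic, with function names of our choosing, is as follows.

```python
from fractions import Fraction
from math import comb

def bernoulli_plus(n_max):
    """Bernoulli numbers B_0..B_{n_max} with the convention B_1 = +1/2."""
    b = [Fraction(1)]
    for m in range(1, n_max + 1):
        # Standard recursion (which yields B_1 = -1/2):
        # sum_{j=0}^{m} C(m+1, j) B_j = 0.
        s = sum(comb(m + 1, j) * b[j] for j in range(m))
        b.append(-s / (m + 1))
    if n_max >= 1:
        b[1] = Fraction(1, 2)  # switch to the B_1 = +1/2 convention
    return b

def faulhaber_sum(n, k):
    """Return sum_{i=1}^{n} i^k computed through Faulhaber's formula."""
    b = bernoulli_plus(k)
    total = sum(comb(k + 1, j) * b[j] * Fraction(n) ** (k + 1 - j) for j in range(k + 1))
    return total / (k + 1)
```

The result can be compared against the direct sum `sum(i**k for i in range(1, n + 1))` for small `n` and `k`.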

But by the binomial theorem, $\frac{(p-1)^{k+2}}{n^{k}}=\frac{p^{k+2}}{n^{k}}-(k+2)\frac{p^{k+1}}{n^{k}}+o(1)$ , $\frac{(p-1)^{k+1}}{n^{k}}=\frac{p^{k+1}}{n^{k}}+o(1)$ and $\frac{(p-1)^{l}}{n^{k}}=o(1)$ for any $1\leq l\leq k$ . Thus

[TABLE]

Using that $B_{0}=1$ and $B_{1}=\frac{1}{2}$ , we obtain

[TABLE]

The analysis of the other sum of Equation (4.4) is similar but more involved, as we must distinguish the cases where $p$ is even and where $p$ is odd. We find, from Equation (4.5) again, that

[TABLE]

At this point, it is simpler to analyze the cases where $p$ is even and odd separately. If $p$ is odd, define $q=(p-3)/2$ and observe that by Faulhaber’s formula,

[TABLE]

By the binomial theorem, $\frac{(2q)^{k+2}}{n^{k}}=\frac{p^{k+2}}{n^{k}}-3(k\!+\!2)\frac{p^{k+1}}{n^{k}}+o(1)$ , $\frac{(2q)^{k+1}}{n^{k}}=\frac{p^{k+1}}{n^{k}}+o(1)$ and $\frac{(2q)^{l}}{n^{k}}=o(1)$ for $1\leq l\leq k$ . Moreover,

[TABLE]

and

[TABLE]

Thus, for odd $p$ , Equation (4.7) equals

[TABLE]

Moreover,

[TABLE]

Thus,

[TABLE]

When $p$ is even, let $q=(p-2)/2$ and observe that by Faulhaber’s formula,

[TABLE]

But by the binomial theorem, $\frac{(2q)^{k+2}}{n^{k}}=\frac{p^{k+2}}{n^{k}}-2(k\!+\!2)\frac{p^{k+1}}{n^{k}}+o(1)$ , $\frac{(2q)^{k+1}}{n^{k}}=\frac{p^{k+1}}{n^{k}}+o(1)$ and $\frac{(2q)^{l}}{n^{k}}=o(1)$ for $1\leq l\leq k$ . If we apply Equations (4.8)–(4.9), then Equation (4.7) becomes

[TABLE]

Moreover,

[TABLE]

Thus again,

[TABLE]

which is the exact same result as in the odd $p$ case (see Equation (4.10)). Plugging Equations (4.6) and (4.10)–(4.11) in Equation (4.4), we obtain

[TABLE]

as desired. ∎

Thus the constant $C_{n,p}$ is closely related to the normalization constant of the $\text{GOE}(p)$ distribution $2^{p(3p+1)/4}/\pi^{p(p+1)/2}$ .

We now turn our attention to the study of the asymptotic moments of a $T_{n/2}(I_{p}/8)$ distribution. We first remind the reader of some classic results. For a Gaussian Orthogonal Ensemble matrix $Z\sim\text{GOE}(p)$ , a moment-based approach to Wigner’s theorem states that for any $k\in\mathbb{N}$ , its $k^{\text{th}}$ moment satisfies

[TABLE]

where $C_{k}=\frac{1}{k+1}\binom{2k}{k}$ is the $k^{\text{th}}$ Catalan number. In fact, Anderson et al. (2010, section 2.1.4 on p.17) show that the variance of the $k^{\text{th}}$ moment satisfies $\lim\limits_{p\rightarrow\infty}\text{Var}\big{[}\frac{1}{p}\operatorname{tr}(Z/\sqrt{p})^{k}\big{]}=0$ , so we really have

[TABLE]

as $p\rightarrow\infty$ .
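These classical limits are easy to observe in simulation. The sketch below assumes the normalization in which the off-diagonal entries of a $\text{GOE}(p)$ matrix are $\text{N}(0,1)$ and the diagonal entries are $\text{N}(0,2)$ ; under this assumption the normalized odd moments tend to zero and the $2k^{\text{th}}$ normalized moment tends to the Catalan number $C_{k}$ . The function names are ours.

```python
from math import comb
import numpy as np

def sample_goe(p, rng):
    """Symmetric Gaussian matrix: off-diagonal variance 1, diagonal variance 2
    (assumed GOE normalization)."""
    a = rng.standard_normal((p, p))
    return (a + a.T) / np.sqrt(2.0)

def normalized_moment(z, k):
    """Compute (1/p) tr (Z / sqrt(p))^k from the eigenvalues of Z."""
    p = z.shape[0]
    eig = np.linalg.eigvalsh(z / np.sqrt(p))
    return float(np.mean(eig**k))

def catalan(k):
    """k-th Catalan number C_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)
```

For a single draw with $p$ in the hundreds, `normalized_moment(z, 2 * k)` should already be close to `catalan(k)` for small `k`, in line with the vanishing variance noted above.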

Now, what do we know about the moments of $T_{n/2}(I_{p}/8)$ ? By symmetry, $\operatorname{E}\!\left[\operatorname{tr}T^{k}\right]=0$ for odd $k$ , but it is much less clear what happens for even $k$ . It turns out that in many ways, if $T\sim T_{n/2}(I_{p}/8)$ then $4T\sim T_{n/2}(2I_{p})$ mimics the Gaussian Orthogonal Ensemble results outlined above, especially when $p/n\rightarrow 0$ as $n\rightarrow\infty$ . We have the following result.

Theorem 2.

Let $k\in\mathbb{N}$ and $T\sim T_{n/2}(I_{p}/8)$ . If $p/n\rightarrow c\in[0,1)$ , the moments of $T$ satisfy the asymptotic bounds $\operatorname{E}\!\left[\operatorname{tr}T^{2k}\right]=O(p^{k+1})$ and $\operatorname{E}\!\left[\operatorname{tr}^{2}T^{k}\right]=O(p^{k+2})$ as $n\rightarrow\infty$ . In fact, for any $k\in\mathbb{N}$ ,

[TABLE]

as $n,p\rightarrow\infty$ with $p/n\rightarrow 0$ , where $C_{k}=\frac{1}{k+1}\binom{2k}{k}$ is the $k^{\text{th}}$ Catalan number.

Although our proof will rely on the close relationship between the Wishart and the $t$ distribution, it is worthwhile to step back and think why a $T_{n/2}(2I_{p})$ should behave like a $\text{GOE}(p)$ when $p/n\rightarrow 0$ . One good reason might be the classic result that as $n\rightarrow\infty$ , the density of a $t$ distribution converges pointwise to a standard normal density. Thus, we might think that as long as $p$ does not grow too fast, in some aspects the symmetric $t$ distribution should behave like a $\text{GOE}(p)$ .

In the context of the proof, it will prove useful to use the notion of power sum symmetric polynomials. For any integer partition $\kappa=(\kappa_{1},\dots,\kappa_{q})$ in decreasing order $\kappa_{1}\geq\dots\geq\kappa_{q}>0$ , define its associated power sum polynomial to be

[TABLE]

The norm of the partition $\kappa$ is $|\kappa|=\kappa_{1}+\dots+\kappa_{q}>0$ , which should not be confused with its length $q(\kappa)=q$ (number of elements).

By convention, we will assume there also exists an empty partition $\varnothing=()$ with length $q(\varnothing)=0$ , norm $|\varnothing|=0$ and power sum polynomial $r_{\varnothing}(Z)=1$ .
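For concreteness, power sum polynomials can be evaluated numerically as follows. This is a minimal sketch assuming the usual convention $r_{\kappa}(Z)=\prod_{i=1}^{q}\operatorname{tr}(Z^{\kappa_{i}})$ , with the empty product equal to one for the empty partition; the function name is hypothetical.

```python
import numpy as np

def power_sum(kappa, z):
    """Evaluate r_kappa(Z) = prod_i tr(Z^{kappa_i}) for a square matrix Z.

    `kappa` is a tuple of positive integers; the empty tuple gives 1."""
    result = 1.0
    for part in kappa:
        result *= np.trace(np.linalg.matrix_power(z, part))
    return result
```

For example, with this convention `power_sum((2,), z)` is $\operatorname{tr}Z^{2}$ and `power_sum((1, 1), z)` is $\operatorname{tr}^{2}Z$ .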

Let’s now turn to the proof of the theorem. The odd moments of the $T_{n/2}(I_{p}/8)$ distribution are zero by symmetry, so it makes sense to focus on the even moments $\operatorname{E}\!\left[\operatorname{tr}T^{2k}\right]$ and the square moments $\operatorname{E}\!\left[\operatorname{tr}^{2}T^{k}\right]$ . Our first step in the proof is to express these in terms of expectations of power sum polynomials of an inverse Wishart $Y^{-1}\sim\text{W}^{-1}_{p}(n,I_{p}/n)$ , where by power sum polynomials we mean expressions like those in Equation (4.12). Recall the useful shorthand $m=n-p-1$ .

Lemma 2.

Let $T\sim\text{T}_{n/2}(I_{p}/8)$ . Then for any $k\in\mathbb{N}$ , whenever $n$ is large enough so that $n\geq p+16k+6$ , we can compute the $2k^{\text{th}}$ moment of $T$ by

[TABLE]

and its squared $k^{\text{th}}$ moment by

[TABLE]

for $Y^{-1}\sim\text{W}^{-1}_{p}(n,I_{p}/n)$ and some $b^{(1)}_{\kappa}$ , $b^{(2)}_{\kappa}$ . These $b^{(1)}_{\kappa}$ , $b^{(2)}_{\kappa}$ are polynomials in $n,m,p$ , indexed by integer partitions $\kappa$ , whose degrees satisfy $\mathrm{deg}\,b^{(1)}_{\kappa}\leq 2k+1-q(\kappa)$ and $\mathrm{deg}\,b^{(2)}_{\kappa}\leq 2k+2-q(\kappa)$ . The sums are taken over all partitions of the integers $\kappa$ satisfying $|\kappa|\leq 2k$ and $|\kappa|\leq 2k+1$ respectively, including the empty partition.

Proof.

Let $f_{\text{NW}}$ and $\psi_{\text{NW}}$ stand for the density and the G-transform of a normalized Wishart matrix $\sqrt{n}[\text{W}_{p}(n,I_{p}/n)-I_{p}]$ . In the proof of Proposition 3, we concluded at Equation (3.11) that $f^{1/2}_{\text{NW}}$ had to be integrable when $n\geq p-2$ , as its integral was proportional to a multivariate gamma function. Let $R(X)=-X$ be the flip operator. Since $f^{1/2}_{\text{NW}}$ is integrable, $f^{1/2}_{\text{NW}}\circ R$ must be integrable as well, and so their convolution $f^{1/2}_{\text{NW}}\star\big{(}f^{1/2}_{\text{NW}}\circ R\big{)}$ is well-defined and integrable.

At Equation (3.1) at the start of Section 3, we defined our notion of Fourier transform for integrable functions on $\mathbb{S}_{p}(\mathbb{R})$ . Define the map $\iota:\mathbb{S}_{p}(\mathbb{R})\rightarrow\mathbb{R}^{p(p+1)/2}$ that maps a symmetric matrix to its vectorized upper triangle, and let $\tau:\mathbb{S}_{p}(\mathbb{R})\rightarrow\mathbb{S}_{p}(\mathbb{R})$ be the map

[TABLE]

Then in terms of the usual Fourier transform on $\mathbb{R}^{p(p+1)/2}$ ,

[TABLE]

This close relationship transfers properties to our Fourier transform on $\mathbb{S}_{p}(\mathbb{R})$ . We will need three.

1. For any integrable function $f$ , we have $\mathcal{F}\big{\{}\overline{f\circ R}\big{\}}=\overline{\mathcal{F}\{f\}}$ .

2. (Convolution) For any two integrable functions $f_{1}$ and $f_{2}$ , we have $\mathcal{F}\{f_{1}\star f_{2}\}=2^{\frac{p}{2}}\pi^{\frac{p(p+1)}{4}}\mathcal{F}\{f_{1}\}\mathcal{F}\{f_{2}\}$ .

3. (Fourier inversion) For any continuous integrable $f$ with integrable Fourier transform $\phi$ , we have

[TABLE]

for all $X\in\mathbb{S}_{p}(\mathbb{R})$ .

These properties are important for the following. Since $f_{\text{NW}}$ is real-valued, properties 1 and 2 provide

[TABLE]

But then, since $|\psi_{\text{NW}}|$ is integrable (in fact, to unity), the Fourier inversion formula yields that

[TABLE]

Thus we might say the characteristic function of the $T_{n/2}(I_{p}/8)$ distribution is given by $f^{1/2}_{\text{NW}}\star\big{(}f^{1/2}_{\text{NW}}\circ R\big{)}$ . It is well known that the derivatives of the characteristic function of a distribution evaluated at zero provide its moments, up to a constant. This suggests we should try to repeatedly differentiate $f^{1/2}_{\text{NW}}\star\big{(}f^{1/2}_{\text{NW}}\circ R\big{)}$ at zero to compute $\operatorname{E}\!\left[\operatorname{tr}T^{2k}\right]$ and $\operatorname{E}\!\left[\operatorname{tr}^{2}T^{k}\right]$ , our ultimate goal.

Unfortunately, the convolution is given by an integral whose domain makes it difficult to directly interchange the differentiation and integration symbols. Because the integrand is orthogonally invariant, we found it easier to compute the derivatives at zero by taking a limit over a sequence of decreasing positive-definite matrices at both sides instead. In this spirit, define on the open set $\{0<X<I_{p}\}\subset\mathbb{S}_{p}(\mathbb{R})$ the real-valued functions

[TABLE]

and

[TABLE]

for fixed $k$ , $p$ and $n$ . Here $\frac{\partial_{\text{s}}}{\partial_{\text{s}}X_{ij}}$ stands for the symmetric differentiation operator $\frac{\partial_{\text{s}}}{\partial_{\text{s}}X_{ij}}=\frac{1+\delta_{ij}}{2}\frac{\partial}{\partial X_{ij}}$ , as defined in Section 2. The $\sqrt{n}$ scaling in the argument helps link the convolution to an expectation with respect to an inverse Wishart distribution.

Let’s first relate these functions to the moments of the $T_{n/2}(I_{p}/8)$ distribution. The symmetric differentiation operator has the pleasant property that $\frac{\partial_{\text{s}}}{\partial_{\text{s}}X_{ij}}\operatorname{tr}(XT)=T_{ij}$ for any two symmetric matrices $X$ , $T$ . Thus, for any $1\leq l\leq 2k$ and indices $1\leq i_{1},\dots,i_{2l}\leq p$ , we find that

[TABLE]

for all $X\in\mathbb{S}_{p}(\mathbb{R})$ .

We now show that the right-hand side of Equation (4.15) is integrable. This is not a mere formality: when $p=1$ , asking if this expression is integrable is the same as asking if the $t$ distribution with $n/2$ degrees of freedom has an $l^{\text{th}}$ moment, and it is well-known that the $t$ distribution only possesses moments of order smaller than its degrees of freedom. So the answer is most likely to be positive, but only for $n$ large enough.

Let us see why. For any symmetric matrix $T$ ,

[TABLE]

where $\lambda_{1}(T^{2})\geq\dots\geq\lambda_{p}(T^{2})\geq 0$ are the ordered eigenvalues of the positive-semidefinite matrix $T^{2}$ . Thus

[TABLE]

When $n-2l\geq p-2$ , the last integrand is the density of a $T_{n/2-l}(I_{p}/8)$ distribution, so integrates to unity. Thus, when $n\geq p+4k-2$ , the right hand side of Equation (4.16) is an integrable function for all $1\leq l\leq 2k$ and $1\leq i_{1},\dots,i_{2l}\leq p$ . By Equation (4.15), and repeated differentiation under the integral sign justified by the integrability bounds given by Equations (4.16) and (4.17), we find that

[TABLE]

and

[TABLE]

for any $X\in\mathbb{S}_{p}(\mathbb{R})$ and any $n\geq p+4k-2$ .

Now let’s relate $H_{1}$ and $H_{2}$ to the definition of $f^{1/2}_{\text{NW}}\!\star\!\big{(}f^{1/2}_{\text{NW}}\!\circ\!R\big{)}$ as a convolution. This is where restricting $H_{1}$ and $H_{2}$ to small positive-definite matrices becomes useful. By Equation (3.10), the expression is

[TABLE]

using the change of variables $Y=I_{p}+Z/\sqrt{n}-X$ with $dZ=n^{\frac{p(p+1)}{4}}dY$ . For $X>0$ , we have $\mathbbm{1}\!\left[Y+X>0,Y>0\right]=\mathbbm{1}\!\left[Y>0\right]$ , and thus $H_{1}$ , $H_{2}$ satisfy

[TABLE]

and

[TABLE]

We would now like to interchange the integral and differentiation signs. To do so, we must understand what the repeated derivatives of $\exp\big\{-\frac{n}{4}\operatorname{tr}Y\big\}|Y|^{\frac{m}{4}}$ look like. Differentiating once, we see that:

[TABLE]

Differentiating twice, we see that:

[TABLE]

So in general, it is clear that the repeated derivatives are given by some polynomial in entries of $(Y+X)^{-1}$ , times $\exp\big\{-\frac{n}{4}\operatorname{tr}(Y+X)\big\}|Y+X|^{\frac{m}{4}}$ . We won’t investigate further the nature of these polynomials beyond remarking that for any indices $1\leq l\leq 2k$ and $1\leq i_{1},\dots,i_{2l}\leq p$ , and any symmetric matrices $X,Y\in\mathbb{S}_{p}(\mathbb{R})$ , we must have some crude bound

[TABLE]

for some polynomials $a_{J,s}$ that do not depend on $X$ or $Y$ . We relegate the proof of this result to Lemma 3 in Section 8. This can be uniformly bounded for all $0\leq X\leq I_{p}$ by

[TABLE]

for some constant $C_{1}(n,m,p)$ that does not depend on $X$ or $Y$ . But for any $n\geq p-2$ and $l\geq 0$ ,

[TABLE]

for a $Y$ with a matrix gamma distribution $\text{G}_{p}\left(\frac{n+p+1}{4},\frac{n}{2}I_{p}\right)$ . The Cauchy-Schwarz inequality then entails the bound

[TABLE]

The first expectation is always finite when $n\geq p-2$ . Since $\operatorname{tr}^{2s}(Y^{-1})$ can be written as a sum of zonal polynomials indexed by partitions of the integer $2s$ , the results of Muirhead (1982, Theorem 7.2.13) imply that the second expectation is finite whenever $\frac{n+p+1}{4}>2s+\frac{p-1}{2}\Leftrightarrow n\geq p+8s-2$ . Thus, in Equation (4.20) with $l\leq k$ and (4.21) with $l\leq 2k$ , whenever $n\geq p+16k-2$ we are justified in repeatedly differentiating under the integral sign by the integrability bounds given by Equations (4.22) and (4.23), and obtain in that case

[TABLE]

and

[TABLE]

Let us now look at how $H_{1}(X)$ and $H_{2}(X)$ behave as $X\rightarrow 0$ . On one hand, for any symmetric matrix $T$ we have $|\operatorname{tr}T^{k}|\leq\sqrt{p\operatorname{tr}T^{2k}}\leq\sqrt{p}|I_{p}+T^{2}|^{k/2}$ , so we must have the bounds

[TABLE]

and

[TABLE]

holding uniformly in $X$ . When $n-4k\geq p-2\Leftrightarrow n\geq p+4k-2$ , the right hand sides are proportional to the density of the G-conjugates of the normalized Wishart distributions with $n-4k$ degrees of freedom, so are integrable. Thus, by the dominated convergence theorem and Equations (4.18) and (4.19),

[TABLE]

and

[TABLE]

for $T\sim T_{n/2}(I_{p}/8)$ .

On the other hand, the integrands at Equations (4.25) and (4.26) take a particularly simple form. Lemma 4 establishes by induction that there must be polynomials $b_{\kappa}^{(1)}$ and $b_{\kappa}^{(2)}$ in $n$ , $m$ and $p$ with degrees $\mathrm{deg}\,b_{\kappa}^{(1)}\leq 2k+1-q(\kappa)$ and $\mathrm{deg}\,b_{\kappa}^{(2)}\leq 2k+2-q(\kappa)$ such that

[TABLE]

and

[TABLE]

for any $0<X<I_{p}$ and $n\geq p+16k-2$ . The sums are taken over all partitions of the integers $\kappa$ satisfying $|\kappa|\leq 2k$ and $|\kappa|\leq 2k+1$ respectively, including the empty partition. But for any integer partition $\kappa$ , the bound

[TABLE]

holds uniformly in $0\leq X\leq I_{p}$ . Thus for $|\kappa|\leq 2k+1$ , the right hand side is integrable for $n\geq p+16k+6$ , by the same argument as for Equation (4.23). Thus for such $n$ , by the dominated convergence theorem and Equations (4.29) and (4.30), we obtain that

[TABLE]

and

[TABLE]

where $Y$ follows a $\text{W}_{p}(n,I_{p}/n)$ distribution. Combining Equations (4.27)–(4.28) with Equations (4.31)–(4.32) and Lemma 4 concludes the proof. ∎

Something remarkable about Lemma 2 is that it provides us with an algorithm to compute the moments of a symmetric $t$ distribution in terms of the moments of an inverse Wishart matrix. For example, when $k=1$ , repeated differentiation yields that

[TABLE]

We can recognize $\operatorname{tr}(Y+X)^{-2}$ and $\operatorname{tr}^{2}(Y+X)^{-1}$ as the power sum polynomials $r_{(2)}([Y+X]^{-1})$ and $r_{(1,1)}([Y+X]^{-1})$ in the sense of Equation (4.12), so we must have $b_{(2)}^{(1)}=m(m-2)/16$ , $b_{(1,1)}^{(1)}=-m/8$ , $b_{(1)}^{(1)}=mn/8$ and $b_{\varnothing}^{(1)}=n^{2}p/16$ in the result of Lemma 4. Hence Lemma 2 really tells us that whenever $n\geq p+22$ , $\operatorname{E}\!\left[\operatorname{tr}T^{2}\right]$ for $T\sim T_{n/2}(I_{p}/8)$ can be expressed as

[TABLE]

where $Y\sim\text{W}_{p}(n,I_{p}/n)$ .

Of course, this also works with square moments and higher $k$ . For example, the same strategy for, say, square moments with $k=2$ yields that whenever $n\geq p+38$ , $\operatorname{E}\!\left[\operatorname{tr}^{2}T^{2}\right]$ for $T\sim T_{n/2}(I_{p}/8)$ can be expressed as

[TABLE]

again where $Y\sim\text{W}_{p}(n,I_{p}/n)$ .

Unfortunately, as we consider larger orders, the repeated differentiation of $\exp\{-\frac{n}{4}\operatorname{tr}Z\}|Z|^{m/4}$ quickly becomes too cumbersome to perform by hand. But at least in theory, we can compute expressions like (4.34) and (4.35) for any $\operatorname{E}\!\left[\operatorname{tr}T^{2k}\right]$ and $\operatorname{E}\!\left[\operatorname{tr}^{2}T^{k}\right]$ , and Lemma 2 summarizes that fact. That is, using the Fourier inversion theorem we have reduced the problem of computing moments of the $t$ -distribution $T_{n/2}(I_{p}/8)$ to that of computing expected power sum polynomials of the inverse Wishart distribution $\text{W}^{-1}_{p}(n,I_{p}/n)$ , for large enough $n$ .

How can we compute expected power sum polynomials of an inverse Wishart? There are two approaches in the literature. Letac and Massam (2004) found an expression in terms of a different basis, the zonal polynomials, which behave particularly nicely with respect to the inverse Wishart distribution, and whose expectations have a simple closed form. From this, they provided an algorithm for computing expected power sum polynomials to arbitrary order. Matsumoto (2012) found expressions of coordinate-wise moments in terms of modified Weingarten orthogonal functions, from which expectations of power sum polynomials can be computed. We follow the idea of Letac and Massam (2004) in our asymptotic analysis.

For any integer partition $\kappa$ , there exist coefficients $c_{\kappa,\lambda}$ (which depend solely on $\kappa$ and $\lambda$ ) such that

[TABLE]

for $C_{\lambda}$ the so-called zonal polynomials. For an overview of the topic with a focus on random matrix theory, see Muirhead (1982, Chapter 7). The coefficients $c_{\kappa,\lambda}$ are explicitly computable. If we follow the normalization of zonal polynomials of Muirhead (1982), for example, we find that

[TABLE]

As mentioned, expectations of zonal polynomials with respect to a Wishart or inverse Wishart distribution take a particularly simple form. From Muirhead (1982, Theorem 7.2.13 and Equation (18) on p.237), the expected zonal polynomials for $Y^{-1}\sim\text{W}^{-1}_{p}(n,I_{p}/n)$ are

[TABLE]

for $\lambda\neq\varnothing$ , and $\operatorname{E}\!\left[C_{\varnothing}(Y^{-1})\right]=1$ . For example, the first few expected zonal polynomials are

[TABLE]

From this, we can exactly compute $\operatorname{E}\!\left[r_{\kappa}(Y^{-1})\right]$ and thus $\operatorname{E}\!\left[\operatorname{tr}T^{2k}\right]$ and $\operatorname{E}\!\left[\operatorname{tr}^{2}T^{k}\right]$ , as a function of $p$ and $n$ (or $m$ ). For example, by Equation (4.38) we find that

[TABLE]

and thus, by Equation (4.34), whenever $n\geq p+38$

[TABLE]

In a similar way, we can compute the expected zonal polynomials and hence, the expected power sum polynomials of $Y^{-1}$ for $|\kappa|=3,4$ . So from Equation (4.35), we obtain for $n\geq p+38$ that

[TABLE]

Of course, this reasoning also works for other $k$ ’s. In particular, we essentially derived a (potentially inefficient) algorithm to compute the moments of a $T_{n/2}(I_{p}/8)$ distribution to arbitrary order on our path to proving this theorem.

At this point, it is worthwhile to realize that Equations (4.40) and (4.41) are already enough to prove the theorem for small moments. For example, when $n\rightarrow\infty$ such that $p/n\rightarrow c\in[0,1)$ , then $m\sim(1-c)n$ and

[TABLE]

which proves that $\operatorname{E}\!\left[\operatorname{tr}T^{2}\right]=O(p^{2})$ . In fact,

[TABLE]

Moreover, when $n,p\rightarrow\infty$ such that $p/n\rightarrow 0$ , then $m\sim n$ and

[TABLE]

Thus

[TABLE]

and the theorem is proven for the second moment.

In theory, we could proceed in the same way for any moment of interest, but naturally we could never conclude that the theorem holds for all moments that way. Nonetheless, the calculations give us some hints about how to argue in the general case.

The idea is to express the moments of the symmetric $t$ distribution as polynomials of $p$ and $p/m$ . There are two regimes where random matrix theory is well understood: the classical regime where $p$ is held fixed as $n\rightarrow\infty$ , and the linear, high-dimensional regime where $p$ grows linearly with $n$ . From this, we can therefore conclude a few facts regarding the behavior of symmetric $t$ moments in these regimes. But these moments are polynomials, and a polynomial is a very rigid object: results from the two extreme cases where $p$ is fixed and $p$ grows linearly will be enough to prove results for every regime in between, yielding the first part of the theorem. Proving the second part will then be the simple matter of applying the GOE approximation of Jiang and Li (2015) and Bubeck et al. (2016) to the specific shape found for the symmetric $t$ moments while proving the first part, namely Equations (4.58) and (4.59).

Proof of Theorem 2.

Recall the expected zonal polynomial of an inverse Wishart $\text{W}^{-1}_{p}(n,I_{p}/n)$ is given by Equation (4.39). Based on the previous calculations, it is tempting to define

[TABLE]

so that

[TABLE]

With these expressions the expected power sum polynomials can be written as

[TABLE]

In other words, if we define

[TABLE]

then

[TABLE]

But $R^{-1}_{\mu}(m)=\prod_{i=1}^{q(\mu)}\prod_{l=0}^{\mu_{i}-1}\big{(}1-\frac{1-i+2l}{m}\big{)}$ is a polynomial in $1/m$ , while $P_{\lambda}(m,p)=\prod_{i=1}^{q(\lambda)}\prod_{l=0}^{\lambda_{i}-1}\left(\frac{p}{m}+\frac{1-i+2l}{m}\right)$ is a polynomial in $p/m$ and $1/m$ , both of degree at most $|\mu|=|\lambda|=|\kappa|$ . Thus

[TABLE]

for some coefficients $b_{ij}$ that don’t depend on $m$ , $p$ (or $n$ ). Define the polynomials $f_{j}(\alpha)=\sum_{i=0}^{|\kappa|}b_{ij}\alpha^{i}$ , so that

[TABLE]

Let us show that for all $0\leq j<|\kappa|-q(\kappa)$ , the polynomial $f_{j}$ must be identically zero over the interval $\alpha\in\big{(}0,1/\max(|\kappa|-2,0)\big{)}$ . Indeed, suppose this were not the case, and let $0\leq j_{0}<|\kappa|-q(\kappa)$ be the smallest $j$ with the property that $f_{j_{0}}(\alpha_{0})\neq 0$ for some $\alpha_{0}\in\big{(}0,\frac{1}{\max(|\kappa|-2,0)}\big{)}$ . As $f_{j_{0}}$ is a polynomial, by continuity it must be non-zero in a neighborhood of $\alpha_{0}$ , so we may as well assume $\alpha_{0}$ is rational without loss of generality. Now look at what happens to $\operatorname{E}\!\left[r_{\kappa}(Y^{-1})\right]$ as $p$ grows to infinity at the very specific linear rate $p=\lfloor\frac{\alpha_{0}}{1+\alpha_{0}}(n-1)\rfloor$ . Since $\alpha_{0}$ is rational, there must be a subsequence $n_{l}$ along which $\frac{\alpha_{0}}{1+\alpha_{0}}(n_{l}-1)$ is exactly an integer (for example, if $\alpha_{0}=a/b$ with $a$ , $b$ positive integers, we can take $n_{l}=(a+b)l+1$ ). Then for $p_{l}=\frac{\alpha_{0}}{1+\alpha_{0}}(n_{l}-1)$ and $m_{l}=n_{l}-p_{l}-1$ , we have exactly $p_{l}=\alpha_{0}m_{l}$ .
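The subsequence construction above can be checked with exact rational arithmetic. The following is a minimal sketch, assuming $m=n-p-1$ as elsewhere in the proof; the function name `linear_subsequence` is hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def linear_subsequence(alpha0, count):
    """Return (n_l, p_l, m_l) for l = 1..count along n_l = (a + b) l + 1.

    alpha0 = a / b is a positive rational; m = n - p - 1 is assumed,
    matching the notation of the proof.
    """
    a, b = alpha0.numerator, alpha0.denominator
    triples = []
    for l in range(1, count + 1):
        n = (a + b) * l + 1
        p_exact = alpha0 / (1 + alpha0) * (n - 1)
        # Along this subsequence the floor is never needed.
        if p_exact.denominator != 1:
            raise ArithmeticError("p_l is not an integer")
        p = int(p_exact)
        m = n - p - 1
        # The key identity used in the proof: p_l = alpha0 * m_l exactly.
        if Fraction(p) != alpha0 * m:
            raise ArithmeticError("p_l != alpha0 * m_l")
        triples.append((n, p, m))
    return triples


# alpha0 = 2/5 gives n_l = 7 l + 1, p_l = 2 l, m_l = 5 l.
print(linear_subsequence(Fraction(2, 5), 3))
```

Working with `Fraction` rather than floats avoids any rounding question about whether $p_{l}$ is an integer.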

Since $\alpha_{0}<\frac{1}{\max(|\kappa|-2,0)}$ , then $|\kappa|<1+\big{(}\frac{\alpha_{0}}{1+\alpha_{0}}\big{)}^{-1}$ . Thus by Hölder’s inequality and Lemma 5,

[TABLE]

On the other hand, by Equations (4.46) and (4.48), the definition of $j_{0}$ and the fact that $R_{|\kappa|}(m)\rightarrow 1$ as $m\rightarrow\infty$ ,

[TABLE]

As $\alpha_{0}>0$ , $f_{j_{0}}(\alpha_{0})$ must therefore equal zero, a contradiction. Hence, as claimed, the polynomials $f_{j}(\alpha)$ for $0\leq j<|\kappa|-q(\kappa)$ all vanish over the interval $\big{(}0,\frac{1}{\max(|\kappa|-2,0)}\big{)}$ .

But a polynomial can have an infinite number of zeros only if all its coefficients are zero, so we conclude that

[TABLE]

Thus, from Equations (4.46) and (4.47) we have

[TABLE]

where

[TABLE]

Going back to Equations (4.13) and (4.14) and plugging in the above yields that as long as $n\geq p+16k+6$ ,

[TABLE]

where

[TABLE]

Now, for any $a\leq b$ , we can associate a partition $\mu$ of norm $|\mu|=a$ with the partition $\mu^{*}=(\mu_{1}+b-a,\mu_{2},\dots,\mu_{q(\mu)})$ of norm $|\mu^{*}|=b$ , which satisfies

[TABLE]

By the definition of the $R_{\mu}(m)$ ’s at Equation (4.44), this means that every factor that appears in $R^{-1}_{\mu}(m)$ appears in $R^{-1}_{\mu^{*}}(m)$ , so by the definition of the $R_{|\mu|}(m)$ ’s at Equation (4.45), $R_{a}(m)R^{-1}_{b}(m)$ is a polynomial in $\frac{1}{m}$ . Moreover, as $b^{(1)}_{\kappa}$ and $b_{\kappa}^{(2)}$ are polynomials of degrees $d_{1}(\kappa)\equiv 2k+1-q(\kappa)$ and $d_{2}(\kappa)\equiv 2k+2-q(\kappa)$ respectively, there exist coefficients $c^{(1)}_{ijl}$ and $c^{(2)}_{ijl}$ such that

[TABLE]

and

[TABLE]

As $d_{1}(\kappa)-i-j-l,j,l\geq 0$ in the first case and $d_{2}(\kappa)-i-j-l,j,l\geq 0$ in the second case, we conclude that these two expressions are polynomials in $\frac{p}{m}$ and $\frac{1}{m}$ . Therefore, looking back at (4.49) and (4.50), we conclude that the $Q^{(1)}_{\kappa}(m,p)$ ’s and $Q^{(2)}_{\kappa}(m,p)$ ’s are polynomials in $\frac{p}{m}$ and $\frac{1}{m}$ . Therefore, if $n\geq p+16k+6$ there must be coefficients $a^{(1)}_{ij}$ , $a^{(2)}_{ij}$ and large enough integers $D_{1}$ , $D_{2}$ such that

[TABLE]

for polynomials $g_{i}^{(1)}(p)=\sum\limits_{j=0}^{i}a^{(1)}_{ij}p^{j}$ and $g_{i}^{(2)}(p)=\sum\limits_{j=0}^{i}a^{(2)}_{ij}p^{j}$ .

We will now proceed to show that $g_{i}^{(1)}$ and $g_{i}^{(2)}$ must vanish on $\mathbb{N}$ for $0\leq i<k+1$ and $0\leq i<k+2$ respectively. Our argument relies on the analysis of the asymptotic behavior of the moments of $T$ in the classical regime where $p$ is held fixed while $n$ grows to infinity.

Observe first that $\operatorname{E}\!\left[\operatorname{tr}T^{2k}\right]$ and $\operatorname{E}\!\left[\operatorname{tr}^{2}T^{k}\right]$ must have a finite limit as $n\rightarrow\infty$ with $p$ held fixed. Indeed, since $16T^{2}/n$ is positive definite, $|I_{p}+16T^{2}/n|$ is greater than one and so we have the bound

[TABLE]

When $p$ is held fixed, $\lim\limits_{n\rightarrow\infty}C_{n,p}=2^{\frac{p(3p+1)}{4}}/\pi^{\frac{p(p+1)}{4}}$ by Lemma 1. Moreover,

[TABLE]

for $\lambda_{1}(4T^{2})\geq\dots\geq\lambda_{p}(4T^{2})\geq 0$ the eigenvalues of $4T^{2}$ , and for $x\geq 0$ the sequence $(1+x/n)^{-n}$ is monotone decreasing towards $\exp(-x)$ . Therefore, for a fixed dimension $p$ we can apply the monotone convergence theorem to obtain that

[TABLE]

for $Z\sim\text{GOE}(p)/4$ . Repeating the argument with $\operatorname{tr}^{2}T^{k}$ yields similarly

[TABLE]

Thus indeed $\operatorname{E}\!\left[\operatorname{tr}T^{2k}\right]$ and $\operatorname{E}\!\left[\operatorname{tr}^{2}T^{k}\right]$ have finite limits when $p$ is held fixed.
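The monotone convergence step rests on the elementary fact that, for fixed $x\geq 0$ , the sequence $(1+x/n)^{-n}$ decreases in $n$ to $e^{-x}$ . The following sketch checks this numerically for one value of $x$ ; the helper name `kernel` and the chosen values of $x$ and $n$ are illustrative assumptions only.

```python
import math


def kernel(x, n):
    """Scalar factor (1 + x/n)^(-n) appearing eigenvalue-wise in the bound."""
    return (1.0 + x / n) ** (-n)


x = 3.0
ns = (1, 10, 100, 1000, 10000)
values = [kernel(x, n) for n in ns]

for n, v in zip(ns, values):
    # Each value stays above the limit exp(-x) and shrinks towards it.
    print(n, v, v - math.exp(-x))
```

The printed gaps are positive and shrink as $n$ grows, which is what the monotone convergence theorem requires of the integrand.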

We can use this to show that $g_{i}^{(1)}$ and $g_{i}^{(2)}$ must vanish on $\mathbb{N}$ for $0\leq i<k+1$ and $0\leq i<k+2$ as follows. Say the first statement wasn’t true, and let $0\leq i_{0}<k+1$ be the smallest $i$ such that for some $p_{0}\in\mathbb{N}$ , $g_{i_{0}}^{(1)}(p_{0})\neq 0$ . Then by Equation (4.51) and the definition of $i_{0}$ , the limit of $\operatorname{E}\!\left[\operatorname{tr}T^{2k}\right]$ as $n\rightarrow\infty$ with $p$ fixed at $p_{0}$ satisfies

[TABLE]

But $m=n-p-1$ tends to infinity as $n$ tends to infinity, and since $k+1-i_{0}>0$ , Equation (4.53) means that $\operatorname{E}\!\left[\operatorname{tr}T^{2k}\right]/m^{k+1-i_{0}}$ must tend to zero. Thus $g^{(1)}_{i_{0}}(p_{0})$ has to equal zero, which contradicts our assumption. Thus for every $0\leq i<k+1$ , the polynomial $g^{(1)}_{i}$ must vanish on $\mathbb{N}$ .

Similarly, for $0\leq i<k+2$ , the polynomial $g^{(2)}_{i}$ must vanish on $\mathbb{N}$ , because if it wasn’t the case, we could take $0\leq i_{0}<k+2$ as the smallest $i$ with the property that for some $p_{0}\in\mathbb{N}$ , $g^{(2)}_{i_{0}}(p_{0})\neq 0$ , and then by Equation (4.52) with $p$ fixed at $p_{0}$ as $n\rightarrow\infty$ we would get

[TABLE]

But then by Equation (4.54), as $m$ tends to infinity and $k+2-i_{0}\geq 1$ the ratio $\operatorname{E}\!\left[\operatorname{tr}^{2}T^{k}\right]/m^{k+2-i_{0}}$ must tend to zero. Thus we must have $g^{(2)}_{i_{0}}(p_{0})=0$ , a contradiction. Hence indeed for $0\leq i<k+2$ , the polynomial $g^{(2)}_{i}$ must vanish on $\mathbb{N}$ .

But of course, a polynomial can only have an infinite number of zeroes if its coefficients are all zero, so we must have

[TABLE]

Now say that $p$ varies with $n$ in such a way that $\lim_{n\rightarrow\infty}p/n=\alpha<1$ . Then for large enough $n$ , $n\geq p+16k+6$ so by Equations (4.51) and (4.57),

[TABLE]

and by Equations (4.52) and (4.57),

[TABLE]

Although we might not know what the $a_{ij}$ coefficients are, this shows at least that the limits are finite. In particular, from Equations (4.58) and (4.59) we can conclude that $\operatorname{E}\!\left[\operatorname{tr}T^{2k}\right]=O(p^{k+1})$ and $\operatorname{E}\!\left[\operatorname{tr}^{2}T^{k}\right]=O(p^{k+2})$ , which shows the first claim of the theorem.

For the second claim, let $n,p\rightarrow\infty$ with $p/n\rightarrow\alpha=0$ . Then Equations (4.58) and (4.59) specialize to

[TABLE]

What is interesting about this result is that these limits must be the same regardless of the way $p$ grows! As long as $p\rightarrow\infty$ with $p/n\rightarrow 0$ , the limits are $a^{(1)}_{(k+1)(k+1)}$ and $a^{(2)}_{(k+2)(k+2)}$ , regardless of whether $p\sim\log n$ or $p\sim\sqrt{n}$ or some other growth.

Now, Bubeck et al. (2016, Theorem 7) and Jiang and Li (2015, Theorem 1) have shown that when $p\rightarrow\infty$ with $p^{3}/n\rightarrow 0$ , the total variation distance between a normalized Wishart $\sqrt{n}(\text{W}_{p}(n,I_{p}/n)-I_{p})$ matrix and a Gaussian Orthogonal Ensemble $\text{GOE}(p)$ matrix tends to zero as $n\rightarrow\infty$ . Therefore, the Hellinger distance also satisfies $H^{2}(\psi_{\text{NW}},\psi_{\text{GOE}})=H^{2}(f_{\text{NW}},f_{\text{GOE}})\rightarrow 0$ as $n\rightarrow\infty$ .

But convergence in Hellinger distance has strong implications for real-valued statistics. Indeed, for $T_{1}\sim\text{T}_{n/2}(I_{p}/8)=|\psi_{\text{NW}}|$ , $T_{2}\sim\text{GOE}(p)/4=|\psi_{\text{GOE}}|$ and any function $g:\mathbb{S}_{p}(\mathbb{R})\rightarrow\mathbb{R}$ such that $g(T_{1})$ , $g(T_{2})$ are square-integrable,

[TABLE]

by the Cauchy-Schwarz inequality.

Let’s consider applying this result to $g(T)=\operatorname{tr}T^{2k}/p^{k+1}$ and $g(T)=\operatorname{tr}^{2}T^{k}/p^{k+2}$ . What do we know about these statistics? In the case where $T_{2}\sim\text{GOE}(p)/4$ , results of Anderson et al. (2010, Lemma 2.1.6 and the equation above Equation (2.1.21)) provide that

[TABLE]

because these expressions only depend on $p$ , and since $p\rightarrow\infty$ as $n\rightarrow\infty$ , taking a limit as $n\rightarrow\infty$ is the same as taking a limit as $p\rightarrow\infty$ . Moreover, in the $T_{1}\sim\text{T}_{n/2}(I_{p}/8)$ case, using Jensen’s inequality and Equations (4.58)–(4.59) we can at least see that

[TABLE]

Therefore, using Equation (4.63) with $g(T)=\operatorname{tr}T^{2k}/p^{k+1}$ and $g(T)=\operatorname{tr}^{2}T^{k}/p^{k+2}$ we find that when $n,p\rightarrow\infty$ with $p^{3}/n\rightarrow 0$ ,

[TABLE]

Since $p^{3}/n\rightarrow 0$ implies $p/n\rightarrow 0$ , we conclude from Equation (4.62) that

[TABLE]

But then, by that same equation, we conclude that when $n,p\rightarrow\infty$ , not only when $p^{3}/n\rightarrow 0$ but for all $p$ such that $p/n\rightarrow 0$ , we have

[TABLE]

for $T\sim\text{T}_{n/2}(I_{p}/8)$ . To finish the proof, use Equation (4.66) with the fact that $\operatorname{E}\!\left[\operatorname{tr}T^{k}\right]=0$ for odd $k$ to find that

[TABLE]

Thus $\operatorname{tr}T^{2k}/p^{k+1}\overset{L^{2}}{\longrightarrow}C_{k}/4^{2k}$ , as desired. This proves the second claim and concludes the proof. ∎

A pleasant consequence of this result is that when $n,p\rightarrow\infty$ with $p/n\rightarrow 0$ , we can conclude a version of the semicircle law holds for the $\text{T}_{n/2}(2I_{p})$ distribution. This is interesting because the $\text{T}_{n/2}(2I_{p})$ distribution has dependent entries with heavy tails, whose distribution varies with $n,p$ .

Let $4T/\sqrt{p}\sim 4\text{T}_{n/2}(I_{p}/8)/\sqrt{p}$ , with eigenvalues $\lambda_{1}(4T/\sqrt{p})\geq\dots\geq\lambda_{p}(4T/\sqrt{p})$ . Then define its empirical spectral measure to be

[TABLE]

Since $L_{4T/\sqrt{p}}$ depends on the random matrix $T$ , it is a random measure on $\mathbb{R}$ .

Corollary 1 (Semicircle law for the $t$ distribution).

The empirical spectral measure $L_{4T/\sqrt{p}}$ of a $4\text{T}_{n/2}(I_{p}/8)/\sqrt{p}$ random matrix converges weakly, in square mean, to the semicircle distribution

[TABLE]

Proof.

Let $f$ be any continuous function $\mathbb{R}\rightarrow\mathbb{R}$ that vanishes at infinity. By the Stone-Weierstrass theorem, there exists a sequence $f_{1},f_{2},\dots$ of polynomials such that for any $\epsilon>0$ , $\sup_{x\in\mathbb{R}}|f(x)-f_{l}(x)|<\epsilon$ . To fix some notation, write

[TABLE]

Then since $L_{4T/\sqrt{p}}$ and $L$ are both probability measures,

[TABLE]

By Theorem 2, the expectation tends to zero as $n,p\rightarrow\infty$ with $p/n\rightarrow 0$ . Thus

[TABLE]

But this is true for every $\epsilon>0$ , so the limit must be zero. Hence for every continuous $f$ that vanishes at infinity, the integral $\int fdL_{4T/\sqrt{p}}$ converges in square mean to $\int fdL$ . By Chung (2001, Theorem 4.4.1 and 4.4.2), this implies that for every bounded continuous $f$ , the integral $\int fdL_{4T/\sqrt{p}}$ converges in square mean to $\int fdL$ . Thus the empirical spectral distribution $L_{4T/\sqrt{p}}$ converges weakly, in square mean, to the semicircle distribution $L$ , as desired. ∎
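The moments of the semicircle limit in Corollary 1 are the Catalan numbers, and this can be illustrated numerically on the Gaussian side of the comparison. The sketch below is not a simulation of the symmetric $t$ distribution: it samples a Gaussian orthogonal ensemble matrix $Z$ , under the assumed normalization of off-diagonal variance 1 and diagonal variance 2 (chosen so that $\frac{1}{p}\operatorname{tr}(Z/\sqrt{p})^{2k}\rightarrow C_{k}$ ), and compares the second and fourth normalized moments with $C_{1}=1$ and $C_{2}=2$ . All function names are hypothetical.

```python
import random


def sample_goe(p, rng):
    """Symmetric Gaussian matrix; off-diagonal variance 1, diagonal variance 2
    (an assumed normalization, see the lead-in)."""
    z = [[0.0] * p for _ in range(p)]
    for i in range(p):
        z[i][i] = rng.gauss(0.0, 2.0 ** 0.5)
        for j in range(i + 1, p):
            z[i][j] = z[j][i] = rng.gauss(0.0, 1.0)
    return z


def normalized_even_moments(z):
    """Return (tr Z^2 / p^2, tr Z^4 / p^3), i.e. (1/p) tr (Z/sqrt(p))^(2k), k = 1, 2."""
    p = len(z)
    # tr Z^2 is the sum of squared entries of the symmetric matrix Z.
    tr2 = sum(v * v for row in z for v in row)
    # (Z^2)_{ij} is the inner product of rows i and j; tr Z^4 = sum_{ij} (Z^2)_{ij}^2.
    tr4 = 0.0
    for i in range(p):
        for j in range(p):
            s = sum(a * b for a, b in zip(z[i], z[j]))
            tr4 += s * s
    return tr2 / p ** 2, tr4 / p ** 3


def demo(p=100, seed=0):
    rng = random.Random(seed)
    return normalized_even_moments(sample_goe(p, rng))


if __name__ == "__main__":
    m2, m4 = demo()
    print("second moment:", m2, "(Catalan C_1 = 1)")
    print("fourth moment:", m4, "(Catalan C_2 = 2)")
```

Pure Python loops are used to keep the sketch dependency-free; for larger $p$ a linear algebra library would be the natural choice.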

5 Wishart asymptotics: the G-transform point-of-view

We now turn our attention to the main objective of this paper, namely studying the behavior of Wishart matrices in the various middle-scale regimes. To do this, we exploit the close connection between the Wishart and the symmetric $t$ distributions and make use of the results found in Section 4. The main result of this section, Theorem 3, states that we can approximate for every middle-scale regime the G-transform $\psi_{\text{NW}}$ of a normalized Wishart by a degree-specific function $\psi_{K}$ . This can be seen as an analogue of Theorem 1 in the G-transform domain.

The reasoning behind the approximations is as follows. If we write $\psi_{\text{NW}}$ from Proposition 3 in exponential form and expand the terms as a Taylor series, we obtain

[TABLE]

Now imagine that the $T$ ’s appearing in the expression follow a $\text{T}_{n/2}(I_{p}/8)$ distribution. By Theorem 2, we know that $\operatorname{tr}\big{(}\frac{4T}{\sqrt{p}}\big{)}^{k}=\Theta(p)$ when $k$ is even, in an $L^{2}$ sense. When $k$ is odd, the theorem merely proves that $\operatorname{tr}\big{(}\frac{4T}{\sqrt{p}}\big{)}^{k}=o(p)$ , but for a $\text{GOE}(p)$ matrix $Z$ , we know that $\operatorname{tr}\big{(}\frac{Z}{\sqrt{p}}\big{)}^{k}$ is asymptotically normal for odd $k$ by Anderson et al. (2010, Theorem 2.1.31). This would suggest that $\operatorname{tr}\big{(}\frac{4T}{\sqrt{p}}\big{)}^{k}=\Theta(1)$ when $k$ is odd. Thus we would have, in some sense,

[TABLE]

In other words, terms in the power series would be associated with some degree $K$ , such that they would be non-negligible in any middle-scale regime of degree up to $K$ , and negligible in higher degrees. In fact, a similar phenomenon occurs with $C_{n,p}$ , by Lemma 1. This suggests we should try truncating these power series to derive degree-specific approximations.

Definition 3 (G-transform approximations).

For any $K\in\mathbb{N}$ , define the $K^{\text{th}}$ degree approximation $\psi_{K}:\mathbb{S}_{p}(\mathbb{R})\rightarrow\mathbb{C}$ as

[TABLE]

Just like the G-transform of a normalized Wishart matrix, these functions implicitly depend on $n$ . The first three are

[TABLE]

These functions have the pleasant property that their modulus is bounded, up to a constant, by the G-conjugate density $|\psi_{\text{NW}}|$ . Indeed, on one hand we can rewrite Definition 3 as

[TABLE]

On the other hand, for any $x\in\mathbb{R}$ , we can write $1+ix=\sqrt{1+x^{2}}\exp\big{(}i\;\text{atan}(x)\big{)}$ with the arctangent function taking values in $(-\pi/2,\pi/2)$ . Thus by Proposition 3 we can rewrite $\psi_{\text{NW}}$ as

[TABLE]

with the understanding that the matrix-variate arctangent function operates on eigenvalues by functional calculus. Now, for any $x\in\mathbb{R}$ and odd integer $L$ , there is an elementary inequality

[TABLE]

Notice that $K+1\pm\mathbbm{1}\!\left[K\,\mathrm{odd}\right]$ is always an odd integer. Thus, from the above inequality and Equations (5.1) and (5.2), we can derive the bound

[TABLE]

In particular, since $\psi_{\text{NW}}$ is integrable whenever $n\geq p-2$ by Proposition 3, Equation (5.3) implies that every $\psi_{K}$ must also be integrable whenever $n\geq p-2$ . Consequently, for large enough $n$ it makes sense to talk about the asymptotic total variation or Hellinger distance between $\psi_{\text{NW}}$ and $\psi_{K}$ .
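The bound above relies on the scalar polar form $1+ix=\sqrt{1+x^{2}}\exp\big{(}i\,\text{atan}(x)\big{)}$ , applied eigenvalue by eigenvalue. A minimal numerical check of this identity is sketched below; the function name `polar_form` and the test points are illustrative assumptions.

```python
import cmath
import math


def polar_form(x):
    """Right-hand side sqrt(1 + x^2) * exp(i * atan(x)) of the scalar identity."""
    return math.sqrt(1.0 + x * x) * cmath.exp(1j * math.atan(x))


for x in (-50.0, -1.0, 0.0, 0.3, 7.0):
    # The difference from 1 + i x should be at the level of rounding error.
    print(x, abs(polar_form(x) - (1.0 + 1j * x)))
```

Because `math.atan` takes values in $(-\pi/2,\pi/2)$ , it agrees with the branch of the arctangent used in the text.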

We now state the main result, which is that each function $\psi_{K}$ approximates the G-transform of a normalized Wishart for all middle-scale regimes of degree $K$ or lower, but no other.

Theorem 3.

Let $p$ vary with $n$ such that $\lim\limits_{n\rightarrow\infty}\frac{\log p}{\log n}<1$ . For any $K\in\mathbb{N}$ , the total variation distance between the G-transform of the normalized Wishart distribution $\sqrt{n}[\text{W}_{p}(n,I_{p}/n)-I_{p}]$ and the $K^{\text{th}}$ approximating function $\psi_{K}$ satisfies

[TABLE]

if and only if $p^{K+3}/n^{K+1}\rightarrow 0$ .

Proof.

If statement. For the first part of the theorem, remark that by Equation (3.7) it is equivalent to show that the Hellinger distance tends to zero, i.e. that

[TABLE]

when $p^{K+3}/n^{K+1}\rightarrow 0$ . To control this quantity, we use the Kullback-Leibler inequality for G-transforms. Notice that for any $x\in\mathbb{R}$ and $L\in\mathbb{N}$ ,

[TABLE]

Let $\operatorname{Log}$ stand for the principal branch of the complex logarithm, and let us study $\operatorname{Log}\psi_{\text{NW}}/\psi_{K}$ . Its real part can be bounded by

[TABLE]

By Equation (5.4), this can be bounded by

[TABLE]

We can bound the imaginary part of $\operatorname{Log}\psi_{\text{NW}}/\psi_{K}$ in a similar way. Define $P_{(-\pi,\pi]}:\mathbb{R}\rightarrow(-\pi,\pi]$ to be the projection mapping $P_{(-\pi,\pi]}x=x-2\pi\lceil\frac{x}{2\pi}-\frac{1}{2}\rceil$ . A plot is given as Figure 4.
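The projection just defined can be sketched numerically. The code below implements the formula $x-2\pi\lceil\frac{x}{2\pi}-\frac{1}{2}\rceil$ directly and compares it with the principal argument of $e^{ix}$ away from the branch cut; the function name `project` and the sample points are illustrative assumptions.

```python
import cmath
import math


def project(x):
    """Map a real angle x to its representative in (-pi, pi]."""
    return x - 2.0 * math.pi * math.ceil(x / (2.0 * math.pi) - 0.5)


for x in (-10.0, -4.0, -1.0, 0.5, 2.0, 5.0, 9.0, 20.0):
    principal = cmath.phase(cmath.exp(1j * x))  # principal argument of e^{ix}
    # The projection agrees with the principal argument and never increases |x|.
    print(x, project(x), principal, abs(project(x)) <= abs(x))
```

At the endpoints, both $x=\pi$ and $x=-\pi$ are sent to $\pi$ , consistent with the half-open interval $(-\pi,\pi]$ .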

The projection $P_{(-\pi,\pi]}$ satisfies $\Im\operatorname{Log}z=P_{(-\pi,\pi]}\Im\log z$ for all branches of $\log z$ , as well as the inequality $|P_{(-\pi,\pi]}x|\leq|x|$ . Using this mapping, we can see that the imaginary part of $\operatorname{Log}\psi_{\text{NW}}/\psi_{K}$ can be bounded as

[TABLE]

By Equation (5.5), this can be bounded by

[TABLE]

Recall that the G-conjugate of the normalized Wishart distribution is the $t$ distribution with $n/2$ degrees of freedom and scale matrix $I_{p}/8$ , denoted $T_{n/2}(I_{p}/8)$ – see Equation (3.24) and Section 4 for details. Let us bound the expectations of these absolute real and imaginary parts under this distribution. By Equations (5.10), (5.6), (5.7) and Theorem 2, we find that for $T\sim|\psi_{\text{NW}}|=T_{n/2}(I_{p}/8)$ ,

[TABLE]

and

[TABLE]

as $n\rightarrow\infty$ with $p/n\rightarrow 0$ .

Moreover, from Lemma 1, we see that

[TABLE]

Thus, from Equations (5.3) and (5.10), we see that when $p^{K+3}/n^{K+1}\rightarrow 0$ , the asymptotic $L^{1}$ norm of $\psi_{K}$ is bounded by

[TABLE]

In fact, at the end of this proof we will see that this bound is sharp and the limit must be exactly 1.

Using Equations (5.8), (5.9) and (5.11) with Proposition 1 implies that when $p^{K+3}/n^{K+1}\rightarrow 0$ ,

[TABLE]

Thus $\mathrm{H}^{2}\big{(}\psi_{\text{NW}},\psi_{K}\big{)}\rightarrow 0$ , hence by Equation (3.7) we must have the limit $\mathrm{d}_{\text{TV}}\big{(}\psi_{\text{NW}},\psi_{K}\big{)}\rightarrow 0$ , as desired.

Only if statement. For the second part of the theorem, assume that the total variation distance satisfies $\mathrm{d}_{\text{TV}}(\psi_{\text{NW}},\psi_{K})\rightarrow 0$ , hence $\mathrm{H}\big{(}\psi_{\text{NW}},\psi_{K}\big{)}\rightarrow 0$ by Equation (3.7), as $n\rightarrow\infty$ . We will show by contradiction this implies that $p^{K+3}/n^{K+1}\rightarrow 0$ .

Assume this wasn’t the case. Since $\lim_{n\rightarrow\infty}\frac{\log p}{\log n}<1$ , there must be an $L\in\mathbb{N}$ such that $p^{L+3}/n^{L+1}\rightarrow 0$ , and since $p^{K+3}/n^{K+1}\nrightarrow 0$ , we must have $K<L$ . By Equation (5.8), we must have for $T\sim|\psi_{\text{NW}}|=\text{T}_{n/2}(I_{p}/8)$ that

[TABLE]

so

[TABLE]

Now write, by Equation (5.1) and Definition 3,

[TABLE]

But as $p^{L+3}/n^{L+1}\rightarrow 0$ , we must have $p/n\rightarrow 0$ , so by Theorem 2, we have $\frac{1}{p}\operatorname{tr}(\frac{4T}{\sqrt{p}})^{2k}\overset{L^{2}}{\rightarrow}C_{k}$ . Moreover, as we assumed that $p^{K+3}/n^{K+1}\nrightarrow 0$ , we must have $p\rightarrow\infty$ . Thus

[TABLE]

Then by the reverse triangle inequality,

[TABLE]

for a $T\sim|\psi_{\text{NW}}|=\text{T}_{n/2}(I_{p}/8)$ , that is

[TABLE]

Since $L^{2}$ convergence implies convergence in probability, by the continuous mapping theorem we must have

[TABLE]

as $n\rightarrow\infty$ , so by Equation (5.12)

[TABLE]

But then, from Equation (5.13) and Slutsky’s lemma (van der Vaart, 2000, Lemma 2.8 (iii)),

[TABLE]

as $n\rightarrow\infty$ . As $p^{K+3}/n^{K+1}$ is deterministic, this implies that $p^{K+3}/n^{K+1}\rightarrow 0$ as $n\rightarrow\infty$ , a contradiction. Thus whenever $\mathrm{H}^{2}\big{(}\psi_{\text{NW}},\psi_{K}\big{)}\rightarrow 0$ as $n\rightarrow\infty$ with $\lim\limits_{n\rightarrow\infty}\frac{\log p}{\log n}<1$ , we must have $p^{K+3}/n^{K+1}\rightarrow 0$ , as desired. This concludes the proof. ∎
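To make the condition of Theorem 3 concrete, suppose $p$ grows exactly like $n^{\theta}$ for some $0\leq\theta<1$ (an assumption made here purely for illustration). Then $p^{K+3}/n^{K+1}\rightarrow 0$ if and only if $\theta<(K+1)/(K+3)$ , so the smallest admissible degree can be found by a short search. The function names below are hypothetical.

```python
from fractions import Fraction


def transition_exponent(K):
    """Exponent (K + 1) / (K + 3) at which the K-th approximation stops working."""
    return Fraction(K + 1, K + 3)


def smallest_degree(theta):
    """Smallest K with theta < (K + 1) / (K + 3), for p = n^theta and 0 <= theta < 1."""
    if not 0 <= theta < 1:
        raise ValueError("theta must lie in [0, 1)")
    K = 0
    # The exponents increase to 1, so the loop terminates for every theta < 1.
    while theta >= transition_exponent(K):
        K += 1
    return K


print([transition_exponent(K) for K in range(4)])
print(smallest_degree(Fraction(1, 4)), smallest_degree(Fraction(1, 2)))
```

Since the exponents $(K+1)/(K+3)$ increase to 1, every $\theta<1$ is covered by some finite degree, which is the role played by the integer $L$ in the only-if part of the proof.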

Although Theorem 3 states that the functions $\psi_{K}$ approximate $\psi_{\text{NW}}$ , there is no guarantee that they are G-transforms of a probability density. In other words, nothing guarantees that their inverse G-transforms $\tilde{f}_{K}=\mathcal{G}^{-1}\{\psi_{K}\}$ are real-valued, non-negative and integrate to unity. However, the reverse triangle inequality applied to the $L^{2}$ -norm provides that

[TABLE]

so Theorem 3 and the Plancherel theorem imply that

[TABLE]

when $p^{K+3}/n^{K+1}\rightarrow 0$ . That is, the theorem at least guarantees that $|\tilde{f}_{K}|$ is asymptotically a density in its associated regime. We discuss this further in Section 6.

We independently know, by the results of Jiang and Li (2015) and Bubeck et al. (2016), that a Gaussian orthogonal ensemble approximation holds in the classical regime. Although $\psi_{0}$ is not the G-transform of a $\text{GOE}(p)$ , a simple Kullback-Leibler argument is sufficient to prove that it approximates $\psi_{\text{GOE}}$ for $0^{\text{th}}$ degree regimes.

Proposition 4.

The total variation distance between the $0^{\text{th}}$ degree G-transform approximation $\psi_{0}$ and the Gaussian orthogonal ensemble G-transform $\psi_{\text{GOE}}$ satisfies $\mathrm{d}_{\text{TV}}(\psi_{0},\psi_{\text{GOE}})\rightarrow 0$ as $n\rightarrow\infty$ with $p^{3}/n\rightarrow 0$ .

Proof.

We use a similar strategy to Theorem 3: namely, by Equation (3.7) it is equivalent to prove that $\mathrm{H}(\psi_{0},\psi_{\text{GOE}})\rightarrow 0$ as $n\rightarrow\infty$ with $p^{3}/n\rightarrow 0$ , and to control that quantity we can use the Kullback-Leibler inequality for G-transforms.

Let $T\sim|\psi_{\text{GOE}}|=\text{GOE}(p)/4$ . Since the Gaussian orthogonal ensemble has been extensively studied, its empirical moments are well understood. For example, according to Anderson et al. (2010, Lemma 2.2.2), we have $\operatorname{E}\!\left[\operatorname{tr}T^{2}\right]=O(p^{2})$ , while from Equation (2.1.45) of the same book we have $\operatorname{E}\!\left[|\operatorname{tr}T|\right]=O(p^{1/2})$ and $\operatorname{E}\!\left[|\operatorname{tr}T^{3}|\right]=O(p^{3/2})$ . Then from Definition 3 and Proposition 2, and using the projection map $P_{(-\pi,\pi]}$ as in the proof of Theorem 3, we find that

[TABLE]

and

[TABLE]

Since $\int_{\mathbb{S}_{p}(\mathbb{R})}|\psi_{0}|(T)dT\rightarrow 1$ when $n\rightarrow\infty$ with $p^{3}/n\rightarrow 0$ by Equation (5.14), if we apply Proposition 1 we find that

[TABLE]

for $p^{3}/n\rightarrow 0$ . By Equation (3.7), this concludes the proof. ∎

As a consequence, $H(f_{\text{NW}},f_{\text{GOE}})=H(\psi_{\text{NW}},\psi_{\text{GOE}})\leq H(\psi_{\text{NW}},\psi_{0})+H(\psi_{0},\psi_{\text{GOE}})\rightarrow 0$ when $n\rightarrow\infty$ with $p^{3}/n\rightarrow 0$ by Theorem 3 and Proposition 4. Hence $\mathrm{d}_{\text{TV}}(f_{\text{NW}},f_{\text{GOE}})\rightarrow 0$ by Equation (2.1) in the classical setting. This provides an alternative proof of the results of Jiang and Li (2015) and Bubeck et al. (2016).

6 Wishart asymptotics: the density point-of-view

In Section 5, we studied the asymptotic behavior of the normalized Wishart distribution $\sqrt{n}[\text{W}_{p}(n,I_{p}/n)-I_{p}]$ using its G-transform $\psi_{\text{NW}}$ . In particular, we derived an approximation to $\psi_{\text{NW}}$ for every middle-scale regime of a given degree. But although it is equivalent to study a probability distribution from a density or a G-transform point of view, it is still natural to wonder if we can find approximations to the density of a normalized Wishart for every middle-scale regime of a given degree.

Recall from Theorem 3 that $\mathrm{d}_{\text{TV}}(\psi_{\text{NW}},\psi_{K})\rightarrow 0$ when $p^{K+3}/n^{K+1}\rightarrow 0$ . Define $\tilde{f}_{K}=\mathcal{G}^{-1}\{\psi_{K}\}$ . In general, there is no guarantee that these should be real-valued. On the other hand, we know from Equation (5.3) that whenever $n\geq p-2$ , $\psi_{K}$ must be integrable, and since the G-transform maps integrable functions to integrable functions, $\tilde{f}_{K}$ must also be integrable. In fact, according to Equation (5.14), we know $|\tilde{f}_{K}|$ must be asymptotically a density when $p^{K+3}/n^{K+1}\rightarrow 0$ . This suggests we define the following densities.

Definition 4 (Density approximations).

For any $K\in\mathbb{N}$ and $n\geq p-2$ , define the $K^{\text{th}}$ degree density approximation as

[TABLE]

where $\tilde{f}_{K}=\mathcal{G}^{-1}\{\psi_{K}\}$ and $\psi_{K}$ is as in Definition 3. The distribution on the real symmetric matrices with density $f_{K}$ will be denoted $F_{K}$ .

The main interest is that we can asymptotically approximate the density $f_{\text{NW}}$ of a normalized Wishart by the bona fide densities $f_{K}$ . This was the content of Theorem 1 from Section 1, which we now prove as a simple corollary of its G-transform analogue Theorem 3 from Section 5.

Proof of Theorem 1.

As in the rest of this paper, we denote the density of the normalized Wishart distribution $\sqrt{n}[W_{p}(n,I_{p}/n)-I_{p}]$ by $f_{\text{NW}}$ , and by Definition 4 the density of $F_{K}$ is $f_{K}$ . Notice that by Equation (2.1), to prove $\mathrm{d}_{\text{TV}}(f_{\text{NW}},f_{K})\rightarrow 0$ it is equivalent to prove that $\mathrm{H}(f_{\text{NW}},f_{K})\rightarrow 0$ . From the triangle inequality, the reverse triangle inequality, Theorem 3 and Equation (5.14),

[TABLE]

when $p^{K+3}/n^{K+1}\rightarrow 0$ . Thus $\mathrm{H}(f_{\text{NW}},f_{K})$ , hence $\mathrm{d}_{\text{TV}}(f_{\text{NW}},f_{K})$ , tends to zero when $n\rightarrow\infty$ with $p^{K+3}/n^{K+1}\rightarrow 0$ , as desired. ∎

We defined $f_{K}$ in terms of the inverse G-transform of the $\psi_{K}$ functions given by Definition 3. How can we express this explicitly? By Equation (5.3), we see that $|\psi_{K}|(T)$ is asymptotically bounded by the $\text{T}_{n/2}(I_{p}/8)$ density $|\psi_{\text{NW}}|(T)$ , which is integrable whenever $n-p+2\geq 0$ . But $|\psi_{\text{NW}}|^{1/2}(T)$ is proportional to a $\text{T}_{m/4}(\frac{n}{4m}I_{p})$ density in the sense of Definition 2, which is integrable for $m/4\geq p/2-1$ , that is whenever $n-3p+3\geq 0$ . Thus $|\psi_{\text{NW}}|^{1/2}(T)$ and therefore $|\psi_{K}|^{1/2}(T)$ are integrable whenever $n-3p+3\geq 0$ . Hence we can use the Fourier inversion theorem to conclude that $f_{K}$ is proportional to the integral

[TABLE]

whenever $n-3p+3\geq 0$ . In particular, if we do a change of variables $Z=\sqrt{8}T$ , we obtain Equation (1.3) from Section 1 whenever $n\geq 3p-3$ , from which we can derive Equations (1.1) and (1.2).
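The integrability threshold used above follows from a one-line computation: with $m=n-p-1$ , the condition $m/4\geq p/2-1$ reads $n-p-1\geq 2p-4$ , that is $n\geq 3p-3$ . The brute-force sketch below confirms the equivalence over a grid of integer pairs; the function name and the grid bounds are illustrative assumptions.

```python
def half_power_integrable(n, p):
    """Condition m/4 >= p/2 - 1 with m = n - p - 1, written in integers as m >= 2p - 4."""
    m = n - p - 1
    return m >= 2 * p - 4


# Compare with the simplified form n >= 3p - 3 on a grid of (n, p) pairs.
mismatches = [
    (n, p)
    for p in range(1, 60)
    for n in range(p + 2, 400)
    if half_power_integrable(n, p) != (n >= 3 * p - 3)
]
print("mismatches:", mismatches)
```

Writing the condition with integers avoids any floating-point comparison at the boundary $n=3p-3$ .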

It would be quite pleasant if there was a way to solve the integral in Equation (6.1) or (1.3) and obtain a (potentially quite complicated) closed form expression for $f_{K}$ up to its normalization constant. So far, our efforts have been unfruitful.

We close our discussion with a final remark. At the end of Section 5, we showed that $\psi_{0}$ approximates $\psi_{\text{GOE}}$ in $0^{\text{th}}$ degree middle-scale regimes, from which the classical asymptotic normality follows. It is natural to wonder if $f_{0}$ approximates $f_{\text{GOE}}$ in the same context. An argument similar to that of Theorem 1 shows this is the case.

Proposition 5.

The total variation distance between the $0^{\text{th}}$ degree density approximation $f_{0}$ and the Gaussian orthogonal ensemble density $f_{\text{GOE}}$ satisfies $\mathrm{d}_{\text{TV}}(f_{0},f_{\text{GOE}})\rightarrow 0$ as $n\rightarrow\infty$ with $p^{3}/n\rightarrow 0$ .

Proof.

The Hellinger distance between $f_{0}$ and $f_{\text{GOE}}$ satisfies

[TABLE]

By Equation (2.1), the result follows. ∎

Of course, we could conclude from this that $H(f_{\text{NW}},f_{\text{GOE}})\leq H(f_{\text{NW}},f_{0})+H(f_{0},f_{\text{GOE}})\rightarrow 0$ when $n\rightarrow\infty$ with $p^{3}/n\rightarrow 0$ , offering yet another proof that a Gaussian orthogonal ensemble approximation holds in the classical setting.

7 The effect of phase transitions

Although we have established the existence of phase transitions, this does not shed much light on how the behavior of a normalized Wishart distribution might differ across phase transitions. To do this, it can be very illuminating to study the asymptotics of some of its statistics. For example, we could study its empirical moments.

For a normalized Wishart matrix $X\sim\sqrt{n}[\text{W}_{p}(n,I_{p}/n)-I_{p}]$ , a direct computation yields

[TABLE]

so in every middle-scale regime, that is whenever $n,p\rightarrow\infty$ with $p/n\rightarrow 0$ ,

[TABLE]

Thus we have $L^{2}$ convergence of the second empirical moment to $1$ , but otherwise nothing very interesting. There doesn’t seem to be any change of behavior across the different middle-scale regimes. In contrast, the situation with the symmetric $t$ distribution is striking, and illustrates yet again that middle-scale regime behavior becomes clearer under a G-transform. Indeed, we know from Theorem 2 that for a $T\sim\text{T}_{n/2}(I_{p}/8)$ , the quantity $\frac{1}{p}\operatorname{tr}(\frac{4T}{\sqrt{p}})^{2}$ also converges to 1, but we know more. At Equation (4.43), we computed the exact $L^{2}$ distance between $\frac{1}{p}\operatorname{tr}(\frac{4T}{\sqrt{p}})^{2}$ and $1$ , and found that

[TABLE]

Thus the $L^{2}$ distance must have middle-scale asymptotics

[TABLE]

as $n,p\rightarrow\infty$ with $p/n\rightarrow 0$ . Thus there is a sharp change in behavior of $\frac{1}{p}\operatorname{tr}(\frac{4T}{\sqrt{p}})^{2}$ when $p$ grows like $\sqrt{n}$ , and despite the symmetric $t$ distribution satisfying a semicircle law according to Corollary 1, it must ultimately behave differently than a Gaussian orthogonal ensemble matrix. The first-order asymptotics look the same: it is rather in the rate of this convergence that they differ.

This matters for both the symmetric $t$ and the Wishart distribution because rates of convergence can be distinguished in the strong topology. As a simple example, consider the sequences of one-dimensional distributions

[TABLE]

In the weak topology, these are asymptotically the same, since they converge to the same distribution – namely $F_{p},G_{p}\Rightarrow\delta_{0}$ as $p\rightarrow\infty$ , for $\delta_{0}$ the Dirac measure at $0$ . In other words, in a metric that induces the weak topology such as the Lévy metric,

[TABLE]

Yet, by a direct computation of the Hellinger distance, which induces the strong topology,

[TABLE]

as $p\rightarrow\infty$ . Thus it is clear that the strong topology captures rates of convergence in a way that the weak topology can’t. But then, we should expect a phase transition when $p$ grows like $\sqrt{n}$ for the $\text{T}_{n/2}(I_{p}/8)$ distribution. And since the symmetric $t$ is the G-conjugate of the Wishart, this should imply a phase transition when $p$ grows like $\sqrt{n}$ for the Wishart distribution as well. This is consistent with Theorem 3, and provides an alternative explanation for the existence of the second phase transition.
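The contrast between the two topologies can be illustrated with a small numerical sketch. The pair used here, $\text{N}(0,1/p)$ and $\text{N}(0,1/p^{2})$, is a hypothetical stand-in chosen for illustration and is not necessarily the pair $F_{p}$, $G_{p}$ used above. The sketch also assumes the convention $H^{2}(f,g)=1-\int\sqrt{fg}$ for the squared Hellinger distance, which may differ from the paper's by a constant factor. Both standard deviations tend to zero, so both sequences converge weakly to $\delta_{0}$, yet the squared Hellinger distance between them tends to $1$ rather than $0$.

```python
import numpy as np

def example_pair(p):
    """Standard deviations of the hypothetical pair N(0, 1/p) and N(0, 1/p^2)."""
    return p ** -0.5, 1.0 / p

def hellinger_sq_centered_normals(sd_f, sd_g):
    """Closed form of 1 - int sqrt(f g) for two centered normal densities."""
    affinity = np.sqrt(2.0 * sd_f * sd_g / (sd_f**2 + sd_g**2))
    return 1.0 - affinity

def hellinger_sq_numeric(sd_f, sd_g, grid_points=400001):
    """Same quantity by a Riemann sum, as an independent check of the closed form."""
    width = 12.0 * max(sd_f, sd_g)
    x = np.linspace(-width, width, grid_points)
    dx = x[1] - x[0]
    f = np.exp(-0.5 * (x / sd_f) ** 2) / (sd_f * np.sqrt(2.0 * np.pi))
    g = np.exp(-0.5 * (x / sd_g) ** 2) / (sd_g * np.sqrt(2.0 * np.pi))
    return 1.0 - np.sum(np.sqrt(f * g)) * dx

if __name__ == "__main__":
    for p in [10, 100, 10_000, 1_000_000]:
        sd_f, sd_g = example_pair(p)
        h2 = hellinger_sq_centered_normals(sd_f, sd_g)
        print(f"p={p:8d} sd_f={sd_f:.2e} sd_g={sd_g:.2e} H^2={h2:.4f}")
```

The two sequences differ only in how fast they collapse to the point mass, and that difference in rate is exactly what the Hellinger distance detects.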

A natural question then is to ask whether we can find symmetric $t$ statistics that exemplify all the middle-scale regime phase transitions. It is tempting to look at the $L^{2}$ error of the other empirical moments of the symmetric $t$ distribution, because we can use the methodology developed in Section 4 to compute their asymptotics to arbitrary order. As a reference, we compiled the first few moments in Table 1.

As can be seen from the table, the odd moments seem to have uniform behavior across all middle-scale regimes. In contrast, the even moments seem to all change asymptotics at the second phase transition $p=\Theta(\sqrt{n})$ , but nowhere else. Hence finding statistics that “flag” the other phase transitions remains an open question.
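For orientation, the growth rates $p\asymp n^{(K+1)/(K+3)}$ at which the phase transitions occur can be tabulated with a short sketch. Indexing from $K=0$ is an assumption made here: it is consistent with the classical condition $p^{3}/n\rightarrow 0$ corresponding to $K=0$ and with $p\asymp\sqrt{n}$ being the second transition ($K=1$). The function names are illustrative.

```python
from fractions import Fraction

def transition_exponent(K):
    """Exponent (K+1)/(K+3) of n at the transition indexed by K (assumed to start at 0)."""
    if K < 0:
        raise ValueError("K must be a nonnegative integer")
    return Fraction(K + 1, K + 3)

def regime_ratio(n, p, K):
    """The ratio p^(K+3) / n^(K+1), which tends to 0 below the K-th transition."""
    return p ** (K + 3) / n ** (K + 1)

if __name__ == "__main__":
    for K in range(6):
        print(f"K={K}: p grows like n^({transition_exponent(K)})")
```

The exponents $1/3, 1/2, 3/5, 2/3,\dots$ increase towards $1$, so the transitions accumulate at the boundary $p\asymp n$ of the middle-scale regime.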

8 Auxiliary results

This section compiles several lemmas used elsewhere in the article.

Lemma 3 (First derivatives lemma).

For any indices $1\leq i_{1},\dots,i_{2l}\leq p$ and real symmetric matrix $Z$ , there exist polynomials $a_{J,s}(n,m)$ in $n$ and $m=n-p-1$ , indexed by $0\leq s\leq l$ and $J=(j_{1},\dots,j_{2l})$ , such that

[TABLE]

Proof.

To simplify notation, let

[TABLE]

and let $M_{l}=\{M_{J,s}\,|\,J\in\{1,\dots,p\}^{2l},s\leq l\}$ be the set of all such terms “on $2l$ indices”. Let $\langle M_{l}\rangle$ denote the linear span of $M_{l}$ , that is, the space of all linear combinations of elements of $M_{l}$ whose coefficients are real polynomials in $n$ and $m$ . Then we are really claiming that

[TABLE]

To see this, let $J=(j_{1},\dots,j_{2l-2})\in\{1,\dots,p\}^{2l-2}$ and define the extension $J_{a,b}^{q}=(j_{1},\dots,j_{q-1},a,b,j_{q+1},\dots,j_{2l-2})\in\{1,\dots,p\}^{2l}$ to be $J$ with indices $a$ , $b$ inserted (in this order) at the $q^{\text{th}}$ position. Then using that

[TABLE]

and

[TABLE]

we conclude that

[TABLE]

Thus, by linearity, $\partial_{\text{s}}/\partial_{\text{s}}Z_{i_{2l}i_{2l-1}}$ maps $\langle M_{l-1}\rangle$ to $\langle M_{l}\rangle$ . But naturally we have $\exp\{-\frac{n}{4}\operatorname{tr}Z\}|Z|^{m/4}\in\langle M_{0}\rangle$ , so by induction Equation (8.1) must then hold, as desired. ∎

Lemma 4 (Second derivatives lemma).

For any $k\in\mathbb{N}$ and any $Z\in\mathbb{S}_{p}(\mathbb{R})$ ,

[TABLE]

and

[TABLE]

for some polynomials $b_{\kappa}^{(1)}(n,m,p)$ and $b_{\kappa}^{(2)}(n,m,p)$ with degrees $\mathrm{deg}\,b_{\kappa}^{(1)}\leq 2k+1-q(\kappa)$ and $\mathrm{deg}\,b_{\kappa}^{(2)}\leq 2k+2-q(\kappa)$ . The sums at the right hand sides are taken over all integer partitions $\kappa$ of norm at most $2k$ and $2k+1$ , including the empty partition.

Proof.

We give a spectral proof. Let $OLO^{t}$ be the spectral decomposition of $Z$ , with eigenvalues $\lambda_{1}\geq\dots\geq\lambda_{p}$ , and notice that

[TABLE]

for any $1\leq i,j,h,l\leq p$ . As a consequence, for any differentiable real-valued functions $F_{1}(L),\dots,F_{p}(L)$ , we have

[TABLE]

This suggests we define a new operator $D_{L}$ that maps the space of diagonal matrices $F(L)=\mathrm{diag}(F_{1}(L),\dots,F_{p}(L))$ that depend differentiably on $L$ to itself, by

[TABLE]

In particular,

[TABLE]

and similarly

[TABLE]

Let us look more closely at this operator $D_{L}$ . It satisfies the following.

(i)

$D_{L}$ is linear, in the sense that for diagonals $F(L)$ , $G(L)$ and constants $a$ , $b$ with respect to $L$ ,

[TABLE]

(ii)

$D_{L}$ satisfies a restricted product rule, in the sense that for a diagonal $F(L)$ of the form $F(L)=f(L)I_{p}$ for some function $f(L)$ , and any diagonal $G(L)$ ,

[TABLE]

Moreover, from the definition of $D_{L}$ ,

[TABLE]

Now define the spaces

[TABLE]

for $l=1,\dots,2k$ , and let $\langle M_{l}\rangle$ denote the linear span of $M_{l}$ , i.e. the space of all real linear combinations of elements of $M_{l}$ . Moreover, for a partition $\kappa$ , let $\kappa\pm i$ denote $\kappa$ with the integer $i$ added or removed, respectively. For example, $(3,1,1,1)+2=(3,2,1,1,1)$ and $(3,2,1,1,1)-1=(3,2,1,1)$ . Note that $|\kappa\pm i|=|\kappa|\pm i$ . Then, for any $F\in M_{l}$ ,

[TABLE]

Thus $D_{L}\{F\}\in\langle M_{l+1}\rangle$ . It follows by linearity that $D_{L}$ maps $\langle M_{l}\rangle$ to $\langle M_{l+1}\rangle$ .
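The partition bookkeeping used in this argument can be made concrete with a small sketch. It represents a partition as a tuple of parts in decreasing order, and it assumes that $|\kappa|$ is the sum of the parts and $q(\kappa)$ the number of parts, as the degree counting in the proof suggests. The helper names are hypothetical.

```python
def add_part(kappa, i):
    """Return kappa + i: the partition kappa with the part i inserted."""
    return tuple(sorted(kappa + (i,), reverse=True))

def remove_part(kappa, i):
    """Return kappa - i: the partition kappa with one copy of the part i removed."""
    parts = list(kappa)
    parts.remove(i)  # raises ValueError if i is not a part of kappa
    return tuple(parts)

def norm(kappa):
    """|kappa|, assumed here to be the sum of the parts."""
    return sum(kappa)

def length(kappa):
    """q(kappa), assumed here to be the number of parts."""
    return len(kappa)

if __name__ == "__main__":
    kappa = (3, 1, 1, 1)
    bigger = add_part(kappa, 2)
    print(bigger, norm(bigger), length(bigger))
    print(remove_part(bigger, 1))
```

Adding a part $i$ raises the norm by $i$ and the number of parts by one, which is the relation $q(\kappa+s)=q(\kappa)+1$ used when tracking polynomial degrees.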

Now, $e^{-\frac{n}{4}\operatorname{tr}L}\big{|}L\big{|}^{\frac{m}{4}}I_{p}\in M_{0}$ , so by induction $D^{2k}_{L}\{e^{-\frac{n}{4}\operatorname{tr}L}\big{|}L\big{|}^{\frac{m}{4}}I_{p}\}\in\langle M_{2k}\rangle$ . Hence, for some polynomials $b_{\kappa,s}^{(1)}(n,m,p)$ of degree at most $2k-q(\kappa)$ ,

[TABLE]

for $\kappa^{\prime}=\kappa+s$ , $b_{\kappa^{\prime}}^{(1)}=b_{\kappa,s}^{(1)}$ when $s\neq 0$ , while $\kappa^{\prime}=\kappa$ , $b_{\kappa^{\prime}}^{(1)}=pb_{\kappa,s}^{(1)}$ when $s=0$ . Notice that when $s\neq 0$ , the degree of the $b_{\kappa^{\prime}}$ ’s is at most $2k-q(\kappa)=2k-(q(\kappa^{\prime})-1)$ , while when $s=0$ it is at most $2k-q(\kappa)+1=2k-q(\kappa^{\prime})+1$ . Thus in both cases, $\mathrm{deg}\,b_{\kappa^{\prime}}^{(1)}\leq 2k-q(\kappa^{\prime})+1$ , which by Equation (8.2) shows the first statement of the lemma.

For the second statement of the lemma, by an argument analogous to Equation (8.4) we find that $\operatorname{tr}D_{L}^{k}\{e^{-\frac{n}{4}\operatorname{tr}L}\big{|}L\big{|}^{\frac{m}{4}}I_{p}\}\in\langle M_{k+1}\rangle$ . Thus by induction again, we must have $D_{L}^{k}\{\operatorname{tr}D_{L}^{k}\{e^{-\frac{n}{4}\operatorname{tr}L}\big{|}L\big{|}^{\frac{m}{4}}I_{p}\}\}\in\langle M_{2k+1}\rangle$ . Hence for some polynomials $b^{(2)}_{\kappa,s}(n,m,p)$ of degree at most $2k+1-q(\kappa)$ ,

[TABLE]

for again $\kappa^{\prime}=\kappa+s$ , $b^{(2)}_{\kappa^{\prime}}=b^{(2)}_{\kappa,s}$ when $s\neq 0$ , while $\kappa^{\prime}=\kappa$ , $b_{\kappa^{\prime}}^{(2)}=pb_{\kappa,s}^{(2)}$ when $s=0$ . By the same argument as before, $\mathrm{deg}\,b_{\kappa^{\prime}}^{(2)}\leq 2k-q(\kappa^{\prime})+2$ , which by Equation (8.3) shows the second statement of the lemma. This concludes the proof. ∎

We will also need in our proof a result about the asymptotics of inverse moments of the Wishart distribution. Because we couldn’t find anything like it in the literature, we think it is worthwhile to provide some context.

Let $f:(0,4)\rightarrow\mathbb{R}$ be the restriction to the positive reals of a complex function analytic in a neighborhood of $(0,4)$ . We are often interested in the linear spectral statistic $\frac{1}{p}\operatorname{tr}f(Y)$ for $Y\sim\text{W}_{p}(n,I_{p}/n)$ . Much is known about its distributional properties in the high-dimensional regime where $p\rightarrow\infty$ such that $\lim\limits_{n\rightarrow\infty}\frac{p}{n}=\alpha<1$ . For example, if $0<\alpha<1$ , there must be an $\epsilon>0$ such that $p/n\in[\epsilon,1-\epsilon]$ for all $n$ large enough, so Bai and Silverstein (2010, Theorem 9.10) and the dominated convergence theorem yield that

[TABLE]

as $n\rightarrow\infty$ . Here, $\overset{\mathcal{P}}{\rightarrow}$ stands for convergence in probability and $x_{\pm}$ for $(1\pm\sqrt{x})^{2}$ . In fact, the theorem states more, namely a central limit theorem, but what we want to draw attention to is the class of functions for which this result was proven.

This is sometimes enough, but often we would like to understand the expectation of this linear spectral statistic. If $f$ is bounded, then Equation (8.5) implies that

[TABLE]

This is nice for a function $f(z)$ like $e^{z}$ or $\sin z$ that happens to be bounded on a neighborhood of $(0,4)$ , but it unfortunately excludes many interesting unbounded functions, such as $\log z$ or $1/z$ . In fact, for unbounded $f$ , it is in general not even clear if $\lim\limits_{n\rightarrow\infty}\frac{1}{p}\operatorname{E}\!\left[\operatorname{tr}f(Y)\right]$ will be finite!

The following result shows that, at least in the case $f(z)=1/z^{s}$ for $s\in\mathbb{N}$ , we can use Stein’s lemma to obtain Equation (8.6) and its $\alpha=0$ analogue.

Lemma 5.

Let $Y\sim\text{W}_{p}(n,I_{p}/n)$ and let $s\geq 1$ be an integer. Then as long as $n\geq p+4s+2$ , the $s^{\text{th}}$ inverse moment satisfies the recursive bound

[TABLE]

In particular, as $p\rightarrow\infty$ such that $\lim\limits_{n\rightarrow\infty}\frac{p}{n}=\alpha<1$ , if $s<\alpha^{-1}-1$ then

[TABLE]

for $\alpha_{\pm}=(1\pm\sqrt{\alpha})^{2}$ .

Proof.

The classical Stein’s lemma states that for any differentiable function $f:\mathbb{R}\rightarrow\mathbb{R}$ such that $\operatorname{E}\!\left[\big{|}\big{(}\frac{\partial}{\partial Z}-Z\big{)}f(Z)\big{|}\right]<\infty$ for $Z\sim\text{N}(0,1)$ and $\lim\limits_{z\rightarrow\pm\infty}f(z)e^{-z^{2}/2}=0$ , we must have

[TABLE]
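The classical identity $\operatorname{E}[f^{\prime}(Z)]=\operatorname{E}[Zf(Z)]$ for $Z\sim\text{N}(0,1)$ can be checked numerically. The following minimal sketch uses Gauss-Hermite quadrature for the weight $e^{-z^{2}/2}$; the test functions $z^{3}$ and $\sin z$ and the quadrature order are illustrative choices.

```python
import numpy as np

def gaussian_expectation(func, order=80):
    """E[func(Z)] for Z ~ N(0, 1), by Gauss-Hermite quadrature (weight exp(-z^2/2))."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(order)
    # The weights sum to sqrt(2 pi), so divide to normalize to a probability measure.
    return np.sum(weights * func(nodes)) / np.sqrt(2.0 * np.pi)

def stein_sides(f, f_prime, order=80):
    """Return (E[f'(Z)], E[Z f(Z)]), which Stein's lemma asserts are equal."""
    lhs = gaussian_expectation(f_prime, order)
    rhs = gaussian_expectation(lambda z: z * f(z), order)
    return lhs, rhs

if __name__ == "__main__":
    print(stein_sides(lambda z: z**3, lambda z: 3 * z**2))  # both sides equal E[3 Z^2] = 3
    print(stein_sides(np.sin, np.cos))                      # both sides equal exp(-1/2)
```

In the proof, the same identity is applied entrywise to the i.i.d. standard normal entries of a matrix rather than to a single scalar variable.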

Let $Z\sim\text{N}_{n\times p}(0,I_{n}\otimes I_{p})$ be an $n\times p$ matrix of i.i.d. standard normal random variables, and let $Y=\frac{1}{n}Z^{t}Z\sim\text{W}_{p}(n,I_{p}/n)$ . For any $1\leq\alpha\leq n$ and $1\leq\beta,i,j\leq p$ ,

[TABLE]

so for $\delta$ the Kronecker delta,

[TABLE]

Let us first show that this expression is integrable. For any matrix $X$ , $|X_{ij}|\leq\|X\|_{2}=\|X^{t}X\|_{2}^{1/2}$ . Thus by Equation (8.7),

[TABLE]

As $Y$ is positive definite, $\|Y^{\pm a}\|_{2}\leq\operatorname{tr}Y^{\pm a}$ for any $a\in\mathbb{N}$ , so by the Cauchy-Schwarz inequality,

[TABLE]

which is finite for $n\geq p+4s+2$ .

Moreover, $(ZY^{-s})_{\alpha\beta}$ can be expressed using minors and determinants as a rational function of the entries of $Z$ , so

[TABLE]

So all conditions are fulfilled to apply Stein’s lemma to Equation (8.7) and obtain

[TABLE]

As $\operatorname{tr}(Y^{-l})\operatorname{tr}(Y^{-(s-l)})\leq p\operatorname{tr}Y^{-s}$ for any $1\leq l\leq s$ , and every term is integrable as $n\geq p+4s+2$ , this means that

[TABLE]

This proves the first part of the lemma.

For the second part, let $S\in\mathbb{N}$ . If we let $n\rightarrow\infty$ such that $\lim\limits_{n\rightarrow\infty}\frac{p}{n}=\alpha<1$ , then for any $S<\alpha^{-1}$ we will have $n\geq p+4S+2$ and $n\geq(p+1)S$ for $n$ large enough. So by repeatedly applying Equation (8.8) for $s=S,\dots,1$ and dividing by $p$ , we obtain

[TABLE]

Taking a limit in the above yields

[TABLE]

Thus for any $S<\alpha^{-1}$ , we have

[TABLE]

In the case $0<\alpha<1$ , if $s+1<\alpha^{-1}$ then by Jensen’s inequality and Equation (8.9) applied to $S=s+1$ , we have

[TABLE]

Thus $\frac{1}{p}\operatorname{tr}Y^{-s}$ is uniformly integrable, and by Equation (8.5),

[TABLE]

for $\alpha_{\pm}=(1\pm\sqrt{\alpha})^{2}$ .

In contrast, by applying Jensen’s inequality twice,

[TABLE]

so when $\alpha=0$ , by applying Equation (8.9) with $S=s$ , we obtain that $\lim\limits_{n\rightarrow\infty}\frac{1}{p}\operatorname{E}\!\left[\operatorname{tr}Y^{-s}\right]=1$ , as desired. ∎
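The case $s=1$ of the lemma can be checked by simulation, since the first inverse moment of a Wishart matrix is known in closed form: for $Y\sim\text{W}_{p}(n,I_{p}/n)$ with $n>p+1$ , $\operatorname{E}[Y^{-1}]=\frac{n}{n-p-1}I_{p}$ , so that $\frac{1}{p}\operatorname{E}[\operatorname{tr}Y^{-1}]=\frac{n}{n-p-1}\rightarrow\frac{1}{1-\alpha}$ . The following is a minimal Monte Carlo sketch; the function names, dimensions and seeds are illustrative.

```python
import numpy as np

def mean_inverse_trace(n, p, reps=200, seed=0):
    """Monte Carlo estimate of (1/p) E[tr Y^{-1}] for Y = Z^T Z / n, Z an n x p normal matrix."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(reps):
        z = rng.standard_normal((n, p))
        y = z.T @ z / n
        total += np.trace(np.linalg.inv(y)) / p
    return total / reps

def exact_mean_inverse_trace(n, p):
    """Closed form n / (n - p - 1), valid for n > p + 1."""
    if n <= p + 1:
        raise ValueError("the first inverse moment requires n > p + 1")
    return n / (n - p - 1)

if __name__ == "__main__":
    # alpha = 0.25, where the limit is 1 / (1 - alpha), and a small p/n case near alpha = 0.
    for n, p in [(400, 100), (4000, 20)]:
        mc = mean_inverse_trace(n, p, reps=100)
        print(f"n={n} p={p} monte carlo={mc:.4f} exact={exact_mean_inverse_trace(n, p):.4f}")
```

For $p/n$ close to zero the exact value is close to $1$ , in line with the $\alpha=0$ conclusion of the lemma.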

9 Conclusion

The results of this paper raise more questions than they answer. We enumerate some that we found particularly interesting.

(1)

The univariate $t$ distribution with $\nu$ degrees of freedom is often defined as the distribution of $Z/\sqrt{s}$ , for $Z\sim\text{N}(0,1)$ and $s\sim\chi^{2}_{\nu}/\nu$ independent. In the real symmetric matrix case, we could imagine studying the distribution of $S^{1/4}ZS^{1/4}$ , for $Z\sim\text{GOE}(p)$ and $S\sim\text{W}_{p}(\nu,I_{p}/\nu)$ independent. Is this the $T_{\nu}(2I_{p})$ distribution in the sense of Section 4?

(2)

By Theorem 2 and Corollary 1, it is clear the empirical moments of a symmetric $t$ distribution are quite similar to those of a Gaussian orthogonal ensemble matrix, except perhaps in their rates of convergence. From Anderson et al. (2010, Theorem 2.1.31), we know the empirical moments of a Gaussian orthogonal ensemble are asymptotically normal. Are the empirical moments of the symmetric $t$ distribution also asymptotically normal?

(3)

In Section 4, we showed that the rate of convergence of the even normalized empirical moments of a symmetric $t$ distribution changes when $p$ grows like $\sqrt{n}$ . Can we find analogous symmetric $t$ statistics that change their rates of convergence when $p$ grows like $n^{(K+1)/(K+3)}$ for every $K\in\mathbb{N}$ ? This would establish phase transitions for the symmetric $t$ distribution. If so, can we find approximating densities between every two transitions, just like in the Wishart case?

(4)

As a counterpart of Theorem 1, could we prove that $\mathrm{d}_{\text{TV}}(f_{\text{NW}},f_{K})\nrightarrow 0$ whenever $p^{K+3}/n^{K+1}\nrightarrow 0$ as $n\rightarrow\infty$ ? This is delicate because we have no guarantee that the $L^{1}$ norm of $\psi_{K}$ is asymptotically bounded for regimes of degree $K+1$ or higher.

(5)

Can we find the normalization constant or, better, solve the expectation of Equation (1.3) in closed form?

(6)

What asymptotics hold for the symmetric $t$ or the Wishart distribution in a middle-scale regime of infinite degree? How do these asymptotics differ from the other middle-scale regimes, or the high-dimensional regime?

(7)

The symmetric $t$ distribution was discovered as the G-conjugate of the Wishart distribution. What other distributions can be realized as the G-conjugate of some well-known distribution?

(8)

In Lemma 2, we expressed the characteristic function of the G-conjugate $F^{*}$ of a distribution $F$ as $f^{1/2}\star(f^{1/2}\circ R)$ , for $f$ the density of $F$ and $R$ the flip operator. To obtain the moments, we then repeatedly differentiated under the convolution integral at zero, and obtained an expression of the moments as an expectation with respect to $f$ . The argument worked when $F^{*}$ was the symmetric $t$ distribution. Can this argument be generalized to other $F^{*}$ ? If $F^{*}$ is a well-known distribution, does this give rise to novel and nontrivial expressions for its moments?

(9)

The G-transform of a distribution encodes all the information relative to that distribution. However, taking a modulus removes some information, and so in some sense the G-conjugate distribution is “less informative” than the original distribution. What happens when we repeatedly apply the G-conjugation operator, destroying information every time? For example, is there an attractor distribution $G$ that is the limit of this process regardless of the initial distribution?

(10)

Can we find distinct random operators which can be regarded, in some sense, as the total variation limit of a normalized Wishart matrix between every two phase transitions?

It appears to us that some of these questions might be very difficult to answer. We would be pleased if future work were able to shed light on any of them.

Bibliography

Anderson et al. [2010] Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. An Introduction to Random Matrices. Cambridge University Press, 2010.
Bai and Silverstein [2010] Zhidong Bai and Jack W. Silverstein. Spectral Analysis of Large Dimensional Random Matrices, volume 20. Springer, 2010.
Bartlett [1933] Maurice S. Bartlett. On the theory of statistical regression. Proceedings of the Royal Society of Edinburgh, 53:260–283, 1933.
Bubeck and Ganguly [2016] Sébastien Bubeck and Shirshendu Ganguly. Entropic CLT and phase transition in high-dimensional Wishart matrices. International Mathematics Research Notices, pages 243–258, 2016.
Bubeck et al. [2016] Sébastien Bubeck, Jian Ding, Ronen Eldan, and Miklós Z. Rácz. Testing for high-dimensional geometry in random graphs. Random Structures & Algorithms, 49:503–532, 2016.
Chung [2001] Kai-Lai Chung. A Course in Probability Theory. Academic Press, 2001.
Gaudin [1961] Michel Gaudin. Sur la loi limite de l’espacement des valeurs propres d’une matrice aléatoire. Nuclear Physics, 25:447–458, 1961.
Gupta and Nagar [1999] Arjun K. Gupta and Daya K. Nagar. Matrix Variate Distributions. CRC Press, 1999.
