Computation of the Expected Euler Characteristic for the Largest Eigenvalue of a Real Non-central Wishart Matrix

Nobuki Takayama; Lin Jiu; Satoshi Kuriki; and Yi Zhang

arXiv:1903.10099·math.ST·May 25, 2020

TL;DR

This paper develops an approximate formula for the distribution of the largest eigenvalue of real Wishart matrices using the expected Euler characteristic method, including differential equations and numerical analysis for small dimensions.

Contribution

It introduces a new approximation method for eigenvalue distributions of Wishart matrices applicable to general dimensions, with detailed analysis for 2x2 cases.

Findings

01. Derived an approximate distribution formula for the largest eigenvalue.

02. Established a differential equation for the 2x2 case.

03. Performed numerical analysis validating the approximation.

Abstract

We give an approximate formula for the distribution of the largest eigenvalue of real Wishart matrices by the expected Euler characteristic method for the general dimension. The formula is expressed in terms of a definite integral with parameters. We derive a differential equation satisfied by the integral for the $2 \times 2$ matrix case and perform a numerical analysis of it.

Tables

Table 1: Euler characteristic versus Monte Carlo simulation for the evaluation of (19)

$x$	0	1	2	3	4	5
$E [χ (M_{x})]$	$- 5.9 \times 10^{- 8}$	0.74	0.56	0.14	0.014	0.00058
$\Pr (σ > x)$	1.	0.95	0.57	0.14	0.014	0.00058

Table 2: Chyzak’s algorithm versus Heuristic 1 in deriving holonomic systems of (21)

# pars	$0$	$1$	$2$	$3$	$4$	$5$
Chyzak	$976$	$9.8323 \times 10^{4}$	-	-	-	-
Heuristic	$43.49$	$394.4$	$8527$	$4.3957 \times 10^{5}$	-	$1.5519 \times 10^{6}$

Table 3: Holonomic gradient method (HGM) versus Monte Carlo simulation in evaluating ${\rm E}[\chi(M_{x})]$

$x$	$1$	$2$	$3$	$4$	$5$	$6$
HGM	$0.745835$	$0.567729$	$0.144879$	$0.0146728$	$0.000582526$	$8.79942 \times 10^{- 6}$
mc	$0.745802$	$0.567623$	$0.144986$	$0.0146901$	$0.0005933$	$9.6 \times 10^{- 6}$

Table 4: Numerical evaluation by extrapolation series versus Monte Carlo simulation for ${\rm E}[\chi(M_{x})]$

$x$	$f (x)$	simulation
3.8133	0.051146	0.051176
3.8166	0.047517	0.047695
3.82	0.044120	0.044515

Table 5: Evaluation of ${\rm E}[\chi(M_{x})]$ by (35) versus Monte Carlo simulation

$x$	$E [χ (M_{x})]$	simulation (with 100000 tries)
3	0.215428520	0.217072
4	0.016122970	0.016195
5	0.000357368	0.000386

Equations

W=\Xi\Xi^{\top},\quad \Xi=(\xi_{1},\ldots,\xi_{n})\in\mathbb{R}^{m\times n}.

\Pr(\lambda_{1}(W)<x)=c_{m,n}\det\left(\frac{1}{2}nx\Sigma^{-1}\right)^{n/2}{}_{1}F_{1}\left(\frac{1}{n};\frac{1}{2}(n+m+1);-\frac{1}{2}nx\Sigma^{-1}\right),

\mathrm{etr}\bigl(-TT^{\top}\bigr)H_{\kappa}(T)=\frac{(-1)^{|\kappa|}}{\pi^{mn/2}}\int\mathrm{etr}\bigl(-2iTU^{\top}\bigr)\mathrm{etr}\bigl(-UU^{\top}\bigr)C_{\kappa}\bigl(UU^{\top}\bigr)\,dU,\quad T,U\in\mathbb{R}^{m\times n}.

\int_{0<W<xI_{m}}|W|^{(n-m+1)/2-1}\mathrm{etr}\left(-\frac{1}{2}\bigl(\Sigma^{-1}W+\Omega\bigr)\right){}_{0}F_{1}(n/2;\Omega\Sigma^{-1}W/4)\,dW.

\bigl\{u^{\top}\Xi v\mid\|u\|_{\mathbb{R}^{m}}=\|v\|_{\mathbb{R}^{n}}=1\bigr\},

\Pr(\lambda_{1}(W)\geq x)-\Pr(\lambda_{2}(W)\geq x)+\cdots+(-1)^{m-1}\Pr(\lambda_{m}(W)\geq x)

\Pr(\lambda_{2}(W)\geq x)=o\Bigl(\Pr(\lambda_{1}(W)\geq x)\Bigr)\quad\mbox{as $x\to\infty$}.

p(A)\,dA,\quad dA=\prod da_{ij}.

M=\{hg^{\top}\mid g\in S^{m-1},\ h\in S^{n-1}\}\simeq S^{m-1}\times S^{n-1}/\sim,

f(U)=\mathrm{tr}(UA)=g^{\top}Ah,\quad U\in M,

M_{x}=\{hg^{\top}\in M\mid f(U)=g^{\top}Ah\geq x\}.

f(g,h)=g^{\top}Ah,\quad g\in S^{m-1},\ h\in S^{n-1}.

\partial_{i}f=g_{i}^{\top}Ah=0,\quad \partial_{a}f=g^{\top}Ah_{a}=0.

(g^{\top}A)h=(dh^{\top})h=d(h^{\top}h)=d

g^{\top}(Ah)=g^{\top}(cg)=c(g^{\top}g)=c.

\sigma=g^{\top}Ah,\quad B=G^{\top}(g)AH(h).

A=\sigma gh^{\top}+G(g)BH(h)^{\top}.

{\cal B}(i,\sigma)=\{B\in M(m-1,n-1)\mid\mbox{$\lambda_{j}(B)>\sigma$ for all $j<i$},\ \mbox{$\lambda_{j}(B)\leq\sigma$ for all $j\geq i$}\}.

{\cal A}=\{A\in M(m,n)\mid\mbox{all the singular values of $A$ are different and non-zero}\},

{\cal A}_{i}=\{(\sigma,g,h,B)\mid\sigma\in\mathbb{R}_{>0},\ (g,h)\in S^{m-1}\times S^{n-1}/\sim,\ B\in{\cal B}(i,\sigma)\}.

\sigma^{(1)}>\sigma^{(2)}>\cdots>\sigma^{(m)}>0.

\varphi_{i}(A)=(\sigma^{(i)},g^{(i)},h^{(i)},G^{\top}(g^{(i)})AH(h^{(i)})).

\psi(\sigma,g,h,B)=g\sigma h^{\top}+G(g)BH(h)^{\top}.

\frac{1}{2}\int_{x}^{\infty}\sigma^{n-m}\,d\sigma\int_{\mathbb{R}^{(m-1)(n-1)}}dB\int_{S^{m-1}}G^{\top}dg\int_{S^{n-1}}H^{\top}dh\,\det(\sigma^{2}I_{m-1}-BB^{\top})\,p(A).

\chi(M_{x})=

=

\partial_{i}\partial_{j}f=g_{ij}^{\top}Ah=g_{ij}^{\top}\sigma gh^{\top}h+g_{ij}^{\top}GBH^{\top}h=-\sigma g_{i}^{\top}g_{j}\quad\mbox{since $H^{\top}h=0$}.

\partial_{i}\partial_{a}f=g_{i}^{\top}Ah_{a}=g_{i}^{\top}g\sigma h^{\top}h_{a}+g_{i}^{\top}GBH^{\top}h_{a}=g_{i}^{\top}GBH^{\top}h_{a}\quad\mbox{since $g_{i}^{\top}g=h^{\top}h_{a}=0$}.

\partial_{a}\partial_{b}f=g^{\top}Ah_{ab}=g^{\top}g\sigma h^{\top}h_{ab}+g^{\top}GBH^{\top}h_{ab}=-\sigma h_{a}^{\top}h_{b}\quad\mbox{since $g^{\top}G=0$}.

\left(\begin{array}{cc}-\partial_{i}\partial_{j}f&-\partial_{i}\partial_{a}f\\ -\partial_{i}\partial_{a}f&-\partial_{a}\partial_{b}f\end{array}\right)=\left(\begin{array}{cc}\sigma g_{i}^{\top}g_{j}&-g_{i}^{\top}GBH^{\top}h_{a}\\ -h_{a}^{\top}HB^{\top}G^{\top}g_{i}&\sigma h_{a}^{\top}h_{b}\end{array}\right)


Taxonomy

Topics: Random Matrices and Applications · Point processes and geometric inequalities · Morphological variations and asymmetry

Full text

Computation of the expected Euler characteristic for the largest eigenvalue of a real non-central Wishart matrix

Nobuki Takayama

Lin Jiu

Satoshi Kuriki

Yi Zhang

Department of Mathematics, Kobe University, Japan

Department of Mathematics and Statistics, Dalhousie University, Canada

The Institute of Statistical Mathematics, Research Organization of Information and Systems, Japan

Department of Mathematical Sciences, The University of Texas at Dallas, USA

Department of Mathematical Sciences, Xi’an Jiaotong-Liverpool University, China

Abstract

We give an approximate formula for the distribution of the largest eigenvalue of real Wishart matrices by the expected Euler characteristic method for general dimension. The formula is expressed in terms of a definite integral with parameters. We derive a differential equation satisfied by the integral for the $2\times 2$ matrix case and perform a numerical analysis of it.

keywords:

Euler characteristic method, holonomic gradient method, real non-central Wishart distributions.

MSC:

[2010] 62H10, 68W30

Journal: Journal of Multivariate Analysis

1 Introduction

For $i\in\{1,\ldots,n\}$ , let $\xi_{i}\in\mathbb{R}^{m\times 1}$ be independently distributed as the $m$ -dimensional (real) Gaussian distribution $\mathcal{N}_{m}(\mu_{i},\Sigma)$ , where $\mu_{i}$ and $\Sigma$ are the mean vector and covariance matrix of $\xi_{i}$ , respectively. The (real) Wishart distribution $\mathcal{W}_{m}(n,\Sigma;\Omega)$ is the probability measure on the cone of $m\times m$ positive semi-definite matrices induced by the random matrix

$$W=\Xi\Xi^{\top},\quad \Xi=(\xi_{1},\ldots,\xi_{n})\in\mathbb{R}^{m\times n}.$$

Here, $\Omega=\Sigma^{-1}\sum_{i=1}^{n}\mu_{i}\mu_{i}^{\top}$ is a parameter matrix. Unless $\Omega$ vanishes, the corresponding distribution is referred to as the non-central (real) Wishart distribution.
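The construction above can be sketched numerically. The following is a minimal illustration, assuming NumPy; the function names `sample_noncentral_wishart` and `noncentrality` and the example values of $\Sigma$ and $\mu_{i}$ are hypothetical and are not taken from the paper.

```python
import numpy as np

def sample_noncentral_wishart(means, sigma, rng):
    """Draw W = Xi Xi^T, where column i of Xi is N_m(means[:, i], sigma)."""
    means = np.asarray(means, dtype=float)   # m x n matrix (mu_1, ..., mu_n)
    sigma = np.asarray(sigma, dtype=float)   # m x m covariance matrix
    chol = np.linalg.cholesky(sigma)         # sigma = chol @ chol.T
    xi = means + chol @ rng.standard_normal(means.shape)
    return xi @ xi.T

def noncentrality(means, sigma):
    """Omega = Sigma^{-1} sum_i mu_i mu_i^T."""
    means = np.asarray(means, dtype=float)
    return np.linalg.solve(np.asarray(sigma, dtype=float), means @ means.T)

rng = np.random.default_rng(0)
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])              # illustrative values only
means = np.array([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]])   # m = 2, n = 3
w = sample_noncentral_wishart(means, sigma, rng)
omega = noncentrality(means, sigma)
```

Since ${\rm E}[\xi_{i}\xi_{i}^{\top}]=\Sigma+\mu_{i}\mu_{i}^{\top}$, the mean of $W$ is $n\Sigma+\Sigma\Omega$, which gives a simple sanity check for any sampler of this kind.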

The largest eigenvalue $\lambda_{1}(W)$ of $W$ is used as a test statistic for testing $\Sigma=I_{m}$ and/or $\Omega\neq 0$ under the assumption that $\Sigma-I_{m}$ is positive semi-definite. This test statistic is expected to have good power when the matrices $\Sigma-I_{m}$ and $\Omega$ are of small size.

When testing hypotheses, the distribution of $\lambda_{1}(W)$ , which is the largest eigenvalue of $W$ , is of particular interest as it gives the power of the test. When $\Omega=0$ , the works by James and other authors (see, e.g., Muirhead [31]) show that the cumulative distribution function of $\lambda_{1}(W)$ can be written as a hypergeometric function with matrix argument as follows:

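$$\Pr(\lambda_{1}(W)<x)=c_{m,n}\det\left(\frac{1}{2}nx\Sigma^{-1}\right)^{n/2}{}_{1}F_{1}\left(\frac{1}{n};\frac{1}{2}(n+m+1);-\frac{1}{2}nx\Sigma^{-1}\right),$$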

where $c_{m,n}$ is a known constant [31, Corollary 9.7.2]. It is well known that the hypergeometric function ${}_{1}F_{1}$ has a series expression in the zonal polynomial $C_{\kappa}$ with index $\kappa$ , which is a partition of an integer. However, in view of numerical calculation, this is less useful because the explicit form of $C_{\kappa}(X)$ is not known unless the rank of the matrix $X$ is 1 or 2. On account of this difficulty, Hashiguchi et al. [14] proposed a holonomic gradient method (HGM) for numerical evaluation, which utilizes a holonomic system of differential equations for computation. However, when $\Omega\neq 0$ , the situation is more difficult. In this case, the cumulative distribution function $\Pr(\lambda_{1}(W)<x)$ cannot be expressed as a simple series of zonal polynomials. Hayakawa [15, Corollary 10] provides a formula for the cumulative distribution function as a series expansion in the Hermite polynomial $H_{\kappa}$ with symmetric matrix argument defined by the Laplace transform of $C_{\kappa}$ as follows:

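$$\mathrm{etr}\bigl(-TT^{\top}\bigr)H_{\kappa}(T)=\frac{(-1)^{|\kappa|}}{\pi^{mn/2}}\int\mathrm{etr}\bigl(-2iTU^{\top}\bigr)\mathrm{etr}\bigl(-UU^{\top}\bigr)C_{\kappa}\bigl(UU^{\top}\bigr)\,dU,\quad T,U\in\mathbb{R}^{m\times n}.$$

The Hermite polynomial $H_{\kappa}$ can be written as a linear combination of the zonal polynomials $C_{\kappa}$; however, the coefficients are not provided explicitly [7]. Another approach is to use invariant polynomials proposed by Davis [11, 12]. Using the probability density function of the non-central Wishart distribution derived by James [19], the cumulative distribution function of $\lambda_{1}(W)$ is shown to be proportional to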

$$\int_{0<W<xI_{m}}|W|^{(n-m+1)/2-1}\mathrm{etr}\left(-\frac{1}{2}\bigl(\Sigma^{-1}W+\Omega\bigr)\right){}_{0}F_{1}(n/2;\Omega\Sigma^{-1}W/4)\,dW.$$

Díaz-García and Gutiérrez-Jáimez [13] showed that this has a series expansion in terms of invariant polynomials. Here, the invariant polynomial is a polynomial in two matrices indexed by two partitions. Although, in principle, the invariant polynomial can be expressed in terms of zonal polynomials in two matrices, it is challenging to utilize this expression for numerical calculation.

In this paper, instead of the direct calculation approach, we approximate the distribution function through the expected Euler characteristic heuristic, also called the Euler characteristic method, proposed by Adler [1] and Worsley [36] (see also [2] and [29]). This is a methodology to approximate the upper tail probability of the maximum of a random field. In our problem, since the square root of the largest eigenvalue $\lambda_{1}(W)^{1/2}$ is the maximum of a Gaussian field

$$\bigl\{u^{\top}\Xi v\mid\|u\|_{\mathbb{R}^{m}}=\|v\|_{\mathbb{R}^{n}}=1\bigr\},$$

this method is applicable ([27], [28]). One can show that the Euler characteristic method evaluates the quantity

$$\Pr(\lambda_{1}(W)\geq x)-\Pr(\lambda_{2}(W)\geq x)+\cdots+(-1)^{m-1}\Pr(\lambda_{m}(W)\geq x) \tag{1}$$

rather than $\Pr(\lambda_{1}(W)\geq x)$ . Nevertheless, this formula approximates $\Pr(\lambda_{1}(W)\geq x)$ well when $x$ is large because $\Pr(\lambda_{i}(W)\geq x)$ ( $i\geq 2$ ) are negligible for large $x$ . This is practically sufficient for our purpose because only the upper tail probability is required in testing hypotheses.
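The gap between the alternating sum and the tail probability of $\lambda_{1}(W)$ can be illustrated by simulation. The sketch below assumes NumPy and the simplest setting $\Sigma=I_{m}$, $\Omega=0$; the function name `tail_estimates`, the trial count, and the thresholds are illustrative choices rather than the paper's experimental setup.

```python
import numpy as np

def tail_estimates(m, n, x, trials=20000, seed=0):
    """Monte Carlo estimates of the alternating sum and of Pr(lambda_1(W) >= x)
    for W = Xi Xi^T with Xi an m x n standard Gaussian matrix (assumed setting)."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((trials, m, n))
    # Eigenvalues of W are the squared singular values of Xi, in descending order.
    lam = np.linalg.svd(xi, compute_uv=False) ** 2
    signs = (-1.0) ** np.arange(lam.shape[1])       # +1, -1, +1, ...
    alternating = ((lam >= x) * signs).sum(axis=1).mean()
    tail = (lam[:, 0] >= x).mean()
    return alternating, tail

for x in (1.0, 4.0, 9.0):
    alt, tail = tail_estimates(2, 2, x)
    print(f"x={x}: alternating sum ~ {alt:.4f}, Pr(lambda_1 >= x) ~ {tail:.4f}")
```

Because the indicators $1(\lambda_{i}\geq x)$ are non-increasing in $i$, the alternating sum never exceeds the tail probability of $\lambda_{1}(W)$, and the difference is bounded by $\Pr(\lambda_{2}(W)\geq x)$.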

In this paper, we consider the non-central real Wishart matrix. In the multiple-input multiple-output (MIMO) problem, the non-central complex Wishart matrix also plays an important role. The largest eigenvalue of the non-central complex Wishart matrix is significantly easier to handle in that case because the explicit formula for the cumulative distribution was given by Kang and Alouini [21]. The holonomic gradient method based on Kang and Alouini’s formula was proposed in [10].

In general, the approximation error of the Euler characteristic has not been extensively studied. Nevertheless, the Euler characteristic heuristic is widely used as an approximation of the tail probability of the supremum because of the difficulty of the original problem and the empirical usefulness of this heuristic (see, e.g., [35]). One exception is the Gaussian process with mean zero and variance one, which corresponds to the central Wishart case where $\Sigma$ is proportional to the identity matrix and $\Omega$ vanishes. In this particular case, the approximation error has been fully investigated [27, 28]; the details are presented in Appendix B. For the non-central case, we present the following lemma, whose proof is provided in Appendix A:

Lemma 1.

Assume that $m=2$ . If either $\Sigma$ or $\Sigma\Omega$ has distinct eigenvalues, then it holds that

$$\Pr(\lambda_{2}(W)\geq x)=o\Bigl(\Pr(\lambda_{1}(W)\geq x)\Bigr)\quad\mbox{as $x\to\infty$}.$$

This implies that the Euler characteristic approximation (1) is justified as an approximation for $\Pr(\lambda_{1}(W)\geq x)$ . We conjecture that this holds for arbitrary $m$ and arbitrary configurations of $\Sigma$ and $\Omega$ .

The rest of the paper is organized as follows. In Section 2, we provide an integral representation formula for the expectation of the Euler characteristic for random matrices of a general size. In Section 3, we consider the case of $2\times 2$ random matrices, study the integral representation derived in Section 2 in the polar coordinate system, and investigate it from a numerical point of view. By virtue of the theory of holonomic systems (see, e.g., [16]), the integral representation given in Section 2 satisfies a holonomic system of linear differential equations. However, its explicit form is not known in general. In Section 4, we consider the case of $2\times 2$ random matrices again to demonstrate that recent developments [22, 25] in computer-aided proofs and derivations (CAPD) of combinatorial identities and of difference and differential equations can be applied to the evaluation of definite integrals or sums. We derive a differential equation satisfied by the integral representation of the expectation of the Euler characteristic with the help of computer algebra algorithms and perform a numerical analysis of the differential equation. This yields a new efficient method to numerically evaluate the expected Euler characteristic when the numerical integration is difficult to perform. Finally, in Appendix B, we give a closed formula, expressed in terms of the Laguerre polynomial, for the expectation of the Euler characteristic for random matrices of general size for the central and scalar covariance case.

2 Expectation of an Euler characteristic number

Let $A=(a_{ij})$ be a real $m\times n$ matrix-valued random variable (random matrix) with density

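$$p(A)\,dA,\quad dA=\prod da_{ij}.$$

We assume that $p(A)$ is smooth and $n\geq m\geq 2$. Define a manifold

$$M=\{hg^{\top}\mid g\in S^{m-1},\ h\in S^{n-1}\}\simeq S^{m-1}\times S^{n-1}/\sim,$$

where $(h,g)\sim(-h,-g)$, $h$ and $g$ are column vectors, and $hg^{\top}$ is a rank-$1$ $n\times m$ matrix. Set

$$f(U)=\mathrm{tr}(UA)=g^{\top}Ah,\quad U\in M,$$

and

$$M_{x}=\{hg^{\top}\in M\mid f(U)=g^{\top}Ah\geq x\}.$$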

Proposition 1.

Let $A$ be a random matrix as above. The following claims are equivalent:

(i) The function $f(U)$ has a critical point at $U=hg^{\top}$.

(ii) The vectors $g^{\top},h$ are left and right eigenvectors of $A$, respectively. In other words, there exists a constant $c$ such that $g^{\top}A=ch^{\top}$, $Ah=cg$.

Moreover, the function $f$ takes value $c$ at the critical point $(g,h)$ .

Proof. We assume that $g\in S^{m-1}$ and $h\in S^{n-1}$ are expressed by local coordinates $u_{i}$ and $v_{a}$, respectively, where $1\leq i\leq m-1$ and $1\leq a\leq n-1$. We denote $\partial/\partial u_{i}$ by $\partial_{i}$ and $\partial/\partial v_{a}$ by $\partial_{a}$. Since $g^{\top}g=1$, we have $g_{i}^{\top}g=0$, where $g_{i}=\partial_{i}\bullet g$. We omit $\bullet$, which represents the action, when there is no ambiguity. Analogously, we have $h_{a}^{\top}h=0$, where $h_{a}=\partial_{a}h$.

Assume that $A$ is an $m\times n$ (random real) matrix. Let us consider the function $f(U)$ expressed in the local coordinates $(g(u),h(v))$

$$f(g,h)=g^{\top}Ah,\quad g\in S^{m-1},\ h\in S^{n-1}.$$

At the critical point of $f$, we have

$$\partial_{i}f=g_{i}^{\top}Ah=0,\quad \partial_{a}f=g^{\top}Ah_{a}=0. \tag{2}$$

The equality (2) holds for each $i$, and $u$ is a local coordinate of $S^{m-1}$, so the $g_{i}$'s are linearly independent and span the tangent space of $S^{m-1}$ at $g$. Hence $Ah$ is orthogonal to this tangent space, and there exists a constant $c$ such that $Ah=cg$ at the critical point. Analogously, there exists a constant $d$ such that $A^{\top}g=dh$. Let us show that $c=d$. We have

$$(g^{\top}A)h=(dh^{\top})h=d(h^{\top}h)=d$$

and

$$g^{\top}(Ah)=g^{\top}(cg)=c(g^{\top}g)=c.$$

Therefore, we have $d=c=f(g,h)$ at the critical point.

Conversely, $Ah=cg$ and $A^{\top}g=dh$ at a point $(u,v)$ imply that $(g(u),h(v))$ is a critical point of $f(g(u),h(v))$ . ∎
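Proposition 1 can be checked numerically: at each singular pair of $A$ the gradient of $f(g,h)=g^{\top}Ah$, projected onto the tangent spaces of the two spheres, vanishes, and $f$ equals the singular value. The sketch below assumes NumPy; the helper name `tangential_gradient` and the matrix size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5
a = rng.standard_normal((m, n))

# Thin SVD: a = u @ diag(s) @ vt; columns of u and rows of vt are singular vectors.
u, s, vt = np.linalg.svd(a, full_matrices=False)

def tangential_gradient(a, g, h):
    """Gradient of f(g, h) = g^T A h projected onto the tangent spaces of the spheres."""
    grad_g = a @ h        # Euclidean gradient with respect to g
    grad_h = a.T @ g      # Euclidean gradient with respect to h
    return grad_g - (g @ grad_g) * g, grad_h - (h @ grad_h) * h

checks = []
for i in range(m):
    g, h = u[:, i], vt[i, :]
    tg, th = tangential_gradient(a, g, h)
    # Store the value of f and the norms of both projected gradients.
    checks.append((g @ a @ h, np.linalg.norm(tg), np.linalg.norm(th)))
```

Each entry of `checks` pairs a critical value with two gradient norms that should be zero up to rounding error.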

We consider a continuous family of elements of $\mathit{SO}(m)$ parameterized by the first column vector $g$. In other words, we consider a continuous family of orthogonal frames of $\mathbb{R}^{m}$ parameterized by $g\in S^{m-1}$. An element of $\mathit{SO}(m)$ is denoted by $(g,G)\in\mathit{SO}(m)$, where $G$ is an $m\times(m-1)$ matrix. Analogously, we take a family $(h,H)\in\mathit{SO}(n)$ parameterized by $h\in S^{n-1}$, where $H$ is an $n\times(n-1)$ matrix parameterized by $h$. Set

$$\sigma=g^{\top}Ah,\quad B=G^{\top}(g)AH(h).$$

The matrix $A$ can be expressed as

$$A=\sigma gh^{\top}+G(g)BH(h)^{\top}. \tag{3}$$

Intuitively, this is a partial singular value decomposition. We denote the set of all $(m-1)\times(n-1)$ matrices by $M(m-1,n-1)$ .
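The partial singular value decomposition can be sketched as follows, assuming NumPy. The helpers `orthonormal_complement` and `partial_svd` are hypothetical names; the complement is built with a QR factorization, so $(g,G)$ is orthogonal but its determinant is not forced to be $+1$ as in the text.

```python
import numpy as np

def orthonormal_complement(v):
    """Columns form an orthonormal basis of the orthogonal complement of span(v)."""
    q, _ = np.linalg.qr(v.reshape(-1, 1), mode="complete")
    return q[:, 1:]       # the first column of q is +/- v / |v|

def partial_svd(a, i=0):
    """Return (sigma, g, h, G, H, B) with a = sigma g h^T + G B H^T for the i-th singular pair."""
    u, s, vt = np.linalg.svd(a)
    g, h = u[:, i], vt[i, :]
    big_g, big_h = orthonormal_complement(g), orthonormal_complement(h)
    b = big_g.T @ a @ big_h
    return s[i], g, h, big_g, big_h, b

rng = np.random.default_rng(2)
a = rng.standard_normal((3, 4))
sigma, g, h, big_g, big_h, b = partial_svd(a, i=1)
reconstructed = sigma * np.outer(g, h) + big_g @ b @ big_h.T
```

The matrix `b` has size $(m-1)\times(n-1)$, and its singular values are those of `a` with the selected one removed.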

The aforementioned decomposition provides a coordinate system for the space of random matrices $A$ . Without loss of generality, we assume that $m\leq n$ . We sort the singular values of $B$ in descending order, and denote by $\lambda_{j}(B)$ the $j$ -th singular value of the matrix $B$ . For a real number $\sigma$ , we define

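$${\cal B}(i,\sigma)=\{B\in M(m-1,n-1)\mid\mbox{$\lambda_{j}(B)>\sigma$ for all $j<i$},\ \mbox{$\lambda_{j}(B)\leq\sigma$ for all $j\geq i$}\}.$$

Subsequently, we set

$${\cal A}=\{A\in M(m,n)\mid\mbox{all the singular values of $A$ are different and non-zero}\},$$

and

$${\cal A}_{i}=\{(\sigma,g,h,B)\mid\sigma\in\mathbb{R}_{>0},\ (g,h)\in S^{m-1}\times S^{n-1}/\sim,\ B\in{\cal B}(i,\sigma)\}.$$

For a matrix $A$ in ${\cal A}\subset M(m,n)$, we sort the singular values of $A$ in descending order as follows:

$$\sigma^{(1)}>\sigma^{(2)}>\cdots>\sigma^{(m)}>0.$$

Let $g^{(i)}$ and $h^{(i)}$ be the left and right eigenvectors of $A$ for $\sigma^{(i)}$, respectively. Note that $g^{(i)}$ and $h^{(i)}$ are eigenvectors of $AA^{\top}$ and $A^{\top}A$, respectively, for the eigenvalue $(\sigma^{(i)})^{2}$, which implies that $g^{(i)}$ and $h^{(i)}$ are uniquely determined up to multiplication by $\pm 1$. Define a map $\varphi_{i}$ from ${\cal A}$ to ${\cal A}_{i}$ by

$$\varphi_{i}(A)=(\sigma^{(i)},g^{(i)},h^{(i)},G^{\top}(g^{(i)})AH(h^{(i)})). \tag{4}$$

The matrix $B^{(i)}=G^{\top}(g^{(i)})AH(h^{(i)})$ lies in ${\cal B}(i,\sigma^{(i)})$ because the singular values of $B^{(i)}$ agree with those of $A$ excluding $\sigma^{(i)}$.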

Lemma 2.

The map $\varphi_{i}$ in (4) is smooth and isomorphic.

Proof. Define a map $\psi$ from ${\cal A}_{i}$ to ${\cal A}$ by

$$\psi(\sigma,g,h,B)=g\sigma h^{\top}+G(g)BH(h)^{\top}.$$

Based on calculation, we observe that $\varphi_{i}\circ\psi$ and $\psi\circ\varphi_{i}$ are identity maps. The map $\varphi_{i}$ is then one-to-one and surjective. Next, we show that the map $\psi$ is smooth. Since we assume that all the singular values are different, the maps of taking the $i$ -th singular value of a given $A$ and an eigenvector for the singular value are smooth on an open connected neighborhood $W\subset{\cal A}$ of $A$ (by checking the Jacobian does not vanish). The inverse map is then locally smooth. Hence, $\varphi_{i}$ is smooth and isomorphic. ∎

We are interested in the Euler characteristic of $M_{x}$ .

Theorem 1.

Suppose that $x>0$ and $f(U)$ is a Morse function for almost all $A$ ’s. We further assume that if a set is of measure zero with respect to the Lebesgue measure, it is also a measure zero set with respect to the measure $p(A)dA$ . The expectation of the Euler characteristic number ${\rm E}[\chi(M_{x})]$ equals

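$$\frac{1}{2}\int_{x}^{\infty}\sigma^{n-m}\,d\sigma\int_{\mathbb{R}^{(m-1)(n-1)}}dB\int_{S^{m-1}}G^{\top}dg\int_{S^{n-1}}H^{\top}dh\,\det(\sigma^{2}I_{m-1}-BB^{\top})\,p(A). \tag{5}$$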

Here, we set $G^{\top}dg=\wedge_{i=1}^{m-1}G_{i}^{\top}dg$ , $H^{\top}dh=\wedge_{i=1}^{n-1}H_{i}^{\top}dh$ , where $G_{i}$ and $H_{i}$ are the $i$ -th column vectors of $G$ and $H$ , respectively, $dg=(dg_{1},\ldots,dg_{m})^{\top}$ and $dh=(dh_{1},\ldots,dh_{n})^{\top}.$

Note that $G^{\top}dg$ and $H^{\top}dh$ are $O(m)$ and $O(n)$ invariant measures on $S^{m-1}$ and $S^{n-1}$ , respectively.

Proof. Without loss of generality, we assume that $m\leq n$. According to Morse theory, if $f(U)$ is a Morse function, that is, a smooth function without degenerate critical points, then we have

[TABLE]

where $\sigma^{(i)}$ is the $i$ -th singular value of $A$ , $g^{(i)}$ and $h^{(i)}$ are left and right eigenvectors, and $B^{(i)}=G^{\top}(g^{(i)})AH(h^{(i)})$ . The equality (8) is the Morse theorem for manifolds with boundaries. The equalities (8) and (11) can be established as follows.

First, we have the relation $g_{i}^{\top}g=0$ . By differentiating it with respect to $u_{j}$ , we have $g_{ij}^{\top}g+g_{i}^{\top}g_{j}=0$ . Let us evaluate $\partial_{i}\partial_{j}f$ . By the expression $A=\sigma gh^{\top}+GBH^{\top}$ , it is equal to

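$$\partial_{i}\partial_{j}f=g_{ij}^{\top}Ah=g_{ij}^{\top}\sigma gh^{\top}h+g_{ij}^{\top}GBH^{\top}h=-\sigma g_{i}^{\top}g_{j}\quad\mbox{since $H^{\top}h=0$}.$$

Next, we evaluate $\partial_{i}\partial_{a}f$:

$$\partial_{i}\partial_{a}f=g_{i}^{\top}Ah_{a}=g_{i}^{\top}g\sigma h^{\top}h_{a}+g_{i}^{\top}GBH^{\top}h_{a}=g_{i}^{\top}GBH^{\top}h_{a}\quad\mbox{since $g_{i}^{\top}g=h^{\top}h_{a}=0$}.$$

Third, we evaluate $\partial_{a}\partial_{b}f$:

$$\partial_{a}\partial_{b}f=g^{\top}Ah_{ab}=g^{\top}g\sigma h^{\top}h_{ab}+g^{\top}GBH^{\top}h_{ab}=-\sigma h_{a}^{\top}h_{b}\quad\mbox{since $g^{\top}G=0$}.$$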

Summarizing the aforedescribed calculations, the Hessian is equal to

[TABLE]

Since $\det(PP^{\top})=\det(P)^{2}$ , the sign of the determinant of the Hessian is equal to that of the middle of the above 3 matrices.

The equalities (11) and (12) can now be established; we fix $i$ and omit the superscript $(i)$ in the following discussion. We consider the product of the following two matrices:

[TABLE]

which is equal to

[TABLE]

Since the bottom-left block is $\mathbf{0}$, the determinant of this matrix is $\det(\sigma^{2}I_{m}-GBB^{\top}G^{\top})$. Setting $C=BB^{\top}$ and $\tilde{G}=\left(g|G\right)$, we have

[TABLE]

Since ${\tilde{G}}{\tilde{G}}^{\top}=E$, the determinant of the matrix above is equal to $\sigma^{2}\,\det(\sigma^{2}I_{m-1}-C)$. In summary, we have obtained the equalities (11) and (12).

With regard to the expectation of the Euler characteristic number, exchanging the sum and the integral, we have

[TABLE]

To evaluate the expectation of the Euler characteristic number, we require the Jacobian of (3). According to standard arguments in multivariate analysis (see, e.g., [34, (3.19)]), we have

[TABLE]

Subsequently, we have

[TABLE]

The factor $1/2$ is owing to the multiplicity of $(g,h)\mapsto gh^{\top}$ being $2$ . Set ${\cal B}^{(i)}={\cal B}(i,\sigma^{(i)})$ . For $i\neq j$ , since ${\cal B}^{(i)}\cap{\cal B}^{(j)}$ and $\mathbb{R}^{(m-1)(n-1)}\setminus\sum_{i=1}^{m}{\cal B}^{(i)}$ are measure zero sets, we may sum up integral domains for $B$ into one domain as

[TABLE]

Thus, we have derived the conclusion. ∎
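The sign of $\det(\sigma^{2}I_{m-1}-BB^{\top})$ that enters the proof can be explored numerically. Since the eigenvalues of $B^{(i)}B^{(i)\top}$ are the squared singular values of $A$ other than $\sigma^{(i)}$, the determinant is a product of differences of squared singular values, and its sign should be $(-1)^{i-1}$. The sketch below assumes NumPy, $m\leq n$, and $x>0$; the names `complement`, `hessian_signs`, and `signed_critical_count` are hypothetical, and the signed count is only an illustration of the Morse-type sum, not the paper's implementation.

```python
import numpy as np

def complement(v):
    """Orthonormal basis (as columns) of the orthogonal complement of span(v)."""
    q, _ = np.linalg.qr(v.reshape(-1, 1), mode="complete")
    return q[:, 1:]

def hessian_signs(a):
    """sign det(sigma_i^2 I - B_i B_i^T) for every singular pair of a (m <= n assumed)."""
    u, s, vt = np.linalg.svd(a)
    signs = []
    for i in range(len(s)):
        b = complement(u[:, i]).T @ a @ complement(vt[i, :])
        mat = s[i] ** 2 * np.eye(b.shape[0]) - b @ b.T
        signs.append(int(np.sign(np.linalg.det(mat))))
    return s, signs

def signed_critical_count(a, x):
    """Sum of the signs over the singular values that are >= x (x > 0 assumed)."""
    s, signs = hessian_signs(a)
    return sum(sg for sv, sg in zip(s, signs) if sv >= x)

rng = np.random.default_rng(3)
a = rng.standard_normal((4, 6))
s, signs = hessian_signs(a)
```

With distinct singular values, the signed count equals $\sum_{i}(-1)^{i-1}1(\sigma^{(i)}\geq x)$, which is consistent with the alternating sum (1) after taking expectations.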

The integral (5) does not depend on the choice of $G(g)$ or $H(h)$. The reason is as follows. The column vectors of the matrix $G=G(g)$ are of length 1 and are orthogonal to the vector $g$. Let ${\tilde{G}}$ be a matrix with the same property. In other words, we assume $(g,{\tilde{G}})\in SO(m)$. Then there exists an $(m-1)\times(m-1)$ orthogonal matrix $P$ such that ${\tilde{G}}=GP$ and $|P|=1$ hold. Taking the exterior product of elements of ${\tilde{G}}^{\top}dg=P^{\top}G^{\top}dg$, we have

[TABLE]

The case for $H$ can be shown analogously.

One of the most important examples is that $A$ has a Gaussian distribution $\mathcal{N}_{m\times n}(M,\Sigma\otimes I_{n})$ , where $\otimes$ is the Kronecker product of matrices. In this case, we have

[TABLE]

The largest singular value of $A$ is the square root of the largest eigenvalue of a non-central Wishart matrix $W_{m}(n,\Sigma,\Sigma^{-1}MM^{\top})$ . Substituting (3) and (13) into (5), we have

[TABLE]

In this expression, the number of parameters is $m(m+1)/2+mn$ ; therefore, it is over-parameterized. Note that

[TABLE]

Let $\Sigma^{1/2}=P^{\top}DP$ be a spectral decomposition, where $D=\mathrm{diag}(d_{i})$ . Then we have

[TABLE]

Let $PM=NQ$ be a QR decomposition, where $N$ is an $m\times n$ lower triangular matrix with nonnegative diagonal elements and $Q\in O(n)$. Then $PAQ^{\top}=DV+N$. Since the largest singular values of $A$ and $PAQ^{\top}$ are the same, we can assume without loss of generality that $\Sigma$ is a diagonal matrix and $M$ is a lower triangular matrix with nonnegative diagonal elements, i.e.,

[TABLE]
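The reduction to a diagonal $\Sigma$ and a lower triangular $M$ can be sketched numerically. The code below assumes NumPy and obtains the factorization $PM=NQ$ from a QR decomposition of $(PM)^{\top}$; the function name `canonical_form` and the example parameters are hypothetical.

```python
import numpy as np

def canonical_form(sigma, mean):
    """Return (P, d, N, Q) with Sigma = P^T diag(d^2) P, P M = N Q,
    N lower triangular with nonnegative diagonal, and Q orthogonal."""
    sigma = np.asarray(sigma, dtype=float)
    mean = np.asarray(mean, dtype=float)
    evals, evecs = np.linalg.eigh(sigma)      # sigma = evecs diag(evals) evecs^T
    p = evecs.T
    d = np.sqrt(evals)                        # diagonal of D in Sigma^{1/2} = P^T D P
    q_full, r = np.linalg.qr((p @ mean).T, mode="complete")   # (P M)^T = q_full r
    lower, q = r.T, q_full.T                  # P M = lower @ q
    flip = np.ones(q.shape[0])
    k = min(lower.shape)
    flip[:k] = np.where(np.diag(lower) < 0, -1.0, 1.0)
    # Flip matching columns of N and rows of Q so that the diagonal of N is nonnegative.
    return p, d, lower * flip, flip[:, None] * q

sigma = np.array([[2.0, 0.3], [0.3, 1.0]])                  # illustrative values only
mean = np.array([[1.0, -2.0, 0.5], [0.2, 0.4, -1.0]])
p, d, lower, q = canonical_form(sigma, mean)

rng = np.random.default_rng(4)
v = rng.standard_normal(mean.shape)
a = p.T @ np.diag(d) @ p @ v + mean       # A = Sigma^{1/2} V + M
a_canonical = p @ a @ q.T                 # equals D (P V Q^T) + N
```

Because $P$ and $Q$ are orthogonal, `a` and `a_canonical` have the same singular values, and $PVQ^{\top}$ is again a standard Gaussian matrix.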

When $\Sigma$ has multiple roots, i.e.,

[TABLE]

by multiplying $\mathrm{diag}(P_{1},\ldots,P_{r})\in O(n_{1})\times\cdots\times O(n_{r})$ and its transpose from the left and right, we can assume

[TABLE]

Therefore, our problem can be formalized as follows: Evaluate (14) with parameters (16) (or (17) and (18)).

In the following sections, we will evaluate the integral representation of the expectation of the Euler characteristic given in Theorem 1 for some interesting special cases. We can obtain approximate values of the probability of the largest eigenvalue of random matrices by virtue of them. The Euler characteristic heuristic is

[TABLE]

The condition that $f(U)$ is a Morse function with probability one holds if $A$ has $m$ distinct and non-zero singular values with probability one.

3 The case of $m=n=2$

We derive Theorem 1 in the special case of $m=n=2$ by taking explicit coordinates. This derivation motivates the proof for the general case discussed in the previous section. The case $m=n=2$ is studied numerically in the last section with the holonomic gradient method (HGM).

Fix two unit vectors

[TABLE]

Define

[TABLE]

which satisfies

[TABLE]

Similarly, we define $H=\left(\cos\left(\phi+\frac{\pi}{2}\right),\sin\left(\phi+\frac{\pi}{2}\right)\right)^{\top}=\left(-\sin\phi,\cos\phi\right)^{\top}$ . Here, in case the sum is greater than $2\pi$ , both $\theta+\pi/2$ and $\phi+\pi/2$ should be treated as mod $2\pi$ . Now, any $2\times 2$ matrix, say $A$ , can be recovered by

[TABLE]

with $4$ variables $\left(\sigma,\theta,\phi,b\right)$ . We may further assume that $\sigma\in\mathbb{R}_{\geq 0}$ , $b\in\mathbb{R}$ , and $\phi,\theta\in[0,2\pi)$ .

Fix $\sigma_{0},b_{0},\theta_{0},\phi_{0}$ and let

[TABLE]

By letting $\sigma,b$ vary in $\mathbb{R}$ and $\phi,\theta$ vary in $[0,2\pi)$ , we recover $A_{0}$ four times:

[TABLE]

Here, for the first two cases, it is easily seen from the symmetry of the manifold $M$ (shown below) that $\left(h,g\right)$ is equivalent to $\left(-h,-g\right)$ .

The second symmetry is given by $\left(\sigma^{\prime},b^{\prime}\right)=\left(b_{0},\sigma_{0}\right)$, i.e., interchanging $\sigma$ and $b$. Note that $G\left(\theta\right)=g\left(\theta+\frac{\pi}{2}\right)$ and $H\left(\phi\right)=h\left(\phi+\frac{\pi}{2}\right)$. Thus, there also exists

[TABLE]

recovering $A_{0}$ .

Therefore, to recover $A$ , we can always assume that $\sigma\geq b$ and let $\theta,\phi\in[0,2\pi)$ . See Lemma 2 for a general claim.
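Two of the symmetries of the $2\times 2$ parametrization can be verified directly. The sketch below assumes NumPy and takes $g(\theta)=(\cos\theta,\sin\theta)^{\top}$ and $h(\phi)=(\cos\phi,\sin\phi)^{\top}$, by analogy with the displayed formula for $H$; the function names `unit` and `build_matrix` and the numerical values are hypothetical.

```python
import numpy as np

def unit(angle):
    """Unit vector (cos(angle), sin(angle))."""
    return np.array([np.cos(angle), np.sin(angle)])

def build_matrix(sigma, b, theta, phi):
    """A = sigma g h^T + b G H^T with G = g(theta + pi/2) and H = h(phi + pi/2)."""
    g, h = unit(theta), unit(phi)
    big_g, big_h = unit(theta + np.pi / 2), unit(phi + np.pi / 2)
    return sigma * np.outer(g, h) + b * np.outer(big_g, big_h)

sigma0, b0, theta0, phi0 = 2.0, 0.7, 0.3, 1.1
a0 = build_matrix(sigma0, b0, theta0, phi0)

# Replacing (g, h) by (-g, -h) leaves the matrix unchanged.
same_by_sign_flip = build_matrix(sigma0, b0, theta0 + np.pi, phi0 + np.pi)

# Interchanging sigma and b while rotating both angles by pi/2 also recovers it.
same_by_swap = build_matrix(b0, sigma0, theta0 + np.pi / 2, phi0 + np.pi / 2)
```

In this parametrization the singular values of the matrix are $|\sigma|$ and $|b|$, and its determinant is $\sigma b$, which is why the restriction $\sigma\geq b$ removes the swap ambiguity.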

Next, we consider the manifold

[TABLE]

and the function $f$ on $M$ such that

[TABLE]

Clearly, $A$ has only two pairs of eigenvectors, which can be verified by the following computations:

[TABLE]

The function $f$ has two critical points on $M$, which are at

1. the point $P=hg^{\top}\in M\Leftrightarrow\left(\alpha,\beta\right)=\left(\theta,\phi\right)$, or

2. the point $Q=HG^{\top}\in M\Leftrightarrow\left(\alpha,\beta\right)=\left(\theta+\pi/2,\phi+\pi/2\right)$.

Further computation indicates the following four facts:

(i) $f\left(P\right)=g^{\top}Ah=\sigma$ and $f\left(Q\right)=G^{\top}AH=b$;

(ii) From

[TABLE]

it follows that $\det\left(\mathrm{Hess}_{P}f\right)=\sigma^{2}-b^{2}$ and $\det\left(\mathrm{Hess}_{Q}f\right)=b^{2}-\sigma^{2}$ . Therefore, we see

(a) if $x>\sigma\geq b$, then $M_{x}$ does not contain any critical points, so $\chi\left(M_{x}\right)=0$;

(b) if $x<b\leq\sigma$, then $M_{x}$ contains both critical points, and thus

[TABLE]

(c) the only nontrivial case is $\sigma\geq x\geq b$, in which

[TABLE]

(iii) Since

[TABLE]

we have

[TABLE]

where $\wedge$ is the exterior product for vectors.

(iv) Let $M=\left(\begin{matrix}m_{11}&0\\ m_{21}&m_{22}\end{matrix}\right)$ and $\Sigma=\left(\begin{matrix}1/s_{1}&0\\ 0&1/s_{2}\end{matrix}\right)$ such that

[TABLE]

Then

[TABLE]

where

[TABLE]

Hence, we have

[TABLE]

Note that we have $\int_{-\infty}^{\infty}db\ldots=\int_{-\infty}^{x}db\ldots$ by the anti-symmetry of $\sigma$ and $b$ in this case. In other words, integrals over $\sigma>x>0,b>x,\sigma>b$ and $\sigma>x>0,b>x,\sigma<b$ are canceled. Thus, we have

[TABLE]

In summary, we have obtained Theorem 1 in the case that $A$ has a Gaussian distribution.

A numerical example is given below.

Example 1.

We evaluate (19) with parameters

[TABLE]

and derive Table 1.

Here, the probability $\Pr(\sigma>x)$ is estimated by a Monte Carlo simulation with 10,000,000 iterations, and the expectation of the Euler characteristic is evaluated with the numerical integration function NIntegrate in Mathematica. As expected, ${\rm E}[\chi(M_{x})]\approx\Pr(\sigma>x)$ when $x$ is large.

4 Computer algebra and the expectation for small $m$ and $n$

In this section, we study the non-central case $M\neq 0$ with the help of computer algebra. When $m=n=2$ , we can perform the holonomic gradient method (HGM) [14] to evaluate the integral (5).

In Section 3, we derived an integral formula (19) for the case $m=n=2$. For (19), we set

[TABLE]

Then we have

[TABLE]

where $\tilde{R}$ is a rational function in $\sigma,b,s,t$ . Since the integrand is a holonomic function in $\sigma,b,s,t$ , we can apply the creative telescoping method [37] to derive holonomic systems for the integrals. This is straightforward for the inner single integral of ${\rm E}[\chi(M_{x})]$ by the classic methods [23] (such as Zeilberger’s algorithm, Takayama’s algorithm and Chyzak’s algorithm). Below is an example.

Example 2.

Consider the inner single integral of (20):

[TABLE]

where $\tilde{R}$ is a rational function in $\sigma,b,s,t$ . Since $f_{0}$ is a holonomic function, we can compute a holonomic system satisfied by $f_{0}$ using the Mathematica package HolonomicFunctions [24]. Using the holonomic system satisfied by $f_{0}$ and Chyzak’s algorithm [8], we can then derive a holonomic system of $f_{1}$ , which is of holonomic rank $2$ . The detailed calculations can be found in the supplementary material [33].

In the aforementioned example, we use Chyzak’s algorithm to derive a holonomic system of the inner single integral of ${\rm E}[\chi(M_{x})]$ . This can be done within 5 seconds on a Linux computer with 15.10 GB RAM. However, experiments show that this approach is too slow to derive a holonomic system for the inner double integral within reasonable computational time, because of the complexity of the algorithm. To speed up the computation, we utilize the Stafford theorem [17, 30] empirically. Let us first recall the theorem. Assume that $\mathbb{K}$ is a field of characteristic $0$ and $n$ is a positive integer. Let $R_{n}=\mathbb{K}(x_{1},\ldots,x_{n})[\partial_{1},\ldots,\partial_{n}]$ and $D_{n}=\mathbb{K}[x_{1},\ldots,x_{n}][\partial_{1},\ldots,\partial_{n}]$ be the ring of differential operators with rational coefficients and the Weyl algebra in $n$ variables, respectively.

Theorem 2.

Every left ideal in $R_{n}$ or $D_{n}$ can be generated by two elements.

Assume that $I$ is a left ideal in $R_{n}$ or $D_{n}$ . We observe from experiments that, for two random operators $a,b\in I$ , it is highly likely that $I=\langle a,b\rangle$ . This suggests the following heuristic method for computing a holonomic system for the inner double integral of ${\rm E}[\chi(M_{x})]$ . As a matter of notation, we set

[TABLE]

Recall that a D-finite system [5] in $R_{n}$ is a finite set of generators of a zero-dimensional ideal in $R_{n}$ . The relation between D-finite systems and holonomic systems is explained in [16, Section 6.9]. For the application of the HGM, D-finite systems are an alternative to holonomic systems. Here, we use D-finite systems because they are more efficient for computation.

Heuristic 1.

Given a D-finite system $G$ in $R_{n}$ , compute another D-finite system $G_{1}$ in $R_{n-1}$ such that

[TABLE]

(i) Choose two finite support sets $S_{1},S_{2}\in T_{n-1}$ .

(ii) Using the polynomial ansatz method [23, Section 3.4], check whether there exist telescopers $P_{1},P_{2}\in R_{n-1}$ of $G$ with support sets $S_{1},S_{2}$ . If $P_{1}$ and $P_{2}$ exist, then go to the next step. Otherwise, go to step (i).

(iii) Compute the Gröbner basis $G_{1}$ of $\{P_{1},P_{2}\}$ with respect to a term order [9] in $T_{n-1}$ . If $G_{1}$ is D-finite, then output $G_{1}$ . Otherwise, go to step (i).
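The control flow of the three steps can be sketched as follows. This is only a skeleton under stated assumptions: the arguments `find_telescoper`, `groebner_basis` and `is_d_finite` are hypothetical hooks into a computer algebra backend (they are not functions of any particular package), and enumerating pairs from a fixed list of candidate supports is just one possible way to organize the trial and error.

```python
from itertools import combinations


def heuristic_elimination(system, candidate_supports, find_telescoper,
                          groebner_basis, is_d_finite):
    """Try pairs of support sets until two telescopers generate a D-finite system.

    system             -- the input D-finite system G
    candidate_supports -- finite support sets to try (assumed, user supplied)
    find_telescoper    -- hypothetical hook: returns a telescoper of `system`
                          with the given support, or None if none exists
    groebner_basis     -- hypothetical hook: Groebner basis of a list of operators
    is_d_finite        -- hypothetical hook: zero-dimensionality test
    """
    for s1, s2 in combinations(candidate_supports, 2):  # step (i)
        p1 = find_telescoper(system, s1)                 # step (ii)
        p2 = find_telescoper(system, s2)
        if p1 is None or p2 is None:
            continue                                     # back to step (i)
        basis = groebner_basis([p1, p2])                 # step (iii)
        if is_d_finite(basis):
            return basis
    return None                                          # no pair succeeded
```

Returning `None` when the candidates are exhausted makes explicit that the method is a heuristic: success depends on the choice of supports.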

In the aforementioned heuristic method, we need to find two finite support sets $S_{1},S_{2}\in T_{n-1}$ through trial and error so that the computation terminates in reasonable time. Next, we demonstrate its application to derive a D-finite system for the inner double integral of ${\rm E}[\chi(M_{x})]$ .

Example 3.

Consider the inner double integral of (20):

[TABLE]

where $f_{1}(\sigma,b,s)$ is defined in Example 2.

Let $G$ be a D-finite system of $f_{1}$ , derived in Example 2. Using $G$ and the polynomial ansatz method, we find two non-zero annihilators $P_{1}$ and $P_{2}$ for $f_{2}$ with support sets $S_{1}$ and $S_{2}$ , respectively, where

[TABLE]

Then we compute the Gröbner basis $G_{1}$ of $\{P_{1},P_{2}\}$ in $\mathbb{Q}(b,\sigma)[\partial_{b},\partial_{\sigma}]$ with respect to a total degree lexicographic order. We find that $G_{1}$ is a D-finite system of holonomic rank $6$ . The details of the calculation can be found in [33].

In the aforementioned example, we specify the parameters in the integrand as those in Example 1. Using Heuristic 1, we can further compute a holonomic system for the inner double integral of ${\rm E}[\chi(M_{x})]$ without specifying those parameters. This is significantly more efficient than Chyzak’s algorithm. Table 2 compares Chyzak’s algorithm (Chyzak) and Heuristic 1 (Heuristic) in terms of computation time in seconds, where "# pars" is the number of parameters left unspecified.

Next, we use Heuristic 1 to derive a D-finite system of the inner triple integral of ${\rm E}[\chi(M_{x})]$ and then numerically solve the corresponding ordinary differential equation. Finally, we use numerical integration to evaluate ${\rm E}[\chi(M_{x})]$ .

Example 4.

Consider

[TABLE]

where $f_{2}(\sigma,b)$ is specified in (21) with parameters

[TABLE]

By Example 3, we have derived a D-finite system for $f_{2}$ . Using Heuristic 1, we derive a D-finite system for the inner first integral $f_{3}$ of (22) of the following form:

[TABLE]

where $c_{i}\in\mathbb{Q}[\sigma],i\in\{0,\ldots,10\}$ . Now, we first numerically solve the ordinary differential equation $P(f_{3})=0$ to evaluate $f_{3}$ , and then evaluate ${\rm E}[\chi(M_{x})]$ by numerical integration. Table 3 gives the corresponding numerical results, where mc denotes the Monte Carlo simulation of ${\rm E}[\chi(M_{x})]$ by the following formula with 10,000,000 iterations:

[TABLE]

with

[TABLE]

where $\sigma_{i}$ and $b_{i}$ are singular values of $M_{x,i}$ , $i\in\{1,\ldots,m\}$ .

As expected, the results of the HGM are close to those of the Monte Carlo simulation. The detailed computation can be found in [33].

The evaluations of ${\rm E}[\chi(M_{x})]$ in the above example are also close to those given in Example 1. The source code for this section and a demo notebook are freely available as part of the supplementary electronic material [33].

Example 5.

We consider the evaluation of (20) with parameters

[TABLE]

It is difficult to evaluate (20) for the relatively large parameters $s_{i}$ by numerical integration (even with Monte Carlo integration). Thus, we take a different approach. Using Heuristic 1, we can compute a linear ordinary differential equation (ODE) for (20) of rank $11$ with respect to the independent variable $x$ . Then we construct series solutions for this differential equation and use them to extrapolate results obtained by simulation.

Although this extrapolation method is well known, we explain it in a form tailored to our evaluation problem. Consider an ODE with coefficients in $\mathbb{Q}(x)$ of rank $r$ . Let $c\in\mathbb{Q}$ be a point in the $x$ -space, and take $r$ increasing numbers $y_{j}\in\mathbb{Q}$ , where $j\in\{0,\ldots,r-1\}$ . We construct a series solution $f_{i}(x)$ as a series in $x-(c+y_{i})$ . We may further assume that $c+y_{i}$ is not a singular point of the ODE for each $i$ . The initial value vector may be chosen suitably so that the series is determined uniquely over $\mathbb{Q}$ .

We assume that the vector $(f_{i}(x))$ converges in a segment $I$ containing all $c+y_{i}$ ’s and that it is a basis of the solution space. Once we construct such a basis of series solutions, we can construct the solution $f(x)$ that takes values $b_{j}$ at $x=p_{j}\in\mathbb{Q}\cap I$ , $j\in\{0,\ldots,r-1\}$ . To be specific, we set

[TABLE]

with unknown coefficients $t_{i}$ ’s. Then we have

[TABLE]

The unknown coefficients $t_{i}$ ’s can be determined by solving the system of linear equations

[TABLE]

We call $f$ the extrapolation function by series solutions of ODE. We call $b_{j}$ the reference value of $f$ at reference point $p_{j}$ .
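To make the construction concrete, the following Python sketch carries it out for a toy ODE. Everything specific in it is an assumption for illustration: the rank-2 equation $f''=f$ replaces the rank-11 equation of this example, the reference values are exact values of $e^{-x}$ rather than simulation results, and the function names are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used for the series and for the linear system, in the spirit of the rational computations described in the text.

```python
from fractions import Fraction
import math


def series_solution(center, init, n_terms):
    """Taylor coefficients at `center` of a solution of the toy ODE f'' = f.

    init = (f(center), f'(center)); the ODE gives a[k+2] = a[k] / ((k+2)(k+1)).
    """
    a = [Fraction(init[0]), Fraction(init[1])]
    for k in range(n_terms - 2):
        a.append(a[k] / ((k + 2) * (k + 1)))
    return center, a


def eval_series(sol, x):
    """Evaluate a truncated series at x by Horner's rule (exact arithmetic)."""
    center, a = sol
    h = Fraction(x) - center
    total = Fraction(0)
    for coeff in reversed(a):
        total = total * h + coeff
    return total


def solve_linear(mat, rhs):
    """Gauss-Jordan elimination over the rationals (assumes a regular matrix)."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(mat, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = 1 / aug[col][col]
        aug[col] = [v * inv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def extrapolation_function(basis, ref_points, ref_values):
    """Return f = sum_i t_i f_i with f(p_j) = b_j at the reference points."""
    mat = [[eval_series(f_i, p) for f_i in basis] for p in ref_points]
    t = solve_linear(mat, ref_values)
    return lambda x: sum(t_i * eval_series(f_i, x) for t_i, f_i in zip(t, basis))


# Toy setting: c = 0, y = (0, 1/10), reference points 1/5 and 3/10.
basis = [series_solution(Fraction(0), (1, 0), 40),
         series_solution(Fraction(1, 10), (0, 1), 40)]
ref_points = [Fraction(1, 5), Fraction(3, 10)]
ref_values = [Fraction(math.exp(-float(p))) for p in ref_points]
f = extrapolation_function(basis, ref_points, ref_values)
print(float(f(Fraction(1, 2))), math.exp(-0.5))  # extrapolated vs. exact value
```

In this toy setting the two basis series are $\cosh x$ and $\sinh(x-1/10)$, and the two reference values single out the decaying solution $e^{-x}$, which is then evaluated outside the reference interval.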

Let us now come back to our example. The linear ODE for (20) has rank $r=11$ . We set $c=370/100-1/100$ and the $y_{j}$ ’s as $[0,1/100,\ldots,10/100]$ . Then we have

[TABLE]

We construct an approximate series solution $f_{i}(x)$ by taking $20000$ terms with rational arithmetic.

We set the reference points $p_{j}=380/100+j/1000$ , so that $p_{0}=3.8,\ldots,p_{10}=3.81$ , and construct a matrix related to (23). The entries of the matrix are converted to approximate rational numbers to avoid the numerical instability of solving the linear equations (23) in floating-point arithmetic.

We assume that the expectation of the Euler characteristic of $M_{x}$ is almost equal to the probability $\Pr(\ell_{1}>x)$ that the first eigenvalue is larger than $x$ . In fact, we have ${\rm E}[\chi(M_{x})]=\Pr(\ell_{1}>x)-\Pr(\ell_{2}>x)$ in this case, where $\ell_{i}$ is the $i$ -th eigenvalue, and a Monte Carlo simulation with $1,000,000$ trials gives the estimate $\Pr(\ell_{2}>3.8)=0$ . We may therefore take Monte Carlo estimates of $\Pr(\ell_{1}>x)$ as the reference values $f(p_{j})$ . We construct a solution $f(x)$ with these reference values. The evaluation of $f(x)$ is done with big floats.

Table 4 shows the values of the extrapolation function $f(x)$ obtained by the above method with big floats of 380 digits, together with those obtained by simulation with $1,000,000$ samples. One simulation takes approximately $573$ s using the R package mnormt on a machine with an Intel Xeon CPU (2.70 GHz) and 256 GB of memory.

The solid line in Fig. 1 is obtained by this extrapolation function. The line blows up near $x=3.866$ because this $x$ lies outside the domain of convergence of the approximate series. Dots are values obtained by simulation, and those on the thick solid line are the values used as reference values to obtain the extrapolation function.

We obtain the series $f_{i}$ with $20,000$ terms in $5661$ s by using Risa/Asir on a machine with an Intel Xeon CPU (2.70 GHz) and 256 GB of memory. The time to evaluate the extrapolation function at $61$ points is $14.03$ s. On the other hand, obtaining simulation values at 61 points would take about $573\times 61=34953$ s. Thus, our extrapolation method is advantageous for evaluating the function ${\rm E}[\chi(M_{x})]$ at many values of $x$ .

Appendix A Proof of Lemma 1

Recall that we are dealing with the Wishart matrix $W=AA^{\top}$ with $A$ given in (15). For an $m\times 1$ unit vector $p$ , $p^{\top}Wp$ is distributed as

[TABLE]

where $c\cdot\chi^{2}(n;\delta^{2})$ represents the distribution of $c$ times a non-central chi-square random variable with $n$ degrees of freedom and non-centrality parameter $\delta^{2}$ .

We consider the case $m=2$ . From the characterization of the largest and smallest eigenvalues, we have $\lambda_{1}(W)\geq p_{1}^{\top}Wp_{1}$ and $\lambda_{2}(W)\leq p_{2}^{\top}Wp_{2}$ , where $p_{1}$ and $p_{2}$ are arbitrary $2\times 1$ unit vectors.

(i) Suppose that $\Sigma$ has two distinct eigenvalues. Set $p_{1}$ and $p_{2}$ to be two eigenvectors of $\Sigma$ corresponding to the eigenvalues $\lambda_{1}(\Sigma)$ and $\lambda_{2}(\Sigma)$ . Then

[TABLE]

where $\delta^{2}_{i}=p_{i}^{\top}Mp_{i}/\lambda_{i}(\Sigma)$ , $i=1,2$ . Note that $\delta_{i}$ can be zero.

The tail behaviors of the central and non-central chi-square distributions were investigated by Beran [4]. From (2.9) and (3.3) of [4], combined with the asymptotics for the modified Bessel function of the first kind $I_{\nu}(x)\sim e^{x}/\sqrt{x}$ as $x\to\infty$ , we have

[TABLE]

as $x\to\infty$ . Whether or not $\delta_{i}^{2}$ is zero, the right-hand side of (24) goes to zero as $x$ goes to infinity.

(ii) Suppose that $\Sigma=\sigma I_{2}$ and $M=\Sigma\Omega$ has two distinct eigenvalues. Set $p_{1}$ and $p_{2}$ to be two eigenvectors of $M$ corresponding to the eigenvalues $\lambda_{1}(M)$ and $\lambda_{2}(M)$ , respectively. Then

[TABLE]

which goes to zero as $x$ goes to infinity. ∎

Appendix B Central case with a scalar covariance: Selberg type integral and Laguerre polynomials

We assume that $M=0$ (central) and $\Sigma$ in (16) is a scalar matrix, and we study this case by special functions. Under these assumptions, we show that the expectation of the Euler characteristic can be expressed in terms of a Selberg type integral, which is equal to a Laguerre polynomial in view of the works by Aomoto [3] and Kaneko [20].

Theorem 3.

Let

[TABLE]

Assume that the distribution of $m\times n$ random matrices $A$ is the Gaussian distribution with mean $0$ and covariance $I_{m}/s$ . In other words, we have

[TABLE]

Then we have

[TABLE]

where $c_{1},c_{2},c_{3},c_{4},c_{5}$ are given by (25), (26), (28), (30), (34), respectively.

Proof. For $g\in S^{m-1},h\in S^{n-1}$ , set

[TABLE]

Then the $m\times n$ matrix $A$ can be written as

[TABLE]

We denote by ${\widetilde{B}}$ the middle matrix in the above expression.

Set $\mathrm{etr}(X)=\exp(\mathrm{tr}(X))$ and $S=\Sigma^{-1}$ . We consider the central case $M=0$ in (14). Since $\mathrm{tr}(PQ)=\mathrm{tr}(QP)$ and ${\widetilde{H}}^{\top}{\widetilde{H}}=E$ , we have

[TABLE]

It follows from Theorem 1 with $p(A)$ being the normal distribution that

[TABLE]

where

[TABLE]

We denote by $G_{i}$ the $i$ -th column vector of $G$ and by $dg$ the column vector of the differential form $dg_{i}$ . Define

[TABLE]

It is an invariant measure for rotations on $S^{m-1}$ [18, Theorem 4.2]. We may define $H^{\top}dh$ analogously.

Moreover, since $S=sI_{m}$ , we have

[TABLE]

Since the right-hand side of the above identity does not involve $G$ or $H$ , we can separate the following integral:

[TABLE]

Therefore, we only need to evaluate the integral

[TABLE]

We denote the integral above by $q(s;\sigma)$ . In terms of $q(s;\sigma)$ , we have

[TABLE]

We make the singular value decomposition of the matrix $B$ as $B=PLQ^{\top}$ , where the matrices $P\in O(m-1)$ , $Q\in V_{m-1}(\mathbb{R}^{n-1})$ (Stiefel manifold), $L=\mathrm{diag}(\ell_{1},\ldots,\ell_{m-1})$ (see, e.g., [18] and [34, (3.1)]). It follows from [34, (3.1)] that

[TABLE]

the volume element of the Stiefel manifold, when $\ell_{1}\geq\ell_{2}\geq\cdots\geq\ell_{m-1}\geq 0$ . Here, $P_{i}$ is the $i$ -th column vector of $P$ . Since

[TABLE]

and

[TABLE]

we have

[TABLE]

where $c_{3}^{\prime}(m,n;\sigma)=c_{3}(m,n;\sigma)\exp\left(-\frac{s}{2}\sigma^{2}\right)$ ,

[TABLE]

In (28), there is a constant $(m-1)!2^{m-1}2^{m-1}$ involved in the denominator because in this case $(m-1)!2^{m-1}$ copies of the domain $\ell_{1}\geq\ell_{2}\geq\ldots\geq\ell_{m-1}\geq 0$ cover $\mathbb{R}^{m-1}$ , and the correspondence between the coordinates of $B$ and those of its singular value decomposition is $1/2^{m-1}$ because we have the choice of signs of the eigenvector $P_{i}$ . For the volumes of $O(m-1)$ and $V_{m-1}(\mathbb{R}^{n-1})$ , see, e.g., [38, Proposition 2.23, Theorem 2.24].

In (27), we make a change of variables by $\ell_{i}^{\prime}=\ell_{i}^{2}$ . Then we have $d\ell_{i}^{\prime}=2\ell_{i}d\ell_{i}$ , and

[TABLE]

Furthermore, we have

[TABLE]

Set $\ell_{i}^{\prime}=\frac{2}{s}\ell_{i}^{\prime\prime}$ and factor out $s>0$ . Then it follows from $d\ell_{i}^{\prime}=\frac{2}{s}d\ell_{i}^{\prime\prime}$ that

[TABLE]

where

[TABLE]

and

[TABLE]

The integral (29) can be expressed as a polynomial in $\sigma$ . Let us derive differential equations for this integral and express it in terms of a special polynomial. We utilize the result by Aomoto [3] and its generalization by Kaneko [20]. In [20], a system of differential equations, special values, and an expansion in terms of Jack polynomials were given for the integral

[TABLE]

when $\mu=1$ or $\mu=-\lambda/2$ . Let us make the coordinate change $\ell_{i}=y_{i}/N$ , $\lambda_{2}=N$ , $\sigma_{i}=\tau_{i}/N$ . Then we have $d\ell_{i}=dy_{i}/N$ , $(1-\ell_{i})^{\lambda_{2}}=(1-y_{i}/N)^{N}$ ,

[TABLE]

The integral (31) becomes

[TABLE]

When $N\rightarrow\infty$ , the above integral divided by $c_{N}$ converges to

[TABLE]

Let us apply this limiting procedure to derive a differential equation for the above integral. When $r=\mu=1$ , the differential equation for the integral (31) is

[TABLE]

where $a=-(m-1)$ , $b=\frac{2}{\lambda}(\lambda_{1}+\lambda_{2}+2)+(m-1)+1$ , $c=\frac{2}{\lambda}(\lambda_{1}+1)$ . This is the Gauss hypergeometric equation. Set $\lambda_{2}=N$ , $\sigma=\frac{z}{N}$ . The limit of this equation as $N\rightarrow\infty$ can be computed as follows. Set $\theta_{z}=z\partial_{z}$ . Note that (33) is invariant under scalar multiplication of $z$ . Then the limit of

[TABLE]

when $N\rightarrow\infty$ , equals

[TABLE]

In particular, when $\lambda=1$ and $\lambda_{1}=-1/2+(n-m)/2$ , we have

[TABLE]

A polynomial solution of the above equation can be written as a constant multiple of the confluent hypergeometric polynomial ${}_{1}F_{1}(-(m-1),1+n-m;2z)$ . Therefore, it follows from (29), (32) and the above argument that

[TABLE]

where

[TABLE]

by taking a limit of the Selberg integral formula [32], using a method analogous to the one used to derive (32). ∎
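The confluence step used in the proof, from the Gauss hypergeometric equation to the confluent one, can be checked numerically on the polynomial solutions. The Python sketch below assumes that $a,b,c$ play the standard roles in ${}_{2}F_{1}(a,b;c;\sigma)$ , takes $\lambda=1$ , $\lambda_{1}=-1/2+(n-m)/2$ , $\lambda_{2}=N$ and $\sigma=z/N$ as in the text, and compares the coefficients in $z$ with those of ${}_{1}F_{1}(-(m-1),1+n-m;2z)$ . The function names are hypothetical.

```python
from fractions import Fraction


def hypergeometric_polynomial(upper, lower, scale, degree):
    """Coefficients in z of pFq(upper; lower; scale * z) up to z**degree.

    The list `upper` is assumed to contain -degree, so the series terminates.
    """
    coeffs, term = [], Fraction(1)
    for k in range(degree + 1):
        coeffs.append(term * scale ** k)
        for u in upper:
            term *= (u + k)
        for low in lower:
            term /= (low + k)
        term /= (k + 1)
    return coeffs


def gauss_side(m, n, big_n):
    """2F1(a, b; c; z / N) with a, b, c as in the text for lambda = 1."""
    lam1 = Fraction(n - m - 1, 2)             # lambda_1 = -1/2 + (n - m)/2
    a = -(m - 1)
    b = 2 * (lam1 + big_n + 2) + (m - 1) + 1  # lambda_2 = N
    c = 2 * (lam1 + 1)                        # equals 1 + n - m
    return hypergeometric_polynomial([a, b], [c], Fraction(1, big_n), m - 1)


def confluent_side(m, n):
    """1F1(-(m - 1); 1 + n - m; 2 z)."""
    return hypergeometric_polynomial([-(m - 1)], [1 + n - m], Fraction(2), m - 1)


if __name__ == "__main__":
    for big_n in (10, 1000, 100000):
        print(big_n, [float(v) for v in gauss_side(3, 3, big_n)])
    print("limit", [float(v) for v in confluent_side(3, 3)])
```

For $m=n=3$ the confluent side is $1-4z+2z^{2}$ , and the coefficients of the Gauss side approach it at rate $O(1/N)$ .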

Let us make a numerical evaluation by utilizing Theorem 3 when $m=n=3$ . If $m=n=3$ , we have

[TABLE]

We have

[TABLE]

where the last integral is equal to the upper tail probability of the Gamma distribution with scale parameter $2/s$ and shape parameter $k+1/2$ . It follows from Theorem 3 that the expectation is equal to

[TABLE]

R code for evaluating ${\rm E}[\chi(M_{x})]$ in this case is as follows:

    # Integral of sigma^(2k) * exp(-s*sigma^2/2) over sigma > x
    ug2 <- function(s, k, x) {
      return(pgamma(x^2, scale=2/s, shape=k+1/2, lower.tail=FALSE) *
             gamma(k+1/2) * (2/s)^(k+1/2) / 2);
    }
    ec3 <- function(x, s) {
      cc <- 2*(2/pi)^(1/2)*s^(1/2);
      c5 <- 1;
      return(cc*c5*(ug2(s,0,x) - 2*s*ug2(s,1,x) + (1/2)*s^2*ug2(s,2,x)));
    }
    # Draw a graph
    curve(ec3(x,1), from=1, to=10)
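For readers without R, the same evaluation can be sketched in Python with the standard library only. This is a port under stated assumptions rather than the authors' code: the incomplete Gamma function at half-integer shape is computed from $\Gamma(1/2,t)=\sqrt{\pi}\,\mathrm{erfc}(\sqrt{t})$ and the recurrence $\Gamma(a+1,t)=a\Gamma(a,t)+t^{a}e^{-t}$ , and the helper names are hypothetical. The coefficients $1,-2s,s^{2}/2$ of the R code are generated here as those of ${}_{1}F_{1}(-2,1;y)$ with $y=s\sigma^{2}$ .

```python
import math


def upper_gamma_half(k, t):
    """Upper incomplete gamma function Gamma(k + 1/2, t) for integer k >= 0."""
    value = math.sqrt(math.pi) * math.erfc(math.sqrt(t))  # Gamma(1/2, t)
    a = 0.5
    for _ in range(k):
        value = a * value + t ** a * math.exp(-t)         # Gamma(a + 1, t)
        a += 1.0
    return value


def ug2(s, k, x):
    """Integral of sigma**(2k) * exp(-s * sigma**2 / 2) over sigma > x."""
    return 0.5 * (2.0 / s) ** (k + 0.5) * upper_gamma_half(k, 0.5 * s * x * x)


def hyp1f1_coefficients(degree, c):
    """Coefficients of the polynomial 1F1(-degree; c; y) in powers of y."""
    coeffs, term = [], 1.0
    for k in range(degree + 1):
        coeffs.append(term)
        term *= (-degree + k) / ((c + k) * (k + 1))
    return coeffs


def ec3(x, s):
    """Expected Euler characteristic for m = n = 3, mirroring the R code."""
    prefactor = 2.0 * math.sqrt(2.0 / math.pi) * math.sqrt(s)  # cc * c5, c5 = 1
    coeffs = hyp1f1_coefficients(2, 1.0)                       # [1, -2, 1/2]
    return prefactor * sum(c_k * s ** k * ug2(s, k, x)
                           for k, c_k in enumerate(coeffs))


if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0):
        print(x, ec3(x, 1.0))
```

As a sanity check on the port, the closed forms give `ec3(0, 1) = 1` up to rounding; this value is computed from the formula above and is not quoted from Table 5.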

When $s=1$ , some values are given in Table 5.

We present two graphs in Fig. 2 to compare our approximate formula with the exact values given by the Pfaffian of a matrix (see, e.g., [26, 6]). The matrix sizes are $m=n=10$ and $m=10,n=12$ , and $s=1$ . The horizontal axis is $x^{2}$ . Note that our approximation formula is expressed as a finite sum of $m$ terms of incomplete Gamma functions, which can be evaluated faster than the Pfaffian of an $m\times m$ matrix as $m$ becomes larger. The approximation error was evaluated in [27, 28] as

[TABLE]

which is exponentially smaller than

[TABLE]

This explains the very accurate tail behaviors in Figs. 2 and 3. Note that $\Delta(x)$ is always negative because (1) is always less than $\Pr(\lambda_{1}(W)\geq x)$ .

The two graphs in Fig. 3 compare our approximate formula with values obtained by a simulation with $10000$ trials. The matrix sizes are $m=10,n=100$ and $m=10,n=200$ , respectively, and $s=1$ . The horizontal axis is $x^{2}$ .

Acknowledgments

The authors would like to thank the Editor-in-Chief, Associate Editors and two anonymous referees who kindly reviewed the earlier versions of this paper and provided valuable comments and suggestions. Besides, we also deeply thank Christoph Koutschan, who is the author of the package HolonomicFunctions used in this study, for his help and encouragement. This research is partially supported by the Austrian Science Fund (FWF): P29467-N32, JSPS KAKENHI Grant Number 16H02792, the UTD start-up grant: P-1-03246, the Natural Science Foundation of USA grants: CCF-1815108 and CCF-1708884, and JST CREST Grant Number JP19209317.

References

Adler [1981]

R. J. Adler, The geometry of random fields, John Wiley & Sons, Ltd., Chichester, 1981. Wiley Series in Probability and Mathematical Statistics.

Adler and Taylor [2007]

R. J. Adler, J. E. Taylor, Random fields and geometry, Springer, New York, 2007.

Aomoto [1987]

K. Aomoto, Jacobi polynomials associated with Selberg integrals, SIAM Journal on Mathematical Analysis 18 (1987) 545–549.

Beran [1975]

R. Beran, Tail probabilities of noncentral quadratic forms, The Annals of Statistics 3 (1975) 969–974.

Chen et al. [2019]

S. Chen, M. Kauers, Z. Li, Y. Zhang, Apparent singularities of D-finite systems, Journal of Symbolic Computation 95 (2019) 217–237.

Chiani [2016]

M. Chiani, Distribution of the largest root of a matrix for Roy’s test in multivariate analysis of variance, Journal of Multivariate Analysis 143 (2016) 467–471.

Chikuse [1992]

Y. Chikuse, Properties of Hermite and Laguerre polynomials in matrix argument and their applications, Linear Algebra and its Applications 176 (1992) 237–260.

Chyzak [2000]

F. Chyzak, An extension of Zeilberger’s fast algorithm to general holonomic functions, Discrete Mathematics 217 (1-3) (2000) 115–134.

Cox et al. [2015]

D. A. Cox, J. Little, D. O’Shea, Ideals, Varieties, and Algorithms, Springer, New York, 4th edition, 2015.

Danufane et al. [2017]

F. H. Danufane, C. Siriteanu, K. Ohara, N. Takayama, Holonomic gradient method-based cdf evaluation for the largest eigenvalue of a complex noncentral Wishart matrix, 2017. ArXiv:1707.02564.

Davis [1979]

A. W. Davis, Invariant polynomials with two matrix arguments extending the zonal polynomials: Applications to multivariate distribution theory, Annals of the Institute of Statistical Mathematics 31 (1979) 465–485.

Davis [1980]

A. W. Davis, Invariant polynomials with two matrix arguments: extending the zonal polynomials, in: P. R. Krishnaiah (Ed.), Multivariate Analysis V, North-Holland Publishing Company, 1980, pp. 287–299.

Díaz-García and Gutiérrez-Jáimez [2011]

J. A. Díaz-García, R. Gutiérrez-Jáimez, On Wishart distribution: Some extensions, Linear Algebra and its Applications 435 (2011) 1296–1310.

Hashiguchi et al. [2013]

H. Hashiguchi, Y. Numata, N. Takayama, A. Takemura, Holonomic gradient method for the distribution function of the largest root of a Wishart matrix, Journal of Multivariate Analysis 117 (2013) 296–312.

Hayakawa [1969]

T. Hayakawa, On the distribution of the latent roots of a positive definite random symmetric matrix I, Annals of the Institute of Statistical Mathematics 21 (1969) 1–21.

Hibi et al. [2013]

T. Hibi, et al., Gröbner Bases: Statistics and software systems, Springer, New York, 2013.

Hillebrand and Schmale [2001]

A. Hillebrand, W. Schmale, Towards an effective version of a theorem of Stafford, Journal of Symbolic Computation 32 (2001) 699–716.

James [1954]

A. T. James, Normal multivariate analysis and the orthogonal group, The Annals of Mathematical Statistics 25 (1954) 40–75.

James [1955]

A. T. James, The non-central Wishart distribution, Proceedings of the Royal Society of London 229 (1955) 364–366.

Kaneko [1993]

J. Kaneko, Selberg integrals and hypergeometric functions associated with Jack polynomials, SIAM Journal on Mathematical Analysis 24 (1993) 1086–1110.

Kang and Alouini [2003]

M. Kang, M. S. Alouini, Largest eigenvalue of complex Wishart matrices and performance analysis of MIMO MRC systems, IEEE Journal on Selected Areas in Communications 21 (2003) 418–426.

Kauers et al. [2009]

M. Kauers, C. Koutschan, D. Zeilberger, Proof of Ira Gessel’s lattice path conjecture, Proceedings of the National Academy of Sciences 106 (28) (2009) 11502–11505.

Koutschan [2009]

C. Koutschan, Advanced applications of the holonomic systems approach, Ph.D. thesis, Johannes Kepler University Linz, 2009.

Koutschan [2010]

C. Koutschan, HolonomicFunctions user’s guide, Technical Report, Johannes Kepler University Linz, 2010. http://www.risc.jku.at/publications/download/risc_3934/hf.pdf.

Koutschan et al. [2011]

C. Koutschan, M. Kauers, D. Zeilberger, Proof of George Andrews’s and David Robbins’s $q$ -TSPP conjecture, Proceedings of the National Academy of Sciences 108 (6) (2011) 2196–2199.

Krishnaiah and Chang [1971]

P. R. Krishnaiah, T. C. Chang, On the exact distributions of the extreme roots of the Wishart and MANOVA matrices, Journal of Multivariate Analysis 1 (1971) 108–117.

Kuriki and Takemura [2001]

S. Kuriki, A. Takemura, Tail probabilities of the maxima of multilinear forms and their applications, The Annals of Statistics 29 (2001) 328–371.

Kuriki and Takemura [2008]

S. Kuriki, A. Takemura, Euler characteristic heuristic for approximating the distribution of the largest eigenvalue of an orthogonally invariant random matrix, Journal of Statistical Planning and Inference 138 (2008) 3357–3378.

Kuriki and Takemura [2009]

S. Kuriki, A. Takemura, Volume of tubes and the distribution of the maximum of a Gaussian random field, Selected Papers on Probability and Statistics, American Mathematical Society Translations Series 2 227 (2009) 25–48.

Leykin [2004]

A. Leykin, Algorithmic proofs of two theorems of Stafford, Journal of Symbolic Computation 38 (2004) 1535–1550.

Muirhead [2005]

R. J. Muirhead, Aspects of multivariate statistical theory, Wiley, 2005.

Selberg [1944]

A. Selberg, Remarks on a multiple integral, Norsk Matematisk Tidsskrift 26 (1944) 71–78.

Takayama et al. [2019]

N. Takayama, L. Jiu, S. Kuriki, Y. Zhang, Supplementary electronic material to the article computations of the expected Euler characteristic for the largest eigenvalue of a real non-central Wishart matrix, 2019. https://yzhang1616.github.io/ec1/ec1.html.

Takemura and Kuriki [1999]

A. Takemura, S. Kuriki, Shrinkage to smooth non-convex cone: Principal component analysis as Stein estimation, Communications in Statistics: Theory and Methods 28 (1999) 651–669.

Taylor and Worsley [2013]

J. E. Taylor, K. J. Worsley, Detecting sparse cone alternatives for Gaussian random fields, with an application to fMRI, Statistica Sinica 23 (2013) 1629–1656.

Worsley [1995]

K. J. Worsley, Boundary corrections for the expected Euler characteristic of excursion sets of random fields, with an application to astrophysics, Advances in Applied Probability 27 (1995) 943–959.

Zeilberger [1991]

D. Zeilberger, The method of creative telescoping, Journal of Symbolic Computation 11 (1991) 195–204.

Zhang [2015]

L. Zhang, Volumes of orthogonal groups and unitary groups, 2015. ArXiv:1509.00537.
