Analysis of Spectral Methods for Phase Retrieval with Random Orthogonal Matrices

Rishabh Dudeja; Milad Bakhshizadeh; Junjie Ma; Arian Maleki

arXiv:1903.02676·cs.IT·March 6, 2020

TL;DR

This paper analyzes the effectiveness of spectral initialization methods for phase retrieval when using random orthogonal matrices, providing precise asymptotic characterizations for practical measurement models.

Contribution

It extends the theoretical understanding of spectral methods in phase retrieval to isotropically random orthogonal matrices, a more realistic model for practical systems.

Findings

01. Derived a simple expression for the overlap between the spectral estimator and the true signal.

02. Provided an asymptotic analysis for large measurement and signal dimensions.

03. Enhanced understanding of spectral initialization performance in practical measurement models.

Abstract

Phase retrieval refers to algorithmic methods for recovering a signal from its phaseless measurements. Local search algorithms that work directly on the non-convex formulation of the problem have been very popular recently. Due to the nonconvexity of the problem, the success of these local search algorithms depends heavily on their starting points. The most widely used initialization scheme is the spectral method, in which the leading eigenvector of a data-dependent matrix is used as a starting point. Recently, the performance of the spectral initialization was characterized accurately for measurement matrices with independent and identically distributed entries. This paper aims to obtain the same level of knowledge for isotropically random column-orthogonal matrices, which are substantially better models for practical phase retrieval systems. Towards this goal, we consider the…

Equations

$y_{i}=|(\bm{A}\bm{x}_{\star})_{i}|,$

$\min_{\bm{x}}\sum_{i=1}^{m}\left(y_{i}^{2}-|\bm{a}_{i}^{\mathsf{H}}\bm{x}|^{2}\right)^{2},$

$\bm{M}\overset{\scriptscriptstyle\Delta}{=}\bm{A}^{\mathsf{H}}\bm{T}\bm{A}$

$\hat{\bm{x}}$

$\mu_{\bm{A}}$

$\bm{S}_{m,n}$

$\mathbb{C}^{+}\overset{\scriptscriptstyle\Delta}{=}\{z\in\mathbb{C}\mathrel{\mathop{\mathchar 58\relax}}\mathrm{Im}(z)>0\}$ and $\mathbb{C}^{-}\overset{\scriptscriptstyle\Delta}{=}\{z\in\mathbb{C}\mathrel{\mathop{\mathchar 58\relax}}\mathrm{Im}(z)<0\}.$

$A_{\epsilon}\overset{\scriptscriptstyle\Delta}{=}\{x\mathrel{\mathop{\mathchar 58\relax}}\text{dist}(x,A)<\epsilon\}.$

$\bm{y}=|\bm{A}\bm{x}_{\star}|$

$\bm{A}=\bm{H}_{m}\bm{S}_{m,n},\;\bm{H}_{m}\sim\text{Unif}(\mathbb{U}(m)),$

$\hat{\bm{x}}=\arg\max_{\|\bm{u}\|=1}\bm{u}^{\mathsf{H}}\bm{M}\bm{u},$

$\sup_{y\geq 0}\mathcal{T}(y)=1,\;\inf_{y\geq 0}\mathcal{T}(y)=0$

$\widetilde{\mathcal{T}}(y)\overset{\scriptscriptstyle\Delta}{=}\frac{\mathcal{T}(y)-a}{b-a}.$

$\widetilde{\bm{M}}\overset{\scriptscriptstyle\Delta}{=}\bm{A}^{\mathsf{H}}\widetilde{\bm{T}}\bm{A}=\frac{1}{b-a}\bm{M}-\frac{a}{b-a}\bm{I}_{n}.$

$\Lambda(\tau)$

$\psi_{2}(\tau)$

$\lambda_{1}(\bm{M})\overset{\text{\tiny{a.s.}}}{\rightarrow}\begin{cases}\Lambda(\tau_{r}),&\psi_{1}(\tau_{r})\leq\frac{\delta}{\delta-1},\\ \Lambda(\theta_{\star}),&\psi_{1}(\tau_{r})>\frac{\delta}{\delta-1}.\end{cases}$

$\frac{|\bm{x}_{\star}^{\mathsf{H}}\hat{\bm{x}}|^{2}}{n}\overset{\text{\tiny{a.s.}}}{\rightarrow}\begin{cases}0,&\psi_{1}(\tau_{r})<\frac{\delta}{\delta-1},\\ \frac{\left(\frac{\delta}{\delta-1}\right)^{2}-\frac{\delta}{\delta-1}\cdot\psi_{2}(\theta_{\star})}{\psi_{3}(\theta_{\star})^{2}-\frac{\delta}{\delta-1}\cdot\psi_{2}(\theta_{\star})},&\psi_{1}(\tau_{r})>\frac{\delta}{\delta-1}.\end{cases}$

$y_{i}$

$Z\sim\mathcal{CN}\left(0,1\right),\;Y\sim f\bigg(\cdot\bigg|\frac{Z}{\sqrt{\delta}}\bigg),\;T=\mathcal{T}(Y).$

$\limsup_{\begin{subarray}{c}m,n\to\infty\\ m=n\delta\end{subarray}}\frac{|\bm{x}_{\star}^{\mathsf{H}}\hat{\bm{x}}|^{2}}{n}$

$\rho_{\mathsf{opt}}^{2}(\delta)$

$\psi_{1}^{\mathsf{opt}}(\tau)$

$Z$

$\mathcal{T}_{\mathsf{opt}}(y)$

$\mathcal{T}_{\mathsf{opt},\epsilon}(y)$

$\lim_{\epsilon\downarrow 0}\lim_{\begin{subarray}{c}m,n\to\infty\\ m=n\delta\end{subarray}}\frac{|\bm{x}_{\star}^{\mathsf{H}}\hat{\bm{x}}_{\epsilon}|^{2}}{n}$

$\rho^{2}(\bm{A},\bm{x}_{\star})$

$\bm{H}_{m}\overset{\text{\tiny{d}}}{=}\bm{H}_{m}\cdot\begin{bmatrix}\bm{\Gamma}&\bm{0}\\ \bm{0}&\bm{I}_{m-n}\end{bmatrix},$

$\bm{A}=[\bm{A}_{1},\bm{A}_{-1}],$

Department of Statistics, Columbia University

Abstract

Phase retrieval refers to algorithmic methods for recovering a signal from its phaseless measurements. There has been recent interest in understanding the performance of local search algorithms that work directly on the non-convex formulation of the problem. Due to the non-convexity of the problem, the success of these local search algorithms depends heavily on their starting points. The most widely used initialization scheme is the spectral method, in which the leading eigenvector of a data-dependent matrix is used as a starting point. Recently, the performance of the spectral initialization was characterized accurately for measurement matrices with independent and identically distributed entries. This paper aims to obtain the same level of knowledge for isotropically random column-orthogonal matrices, which are substantially better models for practical phase retrieval systems. Towards this goal, we consider the asymptotic setting in which the number of measurements $m$ , and the dimension of the signal, $n$ , diverge to infinity with $m/n=\delta\in(1,\infty)$ , and obtain a simple expression for the overlap between the spectral estimator and the true signal vector.

Index Terms:

Phase Retrieval, Spectral Initialization, Random Orthogonal Matrices, Coded Diffraction Pattern, Phase Transition, Random Matrix Theory.

I Introduction

Phase retrieval refers to the problem of recovering a signal $\bm{x}_{\star}\in\mathbb{C}^{n}$ from a set of phaseless linear observations $\bm{y}\in\mathbb{R}^{m}$. In the absence of measurement noise, the acquisition process is modeled as

$$y_{i}=|(\bm{A}\bm{x}_{\star})_{i}|,$$

where $\bm{A}\in\mathbb{C}^{m\times n}$ is a measurement matrix and $(\cdot)_{i}$ denotes the $i^{\rm th}$ element of a vector. The phase retrieval problem is intended to model practical imaging systems where it is difficult to measure the phase of the measurements [1]. A number of recent recovery algorithms pose phase retrieval as a non-convex optimization problem and employ a local search algorithm to find the minimizer [2, 3, 4, 5]. For instance, the well-known Wirtinger Flow algorithm [2] solves the optimization problem:

$$\min_{\bm{x}}\sum_{i=1}^{m}\left(y_{i}^{2}-|\bm{a}_{i}^{\mathsf{H}}\bm{x}|^{2}\right)^{2},\tag{1}$$

using gradient descent.
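As a rough illustration of this local search, the following sketch runs plain gradient descent on the quartic loss $\sum_{i}(y_{i}^{2}-|\bm{a}_{i}^{\mathsf{H}}\bm{x}|^{2})^{2}$. It is not the Wirtinger Flow algorithm of [2] as published: the names `wf_loss`, `wf_grad` and `gradient_descent`, the step size, the Gaussian test matrix and the starting point close to the truth are all assumptions made for this example, and $\bm{a}_{i}^{\mathsf{H}}$ is taken to be the $i$-th row of $\bm{A}$.

```python
import numpy as np

def wf_loss(A, y, x):
    """Quartic loss sum_i (y_i^2 - |a_i^H x|^2)^2, with a_i^H the i-th row of A."""
    r = np.abs(A @ x) ** 2 - y ** 2
    return float(np.sum(r ** 2))

def wf_grad(A, y, x):
    """Wirtinger gradient of wf_loss (derivative with respect to conj(x))."""
    z = A @ x
    r = np.abs(z) ** 2 - y ** 2
    return 2.0 * (A.conj().T @ (r * z))

def gradient_descent(A, y, x0, step=0.02, iters=200):
    """Fixed-step descent; step and iters are illustrative, not tuned values."""
    x = x0.copy()
    for _ in range(iters):
        x = x - step * wf_grad(A, y, x)
    return x

rng = np.random.default_rng(0)
m, n = 60, 5
# Hypothetical test problem: complex Gaussian A with roughly orthonormal columns.
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2 * m)
x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = np.abs(A @ x_true)

# Start near the truth, standing in for a good initialization.
x0 = x_true + 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
x_hat = gradient_descent(A, y, x0)
```

The loss and its gradient both vanish at the true signal, but the loss is non-convex, which is why the starting point matters.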

Since the optimization problem (1) is non-convex, the initialization can have an impact on the success of local search algorithms. The most widely used initialization scheme, known as spectral initialization [6, 3, 4, 7, 8, 9], uses the leading eigenvector of the following data-dependent matrix:

$$\bm{M}\overset{\scriptscriptstyle\Delta}{=}\bm{A}^{\mathsf{H}}\bm{T}\bm{A}\tag{2}$$

as the starting point for local search algorithms. In the above equation, $\bm{T}={\rm Diag}(\mathcal{T}(y_{1}),\mathcal{T}(y_{2}),\ldots,\mathcal{T}(y_{m}))$ , and $\mathcal{T}(\cdot)$ denotes a suitable trimming function. Let $\hat{\bm{x}}$ denote the leading eigenvector of $\bm{M}$ normalized to have unit Euclidean ( $\ell_{2}$ ) norm. That is,

$$\hat{\bm{x}}=\arg\max_{\|\bm{u}\|=1}\bm{u}^{\mathsf{H}}\bm{M}\bm{u}.\tag{3}$$

The earliest analyses [6, 2] of the spectral estimator showed that if the number of measurements $m$ is large enough (for a fixed $n$), then the leading eigenvector of $\bm{M}$ is a consistent estimator of the true signal vector. However, these analyses had two drawbacks: (i) they only provide information about the order of the number of measurements required for a successful initialization, not a sharp requirement on the sampling ratio $m/n$; (ii) they fail to capture the difference in performance between trimming functions. Recently, Lu and Li [7] analyzed the spectral estimator for measurement matrices composed of independent and identically distributed (i.i.d.) standard normal entries in the high-dimensional asymptotic regime. More specifically, Lu and Li considered the asymptotic setting in which $m,n\rightarrow\infty$, $m/n=\delta$, and obtained a sharp characterization of the overlap between the leading eigenvector and the true signal. In follow-up work, Mondelli and Montanari [8] and Luo, Alghamdi and Lu [9] leveraged this characterization to design optimal trimming functions. For the optimal trimming function, the overlap $|\hat{\bm{x}}^{\mathsf{H}}\bm{x}_{\star}|^{2}/\|\bm{x}_{\star}\|^{2}$ converges to zero when $\delta<1$, and converges to a strictly positive value otherwise.

A major assumption in the analysis of [7, 8, 9] is that the measurement matrix $\bm{A}$ contains i.i.d. Gaussian entries. However, it is well known that many important applications of phase retrieval are concerned with Fourier-type matrices [10]. This leads to the following natural questions: (i) Do the conclusions of [7, 8, 9] hold for other matrices that are employed in practice? (ii) Is the trimming function that was derived in [7, 8, 9] as optimal for Gaussian measurement matrices also optimal for other matrices employed in practice? In response to these questions, Ma et al. [11] considered a popular class of matrices that can be used in phase retrieval systems, known as coded diffraction patterns (CDP) [12]. Through an extensive numerical study, the authors showed that the performance of the spectral initialization for such matrices closely approximates the performance of the spectral estimator for partial orthogonal matrices. The authors then designed an Expectation Propagation (EP) [13, 14] algorithm for the eigenvalue problem given in (3). EP algorithms had previously been proposed for partial orthogonal matrices in [15, 16] and their State Evolution (SE) had been analyzed in [17, 18]. Ma et al. used the SE of the derived EP algorithm for the eigenvalue problem to derive a (conjectured) formula for the asymptotic overlap $|\hat{\bm{x}}^{\mathsf{H}}\bm{x}_{\star}|^{2}/\|\bm{x}_{\star}\|^{2}$ between the true signal vector and the spectral initialization. However, while it is believed that the EP algorithm indeed solves the eigenvalue problem (this has also been observed in simulations), this has not been shown rigorously. As a result of such studies, the authors conjectured that for partial orthogonal matrices, if the trimming function is chosen optimally, then $|\hat{\bm{x}}^{\mathsf{H}}\bm{x}_{\star}|^{2}/\|\bm{x}_{\star}\|^{2}>0$ for $\delta>2$ and $|\hat{\bm{x}}^{\mathsf{H}}\bm{x}_{\star}|^{2}/\|\bm{x}_{\star}\|^{2}=0$ for $\delta<2$, in the asymptotic setting where $n,m=\delta n\to\infty$. As mentioned previously, the simulations in [11] suggest that these conjectures are also likely to hold for CDP matrices.

In this paper, we prove the conjectures presented in [11] for partial orthogonal matrices using tools from free probability theory [19]. We believe this is the first theoretical justification that the expectation propagation framework can correctly predict the statistical properties of the solutions to non-convex optimization problems. The main technical step in our proof is the identification of the location of the largest eigenvalue using a subordination function [19]. Interestingly, this subordination function appears naturally in the expectation propagation (EP) algorithm of [11].

II Main result

II-A Notation

II-A1 For Linear Algebraic Aspects

For a matrix $\bm{A}$ , $\bm{A}^{\mathsf{H}}$ refers to the conjugate transpose of $\bm{A}$ . For a matrix $\bm{A}\in\mathbb{C}^{n\times n}$ , with real eigenvalues, we use ${\lambda_{1}(\bm{A})\geq\lambda_{2}(\bm{A})\dots\geq\lambda_{n}(\bm{A})}$ to denote the eigenvalues arranged in descending order. We use $\sigma(\bm{A})$ to refer to the spectrum of $\bm{A}$ which is simply the set of eigenvalues $\{\lambda_{1}(\bm{A}),\lambda_{2}(\bm{A})\dots\lambda_{n}(\bm{A})\}$ . Finally we define the spectral measure of $\bm{A}$ , denoted by $\mu_{\bm{A}}$ as,

[TABLE]

For $m,n\in\mathbb{N}$, we denote the $m\times m$ identity matrix by $\bm{I}_{m}$ and an $m\times n$ matrix of all zero entries by $\bm{0}_{m,n}$. For $m\geq n$, we also define the special matrix $\bm{S}_{m,n}$ as:

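$$\bm{S}_{m,n}\overset{\scriptscriptstyle\Delta}{=}\begin{bmatrix}\bm{I}_{n}\\ \bm{0}_{m-n,n}\end{bmatrix}.\tag{4}$$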

II-A2 For Complex Analytic Aspects

For a complex number $z\in\mathbb{C}$ , $\mathrm{Re}(z),\mathrm{Im}(z),\mathsf{Arg}(z),|z|,\overline{z}$ refer to the real part, imaginary part, argument, modulus and conjugate of $z$ . We denote the complex upper half plane and lower half planes by

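$$\mathbb{C}^{+}\overset{\scriptscriptstyle\Delta}{=}\{z\in\mathbb{C}\mathrel{\mathop{\mathchar 58\relax}}\mathrm{Im}(z)>0\}\quad\text{and}\quad\mathbb{C}^{-}\overset{\scriptscriptstyle\Delta}{=}\{z\in\mathbb{C}\mathrel{\mathop{\mathchar 58\relax}}\mathrm{Im}(z)<0\}.$$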

II-A3 For Probabilistic Aspects

We use $\mathcal{CN}\left(0,1\right)$ to denote the standard, circularly symmetric, complex Gaussian distribution. $\text{Unif}(\mathbb{U}(m))$ denotes the Haar measure on the unitary group $\mathbb{U}(m)$. We denote almost sure convergence, convergence in probability and convergence in distribution by $\overset{\text{\tiny{a.s.}}}{\rightarrow},\overset{\text{\tiny{P}}}{\rightarrow}$ and $\overset{\text{\tiny{d}}}{\rightarrow}$ respectively. Two random variables $X,Y$ are equal in distribution, denoted by $X\overset{\text{\tiny{d}}}{=}Y$, if they have the same distribution. Throughout this paper, the random variables $Z,T$ refer to the pair of random variables with the joint distribution given by $Z\sim\mathcal{CN}\left(0,1\right),T=\mathcal{T}(|Z|/\sqrt{\delta})$. For a Borel probability measure $\mu$, we use $\text{Supp}({\mu})$ to denote the support of $\mu$.

II-A4 Miscellaneous:

Let $A$ be a subset of $\mathbb{R}$ or $\mathbb{C}$ . $\overline{A}$ denotes the closure of $A$ . The distance from a point $x\in\mathbb{R}$ to $A$ is defined by $\text{dist}(x,A)=\inf_{y\in A}|x-y|$ . We define the $\epsilon$ neighborhood of $A$ , denoted by $A_{\epsilon}$ as

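$$A_{\epsilon}\overset{\scriptscriptstyle\Delta}{=}\{x\mathrel{\mathop{\mathchar 58\relax}}\text{dist}(x,A)<\epsilon\}.$$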

The symbol $\emptyset$ is used to denote the empty set.

II-B Measurement Model and Spectral Estimator

In the phase retrieval problem we are given $m$ observations $\bm{y}\in\mathbb{R}^{m}$ generated as:

$$\bm{y}=|\bm{A}\bm{x}_{\star}|,$$

where $\bm{x}_{\star}\in\mathbb{C}^{n}$ is the unknown signal vector and $\bm{A}\in\mathbb{C}^{m\times n}$ is the sensing matrix. We assume that $\|\bm{x}_{\star}\|=\sqrt{n}$ and that the matrix $\bm{A}$ is generated according to the following process: Sample $\bm{H}_{m}\in\mathbb{U}(m)$ from the Haar measure on the unitary group $\mathbb{U}(m)$ and set $\bm{A}$ to be the matrix formed by picking the first $n$ columns of $\bm{H}_{m}$ . More formally,

$$\bm{A}=\bm{H}_{m}\bm{S}_{m,n},\quad\bm{H}_{m}\sim\text{Unif}(\mathbb{U}(m)),$$

where $\bm{S}_{m,n}$ is defined in (4). An important parameter for our analysis will be the sampling ratio, denoted by $\delta\overset{\scriptscriptstyle\Delta}{=}m/n$. Let $\mathcal{T}\mathrel{\mathop{\mathchar 58\relax}}\mathbb{R}_{\geq 0}\rightarrow\mathbb{R}$ be a trimming function. We study spectral estimators $\hat{\bm{x}}$ constructed as the leading eigenvector of the matrix $\bm{M}$, defined below:

$$\hat{\bm{x}}=\arg\max_{\|\bm{u}\|=1}\bm{u}^{\mathsf{H}}\bm{M}\bm{u},$$

where $\bm{M}=\bm{A}^{\mathsf{H}}\bm{T}\bm{A}$ and $\bm{T}=\text{Diag}(\mathcal{T}(y_{1}),\mathcal{T}(y_{2})\dots\mathcal{T}(y_{m}))$ .
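The measurement model and estimator of this subsection can be simulated in a few lines. The sketch below is illustrative only: the helper names `haar_partial_isometry` and `spectral_estimator`, the problem sizes, and the particular bounded trimming function are assumptions of this example. It draws the first $n$ columns of a Haar unitary through a phase-corrected QR factorization of a complex Gaussian matrix (a standard construction), forms $\bm{M}=\bm{A}^{\mathsf{H}}\bm{T}\bm{A}$, and evaluates the overlap $|\bm{x}_{\star}^{\mathsf{H}}\hat{\bm{x}}|^{2}/n$.

```python
import numpy as np

def haar_partial_isometry(m, n, rng):
    """First n columns of a Haar unitary, via QR of an m x n complex Gaussian matrix."""
    G = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
    Q, R = np.linalg.qr(G)            # reduced QR: Q is m x n, R is n x n
    d = np.diagonal(R)
    return Q * (d / np.abs(d))        # column-phase correction makes the law Haar

def spectral_estimator(A, y, trim):
    """Unit-norm leading eigenvector of M = A^H Diag(trim(y)) A."""
    M = A.conj().T @ (trim(y)[:, None] * A)
    _, eigvecs = np.linalg.eigh(M)    # eigenvalues are returned in ascending order
    return eigvecs[:, -1]

rng = np.random.default_rng(1)
n, delta = 100, 6.0                   # illustrative sizes, with m = delta * n
m = int(delta * n)
A = haar_partial_isometry(m, n, rng)

x_star = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x_star *= np.sqrt(n) / np.linalg.norm(x_star)   # enforce ||x_star|| = sqrt(n)
y = np.abs(A @ x_star)

# An example bounded trimming function; any function satisfying Assumption 1 could be used.
trim = lambda v: delta * v**2 / (delta * v**2 + 0.1)
x_hat = spectral_estimator(A, y, trim)
overlap = np.abs(np.vdot(x_star, x_hat)) ** 2 / n
```

Because $\hat{\bm{x}}$ is only defined up to a global phase, the overlap is the natural quantity to report rather than an error norm.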

II-C Assumptions & Asymptotic Framework

We analyze the performance of the spectral estimator in an asymptotic setup where $n,m\rightarrow\infty,m/n=\delta>1$. In particular, we consider a sequence of independent phase retrieval problems realized on the same probability space with increasing $n,m$. We impose some regularity assumptions on the trimming function $\mathcal{T}$, which are stated below.

Assumption 1.

The trimming function $\mathcal{T}$ satisfies the following conditions:

1. $\mathcal{T}$ is Lipschitz continuous.

2. $\sup_{y\geq 0}\mathcal{T}(y)=1,\;\inf_{y\geq 0}\mathcal{T}(y)=0$.

3. The random variable $T$, defined by $Z\sim\mathcal{CN}\left(0,1\right)$ and $T=\mathcal{T}(|Z|/\sqrt{\delta})$, has a density with respect to the Lebesgue measure on $\mathbb{R}$.

In the following remarks, we discuss why each of these assumptions is required and whether it can be relaxed.

Remark 1.

We need the trimming function $\mathcal{T}$ to be Lipschitz continuous so that the trimmed measurements $\mathcal{T}(y_{i})$ can be approximated in distribution by $\mathcal{T}(|Z|/\sqrt{\delta}),Z\sim\mathcal{CN}\left(0,1\right)$. We expect this approximation to hold under a weaker smoothness hypothesis on $\mathcal{T}$ than Lipschitz continuity.
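The approximation behind this remark can be checked numerically. Since $\bm{A}\bm{x}_{\star}/\sqrt{n}$ is a Haar unitary applied to a unit vector, it is uniformly distributed on the unit sphere of $\mathbb{C}^{m}$, so $\delta y_{i}^{2}$ should behave like $|Z|^{2}$, which has mean $1$ and second moment $2$. The sketch below compares these two moments; the sizes, the seed and the variable names are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, delta = 2000, 4.0              # illustrative sizes
m = int(delta * n)

# A uniformly distributed point on the unit sphere of C^m.
g = rng.standard_normal(m) + 1j * rng.standard_normal(m)
u = g / np.linalg.norm(g)

# Same law as y = |A x_star| when ||x_star|| = sqrt(n).
y = np.sqrt(n) * np.abs(u)

w = delta * y**2                  # should resemble |Z|^2 with Z ~ CN(0, 1)
mean_w = w.mean()                 # equals 1 up to rounding, since sum_i y_i^2 = n
second_moment_w = (w**2).mean()   # close to E|Z|^4 = 2 for large m
```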

Remark 2.

The assumptions:

$$\sup_{y\geq 0}\mathcal{T}(y)=1,\quad\inf_{y\geq 0}\mathcal{T}(y)=0$$

are no stronger than the assumption that $\mathcal{T}$ is a bounded trimming function. In fact, given any arbitrary bounded trimming function with $\inf_{y\geq 0}\mathcal{T}(y)=a$ and $\sup_{y\geq 0}\mathcal{T}(y)=b$, the spectral estimator constructed using $\mathcal{T}$ has the same performance as the spectral estimator constructed using

$$\widetilde{\mathcal{T}}(y)\overset{\scriptscriptstyle\Delta}{=}\frac{\mathcal{T}(y)-a}{b-a}.$$

This is because,

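$$\widetilde{\bm{M}}\overset{\scriptscriptstyle\Delta}{=}\bm{A}^{\mathsf{H}}\widetilde{\bm{T}}\bm{A}=\frac{1}{b-a}\bm{M}-\frac{a}{b-a}\bm{I}_{n}.$$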

In particular $\bm{M}$ and $\widetilde{\bm{M}}$ have the same leading eigenvector. We require the assumption that the trimming function is bounded since a number of results in free probability theory that we rely on assume this.
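For completeness, the intermediate step behind this identity can be written out. It uses only the fact that the columns of $\bm{A}$ are orthonormal, so that $\bm{A}^{\mathsf{H}}\bm{A}=\bm{I}_{n}$; here $\widetilde{\bm{T}}$ denotes the diagonal matrix of rescaled trimmed measurements $(\mathcal{T}(y_{i})-a)/(b-a)$, a notation assumed for this sketch.

```latex
\widetilde{\bm{M}}
  = \bm{A}^{\mathsf{H}} \widetilde{\bm{T}} \bm{A}
  = \frac{1}{b-a}\, \bm{A}^{\mathsf{H}} \left( \bm{T} - a \bm{I}_{m} \right) \bm{A}
  = \frac{1}{b-a}\, \bm{A}^{\mathsf{H}} \bm{T} \bm{A}
    - \frac{a}{b-a}\, \bm{A}^{\mathsf{H}} \bm{A}
  = \frac{1}{b-a}\, \bm{M} - \frac{a}{b-a}\, \bm{I}_{n}.
```

Since $b>a$, this is a positive rescaling of $\bm{M}$ followed by a shift by a multiple of the identity, which preserves the eigenvectors and the ordering of the eigenvalues.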

Remark 3.

We need condition 3 of Assumption 1 to ensure that the limiting spectral measure of the matrix $\bm{M}$ has no discrete component. We expect that this assumption can be completely removed by a careful analysis since the location of point masses in the limiting spectral measure of $\bm{M}$ is well understood.

II-D Main Result

In order to state our main result about the performance of the spectral estimator, we need to introduce the following four functions:

[TABLE]

In the above display, the random variables $Z,T$ have the joint distribution given by $Z\sim\mathcal{CN}\left(0,1\right),\;T=\mathcal{T}(|Z|/\sqrt{\delta})$ . The functions $\Lambda,\psi_{1}$ are defined on $[1,\infty)$ and the functions $\psi_{2},\psi_{3}$ are defined on $(1,\infty)$ .

Remark 4.

Under Assumption 1, the support of the random variable $T$ is the interval $[0,1]$. Hence the definition of these functions at $\tau=1$ needs some clarification. First, note that the random variable $(1-T)^{-1}\geq 0$. Hence, the expectation $\mathbb{E}[(1-T)^{-1}]$ is well-defined, but may be $\infty$. If it is finite, each of the above functions is well-defined at $\tau=1$. If $\mathbb{E}[(1-T)^{-1}]=\infty$, we define $\Lambda(1)=1,\psi_{1}(1)=1$. This corresponds to interpreting $1/\infty=0$ and $\infty/\infty=1$ in the definition of these functions.
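As a worked instance of the finite case, take a trimming function of the form $\mathcal{T}(y)=\delta y^{2}/(\delta y^{2}+c)$ with a constant $c>0$ (the constant $c$ is introduced here for illustration). With $y=|Z|/\sqrt{\delta}$ and $\mathbb{E}|Z|^{2}=1$ for $Z\sim\mathcal{CN}(0,1)$:

```latex
T = \frac{|Z|^{2}}{|Z|^{2} + c},
\qquad
\frac{1}{1-T} = \frac{|Z|^{2} + c}{c} = 1 + \frac{|Z|^{2}}{c},
\qquad
\mathbb{E}\left[ \frac{1}{1-T} \right] = 1 + \frac{\mathbb{E}|Z|^{2}}{c} = 1 + \frac{1}{c}.
```

For $c=0.1$ this gives $\mathbb{E}[(1-T)^{-1}]=11$, so for this choice no special convention is needed at $\tau=1$.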

Theorem 1.

Define $\tau_{r}\triangleq\arg\min_{\tau\in[1,\infty)}\Lambda(\tau)$ . Also, let $\theta_{\star}$ denote the unique value of $\theta>\tau_{r}$ that satisfies $\psi_{1}(\theta)=\frac{\delta}{\delta-1}$ . Then, under Assumption 1, we have

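$$\lambda_{1}(\bm{M})\overset{\text{\tiny{a.s.}}}{\rightarrow}\begin{cases}\Lambda(\tau_{r}),&\psi_{1}(\tau_{r})\leq\frac{\delta}{\delta-1},\\ \Lambda(\theta_{\star}),&\psi_{1}(\tau_{r})>\frac{\delta}{\delta-1}.\end{cases}$$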

Furthermore,

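$$\frac{|\bm{x}_{\star}^{\mathsf{H}}\hat{\bm{x}}|^{2}}{n}\overset{\text{\tiny{a.s.}}}{\rightarrow}\begin{cases}0,&\psi_{1}(\tau_{r})<\frac{\delta}{\delta-1},\\ \frac{\left(\frac{\delta}{\delta-1}\right)^{2}-\frac{\delta}{\delta-1}\cdot\psi_{2}(\theta_{\star})}{\psi_{3}(\theta_{\star})^{2}-\frac{\delta}{\delta-1}\cdot\psi_{2}(\theta_{\star})},&\psi_{1}(\tau_{r})>\frac{\delta}{\delta-1}.\end{cases}$$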

Remark 5.

The proof of Theorem 1 shows that if $\psi_{1}(\tau_{r})>\delta/(\delta-1)$ , there exists exactly one solution to the equation $\psi_{1}(\theta)=\delta/(\delta-1),\;\theta\in(\tau_{r},\infty)$ . Hence, $\theta_{\star}$ is well-defined.

The proof of this result is postponed until Section IV. Before we proceed to the proof of this theorem, let us clarify some of its interesting features. First, note that, as with Gaussian sensing matrices, the leading eigenvector exhibits a phase transition for partial orthogonal matrices. For certain values of $\delta>1$, the inequality $\psi_{1}(\tau_{r})<\frac{\delta}{\delta-1}$ holds, and hence the leading eigenvector does not carry information about $\bm{x}_{\star}$. For other values of $\delta$, the inequality $\psi_{1}(\tau_{r})>\frac{\delta}{\delta-1}$ holds, and hence the direction of the leading eigenvector starts to offer information about the direction of $\bm{x}_{\star}$. For typical choices of the trimming function $\mathcal{T}$, there exists a critical value of $\delta$, denoted by $\delta_{\mathcal{T}}$, such that when $\delta<\delta_{\mathcal{T}}$ the spectral estimator is asymptotically orthogonal to the signal vector. When $\delta>\delta_{\mathcal{T}}$, the spectral estimator makes a non-trivial angle with the signal vector. This phase transition phenomenon is illustrated in Figure 1 for three different choices of $\mathcal{T}$.

Remark 6 (Choice of Trimming function).

The trimming functions in Figure 1 are supported on $[0,1]$.

1. $\mathcal{T}(y)=\delta y^{2}/(\delta y^{2}+\sqrt{\delta}-1)$ is a translated and re-scaled version of the trimming function proposed by [8].

2. $\mathcal{T}(y)=\delta y^{2}/(\delta y^{2}+0.1)$ is a regularized version of the trimming function proposed by [9].
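The two trimming functions listed in this remark are simple to write down and to sanity-check. In the sketch below the function names `trim_mm` and `trim_reg`, the parameter name `eps` and the evaluation grid are assumptions of the example; the formulas are the ones stated above, and the first requires $\delta>1$ so that $\sqrt{\delta}-1>0$.

```python
import math

def trim_mm(y, delta):
    """delta*y^2 / (delta*y^2 + sqrt(delta) - 1), the rescaled function attributed to [8]."""
    s = delta * y * y
    return s / (s + math.sqrt(delta) - 1.0)

def trim_reg(y, delta, eps=0.1):
    """delta*y^2 / (delta*y^2 + eps), the regularized function attributed to [9]."""
    s = delta * y * y
    return s / (s + eps)

delta = 3.0                                   # illustrative sampling ratio
grid = [0.01 * k for k in range(0, 1001)]     # y in [0, 10]
vals_mm = [trim_mm(y, delta) for y in grid]
vals_reg = [trim_reg(y, delta) for y in grid]
```

Both functions vanish at $y=0$, increase with $y$, and approach $1$ as $y\to\infty$, which matches the normalization $\inf\mathcal{T}=0$, $\sup\mathcal{T}=1$ in Assumption 1.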

Remark 7 (Extensions to generalized linear measurements).

While we focus on the phase retrieval problem in this paper, our results extend straightforwardly to generalized linear estimation, where the measurements $y_{i}$ are generated as follows:

[TABLE]

where $f(\cdot|\cdot)$ denotes a conditional distribution modelling a possibly randomized output channel. Under suitable regularity assumptions on $f$ , Theorem 1 holds with the change that the joint distribution of the random variables $T,Z$ is now given by:

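$$Z\sim\mathcal{CN}\left(0,1\right),\;Y\sim f\bigg(\cdot\bigg|\frac{Z}{\sqrt{\delta}}\bigg),\;T=\mathcal{T}(Y).$$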

III Optimal Trimming Functions

Theorem 1 can be used to design the trimming function $\mathcal{T}$ optimally in order to obtain the best possible value of $|\bm{x}_{\star}^{\mathsf{H}}\hat{\bm{x}}|^{2}$. Most of the work towards this goal was already done in [11], where the result in Theorem 1 was stated as a conjecture and was used to design the optimal trimming function. In particular, [11] showed the following impossibility result.

Proposition 1 ([11]).

Let $\mathcal{T}$ be any trimming function for which Theorem 1 holds. Then,

[TABLE]

where,

[TABLE]

where $\theta_{\star}^{\mathsf{opt}}$ is the solution to the equation (in $\tau$ ):

[TABLE]

which exists uniquely when $\delta>2$ and, the random variable $T_{\mathsf{opt}}$ is distributed as:

[TABLE]

The work [11] also provided a candidate for the optimal trimming function:

[TABLE]

They showed that if the characterization given in Theorem 1 holds for $\mathcal{T}_{\mathsf{opt}}$ , then it achieves the asymptotic squared correlation $\rho^{2}_{\mathsf{opt}}(\delta)$ . Unfortunately, since $\mathcal{T}_{\mathsf{opt}}$ is unbounded, Theorem 1 does not apply to it. Extending Theorem 1 to unbounded trimming functions would likely require extending previously known results in free probability to unbounded measures, and we don’t pursue this approach in our work. Instead, we suitably modify the arguments of [11] to show that the family of bounded trimming functions:

[TABLE]

attains an asymptotic squared correlation that can be made arbitrarily close to $\rho^{2}_{\mathsf{opt}}(\delta)$ as $\epsilon\downarrow 0$.

Proposition 2.

Let $\hat{\bm{x}}_{\epsilon}$ denote the spectral estimator for $\bm{x}_{\star}$ obtained by using $\mathcal{T}_{\mathsf{opt},\epsilon}$ as the trimming function. We have, almost surely,

[TABLE]

We provide a proof of this result in Appendix A.

The regularized trimming functions $\mathcal{T}_{\mathsf{opt},\epsilon}$ are useful not only from a theoretical point of view, to prove an achievability result, but also from a computational standpoint: in simulations we have observed that the power iterations are slow to converge when $\mathcal{T}_{\mathsf{opt}}$ is used as the trimming function due to the presence of large negative eigenvalues. This problem is mitigated by using $\mathcal{T}_{\mathsf{opt},\epsilon}$ with a small value of $\epsilon$ (such as $0.1$ or $0.01$), with a negligible degradation in performance.
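A toy example may help to see why large negative eigenvalues interfere with power iterations. Power iteration tracks the eigenvalue of largest magnitude, so on a matrix whose most negative eigenvalue dominates in magnitude it does not find the leading (largest) eigenvector. The sketch below shows this on a $3\times 3$ diagonal matrix chosen for illustration. The diagonal shift used in the second call is a generic textbook device included only to make the mechanism visible; it is an assumption of this example and not the remedy described above, which is to use the bounded functions $\mathcal{T}_{\mathsf{opt},\epsilon}$.

```python
import numpy as np

def power_iteration(M, iters=500, shift=0.0, seed=0):
    """Power iteration on M + shift * I, returning a unit-norm vector."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = M @ v + shift * v
        v /= np.linalg.norm(v)
    return v

# Toy matrix: leading eigenvalue 1, but a negative eigenvalue of larger magnitude.
M = np.diag([1.0, 0.5, -3.0]).astype(complex)

v_plain = power_iteration(M)              # locks onto the eigenvalue -3
v_shift = power_iteration(M, shift=3.0)   # M + 3I = diag(4, 3.5, 0): finds the top one
```

In this example the plain iteration converges to the third coordinate direction, while the shifted one converges to the first, which is the leading eigenvector of `M`.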

IV Proof of Theorem 1

IV-A Roadmap

Our proof follows the general strategy taken by [7]. In this subsection, we state several key lemmas and show how they fit together in the proof of Theorem 1. First we note that, without loss of generality, for the purpose of analysis of the spectral estimator, we can assume $\bm{x}_{\star}=\sqrt{n}\bm{e}_{1}$. The following lemma supports this claim.

Lemma 1.

The distribution of the cosine similarity, $\rho^{2}=|\bm{x}_{\star}^{\mathsf{H}}\hat{\bm{x}}|^{2}/n$ is independent of $\bm{x}_{\star}$ .

Proof.

Let $\bm{x}_{\star}$ be an arbitrary signal vector with $\|\bm{x}_{\star}\|=\sqrt{n}$ . Let $\bm{y},\bm{T},\hat{\bm{x}}$ denote the measurements, trimmed measurements and spectral estimate generated when the sensing matrix was $\bm{A}$ and the signal vector was $\bm{x}_{\star}$ . Note that the cosine similarity $\rho^{2}$ is a (deterministic) function of $\bm{A},\bm{x}_{\star}$ and hence we use the notation $\rho^{2}(\bm{A},\bm{x}_{\star})$ to denote the cosine similarity when the sensing matrix is $\bm{A}$ and the signal vector is $\bm{x}_{\star}$ .

Let $\bm{\Gamma}\in\mathbb{U}(n)$ be such that $\sqrt{n}\bm{\Gamma}\bm{e}_{1}=\bm{x}_{\star}$ . We have $\bm{x}_{\star}^{\mathsf{H}}\hat{\bm{x}}=\sqrt{n}\bm{e}_{1}^{\mathsf{H}}\bm{\Gamma}^{\mathsf{H}}\hat{\bm{x}}$ . Next we note that $\hat{\bm{x}}^{\prime}\overset{\scriptscriptstyle\Delta}{=}\bm{\Gamma}^{\mathsf{H}}\hat{\bm{x}}$ is the leading eigenvector of the matrix $\bm{M}^{\prime}\overset{\scriptscriptstyle\Delta}{=}\bm{\Gamma}^{\mathsf{H}}\bm{M}\bm{\Gamma}=(\bm{A}\bm{\Gamma})^{\mathsf{H}}\bm{T}\bm{A}\bm{\Gamma}={\bm{A}^{\prime}}^{\mathsf{H}}\bm{T}\bm{A}^{\prime}$ , where we defined $\bm{A}^{\prime}\overset{\scriptscriptstyle\Delta}{=}\bm{A}\bm{\Gamma}$ . Noting that $\bm{T}$ is a diagonal matrix consisting of the trimmed observations $\bm{y}=|\bm{A}\bm{x}_{\star}|=\sqrt{n}|\bm{A}^{\prime}\bm{e}_{1}|$ , we conclude that $\hat{\bm{x}}^{\prime}$ is the spectral estimate generated when the sensing matrix was $\bm{A}^{\prime}$ and the signal vector was $\sqrt{n}\bm{e}_{1}$ . Hence, we have concluded that

$$\rho^{2}(\bm{A},\bm{x}_{\star})=\rho^{2}(\bm{A}^{\prime},\sqrt{n}\bm{e}_{1}).$$

Next we note that $\bm{A}$ was generated from the sub-sampled Haar model, that is, $\bm{A}=\bm{H}_{m}\bm{S}_{m,n}$ where ${\bm{H}_{m}\sim\textup{Unif}(\mathbb{U}(m))}$. Since the Haar measure on $\mathbb{U}(m)$ is invariant to right multiplication by unitary matrices, we have

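$$\bm{H}_{m}\overset{\text{\tiny{d}}}{=}\bm{H}_{m}\cdot\begin{bmatrix}\bm{\Gamma}&\bm{0}\\ \bm{0}&\bm{I}_{m-n}\end{bmatrix},$$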

where the notation $\overset{\text{\tiny{d}}}{=}$ means that the two random matrices have the same distribution. Consequently $\bm{A}=\bm{H}_{m}\bm{S}_{m,n}\overset{\text{\tiny{d}}}{=}\bm{A}\bm{\Gamma}=\bm{A}^{\prime}$. Therefore, $\rho^{2}(\bm{A},\bm{x}_{\star})=\rho^{2}(\bm{A}^{\prime},\sqrt{n}\bm{e}_{1})\overset{\text{\tiny{d}}}{=}\rho^{2}(\bm{A},\sqrt{n}\bm{e}_{1})$, and the distribution of $\rho^{2}$ is independent of $\bm{x}_{\star}$. ∎
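The deterministic part of this argument, namely $\rho^{2}(\bm{A},\bm{x}_{\star})=\rho^{2}(\bm{A}\bm{\Gamma},\sqrt{n}\bm{e}_{1})$, can be verified numerically on a small instance. In the sketch below the helper names, the sizes, the trimming function and the QR-based construction of a unitary $\bm{\Gamma}$ with $\sqrt{n}\bm{\Gamma}\bm{e}_{1}=\bm{x}_{\star}$ are all assumptions of the example.

```python
import numpy as np

def haar_partial_isometry(m, n, rng):
    """First n columns of a Haar unitary (phase-corrected QR of a Gaussian matrix)."""
    G = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
    Q, R = np.linalg.qr(G)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))

def overlap(A, x, trim):
    """rho^2(A, x) = |x^H x_hat|^2 / n for the spectral estimate x_hat."""
    y = np.abs(A @ x)
    M = A.conj().T @ (trim(y)[:, None] * A)
    x_hat = np.linalg.eigh(M)[1][:, -1]       # leading eigenvector
    return np.abs(np.vdot(x, x_hat)) ** 2 / x.shape[0]

rng = np.random.default_rng(3)
m, n = 40, 10
delta = m / n
trim = lambda v: delta * v**2 / (delta * v**2 + 0.1)   # example trimming function

A = haar_partial_isometry(m, n, rng)
x_star = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x_star *= np.sqrt(n) / np.linalg.norm(x_star)

# Unitary Gamma whose first column is x_star / sqrt(n).
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B[:, 0] = x_star
Q, _ = np.linalg.qr(B)
u = x_star / np.sqrt(n)
Gamma = Q * np.vdot(Q[:, 0], u)   # a global phase aligns the first column with u

e1 = np.zeros(n)
e1[0] = 1.0
rho_general = overlap(A, x_star, trim)
rho_canonical = overlap(A @ Gamma, np.sqrt(n) * e1, trim)
```

The two overlaps agree up to floating-point error; the remaining step of the proof, that $\bm{A}\bm{\Gamma}$ has the same distribution as $\bm{A}$, is a statement about laws and is not something a single draw can confirm.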

In light of the above lemma, in the rest of the paper we will assume $\bm{x}_{\star}=\sqrt{n}\bm{e}_{1}$. Next, we partition $\bm{A}$ by separating the first column

$$\bm{A}=[\bm{A}_{1},\bm{A}_{-1}],$$

where $\bm{A}_{-1}$ denotes all the remaining columns of $\bm{A}$ (except $\bm{A}_{1}$ ). Hence we can partition $\bm{A}^{\mathsf{H}}\bm{T}\bm{A}$ in the following way:

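$$\bm{A}^{\mathsf{H}}\bm{T}\bm{A}=\begin{bmatrix}\bm{A}_{1}^{\mathsf{H}}\bm{T}\bm{A}_{1}&\bm{A}_{1}^{\mathsf{H}}\bm{T}\bm{A}_{-1}\\ \bm{A}_{-1}^{\mathsf{H}}\bm{T}\bm{A}_{1}&\bm{A}_{-1}^{\mathsf{H}}\bm{T}\bm{A}_{-1}\end{bmatrix}.$$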

Our strategy will be to reduce questions about the spectrum of the matrix $\bm{M}$ to questions about the spectrum of a matrix of the form $\bm{X}=\bm{E}\bm{U}\bm{F}\bm{U}^{\mathsf{H}}$ , where $\bm{U}$ is a uniformly random unitary matrix, $\bm{E}$ is a random matrix independent of $\bm{U}$ and $\bm{F}$ is deterministic. This matrix model has been well studied in Free Probability [19]. The starting point of our reduction is Proposition 2 from [7], stated below.

Proposition 3 ([7]).

Let $\bm{D}$ be an arbitrary deterministic symmetric matrix partitioned as:

[TABLE]

Then, we have

[TABLE]

where $L(\vartheta)=\lambda_{1}(\bm{P}+\vartheta\bm{q}\bm{q}^{\mathsf{H}})$ , and $\vartheta_{\star}>0$ is the unique solution to the fixed point equation $L(\vartheta)=\frac{1}{\vartheta}+a$ . Furthermore, let $\bm{v}_{1}$ be the eigenvector corresponding to the largest eigenvalue of $\bm{D}$ . Then,

[TABLE]

where $\partial_{-}$ and $\partial_{+}$ denote the left and right derivatives respectively. In particular, if $L(\vartheta)$ is differentiable at $\vartheta_{\star}$ , then

[TABLE]
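The eigenvalue part of this proposition can be checked numerically. Since the displayed partition is not reproduced here, the sketch below assumes the block form $\bm{D}=\begin{bmatrix}a&\bm{q}^{\mathsf{H}}\\ \bm{q}&\bm{P}\end{bmatrix}$, which is the form consistent with $L(\vartheta)=\lambda_{1}(\bm{P}+\vartheta\bm{q}\bm{q}^{\mathsf{H}})$ and the fixed point equation $L(\vartheta)=1/\vartheta+a$. It uses a random Hermitian test matrix, solves the fixed point equation by bisection, and compares $L(\vartheta_{\star})$ with $\lambda_{1}(\bm{D})$. The helper names and the bisection bracket are choices made for the example.

```python
import numpy as np

def top_eig(H):
    """Largest eigenvalue of a Hermitian matrix."""
    return float(np.linalg.eigvalsh(H)[-1])

rng = np.random.default_rng(4)
k = 6
G = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
D = (G + G.conj().T) / 2          # Hermitian test matrix

# Assumed partition: D = [[a, q^H], [q, P]].
a = D[0, 0].real
q = D[1:, 0]
P = D[1:, 1:]

def L(theta):
    return top_eig(P + theta * np.outer(q, q.conj()))

def gap(theta):
    """L(theta) - 1/theta - a, which is increasing in theta > 0."""
    return L(theta) - 1.0 / theta - a

# Bracket the root, then bisect.
lo, hi = 1e-8, 1.0
while gap(hi) < 0:
    hi *= 2.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if gap(mid) < 0:
        lo = mid
    else:
        hi = mid
theta_star = 0.5 * (lo + hi)
```

Because `gap` is increasing in $\vartheta$, the root is unique and bisection is sufficient; `L(theta_star)` then matches `top_eig(D)` up to numerical precision.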

A straightforward corollary of the above proposition to our problem is given below. Define the function

[TABLE]

Corollary 1.

Let $\vartheta_{m}>0$ be the unique solution of $L_{m}(\vartheta)=1/\vartheta+\bm{A}_{1}^{\mathsf{H}}\bm{T}\bm{A}_{1}$ . Then, $\lambda_{1}(\bm{A}^{\mathsf{H}}\bm{T}\bm{A})=L_{m}(\vartheta_{m})$ and

[TABLE]

In particular, if $L_{m}(\vartheta)$ is differentiable at $\vartheta_{m}$ , then

[TABLE]

Hence, we shift our focus to characterizing the function $L_{m}(\vartheta)$ . Recall the decomposition of the matrix $\bm{M}$ given in (6). Recall that since $\bm{x}_{\star}=\sqrt{n}\bm{e}_{1}$ , the diagonal matrix $\bm{T}$ is a deterministic function of $\bm{A}_{1}$ . If the sensing matrix $\bm{A}$ consisted of independent Gaussian entries, then $\bm{T},\bm{A}_{1}$ would have been independent of $\bm{A}_{-1}$ . This is no longer true when $\bm{A}$ is a partial unitary matrix. In order to take care of this, the following lemma leverages a conditioning trick to get rid of the dependence. The following lemma also establishes the link between the function $L_{m}(\vartheta)$ and the study of the spectrum of a matrix of the form $\bm{X}=\bm{E}\bm{U}\bm{F}\bm{U}^{\mathsf{H}}$ , where $\bm{U}$ is a uniformly random unitary matrix, $\bm{E}$ is a random matrix independent of $\bm{U}$ and $\bm{F}$ is deterministic.

Lemma 2.

We have

[TABLE]

where

[TABLE]

$\bm{B}\in\mathbb{C}^{m\times(m-1)}$ is an arbitrary basis matrix for $\bm{A}_{1}^{\perp}$, which denotes the subspace orthogonal to $\bm{A}_{1}$, and $\bm{H}_{m-1}\sim\textup{Unif}(\mathbb{U}(m-1))$ is independent of $\bm{A}_{1}$.

Proof.

We condition on $\bm{A}_{1}$ . Conditioned on $\bm{A}_{1}$ , we can realize $\bm{A}_{-1}$ as:

[TABLE]

In the above equation, $\bm{B}\in\mathbb{C}^{m\times(m-1)}$ is a matrix whose columns form an orthonormal basis of the orthogonal complement of $\bm{A}_{1}$ and $\bm{H}_{m-1}$ is a Haar unitary of size $m-1$ independent of $\bm{A}_{1}$. Hence, we obtain

[TABLE]

In the step marked (a), we used the fact that for any two matrices $\bm{\Lambda},\bm{\Gamma}$ (of appropriate dimensions), $\bm{\Lambda}\bm{\Gamma}$ and $\bm{\Gamma}\bm{\Lambda}$ have the same non-zero eigenvalues. In particular, we used this fact with:

[TABLE]

∎
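The linear-algebra fact used in step (a) is easy to confirm numerically. The sketch below takes two generic rectangular real matrices (the sizes and names are chosen only for this example) and checks that every eigenvalue of the smaller product also appears in the spectrum of the larger product, whose remaining eigenvalues are zero.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((3, 5))   # plays the role of Lambda
Y = rng.standard_normal((5, 3))   # plays the role of Gamma

eig_small = np.linalg.eigvals(X @ Y)   # 3 eigenvalues
eig_big = np.linalg.eigvals(Y @ X)     # 5 eigenvalues, at least 2 of them zero

# Distance from each eigenvalue of X Y to the nearest eigenvalue of Y X.
nearest = [float(np.min(np.abs(eig_big - lam))) for lam in eig_small]

# Number of (numerically) zero eigenvalues of the larger product.
num_zero = int(np.sum(np.abs(eig_big) < 1e-9))
```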

Define the matrix,

[TABLE]

The following lemma characterizes the asymptotic limit of the function $L_{m}(\vartheta)$ . Define $\Lambda_{+}(\tau)$ as

[TABLE]

where $T=\mathcal{T}(|Z|/\sqrt{\delta})$ and $Z\sim\mathcal{CN}(0,1)$ , and

[TABLE]

Lemma 3.

Let $\vartheta_{c}\overset{\scriptscriptstyle\Delta}{=}\left(1-\left(\mathbb{E}\left[\frac{|Z|^{2}}{1-T}\right]\right)^{-1}-\mathbb{E}[|Z|^{2}T]\right)^{-1}$ . Define the function $\theta(\vartheta)$ as:

• When $\vartheta>\vartheta_{c}$: Let $\theta(\vartheta)$ be the unique value of $\lambda$ that satisfies the equation:

[TABLE]

in the interval:

[TABLE]

• When $\vartheta\leq\vartheta_{c}$: $\theta(\vartheta)\overset{\scriptscriptstyle\Delta}{=}1$.

Then, we have $L_{m}(\vartheta)\overset{\text{\tiny{a.s.}}}{\rightarrow}\Lambda_{+}(\theta(\vartheta))$ , where $L_{m}(\vartheta)$ is defined in (7).

The proof of Lemma 3 can be found in Section IV-E.

From Corollary 1, we know that $\lambda_{1}(\bm{M})$ solves the fixed point equation (in $\vartheta$ ): $L_{m}(\vartheta)=1/\vartheta+\bm{A}_{1}^{\mathsf{H}}\bm{T}\bm{A}_{1}.$ Simple concentration arguments (see Lemma 7, Section IV-C) show that asymptotically:

[TABLE]

Combining this with Lemma 3 suggests that asymptotically $\lambda_{1}(\bm{M})$ behaves like the solution to the following fixed point equation (in $\vartheta$ ):

[TABLE]

The following lemma analyzes the behavior of this asymptotic fixed point equation. The proof of this lemma can be found in Section IV-E.

Lemma 4.

The following hold for the equation:

[TABLE]

1. This equation has a unique solution.

2. Let $\vartheta_{\star}$ denote the solution of the above equation. Then:

Case 1

If $\psi_{1}(\tau_{r})\leq\frac{\delta}{\delta-1},$ we have

[TABLE]

Furthermore if $\psi_{1}(\tau_{r})<\delta/(\delta-1)$ , then,

[TABLE]

Case 2

If $\psi_{1}(\tau_{r})>\frac{\delta}{\delta-1},$ we have

[TABLE]

and,

[TABLE]

where $\theta_{\star}>1$ is the unique $\theta\geq\tau_{r}$ that satisfies $\psi_{1}(\theta)=\frac{\delta}{\delta-1}.$

We are now in the position to prove our main result (restated below for convenience). Recall the definitions of the functions $\Lambda(\tau),\psi_{1}(\tau),\psi_{2}(\tau),\psi_{3}(\tau)$ from Section II.

Theorem 1. Define $\tau_{r}\triangleq\arg\min_{\tau\in[1,\infty)}\Lambda(\tau)$ . Also, let $\theta_{\star}$ denote the unique value of $\theta>\tau_{r}$ that satisfies $\psi_{1}(\theta)=\frac{\delta}{\delta-1}$ . Then, we have

[TABLE]

Furthermore,

[TABLE]

Proof.

We start with the analysis of the largest eigenvalue. We recall the claim of Corollary 1, which tells us that $\lambda_{1}(\bm{M})$ is given by $L_{m}(\vartheta_{m})$ where $\vartheta_{m}$ denotes the solution of $L_{m}(\vartheta)=1/\vartheta+a_{m}$ and $a_{m}=\bm{A}_{1}^{\mathsf{H}}\bm{T}\bm{A}_{1}$ .

We also know that there exists a probability 1 event $\mathcal{E}$ , on which, $L_{m}(\vartheta)\overset{\text{\tiny{a.s.}}}{\rightarrow}\Lambda_{+}(\theta(\vartheta))$ (Lemma 3) and $a_{m}\overset{\text{\tiny{a.s.}}}{\rightarrow}\mathbb{E}[|Z|^{2}T]$ (see Lemma 7 in Section IV-C).

We claim that on $\mathcal{E}$ , $\vartheta_{m}\rightarrow\vartheta_{\star}$ , where $\vartheta_{\star}$ is the solution of the limiting fixed point equation $\Lambda_{+}(\theta(\vartheta))=1/\vartheta+\mathbb{E}[|Z|^{2}T]$ (which was analyzed in Lemma 4). To see this let $\overline{\vartheta}=\lim\sup\vartheta_{m}$ . Consider a subsequence $\vartheta_{m_{k}}\rightarrow\overline{\vartheta}$ . Then applying Lemma 3 (in Appendix E) of [7], we obtain,

[TABLE]

That is, $\overline{\vartheta}$ is also a solution to the limiting fixed point equation $\Lambda_{+}(\theta(\vartheta))=1/\vartheta+\mathbb{E}[|Z|^{2}T]$ . But since this equation has a unique solution (Lemma 4), we have $\lim\sup\vartheta_{m}=\overline{\vartheta}=\vartheta_{\star}$ . Likewise, an analogous argument shows $\lim\inf\vartheta_{m}=\vartheta_{\star}$ .

Now for any realization in the event $\mathcal{E}$ , we have,

[TABLE]

In the above display, in the step marked (a), we again appealed to Lemma 3 (Appendix E) of [7] and the fact that $\vartheta_{m}\rightarrow\vartheta_{\star}$ . Finally, appealing to the alternative characterization of $\Lambda_{+}(\theta(\vartheta_{\star}))$ given in Lemma 4 gives us the claim of the theorem.

We now discuss our result about the cosine similarity. We recall that from Corollary 1, we have

[TABLE]

Appealing to Lemma 4 in Appendix E of [7], we have,

[TABLE]

The derivative of $\Lambda_{+}(\theta(\vartheta))$ at $\vartheta=\vartheta_{\star}$ was calculated in Lemma 4. Plugging this into the above expression gives the statement of the theorem. ∎

The remainder of this section is dedicated to the proof of Lemmas 3 and 4, and is organized as follows:

• Recall that (cf. (7))

[TABLE]

where

[TABLE]

Note that $\bm{E}(\vartheta)$ is independent of $\bm{H}_{m-1}$ . The spectrum of such a matrix product has been studied in free probability theory, and we collect some results regarding this in Section IV-B.

• In order to apply the free probability results, we need to understand the spectrum of $\bm{E}(\vartheta)$ . This is done in Section IV-C.

• It turns out that the limiting spectral measure of $\bm{E}(\vartheta)\bm{H}_{m-1}\bm{R}\bm{H}_{m-1}^{\mathsf{H}}$ is given by the free convolution (defined in Section IV-B) of the measures $\gamma$ and $\mathcal{L}_{T}$ , where $\gamma\overset{\scriptscriptstyle\Delta}{=}\frac{1}{\delta}\delta_{1}+\left(1-\frac{1}{\delta}\right)\delta_{0}$ and $\mathcal{L}_{T}$ is the law of the random variable $T=\mathcal{T}(|Z|/\sqrt{\delta})$ . Section IV-D is devoted to understanding the support of the free convolution.

• Finally, Section IV-E proves Lemmas 3 and 4.

IV-B Free Probability Background

Our analysis of the spectral estimators relies on a well-studied model in the theory of free probability: we will reduce the problem to that of understanding the spectrum of matrices of the form $\bm{X}=\bm{E}\bm{U}\bm{F}\bm{U}^{\mathsf{H}}$ , where $\bm{E}$ and $\bm{F}$ are deterministic matrices and $\bm{U}$ is a Haar-distributed unitary matrix. Then, the limiting spectral distribution of $\bm{X}$ is the free multiplicative convolution of the limiting spectral distributions of $\bm{E}$ and $\bm{F}$ . This section collects the results and definitions regarding these aspects, and is organized as follows. Section IV-B1 collects various facts from free harmonic analysis. Section IV-B2 describes the two fundamental results about the model $\bm{X}=\bm{E}\bm{U}\bm{F}\bm{U}^{\mathsf{H}}$ that will be used throughout our paper. Section IV-B3 reviews some results about the support of the singular part of the free convolution of two measures. Throughout this section, we assume that $\gamma$ and $\nu$ are two arbitrary compactly supported probability measures on $[0,\infty)$ and that neither of the two measures is completely concentrated at a single point.
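To make the model $\bm{X}=\bm{E}\bm{U}\bm{F}\bm{U}^{\mathsf{H}}$ concrete, the sketch below samples a Haar unitary matrix with the standard QR-based construction (QR of a complex Gaussian matrix followed by a phase correction) and checks the simplest consequence of asymptotic freeness: the normalized trace of $\bm{X}$ is close to the product of the normalized traces of $\bm{E}$ and $\bm{F}$ . The dimension, the spectra `e_diag` and `f_diag`, and the helper name `haar_unitary` are illustrative assumptions, not objects from the paper.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n Haar-distributed unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # rescale each column by a unit phase

rng = np.random.default_rng(1)
n = 400
e_diag = rng.uniform(0.0, 1.0, size=n)           # spectrum of E (illustrative)
f_diag = (np.arange(n) < n // 2).astype(float)   # F: a projection of rank n / 2

u = haar_unitary(n, rng)
# X = E U F U^H, built from the diagonal representations of E and F.
x = (e_diag[:, None] * u) @ (f_diag[:, None] * u.conj().T)

first_moment = np.trace(x).real / n               # (1/n) tr(X)
product_of_means = e_diag.mean() * f_diag.mean()  # (1/n) tr(E) * (1/n) tr(F)
unitarity_error = np.linalg.norm(u.conj().T @ u - np.eye(n))
```

Only the first moment is compared here; higher moments of the free multiplicative convolution involve the subordination machinery described in the rest of this section.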

IV-B1 Facts from Free Harmonic Analysis

In this section, we collect some facts from the field of free harmonic analysis. All these results can be found in Chapter 3 of [20] or the papers [19] and [21].

Definition 1.

The Cauchy transform $G_{\gamma}$ of $\gamma$ at $z$ is defined as follows:

[TABLE]

Definition 2.

The moment generating function $\psi_{\gamma}$ of $\gamma$ at $z$ is defined as follows:

[TABLE]

The Cauchy transform and the moment generating function are related via the relation

[TABLE]

Definition 3.

The $\eta$ -transform of a measure is defined as,

[TABLE]

The Cauchy transform (and hence the moment generating function) uniquely characterizes a measure. The measure can be recovered via the following inversion formula. The particular version we state is taken from Section 3.1 of [19].

Theorem 2.

For $a<b\in[0,\infty)$ , we have

[TABLE]

Furthermore, if $\gamma$ satisfies $\gamma=\gamma_{ac}+\gamma_{s}$ , where $\gamma_{ac}$ and $\gamma_{s}$ denote the absolutely continuous and the singular part of the measure with respect to the Lebesgue measure, then the density of the absolutely continuous part is given by

[TABLE]
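As a numerical illustration of Stieltjes inversion, the sketch below recovers the density of the uniform law on $[0,1]$ from its Cauchy transform. It assumes the common convention $G(z)=\int\frac{\mathrm{d}\gamma(t)}{z-t}$ with evaluation at $x+i\epsilon$ , for which the density is approximately $-\frac{1}{\pi}\operatorname{Im}G(x+i\epsilon)$ ; if the theorem is stated with $x-i\epsilon$ , the sign of the imaginary part flips. The function names are hypothetical.

```python
import numpy as np

def cauchy_transform_uniform(z):
    """G(z) = int_0^1 dt / (z - t) for the uniform law on [0, 1], valid for Im z > 0."""
    return np.log(z) - np.log(z - 1.0)

def density_estimate(x, eps=1e-6):
    """Approximate the density at x by -(1/pi) * Im G(x + i * eps)."""
    return -np.imag(cauchy_transform_uniform(x + 1j * eps)) / np.pi

inside = density_estimate(0.5)    # x in the support: close to the true density 1
outside = density_estimate(2.0)   # x outside the support: close to 0
```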

Next we recall the definition of the free convolution based on the subordination functions from [22]. The statement we provide below appears in a more general form as Proposition 2.6 in [23].

Definition 4.

Let $(\gamma,\nu)$ be a pair of probability measures. There exist analytic functions $w_{\gamma},w_{\nu}$ defined on $\mathbb{C}\backslash[0,\infty)$ such that, for all $z\in\mathbb{C}^{+}$ we have

1. $w_{\gamma}(z),w_{\nu}(z)\in\mathbb{C}^{+}$ ; $w_{\gamma}(\overline{z})=\overline{w_{\gamma}(z)},w_{\nu}(\overline{z})=\overline{w_{\nu}(z)}$ and $\mathsf{Arg}(w_{\gamma}(z))\geq\mathsf{Arg}(z),\mathsf{Arg}(w_{\nu}(z))\geq\mathsf{Arg}(z)$ .

2. For any $z\in\mathbb{C}^{+}$ , $w_{\nu}(z)$ is the unique solution in $\mathbb{C}^{+}$ of the fixed point equation $Q_{z}(w)=w$ , where $Q_{z}$ is given by

[TABLE]

An analogous characterization holds for $w_{\gamma}$ with the roles of $\gamma$ and $\nu$ interchanged.

The free convolution of the measures $\gamma$ and $\nu$ denoted by $\gamma\boxtimes\nu$ is the measure whose moment generating function satisfies

[TABLE]

Remark 8.

We emphasize that each of the subordination functions $w_{\gamma},w_{\nu}$ depends on both the measures $\gamma,\nu$ . This is clear since the function $Q_{z}(w)$ defining $w_{\nu}$ depends on both $\nu,\gamma$ .

Note that the above definition defines $w_{\nu}$ and $w_{\gamma}$ on $\mathbb{C}\backslash[0,\infty)$ . However, these functions can be continuously extended to $\overline{\mathbb{C}^{+}}\cup\{\infty\}$ (Lemma 3.2 in [19]). These extensions to the real line will be important for the results of Section IV-B2.

Lemma 5.

The restrictions of subordination functions $w_{\gamma},w_{\nu}$ on $\mathbb{C}^{+}$ have extensions to $\overline{\mathbb{C}^{+}}\cup\{\infty\}$ with the following properties:

1. $w_{\gamma},w_{\nu}\colon\overline{\mathbb{C}^{+}}\cup\{\infty\}\rightarrow\overline{\mathbb{C}^{+}}\cup\{\infty\}$ are continuous.

2. If $1/x\in[0,\infty)\backslash\text{Supp}(\gamma\boxtimes\nu)$ , then the functions $w_{\gamma},w_{\nu}$ continue analytically to a neighborhood of $x$ and

[TABLE]

IV-B2 Spectrum of $\mathbf{X}=\mathbf{E}\mathbf{U}\mathbf{F}\mathbf{U}^{\mathsf{H}}$

As we discussed before, we will convert the problem of analyzing the spectrum of $\bm{M}$ to problems involving the spectrum of matrices of the form $\mathbf{X}_{N}=\mathbf{E}_{N}\mathbf{U}_{N}\mathbf{F}_{N}\mathbf{U}_{N}^{\mathsf{H}}$ , where $\bm{U}_{N}$ is a sequence of Haar-distributed $N\times N$ random unitary matrices, and $\bm{E}_{N}$ and $\bm{F}_{N}$ are sequences of deterministic positive semidefinite matrices. In this section, we review two important results from the field of free probability regarding such matrices.

Suppose that $\bm{E}_{N}$ and $\bm{F}_{N}$ satisfy the following hypotheses:

(i) $\mu_{\bm{E}_{N}}\overset{\text{\tiny{d}}}{\rightarrow}\mu_{e}$ and $\mu_{\bm{F}_{N}}\overset{\text{\tiny{d}}}{\rightarrow}\mu_{f}$ , where $\mu_{e},\mu_{f}$ are compactly supported measures on $[0,\infty)$ .

(ii) $\bm{E}_{N}$ has a single outlying eigenvalue $\theta$ not contained in $\text{Supp}(\mu_{e})$ . $\bm{F}_{N}$ has no eigenvalues outside $\text{Supp}(\mu_{f})$ .

(iii) The set of eigenvalues of $\bm{E}_{N}$ not equal to $\theta$ converges uniformly to $\text{Supp}(\mu_{e})$ in the sense,

[TABLE]

Our next theorem characterizes the bulk distribution of $\bm{X}_{N}$ . The first part of this theorem is due to [24] and the second and third parts are due to [19] (Theorem 2.3).

Theorem 3.

Let $w_{e}$ and $w_{f}$ denote the subordination functions for the free multiplicative convolution of $\mu_{e}$ and $\mu_{f}$ . Define

[TABLE]

Then we have, almost surely for large enough $N$ ,

1. $\mu_{\bm{X}_{N}}\overset{\text{\tiny{d}}}{\rightarrow}\mu_{e}\boxtimes\mu_{f}$ .

2. Given $\epsilon>0$ , we have $\sigma(\bm{X}_{N})\subset K_{\epsilon}$ , where $K_{\epsilon}$ is the $\epsilon$ -neighborhood of $K$ and $\sigma(\bm{X}_{N})$ denotes the set of eigenvalues of $\bm{X}_{N}$ .

3. For any $\rho\in\tau_{e}^{-1}(\theta)$ such that $\exists\epsilon>0$ with $(\rho-2\epsilon,\rho+2\epsilon)\cap K=\{\rho\}$ , we have $|\sigma(\bm{X}_{N})\cap(\rho-\epsilon,\rho+\epsilon)|=1$ .

Remark 9.

The hypotheses of the above theorem can be relaxed (as mentioned in Remark 5.11 of [19]) in the following two ways: 1) $\bm{E}_{N}$ may be random and independent of $\bm{U}_{N}$ , with $\bm{F}_{N}$ deterministic, provided $\mu_{\bm{E}_{N}}\overset{\text{\tiny{d}}}{\rightarrow}\mu_{e}$ occurs almost surely; 2) the spike location may depend on $N$ , say $\theta_{N}$ , provided $\theta_{N}\rightarrow\theta$ almost surely.

Remark 10.

The above theorem is a simplified version of Theorem 2.3 in [19] which allows for multiple spikes in both $\bm{E}_{N}$ and $\bm{F}_{N}$ .

Remark 11.

The function $\tau$ might not be invertible. In such cases, $\tau^{-1}(\theta)$ can be a non-singleton set, and hence a single spike in $\bm{E}_{N}$ can create multiple spikes in $\bm{X}_{N}$ . But we will see that this does not happen in our problem.
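The outlier phenomenon described in this subsection can be observed in a small simulation. The sketch below is only an illustration under assumed parameters (dimension, spike value `theta`, a uniform bulk for $\bm{E}_{N}$ , and a projection for $\bm{F}_{N}$ ); it does not evaluate the subordination functions. It uses the Hermitian matrix $\bm{F}^{1/2}\bm{U}^{\mathsf{H}}\bm{E}\bm{U}\bm{F}^{1/2}$ , which has the same eigenvalues as $\bm{X}_{N}$ .

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n Haar-distributed unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

rng = np.random.default_rng(2)
n, theta = 400, 5.0                               # illustrative size and spike
e_diag = rng.uniform(0.0, 1.0, size=n)            # bulk of E supported in [0, 1]
e_diag[0] = theta                                 # single outlying eigenvalue of E
f_diag = (np.arange(n) < n // 2).astype(float)    # F: projection, no outliers

u = haar_unitary(n, rng)
half = u * f_diag                                 # U F^{1/2}; F^{1/2} = F for a projection
herm = half.conj().T @ (e_diag[:, None] * half)   # F^{1/2} U^H E U F^{1/2}
eigs = np.sort(np.linalg.eigvalsh(herm))[::-1]

largest, second = eigs[0], eigs[1]
```

In this setup the second eigenvalue cannot exceed 1 (a rank-one positive semidefinite perturbation moves at most one eigenvalue above the unperturbed maximum), while `largest` separates from the bulk.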

IV-B3 Singular Part of Free Convolution

In the last section we discussed the bulk distribution of $\bm{X}_{N}=\bm{E}_{N}\bm{U}_{N}\bm{F}_{N}\bm{U}_{N}^{\mathsf{H}}$ . The main objective of this section is to mention a result regarding the largest eigenvalue of $\bm{X}_{N}$ . We state regularity results for the singular part of $\gamma\boxtimes\nu$ from [25] (Corollary 3.4) and [21] (Theorem 4.1).

Theorem 4 (Singular Part of $\gamma\boxtimes\nu$ ).

Decompose the singular part of $\gamma\boxtimes\nu$ as $(\gamma\boxtimes\nu)_{s}=(\gamma\boxtimes\nu)_{d}+(\gamma\boxtimes\nu)_{sc}$ where $(\gamma\boxtimes\nu)_{d}$ denotes the discrete part and $(\gamma\boxtimes\nu)_{sc}$ denotes the singular continuous part. Then we have,

1. There can be at most two atoms. The possible locations of the atoms are:

(a) $0$ , with $\gamma\boxtimes\nu(\{0\})=\max(\gamma(\{0\}),\nu(\{0\}))$ .

(b) Any $a\in(0,\infty)$ such that there exist $u,v\in(0,\infty)$ with $uv=a$ and $\gamma(\{u\})+\nu(\{v\})>1$ , in which case $\gamma\boxtimes\nu(\{a\})=\gamma(\{u\})+\nu(\{v\})-1$ . Note that there can be at most one such $a$ .

2. Suppose neither of $\gamma,\nu$ is completely concentrated at a single point. We have, $\text{Supp}((\gamma\boxtimes\nu)_{sc})\subset\text{Supp}((\gamma\boxtimes\nu)_{ac})$ . Hence,

[TABLE]
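The atom formulas in part 1 of Theorem 4 can be illustrated with two projections. In the sketch below (an assumed toy example, not taken from the paper), $\gamma=0.7\,\delta_{1}+0.3\,\delta_{0}$ and $\nu=0.5\,\delta_{1}+0.5\,\delta_{0}$ , so the theorem predicts an atom at $0$ of mass $\max(0.3,0.5)=0.5$ and an atom at $1\cdot 1=1$ of mass $0.7+0.5-1=0.2$ . The finite-dimensional counts below match these masses.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n Haar-distributed unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

rng = np.random.default_rng(3)
n = 200
e_diag = (np.arange(n) < 140).astype(float)   # gamma = 0.7 delta_1 + 0.3 delta_0
f_diag = (np.arange(n) < 100).astype(float)   # nu    = 0.5 delta_1 + 0.5 delta_0

u = haar_unitary(n, rng)
half = u * f_diag                                  # U F^{1/2} (F is a projection)
herm = half.conj().T @ (e_diag[:, None] * half)    # same eigenvalues as E U F U^H
eigs = np.linalg.eigvalsh(herm)

mass_at_zero = np.mean(eigs < 1e-8)       # fraction of eigenvalues at 0
mass_at_one = np.mean(eigs > 1.0 - 1e-8)  # fraction of eigenvalues at 1
```

The eigenvalues at $1$ correspond to the intersection of the two ranges, whose dimension is generically $140+100-200=40$ .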

IV-C Analysis of the Spectrum of $\mathbf{E}(\vartheta)$

In order to apply Theorem 3, we need to understand the spectrum of $\bm{B}^{\mathsf{H}}(\bm{T}+\vartheta\bm{T}\bm{A}_{1}(\bm{T}\bm{A}_{1})^{\mathsf{H}})\bm{B}$ . This is done in the following lemma.

Lemma 6.

Let

[TABLE]

denote the sorted trimmed measurements. Let ${\bm{E}(\vartheta)\overset{\scriptscriptstyle\Delta}{=}\bm{B}^{\mathsf{H}}(\bm{T}+\vartheta\bm{T}\bm{A}_{1}(\bm{T}\bm{A}_{1})^{\mathsf{H}})\bm{B}}$ . Then,

1. The eigenvalues of $\bm{E}(\vartheta)$ interlace with $T_{(1)},T_{(2)}\dots T_{(m)}$ in the sense,

[TABLE]

2. $\bm{E}(\vartheta)$ can have at most one eigenvalue bigger than $T_{(1)}$ , which (if it exists) is given by the root of the following equation:

[TABLE]

where $Q_{m}(\lambda)$ is defined as

[TABLE]

3. Furthermore, $\lambda_{1}(\bm{E}(\vartheta))\leq 1+\vartheta$ and $\lambda_{m-1}(\bm{E}(\vartheta))\geq 0$ .

Proof.

Recall that $\bm{E}(\vartheta)=\bm{B}^{\mathsf{H}}(\bm{T}+\vartheta\bm{T}\bm{A}_{1}(\bm{T}\bm{A}_{1})^{\mathsf{H}})\bm{B}$ . The main trick will be to choose the orthonormal basis matrix $\bm{B}$ conveniently, which will make our calculations easier. Recall that the columns of the matrix $\bm{B}$ , i.e. $\bm{B}_{1},\bm{B}_{2}\dots\bm{B}_{m-1}$ , span the subspace $\bm{A}_{1}^{\perp}$ . Any orthonormal basis for the subspace $\bm{A}_{1}^{\perp}$ can serve as the matrix $\bm{B}$ . Hence, we choose the following specific construction of $\bm{B}$ :

[TABLE]

where $a_{m}=\bm{A}_{1}^{\mathsf{H}}\bm{T}\bm{A}_{1}$ and $b_{m}=\bm{A}_{1}^{\mathsf{H}}\bm{T}^{2}\bm{A}_{1}.$ With this choice, we note that

[TABLE]

Hence $\bm{E}(\vartheta)=\bm{B}^{\mathsf{H}}\bm{T}\bm{B}+\vartheta(b_{m}-a_{m}^{2})\bm{e}_{1}\bm{e}_{1}^{\mathsf{H}}$ . To obtain the eigenvalues of $\bm{E}(\vartheta)$ we use its characteristic polynomial. To evaluate the characteristic polynomial of $\bm{E}(\vartheta)$ , we connect it to the characteristic polynomial of $\bm{O}^{\mathsf{H}}\bm{T}\bm{O}$ , where $\bm{O}=[\bm{A}_{1},\bm{B}]$ . Note that $\bm{O}$ is a unitary matrix. First, we have

[TABLE]

Consider the following matrix equation:

[TABLE]

where

[TABLE]

Therefore,

[TABLE]

Now, we can compute the characteristic polynomial of $\bm{E}(\vartheta)$ . We have

[TABLE]

Note that

[TABLE]

where $Q_{m}(\lambda)$ is defined in the following way:

[TABLE]

Hence,

[TABLE]

We emphasize that the above equation does not imply that $T_{1},T_{2},\dots,T_{m}$ are the eigenvalues of $\bm{E}(\vartheta)$ . This is because while $\det(\lambda\bm{I}-\bm{T})$ has zeros at $T_{i}$ , the function $Q_{m}(\lambda)$ has poles at $T_{i}$ . This prevents us from concluding that ${\det(\lambda\bm{I}-\bm{E}(\vartheta))=0}$ when $\lambda=T_{i}$ . However, we can make the following observations:

By Cauchy’s interlacing theorem, we have

[TABLE]

The above is also true for the eigenvalues of:

[TABLE]

since $\bm{O}$ is a unitary matrix.

Equation (9) shows that $\bm{E}(\vartheta)$ is a principal submatrix of

[TABLE]

Hence, the eigenvalues of $\bm{E}(\vartheta)$ will interlace the eigenvalues of $\bm{O}^{\mathsf{H}}(\bm{T}+\vartheta(\bm{T}\bm{A}_{1})(\bm{T}\bm{A}_{1})^{\mathsf{H}})\bm{O}$ :

[TABLE]

Combining (11) and (12), one obtains

[TABLE]

This proves statement (1) in the lemma. This means that $\bm{E}(\vartheta)$ has at most one eigenvalue bigger than $T_{(1)}$ . If $\lambda_{1}(\bm{E}(\vartheta))\leq T_{(1)}$ , then it has no outlying eigenvalue; if $\lambda_{1}(\bm{E}(\vartheta))>T_{(1)}$ , it has exactly one. We call this eigenvalue an outlying eigenvalue for reasons that will be clear later.

The outlying eigenvalue of $\bm{E}(\vartheta)$ (if it exists) is a root of the characteristic polynomial:

[TABLE]

Since this root lies in $(T_{(1)},\infty)$ , it must be a root of:

[TABLE]

Observing that:

[TABLE]

we conclude the outlying eigenvalue is the unique solution (if it exists) to:

[TABLE]

This proves statement (2).

Finally, we observe that $\bm{E}(\vartheta)$ is a positive semidefinite matrix for all $\vartheta\geq 0$ , which shows $\lambda_{m-1}(\bm{E}(\vartheta))\geq 0$ . Also, we have $\lambda_{1}(\bm{E}(\vartheta))\leq\|\bm{E}(\vartheta)\|\leq\|\bm{B}\|^{2}\|\bm{T}+\vartheta\bm{T}\bm{A}_{1}(\bm{T}\bm{A}_{1})^{\mathsf{H}}\|$ . Note that $\|\bm{B}\|\leq 1$ and $\|\bm{T}\|\leq 1$ and $\|\bm{T}\bm{A}_{1}(\bm{T}\bm{A}_{1})^{\mathsf{H}}\|=\bm{A}_{1}^{\mathsf{H}}\bm{T}^{2}\bm{A}_{1}\leq T_{(1)}^{2}\leq 1$ . Hence, by the triangle inequality we have $\lambda_{1}(\bm{E}(\vartheta))\leq 1+\vartheta$ . This proves statement (3) of the lemma.

∎
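The conclusions of Lemma 6 can be sanity-checked on a random instance. The sketch below is an illustration under assumed inputs: `t` is an arbitrary diagonal with entries in $[0,1]$ , `a1` a random unit vector, and `b` an orthonormal basis of $\bm{A}_{1}^{\perp}$ obtained from a complete QR factorization rather than the specific construction used in the proof. The inequalities checked are the ones that follow from Cauchy interlacing together with the rank-one positive semidefinite perturbation, namely $T_{(i+1)}\leq\lambda_{i}(\bm{E}(\vartheta))$ and $\lambda_{i+1}(\bm{E}(\vartheta))\leq T_{(i)}$ , plus the bounds of statement (3).

```python
import numpy as np

rng = np.random.default_rng(4)
m, vartheta = 60, 3.0                      # illustrative size and parameter

z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
a1 = z / np.linalg.norm(z)                 # random unit vector playing the role of A_1
t = rng.uniform(0.0, 1.0, size=m)          # diagonal of T, entries in [0, 1]

# Columns 2..m of a complete QR factorisation of a1 span its orthogonal complement.
q, _ = np.linalg.qr(a1.reshape(-1, 1), mode="complete")
b = q[:, 1:]

ta1 = t * a1                               # the vector T A_1
inner = np.diag(t) + vartheta * np.outer(ta1, ta1.conj())
e_mat = b.conj().T @ inner @ b             # E(vartheta), size (m-1) x (m-1)

lam = np.sort(np.linalg.eigvalsh(e_mat))[::-1]   # lambda_1 >= ... >= lambda_{m-1}
t_sorted = np.sort(t)[::-1]                       # T_(1) >= ... >= T_(m)

tol = 1e-10
lower_ok = np.all(lam >= t_sorted[1:] - tol)          # lambda_i >= T_(i+1)
upper_ok = np.all(lam[1:] <= t_sorted[:-2] + tol)     # lambda_{i+1} <= T_(i)
range_ok = (lam[-1] >= -tol) and (lam[0] <= 1.0 + vartheta + tol)
```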

The following lemma establishes the convergence of the random function $Q_{m}(\lambda)$ to the deterministic function $Q(\lambda)$ .

Lemma 7.

Suppose $\frac{m}{n}=\delta$ . For a Lipschitz function $\mathcal{T}$ whose range is in $[0,1]$ , there exists an event of probability 1, on which the following three statements hold:

1. $\frac{1}{m}\sum_{i=1}^{m}\delta_{T_{i}}\overset{\text{\tiny{d}}}{\rightarrow}\mathcal{L}_{T}$ ,

2. $Q_{m}(\lambda)\rightarrow Q(\lambda)\quad\forall\;\lambda\in(1,\infty)$ ,

3. $a_{m}\rightarrow\mathbb{E}|Z|^{2}T$ .

In the above equations, $Z\sim\mathcal{CN}\left(0,1\right)$ , and $T=\mathcal{T}(|Z|/\sqrt{\delta})$ . Furthermore, $\mathcal{L}_{T}$ denotes the law of the random variable $T$ , and

[TABLE]

Proof.

It is sufficient to show each item holds almost surely.

The argument for this part is a minor modification of the argument sketched in [26]. To prove statement (1) it suffices to show that

[TABLE]

almost surely. Indeed, if (14) holds, then for every bounded continuous function $f$ ,

[TABLE]

where $g(x)=f(\mathcal{T}(\frac{\mathinner{\!\left\lvert x\right\rvert}}{\sqrt{\delta}}))$ is a bounded continuous function as well. Hence by (14),

[TABLE]

which implies $\frac{1}{m}\sum_{i=1}^{m}\delta_{T_{i}}\xrightarrow{d}\mathcal{L}_{T}$ .

To show (14), note that $\bm{A}_{1}$ has the same distribution as $\frac{\bm{z}}{\mathinner{\!\left\lVert\bm{z}\right\rVert}}$ , where $\bm{z}=\mathinner{\left(z_{1},...,z_{m}\right)}$ , and $z_{i}\overset{i.i.d.}{\sim}\mathcal{CN}(0,1)$ . Let $\Phi$ denote the cumulative distribution function of a standard normal random variable and define

[TABLE]

Then, we have

[TABLE]

Moreover,

[TABLE]

$G_{m}(t\mathinner{\!\left\lVert\bm{z}\right\rVert})-\Phi(t\mathinner{\!\left\lVert\bm{z}\right\rVert})$ goes to $0$ almost surely by the Glivenko-Cantelli lemma. Furthermore, since

[TABLE]

and $\Phi$ is a continuous function we conclude that

[TABLE]

Hence,

[TABLE]

almost surely, which yields (14).

We now focus on the proof of statement (2). Let

[TABLE]

We will show that

[TABLE]

almost surely. This means there is a set $\mathcal{C}_{k}^{\prime}$ of measure $0$ , outside of which we have the convergence for all $\lambda\in\mathcal{C}_{k}$ . If we define $\mathcal{C}^{\prime}\overset{\scriptscriptstyle\Delta}{=}\bigcup\limits_{k=1}^{\infty}\mathcal{C}_{k}^{\prime}$ , then $Q_{m}(\lambda)\to Q(\lambda)\quad\forall\lambda\in\mathinner{\left(1,\infty\right)}$ outside of $\mathcal{C^{\prime}}$ , and clearly $\mathbb{P}\mathinner{\left(\mathcal{C^{\prime}}\right)}=0$ .

First note that $\bm{A}_{1}\overset{d}{=}\frac{\bm{z}}{\mathinner{\!\left\lVert\bm{z}\right\rVert}}$ , where

[TABLE]

Define

[TABLE]

Note that for a fixed $\lambda$ we have $\tilde{Q}_{m}(\lambda)\to Q(\lambda)$ almost surely by the strong law of large numbers. Since $\tilde{Q}_{m}(\lambda)$ is a decreasing function in $\lambda$ and we have $\tilde{Q}_{m}(\lambda)\to Q(\lambda)\quad\forall\lambda\in\mathcal{C}_{k}\cap\mathbb{Q}$ almost surely, we obtain $\tilde{Q}_{m}(\lambda)\to Q(\lambda)$ for all $\lambda\in\mathcal{C}_{k}$ with probability $1$ . Hence, it suffices to show under an event that holds with probability 1,

[TABLE]

To prove (18), we will find a sequence $\tau_{m}$ such that $\tau_{m}\rightarrow 0$ as $m\rightarrow\infty$ , and,

[TABLE]

With this, the Borel-Cantelli lemma yields that the event

[TABLE]

has measure $0$ . Outside the event $E$ we have (18), as desired.

Define the events:

[TABLE]

where $\epsilon$ is a parameter we will set later. Note that,

[TABLE]

where we defined the terms $\mathsf{I},\mathsf{II}$ as:

[TABLE]

Using the fact that $\bm{z}\in E_{1}\cap E_{2,\epsilon}$ and $\lambda\in\mathcal{C}_{k}$ , we have,

[TABLE]

Observe that, on the event $E_{1}\cap E_{2,\epsilon}$ ,

[TABLE]

Since $\mathcal{T}$ was assumed to be Lipschitz,

[TABLE]

where $\|\mathcal{T}\|_{\mathsf{Lip}}$ denotes the Lipschitz constant of $\mathcal{T}$ . Hence, when $m\geq e^{2}$ , setting $\epsilon=\frac{1}{\log(m)}\leq 0.5$ , we obtain, on the event $E_{1}\cap E_{2,\epsilon}$

[TABLE]

where

[TABLE]

Note that $\tau_{m}\rightarrow 0$ as $m\rightarrow\infty$ as required. And,

[TABLE]

where the last step follows from standard bounds on the tails of Gaussian random variables and $\chi^{2}$ random variables. In particular, we have,

[TABLE]

as required.

The proof of statement (3) is similar to the proof of the second statement. Hence, we skip the details. Note that if we define

[TABLE]

then it again converges under the event $E_{1}\cap E_{2,\epsilon}$ , defined in the proof of statement (2).

∎
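Statement (3) of Lemma 7 can be illustrated by simulation. The sketch below rests on several assumptions that are not taken from the paper: a hypothetical trimming function $\mathcal{T}(y)=\min(\delta y^{2},1)$ (Lipschitz on $y\geq 0$ with range $[0,1]$ ), and the normalization $y_{i}=\sqrt{m/\delta}\,|A_{1,i}|$ , chosen so that $y_{i}$ behaves like $|Z|/\sqrt{\delta}$ for large $m$ . For this choice $T=\min(|Z|^{2},1)$ with $|Z|^{2}\sim\mathrm{Exp}(1)$ , and a direct integration gives $\mathbb{E}[|Z|^{2}T]=2-3/e$ .

```python
import numpy as np

def trim(y, delta):
    """Hypothetical trimming function T(y) = min(delta * y^2, 1), with range [0, 1]."""
    return np.minimum(delta * y**2, 1.0)

rng = np.random.default_rng(5)
m, delta = 200_000, 2.0

# A_1 has the law of z / ||z|| with i.i.d. CN(0, 1) entries.
z = (rng.standard_normal(m) + 1j * rng.standard_normal(m)) / np.sqrt(2)
a1 = z / np.linalg.norm(z)

# Assumed normalisation of the measurements (see the lead-in).
y = np.sqrt(m / delta) * np.abs(a1)
t = trim(y, delta)

a_m = np.sum(np.abs(a1) ** 2 * t)     # a_m = A_1^H T A_1
limit = 2.0 - 3.0 / np.e              # E[|Z|^2 min(|Z|^2, 1)] for |Z|^2 ~ Exp(1)
```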

The next lemma analyzes the properties of the limiting fixed point equation $Q(\lambda)=(\lambda-\mathbb{E}|Z|^{2}T-1/\vartheta)^{-1}$ . Define the critical value $\vartheta_{c}$ as:

[TABLE]

Lemma 8.

Consider the fixed point equation (in $\lambda$ )

[TABLE]

on the domain:

[TABLE]

We have

1. If $\vartheta>\vartheta_{c}$ , then the above equation has exactly one solution, denoted by $\lambda=\theta(\vartheta)$ . Furthermore,

[TABLE]

Moreover, $\theta(\vartheta)$ is an increasing function of $\vartheta$ and $\lim_{\vartheta\rightarrow\infty}\theta(\vartheta)=\infty$ .

2. If $\vartheta\leq\vartheta_{c}$ , then the equation has no solutions. For any $\vartheta\leq\vartheta_{c}$ , we define $\theta(\vartheta)=1$ .

Proof.

The following change of measure simplifies some of the proofs:

[TABLE]

Note that $p(z)$ is a proper probability density function since $\int p(z)\mathop{}\!\mathrm{d}z=\mathbb{E}[|Z|^{2}]=1$ . With this notation, (20) can be written as

[TABLE]

Define the random variable $G(\lambda)=(\lambda-T)^{-1}$ . Note that $G^{\prime}(\lambda)=-G^{2}(\lambda)$ . Further, define

[TABLE]

The first two derivatives of $f(\lambda)$ are

[TABLE]

First, since $f^{\prime}(\lambda)\geq 0$ , the function $f(\lambda)$ is increasing. By Jensen’s Inequality $f^{\prime}(\lambda)\geq 1$ . Since equality holds if and only if $G$ is deterministic, and we have assumed that the support of $T$ is $[0,1]$ , we conclude that $f^{\prime}(\lambda)>1$ . Noting that $G\geq 0$ and applying Chebyshev’s association inequality (see Fact 1, Appendix B) with $B=A=G$ and $f(a)=g(a)=a$ gives ${f}^{\prime\prime}(\lambda)\leq 0$ . Hence $f(\lambda)$ is an increasing, concave function and ${f}^{\prime}(\lambda)>1$ .

Next, we claim that $f(\lambda)=\lambda-\tilde{\mathbb{E}}[T]-1/\vartheta$ can have at most one solution in $(1,\infty)$ . To see this, let $\lambda_{1}$ be the first point at which the two curves intersect. Hence $f(\lambda_{1})=\lambda_{1}-\tilde{\mathbb{E}}[T]-1/\vartheta$ . Furthermore

[TABLE]

Hence there can be no other intersection point of the two curves after $\lambda_{1}$ .

Now consider the following two cases:

Case 1: $\vartheta>\vartheta_{c}$ . First note that since $(1-x)^{-1}$ is a convex function on $(-\infty,1]$ , according to Jensen’s Inequality

[TABLE]

Hence,

[TABLE]

This shows that $\vartheta_{c}\geq 0$ . Furthermore,

[TABLE]

On the other hand, we can also compare the limiting behavior of $\lambda-\tilde{\mathbb{E}}[T]-1/\vartheta$ and $f(\lambda)$ as $\lambda\rightarrow\infty$ . We have

[TABLE]

and

[TABLE]

Hence, $f(\lambda)>\lambda-\tilde{\mathbb{E}}[T]-1/\vartheta$ for $\lambda$ large enough and $f(1)<1-\tilde{\mathbb{E}}[T]-1/\vartheta$ . Hence the curves $f(\lambda)$ and $\lambda-\tilde{\mathbb{E}}[T]-1/\vartheta$ intersect once in $(1,\infty)$ . Finally note that,

[TABLE]

Hence $f(\lambda)=\lambda-\tilde{\mathbb{E}}[T]-1/\vartheta$ has exactly one solution in $\lambda\geq\max(1,\tilde{\mathbb{E}}[T]+1/\vartheta)$ as claimed. By the Implicit Function Theorem, we can compute

[TABLE]

Hence $\theta(\vartheta)$ is an increasing function of $\vartheta$ . Finally, we verify that $\lim_{\vartheta\rightarrow\infty}\theta(\vartheta)=\infty$ . Suppose that this is not the case, i.e. $\theta(\vartheta)\rightarrow\theta_{\infty}<\infty$ as $\vartheta\rightarrow\infty$ . Recalling the fixed point characterization of $\theta(\vartheta)$ , we obtain that $\theta_{\infty}$ satisfies the fixed point equation

[TABLE]

This means that Jensen’s Inequality applied to the strictly convex function $(\theta_{\infty}-t)^{-1}$ should be tight. This means under the tilted measure ( $\tilde{\mathbb{E}}$ ), $T$ is deterministic. This is not possible since we have assumed that $T$ is supported on $[0,1]$ .

Case 2: $\vartheta\leq\vartheta_{c}$ . As in Case 1, we argue (this time with the opposite conclusion) that

[TABLE]

Furthermore, since ${f}^{\prime}(\lambda)>\frac{\mathop{}\!\mathrm{d}(\lambda-\tilde{\mathbb{E}}[T]-1/\vartheta)}{\mathop{}\!\mathrm{d}\lambda}=1,$ $f(\lambda)=\lambda-\tilde{\mathbb{E}}[T]-1/\vartheta$ has no solution in $(1,\infty)$ . ∎
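The dichotomy of Lemma 8 can be explored numerically. The sketch below is an illustration under stated assumptions, not the paper's computation: it takes $Q(\lambda)=\mathbb{E}\left[\frac{|Z|^{2}}{\lambda-T}\right]$ (the form suggested by the definition of $\vartheta_{c}$ and by the tilted-measure notation in the proof), and a hypothetical trimming for which $T=\min(|Z|^{2},1)$ with $|Z|^{2}\sim\mathrm{Exp}(1)$ . Expectations are computed with a midpoint rule, and $\theta(\vartheta)$ is found by bisection on $h(\lambda)=1/Q(\lambda)-(\lambda-\mathbb{E}[|Z|^{2}T]-1/\vartheta)$ , which is increasing by the argument above.

```python
import numpy as np

# Midpoint grid for the part of the expectation where |Z|^2 < 1 (so T = |Z|^2).
N_GRID = 20_000
X = (np.arange(N_GRID) + 0.5) / N_GRID
W = X * np.exp(-X) / N_GRID          # weights for E[|Z|^2 g(T); |Z|^2 < 1]
TAIL = 2.0 / np.e                    # E[|Z|^2; |Z|^2 >= 1], where T = 1
MEAN_ZT = 2.0 - 3.0 / np.e           # E[|Z|^2 T] for this hypothetical T

def q_fun(lam):
    """Assumed form Q(lambda) = E[|Z|^2 / (lambda - T)], for lambda > 1."""
    return np.sum(W / (lam - X)) + TAIL / (lam - 1.0)

# T = 1 with positive probability, so E[|Z|^2 / (1 - T)] is infinite and its inverse is 0.
VARTHETA_C = 1.0 / (1.0 - 0.0 - MEAN_ZT)

def theta(vartheta, tol=1e-12):
    """Return theta(vartheta): the root for vartheta > vartheta_c, else 1."""
    if vartheta <= VARTHETA_C:
        return 1.0

    def h(lam):
        return 1.0 / q_fun(lam) - (lam - MEAN_ZT - 1.0 / vartheta)

    lo, hi = 1.0 + 1e-9, 2.0
    while h(hi) < 0.0:               # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:             # plain bisection
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta_20 = theta(20.0)
theta_40 = theta(40.0)
theta_below = theta(5.0)             # 5 < vartheta_c here, so the convention gives 1
residual = q_fun(theta_20) - 1.0 / (theta_20 - MEAN_ZT - 1.0 / 20.0)
```

For this choice $\vartheta_{c}=1/(1-\mathbb{E}[|Z|^{2}T])$ ; `residual` checks the fixed point equation at the computed root, and comparing `theta_20` with `theta_40` illustrates the monotonicity claimed in the lemma.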

Combining the above sequence of lemmas, we obtain the following proposition about the spectrum of the matrix $\bm{E}(\vartheta)$ .

Proposition 4.

Let $\bm{E}(\vartheta)=\bm{B}^{\mathsf{H}}(\bm{T}+\vartheta\bm{T}\bm{A}_{1}(\bm{T}\bm{A}_{1})^{\mathsf{H}})\bm{B}$ . Then, there exists an event of probability 1, on which we have,

1. $\mu_{\bm{E}(\vartheta)}\overset{\text{\tiny{d}}}{\rightarrow}\mathcal{L}_{T}$ .

2. If $\vartheta\leq\vartheta_{c}$ , $\sigma(\bm{E}(\vartheta))\subset[0,1]$ .

3. If $\vartheta>\vartheta_{c}$ , then $\lambda_{i}(\bm{E}(\vartheta))\in[0,1]\;\forall\;i\geq 2$ , and,

[TABLE]

where $\theta(\vartheta)$ is the unique solution to the equation (in $\lambda$ ):

[TABLE]

in the domain:

[TABLE]

Proof.

We restrict ourselves to the event guaranteed by Lemma 7, on which,

1. $a_{m}\rightarrow\mathbb{E}|Z|^{2}T$ ,

2. $\frac{1}{m}\sum_{i=1}^{m}\delta_{T_{i}}\overset{\text{\tiny{d}}}{\rightarrow}\mathcal{L}_{T}$ ,

3. $Q_{m}(\lambda)\rightarrow Q(\lambda)\;\forall\;\lambda\in(1,\infty)$ .

Let us denote this event by $\mathcal{E}$ . Define the sequence of (random) functions $f_{m}(\lambda)$ as:

[TABLE]

with the domain:

[TABLE]

Define the (deterministic) function $f(\lambda)$ :

[TABLE]

with the domain:

[TABLE]

Note that on $\mathcal{E}$ , we have $f_{m}(\lambda)\rightarrow f(\lambda)\;\forall\;\lambda>1$ .

By Lemma 6, we know that the eigenvalues of $\bm{E}(\vartheta)$ interlace with the eigenvalues of the diagonal matrix $\bm{T}$ . On the event $\mathcal{E}$ , $\mu_{\bm{T}}\rightarrow\mathcal{L}_{T}$ . Hence indeed $\mu_{\bm{E}(\vartheta)}\overset{\text{\tiny{d}}}{\rightarrow}\mathcal{L}_{T}$ . This proves statement (1) of the proposition.

Consider the case $\vartheta\leq\vartheta_{c}$ . By Lemma 6, we already know that $\lambda_{2}(\bm{E}(\vartheta))\leq T_{(1)}\leq 1$ and $\lambda_{m-1}(\bm{E}(\vartheta))\geq 0$ . Hence to prove (2), it is sufficient to show that

[TABLE]

For the sake of contradiction, suppose that there is a realization in $\mathcal{E}$ such that $\bar{\lambda}_{1}>1$ . On this realization we consider a subsequence such that $\lambda_{1}(\bm{E}(\vartheta))\rightarrow\bar{\lambda}_{1}$ . All the analysis henceforth is along this subsequence. Since for all $m$ large enough $\lambda_{1}(\bm{E}(\vartheta))>1$ , by Lemma 6, we must have $f_{m}(\lambda_{1}(\bm{E}(\vartheta)))=0$ . Applying Lemma 3 from [7] (Appendix E), we obtain

[TABLE]

Since $\vartheta\leq\vartheta_{c}$ , we know by Lemma 8 that $f(\lambda)=0$ does not have any solution in $\lambda>\max(1,\mathbb{E}[|Z|^{2}T]+1/\vartheta)$ . Hence,

[TABLE]

However,

[TABLE]

This contradicts $f(\bar{\lambda}_{1})=0$ . Hence, $\underset{m\rightarrow\infty}{\lim\sup}\ \lambda_{1}(\bm{E}(\vartheta))\leq 1$ on $\mathcal{E}$ . This concludes the proof of statement (2).

Now consider the case $\vartheta>\vartheta_{c}$ . Again by Lemma 6, we know $\lambda_{i}(\bm{E}(\vartheta))\in[0,1]$ for all $i\geq 2$ . By Lemma 8, we know that $f(\lambda)=0$ has a unique solution in $\lambda>\max(1,\mathbb{E}|Z|^{2}T+1/\vartheta)$ denoted by $\theta(\vartheta)$ . Fix an $\epsilon$ small enough such that $[\theta(\vartheta)-\epsilon,\theta(\vartheta)+\epsilon]$ lies in the domain of $f(\lambda)$ . Note that $f(\theta(\vartheta))=0$ , while $f(\theta(\vartheta)-\epsilon)>0$ and $f(\theta(\vartheta)+\epsilon)<0$ (by Lemma 8).

Since $a_{m}\rightarrow\mathbb{E}|Z|^{2}T$ , for all $m$ large enough, $[\theta(\vartheta)-\epsilon,\theta(\vartheta)+\epsilon]$ also lies in the domain of $f_{m}(\lambda)$ . By Lemma 7, we have $f_{m}(\lambda)\rightarrow f(\lambda)$ for all $\lambda\in[\theta(\vartheta)-\epsilon,\theta(\vartheta)+\epsilon]$ . In particular, we have, for all $m$ large enough $f_{m}(\theta(\vartheta)-\epsilon)>0$ while $f_{m}(\theta(\vartheta)+\epsilon)<0$ . Hence, by Lemma 6, we have $\lambda_{1}(\bm{E}(\vartheta))\in[\theta(\vartheta)-\epsilon,\theta(\vartheta)+\epsilon]$ for all $m$ large enough. Since $\epsilon$ can be taken arbitrarily small, $\lambda_{1}(\bm{E}(\vartheta))\overset{\text{\tiny{a.s.}}}{\rightarrow}\theta(\vartheta)$ . This proves (3).

∎

IV-D Analysis of the Support of $\gamma\boxtimes\mathcal{L}_{T}$

We recall that $\mathcal{L}_{T}$ is the law of the random variable $T=\mathcal{T}(|Z|/\sqrt{\delta})$ , and $\gamma=\frac{1}{\delta}\delta_{1}+\left(1-\frac{1}{\delta}\right)\delta_{0}$ . To keep the notation clean, we will refer to the analytic transforms corresponding to the measure $\mathcal{L}_{T}$ with the subscript $T$ ; for example, the Cauchy transform for the measure $\mathcal{L}_{T}$ will be referred to as $G_{T}$ . We begin by computing the Cauchy transform of $\gamma\boxtimes T$ .

Lemma 9.

Let $z\in\mathbb{C}^{-}$ . Then, we have,

[TABLE]

In the above display, the subordination function, $w_{T}(1/z)$ , is the unique solution in $\mathbb{C}^{+}$ to the equation $\Lambda(1/w)=z$ , where the function $\Lambda$ is defined as:

[TABLE]

Proof.

First we can compute the moment generating functions:

[TABLE]

The $\eta$ -transforms of the two measures are given by,

[TABLE]

Hence, we can compute the function $Q_{z}$ , given in Definition 4,

[TABLE]

Hence $w_{T}$ is the unique solution in $\mathbb{C}^{+}$ of the equation $Q_{z}(w)=w$ . This equation can be simplified to

[TABLE]

where the function $\Lambda$ is defined as $\Lambda(\tau)\overset{\scriptscriptstyle\Delta}{=}\tau-\frac{(1-1/\delta)}{\mathbb{E}\left[\frac{1}{\tau-T}\right]}.$ Hence, we can compute the moment generating function of $\gamma\boxtimes T$ in the following way:

[TABLE]

In the above display, in the step marked (a), we used the fact that $w_{T}$ solves $\Lambda(1/w)=1/z$ . Finally, the Cauchy Transform of $\gamma\boxtimes T$ is given by

[TABLE]

∎

Our next goal is to characterize $\text{Supp}(\gamma\boxtimes T)$ . Theorem 4 gives a complete characterization of the support of the singular part of $\gamma\boxtimes T$ . Hence, we now need to understand the support of the absolutely continuous part of $\gamma\boxtimes T$ . According to the Stieltjes Inversion Theorem (Theorem 2), the density of the absolutely continuous part is given by

[TABLE]

Since $\tau_{T}(x-i\epsilon)\overset{\scriptscriptstyle\Delta}{=}1/w_{T}(1/(x-i\epsilon))$ uniquely solves $\Lambda(\tau)=x-i\epsilon$ in $\mathbb{C}^{-}$ , our interest will be to study the solutions of this equation for $\epsilon\approx 0$ . Hence, we begin by studying the solutions of $\Lambda(\tau)=x$ . Before doing so, we clarify the definition of $\Lambda(\tau)$ at $\tau=1$ , which is a subtle case because $1\in\text{Supp}(T)$ . We note that the random variable $(1-T)^{-1}$ is non-negative and hence the expectation $\mathbb{E}[(1-T)^{-1}]$ is well defined but might be $\infty$ . If it is finite, then $\Lambda(\tau)$ is well defined at $\tau=1$ . If the expectation is $\infty$ , we define $\Lambda(1)=1$ , which is consistent with interpreting $1/\infty=0$ . $\Lambda(\tau)$ is defined at $\tau=0$ analogously. This definition ensures that $\Lambda(\tau)$ is a continuous function on $(-\infty,0]\cup[1,\infty)$ . Next we discuss the solutions of $\Lambda(\tau)=x$ . Figure 2 shows a typical plot of $\Lambda(\tau)$ . As this figure suggests, we expect the following two quantities to play major roles in determining the existence of a solution of $\Lambda(\tau)=x$ : Define

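$$\lambda_{l}\overset{\scriptscriptstyle\Delta}{=}\sup_{\tau\in(-\infty,0]}\Lambda(\tau),\qquad\lambda_{r}\overset{\scriptscriptstyle\Delta}{=}\inf_{\tau\in[1,\infty)}\Lambda(\tau).$$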

Our next lemma proves the properties of $\Lambda(\tau)$ suggested by Figure 2.

Lemma 10.

The following statements are true about $\Lambda(\tau)$ :

1. $\Lambda(\tau)$ is a convex function on $[1,\infty)$ and a concave function on $(-\infty,0]$ .

2. $\lim_{\tau\rightarrow\infty}\Lambda(\tau)=\infty,\;\lim_{\tau\rightarrow-\infty}\Lambda(\tau)=-\infty$ .

3. $\lambda_{r}>\lambda_{l}\geq 0$ .

4. Consider the 3 mutually exclusive and exhaustive cases:

Case A: $x\leq\lambda_{l}$ . There is at least one and at most two solutions to $\Lambda(\tau)=x$ . All solutions lie in $(-\infty,0]$ . Furthermore, when $x<\lambda_{l}$ , there is exactly one solution to $\Lambda(\tau)=x$ with ${\Lambda}^{\prime}(\tau)>0$ . This unique solution additionally satisfies $\tau<\tau_{l}\leq 0$ .

Case B: $\lambda_{l}<x<\lambda_{r}$ . There are no solutions of the equation $\Lambda(\tau)=x,\;\tau\in(-\infty,0]\cup[1,\infty)$ .

Case C: $x\geq\lambda_{r}$ . There is at least one and at most two solutions to $\Lambda(\tau)=x$ . All solutions lie in $[1,\infty)$ . Furthermore, when $x>\lambda_{r}$ , there is exactly one solution to $\Lambda(\tau)=x$ with ${\Lambda}^{\prime}(\tau)>0$ . This unique solution additionally satisfies $\tau>\tau_{r}\geq 1$ .

Proof.

We define the random variable $G(\tau)$ ,

$$G(\tau)\overset{\scriptscriptstyle\Delta}{=}\frac{1}{\tau-T}.$$

We observe that for any $\tau\in[1,\infty)$ , $G(\tau)\geq 0$ , whereas for $\tau\in(-\infty,0]$ , $G(\tau)\leq 0$ . It is straightforward to see that $G^{\prime}(\tau)=-G^{2}(\tau)\leq 0.$ For notational simplicity, we will often abbreviate $G(\tau)$ as $G$ . We have

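$$\Lambda^{\prime}(\tau)=1-\left(1-\frac{1}{\delta}\right)\frac{\mathbb{E}[G^{2}]}{(\mathbb{E}[G])^{2}},\qquad\Lambda^{\prime\prime}(\tau)=2\left(1-\frac{1}{\delta}\right)\frac{\mathbb{E}[G]\,\mathbb{E}[G^{3}]-(\mathbb{E}[G^{2}])^{2}}{(\mathbb{E}[G])^{3}}.$$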

Consider the following two cases,

Case 1: $\tau\in[1,\infty)$ .

Applying Chebychev’s Association Inequality (Fact 1) with $A=B=G$ and $f(a)=g(a)=a$ gives us that $\Lambda^{\prime\prime}(\tau)\geq 0$ . In fact, an inspection of the proof of Chebychev’s Association Inequality from [27] allows us to rule out the equality case under the assumptions imposed on $\mathcal{T}$ , and we have $\Lambda^{\prime\prime}(\tau)>0$ . Hence, $\Lambda$ is strictly convex in $(1,\infty)$ . Since $\Lambda(\tau)$ is continuous on $[1,\infty)$ , $\Lambda$ is convex on $[1,\infty)$ .

Case 2: $\tau\in(-\infty,0]$ .

Again, applying Chebychev’s Association Inequality with $A=B=-G$ and $f(a)=g(a)=a$ gives us ${\Lambda}^{\prime\prime}(\tau)\leq 0$ . Hence $\Lambda$ is concave in this region. As before, an inspection of the proof of Chebychev’s Association Inequality allows us to rule out the equality case under the assumptions imposed on $\mathcal{T}$ , and we have $\Lambda^{\prime\prime}(\tau)<0$ . Hence, $\Lambda$ is strictly concave in $(-\infty,0)$ . Since $\Lambda(\tau)$ is continuous on $(-\infty,0]$ , $\Lambda$ is concave on $(-\infty,0]$ . This concludes the proof of statement (1) in the lemma.

Note that,

[TABLE]

This shows $\lim_{\tau\rightarrow\infty}\Lambda(\tau)=\infty$ . The claim about the limit as $\tau\rightarrow-\infty$ can be analogously obtained. This proves item (2) in the statement of the lemma.

The infimum in the definition of $\lambda_{r}$ is attained due to item (2) in the statement of the lemma. Analogously, the supremum in the definition of $\lambda_{l}$ is attained. Next consider any $\tau_{+}\in(1,\infty)$ and any $\tau_{-}\in(-\infty,0)$ . Since the function $f(t)=(\tau_{+}-t)^{-1}$ is convex on $[0,1]$ , according to Jensen’s Inequality, we have

[TABLE]

On the other hand, since the function $f(t)=(\tau_{-}-t)^{-1}$ is concave on $[0,1]$ , we have

[TABLE]

Hence,

[TABLE]

Taking the minimum over $\tau_{+}$ and the maximum over $\tau_{-}$ gives us $\lambda_{r}>\lambda_{l}$ . Furthermore, we note that $\Lambda(0^{-})\geq 0$ . Hence $\lambda_{l}\geq 0$ . This concludes the proof of item (3) in the statement of the lemma.
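The Jensen step in item (3) can be spelled out as follows. This is a sketch reconstructed from the definition $\Lambda(\tau)=\tau-(1-1/\delta)/\mathbb{E}[(\tau-T)^{-1}]$ , assuming $T\in[0,1]$ and $\delta>1$ ; it is not necessarily the authors' exact display.

```latex
\begin{align*}
% tau_+ > 1: t -> (tau_+ - t)^{-1} is convex and positive on [0,1]
\mathbb{E}\left[\frac{1}{\tau_{+}-T}\right] \geq \frac{1}{\tau_{+}-\mathbb{E}[T]} > 0
&\;\Longrightarrow\;
\Lambda(\tau_{+}) \geq \tau_{+}-\left(1-\frac{1}{\delta}\right)\left(\tau_{+}-\mathbb{E}[T]\right)
= \frac{\tau_{+}}{\delta}+\left(1-\frac{1}{\delta}\right)\mathbb{E}[T], \\
% tau_- < 0: t -> (tau_- - t)^{-1} is concave and negative on [0,1]
\mathbb{E}\left[\frac{1}{\tau_{-}-T}\right] \leq \frac{1}{\tau_{-}-\mathbb{E}[T]} < 0
&\;\Longrightarrow\;
\Lambda(\tau_{-}) \leq \frac{\tau_{-}}{\delta}+\left(1-\frac{1}{\delta}\right)\mathbb{E}[T], \\
% subtract the two bounds and use tau_+ - tau_- > 1
\Lambda(\tau_{+})-\Lambda(\tau_{-}) &\geq \frac{\tau_{+}-\tau_{-}}{\delta} > \frac{1}{\delta} > 0 .
\end{align*}
```

Because the gap is bounded below by $1/\delta$ uniformly in $\tau_{+}>1$ and $\tau_{-}<0$ , the strict inequality survives when passing to the minimum and the maximum.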

For any $x\in(\lambda_{l},\lambda_{r})$ , $\Lambda(\tau)=x$ does not have a solution in $(-\infty,0]\cup[1,\infty)$ since $\Lambda(\tau)\leq\lambda_{l}\;\forall\;\tau\leq 0$ and $\Lambda(\tau)\geq\lambda_{r}\;\forall\;\tau\geq 1$ . Now consider any $x\geq\lambda_{r}$ . Since $\Lambda(\tau)\leq\lambda_{l}<\lambda_{r}\;\forall\;\tau\leq 0$ , we know that all solutions of $\Lambda(\tau)=x$ lie in $[1,\infty)$ . Since $\Lambda$ is strictly convex in $(1,\infty)$ , there can be at most 2 solutions. Now consider any $x>\lambda_{r}$ . Let $\tau_{r}=\arg\min_{\tau\geq 1}\Lambda(\tau)$ . Due to the strict convexity of $\Lambda(\tau)$ , we have $\Lambda^{\prime}(\tau)>0$ for any $\tau\in(\tau_{r},\infty)$ . Hence $\Lambda(\tau)$ is strictly increasing on $[\tau_{r},\infty)$ . Since $\lambda_{r}=\Lambda(\tau_{r})<x<\Lambda(\infty)=\infty$ , we are guaranteed to have exactly one solution to $\Lambda(\tau)=x$ on $(\tau_{r},\infty)$ , which indeed satisfies $\Lambda^{\prime}(\tau)>0$ . The analysis for the case when $x\leq\lambda_{l}$ can be done in a similar way. This concludes the proof of item (4) in the statement of the lemma.

∎
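The shape of $\Lambda$ on $[1,\infty)$ described in Lemma 10 can be checked numerically. The sketch below is only an illustration under assumptions that are not from the paper: it uses the hypothetical trimming function $\mathcal{T}(y)=y^{2}/(1+y^{2})$ with $\delta=2$ , so that $T=E/(\delta+E)$ with $E=|Z|^{2}\sim\text{Exp}(1)$ , and it evaluates expectations by Gauss-Laguerre quadrature. For this choice $\mathbb{E}[(1-T)^{-1}]=1+1/\delta$ , so $\Lambda(1)=2/(\delta+1)$ in closed form.

```python
import numpy as np

# Illustrative assumption (not from the paper): trimming function
# T(y) = y^2 / (1 + y^2). With Z ~ CN(0, 1), |Z|^2 ~ Exp(1), so
# T = E / (delta + E) with E ~ Exp(1), which takes values in [0, 1).
DELTA = 2.0
# Gauss-Laguerre rule for integrals of the form int_0^inf e^{-x} f(x) dx.
NODES, WEIGHTS = np.polynomial.laguerre.laggauss(80)
T_NODES = NODES / (DELTA + NODES)


def Lambda(tau):
    """Lambda(tau) = tau - (1 - 1/delta) / E[1 / (tau - T)], for tau >= 1."""
    mean_g = np.sum(WEIGHTS * (1.0 / (tau - T_NODES)))
    return tau - (1.0 - 1.0 / DELTA) / mean_g


taus = np.linspace(1.0, 5.0, 81)
vals = np.array([Lambda(t) for t in taus])

# Discrete convexity check: second differences should be non-negative.
second_diff = vals[:-2] - 2.0 * vals[1:-1] + vals[2:]
tau_r = taus[np.argmin(vals)]  # grid approximation of the minimizer on [1, 5]
lambda_r = vals.min()          # grid approximation of lambda_r

print("convex on grid:", bool(np.all(second_diff > -1e-9)))
print("tau_r ~", tau_r, " lambda_r ~", lambda_r)
```

For this particular $\mathcal{T}$ the minimizer sits at the boundary $\tau_{r}=1$ ; other trimming functions can place it in the interior of $(1,\infty)$ .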

We are now in the position to characterize the support of $\gamma\boxtimes T$ which is the content of the following proposition.

Proposition 5.

The support of $\gamma\boxtimes T$ is given by

$$\text{Supp}(\gamma\boxtimes T)=[\lambda_{l},\lambda_{r}]\cup\text{Supp}((\gamma\boxtimes T)_{d}),$$

where $(\gamma\boxtimes T)_{d}$ denotes the discrete part of the measure $\gamma\boxtimes T$ . If the random variable $T$ has a density with respect to the Lebesgue measure, then,

$$\text{Supp}(\gamma\boxtimes T)=[\lambda_{l},\lambda_{r}].$$

Proof.

We first claim that $(\lambda_{l},\lambda_{r})\subset\text{Supp}(\gamma\boxtimes T)$ . Since the support of a measure is closed, this means that $[\lambda_{l},\lambda_{r}]\subset\text{Supp}(\gamma\boxtimes T)$ . We prove this claim by contradiction. Suppose that $\exists\lambda\in(\lambda_{l},\lambda_{r})$ such that $\lambda\not\in\text{Supp}(\gamma\boxtimes T)$ . To simplify notation, for $z\in\mathbb{C}^{-}$ , we introduce the following reciprocal subordination function $\tau_{T}(z)$

$$\tau_{T}(z)\overset{\scriptscriptstyle\Delta}{=}\frac{1}{w_{T}(1/z)}.$$

According to Lemma 5, we have

[TABLE]

By Lemma 9, $\tau_{T}(\lambda-i\epsilon)$ uniquely solves the equation $\Lambda(\tau)=\lambda-i\epsilon$ in $\mathbb{C}^{-}$ . Taking $\epsilon\rightarrow 0$ , we obtain,

[TABLE]

In the step marked (a), we used the fact that since $\lim_{\epsilon\rightarrow 0^{+}}\tau_{T}(\lambda-i\epsilon)\not\in\text{Supp}(T)$ , we have $\exists c>0$ , such that for any $\epsilon$ small enough $\text{dist}(\tau_{T}(\lambda-i\epsilon),\text{Supp}(T))\geq c$ . This gives us a dominating function for an application of the dominated convergence theorem. Hence, we have found a solution for the equation $\lambda=\Lambda(\tau),\tau\in(-\infty,0)\cup(1,\infty)$ . But this contradicts Lemma 10. Hence, we have, $(\lambda_{l},\lambda_{r})\subset\text{Supp}(\gamma\boxtimes T)$ .

Next, we claim that any $x\in[0,\lambda_{l})\cup(\lambda_{r},\infty)$ is not in the support of the absolutely continuous part of $\gamma\boxtimes T$ . To show this, we first compute a first order asymptotic expansion of $\tau_{T}(x-i\epsilon)$ for $\epsilon\approx 0$ . From Lemma 10, we know there exists a unique solution of the equation $\Lambda(\tau)=x$ with $\tau\in(-\infty,0)\cup(1,\infty)$ and ${\Lambda}^{\prime}(\tau)>0$ . We denote this solution by $\tau_{\star}$ . Since $\tau_{\star}\not\in\text{Supp}(T)$ , the function $\Lambda(\tau)$ is analytic in a neighborhood (in $\mathbb{C}$ ) of $\tau_{\star}$ . The implicit function theorem guarantees us a solution $\tau(\epsilon)=\tau_{R}(\epsilon)+i\tau_{I}(\epsilon)$ of the equation $\Lambda(\tau)=x-i\epsilon$ . However, this $\tau(\epsilon)$ may not be the reciprocal subordination function $\tau_{T}(x-i\epsilon)$ since we still need to verify that it is in $\mathbb{C}^{-}$ . To take care of this, again by the implicit function theorem we have

[TABLE]

This gives us

[TABLE]

Hence, we have

[TABLE]

This verifies that $\tau(\epsilon)\in\mathbb{C}^{-}$ for $\epsilon$ small enough. Finally since $\tau_{T}(x-i\epsilon)$ is the unique solution to the equation $\Lambda(\tau)=x-i\epsilon$ in $\mathbb{C}^{-}$ , we have

[TABLE]

According to the Stieltjes Inversion Formula, Theorem 2, we obtain

[TABLE]

In the step marked (b), we are relying on the assumption that $\tau_{\star}\neq x$ . To verify this, we recall that $\tau_{\star}$ solves $\Lambda(\tau_{\star})=x$ and $\tau_{\star}\not\in[0,1]$ . This means that

[TABLE]

Hence, we have shown

[TABLE]

This implies,

[TABLE]

Taking complements, we have $\text{Supp}((\gamma\boxtimes T)_{ac})\subset[\lambda_{l},\lambda_{r}]$ . Hence, we have shown that

[TABLE]

Therefore, $\text{Supp}(\gamma\boxtimes T)=[\lambda_{l},\lambda_{r}]\cup\text{Supp}((\gamma\boxtimes T)_{d})$ which proves the claim of the proposition. Finally, when $T$ has a density with respect to Lebesgue measure, Theorem 4 gives us $\text{Supp}((\gamma\boxtimes T)_{d})=\emptyset$ which yields the second claim in the proposition. ∎
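The implicit-function-theorem step in the proof above says that, to first order, $\tau(\epsilon)\approx\tau_{\star}-i\epsilon/\Lambda^{\prime}(\tau_{\star})$ , which lies in $\mathbb{C}^{-}$ because $\Lambda^{\prime}(\tau_{\star})>0$ . The following sketch checks this numerically. It rests on assumptions that are not from the paper: the hypothetical trimming function $\mathcal{T}(y)=y^{2}/(1+y^{2})$ , $\delta=2$ , the target value $x=2$ , Gauss-Laguerre quadrature for the expectations, and hand-written bisection and Newton solvers (`solve_real`, `solve_complex` are illustrative helper names).

```python
import numpy as np

# Illustrative assumptions (not from the paper): T = E / (delta + E), E ~ Exp(1).
DELTA = 2.0
C = 1.0 - 1.0 / DELTA
X, W = np.polynomial.laguerre.laggauss(80)  # rule for int_0^inf e^{-x} f(x) dx
T = X / (DELTA + X)


def moments(tau):
    """Return (E[G], E[G^2]) with G = 1 / (tau - T); tau may be complex."""
    g = 1.0 / (tau - T)
    return np.sum(W * g), np.sum(W * g ** 2)


def Lam(tau):
    m1, _ = moments(tau)
    return tau - C / m1


def dLam(tau):
    # Uses d/dtau E[G] = -E[G^2], so Lambda' = 1 - C * E[G^2] / E[G]^2.
    m1, m2 = moments(tau)
    return 1.0 - C * m2 / m1 ** 2


def solve_real(x, lo=1.0, hi=50.0):
    """Bisection for Lambda(tau) = x on [lo, hi], where Lambda is increasing here."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Lam(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_complex(target, tau0):
    """Newton iteration for Lambda(tau) = target, started at the real root."""
    tau = complex(tau0)
    for _ in range(50):
        tau = tau - (Lam(tau) - target) / dLam(tau)
    return tau


x, eps = 2.0, 1e-3
tau_star = solve_real(x)
tau_eps = solve_complex(x - 1j * eps, tau_star)
first_order = tau_star - 1j * eps / dLam(tau_star)

print("tau_star =", tau_star)
print("Im tau(eps) =", tau_eps.imag)
print("|tau(eps) - first-order prediction| =", abs(tau_eps - first_order))
```

The printed imaginary part is negative and the gap to the first-order prediction is of order $\epsilon^{2}$ , in line with the expansion used in the proof.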

Finally we note that in order to apply Theorem 3, it is necessary to understand the set ${\tau_{T}^{-1}(\{\theta\})\cap(\mathbb{R}\backslash{\text{Supp}(\gamma\boxtimes T)}})$ , $\theta\in\mathbb{R}$ (See Theorem 3 for a definition of $\tau_{T}$ ). This is done in the following lemma.

Lemma 11.

Let $(w_{\gamma},w_{T})$ denote the subordination functions corresponding to the free multiplicative convolution of $\gamma,\mathcal{L}_{T}$ . Define

[TABLE]

Then, we have

[TABLE]

where $\tau_{l}\triangleq\arg\max_{\tau\leq 0}\Lambda(\tau)$ , $\tau_{r}\triangleq\arg\min_{\tau\geq 1}\Lambda(\tau)$ .

Proof.

From Proposition 5, we know that $\text{Supp}(\gamma\boxtimes T)=[\lambda_{l},\lambda_{r}]$ , where $\lambda_{l}\overset{\scriptscriptstyle\Delta}{=}\max_{\tau\leq 0}\Lambda(\tau)$ and $\lambda_{r}\overset{\scriptscriptstyle\Delta}{=}\min_{\tau\geq 1}\Lambda(\tau)$ . Furthermore, we showed that for any $x\not\in[\lambda_{l},\lambda_{r}]$ , the reciprocal subordination function $\tau_{T}(x)$ is the unique solution to the equations: $\Lambda(\tau)=x,\Lambda^{\prime}(\tau)>0,\;\tau\not\in[0,1]$ . From Lemma 10, we know that when $x>\lambda_{r}$ , the unique solution to $\Lambda(\tau)=x,\Lambda^{\prime}(\tau)>0$ satisfies $\tau>\tau_{r}$ , and when $x<\lambda_{l}$ , the unique solution satisfies $\tau<\tau_{l}$ . These considerations immediately yield the claim of the lemma. ∎

IV-E Proof of Lemmas 3 and 4

Recall we defined $\Lambda_{+}(\tau)$ as

[TABLE]

where $T=\mathcal{T}(|Z|/\sqrt{\delta})$ and $Z\sim\mathcal{CN}(0,1)$ , and

[TABLE]

We first prove Lemma 3, which we restate below for convenience.

Lemma 3. Let $\vartheta_{c}\overset{\scriptscriptstyle\Delta}{=}\left(1-\left(\mathbb{E}\left[\frac{|Z|^{2}}{1-T}\right]\right)^{-1}-\mathbb{E}[|Z|^{2}T]\right)^{-1}$ . Define the function $\theta(\vartheta)$ as:

• When $\vartheta>\vartheta_{c}$ : Let $\theta(\vartheta)$ be the unique value of $\lambda$ that satisfies the equation:

[TABLE]

in the interval:

[TABLE]

• When $\vartheta\leq\vartheta_{c}$ : $\theta(\vartheta)\overset{\scriptscriptstyle\Delta}{=}1$ .

Then, we have $L_{m}(\vartheta)\overset{\text{\tiny{a.s.}}}{\rightarrow}\Lambda_{+}(\theta(\vartheta))$ , where $L_{m}(\vartheta)$ is defined in (7).

Proof.

In Proposition 4, we obtained an asymptotic characterization of the spectrum of $\bm{E}(\vartheta)$ . More specifically, we proved that

[TABLE]

We recall the matrix $\bm{R}$ was defined as

[TABLE]

In particular, $\mu_{\bm{R}}\overset{\text{\tiny{d}}}{\rightarrow}\gamma$ , where the measure $\gamma$ is given by

$$\gamma=\frac{1}{\delta}\delta_{1}+\left(1-\frac{1}{\delta}\right)\delta_{0}.$$

Applying Theorem 3, we obtain:

1. The spectral measure of $\bm{E}(\vartheta)\bm{H}_{m-1}\bm{R}\bm{H}_{m-1}^{\mathsf{H}}$ converges to:

[TABLE]

2. For any $\epsilon>0$ , we have, almost surely, for $m$ large enough, that $\sigma(\bm{E}(\vartheta)\bm{H}_{m-1}\bm{R}\bm{H}_{m-1}^{\mathsf{H}})\subset K_{\epsilon}$ , where $K_{\epsilon}$ is the $\epsilon$ -neighborhood of the set $K=\text{Supp}(\gamma\boxtimes\mathcal{L}_{T})\cup\tau_{T}^{-1}(\{\theta(\vartheta)\})$ .

3. For any $\lambda\in\tau_{T}^{-1}(\{\theta(\vartheta)\})\cap(\mathbb{R}\backslash\text{Supp}(\gamma\boxtimes\mathcal{L}_{T}))$ , we have almost surely exactly one eigenvalue of $\bm{E}(\vartheta)\bm{H}_{m-1}\bm{R}\bm{H}_{m-1}^{\mathsf{H}}$ in a small enough neighborhood of $\lambda$ for large enough $n$ .

In Proposition 5, we characterized $\text{Supp}(\gamma\boxtimes\mathcal{L}_{T})$ as $[\lambda_{l},\lambda_{r}]$ , where $\lambda_{l}=\max_{\tau\leq 0}\Lambda(\tau)$ , $\lambda_{r}=\min_{\tau\geq 1}\Lambda(\tau)$ and the function $\Lambda(\tau)$ is given by:

$$\Lambda(\tau)=\tau-\frac{(1-1/\delta)}{\mathbb{E}\left[\frac{1}{\tau-T}\right]}.$$

In Lemma 11, we characterized the set:

[TABLE]

where, $\tau_{l}\triangleq\arg\max_{\tau\leq 0}\Lambda(\tau)$ , $\tau_{r}\triangleq\arg\min_{\tau\geq 1}\Lambda(\tau)$ . Putting these together, one obtains the following two cases:

Case 1: $\theta(\vartheta)\leq\tau_{r}.$ In this case, the set $\tau_{T}^{-1}(\{\theta\})\cap(\mathbb{R}\backslash{\text{Supp}(\gamma\boxtimes T)})=\emptyset$ . The matrix $\bm{E}(\vartheta)\bm{H}_{m-1}\bm{R}\bm{H}_{m-1}^{\mathsf{H}}$ has no eigenvalues outside the support of the bulk distribution, and

[TABLE]

Case 2: $\theta(\vartheta)>\tau_{r}.$ In this case, the set

[TABLE]

Hence, there is an eigenvalue in the neighborhood of $\Lambda(\theta(\vartheta))$ . Since $\theta(\vartheta)>\tau_{r}$ , and $\Lambda$ is a strictly increasing function on $[\tau_{r},\infty)$ (Lemma 10), we have $\Lambda(\theta(\vartheta))>\lambda_{r}$ . Hence the eigenvalue in the neighborhood of $\Lambda(\theta(\vartheta))$ is the largest one, and we have

[TABLE]

It is now straightforward to check that the above two cases can be combined into a concise form stated in the claim of the lemma. ∎

We end this section by proving Lemma 4, restated below for convenience.

Lemma 4. The following hold for the equation:

[TABLE]

1. This equation has a unique solution.

2. Let $\vartheta_{\star}$ denote the solution of the above equation. Then:

Case 1

If $\psi_{1}(\tau_{r})\leq\frac{\delta}{\delta-1},$ we have

[TABLE]

Furthermore if $\psi_{1}(\tau_{r})<\delta/(\delta-1)$ , then,

[TABLE]

Case 2

If $\psi_{1}(\tau_{r})>\frac{\delta}{\delta-1},$ we have

[TABLE]

and,

[TABLE]

where $\theta_{\star}>1$ is the unique $\theta\geq\tau_{r}$ that satisfies $\psi_{1}(\theta)=\frac{\delta}{\delta-1}.$

Proof.

Before we begin the proof of this lemma, it is helpful to list the conclusions of some of the previous lemmas.

Lemma 8: In this lemma, for $\vartheta>\vartheta_{c}$ we defined the function $\theta(\vartheta)$ as the unique value of $\lambda>\max(1,\mathbb{E}[|Z|^{2}T]+1/\vartheta)$ that satisfies

[TABLE]

We also set $\theta(\vartheta)=1$ when $\vartheta\leq\vartheta_{c}$ . We also showed that $\theta(\vartheta)$ is strictly increasing on $[\vartheta_{c},\infty)$ and $\theta(\infty)=\infty$ . In particular $\theta(\vartheta)$ has a well defined inverse defined on the domain $[1,\infty)$ given by:

[TABLE]

Lemma 10: We defined the function $\Lambda(\tau)$ as

$$\Lambda(\tau)=\tau-\frac{(1-1/\delta)}{\mathbb{E}\left[\frac{1}{\tau-T}\right]}.$$

We showed that $\Lambda(\tau)$ is strictly convex on $[1,\infty)$ . We defined $(\tau_{r},\lambda_{r})$ to be the minimizing argument and the minimum value of $\Lambda(\tau)$ in $[1,\infty)$ . In particular $\tau_{r}\geq 1$ . We also showed that $\Lambda(\infty)=\infty$ . We further defined $\Lambda_{+}(\tau)$ in the following way:

[TABLE]

Some simple implications of the above assertions are: First, since $\theta(\vartheta)$ and $\Lambda_{+}$ are both non-decreasing continuous functions, $\Lambda_{+}(\theta(\vartheta))$ is non-decreasing and continuous. Second, since $\Lambda_{+}(\tau)=\lambda_{r}$ for $\tau\leq\tau_{r}$ , we have, for all $\vartheta\leq\theta^{-1}(\tau_{r})$ , $\Lambda_{+}(\theta(\vartheta))=\lambda_{r}$ . Third, since $\theta(\infty)=\infty$ and $\Lambda(\infty)=\infty$ , we have $\Lambda_{+}(\theta(\vartheta))\rightarrow\infty$ as $\vartheta\rightarrow\infty$ . The only possible point of non-differentiability of $\Lambda_{+}(\theta(\vartheta))$ is at $\vartheta=\theta^{-1}(\tau_{r})$ . It is straightforward to compute the derivative of $\Lambda_{+}(\theta(\vartheta))$ at all other points using the implicit function theorem and obtain

[TABLE]

The derivatives of $\Lambda,\theta$ can be calculated as,

[TABLE]

A representative plot of the function $\Lambda_{+}(\theta(\vartheta))$ is shown in Figure 3.

We are now in a position to prove the claims of the lemma.

Since $\Lambda_{+}(\theta(\vartheta))$ is continuous and non-decreasing and $1/\vartheta+\mathbb{E}[|Z|^{2}T]$ is continuous and strictly decreasing, the fixed point equation can have at most one solution. On the other hand, comparing the values of the two sides of the fixed point equation as $\vartheta\rightarrow 0$ and $\vartheta\rightarrow\infty$ shows that there is at least one solution.

Let $\vartheta_{\star}$ denote the solution of the fixed point equation $\Lambda_{+}(\theta(\vartheta))=1/\vartheta+\mathbb{E}[|Z|^{2}T]$ . A typical plot of these two functions is shown in Figure 3. The figure shows two possible cases for the intersection of the two curves: Case 1: The curves intersect at a point $\vartheta_{\star}\leq\theta^{-1}(\tau_{r})$ (that is, on the flat part of $\Lambda_{+}(\theta(\vartheta))$ ). In this case we have $\Lambda_{+}(\theta(\vartheta_{\star}))=\lambda_{r}$ .

Case 2: The curves intersect at a point $\vartheta_{\star}>\theta^{-1}(\tau_{r})$ (that is, on the rising part of $\Lambda_{+}(\theta(\vartheta))$ ). We have $\Lambda_{+}(\theta(\vartheta_{\star}))>\lambda_{r}$ . We can distinguish between the two cases by comparing the value of the function $1/\vartheta+\mathbb{E}[|Z|^{2}T]$ at $\vartheta=\theta^{-1}(\tau_{r})$ with $\lambda_{r}$ . In particular, we have,

Case 1:

[TABLE]

Case 2:

[TABLE]

Substituting the formula for $\theta^{-1}(\tau_{r})$ , mentioned in (22), and $\lambda_{r}=\Lambda(\tau_{r})$ and the formula for $\Lambda$ from (23), the 2 cases can be simplified slightly more.

Case 1: This case occurs when

[TABLE]

In this situation, we have, $\Lambda_{+}(\theta(\vartheta_{\star}))=\lambda_{r}$ . Furthermore, if we additionally have

[TABLE]

then $\Lambda_{+}(\theta(\vartheta))$ is differentiable at $\vartheta_{\star}$ and, from (24), we have

[TABLE]

Case 2: This case occurs when

[TABLE]

In this situation, we have, $\Lambda_{+}(\theta(\vartheta_{\star}))>\lambda_{r}$ . It turns out that we can give a simpler expression for $\Lambda_{+}(\theta(\vartheta_{\star}))$ . In this case, $\vartheta_{\star}\geq\theta^{-1}(\tau_{r})$ solves,

[TABLE]

and $\theta(\vartheta_{\star})\geq 1$ is the solution of the equation

[TABLE]

By definition, the value $\Lambda(\theta(\vartheta_{\star}))$ is

[TABLE]

We first eliminate $\vartheta_{\star}$ from Equations (27)-(29) and conclude that $\theta_{\star}\overset{\scriptscriptstyle\Delta}{=}\theta(\vartheta_{\star})$ solves

[TABLE]

and $\vartheta_{\star}$ is given by

[TABLE]

Since the solution to Equations (27)-(29) was guaranteed to be unique, the solution to (30) is guaranteed to be unique. Finally we can compute the derivative of $\Lambda_{+}(\theta(\vartheta))$ at $\vartheta=\vartheta_{\star}$ . It will be convenient to introduce the random variable $G=(\theta_{\star}-T)^{-1}$ to write the equations in a compact form. From (24)-(26), we have

[TABLE]

In the above display, in the step marked (a) we used the fact that $\theta_{\star}$ satisfies $\psi_{1}(\theta_{\star})=\delta/(\delta-1)$ . This concludes the proof of the characterization (2) given in the statement of the lemma.

∎

V Conclusions

We analyzed the asymptotic performance of a spectral method for phase retrieval under a random column-orthogonal matrix model. Our results provide a rigorous justification for the conjectures in [11], which were obtained by analyzing an expectation propagation algorithm.

Appendix A Proof of Proposition 2

This section is devoted to the proof of Proposition 2. We denote the functions $\Lambda,\psi_{1},\psi_{2},\psi_{3}$ (recall (5)) with ${\mathcal{T}=\mathcal{T}_{\mathsf{opt}}}$ as $\Lambda_{\mathsf{opt}},\psi_{1}^{\mathsf{opt}},\psi_{2}^{\mathsf{opt}},\psi_{3}^{\mathsf{opt}}$ and those with $\mathcal{T}=\mathcal{T}_{\mathsf{opt},\epsilon}$ as $\Lambda_{\epsilon},\psi_{1}^{\epsilon},\psi_{2}^{\epsilon},\psi_{3}^{\epsilon}$ . Define the random variables:

[TABLE]

Next we observe that the function $\mathcal{T}_{\mathsf{opt},\epsilon}$ is a bounded, strictly increasing, Lipschitz function and consequently $T_{\epsilon}$ has a density with respect to the Lebesgue measure. Hence, by the rescale and shift argument outlined in Remark 2, Theorem 1 applies to an equivalent modification of $\mathcal{T}_{\mathsf{opt},\epsilon}$ , which can be used to infer the corresponding result for $\mathcal{T}_{\mathsf{opt},\epsilon}$ (after another rescale and shift argument). This gives us the result:

[TABLE]

where $\tau_{r}^{\epsilon}\overset{\scriptscriptstyle\Delta}{=}\operatorname*{arg\,min}_{\tau\in[1,\infty)}\Lambda_{\epsilon}(\tau)$ and $\theta_{\star}^{\epsilon}$ is the solution to the fixed point equation (in $\tau$ ): $\psi_{1}^{\epsilon}(\tau)=\delta/(\delta-1)$ , which is guaranteed to exist uniquely provided $\psi_{1}^{\epsilon}(\tau_{r}^{\epsilon})>\delta/{(\delta-1)}$ . First we observe that,

[TABLE]

In particular, at $\tau=1$ , we have,

[TABLE]

and,

[TABLE]

We consider the following two cases.

Case 1: $1<\delta<2$ . Lemma 10 shows that $\Lambda_{\epsilon}(\tau)$ is convex on $[1,\infty)$ . When $\delta<2$ , $\Lambda^{\prime}_{\epsilon}(1)>0$ for $\epsilon$ small enough, and hence $\Lambda_{\epsilon}$ is strictly increasing and $\tau_{r}^{\epsilon}=1$ . Moreover, in this case, for $\epsilon$ small enough,

[TABLE]

Hence, using (31),

[TABLE]

Case 2: $\delta>2$ . In this case, for small enough $\epsilon$ , $\Lambda^{\prime}_{\epsilon}(1)<0$ . Hence $\tau_{r}^{\epsilon}$ , the minimizer of the convex function $\Lambda_{\epsilon}$ , lies in the region $(1,\infty)$ . This means it satisfies the optimality condition:

[TABLE]

Next we claim that, $\forall\tau\in[1,\infty)$ ,

[TABLE]

which is a consequence of Chebychev’s association inequality (Fact 1) with the choice:

[TABLE]

In particular we have $\psi_{1}^{\epsilon}(\tau_{r}^{\epsilon})>\delta/(\delta-1)$ , and hence Theorem 1 gives us:

There exists a unique solution $\theta_{\star}^{\epsilon}\in(\tau_{r}^{\epsilon},\infty)$ such that $\psi_{1}^{\epsilon}(\theta_{\star}^{\epsilon})=\delta/(\delta-1)$ ,

and,

[TABLE]

Next we claim that,

[TABLE]

To see this, observe

[TABLE]

If $\liminf_{\epsilon\downarrow 0}\theta_{\star}^{\epsilon}=1$ , one can select a subsequence along which $\psi_{1}^{\epsilon}(\theta_{\star}^{\epsilon})\rightarrow\mathbb{E}|Z|^{4}=2$ by dominated convergence, which contradicts $\psi_{1}^{\epsilon}(\theta_{\star}^{\epsilon})=\delta/(\delta-1)<2$ . Likewise, if $\limsup_{\epsilon\downarrow 0}\theta_{\star}^{\epsilon}=\infty$ , one can find a subsequence along which $\theta_{\star}^{\epsilon}\rightarrow\infty$ and, by dominated convergence,

[TABLE]

which contradicts $\psi_{1}^{\epsilon}(\theta_{\star}^{\epsilon})=\delta/(\delta-1)>1\;\forall\;\delta\;\in\;(2,\infty)$ . We can now conclude that,

[TABLE]

where $\theta_{\star}^{\mathsf{opt}}$ is the unique solution to $\psi_{1}^{\mathsf{opt}}(\tau)=\delta/(\delta-1)$ in $\tau\in(1,\infty)$ guaranteed by Proposition 1 (due to [11]). This is because, by selecting a subsequence along which ${\theta_{\star}^{\epsilon}\rightarrow\liminf_{\epsilon\downarrow 0}\theta_{\star}^{\epsilon}}$ , we can conclude that, along that subsequence,

[TABLE]

This implies,

[TABLE]

and analogously,

[TABLE]

Since Proposition 1 guarantees that the equation ${\psi_{1}^{\mathsf{opt}}(\tau)=\delta/(\delta-1)}$ has a unique solution in $(1,\infty)$ we get,

[TABLE]

Dominated convergence now yields,

[TABLE]

and consequently, almost surely,

[TABLE]

The right hand side of the above display can be simplified to:

[TABLE]

This clean formula is due to [11] and we refer the reader to Appendix B in [11] for a proof.

Appendix B Miscellaneous results

Fact 1 (Chebychev Association Inequality, [27]).

Let $A,B$ be r.v.s and $B\geq 0$ . Suppose $f,g$ are two non-decreasing functions. Then,

$$\mathbb{E}[B]\cdot\mathbb{E}[B\,f(A)\,g(A)]\geq\mathbb{E}[B\,f(A)]\cdot\mathbb{E}[B\,g(A)].$$

Furthermore, if, $\mathbb{P}\mathinner{\left(B=0\right)}=0$ and,

[TABLE]

then, the above inequality is strict.

Proof.

The proof of the inequality appears in [27]. Inspecting the proof we can derive a sufficient condition for the inequality to be strict. The proof in [27] shows,

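$$\mathbb{E}[B]\cdot\mathbb{E}[B\,f(A)\,g(A)]-\mathbb{E}[B\,f(A)]\cdot\mathbb{E}[B\,g(A)]=\frac{1}{2}\,\mathbb{E}\left[B\,B^{\prime}\,(f(A)-f(A^{\prime}))\cdot(g(A)-g(A^{\prime}))\right],$$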

where $(B^{\prime},A^{\prime})$ is an independent copy of the random variables $(B,A)$ . Since $f,g$ are non-decreasing, $(f(A)-f(A^{\prime}))\cdot(g(A)-g(A^{\prime}))\geq 0$ , and $B\geq 0,B^{\prime}\geq 0$ . Hence equality holds iff:

[TABLE]

which is ruled out by the assumptions of the claim. ∎
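The symmetrization argument can be checked on a small example. The sketch below assumes the standard form of the association inequality, $\mathbb{E}[B]\,\mathbb{E}[Bf(A)g(A)]\geq\mathbb{E}[Bf(A)]\,\mathbb{E}[Bg(A)]$ , and uses a hypothetical four-atom joint law for $(A,B)$ together with $f(a)=a$ and $g(a)=a^{3}$ ; none of these specific choices come from the paper. It computes both sides exactly and compares their gap with the symmetrized expectation over an independent copy $(A^{\prime},B^{\prime})$ .

```python
import numpy as np

# Hypothetical finite joint law of (A, B) with B >= 0, chosen only for illustration.
A = np.array([-1.0, 0.0, 0.5, 2.0])
B = np.array([0.5, 2.0, 1.0, 0.1])
P = np.array([0.1, 0.4, 0.3, 0.2])  # probabilities of the four atoms (sum to 1)


def f(a):
    return a        # non-decreasing


def g(a):
    return a ** 3   # non-decreasing


def E(values):
    """Exact expectation under the finite law P."""
    return float(np.sum(P * values))


lhs = E(B) * E(B * f(A) * g(A))
rhs = E(B * f(A)) * E(B * g(A))

# Symmetrized form: (1/2) E[B B' (f(A) - f(A')) (g(A) - g(A'))], where (A', B')
# is an independent copy, so the joint weights are the outer product of P.
PP = np.outer(P, P)
BB = np.outer(B, B)
df = f(A)[:, None] - f(A)[None, :]
dg = g(A)[:, None] - g(A)[None, :]
sym = 0.5 * float(np.sum(PP * BB * df * dg))

print("lhs - rhs =", lhs - rhs)
print("symmetrized form =", sym)
```

Every term of the symmetrized sum is non-negative, which is why the gap can vanish only in the degenerate situation excluded by the strictness condition.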

Acknowledgments

We would like to thank Professor Serban Belinschi for discussions about free probability and Professor Tomoyuki Obuchi for discussions about the replica method. We acknowledge support from NSF DMS-1810888 and the Google faculty award.

Bibliography

[1] Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev. Phase retrieval with application to optical imaging: A contemporary overview. IEEE Signal Processing Magazine, 32(3):87–109, May 2015.
[2] E. J. Candes, X. Li, and M. Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985–2007, April 2015.
[3] Yuxin Chen and E. J. Candes. Solving random quadratic systems of equations is nearly as easy as solving linear systems. Communications on Pure and Applied Mathematics, 70:822–883, May 2017.
[4] G. Wang, G. B. Giannakis, and Y. C. Eldar. Solving systems of random quadratic equations via truncated amplitude flow. IEEE Transactions on Information Theory, 64(2):773–794, Feb 2018.
[5] Huishuai Zhang and Yingbin Liang. Reshaped Wirtinger flow for solving quadratic system of equations. In Advances in Neural Information Processing Systems, pages 2622–2630, 2016.
[6] Praneeth Netrapalli, Prateek Jain, and Sujay Sanghavi. Phase retrieval using alternating minimization. In Advances in Neural Information Processing Systems, pages 2796–2804, 2013.
[7] Yue M. Lu and Gen Li. Phase transitions of spectral initialization for high-dimensional nonconvex estimation. Information and Inference, to appear, 2018.
[8] Marco Mondelli and Andrea Montanari. Fundamental limits of weak recovery with applications to phase retrieval. Foundations of Computational Mathematics, pages 1–71, 2017.
