Analysis of Spectral Methods for Phase Retrieval with Random Orthogonal Matrices
Rishabh Dudeja, Milad Bakhshizadeh, Junjie Ma, Arian Maleki

Department of Statistics, Columbia University
Abstract
Phase retrieval refers to algorithmic methods for recovering a signal from its phaseless measurements. There has been recent interest in understanding the performance of local search algorithms that work directly on the non-convex formulation of the problem. Due to the non-convexity of the problem, the success of these local search algorithms depends heavily on their starting points. The most widely used initialization scheme is the spectral method, in which the leading eigenvector of a data-dependent matrix is used as a starting point. Recently, the performance of the spectral initialization was characterized accurately for measurement matrices with independent and identically distributed entries. This paper aims to obtain the same level of knowledge for isotropically random column-orthogonal matrices, which are substantially better models for practical phase retrieval systems. Towards this goal, we consider the asymptotic setting in which the number of measurements, $m$, and the dimension of the signal, $n$, diverge to infinity with $m/n \to \delta \in (1,\infty)$, and obtain a simple expression for the overlap between the spectral estimator and the true signal vector.
Index Terms:
Phase Retrieval, Spectral Initialization, Random Orthogonal Matrices, Coded Diffraction Pattern, Phase Transition, Random Matrix Theory.
I Introduction
Phase retrieval refers to the problem of recovering a signal $x_\star \in \mathbb{C}^n$ from a set of phaseless linear observations $y \in \mathbb{R}^m$. In the absence of measurement noise, the acquisition process is modeled as
[TABLE]
where $A \in \mathbb{C}^{m \times n}$ is the measurement matrix and $(\cdot)_i$ denotes the $i$-th element of a vector. The phase retrieval problem is intended to model practical imaging systems in which it is difficult to measure the phase of the measurements [1]. A number of recent recovery algorithms pose phase retrieval as a non-convex optimization problem and employ a local search algorithm to find the minimizer [2, 3, 4, 5]. For instance, the well-known Wirtinger Flow algorithm [2] solves the optimization problem:
[TABLE]
using gradient descent.
Since the optimization problem (1) is non-convex, the initialization can have a major impact on the success of local search algorithms. The most widely used initialization scheme, known as spectral initialization [6, 3, 4, 7, 8, 9], uses the leading eigenvector of the following data-dependent matrix:
[TABLE]
as the starting point for local search algorithms. In the above equation, $\mathcal{T}$ denotes a suitable trimming function applied to the measurements. The spectral estimator is the leading eigenvector of this matrix, normalized to have unit Euclidean ($\ell_2$) norm. That is,
[TABLE]
The earliest analyses [6, 2] of the spectral estimator showed that if the number of measurements $m$ is large enough (relative to the signal dimension $n$), then the leading eigenvector of this matrix is a consistent estimator of the true signal vector. However, these analyses had two drawbacks: (i) they only provide information about the order of the number of measurements required for a successful initialization, not a sharp requirement on the sampling ratio $\delta = m/n$; (ii) they fail to capture the differences in performance among trimming functions. Recently, Lu and Li [7] analyzed the spectral estimator for measurement matrices composed of independent and identically distributed (i.i.d.) standard normal entries in the high-dimensional asymptotic regime. More specifically, Lu and Li considered the asymptotic setting in which $m, n \to \infty$ with $m/n \to \delta$, and obtained a sharp characterization of the overlap between the leading eigenvector and the true signal. In follow-up work, Mondelli and Montanari [8] and Luo, Alghamdi and Lu [9] leveraged this characterization to design optimal trimming functions. For the optimal trimming function, the overlap converges to zero when $\delta$ is below a critical threshold, and converges to a strictly positive value otherwise.
A major assumption in the analysis of [7, 8, 9] is that the measurement matrix contains i.i.d. Gaussian entries. However, it is well known that many important applications of phase retrieval involve Fourier-type matrices [10]. This leads to the following natural questions: (i) Do the conclusions of [7, 8, 9] hold for other matrices that are employed in practice? (ii) Is the trimming function that [7, 8, 9] derived as optimal for Gaussian measurement matrices also optimal for other matrices employed in practice? In response to these questions, Ma et al. [11] considered a popular class of matrices used in phase retrieval systems, known as coded diffraction patterns (CDP) [12]. Through an extensive numerical study, the authors showed that the performance of the spectral initialization for such matrices closely approximates the performance of the spectral estimator for partial orthogonal matrices. The authors then designed an Expectation Propagation (EP) [13, 14] algorithm for the eigenvalue problem given in (3). EP algorithms had previously been proposed for partial orthogonal matrices in [15, 16], and their State Evolution (SE) had been analyzed in [17, 18]. Ma et al. used the SE of the derived EP algorithm for the eigenvalue problem to obtain a (conjectured) formula for the asymptotic overlap between the true signal vector and the spectral initialization. However, while it is believed that the EP algorithm indeed solves the eigenvalue problem (which has also been observed in simulations), this has not been shown rigorously. Based on these studies, the authors conjectured that, for partial orthogonal matrices and an optimally chosen trimming function, the asymptotic overlap vanishes when the sampling ratio $\delta$ lies below a critical value and is strictly positive above it, in the asymptotic setting where $m, n \to \infty$ with $m/n \to \delta$. As mentioned previously, the simulations in [11] suggest that these conjectures are also likely to hold for CDP matrices.
In this paper, we prove the conjectures presented in [11] for partial orthogonal matrices using tools from free probability theory [19]. We believe this is the first theoretical justification that the expectation propagation framework can correctly predict the statistical properties of the solutions to non-convex optimization problems. The main technical step in our proof is identifying the location of the largest eigenvalue using a subordination function [19]. Interestingly, this subordination function appears naturally in the expectation propagation (EP) algorithm of [11].
II Main result
II-A Notation
II-A1 For Linear Algebraic Aspects
For a matrix , refers to the conjugate transpose of . For a matrix , with real eigenvalues, we use to denote the eigenvalues arranged in descending order. We use to refer to the spectrum of which is simply the set of eigenvalues . Finally we define the spectral measure of , denoted by as,
[TABLE]
For , we denote the identity matrix by and a matrix of all zero entries by . For , we also define the special matrix as:
[TABLE]
II-A2 For Complex Analytic Aspects
For a complex number , refer to the real part, imaginary part, argument, modulus and conjugate of . We denote the complex upper half plane and lower half planes by
[TABLE]
II-A3 For Probabilistic Aspects
We use to denote the standard, circularly symmetric, complex Gaussian distribution. denotes the Haar measure on the unitary group. We denote almost sure convergence, convergence in probability and convergence in distribution by and , respectively. Two random variables are equal in distribution, denoted by , if they have the same distribution. Throughout this paper, the random variables refer to the pair of random variables with the joint distribution given by . For a Borel probability measure , we use to denote the support of .
II-A4 Miscellaneous:
Let be a subset of or . denotes the closure of . The distance from a point to is defined by . We define the neighborhood of , denoted by as
[TABLE]
The symbol is used to denote the empty set.
II-B Measurement Model and Spectral Estimator
In the phase retrieval problem we are given observations generated as:
[TABLE]
where is the unknown signal vector and is the sensing matrix. We assume that and that the matrix is generated according to the following process: Sample from the Haar measure on the unitary group and set to be the matrix formed by picking the first columns of . More formally,
[TABLE]
and is defined in (4). An important parameter for our analysis will be the sampling ratio, denoted by . Let be a trimming function. We study spectral estimators constructed as the leading eigenvector of the matrix , defined below:
[TABLE]
where and .
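The measurement model and estimator above can be simulated directly. What follows is a minimal NumPy sketch, not the authors' code. It makes several assumptions of its own:

- the noiseless model $y_i = |(Ax)_i|$;
- a Haar unitary sampled by the standard QR-with-phase-correction method;
- a hypothetical trimming function $t \mapsto t^2/(t^2+1)$, which is Lipschitz with values in $[0,1)$;
- that trimming function applied to measurements rescaled by $\sqrt{m}/\|x\|$.

The helper names `haar_unitary`, `spectral_estimator` and `squared_overlap` are ours.

```python
import numpy as np


def haar_unitary(m, rng):
    """Haar-distributed m x m unitary: QR of a complex Ginibre matrix, with the
    phases of diag(R) moved into Q so that the law is exactly Haar."""
    z = (rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # scales column j of q by the phase of r[j, j]


def spectral_estimator(a, weights):
    """Leading unit-norm eigenvector (and eigenvalue) of D = A^H diag(weights) A."""
    d = a.conj().T @ (weights[:, None] * a)
    eigvals, eigvecs = np.linalg.eigh(d)  # D is Hermitian; eigenvalues ascending
    return eigvecs[:, -1], eigvals[-1]


def squared_overlap(u, v):
    """|<u, v>|^2 / (||u||^2 ||v||^2), insensitive to the global phase of u or v."""
    return abs(np.vdot(u, v)) ** 2 / (np.linalg.norm(u) ** 2 * np.linalg.norm(v) ** 2)


def trim(s):
    """Hypothetical trimming function: Lipschitz, values in [0, 1)."""
    return s**2 / (s**2 + 1.0)


rng = np.random.default_rng(0)
n, delta = 200, 4.0
m = int(delta * n)

a = haar_unitary(m, rng)[:, :n]            # first n columns of a Haar unitary
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = np.abs(a @ x)                          # phaseless measurements

t = np.sqrt(m) * y / np.linalg.norm(x)     # mean(t**2) equals 1 since A^H A = I
x_hat, lam_max = spectral_estimator(a, trim(t))
rho2 = squared_overlap(x_hat, x)
print(f"delta = {delta}, squared overlap = {rho2:.3f}")
```

Repeating this over a grid of values of $\delta$, and averaging over seeds, gives an empirical view of the phase transition discussed after Theorem 1. Any numbers produced this way are finite-$n$ estimates, not the asymptotic limit.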
II-C Assumptions & Asymptotic Framework
We analyze the performance of the spectral estimator in an asymptotic setup where . In particular, we consider a sequence of independent phase retrieval problems realized on the same probability space with increasing . We impose the following regularity assumptions on the trimming function.
Assumption 1**.**
The trimming function satisfies the following conditions:
1. The trimming function is Lipschitz continuous.
2. The range normalization of the trimming function displayed in Remark 2.
3. The random variable , defined by and , has a density with respect to the Lebesgue measure on .
In the following remarks, we discuss why each of these assumptions is required and whether it can be relaxed.
Remark 1**.**
We need the trimming function to be Lipschitz continuous so that the trimmed measurements can be approximated in distribution by . We expect this approximation to hold under weaker smoothness hypotheses on the trimming function than Lipschitz continuity.
Remark 2**.**
The assumptions:
[TABLE]
are no stronger than the assumption that is a bounded trimming function. In fact, given any bounded trimming function with and , the spectral estimator constructed using has the same performance as the spectral estimator constructed using
[TABLE]
This is because,
[TABLE]
In particular and have the same leading eigenvector. We require the assumption that the trimming function is bounded since a number of results in free probability theory that we rely on assume this.
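Remark 2 rests on the identity $A^H A = I_n$ for a partial unitary $A$. An affine change of the trimmed values, $w \mapsto (w - c)/s$ with $s > 0$, maps the data matrix $D$ to $(D - cI)/s$, which has the same eigenvectors. The sketch below checks this numerically. The random weights are our stand-in for arbitrary bounded trimmed measurements, and the helper names are hypothetical.

```python
import numpy as np


def haar_unitary(m, rng):
    z = (rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))


def eig_data_matrix(a, weights):
    """All eigenvalues (ascending) and the top eigenvector of A^H diag(weights) A."""
    vals, vecs = np.linalg.eigh(a.conj().T @ (weights[:, None] * a))
    return vals, vecs[:, -1]


rng = np.random.default_rng(1)
m, n = 120, 40
a = haar_unitary(m, rng)[:, :n]

w = rng.uniform(-2.0, 5.0, size=m)          # arbitrary bounded trimmed values
w_min, w_max = w.min(), w.max()
w_tilde = (w - w_min) / (w_max - w_min)     # affinely normalised onto [0, 1]

vals, v = eig_data_matrix(a, w)
vals_tilde, v_tilde = eig_data_matrix(a, w_tilde)

# Since A^H A = I, D_tilde = (D - w_min * I) / (w_max - w_min) exactly.
print(np.max(np.abs(vals_tilde - (vals - w_min) / (w_max - w_min))))
print(abs(np.vdot(v, v_tilde)))             # 1 up to rounding: same eigenvector
```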
Remark 3**.**
We need condition (3) of Assumption 1 to ensure that the limiting spectral measure of the matrix has no discrete component. We expect that this assumption can be removed entirely by a careful analysis, since the locations of point masses in the limiting spectral measure of are well understood.
II-D Main Result
In order to state our main result about the performance of the spectral estimator, we need to introduce the following four functions:
[TABLE]
In the above display, the random variables have the joint distribution given by . The functions are defined on and the functions are defined on .
Remark 4**.**
Under Assumption 1, the support of the random variable is the interval . Hence, the definition of these functions at needs some clarification. First, note that the random variable . Hence, the expectation is well-defined, but may be $+\infty$. If it is finite, each of the above functions is well-defined at . If , we define . This corresponds to interpreting and in the definition of these functions.
Theorem 1**.**
Define . Also, let denote the unique value of that satisfies . Then, under Assumption 1, we have
[TABLE]
Furthermore,
[TABLE]
Remark 5**.**
The proof of Theorem 1 shows that if , there exists exactly one solution to the equation . Hence, is well-defined.
The proof of this result is postponed to Section IV. Before we proceed to the proof of this theorem, let us clarify some of its interesting features. First, note that, as for Gaussian sensing matrices, the leading eigenvector for partial orthogonal matrices exhibits a phase transition. For certain values of , the inequality holds, and hence the leading eigenvector does not carry information about . For other values of , the inequality holds, and hence the direction of the leading eigenvector starts to offer information about the direction of . For typical choices of the trimming function , there exists a critical value of , denoted by , such that when , the spectral estimator is asymptotically orthogonal to the signal vector. When , the spectral estimator makes a non-trivial angle with the signal vector. This phase transition phenomenon is illustrated in Figure 1 for 3 different choices of .
Remark 6** (Choice of Trimming function).**
The trimming functions used in Figure 1 are supported on . In particular:
- one is a translated and re-scaled version of the trimming function proposed by [8];
- another is a regularized version of the trimming function proposed by [9].
Remark 7** (Extensions to generalized linear measurements).**
While we focus on the phase retrieval problem in this paper, our results extend straightforwardly to generalized linear estimation, where the measurements are generated as follows:
[TABLE]
where denotes a conditional distribution modelling a possibly randomized output channel. Under suitable regularity assumptions on , Theorem 1 holds with the change that the joint distribution of the random variables is now given by:
[TABLE]
III Optimal Trimming Functions
Theorem 1 can be used to design the trimming function optimally in order to obtain the best possible value of . Most of the work towards this goal was already done in [11], where the result in Theorem 1 was stated as a conjecture and was used to design the optimal trimming function. In particular, [11] showed the following impossibility result.
Proposition 1** ([11]).**
Let be any trimming function for which Theorem 1 holds. Then,
[TABLE]
where,
[TABLE]
where is the solution to the equation (in ):
[TABLE]
which exists uniquely when , and the random variable is distributed as:
[TABLE]
The work [11] also provided a candidate for the optimal trimming function:
[TABLE]
They showed that if the characterization given in Theorem 1 holds for , then it achieves the asymptotic squared correlation . Unfortunately, since is unbounded, Theorem 1 does not apply to it. Extending Theorem 1 to unbounded trimming functions would likely require extending previously known results in free probability to unbounded measures, and we don’t pursue this approach in our work. Instead, we suitably modify the arguments of [11] to show that the family of bounded trimming functions:
[TABLE]
attains an asymptotic squared correlation that can be made arbitrarily close to as .
Proposition 2**.**
Let denote the spectral estimator for obtained by using as the trimming function. We have, almost surely,
[TABLE]
We provide a proof of this result in Appendix A.
The regularized trimming functions are useful not only from a theoretical point of view, to prove an achievability result, but also from a computational standpoint. In simulations, we have observed that power iterations are slow to converge when is used as the trimming function, due to the presence of large negative eigenvalues; this problem is mitigated by using with a small value of (such as or ), with a negligible degradation in performance.
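The convergence issue can be seen in a toy example. Plain power iteration converges to the eigenvalue of largest magnitude, so a large negative eigenvalue can capture it.

Shifting $D \mapsto D + cI$ fixes this. Because $A^H A = I$, the shift is equivalent to adding $c$ to the trimming function. The catch is the contraction factor $(\lambda_2 + c)/(\lambda_1 + c)$, which approaches 1 as $c$ grows. This is one way to see why very negative eigenvalues slow convergence.

The matrix below is synthetic, with a hand-chosen spectrum. It is not generated from the phase retrieval model, and `power_iteration` is our own helper.

```python
import numpy as np


def power_iteration(d, shift=0.0, iters=5000, tol=1e-12, seed=0):
    """Power iteration on d + shift*I. Returns the final unit vector, its Rayleigh
    quotient with respect to the unshifted d, and the number of iterations used."""
    rng = np.random.default_rng(seed)
    n = d.shape[0]
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    v /= np.linalg.norm(v)
    shifted = d + shift * np.eye(n)
    for k in range(1, iters + 1):
        w = shifted @ v
        w /= np.linalg.norm(w)
        done = 1.0 - abs(np.vdot(w, v)) < tol  # direction stable up to a phase
        v = w
        if done:
            break
    return v, np.vdot(v, d @ v).real, k


# Synthetic Hermitian matrix: top eigenvalue 1.0, bulk in [-0.2, 0.5], one eigenvalue -5.0.
rng = np.random.default_rng(5)
n = 50
eigs = np.concatenate([[1.0, -5.0], np.linspace(-0.2, 0.5, n - 2)])
q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
d = (q * eigs) @ q.conj().T                  # Q diag(eigs) Q^H
d = (d + d.conj().T) / 2                     # symmetrise away rounding errors

_, lam_plain, k_plain = power_iteration(d)             # locks onto -5.0
_, lam_shifted, k_shifted = power_iteration(d, 5.0)    # finds 1.0, but more slowly
print(lam_plain, lam_shifted, k_plain, k_shifted)
```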
IV Proof of Theorem 1
IV-A Roadmap
Our proof follows the general strategy taken by [7]. In this subsection, we state several key lemmas and show how they fit together in the proof of Theorem 1. First, we note that without loss of generality, for the purpose of analyzing the spectral estimator, we can assume . The following lemma supports this claim.
Lemma 1**.**
The distribution of the cosine similarity, is independent of .
Proof.
Let be an arbitrary signal vector with . Let denote the measurements, trimmed measurements and spectral estimate generated when the sensing matrix was and the signal vector was . Note that the cosine similarity is a (deterministic) function of and hence we use the notation to denote the cosine similarity when the sensing matrix is and the signal vector is .
Let be such that . We have . Next we note that is the leading eigenvector of the matrix , where we defined . Noting that is a diagonal matrix consisting of the trimmed observations , we conclude that is the spectral estimate generated when the sensing matrix was and the signal vector was . Hence, we have concluded that
[TABLE]
Next we note that was generated from the sub-sampled Haar model, that is where . Since the Haar measure on is invariant to right multiplication by unitary matrices, we have
[TABLE]
where the notation means that two random vectors have the same distributions. Consequently . Therefore, , and the distribution of is independent of . ∎
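The deterministic step of this proof can be checked numerically: rotating the signal to $e_1$ and the sensing matrix by the same unitary leaves the cosine similarity unchanged. The sketch below uses our own choices of a unit-norm signal, measurements rescaled by $\sqrt{m}$, and the hypothetical trimming function $t \mapsto t^2/(t^2+1)$. The unitary $V$ with $Ve_1 = x$ is built from a QR factorization, which is one way to realize it, not necessarily the paper's.

```python
import numpy as np


def haar_unitary(m, rng):
    z = (rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))


def leading_eigvec(a, weights):
    return np.linalg.eigh(a.conj().T @ (weights[:, None] * a))[1][:, -1]


def cos_sim(u, v):
    return abs(np.vdot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))


def trim(s):
    """Hypothetical trimming function."""
    return s**2 / (s**2 + 1.0)


rng = np.random.default_rng(2)
m, n = 300, 60
a = haar_unitary(m, rng)[:, :n]
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x /= np.linalg.norm(x)                      # unit-norm signal

# Unitary V with V e_1 = x: QR of [x | random], then fix the phase of column 0.
q, r = np.linalg.qr(np.column_stack([x, rng.standard_normal((n, n - 1))]))
v = q.copy()
v[:, 0] *= r[0, 0] / abs(r[0, 0])
e1 = np.zeros(n)
e1[0] = 1.0

y = np.abs(a @ x)                           # measurements for (A, x)
y_rot = np.abs((a @ v) @ e1)                # measurements for (AV, e_1): identical
sim = cos_sim(leading_eigvec(a, trim(np.sqrt(m) * y)), x)
sim_rot = cos_sim(leading_eigvec(a @ v, trim(np.sqrt(m) * y_rot)), e1)
print(sim, sim_rot)
```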
In light of the above lemma, in the rest of the paper we will assume . Next, we partition by separating the first column
[TABLE]
where denotes all the remaining columns of (except ). Hence we can partition in the following way:
[TABLE]
Our strategy will be to reduce questions about the spectrum of the matrix to questions about the spectrum of a matrix of the form , where is a uniformly random unitary matrix, is a random matrix independent of and is deterministic. This matrix model has been well studied in free probability [19]. The starting point of our reduction is Proposition 2 from [7], stated below.
Proposition 3** ([7]).**
Let be an arbitrary deterministic symmetric matrix partitioned as:
[TABLE]
Then, we have
[TABLE]
where , and is the unique solution to the fixed point equation . Furthermore, let be the eigenvector corresponding to the largest eigenvalue of . Then,
[TABLE]
where and denote the left and right derivatives respectively. In particular, if is differentiable at , then
[TABLE]
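The displays of Proposition 3 are missing from this copy. A standard route to such a characterization is a Schur-complement argument. Applied to a partition $M = [[a, q^T], [q, P]]$, it gives the following three facts. This is our reconstruction, an assumption rather than the exact form in [7]:

- $L(\mu) = \lambda_1(P + qq^T/(\mu - a))$ for $\mu > a$;
- $\lambda_1(M)$ is the unique fixed point $\mu^\star = L(\mu^\star)$;
- the leading unit eigenvector $v_1$ of $M$ satisfies $(e_1^T v_1)^2 = L'(\mu^\star)/(L'(\mu^\star) - 1)$.

The sketch checks these identities on a random symmetric matrix. It finds the fixed point by bisection, which is valid because $L(\mu) - \mu$ is strictly decreasing. It computes $L'$ from the Hellmann-Feynman formula.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
g = rng.standard_normal((n, n))
m_full = (g + g.T) / 2                      # symmetric matrix [[a, q^T], [q, P]]
a11, q, p = m_full[0, 0], m_full[1:, 0], m_full[1:, 1:]


def top_eig(h):
    vals, vecs = np.linalg.eigh(h)
    return vals[-1], vecs[:, -1]


def L(mu):
    """Assumed form: largest eigenvalue of P + q q^T / (mu - a11), for mu > a11."""
    return top_eig(p + np.outer(q, q) / (mu - a11))[0]


# L(mu) - mu is strictly decreasing on (a11, inf): positive at lo, negative at hi.
lo = a11 + 1e-12
hi = max(a11, top_eig(p)[0]) + np.linalg.norm(q) + 1.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if L(mid) > mid:
        lo = mid
    else:
        hi = mid
mu_star = 0.5 * (lo + hi)

# Hellmann-Feynman: L'(mu) = -(q^T w)^2 / (mu - a11)^2, w the top eigenvector at mu.
w = top_eig(p + np.outer(q, q) / (mu_star - a11))[1]
l_prime = -(q @ w) ** 2 / (mu_star - a11) ** 2
predicted = l_prime / (l_prime - 1.0)

lam1, v1 = top_eig(m_full)
print(mu_star, lam1)                         # fixed point vs. largest eigenvalue
print(predicted, v1[0] ** 2)                 # predicted vs. actual (e_1^T v_1)^2
```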
Applying the above proposition to our problem yields the following corollary. Define the function
[TABLE]
Corollary 1**.**
Let be the unique solution of . Then, and
[TABLE]
In particular, if is differentiable at , then
[TABLE]
Hence, we shift our focus to characterizing the function . Recall the decomposition of the matrix given in (6). Recall that since , the diagonal matrix is a deterministic function of . If the sensing matrix consisted of independent Gaussian entries, then would have been independent of . This is no longer true when is a partial unitary matrix. In order to take care of this, the following lemma leverages a conditioning trick to get rid of the dependence. The following lemma also establishes the link between the function and the study of the spectrum of a matrix of the form , where is a uniformly random unitary matrix, is a random matrix independent of and is deterministic.
Lemma 2**.**
We have
[TABLE]
where
[TABLE]
* is an arbitrary basis matrix for , which denotes the subspace orthogonal to , and is independent of .*
Proof.
We condition on . Conditioned on , we can realize as:
[TABLE]
In the above equation, is a matrix whose columns form an orthonormal basis of the orthogonal complement of , and is a Haar unitary matrix of size independent of . Hence, we obtain
[TABLE]
In the step marked (a), we used the fact that for any two matrices (of appropriate dimensions), and have the same non-zero eigenvalues. In particular, we used this fact with:
[TABLE]
∎
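Step (a) relies on the fact that $BC$ and $CB$ have the same non-zero eigenvalues, with multiplicities, whenever both products are defined. Below is a quick numerical illustration with random rectangular complex matrices of our choosing.

```python
import numpy as np

rng = np.random.default_rng(4)
p_dim, r_dim = 7, 3
b = rng.standard_normal((p_dim, r_dim)) + 1j * rng.standard_normal((p_dim, r_dim))
c = rng.standard_normal((r_dim, p_dim)) + 1j * rng.standard_normal((r_dim, p_dim))

eig_bc = np.linalg.eigvals(b @ c)           # 7 eigenvalues; rank(BC) <= 3, so >= 4 vanish
eig_cb = np.linalg.eigvals(c @ b)           # 3 eigenvalues
nonzero_bc = eig_bc[np.abs(eig_bc) > 1e-8]  # drop the numerically zero ones

print(np.sort_complex(nonzero_bc))
print(np.sort_complex(eig_cb))
```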
Define the matrix,
[TABLE]
The following lemma characterizes the asymptotic limit of the function . Define as
[TABLE]
where and , and
[TABLE]
Lemma 3**.**
Let . Define the function as:
- When : let be the unique value of that satisfies the equation:
[TABLE]
in the interval:
[TABLE]
- When : .
Then, we have , where is defined in (7).
The proof of Lemma 3 can be found in Section IV-E.
From Corollary 1, we know that solves the fixed point equation (in ): Simple concentration arguments (see Lemma 7, Section IV-C) show that asymptotically:
[TABLE]
Combining this with Lemma 3 suggests that asymptotically behaves like the solution to the following fixed point equation (in ):
[TABLE]
The following lemma analyzes the behavior of this asymptotic fixed point equation. The proof of this lemma can be found in Section IV-E.
Lemma 4**.**
The following hold for the equation:
[TABLE]
1. This equation has a unique solution.
2. Let denote the solution of the above equation. Then:
Case 1
If we have
[TABLE]
Furthermore if , then,
[TABLE]
Case 2
If we have
[TABLE]
and,
[TABLE]
*where is the unique that satisfies *
We are now in the position to prove our main result (restated below for convenience). Recall the definitions of the functions from Section II.
Theorem 1 Define . Also, let denote the unique value of that satisfies . Then, we have
[TABLE]
Furthermore,
[TABLE]
Proof.
We start with the analysis of the largest eigenvalue. We recall the claim of Corollary 1, which tells us that is given by where denotes the solution of and .
We also know that there exists a probability 1 event , on which, (Lemma 3) and (see Lemma 7 in Section IV-C).
We claim that on , , where is the solution of the limiting fixed point equation (which was analyzed in Lemma 4). To see this let . Consider a subsequence . Then applying Lemma 3 (in Appendix E) of [7], we obtain,
[TABLE]
That is, is also a solution to the limiting fixed point equation . But since this equation has a unique solution (Lemma 4), we have . Likewise, an analogous argument shows .
Now for any realization in the event , we have,
[TABLE]
In the above display, in the step marked (a), we again appealed to Lemma 3 (Appendix E) of [7] and the fact that . Finally, appealing to the alternative characterization of given in Lemma 4 gives us the claim of the theorem.
We now discuss our result about the cosine similarity. We recall that from Corollary 1, we have
[TABLE]
Appealing to Lemma 4 in Appendix E of [7], we have,
[TABLE]
The derivative of at was calculated in Lemma 4. Plugging this in the above expression gives the statement of the theorem. ∎
The remainder of this section is dedicated to the proof of Lemmas 3 and 4, and is organized as follows:
- Recall that (cf. (7))
[TABLE]
where
[TABLE]
Note that is independent of . The spectrum of such a matrix product has been studied in free probability theory, and we collect some relevant results in Section IV-B.
- In order to apply the free probability results, we need to understand the spectrum of . This is done in Section IV-C.
- It turns out that the limiting spectral measure of is given by the free convolution (defined in Section IV-B) of the measures and , where and is the law of the random variable . Section IV-D is devoted to understanding the support of the free convolution.
- Finally, Section IV-E proves Lemmas 3 and 4.
IV-B Free Probability Background
Our analysis of the spectral estimators relies on a well-studied model in the theory of free probability: we will reduce the problem to understanding the spectrum of matrices of the form , where and are deterministic matrices and is a Haar-distributed unitary matrix. Then, the limiting spectral distribution of is the free multiplicative convolution of the limiting spectral distributions of and . This section collects the relevant results and definitions, and is organized as follows. Section IV-B1 collects various facts from free harmonic analysis. Section IV-B2 describes the two fundamental results about the model that will be used throughout the paper. Section IV-B3 reviews some results about the support of the singular part of the free convolution of two measures. Throughout this section, we assume that and are two arbitrary compactly supported probability measures on and that neither of the two measures is completely concentrated at a single point.
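For intuition about this model, one can check a consequence of freeness numerically. If $a$ and $b$ are free with moments $\alpha_k$ and $\beta_k$, the first two moments of their free multiplicative convolution are $\alpha_1\beta_1$ and $\alpha_2\beta_1^2 + \alpha_1^2\beta_2 - \alpha_1^2\beta_1^2$.

The sketch below compares these with the normalized traces of $A^{1/2} U B U^H A^{1/2}$. This matrix has the same eigenvalues as $A U B U^H$ when $A$ is positive definite. $A$ and $B$ are diagonal matrices of our choosing, and $U$ is a single Haar draw, so agreement holds only up to finite-$N$ fluctuations.

```python
import numpy as np


def haar_unitary(m, rng):
    z = (rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))


rng = np.random.default_rng(6)
N = 400
a_diag = np.linspace(0.5, 2.0, N)                      # spectrum of A
b_diag = np.where(np.arange(N) < N // 2, 1.0, 3.0)     # spectrum of B
u = haar_unitary(N, rng)

sqrt_a = np.sqrt(a_diag)
h = sqrt_a[:, None] * (u @ (b_diag[:, None] * u.conj().T)) * sqrt_a[None, :]
eigs = np.linalg.eigvalsh(h)                           # h is Hermitian PSD

a1, a2 = a_diag.mean(), (a_diag**2).mean()
b1, b2 = b_diag.mean(), (b_diag**2).mean()
free_m1 = a1 * b1
free_m2 = a2 * b1**2 + a1**2 * b2 - a1**2 * b1**2
print(eigs.mean(), free_m1)
print((eigs**2).mean(), free_m2)
```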
IV-B1 Facts from Free Harmonic Analysis
In this section, we collect some facts from the field of free harmonic analysis. All these results can be found in Chapter 3 of [20] or the papers [19] and [21].
Definition 1**.**
The Cauchy transform of at is defined as follows:
[TABLE]
Definition 2**.**
The moment generating function of , at is defined as follows:
[TABLE]
The Cauchy transform and the moment generating function are related via the relation
[TABLE]
Definition 3**.**
The -transform of a measure is defined as,
[TABLE]
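The displays for Definitions 1 to 3 are missing here. We assume the usual conventions of multiplicative free harmonic analysis, since the displays themselves are unavailable:

- $G_\mu(z) = \int \frac{d\mu(t)}{z-t}$;
- $\psi_\mu(z) = \int \frac{zt}{1-zt}\,d\mu(t)$;
- $\eta_\mu(z) = \psi_\mu(z)/(1+\psi_\mu(z))$.

Under these conventions, the relation between the Cauchy transform and the moment generating function is $\psi_\mu(z) = \frac{1}{z}G_\mu(1/z) - 1$. The sketch evaluates the transforms for an empirical measure. It then checks this relation and the moment expansion $\psi_\mu(z) = \sum_{k \ge 1} m_k z^k$ near $z = 0$.

```python
import numpy as np


def cauchy_transform(atoms, z):
    """G(z): average of 1 / (z - t) over the atoms of an empirical measure."""
    return np.mean(1.0 / (z - atoms))


def psi(atoms, z):
    """Moment generating function: average of z t / (1 - z t)."""
    return np.mean(z * atoms / (1.0 - z * atoms))


def eta(atoms, z):
    """eta(z) = psi(z) / (1 + psi(z))."""
    s = psi(atoms, z)
    return s / (1.0 + s)


rng = np.random.default_rng(8)
atoms = rng.uniform(0.0, 2.0, size=1000)   # empirical measure supported in [0, 2]

z = 0.3 + 0.7j
lhs = psi(atoms, z)
rhs = cauchy_transform(atoms, 1.0 / z) / z - 1.0

z_small = 1e-3                             # psi(z) = m1 z + m2 z^2 + O(z^3) near 0
m1, m2 = atoms.mean(), (atoms**2).mean()
remainder = psi(atoms, z_small) - (m1 * z_small + m2 * z_small**2)
print(abs(lhs - rhs), abs(remainder), eta(atoms, z))
```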
The Cauchy Transform (and hence the Moment Generating function) uniquely characterizes a measure. The measure can be obtained by the following inversion formula. The particular version we state is taken from Section 3.1 of [19].
Theorem 2**.**
For , we have
[TABLE]
Furthermore, if satisfies , where and denote the absolutely continuous and the singular part of the measure with respect to the Lebesgue measure, then the density of the absolutely continuous part is given by
[TABLE]
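The displays of Theorem 2 are likewise missing. We assume the density formula is the classical Stieltjes inversion $f(x) = -\frac{1}{\pi}\lim_{\varepsilon \downarrow 0}\operatorname{Im} G_\mu(x + i\varepsilon)$.

The sketch checks this in two settings. The first is the uniform law on $[0,1]$, whose Cauchy transform is $\log z - \log(z-1)$. The second is an empirical sample, where a small fixed $\varepsilon$ acts as Poisson-kernel smoothing. The sample sizes and $\varepsilon$ values are illustrative choices.

```python
import numpy as np


def density_from_cauchy(g, x, eps):
    """Stieltjes inversion at a fixed small eps: -Im G(x + i eps) / pi."""
    return -np.imag(g(x + 1j * eps)) / np.pi


# Uniform law on [0, 1]: G(z) = log(z) - log(z - 1), principal branch, Im z > 0.
g_uniform = lambda z: np.log(z) - np.log(z - 1.0)
exact = density_from_cauchy(g_uniform, 0.5, 1e-6)       # close to 1

# Empirical measure of uniform samples; eps trades bias against sampling noise.
rng = np.random.default_rng(9)
atoms = rng.uniform(0.0, 1.0, size=200_000)
g_empirical = lambda z: np.mean(1.0 / (z - atoms))
# Expected value (2/pi) * arctan(25), about 0.975; sampling noise is of order 0.01.
smoothed = density_from_cauchy(g_empirical, 0.5, 0.02)
print(exact, smoothed)
```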
Next we recall the definition of the free convolution based on the subordination functions from [22]. The statement we provide below appears in a more general form as Proposition 2.6 in [23].
Definition 4**.**
Let be a pair of probability measures. There exist analytic functions defined on such that, for all we have
1. ; and .
2. For any , is the unique solution in of the fixed point equation , where is given by
[TABLE]
An analogous characterization holds for with the roles of and interchanged.
The free convolution of the measures and denoted by is the measure whose moment generating function satisfies
[TABLE]
Remark 8**.**
We emphasize that each of the subordination functions depends on both measures . This is clear since the function defining depends on both .
Note that the above definition defines and on . However, these functions can be continuously extended to (Lemma 3.2 in [19]). These extensions to the real line will be important for Theorem 3.
Lemma 5**.**
The restrictions of subordination functions on have extensions to with the following properties:
1. are continuous.
2. If , then the functions continue analytically to a neighborhood of and
[TABLE]
IV-B2 Spectrum of
As we discussed before, we will convert the problem of analyzing the spectrum of to problems involving the spectrum of matrices of the form , where is a sequence of Haar distributed random matrices, and and are sequences of deterministic positive semidefinite matrices. In this section, we review two important results from the field of free probability regarding such matrices.
Suppose that and satisfy the following hypotheses:
- (i) and , where are compactly supported measures on .
- (ii) has a single outlying eigenvalue not contained in . has no eigenvalues outside .
- (iii) The set of eigenvalues of not equal to converges uniformly to in the sense,
[TABLE]
Our next theorem characterizes the bulk distribution of . The first part of this theorem is due to [24] and the second and third parts are due to [19] (Theorem 2.3).
Theorem 3**.**
Let and denote the subordination functions for the free multiplicative convolution of and . Define
[TABLE]
Then we have, almost surely for large enough ,
1. .
2. Given , we have , where is the -neighborhood of and denotes the set of eigenvalues of .
3. For any such that with , we have .
Remark 9**.**
The hypotheses of the above theorem can be relaxed (as mentioned in Remark 5.11 of [19]) in the following two ways: 1) is random and independent of , and is deterministic, provided occurs almost surely; 2) the spike locations depend on , provided almost surely.
Remark 10**.**
The above theorem is a simplified version of Theorem 2.3 in [19] which allows for multiple spikes in both and .
Remark 11**.**
The function might not be invertible. In such cases, can be a non-singleton set, and hence a single spike in can create multiple spikes in . But we will see that this doesn’t happen in our problem.
IV-B3 Singular Part of Free Convolution
In the previous section, we discussed the bulk distribution of . The main objective of this section is to state a result regarding the largest eigenvalue of . We state regularity results for the singular part of from [25] (Corollary 3.4) and [21] (Theorem 4.1).
Theorem 4** (Singular Part of ).**
Decompose the singular part of as where denotes the discrete part and denotes the singular continuous part. Then we have,
1. There can be at most two atoms. The possible locations of the atoms are:
   - (a) [math], with .
   - (b) Any such that there exist with and , and we have . Note that there can be at most one such .
2. Suppose neither of is completely concentrated at a single point. We have . Hence,
[TABLE]
IV-C Analysis of the Spectrum of
In order to apply Theorem 3, we need to understand the spectrum of . This is done in the following lemma.
Lemma 6**.**
Let
[TABLE]
denote the sorted trimmed measurements. Let . Then,
1. The eigenvalues of interlace with in the sense,
[TABLE]
2. can have at most one eigenvalue bigger than , which (if it exists) is given by the root of the following equation:
[TABLE]
where is defined as
[TABLE]
3. Furthermore, and .
Proof.
Define the matrix . The main trick will be to choose the orthonormal basis matrix conveniently, which will make our calculations easier. Recall that the columns of matrix , i.e. , span the subspace . Any basis for subspace can serve as matrix . Hence, we choose the following specific construction of :
[TABLE]
where and . With this choice, we note that
[TABLE]
Hence . To obtain the eigenvalues of we use its characteristic polynomial. To evaluate the characteristic polynomial of , we connect it to the characteristic polynomial of , where . Note that is a unitary matrix. First, we have
[TABLE]
Consider the following matrix equation:
[TABLE]
where
[TABLE]
Therefore,
[TABLE]
Now, we can compute the characteristic polynomial of . We have
[TABLE]
Note that
[TABLE]
Where is defined in the following way:
[TABLE]
Hence,
[TABLE]
We emphasize that the above equation does not imply that are the eigenvalues of . This is because while has zeros at , the function has poles at . This prevents us from concluding that when . However, we can make the following observations:
By Cauchy’s interlacing theorem, we have
[TABLE]
The above is also true for the eigenvalues of:
[TABLE]
since is a unitary matrix. 2. 2.
(9) shows that is a principal submatrix of
[TABLE]
Hence, the eigenvalues of will interlace the eigenvalues of :
[TABLE]
Combining (11) and (12), one obtains
[TABLE]
This proves statement (1) in the lemma. It follows that has at most one eigenvalue bigger than . If , then it has no outlying eigenvalue; if , it has exactly one. We call this eigenvalue an outlying eigenvalue for reasons that will become clear later.
3. The outlying eigenvalue of (if it exists) is a root of the characteristic polynomial:
[TABLE]
Since this root lies in , it must be a root of:
[TABLE]
Observing that:
[TABLE]
we conclude the outlying eigenvalue is the unique solution (if it exists) to:
[TABLE]
This proves statement (2).
4. Finally, we observe that is a positive semidefinite matrix for all , which shows . Also, we have . Note that and and . Hence, by the triangle inequality, we have . This proves statement (3) of the lemma.
∎
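Both interlacing steps in the proof are instances of Cauchy's interlacing theorem: if a real symmetric $n \times n$ matrix has eigenvalues $\lambda_1 \ge \dots \ge \lambda_n$ and one of its $(n-1) \times (n-1)$ principal submatrices has eigenvalues $\mu_1 \ge \dots \ge \mu_{n-1}$, then $\lambda_i \ge \mu_i \ge \lambda_{i+1}$. A quick numerical check of the theorem on a random symmetric matrix (a hypothetical test case, unrelated to the specific matrices above) may help. To stay dependency-free, the sketch computes eigenvalues with the cyclic Jacobi method; that choice is ours, and any symmetric eigensolver would do.

```python
import math
import random


def jacobi_eigenvalues(a, tol=1e-20, max_sweeps=50):
    """Eigenvalues (descending) of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


random.seed(0)
n = 6
g = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
a = [[(g[i][j] + g[j][i]) / 2.0 for j in range(n)] for i in range(n)]  # symmetric
b = [row[: n - 1] for row in a[: n - 1]]  # leading (n-1) x (n-1) principal submatrix
lam_a = jacobi_eigenvalues(a)
lam_b = jacobi_eigenvalues(b)
# Cauchy interlacing: lam_a[i] >= lam_b[i] >= lam_a[i + 1] for every i.
interlaces = all(lam_a[i] + 1e-9 >= lam_b[i] >= lam_a[i + 1] - 1e-9 for i in range(n - 1))
print(interlaces)
```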
The following lemma establishes the almost sure convergence of the function to the deterministic function .
**Lemma 7.**
Suppose . For a Lipschitz function whose range is in , there exists an event of probability 1 on which the following three statements hold:
1. ,
2. ,
3. .
In the above equations, , and . Furthermore, denotes the law of the random variable , and
[TABLE]
Proof.
It is sufficient to show that each item holds almost surely.
1. The argument for this part is a minor modification of the argument sketched in [26]. To prove statement (1), it suffices to show that
[TABLE]
almost surely. Indeed, if (14) holds, then for every bounded continuous function ,
[TABLE]
where is a bounded continuous function as well. Hence by (14),
[TABLE]
which implies .
To show (14), note that has the same distribution as , where , and . Let denote the cumulative distribution function of a standard normal random variable and define
[TABLE]
Then, we have
[TABLE]
Moreover,
[TABLE]
converges to 0 almost surely by the Glivenko-Cantelli theorem. Furthermore, since
[TABLE]
and is a continuous function, we conclude that
[TABLE]
Hence,
[TABLE]
almost surely, which yields (14).
2. We now turn to the proof of statement (2). Let
[TABLE]
We will show that
[TABLE]
almost surely. This means there is a set , of measure 0, outside of which we have the convergence for all . If we define , then outside of and clearly .
First note that , where
[TABLE]
Define
[TABLE]
Note that for a fixed we have almost surely by the strong law of large numbers. Since is a decreasing function in , and we have almost surely, we obtain for all with probability . Hence, it suffices to show that, on an event of probability 1,
[TABLE]
To prove (18), we will find a sequence such that as , and,
[TABLE]
With this, the Borel-Cantelli lemma implies that the event
[TABLE]
has measure 0. Outside of this event, (18) holds, as desired.
Define the events:
[TABLE]
where is a parameter we will set later. Note that,
[TABLE]
where we defined the terms as:
[TABLE]
Using the fact that and , we have,
[TABLE]
Observe that, on the event ,
[TABLE]
Since was assumed to be Lipschitz,
[TABLE]
where denotes the Lipschitz constant of . Hence, when , setting , we obtain, on the event
[TABLE]
where
[TABLE]
Note that as , as required. Moreover,
[TABLE]
where the last step follows from standard tail bounds for Gaussian and random variables. In particular, we have,
[TABLE]
as required.
3. The proof is similar to that of statement (2), so we skip the details. Note that if we define
[TABLE]
then it again converges under the event , defined in the proof of statement (2).
∎
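Step 1 of the proof reduces (14) to the Glivenko-Cantelli theorem: the empirical CDF $F_m$ of i.i.d. samples converges to the true CDF $F$ uniformly, almost surely. The sketch below illustrates this for standard normal samples (our choice, since the proof works through the standard normal CDF); the sample sizes and the seed are arbitrary. The printed sup-distances should shrink roughly like $1/\sqrt{m}$, as the Dvoretzky-Kiefer-Wolfowitz inequality predicts.

```python
import math
import random


def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sup_cdf_distance(samples, cdf):
    """sup_x |F_m(x) - F(x)|, where F_m is the empirical CDF of `samples`."""
    xs = sorted(samples)
    m = len(xs)
    worst = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # F_m jumps from (i - 1)/m to i/m at x, so check both sides of the jump.
        worst = max(worst, i / m - fx, fx - (i - 1) / m)
    return worst


rng = random.Random(1)
dist = {}
for m in (100, 1_000, 10_000, 100_000):
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    dist[m] = sup_cdf_distance(z, std_normal_cdf)
    print(m, round(dist[m], 4))
```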
The next lemma analyzes the properties of the limiting fixed point equation . Define the critical value as:
[TABLE]
**Lemma 8.**
Consider the fixed point equation (in )
[TABLE]
on the domain:
[TABLE]
We have
1. If , then the above equation has exactly one solution, denoted by . Furthermore,
[TABLE]
Moreover, is an increasing function of and .
2. If , then the equation has no solutions. For any , we define .
Proof.
The following change of measure simplifies some of the proofs:
[TABLE]
Note that is a proper probability density function since . With this notation, (20) can be written as
[TABLE]
Define the random variable . Note that . Further, define
[TABLE]
The first two derivatives of are
[TABLE]
First, since , the function is increasing. By Jensen’s Inequality, . Since equality holds if and only if is deterministic, and we have assumed that the support of is , we conclude that . Noting that and applying Chebychev’s association inequality (see Fact 1, Appendix B) with and gives . Hence, is an increasing, concave function and .
Next, we claim that can have at most one solution in . To see this, let be the first point at which the two curves intersect. Then . Furthermore
[TABLE]
Hence there can be no other intersection point of the two curves after .
Now consider the following two cases:
Case 1: . First note that since is a convex function on , according to Jensen’s Inequality
[TABLE]
Hence,
[TABLE]
This shows that . Furthermore,
[TABLE]
On the other hand, we can also compare the limiting behavior of and as . We have
[TABLE]
and
[TABLE]
Hence, for large enough and . Consequently, the functions and intersect once in . Finally, note that,
[TABLE]
Hence has exactly one solution in as claimed. By the Implicit Function Theorem, we can compute
[TABLE]
Hence is an increasing function of . Finally, we verify that . Suppose that this is not the case, i.e. as . Recalling the fixed point characterization of , we obtain that satisfies the fixed point equation
[TABLE]
This means that Jensen’s Inequality applied to the strictly convex function would be tight, so that under the tilted measure (), is deterministic. This is impossible since we have assumed that is supported on .
Case 2: As in Case 1, we argue (this time with the opposite conclusion) that
[TABLE]
Furthermore, since has no solution in . ∎
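The proof of Lemma 8 combines three ingredients: an increasing concave function can cross a line at most once (the first-crossing argument), whether it crosses at all is decided by a critical value, and the implicit function theorem shows that the crossing point moves monotonically with the parameter. A toy equation with an explicit threshold makes these easy to see. The sketch below uses $x = t(1 - e^{-x})$ on $(0, \infty)$; this equation is hypothetical, chosen only because its critical value $t = 1$ is explicit, and it is not the fixed point equation (20) of the lemma.

```python
import math


def positive_fixed_point(t, iters=200):
    """Positive solution of x = t * (1 - exp(-x)), or None when t <= 1.

    phi(x) = t * (1 - exp(-x)) is increasing and strictly concave with
    phi(0) = 0 and phi'(0) = t, so it crosses the identity on (0, inf)
    exactly once when t > 1 and never when t <= 1.
    """
    if t <= 1.0:
        return None

    def h(x):
        return -t * math.expm1(-x) - x  # phi(x) - x, computed stably near 0

    lo, hi = 0.0, t  # h > 0 on (0, root) and h(t) = -t * exp(-t) < 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def root_slope(t):
    """dx*/dt = -F_t / F_x for F(x, t) = t * (1 - exp(-x)) - x (implicit function theorem)."""
    x = positive_fixed_point(t)
    return -math.expm1(-x) / (1.0 - t * math.exp(-x))


for t in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(t, positive_fixed_point(t))
```

At the crossing, strict concavity forces $\phi'(x^*) = t e^{-x^*} < 1$, so the denominator in `root_slope` is positive and the root increases with $t$, mirroring the monotonicity claim in the lemma.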
Combining the above sequence of lemmas, we obtain the following proposition about the spectrum of the matrix .
**Proposition 4.**
Let . Then, there exists an event of probability 1 on which we have:
1. .
2. If , .
3. If , then , and,
[TABLE]
where is the unique solution to the equation (in ):
[TABLE]
in the domain:
[TABLE]
Proof.
We restrict ourselves to the event guaranteed by Lemma 7, on which
1. ,
2. ,
3. .
Let us denote this event by . Define the sequence of (random) functions as:
[TABLE]
with the domain:
[TABLE]
Define the (deterministic) function :
[TABLE]
with the domain:
[TABLE]
Note that on , we have .
1. By Lemma 6, we know that the eigenvalues of interlace with the eigenvalues of the diagonal matrix . On the event , . Hence indeed . This proves statement (1) of the proposition.
2. Consider the case . By Lemma 6, we already know that and . Hence, to prove (2), it is sufficient to show that
[TABLE]
For the sake of contradiction, suppose that there is a realization in such that . On this realization, we consider a subsequence such that . All subsequent analysis is along this subsequence. Since for all large enough , by Lemma 6, we must have . Applying Lemma 3 from [7] (Appendix E), we obtain
[TABLE]
Since , we know by Lemma 8 that does not have any solution in . Hence,
[TABLE]
However,
[TABLE]
This contradicts . Hence, $\limsup_{m\to\infty}\, \lambda_{1}(\bm{E}(\vartheta)) \leq 1$ on $\mathcal{E}$. This concludes the proof of statement (2).
3. Now consider the case . Again by Lemma 6, we know for all . By Lemma 8, we know that has a unique solution in , denoted by . Fix an small enough such that lies in the domain of . Note that , while and (by Lemma 8).
Since , for all large enough, also lies in the domain of . By Lemma 7, we have for all . In particular, we have, for all large enough while . Hence, by Lemma 6, we have for all large enough. Therefore, . This proves (3).
∎
IV-D Analysis of the Support of
We recall that is the law of the random variable , and . To keep the notation clean, we will refer to the analytic transforms corresponding to the measure with the subscript ; for example, the Cauchy transform for the measure will be referred to as . We begin by computing the Cauchy transform of .
**Lemma 9.**
Let . Then, we have,
[TABLE]
In the above display, the subordination function, , is the unique solution in to the equation , where the function is defined as:
[TABLE]
Proof.
First we can compute the moment generating functions:
[TABLE]
The -transforms of the two measures are given by,
[TABLE]
Hence, we can compute the function , given in Definition 4,
[TABLE]
Hence is the unique solution in of the equation . This equation can be simplified to
[TABLE]
where the function is defined as . Hence, we can compute the moment generating function of in the following way:
[TABLE]
In the above display, in the step marked (a), we used the fact that solves . Finally, the Cauchy Transform of is given by
[TABLE]
∎
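The proof of Lemma 9 passes between moment generating functions and Cauchy transforms. Assuming the standard free-probability convention $\psi_\mu(z) = \int \frac{zt}{1 - zt}\,\mu(dt)$ and $G_\mu(z) = \int \frac{\mu(dt)}{z - t}$ (we have not checked these against Definition 4 here), the two are linked by $\psi_\mu(z) = \frac{1}{z} G_\mu(1/z) - 1$, and for small $|z|$ the function $\psi_\mu(z) = \sum_{k \ge 1} m_k z^k$ generates the moments $m_k$. The sketch checks both identities numerically for a hypothetical three-atom measure.

```python
def cauchy_transform(z, atoms, probs):
    """G(z) = sum_k p_k / (z - t_k) for a discrete measure."""
    return sum(p / (z - t) for t, p in zip(atoms, probs))


def psi_direct(z, atoms, probs):
    """psi(z) = sum_k p_k * z t_k / (1 - z t_k)."""
    return sum(p * z * t / (1 - z * t) for t, p in zip(atoms, probs))


def psi_from_cauchy(z, atoms, probs):
    """psi(z) = G(1/z) / z - 1."""
    return cauchy_transform(1 / z, atoms, probs) / z - 1


def psi_from_moments(z, atoms, probs, terms=60):
    """sum_{k >= 1} m_k z^k, valid for |z| * max|t_k| < 1."""
    return sum(
        sum(p * t**k for t, p in zip(atoms, probs)) * z**k for k in range(1, terms + 1)
    )


atoms = [0.5, 1.0, 2.0]  # hypothetical three-atom measure
probs = [0.2, 0.5, 0.3]
z = 0.05 + 0.05j
print(psi_direct(z, atoms, probs), psi_from_cauchy(z, atoms, probs), psi_from_moments(z, atoms, probs))
```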
Our next goal is to characterize . Theorem 4 gives a complete characterization of the support of the singular part of . Hence, we now need to understand the support of the absolutely continuous part of . According to the Stieltjes inversion theorem (Theorem 2), the density of the absolutely continuous part is given by
[TABLE]
Since uniquely solves in , we will study the solutions of this equation for . Hence, we begin by studying the solutions of . Before doing so, we clarify the definition of at , which is a subtle case because . We note that the random variable is non-negative, and hence the expectation is well defined but might be . If it is finite, then is well defined at . If the expectation is , we define , which is consistent with interpreting . Analogously, is defined at . This definition ensures that is a continuous function on . Next, we discuss the solutions of . Figure 2 shows a typical plot of . As is clear from this figure, we expect the following two quantities to play a major role in determining the existence of a solution of . Define
[TABLE]
Our next lemma proves the properties of suggested by Figure 2.
**Lemma 10.**
The following statements are true about :
1. is a convex function on and a concave function on .
2. .
3. .
4. Consider the three mutually exclusive and exhaustive cases:
Case A: . There is at least one and at most two solutions to . All solutions lie in . Furthermore, when , there is exactly one solution for the equation . This unique solution additionally satisfies .
Case B: . There are no solutions of the equation .
Case C: . There is at least one and at most two solutions to . All solutions lie in . Furthermore, when , there is a unique solution to . This solution additionally satisfies .
Proof.
1. We define the random variable ,
[TABLE]
We observe that for any , whereas for , . It is straightforward to see that . For notational simplicity, we will often abbreviate as . We have
[TABLE]
Consider the following two cases,
Case 1: .
Applying Chebychev’s Association Inequality (Fact 1) with and gives us that . In fact, an inspection of the proof of Chebychev’s Association Inequality in [27] allows us to rule out the equality case under the assumptions imposed on , and we have . Hence, is strictly convex in . Since is continuous on , we conclude that is convex on .
Case 2: .
Again, applying Chebychev’s Association Inequality with and gives us . Hence, is concave in this region. As before, an inspection of the proof of Chebychev’s Association Inequality allows us to rule out the equality case under the assumptions imposed on , and we have . Hence, is strictly concave in . Since is continuous on , we conclude that is concave on . This concludes the proof of statement (1) in the lemma.
2. Note that,
[TABLE]
This shows . The claim about the limit as follows analogously. This proves item (2) in the statement of the lemma.
3. The infimum in the definition of is attained due to item (2) in the statement of the lemma. Analogously, the supremum in the definition of is attained. Next, consider any and any . Since the function is convex on , by Jensen’s Inequality, we have
[TABLE]
On the other hand, since the function is concave on , we have
[TABLE]
Hence,
[TABLE]
Taking the minimum over and the maximum over gives us . Furthermore, we note that . Hence . This concludes the proof of item (3) in the statement of the lemma.
4. For any , does not have a solution in since and . Now consider any . Since , we know that all solutions of lie in . Since is strictly convex in , there can be at most two solutions. Now consider any . Let . Due to strict convexity of , we have for any . Hence is strictly increasing on . Since , we are guaranteed to have exactly one solution to on , which indeed satisfies . The case when can be analyzed similarly. This concludes the proof of item (4) in the statement of the lemma.
∎
We are now in a position to characterize the support of , which is the content of the following proposition.
**Proposition 5.**
The support of is given by
[TABLE]
where denotes the discrete part of the measure . If the random variable has a density with respect to the Lebesgue measure, then,
[TABLE]
Proof.
We first claim that . Since the support of a measure is closed, this means that . We prove this claim by contradiction. Suppose that there exists such that . To simplify notation, for , we introduce the following reciprocal subordination function
[TABLE]
According to Lemma 5, we have
[TABLE]
By Lemma 9, uniquely solves the equation in . Taking , we obtain,
[TABLE]
In the step marked (a), we used the fact that since , we have , such that for any small enough . This gives us a dominating function for an application of the dominated convergence theorem. Hence, we have found a solution for the equation . But this contradicts Lemma 10. Hence, .
Next, we claim that no is in the support of the absolutely continuous part of . To show this, we first compute a first-order asymptotic expansion of for . From Lemma 10, we know there exists a unique solution for the equation and . We denote this solution by . Since , the function is analytic in a neighborhood (in ) of . The implicit function theorem guarantees a solution of the equation . However, this may not be the reciprocal subordination function, since we still need to verify that it lies in . To address this, again by the implicit function theorem, we have
[TABLE]
This gives us
[TABLE]
Hence, we have
[TABLE]
This verifies that for small enough. Finally, since is the unique solution to the equation in , we have
[TABLE]
According to the Stieltjes Inversion Formula (Theorem 2), we obtain
[TABLE]
In the step marked (b), we are relying on the assumption that . To verify this, we recall that solves, and . This means that
[TABLE]
Hence, we have shown
[TABLE]
This implies,
[TABLE]
Taking complements, we have . Hence, we have shown that
[TABLE]
Therefore, which proves the claim of the proposition. Finally, when has a density with respect to Lebesgue measure, Theorem 4 gives us which yields the second claim in the proposition. ∎
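The proof above locates the absolutely continuous part through the Stieltjes inversion formula: the density at $x$ is the limit of $-\frac{1}{\pi}\operatorname{Im} G(x + i\varepsilon)$ as $\varepsilon \downarrow 0$. A minimal numerical illustration, assuming the convention $G(z) = \int \frac{\mu(dt)}{z - t}$ and using a hypothetical measure (a fine grid of equally weighted atoms approximating the uniform law on $[0, 1]$, not the measure studied here), shows that the smoothed density is close to 1 inside the support and close to 0 outside.

```python
import math


def cauchy_transform(z, atoms):
    """G(z) = (1/N) * sum_k 1 / (z - t_k) for the uniform measure on `atoms`."""
    return sum(1.0 / (z - t) for t in atoms) / len(atoms)


def smoothed_density(x, atoms, eps):
    """-(1/pi) * Im G(x + i*eps), which tends to the density at x as eps -> 0."""
    return -cauchy_transform(complex(x, eps), atoms).imag / math.pi


# Hypothetical test measure: N equally weighted atoms on a fine grid in [0, 1],
# a stand-in for Uniform[0, 1] (density 1 inside [0, 1], 0 outside).
N = 10_000
atoms = [(k + 0.5) / N for k in range(N)]
dens = {x: smoothed_density(x, atoms, eps=1e-3) for x in (0.25, 0.5, 0.75, 1.5, 2.0)}
print(dens)
```

For finite $\varepsilon$ this is the measure smoothed by a Cauchy (Poisson) kernel of width $\varepsilon$, which is why the values are slightly below 1 inside and slightly above 0 outside.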
Finally, we note that to apply Theorem 3, we need to understand the set (see Theorem 3 for the definition of ). This is done in the following lemma.
**Lemma 11.**
Let denote the subordination functions corresponding to the free multiplicative convolution of . Define
[TABLE]
Then, we have
[TABLE]
where where, , .
Proof.
From Proposition 5, we know that , where and . Furthermore, we showed that for any , the reciprocal subordination function is the unique solution to the equations: . From Lemma 10, we know that when , the unique solution to satisfies and when , the unique solution satisfies . These considerations immediately yield the claim of the lemma. ∎
IV-E Proof of Lemmas 3 and 4
Recall that we defined as
[TABLE]
where and , and
[TABLE]
We first prove Lemma 3, which we restate below for convenience.
Lemma 3. Let . Define the function as:
- When : Let be the unique value of that satisfies the equation:
[TABLE]
in the interval:
[TABLE]
- When : .
Then, we have , where is defined in (7).
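Lemma 3 describes the large-system limit of the top eigenvalue. To see the finite-size picture, the Monte Carlo sketch below draws a column-orthogonal sensing matrix (the first n columns of a Haar orthogonal matrix), forms a matrix of the form $A^\top \operatorname{diag}(T(y))\,A$ from phaseless measurements, and extracts its leading eigenvector by power iteration. The trimming function `T(y) = min(y^2, cap)`, the rescaling of y, the dimensions and the seed are our assumptions for illustration only; the paper's preprocessing and normalization are the ones defined earlier, and a single run at m = 300 says nothing precise about the limit.

```python
import math
import random


def haar_column_orthogonal(m, n, rng):
    """n columns (each of length m) of a Haar-distributed m x m orthogonal matrix.

    Gram-Schmidt applied to i.i.d. Gaussian vectors gives the Q factor of a QR
    decomposition with positive diag(R), which is Haar distributed.
    """
    cols = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for q in cols:  # modified Gram-Schmidt: project out one column at a time
            proj = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        cols.append([vi / norm for vi in v])
    return cols


def spectral_estimate(m, n, cap=4.0, iters=500, seed=0):
    """Leading eigenvector of D = A^T diag(T(y)) A, with hypothetical T(y) = min(y^2, cap)."""
    rng = random.Random(seed)
    a_cols = haar_column_orthogonal(m, n, rng)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    ax = [sum(a_cols[j][i] * x[j] for j in range(n)) for i in range(m)]
    y = [abs(val) * math.sqrt(m) / x_norm for val in ax]  # phaseless, mean(y^2) = 1
    w = [min(yi * yi, cap) for yi in y]
    d = [[sum(a_cols[j][i] * w[i] * a_cols[k][i] for i in range(m)) for k in range(n)]
         for j in range(n)]
    # D is positive semidefinite, so power iteration targets its top eigenvector.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        v = [sum(d[j][k] * v[k] for k in range(n)) for j in range(n)]
        nv = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / nv for vi in v]
    overlap = abs(sum(vi * xi for vi, xi in zip(v, x))) / x_norm
    return a_cols, d, v, overlap


a_cols, d, v, overlap = spectral_estimate(m=300, n=60)
print("squared overlap:", round(overlap ** 2, 3))
```

Because $A^\top A = I$, every eigenvalue of this matrix lies between 0 and the largest trimmed value $\max_i T(y_i)$. Averaging the squared overlap over seeds while growing m with m/n fixed is the natural way to compare such runs with the asymptotic formula.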
Proof.
In Proposition 6, we obtained an asymptotic characterization of the spectrum of . More specifically, we proved that
[TABLE]
We recall the matrix was defined as
[TABLE]
In particular, , where the measure is given by
[TABLE]
Applying Theorem 3, we obtain:
1. The spectral measure of converges to:
[TABLE]
2. For any , almost surely, for large enough , we have , where is the -neighborhood of the set .
3. For any , we have, almost surely, exactly one eigenvalue of in a small enough neighborhood of for large enough .
In Proposition 5, we characterized as , where , and the function is given by:
[TABLE]
In Lemma 11, we characterized the set:
[TABLE]
where , . Putting these together, one obtains the following two cases:
Case 1: In this case, the set . The matrix has no eigenvalues outside the support of the bulk distribution, and
[TABLE]
Case 2: In this case, the set
[TABLE]
Hence, there is an eigenvalue in the neighborhood of . Since , and is a strictly increasing function on (Lemma 10), we have . Hence the eigenvalue in the neighborhood of is the largest one, and we have
[TABLE]
It is now straightforward to check that the above two cases can be combined into a concise form stated in the claim of the lemma. ∎
We end this section by proving Lemma 4, restated below for convenience.
Lemma 4. The following hold for the equation:
[TABLE]
1. This equation has a unique solution.
2. Let denote the solution of the above equation. Then:
Case 1
If we have
[TABLE]
Furthermore, if , then,
[TABLE]
Case 2
If we have
[TABLE]
and,
[TABLE]
where is the unique that satisfies .
Proof.
Before we begin the proof of this lemma, it is helpful to list the conclusions of some of the previous lemmas.
Lemma 8: In this lemma, for we defined the function as the unique value of that satisfies
[TABLE]
We also set when . We also showed that is strictly increasing on and . In particular, has a well-defined inverse on the domain given by:
[TABLE]
Lemma 10: We defined the function as
[TABLE]
We showed that is strictly convex on . We defined to be the minimizing argument and the minimum value of in . In particular, . We also showed that . We further defined in the following way:
[TABLE]
Some simple implications of the above assertions are as follows. First, since and are both non-decreasing continuous functions, is non-decreasing and continuous. Second, since for , we have, for all , . Third, since and , we have as . The only possible point of non-differentiability of is at . It is straightforward to compute the derivative of at all other points using the implicit function theorem and obtain
[TABLE]
The derivatives of can be calculated as,
[TABLE]
A representative plot of the function is shown in Figure 3.
We are now in a position to prove the claims of the lemma.
1. Since is continuous and non-decreasing and is continuous and strictly decreasing, the fixed point equation can have at most one solution. On the other hand, comparing the values of the two sides of the fixed point equation at and shows that there is at least one solution.
2. Let denote the solution of the fixed point equation . A typical plot of these two functions is shown in Figure 3. The figure shows two possible cases for the intersection of the two curves: *Case 1:* The curves intersect at a point (or on the flat part of ). In this case, we have .
*Case 2:* The curves intersect at a point or on the rising part of . We have . We can distinguish between the two cases by comparing the value of the function at with . In particular, we have:
*Case 1:*
[TABLE]
*Case 2:*
[TABLE]
Substituting the formula for , mentioned in (22), and and the formula for from (23), the two cases can be simplified further.
*Case 1:* This case occurs when
[TABLE]
In this situation, we have, . Furthermore, if we additionally have
[TABLE]
then is differentiable at and, from (24), we have
[TABLE]
*Case 2:* This case occurs when
[TABLE]
In this situation, we have, . It turns out that we can give a simpler expression for . In this case, solves,
[TABLE]
and is the solution of the equation
[TABLE]
By definition, the function is
[TABLE]
We first eliminate from Equations (27)-(29) and conclude that solves
[TABLE]
and is given by
[TABLE]
Since the solution to Equations (27)-(29) was guaranteed to be unique, the solution to (30) is also unique. Finally, we compute the derivative of at . It will be convenient to introduce the random variable to write the equations in a compact form. From (24)-(26), we have
[TABLE]
In the above display, in the step marked (a) we used the fact that satisfies . This concludes the proof of the characterization (2) given in the statement of the lemma.
∎
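Both parts of the proof of Lemma 4 use only monotonicity and continuity: part 1 because $f - g$ is strictly increasing when $f$ is non-decreasing and $g$ is strictly decreasing, and part 2 because comparing the two curves at the end of the flat part tells which case occurs. The sketch below mirrors that logic with bisection on a hypothetical pair of functions, $f(x) = \max(c, x)$ and $g(x) = a/x$, which stand in for, but are not, the functions of Lemma 4.

```python
def crossing(f, g, lo, hi, iters=200):
    """The unique x in (lo, hi) with f(x) = g(x), by bisection.

    Assumes f is non-decreasing and g is strictly decreasing (so f - g is
    strictly increasing) and that f(lo) < g(lo), f(hi) > g(hi).
    """
    def h(x):
        return f(x) - g(x)

    if not (h(lo) < 0.0 < h(hi)):
        raise ValueError("need f(lo) < g(lo) and f(hi) > g(hi)")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_toy(a, c):
    """Hypothetical pair: f(x) = max(c, x) (flat, then rising) and g(x) = a / x.

    The case is read off at the kink x = c: the crossing lies on the flat
    part exactly when g(c) <= f(c) = c.
    """
    def f(x):
        return max(c, x)

    def g(x):
        return a / x

    case = "flat" if g(c) <= c else "rising"
    return case, crossing(f, g, lo=1e-9, hi=max(a, c) + 1.0)


print(solve_toy(1.0, 2.0))  # flat part:   x = a / c
print(solve_toy(9.0, 2.0))  # rising part: x = sqrt(a)
```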
V Conclusions
We analyzed the asymptotic performance of a spectral method for phase retrieval under a random column-orthogonal matrix model. Our results provide a rigorous justification for the conjectures in [11], which were obtained by analyzing an expectation propagation algorithm.
Appendix A Proof of Proposition 2
This section is devoted to the proof of Proposition 2. We denote the functions (recall (5)) with as and those with as . Define the random variables:
[TABLE]
Next, we observe that the function is a bounded, strictly increasing, Lipschitz function and consequently has a density with respect to the Lebesgue measure. Hence, by the rescale-and-shift argument outlined in Remark 2, Theorem 1 applies to an equivalent modification of , which can be used to infer the corresponding result for (after another rescale-and-shift argument). This gives us the result:
[TABLE]
where and is the solution to the fixed point equation (in ): , which is guaranteed to exist uniquely provided . First, we observe that,
[TABLE]
In particular, at , we have,
[TABLE]
and,
[TABLE]
We consider the following two cases.
*Case 1: .* Lemma 10 shows that is convex on . When , for small enough, and hence is strictly increasing and . Moreover, in this case, for small enough,
[TABLE]
Hence, using (31),
[TABLE]
Case 2: In this case, for small enough , . Hence the , the minimizer of the convex function occurs in the region . This means it satisfies the optimality condition:
[TABLE]
Next, we claim that, ,
[TABLE]
which is a consequence of Chebychev’s association inequality (Fact 1) with the choice:
[TABLE]
In particular, we have , and hence Theorem 1 gives us:
1. There exists a unique solution such that .
2. Moreover,
[TABLE]
Next we claim that,
[TABLE]
To see this, observe
[TABLE]
If , one can select a subsequence along which, by dominated convergence, , which contradicts . Likewise, if , one can find a subsequence along which and, by dominated convergence,
[TABLE]
which contradicts . We can now conclude that,
[TABLE]
where is the unique solution to in guaranteed by Proposition 1 (due to [11]). This is because, by selecting a subsequence along which , we can conclude that, along that subsequence,
[TABLE]
This implies,
[TABLE]
and analogously,
[TABLE]
Since Proposition 1 guarantees that the equation has a unique solution in , we get,
[TABLE]
Dominated convergence now yields,
[TABLE]
and consequently, almost surely,
[TABLE]
The right hand side of the above display can be simplified to:
[TABLE]
This clean formula is due to [11] and we refer the reader to Appendix B in [11] for a proof.
Appendix B Miscellaneous results
**Fact 1 (Chebychev Association Inequality, [27]).**
Let be random variables and . Suppose are two non-decreasing functions. Then,
[TABLE]
Furthermore, if, and,
[TABLE]
then, the above inequality is strict.
Proof.
The proof of the inequality appears in [27]. Inspecting the proof, we can derive a sufficient condition for the inequality to be strict. The proof in [27] shows that
[TABLE]
where is an independent copy of the random variables . Since are non-decreasing and , the integrand above is non-negative. Hence, equality holds if and only if:
[TABLE]
which is ruled out by the assumptions of the claim. ∎
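Fact 1 and the strictness argument can be checked exactly on a small discrete example. The sketch below uses a single random variable with a hypothetical five-point distribution and rational arithmetic, so the covariance $\mathbb{E}[f(X)g(X)] - \mathbb{E}f(X)\,\mathbb{E}g(X)$ and the half-sum $\frac{1}{2}\mathbb{E}[(f(X) - f(X'))(g(X) - g(X'))]$ over an independent copy $X'$ (our reading of the representation used in the proof above) agree with no rounding. The inequality is strict here because some pairs of support points change both $f$ and $g$, which is the kind of condition that rules out equality.

```python
from fractions import Fraction


def expectation(values, probs):
    return sum(v * p for v, p in zip(values, probs))


def association_terms(xs, probs, f, g):
    """Return (E[f(X) g(X)] - E[f(X)] E[g(X)], (1/2) E[(f(X) - f(X'))(g(X) - g(X'))]).

    X' is an independent copy of X; the two numbers are always equal, and the
    second is a sum of non-negative terms when f and g are both non-decreasing.
    """
    fx = [f(x) for x in xs]
    gx = [g(x) for x in xs]
    cov = expectation([a * b for a, b in zip(fx, gx)], probs) - expectation(fx, probs) * expectation(gx, probs)
    half = Fraction(1, 2) * sum(
        pi * pj * (fx[i] - fx[j]) * (gx[i] - gx[j])
        for i, pi in enumerate(probs)
        for j, pj in enumerate(probs)
    )
    return cov, half


xs = [Fraction(k) for k in range(5)]  # hypothetical support {0, 1, 2, 3, 4}
probs = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(2, 10), Fraction(2, 10)]
cov, half = association_terms(xs, probs, f=lambda x: x * x, g=lambda x: min(x, 2))
print(cov, half)  # both 59/25
```

Replacing `g` by a non-increasing function flips the sign, as expected from applying the inequality to `-g`.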
Acknowledgments
We would like to thank Professor Serban Belinschi for discussions about free probability and Professor Tomoyuki Obuchi for discussions about the replica method. We acknowledge support from NSF DMS-1810888 and the Google faculty award.
References
[1] Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev. Phase retrieval with application to optical imaging: A contemporary overview. 32(3):87–109, May 2015.
[2] E. J. Candès, X. Li, and M. Soltanolkotabi. Phase retrieval via Wirtinger flow: Theory and algorithms. 61(4):1985–2007, April 2015.
[3] Y. Chen and E. J. Candès. Solving random quadratic systems of equations is nearly as easy as solving linear systems. Communications on Pure and Applied Mathematics, 70:822–883, May 2017.
[4] G. Wang, G. B. Giannakis, and Y. C. Eldar. Solving systems of random quadratic equations via truncated amplitude flow. 64(2):773–794, Feb 2018.
[5] H. Zhang and Y. Liang. Reshaped Wirtinger flow for solving quadratic system of equations. In Advances in Neural Information Processing Systems, pages 2622–2630, 2016.
[6] P. Netrapalli, P. Jain, and S. Sanghavi. Phase retrieval using alternating minimization. In Advances in Neural Information Processing Systems, pages 2796–2804, 2013.
[7] Y. M. Lu and G. Li. Phase transitions of spectral initialization for high-dimensional nonconvex estimation. Information and Inference, to appear, 2018.
[8] M. Mondelli and A. Montanari. Fundamental limits of weak recovery with applications to phase retrieval. Foundations of Computational Mathematics, pages 1–71, 2017.
