Support Recovery in the Phase Retrieval Model: Information-Theoretic Fundamental Limits
Lan V. Truong, Jonathan Scarlett

TL;DR
This paper investigates the fundamental limits of support recovery in phase retrieval models with noisy measurements, providing sharp thresholds and new concentration bounds for information content.
Contribution
It introduces sharp information-theoretic bounds for support recovery in phase retrieval, including new concentration bounds for log-concave random variables.
Findings
Sharp thresholds for support recovery in phase retrieval models.
New concentration bounds for the conditional information content.
Near-matching constants in various sparsity and noise regimes.
Lan V. Truong and Jonathan Scarlett
L. V. Truong is with the Department of Engineering, the University of Cambridge, Cambridge CB2 1PZ UK (e-mail: [email protected]). J. Scarlett is with the Department of Computer Science, School of Computing, National University of Singapore (NUS), Singapore 117417, and also with the Department of Mathematics, NUS, Singapore 119076 (e-mail: [email protected]). This work is supported by an NUS Early Career Research Award. This paper was presented in part at the 2019 IEEE Information Theory Workshop. Copyright © 2017 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].
Abstract
The support recovery problem consists of determining a sparse subset of variables that is relevant in generating a set of observations. In this paper, we study the support recovery problem in the phase retrieval model consisting of noisy phaseless measurements, which arises in a diverse range of settings such as optical detection, X-ray crystallography, electron microscopy, and coherent diffractive imaging. Our focus is on information-theoretic fundamental limits under an approximate recovery criterion, considering both discrete and Gaussian models for the sparse non-zero entries, along with Gaussian measurement matrices. In both cases, our bounds provide sharp thresholds with near-matching constant factors in several scaling regimes on the sparsity and signal-to-noise ratio. As a key step towards obtaining these results, we develop new concentration bounds for the conditional information content of log-concave random variables, which may be of independent interest.
Index Terms:
Phase retrieval, support recovery, sparsity pattern recovery, information-theoretic limits, compressive sensing, non-linear models, log-concave concentration.
I Introduction
Recently, there has been a growing interest in recovering an unknown signal $\beta$ from a number of phaseless quadratic observations, each taking the form $Y = |\langle X, \beta \rangle|^2 + Z$, where $X$ is a measurement vector and $Z$ represents measurement noise. Since only the magnitude of $\langle X, \beta \rangle$ is measured, and not the phase (or the sign, in the real case), this problem is referred to as phase retrieval. The phase retrieval problem has many applications including optical detection, X-ray crystallography, electron microscopy, and coherent diffractive imaging [1].
I-A Sparse Phase Retrieval
Similarly to the basic linear model, various works have shown that the number of measurements can be reduced significantly if the signal is sparse, i.e., it has at most non-zero entries for some . Here we provide a non-exhaustive list of relevant results from the literature.
It is shown in [1] that for real-valued signals, stable phase retrieval can be achieved with measurements in the noiseless setting, and with measurements in the noisy setting under some conditions on the noise distribution. The measurements considered in [1] are isotropic and sub-Gaussian, including (real) Gaussian measurements as a special case. The corresponding results are information-theoretic and are not shown to be attained with any practical algorithm, and to the best of our knowledge, it remains an open problem to attain comparable theoretical results for practical algorithms. However, numerical evidence has been given for the success of generalized approximate message passing (GAMP) with roughly Gaussian measurements in the noiseless case, and for the robustness of GAMP to noise [2]. On the other hand, rigorous results for efficient algorithms with Gaussian measurements typically require significantly more measurements; for instance, an bound is attained in [3] via a semidefinite programming (SDP) approach.
While our focus will be on Gaussian measurements, it is also worth highlighting some works that achieve computationally efficient sparse phase retrieval with a similar number of carefully-designed non-Gaussian measurements. Cai et al. [4] designed an algorithm that succeeds with measurements and decoding time in the noiseless complex-valued setting; the measurement matrix is generated from the structure of a series of bipartite graphs (between signal components and measurements) with various desirable properties. More recently, in the real-valued setting, Nakos [5] proposed an algorithm that recovers an approximately -sparse vector under the guarantee, with measurements and decoding time for any constant . See also [6] for additional variants.
In the noisy complex-valued setting, Iwen et al. [7] provided a simple two-stage sparse phase retrieval strategy that can stably reconstruct up to a global phase shift using only measurements, under some bounded noise assumptions. In addition, Pedarsani et al. [8] used a sparse-graph coding approach to attain an approximate support recovery guarantee for quantized signals with: (i) measurements and decoding time, or (ii) measurements and decoding time. In the noiseless case, these further reduce to , even without the assumption of quantized signals.
Fourier measurements are also commonly considered, and are particularly relevant in many practical applications. For example, in this setting, Jaganathan et al. [9] gave guarantees on recovering sparse signals whose support is aperiodic, and proposed an efficient two-stage algorithm that first identifies the support, and then the sparse signal values.
I-B Support Recovery
A distinct goal that has received less attention in phase retrieval, but considerable attention in other models, is the support recovery problem [10, 11, 12], where one wishes to exactly or approximately determine the support given a collection of observations and the corresponding measurement matrix . (Footnote 1: As we mention in the notation section below, the boldness of these symbols is used to highlight their association with multiple measurements. In contrast, while represents a vector, it is non-bold because it is only associated with a single measurement.) This problem is of direct interest when the goal is to find which variables influence the output (rather than their associated weights), and may also be used as a first step towards estimating the values of (e.g., see [13, 9]).
Under general linear and non-linear models, Scarlett and Cevher [14] provided achievability and converse bounds characterizing the trade-off between error probability and number of measurements. They applied their general bounds to the linear, -bit, and group testing models to obtain precise thresholds on the number of measurements required to achieve vanishing decoding error probability in the high-dimensional limit. Numerous other related works also exist, with the focus being mainly on linear models [15, 16, 17, 18, 19]; see [14] for a more detailed overview. In particular, approximate recovery criteria were studied by Reeves and Gastpar [20, 21] in the regime , and by Scarlett and Cevher [14] in the regime ; we focus on the latter setting.
Although the initial bounds in [14] are very general, applying these bounds to new models can still be very challenging, due to the need to establish concentration bounds and mutual information bounds on a case-by-case basis. In this paper, we use this approach to establish fundamental limits for approximate support recovery in the phase retrieval model, under a log-concavity assumption on the noise distribution. To achieve this goal, we need to overcome at least two key challenges: establishing concentration bounds for information quantities in the phase retrieval model, and upper and lower bounding key conditional mutual information terms that have no closed-form expressions. For each of these challenges, we develop novel auxiliary results, some of which may be of independent interest. The following subsection lists our specific contributions in more detail.
I-C Contributions
Our main contributions in this paper are as follows:
- We extend the concentration bounds of the unconditional information content of log-concave densities by Fradelizi et al. [22, Theorem 3.1] to conditional versions (cf. Corollary 9) in which joint log-concavity does not hold. Due to this extension, we can establish concentration bounds for the conditional information density of -dimensional random variables (cf. Theorem 11) and apply these bounds to the phase retrieval model. Because of their generality, our extended concentration bounds might be of independent interest.
- Under i.i.d. complex Gaussian measurement matrices , we establish tight upper and lower bounds on the required number of measurements to achieve approximate support recovery (i.e., recovering a given proportion of the support) under both discrete (cf. Lemma 13) and Gaussian (cf. Theorem 2) modeling assumptions on the non-zero entries of . In both cases, the upper and lower bounds coincide up to an explicit constant factor in certain sparsity regimes, and this constant factor is often very close to one (e.g., when the signal-to-noise ratio is sufficiently high).
I-D Notation
We use notation similar to that of [14]. We use upper-case letters for random variables, and lower-case letters for their realizations. A non-bold character may be a scalar or a vector, whereas a bold character refers to a collection of scalars (e.g., ) or vectors (e.g., ), where is the number of measurements. We write to denote the subvector of at the entries indexed by , and to denote the submatrix of containing the columns indexed by . The complement with respect to is denoted by .
The symbol means “distributed as”. For a given joint probability density , the corresponding marginal distributions are denoted by and , and similarly for conditional marginals (e.g., ). The notation , , etc. denotes the corresponding i.i.d. distribution in which each term is distributed as , , etc. We write for probabilities, for expectations, and for variances.
We use the usual notation for the differential entropy (e.g., ) and mutual information (e.g., ), and their conditional counterparts (e.g., ). We use the notation for real Gaussian random variables, for complex Gaussians (with variance in each of the real and imaginary parts), and for the central chi-squared distribution with degrees of freedom.
We make use of the standard asymptotic notation and . We define the function and write the floor and ceiling functions as and , respectively. The function $\log$ has base $e$, and all information quantities are measured in nats.
Throughout the paper, we frequently make use of integrals written as , , etc., where denotes a suitable measure that can typically be taken to be the Lebesgue measure. For , we say that a function on is in if is integrable.
I-E Structure of the Paper
In Section II, we formally introduce the problem setup and overview our main results. In Section III, we provide the main auxiliary results on log-concavity, concentration of measure, and mutual information bounds. Sections IV and V provide the proofs of our main support recovery results. Conclusions are drawn in Section VI.
II Problem Setup and Main Results
II-A Model and Assumptions
Let denote the ambient dimension, the sparsity level, and the number of measurements. We let be the set of subsets of having cardinality . The key random variables in the support recovery problem are the support set , the unknown signal , the measurement matrix , and the observation vector .
The support set is assumed to be equiprobable on the subsets within . Given , the entries of are deterministically set to zero, and the remaining entries are generated according to some distribution . (Footnote 2: We allow for both discrete and continuous distributions on , meaning that in some cases represents a probability mass function rather than a density function.) We assume that these non-zero entries follow the same distribution for all the possible realizations of , and that this distribution is permutation-invariant.
We consider the setting of (complex) Gaussian measurements, in which the measurement matrix takes i.i.d. values on , whose density is denoted by . We write , to denote the corresponding i.i.d. distribution for matrices, and we write as a shorthand for . Given , each entry of the observation vector is generated in a conditionally independent manner according to the following model:
[TABLE]
where , , and , with being an arbitrary log-concave density function. This log-concavity assumption is made for mathematical convenience, but also captures a wide range of noise distributions, including Gaussian. We note that the permutation-invariance of , and with respect to allows us to condition on a fixed of cardinality throughout the analysis (e.g. ) without loss of generality; such conditioning should henceforth be assumed unless explicitly stated otherwise.
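The observation model can be simulated in a few lines. The following is a minimal sketch, assuming the single-measurement form $Y = |\langle X, \beta\rangle|^2 + Z$, with $X$ having i.i.d. $\mathcal{CN}(0,1)$ entries (variance 1/2 in each of the real and imaginary parts) and Gaussian noise standing in for one example of a log-concave $P_Z$. The function name `simulate_phase_retrieval` and its parameters are hypothetical.

```python
import numpy as np


def simulate_phase_retrieval(beta, n, noise_std=1.0, rng=None):
    """Draw n phaseless measurements Y_i = |<X_i, beta>|^2 + Z_i (assumed model form)."""
    rng = np.random.default_rng() if rng is None else rng
    p = beta.shape[0]
    # Assumption: i.i.d. CN(0, 1) entries, i.e., variance 1/2 in each of the real and imaginary parts.
    X = (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) / np.sqrt(2)
    # Gaussian noise, used here only as one example of a log-concave noise density.
    Z = noise_std * rng.standard_normal(n)
    # X @ beta omits conjugation; since CN(0, 1) is circularly symmetric, conjugating X
    # instead would not change the distribution of Y. Only the magnitude is observed.
    Y = np.abs(X @ beta) ** 2 + Z
    return X, Y


# Usage: a k-sparse complex signal with a uniformly random support S.
rng = np.random.default_rng(0)
p, k, n = 50, 5, 200
S = rng.choice(p, size=k, replace=False)
beta = np.zeros(p, dtype=complex)
beta[S] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
X, Y = simulate_phase_retrieval(beta, n, noise_std=0.1, rng=rng)
```

Multiplying `beta` by any global phase $e^{i\theta}$ leaves `Y` unchanged, which is why the signal can at best be identified up to a global phase, while its support is unaffected.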
The relation (1) induces the following conditional joint distribution of (given ):
[TABLE]
and its multiple-observation counterpart
[TABLE]
where is the -fold product of . The remaining entries of the measurement matrix are distributed as .
Given and , a decoder forms an estimate of . Like previous works studying the information-theoretic limits of support recovery (e.g., [14, 15]), we assume that the decoder knows the system model, including and . We focus on the approximate recovery criterion, only requiring that at least entries of are successfully identified (approximate recovery) for some . Following [20, 14], the error probability is given by
[TABLE]
Note that if both and have cardinality with probability one, then the two events in the union are identical, and hence either of the two can be removed. A more stringent performance criterion also considered in the literature is the exact support recovery problem, where the error probability is given by , but our techniques currently appear to be less suited to that setting.
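To make the approximate recovery criterion concrete, here is a small sketch of the error event. It assumes that the union in the error probability has the form $\{|S \setminus \hat{S}| > d_{\max}\} \cup \{|\hat{S} \setminus S| > d_{\max}\}$, where `d_max` is a hypothetical name for the number of support errors that the criterion tolerates.

```python
def approx_recovery_error(S, S_hat, d_max):
    """True if more than d_max true indices are missed or more than d_max false indices are declared."""
    S, S_hat = set(S), set(S_hat)
    missed = len(S - S_hat)          # indices in S that the estimate fails to include
    false_positives = len(S_hat - S)  # indices in the estimate that are not in S
    return missed > d_max or false_positives > d_max


# When |S| = |S_hat|, the two counts coincide, so either event alone suffices.
S, S_hat = {0, 1, 2, 3, 4}, {0, 1, 5, 6, 7}
print(len(S - S_hat), len(S_hat - S), approx_recovery_error(S, S_hat, d_max=2))
```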
Our main goal is to derive necessary and sufficient conditions on (as a function of and ) such that vanishes as . Moreover, when considering converse results, we will not only be interested in conditions under which , but also conditions under which the stronger statement holds.
II-B Overview of Main Results
Here we state and discuss the two main results of this paper. Both theorems concern the information-theoretic limits of support recovery in the phase retrieval model described above, but with two different models of interest for the non-zero entries . We note that our results are asymptotic as and , and we seek explicit constant factors in the leading term, but leave higher-order terms unspecified. Sharper (e.g., non-asymptotic) characterizations appear to be much more challenging, and are beyond the scope of this work. In addition, we emphasize that our achievability result is based on a computationally intractable information-theoretic decoder, and approaching the fundamental limits with practical decoding techniques remains an interesting direction for future studies.
Discrete setting. The first result concerns a discrete distribution on , namely, is a uniformly random permutation of a fixed complex vector . We let be the sorted version of such that , and define the following mutual information quantities:
[TABLE]
where is the differential entropy of .
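As a concrete illustration of the discrete model, the following sketch draws the support uniformly among all size-$k$ subsets and places a uniformly random permutation of a fixed vector $b$ on it. The function name `sample_discrete_beta`, the example vector `b`, and the use of NumPy's `Generator` are our own assumptions for illustration.

```python
import numpy as np


def sample_discrete_beta(p, b, rng):
    """Draw (S, beta): S uniform over k-subsets of {0, ..., p-1}, beta_S a random permutation of b."""
    b = np.asarray(b, dtype=complex)
    k = b.size
    S = np.sort(rng.choice(p, size=k, replace=False))  # equiprobable support of size k
    beta = np.zeros(p, dtype=complex)                   # entries outside S are zero
    beta[S] = rng.permutation(b)                        # uniformly random permutation of b
    return S, beta


rng = np.random.default_rng(0)
b = np.array([1.0, 1.0, 2.0 + 1.0j, -0.5j])  # hypothetical fixed complex vector
S, beta = sample_discrete_beta(p=30, b=b, rng=rng)
```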
Theorem 1.
Consider the phase retrieval setup in Section II, with being a uniformly random permutation of a fixed complex vector . Let and , and assume that , and that with as . In addition, assume that there are distinct elements in .
We have as provided that
[TABLE]
for arbitrarily small if either of the following additional conditions holds: (i) and , or (ii) (and is arbitrary).
Conversely, under the general scaling and arbitrary , we have as whenever
[TABLE]
for arbitrarily small .
Proof:
See Section IV. ∎
We observe that the upper and lower bounds are nearly in closed form, other than the optimization over a single scalar . Moreover, the two have a very similar form, with the main difference being the appearance of vs. in the numerator, and vs. in the denominator. The bounds hold for an arbitrary log-concave noise distribution .
Since the noise variance is fixed and the measurement matrix has normalized entries, the assumption corresponds to the case that the signal-to-noise ratio (SNR) is constant. We observe that under this assumption, the upper and lower bounds provide matching $\Theta\big(k\log\frac{p}{k}\big)$ behavior. Perhaps more significantly, in the high-SNR limit (i.e., ), we obtain nearly identical constant factors. To see this, it suffices to crudely lower bound by $\frac{1}{2}\log\big[\big(\frac{4}{\exp(2h(Z))}\big)\big(\lfloor\alpha k\rfloor|b_{\min}|^{2}\big)^{2}+1\big]$, and upper bound by $\frac{1}{2}\log\big[\big(\frac{2\pi e}{\exp(2h(Z))}\big)\big(\lfloor\alpha k\rfloor|b_{\max}|^{2}\big)^{2}+1\big]+\frac{1}{2}\log\big[1+\frac{\lfloor\alpha k\rfloor k|b_{\max}|^{4}}{\lfloor\alpha k\rfloor^{2}|b_{\min}|^{4}}\big]+\frac{1}{2}\log\big(\frac{\pi e}{2}\big)$. For any bounded away from zero, since , these both behave as as (or equivalently ), which implies that the maxima in (8) and (9) are attained by in this limit, and the upper and lower bounds coincide up to a factor of .
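The two explicit expressions in the preceding paragraph are easy to evaluate numerically. The sketch below assumes Gaussian noise $Z \sim \mathcal{N}(0,1)$, so that $h(Z) = \frac{1}{2}\log(2\pi e)$, and takes $|b_{\min}| = |b_{\max}|$ for simplicity. It shows the ratio of the upper to the lower expression decreasing towards one as the magnitude of the non-zero entries (and hence the SNR) grows. The function name `crude_bounds` is hypothetical.

```python
import math


def crude_bounds(alpha, k, b_min, b_max, h_z):
    """Evaluate the crude lower and upper expressions from the high-SNR discussion (in nats)."""
    m = math.floor(alpha * k)    # floor(alpha * k)
    ent_pow = math.exp(2 * h_z)  # exp(2 h(Z))
    lower = 0.5 * math.log((4 / ent_pow) * (m * abs(b_min) ** 2) ** 2 + 1)
    upper = (
        0.5 * math.log((2 * math.pi * math.e / ent_pow) * (m * abs(b_max) ** 2) ** 2 + 1)
        + 0.5 * math.log(1 + m * k * abs(b_max) ** 4 / (m ** 2 * abs(b_min) ** 4))
        + 0.5 * math.log(math.pi * math.e / 2)
    )
    return lower, upper


h_gauss = 0.5 * math.log(2 * math.pi * math.e)  # h(Z) for Z ~ N(0, 1), in nats
for b in (1e1, 1e3, 1e6):
    low, up = crude_bounds(alpha=0.5, k=100, b_min=b, b_max=b, h_z=h_gauss)
    print(f"|b| = {b:.0e}: lower = {low:.3f}, upper = {up:.3f}, ratio = {up / low:.3f}")
```

Both expressions grow like $\log(\lfloor \alpha k\rfloor |b|^2)$, while the additive constants in the upper expression stay fixed, which is why the ratio slowly approaches one.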
We believe that the additional assumptions on and in the achievability part are artifacts of our analysis, and note that similar assumptions were made for the linear model in [14]. The conditions in Theorem 1 are less restrictive than those in [14] since we are considering approximate recovery instead of exact recovery.
Gaussian setting. We now turn to a (complex) Gaussian model on the non-zero entries in which , for some . This is analogous to a model considered for the linear setting in [20, 14]. Our result is stated in terms of the mutual information quantities
[TABLE]
where is defined as
[TABLE]
with denoting the cumulative distribution function of a random variable.
Theorem 2.
Consider the phase retrieval setup in Section II where , and with for some constant . If with , then we have as provided that
[TABLE]
for arbitrarily small .
Conversely, under the broader scaling regime with , we have as whenever
[TABLE]
for arbitrarily small .
The assumption in the achievability part (which holds, for example, when for some ) is rather restrictive compared to the general scaling in the converse part. The former arises from a significant technical challenge (see Proposition 14 below), and we expect that the requirement is merely an artifact of our analysis. (Footnote 3: In fact, extending our analysis to the broader scaling regime (for some ) leads to the correct scaling , but unfortunately the resulting constant factors are quite loose compared to Theorem 2.) In addition, we note that while we allowed an arbitrary log-concave distribution in the discrete setting, here we have focused on to simplify the analysis. Despite this restriction, we believe that Gaussian noise still captures the essential features of the phase retrieval problem.
Once again, the scaling amounts to a fixed SNR. As mentioned in [20], exact recovery is not possible for Gaussian non-zero entries when the SNR is constant, and may require a huge number of measurements even when the SNR increases with . This motivates the consideration of approximate recovery in this setting.
The differences between the upper and lower bounds are similar to the discrete case. In particular, although the constants differ, the bounds are similar, and always have the same scaling laws. In the limit , we have and ; in this case, the maxima in (13)-(14) are both achieved with , and hence, the two bounds coincide to within a multiplicative factor of .
Comparison to the linear model. In Figures 1 and 2, we plot the upper and lower bounds of Theorem 1 and Theorem 2 for under various signal-to-noise ratios (SNRs), along with the counterparts for the linear model in [14]. (Footnote 4: The approximate recovery result for the discrete case was not explicitly stated in [14], but it is easily inferred from the analysis, and amounts to a much simpler version of the analysis of the present paper.) For the discrete model, we focus on the simple case that and
[TABLE]
for some , corresponding to in Theorem 1. In Appendix A, we describe how we equate the SNR in the linear and phase retrieval models, and also how to evaluate the bounds of Theorem 1 when .
As predicted by the discussion following Theorems 1 and 2, the upper and lower bounds are close (though still with a constant gap) when the SNR is sufficiently high. In addition, in this regime the information-theoretic limits of the phase retrieval model and the linear model are very similar, especially in the Gaussian case.
However, at lower SNR, the gap for the phase retrieval model can widen significantly more than that of the linear model. This appears to be because the key mutual information quantities arising in the analysis can be expressed in closed form in the linear model, but require possibly-loose bounds in the phase retrieval model. Consequently, closing these gaps (at least partially) only requires improved mutual information bounds for the phase retrieval setting (cf. Section III-D).
III Auxiliary Results
In this section, we introduce the main auxiliary results needed to prove Theorems 1 and 2. We first introduce some notation and recall the initial bounds for general observation models from [14], and then present the relevant log-concavity properties, mutual information bounds, and concentration bounds.
III-A Information-Theoretic Definitions
We first outline some information-theoretic definitions from [14], recalling that we are conditioning on a fixed throughout. We consider partitions of the support set into two disjoint sets and , where will typically correspond to an overlap between and some other set (i.e., , the “equal” part), and will correspond to the indices in one set but not in the other (i.e., , the “differing” part).
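For intuition, the following sketch enumerates the partitions of a fixed support into a "differing" part and an "equal" part, and recovers the partition induced by a second set. The names `support_partitions`, `partition_from_pair`, `s_dif`, and `s_eq` are illustrative assumptions rather than notation confirmed by the text above.

```python
from itertools import combinations
from math import comb


def support_partitions(s, ell):
    """Yield every (s_dif, s_eq) with |s_dif| = ell and s_eq = s minus s_dif."""
    s = tuple(sorted(s))
    for s_dif in combinations(s, ell):
        s_eq = tuple(i for i in s if i not in s_dif)
        yield s_dif, s_eq


def partition_from_pair(s, s_other):
    """Split s into the indices missing from s_other (differing) and the overlap (equal)."""
    s, s_other = set(s), set(s_other)
    return tuple(sorted(s - s_other)), tuple(sorted(s & s_other))


s = {0, 3, 5, 8}
print(list(support_partitions(s, 1)))
print(partition_from_pair(s, {0, 3, 6, 7}))  # ((5, 8), (0, 3))
```

There are $\binom{k}{\ell}$ such partitions for each size $\ell$ of the differing part.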
For fixed and a corresponding pair , we introduce the notation
[TABLE]
where is the marginal distribution of (4). While the left-hand sides of (16) and (17) represent the same quantities for any pair , it will still prove convenient to work with these in place of the right-hand sides. In particular, this allows us to introduce the marginal distributions
[TABLE]
where . Using the preceding definitions, we introduce two information densities (in the terminology of the information theory literature, e.g., [23]). The first contains probabilities averaged over ,
[TABLE]
whereas the second conditions on :
[TABLE]
where is the -th measurement, and the single-letter information density is
[TABLE]
Averaging (22) with respect to the distribution in (17) conditioned on yields a conditional mutual information quantity, which is denoted by
[TABLE]
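The relationship between an information density and the mutual information it averages to can be checked by Monte Carlo. The sketch below does not use the phase retrieval model. As a toy assumption, it substitutes a scalar Gaussian channel $Y = X + N$ with $X \sim \mathcal{N}(0,P)$ and $N \sim \mathcal{N}(0,1)$, for which $\log\frac{P_{Y|X}(y|x)}{P_Y(y)}$ has a closed form and its mean is $\frac{1}{2}\log(1+P)$ nats.

```python
import numpy as np


def info_density_gaussian(x, y, P):
    """log P_{Y|X}(y|x) - log P_Y(y) for Y = X + N, X ~ N(0, P), N ~ N(0, 1), in nats."""
    log_cond = -0.5 * np.log(2 * np.pi) - 0.5 * (y - x) ** 2
    log_marg = -0.5 * np.log(2 * np.pi * (P + 1)) - 0.5 * y ** 2 / (P + 1)
    return log_cond - log_marg


rng = np.random.default_rng(0)
P, n_samples = 3.0, 200_000
x = rng.normal(0.0, np.sqrt(P), n_samples)
y = x + rng.standard_normal(n_samples)
i_vals = info_density_gaussian(x, y, P)
# The empirical mean of the information density estimates the mutual information.
print(i_vals.mean(), 0.5 * np.log1p(P))
```

The per-sample values fluctuate considerably around their mean. Controlling these fluctuations for the phase retrieval model is the role of the concentration bounds in Section III-E.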
III-B General Achievability and Converse Bounds
For the general support recovery problem with probabilistic models, the following achievability and converse bounds are given in [14]. While these are stated for the real-valued setting in [14], the proofs apply verbatim to the complex-valued setting.
Theorem 3 ([14, Theorem 5]).
Fix any constants , and , and functions such that the following holds:
[TABLE]
for all with and for all in some (typical) set . Then we have
[TABLE]
where
[TABLE]
Theorem 4 ([14, Theorem 6]).
Fix any constants , and functions such that the following holds:
[TABLE]
for all with , and for all in some (typical) set . Then we have
[TABLE]
The steps for applying and simplifying these bounds are as follows:
1. Establish an explicit characterization of each mutual information term (e.g., upper and lower bounds);
2. Use concentration of measure to find expressions for each function and in Theorems 3 and 4, i.e., functions satisfying (24) and (28);
3. According to the specific model on the non-zero entries under consideration, choose a suitable typical set , and also a value of , so that both and can be proved to be vanishing as ;
4. Combine and simplify the preceding steps to deduce the final sample complexity bound.
These steps turn out to be highly non-trivial in the phase retrieval setting. In the following subsections, we provide general-purpose tools for Steps 1 and 2; we defer Steps 3 and 4 to Section IV for discrete , and to Section V for Gaussian .
III-C Log-Concavity Properties
Both our mutual information bounds and concentration bounds will crucially rely on the log-concavity properties stated in the following lemma.
Lemma 5.
Under the phase retrieval setup in Section II, we have the following:
1. Given and , the conditional marginal density of is log-concave;
2. Given , , and for some , the conditional marginal density of is log-concave.
Proof:
Recall that is log-concave by assumption, and with having i.i.d. entries. In other words, , where is the squared magnitude of a random variable. We observe that is log-concave, since the distribution with two degrees of freedom is log-concave [24] and the convolution of two log-concave functions is log-concave [25].
In addition, given , , and , we have , where is the squared magnitude of a random variable. This distribution on is also log-concave by a similar argument, and the fact that the non-central distribution with two degrees of freedom is log-concave [24]. ∎
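The convolution argument can be checked numerically in the simplest case. Assuming the relevant entries are $\mathcal{CN}(0,1)$, the squared magnitude is exponential with mean one (a scaled $\chi^2$ with two degrees of freedom). Taking Gaussian noise $Z \sim \mathcal{N}(0,\sigma^2)$ as one log-concave example, the density of the sum is $f(y) = e^{\sigma^2/2 - y}\,\Phi\big((y-\sigma^2)/\sigma\big)$. The sketch checks on a grid that $\log f$ has non-positive second differences, and that a bimodal mixture (which is not log-concave) fails the same test. All function names are hypothetical.

```python
import math


def log_density_exp_plus_gauss(y, sigma):
    """log f(y) for E + Z with E ~ Exp(1) and Z ~ N(0, sigma^2)."""
    t = (y - sigma ** 2) / sigma
    gauss_cdf = 0.5 * math.erfc(-t / math.sqrt(2))  # Phi(t)
    return sigma ** 2 / 2 - y + math.log(gauss_cdf)


def log_density_bimodal(y):
    """log-density of an equal mixture of N(-3, 1) and N(3, 1), which is not log-concave."""
    c = 1 / math.sqrt(2 * math.pi)
    return math.log(0.5 * c * math.exp(-(y + 3) ** 2 / 2) + 0.5 * c * math.exp(-(y - 3) ** 2 / 2))


def max_second_difference(log_f, lo, hi, h=0.01):
    """Largest log_f(y-h) - 2 log_f(y) + log_f(y+h) on a grid; non-positive when log_f is concave."""
    n = int(round((hi - lo) / h))
    vals = [log_f(lo + i * h) for i in range(n + 1)]
    return max(vals[i - 1] - 2 * vals[i] + vals[i + 1] for i in range(1, n))


for sigma in (0.5, 1.0, 2.0):
    print(sigma, max_second_difference(lambda y: log_density_exp_plus_gauss(y, sigma), -5, 10))
print("bimodal", max_second_difference(log_density_bimodal, -5, 5))
```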
III-D Mutual Information Bounds
An exact expression for the mutual information does not appear to be possible, but the following theorem gives closed-form upper and lower bounds. Although there is a gap between the two in general, their asymptotic behavior is similar when grows large; this fact ultimately leads to tight sample complexity bounds in the high-SNR setting.
Theorem 6.
For the phase retrieval setup in Section II, the following holds for defined in (23):
[TABLE]
where and .
Proof:
The upper bound is based on the entropy power inequality and the maximum entropy property of the Gaussian distribution, and the lower bound is based on (known) results that give nearly-matching lower bounds for log-concave random variables. The details are given in Appendix B. ∎
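The two ingredients can be illustrated on a scalar log-concave variable. The Gaussian maximum-entropy property gives $h(Y) \le \frac{1}{2}\log(2\pi e\,\mathrm{Var}(Y))$, and a known lower bound for real log-concave variables gives $h(Y) \ge \frac{1}{2}\log(4\,\mathrm{Var}(Y))$. These appear to match the constants $4$ and $2\pi e$ in the crude bounds of Section II-B, but we assume rather than confirm that they are the bounds used in Appendix B, which is not reproduced here. The sketch evaluates $h(Y)$ by numerical integration for a toy $Y = E + Z$ with $E \sim \mathrm{Exp}(1)$ and $Z \sim \mathcal{N}(0,\sigma^2)$, whose variance is $1+\sigma^2$.

```python
import math


def density_exp_plus_gauss(y, sigma):
    """Density of E + Z with E ~ Exp(1) and Z ~ N(0, sigma^2)."""
    t = (y - sigma ** 2) / sigma
    return math.exp(sigma ** 2 / 2 - y) * 0.5 * math.erfc(-t / math.sqrt(2))


def entropy_and_mass(f, lo, hi, h=1e-3):
    """Riemann-sum approximations of -int f log f and int f over [lo, hi], in nats."""
    ent, mass = 0.0, 0.0
    for i in range(int(round((hi - lo) / h)) + 1):
        v = f(lo + i * h)
        if v > 0.0:
            ent -= v * math.log(v) * h
            mass += v * h
    return ent, mass


sigma = 1.0
h_y, mass = entropy_and_mass(lambda y: density_exp_plus_gauss(y, sigma), -10.0, 40.0)
var_y = 1.0 + sigma ** 2
lower = 0.5 * math.log(4 * var_y)
upper = 0.5 * math.log(2 * math.pi * math.e * var_y)
print(f"mass = {mass:.4f}, {lower:.3f} <= h(Y) = {h_y:.3f} <= {upper:.3f}")
```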
III-E Concentration Bounds
Perhaps the most technically challenging part of our analysis is to establish concentration bounds amounting to explicit expressions for and in Theorems 3 and 4.
Before stating the final concentration bounds, we provide a general result that may be of independent interest, giving a concentration bound on conditional information random variables of the form (in generic notation) under certain log-concavity assumptions. Such a result is provided as a corollary of the following, which considers generic random variables that need not be associated with the phase retrieval problem at this point.
Proposition 7.
Suppose that with joint density function . For each , define
[TABLE]
and assume that
[TABLE]
for all . Moreover, for an arbitrary positive number (to be chosen later), define
[TABLE]
and assume that
[TABLE]
Then, the following holds:
[TABLE]
where
[TABLE]
Proof:
We follow the general approach of [22], which considers the unconditional information variable ; however, many of the details differ significantly. The reader is referred to Appendix C. ∎
From this, we immediately deduce a similar result for i.i.d. product distributions.
Corollary 8.
Let . Suppose that with distribution (i.e., i.i.d. on ), where satisfies (33) and (35). Then, the following holds:
[TABLE]
where
[TABLE]
and is defined in (34).
Proof:
The i.i.d. assumption readily yields and $\mathbb{E}[\exp(\mu\tilde{h}(\mathbf{Y}|\mathbf{X}))]=\big(\mathbb{E}[\exp(\mu\tilde{h}(Y|X))]\big)^{n}$, where . Hence,
[TABLE]
and the corollary follows by bounding the expectation via Proposition 7. ∎
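The factorization used in this proof can be sanity-checked on a toy discrete joint distribution, with probability mass functions standing in for densities. We assume that $\tilde{h}(y|x) = -\log P_{Y|X}(y|x)$ is the conditional information content. The sketch computes both sides of $\mathbb{E}[\exp(\mu\tilde{h}(\mathbf{Y}|\mathbf{X}))] = (\mathbb{E}[\exp(\mu\tilde{h}(Y|X))])^n$ exactly by enumeration; the pmf values are arbitrary.

```python
import math
from itertools import product

# Arbitrary toy joint pmf P_{XY} on {0, 1} x {0, 1, 2} (hypothetical values).
P_XY = {(0, 0): 0.3, (0, 1): 0.1, (0, 2): 0.1,
        (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.2}


def cond_info_content(pmf, x, y):
    """-log P_{Y|X}(y | x), in nats."""
    p_x = sum(v for (a, _), v in pmf.items() if a == x)
    return -math.log(pmf[(x, y)] / p_x)


def mgf_single(pmf, mu):
    """E[exp(mu * h~(Y|X))] for a single pair (X, Y)."""
    return sum(v * math.exp(mu * cond_info_content(pmf, x, y)) for (x, y), v in pmf.items())


def mgf_iid(pmf, mu, n):
    """The same expectation for n i.i.d. pairs, by brute-force enumeration."""
    total = 0.0
    for pairs in product(pmf.items(), repeat=n):
        prob = math.prod(v for _, v in pairs)
        info = sum(cond_info_content(pmf, x, y) for (x, y), _ in pairs)
        total += prob * math.exp(mu * info)
    return total


mu, n = 0.4, 3
print(mgf_iid(P_XY, mu, n), mgf_single(P_XY, mu) ** n)
```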
We are now ready to state a general result on the concentration of conditional information variables.
Corollary 9.
Let with . Then, under conditions (33) and (35) of Proposition 7, the following holds for any :
[TABLE]
where is defined in (38), in (34), and in (39).
Proof:
With Corollary 8 in place, this is a fairly straightforward application of the Chernoff bound. The details are given in Appendix C-C. ∎
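The Chernoff step can be seen in a toy case where everything is explicit. Assume $X_i \sim \mathrm{Exp}(1)$ i.i.d., which has a log-concave density. The information content $-\log f(X_i) = X_i$ then has mean $h(X) = 1$. Optimizing $\mu$ in $\Pr[\sum_i (X_i - 1) \ge n\delta] \le \exp(-n(\mu\delta - \log\mathbb{E}[e^{\mu(X-1)}]))$ gives the exponent $\delta - \log(1+\delta)$. The sketch compares this bound with a grid search over $\mu$ and with the exact Gamma tail. It illustrates the technique only, not the specific bound in Corollary 9.

```python
import math


def chernoff_closed_form(n, delta):
    """exp(-n (delta - log(1 + delta))), the optimized bound attained at mu = delta / (1 + delta)."""
    return math.exp(-n * (delta - math.log1p(delta)))


def chernoff_grid(n, delta, steps=10_000):
    """The same bound, maximizing mu * (1 + delta) + log(1 - mu) over a grid of mu in (0, 1)."""
    best = max(i / steps * (1 + delta) + math.log1p(-i / steps) for i in range(1, steps))
    return math.exp(-n * best)


def gamma_tail(n, t):
    """Exact P(X_1 + ... + X_n >= t) for i.i.d. Exp(1): e^{-t} * sum_{j<n} t^j / j!."""
    term, total = 1.0, 1.0
    for j in range(1, n):
        term *= t / j
        total += term
    return math.exp(-t) * total


n, delta = 20, 0.5
print(gamma_tail(n, n * (1 + delta)), chernoff_closed_form(n, delta), chernoff_grid(n, delta))
```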
Remark 10.
Some remarks are in order.
- In [22, Theorem 3.1], the authors showed that for any and any random vector such that for all (i.e., is absolutely integrable). Proposition 7 shows that this fact can be extended to conditional distributions under some assumptions on the joint distribution .
- When and are independent, Proposition 7 is very similar to [22, Theorem 3.1].
- When we apply Corollary 9 to the phase retrieval problem, we will bound using the log-concavity properties in Lemma 5.
- If and were jointly log-concave, a variant of (36) with an alternative definition for could be used based on [22, Theorem 3.1] and the union bound, since and . However, such a bound is not suitable for our purposes, since the measurement variables and outputs in the phase retrieval problem are not jointly log-concave.
- Alternatively, using only the fact that is log-concave for all , [22, Theorem 3.1] gives for suitably-defined that
[TABLE]
However, (40) does not appear to follow from (46).
Although the preceding results are general, finding an explicit expression or upper bound for in (34) is non-trivial. With some technical effort, we are able to attain such a bound for the phase retrieval model and deduce the following key concentration result used in the proofs of Theorems 1 and 2.
Theorem 11.
Under the phase retrieval setup in Section II, the following bounds hold:
[TABLE]
for all , where is defined in (23), is a constant depending on (Footnote 5: The definition is given in (277) in Lemma 23.), and is defined in (39).
Proof:
See Appendix D. ∎
It turns out that the constant behaves as whenever , leading to the following corollary.
Corollary 12.
For the complex phase retrieval problem in (1), equations (47) and (48) hold with replaced by some constant under the condition .
Proof:
See Appendix D. ∎
IV Proof of Theorem 1 (Discrete )
As a stepping stone to proving Theorem 1, we state the following lemma, which can be thought of as a version of that theorem before applying the relevant mutual information bounds and asymptotic simplifications. Recall that is defined in (23).
Lemma 13.
Consider the setup of Theorem 1 with being a uniformly random permutation of satisfying , and , and , and distinct elements in .
We have as provided that
[TABLE]
for arbitrarily small if either of the following additional conditions holds: (i) and ; or (ii) .
Conversely, for general and , we have as provided that
[TABLE]
for arbitrarily small .
IV-A Proof of Lemma 13
We apply Theorem 3 in several steps as follows.
Step 1: Choose the typical set. Let be the set of all permutations of the fixed complex vector . Under the conditions and , we observe that the quantity also behaves as , while behaves as (note that we only consider with cardinality , a constant fraction of the total ). Hence, we find from (6) of Theorem 6 that
[TABLE]
In addition, since there are at most possible random permutations by the definition of , choosing gives ; this immediately follows by writing in (27).
Step 2: Bound the information density tail probabilities. Fix (to be chosen later), and define
[TABLE]
for each , where is defined in Corollary 12.
Now, for each integer representing , set
[TABLE]
By setting in (47), and applying Corollary 12, we have
[TABLE]
Similarly, we obtain from (48) that
[TABLE]
This means that the conditions (24) and (28) are satisfied with and defined in (53), respectively.
Step 3: Control the remainder terms. We first consider the remainder term in the converse bound (30) resulting from (56). Since , we have for some . Since as stated in (51), we deduce from (52) that
[TABLE]
so that (54) yields
[TABLE]
We choose to be a slowly vanishing function of , so that a simple Taylor expansion in the definition of in (39) yields $r\big(\Theta(\delta_2)\big) = \Theta(\delta_2^2)$. Therefore, (58) vanishes as if $n = \omega\big(\frac{1}{\delta_2^2}\big)$.
We now turn to the achievability part. First observe that the term in (26) vanishes as provided that . Since , this is equivalent to
[TABLE]
for all . From (53) and (59), we find that provided that
[TABLE]
as for all , where we have used .
Since for some , (60) holds provided that
[TABLE]
for arbitrarily small . Again using for slowly vanishing (as established in the above converse part), we find that this condition simplifies to $n = \Omega\big(\frac{k}{\delta_2^2}\big)$.
We also need to consider the effect of the term in Theorem 3, recalling that we already established that with . For the first case in Lemma 13, i.e., and , we have . In the second case, i.e., and , we have $\gamma = O(k\log k) = o\big(k\log\frac{p}{k}\big)$. Hence, in both cases, we have $\gamma = o\big(k\log\frac{p}{k}\big)$.
Step 4: Combine and simplify. For the converse part, since (58) vanishes when $n = \omega\big(\frac{1}{\delta_2^2}\big)$, we deduce from Theorem 4 (with and sufficiently slowly) that when (50) holds, as required. Specifically, (50) is merely an asymptotic simplification of (29).
For the achievability part, by choosing and sufficiently slowly in Theorem 3, we find that the condition (25) reduces to
[TABLE]
for arbitrarily small . Since and for some , the first term in the numerator of (62) behaves as , and the second term behaves as . Since for both cases (i) and (ii) of Lemma 13, we have , it immediately follows that the numerator in (62) is dominated by the first term and the others can be factored into the remainder term . Moreover, the condition $n = \Omega\big(\frac{k}{\delta_2^2}\big)$ stated following (61) behaves as $o\big(\alpha k\log\frac{p}{k}\big)$ when sufficiently slowly (e.g., ). Combining these observations, we deduce that we only require (62), with the first term alone kept in the numerator, and the rest factored into in (49).
IV-B Proof of Theorem 1
Recall the definitions and . Since , we have for some .
For the achievability part, we use the lower bound in (6) of Theorem 6. Since this lower bound is increasing in and does not depend on , we have the following whenever :
[TABLE]
recalling that defined in (6) replaces by the value corresponding to the lowest-magnitude entries of . Hence, (8) of Theorem 1 follows from (49) of Lemma 13 by observing that the numerator of (49) behaves as $\big(\alpha k\log\frac{p}{k}\big)(1+o(1))$ and the denominator is lower bounded by via (63).
For the converse part, we use the upper bound in (6) of Theorem 6. While this bound depends on and in a more complicated fashion, the converse bound (50) remains valid when we replace the maximum over by any fixed choice. Under the choice in which contains the indices corresponding to the entries of with the smallest magnitude, (6) yields
[TABLE]
where is defined in (6).
Regarding the numerator in (50), it was shown in [14, Proof of Cor. 2] via simple asymptotic expansions that the term $\log\big(\sum_{d=0}^{\lfloor\alpha^{*}k\rfloor}\binom{p-k}{d}\binom{|s_{\rm dif}|}{d}\big)$ is dominated by as with , and that the overall numerator in (50) simplifies to if . Combining this fact with (64), we have that as if
[TABLE]
for some . This yields equation (9) of Theorem 1.
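As a numerical sanity check of the simplification borrowed from [14, Proof of Cor. 2], the sketch below evaluates the logarithm of the binomial sum exactly using Python's arbitrary-precision integers and divides it by $\alpha^* k\log\frac{p}{k}$. The choice $|s_{\rm dif}| = k$ and the values of $k$, $\alpha^*$, and $p$ are hypothetical illustration parameters, not values taken from the paper. Under these assumptions the ratio stays above 1 and decreases toward 1 as $p/k$ grows, but only slowly, which is what the $(1+o(1))$ factor hides.

```python
import math


def log_binomial_sum(p, k, alpha, s_dif):
    """log( sum_{d=0}^{floor(alpha*k)} C(p-k, d) * C(s_dif, d) ), computed exactly."""
    d_max = math.floor(alpha * k)
    total = sum(math.comb(p - k, d) * math.comb(s_dif, d) for d in range(d_max + 1))
    return math.log(total)  # math.log accepts arbitrarily large Python ints


# Hypothetical illustration parameters (not from the paper); |s_dif| = k is an assumption.
k, alpha = 100, 0.5
ratios = []
for p in (10**4, 10**6, 10**9, 10**12):
    ratio = log_binomial_sum(p, k, alpha, s_dif=k) / (alpha * k * math.log(p / k))
    ratios.append(ratio)
    print(f"p = {p:>14d}   ratio = {ratio:.3f}")
```

Computing the sum exactly, rather than through Stirling approximations, avoids mixing approximation error into what is itself an asymptotic statement.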
V Proof of Theorem 2 (Gaussian )
One of the key challenges in the Gaussian setting compared to the discrete setting is bounding the quantity appearing in Theorem 3. As noted in [14], this roughly amounts to bounding the mutual information quantity , for which the approaches proposed in [14] appear to be insufficient. The following proposition states a bound on resulting from a novel approach.
Proposition 14.
Under the phase retrieval setup in Section II with for some , for some $\sigma_\beta^2 = \Theta\big(\frac{1}{k}\big)$, and with , the following holds:
[TABLE]
for any , where is defined in (27) of Theorem 3, i.e., $P_0(\gamma) := \mathbb{P}\Big[\log\frac{f_{\mathbf{Y}|\mathbf{X}_s\beta_s}(\mathbf{Y}|\mathbf{X}_s,\beta)}{f_{\mathbf{Y}|\mathbf{X}_s}(\mathbf{Y}|\mathbf{X}_s)} > \gamma\Big]$.
Proof:
See Section V-B. ∎
The following proposition characterizes the behavior of the entries of having the smallest magnitude for fixed . For the real linear model in [14], follows a chi-squared distribution with one degree of freedom for all . However, for our phase retrieval model (cf. Section II), follows a chi-squared distribution with two degrees of freedom. This difference only amounts to a minor change in the definition of in (12), and [14, Prop. 3] extends immediately to the following.
Proposition 15.
([14, Prop. 3]) For i.i.d. on $\mathcal{CN}\big(0,\frac{\sigma_\beta^2}{k}\big)$ for fixed , we have with probability one that the following holds for all :
[TABLE]
where is the permutation of whose entries are listed in increasing order of magnitude, and is defined in (12).
Note that this result is essentially an application of the Glivenko-Cantelli theorem [26, Thm. 19.1], stating uniform convergence of the empirical cumulative distribution function (CDF) to the true CDF.
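The following sketch illustrates Proposition 15 numerically under simplifying assumptions of our own: the entries are drawn i.i.d. $\mathcal{CN}(0,1)$, so each $|b_i|^2$ is a standard exponential (half of a chi-squared variable with two degrees of freedom), and the partial sums are normalized by $k$. For this normalization, a short calculation gives the limit $\alpha + (1-\alpha)\log(1-\alpha)$; the function `g_exp` below is our own derivation and may differ from $g$ in (12) by the variance normalization used there.

```python
import numpy as np


def g_exp(alpha):
    """Limit of (1/k) * (sum of the floor(alpha*k) smallest |b_i|^2) when |b_i|^2 ~ Exp(1)."""
    alpha = np.atleast_1d(np.asarray(alpha, dtype=float))
    out = alpha.copy()
    inner = alpha < 1.0  # at alpha = 1 the limit is exactly 1
    out[inner] += (1.0 - alpha[inner]) * np.log1p(-alpha[inner])
    return out


def normalized_partial_sums(k, rng):
    """Entry m is (1/k) * (sum of the m smallest |b_i|^2), for m = 0, ..., k."""
    # b_i ~ CN(0, 1): real and imaginary parts i.i.d. N(0, 1/2)
    b = (rng.standard_normal(k) + 1j * rng.standard_normal(k)) / np.sqrt(2.0)
    sorted_sq = np.sort(np.abs(b) ** 2)  # increasing order of magnitude
    return np.concatenate(([0.0], np.cumsum(sorted_sq))) / k


rng = np.random.default_rng(0)
alphas = np.linspace(0.0, 1.0, 1001)
for k in (100, 10_000, 100_000):
    partial = normalized_partial_sums(k, rng)
    m = np.floor(alphas * k).astype(int)
    dev = np.max(np.abs(partial[m] - g_exp(alphas)))
    print(f"k = {k:>7d}   max_alpha |empirical - limit| = {dev:.4f}")
```

The maximum deviation over the grid of $\alpha$ shrinks roughly like $1/\sqrt{k}$, which is the uniform (Glivenko-Cantelli type) convergence the proposition relies on.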
V-A Proof of Theorem 2
The proof of Theorem 2 follows the same high-level steps as those for the discrete case.
Step 1: Choose a typical set. Based on the result in Proposition 15, we set to be the set of vectors such that $\max_{\alpha\in[0,1]}\big|\frac{1}{k\sigma_\beta^2}\sum_{i=1}^{\lfloor\alpha k\rfloor}|(b'_s)_i|^2 - g(\alpha)\big| \le \varepsilon$ as , where is chosen to decay sufficiently slowly so that . We therefore have
[TABLE]
for all , and in particular by setting . In addition, we obtain
[TABLE]
by using and .
We proceed similarly to Section IV-B for the discrete setting, recalling that . For the achievability part, (68) and the mutual information lower bound in (6) (with ) imply (within the typical set) that for any with , we have
[TABLE]
where is defined in (10).
For the converse part, we do not need to consider all pairs , since any fixed choice still provides a valid converse. Hence, for a given cardinality , we only consider the choice such that contains the indices corresponding to the entries of with the smallest magnitude. Under this choice, we have from (68)–(69) and the upper bound in (6) (with ) that
[TABLE]
where is defined in (11).
Step 2: Bound the information density tail probabilities. We again make use of Theorem 11 and its subsequent expression for and in (54).
Step 3: Control the remainder terms. Recall that is defined in (27) of Theorem 3. By Proposition 14, we have under any choice of satisfying for some growing to arbitrarily slowly. When this growth is sufficiently slow and $n = O\big(k\log\frac{p}{k}\big)$, we have
[TABLE]
due to the assumption . Note that $n = O\big(k\log\frac{p}{k}\big)$ holds trivially under the condition (14) in the converse, whereas for the achievability we can assume without loss of generality that (13) holds with equality, since additional measurements can only improve the information-theoretic performance.
By our choice of , we may focus on realizations of satisfying (67). For such realizations, we have for all with that by (67) and the assumption that . Hence, by (70)–(71) and the fact that for some , we have as . By using the same arguments as (57) and (58), we deduce that the remainder term resulting from (56) in the converse bound vanishes asymptotically if $n = \omega\big(\frac{1}{\delta_2^2}\big)$.
For the achievability part, we have as if $n = \Omega\big(\frac{k}{\delta_2^2}\big)$ by using the same arguments as (59)–(61). Recalling that we also established above (72) that , we deduce that since by Theorem 3.
Step 4: Combine and simplify. The condition (13) is obtained from (25) of Theorem 3 and (70). By the assumption and (72), the numerator in (25) of Theorem 3 is dominated by , which behaves as $\big(\alpha k\log\frac{p}{k}\big)(1+o(1))$. The term $\gamma = o\big(k\log\frac{p}{k}\big)$ (see (72)) and the term in (25) have been absorbed into ; note that the latter term behaves as when sufficiently slowly.
The converse bound in (14) is obtained similarly by using (29) of Theorem 4 and (71). Note that by [14, Proof of Cor. 2], we have that $\log\big(\sum_{d=0}^{\lfloor\alpha^{*}k\rfloor}\binom{p-k}{d}\binom{|s_{\rm dif}|}{d}\big)$ simplifies to $\big(\alpha^{*}k\log\frac{p}{k}\big)(1+o(1))$. Combining this fact with the assumption that , and observing that for some , the numerator in (29) of Theorem 4 simplifies to , thus yielding (14).
V-B Proof of Proposition 14
Overview. We first outline the intuition behind the proof. In [14], the method for controlling was upper bounding via the expansion . Our analysis is instead based on the expansion (note that is independent of ). However, a difficulty with this expansion is in showing that is not too negative, and we overcome this difficulty as follows:
- Carefully choose a typical set in which the triplet lies with high probability;
- Show that a quantity similar to , but with conditioning on lying in the typical set, cannot be too negative, by showing that given , the most probable also has a surrounding region of vectors with a similar conditional density value. This limits how high the conditional density of can be, and hence how negative the differential entropy can be.
We proceed in several steps.
Defining a typical set. Let
[TABLE]
with , , and , where is the Frobenius norm, and
[TABLE]
By the union bound, we have
[TABLE]
Recall that , , and are i.i.d. Gaussian vectors with variances , , and respectively. Applying the weak law of large numbers to the first and third probabilities, and Markov’s inequality to the middle one, we deduce that as (with and simultaneously).
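The sketch below illustrates the weak-law-of-large-numbers step for a generic constraint of the form $\big|\|Z\|_2^2/n - \sigma^2\big| \le \varepsilon$, under our own assumption that $Z$ has $n$ i.i.d. $\mathcal{CN}(0,\sigma^2)$ entries; the exact constraints defining the typical set above are not reproduced here, and $\sigma^2$ and $\varepsilon$ are hypothetical values. Since each $|Z_i|^2$ is exponential with mean $\sigma^2$, $\|Z\|_2^2$ is Gamma-distributed with shape $n$ and scale $\sigma^2$, which lets the code sample the squared norm directly.

```python
import numpy as np


def prob_outside(n, sigma2, eps, trials, rng):
    """Monte Carlo estimate of P[ | ||Z||^2 / n - sigma2 | > eps ] for Z with i.i.d. CN(0, sigma2) entries."""
    # ||Z||^2 is a sum of n i.i.d. exponentials with mean sigma2, i.e. Gamma(shape=n, scale=sigma2)
    norms_sq = rng.gamma(shape=n, scale=sigma2, size=trials)
    return np.mean(np.abs(norms_sq / n - sigma2) > eps)


rng = np.random.default_rng(3)
sigma2, eps = 1.0, 0.1  # hypothetical noise variance and tolerance
probs = {n: prob_outside(n, sigma2, eps, 100_000, rng) for n in (10, 100, 1000)}
for n, pr in probs.items():
    print(f"n = {n:>5d}   P[outside] ~ {pr:.4f}")
```

The estimated probability of leaving the set drops quickly with $n$, in line with the claim that the typical set has probability tending to one.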
Useful properties within the typical set. Fix , as well as some satisfying
[TABLE]
for some to be chosen later. From and , we have
[TABLE]
and hence
[TABLE]
On the other hand, we also have
[TABLE]
by the assumptions and . It follows that
[TABLE]
Now, we have for each that
[TABLE]
where (86) applies the triangle inequality, (88) is by the Cauchy-Schwarz inequality, and (89) applies (76) and (84).
It follows from (89) that
[TABLE]
and by interchanging the roles of and (and noting that (84) holds), we obtain
[TABLE]
Summing over , we obtain
[TABLE]
by the condition in .
Similarly, from (91), we have
[TABLE]
and summing over , we obtain
[TABLE]
We now use (94) to bound a related term containing the observations:
[TABLE]
where (100) uses , (103) uses the definition of (whose -th entry is denoted by ), (105) uses , (106) follows since within , and (107) follows from (94) and (99).
Bounding a density ratio. Let be the Dirac delta function, and observe that
[TABLE]
Recalling the distributions and , it follows from (112) that
[TABLE]
where (115) uses (81) and (107). Now, since , , , and , we see from (115) that if we choose
[TABLE]
then we obtain
[TABLE]
whenever and .
Bounding an average log-density. Let be an arbitrary point in , and define
[TABLE]
From (117), we have whenever . Hence, we have
[TABLE]
where is vanishing as . On the other hand, we trivially have , and hence
[TABLE]
Now, defining the ball , we have
[TABLE]
where is the volume of the ball [27]. Therefore, we have
[TABLE]
by (116). Combining (120) and (125) gives
[TABLE]
Since can be arbitrarily chosen within , we rename it to , and take the logarithm to deduce that
[TABLE]
by the assumption .
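The volume of a Euclidean ball enters the argument above through the standard formula $V_d(r) = \pi^{d/2} r^d / \Gamma(d/2+1)$ for a ball of radius $r$ in $\mathbb{R}^d$. A minimal sketch computing it in the log domain (to avoid overflow and underflow in high dimension) follows; identifying a ball in $\mathbb{C}^k$ with a ball in $\mathbb{R}^{2k}$, the formula reduces to $(\pi r^2)^k / k!$. The radius chosen in (116) is not reproduced here, and the values of $r$ and $k$ below are hypothetical.

```python
import math


def log_ball_volume(d, r):
    """log of the volume of a Euclidean ball of radius r in R^d."""
    return 0.5 * d * math.log(math.pi) + d * math.log(r) - math.lgamma(0.5 * d + 1.0)


def log_complex_ball_volume(k, r):
    """Same ball viewed in C^k (identified with R^{2k}): log((pi * r^2)^k / k!)."""
    return k * math.log(math.pi * r * r) - math.lgamma(k + 1.0)


# Sanity checks against familiar low-dimensional volumes
print(math.exp(log_ball_volume(2, 1.5)), math.pi * 1.5 ** 2)         # disc area
print(math.exp(log_ball_volume(3, 2.0)), 4.0 / 3.0 * math.pi * 8.0)  # 3-ball volume
# High-dimensional case: stays finite in the log domain
print(log_ball_volume(2 * 500, 0.1), log_complex_ball_volume(500, 0.1))
```

Working with logarithms matches how the volume is used in the proof, where only its logarithm (scaled by the dimension) matters.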
Bounding a mutual information-like term. The mutual information is the average of a log-density ratio, and that ratio may be positive or negative in general. We will find it more convenient to apply the function to the log-density ratio, and proceed as follows:
[TABLE]
where (131) follows from Bayes’ rule, (133) follows from the fact that $f_{\beta_s}(b_s) = \frac{1}{(\pi\sigma_\beta^2)^k}\exp\big(-\frac{\|b_s\|_2^2}{\sigma_\beta^2}\big)$ for all , (134) applies for , (135) uses , (136) follows from (129) and the assumption $\sigma_\beta^2 = \Theta\big(\frac{1}{k}\big)$, and (137) uses and the assumption .
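Step (133) uses the closed form of the $\mathcal{CN}(0,\sigma_\beta^2 I_k)$ density. As a small consistency check of our own (not part of the proof), the sketch below evaluates this log-density directly and by viewing $b_s$ as a vector in $\mathbb{R}^{2k}$ with i.i.d. $\mathcal{N}(0,\sigma_\beta^2/2)$ coordinates, and checks the elementary bound $\log f_{\beta_s}(b_s) \le k\log\frac{1}{\pi\sigma_\beta^2}$, which is of order $k\log k$ when $\sigma_\beta^2 = \Theta(1/k)$. The values of $k$ and $\sigma_\beta^2$ are hypothetical, with $\sigma_\beta^2 = 1/k$ chosen to match the stated scaling.

```python
import numpy as np


def log_density_cn(b, sigma2):
    """log f(b) for b ~ CN(0, sigma2 * I_k): -k log(pi sigma2) - ||b||^2 / sigma2."""
    k = b.size
    return -k * np.log(np.pi * sigma2) - np.sum(np.abs(b) ** 2) / sigma2


def log_density_real_view(b, sigma2):
    """Same density in R^{2k}: real and imaginary parts i.i.d. N(0, sigma2 / 2)."""
    v = sigma2 / 2.0
    x = np.concatenate([b.real, b.imag])
    return np.sum(-0.5 * np.log(2.0 * np.pi * v) - x ** 2 / (2.0 * v))


rng = np.random.default_rng(4)
k = 50
sigma2 = 1.0 / k  # hypothetical choice with sigma_beta^2 = Theta(1/k)
b = np.sqrt(sigma2 / 2.0) * (rng.standard_normal(k) + 1j * rng.standard_normal(k))
lhs = log_density_cn(b, sigma2)
rhs = log_density_real_view(b, sigma2)
peak = -k * np.log(np.pi * sigma2)  # value at b = 0, the maximum of the density
print(lhs, rhs, peak)
```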
Wrapping up. It follows from (137) and Markov’s inequality that for any ,
[TABLE]
Hence, we have
[TABLE]
This concludes the proof of Proposition 14.
VI Conclusion
We have characterized the information-theoretic limits of approximate support recovery in the complex phase retrieval model with Gaussian measurements, under both discrete and Gaussian distributions on the unknown non-zero entries. Along the way, we established novel concentration bounds for conditional information random variables, which may be of independent interest. Our achievability and converse bounds have matching scaling laws, as well as near-matching constant factors as the SNR increases. There are numerous potential directions for further work, including (i) handling the exact recovery criterion, (ii) improving our results in the low-SNR regime via tighter mutual information bounds, (iii) extending our achievability results to general scalings , (iv) handling the linear sparsity regime without any additional assumptions, (v) performing analogous studies for non-Gaussian measurement matrices, such as Fourier measurements, and (vi) seeking computationally efficient algorithms whose support recovery performance comes close to the fundamental limits.
Appendix A Signal-to-Noise Ratio (SNR) Calculations
Gaussian . For the real Gaussian linear model in [14, Cor. 2], we have i.i.d. measurements, $\mathcal{N}\big(0,\frac{c_\beta}{\sigma^2}\big)$ entries of , and noise, leading to an SNR of .
The complex Gaussian phase retrieval setting in Section II with $\mathcal{CN}\big(0,\frac{c_\beta}{\sigma^2}\big)$ entries of is slightly more complicated. Noting that a standard random variable has mean , variance , and second moment , we find that the expected SNR for sending a support vector is
[TABLE]
where (146) follows from the fact that given , so has a distribution, (147) follows from the fact that has a distribution so $\mathbb{E}[(2\|X_s\|_2^2)^2] = \big(\mathbb{E}[2\|X_s\|_2^2]\big)^2 + \mathsf{Var}[2\|X_s\|_2^2] = (2k)^2 + 4k = 4k(k+1)$, and (148) uses . Since we only consider scaling regimes where , the term is negligible.
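The moment identity in (147) relies on $2\|X_s\|_2^2$ being chi-squared with $2k$ degrees of freedom (mean $2k$, variance $4k$) when, as in the text, $X_s$ has i.i.d. standard $\mathcal{CN}(0,1)$ entries. A quick Monte Carlo sketch of this identity follows; the value of $k$ and the number of trials are arbitrary choices.

```python
import numpy as np


def moments_of_scaled_norm(k, trials, rng):
    """Sample mean, variance and second moment of T = 2 * ||X_s||_2^2 with X_s ~ CN(0, I_k)."""
    x = (rng.standard_normal((trials, k)) + 1j * rng.standard_normal((trials, k))) / np.sqrt(2.0)
    t = 2.0 * np.sum(np.abs(x) ** 2, axis=1)  # chi-squared with 2k degrees of freedom
    return t.mean(), t.var(), np.mean(t ** 2)


rng = np.random.default_rng(5)
k = 5
mean, var, second = moments_of_scaled_norm(k, 200_000, rng)
print(f"mean   {mean:8.3f}  vs 2k      = {2 * k}")
print(f"var    {var:8.3f}  vs 4k      = {4 * k}")
print(f"E[T^2] {second:8.3f}  vs 4k(k+1) = {4 * k * (k + 1)}")
```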
Discrete . For the real discrete linear model in [14, Sec. IV-A], we have i.i.d. measurements, a -sparse random vector which is a uniformly random permutation of , and noise, leading to an SNR of . In particular, when , the SNR is equal to .
For the complex discrete phase retrieval setting in Section II with being a uniformly random permutation of and with noise, we can use similar arguments as in the Gaussian case to show that
[TABLE]
In particular, for the case , we have
[TABLE]
In addition, since the “sorted” vector satisfies (as ) and similarly , the mutual information terms (6) and (II-B) simplify to
[TABLE]
These simplifications readily permit the numerical evaluation of (8)–(9) in Theorem 1 as .
Matching the linear and phase retrieval models. In light of the above calculations, in Figure 1 and Figure 2, we match the SNR of the two models (real linear and complex phase retrieval) by taking from the phase retrieval model, squaring it, and multiplying the result by to obtain the value for the linear model.
Appendix B Proof of Theorem 6 (Mutual Information Bounds)
First, for a fixed partition of the support set , we rewrite the acquisition model in (1) as
[TABLE]
Conditioned on , this gives
[TABLE]
where , , and are independent random variables (recall that has i.i.d. entries).
Next, given and , we write , where
[TABLE]
follows a non-central distribution with two degrees of freedom, which is log-concave [24]. Observe that
[TABLE]
where . The entropy of can be lower bounded using the entropy power inequality as [28], or equivalently
[TABLE]
To find an upper bound on the entropy of , we use the reverse entropy power inequality [29, Theorem 7] for two uncorrelated log-concave random variables and to obtain $\exp\big(2h(U_{w_{\rm eq}}+Z)\big) \le \frac{\pi e}{2}\big(\exp(2h(U_{w_{\rm eq}})) + \exp(2h(Z))\big)$, or equivalently,
[TABLE]
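To make the forward and reverse entropy power inequalities used above concrete, the sketch below checks them on three pairs of independent (hence uncorrelated) log-concave variables whose entropies are known in closed form: two $\mathrm{Unif}[0,1]$ variables (sum triangular on $[0,2]$, entropy $1/2$ nat), two standard exponentials (sum $\mathrm{Gamma}(2,1)$, entropy $1+\gamma_E$ with $\gamma_E$ the Euler-Mascheroni constant), and two standard Gaussians, for which the forward inequality holds with equality. The reverse constant $\pi e/2$ is the one quoted from [29, Theorem 7]; the example pairs are our own choice.

```python
import math

EULER_GAMMA = 0.5772156649015329


def epi_window(h_x, h_y):
    """Bounds on exp(2 h(X+Y)): forward EPI (independent X, Y) and the reverse EPI
    with constant pi*e/2 for uncorrelated log-concave X, Y."""
    lower = math.exp(2.0 * h_x) + math.exp(2.0 * h_y)
    upper = (math.pi * math.e / 2.0) * lower
    return lower, upper


h_gauss = 0.5 * math.log(2.0 * math.pi * math.e)  # h(N(0,1)) in nats
cases = {
    # name: (h(X), h(Y), h(X + Y)), all in nats and in closed form
    "U[0,1] + U[0,1]": (0.0, 0.0, 0.5),                # triangular on [0, 2]
    "Exp(1) + Exp(1)": (1.0, 1.0, 1.0 + EULER_GAMMA),  # Gamma(2, 1)
    "N(0,1) + N(0,1)": (h_gauss, h_gauss, 0.5 * math.log(4.0 * math.pi * math.e)),
}
results = {}
for name, (hx, hy, hsum) in cases.items():
    lower, upper = epi_window(hx, hy)
    middle = math.exp(2.0 * hsum)
    results[name] = (lower, middle, upper)
    print(f"{name:16s} {lower:8.3f} <= {middle:8.3f} <= {upper:8.3f}")
```

In each case the entropy power of the sum sits inside the window, with the Gaussian pair attaining the lower end, which is why the two inequalities together pin down $h(U_{w_{\rm eq}}+Z)$ up to an additive constant.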
We now consider upper and lower bounding the entropy of . For the upper bound, we simply use that the Gaussian distribution maximizes entropy for a given variance:
[TABLE]
Moreover, the result of [29, Theorem 3] states that this upper bound is nearly tight for log-concave random variables:
[TABLE]
Indeed, $U_{w_{\rm eq}} = v_{\rm dif}\big|\mathcal{CN}\big(\sqrt{\tfrac{v_{\rm eq}}{v_{\rm dif}}}\,w_{\rm eq},1\big)\big|^2$ (cf. (155)) has a non-central distribution with two degrees of freedom, which is log-concave [24]. In addition, the variance is given by [30, p. 45]
[TABLE]
Hence, from (162) and (164), we obtain
[TABLE]
and from (163), we obtain
[TABLE]
It follows from (161) and (165) that
[TABLE]
where the two equalities are simple algebraic manipulations. Similarly, it follows from (160) and (166) that
[TABLE]
Returning to (159), we have
[TABLE]
where (174) follows from (169) and the concavity of the function for , and (175) follows from the fact that .
Finally, from (159) and (172), we have
[TABLE]
and (6) follows from (175)–(176).
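As a cross-check of the variance of $U_{w_{\rm eq}}$ (our own short derivation, not a reproduction of (164)): writing $U_{w_{\rm eq}} = v_{\rm dif}|\mu + W|^2$ with $W \sim \mathcal{CN}(0,1)$ and $\mu = \sqrt{v_{\rm eq}/v_{\rm dif}}\,w_{\rm eq}$, the cross term $2\,\mathrm{Re}(\bar{\mu}W)$ has variance $2|\mu|^2$ and is uncorrelated with $|W|^2$ (variance 1), so $\mathsf{Var}[U_{w_{\rm eq}}] = v_{\rm dif}^2 + 2v_{\rm dif}v_{\rm eq}|w_{\rm eq}|^2$. The Monte Carlo sketch below compares this expression with a sample variance; the parameter values are hypothetical.

```python
import numpy as np


def var_u_closed_form(v_dif, v_eq, w_eq):
    """Var of U = v_dif * |mu + W|^2 with mu = sqrt(v_eq / v_dif) * w_eq and W ~ CN(0, 1)."""
    return v_dif ** 2 + 2.0 * v_dif * v_eq * abs(w_eq) ** 2


def var_u_monte_carlo(v_dif, v_eq, w_eq, n, rng):
    """Sample variance of U from n independent draws of W."""
    mu = np.sqrt(v_eq / v_dif) * w_eq
    w = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2.0)  # CN(0, 1)
    u = v_dif * np.abs(mu + w) ** 2
    return u.var()


rng = np.random.default_rng(6)
params = (0.7, 1.3, 0.4 - 0.9j)  # hypothetical (v_dif, v_eq, w_eq)
closed = var_u_closed_form(*params)
empirical = var_u_monte_carlo(*params, 1_000_000, rng)
print(f"closed form {closed:.4f}   Monte Carlo {empirical:.4f}")
```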
Appendix C Proof of Proposition 7 and Corollary 9 (General Concentration of Conditional Information)
Before proceeding, we briefly explain the notation used throughout this appendix. The first two lemmas below concern generic vectors , and the remainder of the appendix concerns joint density functions on with and , and more generally on with and . Initially, this should be viewed as generic notation; in Appendix D, we will specialize to the phase retrieval setting by interpreting complex vectors in as equivalently being in .
C-A Technical Analysis
The following lemma gives a sufficient condition for interchanging certain derivatives and integrals, and perhaps more importantly, establishes bounds on certain first and second derivatives that will eventually be used to bound the key quantity in Proposition 7. Here and subsequently, denotes the set of absolutely integrable functions on .
Lemma 16.
Fix , and let . Assume that is a real entire function⁶ in for each fixed such that for all . In addition, assume that either for all pairs or for all pairs . For , define
[TABLE]
(i) If for all , we have that is twice differentiable and that
[TABLE]
for .
(ii) Let be a subset of . Under the condition
[TABLE]
for some constant , we have
[TABLE]
for any , where is the interior of the set .
(iii) Let be a subset of . Under the condition
[TABLE]
for some constant , we have
[TABLE]
for any .

⁶A real entire function is a function on which is analytic (complex differentiable or holomorphic) on the entire complex plane and assumes real values on the real axis. For our purposes, it suffices to understand that the exponential function falls in this class, and that any function in this class restricted to the real line is always equal to its infinite Taylor expansion [31, Sec. 2.3].
Proof:
See Appendix E-A. ∎
Fradelizi, Madiman, and Wang [22] state that we can exchange analogous integrals and derivatives if the function under the integral is in , but we are not aware of a proof. They also noted that satisfies this property for any when is log-concave. However, we cannot use such results directly, because we will be considering joint distributions that fail to be jointly log-concave.
The following lemma formally states that the integral of any power of a log-concave random vector is in , and provides an explicit upper bound on such an integral (to be used in Corollary 22 below).
Lemma 17.
Fix , and let be a log-concave function such that and (here denotes the integral of the absolute value, and denotes the maximum absolute value). Then, for all , the following holds:
[TABLE]
where is finite and is defined as
[TABLE]
Proof:
Observe that
[TABLE]
where (186) follows from a change of variable . Noting that is jointly concave as a function of [22, Lemma 2.8], we find that is log-concave in by Prékopa’s theorem [32], which states that the marginal function of a jointly log-concave function is log-concave. Since the product of two log-concave functions is log-concave, we deduce from (186) that the function is also log-concave in .
To establish that the supremum over in (184) is bounded, we will combine the log-concavity property with the limiting behavior as . We write
[TABLE]
and consider taking the limit on both sides. For this purpose, we need to establish some technical conditions for applying the monotone convergence theorem [33, Ch. 18]:
1. For fixed , the function $t^n\big(\frac{f(\mathbf{x})}{\|f\|_\infty+1}\big)^t$ is non-increasing for sufficiently large, since by the definition of .
2. For each fixed , we have
[TABLE]
again using .
3. The function $t^n\big(\frac{f(\mathbf{x})}{\|f\|_\infty+1}\big)^t$ is integrable with respect to for any fixed ; this is because $\big(\frac{f(\mathbf{x})}{\|f\|_\infty+1}\big)^t \le \frac{f(\mathbf{x})}{\|f\|_\infty+1}$ for , and .
Taking limits in (187) and applying (188) and the monotone convergence theorem [33, Ch. 18], we obtain
[TABLE]
Summarizing the above findings, we have shown that the function $\kappa(t) := \log\big((\|f\|_\infty+1)^{-t}\int_{\mathbb{R}^n} t^n f^t(\mathbf{x})\,\mu(d\mathbf{x})\big)$ is concave in (and is therefore continuous wherever it takes finite values), is bounded from above for any fixed , and tends to as . These properties immediately imply that , so to establish in (184), it only remains to show that .
If , then is zero almost everywhere, and the claim is trivial, so we proceed assuming that . In this case, for any fixed , which implies that (again, a concave function is continuous wherever it takes finite values). By concavity, we have for that \kappa(1)\geq\frac{1}{2}\big{(}\kappa(t)+\kappa(2-t)\big{)}, or equivalently
[TABLE]
Hence, having already shown that and , we deduce that and hence .
∎
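A one-dimensional example makes the key structural fact in this proof concrete: for a log-concave $f$ on $\mathbb{R}^n$, $t \mapsto t^n\int f^t$ is log-concave. For the standard logistic density (log-concave, $n = 1$), the substitution $u = 1/(1+e^{-x})$ gives $\int f^t = B(t,t)$, so $\log\big(t\int f^t\big) = \log t + 2\log\Gamma(t) - \log\Gamma(2t)$ in closed form. This example is our own calculation; the factor $(\|f\|_\infty+1)^{-t}$ in $\kappa$ only adds a linear term and does not affect concavity. The sketch checks the closed form against a trapezoid-rule integral on a truncated grid and verifies that discrete second differences are negative.

```python
import math

import numpy as np


def logistic_pdf(x):
    """Standard logistic density, written with exp(-|x|) for numerical stability."""
    e = np.exp(-np.abs(x))
    return e / (1.0 + e) ** 2


def kappa_closed(t):
    """log(t * integral of f^t) = log t + log B(t, t) for the logistic density (n = 1)."""
    return math.log(t) + 2.0 * math.lgamma(t) - math.lgamma(2.0 * t)


x = np.linspace(-80.0, 80.0, 400_001)
dx = x[1] - x[0]
f = logistic_pdf(x)


def kappa_numeric(t):
    """Same quantity with the integral evaluated by the trapezoid rule on the grid above."""
    ft = f ** t
    integral = dx * (ft.sum() - 0.5 * (ft[0] + ft[-1]))
    return math.log(t) + math.log(integral)


ts = np.linspace(0.25, 4.0, 76)
kc = np.array([kappa_closed(t) for t in ts])
max_err = max(abs(kappa_numeric(t) - kappa_closed(t)) for t in ts)
second_diff = kc[:-2] - 2.0 * kc[1:-1] + kc[2:]
print(f"max |numeric - closed form| = {max_err:.2e}")
print(f"largest second difference   = {second_diff.max():.2e}  (negative => concave)")
```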
We note that the preceding lemmas concern general vectors that need not be related to the matrix in the phase retrieval setting. Henceforth, we give results concerning pairs , which will later be directly equated with the relevant quantities in the phase retrieval problem.
In the following corollary, we specialize the first part of Lemma 16 to functions of under the condition of a certain integral being finite. This condition is explored further below.
Corollary 18.
Fix , and let be random vectors with joint distribution . For each , define
[TABLE]
Then, under the condition that
[TABLE]
holds for all , we have that is twice differentiable and
[TABLE]
Proof:
We use the first part of Lemma 16 with playing the role of therein, and playing the role of . Note that for each fixed , is an entire function in [31, Sec. 2.3] and that for all . In addition, for each fixed , we have
[TABLE]
or equivalently,
[TABLE]
Hence, for each fixed we have that for all pairs if , and that for all pairs if , so that the assumption of Lemma 16 is satisfied in both cases. ∎
The following lemma provides sufficient conditions under which (193) holds.
Lemma 19.
Fix , and let . Under the conditions
[TABLE]
we have that (193) of Corollary 18 holds for all , i.e., . More specifically, we have
[TABLE]
for all , and
[TABLE]
for all .
Proof:
See Appendix E-B. ∎
The following corollary shows that the sufficient conditions of Lemma 19 are satisfied when are i.i.d. according to a joint distribution on corresponding to an additive noise model with a log-concave marginal . The latter condition can be interpreted as stating that is log-concave “on average”. In addition, explicit upper bounds on (192) are given that will be useful later.
Corollary 20.
Fix , and let be i.i.d. on with and . Assume that is log-concave, and that given , we have , where and are independent random variables and . Then conditions (198)–(199) of Lemma 19 hold, and in addition, we have
[TABLE]
Proof:
First, for all , we have
[TABLE]
and hence
[TABLE]
or equivalently . This means that condition (199) of Lemma 19 holds. Moreover, we have from (205) that
[TABLE]
Combining this with the log-concavity of (and hence ) and applying Lemma 17, we deduce that condition (198) of Lemma 19 holds. ∎
The preceding results will be used in conjunction with the following lemma in order to bound the key quantity appearing in Proposition 7. This result is a counterpart to part of the analysis in [22, proof of Theorem 2.3], but it is proved using different methods. (The analysis in [22, proof of Theorem 2.3] does not seem to be feasible for our purposes unless are jointly log-concave, since otherwise we cannot confirm that is concave.)
Lemma 21.
Fix , and let such that , , and the distribution of is log-concave. Define , and suppose that
[TABLE]
for some positive constants , , and . Then, defining , we have
[TABLE]
for all $t \in (0,1]$, and
[TABLE]
for all .
Proof:
This result follows from Lemma 16 (with , since we consider jointly) applied separately for the following two cases:
- For , set and use the third part of the lemma with ;
- For , set and use the second part of the lemma with .
Note that and $\bar{Q}_1^{1-t} f_X(x) f_{Y|X}^t(y|x) = f_X(x)\big(\frac{f_{Y|X}(y|x)}{\bar{Q}_1}\big)^t$ are both real entire functions in for each fixed (see Footnote 6). In addition, both functions are non-negative valued, and the required conditions on the derivatives hold by the same argument as (196)–(197). ∎
C-B Proof of Proposition 7 (General Exponential Bound)
Recall the notation , and as per the proposition statement, and define
[TABLE]
where as stated in (32). From Corollary 18 with , we have
[TABLE]
and in addition, the definition of in (34) immediately implies
[TABLE]
Now, from the Taylor-Lagrange formula (e.g., see [22, proof of Theorem 3.1]) for the function , for every $t \in (0,1]$, we have
[TABLE]
where (220) follows from (217) along with direct differentiation, and (221) follows from (214) and (216). It follows from (221) that for all , we have
[TABLE]
where (225) follows from the fact that for all .
In addition, from the Taylor-Lagrange formula for the function , for every $t \in (1,\infty)$, we have
[TABLE]
where (228) follows from (218) along with direct differentiation, and (229) follows from (214) and (216). It follows from (229) that for all , we have
[TABLE]
where (233) follows from the fact that for all , and (235) follows from the fact that for all .
Combining the cases in (226) and (235), we have
[TABLE]
for all . On the other hand, since , we also have
[TABLE]
It follows from (236) and (239) that
[TABLE]
or equivalently
[TABLE]
for all . By setting , we obtain (36) from (241), recalling from the definition of in (39) that for . The remaining case is trivial, since the right-hand side of (36) evaluates to by the definition for .
C-C Proof of Corollary 9 (General Concentration Corollary)
The proof is very similar to that of [22, Corollary 3.4], with the main idea being to use the Chernoff bound and optimize the exponent.
By the Chernoff bound, we have for any and that
[TABLE]
Combining these bounds with Proposition 7 (with in the first case and in the second case), we obtain
[TABLE]
where is defined in (39). Now, define
[TABLE]
It is easy to see that for . For , by differentiating, the supremum is reached at and the maximum value is
[TABLE]
In fact, holds for all , since has value for by definition.
From (244) and (248), for , we have
[TABLE]
Similarly, we can define
[TABLE]
By differentiating, the supremum is reached at and the maximum value is
[TABLE]
for any (here there is no case). From (245) and (254), for , we have
[TABLE]
The proof is completed by replacing by in (251) and (257), and noting that for .
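The optimization in this proof is the same Legendre-transform step as in [22, Corollary 3.4]. The sketch below illustrates it under an assumption we state explicitly: that the log-moment-generating bound has the form $\phi(\beta) = -\beta - \log(1-\beta)$ for $\beta < 1$, in which case $\sup_\beta\big(\beta u - \phi(\beta)\big) = u - \log(1+u)$ for $u > -1$ (and $+\infty$ otherwise), attained at $\beta^* = u/(1+u)$. Whether this matches the definition of $r$ in (39) exactly should be checked against that definition. The code compares a brute-force grid search with the closed form and also shows the quadratic behavior $r(u) \approx u^2/2$ for small $u$, which is the kind of behavior invoked for $r(\Theta(\delta_2)) = \Theta(\delta_2^2)$ in Section IV.

```python
import numpy as np


def phi(beta):
    """Assumed log-MGF bound: -beta - log(1 - beta), valid for beta < 1."""
    return -beta - np.log1p(-beta)


def rate_closed_form(u):
    """Legendre transform of phi: u - log(1 + u) for u > -1, and +inf otherwise."""
    return u - np.log1p(u) if u > -1.0 else np.inf


def rate_grid(u, betas):
    """Chernoff exponent sup_beta (beta * u - phi(beta)), approximated on a grid of beta < 1."""
    return np.max(betas * u - phi(betas))


betas = np.linspace(-5.0, 0.99, 600_001)
for u in (-0.5, -0.2, 0.1, 0.5, 2.0):
    print(f"u = {u:5.2f}   grid = {rate_grid(u, betas):.6f}   "
          f"closed = {rate_closed_form(u):.6f}   beta* = {u / (1.0 + u):.3f}")

# Small deviations: r(u) behaves like u^2 / 2
small_u_ratio = rate_closed_form(1e-3) / (0.5 * 1e-3 ** 2)
print(f"r(1e-3) / (1e-6 / 2) = {small_u_ratio:.4f}")
```

The grid is chosen to contain every optimizer $\beta^*$ for the listed values of $u$; for $u$ closer to $-1$, $\beta^*$ diverges to $-\infty$ and the grid would need to be widened.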
Appendix D Proof of Theorem 11 (Concentration of Information Density for Phase Retrieval)
The following corollary shows that for the phase retrieval setting, and have the boundedness properties required to apply Lemma 21.
Corollary 22.
For the phase retrieval model in (1), we have for fixed and that
[TABLE]
and
[TABLE]
for all , where , and
[TABLE]
Proof:
We condition on (and ), and consider the resulting joint distributions and for a single measurement. The log-concavity properties in Lemma 5 allow us to apply Corollary 20 (with ) and subsequently Lemma 19. Substituting (202) into (201) yields the following for :
[TABLE]
These equations are equivalent to (260) and (261).
For the case , we first apply Lemma 17 and Corollary 20 (with ); the latter implies via (203), which we combine with the former to obtain
[TABLE]
where is defined in (262). Since and , we can weaken (266) to
[TABLE]
Now, by applying (200) of Lemma 19 (with and playing the role of ) together with (267), we have
[TABLE]
and similarly
[TABLE]
by replacing by . ∎
With Corollary 22 in place, we are able to use Lemma 21 to deduce the following result for bounding the crucial quantity in the concentration bounds (first appearing in Proposition 7, and leading to Corollary 9 being the form that we will apply). Note that below, and are instances of in Lemma 21, and and are instances of .
Lemma 23.
For the phase retrieval model in Section II, for fixed , , and , define
[TABLE]
for , where . Moreover, define
[TABLE]
Then, the following bounds hold:
[TABLE]
where
[TABLE]
and is defined in (262).
Proof:
This result is obtained by applying Lemma 21 (with or in place of , and conditioning on ), and characterizing the upper bounds therein using Corollary 22. Note that a complex random vector in can be equivalently considered as a real vector in . ∎
With the above tools in place, we are ready to prove the main result on the concentration of the information density (Theorem 11), and its simplified version (Corollary 12).
D-A Proof of Theorem 11
By the assumption of i.i.d. measurements, we have
[TABLE]
where, recalling , the conditional distributions for a single measurement are given by
[TABLE]
with being the squared magnitude of a random variable. Moreover, we have
[TABLE]
where denotes the (conditional) negative log-density (cf., (41)).
Recall the log-concavity properties in Lemma 5 for a single measurement, which immediately imply analogous properties for the vector of independent measurements [25, Prop. 3.2]. Applying Corollary 9 and bounding therein (as defined in (34)) by in accordance with Lemma 23, we have for all that
[TABLE]
noting the one-to-one correspondence between and . Similarly, Corollary 9 and Lemma 23 also yield
[TABLE]
Finally, observe that for all , we have
[TABLE]
where (287) follows from (282) and the union bound, and (288) follows from (284)–(285). Notice that (288) recovers (47), and we similarly obtain (48) from (283) and (286).
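The proof above relies on two generic features. Because the measurements are i.i.d., the joint conditional density is a product over measurements, so its negative logarithm is a sum of per-measurement terms. Theorem 11 then quantifies how tightly such a sum concentrates. The sketch below illustrates only these two features. It uses a standard Gaussian as a stand-in per-measurement density, which is an assumption and not the conditional density of the phase retrieval model. The sketch shows that the log of the product equals the sum of the logs, and that the standard deviation of the normalized sum shrinks like 1/√n. All function names are hypothetical.

```python
import math
import random
import statistics

def neg_log_density(y, s=1.0):
    """Negative log-density of N(0, s^2) at y (stand-in per-measurement density)."""
    return 0.5 * math.log(2 * math.pi * s * s) + y * y / (2 * s * s)

def total_neg_log_density(ys, s=1.0):
    """Negative log of the product density of i.i.d. measurements,
    computed as a sum of per-measurement terms."""
    return sum(neg_log_density(y, s) for y in ys)

def spread_of_average(n, reps=300, seed=2):
    """Standard deviation, over independent repetitions, of (1/n) * total negative log-density."""
    rng = random.Random(seed)
    avgs = [total_neg_log_density([rng.gauss(0, 1) for _ in range(n)]) / n
            for _ in range(reps)]
    return statistics.stdev(avgs)

rng = random.Random(1)
ys = [rng.gauss(0, 1) for _ in range(20)]
# For small n the product of densities can be formed directly; its -log matches the sum.
prod = math.prod(math.exp(-neg_log_density(y)) for y in ys)
print(abs(-math.log(prod) - total_neg_log_density(ys)) < 1e-9)  # True

# Quadrupling n should roughly halve the spread (1/sqrt(n) scaling);
# for this stand-in the exact value is sqrt(0.5 / n), since Var(Y^2 / 2) = 1/2.
print(spread_of_average(100), spread_of_average(400))
```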
D-B Proof of Corollary 12
Recall that given , any given measurement takes the form , where has i.i.d. entries. Hence, is the convolution of the noise density with a random variable scaled by . This implies that if is a constant (i.e., remains fixed as increases), then so is (see (262)) and hence also (see (277)). Equivalently, if , then , as required.
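The observation above can be checked by simulation under explicit assumptions. Take the measurement vector X to have i.i.d. standard complex Gaussian entries (real and imaginary parts N(0, 1/2)) and the noise to be N(0, σ²). Then ⟨X, β⟩ is complex Gaussian with variance ‖β‖², so |⟨X, β⟩|² is ‖β‖² times an Exp(1) variable. A measurement therefore depends on β only through ‖β‖², with mean ‖β‖² and variance ‖β‖⁴ + σ². The sketch compares two different vectors that have the same norm. The function names, σ = 0.5, and the sample size are hypothetical choices.

```python
import math
import random
import statistics

def sample_measurement(beta, sigma, rng):
    """One draw of Y = |<X, beta>|^2 + Z, assuming X has i.i.d. CN(0, 1) entries
    (real and imaginary parts N(0, 1/2)) and Z ~ N(0, sigma^2)."""
    s = math.sqrt(0.5)
    inner = sum(complex(rng.gauss(0, s), rng.gauss(0, s)) * b for b in beta)
    return abs(inner) ** 2 + rng.gauss(0, sigma)

def empirical_moments(beta, sigma=0.5, n=20000, seed=3):
    """Sample mean and variance of n independent measurements for a fixed beta."""
    rng = random.Random(seed)
    ys = [sample_measurement(beta, sigma, rng) for _ in range(n)]
    return statistics.fmean(ys), statistics.pvariance(ys)

sigma = 0.5
# Two different supports with the same squared norm ||beta||^2 = 2.
for beta in ([1.0, 1.0, 0.0, 0.0], [math.sqrt(2.0), 0.0, 0.0, 0.0]):
    mean, var = empirical_moments(beta, sigma)
    b2 = sum(x * x for x in beta)
    # Predicted under the stated assumptions: mean b2, variance b2**2 + sigma**2.
    print(f"mean {mean:.3f} (theory {b2:.3f}), var {var:.3f} (theory {b2**2 + sigma**2:.3f})")
```

Both vectors should give statistically indistinguishable moments, which is consistent with the distribution of a single measurement being governed by the norm alone.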
Appendix E Technical Proofs
E-A Proof of Lemma 16
Proof of part (i). Fix . Since is a real entire function in (analytic for all ) for each fixed , by Taylor’s expansion [31, Theorem 4.4], we have
[TABLE]
for all . Re-arranging, we obtain for that
[TABLE]
Taking the absolute value, and supposing that for some , we have
[TABLE]
where (293) follows from the assumption that either for all or for all , (294) follows from (289) applied twice with and , and (295) follows from the triangle inequality and the non-negativity of .
We proceed by integrating (295) over . By the assumption that for all , we have
[TABLE]
where we applied the definition of in (177), and used the fact that it is finite by the assumption for fixed .
The definition of also yields
[TABLE]
From (295) and (298), we have that $\bigl|\frac{g(\mathbf{x},u)-g(\mathbf{x},t)}{u-t}\bigr|$ is dominated by the integrable function $\frac{2}{\varepsilon}\bigl(g(\mathbf{x},t-\varepsilon/2)+2g(\mathbf{x},t)+g(\mathbf{x},t+\varepsilon/2)\bigr)$, meaning we can apply the dominated convergence theorem [33, Ch. 18] to obtain
[TABLE]
where (300) uses (299), and (302) follows from the definition of the partial derivative. We have thus proved (178) in the case that .
Similarly to (302), from (295), (298), and the dominated convergence theorem [33, Ch. 18], we have
[TABLE]
From (295), (298), and (304), we have
[TABLE]
Since (305) holds for any , we similarly have
[TABLE]
Now, following the same steps as those for the first derivative, we have the following analog of (299):
[TABLE]
By Taylor’s expansion (replacing in (290) by ), we also have
[TABLE]
Using the same arguments as from (291) to (295), we obtain (for with and ) that
[TABLE]
Integrating both sides and applying the definition of in (305), we obtain
[TABLE]
where the finiteness is by (307).
From (310), (312), and (313), we have that $\bigl|\frac{\frac{\partial g}{\partial u}(\mathbf{x},u)-\frac{\partial g}{\partial u}(\mathbf{x},t)}{u-t}\bigr|$ is dominated by the integrable function $\frac{2}{\varepsilon}\bigl(\bigl|\frac{\partial g}{\partial u}(\mathbf{x},t-\varepsilon/2)\bigr|+2\bigl|\frac{\partial g}{\partial u}(\mathbf{x},t)\bigr|+\bigl|\frac{\partial g}{\partial u}(\mathbf{x},t+\varepsilon/2)\bigr|\bigr)$. Hence, by the dominated convergence theorem [33, Ch. 18], we have
[TABLE]
This proves (178) for .
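As the surviving steps indicate, part (i) amounts to differentiating under the integral sign to first and second order, justified by dominated convergence. The sketch below checks such an exchange numerically for a hypothetical kernel g(x, t) = exp(-t x²). This kernel is entire in t for each fixed x, non-negative, and integrable in x for t > 0. It is chosen only because ∫ exp(-t x²) dx = √(π/t) gives closed forms; it is not the g of Lemma 16. The check compares finite differences of the integral with integrals of the t-derivatives, all computed with a hand-written Simpson rule.

```python
import math

def simpson(f, lo, hi, n=4000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3.0

def G(t, L=12.0):
    """int g(x, t) dx over [-L, L] for the illustrative kernel g(x, t) = exp(-t x^2)."""
    return simpson(lambda x: math.exp(-t * x * x), -L, L)

def dG_inside(t, k, L=12.0):
    """int (d^k/dt^k) g(x, t) dx = int (-x^2)^k exp(-t x^2) dx (derivative moved inside)."""
    return simpson(lambda x: (-x * x) ** k * math.exp(-t * x * x), -L, L)

t, h = 1.3, 1e-3
first_fd = (G(t + h) - G(t - h)) / (2 * h)            # derivative of the integral
second_fd = (G(t + h) - 2 * G(t) + G(t - h)) / h**2   # second derivative of the integral
print(abs(first_fd - dG_inside(t, 1)) < 1e-6)   # True
print(abs(second_fd - dG_inside(t, 2)) < 1e-4)  # True
# Closed form: d/dt sqrt(pi/t) = -sqrt(pi)/2 * t^(-3/2).
print(abs(dG_inside(t, 1) + math.sqrt(math.pi) / 2 * t ** -1.5) < 1e-8)  # True
```

The tolerances mainly absorb the O(h²) error of the central differences. The Simpson integrals of these Gaussian-type integrands are far more accurate than that.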
Proof of part (ii). Setting , we have from (302) and (307) that
[TABLE]
by upper bounding each by in accordance with (179). Moreover, returning to (315), we have
[TABLE]
where (322) uses (313) with . Combining (322) and (320) gives
[TABLE]
as required.
Proof of part (iii). We again set . The two steps leading to (319) are still valid in this case, but from there we need to proceed differently via the definition of in (181):
[TABLE]
In addition, (322) is still valid in this case, but we now bound it further via (328):
[TABLE]
[TABLE]
as required.
Remark. We could potentially reduce the constant in (324) or in (334) by choosing the optimal values of . However, for the purposes of this paper, the exact values of these constants are not important.
E-B Proof of Lemma 19
Case 1: . For brevity, let . We have
[TABLE]
where (336) follows from the definition of , and (337) from the assumption .
Case 2: . In this case, we have
[TABLE]
Now, for each , if , then we have
[TABLE]
whereas if , then we have
[TABLE]
Combining these two cases, we obtain
[TABLE]
for all . Hence,
[TABLE]
where (347) follows from the boundedness assumption in (198).
- [1] Y. C. Eldar and S. Mendelson, “Phase retrieval: Stability and recovery guarantees,” Applied and Computational Harmonic Analysis, vol. 36, no. 3, pp. 473–494, 2014.
- [2] P. Schniter and S. Rangan, “Compressive phase retrieval via generalized approximate message passing,” IEEE Trans. Signal Process., vol. 63, no. 4, pp. 1043–1055, 2014.
- [3] X. Li and V. Voroninski, “Sparse signal recovery from quadratic measurements via convex programming,” SIAM Journal on Mathematical Analysis, vol. 45, no. 5, pp. 3019–3033, 2013.
- [4] S. Cai, M. Bakshi, S. Jaggi, and M. Chen, “SUPER: Sparse signals with unknown phases efficiently recovered,” in Proc. of Intl. Symp. on Inform. Th., 2014.
- [5] V. Nakos, “Almost optimal phaseless compressed sensing with sublinear decoding time,” in Proc. of Intl. Symp. on Inform. Th., 2017, pp. 1142–1146.
- [6] Y. Li and V. Nakos, “Sublinear-time algorithms for compressive phase retrieval,” in Proc. of Intl. Symp. on Inform. Th., Vail, CO, 2018, pp. 2301–2305.
- [7] M. Iwen, A. Viswanathan, and Y. Wang, “Robust sparse phase retrieval made easy,” Applied and Computational Harmonic Analysis, vol. 42, no. 1, pp. 135–142, 2017.
- [8] R. Pedarsani, D. Yin, K. Lee, and K. Ramchandran, “PhaseCode: Fast and efficient compressive phase retrieval based on sparse-graph codes,” IEEE Trans. Inf. Theory, vol. 63, no. 6, pp. 3663–3691, 2017.
