Support Recovery in the Phase Retrieval Model: Information-Theoretic Fundamental Limits

Lan V. Truong; Jonathan Scarlett

arXiv:1901.10647·cs.IT·September 29, 2020

TL;DR

This paper investigates the fundamental limits of support recovery in phase retrieval models with noisy measurements, providing sharp thresholds and new concentration bounds for information content.

Contribution

It introduces sharp information-theoretic bounds for support recovery in phase retrieval, including new concentration bounds for log-concave random variables.

Findings

01

Sharp thresholds for support recovery in phase retrieval models.

02

New concentration bounds for the conditional information content.

03

Near-matching constants in various sparsity and noise regimes.

Abstract

The support recovery problem consists of determining a sparse subset of variables that is relevant in generating a set of observations. In this paper, we study the support recovery problem in the phase retrieval model consisting of noisy phaseless measurements, which arises in a diverse range of settings such as optical detection, X-ray crystallography, electron microscopy, and coherent diffractive imaging. Our focus is on information-theoretic fundamental limits under an approximate recovery criterion, considering both discrete and Gaussian models for the sparse non-zero entries, along with Gaussian measurement matrices. In both cases, our bounds provide sharp thresholds with near-matching constant factors in several scaling regimes on the sparsity and signal-to-noise ratio. As a key step towards obtaining these results, we develop new concentration bounds for the conditional…

Equations

$$Y=|\langle X_{s},\beta_{s}\rangle|^{2}+Z,$$

$$f_{\beta_{s}X_{s}Y}(b_{s},x_{s},y)=f_{\beta_{s}}(b_{s})\,f_{X}^{k}(x_{s})\,f_{Z}\big(y-|\langle x_{s},b_{s}\rangle|^{2}\big),$$

$$f_{\beta_{s}\mathbf{X}_{s}\mathbf{Y}}(b_{s},\mathbf{x}_{s},\mathbf{y})=f_{\beta_{s}}(b_{s})\,f_{X}^{n\times k}(\mathbf{x}_{s})\,f_{Y|X_{s}\beta_{s}}^{n}(\mathbf{y}|\mathbf{x}_{s},b_{s}),$$

$$\mathsf{P}_{\mathrm{e}}(\alpha^{*}):=\mathbb{P}\Big[\{|S\setminus\hat{S}|\geq\lfloor\alpha^{*}k\rfloor\}\cup\{|\hat{S}\setminus S|\geq\lfloor\alpha^{*}k\rfloor\}\Big].$$

$$I_{1}(\alpha,k)$$

$$I_{2}(\alpha,k)$$

$$\quad+\frac{1}{2}\log\bigg[1+\frac{\big(\sum_{i=1}^{\lfloor\alpha k\rfloor}|b^{\prime}_{i}|^{2}\big)\big(\sum_{i=\lfloor\alpha k\rfloor+1}^{k}|b^{\prime}_{i}|^{2}\big)}{\big(\sum_{i=1}^{\lfloor\alpha k\rfloor}|b^{\prime}_{i}|^{2}\big)^{2}+\frac{\exp(2h(Z))}{2\pi e}}\bigg]+\frac{1}{2}\log\bigg(\frac{\pi e}{2}\bigg),$$

$$n\geq\max_{\alpha\in[\alpha^{*},1]}\frac{\alpha k\log\big(\frac{p}{k}\big)}{I_{1}(\alpha,k)}(1+\eta)$$

$$n\leq\max_{\alpha\in[\alpha^{*},1]}\frac{(\alpha-\alpha^{*})k\log\big(\frac{p}{k}\big)}{I_{2}(\alpha,k)}(1-\eta),$$

$$\bar{I}_{1}(\alpha)$$

$$\bar{I}_{2}(\alpha)$$

$$g(\alpha):=\int_{0}^{\infty}[\alpha-F_{1}(u)]^{+}\,du,$$

$$n\geq\max_{\alpha\in[\alpha^{*},1]}\frac{\alpha k\log\frac{p}{k}}{\bar{I}_{1}(\alpha)}(1+\eta),$$

$$n\leq\max_{\alpha\in[\alpha^{*},1]}\frac{(\alpha-\alpha^{*})k\log\frac{p}{k}}{\bar{I}_{2}(\alpha)}(1-\eta)$$

$$b_{1}=\dotsc=b_{k}=\frac{c_{\beta}}{k}$$

$$f_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}}(y|x_{s_{\mathrm{dif}}},x_{s_{\mathrm{eq}}})$$

$$f_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}\beta_{s}}(y|x_{s_{\mathrm{dif}}},x_{s_{\mathrm{eq}}},b_{s})$$

$$f_{Y|X_{s_{\mathrm{eq}}}}(y|x_{s_{\mathrm{eq}}})$$

$$i(x_{s_{\mathrm{dif}}};y|x_{s_{\mathrm{eq}}}):=\log\frac{f_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}}(y|x_{s_{\mathrm{dif}}},x_{s_{\mathrm{eq}}})}{f_{Y|X_{s_{\mathrm{eq}}}}(y|x_{s_{\mathrm{eq}}})},$$

$$i^{n}(\mathbf{x}_{s_{\mathrm{dif}}};\mathbf{y}|\mathbf{x}_{s_{\mathrm{eq}}},b_{s}):=\sum_{i=1}^{n}i(x_{s_{\mathrm{dif}}}^{(i)};y^{(i)}|x_{s_{\mathrm{eq}}}^{(i)},b_{s}),$$

$$i(x_{s_{\mathrm{dif}}};y|x_{s_{\mathrm{eq}}},b_{s}):=\log\frac{f_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}\beta_{s}}(y|x_{s_{\mathrm{dif}}},x_{s_{\mathrm{eq}}},b_{s})}{f_{Y|X_{s_{\mathrm{eq}}}\beta_{s}}(y|x_{s_{\mathrm{eq}}},b_{s})}.$$

$$I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_{s}):=I(X_{s_{\mathrm{dif}}};Y|X_{s_{\mathrm{eq}}},\beta_{s}=b_{s}).$$

$$\mathbb{P}\Big[i^{n}(\mathbf{X}_{s_{\mathrm{dif}}};\mathbf{Y}|\mathbf{X}_{s_{\mathrm{eq}}},\beta_{s})\leq n(1-\delta_{2})I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_{s})\,\big|\,\beta_{s}=b_{s}\Big]\leq\psi_{|s_{\mathrm{dif}}|}(n,\delta_{2}),$$

$$n\geq\frac{\log\binom{p-k}{|s_{\mathrm{dif}}|}+\log\Big(\frac{k^{2}}{\delta_{1}^{2}}\binom{k}{|s_{\mathrm{dif}}|}^{2}\Big)+\gamma}{I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_{s})(1-\delta_{2})},$$

$$\mathsf{P}_{\mathrm{e}}(\alpha^{*})\leq\sum_{\ell=\lfloor\alpha^{*}k\rfloor}^{k}\binom{k}{\ell}\psi_{\ell}(n,\delta_{2})+P_{0}(\gamma)+2\delta_{1}+\mathbb{P}[\beta_{s}\notin\mathcal{T}_{\beta}],$$

$$P_{0}(\gamma):=\mathbb{P}\bigg[\log\frac{f_{\mathbf{Y}|\mathbf{X}_{s}\beta_{s}}(\mathbf{Y}|\mathbf{X}_{s},\beta_{s})}{f_{\mathbf{Y}|\mathbf{X}_{s}}(\mathbf{Y}|\mathbf{X}_{s})}>\gamma\bigg].$$

$$\mathbb{P}\Big[i^{n}(\mathbf{X}_{s_{\mathrm{dif}}};\mathbf{Y}|\mathbf{X}_{s_{\mathrm{eq}}},\beta_{s})\leq n(1+\delta_{2})I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_{s})\,\big|\,\beta_{s}=b_{s}\Big]\geq 1-\psi^{\prime}_{|s_{\mathrm{dif}}|}(n,\delta_{2}),$$

$$n\leq\frac{\log\binom{p-k+|s_{\mathrm{dif}}|}{|s_{\mathrm{dif}}|}-\log\big(\sum_{d=0}^{\lfloor\alpha^{*}k\rfloor}\binom{p-k}{d}\binom{|s_{\mathrm{dif}}|}{d}\big)-\log\delta_{1}}{I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_{s})(1+\delta_{2})},$$

$$\mathsf{P}_{\mathrm{e}}(\alpha^{*})\geq\mathbb{P}[\beta_{s}\in\mathcal{T}_{\beta}]\Big(1-\max_{\ell=\lfloor\alpha^{*}k\rfloor,\dotsc,k}\psi^{\prime}_{\ell}(n,\delta_{2})\Big)-\delta_{1}.$$

$$\frac{1}{2}\log\bigg[\bigg(\frac{4}{\exp(2h(Z))}\bigg)v_{\mathrm{dif}}^{2}+1\bigg]\leq I_{s_{\mathrm{dif}},s_{\mathrm{eq}}}(b_{s})\leq\frac{1}{2}\log\bigg(\frac{\pi e}{2}\bigg)+\frac{1}{2}\log\bigg[\bigg(\frac{2\pi e}{\exp(2h(Z))}\bigg)v_{\mathrm{dif}}^{2}+1\bigg]+\frac{1}{2}\log\bigg(1+\frac{v_{\mathrm{dif}}v_{\mathrm{eq}}}{v_{\mathrm{dif}}^{2}+\frac{\exp(2h(Z))}{2\pi e}}\bigg),$$

$$L(t)$$

Taxonomy

Topics: Advanced X-ray Imaging Techniques · Medical Image Segmentation Techniques · Sparse and Compressive Sensing Techniques

Full text

Support Recovery in the Phase Retrieval Model: Information-Theoretic Fundamental Limits

Lan V. Truong and Jonathan Scarlett

L. V. Truong is with the Department of Engineering, the University of Cambridge, Cambridge CB2 1PZ UK (e-mail: [email protected]). J. Scarlett is with the Department of Computer Science, School of Computing, National University of Singapore (NUS), Singapore 117417, and also with the Department of Mathematics, NUS, Singapore 119076 (e-mail: [email protected]). This work is supported by an NUS Early Career Research Award. This paper was presented in part at the 2019 IEEE Information Theory Workshop. Copyright © 2017 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

Abstract

The support recovery problem consists of determining a sparse subset of variables that is relevant in generating a set of observations. In this paper, we study the support recovery problem in the phase retrieval model consisting of noisy phaseless measurements, which arises in a diverse range of settings such as optical detection, X-ray crystallography, electron microscopy, and coherent diffractive imaging. Our focus is on information-theoretic fundamental limits under an approximate recovery criterion, considering both discrete and Gaussian models for the sparse non-zero entries, along with Gaussian measurement matrices. In both cases, our bounds provide sharp thresholds with near-matching constant factors in several scaling regimes on the sparsity and signal-to-noise ratio. As a key step towards obtaining these results, we develop new concentration bounds for the conditional information content of log-concave random variables, which may be of independent interest.

Index Terms:

Phase retrieval, support recovery, sparsity pattern recovery, information-theoretic limits, compressive sensing, non-linear models, log-concave concentration.

I Introduction

Recently, there has been growing interest in recovering an unknown signal $\beta\in\mathbb{C}^{p}$ from a number of phaseless quadratic observations, each taking the form $Y=|\langle\beta,X\rangle|^{2}+Z$, where $X\in\mathbb{C}^{p}$ is a measurement vector and $Z\in\mathbb{R}$ represents measurement noise. Since only the magnitude of $\langle\beta,X\rangle$ is measured, and not the phase (or the sign, in the real case), this problem is referred to as phase retrieval. The phase retrieval problem has many applications, including optical detection, X-ray crystallography, electron microscopy, and coherent diffractive imaging [1].

I-A Sparse Phase Retrieval

Similarly to the basic linear model, various works have shown that the number of measurements can be reduced significantly if the signal $\beta\in\mathbb{C}^{p}$ is sparse, i.e., it has at most $k$ non-zero entries for some $k\ll p$ . Here we provide a non-exhaustive list of relevant results from the literature.

It is shown in [1] that for real-valued signals, stable phase retrieval can be achieved with $O(k\log(\frac{p}{k}))$ measurements in the noiseless setting, and with $O(k\log k\log(\frac{p}{k}))$ measurements in the noisy setting under some conditions on the noise distribution. The measurements considered in [1] are isotropic and sub-Gaussian, including (real) Gaussian measurements as a special case. The corresponding results are information-theoretic and are not shown to be attained with any practical algorithm, and to the best of our knowledge, it remains an open problem to attain comparable theoretical results for practical algorithms. However, numerical evidence has been given for the success of generalized approximate message passing (GAMP) with roughly $2k\log(p/k)$ Gaussian measurements in the noiseless case, and for the robustness of GAMP to noise [2]. On the other hand, rigorous results for efficient algorithms with Gaussian measurements typically require significantly more measurements; for instance, an $O(k^{2}\log p)$ bound is attained in [3] via a semidefinite programming (SDP) approach.

While our focus will be on Gaussian measurements, it is also worth highlighting some works that achieve computationally efficient sparse phase retrieval with a similar number of carefully-designed non-Gaussian measurements. Cai et al. [4] designed an algorithm that succeeds with $O(k)$ measurements and $O(k\log k)$ decoding time in the noiseless complex-valued setting; the measurement matrix is generated from the structure of a series of bipartite graphs (between signal components and measurements) with various desirable properties. More recently, in the real-valued setting, Nakos [5] proposed an algorithm that recovers an approximately $k$ -sparse vector under the $\ell_{2}/\ell_{2}$ guarantee, with $O(k\log p)$ measurements and $O(k^{1+\gamma}{\rm poly}(\log p))$ decoding time for any constant $\gamma>0$ . See also [6] for additional variants.

In the noisy complex-valued setting, Iwen et al. [7] provided a simple two-stage sparse phase retrieval strategy that can stably reconstruct $\beta$ up to a global phase shift using only $O(k\log(\frac{p}{k}))$ measurements, under some bounded noise assumptions. In addition, Pedarsani et al. [8] used a sparse-graph coding approach to attain an approximate support recovery guarantee for quantized signals with: (i) $O(k\log p)$ measurements and $O(p\log p)$ decoding time, or (ii) $O(k\log^{3}p)$ measurements and $O(k\log^{3}p)$ decoding time. In the noiseless case, these further reduce to $O(k)$ , even without the assumption of quantized signals.

Fourier measurements are also commonly considered, and are particularly relevant in many practical applications. For example, in this setting, Jaganathan et al. [9] gave guarantees on recovering sparse signals whose support is aperiodic, and proposed an efficient two-stage algorithm that first identifies the support, and then the sparse signal values.

I-B Support Recovery

A distinct goal that has received less attention in phase retrieval, but considerable attention in other models, is the support recovery problem [10, 11, 12], where one wishes to exactly or approximately determine the support $S={\rm supp}(\beta)$ given a collection of observations $\mathbf{Y}\in\mathbb{R}^{n}$ and the corresponding measurement matrix $\mathbf{X}\in\mathbb{C}^{n\times p}$. (As we mention in the notation section below, the boldness of these symbols is used to highlight their association with multiple measurements. In contrast, while $X\in\mathbb{C}^{p}$ represents a vector, it is non-bold because it is only associated with a single measurement.) This problem is of direct interest when the goal is to find which variables influence the output (rather than their associated weights), and may also be used as a first step towards estimating the values of $\beta$ (e.g., see [13, 9]).

Under general linear and non-linear models, Scarlett and Cevher [14] provided achievability and converse bounds characterizing the trade-off between error probability and number of measurements. They applied their general bounds to the linear, $1$ -bit, and group testing models to obtain precise thresholds on the number of measurements required to achieve vanishing decoding error probability in the high-dimensional limit. Numerous other related works also exist, with the focus being mainly on linear models [15, 16, 17, 18, 19]; see [14] for a more detailed overview. In particular, approximate recovery criteria were studied by Reeves and Gastpar [20, 21] in the regime $k=\Theta(p)$ , and by Scarlett and Cevher [14] in the regime $k=o(p)$ ; we focus on the latter setting.

Although the initial bounds in [14] are very general, applying these bounds to new models can still be very challenging, due to the need to establish concentration bounds and mutual information bounds on a case-by-case basis. In this paper, we use this approach to establish fundamental limits for approximate support recovery in the phase retrieval model, under a log-concavity assumption on the noise distribution. To achieve this goal, we need to overcome at least two key challenges: establishing concentration bounds for information quantities in the phase retrieval model, and upper and lower bounding key conditional mutual information terms that have no closed form expressions. For each of these challenges, we develop novel auxiliary results, some of which may be of independent interest. The following subsection lists our specific contributions in more detail.

I-C Contributions

Our main contributions in this paper are as follows:

• We extend the concentration bounds for the unconditional information content of log-concave densities by Fradelizi et al. [22, Theorem 3.1] to conditional versions (cf. Corollary 9) in which joint log-concavity does not hold. Due to this extension, we can establish concentration bounds for the conditional information density of $n$-dimensional random variables (cf. Theorem 11) and apply these bounds to the phase retrieval model. Because of their generality, our extended concentration bounds might be of independent interest.

• Under i.i.d. complex Gaussian measurement matrices $\mathbf{X}$, we establish tight upper and lower bounds on the required number of measurements to achieve approximate support recovery (i.e., recovering a given proportion of the support) under both discrete (cf. Lemma 13) and Gaussian (cf. Theorem 2) modeling assumptions on the non-zero entries of $\beta$. In both cases, the upper and lower bounds coincide up to an explicit constant factor in certain sparsity regimes, and this constant factor is often very close to one (e.g., when the signal-to-noise ratio is sufficiently high).

I-D Notation

We use similar notation to [14]. We use upper-case letters for random variables, and lower-case letters for their realizations. A non-bold character may be a scalar or a vector, whereas a bold character refers to a collection of $n$ scalars (e.g., $\mathbf{Y}\in\mathbb{R}^{n}$) or vectors (e.g., $\mathbf{X}\in\mathbb{R}^{n\times p}$), where $n$ is the number of measurements. We write $\beta_{S}$ to denote the subvector of $\beta$ containing the entries indexed by $S$, and $\mathbf{X}_{S}$ to denote the submatrix of $\mathbf{X}$ containing the columns indexed by $S$. The complement with respect to $\{1,2,\ldots,p\}$ is denoted by $(\cdot)^{c}$.

The symbol $\sim$ means “distributed as”. For a given joint probability density distribution $f_{XY}$ , the corresponding marginal distributions are denoted by $f_{X}$ and $f_{Y}$ , and similarly for conditional probability density marginals (e.g., $f_{Y|X}$ ). The notation $f_{XY}^{n}$ , $f_{X}^{n}$ , etc. denotes the corresponding i.i.d. distribution in which each term is distributed as $f_{XY}$ , $f_{X}$ , etc. We write $\mathbb{P}[\cdot]$ for probabilities, $\mathbb{E}[\cdot]$ for expectations, and $\operatorname{\mathsf{Var}}[\cdot]$ for variances.

We use usual notations for the differential entropy (e.g., $h(X)$ ) and mutual information (e.g., $I(X;Y)$ ), and their conditional counterparts (e.g., $h(X|Z),I(X;Y|Z)$ ). We use the notation $\mathcal{N}(\mu,\sigma^{2})$ for real Gaussian random variables, $\mathcal{C}\mathcal{N}(\mu,\sigma^{2})$ for complex Gaussians (with variance $\frac{\sigma^{2}}{2}$ in each of the real and imaginary parts), and $\chi_{k}^{2}$ for the central chi squared distribution with $k$ degrees of freedom.

We make use of the standard asymptotic notations $O(\cdot),o(\cdot),\Theta(\cdot),\Omega(\cdot)$ and $\omega(\cdot)$. We define the function $[\cdot]^{+}=\max\{0,\cdot\}$ and write the floor and ceiling functions as $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$, respectively. The function $\log$ has base $e$, and all information quantities are measured in nats.

Throughout the paper, we frequently make use of integrals written as $\int(\,\dotsc)\mu(dx)$, $\int(\,\dotsc)\mu(dx\times dy)$, etc., where $\mu(\cdot)$ denotes a suitable measure that can typically be taken to be the Lebesgue measure. For $t>0$, we say that a function $f(\mathbf{x})$ on $\mathbb{R}^{n}$ is in $L^{t}(\mathbb{R}^{n})$ if $|f(\mathbf{x})|^{t}$ is integrable.

I-E Structure of the Paper

In Section II, we formally introduce the problem setup and overview our main results. In Section III, we provide the main auxiliary results on log-concavity, concentration of measure, and mutual information bounds. Sections IV and V provide the proofs of our main support recovery results. Conclusions are drawn in Section VI.

II Problem Setup and Main Results

II-A Model and Assumptions

Let $p$ denote the ambient dimension, $k$ the sparsity level, and $n$ the number of measurements. We let $\mathcal{S}$ be the set of subsets of $\{1,2,\ldots,p\}$ having cardinality $k$. The key random variables in the support recovery problem are the support set $S\in\mathcal{S}$, the unknown signal $\beta\in\mathbb{C}^{p}$, the measurement matrix $\mathbf{X}\in\mathbb{C}^{n\times p}$, and the observation vector $\mathbf{Y}\in\mathbb{R}^{n}$.

The support set $S$ is assumed to be equiprobable on the ${p\choose k}$ subsets within $\mathcal{S}$. Given $S$, the entries of $\beta_{S^{c}}$ are deterministically set to zero, and the remaining entries are generated according to some distribution $\beta_{S}\sim f_{\beta_{S}}$. (We allow for both discrete and continuous distributions on $\beta_{S}$, meaning that in some cases $f_{\beta_{S}}$ represents a probability mass function rather than a density function.) We assume that these non-zero entries follow the same distribution for all the ${p\choose k}$ possible realizations of $S$, and that this distribution is permutation-invariant.

We consider the setting of (complex) Gaussian measurements, in which the measurement matrix has i.i.d. $\mathcal{CN}(0,1)$ entries, whose density is denoted by $f_{X}$. We write $f_{X}^{n\times p}$ to denote the corresponding i.i.d. distribution for matrices, and we write $f_{X}^{k}$ as a shorthand for $f_{X}^{k\times 1}$. Given $S=s$, each entry of the observation vector $\mathbf{Y}$ is generated in a conditionally independent manner according to the following model:

$$Y=|\langle X_{s},\beta_{s}\rangle|^{2}+Z, \tag{1}$$

where $X_{s}\sim f_{X}^{k}$ , $\beta_{s}\in\mathbb{C}^{k}$ , and $Z\sim f_{Z}$ , with $f_{Z}$ being an arbitrary log-concave density function. This log-concavity assumption is made for mathematical convenience, but also captures a wide range of noise distributions, including Gaussian. We note that the permutation-invariance of $Y$ , $X_{S}$ and $\beta_{S}$ with respect to $S$ allows us to condition on a fixed $S=s$ of cardinality $k$ throughout the analysis (e.g. $s=\{1,\dotsc,k\}$ ) without loss of generality; such conditioning should henceforth be assumed unless explicitly stated otherwise.
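The measurement model can be simulated with a few lines of standard-library Python. This is only an illustrative sketch: the function names, the seed handling, the choice of Gaussian noise (one example of a log-concave density), and the Gaussian prior on the non-zero entries with variance `c_beta / k` are assumptions made for the example, not part of the analysis.

```python
import math
import random


def complex_gaussian(rng, variance=1.0):
    """Draw one CN(0, variance) sample (variance/2 in each of the real and imaginary parts)."""
    std = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def sample_measurements(p, k, n, sigma, c_beta=1.0, seed=0):
    """Generate (support, beta, X, Y) with Y_i = |<X_i, beta>|^2 + Z_i.

    Assumptions for illustration: Z ~ N(0, sigma^2) and i.i.d. CN(0, c_beta/k)
    non-zero entries of beta.
    """
    rng = random.Random(seed)
    support = sorted(rng.sample(range(p), k))  # S uniform over the k-subsets
    beta = [0j] * p
    for j in support:
        beta[j] = complex_gaussian(rng, c_beta / k)
    X = [[complex_gaussian(rng) for _ in range(p)] for _ in range(n)]
    Y = []
    for row in X:
        # Only the magnitude is used, so the conjugation convention is immaterial.
        inner = sum(row[j] * beta[j].conjugate() for j in support)
        Y.append(abs(inner) ** 2 + rng.gauss(0.0, sigma))
    return support, beta, X, Y
```

Because each observation depends on $\beta$ only through $|\langle X_{s},\beta_{s}\rangle|$, multiplying `beta` by any unit-modulus constant leaves the noiseless part of `Y` unchanged.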

The relation (1) induces the following conditional joint distribution of $(\beta_{s},X_{s},Y)$ (given $S=s$ ):

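$$f_{\beta_{s}X_{s}Y}(b_{s},x_{s},y)=f_{\beta_{s}}(b_{s})\,f_{X}^{k}(x_{s})\,f_{Z}\big(y-|\langle x_{s},b_{s}\rangle|^{2}\big),$$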

and its multiple-observation counterpart

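$$f_{\beta_{s}\mathbf{X}_{s}\mathbf{Y}}(b_{s},\mathbf{x}_{s},\mathbf{y})=f_{\beta_{s}}(b_{s})\,f_{X}^{n\times k}(\mathbf{x}_{s})\,f_{Y|X_{s}\beta_{s}}^{n}(\mathbf{y}|\mathbf{x}_{s},b_{s}),$$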

where $f_{Y|X_{s}\beta_{s}}^{n}(\mathbf{y}|\mathbf{x}_{s},b_{s})$ is the $n$ -fold product of $f_{Y|X_{s}\beta_{s}}(\cdot|\cdot,b_{s})$ . The remaining entries of the measurement matrix are distributed as $\mathbf{X}_{s^{c}}\sim f_{X}^{n\times(p-k)}$ .

Given $\mathbf{X}$ and $\mathbf{Y}$ , a decoder forms an estimate $\hat{S}$ of $S$ . Like previous works studying the information-theoretic limits of support recovery (e.g., [14, 15]), we assume that the decoder knows the system model, including $f_{Y|X_{s}\beta_{s}}$ and $f_{\beta_{s}}$ . We focus on the approximate recovery criterion, only requiring that at least $k-\lfloor\alpha^{*}k\rfloor+1$ entries of $S$ are successfully identified (approximate recovery) for some $\alpha^{*}\in(0,1)$ . Following [20, 14], the error probability is given by

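$$\mathsf{P}_{\mathrm{e}}(\alpha^{*}):=\mathbb{P}\Big[\{|S\setminus\hat{S}|\geq\lfloor\alpha^{*}k\rfloor\}\cup\{|\hat{S}\setminus S|\geq\lfloor\alpha^{*}k\rfloor\}\Big].$$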

Note that if both $S$ and $\hat{S}$ have cardinality $k$ with probability one, then the two events in the union are identical, and hence either of the two can be removed. A more stringent performance criterion also considered in the literature is the exact support recovery problem, where the error probability is given by $\mathsf{P}_{\mathrm{e}}(0)$, but our techniques currently appear to be less suited to that setting.
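As a concrete reading of the approximate recovery criterion, the error event for a single pair $(S,\hat{S})$ can be checked as follows. The function name and argument layout are hypothetical, chosen only for this sketch.

```python
import math


def approx_recovery_error(true_support, estimate, alpha_star):
    """Return True if the error event inside P_e(alpha_star) occurs for this pair."""
    s, s_hat = set(true_support), set(estimate)
    k = len(s)
    threshold = math.floor(alpha_star * k)
    missed = len(s - s_hat)           # |S \ S_hat|
    false_positives = len(s_hat - s)  # |S_hat \ S|
    return missed >= threshold or false_positives >= threshold
```

For example, with $k=8$ and $\alpha^{*}=0.25$ the threshold is $\lfloor\alpha^{*}k\rfloor=2$, so an estimate that misses one true index (and includes one spurious index) is counted as a success, while an estimate that misses two is counted as an error.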

Our main goal is to derive necessary and sufficient conditions on $n$ (as a function of $k$ and $p$ ) such that $\mathsf{P}_{\mathrm{e}}(\alpha^{*})$ vanishes as $p\to\infty$ . Moreover, when considering converse results, we will not only be interested in conditions under which $\mathsf{P}_{\mathrm{e}}(\alpha^{*})\not\to 0$ , but also conditions under which the stronger statement $\mathsf{P}_{\mathrm{e}}(\alpha^{*})\to 1$ holds.

II-B Overview of Main Results

Here we state and discuss the two main results of this paper. Both of the theorems concern the information-theoretic limits of support recovery in the phase retrieval model described above, but with two different models of interest for the non-zero entries $\beta_{s}$. We note that our results are asymptotic as $p\to\infty$ and $k\to\infty$, and we seek explicit constant factors in the leading term, but leave higher-order terms unspecified. Sharper (e.g., non-asymptotic) characterizations appear to be much more challenging, and are beyond the scope of this work. In addition, we emphasize that our achievability result is based on a computationally intractable information-theoretic decoder, and approaching the fundamental limits with practical decoding techniques remains an interesting direction for future studies.

Discrete setting. The first result concerns a discrete distribution on $\beta_{s}$, namely, $\beta_{s}$ is a uniformly random permutation of a fixed complex vector $(b_{1},\dotsc,b_{k})$. We let $(b^{\prime}_{1},\dotsc,b^{\prime}_{k})$ be the sorted version of $(b_{1},\dotsc,b_{k})$ such that $|b^{\prime}_{1}|\leq|b^{\prime}_{2}|\leq\cdots\leq|b^{\prime}_{k}|$, and define the following mutual information quantities:

[TABLE]

where $h(Z)$ is the differential entropy of $Z$ .

Theorem 1.

Consider the phase retrieval setup in Section II, with $\beta_{s}$ being a uniformly random permutation of a fixed complex vector $(b_{1},b_{2},\ldots,b_{k})$ . Let $|b_{\rm{min}}|=\min\{|b_{i}|:i\in\{1,\cdots,k\}\}$ and $|b_{\rm{max}}|=\max\{|b_{i}|:i\in\{1,\cdots,k\}\}$ , and assume that $|b_{\rm{min}}|=\Theta(|b_{\rm{max}}|)$ , and that $k\to\infty$ with $\|b_{s}\|_{2}=\Theta(1)$ as $p\to\infty$ . In addition, assume that there are $m_{\beta}\in\{1,\dotsc,k\}$ distinct elements in $(b_{1},\ldots,b_{k})$ .

We have $\mathsf{P}_{\mathrm{e}}(\alpha^{*})\to 0$ as $p\to\infty$ provided that

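$$n\geq\max_{\alpha\in[\alpha^{*},1]}\frac{\alpha k\log\big(\frac{p}{k}\big)}{I_{1}(\alpha,k)}(1+\eta) \tag{8}$$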

for arbitrarily small $\eta>0$ if either of the following additional conditions holds: (i) $m_{\beta}=\Theta(1)$ and $k=o(p)$, or (ii) $\log k=o(\log p)$ (and $m_{\beta}$ is arbitrary).

Conversely, under the general scaling $k=o(p)$ and arbitrary $m_{\beta}$ , we have $\mathsf{P}_{\mathrm{e}}(\alpha^{*})\to 1$ as $p\to\infty$ whenever

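$$n\leq\max_{\alpha\in[\alpha^{*},1]}\frac{(\alpha-\alpha^{*})k\log\big(\frac{p}{k}\big)}{I_{2}(\alpha,k)}(1-\eta), \tag{9}$$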

for arbitrarily small $\eta>0$ .

Proof:

See Section IV. ∎

We observe that the upper and lower bounds are nearly in closed form, other than the optimization over a single scalar $\alpha$ . Moreover, the two have a very similar form, with the main difference being the appearance of $\alpha$ vs. $(\alpha-\alpha^{*})$ in the numerator, and $I_{1}$ vs. $I_{2}$ in the denominator. The bounds hold for an arbitrary log-concave noise distribution $f_{Z}$ .

Since the noise variance $\sigma^{2}$ is fixed and the measurement matrix has normalized $\mathcal{CN}(0,1)$ entries, the assumption $\|b\|_{2}=\Theta(1)$ corresponds to the case that the signal-to-noise ratio (SNR) is constant. We observe that under this assumption, the upper and lower bounds provide matching $\Theta\big{(}k\log\frac{p}{k}\big{)}$ behavior. Perhaps more significantly, in the high-SNR limit (i.e., $\|b\|_{2}\to\infty$ ), we obtain nearly identical constant factors. To see this, it suffices to crudely lower bound $I_{1}(\alpha,k)$ by $\frac{1}{2}\log\big{[}\big{(}\frac{4}{\exp(2h(Z))}\big{)}\big{(}\lfloor\alpha k\rfloor|b_{\rm{min}}|^{2}\big{)}^{2}+1\big{]}$ , and upper bound $I_{2}(\alpha,k)$ by $\frac{1}{2}\log\big{[}\big{(}\frac{2\pi e}{\exp(2h(Z))}\big{)}\big{(}\lfloor\alpha k\rfloor|b_{\rm{max}}|^{2}\big{)}^{2}+1\big{]}+\frac{1}{2}\log\big{[}1+\frac{\lfloor\alpha k\rfloor k|b_{\rm{max}}|^{4}}{\lfloor\alpha k\rfloor^{2}|b_{\rm{min}}|^{4}}\big{]}+\frac{1}{2}\log\big{(}\frac{\pi e}{2}\big{)}$ . For any $\alpha$ bounded away from zero, since $|b_{\rm{min}}|=\Theta(|b_{\rm{max}}|)$ , these both behave as $\log(k|b_{\rm{min}}|^{2})(1+o(1))$ as $\|b\|_{2}\to\infty$ (or equivalently $k|b_{\rm{min}}|^{2}\to\infty$ ), which implies that the maxima in (8) and (9) are attained by $\alpha=1$ in this limit, and the upper and lower bounds coincide up to a factor of $\frac{1}{1-\alpha^{*}}$ .
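To get a feel for the achievability condition (8), $n\geq\max_{\alpha\in[\alpha^{*},1]}\frac{\alpha k\log(p/k)}{I_{1}(\alpha,k)}(1+\eta)$, one can substitute the crude lower bound on $I_{1}(\alpha,k)$ quoted above and maximize over a grid of $\alpha$ values. The sketch below does this under assumptions that are not part of the theorem: Gaussian noise $Z\sim\mathcal{N}(0,\sigma^{2})$, so that $\exp(2h(Z))=2\pi e\sigma^{2}$, a finite grid in place of the exact maximum, and hypothetical function names. Because a lower bound on $I_{1}$ is used, the returned value is at least as large as the grid evaluation of the right-hand side of (8); it is a leading-order asymptotic quantity, not a finite-sample guarantee.

```python
import math


def i1_crude_lower_bound(alpha, k, b_min_abs, sigma):
    """Crude lower bound on I_1(alpha, k) from the text, assuming Z ~ N(0, sigma^2)."""
    exp_2h = 2.0 * math.pi * math.e * sigma ** 2  # exp(2 h(Z)) for Gaussian noise
    v = math.floor(alpha * k) * b_min_abs ** 2
    return 0.5 * math.log((4.0 / exp_2h) * v ** 2 + 1.0)


def conservative_threshold(p, k, b_min_abs, sigma, alpha_star, eta=0.05, grid=1000):
    """Grid approximation of the right-hand side of (8) with I_1 replaced by its crude bound."""
    best = 0.0
    for j in range(grid + 1):
        alpha = alpha_star + (1.0 - alpha_star) * j / grid
        info = i1_crude_lower_bound(alpha, k, b_min_abs, sigma)
        if info <= 0.0:
            return math.inf  # floor(alpha * k) = 0: the crude bound is vacuous
        best = max(best, alpha * k * math.log(p / k) / info)
    return best * (1.0 + eta)
```

A call such as `conservative_threshold(10**6, 100, 0.1, 0.1, 0.1)` uses $k=100$ equal-magnitude entries with $\|b\|_{2}=1$; decreasing `sigma` increases the bound on $I_{1}$ for every $\alpha$ and therefore lowers the returned threshold.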

We believe that the additional assumptions on $m_{\beta}$ and $k$ in the achievability part are an artifact of our analysis, and note that similar assumptions were made for the linear model in [14]. The conditions in Theorem 1 are less restrictive than those in [14] since we are considering approximate recovery instead of exact recovery.

Gaussian setting. We now turn to a (complex) Gaussian model on the non-zero entries in which $\beta_{s}\sim\mathcal{CN}(0,\mathbf{I}_{k}\sigma_{\beta}^{2})$ , $\sigma_{\beta}^{2}=\frac{c_{\beta}}{k}$ for some $c_{\beta}>0$ . This is analogous to a model considered for the linear setting in [20, 14]. Our result is stated in terms of the mutual information quantities

[TABLE]

where $g(\cdot)$ is defined as

$$g(\alpha):=\int_{0}^{\infty}[\alpha-F_{1}(u)]^{+}\,du,$$

with $F_{1}$ denoting the cumulative distribution function of a $|\mathcal{CN}(0,1)|^{2}$ random variable.
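Since $|\mathcal{CN}(0,1)|^{2}$ is exponentially distributed with unit mean, $F_{1}(u)=1-e^{-u}$, and the integral defining $g$ can be evaluated explicitly: the integrand is positive only for $u<-\log(1-\alpha)$, which gives $g(\alpha)=\alpha+(1-\alpha)\log(1-\alpha)$ for $\alpha\in[0,1)$ and $g(1)=1$. This closed form is a side calculation added here for illustration rather than a formula quoted from the paper; the sketch below compares it against a simple midpoint-rule integration (function names and the truncation point `upper` are arbitrary choices).

```python
import math


def g_closed_form(alpha):
    """g(alpha) = alpha + (1 - alpha) log(1 - alpha) on [0, 1), with g(1) = 1."""
    if alpha >= 1.0:
        return 1.0
    return alpha + (1.0 - alpha) * math.log(1.0 - alpha)


def g_numeric(alpha, upper=50.0, steps=100_000):
    """Midpoint-rule approximation of the integral of [alpha - F_1(u)]^+ over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        cdf = 1.0 - math.exp(-u)  # F_1(u) for an Exp(1) random variable
        total += max(0.0, alpha - cdf)
    return total * h
```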

Theorem 2.

Consider the phase retrieval setup in Section II where $Z\sim\mathcal{N}(0,\sigma^{2})$ , and $\beta_{s}\sim\mathcal{CN}(0,\mathbf{I}_{k}\sigma_{\beta}^{2})$ with $\sigma_{\beta}^{2}=\frac{c_{\beta}}{k}$ for some constant $c_{\beta}>0$ . If $k\to\infty$ with $\log k=o(\log p)$ , then we have $\mathsf{P}_{\mathrm{e}}(\alpha^{*})\to 0$ as $p\to\infty$ provided that

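$$n\geq\max_{\alpha\in[\alpha^{*},1]}\frac{\alpha k\log\frac{p}{k}}{\bar{I}_{1}(\alpha)}(1+\eta), \tag{13}$$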

for arbitrarily small $\eta>0$ .

Conversely, under the broader scaling regime $k\to\infty$ with $k=o(p)$ , we have $\mathsf{P}_{\mathrm{e}}(\alpha^{*})\to 1$ as $p\to\infty$ whenever

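$$n\leq\max_{\alpha\in[\alpha^{*},1]}\frac{(\alpha-\alpha^{*})k\log\frac{p}{k}}{\bar{I}_{2}(\alpha)}(1-\eta) \tag{14}$$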

for arbitrarily small $\eta>0$ .

The assumption $\log k=o(\log p)$ in the achievability part (which holds, for example, when $k=O((\log p)^{c})$ for some $c>0$) is rather restrictive compared to the general $k=o(p)$ scaling in the converse part. The former arises from a significant technical challenge (see Proposition 14 below), and we expect that the requirement is merely an artifact of our analysis. (In fact, extending our analysis to the broader scaling regime $k=O(p^{1-\epsilon})$ (for some $\epsilon>0$) leads to the correct scaling $n=O(k\log p)$, but unfortunately the resulting constant factors are quite loose compared to Theorem 2.) In addition, we note that while we allowed an arbitrary log-concave distribution in the discrete setting, here we have focused on $Z\sim\mathcal{N}(0,\sigma^{2})$ to simplify the analysis. Despite this restriction, we believe that Gaussian noise still captures the essential features of the phase retrieval problem.

Once again, the scaling $\sigma_{\beta}^{2}=\frac{c_{\beta}}{k}$ amounts to a fixed SNR. As mentioned in [20], exact recovery is not possible for Gaussian $\beta_{s}$ when the SNR is constant, and may even need a huge number of measurements when the SNR increases with $p$ . This motivates the consideration of approximate recovery in this setting.

The differences between the upper and lower bounds are similar to the discrete case. In particular, although the constants differ, the bounds are similar, and always have the same scaling laws. In the limit $c_{\beta}\to\infty$ , we have $\bar{I}_{1}(\alpha)=\log(c_{\beta})(1+o(1))$ and $\bar{I}_{2}(\alpha)=\log(c_{\beta})(1+o(1))$ ; in this case, the maxima in (13)-(14) are both achieved with $\alpha\to 1$ , and hence, the two bounds coincide to within a multiplicative factor of $\frac{1}{1-\alpha^{*}}$ .

Comparison to the linear model. In Figures 1 and 2, we plot the upper and lower bounds of Theorem 1 and Theorem 2 for $\alpha^{*}=0.1$ under various signal-to-noise ratios (SNRs), along with the counterparts for the linear model in [14]. [Footnote 4: The approximate recovery result for the discrete case was not explicitly stated in [14], but it is easily inferred from the analysis, and amounts to a much simpler version of the analysis of the present paper.] For the discrete model, we focus on the simple case that $Z\sim\mathcal{N}(0,\sigma^{2})$ and

[TABLE]

for some $c_{\beta}>0$ , corresponding to $m_{\beta}=1$ in Theorem 1. In Appendix A, we describe how we equate the SNR in the linear and phase retrieval models, and also how to evaluate the bounds of Theorem 1 when $m_{\beta}=1$ .

As predicted by the discussion following Theorems 1 and 2, the upper and lower bounds are close (though still with a constant gap) when the SNR is sufficiently high. In addition, in this regime the information-theoretic limits of the phase retrieval model and the linear model are very similar, especially in the Gaussian case.

However, at lower SNR, the gap for the phase retrieval model can widen significantly more than that of the linear model. This appears to be because the key mutual information quantities arising in the analysis can be expressed in closed form only in the linear model, whereas the phase retrieval model requires possibly-loose bounds. Closing these gaps (at least partially) would therefore only require improved mutual information bounds for the phase retrieval setting (cf. Section III-D).

III Auxiliary Results

In this section, we introduce the main auxiliary results needed to prove Theorems 1 and 2. We first introduce some notation and recall the initial bounds for general observation models from [14], and then present the relevant log-concavity properties, mutual information bounds, and concentration bounds.

III-A Information-Theoretic Definitions

We first outline some information-theoretic definitions from [14], recalling that we are conditioning on a fixed $S=s$ throughout. We consider partitions of the support set $s\in\mathcal{S}$ into two disjoint sets $s_{\rm{dif}}\neq\emptyset$ and $s_{\rm{eq}}$ , where $s_{\rm{eq}}$ will typically correspond to an overlap between $s$ and some other set $\bar{s}$ (i.e., $s\cap\bar{s}$ , the “equal” part), and $s_{\rm{dif}}$ will correspond to the indices in one set but not in the other (i.e., $s\setminus\bar{s}$ , the “differing” part).
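The partition just described can be illustrated with a minimal sketch. The function name `partition_support` and the example index sets are hypothetical and chosen purely for illustration; they do not appear in [14] or in this paper.

```python
def partition_support(s, s_bar):
    """Split the true support s relative to another candidate set s_bar.

    Returns (s_dif, s_eq), where s_eq = s & s_bar is the "equal" part and
    s_dif = s - s_bar is the "differing" part.
    """
    s, s_bar = set(s), set(s_bar)
    s_eq = s & s_bar   # indices shared by both sets
    s_dif = s - s_bar  # indices of s that are missing from s_bar
    return s_dif, s_eq


# Illustrative example with k = 4 (values are arbitrary).
s = {2, 5, 7, 11}
s_bar = {2, 3, 7, 13}
s_dif, s_eq = partition_support(s, s_bar)
```

In this example `s_dif` is non-empty and the two parts are disjoint with union `s`, matching the requirements stated above.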

For fixed $s\in\mathcal{S}$ and a corresponding pair $(s_{\rm{dif}},s_{\rm{eq}})$ , we introduce the notation

[TABLE]

where $f_{\mathbf{Y}|\mathbf{X}_{s}}$ is the marginal distribution of (4). While the left-hand sides of (16) and (17) represent the same quantities for any pair $(s_{\rm{dif}},s_{\rm{eq}})$ , it will still prove convenient to work with these in place of the right-hand sides. In particular, this allows us to introduce the marginal distributions

[TABLE]

where $\ell:=|s_{\rm{dif}}|$ . Using the preceding definitions, we introduce two information densities (in the terminology of the information theory literature, e.g., [23]). The first contains probabilities averaged over $\beta_{s}$ ,

[TABLE]

whereas the second conditions on $\beta_{s}=b_{s}$ :

[TABLE]

where $(x^{(i)},y^{(i)})$ is the $i$ -th measurement, and the single-letter information density is

[TABLE]

Averaging (22) with respect to the distribution in (17) conditioned on $\beta_{s}=b_{s}$ yields a conditional mutual information quantity, which is denoted by

[TABLE]

III-B General Achievability and Converse Bounds

For the general support recovery problem with probabilistic models, the following achievability and converse bounds are given in [14]. While these are stated for the real-valued setting in [14], the proofs apply verbatim to the complex-valued setting.

Theorem 3.

[14, Theorem 5] Fix any constants $\delta_{1}>0,\delta_{2}\in(0,1)$ , and $\gamma>0$ , and functions $\{\psi_{\ell}\}_{\ell=\lfloor\alpha^{*}k\rfloor}^{k}(\psi_{\ell}:\mathbb{Z}\times\mathbb{R}\to\mathbb{R})$ such that the following holds:

[TABLE]

for all $(s_{\rm{dif}},s_{\rm{eq}})$ with $\lfloor\alpha^{*}k\rfloor\leq|s_{\rm{dif}}|\leq k$ and for all $b_{s}$ in some (typical) set $\mathcal{T}_{\beta}$ . Then we have

[TABLE]

where

[TABLE]

Theorem 4.

[14, Theorem 6] Fix any constants $\delta_{1}>0,\delta_{2}>0$ , and functions $\{\psi^{\prime}_{\ell}\}_{\ell=\lfloor\alpha^{*}k\rfloor}^{k}(\psi^{\prime}_{\ell}:\mathbb{Z}\times\mathbb{R}\to\mathbb{R})$ such that the following holds:

[TABLE]

for all $(s_{\rm{dif}},s_{\rm{eq}})$ with $|s_{\rm{dif}}|\in[\lfloor\alpha^{*}k\rfloor,k]$ , and for all $b_{s}$ in some (typical) set $\mathcal{T}_{\beta}$ . Then we have

[TABLE]

The steps for applying and simplifying these bounds are as follows:

1. Establish an explicit characterization of each mutual information term $I_{s_{\rm{dif}},s_{\rm{eq}}}(b_{s})$ (e.g., upper and lower bounds);

2. Use concentration of measure to find expressions for each function $\psi_{\ell}$ and $\psi^{\prime}_{\ell}$ in Theorems 3 and 4, i.e., functions satisfying (24) and (28);

3. According to the specific model on the non-zero entries $\beta_{s}$ under consideration, choose a suitable typical set $\mathcal{T}_{\beta}$ , and also a value of $\gamma$ , so that both $\mathbb{P}[\beta_{s}\notin\mathcal{T}_{\beta}]$ and $P_{0}(\gamma)$ can be proved to be vanishing as $p\to\infty$ ;

4. Combine and simplify the preceding steps to deduce the final sample complexity bound.

These steps turn out to be highly non-trivial in the phase retrieval setting. In the following subsections, we provide general-purpose tools for Steps 1 and 2; we defer Steps 3 and 4 to Section IV for discrete $\beta_{s}$ , and to Section V for Gaussian $\beta_{s}$ .

III-C Log-Concavity Properties

Both our mutual information bounds and concentration bounds will crucially rely on the log-concavity properties stated in the following lemma.

Lemma 5.

Under the phase retrieval setup in Section II, we have the following:

1. Given $S=s$ and $\beta_{s}=b_{s}$ , the conditional marginal density of $Y$ is log-concave;

2. Given $S=s$ , $\beta_{s}=b_{s}$ , and $X_{s_{\rm{eq}}}=x_{s_{\rm{eq}}}$ for some $s_{\rm eq}\subset s$ , the conditional marginal density of $Y$ is log-concave.

Proof:

Recall that $Z$ is log-concave by assumption, and $Y=|\langle X_{s},b_{s}\rangle|^{2}+Z$ with $X_{s}$ having i.i.d. $\mathcal{C}\mathcal{N}(0,1)$ entries. In other words, $Y=U+Z$ , where $U$ is the squared magnitude of a $\mathcal{CN}(0,\|b_{s}\|_{2}^{2})$ random variable. We observe that $Y$ is log-concave, since the $\chi^{2}$ distribution with two degrees of freedom is log-concave [24] and the convolution of two log-concave functions is log-concave [25].

In addition, given $S=s$ , $\beta_{s}=b_{s}$ , and $X_{s_{\rm{eq}}}=x_{s_{\rm{eq}}}$ , we have $Y=U+Z$ , where $U$ is the squared magnitude of a $\mathcal{CN}(\langle b_{s_{\rm{eq}}},x_{s_{\rm eq}}\rangle,\|b_{s_{\rm{dif}}}\|_{2}^{2})$ random variable. This distribution on $Y$ is also log-concave by a similar argument, and the fact that the non-central $\chi^{2}$ distribution with two degrees of freedom is log-concave [24]. ∎
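As a numerical sanity check of the first claim, the following sketch evaluates the log-density of $Y=U+Z$ on a grid and inspects its second differences. It assumes the special case of Gaussian noise $Z\sim\mathcal{N}(0,\sigma^{2})$ (the lemma allows any log-concave $Z$), and uses the standard closed form for the sum of an exponential variable with mean $v=\|b_{s}\|_{2}^{2}$ and an independent Gaussian. The function names, grid, and parameter values are illustrative assumptions, and the check is not a substitute for the proof.

```python
import math


def log_density_y(y, v=1.0, sigma=1.0):
    """Log-density of Y = U + Z, with U ~ v * Exp(1) and Z ~ N(0, sigma^2) independent.

    U models the squared magnitude of a CN(0, v) variable. The expression is
    the standard exponentially-modified Gaussian density with rate lam = 1/v.
    """
    lam = 1.0 / v
    return (
        math.log(lam / 2.0)
        + (lam / 2.0) * (lam * sigma**2 - 2.0 * y)
        + math.log(math.erfc((lam * sigma**2 - y) / (math.sqrt(2.0) * sigma)))
    )


def max_second_difference(v=1.0, sigma=1.0, lo=-3.0, hi=8.0, step=0.01):
    """Largest discrete second difference of the log-density on a grid.

    For a log-concave density this is non-positive up to rounding error.
    """
    n = int(round((hi - lo) / step))
    vals = [log_density_y(lo + i * step, v, sigma) for i in range(n + 1)]
    return max(vals[i - 1] - 2.0 * vals[i] + vals[i + 1] for i in range(1, n))


worst = max_second_difference()
worst_alt = max_second_difference(v=2.0, sigma=0.5)
```

A non-positive value of `worst` (up to floating-point rounding) is consistent with log-concavity on the tested range; the grid is kept moderate so that the complementary error function does not underflow.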

III-D Mutual Information Bounds

While an exact expression for the mutual information $I_{s_{\rm{dif}},s_{\rm{eq}}}(b)$ does not appear to be possible, the following theorem states closed-form upper and lower bounds. While there is a gap between the two in general, the asymptotic behavior is similar when $v_{\rm{dif}}=\sum_{i\in s_{\rm{dif}}}|b_{i}|^{2}$ grows large; this fact ultimately leads to tight sample complexity bounds in the high-SNR setting.

Theorem 6.

For the phase retrieval setup in Section II, the following holds for $I_{s_{\rm{dif}},s_{\rm{eq}}}(b_{s})$ defined in (23):

[TABLE]

where $v_{\rm{eq}}=\sum_{i\in s_{\rm{eq}}}|b_{i}|^{2}$ and $v_{\rm{dif}}=\sum_{i\in s_{\rm{dif}}}|b_{i}|^{2}$ .

Proof:

The upper bound is based on the entropy power inequality and the maximum entropy property of the Gaussian distribution, and the lower bound is based on (known) results that give nearly-matching lower bounds for log-concave random variables. The details are given in Appendix B. ∎

III-E Concentration Bounds

Perhaps the most technically challenging part of our analysis is to establish concentration bounds amounting to explicit expressions for $\psi_{\ell}$ and $\psi^{\prime}_{\ell}$ in Theorems 3 and 4.

Before stating the final concentration bounds, we provide a general result that may be of independent interest, giving a concentration bound on conditional information random variables of the form $\tilde{h}(\mathbf{Y}|\mathbf{X})=-\log f_{\mathbf{Y}|\mathbf{X}}(\mathbf{Y}|\mathbf{X})$ (in generic notation) under certain log-concavity assumptions. Such a result is provided as a corollary of the following, which considers generic random variables $(X,Y)$ that need not be associated with the phase retrieval problem at this point.

Proposition 7.

Suppose that $(X,Y)\in\mathbb{R}^{2k}\times\mathbb{R}$ with joint density function $f_{XY}$ . For each $t\in\mathbb{R}^{+}$ , define

[TABLE]

and assume that

[TABLE]

for all $t\in\mathbb{R}^{+}$ . Moreover, for an arbitrary positive number $\bar{Q}>0$ (to be chosen later), define

[TABLE]

and assume that

[TABLE]

Then, the following holds:

[TABLE]

where

[TABLE]

Proof:

We follow the general approach of [22], which considers the unconditional information variable $\tilde{h}(x)=-\log f_{X}(x)$ ; however, many of the details differ significantly. The reader is referred to Appendix C. ∎

From this, we immediately deduce a similar result for i.i.d. product distributions.

Corollary 8.

Let $k\in\mathbb{Z}^{+}$ . Suppose that $(\mathbf{X},\mathbf{Y})\in\mathbb{R}^{2kn}\times\mathbb{R}^{n}$ with distribution $f_{XY}^{n}$ (i.e., i.i.d. on $f_{XY}$ ), where $f_{XY}$ satisfies (33) and (35). Then, the following holds:

[TABLE]

where

[TABLE]

and $K_{1}$ is defined in (34).

Proof:

The i.i.d. assumption readily yields $h(\mathbf{Y}|\mathbf{X})=nh(Y|X)$ and $\mathbb{E}[\exp(\mu\tilde{h}(\mathbf{Y}|\mathbf{X}))]=\big{(}\mathbb{E}[\exp(\mu\tilde{h}(Y|X))]\big{)}^{n}$ , where $(X,Y)\sim f_{XY}$ . Hence,

[TABLE]

and the corollary follows by bounding the expectation via Proposition 7. ∎

We are now ready to state a general result on the concentration of conditional information variables.

Corollary 9.

Let $(\mathbf{X},\mathbf{Y})\sim f_{XY}^{n}$ with $X\in\mathbb{R}^{2k},Y\in\mathbb{R}$ . Then, under conditions (33) and (35) of Proposition 7, the following holds for any $\mu>0$ :

[TABLE]

where $\tilde{h}(\mathbf{Y}|\mathbf{X})$ is defined in (38), $K_{1}$ in (34), and $r(\mu)$ in (39).

Proof:

With Corollary 8 in place, this is a fairly straightforward application of the Chernoff bound. The details are given in Appendix C-C. ∎
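The Chernoff step can be sketched as follows in the generic notation of Corollary 8. The threshold $n\delta$ with $\delta>0$ is an illustrative parameterization assumed for this sketch, and the final substitution of the moment bound from Proposition 7 is omitted because it depends on the constants in (34) and (39). Only Markov's inequality and the i.i.d. factorization from the proof of Corollary 8 are used.

```latex
\begin{align*}
\mathbb{P}\big[\tilde{h}(\mathbf{Y}|\mathbf{X}) - h(\mathbf{Y}|\mathbf{X}) \ge n\delta\big]
  &= \mathbb{P}\Big[\exp\big(\mu(\tilde{h}(\mathbf{Y}|\mathbf{X}) - h(\mathbf{Y}|\mathbf{X}))\big) \ge e^{\mu n\delta}\Big] \\
  &\le e^{-\mu n\delta}\,
     \mathbb{E}\Big[\exp\big(\mu(\tilde{h}(\mathbf{Y}|\mathbf{X}) - h(\mathbf{Y}|\mathbf{X}))\big)\Big] \\
  &= e^{-\mu n\delta}\,
     \Big(\mathbb{E}\Big[\exp\big(\mu(\tilde{h}(Y|X) - h(Y|X))\big)\Big]\Big)^{n},
  \qquad \mu > 0.
\end{align*}
```

The lower tail is handled in the same way after replacing $\mu$ by $-\mu$ inside the exponential.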

Remark 10.

Some remarks are in order.

• In [22, Theorem 3.1], the authors showed that $\mathbb{E}[\exp(\mu(\tilde{h}(Y)-h(Y)))]\leq\exp(Kr(-\mu))$ for any $\mu\in\mathbb{R}$ and any random vector $Y\in\mathbb{R}$ such that $f_{Y}\in L^{t}(\mathbb{R})$ for all $t>0$ (i.e., $|f_{Y}(y)|^{t}$ is absolutely integrable). Proposition 7 shows that this fact can be extended to conditional distributions under some assumptions on the joint distribution $f_{XY}$ .

• When $\mathbf{X}$ and $\mathbf{Y}$ are independent, Proposition 7 is very similar to [22, Theorem 3.1].

• When we apply Corollary 9 to the phase retrieval problem, we will bound $K_{1}$ using the log-concavity properties in Lemma 5.

• If $\mathbf{X}$ and $\mathbf{Y}$ were jointly log-concave, a variant of (36) with an alternative definition for $K_{1}$ could be used based on [22, Theorem 3.1] and the union bound, since $\tilde{h}(\mathbf{Y}|\mathbf{X})=\tilde{h}(\mathbf{X},\mathbf{Y})-\tilde{h}(\mathbf{X})$ and $h(\mathbf{Y}|\mathbf{X})=h(\mathbf{X},\mathbf{Y})-h(\mathbf{X})$ . However, such a bound is not suitable for our purposes, since the measurement variables and outputs in the phase retrieval problem are not jointly log-concave.

• Alternatively, using only the fact that $f_{\mathbf{Y}|\mathbf{X}}(\cdot|\mathbf{x})$ is log-concave for all $\mathbf{x}$ , [22, Theorem 3.1] gives for suitably-defined $K$ that

[TABLE]

However, (40) does not appear to follow from (46).

Although the preceding results are general, finding an explicit expression or upper bound for $K_{1}$ in (34) is non-trivial. With some technical effort, we are able to attain such a bound for the phase retrieval model and deduce the following key concentration result used in the proofs of Theorems 1 and 2.

Theorem 11.

Under the phase retrieval setup in Section II, the following bounds hold:

[TABLE]

for all $\mu>0$ , where $I_{s_{\rm{dif}},s_{\rm{eq}}}(b_{s})$ is defined in (23), $C(b_{s})$ is a constant depending on $b_{s}\in\mathbb{C}^{k}$ (its definition is given in (277) in Lemma 23), and $r(\mu)$ is defined in (39).

Proof:

See Appendix D. ∎

It turns out that the constant $C(b_{s})$ behaves as $\Theta(1)$ whenever $\|b_{s}\|_{2}=\Theta(1)$ , leading to the following corollary.

Corollary 12.

For the complex phase retrieval problem in (1), equations (47) and (48) hold with $C(b_{s})$ replaced by some constant $C=\Theta(1)$ under the condition $\|b_{s}\|_{2}=\Theta(1)$ .

Proof:

See Appendix D. ∎

IV Proof of Theorem 1 (Discrete $\beta_{s}$ )

As a stepping stone to proving Theorem 1, we state the following lemma, which can be thought of as a version of that theorem before applying the suitable mutual information bounds and asymptotic simplifications. Recall that $I_{s_{\rm{dif}},s_{\rm{eq}}}(b_{s})$ is defined in (23).

Lemma 13.

Consider the setup of Theorem 1 with $\beta_{s}$ being a uniformly random permutation of $b_{s}=(b_{1},b_{2},\ldots,b_{k})$ satisfying $|b_{\rm{min}}|=\Theta(|b_{\rm{max}}|)$ and $\|b_{s}\|_{2}=\Theta(1)$ , where $k\to\infty$ and $(b_{1},b_{2},\ldots,b_{k})$ contains $m_{\beta}$ distinct elements.

We have $\mathsf{P}_{\mathrm{e}}(\alpha^{*})\to 0$ as $p\to\infty$ provided that

[TABLE]

for arbitrarily small $\eta>0$ if either of the following additional conditions holds: (i) $m_{\beta}=\Theta(1)$ and $k=o(p)$ ; or (ii) $\log k=o(\log p)$ .

Conversely, for general $m_{\beta}$ and $k=o(p)$ , we have $\mathsf{P}_{\mathrm{e}}(\alpha^{*})\to 1$ as $p\to\infty$ provided that

[TABLE]

for arbitrarily small $\eta>0$ .

IV-A Proof of Lemma 13

We apply Theorem 3 in several steps as follows.

Step 1: Choose the typical set. Let $\mathcal{T}_{\beta}$ be the set of all permutations of the fixed complex vector $b_{s}=(b_{1},b_{2},\ldots,b_{k})$ . Under the conditions $|b_{\rm{min}}|=\Theta(|b_{\rm{max}}|)$ and $\|b_{s}\|_{2}=\Theta(1)$ , we observe that the quantity $v_{\rm{dif}}=\sum_{i\in s_{\rm{dif}}}|b_{i}|^{2}$ also behaves as $\Theta(1)$ , while $v_{\rm{eq}}=\sum_{i\in s_{\rm{eq}}}|b_{i}|^{2}$ behaves as $O(1)$ (note that we only consider $s_{\rm{dif}}$ with cardinality $\Theta(k)$ , a constant fraction of the total $k$ ). Hence, we find from (6) of Theorem 6 that

[TABLE]

In addition, since there are at most $m_{\beta}^{k}$ possible random permutations by the definition of $m_{\beta}$ , choosing $\gamma=\log\frac{1}{\min_{b_{s}}f_{\beta_{s}}(b_{s})}\leq k\log m_{\beta}$ gives $P_{0}(\gamma)=0$ ; this immediately follows by writing $f_{\mathbf{Y}|\mathbf{X}_{s}}(\mathbf{y}|\mathbf{x}_{s})=\sum_{b_{s}}f_{\beta_{s}}(b_{s})f_{\mathbf{Y}|\mathbf{X}_{s}\beta_{s}}(\mathbf{y}|\mathbf{x}_{s},b_{s})$ in (27).
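The counting argument behind $\gamma\leq k\log m_{\beta}$ can be checked on a toy example. The sketch below counts the distinct permutations of a short vector via the multinomial coefficient and compares the resulting $\gamma$ with $k\log m_{\beta}$ ; the vector `b` and the helper name are hypothetical, and a uniform prior over distinct permutations is assumed, as in Lemma 13.

```python
import math
from collections import Counter


def num_distinct_permutations(b):
    """Number of distinct orderings of b: k! / (n_1! * ... * n_m!)."""
    total = math.factorial(len(b))
    for multiplicity in Counter(b).values():
        total //= math.factorial(multiplicity)
    return total


# Toy vector with k = 6 entries and m_beta = 3 distinct values (illustrative).
b = [1 + 0j, 1 + 0j, 1j, 1j, -1 + 0j, 1 + 0j]
k = len(b)
m_beta = len(set(b))
count = num_distinct_permutations(b)

# Under a uniform prior, f(b_s) = 1 / count for every distinct permutation,
# so log(1 / min f) = log(count).
gamma = math.log(count)
gamma_upper = k * math.log(m_beta)
```

Here `count` is 60, well below $m_{\beta}^{k}=729$ , so `gamma` sits below `gamma_upper` as the bound requires.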

Step 2: Bound the information density tail probabilities. Fix $\delta_{2}>0$ (to be chosen later), and define

[TABLE]

for each $|s_{\rm dif}|$ , where $C$ is defined in Corollary 12.

Now, for each integer $\ell$ representing $|s_{\rm{dif}}|$ , set

[TABLE]

By setting $\mu=\mu_{|s_{\rm{dif}}|}$ in (47), and applying Corollary 12, we have

[TABLE]

Similarly, we obtain from (48) that

[TABLE]

This means that the conditions (24) and (28) are satisfied with $\psi_{\ell}$ and $\psi^{\prime}_{\ell}$ defined in (53), respectively.

Step 3: Control the remainder terms. We first consider the remainder term $\psi^{\prime}_{\ell}$ in the converse bound (30) resulting from (56). Since $|s_{\rm{dif}}|\in[\lfloor\alpha^{*}k\rfloor,k]$ , we have $|s_{\rm{dif}}|=\lfloor\alpha k\rfloor$ for some $\alpha\in[\alpha^{*},1]$ . Since $I_{s_{\rm{dif}},s_{\rm{eq}}}(b_{s})=\Theta(1)$ as stated in (51), we deduce from (52) that

[TABLE]

so that (54) yields

[TABLE]

We choose $\delta_{2}$ to be a slowly vanishing function of $p$ , so that a simple Taylor expansion in the definition of $r(\cdot)$ in (39) yields $r\big{(}\Theta(\delta_{2})\big{)}=\Theta(\delta_{2}^{2})$ . Therefore, (58) vanishes as $p\to\infty$ if $n=\omega\big{(}\frac{1}{\delta^{2}_{2}}\big{)}$ .
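The quadratic behavior near zero can be illustrated numerically. Since the definition (39) is not reproduced in this section, the sketch below uses a hypothetical stand-in, $u-\log(1+u)$ , chosen only because it has the typical form of such rate functions; it is an assumption of this example and should not be read as the paper's definition of $r(\cdot)$ .

```python
import math


def r_standin(u):
    """Hypothetical stand-in for r(.) in (39): u - log(1 + u), for u > -1.

    This is an assumed form used for illustration only.
    """
    return u - math.log1p(u)


# The ratio r(delta) / delta^2 should approach a positive constant (here 1/2)
# as delta -> 0, which is the Theta(delta^2) behavior used in the text.
deltas = (1e-1, 1e-2, 1e-3)
ratios = [r_standin(d) / d**2 for d in deltas]
```

The second-order Taylor term dominates for small arguments, so any function with this local behavior leads to the same requirement $n=\omega\big{(}\frac{1}{\delta_{2}^{2}}\big{)}$ .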

We now turn to the achievability part. First observe that the term $\sum_{\ell=\lfloor\alpha^{*}k\rfloor}^{k}{k\choose\ell}\psi_{\ell}(n,\delta_{2})$ in (26) vanishes as $p\to\infty$ provided that $(1-\alpha^{*})k\max_{\ell\in[\lfloor\alpha^{*}k\rfloor,k]}{k\choose\ell}\psi_{\ell}(n,\delta_{2})\to 0$ . Since $\psi_{\ell}(n,\delta_{2})>0$ , this is equivalent to

[TABLE]

for all $\ell\in[\lfloor\alpha^{*}k\rfloor,k]$ . From (53) and (59), we find that $\sum_{\ell=\lfloor\alpha^{*}k\rfloor}^{k}{k\choose\ell}\psi_{\ell}(n,\delta_{2})\to 0$ provided that

[TABLE]

as $p\to\infty$ for all $\ell\in[\lfloor\alpha^{*}k\rfloor,k]$ , where we have used $\log{k\choose\ell}\leq\ell\log\frac{ke}{\ell}$ .
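The binomial bound $\log{k\choose\ell}\leq\ell\log\frac{ke}{\ell}$ used here is standard, and a brute-force check for small $k$ is straightforward. The helper names and the tested values of $k$ below are illustrative choices.

```python
import math


def log_binom(k, ell):
    """Natural logarithm of the binomial coefficient C(k, ell)."""
    return math.log(math.comb(k, ell))


def check_binomial_bound(k):
    """Check log C(k, ell) <= ell * log(k * e / ell) for every ell in 1..k."""
    return all(
        log_binom(k, ell) <= ell * math.log(k * math.e / ell)
        for ell in range(1, k + 1)
    )


ok_small = check_binomial_bound(50)
ok_large = check_binomial_bound(200)
```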

Since $\ell=\lfloor\alpha k\rfloor$ for some $\alpha\in[\alpha^{*},1]$ , (60) holds provided that

[TABLE]

for arbitrarily small $\eta>0$ . Again using $\min\{r(\mu_{\ell}),r(-\mu_{\ell})\}=\Theta(\delta_{2}^{2})$ for slowly vanishing $\delta_{2}$ (as established in the above converse part), we find that this condition simplifies to $n=\Omega\big{(}\frac{k}{\delta_{2}^{2}}\big{)}$ .

We also need to consider the effect of the term $\gamma$ in Theorem 3, recalling that we already established that $P_{0}(\gamma)=0$ with $\gamma\leq k\log m_{\beta}$ . For the first case in Lemma 13, i.e., $m_{\beta}=\Theta(1)$ and $k=o(p)$ , we have $\gamma=O(k)$ . In the second case, i.e. $m_{\beta}=O(k)$ and $\log k=o(\log p)$ , we have $\gamma=O(k\log k)=o\big{(}k\log\frac{p}{k}\big{)}$ . Hence, in both cases, we have $\gamma=o\big{(}k\log\frac{p}{k}\big{)}$ .

Step 4: Combine and simplify. For the converse part, since (58) vanishes when $n=\omega\big{(}\frac{1}{\delta_{2}^{2}}\big{)}$ , we deduce from Theorem 4 (with $\delta_{1}\to 0$ and $\delta_{2}\to 0$ sufficiently slowly) that $\mathsf{P}_{\mathrm{e}}(\alpha^{*})\to 1$ when (50) holds, as required. Specifically, (50) is merely an asymptotic simplification of (29).

For the achievability part, by choosing $\delta_{1}\to 0$ and $\delta_{2}\to 0$ sufficiently slowly in Theorem 3, we find that the condition (25) reduces to

[TABLE]

for arbitrarily small $\eta>0$ . Since $k=o(p)$ and $|s_{\rm{dif}}|=\lfloor\alpha k\rfloor$ for some $\alpha\in[\alpha^{*},1]$ , the first term in the numerator of (62) behaves as $\Theta(\alpha k\log(\frac{p}{k}))$ , and the second term behaves as $\Theta(\log k+|s_{\rm{dif}}|\log\frac{k}{|s_{\rm{dif}}|})=\Theta(\alpha k)$ . Since for both cases (i) and (ii) of Lemma 13, we have $\gamma=o(k\log(\frac{p}{k}))$ , it immediately follows that the numerator in (62) is dominated by the first term and the others can be factored into the remainder term $\eta>0$ . Moreover, the condition $n=\Omega\big{(}\frac{k}{\delta_{2}^{2}}\big{)}$ stated following (61) behaves as $o\big{(}\alpha k\log\frac{p}{k}\big{)}$ when $\delta_{2}\to 0$ sufficiently slowly (e.g., $\delta_{2}=\Theta(\frac{1}{\log(\log k)})$ ). Combining these observations, we deduce that we only require (62), with the first term alone kept in the numerator, and the rest factored into $\eta$ in (49).

IV-B Proof of Theorem 1

Recall the definitions $v_{\rm{dif}}=\sum_{i\in s_{\rm{dif}}}|b_{i}|^{2}$ and $v_{\rm{eq}}=\sum_{i\in s_{\rm{eq}}}|b_{i}|^{2}$ . Since $|s_{\rm{dif}}|\in[\lfloor\alpha^{*}k\rfloor,k]$ , we have $|s_{\rm{dif}}|=\lfloor\alpha k\rfloor$ for some $\alpha\in[\alpha^{*},1]$ .

For the achievability part, we use the lower bound in (6) of Theorem 6. Since this lower bound is increasing in $v_{\rm{dif}}$ and does not depend on $v_{\rm{eq}}$ , we have the following whenever $|s_{\rm{dif}}|=\lfloor\alpha k\rfloor$ :

[TABLE]

recalling that $I_{1}(\alpha,k)$ defined in (6) replaces $v_{\rm dif}$ by the value corresponding to the lowest-magnitude entries of $b_{s}$ . Hence, (8) of Theorem 1 follows from (49) of Lemma 13 by observing that the numerator of (49) behaves as $\big{(}\alpha k\log\frac{p}{k}\big{)}(1+o(1))$ and the denominator is lower bounded by $I_{1}(\alpha,k)$ via (63).

For the converse part, we use the upper bound in (6) of Theorem 6. While this bound depends on $v_{\rm{dif}}$ and $v_{\rm{eq}}$ in a more complicated fashion, the converse bound (50) remains valid when we replace the maximum over $(s_{\rm{dif}},s_{\rm{eq}})$ by any fixed choice. Under the choice in which $s_{\rm{dif}}$ contains the indices corresponding to the $\lfloor\alpha k\rfloor$ entries of $b_{s}$ with the smallest magnitude, (6) yields

[TABLE]

where $I_{2}(\alpha,k)$ is defined in (6).

Regarding the numerator in (50), it was shown in [14, Proof of Cor. 2] via simple asymptotic expansions that the term $\log\big{(}\sum_{d=0}^{\lfloor\alpha^{*}k\rfloor}{p-k\choose d}{|s_{\rm{dif}}|\choose d}\big{)}$ is dominated by $\alpha^{*}k\log\frac{p}{k}$ as $p\to\infty$ with $k=o(p)$ , and that the overall numerator in (50) simplifies to $(\alpha-\alpha^{*})k\log(\frac{p}{k})(1+o(1))$ if $|s_{\rm{dif}}|=\lfloor\alpha k\rfloor$ . Combining this fact with (64), we have that $\mathsf{P}_{\mathrm{e}}(\alpha^{*})\to 1$ as $p\to\infty$ if

[TABLE]

for some $\eta>0$ . This yields equation (9) of Theorem 1.

V Proof of Theorem 2 (Gaussian $\beta_{s}$ )

One of the key challenges in the Gaussian setting compared to the discrete setting is bounding the quantity $P_{0}(\gamma)$ appearing in Theorem 3. As noted in [14], this roughly amounts to bounding the mutual information quantity $I(\beta_{s};\mathbf{Y}|\mathbf{X}_{s})$ , for which the approaches proposed in [14] appear to be insufficient. The following proposition states a bound on $P_{0}(\gamma)$ resulting from a novel approach.

Proposition 14.

Under the phase retrieval setup in Section II with $Z\sim\mathcal{N}(0,\sigma^{2})$ for some $\sigma\in\mathbb{R}^{+}$ , $\beta_{s}\sim\mathcal{CN}(0,\mathbf{I}_{k}\sigma_{\beta}^{2})$ for some $\sigma_{\beta}^{2}=\Theta\big{(}\frac{1}{k}\big{)}$ , and $k\to\infty$ with $n=\Omega(k)$ , the following holds:

[TABLE]

for any $\gamma>0$ , where $P_{0}(\gamma)$ is defined in (27) of Theorem 3, i.e., $P_{0}(\gamma):=\mathbb{P}\big{[}\log\frac{f_{\mathbf{Y}|\mathbf{X}_{s}\beta_{s}}(\mathbf{Y}|\mathbf{X}_{s},\beta_{s})}{f_{\mathbf{Y}|\mathbf{X}_{s}}(\mathbf{Y}|\mathbf{X}_{s})}>\gamma\big{]}$ .

Proof:

See Section V-B. ∎

The following proposition characterizes the behavior of the $\lfloor\alpha k\rfloor$ entries of $\beta_{s}$ having the smallest magnitude for fixed $\alpha$ . For the real linear model in [14], $(\beta_{s})_{i}^{2}$ follows a chi-squared distribution with one degree of freedom for all $i=1,\cdots,k$ . However, for our phase retrieval model (cf. Section II), $|(\beta_{s})_{i}|^{2}$ follows a chi-squared distribution with two degrees of freedom. This difference only amounts to a minor change in the definition of $g(\alpha)$ in (12), and [14, Prop. 3] extends immediately to the following.

Proposition 15.

[14, Prop. 3] For $\beta_{s}$ i.i.d. on $\mathcal{C}\mathcal{N}\big{(}0,\frac{\sigma_{\beta}^{2}}{k}\big{)}$ for fixed $\sigma_{\beta}^{2}$ , we have with probability one that the following holds for all $\alpha\in[0,1]$ :

[TABLE]

where $\beta_{s}^{\prime}$ is the permutation of $\beta_{s}$ whose entries are listed in increasing order of magnitude, and $g(\alpha)$ is defined in (12).

Note that this result is essentially an application of the Glivenko-Cantelli theorem [26, Thm. 19.1], stating uniform convergence of the empirical cumulative distribution function (CDF) to the true CDF.
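The convergence can be illustrated by simulation. The sketch below draws $\beta_{s}\sim\mathcal{CN}(0,\mathbf{I}_{k}\sigma_{\beta}^{2})$ , sums the $\lfloor\alpha k\rfloor$ smallest squared magnitudes, and compares the normalized result with the limit implied by the fact that $|\mathcal{CN}(0,1)|^{2}$ is exponentially distributed with unit mean. The closed form `g_assumed` is derived from that fact and is an assumption of this example, since (12) is not reproduced in this section; the values of $k$ , $c_{\beta}$ , and the seed are arbitrary.

```python
import math
import random


def empirical_partial_energy(k, sigma_beta_sq, alphas, rng):
    """Normalized energy of the floor(alpha * k) smallest-magnitude entries."""
    # A CN(0, v) variable has independent N(0, v/2) real and imaginary parts.
    sd = math.sqrt(sigma_beta_sq / 2.0)
    energies = sorted(
        rng.gauss(0.0, sd) ** 2 + rng.gauss(0.0, sd) ** 2 for _ in range(k)
    )
    prefix = [0.0]
    for e in energies:
        prefix.append(prefix[-1] + e)
    return [prefix[math.floor(a * k)] / (k * sigma_beta_sq) for a in alphas]


def g_assumed(alpha):
    """Assumed limit: integral of u * exp(-u) from 0 to the alpha-quantile of Exp(1)."""
    return alpha + (1.0 - alpha) * math.log(1.0 - alpha)


k = 20000
c_beta = 2.0
alphas = [0.1, 0.3, 0.5, 0.7, 0.9]
rng = random.Random(0)
empirical = empirical_partial_energy(k, c_beta / k, alphas, rng)
max_gap = max(abs(e - g_assumed(a)) for e, a in zip(empirical, alphas))
```

For large $k$ the gap is small at every tested $\alpha$ , in line with the uniform convergence stated above.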

V-A Proof of Theorem 2

The proof of Theorem 2 follows the same high-level steps as those for the discrete case.

Step 1: Choose a typical set. Based on the result in Proposition 15, we set $\mathcal{T}_{\beta}$ to be the set of vectors $b_{s}$ such that $\max_{\alpha\in[0,1]}\big{|}\frac{1}{k\sigma_{\beta}^{2}}\sum_{i=1}^{\lfloor\alpha k\rfloor}|(b_{s}^{\prime})_{i}|^{2}-g(\alpha)\big{|}\leq\varepsilon$ as $k\to\infty$ , where $\varepsilon$ is chosen to decay sufficiently slowly so that $\mathbb{P}[\mathcal{T}_{\beta}]\to 1$ . We therefore have

[TABLE]

for all $b_{s}\in\mathcal{T}_{\beta}$ , and in particular $\frac{1}{k\sigma_{\beta}^{2}}\|b_{s}\|_{2}^{2}\to 1$ by setting $\alpha=1$ . In addition, we obtain

[TABLE]

by using $c_{\beta}=k\sigma_{\beta}^{2}$ and $\frac{1}{k\sigma_{\beta}^{2}}\|b_{s}\|_{2}^{2}\to 1$ .

We proceed similarly to Section IV-B for the discrete setting, recalling that $v_{\rm{dif}}=\sum_{i\in s_{\rm{dif}}}|b_{i}|^{2}$ . For the achievability part, (68) and the mutual information lower bound in (6) (with $Z\sim\mathcal{N}(0,\sigma^{2})$ ) imply (within the typical set) that for any $(s_{\rm{dif}},s_{\rm{eq}})$ with $|s_{\rm{dif}}|=\lfloor\alpha k\rfloor$ , we have

[TABLE]

where $\bar{I}_{1}(\alpha)$ is defined in (10).

For the converse part, we do not need to consider all pairs $(s_{\rm{dif}},s_{\rm{eq}})$ , since any fixed choice still provides a valid converse. Hence, for a given cardinality $|s_{\rm{dif}}|=\lfloor\alpha k\rfloor$ , we only consider the choice such that $s_{\rm{dif}}$ contains the indices corresponding to the $\lfloor\alpha k\rfloor$ entries of $b_{s}$ with the smallest magnitude. Under this choice, we have from (68)–(69) and the upper bound in (6) (with $Z\sim\mathcal{N}(0,\sigma^{2})$ ) that

[TABLE]

where $\bar{I}_{2}(\alpha)$ is defined in (11).

Step 2: Bound the information density tail probabilities. We again make use of Theorem 11 and its subsequent expression for $\psi_{\ell}$ and $\psi_{\ell}^{\prime}$ in (54).

Step 3: Control the remainder terms. Recall that $P_{0}(\gamma)$ is defined in (27) of Theorem 3. By Proposition 14, we have $P_{0}(\gamma)\to 0$ under any choice of $\gamma$ satisfying $\gamma=\vartheta_{p}k\log n$ for some $\vartheta_{p}$ growing to $\infty$ arbitrarily slowly. When this growth is sufficiently slow and $n=O\big{(}k\log\frac{p}{k}\big{)}$ , we have

[TABLE]

due to the assumption $\log k=o(\log p)$ . Note that $n=O\big{(}k\log\frac{p}{k}\big{)}$ holds trivially under the condition (14) in the converse, whereas for the achievability we can assume without loss of generality that (13) holds with equality, since additional measurements can only improve the information-theoretic performance.

By our choice of $\mathcal{T}_{\beta}$ , we may focus on realizations $b_{s}$ of $\beta_{s}$ satisfying (67). For such realizations, we have for all $s_{\rm{dif}}$ with $|s_{\rm{dif}}|=\Theta(k)$ that $v_{\rm{dif}}=\sum_{i\in s_{\rm{dif}}}|b_{i}|^{2}=\Theta(1)$ by (67) and the assumption that $\sigma_{\beta}^{2}=\frac{c_{\beta}}{k}$ . Hence, by (70)–(71) and the fact that $|s_{\rm{dif}}|=\lfloor\alpha k\rfloor$ for some $\alpha\in[\alpha^{*},1]$ , we have $I_{s_{\rm{dif}},s_{\rm{eq}}}(b_{s})=\Theta(1)$ as $k\to\infty$ . By using the same arguments as (57) and (58), we deduce that the remainder term $\psi^{\prime}_{\ell}$ resulting from (56) in the converse bound vanishes asymptotically if $n=\omega\big{(}\frac{1}{\delta^{2}_{2}}\big{)}$ .

For the achievability part, we have $\sum_{\ell=\lfloor\alpha^{*}k\rfloor}^{k}{k\choose\ell}\psi_{\ell}(n,\delta_{2})\to 0$ as $k\to\infty$ if $n=\Omega\big{(}\frac{k}{\delta_{2}^{2}}\big{)}$ by using the same arguments as (59)–(61). Recalling that we also established above (72) that $P_{0}(\gamma)\to 0$ , we deduce that $\mathsf{P}_{\mathrm{e}}(\alpha^{*})\to 0$ since $\mathsf{P}_{\mathrm{e}}(\alpha^{*})\leq\sum_{\ell=\lfloor\alpha^{*}k\rfloor}^{k}{k\choose\ell}\psi_{\ell}(n,\delta_{2})+P_{0}(\gamma)+2\delta_{1}$ by Theorem 3.

Step 4: Combine and simplify. The condition (13) is obtained from (25) of Theorem 3 and (70). By the assumption $k=o(p)$ and (72), the numerator in (25) of Theorem 3 is dominated by $\log{p-k\choose\lfloor\alpha k\rfloor}$ , which behaves as $\big{(}\alpha k\log\frac{p}{k}\big{)}(1+o(1))$ . The term $\gamma=o\big{(}k\log\frac{p}{k}\big{)}$ (see (72)) and the term $\log(\frac{k^{2}}{\delta_{1}^{2}}{k\choose|s_{\rm{dif}}|}^{2})$ in (25) have been factored into $\eta$ ; note that the latter term behaves as $O(k)$ when $\delta_{1}\to 0$ sufficiently slowly.

The converse bound in (14) is obtained similarly by using (29) of Theorem 4 and (71). Note that by [14, Proof of Cor. 2], we have that $\log\big{(}\sum_{d=0}^{\lfloor\alpha^{*}k\rfloor}{p-k\choose d}{|s_{\rm{dif}}|\choose d}\big{)}$ simplifies to $\big{(}\alpha^{*}k\log\frac{p}{k}\big{)}(1+o(1))$ . Combining this fact with the assumption that $k=o(p)$ , and observing that $|s_{\rm{dif}}|=\lfloor\alpha k\rfloor$ for some $\alpha\in[\alpha^{*},1]$ , the numerator in (29) of Theorem 4 simplifies to $(\alpha-\alpha^{*})k\log(\frac{p}{k})(1+o(1))$ , thus yielding (14).

V-B Proof of Proposition 14

Overview. We first outline the intuition behind the proof. In [14], the method for controlling $P_{0}(\gamma)$ was upper bounding $I(\beta_{s};\mathbf{Y}|\mathbf{X}_{s})$ via the expansion $I(\beta_{s};\mathbf{Y}|\mathbf{X}_{s})=h(\mathbf{Y}|\mathbf{X}_{s})-h(\mathbf{Y}|\mathbf{X}_{s},\beta_{s})$ . Our analysis is instead based on the expansion $I(\beta_{s};\mathbf{Y}|\mathbf{X}_{s})=h(\beta_{s})-h(\beta_{s}|\mathbf{X}_{s},\mathbf{Y})$ (note that $\beta_{s}$ is independent of $\mathbf{X}_{s}$ ). However, a difficulty with this expansion is in showing that $h(\beta_{s}|\mathbf{X}_{s},\mathbf{Y})$ is not too negative, and we overcome this difficulty as follows:

• Carefully choose a typical set in which the triplet $(\beta_{s},\mathbf{X}_{s},\mathbf{Y})$ lies with high probability;

• Show that a quantity similar to $h(\beta_{s}|\mathbf{X}_{s},\mathbf{Y})$ , but with conditioning on lying in the typical set, cannot be too negative by showing that given $(\mathbf{X}_{s},\mathbf{Y})$ , the most probable $\beta_{s}^{*}$ also has a surrounding region of $\beta_{s}$ vectors with a similar conditional density value. This limits how high the conditional density of $\beta_{s}$ can be, and hence how negative the differential entropy can be.

We proceed in several steps.

Defining a typical set. Let

[TABLE]

with $C=\sqrt{2kn}$ , $C^{\prime}=\sqrt{k\sigma_{\beta}^{2}\log n}$ , and $C^{\prime\prime}=\sqrt{2n\sigma^{2}}$ , where $\|\mathbf{x}_{s}\|_{\mathrm{F}}$ is the Frobenius norm, and

[TABLE]

By the union bound, we have

[TABLE]

Recall that $\mathbf{X}_{s}$ , $\beta_{s}$ , and $\mathbf{Z}$ are i.i.d. Gaussian vectors with variances $1$ , $\sigma_{\beta}^{2}$ , and $\sigma^{2}$ respectively. Applying the weak law of large numbers to the first and third probabilities, and Markov’s inequality to the middle one, we deduce that $\mathbb{P}[\mathcal{A}]\to 1$ as $p\to\infty$ (with $k\to\infty$ and $n\to\infty$ simultaneously).
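A small Monte Carlo experiment illustrates why the three norm constraints hold with high probability. The sketch assumes that the typical set imposes $\|\mathbf{x}_{s}\|_{\mathrm{F}}\leq C$ , $\|b_{s}\|_{2}\leq C^{\prime}$ , and $\|\mathbf{z}\|_{2}\leq C^{\prime\prime}$ , as suggested by how these constants are used later in the proof, with the noise vector evaluated at the true $\beta_{s}$ . The parameter values, function names, and seed are illustrative.

```python
import math
import random


def sq_norm_complex_gaussian(count, variance, rng):
    """Squared Euclidean norm of `count` i.i.d. CN(0, variance) entries."""
    sd = math.sqrt(variance / 2.0)
    return sum(rng.gauss(0.0, sd) ** 2 + rng.gauss(0.0, sd) ** 2 for _ in range(count))


def sq_norm_real_gaussian(count, variance, rng):
    """Squared Euclidean norm of `count` i.i.d. N(0, variance) entries."""
    sd = math.sqrt(variance)
    return sum(rng.gauss(0.0, sd) ** 2 for _ in range(count))


def estimate_typical_probs(k, n, sigma_beta_sq, sigma_sq, trials, seed=0):
    """Estimate the probability of each of the three norm constraints."""
    rng = random.Random(seed)
    c_sq = 2.0 * k * n                          # C^2   = 2kn
    cp_sq = k * sigma_beta_sq * math.log(n)     # C'^2  = k * sigma_beta^2 * log n
    cpp_sq = 2.0 * n * sigma_sq                 # C''^2 = 2n * sigma^2
    hits = [0, 0, 0]
    for _ in range(trials):
        hits[0] += sq_norm_complex_gaussian(k * n, 1.0, rng) <= c_sq
        hits[1] += sq_norm_complex_gaussian(k, sigma_beta_sq, rng) <= cp_sq
        hits[2] += sq_norm_real_gaussian(n, sigma_sq, rng) <= cpp_sq
    return [h / trials for h in hits]


probs = estimate_typical_probs(k=20, n=50, sigma_beta_sq=1.0 / 20, sigma_sq=1.0, trials=200)
```

Each threshold exceeds the corresponding expected squared norm by a factor of at least two (or $\log n$ for $\beta_{s}$ ), so all three estimated probabilities are close to one even at these moderate sizes.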

Useful properties within the typical set. Fix $(b_{s},\mathbf{x}_{s},\mathbf{y})\in\mathcal{A}$ , as well as some $\tilde{b}_{s}\in\mathbb{C}^{k}$ satisfying

[TABLE]

for some $\varepsilon>0$ to be chosen later. From $\|b_{s}-\tilde{b}_{s}\|_{2}\leq\varepsilon$ and $\|b_{s}\|_{2}\leq C^{\prime}$ , we have

[TABLE]

and hence

[TABLE]

On the other hand, we also have

[TABLE]

by the assumptions $\|b_{s}\|_{2}\leq C^{\prime}$ and $\|b_{s}-\tilde{b}_{s}\|_{2}\leq\varepsilon$ . It follows that

[TABLE]

Now, we have for each $i\in\{1,2,\dotsc,n\}$ that

[TABLE]

where (86) applies the triangle inequality, (88) is by Cauchy-Schwarz, and (89) applies (76) and (84).

It follows from (89) that

[TABLE]

and by interchanging the roles of $b_{s}$ and $\tilde{b}_{s}$ (and noting that (84) holds), we obtain

[TABLE]

Summing over $i$ , we obtain

[TABLE]

by the condition $\|\mathbf{x}_{s}\|_{F}\leq C$ in $\mathcal{A}$ .

Similarly, from (91), we have

[TABLE]

and summing over $i$ , we obtain

[TABLE]

We now use (94) to bound a related term containing the observations:

[TABLE]

where (100) uses $a^{2}-b^{2}=(a-b)(a+b)$ , (103) uses the definition of $\mathbf{z}_{b}$ (whose $i$ -th entry is denoted by $z_{b}^{(i)}$ ), (105) uses $|z_{b}^{(i)}|\leq\|\mathbf{z}_{b}\|_{2}$ , (106) follows since $\|\mathbf{z}_{b}\|_{2}\leq C^{\prime\prime}$ within $\mathcal{A}$ , and (107) follows from (94) and (99).

Bounding a density ratio. Let $\delta\{\cdot\}$ be the Dirac delta function, and observe that

[TABLE]

Recalling the distributions $Z\sim\mathcal{N}(0,\sigma^{2})$ and $\beta_{s}\sim\mathcal{CN}(0,\mathbf{I}_{k}\sigma_{\beta}^{2})$ , it follows from (112) that

[TABLE]

where (115) uses (81) and (107). Now, since $C=\sqrt{2kn}$ , $C^{\prime}=\sqrt{k\sigma_{\beta}^{2}\log n}$ , $C^{\prime\prime}=\sqrt{2\sigma^{2}n}$ , and $\sigma_{\beta}^{2}=\Theta(\frac{1}{k})$ , we see from (115) that if we choose

[TABLE]

then we obtain

[TABLE]

whenever $(b_{s},\mathbf{x}_{s},\mathbf{y})\in\mathcal{A}$ and $\|\tilde{b}_{s}-b_{s}\|_{2}\leq\varepsilon$ .

Bounding an average log-density. Let $(b_{s}^{*},\mathbf{x}_{s}^{*},\mathbf{y}^{*})$ be an arbitrary point in $\mathcal{A}$ , and define

[TABLE]

From (117), we have $f_{\beta_{s}|\mathbf{X}_{s}\mathbf{Y}}(\tilde{b}_{s}|\mathbf{x}_{s}^{*},\mathbf{y}^{*})\geq f_{\beta_{s}|\mathbf{X}_{s}\mathbf{Y}}(b_{s}^{*}|\mathbf{x}_{s}^{*},\mathbf{y}^{*})(1+o(1))$ whenever $\|\tilde{b}_{s}-b_{s}^{*}\|_{2}\leq\varepsilon$ . Hence, we have

[TABLE]

where $o(1)$ is vanishing as $k\to\infty$ . On the other hand, we trivially have $\tilde{f}_{\rm{\min}}(\mathcal{A})\leq f_{\beta_{s}|\mathbf{X}_{s}\mathbf{Y}}(b_{s}^{*}|\mathbf{x}_{s}^{*},\mathbf{y}^{*})$ , and hence

[TABLE]

Now, defining the ball $B_{\varepsilon}(b_{s}^{*}):=\{\tilde{b}_{s}\in\mathbb{C}^{k}:\|\tilde{b}_{s}-b_{s}^{*}\|_{2}\leq\varepsilon\}$ , we have

[TABLE]

where $\frac{\pi^{k}}{k!}\varepsilon^{2k}$ is the volume of the ball $B_{\varepsilon}(b_{s}^{*})$ [27]. Therefore, we have

[TABLE]

by (116). Combining (120) and (125) gives

[TABLE]

Since $(b_{s}^{*},\mathbf{x}_{s}^{*},\mathbf{y}^{*})$ can be arbitrarily chosen within $\mathcal{A}$ , we rename it to $(b_{s},\mathbf{x}_{s},\mathbf{y})\in\mathcal{A}$ , and take the logarithm to deduce that

[TABLE]

by the assumption $n=\Omega(k)$ .
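The ball volume $\frac{\pi^{k}}{k!}\varepsilon^{2k}$ used in this step is the volume of a Euclidean ball of radius $\varepsilon$ in $\mathbb{R}^{2k}$ , since $\mathbb{C}^{k}$ is identified with $\mathbb{R}^{2k}$ . The short Python sketch below compares the two expressions; the function names are illustrative.

```python
import math

def complex_ball_volume(k, eps):
    """Volume of {b in C^k : ||b||_2 <= eps}, written as pi^k / k! * eps^(2k)."""
    return math.pi ** k / math.factorial(k) * eps ** (2 * k)

def real_ball_volume(m, eps):
    """Volume of the Euclidean ball of radius eps in R^m (standard Gamma formula)."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1) * eps ** m

if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(k, complex_ball_volume(k, 0.3), real_ball_volume(2 * k, 0.3))
```

The two agree because $\Gamma(k+1)=k!$ for integer $k$ .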

Bounding a mutual information-like term. The mutual information is the average of a log-density ratio, and that ratio may be positive or negative in general. We will find it more convenient to apply the function $[\cdot]^{+}$ to the log-density ratio, and proceed as follows:

[TABLE]

where (131) follows from Bayes’ rule, (133) follows from the fact that $f_{\beta_{s}}(b_{s})=\frac{1}{(\pi\sigma_{\beta}^{2})^{k}}\exp\big{(}-\frac{\|b_{s}\|_{2}^{2}}{\sigma_{\beta}^{2}}\big{)}$ for all $b_{s}\in\mathbb{C}^{k}$ , (134) applies $[a+b]^{+}\leq a+[b]^{+}$ for $a\geq 0$ , (135) uses $\mathbb{E}[\|\beta_{s}\|_{2}^{2}]\geq\mathbb{P}[(\beta_{s},\mathbf{X}_{s},\mathbf{Y})\in\mathcal{A}]\cdot\mathbb{E}[\|\beta_{s}\|_{2}^{2}\,|\,(\beta_{s},\mathbf{X}_{s},\mathbf{Y})\in\mathcal{A}]$ , (136) follows from (129) and the assumption $\sigma_{\beta}^{2}=\Theta\big{(}\frac{1}{k}\big{)}$ , and (137) uses $\mathbb{P}[\mathcal{A}]\to 1$ and the assumption $n=\Omega(k)$ .

Wrapping up. It follows from (137) and Markov’s inequality that for any $\gamma>0$ ,

[TABLE]

Hence, we have

[TABLE]

This concludes the proof of Proposition 14.

VI Conclusion

We have characterized the information-theoretic limits of approximate support recovery in the complex phase retrieval model with Gaussian measurements, under both discrete and Gaussian distributions on the unknown non-zero entries. Along the way, we established novel concentration bounds for conditional information random variables, which may be of independent interest. Our achievability and converse bounds have matching scaling laws, as well as near-matching constant factors as the SNR increases. There are numerous potential directions for further work, including (i) handling the exact recovery criterion, (ii) improving our results in the low-SNR regime via tighter mutual information bounds, (iii) extending our achievability results to general scalings $k=o(p)$ , (iv) handling the linear sparsity regime $k=\Theta(p)$ without any additional assumptions, (v) performing analogous studies for non-Gaussian measurement matrices, such as Fourier measurements, and (vi) seeking computationally efficient algorithms whose support recovery performance comes close to the fundamental limits.

Appendix A Signal-to-Noise Ratio (SNR) Calculations

Gaussian $\beta_{s}$ . For the real Gaussian linear model in [14, Cor. 2], we have i.i.d. $\mathcal{N}(0,1)$ measurements, $\mathcal{N}\big{(}0,\frac{c_{\beta}}{k}\big{)}$ entries of $\beta_{s}$ , and $\mathcal{N}(0,\sigma^{2})$ noise, leading to an SNR of $\frac{c_{\beta}}{\sigma^{2}}$ .

The complex Gaussian phase retrieval setting in Section II with $\mathcal{CN}\big{(}0,\frac{c_{\beta}}{k}\big{)}$ entries of $\beta_{s}$ is slightly more complicated. Noting that a standard $\chi_{2}^{2}$ random variable has mean $2$ , variance $4$ , and second moment $8$ , we find that the expected SNR for sending a support vector $s\in\mathcal{S}$ is

[TABLE]

where (146) follows from the fact that given $X_{s}$ , $\langle X_{s},\beta_{s}\rangle\sim\mathcal{CN}(0,\sigma_{\beta}^{2}\|X_{s}\|_{2}^{2})$ so $\frac{2}{\sigma_{\beta}^{2}\|X_{s}\|_{2}^{2}}|\langle X_{s},\beta_{s}\rangle|^{2}$ has a $\chi_{2}^{2}$ distribution, (147) follows from the fact that $2\|X_{s}\|_{2}^{2}$ has a $\chi_{2k}^{2}$ distribution so $\mathbb{E}[(2\|X_{s}\|_{2}^{2})^{2}]=\big{(}\mathbb{E}[2\|X_{s}\|_{2}^{2}]\big{)}^{2}+\operatorname{\mathsf{Var}}[2\|X_{s}\|_{2}^{2}]=(2k)^{2}+4k=4k(k+1)$ , and (148) uses $\sigma_{\beta}^{2}=\frac{c_{\beta}}{k}$ . Since we only consider scaling regimes where $k\to\infty$ , the term $\frac{1}{k}$ is negligible.
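The moment computations in this argument can be checked by simulation. Conditioned on $X_{s}$ , the second moment of $|\langle X_{s},\beta_{s}\rangle|^{2}$ is $2\sigma_{\beta}^{4}\|X_{s}\|_{2}^{4}$ (from the second moment $8$ of a $\chi_{2}^{2}$ variable), and $\mathbb{E}[\|X_{s}\|_{2}^{4}]=k(k+1)$ , which gives $\mathbb{E}[|\langle X_{s},\beta_{s}\rangle|^{4}]=2c_{\beta}^{2}\big{(}1+\frac{1}{k}\big{)}$ when $\sigma_{\beta}^{2}=\frac{c_{\beta}}{k}$ . The Python sketch below compares this closed form with a Monte Carlo estimate. It only checks this fourth moment; how the moment enters the SNR expression is given by the displayed equations and is not reproduced here. Function names and parameter values are illustrative.

```python
import numpy as np

def fourth_moment_mc(k, c_beta, samples=200_000, seed=0):
    """Monte Carlo estimate of E|<X_s, beta_s>|^4 with CN(0,1) and CN(0, c_beta/k) entries."""
    rng = np.random.default_rng(seed)
    sigma_beta2 = c_beta / k
    # CN(0,1) measurement entries.
    X = (rng.standard_normal((samples, k)) + 1j * rng.standard_normal((samples, k))) / np.sqrt(2)
    # CN(0, sigma_beta^2) coefficient entries.
    B = np.sqrt(sigma_beta2 / 2) * (
        rng.standard_normal((samples, k)) + 1j * rng.standard_normal((samples, k))
    )
    inner = np.sum(np.conj(X) * B, axis=1)     # one inner product per sample
    return float(np.mean(np.abs(inner) ** 4))

def fourth_moment_closed_form(k, c_beta):
    """2 * sigma_beta^4 * k * (k + 1) with sigma_beta^2 = c_beta / k."""
    return 2 * c_beta ** 2 * (1 + 1 / k)

if __name__ == "__main__":
    for k in (2, 4, 16):
        print(k, fourth_moment_mc(k, 1.0), fourth_moment_closed_form(k, 1.0))
```

As $k$ grows, the closed form approaches $2c_{\beta}^{2}$ , consistent with the $\frac{1}{k}$ term being negligible.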

Discrete $\beta_{s}$ . For the real discrete linear model in [14, Sec. IV-A], we have i.i.d. $\mathcal{N}(0,1)$ measurements, a $k$ -sparse random vector $\beta_{s}$ which is a uniformly random permutation of $b_{s}=(b_{1},b_{2},\cdots,b_{k})$ , and $\mathcal{N}(0,\sigma^{2})$ noise, leading to an SNR of $\frac{\|b_{s}\|_{2}^{2}}{\sigma^{2}}$ . In particular, when $|b_{1}|=|b_{2}|=\cdots=|b_{k}|=\sqrt{\frac{c_{\beta}}{k}}$ , the SNR is equal to $\frac{c_{\beta}}{\sigma^{2}}$ .

For the complex discrete phase retrieval setting in Section II with $\beta_{s}$ being a uniformly random permutation of $b_{s}=(b_{1},b_{2},\cdots,b_{k})$ and with $\mathcal{CN}(0,\sigma^{2})$ noise, we can use similar arguments as in the Gaussian case to show that

[TABLE]

In particular, for the case $|b_{1}|=|b_{2}|=\cdots=|b_{k}|=\sqrt{\frac{c_{\beta}}{k}}$ , we have

[TABLE]

In addition, since the “sorted” vector $b^{\prime}_{s}$ satisfies $\sum_{i=1}^{\lfloor\alpha k\rfloor}|b_{i}^{\prime}|^{2}=\frac{\lfloor\alpha k\rfloor}{k}c_{\beta}\to\alpha c_{\beta}$ (as $k\to\infty$ ) and similarly $\sum_{i=\lfloor\alpha k\rfloor+1}^{k}|b_{i}^{\prime}|^{2}\to(1-\alpha)c_{\beta}$ , the mutual information terms (6) and (II-B) simplify to

[TABLE]

These simplifications readily permit the numerical evaluation of (8)–(9) in Theorem 1 as $k\to\infty$ .

Matching the linear and phase retrieval models. In light of the above calculations, in Figure 1 and Figure 2, we match the SNR of the two models (real linear and complex phase retrieval) by taking $c_{\beta}$ from the phase retrieval model, squaring it, and multiplying by $2$ to obtain the value for the linear model.

Appendix B Proof of Theorem 6 (Mutual Information Bounds)

First, for a fixed partition $(s_{\rm eq},s_{\rm dif})$ of the support set $s$ , we rewrite the acquisition model in (1) as

[TABLE]

Conditioned on $\beta_{s}=b_{s}$ , this gives

[TABLE]

where $v_{\rm{eq}}=\sum_{i\in s_{\rm{eq}}}|b_{i}|^{2}$ , $v_{\rm{dif}}=\sum_{i\in s_{\rm{dif}}}|b_{i}|^{2}$ , and $W_{\rm{eq}},W_{\rm{dif}}$ are independent $\mathcal{CN}(0,1)$ random variables (recall that $X_{s}$ has i.i.d. $\mathcal{CN}(0,1)$ entries).

Next, given $\beta_{s}=b_{s}$ and $W_{\rm{eq}}=w_{\rm{eq}}$ , we write $Y=U_{w_{\rm{eq}}}+Z$ , where

[TABLE]

follows a non-central $\chi^{2}$ distribution with two degrees of freedom, which is log-concave [24]. Observe that

[TABLE]

where $\tilde{X}_{s_{\rm{eq}}}\sim f_{X}^{|s_{\rm{eq}}|}$ . The entropy of $U_{w_{\rm{eq}}}+Z$ can be lower bounded using the entropy power inequality as $\exp(2h(U_{w_{\rm{eq}}}+Z))\geq\exp(2h(U_{w_{\rm{eq}}}))+\exp(2h(Z))$ [28], or equivalently

[TABLE]

To find an upper bound on the entropy of $U_{w_{\rm{eq}}}+Z$ , we use the reverse entropy power inequality [29, Theorem 7] for two uncorrelated log-concave random variables $U_{w_{\rm{eq}}}$ and $Z$ to obtain $\exp(2h(U_{w_{\rm{eq}}}+Z))\leq\frac{\pi e}{2}\big{(}\exp(2h(U_{w_{\rm{eq}}}))+\exp(2h(Z))\big{)}$ , or equivalently,

[TABLE]
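To illustrate the two-sided bound on $h(U_{w_{\rm{eq}}}+Z)$ , the following Python sketch evaluates both inequalities numerically in one special case chosen here for convenience: $w_{\rm{eq}}=0$ and $v_{\rm{dif}}=1$ , so that $U$ is the squared magnitude of a $\mathcal{CN}(0,1)$ variable, i.e., a unit-mean exponential with $h(U)=1$ nat. The grid limits, step size, and function name are assumptions of this sketch, and the integration range is only adequate for noise variances close to $1$ .

```python
import numpy as np

def entropy_bounds_check(sigma2=1.0, dx=0.01):
    """Return (EPI lower bound, numerical h(U+Z), reverse-EPI upper bound) in nats."""
    u = np.arange(0.0, 40.0, dx)                  # support of U (truncated)
    z = np.arange(-10.0, 10.0 + dx / 2, dx)       # support of Z (truncated)
    f_u = np.exp(-u)                              # density of Exp(1) = |CN(0,1)|^2
    f_z = np.exp(-z ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    # Density of the sum of independent variables via discrete convolution.
    f_sum = np.convolve(f_u, f_z) * dx
    f_sum /= f_sum.sum() * dx                     # renormalise away discretisation error
    mask = f_sum > 0
    h_sum = -np.sum(f_sum[mask] * np.log(f_sum[mask])) * dx
    h_u = 1.0                                     # entropy of Exp(1)
    h_z = 0.5 * np.log(2 * np.pi * np.e * sigma2) # entropy of N(0, sigma2)
    power = np.exp(2 * h_u) + np.exp(2 * h_z)
    lower = 0.5 * np.log(power)                       # entropy power inequality
    upper = 0.5 * np.log(np.pi * np.e / 2 * power)    # reverse EPI for log-concave variables
    return float(lower), float(h_sum), float(upper)

if __name__ == "__main__":
    print(entropy_bounds_check())
```

The gap between the two bounds is the constant $\frac{1}{2}\log\frac{\pi e}{2}$ , independent of the noise level.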

We now consider upper and lower bounding the entropy of $U_{w_{\rm{eq}}}$ . For the upper bound, we simply use that the Gaussian distribution maximizes entropy for a given variance:

[TABLE]

Moreover, the result of [29, Theorem 3] states that this upper bound is nearly tight for log-concave random variables:

[TABLE]

Indeed, $U_{w_{\rm{eq}}}=v_{\rm{dif}}\big{|}\mathcal{CN}\big{(}\sqrt{\frac{v_{\rm{eq}}}{v_{\rm{dif}}}}w_{\rm{eq}},1\big{)}\big{|}^{2}$ (cf., (155)) has a non-central $\chi^{2}$ distribution with two degrees of freedom, which is log-concave [24]. In addition, the variance is given by [30, p. 45]

[TABLE]
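The variance of $U_{w_{\rm{eq}}}$ can also be checked by simulation. The closed form used in the sketch below, $v_{\rm{dif}}^{2}+2v_{\rm{eq}}v_{\rm{dif}}|w_{\rm{eq}}|^{2}$ , is derived directly from $\operatorname{\mathsf{Var}}\big{[}|m+W|^{2}\big{]}=1+2|m|^{2}$ for $W\sim\mathcal{CN}(0,1)$ and a fixed complex $m$ ; it is an assumption about what the displayed expression evaluates to, not a transcription of it. Function names and test values are illustrative.

```python
import numpy as np

def u_variance_mc(v_eq, v_dif, w_eq, samples=400_000, seed=0):
    """Monte Carlo variance of U = |sqrt(v_eq) * w_eq + sqrt(v_dif) * W_dif|^2."""
    rng = np.random.default_rng(seed)
    # W_dif ~ CN(0,1): real and imaginary parts each have variance 1/2.
    w_dif = (rng.standard_normal(samples) + 1j * rng.standard_normal(samples)) / np.sqrt(2)
    u = np.abs(np.sqrt(v_eq) * w_eq + np.sqrt(v_dif) * w_dif) ** 2
    return float(np.var(u))

def u_variance_closed_form(v_eq, v_dif, w_eq):
    """Variance of a scaled non-central chi-squared variable with two degrees of freedom."""
    return v_dif ** 2 + 2 * v_eq * v_dif * abs(w_eq) ** 2

if __name__ == "__main__":
    args = (0.7, 0.3, 0.8 - 0.5j)
    print(u_variance_mc(*args), u_variance_closed_form(*args))
```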

Hence, from (162) and (164), we obtain

[TABLE]

and from (163), we obtain

[TABLE]

It follows from (161) and (165) that

[TABLE]

where the two equalities are simple algebraic manipulations. Similarly, it follows from (160) and (166) that

[TABLE]

Returning to (159), we have

[TABLE]

where (174) follows from (169) and the concavity of the function $\log(1+x)$ for $x>-1$ , and (175) follows from the fact that $W_{\rm{eq}}\sim\mathcal{CN}(0,1)$ .

Finally, from (159) and (172), we have

[TABLE]

and (6) follows from (175)–(176).

Appendix C Proof of Proposition 7 and Corollary 9 (General Concentration of Conditional Information)

Before proceeding, we briefly explain the notation used throughout this appendix. The first two lemmas below concern generic vectors $\mathbf{x}\in\mathbb{R}^{n}$ , and the remainder of the appendix concerns joint density functions on $(X,Y)$ with $X\in\mathbb{R}^{2k}$ and $Y\in\mathbb{R}$ , and more generally on $(\mathbf{X},\mathbf{Y})$ with $\mathbf{X}\in\mathbb{R}^{2kn}$ and $\mathbf{Y}\in\mathbb{R}^{n}$ . Initially, this should be viewed as generic notation; in Appendix D, we will specialize to the phase retrieval setting by interpreting complex vectors in $\mathbb{C}^{k}$ as equivalently being in $\mathbb{R}^{2k}$ .

C-A Technical Analysis

The following lemma gives a sufficient condition for interchanging certain derivatives and integrals, and perhaps more importantly, establishes bounds on certain first and second derivatives that will eventually be used to bound the key quantity $K_{1}$ in Proposition 7. Here and subsequently, $L^{1}(\mathbb{R}^{n})$ denotes the set of absolutely integrable functions on $\mathbb{R}^{n}$ .

Lemma 16.

Fix $n\in\mathbb{Z}^{+}$ , and let $g:\mathbb{R}^{n}\times\mathbb{C}\to\mathbb{C}$ . Assume that $g(\mathbf{x},u)$ is a real entire function (see Footnote 6) in $u$ for each fixed $\mathbf{x}\in\mathbb{R}^{n}$ such that $g(\mathbf{x},u)\geq 0$ for all $(\mathbf{x},u)\in\mathbb{R}^{n}\times\mathbb{R}^{+}$ . In addition, assume that either $(-1)^{l}\frac{\partial^{l}g}{\partial u^{l}}(\mathbf{x},t)\geq 0$ for all pairs $(l,t)\in\mathbb{N}\times\mathbb{R}^{+}$ or $\frac{\partial^{l}g}{\partial u^{l}}(\mathbf{x},t)\geq 0$ for all pairs $(l,t)\in\mathbb{N}\times\mathbb{R}^{+}$ .

Footnote 6: A real entire function is a function on $\mathbb{C}$ which is analytic (complex differentiable or holomorphic) on the entire complex plane and assumes real values on the real axis. For our purposes, it suffices to understand that the exponential function $g(t)=e^{ct}$ falls in this class, and that any function in this class restricted to the real line is always equal to its infinite Taylor expansion [31, Sec. 2.3].

For $t\in\mathbb{R}^{+}$ , define

[TABLE]

(i) If $g(\mathbf{x},u)\in L^{1}(\mathbb{R}^{n})$ for all $u\in\mathbb{R}^{+}$ , we have that $T(t)$ is twice differentiable and that

[TABLE]

for $l\in\{1,2\}$ .

(ii) Let $\mathcal{G}$ be a subset of $(0,\infty)$ . Under the condition

[TABLE]

for some constant $T^{*}$ , we have

[TABLE]

for any $t\in\mathcal{G}^{o}$ , where $\mathcal{G}^{o}$ is the interior of the set $\mathcal{G}$ .

(iii) Let $\mathcal{G}$ be a subset of $(0,\infty)$ . Under the condition

[TABLE]

for some constant $T^{\dagger}$ , we have

[TABLE]

for any $t\in\mathcal{G}^{o}$ .

Proof:

See Appendix E-A. ∎

Fradelizi, Madiman, and Wang [22] state that we can exchange analogous integrals and derivatives if the function under the integral is in $L^{1}$ , but we are not aware of a proof. They also noted that $f_{\mathbf{X}}^{t}(\cdot)$ satisfies this property for any $t>0$ when $f_{\mathbf{X}}(\cdot)$ is log-concave. However, we cannot use such results directly, because we will be considering joint distributions that fail to be jointly log-concave.

The following lemma formally states that any positive power of a bounded, integrable log-concave function is in $L^{1}$ , and provides an explicit upper bound on the corresponding integral (to be used in Corollary 22 below).

Lemma 17.

Fix $n\in\mathbb{Z}^{+}$ , and let $f:\mathbb{R}^{n}\to\mathbb{R}^{+}$ be a log-concave function such that $\|f\|_{1}<\infty$ and $\|f\|_{\infty}<\infty$ , where $\|f\|_{1}$ denotes the integral of the absolute value and $\|f\|_{\infty}$ denotes the maximum absolute value. Then, for all $t>0$ , the following holds:

[TABLE]

where $D$ is finite and is defined as

[TABLE]

Proof:

Observe that

[TABLE]

where (186) follows from a change of variable $\mathbf{z}=t\mathbf{x}$ . Noting that $t\log f(\mathbf{z}/t)$ is jointly concave as a function of $(\mathbf{z},t)\in\mathbb{R}^{n}\times(0,\infty)$ [22, Lemma 2.8], we find that $\int_{\mathbb{R}^{n}}\exp(t\log f(\mathbf{z}/t))\mu(d\mathbf{z})$ is log-concave in $t$ by Prékopa’s theorem [32], which states that the marginal function of a jointly log-concave function is log-concave. Since the product of two log-concave functions is log-concave, we deduce from (186) that the function $(\|f\|_{\infty}+1)^{-t}\int_{\mathbb{R}^{n}}t^{n}f^{t}(\mathbf{x})\mu(d\mathbf{x})$ is also log-concave in $t$ .

To establish that the supremum over $t>0$ in (184) is bounded, we will combine the log-concavity property with the limiting behavior as $t\to\infty$ . We write

[TABLE]

and consider taking the limit $t\to\infty$ on both sides. For this purpose, we need to establish some technical conditions for applying the monotone convergence theorem [33, Ch. 18]:

1. For fixed $\mathbf{x}\in\mathbb{R}^{n}$ , the function $t^{n}\big{(}\frac{f(\mathbf{x})}{\|f\|_{\infty}+1}\big{)}^{t}$ is non-increasing for $t$ sufficiently large, since $0\leq\frac{f(\mathbf{x})}{\|f\|_{\infty}+1}<1$ by the definition of $\|f\|_{\infty}$ .

2. For each fixed $\mathbf{x}\in\mathbb{R}^{n}$ , we have

[TABLE]

again using $0\leq\frac{f(\mathbf{x})}{\|f\|_{\infty}+1}<1$ .

3. The function $t^{n}\big{(}\frac{f(\mathbf{x})}{\|f\|_{\infty}+1}\big{)}^{t}$ is integrable with respect to $\mathbf{x}$ for any fixed $t\geq 1$ ; this is because $\big{(}\frac{f(\mathbf{x})}{\|f\|_{\infty}+1}\big{)}^{t}\leq\frac{f(\mathbf{x})}{\|f\|_{\infty}+1}$ for $t\geq 1$ , and $\|f\|_{1}<\infty$ .

Taking limits in (187) and applying (188) and the monotone convergence theorem [33, Ch. 18], we obtain

[TABLE]

Summarizing the above findings, we have shown that the function $\kappa(t):=\log\big{(}(\|f\|_{\infty}+1)^{-t}\int_{\mathbb{R}^{n}}t^{n}f^{t}(\mathbf{x})\mu(d\mathbf{x})\big{)}$ is concave in $t$ (and is therefore continuous wherever it takes finite values), is bounded from above for any fixed $t\geq 1$ , and tends to $-\infty$ as $t\to\infty$ . These properties immediately imply that $\sup_{t\geq 1}\kappa(t)<\infty$ , so to establish $D<\infty$ in (184), it only remains to show that $\sup_{t\in(0,1)}\kappa(t)<\infty$ .

If $\|f\|_{1}=0$ , then $f(\mathbf{x})$ is zero almost everywhere, and the claim $D<\infty$ is trivial, so we proceed assuming that $\|f\|_{1}>0$ . In this case, $\int_{\mathbb{R}^{n}}f^{t}(\mathbf{x})\mu(d\mathbf{x})>0$ for any fixed $t>0$ , which implies that $\inf_{t\in[1,2]}\kappa(t)>-\infty$ (again, a concave function is continuous wherever it takes finite values). By concavity, we have for $t\in(0,1)$ that $\kappa(1)\geq\frac{1}{2}\big{(}\kappa(t)+\kappa(2-t)\big{)}$ , or equivalently

[TABLE]

Hence, having already shown that $\sup_{u\geq 1}\kappa(u)<\infty$ and $\inf_{u\in[1,2]}\kappa(u)>-\infty$ , we deduce that $\sup_{t\in(0,1)}\kappa(t)<\infty$ and hence $D<\infty$ .

∎
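The properties of $\kappa(t)$ established in this proof can be seen explicitly when $f$ is the standard Gaussian density on $\mathbb{R}^{n}$ , an example chosen here only for illustration. In that case $\|f\|_{\infty}=(2\pi)^{-n/2}$ and $\int f^{t}=(2\pi)^{n(1-t)/2}t^{-n/2}$ , so $\kappa(t)=-t\log(\|f\|_{\infty}+1)+\frac{n}{2}\log t+\frac{n(1-t)}{2}\log(2\pi)$ . The Python sketch below evaluates this expression, cross-checks it against direct numerical integration for $n=1$ , and can be used to confirm concavity and the divergence to $-\infty$ . Function names and grid parameters are assumptions of the sketch.

```python
import numpy as np

def kappa(t, n):
    """kappa(t) for the standard Gaussian density on R^n (closed form)."""
    f_max = (2 * np.pi) ** (-n / 2)              # sup of the Gaussian density
    return (-t * np.log(f_max + 1.0)
            + 0.5 * n * np.log(t)
            + 0.5 * n * (1.0 - t) * np.log(2 * np.pi))

def kappa_numeric_1d(t, half_width=30.0, dx=1e-3):
    """Same quantity for n = 1, with the integral of f^t computed by a Riemann sum."""
    x = np.arange(-half_width, half_width, dx)
    f = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
    integral = np.sum(f ** t) * dx
    f_max = 1 / np.sqrt(2 * np.pi)
    return -t * np.log(f_max + 1.0) + np.log(t) + np.log(integral)

if __name__ == "__main__":
    ts = np.linspace(0.1, 20.0, 400)
    vals = kappa(ts, 3)
    print("max second difference:", np.diff(vals, 2).max())   # negative => concave on the grid
    print("sup over grid:", vals.max(), "kappa(1e4):", kappa(1e4, 3))
```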

We note that the preceding lemmas concern general vectors $\mathbf{x}$ that need not be related to the matrix $\mathbf{X}$ in the phase retrieval setting. Henceforth, we give results concerning pairs $(\mathbf{x},\mathbf{y})$ , which will later be directly equated with the relevant quantities in the phase retrieval problem.

In the following corollary, we specialize the first part of Lemma 16 to functions of $(\mathbf{x},\mathbf{y})\in\mathbb{R}^{2kn}\times\mathbb{R}^{n}$ under the condition of a certain integral being finite. This condition is explored further below.

Corollary 18.

Fix $n,k\in\mathbb{Z}^{+}$ , and let $(\mathbf{X},\mathbf{Y})\in\mathbb{R}^{2kn}\times\mathbb{R}^{n}$ be random vectors with joint distribution $f_{\mathbf{X}\mathbf{Y}}$ . For each $t\in\mathbb{R}^{+}$ , define

[TABLE]

Then, under the condition that

[TABLE]

holds for all $t\in\mathbb{R}^{+}$ , we have that $L_{n}(t)$ is twice differentiable and

[TABLE]

Proof:

We use the first part of Lemma 16 with $(\mathbf{x},\mathbf{y})$ playing the role of $\mathbf{x}$ therein, and $f_{\mathbf{X}}f_{\mathbf{Y}|\mathbf{X}}^{t}$ playing the role of $g$ . Note that for each fixed $(\mathbf{x},\mathbf{y})\in\mathbb{R}^{2kn}\times\mathbb{R}^{n}$ , $f_{\mathbf{Y}|\mathbf{X}}^{t}(\mathbf{y}|\mathbf{x})f_{\mathbf{X}}(\mathbf{x})=f_{\mathbf{X}}(\mathbf{x})\exp(t\log f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}))$ is an entire function in $t\in\mathbb{C}$ [31, Sec. 2.3] and that $f_{\mathbf{Y}|\mathbf{X}}^{t}(\mathbf{y}|\mathbf{x})f_{\mathbf{X}}(\mathbf{x})\in\mathbb{R}^{+}$ for all $t\in\mathbb{R}^{+}$ . In addition, for each fixed $(\mathbf{x},\mathbf{y})\in\mathbb{R}^{2kn}\times\mathbb{R}^{n}$ , we have

[TABLE]

or equivalently,

[TABLE]

Hence, for each fixed $(\mathbf{x},\mathbf{y})$ we have that $\frac{\partial^{l}f_{\mathbf{Y}|\mathbf{X}}^{t}(\mathbf{y}|\mathbf{x})f_{\mathbf{X}}(\mathbf{x})}{\partial t^{l}}\geq 0$ for all pairs $(l,t)\in\mathbb{N}\times\mathbb{R}^{+}$ if $f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})>1$ , and that $(-1)^{l}\frac{\partial^{l}f_{\mathbf{Y}|\mathbf{X}}^{t}(\mathbf{y}|\mathbf{x})f_{\mathbf{X}}(\mathbf{x})}{\partial t^{l}}\geq 0$ for all pairs $(l,t)\in\mathbb{N}\times\mathbb{R}^{+}$ if $f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})\leq 1$ , so that the assumption of Lemma 16 is satisfied in both cases. ∎
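The interchange of differentiation and integration can be illustrated in a simple additive Gaussian example, chosen here for convenience and not taken from the paper: if $Y$ given $X=x$ is $\mathcal{N}(x,s^{2})$ , then the inner integral does not depend on $x$ and $L(t)=(2\pi s^{2})^{(1-t)/2}t^{-1/2}$ . The Python sketch below assumes that the conclusion of the corollary takes the form $L^{(l)}(t)=\int f_{X}\int f_{Y|X}^{t}(\log f_{Y|X})^{l}$ for $l\in\{1,2\}$ (consistent with the pointwise derivatives used in the proof), and compares this integral with finite differences of the closed form. Names and grid parameters are illustrative.

```python
import numpy as np

def L_closed_form(t, s2):
    """Integral of the N(0, s2) density raised to the power t."""
    return (2 * np.pi * s2) ** ((1 - t) / 2) / np.sqrt(t)

def L_derivative_by_integral(t, s2, order, half_width=40.0, dy=1e-3):
    """Integral of f^t * (log f)^order, i.e. the derivative taken under the integral sign."""
    y = np.arange(-half_width, half_width, dy)
    log_f = -0.5 * np.log(2 * np.pi * s2) - y ** 2 / (2 * s2)
    return float(np.sum(np.exp(t * log_f) * log_f ** order) * dy)

def finite_difference(t, s2, order, h):
    """Central finite differences of the closed form (orders 1 and 2 only)."""
    if order == 1:
        return (L_closed_form(t + h, s2) - L_closed_form(t - h, s2)) / (2 * h)
    return (L_closed_form(t + h, s2) - 2 * L_closed_form(t, s2)
            + L_closed_form(t - h, s2)) / h ** 2

if __name__ == "__main__":
    t, s2 = 0.7, 2.0
    print(L_derivative_by_integral(t, s2, 1), finite_difference(t, s2, 1, 1e-4))
    print(L_derivative_by_integral(t, s2, 2), finite_difference(t, s2, 2, 1e-3))
```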

The following lemma provides sufficient conditions under which (193) holds.

Lemma 19.

Fix $n,k\in\mathbb{Z}^{+}$ , and let $(\mathbf{X},\mathbf{Y})\sim f_{\mathbf{X}\mathbf{Y}}$ . Under the conditions

[TABLE]

we have that (193) of Corollary 18 holds for all $t\in\mathbb{R}^{+}$ , i.e., $L_{n}(t)=\int_{\mathbb{R}^{2kn}}f_{\mathbf{X}}(\mathbf{x})\int_{\mathbb{R}^{n}}f_{\mathbf{Y}|\mathbf{X}}^{t}(\mathbf{y}|\mathbf{x})\mu(d\mathbf{y})\mu(d\mathbf{x})<\infty$ . More specifically, we have

[TABLE]

for all $0<t\leq 1$ , and

[TABLE]

for all $t>1$ .

Proof:

See Appendix E-B. ∎

The following corollary shows that the sufficient conditions of Lemma 19 are satisfied when $(\mathbf{X},\mathbf{Y})$ are i.i.d. according to a joint distribution on $(X,Y)$ corresponding to an additive noise model with a log-concave marginal $f_{Y}$ . The latter condition can be interpreted as stating that $f_{Y|X}(\cdot|x)$ is log-concave “on average”. In addition, explicit upper bounds on (192) are given that will be useful later.

Corollary 20.

Fix $n,k\in\mathbb{Z}^{+}$ , and let $(\mathbf{X},\mathbf{Y})\sim f_{XY}^{n}$ be i.i.d. on $f_{XY}$ with $X\in\mathbb{R}^{2k}$ and $Y\in\mathbb{R}$ . Assume that $f_{Y}$ is log-concave, and that given $X=x$ , we have $Y=U_{x}+Z$ , where $U_{x}$ and $Z$ are independent random variables and $\|f_{Z}\|_{\infty}<\infty$ . Then conditions (198)–(199) of Lemma 19 hold, and in addition, we have

[TABLE]

Proof:

First, for all $(x,y)\in\mathbb{R}^{2k}\times\mathbb{R}$ , we have

[TABLE]

and hence

[TABLE]

or equivalently $\sup_{\mathbf{x}\in\mathbb{R}^{2kn},\mathbf{y}\in\mathbb{R}^{n}}f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})\leq\|f_{Z}\|_{\infty}^{n}<\infty$ . This means that condition (199) of Lemma 19 holds. Moreover, we have from (205) that

[TABLE]

Combining this with the log-concavity of $Y$ (and hence $\mathbf{Y}$ ) and applying Lemma 17, we deduce that condition (198) of Lemma 19 holds. ∎

The preceding results will be used in conjunction with the following lemma in order to bound the key quantity $K_{1}$ appearing in Proposition 7. This result is a counterpart to part of the analysis in [22, proof of Theorem 2.3], but it is proved using different methods (see Footnote 8).

Footnote 8: The analysis in [22, proof of Theorem 2.3] does not seem to be feasible for our purposes unless $(X,Y)$ are jointly log-concave, since otherwise we cannot confirm that $\bar{G}(t):=n\log t+\log\int_{\mathbb{R}^{2kn}}f_{\mathbf{X}}(\mathbf{x})\int_{\mathbb{R}^{n}}f_{\mathbf{Y}|\mathbf{X}}^{t}(\mathbf{y}|\mathbf{x})\mu(d\mathbf{y})\mu(d\mathbf{x})$ is concave.

Lemma 21.

Fix $k\in\mathbb{Z}^{+}$ , and let $(X,Y)\sim f_{XY}$ such that $X\in\mathbb{R}^{2k}$ , $Y\in\mathbb{R}$ , and the distribution of $Y$ is log-concave. Define $L_{1}(t):=\int_{\mathbb{R}^{2k}}f_{X}(x)\int_{\mathbb{R}}f_{Y|X}^{t}(y|x)\mu(dy)\mu(dx)$ , and suppose that

[TABLE]

for some positive constants $P_{1}$ , $P_{2}$ , and $\bar{Q}_{1}$ . Then, defining $C=150\max\{P_{1},P_{2}\}$ , we have

[TABLE]

for all $t\in\big{(}0,1\big{]}$ , and

[TABLE]

for all $t>1$ .

Proof:

This result follows from Lemma 16 (with $n=2k+1$ , since we consider $(X,Y)$ jointly) applied separately for the following two cases:

• For $0<t\leq 1$ , set $g(x,y,t):=f_{X}(x)f_{Y|X}^{t}(y|x)$ and use the third part of the lemma with $\mathcal{G}=(0,1]$ ;

• For $t>1$ , set $g(x,y,t):=\bar{Q}_{1}^{1-t}f_{X}(x)f_{Y|X}^{t}(y|x)$ and use the second part of the lemma with $\mathcal{G}=[1,\infty)$ .

Note that $f_{X}(x)f_{Y|X}^{t}(y|x)$ and $\bar{Q}_{1}^{1-t}f_{X}(x)f_{Y|X}^{t}(y|x)=f_{X}(x)\big{(}\frac{f_{Y|X}(y|x)}{\bar{Q}_{1}}\big{)}^{t}$ are both real entire functions in $t\in\mathbb{C}$ for each fixed $(x,y)\in\mathbb{R}^{2k}\times\mathbb{R}$ (see Footnote 6). In addition, both functions are non-negative valued, and the required conditions on the derivatives hold by the same argument as (196)–(197). ∎

C-B Proof of Proposition 7 (General Exponential Bound)

Recall the notation $f_{XY}$ , $\bar{Q}$ and $K_{1}$ as per the proposition statement, and define

[TABLE]

where $L(t):=\int_{\mathbb{R}^{2k}\times\mathbb{R}}f_{Y|X}^{t}(y|x)f_{X}(x)\mu(dx\times dy)$ as stated in (32). From Corollary 18 with $n=1$ , we have

[TABLE]

and in addition, the definition of $K_{1}$ in (34) immediately implies

[TABLE]

Now, from the Taylor-Lagrange formula (e.g., see [22, proof of Theorem 3.1]) for the function $tL(t)$ , for every $t\in\big{(}0,1]$ , we have

[TABLE]

where (220) follows from (217) along with direct differentiation, and (221) follows from (214) and (216). It follows from (221) that for all $t\in(0,1]$ , we have

[TABLE]

where (225) follows from the fact that $\log(1+x)\leq x$ for all $x>-1$ .

In addition, from the Taylor-Lagrange formula for the function $\bar{Q}^{1-t}L(t)$ , for every $t\in\big{(}1,\infty)$ , we have

[TABLE]

where (228) follows from (218) along with direct differentiation, and (229) follows from (214) and (216). It follows from (229) that for all $t\in(1,\infty)$ , we have

[TABLE]

where (233) follows from the fact that $\log(1+x)\leq x$ for all $x>-1$ , and (235) follows from the fact that $t-1-\log t\geq 0$ for all $t>0$ .

Combining the cases in (226) and (235), we have

[TABLE]

for all $t>0$ . On the other hand, since $F(t)=\log L(t)$ , we also have

[TABLE]

It follows from (236) and (239) that

[TABLE]

or equivalently

[TABLE]

for all $t>0$ . By setting $\mu=1-t$ , we obtain (36) from (241), recalling from the definition of $r(\cdot)$ in (39) that $r(-\mu)=-\mu-\log(1-\mu)$ for $\mu<1$ . The remaining case $\mu\geq 1$ is trivial, since the right-hand side of (36) evaluates to $+\infty$ by the definition $r(-\mu)=+\infty$ for $\mu\geq 1$ .

C-C Proof of Corollary 9 (General Concentration Corollary)

The proof is very similar to that of [22, Corollary 3.4], with the main idea being to use the Chernoff bound and optimize the exponent.

By the Chernoff bound, we have for any $\beta>0$ and $\mu>0$ that

[TABLE]

Combining these bounds with Proposition 7 (with $\beta=\mu$ in the first case and $\beta=-\mu$ in the second case), we obtain

[TABLE]

where $r(u)$ is defined in (39). Now, define

[TABLE]

It is easy to see that $r^{*}(t)=+\infty$ for $t\geq 1$ . For $0<t<1$ , by differentiating, the supremum is reached at $u=\frac{t}{1-t}>0$ and the maximum value is

[TABLE]

In fact, $r^{*}(t)=r(-t)$ holds for all $t>0$ , since $r(-t)$ has value $+\infty$ for $t\geq 1$ by definition.

From (244) and (248), for $\mu>0$ , we have

[TABLE]

Similarly, we can define

[TABLE]

By differentiating, the supremum is reached at $u=\frac{t}{1+t}\in(0,1)$ and the maximum value is

[TABLE]

for any $t>0$ (here there is no $+\infty$ case). From (245) and (254), for $\mu>0$ , we have

[TABLE]

The proof is completed by replacing $\mu$ by $n(K_{1}+1)\mu$ in (251) and (257), and noting that $h(\mathbf{Y}|\mathbf{X})=nh(Y|X)$ for $(\mathbf{X},\mathbf{Y})\sim f_{XY}^{n}$ .
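The two optimizations in this proof are Legendre-type conjugates of $r(u)=u-\log(1+u)$ , and the stated maximizers and maximum values can be confirmed numerically. The Python sketch below assumes that the two quantities being maximized are $tu-r(u)$ over $u>0$ and $tu-r(-u)$ over $u\in(0,1)$ (the displayed definitions are not reproduced here), and checks that the resulting values are $r(-t)$ for $0<t<1$ and $r(t)$ for $t>0$ , respectively. Grid ranges and function names are illustrative; since the first grid is truncated at $u=200$ , the sketch cannot exhibit the $+\infty$ case $t\geq 1$ .

```python
import numpy as np

def r(u):
    """r(u) = u - log(1 + u) for u > -1, and +inf otherwise (returns an array)."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    out = np.full(u.shape, np.inf)
    ok = u > -1
    out[ok] = u[ok] - np.log1p(u[ok])
    return out

def conjugate_upper(t):
    """Grid approximation of sup_{u > 0} (t*u - r(u)); meaningful for 0 < t < 1."""
    u = np.linspace(1e-6, 200.0, 400_001)
    return float(np.max(t * u - r(u)))

def conjugate_lower(t):
    """Grid approximation of sup_{0 < u < 1} (t*u - r(-u)); finite for all t > 0."""
    u = np.linspace(1e-6, 1.0 - 1e-6, 1_000_001)
    return float(np.max(t * u - r(-u)))

if __name__ == "__main__":
    for t in (0.2, 0.4, 0.8):
        print(t, conjugate_upper(t), r(-t)[0])    # maximizer at u = t / (1 - t)
    for t in (0.4, 2.5):
        print(t, conjugate_lower(t), r(t)[0])     # maximizer at u = t / (1 + t)
```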

Appendix D Proof of Theorem 11 (Concentration of Information Density for Phase Retrieval)

The following corollary shows that for the phase retrieval setting, $f_{Y|X_{s_{\rm{eq}}}\beta_{s}}(\cdot|x_{s_{\rm{eq}}},b_{s})$ and $f_{Y|X_{s}\beta_{s}}(\cdot|x_{s},b_{s})$ have the boundedness properties required to apply Lemma 21.

Corollary 22.

For the phase retrieval model in (1), we have for fixed $s$ and $s_{\rm{eq}}\subset s$ that

[TABLE]

and

[TABLE]

for all $b_{s}\in\mathbb{C}^{k}$ , where $\ell:=k-|s_{\rm{eq}}|$ , and

[TABLE]

Proof:

We condition on $\beta_{s}=b_{s}$ (and $S=s$ ), and consider the resulting joint distributions $(X_{s},Y)$ and $(X_{s_{\rm eq}},Y)$ for a single measurement. The log-concavity properties in Lemma 5 allow us to apply Corollary 20 (with $n=1$ ) and subsequently Lemma 19. Substituting (202) into (201) yields the following for $t>1$ :

[TABLE]

These equations are equivalent to (260) and (261).

For the case $t\in(0,1]$ , we first apply Lemma 17 and Corollary 20 (with $n=1$ ); the latter implies $\|f_{Y|\beta_{s}=b_{s}}\|_{\infty}\leq\|f_{Z}\|_{\infty}<\infty$ via (203), which we combine with the former to obtain

[TABLE]

where $D(b_{s})$ is defined in (262). Since $t\leq 1$ and $\|f_{Z}\|_{\infty}+1\geq 1$ , we can weaken (266) to

[TABLE]

Now, by applying (200) of Lemma 19 (with $n=1$ and $X_{s_{\rm eq}}$ playing the role of $X$ ) together with (267), we have

[TABLE]

and similarly

[TABLE]

by replacing $X_{s_{\rm{eq}}}$ by $X_{s}$ . ∎

With Corollary 22 in place, we are able to use Lemma 21 to deduce the following result for bounding the crucial quantity $K_{1}$ in the concentration bounds (first appearing in Proposition 7, and leading to Corollary 9 being the form that we will apply). Note that below, $\bar{L}_{1,b_{s}}(t)$ and $\bar{L}^{(s_{\rm{eq}})}_{1,b_{s}}(t)$ are instances of $L_{1}(t)$ in Lemma 21, and $\bar{K}_{1}(b_{s})$ and $\bar{K}^{(s_{\rm{eq}})}_{1}(b_{s})$ are instances of $K_{1}$ .

Lemma 23.

For the phase retrieval model in Section II, for fixed $s$ , $s_{\rm{eq}}\subset s$ , and $b_{s}\in\mathbb{C}^{k}$ , define

[TABLE]

for $t>0$ , where $\ell:=|s_{\rm{dif}}|=k-|s_{\rm{eq}}|$ . Moreover, define

[TABLE]

Then, the following bounds hold:

[TABLE]

where

[TABLE]

and $D(b_{s})$ is defined in (262).

Proof:

This result is obtained by applying Lemma 21 (with $(X_{s},Y)$ or $(X_{s_{\rm eq}},Y)$ in place of $(X,Y)$ , and conditioning on $\beta_{s}=b_{s}$ ), and characterizing the upper bounds $P_{1},P_{2}$ therein using Corollary 22. Note that a complex random vector in $\mathbb{C}^{k}$ can be equivalently considered as a real vector in $\mathbb{R}^{2k}$ . ∎

With the above tools in place, we are ready to prove the main result on the concentration of the information density (Theorem 11), and its simplified version (Corollary 12).

D-A Proof of Theorem 11

By the assumption of i.i.d. measurements, we have

[TABLE]

where, recalling $Y=|\langle X_{s},\beta_{s}\rangle|^{2}+Z$ , the conditional distributions for a single measurement are given by

[TABLE]

with $U_{s_{\rm eq}}$ being the squared magnitude of a $\mathcal{CN}(\langle x_{s_{\rm{eq}}},b_{s}\rangle,\|b_{s_{\rm{dif}}}\|_{2}^{2})$ random variable. Moreover, we have

[TABLE]

where $\tilde{h}$ denotes the (conditional) negative log-density (cf., (41)).

Recall the log-concavity properties in Lemma 5 for a single measurement, which immediately imply analogous properties for the vector of $n$ independent measurements [25, Prop. 3.2]. Applying Corollary 9 and bounding $K_{1}$ therein (as defined in (34)) by $C(b_{s})-1$ in accordance with Lemma 23, we have for all $\mu>0$ that

[TABLE]

noting the one-to-one correspondence between $\mathbf{X}_{s}$ and $(\mathbf{X}_{s_{\rm{dif}}},\mathbf{X}_{s_{\rm{eq}}})$ . Similarly, Corollary 9 and Lemma 23 also yield

[TABLE]

Finally, observe that for all $\mu>0$ , we have

[TABLE]

where (D-A) follows from (282) and the union bound, (288) follows from (284)–(285). Notice that (288) recovers (47), and we similarly obtain (48) from (283) and (286).

D-B Proof of Corollary 12

Recall that given $\beta_{s}=b_{s}$ , any given measurement takes the form $Y=|\langle X_{s},b_{s}\rangle|^{2}+Z$ , where $X_{s}$ has i.i.d. $\mathcal{CN}(0,1)$ entries. Hence, $f_{Y|\beta_{s}}$ is the convolution of the noise density $f_{Z}$ with the density of a scaled $\chi_{2}^{2}$ random variable, where the scaling is proportional to $\|b_{s}\|^{2}$ . This implies that if $\|b_{s}\|^{2}$ is a constant (i.e., remains fixed as $k$ increases), then so is $D(b_{s})$ (see (262)) and hence also $C(b_{s})$ (see (277)). Equivalently, if $\|b_{s}\|_{2}=\Theta(1)$ , then $C(b_{s})=\Theta(1)$ , as required.

Appendix E Technical Proofs

E-A Proof of Lemma 16

Proof of part (i). Fix $t\in\mathbb{R}^{+}$ . Since $g(\mathbf{x},u)$ is a real entire function in $u$ (analytic for all $u\in\mathbb{C}$ ) for each fixed $\mathbf{x}\in\mathbb{R}^{n}$ , by Taylor’s expansion [31, Theorem 4.4], we have

[TABLE]

for all $u\in\mathbb{R}^{+}$ . Re-arranging, we obtain for $u\neq t$ that

[TABLE]

Taking the absolute value, and supposing that $|u-t|\leq\varepsilon/2$ for some $\varepsilon<2t$ , we have

[TABLE]

where (293) follows from the assumption that either $(-1)^{l}\frac{\partial^{l}g}{\partial u^{l}}(\mathbf{x},t)\geq 0$ for all $(l,t)$ or $\frac{\partial^{l}g}{\partial u^{l}}(\mathbf{x},t)\geq 0$ for all $(l,t)$ , (294) follows from (289) applied twice with $u=t-\varepsilon/2$ and $u=t+\varepsilon/2$ , and (295) follows from the triangle inequality and the non-negativity of $g$ .

We proceed by integrating (295) over $\mathbf{x}$ . By the assumption that $g(\mathbf{x},u)\in L^{1}(\mathbb{R}^{n})$ for all $u>0$ , we have

[TABLE]

where we applied the definition of $T(t)$ in (177), and used the fact that it is finite by the assumption $g(\mathbf{x},u)\in L^{1}(\mathbb{R}^{n})$ for fixed $u$ .

The definition of $T(t)$ also yields

[TABLE]

From (295) and (298), we have that $\big{|}\frac{g(\mathbf{x},u)-g(\mathbf{x},t)}{u-t}\big{|}$ is dominated by the integrable function $\frac{2}{\varepsilon}\big{(}g(\mathbf{x},t-\varepsilon/2)+2g(\mathbf{x},t)+g(\mathbf{x},t+\varepsilon/2)\big{)}$, meaning we can apply the dominated convergence theorem [33, Ch. 18] to obtain

[TABLE]

where (300) uses (299), and (302) follows from the definition of partial derivative. We have thus proved (178) in the case that $l=1$ .
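As a sanity check on the $l=1$ argument, the following sketch verifies both the domination bound and the interchange of derivative and integral for one concrete choice, $g(x,u)=e^{-ux^{2}}$ with $n=1$. This test function is an assumption made here for illustration. It satisfies $(-1)^{l}\frac{\partial^{l}g}{\partial u^{l}}\geq 0$, and the sketch takes $T(t)=\int g(x,t)\,dx$, which is $\sqrt{\pi/t}$ for this choice.

```python
import numpy as np

def g(x, u):
    # Illustrative test function (an assumption, not from the paper).
    return np.exp(-u * x**2)

def dg_du(x, u):
    return -(x**2) * np.exp(-u * x**2)

def integrate(values, dx):
    # Riemann sum; accurate here because the integrands decay rapidly.
    return float(np.sum(values) * dx)

t, eps, h = 1.0, 1.0, 1e-4          # eps < 2t, as required in the proof
x = np.linspace(-12.0, 12.0, 48001)
dx = x[1] - x[0]

# Dominating function: (2/eps) * (g(t - eps/2) + 2 g(t) + g(t + eps/2)).
dominating = (2.0 / eps) * (g(x, t - eps / 2) + 2.0 * g(x, t) + g(x, t + eps / 2))

# Check the bound on the difference quotient for u with 0 < |u - t| <= eps/2.
us = [u for u in np.linspace(t - eps / 2, t + eps / 2, 21) if abs(u - t) > 1e-9]
domination_holds = all(
    bool(np.all(np.abs((g(x, u) - g(x, t)) / (u - t)) <= dominating)) for u in us
)

# Interchange: d/dt of the integral versus the integral of the partial derivative.
lhs = (integrate(g(x, t + h), dx) - integrate(g(x, t - h), dx)) / (2.0 * h)
rhs = integrate(dg_du(x, t), dx)
print(domination_holds, lhs, rhs, -np.sqrt(np.pi) / 2.0)
```

Both sides should agree with the closed form $T'(1)=-\sqrt{\pi}/2$ up to discretization error.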

Similarly to (302), from (295), (298), and the dominated convergence theorem [33, Ch. 18], we have

[TABLE]

From (295), (298), and (304), we have

[TABLE]

Since (305) holds for any $t>0$ , we similarly have

[TABLE]

Now, following the same steps as those for the first derivative, we have the following analog of (299):

[TABLE]

By Taylor’s expansion (replacing $g(\mathbf{x},u)$ in (290) by $\frac{\partial g}{\partial u}(\mathbf{x},u)$ ), we also have

[TABLE]

Using the same arguments as from (291) to (295), we obtain (for $u\neq t$ with $|u-t|\leq\varepsilon/2$ and $t>\varepsilon/2$ ) that

[TABLE]

Integrating both sides and applying the definition of $\bar{T}$ in (305), we obtain

[TABLE]

where the finiteness is by (307).

From (310), (312), and (313), we have that $\big{|}\frac{\frac{\partial g}{\partial u}(\mathbf{x},u)-\frac{\partial g}{\partial u}(\mathbf{x},t)}{u-t}\big{|}$ is dominated by the integrable function $\frac{2}{\varepsilon}\big{(}\big{|}\frac{\partial g}{\partial u}(\mathbf{x},t-\varepsilon/2)\big{|}+2\big{|}\frac{\partial g}{\partial u}(\mathbf{x},t)\big{|}+\big{|}\frac{\partial g}{\partial u}(\mathbf{x},t+\varepsilon/2)\big{|}\big{)}$ . Hence, by the dominated convergence theorem [33, Ch. 18], we have

[TABLE]

This proves (178) for $l=2$ .
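The $l=2$ step admits the same kind of numerical check, now applied to the first partial derivative. The sketch below uses a different illustrative test function, $g(x,u)=e^{-u|x|}$ with $n=1$ (an assumption made here), for which $\int\frac{\partial g}{\partial u}\,dx=-2/t^{2}$ and $\int\frac{\partial^{2}g}{\partial u^{2}}\,dx=4/t^{3}$.

```python
import numpy as np

def dg_du(x, u):
    # First partial derivative in u of the illustrative g(x, u) = exp(-u |x|).
    return -np.abs(x) * np.exp(-u * np.abs(x))

def d2g_du2(x, u):
    return x**2 * np.exp(-u * np.abs(x))

def integrate(values, dx):
    return float(np.sum(values) * dx)

t, eps, h = 2.0, 1.0, 1e-4          # requires t > eps/2
x = np.linspace(-40.0, 40.0, 160001)
dx = x[1] - x[0]

# Dominating function built from |dg/du| at t - eps/2, t, and t + eps/2.
dominating = (2.0 / eps) * (
    np.abs(dg_du(x, t - eps / 2))
    + 2.0 * np.abs(dg_du(x, t))
    + np.abs(dg_du(x, t + eps / 2))
)

us = [u for u in np.linspace(t - eps / 2, t + eps / 2, 21) if abs(u - t) > 1e-9]
domination_holds = all(
    bool(np.all(np.abs((dg_du(x, u) - dg_du(x, t)) / (u - t)) <= dominating))
    for u in us
)

# Derivative of the integrated first derivative versus the integrated second derivative.
lhs = (integrate(dg_du(x, t + h), dx) - integrate(dg_du(x, t - h), dx)) / (2.0 * h)
rhs = integrate(d2g_du2(x, t), dx)
print(domination_holds, lhs, rhs, 4.0 / t**3)
```

For $t=2$ both quantities should be close to $4/t^{3}=0.5$.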

Proof of part (ii). Setting $\varepsilon=t$ , we have from (302) and (307) that

[TABLE]

by upper bounding each $T(\cdot)$ by $T^{*}$ in accordance with (179). Moreover, returning to (315), we have

[TABLE]

where (322) uses (313) with $\varepsilon=t$ . Combining (322) and (320) gives

[TABLE]

as required.

Proof of part (iii). We again set $\varepsilon=t$ . The two steps leading to (319) are still valid in this case, but from there we need to proceed differently via the definition of $T^{\dagger}$ in (181):

[TABLE]

In addition, (322) is still valid in this case, but it is further bounded differently via (328):

[TABLE]

From (328) and (331), we have

[TABLE]

as required.

Remark. We could potentially reduce the constant $75$ in (324) or $150$ in (334) by choosing the optimal values of $\varepsilon\in(0,2t)$ . However, for the purposes of this paper, the exact values of these constants are not important.

E-B Proof of Lemma 19

Case 1: $t>1$ . For brevity, let $\tilde{Q}:=\sup_{\mathbf{x}\in\mathbb{R}^{2kn},\mathbf{y}\in\mathbb{R}^{n}}f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})$ . We have

[TABLE]

where (336) follows from the definition of $\tilde{Q}$ , and (337) from the assumption $\tilde{Q}<\infty$ .

Case 2: $0<t\leq 1$ . In this case, we have

[TABLE]

Now, for each $(\mathbf{x},\mathbf{y})\in\mathbb{R}^{2kn}\times\mathbb{R}^{n}$ , if $f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})\leq f_{\mathbf{X}}(\mathbf{x})$ , then we have

[TABLE]

whereas if $f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})>f_{\mathbf{X}}(\mathbf{x})$ , then we have

[TABLE]

Combining these two cases, we obtain

[TABLE]

for all $(\mathbf{x},\mathbf{y})\in\mathbb{R}^{2kn}\times\mathbb{R}^{n}$ . Hence,

[TABLE]

where (347) follows from the boundedness assumption in (198).

Bibliography


[1] Y. C. Eldar and S. Mendelson, “Phase retrieval: Stability and recovery guarantees,” Applied and Computational Harmonic Analysis, vol. 36, no. 3, pp. 473–494, 2014.
[2] P. Schniter and S. Rangan, “Compressive phase retrieval via generalized approximate message passing,” IEEE Trans. Signal Process., vol. 63, no. 4, pp. 1043–1055, 2014.
[3] X. Li and V. Voroninski, “Sparse signal recovery from quadratic measurements via convex programming,” SIAM Journal on Mathematical Analysis, vol. 45, no. 5, pp. 3019–3033, 2013.
[4] S. Cai, M. Bakshi, S. Jaggi, and M. Chen, “SUPER: Sparse signals with unknown phases efficiently recovered,” in Proc. of Intl. Symp. on Inform. Th., 2014.
[5] V. Nakos, “Almost optimal phaseless compressed sensing with sublinear decoding time,” in Proc. of Intl. Symp. on Inform. Th., 2017, pp. 1142–1146.
[6] Y. Li and V. Nakos, “Sublinear-time algorithms for compressive phase retrieval,” in Proc. of Intl. Symp. on Inform. Th., Vail, CO, 2018, pp. 2301–2305.
[7] M. Iwen, A. Viswanathan, and Y. Wang, “Robust sparse phase retrieval made easy,” Applied and Computational Harmonic Analysis, vol. 42, no. 1, pp. 135–142, 2017.
[8] R. Pedarsani, D. Yin, K. Lee, and K. Ramchandran, “Phasecode: Fast and efficient compressive phase retrieval based on sparse-graph codes,” IEEE Transactions on Information Theory, vol. 63, no. 6, pp. 3663–3691, 2017.
