Fourier Phase Retrieval with Extended Support Estimation via Deep Neural Network

Kyung-Su Kim; Sae-Young Chung

arXiv:1904.01821·stat.ML·October 2, 2019


TL;DR

This paper introduces a deep neural network approach for sparse Fourier phase retrieval that estimates an extended support set to improve signal reconstruction accuracy with low computational complexity.

Contribution

The paper proposes a novel DNN-based method to estimate an extended support set for sparse phase retrieval, enhancing accuracy and efficiency over existing methods.

Findings

01. Outperforms local search-based greedy methods in accuracy.

02. Achieves lower computational complexity.

03. Demonstrates superior performance in numerical experiments.


Equations

$$y[i]=c[i]+w[i]\quad\text{for } i\in\{1,...,m\},$$

$$\underset{x\in\mathbb{R}^{n}}{\text{minimize}}\;\; g(x;\mathcal{S}):=\sum_{i\in\{1:m\}}(y[i]-v_{i}(x;\mathcal{S}))^{2},$$

$$\underset{x}{\text{minimize}}\;\; g(x;\mathcal{E}):=\sum_{i\in\{1:m\}}(y[i]-v_{i}(x;\mathcal{E}))^{2}\quad\text{subject to}\quad|\operatorname{supp}(x)|\leq k,$$

$$L(\theta):=\mathbb{E}_{(\bar{y},\bar{x})}\left[\textup{ce}\left(f_{\theta}(\bar{y}),\textup{u}_{n}(\mathcal{I}_{-1}(\mathcal{T}_{\bar{x}})-1)\right)\right],$$

$$p_{x^{\circ}}(x\mid s):=\frac{p_{x^{\circ}}(x)\cdot 1(|\operatorname{supp}(x)|=s)}{\sum_{\tilde{x}\in\mathbb{R}^{n}\textup{ s.t. }|\operatorname{supp}(\tilde{x})|=s}p_{x^{\circ}}(\tilde{x})},$$


Full text

Fourier Phase Retrieval with Extended Support Estimation via Deep Neural Network

Kyung-Su Kim, Sae-Young Chung

School of Electrical Engineering, Korea Advanced Institute of Science and Technology

E-mails: kyungsukim, [email protected]

Abstract

We consider the problem of sparse phase retrieval from Fourier transform magnitudes to recover the $k$ -sparse signal vector and its support $\mathcal{T}$ . We exploit extended support estimate $\mathcal{E}$ with size larger than $k$ satisfying $\mathcal{E}\supseteq\mathcal{T}$ and obtained by a trained deep neural network (DNN). To make the DNN learnable, it provides $\mathcal{E}$ as the union of equivalent solutions of $\mathcal{T}$ by utilizing modulo Fourier invariances. Set $\mathcal{E}$ can be estimated with short running time via the DNN, and support $\mathcal{T}$ can be determined from the DNN output rather than from the full index set by applying hard thresholding to $\mathcal{E}$ . Thus, the DNN-based extended support estimation improves the reconstruction performance of the signal with a low complexity burden dependent on $k$ . Numerical results verify that the proposed scheme has a superior performance with lower complexity compared to local search-based greedy sparse phase retrieval and a state-of-the-art variant of the Fienup method.

Index Terms:

Deep neural network, extended support estimation, Fourier transform, sparse phase retrieval.

I Introduction

Sparse phase retrieval from the magnitude of the Fourier transform (SPRF) [1] has been widely studied in many fields including X-ray crystallography [2], optics [3, 4], blind channel estimation [5], and computational biology [6]. It recovers a $k$ -sparse signal vector ${x^{\circ}}=(x[1],...,x[n])^{\top}$ (a signal vector is called $k$ -sparse if it has $k$ nonzero elements) from measurements $y=(y[1],...,y[m])^{\top}$ of the squared magnitude of an $m$ -point discrete Fourier transform of ${x^{\circ}}$ :

$$y[i]=c[i]+w[i]\quad\text{for } i\in\{1,...,m\}, \tag{1}$$

where $c[i]=\operatorname{abs}(\sum\limits_{j\in{\mathcal{T}}}x[j]\exp{(-{2\pi\sqrt{-1}(i-1)(j-1)}/{m})})^{2}$ , $\operatorname{abs}(\cdot)$ denotes the elementwise absolute value, ${\mathcal{T}}=\operatorname{supp}(x^{\circ})$ is the support of ${x^{\circ}}$ (i.e., the set of indices of the nonzero elements in ${x^{\circ}}$ ) with size $k$ , and $(w[1],...,w[m])^{\top}=:w\in\mathbb{R}^{m}$ is a noise vector.
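The measurement model in (1) can be sketched in a few lines of Python. This is only an illustration using the standard library: the function names `fourier_magnitudes_sq` and `measure`, the example signal, and the choice $m=16$ are assumptions made here and do not come from the paper.

```python
import cmath

def fourier_magnitudes_sq(x, m):
    """c[i] = |sum_j x[j] * exp(-2*pi*sqrt(-1)*(i-1)*(j-1)/m)|^2 for i = 1..m.

    Python lists are 0-based, so list position r holds c[r + 1].
    """
    c = []
    for row in range(m):
        total = sum(
            value * cmath.exp(-2j * cmath.pi * row * col / m)
            for col, value in enumerate(x)
        )
        c.append(abs(total) ** 2)
    return c

def measure(x, m, w=None):
    """y[i] = c[i] + w[i]; w defaults to the noiseless case."""
    c = fourier_magnitudes_sq(x, m)
    if w is None:
        w = [0.0] * m
    return [ci + wi for ci, wi in zip(c, w)]

# Illustrative 3-sparse signal with support T = {2, 3, 6} (1-based), n = 8.
x = [0.0, 0.5, -1.0, 0.0, 0.0, 0.8, 0.0, 0.0]
y = measure(x, m=16)
```

Only the magnitudes survive in `y`, so the phase of each Fourier coefficient has to be inferred from the sparsity prior.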

A commonly used algorithm to solve SPRF is the greedy sparse phase retrieval (GESPAR) proposed by Shechtman et al. [7]. GESPAR performs a local search for $\mathcal{T}$ and iteratively updates support estimate $\mathcal{S}$ by exchanging one element in $\mathcal{S}$ with one in $\mathcal{V}\setminus\mathcal{S}$ , where $\mathcal{V}$ is an estimated index set such that $\mathcal{V}\supseteq\mathcal{T}$ . Owing to this search technique, GESPAR exhibits better performance than related algorithms (e.g., sparse Fienup [8], SDP [1], and two-stage sparse phase retrieval [9]) in reconstructing ${x^{\circ}}$ . However, given that GESPAR updates only one index in the support estimate per iteration, its performance relative to complexity (i.e., efficiency) can be severely degraded as the set difference between $\mathcal{S}$ and $\mathcal{T}$ widens, and its complexity scales with $k$ [10]. The complexity of GESPAR further increases as the signal dimension $n$ or the signal-to-noise ratio (SNR) increases, given that set $\mathcal{V}$ approaches the full index set, $\{1,...,n\}$ , in any of these cases.

A learned deep neural network (DNN) can obtain desired solutions with high efficiency by simply performing a matrix multiplication at each layer without solving specific optimization problems. Therefore, DNNs have notably contributed to enhancing the performance of image reconstruction and denoising in SPRF [11, 12, 13]. However, this advantage has been limited to image processing, because these DNNs exploit image features during learning. Consequently, available research has neglected DNNs as a means of improving the recovery of general (synthetic) signals for SPRF. Nevertheless, verifying high DNN performance in recovering any synthetic signal for SPRF would imply its superiority in all fields of SPRF besides image processing. On the other hand, DNNs have provided much lower complexity with similar performance to other algorithms in recovering any synthetic signal in non-SPRF domains [14, 15, 16]. Hence, DNNs can improve the efficiency of recovering all synthetic signals in the SPRF domain; we discuss this in detail in Section VI-A. To verify this, we propose an algorithm called phase retrieval with extended support estimation using DNN (PRED). It is a one-shot retrieval of the support that exploits prior information via a DNN applied in SPRF. It improves the efficiency of GESPAR in recovering all synthetic and sparse signals. In Section VI-B, we demonstrate PRED scalability through intuition and principles.

As long short-term memory (LSTM) has the same structure as Bayesian learning iterations [15], the phase retrieval problem can be solved by using either the Bayesian learning framework [17] or implementing the framework as a subroutine [10]. In fact, SPRF can be solved by executing the linear inversion for sparse estimation (e.g., sparse Bayesian learning) as a subroutine [10]. Thus, LSTM-based DNNs implicitly allow structural priors to be imposed when estimating the support in SPRF. Therefore, we adopt a gated-feedback LSTM [15, 18] for the DNN in PRED, although other DNN architectures may be applied for SPRF.

PRED determines extended support estimate $\mathcal{E}$ to identify $\mathcal{T}$ (Section III). The extended support denotes an index set with size larger than sparsity $k$ and containing ${\mathcal{T}}$ . Specifically, we propose a DNN framework and its training rule to generate $\mathcal{E}$ . For the DNN to be learnable, we define a union of equivalent solutions of $\mathcal{T}$ and train the DNN to estimate this union set instead of $\mathcal{T}$ (Section III-A). PRED iteratively obtains $\mathcal{E}$ from the trained DNN output and estimates $\mathcal{T}$ as a subset $\mathcal{S}$ of $\mathcal{E}$ through an algorithm called three-stage signal estimation (TSE); this process makes PRED scalable, as explained in Section VI-B. TSE extends the damped Gauss–Newton (DGN) algorithm from GESPAR [7] by taking more than $k$ indices as input (Section III-B). In addition, PRED improves the efficiency of GESPAR in finding $\mathcal{T}$ : it simultaneously updates multiple indices in support estimate $\mathcal{S}$ by exploiting a probability measure for $\mathcal{E}$ , which is provided by the trained DNN output, whereas GESPAR updates one index in $\mathcal{S}$ per iteration without utilizing the measure. Numerical results confirm that PRED outperforms GESPAR and a state-of-the-art variant of the Fienup technique, called FISTA for phase retrieval (FISTAPH) [19], with lower complexity.

II Background

II-A DGN Algorithm

Suppose that an estimate of ${\mathcal{T}}$ is given as ${\mathcal{S}}$ ( $|{\mathcal{S}}|=k$ , where $|\mathcal{S}|$ is the cardinality of $\mathcal{S}$ ). If ${\mathcal{S}}$ is correct (i.e., ${\mathcal{S}}={\mathcal{T}}$ ), SPRF can be formulated as the minimization in (2), whose solution (i.e., $k$ -sparse vector $x\in\mathbb{R}^{n}$ ) is an estimate of $x^{\circ}$ .

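$$\underset{x\in\mathbb{R}^{n}}{\text{minimize}}\;\; g(x;\mathcal{S}):=\sum_{i\in\{1:m\}}(y[i]-v_{i}(x;\mathcal{S}))^{2}, \tag{2}$$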

where $v_{i}({x};{\mathcal{S}}):=({x}^{\mathcal{S}})^{\top}A_{i}({\mathcal{S}})\,{x}^{\mathcal{S}}$ , $A_{i}({\mathcal{S}}):=(F^{\{i\}}_{\mathcal{S}})^{*}F^{\{i\}}_{\mathcal{S}}\in\mathbb{R}^{k\times k}$ for $i\in\{1:m\}$ , and $F\in\mathbb{C}^{m\times n}$ is the discrete Fourier transform such that $(c[1],...,c[m])^{\top}=:c$ can be expressed as $c=|Fx^{\circ}|^{2}$ . For brevity, set $\{i,i+1,...,j\}$ is denoted as $\{i:j\}$ . A submatrix of $A:=[a_{1},...,a_{n}]$ $\in\mathbb{R}^{m\times n}$ with columns indexed by $J\subseteq\{1:n\}$ is denoted by $A_{J}$ . $A^{Q}$ and $x^{Q}$ denote the submatrix of $A$ with rows indexed by $Q\subseteq\{1:m\}$ and the subvector of $x$ with entries indexed by $Q$ , respectively.

Using first-order linear approximation, $y[i]-v_{i}(x;{\mathcal{S}})$ in (2) can be approximated as the $i$ th element of vector $B(x_{t};{\mathcal{S}})x_{t}^{\mathcal{S}}-b(x_{t};{\mathcal{S}})\in\mathbb{R}^{m}$ , where $B(x_{t};{\mathcal{S}})\in\mathbb{R}^{m\times k}$ is the matrix whose $i$ th row is $2(({x_{t}}^{\mathcal{S}})^{\top}A_{i}({\mathcal{S}}))\in\mathbb{R}^{k}$ and $b(x_{t};{\mathcal{S}})\in\mathbb{R}^{m}$ is the vector whose $i$ th element is $y[i]+({x_{t}}^{\mathcal{S}})^{\top}A_{i}({\mathcal{S}})\,{x_{t}}^{\mathcal{S}}$ . Then, using ${\mathcal{S}}$ , the DGN method applied in SPRF (Algorithm 1) estimates ${x^{\circ}}$ as a limit point of sequence $(x_{1},x_{2},...)$ obtained in steps 2 and 3, where $\delta_{t}:=(1/2)^{a(u,x_{t},z_{t};\mathcal{S})}u$ is a step size determined by a backtracking procedure and $a(u,x_{t},z_{t};\mathcal{S})\in\mathbb{N}$ denotes the minimum nonnegative integer $a$ such that $g(x_{t}-\delta_{t}d_{t};\mathcal{S})<g(x_{t};\mathcal{S})-u(\frac{1}{2})^{a+1}\nabla g(x_{t};\mathcal{S})^{\top}\,{d_{t}}^{\mathcal{S}}$ . The limit of sequence $(x_{1},x_{2},...)$ has been proven to be stationary [7].
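The first-order approximation behind DGN can be checked numerically. The sketch below builds, for one measurement index $i$, the row of $B(x_{t};\mathcal{S})$ and the entry of $b(x_{t};\mathcal{S})$ exactly as defined above, and compares $y[i]-v_{i}(x;\mathcal{S})$ with $b_{i}-B_{i}x^{\mathcal{S}}$ for a point $x$ near $x_{t}$ (the overall sign is irrelevant once the residual is squared). For a real signal only the real part of $A_{i}(\mathcal{S})$ contributes to the quadratic form, so the code uses the cosine form of that matrix. The helper names and the numerical values are illustrative assumptions, not part of the paper.

```python
import math

def quad_matrix(i, S, m):
    """Real part of A_i(S): entry (a, b) = cos(2*pi*(i-1)*(S[a]-S[b])/m), 1-based i."""
    return [[math.cos(2 * math.pi * (i - 1) * (sa - sb) / m) for sb in S] for sa in S]

def v(i, xs, S, m):
    """Quadratic form v_i(x; S) = (x^S)^T A_i(S) x^S for a real subvector xs."""
    A = quad_matrix(i, S, m)
    k = len(S)
    return sum(xs[a] * A[a][b] * xs[b] for a in range(k) for b in range(k))

def linearize(i, xt, S, m, y_i):
    """Row B_i = 2 (x_t^S)^T A_i(S) and scalar b_i = y[i] + v_i(x_t; S)."""
    A = quad_matrix(i, S, m)
    k = len(S)
    B_row = [2 * sum(xt[a] * A[a][b] for a in range(k)) for b in range(k)]
    b_i = y_i + v(i, xt, S, m)
    return B_row, b_i

# Illustrative values: support S, current iterate xt, and a noiseless measurement.
S, m, i = [1, 3, 4], 8, 3
x_true = [1.0, -0.5, 0.7]
xt = [0.9, -0.4, 0.6]
y_i = v(i, x_true, S, m)

B_row, b_i = linearize(i, xt, S, m, y_i)
x_near = [xt[0] + 1e-3, xt[1] - 1e-3, xt[2] + 1e-3]
exact = y_i - v(i, x_near, S, m)
approx = b_i - sum(B_row[a] * x_near[a] for a in range(len(S)))
gap = abs(exact - approx)  # second-order in the distance between x_near and xt
```

Because the gap is second order in the step, each DGN iteration can replace the quartic cost in (2) with a linear least-squares problem around the current iterate.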

II-B GESPAR

GESPAR (Algorithm 2) first determines two index sets, $\mathcal{V}_{1}$ and $\mathcal{V}_{2}$ , satisfying $\mathcal{V}_{1}\subseteq\mathcal{T}\subseteq\mathcal{V}_{2}$ through an autocorrelation-based process. Then, it generates estimate $\mathcal{S}$ of ${\mathcal{T}}$ ( $|\mathcal{S}|=k$ ) such that $\mathcal{V}_{1}\subseteq\mathcal{S}\subseteq\mathcal{V}_{2}$ and utilizes the DGN method with $\mathcal{S}$ (Algorithm 1) to estimate signal ${x^{\circ}}$ and determine whether signal error $\epsilon$ is sufficiently small. GESPAR iteratively updates support estimate $\mathcal{S}$ in step 3 using 2-opt local search, and the iterative process (steps 2–11) proceeds until either the signal error is sufficiently small or GESPAR exceeds iteration limit $\kappa_{\textup{ITER}}$ . More details on GESPAR can be found in [7].

III PRED Structure

To enhance the tradeoff between performance and complexity from GESPAR, the proposed PRED aims to determine extended support estimate ${\mathcal{E}}\subseteq\{1:n\}$ from a trained DNN output and recover ${x^{\circ}}$ via the proposed TSE using ${\mathcal{E}}$ . Suppose that ${\mathcal{E}}$ is given and includes ${\mathcal{T}}$ . Then, the SPRF problem can be formulated by (3), which estimates ${x^{\circ}}$ as solution $x\in\mathbb{R}^{n}$ .

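$$\underset{x}{\text{minimize}}\;\; g(x;\mathcal{E}):=\sum_{i\in\{1:m\}}(y[i]-v_{i}(x;\mathcal{E}))^{2}\quad\text{subject to}\quad|\operatorname{supp}(x)|\leq k, \tag{3}$$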

where $v_{i}({x};{\mathcal{E}}):=({x}^{\mathcal{E}})^{\top}A_{i}({\mathcal{E}})\,{x}^{\mathcal{E}}$ for $i\in\{1:m\}$ . Note that the original SPRF problem is equal to (3) with ${\mathcal{E}}$ replaced by $\{1:n\}$ . Thus, by considering ${\mathcal{E}}$ , the SPRF problem can be simplified such that the dimension of the target signal is reduced from $n$ to $|\mathcal{E}|$ . Besides, it is easier to find ${\mathcal{E}}$ such that ${\mathcal{E}}\supseteq{\mathcal{T}}$ than to identify ${\mathcal{T}}$ . Hence, PRED adopts this principle to generate ${\mathcal{E}}$ from a trained DNN output (Section III-A) and estimates ${x^{\circ}}$ by solving (3) given ${\mathcal{E}}$ (Section III-B).

III-A DNN for Extended Support Estimation

We propose a DNN structure and a training method to obtain extended support ${\mathcal{E}}$ of ${x^{\circ}}$ . Given the trivial ambiguity in SPRF (modulo Fourier invariances [9]), there exists a $k$ -sparse vector $x\in\mathbb{R}^{n}$ satisfying $\operatorname{abs}(Fx^{\circ})=\operatorname{abs}(Fx)$ , whose support $\mathcal{W}$ is defined by $\mathcal{T}-\min\limits_{v\in\mathcal{T}}v+l$ or $\max\limits_{v\in\mathcal{T}}v-\mathcal{T}+l$ for any integer $l$ such that $1\leq l\leq n-\max\limits_{v\in\mathcal{T}}v+\min\limits_{v\in\mathcal{T}}v$ . Therefore, even in the noiseless case, support $\mathcal{W}$ of $x$ cannot be uniquely identified as $\mathcal{T}$ given the measurement vector $y$ , and consequently it is hard to optimize the DNN for estimating $\mathcal{T}$ . To solve this problem, we introduce union $\mathcal{I}(\mathcal{T})$ of equivalent solutions of the true support (UES), defined as $\mathcal{I}(\mathcal{T}):=\alpha(\mathcal{T})\cup\beta(\mathcal{T})$ , where $\alpha(\mathcal{T}):=\mathcal{T}-\min\limits_{v\in\mathcal{T}}v+1$ , $\beta(\mathcal{T}):=\max\limits_{v\in\mathcal{T}}v-\mathcal{T}+1$ . Unlike true support $\mathcal{T}$ , UES $\mathcal{I}(\mathcal{T})$ is uniquely determined by $y$ . Thus, the DNN is learnable without the ambiguity by considering its output as $\mathcal{I}(\mathcal{T})$ instead of $\mathcal{T}$ . As the index $1$ is always included in $\mathcal{I}(\mathcal{T})$ , the DNN is trained to retrieve $\mathcal{I}_{-1}(\mathcal{T}):=\mathcal{I}(\mathcal{T})\setminus\{1\}$ .
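The sets $\alpha(\mathcal{T})$ , $\beta(\mathcal{T})$ , $\mathcal{I}(\mathcal{T})$ , and $\mathcal{I}_{-1}(\mathcal{T})$ follow directly from the definitions above. A minimal Python sketch, with function names chosen here for illustration and supports represented as sets of 1-based indices, is:

```python
def alpha(T):
    """Shift the support so that its smallest index becomes 1."""
    lo = min(T)
    return {t - lo + 1 for t in T}

def beta(T):
    """Flip the support and shift it so that its smallest index becomes 1."""
    hi = max(T)
    return {hi - t + 1 for t in T}

def ues(T):
    """Union of equivalent solutions I(T) = alpha(T) | beta(T)."""
    return alpha(T) | beta(T)

def ues_minus_one(T):
    """I_{-1}(T): the UES without index 1, which is always present."""
    return ues(T) - {1}

T = {2, 3, 6}
shifted = {t + 1 for t in T}   # a shifted copy of the same support
flipped = beta(T)              # a flipped copy of the same support
```

Shifted and flipped copies of a support map to the same UES, which is why the UES is a well-defined training target whereas $\mathcal{T}$ itself is not.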

III-A1 DNN Structure and Training Objective

For sparse vector $z\in\mathbb{R}^{n}$ , whose support is denoted by $\mathcal{T}_{z}$ , and measurement vector $h=\operatorname{abs}(Fz)^{2}\in\mathbb{R}^{m}$ given discrete Fourier transform matrix $F$ , the DNN defined by $f_{\theta}(\cdot):\mathbb{R}^{m}\rightarrow\mathbb{T}^{n-1}$ takes vector $h$ as its input and is trained to return vector $v=(v[1],...,v[n-1])^{\top}=f_{\theta}(h)\in\mathbb{T}^{n-1}$ such that $v[i-1]=1/|\mathcal{I}_{-1}(\mathcal{T}_{z})|$ for $i\in\mathcal{I}_{-1}(\mathcal{T}_{z})$ and $v[i-1]=0$ for $i\notin\mathcal{I}_{-1}(\mathcal{T}_{z})$ , where $\mathbb{T}^{q}$ represents the $(q-1)$ -dimensional probability simplex. Each element $v[i]$ of vector $v$ represents the probability that index $i+1$ belongs to $\mathcal{I}_{-1}(\mathcal{T}_{z})$ . Thus, for any integer $e$ such that $|\mathcal{I}_{-1}(\mathcal{T}_{z})|\leq e$ , an extended support $\mathcal{E}$ of $z$ including $\mathcal{I}_{-1}(\mathcal{T}_{z})$ can be obtained by selecting the indices of the $e$ largest elements of the ideally trained DNN output. For instance, if $n$ is 6 and support $\mathcal{T}_{z}$ is $\{2,3,6\}$ so that $\mathcal{I}_{-1}(\mathcal{T}_{z})=\{2,4,5\}$ , the DNN is trained to return the output vector $f_{\theta}(h)=(1/3,0,1/3,1/3,0)^{\top}$ , and $\mathcal{I}_{-1}(\mathcal{T}_{z})$ is obtained by selecting the indices of its $3$ largest elements.
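The index bookkeeping between the $(n-1)$ -dimensional DNN output and the signal indices $\{2:n\}$ is easy to get wrong, so the example above is reproduced here as a short Python sketch. The function names are illustrative assumptions; list position `p` (0-based) corresponds to $v[p+1]$ and hence to signal index $p+2$ .

```python
def target_vector(I_minus_1, n):
    """Ideal DNN output: 1/|I_{-1}| at the positions of I_{-1}, zero elsewhere."""
    weight = 1.0 / len(I_minus_1)
    return [weight if (p + 2) in I_minus_1 else 0.0 for p in range(n - 1)]

def extended_support(v, e):
    """Signal indices of the e largest entries of the DNN output v."""
    order = sorted(range(len(v)), key=lambda p: v[p], reverse=True)
    return {p + 2 for p in order[:e]}

# Example from the text: n = 6, T_z = {2, 3, 6}, I_{-1}(T_z) = {2, 4, 5}.
v = target_vector({2, 4, 5}, n=6)
E3 = extended_support(v, e=3)
E4 = extended_support(v, e=4)
```

Choosing $e$ larger than $|\mathcal{I}_{-1}(\mathcal{T}_{z})|$ , as in `E4`, still yields a superset of the target set, which is the property PRED relies on.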

III-A2 DNN Training

We randomly sampled a $k$ -sparse signal vector $\bar{x}$ , whose sparsity $k$ ranges from $k_{1}$ to $k_{2}$ . We set $(k_{1},k_{2})$ to $(2,20)$ in this study (Section V). From signal vector $\bar{x}$ , a measurement vector $\bar{y}$ satisfying (1) can be obtained by adding random noise vector $w$ according to the given SNR. For the distribution of pairs $(\bar{x},\bar{y})$ produced this way, the training goal can be formulated as the minimization of (4).

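$$L(\theta):=\mathbb{E}_{(\bar{y},\bar{x})}\left[\textup{ce}\left(f_{\theta}(\bar{y}),\textup{u}_{n}(\mathcal{I}_{-1}(\mathcal{T}_{\bar{x}})-1)\right)\right], \tag{4}$$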

where $\textup{ce}(v_{1},v_{2}):=-\frac{1}{g}\sum_{i=1}^{g}v_{2}[i]\log\,v_{1}[i]$ is the cross-entropy between vectors $v_{1}$ and $v_{2}$ of dimension $g$ , $\textup{u}_{n}(\mathcal{I})$ is an $(n-1)$ -dimensional vector $v\in\mathbb{R}^{n-1}$ whose support is given by set $\mathcal{I}$ and whose nonzero elements are all equal to $1/|\mathcal{I}|$ , $\mathcal{T}_{\bar{x}}$ is the support of ${\bar{x}}$ , and $\theta$ is the training parameter to be updated for minimizing $L(\theta)$ . A detailed description is given in the Appendix.
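For a single training pair, the cross-entropy term in (4) can be evaluated as follows. This is a plain-Python sketch of the formula above, not the training code used in the paper; the function name and the sample vectors are assumptions, and terms with $v_{2}[i]=0$ are skipped under the usual convention $0\cdot\log 0=0$ .

```python
import math

def cross_entropy(v1, v2):
    """ce(v1, v2) = -(1/g) * sum_i v2[i] * log(v1[i]), with g = len(v1)."""
    g = len(v1)
    return -sum(t * math.log(p) for p, t in zip(v1, v2) if t > 0) / g

target = [0.5, 0.5, 0.0, 0.0]          # ideal output for a two-element target set
uniform = [0.25, 0.25, 0.25, 0.25]     # an uninformative prediction
sharper = [0.45, 0.45, 0.05, 0.05]     # a prediction closer to the target

loss_uniform = cross_entropy(uniform, target)
loss_sharper = cross_entropy(sharper, target)
```

The loss decreases as the predicted mass concentrates on the target set, which is the behaviour the training objective rewards.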

III-B Three-Stage Signal Estimation

From the trained DNN output, we can obtain extended support estimate $\mathcal{E}$ such that ${\mathcal{E}}\supseteq{\mathcal{I}(\mathcal{T})}$ . Then, it remains to estimate the true support as $\alpha(\mathcal{T})$ or $\beta(\mathcal{T})$ by selecting $k$ indices from $\mathcal{E}$ . This is resolved by the minimization in (3) given ${\mathcal{E}}\supseteq{\mathcal{I}(\mathcal{T})}$ . We introduce the TSE in Algorithm 3, which calls the DGN method (Algorithm 1) twice and approximately solves the minimization in (3) through the following three stages: (a) temporary signal estimation from the given $\mathcal{E}$ : $\tilde{x}\leftarrow\underset{x\in\mathbb{R}^{n}}{\arg\min}\,g(x;\mathcal{E})$ ; (b) support estimation from $\tilde{x}$ : $\mathcal{S}\leftarrow\underset{\mathcal{S}\subseteq\mathcal{E}\textup{ s.t. }|\mathcal{S}|=k}{\arg\min}\,g(\tilde{x};\mathcal{S})$ ; and (c) signal estimation from $\mathcal{S}$ : $\hat{x}\leftarrow\underset{\check{x}\in\mathbb{R}^{n}}{\arg\min}\,g(\check{x};\mathcal{S})$ , where $\hat{x}$ and $\mathcal{S}$ are the signal and support estimates, respectively.

The first stage of TSE (step 1) minimizes the cost in (a), with a temporary signal estimate $\tilde{x}$ supported on ${\mathcal{E}}$ obtained by applying the DGN method (Algorithm 1) with ${\mathcal{E}}$ such that $\tilde{x}=$ DGN( $F,y,\mathcal{E},\tau,h$ ). The second stage (step 2) retrieves support estimate ${\mathcal{S}}$ as a subset of $\mathcal{E}$ by approximating the minimization in (b) through hard thresholding (i.e., selecting $k$ indices corresponding to the $k$ largest absolute values of $(\tilde{x})^{\mathcal{E}}$ supported on ${\mathcal{E}}$ ). Finally, TSE determines ${x^{\circ}}$ through the DGN method with ${\mathcal{S}}$ to solve (c) at the third stage (step 3).
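The control flow of the three stages can be sketched as below. The DGN solver itself is not reproduced: `dgn` is a hypothetical callable standing in for Algorithm 1 that takes a list of support indices and returns a length- $n$ estimate supported on them, and the stub used in the example simply reads values from a fixed vector. All names and numbers are illustrative assumptions.

```python
def hard_threshold_support(x_tilde, E, k):
    """Stage (b): keep the k indices of E with the largest |x_tilde| (1-based indices)."""
    ranked = sorted(E, key=lambda j: abs(x_tilde[j - 1]), reverse=True)
    return set(ranked[:k])

def tse(dgn, E, k):
    """Three-stage signal estimation around a user-supplied DGN routine."""
    x_tilde = dgn(sorted(E))                    # stage (a): fit on the extended support
    S = hard_threshold_support(x_tilde, E, k)   # stage (b): pick k indices from E
    x_hat = dgn(sorted(S))                      # stage (c): refit on the k-sparse support
    return x_hat, S

# Stub in place of DGN, for illustration only: returns fixed values on the given support.
values = [0.9, 0.05, 0.0, -0.7, 0.02, 0.0, 0.4, 0.0]

def stub_dgn(support):
    return [values[j - 1] if j in support else 0.0 for j in range(1, len(values) + 1)]

x_hat, S = tse(stub_dgn, E={1, 2, 4, 5, 7}, k=3)
```

The search in stage (b) is confined to $\mathcal{E}$ , so its cost depends on $|\mathcal{E}|$ rather than on $n$ .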

IV PRED Algorithm

The proposed PRED is detailed in Algorithm 4 and estimates $({x^{\circ}},{\mathcal{T}})$ as $({\hat{x}},{\mathcal{S}})$ through trained DNN $f_{\theta}(\cdot)$ and by applying TSE (Algorithm 3). The trivial ambiguity of SPRF [9] implies that index $1$ can be considered an element in the true support. By selecting the $(q-1)$ largest values of the trained DNN output, PRED initializes extended support estimate $\mathcal{E}$ , where $q$ is the size of $\mathcal{E}$ randomly sampled from $2k$ to $3k$ . Note that $|\mathcal{E}|$ is larger than $|\mathcal{I}(\mathcal{T})|$ from inequality $|\mathcal{E}|\geq 2k$ , and the $i$ th element of the trained DNN output $f_{\theta}(y)\in\mathbb{R}^{n-1}$ indicates a probability of index $i+1$ belonging to UES $\mathcal{I}(\mathcal{T})$ . Thus, $\mathcal{E}$ is the set expected to include one of the equivalent solutions of $\mathcal{T}$ in $\mathcal{I}(\mathcal{T})$ (i.e., either $\alpha(\mathcal{T})$ or $\beta(\mathcal{T})$ ). Under premise $\mathcal{E}\supseteq\alpha(\mathcal{T})$ or $\mathcal{E}\supseteq\beta(\mathcal{T})$ , PRED solves the minimization in (3) by executing TSE to estimate $(x^{\circ},\mathcal{T})$ as ( $x_{1},\mathcal{S}_{1}$ ) in step 2. Then, PRED terminates depending on whether current signal error $g(x_{1};\mathcal{S}_{1})$ obtained from the estimate is below input threshold $\epsilon$ in steps 3 and 4. If the signal error is higher than $\epsilon$ , PRED executes steps 5–7 to obtain a new extended support estimate $\mathcal{E}$ via an update process, which is executed in step 7 by replacing the complementary set of $\mathcal{S}$ in $\mathcal{E}$ with multiple indices randomly selected according to discrete probability vector $p$ generated from DNN output $f_{\theta}(y)$ . Vector $p$ represents the probability of each index in $\{2:n\}\setminus\mathcal{S}$ belonging to $\mathcal{I}(\mathcal{T})\setminus\mathcal{S}$ . 
By using the updated extended support $\mathcal{E}$ , TSE estimates the signal vector and its support as ( $x_{2},\mathcal{S}_{2}$ ) in step 2 at the next iteration. This process is iterated until either the signal error is sufficiently small or the number of iterations exceeds limit $\kappa_{\textup{ITER}}$ .
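The update of $\mathcal{E}$ in step 7 can be illustrated with weighted sampling without replacement. Algorithm 4 is not reproduced in this text, so the sketch makes its own assumptions: probability vector $p$ is taken to be the DNN output restricted to $\{2:n\}\setminus\mathcal{S}$ and renormalized, the new $\mathcal{E}$ keeps $\mathcal{S}$ and is filled up to a given size $q$ , and the function name and numerical values are hypothetical.

```python
import random

def update_extended_support(S, dnn_out, q, rng):
    """Keep S and replace the rest of E with indices drawn according to the DNN output.

    dnn_out[p] scores signal index p + 2; S holds 1-based indices.
    """
    n = len(dnn_out) + 1
    candidates = [j for j in range(2, n + 1) if j not in S]
    weights = [dnn_out[j - 2] for j in candidates]
    E = set(S)
    while len(E) < q and candidates:
        if sum(weights) > 0:
            # random.choices normalizes the weights, giving the assumed vector p.
            pos = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        else:
            pos = rng.randrange(len(candidates))  # fall back to uniform sampling
        E.add(candidates.pop(pos))
        weights.pop(pos)
    return E

rng = random.Random(0)
dnn_out = [0.05, 0.10, 0.30, 0.20, 0.05, 0.25, 0.05]   # n = 8, illustrative scores
S = {1, 4, 7}
E_new = update_extended_support(S, dnn_out, q=6, rng=rng)
```

Several indices change in one update, in contrast to the single-index exchange of GESPAR.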

V Numerical Experiments and Results

We compared the performance of PRED against similar algorithms, namely, GESPAR, FISTAPH, and phase-retrieval generalized approximate message passing (PRGAMP) [10]. For a fair comparison, we applied the same stopping criterion shown in steps 3 and 4 of Algorithm 4 to GESPAR and FISTAPH. We executed FISTAPH $20i$ times for $n=256(4-i)$ and $i\in\{1:3\}$ to get multiple candidate solutions and selected the one with minimum error among them. The soft thresholding parameter of FISTAPH was set to $0.02$ , and we selected the $k$ largest elements of its signal estimate to recover support $\mathcal{T}$ . We used uniform and Gaussian models to generate signals. In the uniform model, each nonzero element of ${x^{\circ}}$ was sampled from $-1$ to $1$ excluding the interval from $-0.2$ to $0.2$ . In the Gaussian model, each nonzero element of ${x^{\circ}}$ was sampled from a standard Gaussian distribution. We assumed that each entry of $w$ follows chi-squared distribution $\chi^{2}(2)$ with $2$ degrees of freedom. For a complex random variable $z:=a+b\sqrt{-1}\in\mathbb{C}$ , whose real and imaginary parts are i.i.d. Gaussian, $|\sum\limits_{j\in{\mathcal{T}}}x[j]\exp{(-{2\pi\sqrt{-1}(i-1)(j-1)}/{m})}+z|^{2}\approx c[i]+a^{2}+b^{2}$ . Hence, the $i$ th element of noise vector $w$ can be set to $a^{2}+b^{2}$ , such that $w[i]\sim\sigma\cdot\chi^{2}(2)$ for $i\in\{1:m\}$ , where $\sigma$ is determined by the SNR. Given support estimate $\mathcal{S}$ , we use modulo Fourier invariances and define recovery success rate $\mathbb{E}[\max(1(\alpha(\mathcal{T})=\alpha(\mathcal{S})),1(\alpha(\mathcal{T})=\beta(\mathcal{S})))]$ and soft recovery success rate $\mathbb{E}[\max(|\alpha(\mathcal{T})\cap\alpha(\mathcal{S})|,|\alpha(\mathcal{T})\cap\beta(\mathcal{S})|)/k]$ for the support, where $1(\cdot)$ is the indicator function. We set input parameters $(h,\tau,\epsilon)$ in PRED and GESPAR to $(100,10^{-4},\left\|y\right\|\cdot 10^{-v_{\textup{SNR}_{\textup{dB}}}/20})$ and iteration limit $\kappa_{\textup{ITER}}$ in PRED and GESPAR to $100$ and $500$ , respectively.
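The two support metrics compare supports only up to shift and flip. A per-trial version of each quantity inside the expectation can be written as follows; the function names are illustrative, and the reported rates would be averages of these values over trials.

```python
def alpha(T):
    """Shift the support so that its smallest index becomes 1."""
    lo = min(T)
    return {t - lo + 1 for t in T}

def beta(T):
    """Flip the support and shift it so that its smallest index becomes 1."""
    hi = max(T)
    return {hi - t + 1 for t in T}

def support_success(T, S):
    """1 if S equals T up to shift or flip, else 0."""
    return 1 if alpha(T) in (alpha(S), beta(S)) else 0

def soft_support_success(T, S):
    """Largest fraction of alpha(T) matched by alpha(S) or beta(S), with k = |T|."""
    a = alpha(T)
    return max(len(a & alpha(S)), len(a & beta(S))) / len(T)

T = {2, 3, 6}
exact_flip = {1, 4, 5}    # a flipped copy of T
partial = {1, 2, 6}       # matches two of the three indices
```

With these definitions an estimate that is a flipped or shifted copy of $\mathcal{T}$ counts as a full success.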

For the gated-feedback LSTM $f_{\theta}(\cdot)$ used in PRED [18], we set the hidden unit size, number of unfolding steps, and layer size to $2000$ , $20$ , and $2$ , respectively. Further details on this network are available in [15]. The detailed description and settings for learning DNN $f_{\theta}(\cdot)$ are given in the Appendix.

We evaluated each algorithm with different SNRs and dimension values $n$ of ${x^{\circ}}$ . Figs. 1(a)–(d) and Figs. 1(e)–(g) show the rate of successful support recovery and execution time per algorithm, respectively. In most of the sparsity region, PRED outperforms the other algorithms, provides lower complexity, and is more robust to noise. We can expect that PRED scales well with $n$ , as Figs. 1(a)–(d) show that PRED uniformly recovers about twice the sparsity compared to GESPAR and FISTAPH for different $n$ and SNR values. (In Fig. 1(b), the maximum $k$ satisfying support recovery rates higher than $95$ % is $16$ , $2$ , and $8$ for PRED, GESPAR, and FISTAPH, respectively.) Figs. 1(e)–(f) show that the running time of PRED is less than half of that of the other methods at sparsity $k$ below $20$ .

Given that PRED and GESPAR consist of an iteration of DGN (Algorithm 1), their complexity is expressed as $\eta\cdot\phi$ , where $\phi$ is the average complexity of DGN and $\eta$ is the average number of DGN executions in each algorithm. Note that the complexity of the pseudoinverse in step 2 of DGN with input $\mathcal{S}$ of size $b$ is generally $O(b^{2}\cdot m)$ , where $b$ is set to $k$ and $\alpha k$ in GESPAR and PRED, respectively, for a constant $\alpha$ smaller than $3$ . (Given that size $b$ of the index set for the DGN input in steps 1 and 3 in TSE (Algorithm 3) is smaller than $3k$ and equal to $k$ , respectively, $\alpha$ and its mean are smaller than $3$ and $3/2$ , respectively.) This implies that average complexity $\phi$ of the DGN used in PRED and GESPAR has the same order for $k$ , and hence their complexity is mainly dependent on $\eta$ . Figs. 1(h)–(i) show that $\eta$ in PRED is less than or similar to one-third of $\eta$ in GESPAR. Thus, the complexity of PRED and GESPAR has the same order for $k$ and supports the results in Figs. 1(e)–(g), showing that the execution time of PRED is shorter than that of GESPAR in most of the sparsity region.

Figs. 1(j)–(l) show the performance results for the zero-mean and unit-variance Gaussian model. PRGAMP was compared only on the Gaussian model due to its structural characteristics. (The public software package implemented in MATLAB was used to test PRGAMP. The other methods were implemented in Python with TensorFlow.) Even for the Gaussian model, PRED has a superior performance with lower complexity than existing algorithms including PRGAMP. (We excluded the performance result of PRGAMP in Fig. 1(j) because it is zero for the whole sparsity region.)

VI Discussion

VI-A The scalability of DNN to recover synthetic signals for SPRF

To support the claim that the DNN structure can be scalable to estimate the support in SPRF, we prepared the following subsections 1) and 2). To show that the DNN is superior to other methods for solving SPRF, we prepared the following subsection 3).

VI-A1 The DNN imposes structural priors for support estimation in SPRF

We introduce the following three results (a)–(c) by referring to [10], [15], and [17]:

(a) In [15], it is guaranteed that a canonical LSTM cell has the same structure as the computational flow of each iteration of the Bayesian learning framework (BLF).

(b) In [17], it is shown that phase retrieval can be solved by using the BLF.

(c) In [10], the PRGAMP algorithm has an inner loop where the GAMP algorithm, one of the compressed sensing algorithms for sparse linear inversion, is used to estimate the sparse signal. Thus, in PRGAMP, the GAMP algorithm can be replaced by any compressed sensing algorithm to estimate the support in SPRF. As the sparse BLF is a compressed sensing algorithm, we can conclude that SPRF can be solved by using the BLF; this also supports the result (b).

Results (a)–(c) imply that LSTM is a generalized (i.e., learned) version of the BLF (from result (a)) and that support estimation in SPRF can be done by the BLF (from results (b) and (c)). Furthermore, it is well known that the BLF is a scalable algorithm, as it has structural priors to estimate the target support. Therefore, as the BLF has structural priors, the LSTM-based DNN implicitly allows structural priors to be imposed when estimating the target support in SPRF.

VI-A2 The DNN is scalable according to signal dimension n for support estimation in SPRF

Note that the BLF is scalable for signal dimension $n$ and LSTM has the same structure as the BLF. Thus, LSTM can be scalable for $n$ by imposing the structural prior to estimate the target support in SPRF. Our test results shown in Figs. 1(a)–(c) imply that the LSTM-based DNN used in the proposed PRED estimates the true support with a high probability, irrespective of $n$ . This is because PRED uniformly recovers about twice the sparsity with a lower complexity than other related methods, for different values of $n$ . This supports our claim that the DNN (i.e., LSTM) is scalable for $n$ to estimate the target support in SPRF.

Note that in our test, the signal is sampled from two continuous probability distributions (i.e., uniform and Gaussian). This implies that there is an infinite number of combinations of pairs of measurement vector $y$ and its corresponding true support, given any fixed sparsity $k$ . Thus, it would not be possible for the DNN to recover all true supports if it had no inherent structure, because it would then have to store an infinite number of cases.

The test results in Figs. 1(a)–(c) show that the DNN in PRED can recover twice the sparsity, by recovering all supports with probability one, in comparison with related SPRF methods. Hence, the DNN (i.e., LSTM) does not simply store all the supports. Instead, it has an implicit structure to estimate the target support via its forward computational flow, which is like a computational flow of the BLF. This inherent structure also ensures that the DNN is scalable.

VI-A3 The DNN can outperform existing methods for support estimation in SPRF

The LSTM can be interpreted as a learned version of the BLF. It has been shown in [20] and [21] that learned versions of approximate message passing and the iterative shrinkage thresholding algorithm outperform their counterparts for estimating the target support. This indicates that LSTM (i.e., the learned version of the BLF) can outperform the BLF for estimating the support in SPRF, as demonstrated in [15], though for a problem different from SPRF. Together with results (b) and (c) in Section VI-A1, this implies that the DNN (i.e., LSTM) can outperform existing support estimators for SPRF. We demonstrated in Section V that the proposed PRED outperforms other SPRF methods by using the LSTM as the DNN architecture for PRED, supporting our claim.

VI-B Demonstration of PRED scalability through intuition and principles

PRED has the following two main steps (1) and (2): (1) extended support estimation from the DNN output and (2) support estimation from the extended support estimate via the TSE algorithm (Algorithm 3). For step (1), the DNN provides a set (the extended support estimate) including the true support with high probability, as the DNN (i.e., LSTM) has the implicit structure to estimate the support as discussed in Section VI-A. Step (1) has a low complexity, as the extended support estimation using the DNN is performed simply via a matrix multiplication at each DNN layer without solving specific optimization problems. Thus, the complexity (performance) of PRED is mainly dependent on step (2). As we have shown in the last paragraph of Section V, the complexity of step (2) is O( $b^{2}$ ), where $b$ is the size of the extended support estimate obtained from step (1). The maximum of $b$ is $3k$ , where $k$ is the sparsity. Hence, the complexity of PRED depends only on $k$ (i.e., it is O( $k^{2}$ )) and does not depend on signal dimension $n$ . Without step (1), the complexity of PRED would be O( $n^{2}$ ), as the extended support estimate in step (2) would have to be set to the whole index set. Therefore, PRED is scalable (the complexity is not affected by $n$ but by $k$ ) because the TSE algorithm searches for the support not in the whole index set, but in the extended support estimate obtained from the DNN. This justifies the combination of the DNN with existing SPRF algorithms, as discussed in Section VII.

VII Conclusion

Although a DNN cannot accurately estimate the support, it can efficiently estimate a set containing it [15]. In contrast, the optimization-based approach is less efficient at finding the support from the full index set but is highly accurate when searching a relatively small set that includes the support. We leverage the advantages of both approaches through DNN-based extended support estimation and are the first to show that this approach, called PRED, outperforms existing algorithms in recovering common sparse signals for SPRF.

DNN Training

-A Description of the proposed algorithm for training DNN

Algorithm 5 describes the proposed training method for the DNN $f_{\theta}(\cdot)$ . It uses noisy training data so that the DNN learns to estimate the UES, and it covers the case in which the sparsity $k$ of ${x^{\circ}}$ in the test data is unknown: the training data are generated using lower and upper bounds on the sparsity, $k_{1}$ and $k_{2}$ , respectively. In the algorithm, $n_{e}$ is the number of epochs, $s_{b}$ is the batch size, $n_{b}$ is the number of batches per epoch, and $v_{\textup{SNR}_{\textup{dB}}}$ is the SNR in decibels (dB), which is expected to be $10\log_{10}\,(\sum_{i}c[i]/\sum_{i}w[i])$ . For each epoch, steps 3–9 generate training data $(x_{i},y_{i})_{i=1}^{s_{b}\cdot n_{b}}$ , where $x_{i}$ and $y_{i}$ are the signal and measurement vectors, respectively. Specifically, in step 5, the signal vector $x_{i}$ , whose sparsity $s$ is sampled uniformly between $k_{1}$ and $k_{2}$ , is drawn from the conditional probability $p_{{x^{\circ}}}(x\mid s)$ of the signal vector given sparsity $s$ , defined as

[TABLE]

where $p_{{x^{\circ}}}(x)$ is the distribution of ${x^{\circ}}$ and $1(\cdot)$ is the indicator function. The measurement vector $y_{i}$ in step 9 is given by $z_{i}:=Fx_{i}$ plus a noise vector $w_{i}$ such that $10\log_{10}\,(\sum_{j}z_{i}[j]/\sum_{j}w_{i}[j])\geq v_{\textup{SNR}_{\textup{dB}}}$ . The training goal is to minimize the cost $L(\theta):=\mathbb{E}_{(y,x)}[\textup{ce}(f_{\theta}(y),\textup{u}_{n}(\mathcal{I}))]$ in step 13, where $\textup{u}_{n}(\mathcal{I})$ is an $(n-1)$ -dimensional vector $v\in\mathbb{R}^{n-1}$ whose support is the set $\mathcal{I}$ and whose nonzero elements all equal $1/|\mathcal{I}|$ , and $\textup{ce}(v_{1},v_{2}):=-\frac{1}{g}\sum_{i=1}^{g}v_{2}[i]\log\,v_{1}[i]$ is the cross-entropy between vectors $v_{1}$ and $v_{2}$ of dimension $g$ . The training parameter $\theta$ is updated to minimize $L(\theta)$ in steps 14 and 15 through $\textup{Update}_{\theta}(L(\theta),\eta)$ with learning rate $\eta$ .
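The target vector and the cost defined above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `uniform_target` and `cross_entropy`, the zero-based indexing, and the small constant `eps` that guards against $\log 0$ are assumptions introduced here.

```python
import numpy as np

def uniform_target(support, dim):
    """u(I): vector of length dim with value 1/|I| on the index set I, zero elsewhere."""
    v = np.zeros(dim)
    v[np.asarray(support)] = 1.0 / len(support)
    return v

def cross_entropy(v1, v2, eps=1e-12):
    """ce(v1, v2) = -(1/g) * sum_i v2[i] * log(v1[i]); eps is an added safeguard."""
    g = v1.shape[0]
    return -np.sum(v2 * np.log(v1 + eps)) / g

# Toy example with g = 4 and target support I = {0, 2}
target = uniform_target([0, 2], dim=4)

matched = np.array([0.5, 0.0, 0.5, 0.0])      # prediction concentrated on I
spread = np.array([0.25, 0.25, 0.25, 0.25])   # prediction uniform over all indices

loss_matched = cross_entropy(matched, target)  # approximately log(2) / 4
loss_spread = cross_entropy(spread, target)    # approximately log(4) / 4
```

A prediction that places its mass on the target set attains the lower cost, which is the behavior the training objective rewards.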

-B Setting environment for the experiments in Section V

To train the network given by $f_{\theta}(\cdot)$ , we used Algorithm 5 with the input $(k_{1},k_{2},s_{b},n_{b},n_{e})$ set to $(2,20,10^{6},250,40)$ and $v_{\textup{SNR}_{\textup{dB}}}$ fixed to the SNR. In addition, we used RMSprop optimization to update the parameters, with learning rate $\eta_{i}$ equal to $0.0001$ for epochs $i\leq 10$ and to $0.0001/4^{j}$ for epochs $i$ from $1+10j$ to $10+10j$ ( $j\in\{1:3\}$ ).
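The step-wise learning rate schedule described above can be written compactly. The function name `learning_rate` and its parameter names are illustrative assumptions; the numerical values follow the text, with epochs indexed from 1.

```python
def learning_rate(epoch, base=1e-4, step=10, decay=4.0):
    """Learning rate for a 1-indexed epoch: base / decay**j on epochs 1+step*j .. step+step*j."""
    j = (epoch - 1) // step  # j = 0 for epochs 1-10, 1 for 11-20, and so on
    return base / decay ** j

# Learning rates for the 40 epochs used in the experiments
schedule = [learning_rate(i) for i in range(1, 41)]
```

The rate stays at $10^{-4}$ for the first ten epochs and is divided by 4 at the start of each subsequent block of ten epochs.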

Bibliography

[1] K. Jaganathan, S. Oymak, and B. Hassibi, “Recovery of sparse 1-D signals from the magnitudes of their Fourier transform,” in Proceedings of IEEE International Symposium on Information Theory, 2012, pp. 1473–1477.
[2] R. P. Millane, “Phase retrieval in crystallography and optics,” Journal of the Optical Society of America A, vol. 7, no. 3, pp. 394–411, 1990.
[3] Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, “Phase retrieval with application to optical imaging: A contemporary overview,” IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 87–109, 2015.
[4] V. Y. Katkovnik and K. Egiazarian, “Sparse superresolution phase retrieval from phase-coded noisy intensity patterns,” Optical Engineering, vol. 56, no. 9, p. 094103, 2017.
[5] B. Baykal, “Blind channel estimation via combining autocorrelation and blind phase estimation,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, no. 6, pp. 1125–1131, 2004.
[6] M. Stefik, “Inferring DNA structures from segmentation data,” Artificial Intelligence, vol. 11, no. 1-2, pp. 85–114, 1978.
[7] Y. Shechtman, A. Beck, and Y. C. Eldar, “GESPAR: Efficient phase retrieval of sparse signals,” IEEE Transactions on Signal Processing, vol. 62, no. 4, pp. 928–938, 2014.
[8] S. Mukherjee and C. S. Seelamantula, “An iterative algorithm for phase retrieval with sparsity constraints: Application to frequency domain optical coherence tomography,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 2012, pp. 553–556.