Perturbed Amplitude Flow for Phase Retrieval

Bing Gao; Xinwei Sun; Yang Wang; Zhiqiang Xu

arXiv:1904.10307·math.NA·October 15, 2020·IEEE Trans. Signal Process.

TL;DR

This paper introduces Perturbed Amplitude Flow (PAF), a simple, efficient non-convex algorithm for phase retrieval that guarantees linear convergence with optimal measurements and is validated through simulations and image experiments.

Contribution

The paper presents PAF, a novel non-convex phase retrieval algorithm with proven recovery guarantees and linear convergence, requiring no truncation or re-weighting.

Findings

01. PAF recovers signals with $O(n)$ measurements, which is the optimal order.

02. PAF converges linearly from a designed initial point.

03. The effectiveness of PAF is validated through simulations and natural image experiments.

Abstract

In this paper, we propose a new non-convex algorithm for solving the phase retrieval problem, i.e., the reconstruction of a signal ${\mathbf{x}}\in{\mathbb{H}}^{n}$ (${\mathbb{H}}={\mathbb{R}}$ or ${\mathbb{C}}$) from phaseless samples $b_{j}=\lvert\langle{\mathbf{a}}_{j},{\mathbf{x}}\rangle\rvert$, $j=1,\ldots,m$. The proposed algorithm solves a newly proposed model for phase retrieval, the perturbed amplitude-based model, and is correspondingly named Perturbed Amplitude Flow (PAF). We prove that PAF can recover $c{\mathbf{x}}$ ($\lvert c\rvert=1$) under $\mathcal{O}(n)$ Gaussian random measurements (the optimal order of measurements). Starting from a designed initial point, the PAF algorithm converges iteratively to the true solution at a linear rate for both real and complex signals. Moreover, the PAF algorithm needs no truncation or re-weighting procedure, so it is simple to implement. The effectiveness and benefit of the proposed method are validated by…

Tables

Table I: Iterations and elapsed time.

Algorithm	Relative error	Iter	Time(s)
WF	$1 \times 10^{- 5}$	172	348.36
WF	$1 \times 10^{- 10}$	302	606.71
TWF	$1 \times 10^{- 5}$	51	320.53
TWF	$1 \times 10^{- 10}$	118	691.39
TAF	$1 \times 10^{- 5}$	37	118.06
TAF	$1 \times 10^{- 10}$	84	250.23
RAF	$1 \times 10^{- 5}$	37	124.99
RAF	$1 \times 10^{- 10}$	84	271.40
PAF	$1 \times 10^{- 5}$	37	97.27
PAF	$1 \times 10^{- 10}$	84	224.23
AF	$1 \times 10^{- 5}$	37	87.65
AF	$1 \times 10^{- 10}$	84	189.24

Equations

$$b_{j}=\lvert\langle{\mathbf{a}}_{j},{\mathbf{x}}\rangle\rvert,\quad j=1,\ldots,m.$$

$$\max_{{\mathbf{z}}}\ {\rm Re}\big(\langle{\mathbf{z}},\hat{{\mathbf{z}}}\rangle\big)\quad\text{subject to}\quad\lvert\langle{\mathbf{a}}_{j},{\mathbf{z}}\rangle\rvert\leq b_{j},$$

$$\min_{{\mathbf{z}}}\ g({\mathbf{z}}):=\frac{1}{4m}\sum_{j=1}^{m}\left(|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}-b_{j}^{2}\right)^{2},$$

$$\min_{{\mathbf{z}}}\ f({\mathbf{z}}):=\frac{1}{2m}\sum_{j=1}^{m}\left(|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|-b_{j}\right)^{2},$$

$$\min_{{\mathbf{z}}}\ h({\mathbf{z}}):=-\sum_{j=1}^{m}\left(b_{j}^{2}\log\big(|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}\big)-|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}\right).$$

$$\min_{{\mathbf{z}}}\ f_{{\bm{\epsilon}}}({\mathbf{z}}):=\min_{{\mathbf{z}}}\ \frac{1}{m}\sum_{j=1}^{m}\left(\sqrt{|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}+\epsilon_{j}^{2}}-\sqrt{b_{j}^{2}+\epsilon_{j}^{2}}\right)^{2},$$

$$\epsilon_{j}\neq 0\ \text{ for all }\ b_{j}\neq 0.$$

$$\nabla f({\mathbf{z}})=\frac{1}{m}\sum_{j=1}^{m}\left({\mathbf{a}}_{j}{\mathbf{a}}_{j}^{\top}{\mathbf{h}}+{\mathbf{a}}_{j}|{\mathbf{a}}_{j}^{\top}{\mathbf{x}}|\Big(\frac{{\mathbf{a}}_{j}^{\top}{\mathbf{x}}}{|{\mathbf{a}}_{j}^{\top}{\mathbf{x}}|}-\frac{{\mathbf{a}}_{j}^{\top}{\mathbf{z}}}{|{\mathbf{a}}_{j}^{\top}{\mathbf{z}}|}\Big)\right),$$

$$\nabla f_{{\bm{\epsilon}}}({\mathbf{z}})=\frac{1}{m}\sum_{j=1}^{m}\left(1-\frac{\sqrt{b_{j}^{2}+\epsilon_{j}^{2}}}{\sqrt{|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}+\epsilon_{j}^{2}}}\right){\mathbf{a}}_{j}{\mathbf{a}}_{j}^{*}{\mathbf{z}}.$$

$$\text{dist}({\mathbf{z}},{\mathbf{x}})=\min_{\phi\in[0,2\pi)}\|{\mathbf{z}}-e^{i\phi}{\mathbf{x}}\|:=\|{\mathbf{z}}-e^{i\phi_{\mathbf{x}}({\mathbf{z}})}{\mathbf{x}}\|,$$

$$\phi_{\mathbf{x}}({\mathbf{z}}):=\mathop{\rm argmin}_{\phi\in[0,2\pi)}\|{\mathbf{z}}-e^{i\phi}{\mathbf{x}}\|.$$

$$\mathcal{S}_{\mathbf{x}}(\rho):=\Big\{{\mathbf{z}}\in{\mathbb{C}}^{n}:\text{dist}({\mathbf{z}},{\mathbf{x}})\leq\rho\|{\mathbf{x}}\|\Big\}.$$

$$Y=\frac{1}{m}\sum_{j=1}^{m}\left(\gamma-\exp(-b_{j}^{2}/\lambda^{2})\right){\mathbf{a}}_{j}{\mathbf{a}}_{j}^{*}$$

$$\lambda^{2}=\frac{1}{m}\sum_{j=1}^{m}b_{j}^{2}.$$

$$\text{dist}({\mathbf{z}}_{0},{\mathbf{x}})\leq\xi\|{\mathbf{x}}\|$$

$$f_{{\bm{\epsilon}}}({\mathbf{z}}):=\frac{1}{m}\sum_{j=1}^{m}\left(\sqrt{|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}+\epsilon_{j}^{2}}-\sqrt{b_{j}^{2}+\epsilon_{j}^{2}}\right)^{2}$$

$${\mathbf{z}}_{k+1}={\mathbf{z}}_{k}-\mu\nabla f_{{\bm{\epsilon}}}({\mathbf{z}}_{k}),$$

$$\|\nabla f_{{\bm{\epsilon}}}({\mathbf{z}})\|\leq(1+\delta)\cdot\text{dist}({\mathbf{z}},{\mathbf{x}})$$

$${\rm Re}\big(\langle\nabla f_{{\bm{\epsilon}}}({\mathbf{z}}),\,{\mathbf{z}}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}})}\rangle\big)\geq\beta_{\alpha}\cdot\text{dist}^{2}({\mathbf{z}},{\mathbf{x}})$$

$${\rm Re}\big(\langle\nabla f_{{\bm{\epsilon}}}({\mathbf{z}}),\,{\mathbf{z}}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}})}\rangle\big)=\frac{1}{m}\sum_{j=1}^{m}\left(1-\frac{\sqrt{b_{j}^{2}+\epsilon_{j}^{2}}}{\sqrt{|{\mathbf{a}}_{j}^{*}({\mathbf{x}}+{\mathbf{h}})|^{2}+\epsilon_{j}^{2}}}\right)\big(|{\mathbf{a}}_{j}^{*}{\mathbf{h}}|^{2}+{\rm Re}({\mathbf{h}}^{*}{\mathbf{a}}_{j}{\mathbf{a}}_{j}^{*}{\mathbf{x}})\big).$$

$$\text{dist}^{2}({\mathbf{z}}_{k+1},{\mathbf{x}})\leq\left(1-\beta_{\alpha}^{2}/1.001^{2}\right)\cdot\text{dist}^{2}({\mathbf{z}}_{k},{\mathbf{x}}).$$

$$\text{dist}({\mathbf{z}}_{k},{\mathbf{x}})\leq\frac{1}{10}\left(1-\frac{0.0107^{2}}{1.001^{2}}\right)^{k/2}\cdot\|{\mathbf{x}}\|.$$

$$\begin{aligned}
\text{dist}^{2}({\mathbf{z}}_{k+1},{\mathbf{x}})
&\leq\|{\mathbf{z}}_{k+1}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}\|^{2}\\
&=\|{\mathbf{z}}_{k}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}-\mu\nabla f_{{\bm{\epsilon}}}({\mathbf{z}}_{k})\|^{2}\\
&=\|{\mathbf{z}}_{k}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}\|^{2}-2\mu{\rm Re}\big(\langle\nabla f_{{\bm{\epsilon}}}({\mathbf{z}}_{k}),\,{\mathbf{z}}_{k}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}\rangle\big)+\mu^{2}\|\nabla f_{{\bm{\epsilon}}}({\mathbf{z}}_{k})\|^{2}\\
&\leq\|{\mathbf{z}}_{k}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}\|^{2}-2\mu\cdot\beta_{\alpha}\|{\mathbf{z}}_{k}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}\|^{2}+\mu^{2}\cdot 1.001^{2}\|{\mathbf{z}}_{k}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}\|^{2}\\
&=\big(1-\mu\cdot(2\beta_{\alpha}-1.001^{2}\mu)\big)\|{\mathbf{z}}_{k}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}\|^{2}\\
&=\left(1-\beta_{\alpha}^{2}/1.001^{2}\right)\cdot\text{dist}^{2}({\mathbf{z}}_{k},{\mathbf{x}}).
\end{aligned}$$

Code & Models

Repositories

Ford666/Perturbed-amplitude-flow

Full text

Perturbed amplitude flow for phase retrieval

B. Gao is with the School of Mathematical Sciences, Nankai University, Tianjin, China. X. Sun is with Microsoft Research Asia. Y. Wang is with the Department of Mathematics, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong. Z. Xu is with Inst. Comp. Math., Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100091, China, and the School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China. Yang Wang was supported in part by the Hong Kong Research Grant Council grants 16306415 and 16308518. Zhiqiang Xu was supported by NSFC grant (91630203, 11688101) and Beijing Natural Science Foundation (Z180002).

Bing Gao, Xinwei Sun, Yang Wang, Zhiqiang Xu

Abstract

In this paper, we propose a new non-convex algorithm for solving the phase retrieval problem, i.e., the reconstruction of a signal ${\mathbf{x}}\in{\mathbb{H}}^{n}$ (${\mathbb{H}}={\mathbb{R}}$ or ${\mathbb{C}}$) from phaseless samples $b_{j}=\lvert\langle{\mathbf{a}}_{j},{\mathbf{x}}\rangle\rvert$, $j=1,\ldots,m$. The proposed algorithm solves a newly proposed model for phase retrieval, the perturbed amplitude-based model, and is correspondingly named Perturbed Amplitude Flow (PAF). We prove that PAF can recover $c{\mathbf{x}}$ ($\lvert c\rvert=1$) under $\mathcal{O}(n)$ Gaussian random measurements (the optimal order of measurements). Starting from a designed initial point, the PAF algorithm converges iteratively to the true solution at a linear rate for both real and complex signals. Moreover, the PAF algorithm needs no truncation or re-weighting procedure, so it is simple to implement. The effectiveness and benefit of the proposed method are validated by both simulation studies and an experiment on recovering natural images.

Index Terms:

Phase retrieval, Perturbed amplitude flow, Linear convergence.

I Introduction

I-A Problem Setup and Related Work

In this paper, we consider the well-known phase retrieval problem, which aims to recover a signal ${\mathbf{x}}\in{\mathbb{H}}^{n}$ , where ${\mathbb{H}}={\mathbb{R}}$ or ${\mathbb{C}}$ , from phaseless measurements

$$b_{j}=\lvert\langle{\mathbf{a}}_{j},{\mathbf{x}}\rangle\rvert,\quad j=1,\ldots,m.$$

Here ${\mathbf{x}}\in{\mathbb{H}}^{n}$ is the target signal (or target vector) and the vectors ${\mathbf{a}}_{j}\in{\mathbb{H}}^{n}$, $j=1,\ldots,m$, are the measurement vectors. Phase retrieval has many applications in science and engineering, such as X-ray crystallography [1, 2], astronomy [3], optics [4, 5] and microscopy [6].

Due to the removal of phase information in the measurements $|\langle{\mathbf{a}}_{j},{\mathbf{x}}\rangle|$ , we can only recover ${\mathbf{x}}$ up to a unimodular constant. Moreover, it is also known that $\mathcal{O}(n)$ general measurements are enough to recover a signal ${\mathbf{x}}\in{\mathbb{H}}^{n}$ uniquely. Particularly, it was shown that $m\geq 2n-1$ and $m\geq 4n-4$ generic measurements $\{{\mathbf{a}}_{j}\}_{j=1}^{m}\subset{\mathbb{H}}^{n}$ are sufficient to recover any ${\mathbf{x}}\in{\mathbb{H}}^{n}$ up to a unimodular constant for ${\mathbb{H}}={\mathbb{R}}$ and ${\mathbb{H}}={\mathbb{C}}$ , respectively [7, 8, 9].

The original phase retrieval problem mainly concerns the recovery of a signal from the magnitude of its Fourier transform [10] or of its short-time Fourier transform [11, 12, 13]. Meanwhile, many algorithms have been developed for the general setting with random measurements, and these also serve as heuristics in practical applications. They can be roughly divided into two categories: convex methods and non-convex ones. For convex methods, the general strategy is to lift the phase retrieval problem into the recovery of a rank-one matrix and to solve it by semi-definite programming. The first such method, PhaseLift [14, 15, 16], achieves exact recovery using $m=\mathcal{O}(n)$ independent Gaussian random measurements ${\mathbf{a}}_{j}$, $j=1,\ldots,m$. However, this approach is computationally inefficient for high-dimensional problems, since semi-definite programming over $n\times n$ matrices is slow for large $n$. An alternative method called PhaseMax [17, 18, 19] aims to recover the signal ${\mathbf{x}}$ by solving the model

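$$\max_{{\mathbf{z}}}\ {\rm Re}\big(\langle{\mathbf{z}},\hat{{\mathbf{z}}}\rangle\big)\quad\text{subject to}\quad\lvert\langle{\mathbf{a}}_{j},{\mathbf{z}}\rangle\rvert\leq b_{j},$$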

where $\hat{{\mathbf{z}}}$ is an approximation to the true signal ${\mathbf{x}}$ . It is proved that this method can recover ${\mathbf{x}}$ with high probability when $m\geq 4n/\theta$ where $\theta=1-\frac{2}{\pi}\text{angle}(\langle\hat{{\mathbf{z}}},{\mathbf{x}}\rangle)$ . However, numerical experiments have shown that larger oversampling ratios $m/n$ are often required for exact recovery, especially compared to several non-convex algorithms.

In a different direction, a series of non-convex approaches have been proposed and studied. Among such schemes, the early studies are based on alternating projections, including the works of Gerchberg and Saxton [20] and Fienup [21]. These methods often perform well numerically but lack theoretical foundations. Motivated by the success of alternating minimization, Netrapalli et al. [22] developed the AltMinPhase method, which is shown to achieve linear convergence with $\mathcal{O}(n\log^{3}n)$ Gaussian random measurements and resampling. More recently, Waldspurger [23] improved the sample complexity to $\mathcal{O}(n)$ Gaussian random measurements in the complex field under a carefully chosen initial point. However, such alternating projection-based approaches suffer from higher computational complexity because of the projection step. Another framework has since been proposed, in which one starts from a “good” initial guess and iteratively refines it by solving a given model such as the intensity-based model [24, 25]

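$$\min_{{\mathbf{z}}}\ g({\mathbf{z}}):=\frac{1}{4m}\sum_{j=1}^{m}\left(|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}-b_{j}^{2}\right)^{2},\tag{2}$$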

or the amplitude-based model [26, 27, 28, 29]

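$$\min_{{\mathbf{z}}}\ f({\mathbf{z}}):=\frac{1}{2m}\sum_{j=1}^{m}\left(|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|-b_{j}\right)^{2},\tag{3}$$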

or the Poisson likelihood model [30]

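$$\min_{{\mathbf{z}}}\ h({\mathbf{z}}):=-\sum_{j=1}^{m}\left(b_{j}^{2}\log\big(|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}\big)-|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}\right).\tag{4}$$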

Among the algorithms proposed for the intensity-based model (2), Candès et al. developed the Wirtinger Flow method (WF) [24], which recovers ${\mathbf{x}}$ via gradient descent. It achieves provable linear convergence with $m=\mathcal{O}(n\log n)$ Gaussian random measurements under a carefully chosen initialization. Sun et al. [31] proved that (2) has a benign geometric landscape under $\mathcal{O}(n\mathrm{poly}(\log{n}))$ Gaussian measurements, which motivates a trust-region method that avoids spurious local minimizers. In addition, Ma et al. [32] proved the “nice” geometry of (2) under Gaussian random measurements, explaining the favorable performance of unregularized gradient descent. Such geometric properties guarantee the success of gradient descent for this non-convex phase retrieval problem. More recently, in the real field ${\mathbb{H}}={\mathbb{R}}$ the number of measurements was reduced to $m=\mathcal{O}(n)$ by solving the amplitude-based model (3) via gradient descent [27], truncated gradient descent [26] or reweighted gradient descent [29], or by solving the Poisson likelihood model (4) via modified gradient descent [30]. In detail, Zhang et al. [33] proposed Reshaped Wirtinger Flow, which is called Amplitude Flow (AF) in this paper to match the model used, to solve model (3) by gradient descent. Wang et al. [26] proposed Truncated Amplitude Flow (TAF), which solves model (3) by truncated gradient descent. Wang et al. [29] designed Reweighted Amplitude Flow (RAF), which solves model (3) via reweighted gradient descent. Chen and Candès [30] designed Truncated Wirtinger Flow (TWF), which solves model (4) by modified gradient descent.

From the perspective of theoretical analysis, AF, TAF, RAF and TWF can all achieve linear convergence under the optimal order of measurements. Unlike the truncation-based methods (e.g., TAF [26], TWF [30]), which remove the components that have too much influence on the search direction, RAF [29] uses a re-weighting procedure that controls such components by reducing their weights at each update. Instead of using truncation or re-weighting to obtain reliable gradients, the AF method [27] performs gradient descent directly. However, the analysis of AF relies on the fact that $\text{sign}(\langle{\mathbf{a}}_{j},{\mathbf{z}}\rangle)$ equals $-1$ or $1$, which holds only in the real number field. This fact is also required by TAF, TWF and RAF. Thus the theoretical results cannot be trivially extended to the complex case.

In this paper, we introduce a new perturbed amplitude-based model to address these theoretical deficiencies and limitations in this framework.

I-B Our Contribution: The Perturbed Amplitude Flow (PAF)

We propose the Perturbed Amplitude Flow (PAF) algorithm in this paper through the following model:

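$$\min_{{\mathbf{z}}}\ f_{{\bm{\epsilon}}}({\mathbf{z}}):=\min_{{\mathbf{z}}}\ \frac{1}{m}\sum_{j=1}^{m}\left(\sqrt{|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}+\epsilon_{j}^{2}}-\sqrt{b_{j}^{2}+\epsilon_{j}^{2}}\right)^{2},\tag{5}$$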

where ${\bm{\epsilon}}=[\epsilon_{1},\ldots,\epsilon_{m}]\in{\mathbb{R}}^{m}$ has prescribed values, with the requirement that

$$\epsilon_{j}\neq 0\ \text{ for all }\ b_{j}\neq 0.\tag{6}$$

Note that if $b_{j}=0$ , then $\left(\sqrt{|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}+\epsilon_{j}^{2}}-\sqrt{b_{j}^{2}+\epsilon_{j}^{2}}\right)^{2}$ is smooth regardless of the value of $\epsilon_{j}$ , even when $\epsilon_{j}=0$ . The loss function $f_{{\bm{\epsilon}}}$ is thus smooth. When all $\epsilon_{j}=0$ , this model is reduced to the classic amplitude-based model (3). So we shall name it as the perturbed amplitude-based model and name the corresponding gradient descent method as Perturbed Amplitude Flow (PAF).

In the perturbed amplitude-based model (5), ${\bm{\epsilon}}$ not only keeps the loss function smooth but also plays a role similar to truncation or re-weighting by reducing the effect of bad observations. From the previous work [26, 29], we know that only the gradients associated with sizable $|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|/|{\mathbf{a}}_{j}^{*}{\mathbf{x}}|$ offer meaningful directions. In detail, for the model (3) with ${\mathbb{H}}={\mathbb{R}}$, the Wirtinger derivative of $f$ with respect to ${\mathbf{z}}$ is

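$$\nabla f({\mathbf{z}})=\frac{1}{m}\sum_{j=1}^{m}\left({\mathbf{a}}_{j}{\mathbf{a}}_{j}^{\top}{\mathbf{h}}+{\mathbf{a}}_{j}|{\mathbf{a}}_{j}^{\top}{\mathbf{x}}|\Big(\frac{{\mathbf{a}}_{j}^{\top}{\mathbf{x}}}{|{\mathbf{a}}_{j}^{\top}{\mathbf{x}}|}-\frac{{\mathbf{a}}_{j}^{\top}{\mathbf{z}}}{|{\mathbf{a}}_{j}^{\top}{\mathbf{z}}|}\Big)\right),$$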

with ${\mathbf{h}}={\mathbf{z}}-{\mathbf{x}}$. Note that the first term ${\mathbf{a}}_{j}{\mathbf{a}}_{j}^{\top}{\mathbf{h}}$ points in a desirable direction, whereas the second term ${\mathbf{a}}_{j}|{\mathbf{a}}_{j}^{\top}{\mathbf{x}}|\Big(\frac{{\mathbf{a}}_{j}^{\top}{\mathbf{x}}}{|{\mathbf{a}}_{j}^{\top}{\mathbf{x}}|}-\frac{{\mathbf{a}}_{j}^{\top}{\mathbf{z}}}{|{\mathbf{a}}_{j}^{\top}{\mathbf{z}}|}\Big)$ has a negative influence, which is reduced when ${\mathbf{a}}_{j}^{\top}{\mathbf{z}}$ shares the same sign as ${\mathbf{a}}_{j}^{\top}{\mathbf{x}}$. TAF [26] established that, in the real case, the terms with inconsistent signs are usually those with small $|{\mathbf{a}}_{j}^{\top}{\mathbf{z}}|$, which motivates a truncation scheme that drops the terms with small $|{\mathbf{a}}_{j}^{\top}{\mathbf{z}}|/|{\mathbf{a}}_{j}^{\top}{\mathbf{x}}|$. Instead of discarding those gradients, RAF [29] uses a re-weighting procedure to reduce the influence of those components. However, these analyses rely heavily on the sign of each element being $1$ or $-1$, and are therefore hard to extend to the complex case.

For our model, with a suitable choice of ${\bm{\epsilon}}$ , one can control the size of the gradient. This is essential for avoiding the extremely large gradient components. More precisely, note that the Wirtinger derivative of $f_{{\bm{\epsilon}}}$ with respect to ${\mathbf{z}}$ is

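$$\nabla f_{{\bm{\epsilon}}}({\mathbf{z}})=\frac{1}{m}\sum_{j=1}^{m}\left(1-\frac{\sqrt{b_{j}^{2}+\epsilon_{j}^{2}}}{\sqrt{|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}+\epsilon_{j}^{2}}}\right){\mathbf{a}}_{j}{\mathbf{a}}_{j}^{*}{\mathbf{z}}.$$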

The magnitude of $\nabla f_{{\bm{\epsilon}}}({\mathbf{z}})$ is under control even when $|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|/|{\mathbf{a}}_{j}^{*}{\mathbf{x}}|$ is very small. This avoids extreme gradient values during each update, so that each update moves in a desirable direction and the gradient satisfies the curvature condition introduced in Lemma II.3.
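As an illustration only, the perturbed loss and its Wirtinger derivative can be written in a few lines of NumPy. The function names `paf_loss` and `paf_grad`, the convention that the rows of `A` are ${\mathbf{a}}_{j}^{*}$, and the test dimensions are assumptions made for this sketch. The last lines compare the analytic gradient with a central finite difference, using the identity that the directional derivative of a real-valued $f$ along ${\mathbf{h}}$ equals $2\,{\rm Re}({\mathbf{h}}^{*}\nabla f)$ under this Wirtinger convention.

```python
import numpy as np

def paf_loss(A, b, eps, z):
    """f_eps(z) = (1/m) sum_j (sqrt(|a_j^* z|^2 + eps_j^2) - sqrt(b_j^2 + eps_j^2))^2.

    Rows of A are a_j^*, so A @ z stacks the inner products a_j^* z.
    """
    Az = A @ z
    s = np.sqrt(np.abs(Az) ** 2 + eps ** 2)
    c = np.sqrt(b ** 2 + eps ** 2)
    return np.mean((s - c) ** 2)

def paf_grad(A, b, eps, z):
    """Wirtinger gradient (1/m) sum_j (1 - c_j / s_j) a_j a_j^* z."""
    Az = A @ z
    s = np.sqrt(np.abs(Az) ** 2 + eps ** 2)
    c = np.sqrt(b ** 2 + eps ** 2)
    # Assumes s_j > 0, which holds whenever eps_j != 0.
    return A.conj().T @ ((1.0 - c / s) * Az) / A.shape[0]

# Small synthetic instance (sizes chosen arbitrarily for the check).
rng = np.random.default_rng(0)
n, m = 8, 64
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = np.abs(A @ x)
eps = b.copy()                                   # eps = b, i.e. alpha = 1
z = x + 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
h = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Central finite difference of t -> f_eps(z + t h) versus 2 Re(h^* grad).
t = 1e-6
numeric = (paf_loss(A, b, eps, z + t * h) - paf_loss(A, b, eps, z - t * h)) / (2 * t)
analytic = 2.0 * np.real(np.vdot(h, paf_grad(A, b, eps, z)))
print(numeric, analytic)
```

The two printed numbers should agree to several digits; at ${\mathbf{z}}={\mathbf{x}}$ both the loss and the gradient vanish.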

In short, the truncation-based methods (TAF, TWF) use truncation to discard the spurious components, and RAF uses re-weighting to reduce the effect of “bad” gradients. In contrast, PAF controls these components by adding the perturbation term ${\bm{\epsilon}}$, which avoids extreme values during each update and frees the method from any truncation or re-weighting procedure. Moreover, the perturbation and its benefit apply to both the real and the complex field, so our theoretical analysis covers the complex case as well.
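One way to see the effect of the perturbation is to look at the scalar weight that multiplies ${\mathbf{a}}_{j}{\mathbf{a}}_{j}^{*}{\mathbf{z}}$ in the gradient, namely $1-\sqrt{b_{j}^{2}+\epsilon_{j}^{2}}/\sqrt{|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}+\epsilon_{j}^{2}}$. The sketch below assumes $\epsilon_{j}=\sqrt{\alpha}\,b_{j}$ and writes the weight as a function of the ratio $r=|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|/b_{j}$; the helper names are hypothetical. With $\alpha=0$ (the unperturbed amplitude model) the weight is $1-1/r$, which is unbounded as $r\to 0$, whereas for $\alpha>0$ it never drops below $1-\sqrt{1+\alpha}/\sqrt{\alpha}$.

```python
import numpy as np

def af_weight(r):
    """Weight 1 - b/|a^* z| of the unperturbed model, with r = |a^* z| / b."""
    return 1.0 - 1.0 / r

def paf_weight(r, alpha):
    """Weight 1 - sqrt(b^2 + eps^2)/sqrt(|a^* z|^2 + eps^2) with eps = sqrt(alpha) * b."""
    return 1.0 - np.sqrt(1.0 + alpha) / np.sqrt(r ** 2 + alpha)

alpha = 1.0                                   # corresponds to eps = b
r = np.array([1e-6, 1e-3, 1e-1, 1.0, 10.0])   # ratios |a^* z| / b to inspect
w_af = af_weight(r)
w_paf = paf_weight(r, alpha)
lower = 1.0 - np.sqrt(1.0 + alpha) / np.sqrt(alpha)   # limit of the PAF weight as r -> 0

for ratio, wa, wp in zip(r, w_af, w_paf):
    print(f"r={ratio:g}  AF weight={wa:.3e}  PAF weight={wp:.3e}")
```

Both weights vanish at $r=1$, where the measurement is matched exactly; they differ mainly for small $r$, which is the regime that truncation and re-weighting are designed to handle.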

Numerical tests show that our proposed algorithm outperforms AF ( ${\bm{\epsilon}}={\mathbf{0}}$ ) in terms of success rate for real signals, as shown in Figure 2. Besides, using vanilla gradient descent to solve the perturbed amplitude-based model (5), we can achieve linear convergence with $m=\mathcal{O}(n)$ measurements for both real and complex signals (see Section II). The result improves upon the WF method, which uses $m=\mathcal{O}(n\log n)$ measurements, or the AF method, which can be theoretically proved only for real signals, or the TWF, TAF, RAF methods, which need truncation or re-weighted procedure during each iteration.

In summary, compared with the previous algorithms for solving model (3) or (4), the PAF method needs no truncation or re-weighting at all, and its convergence result holds for both real and complex signals. Numerical experiments show that the proposed PAF method is computationally comparable to, and slightly more efficient than, TAF and RAF, and significantly more efficient than TWF (see Section III). We believe the reason is that truncated/re-weighted methods such as TWF, TAF and RAF incur additional computational cost in evaluating the gradient components.

I-C Notations

Let ${\mathbf{x}}\in{\mathbb{H}}^{n}$ $({\mathbb{H}}={\mathbb{C}}$ or ${\mathbb{H}}={\mathbb{R}})$ be the target signal. Throughout this paper, we assume that ${\mathbf{a}}_{j}\in{\mathbb{H}}^{n}$ , $j=1,\ldots,m$ are $m$ independent and identically distributed standard Gaussian random measurement vectors, i.e. ${\mathbf{a}}_{j}\sim\mathcal{N}(0,I)$ for ${\mathbb{H}}={\mathbb{R}}$ and ${\mathbf{a}}_{j}\sim\mathcal{N}(0,I/2)+i\mathcal{N}(0,I/2)$ for ${\mathbb{H}}={\mathbb{C}}$ . For each measurement ${\mathbf{a}}_{j}$ , we obtain $b_{j}=|{\mathbf{a}}_{j}^{*}{\mathbf{x}}|$ . We shall attempt to recover the original signal ${\mathbf{x}}$ from $b_{j}$ , $j=1,\ldots,m$ by solving the perturbed amplitude-based model (5). In this paper, we use $C$ , $c$ or the subscript/superscript form of them to represent constants and their values vary according to the context. Since for phase retrieval the best we can do is to recover the target signal ${\mathbf{x}}$ up to a global phase/sign, we use the following definition for distance between two vectors ${\mathbf{x}},{\mathbf{z}}\in{\mathbb{H}}^{n}$ :

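$$\text{dist}({\mathbf{z}},{\mathbf{x}})=\min_{\phi\in[0,2\pi)}\|{\mathbf{z}}-e^{i\phi}{\mathbf{x}}\|:=\|{\mathbf{z}}-e^{i\phi_{\mathbf{x}}({\mathbf{z}})}{\mathbf{x}}\|,$$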

where

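$$\phi_{\mathbf{x}}({\mathbf{z}}):=\mathop{\rm argmin}_{\phi\in[0,2\pi)}\|{\mathbf{z}}-e^{i\phi}{\mathbf{x}}\|.\tag{8}$$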

For any $\rho\geq 0$ , we define the $\rho$ -neighborhood of ${\mathbf{x}}$ as

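$$\mathcal{S}_{\mathbf{x}}(\rho):=\Big\{{\mathbf{z}}\in{\mathbb{C}}^{n}:\text{dist}({\mathbf{z}},{\mathbf{x}})\leq\rho\|{\mathbf{x}}\|\Big\}.$$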
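The distance up to a global phase has a closed form that is convenient in code: expanding $\|{\mathbf{z}}-e^{i\phi}{\mathbf{x}}\|^{2}=\|{\mathbf{z}}\|^{2}+\|{\mathbf{x}}\|^{2}-2{\rm Re}(e^{-i\phi}{\mathbf{x}}^{*}{\mathbf{z}})$ shows that the minimizing phase is $\arg({\mathbf{x}}^{*}{\mathbf{z}})$. The following is a minimal sketch with hypothetical function names; `np.angle` returns a value in $(-\pi,\pi]$ rather than $[0,2\pi)$, which describes the same phase factor.

```python
import numpy as np

def optimal_phase(z, x):
    """Phase phi minimizing ||z - exp(i*phi) x||, namely arg(x^* z)."""
    return np.angle(np.vdot(x, z))        # np.vdot conjugates its first argument

def dist(z, x):
    """Distance between z and x up to a global phase (a sign for real vectors)."""
    return np.linalg.norm(z - np.exp(1j * optimal_phase(z, x)) * x)

def in_neighborhood(z, x, rho):
    """Membership test for the rho-neighborhood {z : dist(z, x) <= rho * ||x||}."""
    return dist(z, x) <= rho * np.linalg.norm(x)

rng = np.random.default_rng(0)
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)
z_rotated = np.exp(0.7j) * x              # same signal up to a global phase
w = x + 0.05 * (rng.standard_normal(6) + 1j * rng.standard_normal(6))
print(dist(z_rotated, x), dist(w, x), in_neighborhood(w, x, 0.1))
```

A brute-force search over a grid of phases can serve as a sanity check: no grid point should give a smaller value than `dist`.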

II Perturbed Amplitude Flow Algorithm

II-A Initialization

To avoid iterations getting trapped in undesirable stationary points, a proper initialization is essential to any non-convex optimization problem. To achieve this goal, many initialization methods have been proposed, such as the spectral initialization method [24], a modified spectral initialization method [30] and the null initialization method [26]. These methods are all based on finding the eigenvector corresponding to the largest eigenvalue of a specially designed Hermitian matrix.

Here we adopt the initialization strategy given in [25], which is shown to provide a good initial guess under $\mathcal{O}(n)$ measurements. With this strategy, the initial guess ${\mathbf{z}}_{0}$ is obtained by calculating the eigenvector corresponding to the largest eigenvalue of the Hermitian matrix

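$$Y=\frac{1}{m}\sum_{j=1}^{m}\left(\gamma-\exp(-b_{j}^{2}/\lambda^{2})\right){\mathbf{a}}_{j}{\mathbf{a}}_{j}^{*}$$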

with $\gamma=1/2$ for ${\mathbb{H}}={\mathbb{C}}$ or $\gamma=1/\sqrt{3}$ for ${\mathbb{H}}={\mathbb{R}}$ , and normalized to $\|{\mathbf{z}}_{0}\|=\lambda$ , where $\lambda$ is defined by

$$\lambda^{2}=\frac{1}{m}\sum_{j=1}^{m}b_{j}^{2}.$$

Lemma II.1 ([25]).

Let ${\mathbf{z}}_{0}$ be the above initial guess. For any $\xi>0$ , there exists a $C_{\xi}>0$ such that for $m\geq C_{\xi}n$ ,

$$\text{dist}({\mathbf{z}}_{0},{\mathbf{x}})\leq\xi\|{\mathbf{x}}\|$$

holds with probability at least $1-4\exp(-c_{\xi}n)$ .
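The initialization can be sketched as follows, assuming the matrix is $Y=\frac{1}{m}\sum_{j}(\gamma-\exp(-b_{j}^{2}/\lambda^{2})){\mathbf{a}}_{j}{\mathbf{a}}_{j}^{*}$ with $\lambda^{2}=\frac{1}{m}\sum_{j}b_{j}^{2}$, and that the rows of `A` are ${\mathbf{a}}_{j}^{*}$. The function name and the problem sizes are illustrative assumptions. For simplicity the sketch uses a dense Hermitian eigensolver (`np.linalg.eigh`); for large $n$ an iterative eigenvector method would be the natural substitute.

```python
import numpy as np

def paf_initialization(A, b, gamma):
    """Spectral initial guess: top eigenvector of Y, scaled to norm lambda.

    Rows of A are a_j^*; gamma is 1/2 for complex and 1/sqrt(3) for real signals.
    """
    m = A.shape[0]
    lam = np.sqrt(np.mean(b ** 2))
    weights = gamma - np.exp(-(b ** 2) / lam ** 2)
    Y = (A.conj().T * weights) @ A / m       # (1/m) sum_j w_j a_j a_j^*
    _, eigvecs = np.linalg.eigh(Y)           # eigenvalues are returned in ascending order
    return lam * eigvecs[:, -1]              # unit-norm top eigenvector times lambda

# Deliberately large oversampling so the example is well conditioned.
rng = np.random.default_rng(0)
n, m = 16, 4000
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = np.abs(A @ x)

z0 = paf_initialization(A, b, gamma=0.5)
sq = np.linalg.norm(z0) ** 2 + np.linalg.norm(x) ** 2 - 2.0 * abs(np.vdot(x, z0))
rel = np.sqrt(max(sq, 0.0)) / np.linalg.norm(x)   # dist(z0, x) / ||x||
print(rel)
```

The printed relative distance indicates how close the initial guess is to the target up to a global phase; the oversampling ratio here is far larger than what the experiments in the paper use and is chosen only to make the illustration robust.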

II-B Gradient Descent Iteration

After initialization to obtain ${\mathbf{z}}_{0}$ , we use gradient descent on the loss function $f_{{\bm{\epsilon}}}$ given in (5) by

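$$f_{{\bm{\epsilon}}}({\mathbf{z}}):=\frac{1}{m}\sum_{j=1}^{m}\left(\sqrt{|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}+\epsilon_{j}^{2}}-\sqrt{b_{j}^{2}+\epsilon_{j}^{2}}\right)^{2}$$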

to iteratively refine the estimation:

$${\mathbf{z}}_{k+1}={\mathbf{z}}_{k}-\mu\nabla f_{{\bm{\epsilon}}}({\mathbf{z}}_{k}),\tag{10}$$

where $\mu$ is the step size and $\nabla f_{{\bm{\epsilon}}}({\mathbf{z}})$ is the Wirtinger derivative of $f_{{\bm{\epsilon}}}({\mathbf{z}})$ with respect to ${\mathbf{z}}$ in complex variables ${\mathbf{z}},\overline{{\mathbf{z}}}$ which is defined as

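$$\nabla f_{{\bm{\epsilon}}}({\mathbf{z}})=\frac{1}{m}\sum_{j=1}^{m}\left(1-\frac{\sqrt{b_{j}^{2}+\epsilon_{j}^{2}}}{\sqrt{|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}+\epsilon_{j}^{2}}}\right){\mathbf{a}}_{j}{\mathbf{a}}_{j}^{*}{\mathbf{z}}.$$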

As simple as the scheme (10) may look, our main result proves that it can achieve linear convergence under the optimal order of measurements $m=\mathcal{O}(n)$ by choosing ${\bm{\epsilon}}=\sqrt{\alpha}{\mathbf{b}}$ for an appropriately chosen parameter $\alpha>0$ ( $0.37\leq\alpha\leq 29$ ).
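The iteration itself is short. The sketch below is a minimal NumPy version of the update ${\mathbf{z}}_{k+1}={\mathbf{z}}_{k}-\mu\nabla f_{{\bm{\epsilon}}}({\mathbf{z}}_{k})$ with ${\bm{\epsilon}}=\sqrt{\alpha}{\mathbf{b}}$, not the authors' implementation. The function names, the defaults $\alpha=1$ and $\mu=1$ (the values used in the success-rate experiment of Section III), the problem sizes and the artificial starting point (the target plus a perturbation of norm $0.05\|{\mathbf{x}}\|$, in place of the spectral initialization) are all assumptions made for illustration.

```python
import numpy as np

def paf(A, b, z0, alpha=1.0, mu=1.0, iters=500):
    """Gradient descent z <- z - mu * grad f_eps(z) with eps = sqrt(alpha) * b."""
    m = A.shape[0]
    eps2 = alpha * b ** 2
    c = np.sqrt(b ** 2 + eps2)
    z = z0.copy()
    for _ in range(iters):
        Az = A @ z                                # rows of A are a_j^*
        s = np.sqrt(np.abs(Az) ** 2 + eps2)       # positive whenever b_j > 0
        z = z - mu * (A.conj().T @ ((1.0 - c / s) * Az)) / m
    return z

def rel_dist(z, x):
    """dist(z, x) / ||x||, with dist taken up to a global phase."""
    phase = np.exp(1j * np.angle(np.vdot(x, z)))
    return np.linalg.norm(z - phase * x) / np.linalg.norm(x)

rng = np.random.default_rng(1)
n, m = 32, 512
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = np.abs(A @ x)

# Hypothetical starting point within distance 0.05 * ||x|| of the target.
d = rng.standard_normal(n) + 1j * rng.standard_normal(n)
z0 = x + 0.05 * np.linalg.norm(x) * d / np.linalg.norm(d)
z_hat = paf(A, b, z0)
print(rel_dist(z0, x), rel_dist(z_hat, x))
```

Each iteration costs two matrix-vector products with `A`, with no per-sample thresholding or weight bookkeeping beyond the scalar factor `1 - c / s`.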

Motivated by the technique used in WF, the proof of our main result is mainly based on the following two key lemmas, whose proofs are given in Section IV.

Lemma II.2.

Let ${\mathbf{x}}$ be the target signal and assume that ${\bm{\epsilon}}$ satisfies (6). For any $\delta>0$ , there exist constants $C_{\delta}$ , $c_{\delta}>0$ such that as long as $m\geq C_{\delta}n$ , then with probability at least $1-\exp(-c_{\delta}n)$ ,

$$\|\nabla f_{{\bm{\epsilon}}}({\mathbf{z}})\|\leq(1+\delta)\cdot\text{dist}({\mathbf{z}},{\mathbf{x}})$$

holds for every ${\mathbf{z}}\in{\mathbb{H}}^{n}$ satisfying ${\mathbf{z}}\in\mathcal{S}_{\mathbf{x}}(1/10)$.

This lemma implies that the gradient of $f_{{\bm{\epsilon}}}$ is well controlled in the neighborhood of the target signal ${\mathbf{x}}$ .

Lemma II.3.

Let ${\mathbf{x}}$ be the target signal and assume that ${\bm{\epsilon}}=\sqrt{\alpha}{\mathbf{b}}$ with $0.37\leq\alpha\leq 29$ . There exist positive constants $C,c,\beta_{\alpha}$ depending on $\alpha$ such that for any ${\mathbf{z}}\in\mathcal{S}_{\mathbf{x}}(1/10)$ and $m\geq Cn$ , we have

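$${\rm Re}\big(\langle\nabla f_{{\bm{\epsilon}}}({\mathbf{z}}),\,{\mathbf{z}}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}})}\rangle\big)\geq\beta_{\alpha}\cdot\text{dist}^{2}({\mathbf{z}},{\mathbf{x}})$$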

with probability at least $1-\exp(-cn)$ .

The constants in the lemma can, in theory, be explicitly estimated, although the theoretical estimates are typically “overkills” for practical applications, just like in other existing schemes. Later in Remark IV.1, we show more explicitly the relation between $\beta_{\alpha}$ and $\alpha$ . Particularly, by setting $\alpha=0.826$ , $\beta_{\alpha}=64/5945$ roughly reaches its largest value. For ${\bm{\epsilon}}=\sqrt{\alpha}{\mathbf{b}}$ with $\alpha\in[0.37,29]$ , Lemma II.3 guarantees sufficient descent along the search direction.

Set ${\mathbf{h}}:=e^{-i\phi_{\mathbf{x}}({\mathbf{z}})}{\mathbf{z}}-{\mathbf{x}}$ with $\rho=\|{\mathbf{h}}\|$ . Then

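$${\rm Re}\big(\langle\nabla f_{{\bm{\epsilon}}}({\mathbf{z}}),\,{\mathbf{z}}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}})}\rangle\big)=\frac{1}{m}\sum_{j=1}^{m}\left(1-\frac{\sqrt{b_{j}^{2}+\epsilon_{j}^{2}}}{\sqrt{|{\mathbf{a}}_{j}^{*}({\mathbf{x}}+{\mathbf{h}})|^{2}+\epsilon_{j}^{2}}}\right)\big(|{\mathbf{a}}_{j}^{*}{\mathbf{h}}|^{2}+{\rm Re}({\mathbf{h}}^{*}{\mathbf{a}}_{j}{\mathbf{a}}_{j}^{*}{\mathbf{x}})\big).$$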

The main technique in proving Lemma II.3 is that we first fix one ${\mathbf{z}}\in{\mathbb{C}}^{n}$ and then provide estimates separately for the cases $|{\mathbf{a}}_{j}^{*}{\mathbf{h}}|\geq\rho|{\mathbf{a}}_{j}^{*}{\mathbf{x}}|$ and $|{\mathbf{a}}_{j}^{*}{\mathbf{h}}|<\rho|{\mathbf{a}}_{j}^{*}{\mathbf{x}}|$. An $\eta$-net argument is then used to obtain uniform control over all ${\mathbf{z}}\in\mathcal{S}_{\mathbf{x}}(\rho)$.

Building on these two lemmas, we can now state and prove our main theorem, which establishes linear convergence of the PAF algorithm iteration (10).

Theorem II.1.

Under the conditions of Lemma II.3, let ${\mathbf{z}}_{k}$ , $k\in{\mathbb{Z}}_{+}$ be the iterations generated by (10) with $\mu=\beta_{\alpha}/1.001^{2}$ . Assume that ${\mathbf{z}}_{0}\in\mathcal{S}_{\mathbf{x}}(1/10)$ . Then there exist positive constants $C,c$ such that for $m\geq Cn$ , with probability at least $1-\exp(-cn)$ , the following holds for all $k\in{\mathbb{Z}}_{+}$

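$$\text{dist}^{2}({\mathbf{z}}_{k+1},{\mathbf{x}})\leq\left(1-\beta_{\alpha}^{2}/1.001^{2}\right)\cdot\text{dist}^{2}({\mathbf{z}}_{k},{\mathbf{x}}).$$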

In particular by taking $\alpha=0.826$ , with probability at least $1-\exp(-cn)$ , the following holds for all $k\in{\mathbb{Z}}_{+}$

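$$\text{dist}({\mathbf{z}}_{k},{\mathbf{x}})\leq\frac{1}{10}\left(1-\frac{0.0107^{2}}{1.001^{2}}\right)^{k/2}\cdot\|{\mathbf{x}}\|.$$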

Proof:

According to the update rule (10), Lemma II.2 and Lemma II.3, for $m\geq Cn$ , with probability at least $1-\exp(-cn)$ we have

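$$\begin{aligned}
\text{dist}^{2}({\mathbf{z}}_{k+1},{\mathbf{x}})
&\leq\|{\mathbf{z}}_{k+1}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}\|^{2}\\
&=\|{\mathbf{z}}_{k}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}-\mu\nabla f_{{\bm{\epsilon}}}({\mathbf{z}}_{k})\|^{2}\\
&=\|{\mathbf{z}}_{k}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}\|^{2}-2\mu{\rm Re}\big(\langle\nabla f_{{\bm{\epsilon}}}({\mathbf{z}}_{k}),\,{\mathbf{z}}_{k}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}\rangle\big)+\mu^{2}\|\nabla f_{{\bm{\epsilon}}}({\mathbf{z}}_{k})\|^{2}\\
&\leq\|{\mathbf{z}}_{k}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}\|^{2}-2\mu\cdot\beta_{\alpha}\|{\mathbf{z}}_{k}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}\|^{2}+\mu^{2}\cdot 1.001^{2}\|{\mathbf{z}}_{k}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}\|^{2}\\
&=\big(1-\mu\cdot(2\beta_{\alpha}-1.001^{2}\mu)\big)\|{\mathbf{z}}_{k}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}}_{k})}\|^{2}\\
&=\left(1-\beta_{\alpha}^{2}/1.001^{2}\right)\cdot\text{dist}^{2}({\mathbf{z}}_{k},{\mathbf{x}}).
\end{aligned}$$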

This establishes the linear convergence part of the theorem.

For the second part, we set $\alpha=0.826$ . Later in Remark IV.1, we show that one may take $\beta_{\alpha}=64/5945$ in $\mu=\beta_{\alpha}/1.001^{2}$ . Substituting these values in we thus obtain

[TABLE]

∎
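The step size $\mu=\beta_{\alpha}/1.001^{2}$ in the theorem is the value that minimizes the per-iteration factor obtained from the two lemmas (a gradient bound with constant $1.001$, i.e. $\delta=0.001$, and the curvature constant $\beta_{\alpha}$). A short worked calculation, written with the auxiliary function $q(\mu)$ introduced here only for illustration:

```latex
\begin{aligned}
q(\mu) &:= 1 - 2\beta_{\alpha}\mu + 1.001^{2}\mu^{2}, \\
q'(\mu) &= -2\beta_{\alpha} + 2\cdot 1.001^{2}\mu = 0
  \;\Longrightarrow\; \mu^{\star} = \frac{\beta_{\alpha}}{1.001^{2}}, \\
q(\mu^{\star}) &= 1 - \frac{2\beta_{\alpha}^{2}}{1.001^{2}} + \frac{\beta_{\alpha}^{2}}{1.001^{2}}
  = 1 - \frac{\beta_{\alpha}^{2}}{1.001^{2}} .
\end{aligned}
```

More generally, $q(\mu)<1$ for every $0<\mu<2\beta_{\alpha}/1.001^{2}$, so any step size in that range gives a contraction under the same two bounds, with $\mu^{\star}$ giving the smallest factor.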

As mentioned earlier, we can achieve ${\mathbf{z}}_{0}\in\mathcal{S}_{\mathbf{x}}(1/10)$ through the initialization given in Lemma II.1 by setting $\xi=1/10$. This also requires $m=\mathcal{O}(n)$ measurements. Thus the combination of Lemma II.1 and Theorem II.1 yields linear convergence of the PAF algorithm.
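To get a feeling for how conservative the theoretical constants are, one can plug $\beta_{\alpha}=64/5945$ into the guaranteed bound $\text{dist}({\mathbf{z}}_{k},{\mathbf{x}})\leq\frac{1}{10}(1-\beta_{\alpha}^{2}/1.001^{2})^{k/2}\|{\mathbf{x}}\|$ and count the iterations it requires for a given accuracy. The helper below is a hypothetical illustration of that arithmetic.

```python
import math

beta = 64 / 5945                     # beta_alpha quoted for alpha = 0.826
mu = beta / 1.001 ** 2               # step size used in Theorem II.1
q = 1.0 - beta ** 2 / 1.001 ** 2     # guaranteed per-iteration factor for dist^2

def iterations_needed(target, start=0.1):
    """Smallest k with start * q**(k/2) <= target, for relative errors."""
    return math.ceil(2.0 * math.log(target / start) / math.log(q))

k = iterations_needed(1e-5)
print(mu, q, k)
```

The count comes out on the order of $10^{5}$ iterations for a relative error of $10^{-5}$, whereas the experiments in Section III reach that accuracy with far fewer iterations and much larger step sizes. This is consistent with the remark that the theoretical estimates are “overkills” for practical applications.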

III Numerical Experiments

III-A Simulation Study

To evaluate the performance of our PAF algorithm, we present a series of simulated tests and compare them with WF, TWF, AF, TAF and RAF. We perform all the simulations under the same initialization procedure. All experiments are carried out on Matlab 2017b with a 2.3 GHz Intel Core i5-8259U and 16 GB memory.

First we plot the relative error for the recovery of a complex-valued signal, in logarithmic scale versus the iteration count for WF, TWF, AF, TAF, RAF and PAF. We choose $n=512$ with $m=4.5n$ i.i.d. Gaussian random measurements ${\mathbf{a}}_{1},{\mathbf{a}}_{2},\ldots,{\mathbf{a}}_{m}\in{\mathbb{C}}^{n}$ . For the initialization, we follow the method given in Section II-A with 50 power iterations. For the PAF algorithm we set ${\bm{\epsilon}}={\mathbf{b}}$ and fix the step size $\mu=2.5$ . Note that AF is equivalent to PAF algorithm with ${\bm{\epsilon}}={\mathbf{0}}$ . We also consider the case where the measurements are contaminated by noise, i.e. ${\mathbf{b}}=\lvert A{\mathbf{x}}\rvert+\omega$ where the noise $\omega$ follows distribution $\omega\sim\mathcal{N}(0,I/10)$ . The results are plotted in Figure 1. It shows that PAF, TWF, AF, TAF and RAF, all of which converge linearly in theory, have comparable convergence rate. PAF seems to have a slight advantage possibly due to its ability to handle a larger step size.
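The synthetic data for such a test can be generated along the following lines. This is a sketch under stated assumptions, not the authors' Matlab code: the test signal is drawn as a complex Gaussian vector, $\mathcal{N}(0,I/10)$ is read as noise with variance $1/10$ per entry, and the relative error is taken to be $\text{dist}({\mathbf{z}},{\mathbf{x}})/\|{\mathbf{x}}\|$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512
m = int(4.5 * n)

# Rows of A are a_j^*, with a_j ~ N(0, I/2) + i N(0, I/2).
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # hypothetical test signal

b_clean = np.abs(A @ x)                                    # noiseless b = |A x|
omega = np.sqrt(0.1) * rng.standard_normal(m)              # variance 1/10 per entry (assumed reading)
b_noisy = b_clean + omega                                  # b = |A x| + omega

def relative_error(z, x):
    """dist(z, x) / ||x||, where dist is taken up to a global phase."""
    phase = np.exp(1j * np.angle(np.vdot(x, z)))
    return np.linalg.norm(z - phase * x) / np.linalg.norm(x)

print(A.shape, b_clean.mean(), relative_error(np.exp(1.3j) * x, x))
```

Note that additive noise can make a few entries of `b_noisy` negative; how such entries are treated is not specified in the text, so the sketch leaves them untouched.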

Next, we compare the empirical success rate of PAF with that of WF, TWF, AF, TAF and RAF. Here we set the maximum number of gradient-type iterations to $T=2500$ for each scheme. In PAF, we set $n=512$ , ${\bm{\epsilon}}={\mathbf{b}}$ and fix the step size to $\mu=1$ . We let $m/n$ vary from $1$ to $6$ . A test is successful if the relative error is within $10^{-5}$ after the maximum number of iterations. For the test we compute the success rate by performing 100 random trials for each $m/n$ . The results are given in Figure 2. Of particular note is that in the real case, PAF, TWF and TAF all perform better than AF, indicating the effectiveness of controlling the size of the gradient in all gradient descent algorithms for avoiding spurious stationary points. WF seems to lag behind other algorithms, unsurprisingly, as it agrees with the theoretical analysis.

III-B Recovery of Natural Image

To show the efficiency and scalability of our algorithm, we use PAF to recover the Milky Way Galaxy image (downloaded from http://pics-about-space.com/milky-way-galaxy), which is the image used in [24, 34] with the coded diffraction measurements. We denote the image by $\bm{X}\in{\mathbb{R}}^{1080\times 1920\times 3}$. This is a color image with three channels, so we perform phase retrieval for each of the three channels separately. Let ${\mathbf{x}}$ denote any of the color channels of $\bm{X}$. We have measurements

[TABLE]

where $\bm{F}$ denotes the $n\times n$ discrete Fourier transform matrix, and $\bm{D}^{(l)}$ is a diagonal matrix having i.i.d. entries sampled from a distribution $g$. Here we take the octanary pattern $g=g_{1}g_{2}$, where $g_{1}$ and $g_{2}$ are independent with distributions

[TABLE]

and

[TABLE]

We set $L=20$ and adopt the same initialization method for all schemes in our comparison. For each method, we record the elapsed time and the number of iterations needed to reach relative errors of $10^{-5}$ and $10^{-10}$, respectively. The results are shown in Table I. PAF achieves the same level of precision as AF and TAF and is comparable to them in efficiency. Note also that TAF, RAF, PAF and AF take the same number of iterations to reach a fixed relative error. Moreover, it is reasonable that PAF is slightly slower than AF (${\bm{\epsilon}}={\mathbf{0}}$), since it carries the additional nonzero term ${\bm{\epsilon}}$. PAF, AF and TAF are all significantly more efficient than WF and TWF.
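Coded diffraction measurements are cheap to simulate because $\bm{F}\bm{D}^{(l)}{\mathbf{x}}$ is an FFT of the masked signal. The sketch below is illustrative only and rests on several assumptions: the measurements are taken to be $|\bm{F}\bm{D}^{(l)}{\mathbf{x}}|$ for $l=1,\ldots,L$; the octanary mask entries follow the pattern commonly used in the Wirtinger Flow literature ($g_{1}$ uniform on $\{1,-1,-i,i\}$, and $g_{2}=\sqrt{2}/2$ with probability $0.8$ or $\sqrt{3}$ with probability $0.2$), which may differ from the exact definition intended here; and a one-dimensional unnormalized DFT on a short random vector stands in for an image channel.

```python
import numpy as np

def octanary_masks(L, n, rng):
    """Draw L masks with i.i.d. entries g = g1 * g2 (assumed octanary pattern)."""
    g1 = rng.choice(np.array([1, -1, 1j, -1j]), size=(L, n))
    g2 = rng.choice(np.array([np.sqrt(2) / 2, np.sqrt(3)]), size=(L, n), p=[0.8, 0.2])
    return g1 * g2

def cdp_measure(x, masks):
    """Row l holds |F D^(l) x|, with F the unnormalized DFT applied along each row."""
    return np.abs(np.fft.fft(masks * x, axis=1))

rng = np.random.default_rng(0)
L, n = 20, 64                      # L = 20 masks; n kept small for illustration
x = rng.standard_normal(n)         # stand-in for one vectorized color channel
masks = octanary_masks(L, n, rng)
Y = cdp_measure(x, masks)
print(Y.shape)
```

With this structure, applying the measurement operator or its adjoint costs $L$ FFTs instead of a dense matrix product, which is what makes gradient-type methods practical at image scale.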

Interestingly, if we take a much smaller $L=6$ , WF does not recover the target image, while our PAF method actually performs better than with $L=20$ . It takes 300 iterations and a computation time of $183.5$ sec to achieve recovery with a relative error of $5.04\times 10^{-15}$ (Figure 3). Although more iterations are taken here, the computational time is actually less because $L=6$ is significantly smaller than $L=20$ .

IV Proof of main lemmas in section II-B

IV-A Proof of Lemma II.2

Proof:

For any ${\mathbf{z}}\in{\mathbb{C}}^{n}$ , set ${\mathbf{h}}=e^{-i\phi_{\mathbf{x}}({\mathbf{z}})}{\mathbf{z}}-{\mathbf{x}}$ , where we recall that $\phi_{\mathbf{x}}({\mathbf{z}})$ is given in (8). Then $\|{\mathbf{h}}\|=\textup{dist}({\mathbf{z}},{\mathbf{x}})$ . Denote $A=[{\mathbf{a}}_{1},\ldots,{\mathbf{a}}_{m}]^{*}\in{\mathbb{C}}^{m\times n}$ , ${\mathbf{v}}=[v_{1},v_{2},\ldots,v_{m}]^{T}$ with $v_{j}=\left(1-\frac{\sqrt{b_{j}^{2}+\epsilon_{j}^{2}}}{\sqrt{|{\mathbf{a}}_{j}^{*}{\mathbf{z}}|^{2}+\epsilon_{j}^{2}}}\right)({\mathbf{a}}_{j}^{*}{\mathbf{z}})$ . Note that we set $v_{j}=0$ if $b_{j}=\epsilon_{j}={\mathbf{a}}_{j}^{*}{\mathbf{z}}=0$ . Then $\nabla f_{{\bm{\epsilon}}}({\mathbf{z}})=\frac{1}{m}A^{*}{\mathbf{v}}$ . For any $\epsilon_{j}>0$ , we have

[TABLE]

where the last inequality follows from the inequality $|\sqrt{t^{2}+c^{2}}-\sqrt{s^{2}+c^{2}}|\leq|t-s|$ for any $t,s,c\in{\mathbb{R}}$ . According to Lemma .1 (see the Appendix), for any $\delta^{\prime}>0$ and $m\geq C_{\delta^{\prime}}n$ with a sufficiently large constant $C_{\delta^{\prime}}$ , the inequality

[TABLE]

holds with probability at least $1-e^{-c_{\delta^{\prime}}n}$ for some $c_{\delta^{\prime}}>0$ . Also for the Gaussian random matrix $A$ and any $\delta^{\prime\prime}>0$ , for $m\geq C_{\delta^{\prime\prime}}n$ we have $\|A^{*}\|\leq(1+\delta^{\prime\prime})\sqrt{m}$ with probability at least $1-e^{-c_{\delta^{\prime\prime}}n}$ ([35], Remark 5.40). These results together imply that

[TABLE]

holds with probability at least $1-\exp(-c_{\delta}n)$ whenever $m\geq C_{\delta}n$ for some $C_{\delta},c_{\delta}>0$ . Here we choose $1+\delta\geq\sqrt{(1+\delta^{\prime})}(1+\delta^{\prime\prime})$ and $C_{\delta}\geq\max\{C_{\delta^{\prime}},C_{\delta^{\prime\prime}}\}$ . ∎
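The chain of estimates in this proof can be checked numerically. The sketch below, with illustrative sizes that are not taken from the paper, forms the gradient $\frac{1}{m}A^{*}{\mathbf{v}}$ for complex Gaussian measurements and compares $\|\nabla f_{{\bm{\epsilon}}}({\mathbf{z}})\|/\|{\mathbf{h}}\|$ with $\|A\|^{2}/m$ . The latter dominates the former deterministically, because $|v_{j}|\leq|{\mathbf{a}}_{j}^{*}{\mathbf{h}}|$ and $\|A{\mathbf{h}}\|\leq\|A\|\,\|{\mathbf{h}}\|$ , and it is close to $1$ when $m/n$ is large.

```python
import numpy as np

def complex_gaussian(shape, rng):
    """Entries distributed as N(0, 1/2) + i N(0, 1/2)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

rng = np.random.default_rng(1)
n, m = 20, 1000                     # illustrative sizes, m/n = 50
A = complex_gaussian((m, n), rng)   # row j plays the role of a_j^*
x = complex_gaussian(n, rng)
z = x + 0.3 * complex_gaussian(n, rng)
b = np.abs(A @ x)
eps = b.copy()                      # eps = b, i.e. alpha = 1

# v_j = (1 - sqrt(b_j^2 + eps_j^2) / sqrt(|a_j^* z|^2 + eps_j^2)) * a_j^* z
Az = A @ z
v = (1.0 - np.sqrt(b**2 + eps**2) / np.sqrt(np.abs(Az)**2 + eps**2)) * Az
grad = A.conj().T @ v / m

# h = e^{-i phi} z - x, where phi is the phase of x^* z
inner = np.vdot(x, z)
h = np.conj(inner / abs(inner)) * z - x

ratio = np.linalg.norm(grad) / np.linalg.norm(h)
op_bound = np.linalg.norm(A, 2) ** 2 / m   # (spectral norm of A)^2 / m
print(ratio, op_bound)
```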

IV-B Proof of Lemma II.3

Proof:

Without loss of generality, we shall assume that the target signal ${\mathbf{x}}$ has $\|{\mathbf{x}}\|=1$ . Again for each ${\mathbf{z}}\in{\mathbb{C}}^{n}$ we set ${\mathbf{h}}=e^{-i\phi_{\mathbf{x}}({\mathbf{z}})}{\mathbf{z}}-{\mathbf{x}}$ , and denote $\tilde{{\mathbf{h}}}={\mathbf{h}}/\|{\mathbf{h}}\|$ . Definition 7 implies that ${\rm Im}({\mathbf{h}}^{*}{\mathbf{x}})=0$ . Since ${\mathbf{z}}\in\mathcal{S}_{\mathbf{x}}(1/10)$ , we have $\rho:=\|{\mathbf{h}}\|\leq 1/10$ . Therefore

[TABLE]

with $T_{j}$ being the $j$ -th item of the summation. To simplify the statement, we use $d_{j}$ to denote the denominator of $T_{j}$ , i.e.,

[TABLE]

To prove that the conclusion holds for all ${\mathbf{z}}\in\mathcal{S}_{\mathbf{x}}(1/10)$ , i.e., for any unit vector $\tilde{{\mathbf{h}}}$ , we first consider a fixed $\tilde{{\mathbf{h}}}\in{\mathbb{C}}^{n}$ and distinguish two cases.

In the first case, we assume $\tilde{{\mathbf{h}}}=c{\mathbf{x}}$ with $\lvert c\rvert=1$ . Here we have ${\rm Im}(\tilde{{\mathbf{h}}}^{*}{\mathbf{x}})=0$ , which implies $\tilde{{\mathbf{h}}}=\pm{\mathbf{x}}$ . Hence

[TABLE]

due to the facts that

[TABLE]

$\epsilon_{j}^{2}=\alpha|{\mathbf{a}}_{j}^{*}{\mathbf{x}}|^{2}$ and $a+\sqrt{ab}\leq\frac{3}{2}a+\frac{1}{2}b$ . Thus under the condition of $\|{\mathbf{h}}\|\leq\frac{1}{10}$ , we obtain

[TABLE]

By Lemma .1 of the Appendix, for $m\geq C_{\delta}n$ , with probability greater than $1-\exp(-c_{\delta}m)$ we have

[TABLE]

For the second case $\tilde{{\mathbf{h}}}\neq\pm{\mathbf{x}}$ , given the assumption $\|{\mathbf{x}}\|=1$ and $\|{\mathbf{h}}\|=\rho$ , we claim that

[TABLE]

Indeed, for each measurement ${\mathbf{a}}_{j}$ we have

[TABLE]

Also note that a Gaussian random measurement ${\mathbf{a}}$ is rotationally invariant, i.e., for any unitary matrix $O$ , $O{\mathbf{a}}$ is also a Gaussian random measurement. Thus for fixed ${\mathbf{x}}$ and $\tilde{{\mathbf{h}}}$ , we may without loss of generality assume that $\tilde{{\mathbf{h}}}={\mathbf{e}}_{1}$ and ${\mathbf{x}}=\sigma{\mathbf{e}}_{1}+\sqrt{1-\sigma^{2}}{\mathbf{e}}_{2}$ , with $\sigma=\tilde{{\mathbf{h}}}^{*}{\mathbf{x}}\in{\mathbb{R}}$ . This is because otherwise we can always find a unitary matrix to map $\tilde{{\mathbf{h}}},{\mathbf{x}}$ to these two vectors. Set

[TABLE]

where $O_{2}\in{\mathbb{C}}^{(n-2)\times(n-2)}$ is unitary and

[TABLE]

Then we have $O{\mathbf{x}}=\tilde{{\mathbf{h}}}$ and $O\tilde{{\mathbf{h}}}={\mathbf{x}}$ . Set ${\mathbf{g}}:=O{\mathbf{a}}$ and ${\mathbf{g}}$ is a Gaussian random measurement. Consequently we have

[TABLE]

which implies

[TABLE]

Combining (15) and (16) we now obtain (14).
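The symmetry behind this argument can be illustrated numerically. The sketch below does not use the block matrix constructed above. As a stand-in it uses a Householder reflection $O=I-2{\mathbf{w}}{\mathbf{w}}^{*}/\|{\mathbf{w}}\|^{2}$ with ${\mathbf{w}}={\mathbf{x}}-\tilde{{\mathbf{h}}}$ , which is unitary and also satisfies $O{\mathbf{x}}=\tilde{{\mathbf{h}}}$ and $O\tilde{{\mathbf{h}}}={\mathbf{x}}$ whenever $\tilde{{\mathbf{h}}}^{*}{\mathbf{x}}$ is real. It then estimates the frequency of the event $|{\mathbf{a}}^{*}{\mathbf{x}}|>|{\mathbf{a}}^{*}\tilde{{\mathbf{h}}}|$ by Monte Carlo, which the swap suggests should be close to $1/2$ . Dimensions and sample size are arbitrary choices.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

rng = np.random.default_rng(2)
n = 6
x = unit(rng.standard_normal(n) + 1j * rng.standard_normal(n))
h = unit(rng.standard_normal(n) + 1j * rng.standard_normal(n))

# Rotate the phase of h so that h^* x is real, i.e. Im(h^* x) = 0.
inner = np.vdot(h, x)
h = h * (inner / abs(inner))

# Householder reflection that swaps x and h (a stand-in for the paper's O).
w = x - h
O = np.eye(n) - 2.0 * np.outer(w, w.conj()) / np.vdot(w, w).real

# Monte Carlo over complex Gaussian measurements a ~ N(0, I/2) + i N(0, I/2).
N = 200_000
a = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
ax = np.abs(a.conj() @ x)      # |a^* x| for each sample
ah = np.abs(a.conj() @ h)      # |a^* h| for each sample
g = a @ O.T                    # each row is O a
gx = np.abs(g.conj() @ x)      # |g^* x|, which equals |a^* h|

freq = np.mean(ax > ah)
print(freq)
```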

For each index set $I\subseteq\{1,2,\ldots,m\}$ , define a corresponding event

[TABLE]

According to (14), we know that the event ${\mathbb{E}}_{I}$ occurs with probability $1/2^{m}$ . We assume that $I_{0}$ is an index set which satisfies $\frac{m}{4}\leq|I_{0}|\leq\frac{3m}{4}$ . Then on event ${\mathbb{E}}_{I_{0}}$ , ${\rm Re}\big{(}\langle\nabla f_{{\bm{\epsilon}}}({\mathbf{z}}),\,{\mathbf{z}}-{\mathbf{x}}e^{i\phi_{\mathbf{x}}({\mathbf{z}})}\rangle\big{)}$ can be divided into two groups:

[TABLE]

For each group, we next provide an upper bound and a lower bound for the denominators $d_{j}$ , $j=1,\ldots,m$ . Recall that ${\bm{\epsilon}}=\sqrt{\alpha}{\mathbf{b}}\,(\alpha>0)$ . When $j\in I_{0}=\big{\{}j\,:\,\rho\,|{\mathbf{a}}_{j}^{*}{\mathbf{x}}|>|{\mathbf{a}}_{j}^{*}{\mathbf{h}}|\big{\}}$ we have

[TABLE]

where $U_{1}:=2\alpha+2+3\rho+\frac{3}{2}\rho^{2}$ . Here the second inequality follows from $\epsilon_{j}^{2}=\alpha|{\mathbf{a}}_{j}^{*}{\mathbf{x}}|^{2}$ and

[TABLE]

On the other hand, since

[TABLE]

and $\epsilon_{j}^{2}=\alpha\lvert{\mathbf{a}}_{j}^{*}{\mathbf{x}}\rvert^{2}>(\alpha/\rho^{2})\lvert{\mathbf{a}}_{j}^{*}{\mathbf{h}}\rvert^{2}$ , we have

[TABLE]

where $L_{1}:=\sqrt{(1-\rho)^{2}+\alpha}\,\big{(}\sqrt{(1-\rho)^{2}+\alpha}+\sqrt{1+\alpha}\big{)}/\rho^{2}$ . Similarly, for $k\in I_{0}^{c}=\big{\{}k\,:\,\rho\,|{\mathbf{a}}_{k}^{*}{\mathbf{x}}|\leq|{\mathbf{a}}_{k}^{*}{\mathbf{h}}|\big{\}}$ , we have

[TABLE]

and hence

[TABLE]

where $U_{2}:=\frac{2\alpha+2}{\rho^{2}}+\frac{3}{2}+\frac{3}{\rho}$ and

[TABLE]

where $L_{2}:=\alpha+\sqrt{\alpha(1+\alpha)}$ .

Using the concentration inequalities given in the Appendix, we next give the lower bounds of $\sum_{j\in I_{0}}T_{j}$ and $\sum_{k\in I_{0}^{c}}T_{k}$ . Based on (17), (18) and Lemma .3, given any $\delta>0$ , for $|I_{0}|\geq C_{1}(\delta)n$ the following inequality holds with probability at least $1-\exp\big{(}-c_{1}(\delta)\cdot|I_{0}|\big{)}$ :

[TABLE]

where $\varphi_{1}:=\frac{1-6\rho}{4U_{1}}-\frac{1}{16L_{1}}-\frac{\delta}{4}$ . Here the fourth inequality comes from Lemma .3.

Similarly, according to (19), (20) and Lemma .3, for $|I_{0}^{c}|\geq C_{2}(\delta)n$ the following inequality holds with probability at least $1-\exp\big{(}-c_{2}(\delta)\cdot|I_{0}^{c}|\big{)}$ :

[TABLE]

where $\phi=\Big{(}\frac{3}{4U_{2}\rho}\Big{)}^{2}/\Big{(}\frac{63}{128U_{2}\rho^{2}}-\frac{9}{128L_{2}}\Big{)}$ and $\varphi_{2}:=\frac{9}{32U_{2}\rho^{2}}+\frac{1}{2U_{2}}-\frac{3}{32L_{2}}-\phi-\frac{\delta}{4}$ . The second inequality follows from the concentration inequalities given in Lemma .3. The fourth inequality derives from the facts that $\frac{63}{128U_{2}\rho^{2}}-\frac{9}{128L_{2}}>0$ for any $0.37\leq\alpha\leq 197$ and $\rho\leq 1/10$ .

Set $\delta:=0.001$ . For arbitrary fixed $\alpha\in[0.37,197]$ , a simple observation is that $\varphi_{1}$ and $\varphi_{2}$ are decreasing functions of $\rho$ . So we next only consider $\rho=1/10$ . When $0.37\leq\alpha\leq 197$ , we have

[TABLE]

and

[TABLE]

with $\tilde{\phi}=\frac{9}{128L_{2}}-\frac{1575}{32U_{2}}<0$ .

For sufficiently large constant $C\geq 4\max\{C_{1}(\delta),C_{2}(\delta)\}$ , as long as $m\geq Cn$ , we have $|I_{0}|\geq m/4\geq C_{1}(\delta)n$ and $|I_{0}^{c}|\geq m/4\geq C_{2}(\delta)n$ . Thus with probability at least $(1-\exp(-c_{3}m))/2^{m}$ , we have

[TABLE]

The number of the index sets $I$ satisfying $\frac{m}{4}\leq|I|\leq\frac{3m}{4}$ is $\sum_{k=m/4}^{3m/4}{m\choose k}$ . So for fixed $\tilde{{\mathbf{h}}}$ , when $\tilde{{\mathbf{h}}}\neq\pm{\mathbf{x}}$ , the inequality (LABEL:generalcase) holds with probability greater than $\sum_{k=m/4}^{3m/4}{m\choose k}(1-\exp(-c_{3}m))/2^{m}$ . Note that

[TABLE]

and $(4e)^{1/4}<2$ . Hence $\sum_{k=0}^{m/4-1}{m\choose k}/2^{m}<c_{0}^{m}$ for some $c_{0}\in(0,1)$ , which implies that $\sum_{k=m/4}^{3m/4}{m\choose k}(1-\exp(-c_{3}m))/2^{m}\geq 1-\exp(-c_{5}m)$ . Moreover, for $\alpha\in[0.37,197]$ we have

[TABLE]
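The binomial tail estimate used in this step can be checked directly. The sketch below assumes that the estimate is the standard bound $\sum_{k\leq d}{m\choose k}\leq(em/d)^{d}$ with $d=m/4$ , which gives $\sum_{k<m/4}{m\choose k}/2^{m}\leq\big{(}(4e)^{1/4}/2\big{)}^{m}$ , and compares it with the exact tail for a few values of $m$ divisible by $4$ . The function names are hypothetical.

```python
from math import comb, e

def lower_tail(m):
    """Exact value of sum_{k=0}^{m/4-1} C(m, k) / 2^m for m divisible by 4."""
    return sum(comb(m, k) for k in range(m // 4)) / 2**m

def tail_bound(m):
    """((4e)^(1/4) / 2)^m, obtained from sum_{k<=d} C(m, k) <= (e m / d)^d with d = m/4."""
    return ((4 * e) ** 0.25 / 2) ** m

for m in (40, 80, 160, 320):
    tail = lower_tail(m)
    middle = 1 - 2 * tail   # mass of m/4 <= k <= 3m/4, by symmetry of C(m, k)
    print(m, tail, tail_bound(m), middle)
```

Since $(4e)^{1/4}/2<1$ , the bound decays geometrically in $m$ , so the index sets with $\frac{m}{4}\leq|I|\leq\frac{3m}{4}$ carry all but an exponentially small fraction of the total probability.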

Considering the two cases as a whole, for a fixed ${\mathbf{z}}$ , combining (LABEL:specialcase), (LABEL:generalcase) and (24), we obtain

[TABLE]

with probability at least $1-\exp(-c_{6}m)$ . Particularly, when $\alpha\in[0.37,29]$ we have $\frac{\varphi_{1}+\varphi_{2}}{4}>0.001$ .

To complete the proof, we will need to establish uniform bound over all vectors, so we adopt an $\eta$ -net argument. Observe that

[TABLE]

For any ${\mathbf{z}}\in{\mathbb{C}}^{n}$ , which means for any $\tilde{{\mathbf{h}}}$ with $\|\tilde{{\mathbf{h}}}\|=1$ and ${\rm Im}(\tilde{{\mathbf{h}}}^{*}{\mathbf{x}})=0$ , we consider the function ${\rm Re}\big{(}\langle\nabla f_{\bm{\epsilon}}({\mathbf{x}}+\rho\tilde{{\mathbf{h}}}),\,\rho\tilde{{\mathbf{h}}}\rangle\big{)}$ with $\rho\leq 1/10$ . Suppose that $\tilde{{\mathbf{h}}}_{1},\tilde{{\mathbf{h}}}_{2}\in{\mathbb{C}}^{n}$ satisfy $\|\tilde{{\mathbf{h}}}_{1}-\tilde{{\mathbf{h}}}_{2}\|\leq\eta$ . When $0.37\leq\alpha\leq 29$ we have

[TABLE]

where $\xi\in{\mathbb{C}}^{n}$ . Here the third inequality follows from Lemma II.2 and Lemma .4. Therefore for any $\tilde{{\mathbf{h}}}_{1}$ and $\tilde{{\mathbf{h}}}_{2}$ satisfying $\|\tilde{{\mathbf{h}}}_{1}-\tilde{{\mathbf{h}}}_{2}\|\leq\eta:=\frac{\delta}{6}$ with $\delta=0.001$ , let ${\mathcal{N}}_{\eta}$ be an $\eta$ -net for the unit sphere of ${\mathbb{C}}^{n}$ with cardinality $|{\mathcal{N}}_{\eta}|\leq(1+2/\eta)^{2n}$ . Then for all ${\mathbf{z}}$ , $0.37\leq\alpha\leq 29$ and $m\geq(C_{2}\cdot\eta^{-2}\log\eta^{-1})n$ , with probability at least $1-\exp(-cn)$ we have

[TABLE]

with $\beta_{\alpha}:=(\varphi_{1}+\varphi_{2})/4-\delta>0$ . According to Remark IV.1, when $\alpha=0.826$ , $\beta_{\alpha}=64/5945$ approximately reaches its largest value. ∎
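To see how the size of the net enters the requirement on $m$ , the following sketch evaluates the cardinality bound $(1+2/\eta)^{2n}$ for $\eta=\delta/6$ and $\delta=0.001$ . A union bound over the net multiplies a per-point failure probability $\exp(-cm)$ by $|{\mathcal{N}}_{\eta}|$ , so $m/n$ must exceed $2\log(1+2/\eta)/c$ . The value of $c$ used below is purely hypothetical, since the proof does not state its constants explicitly.

```python
import math

def log_net_size_per_dim(eta):
    """Upper bound on log|N_eta| / n implied by |N_eta| <= (1 + 2/eta)^(2n)."""
    return 2.0 * math.log(1.0 + 2.0 / eta)

delta = 0.001
eta = delta / 6
per_dim = log_net_size_per_dim(eta)

# Hypothetical per-point exponent c; the proof only asserts that some c > 0 exists.
c_hypothetical = 0.05
min_ratio = per_dim / c_hypothetical   # the union bound needs m/n above this value
print(per_dim, min_ratio)
```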

Remark IV.1.

According to the proof of Lemma II.3, by taking $\rho=1/10$ and $\delta=0.001$ , we have $U_{1}=2\alpha+463/200$ , $U_{2}=200\alpha+463/2$ , $L_{1}=100\alpha+81+100\sqrt{(\alpha+1)(\alpha+0.81)}$ and $L_{2}=\alpha+\sqrt{\alpha(1+\alpha)}$ . Recall that

[TABLE]

with $\tilde{\phi}=\frac{9}{128L_{2}}-\frac{1575}{32U_{2}}$ . Figure 4 here shows the relationship between $\beta_{\alpha}$ and $\alpha$ .

Particularly, when $\alpha=0.826$ , $\beta_{\alpha}=64/5945$ roughly reaches its maximum.
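The dependence of $\beta_{\alpha}$ on $\alpha$ can be reproduced from the quantities defined in the proof of Lemma II.3. The sketch below evaluates $U_{1},U_{2},L_{1},L_{2}$ , $\varphi_{1}$ , $\phi$ , $\varphi_{2}$ and $\beta_{\alpha}=(\varphi_{1}+\varphi_{2})/4-\delta$ exactly as they are written there, with $\rho=1/10$ and $\delta=0.001$ as defaults. The function name and the grid are illustrative choices.

```python
import numpy as np

def beta(alpha, rho=0.1, delta=0.001):
    """beta_alpha = (phi_1 + phi_2) / 4 - delta, using the proof's definitions."""
    a = np.asarray(alpha, dtype=float)
    U1 = 2 * a + 2 + 3 * rho + 1.5 * rho**2
    U2 = (2 * a + 2) / rho**2 + 1.5 + 3 / rho
    s = np.sqrt((1 - rho) ** 2 + a)
    L1 = s * (s + np.sqrt(1 + a)) / rho**2
    L2 = a + np.sqrt(a * (1 + a))
    phi1 = (1 - 6 * rho) / (4 * U1) - 1 / (16 * L1) - delta / 4
    phi = (3 / (4 * U2 * rho)) ** 2 / (63 / (128 * U2 * rho**2) - 9 / (128 * L2))
    phi2 = 9 / (32 * U2 * rho**2) + 1 / (2 * U2) - 3 / (32 * L2) - phi - delta / 4
    return (phi1 + phi2) / 4 - delta

alphas = np.linspace(0.37, 29.0, 2000)
values = beta(alphas)
best = alphas[np.argmax(values)]     # grid maximizer of beta_alpha
print(best, float(beta(0.826)), 64 / 5945)
```

Plotting `values` against `alphas` gives a curve of the kind shown in Figure 4.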

Appendix: Auxiliary Lemmas

In previous sections we have applied concentration inequalities several times. They have played a key role in the proof of our results. Here we present these concentration inequalities used for the proof of Lemma II.2 and Lemma II.3.

Lemma .1 ([14] Lemma 3.1 ).

Let ${\mathbf{a}}_{1},{\mathbf{a}}_{2},\ldots,{\mathbf{a}}_{m}\in{\mathbb{C}}^{n}$ be i.i.d. Gaussian random measurements. Fix any $\delta$ in $(0,1/2)$ and assume $m\geq 20\delta^{-2}n$ . Then for all unit vectors ${\mathbf{u}}\in{\mathbb{C}}^{n}$ ,

[TABLE]

holds with probability at least $1-\exp(-mt^{2}/2)$ , where $\delta/4=t^{2}+t$ .

Lemma .2.

Let ${\mathbf{a}}\in{\mathbb{C}}^{n}$ be a Gaussian random measurement. Let ${\mathbf{x}}\in{\mathbb{C}}^{n}$ and $\tilde{{\mathbf{h}}}\in{\mathbb{C}}^{n}$ be two fixed vectors with $\|{\mathbf{x}}\|=\|\tilde{{\mathbf{h}}}\|=1$ , ${\rm Im}(\tilde{{\mathbf{h}}}^{*}{\mathbf{x}})=0$ and $\tilde{{\mathbf{h}}}\neq\pm{\mathbf{x}}$ . Then we have

[TABLE]

and

[TABLE]

Proof:

Since the distribution of ${\mathbf{a}}$ is invariant by unitary transformation, we can take ${\mathbf{x}}={\mathbf{e}}_{1}$ and $\tilde{{\mathbf{h}}}=\sigma{\mathbf{e}}_{1}+\sqrt{1-\sigma^{2}}{\mathbf{e}}_{2}$ , where $\sigma={\mathbf{x}}^{*}\tilde{{\mathbf{h}}}={\rm Re}({\mathbf{x}}^{*}\tilde{{\mathbf{h}}})\in{\mathbb{R}}$ and $|\sigma|<1$ . We use $\xi_{1},\xi_{2},\xi_{3},\xi_{4}$ to represent the real and imaginary parts of $a_{1}$ and $a_{2}$ respectively, which implies that the variables $\xi_{1},\xi_{2},\xi_{3},\xi_{4}$ are independent and obey normal distribution $\mathcal{N}(0,1/2)$ . Then it follows that

[TABLE]

and

[TABLE]

Since ${\mathbf{a}}$ is invariant under unitary transformations and ${\mathbf{x}},\tilde{{\mathbf{h}}}$ are two fixed vectors satisfying $\tilde{{\mathbf{h}}}\neq\pm{\mathbf{x}}$ , we have

[TABLE]

Here ${\mathbf{g}}:=O{\mathbf{a}}$ is a Gaussian random measurement with unitary matrix $O$ satisfying $O{\mathbf{x}}=\tilde{{\mathbf{h}}}$ and $O\tilde{{\mathbf{h}}}={\mathbf{x}}$ . Then we obtain

[TABLE]

which implies (LABEL:exp-haax).

Similarly, we have

[TABLE]

which implies

[TABLE]

And

[TABLE]

implies

[TABLE]

and

[TABLE]

Then to prove the inequalities (27), (28), (29) and (30), it’s sufficient to prove

[TABLE]

Next, we prove (31) and (32). First, we apply the polar coordinate transformation:

[TABLE]

with $r_{1},r_{2}\in(0,\infty)$ , $\theta_{1},\theta_{2}\in[0,2\pi)$ . Then we can write the expectation as

[TABLE]

It is an even function of $\sigma$ , and when $\sigma\in[0,1)$ the derivative

[TABLE]

Hence the expectation attains its maximum at $\sigma=0$ , i.e.,

[TABLE]

Thus we have the inequality (31).

Using the same polar coordinates transformation, we know

[TABLE]

Thus we obtain (32). This completes the proof. ∎

Lemma .3.

Let ${\mathbf{a}}_{1},{\mathbf{a}}_{2},\ldots,{\mathbf{a}}_{m}\in{\mathbb{C}}^{n}$ be i.i.d. Gaussian random measurements. Let ${\mathbf{x}}\in{\mathbb{C}}^{n}$ and $\tilde{{\mathbf{h}}}\in{\mathbb{C}}^{n}$ be two fixed vectors with $\|{\mathbf{x}}\|=\|\tilde{{\mathbf{h}}}\|=1$ , ${\rm Im}(\tilde{{\mathbf{h}}}^{*}{\mathbf{x}})=0$ and $\tilde{{\mathbf{h}}}\neq\pm{\mathbf{x}}$ . For any $\delta>0$ , there exist positive constants $C_{\delta},c_{\delta}>0$ such that for any $m\geq C_{\delta}n$ the inequalities

[TABLE]

and

[TABLE]

hold with probability at least $1-\exp(-c_{\delta}m)$ .

Proof:

For fixed $\tilde{{\mathbf{h}}}$ and ${\mathbf{x}}$ , each of the following sets consists of independent sub-exponential random variables:

[TABLE]

Recall that ${\mathbf{a}}=(a_{1},\ldots,a_{n})\in{\mathbb{C}}^{n}\sim\mathcal{N}(0,I/2)+i\mathcal{N}(0,I/2)$ is a Gaussian random measurement. Then based on Bernstein-type inequality, for any $\delta>0$ , the inequalities

[TABLE]

hold with probability at least $1-\exp(-c_{\delta}m)$ provided $m\geq C_{\delta}n$ , where $C_{\delta},c_{\delta}$ are positive constants depending on $\delta$ . Then the inequalities (33), (34), (35), (LABEL:con3), (LABEL:con4) can be derived directly from the expectation bounds given in Lemma .2.

∎

The following lemma provides an upper bound for the operator norm of $\nabla^{2}f_{\bm{\epsilon}}({\mathbf{z}})$ .

Lemma .4.

Set ${\bm{\epsilon}}=\sqrt{\alpha}{\mathbf{b}}$ . Then there exist constants $C^{\prime},c^{\prime}>0$ such that for $m\geq C^{\prime}n$ , $\|\nabla^{2}f_{\bm{\epsilon}}({\mathbf{z}})\|\leq 2\sqrt{\frac{1+\alpha}{\alpha}}$ holds with probability at least $1-\exp(-c^{\prime}m)$ .

Proof:

Recall that

[TABLE]

Similarly, we obtain

[TABLE]

For any ${\mathbf{z}}\in{\mathbb{C}}^{n}$ , we have

[TABLE]

with probability at least $1-\exp(-c^{\prime}m)$ provided $m\geq C^{\prime}n$ . Here the third inequality is obtained by Lemma .1. ∎

Bibliography

[1] J. Miao, P. Charalambous, J. Kirz, and D. Sayre, “Extending the methodology of x-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens,” Nature, vol. 400, no. 6742, p. 342, 1999.
[2] V. Elser, T. Lan, and T. Bendory, “Benchmark problems for phase retrieval,” SIAM Journal on Imaging Sciences, vol. 11, no. 4, pp. 2429–2455, 2018.
[3] C. Fienup and J. Dainty, “Phase retrieval and image reconstruction for astronomy,” Image Recovery: Theory and Application, pp. 231–275, 1987.
[4] A. Walther, “The question of phase retrieval in optics,” Optica Acta: International Journal of Optics, vol. 10, no. 1, pp. 41–49, 1963.
[5] R. P. Millane, “Phase retrieval in crystallography and optics,” JOSA A, vol. 7, no. 3, pp. 394–411, 1990.
[6] J. Miao, T. Ishikawa, Q. Shen, and T. Earnest, “Extending x-ray crystallography to allow the imaging of noncrystalline materials, cells, and single protein complexes,” Annu. Rev. Phys. Chem., vol. 59, pp. 387–410, 2008.
[7] R. Balan, P. Casazza, and D. Edidin, “On signal reconstruction without phase,” Applied and Computational Harmonic Analysis, vol. 20, no. 3, pp. 345–356, 2006.
[8] A. S. Bandeira, J. Cahill, D. G. Mixon, and A. A. Nelson, “Saving phase: Injectivity and stability for phase retrieval,” Applied and Computational Harmonic Analysis, vol. 37, no. 1, pp. 106–125, 2014.
