Deep Iterative Reconstruction for Phase Retrieval

Çağatay Işıl; Figen S. Oktem; Aykut Koç

arXiv:1904.11301·eess.IV·August 20, 2019


TL;DR

This paper introduces a novel deep learning-enhanced phase retrieval algorithm combining neural networks with traditional methods, achieving improved accuracy and robustness over existing techniques.

Contribution

The work develops a hybrid phase retrieval method using two neural networks with the HIO algorithm, enhancing reconstruction quality and robustness to noise and initialization.

Findings

01 State-of-the-art reconstruction performance achieved.

02 Enhanced robustness to noise and initialization.

03 Comparable computational cost to traditional HIO.


Tables

Table 1: The average reconstruction and runtime performance for 236 test images (5 Monte Carlo runs)

$\alpha = 2$ (Avg. SNR: 33.39 dB)
Method	PSNR Overall (dB)	PSNR Natural (dB)	PSNR Unnatural (dB)	SSIM Overall	SSIM Natural	SSIM Unnatural	Avg. runtime (sec.)
HIO (output of the initialization stage)	18.97	18.92	20.78	0.28	0.29	0.26	55.40
DNN-1	20.76	20.77	20.33	0.33	0.33	0.20	55.47
Iterative DNN-HIO (final HIO reconstruction)	21.63	21.60	22.75	0.47	0.47	0.26	59.07
prDeep	23.45	23.49	21.72	0.51	0.51	0.24	169.81
Developed method	23.61	23.60	24.02	0.53	0.53	0.31	59.14

$\alpha = 3$ (Avg. SNR: 31.66 dB)
Method	PSNR Overall (dB)	PSNR Natural (dB)	PSNR Unnatural (dB)	SSIM Overall	SSIM Natural	SSIM Unnatural	Avg. runtime (sec.)
HIO (output of the initialization stage)	18.07	18.02	19.97	0.21	0.21	0.14	55.61
DNN-1	19.69	19.68	20.06	0.26	0.26	0.18	55.69
Iterative DNN-HIO (final HIO reconstruction)	21.07	21.03	22.82	0.41	0.42	0.25	60.29
prDeep	22.06	22.09	20.91	0.44	0.44	0.22	171.02
Developed method	22.87	22.85	23.50	0.47	0.48	0.29	60.35

$\alpha = 4$ (Avg. SNR: 30.40 dB)
Method	PSNR Overall (dB)	PSNR Natural (dB)	PSNR Unnatural (dB)	SSIM Overall	SSIM Natural	SSIM Unnatural	Avg. runtime (sec.)
HIO (output of the initialization stage)	17.34	17.30	18.72	0.16	0.17	0.10	55.78
DNN-1	18.75	18.76	18.65	0.21	0.21	0.14	55.86
Iterative DNN-HIO (final HIO reconstruction)	20.08	20.03	22.22	0.35	0.36	0.20	60.99
prDeep	20.69	20.70	20.38	0.37	0.38	0.18	172.47
Developed method	21.80	21.77	22.79	0.41	0.41	0.25	61.05

Equations

$$\mathbf{y^{2}}=|\mathbf{Fx}|^{2}+\mathbf{w},\qquad\mathbf{w}\sim\mathcal{N}\left(\mathbf{0},\,\alpha^{2}\,\mathrm{Diag}\left(|\mathbf{Fx}|^{2}\right)\right)$$

$$\mathbf{x}_{k+1}[n]=\begin{cases}\mathbf{x}_{k}^{\prime}[n] & \text{for } n\notin\gamma\\ \mathbf{x}_{k}[n]-\beta\,\mathbf{x}_{k}^{\prime}[n] & \text{for } n\in\gamma\end{cases}$$

$$\mathbf{x}_{k}^{\prime}=\mathbf{F^{-1}}\left(\mathbf{y}\odot\frac{\mathbf{F}\mathbf{x}_{k}}{|\mathbf{F}\mathbf{x}_{k}|}\right)$$


Full text

Deep Iterative Reconstruction for Phase Retrieval

Çağatay Işıl

Department of Electrical and Electronics Engineering

Middle East Technical University (METU)

Ankara, 06800, Turkey

Figen S. Oktem

Department of Electrical and Electronics Engineering

Middle East Technical University (METU)

Ankara, 06800, Turkey

Aykut Koç

Artificial Intelligence & Information Technologies Research Program Department

ASELSAN Research Center

Ankara 06370, Turkey

Ç. Işıl is also with the Artificial Intelligence & Information Technologies Research Program Department, ASELSAN Research Center, Ankara 06370, Turkey.

Abstract

The classical phase retrieval problem is the recovery of a constrained image from the magnitude of its Fourier transform. Although there are several well-known phase retrieval algorithms, including the hybrid input-output (HIO) method, their reconstruction performance is generally sensitive to initialization and measurement noise. Recently, deep neural networks (DNNs) have been shown to provide state-of-the-art performance in solving several inverse problems such as denoising, deconvolution, and superresolution. In this work, we develop a phase retrieval algorithm that utilizes two DNNs together with the model-based HIO method. First, a DNN is trained to remove the HIO artifacts and is used iteratively with the HIO method to improve the reconstructions. After this iterative phase, a second DNN is trained to remove the remaining artifacts. Numerical results demonstrate the effectiveness of our approach, which has little additional computational cost compared to the HIO method. Our approach not only achieves state-of-the-art reconstruction performance but is also more robust to different initializations and noise levels.

Keywords: phase retrieval $\cdot$ deep learning $\cdot$ inverse problems $\cdot$ image reconstruction

1 Introduction

The classical phase retrieval problem is the recovery of a constrained signal from the magnitude of its Fourier transform, or equivalently from its autocorrelation. This problem is encountered in a variety of applications in science and engineering such as crystallography [1], microscopy [2, 3], astronomy [4], optical imaging [5, 6], and speech processing [7]. Although a unique solution almost always exists for most of the practical scenarios [8], solving the phase retrieval problem is generally a difficult task because of the inherent ill-posedness and nonlinearity involved.

Although several approaches have been developed for phase retrieval, each suffers from different limitations. Alternating projection-based methods, including the hybrid input-output (HIO) algorithm, are the most commonly used methods because of their low computational complexity and image generality [9, 10]. This class of methods alternates between the space and frequency domains by imposing the available information in each domain through projections [9, 11, 12, 10]. However, some of these projections involve non-convex sets, and hence convergence to the global solution cannot be guaranteed. The resulting reconstructions may have artifacts and errors, mostly because the iterations get stuck in local minima or amplify the noise in the solution. More recent phase retrieval algorithms have been developed to overcome some of these limitations. Examples include semi-definite programming-based approaches [13, 14, 15], regularization-based methods [16, 17, 18], global optimization methods [19], and Wirtinger flow and its variants [20].

Deep neural networks (DNNs) [21] have been shown to be successful in various inverse problems in imaging in the last few years [22]. There are mainly two different approaches to exploiting DNNs for the solution of inverse problems. In the first class of approaches, a DNN is used to reconstruct the unknown image directly from an available measurement or from an initial estimate obtained with a simple model-based inversion approach. Hence, these approaches exploit DNNs either to perform direct inversion or to improve a rough estimate that may involve artifacts or errors. For this, a DNN is trained by minimizing a loss function over a dataset of ground truth images paired with the available measurements or estimates. This approach has been utilized to solve several inverse problems [23, 24, 25, 26], including phase retrieval as encountered in holography, lensless imaging, and Fourier ptychography [27, 3, 28]. In the second class of approaches, DNNs are utilized for the regularization of model-based inversion methods by using plug-and-play regularization and its variants [29, 30]. This approach has also been applied to several inverse problems [31, 32, 33, 34], including phase retrieval [17].

In this paper, we develop a hybrid phase retrieval algorithm that utilizes DNNs with a model-based inversion approach. The model-based inversion approach used here is the well-known HIO method, which incorporates the physical model and the constraints into the solution but may lead to artifacts. The main idea in the developed method is to use a DNN in an iterative manner with the HIO method to remove the artifacts. The developed approach consists of two main stages: the iterative DNN-HIO stage and the final DNN stage. For the iterative DNN-HIO stage, a DNN is trained to remove the HIO artifacts. This trained DNN is then used iteratively with the HIO method to generate an intermediate reconstruction. In the final stage, the intermediate reconstructions are used to train a second DNN to remove the remaining artifacts. The performance of the developed approach is compared with the classical and state-of-the-art methods through numerical simulations. The results demonstrate the effectiveness of our approach, which has relatively little additional computational cost compared to HIO. Our approach not only achieves state-of-the-art reconstruction performance but is also more robust to different initializations and noise levels.

The rest of this paper is organized as follows. The classical phase retrieval problem is described in Section 2. Related work on phase retrieval and DNN-based methods is discussed in Section 3. Section 4 presents the developed approach. The performance of the approach is compared with the classical and state-of-the-art methods in Section 5 through simulations. Finally, we summarize the results and conclude in Section 6.

2 Phase Retrieval Problem

In the classical phase retrieval problem, available measurements can be modeled as

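$$\mathbf{y^{2}}=|\mathbf{Fx}|^{2}+\mathbf{w},\qquad\mathbf{w}\sim\mathcal{N}\left(\mathbf{0},\,\alpha^{2}\,\mathrm{Diag}\left(|\mathbf{Fx}|^{2}\right)\right)\qquad(1)$$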

where $\mathbf{y^{2}}\in\mathbb{R}^{M\times M}$ denotes the noisy Fourier intensity measurements, $\mathbf{F}$ is the $M\times M$-point DFT matrix, and $\mathbf{x}\in\mathbb{R}^{N\times N}$ represents the unknown image of interest. The unknown image $\mathbf{x}$ is assumed to be non-negative and real-valued, and to have finite support. Moreover, $\mathbf{w}\in\mathbb{R}^{M\times M}$ denotes the measurement noise, and $\alpha$ is a scaling parameter that controls the signal-to-noise ratio (SNR). The noise is generally assumed to be Poisson-distributed, and here its normal approximation [17] is used.

For two or higher dimensional real-valued discrete signals with finite support, Fourier intensity measurements at discrete frequencies, $\mathbf{|Fx|^{2}}$, can uniquely determine the unknown signal, $\mathbf{x}$. To guarantee uniqueness, for an image with support $N\times N$, the magnitude of its $M\times M$-point oversampled DFT with $M\geq 2N-1$ should be provided [8]. In this work, $M$ is chosen as $2N$ for simplicity.
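The measurement model described above can be illustrated with a minimal NumPy sketch. It is not the authors' simulation code: the function name `simulate_measurements`, the use of NumPy's default (unnormalized) DFT scaling, and the clipping of negative intensities before the square root are all assumptions made for illustration. The DFT scaling in particular affects the SNR obtained for a given $\alpha$.

```python
import numpy as np

def simulate_measurements(x, alpha, rng):
    """Sketch of y^2 = |Fx|^2 + w with w ~ N(0, alpha^2 Diag(|Fx|^2)).

    x is an N x N non-negative image; the DFT is oversampled to M = 2N.
    """
    n = x.shape[0]
    m = 2 * n                                  # M = 2N, as chosen in the text
    fx = np.fft.fft2(x, s=(m, m))              # zero-pads x to M x M (assumed unnormalized DFT)
    intensity = np.abs(fx) ** 2                # noise-free |Fx|^2
    # Variance alpha^2 |Fx|^2 means a per-entry standard deviation of alpha * |Fx|.
    w = alpha * np.abs(fx) * rng.standard_normal((m, m))
    y_sq = intensity + w
    # Assumption: negative noisy intensities are clipped before taking the square root.
    y = np.sqrt(np.maximum(y_sq, 0.0))
    return y_sq, y
```

A call such as `simulate_measurements(image, 3.0, np.random.default_rng(0))` returns both the noisy intensities and the magnitudes that a magnitude-based method like HIO would consume.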

3 Related Work

3.1 Alternating Projection Methods for Phase Retrieval

Alternating projection-based methods are widely used for phase retrieval. In the classical Gerchberg-Saxton (GS) algorithm [11], magnitude constraints are iteratively imposed in the space and Fourier domains to reconstruct the unknown signal. The error reduction (ER) algorithm is a modified version of the GS algorithm that uses space-domain constraints other than the magnitude [12]. The most commonly used alternating projection-based method is the HIO algorithm [10], which was developed based on the ER algorithm.

Similar to the ER algorithm, the HIO method iteratively uses the Fourier magnitude constraint and space-domain constraints (such as support, non-negativity, and real-valuedness). However, unlike ER, HIO does not force the iterates to satisfy the constraints exactly; instead, it uses the iterates to eventually drive the algorithm to a solution that satisfies the constraints [10]. The HIO iterations can be expressed as follows:

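$$\mathbf{x}_{k+1}[n]=\begin{cases}\mathbf{x}_{k}^{\prime}[n] & \text{for } n\notin\gamma\\ \mathbf{x}_{k}[n]-\beta\,\mathbf{x}_{k}^{\prime}[n] & \text{for } n\in\gamma\end{cases}\qquad(2)$$

where

$$\mathbf{x}_{k}^{\prime}=\mathbf{F^{-1}}\left(\mathbf{y}\odot\frac{\mathbf{F}\mathbf{x}_{k}}{|\mathbf{F}\mathbf{x}_{k}|}\right)\qquad(3)$$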

Here, $\mathbf{x}_{k}\in\mathbb{R}^{N\times N}$ is the reconstruction at the $k^{th}$ iteration, $\mathbf{F^{-1}}$ denotes the inverse DFT matrix, $\odot$ represents the element-wise (Hadamard) multiplication operation, $\beta$ is a constant parameter (with a typical value of $0.9$), and $\gamma$ is the set of indices $n$ for which $\mathbf{x}_{k}^{\prime}[n]$ violates the space domain constraints [10]. Although the convergence behavior of the HIO method cannot be completely analyzed, it often converges to a reasonably good solution empirically in a wide variety of applications. However, the HIO reconstructions may have artifacts and errors, mostly due to being trapped in local minima or amplification of noise in the solution. Variants of the HIO method have also been proposed to improve its performance [9].
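As a rough illustration of the HIO iteration, the following NumPy sketch implements one update and a simple loop. It assumes the standard form $\mathbf{x}_{k}^{\prime}=\mathbf{F^{-1}}(\mathbf{y}\odot\mathbf{F}\mathbf{x}_{k}/|\mathbf{F}\mathbf{x}_{k}|)$, works on zero-padded $M\times M$ arrays with a boolean `support` mask, and treats a pixel as violating the constraints when it lies outside the support or is negative. The names `hio_step` and `hio` and the small constant `eps` guarding the division are illustrative choices, not part of the original method description.

```python
import numpy as np

def hio_step(x_k, y, support, beta=0.9, eps=1e-12):
    """One HIO update on an M x M array.

    y       : measured Fourier magnitudes (M x M)
    support : boolean mask, True where the image may be nonzero
    """
    fx = np.fft.fft2(x_k)
    # Impose the measured magnitude while keeping the current Fourier phase.
    x_prime = np.real(np.fft.ifft2(y * fx / (np.abs(fx) + eps)))
    # gamma: indices where x_prime violates the space-domain constraints
    # (assumed here: outside the support, or negative).
    violated = (~support) | (x_prime < 0)
    return np.where(violated, x_k - beta * x_prime, x_prime)

def hio(y, support, x0, n_iter, beta=0.9):
    """Run n_iter HIO updates starting from x0."""
    x = x0
    for _ in range(n_iter):
        x = hio_step(x, y, support, beta)
    return x
```

With noise-free magnitudes, an image that already satisfies all constraints is left unchanged by `hio_step`, which is a convenient sanity check for an implementation.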

3.2 DNN-based Methods for Inverse Problems

In the last decade, DNNs have been successfully used for the solution of various inverse problems including denoising, deconvolution, and superresolution [22]. There are two main approaches in utilizing DNNs for solving inverse problems.

In the first class of approaches, a DNN is used to reconstruct the unknown image directly from an available measurement or from an initial estimate obtained with a simple model-based inversion. That is, these approaches exploit DNNs either to solve end-to-end inverse problems or to improve a rough estimate that may have artifacts or errors. For this purpose, a DNN is trained by minimizing a loss function using a dataset containing the ground truth images and the measurements (or the initial estimates). In general, this approach provides a faster reconstruction than a model-based inversion approach since it works in a non-iterative feed-forward fashion to solve the problem. However, a DNN usually needs specialized training and dataset for each inverse problem, which reduces its flexibility to handle different inverse problems. More importantly, this approach works successfully only when the measurements or the initial estimates used for reconstruction are similar in appearance to the ground truth images. This approach has been used to solve several inverse problems in imaging applications such as denoising [23], deconvolution [24, 35], superresolution [25, 36], tomography [26], holographic image reconstruction [3], phase retrieval for phase objects [27], and Fourier ptychography [28].

In the second class of approaches, DNNs are utilized for the regularization of model-based inversion methods. The classical approach to regularization is to formulate the inverse problem as a maximum a posteriori (MAP) estimation problem by incorporating the prior statistical knowledge about the unknown image. This yields an optimization problem involving a likelihood term, which quantifies the fidelity with the model, and a prior term, which is also known as the regularization term [37]. By variable splitting techniques, this optimization problem can be divided into sub-problems to deal with the likelihood and prior terms separately. In particular, the sub-problem containing the prior term corresponds to a denoising problem, which can be solved with any denoising algorithm. This is the main idea in plug-and-play regularization [29, 30]. Recently, DNN-based denoisers have been used for plug-and-play regularization because DNNs provide state-of-the-art performance in denoising. Plug-and-play regularization using DNN-based denoisers is a flexible model-based approach since the same denoiser can be used for the solution of different inverse problems. This approach has been applied to several inverse problems including deconvolution, denoising, superresolution, and demosaicking [31, 32, 33, 34], as well as phase retrieval [17, 38].

4 DNN-based Iterative Approach

Our deep learning-based hybrid approach utilizes DNNs together with the HIO method. The main idea in our approach is to use the HIO method to directly incorporate the physical model and the constraints into the reconstruction, and DNNs to improve the resulting HIO reconstructions. The first DNN, namely DNN-1, is trained to remove the artifacts of the initial HIO reconstructions, and is used iteratively with the HIO method to generate an intermediate reconstruction. Then, a second DNN, namely DNN-2, is trained to remove the remaining artifacts after this iterative stage. The output of DNN-2 is the final reconstruction of our method. The overall approach is illustrated in Fig. 1 using representative images for the input and output of each step. A preliminary version of this iterative approach was presented in [39].

As shown in Fig. 1, the approach consists of three stages: the initialization stage, the iterative DNN-HIO stage, and the final DNN stage. The initialization stage aims to achieve robustness to initialization. To this end, HIO reconstructions are obtained from different random initializations, and the one whose Fourier magnitude is closest to the given measurement is chosen as the input (initialization) for the iterative stage. In the iterative DNN-HIO stage, a DNN and the HIO method are used iteratively to generate an intermediate reconstruction. This DNN is trained using the HIO reconstructions at the output of the initialization stage together with the ground truth images. Hence this training aims to remove the HIO artifacts at the output of the initialization stage, although this can be achieved only to some extent. After this iterative stage, the intermediate reconstructions have fewer artifacts than the initial HIO reconstructions. In the final DNN stage, the intermediate reconstructions are used with the ground truth images to train a second DNN to remove the remaining artifacts.

For both DNNs, the modified U-net architecture developed in [26] is used. This architecture, which is shown in Fig. 2, works in a non-iterative feed-forward fashion to solve general inverse problems in imaging. In particular, in [26], it is used to obtain reconstructions for computed tomography. Here we use the same architecture in an iterative manner with the HIO method to solve the phase retrieval problem.

This architecture is the modified version of the original U-net architecture [40]. The original U-net is developed for biomedical image segmentation and it exploits encoding and decoding convolutional layers with skip connections between symmetric downsampling and upsampling convolutional layers [22]. These features were shown to be useful for solving many inverse problems including denoising [41], image inpainting [42], optical flow estimation [43] and computed tomography [26]. In addition to these features, the modified U-net architecture contains batch normalization layers and direct skip connection between the input and output. These modifications help the DNN to better learn the residual between the input and output images [23].

In what follows, we provide the details of each stage in our approach.

4.1 Initialization Stage

Due to the nonlinearity (and non-convexity) of the phase retrieval problem, reconstruction algorithms are generally sensitive to initialization. Here, to increase the robustness of our approach, the initialization procedure described in [17] is used. In this procedure, the HIO method is first run with $m$ different random initializations for a small number of iterations, $s$. Then, the reconstruction $\mathbf{\hat{x}}$ with the lowest residual ${\left\|\mathbf{y-|F\hat{x}|}\right\|}^{2}_{2}$ is used to start another HIO run with a larger number of iterations, $n$. The resulting reconstruction is used as the input (initialization) for the iterative DNN-HIO stage.
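The initialization procedure can be sketched as follows. This is a simplified illustration rather than the code of [17] or of this paper: the helper names (`hio`, `residual`, `initialize`), the uniform random starting images restricted to the support, and the constraint test inside `hio` are assumptions. The defaults for `m`, `s`, and `n` are placeholders that a caller would set to the values used in the experiments.

```python
import numpy as np

def hio(y, support, x0, n_iter, beta=0.9, eps=1e-12):
    """Compact HIO loop (assumed standard form) on M x M arrays."""
    x = x0
    for _ in range(n_iter):
        fx = np.fft.fft2(x)
        x_prime = np.real(np.fft.ifft2(y * fx / (np.abs(fx) + eps)))
        violated = (~support) | (x_prime < 0)
        x = np.where(violated, x - beta * x_prime, x_prime)
    return x

def residual(y, x_hat):
    """Squared residual || y - |F x_hat| ||_2^2 used to rank candidates."""
    return float(np.sum((y - np.abs(np.fft.fft2(x_hat))) ** 2))

def initialize(y, support, rng, m=50, s=50, n=1000):
    """Run m short HIO trials, keep the best one, then refine it for n iterations."""
    candidates = []
    for _ in range(m):
        # Assumption: random start drawn uniformly in [0, 1) on the support.
        x0 = rng.uniform(0.0, 1.0, y.shape) * support
        candidates.append(hio(y, support, x0, s))
    residuals = [residual(y, c) for c in candidates]
    best = candidates[int(np.argmin(residuals))]
    return hio(y, support, best, n), residuals
```

Returning the list of residuals alongside the refined reconstruction makes it easy to inspect how much the random trials differ, which is the sensitivity this stage is meant to reduce.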

4.2 Iterative DNN-HIO Stage

As mentioned before, although the HIO method benefits from the physical model and the constraints during the reconstruction process, the results may have artifacts and errors caused mostly by the presence of noise or being stuck in local minima. In this stage, a DNN (namely DNN-1) and the HIO method are used alternately to solve the phase retrieval problem.

DNN-1 is trained to remove the artifacts of the HIO method after the initialization stage. That is, DNN-1 is trained by using a dataset containing the true images and their corresponding HIO reconstructions at the output of the initialization stage. Then, the HIO method and the trained DNN are used in an iterative manner until the reconstructions change only slightly between consecutive iterations. This iterative approach aims to improve the reconstructions by escaping from local minima and reducing artifacts.

More specifically, at the $k^{th}$ iteration of this stage, the last HIO reconstruction, $\mathbf{x}_{k}$ , is used as the input for DNN-1. Then, the improved reconstruction, $\mathbf{u}_{k}$ , at the output of DNN-1 is used as the initialization for the HIO method, which is run for a small number of $t$ iterations. This iterative procedure continues until the normalized error between two consecutive DNN-1 reconstructions, i.e. $\left\|\mathbf{u}_{k}-\mathbf{u}_{k-1}\right\|_{2}/\left\|\mathbf{u}_{k}\right\|_{2}$ , is smaller than $10^{-3}$ .
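The alternation and its stopping rule can be sketched as below. The trained network is represented by an arbitrary callable `dnn1`, which is a stand-in and not the modified U-net itself; the cap `max_outer` on the number of outer iterations and the choice to run HIO on $\mathbf{u}_{k}$ before testing convergence are assumptions of this sketch.

```python
import numpy as np

def hio(y, support, x0, n_iter, beta=0.9, eps=1e-12):
    """Compact HIO loop (assumed standard form) on M x M arrays."""
    x = x0
    for _ in range(n_iter):
        fx = np.fft.fft2(x)
        x_prime = np.real(np.fft.ifft2(y * fx / (np.abs(fx) + eps)))
        violated = (~support) | (x_prime < 0)
        x = np.where(violated, x - beta * x_prime, x_prime)
    return x

def iterative_dnn_hio(x0, y, support, dnn1, t=5, tol=1e-3, max_outer=100):
    """Alternate dnn1 and t HIO iterations until consecutive dnn1 outputs agree.

    Returns the last HIO reconstruction and the number of outer iterations used.
    """
    x, u_prev = x0, None
    for k in range(1, max_outer + 1):
        u = dnn1(x)                      # u_k: artifact-reduced estimate
        x = hio(y, support, u, t)        # HIO initialized with u_k, run for t iterations
        if u_prev is not None:
            # Normalized change ||u_k - u_{k-1}||_2 / ||u_k||_2
            change = np.linalg.norm(u - u_prev) / max(np.linalg.norm(u), 1e-12)
            if change < tol:
                break
        u_prev = u
    return x, k
```

Returning the HIO output rather than the last `dnn1` output mirrors the statement in the text that the final HIO reconstruction is passed on to the last stage.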

As the iterations proceed, the reconstructions of both DNN-1 and HIO improve. In particular, compared to DNN-1, the HIO method better preserves the high spatial frequencies of the original image, which represent sudden spatial changes in the image, while DNN-1 provides reconstructions with fewer artifacts. There are two main reasons for this. First, DNNs generally smooth out the high frequencies during their learning process when they are trained with a mean squared error (MSE) based loss, which is a common problem in DNNs [22]. Moreover, the main task of DNN-1 here is to remove the large artifacts, which inherently comes with the side effect of smoothing (i.e. low-pass filtering). Second, unlike DNN-1, the HIO method uses the available measurements together with the forward model, which helps to preserve high frequencies, although it comes with artifacts. The final HIO reconstruction is used as the input for the last stage in order to preserve high frequencies in the final reconstruction.

4.3 Final DNN Stage

In this last stage, a second DNN (namely DNN-2) is used to improve the reconstruction of the iterative DNN-HIO stage by removing the remaining artifacts. The reason for using a different DNN here is that DNN-1 is trained to remove the HIO artifacts at the output of the initialization stage, but the reconstructions of the iterative DNN-HIO stage have fewer artifacts than before (for example, see Fig. 1). Hence training another DNN makes it possible to obtain improved reconstructions with better preserved high frequencies (details) and reduced artifacts.

DNN-2 is trained to remove the artifacts of the iterative DNN-HIO stage. That is, DNN-2 is trained by using a dataset containing the same ground truth images and the corresponding HIO reconstructions at the output of the iterative DNN-HIO stage. As mentioned before, an MSE-based loss function is used for training, but different loss functions could also be utilized to better preserve high frequencies. This trained DNN is used in a non-iterative feed-forward fashion to obtain the final reconstruction of our method.

5 Numerical Results

Here we present numerical simulations to illustrate the effectiveness of our approach. For this, we consider a large image dataset and compare the reconstruction performance of the developed approach with the classical and state-of-the-art phase retrieval methods.

To compare the algorithms in terms of noise tolerance, image generality, and computational efficiency, the reconstruction performance is investigated using two different kinds of images, which are called natural and unnatural images. For training DNN-1 and DNN-2, only natural images are used. This training dataset consists of $3000$ natural images. These include $200$ training and $100$ validation images of the Berkeley segmentation dataset (BSD) [44], $400$ selected images from the validation set of the ImageNet database [45, 31], and $2300$ randomly chosen images of the Waterloo Exploration Database [46].

For testing, both natural and unnatural images are used. This test dataset consists of $236$ images containing $230$ natural and $6$ unnatural images. These include $200$ test images of BSD, $24$ Kodak dataset images [47], and $6$ natural and $6$ unnatural images taken from [17]. The unnatural image dataset consists of images acquired by scanning electron microscopes and telescopes, as shown in Fig. 3. The pixel values of all images are between $0$ and $255$, and all images are of size $256\times 256$.

The noisy Fourier measurements were simulated using Eqn. 1 with $\alpha=3$ , resulting in an average SNR of $31.84$ dB (where SNR $=10\log(\left\||\mathbf{Fx}|^{2}\right\|_{2}/\left\|\mathbf{y^{2}-|Fx|^{2}}\right\|_{2})$ ). These measurements were used to obtain the initial HIO reconstructions at the output of the initialization stage. DNN-1 was trained using these reconstructions and the true images. Likewise, DNN-2 was trained using the true images and the HIO reconstructions of the iterative DNN-HIO stage. Although only natural images were used in training, the developed approach with the trained DNNs was tested using both natural and unnatural images.
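The SNR definition quoted above can be written as a small helper. This is a sketch under two assumptions: the logarithm is base 10 (as is usual for dB values), and the ratio is taken between the norms themselves, exactly as written in the text, rather than between squared norms. The function name `measurement_snr_db` is illustrative.

```python
import numpy as np

def measurement_snr_db(clean_intensity, noisy_intensity):
    """SNR = 10 log10( || |Fx|^2 ||_2 / || y^2 - |Fx|^2 ||_2 ) in dB.

    clean_intensity : noise-free Fourier intensities |Fx|^2
    noisy_intensity : measured intensities y^2
    """
    clean = np.asarray(clean_intensity, dtype=float)
    noisy = np.asarray(noisy_intensity, dtype=float)
    signal_norm = np.linalg.norm(clean)          # Euclidean norm over all entries
    noise_norm = np.linalg.norm(noisy - clean)
    return 10.0 * np.log10(signal_norm / noise_norm)
```

Averaging this quantity over a set of simulated measurements gives a per-noise-level figure comparable in spirit to the average SNR values reported for each $\alpha$.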

Training was performed by minimizing the MSE-based loss between the true images and the output of each DNN. The stochastic gradient descent algorithm with momentum was used for the optimization [48]. All computations were done using MATLAB with the MatConvNet toolbox [49] and an NVIDIA GeForce GTX TITAN X GPU. The total training times for DNN-1 and DNN-2 were about $38$ hours (for $251$ iterations) and $51$ hours (for $201$ iterations), respectively.

In the initialization stage, the HIO method was first run with $m=50$ different random initializations for $s=50$ iterations. Then, the reconstruction with the lowest residual was used for another HIO run for $n=1000$ iterations. The resulting reconstruction was input to the iterative DNN-HIO stage as shown in Fig. 1. In this stage, the HIO method was run for $t=5$ iterations each time.

After the testing phase, the reconstructions of the developed approach were compared with the true images using peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) [50]. For comparison, the reconstructions of the HIO method and prDeep [17], one of the state-of-the-art deep learning-based phase retrieval algorithms, were also obtained. Both the developed algorithm and prDeep were initialized with the output of the initialization stage. The HIO reconstruction used for comparison was the output of this initialization stage.
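For reference, PSNR can be computed as in the following sketch. It uses the common definition $10\log_{10}(\mathrm{peak}^{2}/\mathrm{MSE})$ with a peak value of $255$, matching the pixel range stated for the test images; whether the authors used exactly this variant is an assumption. SSIM is omitted because it involves several implementation choices (window, constants) that the text does not specify.

```python
import numpy as np

def psnr_db(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal size."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)      # mean squared error over all pixels
    if mse == 0:
        return float("inf")              # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, two images that differ by exactly one gray level at every pixel have an MSE of 1 and therefore a PSNR of about 48.13 dB.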

In Table 1, the average reconstruction performance of the algorithms for $236$ test images and $5$ Monte Carlo runs is given for different amounts of Poisson noise ( $\alpha=2,3,4$ ). As seen in the table, for all cases, the developed method outperforms the HIO and prDeep methods in terms of both PSNR and SSIM, while requiring little additional runtime compared to HIO. As another benchmark, the results at the output of DNN-1 and the iterative DNN-HIO stage are also provided in the table to show the performance gains obtained by the iterative approach. The results illustrate that, by utilizing a DNN in an iterative manner with the HIO method, many of the HIO artifacts can be successfully removed while preserving the image characteristics. This iterative approach with the additional DNN (DNN-2) is the overall method, which provides the best reconstruction performance.

Sample reconstructions for a natural image in the test dataset are shown in Fig. 4. As seen from the figures, the developed approach provides the best reconstruction visually as well as in terms of used quantitative image quality measures (PSNR and SSIM). In fact, our approach generally does not introduce artifacts and errors like the HIO and prDeep methods. As mentioned before, removing artifacts sometimes causes the side effect of smoothing, as illustrated with the zoomed results given in Fig. 4.

For the same test image, Fig. 5 shows several intermediate reconstructions obtained with the developed approach. The reconstructions at the output of each stage, including the initialization stage, the iterative DNN-HIO stage, and the final DNN stage, are shown here, together with their respective PSNR and SSIM values. These clearly illustrate the contribution of each stage. For example, the improvement obtained with the final DNN-2 stage can be understood by comparing the final reconstruction in Fig. 5f with the reconstructions at the output of the iterative stage as given in Figures 5d and 5e. In fact, this final reconstruction is much better than all the other reconstructions both visually and quantitatively. Moreover, to demonstrate the usefulness of the iterative use of HIO with DNN-1, the reconstructions obtained after the first iteration are also provided in Figures 5b and 5c. Comparing these with Figures 5d and 5e illustrates that, although even a single iteration helps to improve the initial HIO reconstruction, iterating until convergence can provide a much more significant improvement. Note that after DNN-1 (see Fig. 5b), the reconstruction suffers from over-smoothing, and when this is input to HIO (see Fig. 5c) some high frequency information is recovered but with artifacts. As the iterations proceed, both over-smoothing and artifacts are reduced.

To assess the performance of different algorithms in terms of image generality, the results for both natural and unnatural test images are separately provided in Table 1. As seen in the table, although the DNNs were trained by using only natural images, the developed method shows the best reconstruction performance not only for natural images but also for unnatural images, which have distinct statistics from natural images. In particular, the performance of the prDeep method substantially degrades for unnatural images, as expected, since its reconstruction relies on a regularization prior learned from natural images. To illustrate these points, sample reconstructions for an unnatural image in the test dataset are shown in Fig. 6.

The developed approach also appears to be robust to different noise levels. As seen from the table, the reconstruction performance of the approach surpasses the other methods for different noise levels ( $\alpha=2,4$ ) as well, even though the DNNs were trained only for a specific noise level ( $\alpha=3$ ).

As mentioned before, phase retrieval algorithms are generally sensitive to initialization because of the nonlinearity involved in the problem. To illustrate the robustness of the developed approach to different initializations and image characteristics, the PSNR and SSIM histograms are provided in Fig. 7 for each method (when $\alpha=3$ ). These include reconstructions obtained with $236$ distinct test images and $5$ Monte Carlo runs, which means that $5$ different initializations are used for each test image. As seen from the histograms, although the histogram for the prDeep reconstructions has more counts at higher PSNR and SSIM values, our method attains a higher average PSNR and SSIM, as well as a smaller spread around these averages. These results suggest that the performance of the developed approach is more robust to different initializations and image statistics compared to HIO and prDeep.

Sample reconstructions illustrating the performance of the developed approach for different initializations are shown in Fig. 8. Here, different HIO reconstructions of the same image are used as the initialization for prDeep and the developed method. As seen, for the HIO initialization with the lower PSNR and SSIM values, the prDeep reconstruction has more artifacts than that of the developed method. Hence, Figures 7 and 8 together demonstrate that the developed method is more robust to initialization than prDeep.

The average runtime of each method is also given in Table 1. As seen, HIO and the developed method are roughly three times faster than prDeep. In fact, the HIO initialization stage accounts for approximately $92\%$ of the runtime of the developed method. Hence, our approach not only outperforms the prDeep and HIO methods in terms of reconstruction quality, but is also computationally more efficient than prDeep and achieves a computational efficiency almost comparable with that of the HIO method.

6 Conclusions

In this paper, we developed a phase retrieval approach that utilizes two DNNs with the model-based HIO method. The key idea in the approach is the iterative use of a DNN with the HIO method, which simultaneously incorporates the physical model and the constraints into the solution while avoiding reconstruction artifacts. The performance of the developed approach was compared with classical and state-of-the-art methods through various numerical simulations. The results demonstrate the effectiveness of our approach in terms of both reconstruction quality and computational efficiency. Our approach not only achieves state-of-the-art reconstruction performance but is also more robust to initialization, different noise levels, and image statistics. Moreover, the developed approach achieves a computational efficiency almost comparable with that of the HIO method.

Note that the developed method contains two DNNs, DNN-1 and DNN-2, each of which is trained to remove HIO artifacts. That is, DNN-1 is trained to remove the artifacts of HIO reconstructions at the output of the initialization stage, and DNN-2 is trained to remove the artifacts of HIO reconstructions at the output of the iterative DNN-HIO stage. These reconstructions have different amounts of artifacts, as one can observe from the PSNR and SSIM values in Table 1. One would expect the trained weights of these two DNNs to differ substantially, since each DNN is trained to remove a different amount of HIO artifacts. To explore this, we analyzed the frequency response of the first $64$ convolution filters in each trained DNN. Almost half of these filters in DNN-1 have similar characteristics, which effectively correspond to low-pass filters. On the other hand, the DNN-2 filters have varying frequency responses, and only a very small fraction of them are low-pass. This indicates that the fine details distinguishing the input from the desired output image (such as edges) are largely lost in the first convolution filters of DNN-1 and do not propagate far through the network. This is expected, since DNN-1 is trained using inputs with a larger amount of HIO artifacts. The low-pass behavior of many of the input filters of DNN-1 may be the reason why DNN-1 is less successful in learning the details and produces over-smoothed images at its output. A more detailed analysis of the filters in the DNNs would provide a better understanding of the developed approach and will be a topic for future study. Joint training of the DNNs can be another promising research direction in this respect.
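A filter analysis of this kind can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the paper: the first-layer kernels are assumed to be available as an array of shape (number of filters, height, width), and the low-pass criterion (mean magnitude response inside a radial cutoff at least `factor` times the mean outside it) together with the values of `size`, `cutoff`, and `factor` are hypothetical choices.

```python
import numpy as np

def frequency_response(kernel, size=32):
    """Magnitude of the zero-padded 2D DFT of a kernel, with DC at the center."""
    return np.abs(np.fft.fftshift(np.fft.fft2(kernel, s=(size, size))))

def is_low_pass(kernel, size=32, cutoff=0.25, factor=2.0):
    """Heuristic (assumed) test: low-frequency response dominates the rest."""
    mag = frequency_response(kernel, size)
    freqs = np.fft.fftshift(np.fft.fftfreq(size))   # cycles per sample
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    radius = np.sqrt(fx ** 2 + fy ** 2)
    low = mag[radius <= cutoff].mean()
    high = mag[radius > cutoff].mean()
    return bool(low > factor * high)

def count_low_pass(filters, **kwargs):
    """Number of kernels in a (n_filters, height, width) array flagged as low-pass."""
    return sum(is_low_pass(k, **kwargs) for k in filters)
```

For example, a 3x3 averaging kernel is flagged as low-pass by this heuristic, whereas a 3x3 Laplacian kernel is not. Applying `count_low_pass` to the first-layer weights of each network would give a rough count of the kind described above, although the resulting numbers depend on the chosen cutoff and factor.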

To conclude, the developed hybrid method offers state-of-the-art reconstruction performance as well as computational efficiency for the phase retrieval problem. We believe that the hybrid use of DNNs with model-based approaches, for example in the iterative manner illustrated in this paper, may play a key role in developing more reliable algorithms for phase retrieval and for nonlinear inverse problems in general.
