GAN-based Projector for Faster Recovery with Convergence Guarantees in Linear Inverse Problems
Ankit Raj, Yuqi Li, Yoram Bresler

TL;DR
This paper introduces a GAN-based projector for linear inverse problems that accelerates recovery by 60-80 times, guarantees convergence under certain conditions, and reduces measurement requirements by 5-10 times, applicable across various tasks.
Contribution
It proposes a network-based projector for PGD that speeds up GAN-based recovery, with theoretical convergence guarantees and a method for designing measurement matrices, applicable to multiple inverse problems.
Findings
Achieves 60-80x faster recovery than previous GAN methods.
Requires 5-10x fewer measurements for similar accuracy.
Provides convergence guarantees under moderate conditioning.
Table 1: Average run time for recovering a single CelebA image from m compressed sensing measurements. The speed-up of NPGD relative to CSGM is given in parentheses.

| m | CSGM* | PGD-GAN | NPGD |
|---|---|---|---|
| 200 | 5.8 | 66 | 0.09 (64×) |
| 500 | 6.6 | 60 | 0.10 (66×) |
| 1000 | 8.0 | 63 | 0.11 (72×) |
| 2000 | 11.2 | 61 | 0.14 (80×) |

*Run time includes 2 initializations for CelebA, as implemented by the authors. The same number of initializations for CelebA (and 10 for MNIST) was used to produce the results in figures 2, 7, 8, and 9. Our NPGD algorithm uses only a single, deterministic initialization.
University of Illinois at Urbana-Champaign, USA
{ankitr3, yuqil3, ybresler}@illinois.edu
Ankit Raj and Yuqi Li contributed equally. Ankit Raj and Yoram Bresler's research work was supported in part by the National Science Foundation under Grant IIS 14-47879. Yuqi Li and Yoram Bresler's research work was supported in part by Sandia National Laboratories under Grant ID: AE056, IP: 00371547.
Abstract
A Generative Adversarial Network (GAN) with a generator trained to model the prior of images has been shown to perform better than sparsity-based regularizers in ill-posed inverse problems. Here, we propose a new method of deploying a GAN-based prior to solve linear inverse problems using projected gradient descent (PGD). Our method learns a network-based projector for use in the PGD algorithm, eliminating the expensive computation of the Jacobian of the generator $G$. Experiments show that our approach provides a speed-up of 60-80× over earlier GAN-based recovery methods along with better accuracy. Our main theoretical result is that if the measurement matrix is moderately conditioned on the manifold $\mathrm{range}(G)$ and the projector is $\delta$-approximate, then the algorithm is guaranteed to reach $O(\delta)$ reconstruction error in $O(\log(1/\delta))$ steps in the low-noise regime. Additionally, we propose a fast method to design such measurement matrices for a given $G$. Extensive experiments demonstrate the efficacy of this method, which requires 5-10× fewer measurements than random Gaussian measurement matrices for comparable recovery performance. Because the learning of the GAN and projector is decoupled from the measurement operator, our GAN-based projector and recovery algorithm are applicable without retraining to all linear inverse problems, as confirmed by experiments on compressed sensing, super-resolution, and inpainting.
1 Introduction
Many applications, such as computational imaging and remote sensing, fall within the compressive sensing (CS) paradigm. CS [9, 5] refers to projecting a high-dimensional, sparse or sparsifiable signal $x \in \mathbb{R}^n$ to a lower-dimensional measurement $y \in \mathbb{R}^m$, $m \ll n$, using a small set of linear, non-adaptive frames. The noisy measurement model is:
$$y = Ax + e, \qquad (1)$$
where the measurement matrix $A \in \mathbb{R}^{m \times n}$ is often a random matrix and $e \in \mathbb{R}^m$ is the noise. In this work, we are interested in recovering the unknown natural signal $x$ from the compressed measurement $y$, given the measurement matrix $A$. Traditionally, natural images have been modeled as sparse in some fixed or learnable basis [11, 8, 36, 22, 7, 38, 10, 21].
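To make the measurement model (1) concrete, here is a minimal NumPy sketch that simulates $y = Ax + e$ with a random Gaussian $A$. The entry variance $1/m$, the signal size, and the helper name `gaussian_measurements` are illustrative assumptions rather than settings taken from this paper.

```python
import numpy as np

def gaussian_measurements(x, m, noise_std=0.0, seed=0):
    """Simulate y = A x + e with A_ij ~ N(0, 1/m) and e ~ N(0, noise_std^2 I).

    The 1/m variance (an assumption here) makes E[||A x||^2] = ||x||^2.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
    e = noise_std * rng.standard_normal(m)
    return A, A @ x + e

# Example: a 4096-dimensional signal compressed to m = 200 measurements.
x = np.random.default_rng(1).standard_normal(4096)
A, y = gaussian_measurements(x, m=200)
print(A.shape, y.shape)
```

With `noise_std=0` the sketch returns the noiseless measurement $Ax$ exactly.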
Instead of the sparse prior commonly adopted in the CS literature, we turn to a learned prior. Neural-network-based inverse problem solvers have been explored recently [14, 35, 31, 1, 12, 15, 25, 32, 22, 37, 26]. However, [1, 12, 15, 25] use information about the measurement matrix while training the network. Thus, their algorithms are limited to the particular set-up of a specific inverse problem and usually cannot solve other problems without retraining. Another line of work [28, 29] jointly optimizes the measurement matrix and the recovery algorithm, again resulting in algorithms limited to a particular inverse problem and measurement matrix. In contrast, in this paper the network is trained independently of $A$ and generalizes across different inverse problems. This property is shared by two other neural-network-based solvers [35, 31]; however, they model the image prior only implicitly, by training a denoiser or a proximal map, and perhaps for this reason appear to require a massive quantity of training samples. Importantly, very little is known about why and when they perform well: even if the learned proximal map is assumed to be exact, there is no theoretical convergence guarantee or bound on the recovery error.
In this work, we leverage the success of generative adversarial networks (GANs) [13, 6, 42, 39, 3, 20] in modeling data distributions. Indeed, GAN-based priors for natural images have been successfully employed to solve linear inverse problems [24, 4, 33]. However, in [24], the measurement operator is integrated into the training of the GAN, limiting it to a particular inverse problem. We therefore focus our extensive comparisons on the recent papers [4, 33], which are closest to our work.
Bora et al. [4] provide no guarantee on the convergence of their algorithm for solving the non-convex optimization problem, which requires several random initializations. Similarly, in [33], the inner loop uses gradient descent to solve a non-convex optimization problem with no guarantee of convergence to a global optimum. Furthermore, the conditions imposed in [33] on the random Gaussian measurement matrix for convergence of the outer iterative loop are unnecessarily stringent and cannot be achieved with a moderate number of measurements. Importantly, both methods require expensive computation of the Jacobian $\partial G/\partial z$ of the differentiable generator $G$ with respect to the latent input $z$. Since computing $\partial G/\partial z$ involves back-propagation through $G$ at every iteration, these reconstruction algorithms are computationally expensive and remain slow even when implemented on a GPU.
We propose a GAN-based projection network to solve compressed sensing recovery problems using projected gradient descent (PGD). We are able to reconstruct the image even with compression ratio (i.e., with less than of a full measurement set) using a random Gaussian measurement matrix. The proposed approach provides superior recovery accuracy over existing methods, together with a 60-80× speed-up, making the algorithm useful for practical applications. We also provide theoretical results on the convergence of the reconstruction error, given that the measurement matrix satisfies certain conditions when restricted to the range of the generator. We complement the theory by proposing a method to design a measurement matrix that satisfies these sufficient conditions for guaranteed convergence. We assess these sufficient conditions for both the random Gaussian measurement matrix and the designed matrix for a given dataset. Both our analysis and experiments show that with the designed matrix, 5-10× fewer measurements suffice for robust recovery. Because the training of the GAN and projector is decoupled from the measurement operator, we demonstrate that other linear inverse problems, such as super-resolution and inpainting, can also be solved using our algorithm without retraining.
2 Problem Formulation
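Let $x^* \in \mathbb{R}^n$ denote a ground-truth image, $A \in \mathbb{R}^{m \times n}$ a fixed measurement matrix, and $y = Ax^* + e \in \mathbb{R}^m$ the noisy measurement, with noise $e \in \mathbb{R}^m$. We assume that the ground-truth images lie in a non-convex set $\mathcal{S} = \mathrm{range}(G)$, the range of the generator $G$. The maximum likelihood estimator (MLE) of $x^*$, $\hat{x}_{\mathrm{MLE}}$, can be formulated as follows: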
$$\hat{x}_{\mathrm{MLE}} = \arg\min_{x \in \mathcal{S}} \|y - Ax\|_2^2. \qquad (2)$$
Bora et al. [4] (whose algorithm we denote by CSGM) solve the optimization problem in the latent space of $z$ and set $\hat{x} = G(\hat{z})$. Their gradient descent algorithm often gets stuck at local optima. Since the problem is non-convex, the reconstruction depends strongly on the initialization of $z$ and requires several random initializations to converge to a good point. To resolve this problem, Shah and Hegde [33] proposed a projected gradient descent (PGD)-based method (which we call PGD-GAN) to solve (2), shown in fig. 2(a). They perform gradient descent in the ambient ($x$-)space and project the updated term onto $\mathcal{S} = \mathrm{range}(G)$. This projection involves solving another non-convex minimization problem (shown in the second box in fig. 2(a)) using the Adam optimizer [17] for 100 iterations from a random initialization. No convergence result is given for this iterative algorithm that performs the non-linear projection, and the convergence analysis for the PGD-GAN algorithm [33] only holds if one assumes that the inner loop succeeds in finding the optimal projection.
Our main idea in this paper is to replace this iterative scheme in the inner loop with a learning-based approach, which often performs better and does not fall into local optima [42]. Another important benefit is that both earlier approaches require expensive computation of the Jacobian of $G$, which is eliminated in the proposed approach.
3 Proposed Method
In this section, we introduce our methodology and architecture to train a projector using a pre-trained generator, and show how we use this projector to obtain the minimizer in (2).
3.1 Inner-Loop-Free Scheme
We show that by carefully designing a network architecture with a suitable training strategy, we can train a projector onto $\mathcal{S}$, the range of the generator $G$, thereby removing the inner loop required in the earlier approach. The resulting iterative updates of our network-based PGD (NPGD) algorithm are shown in fig. 2(b). This approach eliminates the need to solve the non-convex optimization problem in the inner loop, which depends on initialization and requires several restarts. Furthermore, our method provides a significant speed-up, by a factor of 60-80× on the CelebA dataset, for two major reasons: (i) since there is no inner loop, the total number of iterations required for convergence is significantly reduced; (ii) it does not require computation of $\partial G/\partial z$, i.e., the Jacobian of the generator with respect to its input $z$. This expensive operation repeats back-propagation through the network a number of times equal to the number of restarts times the number of iterations (for [4]), or the number of outer iterations times the number of inner iterations (for [33]).
3.2 Generator-based Projector
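A GAN consists of two networks, a generator and a discriminator, which follow an adversarial training strategy to learn the data distribution. A well-trained generator $G$ takes in a random latent variable $z \in \mathbb{R}^k$ and produces sharp-looking images in $\mathbb{R}^n$ that imitate the training data distribution. The goal is to train a network that projects an image $x \in \mathbb{R}^n$ onto $\mathcal{S} = \mathrm{range}(G)$. A projector $P_{\mathcal{S}}$ onto a set $\mathcal{S}$ should satisfy two main properties: (i) idempotence: for any point $x$, $P_{\mathcal{S}}(P_{\mathcal{S}}(x)) = P_{\mathcal{S}}(x)$; (ii) least distance: for any point $\tilde{x}$, $P_{\mathcal{S}}(\tilde{x}) = \arg\min_{x \in \mathcal{S}} \|x - \tilde{x}\|_2$. Figure 3 shows the network structure we used to train a projector using a GAN. We define the multi-task loss to be: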
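$$\mathcal{L}(\theta) = \mathbb{E}_{z,\nu}\Big[\big\|G\big(G^{\dagger}_{\theta}(G(z)+\nu)\big) - G(z)\big\|_2^2 + \lambda\,\big\|G^{\dagger}_{\theta}(G(z)+\nu) - z\big\|_2^2\Big],$$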
where $G$ is a generator obtained from the GAN trained on a particular dataset. The operator $G^{\dagger}_{\theta}: \mathbb{R}^n \to \mathbb{R}^k$, parameterized by $\theta$, approximates a non-linear least-squares pseudo-inverse of $G$, and $\nu$ denotes noise added to the generator's output for different $z$, so that the projector network $P_G = G \circ G^{\dagger}_{\theta}$ is trained on points outside $\mathrm{range}(G)$ and learns to project them onto $\mathrm{range}(G)$. The objective function consists of two parts. The first is similar to a standard encoder-decoder framework; however, the loss is minimized only over $\theta$, the parameters of $G^{\dagger}_{\theta}$, while the parameters of $G$ (obtained by standard GAN training) are kept fixed. This ensures that $\mathrm{range}(G)$ does not change and that $P_G$ is a mapping onto $\mathrm{range}(G)$. The second part keeps $G^{\dagger}_{\theta}(G(z)+\nu)$ close to the true $z$ used to generate the training image $G(z)$. This second term can be considered a regularizer for training the projector, with $\lambda$ being the regularization constant.
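The multi-task loss above can be estimated on a mini-batch as sketched below. This is a minimal NumPy illustration under stated assumptions: $\nu$ is taken to be Gaussian with standard deviation `sigma`, $G$ and $G^{\dagger}$ are passed in as plain batch-wise functions, and the toy linear generator in the usage lines is purely hypothetical. A real $G^{\dagger}_\theta$ would be a network whose parameters are updated to minimize this quantity while $G$ stays frozen.

```python
import numpy as np

def projector_loss(G, G_dag, z, sigma, lam, rng):
    """Monte Carlo estimate of the multi-task projector loss on one batch.

    G     : fixed generator, maps (batch, k) latents to (batch, n) images
    G_dag : pseudo-inverse being trained, maps (batch, n) images to (batch, k)
    sigma : std of the (assumed Gaussian) perturbation nu added to G(z)
    lam   : weight lambda of the latent-consistency regularizer
    """
    x_clean = G(z)
    x_noisy = x_clean + sigma * rng.standard_normal(x_clean.shape)
    z_hat = G_dag(x_noisy)
    proj_term = np.sum((G(z_hat) - x_clean) ** 2, axis=1)  # ||G(G_dag(G(z)+nu)) - G(z)||^2
    latent_term = np.sum((z_hat - z) ** 2, axis=1)          # ||G_dag(G(z)+nu) - z||^2
    return float(np.mean(proj_term + lam * latent_term))

# Toy usage: a linear "generator" with orthonormal columns and its exact inverse.
rng = np.random.default_rng(0)
n, k = 64, 8
U, _ = np.linalg.qr(rng.standard_normal((n, k)))
G = lambda z: z @ U.T
G_dag = lambda x: x @ U
z = rng.standard_normal((128, k))
print(projector_loss(G, G_dag, z, sigma=0.0, lam=0.1, rng=rng))  # close to 0
```

With `sigma=0` this toy pair reconstructs $G(z)$ and $z$ exactly up to rounding, so the loss is essentially zero; the noise term is what forces $G^{\dagger}_\theta$ to learn a projection rather than an identity-like inverse on clean samples only.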
4 Theoretical Study
4.1 Convergence Analysis
Let $f(x) = \|y - Ax\|_2^2$ denote the loss function of projected gradient descent. Algorithm 1 describes the proposed network-based projected gradient descent (NPGD) to solve (2).
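Algorithm 1 itself is not reproduced in this text. The loop below is a minimal NumPy sketch of an NPGD-style iteration, assuming the update $w_n = x_n + \eta A^{\top}(y - Ax_n)$ followed by a single application of the projector, $x_{n+1} = G(G^{\dagger}(w_n))$. To keep it runnable, an exact orthogonal projector onto a random $k$-dimensional subspace stands in for the trained $G \circ G^{\dagger}$, and the step size is set to $\eta = 1/\beta$ with $\beta$ the largest restricted eigenvalue on that subspace. The function name `npgd` and all sizes are hypothetical.

```python
import numpy as np

def npgd(y, A, project, x0, eta, n_iter=500):
    """Projected gradient descent with a one-shot (network) projector.

    Each iteration takes a gradient step on f(x) = ||y - A x||^2 (the factor 2
    of the gradient is absorbed into eta) and then applies `project` once;
    there is no inner optimization loop and no Jacobian of G is needed.
    """
    x = x0
    for _ in range(n_iter):
        w = x + eta * A.T @ (y - A @ x)   # gradient step in the ambient space
        x = project(w)                    # stands in for one pass of G(G_dag(w))
    return x

# Toy problem: signals in a k-dimensional subspace, noiseless measurements.
rng = np.random.default_rng(0)
n, k, m = 100, 5, 30
U, _ = np.linalg.qr(rng.standard_normal((n, k)))
project = lambda v: U @ (U.T @ v)          # exact projector onto span(U)
x_star = U @ rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_star

# Restricted eigenvalues of A on the subspace give alpha and beta.
s = np.linalg.svd(A @ U, compute_uv=False)
alpha, beta = s.min() ** 2, s.max() ** 2
x_hat = npgd(y, A, project, x0=np.zeros(n), eta=1.0 / beta)
print(np.linalg.norm(x_hat - x_star) / np.linalg.norm(x_star))
```

Each iteration costs one matrix-vector product with $A$ and $A^{\top}$ plus one projector evaluation, which is the source of the speed-up over methods that back-propagate through $G$.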
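Definition 1 (Restricted Eigenvalue Constraint (REC))
Let $\mathcal{S} \subseteq \mathbb{R}^n$. For some parameters $0 < \alpha < \beta$, a matrix $A \in \mathbb{R}^{m \times n}$ is said to satisfy the $\mathrm{REC}(\mathcal{S}, \alpha, \beta)$ if the following holds for all $x_1, x_2 \in \mathcal{S}$:
$$\alpha\|x_1 - x_2\|^2 \le \|A(x_1 - x_2)\|^2 \le \beta\|x_1 - x_2\|^2.$$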
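Definition 2 (Approximate Projection using GAN)
A concatenated network $P_G = G(G^{\dagger}(\cdot)): \mathbb{R}^n \to \mathrm{range}(G)$ is a $\delta$-approximate projector if the following holds for all $x \in \mathbb{R}^n$:
$$\|x - G(G^{\dagger}(x))\|^2 \le \min_{z} \|x - G(z)\|^2 + \delta.$$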
Theorem 1 provides upper bounds on the cost function and the reconstruction error of our NPGD algorithm after $n$ iterations.
Theorem 1
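Let the matrix $A \in \mathbb{R}^{m \times n}$ satisfy the $\mathrm{REC}(\mathcal{S}, \alpha, \beta)$ with $\beta/\alpha < 2$, and let the concatenated network $P_G = G(G^{\dagger}(\cdot))$ be a $\delta$-approximate projector. Then for every $x^* \in \mathcal{S}$ and measurement $y = Ax^*$, executing Algorithm 1 with step size $\eta = 1/\beta$ will yield $f(x_n) \le \big(\frac{\beta}{\alpha} - 1\big)^n f(x_0) + \frac{\beta\delta}{2 - \beta/\alpha}$. Furthermore, for any constant $C > 0$, the algorithm achieves $\|x_n - x^*\|^2 \le \big(C + \frac{1}{2\alpha/\beta - 1}\big)\delta$ after $\frac{1}{2 - \beta/\alpha}\log\big(\frac{f(x_0)}{C\alpha\delta}\big)$ steps. When $n \to \infty$, $\limsup_{n\to\infty}\|x_n - x^*\|^2 \le \frac{\delta}{2\alpha/\beta - 1}$.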
Proof 1
Please refer to the appendix.
From Theorem 1, one important factor is the ratio $\beta/\alpha$. This ratio largely determines the speed of linear ("geometric") convergence, as well as the reconstruction error at convergence. We would like the ratio $\beta/\alpha$ to be as close to 1 as possible, and we must have $\beta/\alpha < 2$ for convergence. It has been shown in [2] that a random matrix with orthonormal rows satisfies this condition with high probability for a number of measurements $m$ roughly linear in the dimension of the manifold, with log factors dependent on the properties of the manifold, in this case $\mathrm{range}(G)$. However, as we demonstrate later (see figure 4), a random matrix often does not satisfy the desired condition for small or moderate $m$. To extend into such regimes, we next propose a fast heuristic method to find a relatively good measurement matrix for the image set $\mathrm{range}(G)$, given a fixed $m$.
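To see how the ratio $\beta/\alpha$ drives the guarantee, the helper below simply evaluates the two expressions stated in Theorem 1: the number of steps and the resulting bound on $\|x_n - x^*\|^2$. It is a sketch for exploring the formulas; the example values of $\alpha$, $\beta$, $\delta$, $f(x_0)$ and $C$ are arbitrary, and the function name is hypothetical.

```python
import math

def theorem1_bounds(alpha, beta, delta, f0, C=1.0):
    """Evaluate the step count and error bound of Theorem 1 (noiseless case).

    steps       = log(f0 / (C * alpha * delta)) / (2 - beta/alpha), rounded up
    error_bound = (C + 1 / (2*alpha/beta - 1)) * delta, a bound on ||x_n - x*||^2
    """
    kappa = beta / alpha
    if not (1.0 <= kappa < 2.0):
        raise ValueError("Theorem 1 needs alpha <= beta and beta/alpha < 2")
    steps = math.log(f0 / (C * alpha * delta)) / (2.0 - kappa)
    error_bound = (C + 1.0 / (2.0 * alpha / beta - 1.0)) * delta
    return math.ceil(steps), error_bound

# beta/alpha = 1.25: about 10 iterations to reach a squared error of about 2.7e-3.
print(theorem1_bounds(alpha=0.8, beta=1.0, delta=1e-3, f0=1.0))
```

As $\beta/\alpha$ approaches 2, both the $1/(2 - \beta/\alpha)$ factor in the step count and the $1/(2\alpha/\beta - 1)$ factor in the error bound blow up, which is why a well-conditioned $A$ matters.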
4.2 Generator-based Measurement Matrix Design
There have been a few attempts to optimize the measurement matrix based on the specific data distribution. Hegde et al.[16] find a deterministic measurement matrix that satisfies for a given finite set of size , but their time complexity is . Because the secant set (defined later) would be of cardinality for a training set of size , with , the time complexity would be infeasible even for fairly small -pixel images. Furthermore, the final number of required measurements , which is determined by the algorithm, depends on the isometry constant , and cannot be specified in advance. Kvinge et al.[18] introduced a heuristic iterative algorithm to find a measurement matrix with orthonormal rows that satisfies the REC with small ratio, but their time complexity is and the space complexity is , which is infeasible for a high-dimensional image dataset. Instead, our method, based on sampling from the secant set, has time complexity , and space complexity , where is a tiny fraction of .
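Definition 3 (Secant Set)
The normalized secant set of $\mathcal{S} = \mathrm{range}(G)$ is defined as follows:
$$\mathcal{U}(\mathcal{S}) = \Big\{ \frac{x_1 - x_2}{\|x_1 - x_2\|} : x_1, x_2 \in \mathcal{S},\ x_1 \neq x_2 \Big\}, \qquad (6)$$
and the associated distribution is denoted as $\mathbb{P}_{\mathcal{U}}$, where
$$u = \frac{G(z_1) - G(z_2)}{\|G(z_1) - G(z_2)\|} \sim \mathbb{P}_{\mathcal{U}}, \qquad z_1, z_2 \overset{\text{i.i.d.}}{\sim} p_z,$$
with $p_z$ the latent distribution of the GAN.
Given $m$, the optimization over $A$ is as follows:
$$\min_{A \in \mathbb{R}^{m \times n}} \frac{\beta}{\alpha} = \min_{A} \frac{\max_{u \in \mathcal{U}} \|Au\|^2}{\min_{u \in \mathcal{U}} \|Au\|^2} \le \min_{A:\,AA^{\top} = I_m} \frac{1}{\min_{u \in \mathcal{U}} \|Au\|^2}. \qquad (7)$$
The inequality is due to the additional constraint $AA^{\top} = I_m$ on $A$. This constraint makes the largest singular value of $A$ equal to 1, and hence the numerator term, $\max_{u \in \mathcal{U}} \|Au\|^2$, is at most 1. As the minimization in (7) requires iterating through the set $\mathcal{U}$, we use the expected value over $\mathbb{P}_{\mathcal{U}}$ as a surrogate objective:
$$\max_{A:\,AA^{\top} = I_m} \min_{u \in \mathcal{U}} \|Au\|^2 \;\longrightarrow\; \max_{A:\,AA^{\top} = I_m} \mathbb{E}_{u \sim \mathbb{P}_{\mathcal{U}}} \|Au\|^2 \approx \max_{A:\,AA^{\top} = I_m} \frac{1}{N_s} \sum_{i=1}^{N_s} \|Au_i\|^2, \qquad u_i \overset{\text{i.i.d.}}{\sim} \mathbb{P}_{\mathcal{U}}. \qquad (8)$$
The last approximation replaces the surrogate objective by its empirical estimate obtained by sampling $N_s$ secants according to $\mathbb{P}_{\mathcal{U}}$. For $m$ and $N_s$ large enough, this designed measurement matrix would satisfy the condition $\beta/\alpha < 2$ for most of the secants in $\mathcal{U}$. Constructing an $m \times n$ matrix $A$ with orthonormal rows, (8) reduces to:
$$\max_{A:\,AA^{\top} = I_m} \operatorname{tr}\big(A \Sigma A^{\top}\big), \qquad \Sigma = \frac{1}{N_s} \sum_{i=1}^{N_s} u_i u_i^{\top}. \qquad (9)$$
The optimal $A$ in (9) has rows equal to the $m$ leading eigenvectors of $\Sigma$. We compute $\Sigma$ and its eigenvalue decomposition at time complexity $O(N_s n^2 + n^3)$ and space complexity $O(n^2)$.
Our approach to the design of $A$ is related to one of the steps described by [18]; however, by using the sampling-based estimates per (6) and (8) rather than the secant set of the entire training set, we reduce the computational cost by orders of magnitude to a modest level.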
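The design step reduces to an eigendecomposition of the empirical secant second-moment matrix $\Sigma$. The sketch below implements that reading in NumPy: it samples random pairs from an array of generated images, normalizes their differences, and takes the $m$ leading eigenvectors of $\Sigma$ as the rows of $A$. The helper names, the uniform pair-sampling scheme, and the toy linear "generator" in the usage lines are assumptions made for illustration.

```python
import numpy as np

def sample_secants(samples, num_pairs, rng):
    """Normalized differences (x1 - x2)/||x1 - x2|| of random pairs of rows."""
    N = samples.shape[0]
    i = rng.integers(0, N, size=num_pairs)
    j = rng.integers(0, N, size=num_pairs)
    d = samples[i] - samples[j]
    norms = np.linalg.norm(d, axis=1)
    keep = norms > 1e-12                      # drop degenerate pairs (i == j)
    return d[keep] / norms[keep, None]

def design_measurement_matrix(secants, m):
    """Rows of A are the m leading eigenvectors of Sigma = mean of u u^T."""
    Sigma = secants.T @ secants / secants.shape[0]
    _, eigvecs = np.linalg.eigh(Sigma)        # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :m].T          # shape (m, n), orthonormal rows

# Toy usage: "generated images" lying in a 5-dimensional subspace of R^50.
rng = np.random.default_rng(0)
n, k, m = 50, 5, 5
U, _ = np.linalg.qr(rng.standard_normal((n, k)))
samples = rng.standard_normal((500, k)) @ U.T
secants = sample_secants(samples, num_pairs=2000, rng=rng)
A = design_measurement_matrix(secants, m)
vals = np.sum((secants @ A.T) ** 2, axis=1)   # ||A u||^2 for each secant
print(vals.min(), vals.max())                 # both close to 1
```

In this toy case all secants lie in a 5-dimensional subspace, so the designed $A$ with $m = 5$ preserves every secant norm and $\beta/\alpha \approx 1$; for real generators the secants are spread out and the ratio is only approximately controlled.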
4.2.1 REC Histogram for
We analyze the REC by plotting the histogram of $\|Au\|^2$ values for different measurement matrices $A$ in figure 4, where $u$ ranges over the normalized secants of samples from $G$ trained on the MNIST dataset. The left column shows the histograms for the random and the $G$-based designed matrix. For a random $A$, the spread of $\|Au\|^2$ is clearly wider for few measurements $m$, resulting in $\beta/\alpha > 2$. For the designed $A$, the histogram is more concentrated. Even with as few as measurements (for MNIST), the designed $A$ satisfies the sufficient condition $\beta/\alpha < 2$ for convergence of the PGD algorithm, thus ensuring stable recovery. The middle column shows the histograms corresponding to downsampling operators that take the spatial averages of , , pixel values to generate low-resolution images. The right column shows the histograms for inpainting operators that mask out a centered square of various sizes. As expected, the spread increases for more difficult recovery problems. However, for each inverse problem (defined by a matrix $A$), the ratio $\beta/\alpha$ can be estimated over, e.g., 99.9% of the samples, providing, in combination with Theorem 1, an explicit quantitative guarantee.
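A histogram like those in figure 4 only needs the values $\|Au\|^2$ over sampled unit secants. The sketch below computes them and converts them into an empirical $\beta/\alpha$ by taking quantiles, so that a chosen fraction of samples (e.g., 99.9%) falls inside $[\alpha, \beta]$. The symmetric quantile trimming, the function name `empirical_rec_ratio`, and the use of random unit vectors as stand-in secants are our assumptions, not the paper's exact procedure.

```python
import numpy as np

def empirical_rec_ratio(A, secants, coverage=0.999):
    """Empirical beta/alpha from ||A u||^2 over unit secants u (rows of `secants`).

    alpha and beta are the lower and upper quantiles that leave (1 - coverage)/2
    of the samples in each tail; `vals` can be passed to np.histogram.
    """
    vals = np.sum((secants @ A.T) ** 2, axis=1)
    tail = (1.0 - coverage) / 2.0
    alpha = np.quantile(vals, tail)
    beta = np.quantile(vals, 1.0 - tail)
    return beta / alpha, vals

# Toy usage: random unit vectors in R^100 and a random A with orthonormal rows.
rng = np.random.default_rng(0)
n, m = 100, 20
u = rng.standard_normal((5000, n))
u /= np.linalg.norm(u, axis=1, keepdims=True)
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
A_rand = Q.T                                  # (m, n), orthonormal rows
ratio, vals = empirical_rec_ratio(A_rand, u)
counts, edges = np.histogram(vals, bins=50)
print(ratio, ratio < 2)                       # is the Theorem 1 condition met?
```

Replacing `A_rand` by a designed matrix or by a downsampling or masking operator gives the corresponding histograms and ratios.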
5 Experiments
Network Architecture: We implement two GAN architectures: Deep Convolutional GAN (DCGAN) [30] for MNIST and CelebA, and Self-Attention GAN (SAGAN) [41] for the LSUN church-outdoor dataset. DCGAN builds on multiple convolution, transposed convolution, and ReLU layers, and uses batch normalization and dropout for better generalization, whereas SAGAN combines convolutions with self-attention mechanisms in both the generator and the discriminator, allowing long-range dependency modeling to generate images with high-resolution details. For DCGAN, we use the standard adversarial loss, whereas for SAGAN, we minimize the hinge version of the adversarial loss [27]. The architecture of the $G^{\dagger}$ model is similar to that of the discriminator in the GAN and differs only in the final layer, where we add a fully-connected layer whose output size equals the latent variable dimension $k$. For training $G^{\dagger}$, we used the architecture shown in Fig. 3 and the multi-task loss defined in Section 3.2, while keeping the pre-trained $G$ fixed. We found that using , in this loss, gave the best performance. The noise $\nu$ used for perturbing the training images follows . We observed that training with a low noise level results in a projector similar to an identity operator, which hence only projects close-by points onto $\mathrm{range}(G)$, whereas for a high noise level the projector violates idempotence. We empirically set . We then obtain a projection network that approximately projects images lying outside $\mathrm{range}(G)$ onto $\mathrm{range}(G)$. We empirically pick latent variable dimension .
MNIST dataset [19] consists of greyscale images of digits with training and test samples. We pre-train the GAN, consisting of transposed convolution layers in $G$ and convolution layers in the discriminator, using rescaled images lying between . We use as $G$'s input. The GAN is trained using the Adam optimizer with learning rate and mini-batch size of for epochs. For training the pseudo-inverse $G^{\dagger}$ of $G$, we minimize the multi-task loss of Section 3.2 using samples generated from $G$, with the same hyper-parameters used for the GAN.
CelebA dataset [23] consists of more than celebrity images. We use the aligned and cropped version, in which each image is preprocessed to a size of and scaled to lie between . We randomly pick images for training the GAN. Images from the held-out set are used for evaluation. The GAN consists of transposed convolution layers in $G$ and convolution layers in $D$. The GAN is trained for epochs using the Adam optimizer with learning rate and mini-batch size . $G^{\dagger}$ is trained in the same way as for the MNIST dataset.
LSUN church-outdoor dataset [40] consists of more than cropped and aligned images of size scaled between . DCGAN generates high-resolution details using spatially local points in lower-resolution feature maps, whereas in SAGAN, details can be generated using information from many feature locations, making it a natural choice for a diverse dataset such as LSUN. The SAGAN consists of transposed convolution layers and self-attention modules at different scales in $G$, and convolution layers and self-attention modules in $D$. Each self-attention module consists of 3 convolution layers, and the modules are added at the and layers of the two networks. SAGAN uses conditional batch normalization in $G$ and projection in $D$. Spectral normalization is used for the layers in both $G$ and $D$. We use the Adam optimizer with and , learning rate and mini-batch size for the GAN training. $G^{\dagger}$, which uses a self-attention mechanism similar to $D$, is trained with the multi-task loss of Section 3.2 using the Adam optimizer with and , learning rate and mini-batch size of for epochs.
We compare the performance of our algorithm on MNIST and CelebA with other GAN-prior solvers ([4, 33]) and sparsity-based methods, Lasso with discrete cosine transform (DCT) basis [34] and total variation minimization method (TVAL3) [21] for linear inverse problems, namely compressed sensing (CS), super-resolution and inpainting. For CS, we extensively evaluate the reconstruction performance with the random Gaussian and designed measurement matrices. Furthermore, we demonstrate the recovery of LSUN church-outdoor dataset images using the proposed method for the different problems in Fig. 5.
5.1 Compressed Sensing
5.1.1 Recovery with random Gaussian matrix
In this set-up, we use the same measurement matrix as [4, 33], i.e., where $m$ is the number of measurements. For MNIST, the measurement matrix , with , whereas for CelebA, , with . Figure 2 shows the recovery results for MNIST images from the test set. Our NPGD algorithm performs better than the others and avoids local optima. Figure 7 shows the reconstruction of eight test images from CelebA. Our algorithm visually outperforms the other three methods, as it preserves detailed facial features, such as sunglasses and hair, and reproduces accurate color tones. Figures 8(a) and 8(c) provide a quantitative comparison for MNIST and CelebA, respectively.
5.1.2 Recovery with the designed matrix
In this set-up, we use the $G$-based designed $A$ described in Section 4.2. We observe that recovery with the designed $A$ is possible with far fewer measurements $m$. This corroborates our assessment based on Figure 4 that, for smaller $m$, the designed matrix satisfies the desired REC condition with high probability for most of the secants. Figures 8(a) and 8(c) show that our algorithm consistently outperforms other approaches in terms of reconstruction error and structural similarity index (SSIM) for a random $A$. Furthermore, with the designed $A$, we obtain performance on par with the random matrix using a smaller $m$. Figures 8(b) and 8(d) show the images recovered by our algorithm with the designed $A$ and with a random $A$ for different $m$. Clearly, recovery with a random $A$ requires a much larger $m$ than with the designed one to achieve similar performance.
5.2 Super-resolution
Super-resolution refers to recovering a high-resolution image from a single low-resolution image, often modeled as a blurred and downsampled version of the original. This super-resolution problem is just a special case of linear measurements in our framework. We simulate blurring and downsampling by taking spatial averages of $d \times d$ pixel values (in RGB color space), where $d$ is the downsampling ratio. This corresponds to blurring by a $d \times d$ box impulse response, followed by downsampling. We test our algorithm with , corresponding to , and -smaller image sizes, respectively. We note that for higher $d$, the measurement matrix may not satisfy the desired REC with $\beta/\alpha < 2$ (see figure 4) required for convergence of our algorithm, and consequently our theorem may not apply. Results for MNIST in figure 9(a)-9(c) show that recovery performance indeed degrades with increasing $d$; however, our NPGD algorithm gives better reconstructions than Bora et al. [4].
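The box-average model can be written explicitly as a matrix acting on the vectorized image. The sketch below builds that matrix for a single-channel $h \times w$ image and downsampling ratio $d$; applying it channel by channel for RGB images is our assumption, and the function name is hypothetical.

```python
import numpy as np

def box_downsample_matrix(h, w, d):
    """Matrix A of shape (h*w/d^2, h*w) averaging non-overlapping d x d blocks.

    Images are vectorized in row-major order, x = img.reshape(-1).
    """
    if h % d or w % d:
        raise ValueError("image size must be divisible by d")
    ho, wo = h // d, w // d
    A = np.zeros((ho * wo, h * w))
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    out_idx = (rows // d) * wo + (cols // d)   # low-res pixel fed by each pixel
    in_idx = rows * w + cols                   # row-major index of each pixel
    A[out_idx.ravel(), in_idx.ravel()] = 1.0 / (d * d)
    return A

# Example: a 28 x 28 image downsampled by d = 2 gives 196 measurements.
img = np.random.default_rng(0).random((28, 28))
A = box_downsample_matrix(28, 28, 2)
y = A @ img.reshape(-1)
assert np.allclose(y, img.reshape(14, 2, 14, 2).mean(axis=(1, 3)).reshape(-1))
```

Each row of this $A$ has $d^2$ entries equal to $1/d^2$ and the rows do not overlap, so $AA^{\top} = I/d^2$; the operator therefore fits the same linear framework as compressed sensing, only with a structured rather than random $A$.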
5.3 Inpainting
Inpainting refers to recovering the entire image from a partly occluded version. In this case, $y$ is an image with masked regions and $A$ is the linear operation applying a pixel-wise mask to the original image $x$. Again, this is a special case of linear measurements, where each measurement corresponds to an observed pixel. For experiments on the MNIST dataset, we apply a centered square mask of size . Recovery results in figure 10(a)-10(c) show that our method consistently outperforms [4] and recovers the image almost perfectly for mask sizes less than . The results align with the histogram for inpainting (figure 4), which shows that for larger mask sizes, the desired condition $\beta/\alpha < 2$ for guaranteed convergence may not be satisfied.
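Since each measurement corresponds to an observed pixel, the mask operator can be represented as a row-selection matrix. The sketch below builds it for a centered $s \times s$ mask on an $h \times w$ image; the function name and the choice of a selection matrix (rather than zero-filling the masked pixels) are illustrative assumptions.

```python
import numpy as np

def center_mask_operator(h, w, s):
    """Selection matrix A keeping every pixel outside a centered s x s square."""
    mask = np.ones((h, w), dtype=bool)
    top, left = (h - s) // 2, (w - s) // 2
    mask[top:top + s, left:left + s] = False      # occluded region
    keep = np.flatnonzero(mask.reshape(-1))       # indices of observed pixels
    A = np.zeros((keep.size, h * w))
    A[np.arange(keep.size), keep] = 1.0
    return A, mask

# Example: a 28 x 28 image with a centered 10 x 10 occlusion.
A, mask = center_mask_operator(28, 28, 10)
x = np.random.default_rng(0).random(28 * 28)
y = A @ x                                         # observed pixels only
print(A.shape)
```

This $A$ has orthonormal rows, and for a unit secant $u$ the value $\|Au\|^2$ is just the fraction of $u$'s energy outside the mask. Larger masks therefore push more secants toward small $\|Au\|^2$, which is consistent with the wider inpainting histograms.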
5.4 Comparison of Run-time for Recovery
Table 1 compares the run times of our network-based algorithm NPGD and other recovery algorithms. We record the average run time to recover a single image from its compressed sensing measurements over 10 different images. All three algorithms were run on the same workstation with i7-4770K CPU, 32GB RAM and GeForce Titan X GPU.
5.5 Analysis: Error in Projector
Figure 11 illustrates the idempotence error of the projector for different . Three different categories of images are tested, namely, MNIST training samples, MNIST test samples, and samples generated using the pre-trained $G$. We use clean images from the three sources and plot the relative idempotence error . The error decreases with increasing and saturates around . The idempotence errors for MNIST training and test samples are very close, indicating negligible generalization error. On the other hand, samples generated by $G$ give much lower errors, which indicates representation error in the GAN. Thus we expect that a more flexible generator (a deeper network) will lead to a better projector on the actual dataset and hence improve performance.
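The exact normalization of the relative idempotence error is not given in this text. The sketch below assumes the measure $\|P_G(P_G(x)) - P_G(x)\| / \|P_G(x)\|$, averaged over a batch, and checks it on two hypothetical toy "projectors": an exact orthogonal projector (idempotent) and a shrunken copy of it (not idempotent).

```python
import numpy as np

def relative_idempotence_error(project, X):
    """Mean of ||P(P(x)) - P(x)|| / ||P(x)|| over the rows x of X.

    `project` maps a batch (rows) of vectorized images to their projections.
    This particular normalization is an assumption made for illustration.
    """
    P1 = project(X)
    P2 = project(P1)
    num = np.linalg.norm(P2 - P1, axis=1)
    den = np.linalg.norm(P1, axis=1)
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((64, 8)))
exact = lambda X: X @ U @ U.T            # idempotent, so the error is ~0
shrunk = lambda X: 0.5 * X @ U @ U.T     # P(P(x)) = 0.5 P(x), so the error is 0.5
X = rng.standard_normal((100, 64))
print(relative_idempotence_error(exact, X), relative_idempotence_error(shrunk, X))
```

For a trained $G \circ G^{\dagger}$, `project` would run the network on a batch; comparing the value on training, test, and generated images separates generalization error from representation error, as in figure 11.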
6 Conclusion
In this work, we propose a GAN-based projection network for faster recovery in linear inverse problems. Our method demonstrates superior performance and provides a 60-80× speed-up over existing GAN-based methods by eliminating the expensive computation of the Jacobian matrix at every iteration. We provide a theoretical bound on the reconstruction error for a moderately conditioned measurement matrix. To help design such a matrix for compressed sensing, we propose a method that enables recovery using 5-10× fewer measurements than a random Gaussian matrix. Our experiments on compressed sensing, super-resolution, and inpainting demonstrate that generic linear inverse problems can be solved with the proposed method without retraining. In the future, deriving a bound for the projection error and an associated performance guarantee is an interesting direction.
Appendix A: Proof of Theorem 1
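By the assumption of $\delta$-approximate projection, and since $x^* \in \mathcal{S}$,
$$\|w_n - x_{n+1}\|^2 \le \|w_n - x^*\|^2 + \delta, \qquad (10)$$
where, from the gradient update step, we have
$$w_n = x_n - \eta A^{\top}(Ax_n - y).$$
Substituting into (10) yields
$$\|x_n - x_{n+1} - \eta A^{\top}(Ax_n - y)\|^2 \le \|x_n - x^* - \eta A^{\top}(Ax_n - y)\|^2 + \delta.$$
Expanding both sides, cancelling the common term $\eta^2\|A^{\top}(Ax_n - y)\|^2$, dividing by $\eta$, and setting $\eta = 1/\beta$, we have
$$2\langle x_{n+1} - x_n, A^{\top}(Ax_n - y)\rangle + \beta\|x_{n+1} - x_n\|^2 \le 2\langle x^* - x_n, A^{\top}(Ax_n - y)\rangle + \beta\|x^* - x_n\|^2 + \beta\delta = -2f(x_n) + \beta\|x^* - x_n\|^2 + \beta\delta \le \Big(\frac{\beta}{\alpha} - 2\Big) f(x_n) + \beta\delta, \qquad (11)$$
where the equality follows from $y = Ax^*$, so that $f(x_n) = \|A(x_n - x^*)\|^2$, and the last inequality follows from the lower bound of $\mathrm{REC}(\mathcal{S}, \alpha, \beta)$, $\alpha\|x^* - x_n\|^2 \le f(x_n)$. Now the LHS can be bounded from below. Since $x_n, x_{n+1} \in \mathcal{S}$ (the iterates are outputs of $P_G$, and the initialization $x_0$ is also taken in $\mathcal{S}$), the upper REC bound gives $\beta\|x_{n+1} - x_n\|^2 \ge \|A(x_{n+1} - x_n)\|^2$, so
$$\mathrm{LHS} \ge 2\langle A(x_{n+1} - x_n), Ax_n - y\rangle + \|A(x_{n+1} - x_n)\|^2 = f(x_{n+1}) - f(x_n). \qquad (12)$$
Combining (11) and (12), and rearranging the terms, we have:
$$f(x_{n+1}) \le \Big(\frac{\beta}{\alpha} - 1\Big) f(x_n) + \beta\delta,$$
and since $\beta/\alpha - 1 \ge 0$, applying this inequality recursively gives
$$f(x_n) \le \Big(\frac{\beta}{\alpha} - 1\Big)^n f(x_0) + \beta\delta \sum_{i=0}^{n-1} \Big(\frac{\beta}{\alpha} - 1\Big)^i.$$
For simplicity, we substitute $\kappa = \beta/\alpha$ in the following. For convergence, we require $\kappa < 2$, in which case the geometric sum is at most $1/(2 - \kappa)$:
$$f(x_n) \le (\kappa - 1)^n f(x_0) + \frac{\beta\delta}{2 - \kappa}.$$
Since $0 \le \kappa - 1 = 1 - (2 - \kappa) \le e^{-(2 - \kappa)}$, we have $(\kappa - 1)^n f(x_0) \le C\alpha\delta$ once $n$ reaches $\frac{1}{2-\kappa}\log\big(\frac{f(x_0)}{C\alpha\delta}\big)$, and then, by the lower REC bound,
$$\|x_n - x^*\|^2 \le \frac{f(x_n)}{\alpha} \le C\delta + \frac{\kappa\delta}{2 - \kappa} = \Big(C + \frac{1}{2\alpha/\beta - 1}\Big)\delta.$$
Finally, when $n \to \infty$, $(\kappa - 1)^n \to 0$ and we have
$$\limsup_{n \to \infty} \|x_n - x^*\|^2 \le \frac{\kappa\delta}{2 - \kappa} = \frac{\delta}{2\alpha/\beta - 1}.$$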
References
- [1] Jonas Adler and Ozan Öktem. Solving ill-posed inverse problems using iterative deep neural networks. Inverse Problems, 33(12):124007, 2017.
- [2] Richard G. Baraniuk and Michael B. Wakin. Random projections of smooth manifolds. Foundations of Computational Mathematics, 9(1):51–77, 2009.
- [3] David Berthelot, Thomas Schumm, and Luke Metz. BEGAN: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717, 2017.
- [4] Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G. Dimakis. Compressed sensing using generative models. arXiv preprint arXiv:1703.03208, 2017.
- [5] Emmanuel J. Candes, Justin K. Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 59(8):1207–1223, 2006.
- [6] Antonia Creswell, Tom White, Vincent Dumoulin, Kai Arulkumaran, Biswa Sengupta, and Anil A. Bharath. Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 35(1):53–65, 2018.
- [7] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. BM3D image denoising with shape-adaptive principal component analysis. In SPARS'09 - Signal Processing with Adaptive Sparse Structured Representations, 2009.
- [8] Weisheng Dong, Lei Zhang, Guangming Shi, and Xiaolin Wu. Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. IEEE Transactions on Image Processing, 20(7):1838–1857, 2011.
