Sparse synthesis regularization with deep neural networks
Daniel Obmann, Johannes Schwab, Markus Haltmeier

TL;DR
This paper introduces a novel sparse reconstruction method using deep neural networks with an $\ell^1$ penalty, enabling effective inverse problem solving by leveraging a trained decoder for sparse signal reconstruction.
Contribution
It presents a new sparse synthesis regularization framework with neural networks that incorporates an $\ell^1$-penalty in training, differing from traditional frame-based methods.
Findings
Decoder network enables sparse signal reconstruction with thresholded coefficients.
The $\ell^1$-Tikhonov functional acts as a regularization method for inverse problems.
Proven effectiveness in reconstructing signals with sparse synthesis prior.
Abstract
We propose a sparse reconstruction framework for solving inverse problems. In contrast to existing sparse regularization techniques that are based on frame representations, we train an encoder-decoder network by including an $\ell^1$-penalty. We demonstrate that the trained decoder network allows sparse signal reconstruction using thresholded encoded coefficients without losing much quality of the original image. Using the sparse synthesis prior, we propose minimizing the $\ell^1$-Tikhonov functional, which is the sum of a data fitting term and the $\ell^1$-norm of the synthesis coefficients, and show that it provides a regularization method.
Daniel Obmann
Department of Mathematics
University of Innsbruck
Technikerstrasse 13, 6020 Innsbruck, Austria
Johannes Schwab
Department of Mathematics, University of Innsbruck
Technikerstrasse 13, 6020 Innsbruck, Austria
Markus Haltmeier
Department of Mathematics, University of Innsbruck
Technikerstrasse 13, 6020 Innsbruck, Austria
(August 6, 2019)
1 Introduction
Various applications in medical imaging, remote sensing and elsewhere require solving inverse problems of the form
$$ y = A x + \eta , \tag{1.1} $$
where $A \colon X \to Y$ is a linear operator between Hilbert spaces $X$, $Y$, and $\eta$ is the data distortion. Inverse problems are well analyzed and several established approaches for their solution exist, including filter-based methods and variational regularization [1, 2]. In recent years, neural networks (NNs) and deep learning have appeared as new paradigms for solving inverse problems and have demonstrated impressive performance. Several approaches have been developed, including two-step [3, 4, 5], variational [6], iterative [7, 8] and regularizing networks [9].
Standard deep learning approaches may lack data consistency for unknowns very different from the training images. To address this issue, in [10] a deep learning approach has been introduced where minimizers
$$ x_\alpha \in \operatorname*{arg\,min}_{x \in X} \; \frac{1}{2} \| A x - y \|^2 + \alpha \, \phi(\Phi(x)) \tag{1.2} $$
are investigated. Here $\Phi \colon X \to \Xi$ is a trained NN, $\Xi$ a Hilbert space, $\phi \colon \Xi \to [0, \infty]$ a penalty functional, and $\alpha > 0$ a regularization parameter. The resulting reconstruction approach has been named NETT (for network Tikhonov regularization), as it is a generalized form of Tikhonov regularization using a NN as trained regularizer. For a related approach see [11]. In [10] it is shown that under reasonable conditions, the NETT yields a convergent regularization method.
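In this paper, we introduce a novel deep learning approach for inverse problems that is in a sense dual to (1.2). We define approximate solutions of (1.1) as $x_\alpha = \Psi(\xi_\alpha)$, where
$$ \xi_\alpha \in \operatorname*{arg\,min}_{\xi \in \Xi} \; \frac{1}{2} \| A \Psi(\xi) - y \|^2 + \alpha \, \phi(\xi) . \tag{1.3} $$
Here $\Psi \colon \Xi \to X$ is a trained network, $\phi \colon \Xi \to [0, \infty]$ a penalty functional and $\alpha > 0$ a regularization parameter. The NETT functional in (1.2) uses an analysis approach where the analysis coefficients $\Phi(x)$ are regular, with regularity measured by the smallness of $\phi(\Phi(x))$. In contrast, (1.3) assumes regularity of the synthesis coefficients $\xi$ and is therefore a synthesis version of NETT.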
In particular, we investigate the case where $\Xi = \ell^2(\Lambda)$ for some index set $\Lambda$ and $\phi$ is a weighted $\ell^1$-norm used as a sparsity prior. To construct an appropriate network, we train a (modified) tight frame U-net [12] of the form $\Psi \circ \Phi$ using an $\ell^1$-penalty, and take the decoder part $\Psi$ as synthesis network. We show numerically that the decoder allows reconstructing the signal from sparse representations. Note that we train the network independently of any measurement operator. As in [7] this allows one to solve any inverse problem with the same (or similar) prior assumptions in the same way without having to retrain the network. As the main theoretical result, in this paper we show that (1.3) is a convergent regularization method. Performing numerical reconstructions and comparing (1.3) with existing approaches for solving inverse problems is subject of future research.
2 Preliminaries
In this section, we give some theoretical background of inverse problems. Moreover, we describe the tight frame U-net that will be used for the trained regularizer.
2.1 Regularization of inverse problems
The characteristic property of inverse problems is their ill-posedness, which means that the solution of $A x = y$ is not unique or is highly unstable with respect to data perturbations. In order to make the signal reconstruction process stable and accurate, regularization methods have to be applied, which use a-priori knowledge about the true unknown in order to construct estimates from data (1.1) that are close to the true solution.
Variational regularization is one of the most established methods for solving inverse problems. These methods incorporate prior knowledge by choosing solutions with a small value of a regularization functional. In the synthesis approach, this amounts to solving (1.3), where $\Psi$ is a prescribed synthesis operator. The minimizers of (1.3) are designed to approximate $\phi$-minimizing solutions of the equation $A \Psi(\xi) = y$, defined by
$$ \xi^{+} \in \operatorname*{arg\,min} \{ \phi(\xi) \mid \xi \in \Xi , \; A \Psi(\xi) = y \} . \tag{2.1} $$
A frequently chosen regularizer is a weighted $\ell^1$-norm, which has been proven to be useful for solving compressed sensing and other inverse problems [13, 14, 15]. This is the form of the regularizer we will be using in this paper.
The synthesis approach is commonly used with $\Psi$ being the synthesis operator of a frame of $X$, such as a wavelet or curvelet frame or a trained dictionary [16, 17, 18, 19]. In this case, $\Psi$ is linear, which allows the application of the standard sparse recovery theory [2, 14]. In contrast, in this paper we take the synthesis operator as a trained network, in which case $\Psi$ is non-linear. In particular, we take the synthesis operator as the decoder part of an encoder-decoder network $\Psi \circ \Phi$ that is trained to satisfy $\Psi(\Phi(x)) \approx x$. As encoder-decoder network we use the tight frame U-net [12], which is a modification of the U-net [20] with improved reproducing capabilities.
2.2 Tight frame U-net
We consider the case of 2D images and denote by the space at the coarsest resolution of the signal with size and channels. The tight frame U-net uses a hierarchical multi-scale representation defined recursively by
[TABLE]
for and with . Here and are convolutional layers followed by a non-linearity and is the identity used for the bypass-connection. are horizontal, vertical and diagonal high-pass filters and is a low-pass filter such that the tight frame property
[TABLE]
is satisfied for some . We define the filters by applying the tensor products , , and of the Haar wavelet low-pass and high-pass filters separately in each channel.
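The tight frame property can be checked numerically for the Haar filters. The following is a minimal sketch, not the authors' implementation: it assumes orthonormal Haar filters with stride-2 downsampling on a single channel, builds the four 2D filters as Kronecker (tensor) products of the 1D low-pass and high-pass filters, and verifies that the sum of the adjoint-times-filter products is the identity. The function names and the labels `LL`, `LH`, `HL`, `HH` are illustrative choices.

```python
import numpy as np

def haar_analysis_1d(n):
    """1D Haar low-pass and high-pass analysis matrices (stride-2
    downsampling) for signals of even length n. Shapes: (n // 2, n)."""
    assert n % 2 == 0, "signal length must be even"
    low = np.zeros((n // 2, n))
    high = np.zeros((n // 2, n))
    for k in range(n // 2):
        low[k, 2 * k] = low[k, 2 * k + 1] = 1 / np.sqrt(2)
        high[k, 2 * k] = 1 / np.sqrt(2)
        high[k, 2 * k + 1] = -1 / np.sqrt(2)
    return low, high

def haar_analysis_2d(n):
    """Four 2D filters as tensor products of the 1D Haar filters,
    acting on flattened n x n images."""
    low, high = haar_analysis_1d(n)
    return {
        "LL": np.kron(low, low),    # low-pass in both directions
        "LH": np.kron(low, high),   # low-pass / high-pass
        "HL": np.kron(high, low),   # high-pass / low-pass
        "HH": np.kron(high, high),  # high-pass in both directions
    }

n = 8
filters = haar_analysis_2d(n)

# Tight frame property: the sum of T^T T over all four filters is c * Id
# (here c = 1 because the filters are orthonormal).
frame_sum = sum(T.T @ T for T in filters.values())
tight = np.allclose(frame_sum, np.eye(n * n))

# Perfect recovery of a random image from its four coefficient sets.
x = np.random.default_rng(0).standard_normal(n * n)
coeffs = {name: T @ x for name, T in filters.items()}
x_rec = sum(filters[name].T @ c for name, c in coeffs.items())
recovered = np.allclose(x_rec, x)
```

With this normalization the frame constant equals 1; other scalings of the filters would give a different constant while preserving the same structure.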
The architecture of the tight frame U-net is shown in Figure 2.1. It uses standard learned convolution, batch-normalization and the fixed wavelet filters for downsampling and upsampling. To improve flexibility of the network we include an additional learned deconvolution layer after the upsampling. After every convolutional layer the ReLU activation function is applied. Similarly, we define a tight frame U-net without bypass-connection,
[TABLE]
for and with . Here , are convolutional layers followed by a nonlinearity, and , are the wavelet filters as described above. In the rest of the paper we will refer to the network defined in (2.2) as tight frame U-net with bypass-connection, and the network defined in (2.4) as tight frame U-net without bypass-connection.
The tight frame property (2.3) allows both networks (2.2) and (2.4) to satisfy the perfect recovery condition, which means that filters can be chosen such that any signal can be perfectly recovered from its frame coefficients if they are given in all layers [12]. In the following, we refer to the result of convolving an image with the fixed wavelet filters as the filtered version of that image.
3 Nonlinear sparse synthesis regularization
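To solve the inverse problem (1.1), we use the sparse synthesis NETT, which considers minimizers of
$$ \mathcal{S}_{\alpha, y}(\xi) = \frac{1}{2} \| A \Psi(\xi) - y \|^2 + \alpha \sum_{\lambda \in \Lambda} w_\lambda | \xi_\lambda | . \tag{3.1} $$
Here $\Psi \colon \ell^2(\Lambda) \to X$ is the synthesis operator, $\Lambda$ an index set and $w_\lambda$ are positive parameters.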
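To make the structure of the sparse synthesis NETT functional concrete, the following sketch evaluates a data-fitting term plus a weighted $\ell^1$-penalty for a finite-dimensional toy problem. It is an illustration under stated assumptions: a quadratic data term with factor 1/2, a forward operator given as a matrix, and a hypothetical nonlinear `toy_decoder` standing in for the trained decoder network. None of these names come from the paper.

```python
import numpy as np

def sparse_synthesis_functional(xi, y, A, decoder, weights, alpha):
    """Data-fitting term plus weighted l1 penalty on the coefficients xi.

    A is a matrix (forward operator), decoder maps coefficients to signals,
    weights are positive penalty weights, alpha > 0 is the regularization
    parameter. The factor 0.5 in the data term is an assumption.
    """
    residual = A @ decoder(xi) - y
    data_fit = 0.5 * np.sum(residual ** 2)
    penalty = np.sum(weights * np.abs(xi))
    return data_fit + alpha * penalty

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 10))       # hypothetical decoder weights
A = rng.standard_normal((4, 6))        # hypothetical forward operator

def toy_decoder(xi):
    """Hypothetical nonlinear synthesis operator (stand-in for a network)."""
    return np.tanh(W @ xi)

# Sparse coefficient vector with two nonzero entries and noise-free data.
xi_true = np.zeros(10)
xi_true[[1, 7]] = [1.5, -2.0]
y = A @ toy_decoder(xi_true)

weights = np.ones(10)
value_true = sparse_synthesis_functional(
    xi_true, y, A, toy_decoder, weights, alpha=0.1
)
```

For noise-free data the data term vanishes at `xi_true`, so `value_true` reduces to the penalty `0.1 * (1.5 + 2.0)`.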
3.1 Theoretical analysis
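The sparse synthesis NETT can be seen as weighted $\ell^1$-regularization for the coefficient inverse problem $A \Psi(\xi) = y$. For its theoretical analysis we require the following assumptions:
- (A1) $A \colon X \to Y$ is bounded linear;
- (A2) $\Psi \colon \ell^2(\Lambda) \to X$ is weakly continuous;
- (A3) $\inf \{ w_\lambda \mid \lambda \in \Lambda \} > 0$.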
We then have the following result:
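Theorem 3.1 (Well-posedness).
Under assumptions (A1)-(A3) the following holds:
- Existence: For all $y \in Y$, $\alpha > 0$, the functional $\mathcal{S}_{\alpha, y}$ in (3.1) has a minimizer.
- Stability: Suppose $y_k \to y$ and $\xi_k \in \operatorname{arg\,min} \mathcal{S}_{\alpha, y_k}$. Then weak accumulation points of $(\xi_k)_{k \in \mathbb{N}}$ exist and are minimizers of $\mathcal{S}_{\alpha, y}$.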
Proof.
According to (A1), (A2), the operator $A \circ \Psi$ is weakly continuous. Therefore, the results are a direct consequence of [2, Theorem 3.48]. ∎
From [2, Theorem 3.48, Theorem 3.49] we can further deduce convergence (as the noise level goes to zero) of the sparse synthesis NETT. Later we take $\Psi$ as the decoder part of a tight frame U-net trained as an auto-encoder, which we expect to be weakly continuous and Lipschitz continuous. In this case, we have stability and convergence for the actual reconstruction $x_\alpha = \Psi(\xi_\alpha)$.
3.2 A trained sparse regularizer
Using a similar architecture to the one suggested in [12], we train a model for sparse regularization. To enforce sparsity in the encoded domain, we use a combination of the mean squared error and an $\ell^1$-penalty of the filtered coefficients as loss function for training. The idea is to enforce sparsity in the high-pass filtered images. To achieve this, we regularize these images in the encoded domain using a regularization parameter that depends on the layer.
We write the tight frame U-net defined by (2.2) in the form where is the encoder and the decoder part. Moreover, we denote by for the high-pass filter coefficients of in the th layer. Given training data , the loss-function used for network training is taken as
[TABLE]
The first term of the loss function is intended to make the network reproduce the training images. Following the sparse regularization strategy, the second term forces the network to learn convolutions such that the high-pass filtered coefficients are sparse.
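The two-term structure of such a training loss can be sketched as follows. This is a hedged illustration rather than the paper's exact loss: the normalization by the number of samples, the argument layout, and the function name `sparse_autoencoder_loss` are assumptions, and the high-pass coefficients are passed in as precomputed arrays instead of being produced by a network.

```python
import numpy as np

def sparse_autoencoder_loss(x, x_rec, highpass_coeffs, alphas):
    """Reconstruction error plus layer-weighted l1 penalty.

    x, x_rec:         arrays of shape (N, H, W), training images and
                      their reconstructions by the encoder-decoder network.
    highpass_coeffs:  list with one array per layer (leading axis N),
                      holding the high-pass filtered coefficients.
    alphas:           one regularization parameter per layer.
    Averaging both terms over the N samples is an assumption.
    """
    n_samples = x.shape[0]
    mse = np.sum((x - x_rec) ** 2) / n_samples
    l1 = sum(
        alpha * np.sum(np.abs(coeffs))
        for alpha, coeffs in zip(alphas, highpass_coeffs)
    ) / n_samples
    return mse + l1

# Tiny worked example with two samples and two layers.
x = np.zeros((2, 2, 2))
x_rec = np.ones((2, 2, 2))
highpass = [np.full((2, 3), 2.0), np.full((2, 1), -1.0)]
loss = sparse_autoencoder_loss(x, x_rec, highpass, alphas=[0.5, 2.0])
```

In the example, the reconstruction term contributes 8 / 2 = 4 and the penalty contributes (0.5 * 12 + 2.0 * 2) / 2 = 5, so `loss` equals 9.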
4 Numerical experiments
The above sparse encoding strategy has been tested with the two network architectures described in (2.2) and (2.4). Both networks are tested for their reconstruction capabilities when setting parts of the frame coefficients to zero. Actual application to the solution of tomographic inverse problems is subject of future research.
4.1 Implementation details
For the numerical experiments, we generated grayscale images which contain an ellipse, a rectangle and a star-like shape. The parameters of each shape were chosen randomly. The training dataset consists of 1500 and the validation dataset of 500 such images. One of the phantoms from the training set is shown in Figure 4.1 (top left). The top right image shows the reconstruction using the tight frame U-net trained with the bypass-connection after setting the bypass-coefficients to zero. The large difference between these two images shows that the bypass-connection significantly contributes to the image representation and reconstruction. Since the wavelet filters have not been applied to the bypass-connection, one cannot expect sparsity for this part. This is the reason why we expect the tight frame U-net without bypass-connection to allow much sparser approximation than the tight frame U-net with bypass-connection. This conjecture is supported by the numerical results presented below.
Each of the networks has 3 downsampling- and upsampling-layers and starts with 8 channels for the first convolution. The number of channels is then doubled in each subsequent layer. For minimizing the loss-function w.r.t and we use the Adam [21] algorithm with the suggested parameters and train each network for 60 epochs. For the experiments we chose the regularization parameters where is the number of training samples and . The training was done using an Intel Xeon CPU E331225 @ processor and RAM. Each epoch (including the evaluation on the validation set) took about for the tight frame U-net with bypass-connection and about minutes for the tight frame U-net without bypass-connection. This results in a training-time of and , respectively. Note that the training time could be reduced significantly by using GPUs for less than € 1000 instead of the CPU.
4.2 Sparse approximation results
Each of the two tight frame U-nets has been tested on its ability to reconstruct the image from a sparse approximation in the encoded domain. To this end, we calculated the frame coefficients of the test image using the encoder part of the network, and set a certain fraction of the coefficients in each channel with smallest absolute value to 0. The decoder is then applied to the thresholded coefficients to get a sparse approximation of the original image. In Figure 4.2, example reconstructions using all coefficients (left) and thresholded coefficients with a value of (right) are shown. We observe that both tight frame U-net variants yield almost perfect recovery when using the original coefficients. However, as expected, when applied to the thresholded coefficients, the network without bypass-connection (bottom) yields significantly better results.
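The thresholding step described above can be sketched in a few lines. This is an illustrative sketch under assumptions: the coefficients are stored as an array of shape `(channels, height, width)`, and the function name `threshold_smallest` and the rounding of the number of zeroed entries (floor) are choices made here, not taken from the paper.

```python
import numpy as np

def threshold_smallest(coeffs, fraction):
    """Set the given fraction of entries with smallest absolute value
    to zero, separately in each channel.

    coeffs:   array of shape (channels, height, width).
    fraction: value in [0, 1]; the number of zeroed entries per channel
              is floor(fraction * height * width).
    Returns a new array; the input is left unchanged.
    """
    out = coeffs.copy()
    flat = out.reshape(out.shape[0], -1)   # view onto out, one row per channel
    k = int(np.floor(fraction * flat.shape[1]))
    if k > 0:
        for c in range(flat.shape[0]):
            idx = np.argsort(np.abs(flat[c]), kind="stable")[:k]
            flat[c, idx] = 0.0
    return out

coeffs = np.array([
    [[4.0, -1.0], [0.5, 3.0]],
    [[-2.0, 2.5], [0.1, -0.2]],
])
sparse_coeffs = threshold_smallest(coeffs, fraction=0.5)
```

In a full pipeline, `coeffs` would be the output of the encoder and the decoder would then be applied to `sparse_coeffs` to obtain the sparse approximation of the image.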
To quantitatively evaluate the reconstructed images, we compute the structural similarity index (SSIM), the peak-signal-to-noise-ratio (PSNR) and the image distance (ID), defined by with , meaning that entries differing by less than one pixel are considered equal. To evaluate the sparse approximation capabilities of the two models we calculate ratios of the evaluation metrics between the reconstructions with the thresholded and the original coefficients, respectively. In these evaluation metrics, a high (close to 1) ratio indicates good performance.
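The ratio-based evaluation can be illustrated with the PSNR as the metric. The sketch below is an assumption-laden example: it uses the standard PSNR definition with a configurable `data_range`, and the helper name `metric_ratio` is hypothetical. The SSIM and the image distance are not implemented here.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB (standard definition)."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return np.inf
    return 10 * np.log10(data_range ** 2 / mse)

def metric_ratio(metric, reference, rec_thresholded, rec_full):
    """Ratio between the metric of the reconstruction from thresholded
    coefficients and the metric of the reconstruction from all
    coefficients. Values close to 1 indicate little loss."""
    return metric(reference, rec_thresholded) / metric(reference, rec_full)

# Toy example: constant errors chosen so the PSNR values are 20 dB and 10 dB.
reference = np.zeros((4, 4))
rec_full = np.full((4, 4), 0.1)                  # MSE = 0.01
rec_thresholded = np.full((4, 4), np.sqrt(0.1))  # MSE = 0.1
ratio = metric_ratio(psnr, reference, rec_thresholded, rec_full)
```

Here the ratio is approximately 0.5, meaning the thresholded reconstruction retains about half of the PSNR of the reconstruction from all coefficients.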
4.3 Discussion
The reconstruction results in Figure 4.2 show the sparse approximation results using the tight frame U-net with and without bypass-connection. The network with bypass-connection is able to almost perfectly recover the image from all frame coefficients (top left). However, when thresholding of the coefficients, this is no longer the case (top right). The bottom left image shows the image passed through the network without bypass-connection. Comparing this to the pass through the network with bypass-connection we see that the network without bypass-connection, when using all coefficients, performs slightly worse. However, when thresholding of the coefficients obtained by passing the image through the encoder part, the network without bypass-connection significantly outperforms the one with bypass-connection.
To further investigate this issue, we sample images from the validation set and plot the mean of the ratios of the metric scores when setting various percentages of coefficients to zero (Figure 4.3). As a base for this we take the metric scores obtained by passing the images through the network. Because of the inherent sparsity of the images we chose to plot these metrics only for . When comparing the two plots in Figure 4.3 we see that the network without bypass-connection can almost maintain the metric scores up to some point at , whereas the network with bypass-connection falls off right at the beginning and tends to perform worse than the network without bypass-connection.
5 Conclusion
In this paper we proposed a sparse regularization strategy using a neural network as synthesis operator. The network is used as a nonlinear transformation between the image space and a coefficient space used for signal representation. In particular, we used an encoder-decoder pair of a tight frame U-net trained with an $\ell^1$-penalty for signal representation in the coefficient space. To numerically investigate the sparse approximation capabilities, we set some of the encoded coefficients to zero before applying the decoder. Our numerical results suggest that the tight frame U-net without bypass-connection enables sparse recovery. Actual implementation of our approach for tomographic inverse problems and a detailed comparison with other established reconstruction methods are subject of future research. We point out that the learned part of our proposed regularization approach only depends on the class of images to be (re-)constructed, which allows us to apply the same network to any inverse problem targeting a similar class of phantoms without having to retrain the network.
Acknowledgments
D.O. and M.H. acknowledge support of the Austrian Science Fund (FWF), project P 30747-N32.
References
- [1] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of inverse problems, ser. Mathematics and its Applications. Dordrecht: Kluwer Academic Publishers Group, 1996, vol. 375.
- [2] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen, Variational methods in imaging. Springer, 2009.
- [3] D. Lee, J. Yoo, and J. C. Ye, “Deep residual learning for compressed sensing MRI,” in IEEE 14th International Symposium on Biomedical Imaging, 2017, pp. 15–18.
- [4] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process., vol. 26, pp. 4509–4522, 2017.
- [5] S. Antholzer, M. Haltmeier, and J. Schwab, “Deep learning for photoacoustic tomography from sparse data,” Inverse Probl. Sci. and Eng., in press, pp. 1–19, 2018.
- [6] E. Kobler, T. Klatzer, K. Hammernik, and T. Pock, “Variational networks: connecting variational methods and deep learning,” in German Conference on Pattern Recognition. Springer, 2017, pp. 281–293.
- [7] J. R. Chang, C.-L. Li, B. Poczos, and B. V. Kumar, “One network to solve them all–solving linear inverse problems using deep projection models,” in IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5889–5898.
- [8] J. Adler and O. Öktem, “Solving ill-posed inverse problems using iterative deep neural networks,” Inverse Probl., vol. 33, p. 124007, 2017.
