TL;DR
This paper introduces a novel RNN architecture derived from unfolding an l1-l1 minimization algorithm, specifically designed for sequential signal reconstruction, leveraging sparsity in signals and their differences.
Contribution
The paper presents a new RNN model based on unfolding a proximal gradient method for l1-l1 minimization, tailored for sparse sequential signal reconstruction.
Findings
Outperforms state-of-the-art RNN models in video frame reconstruction from compressive measurements.
Demonstrates the effectiveness of unfolding optimization algorithms into neural network architectures.
Leverages sparsity in signals and their differences for improved reconstruction accuracy.
Abstract
We propose a new deep recurrent neural network (RNN) architecture for sequential signal reconstruction. Our network is designed by unfolding the iterations of the proximal gradient method that solves the l1-l1 minimization problem. As such, our network leverages, by design, the fact that signals have a sparse representation and that the difference between consecutive signal representations is also sparse. We evaluate the proposed model on the task of reconstructing video frames from compressive measurements and show that it outperforms several state-of-the-art RNN models.
Table 1: Average PSNR (dB) of the reconstructed frames on the moving MNIST test set at different compression rates.

| Model | 50% | 33% | 25% | 20% |
|---|---|---|---|---|
| Stacked-RNN | 38.56 | 35.82 | 33.12 | 30.78 |
| Stacked-LSTM | 37.02 | 34.06 | 31.55 | 29.60 |
| Stacked-GRU | 40.3 | 37.31 | 33.98 | 31.09 |
| SISTA-RNN | 42.52 | 37.20 | 33.85 | 30.91 |
| Our model | 44.65 | 38.90 | 34.22 | 30.76 |
Designing Recurrent Neural Networks by unfolding an L1-L1 minimization algorithm
**Index Terms:** Sparse signal recovery, deep unfolding, recurrent neural networks, $\ell_1$-$\ell_1$ minimization.
1 Introduction
The problem of reconstructing sequential signals from low-dimensional—and possibly corrupted—observations across time appears in various imaging applications, including compressive video sensing [1], dynamic magnetic resonance imaging [2], and mm-Wave imaging [3]. When reconstructing time-varying signals, one needs to leverage prior knowledge; namely that (i) at a given time instance the signal has a low-complexity representation, such as sparsity in a learned dictionary or fixed basis, and (ii) signals (or their representations) across time are correlated (temporal correlation).
Various methods for sequential signal reconstruction have been proposed in the past. The method in [4] adapted a Kalman filter to sequential compressed sensing, whereas the Modified-CS method [5] integrates an estimate of the signal's support into the reconstruction scheme. Alternatively, the methods in [6, 7] considered that two consecutive sparse signal representations are close under an $\ell_1$- or $\ell_2$-norm metric. Such approaches, however, recover the signals using iterative optimization algorithms, which leads to high computational complexity as the dimensionality of the problem increases.
Deep neural networks (DNNs) have recently achieved state-of-the-art performance in solving inverse problems [8]. These approaches come with the additional benefit of fast reconstruction, as they do not have to solve an optimization problem during inference. However, DNNs are black-box models: they do not integrate prior or domain knowledge and thus lack interpretability and theoretical guarantees [8]. Recent efforts to design DNNs that incorporate domain knowledge include deep unfolding methods, which interpret a DNN as an unrolled version of an iterative optimization algorithm. Examples include the learned ISTA (LISTA) network [9], which unfolds the iterative soft-thresholding algorithm (ISTA) [10], the unfolded versions of the approximate message passing [11] and iterative hard thresholding [12] algorithms, and ADMM-Net [13].
Little attention has, however, been devoted to the design of deep recurrent neural networks (RNNs) [14] for representing sequential signals. The authors of [15] proposed an RNN design, named SISTA-RNN, by unfolding the sequential version of ISTA. The problem that this algorithm solves assumes that two consecutive signal realizations are close in the $\ell_2$-norm sense. In this work, we propose a novel RNN model for sequential signal recovery. Our model is derived by unfolding a proximal gradient method that solves the $\ell_1$-$\ell_1$ minimization problem [6, 7]. This problem assumes that the difference between consecutive sparse signal representations is also sparse, and $\ell_1$-$\ell_1$ minimization has been proven to outperform both $\ell_1$ and $\ell_1$-$\ell_2$ minimization [16]. We apply the proposed model to the problem of video reconstruction from low-dimensional measurements, that is, sequential frame compressed sensing. Experiments on the moving MNIST dataset [17] show that the proposed model achieves higher reconstruction quality than various state-of-the-art RNN models, including SISTA-RNN [15]. The source code to replicate our experiments is available at https://github.com/dhungle.
The paper is organized as follows. Section 2 presents the background of the work. Section 3 describes the proposed model for video reconstruction. Section 4 presents the experiments, and Section 5 concludes the work.
2 Background and Related Work
2.1 Sparse signal reconstruction
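Consider the problem of reconstructing a sparse signal $s \in \mathbb{R}^n$ from noisy measurements $x = As + w$, where $A \in \mathbb{R}^{m \times n}$ ($m < n$) is a sensing matrix and $w \in \mathbb{R}^m$ a noise vector. By leveraging that $s$ has a sparse representation $h \in \mathbb{R}^p$ in a dictionary $D \in \mathbb{R}^{n \times p}$, that is, $s = Dh$, the signal can be recovered from the measurements by solving [18]:

$$\min_{h} \; \frac{1}{2}\|x - ADh\|_2^2 + \lambda \|h\|_1, \tag{1}$$

where $\|\cdot\|_1$ is the $\ell_1$-norm and $\lambda$ is a regularization parameter. ISTA [10] solves (1) by iterating over

$$h^{(k)} = \phi_{\gamma}\Big(h^{(k-1)} - \frac{1}{c} D^\top A^\top \big(ADh^{(k-1)} - x\big)\Big), \tag{2}$$

where $\phi_{\gamma}(u) = \operatorname{sign}(u)\max(|u| - \gamma, 0)$ is the soft thresholding operator applied element-wise [see Fig. 1], $\gamma = \lambda/c$, and $c$ is an upper bound on the Lipschitz constant of the gradient of $\frac{1}{2}\|x - ADh\|_2^2$. LISTA [9] unrolls the iterations of ISTA into a feedforward neural network with shared weights, where each layer implements an iteration: $h^{(k)} = \phi_{\gamma}(W h^{(k-1)} + U x)$, with $W$, $U$, and $\gamma$ learned from data; ISTA corresponds to the particular choice $W = I - \frac{1}{c}D^\top A^\top AD$ and $U = \frac{1}{c}D^\top A^\top$.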
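The ISTA iteration and a LISTA layer described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes small dense matrices stored as lists of rows, uses `M` to stand for the product $AD$, and the helper names (`matvec`, `soft_threshold`, `ista`, `lista_layer`) are hypothetical.

```python
def matvec(M, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def soft_threshold(u, theta):
    """Element-wise soft thresholding: sign(u) * max(|u| - theta, 0)."""
    return [v - theta if v > theta else v + theta if v < -theta else 0.0 for v in u]


def ista(M, x, lam, c, num_iters):
    """ISTA for min_h 0.5 * ||x - M h||_2^2 + lam * ||h||_1, where M = A D.

    c must upper-bound the Lipschitz constant of the gradient of the
    quadratic term, i.e. the largest eigenvalue of M^T M.
    """
    Mt = transpose(M)
    h = [0.0] * len(Mt)
    for _ in range(num_iters):
        residual = [a - b for a, b in zip(matvec(M, h), x)]  # M h - x
        grad = matvec(Mt, residual)                          # M^T (M h - x)
        h = soft_threshold([hi - gi / c for hi, gi in zip(h, grad)], lam / c)
    return h


def lista_layer(W, U, h, x, theta):
    """One LISTA layer h <- soft(W h + U x, theta); W, U, theta are learned."""
    pre = [a + b for a, b in zip(matvec(W, h), matvec(U, x))]
    return soft_threshold(pre, theta)


# Usage: with M = I and c = 1, ISTA converges to soft(x, lam).
print(ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0, c=1.0, num_iters=10))
# [2.0, 0.0]
```

Setting $W = I - \frac{1}{c}M^\top M$ and $U = \frac{1}{c}M^\top$ makes `lista_layer` reproduce one ISTA step; in LISTA these matrices and the threshold are instead learned.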
2.2 Stacked-RNNs for sequential signal representation
RNNs are connectionist models with self-feedback loops that allow information to pass across sequential steps. In a stacked-RNN [14], see Fig. 2(a), the vertical stack of network layers captures the latent representation of an input signal at a given time instance, and the horizontal connections learn the temporal relationship across signals. RNNs can be used to recover a sequence of signals $s_t$, $t = 1, \dots, T$, from a sequence of noisy measurement vectors $x_t = As_t + w_t$. Specifically, given $x_t$, $t = 1, \dots, T$, the signal representation at the $l$-th of $d$ layers, $h_t^{(l)}$, and the reconstructed signal $\hat{s}_t$ at time $t$ are calculated as

$$h_t^{(l)} = \sigma\big(W_l h_{t-1}^{(l)} + U_l h_t^{(l-1)} + b_l\big), \quad l = 1, \dots, d, \quad \text{with } h_t^{(0)} = x_t, \tag{3}$$

$$\hat{s}_t = U_{d+1} h_t^{(d)} + b_{d+1}, \tag{4}$$

where $\sigma(\cdot)$ is a nonlinear activation function such as tanh or ReLU, $W_l$ and $U_l$ are the weight matrices and $b_l$ the bias vectors of the affine transformations. These parameters, along with the initial hidden states $h_0^{(l)}$, can be trained using gradient descent with backpropagation through time. Other RNN architectures can also be applied to this problem, e.g., the long short-term memory (LSTM) network [19] and the gated recurrent unit (GRU) [20], which were proposed to mitigate the vanishing gradient problem for long input sequences.
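A bare-bones forward pass of such a stacked RNN is sketched below in plain Python. It assumes a tanh activation, a bias in every layer, and a linear output layer; the argument layout (`W`, `U`, `B`, `h0`) and the function name `stacked_rnn` are hypothetical choices made for illustration. Each time step first climbs the vertical stack of layers and then emits $\hat{s}_t$ from the top hidden state.

```python
import math


def matvec(M, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def stacked_rnn(xs, W, U, B, h0, act=math.tanh):
    """Run a d-layer stacked RNN over measurement vectors xs = [x_1, ..., x_T].

    W[k], U[k], B[k] for k < d: recurrent weights, input weights, bias of layer k.
    U[d], B[d]: linear output layer mapping the top hidden state to s_hat_t.
    h0[k]: initial hidden state of layer k.
    """
    d = len(W)
    h = [list(state) for state in h0]
    outputs = []
    for x in xs:
        below = x  # the bottom layer reads the measurement vector x_t
        for k in range(d):
            pre = [r + i + b for r, i, b in
                   zip(matvec(W[k], h[k]), matvec(U[k], below), B[k])]
            h[k] = [act(p) for p in pre]  # horizontal (temporal) update
            below = h[k]                  # vertical input to the next layer
        outputs.append([o + b for o, b in zip(matvec(U[d], h[-1]), B[d])])
    return outputs


# Usage: two scalar layers unrolled over two time steps.
W = [[[0.5]], [[0.5]]]
U = [[[1.0]], [[1.0]], [[2.0]]]
B = [[0.0], [0.0], [0.1]]
print(stacked_rnn([[1.0], [0.5]], W, U, B, h0=[[0.0], [0.0]]))
```

LSTM and GRU variants keep the same vertical and horizontal wiring but replace the single `act(...)` update with gated cell updates.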
2.3 The SISTA stacked-RNN network
Traditional RNN models [19, 20, 14] do not integrate the knowledge that the signals $s_t$ have sparse representations $h_t$, $t = 1, \dots, T$. To address this issue, the study in [15] proposed a stacked-RNN architecture that stems from unfolding a sequential iterative soft-thresholding algorithm, referred to as SISTA, which solves the following problem:

$$\min_{h_t} \; \frac{1}{2}\|x_t - ADh_t\|_2^2 + \lambda_1\|h_t\|_1 + \frac{\lambda_2}{2}\|Dh_t - FDh_{t-1}\|_2^2, \tag{5}$$

where $F \in \mathbb{R}^{n \times n}$ is a correlation matrix between $s_t$ and $s_{t-1}$, and $\lambda_1$, $\lambda_2$ are regularization parameters. Similar to LISTA [9], the SISTA-RNN network [15] uses the soft-thresholding operator [Section 2.1 and Fig. 1] as its activation function.
3 The proposed RNN via $\ell_1$-$\ell_1$ minimization
This section describes the proposed RNN model, which stems from unfolding the steps of a proximal gradient method that solves the $\ell_1$-$\ell_1$ minimization problem.
3.1 $\ell_1$-$\ell_1$ minimization in sequential signal recovery
In sequential signal recovery, one can recover the signal $s_t$ from the measurements $x_t$ by solving the $\ell_1$-$\ell_1$ minimization problem [6, 7]:

$$\min_{h_t} \; \frac{1}{2}\|x_t - ADh_t\|_2^2 + \lambda_1\|h_t\|_1 + \lambda_2\|h_t - Gh_{t-1}\|_1, \tag{6}$$

where $G \in \mathbb{R}^{p \times p}$ is an affine transformation that promotes the correlation between the sparse representations $h_t$ and $h_{t-1}$ of two consecutive instances of the signal, and $\lambda_1$, $\lambda_2$ are regularization parameters. We highlight that in (6) the correlation between $h_t$ and $h_{t-1}$ is encoded using the $\ell_1$-norm, whereas SISTA uses the $\ell_2$-norm to express the correlation between $s_t$ and $s_{t-1}$ [see (5)]. The motivation is twofold. First, $\ell_1$-$\ell_1$ minimization has been proven to outperform $\ell_1$-$\ell_2$ minimization in sparse signal recovery [16]. Second, from an application perspective, the error between consecutive video frames (or their motion-compensated versions) typically follows a Laplace rather than a Gaussian distribution [21].
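For intuition, the per-time-step $\ell_1$-$\ell_1$ objective can be evaluated directly. The plain-Python sketch below assumes `M` stands for $AD$ and uses hypothetical helper names; it shows that keeping $h_t$ equal to $Gh_{t-1}$ incurs no coupling cost, while a newly activated coefficient pays $\lambda_2$ times its magnitude.

```python
def matvec(M, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def l1l1_objective(h_t, h_prev, x_t, M, G, lam1, lam2):
    """Per-time-step l1-l1 objective, with M = A D:

    0.5 * ||x_t - M h_t||_2^2 + lam1 * ||h_t||_1 + lam2 * ||h_t - G h_prev||_1
    """
    residual = [a - b for a, b in zip(x_t, matvec(M, h_t))]
    innovation = [a - b for a, b in zip(h_t, matvec(G, h_prev))]
    return (0.5 * sum(r * r for r in residual)
            + lam1 * sum(abs(a) for a in h_t)
            + lam2 * sum(abs(a) for a in innovation))


I2 = [[1.0, 0.0], [0.0, 1.0]]
# h_t equals G h_prev: only the lam1 sparsity term is non-zero.
print(l1l1_objective([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], I2, I2, 0.5, 2.0))
# 0.5
# The same coefficient appearing from a zero previous code also pays lam2.
print(l1l1_objective([1.0, 0.0], [0.0, 0.0], [1.0, 0.0], I2, I2, 0.5, 2.0))
# 2.5
```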
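The objective function of (6) consists of two parts: the differentiable data-fidelity term $\frac{1}{2}\|x_t - ADh_t\|_2^2$ and the non-smooth regularizer $\lambda_1\|h_t\|_1 + \lambda_2\|h_t - Gh_{t-1}\|_1$. Hence, in this work, we propose to solve (6) using a proximal gradient method, the steps of which are given in Algorithm 1. In our algorithm, step 5 applies a gradient descent update on the data-fidelity term with a learning rate $1/c$, and step 6 applies element-wise the proximal operator for our problem, $\Phi_{\frac{\lambda_1}{c},\frac{\lambda_2}{c},Gh_{t-1}}$. For notational brevity, we write $\lambda_1$ and $\lambda_2$ in place of $\frac{\lambda_1}{c}$ and $\frac{\lambda_2}{c}$, and we let $u$ and $g$ denote corresponding elements of the gradient-step output and of $Gh_{t-1}$, respectively; the proximal operator is then defined as

$$\Phi_{\lambda_1,\lambda_2,g}(u) = \begin{cases} u - \lambda_1 - \lambda_2, & g + \lambda_1 + \lambda_2 < u < \infty,\\ g, & g + \lambda_1 - \lambda_2 \le u \le g + \lambda_1 + \lambda_2,\\ u - \lambda_1 + \lambda_2, & \lambda_1 - \lambda_2 < u < g + \lambda_1 - \lambda_2,\\ 0, & -\lambda_1 - \lambda_2 \le u \le \lambda_1 - \lambda_2,\\ u + \lambda_1 + \lambda_2, & -\infty < u < -\lambda_1 - \lambda_2, \end{cases} \tag{7}$$

if $g \ge 0$, and

$$\Phi_{\lambda_1,\lambda_2,g}(u) = \begin{cases} u - \lambda_1 - \lambda_2, & \lambda_1 + \lambda_2 < u < \infty,\\ 0, & -\lambda_1 + \lambda_2 \le u \le \lambda_1 + \lambda_2,\\ u + \lambda_1 - \lambda_2, & g - \lambda_1 + \lambda_2 < u < -\lambda_1 + \lambda_2,\\ g, & g - \lambda_1 - \lambda_2 \le u \le g - \lambda_1 + \lambda_2,\\ u + \lambda_1 + \lambda_2, & -\infty < u < g - \lambda_1 - \lambda_2, \end{cases} \tag{8}$$

if $g < 0$. Fig. 1 depicts the proximal operator of our algorithm in comparison with the soft-thresholding operator, which is used in SISTA [15].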
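The piecewise-linear proximal operator can be written as a scalar function and checked numerically against a brute-force minimization of $\frac{1}{2}(v-u)^2 + a|v| + b|v-g|$. In this sketch `a` and `b` stand for the scaled parameters $\lambda_1/c$ and $\lambda_2/c$, `g` for an entry of $Gh_{t-1}$, and the names `prox_l1l1` and `prox_objective` are hypothetical.

```python
def prox_l1l1(u, a, b, g):
    """Scalar proximal operator of a*|v| + b*|v - g| (a, b >= 0) at u.

    Returns argmin_v 0.5*(v - u)**2 + a*|v| + b*|v - g|. In the l1-l1
    setting, a = lambda1/c, b = lambda2/c and g is the matching entry of
    G h_{t-1}.
    """
    if g >= 0:
        if u > g + a + b:
            return u - a - b
        if u >= g + a - b:
            return g          # snapped to the side information
        if u > a - b:
            return u - a + b
        if u >= -a - b:
            return 0.0        # snapped to zero
        return u + a + b
    # g < 0: mirror image of the branch above
    if u > a + b:
        return u - a - b
    if u >= -a + b:
        return 0.0
    if u > g - a + b:
        return u + a - b
    if u >= g - a - b:
        return g
    return u + a + b


def prox_objective(v, u, a, b, g):
    return 0.5 * (v - u) ** 2 + a * abs(v) + b * abs(v - g)


# Usage: compare with a brute-force grid search over v in [-4, 4].
grid = [i / 1000 for i in range(-4000, 4001)]
for u in (-2.0, 0.4, 1.2, 3.0):
    exact = prox_l1l1(u, a=0.3, b=0.5, g=1.0)
    brute = min(grid, key=lambda v: prox_objective(v, u, 0.3, 0.5, 1.0))
    print(u, exact, brute)
```

Applied element-wise, `prox_l1l1` plays the role that soft thresholding plays in ISTA; with `b = 0` it reduces exactly to soft thresholding with threshold `a`.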
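3.2 The proposed $\ell_1$-$\ell_1$-RNN architecture
We now describe the proposed stacked-RNN architecture for sequential signal recovery, which we call $\ell_1$-$\ell_1$-RNN. The network, shown in Fig. 2(b), is designed by unrolling the steps of Algorithm 1 across the iterations $k = 1, \dots, d$ (yielding the hidden layers) and the time instances $t = 1, \dots, T$. Specifically, the $l$-th hidden layer is given by

$$h_t^{(l)} = \begin{cases} \Phi_{\frac{\lambda_1}{c},\frac{\lambda_2}{c},Gh_{t-1}^{(d)}}\big(W_1 h_{t-1}^{(d)} + U_1 x_t\big), & l = 1,\\ \Phi_{\frac{\lambda_1}{c},\frac{\lambda_2}{c},Gh_{t-1}^{(d)}}\big(W_l h_t^{(l-1)} + U_l x_t\big), & l > 1, \end{cases} \tag{9}$$

and the reconstructed signal at time instance $t$ is calculated as

$$\hat{s}_t = D h_t^{(d)}, \tag{10}$$

where $W_1$, $U_1$, $W_l$, and $U_l$ are defined as

$$W_1 = G - \frac{1}{c}D^\top A^\top ADG, \qquad U_1 = U_l = \frac{1}{c}D^\top A^\top, \qquad W_l = I - \frac{1}{c}D^\top A^\top AD. \tag{11}$$

The activation function $\Phi$ has the form of the proximal operator, with the parameters $\lambda_1$, $\lambda_2$ learned during training. We train our network end to end: vectorized frames $s_t$, $t = 1, \dots, T$, are the inputs; they are compressed by a linear measurement layer $A$, resulting in compressive measurements $x_t$. The reconstructed frames $\hat{s}_t$ are obtained by multiplying the hidden representation $h_t^{(d)}$ by the dictionary $D$. During training, we minimize a reconstruction loss with weight decay regularization over the trainable parameters using Adam optimization [22] on mini-batches; the weight decay strength is a hyper-parameter.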
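To make the unfolding concrete, the plain-Python sketch below implements (i) proximal gradient iterations for the $\ell_1$-$\ell_1$ problem applied frame by frame and (ii) the forward pass of an unfolded network whose first layer computes $W_1 h_{t-1} + U_1 x_t$ and whose remaining layers compute $W_l h_t^{(l-1)} + U_1 x_t$ before the proximal activation. With $W_1 = G - \frac{1}{c}(AD)^\top AD\,G$, $W_l = I - \frac{1}{c}(AD)^\top AD$ and $U_1 = \frac{1}{c}(AD)^\top$, the two produce the same outputs. This assumes the iterations at time $t$ are warm-started from $Gh_{t-1}$ and that one input matrix is shared by all layers; in the trained network these weights (and $\lambda_1$, $\lambda_2$) become free parameters. All function names are hypothetical and not taken from the authors' repository.

```python
def matvec(M, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(P, Q):
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


def transpose(M):
    return [list(col) for col in zip(*M)]


def prox_l1l1(u, a, b, g):
    """argmin_v 0.5*(v - u)**2 + a*|v| + b*|v - g|, for a, b >= 0."""
    s = 1.0 if g >= 0 else -1.0  # reduce g < 0 to g >= 0 by symmetry
    u, g = s * u, s * g
    if u > g + a + b:
        v = u - a - b
    elif u >= g + a - b:
        v = g
    elif u > a - b:
        v = u - a + b
    elif u >= -a - b:
        v = 0.0
    else:
        v = u + a + b
    return s * v


def solve_l1l1_sequence(xs, A, D, G, lam1, lam2, c, num_iters, h_init):
    """Proximal gradient solver for the l1-l1 problem, one frame after another.

    Assumption: the iterations at time t are warm-started from G h_{t-1}.
    """
    M = matmul(A, D)
    Mt = transpose(M)
    h_prev, outputs = h_init, []
    for x in xs:
        g = matvec(G, h_prev)
        h = list(g)
        for _ in range(num_iters):
            grad = matvec(Mt, [p - q for p, q in zip(matvec(M, h), x)])
            u = [hi - gi / c for hi, gi in zip(h, grad)]  # gradient step
            h = [prox_l1l1(ui, lam1 / c, lam2 / c, gi) for ui, gi in zip(u, g)]
        outputs.append(matvec(D, h))  # s_hat_t = D h_t
        h_prev = h
    return outputs


def init_l1l1_rnn(A, D, G, c):
    """Weights that make the unfolded network reproduce the solver above."""
    M = matmul(A, D)
    MtM = matmul(transpose(M), M)
    p = len(MtM)
    W_l = [[(1.0 if i == j else 0.0) - MtM[i][j] / c for j in range(p)]
           for i in range(p)]                                # I - M^T M / c
    W_1 = matmul(W_l, G)                                     # G - M^T M G / c
    U_1 = [[m / c for m in row] for row in transpose(M)]     # M^T / c
    return W_1, W_l, U_1


def l1l1_rnn(xs, W_1, W_l, U_1, D, G, a, b, num_layers, h_init):
    """Forward pass of the unfolded network; a, b stand for lambda1/c, lambda2/c."""
    h_prev, outputs = h_init, []
    for x in xs:
        g = matvec(G, h_prev)
        Ux = matvec(U_1, x)
        h = [prox_l1l1(w + v, a, b, gi)
             for w, v, gi in zip(matvec(W_1, h_prev), Ux, g)]  # layer 1
        for _ in range(num_layers - 1):                        # layers 2..d
            h = [prox_l1l1(w + v, a, b, gi)
                 for w, v, gi in zip(matvec(W_l, h), Ux, g)]
        outputs.append(matvec(D, h))
        h_prev = h
    return outputs


# Usage: both routes give the same reconstructions (up to rounding).
A = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5]]
D = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
G = [[0.9, 0.0, 0.0], [0.0, 0.9, 0.0], [0.0, 0.0, 0.9]]
xs = [[1.0, 0.2], [0.9, 0.3], [0.8, 0.5]]
c, lam1, lam2, d = 2.0, 0.1, 0.2, 5
ref = solve_l1l1_sequence(xs, A, D, G, lam1, lam2, c, d, [0.0] * 3)
W_1, W_l, U_1 = init_l1l1_rnn(A, D, G, c)
net = l1l1_rnn(xs, W_1, W_l, U_1, D, G, lam1 / c, lam2 / c, d, [0.0] * 3)
print(max(abs(p - q) for r, n in zip(ref, net) for p, q in zip(r, n)))
```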
4 Experiments
We assess the performance of the proposed RNN model on the problem of video frame reconstruction from compressive measurements. We use the moving MNIST dataset [17] for our experiments; its 10,000 video sequences (with 20 frames each) are split into non-overlapping training, validation, and test sets of 8,000, 1,000, and 1,000 sequences, respectively. To reduce the computational complexity and memory requirements, each frame is spatially downscaled using bilinear decimation. After vectorization, we obtain signal sequences $s_t \in \mathbb{R}^n$ with $T = 20$ time steps. For each sequence, we obtain a sequence of measurements $x_t \in \mathbb{R}^m$, $t = 1, \dots, T$, using a trainable linear sensing matrix $A \in \mathbb{R}^{m \times n}$ with $m < n$. We test several values of $m$, corresponding to compression rates of 50%, 33%, 25%, and 20%. The dictionary $D$ is initialized with the overcomplete discrete cosine transform (DCT).
We compare the reconstruction performance of the proposed RNN model against various existing RNN models, namely SISTA-RNN [15], stacked-RNN [23], stacked-LSTM (with the LSTM cell architecture from [19]), and stacked-GRU (with the GRU cell architecture from [20]). Following the experimental setup in [15], we initialize the sparse code $h_0$ as the zero vector. We empirically found that using weight decay in SISTA-RNN and in the proposed $\ell_1$-$\ell_1$-RNN yields the best results on the validation set, whereas for the remaining models, training without weight decay gives the best results. To initialize the regularization parameters $\lambda_1$, $\lambda_2$ [see (6)], we perform a random search for the combination that yields the highest reconstruction accuracy on the validation set, separately for our model and for SISTA-RNN. All models use the same number of stacked layers, except for the stacked-LSTM model, which uses a different number of layers because the stacked-LSTM with the common depth could not be trained from the data, possibly due to the large number of parameters to optimise. All weights and biases are initialized from a uniform distribution whose range depends on the size of each hidden layer. We train the networks for 200 epochs with a learning rate of 0.0003 and a batch size of 32.
Table 1 reports the PSNR of the reconstructed frames, averaged across all frames and sequences in the test set. Our model outperforms all other models at compression rates of 50%, 33%, and 25%, with improvements of 2.13 dB, 1.59 dB, and 0.24 dB, respectively, over the second-best model. At the rate of 20%, the proposed model (30.76 dB) is slightly outperformed by stacked-GRU (31.09 dB), SISTA-RNN (30.91 dB), and, by a negligible margin, stacked-RNN (30.78 dB). In addition, Fig. 3 illustrates the average PSNR (measured on the validation set) versus training epochs for all models at one of the tested compression rates. The learning curve of the proposed RNN is consistently above those of the other models.
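The dB gains quoted above can be recomputed from Table 1. The short sketch below, with a hypothetical helper `gain_over_best_baseline`, subtracts the best competing PSNR from ours at each compression rate; a negative value marks the 20% rate, where a baseline is better.

```python
# PSNR values (dB) copied from Table 1; columns are the compression rates.
rates = ["50%", "33%", "25%", "20%"]
psnr = {
    "Stacked-RNN": [38.56, 35.82, 33.12, 30.78],
    "Stacked-LSTM": [37.02, 34.06, 31.55, 29.60],
    "Stacked-GRU": [40.30, 37.31, 33.98, 31.09],
    "SISTA-RNN": [42.52, 37.20, 33.85, 30.91],
    "Our model": [44.65, 38.90, 34.22, 30.76],
}


def gain_over_best_baseline(table, ours="Our model"):
    """PSNR of `ours` minus the best competing PSNR, per column (rounded to 0.01 dB)."""
    gains = []
    for j in range(len(table[ours])):
        best_other = max(row[j] for name, row in table.items() if name != ours)
        gains.append(round(table[ours][j] - best_other, 2))
    return gains


print(dict(zip(rates, gain_over_best_baseline(psnr))))
# {'50%': 2.13, '33%': 1.59, '25%': 0.24, '20%': -0.33}
```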
After 200 epochs of training, 55% of the elements in the last hidden layer $h_t^{(d)}$ of our model are zero (i.e., its sparsity is 55%). We leave the investigation of how sparsity affects reconstruction performance to future work.
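The sparsity figure is a fraction of zero entries. A minimal sketch of how such a number could be computed over a batch of last-layer codes, assuming exact zeros are counted and entries are pooled over all vectors (the exact averaging used for the reported 55% is not specified); `sparsity` is a hypothetical helper.

```python
def sparsity(vectors, tol=0.0):
    """Fraction of entries with |value| <= tol, pooled over a batch of vectors.

    tol = 0.0 counts exact zeros only; thresholding-type activations such as
    the proximal operator produce such zeros directly.
    """
    total = sum(len(v) for v in vectors)
    zeros = sum(1 for v in vectors for e in v if abs(e) <= tol)
    return zeros / total


codes = [[0.0, 1.2, 0.0, -0.3], [0.0, 0.0, 0.7, 0.0]]
print(sparsity(codes))  # 0.625
```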
5 Conclusion
We proposed a stacked RNN for sequential sparse signal recovery from compressive measurements. Our RNN architecture incorporates prior knowledge about the structure of the signals and their temporal correlation by deep-unfolding a proximal gradient method for the $\ell_1$-$\ell_1$ minimization problem. Our experiments on video-frame recovery from compressive measurements show that our model outperforms several state-of-the-art RNNs at most of the tested compression rates.
References
- [1] R. G. Baraniuk, T. Goldstein, A. C. Sankaranarayanan, C. Studer, A. Veeraraghavan, and M. B. Wakin, “Compressive video sensing: Algorithms, architectures, and applications,” IEEE Signal Processing Magazine, vol. 34, no. 1, pp. 52–66, Jan. 2017.
- [2] L. Weizman, Y. C. Eldar, and D. Ben Bashat, “Compressed sensing for longitudinal MRI: An adaptive-weighted approach,” Medical Physics, vol. 42, no. 9, pp. 5195–5207, 2015.
- [3] M. Becquaert, E. Cristofani, H. V. Luong, M. Vandewal, J. Stiens, and N. Deligiannis, “Compressed sensing mm-wave SAR for non-destructive testing applications using multiple weighted side information,” Sensors, vol. 18, no. 6, 2018.
- [4] N. Vaswani, “Kalman filtered compressed sensing,” in IEEE International Conference on Image Processing, 2008.
- [5] J. Zhan and N. Vaswani, “Time invariant error bounds for modified-CS-based sparse signal sequence recovery,” IEEE Transactions on Information Theory, vol. 61, no. 3, pp. 1389–1409, 2015.
- [6] A. Charles, M. S. Asif, J. Romberg, and C. Rozell, “Sparsity penalties in dynamical system estimation,” in Conference on Information Sciences and Systems (CISS), 2011, pp. 1–6.
- [7] J. F. C. Mota, N. Deligiannis, A. C. Sankaranarayanan, V. Cevher, and M. R. D. Rodrigues, “Adaptive-rate reconstruction of time-varying signals with application in compressive foreground extraction,” IEEE Transactions on Signal Processing, vol. 64, no. 14, pp. 3651–3666, July 2016.
- [8] A. Lucas, M. Iliadis, R. Molina, and A. K. Katsaggelos, “Using deep neural networks for inverse problems in imaging: beyond analytical methods,” IEEE Signal Processing Magazine, vol. 35, no. 1, pp. 20–36, 2018.
