Iterative Soft/Hard Thresholding with Homotopy Continuation for Sparse Recovery
Yuling Jiao, Bangti Jin, Xiliang Lu

TL;DR
This paper presents an analysis of an iterative thresholding algorithm with homotopy continuation for sparse signal recovery, achieving sharp error bounds and efficient iteration complexity under certain conditions.
Contribution
It introduces a novel analysis of soft/hard thresholding with homotopy continuation, providing theoretical guarantees and complexity bounds for sparse recovery.
Findings
Achieves reconstruction error of O(ε) under regularity conditions
Provides iteration complexity of O((ln ε)/(ln γ) np)
Demonstrates effectiveness through numerical examples
Abstract
In this note, we analyze an iterative soft / hard thresholding algorithm with homotopy continuation for recovering a sparse signal from noisy data of a noise level . Under suitable regularity and sparsity conditions, we design a path along which the algorithm can find a solution which admits a sharp reconstruction error with an iteration complexity , where and are problem dimensionality and controls the length of the path. Numerical examples are given to illustrate its performance.
| method | time (s) | nMV | Re | Ab | |
|---|---|---|---|---|---|
| ISTC | 1.0 | 58 | 4.21e-3 | 2.66e-1 | |
| PGH | 1.7 | 419 | 4.14e-3 | 2.66e-1 | |
| SpaRSA | 3.4 | 302 | 4.13e-3 | 2.63e-1 | |
| GPSR | 3.0 | 256 | 4.25e-3 | 2.71e-1 | |
| FISTA | 5.3 | 505 | 4.30e-3 | 2.65e-1 | |
| ISTC | 3.3 | 58 | 4.34e-3 | 2.88e-1 | |
| PGH | 5.6 | 443 | 4.25e-3 | 2.85e-1 | |
| SpaRSA | 11.4 | 309 | 4.25e-3 | 2.84e-1 | |
| GPSR | 9.5 | 258 | 4.36e-3 | 2.91e-1 | |
| FISTA | 17.2 | 506 | 4.40e-3 | 2.74e-1 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Iterative Soft/Hard Thresholding with Homotopy Continuation for Sparse Recovery
Yuling Jiao, Bangti Jin and Xiliang Lu Yuling Jiao is in the School of Statistics and Mathematics and Big Data Institute of ZUEL, Zhongnan University of Economics and Law, Wuhan, 430063, P.R. China (email: [email protected]), Bangti Jin is in the Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK (email: [email protected], [email protected]), and Xiliang Lu (corresponding author) is in the School of Mathematics and Statistics, Wuhan University and Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan 430072, P.R. China (email: [email protected]).
Abstract
In this note, we analyze an iterative soft / hard thresholding algorithm with homotopy continuation for recovering a sparse signal from noisy data of a noise level . Under suitable regularity and sparsity conditions, we design a path along which the algorithm can find a solution which admits a sharp reconstruction error with an iteration complexity , where and are problem dimensionality and controls the length of the path. Numerical examples are given to illustrate its performance.
Index Terms:
iterative soft/hard thresholding, continuation, solution path, convergence
I Introduction
Sparse recovery has attracted much attention in machine learning, signal processing, statistics and inverse problems over the last decade. Often the problem is formulated as
[TABLE]
where is the unknown sparse signal, is the data with the noise of level , and the matrix with has normalized columns , i.e., , The desired sparsity structure can be enforced by either the or penalty, i.e.,
[TABLE]
where is the regularization parameter.
Among existing algorithms for minimizing (2), iterative soft / hard thresholding (IST/IHT) algorithm [1, 2, 3, 4] and their accelerated extension [5, 6] are extremely popular. These algorithms are of the form
[TABLE]
where is the stepsize, and is a soft- or hard-thresholding operator defined componentwise by
[TABLE]
where is the characteristic function. Their convergence was analyzed in many works, mostly under the condition . This condition ensures a (asymptotically) contractive thresholding and thus the desired convergence [1, 2, 3, 4]. Meanwhile, it was observed that the continuation along can greatly speed up the algorithms [7, 8, 9, 6, 10]. Nonetheless, as pointed out by [11] “… the design of a robust, practical, and theoretically effective continuation algorithm remains an interesting open question …” There were several works aiming at filling this gap. In the works [12, 13], a proximal gradient method with continuation for problem was analyzed with linear search, under sparse restricted eigenvalue/restricted strong convexity condition. Recently, a Newton type method with continuation was studied for and problems [14, 15]. In this work, we present a unified approach to analyze IST/IHT with continuation and a fixed stepsize , denoted by ISTC/IHTC. The challenge in the analysis is the lack of monotonicity of function values due to the choice .
The overall procedure is given in Algorithm 1. Here is an initial guess of , supposedly large, is the decreasing factor for , and is the maximum number of inner iterations (for a fixed ). The choice of the final is given in (5) below. Distinctly, the inner iteration does not need to be solved exactly (actually one inner iteration suffices the desired accuracy of the final solution , cf. Theorem 2 below), and there is no need to perform stepsize selection.
In Theorem 2, we prove that under suitable mutual coherence condition on the matrix (cf. Assumption II.1 and Remark II.2), ISTC/IHTC always converges.
II Convergence analysis
The starting point of our analysis is the next lemma.
Lemma 1**.**
For any , there holds
[TABLE]
Proof.
By the definition of the operator , cf. (4),
[TABLE]
which completes the proof of the lemma. ∎
Let the true signal be -sparse with a support , i.e., , and the complement of . Recall also that the mutual coherence (MC) of the matrix is defined by [16].
Assumption II.1**.**
The MC of satisfies
The proper choice of the regularization parameter is essential for successful sparse recovery. It is well known that under Assumption II.1, the choice for the penalty and for the penalty ensures [17, 15]. Thus we consider the following a priori choice
[TABLE]
In practice, one may consider a posteriori choice rules [18]. Now we can state the global convergence of Algorithm 1.
Theorem 2**.**
Let Assumption II.1 hold, and be chosen by (5). Suppose that is large, , and
[TABLE]
Then Algorithm 1 is well-defined, and the solution satisfies:
- (i)
,
- (ii)
there holds the error estimate
[TABLE]
Further, if is large enough, then .
Proof.
We only prove the assertion for ISTC, since that for IHTC is similar. The choice of in (5) implies and , and thus the choice of makes sense.
First we consider the inner loop at lines 5 - 7 of Algorithm 1 and omit the index for notational simplicity. Let , and . Consider one IST iteration from to . The key step to the convergence proof is the following implication: with
[TABLE]
Now we show this claim. It follows from (1) and the following componentwise expression for the update
[TABLE]
By the hypothesis in (6), , , and (5), we deduce that for any
[TABLE]
by the definition of , and the second inequality follows from [15, Lemma 2.1]. Hence, , which implies directly . Meanwhile, under (6) and (5), for any , by Lemma 1, we deduce
[TABLE]
Thus we have , i.e., the claim (6) holds.
Next we prove the following assertion by mathematical induction: for all with , there holds
[TABLE]
Since is large, it satisfies (7). Now assume (7) holds for , i.e., and . When Algorithm 1 runs lines 3 - 7 for , since , then we have and From (6), we obtain that for all , In particular, if we choose , then (7) holds for . When Algorithm 1 terminates for some , then and . From (7) we have and . Likewise, if , property (ii) implies .
Last, we briefly discuss IHTC. For the choice in (5), makes sense. With , a similar argument yields
[TABLE]
The rest follows like before, and thus it is omitted. ∎
Remark II.1**.**
The proof works for any choice , including . In practice, we fix it at . This together with Theorem 2 allows estimating the complexity of Algorithm 1. At each iteration, one needs to compute matrix-vector product and , and for each , the number of iterations is bounded by . The overall cost depends on the decreasing factor by .
Remark II.2**.**
Conditions similar to Assumption II.1 have been widely used in the literature, for analyzing OMP [19, 20, 17] (with ) and for bounding the estimation error of Lasso [21, 22] (with and ). Thus Assumption II.1 is fairly standard. Examples of matrices with small MC include that formed by equiangular tight frame and random subgaussian matrices [23]. Further, we note that other similar conditions, e.g., restricted eigenvalue condition and RIP conditions, were also used to derive error bounds of the type for proximal gradient homotopy algorithms [12, 13] and Greedy methods, e.g., CoSaMP [24], NIHT [25] and CGIHT [26].
III Numerical Results and Discussions
Now we present numerical examples to show the convergence and the performance of Algorithm 1. First, we give implementation details, e.g., data generation, parameter setting for the algorithm. Then our method is compared with several state-of-the-art algorithms in terms of reconstruction error and recovery ability via phase transition.
III-A Implementation details
Following [6], the signals are chosen as -sparse with a dynamic range The matrix is chosen to be either random Gaussian matrix, or random Bernoulli matrix, or the product of a partial FFT matrix and inverse Haar wavelet transform. Under proper conditions, such matrices satisfy Assumption II.1. The noise has entries following i.i.d. .
We fix the algorithm parameters as follows: and for ISTC and IHTC, respectively [14, 15], decreasing factor . Since the optimal depends on the noise level , which is often unknown in practice, we predefine a path with and . Then we run Algorithm 1 on the path and select the optimal by Bayesian information criterion [14]. All the computations were performed on an eight-core desktop with 3.40 GHz and 12 GB RAM using MATLAB 2014a. The MATLAB package ISHTC for reproducing all the numerical results can be found at http://www0.cs.ucl.ac.uk/staff/b.jin/companioncode.html.
First we illustrate Theorem 2 by examining the influence of sparsity level , coherence and noise level on IHTC recovery on three settings (, , ):
- (a)
random Gaussian , 1e-2, . 2. (b)
random Gaussian , , 1e-4,1e-3,1e-2,1e-1,1. 3. (c)
is random Gaussian with correlation, where the parameter controls the coherence (see [27, Sect. 5.1] for details). In general a larger parameter gives a larger (a typical example: for ; for ; for and for ). We choose , , 1e-3.
The results in Fig. 1 are computed from 100 independent realizations. It is observed that when the sparsity level and noise level and incoherence are small, IHTC recovers the exact support with high probability as implied by Theorem 2.
III-B Comparison of ISTC with solvers
Now we compare ISTC with four state-of-the-art solvers: GPSR [8] (http://www.lx.it.pt/mtf/GPSR/), SpaRSA [9] (http://www.lx.it.pt/mtf/SpaRSA/), proximal-gradient homotopy method (PGH)[12] (https://www.microsoft.com/en-us/download/details.aspx?id=52421), and FISTA [5] (implemented as https://web.iem.technion.ac.il/images/user-files/becka/papers/wavelet_FISTA.zip)111All the codes were last accessed on February 23, 2017..
The numerical results (CPU time, number of matrix-vector multiplications (nMV), relative error (Re), and absolute error (Ab)) are computed from 10 independent realizations of for random Bernoulli sensing matrices with different parameter tuples are shown in Tables I. It is observed that ISTC yields reconstructions that are comparable with that by other methods but at least two to three times faster. Further, it scales well with the problem size .
Next, we compare the empirical performance of ISTC with other methods by their phase transition curves in the - plane, with and . When computing the curves, we fix the dimension , and partition the range into a equally spaced grid, and run 100 independent simulations at each grid point. The -sparse signal , matrix , and data are generated as [28, Fig. 13]. Fig. 2 plots the logistic regression curves identifying the success rate for the algorithms. IHTC exhibits similar phase transition behavior as other methods.
III-C Comparison of IHTC with greedy solvers
Now we compare IHTC with four state-of-the-art greedy methods for the problem, to recover 1D signal and benchmark MRI image. These methods include OMP [19] (https://sparselab.stanford.edu/SparseLab_files/Download_files/SparseLab21-Core.zip), normalized IHT (NIHT) [25] (http://www.gaga4cs.org/), CoSaMP [24] (http://mdav.ece.gatech.edu/software/SSCoSaMP-1.0.zip), and conjugate gradient IHT (CGIHT) [26] (http://www.gaga4cs.org/).
The underlying 1D signal and 2D MRI image are compressible under a wavelet basis. Thus, the data can be chosen as the wavelet coefficients sampled by the product of a partial FFT matrix and inverse Haar wavelet transform. For the 1D signal, the matrix is of size , and consists of applying a partial FFT and an inverse two level Harr wavelet transform. The signal under wavelet transform has nonzeros, and . The results are shown in Fig. 3 and Table III. The reconstruction by IHTC is visually more appealing than that of the others, cf. Fig. 3. The results by AIHT and CoSaMP suffer from pronounced oscillations. This is further confirmed by the PSNR value defined by , where is the maximum absolute value of the true signal, and MSE is the mean squared error of the reconstruction. Table III also presents the CPU time of the 1D example, which shows clearly that IHTC is the fastest one.
For the 2D MRI image, the matrix amounts to a partial FFT and an inverse wavelet transform, and it has a size . The image under eight level Haar wavelet transformation has nonzero entries and . The numerical results are shown in Fig. 4 and Table III. All methods produce comparable results, but the IHTC is fastest.
Next, we compare the empirical sparse recovery performance of IHTC with these greedy methods by means of phase transition curves in the - plane, with and . When computing the curves, we fix the dimension , partition the range into a uniform grid, and run 100 independent simulations at each grid point. Like before, the -sparse signal , matrix and data are generated as [28, Fig. 13]. Fig. 5 plots the logistic regression curves identifying the success rate for the algorithms. IHTC exhibits comparable phase transition phenomenon with other greedy methods, whereas CoSaMP performs slightly worse than others.
IV Conclusion
In this paper, we analyze an iterative soft / hard thresholding algorithm with homotopy continuation for sparse recovery from noisy data. Under standard regularity condition and sparsity assumptions, sharp reconstruction errors can be obtained with an iteration complexity . Numerical results indicated its competitiveness with state-of-the-art sparse recovery algorithms. The results can be extended to other penalties, e.g., MCP [29] or SCAD [30].
Acknowledgements
The authors thank anonymous referees for their helpful comments. The research of Y. Jiao is partially supported by National Science Foundation of China (NSFC) No. 11501579 and National Science Foundation of Hubei Province No. 2016CFB486, B. Jin by EPSRC grant EP/M025160/1, and X. Lu by NSFC Nos. 11471253 and 91630313.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Comm. Pure Appl. Math. , vol. 57, no. 11, pp. 1413–1457, 2004.
- 2[2] P. Combettes and V. Wajs, “Signal recovery by proximal forward-backward splitting,” Multiscale Model. Simul. , vol. 4, no. 4, pp. 1168–1200, 2005.
- 3[3] T. Blumensath and M. E. Davies, “Iterative thresholding for sparse approximations,” J. Fourier Anal. Appl. , vol. 14, no. 5-6, pp. 629–654, 2008.
- 4[4] H. Attouch, J. Bolte, and B. F. Svaiter, “Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods,” Math. Program. , vol. 137, no. 1-2, pp. 91–129, 2013.
- 5[5] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imag. Sci. , vol. 2, no. 1, pp. 183–202, 2009.
- 6[6] S. Becker, J. Bobin, and E. Candés, “NESTA: a fast and accurate first-order method for sparse recovery,” SIAM J. Imag. Sci. , vol. 4, no. 1, pp. 1–39, 2011.
- 7[7] E. Hale, W. Yin, and Y. Zhang, “Fixed-point continuation for ℓ 1 subscript ℓ 1 \ell_{1} -minimization: Methodology and convergence,” SIAM J. Optim. , vol. 19, no. 3, pp. 1107–1130, 2008.
- 8[8] M. Figueiredo, R. Nowak, and S. Wright, “Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems,” IEEE J. Sel. Topics Signal Proc. , vol. 1, no. 4, pp. 586–597, 2007.
