Iterative Soft/Hard Thresholding with Homotopy Continuation for Sparse Recovery

Yuling Jiao; Bangti Jin; Xiliang Lu

arXiv:1704.03121·math.NA·May 24, 2017·IEEE Signal Process. Lett.


TL;DR

This paper presents an analysis of an iterative thresholding algorithm with homotopy continuation for sparse signal recovery, achieving sharp ℓ∞ error bounds and efficient iteration complexity under mutual-coherence and sparsity conditions.

Contribution

It introduces a novel analysis of soft/hard thresholding with homotopy continuation, providing theoretical guarantees and complexity bounds for sparse recovery.

Findings

01

Achieves an ℓ∞ reconstruction error of O(ε) under regularity and sparsity conditions

02

Provides iteration complexity of O((ln ε)/(ln γ) np)

03

Demonstrates effectiveness through numerical examples

Abstract

In this note, we analyze an iterative soft/hard thresholding algorithm with homotopy continuation for recovering a sparse signal $x^{\dagger}$ from noisy data of a noise level $\epsilon$. Under suitable regularity and sparsity conditions, we design a path along which the algorithm can find a solution $x^{*}$ which admits a sharp reconstruction error $\|x^{*}-x^{\dagger}\|_{\ell^{\infty}}=O(\epsilon)$ with an iteration complexity $O(\frac{\ln\epsilon}{\ln\gamma}np)$, where $n$ and $p$ are the problem dimensions and $\gamma\in(0,1)$ controls the length of the path. Numerical examples are given to illustrate its performance.

Tables

Table I: Numerical results (CPU time and errors) with random Bernoulli $\Psi$, $p=10000, 18000$, $n=\lfloor p/4\rfloor$, $s=\lfloor n/40\rfloor$, $DR=100$ and $\sigma=$ 5e-2.

$p$	method	time (s)	nMV	Re $\ell^{2}$	Ab $\ell^{\infty}$
10000	ISTC	1.0	58	4.21e-3	2.66e-1
10000	PGH	1.7	419	4.14e-3	2.66e-1
10000	SpaRSA	3.4	302	4.13e-3	2.63e-1
10000	GPSR	3.0	256	4.25e-3	2.71e-1
10000	FISTA	5.3	505	4.30e-3	2.65e-1
18000	ISTC	3.3	58	4.34e-3	2.88e-1
18000	PGH	5.6	443	4.25e-3	2.85e-1
18000	SpaRSA	11.4	309	4.25e-3	2.84e-1
18000	GPSR	9.5	258	4.36e-3	2.91e-1
18000	FISTA	17.2	506	4.40e-3	2.74e-1

Equations

y=\Psi x^{\dagger}+\eta,

\min_{x\in\mathbb{R}^{p}}\ \frac{1}{2}\|\Psi x-y\|^{2}+\lambda\|x\|_{t},\quad t\in\{0,1\},

x^{k+1}=T_{\tau_{k}\lambda}\big(x^{k}+\tau_{k}\Psi^{t}(y-\Psi x^{k})\big),

T_{\lambda}(t)=\left\{\begin{array}{ll}\max(|t|-\lambda,0)\,\mathrm{sgn}(t),&\mbox{IST},\\ t\,\chi_{\{|t|>\sqrt{2\lambda}\}}(t),&\mbox{IHT},\end{array}\right.

|T_{\lambda}(x+y)-x|\leq\left\{\begin{array}{ll}|y|+\lambda,&\mbox{IST},\\ |y|+\sqrt{2\lambda},&\mbox{IHT}.\end{array}\right.

\lambda^{*}=\left\{\begin{array}{ll}C_{1}\epsilon,\ \mbox{with }C_{1}>\frac{1}{1-2\mu s},&\mbox{for ISTC},\\ C_{0}\epsilon^{2},\ \mbox{with }C_{0}>\frac{1}{2(1-2\mu s)^{2}},&\mbox{for IHTC}.\end{array}\right.

\gamma\in\left\{\begin{array}{ll}\big[\frac{2\mu s}{1-1/C_{1}},\,1\big),&\mbox{for ISTC},\\ \big[\big(\frac{2\mu s}{1-1/(2C_{0})^{1/2}}\big)^{2},\,1\big),&\mbox{for IHTC}.\end{array}\right.

\|x^{*}-x^{\dagger}\|_{\ell^{\infty}}\leq\left\{\begin{array}{ll}(C_{1}-1)\epsilon/(\mu s),&\mbox{for ISTC},\\ ((2C_{0})^{1/2}-1)\epsilon/(\mu s),&\mbox{for IHTC}.\end{array}\right.

\mathcal{A}^{k}\subset\mathcal{A}^{\dagger}\ \mbox{and}\ E^{k}\leq\alpha\lambda\quad\Rightarrow\quad\mathcal{A}^{k+1}\subset\mathcal{A}^{\dagger}\ \mbox{and}\ E^{k+1}\leq\alpha\gamma\lambda,

x_{i}^{k+1}=T_{\lambda}\big(x_{i}^{\dagger}+\Psi_{i}^{t}(\Psi_{\mathcal{A}^{\dagger}\cup\mathcal{A}^{k}\setminus\{i\}}(x^{\dagger}-x^{k})_{\mathcal{A}^{\dagger}\cup\mathcal{A}^{k}\setminus\{i\}}+\eta)\big),

\big|x_{i}^{\dagger}+\Psi_{i}^{t}(\Psi_{\mathcal{A}^{\dagger}\cup\mathcal{A}^{k}\setminus\{i\}}(x^{\dagger}-x^{k})_{\mathcal{A}^{\dagger}\cup\mathcal{A}^{k}\setminus\{i\}}+\eta)\big|\leq\mu sE^{k}+\epsilon\leq\mu s\alpha\lambda+\frac{\lambda}{C_{1}}=\lambda,\quad i\in\mathcal{I}^{\dagger},

|x_{i}^{k+1}-x_{i}^{\dagger}|\leq\lambda+\mu(s-1)E^{k}+\epsilon\leq\lambda+\mu s\alpha\lambda+\frac{1}{C_{1}}\lambda=\Big(1+\frac{1}{C_{1}}+\alpha\mu s\Big)\lambda=2\lambda\leq\alpha\gamma\lambda,

\mathrm{supp}\,x(\lambda_{\ell})\subset\mathcal{A}^{\dagger},\quad\|x(\lambda_{\ell})-x^{\dagger}\|_{\ell^{\infty}}\leq\alpha\gamma\lambda_{\ell},

\mathcal{A}^{k}\subset\mathcal{A}^{\dagger}\ \mbox{and}\ E^{k}\leq\alpha\sqrt{2\lambda}\quad\Rightarrow\quad\mathcal{A}^{k+1}\subset\mathcal{A}^{\dagger}\ \mbox{and}\ E^{k+1}\leq\alpha\sqrt{2\gamma\lambda}.


Full text


Yuling Jiao, Bangti Jin and Xiliang Lu. Yuling Jiao is in the School of Statistics and Mathematics and Big Data Institute of ZUEL, Zhongnan University of Economics and Law, Wuhan, 430063, P.R. China; Bangti Jin is in the Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK; and Xiliang Lu (corresponding author) is in the School of Mathematics and Statistics, Wuhan University and Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan 430072, P.R. China.


Index Terms:

iterative soft/hard thresholding, continuation, solution path, convergence

I Introduction

Sparse recovery has attracted much attention in machine learning, signal processing, statistics and inverse problems over the last decade. Often the problem is formulated as

$$y=\Psi x^{\dagger}+\eta,\qquad(1)$$

where $x^{{\dagger}}\in\mathbb{R}^{p}$ is the unknown sparse signal, $y\in\mathbb{R}^{n}$ is the data with the noise $\eta\in\mathbb{R}^{n}$ of level $\epsilon=\|\eta\|$ , and the matrix $\Psi\in\mathbb{R}^{n\times p}$ with $p\gg n$ has normalized columns $\{\psi_{i}\}$ , i.e., $\|\psi_{i}\|=1$ , $i=1,\ldots,p.$ The desired sparsity structure can be enforced by either the $\ell^{0}$ or $\ell^{1}$ penalty, i.e.,

$$\min_{x\in\mathbb{R}^{p}}\ \frac{1}{2}\|\Psi x-y\|^{2}+\lambda\|x\|_{t},\quad t\in\{0,1\},\qquad(2)$$

where $\lambda>0$ is the regularization parameter.

Among existing algorithms for minimizing (2), the iterative soft/hard thresholding (IST/IHT) algorithms [1, 2, 3, 4] and their accelerated extensions [5, 6] are extremely popular. These algorithms are of the form

$$x^{k+1}=T_{\tau_{k}\lambda}\big(x^{k}+\tau_{k}\Psi^{t}(y-\Psi x^{k})\big),\qquad(3)$$

where $\tau_{k}$ is the stepsize, and $T_{\lambda}$ is a soft- or hard-thresholding operator defined componentwise by

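$$T_{\lambda}(t)=\left\{\begin{array}{ll}\max(|t|-\lambda,0)\,\mathrm{sgn}(t),&\mbox{IST},\\ t\,\chi_{\{|t|>\sqrt{2\lambda}\}}(t),&\mbox{IHT},\end{array}\right.\qquad(4)$$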

where $\chi(t)$ is the characteristic function. Their convergence has been analyzed in many works, mostly under the condition $\tau_{k}<2/\|\Psi\|^{2}$, which ensures an (asymptotically) contractive thresholding and thus the desired convergence [1, 2, 3, 4]. Meanwhile, it has been observed that continuation along $\lambda$ can greatly speed up the algorithms [7, 8, 9, 6, 10]. Nonetheless, as pointed out in [11], “… the design of a robust, practical, and theoretically effective continuation algorithm remains an interesting open question …”. Several works have aimed at filling this gap. In [12, 13], a proximal gradient method with continuation for the $\ell^{1}$ problem was analyzed with line search, under a sparse restricted eigenvalue/restricted strong convexity condition. Recently, a Newton-type method with continuation was studied for the $\ell^{1}$ and $\ell^{0}$ problems [14, 15]. In this work, we present a unified approach to analyze IST/IHT with continuation and a fixed stepsize $\tau=1$, denoted by ISTC/IHTC. The challenge in the analysis is the lack of monotonicity of the function values due to the choice $\tau=1$: since the columns of $\Psi$ are normalized, $\|\Psi\|^{2}\geq\|\Psi\|_{F}^{2}/n=p/n$, so $\tau=1$ violates $\tau<2/\|\Psi\|^{2}$ whenever $p>2n$.
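To see why a fixed stepsize $\tau=1$ typically lies outside the classical range $\tau<2/\|\Psi\|^{2}$, note that unit-norm columns give $\|\Psi\|_{F}^{2}=p$ and hence $\|\Psi\|^{2}\geq\|\Psi\|_{F}^{2}/n=p/n$. The sketch below illustrates this on a hypothetical random Gaussian matrix with normalized columns; the sizes, seeds and helper names are illustrative choices of ours, and $\|\Psi\|^{2}$ is estimated by plain power iteration on $\Psi^{t}\Psi$.

```python
import math
import random


def unit_column_matrix(n, p, seed=0):
    """Hypothetical random Gaussian n x p matrix whose columns are scaled to unit norm."""
    rng = random.Random(seed)
    cols = []
    for _ in range(p):
        c = [rng.gauss(0.0, 1.0) for _ in range(n)]
        nrm = math.sqrt(sum(v * v for v in c))
        cols.append([v / nrm for v in c])
    # Row-major storage: Psi[i][j] is entry i of column j.
    return [[cols[j][i] for j in range(p)] for i in range(n)]


def spectral_norm_sq(Psi, iters=200, seed=1):
    """Power iteration on Psi^t Psi; the returned Rayleigh quotient is a lower bound on ||Psi||^2."""
    n, p = len(Psi), len(Psi[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    est = 0.0
    for _ in range(iters):
        nrm = math.sqrt(sum(a * a for a in v))
        v = [a / nrm for a in v]                                         # unit vector
        w = [sum(Psi[i][j] * v[j] for j in range(p)) for i in range(n)]  # w = Psi v
        est = sum(a * a for a in w)                                      # v^t Psi^t Psi v
        v = [sum(Psi[i][j] * w[i] for i in range(n)) for j in range(p)]  # Psi^t Psi v
    return est


n, p = 20, 60
Psi = unit_column_matrix(n, p)
est = spectral_norm_sq(Psi)
print(f"estimated ||Psi||^2 = {est:.2f}, bound p/n = {p / n:.1f}, 2/||Psi||^2 = {2.0 / est:.2f}")
```

Because the power-iteration estimate never exceeds $\|\Psi\|^{2}$, finding it above $p/n=3$ already shows $2/\|\Psi\|^{2}<1$, i.e., the classical condition fails for $\tau=1$ in this example.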

The overall procedure is given in Algorithm 1. Here $\lambda_{0}$ is an initial guess of $\lambda$, chosen large, $\gamma\in(0,1)$ is the decreasing factor for $\lambda$, and $K_{max}$ is the maximum number of inner iterations (for a fixed $\lambda$). The choice of the final $\lambda^{*}$ is given in (5) below. Notably, the inner problem does not need to be solved exactly (in fact, a single inner iteration suffices to attain the desired accuracy of the final solution $x^{*}$, cf. Theorem 2 below), and no stepsize selection is needed.
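Algorithm 1 itself is not reproduced in this text. A minimal sketch of the procedure as described here (fixed stepsize $\tau=1$, geometric path $\lambda_{\ell}=\lambda_{0}\gamma^{\ell}$, $K_{max}$ warm-started IST/IHT steps per $\lambda$, and output $x(\lambda_{\ell-1})$ once $\lambda_{\ell}<\lambda^{*}$) could look as follows. The zero initial guess, the function names and the dense list-based linear algebra are our assumptions, not the authors' MATLAB package ISHTC.

```python
import math


def threshold(t, lam, hard=False):
    """Componentwise T_lambda from (4): soft (IST) or hard (IHT) thresholding."""
    if hard:
        return t if abs(t) > math.sqrt(2.0 * lam) else 0.0
    return math.copysign(max(abs(t) - lam, 0.0), t)


def continuation_threshold(Psi, y, lam0, lam_star, gamma=0.8, k_max=5, hard=False):
    """Hypothetical ISTC/IHTC sketch with tau = 1 and lambda_l = lam0 * gamma**l.

    Returns the iterate for the last lambda_l >= lam_star, i.e. x(lambda_{l-1}).
    """
    if not 0.0 < gamma < 1.0 or lam_star <= 0.0 or k_max < 1:
        raise ValueError("need 0 < gamma < 1, lam_star > 0 and k_max >= 1")
    n, p = len(Psi), len(Psi[0])
    x = [0.0] * p                      # assumed initial guess: zero vector
    lam = lam0
    while lam >= lam_star:             # outer loop along the continuation path
        for _ in range(k_max):         # inner loop: k_max IST/IHT steps, warm-started
            r = [y[i] - sum(Psi[i][j] * x[j] for j in range(p)) for i in range(n)]
            g = [sum(Psi[i][j] * r[i] for i in range(n)) for j in range(p)]
            x = [threshold(x[j] + g[j], lam, hard) for j in range(p)]
        lam *= gamma                   # decrease lambda by the factor gamma
    return x


I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(continuation_threshold(I3, [3.0, 0.0, -2.0], lam0=3.0, lam_star=0.5, gamma=0.5))
```

With $\Psi=I$ every inner step reduces to $T_{\lambda}(y)$, so for the path $3, 1.5, 0.75$ and $\lambda^{*}=0.5$ the soft variant returns $T_{0.75}(y)=(2.25, 0, -1.25)$, while the hard variant keeps $y$ unchanged.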

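In Theorem 2, we prove that, under a suitable mutual coherence condition on the matrix $\Psi$ (cf. Assumption II.1 and Remark II.2), ISTC/IHTC always converges, and that its output $x^{*}$ has no nonzero entries outside the true support and satisfies $\|x^{*}-x^{\dagger}\|_{\ell^{\infty}}=O(\epsilon)$.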

II Convergence analysis

The starting point of our analysis is the next lemma.

Lemma 1.

For any $x,y\in\mathbb{R}$ , there holds

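$$|T_{\lambda}(x+y)-x|\leq\left\{\begin{array}{ll}|y|+\lambda,&\mbox{IST},\\ |y|+\sqrt{2\lambda},&\mbox{IHT}.\end{array}\right.$$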

Proof.

By the definition of the operator $T_{\lambda}$ , cf. (4),

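$$|T_{\lambda}(x+y)-x|\leq|T_{\lambda}(x+y)-(x+y)|+|y|\leq\left\{\begin{array}{ll}|y|+\lambda,&\mbox{IST},\\ |y|+\sqrt{2\lambda},&\mbox{IHT},\end{array}\right.$$

since $|T_{\lambda}(t)-t|=\min(|t|,\lambda)\leq\lambda$ for soft thresholding, while for hard thresholding $T_{\lambda}(t)-t$ vanishes if $|t|>\sqrt{2\lambda}$ and equals $-t$ with $|t|\leq\sqrt{2\lambda}$ otherwise,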

which completes the proof of the lemma. ∎
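Lemma 1 combines the triangle inequality with the elementary bounds $|T_{\lambda}(t)-t|\leq\lambda$ (soft) and $|T_{\lambda}(t)-t|\leq\sqrt{2\lambda}$ (hard). A randomized sanity check of the inequality might look like the sketch below; it is a spot check on sampled points, not a proof, and the sampling ranges and seed are arbitrary choices of ours.

```python
import math
import random


def soft(t, lam):
    """Soft thresholding: max(|t| - lam, 0) * sgn(t)."""
    return math.copysign(max(abs(t) - lam, 0.0), t)


def hard(t, lam):
    """Hard thresholding: keep t if |t| > sqrt(2 lam), otherwise 0."""
    return t if abs(t) > math.sqrt(2.0 * lam) else 0.0


def max_lemma1_violation(trials=10000, seed=0):
    """Largest value of |T(x+y) - x| - bound over random samples; <= 0 means no violation found."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        x, y = rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)
        lam = rng.uniform(0.01, 3.0)
        worst = max(
            worst,
            abs(soft(x + y, lam) - x) - (abs(y) + lam),
            abs(hard(x + y, lam) - x) - (abs(y) + math.sqrt(2.0 * lam)),
        )
    return worst


print(f"max of |T(x+y) - x| minus bound: {max_lemma1_violation():.3e}")
```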

Let the true signal $x^{\dagger}$ be $s$ -sparse with a support $\mathcal{A}^{\dagger}$ , i.e., $s=|\mathcal{A}^{\dagger}|$ , and $\mathcal{I}^{\dagger}$ the complement of $\mathcal{A}^{\dagger}$ . Recall also that the mutual coherence (MC) $\mu$ of the matrix $\Psi$ is defined by $\mu=\max_{i\neq j}|\langle\psi_{i},\psi_{j}\rangle|$ [16].
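The mutual coherence is a simple pairwise computation over the columns. A minimal sketch, assuming a dense matrix stored as a list of rows with nonzero columns (columns are renormalized defensively, although the text already assumes $\|\psi_{i}\|=1$); the function names are hypothetical:

```python
import math


def mutual_coherence(Psi):
    """mu = max_{i != j} |<psi_i, psi_j>| over the (re)normalized columns of Psi."""
    n, p = len(Psi), len(Psi[0])
    cols = []
    for j in range(p):
        c = [Psi[i][j] for i in range(n)]
        nrm = math.sqrt(sum(v * v for v in c))   # assumes no zero column
        cols.append([v / nrm for v in c])
    mu = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            mu = max(mu, abs(sum(a * b for a, b in zip(cols[i], cols[j]))))
    return mu


def satisfies_assumption(Psi, s):
    """Check the mutual coherence condition mu * s < 1/2 (Assumption II.1)."""
    return mutual_coherence(Psi) * s < 0.5


P = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]        # columns e1, e2 and (e1 + e2) up to scaling
print(mutual_coherence(P), satisfies_assumption(P, 1))
```

For this $2\times 3$ example $\mu=1/\sqrt{2}\approx 0.707$, so the condition fails even for $s=1$, whereas orthonormal columns give $\mu=0$. The double loop costs $O(p^{2}n)$, which is fine for illustration but expensive for $p$ in the thousands.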

Assumption II.1.

The MC $\mu$ of $\Psi$ satisfies $\mu s<1/2.$

The proper choice of the regularization parameter $\lambda$ is essential for successful sparse recovery. It is well known that under Assumption II.1, the choice $\lambda=O(\epsilon)$ for the $\ell^{1}$ penalty and $\lambda=O(\epsilon^{2})$ for the $\ell^{0}$ penalty ensures $\|x-x^{\dagger}\|_{\ell^{\infty}}=O(\epsilon)$ [17, 15]. Thus we consider the following a priori choice

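$$\lambda^{*}=\left\{\begin{array}{ll}C_{1}\epsilon,\ \mbox{with }C_{1}>\frac{1}{1-2\mu s},&\mbox{for ISTC},\\ C_{0}\epsilon^{2},\ \mbox{with }C_{0}>\frac{1}{2(1-2\mu s)^{2}},&\mbox{for IHTC}.\end{array}\right.\qquad(5)$$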

In practice, one may consider a posteriori choice rules [18]. Now we can state the global convergence of Algorithm 1.

Theorem 2.

Let Assumption II.1 hold, and $\lambda^{*}$ be chosen by (5). Suppose that $\lambda_{0}$ is large, $K_{max}\in\mathbb{N}$ , and

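$$\gamma\in\left\{\begin{array}{ll}\big[\frac{2\mu s}{1-1/C_{1}},\,1\big),&\mbox{for ISTC},\\ \big[\big(\frac{2\mu s}{1-1/(2C_{0})^{1/2}}\big)^{2},\,1\big),&\mbox{for IHTC}.\end{array}\right.$$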

Then Algorithm 1 is well-defined, and the solution $x^{*}$ satisfies:

(i)

$\mathrm{supp}(x^{*})\subset\mathcal{A}^{\dagger}$ ,

(ii)

there holds the error estimate

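$$\|x^{*}-x^{\dagger}\|_{\ell^{\infty}}\leq\left\{\begin{array}{ll}(C_{1}-1)\epsilon/(\mu s),&\mbox{for ISTC},\\ ((2C_{0})^{1/2}-1)\epsilon/(\mu s),&\mbox{for IHTC}.\end{array}\right.$$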

Further, if $\min_{i\in\mathcal{A}^{\dagger}}|x_{i}^{\dagger}|$ is large enough, then $\mathrm{supp}(x^{*})=\mathcal{A}^{\dagger}$ .
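The quantities in (5) and Theorem 2 are explicit once $\mu s$, $\epsilon$ and the constants are fixed. The calculator below is a sketch with hypothetical function names; it assumes $0<\mu s<1/2$ and evaluates $\lambda^{*}$, the smallest admissible $\gamma$, and, for ISTC, the $\ell^{\infty}$ bound $(C_{1}-1)\epsilon/(\mu s)$.

```python
import math


def istc_parameters(mu, s, eps, c1):
    """Return (lambda*, gamma_min, l_inf bound) for ISTC, following (5) and Theorem 2."""
    ms = mu * s
    if not 0.0 < ms < 0.5:
        raise ValueError("this sketch assumes 0 < mu*s < 1/2 (Assumption II.1)")
    if not c1 > 1.0 / (1.0 - 2.0 * ms):
        raise ValueError("(5) requires C1 > 1/(1 - 2 mu s)")
    lam_star = c1 * eps
    gamma_min = 2.0 * ms / (1.0 - 1.0 / c1)   # admissible gamma lies in [gamma_min, 1)
    err_bound = (c1 - 1.0) * eps / ms         # ||x* - x_dagger||_inf <= err_bound
    return lam_star, gamma_min, err_bound


def ihtc_parameters(mu, s, eps, c0):
    """Return (lambda*, gamma_min) for IHTC, following (5) and the gamma range of Theorem 2."""
    ms = mu * s
    if not 0.0 < ms < 0.5:
        raise ValueError("this sketch assumes 0 < mu*s < 1/2 (Assumption II.1)")
    if not c0 > 1.0 / (2.0 * (1.0 - 2.0 * ms) ** 2):
        raise ValueError("(5) requires C0 > 1/(2 (1 - 2 mu s)^2)")
    lam_star = c0 * eps ** 2
    gamma_min = (2.0 * ms / (1.0 - 1.0 / math.sqrt(2.0 * c0))) ** 2
    return lam_star, gamma_min


print(istc_parameters(0.02, 5, 0.01, 1.5))
print(ihtc_parameters(0.02, 5, 0.01, 2.0))
```

For example, with $\mu s=0.1$, $\epsilon=0.01$ and $C_{1}=1.5$ (admissible since $1/(1-2\mu s)=1.25$), this gives $\lambda^{*}=0.015$, $\gamma\geq 0.6$ and an error bound of $0.05$; with $C_{0}=2$, IHTC requires $\gamma\geq 0.16$.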

Proof.

We only prove the assertion for ISTC, since that for IHTC is similar. The choice of $C_{1}$ in (5) implies $C_{1}>1$ and $\frac{2\mu s}{1-1/C_{1}}<1$, so the admissible interval for $\gamma$ is nonempty.

First we consider the inner loop at lines 5 - 7 of Algorithm 1 and omit the index $\ell$ for notational simplicity. Let $E^{k}=\|x^{k}-x^{\dagger}\|_{\ell^{\infty}}$ and $\alpha=\frac{1-{1}/{C_{1}}}{\mu s}$. Consider one IST iteration from $x^{k}$ to $x^{k+1}$. The key step in the convergence proof is the following implication, with ${\mathcal{A}}^{k}=\mathrm{supp}(x^{k})$:

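$$\mathcal{A}^{k}\subset\mathcal{A}^{\dagger}\ \mbox{and}\ E^{k}\leq\alpha\lambda\quad\Rightarrow\quad\mathcal{A}^{k+1}\subset\mathcal{A}^{\dagger}\ \mbox{and}\ E^{k+1}\leq\alpha\gamma\lambda.\qquad(6)$$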

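We now prove this claim. By (1) and $\|\Psi_{i}\|=1$ (we write $\Psi_{i}=\psi_{i}$ for the $i$th column and $\Psi_{\mathcal{S}}$ for the submatrix with columns in $\mathcal{S}$), the update (3) with $\tau=1$ admits the following componentwise expression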

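$$x_{i}^{k+1}=T_{\lambda}\big(x_{i}^{\dagger}+\Psi_{i}^{t}(\Psi_{\mathcal{A}^{\dagger}\cup\mathcal{A}^{k}\setminus\{i\}}(x^{\dagger}-x^{k})_{\mathcal{A}^{\dagger}\cup\mathcal{A}^{k}\setminus\{i\}}+\eta)\big).$$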

By the hypothesis in (6), $\mathcal{A}^{k}\subset\mathcal{A}^{\dagger}$ , $E^{k}\leq\alpha\lambda$ , $\lambda\geq\lambda^{*}$ and (5), we deduce that for any $i\in\mathcal{I}^{\dagger}$

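$$\big|x_{i}^{\dagger}+\Psi_{i}^{t}(\Psi_{\mathcal{A}^{\dagger}\cup\mathcal{A}^{k}\setminus\{i\}}(x^{\dagger}-x^{k})_{\mathcal{A}^{\dagger}\cup\mathcal{A}^{k}\setminus\{i\}}+\eta)\big|\leq\big|\Psi_{i}^{t}\Psi_{\mathcal{A}^{\dagger}}(x^{\dagger}-x^{k})_{\mathcal{A}^{\dagger}}\big|+|\Psi_{i}^{t}\eta|\leq\mu sE^{k}+\epsilon\leq\mu s\alpha\lambda+\frac{\lambda}{C_{1}}=\lambda,$$

where the first inequality uses $x_{i}^{\dagger}=0$ and $\mathcal{A}^{\dagger}\cup\mathcal{A}^{k}\setminus\{i\}=\mathcal{A}^{\dagger}$, the second follows from [15, Lemma 2.1] and $|\Psi_{i}^{t}\eta|\leq\|\Psi_{i}\|\|\eta\|=\epsilon$, the third from $E^{k}\leq\alpha\lambda$ and $\epsilon=\lambda^{*}/C_{1}\leq\lambda/C_{1}$, and the last equality from the definition of $\alpha$. Hence, $|x^{k+1}_{i}|\leq|T_{\lambda}(\mu sE^{k}+\epsilon)|=0$, which directly implies $\mathcal{A}^{k+1}\subset\mathcal{A}^{\dagger}$. Meanwhile, under the hypothesis of (6) and (5), for any $i\in\mathcal{A}^{\dagger}$, Lemma 1 yields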

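$$|x_{i}^{k+1}-x_{i}^{\dagger}|\leq\lambda+\mu(s-1)E^{k}+\epsilon\leq\lambda+\mu s\alpha\lambda+\frac{1}{C_{1}}\lambda=\Big(1+\frac{1}{C_{1}}+\alpha\mu s\Big)\lambda=2\lambda\leq\alpha\gamma\lambda.$$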

Thus we have $E^{k+1}\leq\alpha\gamma\lambda$ , i.e., the claim (6) holds.
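The componentwise expression for the update uses only (1), $\tau=1$ and $\|\Psi_{i}\|=1$: the diagonal term $\Psi_{i}^{t}\Psi_{i}(x_{i}^{\dagger}-x_{i}^{k})$ cancels $x_{i}^{k}$. The sketch below checks this identity on a small hypothetical random instance (sizes, support and seed are arbitrary choices of ours); the thresholding $T_{\lambda}$ is left out, since it is applied to the same argument on both sides.

```python
import math
import random


def check_componentwise_identity(n=8, p=12, seed=0):
    """Max |full - split| over i for the IST argument with tau = 1 (hypothetical instance)."""
    rng = random.Random(seed)
    cols = []
    for _ in range(p):
        c = [rng.gauss(0.0, 1.0) for _ in range(n)]
        nrm = math.sqrt(sum(v * v for v in c))
        cols.append([v / nrm for v in c])            # unit-norm columns Psi_j

    def matvec(v):                                   # Psi v
        return [sum(cols[j][m] * v[j] for j in range(p)) for m in range(n)]

    x_true = [0.0] * p
    for j in (1, 4, 7):                              # support A^dagger = {1, 4, 7}
        x_true[j] = rng.uniform(1.0, 3.0)
    x_k = [0.0] * p
    for j in (1, 4):                                 # current iterate with A^k inside A^dagger
        x_k[j] = x_true[j] + rng.uniform(-0.5, 0.5)
    eta = [rng.gauss(0.0, 0.01) for _ in range(n)]
    y = [a + b for a, b in zip(matvec(x_true), eta)]         # data model (1)
    r = [a - b for a, b in zip(y, matvec(x_k))]              # residual y - Psi x^k
    full = [x_k[i] + sum(cols[i][m] * r[m] for m in range(n)) for i in range(p)]

    support = [j for j in range(p) if x_true[j] != 0.0 or x_k[j] != 0.0]
    worst = 0.0
    for i in range(p):
        others = [j for j in support if j != i]              # (A^dagger U A^k) \ {i}
        v = [sum(cols[j][m] * (x_true[j] - x_k[j]) for j in others) + eta[m] for m in range(n)]
        split = x_true[i] + sum(cols[i][m] * v[m] for m in range(n))
        worst = max(worst, abs(full[i] - split))
    return worst


print(f"max discrepancy: {check_componentwise_identity():.2e}")
```

Only rounding error should remain, since the identity is exact whenever the columns have unit norm.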

Next we prove the following assertion by mathematical induction: for all $\ell$ with $\lambda_{\ell}\geq\lambda^{*}$ , there holds

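$$\mathrm{supp}\,x(\lambda_{\ell})\subset\mathcal{A}^{\dagger},\quad\|x(\lambda_{\ell})-x^{\dagger}\|_{\ell^{\infty}}\leq\alpha\gamma\lambda_{\ell}.\qquad(7)$$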

Since $\lambda_{0}$ is large, (7) holds for $\ell=0$. Now assume that (7) holds for $\lambda_{\ell-1}$, i.e., $\mathrm{supp}\;x(\lambda_{\ell-1})\subset\mathcal{A}^{\dagger}$ and $\|x(\lambda_{\ell-1})-x^{\dagger}\|_{\ell^{\infty}}\leq\alpha\gamma\lambda_{\ell-1}$. When Algorithm 1 runs lines 3 - 7 for $\lambda_{\ell}$, the initial guess is $x^{0}=x(\lambda_{\ell-1})$, so $\mathcal{A}^{0}\subset\mathcal{A}^{\dagger}$ and, since $\lambda_{\ell}=\gamma\lambda_{\ell-1}$, $E^{0}\leq\alpha\gamma\lambda_{\ell-1}=\alpha\lambda_{\ell}$. Since $\gamma<1$, the conclusion of (6) again satisfies its hypothesis, so applying (6) repeatedly gives $\mathcal{A}^{k}\subset\mathcal{A}^{\dagger}$ and $E^{k}\leq\alpha\gamma\lambda_{\ell}$ for all $k\geq 1$. In particular, the choice $k=K_{max}$ shows that (7) holds for $\lambda_{\ell}$. When Algorithm 1 terminates at some $\lambda_{\ell}<\lambda^{*}$, we have $\lambda_{\ell-1}\geq\lambda^{*}$ and $x^{*}=x(\lambda_{\ell-1})$. Then (7) gives $\mathrm{supp}\;x^{*}\subset\mathcal{A}^{\dagger}$ and $\|x^{*}-x^{\dagger}\|_{\ell^{\infty}}\leq\alpha\gamma\lambda_{\ell-1}=\alpha\lambda_{\ell}<\alpha\lambda^{*}=(C_{1}-1)\epsilon/(\mu s)$. Finally, if $\min_{i\in{\mathcal{A}}^{\dagger}}|x_{i}^{\dagger}|>(C_{1}-1)\epsilon/(\mu s)$, then property (ii) forces $x^{*}_{i}\neq 0$ for every $i\in\mathcal{A}^{\dagger}$, which together with (i) gives $\mathrm{supp}(x^{*})={\mathcal{A}}^{\dagger}$.

Last, we briefly discuss IHTC. For the choice of $C_{0}$ in (5), $\frac{2\mu s}{1-1/(2C_{0})^{1/2}}<1$, so the interval $\gamma\in[(\frac{2\mu s}{1-{1}/({2C_{0}})^{1/2}})^{2},1)$ is nonempty. With $\alpha=\frac{1-{1}/({2C_{0}})^{1/2}}{\mu s}$, a similar argument yields

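$$\mathcal{A}^{k}\subset\mathcal{A}^{\dagger}\ \mbox{and}\ E^{k}\leq\alpha\sqrt{2\lambda}\quad\Rightarrow\quad\mathcal{A}^{k+1}\subset\mathcal{A}^{\dagger}\ \mbox{and}\ E^{k+1}\leq\alpha\sqrt{2\gamma\lambda}.$$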

The rest follows like before, and thus it is omitted. ∎

Remark II.1.

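The proof works for any $K_{max}\geq 1$, including $K_{max}=1$. In practice, we fix $K_{max}=5$. Together with Theorem 2, this allows estimating the complexity of Algorithm 1. Each inner iteration requires one matrix-vector product with $\Psi$ (to form $\Psi x^{k}$) and one with $\Psi^{t}$ (applied to the residual $y-\Psi x^{k}$), each at a cost of $O(np)$, and for each $\lambda$ the number of inner iterations is bounded by $K_{max}$. Since $\lambda_{\ell}=\lambda_{0}\gamma^{\ell}$, the number of outer steps with $\lambda_{\ell}\geq\lambda^{*}$ is about $\ln(\lambda^{*}/\lambda_{0})/\ln\gamma$, so the overall cost depends on the decreasing factor $\gamma$ through $O(\frac{\ln\lambda^{*}}{\ln\gamma}np)=O(\frac{\ln\epsilon}{\ln\gamma}np)$.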
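The complexity count of Remark II.1 can be made concrete. The sketch below uses a hypothetical helper; it assumes the geometric path $\lambda_{\ell}=\lambda_{0}\gamma^{\ell}$ and exactly $K_{max}$ inner steps per $\lambda$ (the worst case), and counts the path points with $\lambda_{\ell}\geq\lambda^{*}$ and the resulting matrix-vector products, two per inner step.

```python
import math


def continuation_cost(lam0, lam_star, gamma, k_max=5):
    """Worst-case cost: number of path points lambda_l >= lam_star and mat-vec products."""
    if not 0.0 < gamma < 1.0 or lam_star <= 0.0 or k_max < 1:
        raise ValueError("need 0 < gamma < 1, lam_star > 0 and k_max >= 1")
    steps, lam = 0, lam0
    while lam >= lam_star:
        steps += 1
        lam *= gamma
    # each inner step: one product with Psi and one with Psi^t, each O(n p) operations
    return steps, 2 * k_max * steps


steps, nmv = continuation_cost(1.0, 0.01, 0.5)
predicted = math.floor(math.log(0.01 / 1.0) / math.log(0.5)) + 1
print(steps, nmv, predicted)
```

With $\lambda_{0}=1$, $\lambda^{*}=0.01$, $\gamma=0.5$ and $K_{max}=5$, this gives 7 path points and at most 70 products, matching $\lfloor\ln(\lambda^{*}/\lambda_{0})/\ln\gamma\rfloor+1=7$.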

Remark II.2.

Conditions similar to Assumption II.1 have been widely used in the literature, for analyzing OMP [19, 20, 17] (with $(2s-1)\mu\leq 1$) and for bounding the estimation error of the Lasso [21, 22] (with $7s\mu<1$ and $4s\mu\leq 1$). Thus Assumption II.1 is fairly standard. Examples of matrices with small MC $\mu$ include those formed by equiangular tight frames and random subgaussian matrices [23]. Further, other similar conditions, e.g., restricted eigenvalue and RIP conditions, have also been used to derive error bounds of the type $\|x-x^{\dagger}\|_{2}=O(\epsilon)$ for proximal gradient homotopy algorithms [12, 13] and greedy methods, e.g., CoSaMP [24], NIHT [25] and CGIHT [26].

III Numerical Results and Discussions

Now we present numerical examples to show the convergence and the performance of Algorithm 1. First, we give implementation details, including data generation and parameter settings. Then our method is compared with several state-of-the-art algorithms in terms of reconstruction error and recovery ability via phase transition.

III-A Implementation details

Following [6], the signals $x^{\dagger}$ are chosen to be $s$-sparse with a dynamic range $DR:=\max\{|x^{\dagger}_{i}|:x^{\dagger}_{i}\neq 0\}/\min\{|x^{\dagger}_{i}|:x^{\dagger}_{i}\neq 0\}.$ The matrix $\Psi\in\mathbb{R}^{n\times p}$ is either a random Gaussian matrix, a random Bernoulli matrix, or the product of a partial FFT matrix and an inverse Haar wavelet transform. Under proper conditions, such matrices satisfy Assumption II.1. The noise $\eta$ has i.i.d. entries following $N(0,\sigma^{2})$.
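The exact generator of [6] is not reproduced here. A hypothetical test-instance generator in the spirit of this setup might draw the support uniformly, use random signs, spread the magnitudes log-uniformly over $[1,DR]$ (pinning one entry at 1 and one at $DR$ so that the dynamic range is attained exactly), use a random Bernoulli matrix with entries $\pm 1/\sqrt{n}$ (so every column has unit norm), and add i.i.d. $N(0,\sigma^{2})$ noise. All of these distributional choices, and the function names, are our assumptions.

```python
import math
import random


def make_problem(n, p, s, dr, sigma, seed=0):
    """Hypothetical instance: s-sparse x with dynamic range dr, Bernoulli Psi, Gaussian noise."""
    if s < 2 or s > p or dr < 1:
        raise ValueError("need 2 <= s <= p and dr >= 1")
    rng = random.Random(seed)
    support = rng.sample(range(p), s)
    # magnitudes log-uniform in [1, dr]; one pinned at 1 and one at dr so DR is exact
    mags = [1.0, float(dr)] + [dr ** rng.random() for _ in range(s - 2)]
    x = [0.0] * p
    for j, m in zip(support, mags):
        x[j] = m if rng.random() < 0.5 else -m       # random sign
    # random Bernoulli matrix with entries +-1/sqrt(n): every column has unit norm
    scale = 1.0 / math.sqrt(n)
    Psi = [[scale if rng.random() < 0.5 else -scale for _ in range(p)] for _ in range(n)]
    y = [sum(Psi[i][j] * x[j] for j in support) + rng.gauss(0.0, sigma) for i in range(n)]
    return Psi, x, y


def dynamic_range(x):
    """DR = max |x_i| / min |x_i| over the nonzero entries."""
    nz = [abs(v) for v in x if v != 0.0]
    return max(nz) / min(nz)


Psi, x, y = make_problem(25, 100, 5, 100, 0.05, seed=3)
print(dynamic_range(x), sum(1 for v in x if v != 0.0))
```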

We fix the algorithm parameters as follows: $\lambda_{0}=\|\Psi^{t}y\|_{\infty}$ and $\lambda_{0}=\|\Psi^{t}y\|_{\infty}^{2}/2$ for ISTC and IHTC, respectively [14, 15], and the decreasing factor $\gamma=0.8$. Since the optimal $\lambda^{*}$ depends on the noise level $\epsilon$, which is often unknown in practice, we predefine a path $\Lambda=\{\lambda_{\ell}\}_{\ell=0}^{N}$ with $\lambda_{\ell}=\lambda_{0}\gamma^{\ell}$ and $N=100$. Then we run Algorithm 1 on the path $\Lambda$ and select $\lambda^{*}$ by the Bayesian information criterion (BIC) [14]. All computations were performed on an eight-core 3.40 GHz desktop with 12 GB RAM using MATLAB 2014a. The MATLAB package ISHTC for reproducing all the numerical results can be found at http://www0.cs.ucl.ac.uk/staff/b.jin/companioncode.html.
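The continuation path is fully determined by $\lambda_{0}$, $\gamma$ and $N$. The sketch below (hypothetical helper name, dense list-based storage assumed) builds $\Lambda$ with the initial values above; the BIC selection of [14] is not sketched. One way to read the choice of $\lambda_{0}$: starting from $x=0$, every entry of $\Psi^{t}y$ then has magnitude at most $\lambda_{0}$ (soft) or $\sqrt{2\lambda_{0}}$ (hard), so the first thresholding step returns zero. With $\gamma=0.8$ and $N=100$, the path spans a factor $0.8^{100}\approx 2\times 10^{-10}$.

```python
def continuation_path(Psi, y, gamma=0.8, N=100, hard=False):
    """lambda_0 = ||Psi^t y||_inf (ISTC) or ||Psi^t y||_inf^2 / 2 (IHTC); lambda_l = lambda_0 gamma^l."""
    n, p = len(Psi), len(Psi[0])
    corr = max(abs(sum(Psi[i][j] * y[i] for i in range(n))) for j in range(p))  # ||Psi^t y||_inf
    lam0 = corr ** 2 / 2.0 if hard else corr
    return [lam0 * gamma ** l for l in range(N + 1)]


I2 = [[1.0, 0.0], [0.0, 1.0]]
path = continuation_path(I2, [3.0, -4.0])
print(path[:3], path[-1] / path[0])
```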

First we illustrate Theorem 2 by examining the influence of the sparsity level $s$, the coherence $\mu$ and the noise level $\sigma$ on IHTC recovery in three settings ($n=500$, $p=1000$, $DR=100$):

(a) random Gaussian $\Psi$, $\sigma=$ 1e-2, $s=10:10:100$;

(b) random Gaussian $\Psi$, $s=50$, $\sigma=$ 1e-4, 1e-3, 1e-2, 1e-1, 1;

(c) $\Psi$ is a random Gaussian matrix with correlation, where the parameter $\nu$ controls the coherence $\mu$ (see [27, Sect. 5.1] for details). In general, a larger $\nu$ gives a larger $\mu$ (typical values: $\mu=0.19$ for $\nu=0$; $\mu=0.33$ for $\nu=0.15$; $\mu=0.56$ for $\nu=0.3$; and $\mu=0.74$ for $\nu=0.5$). We choose $\nu=0:0.05:1$, $s=10$, $\sigma=$ 1e-3.

The results in Fig. 1 are computed from 100 independent realizations. When the sparsity level $s$, the noise level $\sigma$ and the correlation parameter $\nu$ (and hence the coherence $\mu$) are small, IHTC recovers the exact support with high probability, as predicted by Theorem 2.

III-B Comparison of ISTC with $\ell^{1}$ solvers

Now we compare ISTC with four state-of-the-art $\ell^{1}$ solvers: GPSR [8] (http://www.lx.it.pt/mtf/GPSR/), SpaRSA [9] (http://www.lx.it.pt/mtf/SpaRSA/), the proximal-gradient homotopy method (PGH) [12] (https://www.microsoft.com/en-us/download/details.aspx?id=52421), and FISTA [5] (implemented as https://web.iem.technion.ac.il/images/user-files/becka/papers/wavelet_FISTA.zip). (Footnote: All the codes were last accessed on February 23, 2017.)

Table I reports the numerical results (CPU time, number of matrix-vector multiplications (nMV), relative $\ell^{2}$ error (Re $\ell^{2}$), and absolute $\ell^{\infty}$ error (Ab $\ell^{\infty}$)), computed from 10 independent realizations with random Bernoulli sensing matrices for different parameter tuples $(n,p,s,DR,\sigma)$. ISTC yields reconstructions comparable in accuracy with those of the other methods while being the fastest: about 1.7 times faster than PGH and roughly three to five times faster than SpaRSA, GPSR and FISTA, with far fewer matrix-vector multiplications (58 versus 256 to 506). Further, it scales well with the problem size $p$: its nMV stays at 58 when $p$ grows from 10000 to 18000.

Next, we compare the empirical performance of ISTC with the other methods by their phase transition curves in the $\rho$-$\delta$ plane, with $\rho=s/n$ and $\delta=n/p$. When computing the curves, we fix the dimension $p=1000$, partition the range $(\delta,\rho)\in[0.1,1]^{2}$ into a $30\times 30$ equally spaced grid, and run 100 independent simulations at each grid point. The $s$-sparse signal $x^{{\dagger}}\in\mathbb{R}^{p}$, matrix $\Psi\in\mathbb{R}^{n\times p}$, and data $y\in\mathbb{R}^{n}$ are generated as in [28, Fig. 13]. Fig. 2 plots the logistic regression curves identifying the $90\%$ success rate for the algorithms. ISTC exhibits phase transition behavior similar to that of the other methods.
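The grid in the $\rho$-$\delta$ plane has to be turned into integer problem sizes. A minimal sketch, assuming equispaced points that include both endpoints of $[0.1,1]$ and rounding $n=\delta p$ and $s=\rho n$ to the nearest integers (the text does not specify the rounding rule; the helper name is ours), is:

```python
def phase_transition_grid(m, p=1000, lo=0.1, hi=1.0):
    """Map an m x m equispaced grid of (delta, rho) = (n/p, s/n) in [lo, hi]^2 to integer (n, s)."""
    if m < 2:
        raise ValueError("need at least two grid points per axis")
    pts = [lo + k * (hi - lo) / (m - 1) for k in range(m)]
    grid = []
    for delta in pts:
        n = max(1, round(delta * p))       # assumed rounding rule for n = delta * p
        for rho in pts:
            s = max(1, round(rho * n))     # assumed rounding rule for s = rho * n
            grid.append((delta, rho, n, s))
    return grid


grid = phase_transition_grid(30)
print(len(grid), grid[0], grid[-1])
```

Each $(n,s)$ pair would then be used for 100 random trials; the success criterion and the logistic regression fit for the $90\%$ curve are not sketched here.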

III-C Comparison of IHTC with greedy solvers

Now we compare IHTC with four state-of-the-art greedy methods for the $\ell^{0}$ problem on the recovery of a 1D signal and a benchmark MRI image. These methods are OMP [19] (https://sparselab.stanford.edu/SparseLab_files/Download_files/SparseLab21-Core.zip), normalized IHT (NIHT) [25] (http://www.gaga4cs.org/), CoSaMP [24] (http://mdav.ece.gatech.edu/software/SSCoSaMP-1.0.zip), and conjugate gradient IHT (CGIHT) [26] (http://www.gaga4cs.org/).

The underlying 1D signal and 2D MRI image are compressible under a wavelet basis. Thus, the unknown is taken to be the vector of wavelet coefficients, and the sensing matrix is the product of a partial FFT matrix and an inverse Haar wavelet transform. For the 1D signal, the matrix $\Psi$ is of size $665\times 1024$ and consists of a partial FFT and an inverse two-level Haar wavelet transform. The signal under the wavelet transform has $247$ nonzeros, and $\sigma=\mbox{1e-4}$. The results are shown in Fig. 3 and Table III. The reconstruction by IHTC is visually more appealing than those of the others, cf. Fig. 3, while the results by AIHT and CoSaMP suffer from pronounced oscillations. This is further confirmed by the PSNR value, defined by $\mathrm{PSNR}=10\cdot\log_{10}\frac{V^{2}}{\rm MSE}$, where $V$ is the maximum absolute value of the true signal and MSE is the mean squared error of the reconstruction. Table III also presents the CPU time for the 1D example, which clearly shows that IHTC is the fastest.
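The PSNR formula translates directly into code. A sketch, assuming the base-10 logarithm (the usual convention for PSNR in dB) and signals given as plain lists; the function name is ours:

```python
import math


def psnr(x_true, x_rec):
    """PSNR = 10 log10(V^2 / MSE), with V = max |x_true_i| and MSE the mean squared error."""
    if len(x_true) != len(x_rec) or not x_true:
        raise ValueError("inputs must be non-empty and of equal length")
    v = max(abs(a) for a in x_true)
    mse = sum((a - b) ** 2 for a, b in zip(x_true, x_rec)) / len(x_true)
    return math.inf if mse == 0.0 else 10.0 * math.log10(v * v / mse)


print(psnr([1.0, -2.0, 0.0, 0.5], [1.0, -2.0, 0.0, 0.7]))
```

Here $V=2$ and $\mathrm{MSE}=0.04/4=0.01$, so the value is $10\log_{10}400\approx 26.02$ dB; a perfect reconstruction gives an infinite PSNR.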

For the 2D MRI image, the matrix $\Psi$ amounts to a partial FFT and an inverse wavelet transform, and it is of size $34489\times 262144$. The image under an eight-level Haar wavelet transform has $7926$ nonzero entries, and $\sigma=\mbox{3e-2}$. The numerical results are shown in Fig. 4 and Table III. All $\ell^{0}$ methods produce comparable results, but IHTC is the fastest.

Next, we compare the empirical sparse recovery performance of IHTC with these greedy methods by means of phase transition curves in the $\rho$-$\delta$ plane, with $\rho=s/n$ and $\delta=n/p$. When computing the curves, we fix the dimension $p=1000$, partition the range $(\delta,\rho)\in[0.1,1]^{2}$ into a $90\times 90$ uniform grid, and run 100 independent simulations at each grid point. As before, the $s$-sparse signal $x^{{\dagger}}\in\mathbb{R}^{p}$, matrix $\Psi\in\mathbb{R}^{n\times p}$ and data $y\in\mathbb{R}^{n}$ are generated as in [28, Fig. 13]. Fig. 5 plots the logistic regression curves identifying the $90\%$ success rate for the algorithms. IHTC exhibits a phase transition comparable to those of the other greedy methods, whereas CoSaMP performs slightly worse than the others.

IV Conclusion

In this paper, we have analyzed an iterative soft/hard thresholding algorithm with homotopy continuation for sparse recovery from noisy data. Under standard regularity and sparsity conditions, sharp reconstruction errors can be obtained with an iteration complexity $O(\frac{\ln\epsilon}{\ln\gamma}np)$. Numerical results indicate its competitiveness with state-of-the-art sparse recovery algorithms. The results can be extended to other penalties, e.g., MCP [29] or SCAD [30].

Acknowledgements

The authors thank anonymous referees for their helpful comments. The research of Y. Jiao is partially supported by National Science Foundation of China (NSFC) No. 11501579 and National Science Foundation of Hubei Province No. 2016CFB486, B. Jin by EPSRC grant EP/M025160/1, and X. Lu by NSFC Nos. 11471253 and 91630313.

Bibliography


[1] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Comm. Pure Appl. Math., vol. 57, no. 11, pp. 1413–1457, 2004.
[2] P. Combettes and V. Wajs, “Signal recovery by proximal forward-backward splitting,” Multiscale Model. Simul., vol. 4, no. 4, pp. 1168–1200, 2005.
[3] T. Blumensath and M. E. Davies, “Iterative thresholding for sparse approximations,” J. Fourier Anal. Appl., vol. 14, no. 5-6, pp. 629–654, 2008.
[4] H. Attouch, J. Bolte, and B. F. Svaiter, “Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods,” Math. Program., vol. 137, no. 1-2, pp. 91–129, 2013.
[5] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imag. Sci., vol. 2, no. 1, pp. 183–202, 2009.
[6] S. Becker, J. Bobin, and E. Candès, “NESTA: a fast and accurate first-order method for sparse recovery,” SIAM J. Imag. Sci., vol. 4, no. 1, pp. 1–39, 2011.
[7] E. Hale, W. Yin, and Y. Zhang, “Fixed-point continuation for $\ell_{1}$-minimization: Methodology and convergence,” SIAM J. Optim., vol. 19, no. 3, pp. 1107–1130, 2008.
[8] M. Figueiredo, R. Nowak, and S. Wright, “Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems,” IEEE J. Sel. Topics Signal Proc., vol. 1, no. 4, pp. 586–597, 2007.
