Accelerated Schemes for the $L_1/L_2$ Minimization
Chao Wang, Ming Yan, Yaghoub Rahimi, Yifei Lou

TL;DR
This paper introduces accelerated algorithms for $L_1/L_2$ minimization in sparse recovery, demonstrating efficiency and effectiveness, especially with high dynamic range signals, and providing empirical insights into exact $L_1$ recovery.
Contribution
It proposes three new numerical algorithms for $L_1/L_2$ minimization, including two adaptive schemes that reduce computation time, and analyzes their convergence.
Findings
Algorithms are comparable to state-of-the-art methods.
Adaptive schemes work well with high dynamic range signals.
Empirical evidence suggests conditions for exact $L_1$ recovery.
Abstract
In this paper, we consider the $L_1/L_2$ minimization for sparse recovery and study its relationship with the $L_1$-$\alpha L_2$ model. Based on this relationship, we propose three numerical algorithms to minimize this ratio model, two of which work as adaptive schemes and greatly reduce the computation time. Focusing on the two adaptive schemes, we discuss their connection to existing approaches and analyze their convergence. The experimental results demonstrate that the proposed approaches are comparable to state-of-the-art methods in sparse recovery and work particularly well when the ground-truth signal has a high dynamic range. Lastly, we reveal some empirical evidence on the exact $L_1$ recovery under various combinations of sparsity, coherence, and dynamic ranges, which calls for theoretical justification in the future.
Table I: success rates (%) for different combinations of sparsity levels (columns) and dynamic ranges (rows), one block per coherence setting.
| 2 | 6 | 10 | 14 | 18 | 22 |
| 100 | 100 | 80 | 4 | 0 | 0 |
| 100 | 100 | 80 | 4 | 0 | 0 |
| 100 | 100 | 80 | 4 | 0 | 0 |
| 100 | 100 | 80 | 4 | 0 | 0 |
| 100 | 100 | 86 | 16 | 0 | 0 |
| 100 | 100 | 88 | 38 | 12 | 0 |

| 2 | 6 | 10 | 14 | 18 | 22 |
| 100 | 100 | 100 | 100 | 50 | 0 |
| 100 | 100 | 100 | 100 | 52 | 0 |
| 100 | 100 | 100 | 100 | 52 | 0 |
| 100 | 100 | 100 | 100 | 52 | 0 |
| 100 | 100 | 100 | 100 | 54 | 0 |
| 100 | 100 | 100 | 100 | 76 | 16 |
C. Wang and Y. Lou are with the Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX 75080 USA. Y. Lou was partially supported by NSF Awards DMS 1522786 and 1846690. Y. Rahimi is with the School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332 USA. M. Yan is with the Department of Computational Mathematics, Science and Engineering (CMSE) and the Department of Mathematics, Michigan State University, East Lansing, MI 48824 USA. M. Yan was partially supported by NSF award DMS 1621798.
Index Terms:
Sparsity, $L_1/L_2$, adaptive scheme, dynamic range.
I Introduction
In various science and engineering applications, one seeks a low-dimensional representation of high-dimensional data, and sparsity is a crucial assumption. For example, it is reasonable to assume in machine learning [1] that only a few features correspond to the response. In image processing [2], the restored images are often piecewise constant, which means that gradients are sparse. In non-negative matrix factorization [3], the low-rank decomposition enforces sparsity with respect to singular values.
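Sparse signal recovery is to find the sparsest solution of $Ax = b$ where $A \in \mathbb{R}^{m\times n}$ ($m \ll n$), $x \in \mathbb{R}^n$, and $b \in \mathbb{R}^m$. We assume that $A$ has a full row rank and $b$ is nonzero. This problem is often referred to as compressed sensing (CS) [4, 5] in the sense that the sparse signal is compressible. Mathematically, it can be formulated by the $L_0$ minimization,
$$\min_{x\in\mathbb{R}^n} \|x\|_0 \quad \text{s.t.} \quad Ax = b. \tag{1}$$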
Unfortunately, the $L_0$ problem is known to be NP-hard [6]. Various approaches in sparse recovery have been investigated. Some greedy methods include orthogonal matching pursuit (OMP) [7], orthogonal least squares (OLS) [8], and compressive sampling matching pursuit (CoSaMP) [9]. However, these greedy methods often lack accuracy when is large. Alternatively, approximation/relaxation approaches to the $L_0$ norm have been sought. For example, convex relaxation, referred to as basis pursuit (BP) [10], replaces $L_0$ in (1) with the $L_1$ norm. Recently, nonconvex models have attracted a considerable amount of attention due to their sharper approximations of $L_0$ compared to the $L_1$ norm. Some popular nonconvex models include $L_p$ [11, 12, 13], $L_1$-$L_2$ [14, 15], transformed $L_1$ (TL1) [16, 17, 18], nonnegative garrote [19], and capped-$L_1$ [20, 21, 22]. Except for $L_1$-$L_2$, all of these nonconvex models involve one parameter to be determined and adjusted for different types of sparse recovery problems.
In this paper, we study the ratio of $L_1$ and $L_2$ as a scale-invariant and parameter-free metric to approximate the desired scale-invariant $L_0$ norm. The ratio of $L_1$ and $L_2$ can be traced back to [23] as a sparsity measure, and its scale-invariant property was explicitly mentioned in [24]. Esser et al. [25, 14] focused on nonnegative signals and established the equivalence between $L_1/L_2$ and $L_0$. The ratio model was later formulated as a nonlinear constraint that was solved by a lifted approach [26, 27]. Some applications of $L_1/L_2$ include blind deconvolution [28, 29] and sparse filtering [30, 31].
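To make the scale-invariance claim concrete, the following minimal sketch evaluates the ratio $\|x\|_1/\|x\|_2$ on a few vectors. The helper name `l1_over_l2` and the test vectors are our own illustrative choices, not part of the paper. The ratio equals 1 for a 1-sparse vector, $\sqrt{s}$ for an $s$-sparse vector with equal magnitudes, and $\sqrt{n}$ for a constant dense vector, and it does not change when the vector is rescaled.

```python
import numpy as np


def l1_over_l2(x):
    """Ratio ||x||_1 / ||x||_2 of a nonzero vector (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return np.abs(x).sum() / np.linalg.norm(x)


n = 100
one_sparse = np.zeros(n)
one_sparse[3] = 5.0                       # a single nonzero entry
s_sparse = np.zeros(n)
s_sparse[:4] = [1.0, -1.0, 1.0, -1.0]     # four entries of equal magnitude
dense = np.ones(n)                        # no zero entries at all

for name, v in [("1-sparse", one_sparse), ("4-sparse", s_sparse), ("dense", dense)]:
    print(name, l1_over_l2(v))

# Rescaling a vector leaves the ratio unchanged (scale invariance).
print(np.isclose(l1_over_l2(-3.0 * s_sparse), l1_over_l2(s_sparse)))
```

Sparser vectors give smaller ratios, which is why the ratio can serve as a surrogate for $L_0$ without a tuning parameter.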
In our earlier work [32], we focused on a constrained minimization problem,
$$\min_{x\in\mathbb{R}^n} \frac{\|x\|_1}{\|x\|_2} \quad \text{s.t.} \quad Ax = b. \tag{2}$$
Theoretically, we proved that any $s$-sparse vector is a local minimizer of the $L_1/L_2$ model provided with a strong null space property (sNSP) condition. Computationally, we minimized (2) via the alternating direction method of multipliers (ADMM) [33]. In particular, we introduced two auxiliary variables and formed the augmented Lagrangian as
[TABLE]
where is defined as
[TABLE]
There is a closed-form solution for each sub-problem. Please refer to [32] for more details.
This paper contributes three schemes to minimize (2). We demonstrate in experiments that the new schemes are computationally more efficient than the previous ADMM approach. The novelties of the paper are three-fold:
- (1) Thanks to the new schemes, $L_1/L_2$ can effectively deal with sparse signals with a high dynamic range, which is not the case for the ADMM approach;
- (2) We reveal the connection of the proposed schemes to existing approaches, which helps to establish the convergence;
- (3) Our empirical results shed light on the effects of sparsity, coherence, and dynamic range on sparse recovery, which is new in the CS literature.
The rest of the paper is organized as follows. Section II is devoted to theoretical analysis on the relation between $L_1/L_2$ and $L_1$-$\alpha L_2$, which motivates three numerical schemes to minimize $L_1/L_2$. We interpret the proposed schemes in line with some existing approaches in Section III, followed by convergence analysis in Section IV. We conduct extensive experiments in Section V to demonstrate the performance of the $L_1/L_2$ model with three minimizing algorithms over state-of-the-art methods in sparse recovery. Section VI presents how the classic $L_1$ approach behaves under different dynamic ranges and how sparsity, coherence, and dynamic range interplay on sparse recovery. Finally, conclusions and future works are given in Section VII.
II Numerical schemes
We establish in Proposition 1 a link between the constrained $L_1/L_2$ formulation (2) and $L_1$-$\alpha L_2$, where $\alpha$ is a positive parameter. Immediately following this proposition, we develop a numerical algorithm for minimizing the ratio model. We further discuss two accelerated approaches in Section II-B.
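Proposition 1.
Denote
$$\alpha^* = \inf_{x\in\mathbb{R}^n} \left\{ \frac{\|x\|_1}{\|x\|_2} \quad \text{s.t.} \quad Ax = b \right\} \tag{5}$$
and
$$T(\alpha) = \inf_{x\in\mathbb{R}^n} \left\{ \|x\|_1 - \alpha\|x\|_2 \quad \text{s.t.} \quad Ax = b \right\}, \tag{6}$$
then we have
- (a) if $T(\alpha) < 0$, then $\alpha > \alpha^*$;
- (b) if $T(\alpha) \geq 0$, then $\alpha \leq \alpha^*$;
- (c) if $T(\alpha) = 0$, then $\alpha = \alpha^*$.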
Proof.
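Denote the feasible set of (5) by $\Omega := \{x \mid Ax = b\}$. Since $b \neq 0$, then $0 \notin \Omega$.
- (a) If $T(\alpha) < 0$, then there exists $x \in \Omega$ such that $\|x\|_1 - \alpha\|x\|_2 < 0$, which implies that $\alpha > \frac{\|x\|_1}{\|x\|_2}$. Therefore, we have $\alpha > \alpha^*$.
- (b) If $T(\alpha) \geq 0$, then for all $x \in \Omega$ we have $\|x\|_1 - \alpha\|x\|_2 \geq 0$. So $\alpha \leq \frac{\|x\|_1}{\|x\|_2}$ for all $x \in \Omega$ and hence $\alpha \leq \inf_{x\in\Omega} \frac{\|x\|_1}{\|x\|_2}$, i.e., $\alpha \leq \alpha^*$.
- (c) If $T(\alpha) = 0$, then by part (b) we get $\alpha \leq \alpha^*$. Furthermore, there exists a sequence $\{x_k\} \subset \Omega$ such that $\|x_k\|_1 - \alpha\|x_k\|_2 \to 0$. Since $0 \notin \Omega$ and $\Omega$ is closed, we have $\inf_{x\in\Omega}\|x\|_2 > 0$. Hence, $\|x_k\|_2$ has a lower bound, i.e., $\|x_k\|_2 \geq c > 0$ for all $k$, then we get $\frac{\|x_k\|_1}{\|x_k\|_2} \to \alpha$, which means $\alpha^* \leq \alpha$. Therefore, we have $\alpha = \alpha^*$.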
∎
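As a sanity check of Proposition 1, consider a small worked example that is our own illustration and not taken from the paper. Write $\alpha^*$ for the minimal ratio in (5) and $T(\alpha)$ for the optimal value of the $L_1$-$\alpha L_2$ problem in (6), and take $n = 2$, $A = [1 \ \ 1]$, and $b = 1$, so that the feasible set is a line in the plane.

```latex
\begin{align*}
\Omega &= \{(t,\, 1-t) : t \in \mathbb{R}\}, \qquad
  \min_{x \in \Omega} \|x\|_2 = \tfrac{1}{\sqrt{2}} \ \text{(attained at } t = \tfrac12\text{)}. \\
\alpha^* &= 1, \ \text{attained at } (1,0) \text{ and } (0,1),
  \ \text{since } \|x\|_1 \ge \|x\|_2 \text{ with equality only for 1-sparse } x. \\
\alpha < 1 &: \quad \|x\|_1 - \alpha\|x\|_2 \ \ge\ (1-\alpha)\|x\|_2 \ \ge\ \tfrac{1-\alpha}{\sqrt{2}} \ >\ 0
  \ \Longrightarrow\ T(\alpha) > 0. \\
\alpha = 1 &: \quad \|x\|_1 - \|x\|_2 \ \ge\ 0 \ \text{with equality at } (1,0)
  \ \Longrightarrow\ T(1) = 0. \\
\alpha > 1 &: \quad \text{at } x = (1,0),\ \ \|x\|_1 - \alpha\|x\|_2 = 1 - \alpha < 0
  \ \Longrightarrow\ T(\alpha) < 0.
\end{align*}
```

The sign of $T(\alpha)$ therefore changes exactly at $\alpha = \alpha^* = 1$, in agreement with parts (a) to (c).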
II-A Bisection Search
It follows from Proposition 1 that the optimal value of $L_1/L_2$ equals the value of $\alpha$ in the $L_1$-$\alpha L_2$ model if the objective value of $L_1$-$\alpha L_2$ is zero. That is to say, the optimal value of the ratio model is the root of $T(\alpha)$, which can be obtained by bisection search. Moreover, we have upper/lower bounds of $\alpha^*$, i.e., $1 \leq \alpha^* \leq \sqrt{n}$, since $\|x\|_2 \leq \|x\|_1 \leq \sqrt{n}\|x\|_2$ [34]. The procedure goes as follows: we start with an initial range of $\alpha$ to be $[1, \sqrt{n}]$ and an initial value of $\alpha$ in between. Then using this $\alpha$, we solve for the $L_1$-$\alpha L_2$ minimization via the difference-of-convex algorithm (DCA) [35]; more details on the DCA implementation will be given in Section II-B. Based on the objective value of $T(\alpha)$, we update the range of $\alpha$, denoted by $[lb, ub]$. Specifically, if $T(\alpha) = 0$, then we find the minimum ratio and the corresponding minimizer in the $L_1$-$\alpha L_2$ model is also the minimizer of the $L_1/L_2$ model. If $T(\alpha) > 0$, then we update the range as $[\alpha, ub]$. If $T(\alpha) < 0$, then the minimum ratio is smaller than $\alpha$, so we can shorten the range from $[lb, ub]$ to $[lb, \alpha]$. We can further shorten the interval to $[lb, \|x\|_1/\|x\|_2]$, where $x$ is the computed minimizer, as the objective value of $L_1$-$\alpha L_2$ would be less than or equal to zero in the next iteration. After the range is updated, we choose $\alpha$ as the middle point of the two end points and iterate.
We summarize the entire process as Algorithm 1, in which the stopping criterion is that the error between two adjacent $\alpha$ values is small enough. As the algorithmic scheme follows directly from bisection search, we refer to the algorithm as $L_1/L_2$-BS or BS if the context is clear. The convergence of BS can be obtained in the same way that the bisection method converges. However, due to the nonconvex nature of the $L_1$-$\alpha L_2$ minimization (6), there is no guarantee to find its global minimizer and hence the solution to (5) may be suboptimal.
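The bisection procedure described in this subsection can be sketched on a toy problem. The sketch rests on several assumptions of ours: the $2 \times 3$ system is hypothetical, the inner $L_1$-$\alpha L_2$ minimization is replaced by an exhaustive search over a grid of feasible points (a stand-in for the DCA solver, which is not reproduced here), and the names `solve_inner` and `bisection_search` are illustrative. The search range starts at $[1, \sqrt{n}]$ and is updated according to the sign of the inner optimal value, with the upper end shortened to the ratio of the computed minimizer.

```python
import numpy as np

# Toy system (hypothetical, m = 2, n = 3): the solutions of A x = b form the
# line x(t) = (1 - t, 2 - t, t), and none of them is 1-sparse.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

# Grid of feasible points; exhaustive search over it replaces the DCA solver.
t = np.linspace(-10.0, 10.0, 2001)
X = np.stack([1.0 - t, 2.0 - t, t], axis=1)
L1 = np.abs(X).sum(axis=1)
L2 = np.linalg.norm(X, axis=1)


def solve_inner(alpha):
    """Minimize ||x||_1 - alpha * ||x||_2 over the grid; return (x, value)."""
    values = L1 - alpha * L2
    i = int(np.argmin(values))
    return X[i], values[i]


def bisection_search(n, tol=1e-8, max_iter=200):
    lb, ub = 1.0, np.sqrt(n)              # bounds on the minimal ratio
    best_x, best_ratio = None, np.inf
    for _ in range(max_iter):
        if ub - lb <= tol:
            break
        alpha = 0.5 * (lb + ub)           # middle point of the current range
        x, value = solve_inner(alpha)
        ratio = np.abs(x).sum() / np.linalg.norm(x)
        if ratio < best_ratio:
            best_x, best_ratio = x, ratio
        if value > 0:                     # alpha lies below the minimal ratio
            lb = alpha
        elif value < 0:                   # alpha lies above it: shrink to the ratio
            ub = ratio
        else:                             # exact root of the inner optimal value
            return alpha, x
    return 0.5 * (lb + ub), best_x


alpha_hat, x_hat = bisection_search(n=3)
print(alpha_hat, x_hat)
```

On this toy line the smallest ratio is $3/\sqrt{5}$, attained at the 2-sparse points $(1, 2, 0)$ and $(-1, 0, 2)$. With a local inner solver such as DCA instead of exhaustive search, the same loop may stop at a suboptimal value, as noted in the text.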
II-B Adaptive Algorithms
The BS algorithm is computationally expensive, considering that the $L_1$-$\alpha L_2$ minimization is conducted multiple times. To speed it up, we discuss two variants of $L_1/L_2$-BS that update the parameter $\alpha$ iteratively while minimizing $L_1$-$\alpha L_2$.
Following the DCA framework [36, 37] to minimize , we consider the objective function as the difference of two convex functions, i.e., By linearizing the second term , the DCA iterates as follows,
[TABLE]
Particularly for the - model, we have
[TABLE]
thus leading to the DCA update as
[TABLE]
Now we update $\alpha$ iteratively by the ratio of the current solution, leading to the following scheme,
[TABLE]
where is defined in (8). Notice that the -subproblem in (10) is a linear programming (LP) problem, which unfortunately has no guarantee that the optimal solution exists (as the problem can be unbounded). To increase the robustness of the algorithm, we further incorporate a quadratic term into the linear problem, i.e.,
[TABLE]
We denote these two adaptive methods (10) and (11) as $L_1/L_2$-A1 and $L_1/L_2$-A2, respectively, or A1 and A2 for short. Both algorithms are summarized in Algorithm 2.
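For the subproblem of $L_1/L_2$-A1, we convert it into an LP problem. Assume that $x = x^+ - x^-$ where $x^+ \geq 0$ and $x^- \geq 0$. Denote $\bar{x} = [x^+; x^-]$, then $Ax = b$ becomes $\bar{A}\bar{x} = b$ with $\bar{A} = [A, -A]$. Therefore, the $x$-subproblem becomes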
[TABLE]
where . We adopt the software Gurobi [38] to solve this LP problem.
The subproblem of $L_1/L_2$-A2 is a quadratic programming problem, which can be solved via ADMM. By introducing an auxiliary variable $y$, we have the augmented Lagrangian,
[TABLE]
Then the ADMM iteration goes as follows
[TABLE]
where the subscript $j$ indexes the inner loop, as opposed to the superscript $k$ for outer iterations used in (11). The $x$-subproblem of (14) is a projection problem to minimize
[TABLE]
under the constraint of $Ax = b$. Since the closed-form solution of projecting a vector $v$ to this constraint is
$$\mathrm{proj}(v) = v - A^T(AA^T)^{-1}(Av - b), \tag{15}$$
the -update is given by
[TABLE]
The $y$-subproblem of (14) is equivalent to
[TABLE]
It has a closed-form solution via soft shrinkage, i.e.,
[TABLE]
with $\mathrm{shrink}(v, \mu) = \mathrm{sign}(v)\max\{|v| - \mu, 0\}$ applied componentwise.
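The adaptive scheme with a quadratic term can be sketched as follows, under assumptions that we state explicitly. We assume the outer subproblem has the form $\min_x \|x\|_1 - \alpha\langle x, x^{(k)}/\|x^{(k)}\|_2\rangle + \frac{\beta}{2}\|x - x^{(k)}\|_2^2$ subject to $Ax = b$, with $\alpha$ reset to the ratio of the current iterate. The inner solver is a scaled-form ADMM of our own arrangement that combines the projection onto $\{x : Ax = b\}$ with soft shrinkage. The symbols `beta` and `rho`, their default values, and the function name `a2_sketch` are hypothetical and may differ from the paper's Algorithm 2.

```python
import numpy as np


def soft_shrink(v, mu):
    """Componentwise soft shrinkage: sign(v) * max(|v| - mu, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)


def l1_over_l2(x):
    return np.abs(x).sum() / np.linalg.norm(x)


def a2_sketch(A, b, x0, beta=1.0, rho=10.0, outer=50, inner=200):
    """Adaptive scheme with a quadratic term; inner problem solved by ADMM."""
    AAt_inv = np.linalg.inv(A @ A.T)           # assumes A has full row rank

    def project(v):                            # projection onto {x : A x = b}
        return v - A.T @ (AAt_inv @ (A @ v - b))

    x = project(np.asarray(x0, dtype=float))
    for _ in range(outer):
        xk = x.copy()
        alpha = l1_over_l2(xk)                 # ratio of the current solution
        c = alpha * xk / np.linalg.norm(xk)    # gradient of alpha * ||x||_2 at xk
        y, u = xk.copy(), np.zeros_like(xk)    # auxiliary and scaled dual variables
        for _ in range(inner):
            # x-update: minimize -<x, c> + beta/2 ||x - xk||^2 + rho/2 ||x - y + u||^2
            # subject to A x = b, i.e. project the unconstrained minimizer.
            v = (c + beta * xk + rho * (y - u)) / (beta + rho)
            x = project(v)
            # y-update: soft shrinkage with threshold 1 / rho.
            y = soft_shrink(x + u, 1.0 / rho)
            u = u + x - y
    return x


rng = np.random.default_rng(0)
m, n, s = 10, 30, 3
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
b = A @ x_true
x_init = A.T @ np.linalg.solve(A @ A.T, b)     # minimum-norm feasible point
x_rec = a2_sketch(A, b, x_init)
print(l1_over_l2(x_init), l1_over_l2(x_rec))
```

The returned iterate comes from the projection step, so it satisfies $Ax = b$ up to rounding. Starting from the dense minimum-norm solution is only a convenience of this sketch; the experiments in the paper initialize nonconvex models differently.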
III Connections to previous works
We try to interpret the proposed adaptive methods (A1 and A2) in line with some existing approaches: parameter selection, generalized inverse power, and gradient-based methods. Our efforts contribute to convergence analysis in Section IV.
III-A Parameter Selection
Recall that in -BS, the ratio is minimized when there exists a proper such that with . We can regard this process as a root-finding problem for , which often occurs in parameter selection. For example, in the discrepancy principle method [39, 40, 41], one aims to find a parameter such that the resulting data-fitting term is close to the noise level. In particular, we represent this process by
[TABLE]
where is a general objective function to be minimized and is a certain scheme to update so that discrepancy principle holds. Typically, an inner loop is required to find the solution of -subproblem, followed by updating this parameter in an outer iteration. We further present the -th inner iteration at the -th outer iteration by
[TABLE]
for the -subproblem in (17).
To speed-up the process, Wen and Chan [40] proposed an adaptive scheme that updates the parameter during the inner loop such that it renders the current data-fitting term equal to the noise level. In other words, instead of updating after minimizing , they directly iterated
[TABLE]
in a way that satisfies the discrepancy principle. In this way, only one loop is needed as opposed to inner/outer loops in (18). But it requires a closed-form solution for so one can perform a one-dimensional search for .
The proposed BS scheme falls into the framework of (17) in that the searching range of parameter is shorten every outer iteration. However, in our BS method is the - minimization that does not have a closed-form solution. As opposed to (19), we consider to update
[TABLE]
prior to updating . In other word, we update based on rather than , the latter of which was adopted in the parameter-selection method [40]. The rationale of (20) is to guarantee that satisfies . The iterative scheme (20) is consistent with A1 or A2 (depending on the form of ), if we change the notation from subscript to superscript .
III-B Generalized Inverse Power Methods
A standard technique to find the smallest eigenvalue of a positive semi-definite symmetric matrix is the inverse power method [34] that requires to iteratively solve the linear system,
[TABLE]
The iteration converges to the smallest eigenvector of , denoted by . Then the smallest eigenvalue can be evaluated by , where is Rayleigh quotient defined as
[TABLE]
Note that (21) is equivalent to the minimization problem
[TABLE]
It is well known in linear algebra [34, 42] that eigenvectors of are critical points of and the smallest eigenvalue/eigenvector can be found by (22). This idea is naturally extended to the nonlinear case in [43], where a general quotient is considered, with arbitrary functions and . Similarly to (22), we have the corresponding scheme
[TABLE]
Following [43], we consider to update the eigenvalue at each iteration to guarantee the algorithm’s descent. In particular, the iterative scheme is given by
[TABLE]
If we choose and denote as , then the generalized inverse power method (23) is -A1. In [44], a modified inverse power method was proposed via the steepest descent flow. The iteration scheme is to incorporate a quadratic term in the objective function of the -subproblem, which leads to -A2.
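For reference, the classical inverse power iteration recalled at the start of this subsection can be sketched in a few lines. The matrix name `B`, the test matrix, and the iteration count are our own choices. Because each step solves a linear system with the matrix, the sketch assumes it is positive definite, which is slightly stronger than the positive semi-definiteness mentioned in the text.

```python
import numpy as np


def rayleigh_quotient(B, x):
    """Rayleigh quotient <x, B x> / <x, x>."""
    return float(x @ B @ x) / float(x @ x)


def inverse_power(B, x0, iters=200):
    """Inverse power iteration for the smallest eigenpair of B."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        x = np.linalg.solve(B, x)      # solve B x_new = x_old
        x = x / np.linalg.norm(x)      # normalize to keep the iterate well scaled
    return rayleigh_quotient(B, x), x


rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
B = Q @ np.diag([1.0, 2.0, 3.0, 4.0, 5.0]) @ Q.T   # symmetric positive definite
lam, vec = inverse_power(B, rng.standard_normal(5))
print(lam, np.linalg.eigvalsh(B)[0])
```

The adaptive schemes follow the same pattern in a nonlinear setting: a subproblem is solved with the current quotient value held fixed, and the quotient is then re-evaluated at the new iterate.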
III-C Gradient-based Methods
Definition 1.
A critical point of a constrained optimization problem is a vector in the feasible set (satisfying the constraints) that is also a local maximum, minimum, or saddle point of the objective function.
According to Karush-Kuhn-Tucker (KKT) conditions, is a critical point of (2) if and only if there exists a vector such that
[TABLE]
By introducing , we have
[TABLE]
The condition (25) is also an optimality condition to another optimization problem:
[TABLE]
where is from (8) and is some function satisfying
[TABLE]
Note that can not be explicitly determined from (27).
By applying a proximal gradient method (PGM) [45, 46, 47] on the model (26), we obtain the following scheme
[TABLE]
where This iterative scheme is the same as -A2.
As for -A1, we can interpret it as a generalized conditional gradient method [48] that minimizes by
IV Convergence analysis
Following the discussion in Section III-C, we present the convergence analysis. We start with the convergence of A2, which is characterized in Theorem 1. To prove it, we need four lemmas, whose proofs are given in the Appendix.
Lemma 1.
(Sufficient decrease) The sequence produced by $L_1/L_2$-A2 satisfies
[TABLE]
The next two lemmas (Lemma 2 and Lemma 3) discuss the Lipschitz properties.
Lemma 2.
Define . Then for any satisfying , we have
[TABLE]
Since the gradient of the $L_2$ norm is $\frac{x}{\|x\|_2}$, Lemma 2 implies that the gradient of the Euclidean norm is Lipschitz continuous in the domain $\{x \mid Ax = b\}$. The next lemma is about the Lipschitz property for the implicit function that satisfies (27).
Lemma 3.
Given defined in Lemma 2. For any satisfying , then
[TABLE]
for satisfying (27) and .
Lemma 4.
Given defined in (8) and suppose satisfies (27), we denote
[TABLE]
for an arbitrary . Then we have
- (a)
if and only if is a critical point of (2); 2. (b)
with for any satisfying .
It is stated in (28) that -A2 can be expressed as . By the definition of in (30) and the decreasing property of in Lemma 1, we can interpret A2 as a gradient descent method
[TABLE]
In the following theorem, we rely on Lemma 4 to show that the descent direction along leads to convergence.
Theorem 1.
Let $\{x^{(k)}\}$ be a sequence generated by $L_1/L_2$-A2. If $\{x^{(k)}\}$ is bounded, there exists a subsequence that converges to a critical point of the ratio model (2).
Proof.
According to Lemma 1, we know that is decreasing and bounded from below, so there exists a scalar such that . With the boundedness assumption of , we get from Lemma 1, which implies that . The boundedness of also leads to a convergent subsequence, i.e., Therefore, we have
[TABLE]
As , we get and hence By Lemma 4, converges to a critical point. ∎
Remark 1.
Theorem 1 does not require that the step size $1/\beta$ is small, which is typical for gradient-based methods. In our numerical tests, we can choose a small $\beta$ and get good results.
Theorem 2.
Let $\{x^{(k)}\}$ be a sequence generated by $L_1/L_2$-A1. If $\{x^{(k)}\}$ is bounded, it has a convergent subsequence.
Proof.
Denote
[TABLE]
Since by the definition of , the minimal value of subject to the constraint is less than or equal to zero. Specifically, As a result, by Cauchy-Schwarz inequality, we have
[TABLE]
which implies . Since , the decreasing sequence of converges, i.e., . By the boundedness of , it has a convergent subsequence, i.e, there exists a vector such that . ∎
Remark 2.
The sufficient decrease property (Lemma 1) does not hold for $\beta = 0$, when $L_1/L_2$-A2 reduces to A1. So, we cannot show that A1 converges to a critical point.
Remark 3.
According to Theorem 1 and Theorem 2, for both algorithms either the iterates diverge due to unboundedness or there exists a convergent subsequence. It is possible that the solution is unbounded. For example, if $A$ has a zero column, then the corresponding entry of $x$ can take $\pm\infty$ so that the ratio of $L_1$ and $L_2$ is minimized. In the numerical tests, we demonstrate empirically that $\{x^{(k)}\}$ is always bounded and hence convergent for general (random) matrices $A$.
V Numerical experiments
In this section, we compare the proposed algorithms with state-of-the-art methods in sparse recovery. All the numerical experiments are conducted on a desktop with CPU (Intel i7-6700, 3.4GHz) and
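We focus on the sparse recovery problem with highly coherent matrices, on which standard $L_1$ models fail. Following the works of [15, 49, 50], we consider an oversampled discrete cosine transform (DCT), defined as $A = [a_1, a_2, \dots, a_n] \in \mathbb{R}^{m\times n}$ with
$$a_j := \frac{1}{\sqrt{m}} \cos\left(\frac{2\pi w j}{F}\right), \quad j = 1, \dots, n,$$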
where $w$ is a random vector that is uniformly distributed in $[0, 1]^m$ and $F$ is a positive parameter to control the coherence in a way that a larger $F$ yields a more coherent matrix. Throughout the experiments, we consider over-sampled DCT matrices of size . The ground truth $x$ is simulated as an $s$-sparse signal, where $s$ is the number of nonzero entries. As suggested in [50], we require a minimum separation of at least $2F$ in the support of $x$. As for the values of non-zero elements, we follow the work of [51] to consider sparse signals with a high dynamic range. Define the dynamic range of a signal $x$ as $\Theta(x) = \frac{\max\{|x_s|\}}{\min\{|x_s|\}}$, where $x_s$ collects the nonzero entries of $x$, which can be controlled by an exponential factor $D$. In particular, we simulate $x_s$ by the following MATLAB command,
[TABLE]
In the experiments, we set and , corresponding to and , respectively. Note that randn and rand are the MATLAB commands for the Gaussian distribution $\mathcal{N}(0, 1)$ and the uniform distribution $\mathcal{U}(0, 1)$, respectively. To compare with our previous work [32] on the $L_1/L_2$ minimization, we also consider that the nonzero elements follow the Gaussian distribution, i.e.,
The fidelity of sparse signal recovery is assessed in terms of success rate, defined as the number of successful trials over the total number of trials. When the relative error between the ground truth $x$ and the reconstructed solution $x^*$, i.e., $\frac{\|x^* - x\|_2}{\|x\|_2}$, is less than , we declare it as a success. Moreover, we categorize the failure of not recovering the ground-truth signal as a model failure or an algorithm failure by comparing the objective function $F(\cdot)$ at the ground truth $x$ and at the restored solution $x^*$. If $F(x) > F(x^*)$, then $x$ is not a global minimizer of the model, so we regard it as a model failure. If $F(x) < F(x^*)$, then the algorithm does not reach a global minimizer. It is referred to as an algorithm failure. Similarly to success rates, we can define model-failure rates and algorithm-failure rates.
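The experimental setup described in this section can be sketched in NumPy as follows. Everything numeric here is an assumption on our part: the column formula $\cos(2\pi w j / F)/\sqrt{m}$ is the oversampled DCT form commonly used in the cited line of work, the nonzero values are drawn as a random sign times $10^{D \cdot u}$ with $u$ uniform on $[0, 1)$ as a Python analog of a MATLAB-style command, and the tolerance `tol=1e-3` and all function names are hypothetical placeholders.

```python
import numpy as np


def oversampled_dct(m, n, F, rng):
    """Oversampled DCT matrix with columns cos(2*pi*w*j/F)/sqrt(m), j = 1..n."""
    w = rng.random(m)                              # uniform on [0, 1)
    j = np.arange(1, n + 1)
    return np.cos(2.0 * np.pi * np.outer(w, j) / F) / np.sqrt(m)


def sparse_signal(n, s, D, min_sep, rng):
    """s-sparse signal with support gaps >= min_sep and dynamic range <= 10**D."""
    free = n - (s - 1) * (min_sep - 1)
    base = np.sort(rng.choice(free, size=s, replace=False))
    support = base + np.arange(s) * (min_sep - 1)  # enforces the minimum separation
    x = np.zeros(n)
    x[support] = np.sign(rng.standard_normal(s)) * 10.0 ** (D * rng.random(s))
    return x


def dynamic_range(x):
    mag = np.abs(x[x != 0])
    return mag.max() / mag.min()


def is_success(x_true, x_rec, tol=1e-3):
    """Success if the relative error is below tol (tol is a placeholder value)."""
    return np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true) < tol


def failure_type(x_true, x_rec, objective):
    """Label a failed trial by comparing objective values at both points."""
    return "model" if objective(x_true) > objective(x_rec) else "algorithm"


rng = np.random.default_rng(0)
F = 5
A = oversampled_dct(64, 1024, F, rng)              # sizes are illustrative only
x = sparse_signal(1024, 10, 3, 2 * F, rng)
b = A @ x
print(A.shape, np.count_nonzero(x), dynamic_range(x))
```

A recovery routine would then be run on `(A, b)` and its output passed to `is_success`, with `failure_type` applied to the failed trials using the model's objective, for example the ratio $\|x\|_1/\|x\|_2$.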
V-A Algorithmic Comparison
We present various computational aspects of the proposed algorithms, i.e., BS, A1, and A2, together with comparison to our previous ADMM approach [32]. First of all, we attempt to demonstrate the convergence of all proposed algorithms using an example of , (so the minimal separation is 30), and nonzero elements following Gaussian distribution. Since the ratio model is solved via the - model, we plot the values of and versus iteration counter in Figure 1. For -BS, we record the value at each outer iteration and the stopping conditions are either the maximum outer iteration reaches 10 or . For each iteration of A1, A2, and the inner loop of BS, the stopping criterions are the relative error The left plot in Figure 1 illustrates the convergence of the three algorithms in the sense that goes down. Both A1 and A2 are faster than BS as BS starts with a larger range of as , while A1 and A2 start with a good initial value of , which is very close to the final optimal value . The right plot in Figure 1 examines the evolution of , which gradually becomes stable and approaches to a similar value around 3.06 for all three algorithms. Figure 1 confirms the decrease property of proved in Lemma 1.
In Theorem 1, we require the sequence to be bounded for the convergence analysis. Here we aim at an empirical verification on the boundedness. In particular, we test on various kinds of linear systems with and sparsity ranging from 2 to 22. In each setting, we randomly generate 50 pairs of ground-truth signals and linear systems to compute the norm of solutions obtained by A1 and A2, along with the norm of ground-truth signals. The mean values of these norms are plotted in Figure 2. As the maximum values are finite numbers, it means that the reconstructed signal is always bounded. Figure 2 also shows that the norms of A1 and A2 align quite well with the ground truth when the sparsity is below 14, no matter the system is coherent or not. When the matrix is highly coherent with more nonzero elements, both A1 and A2 give much larger values of the norm compared to the ground truth. It is because a larger norm gives rise to a smaller value in the ratio of that we try to minimize. In any cases, the solutions of both A1 and A2 are shown to be bounded.
Next, we compare the three algorithms with our previous ADMM approach [32]. We consider and with nonzero elements following the Gaussian distribution or having high dynamic ranges. We randomly simulate 50 trials for each sparsity level and compute the average of success rates, algorithm-failure rates, and computation time. The Gaussian case is illustrated in Figure 3, showing that ADMM is the worst in terms of success rates partly due to high algorithm failure rates. Here, for ADMM and for A2. In addition, BS achieves the highest success rates but is the slowest. Both A1 and A2 have similar performance to BS with much reduced computation time. Figure 4 examines the case of the dynamic range for the non-zero values in with and . Here we set and for A2, while for ADMM. Similar performance is observed as the Gaussian case. In summary, we rate A1 as the most efficient algorithm for minimizing the ratio model with a balanced performance between accuracy and computational costs. We also observe that all the algorithms tend to give better performance in terms of success rates with higher dynamic ranges, which seems counter-intuitive. We will revisit this phenomenon in Section VI.
V-B Model Comparison
We intend to compare various sparsity-promoting models. Since the Gaussian case was conducted in our previous work [32], we focus on the dynamic range here. We compare the proposed $L_1/L_2$ model with the following models: $L_1$ [10], $L_p$ [11], $L_1$-$L_2$ [49, 15], and TL1 [18]. We adopt $L_1/L_2$-A1 to solve for the ratio model, as it is the most efficient algorithm from the discussion in Section V-A. The initial guess for all non-convex models is the $L_1$ solution obtained by Gurobi. We choose for $L_p$ and for TL1 when the range factor $D$ is known a priori.
Figure 5 plots the success rates of and . We observe that TL1 is the best except for the low coherence and the low dynamic case, where is the best. But is the worst in the other cases. The model is always the second best. Note that the ratio model is parameter-free, while the performance of TL1 largely relies on the parameter . Figure 6 examines the success rate of TL1 with different values of . We choose in the model comparison, which is almost the best among these testing values of . If no such prior information of the dynamic range were available to tune , the performance of TL1 might be worse than .
VI Discussions
Candès and Wakin [52] presented two principles in compressed sensing, i.e., sparsity and incoherence. We reported in our previous work [32] that higher coherence leads to better sparse recovery, which seems to contradict the current belief in CS. In this paper, we discuss the dynamic range and reveal its effect on the exact recovery via the $L_1$ approach. To the best of our knowledge, there has been little discussion on the dynamic range in the CS literature, except for [51]. We consider low-coherent matrices with and high-coherent ones with . We record the success rates of different combinations of sparsity levels () and dynamic ranges in Table I. It shows that a higher dynamic range leads to a better performance. It seems that the $L_1$ approach is independent of $D$ for relatively sparser signals.
Now that there are three quantities that may contribute to the success of sparse recovery, i.e., sparsity, coherence, and dynamic range, we try to give a comprehensive analysis by using the relative error instead of the success rates, as the latter depends on the success threshold. We plot in Figure 7 the mean and the standard deviation of the relative errors from 50 random trials versus coherence levels (). Based on Table I, we only consider the number of nonzero values larger than 18 and . In each subfigure of Figure 7, the curves decrease when $F$ increases, which means that higher coherence leads to better performance. This is consistent with the observation in [32]. As for the dynamic range, we discover in Figure 7 that a larger value of $D$ leads to a smaller relative error. Finally, the sparsity affects the performance in the way that smaller relative errors can be achieved for sparser signals. These numerical phenomena have not been reported in the CS literature, which motivates future theoretical justification.
VII Conclusions and future works
We studied the scale-invariant and parameter-free $L_1/L_2$ minimization to promote sparsity. We presented three numerical algorithms to minimize this nonconvex model based on the relationship between $L_1/L_2$ and $L_1$-$\alpha L_2$ for certain $\alpha$. Experimental results compared the proposed algorithms with state-of-the-art methods in sparse recovery. Particularly important is that the proposed algorithms work well when the ground-truth signal has a high dynamic range. Last but not least, we analyzed the behaviors of the $L_1$ approach towards the exact recovery with varying sparsity, coherence, and dynamic range. Future works include the theoretical analysis on the effect of the high dynamic range towards sparse recovery as well as the applications of the ratio model in image processing such as blind deconvolution [28, 29].
-A Proof of Lemma 1
Proof.
Based on the -subproblem in (11), we get
[TABLE]
After rearranging, we get the following inequality
[TABLE]
The second inequality is owing to the convexity of Euclidean norm and the definition of . Lemma 1 is then obtained by dividing on both sides of (33). ∎
-B Proof of Lemma 2
Proof.
Simple calculations lead to
[TABLE]
For any satisfying , the minimal norm is reached by projecting the origin onto the feasible set of It follows from the projection operator defined in (15) that
[TABLE]
Combining (34) and (35), we get Lemma 2. ∎
-C Proof of Lemma 3
Proof.
It is straightforward to have
[TABLE]
We simplify the first term in (36) by calculating
[TABLE]
and using . Therefore, we get
[TABLE]
As for the second term in (36), we have it bounded by
[TABLE]
Combining (37) and (38), we obtain (29). ∎
-D Proof of Lemma 4
Proof.
It is straightforward that
[TABLE]
By the optimality condition [47], the latter relation holds if and only if there exists a vector such that
[TABLE]
which implies that is a critical point of (26). It follows from (28) that (26) is equivalent to (2) and hence is also a critical point of (2). According to the nonexpansiveness of the proximal operator and the Lipschitz continuousness of , we have
[TABLE]
The Lemma follows. ∎
- [1] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. R. Stat. Soc. Series B, vol. 58, no. 1, pp. 267–288, 1996.
- [2] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D, vol. 60, no. 1-4, pp. 259–268, 1992.
- [3] A. Berman and R. J. Plemmons, Nonnegative matrices in the mathematical sciences. SIAM, 1994.
- [4] D. L. Donoho et al., “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
- [5] E. J. Candès, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Comm. Pure Appl. Math, vol. 59, no. 8, pp. 1207–1223, 2006.
- [6] B. K. Natarajan, “Sparse approximate solutions to linear systems,” SIAM J. Comput., vol. 24, no. 2, pp. 227–234, 1995.
- [7] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, “Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition,” in Asilomar Conf. Signals, Systems and Computers. IEEE, 1993, pp. 40–44.
- [8] S. Chen, S. A. Billings, and W. Luo, “Orthogonal least squares methods and their application to non-linear system identification,” Int. J. Control, vol. 50, no. 5, pp. 1873–1896, 1989.
