Sparse Solutions of a Class of Constrained Optimization Problems
Lei Yang, Xiaojun Chen, Shuhuang Xiang

TL;DR
This paper investigates properties of solutions to a class of sparse optimization problems involving nonconvex and non-Lipschitz objectives, providing bounds, solution set characterizations, and an algorithm with convergence guarantees.
Contribution
It offers new theoretical insights into the structure of solutions for sparse optimization with nonconvex penalties and proposes an effective smoothing penalty method for solving such problems.
Findings
Optimal solutions are on the boundary of the feasible set when 0<p<1.
The solution set for 0<p<1 is finite for q in {1,∞}.
The proposed smoothing penalty method converges to a KKT point under mild conditions.
\NatBibNumeric\TheoremsNumberedBySection\EquationsNumberedBySection
\TITLE
Sparse Solutions of a Class of Constrained Optimization Problems
\ARTICLEAUTHORS\AUTHOR
Lei Yang \AFF Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076. ([email protected]) \AUTHOR Xiaojun Chen \AFF Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China. ([email protected]) \AUTHOR Shuhuang Xiang \AFF School of Mathematics and Statistics, INP-LAMA, Central South University, Changsha, Hunan 410083, China. ([email protected])
\ABSTRACT
In this paper, we consider a well-known sparse optimization problem that aims to find a sparse solution of a possibly noisy underdetermined system of linear equations. Mathematically, it can be modeled in a unified manner by minimizing $\|\bm{x}\|_p^p$ subject to $\|A\bm{x}-\bm{b}\|_q\leq\sigma$ for given $A\in\mathbb{R}^{m\times n}$, $\bm{b}\in\mathbb{R}^m$, $\sigma\geq0$, $0\leq p\leq1$ and $q\geq1$. We then study various properties of the optimal solutions of this problem. Specifically, without any condition on the matrix $A$, we provide upper bounds in cardinality and infinity norm for the optimal solutions, and show that all optimal solutions must be on the boundary of the feasible set when $0<p<1$. Moreover, for $q\in\{1,\infty\}$, we show that the problem with $0<p<1$ has a finite number of optimal solutions and prove that there exists $0<p^*<1$ such that the solution set of the problem with any $0<p<p^*$ is contained in the solution set of the problem with $p=0$, and there further exists a threshold in $(0,p^*)$ below which the solution set of the problem remains unchanged. An estimation of such $p^*$ is also provided. In addition, to solve the constrained nonconvex non-Lipschitz $L_p$-$L_1$ problem ($0<p<1$ and $q=1$), we propose a smoothing penalty method and show that, under some mild conditions, any cluster point of the sequence generated is a stationary point of our problem. Some numerical examples are given to implicitly illustrate the theoretical results and show the efficiency of the proposed algorithm for the constrained $L_p$-$L_1$ problem under different types of noise.
\KEYWORDS
Sparse optimization; nonconvex non-Lipschitz optimization; cardinality minimization; penalty method; smoothing approximation. \MSCCLASSPrimary: 90C26, 90C30; secondary: 65K05 \ORMSCLASSPrimary: mathematics, systems solution; secondary: programming, algorithms
1 Introduction
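In this paper, we consider a class of sparse optimization problems, which can be modeled in a unified manner as the following constrained $L_p$-$L_q$ problem:
\[
\min_{\bm{x}\in\mathbb{R}^n}\ \|\bm{x}\|_p^p \quad \mathrm{s.t.} \quad \|A\bm{x}-\bm{b}\|_q\leq\sigma, \tag{1}
\]
where $A\in\mathbb{R}^{m\times n}$, $\bm{b}\in\mathbb{R}^m$, $\sigma\geq0$, $0\leq p\leq1$ and $1\leq q\leq\infty$ are given. We assume that the feasible set of problem (1) is nonempty so that problem (1) is well-defined. With this assumption, one can easily verify that an optimal solution for $p=0$ (namely, a sparsest solution) exists thanks to the discrete and discontinuous nature of $\|\bm{x}\|_0$ and the closedness of the feasible set. Moreover, for $0<p\leq1$, since $\|\bm{x}\|_p^p$ is level-bounded, an optimal solution exists (see [33, Theorem 1.9]). Therefore, the optimal solution set of problem (1), denoted by $\mathrm{SOL}(A,\bm{b},\sigma,p,q)$, is nonempty for any $0\leq p\leq1$ and $1\leq q\leq\infty$. We also assume that $\|\bm{b}\|_q>\sigma$, so that $\bm{0}$ is not feasible and every optimal solution is nonzero. Obviously, when $p=1$, (1) is a convex optimization problem, and when $0<p<1$, (1) is a nonconvex and non-Lipschitz optimization problem.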
Problem (1) aims to find a sparse vector $\bm{x}$ from the corrupted observation $\bm{b}=A\bm{x}+\bm{e}$, where $\bm{e}$ denotes an unknown noise vector bounded by $\sigma$ (the noise level) in the $L_q$-norm, i.e., $\|\bm{e}\|_q\leq\sigma$. This problem arises in many contemporary applications and has been widely studied under different choices of $p$, $q$ and $\sigma$ in the literature; see, for example, [3, 4, 5, 6, 7, 8, 12, 13, 15, 16, 17, 18, 20, 23, 31, 34, 35, 41, 42, 43]. Among these studies, the $L_2$-norm is commonly used for measuring the noise and leads to a mathematically tractable problem when the noise is present and comes from a Gaussian distribution [3, 5, 12, 13, 17, 20, 34, 35]. In particular, it is known that a sparse vector can be (approximately) recovered by the solution of the convex optimization problem (1) with $p=1$ and $q=2$ under some well-known recovery conditions such as the restricted isometry property (RIP) [5], the mutual coherence condition [3, 17] and the null space property (NSP) [15, 41]. Such a convex constrained $L_1$-$L_2$ problem can also be solved efficiently by a spectral projected gradient minimization algorithm (SPGL1) proposed by Van den Berg and Friedlander [35]. On the other hand, it is natural to find a sparse vector by solving problem (1) with $0<p<1$ since $\|\bm{x}\|_p^p$ approaches $\|\bm{x}\|_0$ as $p\to0$. Indeed, under certain RIP conditions, Foucart and Lai [20] showed that a sparse vector can be (approximately) recovered by the solution of the nonconvex non-Lipschitz problem (1) with $0<p<1$ and $q=2$. Chen, Lu and Pong [12] also proposed a penalty method for solving this constrained $L_p$-$L_2$ problem ($0<p<1$) with promising numerical performance. Later, this penalty method and SPGL1 were combined to solve (1) with $0<p<1$ and $q=2$ for recovering sparse signals on the sphere in [13]. However, when the noise does not come from a Gaussian distribution but from a heavy-tailed distribution (e.g., Student’s t-distribution) or contains outliers, using the $L_2$-norm as the data fitting term is no longer appropriate.
In this case, some robust loss functions such as the -norm [19, 36, 37] and the -norm [4, 7] are used to develop robust models. Recently, Zhao, Jiang and Luo [43] also established a fairly comprehensive weak stability theory for problem (1) with and under a so-called weak range space property (RSP) condition. The weak RSP condition can be induced by several existing compressed sensing matrix properties and hence can be the mildest one for the sparse solution recovery. However, it is still not easy to verify this condition in practice.
In this paper, we focus on problem (1) with different choices of and , and establish the following theoretical results concerning its optimal solutions without any condition on the sensing matrix .
- (i)
For any with and , we have and
[TABLE]
where , denotes its cardinality, and and are the largest and smallest eigenvalues of , respectively. Moreover, for any , for ; and with some for .
- (ii)
For , the solution set SOL with has a finite number of elements.
- (iii)
There exists a such that for any . An explicit estimation of such is also given. Moreover, there exists a such that for any .
Here, we would like to point out that the sparse solution recovery result (iii) is developed without any aforementioned recovery condition on . This not only complements the existing recovery results in the literature, but also shows the potential advantage of using the -norm () for recovering the sparse solution over the -norm ball.
Note that problem (1) is a constrained problem, while, in statistics and computer science, the - problem/minimization often refers to the following unconstrained regularized problem [9, 11, 14]:
[TABLE]
where is a positive regularization parameter. Indeed, when and , problem (2) is the well-known -regularized least-squares problem (namely, the LASSO problem) and it is known that, in this case, there exists a such that, for , the constrained problem (1) is equivalent to the unconstrained problem (2) regarding solutions; see, for example, [3, Section 3.2.3]. However, Example 3.1 in [12] shows that for and , there does not exist a so that problems (1) and (2) have a common global or local minimizer. Hence, for , one cannot expect to solve (1) by solving the regularized problem (2) with some fixed . In view of this, we shall consider a penalty method for solving problem (1) with , which basically solves the constrained problem (1) by solving a sequence of unconstrained penalty problems. Specifically, we consider the following penalty problem of (1):
[TABLE]
Note that the function is continuously differentiable for . Then, based on problem (3), one can readily extend the penalty method proposed in [12] for solving problem (1) with and to solve problem (1) with and . However, for , since the function is nonsmooth, then the approach in [12] cannot be adapted directly. In view of this, in this paper, we propose an alternative smoothing penalty method for solving
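\[
\min_{\bm{x}\in\mathbb{R}^n}\ \|\bm{x}\|_p^p \quad \mathrm{s.t.} \quad \|A\bm{x}-\bm{b}\|_1\leq\sigma, \tag{4}
\]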
where . Notice that we omit the case of to save space in this paper. Nevertheless, our approach can be extended without much difficulty to solve problem (1) with and , because the -constrained problem and the -constrained problem have similar properties in the sense that both constraints and can be represented as linear constraints, and the functions and are piecewise linear. We shall show that problem (3) with is the exact penalty problem of problem (4) regarding local minimizers and global minimizers. We also prove that any cluster point of a sequence generated by our smoothing penalty method is a stationary point of problem (4). Moreover, some numerical results are reported to show that all computed stationary points have the properties in our theoretical contribution (i) mentioned above. Here, we would like to emphasize that finding a global optimal solution of (4) is NP-hard [11, 21]. Thus, it is interesting to see that our smoothing penalty method can efficiently find a ‘good’ stationary point of problem (4), which has important properties of a global optimal solution of problem (4).
The rest of this paper is organized as follows. In Section 2, we rigorously prove properties (i)-(iii) listed above and give a concrete example to verify these properties. In Section 3, we present a smoothing penalty method for solving problem (4) and show some convergence results. Some numerical results are presented in Section 4, with some concluding remarks given in Section 5.
Notation and Preliminaries
In this paper, we use the convention that . For an index set , let denote its cardinality and denote its complement. We denote by the subvector formed from a vector by picking the entries indexed by and denote by the submatrix formed from a matrix by picking the columns indexed by . Recall from [33, Definition 8.3] that, for a proper closed function , the regular (or Fréchet) subdifferential, the (limiting) subdifferential and the horizon subdifferential of at are defined respectively as
[TABLE]
It can be observed from the above definitions (or see [33, Proposition 8.7]) that
[TABLE]
When is convex, the above (limiting) subdifferential coincides with the classical subdifferential in convex analysis [33, Proposition 8.12]. Moreover, if is continuously differentiable, we have , where is the gradient of at [33, Exercise 8.8(b)]. For a closed set , its indicator function is defined by if and otherwise. In addition, we use to denote the closed ball of radius centered at , i.e., , and to denote the feasible set of problem (1).
2 Properties of solutions of problem (1)
In this section, we characterize the properties of the optimal solutions of problem (1) with different choices of and . We first give a supporting lemma.
Lemma 2.1
Let . For any , we have
[TABLE]
Proof. We consider the following two cases.
- •
. In this case, it is easy to see that . On the other hand, since , it then follows from Hölder’s inequality that
[TABLE]
which results in .
- •
. In this case, it is easy to see that . On the other hand, since , it then follows from Hölder’s inequality that
[TABLE]
which results in .
Combining the above results proves the lemma.
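As a numerical sanity check of this type of norm comparison, the following Python sketch tests the textbook two-sided bounds between $\|\bm{x}\|_q$ and $\|\bm{x}\|_2$ that follow from Hölder's inequality. It is only an illustration under stated assumptions: the exact constants of Lemma 2.1 are not reproduced here, and the helper names `q_norm`, `norm_bounds_hold` and `check_random` are hypothetical.

```python
import random


def q_norm(x, q):
    """Return the l_q norm of x for 1 <= q <= inf (use float('inf') for q = inf)."""
    if q == float("inf"):
        return max(abs(t) for t in x)
    return sum(abs(t) ** q for t in x) ** (1.0 / q)


def norm_bounds_hold(x, q, tol=1e-9):
    """Check the standard comparison between ||x||_q and ||x||_2 for x in R^m.

    For 1 <= q <= 2:  ||x||_2 <= ||x||_q <= m^(1/q - 1/2) ||x||_2.
    For q >= 2:       m^(1/q - 1/2) ||x||_2 <= ||x||_q <= ||x||_2.
    """
    m = len(x)
    two = q_norm(x, 2)
    nq = q_norm(x, q)
    inv_q = 0.0 if q == float("inf") else 1.0 / q  # convention 1/inf = 0
    scale = m ** (inv_q - 0.5)
    if q <= 2:
        return two <= nq + tol and nq <= scale * two + tol
    return scale * two <= nq + tol and nq <= two + tol


def check_random(trials=200, m=6, seed=0):
    """Test the bounds on random vectors for several values of q."""
    rng = random.Random(seed)
    for q in (1, 1.5, 2, 3, float("inf")):
        for _ in range(trials):
            x = [rng.uniform(-5, 5) for _ in range(m)]
            if not norm_bounds_hold(x, q):
                return False
    return True
```

The case split on $q\leq2$ versus $q\geq2$ mirrors the two cases treated in the proof above.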
The following theorem is given for and .
Theorem 2.2
Let . For any , the following statements hold with .
- (i)
For , ; and for , there is a scalar such that and for any .
- (ii)
For , .
- (iii)
For ,
[TABLE]
where and are the largest and smallest eigenvalues of , respectively. Moreover, when , we have .
Proof. Statement (i). If , the results hold trivially. Next, we assume that .
Consider . From , we see that . Then, it is easy to verify that there exists a constant such that . Thus, , but for . This leads to a contradiction. Hence, we have .
Consider . Let . Then, from the continuity of , and , there exists a scalar such that . Moreover, it is easy to verify that is convex on . Thus, for any , there exists a such that and . Hence, is feasible. This together with shows that for any .
Statement (ii). Let for simplicity. We then consider the following two cases.
Case 1, . First, it is not hard to see that since any set of vectors in is linearly dependent. Thus, we have . We next prove by contradiction. Assume that . Then, there exists a vector such that and . Let be a vector such that and . Thus, we have . Now, let
[TABLE]
Then, we see that since . Moreover, from the definition of , one can verify that and thus . This leads to a contradiction. Hence, we only have .
Case 2, . We first prove by contradiction. Assume that . Thus, there exists a vector such that and , since . Let be a vector such that and . Thus, we have that and hence for any . Moreover, we can choose a sufficiently small real positive number such that, for all ,
[TABLE]
Let . Then, we have
[TABLE]
where the third equality follows because and the last equality follows from (6). However, for all ,
[TABLE]
This leads to a contradiction. Hence, we have and . We further assume that . Then, there also exists a vector such that and . Using arguments similar to those above, we again obtain a contradiction. Hence, we only have that .
Statement (iii). From statement (ii), has full column rank and hence . Then, we see that
[TABLE]
where the second inequality follows from Lemma 2.1 and the last inequality follows from . Thus, the above relation implies that
[TABLE]
which gives the upper bound for . On the other hand, we have
[TABLE]
where the third inequality follows from Lemma 2.1 and the last inequality follows from . This results in
[TABLE]
which gives the lower bound for . Recall that (our blanket assumption). Thus, this lower bound is nontrivial. Moreover, when , we have and hence . We then complete the proof.
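The scaling idea behind statement (i) can be illustrated numerically: if a nonzero point is strictly feasible, a slightly shrunken copy remains feasible and has a strictly smaller objective, so it cannot be optimal. The Python sketch below is a toy illustration only; the function names and the data (`A`, `b`, `sigma`) are hypothetical and are not taken from the paper.

```python
def lp_objective(x, p):
    """Return ||x||_p^p for 0 < p <= 1 (zero entries contribute nothing)."""
    return sum(abs(t) ** p for t in x if t != 0.0)


def residual_norm(A, x, b, q):
    """Return ||Ax - b||_q for a dense matrix A given as a list of rows."""
    r = [sum(a * t for a, t in zip(row, x)) - bi for row, bi in zip(A, b)]
    if q == float("inf"):
        return max(abs(t) for t in r)
    return sum(abs(t) ** q for t in r) ** (1.0 / q)


def shrink_inside(A, b, sigma, x, q, max_steps=50):
    """Find c in (0, 1) with c*x feasible, assuming ||Ax - b||_q < sigma.

    The candidates c = 1/2, 3/4, 7/8, ... approach 1, so continuity of the
    residual in c guarantees that one of them is eventually feasible.
    """
    if residual_norm(A, x, b, q) >= sigma:
        raise ValueError("x must be strictly feasible")
    for k in range(1, max_steps + 1):
        c = 1.0 - 0.5 ** k
        if residual_norm(A, [c * t for t in x], b, q) <= sigma:
            return c
    raise RuntimeError("no feasible scaling found within max_steps")


# Toy usage: x = (0, 1) solves x1 + 2*x2 = 2 exactly, hence is strictly
# feasible for sigma = 0.5; shrinking it lowers the objective.
A, b, sigma = [[1.0, 2.0]], [2.0], 0.5
x = [0.0, 1.0]
c = shrink_inside(A, b, sigma, x, q=1)
y = [c * t for t in x]
```

In this toy instance the shrunken point `y` lands exactly on the boundary of the feasible set, which is where the theorem places all optimal solutions for positive $p$.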
Remark 2.3 (The sparse solution of the - problem)
Theorem 2.2(ii) implies that without any condition on the sensing matrix , for any with and , while Shen and Mousavi show in [34, Proposition 3.1] that for any with and if every submatrix of is invertible. Combining these results gives a formal confirmation that if , all solutions of the - problem with are sparse, but the - problem with may not have sparse solutions.
In the following, we shall derive more theoretical results for the optimal solution set of the -constrained problem (4) with . But we should point out that all results established later can be extended without much difficulty to the -constrained case or other more general cases; see Remarks 2.6 and 2.10 for more details. As we shall see later, solving problem (4) with an arbitrarily sufficiently small actually gives an optimal solution of problem (4) with . This nice result is obtained based on a simple observation that the feasible set is indeed a convex polyhedron in (see Lemma 8.1). Moreover, observe that can be represented as a union of orthants, denoted by for , such that any two vectors and in each have the same sign for each entry, i.e., for each , we have
[TABLE]
For example, when , we have , where , , and . Then, for each , one can see that is empty or a polyhedron that has a finite number of extreme points because contains no lines; see [32, Corollary 18.5.3] and [32, Corollary 19.1.1].
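The orthant decomposition can be made concrete with a short Python sketch that enumerates the $2^n$ sign patterns and tests membership. It assumes closed orthants (so a point with a zero entry belongs to several of them); the names `orthant_sign_patterns`, `in_orthant` and `orthants_containing` are hypothetical.

```python
from itertools import product


def orthant_sign_patterns(n):
    """Return all 2^n sign patterns, one per orthant of R^n."""
    return list(product((1, -1), repeat=n))


def in_orthant(x, signs):
    """Check whether x lies in the closed orthant described by signs."""
    return all(s * t >= 0 for s, t in zip(signs, x))


def orthants_containing(x):
    """List the sign patterns of all closed orthants that contain x."""
    return [s for s in orthant_sign_patterns(len(x)) if in_orthant(x, s)]
```

For instance, `orthant_sign_patterns(2)` returns four patterns, matching the four pieces in the two-dimensional example above.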
Lemma 2.4
Let . Suppose that is an arbitrary index such that , where is defined in (7). Then, any optimal solution of the following problem
[TABLE]
is an extreme point of .
Proof. Let be an optimal solution of (8). Suppose that there exist such that for some . Then, we have
[TABLE]
where the third equality follows because any have the same sign for each entry, the first inequality follows because is strictly concave for , and the last inequality follows because is an optimal solution of (8). Note that the above relation holds if and only if . This implies that is an extreme point of .
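The concavity argument in this proof can be checked numerically. For two distinct points in the same orthant and $0<p<1$, the objective at a strict convex combination exceeds the same combination of the objectives, so a minimizer cannot be such a combination. The sketch below is illustrative only, and the helper names are hypothetical.

```python
def lp_pp(x, p):
    """Return ||x||_p^p for 0 < p < 1."""
    return sum(abs(t) ** p for t in x)


def concavity_gap(y, z, alpha, p):
    """Return f(alpha*y + (1-alpha)*z) - (alpha*f(y) + (1-alpha)*f(z)), f = ||.||_p^p.

    When y and z lie in the same closed orthant, the gap is nonnegative,
    and it is strictly positive if y != z and 0 < alpha < 1.
    """
    mid = [alpha * a + (1.0 - alpha) * c for a, c in zip(y, z)]
    return lp_pp(mid, p) - (alpha * lp_pp(y, p) + (1.0 - alpha) * lp_pp(z, p))
```

For example, with `y = [4.0, 1.0]`, `z = [1.0, 4.0]`, `alpha = 0.5` and `p = 0.5`, the gap is positive, while it vanishes when `y` and `z` coincide.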
Based on Lemma 2.4, we are able to characterize the number of the optimal solutions of problem (4) with . For notational simplicity, for , let
[TABLE]
Proposition 2.5
For any , the optimal solution set of problem (4) is a finite set. Moreover, the set is a finite set.
Proof. For a given , let be an optimal solution of problem (4), i.e., . Then, there must exist a such that and is also an optimal solution of (8) with in place of . Then, it follows from Lemma 2.4 that is an extreme point of . This implies that
[TABLE]
Note that, for each , is empty or a polyhedron that has a finite number of extreme points since contains no lines; see [32, Corollary 18.5.3] and [32, Corollary 19.1.1]. This together with (9) implies that is a finite set.
Moreover, since (9) holds for any , then we have
[TABLE]
which implies is a finite set. This completes the proof.
Remark 2.6 (Comments on Proposition 2.5)
Proposition 2.5 is obtained based on the observation that the feasible set is a convex polyhedron in . From this observation, we can extend Proposition 2.5 to that for any , the optimal solution set of (1) with is a finite set. However, it is not clear whether for any , the optimal solution set of problem (1) with is a finite set. Thanks to Theorem 2.2, we can claim that if satisfies , the optimal solution set is a finite set, where is a positive integer. Indeed, in this case, by Theorem 2.2(ii), any optimal solution satisfies that and hence has at most two nonzero entries supported on . Then, there are only different choices of the support set . Let be the optimal objective value and, without loss of generality, assume that , , . Then, . Also, let and . We then see from Theorem 2.2(i) that and this equation can be further written as a -th order polynomial equation , which has at most real roots. This implies that, for each satisfying , there are only different choices of and . Hence, the optimal solution set is a finite set and the number of solutions is at most .
We next give two supporting lemmas and relegate the proofs to Appendices 6 and 7, respectively.
Lemma 2.7
Suppose that and satisfy
[TABLE]
then .
Lemma 2.8
Given , with . Let and be the nonzero entries in and , respectively, and, without loss of generality, assume that and . For , define
[TABLE]
Then, the following statements hold.
- (i)
If for all , then holds for any .
- (ii)
Otherwise, there exists a sufficiently small such that either or holds for any .
Now, we are ready to present our results concerning the optimal solution set with different choices of .
Theorem 2.9
There exists a such that for any . Moreover, there exists a such that for any .
Proof. We prove the first result by contradiction. Assume that there does not exist a number such that, for any , . Consider a sequence with and as . Thus, from the hypothesis, for each , there exists a point such that and . Now, we consider the sequence . Note that all elements in come from the set but they are not contained in . Since there are only finitely many points in (by Proposition 2.5), then there exists at least one point such that contains infinitely many , i.e., there exists a subsequence so that for all . Moreover, let . Then, for all , we have since . Then, we see that
[TABLE]
which implies that . This leads to a contradiction and completes the proof for the first result.
Next, we prove the second result. For notational simplicity, let and , where . For any , we have (by the first result) and define a set as $\mathcal{C}(\bm{x}):=\big\{\bm{z}\in\mathcal{S}_{0\sim p^{*}}:\Delta_{k}(\bm{x},\bm{z})=0,\ \forall\,k=1,\cdots,s\big\}$, where is defined in (10). Then, given and , it follows from Lemma 2.8(ii) that there exists a sufficiently small such that either or holds for any . Since is contained in , the number of such pairs is finite. Therefore, we must have a sufficiently small such that, for any and , either or holds for any . Now, for such , consider any and let . We must have for any . This together with Lemma 2.8(ii) implies that for any , for any . Moreover, from Lemma 2.8(i), for any , for any . These two facts show that for any , for any . Hence, we have for any . Since is arbitrary and is also arbitrary, we can conclude that for any .
We now prove by contradiction that there must exist a such that for any . Assume this is not true. Then, for any , there exists a such that . This together with the conclusion obtained above implies that must be strictly contained in , i.e., . With this fact, we generate a sequence as follows. Let . Then, there exists a such that . For such , there exists a such that . Repeating this procedure, we can obtain a sequence such that and . Thus, along such sequence , the number of elements of will strictly increase and hence must have infinitely many elements. This leads to a contradiction and completes the proof.
Remark 2.10 (Comments on Theorem 2.9)
Theorem 2.9 is established based on the observation that the feasible set of problem (4) is a polyhedron, and then, for each , has at most a finite number of extreme points. Thus, one can also consider minimizing under many other polyhedral constraints, for example, and with , and , to fit different scenarios in practice. Following the similar arguments presented in this paper, one can obtain the similar results in Theorem 2.9 as well as Theorem 2.12 under these polyhedral constraints. Moreover, it is also possible to extend our smoothing penalty method presented in the next section to solve problems in these cases. Here, we will omit more details to avoid overcomplicating the presentation. In addition, we are aware that the first result in Theorem 2.9 has also been discussed in [40]. However, the analysis there is much more tedious.
Based on Theorem 2.9, it is easy to give the following corollary for (namely, the noiseless case), which has also been discussed in [31, Theorem 1].
Corollary 2.11
There exists a such that, for any , every optimal solution of problem $\min\big\{\|\bm{x}\|_{p}^{p}:A\bm{x}=\bm{b}\big\}$ is an optimal solution of problem $\min\big\{\|\bm{x}\|_{0}:A\bm{x}=\bm{b}\big\}$.
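A tiny noiseless instance illustrates the corollary. The Python sketch below minimizes $\|\bm{x}\|_p^p$ over a grid on the line $x_1+2x_2=2$ and compares the minimizer with the sparsest solutions. The instance, the grid search and all names are hypothetical and serve only as an illustration; a grid search is not the method analyzed in this paper.

```python
def lp_pp(x, p):
    """Return ||x||_p^p for 0 < p < 1 (zero entries contribute nothing)."""
    return sum(abs(t) ** p for t in x if t != 0.0)


def cardinality(x, tol=1e-12):
    """Count the entries of x whose magnitude exceeds tol."""
    return sum(1 for t in x if abs(t) > tol)


def grid_minimizer(p, lo=-200, hi=300, scale=100):
    """Minimize ||x||_p^p over grid points of the line x1 + 2*x2 = 2.

    The line is parametrized by x2 = t, x1 = 2 - 2t with t = k / scale.
    """
    best, best_val = None, float("inf")
    for k in range(lo, hi + 1):
        t = k / scale
        x = (2.0 - 2.0 * t, t)
        val = lp_pp(x, p)
        if val < best_val:
            best, best_val = x, val
    return best, best_val
```

Every solution of this system has at least one nonzero entry, and for each tested $p$ the grid minimizer is the one-sparse point $(0,1)$, so it is also a sparsest solution.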
Theorem 2.9 says that there exists a such that solving problem (4) with any also solves problem (4) with . Therefore, the constant is obviously the key for such nice relation and we are interested in estimating such in the next theorem. Our analysis is motivated by that of [31, Theorem 1], but makes use of results developed in Theorem 2.2 and Lemma 2.4 for the more general feasible set. Before proceeding, we define two constants as follows:
[TABLE]
Note that for any subset such that has full column rank, is a principal submatrix of . Then, it follows from [27, Theorem 1.4.10] that is an eigenvalue of and hence . This together with Theorem 2.2(iii) implies that
[TABLE]
From (12), (13) and Lemma 2.4, one can also see that .
Theorem 2.12
Let be the optimal objective value of problem (4) with and
[TABLE]
Then, for any , .
Proof. First, we show that
[TABLE]
holds for any . Since and (15) holds trivially when , then we only consider in the following two cases.
- •
. In this case, . Since , then for any .
- •
. In this case, . Since , then for any .
Hence, (15) holds for any .
Next, let be an arbitrary optimal solution of problem (4) with , i.e., . It then follows from Lemma 2.4 that is an extreme point of for some . Thus, we have for any . Moreover, we see that
[TABLE]
where the second inequality follows because for any , the function is non-decreasing on , the equality (i) follows from (13), the third inequality follows because for any , the function is non-increasing on , the equality (ii) follows again from (13), and the last inequality follows from (15). Then, from the above relation, we have that and hence is an optimal solution of problem (4) with . This completes the proof.
Before closing this section, we present a simple example to illustrate our previous theoretical results.
Example 2.13
Let , and . Then, we consider
[TABLE]
with and . Next, for each , we discuss the optimal solution sets of problem (16) with different choices of .
For , the feasible set of (16) is
[TABLE]
Then,
[TABLE]
For , the feasible set of (16) is
[TABLE]
Then,
[TABLE]
For , the feasible set of (16) is
[TABLE]
Then,
[TABLE]
From this example, one can easily see that every optimal solution of (16) is on the boundary of the feasible set for and there is a such that is on the boundary of the feasible set for , as claimed in Theorem 2.2(i). Moreover, every optimal solution of (16) with is exactly a sparsest solution over for , while an optimal solution of (16) with may not be a sparsest one. This shows the potential advantage of using the -norm () to approximate the -norm. In particular, when , one can further estimate by (14) for this example. Indeed, it is easy to see that . Then, from (11), we compute that . Moreover, one can verify that
[TABLE]
Thus, it follows from (12) that . Now, using (14), since , we have
[TABLE]
Recalling Theorem 2.12, we know that every optimal solution of (16) with shall be an optimal solution of (16) with . This is clearly evident in (17). In fact, for this example, every optimal solution of (16) with is an optimal solution of (16) with . This shows that given in (14) may not be the optimal upper bound of such that for any . In addition, our current estimate in (14) depends on the knowledge on the optimal value , which may be unknown or difficult to find in practice. Fortunately, we observe that , viewed as a function of , is actually decreasing when . Thus, one may estimate a proper upper bound for the true optimal value (i.e., ) and compute satisfying . It then follows from Theorem 2.12 that for any . But it should be noticed that such can be more conservative. Improving estimations of and will be an interesting research topic in the future.
3 A smoothing penalty method
In this section, we propose a smoothing penalty method for solving the $L_1$-constrained problem (4) with $0<p<1$. Before proceeding, we point out that the smoothing penalty method presented in this paper can be extended without much difficulty to solve the $L_\infty$-constrained problem, namely, problem (1) with $0<p<1$ and $q=\infty$. This is because the $L_\infty$-constrained problem is similar to the $L_1$-constrained problem in the sense that both constraints $\|A\bm{x}-\bm{b}\|_1\leq\sigma$ and $\|A\bm{x}-\bm{b}\|_\infty\leq\sigma$ are polyhedral, and the associated residual norms are piecewise linear. On the other hand, for $1<q<\infty$, the corresponding penalty function is continuously differentiable, so one can readily extend the smoothing penalty method proposed in [12] to solve problem (1) with $0<p<1$ and $1<q<\infty$. However, the approach in [12] cannot be directly adapted for $q\in\{1,\infty\}$ due to the nonsmoothness of the penalty function in these two cases. In view of the above, in this paper, we consider an alternative smoothing penalty method for solving the $L_1$-constrained problem and omit the discussion of the $L_\infty$-constrained problem to save space.
We first study the first-order optimality conditions for problem (4) with . For simplicity, from now on, let . Then, problem (4) with can be equivalently written as follows:
[TABLE]
It is known from the generalized Fermat’s rule [33, Theorem 10.1] that, at any local minimizer of (18) (hence (4)), the following first-order necessary condition holds:
[TABLE]
This motivates the following definition.
Definition 3.1 (Stationary point of problem (4) with )
A point is said to be a stationary point of problem (4) with if and (19) is satisfied with in place of .
Note that finding an optimal solution of problem (4) with is NP-hard [11, 21]. Therefore, we shall focus on finding a stationary point of this problem. To this end, we introduce the following auxiliary penalty problem:
[TABLE]
where is the penalty parameter and . This problem is indeed an exact penalty problem for problem (4) with . The detailed analysis for the exact penalization results regarding global and local minimizers is given in Appendix 8. However, problem (20) is still not conceivably solvable because both parts in (20) are nonsmooth, and moreover, is nonconvex and non-Lipschitz. We then consider a partially smoothing problem of (20) as follows:
[TABLE]
where , are smoothing parameters and
[TABLE]
with
[TABLE]
Note that and are the smoothing functions of and , respectively (see Figure 1), and they have the following nice properties:
[TABLE]
More details on these smoothing functions can be found in [10, Section 3] and references therein. Thus, the composite function is indeed obtained by applying the smoothing technique twice. Hence, it is continuously differentiable and can be viewed as a smoothing function of . One can also show that
[TABLE]
Moreover, it is worth mentioning that when , the auxiliary penalty problem (20) reduces to . Then, the smoothing function of is no longer needed and the subsequent analysis can also be simplified in this special case. Now, based on (21), we are ready to present a smoothing penalty method as Algorithm 1 for solving problem (4) with . We call it SPeL1 for short in the rest of this paper.
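To make the two-level smoothing concrete, the following Python sketch implements one common pair of smoothing functions for $|t|$ and $\max\{t,0\}$ and composes them into a smooth surrogate of the penalty term $(\|A\bm{x}-\bm{b}\|_1-\sigma)_+$. The piecewise quadratic forms used here are an assumption (they are standard choices in the smoothing literature and may differ from the definitions displayed above), and the names `psi`, `h`, `smoothed_penalty` and `exact_penalty` are hypothetical.

```python
def psi(t, mu):
    """Smooth |t|: equals |t| when |t| >= mu, a quadratic when |t| < mu."""
    if abs(t) >= mu:
        return abs(t)
    return t * t / (2.0 * mu) + mu / 2.0


def h(t, nu):
    """Smooth max(t, 0): exact when |t| >= nu, a quadratic when |t| < nu."""
    if t >= nu:
        return t
    if t <= -nu:
        return 0.0
    return (t + nu) ** 2 / (4.0 * nu)


def residual(A, x, b):
    """Return Ax - b for a dense matrix A given as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) - bi for row, bi in zip(A, b)]


def exact_penalty(A, x, b, sigma):
    """Return (||Ax - b||_1 - sigma)_+ (nonsmooth)."""
    return max(sum(abs(r) for r in residual(A, x, b)) - sigma, 0.0)


def smoothed_penalty(A, x, b, sigma, mu, nu):
    """Return h_nu(sum_i psi_mu(a_i^T x - b_i) - sigma), a smooth surrogate."""
    return h(sum(psi(r, mu) for r in residual(A, x, b)) - sigma, nu)
```

With these particular choices, `psi` overestimates $|t|$ by at most `mu / 2` and `h` overestimates $\max\{t,0\}$ by at most `nu / 4`, so the surrogate lies between the exact penalty and the exact penalty plus `m * mu / 2 + nu / 4`, where `m` is the number of rows of `A`. The gap therefore vanishes as both smoothing parameters tend to zero.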
The reader may have observed that, since problem (20) is a penalty counterpart of problem (4) and problem (21) is a partially smoothing counterpart of problem (20), our method actually adapts the penalty strategy and the smoothing strategy at the same time for solving the nonconvex nonsmooth non-Lipschitz constrained problem (4) with . Specifically, in our method, at each iteration, we solve problem (21) approximately with given , and then update and . The cooperation of these two strategies indeed provides an efficient practical way to solve problem (4) with . This circumvents the potential disadvantages of the traditional penalty approach that directly solves the penalty problem (20) with an exact penalty parameter , because (i) it is still not easy to solve problem (20) efficiently; (ii) it is, in general, hard to estimate the exact penalty parameter and the overestimation may make the penalty problem (20) ill-conditioned. The convergence result that characterizes a cluster point of the sequence generated by the SPeL1 in Algorithm 1 is shown in the next theorem. We should note that, though the proofs are motivated by those in [12, Theorem 4.2] and [29, Theorem 2], the technical details become much more involved since our smoothing function is obtained by a composition of two smoothing functions and .
For the ease of future reference, we write down the gradients of and as well as the derivatives of and as follows:
[TABLE]
Moreover, we claim that is regular at any as follows. Let for any . It is easy to see that is regular at any , because is smooth in a neighborhood of any ; see [33, Exercise 8.8] and [33, Corollary 8.11]. For , it follows from [12, Lemma 2.5] and its proof that . Moreover, from the definition of the horizon cone (see [33, Definition 3.3]), we have that . Using these facts and [33, Corollary 8.11], we see that is also regular at . Therefore, it follows from [33, Proposition 10.5] that is regular at any .
Theorem 3.2
Suppose that and are chosen such that . Let be the sequence generated by the SPeL1 in Algorithm 1. Then, the following statements hold.
- (i) is bounded.
- (ii) Any cluster point of is a feasible point of problem (4) with .
- (iii) Suppose that is a cluster point of and it holds at that
[TABLE]
Then, is a stationary point of problem (4) with .
Proof. Statement (i). First, we see that
[TABLE]
where the first inequality follows from the nonnegativity of (since for all ), the second inequality follows from (27), the third inequality follows from Step 1 in Algorithm 1, the fourth inequality follows from (24) and the last inequality follows from . This together with the level-boundedness of (recall that ) implies that is bounded.
Statement (ii). Since is bounded, there exists at least one cluster point. Suppose that is a cluster point of and let be a convergent subsequence such that . Note that
[TABLE]
where the first inequality follows from (22), (23) and the fact that is non-decreasing, and the last inequality follows from (24). Then,
[TABLE]
Taking the limit in the above inequality along and recalling that , , (see Step 3 in Algorithm 1), we see that . Hence, is a feasible point of (4) with .
Statement (iii). We next show that is a stationary point of problem (4) with . For simplicity, let () be the column vector formed from the th row of , i.e., . Moreover, let . Then, thanks to and (26) with . Thus, from (25) and (28), we see that for any , there exists a such that
[TABLE]
In the following, we consider two cases: and .
Case 1. In this case, we suppose that . Since , for any , there exists a sufficiently large such that $\big|\|A\bm{x}^{k_{i}}-\bm{b}\|_{1}-\|A\bm{x}^{*}-\bm{b}\|_{1}\big|\leq\gamma$ for all . Note that
[TABLE]
where the first inequality follows from (23), the equality follows from , the second inequality holds for all and the last inequality follows whenever for some because and . This together with (29) implies that $g^{\prime}_{\mu_{k_{i}-1}}\big(H_{\nu_{k_{i}-1}}(A\bm{x}^{k_{i}}-\bm{b})-\sigma\big)=0$ for all sufficiently large . Hence, (32) reduces to for all sufficiently large . Then, we have from (5) that . This together with (since ) implies that
[TABLE]
Moreover, since and are regular, it follows from [33, Corollary 8.11] and [33, Exercise 8.14] that and . Using these facts and recalling [33, Theorem 8.6] and [33, Corollary 10.9], we have
[TABLE]
which implies that is a stationary point of problem (4) with .
Case 2. In this case, we suppose that . For such , one can follow [25, Theorem 1.3.5 in Section D] to compute that
[TABLE]
For simplicity, let and for . Also, let , and . Then, (32) is equivalent to
[TABLE]
Since and , there exists a sufficiently large such that for all , we have and for all , and have and for all . Thus, it follows from (30) that for all , we have for all and for all . Moreover, for all , we see from (29) and (30) that and for all . Then, for all , we have that for all , for all and for all .
We next prove by contradiction that is bounded. Suppose that is unbounded. Without loss of generality, we assume that and that for some . Then, it follows from (34) that
[TABLE]
Moreover, from the discussions in the last paragraph, for all , we have that for all , for all and for all . Then, it follows from (33) that
[TABLE]
for all . Then, passing to the limit in (35) along , together with and the closedness of , it is not hard to see that
[TABLE]
Since due to , this contradicts (31). Hence, is bounded. Without loss of generality, assume that . Then, passing to the limit in (34) along and , making use of (33) and the closedness of , and recalling (5), we obtain that
[TABLE]
Thus, following arguments similar to those in Case 1, one can show that is a stationary point of problem (4) with . This completes the proof.
Remark 3.3 (Comments on condition (31))
Condition (31) used for Theorem 3.2(iii) is actually a classic constraint qualification for nonconvex nonsmooth optimization problems; see [33, Theorem 8.15]. Note that, for any , we have
[TABLE]
Moreover, recall from [12, Lemma 2.5(ii)] that
[TABLE]
Thus, condition (31) obviously holds at a point satisfying . For a point satisfying , one sufficient condition for (31) is that, for some , holds for any , i.e., for any .
To end this section, we briefly discuss the method for approximately solving the smoothing penalty problem (21) such that conditions (25)–(27) hold. Note that, for any given , is a continuous function that consists of a nonconvex nonsmooth non-Lipschitz function and a smooth function . It is also not hard to verify that the gradient of is Lipschitz continuous. Moreover, is level-bounded because is level-bounded and is nonnegative since is nonnegative. Hence, the well-known proximal gradient method and its variants can be applied to solve (21) with convergence guarantees; see, for example, [1, 2, 12, 39]. In our numerical experiments, we follow [12] and use the nonmonotone proximal gradient (NPG) method. The NPG method is essentially the proximal gradient method with a nonmonotone line search technique, which allows occasional increases in the objective value. With this technique, the NPG has been shown to have more favorable numerical performance than the monotone version in many applications; see, for example, [22, 38, 39]. The iterative scheme of the NPG for solving (21) with is given as follows:
Choose , , , and an integer . At the -th () iteration, choose and find the smallest nonnegative integer such that
\begin{equation*}
\left\{\begin{aligned}
&\bm{w}\in\arg\min\limits_{\bm{x}\in\mathbb{R}^{n}}\Big\{\Phi(\bm{x})+\langle\nabla f_{\lambda_{k},\mu_{k},\nu_{k}}(\bm{x}^{k,l}),\,\bm{x}\rangle+\frac{\tau^{i_{l}}L_{k,l}^{0}}{2}\|\bm{x}-\bm{x}^{k,l}\|^{2}\Big\},\\
&F_{\lambda_{k},\mu_{k},\nu_{k}}(\bm{w})-\max\limits_{[l-N]_{+}\leq i\leq l}F_{\lambda_{k},\mu_{k},\nu_{k}}(\bm{x}^{k,i})\leq-\frac{c}{2}\|\bm{w}-\bm{x}^{k,l}\|^{2}.
\end{aligned}\right.
\tag{36}
\end{equation*}
Then, set and .
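The NPG scheme (36) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' Matlab implementation: the nonsmooth term $\Phi$ is swapped for $\lambda\|\bm{x}\|_1$ (whose proximal mapping is plain soft-thresholding), the smooth part is a toy diagonal quadratic, the initial trial constant `L0` is kept fixed across iterations, and the names `npg`, `soft_threshold`, `L_max`, `tol` and all default values are hypothetical.

```python
import math


def soft_threshold(v, t):
    """Proximal mapping of t*||.||_1: shrink each entry of v toward zero by t."""
    out = []
    for vi in v:
        mag = abs(vi) - t
        out.append(0.0 if mag <= 0 else math.copysign(mag, vi))
    return out


def npg(f, grad_f, phi, prox_phi, x0, L0=1.0, tau=2.0, c=1e-4, N=2,
        L_max=1e8, max_iter=500, tol=1e-12):
    """Nonmonotone proximal gradient sketch for min f(x) + phi(x).

    prox_phi(v, t) must return argmin_x phi(x) + ||x - v||^2 / (2 t).
    """
    x = list(x0)
    history = [f(x) + phi(x)]            # objective values F(x^0), F(x^1), ...
    for _ in range(max_iter):
        g = grad_f(x)
        ref = max(history[-(N + 1):])    # max of F over the last N+1 iterates
        L = L0                           # assumption: same trial constant each time
        while True:
            # Minimizer of phi(x) + <g, x> + (L/2)||x - x^l||^2.
            w = prox_phi([xi - gi / L for xi, gi in zip(x, g)], 1.0 / L)
            F_w = f(w) + phi(w)
            step2 = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            # Nonmonotone sufficient-decrease test, as in the second line of (36).
            if F_w - ref <= -0.5 * c * step2 or L >= L_max:
                break
            L *= tau                     # try L0 * tau^i for i = 1, 2, ...
        x = w
        history.append(F_w)
        if step2 <= tol:
            break
    return x, history


# Toy problem (hypothetical data): f(x) = 0.5 * sum_i a_i (x_i - y_i)^2, phi = lam*||x||_1.
a = [4.0, 4.0, 4.0]
y = [3.0, 0.1, -1.5]
lam = 0.5
f = lambda x: 0.5 * sum(ai * (xi - yi) ** 2 for ai, xi, yi in zip(a, x, y))
grad_f = lambda x: [ai * (xi - yi) for ai, xi, yi in zip(a, x, y)]
phi = lambda x: lam * sum(abs(xi) for xi in x)
prox_phi = lambda v, t: soft_threshold(v, lam * t)

x_hat, hist = npg(f, grad_f, phi, prox_phi, [0.0, 0.0, 0.0])
```

For this separable toy problem the minimizer is known in closed form, namely soft-thresholding of `y` with threshold `lam / a_i`, which makes the sketch easy to check. Replacing `prox_phi` with a proximal mapping of the actual nonsmooth term would be needed to mimic the setting of (21).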
One can also show that, for any given and , a point satisfying conditions (25)–(27) can be found by the NPG within a finite number of iterations. Indeed, it follows from [12, Proposition A.1(i)] that (27) holds for all . Moreover, from the optimality condition of (36), we see that
[TABLE]
which implies that
[TABLE]
This together with the boundedness of (see [12, Proposition A.1(ii)]) and as (see [12, Theorem A.1]) implies that (25) and (26) hold when is sufficiently large. In view of the above, the sequence generated by the SPeL1 in Algorithm 1 is well-defined.
4 Numerical simulations
In this section, we conduct some numerical experiments for problem (4) with on finding sparse solutions to implicitly illustrate the theoretical results established in Section 2 and show the efficiency of our SPeL1 in Algorithm 1. All experiments are run in Matlab R2016a on a workstation with Intel(R) Xeon(R) Processor [email protected] and 64GB of RAM, equipped with 64-bit Windows 10 OS.
For the SPeL1, we set and , where the computation of is not counted in the CPU time below. At the th outer iteration, we compute
[TABLE]
Then, based on these quantities, we set
[TABLE]
The initial tolerance for the subproblem is set to and is updated as (instead of ) in our implementation. Finally, we terminate the SPeL1 when
[TABLE]
Once the SPeL1 is terminated and returns an approximate solution , we also perform a refinement step by setting if to improve the quality of the approximate solution.
For solving each subproblem (21) with in the SPeL1, we use the NPG described in (36) with , $L_{k}^{\max}=\big(\frac{m}{\mu_{k}}+\frac{2}{\nu_{k}}\big)\lambda_{k}\|A\|^{2}$, , and . Moreover, we set and, for any ,
[TABLE]
with , where
[TABLE]
The NPG method is terminated when the number of iterations exceeds 1000 or
[TABLE]
Note from (37) that if the first inequality above holds, condition (25) is then approximately satisfied.
In the following experiments, we consider randomly generated instances. Given a dimensional triple , we randomly generate an instance as follows. First, we generate a matrix with i.i.d. standard Gaussian entries and then normalize so that each column of has unit norm. We next choose a subset of size uniformly at random and generate an -sparse vector , which has i.i.d. standard Gaussian entries on and zeros on . Then, we generate the vector by setting , where is a scaling parameter and is the noise vector with each entry independently following a certain distribution. We shall consider two cases:
- Case 1. We use the standard Gaussian distribution via the Matlab command: xi = randn(m,1).
- Case 2. We use the Student’s t distribution with 2 degrees of freedom via the Matlab command: xi = trnd(2,m,1).
Finally, we set so that . In particular, for such , we have observed from our simulations that all random instances satisfy and hence .
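The instance generation described above can be sketched as follows. This is a minimal standard-library Python sketch, not the authors' Matlab code. The names `generate_instance`, `n`, `s`, `delta` and `x_orig` are assumptions introduced for illustration, as is the relation `b = A x_orig + delta * xi` that the code uses to combine the signal and the scaled noise. The Student's t(2) samples are drawn as a standard normal divided by the square root of a chi-square(2) variable over 2, which is the textbook construction.

```python
import math
import random


def student_t(rng, df):
    """One Student's t sample: Z / sqrt(V / df) with Z ~ N(0,1), V ~ chi-square(df)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2.0, 2.0)   # chi-square(df) = Gamma(shape df/2, scale 2)
    return z / math.sqrt(v / df)


def generate_instance(m, n, s, delta, noise="gaussian", seed=0):
    """Return (A, x_orig, b, xi) for one random instance (hypothetical helper)."""
    rng = random.Random(seed)
    # m-by-n matrix with i.i.d. standard Gaussian entries, columns scaled to unit norm.
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    for j in range(n):
        col_norm = math.sqrt(sum(A[i][j] ** 2 for i in range(m)))
        for i in range(m):
            A[i][j] /= col_norm
    # s-sparse vector: Gaussian entries on a uniformly random support, zeros elsewhere.
    support = rng.sample(range(n), s)
    x_orig = [0.0] * n
    for j in support:
        x_orig[j] = rng.gauss(0.0, 1.0)
    # Noise vector: Case 1 (Gaussian) or Case 2 (Student's t with 2 degrees of freedom).
    if noise == "gaussian":
        xi = [rng.gauss(0.0, 1.0) for _ in range(m)]
    elif noise == "student_t2":
        xi = [student_t(rng, 2) for _ in range(m)]
    else:
        raise ValueError("noise must be 'gaussian' or 'student_t2'")
    # Assumed measurement model: b = A x_orig + delta * xi.
    b = [sum(A[i][j] * x_orig[j] for j in support) + delta * xi[i] for i in range(m)]
    return A, x_orig, b, xi


A, x_orig, b, xi = generate_instance(m=8, n=20, s=3, delta=0.01,
                                     noise="student_t2", seed=1)
```

Under this model the residual of the true sparse vector satisfies $\|A\bm{x}_{\rm orig}-\bm{b}\|_1=\delta\|\bm{\xi}\|_1$, so choosing $\sigma$ equal to that value would place the true vector on the boundary of the $\ell_1$ feasible set.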
Table 4 presents the numerical results of the SPeL1 for solving problem (4) with , where we use and consider different choices of and under different noise cases. In this table, “nnz” denotes the number of nonzero entries in the refined terminating solution ; “rank” denotes the rank of with ; $\mathbf{err}_{1}:=\max\big\{\|\bm{x}^{*}\|_{\infty}-(\lambda_{\min}(A_{\mathcal{J}}^{\top}A_{\mathcal{J}}))^{-\frac{1}{2}}(\sigma+\|\bm{b}\|_{2}),\,(|\mathcal{J}|\lambda_{\max}(A_{\mathcal{J}}^{\top}A_{\mathcal{J}}))^{-\frac{1}{2}}(\|\bm{b}\|_{1}-\sigma)-\|\bm{x}^{*}\|_{\infty},\,0\big\}$; and . All results presented are averages over 10 independent instances for each and we display rounded values for “nnz” and “rank”. From Table 4, one can see that nnz rank, and always hold, clearly matching Theorem 2.2 established for an optimal solution of problem (4) with . This suggests that our SPeL1 is able to find a ‘good’ stationary point of problem (4) with , which has important properties of an optimal solution.
We further generate one random instance for each under different noise cases, and then apply our SPeL1 to solve problem (4) with different . The numbers of nonzero entries in the approximate solutions obtained for different are presented in Figure 2. From this figure, we see that solving problem (4) with a smaller always gives a sparser approximate solution, and the sparsity is almost unchanged and is close to the sparsity of when is smaller than a certain threshold. This observation implicitly matches Theorem 2.9, which says that and remains unchanged for any sufficiently small , and shows the potential advantage of solving problem (4) with a small for finding a sparse solution. Moreover, in practice, such need not be very small. From our experiments, we observe that is small enough for problem (4) to give a sparse solution.
Next, we consider using model (4) to recover a sparse solution of an underdetermined linear system from noisy measurements, and compare its performance with that of using the widely-studied -constrained problem (see, for example, [3, 12, 13, 35]):
[TABLE]
We will solve problem (39) with by the smoothing penalty method proposed in [12] (the Matlab codes implemented by the authors in [12] are available at http://www.mypolyuweb.hk/~tkpong/Exact_lp_codes/) and call it SPeL2 for short. All parameters in the SPeL2 are chosen as the default settings, except that we terminate its subroutine NPG when the inner iteration number exceeds 1000 to save the cost of solving the subproblem, while maintaining the quality of the eventual solution. Moreover, we initialize the SPeL2 at the same point as the SPeL1 and terminate the SPeL2 at the th iteration when , where , are defined in (38) and $\eta_{4}^{k}:=\max\big\{\|A\bm{x}^{k+1}-\bm{b}\|_{2}-\sigma,\,0\big\}$. We also apply the refinement step to the approximate solution obtained by the SPeL2 to improve its quality.
In the comparisons below, we use and consider different and under different noise cases. For each and , we randomly generate , , , as described above, but set for (4) and set for (39) so that both resulting feasible sets of (4) and (39) will contain the sparse vector as a boundary point. The computational results are reported in Table 4, where “nnz” denotes the number of nonzero entries in the refined terminating solution ; “feas” denotes the deviation of from the constraint, which is given by for (4) and for (39); “recerr” denotes the relative recovery error ; “time” denotes the computational time (in seconds). All results reported are averages over 10 independent instances for each and . One can observe from this table that for the Gaussian noise case, the performance of our SPeL1 is comparable with that of the SPeL2 with respect to the relative recovery error, while for the Student’s t noise case, our SPeL1 gives sparse solutions with smaller relative recovery errors for all instances. It is worth noting that, for the problem of recovering sparse solutions, even marginal improvements in recovery error can be very hard to achieve. Moreover, all approximate solutions obtained by the SPeL1 are exactly feasible for (4) and the sparsity of each solution is closer to that of the true sparse vector in most cases.
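The refinement step and the reported quantities “nnz”, “feas” and “recerr” can be sketched in Python as below. The exact threshold and formulas are not reproduced in the text above, so everything here is an assumption made for illustration: `threshold` is a hypothetical parameter, the $\ell_1$ constraint violation is modelled on the quantity $\max\{\|A\bm{x}-\bm{b}\|_2-\sigma,0\}$ used for the SPeL2 stopping test, and the relative recovery error is taken in the Euclidean norm.

```python
import math


def refine(x, threshold):
    """Set entries with magnitude below a (hypothetical) threshold to zero."""
    return [0.0 if abs(xi) < threshold else xi for xi in x]


def nnz(x):
    """Number of nonzero entries."""
    return sum(1 for xi in x if xi != 0.0)


def l1_violation(A, x, b, sigma):
    """max{||Ax - b||_1 - sigma, 0}: assumed form of the feasibility deviation for (4)."""
    residual = sum(abs(sum(aij * xj for aij, xj in zip(row, x)) - bi)
                   for row, bi in zip(A, b))
    return max(residual - sigma, 0.0)


def relative_error(x, x_orig):
    """||x - x_orig||_2 / ||x_orig||_2: assumed form of the relative recovery error."""
    num = math.sqrt(sum((xi - oi) ** 2 for xi, oi in zip(x, x_orig)))
    den = math.sqrt(sum(oi ** 2 for oi in x_orig))
    return num / den


# Tiny hand-checkable example (hypothetical data).
A_small = [[1.0, 0.0], [0.0, 1.0]]
b_small = [1.0, 0.0]
x_true = [1.0, 0.0]
x_raw = [1.25, 1e-8]                       # unrefined approximate solution
x_ref = refine(x_raw, threshold=1e-6)      # second entry is zeroed out
metrics = (nnz(x_ref),
           l1_violation(A_small, x_ref, b_small, sigma=0.5),
           relative_error(x_ref, x_true))
```

In this toy example the refined point has one nonzero entry, satisfies the $\ell_1$ constraint with $\sigma=0.5$, and has relative error 0.25.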
To better visualize the recovery performance of SPeL1 and SPeL2, we generate more instances to test and plot the “frequency of success” for each method with different . Specifically, we fix , and vary from 20 to 70. The noise level is set to . For each , we generate 500 independent instances, and for each instance, we run each method to obtain an approximate solution and consider the recovery successful if . The results of the experiments are presented in Figure 3. Note that when the number of measurements is fixed, a larger generally leads to a more difficult recovery problem and thus the success rate decreases, as shown in the figure. Moreover, one can see that for the Gaussian noise case, the success rate of our SPeL1 is comparable with that of the SPeL2, while for the Student’s t(2) noise case, our SPeL1 can give better success rates, especially when is small. This highlights the potential advantage of our approach for recovering a sparse solution under non-Gaussian noise. One may also observe that when becomes larger and , the success rates of both methods appear to become lower as becomes smaller. A possible reason is that when is large and is too small, finding a solution of problem (4) or (39) can be rather difficult and hence a stationary point is less likely to be a good candidate. Therefore, both SPeL1 and SPeL2 may still need some improvements for the hard cases ( is small and is large). We leave this interesting research topic for future work.
5 Concluding remarks
In this paper, we consider a unified - sparse optimization problem (1) and study various properties of its optimal solutions. Specifically, without any condition on the sensing matrix , we provide upper bounds in cardinality and infinity norm for the optimal solutions, and show that all optimal solutions must be at the boundary of the feasible set when ; see Theorem 2.2. Moreover, for , we show that the -constrained problem with has finitely many optimal solutions; see Proposition 2.5 and Remark 2.6. We further show that, for , there exists such that the solution set of the problem with any is contained in the solution set of the problem with and there also exists such that the solution set of the problem with any remains unchanged; see Theorem 2.9 and Remark 2.10. An estimation of such is also provided in Theorem 2.12. A convergent smoothing penalty method is also proposed to solve the -constrained problem with . Some numerical examples are presented to implicitly illustrate the theoretical results and show the efficiency of the proposed method for solving the constrained - problem under different noises.
{APPENDICES}
6 Proof of Lemma 2.7
First, for , we define , ,
[TABLE]
Then, from Viète’s formula [24], we see that and are the roots of and , respectively, where
[TABLE]
Moreover, from [30, Eq. ()] and the discussions that follow, we have that, for ,
[TABLE]
with . Notice that and for . Thus, from (40), it is not hard to show by induction that holds for . This implies that and have the same roots and hence .
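The induction step above relies on a recursion linking sums of powers to polynomial coefficients. Assuming that (40) is a recursion of Newton-identity type (an assumption, since the displayed formula is not reproduced here), the following Python sketch illustrates the underlying fact: two finite collections of numbers with the same power sums $p_1,\ldots,p_K$ have the same elementary symmetric polynomials $e_1,\ldots,e_K$, and hence are roots of the same monic polynomial. The function names are hypothetical.

```python
from fractions import Fraction


def power_sums(values, K):
    """Return [p_1, ..., p_K] with p_k = sum of v**k over the given values."""
    return [sum(Fraction(v) ** k for v in values) for k in range(1, K + 1)]


def elementary_from_power_sums(p):
    """Newton's identities: k * e_k = sum_{i=1}^{k} (-1)^(i-1) * e_{k-i} * p_i.

    Input p = [p_1, ..., p_K]; output [e_0, e_1, ..., e_K] with e_0 = 1.
    """
    e = [Fraction(1)]
    for k in range(1, len(p) + 1):
        acc = Fraction(0)
        for i in range(1, k + 1):
            acc += (-1) ** (i - 1) * e[k - i] * p[i - 1]
        e.append(acc / k)
    return e


# Two orderings of the same numbers share power sums, hence coefficients.
e_a = elementary_from_power_sums(power_sums([1, 2, 3], 3))
e_b = elementary_from_power_sums(power_sums([3, 1, 2], 3))
# A different collection gives different coefficients.
e_c = elementary_from_power_sums(power_sums([1, 2, 4], 3))
```

For the values 1, 2, 3 the recursion returns $e_1=6$, $e_2=11$, $e_3=6$, which are the coefficients (up to sign) of $(t-1)(t-2)(t-3)$, as Viète's formula predicts.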
7 Proof of Lemma 2.8
First, from the Taylor expansion (with Lagrange remainder), for any , and , we have
[TABLE]
where is a number between 0 and . Then, for any and , we have
[TABLE]
where, for , is a number between 0 and , and is a number between 0 and . In the following, we consider two cases.
Case 1: for all , where is defined in (10). In this case, we have for all . This together with Lemma 2.7 further implies that and hence . Then, we have for any . This proves statement (i).
Case 2: Case 1 does not hold. In this case, there must exist some so that and for . Then, we have from (41) and (10) that
[TABLE]
where $\Xi_{\tilde{k}+1}^{p}(\bm{a},\bm{b}):={\textstyle\sum_{j=1}^{s}}\big(e^{\xi_{i_{j},\tilde{k}+1}}(\ln|a_{i_{j}}|)^{\tilde{k}+1}-e^{\eta_{t_{j},\tilde{k}+1}}(\ln|b_{t_{j}}|)^{\tilde{k}+1}\big)$. Note also that and as . Thus, there must exist a sufficiently small such that
[TABLE]
We now consider the following two cases.
- •
: in this case, using (42) and (43), we obtain that
[TABLE]
This implies that for any .
- •
: in this case, using (42) and (43), we obtain that
[TABLE]
This implies that for any .
Combining the above results, we complete the proof of statement (ii).
8 Exact penalization
In this section, we show that problem (20) is actually an exact penalization for problem (4) with . For notational simplicity, we define a set and a matrix as follows:
[TABLE]
where and for any . Since each entry of is either or and the dimension of is , then one can have different choices of and hence such and are well-defined. Moreover, it is easy to see that if , then . A simple example is given as follows: let , then
[TABLE]
We next present some auxiliary lemmas, which will be useful in our analysis.
Lemma 8.1
Let , and . Then, can be equivalently rewritten as , where is defined in (44) and .
Proof. Observe that
[TABLE]
where the first equality follows from , the second equality follows because the maximizer of must be an extreme point of (see [32, Corollary 32.3.4]) and is the set of all extreme points of . This completes the proof.
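The identity behind Lemma 8.1 is the standard fact that $\|\bm{y}\|_1=\max\{\bm{d}^{\top}\bm{y}:\bm{d}\in\{-1,1\}^m\}$, so an $\ell_1$-ball constraint is equivalent to $2^m$ linear inequalities. The following Python sketch checks this numerically for a small vector. Since (44) is not reproduced here, the function `sign_matrix` is a hypothetical stand-in that simply enumerates all sign vectors as rows.

```python
from itertools import product


def sign_matrix(m):
    """All 2**m vectors with entries in {-1, 1}, one per row."""
    return [list(d) for d in product((-1, 1), repeat=m)]


def l1_norm(y):
    return sum(abs(v) for v in y)


def max_signed_sum(y):
    """max over sign vectors d of d^T y; equals the l1 norm of y."""
    return max(sum(di * yi for di, yi in zip(row, y))
               for row in sign_matrix(len(y)))


def in_l1_ball_via_signs(y, sigma):
    """Check ||y||_1 <= sigma through the 2**m linear inequalities d^T y <= sigma."""
    return all(sum(di * yi for di, yi in zip(row, y)) <= sigma
               for row in sign_matrix(len(y)))


y_demo = [0.5, -2.0, 1.25]                 # hypothetical residual vector, ||y||_1 = 3.75
norm_direct = l1_norm(y_demo)
norm_via_signs = max_signed_sum(y_demo)
```

The number of inequalities grows as $2^m$, so this polyhedral description is a tool for analysis (for instance, to invoke an error bound for linear systems) rather than something one would enumerate in computation.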
From Lemma 8.1, it is easy to see that the feasible set is a convex polyhedron. This together with the Hoffman error bound theorem [26] gives the following lemma.
Lemma 8.2
There exists a constant such that
[TABLE]
holds for any , where , and is defined in (44).
Based on this error bound result, we further give the following lemma.
Lemma 8.3
There exists a constant such that, for any , we have
[TABLE]
Proof. We first show that, for any , it holds that
[TABLE]
where and are defined in Lemma 8.2. Indeed, for any , there exists some such that . Then, we have
[TABLE]
On the other hand, from , we have
[TABLE]
Then, we see that
[TABLE]
where the second equality follows because if and , then and , and the inequality follows from (46). From the above, we obtain (45). This together with Lemma 8.2 completes the proof.
Now, we are ready to present our exact penalization results. Our first theorem concerns local minimizers of problems (4) and (20). The other two theorems concern -minimizers of problems (4) and (20) (see definitions later).
Theorem 8.4
Suppose that is a local minimizer of (4). Then, there exists a such that is a local minimizer of (20) whenever .
Proof. We first assume that and consider any bounded neighborhood of 0 and . Let denote a Lipschitz constant of the function on . For this , one can verify that there exists a neighborhood of 0 such that for all . Then, for any , we have
[TABLE]
where the last inequality follows from the definition of being a Lipschitz constant. This shows that is a local minimizer of (20) for any .
From now on, we assume that . Let for simplicity. Then, since . Since is a local minimizer of (4), one can verify that is a local minimizer of the following problem:
[TABLE]
Let $\tilde{\epsilon}=\frac{1}{2}\min\big\{|x^{*}_{i}|:i\in\mathcal{J}\big\}>0$. Thus, there exists a small such that is a local minimizer of (47) and $\min\big\{|x_{i}|:i\in\mathcal{J}\big\}>\tilde{\epsilon}$ for all . Moreover, note that is Lipschitz continuous on and there exists a constant such that for all (see Lemma 8.3). Then, from [12, Lemma 3.1] (or [28, Proposition 4]), there exists a such that, for any , is a local minimizer of the following problem:
[TABLE]
i.e., there exists a neighborhood of 0 with such that
[TABLE]
We now show that is a local minimizer of (20) for any . Fix any and any . Consider the bounded neighborhood of 0 and let be a Lipschitz constant of the function on . For this , there exists an such that for all . Then, for any , we have
[TABLE]
where the first inequality follows from the Lipschitz continuity of with Lipschitz constant and the last inequality follows from (48). This shows that is a local minimizer of (20) for any and completes the proof.
We next study -minimizers of (4) and (20), which are defined as follows.
Definition 8.5 (-minimizer)
Let .
- (i) is said to be an -minimizer of problem (4) if and $\|\bm{x}_{\epsilon}\|_{p}^{p}\leq\min\big\{\|\bm{x}\|_{p}^{p}:\bm{x}\in{\rm FEA}(A,\bm{b},\sigma,1)\big\}+\epsilon$.
- (ii) is said to be an -minimizer of problem (20) if .
We also introduce the following function:
[TABLE]
where is a constant. Note that is continuously differentiable. Moreover, from the discussions in [12, Section 3.3], we have that
[TABLE]
Then, we characterize the relation between the global minimizer of problem (4) and the -minimizer of problem (20) in the next theorem.
Theorem 8.6
Suppose that is a global minimizer of problem (4). Then, for any , there exists a such that is an -minimizer of problem (20) whenever .
Proof. First, for any , we consider and defined in (49). Then, we see from (50) and (51) that
[TABLE]
and is globally Lipschitz continuous with Lipschitz constant . Now, let , where is chosen as in Lemma 8.3. For any , we also use to denote the projection of on . Then, for and any ,
[TABLE]
where the first inequality follows from (52), the second inequality follows from Lemma 8.3, the third inequality follows from , the fourth inequality follows from the Lipschitz continuity of with Lipschitz constant , and the last two inequalities follow from (52) and the definition of as a minimizer of problem (4). This shows that is an -minimizer of problem (20) and completes the proof.
From Theorems 8.4 and 8.6, we see that if is a local minimizer or global minimizer of problem (4), then it is also a local minimizer or -minimizer of problem (20). Conversely, it is easy to see that if is a local minimizer or -minimizer of problem (20) for some and , then it is also a local minimizer or -minimizer of problem (4). Finally, we shall study the case when is a global minimizer of problem (20) for some but .
Theorem 8.7
Suppose that is an arbitrary feasible point of problem (4), i.e., . Take any and consider any , where is chosen as in Lemma 8.3. Then, for any global minimizer of problem (20), the projection is an -minimizer of problem (4).
Proof. First, from the definition of and the global optimality of , we have
[TABLE]
Then, for any , we have
[TABLE]
where the first inequality follows from (53), the second inequality follows from [12, Lemma 2.4], the third inequality follows from the concavity of the function for nonnegative , the fourth inequality follows from Lemma 8.3 and the last two inequalities follow from (54) and the choice of . This implies that is an -minimizer of (4) and completes the proof.
Acknowledgments
The authors are grateful to the editor and the anonymous referees for their valuable suggestions and comments, which have helped to improve the quality of this paper. The authors would also like to thank the CAS AMSS-PolyU Joint Laboratory of Applied Mathematics for its support while this research was being conducted. The research of Shuhuang Xiang was supported in part by the National Natural Science Foundation of China (Grant No. 11771454).
References
- [1] Attouch H, Bolte J, Svaiter B (2013) Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1):91–129.
- [2] Beck A (2017) First-Order Methods in Optimization, volume 25 (SIAM).
- [3] Bruckstein A, Donoho D, Elad M (2009) From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 51(1):34–81.
- [4] Cai T, Liu W, Luo X (2011) A constrained $\ell_{1}$ minimization approach to sparse precision matrix estimation. J. Am. Stat. Assoc. 106(494):594–607.
- [5] Candès E, Romberg J, Tao T (2006) Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59(8):1207–1223.
- [6] Candès E, Tao T (2005) Decoding by linear programming. IEEE Trans. Inf. Theory 51(12):4203–4215.
- [7] Candès E, Tao T (2007) The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Stat. 35(6):2313–2351.
- [8] Chartrand R (2007) Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process. Lett. 14(10):707–710.
