An Efficient and Globally Convergent Algorithm for $\ell_{p,q}$-$\ell_{r}$ Model in Group Sparse Optimization
Yunhua Xue, Yanfei Feng, Chunlin Wu

TL;DR
This paper introduces a new proximally linearized algorithm called InISSAPL designed to efficiently solve non-Lipschitz group sparse optimization problems involving the $\ell_{p,q}$-$\ell_{r}$ model, ensuring global convergence.
Contribution
The paper presents the first efficient algorithm with global convergence guarantees for the $\ell_{p,q}$-$\ell_{r}$ group sparse optimization problem.
Findings
The algorithm converges globally for the non-Lipschitz $\ell_{p,q}$-$\ell_{r}$ model.
It outperforms existing methods in efficiency and accuracy.
The method is applicable to various group sparse optimization tasks.
Abstract
Group sparsity combines the underlying sparsity and group structure of the data in problems. We develop a proximally linearized algorithm InISSAPL for the non-Lipschitz group sparse $\ell_{p,q}$-$\ell_{r}$ optimization problem.
| Noise | | | | |
|---|---|---|---|---|
| Laplace noise | 0.0370 | 0.0270 | 0.0362 | 0.0491 |
| Gaussian noise | 0.0186 | 0.0203 | 0.0247 | 0.0297 |
| Uniform noise | 0.0110 | 0.0127 | 0.0159 | 0.0147 |
Compared algorithms: PGM-GSO, e-PGM-GSO and InISSAPL (columns include Time(s)).

| | Values |
|---|---|
| 4 | 0.46, 0.0023 |
| 8 | 0.0025, 0.49 |
| 12 | 0.0030, 0.50, 0.0030 |
| 16 | 0.52, 0.0031 |

| | Values |
|---|---|
| 25 | 3.98, 0.0025 |
| 50 | 0.0027, 4.20, 0.0027 |
| 75 | 6.58, 0.0028 |
| 100 | 9.02, 0.0866 |
An Efficient and Globally Convergent Algorithm for $\ell_{p,q}$-$\ell_{r}$ Model in Group Sparse Optimization
Yunhua Xue, Yanfei Feng and Chunlin Wu
School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
Abstract
Group sparsity combines the underlying sparsity and group structure of the data in problems. We develop a proximally linearized algorithm InISSAPL for the non-Lipschitz group sparse $\ell_{p,q}$-$\ell_{r}$ optimization problem. The algorithm gives a unified framework for all the parameters $p\ge 1$, $0<q<1$ and $1\le r\le\infty$, which is applicable to different kinds of measurement noise. In particular, it includes the sum of the non-smooth $\ell_{1,q}$ regularization term and the non-smooth $\ell_1$/$\ell_\infty$ fidelity term as special cases. It allows an inexact inner loop that can be implemented by scaled ADMM, and it still converges globally. The algorithm is efficient and fast, with computation only on the shrinking group support set. Many numerical experiments are presented for the algorithm with diverse parameters $p$, $q$, $r$. The comparisons show that our algorithm is superior to others in the existing works.
Keywords. group sparse, $\ell_{p,q}$-$\ell_{r}$ model, non-Lipschitz optimization, Laplace noise, Gaussian noise, uniform distribution noise, lower bound theory, Kurdyka-Łojasiewicz property
Mathematics subject classification (2010). 49M05, 65K10, 90C26, 90C30
1 Introduction
We consider the following $\ell_{p,q}$-$\ell_{r}$ minimization problem
[TABLE]
where
[TABLE]
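and $A\in\mathbb{R}^{m\times n}$, $\mathbf{b}\in\mathbb{R}^{m}$, $p\ge 1$, $0<q<1$, $1\le r\le\infty$. The regularization term $\|\mathbf{x}\|_{p,q}^{q}$ measures the group sparse structure of $\mathbf{x}$, where $\|\cdot\|_{p,q}$ is a quasi-norm defined by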
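$$\|\mathbf{x}\|_{p,q}=\Big(\sum_{\mathsf{i}\in\mathsf{G}}\|\mathbf{x}_{\mathsf{i}}\|_{p}^{q}\Big)^{1/q},$$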
where $\mathbf{x}_{\mathsf{i}}$ are the group members defined in Section 2 and $\|\cdot\|_{p}$ is the standard $\ell_p$ norm for vectors.
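As a concrete illustration of this definition, the sketch below evaluates $\|\mathbf{x}\|_{p,q}$, or its $q$-th power (the regularization term), for a vector whose groups are given as lists of indices $J_{\mathsf{i}}$. The function name `lpq_norm` and the list-of-index-lists encoding of the group structure are illustrative assumptions, not code from the paper.

```python
import numpy as np

def lpq_norm(x, groups, p=2.0, q=0.5, power=False):
    """Return ||x||_{p,q} = (sum_i ||x_i||_p^q)^(1/q), or its q-th power if power=True.

    `groups` is a list of index lists J_i that partition {0, ..., n-1}.
    """
    x = np.asarray(x, dtype=float)
    # l_p norm of every group member x_i = x[J_i]
    group_norms = np.array([np.linalg.norm(x[J], ord=p) for J in groups])
    s = np.sum(group_norms ** q)          # sum_i ||x_i||_p^q
    return s if power else s ** (1.0 / q)

x = np.array([3.0, 4.0, 0.0, 0.0, 1.0])
groups = [[0, 1], [2, 3], [4]]
print(lpq_norm(x, groups, p=2, q=0.5, power=True))   # sqrt(5) + 0 + 1
```

With singleton groups, the value reduces to $\sum_j |x_j|^q$, which is the non-group $\ell_q$ case.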
In the big data era, data describing structures, segments and features often have a group property, namely, a natural grouping of their components. Sparsity allows us to reconstruct high-dimensional data from only a small number of variables, leading to better recovery performance. Combining the two has made the recovery or reconstruction of group sparse data an active research topic in sparse optimization. The group sparse minimization problem (1.1) with underdetermined linear measurements has a wide variety of applications, such as signal recovery [17, 21], image processing [31], compressed sensing [30], model selection in birth weight prediction [38], sparse learning [35], variable selection in gene finding [28] and so on. Therefore, it is meaningful to study efficient algorithms for this general group sparse optimization problem.
The model is general in the sense that it covers many cases for different parameters $p$, $q$ and $r$. We assume the observation
$$\mathbf{b}=A\mathbf{x}+\boldsymbol{\varepsilon},$$
where $\boldsymbol{\varepsilon}$ represents the noise. The model can be adapted to different kinds of noise through the parameter $r$ in the $\ell_r$ data fitting term. As is well known, the $\ell_2$ fidelity term ($r=2$) is used for Gaussian noise. For Laplace noise or heavy-tailed noise such as impulsive noise, the $\ell_1$ fidelity term ($r=1$) is a good choice. For uniformly distributed noise or quantization error, the $\ell_\infty$ fidelity term ($r=\infty$) is suitable.
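The correspondence between noise type and fidelity term has a familiar one-dimensional analogue. Fitting a constant $c$ to scalar data $\mathbf{y}$ by minimizing $\|\mathbf{y}-c\mathbf{1}\|_r$ gives the median for $r=1$, the mean for $r=2$ and the midrange for $r=\infty$. These are the classical location estimators associated with Laplace, Gaussian and uniform noise, respectively. The helper names below are hypothetical, and the example only illustrates this intuition; it is not part of the paper's method.

```python
import numpy as np

def lr_misfit(y, c, r):
    """l_r misfit ||y - c||_r of a constant fit c to data y (r = 1, 2 or np.inf)."""
    return np.linalg.norm(np.asarray(y, dtype=float) - c, ord=r)

def best_constant(y, r):
    """Closed-form minimizer of ||y - c||_r over scalar c for r in {1, 2, inf}."""
    y = np.asarray(y, dtype=float)
    if r == 1:
        return float(np.median(y))               # l_1 -> median (Laplace noise)
    if r == 2:
        return float(np.mean(y))                 # l_2 -> mean (Gaussian noise)
    if r == np.inf:
        return float(0.5 * (y.min() + y.max()))  # l_inf -> midrange (uniform noise)
    raise ValueError("closed forms are only given for r = 1, 2, inf")

y = np.array([0.0, 1.0, 2.0, 10.0])
print([best_constant(y, r) for r in (1, 2, np.inf)])  # [1.5, 3.25, 5.0]
```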
Many references study the sparse optimization problem without group structure, i.e., the non-group model in which the number of groups equals $n$. In this case, the $\ell_{p,q}$ term in (1.1) degenerates to the $\ell_q$ regularization. One class of methods is smoothing approximation methods [5, 14, 13, 24, 12], in which a smoothing function removes the non-Lipschitz property of the objective function. The second class is generalized iterative shrinkage-thresholding algorithms (GISA) for the $\ell_q$-$\ell_2$ problem [34, 40, 9]. GISA was inspired by the great success of soft thresholding and the iterative shrinkage-thresholding algorithm (ISTA) [16, 3] for the convex $\ell_1$-$\ell_2$ problem. The third class is iterative reweighted minimization methods for $\ell_q$ minimization problems; see, e.g., [23, 10, 15, 27]. These reweighted methods reformulate the original non-Lipschitz problem into Lipschitz ones by means of a de-singularizing parameter. Very recently, [25, 39] developed methods that successively shrink the support of the variables to overcome the non-Lipschitz property: [25] considered the non-group case, and [39] focused on image restoration. To the best of our knowledge, most of these references considered only $r=2$.
For the group sparse optimization problem (1.1), most algorithms were likewise proposed only for $r=2$. Hu et al. [21] investigated this problem via $\ell_{p,q}$ regularization, and others developed algorithms for $\ell_{2,1}$-regularized least squares, e.g., group Lasso [38, 8, 17]. As noted before, it is important and necessary to develop algorithms for general $r$. The difficulty lies in handling the noise parameter $r$ together with the regularization parameters $p$ and $q$ of the group structure in a unified way. In addition, the $\ell_{p,q}$ regularization term with $0<q<1$ makes (1.1) a non-convex, non-Lipschitz optimization problem, and the non-smoothness becomes even more severe for the $\ell_{1,q}$ regularization. All these characteristics of model (1.1) make it very challenging to solve.
In this paper, we extend our recent work [25] to solve the general group sparse optimization problem (1.1). This extension is not trivial, because model (1.1) is more complicated and includes more non-smooth cases than the non-group model in [25], as mentioned in the previous paragraph. We first establish a motivating proposition by developing subdifferential lemmas for group variables. This justifies the design of a unified iterative support shrinking algorithm over the group support set of the unknown variables for various $p$, $q$ and $r$. To make the algorithm more practical and easier to implement, we linearize the regularization term and present the InISSAPL algorithm to compute an approximate solution. Although the algorithm allows an inexact inner loop, we prove its global convergence using a new lower bound theory for the norms of the nonzero groups of the iterative sequence. We also discuss the implementation by scaled ADMM. In particular, for the case $r=\infty$, we give an analytical derivation of the explicit solution of the corresponding subproblem. Numerical experiments show that the algorithm is not only robust to different types of noise but also performs well for different $p$ and $q$. In comparisons with other group sparse optimization algorithms in terms of relative error, success rate and running time, our algorithm outperforms them. The main characteristics of the InISSAPL algorithm for model (1.1) are as follows.
- (i) The algorithm provides a unified framework for all the parameters $p$, $q$ and $r$. In particular, it can deal with the sum of the non-smooth $\ell_{1,q}$ regularization term and the non-smooth $\ell_1$/$\ell_\infty$ fidelity term.
- (ii) At each iteration, the computation is carried out only on the shrinking group support set of the current iterate. Hence the algorithm is efficient, especially for large-scale sparse recovery problems.
- (iii) When the KL property is used to prove global convergence, the key difficulty is to overcome the non-Lipschitz property of the objective function and to construct an appropriate subdifferential element. This is resolved by developing a lower bound theory for the nonzero groups of the iterative sequence and a technical construction of the subdifferential; see Section 4 for details.
The rest of the paper is organized as follows. In Section 2, we give some basic notation and preliminaries. In Section 3, we give the motivating proposition and propose the corresponding algorithms. In Section 4, we establish the global convergence theorem for the proposed algorithms. In Section 5, we describe the implementation of the algorithm by scaled ADMM. Numerical experiments and comparisons are shown in Section 6. Section 7 concludes the paper.
2 Notations and preliminaries
Suppose that is an matrix and is a column vector with components. denotes the row index set of . To be specialized, we use another kind of upright font to express the group index such as . Let represent the group structure of . denotes the group index set of . For each group member , we denote by the index set, then . We also refer to as its th entry of and denote the group support set of by
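$$\mathsf{S}(\mathbf{x}):=\{\mathsf{i}\in\mathsf{G}:\mathbf{x}_{\mathsf{i}}\neq\mathbf{0}\},$$
where $\mathbf{x}_{\mathsf{i}}\neq\mathbf{0}$ means that $x_{\mathsf{i},j}\neq 0$ for some $j\in J_{\mathsf{i}}$. Furthermore, we write $\mathbf{x}_{\mathsf{i}}=\mathbf{0}$ when $x_{\mathsf{i},j}=0$ for all $j\in J_{\mathsf{i}}$. The support of the group member $\mathbf{x}_{\mathsf{i}}$ is defined by
$$\operatorname{supp}(\mathbf{x}_{\mathsf{i}}):=\{j\in J_{\mathsf{i}}:x_{\mathsf{i},j}\neq 0\}.$$
Let $\mathsf{S}$ be a subset of $\mathsf{G}$. We denote by $\mathbf{x}_{\mathsf{S}}$ the group vectors of $\mathbf{x}$ indexed by $\mathsf{S}$, which consist of the nonzero group members of $\mathbf{x}$ when $\mathsf{S}=\mathsf{S}(\mathbf{x})$.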
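The group support set (the groups with at least one nonzero entry) and the restriction of a vector and of the columns of a matrix to it are the basic bookkeeping operations used by the algorithm later on. Here is a minimal sketch, assuming groups are stored as lists of column indices; `group_support` and `restrict` are hypothetical helper names.

```python
import numpy as np

def group_support(x, groups, tol=0.0):
    """Indices i of the groups with x_i != 0 (some |x_ij| > tol)."""
    x = np.asarray(x, dtype=float)
    return [i for i, J in enumerate(groups) if np.any(np.abs(x[J]) > tol)]

def restrict(x, A, groups, S):
    """Return x_S and the column submatrix A_S for the groups indexed by S."""
    if S:
        cols = np.concatenate([np.asarray(groups[i], dtype=int) for i in S])
    else:
        cols = np.array([], dtype=int)
    return np.asarray(x, dtype=float)[cols], np.asarray(A)[:, cols]

x = [0.0, 0.0, 1.0, 0.0, 0.0, 2.0]
groups = [[0, 1], [2, 3], [4, 5]]
S = group_support(x, groups)      # groups 1 and 2 are nonzero
```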
For a matrix , we partition it into submatrices , which is the th row of partitioned according to the group structure of , i.e.,
[TABLE]
Because are row vectors, we denote by the -th entry of it. In a similar way with , we denote by the column sub-matrix of consisting of the columns indexed by .
Define by . We state some useful properties for .
Proposition 2.1**.**
The function has the following properties:
- (i)
* and on .* 2. (ii)
* is concave and the following inequality holds,*
[TABLE] 3. (iii)
For any , is -Lipschitz continuous on , i.e., there exists a constant determined by , such that ,
[TABLE]
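The concavity inequality in (ii) is what turns linearization of the regularization term into a majorization. Assuming the power function $\phi(t)=t^q$ with $0<q<1$, the numerical check below illustrates two facts. The first is the tangent-line bound $t^q\le t_0^q+q\,t_0^{q-1}(t-t_0)$ for $t\ge 0$ and $t_0>0$. The second is that $q\,c^{q-1}$ is a valid Lipschitz constant on $[c,\infty)$, in the spirit of (iii). The function names are hypothetical.

```python
import numpy as np

def phi(t, q):
    """Assumed form phi(t) = t^q on t >= 0, with 0 < q < 1."""
    return np.power(t, q)

def tangent_bound(t, t0, q):
    """Tangent line of the concave phi at t0 > 0; it lies above phi on t >= 0."""
    return t0 ** q + q * t0 ** (q - 1.0) * (t - t0)

def lipschitz_constant(c, q):
    """On [c, inf) with c > 0, |phi'(t)| = q t^(q-1) <= q c^(q-1)."""
    return q * c ** (q - 1.0)

t = np.linspace(0.0, 10.0, 1001)
gap = tangent_bound(t, 1.0, 0.5) - phi(t, 0.5)   # nonnegative, zero at t = 1
```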
Lemma 2.2**.**
Let be the m-dimensional vector, the following inequality holds:
[TABLE]
Proof.
Let , then is monotone decreasing by the fact for .
∎
Lemma 2.3**.**
Let , then there exists constant , such that,
[TABLE]
Proof.
For , the result can be verified easily from the norm equivalence in finite dimensional space. For , from [21, Lemma 1], we have
[TABLE]
where is the smallest integer such that . We use the norm equivalence once again to have
[TABLE]
where , and is the relation coefficient of norm equivalence. ∎
3 Motivation and the proposed algorithm
3.1 Subdifferentials and regularity
By the definition of , we have . We also define the norm function for a vector . In order to calculate the subdifferential of the objective function in (1.1), we first give two lemmas.
Lemma 3.1** (Subdifferential).**
Let be an -dimensional vector, we have the following results,
- (i)
For and , the subdifferential is,
[TABLE]
where and means the Cartesian product of sets; 2. (ii)
For , the subdifferential would be
[TABLE]
where
[TABLE]
Proof.
For brevity, denote the set by . In (i), let , which is the regular subdifferential at . By the definition,
[TABLE]
From the equivalence of norms when , we have
[TABLE]
where is a constant. It is sufficient to have
[TABLE]
This is true for any due to . Then the proof is finished by the fact that .
In (ii), for , the function is continuously differentiable at , so the subdifferential is the gradient in this case. For , we first show that . On the one hand, let and ; then the limit inferior holds along the special direction,
[TABLE]
Then we have
[TABLE]
by the differential mean value theorem. So .
On the other hand, we construct function when is in the neighbourhood of :
[TABLE]
where . Then is differentiable at and , . From [29, Proposition 8.5], we have . Here
[TABLE]
to obtain by the arbitrary . Hence .
It remains to show , since the inclusion in the other direction follows from the remark on Definition 9.1.
In fact, suppose , by the definition, there exists and , thus and have the identical support when is sufficiently large. Based on it and from the fact
[TABLE]
we obtain that by the limit process. ∎
The regularity of this function is essential for computing the subdifferential of the sum of two non-smooth norms, i.e., the $\ell_{1,q}$ regularization term and the $\ell_1$/$\ell_\infty$ fidelity term. We state it in the following lemma.
Lemma 3.2** (Regularity).**
Let be the -dimensional vector, then is regular at for .
Proof.
By [29, Corollary 8.11], is regular at if and only if
[TABLE]
In the proof of Lemma 3.1, we showed that the first equality in (3.1) holds. It remains to verify the second equality.
For , we have
[TABLE]
thus the horizon cone is the same set by letting and in Definition 9.2. We can also conclude the horizon subdifferential by the same trick.
For , we have the following from Definition 9.1 and the remark of Definition 9.2:
[TABLE]
due to the boundedness of . ∎
Remark*.*
From [29, Proposition 10.5] for separable functions, the sum function
[TABLE]
is also regular.
The objective function in (1.1) reads
[TABLE]
Since it is bounded below, coercive and continuous, it has at least one minimizer.
Now, we derive the subdifferential of at . From Lemma 3.2 and the remark, we know that is regular. For , is convex and also regular. By [29, Exercise 10.9], we get
[TABLE]
The subdifferential on the first term in (3.3) can be obtained by [29, Proposition 10.5],
[TABLE]
The factors on the right-hand side can be calculated by Lemma 3.1 according to the value of $p$. The subdifferential of the second term in (3.3) can be obtained by the chain rule for the subdifferential of a composite function,
[TABLE]
where the subdifferential of the infinity norm can be derived as follows. From the Danskin-Bertsekas Theorem for subdifferential in [4, Proposition A.22], it holds that
[TABLE]
Hence each entry of an element of this subdifferential, denoted by $\boldsymbol{\eta}_{\mathsf{i},j}(\mathbf{x})$, $\mathsf{i}\in\mathsf{G}$, $j\in J_{\mathsf{i}}$, has the following representation,
[TABLE]
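For reference, a standard description of $\partial\|\mathbf{z}\|_\infty$ is as follows. For $\mathbf{z}\neq\mathbf{0}$, it is the convex hull of $\{\operatorname{sign}(z_j)\mathbf{e}_j : |z_j|=\|\mathbf{z}\|_\infty\}$. For $\mathbf{z}=\mathbf{0}$, it is the $\ell_1$ unit ball. The sketch below returns one element of this set, using equal weights on the active indices, which is the kind of vector that enters the chain rule above. The function name and the tolerance are assumptions.

```python
import numpy as np

def linf_subgradient(z, tol=1e-12):
    """One element of the subdifferential of ||z||_inf.

    For z != 0, any convex combination of sign(z_j) e_j over the active
    indices (|z_j| = ||z||_inf) is a subgradient; equal weights are used here.
    For z = 0, the subdifferential is the l_1 unit ball and 0 is returned.
    """
    z = np.asarray(z, dtype=float)
    g = np.zeros_like(z)
    m = np.max(np.abs(z)) if z.size else 0.0
    if m <= tol:
        return g
    active = np.abs(z) >= m - tol
    g[active] = np.sign(z[active]) / np.count_nonzero(active)
    return g

g = linf_subgradient([3.0, -3.0, 1.0])   # [0.5, -0.5, 0.0]
```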
From the definition of the subdifferential, we have that is a stationary point of (1.1) if and only if
[TABLE]
3.2 A motivating proposition
The following proposition inspires us to design the algorithm in the next section.
Proposition 3.3**.**
Suppose has the group structure . If is sufficiently close to a local minimizer (or a stationary point) of (1.1), then it holds that
[TABLE]
Proof.
We prove (3.8) by contradiction.
As is a local minimizer (or a stationary point) of , the condition (3.7) implies that .
If , that is, for some . For , we have
[TABLE]
Summing up all the absolute values of the two terms in (3.9) for , we have
[TABLE]
the left inequality holds from Lemma 2.2 for and from for .
The right side of (3.10) is uniformly bounded in the neighborhood of , and the bound is independent of . Since can be sufficiently close to , it contradicts (3.10) by .
For and , we have
[TABLE]
Thus, the results can be derived similarly from the uniform boundedness of the sets $\boldsymbol{\eta}_{\mathsf{i}',j}(\mathbf{x}^{\ast})$ in the neighborhood of .
∎
Remark*.*
For the special case $r=2$ of the fidelity term, [14, 21] established lower bound theories, which also inspired our proposition.
3.3 Algorithm
Motivated by Proposition 3.3, we propose to solve the problem (1.1) by an iterative process, which generates a sequence whose group support set is nonincreasing. Suppose that is an approximate solution in the th iteration. In the next iteration, we minimize the objective function only on the group support set of , with the remaining group components being null. This idea yields the following iterative support shrinking algorithm (ISSA).
Initialization: Select .
Iteration: For until convergence:
Set .
Compute by solving
()
where is the distance of at the -th step over the group support set ,
Set
$$\mathbf{x}_{\mathsf{i}}^{(l+1)}=\mathbf{0}\quad\text{for }\mathsf{i}\in\mathsf{G}\setminus\mathsf{S}^{(l)}.$$
To make ISSA more practical, each term can be linearized at . We introduce the following energy functional with proximal linearization:
[TABLE]
where .
We present an inexact iterative support shrinking algorithm with proximal linearization to solve (1.1).
Initialization: Select with or randomly, where is the all one vector.
Iteration: For until convergence:
Set . Set for and fixed for .
Compute by approximately solving
()
such that
(3.13)
with the tolerance error .
Set
$$\mathbf{x}_{\mathsf{i}}^{(l+1)}=\mathbf{0},\quad\mathbf{u}_{\mathsf{i}}^{(l)}(\mathbf{x}^{(l+1)})=0\quad\text{for }\mathsf{i}\in\mathsf{G}\setminus\mathsf{S}^{(l)}.$$
Remark*.*
The condition (3.13) in InISSAPL is motivated by [2, 25]. It allows an inexact inner loop and serves as a guide for selecting the approximate solution of (). Due to the strong convexity of problem (), it can be solved to any given accuracy. Therefore, condition (3.13) in InISSAPL can be satisfied as long as problem () is solved sufficiently accurately.
Remark*.*
From the motivating Proposition 3.3, the starting point should have as large a support as possible. There are two strategies to choose it. One is a nonzero scalar multiple of the all-ones vector, which yields a group lasso at the first step when . The other is a random vector with i.i.d. Gaussian entries (so that a zero group member occurs with probability zero), which gives a weighted group lasso when . Since is not the proximal solution, we also set for the first step of the algorithm. The results of experiments with the two suggested kinds of starting points are given in Section 6.1.
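The sketch below makes the outer loop concrete: a support-shrinking iteration with a linearized $\ell_{2,q}$ term. It rests on simplifying assumptions that are ours, not the paper's:

- $p=2$ and $r=2$, with the model $\lambda\sum_{\mathsf{i}}\|\mathbf{x}_{\mathsf{i}}\|_2^q+\frac12\|A\mathbf{x}-\mathbf{b}\|_2^2$ and a hypothetical weight `lam`;
- a proximal term with a hypothetical parameter `tau`;
- a fixed number of proximal-gradient steps in place of the scaled ADMM inner loop and the inexactness test (3.13).

Starting from the all-ones vector, all linearization weights are equal at the first step, so the first subproblem is a group lasso (plus the proximal term).

```python
import numpy as np

def group_shrink(v, t):
    """Prox of t * ||.||_2 on one group: max(1 - t / ||v||_2, 0) * v."""
    nv = np.linalg.norm(v)
    return np.zeros_like(v) if nv <= t else (1.0 - t / nv) * v

def objective(A, b, x, groups, lam, q):
    """Assumed model: lam * sum_i ||x_i||_2^q + 0.5 * ||A x - b||_2^2."""
    reg = sum(np.linalg.norm(x[J]) ** q for J in groups)
    return lam * reg + 0.5 * np.linalg.norm(A @ x - b) ** 2

def support_shrinking_sketch(A, b, groups, lam=0.1, q=0.5, tau=1e-3,
                             outer=30, inner=200, x0=None):
    """Hypothetical support-shrinking loop with a linearized l_{2,q} term."""
    n = A.shape[1]
    x = np.ones(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    supports, history = [], [objective(A, b, x, groups, lam, q)]
    for _ in range(outer):
        # current group support set S^(l)
        S = [i for i, J in enumerate(groups) if np.linalg.norm(x[J]) > 0]
        supports.append(S)
        if not S:
            break
        cols = np.concatenate([np.asarray(groups[i]) for i in S])
        AS, xl = A[:, cols], x[cols]
        # weights from linearizing t -> t^q at t0 = ||x_i^(l)||_2 (tangent majorizer)
        w = [lam * q * np.linalg.norm(x[groups[i]]) ** (q - 1.0) for i in S]
        offs = np.cumsum([0] + [len(groups[i]) for i in S])
        L = np.linalg.norm(AS, 2) ** 2 + tau      # Lipschitz constant of the smooth part
        y = xl.copy()
        for _ in range(inner):                     # proximal gradient on the subproblem
            v = y - (AS.T @ (AS @ y - b) + tau * (y - xl)) / L
            for k in range(len(S)):
                sl = slice(offs[k], offs[k + 1])
                y[sl] = group_shrink(v[sl], w[k] / L)
        x = np.zeros(n)                            # groups outside S^(l) are set to zero
        x[cols] = y
        history.append(objective(A, b, x, groups, lam, q))
    return x, supports, history
```

Groups outside the current support are reset to zero, so the returned supports are nested. Each proximal-gradient step decreases the subproblem objective, and the tangent-line bound majorizes $t\mapsto t^q$. Together these make the recorded objective values nonincreasing.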
For the convenience of description later, we give the representation of the subdifferential in (3.13) for ,
[TABLE]
where
[TABLE]
and
[TABLE]
4 Convergence analysis
In this section, we establish the global convergence of the sequence generated by the InISSAPL algorithm. Theorem 9.2 in the appendix gives a celebrated theoretical framework for the convergence of sequences generated by descent methods, which has recently found extensive applications [1, 2, 7], especially in non-convex optimization. For our problem, the key issue is to deal with the non-Lipschitz property of the regularization term. In this paper, a lower bound theory for the iterative sequence is developed to overcome this difficulty. Furthermore, due to the non-smoothness of the objective function, constructing a suitable element of its subdifferential to prove the relative error condition (H2) in Theorem 9.2 is more technical.
The iteration process produces a nonincreasing sequence of group support sets, as stated in the following lemma.
Lemma 4.1**.**
The sequence converges in a finite number of iterations, i.e., there exists an integer such that if , then .
Proof.
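Since $\mathsf{G}$ is a finite set and
$$\mathsf{S}^{(l+1)}\subseteq\mathsf{S}^{(l)}\quad\text{for all } l,$$
the sequence $\{\mathsf{S}^{(l)}\}$ converges in a finite number of iterations. ∎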
Next, we verify the conditions (H1)-(H3) in Theorem 9.2 for the sequence of objective function values. (H1) is the sufficient decrease condition, which is given in Lemma 4.2. Here we introduce the energy functional with proximal linearization once again, but defined over :
[TABLE]
Note that it differs from the functional in (3.12) in the fidelity term.
Lemma 4.2**.**
For any and , let be a sequence generated by InISSAPL. Then
- (i)
The sequence is nonincreasing and satisfies
[TABLE] 2. (ii)
The sequence is bounded and satisfies .
Proof.
Due to the fact that , we have
[TABLE]
When and , we obtain
[TABLE]
Let . Then
[TABLE]
where is defined in (3.14) and $\boldsymbol{\eta}_{\mathsf{i},j}(\mathbf{x})$ is defined in (3.6). Since for any , , we have
[TABLE]
Putting (4.3), (4.4) and (4.6) together, we obtain
[TABLE]
With the fact that is bounded from below and , it follows that is nonincreasing and converges to a finite value as . Thus
[TABLE]
Because is coercive, we know that is bounded.
∎
The following lemma is the lower bound theory on the nonzero groups of the iteration sequence, which can be used to overcome the non-Lipschitz property.
Lemma 4.3**.**
There are such that
[TABLE]
Proof.
From Lemma 4.1, for any and , . The sequence has upper bound from Lemma 4.2,
[TABLE]
We now prove by contradiction that has nonzero lower bound for any .
Suppose there exists for some subsequence , still denoted by , such that
[TABLE]
By the subdifferential expression (3.14), we have for , and ,
[TABLE]
with the left term,
[TABLE]
Summing up all the terms for , we have
[TABLE]
where the second inequality holds from the same reason as the motivating proposition (Proposition 3.3). It follows from the boundedness of that is bounded. The condition (3.13) implies that is also bounded. Thus the equation (4.8) is impossible to hold when because of .
∎
By combining Lemma 4.3 and Proposition 2.1, we can obtain the Lipschitz property over the support of group members.
[TABLE]
when .
Using this property, we prove the relative error condition (H2) in Lemma 4.4, where a suitable sequence of subdifferential elements is constructed even though the objective function is non-smooth.
Lemma 4.4**.**
For each , there exists and constant such that
[TABLE]
Proof.
For , the vector in the set of has the form in (3.14),
[TABLE]
Then the intermediate variable is introduced as follows,
[TABLE]
The upper bound of can be measured by the iterative error,
[TABLE]
Noting the difference of and , we specially construct to be the form,
[TABLE]
where $\boldsymbol{\eta}^{(l)}_{\mathsf{i},j}(\mathbf{x}^{(l+1)})$ is the same as the part of and
[TABLE]
Here in $\boldsymbol{\zeta}_{\mathsf{i},j}(\mathbf{x}^{(l+1)})$ is to be defined by the requirement of . On the one hand, by Lemma 3.1 (i) and (3.3)-(3.4), for , and the set is bounded, then belongs to the corresponding entries of the element in . On the other hand, by Lemma 3.1 (ii) and (3.3)-(3.4), for , it can be checked that if satisfies , $\boldsymbol{\zeta}_{\mathsf{i}}(\mathbf{x})$ will be in . Thus also belongs to the corresponding entries of the element in . Therefore, it remains to construct , which is more technical: is determined later by estimating the error between and in the case of . The main idea of constructing is to compare and $\boldsymbol{\zeta}_{\mathsf{i},j}^{(l)}(\mathbf{x}^{(l+1)})\in[-q\|\mathbf{x}_{\mathsf{i}}^{(l)}\|_{1}^{q-1},\,q\|\mathbf{x}_{\mathsf{i}}^{(l)}\|_{1}^{q-1}]$ in (3.14). That is, let , if $\boldsymbol{\zeta}_{\mathsf{i},j}^{(l)}(\mathbf{x}^{(l+1)})\in I$, we choose it. Otherwise, we choose the nearest point in . Hence we choose
[TABLE]
where $\boldsymbol{\zeta}_{\mathsf{i},j}^{(l)}(\mathbf{x}^{(l+1)})$ is the part of . Noting that , we can check that .
After constructing $\boldsymbol{\zeta}_{\mathsf{i},j}(\mathbf{x}^{(l+1)})$, we can now measure the difference between and . We divide this measurement into two cases: and .
[TABLE]
where is also the coefficient of norm equivalence. For , it follows,
[TABLE]
where the first inequality comes from (4.12), (4.13) and (3.14).
Combining (4.11), (4.14) and (4.15) yields:
[TABLE]
where .
∎
(H3) is the continuity condition, which holds naturally. From Appendix 9, the objective function satisfies the KL property. Finally, we establish our main convergence result.
Theorem 4.5**.**
The iterative sequence generated by InISSAPL algorithm converges globally to the limit point , which is a stationary point of problem (1.1).
Proof.
Since the sequence is bounded (Lemma 4.2), there exist a subsequence and a point such that
[TABLE]
By combining (4.2), (4.10) and (4.16), and by Theorem 9.2 in the appendix, the sequence converges globally to the limit point , which is a stationary point of . ∎
5 Algorithm Implementation
Each iteration of the InISSAPL algorithm is essentially a weighted minimization problem. It is convex, and an inexact inner loop is allowed in the implementation. Standard methods such as ADMM [8], the split Bregman method [20, 37] and primal-dual algorithms [11, 19] can solve it efficiently. Here we adopt scaled ADMM.
5.1 Scaled ADMM
At the $l$-th step of InISSAPL, the subproblem () is equivalent to
[TABLE]
over the group support set $\mathsf{S}^{(l)}$. For brevity of notation, we still use boldface to denote the vectors restricted to $\mathsf{S}^{(l)}$ in the following.
Equivalently, we can solve the following constrained optimization problem
[TABLE]
where
[TABLE]
We introduce the penalty parameters (collectively denoted by $\rho$) and the Lagrange multipliers $\boldsymbol{\lambda}$, $\boldsymbol{\mu}$. Then the scaled augmented Lagrangian functional for the weighted problem (5.2) at the $l$-th step is the following:
[TABLE]
The scaled ADMM for solving (5.2) is described as follows. When there is no confusion, we use $i$ to denote the $i$-th iteration step of the inner scaled ADMM loop.
Initialization: Start with $\bar{\mathbf{x}}^{(0)}=\mathbf{x}^{(l)}_{\mathsf{S}^{(l)}}$, $\boldsymbol{\lambda}^{(0)}=\mathbf{0}$, $\boldsymbol{\mu}^{(0)}=\mathbf{0}$.
Iteration: For $i=0,1,2,\ldots$,
Compute
$$(\mathbf{z}^{(i+1)},\mathbf{s}^{(i+1)})=\arg\min_{\mathbf{z},\mathbf{s}}\mathcal{L}^{(l)}_{\rho}(\bar{\mathbf{x}}^{(i)},\mathbf{z},\mathbf{s};\boldsymbol{\lambda}^{(i)},\boldsymbol{\mu}^{(i)}).\tag{5.3}$$
Compute
$$\bar{\mathbf{x}}^{(i+1)}=\arg\min_{\bar{\mathbf{x}}}\mathcal{L}^{(l)}_{\rho}(\bar{\mathbf{x}},\mathbf{z}^{(i+1)},\mathbf{s}^{(i+1)};\boldsymbol{\lambda}^{(i)},\boldsymbol{\mu}^{(i)}).\tag{5.4}$$
Update
(5.5)
(5.6)
5.2 Solving (5.3) and (5.4)
The subproblems (5.3) and (5.4) can be efficiently solved.
- (i)
The minimization subproblem in (5.3) is equivalent to solving
[TABLE]
which can be separated into two independent subproblems.
- (a)
-minimization problem:
[TABLE]
For , we have the explicit solution by [37],
[TABLE]
For , this group problem is separable, the minimizer of it can be also explicitly given by the shrinkage lemma in [36, 32, 33]:
[TABLE]
For general $p$, the problem is strongly convex and can be solved by standard nonlinear numerical methods such as Newton's method.
- (b)
-minimization problem:
[TABLE]
For $r=1$, it is the same problem as the minimization problem in (a) with $p=1$, so we omit it here.
For $r=2$, the solution can be obtained easily,
[TABLE]
For general $r$, we can also use standard nonlinear numerical methods to solve it efficiently.
For $r=\infty$, the minimization problem reads,
[TABLE]
Sorting the entries of the known vector by absolute value in ascending order, the problem is equivalent to solving,
[TABLE]
Its optimal solution can be obtained by Theorem 5.1 in the next subsection,
[TABLE]
where and satisfies (5.10). 2. (ii)
The minimization problem in (5.4) is equivalent to solving
[TABLE]
The optimality condition is the following linear system,
[TABLE]
Its coefficient matrix is symmetric positive definite, so the system can be solved via the inverse (or a factorization) of this matrix.
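The closed-form proximal maps above (soft thresholding for $p=1$ or $r=1$, group shrinkage for $p=2$) and the SPD linear solve can be combined into a complete scaled ADMM loop. The sketch below assumes the concrete weighted subproblem $\min_{\mathbf{x}}\sum_k w_k\|\mathbf{x}_k\|_2+\|A\mathbf{x}-\mathbf{b}\|_1+\frac{\tau}{2}\|\mathbf{x}-\mathbf{x}^{(l)}\|_2^2$, with splittings $\mathbf{z}=A\mathbf{x}-\mathbf{b}$ and $\mathbf{s}=\mathbf{x}$. This form, the parameter names `rho1`, `rho2` and `tau`, and the fixed iteration count are our assumptions; the paper's exact scaling may differ. The update order follows the listing: first $(\mathbf{z},\mathbf{s})$, then $\bar{\mathbf{x}}$, then the scaled multipliers.

```python
import numpy as np

def soft(v, t):
    """Componentwise soft thresholding: prox of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def group_shrink(v, t):
    """Prox of t * ||.||_2 on one group."""
    nv = np.linalg.norm(v)
    return np.zeros_like(v) if nv <= t else (1.0 - t / nv) * v

def scaled_admm_sketch(A, b, groups, w, x_prev, tau=1e-3, rho1=1.0, rho2=1.0, iters=5000):
    """Hypothetical scaled ADMM for
        min_x sum_k w_k ||x_k||_2 + ||A x - b||_1 + (tau/2) ||x - x_prev||_2^2
    with z = A x - b and s = x (p = 2, r = 1 assumed); groups must partition the indices."""
    m, n = A.shape
    x = x_prev.copy()
    lam, mu = np.zeros(m), np.zeros(n)              # scaled multipliers
    M = rho1 * A.T @ A + (rho2 + tau) * np.eye(n)   # SPD matrix of the x-step
    C = np.linalg.cholesky(M)                       # factor once, M = C C^T
    for _ in range(iters):
        # (z, s)-step: separable proximal maps at the current x
        z = soft(A @ x - b + lam, 1.0 / rho1)
        s = np.empty(n)
        for k, J in enumerate(groups):
            s[J] = group_shrink(x[J] + mu[J], w[k] / rho2)
        # x-step: solve M x = rhs with two triangular-structured solves
        rhs = rho1 * A.T @ (b + z - lam) + rho2 * (s - mu) + tau * x_prev
        x = np.linalg.solve(C.T, np.linalg.solve(C, rhs))
        # scaled multiplier updates
        lam += A @ x - b - z
        mu += x - s
    return x, s, z
```

The matrix of the $\bar{\mathbf{x}}$-step does not change across inner iterations, which is why it is factored only once.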
Remark*.*
In fact, when , it is unnecessary to introduce the variable . The scaled ADMM can be simplified in this case.
5.3 The analytical solution for the -problem with infinity norm
Now we consider the equivalent minimization problem for $r=\infty$ in (5.7). It is strongly convex, so it has a unique solution.
Theorem 5.1**.**
Suppose , and the elements of is in ascending order by , then the minimization problem
[TABLE]
has the explicit optimal solution,
[TABLE]
where is a specific element of such that
[TABLE]
holds simultaneously.
Proof.
Suppose . The minimization problem (5.8) can be rewritten more simply as
[TABLE]
We remark here if , the minimizer is when . This is a contradiction. Hence we can replace by , and the minimization problem (5.11) can be modified to be
[TABLE]
In fact, the objective functional is a piecewise continuous function. Letting , we have
[TABLE]
and
[TABLE]
For , the right limit of the derivative of at is,
[TABLE]
similarly, the left limit of the derivative of at is
[TABLE]
Since is continuous at and , is continuously differentiable.
Furthermore, from (5.12), the derivative of the objective is monotonically increasing, hence the objective is convex. Thus setting the derivative to zero gives the optimal solution of the simplified problem (5.11). Let
[TABLE]
If there exists such that , then is the minimizer. Consequently, the optimal solution of problem (5.8) is given by (5.9).
∎
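The explicit solution in Theorem 5.1 can be computed by sorting. The sketch below evaluates $\operatorname{prox}_{t\|\cdot\|_\infty}(\mathbf{v})=\arg\min_{\mathbf{z}}\,t\|\mathbf{z}\|_\infty+\frac12\|\mathbf{z}-\mathbf{v}\|_2^2$ with the standard characterization $z_j=\operatorname{sign}(v_j)\min(|v_j|,\theta)$. Here $\theta>0$ solves $\sum_j\max(|v_j|-\theta,0)=t$, and $\mathbf{z}=\mathbf{0}$ when $\|\mathbf{v}\|_1\le t$. The sketch sorts in descending rather than ascending order. The scalar `t` stands in for the parameters of (5.8), so the exact correspondence is an assumption.

```python
import numpy as np

def prox_linf(v, t):
    """argmin_z  t * ||z||_inf + 0.5 * ||z - v||_2^2  for t > 0, via sorting."""
    v = np.asarray(v, dtype=float)
    a = np.abs(v)
    if a.sum() <= t:                 # v lies in the l_1 ball of radius t: the solution is 0
        return np.zeros_like(v)
    u = np.sort(a)[::-1]             # |v| in descending order
    cs = np.cumsum(u)
    k = np.arange(1, u.size + 1)
    # the largest k with u_k > (cs_k - t) / k determines the clipping level theta
    kmax = k[u > (cs - t) / k][-1]
    theta = (cs[kmax - 1] - t) / kmax
    return np.sign(v) * np.minimum(a, theta)

z = prox_linf([3.0, 1.0, -2.0], 1.0)   # clips at theta = 2: [2.0, 1.0, -2.0]
```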
6 Numerical Experiments
Numerical experiments are reported in this section to show the efficiency of the InISSAPL algorithm. All of them are implemented in Matlab (License ID: 1108635) on a laptop (Intel(R) Core(TM) Duo i5-7200u @2.50GHz 2.70GHz, 4.00GB RAM).
We consider numerical tests on group sparse signal recovery. Let denote the original group sparse signal, which is generated by randomly splitting its components into groups. For each nonzero group member, its entries are randomly generated as i.i.d. Gaussian. Suppose that is randomly generated by an i.i.d. Gaussian ensemble. We let be the row-orthogonalized matrix of , computed by in Matlab. Then the measurement is obtained by
[TABLE]
where is the noise level and represents one of three popular types of noise: Laplace noise, Gaussian noise and uniform noise.
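The data generation just described can be sketched in NumPy as follows. This is a hypothetical re-implementation, not the authors' Matlab code. In particular, the following are our assumptions:

- the function name and its arguments;
- unit-scale noise distributions (Laplace with scale 1, uniform on $[-1,1]$);
- the use of a QR factorization to orthonormalize the rows of the Gaussian matrix.

```python
import numpy as np

def make_group_sparse_problem(m, n, group_size, s, sigma, noise="gaussian", seed=0):
    """Hypothetical generator: random uniform-size groups, s nonzero Gaussian groups,
    a Gaussian matrix with orthonormalized rows, and b = A x + sigma * noise."""
    assert n % group_size == 0
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)                      # randomly split the components
    groups = [perm[k:k + group_size] for k in range(0, n, group_size)]
    x = np.zeros(n)
    for i in rng.choice(len(groups), size=s, replace=False):
        x[groups[i]] = rng.standard_normal(group_size)
    B = rng.standard_normal((m, n))
    Q, _ = np.linalg.qr(B.T)                       # n x m with orthonormal columns
    A = Q.T                                        # m x n with orthonormal rows
    if noise == "gaussian":
        e = rng.standard_normal(m)
    elif noise == "laplace":
        e = rng.laplace(size=m)
    elif noise == "uniform":
        e = rng.uniform(-1.0, 1.0, size=m)
    else:
        raise ValueError(f"unknown noise type: {noise}")
    b = A @ x + sigma * e
    return A, b, x, groups

A, b, x, groups = make_group_sparse_problem(64, 256, 4, 5, 0.01, noise="laplace")
```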
We denote by the number of nonzero groups of the original signal . Then the sparsity level is defined by . For simplicity, we consider the uniform group partitions that we have the same group size, denoted by . Define the relative recovery error by
[TABLE]
In our numerical experiments, unless otherwise mentioned, we set for the problem size, for the noise level and for the uniform group size. A recovery is regarded as successful when the relative error is less than . For the stopping criterion of the inner scaled ADMM loop in InISSAPL, we use the same criterion as in [8] by setting , where
[TABLE]
with
[TABLE]
[TABLE]
We adopt the stopping criterion for the outer iteration. The maximal iteration numbers are set to MAXit=1000 in the ADMM and MAX=100 in the outer iteration.
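Here is a minimal sketch of the evaluation quantities. It assumes that the relative error is measured in the Euclidean norm and that the inner ADMM loop uses the common primal/dual residual test for a constraint $A\mathbf{x}+B\mathbf{z}=\mathbf{c}$ with absolute and relative tolerances. The function names, the success threshold and the tolerance values are placeholders chosen for illustration, not the values used in the paper.

```python
import numpy as np

def relative_error(x_rec, x_true):
    """Relative recovery error in the Euclidean norm (assumed choice)."""
    x_rec, x_true = np.asarray(x_rec, dtype=float), np.asarray(x_true, dtype=float)
    return np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true)

def success_rate(errors, threshold):
    """Fraction of trials counted as successful (relative error below `threshold`)."""
    return float(np.mean(np.asarray(errors, dtype=float) < threshold))

def admm_converged(r_pri, r_dual, Ax, Bz, c, ATy, eps_abs=1e-4, eps_rel=1e-3):
    """Residual-based ADMM stopping test for a constraint A x + B z = c.

    r_pri = A x + B z - c is the primal residual, r_dual the dual residual,
    and ATy = A^T y for the unscaled multiplier y.
    """
    p, n = len(r_pri), len(r_dual)
    eps_pri = np.sqrt(p) * eps_abs + eps_rel * max(
        np.linalg.norm(Ax), np.linalg.norm(Bz), np.linalg.norm(c))
    eps_dual = np.sqrt(n) * eps_abs + eps_rel * np.linalg.norm(ATy)
    return bool(np.linalg.norm(r_pri) <= eps_pri and np.linalg.norm(r_dual) <= eps_dual)
```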
6.1 Experiments on the initialization of the InISSAPL
We report the results of experiments when the different starting points are chosen in InISSAPL algorithm. The first kind of starting points are with . We choose in the test. By setting for Gaussian noise, we compute the relative errors . The second kind of starting points are randomly generated as i.i.d. Gaussian. We compute the average relative error of 1000 different starting points for the same problem setting as in the first kind.
The experiments are performed for different signal recovery problems with three sensing matrices and three sparsity cases . The comparisons are displayed in Table 1.
Table 1 shows that the InISSAPL algorithm is effective and insensitive to the choice between the two suggested kinds of starting points, even for the less sparse case . Based on this, we use the all-ones vector as the starting point in the following experiments.
The InISSAPL algorithm covers many cases for different choices of . We discuss them separately in the following subsections.
6.2 Accessible to diversity of noise
Our algorithm is applicable to different types of noise. Here we fix and noise level to show the performance for three kinds of noise, Laplace noise, Gaussian noise, and uniform distribution noise.
For each type of noise, Table 2 compares the relative errors obtained when the fidelity term uses different norms. It clearly shows that is best for Laplace noise, is best for Gaussian noise and is best for uniform noise.
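A standard way to see why the fidelity norm should match the noise is maximum likelihood. Under i.i.d. Laplace noise, the negative log-likelihood is (up to constants) the ℓ1 norm of the residual. Under Gaussian noise, it is the squared ℓ2 norm. Under uniform noise with an unknown bound, the maximum-likelihood estimate minimizes the ℓ∞ norm. We assume this textbook correspondence is what the comparison reflects. The toy sketch below uses the simplest model y_i = c + e_i, whose minimizers have closed forms: median (ℓ1), mean (ℓ2) and midrange (ℓ∞). It is an illustration only, not the paper's experiment.

```python
import numpy as np

def fit_constant(y, r):
    """Minimize ||y - c * 1||_r over the scalar c (closed forms for r = 1, 2, inf)."""
    y = np.asarray(y, dtype=float)
    if r == 1:
        return float(np.median(y))              # least absolute deviations
    if r == 2:
        return float(np.mean(y))                # least squares
    if r == np.inf:
        return float((y.min() + y.max()) / 2)   # minimax fit: midrange
    raise ValueError("r must be 1, 2 or np.inf")

def residual_norm(y, c, r):
    """||y - c * 1||_r."""
    return float(np.linalg.norm(np.asarray(y, dtype=float) - c, ord=r))

rng = np.random.default_rng(0)
true_c, m = 1.0, 2001
noises = {
    "laplace": rng.laplace(0.0, 0.1, m),
    "gaussian": rng.normal(0.0, 0.1, m),
    "uniform": rng.uniform(-0.1, 0.1, m),
}
for name, e in noises.items():
    y = true_c + e
    errs = {r: abs(fit_constant(y, r) - true_c) for r in (1, 2, np.inf)}
    print(name, errs)
```

For a single noise realization, the matched norm usually, but not always, gives the smallest error. The advantage shows up on average over many trials.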
6.3 Choice of and
We numerically study the influence of the parameters in the regularization term on InISSAPL. First, letting , we test the algorithm with varying among . Figure 1 shows the rate of success versus the sparsity level. The algorithm performs best when , which is consistent with the numerical results in [21, 34].
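To make the roles of the two regularization parameters concrete, here is a small sketch of the group penalty. It assumes the usual ℓ_{p,q} form, the sum over groups of ||x_g||_p^q, with uniform consecutive groups. Any weighting used in the paper's model is not reproduced, and the function name is hypothetical.

```python
import numpy as np

def lpq_regularizer(x, group_size, p, q):
    """Sum over consecutive groups of ||x_g||_p ** q (assumed form of the l_{p,q} term)."""
    x = np.asarray(x, dtype=float)
    if x.size % group_size != 0:
        raise ValueError("length of x must be a multiple of group_size")
    groups = x.reshape(-1, group_size)                   # one row per group
    group_norms = np.linalg.norm(groups, ord=p, axis=1)  # inner l_p norm of each group
    return float(np.sum(group_norms ** q))               # outer q-th power sum
```

Here p controls how entries within a group are combined, and q controls how groups are penalized. For `x = [3, 4, 0, 0]` with groups of size 2, `p = 2, q = 0.5` gives √5. As q decreases toward 0, the value approaches the number of nonzero groups (here 1).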
Secondly, we examine the commonly used values and for the three kinds of noise with . As suggested in Subsection 6.2, we use for Laplace noise, for Gaussian noise and for uniform noise, respectively. Figure 2 compares the rates of success at different sparsity levels. The rate of success with is higher than that with for Laplace noise, and the opposite holds for Gaussian noise. For uniform noise, there is no essential numerical difference between and . These results suggest that the preferable value depends on the specific model.
6.4 Sensitivity analysis on group size
In this subsection, we study the sensitivity of our algorithm to the group size. We run experiments to show the rate of success for different group sizes () under three types of noise. As before, we set for Laplace noise, for Gaussian noise and for uniform noise. The results are given in Figure 3 with and . They show that the larger the group size, the higher the rate of success. This is expected, because larger groups carry more structural information.
6.5 Comparison with some state-of-the-art algorithms
We compare the InISSAPL algorithm with existing algorithms for the group sparse model, namely PGM-GSO [21] and the convex Group Lasso [8]. The code of the PGM-GSO algorithm (available online at https://CRAN.R-project.org/package=GSparO) requires an additional input: the number of nonzero groups . In our experiments, PGM-GSO denotes their algorithm supplied with the EXACT number of nonzero groups of the ground truth. Since this number is hard to know exactly in applications, we also run further tests with an estimated value close to the true value , with . PGM-GSO with the estimated value is called e-PGM-GSO. Figure 4 compares the rates of success, with the parameters set to for Gaussian noise. The rates of success of PGM-GSO (with the exact number of nonzero groups of the ground truth) and our InISSAPL are similar, and both are considerably higher than those of e-PGM-GSO and Group Lasso. Note that our InISSAPL does NOT require the number of nonzero groups as an input.
For the competitive algorithms InISSAPL, PGM-GSO, and e-PGM-GSO, Table 3 compares the running time and relative error for problems of different sizes. InISSAPL is more efficient than the other two algorithms, especially for larger problems, because its computation is carried out only on the shrinking group support set.
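The efficiency argument can be made concrete with a bookkeeping sketch. Assume, as the support-shrinking design suggests, that a group that has become zero stays zero. Then its columns of the sensing matrix can be dropped, and each later inner solve works with a smaller matrix. The function name and the threshold `tol` are hypothetical, and the actual InISSAPL update rule is not reproduced here.

```python
import numpy as np

def shrink_support(A_full, x, support, group_size, tol=0.0):
    """Drop active groups whose entries in x are all (numerically) zero.

    A_full:  full sensing matrix (all original columns).
    x:       current iterate restricted to the active groups, stored group by group.
    support: original indices of the active groups, in the same order as in x.
    Returns the reduced matrix, the reduced iterate and the new support.
    """
    x_groups = x.reshape(-1, group_size)
    keep = np.linalg.norm(x_groups, axis=1) > tol
    new_support = [g for g, k in zip(support, keep) if k]
    if new_support:
        cols = np.concatenate([np.arange(g * group_size, (g + 1) * group_size)
                               for g in new_support])
    else:
        cols = np.array([], dtype=int)
    return A_full[:, cols], x_groups[keep].ravel(), new_support
```

The product of the reduced matrix and the reduced iterate equals `A_full @ x` for the full-length iterate, while the number of unknowns drops with every removed group.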
7 Conclusions
The group sparse $\ell_{p,q}$-$\ell_r$ model is very useful in many applications. The InISSAPL algorithm provides a unified framework to deal with all the cases of parameters . To prove the global convergence of the algorithm via the KL property, we develop a lower bound theory for the nonzero groups of the iterative sequence, which circumvents the non-Lipschitz difficulty, and we construct a sophisticated subdifferential formula. As the iterations proceed, the number of unknowns decreases, and the remaining unknowns are computed by the scaled ADMM in the inner loop. The algorithm is therefore especially efficient for large-scale problems. Numerical experiments and comparisons demonstrate the good performance of our algorithm.
In future work, the model and algorithm can be extended to other applications with overlapping group structures, such as gene expression data and patch patterns in image processing.
8 Acknowledgements
We greatly appreciate helpful discussions with Xue Feng, and thank the authors of [21] for making their code available online at https://CRAN.R-project.org/package=GSparO.
9 Appendix
We first recall the basic definitions of the subdifferential and the horizon cone from [29].
Definition 9.1 (Subdifferentials).
Let be a proper, lower semicontinuous function.
- (i) The regular subdifferential of at is defined as
[TABLE]
- (ii) The (limiting) subdifferential of at is defined as
[TABLE]
- (iii) The horizon subdifferential of at is defined as
[TABLE]
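For the reader's convenience, the standard forms of these three objects in [29] are recalled below for a point x̄ where f is finite. The letters f, x̄, v, and the notation x^ν →_f x̄ (meaning x^ν → x̄ and f(x^ν) → f(x̄)) are our own. The displays of the paper may use different symbols.

```latex
\hat\partial f(\bar x) = \Big\{ v \in \mathbb{R}^n :
  \liminf_{x \to \bar x,\; x \neq \bar x}
  \frac{f(x) - f(\bar x) - \langle v, x - \bar x \rangle}{\|x - \bar x\|} \ge 0 \Big\},
\qquad
\partial f(\bar x) = \Big\{ v : \exists\, x^\nu \xrightarrow{f} \bar x,\;
  v^\nu \in \hat\partial f(x^\nu),\; v^\nu \to v \Big\},
\qquad
\partial^\infty f(\bar x) = \Big\{ v : \exists\, x^\nu \xrightarrow{f} \bar x,\;
  v^\nu \in \hat\partial f(x^\nu),\; \lambda^\nu \downarrow 0,\; \lambda^\nu v^\nu \to v \Big\}.
```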
Remark.
From Definition 9.1, the following properties hold:
- (i) For any , . If is continuously differentiable at , then ;
- (ii) For any , the subdifferential set is closed, i.e.,
[TABLE]
Definition 9.2 (Horizon cone).
For a set , the horizon cone is the closed cone given by
[TABLE]
Remark.
A set is bounded if and only if its horizon cone is just the zero cone: .
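In the notation of [29] (the letters C, λ^ν, x^ν are our choice), the horizon cone of a nonempty set C ⊂ ℝ^n is shown below. For example, C = [0,1] × ℝ is unbounded and has horizon cone {0} × ℝ, in agreement with the remark.

```latex
C^\infty = \big\{ x \in \mathbb{R}^n : \exists\, x^\nu \in C,\; \lambda^\nu \downarrow 0,\;
  \lambda^\nu x^\nu \to x \big\},
\qquad
C \text{ bounded} \iff C^\infty = \{0\}.
```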
Secondly, the Kurdyka-Łojasiewicz (KL) property [26, 22] is a useful tool for establishing the convergence of bounded sequences. It covers a wide range of problems [2].
Definition 9.3 (Kurdyka-Łojasiewicz Property).
[1] A proper function is said to have the Kurdyka-Łojasiewicz property at if there exist , a neighborhood of , and a continuous concave function such that
- (i) ;
- (ii) is on ;
- (iii) for all , ;
- (iv) for all satisfying , the Kurdyka-Łojasiewicz inequality holds:
[TABLE]
where ,
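For reference, the definition in [1] reads as follows, with our own letters η, U, φ. A one-line check: f(x) = x² has the KL property at x̄ = 0 with φ(s) = √s, because φ'(x²) · |f'(x)| = (1/(2|x|)) · 2|x| = 1 for x ≠ 0.

```latex
\exists\, \eta \in (0, +\infty],\ U \ni \bar x,\ \varphi : [0, \eta) \to \mathbb{R}_+ \text{ continuous, concave}:
\quad \varphi(0) = 0, \quad \varphi \in C^1(0, \eta), \quad \varphi'(s) > 0 \ \ \forall s \in (0, \eta),

\varphi'\big(f(x) - f(\bar x)\big)\, \operatorname{dist}\big(0, \partial f(x)\big) \ge 1
\quad \forall x \in U \text{ with } f(\bar x) < f(x) < f(\bar x) + \eta,

\operatorname{dist}\big(0, \partial f(x)\big) = \inf\{ \|v\| : v \in \partial f(x) \}.
```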
A proper, lower semicontinuous function satisfying the KL property at all points in is called a KL function. One can refer to [2, 7] for examples of KL functions and the application of KL property in optimization theory.
Recently, the nonsmooth KL property has been extended to functions definable in an o-minimal structure; see [22, 18, 1, 6] and the references therein. The following definitions and theorem are based on these works.
Definition 9.4.
[1] Let be such that each is a collection of subsets of . The family is an o-minimal structure over if it satisfies the following axioms:
- (i) Each is a Boolean algebra. Namely and for each , , and belong to .
- (ii) For all , and belong to .
- (iii) For all , belongs to .
- (iv) For all in , belong to .
- (v) The set belongs to .
- (vi) The elements of are exactly finite unions of intervals.
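For reference, the axioms as stated in [1] are written out below. The letters 𝒪 = {𝒪_n}, A, B and Π are our notation. Here 𝒪_n is a collection of subsets of ℝ^n, and Π denotes the projection onto the first n coordinates.

```latex
\text{(i)}\ \ \emptyset \in \mathcal{O}_n;\ \ A, B \in \mathcal{O}_n \Rightarrow
  A \cup B,\ A \cap B,\ \mathbb{R}^n \setminus A \in \mathcal{O}_n;
\qquad
\text{(ii)}\ \ A \in \mathcal{O}_n \Rightarrow A \times \mathbb{R},\ \mathbb{R} \times A \in \mathcal{O}_{n+1};

\text{(iii)}\ \ A \in \mathcal{O}_{n+1} \Rightarrow \Pi(A) \in \mathcal{O}_n,\quad
  \Pi(x_1, \dots, x_{n+1}) = (x_1, \dots, x_n);

\text{(iv)}\ \ \{ x \in \mathbb{R}^n : x_i = x_j \} \in \mathcal{O}_n \ \ \forall\, i \neq j;
\qquad
\text{(v)}\ \ \{ (x_1, x_2) \in \mathbb{R}^2 : x_1 < x_2 \} \in \mathcal{O}_2;

\text{(vi)}\ \ \mathcal{O}_1 = \{ \text{finite unions of intervals} \}.
```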
Definition 9.5.
[1] Fix an o-minimal structure over . A set is said to be definable (in ) if belongs to . A function is said to be definable in if its graph belongs to .
Definable functions have the following properties:
- finite sums of definable functions are definable;
- compositions of definable functions are definable;
- function of is definable if and the set are definable.
As an example [18, 1], there exists an o-minimal structure containing the graph of , which is given by
[TABLE]
Theorem 9.1.
[1] Any proper lower semicontinuous function that is definable in an o-minimal structure has the Kurdyka-Łojasiewicz property at each point of .
The objective function in this paper is built from definable functions through finite sums and compositions, so it is definable by the properties listed after Definition 9.5. By Theorem 9.1, it therefore satisfies the KL property.
The following theorem gives a general and important theoretical framework for the convergence of sequences. It has been widely applied in recent work [2, 7].
Theorem 9.2.
[2, 7] Let be a proper lower semicontinuous function. Consider a sequence that satisfies
(H1). (Sufficient decrease condition). For each ,
[TABLE]
(H2). (Relative error condition). For each , there exists such that
[TABLE]
(H3). (Continuity condition). There exists a subsequence and such that
[TABLE]
If has the KL property at the cluster point specified in (H3), then the sequence converges to as and is a critical point.
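For reference, the three conditions as stated in [2] are given below. Here a, b > 0 are fixed constants, and the letters f, x^k, w^{k+1}, x̃ are our notation. Under the KL property at x̃, the conclusion in [2] also includes finite length of the trajectory, i.e. the sum of ||x^{k+1} - x^k|| over k is finite.

```latex
\text{(H1)}\quad f(x^{k+1}) + a\,\|x^{k+1} - x^k\|^2 \le f(x^k);

\text{(H2)}\quad \exists\, w^{k+1} \in \partial f(x^{k+1}) \ \text{with}\
  \|w^{k+1}\| \le b\,\|x^{k+1} - x^k\|;

\text{(H3)}\quad \exists\, \{x^{k_j}\} \subset \{x^k\},\ \tilde x:\quad
  x^{k_j} \to \tilde x \ \text{and}\ f(x^{k_j}) \to f(\tilde x) \ \ (j \to \infty).
```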
- [1] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res., 35(2):438–457, 2010.
- [2] H. Attouch, J. Bolte, and B. F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program., 137(1-2):91–129, 2013.
- [3] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci., 2(1):183–202, 2009.
- [4] D. P. Bertsekas. Control of uncertain systems with a set-membership description of the uncertainty. PhD thesis, May 1971.
- [5] W. Bian and X. Chen. Worst-case complexity of smoothing quadratic regularization methods for non-Lipschitzian optimization. SIAM J. Optim., 23(3):1718–1741, 2013.
- [6] J. Bolte, A. Daniilidis, A. Lewis, and M. Shiota. Clarke subgradients of stratifiable functions. SIAM J. Optim., 18(2):556–572, 2007.
- [7] J. Bolte, S. Sabach, and M. Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program., 146(1-2):459–494, 2014.
- [8] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1):1–122, 2011.
