Research Report: Exact biconvex reformulation of the $\ell_2-\ell_0$ minimization problem
Arne Bechensteen, Laure Blanc-Féraud, Gilles Aubert

TL;DR
This paper introduces an exact biconvex reformulation of the $\ell_2-\ell_0$ minimization problem, enabling improved algorithms for sparse optimization with applications in microscopy.
Contribution
It presents a novel exact biconvex reformulation of the $\ell_0$ minimization problem, facilitating more effective optimization algorithms.
Findings
The reformulation is exact and biconvex, preserving solutions.
The proposed algorithm outperforms Iterative Hard Thresholding.
Application to microscopy demonstrates improved results.
Abstract
We focus on the minimization of the least-squares loss function either under a $k$-sparse constraint or with a sparse penalty term. Based on recent results, we reformulate the $\ell_0$ pseudo-norm exactly as a convex minimization problem by introducing an auxiliary variable. We then propose an exact biconvex reformulation of the $\ell_2-\ell_0$ constrained and penalized problems. We give correspondence results between minimizers of the initial function and the reformulated ones. The reformulation is biconvex and the non-convexity is due to a penalty term. These two properties are used to derive a minimization algorithm. We apply the algorithm to the problem of single-molecule localization microscopy and compare the results with the well-known Iterative Hard Thresholding algorithm. Visually and numerically the biconvex reformulations perform better.
Table 1: Jaccard index (%) of each method for several localization tolerances.

| Method | 50 nm | 100 nm | 150 nm | 200 nm |
|---|---|---|---|---|
| IHT - Constrained | 20.0 | 39.4 | 48.9 | 54.3 |
| Biconvex - Constrained | 20.0 | 48.3 | 61.4 | 67.7 |
| IHT - Penalized | 13.1 | 31.2 | 35.7 | 38.0 |
| Biconvex - Penalized | 18.1 | 40.0 | 50.0 | 54.7 |
Arne Bechensteen, Laure Blanc-Féraud: Université Côte d’Azur, CNRS, INRIA, Laboratoire I3S UMR 7271, 06903 Sophia Antipolis, France
Gilles Aubert: Université Côte d’Azur, UNS, Laboratoire J. A. Dieudonné UMR 7351, 06100 Nice, France
1 Introduction
Sparse optimization consists in finding a solution with many zero components to an underdetermined problem. Such solutions arise in many applications (e.g., machine learning, variable selection, and pulse deconvolution). The most common way to measure the sparsity of a solution is the counting function which is, by abuse of terminology, referred to as the $\ell_0$-norm, and is defined as
$$\|x\|_0 = \#\{x_i,\ i = 1, \dots, N : x_i \neq 0\}.$$
In this paper, we are interested in linear problems where the observation $d$ can be described as the product of a matrix $A$ with the solution $x$, plus some noise $\eta$, which we assume to be additive white Gaussian and independent of the data:
$$d = Ax + \eta.$$
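This problem is underdetermined when $A$ has fewer rows than columns. In sparse optimization involving the square norm, there are three different formulations of the problem. We search for a solution of one of:
$$\text{(C)} \quad \min_x \tfrac{1}{2}\|Ax - d\|_2^2 \quad \text{s.t.} \quad \|x\|_0 \le k$$
$$\text{(P)} \quad \min_x \tfrac{1}{2}\|Ax - d\|_2^2 + \lambda \|x\|_0$$
$$\text{(1)} \quad \min_x \|x\|_0 \quad \text{s.t.} \quad \|Ax - d\|_2^2 \le \epsilon$$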
Problems (C) and (1) are in constrained form. For problem (C), the user knows the sparsity of the solution, which is at most $k$. For problem (1), the user has prior knowledge of the amount of noise that has affected the signal; this amount can usually be estimated from the data and the noise statistics.
In the case of problem (P), also referred to as the penalized form, the user has no information on the sparsity of the solution or on the noise that has affected the signal. Therefore, the user must choose the amplitude of $\lambda$, which serves as a trade-off parameter between the data term and the sparsity term. If $\lambda$ is large, the reconstruction of $x$ will be sparse, but the difference between $Ax$ and $d$ may be large. Conversely, if $\lambda$ is small, the error between $Ax$ and $d$ is small, but the reconstructed $x$ may not be sparse.
The above problems are neither continuous nor convex, and they are known to be NP-hard due to the combinatorial nature of the $\ell_0$-norm. However, they have been widely studied because of their many applications, such as sparse reconstruction of signals, variable selection, and single-molecule localization microscopy, to cite a few. There are two main approaches to solving them: greedy algorithms and relaxations. A third approach has recently been introduced, in which the problem is written as a mathematical program with equilibrium constraint.
Greedy algorithms Greedy algorithms are often used in sparse optimization. The idea behind these algorithms is to start from a zero initialization and, at each iteration, add one component to the signal until the desired sparsity is obtained. One of the simplest and least costly greedy algorithms, the Matching Pursuit (MP) algorithm mallat_matching_1993 , adds at each iteration the component that most reduces the residual, that is, the difference between the observation and its current approximation.
The Orthogonal Matching Pursuit (OMP), proposed in pati_orthogonal_1993 , is a more refined version of MP. The algorithm chooses each component in the same way as MP, but each time a component is added, it recomputes and updates the values of all previously selected components. This may lead to a better result, but at a higher computational cost than MP.
Greedy algorithms have been widely studied, and many variants of the above algorithms have been developed. More complex ones, such as Greedy Sparse Simplex beck_sparsity_2012 or Single Best Replacement (SBR) soussen2011bernoulli , have been introduced. In contrast to MP and OMP, these algorithms can both add and subtract components.
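To make the OMP description concrete, here is a minimal NumPy sketch. It is an illustration only, not the implementation of the cited papers: the function name `omp`, the use of `numpy.linalg.lstsq` for the re-fitting step, and the symbols `A`, `d`, `k` are assumptions chosen for this example.

```python
import numpy as np

def omp(A, d, k):
    """Orthogonal Matching Pursuit sketch: greedily select k columns of A to fit d."""
    n = A.shape[1]
    support = []
    x = np.zeros(n)
    residual = np.asarray(d, dtype=float).copy()
    for _ in range(k):
        # Select the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect a chosen column
        support.append(int(np.argmax(correlations)))
        # Re-fit ALL selected coefficients by least squares (the "orthogonal" step).
        coeffs, *_ = np.linalg.lstsq(A[:, support], d, rcond=None)
        x = np.zeros(n)
        x[support] = coeffs
        residual = d - A @ x
    return x

# Small example with orthonormal columns, where recovery is exact.
A_demo = np.eye(6)
x_true = np.array([0.0, 3.0, 0.0, 0.0, -2.0, 0.0])
x_omp = omp(A_demo, A_demo @ x_true, k=2)
```

Plain MP would skip the least-squares re-fit and only update the newly selected coefficient, which is cheaper per iteration.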
Relaxations The three formulations of the sparse optimization problem ((P), (C) and (1)) are non-convex, due to the non-convexity of the $\ell_0$-norm. A common alternative is to work with the convex $\ell_1$-norm instead of the non-convex $\ell_0$-norm. The problem becomes an $\ell_2-\ell_1$ problem. This is called a convex relaxation since the non-convex term is replaced by a convex term. However, the original problem and the convex relaxed problem have the same solutions only under certain assumptions candes_robust_2006 . Furthermore, $\|x\|_1$ and $\|x\|_0$ are very different when $x$ contains large values. Non-smooth, non-convex but continuous relaxations were primarily introduced to reduce this difference. These relaxations are still non-convex, and the convergence of the algorithms to a global minimum is not assured. There are many non-convex continuous relaxations, such as the Nonnegative Garrote breiman_better_1995 , the Log-Sum penalty candes_enhancing_2007 , or Capped-$\ell_1$ peleg_bilinear_2008 , to mention some. The Continuous Exact $\ell_0$ penalty introduced in soubies2015continuous proposes an exact relaxation for problem (P), and a unified view of these functions is given in soubies2017unified .
Mathematical program with equilibrium constraint A more recent way to solve a sparse optimization problem is to introduce auxiliary variables that reproduce the behavior of the $\ell_0$-norm and to add a constraint linking the primary and auxiliary variables. The problem then becomes a mathematical program with equilibrium constraint. Among these approaches, we find mixed-integer reformulations bourguignon2016exact , Boolean relaxation pilanci2015sparse , and the article that inspired our work, yuan_sparsity_2016 . The method has been used to study the three formulations of the sparse optimization problem (see bi2014exact ; lu2013sparse , for example).
Contribution: The aim of this paper is to present and study a new method for optimizing the constrained (C) and penalized (P) problems with an added positivity constraint. The added positivity constraint is important in many sparse optimization problems. We start in Section 2 by introducing a reformulation of the $\ell_0$-norm through a variational characterization. The norm is rewritten as a convex minimization problem by introducing an auxiliary variable, and we can reformulate (C) and (P) as a mathematical program with equilibrium constraint (MPEC). The reformulation of the $\ell_0$-norm was presented in yuan_sparsity_2016 , and our work extends theirs, as they only study the minimization of a Lipschitz continuous data term under a sparsity constraint. In this paper, the data term is the square norm of the error, which is not Lipschitz continuous, and we study the minimization with a sparsity constraint (problem (C)) and with a sparse penalty term (problem (P)). Based on the MPEC formulation of the problem, we define a Lagrangian cost function $G_\rho$. The function $G_\rho$ is biconvex. The main contributions of the paper are Theorem 2.1 for the constrained version of $G_\rho$ and Theorem 2.2 for the penalized version of $G_\rho$. We show that minimizing $G_\rho$ is equivalent, in the sense of minimizers, to solving the initial constrained or penalized problem. In Section 3 we propose an algorithm to minimize the new objective function. This algorithm is easy to implement as it is based on existing and well-known algorithms. In Section 4 we test the algorithms on the problem of single-molecule localization microscopy (SMLM). This is a well-studied problem sage2015quantitative where the goal is to localize the molecules with high precision.
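Notations:
- $\|\cdot\| = \|\cdot\|_2$, the $\ell_2$-norm. If another norm is applied, this will be denoted with a subscript.
- The function $\|x\|_0 = \#\{x_i,\ i = 1, \dots, N : x_i \neq 0\}$ will be, by abuse of terminology, referred to as the $\ell_0$-norm.
- The observed signal $d \in \mathbb{R}^M$.
- $A$ is a matrix in $\mathbb{R}^{M \times N}$.
- $A^T$ is the transposed matrix of $A$.
- For a matrix $A$, the singular value decomposition (SVD) of $A$ is noted $A = U \Sigma V^T$.
- For a matrix $A$, we denote by $\|A\|$ the spectral norm of $A$, defined as $\|A\| = \sup_{x \neq 0} \|Ax\|_2 / \|x\|_2 = \sigma(A)$, where $\sigma(A)$ is the largest singular value of $A$.
- If not stated otherwise, the vector $x \in \mathbb{R}^N$.
- The indicator function $\chi_X$ is defined for $X \subset \mathbb{R}^N$ as $\chi_X(x) = 0$ if $x \in X$ and $\chi_X(x) = +\infty$ otherwise.
- The subgradient of the convex function $f$ at point $x$ is the set of vectors $v$ such that $f(z) \ge f(x) + \langle v, z - x \rangle$ for all $z$.
- The normal cone of a convex set $C$ in $\mathbb{R}^N$ at $x \in C$ is defined as $N_C(x) = \{\eta : \langle \eta, y - x \rangle \le 0 \ \ \forall y \in C\}$.
- $-1 \le u \le 1$ is a component-wise notation, i.e., $-1 \le u_i \le 1$ for all $i$.
- $|x|$ is a vector containing the absolute value of each component of the vector $x$.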
2 Exact reformulation
In this section we focus on a reformulation of the $\ell_0$-norm. This reformulation was first introduced in yuan_sparsity_2016 . The $\ell_0$-norm can be rewritten as a convex minimization problem by introducing an auxiliary variable.
Lemma 1.
(yuan_sparsity_2016, Lemma 1) For any $x \in \mathbb{R}^N$,
$$\|x\|_0 = \min_{-1 \le u \le 1} \|u\|_1 \quad \text{s.t.} \quad \|x\|_1 = \langle u, x \rangle.$$
Proof.
We consider first the problem
[TABLE]
The equality constraint and yields
[TABLE]
As we minimize , if then . We have that . Furthermore, since , we have . So the constraint is equivalent to which is exactly . ∎∎
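The characterization in Lemma 1 can be checked numerically on a small vector. The sketch below assumes the lemma in its usual form, $\|x\|_0 = \min_{-1 \le u \le 1} \|u\|_1$ subject to $\|x\|_1 = \langle u, x \rangle$, and uses the candidate $u = \mathrm{sign}(x)$; the variable names are chosen for this illustration only.

```python
import numpy as np

x = np.array([0.0, -2.5, 0.0, 1.0, 4.0])

# Candidate minimizer: u_i = sign(x_i), so u_i = 0 wherever x_i = 0.
u = np.sign(x)

# Feasibility: the equality constraint ||x||_1 = <u, x> holds.
gap = np.sum(np.abs(x)) - u @ x

# Objective value: ||u||_1 counts the non-zero entries of x.
objective = np.sum(np.abs(u))

# Shrinking u on the support of x breaks the equality constraint,
# which is why |u_i| = 1 is forced wherever x_i != 0.
u_smaller = u.copy()
u_smaller[1] = -0.5
gap_smaller = np.sum(np.abs(x)) - u_smaller @ x
```

Here `gap` is zero while `gap_smaller` is strictly positive, so any feasible $u$ has $\|u\|_1 \ge \|x\|_0$, with equality for $u = \mathrm{sign}(x)$.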
The introduction of the auxiliary variable $u$ increases the dimension of the problem, but the non-convex and non-continuous $\ell_0$-norm can now be written as a convex minimization problem. In this paper, we study the penalized and constrained problems using this reformulation of the $\ell_0$-norm. We also add a non-negativity constraint on the variable $x$, as it is commonly used as a prior in imaging problems. The two problems can be written as a general problem defined as:
[TABLE]
where is in the case of the constrained problem (C):
[TABLE]
and for the penalized problem (P):
[TABLE]
We note the , and we define the functional as
[TABLE]
The functional (8) is still non-convex due to the equality constraint, but it is biconvex: the minimization of (8) with respect to $x$ while $u$ is fixed is convex, and conversely. However, minimizing such a function is hard because the equality constraint is non-convex. We can relax this constraint by introducing a penalty term, $\rho(\|x\|_1 - \langle x, u \rangle)$. This is based on the method of Lagrange multipliers. Note that it is not necessary to take the absolute value of this penalty term: since $-1 \le u \le 1$, we have $\langle x, u \rangle \le \|x\|_1$, and therefore the penalty term is never negative.
We introduce a Lagrangian cost function defined as
[TABLE]
In this paper we are focusing on exact penalty methods, such that a local or global minimizer of (9) leads to a local or global minimizer of the initial problem (8). The following theorem ensures this.
Theorem 2.1 (Constrained form).
Assume that $\rho > \sigma(A)\|d\|_2$, and $A$ is of full rank. Let $G_\rho$ and $G$ be defined respectively in (9) and (8), with $I(u)$ on the constrained form defined in (6). We have:
1. If $(x_\rho, u_\rho)$ is a local or global minimizer of $G_\rho$, then $(x_\rho, u_\rho)$ is a local or global minimizer of $G$.
2. If $(\hat{x}, \hat{u})$ is a global minimizer of $G$, then $(\hat{x}, \hat{u})$ is a global minimizer of $G_\rho$.
Two lemmas are needed in order to prove Theorem 2.1. The complete proofs of these lemmas require three other lemmas (Lemma 6, Lemma 9, and Lemma 10) stated in the Appendix.
Lemma 2.
Let . Let be a local or global minimizer of with defined as in (6) or (7). Let . Then
Proof.
Let denote the set of indices: . If is a local or global minimizer of then , where denotes a neighborhood of of size , we have
[TABLE]
By choosing and with and , with , we have
[TABLE]
We want to show that is zero. We have
[TABLE]
Using the above decomposition simplifies (10), and we have :
[TABLE]
Thus is a solution of
[TABLE]
or, equivalently solution of
[TABLE]
where is the submatrix of composed by the columns indexed by of . With Lemma 6 (see Appendix), we have that and if we can apply Lemma 9 (see Appendix) with a vector composed of . We conclude that . ∎∎
Lemma 3.
If , let be a local or global minimizer of
[TABLE]
with defined as in (6), that is, the constrained form. Then .
Proof.
From Lemma 10 (see Appendix), we have that , and . It suffices to prove . For that we use Lemma 2, and conclude that . ∎∎
With the two above lemmas we can prove Theorem 2.1
Proof.
We start by proving the first part of the theorem. Let be a local minimizer of , with on the constrained form, that is, defined as in (6). Let . If then, from Lemma 3,
[TABLE]
Furthermore, from the definition of a minimizer, we have
[TABLE]
and so we have
[TABLE]
Since , we have
[TABLE]
By the definition, is also a local minimizer of .
Now we prove part 2 of Theorem 2.1.
Let be a global minimizer of . We necessarily have . First, we show that
[TABLE]
This can be shown by contradiction. Assume the opposite, and denote a global minimizer of . We then have
[TABLE]
Lemma 3 shows that , so and we have
[TABLE]
and more precisely, which is not possible, since is a global minimizer of .
We therefore have shown that , and we have
[TABLE]
is thus a global minimizer of . ∎∎
Theorem 2.2 (Penalized form).
Assume that $\rho > \sigma(A)\|d\|_2$, and $A$ is of full rank. Let $G_\rho$ and $G$ be defined respectively in (9) and (8), with $I(u)$ on the penalized form defined in (7). We have:
1. If $(x_\rho, u_\rho)$ is a local or global minimizer of $G_\rho$, then we can construct $(x_\rho, \tilde{u}_\rho)$ which is a local or global minimizer of $G$.
2. If $(\hat{x}, \hat{u})$ is a global minimizer of $G$, then $(\hat{x}, \hat{u})$ is a global minimizer of $G_\rho$.
For the proof, we need two lemmas, Lemma 2 which is already presented and the following lemma.
Lemma 4.
Let be a local or a global minimizer of for the penalized form ( defined as in (7)). If then such that we have
Proof.
From Lemma 11 (see Appendix), we have that iff . We denote the set of indices where , and we can apply Lemma 2, and conclude that .
∎
Remark 1.
If , note that the cost function with minimizers is constant on and .
Remark 2.
In the case of the penalized form, the minimizers () of with may be such that . This may only happen if .
Remark 3.
If . From Remark 1, from a minimizer of , we can construct a minimiser of such that . This can be done by denoting , the set of indices such that . If is non-empty, we have . From Remark 2, . Take and , then . Furthermore, is a minimizer of after Remark 1 and the fact that is convex with respect to .
With Lemma 4 and the above remarks, we can prove Theorem 2.2.
Proof.
We start by proving the first part of the theorem. Given a local or global minimizer of , with on the penalized form, that is, defined as in (7). Let denote the space where . If then, from Remark 3, we can construct such that
[TABLE]
Furthermore, from the definition of a minimizer, we have
[TABLE]
and so we get
[TABLE]
Since , we obtain
[TABLE]
Then, is also a local minimizer of .
The second part of Theorem 2.2 can be proved as in the proof of Theorem 2.1. ∎∎
Theorems 2.1 and 2.2 show that, for $\rho$ large enough, minimizing (9) is equivalent, in terms of minimizers, to minimizing (8). The results in this section are similar to those of yuan_sparsity_2016 . Instead of working with the square norm, that paper works with a Lipschitz continuous function. We were inspired by that work and extend it to the square norm. It contains a theorem equivalent to Theorem 2.1, but the lower bound for $\rho$ is less sharp. Furthermore, yuan_sparsity_2016 does not tackle the penalized sparsity problem and thus has no theorem equivalent to Theorem 2.2.
Although $G_\rho$ in (9) is non-convex, the formulation is biconvex, i.e., the functional is convex with respect to $x$ when $u$ is fixed, and conversely. With that in mind, we propose in the next section an algorithm to minimize (9).
3 A minimization algorithm
The functional $G_\rho$ has two interesting properties. The first is that the non-convexity of $G_\rho$ is due to the coupling term $\rho \langle x, u \rangle$. $G_\rho$ is therefore convex when the penalty parameter $\rho$ equals zero. This suggests an algorithm to minimize $G_\rho$: the minimization starts with a small $\rho$ and minimizes $G_\rho$. At each iteration, the penalty parameter $\rho$ increases, and the solution of the previous iteration is used as initialization for the next minimization. This continuation will hopefully give a good initialization for the final minimization, that is, when $\rho$ satisfies the bound of Theorem 2.1 and Theorem 2.2. The second interesting property of $G_\rho$ is its biconvexity. Minimization by blocks is therefore natural since, with respect to each block (that is, either $x$ or $u$), the problem is convex. With this in mind, and following yuan_sparsity_2016 , we propose the following algorithm.
The minimization of (15) is done using the Proximal Alternating Minimization algorithm (PAM) attouch_proximal_2008 , which ensures convergence to a critical point. PAM minimizes functions of the form
[TABLE]
In our case, we have, , and . PAM has the following outline
[TABLE]
and add strict convexity to each block, and are bounded from below and above.
In the following section we develop the minimization schemes for (15) in the case of the constrained problem ( defined as in (6)) respectively the penalized problem ( defined as in (7)). We recall the minimization of is
[TABLE]
where is defined in (6) or in (7).
3.1 The minimization with respect to .
The minimization with respect to using PAM is
[TABLE]
which can be rewritten as
[TABLE]
This problem can be solved using the FISTA algorithm beck_fast_2009 . The algorithm minimizes a functional of the form $f + g$, where $f$ is a smooth convex function with a Lipschitz continuous gradient $\nabla f$, and $g$ is a continuous convex function, possibly non-smooth. In our case we have
[TABLE]
The proximal operator of $g$ is the soft thresholding with positivity constraint
[TABLE]
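As an illustration of soft thresholding with a positivity constraint, the following sketch computes the proximal operator of $\tau \|x\|_1$ plus the indicator of $x \ge 0$, a standard closed form. The function name `prox_l1_nonneg` and the threshold name `tau` are assumptions for this example; how `tau` relates to the step size and to $\rho$ depends on the exact splitting used in the paper.

```python
import numpy as np

def prox_l1_nonneg(v, tau):
    """Prox of tau*||x||_1 + indicator(x >= 0) at v.

    Each component solves min_{x >= 0} tau*x + 0.5*(x - v)^2,
    whose solution is max(v - tau, 0).
    """
    return np.maximum(np.asarray(v, dtype=float) - tau, 0.0)

v_demo = np.array([-1.0, 0.3, 2.0])
p_demo = prox_l1_nonneg(v_demo, tau=0.5)

# Brute-force check of the last component on a fine non-negative grid.
grid = np.linspace(0.0, 3.0, 30001)
brute = grid[np.argmin(0.5 * grid + 0.5 * (grid - 2.0) ** 2)]
```

Negative entries and entries below the threshold are set to zero; the remaining entries are shifted down by `tau`.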
3.2 The minimization with respect to
In this section we study how to find a solution to the following convex minimization problem
[TABLE]
The above problem can be rewritten as
[TABLE]
and for simplicity we denote .
The constrained minimization of
In this section we work with the constrained formulation of . Then the minimization problem (21) can be simplified and written as
[TABLE]
Since the minimizer of is reached for , we can write . Furthermore, since the is invariant with respect to the sign, we can rewrite the minimization problem as
[TABLE]
and then . This minimization problem is a variant of the knapsack problem which can be solved using classical minimization schemes such as doi:10.1155/S168712000402009X :
[TABLE]
The penalized minimization of
The minimization of (21) with respect to , with on the penalized form (7), can be written as
[TABLE]
Proposition 1.
The solution of
[TABLE]
is reached for
[TABLE]
Proof.
Problem (22) has a closed-form solution, which can be found by computing the subgradient of the objective of (22) with respect to $u$. Note that the subgradient of the box constraint at $u_i$ is $\{0\}$ if $-1 < u_i < 1$, $(-\infty, 0]$ if $u_i = -1$, and $[0, +\infty)$ if $u_i = 1$. We obtain the following optimality conditions:
[TABLE]
and the optimal solution is
[TABLE]
∎
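A closed form of the kind stated in Proposition 1 can be sketched as follows. Since the exact expression of problem (22) is not reproduced here, the sketch assumes a separable model problem, $\min_{-1 \le u_i \le 1} \lambda |u_i| + \frac{1}{2b}(u_i - z_i)^2$, where `z` and the proximal step `b` are hypothetical names for the quantities gathered by completing the square. Under that assumption, the minimizer is a soft thresholding followed by a clipping to the box.

```python
import numpy as np

def u_step_penalized(z, lam, b):
    """Minimize lam*|u| + (u - z)^2 / (2*b) over -1 <= u <= 1, component-wise.

    Assumed model problem (illustration only): soft-threshold z by b*lam,
    then project onto the box [-1, 1]. The projection is valid because the
    objective is convex in each scalar component.
    """
    z = np.asarray(z, dtype=float)
    shrunk = np.sign(z) * np.maximum(np.abs(z) - b * lam, 0.0)
    return np.clip(shrunk, -1.0, 1.0)

z_demo = np.array([2.0, 0.3, -0.9])
u_demo = u_step_penalized(z_demo, lam=1.0, b=0.5)

# Brute-force check on a fine grid of the box [-1, 1].
grid_u = np.linspace(-1.0, 1.0, 20001)
u_brute = np.array([
    grid_u[np.argmin(1.0 * np.abs(grid_u) + (grid_u - zi) ** 2 / (2 * 0.5))]
    for zi in z_demo
])
```

The three regimes in the example (clipped at the bound, set to zero, and strictly inside the box) mirror the case analysis obtained from the subgradient of the box constraint.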
4 Application to 2D single-molecule localization microscopy
In this section, we compare the minimization of the biconvex reformulations to the Iterative Hard Thresholding (IHT) algorithm combettes2005signal , to which we add the non-negativity constraint on $x$. IHT can be applied to both formulations (C) and (P). The algorithms are applied to the problem of 2D Single-Molecule Localization Microscopy (SMLM).
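For reference, a minimal sketch of IHT with a non-negativity constraint for the constrained form is given below. It is an illustration under stated assumptions, not the authors' code: the fixed step size $1/\|A\|^2$, the iteration count, the zero default initialization, and the names `iht_nonneg`, `A`, `d`, `k` are choices made for this example.

```python
import numpy as np

def iht_nonneg(A, d, k, n_iter=200, x0=None):
    """IHT sketch for min 0.5*||Ax - d||^2 s.t. ||x||_0 <= k, x >= 0 (k >= 1)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / (spectral norm squared)
    x = np.zeros(A.shape[1]) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        v = x - step * (A.T @ (A @ x - d))  # gradient step on the data term
        v = np.maximum(v, 0.0)              # enforce non-negativity
        keep = np.argsort(v)[-k:]           # indices of the k largest entries
        x = np.zeros_like(v)
        x[keep] = v[keep]                   # hard thresholding
    return x

A_iht = np.eye(6)
d_iht = np.array([0.0, 3.0, -4.0, 0.5, 2.0, 0.0])
x_iht = iht_nonneg(A_iht, d_iht, k=2)
```

Passing `x0=A.T @ d` starts the iterations from the adjoint of the operator applied to the acquisition instead of from zero.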
SMLM is a microscopy method which is used to obtain images with a higher resolution than what is possible with normal optical microscopes. This was first introduced in hess2006ultra ; betzig2006imaging ; rust2006sub . Fluorescent microscopy uses molecules that can emit light when they are excited with a laser. The molecules are observed with an optical microscope, and, since the molecules are smaller than the diffraction limit, what is observed is not each molecule, but rather a diffraction disk larger than the molecule. This limits the resolution of the image. SMLM exploits photoactivatable fluorescent molecules, and, instead of activating all the molecules at once as done by other fluorescent microscopy methods, activates a sparse set of fluorescent molecules. The localization of each molecule with a high precision is possible since the probability of two or more molecules to be in the same diffraction disk is small. The localization becomes harder if the density of emitting molecules is higher. Once each molecule has been precisely localized, they are switched off and the process is repeated until all the molecules have been activated. The total acquisition time may be long when activating few molecules at a time, which is unfortunate as SMLM may be used on living samples which can move during this time. This will lead to a faulty reconstruction. We are, in this paper, interested in high-density acquisitions.
The localization problem of SMLM can be described as a minimization problem such as (P) and (C) with an added positivity constraint, since we reconstruct the intensity of the molecules. The two biconvex formulations can be applied to the SMLM problem. $A$ is the matrix operator that performs a convolution with the Point Spread Function and a reduction of dimensions. The molecules are reconstructed on a grid which is finer than the observed image. For a complete description of the mathematical model, see gazagnes2017high .
We test the algorithms on two datasets, both accessible from the ISBI 2013 challenge sage2015quantitative . Both datasets are high-density acquisitions. The first dataset contains simulated acquisitions, which makes a numerical evaluation of the reconstruction possible. The second dataset contains real acquisitions. For a complete description of SMLM and the different localization algorithms, see the ISBI-SMLM challenge sage2015quantitative . Figure 1 shows three of the 361 acquisitions of the simulated dataset. We apply the localization algorithms to each acquisition, and the localization results of the 361 acquisitions together yield one super-resolution image.
We use the Jaccard index in order to perform a numerical evaluation of the reconstructions. The Jaccard index evaluates only the localization of the reconstructed molecules (see sage2015quantitative ). The Jaccard index is the ratio between the number of correctly reconstructed (CR) molecules and the sum of the CR, false negative (FN), and false positive (FP) molecules. The index is 1 for a perfect reconstruction, and the lower the index, the poorer the reconstruction. The Jaccard index includes a tolerance of error in its calculation of correctly reconstructed molecules.
$$\text{Jac} = \frac{\text{CR}}{\text{CR} + \text{FN} + \text{FP}}$$
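The Jaccard index with a tolerance can be sketched as follows. The greedy nearest-neighbor matching used here is a simplifying assumption for illustration; the ISBI challenge evaluation may pair molecules differently. The function name and the convention of returning 1 when both sets are empty are also assumptions.

```python
import numpy as np

def jaccard_with_tolerance(truth, estimate, tol):
    """Jaccard index CR / (CR + FN + FP) for 2D positions.

    An estimate counts as correctly reconstructed (CR) if it lies within
    `tol` of a ground-truth position that has not been matched yet.
    """
    truth = np.asarray(truth, dtype=float).reshape(-1, 2)
    estimate = np.asarray(estimate, dtype=float).reshape(-1, 2)
    unmatched = list(range(len(truth)))
    cr = 0
    for p in estimate:
        if not unmatched:
            break
        dists = np.linalg.norm(truth[unmatched] - p, axis=1)
        j = int(np.argmin(dists))
        if dists[j] <= tol:
            cr += 1
            unmatched.pop(j)  # each true molecule is matched at most once
    fp = len(estimate) - cr   # estimates with no true molecule nearby
    fn = len(truth) - cr      # true molecules that were missed
    total = cr + fn + fp
    return cr / total if total else 1.0

truth_nm = [(0, 0), (100, 0), (300, 300)]
estimate_nm = [(10, 0), (100, 40), (500, 500)]
jac_50 = jaccard_with_tolerance(truth_nm, estimate_nm, tol=50)
jac_20 = jaccard_with_tolerance(truth_nm, estimate_nm, tol=20)
```

With a 50 nm tolerance two estimates are matched (index 0.5), while a 20 nm tolerance keeps only one match (index 0.2), which shows why the index increases with the tolerance.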
4.1 Results of the simulated dataset
The ISBI simulated dataset represents 8 tubes of 30 nm diameter. The acquisition is of the size of pixels where each pixel is of size nm2. The Point Spread Function (PSF) is modeled by a Gaussian function where the Full Width at Half Maximum (FWHM) is 258.21 nm. In total there are 81 049 molecules on a total of 361 images.
We localize the molecules with a higher precision on a pixel image, where the size of each pixel is nm2. As an optimization problem, this is equivalent to reconstruct for an acquisition , where and . The center of the pixel is used to estimate the position of the molecule.
We set , the maximum number of molecules the algorithm reconstructs, equal to 220 for the constrained problem. This number is the average number of molecules for each acquisition, which we know from the ground truth. Note that in order to observe the reconstruction, we normalize the image, that is, we let the smallest value in the image to be 0, and the largest to be 1. Each pixel has an intensity between 0 and 1, and the brighter the pixel the stronger the intensity.
We set for the biconvex algorithms. Note that a smaller could be chosen, but this implies longer computational time. Both constrained algorithms reconstruct 220 molecules for each acquisition. We choose a for the two penalized algorithms such that they reconstruct around 220 molecules on average. For the IHT, and for the biconvex penalized . We initialize the IHT by applying the conjugate of the operator to the acquisition. The results of the reconstructions are shown in Figure 2. Both biconvex reformulations reconstruct the tubes thicker than the ground truth. Unlike the biconvex reformulations, the two IHT algorithms do not manage to distinguish between two tubes when they are close (see the red box in Figure 2). The Jaccard index is shown in Table 1. We observe the low Jaccard index of the IHT constrained algorithm compared to the biconvex constrained algorithm. This might be surprising since the IHT seems to reconstruct the tubulins with a correct thickness. However, this indicates that IHT reconstructs many molecules of low intensity which are not situated on the tubulins.
4.2 Results of the real dataset
We compare the algorithms on a high-density dataset of tubulins which are provided from the 2013 ISBI SMLM challenge, where there are 500 acquisitions. Each acquisition is of size pixels and each pixel is of size nm2. The FWHM has been previously estimated to be 351.8 nm chahid2014echantillonnage . We localize the molecules on a pixel image, where each pixel is of size nm2.
In this section, we do not have any prior knowledge of the solution, and we set for the biconvex constrained algorithm. For the biconvex penalized algorithm we set . We choose because of computational time. For the constrained IHT algorithm, we set the constraint and for the penalized we set . Figure 3 presents the reconstruction. The results are consistent with those from the simulated dataset. The IHT algorithms do not reconstruct as well as the biconvex algorithms, and the penalized version performs much worse than the constrained version.
5 Conclusion
In this paper, we have presented a reformulation of the $\ell_2-\ell_0$ constrained and penalized problems. We have proved in Theorem 2.1 and Theorem 2.2 the exactness of the reformulations, that is, from a minimizer of the reformulation we can obtain a minimizer of the initial problem. Furthermore, both reformulations are biconvex. Using these two central properties of the reformulation, we derive a general algorithm to minimize the constrained or the penalized reformulation. This algorithm is easy to implement as each step can be decomposed into well-studied problems. The algorithms are compared to the well-known IHT algorithm on constrained and penalized form. We apply the algorithms to single-molecule localization microscopy, and the two biconvex algorithms outperform the IHT algorithms visually and numerically.
As a perspective, it would be interesting to further investigate the reformulation of the $\ell_0$-norm, and to combine it with other data-fitting terms.
Appendix
In this Appendix we recall and prove some properties that are useful for the proof of Theorem 2.1 and Theorem 2.2.
Lemma 5.
Let be a semi-orthogonal matrix, that is, a non-square matrix composed of orthonormal columns. Then, is the identity matrix in .
Lemma 6.
Let , let denote the th column of . Defining to be a set of indices, . Let the restriction of to the columns indexed by the elements of be denoted as . Then .
Proof.
Note that we can write as the product of matrix and a matrix . We define the vector , the unitary vector which has zeros everywhere except for the place. The matrix can be constructed with . The matrix is therefore a semi-orthonormal matrix. The spectral norm of the matrix is 1, as is the identity matrix (from Lemma 5). The norm can be written as
[TABLE]
∎
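The argument of Lemma 6 can be illustrated numerically: restricting a matrix to a subset of its columns amounts to multiplying it by a selection matrix with orthonormal columns, so the spectral norm cannot increase. The random matrix, the index set `omega`, and the name `E` for the selection matrix are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
A_full = rng.standard_normal((8, 20))
omega = [1, 4, 7, 13]

# Restriction of A to the columns indexed by omega.
A_omega = A_full[:, omega]

# Selection matrix built from unit vectors, so that A_omega = A_full @ E.
E = np.eye(20)[:, omega]

norm_full = np.linalg.norm(A_full, 2)  # largest singular value of A
norm_sub = np.linalg.norm(A_omega, 2)  # largest singular value of A_omega
norm_E = np.linalg.norm(E, 2)          # equals 1 since E^T E is the identity
```

Since `norm_E` is 1, submultiplicativity of the spectral norm gives `norm_sub <= norm_full`.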
Lemma 7.
(Pshenichnyi-Rockafellar lemma, zalinescu2002convex, Theorem 2.9.1) Assume is a proper lower semi-continuous convex function. Let be a convex set, such that . Then
[TABLE]
where is the normal cone of the convex set .
Lemma 8.
Given the problem
[TABLE]
where is a full rank matrix and a non-negative vector. is a vector which contains the absolute value of each component of . Let be a solution of problem (24). Then is bounded independently of and
[TABLE]
Proof.
Let be the solution of , then we have
[TABLE]
In particular, by choosing we have:
[TABLE]
The term is always non-negative as is a non-negative vector, therefore we have
[TABLE]
and so
[TABLE]
∎
Lemma 9.
Let , be a full rank matrix and is a non-negative vector. We have the following result: If then the optimal solution of the following optimization problem:
[TABLE]
is achieved with .
Proof.
We start by proving that . Remark that Lemma 8 is valid for problem (28), from which we have
[TABLE]
Then, by choosing, for all , , we are sure that . From the Pshenichnyi-Rockafellar lemma, a necessary and sufficient condition for is a minimizer of on is that
[TABLE]
where in our case is the and . We have that since is a sum of two convex functions, where the intersection of the domains is non-empty (see bookConvex, Corollary 16.38).
The optimal condition is therefore
[TABLE]
where
[TABLE]
and
[TABLE]
For we have the following optimal condition
[TABLE]
If , then and cannot be strictly positive, furthermore cannot be strictly negative since we work in the non-negative space. Therefore .
∎
Lemma 10.
Let be a local minimizer of defined in (9), with on the constrained form, that is, defined as in (6). Let . We denote as the indexes of the k largest values of . , and . Moreover, we define , and . If , that is, , then the minimum of will be reached with such that
[TABLE]
If , that is, , then
[TABLE]
such that .
Proof.
We observe that minimizing can be viewed as a problem of minimizing by using the definition of . The results are obvious. ∎∎
Lemma 11.
Let be a local minimizer of defined in (9), with on the penalized form, that is, defined as in (7). Let . The minimum of will be reached with a such that
[TABLE]
Proof.
Proof of the necessary condition:
We start by writing the optimal conditions of .
[TABLE]
We split the study of (33) in five cases.
- •
If
[TABLE]
Thus,
- •
If
[TABLE]
Thus
- •
If
[TABLE]
Thus
- •
If
[TABLE]
Thus
- •
If
[TABLE]
Thus,
Proof of sufficient condition:
We can prove that the reverse statement is also true. We can rewrite , for some . We have then from the optimal conditions (33) that
[TABLE]
[TABLE]
Assuming , then only (34) is possible. If , then (34), (35) (36) are possible. If , then only (36) is possible. If , then only (36) is possible. If , then (36), (37) and (38) are possible. If , then only (38) is possible.
This finishes the proof.
∎
- (1) Hédy Attouch, Jérôme Bolte, Patrick Redont, and Antoine Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Mathematics of Operations Research, 35(2):438–457, 2010.
- (2) Amir Beck and Marc Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
- (3) Amir Beck and Yonina C. Eldar. Sparsity constrained nonlinear optimization: Optimality conditions and algorithms. SIAM Journal on Optimization, 23(3):1480–1509, 2013.
- (4) Eric Betzig, George H. Patterson, Rachid Sougrat, O. Wolf Lindwasser, Scott Olenych, Juan S. Bonifacino, Michael W. Davidson, Jennifer Lippincott-Schwartz, and Harald F. Hess. Imaging intracellular fluorescent proteins at nanometer resolution. Science, 313(5793):1642–1645, 2006.
- (5) Shujun Bi, Xiaolan Liu, and Shaohua Pan. Exact penalty decomposition method for zero-norm minimization based on MPEC formulation. SIAM Journal on Scientific Computing, 36(4):A1451–A1477, 2014.
- (6) Sébastien Bourguignon, Jordan Ninin, Hervé Carfantan, and Marcel Mongeau. Exact sparse approximation problems via mixed-integer programming: Formulations and computational performance. IEEE Transactions on Signal Processing, 64(6):1405–1419, 2016.
- (7) Leo Breiman. Better Subset Regression Using the Nonnegative Garrote. Technometrics, 37(4):373–384, 1995.
- (8) Emmanuel J. Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, 2006.
