Research Report: Exact biconvex reformulation of the $\ell_2-\ell_0$ minimization problem

Arne Bechensteen; Laure Blanc-Féraud; Gilles Aubert

arXiv:1903.01162·math.OC·March 7, 2019


TL;DR

This paper introduces an exact biconvex reformulation of the $\ell_2-\ell_0$ minimization problem, enabling improved algorithms for sparse optimization with applications in microscopy.

Contribution

It presents a novel exact biconvex reformulation of the $\ell_2-\ell_0$ minimization problem, in both constrained and penalized form, from which a minimization algorithm is derived.

Findings

01 The reformulation is exact and biconvex, preserving solutions.

02 The proposed algorithm outperforms Iterative Hard Thresholding.

03 Application to microscopy demonstrates improved results.

Abstract

We focus on the minimization of the least square loss function either under a $k$ -sparse constraint or with a sparse penalty term. Based on recent results, we reformulate the $ℓ_{0}$ pseudo-norm exactly as a convex minimization problem by introducing an auxiliary variable. We then propose an exact biconvex reformulation of the $ℓ_{2} - ℓ_{0}$ constrained and penalized problems. We give correspondence results between minimizers of the initial function and the reformulated ones. The reformulation is biconvex and the non-convexity is due to a penalty term. These two properties are used to derive a minimization algorithm. We apply the algorithm to the problem of single-molecule localization microscopy and compare the results with the well-known Iterative Hard Thresholding algorithm. Visually and numerically the biconvex reformulations perform better.

Tables

Table 1: The Jaccard index (%) with respect to the tolerance and the different algorithms. In bold is the best reconstruction.

Method	50 nm	100 nm	150 nm	200 nm
IHT - Constrained	20.0	39.4	48.9	54.3
Biconvex - Constrained	20.0	48.3	61.4	67.7
IHT - Penalized	13.1	31.2	35.7	38.0
Biconvex - Penalized	18.1	40.0	50.0	54.7

Equations

$$\|x\|_{0}=\#\{x_{i},\ i=1,\dots,N:\ x_{i}\neq 0\}.$$

$$Ax+\eta=d$$

$$\hat{x}\in\operatorname*{arg\,min}_{x}G_{\ell_{0}}(x):=\frac{1}{2}\|Ax-d\|_{2}^{2}+\lambda\|x\|_{0}$$

$$\hat{x}\in\operatorname*{arg\,min}_{x}G_{k}(x):=\frac{1}{2}\|Ax-d\|_{2}^{2}\quad\text{s.t.}\quad\|x\|_{0}\leq k$$

$$\hat{x}\in\operatorname*{arg\,min}_{x}\|x\|_{0}\quad\text{s.t.}\quad\frac{1}{2}\|Ax-d\|_{2}^{2}\leq\epsilon$$

$$\|A\|=\sigma(A)$$

$$\iota_{x\in X}(x)=\begin{cases}+\infty&\text{if }x\notin X\\ 0&\text{if }x\in X\end{cases}$$

$$\forall z\in\operatorname{dom}(f)\quad f(z)\geq f(x)+v^{T}(z-x)$$

$$N_{C}(x_{0})=\{\eta\in\mathbb{R}^{n},\ \langle\eta,x-x_{0}\rangle\leq 0\ \ \forall x\in C\}.$$

$$\|x\|_{0}=\min_{-\mathbf{1}\leq u\leq\mathbf{1}}\|u\|_{1}\quad\text{s.t.}\quad\|x\|_{1}=\langle u,x\rangle$$

$$\min_{-\mathbf{1}\leq u\leq\mathbf{1}}\|u\|_{1}\quad\text{s.t.}\quad|x_{i}|=u_{i}x_{i}\ \ \forall i$$

$$\hat{u}_{i}\begin{cases}=1&\text{if }x_{i}>0\\ =-1&\text{if }x_{i}<0\\ \in[-1,1]&\text{if }x_{i}=0\end{cases}$$

$$\min_{x,u}\frac{1}{2}\|Ax-d\|^{2}+I(u)+\iota_{\cdot\geq 0}(x)\quad\text{s.t.}\quad\|x\|_{1}=\langle x,u\rangle$$

$$I(u)=\begin{cases}0&\text{if }\|u\|_{1}\leq k\text{ and }\forall i,\ -1\leq u_{i}\leq 1\\ \infty&\text{otherwise}\end{cases}$$

$$I(u)=\begin{cases}\lambda\|u\|_{1}&\text{if }\forall i,\ -1\leq u_{i}\leq 1\\ \infty&\text{otherwise}\end{cases}$$

$$G(x,u)=\frac{1}{2}\|Ax-d\|^{2}+I(u)+\iota_{\cdot\geq 0}(x)+\iota_{\cdot\in\mathcal{S}}(x,u)$$

$$G_{\rho}(x,u)=\frac{1}{2}\|Ax-d\|^{2}+I(u)+\iota_{\cdot\geq 0}(x)+\rho(\|x\|_{1}-\langle x,u\rangle)$$

$$\frac{1}{2}\|Ax_{\rho}-d\|^{2}+\iota_{\cdot\geq 0}(x_{\rho})+I(u_{\rho})+\rho(\|x_{\rho}\|_{1}-\langle x_{\rho},u_{\rho}\rangle)\leq\frac{1}{2}\|Ax-d\|^{2}+\iota_{\cdot\geq 0}(x)+I(u)+\rho(\|x\|_{1}-\langle x,u\rangle)$$

$$\frac{1}{2}\|Ax_{\rho}-d\|^{2}+\iota_{\cdot\geq 0}(x_{\rho})+\rho\|(x_{\rho})_{\omega}\|_{1}\leq\frac{1}{2}\|A\tilde{x}-d\|^{2}+\iota_{\cdot\geq 0}(\tilde{x})+\rho\|x_{\omega}\|_{1}$$

$$\|Ax-d\|^{2}=\sum_{i}(Ax)_{i}^{2}+\|d\|^{2}-2\sum_{i}x_{i}(A^{T}d)_{i}=\sum_{i}\Big(\sum_{j\in J}A_{ij}x_{j}\Big)^{2}+\Big(\sum_{j\in\omega}A_{ij}x_{j}\Big)^{2}+\|d\|^{2}-2\Big[\sum_{i\in J}x_{i}(A^{T}d)_{i}+\sum_{i\in\omega}x_{i}(A^{T}d)_{i}\Big]$$

$$\frac{1}{2}\sum_{i}\Big(\sum_{j\in\omega}A_{ij}(x_{\rho})_{j}\Big)^{2}-\dots$$

$$\frac{1}{2}\sum_{i}\Big(\sum_{j\in\omega}A_{ij}x_{j}\Big)^{2}-\sum_{i\in\omega}x_{i}(A^{T}d)_{i}+\rho\|x_{\omega}\|_{1}+\iota_{\cdot\geq 0}(x_{\omega})$$

$$\operatorname*{arg\,min}_{x_{\omega}}\frac{1}{2}\sum_{i}\Big(\sum_{j\in\omega}A_{ij}x_{j}\Big)^{2}-\sum_{i\in\omega}x_{i}(A^{T}d)_{i}+\rho\|x_{\omega}\|_{1}+\iota_{\cdot\geq 0}(x_{\omega})$$

$$\operatorname*{arg\,min}_{x_{\omega}}\frac{1}{2}\|A_{\omega}x_{\omega}-d\|^{2}+\rho\|x_{\omega}\|_{1}+\iota_{\cdot\geq 0}(x_{\omega})$$

$$\operatorname*{arg\,min}_{x,u}\frac{1}{2}\|Ax-d\|^{2}+\iota_{\cdot\geq 0}(x)+\rho(\|x\|_{1}-\langle x,u\rangle)+I(u)$$

$$(x_{\rho},u_{\rho})\ \text{verifies}\ \|x_{\rho}\|_{1}=\langle x_{\rho},u_{\rho}\rangle.$$

$$G_{\rho}(x_{\rho},u_{\rho})\leq G_{\rho}(x,u)\quad\forall(x,u)\in\mathcal{N}((x_{\rho},u_{\rho}),\gamma)$$

$$G_{\rho}(x_{\rho},u_{\rho})\leq G_{\rho}(x,u)\quad\forall(x,u)\in\mathcal{N}((x_{\rho},u_{\rho}),\gamma)\cap\mathcal{S}$$

$$G(x_{\rho},u_{\rho})\leq G(x,u)\quad\forall(x,u)\in\mathcal{N}((x_{\rho},u_{\rho}),\gamma)\cap\mathcal{S}$$


Full text

Arne Bechensteen: Université Côte d’Azur, CNRS, INRIA, Laboratoire I3S UMR 7271, 06903 Sophia Antipolis, France

Laure Blanc-Féraud: Université Côte d’Azur, CNRS, INRIA, Laboratoire I3S UMR 7271, 06903 Sophia Antipolis, France

Gilles Aubert: Université Côte d’Azur, UNS, Laboratoire J. A. Dieudonné UMR 7351, 06100 Nice, France


1 Introduction

Sparse optimization consists in finding a solution with many zero components from an underdetermined problem. Such solutions arise in many applications (e.g., machine learning, variable selection, pulse deconvolution). The most common way to measure the sparsity of a solution is the counting function $\|\cdot\|_{0}$ which is, by abuse of terminology, referred to as the $\ell_{0}$-norm, and is defined as

$$\|x\|_{0}=\#\{x_{i},\ i=1,\dots,N:\ x_{i}\neq 0\}.$$

In this paper, we are interested in linear problems where the observation $d\in\mathbb{R}^{M}$ can be described as the multiplication of the solution $x\in\mathbb{R}^{N}$ with a matrix $A\in\mathbb{R}^{M\times N}$ plus some noise $\eta$ which we assume to be additive white Gaussian and independent of the data.

$$Ax+\eta=d$$

This problem is underdetermined when $M<N$ . In sparse optimization involving the square norm, there are three different approaches to tackle the problem. We search for $\hat{x}$ solution of:

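$$\hat{x}\in\operatorname*{arg\,min}_{x}G_{\ell_{0}}(x):=\frac{1}{2}\|Ax-d\|_{2}^{2}+\lambda\|x\|_{0}\tag{P}$$

$$\hat{x}\in\operatorname*{arg\,min}_{x}G_{k}(x):=\frac{1}{2}\|Ax-d\|_{2}^{2}\quad\text{s.t.}\quad\|x\|_{0}\leq k\tag{C}$$

$$\hat{x}\in\operatorname*{arg\,min}_{x}\|x\|_{0}\quad\text{s.t.}\quad\frac{1}{2}\|Ax-d\|_{2}^{2}\leq\epsilon\tag{1}$$

The two cases (C) and (1) are in constrained form. For problem (C), the user has prior knowledge of the sparsity of the solution, which is at most $k$. In the case of (1), the user has prior knowledge of the amount of noise $\eta$ affecting the signal $d$. It is usually possible to estimate $\epsilon$ from the data and the statistics of $\eta$.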

In the case of problem (P), which is also referred to as the penalized $\ell_{0}$ form, the user does not have any information on the sparsity of the solution, nor on the noise the signal $d$ has been affected by. Therefore, the user must choose the amplitude of $\lambda\in\mathbb{R}_{+}$ which serves as a trade-off parameter between the data term and the sparsity term. If $\lambda$ is large, the reconstruction of $x$ will be sparse, but the difference between $Ax$ and $d$ may be large. Conversely, if $\lambda$ is small, the error between $Ax$ and $d$ is small, but the reconstructed $x$ may not be sparse.

The above problems are neither continuous nor convex, and they are known to be NP-hard due to the combinatorial nature of the $\ell_{0}$-norm. However, they have been extensively studied due to their many applications, such as sparse reconstruction of signals, variable selection, and single-molecule localization microscopy, to cite a few. There are two main approaches to solving these problems: greedy algorithms and relaxations. A third approach has been introduced more recently, in which the problem is recast as a mathematical program with equilibrium constraint.

Greedy algorithms: Greedy algorithms are often used in sparse optimization. The idea behind these algorithms is to start with a zero initialization and, at each iteration, add one component to the signal $x$ until the desired sparsity is obtained. One of the simplest and least costly greedy algorithms, the Matching Pursuit (MP) algorithm mallat_matching_1993, adds at each iteration $s$ the component that most reduces the residual $R$, which is defined as $R=d-Ax^{s}$.

The Orthogonal Matching Pursuit (OMP), proposed in pati_orthogonal_1993, is a more refined version of MP. The algorithm chooses each component in the same way as MP, but for each new component added, it recalculates and updates the values of all the previous components. This may lead to a better result, but the complexity and cost of the calculation are greater than for MP.

Greedy algorithms have been extensively studied and many different versions of the above algorithms have been developed. More complex ones, such as the Greedy Sparse Simplex algorithm beck_sparsity_2012 or Single Best Replacement (SBR) soussen2011bernoulli, have been introduced. In contrast to MP or OMP, these algorithms can both add and remove components.
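The Matching Pursuit iteration described above can be sketched as follows. This is a minimal illustration and not the implementation of mallat_matching_1993: the function name `matching_pursuit`, the fixed iteration count `n_iter` and the random test data are assumptions made for the example, and the columns of $A$ are assumed to be nonzero.

```python
import numpy as np

def matching_pursuit(A, d, n_iter):
    """Plain Matching Pursuit sketch; assumes every column of A is nonzero."""
    col_norms = np.linalg.norm(A, axis=0)
    x = np.zeros(A.shape[1])
    residual = d.astype(float).copy()        # R = d - A x with x = 0
    for _ in range(n_iter):
        corr = A.T @ residual / col_norms    # normalized correlations with R
        j = int(np.argmax(np.abs(corr)))     # column that most reduces ||R||
        step = corr[j] / col_norms[j]        # optimal coefficient along column j
        x[j] += step
        residual -= step * A[:, j]
    return x

# Hypothetical underdetermined test problem (M = 20 < N = 50).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17]] = [2.0, -1.5]
d = A @ x_true
x_mp = matching_pursuit(A, d, n_iter=10)
```

OMP would differ only in the coefficient update: after selecting column $j$, it would re-solve a least-squares problem on all selected columns instead of updating a single coefficient.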

Relaxations: The three formulations of the sparse optimization problem ((P), (C) and (1)) are non-convex, due to the non-convexity of the $\ell_{0}$-norm. A common alternative is to work with the convex $\ell_{1}$-norm instead of the non-convex $\ell_{0}$-norm. The $\ell_{2}-\ell_{0}$ problem then becomes an $\ell_{2}-\ell_{1}$ problem. This is called a convex relaxation since the non-convex term is replaced by a convex term. However, the original problem and the convex relaxed problem have the same solutions only under certain assumptions candes_robust_2006. Furthermore, $\|x\|_{0}$ and $\|x\|_{1}$ are very different when $x$ contains large values. Non-smooth, non-convex but continuous relaxations were primarily introduced to avoid this difference. These relaxations are still non-convex, and the convergence of the algorithms to a global minimum is not assured. There are many non-convex continuous relaxations, such as the NonNegative Garrote breiman_better_1995, the Log-Sum penalty candes_enhancing_2007 or Capped-$\ell_{1}$ peleg_bilinear_2008, to mention some. The Continuous Exact $\ell_{0}$ penalty introduced in soubies2015continuous proposes an exact relaxation for the problem (P), and a unified view of these functions is given in soubies2017unified.

Mathematical program with equilibrium constraint: A more recent method for solving a sparse optimization problem is to introduce auxiliary variables that simulate the behavior of the $\ell_{0}$-norm and to add a constraint linking the primary variable and the auxiliary one. The problem then becomes a mathematical program with equilibrium constraint. Among these approaches, we find Mixed Integer reformulations bourguignon2016exact, Boolean relaxation pilanci2015sparse and the article that inspired our work, yuan_sparsity_2016. The method has been used to study the three formulations of the sparse optimization problem (see bi2014exact ; lu2013sparse, for example).

Contribution: The aim of this paper is to present and study a new method for optimizing the constrained (C) and penalized (P) problems with an added positivity constraint. The added positivity constraint is important in many sparse optimization problems. We start in Section 2 by introducing a reformulation of the $\ell_{0}$-norm by a variational characterization. The norm is rewritten as a convex minimization problem by introducing an auxiliary variable, and we can reformulate (C) and (P) as a mathematical program with equilibrium constraint (MPEC). The reformulation of the $\ell_{0}$-norm was presented in yuan_sparsity_2016, and our work extends theirs, as they only study the minimization of a Lipschitz continuous data term under a sparsity constraint. In this paper, the data term is the squared norm of the error $Ax-d$, which is not Lipschitz continuous, and we study the minimization with a sparsity constraint (problem (C)) and with a sparse penalty term (problem (P)). Based on the MPEC formulation of the problem we define a Lagrangian cost function $G_{\rho}$. The function $G_{\rho}:\mathbb{R}^{N}\times\mathbb{R}^{N}\rightarrow\mathbb{R}$ is biconvex. The main contributions of the paper are Theorem 2.1 for the constrained version of $G_{\rho}$ and Theorem 2.2 for the penalized version of $G_{\rho}$. We show that minimizing $G_{\rho}$ is equivalent, in the sense of minimizers, to solving the initial constrained or penalized problem. In Section 3 we propose an algorithm to minimize the new objective function. This algorithm is easy to implement as it is based on existing and well-known algorithms. In Section 4 we test the algorithm on the problem of single-molecule localization microscopy (SMLM). This is a well-studied problem sage2015quantitative where the goal is to localize the molecules with high precision.

Notations:

• $\|\cdot\|=\|\cdot\|_{2}$ denotes the $\ell_{2}$-norm. Any other norm is indicated by a subscript.

• The function

$$\|x\|_{0}=\#\{x_{i},\ i=1,\dots,N:\ x_{i}\neq 0\}$$

will be, by abuse of terminology, referred to as the $\ell_{0}$-norm.

• The observed signal is $d\in\mathbb{R}^{M}$.

• $A$ is a matrix in $\mathbb{R}^{M\times N}$, $M<N$.

• $A^{T}$ is the transpose of $A$.

• For a matrix $A\in\mathbb{R}^{M\times N}$, the singular value decomposition (SVD) of $A$ is denoted $A=U_{A}\Sigma(A)V_{A}^{*}$.

• For a matrix $A\in\mathbb{R}^{M\times N}$, we denote by $\|A\|$ the spectral norm of $A$, defined as

$$\|A\|=\sigma(A),$$

where $\sigma(A)$ is the largest singular value of $A$.

• If not stated otherwise, the vector $x\in\mathbb{R}^{N}$.

• The indicator function $\iota_{x\in X}$ is defined for $X\subset\mathbb{R}^{N}$ as

$$\iota_{x\in X}(x)=\begin{cases}+\infty&\text{if }x\notin X\\ 0&\text{if }x\in X\end{cases}$$

• The subdifferential of the convex function $f$ at point $x$ is the set of vectors $v$ (the subgradients) such that

$$\forall z\in\operatorname{dom}(f)\quad f(z)\geq f(x)+v^{T}(z-x)$$

• The normal cone $N_{C}(x_{0})$ of a convex set $C$ at $x_{0}\in C$ is defined as

$$N_{C}(x_{0})=\{\eta\in\mathbb{R}^{n},\ \langle\eta,x-x_{0}\rangle\leq 0\ \ \forall x\in C\}.$$

• $-{\bf{1}}\leq u\leq{\bf{1}}$ is a component-wise notation, i.e., $\forall\,i,\,-1\leq u_{i}\leq 1$.

• $|x|\in\mathbb{R}^{N}$ is a vector containing the absolute value of each component of the vector $x$.

2 Exact reformulation

In this section we focus on a reformulation of the $\ell_{0}$ -norm. This reformulation was first introduced in yuan_sparsity_2016 . The $\ell_{0}$ -norm can be rewritten as a convex minimization problem by introducing an auxiliary variable.

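Lemma 1.

(yuan_sparsity_2016, Lemma 1) For any $x\in\mathbb{R}^{N}$

$$\|x\|_{0}=\min_{-\mathbf{1}\leq u\leq\mathbf{1}}\|u\|_{1}\quad\text{s.t.}\quad\|x\|_{1}=\langle u,x\rangle$$

Proof.

We consider first the problem

$$\min_{-\mathbf{1}\leq u\leq\mathbf{1}}\|u\|_{1}\quad\text{s.t.}\quad|x_{i}|=u_{i}x_{i}\ \ \forall i$$

The equality constraint $|x_{i}|=u_{i}x_{i}$ together with $-1\leq u_{i}\leq 1$ yields

$$\hat{u}_{i}\begin{cases}=1&\text{if }x_{i}>0\\ =-1&\text{if }x_{i}<0\\ \in[-1,1]&\text{if }x_{i}=0\end{cases}$$

As we minimize $\|u\|_{1}$, if $x_{i}=0$ then $\hat{u}_{i}=0$. Hence $\|\hat{u}\|_{1}=\|x\|_{0}$. Furthermore, since $u_{i}\in[-1,1]$, we have $|x_{i}|-u_{i}x_{i}\geq 0\,\forall i$. A sum of non-negative terms vanishes only if every term vanishes, so the constraint $|x_{i}|=x_{i}u_{i}\,\forall\,i$ is equivalent to $\sum_{i}|x_{i}|=\sum_{i}x_{i}u_{i}$, which is exactly $\|x\|_{1}=\langle x,u\rangle$. ∎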
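The minimizer built in this proof can be checked numerically on a small vector. The sketch below is only an illustration: the helper name `l0_via_auxiliary` and the test vector are assumptions made for the example.

```python
import numpy as np

def l0_via_auxiliary(x):
    """Return the auxiliary variable of Lemma 1 and its l1-norm."""
    u_hat = np.sign(x)                  # +1, -1, or 0 where x_i = 0
    return u_hat, np.abs(u_hat).sum()

x = np.array([0.0, 3.2, -0.5, 0.0, 7.0])
u_hat, l1_of_u = l0_via_auxiliary(x)

# ||u_hat||_1 counts the nonzero entries of x, and the coupling
# constraint ||x||_1 = <x, u_hat> holds for this choice of u_hat.
l0_of_x = np.count_nonzero(x)
coupling_gap = np.abs(x).sum() - x @ u_hat
```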

The introduction of the auxiliary variable increases the dimension of the problem, but the non-convex and non-continuous $\ell_{0}$-norm can now be written as a convex minimization problem. In this paper, we study the $\ell_{2}-\ell_{0}$ penalized and constrained problems using this reformulation of the $\ell_{0}$-norm. We also add a non-negativity constraint on the variable $x$, as it is a common prior in imaging problems. The two problems can be written as a general problem defined as:

$$\min_{x,u}\frac{1}{2}\|Ax-d\|^{2}+I(u)+\iota_{\cdot\geq 0}(x)\quad\text{s.t.}\quad\|x\|_{1}=\langle x,u\rangle$$

where $I(u)$ is in the case of the constrained problem (C):

$$I(u)=\begin{cases}0&\text{if }\|u\|_{1}\leq k\text{ and }\forall i,\ -1\leq u_{i}\leq 1\\ \infty&\text{otherwise}\end{cases}\tag{6}$$

and for the penalized problem (P):

$$I(u)=\begin{cases}\lambda\|u\|_{1}&\text{if }\forall i,\ -1\leq u_{i}\leq 1\\ \infty&\text{otherwise}\end{cases}\tag{7}$$

We denote $\mathcal{S}=\{(x,u);\|x\|_{1}=\langle x,u\rangle\}$, and we define the functional $G$ as

$$G(x,u)=\frac{1}{2}\|Ax-d\|^{2}+I(u)+\iota_{\cdot\geq 0}(x)+\iota_{\cdot\in\mathcal{S}}(x,u)\tag{8}$$

The functional (8) is still non-convex due to the equality constraint, but it is biconvex: the minimization of (8) with respect to $x$ while $u$ is fixed is convex, and conversely. However, the minimization of such a function is hard because of the equality constraint, which is non-convex. We can relax this constraint by introducing a penalty term, $\rho(\|x\|_{1}-\langle x,u\rangle)$. This is based on the method of Lagrange multipliers. Note that it is not necessary to take the absolute value of this penalty term since $\forall\,i,\,|u_{i}|\leq 1$ and therefore the penalty term is never negative.

We introduce a Lagrangian cost function $G_{\rho}(x,u):\mathbb{R}^{N}\times\mathbb{R}^{N}\rightarrow\mathbb{R}$ defined as

$$G_{\rho}(x,u)=\frac{1}{2}\|Ax-d\|^{2}+I(u)+\iota_{\cdot\geq 0}(x)+\rho(\|x\|_{1}-\langle x,u\rangle)\tag{9}$$
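To make the role of each term concrete, the cost function $G_{\rho}$ can be evaluated as in the sketch below. The function name `G_rho`, its keyword arguments and the feasibility tolerance `1e-12` are assumptions made for this illustration: passing `k` selects the constrained form of $I(u)$, and passing `lam` selects the penalized form.

```python
import numpy as np

def G_rho(x, u, A, d, rho, lam=None, k=None):
    """Evaluate G_rho; pass k for the constrained I(u), lam for the penalized one."""
    if np.any(x < 0) or np.any(np.abs(u) > 1):
        return np.inf                              # positivity and box indicators
    if k is not None:
        if np.abs(u).sum() > k + 1e-12:            # assumed numerical tolerance
            return np.inf
        I_u = 0.0
    else:
        I_u = lam * np.abs(u).sum()
    penalty = rho * (np.abs(x).sum() - x @ u)      # >= 0 because |u_i| <= 1
    return 0.5 * np.sum((A @ x - d) ** 2) + I_u + penalty

# Toy data, chosen only so the values can be checked by hand.
A = np.eye(2)
d = np.array([1.0, 0.0])
x = np.array([1.0, 0.0])
on_S = G_rho(x, np.array([1.0, 0.0]), A, d, rho=5.0, k=1)   # coupling satisfied
off_S = G_rho(x, np.array([0.0, 0.0]), A, d, rho=5.0, k=1)  # pays rho * ||x||_1
```

When $(x,u)\in\mathcal{S}$ the penalty vanishes and $G_{\rho}$ coincides with $G$; otherwise the pair pays a cost proportional to $\rho$.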

In this paper we are focusing on exact penalty methods, such that a local or global minimizer of (9) leads to a local or global minimizer of the initial problem (8). The following theorem ensures this.

Theorem 2.1 (Constrained form).

Assume that $\rho>\sigma(A)\|d\|_{2}$ and that $A$ is of full rank. Let $G_{\rho}$ and $G$ be defined respectively in (9) and (8), with the constrained form of $I(u)$ defined in (6). We have:

1. If $(x_{\rho},u_{\rho})$ is a local or global minimizer of $G_{\rho}$, then $(x_{\rho},u_{\rho})$ is a local or global minimizer of $G$.

2. If $(\hat{x},\hat{u})$ is a global minimizer of $G$, then $(\hat{x},\hat{u})$ is a global minimizer of $G_{\rho}$.

Two lemmas are needed in order to prove Theorem 2.1. The complete proofs of these lemmas require three other lemmas (Lemma 6, Lemma 9, and Lemma 10) stated in the Appendix.
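The threshold on $\rho$ in the theorem is cheap to compute in practice. A minimal sketch, where the helper name `rho_lower_bound` and the example matrix are assumptions made for the illustration:

```python
import numpy as np

def rho_lower_bound(A, d):
    """sigma(A) * ||d||_2, the value that rho must strictly exceed."""
    sigma_max = np.linalg.norm(A, 2)      # spectral norm = largest singular value
    return sigma_max * np.linalg.norm(d)

A = np.diag([3.0, 1.0])
d = np.array([0.0, 2.0])
bound = rho_lower_bound(A, d)
rho = 1.01 * bound                        # hypothetical safety margin above the bound
```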

Lemma 2.

Let $\rho>\sigma(A)\|d\|_{2}$. Let $(x_{\rho},u_{\rho})$ be a local or global minimizer of $G_{\rho}(x,u):=\frac{1}{2}\|Ax-d\|^{2}+I(u)+\iota_{\cdot\geq 0}(x)+\rho(\|x\|_{1}-\langle x,u\rangle)$ with $I(u)$ defined as in (6) or (7). Let $\omega=\{i\in\{1,\dots,N\};(u_{\rho})_{i}=0\}$. Then $(x_{\rho})_{i}=0\,\forall i\in\omega$.

Proof.

Let $J$ denote the set of indices $J=\{1,\dots,N\}\backslash\omega$. If $(x_{\rho},u_{\rho})$ is a local or global minimizer of $G_{\rho}$ then $\forall(x,u)\in\mathcal{N}((x_{\rho},u_{\rho}),\gamma)$, where $\mathcal{N}((x_{\rho},u_{\rho}),\gamma)$ denotes a neighborhood of $(x_{\rho},u_{\rho})$ of size $\gamma$, we have

$$\frac{1}{2}\|Ax_{\rho}-d\|^{2}+\iota_{\cdot\geq 0}(x_{\rho})+I(u_{\rho})+\rho(\|x_{\rho}\|_{1}-\langle x_{\rho},u_{\rho}\rangle)\leq\frac{1}{2}\|Ax-d\|^{2}+\iota_{\cdot\geq 0}(x)+I(u)+\rho(\|x\|_{1}-\langle x,u\rangle)$$

By choosing $u=u_{\rho}$ and $x=\tilde{x}$ with $\tilde{x}_{J}=(x_{\rho})_{J}$ and $\tilde{x}_{\omega}=x_{\omega}$, with $(x_{\omega},(u_{\rho})_{\omega})\in\mathcal{N}(((x_{\rho})_{\omega},(u_{\rho})_{\omega}),\gamma)$, we have

$$\frac{1}{2}\|Ax_{\rho}-d\|^{2}+\iota_{\cdot\geq 0}(x_{\rho})+\rho\|(x_{\rho})_{\omega}\|_{1}\leq\frac{1}{2}\|A\tilde{x}-d\|^{2}+\iota_{\cdot\geq 0}(\tilde{x})+\rho\|x_{\omega}\|_{1}$$

We want to show that $(x_{\rho})_{\omega}$ is zero. We have

[TABLE]

Using the above decomposition simplifies (10), and we have $\forall\,\,\,x_{\omega}$ :

[TABLE]

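Thus $(x_{\rho})_{\omega}$ is a solution of

$$\operatorname*{arg\,min}_{x_{\omega}}\frac{1}{2}\sum_{i}\Big(\sum_{j\in\omega}A_{ij}x_{j}\Big)^{2}-\sum_{i\in\omega}x_{i}(A^{T}d)_{i}+\rho\|x_{\omega}\|_{1}+\iota_{\cdot\geq 0}(x_{\omega})$$

or, equivalently, a solution of

$$\operatorname*{arg\,min}_{x_{\omega}}\frac{1}{2}\|A_{\omega}x_{\omega}-d\|^{2}+\rho\|x_{\omega}\|_{1}+\iota_{\cdot\geq 0}(x_{\omega})$$

where $A_{\omega}$ is the $M\times\#\omega$ submatrix of $A$ composed of the columns of $A$ indexed by $\omega$. By Lemma 6 (see Appendix), we have $\sigma(A)\geq\sigma(A_{\omega})$, and if $\rho>\sigma(A)\|d\|_{2}$ we can apply Lemma 9 (see Appendix) with $w$ the vector whose entries all equal $\rho$. We conclude that $(x_{\rho})_{\omega}=0$. ∎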
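The last step of this proof can be illustrated numerically. Since the appendix lemmas are not reproduced here, the sketch below relies only on a standard observation, stated as an assumption of this illustration: every entry of $A_{\omega}^{T}d$ is bounded by $\sigma(A_{\omega})\|d\|_{2}\leq\sigma(A)\|d\|_{2}<\rho$, so the $\ell_{1}$ term dominates and the non-negative minimizer is zero. The random data, the index set `omega` and the use of a plain proximal gradient loop are all choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 12))        # hypothetical test matrix
d = rng.standard_normal(8)
omega = [0, 4, 7]                       # hypothetical index set
A_w = A[:, omega]
rho = 1.01 * np.linalg.norm(A, 2) * np.linalg.norm(d)

# Largest entry of A_w^T d; it stays below sigma(A) * ||d||_2 < rho.
max_corr = np.max(A_w.T @ d)

# Proximal gradient on 1/2 ||A_w x - d||^2 + rho * sum(x) over x >= 0,
# started from a strictly positive point.
step = 1.0 / np.linalg.norm(A_w, 2) ** 2
x = np.abs(rng.standard_normal(len(omega))) + 0.1
for _ in range(2000):
    x = np.maximum(x - step * (A_w.T @ (A_w @ x - d) + rho), 0.0)
```

In this experiment the iterates are driven to the zero vector, in agreement with the conclusion $(x_{\rho})_{\omega}=0$.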

Lemma 3.

If $\rho>\sigma(A)\|d\|_{2}$, let $(x_{\rho},u_{\rho})$ be a local or global minimizer of

$$\min_{x,u}\frac{1}{2}\|Ax-d\|^{2}+\iota_{\cdot\geq 0}(x)+\rho(\|x\|_{1}-\langle x,u\rangle)+I(u)$$

with $I(u)$ defined as in (6), that is, the constrained form. Then $\|x_{\rho}\|_{1}-\langle x_{\rho},u_{\rho}\rangle=0$.

Proof.

From Lemma 10 (see Appendix), we have that $(u_{\rho})_{i}(x_{\rho})_{i}=|(x_{\rho})_{i}|\,\forall\,i\in J$, and $(u_{\rho})_{i}=0\,\forall i\in\omega$. It therefore suffices to prove that $(x_{\rho})_{i}=0\,\forall i\in\omega$. For that we use Lemma 2, and conclude that $(x_{\rho})_{\omega}=0$. ∎

With the two lemmas above we can prove Theorem 2.1.

Proof.

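We start by proving the first part of the theorem. Let $(x_{\rho},u_{\rho})$ be a local minimizer of $G_{\rho}$, with $I(u)$ in the constrained form, that is, defined as in (6). Let $\mathcal{S}=\{(x,u);\|x\|_{1}=\langle x,u\rangle\}$. If $\rho>\sigma(A)\|d\|_{2}$ then, from Lemma 3,

$$(x_{\rho},u_{\rho})\ \text{satisfies}\ \|x_{\rho}\|_{1}=\langle x_{\rho},u_{\rho}\rangle.$$

Furthermore, from the definition of a minimizer, we have

$$G_{\rho}(x_{\rho},u_{\rho})\leq G_{\rho}(x,u)\quad\forall(x,u)\in\mathcal{N}((x_{\rho},u_{\rho}),\gamma)$$

and so we have

$$G_{\rho}(x_{\rho},u_{\rho})\leq G_{\rho}(x,u)\quad\forall(x,u)\in\mathcal{N}((x_{\rho},u_{\rho}),\gamma)\cap\mathcal{S}$$

Since $\forall(x,u)\in\mathcal{S},\,G_{\rho}(x,u)=G(x,u)$, and since $(x_{\rho},u_{\rho})\in\mathcal{S}$, we have

$$G(x_{\rho},u_{\rho})\leq G(x,u)\quad\forall(x,u)\in\mathcal{N}((x_{\rho},u_{\rho}),\gamma)\cap\mathcal{S}$$

By definition, $(x_{\rho},u_{\rho})$ is also a local minimizer of $G$.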

Now we prove part 2 of Theorem 2.1.

Let $(\hat{x},\hat{u})$ be a global minimizer of $G$ . We necessarily have $\|\hat{x}\|_{1}=<\hat{x},\hat{u}>$ . First, we show that

[TABLE]

This can be shown by contradiction. Assume the opposite, and denote $(x_{\rho},u_{\rho})$ a global minimizer of $G_{\rho}$ . We then have

[TABLE]

Lemma 3 shows that $\|x_{\rho}\|_{1}=<x_{\rho},u_{\rho}>$ , so $G_{\rho}(x_{\rho},u_{\rho})=G(x_{\rho},u_{\rho})$ and we have

[TABLE]

and more precisely, $G(\hat{x},\hat{u})>G(x_{\rho},u_{\rho})$ which is not possible, since $(\hat{x},\hat{u})$ is a global minimizer of $G$ .

We therefore have shown that $G_{\rho}(\hat{x},\hat{u})\leq\min G_{\rho}(x,u)$ , and we have

[TABLE]

$(\hat{x},\hat{u})$ is thus a global minimizer of $G_{\rho}$. ∎

Theorem 2.2 (Penalized form).

Assume that $\rho>\sigma(A)\|d\|_{2}$ and that $A$ is of full rank. Let $G_{\rho}$ and $G$ be defined respectively in (9) and (8), in the penalized form with $I(u)$ defined in (7). We have:

1. If $(x_{\rho},u_{\rho})$ is a local or global minimizer of $G_{\rho}$, then we can construct $(x_{\rho},\tilde{u}_{\rho})$ which is a local or global minimizer of $G$.

2.

If $(\hat{x},\hat{u})$ is a global minimizer of $G$ , then $(\hat{x},\hat{u})$ is a global minimizer of $G_{\rho}$ .

For the proof, we need two lemmas, Lemma 2 which is already presented and the following lemma.

Lemma 4.

Let $(x_{\rho},u_{\rho})$ be a local or a global minimizer of $G_{\rho}$ for the penalized form ($I(u)$ defined as in (7)). If $\rho>\sigma(A)\|d\|_{2}$ then $\forall\,i$ such that $(u_{\rho})_{i}=0$ we have $(x_{\rho})_{i}=0$.

Proof.

From Lemma 11 (see Appendix), we have that $(u_{\rho})_{i}=0$ iff $(x_{\rho})_{i}\in]-\frac{\lambda}{\rho},\frac{\lambda}{\rho}[$. We denote by $\omega$ the set of indices where $u_{\rho}=0$, and we can apply Lemma 2 to conclude that $(x_{\rho})_{\omega}=0$. ∎

Remark 1.

If $\rho>\sigma(A)\|d\|_{2}$ , note that the cost function $G_{\rho}$ with minimizers $(x_{\rho},u_{\rho})$ is constant on $|(x_{\rho})_{i}|=\frac{\lambda}{\rho}$ and $|(u_{\rho})_{i}|\in[0,1]$ .

Remark 2.

In the case of the penalized form, the minimizers ( $x_{\rho},u_{\rho}$ ) of $G_{\rho}$ with $\rho>\sigma(A)\|d\|_{2}$ may be such that $<x_{\rho},u_{\rho}>\neq\|x_{\rho}\|_{1}$ . This may only happen if $|(x_{\rho})_{i}|=\frac{\lambda}{\rho}$ .

Remark 3.

Assume that $\rho>\sigma(A)\|d\|_{2}$. By Remark 1, from a minimizer $(x_{\rho},u_{\rho})$ of $G_{\rho}$ we can construct a minimizer $(x_{\rho},\tilde{u}_{\rho})$ of $G_{\rho}$ such that $\|x_{\rho}\|_{1}=\langle x_{\rho},\tilde{u}_{\rho}\rangle$. To do so, denote by $Z$ the set of indices such that $0<|(u_{\rho})_{i}|<1$. If $Z$ is non-empty, we have $\langle x_{\rho},u_{\rho}\rangle\neq\|x_{\rho}\|_{1}$. From Remark 2, $|(x_{\rho})_{i}|=\frac{\lambda}{\rho}\,\forall i\in Z$. Take $(\tilde{u}_{\rho})_{i}=\operatorname{sign}((x_{\rho})_{i})\,\forall i\in Z$ and $(\tilde{u}_{\rho})_{i}=(u_{\rho})_{i}\,\forall i\notin Z$; then $\langle x_{\rho},\tilde{u}_{\rho}\rangle=\|x_{\rho}\|_{1}$. Furthermore, $(x_{\rho},\tilde{u}_{\rho})$ is a minimizer of $G_{\rho}$ by Remark 1 and the fact that $G_{\rho}(x_{\rho},u)$ is convex with respect to $u$.

With Lemma 4 and the above remarks, we can prove Theorem 2.2.

Proof.

We start by proving the first part of the theorem. Let $(x_{\rho},u_{\rho})$ be a local or global minimizer of $G_{\rho}$, with $I(u)$ in the penalized form, that is, defined as in (7). Let $\mathcal{S}$ denote the set where $\|x\|_{1}=\langle x,u\rangle$. If $\rho>\sigma(A)\|d\|_{2}$ then, from Remark 3, we can construct $(x_{\rho},\tilde{u}_{\rho})$ such that

[TABLE]

Furthermore, from the definition of a minimizer, we have

[TABLE]

and so we get

[TABLE]

Since $\forall(x,u)\in\mathcal{S},\,G_{\rho}(x,u)=G(x,u)$, we obtain

[TABLE]

Then, $(x_{\rho},\tilde{u}_{\rho})$ is also a local minimizer of $G$.

The second part of Theorem 2.2 can be proved as in the proof of Theorem 2.1. ∎

Theorems 2.1 and 2.2 show that, for a given $\rho$ above the stated bound, minimizing (9) is equivalent, in terms of minimizers, to minimizing (8). The results in this section are similar to those of yuan_sparsity_2016. In their paper, instead of working with the squared norm, they work with a Lipschitz continuous function $f$. We were inspired by their work to extend it to the squared norm. They have a theorem equivalent to Theorem 2.1, but the lower bound for $\rho$ is less sharp. Furthermore, the paper yuan_sparsity_2016 does not tackle the penalized sparsity problem and thus has no theorem equivalent to Theorem 2.2.

Although $G_{\rho}(x,u)$ in (9) is non-convex, the formulation is biconvex, i.e., the functional is convex with respect to $x$ when $u$ is fixed, and conversely. With that in mind, we propose in the next section an algorithm to minimize (9).

3 A minimization algorithm

The functional $G_{\rho}$ has two interesting properties. The first is that the non-convexity of $G_{\rho}$ is due to the coupling term $\langle x,u\rangle$. $G_{\rho}$ is therefore convex when the penalty parameter $\rho$ equals zero. This inspires the idea of an algorithm to minimize $G_{\rho}(x,u)$. The minimization starts with a small $\rho^{0}$ and minimizes $G_{\rho^{0}}$ starting from $(x^{0},u^{0})$. At each iteration, the penalty parameter $\rho$ is increased and the solution of the previous iteration is used as initialization for the next minimization. This method will hopefully give a good initialization for the final minimization, that is, when $\rho$ satisfies the bound of Theorem 2.1 and Theorem 2.2. The second interesting property of the functional $G_{\rho}$ is its biconvexity. Minimization by blocks is therefore attractive since, with respect to each block (that is, either $x$ or $u$), the problem is convex. With this in mind, and following yuan_sparsity_2016, we propose the following algorithm.

The minimization of (15) is done by using the Proximal Alternating Minimization algorithm (PAM) attouch_proximal_2008, which ensures convergence to a critical point. PAM minimizes functions of the form

$$f(x)+g(u)+Q(x,u).$$

In our case, we have, $f(x)=\frac{1}{2}\|Ax-d\|^{2}+\rho\|x\|_{1}+\iota_{\cdot\geq 0}(x)$ , $g(u)=I(u)$ and $Q(x,u)=-\rho<x,u>$ . PAM has the following outline

[TABLE]

$c^{s}$ and $b^{s}$ add strict convexity to each block, and $c^{s},b^{s}$ are bounded from below and above.

In the following sections we develop the minimization schemes for (15) in the case of the constrained problem ($I(u)$ defined as in (6)) and of the penalized problem ($I(u)$ defined as in (7)). We recall that the minimization of $G_{\rho}$ is

[TABLE]

where $I(u)$ is defined in (6) or in (7).

3.1 The minimization with respect to $x$ .

The minimization with respect to $x$ using PAM is

[TABLE]

which can be rewritten as

[TABLE]

This problem can be solved using the FISTA algorithm beck_fast_2009. The algorithm works with a functional $F(x)=f(x)+g(x)$ where $f$ is a smooth convex function whose gradient is Lipschitz continuous with constant $L(f)$, and $g$ is a continuous convex function that is possibly non-smooth. In our case we have

[TABLE]

The proximal operator of $g(x)$ is the soft thresholding with positivity constraint

[TABLE]
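A possible implementation of this $x$-update is sketched below. The split is an assumption of this illustration, since the exact choice of $f$ and $g$ is not reproduced here: the smooth part collects the data term, the coupling term $-\rho\langle x,u\rangle$ and the PAM proximal term $\frac{1}{2c^{s}}\|x-x^{s}\|^{2}$, while the non-smooth part is $\rho\|x\|_{1}+\iota_{\cdot\geq 0}(x)$, whose proximal operator is a one-sided soft thresholding. The names `prox_l1_nonneg` and `fista_x_block` are hypothetical.

```python
import numpy as np

def prox_l1_nonneg(v, t):
    """Prox of t*||.||_1 + indicator(x >= 0): one-sided soft thresholding."""
    return np.maximum(v - t, 0.0)

def fista_x_block(A, d, u, x_prev, rho, c, n_iter=300):
    """FISTA on 1/2||Ax-d||^2 - rho<x,u> + ||x-x_prev||^2/(2c) + rho||x||_1, x >= 0."""
    L = np.linalg.norm(A, 2) ** 2 + 1.0 / c     # Lipschitz constant of the smooth part
    x = x_prev.copy()
    y = x_prev.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - d) - rho * u + (y - x_prev) / c
        x_next = prox_l1_nonneg(y - grad / L, rho / L)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)   # momentum step
        x, t = x_next, t_next
    return x

# Toy call with an identity operator, chosen only so the result can be checked by hand.
x_new = fista_x_block(np.eye(3), np.array([1.0, 2.0, 3.0]),
                      u=np.array([0.0, 0.5, 1.0]), x_prev=np.zeros(3),
                      rho=1.5, c=1.0)
```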

3.2 The minimization with respect to $u$

In this section we study how to find a solution to the following convex minimization problem

[TABLE]

The above problem can be rewritten as

[TABLE]

and for simplicity we denote $z=u^{s}+\rho b^{s}x^{s+1}$ .

The constrained minimization of $u$

In this section we work with the constrained formulation of $G_{\rho}$ . Then the minimization problem (21) can be simplified and written as

[TABLE]

Since the minimizer of $\operatorname*{arg\,min}_{u}\frac{1}{2}\|u-z\|^{2}$ is reached for $u=z$ , we can write $u^{s+1}=sign(z)\operatorname*{arg\,min}_{u}\frac{1}{2}\|u-|z|\|^{2}$ . Furthermore, since the $\|\cdot\|_{1}$ is invariant with respect to the sign, we can rewrite the minimization problem as

[TABLE]

and then $u^{s+1}=sign(z)|u^{s+1}|$ . This minimization problem is a variant of the knapsack problem which can be solved using classical minimization schemes such as doi:10.1155/S168712000402009X :

[TABLE]
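One simple way to solve this projection numerically is sketched below. It is not the scheme of doi:10.1155/S168712000402009X: as a swapped-in technique, it uses bisection on the Lagrange multiplier $\tau$ of the constraint $\|u\|_{1}\leq k$, relying on the fact that the solution has the form $\min(\max(|z_{i}|-\tau,0),1)$. The function names and the tolerance are assumptions of this illustration, and $k>0$ is assumed.

```python
import numpy as np

def project_box_budget(a, k, tol=1e-10):
    """argmin_u 1/2||u - a||^2  s.t.  0 <= u <= 1 and sum(u) <= k, for a >= 0."""
    u = np.clip(a, 0.0, 1.0)
    if u.sum() <= k:
        return u                               # budget constraint inactive
    lo, hi = 0.0, float(a.max())               # sum > k at tau = lo, sum = 0 at tau = hi
    while hi - lo > tol:
        tau = 0.5 * (lo + hi)
        if np.clip(a - tau, 0.0, 1.0).sum() > k:
            lo = tau
        else:
            hi = tau
    return np.clip(a - hi, 0.0, 1.0)           # hi always gives a feasible point

def u_step_constrained(z, k):
    """Constrained u-update: project |z|, then restore the signs of z."""
    return np.sign(z) * project_box_budget(np.abs(z), k)

u_new = u_step_constrained(np.array([2.0, -0.9, 0.4, 0.0]), k=1)
```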

The penalized minimization of $u$

The minimization of (21) with respect to $u$ , with $I(u)$ on the penalized form (7), can be written as

[TABLE]

Proposition 1.

The solution $u^{s+1}$ of

[TABLE]

is reached for

[TABLE]

Proof.

Problem (22) has a closed-form solution, which can be found by calculating the subdifferential of the objective of (22) with respect to $u$. Note that the subdifferential of the box constraint $\iota_{-1\leq\cdot\leq 1}$ is $\{0\}$ if $|u_{i}|<1$, $[0,\infty[$ if $u_{i}=1$ and $]-\infty,0]$ if $u_{i}=-1$. We obtain the following optimality conditions:

[TABLE]

and the optimal solution $u_{\rho}$ is

[TABLE]

∎
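A closed form of this type can be sketched in a few lines. The code below computes the proximal operator of $\tau\|u\|_{1}+\iota_{-1\leq\cdot\leq 1}(u)$, which is a soft thresholding followed by a clipping to the box. The threshold `tau` is a placeholder for the constant induced by $\lambda$ and the scaling in (22), and the name `prox_l1_box` is hypothetical.

```python
import numpy as np

def prox_l1_box(z, tau):
    """Minimize 0.5*(u - z)^2 + tau*|u| over -1 <= u <= 1, componentwise."""
    # Soft thresholding handles the l1 term ...
    shrunk = np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)
    # ... and, since the problem is separable and one-dimensional per
    # component, clipping to the box gives the constrained minimizer.
    return np.clip(shrunk, -1.0, 1.0)
```

For example, with `tau = 0.5` the input `[3, 0.2, -0.7, -5]` is mapped to `[1, 0, -0.2, -1]`.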

4 Application to 2D single-molecule localization microscopy

In this section, we compare the minimization of the biconvex reformulations with the Iterative Hard Thresholding (IHT) algorithm combettes2005signal, to which we add a non-negativity constraint on $x$. IHT can be applied to both formulations (C) and (P). The algorithms are applied to the problem of 2D Single-Molecule Localization Microscopy (SMLM).
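For reference, a minimal sketch of the constrained variant of IHT with a non-negativity constraint is given below. It is a standard projected gradient scheme and not necessarily the exact implementation used for the comparison; the function names, the step size $1/\|A\|^{2}$ and the fixed iteration count are assumptions.

```python
import numpy as np

def keep_k_largest_nonneg(v, k):
    """Project onto {x >= 0, ||x||_0 <= k}: clip negatives, keep k largest."""
    v = np.maximum(v, 0.0)
    if k >= v.size:
        return v
    # Indices of the (size - k) smallest entries, which are set to zero.
    drop = np.argpartition(v, v.size - k)[: v.size - k]
    v[drop] = 0.0
    return v

def iht_nonneg(A, d, k, x0, n_iter=100):
    """Iterative Hard Thresholding for min 0.5*||Ax - d||^2, x >= 0, ||x||_0 <= k."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = x0.copy()
    for _ in range(n_iter):
        grad = A.T @ (A @ x - d)
        x = keep_k_largest_nonneg(x - step * grad, k)
    return x
```

The penalized variant would replace the projection by a hard thresholding whose level depends on $\lambda$ and the step size.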

SMLM is a microscopy method used to obtain images with a higher resolution than is possible with conventional optical microscopes. It was first introduced in hess2006ultra ; betzig2006imaging ; rust2006sub . Fluorescence microscopy uses molecules that emit light when they are excited with a laser. The molecules are observed with an optical microscope and, since they are smaller than the diffraction limit, what is observed is not each molecule but rather a diffraction disk larger than the molecule. This limits the resolution of the image. SMLM exploits photoactivatable fluorescent molecules and, instead of activating all the molecules at once as done by other fluorescence microscopy methods, activates a sparse set of fluorescent molecules. Each molecule can then be localized with high precision, since the probability of two or more molecules being in the same diffraction disk is small. The localization becomes harder as the density of emitting molecules increases. Once the molecules have been precisely localized, they are switched off and the process is repeated until all the molecules have been activated. The total acquisition time may be long when few molecules are activated at a time, which is unfortunate as SMLM may be used on living samples that can move during this time, leading to a faulty reconstruction. In this paper, we are interested in high-density acquisitions.

The localization problem of SMLM can be described as an $\ell_{2}-\ell_{0}$ minimization problem such as (P) and (C), with an added positivity constraint since we reconstruct the intensity of the molecules. The two biconvex formulations can be applied to the SMLM problem. $A$ is the matrix operator that performs a convolution with the Point Spread Function and a reduction of dimensions. The molecules are reconstructed on a grid in $\mathbb{R}^{ML\times ML}$ which is finer than the observed image in $\mathbb{R}^{M\times M}$, with $L>1$. For a complete description of the mathematical model, see gazagnes2017high .
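To make the role of $A$ concrete, the sketch below applies a Gaussian blur on the fine $ML\times ML$ grid and then sums over $L\times L$ blocks to obtain an $M\times M$ image. This is only an illustration under stated assumptions: the Gaussian kernel is applied separably with zero boundary conditions, the reduction of dimensions is taken to be a block sum, and the names `gaussian_blur_matrix` and `forward_operator` are hypothetical. The exact model is the one described in gazagnes2017high.

```python
import numpy as np

def gaussian_blur_matrix(n, sigma_px):
    """Dense n x n matrix applying a 1D Gaussian blur (zero boundary)."""
    idx = np.arange(n)
    K = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma_px) ** 2)
    return K / (np.sqrt(2.0 * np.pi) * sigma_px)

def forward_operator(x, L, fwhm_px):
    """Blur the fine-grid image x (ML x ML), then sum L x L blocks (M x M)."""
    # Standard conversion from full width at half maximum to standard deviation.
    sigma = fwhm_px / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    n = x.shape[0]
    K = gaussian_blur_matrix(n, sigma)
    blurred = K @ x @ K.T  # separable 2D convolution
    M = n // L
    return blurred.reshape(M, L, M, L).sum(axis=(1, 3))
```

With a fine grid of $256\times 256$ pixels and `L = 4`, the output has size $64\times 64$.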

We test the algorithms on two datasets, both accessible from the ISBI 2013 challenge sage2015quantitative . Both datasets are high-density acquisitions. The first dataset contains simulated acquisitions, which makes a numerical evaluation of the reconstruction possible. The second dataset contains real acquisitions. For a complete overview of SMLM and the different localization algorithms, see the ISBI-SMLM challenge sage2015quantitative . Figure 1 shows three of the 361 acquisitions of the simulated dataset. We apply the localization algorithms to each acquisition, and the localization results of the 361 acquisitions together yield one super-resolution image.

We use the Jaccard index to perform a numerical evaluation of the reconstructions. The Jaccard index evaluates only the localization of the reconstructed molecules (see sage2015quantitative ). It is the ratio between the number of correctly reconstructed (CR) molecules and the sum of the numbers of CR, false negative (FN) and false positive (FP) molecules, that is, $\mathrm{CR}/(\mathrm{CR}+\mathrm{FN}+\mathrm{FP})$. The index is 1 for a perfect reconstruction, and the lower the index, the poorer the reconstruction. The Jaccard index includes a tolerance of error in its count of correctly reconstructed molecules.
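The computation of the index can be sketched as follows. The matching rule used here, a greedy one-to-one pairing of estimated and true positions by increasing distance, is an assumption made for illustration and may differ from the matching used in the ISBI challenge; the function names are hypothetical.

```python
import numpy as np

def match_counts(est, truth, tol):
    """Greedy one-to-one matching within tol; returns (CR, FN, FP)."""
    est = np.asarray(est, dtype=float).reshape(-1, 2)
    truth = np.asarray(truth, dtype=float).reshape(-1, 2)
    dist = np.linalg.norm(est[:, None, :] - truth[None, :, :], axis=2)
    used_est, used_truth, cr = set(), set(), 0
    # Visit candidate pairs from the closest to the farthest.
    for flat in np.argsort(dist, axis=None):
        i, j = np.unravel_index(flat, dist.shape)
        if dist[i, j] > tol:
            break
        if i not in used_est and j not in used_truth:
            used_est.add(i)
            used_truth.add(j)
            cr += 1
    return cr, len(truth) - cr, len(est) - cr

def jaccard_index(cr, fn, fp):
    """Jaccard index CR / (CR + FN + FP), as a fraction between 0 and 1."""
    return cr / (cr + fn + fp)
```

Positions and the tolerance must be expressed in the same unit, for example nanometers.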

[TABLE]

4.1 Results of the simulated dataset

The ISBI simulated dataset represents 8 tubes of 30 nm diameter. Each acquisition is of size $64\times 64$ pixels, where each pixel is of size $100\times 100$ nm$^{2}$. The Point Spread Function (PSF) is modeled by a Gaussian function whose Full Width at Half Maximum (FWHM) is 258.21 nm. In total there are 81 049 molecules across the 361 images.

We localize the molecules with a higher precision on a $256\times 256$ pixel image, where the size of each pixel is $25\times 25$ nm$^{2}$. As an optimization problem, this is equivalent to reconstructing $x\in\mathbb{R}^{ML\times ML}$ from an acquisition $d\in\mathbb{R}^{M\times M}$, where $M=64$ and $L=4$. The center of the pixel is used to estimate the position of the molecule.

We set $k$, the maximum number of molecules the algorithm reconstructs, equal to 220 for the constrained problem. This number is the average number of molecules in each acquisition, which we know from the ground truth. Note that, in order to display the reconstruction, we normalize the image so that its smallest value is 0 and its largest value is 1. Each pixel has an intensity between 0 and 1, and the brighter the pixel, the stronger the intensity.

We set $\rho=0.1$ for the biconvex algorithms. Note that a smaller $\rho$ could be chosen, but this implies a longer computational time. Both constrained algorithms reconstruct 220 molecules for each acquisition. We choose $\lambda$ for the two penalized algorithms such that they reconstruct around 220 molecules on average: $\lambda=0.13$ for IHT and $\lambda=0.019$ for the biconvex penalized algorithm. We initialize IHT by applying the adjoint of the operator $A$ to the acquisition. The results of the reconstructions are shown in Figure 2. Both biconvex reformulations reconstruct the tubes thicker than the ground truth. Unlike the biconvex reformulations, the two IHT algorithms do not manage to distinguish between two tubes when they are close (see the red box in Figure 2). The Jaccard index is shown in Table 1. We observe the low Jaccard index of the IHT constrained algorithm compared to the biconvex constrained algorithm. This might be surprising since IHT seems to reconstruct the tubulins with a correct thickness. However, it indicates that IHT reconstructs many molecules of low intensity which are not situated on the tubulins.

4.2 Results of the real dataset

We compare the algorithms on a high-density dataset of tubulins provided by the 2013 ISBI SMLM challenge, which contains 500 acquisitions. Each acquisition is of size $128\times 128$ pixels and each pixel is of size $100\times 100$ nm$^{2}$. The FWHM has previously been estimated to be 351.8 nm chahid2014echantillonnage . We localize the molecules on a $512\times 512$ pixel image, where each pixel is of size $25\times 25$ nm$^{2}$.

In this section, we do not have any prior knowledge of the solution, and we set $k=140$ for the biconvex constrained algorithm. For the biconvex penalized algorithm we set $\lambda=1200$. We choose $\rho=1$ because of computational time. For the constrained IHT algorithm, we set the constraint $k=100$, and for the penalized one we set $\lambda=0.25$. Figure 3 presents the reconstruction. The results are consistent with the results from the simulated dataset. The IHT algorithms do not reconstruct as well as the biconvex algorithms, and the penalized version performs much worse than the constrained version.

5 Conclusion

In this paper, we have presented a reformulation of the $\ell_{2}-\ell_{0}$ constrained and penalized problems. We have proved in Theorem 2.1 and Theorem 2.2 the exactness of the reformulations, that is, from a minimizer of the reformulation we can obtain a minimizer of the initial problem. Furthermore, both reformulations are biconvex. Using two central properties of the reformulation, its biconvexity and the fact that the non-convexity is due to a penalty term, we derive a general algorithm to minimize the constrained or the penalized reformulation. This algorithm is easy to implement as each step can be decomposed into well-studied problems. The algorithms are compared to the well-known IHT algorithm in constrained and penalized form. We apply the algorithms to single-molecule localization microscopy, and the two biconvex algorithms outperform the IHT algorithms visually and numerically.

As future work, it would be interesting to further investigate the reformulation of the $\ell_{0}$ pseudo-norm and to combine it with other data-fitting terms.

Appendix

In this Appendix we recall and prove some properties that are useful for the proof of Theorem 2.1 and Theorem 2.2.

Lemma 5.

Let $P\in\mathbb{R}^{N\times l}$ be a semi-orthogonal matrix, that is, a non-square matrix composed of orthonormal columns. Then, $P^{T}P$ is the identity matrix in $\mathbb{R}^{l\times l}$ .

Lemma 6.

Let $A\in\mathbb{R}^{M\times N}$ and let $a_{i}$ denote the $i$th column of $A$. Let $\omega\subseteq\{1,\dots,N\}$ be a set of indices, and let the restriction of $A$ to the columns indexed by the elements of $\omega$ be denoted by $A_{\omega}=(a_{\omega[1]},\dots,a_{\omega[\#\omega]})\in\mathbb{R}^{M\times\#\omega}$. Then $\|A_{\omega}\|\leq\|A\|$.

Proof.

Note that we can write $A_{\omega}$ as the product of the matrix $A$ and a matrix $P$. Let $e_{i}\in\mathbb{R}^{N}$ denote the unit vector which has zeros everywhere except in the $i$th place. The matrix $P\in\mathbb{R}^{N\times\#\omega}$ can be constructed with the columns $e_{i}$, $i\in\omega$, and is therefore a semi-orthogonal matrix. The spectral norm of the matrix $P$ is 1, as $P^{T}P$ is the identity matrix (from Lemma 5). The norm of $A_{\omega}$ can be written as

$$\|A_{\omega}\|=\|AP\|\leq\|A\|\,\|P\|=\|A\|.$$

∎
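The construction used in this proof can be checked numerically. The sketch below builds the selection matrix $P$ from columns of the identity, verifies that $AP$ equals the column restriction and that $P^{T}P$ is the identity, and returns both spectral norms. The function name `restriction_norms` is hypothetical and indices are 0-based, as usual in Python.

```python
import numpy as np

def restriction_norms(A, omega):
    """Return (||A_omega||, ||A||) in spectral norm, checking A_omega = A P."""
    n = A.shape[1]
    # Columns e_i of the identity, i in omega, form a semi-orthogonal matrix.
    P = np.eye(n)[:, omega]
    A_omega = A @ P
    assert np.allclose(A_omega, A[:, omega])
    assert np.allclose(P.T @ P, np.eye(len(omega)))
    return np.linalg.norm(A_omega, 2), np.linalg.norm(A, 2)
```

For any matrix and any index set, the first returned value should not exceed the second, up to rounding.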

Lemma 7.

(Pshenichnyi-Rockafellar lemma; zalinescu2002convex, Theorem 2.9.1) Assume $g$ is a proper lower semi-continuous convex function. Let $C$ be a convex set such that $\operatorname{int}(C)\cap\operatorname{dom}(g)\neq\emptyset$. Then

[TABLE]

where $N_{C}$ is the normal cone of the convex set $C$ .

Lemma 8.

Given the problem

[TABLE]

where $A\in\mathbb{R}^{M\times N}$ is a full rank matrix and $w$ is a non-negative vector. Here $|x|$ is the vector which contains the absolute value of each component of $x$. Let $\hat{x}$ be a solution of problem (24). Then $\|A\hat{x}-d\|_{2}$ is bounded independently of $w$ and

[TABLE]

Proof.

Let $\hat{x}$ be the solution of $\operatorname*{arg\,min}_{x}\frac{1}{2}\|Ax-d\|^{2}+<w,|x|>$ , then we have $\forall x\in\mathbb{R}^{N}$

[TABLE]

In particular, by choosing $x=0$ we have:

[TABLE]

The term $<w,|\hat{x}|>$ is always non-negative as $w$ is a non-negative vector, therefore we have

[TABLE]

and so

[TABLE]

∎

Lemma 9.

Let $f(x)=\frac{1}{2}\|Ax-d\|_{2}^{2}+<w,|x|>+\iota_{\cdot\geq 0}(x)$, where $A$ is a full rank matrix and $w$ is a non-negative vector. We have the following result: if $w_{i}>\sigma(A)\|d\|_{2}$, then the optimal solution of the following optimization problem:

[TABLE]

is achieved with $\hat{x}_{i}=0$ .

Proof.

We start by proving that $\sigma(A)\|d\|_{2}\geq\left|\left(A^{T}(A\hat{x}-d)\right)_{i}\right|$. Note that Lemma 8 remains valid for problem (28), from which we have

[TABLE]

Then, for any $i\in\{1,\dots,N\}$ such that $w_{i}>\sigma(A)\|d\|_{2}$, we are sure that $w_{i}>\left|\left(A^{T}(A\hat{x}-d)\right)_{i}\right|$. From the Pshenichnyi-Rockafellar lemma, a necessary and sufficient condition for $\hat{x}$ to be a minimizer of $f$ on $C$ is that

[TABLE]

where in our case $C$ is $\mathbb{R}^{N}_{+}$ and $f(x)=\frac{1}{2}\|Ax-d\|^{2}+<w,|x|>$. We have that $\partial f(x)=\partial(\frac{1}{2}\|Ax-d\|^{2})+\partial(<w,|x|>)$ since $f(x)$ is a sum of two convex functions whose domains have a non-empty intersection (see bookConvex, Corollary 16.38).

The optimality condition is therefore

[TABLE]

where

[TABLE]

and

[TABLE]

For $\hat{x}_{i}$ we have the following optimality condition

[TABLE]

If $w_{i}>\sigma(A)\|d\|_{2}$, then $\left|\left(A^{T}(A\hat{x}-d)\right)_{i}\right|<w_{i}$ and $\hat{x}_{i}$ cannot be strictly positive. Furthermore, $\hat{x}_{i}$ cannot be strictly negative since we work in the non-negative space. Therefore $\hat{x}_{i}=0$.

∎
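The statement of Lemma 9 can be illustrated numerically with a plain proximal gradient method started at zero. This sketch assumes that $\sigma(A)$ denotes the largest singular value of $A$, which is an interpretation and not stated in this section; the function names are hypothetical.

```python
import numpy as np

def lemma9_threshold(A, d):
    """sigma(A) * ||d||_2, with sigma(A) taken as the largest singular value (assumption)."""
    return np.linalg.norm(A, 2) * np.linalg.norm(d)

def solve_nonneg_weighted_l1(A, d, w, n_iter=500):
    """Proximal gradient for min 0.5*||Ax - d||^2 + <w, |x|> subject to x >= 0."""
    lipschitz = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - d)
        # Gradient step followed by one-sided soft thresholding with weights w.
        x = np.maximum(x - (grad + w) / lipschitz, 0.0)
    return x
```

When one weight $w_{i}$ is set above the threshold, the corresponding component of the computed solution is expected to be zero, in agreement with the lemma.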

Lemma 10.

Let $(x_{\rho},u_{\rho})$ be a local minimizer of $G_{\rho}$ defined in (9), with $I$ in the constrained form, that is, defined as in (6). Let $G_{x_{\rho}}(u)=\frac{1}{2}\|Ax_{\rho}-d\|^{2}+I(u)+\rho(\|x_{\rho}\|_{1}-<x_{\rho},u>)$. We denote by $O$ the set of indices of the $k$ largest values of $\{|(x_{\rho})_{i}|,\ i=1,\dots,N\}$, and we define $Q\triangleq\{i\,|\,(x_{\rho})_{i}>0\}$ and $S\triangleq\{j\,|\,(x_{\rho})_{j}<0\}$. Moreover, we define $D\triangleq O\cap Q$, $L\triangleq O\cap S$ and $W\triangleq\{1,2,\dots,N\}\setminus(D\cup L)$. If $\#(D\cup L)=k$, that is, $\|x_{\rho}\|_{0}\geq k$, then the minimum of $G_{x_{\rho}}(u)$ is reached at $u_{\rho}$ such that

[TABLE]

If $\#(D\cup L)<k$ , that is, $\|x_{\rho}\|_{0}<k$ , then

[TABLE]

such that $\sum_{i\in W}|u_{i}|\leq k-\#(D\cup L)$ .

Proof.

We observe that, by the definition of $I(u)$, minimizing $G_{x_{\rho}}(u)$ can be viewed as minimizing $-<x_{\rho},u>+\iota_{-1\leq\cdot\leq 1}(u)+\iota_{\|\cdot\|_{1}\leq k}(u)$. The results then follow directly. ∎

Lemma 11.

Let $(x_{\rho},u_{\rho})$ be a local minimizer of $G_{\rho}$ defined in (9), with $I$ in the penalized form, that is, defined as in (7). Let $G_{x_{\rho}}(u)=\frac{1}{2}\|Ax_{\rho}-d\|^{2}+I(u)+\rho(\|x_{\rho}\|_{1}-<x_{\rho},u>)$. The minimum of $G_{x_{\rho}}(u)$ is reached at a $u_{\rho}$ such that

[TABLE]

Proof.

Proof of the necessary condition:

We start by writing the optimality conditions of $G_{x_{\rho}}(u)$.

[TABLE]

We split the study of (33) into five cases.

• If $(u_{\rho})_{i}=1$

[TABLE]

Thus, $(u_{\rho})_{i}=1\Rightarrow(x_{\rho})_{i}\in[\frac{\lambda}{\rho},+\infty[$.

• If $0<(u_{\rho})_{i}<1$

[TABLE]

Thus, $0<(u_{\rho})_{i}<1\Rightarrow(x_{\rho})_{i}=\frac{\lambda}{\rho}$.

• If $(u_{\rho})_{i}=0$

[TABLE]

Thus, $(u_{\rho})_{i}=0\Rightarrow(x_{\rho})_{i}\in\frac{\lambda}{\rho}[-1,1]$.

• If $-1<(u_{\rho})_{i}<0$

[TABLE]

Thus, $-1<(u_{\rho})_{i}<0\Rightarrow(x_{\rho})_{i}=-\frac{\lambda}{\rho}$.

• If $(u_{\rho})_{i}=-1$

[TABLE]

Thus, $(u_{\rho})_{i}=-1\Rightarrow(x_{\rho})_{i}\in]-\infty,-\frac{\lambda}{\rho}]$.

Proof of sufficient condition:

The reverse statement is also true. We can write $(x_{\rho})_{i}=\frac{\beta}{\rho}$ for some $\beta\in\mathbb{R}$. From the optimality conditions (33) we then have that

[TABLE]

If $\beta>\lambda$, then only (34) is possible. If $\beta=\lambda$, then (34), (35) and (36) are possible. If $0\leq\beta<\lambda$, then only (36) is possible. If $-\lambda<\beta<0$, then only (36) is possible. If $\beta=-\lambda$, then (36), (37) and (38) are possible. If $\beta<-\lambda$, then only (38) is possible.

This finishes the proof.

∎
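The five cases of this proof can be summarized by a simple selection rule. The sketch below returns one minimizer $u_{\rho}$ for a given $x_{\rho}$: the sign of $(x_{\rho})_{i}$ where $|(x_{\rho})_{i}|>\frac{\lambda}{\rho}$ and 0 where $|(x_{\rho})_{i}|<\frac{\lambda}{\rho}$. At equality any value between 0 and the sign is admissible according to the cases above, and choosing 0 there is an arbitrary convention of this sketch; the function name is hypothetical.

```python
import numpy as np

def u_from_x_penalized(x, lam, rho):
    """One minimizer of G_x(u) in the penalized form, following the five cases."""
    thr = lam / rho
    u = np.zeros_like(x, dtype=float)
    u[x > thr] = 1.0    # case (u)_i = 1: x_i >= lam/rho
    u[x < -thr] = -1.0  # case (u)_i = -1: x_i <= -lam/rho
    # Entries with |x_i| <= lam/rho keep u_i = 0 (ties resolved to 0).
    return u
```

With `lam = 0.1` and `rho = 1`, the input `[0.5, -0.5, 0.05, 0.1, -0.1]` gives `[1, -1, 0, 0, 0]`.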

Bibliography

(1) Hédy Attouch, Jérôme Bolte, Patrick Redont, and Antoine Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Mathematics of Operations Research, 35(2):438–457, 2010.
(2) Amir Beck and Marc Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
(3) Amir Beck and Yonina C. Eldar. Sparsity constrained nonlinear optimization: Optimality conditions and algorithms. SIAM Journal on Optimization, 23(3):1480–1509, 2013.
(4) Eric Betzig, George H. Patterson, Rachid Sougrat, O. Wolf Lindwasser, Scott Olenych, Juan S. Bonifacino, Michael W. Davidson, Jennifer Lippincott-Schwartz, and Harald F. Hess. Imaging intracellular fluorescent proteins at nanometer resolution. Science, 313(5793):1642–1645, 2006.
(5) Shujun Bi, Xiaolan Liu, and Shaohua Pan. Exact penalty decomposition method for zero-norm minimization based on MPEC formulation. SIAM Journal on Scientific Computing, 36(4):A1451–A1477, 2014.
(6) Sébastien Bourguignon, Jordan Ninin, Hervé Carfantan, and Marcel Mongeau. Exact sparse approximation problems via mixed-integer programming: Formulations and computational performance. IEEE Transactions on Signal Processing, 64(6):1405–1419, 2016.
(7) Leo Breiman. Better Subset Regression Using the Nonnegative Garrote. Technometrics, 37(4):373–384, 1995.
(8) E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, 2006.
