Efficient atom selection strategy for iterative sparse approximations

Clément Dorffer (1), Angélique Drémeau (1), Cédric Herzet (2) ((1) Lab-STICC UMR 6285, CNRS, ENSTA Bretagne; (2) INRIA Centre Rennes-Bretagne Atlantique, Lab-STICC UMR 6285, CNRS, IMT-Atlantique)

arXiv:1812.01932·eess.SP·December 6, 2018


TL;DR

This paper introduces a computationally efficient atom selection method for sparse representation algorithms, applicable to both discrete and continuous dictionaries, demonstrated through DOA and Gaussian deconvolution experiments.

Contribution

It presents a novel low-computational strategy for atom selection that reduces complexity in sparse approximation algorithms.

Findings

1. Significant computational savings in DOA and Gaussian deconvolution tasks
2. Applicable to both discrete and continuous dictionaries
3. Improves efficiency without sacrificing accuracy

Abstract

We propose a low-computational strategy for the efficient implementation of the "atom selection step" in sparse representation algorithms. The proposed procedure is based on simple tests that identify subsets of atoms which cannot be selected. Our procedure applies to both discrete and continuous dictionaries. Experiments performed on DOA and Gaussian deconvolution problems show the computational gain induced by the proposed approach.

Equations

$$\bm{a}_{\text{select}}\in\operatorname*{arg\,max}_{\bm{a}\in\mathcal{A}}\left\langle\bm{r},\bm{a}\right\rangle$$

$$\max_{\bm{a}\in\bar{\mathcal{A}}}\left\langle\bm{r},\bm{a}\right\rangle\leq\max_{\bm{a}\in\mathcal{A}}\left\langle\bm{r},\bm{a}\right\rangle$$

$$\left\langle\bm{r},\bm{a}\right\rangle<\tau\;\Rightarrow\;\bm{a}\notin\operatorname*{arg\,max}_{\tilde{\bm{a}}\in\mathcal{A}}\left\langle\bm{r},\tilde{\bm{a}}\right\rangle$$

$$\max_{\bm{a}\in\mathcal{R}}\left\langle\bm{r},\bm{a}\right\rangle<\tau\;\Rightarrow\;\forall\bm{a}\in\mathcal{A}\cap\mathcal{R}:\ \bm{a}\notin\operatorname*{arg\,max}_{\tilde{\bm{a}}\in\mathcal{A}}\left\langle\bm{r},\tilde{\bm{a}}\right\rangle$$

$$\begin{array}{ll}\mathcal{B}_{\bm{t},\epsilon}\triangleq\{\bm{a}:\left\|\bm{a}-\bm{t}\right\|_{2}\leq\epsilon\}&\text{(sphere)}\\ \mathcal{D}_{\bm{t},\epsilon}\triangleq\{\bm{a}:\left\langle\bm{t},\bm{a}\right\rangle\geq\epsilon,\ \left\|\bm{a}\right\|_{2}=1\}&\text{(dome)}\end{array}$$

$$\bm{a}_{\text{select}}\in\operatorname*{arg\,max}_{\bm{a}\in\mathcal{A}\setminus\mathcal{A}_{\mathrm{removed}}}\left\langle\bm{r},\bm{a}\right\rangle$$

$$\mathcal{O}\Big(\underbrace{\mathrm{card}(\bar{\mathcal{A}})\,m}_{\text{evaluation of }\tau}+\underbrace{L\,m}_{\text{region tests}}+\underbrace{\mathrm{card}(\mathcal{A}\setminus\mathcal{A}_{\mathrm{removed}})\,m}_{\text{reduced selection}}\Big)$$



Full text


Clément Dorffer (1), Angélique Drémeau (1) and Cédric Herzet (2)

(1) Lab-STICC UMR 6285, CNRS, ENSTA Bretagne, Brest, F-29200, France.

(2) INRIA Centre Rennes-Bretagne Atlantique and Lab-STICC UMR 6285, CNRS, IMT-Atlantique, Rennes, F-35000, France. The authors thank the DGA/MRIS, the ONR (N62909-17-1-2007) and the ANR (ANR-15-CE23-0021) for their financial support.


1 Problem statement

Sparsely approximating a signal vector $\bm{y}\in\mathbb{R}^{m}$ in a dictionary $\mathcal{A}$ consists of finding $k\ll m$ coefficients $x_{i}$ and atoms $\bm{a}_{i}\in\mathcal{A}$ such that $\bm{y}\approx\sum_{i=1}^{k}\bm{a}_{i}x_{i}$. The dictionary $\mathcal{A}$ can be either discrete, i.e., composed of a finite number of elements, or “continuous”, i.e., composed of an uncountably infinite number of atoms.

Sparse approximations have proven to be relevant in many application domains, and a great number of procedures to find “good” sparse approximations have been proposed in the literature: convex relaxation [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], greedy algorithms [11, 12, 13, 14, 15], Bayesian approaches [16, 17, 18, 19], etc. Many popular instances of these procedures rely on the same “atom selection” step, i.e., the new atom added to the support at each iteration, say $\bm{a}_{\text{select}}$, satisfies

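$$\bm{a}_{\text{select}}\in\operatorname*{arg\,max}_{\bm{a}\in\mathcal{A}}\left\langle\bm{r},\bm{a}\right\rangle,\tag{1}$$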

where $\left\langle\bm{\cdot},\bm{\cdot}\right\rangle$ denotes the inner product¹ and $\bm{r}$ is the current “residual”, i.e., the original signal from which the contributions of the previously selected atoms have been removed. The Frank-Wolfe algorithm [1], matching pursuit [11] and orthogonal matching pursuit [12] are popular instances of algorithms using (1). It is worth noting that (1) constitutes the core of most sparse-approximation algorithms for continuous dictionaries [15, 6], where some standard “matrix-vector” operations available in the discrete setting are no longer possible.

¹ When the coefficients in $\bm{x}$ may be negative, it is common to use the absolute value of the inner product. Alternatively, negative coefficients can be handled with (1) by using a dictionary of doubled size containing the atoms $\bm{a}_{i}$ and their negatives $-\bm{a}_{i}$.
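A minimal sketch of the exhaustive evaluation of (1) for a discrete dictionary, in plain Python (lists of floats, no external libraries). The helper names `inner`, `select_atom_brute_force` and `symmetrize` are hypothetical; the last one illustrates the doubled-dictionary trick mentioned above for negative coefficients.

```python
import math

def inner(u, v):
    """Euclidean inner product <u, v> of two equal-length real vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def select_atom_brute_force(r, dictionary):
    """Exhaustive evaluation of (1): index of the atom maximizing <r, a>.

    Returns (index, value, number of inner products). Every atom is visited,
    so the cost is card(A) inner products of length m.
    """
    best_idx, best_val = None, -math.inf
    for idx, a in enumerate(dictionary):
        val = inner(r, a)
        if val > best_val:          # strict '>' keeps the first maximizer
            best_idx, best_val = idx, val
    return best_idx, best_val, len(dictionary)

def symmetrize(dictionary):
    """Doubled dictionary {a_i} U {-a_i}: index i >= n stands for -a_{i-n}."""
    return dictionary + [[-x for x in a] for a in dictionary]

# Usage: with the doubled dictionary, the atom -a_1 can be selected.
D = [[1.0, 0.0], [0.0, 1.0]]
r = [0.2, -0.9]
print(select_atom_brute_force(r, D)[0])              # 0, since 0.2 > -0.9
print(select_atom_brute_force(r, symmetrize(D))[0])  # 3, i.e. the atom -a_1
```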

A brute-force evaluation² of (1) may become resource-consuming when the number of atoms in $\mathcal{A}$ is large, since it requires exploring the whole dictionary to find the atom most correlated with $\bm{r}$. In this work, we propose a strategy to alleviate the complexity of (1). Our procedure is inspired by [20]: it consists of performing simple tests that identify groups of atoms which cannot attain the maximum in (1). Interestingly, the proposed approach provides a rigorous framework for recently proposed procedures based on approximations of continuous dictionaries [15, 6]. In the rest of this abstract, we do not elaborate on the connections with [15, 6] (details will be provided during the conference) but rather focus on the description of the proposed methodology.

² In the discrete setting, the standard approach consists of evaluating $\left\langle\bm{r},\bm{a}\right\rangle$ for all $\bm{a}\in\mathcal{A}$, leading to a complexity scaling as $\mathcal{O}(\mathrm{card}(\mathcal{A})m)$. In the continuous setting, a classic approach consists of running gradient ascent algorithms initialized on a fine discretization of the dictionary.
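The continuous-setting approach (gradient ascent initialized on a discretization of the dictionary) can be sketched as follows for the Gaussian dictionary of Section 3. This is one possible instance, not the authors' implementation: the sampling points (the integers 0 to 100), the unit-norm scaling of the atoms, the coarse grid step, the finite-difference gradient and the backtracking rule are all assumptions of this sketch, and the function names are hypothetical.

```python
import math

SIGMA2 = 10.0                        # variance of the Gaussian atoms (Section 3)
X = [float(k) for k in range(101)]   # ASSUMED sampling points 0, 1, ..., 100

def atom(mu):
    """Sampled Gaussian a(mu), scaled to unit l2-norm (an assumption)."""
    g = [math.exp(-(x - mu) ** 2 / (2.0 * SIGMA2)) for x in X]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def corr(r, mu):
    """Correlation <r, a(mu)> between the residual and the atom at mu."""
    return sum(ri * ai for ri, ai in zip(r, atom(mu)))

def select_continuous(r, grid_step=5.0, h=1e-4, iters=50):
    """Coarse-grid initialization followed by gradient ascent on mu."""
    grid = [k * grid_step for k in range(int(100 / grid_step) + 1)]
    mu = max(grid, key=lambda m: corr(r, m))   # best point of the coarse grid
    step = 1.0
    for _ in range(iters):
        # central finite-difference estimate of d<r, a(mu)>/dmu
        grad = (corr(r, mu + h) - corr(r, mu - h)) / (2.0 * h)
        # backtracking: halve the step until the correlation increases
        while step > 1e-10:
            cand = min(100.0, max(0.0, mu + step * grad))  # stay in [0, 100]
            if corr(r, cand) > corr(r, mu):
                mu, step = cand, 2.0 * step   # accept, try a longer step next
                break
            step /= 2.0
    return mu

# Usage: a residual equal to the atom at mu = 42.3 is located off the grid.
print(round(select_continuous(atom(42.3)), 3))
```

The coarse grid only provides a starting point; the ascent then refines $\mu$ continuously, which is why the brute-force cost in the continuous setting depends on the fineness of the initial discretization.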

2 Proposed strategy

Our proposed selection strategy is based on the following observations. If $\bar{\mathcal{A}}\subseteq\mathcal{A}$ ,

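$$\max_{\bm{a}\in\bar{\mathcal{A}}}\left\langle\bm{r},\bm{a}\right\rangle\leq\max_{\bm{a}\in\mathcal{A}}\left\langle\bm{r},\bm{a}\right\rangle.$$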

Hence, letting $\tau\triangleq\max\limits_{\bm{a}\in\bar{\mathcal{A}}}\left\langle\bm{r},\bm{a}\right\rangle$ , we have $\forall\bm{a}\in\mathcal{A}$ :

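$$\left\langle\bm{r},\bm{a}\right\rangle<\tau\;\Rightarrow\;\bm{a}\notin\operatorname*{arg\,max}_{\tilde{\bm{a}}\in\mathcal{A}}\left\langle\bm{r},\tilde{\bm{a}}\right\rangle.\tag{2}$$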

In other words, if $\bm{a}\in\mathcal{A}$ is an atom which satisfies the inequality in the left-hand side of (2), then this atom is surely not the one to be selected by (1). Elaborating on this observation, we further have:

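$$\max_{\bm{a}\in\mathcal{R}}\left\langle\bm{r},\bm{a}\right\rangle<\tau\;\Rightarrow\;\forall\bm{a}\in\mathcal{A}\cap\mathcal{R}:\ \bm{a}\notin\operatorname*{arg\,max}_{\tilde{\bm{a}}\in\mathcal{A}}\left\langle\bm{r},\tilde{\bm{a}}\right\rangle,\tag{3}$$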

where $\mathcal{R}$ is an arbitrary subset of $\mathbb{R}^{m}$, which we refer to as a “region” in the sequel. The operational meaning of (3) is as follows: if the inequality in the left-hand side is satisfied, no atom in $\mathcal{A}\cap\mathcal{R}$ can attain the maximum of $\left\langle\bm{r},\bm{a}\right\rangle$. The entire set $\mathcal{A}\cap\mathcal{R}$ can thus be ignored, which reduces the number of candidate atoms to be tested in the selection (1).

Implication (3) is the basis of our complexity reduction method. More specifically, we consider (3) with particular choices of the region $\mathcal{R}$, motivated by two requirements: i) $\mathcal{R}$ should lead to an easy evaluation of $\max\limits_{\bm{a}\in\mathcal{R}}\left\langle\bm{r},\bm{a}\right\rangle$; ii) $\mathcal{R}$ should approximate some part of $\mathcal{A}$ as tightly as possible, since larger regions typically lead to inequalities that are harder to satisfy.

In our contribution, we show that the first requirement is satisfied for some particular geometries of region $\mathcal{R}$ . In particular, we consider “sphere”, “dome” and “slice” geometries. Sphere and dome regions can be formally expressed as

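$$\begin{array}{ll}\mathcal{B}_{\bm{t},\epsilon}\triangleq\{\bm{a}:\left\|\bm{a}-\bm{t}\right\|_{2}\leq\epsilon\}&\text{(sphere)}\\ \mathcal{D}_{\bm{t},\epsilon}\triangleq\{\bm{a}:\left\langle\bm{t},\bm{a}\right\rangle\geq\epsilon,\ \left\|\bm{a}\right\|_{2}=1\}&\text{(dome)},\end{array}$$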

whereas the mathematical characterization of the slice regions is more involved and not detailed in this abstract. We show that for these choices of regions, $\max\limits_{\bm{a}\in\mathcal{R}}\left\langle\bm{r},\bm{a}\right\rangle$ admits a simple analytical expression. In particular, evaluating $\max\limits_{\bm{a}\in\mathcal{R}}\left\langle\bm{r},\bm{a}\right\rangle$ for sphere and dome regions basically requires the computation of one single inner product $\left\langle\bm{t},\bm{r}\right\rangle$ .
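The abstract does not state the closed forms; the following derivation is ours and assumes a unit-norm center $\bm{t}$ and $-1\leq\epsilon\leq 1$ for the dome. By Cauchy-Schwarz, the sphere bound is $\left\langle\bm{r},\bm{t}\right\rangle+\epsilon\left\|\bm{r}\right\|_2$. For the dome, writing $\bm{r}=\alpha\bm{t}+\bm{r}_\perp$ with $\alpha=\left\langle\bm{r},\bm{t}\right\rangle$, the maximum is $\left\|\bm{r}\right\|_2$ if $\alpha\geq\epsilon\left\|\bm{r}\right\|_2$, and $\epsilon\alpha+\sqrt{1-\epsilon^2}\,\left\|\bm{r}_\perp\right\|_2$ otherwise. Besides $\left\|\bm{r}\right\|_2$, which is shared by all regions, both only need $\left\langle\bm{t},\bm{r}\right\rangle$. A sketch with hypothetical function names:

```python
import math

def inner(u, v):
    """Euclidean inner product of two equal-length real vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def sphere_max(r, t, eps):
    """max of <r, a> over the ball ||a - t||_2 <= eps.

    Cauchy-Schwarz: <r, a> = <r, t> + <r, a - t> <= <r, t> + eps * ||r||_2,
    with equality at a = t + eps * r / ||r||_2.
    """
    return inner(r, t) + eps * math.sqrt(inner(r, r))

def dome_max(r, t, eps):
    """max of <r, a> over {a : <t, a> >= eps, ||a||_2 = 1}.

    Assumes ||t||_2 = 1 and -1 <= eps <= 1. Writing r = alpha * t + r_perp,
    the best unit vector is r / ||r||_2 if it lies in the dome; otherwise
    the maximum is reached on the boundary <t, a> = eps.
    """
    alpha = inner(r, t)
    norm_r = math.sqrt(inner(r, r))
    if alpha >= eps * norm_r:
        return norm_r
    beta = math.sqrt(max(norm_r ** 2 - alpha ** 2, 0.0))   # ||r_perp||_2
    return eps * alpha + math.sqrt(1.0 - eps ** 2) * beta

# Usage: r = e_1, center t = e_2, eps = 0.5.
print(sphere_max([1.0, 0.0], [0.0, 1.0], 0.5))   # 0.5
print(dome_max([1.0, 0.0], [0.0, 1.0], 0.5))     # sqrt(0.75), about 0.866
```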

We address the second requirement by proposing a methodology to automatically adapt the size of $\mathcal{R}$ (via a tuning of $\epsilon$ ) in order to satisfy the inequality in the left-hand side of (3). We do not detail this procedure here but mention that the evaluation of the “optimal” value of $\epsilon$ has a negligible complexity.
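Since the tuning procedure is not detailed here, the following is only a plausible sketch under our own assumptions, not the authors' method. For a sphere, the largest admissible radius follows directly from the bound $\left\langle\bm{r},\bm{t}\right\rangle+\epsilon\left\|\bm{r}\right\|_2<\tau$; for a dome (unit-norm $\bm{t}$), the smallest admissible threshold $\epsilon$ (i.e., the largest dome) is found by bisection on the decreasing branch of the dome bound. Once $\left\langle\bm{r},\bm{t}\right\rangle$ and $\left\|\bm{r}\right\|_2$ are known, each rule takes a fixed number of scalar operations. Function names and the safety margin are hypothetical.

```python
import math

def inner(u, v):
    """Euclidean inner product of two equal-length real vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def tune_sphere_radius(r, t, tau, margin=1e-9):
    """Largest radius eps (up to a relative margin) with <r,t> + eps*||r|| < tau."""
    slack = tau - inner(r, t)
    if slack <= 0:
        return None                  # even the center t cannot be discarded
    norm_r = math.sqrt(inner(r, r))
    if norm_r == 0:
        return math.inf              # r = 0: any radius satisfies the test
    return (1.0 - margin) * slack / norm_r

def tune_dome_threshold(r, t, tau, iters=100):
    """Smallest eps (largest dome around unit-norm t) with dome bound < tau."""
    alpha = inner(r, t)
    norm_r = math.sqrt(inner(r, r))
    if norm_r < tau:
        return -1.0                  # the whole unit sphere passes the test
    if alpha >= tau:
        return None                  # even an arbitrarily small dome fails
    beta = math.sqrt(max(norm_r ** 2 - alpha ** 2, 0.0))

    def bound(e):                    # dome bound on the branch e >= alpha/||r||
        return e * alpha + math.sqrt(max(1.0 - e * e, 0.0)) * beta

    lo, hi = alpha / norm_r, 1.0     # bound(lo) = ||r|| >= tau > alpha = bound(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bound(mid) < tau:
            hi = mid
        else:
            lo = mid
    return hi                        # invariant: bound(hi) < tau

print(tune_sphere_radius([1.0, 0.0], [0.0, 1.0], 0.5))
print(tune_dome_threshold([1.0, 0.0], [0.0, 1.0], 0.5))
```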

We thus propose the following strategy (summarized in Algorithm 1) to speed up the computation of (1). We select a set $\bar{\mathcal{A}}\subset\mathcal{A}$ and $L$ regions $\{\mathcal{R}_{l}\}_{l=1}^{L}$, and apply test (3) to each region. Each test identifies a (possibly empty) set of atoms which do not attain the maximum of $\left\langle\bm{r},\bm{a}\right\rangle$. We then evaluate (1) on a reduced dictionary:

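$$\bm{a}_{\text{select}}\in\operatorname*{arg\,max}_{\bm{a}\in\mathcal{A}\setminus\mathcal{A}_{\mathrm{removed}}}\left\langle\bm{r},\bm{a}\right\rangle,$$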

where $\mathcal{A}_{\mathrm{removed}}$ denotes the set of atoms which have been removed by the tests (3).
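Putting the pieces together, here is a hedged Python sketch of the strategy for a discrete dictionary with sphere regions centered on atoms and $\bar{\mathcal{A}}=\{\bm{t}_l\}$, as in Section 3. It is not the authors' Algorithm 1; our own assumptions are that the distances between each center and all atoms are computed and sorted offline (they do not depend on $\bm{r}$), that each radius is set to its largest admissible value $(\tau-\left\langle\bm{r},\bm{t}_l\right\rangle)/\left\|\bm{r}\right\|_2$, and that the inner products $\left\langle\bm{r},\bm{t}_l\right\rangle$ computed for $\tau$ are reused. Function names are hypothetical.

```python
import bisect
import math

def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def precompute_regions(dictionary, center_ids):
    """Offline step (independent of r): for each center t_l, sort all atoms
    by their distance ||a - t_l||_2."""
    regions = []
    for c in center_ids:
        pairs = sorted((math.dist(a, dictionary[c]), j)
                       for j, a in enumerate(dictionary))
        regions.append((c, [d for d, _ in pairs], [j for _, j in pairs]))
    return regions

def select_with_screening(r, dictionary, regions):
    """Returns (selected index, its correlation, number of inner products)."""
    corr_t = {c: inner(r, dictionary[c]) for c, _, _ in regions}  # <r, t_l>
    tau = max(corr_t.values())             # tau = max over A_bar = {t_l}
    norm_r = math.sqrt(inner(r, r))
    n_ip = len(corr_t) + 1                 # center correlations + ||r||
    removed = set()
    if norm_r > 0:
        for c, dists, ids in regions:
            # sphere test (3): every atom at distance < eps_max from t_l
            # satisfies <r, a> <= <r, t_l> + dist * ||r|| < tau
            eps_max = (tau - corr_t[c]) / norm_r
            k = bisect.bisect_left(dists, eps_max)
            removed.update(ids[:k])
    best_idx, best_val = None, -math.inf
    for j, a in enumerate(dictionary):     # selection on A \ A_removed
        if j in removed:
            continue
        if j in corr_t:
            val = corr_t[j]                # reuse <r, t_l>
        else:
            val = inner(r, a)
            n_ip += 1
        if val > best_val:
            best_idx, best_val = j, val
    return best_idx, best_val, n_ip

# Usage on a toy dictionary: 200 unit vectors of R^2, every 10th is a center.
n = 200
D = [[math.cos(-math.pi / 2 + math.pi * j / n),
      math.sin(-math.pi / 2 + math.pi * j / n)] for j in range(n)]
regions = precompute_regions(D, range(0, n, 10))
r = [2 * math.cos(0.3), 2 * math.sin(0.3)]
idx, val, n_ip = select_with_screening(r, D, regions)
print(f"selected atom {idx} using {n_ip} inner products (brute force: {n})")
```

Because every removed atom satisfies $\left\langle\bm{r},\bm{a}\right\rangle<\tau\leq\max_{\bm{a}\in\mathcal{A}}\left\langle\bm{r},\bm{a}\right\rangle$, the selected atom is the same as with the exhaustive search; only the number of inner products changes.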

For discrete dictionaries, the computational complexity of the proposed method (for sphere and dome regions) evolves as

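$$\mathcal{O}\Big(\underbrace{\mathrm{card}(\bar{\mathcal{A}})\,m}_{\text{evaluation of }\tau}+\underbrace{L\,m}_{\text{evaluation of the tests (3)}}+\underbrace{\mathrm{card}(\mathcal{A}\setminus\mathcal{A}_{\mathrm{removed}})\,m}_{\text{evaluation of the reduced selection}}\Big).$$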

This has to be compared with the complexity of a brute-force evaluation of (1), i.e., $\mathcal{O}(\mathrm{card}(\mathcal{A})m)$. In the next section, we assess the computational gain allowed by the proposed methodology.
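As a back-of-the-envelope check (our own arithmetic, not a result reported in the paper), dividing the two costs gives the break-even condition below. The values plugged in are the DOA settings of Section 3, $\mathrm{card}(\mathcal{A})=1000$ and $L=\mathrm{card}(\bar{\mathcal{A}})=100$, while the size of the reduced dictionary is left symbolic:

$$\text{gain}=\frac{\mathrm{card}(\mathcal{A})\,m}{\big(\mathrm{card}(\bar{\mathcal{A}})+L+\mathrm{card}(\mathcal{A}\setminus\mathcal{A}_{\mathrm{removed}})\big)\,m}=\frac{1000}{200+\mathrm{card}(\mathcal{A}\setminus\mathcal{A}_{\mathrm{removed}})}.$$

Under this count, the method pays off as soon as the tests leave fewer than 800 candidate atoms. When $\bar{\mathcal{A}}=\{\bm{t}_l\}_{l=1}^{L}$, the inner products $\left\langle\bm{r},\bm{t}_l\right\rangle$ used to compute $\tau$ are exactly those needed by the sphere and dome tests, so the first two terms can share them; since at least these 100 inner products are always computed, the gain per selection step then cannot exceed $1000/100=10$.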

3 Experiments

We compare the proposed selection procedure with the classical exhaustive search on two different problems: a direction-of-arrival (DOA) estimation problem with a discrete dictionary and a Gaussian deconvolution problem with a continuous dictionary. For DOA estimation, we count the number of scalar products required to perform the selection step (1), with and without the proposed method, which provides a quantitative assessment of the computational gain induced by our method. The Gaussian deconvolution problem serves as a proof of concept, illustrating the interest of our method for continuous dictionaries.

In the DOA estimation problem, we consider a dictionary composed of $n=1000$ normalized steering vectors of size $m=100$, each corresponding to a different angle of arrival in $[-\pi/2,\pi/2]$. In the Gaussian deconvolution problem, $\mathcal{A}=\{a(\mu):\mu\in[0,100]\}$, where $a(\mu)$ is a Gaussian function with mean $\mu$ and variance $\sigma^{2}=10$. In both simulations, the observed signal $\bm{y}$ is constructed as a random linear combination of $5$ atoms of $\mathcal{A}$. The coefficients associated with the atoms are realizations of a uniform distribution on $[-1,1]$.
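A hedged sketch of the Gaussian-deconvolution data model just described. The sampling grid (the 101 integers of $[0,100]$), the unit-norm scaling of each atom, drawing the 5 means uniformly in $[0,100]$, and the seeded generator are our assumptions; the text only specifies the range of $\mu$, $\sigma^2=10$, 5 atoms and coefficients drawn from $\mathcal{U}[-1,1]$. Function names are hypothetical.

```python
import math
import random

SIGMA2 = 10.0                        # variance of the Gaussian atoms
X = [float(k) for k in range(101)]   # ASSUMED sampling grid on [0, 100]

def gaussian_atom(mu):
    """Sampled Gaussian a(mu) with mean mu and variance SIGMA2, scaled to
    unit l2-norm (the normalization is an assumption of this sketch)."""
    g = [math.exp(-(x - mu) ** 2 / (2.0 * SIGMA2)) for x in X]
    s = math.sqrt(sum(v * v for v in g))
    return [v / s for v in g]

def synthetic_observation(k=5, seed=0):
    """y = sum_i x_i a(mu_i) with k random atoms and x_i ~ U[-1, 1]."""
    rng = random.Random(seed)
    mus = [rng.uniform(0.0, 100.0) for _ in range(k)]   # ASSUMED uniform means
    coeffs = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    y = [0.0] * len(X)
    for mu, c in zip(mus, coeffs):
        for i, v in enumerate(gaussian_atom(mu)):
            y[i] += c * v
    return y, mus, coeffs

y, mus, coeffs = synthetic_observation(seed=3)
print([round(mu, 1) for mu in mus], [round(c, 2) for c in coeffs])
```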

In Fig. 1, we consider the DOA estimation problem. We apply 5 iterations of orthogonal matching pursuit to $\bm{y}$ (which requires solving (1) at each iteration). Each column of the figure represents the cumulative number of inner products computed with an exhaustive search of $\bm{a}_{\text{select}}$ over the whole dictionary (blue) and with the proposed method described in Algorithm 1 (red). We set $L=100$, define the centers of the (dome) regions $\{\bm{t}_{l}\}_{l=1}^{L}$ by a regular subsampling of the dictionary, and let $\bar{\mathcal{A}}=\{\bm{t}_{l}\}_{l=1}^{L}$. The proposed method reduces the complexity by one order of magnitude with respect to the brute-force approach.

In Fig. 1, we consider the Gaussian deconvolution problem and show the set of removed atoms, $\mathcal{A}\cap\mathcal{R}_{l}$, for each (sphere) region $\{\mathcal{R}_{l}\}_{l=1}^{L=100}$. The centers of the regions $\{\bm{t}_{l}\}_{l=1}^{L}$ are chosen by a regular subsampling of $\mu\in\left[0,100\right]$. The size of each region (value of $\epsilon$) is automatically tuned to satisfy (if possible) the inequality in the left-hand side of (3), as discussed in Section 2. We set $\bar{\mathcal{A}}=\{\bm{t}_{l}\}_{l=1}^{L=100}$. The figure shows, for each test region, the interval of values of $\mu$ that can be removed from the dictionary. By the end of the proposed elimination procedure, the search space is reduced to a small interval around $\mu\approx 90$.
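To see why each sphere region corresponds to an interval of $\mu$, one can use a continuous model (our assumption; the abstract does not specify the normalization): for $L_2$-normalized Gaussians of variance $\sigma^2$, $\left\langle a(\mu),a(\mu')\right\rangle=\exp(-(\mu-\mu')^2/(4\sigma^2))$, and for unit-norm atoms $\left\|a(\mu)-a(\mu')\right\|_2^2=2-2\left\langle a(\mu),a(\mu')\right\rangle$. Hence, when $\epsilon^2<2$, the sphere $\mathcal{B}_{a(\mu_l),\epsilon}$ contains exactly the atoms with $|\mu-\mu_l|\leq 2\sigma\sqrt{-\ln(1-\epsilon^2/2)}$. The hypothetical helper below returns that interval, clipped to $[0,100]$.

```python
import math

SIGMA2 = 10.0   # variance of the Gaussian atoms (Section 3)

def gaussian_overlap(mu1, mu2, sigma2=SIGMA2):
    """<a(mu1), a(mu2)> for L2-normalized continuous Gaussians (assumed model)."""
    return math.exp(-(mu1 - mu2) ** 2 / (4.0 * sigma2))

def removed_mu_interval(mu_center, eps, sigma2=SIGMA2, mu_range=(0.0, 100.0)):
    """Interval of mu such that ||a(mu) - a(mu_center)||_2 <= eps.

    Uses ||a - t||^2 = 2 - 2 <a, t> for unit-norm atoms, which gives
    |mu - mu_center| <= 2 sigma sqrt(-ln(1 - eps^2 / 2)).
    """
    if eps < 0:
        return None
    if eps * eps >= 2.0:
        # overlaps are always positive, so every atom lies in the sphere
        return mu_range
    half = 2.0 * math.sqrt(sigma2) * math.sqrt(-math.log(1.0 - eps * eps / 2.0))
    return (max(mu_range[0], mu_center - half), min(mu_range[1], mu_center + half))

print(removed_mu_interval(50.0, 0.5))   # symmetric interval around mu = 50
```

In practice the radius would come from the residual, e.g. the largest admissible value $(\tau-\left\langle\bm{r},\bm{t}_l\right\rangle)/\left\|\bm{r}\right\|_2$ for the sphere test.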

Bibliography


[1] Marguerite Frank and Philip Wolfe. An algorithm for quadratic programming. Naval Research Logistics (NRL), 3(1-2):95–110, 1956.
[2] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
[3] Ingrid Daubechies, Michel Defrise, and Christine De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57(11):1413–1457, 2004.
[4] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[5] Nikhil Rao, Parikshit Shah, and Stephen Wright. Forward–backward greedy algorithms for atomic norm regularization. IEEE Transactions on Signal Processing, 63(21):5798–5811, 2015.
[6] Chaitanya Ekanadham, Daniel Tranchina, and Eero P. Simoncelli. Recovery of sparse translation-invariant signals with continuous basis pursuit. IEEE Transactions on Signal Processing, 59(10):4735–4744, 2011.
[7] Gongguo Tang, Badri Narayan Bhaskar, Parikshit Shah, and Benjamin Recht. Compressed sensing off the grid. IEEE Transactions on Information Theory, 59(11):7465–7490, 2013.
[8] Angeliki Xenaki and Peter Gerstoft. Grid-free compressive beamforming. The Journal of the Acoustical Society of America, 137(4):1923–1935, 2015.