Discrete Optimization Methods for Group Model Selection in Compressed Sensing
Bubacarr Bah, Jannis Kurtz, Oliver Schaudt

TL;DR
This paper advances algorithms for recovering group-sparse signals in compressed sensing by developing new projection methods and analyzing their performance with Gaussian and expander sensing matrices.
Contribution
It introduces improved algorithms for group model projections, including exact and approximate methods, and extends analysis to more general group models beyond existing approaches.
Findings
Exact projection algorithms based on dynamic programming and Benders' Decomposition.
Approximate projections using greedy methods and LP-rounding.
Successful recovery demonstrated with Gaussian and expander sensing matrices.
Abstract
In this article we study the problem of signal recovery for group models. More precisely for a given set of groups, each containing a small subset of indices, and for given linear sketches of the true signal vector which is known to be group-sparse in the sense that its support is contained in the union of a small number of these groups, we study algorithms which successfully recover the true signal just by the knowledge of its linear sketches. We derive model projection complexity results and algorithms for more general group models than the state-of-the-art. We consider two versions of the classical Iterative Hard Thresholding algorithm (IHT). The classical version iteratively calculates the exact projection of a vector onto the group model, while the approximate version (AM-IHT) uses a head- and a tail-approximation iteratively. We apply both variants to group models and analyse the…
| Paper | Signal | Sensing | |
|---|---|---|---|
| Model | Sample | Ensemble | |
| complexity | |||
| [49] | Groups with overlaps | – | – |
| [21] | Blocks | Gaussian | |
| [5] | Binary trees | Subgaussian | |
| Blocks | |||
| [53] | Constrained earth mover distance | – | – |
| [34] | Binary Trees | Model RIP-1 matrices | |
| Blocks | |||
| [3] | -ary Trees | Expander | |
| Groups with loopless overlaps | |||
| [29] | Weighted graph model | Gaussian | |
| [44] | Blocks | Expander | |
| This | Groups with overlaps | Subgaussian | |
| work | & Expander | ||
| Paper | Reconstruction | ||||
|---|---|---|---|---|---|
| Algorithm | Model | Projection | Runtime | Instance | |
| name | projection | algorithm | complexity | optimality | |
| [49] | Group-LASSO | Exact | CD & BCD | Polynomial | – |
| [21] | -minimization | Exact | SOCP | Polynomial | |
| [5] | CoSaMP & IHT | Approximate | – | – | |
| [53] | EMD-IHT | Exact | – | Polynomial | – |
| EMD-CoSaMP | |||||
| [34] | Model-based | Approximate | – | Exponential | |
| sparse recovery | |||||
| [3] | MEIHT | Exact | DP | ||
| [29] | PCSF-TAIL | Approximate | – | – | |
| GRAPH-CoSaMP | |||||
| [44] | -minimization | Exact | SOCP | Polynomial | |
| MEIHT | Exact | DP or BD | |||
| This | Model-IHT | Exact | or exponential | ||
| work | AM-IHT | Approximate | LP-rounding | Polynomial | – |
| AM-EIHT | Approximate | & greedy | |||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Discrete Optimization Methods for Group Model Selection in Compressed Sensing
Bubacarr Bah AIMS South Africa, & Stellenbosch University, 6 Melrose Road, Muizenberg, Cape Town 7945, South Africa
Jannis Kurtz Chair for Mathematics of Information Processing, RWTH Aachen University, Pontdriesch 10, 52062 Aachen, Germany
Oliver Schaudt Chair for Mathematics of Information Processing, RWTH Aachen University, Pontdriesch 10, 52062 Aachen, Germany
Abstract
In this article we study the problem of signal recovery for group models. More precisely for a given set of groups, each containing a small subset of indices, and for given linear sketches of the true signal vector which is known to be group-sparse in the sense that its support is contained in the union of a small number of these groups, we study algorithms which successfully recover the true signal just by the knowledge of its linear sketches. We derive model projection complexity results and algorithms for more general group models than the state-of-the-art. We consider two versions of the classical Iterative Hard Thresholding algorithm (IHT). The classical version iteratively calculates the exact projection of a vector onto the group model, while the approximate version (AM-IHT) uses a head- and a tail-approximation iteratively. We apply both variants to group models and analyse the two cases where the sensing matrix is a Gaussian matrix and a model expander matrix.
To solve the exact projection problem on the group model, which is known to be equivalent to the maximum weight coverage problem, we use discrete optimization methods based on dynamic programming and Benders’ Decomposition. The head- and tail-approximations are derived by a classical greedy-method and LP-rounding, respectively.
Compressed Sensing, Group Models, Iterative Hard Thresholding, Maximum Weight Coverage
1 Introduction
In many applications involving sensors or sensing systems an unknown sparse signal has to be recovered from a relatively small number of measurements. The reconstruction problem in standard compressed sensing attempts to recover an unknown -sparse signal , i.e. it has at most non-zero entries, from its (potentially noisy) linear measurements . Here, for , and is a noise vector, typically with a bounded noise level ; see [19, 13, 14]. A well-known result is that, if is a random Gaussian matrix, the number of measurements required for most of the classical algorithms like -minimization or Iterative Hard Thresholding (IHT) to successfully recover the true signal is [15, 10].
In model-based compressed sensing we exploit second-order structures beyond the first order sparsity and compressibility structures of a signal to more efficiently encode and more accurately decode the signal. Efficient encoding means taking a fewer number of measurements than in the standard compressed sensing setting, while accurate decoding does not only include smaller recovery error but better interpretability of the recovered solution than in the standard compressed sensing setting. The second order structures of the signal are usually referred to as the structured sparsity of the signal. The idea is, besides standard sparsity, to take into account more complicated structures of the signal [5]. Nevertheless most of the classical results and algorithms for standard compressed sensing can be adapted to the model-based framework [5].
Numerous applications of model-based compressed sensing exist in practice. Key amongst these applications is the multiple measurement vector (MMV) problem, which can be modelled as a block-sparse recovery problem, [21]. The tree-sparse model has been well-exploited in a number of wavelet-based signal processing applications [5]. In the sparse matrix setting (see Section 2) the model-based compressed sensing was used to solve the Earth Mover Distance problem (EMD). The EMD problem introduced in [53] is motivated by the task of reconstructing time sequences of spatially sparse signals, e.g. seismic measurements; see also [26]. In addition, there are many more potential applications in linear sketching including data streaming [47], graph sketching [1, 25], breaking privacy of databases via aggregate queries [20], and in sparse regression codes or sparse superposition codes (SPARC) decoding [38, 54], which is also an MMV problem.
Structured sparsity models include tree-sparse, block-sparse, and group-sparse models. For instance, for block-sparse models with dense Gaussian sensing matrices it has been established in [5] that the number of required measurements to ensure recovery is as opposed to in the standard compressed sensing setting. Furthermore, in the sparse matrix setting, precisely for adjacency matrices of model expander graphs (also known as model expander matrices), the tree-sparse model only requires measurements [34, 3], which is smaller than the standard compressed sensing sampling rate stated above. Moreover, all proposed algorithms that perform an exact model projection, which is to find the closest vector in the model space for a given signal, guarantee recovery of a solution belonging to the model space, which is then more interpretable than applying off-the-shelf standard compressed sensing algorithms [3].
As the exact model projection problem used in many of the classical algorithms may become theoretically and computationally hard for specific sparsity models, approximation variants of some well-known algorithms like the Model-IHT have been introduced in [27]. Instead of iteratively solving the exact model projection problem, this algorithm, called AM-IHT, uses a head- and a tail-approximation to recover the signal which is computationally less demanding in general. The latter computational benefit comes along with a larger number of measurements required to obtain successful recovery with weaker recovery guarantees: the typical speed versus accuracy trade-off.
A special class of structured sparsity models are group models, where the support of the signal is known to be contained in the union of a small number of groups of indices. Group models were already studied extensively in the literature in the compressed sensing context; see [6, 36, 49]. Its choice is motivated by several applications, e.g. in image processing; see [44, 50, 51]. As it was shown in [4] the exact projection problem for group models is NP-hard in general but can be solved in polynomial time by dynamic programming if the intersection graph of the groups has no cycles. The latter case is quite restricting, since as a consequence each element is contained in at most two groups. In this work we extend existing results for the Model-IHT algorithm and its approximation variant (AM-IHT) derived in [27] to group models and model expander matrices. We focus on deriving discrete optimization methods to solve the exact projection problem and the head- and tail-approximations for much more general classes of group models than the state-of-the-art.
In Section 2 we present the main preliminary results regarding compressed sensing for structured sparsity models and group models. In Section 3 we study recovery algorithms using exact projection oracles. We first show that for group models with low treewidth, the projection problem can be solved in polynomial time by dynamic programming which is a generalization of the result in [4]. We then adapt known theoretical results for model expander matrices to these more general group models. To solve the exact projection problem for general group models we apply a Benders’ decomposition procedure. It can be even used for the more general assumption that we seek a signal which is group-sparse and additionally sparse in the classical sense. In Section 4 we study recovery algorithms using approximation projection oracles, namely head- and tail-approximations. We apply the known results in [27] to group models of low frequency and show that the required head- and tail-approximations for group models can be solved by a classical greedy-method and LP rounding, respectively. In Section 5 we test all algorithms, including Model-IHT, AM-IHT, MEIHT and AM-EIHT, on overlapping block-groups and compare the number of required measurements, iterations and the run-time.
Summary of our contributions:
- •
We study the Model-Expander IHT (MEIHT) algorithm, which was analysed in [3] for tree-sparse and loopless group-sparse signals, and extend the existing results to general group models, proving convergence of the algorithm.
- •
We extend the results in [4] by proving that the projection problem can be solved in polynomial time if the incidence graph of the underlying group model has bounded treewidth. This includes the case when the intersection graph has bounded treewidth, which generalizes the result for acyclic graphs derived in [4]. We complement the latter result with a hardness result that we use to justify the bounded treewidth approach.
- •
We derive a Benders’ decomposition procedure to solve the projection problem for arbitrary group models, assuming no restriction on the frequency or the structure of the groups. The latter procedure even works for the more general model combining group-sparsity with classical sparsity. We integrate the latter procedure into the Model-IHT and MEIHT algorithm.
- •
We apply the Approximate-Model IHT (AM-IHT) derived in [26, 28] to Gaussian and expander matrices and to the case of group models with bounded frequency, which is the maximal number of groups an element is contained in. In the expander case we call the algorithm AM-EIHT. To this end we derive both, head- and tail-approximations of arbitrary precision using a classical greedy method and LP-rounding. Using the AM-IHT and the results in [26, 28], this implies compressive sensing recovery guarantees for group-sparse signals. We show that the number of measurements needed to guarantee a successful recovery exceeds the number needed by the usual model-based compressed sensing bound [5, 9] only by a constant factor.
- •
We test the algorithms Model-IHT, MEIHT, AM-IHT and AM-EIHT on groups given by overlapping blocks for random signals and measurement matrices. We analyse and compare the minimal number of measurements needed for recovery, the run-time and the number of iterations of the algorithm.
2 Preliminaries
2.1 Notation
In most of this work scalars are denoted by ordinary letters (e.g. , ), vectors and matrices by boldface letters (e.g. , ), and sets by calligraphic capital letters (e.g., ). The cardinality of a set is denoted by and we define . Given , its complement is denoted by and is the restriction of to , i.e.
[TABLE]
The support of a vector is defined by . For a given we say a vector is -sparse if . For a matrix , the matrix denotes a sub-matrix of with columns indexed by . For a graph and , denotes the set of neighbours of , that is the set of nodes that are connected by an edge to the nodes in . We denote by an edge connecting node to node . The set denotes a group of size and a group model is any subset of ; while a group model of order is denoted by , which is a collection of any groups of . For a subset of groups we sometimes write
[TABLE]
The norm of a vector is defined as
[TABLE]
2.2 Compressed Sensing
Recall that the reconstruction problem in standard compressed sensing [19, 13, 14] attempts to recover an unknown -sparse signal , from its (potentially noisy) linear measurements , where , for and is a noise vector, typically with a bounded noise level . The reconstruction problem can be formulated as the optimization problem
[TABLE]
where is the number of non-zero components of . Problem (1) is usually relaxed to an -minimization problem by replacing with the -norm. It has been established that the solution minimizing the -norm coincides with the optimal solution of (1) under certain conditions [14]. Besides the latter approach the compressed sensing problem can be solved by a class of greedy algorithms, including the IHT [10]. A detailed discussion on compressed sensing algorithms can be found in [22].
The idea behind the IHT can be explained by considering the problem
[TABLE]
Under certain choices of and the latter problem is equivalent to (1) [10]. Based on the idea of gradient descent methods, (2) can be solved by iteratively taking a gradient descent step, followed by a hard thresholding operation, which sets all components to zero except the largest in magnitude. Starting with an initial guess , the -th IHT update is given by
[TABLE]
where is the hard threshold operator and is the adjoint matrix of .
Recovery guarantees of algorithms are typically given in terms of what is referred to as the instance optimality [14]. Precisely, an algorithm has instance optimality if for a given signal it always returns a signal with the following error bound
[TABLE]
where , are constants independent of the dimension of the signal and
[TABLE]
is the best -term approximation of a signal (in the -norm).
Ideally, we would like to have instance optimality [14]. It turned out that the instance optimality of the known algorithms depends mainly on the sensing matrix . Key amongst the tools used to analyse the suitability of is the restricted isometry property, which is defined in the following.
Definition 1** (RIP).**
A matrix satisfies the -norm restricted isometry property (RIP-p) of order , with restricted isometry constant (RIC) , if for all -sparse vectors
[TABLE]
Typically RIP without the subscript refers to case when . We use this general definition here because we will study the case later. The RIP is a sufficient condition on that guarantees optimal recovery of for most of the known algorithms. If the entries of are drawn i.i.d from a sub-Gaussian distribution and , then has RIP-2 with high probability and leads to the ideal instance optimality for most algorithms; see [15]. Note that the bound is asymptotically tight. On the other hand, deterministic constructions of or random with binary entries with non-zero mean do not achieve this optimal , and are faced with the so-called square root bottleneck where ; see [18, 16].
Sparse Sensing Matrices from Expander Graphs.
The computational benefits of sparse sensing matrices necessitated finding a way to circumvent the square root bottleneck for non-zero mean binary matrices. One such class of binary matrices is the class of adjacency matrices of expander graphs (henceforth referred to as expander matrices), which satisfy the weaker RIP-1. Expander graphs are objects of interest in pure mathematics and theoretical computer science, for a detailed discourse on this subject see [31]. We define an expander graph as follows:
Definition 2** (Expander graph).**
Let be a left-regular bipartite graph with left vertices, right vertices, a set of edges and left degree . If for any and any of size it holds , then is referred to as a -expander graph.
An expander matrix is the adjacency matrix of an expander graph. Choosing , then a random bipartite graph with left degree is an -expander graph with high probability [22]. Furthermore expander matrices achieve the sub-optimal instance optimality [8]. For completeness we state the lemma in [35] deriving the RIC for such matrices.
Lemma 3** (RIP-1 for Expander Matrices, [35]).**
Let be the adjacency matrix of a -expander graph , then for any -sparse vector , we have
[TABLE]
The most relevant algorithm that exploits the structure of the expander matrices is the Expander-IHT (EIHT) proposed in [22]. Similar to the IHT algorithm it performs updates
[TABLE]
where is the median operator and for each . For expander matrices the EIHT achieves instance optimality [22].
2.3 Model-based Compressed Sensing
Besides sparsity (and compressibility) signals do exhibit more complicated structures. When compressed sensing takes into account these more complicated structures (or models) in addition to sparsity, it is usually referred to as model-based compressed sensing or structured sparse recovery [5]. A precise definition is given in the following:
Definition 4** (Structured Sparsity Model [5]).**
A structured sparsity model is a collection of sets, with , of allowed structured supports .
Note that the classical -sparsity studied in Section 2.2 is a special case of a structured sparsity model where all supports of size at most are allowed. Popular structured sparsity models include tree-sparse, block-sparse, and group-sparse models [5]. In this work we study group-sparse models which we will introduce in Section 2.4.
Similar to the classical sparsity case the RIP property is defined for structured sparsity models.
Definition 5** (Model-RIP [5]).**
A matrix satisfies the -norm model restricted isometry property (-RIP-) with model restricted isometry constant (-RIC) , if for all vectors with it holds
[TABLE]
In [5] it was shown that for a matrix to have the Model-RIP with high probability the required number of measurements is for tree-sparse signals and for block-sparse signals with block size , when the sensing matrices are dense (typically sub-Gaussian). In general for a given structured sparsity model for sub-Gaussian random matrices the number of required measurements is where is the cardinality of the largest support in .
Furthermore the authors in [5] show that classical algorithms like the IHT can be modified for structured sparsity models to achieve instance optimality. To this end the hard thresholding operator used in the classical IHT is replaced by a model-projection oracle which for a given signal returns the closest signal over all signals having support in . We define the model-projection oracle in the following.
Definition 6** (Model-Projection Oracle [5]).**
Given , a model-projection oracle is a function such that for all we have and it holds
[TABLE]
From the definition it directly follows that if and [math] otherwise. Note that in the case of classical -sparsity the model-projection oracle is given by the hard thresholding operator . In contrast to this case, calculating the optimal model projection for a given signal and a given structured sparsity model may be computationally hard. Depending on the model finding the optimal model projection vector may be even NP-hard; see Section 2.4. The modified version of the IHT derived in [5] is presented in Algorithm 1.
Note that common halting criterion is given by a maximum number of iterations or a bound on the iteration error .
Model-Sparse Sensing Matrices from Expander Graphs.
In the sparse matrix setting the sparse matrices we consider as model expander matrices, which are adjacency matrices of model expander graphs, defined thus.
Definition 7** (Model expander graph).**
Let be a left-regular bipartite graph with left vertices, right vertices, a set of edges and left degree . Given a model , if for any and any , with and , we have , then is referred to as a -model expander graph.
In this setting the known results are sub-optimal. Using model expander matrices for tree-sparse models the required number of measurements to obtain instance optimality is which was shown in [34, 3].
A key ingredient in the analysis for the afore-mentioned sample complexity results for model expanders is the model-RIP-1, which is just RIP-1 for model expander matrices (hence they are also called model-RIP-1 matrices [34]). Consequently, Lemma 3 also holds for these model-RIP-1 matrices [34].
First, in [3] the Model Expander IHT (MEIHT) was studied for loopless overlapping groups and -ary tree models. Similar to Algorithm 1 the MEIHT is a modification of EIHT where the hard threshold operator is replaced by the projection oracle onto the model . Thus the update of the MEIHT in each iteration is given by
[TABLE]
In [3] the authors show that this algorithm always returns a solution in the model, which is highly desirable for some applications. The running time is given in the proposition below.
Proposition 8** (Proposition 3.1, [3]).**
The runtime of MEIHT is and for -ary tree models and loopless overlapping group models respectively, where is the sparsity of the tree model and is the number of active groups (i.e. group sparsity of the model), is the number of iterations, is the number of groups and is the dimension of the signal.
2.4 Group Models
The models of interest in this paper are group models. A group model is a collection of groups of indices, i.e. , together with a budget . We denote as the structured sparsity model (i.e. group-sparse model) which contains all supports contained in the union of at most groups in , i.e.
[TABLE]
We will always tacitly assume that . We say that a signal is -group-sparse if the support of is contained in . If is clear from the context, we simply say that is group-sparse. Let and denote as the size of largest support in . The intersection graph of a group model is the graph which has a node for each group and has an edge between and if the groups overlap, i.e. if ; see [4]. We call a group model loopless if the intersection graph of the group model has no cycles. We call a group model block model if all groups have equal size and if they are pairwise disjoint. In this case the groups are sometimes called blocks. We define the frequency of a group model as the maximum number of groups an element is contained in, i.e.
[TABLE]
In [4] besides the latter group models, the more general models are considered where an additional sparsity in the classical sense is required on the signal. More precisely for a given budget and a sparsity they study the structured sparsity model
[TABLE]
Note that for we obtain a standard group model defined as above.
Both variants of group models defined above clearly are special cases of a structured sparsity model defined in Section 2.3. Therefore all results for structured sparsity models can be used for group-sparse models. To adapt Algorithm 1 a model projection oracle (or ) has to be provided. Note that for several applications we are not only interested in the optimal support of the latter projection but we want to find at most groups covering this support. The main work of this paper is to analyse the complexity of the latter problem for group models and to provide efficient algorithms to solve it exactly or approximately. Given a signal , the group model projection problem or sometimes called signal approximation problem is then to find a support together with groups covering this support such that is minimal, i.e. we want to solve the problem
[TABLE]
If the parameter is not mentioned we assume .
Baldassarre et al. [4] observed the following close relation to the NP-hard Maximum -Coverage problem. Given a signal , a group-sparse vector for which is minimum satisfies for all . For a vector with the latter property,
[TABLE]
holds and so minimizing is equivalent to maximizing . Consequently, the group model projection problem with is equivalent to the problem of finding an index set of at most groups, i.e. , maximizing . This problem is called Maximum -Coverage in the literature [30]. Despite the prominence of the latter problem, we will stick to the group model notation, since it is closer to the applications we have in mind and we will leave the regime of Maximum -Coverage by introducing more constraints later.
We simplify the notation by defining for all . Using this notation, the group model projection problem is equivalent to finding an optimal solution of the following integer program:
[TABLE]
Here, the variable is one if and only if the -th index is contained in the support of the signal approximation, and is one if and only if the group is chosen.
Besides the NP-hardness for the general case the authors in [4] show that the group model projection problem can be solved in polynomial time by dynamic programming for the special case of loopless groups. Furthermore the authors show that if the intersection graph is bipartite the projection problem can be solved in polynomial time by relaxing problem (12). Similar results are obtained for the more general problem, where additional to the group-sparsity the classical -sparsity is assumed, i.e. the additional constraint
[TABLE]
is added to problem (12).
As stated in Section 2.3, the authors of [5] first study a special case of group models, i.e. block models, where the groups are non-overlapping and are all of equal size. The sample complexity they derived in that work for sub-Gaussian measurement matrices is , where is the fixed block size. However, in [3] the authors studied group models in the sparse matrix setting, besides other results they proposed the MEIHT algorithm for tree and group models. The more relevant result to this work show that for loopless overlapping group-sparse models with maximum group size , using model expander measurement matrices, the number of measurements required for successful recovery is ; see [3]. This results holds for general groups, the “looplessness” condition is only necessary for the polynomial time reconstruction using the MEIHT algorithm. Therefore, this sample complexity result also holds for the general group models we consider in this manuscript.
2.4.1 Group Lasso
The classical Lasso approach for -sparse signals seeks to minimize a quadratic error penalized by the -norm [22]. More precisely, for a given we want to find an optimal solution of the problem
[TABLE]
It is well known that using the latter approach for appropriate choices of leads to sparse solutions.
The Lasso approach was already extended to group models in [55] and afterwards studied in several works for non-overlapping groups; see [32, 40, 45, 48]. The idea is again to minimize a loss function, e.g. the quadratic loss, and to penalize the objective value for each group by a norm of the weights of the recovered vector restricted to the items in each group. An extension which can also handle overlapping groups was studied in [56, 37]. In [49] the authors study what they call the latent group Lasso. To this end they consider a loss function and propose to solve the -penalized problem
[TABLE]
for given weights and for each . The idea is that for ideal choices of the latter weights a solution of Problem (14) will be sparse and its support is likely to be a union of groups. Nevertheless it is not guaranteed that the number of selected groups is optimal as it is the case for the iterative methods in the previous sections. Note that equivalently we can replace each norm by a variable in the objective function and add the quadratic constraint . Hence Problem (14) can be modelled as a quadratic problem and can be solved by standard solvers like CPLEX.
The counterpart of Problem (14) was considered in [33] under the name block coding and can be formulated as
[TABLE]
Note that in contrast to Problem (14) an easy reformulation of Problem (15) into a continuous quadratic problem is not possible. Nevertheless we can reformulate it using the mixed-integer programming formulation
[TABLE]
where can be chosen larger or equal to the entry of the true signal for each . The variables have value if and only if group is selected for the support of . As for the variant it is not guaranteed that the number of selected groups is optimal. Note that the latter problem is a mixed-integer problem and therefore hard to solve in large dimension in general. Furthermore the efficiency of classical methods as the branch & bound algorithm depend on the quality of the calculated lower bound which depends on the values . Hence in practical applications where the true signal is not known good estimations of the values are crucial for the success of the latter method. Another drawback is that the best values for and the weights are not known in advance and have to be chosen appropriately for each application.
We study Problems (14) and (15) computationally in Section 5.
2.5 Approximation Algorithms for Model-Based Compressed Sensing
As mentioned in the last section solving the projection problem, given in Definition 6, may be computationally hard. To overcome this problem the authors in [28, 27] present algorithms, based on the idea of IHT (and CoSaMP), which instead of solving the projection problems exactly, use two approximation procedures called head- and tail-approximation. In this section we will shortly describe the concept and the results in [28, 27]. Note that we just present results related to the IHT, although similar results for the CoSaMP were derived as well in [28, 27].
Given two structured sparsity models and a vector , let be an algorithm that computes a vector with support in . Then, given some (typically ) we say that is an -head approximation if
[TABLE]
Note that the support of the vector calculated by is contained in while the approximation guarantee must be fulfilled for all supports in .
Moreover given two structured sparsity models let be an algorithm which computes a vector with support in . Given some (typically ) we say that is a -tail approximation if
[TABLE]
Note that in general a head approximation does not need to be a tail approximation and vice versa.
The cases studied in [27] are and . For the case the authors propose an algorithm called Approximate Model-IHT (AM-IHT), shown in Algorithm 2.
Assume that is a -tail approximation and is a -head approximation where is the Minkowski sum of and . Furthermore we assume the condition
[TABLE]
holds. The authors in [27] prove that for a signal with , noisy measurements where has -model RIP with RIC , Algorithm 2 calculates a signal estimate satisfying
[TABLE]
where depends on and . Note that the condition (19) holds e.g. for approximation accuracies and .
For the case the authors replace Step 3 in Algorithm 2 by the update
[TABLE]
where is the median operator defined as in Section 2.2. Under the same assumptions as above, but considering for the head- and tail-approximations and having the -RIP-, the authors in [27] show convergence of the adapted algorithm.
2.6 Comparison to Related Works
In a more illustrative way in Tables 1 and 2 below we show where our results stand vis-a-vis other results. In Table 1 we show the studied models together with the derived sample complexity and the studied class of measurements matrices. In Table 2 we present the names of the studied algorithms, the class of the model projection, the class of the algorithm used to solve the model projection, the runtime complexity of the projection problem and the class of instance optimality.
3 Algorithms with Exact Projection Oracles
In this section we study the exact group model projection problem which has to be solved iteratively in the Model-IHT and the MEIHT. We extend existing results for group-sparse models and pass from loopless overlapping group models (which was the most general prior to this work) to overlapping group models of bounded treewidth and to general group models without any restriction on the structure. The graph representing a loopless overlapping group model has a treewidth of 1.
We start by showing that it is possible to perform exact projections onto overlapping groups with bounded treewidth using dynamic programming, see Section 3.1. While this procedure has a polynomial run-time bound it is restricted to the class of group models with bounded treewidth. Nevertheless we prove that the exact projection problem is NP-hard if the incidence graph is a grid which is the most basic graph structure without bounded treewidth. For the sake of completeness we solve the exact projection problem for all instances of group models by a method based on Benders’ decomposition in Section 3.6. Solving an NP-hard problem this method does not yield a polynomial run-time guarantee but works well in practice as shown in [17]. In Section 3.5 we present an appropriately modified algorithm (MEIHT) with exact projection oracles for the recovery of signals from structured sparsity models. We derive corollaries for the general group-model case from existing works about run-time and convergence of this modified algorithm.
Recall the following notation: denotes a group of size , , and a group model is a collection . The group-sparse model of order is denoted by , which contains all supports which are contained in the union of at most groups of , i.e. and ; see (10). We will interchangeably say that or is -sparse. Clearly group models are a special case of structured sparsity models. Assume is the size of the maximal support which is possible by selecting groups out of . For denoting the maximal size of a single group in , i.e.,
[TABLE]
we have . Furthermore the number of possible supports is in . Therefore applying the result from Section 2.3 we obtain
[TABLE]
as the number of required measurements for a sub-Gaussian matrix to obtain the group-model-RIP with RIC with high probability, which induces the convergence of Algorithm 1 for small enough .
3.1 Group models of Low Treewidth
One approach to overcome the hardness of the group model projection problem is to restrict the structure of the group models considered. To this end we follow Baldassarre et al. [4] and consider two graphs associated to a group model .
The intersection graph of , , is given by the vertex set , and the edge set
[TABLE]
The incidence graph of , , is given by the vertex set , and the edge set
[TABLE]
Note that the incidence graph is bipartite since an edge is always adjacent to an element and a group . See Fig. 1 for a simple illustration of the two constructions.
Baldassarre et al. [4] prove that there is a polynomial time algorithm to solve the group model projection problem in the case that the intersection graph is an acyclic graph. Their algorithm uses dynamic programming on the acyclic structure of the intersection graph.
We generalize this approach and show that the same problem can be solved in polynomial time if the treewidth of the incidence graph is bounded. Following Proposition 9 below, this implies that the group model projection Problem can be solved in polynomial time if the treewidth of the intersection graph is bounded. We proceed by formally introducing the relevant concepts.
3.2 Tree Decomposition
Let be a graph. A tree decomposition of is a tree where each node of has a bag of vertices of such that the following properties hold:
. 2. 2.
If and both contain a vertex , then the bags of all nodes of on the path between and contain as well. Equivalently, the tree nodes containing the vertex form a connected subtree of . 3. 3.
For every edge in there is some bag that contains both and . That is, vertices in can be adjacent only if the corresponding subtrees in have a node in common.
The width of a tree decomposition is the size of its largest bag minus one, i.e. . The treewidth of , , is the minimum width among all possible tree decompositions of .
Intuitively, the treewidth measures how ‘treelike’ a graph is: the smaller the treewidth is, the more treelike is the graph. The graphs of treewidth one are the acyclic graphs. Fig. 2 shows a graph of treewidth 2, together with a tree decomposition.
Before stating any algorithms, we discuss the relation of the treewidth of the intersection and the incidence graphs of a given group model.
When bounding the treewidth of the graphs associated to a group model, it makes sense to consider the incidence graph rather than the intersection graph. This is due to the following simple observation.
Proposition 9**.**
For any group model it holds that . However, for every there exists a group model such that .
This statement is not necessarily new, but we quickly prove it in our language in order to be self-contained.
Proof of Proposition 9..
To see the first assertion, let be a group model. Let be a tree decomposition of of width . In the following, we attach leaves to , one for each element in , and obtain a tree decomposition of . Each leaf will contain at most many elements of and at most one element of . Hence we get .
To construct the tree decomposition of pick any , and let be the set of groups in containing . Since all groups in contain , the set is a clique in .111Recall that a clique in a graph is a set of mutually adjacent vertices. Moreover, since is a tree decomposition of , the subtrees of the groups in mutually share a node. As subtrees of a tree have the Helly property, there is at least one node of such that . We now add a new node with bag and an edge between and in . Doing this for all simultaneously, it is easy to see that we arrive at a tree decomposition of of width at most which proves the first assertion.
To prove the second assertion consider, for any , the group model where
[TABLE]
Note that is a tree, hence . In , however, the set is a clique of size . Thus, which implies . ∎∎
Consider now a tree decomposition of a graph . We say that is a nice tree decomposition if every node is of one of the following types.
- •
Leaf: has no children and .
- •
Introduce: has one child, say , and there is a vertex of with .
- •
Forget: has one child, say , and there is a vertex of with .
- •
Join: has two children and such that .
This kind of decomposition limits the structure of the difference of two adjacent nodes in the decomposition. A folklore statement (explained in detail in the classic survey by Kloks [39]) says that such a nice decomposition is easily computed given any tree decomposition of without increasing the width.
3.3 Dynamic Programming
In this section we derive a polynomial time algorithm for the group model projection problem for fixed treewidth. As in Section 2.4 we assume we have a given signal and define with . In the following we use the notation
[TABLE]
for a subset . The algorithm is presented using a nice tree decomposition of the incidence graph and uses the following concept. Fix a node of the decomposition tree of , a number with and a map . We say that is a colouring of . We consider solutions to the problem group model projection which is defined as follows.
A set is a feasible solution of group model projection if
- (a)
contains only groups that appear in some bag of a node in the subtree rooted at node , 2. (b)
, 3. (c)
contains exactly those group-vertices of that are in , that is,
[TABLE] 4. (d)
of the elements in , covers exactly those that are in . Formally,
[TABLE]
The objective value of the set is given by . Intuitively, a feasible solution to group model projection does not cover elements labelled [math] or , but covers all elements labelled . The elements labelled are assumed to be covered by groups not yet visited in the tree decomposition.
If group model projection does not admit a feasible solution, we say that group model projection is infeasible. The maximum objective value attained by a feasible solution to group model projection, if feasible, we denote by . If group model projection is infeasible, we set .
Assertion (d) implies that group model projection is infeasible if the groups in cover elements in or . That is, group model projection is infeasible if . To deal with this exceptional case we call consistent if , and inconsistent otherwise. Note that consistency of is necessary to ensure feasibility of group model projection, but not sufficient.
Our algorithm processes the nodes of a nice tree decomposition in a bottom-up fashion. Fix a node , a number with and a map . We use dynamic programming to compute the value , assuming we know all possible values for all children of , all with , and all .
In the following, for a subset the function is the restriction of to . We use to denote the neighborhood of in .
If is not consistent, we may set right away. We thus assume that is consistent and distinguish the type of node as follows.
- •
Leaf: set and for all .
- •
Introduce: let be the unique child of and let such that .
If , we set
[TABLE]
If , we set
[TABLE]
where is compatible to if
- –
is a consistent colouring of ,
- –
, and
- –
.
- •
Forget: let be the unique child of and let such that . We set
[TABLE]
- •
Join: we set
[TABLE]
where and are the two children of . The maximum is taken over all consistent colourings with and .
- •
Root: first we compute for all relevant choices of , depending on the type of node . The algorithm then terminates with the output
[TABLE]
Lemma 10**.**
The output OPT is the objective value of an optimal solution of the group model projection problem.
Proof.
The proof follows the individual steps of the dynamic programming algorithm. Consider the problem group model projection. If is not consistent, we correctly set . We thus proceed to the case when is consistent. Fix an optimal solution of group model projection if existent.
The leaf-node case is clear, so we proceed to the case of being an introduce-node. Let be the unique child of and let such that . First assume that .
Assume that . Since is consistent, holds, and so is an optimal solution to group model projection. We may thus set .
If , covers , so we need to make sure some vertex is contained in the solution in order for group model projection to be feasible. Hence we have if , and otherwise.
Next we assume that . If , we may simply put . So, assume that . If group model projection is feasible and thus exists, define . Now is a solution to for some with and . Note that is compatible to . Consequently, is upper bounded by the right hand side of (21).
To see that is at least the right hand side of (21), let be compatible to and let be a solution to group model projection of objective value . Then is a solution to group model projection of objective value . Consequently,
[TABLE]
If is a forget-node, let be the unique child of and let such that . If , we have
[TABLE]
Otherwise, if , we have
[TABLE]
Moreover, any solution of group model projection, where and is a solution of group model projection. This proves (22).
If is a join-node, let and be the two children of and recall that .
Let be the collection of groups in contained in the subtree rooted at , and let be the collection of groups in contained in the subtree rooted at . Since is a tree decomposition, .
Note that is a solution of group model projection and is a solution of group model projection for some and with . It is easy to see that and .
The objective value of equals
[TABLE]
This shows that is at most the right hand side of (23).
Now let be an optimal solution of group model projection and let be an optimal solution of group model projection where
- •
are both consistent,
- •
and , and
- •
.
Note that and exist since, as we have shown earlier, the colourings and satisfy the above assertions. Then is a solution of group model projection with objective value
[TABLE]
Consequently, is at least the right hand side of (23) and thus (23) holds. ∎∎
By storing the best current solution alongside the OPT-values we can compute an optimal solution together with OPT.
3.3.1 Runtime of the Algorithm
The computational complexity of the individual steps are as follows.
- (a)
Given the incidence graph on vertices of treewidth , one can compute a tree decomposition of width in time using Bodlaender’s algorithm [11]. The number of nodes of the decomposition is in . 2. (b)
Given a tree decomposition of width with nodes, one can compute a nice tree decomposition of width on nodes in time in a straightforward way [39].
The running time of the dynamic programming is bounded as follows.
Theorem 11**.**
The dynamic programming algorithm can be implemented to run in time given a nice tree decomposition of of width on nodes.
Note that we can assume that with . Together with the running time of the construction of the nice tree decomposition we can solve the exact projection problem on graphs with treewidth in .
Proof of Theorem 11.
Since the join-nodes are clearly the bottleneck of the algorithm, we discuss how to organize the computation in a way that the desired running time bound of holds in a node of this type.
So, let be a join-node and assume that and are the two children of . We want to compute , for all colourings and all with . Recall that we need to compute this value according to (23), that is,
[TABLE]
where the maximum is taken over all consistent colourings with and .
We enumerate all colourings and derive , , and . We put
[TABLE]
If either of , , or are inconsistent, we discard this choice of . In this way we capture every consistent colouring , and all consistent choices of , and satisfying and .
It remains to discuss the computation of the value . This value can be computed in time, since we are computing differences and unions of at most groups of size each. We arrive at a total running time in . ∎∎
Remark 12**.**
The dynamic programming algorithm can be extended to include a sparsity restriction on the support of the signal approximation itself. So, we can compute an optimal -sparse -group-sparse signal approximation if the bipartite incidence graph of the studied group models is bounded. The running time of the algorithm increases by a factor of .
3.4 Hardness on Grid-Like Group Structures
An -grid is a graph with vertex set , and two vertices are adjacent if and only if and , or if and . We also say that is the size of the grid. Fig. 3 shows a -grid.
Recall the group model projection problem can be solved efficiently when the treewidth of the incidence graph of the group structure is bounded, as shown in Section 3.1.
Definition 13** (Graph minor).**
Let and be two graphs. The graph is called a minor of if can be obtained from by deleting edges and/or vertices and by contracting edges.
A classical theorem by Robertson and Seymour [52] says that in a graph class closed under taking minors either the treewidth is bounded or contains all grid graphs.
Consequently, if is a class of graphs that does not have a bounded treewidth, it contains all grids. Our next theorem shows that group model projection is NP-hard on group models for which is a grid, thus complementing Theorem 11.
Theorem 14**.**
The group model projection problem is NP-hard even if restricted to instances for which is a grid graph and the weight of any element is either 0 or 1.
Consider the following problem: given an -pixel black-and-white image, pick -pixel windows to cover as many black pixels as possible. This problem can be modeled as the group model projection problem on a grid graph where the weight of any element is either 0 or 1. See Fig. 4 for an illustration. Note that this group model is of frequency at most 4, and so we can do an approximate model projection and signal recovery using the result of Section 4.
Our proof is a reduction from the Vertex Cover problem. Recall that for a graph , a vertex cover is a subset of the vertices of such that any edge of has at least one endpoint in this subset. The size of the smallest vertex cover of , the vertex cover number, is denoted .
Given a graph and a number as input, the task in the Vertex Cover problem is to decide whether admits a vertex cover of size . That is, whether . This problem is NP-complete even if restricted to cubic planar graphs [23].222Recall that a graph is cubic if every vertex is of degree , and planar if it can be drawn into the plane such that no two edges cross.
We use the following simple lemma in our proof.
Lemma 15**.**
Let be a graph and let be the graph obtained by subdividing some edge of twice. Then .
We can now prove our theorem.
Proof of Theorem 14.
The reduction is from Vertex Cover on planar cubic graphs. Consider to be a planar cubic graph and let be some number. Our aim is to compute an instance of the group model projection problem where is a grid such that has a vertex cover of size if and only if admits a selection of groups that together cover elements of a total weight at least some threshold .
First we embed the graph in some grid of polynomial size, meaning the vertices of get mapped to the vertices of the grid and edges get mapped to mutually internally disjoint paths in the grid connecting its endvertices. This can be done in polynomial time using an algorithm for orthogonal planar embedding [2]. We denote the mapping by , hence is some vertex of and is a path from to in for all and .
Next we subdivide each edge of the grid 9 times, so that a vertical/horizontal edge of becomes a vertical/horizontal path of length 10 in some new, larger grid . We choose such that the corners of are mapped to the corners of . In particular, . Let us denote the obtained embedded subdivision of in by , and let denote the embedding. Moreover, let be the corresponding embedding of into the subdivided grid . Note that .
Let be a bipartition of . We may assume that is in for all . We consider to be the incidence graph of a group model where the vertices in correspond to the elements and the vertices in correspond to the groups of . We refer to the vertices in as group-vertices and to vertices in as element-vertices. Slightly abusing notation, we identify each group with its group-vertex and each element with its element-vertex and write .
We observe that
- (a)
is an induced subgraph of , 2. (b)
every vertex , , has degree 3 in and is a group-vertex, 3. (c)
every other vertex has degree 2 in , and 4. (d)
for every group-vertex there is some group-vertex with
[TABLE]
Next we will tweak the embedding of a bit, to get rid of paths with the wrong parity. We do so in a way that preserves the properties (a)-(d). Let be the set of all paths with a length 0 (mod 4), and let . We want to substitute each path in by a path of length 2 (mod 4). For this, let be the neighbour of in the path . Note that the path in starts with a vertical or horizontal path from to of length 10. We bypass the middle vertex of this path (an element-vertex) by going over two new element-vertices and one group-vertex instead. See Fig. 5 for an illustration.
To keep the notation easy we denote the newly obtained path by . Note that, after adding the bypass, the new path is two edges longer and thus has length 2 (mod 4). We complete to an embedding of by putting and for all and with . Moreover, let us denote the changed embedding of by .
We observe that the new embedding still satisfies the assertions (a)-(d) and, in addition, it holds that
- (e)
every path connecting two vertices of degree 3 over vertices of degree 2 only has length 2 (mod 4).
Next we define the weights of the element-vertices by putting
[TABLE]
Assertion (d) implies that, for any subset of size there is an of size at most such that
- •
, and
- •
.
Since for all elements in , we may thus restrict our attention to the restricted group model on the element set .
Slightly abusing notation, any subset is a vertex subset of and equals the number of edges of adjacent to the vertex set in . Moreover, the graph is obtained from the graph by subdividing each edge an even number of times.
From Lemma 15 we know that there is some number such that . Hence, has a vertex cover of size if and only if has a cover of size of total weight . This, in turn, is the case if and only if admits a cover of size of total weight . Since the construction of can be done in polynomial time, the proof is complete. ∎∎
3.5 MEIHT for General Group Structures
In this section we apply the results for structured sparsity models and for expander matrices to the group model case. The Model-Expander IHT (MEIHT) algorithm is one of the exact projection algorithms with provable guarantees for tree-sparse and loopless overlapping group-sparse models using model-expander sensing matrices [3]. In this work we show how to use the MEIHT for more general group structures. The only modification of the MEIHT algorithm is the projection on these new group structures. We show MEIHT’s guaranteed convergence and polynomial runtime.
Note that as in [4], we are able to do model projections with an additional sparsity constraint, i.e. projection onto defined in (11). Therefore Algorithm 3 works with an extra input and the model projection becomes , retuning a -sparse approximation to .
The convergence analysis of MEIHT with the more general group structures considered here remains the same as for loopless overlapping group models discussed in [3]. We are able to perform the exact projection of (and ) as discussed in Section 3.6. With the possibility of doing the projection onto the model, we present the convergence results in Corollaries 16 and 17 as corollaries to Theorem 3.1 and Corollary 3.1 in [3] respectively.
Corollary 16**.**
Consider to be a group model of bounded treewidth and to be -sparse. Let the matrix be a model expander matrix with and ones per column. For any and , the sequence of updates of MEIHT with satisfies, for any
[TABLE]
where and .
Note that is the expansion coefficient of the underlying -model expander graph for . This ensures that satisfied model RIP-1 over all -sparse signals.
The proof of this Corollary can be done analogously to the proof of Theorem 3.1 in [3]. It is thus skipped and the interested reader is referred to [3].
Let us define the -error of the best -term approximation to a vector
[TABLE]
This is then used in the following corollary.
Corollary 17**.**
Consider the setting of Corollary 16. After iterations, MEIHT returns a solution satisfying
[TABLE]
where and .
Proof.
Without loss of generality we initialize MEIHT with . Upper bounding by and using triangle inequalities with some algebraic manipulations, (24) simplifies to
[TABLE]
Using the fact that is a binary matrix with ones per column we have . We also have when . Applying these bounds to (27) leads to
[TABLE]
for . This is equivalent to (26) with , , for the given , and because is the best -term approximation to the -sparse . This completes the proof.∎∎
The runtime complexity of MEIHT still depends on the median operation and the complexity of the projection onto the model. However, as observed in [3], the projection onto the model is the dominant operation of the algorithm. Therefore, the complexity of MEIHT is of the order of the complexity of the projection onto the model. In the case of overlapping group models with bounded treewidth, MEIHT achieves a polynomial runtime complexity as shown in Proposition 18 below. On the other hand, when the treewidth of the group model is unbounded MEIHT can still be implemented by using the Bender’s decomposition procedure in Section 3.6 for the projection which may have an exponential runtime complexity.
Proposition 18**.**
The runtime of MEIHT is for the -group-sparse model with bounded treewidth , where is the number of iterations, is the number of groups, is the group budget and is the signal dimension.
Proof.
Before we start the MEIHT procedure we have to calculate a nice tree decomposition of the incidence graph of the group model. This can be done in . Then in each of the iterations of the MEIHT we have to solve the exact projection onto the model which is the dominant operation of the MEIHT. Since the projection on the group model with bounded treewidth can be done through the dynamic programming algorithm that runs in , as proven in Section 3.1, this proves the result. ∎∎
Remark 19**.**
The convergence results above hold when MEIHT is modified appropriately to solve the standard -sparse and -group-sparse problem with groups having bounded treewidth, where the projection becomes . However, in this case the runtime complexity in each iteration grows by a factor of , as indicated in Remark 12.
3.6 Exact Projection for General Group Models
In this section we consider the most general version of group models, i.e. is an arbitrary set of groups and and are given budgets. We study the structured sparsity model introduced in Section 2.4. Here additional to the number of groups that can be selected we bound the number of indices to be selected in these groups by (i.e. we consider a group-sparse model with an additional standard sparsity constraint). Note that setting reduces this model to the general group model .
If we want to apply exact projection recovery algorithms like the Model-IHT and MEIHT to group models, iteratively the model projection problem has to be solved, i.e. in each iteration for a given signal we have to find the closest signal which has support in the model . In this section we will derive an efficient procedure based on the idea of Benders’ Decomposition to solve the projection problem. This procedure is analysed and implemented in Section 5.
It has been proved that the group model projection problem for group models without a sparsity condition on the support is NP-hard [4]. Therefore the projection problem on the more general model is NP-hard as well. The latter problem can be reformulated by the integer programming formulation
[TABLE]
Here are the squared entries of the signal, the -variables represent the groups and the -variables represent the elements which are selected. Note that by choosing we obtain the projection problem for classical group models .
To derive an efficient algorithm for the projection problem we use the concept of Benders’ Decomposition which was already studied in [7, 24]. The idea of this approach is to decompose Problem 29 into a master problem and a subproblem. Then iteratively the subproblem is used to derive feasible inequalities for the master problem until no feasible inequality can be found any more. This procedure has been applied to Problem 29 without the sparsity constraint on the -variables in [17]. The following results for the more general Problem 29 are based on the the idea of Benders’ Decomposition and extend the results in [17].
First we can relax the -variables in the latter formulation without changing the optimal value, i.e. we may assume . We can now reformulate (29) by
[TABLE]
where . Replacing the linear problem in (30) by its dual formulation, we obtain
[TABLE]
where is the feasible set of the dual problem. Since is a polyhedron and the minimum in (31) exists, the first constraint in (31) holds if and only if it holds for each vertex of . Therefore Problem (31) can be reformulated by
[TABLE]
where are the vertices of . Each of the constraints
[TABLE]
for is called optimality cut.
The idea of Benders’ Decomposition is, starting from Problem (32) containing no optimality cut (called the master problem), to iteratively calculate the optimal and then find a optimality cut which cuts off the latter optimal solution. In each step the most violating optimality cut is determined by solving
[TABLE]
for the actual optimal solution . If the optimal solution fulfills
[TABLE]
then the optimality cut related to the optimal vertex is added to the master problem. This procedure is iterated until the latter inequality is not true any more. The last optimal must then be optimal for Problem (29) since the first constraint in (31) is then true for .
If we use the latter Benders’ Decomposition approach it is desired to use fast algorithms for the master problem (32) and the subproblem (33) in each iteration. By the following lemma an optimal solution of the subproblem can be easily calculated.
Lemma 20**.**
For a given solution we define and by the indices of the largest values for . An optimal solution of Problem (33) is then given by where and
[TABLE]
Proof.
Note that for a given the dual problem of subproblem (33) is
[TABLE]
It is easy to see that there exists an optimal solution of Problem (34) where if and only if and otherwise. We will use the dual slack conditions
[TABLE]
to derive the optimal values for . We obtain the following cases:
Case : If and , i.e. , then by conditions (35) we have . To ensure the constraint , the value of must be at least .
Case : If and , then and the objective coefficient of in Problem (33) is [math]. Therefore we can increase as much as we want without changing the objective value and therefore we set to ensure the -th constraint and set .
Case If , i.e. , and then by condition (35) . Therefore to ensure the -th constraint the value of must be at least and since we minimize in the objective function the latter holds with equality.
Case : If , i.e. , and then we cannot use condition (35) to derive the values for and . Nevertheless in this case both variables have an objective coefficient of while has an objective coefficient of . By increasing by the objective value increases by . In Cases and nothing changes, while for each index in Case we could decrease by to remain feasible. But since at most indices fulfil the conditions of Case we cannot improve our objective value by this strategy. Therefore has to be selected as small as possible in Case , i.e. by Case we set and to ensure feasibility we set . This concludes the proof.∎∎
Theorem 21**.**
An optimal solution of subproblem (33) can be calculated in time .
Proof.
The set can be calculated in time by going through all groups for each index and checking if the index is contained in one of the groups. To obtain we have to find the -th largest element in which can be done in time ; see [46]. Afterwards we select all values which are larger than the -th largest element which can be done in . Assigning all values to can also be done in time . ∎∎
The following theorem states that the masterproblem can be solved in pseudopolynomial time if the number of constraints is fixed. Nevertheless note that the number of iterations of the procedure described above may be exponential in and .
Theorem 22**.**
Problem (32) with constraints can be solved in where .
Proof.
Problem (32) with constraints can be reformulated by
[TABLE]
The latter problem is a special case of the robust knapsack problem with discrete uncertainty (sometimes called robust selection problem) with additional uncertain constant. In [12] the authors mention that the problem with an uncertain constant is equivalent to the problem without such a constant. Furthermore using the result in [41] Problem (36) can be solved in where
[TABLE]
Since for all solutions generated in Lemma 20 it holds that we have which proves the result.∎∎
4 Algorithms with Approximation Projection Oracles
As mentioned in the previous sections solving the group model projection problem is NP-hard in general. Therefore to use classical algorithms as the Model-IHT or the MEIHT we have to solve an NP-hard problem in each iteration. To tackle problems like this the authors in [28, 27] introduced an algorithm called Approximate Model-IHT (AM-IHT) which is based on the idea of the classical IHT but which does not require an exact projection oracle (see Section 2.5). Instead the authors show that under certain assumptions on the measurement matrix a signal can be recovered by just using two approximation variants of the projection problem which they call head- and tail-approximation (for further details again see Section 2.5).
In this section we apply the latter results to group models of bounded frequency: group models where the maximum number of groups an element is contained in is bounded by some number . Note that from Theorem 14 we know that group model projection is NP-hard for group models of frequency 4. A particularly interesting case of such group structures is the graphic case, where each element is contained in at most two groups. Understanding this case was proposed as an open problem by Baldassarre et al. [4].
Furthermore we apply the theoretical results derived in [28, 27] to group models and show that the number of required measurements compared to the classical structured sparsity case increases by just a constant factor. In Section 5 we will computationally compare the AM-IHT to the Model-IHT and the MEIHT on interval groups.
4.1 Head- and Tail-Approximations for Group Models with Low Frequency
In this section we derive head- and tail-approximations for the case of group models with bounded frequency. We first recall the definition of head- and tail-approximations for the case of group models. Assume we have given a group model together with .
Given a vector , let be an algorithm that computes a vector with support in for some . Then, given some (typically ) we say that is an -head approximation if
[TABLE]
In other words, uses many groups to cover at least an -fraction of the maximum total weight covered by groups. Note that can be chosen larger than .
Moreover, let be an algorithm which computes a vector with support in . Given some (typically ) we say that is a -tail approximation if
[TABLE]
This means that may use many groups to leave at most a -fraction of weight uncovered compared to the minimum total weight left uncovered by groups.
In the following we derive the head- and tail-approximation just for the case . Note that equivalent approximation procedures can be easily derived for the case by replacing the accuracies and by and and using the weights instead of in the latter proofs. We will first present a result which implies the existence of a head approximation.
Theorem 23** (Hochbaum and Pathria [30]).**
For each there exists an -head approximation running in polynomial time.
The algorithm derived in [30] was designed to solve the Maximum -Coverage problem and is based on a simple greedy method. The idea is to iteratively select the group which covers the largest amount of uncovered weight. It is proven by the authors that if you are allowed to select enough groups, namely , then the optimal value is approximated up to an accuracy of . The greedy procedure is given in Algorithm 4. Note that for a given signal and a group we define
[TABLE]
Next we derive a tail-approximation for our problems based on the idea of LP rounding. Note that in contrast to the head-approximation, the run-time bound of the following tail-approximation depends on the frequency of the group model.
Theorem 24**.**
Suppose the frequency of the group model is . For any and there exists an -tail approximation running in polynomial time.
Proof.
Given a signal , we define . We consider the following linear relaxation of the group model projection problem
[TABLE]
Consider an optimal solution of (39). We compute a group cover by
[TABLE]
Note that contains at most many groups, since
[TABLE]
It remains to show that is a tail approximation. To this end let be the set of indices only barely covered by , i.e.
[TABLE]
Note that
[TABLE]
since implies
[TABLE]
and hence
[TABLE]
for some with . Moreover, note that
[TABLE]
and hence
[TABLE]
We obtain the inequalities
[TABLE]
where we used (40) and (41). Since is an optimal solution of the relaxed problem (39), we have
[TABLE]
where is an optimal solution of the group model projection problem and the corresponding optimal vector of Problem (12). Therefore the latter procedure is a tail-approximation which completes the proof.∎∎
4.2 AM-IHT and AM-MEIHT for Group Models
As in the previous sections for a sensing matrix and the true signal we have a noisy measurement vector for some noise vector . The task consists in recovering the original signal , or a vector close to . Furthermore we have given a group model together with with frequency .
In the last section we derived polynomial time algorithms for an -head approximation and an -tail approximation. Note that we can use any here. Using the results in Section 2.5, we obtain convergence of the AM-IHT for group models if is an -tail approximation, is an -head approximation, where and , and the sensing matrix has -RIP with . Note that for fixed accuracy and frequency . Furthermore for a constant . Using the bound (20) we obtain that the number of required measurements for a sub-Gaussian random matrix having the -RIP with high probability is
[TABLE]
which differs by just a constant factor from the number of measurements required in the case of exact projections (see Section 3). Under condition (19) convergence of the AM-IHT is ensured.
4.3 Extension to within-group sparsity and beyond
The head and tail approximation approach can be extended far beyond the standard group-sparsity model. It still works even if we are considering -sparse and -group-sparse (i.e. -sparse) vectors in our model, for example.
The reason is that the group model projection can be head approximated to a constant even in this case. Formally, if we search for the weight maximal elements covered by many groups, we are maximizing a submodular function subject to a knapsack constraint.333Actually, we are maximizing a submodular function subject to a uniform matroid constraint which is a simpler problem. This is known to be approximable to a constant factor (cf. Kulik et al. [42, 43] and related work). Suppose we delete the covered elements and run such an approximation algorithm again. Then, after a constant number of steps, we obtain a collection of groups and elements such that the total weight of the elements is at least an -fraction of the total weight that a -sparse and -group-sparse solution could ever have. Moreover, the sparsity budgets are exceeded only by a constant factor each.
Similarly, the analysis given in the proof of Theorem 24 works even if we impose sparsity on both groups and elements. Again, we obtain a solution that has a -tail guarantee whose support exceeds that of a -group-sparse -sparse vector by at most a constant factor if we assume a bounded frequency. This leads to the positive consequences detailed above.
More generally, any knapsack constraints on groups and elements can be handled, leading to head and tail approximations in the case when there are non-uniform sparsity budgets on the groups and elements. However the corresponding head approximations are rather involved, and certainly much less efficient than the simple greedy procedure proposed by the Hochbaum and Pathria algorithm [30].
5 Computations
In this section we present the computational results for the Model-IHT, MEIHT, AM-IHT and AM-EIHT presented in Section 2 for block-group instances. Precisely, we study block-groups, i.e. each group is a set of sequenced indices, , where and each group has the same size . For a given dimension we generate blocks of size . We consider two types of block-models, one where the successive groups overlap in items and another where they overlap in items. Note that the frequency is then given by or respectively.
We run all algorithms for random signals in dimension . For each dimension we vary the number of measurements and generate random matrices each together with a random signal . We assume there is no noise, i.e. . For a given group model the support of the signal is determined as the union of randomly selected groups. The components of in the support are calculated as identical and independent draws from a standard Gaussian distribution while all other components are set to [math]. Our computations are processed for two classes of random matrices, Gaussian matrices and expander matrices as described in Section 2. The Gaussian matrices are generated by drawing identical and independent values from a standard Gaussian distribution for each entry of and afterwards normalizing each entry by . The expander matrices are generated by randomly selecting indices in for each column of the matrix. The choice of is motivated by the choice in [3]. Each of the algorithms is stopped if either the number of iterations reaches or if for any iteration we have . For the error in each iteration we use for the calculations corresponding to the expander matrices and for the Gaussian matrices. After the determination of the algorithm we calculate the relative error of the returned signal to the true signal , i.e. we calculate
[TABLE]
Again we use for the calculations corresponding to the expander matrices and for the Gaussian matrices. We call a signal recovered if the relative error is smaller than . For the AM-IHT and the AM-EIHT the approximation accuracy of the head- and the tail approximation algorithms are set to and .
For the exact signal approximation problem which has to be solved in each iteration of the Model-IHT and the MEIHT we implemented the Benders’ decomposition procedure presented in Section 3.6. To this end the master problem is solved by CPLEX 12.6 while each optimal solution of the subproblem is calculated using the result of Lemma 20. Regarding the AM-IHT, for the head-approximation we implemented the greedy procedure given in Algorithm 4 while for the tail-approximation we implemented the procedure of Theorem 24. Again the LP in the latter procedure is solved by CPLEX 12.6. All computations were calculated on a cluster of 64-bit Intel(R) Xeon(R) CPU E5-2603 processors running at 1.60 GHz with 15MB cache.
The results of the computations are presented in the following diagrams. For all instances we measure the smallest number of measurements for which the median of the relative error to the true signal is at most , i.e. the smallest number of measurements for which at least of the signals were recovered. Furthermore we show the average number of iterations and the average time in seconds the algorithms need to successfully recover a signal, given number of measurements. We stop increasing the number of measurements if is reached.
In Figures 6, 7 and 8 we show the development of , the number of iterations and the runtime in seconds of all algorithms over all dimensions for block-groups, generated as explained above, with fixed value .
The smallest number of measurements which leads to a median relative error of at most is nearly linear in the dimension; see Figure 6. For all algorithms the corresponding is very close even for different number of overlaps. Nevertheless the number of measurements is smaller for expander matrices than for Gaussian matrices. Furthermore in the expander case the instances with overlap have a smaller . The average number of iterations performed by the algorithms fluctuates a lot for the Gaussian case. Here the value increases slowly for the Model-IHT while it increases more rapidly for the AM-IHT. In the expander case the number of iterations is very close to each other for all algorithms and lies between and most of the time; see Figure 7. The drop from to is due to the small value of when . Furthermore the number of iterations is much lower in the expander case.
The average runtime for the Gaussian case is a bit larger than the runtime for the expander case as expected, since operations with dense matrices are more costly than the sparse ones. However, it may be due to the larger number of iterations; see Figure 8. Furthermore the runtime for the instances with overlap is much larger in both cases. Here the AM-IHT (AM-EIHT) is faster than the Model-IHT (MEIHT) for the instances with overlap while it is slightly slower for the others.
In Figures 9, 10 and 11 we show the same properties as above, but for varying , a fixed dimension of and fixed to for all values of . Similar to Figure 6 the value seems to be linear in (see Figure 9). Just the Model-IHT (MEIHT) for blocks with overlap seems to require an exponential number of measurements in to guarantee a small median relative error. Here the AM-IHT (AM-EIHT) performs much better. Interestingly the number of iterations does not change a lot for increasing for Gaussian matrices while it grows for the AM-EIHT in the expander case. This is in contrast to the iteration results with increasing ; see Figure 7. The runtime of all algorithms increases slowly with increasing , except for the IHT for blocks with overlap the runtime explodes. For both instances the AM-IHT (AM-EIHT) is faster than the Model-IHT (MEIHT).
To conclude this section we selected the instances for dimension and and present the development of the median relative error over the number of measurements ; see Figure 12. In the expander case the median relative error decreases rapidly and is nearly [math] already for . Just for the MEIHT for blocks with overlap the relative error is close to [math] not until . For the Gaussian case the results look similar with the only difference that a median relative error close to [math] is reached not until .
5.1 Latent Group Lasso
In this section we present computational results for the latent group Lasso approach introduced in Section 2. We consider block-groups and all group instances are generated as described in the previous section. We study the variant presented in (14) and its counterpart presented in (15). Both problems are implemented in CPLEX 12.6. For the counterpart we implemented the integer programming formulation (16). For the given random Gaussian matrices and its linear measurements we use while for expander matrices we use . The last choice is motivated by our comnputational tests which showed that the -norm has a better performance for expander matrices.
We run all algorithms for random signals in dimension . The number of measurements is varied in and we generate random matrices each together with a random signal . We assume there is no noise, i.e. . For a given group model the support of the signal is determined as the union of randomly selected groups. The components of in the support are calculated as identical and independent draws from a standard Gaussian distribution while all other components are set to [math]. Our computations are processed for two classes of random matrices, Gaussian matrices and expander matrices generated as described in the previous section. After each calculation we calculate the relative error of the returned signal to the true signal , i.e. we calculate
[TABLE]
where we use for the calculations corresponding to the expander matrices and for the Gaussian matrices. Additionally we calculate the pattern recovery error
[TABLE]
which was defined in [49], and the probability of recovery, i.e. the fraction of instances which were successfully recovered. We call a signal recovered if the relative error is smaller or equal than .
All computations were performed for and for all groups . For each the with the best average relative error is calculated and the which has the best average relative error most often over all is chosen. For all experiments the optimal value was .
All computations were calculated on a cluster of 64-bit Intel(R) Xeon(R) CPU E5-2603 processors running at 1.60 GHz with 15MB cache.
The results of the computations are presented in Figures 13, 14, 15, 16 and 17. For each we show the median relative error, the probability of recovery, the average pattern recovery error and the average number of selected groups ; each value calculated over all matrices and for . In Figure 17 we show for each the value of with has the smallest average relative error.
The results in Figure 13 show that the variant of the latent group Lasso performs very well for Gaussian matrices. Even for a small number of measurements the median relative error is [math] while for expander matrices the error never is smaller than . The variant for Gaussian matrices has a larger error for small number of measurements which decreases rapidly and is always [math] for larger than . In the expander case it is never smaller than as well. The same picture holds for the pattern recovery error; see Figure 15. The only difference here is that also for expander matrices the error tends to [math]. The results indicate that the optimal support is calculated for both variants and for both types of matrices if is large enough, but for expander matrices the latent group Lasso struggles to find the optimal component-values of in the support. Interestingly the frequency of the groups does not significantly influence the results.
The probability of recovery for Gaussian matrices is for all for the variant and is for larger than for the variant; see Figure 14. According to the results for the relative error the probability of recovery for expander matrices is [math] for all . The number of groups selected by the latent group Lasso is close to for all for the variant; see Figure 16. For the variant with Gaussian matrices it is close to for all larger than while for the expander matrices it is already close to for all larger than . The value of the optimal is large for small and is always for larger ; see Figure 17.
To summarize the results, it seems that the latent group Lasso outperforms the variant. It can even compete with the iterative algorithms tested in the previous section; the number of required measurements can be even smaller for the latent group Lasso while at the same time the support is correctly recovered. Nevertheless in a real-word application the optimal is not known and has to be found. Furthermore in contrast to the iterative algorithms studied in this work it can never be guaranteed that the recovered support and especially the number of groups calculated by the latent group Lasso is optimal. The expander variant of the latent group Lasso performs much worse than for the iterative algorithms. Especially this approach fails to recover the true signal for all instances, while the true support can be found.
6 Conclusion
In this paper we revisited the model-based compressed sensing problem focusing on overlapping group models with bounded treewidth and low frequency. We derived a polynomial time dynamic programming algorithm to solve the exact projection problem for group models with bounded treewidth, which is more general than the state-of-the-art considering loopless overlapping models. For general group models we derived an algorithm based on the idea of Bender’s decomposition, which may run in exponential time but often performs better than dynamic programming in practice. We proved that the latter procedure is generalizable from group-sparse models to group-sparse plus standard sparse models. The most dominant operation of iterative exact projection algorithms is the model projection. Hence our results show that the Model-IHT and the MEIHT run in polynomial time for group models with bounded treewidth. Alternatively, for group models with bounded frequency we show that another class of less accurate algorithms run in polynomial time. More precisely the AM-IHT and the AM-EIHT are algorithms using head- and tail-approximations instead of exact projections.
Using Benders’ Decomposition (with Gaussian and model-expander sensing matrices) we compare the minimum number of measurements required by, and runtimes of, each of the four algorithms (Model-IHT, MEIHT, AM-IHT and AM-EIHT) to achieve a given accuracy. In summary the experimental results on overlapping block groups seem to indicate that the number of required measurements to recover a signal is smaller for expander matrices than for Gaussian matrices. Furthermore, we could observe that the number of measurements to ensure a small relative error is smaller for the approximate versions of the algorithms. The run-time gets much larger for Gaussian matrices with increasing than for expander matrices, which might be just what is expected when applying dense versus sparse matrices. In general the approximate versions of the algorithms may have a larger number of iterations but the run-time is lower. This indicates that the larger number of iterations can be compensated by the faster computation of the approximate projection problems in each iteration. Additionally to the iterative algorithms we test the latent group Lasso approach on the same instances and show that the variant outperforms the variant and is even competitive to the iterative algorithms.
Acknowledgements
BB acknowledges the support from the funding by the German Federal Ministry of Education and Research, administered by Alexander von Humboldt Foundation, for the German Research Chair at AIMS South Africa.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] K. J. Ahn, S. Guha, and A. Mc Gregor. Graph sketches: sparsification, spanners, and subgraphs. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems , pages 5–14. ACM, 2012.
- 2[2] M. J. Alam, M. A. Bekos, M. Kaufmann, P. Kindermann, S. G. Kobourov, and A. Wolff. Smooth orthogonal drawings of planar graphs. In LATIN 2014: Theoretical Informatics - 11th Latin American Symposium, Montevideo, Uruguay, March 31 - April 4, 2014. Proceedings , pages 144–155, 2014.
- 3[3] B. Bah, L. Baldassarre, and V. Cevher. Model-based sketching and recovery with expanders. In SODA , pages 1529–1543. SIAM, 2014.
- 4[4] L. Baldassarre, N. Bhan, V. Cevher, A. Kyrillidis, and S. Satpathi. Group-sparse model selection: Hardness and relaxations. IEEE Trans. Information Theory , 62(11):6508–6534, 2016.
- 5[5] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde. Model-based compressive sensing. Information Theory, IEEE Transactions on , 56(4):1982–2001, 2010.
- 6[6] R. G. Baraniuk, V. Cevher, and M. B. Wakin. Low-dimensional models for dimensionality reduction and signal recovery: A geometric perspective. Proceedings of the IEEE , 98(6):959–971, 2010.
- 7[7] J. F. Benders. Partitioning procedures for solving mixed-variables programming problems. Numerische Mathematik , 4(1):238–252, 1962.
- 8[8] R. Berinde, A. Gilbert, P. Indyk, H. Karloff, and M. Strauss. Combining geometry and combinatorics: A unified approach to sparse signal recovery. In Communication, Control, and Computing, 2008 46th Annual Allerton Conference on , pages 798–805. IEEE, 2008.
