Generalized orthogonal matching pursuit for multiple measurements - A structural approach

Florian Boßmann

arXiv:1705.08259·math.NA·May 24, 2017

TL;DR

This paper introduces a generalized orthogonal matching pursuit algorithm tailored for multiple measurement vectors, emphasizing the preservation of data structures and demonstrating improved approximation in signal processing applications.

Contribution

It presents a novel structural approach to MMV sparse approximation, focusing on data structure preservation rather than just approximation accuracy.

Findings

1. Numerical comparisons show improved performance over existing methods.

2. The approach effectively preserves complex data structures.

3. Applications demonstrate practical benefits in signal processing tasks.

Tables

Table I: Polynomial coefficients of the structures shown in Fig. 5.

$L$	$1$	$2$	$3$	$4$
$x^{4}$	$1.05\times 10^{-10}$	$1.57\times 10^{-11}$	$1.48\times 10^{-6}$	$5.65\times 10^{-7}$
$x^{3}$	$8.91\times 10^{-8}$	$1.45\times 10^{-8}$	$-3.02\times 10^{-5}$	$4.22\times 10^{-5}$
$x^{2}$	$2.42\times 10^{-5}$	$4.80\times 10^{-6}$	$-0.000290$	$0.001354$
$x$	$0.002202$	$0.000576$	$0.007798$	$0.010080$
const.	$2.8598$	$0.770270$	$1.6013$	$1.6006$

Equations

$\min_{\underline{x}\in\mathbb{C}^{N_{3}}}\|\underline{x}\|_{0}\quad\text{s.t.}\quad\|\underline{D}\,\underline{x}-\underline{b}\|_{2}\leq\varepsilon$

$\min_{\underline{X}\in\mathbb{C}^{N_{3}\times N_{2}}}\|\underline{X}\|_{0,\infty}\quad\text{s.t.}\quad\|\underline{D}\,\underline{X}-\underline{B}\|_{2}\leq\varepsilon$

$\mathbb{J}=\mathbb{J}_{P}=\{\operatorname*{supp}\underline{M}\ |\ \text{each column of }\underline{M}\text{ is 1-sparse}\},$

$(j,k)\in J\ \Leftrightarrow\ j=\operatorname*{arg\,max}|\underline{D}^{*}\underline{r}^{k}|,$

$\mathbb{J}=\mathbb{J}_{V}=\{\operatorname*{supp}\underline{M}\ |\ \|\underline{M}\|_{0}=1\}$

$\mathbb{J}=\mathbb{J}_{S}=\{\operatorname*{supp}\underline{M}\ |\ \underline{M}\text{ has at most one non-zero row}\}$

$i=\operatorname*{arg\,max}_{j}\|(\underline{D}^{*}\underline{R})_{j,\cdot}\|_{\lambda}$

$\left(\{m_{i}\}_{(i,j)\in J},\ \{\overline{m_{i}m_{i^{\prime}}}\ |\ d_{\mathscr{M}}(m_{i},m_{i^{\prime}})\leq\alpha\}\right)\ \text{is connected.}$

$d_{\mathscr{P}}(p_{j},p_{j^{\prime}})\leq\gamma\,d_{\mathscr{M}}(m_{i},m_{i^{\prime}})$

$\mathbb{J}(\alpha,\gamma)=\{J\ |\ J\text{ satisfies (7) and (8)}\}.$

$J=\operatorname*{arg\,max}_{J^{\prime}\in\mathbb{J}(\alpha,\gamma)}\|(\underline{D}^{*}\underline{R})_{J^{\prime}}\|_{\lambda}$

$(\chi_{k})_{j}=\begin{cases}1 & \text{the }j\text{-th edge starts at }v_{k},\\ -1 & \text{the }j\text{-th edge ends at }v_{k},\\ 0 & \text{otherwise}.\end{cases}$

$\|p_{j,k}-p_{j^{\prime},k^{\prime}}\|_{2}=\begin{cases}\sqrt{2n} & j\neq j^{\prime},\\ \sqrt{2n} & j=j^{\prime}\text{ but }\overline{v_{k}v_{k^{\prime}}}\not\in G,\\ \sqrt{2n+2} & j=j^{\prime}\text{ and }\overline{v_{k}v_{k^{\prime}}}\in G,\end{cases}$

$(\underline{R})_{(j,k),i}=\begin{cases}1 & k=i,\\ 0 & \text{otherwise}.\end{cases}$

$\mu_{1}(l)=\max_{\Omega\subset\{1,\dots,N_{3}\},\ |\Omega|\leq l}\ \max_{\omega\not\in\Omega}\|(\underline{D}^{*}\underline{D})_{\Omega,\omega}\|_{1},$

$\lambda=\min\left\{\frac{(\underline{D}^{*}\underline{R}^{l-1})_{i,j}}{\|(\underline{D}^{*}\underline{R}^{l-1})_{\cdot,j}\|_{\infty}},\ (i,j)\in J_{l},\ l=1,\dots,L\right\}.$

$\underline{D}^{*}\underline{R}^{l-1}=\begin{pmatrix}1000 & 999 & \dots & 1\\ 1 & 2 & \dots & 1000\end{pmatrix}$

$\frac{(\underline{D}^{*}\underline{R}^{l-1})_{i,j}}{\|(\underline{D}^{*}\underline{R}^{l-1})_{\cdot,j}\|_{\infty}}>\beta$

$(i,j)\in J_{l}$

$d_{\mathscr{P}}(p_{j_{1}},p_{j_{2}})\leq(\gamma+2\varepsilon_{u}/m)\,d_{\mathscr{M}}(m_{i_{1}},m_{i_{2}})$

$d_{\mathscr{P}}(p_{j_{1}},p_{j_{2}})\geq d_{\mathscr{P}}(p_{j_{1}^{\prime}},p_{j_{2}^{\prime}})-2\varepsilon_{u}>\left(\gamma+\frac{2\varepsilon}{m}\right)d_{\mathscr{M}}(m_{i_{1}},m_{i_{2}}).$

$J_{l}\subseteq J_{l},$

$(1-\varepsilon_{B}^{k})^{N_{2}-k+1}\leq\Pr(J_{l}\in\mathbb{J}(k\alpha,\gamma)).$

$f_{l}=\operatorname*{arg\,min}_{f\in F}\ \|(p_{j}-f(m_{i}))_{(i,j)\in J_{l}}\|_{2}+\delta\,|\operatorname*{supp}f|,$

$J_{l}=\{(i,j)\ |\ m_{i}\in\operatorname*{supp}f_{l},\ p_{j}=[f_{l}(m_{i})]\}$

$g_{l}=\operatorname*{arg\,min}_{g\in G}\ \|((\underline{X})_{i,j}-g(m_{i}))_{(i,j)\in J_{l}}\|_{2}$

$(\underline{X})_{i,j}=\begin{cases}1 & i=[j\tan\xi],\\ 0 & \text{otherwise}\end{cases}$

$(\underline{X})_{i,j}=\begin{cases}1 & i=500,\\ 0 & \text{otherwise},\end{cases}$

$(\underline{X}_{2})_{i,j}=\begin{cases}\varepsilon_{B}(j) & i=500,\\ 0 & \text{otherwise},\end{cases}$

$g(t)=e^{-\theta t^{2}}\cos(\phi t+\psi).$

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Structural Health Monitoring Techniques · Microwave Imaging and Scattering Analysis

Full text

Generalized orthogonal matching pursuit for multiple measurements - A structural approach

Florian Boßmann, University of Passau, Mathematics / Digital Image Processing, [email protected]

Abstract

Sparse data approximation has become a popular research topic in signal processing. However, in most cases only a single measurement vector (SMV) is considered. In applications, the multiple measurement vector (MMV) case is more common, i.e., the sparse approximation problem has to be solved for several data vectors coming from closely related measurements. Thus, there is an unknown inter-vector correlation between the data vectors. Using SMV methods typically does not return the best approximation result, as the correlation is ignored. In the past few years, several algorithms for the MMV case have been designed to overcome this problem. Most of these techniques focus on the approximation quality while making quite strong assumptions on the type of inter-vector correlation.

While we still want to find a sparse approximation, our focus lies on preserving (possibly complex) structures in the data. Structural knowledge is of interest in many applications. It can give information about, e.g., the type, form, number, or size of objects of interest. This may be even more useful than the information given by the non-zero amplitudes themselves. Moreover, it allows efficient post-processing of the data. We numerically compare our new approach with other techniques and demonstrate its benefits in two applications.

Index Terms:

sparse approximation, multiple measurements, greedy algorithm, inter-signal correlation

I Introduction

Sparse approximations of given data are of great interest in many different applications. They are used in image processing for e.g., denoising [2, 1], compression [3] or restoration [4]. Sparsity assumptions appear in face and speech recognition [5, 6], magnetic resonance imaging (MRI) and computer tomography (CT) [7, 8], as well as in non-destructive testing [10, 9] and seismic data processing [13, 14, 11, 12]. A detailed overview can also be found in [16, 15] and the references therein.

The sparse approximation itself can be stated as follows: Given a dictionary matrix $\underline{D}\hskip 1.0pt\in\mathbb{C}^{N_{1}\times N_{3}}$ and a measurement vector $\underline{b}\hskip 1.0pt\in\mathbb{C}^{N_{1}}$ , solve

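$$\min_{\underline{x}\in\mathbb{C}^{N_{3}}}\|\underline{x}\|_{0}\quad\text{s.t.}\quad\|\underline{D}\,\underline{x}-\underline{b}\|_{2}\leq\varepsilon\tag{1}$$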

for a given $\varepsilon>0$ . Here $\|\underline{x}\hskip 1.0pt\|_{0}$ is the $\ell_{0}$ -quasi-norm, i.e., $\|\underline{x}\hskip 1.0pt\|_{0}:=|\{k\ |\ x_{k}\neq 0\}|$ . The vector $\underline{x}\hskip 1.0pt$ is said to be $L$ -sparse, if $\|\underline{x}\hskip 1.0pt\|_{0}\leq L$ for $L\in\mathbb{N}$ . The matrix $\underline{D}\hskip 1.0pt$ is constructed using a basis, frame or dictionary in which the given data $\underline{b}\hskip 1.0pt$ is assumed to be sparse. Typical examples are Fourier [17] or Wavelet bases [18] as well as Curvelet [19] or Shearlet frames [20]. Dictionaries can be designed according to the underlying application, as e.g., the Gabor impulse in ultrasonic testing [21, 9].

The exact solution of (1) can in general only be found combinatorially, i.e., by considering all possible supports of $\underline{x}$; the problem is NP-hard. There are two main strategies for finding at least an approximate solution of (1). The first replaces the $\ell_{0}$-quasi-norm by the $\ell_{1}$-norm, which makes the problem convex. This approach is known as convex relaxation or basis pursuit [22, 23]. The second strategy uses greedy algorithms, which iteratively build up a global approximation by solving local subproblems [25]. Matching Pursuit (MP) and Orthogonal Matching Pursuit (OMP) [23] are probably the best-known algorithms in this context. In recent years, more advanced algorithms have been developed, such as Stagewise OMP [26], Compressive Sampling MP [27, 28], and regularized OMP [27, 29]. An overview can be found in [30].

Eq. (1) is known as the single measurement vector (SMV) problem. It has been studied extensively over the last few years. However, there is an extension known as the multiple measurement vector (MMV) problem. Instead of only one data vector $\underline{b}\in\mathbb{C}^{N_{1}}$, several measurements $\underline{B}:=(\underline{b}^{1},\ldots,\underline{b}^{N_{2}})\in\mathbb{C}^{N_{1}\times N_{2}}$ are given. The problem is stated analogously as

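$$\min_{\underline{X}\in\mathbb{C}^{N_{3}\times N_{2}}}\|\underline{X}\|_{0,\infty}\quad\text{s.t.}\quad\|\underline{D}\,\underline{X}-\underline{B}\|_{2}\leq\varepsilon\tag{2}$$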

with $\underline{X}=(\underline{x}^{1},\ldots,\underline{x}^{N_{2}})$ and $\|\underline{X}\|_{0,\infty}=\max_{k}\|\underline{x}^{k}\|_{0}$, i.e., each vector $\underline{x}^{k}$ is sparse. In fact, this problem seems to be more common in many applications. It appears, e.g., in non-destructive testing [10, 9], in seismic data processing [13, 14, 11, 12], and in magnetoencephalography (MEG) [24]. The MMV formulation can be used whenever several measurements of the same or similar objects were made. One straightforward approach to solving (2) is to split it into $N_{2}$ SMV problems and solve these independently using the methods mentioned above. However, since the $N_{2}$ measurements were made in a quite similar set-up, intuition suggests that the obtained data vectors $\underline{b}^{1},\ldots,\underline{b}^{N_{2}}$ should also be correlated in some way. Simply solving $N_{2}$ SMV problems ignores this correlation, and thus the solution quality might suffer.
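To make the mixed quasi-norm $\|\underline{X}\|_{0,\infty}$ concrete, the following minimal Python sketch counts the non-zeros per column and takes the maximum. The helper names `l0_norm` and `l0_inf_norm` and the tolerance argument `tol` are illustrative assumptions, not part of the paper.

```python
def l0_norm(x, tol=0.0):
    """Number of entries of x whose magnitude exceeds tol (the l0 quasi-norm for tol=0)."""
    return sum(1 for v in x if abs(v) > tol)


def l0_inf_norm(X, tol=0.0):
    """Largest column-wise l0 count of a matrix X given as a list of rows."""
    columns = zip(*X)  # transpose: iterate over columns
    return max(l0_norm(col, tol) for col in columns)


# Two very different sparsity patterns with the same value of the mixed norm:
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
single_row = [[0, 0, 0, 0], [3, 1, 2, 5], [0, 0, 0, 0], [0, 0, 0, 0]]

print(l0_inf_norm(identity), l0_inf_norm(single_row))
```

Both matrices have exactly one non-zero entry per column, so the mixed norm alone does not distinguish a shared row support from a shifted (diagonal) one.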

Recently, new methods have been developed that consider inter-signal correlation. In [24] an extension for MP and the FOCal Underdetermined System Solver (FOCUSS) are presented. Bayesian methods are considered in [31, 32]. In [33, 34] the authors introduce Greedy pursuit and convex relaxation for the MMV problem. Theoretical results have been shown e.g., in [35]. All these methods force a common support in the reconstructed solution, i.e., the reconstructed matrix $\underline{X}\hskip 1.0pt$ has only few nonzero rows or, in other words, the columns of $\underline{X}\hskip 1.0pt$ have (nearly) the same support. In [36] two joint sparsity models (JSM) for compressed sensing are introduced. JSM-1 considers solutions where all columns $\underline{x}\hskip 1.0pt^{k}$ can be written as the sum $\underline{x}\hskip 1.0pt^{k}=\underline{x}\hskip 1.0pt^{c}+\underline{x}\hskip 1.0pt^{k,u}$ of a common sparse component $\underline{x}\hskip 1.0pt^{c}$ that is equal for each column and another unique sparse vector $\underline{x}\hskip 1.0pt^{k,u}$ . JSM-2 is equal to the support constraint considered above. Another approach is presented in [38, 37] where correlated measurements are assumed to have sparse approximations that are close in the euclidean distance. This idea is related to dynamic compressed sensing [40, 39]. Here neighboring columns are assumed to have similar support. In both cases, the support is allowed to change slowly over different data vectors. In most cases the used methods penalize non-smooth rows in $\underline{X}\hskip 1.0pt$ .

However, all of these methods make quite restrictive support assumptions and hence cannot reconstruct even simple geometries in the solution. As an example, consider $\underline{X}$ to be the identity matrix. There is no common support between the columns, and the rows are non-smooth. Nevertheless, the matrix is clearly structured: the linear structure can be described by a shift of $1$ index per column. In the next section we introduce a generalized version of orthogonal matching pursuit for multiple measurements (GM-OMP) that takes complex structures in the data into account. Numerical evidence for the proposed method is shown in the third section of this work.

II The Algorithm

In this section we first introduce OMP and discuss its generalization to the MMV problem. GM-OMP increases the support of the solution $\underline{X}\hskip 1.0pt$ in each iteration by adding an index set $J\in\mathbb{J}$ where $\mathbb{J}$ is the set of feasible selections. The parametrization and selection of $J$ is the main idea of GM-OMP and is discussed in the second subsection. After first theoretical results are shown in the third subsection, we present an a-posteriori denoising technique that is based on the structural component of the reconstructed solution.

II-A OMP and GM-OMP

Orthogonal matching pursuit is a greedy algorithm that seeks to find a sparse solution of (1). For simplicity we assume that the columns of $\underline{D}\hskip 1.0pt$ are normalized. Then the iterative scheme of OMP can be summarized as follows:

1. Set the residual $\underline{r}=\underline{b}$ and the support $I=\emptyset$.

2. Calculate $i=\operatorname*{arg\,max}|\underline{D}^{*}\underline{r}|$ and update $I\leftarrow I\cup\{i\}$.

3. Solve $\underline{x}=\operatorname*{arg\,min}\limits_{\operatorname*{supp}\underline{y}\subseteq I}\|\underline{b}-\underline{D}\underline{y}\|_{2}$ and set $\underline{r}=\underline{b}-\underline{D}\underline{x}$.

4. Iterate 2-3 until a stopping criterion holds.

Here $\underline{D}^{*}$ is the conjugate transpose of $\underline{D}$. Hence, in step 2 the algorithm chooses the column of $\underline{D}$ that correlates most with the residual and adds its index to the support set. Step 3 calculates the best approximation for the selected support. The algorithm may, e.g., be stopped after $L$ iterations (the solution $\underline{x}$ is then $L$-sparse) or when the residual norm drops below a threshold, i.e., $\|\underline{r}\|_{2}\leq\varepsilon$.
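The four OMP steps can be sketched in a few lines of Python with NumPy. This is a minimal illustration under the stated assumption of unit-norm dictionary columns; the function name `omp`, its arguments, and the masking of already selected indices are choices made for this sketch rather than details taken from the paper.

```python
import numpy as np


def omp(D, b, max_iter, eps=0.0):
    """Sketch of orthogonal matching pursuit for a dictionary D with unit-norm columns."""
    D = np.asarray(D)
    b = np.asarray(b)
    x = np.zeros(D.shape[1], dtype=np.result_type(D, b, float))
    selected = np.zeros(D.shape[1], dtype=bool)
    support = []
    r = b.copy()                                   # step 1: r = b, I = {}
    for _ in range(max_iter):                      # step 4: iterate
        if np.linalg.norm(r) <= eps:               # residual-based stopping rule
            break
        corr = np.abs(D.conj().T @ r)              # step 2: |D* r|
        corr[selected] = 0.0                       # never pick an index twice
        i = int(np.argmax(corr))
        selected[i] = True
        support.append(i)
        # step 3: least-squares fit restricted to the current support
        coef, *_ = np.linalg.lstsq(D[:, support], b, rcond=None)
        x[:] = 0
        x[support] = coef
        r = b - D @ x
    return x, support


# Small example: three unit vectors plus one mixed atom.
s = 2 ** -0.5
D = np.array([[1.0, 0.0, 0.0, s],
              [0.0, 1.0, 0.0, s],
              [0.0, 0.0, 1.0, 0.0]])
b = np.array([2.0, 0.0, -3.0])
x, support = omp(D, b, max_iter=2)
```

In this example the first iteration picks the third column (largest correlation) and the second iteration picks the first column, after which the residual vanishes.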

Now, let us consider the MMV problem shown in (2). The idea of GM-OMP is surprisingly simple: we only adapt the second step of OMP, while all other steps stay the same. To this end, note that OMP chooses one index $i$ and adds it to the support set $I$. Since we are now dealing with multiple measurements, GM-OMP is allowed to add not only one index $i$ but multiple indices to the support (e.g., one index per column of $\underline{X}$). Let us denote the set of all indices added by $J$. This index set should be chosen from a set of feasible selections $J\in\mathbb{J}\subseteq\mathcal{P}(\{(j,k)\ |\ j\leq N_{3},k\leq N_{2}\})$, where $\mathcal{P}(\cdot)$ denotes the power set. The second step of GM-OMP now reads as follows:

Choose $J\in\mathbb{J}$ and update $I\leftarrow I\cup J$ .

The complete scheme of GM-OMP is shown in Alg. 1, where the maximum number of iterations $L$ and the minimal residual norm $\varepsilon_{R}$ are included as stopping criteria. Of course, we need to define the feasible set $\mathbb{J}$ and find a suitable choice $J\in\mathbb{J}$. This problem will be discussed in the next subsection. Let us first consider three examples to clarify the principle of GM-OMP and the feasible set $\mathbb{J}$.
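Since Alg. 1 itself is not reproduced in this text, the overall loop can be sketched as follows. This is a hedged Python/NumPy sketch, not the author's implementation: the function name `gm_omp`, the callback `select` (which stands in for the choice of $J\in\mathbb{J}$), the use of the Frobenius norm of the residual as stopping measure, and the column-wise least-squares update are all assumptions.

```python
import numpy as np


def gm_omp(D, B, select, max_iter, eps_r=0.0):
    """Sketch of the GM-OMP loop; select(C) returns an iterable of (row, col) pairs."""
    D = np.asarray(D)
    B = np.asarray(B)
    n3, n2 = D.shape[1], B.shape[1]
    X = np.zeros((n3, n2), dtype=np.result_type(D, B, float))
    support = np.zeros((n3, n2), dtype=bool)
    R = B.copy()
    for _ in range(max_iter):
        if np.linalg.norm(R) <= eps_r:             # assumed: Frobenius norm of residual
            break
        C = np.abs(D.conj().T @ R)                 # correlation matrix |D* R|
        C[support] = 0.0                           # ignore entries already in the support
        for j, k in select(C):                     # generalized step 2: I <- I u J
            support[j, k] = True
        for k in range(n2):                        # step 3, solved column by column
            rows = np.flatnonzero(support[:, k])
            if rows.size:
                coef, *_ = np.linalg.lstsq(D[:, rows], B[:, k], rcond=None)
                X[:, k] = 0
                X[rows, k] = coef
        R = B - D @ X
    return X, support


def select_per_column(C):
    """Hypothetical selection rule: the strongest row in every non-zero column."""
    return [(int(np.argmax(C[:, k])), k)
            for k in range(C.shape[1]) if C[:, k].max() > 0]


# A diagonal pattern is recovered in a single iteration with this rule.
D = np.eye(3)
B = np.diag([1.0, 2.0, 3.0])
X, support = gm_omp(D, B, select_per_column, max_iter=1)
```

Passing a different `select` callback changes only the second step, which mirrors the idea that the feasible set and its selection rule are the only parts of OMP being generalized.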

For our first example, consider the set

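$$\mathbb{J}=\mathbb{J}_{P}=\{\operatorname*{supp}\underline{M}\ |\ \text{each column of }\underline{M}\text{ is 1-sparse}\},\tag{3}$$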

i.e., $\mathbb{J}_{P}$ contains all index sets that can be associated with the support of matrices $\underline{M}\hskip 1.0pt$ having at most one non-zero element per column. Here $\underline{M}\hskip 1.0pt$ is of same size as $\underline{X}\hskip 1.0pt$ . Choose $J\in\mathbb{J}_{P}$ such that

$$(j,k)\in J\ \Leftrightarrow\ j=\operatorname*{arg\,max}|\underline{D}^{*}\underline{r}^{k}|,$$

where $\underline{r}^{k}$ is the $k$-th column of the residual matrix $\underline{R}$. This way, GM-OMP is equivalent to OMP applied in parallel to each column of $\underline{B}$. In our next example, define

$$\mathbb{J}=\mathbb{J}_{V}=\{\operatorname*{supp}\underline{M}\ |\ \|\underline{M}\|_{0}=1\}\tag{4}$$

as the set of all 1-sparse supports. It is easy to see that the best choice $J\in\mathbb{J}_{V}$ is $J=\{\operatorname*{arg\,max}|\underline{D}^{*}\underline{R}|\}$, the index set containing only the position of the maximum absolute value of $\underline{D}^{*}\underline{R}$. Now, GM-OMP is identical to using OMP on the vectorized formulation of (2), i.e., $\underline{X}$ and $\underline{B}$ are rewritten as column vectors and $\underline{D}$ becomes a block diagonal matrix. As our last example, consider

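$$\mathbb{J}=\mathbb{J}_{S}=\{\operatorname*{supp}\underline{M}\ |\ \underline{M}\text{ has at most one non-zero row}\}\tag{5}$$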

containing all support sets with constant row index. A possible choice $J\in\mathbb{J}_{S}$ is given by

$$i=\operatorname*{arg\,max}_{j}\|(\underline{D}^{*}\underline{R})_{j,\cdot}\|_{\lambda}\tag{6}$$

and $J=\{(i,j)\ |\ j\leq N_{2}\}$ . Here $\|(\underline{D}\hskip 1.0pt^{*}\underline{R}\hskip 1.0pt)_{j,\cdot}\|_{\lambda}$ denotes the $\lambda$ -norm of the $j$ -th row of the matrix. For $\lambda=1$ GM-OMP becomes the simultaneous OMP (S-OMP) introduced in [33], the case $\lambda=2$ is discussed in [24].
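The three example selection rules can be written down compactly. The sketch below operates on a correlation matrix `C` that plays the role of $|\underline{D}^{*}\underline{R}|$; the function names and the small test matrix are hypothetical and only serve to illustrate the three choices.

```python
import numpy as np


def select_parallel(C):
    """J_P-style choice: the row of largest correlation in every column."""
    return [(int(np.argmax(C[:, k])), k) for k in range(C.shape[1])]


def select_single(C):
    """J_V-style choice: the position of the single largest entry."""
    j, k = np.unravel_index(int(np.argmax(C)), C.shape)
    return [(int(j), int(k))]


def select_row(C, lam=1):
    """J_S-style choice: the row with the largest lam-norm, taken across all columns."""
    i = int(np.argmax(np.linalg.norm(C, ord=lam, axis=1)))
    return [(i, k) for k in range(C.shape[1])]


# Stand-in for |D* R| with three dictionary atoms and three measurements.
C = np.array([[5.0, 1.0, 1.0],
              [3.0, 3.0, 3.0],
              [0.0, 4.0, 0.5]])

print(select_parallel(C))        # one index per column
print(select_single(C))          # one index in total
print(select_row(C, lam=1))      # row with the largest 1-norm
print(select_row(C, lam=np.inf)) # row containing the largest single entry
```

Note how the row rule depends on $\lambda$ in this example: the 1-norm favors the evenly filled second row, while the maximum norm favors the first row with its single large entry.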

For the three demonstrated choices of $\mathbb{J}$, GM-OMP reduces to well-known algorithms for the MMV problem. However, neither $\mathbb{J}_{P}$ nor $\mathbb{J}_{V}$ contains structured sets, while $\mathbb{J}_{S}$ is restricted to row-sparsity of $\underline{X}$ (see the discussion in the introduction). Thus, we introduce a more general choice for $\mathbb{J}$ in the next subsection, where $\mathbb{J}=\mathbb{J}(\alpha,\gamma)$ can be adapted by parameters. We will see that the three examples form the extreme cases of the parameter choice.

II-B Feasible set and selection

Alg. 1 demonstrates the generalized scheme of OMP for multiple measurements. The set $\mathbb{J}$ represents the feasible sparsity patterns that can be chosen per iteration. However, since $\mathbb{J}$ is a subset of a power set, it can be of exponential size. Thus, it is not feasible to leave it to the user as input data. Instead, we parametrize $\mathbb{J}$ and define a selection rule for $J\in\mathbb{J}$ based on the parameters. This way, the user only has to choose parameters that describe sparsity patterns suitable for the application at hand.

Remark 1

In the following, we describe a parametrization that, in the author's opinion, can be used in many applications. However, the reader may choose a different description of $\mathbb{J}$ and a corresponding selection rule $J\in\mathbb{J}$ that is more suitable for the particular problem.

Note that $J\in\mathbb{J}$ is a set of two-dimensional elements (row and column indices). Our idea is to use exactly two parameters $\alpha,\gamma$ to determine $\mathbb{J}=\mathbb{J}(\alpha,\gamma)$. Clearly, we cannot cover all sets $\mathbb{J}$ with two parameters, since the number of possible choices for $\mathbb{J}$ grows exponentially. Thus, we need a parametrization that generates suitable sets for applications. Analogously to the given examples (3)-(5), we identify an element $J\in\mathbb{J}$ by its pattern matrix $\underline{M}$, where $J=\operatorname*{supp}\underline{M}$ holds. Since we assume the columns of $\underline{X}$ to be sparse, it is reasonable to permit only matrices $\underline{M}$ with (at most) 1-sparse columns, i.e., in each iteration of GM-OMP the support of $\underline{X}$ should grow by at most one index per column. Given such a matrix, we interpret its sparsity pattern as samples of a function mapping the column indices to the corresponding row indices (Fig. 1). Due to the 1-sparse columns of $\underline{M}$, this mapping is unique but not necessarily defined for all columns (there may be zero columns in $\underline{M}$). With this in mind, we require the sparsity pattern of $\underline{M}$ to satisfy two conditions:

• The domain of the sparsity pattern should be connected.

• The sparsity pattern should be (Lipschitz-)continuous.

Fig. 1 shows a sparsity pattern that violates both conditions (see the dotted lines). Next, we formulate our parametrization of $\mathbb{J}$, which uses two parameters $\alpha,\gamma$ to ensure the conditions stated above. To this end, we introduce the parameter space and the measurement space.

Let $\mathscr{P}$ and $\mathscr{M}$ be metric spaces. For $\underline{X}\hskip 1.0pt\in\mathbb{C}^{N_{3}\times N_{2}}$ let $p_{1},\ldots,p_{N_{3}}\in\mathscr{P}$ and $m_{1},\ldots,m_{N_{2}}\in\mathscr{M}$ be given. We call $\mathscr{P}$ and $\mathscr{M}$ the parameter space and measurement space respectively. The elements $p_{j}$ are parameters (of the dictionary) and $m_{j}$ is a measurement (setup).

Remark 2

At first glance, it seems quite restrictive to require the existence of such spaces and elements. However, they arise quite naturally. For example, let $\underline{D}$ be the Fourier matrix. Then $p_{j}$ is the frequency of the $j$-th column of $\underline{D}$. If we use a Wavelet dictionary $\underline{D}$, the parameters $p_{j}$ contain the shift and scaling of each column. For convolution matrices $\underline{D}$, each $p_{j}$ is the shift of the $j$-th column. On the other hand, suppose the measurement data $\underline{B}$ were obtained using several sensors at different positions, each column of $\underline{B}$ corresponding to one sensor. Then we can set $m_{j}$ to the position of the $j$-th sensor. While these parameters give the following formulas a more meaningful interpretation, one can always simply use $p_{j}=j$ and $m_{j}=j$.

Now we can formulate the conditions stated above on $J=\operatorname*{supp}\underline{M}$. Using the points $m_{i}$ as vertices and defining edges using the metric $d_{\mathscr{M}}(\cdot,\cdot)$ on $\mathscr{M}$, the connected sparsity pattern condition states that the graph

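$$\left(\{m_{i}\}_{(i,j)\in J},\ \{\overline{m_{i}m_{i^{\prime}}}\ |\ d_{\mathscr{M}}(m_{i},m_{i^{\prime}})\leq\alpha\}\right)\ \text{is connected.}\tag{7}$$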

Lipschitz continuity of the pattern is ensured if

$$d_{\mathscr{P}}(p_{j},p_{j^{\prime}})\leq\gamma\,d_{\mathscr{M}}(m_{i},m_{i^{\prime}})\tag{8}$$

with the metric $d_{\mathscr{P}}$ on $\mathscr{P}$ . We define $\mathbb{J}=\mathbb{J}(\alpha,\gamma)$ by

$$\mathbb{J}(\alpha,\gamma)=\{J\ |\ J\text{ satisfies (7) and (8)}\}.\tag{9}$$

By (8) we also ensure 1-sparse columns of $\underline{M}\hskip 1.0pt$ ( $J=\operatorname*{supp}\underline{M}\hskip 1.0pt$ ) if $\gamma<\infty$ . For $\gamma=\infty$ we use the convention $\infty\cdot 0=\lim_{\gamma\rightarrow\infty}\gamma\cdot 0=0$ . We obtain the relations $\mathbb{J}(\infty,\infty)=\mathbb{J}_{P}$ , $\mathbb{J}(0,\gamma)=\mathbb{J}_{V}$ and $\mathbb{J}(\infty,0)=\mathbb{J}_{S}$ , i.e., our parametrization covers the shown examples.
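Membership in $\mathbb{J}(\alpha,\gamma)$ can be tested directly from the two conditions. The following sketch makes several assumptions that are not fixed by the text: parameters and measurements are real scalars with the absolute difference as metric, index pairs are stored as `(row, col)` with `row` indexing the parameters `p` and `col` indexing the measurements `m`, and the empty set is treated as vacuously feasible. The convention $\infty\cdot 0=0$ is implemented explicitly.

```python
from itertools import combinations


def is_feasible(J, p, m, alpha, gamma):
    """Check the connectivity condition (7) and the Lipschitz condition (8) for an index set J."""
    J = list(J)
    if not J:
        return True  # vacuously satisfies both conditions
    # Condition (8): d_P(p, p') <= gamma * d_M(m, m'), with inf * 0 := 0.
    for (r1, c1), (r2, c2) in combinations(J, 2):
        dm = abs(m[c1] - m[c2])
        bound = 0.0 if dm == 0 else gamma * dm
        if abs(p[r1] - p[r2]) > bound:
            return False
    # Condition (7): the graph on the used measurements, with an edge whenever
    # two measurements are at most alpha apart, must be connected.
    cols = sorted({c for _, c in J})
    seen = {cols[0]}
    stack = [cols[0]]
    while stack:
        c = stack.pop()
        for d in cols:
            if d not in seen and abs(m[c] - m[d]) <= alpha:
                seen.add(d)
                stack.append(d)
    return len(seen) == len(cols)


p = [0, 1, 2]  # simplest choice p_j = j
m = [0, 1, 2]  # simplest choice m_j = j
diagonal = [(0, 0), (1, 1), (2, 2)]
one_row = [(1, 0), (1, 1), (1, 2)]

print(is_feasible(diagonal, p, m, alpha=1, gamma=1))               # shift of one index per column
print(is_feasible(diagonal, p, m, alpha=1, gamma=0))               # gamma = 0 only allows constant rows
print(is_feasible(one_row, p, m, alpha=1, gamma=0))
print(is_feasible([(0, 0), (1, 0)], p, m, 1, float("inf")))        # two entries in one column
```

With $\gamma=1$ the diagonal pattern of the identity matrix is feasible, whereas $\gamma=0$ restricts the patterns to a single row.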

Given the set $\mathbb{J}(\alpha,\gamma)$ we need to choose $J\in\mathbb{J}$ . Like in OMP, we search for the support $J$ that maximizes the correlation between dictionary and residuum, i.e., we would like to solve

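$$J=\operatorname*{arg\,max}_{J^{\prime}\in\mathbb{J}(\alpha,\gamma)}\|(\underline{D}^{*}\underline{R})_{J^{\prime}}\|_{\lambda}\tag{10}$$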

for some $\lambda\geq 1$ (compare Eq. 6). Intuitively, we want all values $(\underline{D}\hskip 1.0pt^{*}\underline{R}\hskip 1.0pt)_{J}$ to be of same order of magnitude, i.e., $\lambda=1$ (or $\lambda=2$ ) may be a good choice. Unfortunately, we can state the following theorem which is proven in the next subsection:

Theorem 1

For $\lambda<\infty$ and arbitrary $\alpha,\gamma$ problem (10) is NP-hard.

Hence, we use a greedy algorithm to solve (10) approximately. Indeed, the algorithm returns the exact solution of (10) for $\lambda=\infty$. Starting with the correlation matrix $C=|\underline{D}^{*}\underline{R}|$, we iteratively build a matrix $\underline{M}$ such that $J=\operatorname*{supp}\underline{M}\in\mathbb{J}$. Beginning with $\underline{M}=0$, we add the position of the maximum value of $C$ to the support, i.e., we calculate $(i,j)=\operatorname*{arg\,max}(C)_{i^{\prime},j^{\prime}}$ and update $(\underline{M})_{i,j}=1$. To ensure that (7) is not violated, we restrict ourselves to indices $i^{\prime}\in K$, where $K$ is the set of all indices for which (7) holds. Afterwards, the chosen element $(C)_{i,j}$ and all elements in $C$ that violate (8) are set to zero. This way, it is guaranteed that (8) is fulfilled. The scheme is shown in Alg. 2.

For implementation, we recommend replacing $(C)_{I,\cdot}\neq 0$ by $(C)_{I,\cdot}>\varepsilon$ with a reasonable threshold $\varepsilon$ (see also the discussion in the theory part). Furthermore, if $C=|\underline{D}^{*}\underline{R}|$ has zero entries (or elements below the threshold) from the start, they will never be chosen by the algorithm. This ensures that the support is not artificially enlarged, i.e., $\operatorname*{supp}\underline{M}\subseteq\operatorname*{supp}C$ always holds.
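The greedy selection described above can be sketched as follows. Alg. 2 is not reproduced in this text, so this is one possible reading rather than the author's code: scalar parameters `p` and measurements `m` with the absolute difference as metric, pairs stored as `(row, col)`, and the set $K$ interpreted as the columns lying within distance `alpha` of an already selected column (all columns before the first selection) are assumptions of this sketch.

```python
import numpy as np


def greedy_select(C, p, m, alpha, gamma, eps=0.0):
    """Greedy construction of an index set J from a correlation matrix C."""
    C = np.array(C, dtype=float)                     # work on a copy
    p = np.asarray(p, dtype=float)
    m = np.asarray(m, dtype=float)
    J = []
    reachable = np.ones(C.shape[1], dtype=bool)      # columns allowed by (7)
    while True:
        masked = np.where(reachable[None, :], C, 0.0)
        r, c = np.unravel_index(int(np.argmax(masked)), masked.shape)
        if masked[r, c] <= eps:                      # nothing above the threshold is left
            break
        J.append((int(r), int(c)))
        # Zero every entry that would violate (8) together with the new element.
        dp = np.abs(p - p[r])[:, None]
        dm = np.abs(m - m[c])[None, :]
        bound = np.zeros_like(dm)
        bound[dm > 0] = gamma * dm[dm > 0]           # convention: inf * 0 = 0
        C[dp > bound] = 0.0
        C[r, c] = 0.0
        # Columns close enough to the new element become (or stay) reachable.
        near = np.abs(m - m[c]) <= alpha
        reachable = near if len(J) == 1 else (reachable | near)
    return J


C = [[9.0, 1.0, 0.0],
     [2.0, 8.0, 1.0],
     [0.0, 2.0, 7.0]]
p = m = [0, 1, 2]

print(greedy_select(C, p, m, alpha=1, gamma=1))             # follows the diagonal
print(greedy_select(C, p, m, alpha=float("inf"), gamma=0))  # stays in the first row
```

In the second call the entry in the last column of the first row is zero and is therefore never added, in line with the remark that the support is not artificially enlarged.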

It is easy to see that Alg. 2 returns the exact solution for $(\alpha,\gamma)=(\infty,\infty)$ and $(\alpha,\gamma)=(0,\gamma)$ (i.e., for $\mathbb{J}_{P}$ and $\mathbb{J}_{V}$). Setting $(\alpha,\gamma)=(\infty,0)$ (i.e., $\mathbb{J}_{S}$), Alg. 2 solves (6) for $\lambda=\infty$.

II-C Theoretical results

We first prove Theorem 1 that is the motivation for Alg. 2.

Proof:

We give a polynomial-time reduction of the coloring problem: Given a graph $G$ and a number of colors $C$ , assign a color to each vertex $v_{j}\in G$ such that for each edge $\overline{v_{i}v_{j}}\in G$ the vertices $v_{i},v_{j}$ have a different color.

Let $G$ have $M$ vertices, each vertex with at most $n$ edges. For the reduction we need each vertex to have the same number of edges. Thus, we add edges $\overline{v_{j}\cdot}$ to each vertex $v_{j}$ until it has exactly $n$ edges. Here $\overline{v_{j}\cdot}$ denotes an edge with no second vertex (or an "imaginary" vertex that will not be considered for the coloring). Let the total number of edges be $N$, where $n\leq N\leq Mn$. Now define $m_{i}=e_{i}\in\mathbb{R}^{M}$ as the $i$-th unit vector, i.e., $\|m_{i}-m_{i^{\prime}}\|_{2}=\sqrt{2}$ for $i\neq i^{\prime}$. Furthermore, define $\chi_{k}\in\mathbb{R}^{N}$, $k=1,\ldots,M$, by

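$$(\chi_{k})_{j}=\begin{cases}1 & \text{the }j\text{-th edge starts at }v_{k},\\ -1 & \text{the }j\text{-th edge ends at }v_{k},\\ 0 & \text{otherwise}.\end{cases}$$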

Note that $\chi_{k}$ is exactly $n$ -sparse. For the $j$ -th unit vector $e^{\prime}_{j}\in\mathbb{R}^{C}$ set $p_{j,k}=e^{\prime}_{j}\otimes\chi_{k}\in\mathbb{R}^{CN}$ where $\otimes$ is the Kronecker product. (For simplicity we keep the two-dimensional indexing of $p_{j,k}$ instead of reordering it into a single index.) We obtain

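$$\|p_{j,k}-p_{j^{\prime},k^{\prime}}\|_{2}=\begin{cases}\sqrt{2n} & j\neq j^{\prime},\\ \sqrt{2n} & j=j^{\prime}\text{ but }\overline{v_{k}v_{k^{\prime}}}\not\in G,\\ \sqrt{2n+2} & j=j^{\prime}\text{ and }\overline{v_{k}v_{k^{\prime}}}\in G,\end{cases}$$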

i.e., for different colors or vertices which are not connected we obtain $\sqrt{2n}$ . Now choose $\alpha=\infty$ and $\gamma=\sqrt{n}$ , then (8) only holds for pairs $(j,k),(j^{\prime},k^{\prime})$ with either different colors $j\neq j^{\prime}$ or not connected vertices $\overline{v_{k}v_{k^{\prime}}}\not\in G$ . Set $\underline{D}\hskip 1.0pt$ as identity matrix and $\underline{R}\hskip 1.0pt\in\mathbb{R}^{CM\times M}$ to

$$(\underline{R})_{(j,k),i}=\begin{cases}1 & k=i,\\ 0 & \text{otherwise}.\end{cases}$$

Then there is a feasible coloring of $G$ if and only if $\|(\underline{D}^{*}\underline{R})_{J}\|_{\lambda}=M^{1/\lambda}$ for $\lambda<\infty$, where $J$ is a solution of (10). By construction of $\underline{R}$, the value $M^{1/\lambda}$ can only be achieved with $|J|=M$ and $(\underline{R})_{(j,k),i}=1$ for all $((j,k),i)\in J$. It follows that $(j,k)=(j,i)$, and thus the $i$-th vertex is assigned the $j$-th color. Since $|J|=M$, each vertex is colored. On the other hand, each feasible coloring defines an index set $J\in\mathbb{J}(\alpha,\gamma)$ with $\|(\underline{D}^{*}\underline{R})_{J}\|_{\lambda}=M^{1/\lambda}$, which is maximal since $|J^{\prime}|\leq M$ for all $J^{\prime}\in\mathbb{J}(\alpha,\gamma)$. ∎
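The distances used in the reduction can be checked numerically on a small graph. The sketch below uses a hypothetical path graph $v_{0}-v_{1}-v_{2}$ (0-based indices) padded with two dangling edges so that every vertex has exactly $n=2$ edges, and two colors; the edge list and helper names are assumptions chosen for illustration only.

```python
import math

# Each edge is (start, end); end = None marks a dangling edge without second vertex.
edges = [(0, 1), (1, 2), (0, None), (2, None)]
n, colors = 2, 2


def chi(k):
    """Signed incidence vector of vertex k: +1 if an edge starts there, -1 if it ends there."""
    return [1 if s == k else -1 if e == k else 0 for s, e in edges]


def p(j, k):
    """Kronecker product e'_j (x) chi_k as a flat list of length colors * len(edges)."""
    out = []
    for c in range(colors):
        out.extend(chi(k) if c == j else [0] * len(edges))
    return out


def dist(a, b):
    """Euclidean distance between two equally long lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


same_color_adjacent = dist(p(0, 0), p(0, 1))      # v0 and v1 share an edge
same_color_not_adjacent = dist(p(0, 0), p(0, 2))  # v0 and v2 share no edge
different_colors = dist(p(0, 0), p(1, 1))

# With gamma = sqrt(n) and d_M = sqrt(2), the Lipschitz bound in (8) equals sqrt(2 n).
bound = math.sqrt(n) * math.sqrt(2)
```

Only the pair of adjacent vertices with the same color exceeds the bound $\sqrt{2n}$, so exactly the conflicting assignments are excluded by (8).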

Next, we prove reconstruction results for GM-OMP given exact data or data obtained with a noisy sparsity pattern. For this, we need results shown in [23]. We briefly summarize Theorem 3.1, Corollary 3.2, and Theorem 3.5 of that work for the reader: Given the Babel function

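$$\mu_{1}(l)=\max_{\Omega\subset\{1,\dots,N_{3}\},\ |\Omega|\leq l}\ \max_{\omega\not\in\Omega}\|(\underline{D}^{*}\underline{D})_{\Omega,\omega}\|_{1},$$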

OMP recovers the $L$ -sparse solution of (1) in the noiseless case ( $\varepsilon=0$ ) whenever $\mu_{1}(L)<1-\mu_{1}(L-1)$ . There exists a weak version of OMP that chooses an index $i$ in each iteration such that $|(\underline{D}\hskip 1.0pt^{*}\underline{r}\hskip 1.0pt)_{i}|\geq\lambda\max|\underline{D}\hskip 1.0pt^{*}\underline{r}\hskip 1.0pt|$ with a weakness constant $\lambda\leq 1$ holds. Weak OMP recovers an $L$ -sparse solution whenever $\mu_{1}(L)<\lambda(1-\mu_{1}(L-1))$ .
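For a given dictionary, the Babel function and the resulting recovery condition can be evaluated directly. The sketch below assumes unit-norm columns and uses the observation that, for a fixed column $\omega$, the maximizing set $\Omega$ consists of the $l$ largest off-diagonal Gram entries of that column; the function names `babel` and `omp_condition` are illustrative.

```python
import numpy as np


def babel(D, l):
    """Babel function mu_1(l) of a dictionary D with unit-norm columns."""
    if l <= 0:
        return 0.0
    D = np.asarray(D)
    G = np.abs(D.conj().T @ D)           # absolute Gram matrix |D* D|
    np.fill_diagonal(G, 0.0)             # omega must not belong to Omega
    top = -np.sort(-G, axis=0)[:l, :]    # l largest entries of every column
    return float(top.sum(axis=0).max())


def omp_condition(D, L, weakness=1.0):
    """Check mu_1(L) < weakness * (1 - mu_1(L - 1)); weakness = 1 gives plain OMP."""
    return babel(D, L) < weakness * (1.0 - babel(D, L - 1))


s = 2 ** -0.5
D = np.array([[1.0, 0.0, s],
              [0.0, 1.0, s]])

print(babel(D, 1), babel(D, 2))
print(omp_condition(np.eye(3), 2), omp_condition(D, 1), omp_condition(D, 2))
```

For an orthonormal dictionary the Babel function vanishes and the condition holds for every sparsity level, whereas the redundant example dictionary only satisfies it for $L=1$.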

Let $\underline{R}\hskip 1.0pt^{l}$ be the residual matrix after $l$ iterations and $J_{l}\in\mathbb{J}$ the greedy choice returned by Alg. 2. We calculate the weakness parameter of GM-OMP by

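$$\lambda=\min\left\{\frac{(\underline{D}^{*}\underline{R}^{l-1})_{i,j}}{\|(\underline{D}^{*}\underline{R}^{l-1})_{\cdot,j}\|_{\infty}},\ (i,j)\in J_{l},\ l=1,\dots,L\right\}.$$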

Note that $\lambda$ can be very small, depending on the range of amplitudes. As an example, consider

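$$\underline{D}^{*}\underline{R}^{l-1}=\begin{pmatrix}1000 & 999 & \dots & 1\\ 1 & 2 & \dots & 1000\end{pmatrix}$$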

with $\alpha=\infty$ , $\gamma=0$ . The greedy choice would either select the first or the second row. In both cases we obtain $\lambda=\min\{(1001-k)/k,\ k=1,\ldots,1000\}$ and hence $\lambda=1/1000$ . However, $\lambda$ will be close to $1$ for feasible sets where $\min|(\underline{X}\hskip 1.0pt)_{J_{l}}|>\max|(\underline{X}\hskip 1.0pt)_{J_{l^{\prime}}}|$ for $l<l^{\prime}$ holds. Before we state our first theorem, we need the following definition.
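The value $\lambda=1/1000$ in this example can be reproduced with a few lines of Python. The sketch assumes the two-row matrix with first row $1000,999,\ldots,1$ and second row $1,2,\ldots,1000$ and applies the definition of the weakness parameter, i.e., each selected entry is divided by the maximum of its column; the helper name `weakness` is hypothetical.

```python
import math

K = 1000
first_row = [K + 1 - k for k in range(1, K + 1)]   # 1000, 999, ..., 1
second_row = list(range(1, K + 1))                 # 1, 2, ..., 1000


def weakness(selected, other):
    """Smallest ratio of a selected entry to the maximum of its column."""
    return min(s / max(s, o) for s, o in zip(selected, other))


lam_first = weakness(first_row, second_row)    # greedy choice picks the first row
lam_second = weakness(second_row, first_row)   # or, equally, the second row
print(lam_first, lam_second)
```

In both cases the minimum is attained in the column where the selected row holds the value $1$ and the other row holds $1000$.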

Definition 1

For $L$ sets $J_{1},\ldots,J_{L}\in\mathbb{J}$ we say that the sets are $(\alpha,\gamma)$ -intersecting if there exists $(i,j)\in J_{l}$ , $(i^{\prime},j^{\prime})\in J_{l^{\prime}}$ , $l\neq l^{\prime}$ such that (7) and (8) hold.

Let $\underline{X}\hskip 1.0pt$ be the unique $L$ -column-sparse solution of $\underline{B}\hskip 1.0pt=\underline{D}\hskip 1.0pt\underline{X}\hskip 1.0pt$ and $\operatorname*{supp}\underline{X}\hskip 1.0pt=\cup_{l=1}^{L}J_{l}$ with $J_{l}\in\mathbb{J}(\alpha,\gamma)$ . We state

Theorem 2

GM-OMP recovers $\underline{X}\hskip 1.0pt$ and all elements $J_{1},\ldots,J_{L}$ of the sparsity pattern in $L$ iterations whenever $J_{1},\ldots,J_{L}$ are not $(\alpha,\gamma)$ -intersecting and $\mu_{1}(L)<\lambda(1-\mu_{1}(L-1))$ .

Proof:

The reconstruction of the support of $\underline{X}\hskip 1.0pt$ follows directly by the exactness of weak OMP. Since the sets are not $(\alpha,\gamma)$ -intersecting the indices selected by Alg. 2 belong to the same set $J_{l}$ . Suppose an index in $J_{l}$ has not been selected. Then the matrix $C$ from Alg. 2 is not zero and Alg. 2 will not stop iterating. Thus, the complete set $J_{l}$ is recovered. ∎

Theorem 2 has two disadvantages. First, the conditions depend on the solution and hence cannot be checked beforehand. Second, as we have seen, $\lambda$ can be small and thus the conditions are quite restrictive. To overcome the latter problem, we define $\beta=\mu_{1}(L)/(1-\mu_{1}(L-1))$. Now, we can use an adaptive threshold in Alg. 2 by selecting only indices such that $\lambda>\beta$ holds, i.e., in the $l$-th iteration only indices $(i,j)$ with

[TABLE]

will be selected. We can state

Theorem 3

Using this threshold strategy in Alg. 2, GM-OMP recovers the solution $\underline{X}\hskip 1.0pt$ in $L^{\prime}$ iterations where $L\leq L^{\prime}\leq N_{3}L$ whenever $\beta\leq 1$. Furthermore, let $J^{\prime}_{1},\ldots,J^{\prime}_{L^{\prime}}$ be the feasible sets selected in each iteration and $\operatorname*{supp}\underline{X}\hskip 1.0pt=\cup_{l=1}^{L}J_{l}$. If $J_{1},\ldots,J_{L}$ are not $(\alpha,\gamma)$-intersecting, then there exists a partition $L^{\prime}_{1},\ldots,L^{\prime}_{L}$ of $\{1,\ldots,L^{\prime}\}$ such that $J_{l}=\cup_{l^{\prime}\in L^{\prime}_{l}}J^{\prime}_{l^{\prime}}$.

Proof:

Note that the first index selected in Alg. 2 is the maximum of $C=|\underline{D}\hskip 1.0pt^{*}\underline{R}\hskip 1.0pt|$ and thus equivalent to an OMP choice, i.e., Alg. 2 selects at least one element per iteration. As $\beta\leq 1$ this choice is part of $\operatorname*{supp}\underline{X}\hskip 1.0pt$. The algorithm terminates after at most $N_{3}L$ iterations, which is the number of non-zero entries in $\underline{X}\hskip 1.0pt$. If the sets $J_{1},\ldots,J_{L}$ are not $(\alpha,\gamma)$-intersecting, the indices selected by Alg. 2 in one iteration belong to the same set $J_{l}$. Since we use a threshold, we can no longer conclude that $J_{l}$ is found in one iteration. However, because $\operatorname*{supp}\underline{X}\hskip 1.0pt$ is recovered completely after $L^{\prime}$ iterations, the existence of such a partition follows. ∎

Now, let us discuss the reconstruction quality of GM-OMP for noised sparsity patterns. Instead of $\underline{B}\hskip 1.0pt=\underline{D}\hskip 1.0pt\underline{X}\hskip 1.0pt$ with $\operatorname*{supp}\underline{X}\hskip 1.0pt=\cup_{l=1}^{L}J_{l}$, the noised data $\widetilde{\underline{B}\hskip 1.0pt}=\underline{D}\hskip 1.0pt\widetilde{\underline{X}\hskip 1.0pt}$ with $\operatorname*{supp}\widetilde{\underline{X}\hskip 1.0pt}=\cup_{l=1}^{L}\widetilde{J}_{l}$ is given. Here $\widetilde{J}_{l}$ is the noised version of the pattern $J_{l}$. We consider two different kinds of noise and analyze in which cases GM-OMP is able to reconstruct the sets $\widetilde{J}_{l}$, $l=1,\ldots,L$. Given $\widetilde{J}_{l}$ we can try to recover $J_{l}$ using a post processing strategy that will be introduced in the next subsection.

Theorem 4

Let $J_{l}$ be corrupted by uniform noise $\varepsilon_{u}>0$ :

[TABLE]

Set $m=\min_{i\neq i^{\prime}}d_{\mathscr{M}}(m_{i},m_{i^{\prime}})$. If (8) holds for $J_{l}$ with parameter $\gamma$, then (8) holds for $\widetilde{J}_{l}$ with Lipschitz parameter $\widetilde{\gamma}=\gamma+2\varepsilon_{u}/m$. If $J_{1},\ldots,J_{L}$ are not $(\alpha,\gamma+4\varepsilon_{u}/m)$-intersecting, then $\widetilde{J}_{1},\ldots,\widetilde{J}_{L}$ are not $(\alpha,\gamma+2\varepsilon_{u}/m)$-intersecting.

Proof:

For $(i_{1},j_{1}),(i_{2},j_{2})\in\widetilde{J}_{l}$ and $(i_{1},j^{\prime}_{1}),(i_{2},j^{\prime}_{2})\in J_{l}$

[TABLE]

holds. Analogously, for $(i_{1},j_{1})\in\widetilde{J}_{l}$, $(i_{2},j_{2})\in\widetilde{J}_{l^{\prime}}$, $(i_{1},j^{\prime}_{1})\in J_{l}$, $(i_{2},j^{\prime}_{2})\in J_{l^{\prime}}$ with $l\neq l^{\prime}$ we use that $J_{l},J_{l^{\prime}}$ are not $(\alpha,\gamma+4\varepsilon_{u}/m)$-intersecting and the inverse triangle inequality:

[TABLE]

∎

The noise assumed in Theorem 4 typically appears in applications where measurements may be corrupted due to shaking apertures. If an upper bound $\varepsilon_{u}$ is known, the parameters of GM-OMP can be adapted.
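As a worked instance of this adaptation, consider the setting used later in the numerical comparison, where $m_{i}=i$ (hence $m=1$) and the noise-free pattern satisfies (8) with $\gamma=1$. Under these assumptions, Theorem 4 gives

$$\widetilde{\gamma}=\gamma+\frac{2\varepsilon_{u}}{m}=1+2\varepsilon_{u},\qquad\text{so}\qquad\widetilde{\gamma}\leq 13\iff\varepsilon_{u}\leq 6.$$

That is, a Lipschitz parameter of $13$ covers uniform position noise up to $\varepsilon_{u}=6$ in this setting, provided the original patterns are not $(\alpha,1+4\varepsilon_{u})$-intersecting.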

Theorem 5

Let $J_{l}$ be corrupted by Bernoulli distributed noise $\varepsilon_{B}\in[0,1]$ , i.e.,

[TABLE]

where $\mathop{Pr}\left((i,j)\not\in\widetilde{J}_{l}\ |\ (i,j)\in J_{l}\right)$ is the probability that an index $(i,j)\in J_{l}$ is not in the corrupted set $\widetilde{J}_{l}$. Let (7) hold for $J_{l}$ with parameter $\alpha$ and let $\mathop{Pr}(\widetilde{J}_{l}\in\mathbb{J}(k\alpha,\gamma))$ be the probability that (7) holds for $\widetilde{J}_{l}$ with parameter $k\alpha$, $k\in\mathbb{N}$. Then

[TABLE]

Proof:

Note that (7) gives a connected graph. We search for a lower bound on the probability that the graph is still connected when we remove points $m_{j}$ with probability $\varepsilon_{B}$ but add edges $\overline{m_{j}m_{j^{\prime}}}$ with $d_{\mathscr{M}}(m_{j},m_{j^{\prime}})\leq k\alpha$. For a lower bound it is sufficient to consider the worst case, i.e., $m_{j}=j\alpha$. The graph becomes a line with at most $N_{2}$ points. The graph is connected whenever there is a connection from $m_{1}$ to $m_{N_{2}}$. For the new parameter $k\alpha$ we obtain the edges $\overline{m_{j}m_{j+q}}$ with $q\leq k$. It follows that the graph will no longer be connected whenever $k$ consecutive points vanish.

This problem is an application of success runs in Bernoulli trials [41]. In particular, $\mathop{Pr}(\widetilde{J}_{l}\in\mathbb{J}(k\alpha,\gamma))$ is the probability that the longest success run is shorter than $k$. This probability has an exact but rather complicated analytic expression. The simple lower bound $(1-\varepsilon_{B}^{k})^{N_{2}-k+1}$ is shown in [42]. Other bounds and the exact analytic form can also be found in [41]. ∎
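The bound from the proof is easy to evaluate, and the exact probability of avoiding a run of $k$ consecutive vanished points can be computed with a short dynamic program. The following sketch assumes independent removals with probability $\varepsilon_{B}$ on a line of $n$ points; the function names are hypothetical, and the example values ($n=1000$, $k=6$, $\varepsilon_{B}=0.25$) are assumptions mirroring the setting of the numerical comparison in Section III-A.

```python
def run_lower_bound(eps, k, n):
    """Lower bound (1 - eps^k)^(n - k + 1) on the probability of no run of k removals."""
    return (1.0 - eps ** k) ** (n - k + 1)

def prob_no_run(eps, k, n):
    """Exact probability that n independent trials, each a 'removal' with
    probability eps, contain no run of k consecutive removals."""
    # state[r] = probability that the trailing run of removals has length r < k
    # and no run of length k has occurred so far
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        new = [0.0] * k
        new[0] = sum(state) * (1.0 - eps)   # point survives, run resets to 0
        for r in range(k - 1):
            new[r + 1] = state[r] * eps     # point vanishes, run grows by one
        state = new                         # mass whose run would reach k is dropped
    return sum(state)

bound = run_lower_bound(0.25, 6, 1000)
exact = prob_no_run(0.25, 6, 1000)
print(f"lower bound: {bound:.4f}, exact: {exact:.4f}")
```

The dynamic program needs only $O(nk)$ operations, so comparing the simple bound with the exact value is cheap whenever the bound appears too pessimistic for choosing $k$.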

The noise assumed in Theorem 5 appears in applications, e.g., whenever a single measurement is lost or a sensor fails. The parameter $\alpha$ can be adapted according to Theorem 5.

Theorem 2 gives two conditions for exact recovery. The condition $\mu_{1}(L)<\lambda(1-\mu_{1}(L-1))$ ensures recovery of the right support set. It was deduced from OMP and was shown to be sharp [23]. The $(\alpha,\gamma)$-separation condition guarantees the separation of the support into $L$ structures. This condition is not sharp: the $L$ feasible sets $J_{1},\ldots,J_{L}$ may still be reconstructed without $(\alpha,\gamma)$-separation, depending on the amplitudes $(\underline{X}\hskip 1.0pt)_{J_{l}}$, $l=1,\ldots,L$. Theorem 3 gives reconstruction results if one or both of these conditions are relaxed.

In Theorems 4 and 5 we discussed a noised sparsity pattern and how the parameters $\alpha,\gamma$ should be adapted. In the next section, we present a post processing step to reconstruct the original pattern given the corrupted one. Beforehand, we give a statement on two other cases of noisy data. First, consider the case where $\underline{B}\hskip 1.0pt$ does not have an exact sparse representation $\underline{X}\hskip 1.0pt$ with $\underline{D}\hskip 1.0pt\underline{X}\hskip 1.0pt=\underline{B}\hskip 1.0pt$, but instead we search for a sparse approximation as in problem (2). This problem occurs, e.g., when the data $\underline{B}\hskip 1.0pt$ is noisy. We can easily obtain results similar to Theorems 2 and 3 by replacing the exact recovery condition of (weak) OMP with the optimal $L$-term approximation conditions given in [23].

As another scenario, consider a sampling $\underline{B}\hskip 1.0pt$ that is sparse in some dictionary $\underline{D}\hskip 1.0pt$, i.e., there exists a sparse solution $\underline{X}\hskip 1.0pt$ of $\underline{D}\hskip 1.0pt\underline{X}\hskip 1.0pt=\underline{B}\hskip 1.0pt$. Now, instead of $\underline{D}\hskip 1.0pt$ we are only given the dictionary $\widetilde{\underline{D}\hskip 1.0pt}$. For example, let $\underline{B}\hskip 1.0pt$ be sparse in the Fourier domain but not necessarily containing frequencies given by the discrete Fourier transform. Given the Fourier transform of $\underline{B}\hskip 1.0pt$, is it possible to reconstruct the exact frequencies, i.e., given an approximation in $\widetilde{\underline{D}\hskip 1.0pt}$, is it possible to reconstruct the exact dictionary $\underline{D}\hskip 1.0pt$? This problem was analyzed under the keyword super-resolution in [43] for the SMV problem. Only recently, the MMV problem with common support constraint was discussed in [44]. In both cases, an exact recovery is possible whenever the non-zero entries are separated by at least a distance depending on the super-resolution factor. An interesting question, which we leave for future work, is the connection between this separation and patterns that are not $(\alpha,\gamma)$-intersecting. This connection may be used to design a super-resolution method for generalized patterns.

II-D Post processing

So far, we presented the GM-OMP algorithm and proved basic theoretical properties. Before we demonstrate the technique on numerical examples, we discuss how to use GM-OMP for powerful post processing of the data. Suppose we have reconstructed a solution $\underline{X}\hskip 1.0pt$ and its support $\operatorname*{supp}\underline{X}\hskip 1.0pt=I=\widetilde{J}_{1}\cup\ldots\cup\widetilde{J}_{L}$, where $\widetilde{J}_{l}\in\mathbb{J}$ is assumed to be a corrupted sparsity pattern.

While it is a common idea to denoise corrupted amplitude values of $\underline{X}\hskip 1.0pt$, the sparsity pattern has been of minor interest so far, even though the pattern itself might be noised. For example, in non-destructive testing external forces during the measurement can corrupt the probes' positions, which influences the geometry and thus the sparsity pattern [10, 9]. As another example, consider $\underline{D}\hskip 1.0pt$ being the Fourier matrix. It only contains a fixed number of Fourier frequencies. However, there are signals that are sparse in the Fourier domain but only consist of frequencies not covered by the matrix. Then the reconstructed sparse approximation most likely rounds these frequencies to the closest frequency of $\underline{D}\hskip 1.0pt$, which can be interpreted as a corrupted sparsity pattern of $\underline{X}\hskip 1.0pt$. As a last example, simply assume a failed measurement, i.e., a zero column in $\underline{B}\hskip 1.0pt$. Surely the corresponding column in $\underline{X}\hskip 1.0pt$ will also be zero. To reconstruct the original signal, we can apply inpainting ideas on the sparsity pattern.

Remembering Fig. 1, i.e., $J_{l}=\operatorname*{supp}\underline{M}\hskip 1.0pt_{l}$ as a discrete sampling of a function, we can denoise the sparsity pattern for $l=1,\ldots,L$ by solving the problems

[TABLE]

with a weight $\delta>0$ . Here, $\mathcal{F}=\mathcal{F}(\mathscr{M},\mathscr{P})$ is a suitable function space (e.g., polynomials, splines, $\ldots$ ). Afterwards set the denoised pattern $J_{l}$ to

[TABLE]

where $[f(m_{i})]$ is $f(m_{i})\in\mathscr{P}$ rounded to the closest of the elements $p_{1},\ldots,p_{N_{3}}$ . For small $\delta$ the support $\operatorname*{supp}f$ may be large and hence $|J_{l}|$ can increase. This gives an inpainting strategy to reconstruct missing structure elements.
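The pattern denoising step can be illustrated with a small sketch. It assumes that, for $\delta=0$, the fitting problem reduces to a least-squares fit of a polynomial $f\in\Pi_{d}$ to the noisy positions over the measurement points $m_{i}$, followed by the rounding $[f(m_{i})]$ described above. The support term weighted by $\delta$ is not modeled, and all function names and test data are hypothetical.

```python
import random

def polyfit_ls(x, y, deg):
    """Least-squares polynomial coefficients (lowest degree first) via normal equations."""
    n = deg + 1
    A = [[sum(xi ** (r + c) for xi in x) for c in range(n)] for r in range(n)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(n)]
    for col in range(n):                      # Gaussian elimination, partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for r in reversed(range(n)):              # back substitution
        s = b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = s / A[r][r]
    return coef

def denoise_pattern(m, p_noisy, p_grid, deg=4):
    """Fit a degree-`deg` polynomial to the noisy pattern and round each fitted
    value to the closest admissible element of p_grid."""
    lo, hi = min(m), max(m)
    t = [2.0 * (mi - lo) / (hi - lo) - 1.0 for mi in m]   # rescale to [-1, 1] for stability
    coef = polyfit_ls(t, p_noisy, deg)
    fitted = [sum(c * ti ** k for k, c in enumerate(coef)) for ti in t]
    return [min(p_grid, key=lambda p: abs(p - f)) for f in fitted]

# Hypothetical test pattern: a straight line corrupted by bounded integer noise.
random.seed(0)
m = list(range(200))
p_true = [mi + 10 for mi in m]
p_noisy = [p + random.randint(-3, 3) for p in p_true]
p_denoised = denoise_pattern(m, p_noisy, p_grid=list(range(260)))
```

Rescaling the measurement points to $[-1,1]$ before forming the normal equations is a design choice of this sketch to keep the small linear system well conditioned; it does not change the fitted polynomial.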

A similar approach can be applied to denoise the amplitudes of $\underline{X}\hskip 1.0pt$. Given $J_{l}$ and assuming that $J_{l}\cap J_{l^{\prime}}=\emptyset$ for all $l^{\prime}\neq l$, we solve

[TABLE]

on a function space $\mathcal{G}=\mathcal{G}(\mathscr{M},\mathbb{C})$ and update $(\underline{X}\hskip 1.0pt)_{i,j}=g_{l}(m_{i})$ for all $(i,j)\in J_{l}$.
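For the simplest choice, the space of constant functions $\mathcal{G}=\Pi_{0}$ used later in Section III-B, the amplitude update amounts to replacing the amplitudes of each structure by their mean value. A minimal sketch, assuming a least-squares fit, disjoint patterns, and a hypothetical sparse storage of $\underline{X}\hskip 1.0pt$ as a dictionary keyed by index pairs $(i,j)$:

```python
def smooth_amplitudes(X, patterns):
    """Set the amplitudes on each pattern J_l to their mean value, i.e. the
    least-squares fit by a constant function g_l. X maps (i, j) -> amplitude."""
    X = dict(X)                                   # work on a copy
    for J in patterns:
        mean = sum(X[idx] for idx in J) / len(J)
        for idx in J:
            X[idx] = mean
    return X

# Hypothetical example with two disjoint patterns.
X = {(0, 3): 1.0, (1, 4): 3.0, (0, 7): 10.0}
patterns = [[(0, 3), (1, 4)], [(0, 7)]]
X_smooth = smooth_amplitudes(X, patterns)
```

Richer spaces $\mathcal{G}$ would replace the mean by a fitted function evaluated at $m_{i}$; the disjointness assumption ensures that each entry is updated by exactly one $g_{l}$.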

III Numerics

We demonstrate the advantages of our proposed algorithm in three examples. First, we compare the technique with other sparse approximation methods for the MMV problem. Afterwards we discuss two practical examples and illustrate the information given by the sparsity pattern.

III-A Numerical comparison

We compare our method to three other techniques: OMP applied to each column separately, S-OMP [33] and MSBL, a technique presented in [45] based on sparse Bayesian learning. Let $\underline{D}\hskip 1.0pt$ be a convolution matrix of a Gauss kernel with standard deviation $\sqrt{2.5}$. We define the matrix $\underline{X}\hskip 1.0pt\in\mathbb{R}^{1000\times 1000}$ by

[TABLE]

for $i,j\leq 1000$, i.e., each column of $\underline{X}\hskip 1.0pt$ is $1$-sparse. The matrix is clearly structured: it consists of one line with a slope of $\xi$. For $\xi=45^{\circ}$ this becomes the identity matrix; $\xi=0^{\circ}$ gives a matrix with one non-zero row (which is the pattern that S-OMP and MSBL assume). For $\xi=0^{\circ},\ldots,45^{\circ}$ we calculate $\underline{B}\hskip 1.0pt=\underline{D}\hskip 1.0pt\underline{X}\hskip 1.0pt$ and use all methods to reconstruct $\underline{X}\hskip 1.0pt$. For GM-OMP, we choose $m_{i}=i$, $p_{j}=j$, $\alpha=1$ and $\gamma=1$ (which corresponds to a maximal slope of $45^{\circ}$). We use $L=1$ iteration since $\underline{X}\hskip 1.0pt$ contains exactly one structure. In Fig. 2 the reconstruction error and the number of non-zero elements in the solution $\underline{X}\hskip 1.0pt$ are shown for all algorithms. Nearly all methods are able to find a good approximation. Only S-OMP forces row sparsity of $\underline{X}\hskip 1.0pt$ and thus produces staircasing effects which corrupt the solution. Both MSBL and S-OMP assume that $\underline{X}\hskip 1.0pt$ is row-sparse, i.e., there are only a few non-zero rows. Once a row contains a non-zero element, the entire row is considered to be non-zero. This leads to an extreme overestimation of the support, while OMP and GM-OMP can find the exact number of non-zero entries.

Next, we demonstrate the power of the proposed denoising step. To this end, consider $\underline{X}\hskip 1.0pt$ and its noised versions $\underline{X}\hskip 1.0pt_{1},\underline{X}\hskip 1.0pt_{2}\in\mathbb{R}^{1000\times 1000}$ with

[TABLE]

where $|\varepsilon_{u}(j)|\leq\varepsilon_{u}$ is uniform noise and $\varepsilon_{B}(j)\in\{0,1\}$, $\mathop{Pr}(\varepsilon_{B}(j)=0)=\varepsilon_{B}$ is Bernoulli distributed. Given $\underline{B}\hskip 1.0pt_{1}=\underline{D}\hskip 1.0pt\underline{X}\hskip 1.0pt_{1}$ or $\underline{B}\hskip 1.0pt_{2}=\underline{D}\hskip 1.0pt\underline{X}\hskip 1.0pt_{2}$ we want to reconstruct $\underline{X}\hskip 1.0pt$. The mean squared error over $100$ runs for both cases is plotted in Fig. 3. The noise level gives the values of $\varepsilon_{u}$ and $\varepsilon_{B}$, respectively.

For $\underline{X}\hskip 1.0pt_{1}$ we adapted the parameters according to Theorem 4 and set $\gamma=13\geq 1+2\varepsilon_{u}/m$ . In the second case, we choose $\alpha=6$ for the reconstruction of $\underline{X}\hskip 1.0pt_{2}$ . Using Theorem 5 this gives us a reconstruction probability of more than $78\%$ even for $\varepsilon_{B}=0.25$ . In both cases we solve optimization problem (13) with $\delta=0$ and $\mathcal{F}=\Pi_{4}$ afterwards to denoise the pattern.

Note that the row-sparsity assumption of S-OMP and MSBL holds for $\underline{X}\hskip 1.0pt$. Nevertheless, only GM-OMP is able to find a good approximation in most cases. All other methods only reconstruct the noised matrices $\underline{X}\hskip 1.0pt_{1},\underline{X}\hskip 1.0pt_{2}$. For a high probability of $\sigma=0$ the MSE of GM-OMP increases, i.e., the parameter $\alpha$ no longer compensates for the missing data. Interestingly, S-OMP profits from its staircasing effects when the pattern is uniformly noised and hence returns a slightly better solution.

III-B Application 1: Non-destructive testing

As a first practical example we analyze ultrasonic data obtained from non-destructive testing of steel tubes. The original data shown in Fig. 4 was generated by the "Time-of-Flight Diffraction" (ToFD) method using an Olympus Omniscan iX system with a $5$ MHz transducer, $6$ mm diameter and $70^{\circ}$ angle of incidence. The tested tube was a large diameter pipe with outer diameter $1066$ mm and $23.3$ mm wall thickness. Each column of the data represents a measured signal at a different position on the tube's surface. The positions were equidistantly set on a straight line with a distance of $0.5$ mm, hence we set $m_{i}=0.5i$. The signals were measured in time with a sampling interval of $0.01\mu$ s. Four major events can clearly be seen in the data. The topmost one is an ultrasonic impulse that directly travels through the surface from transducer to receiver - the lateral wave. The bottommost is an impulse that was reflected by the back wall - the back wall echo. The two events in between (recognizable as parabolas) indicate defects in the material. We use GM-OMP to recover and denoise these events.

Ultrasonic data is column-wise sparse when $\underline{D}\hskip 1.0pt$ is a convolution matrix based on the Gabor impulse ([9, 11])

[TABLE]

Here $\theta$ is the bandwidth factor, $\phi$ is the frequency and $\psi$ is the phase. Thus, $p_{j}=0.01j$ is the shift of the $j$-th column in $\mu$ s. For the given data we choose $\theta=6.8486$, $\phi=14.685$ and $\psi=-2.0836$ (see [9, 11] for details about the parameter choice). We define our feasible set $\mathbb{J}$ using $\alpha=0.5$ such that $|m_{i}-m_{i^{\prime}}|\leq\alpha$ only for $i^{\prime}=i-1,i+1$. Note that (8) compares a distance in time ($\mu$ s) with a distance in space (mm). The ultrasonic speed in steel is about $5.9$ mm/$\mu$ s, hence we chose $\gamma=0.1>5.9^{-1}$, which gives stability for noisy data.

After applying $L=4$ iterations of GM-OMP to the data, we use the denoising strategies discussed in the last section and set $\beta=1$. Since structures in pipe testing often behave linearly or quadratically, we use $\mathcal{F}=\{f\cdot\chi_{C}\ |\ f\in\Pi_{4},C\subseteq\mathscr{M}\text{ convex }\}$, i.e., polynomials up to degree four multiplied by a characteristic function. The characteristic function is needed for the support constraint in (13). Moreover, we choose $\mathcal{G}=\Pi_{0}$ as the space of all constant functions, i.e., the amplitudes of each structure are set to their mean value. This value can give a first hint about the underlying material in the pipe.

In Fig. 5 the four sparsity patterns found by GM-OMP are shown in the data domain (i.e., multiplied by $\underline{D}\hskip 1.0pt$). As we see, the algorithm is able to reconstruct all four structures of the original data. Due to the structural denoising, the patterns look smoother and effects caused by shaking probes are no longer visible. In Tab. I the polynomial coefficients of each pattern are shown. As expected, the lateral wave and back wall echo are mostly linear while the defects were approximated by a quadratic polynomial.

III-C Application 2: meteorologic data

In this example we use GM-OMP without post-processing on meteorologic data provided by Deutscher Wetterdienst (DWD) [46]. We analyze the hourly precipitation in Germany from 25th to 28th of November 2008, where we use data of $932$ stations shown in Fig. 6. Stations that were moved during this time or had too many missing values were neglected. In Fig. 7 the overall precipitation and the data of two stations are plotted as examples, where [math] hour refers to Nov. 25th 2008, $0:00$. We use a dictionary based on centered cardinal B-splines

[TABLE]

We use the normalized versions of all B-splines with $n\leq 7$ , i.e., $\underline{D}\hskip 1.0pt$ contains all $96$ shifts of $B_{n}$ , $n\leq 7$ . We have chosen a time period with low precipitation and thus the data can be sparsely approximated using splines. Note that a B-spline of order $n$ has a support of length $n$ . Thus, the order directly correlates to the duration of the precipitation. Since a single precipitation (e.g., rain) will be registered at several stations, we have an underlying structure in the data.

Choose $m_{i}$ to be the position coordinates of the $i$ -th station and $p_{j}\in\mathbb{N}^{2}$ as the shift and order of the corresponding spline. We use $d_{\mathscr{M}}$ as the geodetic distance and set $\alpha=30$ km. Let $d_{\mathscr{P}}=\|\cdot\|_{\infty}$ and $\gamma=1/15$ , i.e., neither the duration nor the time of occurrence should change more than $2$ hours per $30$ km. We perform $L=100$ iterations of GM-OMP.

Fig. 8 shows the time of occurrence of the largest precipitation event, i.e., the structure that includes the most stations (here $156$). The mean duration is $1.28$ h (mean B-spline order) and one can clearly recognize the event moving from north to south, caused e.g., by wind. In Fig. 9 we reconstructed the overall precipitation (see Fig. 7) using only $15$ structures. In the left figure we choose the first $15$ structures, i.e., those with the strongest precipitation; for the right figure the $15$ largest events were used. While the strongest events contain the precipitation peaks, the largest events can better reconstruct the overall structure from Fig. 7.

IV Conclusion

We presented a generalized orthogonal matching pursuit for multiple measurements. The algorithm is able to recognize and reconstruct more general sparsity patterns in the solution than other algorithms for multiple measurements. Moreover, GM-OMP allows efficient post processing for each pattern. These patterns can provide crucial information in applications, as demonstrated in two practical examples. Two parameters allow an adaptation of the feasible patterns to the application and make the algorithm more flexible. The advantages of GM-OMP were shown in comparison to other techniques and confirmed by first theoretical results.

Acknowledgements

The author thanks Mannesmann Salzgitter GmbH for providing the ultrasonic data used in this paper. This work is supported by the BMBF joint research project ZeMat (grant number: 05M13MGA).

Bibliography


[1] K. Dabov, A. Foi, V. Katkovnik, K. Egiazarian, "Image denoising by sparse 3-D transform-domain collaborative filtering", IEEE Trans. Image Processing, vol. 16(8), pp. 2080-2095, 2007.
[2] M. Elad, M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries", IEEE Trans. Image Processing, vol. 15(12), pp. 3736-3745, 2006.
[3] E. Le Pennec, S. Mallat, "Sparse geometric image representations with bandelets", IEEE Trans. Image Processing, vol. 14(4), pp. 423-438, 2005.
[4] J. Mairal, M. Elad, G. Sapiro, "Sparse representation for color image restoration", IEEE Trans. Image Processing, vol. 17(1), pp. 53-69, 2008.
[5] S. M. Katz, "Estimation of probabilities from sparse data for the language model component of a speech recognizer", IEEE Trans. Acoustics, Speech and Signal Processing, vol. 35(3), pp. 400-401, 1987.
[6] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, Y. Ma, "Robust face recognition via sparse representation", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31(2), pp. 210-227, 2009.
[7] M. Lustig, D. Donoho, J. M. Pauly, "Sparse MRI: The application of compressed sensing for rapid MR imaging", Magnetic resonance in medicine, vol. 58(6), pp. 1182-1195, 2007.
[8] E. Y. Sidky, X. Pan, "Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization", Physics in medicine and biology, vol. 53(17), pp. 4777-4807, 2008.