Uniform recovery in infinite-dimensional compressed sensing and applications to structured binary sampling

Ben Adcock; Vegard Antun; Anders C. Hansen

arXiv:1905.00126·cs.IT·May 25, 2021


TL;DR

This paper establishes uniform recovery guarantees for infinite-dimensional compressed sensing with structured sparsity, introducing multilevel sampling schemes and demonstrating their effectiveness in binary Walsh sampling applications.

Contribution

It provides the first uniform recovery guarantees for infinite-dimensional compressed sensing with local sparsity in levels and multilevel sampling, applicable to binary Walsh sampling.

Findings

01

Recovery guarantees are sharp up to log factors.

02

Improves existing results for unweighted $ℓ^{1}$-regularization.

03

First guarantees for Walsh transform with wavelet bases in binary sampling.



Tables

Table 1. The Lipschitz regularity $α$ of Daubechies wavelets with $ν$ vanishing moments.

$ν$	$α$
2	0.55
3	1.08
4	1.61

Table 2. Left: Fraction between the local coherences for $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{3,2}]$ and $\mathbf{M}=\mathbf{N}=[2^{4},\ldots,2^{11}]$. Right: Fraction between the local coherences for $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{4,4}]$ and $\mathbf{M}=\mathbf{N}=[2^{5},\ldots,2^{12}]$.

$μ_{k, l} / μ_{k + 1, l}$	$l = 1$	$l = 2$	$l = 3$
$k = 2$	3.017
$k = 3$	2.532	1.854
$k = 4$	3.292	2.532	1.846
$k = 5$	3.653	3.293	2.534
$k = 6$	3.828	3.653	3.293
$k = 7$	3.914	3.828	3.654
$k = 8$	3.957	3.914	3.828

Equations

$$\mathcal{F}f(\omega):=\int_{[0,1)}f(x)e^{-2\pi\mathrm{i}\omega x}\,dx,\qquad f\in L^{2}([0,1)),$$

$$\mathcal{W}f(n):=\int_{[0,1)}f(x)w_{n}(x)\,dx,\qquad f\in L^{2}([0,1))$$

$$\operatorname*{minimize}_{z\in\mathbb{C}^{N}}\ \|z\|_{1}\quad\text{subject to}\quad\|P_{\Omega}V\Psi^{-1}z-y\|_{2}\leq\eta$$

$$(1-\delta)\|x\|_{2}^{2}\leq\|Ax\|_{2}^{2}\leq(1+\delta)\|x\|_{2}^{2}\quad\forall x\in\Sigma_{s},$$

$$\operatorname*{minimize}_{z\in\mathbb{C}^{N}}\ \|z\|_{1}\quad\text{subject to}\quad\|Az-(Ax+e)\|_{2}\leq\eta$$

$$\|x-\hat{x}\|_{2}\leq\frac{C}{\sqrt{s}}\sigma_{s}(x)_{1}+D\eta$$

$$\mu(U)=\max_{i,j=1,\ldots,N}|U_{ij}|^{2}\in[N^{-1},1].$$

$$m\gtrsim\delta^{-2}\cdot s\cdot N\cdot\mu(U)\cdot\left(\log(2m)\log(2N)\log^{2}(2s)+\log(\epsilon^{-1})\right)$$

$$|\textnormal{supp}(x)\cap\{M_{l-1}+1,\ldots,M_{l}\}|\leq s_{l}\quad\text{for } l=1,\ldots,r.$$

$$(1-\delta)\|x\|_{2}^{2}\leq\|Ax\|_{2}^{2}\leq(1+\delta)\|x\|_{2}^{2}\quad\forall x\in\Sigma_{\boldsymbol{s},\mathbf{M}}.$$

$$\sigma_{\boldsymbol{s},\mathbf{M}}(x)_{p}:=\inf\{\|x-z\|_{p}:z\in\Sigma_{\boldsymbol{s},\mathbf{M}}\}.$$

$$\delta_{2\boldsymbol{s},\mathbf{M}}<\frac{1}{\sqrt{r\left(\sqrt{\alpha_{\boldsymbol{s},\mathbf{M}}}+\frac{1}{4}\right)^{2}+1}}.$$

$$\operatorname*{minimize}_{z\in\mathbb{C}^{M}}\ \|z\|_{1}\quad\text{subject to}\quad\|Az-(Ax+e)\|_{2}\leq\eta$$

$$\|x-\hat{x}\|_{2}\leq\left(C+C^{\prime}(r\alpha_{\boldsymbol{s},\mathbf{M}})^{1/4}\right)\frac{\sigma_{\boldsymbol{s},\mathbf{M}}(x)_{1}}{\sqrt{s}}+\left(D+D^{\prime}(r\alpha_{\boldsymbol{s},\mathbf{M}})^{1/4}\right)\eta$$

$$\mu_{k,l}=\mu_{k,l}(\mathbf{N},\mathbf{M})=\max\{|U_{ij}|^{2}:i=N_{k-1}+1,\ldots,N_{k},\ j=M_{l-1}+1,\ldots,M_{l}\}.$$

$$m_{k}=N_{k}-N_{k-1},\quad\text{for } k=1,\ldots,r_{0},$$

$$m_{k}\gtrsim\delta^{-2}\cdot(N_{k}-N_{k-1})\cdot\bigg(\sum_{l=1}^{r}s_{l}\mu_{k,l}\bigg)\cdot\left(r\log(2\tilde{m})\log(2N)\log^{2}(2s)+\log(\epsilon^{-1})\right)$$

$$A=\begin{bmatrix}\frac{1}{\sqrt{p_{1}}}P_{\Omega_{1}}U\\ \vdots\\ \frac{1}{\sqrt{p_{r}}}P_{\Omega_{r}}U\end{bmatrix}\quad\text{where } p_{k}=\frac{m_{k}}{N_{k}-N_{k-1}}\text{ for } k=1,\ldots,r$$

$$\mathbf{M}=[2^{1},2^{2},\ldots,2^{r}],$$

$$(V_{\text{Four}})_{\omega=-N/2+1,\ j=1}^{N/2,\ N}=\frac{1}{\sqrt{N}}\exp(2\pi\mathrm{i}(j-1)\omega/N),$$

$$W_{k+1}=\{-2^{k}+1,\ldots,-2^{k-1}\}\cup\{2^{k-1}+1,\ldots,2^{k}\},\quad k=1,\ldots,r-1.$$

$$m_{k}\gtrsim\delta^{-2}\cdot\bigg(\sum_{l=1}^{r}2^{-|k-l|}s_{l}\bigg)\left(r\log(2m)\log(2N)\log^{2}(2s)+\log(\epsilon^{-1})\right).$$

$$\delta_{2\boldsymbol{s},\mathbf{M}}<\frac{1}{\sqrt{r\left(\sqrt{\alpha_{\boldsymbol{s},\mathbf{M}}}+\frac{1}{4}\right)^{2}+1}}.$$

$$m_{k}\gtrsim r\cdot\alpha_{\boldsymbol{s},\mathbf{M}}\cdot(N_{k}-N_{k-1})\cdot\bigg(\sum_{l=1}^{r}\mu_{k,l}s_{l}\bigg)\cdot L$$

$$U_{ij}=\langle b^{\textnormal{sp}}_{j},b^{\textnormal{sa}}_{i}\rangle$$

$$H:=\begin{bmatrix}\frac{1}{\sqrt{p_{1}}}P_{\Omega_{1}}U\\ \frac{1}{\sqrt{p_{2}}}P_{\Omega_{2}}U\\ \vdots\\ \frac{1}{\sqrt{p_{r}}}P_{\Omega_{r}}U\end{bmatrix}\in\mathbb{C}^{m\times\infty},\quad\text{where } p_{k}=m_{k}/(N_{k}-N_{k-1})$$

$$\tilde{y}=DP_{\Omega}y+e\in\mathbb{C}^{m}$$

$$\operatorname*{minimize}\ \|z\|_{1}\quad\text{subject to}\quad\|Az-\tilde{y}\|_{2}\leq\eta.$$

$$Ax-\tilde{y}=HP_{M}^{\perp}x+e,$$

$$A=HP_{K}$$


Full text

Uniform recovery in infinite-dimensional compressed sensing and applications to structured binary sampling

Ben Adcock (Simon Fraser University, Canada)

Vegard Antun (University of Oslo, Norway; corresponding author, [email protected])

Anders C. Hansen (University of Cambridge, United Kingdom, and University of Oslo, Norway)

Abstract

Infinite-dimensional compressed sensing deals with the recovery of analog signals (functions) from linear measurements, often in the form of integral transforms such as the Fourier transform. This framework is well-suited to many real-world inverse problems, which are typically modelled in infinite-dimensional spaces, and where the application of finite-dimensional approaches can lead to noticeable artefacts. Another typical feature of such problems is that the signals are not only sparse in some dictionary, but possess a so-called local sparsity in levels structure. Consequently, the sampling scheme should be designed so as to exploit this additional structure. In this paper, we introduce a series of uniform recovery guarantees for infinite-dimensional compressed sensing based on sparsity in levels and so-called multilevel random subsampling. By using a weighted $\ell^{1}$ -regularizer we derive measurement conditions that are sharp up to log factors, in the sense they agree with those of certain oracle estimators. These guarantees also apply in finite dimensions, and improve existing results for unweighted $\ell^{1}$ -regularization. To illustrate our results, we consider the problem of binary sampling with the Walsh transform using orthogonal wavelets. Binary sampling is an important mechanism for certain imaging modalities. Through carefully estimating the local coherence between the Walsh and wavelet bases, we derive the first known recovery guarantees for this problem.

Keywords:

Infinite-dimensional compressed sensing, uniform recovery, Walsh sampling, wavelet recovery, sparsity in levels, local coherence

Mathematics Subject Classification (2010):

94A20, 42C40, 42C10, 15B52

1 Introduction

Compressive sensing (CS), introduced by Candes, Romberg & Tao in [10] and Donoho in [14], has been an area of substantial research during the last decade. The key assumption, which lays the foundation for this field of research, is that a sparse vector $x\in\mathbb{C}^{M}$ can be recovered from an underdetermined system of linear equations, using, for instance, convex optimization algorithms [15, 16].

Imaging has been one of the most successful areas of application of CS. However, in this area, the sparsity assumption is typically too general. Examples include all applications using Fourier samples – such as Magnetic Resonance Imaging (MRI) [22, 24, 25], surface scattering [21], Computerized Tomography (CT) and electron microscopy – as well as applications using binary sampling, e.g. fluorescence microscopy [29], lensless imaging [33] and numerous other optical imaging modalities [6, 17, 32]. Natural images, when sparsified via a wavelet (or more generally, $X$ -let) transform, are not only sparse, but have specific sparsity structure [3, 27]. For wavelets, which will be our sparsifying transform in this paper, natural images have coefficients where most of the large entries are concentrated at the coarse scales, and progressively fewer at the fine scales (termed asymptotic sparsity in [3]).

In the presence of structured sparsity, it is natural to ask how best to promote this additional structure. In [3] it was proposed to do this via the sampling operator. Wavelets partition Fourier space into dyadic bands corresponding to distinct scales. Hence, by choosing Fourier samples in these bands corresponding to the local sparsities, one obtains a structured sampling scheme – a so-called multilevel sampling scheme – which promotes the asymptotic sparsity structure. The practical benefits of such schemes have been demonstrated in [27] for various different imaging modalities, including MRI, Nuclear Magnetic Resonance (NMR) spectroscopy, fluorescence microscopy and Helium Atom Scattering. Theoretical analysis has been presented in [3] (nonuniform recovery) and [7, 23] (uniform recovery in the finite-dimensional setting).

1.1 Main results

This paper has two main objectives. First, we generalize existing uniform recovery guarantees [7, 23] from the finite-dimensional to the infinite-dimensional setting. This extension is important for practical imaging. Although much of the compressive imaging literature considers the recovery of discrete images (i.e. finite-dimensional arrays) from discrete measurements (e.g. the discrete Fourier transform), modalities such as MRI, NMR and others are naturally analog, and hence better modelled over the continuum (i.e. functions, and the continuous Fourier transform). Indeed, as we will see in Section 2.3, discretizing such a problem leads to measurement mismatch [11], and in the case of wavelet recovery, the wavelet crime [28, p. 232], both of which can introduce artefacts in the reconstruction [19]. In this paper, we consider signals as functions $f\in L^{2}([0,1))$ and work with continuous integral transforms, thus avoiding these pitfalls.

In our theoretical analysis, we also improve the uniform recovery guarantee given in previous works [7, 23]. Unlike previous results, our recovery guarantees are, up to log factors, optimal: specifically, they agree with those of the oracle least-squares estimator based on a priori knowledge of the support [1]. We do this by replacing the standard $\ell^{1}$ -minimization decoder by a certain weighted $\ell^{1}$ -minimization decoder; an idea originally proposed in [31].

Our second objective is to consider binary sampling. Previous works have addressed the case of (discrete or continuous) Fourier sampling. Yet many imaging modalities, e.g. fluorescence microscopy and lensless imaging, require binary sampling operators. To do so, we replace the Fourier transform

$$\mathcal{F}f(\omega):=\int_{[0,1)}f(x)e^{-2\pi\mathrm{i}\omega x}\,dx,\qquad f\in L^{2}([0,1)),$$

by the binary Walsh transform

$$\mathcal{W}f(n):=\int_{[0,1)}f(x)w_{n}(x)\,dx,\qquad f\in L^{2}([0,1)),$$

where $w_{n}\colon[0,1)\to\{+1,-1\}$ , $n\in\mathbb{Z}_{+}\coloneqq\{0,1,\ldots\}$ denote the Walsh functions. This is a widely used sampling operator in binary imaging [29, 33], and often goes under the name of Hadamard sampling in the discrete case. Working with this continuous transform, we provide analogous guarantees for binary sampling to those for Fourier sampling. As a side note, we remark that working in the continuous setting also simplifies the analysis (specifically, the derivation of so-called local coherence estimates) over working directly with the discrete setup.

We note that in this paper we only consider recovery guarantees for one-dimensional functions. We expect that the setup for higher-dimensional functions will deviate slightly from what we present here, and we will save this discussion for future work.

The outline of the remainder of this paper is as follows. We commence in Section 2 by reviewing previous work, and in particular, the existing finite-dimensional theory. We then introduce an abstract infinite-dimensional model for isometries $U$ acting on $\ell^{2}(\mathbb{N})$ in Section 3. Here we will derive sufficient conditions for such operators to provide uniform recovery guarantees. In Section 4 we continue this work by finding conditions for which the cross-Gramian $U$ between a wavelet and Walsh basis satisfies these conditions. Finally, in Sections 5, 6 and 6.6 we present the proofs of our main results.

2 Sparsity in levels in finite dimensions

2.1 Notation

For $N\in\mathbb{N}$ and $\Omega\subseteq\{1,\ldots,N\}$ we let $P_{\Omega}\in\mathbb{C}^{N\times N}$ denote the projection onto the linear span of the associated subset of the canonical basis, i.e. for $x\in\mathbb{C}^{N}$ , we have $(P_{\Omega}x)_{i}=x_{i}$ if $i\in\Omega$ and $(P_{\Omega}x)_{i}=0$ if $i\not\in\Omega$ . Sometimes, we will abuse this notation slightly by assuming $P_{\Omega}\in\mathbb{C}^{|\Omega|\times N}$ , and discard all the zero entries in $P_{\Omega}x$ . Whether we mean $P_{\Omega}\in\mathbb{C}^{N\times N}$ or $P_{\Omega}\in\mathbb{C}^{|\Omega|\times N}$ will be clear from the context. If $\Omega=\{N_{k-1}+1,\ldots,N_{k}\}$ we simply write $P_{N_{k}}^{N_{k-1}}=P_{\{N_{k-1}+1,\ldots,N_{k}\}}$ , and simply $P_{N_{k}}$ if $N_{k-1}=0$ .

We call a vector $x\in\mathbb{C}^{N}$ $s$ -sparse if $|\textnormal{supp}(x)|\leq s$ , where $\textnormal{supp}(x)=\{i:x_{i}\neq 0\}$ . We write $A\lesssim B$ if there exists a constant $C>0$ , independent of all relevant parameters, such that $A\leq CB$ , and similarly for $A\gtrsim B$ .

2.2 Finite model

Let $V\in\mathbb{C}^{N\times N}$ be a measurement matrix, e.g. a Fourier or Hadamard matrix, denoted $V_{\text{Four}}$ and $V_{\text{Had}}$ , respectively, and let $\Omega\subset\{1,\ldots,N\}$ with $|\Omega|=m<N$ . In a typical finite-dimensional CS setup we consider the recovery of a signal $x\in\mathbb{C}^{N}$ from measurements $y=P_{\Omega}Vx+e\in\mathbb{C}^{m}$ , where $e\in\mathbb{C}^{m}$ is a vector of measurement error. If $x$ is sparse in a discrete wavelet basis, one then recovers its coefficients by solving the optimization problem

$$\operatorname*{minimize}_{z\in\mathbb{C}^{N}}\ \|z\|_{1}\quad\text{subject to}\quad\|P_{\Omega}V\Psi^{-1}z-y\|_{2}\leq\eta\tag{2.1}$$

where $\Psi\in\mathbb{C}^{N\times N}$ is a discrete wavelet transform and $\eta\geq\|e\|_{2}$ is a noise parameter. Usually one would scale $V\in\mathbb{C}^{N\times N}$ so that it becomes orthonormal and choose an orthonormal wavelet basis, so that the matrix $U=V\Psi^{-1}=V\Psi^{T}$ acts as an isometry on $\mathbb{C}^{N}$ .

Suppose that $U$ is indeed an isometry. To obtain a uniform recovery guarantee for the above system, one typically first shows that the matrix $A=\frac{1}{\sqrt{p}}P_{\Omega}U\in\mathbb{C}^{m\times N}$ , with $p=\frac{m}{N}$ , satisfies the Restricted Isometry Property (RIP) with high probability.

Definition 2.1 (RIP).

Let $1\leq s\leq N$ and $A\in\mathbb{C}^{m\times N}$ . The Restricted Isometry Constant (RIC) of order $s$ is the smallest $\delta\geq 0$ such that

$$(1-\delta)\|x\|_{2}^{2}\leq\|Ax\|_{2}^{2}\leq(1+\delta)\|x\|_{2}^{2}\quad\forall x\in\Sigma_{s},$$

where $\Sigma_{s}$ denotes the set of $s$ -sparse vectors in $\mathbb{C}^{N}$ . If $0\leq\delta<1$ we say that $A$ has the Restricted Isometry Property (RIP) of order $s$ .

Theorem 2.2 ([16, Thm. 6.12]).

Suppose the RIC $\delta_{2s}$ of a matrix $A\in\mathbb{C}^{m\times N}$ satisfies $\delta_{2s}<4/\sqrt{41}\approx 0.62$ . Then for any $x\in\mathbb{C}^{N}$ and $e\in\mathbb{C}^{m}$ with $\|e\|_{2}\leq\eta$ , any solution $\hat{x}\in\mathbb{C}^{N}$ of

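$$\operatorname*{minimize}_{z\in\mathbb{C}^{N}}\ \|z\|_{1}\quad\text{subject to}\quad\|Az-(Ax+e)\|_{2}\leq\eta$$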

satisfies

$$\|x-\hat{x}\|_{2}\leq\frac{C}{\sqrt{s}}\sigma_{s}(x)_{1}+D\eta$$

where $C,D>0$ are constants depending on $\delta_{2s}$ only and $\sigma_{s}(x)_{1}=\inf\{\|x-z\|_{1}:z\in\Sigma_{s}\}$ .

For an isometry $U\in\mathbb{C}^{N\times N}$ the question of whether or not $P_{\Omega}U$ satisfies the RIP is related to the so-called coherence of $U$ :

Definition 2.3 (Coherence).

Let $U\in\mathbb{C}^{N\times N}$ be an isometry. The coherence of $U$ is

$$\mu(U)=\max_{i,j=1,\ldots,N}|U_{ij}|^{2}\in[N^{-1},1].$$

Theorem 2.4 ([16, Thm. 12.32]).

Let $U\in\mathbb{C}^{N\times N}$ be an isometry and let $0<\delta,\epsilon<1$ . Suppose $\Omega=\{t_{1},\ldots,t_{m}\}\subseteq\{1,\ldots,N\}$ where each $t_{k}$ is chosen uniformly and independently at random from the set $\{1,\ldots,N\}$ . If

$$m\gtrsim\delta^{-2}\cdot s\cdot N\cdot\mu(U)\cdot\left(\log(2m)\log(2N)\log^{2}(2s)+\log(\epsilon^{-1})\right)$$

then with probability at least $1-\epsilon$ the matrix $A=\tfrac{1}{\sqrt{p}}P_{\Omega}U\in\mathbb{C}^{m\times N}$ , with $p=\tfrac{m}{N}$ , satisfies the RIP of order $s$ with $\delta_{s}\leq\delta$ .

(We slightly abuse notation here in that we allow for possible repeats of the values $t_{i}$ that make up $\Omega$ ). Thus if the coherence $\mu(U)\approx N^{-1}$ we obtain the RIP of order $s$ using approximately $s$ measurements up to constants and log factors.

There are, however, two problems with this approach. First, in our setup, where $U=V\Psi^{T}$ is the product of a Fourier or Hadamard matrix and a discrete wavelet transform, the coherence $\mu(U)\approx 1$ . Hence satisfying the RIP requires at least $m\approx N$ measurements. Second, the RIP asserts recovery for all $s$ -sparse vectors of wavelet coefficients, and thus does not exploit any additional structure these coefficients possess. However, as stated, wavelet coefficients are highly structured: large wavelet coefficients tend to cluster at coarse scales, with coefficients at fine scales being increasingly sparse.
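The coherence barrier described above can be checked numerically on a small example. The following is a minimal sketch, assuming a unitary DFT matrix and an orthonormal discrete Haar matrix built by the hypothetical helpers `dft_matrix` and `haar_matrix` (neither is taken from the paper). The sign convention and row ordering of the DFT differ from the $V_{\text{Four}}$ used later in this section, but the coherence is unaffected by row permutations and complex conjugation.

```python
import cmath
import math


def dft_matrix(n):
    """Unitary DFT matrix: entries exp(-2*pi*i*j*k/n) / sqrt(n)."""
    scale = 1.0 / math.sqrt(n)
    return [[scale * cmath.exp(-2j * math.pi * j * k / n) for k in range(n)]
            for j in range(n)]


def haar_matrix(n):
    """Orthonormal discrete Haar matrix (rows are basis vectors); n must be a power of two."""
    rows = [[1.0 / math.sqrt(n)] * n]  # scaling function: a constant vector
    length = n
    while length >= 2:
        half, amp = length // 2, 1.0 / math.sqrt(length)
        for start in range(0, n, length):
            row = [0.0] * n
            row[start:start + half] = [amp] * half
            row[start + half:start + length] = [-amp] * half
            rows.append(row)
        length //= 2
    return rows


def coherence(u):
    """mu(U) = max over all entries of |U_ij|^2."""
    return max(abs(entry) ** 2 for row in u for entry in row)


def fourier_haar(n):
    """U = V Psi^T, i.e. U_ij = sum_t V[i][t] * Psi[j][t] (Psi is real)."""
    v, psi = dft_matrix(n), haar_matrix(n)
    return [[sum(v[i][t] * psi[j][t] for t in range(n)) for j in range(n)]
            for i in range(n)]


n = 16
mu_fourier = coherence(dft_matrix(n))        # incoherent case
mu_fourier_haar = coherence(fourier_haar(n))  # coherent case
print(mu_fourier * n, mu_fourier_haar)
```

Up to rounding, the DFT alone attains the lower bound $\mu=N^{-1}$, while the Fourier-Haar product attains $\mu=1$: the zero-frequency row of the DFT and the Haar scaling vector are the same constant unit vector, so their inner product has modulus one.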

Motivated by this, the following structured sparsity model was introduced in [3]:

Definition 2.5 (Sparsity in levels).

Let $\mathbf{M}=[M_{1},\ldots,M_{r}]\in\mathbb{N}^{r}$ , $M_{0}=0$ , with $1\leq M_{1}<\cdots<M_{r}=M$ and let $\boldsymbol{s}=(s_{1},\ldots,s_{r})\in\mathbb{N}^{r}$ with $s_{l}\leq M_{l}-M_{l-1}$ , for $l=1,\ldots,r$ . We say that the vector $x\in\mathbb{C}^{M}$ is sparse in levels if

$$|\textnormal{supp}(x)\cap\{M_{l-1}+1,\ldots,M_{l}\}|\leq s_{l}\quad\text{for } l=1,\ldots,r.$$

In this case we call $x$ $(\boldsymbol{s},\mathbf{M})$ -sparse, where $\boldsymbol{s}$ and $\mathbf{M}$ are called the local sparsities and sparsity levels, respectively. We denote the set of all $(\boldsymbol{s},\mathbf{M})$ -sparse vectors by $\Sigma_{\boldsymbol{s},\mathbf{M}}$ .
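To make Definition 2.5 concrete, here is a minimal sketch that counts the non-zero entries of a vector inside each level and compares them with the local sparsities. The function names are illustrative and not from the paper; the vector is stored 0-based, so level $l$ corresponds to the slice `x[M_{l-1}:M_l]`.

```python
def local_support_sizes(x, levels):
    """Number of non-zeros of x in each level {M_{l-1}+1, ..., M_l}, with M_0 = 0."""
    counts, prev = [], 0
    for m_l in levels:
        counts.append(sum(1 for value in x[prev:m_l] if value != 0))
        prev = m_l
    return counts


def is_sparse_in_levels(x, local_sparsities, levels):
    """True if x is (s, M)-sparse in the sense of Definition 2.5."""
    if len(x) != levels[-1] or len(local_sparsities) != len(levels):
        raise ValueError("x must have length M_r, and s and M must have equal length")
    counts = local_support_sizes(x, levels)
    return all(c <= s_l for c, s_l in zip(counts, local_sparsities))


levels = [2, 4, 8]   # M = [M_1, M_2, M_3]
s = [2, 1, 1]        # local sparsities, total sparsity 4
x_good = [3, 1, 0, 2, 0, 0, 0, 5]   # supports per level: 2, 1, 1
x_bad = [3, 1, 4, 2, 0, 0, 0, 0]    # supports per level: 2, 2, 0
print(is_sparse_in_levels(x_good, s, levels), is_sparse_in_levels(x_bad, s, levels))
```

Both vectors have four non-zero entries, so both are $4$-sparse in the classical sense, but only the first respects the per-level budget. This is the extra structure that the model encodes.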

As noted above, randomly subsampling an isometry $U$ is a poor measurement protocol for coherent problems such as Fourier–Wavelets. Instead, in [3] it was proposed to sample in the following structured way:

Definition 2.6 (Multilevel random subsampling).

Let $\mathbf{N}=[N_{1},\ldots,N_{r}]\in\mathbb{N}^{r}$ , where $1\leq N_{1}<\cdots<N_{r}=N$ and $\boldsymbol{m}=(m_{1},\ldots,m_{r})\in\mathbb{N}^{r}$ with $m_{k}\leq N_{k}-N_{k-1}$ for $k=1,\ldots,r$ , and $N_{0}=0$ . For each $k=1,\ldots,r$ , let $\Omega_{k}=\{N_{k-1}+1,\ldots,N_{k}\}$ if $m_{k}=N_{k}-N_{k-1}$ and if not, let $t_{k,1},\ldots,t_{k,m_{k}}$ be chosen uniformly and independently from the set $\{N_{k-1}+1,\ldots,N_{k}\}$ , and set $\Omega_{k}=\{t_{k,1},\ldots,t_{k,m_{k}}\}$ . If $\Omega=\Omega_{\mathbf{N},\boldsymbol{m}}=\Omega_{1}\cup\cdots\cup\Omega_{r}$ we refer to $\Omega$ as an $(\mathbf{N},\boldsymbol{m})$ -multilevel subsampling scheme.
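Definition 2.6 translates almost line by line into code. The sketch below is one possible reading, with illustrative names: saturated levels are taken in full, and otherwise indices are drawn uniformly and independently, so repeats may occur within a level. Indices are 1-based to match the text.

```python
import random


def multilevel_sampling(levels, local_samples, rng=None):
    """Return [Omega_1, ..., Omega_r] for an (N, m)-multilevel random subsampling scheme."""
    if rng is None:
        rng = random.Random()
    omegas, prev = [], 0
    for n_k, m_k in zip(levels, local_samples):
        size = n_k - prev
        if m_k > size:
            raise ValueError("m_k must not exceed N_k - N_{k-1}")
        if m_k == size:
            # Saturated level: take every index in {N_{k-1}+1, ..., N_k}.
            omega_k = list(range(prev + 1, n_k + 1))
        else:
            # Independent uniform draws (with possible repeats) from the level.
            omega_k = [rng.randint(prev + 1, n_k) for _ in range(m_k)]
        omegas.append(omega_k)
        prev = n_k
    return omegas


# Dyadic levels N = [2, 4, 8, 16]; the first two levels are fully sampled.
omegas = multilevel_sampling([2, 4, 8, 16], [2, 2, 3, 4], rng=random.Random(0))
omega = sorted(t for omega_k in omegas for t in omega_k)
print(omegas)
```

Passing an explicit `random.Random` instance keeps the draw reproducible. Sampling without replacement would be a different scheme from the one defined above.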

For this structured model, the following extension of the RIP was first introduced in [7].

Definition 2.7 (RIPL).

Let $\boldsymbol{s},\mathbf{M}\in\mathbb{N}^{r}$ be given local sparsities and sparsity levels, respectively. For a matrix $A\in\mathbb{C}^{m\times N}$ the Restricted Isometry Constant in Levels (RICL) of order $(\boldsymbol{s},\mathbf{M})$ , denoted $\delta_{\boldsymbol{s},\mathbf{M}}$ , is the smallest $\delta\geq 0$ such that

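$$(1-\delta)\|x\|_{2}^{2}\leq\|Ax\|_{2}^{2}\leq(1+\delta)\|x\|_{2}^{2}\quad\forall x\in\Sigma_{\boldsymbol{s},\mathbf{M}}.$$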

We say that $A$ has the Restricted Isometry Property in Levels (RIPL) if $0\leq\delta<1$ .

We shall see that this leads to uniform recovery of all $(\boldsymbol{s},\mathbf{M})$ -sparse vectors, but first we define the best $(\boldsymbol{s},\mathbf{M})$ -term approximation error of $x\in\mathbb{C}^{N}$ . That is

$$\sigma_{\boldsymbol{s},\mathbf{M}}(x)_{p}:=\inf\{\|x-z\|_{p}:z\in\Sigma_{\boldsymbol{s},\mathbf{M}}\}.$$
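For finite $p\geq 1$ the infimum above is attained by keeping, within each level, the $s_{l}$ entries of largest magnitude and discarding the rest, since the $\ell^{p}$ norm separates across levels. A minimal sketch under that observation follows; the function name is illustrative, and the vector is assumed to have length $M_{r}$.

```python
def best_approx_error_in_levels(x, local_sparsities, levels, p=1):
    """sigma_{s,M}(x)_p: l^p norm of what remains after keeping the s_l largest entries per level."""
    total, prev = 0.0, 0
    for s_l, m_l in zip(local_sparsities, levels):
        magnitudes = sorted((abs(value) for value in x[prev:m_l]), reverse=True)
        total += sum(a ** p for a in magnitudes[s_l:])  # discarded tail of this level
        prev = m_l
    return total ** (1.0 / p)


x = [5, 1, 0.5, 3, 0.2, 0.1, 0, 2]
levels = [2, 4, 8]
s = [1, 1, 1]
# Discarded tails: [1] in level 1, [0.5] in level 2, [0.2, 0.1, 0] in level 3.
sigma_1 = best_approx_error_in_levels(x, s, levels, p=1)
print(sigma_1)
```

The error vanishes exactly when the vector is $(\boldsymbol{s},\mathbf{M})$-sparse, which is why it measures how far a compressible vector is from the model.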

Theorem 2.8 ([7, Thm. 4.4]).

Let $\boldsymbol{s},\mathbf{M}\in\mathbb{N}^{r}$ be local sparsities and sparsity levels, respectively. Let $\alpha_{\boldsymbol{s},\mathbf{M}}=\max_{k,l=1,\ldots,r}s_{l}/s_{k}$ and $s=s_{1}+\cdots+s_{r}$ . Suppose that the RICL $\delta_{2\boldsymbol{s},\mathbf{M}}\geq 0$ for the matrix $A\in\mathbb{C}^{m\times M}$ satisfies

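$$\delta_{2\boldsymbol{s},\mathbf{M}}<\frac{1}{\sqrt{r\left(\sqrt{\alpha_{\boldsymbol{s},\mathbf{M}}}+\frac{1}{4}\right)^{2}+1}}.$$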

Then, for $x\in\mathbb{C}^{M}$ and $e\in\mathbb{C}^{m}$ with $\|e\|_{2}\leq\eta$ , any solution $\hat{x}$ of

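$$\operatorname*{minimize}_{z\in\mathbb{C}^{M}}\ \|z\|_{1}\quad\text{subject to}\quad\|Az-(Ax+e)\|_{2}\leq\eta$$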

satisfies

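$$\|x-\hat{x}\|_{2}\leq\left(C+C^{\prime}(r\alpha_{\boldsymbol{s},\mathbf{M}})^{1/4}\right)\frac{\sigma_{\boldsymbol{s},\mathbf{M}}(x)_{1}}{\sqrt{s}}+\left(D+D^{\prime}(r\alpha_{\boldsymbol{s},\mathbf{M}})^{1/4}\right)\eta$$

where $C,C^{\prime},D,D^{\prime}>0$ are constants which depend only on $\delta_{2\boldsymbol{s},\mathbf{M}}$ .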

In [23] the authors investigated conditions under which a subsampled isometry $U\in\mathbb{C}^{N\times N}$ satisfies the RIPL. It was shown that the number of samples required to satisfy the RIPL was related to the so-called local coherence properties of $U$ :

Definition 2.9.

Let $U\in\mathbb{C}^{N\times N}$ be an isometry and $\mathbf{N},\mathbf{M}\in\mathbb{N}^{r}$ be given sampling and sparsity levels. The local coherence of $U$ is

$$\mu_{k,l}=\mu_{k,l}(\mathbf{N},\mathbf{M})=\max\{|U_{ij}|^{2}:i=N_{k-1}+1,\ldots,N_{k},\ j=M_{l-1}+1,\ldots,M_{l}\}.$$
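The local coherences are simply block-wise maxima of $|U_{ij}|^{2}$. The sketch below computes the full table $(\mu_{k,l})$ for a matrix stored as a list of rows; the $4\times 4$ matrix used to exercise it is a toy isometry chosen for illustration, not one of the matrices studied in the paper.

```python
import math


def local_coherences(u, sampling_levels, sparsity_levels):
    """mu[k][l] = max |U_ij|^2 over rows in sampling level k and columns in sparsity level l."""
    mu, row_prev = [], 0
    for n_k in sampling_levels:
        mu_row, col_prev = [], 0
        for m_l in sparsity_levels:
            mu_row.append(max(abs(u[i][j]) ** 2
                              for i in range(row_prev, n_k)
                              for j in range(col_prev, m_l)))
            col_prev = m_l
        mu.append(mu_row)
        row_prev = n_k
    return mu


a = 1.0 / math.sqrt(2.0)
toy_u = [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, a, a],
         [0, 0, a, -a]]
mu = local_coherences(toy_u, sampling_levels=[2, 4], sparsity_levels=[2, 4])
print(mu)
```

For this toy matrix the global coherence is $1$, yet the off-diagonal local coherences vanish and $\mu_{2,2}=1/2$. This is the kind of block structure that a level-dependent sampling condition can exploit.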

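Theorem 2.10 ([23, Thm. 3.2]).

Let $U\in\mathbb{C}^{N\times N}$ be an isometry. Let $r\in\mathbb{N}$ , $0<\delta,\epsilon<1$ , and $0\leq r_{0}\leq r$ . Let $\Omega=\Omega_{\mathbf{N},\boldsymbol{m}}$ be an $(\mathbf{N},\boldsymbol{m})$ -multilevel random subsampling scheme. Let $\tilde{m}=m_{r_{0}+1}+\ldots+m_{r}$ and $s=s_{1}+\ldots+s_{r}$ . Suppose that the $m_{k}$ satisfy

$$m_{k}=N_{k}-N_{k-1},\quad\text{for } k=1,\ldots,r_{0},\tag{2.3}$$

and

$$m_{k}\gtrsim\delta^{-2}\cdot(N_{k}-N_{k-1})\cdot\bigg(\sum_{l=1}^{r}s_{l}\mu_{k,l}\bigg)\cdot\left(r\log(2\tilde{m})\log(2N)\log^{2}(2s)+\log(\epsilon^{-1})\right)\tag{2.4}$$

for $k=r_{0}+1,\ldots,r$ . Then the matrix

$$A=\begin{bmatrix}\frac{1}{\sqrt{p_{1}}}P_{\Omega_{1}}U\\ \vdots\\ \frac{1}{\sqrt{p_{r}}}P_{\Omega_{r}}U\end{bmatrix}\quad\text{where } p_{k}=\frac{m_{k}}{N_{k}-N_{k-1}}\text{ for } k=1,\ldots,r\tag{2.5}$$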

satisfies the RIPL of order $(\boldsymbol{s},\mathbf{M})$ with constant $\delta_{\boldsymbol{s},\mathbf{M}}\leq\delta$ .

This theorem characterizes the number of local measurements $m_{k}$ needed to ensure uniform recovery explicitly in terms of local sparsities $s_{k}$ and local coherences $\mu_{k,l}$ . In particular, if the local coherences are suitably well-behaved, then recovery may still be possible from highly subsampled measurements, even though the global coherence may be high (see next). Note that the condition (2.3), whereby the first $r_{0}$ sampling levels are saturated, models practical imaging scenarios where the low Fourier frequencies are typically fully sampled.

To illustrate this theorem, in [4] the authors consider the one-dimensional discrete Fourier sampling problem with sparsity in Haar wavelets. For the Haar wavelet basis we choose an ordering where the first level $\{M_{0}+1,M_{1}\}=\{1,2\}$ consists of the scaling function and mother wavelet and the subsequent levels are chosen so that $\{M_{l-1}+1,\ldots,M_{l}\}=\{2^{l-1}+1,\ldots,2^{l}\}$ consists of the wavelets at scale $l-1$ . This gives the sparsity levels

$$\mathbf{M}=[2^{1},2^{2},\ldots,2^{r}],$$

where $r=\log_{2}(N)$ (assumed to be an integer). Next we define the entries in the Fourier matrix $V_{\text{Four}}\in\mathbb{C}^{N\times N}$ as

$$(V_{\text{Four}})_{\omega=-N/2+1,\ j=1}^{N/2,\ N}=\frac{1}{\sqrt{N}}\exp(2\pi\mathrm{i}(j-1)\omega/N),$$

where we have started the ordering of the rows with negative indices for convenience. We define the sampling levels for the frequencies $\omega$ in dyadic bands with $W_{1}=\{0,1\}$ and

$$W_{k+1}=\{-2^{k}+1,\ldots,-2^{k-1}\}\cup\{2^{k-1}+1,\ldots,2^{k}\},\quad k=1,\ldots,r-1.$$

Notice that for a suitable reordering of the rows of $V_{\text{Four}}$ these bands correspond to the sampling levels $\mathbf{N}=[2^{1},2^{2},\ldots,2^{r}]$ .

Theorem 2.11 ([23, Cor. 3.3]).

Let $N=2^{r}$ for some $r\geq 1$ and let $U=V_{\text{Four}}\Psi^{-1}\in\mathbb{C}^{N\times N}$ , where $\Psi$ is the Haar wavelet matrix. Let $0<\delta,\epsilon<1$ and let $\mathbf{N}=\mathbf{M}=[2^{1},\ldots,2^{r}]$ . Let $m=m_{1}+\cdots+m_{r}$ and $s=s_{1}+\cdots+s_{r}$ . For each $k=1,\ldots,r$ suppose we draw $m_{k}$ Fourier samples from band $W_{k}$ randomly and independently, where

$$m_{k}\gtrsim\delta^{-2}\cdot\bigg(\sum_{l=1}^{r}2^{-|k-l|}s_{l}\bigg)\left(r\log(2m)\log(2N)\log^{2}(2s)+\log(\epsilon^{-1})\right).$$

Then with probability at least $1-\epsilon$ the matrix (2.5) satisfies the RIPL with constant $\delta_{\boldsymbol{s},\mathbf{M}}\leq\delta$ .

Here, for convenience, we have taken $r_{0}=0$ ; see [23] for further discussion on this point.
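The sparsity-dependent factor in Theorem 2.11 is easy to tabulate. The sketch below evaluates only $\sum_{l=1}^{r}2^{-|k-l|}s_{l}$ for each sampling level $k$. It deliberately omits $\delta^{-2}$, the log factors and the unspecified constant hidden in $\gtrsim$, so its output is a relative weighting across levels and not an actual sample count. The function name is illustrative.

```python
def sparsity_weights(local_sparsities):
    """For each level k, return sum_l 2^{-|k-l|} * s_l (the s-dependent factor in the bound)."""
    r = len(local_sparsities)
    return [sum(2.0 ** (-abs(k - l)) * s_l for l, s_l in enumerate(local_sparsities))
            for k in range(r)]


s = [1, 2, 4]
weights = sparsity_weights(s)
# k=1: 1 + 2/2 + 4/4,  k=2: 1/2 + 2 + 4/2,  k=3: 1/4 + 2/2 + 4
print(weights)
```

Each level is dominated by its own local sparsity $s_{k}$, with exponentially decaying contributions from the other levels. Summing the weights over $k$ therefore gives a total proportional to $s$ up to a constant factor, since $\sum_{k}2^{-|k-l|}\leq 3$ for every $l$.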

2.3 Shortcomings

These results have two primary shortcomings, which we now discuss in further detail. The key issue is that they are limited to finite dimensions. As noted in Section 1, applying finite-dimensional recovery procedures to analog problems can result in artefacts. For simplicity, let $N=2^{p}$ . We have argued that analog signals should be modelled as elements in $L^{2}([0,1))$ , rather than $\mathbb{C}^{N}$ . Yet, above we have tried to use discrete tools for recovering the signal $f\in L^{2}([0,1))$ by replacing $\mathcal{W}f$ and $\mathcal{F}f$ with $V_{\text{Had}}$ and $V_{\text{Four}}$ , respectively. Next we argue that this construction leads to both measurement mismatch and the wavelet crime.

Let $\chi_{[a,b)}$ denote step functions on the interval $[a,b)$ and set $\Delta_{k,p}=[k2^{-p},(k+1)2^{-p})$ . We see that replacing $\mathcal{W}f$ with $V_{\text{Had}}\in\mathbb{C}^{N\times N}$ is equivalent to replacing $f$ by e.g. $\tilde{f}=\sum_{k=0}^{N-1}c_{k}\chi_{\Delta_{k,p}}$ for some $c\in\mathbb{C}^{N}$ , since $\mathcal{W}\tilde{f}=V_{\text{Had}}c$ . Clearly, $\mathcal{W}\tilde{f}$ will be a poor approximation to $\mathcal{W}f$ . We refer to this as measurement mismatch.

Next let $\phi^{0},\phi^{1}$ denote a scaling function and wavelet, respectively, and set $\phi_{j,k}^{s}=2^{j/2}\phi^{s}(2^{j}\cdot-k)$ for $s\in\{0,1\}$ . By construction the solution $\hat{x}$ of (2.1) will be the coefficients of a function $\hat{f}$ written in a basis consisting of both wavelets and scaling functions. Equivalently we can represent $\hat{f}$ in the basis $\{\phi_{j,k}^{0}\}_{k=0}^{N-1}$ using the coefficients $c=\Psi^{-1}\hat{x}\in\mathbb{C}^{N}$ . The wavelet crime is committed whenever we let $c$ represent pointwise samples of $f$ , i.e. $c_{k}=f(k/N)$ .

What does this mean for reconstruction? To illustrate the issue we provide a similar example to the first numerical simulation in [2], showing how finite-dimensional compressed sensing fails to recover even a function that is 1-sparse (meaning it has only one non-zero coefficient) in its wavelet decomposition. Indeed, in Figure 1 we consider the problem of recovering a function $f$ from samples of the continuous Walsh transform. In particular, we choose $f(t)=\phi_{4,4}(t)$ , where $\phi$ is the Daubechies scaling function, corresponding to the wavelet with four vanishing moments. Figure 1 shows the poor performance of CS using the discrete finite-dimensional setup when applied to a continuous problem. Conversely, the infinite-dimensional CS approach, which we develop in the next sections, gives a much higher fidelity reconstruction from exactly the same samples as used in the finite-dimensional case. In fact, the infinite-dimensional CS reconstruction recovers $f$ perfectly up to numerical errors occurring from solving the optimization problem. We also observe the slightly paradoxical phenomenon in the finite-dimensional case: more samples do not improve performance. This is due to the fact that the finite-dimensional CS solution with full sampling coincides with the truncated Walsh series (direct inversion) approximation. This approximation is clearly highly suboptimal, as demonstrated in Figure 1.

We note in passing that the above crimes stem from too early a discretization of the inverse problem. Our infinite-dimensional CS approach replaces $V_{\text{Had}}\Psi^{-1}$ by a finite section of an isometry $U\in\mathcal{B}(\ell^{2}(\mathbb{N}))$ representing the change of basis between the continuous Fourier or Walsh transform and the wavelet basis.

On a related note, even if one were to ignore the above issues, estimating the local coherences $\mu_{k,l}$ in the discrete setting for anything but the Haar wavelet becomes extremely complicated. Conversely, by moving to the continuous setting, these estimates become much easier to derive. We do this later in the paper for arbitrary Daubechies’ wavelets with the Walsh transform.

The second shortcoming relates to Theorem 2.8. It says that we can guarantee recovery of all sparse signals provided the matrix $A\in\mathbb{C}^{m\times M}$ satisfies the RIPL with constant

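$$\delta_{2\boldsymbol{s},\mathbf{M}}<\frac{1}{\sqrt{r\left(\sqrt{\alpha_{\boldsymbol{s},\mathbf{M}}}+\frac{1}{4}\right)^{2}+1}}.$$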

Here $r$ is the number of levels and $\alpha_{\boldsymbol{s},\mathbf{M}}=\max_{k,l=1,\ldots,r}s_{l}/s_{k}$ is the sparsity ratio. Inserting the above inequality into Theorem 2.10 gives a sampling condition of the form

$$m_{k}\gtrsim r\cdot\alpha_{\boldsymbol{s},\mathbf{M}}\cdot(N_{k}-N_{k-1})\cdot\bigg(\sum_{l=1}^{r}\mu_{k,l}s_{l}\bigg)\cdot L$$

where $L$ denotes the log factors. This means that the sparsity ratio $\alpha_{\boldsymbol{s},\mathbf{M}}$ will affect the sampling condition in all sampling levels. Thus for signals where we expect the local sparsities to vary greatly from level to level (e.g. wavelets) this will lead to an unreasonably high number of samples.
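The step from the RICL condition to the factor $r\cdot\alpha_{\boldsymbol{s},\mathbf{M}}$ can be spelled out as follows. This is a sketch, reading the condition of Theorem 2.8 as $\delta_{2\boldsymbol{s},\mathbf{M}}<\big(r(\sqrt{\alpha_{\boldsymbol{s},\mathbf{M}}}+\tfrac14)^{2}+1\big)^{-1/2}$, writing $\alpha=\alpha_{\boldsymbol{s},\mathbf{M}}$, and using $\alpha\geq 1$ (take $k=l$ in the maximum) and $r\geq 1$. Applying Theorem 2.10 at order $(2\boldsymbol{s},\mathbf{M})$ instead of $(\boldsymbol{s},\mathbf{M})$ changes only constants.

```latex
\begin{align*}
\delta_{2\boldsymbol{s},\mathbf{M}}
  < \frac{1}{\sqrt{r\left(\sqrt{\alpha}+\tfrac14\right)^{2}+1}}
\quad &\Longleftrightarrow \quad
\delta_{2\boldsymbol{s},\mathbf{M}}^{-2}
  > r\left(\sqrt{\alpha}+\tfrac14\right)^{2}+1, \\
r\alpha
  \;\leq\; r\left(\sqrt{\alpha}+\tfrac14\right)^{2}+1
  \;&\leq\; \tfrac{25}{16}\,r\alpha+1
  \;\leq\; \tfrac{41}{16}\,r\alpha .
\end{align*}
```

Hence choosing $\delta$ at the threshold makes the factor $\delta^{-2}$ in Theorem 2.10 comparable to $r\alpha$, up to an absolute constant, and this factor multiplies the sampling requirement in every level.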

To overcome this problem, using an idea from [31], we replace the $\ell^{1}$ -regularizer in the optimization problem (2.1) with a weighted $\ell^{1}$ -regularizer. For a suitable choice of weights, this removes the factor of $\alpha_{\boldsymbol{s},\mathbf{M}}$ in the various measurement conditions. As we show, these guarantees are optimal up to constants and log factors.

3 Extensions to infinite dimensions

3.1 Setup

We will continue with the notation we introduced above, extended to infinite dimensions. That is, we assume that the signal $f$ is an element of $L^{2}([0,1))$ . We still let $P_{\Omega}$ denote the projection onto the canonical basis, but we now let it be an element in either $\mathcal{B}(\ell^{2}(\mathbb{N}))$ or $\mathcal{B}(\ell^{2}(\mathbb{N}),\mathbb{C}^{|\Omega|})$ . Similarly we call a vector $x\in\ell^{2}(\mathbb{N})$ $(\boldsymbol{s},\mathbf{M})$ -sparse if $P_{M}x$ is $(\boldsymbol{s},\mathbf{M})$ -sparse and $P_{M}^{\perp}x=0$ . Here $M=M_{r}$ and we refer to it as the sparsity bandwidth of $x$ . For an isometry $U\in\mathcal{B}(\ell^{2}(\mathbb{N}))$ we define the coherence of $U$ as $\mu(U)=\sup\{|U_{ij}|^{2}:i,j\in\mathbb{N}\}$ .

Next we describe the setup for a general sampling basis $B_{\textnormal{sa}}=\allowbreak\{b^{\textnormal{sa}}_{1},b^{\textnormal{sa}}_{2},b^{\textnormal{sa}}_{3},\ldots,\}$ and a sparsifying basis $B_{\textnormal{sp}}=\allowbreak\{b^{\textnormal{sp}}_{1},b^{\textnormal{sp}}_{2},b^{\textnormal{sp}}_{3},\ldots,\}$ , both assumed to be orthonormal bases of $L^{2}([0,1))$ . In Section 4, we will specialize this so that $B_{\textnormal{sa}}$ is the Walsh sampling basis and $B_{\textnormal{sp}}$ is a wavelet sparsifying basis. This will enable us to derive concrete recovery guarantees for $f$ . The setup below is, however, completely general.

For the two bases $B_{\textnormal{sa}}$ and $B_{\textnormal{sp}}$ we can represent $f$ using the coefficients $y=\{\left\langle f,b^{\textnormal{sa}}_{n}\right\rangle\}_{n\in\mathbb{N}}$ and $x=\{\left\langle f,b^{\textnormal{sp}}_{n}\right\rangle\}_{n\in\mathbb{N}}$ , respectively. To change the representation from $B_{\textnormal{sa}}$ to $B_{\textnormal{sp}}$ we define the following matrix.

Definition 3.1.

Let $B_{\textnormal{sa}}=\allowbreak\{b^{\textnormal{sa}}_{1},b^{\textnormal{sa}}_{2},b^{\textnormal{sa}}_{3},\ldots,\}$ and $B_{\textnormal{sp}}=\allowbreak\{b^{\textnormal{sp}}_{1},b^{\textnormal{sp}}_{2},b^{\textnormal{sp}}_{3},\ldots,\}$ be orthonormal bases for $L^{2}([0,1))$ . The change of basis matrix $U\in\mathcal{B}(\ell^{2}(\mathbb{N}))$ between $B_{\textnormal{sa}}$ and $B_{\textnormal{sp}}$ is the infinite matrix with entries

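$$U_{ij}=\left\langle b^{\textnormal{sp}}_{j},b^{\textnormal{sa}}_{i}\right\rangle,\qquad i,j\in\mathbb{N}.$$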

We will denote this matrix by $U=[B_{\textnormal{sa}},B_{\textnormal{sp}}]$ .

Notice in particular that since $B_{\textnormal{sa}}$ and $B_{\textnormal{sp}}$ are orthonormal, $U=[B_{\textnormal{sa}},B_{\textnormal{sp}}]$ is an isometry on $\ell^{2}(\mathbb{N})$ and we can write $y=Ux$ .
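As a finite-dimensional illustration of Definition 3.1, the following Python sketch builds the change of basis matrix between two orthonormal bases of $\mathbb{R}^{n}$ and checks that it is an isometry satisfying $y=Ux$. This is not the infinite-dimensional construction used in the paper: the random bases, the helper names `random_orthonormal_basis` and `change_of_basis`, and the index convention $U_{ij}=\langle b^{\textnormal{sp}}_{j},b^{\textnormal{sa}}_{i}\rangle$ (the one forced by $y=Ux$) are assumptions made for illustration only.

```python
import numpy as np

def random_orthonormal_basis(n, rng):
    """Hypothetical helper: columns of the returned matrix form an orthonormal basis of R^n."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def change_of_basis(B_sa, B_sp):
    """Matrix with entries U[i, j] = <b_sp_j, b_sa_i> (assumed index convention)."""
    return B_sa.conj().T @ B_sp

rng = np.random.default_rng(0)
n = 16
B_sa = random_orthonormal_basis(n, rng)   # stand-in for the sampling basis
B_sp = random_orthonormal_basis(n, rng)   # stand-in for the sparsifying basis
U = change_of_basis(B_sa, B_sp)

f = rng.standard_normal(n)                # the "signal"
x = B_sp.conj().T @ f                     # coefficients in the sparsifying basis
y = B_sa.conj().T @ f                     # coefficients in the sampling basis

print("U is an isometry:", np.allclose(U.conj().T @ U, np.eye(n)))
print("y equals U x:    ", np.allclose(y, U @ x))
```

In the infinite-dimensional setting only finite sections of $U$ can be formed, which is why the truncation issues discussed next arise.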

Next let $\Omega=\Omega_{\boldsymbol{m},\mathbf{N}}$ be a given multilevel random sampling scheme with $|\Omega|=m$ . We refer to $N=N_{r}$ as the sampling bandwidth of $\Omega$ (as discussed later, this will be chosen in terms of the sparsity bandwidth to ensure stable truncation of $U$ ). Now define the matrix

[TABLE]

where we use the slightly unusual notation $\mathbb{C}^{m\times\infty}$ for the operators $\mathcal{B}(\ell^{2}(\mathbb{N}),\mathbb{C}^{m})$ . Due to the scaling factors $1/\sqrt{p_{k}}$ we consider scaled noisy measurements

[TABLE]

where $D$ is a diagonal matrix with the corresponding scaling factors found in $H$ along the diagonal and $e$ is the measurement noise.
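The following sketch illustrates, for a finite section of $U$, how a subsampled and rescaled matrix of this type might be assembled. We assume here that $p_{k}=m_{k}/(N_{k}-N_{k-1})$, that fully sampled levels take every index, and that the remaining levels draw their indices independently and uniformly at random; these choices, the random orthogonal stand-in for $U$, and the function names `multilevel_sample` and `sampling_matrix` are hypothetical.

```python
import numpy as np

def multilevel_sample(N_levels, m, rng):
    """Draw a multilevel sampling pattern (0-based row indices).

    N_levels = (N_1, ..., N_r) are the sampling levels, m = (m_1, ..., m_r)
    the samples per level. Also returns the per-row scaling 1/sqrt(p_k),
    with p_k = m_k / (N_k - N_{k-1}) (assumed definition).
    """
    idx, scale = [], []
    lower = 0
    for N_k, m_k in zip(N_levels, m):
        width = N_k - lower
        if m_k == width:
            chosen = np.arange(lower, N_k)               # fully sampled level
        else:
            chosen = rng.integers(lower, N_k, size=m_k)  # independent uniform draws
        idx.append(chosen)
        scale.append(np.full(m_k, 1.0 / np.sqrt(m_k / width)))
        lower = N_k
    return np.concatenate(idx), np.concatenate(scale)

def sampling_matrix(U, idx, scale):
    """Rows of U selected by idx, each multiplied by its scaling factor."""
    return scale[:, None] * U[idx, :]

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((32, 32)))   # stand-in for a section of U
N_levels, m = (4, 16, 32), (4, 6, 8)
idx, scale = multilevel_sample(N_levels, m, rng)
H = sampling_matrix(U, idx, scale)
print(H.shape)
```

With this scaling, the expectation of $H^{*}H$ over the random draws equals $U^{*}P_{N}U$, which is the reason the factors $1/\sqrt{p_{k}}$ appear.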

Suppose that $x$ is approximately $(\boldsymbol{s},\mathbf{M})$ -sparse with sparsity bandwidth $M$ . It is tempting to form the finite matrix $A=HP_{M}\in\mathbb{C}^{m\times M}$ and solve the minimization problem

[TABLE]

However, note that the truncation of $H$ to $A$ introduces an additional truncation error $HP_{M}^{\perp}x$ . Indeed,

[TABLE]

and this poses a problem since for the above decoder we require $\eta\geq\|HP_{M}^{\perp}x+e\|_{2}$ in order for $P_{M}x$ to be a feasible point. For some applications we might have a rough estimate of $\|e\|_{2}$ , but any estimate of $\|HP_{M}^{\perp}x\|_{2}$ would require a priori knowledge of $x$ , the signal we are trying to recover. This is generally impossible. (We note in passing that there is some recent work [8] which derives CS recovery guarantees in the absence of feasibility of the target vector $P_{M}x$ , but the application of this work to the sparse in levels model is not clear.)

To overcome this issue, we will introduce a data fidelity parameter $K\geq M$ and assume we know $\|e\|_{2}$ so that we can let $\eta>\|e\|_{2}$ . Then there will always exist a $K^{\prime}\geq M$ such that $P_{K}x$ lies in the feasible set $\{z\in\mathbb{C}^{K}:\|Az-\tilde{y}\|_{2}\leq\eta\}$ corresponding to the augmented matrix

[TABLE]

for all $K\geq K^{\prime}$ . In practice (for the general case) it will also be impossible to determine a sufficient value for $K$ , but for fixed $\eta>\|e\|_{2}$ there will always exist such a $K$ . It should, however, be noted that there are special cases, such as Walsh sampling and wavelet recovery, where sufficient values for $K$ are known; see Remark 4.9.

This aside, as previously mentioned, we also now modify the optimization problem to include weights. Specifically, let $\mathbf{M},\boldsymbol{s}\in\mathbb{N}^{r}$ be given sparsity levels and local sparsities respectively. For positive weights $\boldsymbol{\omega}=(\omega_{1},\ldots,\omega_{r+1})$ we define

[TABLE]

with $M_{r+1}=K$ for $x\in\mathbb{C}^{K}$ . Notice that this weighted regularizer assigns constant weights on each sparsity level. With this in hand, our recovery procedure is

[TABLE]

with $A$ as in (3.3) and $\eta\geq\|Ax-\tilde{y}\|_{2}$ .
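For concreteness, the weighted regularizer can be evaluated as in the sketch below. It assumes the form $\|x\|_{1,\boldsymbol{\omega}}=\sum_{l=1}^{r+1}\omega_{l}\|P^{M_{l-1}}_{M_{l}}x\|_{1}$ with $M_{0}=0$ and $M_{r+1}=K$, which is our reading of the description above (a constant weight on each sparsity level); the function name `weighted_l1_norm` is hypothetical.

```python
import numpy as np

def weighted_l1_norm(x, M_levels, weights):
    """Weighted l1 norm with one weight per level (assumed form).

    M_levels = (M_1, ..., M_r); weights = (w_1, ..., w_{r+1}), where the
    last weight applies to the entries between M_r and K = len(x).
    """
    x = np.asarray(x)
    bounds = [0, *M_levels, len(x)]          # M_0 = 0, ..., M_r, M_{r+1} = K
    if len(weights) != len(bounds) - 1:
        raise ValueError("need exactly r + 1 weights")
    total = 0.0
    for l, w in enumerate(weights):
        total += w * np.abs(x[bounds[l]:bounds[l + 1]]).sum()
    return total

x = np.array([1.0, -2.0, 0.0, 3.0, 0.0, 0.0, 4.0, 1.0])   # K = 8
value = weighted_l1_norm(x, M_levels=(2, 6), weights=(1.0, 0.5, 2.0))
print(value)   # 1.0 * 3 + 0.5 * 3 + 2.0 * 5 = 14.5
```

A large final weight $\omega_{r+1}$ penalizes coefficients beyond the sparsity bandwidth $M$, which is the role this weight plays in the recovery guarantees.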

3.2 The balancing property

We now discuss the relation between the sampling and sparsity bandwidths $N$ and $M$ . From generalized sampling theory [2] we know that we must choose $N\geq M$ to obtain a stable mapping between the first $N$ sampling basis functions and the first $M$ sparsity basis functions. The degree of stability for this solution will depend on the so-called balancing property:

Definition 3.2.

Let $U\colon\ell^{2}(\mathbb{N})\to\ell^{2}(\mathbb{N})$ be an isometry. Let $0<\theta<1$ and $N\geq M\geq 1$ . Then $U$ has the balancing property with constant $\theta$ if

[TABLE]

Note that the balancing property need not hold for every $N\geq M$ . However, it always holds for sufficiently large $N$ (for fixed $M$ ). Indeed, $P_{M}U^{*}P_{N}UP_{M}\rightarrow P_{M}U^{*}UP_{M}\equiv P_{M}$ in the operator norm, hence the balancing property holds with $\theta$ arbitrarily close to $1$ for large enough $N$ .
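The convergence $P_{M}U^{*}P_{N}UP_{M}\rightarrow P_{M}$ can be observed numerically on a finite stand-in for $U$. The sketch below uses a random orthogonal matrix (an assumption; it is not the Walsh-wavelet matrix studied later) and tracks the deviation $\|P_{M}U^{*}P_{N}UP_{M}-P_{M}\|_{2}$ as $N$ grows; the function name `balancing_deviation` is hypothetical.

```python
import numpy as np

def balancing_deviation(U, N, M):
    """Spectral norm of P_M U^* P_N U P_M - P_M for a finite matrix U."""
    section = U[:N, :M]                       # the finite section P_N U P_M
    return np.linalg.norm(section.conj().T @ section - np.eye(M), 2)

rng = np.random.default_rng(1)
n, M = 64, 8
U, _ = np.linalg.qr(rng.standard_normal((n, n)))   # stand-in isometry
deviations = [balancing_deviation(U, N, M) for N in range(M, n + 1)]

# The deviation is non-increasing in N and vanishes once N reaches n.
print(deviations[0], deviations[len(deviations) // 2], deviations[-1])
```

For a genuinely infinite matrix the deviation never reaches zero exactly, but it can be made as small as desired by increasing $N$, which is all the argument above requires.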

Below we shall see that this property will also affect our recovery guarantees, but it will be camouflaged as the quantity $\|G^{-1}\|_{2}$ , where $G=\sqrt{P_{M}U^{*}P_{N}UP_{M}}$ . This gives the following relation.

Lemma 3.3.

Let $U\in\mathcal{B}(\ell^{2}(\mathbb{N}))$ be an isometry satisfying the balancing property of order $0<\theta<1$ for $M,N\in\mathbb{N}$ . Let $G=\sqrt{P_{M}U^{*}P_{N}UP_{M}}$ be self-adjoint and nonnegative definite. Then $G$ is invertible and

[TABLE]

3.3 $\boldsymbol{G}$ -adjusted Restricted Isometry Property in Levels (G-RIPL)

Our theoretical analysis requires a RIP-type property for the matrix $HP_{M}$ . However, as implied in the previous discussion, the finite matrix $P_{N}UP_{M}\in\mathbb{C}^{N\times M}$ (from which $AP_{M}$ is constructed) is not an isometry for any $N\geq M$ . In particular, unlike in finite dimensions $\mathbb{E}(P_{M}H^{*}HP_{M})=P_{M}U^{*}P_{N}UP_{M}=G^{2}$ is not the identity. In order to handle this situation, we introduce the following generalization of the RIP:

Definition 3.4 (G-RIPL).

Let $A\in\mathbb{C}^{m\times M}$ , $G\in\mathbb{C}^{M\times M}$ be invertible, ${\mathbf{M}}=(M_{1},\ldots,M_{r})$ be sparsity levels and ${\mathbf{s}}=(s_{1},\ldots,s_{r})$ be local sparsities. The ${\mathbf{s}}^{\mathrm{th}}$ $G$ -adjusted Restricted Isometry Constant in Levels (G-RICL) $\delta_{{\mathbf{s}},{\mathbf{M}}}$ is the smallest $\delta\geq 0$ such that

[TABLE]

If $0<\delta_{{\mathbf{s}},{\mathbf{M}}}<1$ we say that the matrix $A$ satisfies the $G$ -adjusted Restricted Isometry Property in Levels (G-RIPL) of order $({\mathbf{s}},{\mathbf{M}})$ .

The G-RIPL is of course completely general and can be stated for any $G$ . However, in the following we will let $G=\sqrt{P_{M}U^{*}P_{N}UP_{M}}$ and show that the matrix $A=HP_{K}$ (or equivalently, $HP_{M}$ – note that $\Sigma_{\boldsymbol{s},\mathbf{M}}$ consists of vectors $z$ with $P^{\perp}_{M}z=0$ ) satisfies the G-RIPL for this particular $G$ .

First, however, we show that the G-RIPL implies uniform recovery. For this, we introduce the following notation:

[TABLE]

Notice in particular that for the choice $\boldsymbol{\omega}=(1,\ldots,1,\omega_{r+1})$ we have $S_{\boldsymbol{\omega},\boldsymbol{s}}=s_{1}+\ldots+s_{r}$ and for the choice $\boldsymbol{\omega}=(s^{-1/2}_{1},\ldots,s^{-1/2}_{r},\omega_{r+1})$ we have $S_{\boldsymbol{\omega},\boldsymbol{s}}=r$ . Finally, we let $\kappa(G)=\|G\|_{2}\|G^{-1}\|_{2}$ denote the condition number of $G$ .
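The two special cases above can be checked with a few lines of Python. The sketch assumes $S_{\boldsymbol{\omega},\boldsymbol{s}}=\sum_{l=1}^{r}\omega_{l}^{2}s_{l}$, a form that is consistent with both stated values but is our assumption here; the function names `S_omega_s` and `condition_number` are hypothetical.

```python
import numpy as np

def S_omega_s(weights, s):
    """Assumed form S = sum_{l=1}^r w_l^2 s_l; the weight w_{r+1} is not used."""
    w = np.asarray(weights[:len(s)], dtype=float)
    return float(np.sum(w**2 * np.asarray(s, dtype=float)))

def condition_number(G):
    """kappa(G) = ||G||_2 ||G^{-1}||_2, computed from the singular values."""
    sv = np.linalg.svd(G, compute_uv=False)
    return sv[0] / sv[-1]

s = (2, 8, 32)
unweighted = (1.0, 1.0, 1.0, 5.0)                     # last entry plays the role of w_{r+1}
weighted = tuple(s_l ** -0.5 for s_l in s) + (5.0,)

print(S_omega_s(unweighted, s))                       # s_1 + s_2 + s_3
print(S_omega_s(weighted, s))                         # r, up to rounding
print(condition_number(np.diag([2.0, 1.0, 0.5])))
```

The weighted choice replaces the total sparsity $s_{1}+\ldots+s_{r}$ by the number of levels $r$, which is typically much smaller.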

Theorem 3.5.

Let $A\in\mathbb{C}^{m\times K}$ , $G\in\mathbb{C}^{M\times M}$ with $K\geq M$ and let $\mathbf{M},\boldsymbol{s}\in\mathbb{N}^{r}$ be given sparsity levels and local sparsities, respectively. Let $\boldsymbol{\omega}\in\mathbb{R}^{r+1}$ be positive weights. Suppose $AP_{M}$ satisfies the G-RIPL of order $(\boldsymbol{t},\mathbf{M})$ with constant $\delta_{\boldsymbol{t},\mathbf{M}}\leq 1/2$ and

[TABLE]

Let

[TABLE]

Let $\eta\geq 0$ , $x\in\mathbb{C}^{K}$ , $e\in\mathbb{C}^{m}$ with $\|e\|_{2}\leq\eta$ and set $y=Ax+e$ . Then any solution $\hat{x}$ of the optimization problem

[TABLE]

satisfies

[TABLE]

where $C=2(2+\sqrt{3})/(2-\sqrt{3})$ , $D=8\sqrt{2}/(2-\sqrt{3})$ and $\sigma_{\boldsymbol{s},\mathbf{M}}(x)_{1,\boldsymbol{\omega}}=\inf\{\|x-z\|_{1,\boldsymbol{\omega}}:z\in\Sigma_{\boldsymbol{s},\mathbf{M}}\}$ .
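The best approximation error $\sigma_{\boldsymbol{s},\mathbf{M}}(x)_{1,\boldsymbol{\omega}}$ is straightforward to evaluate when the weights are constant on each level: within level $l$ one keeps the $s_{l}$ largest entries in absolute value, and every entry beyond index $M$ counts towards the error. The sketch below assumes the weighted norm $\|x\|_{1,\boldsymbol{\omega}}=\sum_{l=1}^{r+1}\omega_{l}\|P^{M_{l-1}}_{M_{l}}x\|_{1}$; the function name `best_approx_error` is hypothetical.

```python
import numpy as np

def best_approx_error(x, M_levels, s, weights):
    """Weighted l1 distance from x to the (s, M)-sparse vectors (sketch).

    Keeps the s_l largest magnitudes in level l = 1, ..., r and nothing in
    the tail block between M_r and K = len(x).
    """
    x = np.asarray(x)
    bounds = [0, *M_levels, len(x)]
    error = 0.0
    for l, w in enumerate(weights):
        mags = np.sort(np.abs(x[bounds[l]:bounds[l + 1]]))[::-1]   # descending
        keep = s[l] if l < len(s) else 0
        error += w * mags[keep:].sum()
    return error

x = np.array([5.0, 1.0, 0.5, 3.0, 2.0, 1.0, 0.0, 0.25])   # K = 8, M = 7
err = best_approx_error(x, M_levels=(3, 7), s=(1, 2), weights=(1.0, 0.5, 2.0))
print(err)   # 1.0 * 1.5 + 0.5 * 1.0 + 2.0 * 0.25 = 2.5
```

For an exactly $(\boldsymbol{s},\mathbf{M})$ -sparse vector this quantity is zero, so only the noise term remains in the error bounds.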

Notice that the condition on $\delta$ in the above theorem is fundamentally different from the condition found in Theorem 2.8. In the latter one requires $\delta_{2\boldsymbol{s},\mathbf{M}}<(r(\sqrt{\alpha_{\boldsymbol{s},\mathbf{M}}}+\tfrac{1}{4})^{2}+1)^{-1/2}$ where $\alpha_{\boldsymbol{s},\mathbf{M}}=\max_{k,l=1,\ldots,r}s_{k}/s_{l}$ is the sparsity ratio. Thus for sparsity levels where the local sparsities vary greatly, this bound will be unreasonably small.

In the above theorem we have removed this sparsity ratio term, by setting $\delta=1/2$ , and require $\delta_{\boldsymbol{t},\mathbf{M}}\leq\delta$ where $t_{l}\geq 2\left\lceil 4\kappa(G)^{2}S_{\boldsymbol{\omega},\boldsymbol{s}}\omega_{l}^{-2}\right\rceil$ . For the unweighted case this leads to a condition of the form

[TABLE]

which could be difficult to fulfill in practice, since each $t_{l}$ would have to be greater than the total sparsity of the signal. However, by considering the weights $\boldsymbol{\omega}=(s_{1}^{-1/2},\ldots,s_{r}^{-1/2},\omega_{r+1})$ we obtain a condition of the form

[TABLE]

where $t_{l}$ is independent of $s_{k}$ for $k\neq l$ . This means that we can write the requirement as $\delta_{2\left\lceil 4\kappa(G)^{2}r\boldsymbol{s}\right\rceil,\mathbf{M}}\leq 1/2$ , and ignore any dependence between the $\boldsymbol{s}$ -values, as was the problem in Theorem 2.8.
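To make the difference between the two choices of weights concrete, the following sketch evaluates $t_{l}=\min\{M_{l}-M_{l-1},\,2\lceil 4\kappa(G)^{2}S_{\boldsymbol{\omega},\boldsymbol{s}}\omega_{l}^{-2}\rceil\}$ for a small example. The cap at the level width, the form $S_{\boldsymbol{\omega},\boldsymbol{s}}=\sum_{l}\omega_{l}^{2}s_{l}$, the example values and the function name `level_orders` are assumptions for illustration; squared weights are passed as exact fractions so that the ceiling is not affected by rounding.

```python
from fractions import Fraction
import math

def level_orders(M_levels, s, weights_sq, kappa_sq=1):
    """Orders t_l for given squared weights w_l^2 and kappa(G)^2 (sketch)."""
    S = sum(w2 * s_l for w2, s_l in zip(weights_sq, s))   # assumed S_{omega,s}
    orders, lower = [], 0
    for M_l, w2 in zip(M_levels, weights_sq):
        t_l = 2 * math.ceil(4 * kappa_sq * S / w2)
        orders.append(min(M_l - lower, t_l))              # never exceeds the level width
        lower = M_l
    return orders

M_levels = (512, 1024, 2048, 4096)
s = (2, 4, 8, 16)

unweighted = level_orders(M_levels, s, [Fraction(1)] * len(s))
weighted = level_orders(M_levels, s, [Fraction(1, s_l) for s_l in s])

print("unweighted:", unweighted)   # every t_l is tied to the total sparsity
print("weighted:  ", weighted)     # t_l = 8 r s_l depends on s_l only
```

In the unweighted case every level inherits the total sparsity $s_{1}+\ldots+s_{r}$, whereas with the weights $\omega_{l}=s_{l}^{-1/2}$ each $t_{l}$ scales with its own $s_{l}$.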

3.4 Sufficient condition for the G-RIPL

In Definition 2.9 we defined the local coherence $\mu_{k,l}$ of an isometry $U\in\mathbb{C}^{N\times N}$ . We extend this to isometries $U\in\mathcal{B}(\ell^{2}(\mathbb{N}))$ in the exact same way

[TABLE]

This yields the following theorem.

Theorem 3.6 (Subsampled isometries and the G-RIPL).

Let $U\in\mathcal{B}(\ell^{2}(\mathbb{N}))$ be an isometry, and let $\Omega=\Omega_{{\mathbf{N}},{\mathbf{m}}}$ be an $({\mathbf{N}},{\mathbf{m}})$ -multilevel sampling scheme with $r$ levels. Let $\mathbf{M},\boldsymbol{s}\in\mathbb{N}^{r}$ be sparsity levels and local sparsities, respectively. Let $\epsilon,\delta\in(0,1)$ and let $0\leq r_{0}\leq r$ , with $\tilde{m}=m_{r_{0}+1}+\cdots+m_{r}$ . Let $s=s_{1}+\cdots+s_{r}$ and $L=r\cdot\log(2\tilde{m})\cdot\log(2N)\cdot\log^{2}(2s)+\log(\epsilon^{-1})$ . Suppose $G=\sqrt{P_{M}U^{*}P_{N}UP_{M}}$ is non-singular. If

[TABLE]

and

[TABLE]

for $k=r_{0}+1,\ldots,r$ then with probability at least $1-\epsilon$ , the matrix

[TABLE]

satisfies the G-RIPL of order $({\mathbf{s}},{\mathbf{M}})$ with constant $\delta_{{\mathbf{s}},{\mathbf{M}}}\leq\delta$ .

3.5 Overall recovery guarantee

Theorem 3.5 and Theorem 3.6 yield the following result.

Corollary 3.7.

Let $U\in\mathcal{B}(\ell^{2}(\mathbb{N}))$ be an isometry, and let $\Omega=\Omega_{{\mathbf{N}},{\mathbf{m}}}$ be an $({\mathbf{N}},{\mathbf{m}})$ -multilevel sampling scheme with $r$ levels. Let $\mathbf{M},\boldsymbol{s}\in\mathbb{N}^{r}$ be sparsity levels and local sparsities, respectively, and let $\boldsymbol{\omega}=[s_{1}^{-1/2},\ldots,s_{r}^{-1/2},\omega_{r+1}]$ be weights. Let $\epsilon,\delta\in(0,1)$ and $0\leq r_{0}\leq r$ . Let $m=m_{1}+\ldots+m_{r}$ , $\tilde{m}=m_{r_{0}+1}+\cdots+m_{r}$ , $s=s_{1}+\cdots+s_{r}$ , and $L=r\cdot\log(2\tilde{m})\cdot\log(2N)\cdot\log^{2}(2s)+\log(\epsilon^{-1})$ . Let $H\in\mathbb{C}^{m\times\infty}$ be as in (3.1) and set $A=HP_{K}$ . Let $x\in\ell^{2}(\mathbb{N})$ , $e_{1}\in\mathbb{C}^{m}$ and $\eta>0$ . Set $e=HP_{K}^{\perp}x+e_{1}$ and $\tilde{y}=Ax+e$ . Suppose

(i) we choose $M$ and $N$ so that $U$ satisfies the balancing property of order $0<\theta<1$ ,

(ii) we choose $\eta\geq\|e_{1}\|_{2}$ and $K$ so that $\|HP_{K}^{\perp}x\|_{2}\leq\eta^{\prime}$ ,

(iii) the weight $\omega_{r+1}$ satisfies

[TABLE]

(iv) the $m_{k}$ ’s satisfy $m_{k}=N_{k}-N_{k-1}$ for $k=1,\ldots,r_{0}$ and

[TABLE]

Then with probability $1-\epsilon$ any solution $\hat{x}$ of the optimization problem

[TABLE]

satisfies

[TABLE]

where $C=2(2+\sqrt{3})/(2-\sqrt{3})$ and $D=8\sqrt{2}/(2-\sqrt{3})$ .

Suppose that $x$ is exactly $(\boldsymbol{s},\mathbf{M})$ -sparse. Then the above result guarantees exact recovery of $x$ via weighted $\ell^{1}$ minimization subject to the corresponding measurement condition. We note in passing that this measurement condition is optimal up to log factors, in the sense that it is the same as that of the oracle estimator based on a priori knowledge of $\textnormal{supp}(x)$ . See [1].

4 Recovery guarantees for Walsh sampling with wavelet reconstruction

Having presented the abstract infinite-dimensional CS framework in full generality, the remainder of the paper is devoted to its application to the case of binary sampling with the Walsh transform with sparsity in orthogonal wavelet bases. We first describe the setup, before presenting the main recovery guarantees in Sections 4.3 and 4.4.

4.1 Walsh functions

For any number $n\in\mathbb{Z}_{+}=\{0,1,2,\ldots\}$ there exists a unique dyadic expansion

[TABLE]

where $n_{j}\in\{0,1\}$ for $j\in\mathbb{N}$ . Similarly any $x\in[0,1)$ can be written in its dyadic form as

[TABLE]

with $x_{j}\in\{0,1\}$ for all $j\in\mathbb{N}$ . For a dyadic rational number $x$ this expansion is not unique, as one may use either a finite expansion, or an infinite expansion where $x_{i}=1$ for all $i\geq k$ for some $k\in\mathbb{N}$ . In such cases we always consider the finite expansion. In practice this means that we have removed countably many singletons from $[0,1)$ .

Definition 4.1.

Let $n\in\mathbb{Z}_{+}$ and $x\in[0,1)$ . The Walsh function $w_{n}\colon[0,1)\to\{+1,-1\}$ is given by

[TABLE]

On the interval $[0,1)$ the Walsh function $w_{n}$ has $n$ sign changes; $n$ is therefore often called the frequency of $w_{n}$ . The first $2^{r}$ Walsh functions give rise to the entries of the sequency-ordered Hadamard matrix

[TABLE]

where $i,j=1,\ldots,2^{r}$ .
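The Walsh functions are easy to evaluate from the dyadic digits of $n$ and $x$. The sketch below assumes the sequency-ordered form $w_{n}(x)=(-1)^{\sum_{j\geq 1}(n_{j}+n_{j+1})x_{j}}$ with $n=\sum_{j}n_{j}2^{j-1}$ and $x=\sum_{j}x_{j}2^{-j}$, and samples $w_{n}$ at the points $k/2^{r}$ to form a Hadamard-type matrix; this formula, the sampling points and the names `walsh` and `hadamard_sequency` are assumptions rather than a transcription of the definitions above.

```python
import numpy as np

def walsh(n, x):
    """Sequency-ordered Walsh function w_n at x in [0, 1) (assumed formula)."""
    exponent = 0
    for j in range(1, n.bit_length() + 1):
        n_j = (n >> (j - 1)) & 1            # j-th dyadic digit of n
        n_next = (n >> j) & 1               # (j+1)-th dyadic digit of n
        x_j = int(x * 2**j) & 1             # j-th dyadic digit of x (finite expansion)
        exponent += (n_j + n_next) * x_j
    return -1 if exponent % 2 else 1

def hadamard_sequency(r):
    """2^r x 2^r matrix whose n-th row samples w_n at the points k / 2^r."""
    size = 2**r
    return np.array([[walsh(n, k / size) for k in range(size)]
                     for n in range(size)])

H3 = hadamard_sequency(3)
sign_changes = [int(np.count_nonzero(np.diff(row))) for row in H3]
print(H3)
print(sign_changes)   # row n has n sign changes
```

The rows are mutually orthogonal, and the number of sign changes in row $n$ equals $n$, which is the sequency ordering referred to above.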

Definition 4.2 (Walsh basis).

Define the Walsh basis as

[TABLE]

where “wh” is an abbreviation for Walsh-Hadamard.

Note that this is an orthonormal basis of $L^{2}([0,1))$ .

4.2 Wavelet transform

Let $\phi\colon\mathbb{R}\to\mathbb{R}$ and $\psi\colon\mathbb{R}\to\mathbb{R}$ be an orthonormal scaling function and wavelet [13], respectively, with minimal support, corresponding to a multiresolution analysis (MRA). Note that this could be either the classical “Daubechies wavelet” with minimum phase or the “symlets”, which are close to being symmetric but have a larger phase [26, 294]. Let

[TABLE]

denote the scaled and translated versions.

A wavelet $\psi$ is said to have $\nu$ vanishing moments if

[TABLE]

For orthogonal wavelets with minimum support, the support depends on the number of vanishing moments. That is,

[TABLE]

While this system constitutes an orthonormal basis of $L^{2}(\mathbb{R})$ , in our case we require an orthonormal basis of $L^{2}([0,1))$ . There exist several constructions of wavelets on the interval, but we will only consider periodic extensions and the orthogonal boundary wavelets introduced by Cohen, Daubechies and Vial in [12], which preserve the number of vanishing moments.

For wavelets on the interval we need to replace the $2\nu$ wavelets/scaling functions intersecting the boundaries at each scale, with their corresponding boundary-corrected counterparts. We postpone the formal definition of periodic and boundary wavelets until we need it, in the proof sections. But to simplify the notation let

[TABLE]

where $\phi^{\text{boundary}}_{j,k}$ and $\psi^{\text{boundary}}_{j,k}$ are either a periodic wavelet/scaling function or the boundary wavelet/scaling functions introduced in [12]. For the former extension we say that $\phi_{j,k}^{s}$ , $s\in\{0,1\}$ , “originate from a periodic wavelet”, while for the latter we say that they “originate from a boundary wavelet”.

We will throughout assume $J_{0}\in\mathbb{Z}_{+}$ satisfies $2^{J_{0}}\geq 2\nu$ for $\nu\geq 2$ and $J_{0}\geq 0$ for $\nu=1$ . This will ensure that there exists at least one $k\in\{0,\ldots,2^{j}-1\}$ such that $\operatorname{supp}(\phi_{j,k})=\operatorname{supp}(\psi_{j,k})\subseteq[0,1)$ for all $j\geq J_{0}$ .
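As a small worked example of this assumption, the sketch below returns the smallest admissible coarsest scale $J_{0}$ for a given number of vanishing moments $\nu$; the function name `minimal_J0` is hypothetical.

```python
def minimal_J0(nu):
    """Smallest J_0 with 2^{J_0} >= 2 nu for nu >= 2, and J_0 = 0 for nu = 1."""
    if nu < 1:
        raise ValueError("nu must be a positive integer")
    if nu == 1:
        return 0
    J0 = 0
    while 2**J0 < 2 * nu:
        J0 += 1
    return J0

for nu in (1, 2, 3, 4, 5, 8):
    print(nu, minimal_J0(nu))
```

For instance, $\nu=4$ requires $2^{J_{0}}\geq 8$ and hence $J_{0}\geq 3$, whereas $\nu=5$ already requires $J_{0}\geq 4$.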

Definition 4.3.

For a fixed number of vanishing moments $\nu$ , minimum wavelet decomposition $J_{0}$ and a boundary extension which is either periodic or boundary wavelets, let $\phi_{j,k}^{s}$ be the corresponding wavelets and scaling functions. We define

[TABLE]

Both $B_{\textnormal{wh}}$ and $B_{\textnormal{wave}}^{J_{0},\nu}$ are orthonormal bases for $L^{2}([0,1))$ .

4.3 Recovery guarantees

From Section 3 there are four unknown factors depending on $U$ which need to be estimated. These are the local coherences $\mu_{k,l}$ , the norm $\|HP_{K}^{M}\|_{1\to 2}$ where $H$ is given by (3.1), the condition number $\kappa(G)=\|G\|_{2}\|G^{-1}\|_{2}$ and the factor $\|G^{-1}\|_{2}$ found in condition (3.10).

For the two latter factors we have $G=\sqrt{P_{M}U^{*}P_{N}UP_{M}}$ . Furthermore we know that $\|G\|_{2}\leq 1$ since $U$ is an isometry. In practice we therefore only need to determine an upper bound on $\|G^{-1}\|_{2}$ and from Lemma 3.3 we know that $\|G^{-1}\|_{2}\leq 1/\sqrt{\theta}$ , where $0<\theta<1$ is the balancing property constant. In other words, it suffices to determine when the balancing property holds with a given $\theta$ .

The following three propositions estimate these quantities for the case $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},\nu}]$ .

Proposition 4.4.

Let $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},\nu}]$ . For each $\theta\in(0,1)$ , there exists a constant $q_{\theta}\geq 0$ , such that whenever $N=2^{k+q_{\theta}}\geq 2^{k}=M$ then $U$ satisfies the balancing property of order $\theta$ for all $k\in\mathbb{N}$ .

Note that Proposition 4.4 is a consequence of Theorem 1.1 in [20].

Proposition 4.5.

Let $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},\nu}]$ with $\nu\geq 3$ and let

[TABLE]

be sparsity and sampling levels, respectively. Then the local coherences of $U$ scale like

[TABLE]

Proposition 4.6.

Let $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},\nu}]$ and let $\mathbf{M},\mathbf{N}\in\mathbb{N}^{r}$ be sparsity and sampling levels. Let $\Omega=\Omega_{\boldsymbol{m},\mathbf{N}}$ be a multilevel random sampling scheme, and let $H$ be as in (3.1). Then

[TABLE]

We can now present the two main theorems in this section. We point out that these are only valid for $\nu\geq 3$ vanishing moments. For $\nu=1$ , the corresponding wavelet is the Haar wavelet, and will be considered in the next subsection. For $\nu=2$ , the coherence of $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},2}]$ does not decay as fast as for the other wavelets. Whether this is because our coherence bounds are not sharp enough for this wavelet or if it is because the coherence of $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},2}]$ actually decays more slowly is not known. We do, however, present some numerics in Section 6.5 which indicate that it is potentially the latter.

Theorem 4.7.

Let $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},\nu}]$ with $\nu\geq 3$ and let

[TABLE]

be sparsity and sampling levels, respectively. Let $\boldsymbol{s}\in\mathbb{N}^{r}$ be local sparsities. Suppose $q$ is chosen so that $U$ satisfies the balancing property with constant $0<\theta<1$ and set $G=\sqrt{P_{M}U^{*}P_{N}UP_{M}}$ . Let $\epsilon,\delta\in(0,1)$ and let $0\leq r_{0}\leq r$ , with $\tilde{m}=m_{r_{0}+1}+\cdots+m_{r}$ . Let $s=s_{1}+\cdots+s_{r}$ and $L=r\cdot\log(2\tilde{m})\cdot\log(2N)\cdot\log^{2}(2s)+\log(\epsilon^{-1})$ . If

[TABLE]

and

[TABLE]

for $k=r_{0}+1,\ldots,r$ , then with probability at least $1-\epsilon$ , the matrix in (3.11) satisfies the G-RIPL of order $({\mathbf{s}},{\mathbf{M}})$ with constant $\delta_{{\mathbf{s}},{\mathbf{M}}}\leq\delta$ .

With this in hand, we now present our main result:

Theorem 4.8.

Let $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},\nu}]$ with $\nu\geq 3$ and let

[TABLE]

be sparsity and sampling levels, respectively. Let $\boldsymbol{s}\in\mathbb{N}^{r}$ be local sparsities, $\boldsymbol{\omega}=(s^{-1/2}_{1},\ldots,s_{r}^{-1/2},\omega_{r+1})$ be weights and let $\boldsymbol{m}\in\mathbb{N}^{r}$ be sampling densities. Let $\epsilon\in(0,1)$ and let $0\leq r_{0}\leq r$ . Let $m=m_{1}+\ldots+m_{r}$ , $\tilde{m}=m_{r_{0}+1}+\cdots+m_{r}$ , $s=s_{1}+\ldots+s_{r}$ , and $L=r\cdot\log(2\tilde{m})\cdot\log(2N)\cdot\log^{2}(2s)+\log(\epsilon^{-1})$ .

Let $H\in\mathbb{C}^{m\times\infty}$ be as in (3.1) and set $A=HP_{K}$ . Let $x\in\ell^{2}(\mathbb{N})$ , $e_{1}\in\mathbb{C}^{m}$ and $\eta>0$ . Set $e=HP_{K}^{\perp}x+e_{1}$ and $\tilde{y}=Ax+e$ . Suppose

(i) we choose $q=q_{\theta}$ as in Proposition 4.4 so that $U$ satisfies the balancing property of order $0<\theta<1$ ,

(ii) we choose $\eta\geq\|e_{1}\|_{2}$ and $K$ so that $\|HP_{K}^{\perp}x\|_{2}\leq\eta^{\prime}$ ,

(iii) the weight $\omega_{r+1}$ satisfies

[TABLE]

(iv) the $m_{k}$ ’s satisfy $m_{k}=N_{k}-N_{k-1}$ for $k=1,\ldots,r_{0}$ and

[TABLE]

Then with probability $1-\epsilon$ any solution $\hat{x}$ of the optimization problem

[TABLE]

satisfies

[TABLE]

where $C=2(2+\sqrt{3})/(2-\sqrt{3})$ and $D=8\sqrt{2}/(2-\sqrt{3})$ .

Remark 4.9.

Note that the second condition (ii) can be guaranteed using Proposition 4.6. Indeed, it suffices for $K$ to satisfy

[TABLE]

Hence, given any a priori estimates on the decay of the coefficients $x$ (such as in the case of wavelets), one can use this to determine a suitable $K$ .

4.4 Uniform recovery for Haar wavelets

Below we shall see that for the Haar wavelet, $P_{N}UP_{N}$ will be an isometry for $N=2^{r}$ where $r\in\mathbb{N}$ . This can also be seen from Figure 2, where $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},\nu}]$ is perfectly block diagonal for $\nu=1$ . This means that the G-RIPL reduces to the $I$ -adjusted RIPL, or simply the RIPL, which we know from the finite-dimensional case. Notice in particular that we also avoid any considerations where $K>M=N$ as above, since $HP_{M}^{\perp}=0$ .

Proposition 4.10.

Let $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},1}]$ and let $N=2^{k}$ , for some $k\in\mathbb{N}$ with $k\geq J_{0}+1$ . Then $P_{N}UP_{N}$ is an isometry on $\mathbb{C}^{N}$ .
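The block structure for the Haar wavelet can be reproduced numerically, since both the first $2^{k}$ Walsh functions and the first $2^{k}$ Haar functions are constant on the dyadic cells of length $2^{-k}$, so their inner products can be computed exactly on that grid. The sketch below assumes $J_{0}=0$, the sequency-ordered Walsh formula $w_{n}(x)=(-1)^{\sum_{j}(n_{j}+n_{j+1})x_{j}}$, the Haar ordering (scaling function first, then wavelets scale by scale) and a particular sign convention for the Haar wavelet; all function names are hypothetical.

```python
import numpy as np

def walsh_matrix(k):
    """Rows: w_0, ..., w_{2^k - 1} on the 2^k dyadic cells, with unit l2 norm."""
    N = 2**k
    W = np.empty((N, N))
    for n in range(N):
        for i in range(N):
            exponent = 0
            for j in range(1, k + 1):
                n_j, n_next = (n >> (j - 1)) & 1, (n >> j) & 1
                x_j = (i >> (k - j)) & 1          # j-th dyadic digit of x = i / 2^k
                exponent += (n_j + n_next) * x_j
            W[n, i] = (-1.0) ** exponent
    return W / np.sqrt(N)

def haar_matrix(k):
    """Rows: scaling function, then Haar wavelets psi_{j,p} for j = 0, ..., k-1."""
    N = 2**k
    rows = [np.ones(N) / np.sqrt(N)]
    for j in range(k):
        cell = N // 2**j                          # grid points in supp(psi_{j,p})
        for p in range(2**j):
            psi = np.zeros(N)
            psi[p * cell: p * cell + cell // 2] = 1.0
            psi[p * cell + cell // 2: (p + 1) * cell] = -1.0
            rows.append(psi / np.sqrt(cell))
    return np.array(rows)

def dyadic_level(i):
    """Index 0 forms its own block; otherwise the block is floor(log2(i))."""
    return -1 if i == 0 else i.bit_length() - 1

k = 3
U = walsh_matrix(k) @ haar_matrix(k).T            # U[i, j] = <haar_j, walsh_i>
off_block = max(abs(U[i, j]) for i in range(2**k) for j in range(2**k)
                if dyadic_level(i) != dyadic_level(j))
print(np.allclose(U.T @ U, np.eye(2**k)), off_block)
```

The finite section is orthogonal and vanishes outside the dyadic diagonal blocks, in line with the block diagonal structure described above.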

Proposition 4.11.

Let $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},1}]$ and let $\mathbf{M}=\mathbf{N}=[2^{J_{0}+1},\ldots,2^{J_{0}+r}]$ be sparsity and sampling levels, respectively. Then the local coherences of $U$ are

[TABLE]

It is now straightforward to derive the following:

Theorem 4.12.

Let $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},1}]$ and let $\mathbf{M}=\mathbf{N}=[2^{J_{0}+1},\ldots,2^{J_{0}+r}]$ be sparsity and sampling levels. Let $\boldsymbol{s}\in\mathbb{N}^{r}$ be local sparsities and $\boldsymbol{m}\in\mathbb{N}^{r}$ be local sampling densities. Let $\epsilon,\delta\in(0,1)$ and $0\leq r_{0}\leq r$ . Let $\tilde{m}=m_{r_{0}+1}+\ldots+m_{r}$ and $s=s_{1}+\ldots+s_{r}$ . Suppose that the $m_{k}$ ’s satisfy $m_{k}=N_{k}-N_{k-1}$ for $k=1,\ldots,r_{0}$ and

[TABLE]

Then with probability $1-\epsilon$ the matrix (3.11) satisfies the RIPL with constant $\delta_{\boldsymbol{s},\mathbf{M}}\leq\delta$ .

Proof.

Using Proposition 4.10 we know that $P_{N}UP_{N}$ is an isometry. Thus inserting the local coherences from Proposition 4.11 into (2.4) in Theorem 2.10 gives the result. ∎

Theorem 4.13.

Let $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},1}]$ and let $\mathbf{M}=\mathbf{N}=[2^{J_{0}+1},\ldots,2^{J_{0}+r}]$ be sparsity and sampling levels. Let $\boldsymbol{s}\in\mathbb{N}^{r}$ be local sparsities, $\boldsymbol{\omega}=(s_{1}^{-1/2},\ldots,s_{r}^{-1/2})$ be weights and $\boldsymbol{m}\in\mathbb{N}^{r}$ be local sampling densities. Let $\epsilon\in(0,1)$ and let $0\leq r_{0}\leq r$ . Let $m=m_{1}+\ldots+m_{r}$ , $\tilde{m}=m_{r_{0}+1}+\cdots+m_{r}$ and $s=s_{1}+\ldots+s_{r}$ . Suppose we sample $m_{k}=N_{k}-N_{k-1}$ for $k=1,\ldots,r_{0}$ and

[TABLE]

for $k=r_{0}+1,\ldots,r$ . Let $H\in\mathbb{C}^{m\times\infty}$ be as in (3.1) with $A=HP_{M}$ . Let $x\in\ell^{2}(\mathbb{N})$ and $e\in\mathbb{C}^{m}$ with $\|e\|_{2}\leq\eta$ for some $\eta\geq 0$ . Set $\tilde{y}=Ax+e$ . Then any solution $\hat{x}$ of the optimization problem

[TABLE]

satisfies

[TABLE]

with probability $1-\epsilon$ , where $C=2(2+\sqrt{3})/(2-\sqrt{3})$ and $D=8\sqrt{2}/(2-\sqrt{3})$ .

Proof.

Proposition 4.10 gives $G=\sqrt{P_{M}U^{*}P_{N}UP_{M}}=\sqrt{I}=I$ . Next notice that $S_{\boldsymbol{\omega},\boldsymbol{s}}=r$ and that $P_{M}x\in\{z\in\mathbb{C}^{M}:\|Az-\tilde{y}\|_{2}\leq\eta\}$ since $\|HP_{M}^{\perp}\|=0$ . Using Theorem 3.5 we see that we can guarantee recovery of $(\boldsymbol{s},\mathbf{M})$ -sparse vectors, if $A$ satisfies the RIPL with constant $\delta_{\boldsymbol{t},\mathbf{M}}\leq 1/2$ , where $t_{l}=\min\{M_{l}-M_{l-1},8rs_{l}\}$ . Using Theorem 4.12 gives the result. ∎

5 Proof of results in Section 3

When deriving uniform recovery guarantees via the RIP, it is typical to proceed as follows. First, one shows that the RIP implies the so-called robust Null Space Property (rNSP) of order $s$ (see Def. 4.17 in [16]). Second, one then shows that the rNSP implies stable and robust recovery. Thus the line of implications reads

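$$\text{RIP}\quad\Longrightarrow\quad\text{rNSP}\quad\Longrightarrow\quad\text{stable and robust recovery}.$$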

A similar line of implications holds for the RIPL and the corresponding robust Null Space Property in levels (rNSPL); see Def. 3.6 in [7].

Both of the recovery guarantees for matrices satisfying the rNSP and rNSPL consider minimizers of the unweighted quadratically-constrained basis pursuit (QCBP) optimization problem. In our setup we consider minimizers of the weighted QCBP. We have therefore generalized the rNSPL to what we call the weighted robust null space property in levels.

For the sufficient condition for the G-RIPL in Theorem 3.6, the proof follows along similar lines as in [23]. We only sketch the main differences here.

5.1 The weighted rNSPL and norm bounds

For a set $\Theta\subseteq\{1,\ldots,M\}$ and a vector $x\in\mathbb{C}^{M}$ we let the vector $x_{\Theta}$ be given by

[TABLE]

We also define

[TABLE]

Definition 5.1 (weighted rNSP in levels).

Let $\mathbf{M},\boldsymbol{s}\in\mathbb{N}^{r}$ be sparsity levels and local sparsities, respectively. For positive weights $\boldsymbol{\omega}\in\mathbb{R}^{r+1}$ , we say that $A\in\mathbb{C}^{m\times M}$ satisfies the weighted robust Null Space Property in Levels (weighted rNSPL) of order $(\boldsymbol{s},\mathbf{M})$ with constants $0<\rho<1$ and $\gamma>0$ if

[TABLE]

for all $x\in\mathbb{C}^{M}$ and all $\Theta\in E_{\boldsymbol{s},\mathbf{M}}$ .

Lemma 5.2 (weighted rNSPL implies $\ell^{(1,\boldsymbol{\omega})}$ -distance bound).

Suppose that $A\in\mathbb{C}^{m\times M}$ satisfies the weighted rNSPL of order $(\boldsymbol{s},\mathbf{M})$ with constants $0<\rho<1$ and $\gamma>0$ . Let $x,z\in\mathbb{C}^{M}$ . Then

[TABLE]

Proof.

Let $v=z-x$ and $\Theta\in E_{\boldsymbol{s},\mathbf{M}}$ be such that $\|x_{\Theta^{c}}\|_{1,\boldsymbol{\omega}}=\sigma_{\boldsymbol{s},\mathbf{M}}(x)_{1,\boldsymbol{\omega}}$ . Then

[TABLE]

which implies that

[TABLE]

Now consider ${\left\|v_{\Theta}\right\|}_{1,\boldsymbol{\omega}}$ . By the weighted rNSPL, we have

[TABLE]

Hence (5.3) gives

[TABLE]

and after rearranging we get

[TABLE]

Therefore, using this and (5.3) once more, we deduce that

[TABLE]

which gives the result. ∎

Lemma 5.3 (weighted rNSPL implies $\ell^{2}$ distance bound).

Suppose that $A\in\mathbb{C}^{m\times M}$ satisfies the weighted rNSPL of order $({\mathbf{s}},{\mathbf{M}})$ with constants $0<\rho<1$ and $\gamma>0$ . Let $x,z\in\mathbb{C}^{M}$ . Then

[TABLE]

Proof.

Let $v=z-x$ and $\Theta=\Theta_{1}\cup\cdots\cup\Theta_{r}$ , where $\Theta_{l}\subseteq\{M_{l-1}+1,\ldots,M_{l}\}$ , $|\Theta_{l}|=s_{l}$ is the index set of the largest $s_{l}$ coefficients of $P^{M_{l-1}}_{M_{l}}v$ in absolute value. Then

[TABLE]

which gives

[TABLE]

Since ${\left\|v_{\Theta_{l}}\right\|}_{2}\leq{\left\|v_{\Theta}\right\|}_{2}$ we deduce that

[TABLE]

Applying Young’s inequality $ab\leq\frac{1}{2}a^{2}+\frac{1}{2}b^{2}$ , we obtain

[TABLE]

Hence

[TABLE]

We now use the weighted rNSPL to get

[TABLE]

To complete the proof, we use the inequality ${\left\|v_{\Theta^{c}}\right\|}_{1,\boldsymbol{\omega}}\leq{\left\|v\right\|}_{1,\boldsymbol{\omega}}$ . ∎

5.2 Weighted rNSPL implies uniform recovery

Theorem 5.4.

Let $\mathbf{M},\boldsymbol{s}\in\mathbb{N}^{r}$ be sparsity levels and local sparsities, respectively, and let $\boldsymbol{\omega}\in\mathbb{R}^{r+1}$ be positive weights. Let $x\in\mathbb{C}^{K}$ , with $K>M$ and $e\in\mathbb{C}^{m}$ with $\|e\|_{2}\leq\eta$ . Set $y=Ax+e$ . Let $A\in\mathbb{C}^{m\times K}$ and suppose that $AP_{M}$ satisfies the weighted rNSP in levels of order $(\boldsymbol{s},\mathbf{M})$ with constants $\rho=\sqrt{3}/2$ and $\gamma>0$ . If

[TABLE]

then any solution $\hat{x}$ of the optimization problem

[TABLE]

satisfies

[TABLE]

where $C=2(2+\sqrt{3})/(2-\sqrt{3})$ and $D=8/(2-\sqrt{3})$ .
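To give a sense of the size of these constants, the following short check evaluates them numerically and relates them to $\rho=\sqrt{3}/2$ through $C/2=(1+\rho)/(1-\rho)$ and $D/2=2/(1-\rho)$. It is a sanity check only, and the variable names are ours.

```python
import math

rho = math.sqrt(3) / 2
C = 2 * (2 + math.sqrt(3)) / (2 - math.sqrt(3))
D = 8 / (2 - math.sqrt(3))

# Both constants are fixed once rho = sqrt(3)/2 is chosen.
assert math.isclose(C / 2, (1 + rho) / (1 - rho))
assert math.isclose(D / 2, 2 / (1 - rho))

# Since (2 - sqrt(3)) (2 + sqrt(3)) = 1, the constants simplify further.
assert math.isclose(C, 14 + 8 * math.sqrt(3))
assert math.isclose(D, 16 + 8 * math.sqrt(3))

print(round(C, 3), round(D, 3))
```

Both constants are below $30$ and do not depend on the dimensions, the sparsities or the number of levels.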

Proof.

Recall that $\rho=\sqrt{3}/2$ , and notice that this gives $C/2=(1+\rho)/(1-\rho)$ and $D/2=2/(1-\rho)$ . Next we consider the bound (5.5), and note that this bound implies

[TABLE]

We also note that (5.5) implies

[TABLE]

which can be written as

[TABLE]

Next set $v=x-\hat{x}$ and consider the $\ell^{(1,\boldsymbol{\omega})}$ -bound. First notice that since $AP_{M}$ satisfies the weighted rNSPL, Lemma 5.2 gives

[TABLE]

Here the last term can be bounded by

[TABLE]

since both $x$ and $\hat{x}$ are feasible. Combining (5.10), (5.12) and (5.14) gives

[TABLE]

Using that $\hat{x}$ is a minimizer of (5.6) gives the desired bound.

We now consider the $\ell^{2}$ -bound. First note that

[TABLE]

We shall also need

[TABLE]

Again, since $AP_{M}$ satisfies the weighted rNSPL we can apply Lemma 5.3, Lemma 5.2 and inequality (5.16) to obtain the bound

[TABLE]

Combining (5.11), (5.14), (5.15) and (5.17) now gives

[TABLE]

Using that $\hat{x}$ is a minimizer of (5.6) completes the proof. ∎

5.3 G-RIPL implies weighted rNSPL

Theorem 5.5.

Let $A\in\mathbb{C}^{m\times M}$ and let $G\in\mathbb{C}^{M\times M}$ be invertible. Let $\mathbf{M}\in\mathbb{N}^{r}$ be sparsity levels, $\boldsymbol{s},\boldsymbol{t}\in\mathbb{N}^{r}$ be local sparsities and let $\boldsymbol{\omega}\in\mathbb{R}^{r}$ be positive weights. Suppose that $A$ satisfies the G-RIPL of order $(\boldsymbol{t},\mathbf{M})$ with constant $0<\delta_{\boldsymbol{t},\mathbf{M}}<1$ , where

[TABLE]

Then $A$ satisfies the weighted rNSP in levels of order $(\boldsymbol{s},\mathbf{M})$ with constants $0<\rho<1$ and $\gamma=\sqrt{2}\|G^{-1}\|_{2}$ .

Proof.

Let $x\in\mathbb{C}^{K}$ be such that $P_{M}^{\perp}x=0$ and let $\Theta=\Theta_{1}\cup\cdots\cup\Theta_{r}$ , where $\Theta_{l}$ is the index set of the $s_{l}$ largest entries of $P^{M_{l-1}}_{M_{l}}x$ in absolute value. If $t_{l}=M_{l}-M_{l-1}$ , let $T_{l,0}=\{M_{l-1}+1,\ldots,M_{l}\}$ and let $T_{l,k}=\emptyset$ for $k\geq 1$ . For $t_{l}<M_{l}-M_{l-1}$ let $T_{l,0}$ be the index set of the largest $t_{l}/2$ values of $|P^{M_{l-1}}_{M_{l}}x|$ , and let $T_{l,1}$ be the index set of the next $t_{l}/2$ largest values, and so forth. If fewer than $t_{l}/2$ values are left at iteration $k$ , we let $T_{l,k}$ be the remaining indices. Let $T_{k}=T_{1,k}\cup\cdots\cup T_{r,k}$ and let $T_{\{0,1\}}=T_{0}\cup T_{1}$ . Since $\Theta\subseteq T_{\{0,1\}}$ we have

[TABLE]

where $\delta=\delta_{\boldsymbol{t},{\mathbf{M}}}$ . Note that

[TABLE]

Then

[TABLE]

Set $\Delta=\{l\in\{1,\ldots,r\}:t_{l}<M_{l}-M_{l-1}\}$ and notice that $T_{l,k}=\emptyset$ for $l\in\{1,\ldots,r\}\setminus\Delta$ and $k\geq 1$ . Thus for $k\geq 2$ we get

[TABLE]

Therefore

[TABLE]

This results in

[TABLE]

which establishes the weighted rNSPL of order $(\boldsymbol{s},\mathbf{M})$ with $0<\rho<1$ and $\gamma=\sqrt{2}\|G^{-1}\|_{2}$ . ∎
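The block decomposition $T_{l,k}$ used in the proof above can be sketched as follows. This is a minimal illustration, assuming 0-based indices, even $t_{l}$ and $s_{l}\leq t_{l}$; the function names `level_blocks` and `top_indices` and the test vector are hypothetical and not part of the paper.

```python
import numpy as np

def level_blocks(x, M, t):
    """blocks[l][k] holds the (0-based) indices of T_{l,k}."""
    bounds = [0] + list(M)
    blocks = []
    for l in range(len(M)):
        lo, hi = bounds[l], bounds[l + 1]
        if t[l] >= hi - lo:
            # Full level: T_{l,0} is the whole level, later blocks are empty.
            blocks.append([np.arange(lo, hi)])
            continue
        # Sort the level by decreasing magnitude, then cut into chunks of t_l / 2.
        order = lo + np.argsort(-np.abs(x[lo:hi]), kind="stable")
        half = t[l] // 2
        blocks.append([order[i:i + half] for i in range(0, len(order), half)])
    return blocks

def top_indices(x, M, s):
    """Theta: indices of the s_l largest entries (in magnitude) in each level."""
    bounds = [0] + list(M)
    theta = set()
    for l in range(len(M)):
        lo, hi = bounds[l], bounds[l + 1]
        order = lo + np.argsort(-np.abs(x[lo:hi]), kind="stable")
        theta.update(order[:s[l]].tolist())
    return theta

x = np.array([0.5, -2.0, 0.1, 3.0, 1.0, -4.0, 0.2, 0.0, 2.5, -0.3, 6.0, 0.7])
M, t, s = [4, 12], [4, 4], [2, 2]
blocks = level_blocks(x, M, t)
theta = top_indices(x, M, s)
# T_{0,1}: union over levels of the first two blocks.
T01 = set()
for level in blocks:
    for block in level[:2]:
        T01.update(block.tolist())
print(theta <= T01)
```

Because both functions use the same stable ordering, the containment $\Theta\subseteq T_{\{0,1\}}$ holds whenever $s_{l}\leq t_{l}$ in every level.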

5.4 Proof of Theorem 3.5

Proof of Theorem 3.5.

First notice that for $0<\delta\leq 1/2$ we have

[TABLE]

Hence using Theorem 5.5 with $0<\delta_{\boldsymbol{t},\mathbf{M}}\leq\delta\leq 1/2$ and $\rho=\sqrt{3}/2$ we see that Equation (5.18) simplifies to Equation (3.5). This implies that $AP_{M}$ satisfies the weighted rNSPL of order $(\boldsymbol{s},\mathbf{M})$ , with constants $\rho=\sqrt{3}/2$ and $\gamma=\sqrt{2}\|G^{-1}\|_{2}$ . Now since

[TABLE]

we know from Theorem 5.4 that any solution $\hat{x}$ of (3.6) satisfies (3.7) and (3.8). ∎

5.5 Proof of Theorem 3.6

Proof of Theorem 3.6.

We recall that $U\in\mathcal{B}(\ell^{2})$ is an isometry and that

[TABLE]

and $m=m_{1}+\ldots+m_{r}$ . Note that

[TABLE]

and therefore

[TABLE]

Notice also that $p_{k}=1$ and $\Omega_{k}=\{N_{k-1}+1,\ldots,N_{k}\}$ for $k=1,\ldots,r_{0}$ . Next notice that the matrix $P_{\Omega_{k}}$ can be written as

[TABLE]

where $\{e_{i}\}^{\infty}_{i=1}$ is the standard basis on $\ell^{2}(\mathbb{N})$ . It now follows that

[TABLE]

where $X_{k,i}$ are random vectors given by $X_{k,i}=\frac{1}{\sqrt{p_{k}}}P_{M}U^{*}e_{t_{k,i}}$ . Note that the $X_{k,i}$ are independent, and also that

[TABLE]

where $G\in\mathbb{C}^{M\times M}$ is non-singular by assumption. Let

[TABLE]

We now define the following seminorm on $\mathbb{C}^{M\times M}$ :

[TABLE]

so that

[TABLE]

Due to (5.5) and (5.20), we may rewrite this as

[TABLE]

Having detailed the setup, the remainder of the proof now follows along very similar lines to that of [23, Thm. 3.2]. Hence we only sketch the details.

The first step is to estimate $\mathbb{E}\left(\delta_{{\mathbf{s}},{\mathbf{M}}}\right)$ . Using the standard techniques of symmetrization, Dudley’s inequality, properties of covering numbers, and arguing as in [23, Sec. 4.2], we deduce that

[TABLE]

where $C_{1}>0$ is a universal constant, $\tilde{m}=\sum^{r}_{k=r_{0}+1}m_{k}$ , and

[TABLE]

In particular,

[TABLE]

provided

[TABLE]

where $C_{2}>0$ is a constant. Using this, Talagrand’s theorem and using the fact that $\|P_{N}UP_{M}\|_{2}\leq\|U\|_{2}=1$ (see [23, Sec. 4.3]) we deduce that

[TABLE]

In particular,

[TABLE]

provided

[TABLE]

Combining this with (5.23) and (5.24) now completes the proof.

∎
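As a finite-dimensional illustration of the setup in the proof above, the sketch below stacks the rows $X_{k,i}^{*}=\frac{1}{\sqrt{p_{k}}}e_{t_{k,i}}^{*}UP_{M}$ into a matrix. It rests on assumptions that are not restated in this section: $p_{k}=m_{k}/(N_{k}-N_{k-1})$, indices drawn uniformly and independently within each level that is not fully sampled, and a random unitary matrix standing in for the isometry $U$. The function name `sample_matrix` is hypothetical.

```python
import numpy as np

def sample_matrix(U, M, N_levels, m, rng):
    """Stack rows (1/sqrt(p_k)) e_t^* U P_M, drawn level by level."""
    bounds = [0] + list(N_levels)
    rows = []
    for k, m_k in enumerate(m):
        lo, hi = bounds[k], bounds[k + 1]
        n_k = hi - lo
        if m_k == n_k:
            t = np.arange(lo, hi)                 # fully sampled level, p_k = 1
        else:
            t = rng.integers(lo, hi, size=m_k)    # assumed: uniform i.i.d. draws
        p_k = m_k / n_k                           # assumed definition of p_k
        rows.append(U[t, :M] / np.sqrt(p_k))
    return np.vstack(rows)

rng = np.random.default_rng(0)
K, M, N_levels = 64, 8, [8, 16, 32]
Z = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
U, _ = np.linalg.qr(Z)                            # random unitary stand-in for U

B = U[:N_levels[-1], :M]                          # P_N U P_M
G2 = B.conj().T @ B                               # P_M U^* P_N U P_M

A_full = sample_matrix(U, M, N_levels, [8, 8, 16], rng)
A_sub = sample_matrix(U, M, N_levels, [8, 6, 4], rng)
print(np.linalg.norm(A_full.conj().T @ A_full - G2))
print(A_sub.shape)
```

With every level fully sampled, $A^{*}A$ coincides with $P_{M}U^{*}P_{N}UP_{M}$; under the stated assumptions the same matrix is the expectation of $A^{*}A$ when some levels are subsampled.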

5.6 Proof of Corollary 3.7 and Lemma 3.3

Proof of Corollary 3.7.

We must ensure that all the conditions are met to be able to apply Theorem 3.5 with $P_{K}x$ .

First notice that for weights $\boldsymbol{\omega}=(s_{1}^{-1/2},\ldots,s_{r}^{-1/2},\omega_{r+1})$ we have $S_{\boldsymbol{\omega},\boldsymbol{s}}=r$ and $\zeta_{\boldsymbol{s},\boldsymbol{\omega}}=1$ . Next we note that condition $(ii)$ implies that $P_{K}x$ is a feasible point since $\|HP_{K}x-\tilde{y}\|_{2}\leq\|HP_{K}^{\perp}x\|_{2}+\|e_{1}\|_{2}=\eta+\eta^{\prime}$ .
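The values $S_{\boldsymbol{\omega},\boldsymbol{s}}=r$ and $\zeta_{\boldsymbol{s},\boldsymbol{\omega}}=1$ can be illustrated numerically. The sketch assumes the definitions $S_{\boldsymbol{\omega},\boldsymbol{s}}=\sum_{l=1}^{r}\omega_{l}^{2}s_{l}$ and $\zeta_{\boldsymbol{s},\boldsymbol{\omega}}=\min_{l}\omega_{l}^{2}s_{l}$, which are consistent with the values stated here but are not restated in this section; the function name and the sample sparsities are hypothetical.

```python
import numpy as np

def weight_summaries(s, omega):
    """Return (sum_l omega_l^2 s_l, min_l omega_l^2 s_l) over the first r weights."""
    s = np.asarray(s, dtype=float)
    omega = np.asarray(omega, dtype=float)
    w2s = omega[:len(s)] ** 2 * s     # the extra weight omega_{r+1} is not used
    return w2s.sum(), w2s.min()

s = [3, 5, 2, 7]                       # hypothetical local sparsities, r = 4
omega = [1 / np.sqrt(v) for v in s] + [1.0]   # last entry plays the role of omega_{r+1}
S, zeta = weight_summaries(s, omega)
print(S, zeta)
```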

Let $G=\sqrt{P_{M}U^{*}P_{N}UP_{M}}$ . Combining condition $(i)$ and Lemma 3.3 gives $\|G^{-1}\|_{2}\leq 1/\sqrt{\theta}$ and since $\|G\|_{2}\leq 1$ we also have $\kappa(G)=\|G\|_{2}\|G^{-1}\|_{2}\leq 1/\sqrt{\theta}$ . Inserting the above equalities and inequalities into the weight condition for $\omega_{r+1}$ in Theorem 3.5 gives condition $(iii)$ .

Next we must ensure that $AP_{M}$ satisfies the G-RIPL of order $(\boldsymbol{t},\mathbf{M})$ with $\delta_{\boldsymbol{t},\mathbf{M}}\leq 1/2$ where

[TABLE]

According to Theorem 3.6 this occurs if the $m_{k}$ ’s satisfy condition $(iv)$ . The error bounds (3.7) and (3.8) now follow directly from Theorem 3.5. ∎

Proof of Lemma 3.3.

First notice that the balancing property is equivalent to requiring

[TABLE]

where $\sigma_{M}(P_{N}UP_{M})$ is the $M$ th largest singular value of $P_{N}UP_{M}$ . Indeed, since $U$ is an isometry, the matrix $P_{M}-P_{M}U^{*}P_{N}UP_{M}$ is nonnegative definite, and therefore

[TABLE]

This gives (5.26). Next let $G=\sqrt{P_{M}U^{*}P_{N}UP_{M}}$ and notice that $\sigma_{M}(G)=\sigma_{M}(P_{N}UP_{M})$ . This gives $\|G^{-1}\|_{2}=1/\sigma_{M}(G)\leq 1/\sqrt{\theta}$ . ∎
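The two facts used at the end of this proof, $\sigma_{M}(G)=\sigma_{M}(P_{N}UP_{M})$ and $\|G^{-1}\|_{2}=1/\sigma_{M}(G)$, can be checked on a small example. This is a sketch in finite dimensions, with a random unitary matrix as a stand-in for the isometry $U$; the sizes `K`, `N`, `M` are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, M = 64, 32, 8
Z = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
U, _ = np.linalg.qr(Z)                      # unitary stand-in for U

B = U[:N, :M]                               # P_N U P_M
# Hermitian square root G = sqrt(B^* B) via an eigendecomposition.
evals, evecs = np.linalg.eigh(B.conj().T @ B)
G = (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.conj().T

sv_B = np.linalg.svd(B, compute_uv=False)   # singular values, descending
sv_G = np.linalg.svd(G, compute_uv=False)
norm_G_inv = np.linalg.norm(np.linalg.inv(G), 2)

print(sv_B[-1], sv_G[-1], 1 / norm_G_inv)
```

The largest singular value of $G$ is at most $1$ here, in line with $\|G\|_{2}\leq 1$ used in the proof of Corollary 3.7.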

6 Proof of results in Section 4

In Section 4 we found concrete recovery guarantees for Walsh sampling and wavelet reconstruction, using the theorems in Section 3. The key to deriving Walsh-wavelet recovery guarantees boils down to estimating the quantities $\mu_{k,l}$ , $||HP_{K}^{M}||_{1\to 2}$ and $||G^{-1}||_{2}\leq\frac{1}{\sqrt{\theta}}$ . All of these quantities depend directly on $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},\nu}]$ , and to control them we will have to estimate how the entries of $U$ change for varying $n,j,k$ and $s$ . We will therefore start this section by setting up notation for wavelets on the interval and stating some useful properties of Walsh functions. Then in Sections 6.3 and 6.4 we will estimate $\mu_{k,l}$ , followed by a discussion of the sharpness of this estimate for $\nu=2$ in Section 6.5. We will then finish in Section 6.6 by estimating $||HP_{K}^{M}||_{1\to 2}$ , showing how $\theta$ scales for varying $M$ and $N$ , and proving Theorems 4.7 and 4.8.

6.1 Wavelets on the interval and regularity

In Section 4.2 we introduced orthogonal wavelets on the real line, but we did not make any formal definitions of the wavelets we used at the boundaries of the interval $[0,1)$ . Next we consider the two boundary extensions, periodic and boundary wavelets. To simplify the exposition we define the following sets

[TABLE]

At each scale $j\geq J_{0}$ , the periodic wavelet basis consists of the usual wavelets and scaling functions $\psi_{j,k}$ , $\phi_{j,k}$ for $k\in\Lambda_{\nu,j,\textnormal{mid}}$ and the periodic extended functions $\phi_{j,k}^{\text{per}}$ and $\psi_{j,k}^{\text{per}}$ for $k\in\Lambda_{\nu,j,\textnormal{left}}\cup\Lambda_{\nu,j,\textnormal{right}}$ . These are defined as

[TABLE]

and similarly for $\psi_{j,k}^{\text{per}}$ . Strictly speaking we could have defined these periodic extensions only for $k=0,\ldots,\nu-2$ and $k=2^{j}-\nu+1,\ldots,2^{j}-1$ , but to unify the notation for both boundary extensions we have chosen the former.

Next we have the boundary wavelet basis with $\nu$ vanishing moments. This wavelet basis consists of the same interior wavelets as the periodic basis, but with $2\nu$ boundary scaling and wavelet functions.

[TABLE]

As for the interior functions we also define the scaled versions as

[TABLE]

The names ’left’ and ’right’ correspond to the support of these functions. That is

[TABLE]

for $k=0,\ldots,\nu-1$ .

In the following we shall see that all of our results hold for both periodic and boundary wavelets, but their treatment in some of the proofs differs slightly. To make the treatment as unified as possible we make the following definition.

Definition 6.1.

We say that $\phi_{j,k}^{s}$ , $s\in\{0,1\}$ “originates from a periodic wavelet” if

[TABLE]

We say that $\phi_{j,k}^{s}$ “originates from a boundary wavelet” if

[TABLE]

With these functions defined now for both boundary extensions, the definition of $B_{\textnormal{wave}}^{J_{0},\nu}$ is also clear. Next we make a note on the regularity of these orthogonal wavelets.

Definition 6.2.

Let $\alpha=k+\beta$ , where $k\in\mathbb{Z}_{+}$ and $0<\beta<1$ . A function $f\colon\mathbb{R}\to\mathbb{R}$ is said to be uniformly Lipschitz $\alpha$ if $f$ is $k$ -times continuously differentiable and the $k^{\text{th}}$ derivative $f^{(k)}$ is Hölder continuous with exponent $\beta$ , i.e.

[TABLE]

for some constant $C>0$ .

In particular the Daubechies wavelet with 1 vanishing moment (i.e., the Haar wavelet) is not uniformly Lipschitz as it is not continuous, whereas for $\nu\geq 2$ we have the constants found in Table 1 [13, 239]. For large $\nu$ , $\alpha$ grows as $0.2\nu$ [26, 294]. Also note that each of the boundary functions $\phi_{k}^{\text{left}},\phi_{k}^{\text{right}}$ and $\psi^{\text{left}}_{k},\psi_{k}^{\text{right}}$ is constructed as a finite linear combination of the interior scaling function $\phi$ and wavelet $\psi$ . Thus all of these boundary functions have the same regularity as $\phi$ and $\psi$ .

6.2 Properties of Walsh functions

Definition 6.3.

Let $x=\{x_{i}\}_{i=1}^{\infty}$ and $y=\{y_{i}\}_{i=1}^{\infty}$ be sequences consisting of only binary numbers. That is $x_{i},y_{i}\in\{0,1\}$ for all $i\in\mathbb{N}$ . The operation $\oplus$ applied to these sequences gives

[TABLE]

For two binary numbers $x_{i},y_{i}\in\{0,1\}$ , we let $x_{i}\oplus y_{i}=|x_{i}-y_{i}|$ .
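The rule $x_{i}\oplus y_{i}=|x_{i}-y_{i}|$ is the bitwise exclusive-or applied entry by entry. A minimal sketch on finite truncations of the binary sequences (the truncation and the function name `dyadic_add` are choices made here for illustration):

```python
def dyadic_add(x, y):
    """Entrywise |x_i - y_i| for two equally long lists of bits."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    if any(b not in (0, 1) for b in list(x) + list(y)):
        raise ValueError("entries must be 0 or 1")
    return [abs(a - b) for a, b in zip(x, y)]

x_bits = [1, 0, 1, 1]
y_bits = [1, 1, 0, 1]
z_bits = dyadic_add(x_bits, y_bits)
print(z_bits)
```

On bits, `abs(a - b)` agrees with Python's `a ^ b`, so every sequence is its own inverse under this operation.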

Proposition 6.4.

For $j,m,n\in\mathbb{Z}_{+}$ and $x,y\in[0,1)$ , the Walsh function satisfies the following properties

[TABLE]

Proof.

Equations (6.5) and (6.6) can be found in any standard text on Walsh functions, e.g., [18], whereas the last follows by inserting $j$ zeros in front of $x$ ’s dyadic expansion. ∎

6.3 Bounding the inner product $|\langle\phi_{j,k}^{s},w_{n}\rangle|$

The entries of $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},\nu}]$ consist of $\langle\phi_{j,k}^{s},w_{n}\rangle$ for different values of $j,k,s$ and $n$ . Thus in order to determine the local coherences we need to find an upper bound of this inner product. Next we derive such a bound for $\nu\geq 2$ vanishing moments and discuss its sharpness. For $\nu=1$ we determine the magnitude of each matrix entry explicitly.

Lemma 6.5.

Let $w_{n}\in B_{\textnormal{wh}}$ and let $\phi_{j,k}^{s}\in B_{\textnormal{wave}}^{J_{0},\nu}$ for $\nu\geq 2$ . For $j\geq J_{0}$ , $s\in\{0,1\}$ and $k\in\Lambda_{j}$ we have

[TABLE]

where

[TABLE]

if $\phi_{j,k}^{s}$ originates from a boundary wavelet and

[TABLE]

if $\phi_{j,k}^{s}$ originates from a periodic wavelet.

Proof.

First notice that for any $x\in[0,1)$ we have

[TABLE]

Next, we only consider the interior wavelets $\phi_{j,k}^{s}$ , i.e. $k\in\Lambda_{\nu,j,\textnormal{mid}}$ . For $k\in\Lambda_{\nu,j,\textnormal{left}}\cup\Lambda_{\nu,j,\textnormal{right}}$ , we need to handle the two cases where $\phi_{j,k}^{s}$ originates from a periodic wavelet and from a boundary wavelet separately. The arguments and calculations for the two boundary extensions are analogous. Also, both of these extensions will have support less than $2\nu$ .

For $k\in\Lambda_{\nu,j,\textnormal{mid}}$ , notice that $\textnormal{supp}(\phi_{j,k}^{s})=[2^{-j}(-\nu+1+k),2^{-j}(\nu+k)]$ .

[TABLE]

∎

Lemma 6.6 ([9]).

Let $f:[0,1)\to\mathbb{R}$ be uniformly Lipschitz $\alpha$ with $0<\alpha\leq 1$ . Then

[TABLE]

for $n\in\mathbb{Z}_{+}$ .

Theorem 6.7.

Let $\phi_{l,t}^{s}\in B_{\textnormal{wave}}^{J_{0},\nu}$ with $\nu\geq 3$ and let $w_{n}\in B_{\textnormal{wh}}$ . For $l\geq J_{0}$ and $2^{k}\leq n<2^{k+1}$ with $k\in\mathbb{Z}_{+}$ , we have

[TABLE]

for all $t\in\Lambda_{l}$ and $s\in\{0,1\}$ . For $n=0$ the bound holds with $k=0$ .

Proof.

To obtain the bound above we will combine Lemma 6.5 and Lemma 6.6. We start by arguing that $\phi_{l,t}^{s}$ has the same regularity regardless of boundary extension. Let $a\in\Gamma_{t}$ where $\Gamma_{t}$ is as in Lemma 6.5.

If $\phi_{l,t}^{s}$ originates from a periodic wavelet, $\phi^{s}_{0,-a}\lvert_{[0,1)}$ will have Lipschitz regularity $\alpha>0$ , since both $\phi$ and $\psi$ have this regularity. Next if $\phi_{l,t}^{s}$ originates from a boundary wavelet and $t\in\Lambda_{\nu,l,\text{mid}}$ , $\phi_{0,-a}^{s}\lvert_{[0,1)}$ will have Lipschitz regularity $\alpha$ , by the same argument as above. If $t\in\Lambda_{\nu,l,\text{left}}\cup\Lambda_{\nu,l,\text{right}}$ we know from the construction of the boundary functions [12] that these are finite linear combinations of $\phi_{l,t}$ and $\psi_{l,t}$ . These functions will therefore possess the same regularity $\alpha$ as the interior function.

Next notice from Table 1 that for $\nu\geq 3$ vanishing moments, we know that $\alpha\geq 1$ . Applying Lemma 6.5 and Lemma 6.6 then gives

[TABLE]

where $\Gamma_{t}$ depends on the boundary extension. ∎

Theorem 6.8.

Let $w_{n}\in B_{\textnormal{wh}}$ and let $\phi_{l,t}^{s}\in B_{\textnormal{wave}}^{J_{0},1}$ for $l\geq 0$ and $t\in\Lambda_{l}$ . Then

[TABLE]

Proof.

These equalities can be found in either [5] or [30]. ∎

6.4 Proof of Propositions 4.5, 4.10 and 4.11

Using the above results we are now able to determine the local coherences of $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},\nu}]$ .

Proof of Proposition 4.5.

We use the bound found in Theorem 6.7. Recall that $\mathbf{M}=[2^{J_{0}+1},\ldots,2^{J_{0}+r}]$ and $\mathbf{N}=[2^{J_{0}+1},\ldots,2^{J_{0}-1+r},2^{J_{0}+r+q}]$ . For fixed $l\in\{1,\ldots,r\}$ and $k\in\{2,\ldots,r\}$ we have

[TABLE]

For $l\in\{1,\ldots,r\}$ and $k=1$ we have $N_{0}=0$ . This gives

[TABLE]

∎

Proof of Proposition 4.10.

Since both $B_{\textnormal{wave}}^{J_{0},1}$ and $B_{\textnormal{wh}}$ are orthonormal, $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},1}]$ is an isometry on $\ell^{2}(\mathbb{N})$ i.e. $U^{*}U=I\in\mathcal{B}(\ell^{2}(\mathbb{N}))$ . Let $N=2^{k}$ for some $k\in\mathbb{N}$ with $k\geq J_{0}+1$ . Using Theorem 6.8 we see that

[TABLE]

which means that

[TABLE]

∎

Proof of Proposition 4.11.

We use the bound found in Theorem 6.8. Recall that $\mathbf{M}=\mathbf{N}=[2^{J_{0}+1},\ldots,2^{J_{0}+r}]$ . For fixed $k,l\in\{1,\ldots,r\}$ we have that

[TABLE]

∎

6.5 About the sharpness of the local coherence bounds

As can be seen from Proposition 4.11, the coherence bounds for $\nu=1$ are sharp. However, for $\nu\geq 2$ , we have not discussed their sharpness. In fact, none of the results in this paper consider the case of $\nu=2$ vanishing moments. The reason for this is that these wavelets have a Lipschitz regularity $\alpha\approx 0.55$ , which means that the bound in Theorem 6.7 would have less rapid decay if we had included these wavelets in the theorem. To simplify the presentation we have chosen to exclude them.

We will argue that Theorem 6.7 does not seem to extend to wavelets with $\nu=2$ vanishing moments. Let $\mathbf{M}=\mathbf{N}=[2^{J_{0}+1},\ldots,2^{J_{0}+r}]$ and $U=[B_{\textnormal{wh}},B_{\textnormal{wave}}^{J_{0},\nu}]$ for $\nu\geq 2$ . Notice that setting $\nu=2$ only affects the local coherence estimates $\mu_{k,l}$ for $k\geq l$ . For $k<l$ , the local coherences are unaffected by the regularity of the wavelet. This follows from Lemma 6.5, by setting $|\mathcal{W}\phi^{s}(\cdot+l)(0)|\approx 1$ . Next consider the case where $k\geq l$ . Then Theorem 6.7 suggests that $\mu_{k,l}/\mu_{k+1,l}\approx 4$ for $\nu\geq 3$ .

We now consider Table 2 and notice that for $\nu=2$ , all of the 18 entries in Table 2 have values less than $4$ . This suggests that the bound in Theorem 6.7 does not extend to the case of $\nu=2$ vanishing moments. From the same table we also observe that for $\nu=4$ , the bound in Theorem 6.7 seems to be quite sharp. While there are a few entries that are less than $4$ , most are very close to, if not larger than, this value.

6.6 Proof of remaining results in Section 4

Proof of Proposition 4.4.

This proposition is a consequence of Theorem 1.1 in [20]. Let $\mathcal{S}_{N}=\{w_{n}:n=0,\ldots,N-1\}$ and let $\mathcal{R}_{M}$ be the first $M$ functions in $B_{\textnormal{wave}}^{J_{0},\nu}$ . The subspace cosine angle between $\mathcal{S}_{N}$ and $\mathcal{R}_{M}$ is defined as

[TABLE]

and $P_{\mathcal{S}_{N}}$ is the projection operator onto $\mathcal{S}_{N}$ . As both $B_{\textnormal{wh}}$ and $B_{\textnormal{wave}}^{J_{0},\nu}$ are orthonormal bases, the synthesis and analysis operators are unitary. We therefore have

[TABLE]

Furthermore notice that by equation (5.29) and the definition of the balancing property, we have

[TABLE]

Hence if $U$ satisfies the balancing property of order $\theta\in(0,1)$ for $N$ and $M$ , then $1/\cos(\omega(\mathcal{R}_{M},\mathcal{S}_{N}))\leq 1/\theta$ , where $1/\theta>1$ . Next for $M\in\mathbb{N}$ and $\gamma>1$ we define the stable sampling rate as

[TABLE]

Rearranging the terms we see that if $N$ , $M$ satisfy the stable sampling rate of order $\gamma=1/\theta>1$ then $U$ satisfies the balancing property of order $\theta$ for $N$ and $M$ .

Theorem 1.1 in [20] states that for $M=2^{r}$ , $r\in\mathbb{N}$ and for all $\gamma>1$ there exists a constant $S_{\gamma}>1$ (dependent on $\gamma$ ), such that whenever $N\geq S_{\gamma}M$ , then $1/\cos(\omega(\mathcal{R}_{M},\mathcal{S}_{N}))<\gamma$ . Moreover, we have the relation $\Theta(M,\gamma)\leq S_{\gamma}M=\mathcal{O}(M)$ . Hence if $q=\left\lceil\log_{2}S_{1/\theta}\right\rceil$ we see that the proposition holds with $N=2^{k+q}\geq S_{1/\theta}2^{k}>2^{k}=M$ . ∎
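The choice $q=\lceil\log_{2}S_{1/\theta}\rceil$ can be illustrated with a few numbers. The values of $S$ below are hypothetical placeholders, since the constant $S_{\gamma}$ from [20] is not given explicitly here; the function name is also chosen for illustration.

```python
import math

def oversampling_exponent(S):
    """Smallest integer q with 2**q >= S, for a constant S > 1."""
    if S <= 1:
        raise ValueError("S must be larger than 1")
    return math.ceil(math.log2(S))

k = 5                                   # so that M = 2**k
M = 2 ** k
results = {}
for S in [1.5, 2.0, 3.7, 8.0]:          # hypothetical values of S_{1/theta}
    q = oversampling_exponent(S)
    results[S] = (q, 2 ** (k + q))      # (q, N)
print(results)
```

In each case $N=2^{k+q}$ is the smallest power of two that is at least $S\cdot 2^{k}$, so $N$ grows linearly in $M$ for fixed $\theta$.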

Proof of Proposition 4.6.

Using Theorem 6.7, we see that $\mu(P_{N}UP_{K}^{\perp})\lesssim K^{-1}$ . This gives

[TABLE]

∎

Proof of Theorem 4.7.

First recall that $\mathbf{M}=[2^{J_{0}+1},\ldots,2^{J_{0}+r}]$ and $\mathbf{N}=[2^{J_{0}+1},\ldots,2^{J_{0}+r-1},2^{J_{0}+r+q}]$ where $q\geq 0$ is chosen so that $G$ satisfies the balancing property of order $0<\theta<1$ . From Lemma 3.3 we therefore have $\|G^{-1}\|_{2}\leq 1/\sqrt{\theta}$ .

From Theorem 3.6 we know that the matrix $A$ in equation (3.11) satisfies the G-RIPL with $\delta_{\boldsymbol{s},\mathbf{M}}\leq\delta$ , provided the sample densities $\boldsymbol{m}\in\mathbb{N}^{r}$ satisfy $m_{k}=N_{k}-N_{k-1}$ for $k=1,\ldots,r_{0}$ , and

[TABLE]

for $k=r_{0}+1,\ldots,r$ . Next notice that $N_{k}-N_{k-1}=2^{J_{0}+k-1}$ for $k=2,\ldots,r-1$ , while $N_{r}-N_{r-1}=2^{J_{0}+r}(2^{q}-2^{-1})$ and $N_{1}-N_{0}=2^{J_{0}+1}$ . Using the local coherences $\mu_{k,l}$ from Proposition 4.5 we obtain

[TABLE]

Inserting this and $\|G^{-1}\|_{2}^{2}\leq\theta^{-1}$ into (6.13) leads to the sampling condition in Theorem 4.7. ∎
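The level sizes $N_{k}-N_{k-1}$ used in the proof above follow directly from the definition of $\mathbf{N}$ with $N_{0}=0$. A short check, with `J0`, `r` and `q` set to arbitrary illustrative values:

```python
def sampling_levels(J0, r, q):
    """N = [2^(J0+1), ..., 2^(J0+r-1), 2^(J0+r+q)] as in the proof."""
    return [2 ** (J0 + k) for k in range(1, r)] + [2 ** (J0 + r + q)]

J0, r, q = 2, 4, 1
N = sampling_levels(J0, r, q)
# Differences N_k - N_{k-1} with N_0 = 0.
diffs = [b - a for a, b in zip([0] + N[:-1], N)]

# Closed forms stated in the proof.
first = 2 ** (J0 + 1)
middle = [2 ** (J0 + k - 1) for k in range(2, r)]
last = 2 ** (J0 + r) * (2 ** q - 0.5)
print(N, diffs)
```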

Proof of Theorem 4.8.

The theorem is identical to Corollary 3.7, except that we have fixed $\mathbf{M}$ and $\mathbf{N}$ . The concrete values for these have been inserted in condition $(iv)$ together with the local coherences $\mu_{k,l}$ . The computation of this can be found in the proof above. ∎

Acknowledgements

The authors would like to thank Simone Brugiapaglia, Simon Foucart, Remi Gribonval, Øyvind Ryan and Laura Thesing for useful discussions and comments. BA acknowledges support from the Natural Sciences and Engineering Research Council of Canada through grant 611675. ACH acknowledges support from the UK Engineering and Physical Sciences Research Council (EPSRC) grant EP/L003457/1, a Royal Society University Research Fellowship, and the Philip Leverhulme Prize (2017).

Bibliography

[1] B. Adcock, C. Boyer, and S. Brugiapaglia. On oracle-type local recovery guarantees in compressed sensing. arXiv preprint arXiv:1806.03789, 2018.
[2] B. Adcock and A. C. Hansen. Generalized sampling and infinite-dimensional compressed sensing. Foundations of Computational Mathematics, 16(5):1263–1323, 2016.
[3] B. Adcock, A. C. Hansen, C. Poon, and B. Roman. Breaking the coherence barrier: A new theory for compressed sensing. In Forum of Mathematics, Sigma, volume 5. Cambridge University Press, 2017.
[4] B. Adcock, A. C. Hansen, and B. Roman. A note on compressed sensing of structured sparse wavelet coefficients from subsampled Fourier measurements. IEEE Signal Processing Letters, 23(5):732–736, 2016.
[5] V. Antun. Coherence estimates between Hadamard matrices and Daubechies wavelets, 2016. Master’s thesis, University of Oslo.
[6] G. R. Arce, D. J. Brady, L. Carin, H. Arguello, and D. S. Kittle. Compressive coded aperture spectral imaging: An introduction. IEEE Signal Processing Magazine, 31(1):105–115, 2014.
[7] A. Bastounis and A. C. Hansen. On the absence of uniform recovery in many real-world applications of compressed sensing and the restricted isometry property and nullspace property in levels. SIAM Journal on Imaging Sciences, 10(1):335–371, 2017.
[8] S. Brugiapaglia and B. Adcock. Robustness to unknown error in sparse regularization. IEEE Transactions on Information Theory, 64(10):6638–6661, 2018.
