Uniform recovery in infinite-dimensional compressed sensing and applications to structured binary sampling
Ben Adcock, Vegard Antun, Anders C. Hansen

TL;DR
This paper establishes uniform recovery guarantees for infinite-dimensional compressed sensing with structured sparsity, introducing multilevel sampling schemes and demonstrating their effectiveness in binary Walsh sampling applications.
Contribution
It provides the first uniform recovery guarantees for infinite-dimensional compressed sensing with local sparsity in levels and multilevel sampling, applicable to binary Walsh sampling.
Findings
Recovery guarantees are sharp up to log factors.
Improves existing results for unweighted $\ell^1$-regularization.
First guarantees for Walsh transform with wavelet bases in binary sampling.
Ben Adcock (Simon Fraser University, Canada)
Vegard Antun (University of Oslo, Norway; corresponding author, [email protected])
Anders C. Hansen (University of Cambridge, United Kingdom; University of Oslo, Norway)
Abstract
Infinite-dimensional compressed sensing deals with the recovery of analog signals (functions) from linear measurements, often in the form of integral transforms such as the Fourier transform. This framework is well-suited to many real-world inverse problems, which are typically modelled in infinite-dimensional spaces, and where the application of finite-dimensional approaches can lead to noticeable artefacts. Another typical feature of such problems is that the signals are not only sparse in some dictionary, but possess a so-called local sparsity in levels structure. Consequently, the sampling scheme should be designed so as to exploit this additional structure. In this paper, we introduce a series of uniform recovery guarantees for infinite-dimensional compressed sensing based on sparsity in levels and so-called multilevel random subsampling. By using a weighted $\ell^1$-regularizer we derive measurement conditions that are sharp up to log factors, in the sense that they agree with those of certain oracle estimators. These guarantees also apply in finite dimensions, and improve existing results for unweighted $\ell^1$-regularization. To illustrate our results, we consider the problem of binary sampling with the Walsh transform using orthogonal wavelets. Binary sampling is an important mechanism for certain imaging modalities. Through carefully estimating the local coherence between the Walsh and wavelet bases, we derive the first known recovery guarantees for this problem.
Keywords:
Infinite-dimensional compressed sensing, uniform recovery, Walsh sampling, wavelet recovery, sparsity in levels, local coherence
Mathematics Subject Classification (2010):
94A20, 42C40, 42C10, 15B52
1 Introduction
Compressive sensing (CS), introduced by Candes, Romberg & Tao in [10] and Donoho in [14], has been an area of substantial research during the last decade. The key assumption, which lays the foundation for this field of research, is that a sparse vector can be recovered from an underdetermined system of linear equations, using, for instance, convex optimization algorithms [15, 16].
Imaging has been one of the most successful areas of application of CS. However, in this area, the sparsity assumption is typically too general. Examples include all applications using Fourier samples – such as Magnetic Resonance Imaging (MRI) [22, 24, 25], surface scattering [21], Computerized Tomography (CT) and electron microscopy – as well as applications using binary sampling, e.g. fluorescence microscopy [29], lensless imaging [33] and numerous other optical imaging modalities [6, 17, 32]. Natural images, when sparsified via a wavelet (or more generally, $X$-let) transform, are not only sparse, but have a specific sparsity structure [3, 27]. For wavelets, which will be our sparsifying transform in this paper, natural images have coefficients where most of the large entries are concentrated at the coarse scales, and progressively fewer at the fine scales (termed asymptotic sparsity in [3]).
In the presence of structured sparsity, it is natural to ask how best to promote this additional structure. In [3] it was proposed to do this via the sampling operator. Wavelets partition Fourier space into dyadic bands corresponding to distinct scales. Hence, by choosing Fourier samples in these bands corresponding to the local sparsities, one obtains a structured sampling scheme – a so-called multilevel sampling scheme – which promotes the asymptotic sparsity structure. The practical benefits of such schemes have been demonstrated in [27] for various imaging modalities, including MRI, Nuclear Magnetic Resonance (NMR) spectroscopy, fluorescence microscopy and Helium Atom Scattering. Theoretical analysis has been presented in [3] (nonuniform recovery) and [7, 23] (uniform recovery in the finite-dimensional setting).
1.1 Main results
This paper has two main objectives. First, we generalize existing uniform recovery guarantees [7, 23] from the finite-dimensional to the infinite-dimensional setting. This extension is important for practical imaging. Although much of the compressive imaging literature considers the recovery of discrete images (i.e. finite-dimensional arrays) from discrete measurements (e.g. the discrete Fourier transform), modalities such as MRI, NMR and others are naturally analog, and hence better modelled over the continuum (i.e. functions, and the continuous Fourier transform). Indeed, as we will see in Section 2.3, discretizing such a problem leads to measurement mismatch [11], and in the case of wavelet recovery, the wavelet crime [28, p. 232], both of which can introduce artefacts in the reconstruction [19]. In this paper, we consider signals as functions and work with continuous integral transforms, thus avoiding these pitfalls.
In our theoretical analysis, we also improve the uniform recovery guarantee given in previous works [7, 23]. Unlike previous results, our recovery guarantees are, up to log factors, optimal: specifically, they agree with those of the oracle least-squares estimator based on a priori knowledge of the support [1]. We do this by replacing the standard $\ell^1$-minimization decoder by a certain weighted $\ell^1$-minimization decoder, an idea originally proposed in [31].
Our second objective is to consider binary sampling. Previous works have addressed the case of (discrete or continuous) Fourier sampling. Yet many imaging modalities, e.g. fluorescence microscopy and lensless imaging, require binary sampling operators. To do so, we replace the Fourier transform
[TABLE]
by the binary Walsh transform
[TABLE]
where , denote the Walsh functions. This is a widely used sampling operator in binary imaging [29, 33], and often goes under the name of Hadamard sampling in the discrete case. Working with this continuous transform, we provide analogous guarantees for binary sampling to those for Fourier sampling. As a side note, we remark that working in the continuous setting also simplifies the analysis (specifically, the derivation of so-called local coherence estimates) over working directly with the discrete setup.
We note that in this paper we only consider recovery guarantees for one-dimensional functions. We expect that the setup for higher-dimensional functions will deviate slightly from what we present here, and we leave this discussion for future work.
The outline of the remainder of this paper is as follows. We commence in Section 2 by reviewing previous work, and in particular, the existing finite-dimensional theory. We then introduce an abstract infinite-dimensional model for isometries acting on $\ell^2(\mathbb{N})$ in Section 3. Here we will derive sufficient conditions for such operators to provide uniform recovery guarantees. In Section 4 we continue this work by finding conditions under which the cross-Gramian between a wavelet and Walsh basis satisfies these conditions. Finally, in Sections 5, 6 and 6.6 we present proofs of our main results.
2 Sparsity in levels in finite dimensions
2.1 Notation
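For $N \in \mathbb{N}$ and $\Omega \subseteq \{1,\ldots,N\}$ we let $P_{\Omega} \in \mathbb{C}^{N \times N}$ denote the projection onto the linear span of the associated subset of the canonical basis, i.e. for $x \in \mathbb{C}^N$, we have $(P_{\Omega}x)_i = x_i$ if $i \in \Omega$ and $(P_{\Omega}x)_i = 0$ if $i \notin \Omega$. Sometimes, we will abuse this notation slightly by assuming $P_{\Omega} \in \mathbb{C}^{|\Omega| \times N}$, and discard all the zero entries in $P_{\Omega}x$. Whether we mean $P_{\Omega} \in \mathbb{C}^{|\Omega| \times N}$ or $P_{\Omega} \in \mathbb{C}^{N \times N}$ will be clear from the context. If $\Omega = \{N_{k-1}+1,\ldots,N_k\}$ we simply write $P_{N_k}^{N_{k-1}}$, and simply $P_{N_k}$ if $N_{k-1} = 0$.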
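We call a vector $x$ $s$-sparse if $|\mathrm{supp}(x)| \leq s$, where $\mathrm{supp}(x) = \{i : x_i \neq 0\}$. We write $a \lesssim b$ if there exists a constant $C > 0$, independent of all relevant parameters, such that $a \leq Cb$, and similarly for $a \gtrsim b$.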
2.2 Finite model
Let be a measurement matrix, e.g. a Fourier or Hadamard matrix, denoted and , respectively, and let with . In a typical finite-dimensional CS setup we consider the recovery of a signal from measurements , where is a vector of measurement error. If is sparse in a discrete wavelet basis, one then recovers its coefficients by solving the optimization problem
[TABLE]
where is a discrete wavelet transform and is a noise parameter. Usually one would scale so that it becomes orthonormal and choose an orthonormal wavelet basis, so that the matrix acts as an isometry on .
Suppose that is indeed an isometry. To obtain a uniform recovery guarantee for the above system, one typically first shows that the matrix , with , satisfies the Restricted Isometry Property (RIP) with high probability.
Definition 2.1 (RIP).
Let and . The Restricted Isometry Constant (RIC) of order is the smallest such that
[TABLE]
where denotes the set of -sparse vectors in . If we say that has the Restricted Isometry Property (RIP) of order .
Theorem 2.2 ([16, Thm. 6.12]).
Suppose the RIC of a matrix satisfies . Then for any and with , any solution of
[TABLE]
satisfies
[TABLE]
where are constants dependent on only and .
For an isometry the question of whether or not satisfies the RIP is related to the so-called coherence of :
Definition 2.3 (Coherence).
Let be an isometry. The coherence of is
[TABLE]
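As a small numerical illustration of Definition 2.3, the following sketch computes the coherence of two unitary matrices. It assumes the convention $\mu(U) = \max_{i,j} |U_{ij}|^2$, which is not reproduced in the display above and is therefore an assumption here; the function name `coherence` is hypothetical.

```python
import numpy as np

def coherence(U):
    """Coherence of a matrix, ASSUMING the convention mu(U) = max_{i,j} |U_ij|^2."""
    return float(np.max(np.abs(U) ** 2))

N = 16
# Unitary DFT matrix: every entry has modulus 1/sqrt(N), so mu = 1/N (incoherent).
dft = np.fft.fft(np.eye(N), norm="ortho")
# Identity matrix: a single entry carries all the mass in each row, so mu = 1.
identity = np.eye(N)

mu_dft = coherence(dft)
mu_id = coherence(identity)
```

Under this convention the coherence of an isometry lies between $1/N$ and $1$, and the two matrices above attain the two extremes.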
Theorem 2.4 ([16, Thm. 12.32]).
Let be an isometry and let . Suppose where each is chosen uniformly and independently at random from the set . If
[TABLE]
then with probability the matrix , with , satisfies the RIP of order with .
(We slightly abuse notation here in that we allow for possible repeats of the values that make up ). Thus if the coherence we obtain the RIP of order using approximately measurements up to constants and log factors.
There are, however, two problems with this approach. First, in our setup, where is the product of a Fourier or Hadamard matrix and a discrete wavelet transform, the coherence . Hence satisfying the RIP requires at least measurements. Second, the RIP asserts recovery for all -sparse vectors of wavelet coefficients, and thus does not exploit any additional structure these coefficients possess. However, as stated, wavelet coefficients are highly structured: large wavelet coefficients tend to cluster at coarse scales, with coefficients at fine scales being increasingly sparse.
Motivated by this, the following structured sparsity model was introduced in [3]:
Definition 2.5 (Sparsity in levels).
Let , , with and let with , for . We say that the vector is sparse in levels if
[TABLE]
In which case we call , -sparse, where and are called the local sparsities and sparsity levels, respectively. We denote the set of all -sparse vectors by .
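To make Definition 2.5 concrete, here is a minimal sketch that checks whether a vector is sparse in levels. It assumes the condition is that the support of $x$ has at most $s_l$ entries in the $l$-th level $\{M_{l-1}+1,\ldots,M_l\}$, with $M_0 = 0$; the function name `is_sparse_in_levels` and the test vectors are hypothetical.

```python
import numpy as np

def is_sparse_in_levels(x, M, s):
    """Check whether x is (s, M)-sparse.

    M = (M_1, ..., M_r) are the sparsity levels (M_0 = 0 is implied) and
    s = (s_1, ..., s_r) are the local sparsities. Level l covers the 1-based
    indices M_{l-1}+1, ..., M_l, i.e. the 0-based slice x[M_{l-1}:M_l].
    """
    x = np.asarray(x)
    bounds = [0] + list(M)
    # Assumption: no nonzero entries are allowed beyond the last level M_r.
    if np.count_nonzero(x[bounds[-1]:]) > 0:
        return False
    for l, s_l in enumerate(s):
        level = x[bounds[l]:bounds[l + 1]]
        if np.count_nonzero(level) > s_l:
            return False
    return True

M = (2, 4, 8)   # three levels: {1,2}, {3,4}, {5,...,8}
s = (2, 1, 1)   # at most 2, 1 and 1 nonzeros per level
x_ok = np.array([3.0, -1.0, 0.0, 0.5, 0.0, 0.0, 0.2, 0.0])
x_bad = np.array([3.0, -1.0, 0.4, 0.5, 0.0, 0.0, 0.2, 0.0])  # two nonzeros in level 2

ok = is_sparse_in_levels(x_ok, M, s)
bad = is_sparse_in_levels(x_bad, M, s)
```

Note that `x_bad` is 5-sparse in the classical sense but violates the level-2 budget, which is exactly the distinction the structured model draws.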
As noted above, randomly subsampling an isometry is a poor measurement protocol for coherent problems such as Fourier–Wavelets. Instead, in [3] it was proposed to sample in the following structured way:
Definition 2.6 (Multilevel random subsampling).
Let , where and with for , and . For each , let if and if not, let be chosen uniformly and independently from the set , and set . If we refer to as an -multilevel subsampling scheme.
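The sampling procedure of Definition 2.6 can be sketched as follows. The sketch assumes sampling levels $N = (N_1,\ldots,N_r)$ with $N_0 = 0$ and local sample counts $m = (m_1,\ldots,m_r)$ with $m_k \leq N_k - N_{k-1}$; a level is taken in full when it is saturated, and otherwise $m_k$ indices are drawn uniformly and independently (so repeats may occur). The function name `multilevel_sampling` and the parameter values are hypothetical.

```python
import numpy as np

def multilevel_sampling(N, m, rng):
    """Draw an (N, m)-multilevel subsampling scheme.

    Returns a list with one array of 1-based indices per sampling level.
    """
    bounds = [0] + list(N)
    omega = []
    for k, m_k in enumerate(m):
        lo, hi = bounds[k] + 1, bounds[k + 1]   # level k is {lo, ..., hi}
        size = hi - lo + 1
        if m_k > size:
            raise ValueError("m_k must not exceed N_k - N_{k-1}")
        if m_k == size:
            # Saturated level: take every index deterministically.
            omega.append(np.arange(lo, hi + 1))
        else:
            # i.i.d. uniform draws from the level; repeats are possible.
            omega.append(rng.integers(lo, hi + 1, size=m_k))
    return omega

rng = np.random.default_rng(0)
N = (4, 8, 16, 32)
m = (4, 4, 3, 2)    # first two levels saturated, last two subsampled
omega = multilevel_sampling(N, m, rng)
```

Saturating the first levels and thinning out the later ones mirrors the typical imaging situation in which low frequencies are fully sampled.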
For this structured model, the following extension of the RIP was first introduced in [7].
Definition 2.7 (RIPL).
Let be given local sparsities and sparsity levels, respectively. For a matrix the Restricted Isometry Constant in Levels (RICL) of order , denoted , is the smallest such that
[TABLE]
We say that has the Restricted Isometry Property in Levels (RIPL) if .
We shall see that this leads to uniform recovery of all -sparse vectors, but first we define the best -term approximation error of . That is
[TABLE]
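The best approximation error can be computed level by level: within each level one keeps the $s_l$ largest entries in magnitude and sums the rest. The sketch below assumes the error is measured in the $\ell^1$-norm, i.e. $\sigma_{s,M}(x)_1 = \min\{\|x - z\|_1 : z \text{ is } (s,M)\text{-sparse}\}$; this form is an assumption, as the display above is not reproduced, and the function name `sigma_sM` is hypothetical.

```python
import numpy as np

def sigma_sM(x, M, s):
    """Best (s, M)-term approximation error in the l1-norm (assumed form).

    For each level, all but the s_l largest-magnitude entries are discarded;
    the error is the l1-norm of everything that is discarded.
    """
    x = np.asarray(x)
    bounds = [0] + list(M)
    err = 0.0
    for l, s_l in enumerate(s):
        mags = np.sort(np.abs(x[bounds[l]:bounds[l + 1]]))  # ascending order
        n_discard = max(len(mags) - s_l, 0)
        err += mags[:n_discard].sum()
    # Entries beyond the last level cannot be kept by any (s, M)-sparse vector.
    err += np.abs(x[bounds[-1]:]).sum()
    return float(err)

M = (2, 4, 8)
s = (2, 1, 1)
x = np.array([3.0, -1.0, 0.4, 0.5, 0.0, 0.0, 0.2, 0.1])
err = sigma_sM(x, M, s)   # discards 0.4 (level 2) and 0.1 (level 3)
```

For this vector the error is $0.4 + 0.1 = 0.5$, and it vanishes exactly when the vector is itself sparse in levels.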
Theorem 2.8 ([7, Thm. 4.4]).
Let be local sparsities and sparsity levels, respectively. Let and . Suppose that the RICL for the matrix satisfies
[TABLE]
Then, for and with , any solution of
[TABLE]
satisfies
[TABLE]
where are constants which depend only on .
In [23] the authors investigated conditions under which a subsampled isometry satisfies the RIPL. It was shown that the number of samples required to satisfy the RIPL is related to the so-called local coherence properties of :
Definition 2.9.
Let be an isometry and be given sampling and sparsity levels. The local coherence of is
[TABLE]
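As an illustration of Definition 2.9, the following sketch tabulates local coherences for a toy isometry. It assumes the $(k,l)$-th local coherence is the maximum of $|U_{ij}|^2$ over rows $i$ in the $k$-th sampling level and columns $j$ in the $l$-th sparsity level; this form, the function name `local_coherences` and the block-diagonal toy matrix are all assumptions made for illustration.

```python
import numpy as np

def local_coherences(U, N, M):
    """Matrix of local coherences mu_{k,l} (assumed form: max |U_ij|^2 per block).

    N = (N_1, ..., N_r) are sampling levels (rows), M = (M_1, ..., M_r) are
    sparsity levels (columns); N_0 = M_0 = 0 is implied.
    """
    row_bounds = [0] + list(N)
    col_bounds = [0] + list(M)
    mu = np.zeros((len(N), len(M)))
    for k in range(len(N)):
        for l in range(len(M)):
            block = U[row_bounds[k]:row_bounds[k + 1],
                      col_bounds[l]:col_bounds[l + 1]]
            mu[k, l] = np.max(np.abs(block) ** 2)
    return mu

# Toy unitary matrix: a 2x2 Hadamard block followed by a 4x4 unitary DFT block.
U = np.zeros((6, 6), dtype=complex)
U[:2, :2] = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
U[2:, 2:] = np.fft.fft(np.eye(4), norm="ortho")

mu = local_coherences(U, N=(2, 6), M=(2, 6))
```

Here the global maximum of $|U_{ij}|^2$ is $1/2$, yet the second diagonal block has local coherence $1/4$ and the off-diagonal blocks have local coherence zero, which is the kind of structure a level-wise analysis can exploit.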
Theorem 2.10 ([23, Thm. 3.2]).
Let be an isometry. Let , , and . Let be an -multilevel random subsampling scheme. Let and . Suppose that the s satisfy
[TABLE]
and
[TABLE]
for . Then the matrix
[TABLE]
satisfies the RIPL of order with constant .
This theorem characterizes the number of local measurements needed to ensure uniform recovery explicitly in terms of local sparsities and local coherences . In particular, if the local coherences are suitably well-behaved, then recovery may still be possible from highly subsampled measurements, even though the global coherence may be high (see next). Note that the condition (2.3), whereby the first sampling levels are saturated, models practical imaging scenarios where the low Fourier frequencies are typically fully sampled.
To illustrate this theorem, in [4] the authors consider the one-dimensional discrete Fourier sampling problem with sparsity in Haar wavelets. For the Haar wavelet basis we choose an ordering where the first level consists of the scaling function and mother wavelet and the subsequent levels are chosen so that consists of the wavelets at scale . This gives the sparsity levels
[TABLE]
where (assumed to be an integer). Next we define the entries in the Fourier matrix as
[TABLE]
where we have started the ordering of the rows with negative indices for convenience. We define the sampling levels for the frequencies in dyadic bands with and
[TABLE]
Notice that for a suitable reordering of the rows of these bands correspond to the sampling levels .
Theorem 2.11 ([23, Cor. 3.3]).
Let for some and let , where is the Haar wavelet matrix. Let and let . Let and . For each suppose we draw Fourier samples from band randomly and independently, where
[TABLE]
Then with probability at least the matrix (2.5) satisfies the RIPL with constant .
Here, for convenience, we have taken ; see [23] for further discussion on this point.
2.3 Shortcomings
These results have two primary shortcomings, which we now discuss in further detail. The key issue is that they are limited to finite dimensions. As noted in Section 1, applying finite-dimensional recovery procedures to analog problems can result in artefacts. For simplicity, let . We have argued that analog signals should be modelled as elements in , rather than . Yet, above we have tried to use discrete tools for recovering the signal by replacing and with and , respectively. Next we argue that this construction leads to both measurement mismatch and the wavelet crime.
Let denote step functions on the interval and set . We see that replacing with is equivalent to replacing by e.g. for some , since . Clearly, will be a poor approximation to . We refer to this as measurement mismatch.
Next let denote a scaling function and wavelet, respectively, and set for . By construction the solution of (2.1) will be the coefficients of a function written in a basis consisting of both wavelets and scaling functions. Equivalently we can represent in the basis using the coefficients . The wavelet crime is whenever we let , represent pointwise samples of i.e. .
What does this mean for reconstruction? To illustrate the issue we provide a similar example to the first numerical simulation in [2], showing how finite-dimensional compressed sensing fails to recover even a function that is 1-sparse (meaning it has only one non-zero coefficient) in its wavelet decomposition. Indeed, in Figure 1 we consider the problem of recovering a function from samples of the continuous Walsh transform. In particular, we choose , where is the Daubechies scaling function, corresponding to the wavelet with four vanishing moments. Figure 1 shows the poor performance of CS using the discrete finite-dimensional setup when applied to a continuous problem. Conversely, the infinite-dimensional CS approach, which we develop in the next sections, gives a much higher fidelity reconstruction from exactly the same samples as used in the finite-dimensional case. In fact, the infinite-dimensional CS reconstruction recovers perfectly up to numerical errors occurring from solving the optimization problem. We also observe the slightly paradoxical phenomenon in the finite-dimensional case: more samples do not improve performance. This is due to the fact that the finite-dimensional CS solution with full sampling coincides with the truncated Walsh series (direct inversion) approximation. This approximation is clearly highly suboptimal, as demonstrated in Figure 1.
We note in passing that the above crimes stem from too early a discretization of the inverse problem. Our infinite-dimensional CS approach replaces by a finite section of an isometry representing the change of basis between the continuous Fourier or Walsh transform and the wavelet basis.
On a related note, even if one were to ignore the above issues, estimating the local coherences in the discrete setting for anything but the Haar wavelet becomes extremely complicated. Conversely, by moving to the continuous setting, these estimates become much easier to derive. We do this later in the paper for arbitrary Daubechies’ wavelets with the Walsh transform.
The second shortcoming relates to Theorem 2.8. It says that we can guarantee recovery of all sparse signals provided the matrix satisfies the RIPL with constant
[TABLE]
Here is the number of levels and is the sparsity ratio. Inserting the above inequality into Theorem 2.10 gives a sampling condition of the form
[TABLE]
where denotes the log factors. This means that the sparsity ratio will affect the sampling condition in all sampling levels. Thus for signals where we expect the local sparsities to vary greatly from level to level (e.g. wavelets) this will lead to an unreasonably high number of samples.
To overcome this problem, using an idea from [31], we replace the $\ell^1$-regularizer in the optimization problem (2.1) with a weighted $\ell^1$-regularizer. For a suitable choice of weights, this removes the factor of in the various measurement conditions. As we show, these guarantees are optimal up to constants and log factors.
3 Extensions to infinite dimensions
3.1 Setup
We will continue with the notation we introduced above, extended to infinite dimensions. That is, we assume that the signal is an element of . We still let denote the projection onto the canonical basis, but we now let it be an element in either or . Similarly we call a vector -sparse if is -sparse and . Here and we refer to it as the sparsity bandwidth of . For an isometry we define the coherence of as .
Next we describe the setup for a general sampling basis and a sparsifying basis , both assumed to be orthonormal bases of . In Section 4, we will specialize this so that is the Walsh sampling basis and is a wavelet sparsifying basis. This will enable us to derive concrete recovery guarantees for . The setup below is, however, completely general.
For the two bases and we can represent using the coefficients and , respectively. To change the representation from to we define the following matrix.
Definition 3.1.
Let and be orthonormal bases for . The change of basis matrix between and is the infinite matrix with entries
[TABLE]
We will denote this matrix by .
Notice in particular that since and are orthonormal, is an isometry on and we can write .
Next let be a given multilevel random sampling scheme with . We refer to as the sampling bandwidth of (as discussed later, this will be chosen in terms of sampling bandwidth to ensure stable truncation of ). Now define the matrix
[TABLE]
and we use the slightly unusual notation for the operators . Due to the scaling factors we consider scaled noisy measurements
[TABLE]
where is a diagonal matrix with the corresponding scaling factors found in along the diagonal and is the measurement noise.
Suppose that is approximately -sparse with sparsity bandwidth . It is tempting to form the finite matrix and solve the minimization problem
[TABLE]
However, note that the truncation of to introduces an additional truncation error . Indeed,
[TABLE]
and this poses a problem since for the above decoder we require in order for to be a feasible point. For some applications we might have a rough estimate of , but any estimate of would require a priori knowledge of , the signal we are trying to recover. This is generally impossible. (We note in passing that there is some recent work [8] which derives CS recovery guarantees in the absence of feasibility of the target vector , but the application of this work to the sparse in levels model is not clear).
To overcome this issue, we will introduce a data fidelity parameter and assume we know so that we can let . Then there will always exist a such that lies in the feasible set corresponding to the augmented matrix
[TABLE]
for all . In practice (for the general case) it will also be impossible to determine a sufficient value for , but for fixed there will always exist such a . It should, however, be noted that there are special cases, such as Walsh sampling and wavelet recovery, where sufficient values for are known; see Remark 4.9.
This aside, as previously mentioned, we also now modify the optimization problem to include weights. Specifically, let be given sparsity levels and local sparsities respectively. For positive weights we define
[TABLE]
with for . Notice that this weighted regularizer assigns constant weights on each sparsity level. With this in hand, our recovery procedure is
[TABLE]
with as in (3.3) and .
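The weighted regularizer described above is constant on each sparsity level, so it can be evaluated as a weighted sum of level-wise $\ell^1$-norms. The sketch below assumes the form $\|x\|_{1,\omega} = \sum_{l=1}^{r} \omega_l \|x_{(l)}\|_1$, where $x_{(l)}$ denotes the entries of $x$ in the $l$-th level; the notation $x_{(l)}$, the function name `weighted_l1_norm` and the example weights are hypothetical.

```python
import numpy as np

def weighted_l1_norm(x, M, weights):
    """Weighted l1-norm with one positive weight per sparsity level (assumed form).

    M = (M_1, ..., M_r) are the sparsity levels (M_0 = 0 implied) and
    weights = (w_1, ..., w_r) the level weights.
    """
    x = np.asarray(x)
    bounds = [0] + list(M)
    total = 0.0
    for l, w in enumerate(weights):
        total += w * np.abs(x[bounds[l]:bounds[l + 1]]).sum()
    return float(total)

x = np.array([1.0, -2.0, 3.0, 0.0])
M = (2, 4)
value = weighted_l1_norm(x, M, weights=(1.0, 2.0))   # 1*(1+2) + 2*(3+0) = 9
unweighted = weighted_l1_norm(x, M, weights=(1.0, 1.0))
```

With all weights equal to one, the function reduces to the ordinary $\ell^1$-norm, which is the unweighted decoder discussed in Section 2.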
3.2 The balancing property
We now discuss the relation between the sampling and sparsity bandwidths and . From generalized sampling theory [2] we know that we must choose to obtain a stable mapping between the first sampling basis functions and the first sparsity basis functions. The degree of stability for this solution will depend on the so-called balancing property:
Definition 3.2.
Let be an isometry. Let and . Then has the balancing property with constant if
[TABLE]
Note that the balancing property may not hold for any . However, it always holds for sufficiently large (for fixed ). Indeed, in the operator norm, hence the balancing property holds with arbitrarily close to for large enough .
Below we shall see that this property will also affect our recovery guarantees, but it will be camouflaged as the quantity , where . This gives the following relation.
Lemma 3.3.
Let be an isometry satisfying the balancing property of order for . Let be self-adjoint and nonnegative definite. Then is invertible and
[TABLE]
3.3 $G$-adjusted Restricted Isometry Property in Levels (G-RIPL)
Our theoretical analysis requires a RIP-type property for the matrix . However, as implied in the previous discussion, the finite matrix (from which is constructed) is not an isometry for any . In particular, unlike in finite dimensions is not the identity. In order to handle this situation, we introduce the following generalization of the RIP:
Definition 3.4 (G-RIPL).
Let , be invertible, be sparsity levels and be local sparsities. The $G$-adjusted Restricted Isometry Constant in Levels (G-RICL) is the smallest such that
[TABLE]
If we say that the matrix satisfies the $G$-adjusted Restricted Isometry Property in Levels (G-RIPL) of order .
The G-RIPL is of course completely general and can be stated for any . However, in the following we will let and show that the matrix (or equivalently, – note that consists of vectors with ) satisfies the G-RIPL for this particular .
First, however, we show that the G-RIPL implies uniform recovery. For this, we introduce the following notation:
[TABLE]
Notice in particular that for the choice we have and for the choice we have . Finally, we let denote the condition number of .
Theorem 3.5.
Let , with and let be given sparsity levels and local sparsities, respectively. Let be positive weights. Suppose satisfies the G-RIPL of order with constant and
[TABLE]
Let
[TABLE]
Let , , with and set . Then any solution of the optimization problem
[TABLE]
satisfies
[TABLE]
where , and .
Notice that the condition on in the above theorem is fundamentally different from the condition found in Theorem 2.8. In the latter one requires where is the sparsity ratio. Thus for sparsity levels where the local sparsities vary greatly, this bound will be unreasonably small.
In the above theorem we have removed this sparsity ratio term, by setting , and require where . For the unweighted case this leads to a condition of the form
[TABLE]
which could be difficult to fulfill in practice, since each would have to be greater than the total sparsity of the signal. However, by considering the weights we obtain a condition of the form
[TABLE]
where is independent of for . This means that we can write the requirement as , and ignore any dependence between the -values, as was the problem in Theorem 2.8.
3.4 Sufficient condition for the G-RIPL
In Definition 2.9 we defined the local coherence of an isometry . We extend this to isometries in the exact same way
[TABLE]
This yields the following theorem.
Theorem 3.6 (Subsampled isometries and the G-RIPL).
Let be an isometry, and let be an -multilevel sampling scheme with levels. Let be sparsity levels and local sparsities, respectively. Let and let , with . Let and . Suppose is non-singular. If
[TABLE]
and
[TABLE]
for then with probability at least , the matrix
[TABLE]
satisfies the G-RIPL of order with constant .
3.5 Overall recovery guarantee
Theorem 3.5 and Theorem 3.6 yield the next results.
Corollary 3.7.
Let be an isometry, and let be an -multilevel sampling scheme with levels. Let be sparsity levels and local sparsities, respectively, and let be weights. Let and . Let , , , and . Let be as in (3.1) and set . Let , and . Set and . Suppose
(i) we choose and so that satisfies the balancing property of order ,
(ii) we choose and so that ,
(iii) the weight satisfies
[TABLE]
(iv) the ’s satisfy for and
[TABLE]
Then with probability any solution of the optimization problem
[TABLE]
satisfies
[TABLE]
where and .
Suppose that is exactly -sparse. Then the above result guarantees exact recovery of via weighted $\ell^1$-minimization subject to the corresponding measurement condition. We note in passing that this measurement condition is optimal up to log factors, in the sense that it is the same as that of the oracle estimator based on a priori knowledge of . See [1].
4 Recovery guarantees for Walsh sampling with wavelet reconstruction
Having presented the abstract infinite-dimensional CS framework in full generality, the remainder of the paper is devoted to its application to the case of binary sampling with the Walsh transform with sparsity in orthogonal wavelet bases. We first describe the setup, before presenting the main recovery guarantees in Sections 4.3 and 4.4.
4.1 Walsh functions
For any number there exists a unique dyadic expansion
[TABLE]
where for . Similarly any can be written in its dyadic form as
[TABLE]
with for all . For a dyadic rational number this expansion is not unique, as one may use either a finite expansion, or an infinite expansion where for all for some . In such cases we always consider the finite expansion. In practice this means that we have removed countably many singletons from .
Definition 4.1.
Let and . The Walsh function is given by
[TABLE]
On the interval the Walsh function has sign changes, is therefore often denoted the frequency of . The first Walsh functions give rise to the entries in the sequency-ordered Hadamard matrix
[TABLE]
where .
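The sequency-ordered Hadamard matrix can be generated directly from the dyadic expansions. The sketch below assumes the Walsh function takes the form $v_n(x) = (-1)^{\sum_{i \geq 1} (n_i + n_{i+1}) x_i}$, where $n = \sum_{i \geq 1} n_i 2^{i-1}$ and $x = \sum_{i \geq 1} x_i 2^{-i}$; this formula, the symbol $v_n$ and the function name `walsh_matrix` are assumptions, since the displays above are not reproduced. The matrix entries are the values $v_n(k/N)$ for $n, k = 0,\ldots,N-1$.

```python
import numpy as np

def walsh_matrix(r):
    """N x N matrix with entries v_n(k/N), N = 2**r, in sequency order.

    ASSUMES v_n(x) = (-1)^{sum_i (n_i + n_{i+1}) x_i} with dyadic digits
    n = sum_i n_i 2^(i-1) and x = sum_i x_i 2^(-i).
    """
    N = 2 ** r
    H = np.empty((N, N), dtype=int)
    for n in range(N):
        # n_bits[i] holds n_{i+1}; the extra entry n_{r+1} is always 0 for n < 2^r.
        n_bits = [(n >> i) & 1 for i in range(r + 1)]
        for k in range(N):
            # x_bits[i] holds x_{i+1}, the (i+1)-th binary digit of x = k/N.
            x_bits = [(k >> (r - 1 - i)) & 1 for i in range(r)]
            exponent = sum((n_bits[i] + n_bits[i + 1]) * x_bits[i] for i in range(r))
            H[n, k] = (-1) ** exponent
    return H

H = walsh_matrix(3)
# Number of sign changes along each row; in sequency order, row n has n of them.
sign_changes = [int(np.count_nonzero(np.diff(H[n]))) for n in range(8)]
```

Under the assumed formula, the rows are mutually orthogonal and row $n$ changes sign exactly $n$ times, which is what "sequency ordered" refers to.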
Definition 4.2 (Walsh basis).
Define the Walsh basis as
[TABLE]
where “wh” is an abbreviation for Walsh-Hadamard.
Note that this is an orthonormal basis of .
4.2 Wavelet transform
Let and be an orthonormal scaling function and wavelet [13], respectively, with minimal support, corresponding to a multiresolution analysis (MRA). Note that this could be either the classical “Daubechies wavelet” with minimum phase or the “symlets”, which are close to being symmetric but have a larger phase [26, p. 294]. Let
[TABLE]
denote the scaled and translated versions.
A wavelet is said to have vanishing moments if
[TABLE]
For orthogonal wavelets with minimum support, the support depends on the number of vanishing moments. That is
[TABLE]
While this system constitutes an orthonormal basis of , in our case we require an orthonormal basis of . There exist several constructions of wavelets on the interval, but we will only consider periodic extensions and the orthogonal boundary wavelets introduced by Cohen, Daubechies and Vial in [12], which preserve the number of vanishing moments.
For wavelets on the interval we need to replace the wavelets/scaling functions intersecting the boundaries at each scale, with their corresponding boundary-corrected counterparts. We postpone the formal definition of periodic and boundary wavelets until we need it, in the proof sections. But to simplify the notation let
[TABLE]
where and are either a periodic wavelet/scaling function or the boundary wavelet/scaling functions introduced in [12]. For the former extension we say that , “originates from a periodic wavelet”, while for the latter we say that it “originates from a boundary wavelet”.
We will throughout assume satisfies for and for . This will ensure that there exists at least one such that for all .
Definition 4.3.
For a fixed number of vanishing moments , minimum wavelet decomposition and a boundary extension which is either periodic or boundary wavelets, let be the corresponding wavelets and scaling functions. We define
[TABLE]
Both and are orthonormal bases for .
4.3 Recovery guarantees
From Section 3 there are four unknown factors depending on which need to be estimated. These are the local coherences , the norm where is given by (3.1), the condition number and the factor found in condition (3.10).
For the two latter factors we have . Furthermore we know that since is an isometry. In practice we therefore only need to determine an upper bound and from Lemma 3.3 we know that , where is the balancing property constant. In other words, it suffices to determine when the balancing property holds with a given .
The following three propositions estimate these quantities for the case .
Proposition 4.4.
Let . For each , there exists a constant , such that whenever then satisfies the balancing property of order for all .
Note that Proposition 4.4 is a consequence of Theorem 1.1 in [20].
Proposition 4.5.
Let with and let
[TABLE]
be sparsity and sampling levels, respectively. Then the local coherences of scales like
[TABLE]
Proposition 4.6.
Let and let be sparsity and sampling levels. Let be a multilevel random sampling scheme, and let be as in (3.1). Then
[TABLE]
We can now present the two main theorems in this section. We point out that these are only valid for vanishing moments. For , the corresponding wavelet is the Haar wavelet, and will be considered in the next subsection. For , the coherence of does not decay as fast as for the other wavelets. Whether this is because our coherence bounds are not sharp enough for this wavelet or if it is because the coherence of actually decays more slowly is not known. We do, however, present some numerics in Section 6.5 which indicate that it is potentially the latter.
Theorem 4.7.
Let with and let
[TABLE]
be sparsity and sampling levels, respectively. Let be local sparsities. Suppose is chosen so that satisfies the balancing property with constant and set . Let and let , with . Let and . If
[TABLE]
and
[TABLE]
for , then with probability at least , the matrix in (3.11) satisfies the G-RIPL of order with constant .
With this in hand, we now present our main result:
Theorem 4.8.
Let with and let
[TABLE]
be sparsity and sampling levels, respectively. Let be local sparsities, be weights and let be sampling densities. Let and let . Let , , , and .
Let be as in (3.1) and set . Let , and . Set and . Suppose
(i) we choose as in Proposition 4.4 so that satisfies the balancing property of order ,
(ii) we choose and so that ,
(iii) the weight satisfies
[TABLE]
(iv) the ’s satisfy for and
[TABLE]
Then with probability any solution of the optimization problem
[TABLE]
satisfies
[TABLE]
where and .
Remark 4.9.
Note that the second condition (ii) can be guaranteed using Proposition 4.6. Indeed, it suffices for to satisfy
[TABLE]
Hence, given any a priori estimates on the decay of the coefficients (such as in the case of wavelets), one can use this to determine a suitable .
4.4 Uniform recovery for Haar wavelets
Below we shall see that for the Haar wavelet, will be an isometry for where . This can also be seen from Figure 2, where is perfectly block diagonal for . This means that the G-RIPL reduces to the -adjusted RIPL, or simply the RIPL, which we know from the finite-dimensional case. Notice in particular that we also avoid any considerations where as above, since .
Proposition 4.10.
Let and let , for some with . Then is an isometry on .
Proposition 4.11.
Let and let be sparsity and sampling levels, respectively. Then the local coherences of are
[TABLE]
It is now straightforward to derive the following:
Theorem 4.12.
Let and let be sparsity and sampling levels. Let be local sparsities and be local sampling densities. Let and . Let and . Suppose that the ’s satisfy for and
[TABLE]
Then with probability the matrix (3.11) satisfies the RIPL with constant .
Proof.
Using Proposition 4.10 we know that is an isometry. Thus inserting the local coherences from Proposition 4.11 into (2.4) in Theorem 2.10 gives the result. ∎
Theorem 4.13.
Let and let be sparsity and sampling levels. Let be local sparsities, be weights and be local sampling densities. Let and let . Let , and . Suppose we sample for and
[TABLE]
for . Let be as in (3.1) with . Let and with for some . Set . Then any solution of the optimization problem
[TABLE]
satisfies
[TABLE]
with probability , where and .
Proof.
Proposition 4.10 gives . Next notice that and that since . Using Theorem 3.5 we see that we can guarantee recovery of -sparse vectors, if satisfies the RIPL with constant , where . Using Theorem 4.12 gives the result. ∎
5 Proof of results in Section 3
When deriving uniform recovery guarantees via the RIP, it is typical to proceed as follows. First, one shows that the RIP implies the so-called robust Null Space Property (rNSP) of order (see Def. 4.17 in [16]). Second, one then shows that the rNSP implies stable and robust recovery. Thus the line of implications reads
[TABLE]
A similar line of implications holds for the RIPL and the corresponding robust Null Space Property in levels (rNSPL; see Def. 3.6 in [7]).
Both of the recovery guarantees for matrices satisfying the rNSP and rNSPL consider minimizers of the unweighted quadratically-constrained basis pursuit (QCBP) optimization problem. In our setup we consider minimizers of the weighted QCBP. We have therefore generalized the rNSPL to what we call the weighted robust null space property in levels.
For the sufficient condition for the G-RIPL in Theorem 3.6, the proof follows along similar lines as in [23]. We only sketch the main differences here.
5.1 The weighted rNSPL and norm bounds
For a set and a vector we let the vector be given by
[TABLE]
We also define
[TABLE]
Definition 5.1 (weighted rNSP in levels).
Let be sparsity levels and local sparsities, respectively. For positive weights , we say that satisfies the weighted robust Null Space Property in Levels (weighted rNSPL) of order with constants and if
[TABLE]
for all and all .
Lemma 5.2 (weighted rNSPL implies -distance bound).
Suppose that satisfies the weighted rNSPL of order with constants and . Let . Then
[TABLE]
Proof.
Let and be such that . Then
[TABLE]
which implies that
[TABLE]
Now consider . By the weighted rNSPL, we have
[TABLE]
Hence (5.3) gives
[TABLE]
and after rearranging we get
[TABLE]
Therefore, using this and (5.3) once more, we deduce that
[TABLE]
which gives the result. ∎
Lemma 5.3 (weighted rNSPL implies distance bound).
Suppose that satisfies the weighted rNSPL of order with constants and . Let . Then
[TABLE]
Proof.
Let and , where , is the index set of the largest coefficients of in absolute value. Then
[TABLE]
which gives
[TABLE]
Since we deduce that
[TABLE]
Applying Young’s inequality , we obtain
[TABLE]
Hence
[TABLE]
We now use the weighted rNSPL to get
[TABLE]
To complete the proof, we use the inequality . ∎
5.2 Weighted rNSPL implies uniform recovery
Theorem 5.4.
Let be sparsity levels and local sparsities, respectively, and let be positive weights. Let , with and with . Set . Let and suppose that satisfies the weighted rNSP in levels of order with constants and . If
[TABLE]
then any solution of the optimization problem
[TABLE]
satisfies
[TABLE]
where and .
Proof.
Recall that , and notice that this gives and . Next we consider the bound (5.5), and note that this bound implies
[TABLE]
We also note that (5.5) implies
[TABLE]
which can be written as
[TABLE]
Next set and consider the -bound. First notice that since satisfies the weighted rNSPL, Lemma 5.2 gives
[TABLE]
Here the last term can be bounded by
[TABLE]
since both and are feasible. Combining (5.10), (5.12) and (5.14) gives
[TABLE]
Using that is a minimizer of (5.6) gives the desired bound.
We now consider the -bound. First note that
[TABLE]
We shall also need
[TABLE]
Again, since satisfies the weighted rNSPL we can apply Lemma 5.3, Lemma 5.2 and inequality (5.16) to obtain the bound
[TABLE]
Combining (5.11), (5.14), (5.15), (5.17) and now gives
[TABLE]
Using that is a minimizer of (5.6) completes the proof. ∎
5.3 G-RIPL implies weighted rNSPL
Theorem 5.5.
Let and let be invertible. Let be sparsity levels, be local sparsities and let be positive weights. Suppose that satisfies the G-RIPL of order with constant , where
[TABLE]
Then satisfies the weighted rNSP in levels of order with constants and .
Proof.
Let be such that and let , where is the set of the largest indices of in absolute value. If , let and let for . For let be the index set of the largest values of , and let be the index set of the next largest values, and so forth. In the case where there are fewer than values left at iteration , we let be the remaining indices. Let and let . Since we have
[TABLE]
where . Note that
[TABLE]
Then
[TABLE]
Set and notice that for and . Thus for we get
[TABLE]
Therefore
[TABLE]
This results in
[TABLE]
which establishes the weighted rNSPL of order with and . ∎
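The sorting step in the proof above can be illustrated with a short sketch. It is only a simplified picture under stated assumptions: the weights are ignored, the level boundaries and local sparsities are passed as plain lists, and the helper name `level_blocks` is hypothetical. Within each level the indices are sorted by decreasing magnitude and cut into consecutive chunks whose size is the local sparsity of that level; the i-th block collects the i-th chunk from every level, and the last chunk of a level may be shorter.

```python
def level_blocks(x, level_bounds, sparsities):
    """Split the indices of x into blocks T_0, T_1, ... level by level.

    level_bounds = [M_1, ..., M_r] (with M_0 = 0 implied) and
    sparsities = [s_1, ..., s_r], each s_k >= 1. Hypothetical helper,
    weights are deliberately left out of this illustration.
    """
    blocks = []
    start = 0
    for M_k, s_k in zip(level_bounds, sparsities):
        # Indices of the current level, largest magnitude first.
        order = sorted(range(start, M_k), key=lambda i: -abs(x[i]))
        # Consecutive chunks of size s_k; the last one may be shorter.
        chunks = [order[i:i + s_k] for i in range(0, len(order), s_k)]
        for t, chunk in enumerate(chunks):
            while len(blocks) <= t:
                blocks.append([])
            blocks[t].extend(chunk)
        start = M_k
    return [sorted(b) for b in blocks]


x = [5, -1, 3, 0.5, 2, -4, 1, 0]
T = level_blocks(x, level_bounds=[4, 8], sparsities=[1, 2])
```

Here the first block holds the largest entry of the first level and the two largest entries of the second level, which is the kind of set on which the G-RIPL is applied in the argument.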
5.4 Proof of Theorem 3.5
Proof of Theorem 3.5.
First notice that for we have
[TABLE]
Hence using Theorem 5.5 with and we see that Equation (5.18) simplifies to Equation (3.5). This implies that satisfies the weighted rNSPL of order , with constants and . Now since
[TABLE]
we know from Theorem 5.4 that any solution of (3.6) satisfies (3.7) and (3.8). ∎
5.5 Proof of Theorem 3.6
Proof of Theorem 3.6.
We recall that is an isometry and that
[TABLE]
and . Note that
[TABLE]
and therefore
[TABLE]
Notice also that and for . Next notice that the matrix can be written as
[TABLE]
where is the standard basis on . It now follows that
[TABLE]
where are random vectors given by . Note that the are independent, and also that
[TABLE]
where is non-singular by assumption. Let
[TABLE]
We now define the following seminorm on :
[TABLE]
so that
[TABLE]
Due to (5.5) and (5.20), we may rewrite this as
[TABLE]
Having detailed the setup, the remainder of the proof now follows along very similar lines to that of [23, Thm. 3.2]. Hence we only sketch the details.
The first step is to estimate . Using the standard techniques of symmetrization, Dudley’s inequality, properties of covering numbers, and arguing as in [23, Sec. 4.2], we deduce that
[TABLE]
where is a universal constant, , and
[TABLE]
In particular,
[TABLE]
provided
[TABLE]
where is a constant. Using this, Talagrand’s theorem and using the fact that (see [23, Sec. 4.3]) we deduce that
[TABLE]
In particular,
[TABLE]
provided
[TABLE]
Combining this with (5.23) and (5.24) now completes the proof. ∎
5.6 Proof of Corollary 3.7 and Lemma 3.3
Proof of Corollary 3.7.
We must ensure that all the conditions are met to be able to apply Theorem 3.5 with .
First notice that for weights we have and . Next we note that condition implies that is a feasible point since .
Let . Combining condition and Lemma 3.3 gives and since we also have . Inserting the above equalities and inequalities into the weight condition for in Theorem 3.5 gives condition .
Next we must ensure that satisfies the G-RIPL of order with where
[TABLE]
According to Theorem 3.6 this occurs if the ’s satisfy condition . The error bounds (3.7) and (3.8) now follow directly from Theorem 3.5. ∎
Proof of Lemma 3.3.
First notice that the balancing property is equivalent to requiring
[TABLE]
where is the th largest singular value of . Indeed, since is an isometry, the matrix is nonnegative definite, and therefore
[TABLE]
This gives (5.26). Next let and notice that . This gives . ∎
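The link between a norm bound on a finite section of an isometry and its smallest singular value can be checked numerically. The sketch below is an illustration only: a random orthogonal matrix stands in for the change-of-basis matrix, the sizes `K`, `N`, `M` are arbitrary, and all names are hypothetical. Since every singular value of a section of an isometry lies in [0, 1], the spectral norm of the identity minus the Gram matrix of the section equals one minus the square of the smallest singular value.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, M = 64, 48, 16  # ambient size, number of rows kept, number of columns kept

# Random orthogonal matrix as a stand-in for an isometry (assumption for illustration).
Q, _ = np.linalg.qr(rng.standard_normal((K, K)))

# Finite section: first N rows and first M columns.
A = Q[:N, :M]

# Singular values of the section and the deviation of its Gram matrix from the identity.
sigma = np.linalg.svd(A, compute_uv=False)
deviation = np.linalg.norm(np.eye(M) - A.T @ A, 2)
```

A lower bound on the smallest singular value is therefore the same statement as an upper bound on `deviation`, which is the mechanism the proof relies on.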
6 Proof of results in Section 4
In Section 4 we found concrete recovery guarantees for the Walsh sampling and wavelet reconstruction, using the theorems in Section 3. The key to deriving Walsh-wavelet recovery guarantees boils down to estimating the quantities , and . All of these quantities depend directly on , and to control them we will have to estimate how the entries of change for varying and . We will therefore start this section by setting up notation for wavelets on the interval and stating some useful properties of Walsh functions. Then in Sections 6.3 and 6.4 we will estimate , followed by a discussion of the sharpness of this estimate for in Section 6.5. We will then finish in Section 6.6 by estimating , showing how scales for varying and , and proving Theorems 4.7 and 4.8.
6.1 Wavelets on the interval and regularity
In Section 4.2 we introduced orthogonal wavelets on the real line, but we did not make any formal definitions of the wavelets we used at the boundaries of the interval . Next we consider the two boundary extensions, periodic and boundary wavelets. To simplify the exposition we define the following sets
[TABLE]
At each scale , the periodic wavelet basis consists of the usual wavelets and scaling functions , for and the periodic extended functions and for . These are defined as
[TABLE]
and similarly for . Strictly speaking we could have defined these periodic extensions only for and , but to unify the notation for both boundary extensions we have chosen the former.
Next we have the boundary wavelet basis with vanishing moments. This wavelet basis consists of the same interior wavelets as the periodic basis, but with boundary scaling and wavelet functions.
[TABLE]
As for the interior functions we also define the scaled versions as
[TABLE]
The names ’left’ and ’right’ correspond to the support of these functions. That is
[TABLE]
for .
In the following we shall see that all of our results hold for both periodic and boundary wavelets, but their treatment in some of the proofs differs slightly. To make the treatment as unified as possible we make the following definition.
Definition 6.1.
We say that , “originates from a periodic wavelet” if
[TABLE]
We say that “originates from a boundary wavelet” if
[TABLE]
With these functions defined now for both boundary extensions, the definition of is also clear. Next we make a note on the regularity of these orthogonal wavelets.
Definition 6.2.
Let , where and . A function is said to be uniformly Lipschitz if is -times continuously differentiable and the derivative is Hölder continuous with exponent , i.e.
[TABLE]
for some constant .
In particular the Daubechies wavelet with 1 vanishing moment (i.e., the Haar wavelet) is not uniformly Lipschitz as it is not continuous, whereas for we have the constants found in Table 1 [13, 239]. For large , grows as [26, 294]. Also note that each of the boundary functions and is constructed as a finite linear combination of the interior scaling function and wavelet . Thus all of these boundary functions have the same regularity as and .
6.2 Properties of Walsh functions
Definition 6.3.
Let and be sequences consisting of only binary numbers. That is for all . The operation applied to these sequences gives
[TABLE]
For two binary numbers , we let .
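As a minimal sketch of the operation in Definition 6.3, assuming it acts as componentwise addition modulo 2 on the binary digits (the usual dyadic addition used with Walsh functions), one could write the following. The function name `dyadic_add` is hypothetical and only serves the illustration.

```python
def dyadic_add(x, y):
    """Componentwise addition modulo 2 of two binary sequences of equal length."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    if any(b not in (0, 1) for b in list(x) + list(y)):
        raise ValueError("sequences must contain only 0 and 1")
    return [(a + b) % 2 for a, b in zip(x, y)]


z = dyadic_add([1, 0, 1, 1], [1, 1, 0, 1])
```

Under this assumption every sequence is its own inverse, so adding a sequence to itself gives the zero sequence.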
Proposition 6.4.
For and , the Walsh function satisfies the following properties
[TABLE]
Proof.
Equations (6.6) and (6.5) can be found in any standard text on Walsh functions, e.g., [18], whereas the last follows by inserting zeros in front of ’s dyadic expansion. ∎
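Identities of this kind can be checked numerically on a dyadic grid. The sketch below assumes the Paley ordering of the Walsh functions, which may differ from the ordering used in this paper, and represents a point by the index `m` of the dyadic cell of width 2^(-r) that contains it. It checks two standard facts: the product of two Walsh functions is the Walsh function whose index is the bitwise XOR of the two indices, and a Walsh function evaluated at the XOR of two cell indices factorizes. The function name `walsh_paley` is hypothetical.

```python
def walsh_paley(n, m, r):
    """Value (+1 or -1) of the n-th Paley-ordered Walsh function on cell m of width 2**-r."""
    # Bit i of rev is the (i+1)-th binary digit of any point in cell m.
    rev = int(format(m, f"0{r}b")[::-1], 2)
    # The exponent is the number of positions where n and the digits of x are both 1.
    return -1 if bin(n & rev).count("1") % 2 else 1


r = 3
N = 1 << r

# Product of two Walsh functions is the Walsh function with XOR-ed index.
product_rule = all(
    walsh_paley(n ^ k, m, r) == walsh_paley(n, m, r) * walsh_paley(k, m, r)
    for n in range(N) for k in range(N) for m in range(N)
)

# A Walsh function at the dyadic sum of two points factorizes.
shift_rule = all(
    walsh_paley(n, m ^ p, r) == walsh_paley(n, m, r) * walsh_paley(n, p, r)
    for n in range(N) for m in range(N) for p in range(N)
)
```

Working with cell indices rather than real numbers avoids the ambiguity of dyadic expansions at the grid points themselves.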
6.3 Bounding the inner product
The entries in consist of for different values of and . Thus in order to determine the local coherences we need to find an upper bound of this inner product. Next we derive such a bound for vanishing moments and discuss its sharpness. For we determine the magnitude of each matrix entry explicitly.
Lemma 6.5.
Let and let for . For , and we have
[TABLE]
where
[TABLE]
if originates from a boundary wavelet and
[TABLE]
if originates from a periodic wavelet.
Proof.
First notice that for any we have
[TABLE]
Next, we only consider the interior wavelets, i.e. . For , we need to handle the two cases where originates from a periodic and from a boundary wavelet separately. The arguments/calculations for the two different boundary extensions are analogous. Also, both of these extensions will have support less than .
For , notice that .
[TABLE]
∎
Lemma 6.6 ([9]).
Let be uniformly Lipschitz then
[TABLE]
for .
Theorem 6.7.
Let with and let . For and with , we have
[TABLE]
for all and . For the bound holds with .
Proof.
To obtain the bound above we will combine Lemma 6.5 and Lemma 6.6. We start by arguing that has the same regularity regardless of boundary extension. Let where is as in Lemma 6.5.
If originates from a periodic wavelet, , will have Lipschitz regularity , since both and have this regularity. Next if originates from a boundary wavelet and , will have Lipschitz regularity , by the same argument as above. If we know from the construction of the boundary functions [12] that these are finite linear combinations of and . These functions will therefore possess the same regularity as the interior function.
Next notice from Table 1 that for vanishing moments, we know that . Applying Lemma 6.5 and Lemma 6.6 then gives
[TABLE]
where depends on the boundary extension. ∎
Theorem 6.8.
Let and let for and . Then
[TABLE]
Proof.
These equalities can be found in either [5] or [30]. ∎
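The block structure of the Walsh-Haar change-of-basis matrix can be observed numerically. The sketch below is an illustration under stated assumptions, not the construction used in the cited references: it assumes Paley-ordered Walsh functions, the Haar scaling function and wavelets on the unit interval, and a grid of cells of width 2^(-r) on which all functions involved are piecewise constant, so the inner products reduce to finite sums. The names `walsh_paley`, `haar`, `change_of_basis` and `level` are hypothetical.

```python
import math


def walsh_paley(n, m, r):
    """Value of the n-th Paley-ordered Walsh function on cell m of width 2**-r."""
    rev = int(format(m, f"0{r}b")[::-1], 2)
    return -1 if bin(n & rev).count("1") % 2 else 1


def haar(idx, m, r):
    """Value of the idx-th Haar function on cell m of width 2**-r.

    idx = 0 is the scaling function; idx = 2**j + k is the wavelet
    at scale j with shift k, for 0 <= j < r and 0 <= k < 2**j.
    """
    if idx == 0:
        return 1.0
    j = idx.bit_length() - 1
    k = idx - (1 << j)
    width = (1 << r) >> j          # number of cells in the support
    lo = k * width
    if not lo <= m < lo + width:
        return 0.0
    sign = 1.0 if m < lo + width // 2 else -1.0
    return sign * 2.0 ** (j / 2)


def change_of_basis(r):
    """Matrix of inner products between Walsh functions (rows) and Haar functions (columns)."""
    N = 1 << r
    return [[sum(walsh_paley(n, m, r) * haar(i, m, r) for m in range(N)) / N
             for i in range(N)] for n in range(N)]


def level(i):
    """Dyadic level of an index: 0 for index 0, and j + 1 for 2**j <= i < 2**(j+1)."""
    return 0 if i == 0 else i.bit_length()


U = change_of_basis(4)
```

In this sketch an entry of `U` is nonzero only when the Walsh index and the Haar index lie in the same dyadic level, and within the level of scale j every entry has magnitude 2^(-j/2). Reordering the Walsh functions inside each dyadic level would permute rows within a block but leave the block-diagonal pattern unchanged.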
6.4 Proof of Propositions 4.5, 4.10 and 4.11
Using the above results we are now able to determine the local coherences of .
Proof of Proposition 4.5.
We use the bound found in Theorem 6.7. Recall that and . For fixed and we have
[TABLE]
For and we have . This gives
[TABLE]
∎
Proof of Proposition 4.10.
Since both and are orthonormal, is an isometry on i.e. . Let for some with . Using Theorem 6.8 we see that
[TABLE]
which means that
[TABLE]
∎
Proof of Proposition 4.11.
We use the bound found in Theorem 6.8. Recall that . For fixed we have that
[TABLE]
∎
6.5 About the sharpness of the local coherence bounds
As can be seen from Proposition 4.11, the coherence bounds for are sharp. However, for , we have not discussed their sharpness. In fact, none of the results in this paper considers the case of vanishing moments. The reason for this is that these wavelets have a Lipschitz regularity , which means that the bound in Theorem 6.7 would have less rapid decay if we had included these wavelets in the theorem. To simplify the presentation we have chosen to exclude them.
We will argue that Theorem 6.7 does not seem to extend to wavelets with vanishing moments. Let and for . Notice that setting only affects the local coherence estimates for . For , the local coherences are unaffected by the regularity of the wavelet. This follows from Lemma 6.5, by setting . Next consider the case where ; then Theorem 6.7 suggests that for .
We now consider Table 2 and notice that for , all of the 18 entries in Table 2 have values less than . This suggests that the bound in Theorem 6.7 does not extend to the case of vanishing moments. From the same table we also observe that for , the bound in Theorem 6.7 seems to be quite sharp. While there are a few entries that are less than , most are very close to, if not larger than, this value.
6.6 Proof of remaining results in Section 4
Proof of Proposition 4.4.
This proposition is a consequence of Theorem 1.1 in [20]. Let and be the first function in . The subspace cosine angle between and is defined as
[TABLE]
and is the projection operator onto . As both and are orthonormal bases, the synthesis and analysis operators are unitary. We therefore have
[TABLE]
Furthermore notice that by equation (5.29) and the definition of the balancing property, we have
[TABLE]
Hence if satisfies the balancing property of order for and , then , where . Next for and we define the stable sampling rate as
[TABLE]
Rearranging the terms we see that if , satisfies the stable sampling rate of order then satisfies the balancing property of order for and .
Theorem 1.1 in [20] states that for , and for all there exists a constant (dependent on ), such that whenever , then . Moreover, we have the relation . Hence if we see that the proposition holds with . ∎
Proof of Proposition 4.6.
Using Theorem 6.7, we see that . This gives
[TABLE]
∎
Proof of Theorem 4.7.
First recall that and where is chosen so that satisfies the balancing property of order . From Lemma 3.3 we therefore have .
From Theorem 3.6 we know that the matrix in equation (3.11) satisfies the G-RIPL with , provided the sample densities satisfy for , and
[TABLE]
for . Next notice that for , while and . Using the local coherences from Proposition 4.5 we obtain
[TABLE]
Inserting this and into (6.13) leads to the sampling condition in Theorem 4.7. ∎
Proof of Theorem 4.8.
The theorem is identical to Corollary 3.7, except that we have fixed and . The concrete values for these have been inserted in condition together with the local coherences . The computation of this can be found in the proof above. ∎
Acknowledgements
The authors would like to thank Simone Brugiapaglia, Simon Foucart, Remi Gribonval, Øyvind Ryan and Laura Thesing for useful discussions and comments. BA acknowledges support from the Natural Sciences and Engineering Research Council of Canada through grant 611675. ACH acknowledges support from the UK Engineering and Physical Sciences Research Council (EPSRC) grant EP/L003457/1, a Royal Society University Research Fellowship, and the Philip Leverhulme Prize (2017).
- [1] B. Adcock, C. Boyer, and S. Brugiapaglia. On oracle-type local recovery guarantees in compressed sensing. arXiv preprint arXiv:1806.03789, 2018.
- [2] B. Adcock and A. C. Hansen. Generalized sampling and infinite-dimensional compressed sensing. Foundations of Computational Mathematics, 16(5):1263–1323, 2016.
- [3] B. Adcock, A. C. Hansen, C. Poon, and B. Roman. Breaking the coherence barrier: A new theory for compressed sensing. In Forum of Mathematics, Sigma, volume 5. Cambridge University Press, 2017.
- [4] B. Adcock, A. C. Hansen, and B. Roman. A note on compressed sensing of structured sparse wavelet coefficients from subsampled Fourier measurements. IEEE Signal Processing Letters, 23(5):732–736, 2016.
- [5] V. Antun. Coherence estimates between Hadamard matrices and Daubechies wavelets, 2016. Master’s thesis, University of Oslo.
- [6] G. R. Arce, D. J. Brady, L. Carin, H. Arguello, and D. S. Kittle. Compressive coded aperture spectral imaging: An introduction. IEEE Signal Processing Magazine, 31(1):105–115, 2014.
- [7] A. Bastounis and A. C. Hansen. On the absence of uniform recovery in many real-world applications of compressed sensing and the restricted isometry property and nullspace property in levels. SIAM Journal on Imaging Sciences, 10(1):335–371, 2017.
- [8] S. Brugiapaglia and B. Adcock. Robustness to unknown error in sparse regularization. IEEE Transactions on Information Theory, 64(10):6638–6661, 2018.
