Total Variation Minimization in Compressed Sensing
Felix Krahmer, Christian Kruschel, Michael Sandbichler

TL;DR
This paper reviews recovery guarantees for total variation minimization in compressed sensing, highlighting the limitations of synthesis sparse approaches and extending results from Gaussian to subgaussian measurements.
Contribution
It provides a comprehensive overview of total variation minimization in compressed sensing and introduces generalized guarantees for subgaussian measurement scenarios.
Findings
Total variation minimization has specific recovery guarantees in compressed sensing.
Synthesis sparse signal approaches are inadequate for total variation minimization.
Recent results are extended from Gaussian to subgaussian measurement models.
Abstract
This chapter gives an overview of recovery guarantees for total variation minimization in compressed sensing for different measurement scenarios. In addition to summarizing the results in the area, we illustrate why an approach that is common for synthesis sparse signals fails and different techniques are necessary. Lastly, we discuss a generalization of recent results for Gaussian measurements to the subgaussian case.
1 Introduction
The central aim of Compressed Sensing (CS) [CRT06, Don06] is the recovery of an unknown vector from very few linear measurements. Put formally, we would like to recover $x \in \mathbb{R}^n$ from $y = Ax + e \in \mathbb{R}^m$ with $m \ll n$, where $e$ denotes additive noise.
For general $x$, recovery is certainly not possible; hence additional structural assumptions are necessary in order to guarantee recovery. A common assumption used in CS is that the signal is sparse. Here for $x \in \mathbb{R}^n$ we assume
$$\|x\|_0 := |\{k \in [n] : x_k \neq 0\}| \ll n,$$
that is, there are only very few nonzero entries of $x$. We say that $x$ is $s$-sparse for some given sparsity level $s \ll n$ if $\|x\|_0 \le s$. We call a vector compressible if it can be approximated well by a sparse vector. To quantify the quality of approximation, we let
$$\sigma_s(x)_q := \inf_{\|z\|_0 \le s} \|z - x\|_q$$
denote the error of the best $s$-sparse approximation of $x$.
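The best $s$-term approximation error can be evaluated directly, since keeping the $s$ largest-magnitude entries is optimal for every $\ell_q$ norm. The following is a minimal sketch; the function name `best_s_term_error` and the example vector are illustrative choices, not taken from the chapter.

```python
import numpy as np

def best_s_term_error(x, s, q=1):
    """Return the l_q distance from x to the nearest s-sparse vector."""
    x = np.asarray(x, dtype=float)
    if s >= x.size:
        return 0.0
    # The optimal s-sparse approximation keeps the s largest-magnitude
    # entries, so the error is the l_q norm of the remaining entries.
    tail = np.sort(np.abs(x))[: x.size - s]
    return float(np.linalg.norm(tail, ord=q))

# Hypothetical example: a vector with two dominant entries.
x = np.array([5.0, -0.1, 0.0, 3.0, 0.2])
print("l1 error of best 2-term approximation:", best_s_term_error(x, s=2, q=1))
```

A vector is compressible in the sense above when this quantity is small relative to its norm for a small value of `s`.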
In most cases, the vector $x$ is not sparse in the standard basis, but there is a basis $W$ such that $x = Wz$ and $z$ is sparse. This is also known as synthesis sparsity of $x$. To find an (approximately) synthesis sparse vector, we can instead solve the problem of recovering $z$ from $y = AWz$. A common strategy in CS is to solve a basis pursuit program in order to recover the original vector. For a fixed noise level $\varepsilon$, it is given by
$$\min_{z} \|z\|_1 \quad \text{such that} \quad \|Az - y\|_2 \le \varepsilon. \tag{1}$$
While this and related approaches of convex regularization have been studied in the inverse problems and statistics literature long before the field of compressed sensing developed, these works typically assumed the measurement setup was given. The new paradigm arising in the context of compressed sensing was to attempt to use the remaining degrees of freedom of the measurement system to reduce the ill-posedness of the system as much as possible. In many measurement systems, the most powerful known strategies will be based on randomization, i.e., the free parameters are chosen at random.
Given an appropriate amount of randomness (i.e., for various classes of random matrices $A$, including some with structure imposed by underlying applications), one can show that the minimizer $\hat{x}$ of (1) recovers the original vector $x$ with error
$$\|x - \hat{x}\|_2 \le c \left( \frac{\sigma_s(x)_1}{\sqrt{s}} + \varepsilon \right) \tag{2}$$
for a constant $c > 0$; see, e.g., [BDDW08] for an elementary proof in the case of subgaussian matrices without structure, and [KR14] for an overview, including many references, of corresponding results for random measurement systems with additional structure imposed by applications. Note that (2) entails that if $x$ is $s$-sparse and the measurements are noiseless, the recovery is exact.
For many applications, however, the signal model of sparsity in an orthonormal basis has proven somewhat restrictive. Two main lines of generalization have been proposed. The first line of work, initiated by [RSV08] is the study of sparsity in redundant representation systems, at first under incoherence assumptions on the dictionary. More recently, also systems without such assumptions have been analyzed [CENR10, KNW15]. The main idea of these works is that even when one cannot recover the coefficients correctly due to conditioning problems, one may still hope for a good approximation of the signal.
The second line of work focuses on signals that are sparse after the application of some transform; one speaks of cosparsity or analysis sparsity [NDEG13], see, e.g., [KR15] for an analysis of the Gaussian measurement setup in this framework. A special case of particular importance, especially for imaging applications, is that of sparse gradients. Namely, as it turns out, natural images often admit very sparse approximations in the gradient domain, see, e.g., Figure 1. Here the discrete gradient at location $i = (i_1, \dots, i_d)$ is defined as the vector $(\nabla z)_i$ with its $d$ entries given by $\big((\nabla z)_i\big)_j = z_{i+e_j} - z_i$, $j = 1, \dots, d$, where $e_j$ is the $j$-th standard basis vector.
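To make the definition of the discrete gradient concrete, here is a minimal NumPy sketch. The boundary convention is an assumption of this sketch: differences that would reach outside the array are set to zero. The function names and the test image are illustrative.

```python
import numpy as np

def discrete_gradient(z):
    """Forward differences z[i + e_j] - z[i] along every axis j.

    Assumption of this sketch: differences that would leave the array
    are set to 0.
    """
    z = np.asarray(z, dtype=float)
    grads = []
    for axis in range(z.ndim):
        d = np.zeros_like(z)
        # Select all indices except the last one along `axis`.
        inner = [slice(None)] * z.ndim
        inner[axis] = slice(0, -1)
        d[tuple(inner)] = np.diff(z, axis=axis)
        grads.append(d)
    return np.stack(grads, axis=-1)  # shape: z.shape + (z.ndim,)

def gradient_sparsity(z, tol=1e-12):
    """Number of nonzero entries of the discrete gradient."""
    return int(np.sum(np.abs(discrete_gradient(z)) > tol))

# Hypothetical piecewise constant image: a bright square on a dark background.
img = np.zeros((8, 8))
img[2:6, 3:7] = 1.0
print(gradient_sparsity(img), "nonzero gradient entries out of", img.size * img.ndim)
```

For a piecewise constant image the gradient is supported only along the edges of the constant regions, which is the sparsity the text refers to.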
A first attempt to recover a gradient sparse signal is to formulate a compressed sensing problem in terms of the sparse gradient. When this is possible (for instance in the example of Fourier measurements [CRT06]), applying (1) will correspond to minimizing $\|\nabla z\|_1 =: \|z\|_{TV}$, the total variation seminorm. Then (under some additional assumptions) compressed sensing recovery guarantees of the form (2) can apply. This proof strategy, however, only allows for showing that the gradient can be approximately recovered, not the signal. When no noise is present and the gradient is exactly sparse (which is not very realistic), this allows for signal recovery via integrating the gradient, but in the case of noisy measurements, this procedure is highly unstable.
Nevertheless, this success motivates minimizing the total variation seminorm also when one attempts to recover the signal directly, not the gradient. In analogy with (1), this yields the following minimization problem.
$$\min_{z} \|\nabla z\|_1 \quad \text{such that} \quad \|Az - y\|_2 \le \varepsilon.$$
For $A$ the identity (i.e., not reducing the dimension), this relates to the famous Rudin-Osher-Fatemi functional, a classical approach for signal and image denoising [rudin1992nonlinear]. Due to its high relevance for image processing, this special case of analysis sparsity has recently received a lot of attention also in the compressed sensing framework where $A$ is dimension reducing. The purpose of this chapter is to give an overview of recovery results for total variation minimization in this context of compressed sensing (Section 2) and to provide some geometric intuition by discussing the one-dimensional case under Gaussian or subgaussian measurements (to our knowledge, a generalization to the latter case does not appear yet in the literature) with a focus on the interaction between the high-dimensional geometry and spectral properties of the gradient operator (Section 3).
2 An overview of TV recovery results
In this section, we will give an overview of the state-of-the-art guarantees for the recovery of gradient sparse signals via total variation minimization. We start by discussing in Section 2.1 sufficient conditions for the success of TV minimization.
Subsequently, we focus on recovery results for random measurements. Interestingly, the results in one dimension differ markedly from the ones in higher dimensions. Instead of obtaining a required number of measurements roughly on the order of the sparsity level $s$, we need on the order of $\sqrt{sn}$ measurements (up to logarithmic factors) for recovery. We will see this already in Subsection 2.2, where we present the results of Cai and Xu [CX15] for recovery from Gaussian measurements. In Section 3, we will use their results to obtain refined results for noisy measurements as well as guarantees for subgaussian measurements, combined with an argument of Tropp [Tro15]. In Subsection 2.3 we will present results by Needell and Ward for dimensions greater than or equal to two, showing that recovery can be achieved from Haar incoherent measurements.
2.1 Sufficient Recovery Conditions
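Given linear measurements $Ax = y$ for an arbitrary $A \in \mathbb{R}^{m \times n}$ and a signal $x$ with $\|\nabla x\|_0 \le s$, a natural way to recover $x$ is by solving
$$\min_{z} \|\nabla z\|_1 \quad \text{such that} \quad Az = y. \tag{3}$$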
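In one dimension, the equality-constrained TV minimization problem (3) can be rewritten as a linear program by introducing auxiliary variables $t \ge |\nabla z|$ and minimizing $\sum_i t_i$. The sketch below is an illustration under stated assumptions rather than the method of any cited paper: it uses SciPy's `linprog` with the HiGHS backend, and the matrix sizes and the test signal are hypothetical choices.

```python
import numpy as np
from scipy.optimize import linprog

def tv_minimize_1d(A, y):
    """Solve min ||D z||_1 subject to A z = y as a linear program.

    D is the (n-1) x n forward-difference matrix. The auxiliary variables
    t >= |D z| turn the objective into sum(t).
    """
    m, n = A.shape
    D = np.diff(np.eye(n), axis=0)          # row i equals e_{i+1} - e_i
    k = n - 1
    c = np.concatenate([np.zeros(n), np.ones(k)])
    # D z - t <= 0 and -D z - t <= 0 together encode |D z| <= t.
    A_ub = np.block([[D, -np.eye(k)], [-D, -np.eye(k)]])
    b_ub = np.zeros(2 * k)
    A_eq = np.hstack([A, np.zeros((m, k))])
    bounds = [(None, None)] * n + [(0, None)] * k
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]

# Hypothetical experiment: a signal with two jumps, Gaussian measurements.
rng = np.random.default_rng(0)
n, m = 60, 40
x = np.zeros(n)
x[15:35] = 1.0
x[35:] = -0.5
A = rng.standard_normal((m, n))
x_hat = tv_minimize_1d(A, A @ x)
print("recovery error:", np.linalg.norm(x_hat - x))
```

Whether `x_hat` coincides with `x` depends on the draw of `A` and on the relation between `m`, `n` and the number of jumps; the program only guarantees a feasible point whose total variation does not exceed that of `x`.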
For we denote as the columns of indexed by , and for a consecutive notation we denote as the rows of indexed by and as the identity matrix. The following results can also be easily applied to analysis -minimization, where any arbitrary matrix replaces in (3), as well as to any real Hilbert space setting [Kru15].
In many applications it is important to verify whether there is exactly one solution of (3). Since is not injective here, we cannot easily use the well-known recovery results in compressed sensing [FR13] for the matrix . However, a necessary condition can be given since can only satisfy and if
[TABLE]
If is replaced by the identity, this is equivalent to being injective. Since this injectivity condition is unavoidable, we assume for the rest of this section that it is satisfied.
The paper [NDEG13] provides sufficient and necessary conditions for uniform recovery via (3). The conditions rely on the null space of the measurements and, similar to the classical compressed sensing setup [TP14], are hard to verify. The following result is a corollary of these conditions. It no longer provides a necessary condition, but is more manageable.
Corollary 2.1.
[NDEG13] For all with , the solution of (3) with is unique and equal to if for all with it holds that
[TABLE]
To consider measurements for specific applications, where it is difficult to prove whether uniform recovery is guaranteed, one can empirically examine whether specific elements solve (3) uniquely. For computed tomography measurements, a Monte Carlo experiment is considered in [JKL15] to approximate the fraction of all gradient -sparse vectors to uniquely solve (3). The results suggest that there is a sharp transition between the case that every vector with a certain gradient sparsity is uniquely recoverable and the case that TV-minimization will find a different solution than the desired vector. This behavior empirically agrees with the phase transition in the classical compressed sensing setup with Gaussian measurements [Don04].
To efficiently check whether many specific vectors can be uniquely recovered via (3), one needs to establish characteristics of which must be easily verifiable. Such a non-uniform recovery condition is given in the following theorem.
Theorem 2.1.
[JKL15] It holds that is a unique solution of (3) if and only if there exists and such that
[TABLE]
The basic idea of the proof is to use the optimality condition for convex optimization problems [Roc72]. Equivalent formulations of the latter theorem can be found in [ZMY16, KR15] where the problem is considered from a geometric perspective. However, verifying the conditions in Theorem 2.1 still requires solving a linear program where an optimal for (4) needs to be found. In classical compressed sensing, the Fuchs Condition [Fuc04] is known as a weaker result as it suggests a particular in (4) and avoids solving the resulting linear program. The following result generalizes this result to general analysis $\ell_1$-minimization.
Corollary 2.2.
If satisfies
[TABLE]
then is the unique solution of (3).
2.2 Recovery from Gaussian measurements
As discussed above, to date no deterministic constructions of compressed sensing matrices are known that get anywhere near an optimal number of measurements. Also for the variant of recovering approximately gradient sparse signals, the only near-optimal recovery guarantees have been established for random measurement models. Both under (approximate) sparsity and gradient sparsity assumptions, an important benchmark is that of a measurement matrix with independent standard Gaussian entries. Even though such measurements are hard to realize in practice, they can be interpreted as the scenario with maximal randomness, which often has particularly good recovery properties. For this reason, the recovery properties of total variation minimization have been analyzed in detail for such measurements. Interestingly, as shown by the following theorem, recovery properties in the one-dimensional case are significantly worse than for synthesis sparse signals and also for higher dimensional cases. That is why we focus on this case in Section 3, providing a geometric viewpoint and generalizing the results to subgaussian measurements.
Theorem 2.2.
[CX15] Let the entries of be i.i.d. standard Gaussian random variables and let be a solution of (3) with input data . Then
1. There exist constants , such that for
[TABLE]
2. For any , there are constants and a universal constant , such that for and . If , there exist infinitely many with , such that .
This scaling is notably different from what is typically obtained for synthesis sparsity, where the number of measurements scales linearly with $s$ up to $\log$ factors. Such a scaling is only obtained for higher dimensional signals, e.g., images. Indeed, in [CX15], it is shown that for dimensions at least two the number of Gaussian measurements sufficient for recovery is
[TABLE]
where the constant depends on the dimension.
Furthermore, as we can see in Theorem 2.5 below, this is also the scaling one obtains for dimensions larger than $1$ and Haar incoherent measurements. Thus the scaling of $\sqrt{sn}$ is a unique feature of the $1$-dimensional case. Also note that the square-root factor in the upper bound makes the result meaningless for a sparsity level on the order of the dimension. This has been addressed in [KRZ15], showing that a dimension reduction is also possible if the sparsity level is a (small) constant multiple of the dimension.
The proof of Theorem 2.2 uses Gordon’s escape through the mesh theorem [Gor88]. We will elaborate on this topic in Section 3.
In case we are given noisy measurements with , we can instead of solving (3) consider
[TABLE]
If is not exactly, but approximately sparse, and our measurements are corrupted with noise, the following result can be established.
Theorem 2.3.
[CX15] Let the entries of be i.i.d. standard Gaussian random variables and let be a solution of (5) with input data satisfying . Then for any , there are positive constants , such that for and
[TABLE]
This looks remarkably similar to the recovery guarantees obtained for compressed sensing. Note, however, that the number of measurements needs to be proportional to , which is not desirable. We will present a similar result with an improved number of measurements in Section 3.5.
Theorem 2.4.
(Corollary of Theorem 3.4) Let be such that for and with be a standard Gaussian matrix. Furthermore, set , where denotes the (bounded) error of the measurement and for some absolute constants the solution of (12) satisfies
[TABLE]
Note, however, that in contrast to Theorem 2.3, this theorem does not cover the case of gradient compressible vectors. On the other hand, Theorem 3.4 also incorporates the case of special subgaussian measurement ensembles. Also, if we set , we reach a similar conclusion as in Theorem 2.3.
2.3 Recovery from Haar-incoherent measurements
For dimensions $d \ge 2$, Needell and Ward [NW13a, NW13b] derived recovery results for measurement matrices having the restricted isometry property (RIP) when composed with the Haar wavelet transform. Here we say that a matrix $A$ has the RIP of order $k$ and level $\delta$ if for every $k$-sparse vector $x$ it holds that
$$(1 - \delta) \|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta) \|x\|_2^2.$$
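Certifying the RIP for a given matrix is computationally hard in general, but the defining inequality can at least be probed numerically. The following sketch draws random unit-norm $k$-sparse vectors and records the worst observed deviation of $\|Ax\|_2^2$ from $1$; this yields only a lower bound on the true RIP level. The function name, trial count and matrix sizes are hypothetical choices.

```python
import numpy as np

def empirical_rip_constant(A, k, trials=2000, seed=0):
    """Monte Carlo lower bound on the RIP level of order k.

    Returns the largest observed | ||A x||_2^2 - 1 | over random unit-norm
    k-sparse vectors x. The true RIP level can only be larger.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    worst = 0.0
    for _ in range(trials):
        support = rng.choice(n, size=k, replace=False)
        x = np.zeros(n)
        x[support] = rng.standard_normal(k)
        x /= np.linalg.norm(x)
        worst = max(worst, abs(np.linalg.norm(A @ x) ** 2 - 1.0))
    return worst

# Hypothetical example: a Gaussian matrix with variance-1/m entries.
rng = np.random.default_rng(1)
m, n, k = 80, 200, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
print("observed deviation:", empirical_rip_constant(A, k))
```

Random sampling of supports does not find the worst case, so a small observed value is only weak evidence that the RIP holds.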
The results of [NW13a, NW13b] build upon a connection between a signal’s wavelet representation and its total variation seminorm first noted by Cohen, Dahmen, Daubechies and DeVore [CDDD03].
Their theorems yield stable recovery via TV minimization for $N^d$ dimensional signals. For $d = 2$, notably these recovery results concern images of size $N \times N$.
Several definitions are necessary in order to state the theorem. The $d$-dimensional discrete gradient is defined via $\nabla : \mathbb{C}^{N^d} \to \mathbb{C}^{N^d \times d}$ and maps $x \in \mathbb{C}^{N^d}$ to its discrete derivative which, for each $\alpha \in [N]^d$, is a vector $(\nabla x)_\alpha \in \mathbb{C}^d$ composed of the derivatives in all $d$ directions. Up to now, we have always used the anisotropic version of the TV seminorm, which can be seen as taking the $\ell_1$ norm of the discrete gradient. The isotropic TV seminorm is defined via a combination of $\ell_2$ and $\ell_1$ norms. It is given by $\|z\|_{TV_2} := \sum_{\alpha \in [N]^d} \|(\nabla z)_\alpha\|_2$. The result in [NW13a] is given in terms of the isotropic TV seminorm but can also be formulated for the anisotropic version.
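The difference between the two seminorms is easy to see numerically: the anisotropic version sums the absolute values of all directional differences, while the isotropic version sums the Euclidean norms of the gradient vectors. The sketch below assumes forward differences with zero padding at the boundary, which is a convention of this example only; all function names are illustrative.

```python
import numpy as np

def forward_differences(x):
    """Stack the forward differences along all axes (zero at the far boundary)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros(x.shape + (x.ndim,))
    for j in range(x.ndim):
        idx = [slice(None)] * x.ndim
        idx[j] = slice(0, -1)
        out[tuple(idx) + (j,)] = np.diff(x, axis=j)
    return out

def tv_anisotropic(x):
    """l1 norm of the discrete gradient."""
    return float(np.abs(forward_differences(x)).sum())

def tv_isotropic(x):
    """Sum over all locations of the l2 norm of the gradient vector."""
    return float(np.linalg.norm(forward_differences(x), axis=-1).sum())

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 16))
print("isotropic:", tv_isotropic(img), "anisotropic:", tv_anisotropic(img))
```

Since $\|v\|_2 \le \|v\|_1 \le \sqrt{d}\,\|v\|_2$ for $v \in \mathbb{R}^d$, the two seminorms differ by at most a factor of $\sqrt{d}$, which is why results stated for one version transfer to the other up to constants.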
Furthermore, we will need to concatenate several measurement matrices in order to be able to state the theorem. This will be done via the concatenation operator , which ’stacks’ two linear maps.
Finally, we need the notion of shifted operators. For an operator , these are defined as the operators and concatenating a column of zeros to the end or beginning of the -th component, respectively.
Theorem 2.5 ([NW13a]).
Let and fix integers and . Let be a map that has the restricted isometry property of order and level if it is composed with the orthonormal Haar wavelet transform. Furthermore let with be such that has the restricted isometry property of order and level . Consider the linear operator . Then with and for all we have the following. Suppose we have noisy measurements with , then the solution to
[TABLE]
satisfies
, 2. 2.
, 3. 3.
for some absolute constants .
From the last point of the previous theorem, we see that for noiseless measurements and gradient sparse vectors , perfect recovery can be achieved provided the RIP assumption holds. Subgaussian measurement matrices, for example, will have the RIP, also when composed with the Haar wavelet transform (this is a direct consequence of rotation invariance). Moreover, as shown in [KW11], randomizing the column signs of an RIP matrix will, with high probability, also yield a matrix that has the RIP when composed with the Haar wavelet transform. An important example is a subsampled Fourier matrix with random column signs, which relates to spread spectrum MRI (cf. [PMG+12]).
2.4 Recovery from subsampled Fourier measurements
Fourier measurements are widely used in many applications. Especially in medical applications such as parallel-beam tomography and magnetic resonance imaging, it is desirable to reduce the number of samples in order to reduce the burden on patients. This is one motivation for the algorithmic checks for unique solutions of (3) introduced in Section 2.1. In this section, we consider a probabilistic approach where an incomplete measurement matrix chosen from the discrete Fourier transform on is considered. Therefore we consider a subset of the index set , where consists of integers chosen uniformly at random and, additionally, . Hence, we want to recover a signal, sparse in the gradient domain, with a measurement matrix . In [CRT06] the optimal sampling cardinality for -sparse signals in the gradient domain was given, which enables the recovery of one-dimensional signals from Fourier samples. It naturally extends to two dimensions.
Theorem 2.6.
[CRT06] With probability exceeding , a signal , which is -sparse in the gradient domain is the unique solution of (3) if
[TABLE]
As already discussed in the introduction, the proof of this result proceeds via recovering the gradient and then using that the discrete gradient (with periodic boundary conditions) is injective. Due to the poor conditioning of the gradient, however, this injectivity result does not directly generalize to recovery guarantees for noisy measurements. For two (and more) dimensions, such results can be obtained via the techniques discussed in the previous subsection.
These techniques, however, do not apply directly. Namely, the Fourier (measurement) basis is not incoherent to the Haar wavelet basis; in fact, the constant vector is contained in both, which makes them maximally coherent. As observed in [PVW11], this lack of incoherence only occurs for low frequencies; the high-frequency Fourier basis vectors exhibit small inner products with the Haar wavelet basis. This can be taken into account using a variable density sampling scheme with sampling density that is larger for low frequencies and smaller for high frequencies. For such a sampling density, one can establish the restricted isometry property for the corresponding randomly subsampled discrete Fourier matrix combined with the Haar wavelet transform with appropriately rescaled rows [KW14]. This yields the following recovery guarantee.
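A variable density sampling scheme of the kind described above can be sketched as follows. The specific density is an assumption of this sketch: probabilities proportional to $\min(C, 1/(k_1^2 + k_2^2))$ on a two-dimensional frequency grid, so that low frequencies are sampled more often. The density actually required by the theory is the one stated in the theorem of [KW14]; the function names and grid convention here are hypothetical.

```python
import numpy as np

def variable_density(N, C=1.0):
    """Sampling probabilities on the frequency grid {-N/2+1, ..., N/2}^2 (N even).

    Assumed form for illustration: p(k1, k2) proportional to
    min(C, 1 / (k1^2 + k2^2)).
    """
    k = np.arange(-N // 2 + 1, N // 2 + 1)
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    r2 = (k1 ** 2 + k2 ** 2).astype(float)
    with np.errstate(divide="ignore"):
        p = np.minimum(C, 1.0 / r2)   # 1/0 = inf at the origin, capped at C
    return k, p / p.sum()

def sample_frequencies(N, m, seed=0):
    """Draw m frequency pairs i.i.d. from the variable density."""
    k, p = variable_density(N)
    rng = np.random.default_rng(seed)
    flat = rng.choice(N * N, size=m, p=p.ravel())
    return np.stack([k[flat // N], k[flat % N]], axis=1)

freqs = sample_frequencies(N=64, m=500)
print("fraction of samples with |k|_inf <= 4:",
      np.mean(np.abs(freqs).max(axis=1) <= 4))
```

Because the samples are drawn i.i.d., the same frequency may be selected more than once.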
Theorem 2.7.
[KW14]
Fix integers and such that and
[TABLE]
Select frequencies i.i.d. according to
[TABLE]
where is an absolute constant and is chosen such that is a probability distribution.
Consider the weight vector with , and assume that the noise vector satisfies , for some . Then with probability exceeding , the following holds for all images :
Given noisy partial Fourier measurements , the estimation
[TABLE]
where denotes the Hadamard product, approximates up to the noise level and best -term approximation error of its gradient:
[TABLE]
A similar optimality result is given in [Poo15], also for noisy data and inexact sparsity. In contrast to the previous result, this result includes the one-dimensional case. The key to obtaining such a result is showing that stable gradient recovery implies stable signal recovery, i.e.,
[TABLE]
Again the sampling distribution is chosen as a combination of the uniform distribution and a decaying distribution. The main idea is to use this sampling to establish (10) via the RIP. We skip technicalities for achieving the optimality in the following theorem and refer to the original article for more details.
Theorem 2.8.
[Poo15] Let be fixed and be a minimizer of (5) with for some , , and an appropriate sampling distribution. Then with probability exceeding , it holds that
[TABLE]
where is the orthogonal projection onto a -dimensional subspace,
[TABLE]
In the two-dimensional setting the result changes to
[TABLE]
with remaining and
[TABLE]
These results are optimal since the best error one can achieve [NW13b] is .
The optimality in the latter theorems is achieved by considering a combination of uniform random sampling and variable density sampling. Uniform sampling on its own can achieve robust and stable recovery. However, the following theorem shows that the signal error is no longer optimal but the bound on the gradient error is still optimal up to log factors. Here (10) is obtained by using the Poincaré inequality.
Theorem 2.9.
[Poo15] Let be fixed and be a minimizer of (5) with for some and with random uniform sampling. Then with probability exceeding , it holds that
[TABLE]
where is the orthogonal projection onto a -dimensional subspace and .
3 TV-recovery from subgaussian measurements in 1D
In this section, we will apply the geometric viewpoint discussed in [Ver15] to the problem, which will eventually allow us to show the TV recovery results for noisy subgaussian measurements mentioned in Section 2.2.
As in the original proof of the 1D recovery guarantees for Gaussian measurements [CX15], the Gaussian mean width will play an important role in our considerations.
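Definition 3.1.
The (Gaussian) mean width of a bounded subset $K$ of $\mathbb{R}^n$ is defined as
$$w(K) := \mathbb{E} \sup_{x \in K - K} \langle g, x \rangle,$$
where $g \in \mathbb{R}^n$ is a vector of i.i.d. $\mathcal{N}(0,1)$ random variables.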
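For a finite set, the mean width can be estimated by Monte Carlo sampling. The sketch below assumes the convention $w(K) = \mathbb{E} \sup_{u \in K - K} \langle g, u \rangle$ used in [Ver15], for which the supremum over $K - K$ equals $\max_{x \in K} \langle g, x \rangle - \min_{x \in K} \langle g, x \rangle$. The function name and sample sizes are illustrative.

```python
import numpy as np

def mean_width(points, samples=5000, seed=0):
    """Monte Carlo estimate of the Gaussian mean width of a finite set K.

    `points` is an (N, n) array whose rows are the elements of K.
    Assumed convention: w(K) = E sup_{u in K - K} <g, u>.
    """
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((samples, pts.shape[1]))
    proj = g @ pts.T                    # <g, x> for every sample and point
    # sup over K - K of <g, u> equals max_x <g, x> - min_x <g, x>.
    return float(np.mean(proj.max(axis=1) - proj.min(axis=1)))

# Hypothetical example: the vertices +-e_i, whose convex hull is the l1 ball.
n = 50
K = np.vstack([np.eye(n), -np.eye(n)])
print("estimated width:", mean_width(K))
print("after translation:", mean_width(K + 3.0))
```

Since a linear functional attains its supremum over a convex hull at a vertex, the estimate for the vertex set is also an estimate for the full $\ell_1$ ball, and shifting the set leaves the estimate unchanged when the same seed is used.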
In [CX15], the mean width appears in the context of Gordon’s escape through the mesh approach [Gor88] (see Section 3.4 below), but as we will see, it will also be a crucial ingredient in applying the Mendelson small ball method [KM15, Men14].
The mean width has some nice (and important) properties. For example, it is invariant under taking the convex hull, i.e.,
$$w(\mathrm{conv}(K)) = w(K).$$
Furthermore, it is also invariant under translations of $K$, as $(K - x_0) - (K - x_0) = K - K$. Due to the rotational invariance of Gaussian random variables, that is $Ug \sim g$ for every orthogonal matrix $U$, we also have that $w(UK) = w(K)$. Also, it satisfies the inequalities
[TABLE]
which are equalities if is symmetric about $0$, because then and hence .
3.1 $M^*$ bounds and recovery
In order to highlight the importance of the Gaussian mean width in signal recovery, we present a classical result from [Ver15], the $M^*$ bound, which connects the mean width to recovery problems. Namely, recall that due to rotational invariance, the kernel of a Gaussian random matrix $A \in \mathbb{R}^{m \times n}$ is a random subspace distributed according to the uniform distribution (the Haar measure) on the Grassmannian
$$G_{n,n-m} := \{ V \subseteq \mathbb{R}^n : V \text{ is a subspace with } \dim V = n - m \}.$$
Consequently, the set of all vectors that yield the same measurements directly corresponds to such a random subspace.
The average size of the intersection of this subspace with a set reflecting the minimization objective now gives us an average bound on the worst case error.
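Theorem 3.1 ($M^*$ bound, Theorem 3.12 in [Ver15]).
Let $K$ be a bounded subset of $\mathbb{R}^n$ and $E$ be a random subspace of $\mathbb{R}^n$ of dimension $n - m$, drawn from the Grassmannian $G_{n,n-m}$ according to the Haar measure. Then
$$\mathbb{E} \, \mathrm{diam}(K \cap E) \le \frac{C \, w(K)}{\sqrt{m}},$$
where $C$ is an absolute constant.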
Given the $M^*$-bound it is now straightforward to derive bounds on reconstructions from linear observations. We first look at feasibility programs, which in turn can be used to obtain recovery results for optimization problems. For that, let $K \subset \mathbb{R}^n$ be bounded and $x \in K$ be the vector we seek to reconstruct from measurements $Ax = y$ with a Gaussian matrix $A \in \mathbb{R}^{m \times n}$.
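Corollary 3.1.
[MPTJ07] Choose $\hat{x} \in \mathbb{R}^n$, such that
$$\hat{x} \in K \quad \text{and} \quad A\hat{x} = y,$$
then one has, for an absolute constant $C'$,
$$\mathbb{E} \sup_{x \in K} \|\hat{x} - x\|_2 \le C' \frac{w(K)}{\sqrt{m}}.$$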
This corollary directly follows by choosing , observing that , and that the side constraint enforces .
Via a standard construction in functional analysis, the so-called Minkowski functional, one can now cast an optimization problem as a feasibility program so that Corollary 3.1 applies.
Definition 3.2.
The Minkowski functional of a bounded, symmetric set $K \subset \mathbb{R}^n$ is given by
$$\|x\|_K := \inf \{ t > 0 : x \in tK \}.$$
So the Minkowski functional tells us how much we have to ’inflate’ our given set $K$ in order to capture the vector $x$. Clearly, from the definition we have that if $K$ is closed
$$K = \{ x \in \mathbb{R}^n : \|x\|_K \le 1 \}.$$
If a convex set $K$ is closed and symmetric, then $\|\cdot\|_K$ defines a norm on $\mathbb{R}^n$.
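The ’inflation’ interpretation of the Minkowski functional suggests a simple numerical procedure: given only a membership test for the set, bisect on the inflation factor. The sketch below assumes the set is star shaped about the origin (so that membership of $x/t$ is monotone in $t$) and that the value does not exceed the hypothetical upper bound `hi`; the function name and parameters are illustrative.

```python
import numpy as np

def minkowski_functional(x, contains, hi=1e6, iters=100):
    """Approximate inf{t > 0 : x in t*K} by bisection.

    `contains(z)` is a membership test for K. Assumes K is star shaped
    about 0 and that the value is at most `hi`.
    """
    x = np.asarray(x, dtype=float)
    if not np.any(x):
        return 0.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if contains(x / mid):   # x lies in mid*K, so the infimum is <= mid
            hi = mid
        else:
            lo = mid
    return hi

# For the unit l1 ball the Minkowski functional is the l1 norm.
l1_ball = lambda z: np.abs(z).sum() <= 1.0
x = np.array([1.0, -2.0, 0.5])
print(minkowski_functional(x, l1_ball), "vs", np.abs(x).sum())
```

For norm balls the functional is of course available in closed form; the bisection is only useful for sets described by a membership oracle.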
Recall that a set $K$ is star shaped if there exists a point $x_0 \in K$ such that for all $x \in K$ we have $\{ t x_0 + (1 - t) x : t \in [0, 1] \} \subset K$. It is easy to see that convex sets are star shaped, while, for example, unions of subspaces are star shaped but not convex.
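For bounded, star shaped $K$, the notion of $\|\cdot\|_K$ now allows us to establish a direct correspondence between norm minimization problems and feasibility problems. With this observation, Corollary 3.1 translates to the following result.
Corollary 3.2.
For $K \subset \mathbb{R}^n$ bounded, symmetric and star shaped, let $x \in K$ and $y = Ax$. Choose $\hat{x}$, such that it solves
$$\min_{z} \|z\|_K \quad \text{such that} \quad Az = y,$$
then
$$\mathbb{E} \sup_{x \in K} \|\hat{x} - x\|_2 \le C' \frac{w(K)}{\sqrt{m}}.$$
Here $\hat{x} \in K$ is due to the fact that the minimum satisfies $\|\hat{x}\|_K \le \|x\|_K \le 1$, as $x \in K$ by assumption.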
This result directly relates recovery guarantees to the mean width; it thus remains to calculate the mean width for the sets under consideration. In the following subsections, we will discuss two cases. The first one directly corresponds to the desired signal model, namely gradient sparse vectors. These considerations are mainly of theoretical interest, as the associated minimization problem closely relates to support size minimization, which is known to be NP-hard in general. The second case considers the TV minimization problem introduced above, which then also yields guarantees for the (larger) set of vectors with bounded total variation.
Note, however, that the $M^*$-bound only gives a bound for the expected error. We can relate this result to a statement about tail probabilities using Markov’s inequality, namely
$$\mathbb{P}\Big( \sup_{x \in K} \|x - \hat{x}\|_2 > t \Big) \le \frac{1}{t} \, \mathbb{E} \sup_{x \in K} \|x - \hat{x}\|_2 \le C' \frac{w(K)}{t \sqrt{m}}.$$
In the next section we compute the mean width for the set of gradient sparse vectors, that is, we now specify the set $K$ in Corollary 3.1 to be the set of all vectors with energy bounded by one that only have a small number of jumps.
3.2 The mean width of gradient sparse vectors in 1D
Here [PV13] served as an inspiration, as the computation is very similar for the set of sparse vectors.
Definition 3.3.
The jump support of a vector $x \in \mathbb{R}^n$ is given via
$$\mathrm{Jsupp}(x) := \{ i \in [n-1] : x_{i+1} - x_i \neq 0 \}.$$
The jump support captures the positions at which a vector changes its value. With this, we now define the set
$$K_0^s := \{ x \in \mathbb{R}^n : \|x\|_2 \le 1, \ |\mathrm{Jsupp}(x)| \le s \}.$$
The set $K_0^s$ consists of all $s$-gradient sparse vectors which have $\ell_2$-norm at most one. We will now calculate the mean width of $K_0^s$ in order to apply Corollary 3.1 or 3.2.
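The jump support and membership in the set of gradient sparse vectors with bounded energy can be checked in a few lines. This is a minimal sketch using 0-based indices, so index `i` is reported when `x[i + 1] != x[i]`; the function names are illustrative.

```python
import numpy as np

def jump_support(x, tol=0.0):
    """Indices i (0-based) at which x[i + 1] differs from x[i]."""
    x = np.asarray(x, dtype=float)
    return np.flatnonzero(np.abs(np.diff(x)) > tol)

def in_gradient_sparse_ball(x, s):
    """True if x has at most s jumps and Euclidean norm at most one."""
    x = np.asarray(x, dtype=float)
    return bool(len(jump_support(x)) <= s and np.linalg.norm(x) <= 1.0)

x = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 5.0])
print("jump support:", jump_support(x))
print("member for s = 2:", in_gradient_sparse_ball(x / np.linalg.norm(x), s=2))
```

Rescaling a vector does not change its jump support, so any vector with at most `s` jumps can be normalized into the set.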
Note that we can decompose the set into smaller sets with , and . As we can’t add any jumps within the set , it is a subspace of . We can even quite easily find an orthonormal basis for it, if we define
[TABLE]
As we can align all elements of with , we see that forms an ONB of . Now, we can write all elements as by setting . The property that now enforces (ONB) that . Now, note that , so we have
[TABLE]
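The orthonormal basis used in this decomposition can be sketched numerically. The sketch assumes that the basis vectors are the normalized indicator vectors of the maximal blocks on which the signal is constant, which is one natural way to realize the construction described above; the function name and the 0-based jump indexing are choices of this example.

```python
import numpy as np

def block_basis(n, jumps):
    """Orthonormal basis of the vectors in R^n that are constant between jumps.

    `jumps` holds 0-based indices i where x[i + 1] may differ from x[i].
    Column j is the normalized indicator vector of the j-th constant block.
    """
    edges = [0] + [j + 1 for j in sorted(jumps)] + [n]
    B = np.zeros((n, len(edges) - 1))
    for col, (a, b) in enumerate(zip(edges[:-1], edges[1:])):
        B[a:b, col] = 1.0 / np.sqrt(b - a)
    return B

n, jumps = 12, [3, 7]
B = block_basis(n, jumps)
g = np.random.default_rng(0).standard_normal(n)
# Over unit-norm vectors in the span of B, <g, x> is maximized by the
# normalized projection of g, and the maximum equals ||B^T g||_2.
x_star = B @ (B.T @ g) / np.linalg.norm(B.T @ g)
print(np.allclose(B.T @ B, np.eye(B.shape[1])), g @ x_star, np.linalg.norm(B.T @ g))
```

This reduces the supremum over each piecewise constant subspace to the Euclidean norm of a vector of block averages of `g`, rescaled by the square roots of the block lengths.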
Using the decomposition , we get
[TABLE]
Now
[TABLE]
Note that is again a Gaussian random variable with mean $0$ and variance . Furthermore, the supremum over is attained, if is parallel to , so we have . Also note that has i.i.d. entries, but for different , the random vectors and may be dependent. Our task is now to calculate . As it has been shown for example in [FR13], we have that
[TABLE]
and from standard results for Gaussian concentration (cf. [PV13]), we get
[TABLE]
By noting that , we see by a union bound that
[TABLE]
For the following calculation, set . By Jensen’s inequality and rewriting the expectation, we have that
[TABLE]
Now, the previous consideration showed, that
[TABLE]
Computing the resulting integrals yields
[TABLE]
Using a standard bound for the binomial coefficients, namely $\binom{n}{k} \le \left( \frac{en}{k} \right)^k$, we see
[TABLE]
or equivalently
[TABLE]
By setting and assuming (reasonably) large , we thus get
[TABLE]
From this, we see that
[TABLE]
It follows that the Gaussian mean width of the set of gradient sparse vectors is the same as the mean width of sparse vectors due to the similar structure. If we want to obtain accuracy for our reconstruction, according to Theorem 3.1, we need to take
[TABLE]
measurements.
In Compressed Sensing, the squared mean width of the set of $s$-sparse vectors (its so-called statistical dimension) already determines the number of required measurements in order to recover a sparse signal with basis pursuit. This is the case because the convex hull of the set of sparse vectors can be embedded into the $\ell_1$-ball inflated by a constant factor. In the case of TV minimization, as we will see in the following section, this embedding yields a (rather large) constant depending on the dimension.
3.3 The extension to gradient compressible vectors needs a new approach
In the previous subsection, we considered exactly gradient sparse vectors. However searching all such vectors that satisfy is certainly not a feasible task. Instead, we want to solve the convex program
[TABLE]
with the total variation seminorm. Now if we have that , we get that
[TABLE]
with as in Section 3.2, so . As is convex, we even have . We can think of the set as ’gradient-compressible’ vectors.
In the proof of Theorem 3.3 in [CX15], the Gaussian width of the set has been calculated via a wavelet-based argument. One obtains that with being an absolute constant. In this section we illustrate that proof techniques different from the ones used in the case of synthesis sparsity are indeed necessary in order to obtain useful results. In the synthesis case, the -norm ball of radius is contained in the set of -sparse vectors inflated by a constant factor. This in turn implies that the mean width of the compressible vectors is bounded by a constant times the mean width of the -sparse vectors.
We will attempt a similar computation, that is, to find a constant such that the set is contained in the ‘inflated’ set . Then . Although this technique works well for sparse recovery, where , it fails badly in the case of TV recovery, as we will see below.
Let us start with . Now we can decompose with in an ascending manner, i.e., for all , we have that . Note that the number of such sets satisfies . Similarly as above, we now write . From this, we see that
[TABLE]
The necessary factor can be found by bounding the size of , namely
[TABLE]
From this, we see that . To see that this embedding constant is optimal, we construct a vector for which it is needed.
To simplify the discussion, suppose that and are even and . For even , the vector has unit norm, lies in for and has jump support on all of !
For a vector and an index set , we define the restriction of to by
[TABLE]
By splitting into sets and setting , we see that and in order for this to be elements of , we have to set . This follows from
[TABLE]
and no smaller inflation factor than can suffice.
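To make the obstruction concrete, here is a small sketch with a hypothetical vector. It is not necessarily the one used in the text, but it has the same three properties: unit norm, small TV seminorm, and jumps at every position. The construction (a constant vector plus an alternating perturbation of size `eps`), the choice `eps = sqrt(s / n)`, and the TV radius 2 sqrt(s) are all assumptions made for illustration.

```python
import math

def oscillating_vector(n, eps):
    """Constant vector with a small alternating perturbation, normalized.

    Hypothetical construction: entries (1 + eps * (-1)^k) / sqrt(n (1 + eps^2)),
    which has unit l2 norm whenever n is even.
    """
    scale = math.sqrt(n * (1.0 + eps * eps))
    return [(1.0 + eps * (-1) ** k) / scale for k in range(n)]

n, s = 1000, 10
x = oscillating_vector(n, math.sqrt(s / n))

norm = math.sqrt(sum(v * v for v in x))
grad = [b - a for a, b in zip(x, x[1:])]
tv = sum(abs(d) for d in grad)               # total variation seminorm
jumps = sum(1 for d in grad if d != 0.0)     # size of the jump support

print("l2 norm:", norm)
print("TV seminorm:", tv, "vs 2*sqrt(s):", 2 * math.sqrt(s))
print("jumps:", jumps, "of", n - 1, "possible positions")
```

Such a vector sits inside the TV ball although none of its gradient entries vanish, which is why writing it as a combination of gradient sparse pieces requires an inflation factor that grows with the dimension.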
So from the previous discussion, we get
Lemma 3.1.
We have the series of inclusions
[TABLE]
In view of the results obtainable for sparse vectors and the -ball, this is very disappointing, because Lemma 3.1 now implies that the width of satisfies
[TABLE]
which is highly suboptimal.
Luckily, the results in [CX15] suggest that the factor in the previous equation can be replaced by . However, they have to resort to a direct calculation of the Gaussian width of . The intuition why the Gaussian mean width can be significantly smaller than the bound given in Lemma 3.1 stems from the fact that, in order to obtain an inclusion, we need to capture all ‘outliers’ of the set, no matter how small their measure is.
3.4 Exact recovery
For exact recovery, the -bound is not suitable anymore and, as suggested in [Ver15], we will use ‘Gordon’s escape through the mesh’ in order to find conditions for exact recovery. Exact recovery for TV minimization via this approach was first considered in [CX15].
Suppose we want to recover from Gaussian measurements . Given that we want our estimator to lie in a set , exact recovery is achieved if . This is equivalent to requiring
[TABLE]
With the descent cone , we can rewrite this condition as
[TABLE]
By introducing the set , we see that if
[TABLE]
we get exact recovery. The question of when a section of a subset of the sphere with a random hyperplane is empty is answered by Gordon’s escape through the mesh.
Theorem 3.2 ([Gor88]).
Let be fixed and be drawn at random according to the Haar measure. Assume that , then with probability exceeding
[TABLE]
So we get exact recovery with high probability from a program given in Theorem 3.1 or 3.2, provided that .
Let us see how this applies to TV minimization. Suppose we are given and Gaussian measurements . Solving
[TABLE]
amounts to using the Minkowski functional of the set , which is a scaled TV ball.
In [CX15], the null space property for TV minimization given in Corollary 2.1 has been used in order to obtain recovery guarantees.
They consider the set where this condition is not met,
[TABLE]
and apply Gordon’s escape through the mesh to see that with high probability, its intersection with the kernel of is empty, thus proving exact recovery with high probability. Their estimate of the mean width of the set ,
[TABLE]
with is essentially optimal (up to logarithmic factors), as they also show that . So uniform exact recovery can only be expected for measurements.
Let us examine some connections to the previous discussion about the descent cone.
Lemma 3.2.
We have that for defined as above and , it holds that .
Proof.
Let . Then there exists a , such that . Set , then, as , we have that , or
[TABLE]
Now, by the triangle inequality and this observation, we have
[TABLE]
The last equality follows from the fact that is zero outside of the gradient support of . Multiplying both sides by gives the desired result
[TABLE]
∎
The previous lemma shows that the recovery guarantees derived from the null space property and via the descent cone are actually connected in a very simple way.
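The connection can be checked numerically. The sketch below assumes that the lemma's conclusion is the usual cone inequality: for a descent direction h at a gradient sparse x0, the l1 mass of the gradient of h off the gradient support of x0 is at most its mass on that support. The test signal, the way descent directions are generated, and all names are hypothetical.

```python
import random

def gradient(x):
    """Forward differences x[i+1] - x[i]."""
    return [b - a for a, b in zip(x, x[1:])]

def tv(x):
    """Total variation seminorm: l1 norm of the discrete gradient."""
    return sum(abs(d) for d in gradient(x))

rng = random.Random(2)
n, s = 60, 4

# Piecewise-constant x0 with s jumps (hypothetical test signal).
cuts = [0] + sorted(rng.sample(range(1, n), s)) + [n]
x0 = []
for a, b in zip(cuts, cuts[1:]):
    x0.extend([rng.gauss(0.0, 1.0)] * (b - a))
support = {i for i, d in enumerate(gradient(x0)) if d != 0.0}

worst_gap = float("-inf")
for _ in range(300):
    # Rescale a random z so that TV(z) <= TV(x0); then h = z - x0 is a
    # descent direction, because TV(x0 + h) = TV(z) <= TV(x0).
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = rng.random() * tv(x0) / tv(z)
    z = [scale * v for v in z]
    gh = gradient([zi - xi for zi, xi in zip(z, x0)])
    on = sum(abs(d) for i, d in enumerate(gh) if i in support)
    off = sum(abs(d) for i, d in enumerate(gh) if i not in support)
    worst_gap = max(worst_gap, off - on)

print("largest off-support minus on-support gradient mass:", worst_gap)
```

A non-positive value in every trial is what the triangle inequality argument predicts; a positive value beyond rounding error would indicate a bug in the sketch rather than in the lemma.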
Clearly, now if we do not intersect the set , we also do not intersect the set , which yields exact recovery for example with the same upper bounds on as for . Even more specifically, in the calculation of given in [CX15], an embedding into a slightly larger set is made. This embedding can also quite easily be done if we note that , as we showed above and .
Note that the same discussion also holds for higher dimensional signals, such that the improved numbers of measurements as given in Section 2.2 can be applied.
3.5 Subgaussian measurements
Up to this point, all our measurement matrices have been assumed to consist of i.i.d. Gaussian random variables. In this section, we relax this requirement so that subgaussian measurement matrices can also be incorporated into our framework.
Definition 3.4.
A real valued random variable is called subgaussian, if there exists a number , such that . A real valued random vector is called subgaussian, if all of its one dimensional marginals are subgaussian.
An obvious example of subgaussian random variables are Gaussian random variables, as the expectation in Definition 3.4 exists for all . Also, all bounded random variables are subgaussian.
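As an illustration of a bounded, hence subgaussian, ensemble, the following sketch compares the empirical tails of a one-dimensional marginal of a Rademacher (random sign) vector with the bound 2 exp(-t^2 / 2) from Hoeffding's inequality. The choice of Rademacher entries, the unit vector `u`, the thresholds and the sample sizes are assumptions made for this example; the bound is the standard Hoeffding estimate and is not claimed to be the exact form used in the assumptions below.

```python
import math
import random

def rademacher_tail(u, t, trials, rng):
    """Empirical P(|<eps, u>| >= t) for a vector eps of i.i.d. random signs."""
    hits = 0
    for _ in range(trials):
        val = sum(ui if rng.random() < 0.5 else -ui for ui in u)
        if abs(val) >= t:
            hits += 1
    return hits / trials

rng = random.Random(3)
n = 40
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
norm = math.sqrt(sum(v * v for v in u))
u = [v / norm for v in u]  # unit vector, so the marginal has variance 1

results = {}
for t in (1.0, 2.0, 3.0):
    empirical = rademacher_tail(u, t, 5000, rng)
    hoeffding = 2.0 * math.exp(-t * t / 2.0)
    results[t] = (empirical, hoeffding)
    print(f"t = {t}: empirical tail {empirical:.4f}, Hoeffding bound {hoeffding:.4f}")
```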
Here, we rely on results given by Tropp in [Tro15] using the results of Mendelson [KM15, Men14]. We will consider problems of the form
[TABLE]
where is supposed to be a matrix with independent subgaussian rows. Furthermore, we denote the exact solution by , i.e., . We pose the following assumptions on the distribution of the rows of .
- (M1) ,
- (M2) There exists , such that for all it holds that ,
- (M3) There is a , such that for all it holds that ,
- (M4) The constant is small.
Then the small ball method yields the following recovery guarantee (we present the version of [Tro15]).
Theorem 3.3.
Let and be a subgaussian matrix satisfying (M1)-(M4) above. Furthermore, set , where denotes the (bounded) error of the measurement. Then the solution of (12) satisfies
[TABLE]
with probability exceeding . denotes the descent cone of the set at , as defined in the previous section.
From this we see that, provided
[TABLE]
we obtain stable reconstruction of our original vector from (12). Note that the theorem is only meaningful for , as otherwise the denominator vanishes.
In the previous section, we have shown the inclusion for and hence we have that
[TABLE]
So we see that for , we obtain the bound
[TABLE]
with high probability. We conclude that, given the absolute constants , we need to set in order to obtain a meaningful result. Combining all our previous discussions with Theorem 3.3, we get
Theorem 3.4.
Let , and be a subgaussian matrix satisfying (M1)-(M4). Furthermore, set , where denotes the (bounded) error of the measurement, constants as above and . Then the solution of (12) satisfies
[TABLE]
We can for example set (for ) to obtain the bound
[TABLE]
For example for i.i.d. standard Gaussian measurements, the constant .
Note that in the case of noise-free measurements , Theorem 3.4 gives an exact recovery result for a wider class of measurement ensembles with high probability. Furthermore, with a detailed computation of one may be able to improve the number of measurements for nonuniform recovery. It also remains open whether the lower bounds of Cai and Xu for the case of Gaussian measurements can be generalized to the subgaussian case. In fact, our numerical experiments summarized in Figure 2 suggest a better scaling in the ambient dimension, around , in the average case. We consider it an interesting problem for future work to explore whether this is due to a difference between Rademacher and Gaussian matrix entries, between uniform and nonuniform recovery, or between the average and the worst case. Also, it is not clear whether the scaling is in fact or if the observed slope is just a linearization of, say, a logarithmic dependence.
4 Discussion and open problems
As the considerations in the previous sections illustrate, the mathematical properties of total variation minimization differ significantly from algorithms based on synthesis sparsity, especially in one dimension. For this reason, there are a number of questions that have been answered for synthesis sparsity, but which are still open for the framework of total variation minimization. For example, the analysis provided in [RRT12, KMR14] for deterministically subsampled partial random circulant matrices, as they are used to model measurement setups appearing in remote sensing or coded aperture imaging, could not be generalized to total variation minimization. The difficulty in this setup is that the randomness is encoded by the convolution filter, so it is not clear what the analogue of variable density sampling would be.
Another case of practical interest is that of sparse measurement matrices. Recently it has been suggested that such measurements increase efficiency in photoacoustic tomography, while at the same time, the signals to be recovered (after a suitable temporal transform) are approximately gradient sparse. This suggests the use of total variation minimization for recovery, and indeed empirically, this approach yields good recovery results [SKB+15]. Theoretical guarantees, however, (as they are known for synthesis sparse signals via an expander graph construction [BGI+08]) are not available to date for this setup.
Acknowledgements
FK and MS acknowledge support by the Hausdorff Institute for Mathematics (HIM), where part of this work was completed in the context of the HIM trimester program “Mathematics of Signal Processing”. FK and CK acknowledge support by the German Science Foundation in the context of the Emmy Noether Junior Research Group “Randomized Sensing and Quantization of Signals and Images” (KR 4512/1-1) and by the German Ministry of Research and Education in the context of the joint research initiative ZeMat. MS has been supported by the Austrian Science Fund (FWF) under Grant no. Y760 and the DFG SFB/TRR 109 “Discretization in Geometry and Dynamics”.
References
- [BDDW08] R. G. Baraniuk, M. Davenport, R. A. DeVore, and M. Wakin. A simple proof of the Restricted Isometry Property for random matrices. Constr. Approx., 28(3):253–263, 2008.
- [BGI+08] R. Berinde, A. Gilbert, P. Indyk, H. Karloff, and M. Strauss. Combining geometry and combinatorics: A unified approach to sparse signal recovery. In 46th Annual Allerton Conference on Communication, Control, and Computing, 2008, pages 798–805. IEEE, 2008.
- [CDDD03] A. Cohen, W. Dahmen, I. Daubechies, and R. DeVore. Harmonic analysis of the space BV. Rev. Mat. Iberoam., 19(1):235–263, 2003.
- [CENR10] E. J. Candès, Y. C. Eldar, D. Needell, and P. Randall. Compressed sensing with coherent and redundant dictionaries. Appl. Comput. Harmon. Anal., 31(1):59–73, 2010.
- [CRT06] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(3):489–509, 2006.
- [CX15] J.-F. Cai and W. Xu. Guarantees of total variation minimization for signal recovery. Information and Inference, 4(4):328–353, 2015.
- [Don04] D. Donoho. High-dimensional centrally-symmetric polytopes with neighborliness proportional to dimension. Technical report, Department of Statistics, Stanford University, 2004.
- [Don06] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, 2006.
