Derandomizing compressed sensing with combinatorial design
Peter Jung, Richard Kueng, Dustin G. Mixon

TL;DR
This paper shows how to reduce randomness in compressed sensing measurement designs by using structured combinatorial objects, achieving reliable sparse signal recovery with far fewer random bits per measurement.
Contribution
It introduces derandomization techniques using orthogonal arrays and mutually unbiased bases to improve measurement design in compressed sensing.
Findings
Uniform s-sparse reconstruction guarantees with $C s \,\log(n)$ measurements.
Measurements chosen from structured combinatorial designs.
Imitation of random vectors using highly structured families.
Peter Jung¹, Richard Kueng², Dustin G. Mixon³
¹ Communications and Information Theory Group, Technische Universität Berlin, Germany
² Department of Computing + Mathematical Sciences & Institute for Quantum Information and Matter, California Institute of Technology, USA
³ Department of Mathematics, Ohio State University, USA
Abstract
Compressed sensing is the art of reconstructing structured $n$-dimensional vectors from substantially fewer measurements than naively anticipated. A plethora of analytic reconstruction guarantees support this credo. The strongest among them are based on deep results from large-dimensional probability theory that require a considerable amount of randomness in the measurement design. Here, we demonstrate that derandomization techniques allow for considerably reducing the amount of randomness that is required for such proof strategies. More precisely, we establish uniform $s$-sparse reconstruction guarantees for measurements that are chosen independently from strength-four orthogonal arrays and maximal sets of mutually unbiased bases, respectively. These are highly structured families of vectors that imitate signed Bernoulli and standard Gaussian vectors in a (partially) derandomized fashion.
Index Terms: Compressed sensing, $k$-wise independence, orthogonal arrays, spherical design, derandomization
I Introduction and main results
I-A Motivation
Compressed sensing is the art of reconstructing structured signals from substantially fewer measurements than would naively be required for standard techniques like least squares. Although not entirely novel, rigorous treatments of this observation [1, 2] spurred considerable scientific attention from 2006 on, see e.g. [3, 4] and references therein. While deterministic results do exist, the strongest theoretic convergence guarantees still rely on randomness. Broadly, these can be grouped into two families:
1. generic measurements such as independent Gaussian or Bernoulli vectors. Such an abundance of randomness allows for establishing very strong results by following comparatively simple and instructive proof techniques. The downside is that concrete implementations do require a lot of randomness. In fact, they might be too random to be useful for certain applications.
2. structured measurements such as random rows of a Fourier or Hadamard matrix. In contrast to generic measurements, these feature a lot of structure that is geared towards applications. Moreover, sampling random rows from a fixed matrix requires very little randomness. E.g. $\log_2(n)$ random bits are required to sample a random DFT row, while an i.i.d. Bernoulli vector consumes $n$ bits of randomness. Structure and comparatively little randomness have a downside, however. Theoretic convergence guarantees tend to be weaker than their generic counterparts. It should also not come as a surprise that the necessary proof techniques become considerably more involved.
Typically, results of type 1) precede results of type 2). Phase retrieval via PhaseLift is a concrete example for such a development. Generic convergence guarantees [5, 6] preceded (partially) de-randomized results [7, 8]. Compressed sensing is special in this regard. The two seminal works [1, 2] from 2006 provided both results almost simultaneously. This had an interesting consequence. Despite considerable effort, to this date there still seems to be a gap between both proof techniques.
Here, we try to close this gap by applying a method that is very well established in theoretical computer science: partial derandomization. We start with a proof technique of type 1) and considerably limit the amount of randomness required for it to work. While doing so, we keep careful track of the “amount of randomness” that is still necessary. Finally, we replace the original (generic) random measurements with pseudo-random ones that mimic them in a sufficiently accurate fashion. Our results highlight that this technique almost allows for bridging the gap between existing proof techniques for generic and structured measurements: the results are still strong, but require slightly more randomness than choosing vectors uniformly from a bounded orthogonal system, such as Fourier or Hadamard vectors.
There is also a didactic angle to this work: within the realm of signal processing, partial-derandomization techniques have been successfully applied to matrix reconstruction [8, 9] and phase retrieval via PhaseLift [7, 10, 11]. Although similar in spirit, the more involved nature of these problems may obscure the key ideas, intuition and tricks behind such an approach. However, the same techniques have not yet been applied to the original problem of compressed sensing. Here, we fill this gap and, in doing so, provide an introduction to partial derandomization techniques by example. To preserve this didactic angle, we try to keep the presentation as simple and self-contained as possible.
Finally, one may argue that compressed sensing has not fully lived up to the high expectations of the community yet, see e.g. [12]. Arguably, one of the most glaring problems for applications is the requirement of choosing individual measurements at random (footnote 1: existing deterministic constructions, see e.g. [13], do not (yet) yield comparable statements). While we are not able to fully overcome this drawback here, the methods described in this work do limit the amount of randomness required to generate individual structured measurements. We believe that this may help to reduce the discrepancy between “what can be proved” and “what can be done” in a variety of concrete applications.
I-B Preliminaries on compressed sensing
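Compressed sensing aims at reconstructing $s$-sparse vectors $x \in \mathbb{C}^n$ from $m \ll n$ linear measurements:
$$y = Ax \in \mathbb{C}^m.$$
Since $m < n$, the matrix $A \in \mathbb{C}^{m \times n}$ is singular and there are infinitely many solutions to this equation. A convex penalizing function is used to promote sparsity among these solutions. Typically, this penalizing function is the $\ell_1$-norm $\|z\|_{\ell_1} = \sum_{i=1}^n |z_i|$:
$$\underset{z \in \mathbb{C}^n}{\text{minimize}} \quad \|z\|_{\ell_1} \quad \text{subject to} \quad Az = y. \tag{1}$$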
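For real-valued data, the $\ell_1$-minimization (1) is equivalent to a linear program, which is how it is typically handed to an off-the-shelf solver. The following sketch states this standard reformulation; the auxiliary vector $t$ is introduced here for illustration only and does not appear in the original text.

```latex
\begin{aligned}
&\underset{z \in \mathbb{R}^n}{\text{minimize}} \;\; \|z\|_{\ell_1}
  \quad \text{subject to} \quad Az = y \\
\Longleftrightarrow\quad
&\underset{z,\,t \in \mathbb{R}^n}{\text{minimize}} \;\; \sum_{i=1}^n t_i
  \quad \text{subject to} \quad Az = y, \quad -t_i \le z_i \le t_i \quad (1 \le i \le n).
\end{aligned}
```

At any optimum, $t_i = |z_i|$, so both programs attain the same value. In the complex-valued case the constraints $|z_i| \le t_i$ are second-order cone constraints rather than linear ones.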
Mathematical proofs for convergence to the correct solution have been established for different measurement matrices $A$. By and large, they require randomness in the sense that each row of $A$ is an independent copy of a random vector $a \in \mathbb{C}^n$. Prominent examples include
1. standard complex Gaussian measurements,
2. signed Bernoulli (Rademacher) measurements,
3. random rows of a DFT matrix,
4. for $n$ a power of two: random rows of a Hadamard matrix.
A rigorous treatment of all these cases can be found in Ref. [3]. Here, and throughout this work, $C$ denotes an absolute constant whose exact value depends on the context, but it is always independent of the problem parameters $n$, $s$ and $m$. It is instructive to compare the amount of randomness that is required to generate one instance of the random vectors in question. A random signed Bernoulli vector requires $n$ random bits (one for each coordinate), while a total of $\log_2(n)$ random bits suffice to select a random row of a Hadamard matrix. A comparison between complex standard Gaussian vectors and random Fourier vectors indicates a similar discrepancy. In summary: highly structured random vectors, like random rows of a DFT or Hadamard matrix, require exponentially fewer random bits to generate than generic random vectors, like Gaussian or signed Bernoulli vectors. Importantly, this transition from generic measurements to highly structured ones comes at a price. The number of measurements required in cases 3 and 4 scales poly-logarithmically in $n$. More sophisticated approaches allow for converting this offset into a polylogarithmic scaling in $s$ rather than $n$ [14, 15]. Another, arguably even higher, price is hidden in the proof techniques behind these results. They are considerably more involved.
The following two subsections are devoted to introducing formalisms that allow for partially de-randomizing signed Bernoulli vectors and complex standard Gaussian vectors, respectively.
I-C Partially de-randomizing signed Bernoulli vectors
Throughout this work, we endow $\mathbb{C}^n$ with the standard inner product $\langle x, y \rangle = \sum_{i=1}^n \bar{x}_i y_i$. We denote the associated (Euclidean) norm by $\|x\|_{\ell_2} = \sqrt{\langle x, x \rangle}$. Let $a \in \{\pm 1\}^n$ be a signed Bernoulli vector with coefficients $\epsilon_i \in \{\pm 1\}$ chosen independently at random (Rademacher random variables). Then,
$$\mathbb{E}\left[ a a^T \right] = \mathbb{I},$$
which is equivalent to demanding
$$\mathbb{E}\left[ \epsilon_i \epsilon_j \right] = \delta_{i,j} \quad \text{for all } 1 \le i,j \le n. \tag{2}$$
Independent sign entries are sufficient, but not necessary for this feature. Indeed, suppose that $n$ is a power of two. Then the rows $h_1,\ldots,h_n$ of a Sylvester Hadamard matrix correspond to a particular subset of $n$ sign vectors. Let $a_h$ be the random vector arising from choosing a Hadamard row uniformly at random. Then,
$$\mathbb{E}\left[ a_h a_h^T \right] = \frac{1}{n} \sum_{i=1}^n h_i h_i^T = \mathbb{I},$$
because the Hadamard rows $h_i$ are proportional to an orthonormal basis and have norm $\sqrt{n}$. This in turn implies that the coordinates of a randomly selected Hadamard matrix row obey (2), despite not being independent instances of random signs. This feature is called pairwise independence and naturally generalizes to $k \geq 2$:
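Definition 1 ($k$-wise independence).
Fix $k \geq 2$ and let $\epsilon_1,\ldots,\epsilon_n$ denote independent instances of a signed Bernoulli random variable. We call a random sign vector $a \in \{\pm 1\}^n$ $k$-wise independent, if its components $a_i$ obey
$$\mathbb{E}\left[ a_{i_1} \cdots a_{i_k} \right] = \mathbb{E}\left[ \epsilon_{i_1} \cdots \epsilon_{i_k} \right]$$
for all $k$-tuples of indices $1 \le i_1,\ldots,i_k \le n$.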
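The pairwise independence of a random Hadamard row can be checked directly for small dimensions. The following is a minimal sketch, assuming the standard Sylvester recursion; the function names `sylvester_hadamard` and `second_moment` are hypothetical helpers introduced here, not part of the original text.

```python
import math

def sylvester_hadamard(d):
    """Return the 2^d x 2^d Sylvester Hadamard matrix as a list of rows."""
    rows = [[1]]
    for _ in range(d):
        # Recursion H_{2k} = [[H_k, H_k], [H_k, -H_k]]
        rows = [r + r for r in rows] + [r + [-x for x in r] for r in rows]
    return rows

def second_moment(vectors):
    """Matrix of E[a_i a_j] for a vector a drawn uniformly from `vectors`."""
    n = len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) / len(vectors)
             for j in range(n)] for i in range(n)]

H = sylvester_hadamard(3)  # n = 8
n = len(H)
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Pairwise independence: E[a_i a_j] equals the Kronecker delta.
print(second_moment(H) == identity)
# Randomness budget: log2(n) bits select a row, n bits fill a Bernoulli vector.
print(int(math.log2(n)), "bits versus", n, "bits")
```

The same check fails for third-order products: in the Sylvester construction some coordinates are entry-wise products of two others, so a random Hadamard row is pairwise, but not 3-wise, independent.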
Explicit constructions for $k$-wise independent vectors are known for any $k$ and $n$. In this work we focus on particular constructions that rely on generalizing the following instructive example. Fix $n=3$ and consider the rows of the following matrix:
$$\begin{pmatrix} +1 & +1 & +1 \\ +1 & -1 & -1 \\ -1 & +1 & -1 \\ -1 & -1 & +1 \end{pmatrix}$$
The first two columns summarize all possible length-two combinations of $\pm 1$. The coefficients of the third column correspond to their entry-wise product. Hence, it is completely characterized by the first two. The three columns are not mutually independent. Nonetheless, each subset of two columns does mimic independent behavior: all possible length-two combinations of $\pm 1$ occur exactly once. This ensures that a randomly selected row is pairwise independent in the sense that its coefficients obey Eq. (2).
This simple example may readily be generalized. A binary orthogonal array of strength $k$ is a sign matrix such that every selection of $k$ columns contains all elements of $\{\pm 1\}^k$ an equal number of times.
Several different explicit constructions of orthogonal arrays are known. A simple counting argument yields a lower bound on the number of rows $M$. This number scales polynomially in the array strength – a potentially exponential improvement over the “full” array that lists all possible elements of $\{\pm 1\}^n$. In turn, selecting a random row only requires $\log_2(M)$ random bits and produces a random vector that is $k$-wise independent according to Definition 1. We refer to Sec. IV and Ref. [16] for a more thorough treatment of this concept.
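The defining property of a binary orthogonal array is easy to test by brute force on small examples. The sketch below assumes that each run of the array is stored as one tuple of signs; `has_strength` is a hypothetical helper written for this illustration, and the example array is the four-run, three-coordinate example built from two free signs and their product.

```python
from collections import Counter
from itertools import combinations, product

def has_strength(runs, k):
    """True if every choice of k coordinates shows each sign pattern equally often."""
    n = len(runs[0])
    target = len(runs) / 2 ** k
    for coords in combinations(range(n), k):
        counts = Counter(tuple(run[i] for i in coords) for run in runs)
        if any(counts[pattern] != target for pattern in product((1, -1), repeat=k)):
            return False
    return True

# Two free signs plus their entry-wise product: 4 runs of length n = 3.
example = [(a, b, a * b) for a, b in product((1, -1), repeat=2)]

print(has_strength(example, 2))  # pairwise independent
print(has_strength(example, 3))  # but not 3-wise independent
```

Selecting one of the four runs costs 2 random bits, whereas a fully independent sign vector of length 3 costs 3 bits.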
I-D Partially derandomizing complex standard Gaussian vectors
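Let us now discuss another general purpose tool for (partial) de-randomization. Concentration of measure implies that $n$-dimensional standard complex Gaussian vectors concentrate sharply around the complex sphere of radius $\sqrt{n}$. Hence, they behave very similarly to vectors chosen uniformly from this sphere. Such random vectors $a$ obey the following formula for any $z \in \mathbb{C}^n$ and any $t \in \mathbb{N}$:
$$\mathbb{E}\left[ |\langle a, z \rangle|^{2t} \right] = n^t \int_{\mathbb{S}^{n-1}} |\langle w, z \rangle|^{2t} \, \mathrm{d}\mu(w) = n^t \binom{n+t-1}{t}^{-1} \|z\|_{\ell_2}^{2t}.$$
Here, $\mathrm{d}\mu(w)$ denotes the uniform measure on the complex unit sphere $\mathbb{S}^{n-1}$. This formula characterizes even moments of this uniform distribution (footnote 2: for comparison, a complex standard Gaussian vector $g$ obeys $\mathbb{E}\left[ |\langle g, z \rangle|^{2t} \right] = t! \, \|z\|_{\ell_2}^{2t}$ instead). The concept of $t$-designs [17] uses this moment formula as a starting point for partial de-randomization. Roughly speaking, a $t$-design is a finite subset of $\sqrt{n}$-length vectors such that the uniform distribution over these vectors reproduces the uniform measure on the sphere up to $t$-th moments. More precisely:
Definition 2.
A set of $N$ vectors $w_1,\ldots,w_N \in \mathbb{C}^n$ with length $\sqrt{n}$ is called a (complex projective) $t$-design if a randomly chosen vector $a$ obeys for any $z \in \mathbb{C}^n$
$$\mathbb{E}\left[ |\langle a, z \rangle|^{2t} \right] = \frac{1}{N} \sum_{i=1}^N |\langle w_i, z \rangle|^{2t} = n^t \int_{\mathbb{S}^{n-1}} |\langle w, z \rangle|^{2t} \, \mathrm{d}\mu(w).$$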
(Spherical) $t$-designs were originally developed as cubature formulas for the real-valued unit sphere [17]. The concept has since been extended to other sets. A generalization to the complex projective space gives rise to Definition 2. Complex projective $t$-designs are known to exist for any $t$ and any dimension $n$, see e.g. [18, 19, 20]. However, explicit constructions for $t \geq 3$ are notoriously difficult to find. In contrast, several explicit families of 2-designs have been identified. Here, we will focus on one such family. Two orthonormal bases $\{b_1,\ldots,b_n\}$ and $\{c_1,\ldots,c_n\}$ of $\mathbb{C}^n$ are called mutually unbiased if
$$|\langle b_i, c_j \rangle|^2 = \frac{1}{n} \quad \text{for all } 1 \le i,j \le n. \tag{4}$$
A prominent example for such a basis pair are the standard basis and the Fourier, or Hadamard, basis, respectively. One can show that at most $n+1$ different orthonormal bases exist that have this property in a pairwise fashion [21, Theorem 3.5]. Such a set of $n+1$ bases is called a maximal set of mutually unbiased bases (MMUB). For instance, in $n=2$ the standard basis together with
$$\left\{ \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\} \quad \text{and} \quad \left\{ \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ i \end{pmatrix}, \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -i \end{pmatrix} \right\}$$
forms an MMUB. Importantly, MMUBs are always (proportional to) 2-designs [22]. Explicit constructions exist for any prime power dimension $n$ and one can ensure that the standard basis is always one of them. Here we point out one construction that is particularly simple if the dimension $n$ is (an odd) prime [23]: The standard basis vectors together with all vectors whose entry-wise coefficients correspond to
[TABLE]
form an MMUB. Here, $\omega$ is a primitive $n$-th root of unity. One of the two parameters singles out one of the $n$ different bases, while the other labels the corresponding basis vectors. Excluding the standard basis, this set of vectors corresponds to all time-frequency shifts of a discrete Alltop sequence [24].
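Mutual unbiasedness can be verified numerically in small prime dimensions. The sketch below assumes the usual cubic-phase form of the Alltop construction, with entries $p^{-1/2}\,\omega^{(k+\alpha)^3 + \lambda(k+\alpha)}$ and $\omega = e^{2\pi i/p}$; the exact formula displayed in the original could not be recovered, so the formula, the parameter names `alpha` and `lam`, and all function names are assumptions made for this illustration. Under this assumption, `alpha` selects the basis and `lam` labels the vector within it.

```python
import cmath
from itertools import product

def alltop_vector(p, alpha, lam):
    """Unit-norm vector with entries p^(-1/2) * omega^((k+alpha)^3 + lam*(k+alpha))."""
    entries = []
    for k in range(p):
        exponent = ((k + alpha) ** 3 + lam * (k + alpha)) % p
        entries.append(cmath.exp(2j * cmath.pi * exponent / p) / p ** 0.5)
    return entries

def overlap(u, v):
    """Squared modulus of the standard inner product <u, v>."""
    return abs(sum(x.conjugate() * y for x, y in zip(u, v))) ** 2

def is_mmub_with_standard_basis(p, tol=1e-9):
    """Check orthonormality within bases and unbiasedness across all p + 1 bases."""
    labels = list(product(range(p), repeat=2))
    for (a1, l1), (a2, l2) in product(labels, repeat=2):
        value = overlap(alltop_vector(p, a1, l1), alltop_vector(p, a2, l2))
        if a1 != a2:
            expected = 1 / p                      # different bases: unbiased
        else:
            expected = 1.0 if l1 == l2 else 0.0   # same basis: orthonormal
        if abs(value - expected) > tol:
            return False
    # Unbiased with respect to the standard basis: all entries have modulus p^(-1/2).
    return all(abs(abs(x) ** 2 - 1 / p) < tol
               for a, l in labels for x in alltop_vector(p, a, l))

print(is_mmub_with_standard_basis(5))
print(is_mmub_with_standard_basis(7))
```

The cubic phase needs a prime $p \geq 5$: for $p = 3$ the cubic term reduces to a linear one modulo 3, and the check fails.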
I-E Main results
Theorem 1 (CS from orthogonal array measurements).
Suppose that a matrix $A$ contains $m \geq C s \log(n)$ rows that are chosen independently from an orthogonal array with strength four. Then, with high probability, any $s$-sparse vector $x$ can be recovered from $y = Ax$ by means of algorithm (1).
Theorem 2 (CS from time-frequency shifted Alltop sequences).
Let $n$ be prime and suppose that $A$ contains $m \geq C s \log(n)$ rows that correspond to random time-frequency shifts of the Alltop sequence (5) in dimension $n$. Then, with high probability, any $s$-sparse vector $x$ can be recovered from $y = Ax$ by means of algorithm (1).
This result actually generalizes to measurements that are sampled from a maximal set of mutually unbiased bases (excluding the standard basis). Time-frequency shifts of the Alltop sequence are one concrete construction that applies to prime dimensions only.
Note that the cardinality of all Alltop shifts is $n^2$. Hence, $2 \log_2(n)$ random bits suffice to select a random time-frequency shift. In turn, a total of
[TABLE]
random bits are required for sampling a complete measurement matrix $A$. This number is exponentially smaller than the number of random bits required to generate a matrix with independent complex Gaussian entries. A similar comparison holds true for random signed Bernoulli matrices and rows sampled from a strength-4 orthogonal array.
Highly structured families of vectors – such as rows of a Fourier, or Hadamard matrix – require even less randomness to sample from: only $\log_2(n)$ bits are required to select such a row uniformly at random. However, existing convergence guarantees are weaker than the main results presented here. They require an order of random measurements to establish comparable results. Thus, the total number of random bits required for such a procedure scales like . Eq. (6) still establishes a logarithmic improvement in terms of sparsity.
The recovery guarantees in Theorems 1 and 2 can be readily extended to ensure stability with respect to noise corruption in the measurements and robustness with respect to violations of the model assumption of sparsity. We refer to Sec. III for details.
We also emphasize that there are results in the literature that establish compressed sensing guarantees with comparable, or even less, randomness. Obviously, deterministic constructions are the extreme case in this regard. Early results suffer from a “quadratic bottleneck”: the number of measurements $m$ must scale quadratically in the sparsity $s$. Although this obstacle was overcome, existing progress is still comparatively mild. Refs. [25, 26, 27] establish deterministic convergence guarantees for $m$ of order $s^{2-\epsilon}$, where $\epsilon > 0$ is a (very) small constant.
Closer in spirit to this work is Ref. [28]. There, the authors employ the Legendre symbol – which is well known for its pseudorandom behavior – to partially derandomize a signed Bernoulli matrix. In doing so, they establish uniform $s$-sparse recovery from measurements that require an order of random bits to generate. Compared to the main results presented here, this result gets by with less randomness, but requires more measurements. The proof technique is also very different.
To this date, the strongest de-randomized reconstruction guarantees hail from a close connection between $s$-sparse recovery and Johnson-Lindenstrauss embeddings [29, 30]. These have a wide range of applications in modern data science. Kane and Nelson [31] established a very strong partial de-randomization for such embeddings. This result may be used to establish uniform $s$-sparse recovery for measurements that require an order of random bits. This result surpasses the main results presented here in both sampling rate and randomness required.
However, this strong result follows from “reducing” the problem of $s$-sparse recovery to a (seemingly) very different problem: finding Johnson-Lindenstrauss embeddings. Such a reduction typically does not preserve problem-specific structure. In contrast, the approach presented here addresses the problem of sparse recovery directly and relies on tools from signal processing. In doing so, we maintain structural properties that are common in several applications of $s$-sparse recovery. Orthogonal array measurements, for instance, have $\pm 1$-entries. This is well-suited for the single pixel camera [32]. Alltop sequence constructions, on the other hand, have successfully been applied to stylized radar problems [33]. Both types of measurements also have the property that every entry has unit modulus. This is an important feature for the application of CDMA [34]. Having pointed out these high level connections, we want to emphasize that careful, problem specific adaptations may be required to rigorously exploit these. The framework developed here may serve as a guideline on how to achieve this goal in concrete scenarios.
II Proofs
II-A Textbook-worthy proof for real-valued compressed sensing with Gaussian measurements
This section is devoted to summarizing an elegant argument that is originally due to Rudelson and Vershynin [14], see also [35, 36, 37] for arguments that are similar in spirit. This argument only applies to $s$-sparse recovery of real-valued signals. We will generalize a similar idea to the complex case later on.
In this work we are concerned with uniform reconstruction guarantees: With high probability a single realization of the measurement matrix $A$ allows for reconstructing any $s$-sparse vector $x$ by means of $\ell_1$-regularization (1). A necessary pre-requisite for uniform recovery is the demand that no $s$-sparse vector is contained in the kernel, or nullspace, of $A$. This condition is captured by the nullspace property (NSP). Define
[TABLE]
where $\sigma_s(z)_{\ell_1}$ is the approximation error (measured in $\ell_1$-norm) one incurs when approximating $z$ with an $s$-sparse vector. A matrix $A$ obeys the NSP of order $s$ if
[TABLE]
The set defined in (7) is a subset of the unit sphere that contains all normalized $s$-sparse vectors. This justifies the informal definition of the NSP: no $s$-sparse vector is an element of the nullspace of $A$. Importantly, the NSP is not only necessary, but also sufficient for uniform recovery, see e.g. [3, Theorem 4.5]. Hence, universal recovery of $s$-sparse signals readily follows from establishing Rel. (8). The nullspace property and its relation to $s$-sparse recovery has long been somewhat folklore. We refer to Ref. [3] for a discussion of its origin.
The following powerful statement allows for exploiting generic randomness in order to establish nullspace properties. It is originally due to Gordon [38], but we utilize a more modern reformulation, see [3, Theorem 9.21].
Theorem 3 (Gordon’s escape through a mesh).
Let $A$ be a real-valued $m \times n$ standard Gaussian matrix and let $T \subseteq \mathbb{S}^{n-1}$ be a subset of the real-valued unit sphere. Define the Gaussian width $\ell(T) = \mathbb{E} \sup_{x \in T} \langle g, x \rangle$, where the expectation is over realizations of a standard Gaussian random vector $g \in \mathbb{R}^n$. Then, for $t \geq 0$ the bound
$$\inf_{x \in T} \|Ax\|_{\ell_2} \geq \sqrt{m-1} - \ell(T) - t$$
is true with probability at least $1 - \mathrm{e}^{-t^2/2}$.
This is a deep statement that connects random matrix theory to geometry: the Gaussian width is a rough measure of the size of the set $T$. Choosing $T$ to be the set defined in (7) allows us to conclude that a matrix $A$ encompassing $m$ independent Gaussian measurements is very likely to obey the $s$-NSP (8), provided that $\sqrt{m-1}$ exceeds $\ell(T)$. In order to derive an upper bound on $\ell(T)$, we may use the following inclusion
[TABLE]
see e.g. [35, Lemma 3] and [14, Lemma 4.5]. Here, $\Sigma_s$ denotes the set of all $s$-sparse vectors with unit length. In turn,
[TABLE]
because the linear function $x \mapsto \langle g, x \rangle$ achieves its maximum value at the boundary of the convex set $\mathrm{conv}(\Sigma_s)$. The right hand side of (9) is the expected supremum of a Gaussian process indexed by $\Sigma_s$. Dudley’s inequality [39], see also [3, Theorem 8.23], states
[TABLE]
where $\mathcal{N}(\Sigma_s, u)$ are covering numbers associated with the set $\Sigma_s$. They are defined as the smallest cardinality of a $u$-covering net with respect to the Euclidean distance. A volumetric counting argument bounds these covering numbers, and Dudley’s inequality therefore implies
[TABLE]
where the prefactor is an absolute constant. This readily yields the following assertion.
Theorem 4 (NSP for Gaussian measurements).
A number of $m \geq C s \log(n/s)$ independent real-valued Gaussian measurements obeys the (real-valued) $s$-NSP with high probability.
This argument is exemplary for generic proof techniques: strong results from probability theory allow for establishing close-to-optimal results in a relatively succinct fashion.
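The Gaussian width of the set of unit-norm $s$-sparse vectors can also be estimated by simulation, which gives a feeling for the sample sizes involved. The sketch below uses the fact that the supremum of $\langle g, x \rangle$ over such vectors equals the Euclidean norm of the $s$ largest entries of $g$ in modulus. The comparison value $\sqrt{2 s \ln(\mathrm{e} n/s)} + \sqrt{s}$ is a standard textbook-style bound quoted here as an assumption, and the function names are hypothetical.

```python
import math
import random

def sparse_width_estimate(n, s, trials=2000, seed=0):
    """Monte Carlo estimate of E sup <g, x> over unit-norm s-sparse x in R^n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        squares = sorted((rng.gauss(0.0, 1.0) ** 2 for _ in range(n)), reverse=True)
        # The supremum is attained by aligning x with the s largest |g_i|.
        total += math.sqrt(sum(squares[:s]))
    return total / trials

def width_upper_bound(n, s):
    """Assumed reference bound sqrt(2 s ln(e n / s)) + sqrt(s)."""
    return math.sqrt(2 * s * math.log(math.e * n / s)) + math.sqrt(s)

n, s = 200, 5
estimate = sparse_width_estimate(n, s)
print(estimate, width_upper_bound(n, s))
# Gordon-type arguments ask for roughly sqrt(m) > width, i.e. m of order width^2,
# which is far smaller than the ambient dimension n.
print(estimate ** 2, n)
```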
II-B Extending the scope to subgaussian measurements
The extended arguments presented here are largely due to Dirksen, Lecue and Rauhut [36]. Again, we will focus on the real-valued case.
Gordon’s escape through a mesh is only valid for Gaussian random matrices . Novel methods are required to extend this proof technique beyond this idealized case. Comparatively recently, Mendelson provided one by generalizing Gordon’s escape through a mesh [40, 41].
Theorem 5 (Mendelson’s small ball method, Tropp’s formulation [37]).
Suppose that $A$ is a random $m \times n$ matrix whose rows correspond to $m$ independent realizations of a random vector $a \in \mathbb{R}^n$. Fix a set $T \subseteq \mathbb{R}^n$, and define
[TABLE]
is the empirical average over independent copies of weighted by uniformly random signs . Then, for any
[TABLE]
with probability at least .
It is worthwhile to point out that for real-valued Gaussian vectors this result recovers Theorem 3 up to constants. Fix of appropriate size. Then, ensures that is constant. Moreover, reduces to the usual Gaussian width .
Mendelson’s small ball method can be used to establish the nullspace property for independent random measurements that exhibit subgaussian behavior:
[TABLE]
Signed Bernoulli vectors are a concrete example: each coordinate $\epsilon_i$ is an independent instance of a Rademacher random variable. Signed Bernoulli vectors obey
[TABLE]
Direct computation also reveals
[TABLE]
because there are 3 possible pairings of four indices.
Now, set .
An application of the Paley-Zygmund inequality then allows for bounding the parameter in Mendelson’s small ball method from below:
[TABLE]
This lower bound is constant for any .
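For signed Bernoulli vectors in small dimensions, the small-ball probability can be computed exactly by enumerating all sign patterns and compared with the Paley-Zygmund estimate. The sketch below assumes a real unit-norm vector $z$, for which $\mathbb{E}\langle a, z \rangle^2 = 1$ and $\mathbb{E}\langle a, z \rangle^4 = 3\|z\|_{\ell_2}^4 - 2\sum_i z_i^4 \le 3$, so that Paley-Zygmund gives $\Pr[|\langle a, z \rangle| \ge \xi] \ge (1-\xi^2)^2/3$ for $0 \le \xi \le 1$. The symbol $\xi$ and the function names are assumptions made for this illustration.

```python
import math
from itertools import product

def small_ball_probability(z, xi):
    """Exact P(|<a, z>| >= xi) for a uniformly random sign vector a."""
    n = len(z)
    hits = sum(1 for a in product((1, -1), repeat=n)
               if abs(sum(ai * zi for ai, zi in zip(a, z))) >= xi)
    return hits / 2 ** n

def paley_zygmund_bound(xi):
    """Lower bound (1 - xi^2)^2 / 3 for unit-norm real z and 0 <= xi <= 1."""
    return (1 - xi ** 2) ** 2 / 3

raw = [1, 2, 3, 4, 5, 6, 7, 8]
norm = math.sqrt(sum(c * c for c in raw))
z = [c / norm for c in raw]

for xi in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(xi, small_ball_probability(z, xi), paley_zygmund_bound(xi))
```

The bound does not depend on the dimension, which is what makes the corresponding parameter in Mendelson's small ball method a constant.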
Next, note that is a stochastic process that is indexed by . This process is centered () and Eq. (10) implies that it is also subgaussian (at least for any ). Moreover, readily follows from (11). Unlike Gordon’s escape through a mesh, Dudley’s inequality does remain valid for such stochastic processes with subgaussian marginals. We can now repeat the width analysis from the previous section to obtain
[TABLE]
Fixing sufficiently small, setting and inserting these bounds into Theorem 5 yields the following result.
Theorem 6 (NSP for signed Bernoulli measurements).
A matrix $A$ encompassing random signed Bernoulli measurements obeys the real-valued $s$-NSP with probability at least .
A similar result remains valid for other classes of independent measurements with subgaussian marginals (10).
II-C Generalization to complex-valued signals and partial de-randomization
The nullspace property, as well as its connection to uniform $s$-sparse recovery, readily generalizes to complex-valued $s$-sparse vectors. A similar extension applies to Mendelson’s small ball method:
Theorem 7 (Mendelson’s small ball method for complex vector spaces).
Suppose that the rows of $A$ correspond to $m$ independent copies of a random vector $a \in \mathbb{C}^n$. Fix a set $T \subseteq \mathbb{C}^n$ and define
[TABLE]
Then, for any
[TABLE]
with probability at least .
Such a generalization was conjectured by Tropp [37], but we are not aware of any rigorous proof in the literature. We provide one in Subsection V-B and believe that such an extension may be of independent interest. This extension allows for generalizing the arguments from the previous subsection to the complex-valued case.
Let us now turn to the main scope of this work: partial de-randomization. Effectively, Mendelson’s small ball method reduces the task of establishing nullspace properties to bounding the two parameters and in an appropriate fashion. A lower bound on the former readily follows from the Paley-Zygmund inequality, provided that the random vector obeys
[TABLE]
where is a constant:
[TABLE]
In contrast, establishing an upper bound on via Dudley’s inequality requires subgaussian marginals (10) (that must not depend on the ambient dimension). This implicitly imposes stringent constraints on all moments simultaneously. An additional assumption allows for considerably weakening these demands:
[TABLE]
Incoherence has long been identified as a key ingredient for developing $s$-sparse recovery guarantees. Here, we utilize it to establish an upper bound on that does not rely on subgaussian marginals.
Lemma 1.
Let be a random vector that is isotropic and incoherent. Let be the complex-valued generalization of the set defined in Eq. (7) and assume . Then,
[TABLE]
This bound only requires an appropriate scaling of the first two moments (isotropy). However, this partial derandomization comes at a price: the bound scales logarithmically in $n$ rather than $n/s$. We defer a proof of this statement to Subsection V-A below. Inserting the bounds (13) and (15) into the assertion of Theorem 7 readily yields the main technical result of this work:
Theorem 8.
Suppose that $a \in \mathbb{C}^n$ is a random vector that obeys incoherence, isotropy and the 4th moment bound. Then, choosing
[TABLE]
instances of $a$ uniformly at random results in a measurement matrix $A$ that obeys the complex-valued nullspace property of order $s$ with probability at least .
In complete analogy to the real-valued case, the complex nullspace property ensures uniform recovery of $s$-sparse vectors $x \in \mathbb{C}^n$ from linear measurements of the form $y = Ax$ via algorithm (1).
II-D Recovery guarantee for strength-four orthogonal arrays
Suppose that $a \in \{\pm 1\}^n$ is chosen uniformly from an orthogonal array with strength 4. By definition
[TABLE]
which establishes incoherence. Moreover, the components $a_i$ of $a$ obey $\mathbb{E}[a_i a_j] = \delta_{i,j}$, because 4-wise independence necessarily implies 2-wise independence. Isotropy readily follows:
[TABLE]
Finally, 4-wise independence suffices to establish the 4th moment bound. By assumption and we may thus infer
[TABLE]
Therefore, $a$ meets all the requirements of Theorem 8. The first main result then readily follows from the fact that the complex nullspace property ensures uniform recovery of all $s$-sparse signals.
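The three requirements can be checked by exhaustive enumeration on a toy array. The sketch below uses, as an illustrative assumption, the even-parity array of all sign vectors of length 6 whose entries multiply to $+1$; any 5 of its coordinates are jointly uniform, so it has strength at least 4. This array is chosen for simplicity only and saves a single random bit; it is not one of the constructions discussed in the text. The function names are hypothetical.

```python
from itertools import product

def even_parity_array(n):
    """All sign vectors of length n whose entries multiply to +1 (2**(n-1) runs)."""
    runs = []
    for head in product((1, -1), repeat=n - 1):
        last = 1
        for x in head:
            last *= x
        runs.append(head + (last,))
    return runs

def moment(runs, z, power):
    """E[<a, z>**power] for a drawn uniformly from `runs` (real z)."""
    return sum(sum(a * c for a, c in zip(run, z)) ** power for run in runs) / len(runs)

z = [1, 2, 3, 4, 5, 6]
runs = even_parity_array(len(z))
norm_sq = sum(c * c for c in z)

# Incoherence: every entry has modulus one.
print(max(abs(a) for run in runs for a in run))
# Isotropy: E<a, z>^2 equals the squared Euclidean norm of z.
print(moment(runs, z, 2), norm_sq)
# 4th moment: identical to the fully independent case, hence at most 3 * ||z||^4.
print(moment(runs, z, 4), 3 * norm_sq ** 2 - 2 * sum(c ** 4 for c in z))
```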
II-E Recovery guarantee for mutually unbiased bases
Suppose that $a$ is chosen uniformly from a maximal set of mutually unbiased bases (excluding the standard basis) whose elements are re-normalized to length $\sqrt{n}$. Random time-frequency shifts of the Alltop sequence (5) are a concrete example for such a sampling procedure, provided that the dimension $n$ is an (odd) prime.
The vector $a$ is chosen from a union of $n$ bases that are all mutually unbiased with respect to the standard basis, see Eq. (4). Together with super-normalization ($\|a\|_{\ell_2} = \sqrt{n}$) this readily establishes incoherence: $|\langle e_k, a \rangle| = 1$ for all $1 \le k \le n$ with probability one.
Next, by assumption is chosen uniformly from a union of re-scaled orthonormal bases with . Therefore, for any
[TABLE]
which establishes isotropy.
Finally, a maximal set of mutually unbiased bases – including the standard basis which we denote by – forms a 2-design according to Definition 2. For any this property ensures
[TABLE]
which implies the 4th moment bound. In summary, the random vector $a$ meets the requirements of Theorem 8. Theorem 2 then readily follows from the implications of the nullspace property for $s$-sparse recovery.
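Isotropy and the 4th moment bound can be confirmed numerically for a small prime dimension. The sketch below again assumes the cubic-phase Alltop form with entries $\omega^{(k+\alpha)^3 + \lambda(k+\alpha)}$ (vectors of length $\sqrt{p}$), which is an assumption since the original formula could not be recovered. Removing the standard basis from the 2-design identity gives $\mathbb{E}|\langle a, z \rangle|^4 = 2\|z\|_{\ell_2}^4 - \sum_i |z_i|^4$, which the code checks; this closed form is derived here and is not quoted from the text.

```python
import cmath

def alltop_measurements(p):
    """All p**2 vectors with unit-modulus entries omega**((k+alpha)**3 + lam*(k+alpha))."""
    vectors = []
    for alpha in range(p):
        for lam in range(p):
            vectors.append([
                cmath.exp(2j * cmath.pi * (((k + alpha) ** 3 + lam * (k + alpha)) % p) / p)
                for k in range(p)
            ])
    return vectors

def moment(vectors, z, t):
    """E|<a, z>|**(2t) for a drawn uniformly from `vectors`."""
    return sum(abs(sum(a.conjugate() * c for a, c in zip(v, z))) ** (2 * t)
               for v in vectors) / len(vectors)

p = 5
vectors = alltop_measurements(p)
z = [1 + 2j, -0.5j, 3, 0, 1 - 1j]
norm_sq = sum(abs(c) ** 2 for c in z)
quartic = sum(abs(c) ** 4 for c in z)

print(moment(vectors, z, 1), norm_sq)                      # isotropy
print(moment(vectors, z, 2), 2 * norm_sq ** 2 - quartic)   # 4th moment
```

Since the right-hand side never exceeds $2\|z\|_{\ell_2}^4$, the 4th moment bound holds with a dimension-independent constant.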
III Extension to noisy measurements
The nullspace property may be generalized to address two imperfections in $s$-sparse recovery simultaneously: (i) the vector $x$ may only be approximately sparse in the sense that it is well-approximated by an $s$-sparse vector, (ii) the measurements may be corrupted by additive noise: $y = Ax + e$ with $e \in \mathbb{C}^m$.
To state this generalization, we need some additional notation. For and , let be the vector that only contains the largest entries in modulus. All other entries are set to zero. Likewise, we write to denote the remainder. In particular, . A matrix obeys the robust nullspace property of order with parameters and if
[TABLE]
see e.g. [3, Definition 4.21]. This extension of the nullspace property is closely related to stable $s$-sparse recovery from noisy measurements via basis pursuit denoising:
[TABLE]
Here, denotes an upper bound on the strength of the noise corruption: . Indeed, [3, Theorem 4.22] draws the following connection: suppose that obeys the robust nullspace property with parameters . Then, the solution to (16) is guaranteed to obey
[TABLE]
where and . The first term on the r.h.s. vanishes if $x$ is exactly $s$-sparse and remains small if $x$ is well approximated by an $s$-sparse vector. The second term scales linearly in the noise bound and vanishes in the absence of any noise corruption.
In the previous section, we have established the classical nullspace property for measurements that are chosen independently from a vector distribution that is isotropic, incoherent and obeys a bound on the 4th moments. This argument may readily be extended to establish the robust nullspace property with relatively little extra effort. To this end, define the set
[TABLE]
A moment of thought reveals that the matrix obeys the robust nullspace property with parameters if
[TABLE]
What is more, the following inclusion formula is also valid:
[TABLE]
see [35, Lemma 3] and [14, Lemma 4.5]. This ensures that the bounds on the parameters in Mendelson’s small ball method generalize in a rather straightforward fashion. Isotropy, incoherence and the 4th moment bound ensure
[TABLE]
Now, suppose that subsumes independent copies of the random vector , where is sufficiently large. Then, Theorem 7 readily asserts
[TABLE]
with probability at least . Previously, we employed Mendelson’s small ball method to simply assert that a similar infimum is strictly positive. Eq. (19) provides a strictly positive lower bound with comparable effort. Comparing this relation to Eq. (18) highlights that this is enough to establish the robust nullspace property with parameters and with high probability. In turn, a stable generalization of the main recovery guarantee follows from Eq. (17).
Theorem 9.
Fix and . Suppose that we sample independent copies of an isotropic, incoherent random vector that also obeys the 4th moment bound. Then, with probability at least , the resulting measurement matrix allows for stable, uniform recovery of (approximately) s-sparse vectors. More precisely, the solution to (16) is guaranteed to obey
[TABLE]
where depend only on .
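The approximation term in bounds of this type is governed by the remainder that is left after keeping the s largest entries in modulus. The following minimal sketch computes this splitting and the resulting term. The function names `best_s_term` and `approximation_term`, as well as the normalization by the square root of s (taken from the standard form in [3, Theorem 4.22]), are assumptions made for illustration.

```python
import numpy as np


def best_s_term(x, s):
    """Split x into its s largest entries (in modulus) and the remainder."""
    x = np.asarray(x)
    x_s = np.zeros_like(x)
    if s > 0:
        keep = np.argsort(np.abs(x))[-s:]  # indices of the s largest moduli
        x_s[keep] = x[keep]
    return x_s, x - x_s


def approximation_term(x, s):
    """l1-norm of the remainder divided by sqrt(s) (assumed normalization)."""
    _, x_c = best_s_term(x, s)
    return np.linalg.norm(x_c, 1) / np.sqrt(s)


x = np.array([3.0, -1.0, 0.5, -4.0])
x_s, x_c = best_s_term(x, 2)
print(x_s, x_c, approximation_term(x, 2))
```

For an exactly s-sparse vector the remainder is zero, so the approximation term vanishes, in line with the discussion of the error bound above.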
IV Numerical experiments
In this part, we demonstrate the performance that can be achieved with our proposed derandomized constructions and compare it to generic measurement matrices (Gaussian, signed Bernoulli). Since the orthogonal array construction is more involved, we first provide additional details relevant for the numerical experiments.
IV-A Details on orthogonal arrays
An orthogonal array of strength , with factors and levels is an array of different symbols such that in any columns every ordered -tuple occurs in exactly rows. Arrays with are called simple. A comprehensive treatment can be found in the book [16]. Known arrays are listed in several libraries (for example http://neilsloane.com/oadir/ or http://pietereendebak.nl/oapage/). Often the symbol alphabet is not relevant, but we use the set for concreteness. Such arrays can be represented as a matrix in . For with prime the simple orthogonal array is linear if the rows of the matrix form a vector space over . The runs of an orthogonal array (the rows of the corresponding matrix) can also be interpreted as codewords of a code and vice versa. The array is linear if and only if the corresponding code is linear [16, Chapter 4]. This relationship allows one to employ classical code constructions to construct orthogonal arrays.
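The defining property can be checked by brute force on small examples. The sketch below assumes the usual notation (N runs, k factors, q levels, strength t, index N / q^t), which is not taken from the text above. The function name `oa_index` is hypothetical. The example array is the binary even-weight code of length 4, used here only as a small illustration of the correspondence between linear codes and linear arrays.

```python
from collections import Counter
from itertools import combinations, product


def oa_index(rows, levels, strength):
    """Return the index if `rows` is an orthogonal array of the given strength.

    Every choice of `strength` columns must contain each ordered tuple of
    symbols equally often. Returns None if this fails.
    """
    n_runs, n_factors = len(rows), len(rows[0])
    index, remainder = divmod(n_runs, levels ** strength)
    if remainder or index == 0:
        return None
    for cols in combinations(range(n_factors), strength):
        counts = Counter(tuple(row[c] for c in cols) for row in rows)
        all_present = len(counts) == levels ** strength
        if not all_present or set(counts.values()) != {index}:
            return None
    return index


# Runs = codewords of the binary even-weight code of length 4 (8 runs, 4 factors).
even_weight = [r for r in product((0, 1), repeat=4) if sum(r) % 2 == 0]
print(oa_index(even_weight, levels=2, strength=3))  # index 1
print(oa_index(even_weight, levels=2, strength=4))  # None: 8 runs cannot cover 16 tuples
```

The check enumerates all column subsets and is therefore only practical for small arrays. For linear arrays the strength can instead be read off from the minimum distance of the dual code.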
IV-B Counting bits
In this work we propose to generate sampling matrices by selecting rows at random from an orthogonal array , then removing the bias (subtracting per component) and scaling appropriately. Intuitively, bits are then required to specify such a matrix . For and , a classical lower bound due to Rao [42] demands
[TABLE]
Arrays that saturate this bound are called tight (or complete). In summary, an order of bits are required to sample a matrix with rows according to this procedure.
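The bit count can be made concrete with a short calculation. The sketch below uses the general textbook form of Rao's bound for k factors, q levels and strength t, which may be stated differently from the specialized display above. The helper names `rao_bound` and `seed_bits` are hypothetical, and `seed_bits` assumes that each row is specified by the index of one run.

```python
from math import ceil, comb, log2


def rao_bound(factors, levels, strength):
    """Rao's lower bound on the number of runs of an orthogonal array."""
    u, odd = divmod(strength, 2)
    bound = sum(comb(factors, i) * (levels - 1) ** i for i in range(u + 1))
    if odd:
        bound += comb(factors - 1, u) * (levels - 1) ** (u + 1)
    return bound


def seed_bits(num_rows, runs):
    """Bits needed to pick `num_rows` runs independently out of `runs`."""
    return num_rows * ceil(log2(runs))


print(rao_bound(factors=16, levels=2, strength=4))  # 1 + 16 + 120 = 137
print(seed_bits(num_rows=10, runs=2048))            # 10 * 11 = 110
```

For comparison, a generic signed Bernoulli matrix with the same number of rows and 16 columns requires one bit per entry, i.e. 160 bits in this toy setting. The gap widens as the number of factors grows, because the bound only forces the number of runs to grow polynomially in the number of factors.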
IV-C Strength- Constructions
For compressed sensing applications we want arrays with a large number of factors, since this corresponds to the ambient dimension of the sparse vectors to recover. On the other hand, the run size should scale “moderately” so that the random matrices can be described with only a few bits. Most constructions use an existing orthogonal array as a seed to construct larger arrays. Known binary arrays of strength are for example the simple array , or . Ref. [43] proposes an algorithm that uses a linear orthogonal array as a seed to construct a linear orthogonal array . This procedure may then be iterated.
IV-D Numerical results for orthogonal arrays
Figure 1 summarizes the empirical performance of basis pursuit (1) from independent orthogonal array measurements. We consider real-valued signals and quantify the performance in terms of the normalized -recovery error (NMSE). To construct the orthogonal array, algorithm [43] is applied twice .
The rows are uniformly sampled from this array, i.e. the sampling matrix has entries (mapping ) and size . Note that, in the case of non-negative sparse vectors, the corresponding 0/1-matrices may be used instead to recover with non-negative least-squares [44]. The sparsity of the unknown vector has been varied between . For each sparsity many experiments are performed to compute NMSE. In each run, the support of the unknown vector has been chosen uniformly at random and the values are independent instances of a standard Gaussian random variable. For comparison, we have also included the corresponding performances of a generic sampling matrix (signed Bernoulli) of the same size. Numerically, the partially derandomized orthogonal array construction achieves essentially the same performance as its generic counterpart.
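A single run of such an experiment can be sketched as follows. This is not the construction used for Figure 1: instead of the algorithm from [43], the sketch takes the codewords of the Reed-Muller code RM(2,4) as a linear orthogonal array with 2048 runs and 16 factors. Its dual code RM(1,4) has minimum distance 8, so the array has strength 7 and in particular strength 4. The mapping of symbols to signs, the squared-error definition of the NMSE, the problem sizes, and all function names are assumptions made for illustration. Basis pursuit is solved as a linear program with `scipy.optimize.linprog`.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def rm24_generator():
    """Generator matrix (11 x 16) of the Reed-Muller code RM(2, 4) over F_2."""
    points = np.array(list(itertools.product([0, 1], repeat=4)))  # 16 x 4
    rows = [np.ones(16, dtype=int)]                        # constant monomial
    rows += [points[:, i] for i in range(4)]               # linear monomials
    rows += [points[:, i] * points[:, j]                   # quadratic monomials
             for i, j in itertools.combinations(range(4), 2)]
    return np.array(rows)


def sample_oa_matrix(m, rng):
    """Draw m runs uniformly at random and map symbols 0/1 to +1/-1."""
    G = rm24_generator()
    messages = rng.integers(0, 2, size=(m, G.shape[0]))
    runs = (messages @ G) % 2          # uniformly random codewords = runs
    return 1 - 2 * runs                # assumed mapping: 0 -> +1, 1 -> -1


def basis_pursuit(A, y):
    """Minimize ||x||_1 subject to A x = y via the split x = u - v, u, v >= 0."""
    n = A.shape[1]
    res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]


def nmse(x_hat, x):
    """Normalized squared l2 error (assumed definition)."""
    return np.sum(np.abs(x_hat - x) ** 2) / np.sum(np.abs(x) ** 2)


rng = np.random.default_rng(0)
n, m, s = 16, 12, 2
A = sample_oa_matrix(m, rng)
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
x_hat = basis_pursuit(A, A @ x)
print("NMSE:", nmse(x_hat, x))
```

Averaging the NMSE over many such runs and over a range of sparsities gives one curve of the kind shown in the figure. Replacing `sample_oa_matrix` by i.i.d. random signs yields the generic baseline.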
IV-E Numerical results for the Alltop design
Figure 1 shows the NMSE achieved for measurement matrices based on subsampling from an Alltop-design (5). The data is obtained in the same way as above, but the sparse vectors are generated as i.i.d. complex-normal distributed on the support. For comparison, the results for a (complex) standard Gaussian sampling matrix are included as well. Again, the performance of random Alltop-design measurements essentially matches its generic (Gaussian) counterpart.
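The structure behind these measurements can be checked numerically in a small dimension. The sketch below uses one standard form of the Alltop construction for a prime dimension p of at least 5, with vectors indexed by a shift a and a modulation b. This parametrization is an assumption and may differ from Eq. (5) by a relabeling. The sketch verifies mutual unbiasedness, checks the 2-design property through the frame potential, and draws a measurement matrix from the flat Alltop vectors. The rescaling by the square root of p and the exclusion of the standard basis from the sampling step are likewise assumptions.

```python
import numpy as np


def alltop_vectors(p):
    """Rows a*p + b hold v_{a,b}(x) = exp(2*pi*i*((x+a)^3 + b*(x+a))/p) / sqrt(p)."""
    x = np.arange(p)
    vectors = []
    for a in range(p):
        for b in range(p):
            t = (x + a) % p
            phase = (t ** 3 + b * t) % p
            vectors.append(np.exp(2j * np.pi * phase / p) / np.sqrt(p))
    return np.array(vectors)  # shape (p*p, p); basis a occupies rows a*p .. a*p+p-1


p = 5
alltop = alltop_vectors(p)
mub = np.vstack([alltop, np.eye(p)])          # p + 1 bases including the standard basis
overlaps = np.abs(mub.conj() @ mub.T) ** 2    # squared moduli of all inner products

# Within a basis: orthonormal. Across bases: squared overlap 1/p.
same_basis = np.kron(np.eye(p + 1), np.ones((p, p))).astype(bool)
print(np.allclose(overlaps[same_basis].reshape(-1, p), np.tile(np.eye(p), (p + 1, 1))))
print(np.allclose(overlaps[~same_basis], 1 / p))

# 2-design check: the frame potential attains the Welch-type value 2 p (p + 1).
print(np.isclose(np.sum(overlaps ** 2), 2 * p * (p + 1)))

# Measurement matrix: m rows drawn uniformly from the Alltop vectors.
rng = np.random.default_rng(0)
m = 4
A = np.sqrt(p) * alltop[rng.integers(0, p * p, size=m)]
```

With this rescaling every entry of `A` has unit modulus, so the rows are incoherent in the same sense as signed Bernoulli vectors.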
V Additional proofs
V-A Proof of Lemma 1
The inclusion remains valid in the complex case. Moreover, every necessarily obeys
[TABLE]
because the maximum value of a convex function over a convex set is achieved at the boundary. Hölder’s inequality therefore implies
[TABLE]
where . Moreover,
[TABLE]
and we may bound both expressions on the r.h.s. independently. For the first term, fix and use Jensen’s inequality (the logarithm is a concave function) to obtain
[TABLE]
Monotonicity and non-negativity of the exponential function then imply
[TABLE]
where we have also used that all ’s and ’s are independent. The remaining moment generating functions can be bounded individually. Fix , and and exploit the Rademacher randomness to infer
[TABLE]
because . Incoherence moreover ensures . This ensures that the remaining expectation value is upper-bounded by . Inserting these individual bounds into the expression above yields
[TABLE]
for any . Choosing is feasible and minimizes this upper bound. A completely analogous bound can be derived for the expected maximum absolute value of the imaginary part. Combining both yields
[TABLE]
and inserting this bound into Eq. (21) ensures
[TABLE]
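The chain of arguments used above (Jensen's inequality, independence, a bound on the Rademacher moment generating function, and optimization over the free parameter) is a standard template. The following display illustrates it in generic notation that is an assumption of this sketch: $X_1,\ldots,X_N$ are real random variables of the form $X_k = \sum_i \epsilon_i a_{k,i}$ with independent Rademacher signs $\epsilon_i$, fixed real coefficients obeying $\sum_i a_{k,i}^2 \leq \sigma^2$, and $\theta > 0$. The constants of Lemma 1 are not reproduced.

```latex
% Jensen's inequality (concavity of the logarithm) and a union-type bound:
\mathbb{E} \max_{1 \leq k \leq N} X_k
  \leq \frac{1}{\theta} \log \mathbb{E} \exp\Big( \theta \max_{k} X_k \Big)
  \leq \frac{1}{\theta} \log \sum_{k=1}^{N} \mathbb{E} \exp( \theta X_k ) .

% Rademacher moment generating function for a single summand:
\mathbb{E} \exp( \theta \epsilon_i a_{k,i} ) = \cosh( \theta a_{k,i} )
  \leq \exp\big( \theta^2 a_{k,i}^2 / 2 \big) .

% Independence of the signs, followed by optimization over \theta:
\mathbb{E} \max_{1 \leq k \leq N} X_k
  \leq \frac{\log N}{\theta} + \frac{\theta \sigma^2}{2}
  = \sigma \sqrt{2 \log N}
  \quad \text{for } \theta = \frac{\sqrt{2 \log N}}{\sigma} .
```

In the complex setting treated above, this template is applied separately to the real and imaginary parts, which is why the two contributions are combined at the end.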
V-B Proof of Theorem 7
The proof is based on rather straightforward modifications of Tropp’s proof for Mendelson’s small ball method [37]. Let be a complex-valued random vector. Suppose that are independent copies of and let be the matrix whose rows correspond to these vectors. The goal is to obtain a lower bound on where is an arbitrary, but fixed, set. First, note that and norms on are related via . For fixed this ensures
[TABLE]
Next, we fix arbitrary and introduce the indicator function which obeys for all . Consequently, is upper-bounded by
[TABLE]
Also, note that the expectation value of each summand obeys
[TABLE]
according to the union bound. The last line follows from the following observation. Let be a complex number. Then, necessarily implies either , or (or both). Now, define
[TABLE]
and note that the estimate from above ensures
[TABLE]
Adding and subtracting to Eq. (22) and taking the infimum yields
[TABLE]
Here we have applied Eq. (23) to the first term. Since features both a real and imaginary part and we can split up the remaining supremum accordingly. The suprema over real and imaginary parts individually correspond to
[TABLE]
and we denote them by and , respectively. The vectors are independent copies of . The bounded difference inequality [45, Section 6.1] asserts that both expressions concentrate around their expectation. More precisely, for any
[TABLE]
Therefore, the union bound grants a transition from to with probability at least . These expectation values can be further simplified. Define the soft indicator function
[TABLE]
which obeys for all . Moreover, is a contraction, i.e. a real-valued function with Lipschitz constant one that also obeys . Rademacher symmetrization [3, Lemma 8.4] and the Rademacher comparison principle [46, Eq. (4.20)] yield
[TABLE]
where . A completely analogous bound holds true for . Inserting both bounds into Eq. (24) establishes
[TABLE]
with probability at least . Setting establishes the claim.
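For comparison, the real-valued statement from [37] that the proof above adapts can be written as follows. The notation ($\varphi$ for the random vector, $E$ for the set, $Q_\xi$ for the marginal tail function, $W_m$ for the mean empirical width, and the parameters $\xi, t > 0$) follows common usage and is an assumption of this sketch. The constants of the complex-valued Theorem 7 differ and are not reproduced here.

```latex
% Marginal tail function and mean empirical width:
Q_{\xi}(E; \varphi) = \inf_{u \in E} \Pr\big[ | \langle \varphi, u \rangle | \geq \xi \big],
\qquad
W_m(E; \varphi) = \mathbb{E} \sup_{u \in E} \langle h, u \rangle,
\quad
h = \frac{1}{\sqrt{m}} \sum_{i=1}^{m} \epsilon_i \varphi_i .

% Mendelson's small ball method (real-valued version):
\inf_{u \in E} \Big( \sum_{i=1}^{m} | \langle \varphi_i, u \rangle |^2 \Big)^{1/2}
  \geq \xi \sqrt{m} \, Q_{2\xi}(E; \varphi) - 2 W_m(E; \varphi) - \xi t
\quad \text{with probability at least } 1 - \mathrm{e}^{-t^2/2} .
```

The complex-valued proof follows the same three steps: a comparison of norms combined with an indicator lower bound, concentration via the bounded difference inequality, and symmetrization together with the contraction principle. The only structural change is that real and imaginary parts are treated separately and combined with a union bound.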
Acknowledgements
This work can be seen as a continuation of the research program that David Gross devised for RK’s doctoral studies. PJ is supported by DFG grant JU 2795/3 and DAAD grant 57417688. RK was in part supported by Joel A. Tropp under ONR Award No. N00014-17-12146 and also acknowledges funding provided by the Institute of Quantum Information and Matter, an NSF Physics Frontiers Center (NSF Grant PHY-1733907). DGM was partially supported by AFOSR FA9550-18-1-0107, NSF DMS 1829955, and the Simons Institute for the Theory of Computing.
References
- [1] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
- [2] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489–509, 2006.
- [3] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, ser. Applied and Numerical Harmonic Analysis. Birkhäuser/Springer, New York, 2013.
- [4] Y. C. Eldar and G. Kutyniok, Compressed sensing: Theory and Applications. Cambridge University Press, 2012.
- [5] E. J. Candès, T. Strohmer, and V. Voroninski, “Phaselift: exact and stable signal recovery from magnitude measurements via convex programming,” Commun. Pure Appl. Math., vol. 66, pp. 1241–1274, 2013.
- [6] E. Candès and X. Li, “Solving quadratic equations via Phase Lift when there are about as many equations as unknowns,” Found. Comput. Math., pp. 1–10, 2013.
- [7] D. Gross, F. Krahmer, and R. Kueng, “A partial derandomization of phaselift using spherical designs,” J. Fourier Anal. Appl., vol. 21, no. 2, pp. 229–266, 2015.
- [8] R. Kueng, H. Rauhut, and U. Terstiege, “Low rank matrix recovery from rank one measurements,” Appl. Comput. Harmon. Anal., vol. 42, no. 1, pp. 88–116, 2017.
