Derandomizing compressed sensing with combinatorial design

Peter Jung; Richard Kueng; Dustin G. Mixon

arXiv:1812.08130·cs.IT·December 20, 2018

Open Access

TL;DR

This paper shows how to reduce the randomness required by compressed sensing measurement designs: measurements drawn from structured combinatorial objects achieve uniform sparse recovery guarantees while requiring far fewer random bits than generic Gaussian or Bernoulli measurements.

Contribution

It introduces derandomization techniques using orthogonal arrays and mutually unbiased bases to improve measurement design in compressed sensing.

Findings

1. Uniform $s$-sparse reconstruction guarantees with $Cs\log(n)$ measurements.

2. Measurements chosen from structured combinatorial designs.

3. Imitation of random vectors using highly structured families.

Equations

$\mathbf{y}=\mathbf{A}\mathbf{x}\in\mathbb{C}^{m}.$

$\underset{\mathbf{z}\in\mathbb{C}^{n}}{\textrm{minimize}}\;\|\mathbf{z}\|_{\ell_{1}}\quad\textrm{subject to}\quad\mathbf{A}\mathbf{z}=\mathbf{y}.$

$\mathbb{E}\left[\epsilon_{i}\bar{\epsilon}_{j}\right]=\mathbb{E}\left[\epsilon_{i}\epsilon_{j}\right]=\delta_{ij}$

$\mathbb{E}\left[\langle\mathbf{y},\mathbf{a}_{sb}\rangle\langle\mathbf{a}_{sb},\mathbf{z}\rangle\right]=\langle\mathbf{y},\mathbf{z}\rangle$

$\mathbb{E}\left[\langle\mathbf{y},\mathbf{a}_{h}\rangle\langle\mathbf{a}_{h},\mathbf{z}\rangle\right]=\frac{1}{n}\sum_{i=1}^{n}\langle\mathbf{y},\mathbf{h}_{i}\rangle\langle\mathbf{h}_{i},\mathbf{z}\rangle=\langle\mathbf{y},\mathbf{z}\rangle\quad\forall\,\mathbf{y},\mathbf{z},$

$\mathbb{E}\left[\prod_{j=1}^{k}a_{i_{j}}\right]=\mathbb{E}\left[\prod_{j=1}^{k}\epsilon_{i_{j}}\right]$

$\left(\begin{array}{rrrr}1&1&-1&-1\\ 1&-1&1&-1\\ 1&-1&-1&1\end{array}\right)$

$\mathbb{E}\left[|\langle\mathbf{z},\mathbf{a}_{s}\rangle|^{2k}\right]=n^{k}\int_{\mathbb{S}^{n-1}}|\langle\mathbf{z},\mathbf{w}\rangle|^{2k}\,\mathrm{d}\mathbf{w}=n^{k}\binom{n+k-1}{k}^{-1}\|\mathbf{z}\|_{\ell_{2}}^{2k}$

$\mathbb{E}\left[|\langle\mathbf{z},\mathbf{a}_{(t)}\rangle|^{2k}\right]=n^{k}\binom{n+k-1}{k}^{-1}\|\mathbf{z}\|_{\ell_{2}}^{2k}\quad\forall\,\mathbf{z}\in\mathbb{C}^{n}.$

$|\langle\mathbf{b}_{i},\mathbf{c}_{j}\rangle|^{2}=\frac{1}{n}\quad\textrm{for all}\quad i,j\in[n]=\left\{1,\ldots,n\right\}.$

$\frac{1}{\sqrt{2}}\left(\begin{array}{c}1\\ 1\end{array}\right),\;\frac{1}{\sqrt{2}}\left(\begin{array}{c}1\\ -1\end{array}\right),\;\frac{1}{\sqrt{2}}\left(\begin{array}{c}1\\ i\end{array}\right),\;\frac{1}{\sqrt{2}}\left(\begin{array}{c}1\\ -i\end{array}\right)$

$\left[\mathbf{b}_{\alpha,\lambda}\right]_{k}=\frac{1}{\sqrt{n}}\,\omega_{n}^{(k+\alpha)^{3}+\lambda(k+\alpha)}$

$2\log_{2}(n)\,m\simeq 2Cs\log^{2}(n)$

$T_{s}=\left\{\mathbf{z}\in\mathbb{S}^{n-1}:\;\|\mathbf{z}\|_{\ell_{1}}\geq 2\sigma_{s}(\mathbf{z})\right\}\subset\mathbb{S}^{n-1},$

$\inf_{\mathbf{z}\in T_{s}}\|\mathbf{A}\mathbf{z}\|_{\ell_{2}}>0.$

$\inf_{\mathbf{z}\in E}\|\mathbf{A}\mathbf{z}\|_{\ell_{2}}\geq\sqrt{m-1}-\ell(E)-t$

$T_{s}\subset 2\,\mathrm{conv}\left(\Sigma_{s}^{n}\right),$

$\ell(T_{s})\leq 2\,\mathbb{E}\sup_{\mathbf{z}\in\mathrm{conv}(\Sigma_{s}^{n})}\langle\mathbf{a}_{g},\mathbf{z}\rangle=2\,\mathbb{E}\sup_{\mathbf{z}\in\Sigma_{s}^{n}}\langle\mathbf{a}_{g},\mathbf{z}\rangle,$

$\mathbb{E}\sup_{\mathbf{z}\in\Sigma_{s}^{n}}\langle\mathbf{a}_{g},\mathbf{z}\rangle\leq 4\sqrt{2}\int_{0}^{1}\sqrt{\ln\left(\mathcal{N}(\Sigma_{s}^{n},\|\cdot\|_{\ell_{2}},u)\right)}\,\mathrm{d}u$

$\ell(T_{s})\leq c\sqrt{s\log(\mathrm{e}n/s)},$

$Q_{\xi}(\mathbf{a},E)=\inf_{\mathbf{z}\in E}\mathrm{Pr}\left[|\langle\mathbf{a},\mathbf{z}\rangle|\geq\xi\right],\qquad W_{m}(\mathbf{a},E)=\mathbb{E}\sup_{\mathbf{z}\in E}\langle\mathbf{z},\mathbf{h}\rangle\quad\textrm{with}\quad\mathbf{h}=\frac{1}{\sqrt{m}}\sum_{i=1}^{m}\epsilon_{i}\mathbf{a}_{i}$

$\inf_{\mathbf{z}\in E}\|\mathbf{A}\mathbf{z}\|_{\ell_{2}}\geq\xi\sqrt{m}\,Q_{2\xi}(\mathbf{a},E)-2W_{m}(\mathbf{a},E)-\xi t$

$\mathbb{E}\exp\left(\theta\langle\mathbf{y},\mathbf{a}\rangle\right)\leq\exp\left(\frac{\theta^{2}}{2}\|\mathbf{y}\|_{\ell_{2}}^{2}\right)\quad\textrm{for all}\quad\mathbf{y}\in\mathbb{R}^{n},\;\theta>0.$

$\mathbb{E}\left[\langle\mathbf{z},\mathbf{a}_{sb}\rangle^{2}\right]=\sum_{i,j=1}^{n}z_{i}z_{j}\,\mathbb{E}\left[\epsilon_{i}\epsilon_{j}\right]=\|\mathbf{z}\|_{\ell_{2}}^{2}$

$\mathbb{E}\left[\langle\mathbf{z},\mathbf{a}_{sb}\rangle^{4}\right]=\sum_{i,j,k,l=1}^{n}z_{i}z_{j}z_{k}z_{l}\,\mathbb{E}\left[\epsilon_{i}\epsilon_{j}\epsilon_{k}\epsilon_{l}\right]=3\sum_{i\neq j}z_{i}^{2}z_{j}^{2}+\sum_{i=1}^{n}z_{i}^{4}=3\|\mathbf{z}\|_{\ell_{2}}^{4}-2\sum_{i=1}^{n}z_{i}^{4}\leq 3\|\mathbf{z}\|_{\ell_{2}}^{4}$

$Q_{2\xi}(\mathbf{a}_{sb},T_{s})\geq\inf_{\mathbf{z}\in\mathbb{S}^{n-1}}\mathrm{Pr}\left[|\langle\mathbf{a}_{sb},\mathbf{z}\rangle|\geq 2\xi\right]=\inf_{\mathbf{z}\in\mathbb{S}^{n-1}}\mathrm{Pr}\left[\langle\mathbf{a}_{sb},\mathbf{z}\rangle^{2}\geq 4\xi^{2}\,\mathbb{E}\left[\langle\mathbf{a}_{sb},\mathbf{z}\rangle^{2}\right]\right]\geq\frac{(1-4\xi^{2})^{2}}{3}$

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Blind Source Separation Techniques · Electrical and Bioimpedance Tomography

Full text

Derandomizing compressed sensing with combinatorial design

Peter Jung (1), Richard Kueng (2), Dustin G. Mixon (3)

(1) Communications and Information Theory Group, Technische Universität Berlin, Germany

(2) Department of Computing + Mathematical Sciences & Institute for Quantum Information and Matter, California Institute of Technology, USA

(3) Department of Mathematics, Ohio State University, USA

Abstract

Compressed sensing is the art of reconstructing structured $n$-dimensional vectors from substantially fewer measurements than naively anticipated. A plethora of analytic reconstruction guarantees support this credo. The strongest among them are based on deep results from large-dimensional probability theory that require a considerable amount of randomness in the measurement design. Here, we demonstrate that derandomization techniques allow for considerably reducing the amount of randomness that is required for such proof strategies. More precisely, we establish uniform $s$-sparse reconstruction guarantees for $Cs\log(n)$ measurements that are chosen independently from strength-four orthogonal arrays and maximal sets of mutually unbiased bases, respectively. These are highly structured families of $\tilde{C}n^{2}$ vectors that imitate signed Bernoulli and standard Gaussian vectors in a (partially) derandomized fashion.

Index Terms: Compressed sensing, $k$-wise independence, orthogonal arrays, spherical design, derandomization

I Introduction and main results

I-A Motivation

Compressed sensing is the art of reconstructing structured signals from substantially fewer measurements than would naively be required for standard techniques like least squares. Although not entirely novel, rigorous treatments of this observation [1, 2] spurred considerable scientific attention from 2006 on, see e.g. [3, 4] and references therein. While deterministic results do exist, the strongest theoretic convergence guarantees still rely on randomness. Broadly, these can be grouped into two families:

1. generic measurements such as independent Gaussian or Bernoulli vectors. Such an abundance of randomness allows for establishing very strong results by following comparatively simple and instructive proof techniques. The downside is that concrete implementations do require a lot of randomness. In fact, they might be too random to be useful for certain applications.

2. structured measurements such as random rows of a Fourier or Hadamard matrix. In contrast to generic measurements, these feature a lot of structure that is geared towards applications. Moreover, sampling random rows from a fixed matrix requires very little randomness. For example, $\log(n)$ random bits are required to sample a random DFT row, while an i.i.d. Bernoulli vector consumes $n$ bits of randomness. Structure and comparatively little randomness have a downside, however. Theoretic convergence guarantees tend to be weaker than their generic counterparts. It should also not come as a surprise that the necessary proof techniques become considerably more involved.

Typically, results of type 1) precede results of type 2). Phase retrieval via PhaseLift is a concrete example for such a development. Generic convergence guarantees [5, 6] preceded (partially) de-randomized results [7, 8]. Compressed sensing is special in this regard. The two seminal works [1, 2] from 2006 provided both results almost simultaneously. This had an interesting consequence. Despite considerable effort, to this date there still seems to be a gap between both proof techniques.

Here, we try to close this gap by applying a method that is very well established in theoretical computer science: partial derandomization. We start with a proof technique of type 1) and considerably limit the amount of randomness required for it to work. While doing so, we keep careful track of the “amount of randomness” that is still necessary. Finally, we replace the original (generic) random measurements with pseudo-random ones that mimic them in a sufficiently accurate fashion. Our results highlight that this technique almost allows for bridging the gap between existing proof techniques for generic and structured measurements: the results are still strong, but require slightly more randomness than choosing vectors uniformly from a bounded orthogonal system, such as Fourier or Hadamard vectors.

There is also a didactic angle to this work: within the realm of signal processing, partial-derandomization techniques have been successfully applied to matrix reconstruction [8, 9] and phase retrieval via PhaseLift [7, 10, 11]. Although similar in spirit, the more involved nature of these problems may obscure the key ideas, intuition and tricks behind such an approach. However, the same techniques have not yet been applied to the original problem of compressed sensing. Here, we fill this gap and, in doing so, provide an introduction to partial derandomization techniques by example. To preserve this didactic angle, we try to keep the presentation as simple and self-contained as possible.

Finally, one may argue that compressed sensing has not fully lived up to the high expectations of the community yet, see e.g. [12]. Arguably, one of the most glaring problems for applications is the requirement of choosing individual measurements at random. (Footnote 1: Existing deterministic constructions, see e.g. [13], do not (yet) yield comparable statements.) While we are not able to fully overcome this drawback here, the methods described in this work do limit the amount of randomness required to generate individual structured measurements. We believe that this may help to reduce the discrepancy between “what can be proved” and “what can be done” in a variety of concrete applications.

I-B Preliminaries on compressed sensing

Compressed sensing aims at reconstructing $s$ -sparse vectors $\mathbf{x}\in\mathbb{C}^{n}$ from $m\ll n$ linear measurements:

$$\mathbf{y}=\mathbf{A}\mathbf{x}\in\mathbb{C}^{m}.$$

Since $m\ll n$ , the matrix $\mathbf{A}$ is singular and there are infinitely many solutions to this equation. A convex penalizing function is used to promote sparsity among these solutions. Typically, this penalizing function is the $\ell_{1}$ -norm $\|\mathbf{z}\|_{\ell_{1}}=\sum_{i=1}^{n}|z_{i}|$ :

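$$\underset{\mathbf{z}\in\mathbb{C}^{n}}{\textrm{minimize}}\quad\|\mathbf{z}\|_{\ell_{1}}\quad\textrm{subject to}\quad\mathbf{A}\mathbf{z}=\mathbf{y}. \tag{1}$$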

Mathematical proofs for convergence to the correct solution $\mathbf{x}\in\mathbb{C}^{n}$ have been established for different measurement matrices $\mathbf{A}$. By and large, they require randomness in the sense that each row $\mathbf{a}_{i}\in\mathbb{C}^{n}$ of $\mathbf{A}$ is an independent copy of a random vector $\mathbf{a}\in\mathbb{C}^{n}$. Prominent examples include

1. $m=Cs\log(n/s)$ standard complex Gaussian measurements: $\mathbf{a}_{g}\sim\mathcal{N}(\mathbf{0},\mathbb{I}/\sqrt{2})+i\mathcal{N}(\mathbf{0},\mathbb{I}/\sqrt{2})$,

2. $m=Cs\log(n/s)$ signed Bernoulli (Rademacher) measurements: $\mathbf{a}_{sb}\sim\left\{\pm 1\right\}^{n}$,

3. $m=Cs\log^{4}(n)$ random rows of a DFT matrix: $\mathbf{a}_{f}\sim\left\{\mathbf{f}_{1},\ldots,\mathbf{f}_{n}\right\}$,

4. for $n=2^{d}$: $m=Cs\log^{4}(n)$ random rows of a Hadamard matrix: $\mathbf{a}_{h}\sim\left\{\mathbf{h}_{1},\ldots,\mathbf{h}_{n}\right\}$.

A rigorous treatment of all these cases can be found in Ref. [3]. Here, and throughout this work, $C>0$ denotes an absolute constant whose exact value depends on the context, but it is always independent of the problem parameters $n,s$ and $m$. It is instructive to compare the amount of randomness that is required to generate one instance of the random vectors in question. A random signed Bernoulli vector $\mathbf{a}_{sb}\in\mathbb{R}^{n}$ requires $n$ random bits (one for each coordinate), while a total of $d=\log_{2}(n)$ random bits suffice to select a random row $\mathbf{a}_{h}\in\mathbb{R}^{n}$ of a Hadamard matrix. A comparison between complex standard Gaussian vectors $\mathbf{a}_{g}\in\mathbb{C}^{n}$ and random Fourier vectors $\mathbf{a}_{f}\in\mathbb{C}^{n}$ indicates a similar discrepancy. In summary: highly structured random vectors, like $\mathbf{a}_{f},\mathbf{a}_{h}$, require exponentially fewer random bits to generate than generic random vectors, like $\mathbf{a}_{g},\mathbf{a}_{sb}$. Importantly, this transition from generic measurements to highly structured ones comes at a price. The number of measurements required in cases (3) and (4) scales poly-logarithmically in $n$, with a higher power of the logarithm than in cases (1) and (2). More sophisticated approaches allow for converting this offset into a polylogarithmic scaling in $s$ rather than $n$ [14, 15]. Another, arguably even higher, price is hidden in the proof techniques behind these results: they are considerably more involved.

The following two subsections are devoted to introducing formalisms that allow for partially de-randomizing signed Bernoulli vectors and complex standard Gaussian vectors, respectively.

I-C Partially de-randomizing signed Bernoulli vectors

Throughout this work, we endow $\mathbb{C}^{n}$ with the standard inner product $\langle\mathbf{x},\mathbf{y}\rangle=\sum_{i=1}^{n}\bar{x}_{i}y_{i}$ . We denote the associated (Euclidean) norm by $\|\mathbf{z}\|_{\ell_{2}}^{2}=\langle\mathbf{z},\mathbf{z}\rangle$ . Let $\mathbf{a}_{sb}=\sum_{i=1}^{n}\epsilon_{i}\mathbf{e}_{i}$ be a signed Bernoulli vector with coefficients $\epsilon_{i}\sim\left\{\pm 1\right\}$ chosen independently at random (Rademacher random variables). Then,

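$$\mathbb{E}\left[\epsilon_{i}\bar{\epsilon}_{j}\right]=\mathbb{E}\left[\epsilon_{i}\epsilon_{j}\right]=\delta_{ij}\quad\textrm{for all}\quad 1\leq i,j\leq n, \tag{2}$$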

which is equivalent to demanding

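$$\mathbb{E}\left[\langle\mathbf{y},\mathbf{a}_{sb}\rangle\langle\mathbf{a}_{sb},\mathbf{z}\rangle\right]=\langle\mathbf{y},\mathbf{z}\rangle\quad\textrm{for all}\quad\mathbf{y},\mathbf{z}\in\mathbb{C}^{n}.$$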

Independent sign entries are sufficient, but not necessary for this feature. Indeed, suppose that $n=2^{d}$ is a power of two. Then the rows of a Sylvester Hadamard matrix $\mathbf{h}_{1},\ldots,\mathbf{h}_{n}$ correspond to a particular subset of sign vectors. Let $\mathbf{a}_{h}\in\mathbb{R}^{n}$ be the random vector arising from choosing a Hadamard row uniformly at random. Then,

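$$\mathbb{E}\left[\langle\mathbf{y},\mathbf{a}_{h}\rangle\langle\mathbf{a}_{h},\mathbf{z}\rangle\right]=\frac{1}{n}\sum_{i=1}^{n}\langle\mathbf{y},\mathbf{h}_{i}\rangle\langle\mathbf{h}_{i},\mathbf{z}\rangle=\langle\mathbf{y},\mathbf{z}\rangle\quad\forall\,\mathbf{y},\mathbf{z},$$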

because the Hadamard rows $\mathbf{h}_{i}$ ’s are proportional to an orthonormal basis and have norm $\sqrt{n}$ . This in turn implies that the coordinates $h_{i},h_{j}\in\left\{\pm 1\right\}$ of a randomly selected Hadamard matrix row obey (2), despite not being independent instances of random signs. This feature is called pairwise independence and naturally generalizes to $k\geq 2$ :
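The pairwise independence of a random Hadamard row can be checked by direct enumeration. The following is a minimal sketch, assuming the Sylvester construction in which entry $j$ of row $r$ equals $(-1)^{\mathrm{popcount}(r\wedge j)}$ with rows and columns indexed from $0$; the function names are illustrative and not taken from the paper. It also shows that drawing one row consumes exactly $d=\log_{2}(n)$ random bits.

```python
import itertools
import secrets


def hadamard_row(r, n):
    """Row r of the n x n Sylvester Hadamard matrix (n a power of two).

    Entry j equals (-1)^popcount(r & j), with indices starting at 0.
    """
    return [1 - 2 * (bin(r & j).count("1") % 2) for j in range(n)]


def sample_hadamard_row(d):
    """Draw a uniformly random row in dimension n = 2**d from d random bits."""
    r = secrets.randbits(d)
    return hadamard_row(r, 2 ** d)


d = 3
n = 2 ** d
rows = [hadamard_row(r, n) for r in range(n)]

# Exact second moments E[h_i h_j] under a uniformly random row.
second_moments = [
    [sum(row[i] * row[j] for row in rows) / n for j in range(n)]
    for i in range(n)
]

# Pairwise independence in the sense of Eq. (2): E[h_i h_j] = delta_ij.
is_pairwise = all(
    second_moments[i][j] == (1.0 if i == j else 0.0)
    for i, j in itertools.product(range(n), repeat=2)
)
print(is_pairwise)
```

The check is exhaustive over all $n$ rows, so it is only practical for small $n$; it is meant as an illustration rather than a test suite.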

Definition 1 ( $k$ -wise independence).

Fix $k\geq 2$ and let $\epsilon_{i}$ denote independent instances of a signed Bernoulli random variable. We call a random sign vector $\mathbf{a}\in\left\{\pm 1\right\}^{n}$ $k$ -wise independent, if its components $a_{1},\ldots,a_{n}$ obey

$$\mathbb{E}\left[\prod_{j=1}^{k}a_{i_{j}}\right]=\mathbb{E}\left[\prod_{j=1}^{k}\epsilon_{i_{j}}\right]$$

for all $k$ -tuples of indices $1\leq i_{1},\ldots,i_{k}\leq n$ .

Explicit constructions for $k$-wise independent vectors are known for any $k$ and $n$. In this work we focus on particular constructions that rely on generalizing the following instructive example. Consider the following $3\times 4$ sign matrix:

$$\left(\begin{array}{rrrr}1&1&-1&-1\\ 1&-1&1&-1\\ 1&-1&-1&1\end{array}\right)$$

The first two rows summarize all possible length-two combinations of $\pm 1$. The coefficients of the third row correspond to their entry-wise product. Hence, the third row is completely determined by the first two, and the three rows are not mutually independent. Nonetheless, each pair of rows mimics independent behavior: all possible length-two combinations of $\pm 1$ occur exactly once among the corresponding column entries. This ensures that the coordinates of a uniformly selected column are pairwise independent in the sense that they obey Eq. (2).
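The claim about this small example can be verified mechanically. The sketch below treats each column of the $3\times 4$ matrix as one sample in $\left\{\pm 1\right\}^{3}$ (an interpretation adopted here so that the sampled vector has three coordinates) and tests the moment condition of Definition 1 over all index tuples. The helper names are hypothetical.

```python
import itertools
from math import prod

# The 3 x 4 sign matrix from the example; each COLUMN is one sample in {+1,-1}^3.
MATRIX = [
    [1, 1, -1, -1],
    [1, -1, 1, -1],
    [1, -1, -1, 1],
]
samples = [list(col) for col in zip(*MATRIX)]


def bernoulli_moment(indices):
    """E[prod_j eps_{i_j}] for independent signs.

    The moment is 1 if every index occurs an even number of times, else 0.
    """
    return 1 if all(indices.count(i) % 2 == 0 for i in set(indices)) else 0


def is_k_wise_independent(samples, k):
    """Check the moment condition of Definition 1 for a uniformly random sample."""
    n = len(samples[0])
    for indices in itertools.product(range(n), repeat=k):
        total = sum(prod(s[i] for i in indices) for s in samples)
        if total != bernoulli_moment(indices) * len(samples):
            return False
    return True


print(is_k_wise_independent(samples, 2))  # all second-order moments match
print(is_k_wise_independent(samples, 3))  # fails: a_1 * a_2 * a_3 equals +1 for every sample
```

The second check fails because the third coordinate is the product of the first two, which is exactly the dependence described in the text.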

This simple example may readily be generalized. A binary $M\times n$ orthogonal array of strength $t$ is a sign matrix $\mathbf{O}\in\left\{\pm 1\right\}^{M\times n}$ such that every selection of $t$ columns contains all elements of $\left\{\pm 1\right\}^{t}$ as rows an equal number of times. (In this convention, the example above is the transpose of a $4\times 3$ array of strength two.)

Several different explicit constructions of orthogonal arrays are known. A simple counting argument reveals that, for fixed strength $t$, the number of rows must grow at least like $n^{t/2}$. For constructions that come close to this bound, $M$ scales polynomially in $n$, with a degree set by the array strength $t$. This is a potentially exponential improvement over the “full” array that lists all $2^{n}$ possible elements of $\left\{\pm 1\right\}^{n}$. In turn, selecting a random row of $\mathbf{O}$ only requires $\log_{2}(M)$ random bits, which is of order $t\log_{2}(n)$ for such constructions, and produces a random vector that is $t$-wise independent according to Definition 1. We refer to Sec. IV and Ref. [16] for a more thorough treatment of this concept.
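To make the strength condition concrete, here is a small sketch that checks it by brute force, using the convention that each row of the array is one candidate measurement vector and that strength is tested on selections of $t$ columns. As a toy array it uses all sign vectors of length $n=5$ whose entries multiply to $+1$; this choice is an assumption made for illustration (it has strength $n-1=4$) and is not claimed to be the construction used in the paper.

```python
import itertools
from collections import Counter
from math import ceil, log2, prod


def even_parity_array(n):
    """All sign vectors in {+1,-1}^n whose entries multiply to +1 (2**(n-1) rows)."""
    return [
        list(v) + [prod(v)]  # last entry fixes the overall product to +1
        for v in itertools.product([1, -1], repeat=n - 1)
    ]


def has_strength(array, t):
    """True if every choice of t columns shows each pattern in {+1,-1}^t equally often."""
    n = len(array[0])
    target = len(array) / 2 ** t
    for cols in itertools.combinations(range(n), t):
        counts = Counter(tuple(row[c] for c in cols) for row in array)
        if len(counts) != 2 ** t or any(c != target for c in counts.values()):
            return False
    return True


array = even_parity_array(5)
bits_per_row = ceil(log2(len(array)))  # random bits needed to pick one row uniformly

print(len(array), has_strength(array, 4), has_strength(array, 5), bits_per_row)
```

For this toy array the saving is only one bit per row compared with the full cube; the interesting regime is large $n$ with fixed strength, where far smaller arrays exist.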

I-D Partially derandomizing complex standard Gaussian vectors

Let us now discuss another general purpose tool for (partial) de-randomization. Concentration of measure implies that $n$ -dimensional standard complex Gaussian vectors concentrate sharply around the complex sphere $\sqrt{n}\mathbb{S}^{n-1}$ of radius $\sqrt{n}$ . Hence, they behave very similarly to vectors $\mathbf{a}_{s}\in\mathbb{C}^{n}$ chosen uniformly from this sphere. Such random vectors obey the following formula for any $k\in\mathbb{N}$ and any $\mathbf{z}\in\mathbb{C}^{n}$ :

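$$\mathbb{E}\left[|\langle\mathbf{z},\mathbf{a}_{s}\rangle|^{2k}\right]=n^{k}\int_{\mathbb{S}^{n-1}}|\langle\mathbf{z},\mathbf{w}\rangle|^{2k}\,\mathrm{d}\mathbf{w}=n^{k}\binom{n+k-1}{k}^{-1}\|\mathbf{z}\|_{\ell_{2}}^{2k}.$$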

Here, $\mathrm{d}\mathbf{w}$ denotes the uniform measure on the complex unit sphere $\mathbb{S}^{n-1}\subset\mathbb{C}^{n}$. This formula characterizes even moments of this uniform distribution. (Footnote 2: For comparison, a complex standard Gaussian vector obeys $\mathbb{E}\left[|\langle\mathbf{z},\mathbf{a}_{g}\rangle|^{2k}\right]=k!\|\mathbf{z}\|_{\ell_{2}}^{2k}$ instead.) The concept of $t$-designs [17] uses this moment formula as a starting point for partial de-randomization. Roughly speaking, a $t$-design is a finite subset of $\sqrt{n}$-length vectors such that the uniform distribution over these vectors reproduces the even moments of the uniform measure on $\sqrt{n}\mathbb{S}^{n-1}$ up to order $2t$. More precisely:

Definition 2.

A set of $N$ vectors $\left\{\mathbf{w}_{i}\right\}_{i=1}^{N}\subset\sqrt{n}\mathbb{S}^{n-1}$ with length $\sqrt{n}$ is called a (complex projective) $t$-design if a vector $\mathbf{a}_{(t)}$ chosen uniformly at random from this set obeys for any $1\leq k\leq t$

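$$\mathbb{E}\left[|\langle\mathbf{z},\mathbf{a}_{(t)}\rangle|^{2k}\right]=n^{k}\binom{n+k-1}{k}^{-1}\|\mathbf{z}\|_{\ell_{2}}^{2k}\quad\forall\,\mathbf{z}\in\mathbb{C}^{n}.$$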
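The moment condition in Definition 2 is easy to test numerically for a given finite set. The following sketch does so for six vectors of length $\sqrt{2}$ in $\mathbb{C}^{2}$ (the rescaled standard basis together with $(1,\pm 1)$ and $(1,\pm i)$), assuming the design moment $n^{k}\binom{n+k-1}{k}^{-1}\|\mathbf{z}\|_{\ell_{2}}^{2k}$. The test vector and the function names are arbitrary illustrative choices.

```python
from math import comb, sqrt


def inner(x, y):
    """Standard inner product, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))


def empirical_moment(vectors, z, k):
    """Average of |<z, a>|^(2k) over a drawn uniformly from `vectors`."""
    return sum(abs(inner(z, a)) ** (2 * k) for a in vectors) / len(vectors)


def design_moment(n, z, k):
    """Target value n^k * binom(n+k-1, k)^(-1) * ||z||^(2k)."""
    norm_sq = sum(abs(c) ** 2 for c in z)
    return n ** k / comb(n + k - 1, k) * norm_sq ** k


n = 2
# Six vectors of length sqrt(2): rescaled standard basis, (1, +-1) and (1, +-i).
vectors = [
    [sqrt(2), 0], [0, sqrt(2)],
    [1, 1], [1, -1],
    [1, 1j], [1, -1j],
]
z = [0.3 - 0.4j, 1.1 + 0.2j]  # arbitrary test vector

# Deviation between empirical and target moments for k = 1 and k = 2.
gaps = [abs(empirical_moment(vectors, z, k) - design_moment(n, z, k)) for k in (1, 2)]
print(gaps)
```

Both gaps vanish up to floating-point error, which is consistent with this set being a 2-design. A single test vector does not prove the property; it only illustrates the condition being checked.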

(Spherical) $t$ -designs were originally developed as cubature formulas for the real-valued unit sphere [17]. The concept has since been extended to other sets. A generalization to the complex projective space $\mathbb{C}P^{n-1}$ gives rise to Definition 2. Complex projective $t$ -designs are known to exist for any $t$ and any dimension $n$ , see e.g. [18, 19, 20]. However, explicit constructions for $t\geq 3$ are notoriously difficult to find. In contrast, several explicit families of 2-designs have been identified. Here, we will focus on one such family. Two orthonormal bases $\left\{\mathbf{b}_{i}\right\}_{i=1}^{n}$ and $\left\{\mathbf{c}_{i}\right\}_{i=1}^{n}$ of $\mathbb{C}^{n}$ are called mutually unbiased if

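$$|\langle\mathbf{b}_{i},\mathbf{c}_{j}\rangle|^{2}=\frac{1}{n}\quad\textrm{for all}\quad i,j\in[n]=\left\{1,\ldots,n\right\}.$$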

A prominent example of such a basis pair is the standard basis together with the Fourier basis or, alternatively, the Hadamard basis. One can show that at most $n+1$ different orthonormal bases exist that have this property in a pairwise fashion [21, Theorem 3.5]. Such a set of $n+1$ bases is called a maximal set of mutually unbiased bases (MMUB). For instance, for $n=2$ the standard basis together with

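$$\frac{1}{\sqrt{2}}\left(\begin{array}{c}1\\ 1\end{array}\right),\quad\frac{1}{\sqrt{2}}\left(\begin{array}{c}1\\ -1\end{array}\right),\quad\frac{1}{\sqrt{2}}\left(\begin{array}{c}1\\ i\end{array}\right),\quad\frac{1}{\sqrt{2}}\left(\begin{array}{c}1\\ -i\end{array}\right)$$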

forms a MMUB. Importantly, MMUBs are always (proportional to) 2-designs [22]. Explicit constructions exist for any prime power dimension $n$ and one can ensure that the standard basis is always one of them. Here we point out one construction that is particularly simple if the dimension is (an odd) prime $n\geq 5$ [23]: The standard basis vectors $\mathbf{e}_{1},\ldots,\mathbf{e}_{n}\in\mathbb{C}^{n}$ together with all vectors whose entry-wise coefficients correspond to

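$$\left[\mathbf{b}_{\alpha,\lambda}\right]_{k}=\frac{1}{\sqrt{n}}\,\omega_{n}^{(k+\alpha)^{3}+\lambda(k+\alpha)} \tag{5}$$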

form a MMUB. Here $\omega_{n}=\exp\left(\frac{2\pi i}{n}\right)$ is an $n$-th root of unity. The parameter $\alpha\in\left[n\right]$ singles out one of the $n$ different bases, while $\lambda\in\left[n\right]$ labels the $n$ corresponding basis vectors. Excluding the standard basis, this set of $n^{2}$ vectors corresponds to all time-frequency shifts of a discrete Alltop sequence $\left[\mathbf{f}\right]_{k}=\omega_{n}^{k^{3}}$ [24].
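The Alltop construction can be checked numerically in a small prime dimension. The sketch below builds the $n$ bases for $n=5$, assuming entries $\omega_{n}^{(k+\alpha)^{3}+\lambda(k+\alpha)}/\sqrt{n}$ so that each basis is orthonormal, and indexing $k$, $\alpha$ and $\lambda$ from $0$ to $n-1$ (all exponents are taken modulo $n$, so this only relabels the vectors). It then measures how far the vectors are from being orthonormal within a basis and mutually unbiased across bases.

```python
import cmath
from math import sqrt


def alltop_vector(n, alpha, lam):
    """Vector with entries omega_n^((k+alpha)^3 + lam*(k+alpha)) / sqrt(n), k = 0..n-1."""
    vec = []
    for k in range(n):
        exponent = ((k + alpha) ** 3 + lam * (k + alpha)) % n  # reduce mod n for accuracy
        vec.append(cmath.exp(2j * cmath.pi * exponent / n) / sqrt(n))
    return vec


def inner(x, y):
    """Standard inner product, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


n = 5  # prime dimension with n >= 5
bases = [[alltop_vector(n, alpha, lam) for lam in range(n)] for alpha in range(n)]

# Largest deviation from orthonormality within any single basis.
ortho_gap = max(
    abs(inner(b[i], b[j]) - (1 if i == j else 0))
    for b in bases for i in range(n) for j in range(n)
)

# Largest deviation of |<x, y>|^2 from 1/n for vectors from two different bases.
unbiased_gap = max(
    abs(abs(inner(x, y)) ** 2 - 1 / n)
    for p in range(n) for q in range(p + 1, n)
    for x in bases[p] for y in bases[q]
)

print(ortho_gap, unbiased_gap)
```

Every entry has modulus $1/\sqrt{n}$, so each of these bases is automatically unbiased with respect to the standard basis as well.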

I-E Main results

Theorem 1 (CS from orthogonal array measurements).

Suppose that a matrix $\mathbf{A}$ contains $m\geq Cs\log(2n)$ rows that are chosen independently from an orthogonal array with strength four. Then, with probability at least $1-2\mathrm{e}^{-\tilde{c}m}$ , any $s$ -sparse $\mathbf{x}\in\mathbb{C}^{n}$ can be recovered from $\mathbf{y}=\mathbf{A}\mathbf{x}$ by means of algorithm (1).

Theorem 2 (CS from time-frequency shifted Alltop sequences).

Let $n\geq 5$ be prime and suppose that $\mathbf{A}$ contains $m\geq Cs\log(2n)$ rows that correspond to random time-frequency shifts of the Alltop sequence (5) in dimension $n$ . Then, with probability at least $1-\mathrm{e}^{-\tilde{c}m}$ , any $s$ -sparse $\mathbf{x}\in\mathbb{R}^{n}$ can be recovered from $\mathbf{y}=\mathbf{A}\mathbf{x}$ by means of algorithm (1).

This result actually generalizes to measurements that are sampled from a maximal set of mutually unbiased bases (excluding the standard basis). Time-frequency shifts of the Alltop sequence are one concrete construction that applies to prime dimensions only.

Note that the cardinality of all Alltop shifts is $n^{2}$ . Hence, $2\log_{2}(n)$ random bits suffice to select a random time-frequency shift. In turn, a total of

$$2\log_{2}(n)\,m\simeq 2Cs\log^{2}(n) \tag{6}$$

random bits are required for sampling a complete measurement matrix $\mathbf{A}$. This number is exponentially smaller than the number of random bits required to generate a matrix with independent complex Gaussian entries. A similar comparison holds true for random signed Bernoulli matrices and rows sampled from a strength-4 orthogonal array.

Highly structured families of vectors – such as rows of a Fourier, or Hadamard matrix – require even less randomness to sample from: only $\log_{2}(n)$ bits are required to select such a row uniformly at random. However, existing convergence guarantees are weaker than the main results presented here. They require an order of $Cs\mathrm{polylog}(s)\log(n)$ random measurements to establish comparable results. Thus, the total number of random bits required for such a procedure scales like $Cs\mathrm{polylog}(s)\log^{2}(n)$ . Eq. (6) still establishes a logarithmic improvement in terms of sparsity.
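A back-of-the-envelope calculation illustrates the scale of these savings. The sketch below compares the number of random bits needed to draw a full measurement matrix from Alltop shifts with the number needed for a signed Bernoulli matrix with the same number of rows. The constant `C = 1.0` and the values of `n` and `s` are placeholders chosen for illustration; they are not the constants from Theorems 1 and 2, and the count ignores any rejection-sampling overhead.

```python
from math import ceil, log, log2


def measurement_count(s, n, C=1.0):
    """m = ceil(C * s * log(2n)); C is a placeholder, not the theorem's constant."""
    return ceil(C * s * log(2 * n))


def bit_budget(n, s, C=1.0):
    """Random bits needed to sample all m rows under two sampling models."""
    m = measurement_count(s, n, C)
    return {
        "m": m,
        # One of n^2 time-frequency shifts per row: about 2 * log2(n) bits.
        "alltop_bits": ceil(2 * log2(n)) * m,
        # One independent sign per entry: n bits per row.
        "bernoulli_bits": n * m,
    }


budget = bit_budget(n=1021, s=10)  # 1021 is prime, as the Alltop construction requires
print(budget)
```

The ratio between the two budgets is roughly $n/(2\log_{2}n)$ per row, independent of the placeholder constant.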

The recovery guarantees in Theorems 1 and 2 can be readily extended to ensure stability with respect to noise corruption in the measurements and robustness with respect to violations of the model assumption of sparsity. We refer to Sec. III for details.

We also emphasize that there are results in the literature that establish compressed sensing guarantees with comparable, or even less, randomness. Obviously, deterministic constructions are the extreme case in this regard. Early results suffer from a “quadratic bottleneck”: the number of measurements must scale quadratically in the sparsity, $m\simeq s^{2}$. Although this obstacle was overcome, existing progress is still comparatively mild. Refs. [25, 26, 27] establish deterministic convergence guarantees for $m\simeq s^{2-\epsilon}$, where $\epsilon>0$ is a (very) small constant.

Closer in spirit to this work is Ref. [28]. There, the authors employ the Legendre symbol – which is well known for its pseudorandom behavior – to partially derandomize a signed Bernoulli matrix. In doing so, they establish uniform $s$ -sparse recovery from $m\geq Cs\log^{2}(s)\log(n)$ measurements that require an order of $s\log(s)\log(n)$ random bits to generate. Compared to the main results presented here, this result gets by with less randomness, but requires more measurements. The proof technique is also very different.

To this date, the strongest de-randomized reconstruction guarantees hail from a close connection between $s$ -sparse recovery and Johnson-Lindenstrauss embeddings [29, 30]. These have a wide range of applications in modern data science. Kane and Nelson [31] established a very strong partial de-randomization for such embeddings. This result may be used to establish uniform $s$ -sparse recovery for $m=Cs\log(n/s)$ measurements that require an order of $s\log\left(s\log(n/s)\log(n/s)\right)$ random bits. This result surpasses the main results presented here in both sampling rate and randomness required.

However, this strong result follows from “reducing” the problem of $s$ -sparse recovery to a (seemingly) very different problem: find Johnson-Lindenstrauss embeddings. Such a reduction typically does not preserve problem-specific structure. In contrast, the approach presented addresses the problem of sparse recovery directly and relies on tools from signal processing. In doing so, we maintain structural properties that are common in several applications of $s$ -sparse recovery. Orthogonal array measurements, for instance, have $\pm 1$ -entries. This is well-suited for the single pixel camera [32]. Alltop sequence constructions, on the other hand, have successfully been applied to stylized radar problems [33]. Both types of measurements also have the property that every entry has unit modulus. This is an important feature for the application of CDMA [34]. Having pointed out these high level connections, we want to emphasize that careful, problem specific adaptations may be required to rigorously exploit these. The framework developed here may serve as a guideline on how to achieve this goal in concrete scenarios.

II Proofs

II-A Textbook-worthy proof for real-valued compressed sensing with Gaussian measurements

This section is devoted to summarizing an elegant argument that is originally due to Rudelson and Vershynin [14], see also [35, 36, 37] for arguments that are similar in spirit. This argument only applies to $s$ -sparse recovery of real-valued signals. We will generalize a similar idea to the complex case later on.

In this work we are concerned with uniform reconstruction guarantees: With high probability a single realization of the measurement matrix $\mathbf{A}$ allows for reconstructing any $s$ -sparse vector $\mathbf{x}$ by means of $\ell_{1}$ -regularization (1). A necessary pre-requisite for uniform recovery is the demand that no $s$ -sparse vector is contained in the kernel, or nullspace, of $\mathbf{A}$ . This condition is captured by the nullspace property (NSP). Define

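$$T_{s}=\left\{\mathbf{z}\in\mathbb{S}^{n-1}:\;\|\mathbf{z}\|_{\ell_{1}}\geq 2\sigma_{s}(\mathbf{z})\right\}\subset\mathbb{S}^{n-1}, \tag{7}$$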

where $\sigma_{s}(\mathbf{x})=\inf_{\|\mathbf{z}\|_{0}\leq s}\|\mathbf{x}-\mathbf{z}\|_{\ell_{1}}$ for $\mathbf{x}\in\mathbb{C}^{n}$ is the approximation error (measured in $\ell_{1}$-norm) one incurs when approximating $\mathbf{x}$ with an $s$-sparse vector. A matrix $\mathbf{A}$ obeys the NSP of order $s$ if

$$\inf_{\mathbf{z}\in T_{s}}\|\mathbf{A}\mathbf{z}\|_{\ell_{2}}>0. \tag{8}$$

The set $T_{s}$ is a subset of the unit sphere that contains all normalized $s$ -sparse vectors. This justifies the informal definition of the NSP: no $s$ -sparse vector is an element of the nullspace of $\mathbf{A}$ . Importantly, the NSP is not only necessary, but also sufficient for uniform recovery, see e.g. [3, Theorem 4.5]. Hence, universal recovery of $s$ -sparse signals readily follows from establishing Rel. (8). The nullspace property and its relation to $s$ -sparse recovery has long been somewhat folklore. We refer to Ref. [3] for a discussion of its origin.
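Membership in $T_{s}$ is straightforward to evaluate for a concrete vector, because $\sigma_{s}(\mathbf{z})$ is simply the $\ell_{1}$-norm of all but the $s$ largest entries in magnitude. The sketch below assumes the definition $T_{s}=\left\{\mathbf{z}\in\mathbb{S}^{n-1}:\|\mathbf{z}\|_{\ell_{1}}\geq 2\sigma_{s}(\mathbf{z})\right\}$ and contrasts a sparse unit vector with a maximally spread-out one. The function names and the tolerance are illustrative choices.

```python
from math import sqrt


def sigma_s(z, s):
    """l1 error of the best s-sparse approximation: sum of all but the s largest magnitudes."""
    magnitudes = sorted((abs(c) for c in z), reverse=True)
    return sum(magnitudes[s:])


def in_T_s(z, s, tol=1e-12):
    """Membership test for T_s; z is assumed to have unit Euclidean norm."""
    l1_norm = sum(abs(c) for c in z)
    return l1_norm >= 2 * sigma_s(z, s) - tol


n, s = 10, 2
sparse = [0.6, 0.8] + [0.0] * (n - 2)  # unit norm and 2-sparse, so sigma_s = 0
flat = [1 / sqrt(n)] * n               # unit norm, mass spread evenly over all entries

print(in_T_s(sparse, s), in_T_s(flat, s))
```

The sparse vector lies in $T_{s}$ while the flat vector does not, which matches the intuition that $T_{s}$ collects unit vectors whose mass is concentrated on few coordinates.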

The following powerful statement allows for exploiting generic randomness in order to establish nullspace properties. It is originally due to Gordon [38], but we utilize a more modern reformulation, see [3, Theorem 9.21].

Theorem 3 (Gordon’s escape through a mesh).

Let $\mathbf{A}\in\mathbb{M}_{m\times n}$ be a real-valued standard Gaussian matrix and let $E\subseteq\mathbb{S}^{n-1}$ be a subset of the real-valued unit sphere. Define the Gaussian width $\ell(E)=\mathbb{E}\sup_{\mathbf{z}\in E}\langle\mathbf{a}_{g},\mathbf{z}\rangle,$ where the expectation is over realizations $\mathbf{a}_{g}\sim\mathcal{N}(0,\mathbb{I})$ of a standard Gaussian random vector. Then, for $t\geq 0$ the bound

$$\inf_{\mathbf{z}\in E}\|\mathbf{A}\mathbf{z}\|_{\ell_{2}}\geq\sqrt{m-1}-\ell(E)-t$$

is true with probability at least $1-\mathrm{e}^{-t^{2}/2}$ .

This is a deep statement that connects random matrix theory to geometry: the Gaussian width is a rough measure of the size of the set $E\subseteq\mathbb{S}^{n-1}$. Setting $E=T_{s}$ allows us to conclude that a matrix $\mathbf{A}$ encompassing $m$ independent Gaussian measurements is very likely to obey the $s$-NSP (8), provided that $m-1$ exceeds $\ell(T_{s})^{2}$. In order to derive an upper bound on $\ell(T_{s})$, we may use the following inclusion

$$T_{s}\subset 2\,\mathrm{conv}\left(\Sigma_{s}^{n}\right),$$

see e.g. [35, Lemma 3] and [14, Lemma 4.5]. Here, $\Sigma^{n}_{s}\subseteq\mathbb{S}^{n-1}$ denotes the set of all $s$-sparse vectors with unit length. In turn,

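$$\ell(T_{s})\leq 2\,\mathbb{E}\sup_{\mathbf{z}\in\mathrm{conv}(\Sigma_{s}^{n})}\langle\mathbf{a}_{g},\mathbf{z}\rangle=2\,\mathbb{E}\sup_{\mathbf{z}\in\Sigma_{s}^{n}}\langle\mathbf{a}_{g},\mathbf{z}\rangle, \tag{9}$$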

because the linear function $\mathbf{z}\mapsto\langle\mathbf{a}_{g},\mathbf{z}\rangle$ achieves its maximum value at the boundary $\Sigma_{s}^{n}$ of the convex set $\mathrm{conv}\left(\Sigma_{s}^{n}\right)$ . The right hand side of (9) is the expected supremum of a Gaussian process indexed by $\mathbf{z}\in\Sigma_{s}^{n}$ . Dudley’s inequality [39], see also [3, Theorem 8.23], states

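$$\mathbb{E}\sup_{\mathbf{z}\in\Sigma_{s}^{n}}\langle\mathbf{a}_{g},\mathbf{z}\rangle\leq 4\sqrt{2}\int_{0}^{1}\sqrt{\ln\left(\mathcal{N}(\Sigma_{s}^{n},\|\cdot\|_{\ell_{2}},u)\right)}\,\mathrm{d}u,$$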

where $\mathcal{N}(\Sigma_{s}^{n},\|\cdot\|_{\ell_{2}},u)$ are covering numbers associated with the set $\Sigma_{s}^{n}$ . They are defined as the smallest cardinality of a $u$ -covering net with respect to the Euclidean distance. A volumetric counting argument yields $\mathcal{N}(\Sigma_{s}^{n},\|\cdot\|_{\ell_{2}},u)\leq\left(\frac{\mathrm{e}n}{s}\right)^{s}\left(1+\frac{2}{u}\right)^{s}$ and Dudley’s inequality therefore implies

$$\ell(T_{s})\leq c\sqrt{s\log(\mathrm{e}n/s)},$$

where $c$ is an absolute constant. This readily yields the following assertion.

Theorem 4 (NSP for Gaussian measurements).

A number of $m\geq cs\log(\mathrm{e}n/s)$ independent real-valued Gaussian measurements obeys the (real-valued) $s$-NSP with probability at least $1-\mathrm{e}^{-\tilde{c}m}$.

This argument is exemplary for generic proof techniques: strong results from probability theory allow for establishing close-to-optimal results in a relatively succinct fashion.
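The Gaussian width of the set $\Sigma_{s}^{n}$ of sparse unit vectors can also be estimated by simulation, which gives a feeling for the $\sqrt{s\log(\mathrm{e}n/s)}$ scaling. The sketch below uses the fact that, for a fixed vector $\mathbf{g}$, the supremum of $\langle\mathbf{g},\mathbf{z}\rangle$ over unit-norm $s$-sparse $\mathbf{z}$ equals the Euclidean norm of the $s$ largest entries of $\mathbf{g}$ in magnitude. The sample sizes, the seed and the function names are arbitrary illustrative choices.

```python
import random
from math import e, log, sqrt


def sup_over_sparse_unit_vectors(g, s):
    """sup of <g, z> over unit-norm s-sparse z: l2 norm of the s largest |g_i|."""
    top = sorted((abs(x) for x in g), reverse=True)[:s]
    return sqrt(sum(x * x for x in top))


def estimate_width(n, s, trials=2000, seed=0):
    """Monte Carlo estimate of E sup_{z in Sigma_s^n} <a_g, z> for real Gaussian a_g."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += sup_over_sparse_unit_vectors(g, s)
    return total / trials


n, s = 200, 5
width = estimate_width(n, s)
scale = sqrt(s * log(e * n / s))  # predicted scaling, up to an absolute constant

print(width, scale, width / scale)
```

The printed ratio is of order one, in line with the bound above. Note that this estimates the width of $\Sigma_{s}^{n}$ only; the width of $T_{s}$ is controlled through the factor of two in the inclusion.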

II-B Extending the scope to subgaussian measurements

The extended arguments presented here are largely due to Dirksen, Lecue and Rauhut [36]. Again, we will focus on the real-valued case.

Gordon’s escape through a mesh is only valid for Gaussian random matrices $\mathbf{A}$ . Novel methods are required to extend this proof technique beyond this idealized case. Comparatively recently, Mendelson provided one by generalizing Gordon’s escape through a mesh [40, 41].

Theorem 5 (Mendelson’s small ball method, Tropp’s formulation [37]).

Suppose that $\mathbf{A}$ is a random $m\times n$ matrix whose rows correspond to $m$ independent realizations of a random vector $\mathbf{a}\in\mathbb{R}^{n}$ . Fix a set $E\subseteq\mathbb{R}^{n}$ , and define

[TABLE]

is the empirical average over $m$ independent copies of $\mathbf{a}$ weighted by uniformly random signs $\epsilon_{i}\sim\left\{\pm 1\right\}$ . Then, for any $t,\xi>0$

[TABLE]

with probability at least $1-2\mathrm{e}^{-t^{2}/2}$ .

It is worthwhile to point out that for real-valued Gaussian vectors this result recovers Theorem 3 up to constants. Fix $\xi>0$ of appropriate size. Then, $E\subseteq\mathbb{S}^{n}$ ensures that $\xi Q_{2\xi}(\mathbf{a}_{g},E)$ is constant. Moreover, $W_{m}(\mathbf{a}_{g},E)$ reduces to the usual Gaussian width $\ell(E)$ .

Mendelson’s small ball method can be used to establish the nullspace property for independent random measurements $\mathbf{a}\in\mathbb{R}^{n}$ that exhibit subgaussian behavior:

[TABLE]

Signed Bernoulli vectors are a concrete example: $\left[\mathbf{a}\right]_{k}=\epsilon_{k}$ is an independent instance of a Rademacher random variable. Signed Bernoulli vectors obey

[TABLE]

Direct computation also reveals

[TABLE]

because there are 3 possible pairings of four indices.
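The pairing argument can be verified by brute force in small dimensions. The sketch below enumerates all sign patterns and compares the exact fourth moment with the closed form $3\|\mathbf{z}\|_{\ell_{2}}^{4}-2\sum_{i}z_{i}^{4}$ for a unit vector; the test vector and the function name are arbitrary choices made for illustration.

```python
import itertools
import math

def rademacher_moment(z, power):
    # Exact E[<eps, z>^power] over all 2^n equally likely sign patterns.
    n = len(z)
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        total += sum(e * x for e, x in zip(signs, z)) ** power
    return total / 2 ** n

raw = [3.0, -1.0, 2.0, 0.5, 0.0, 1.5]  # arbitrary illustrative vector
norm = math.sqrt(sum(x * x for x in raw))
z = [x / norm for x in raw]  # normalised to unit length

second = rademacher_moment(z, 2)  # isotropy: equals ||z||^2 = 1
fourth = rademacher_moment(z, 4)  # at most 3 ||z||^4 = 3
closed_form = 3.0 - 2.0 * sum(x ** 4 for x in z)
print(second, fourth, closed_form)
```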

Now, set $E=T_{s}\subset\mathbb{S}^{n}$ .

An application of the Paley-Zygmund inequality then allows for bounding the parameter $Q_{2\xi}(\mathbf{a}_{\mathrm{sb}},T_{s})$ in Mendelson’s small ball method from below:

[TABLE]

This lower bound is constant for any $\xi\in(0,1)$ .
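To illustrate the type of estimate involved, the sketch below compares the exact small-ball probability of a signed Bernoulli vector with the generic Paley-Zygmund bound $(1-\xi^{2})^{2}/3$ , which uses only the second moment (equal to one) and the fourth moment (at most three). The constants displayed in the paper may be arranged differently; the test vector and function names are assumptions.

```python
import itertools
import math

def small_ball_probability(z, xi):
    # Exact P(|<eps, z>| >= xi) for a Rademacher vector eps, by enumeration.
    n = len(z)
    hits = sum(
        1
        for signs in itertools.product((-1.0, 1.0), repeat=n)
        if abs(sum(e * x for e, x in zip(signs, z))) >= xi
    )
    return hits / 2 ** n

def paley_zygmund_bound(xi, fourth_moment_bound=3.0):
    # Paley-Zygmund applied to X = <eps, z>^2 with E[X] = 1 and E[X^2] <= 3.
    return (1.0 - xi ** 2) ** 2 / fourth_moment_bound

z = [1 / math.sqrt(8)] * 8  # illustrative unit vector
for xi in (0.1, 0.25, 0.5, 0.75):
    print(xi, small_ball_probability(z, xi), paley_zygmund_bound(xi))
```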

Next, note that $X_{\mathbf{z}}=\langle\mathbf{z},\mathbf{h}\rangle$ is a stochastic process that is indexed by $\mathbf{z}\in\mathbb{R}^{n}$ . This process is centered ( $\mathbb{E}X_{\mathbf{z}}=0$ ) and Eq. (10) implies that it is also subgaussian (at least for any $\mathbf{z}\in\Sigma_{s}^{n}$ ). Moreover, $\mathbb{E}\left[|X_{\mathbf{z}}-X_{\mathbf{y}}|^{2}\right]^{1/2}=\|\mathbf{z}-\mathbf{y}\|_{\ell_{2}}$ readily follows from (11). Unlike Gordon’s escape through a mesh, Dudley’s inequality does remain valid for such stochastic processes with subgaussian marginals. We can now repeat the width analysis from the previous section to obtain

[TABLE]

Fixing $\xi>0$ sufficiently small, setting $t=\tilde{c}\sqrt{m}$ and inserting these bounds into Eq. (5) yields the following result.

Theorem 6 (NSP for signed Bernoulli measurements).

A matrix $\mathbf{A}$ encompassing $m\geq Cs\log(\mathrm{e}n/s)$ random signed Bernoulli measurements obeys the real-valued $s$ -NSP with probability at least $1-\mathrm{e}^{-\tilde{c}m}$ .

A similar result remains valid for other classes of independent measurements with subgaussian marginals (10).

II-C Generalization to complex-valued signals and partial de-randomization

The nullspace property, as well as its connection to uniform $s$ -sparse recovery readily generalizes to complex-valued $s$ -sparse vectors. A similar extension applies to Mendelson’s small ball method:

Theorem 7 (Mendelson’s small ball method for complex vector spaces).

Suppose that the rows of $\mathbf{A}$ correspond to $m$ independent copies of a random vector $\mathbf{a}\in\mathbb{C}^{n}$ . Fix a set $E\subset\mathbb{C}^{n}$ and define

[TABLE]

Then, for any $t,\xi>0$

[TABLE]

with probability at least $1-2\mathrm{e}^{-t^{2}/2}$ .

Such a generalization was conjectured by Tropp [37], but we are not aware of any rigorous proof in the literature. We provide one in Subsection V-B and believe that such an extension may be of independent interest. This extension allows for generalizing the arguments from the previous subsection to the complex-valued case.

Let us now turn to the main scope of this work: partial de-randomization. Effectively, Mendelson’s small ball method reduces the task of establishing nullspace properties to bounding the two parameters $Q_{2^{3/2}\xi}(\mathbf{a},T_{s})$ and $W_{m}(\mathbf{a},T_{s})$ in an appropriate fashion. A lower bound on the former readily follows from the Paley-Zygmund inequality, provided that the random vector $\mathbf{a}$ obeys

[TABLE]

where $C_{4}>0$ is a constant:

[TABLE]

In contrast, establishing an upper bound on $W_{m}(\mathbf{a},T_{s})$ via Dudley’s inequality requires subgaussian marginals (10) (that must not depend on the ambient dimension). This implicitly imposes stringent constraints on all moments simultaneously. An additional assumption allows us to considerably weaken these demands:

[TABLE]

Incoherence has long been identified as a key ingredient for developing $s$ -sparse recovery guarantees. Here, we utilize it to establish an upper bound on $W_{m}(\mathbf{a},T_{s})$ that does not rely on subgaussian marginals.

Lemma 1.

Let $\mathbf{a}\in\mathbb{C}^{n}$ be a random vector that is isotropic and incoherent. Let $T_{s}\subset\mathbb{C}^{n}$ be the complex-valued generalization of the set defined in Eq. (7) and assume $m\geq\log(2n)$ . Then,

[TABLE]

This bound only requires an appropriate scaling of the first two moments (isotropy). However, this partial derandomization comes at a price: the bound scales logarithmically in $n$ rather than $n/s$ . We defer a proof of this statement to Subsection V-A below. Inserting the bounds (13) and (15) into the assertion of Theorem 7 readily yields the main technical result of this work:

Theorem 8.

Suppose that $\mathbf{a}\in\mathbb{C}^{n}$ is a random vector that obeys incoherence, isotropy and the 4th moment bound. Then, choosing

[TABLE]

instances of $\mathbf{a}$ uniformly at random results in a measurement matrix $\mathbf{A}$ that obeys the complex-valued nullspace property of order $s$ with probability at least $1-2\mathrm{e}^{-\tilde{c}m}$ .

In complete analogy to the real-valued case, the complex nullspace property ensures uniform recovery of $s$ -sparse vectors $\mathbf{x}\in\mathbb{C}^{n}$ from linear measurements of the form $\mathbf{y}=\mathbf{A}\mathbf{x}$ via algorithm (1).

II-D Recovery guarantee for strength-four orthogonal arrays

Suppose that $\mathbf{a}_{oa}\in\left\{\pm 1\right\}^{n}$ is chosen uniformly from an orthogonal array with strength 4. By definition

[TABLE]

which establishes incoherence. Moreover, the components $a_{i}$ of $\mathbf{a}_{oa}$ obey $\mathbb{E}\left[a_{i}a_{j}\right]=\mathbb{E}\left[\epsilon_{i}\epsilon_{j}\right]=\delta_{ij}$ , because 4-wise independence necessarily implies 2-wise independence. Isotropy readily follows:

[TABLE]

Finally, 4-wise independence suffices to establish the 4th moment bound. By assumption $\mathbb{E}\left[a_{i}a_{j}\bar{a}_{k}\bar{a}_{l}\right]=\mathbb{E}\left[\epsilon_{i}\epsilon_{j}\epsilon_{k}\epsilon_{l}\right]$ and we may thus infer

[TABLE]

Therefore $\mathbf{a}_{oa}$ meets all the requirements of Theorem 8. The first main result then readily follows from the fact that the complex nullspace property ensures uniform recovery of all $s$ -sparse signals.
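The moment-matching argument can be checked explicitly on a small example. The sketch below builds a 16-run, 5-factor binary array by appending a parity bit to all 4-bit words (an assumed, standard construction of a strength-four array of this size) and verifies that all second and fourth mixed moments of a uniformly drawn row coincide with those of independent Rademacher signs.

```python
import itertools
from collections import Counter

def parity_array():
    # All 4-bit words extended by their parity bit, mapped 0/1 -> +1/-1.
    rows = []
    for bits in itertools.product((0, 1), repeat=4):
        word = bits + (sum(bits) % 2,)
        rows.append(tuple(1 - 2 * b for b in word))
    return rows

def array_moment(rows, indices):
    # E[a_i a_j ...] for a row drawn uniformly from the array.
    total = 0
    for row in rows:
        prod = 1
        for i in indices:
            prod *= row[i]
        total += prod
    return total / len(rows)

def rademacher_moment(indices):
    # Independent signs: the moment is 1 iff every index occurs an even number of times.
    return 1.0 if all(c % 2 == 0 for c in Counter(indices).values()) else 0.0

rows = parity_array()
mismatches = [
    idx
    for order in (2, 4)
    for idx in itertools.product(range(5), repeat=order)
    if array_moment(rows, idx) != rademacher_moment(idx)
]
print(len(rows), mismatches)
```

An empty list of mismatches confirms isotropy and the 4th moment identity for this particular array; incoherence is automatic because all entries are $\pm 1$ .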

II-E Recovery guarantee for mutually unbiased bases

Suppose that $\mathbf{a}_{mub}\in\mathbb{C}^{n}$ is chosen uniformly from a maximal set of $n$ mutually unbiased bases (excluding the standard basis) whose elements are re-normalized to length $\sqrt{n}$ . Random time-frequency shifts of the Alltop sequence (5) are a concrete example of such a sampling procedure, provided that the dimension $n\geq 5$ is an (odd) prime.

The vector $\mathbf{a}_{mub}$ is chosen from a union of $n$ bases that are all mutually unbiased with respect to the standard basis, see Eq. (4). Together with super-normalization ( $\|\mathbf{a}\|_{\ell_{2}}=\sqrt{n}$ ) this readily establishes incoherence: $\max_{1\leq k\leq n}|\langle\mathbf{e}_{k},\mathbf{a}\rangle|^{2}=\frac{n}{n}=1$ with probability one.

Next, by assumption $\mathbf{a}_{mub}$ is chosen uniformly from a union of $n$ re-scaled orthonormal bases $\left\{\sqrt{n}\mathbf{b}_{1}^{(l)},\ldots,\sqrt{n}\mathbf{b}_{n}^{(l)}\right\}$ with $1\leq l\leq n$ . Therefore, for any $\mathbf{z}\in\mathbb{C}^{n}$

[TABLE]

which establishes isotropy.

Finally, a maximal set of $(n+1)$ mutually unbiased bases – including the standard basis which we denote by $\mathbf{b}_{k}^{(n+1)}=\mathbf{e}_{k}$ – forms a 2-design according to Definition 2. For any $\mathbf{z}\in\mathbb{C}^{n}$ this property ensures

[TABLE]

which implies the 4th moment bound. In summary, the random vector $\mathbf{a}_{mub}\in\mathbb{C}^{n}$ meets the requirements of Theorem 8. Theorem 2 then readily follows from the implications of the nullspace property for $s$ -sparse recovery.
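The three properties can also be checked numerically for a small prime dimension. The sketch below assumes the parametrisation $[\mathbf{a}]_{k}=\exp\left(2\pi\mathrm{i}\left((k+a)^{3}+bk\right)/n\right)$ for the time-frequency shifted Alltop sequence, rescaled to length $\sqrt{n}$ ; this indexing convention and all function names are assumptions and may differ from Eq. (5) by phases or ordering.

```python
import cmath
import math

def alltop_vector(n, shift, modulation):
    # Entries exp(2*pi*i*((k + shift)^3 + modulation*k)/n); every entry has modulus one.
    return [
        cmath.exp(2j * math.pi * (((k + shift) ** 3 + modulation * k) % n) / n)
        for k in range(n)
    ]

def inner(u, v):
    return sum(x.conjugate() * y for x, y in zip(u, v))

n = 7  # any prime n >= 5
bases = [[alltop_vector(n, a, b) for b in range(n)] for a in range(n)]

# Within a basis: orthogonal vectors of squared length n.
orthogonal = all(
    abs(inner(basis[b], basis[c]) - (n if b == c else 0)) < 1e-9
    for basis in bases for b in range(n) for c in range(n)
)
# Across bases: |<u, v>|^2 = n for the rescaled vectors (1/n after normalisation).
unbiased = all(
    abs(abs(inner(u, v)) ** 2 - n) < 1e-9
    for a in range(n) for c in range(a + 1, n)
    for u in bases[a] for v in bases[c]
)
# Isotropy: the average of a a^* over all n^2 vectors is the identity.
isotropic = all(
    abs(sum(v[j] * v[k].conjugate() for basis in bases for v in basis) / n ** 2
        - (1 if j == k else 0)) < 1e-9
    for j in range(n) for k in range(n)
)
print(orthogonal, unbiased, isotropic)
```

Since every entry has modulus one, incoherence with respect to the standard basis holds by construction.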

III Extension to noisy measurements

The nullspace property may be generalized to address two imperfections in $s$ -sparse recovery simultaneously: (i) the vector $\mathbf{x}\in\mathbb{C}^{n}$ may only be approximately sparse in the sense that it is well-approximated by an $s$ -sparse vector, (ii) the measurements may be corrupted by additive noise: $\mathbf{y}=\mathbf{A}\mathbf{x}+\mathbf{s}$ with $\mathbf{s}\in\mathbb{C}^{m}$ .

To state this generalization, we need some additional notation. For $\mathbf{z}\in\mathbb{C}^{n}$ and $1\leq s\leq n$ , let $\mathbf{z}_{s}\in\mathbb{C}^{n}$ be the vector that only contains the $s$ largest entries in modulus. All other entries are set to zero. Likewise, we write $\mathbf{z}_{\bar{s}}=\mathbf{z}-\mathbf{z}_{s}$ to denote the remainder. In particular, $\sigma_{s}(\mathbf{z})=\|\mathbf{z}_{\bar{s}}\|_{\ell_{1}}$ . An $m\times n$ matrix $\mathbf{A}$ obeys the robust nullspace property of order $s$ with parameters $\rho\in(0,1)$ and $\tau>0$ if

[TABLE]

see e.g. [3, Definition 4.21]. This extension of the nullspace property is closely related to stable $s$ -sparse recovery from noisy measurements via basis pursuit denoising:

[TABLE]

Here, $\eta>0$ denotes an upper bound on the strength of the noise corruption: $\|\mathbf{s}\|_{\ell_{2}}\leq\eta$ . Indeed, [3, Theorem 4.22] draws the following connection: suppose that $\mathbf{A}$ obeys the robust nullspace property with parameters $\rho,\tau$ . Then, the solution $\mathbf{z}^{\sharp}\in\mathbb{C}^{n}$ to (16) is guaranteed to obey

[TABLE]

where $D_{1}=(1+\rho)^{2}/(1-\rho)$ and $D_{2}=(3+\rho)\tau/(1-\rho)$ . The first term on the r.h.s. vanishes if $\mathbf{x}$ is exactly $s$ -sparse and remains small if $\mathbf{x}$ is well approximated by an $s$ -sparse vector. The second term scales linearly in the noise bound $\eta\geq\|\mathbf{s}\|_{\ell_{2}}$ and vanishes in the absence of any noise corruption.
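The dependence of the two constants on $\rho$ is easy to tabulate. The helper below simply evaluates the formulas for $D_{1}$ and $D_{2}$ stated above; the function name and the choice $\tau=1$ are illustrative assumptions.

```python
def recovery_constants(rho, tau):
    # D1 = (1 + rho)^2 / (1 - rho) and D2 = (3 + rho) * tau / (1 - rho).
    if not 0.0 < rho < 1.0 or tau <= 0.0:
        raise ValueError("need 0 < rho < 1 and tau > 0")
    d1 = (1.0 + rho) ** 2 / (1.0 - rho)
    d2 = (3.0 + rho) * tau / (1.0 - rho)
    return d1, d2

for rho in (0.1, 0.5, 0.9):
    print(rho, recovery_constants(rho, tau=1.0))
```

Both constants grow without bound as $\rho\to 1$ , so smaller values of $\rho$ give tighter error bounds.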

In the previous section, we have established the classical nullspace property for measurements that are chosen independently from a vector distribution that is isotropic, incoherent and obeys a bound on the 4th moments. This argument may readily be extended to establish the robust nullspace property with relatively little extra effort. To this end, define the set

[TABLE]

A moment of thought reveals that the matrix $\mathbf{A}$ obeys the robust nullspace property with parameters $\rho,\tau$ if

[TABLE]

What is more, the following inclusion formula is also valid:

[TABLE]

see [35, Lemma 3] and [14, Lemma 4.5]. This ensures that the bounds on the parameters in Mendelson’s small ball method generalize in a rather straightforward fashion. Isotropy, incoherence and the 4th moment bound ensure

[TABLE]

Now, suppose that $\mathbf{A}$ subsumes $m\geq C\rho^{-2}s\log(2n)$ independent copies of the random vector $\mathbf{a}\in\mathbb{C}^{n}$ , where $C>0$ is sufficiently large. Then, Theorem 7 readily asserts

[TABLE]

with probability at least $1-2\mathrm{e}^{-\tilde{c}m}$ . Previously, we employed Mendelson’s small ball method to simply assert that a similar infimum is strictly positive. Eq. (19) provides a strictly positive lower bound with comparable effort. Comparing this relation to Eq. (18) highlights that this is enough to establish the robust nullspace property with parameters $\rho$ and $\tau=\frac{\rho}{c\sqrt{m}}$ with high probability. In turn, a stable generalization of the main recovery guarantee follows from Eq. (17).

Theorem 9.

Fix $\rho\in(0,1)$ and $s\in\mathbb{N}$ . Suppose that we sample $m\geq C\rho^{-2}s\log(n)$ independent copies of an isotropic, incoherent random vector $\mathbf{a}\in\mathbb{C}^{n}$ that also obeys the 4th moment bound. Then, with probability at least $1-2\mathrm{e}^{-\tilde{c}m}$ , the resulting measurement matrix $\mathbf{A}$ allows for stable, uniform recovery of (approximately) $s$ -sparse vectors. More precisely, the solution $\mathbf{z}^{\sharp}$ to (16) is guaranteed to obey

[TABLE]

where $D_{1},D_{2}>0$ depend only on $\rho$ .

IV Numerical experiments

In this part we demonstrate the performance which can be achieved with our proposed derandomized constructions and we compare this to generic measurement matrices (Gaussian, signed Bernoulli). However, since the orthogonal array construction is more involved we first provide additional details relevant for numerical experiments.

IV-A Details on orthogonal arrays

An orthogonal array $\text{OA}(\lambda\sigma^{t},n,\sigma,t)$ of strength $t$ , with $n$ factors and $\sigma$ levels is a $\lambda\sigma^{t}\times n$ array of $\sigma$ different symbols such that in any $t$ columns each of the $\sigma^{t}$ ordered $t$ -tuples occurs in exactly $\lambda$ rows. Arrays with $\lambda=1$ are called simple. A comprehensive treatment can be found in the book [16]. Known arrays are listed in several libraries (for example http://neilsloane.com/oadir/ or http://pietereendebak.nl/oapage/). Often the symbol alphabet is not relevant, but we use the set $\mathbb{Z}_{\sigma}=\{0,\dots,\sigma-1\}$ for concreteness. Such arrays can be represented as a matrix in $\mathbb{Z}_{\sigma}^{\lambda\sigma^{t}\times n}$ . For $\sigma=q^{p}$ with $q$ prime the simple orthogonal array $\text{OA}(\sigma^{t},n,\sigma,t)$ is linear if the $q^{pt}$ rows of the matrix form a vector space over $\mathbb{F}_{q}$ . The runs of an orthogonal array (the rows of the corresponding matrix) can also be interpreted as codewords of a code and vice versa. The array is linear if and only if the corresponding code is linear [16, Chapter 4]. This relationship allows one to employ classical code constructions to construct orthogonal arrays.
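The definition translates directly into a brute-force check that is feasible for small arrays. The following sketch tests the strength of an array over $\mathbb{Z}_{\sigma}$ ; the function name and the example array (all 4-bit words with an appended parity bit, which is the set of codewords of a binary linear code) are assumptions chosen for illustration.

```python
import itertools
from collections import Counter

def has_strength(array, levels, t):
    # True if, in every choice of t columns, each ordered t-tuple of symbols
    # appears in the same number (lambda) of rows.
    runs, factors = len(array), len(array[0])
    if runs % levels ** t != 0:
        return False
    lam = runs // levels ** t
    for cols in itertools.combinations(range(factors), t):
        counts = Counter(tuple(row[c] for c in cols) for row in array)
        if len(counts) != levels ** t or set(counts.values()) != {lam}:
            return False
    return True

# Illustrative array with 16 runs and 5 factors: 4-bit words plus a parity bit.
array = [bits + (sum(bits) % 2,) for bits in itertools.product((0, 1), repeat=4)]
print(has_strength(array, 2, 4), has_strength(array, 2, 5))
```

The check is exponential in $t$ and quadratic or worse in the number of factors, so it is only meant for validating small seed arrays.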

IV-B Counting bits

In this work we propose to generate $m\times n$ sampling matrices $\mathbf{A}$ by selecting $m\leq M=\lambda\sigma^{t}$ rows at random from an orthogonal array $\text{OA}(\lambda\sigma^{4},n,\sigma,4)$ , finally removing the bias (subtracting $(\sigma-1)/2$ per component) and scaling appropriately. Intuitively, $m\log_{2}(M)$ bits are then required to specify such a matrix $\mathbf{A}$ . For $t=4$ and $k=n$ , a classical lower bound due to Rao [42] demands

[TABLE]

Arrays that saturate this bound are called tight (or complete). In summary, an order of $s\log^{2}(n)$ bits are required to sample an $m\times n$ matrix $\mathbf{A}$ with $m\geq Cs\log(n)$ rows according to this procedure.
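As a concrete illustration of the bit count, the sketch below evaluates $m\log_{2}(M)$ for the array sizes used in the numerical experiments of Section IV-D and compares it with the $mn$ bits needed for a generic signed Bernoulli matrix of the same size (one bit per entry). The function names are illustrative.

```python
import math

def oa_bits(m, run_size):
    # Each of the m rows is specified by an index into the M runs of the array.
    return m * math.ceil(math.log2(run_size))

def bernoulli_bits(m, n):
    # A generic m x n sign matrix needs one random bit per entry.
    return m * n

m, n, run_size = 323, 1295, 65536  # sizes from the orthogonal array experiment
print(oa_bits(m, run_size), bernoulli_bits(m, n))
```

For these sizes the row-index description uses 5168 bits, compared with 418285 bits for independent signs.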

IV-C Strength- $4$ Constructions

For compressed sensing applications we want arrays with a large number of factors $n$ , since this corresponds to the ambient dimension $n=k$ of the sparse vectors to recover. On the other hand, the run size $M$ should scale “moderately” so that the random matrices can be described with few bits. Most constructions use an existing orthogonal array as a seed to construct larger arrays. Known binary arrays of strength $4$ are for example the simple array $\text{OA}(16,5,2,4)$ , or $\text{OA}(80,6,2,4)$ . Ref. [43] proposes an algorithm that uses a linear orthogonal array $\text{OA}(N,n,\sigma,t)$ as a seed to construct a linear orthogonal array $\text{OA}(N^{2},n^{2}+2n,\sigma,t)$ . This procedure may then be iterated.

IV-D Numerical results for orthogonal arrays

Figure 1 summarizes the empirical performance of basis pursuit (1) from independent orthogonal array measurements. We consider real-valued signals and quantify the performance in terms of the normalized $\ell_{2}$ -recovery error (NMSE). To construct the orthogonal array, the algorithm from [43] is applied twice: $\text{OA}(16,5,2,4)\rightarrow\text{OA}(256,35,2,4)\rightarrow\text{OA}(65536,1295,2,4)$ .

We then sample $323$ rows uniformly from this array, i.e. the sampling matrix $\mathbf{A}$ has $\pm 1$ entries (mapping $\{0,1\}\rightarrow\{\pm 1\}$ ) and size $323\times 1295$ . Note that, in the case of non-negative sparse vectors, the corresponding 0/1-matrices may be used instead to recover with non-negative least-squares [44]. The sparsity of the unknown vector has been varied between $1\dots 180$ . For each sparsity many experiments are performed to compute the NMSE. In each run, the support of the unknown vector has been chosen uniformly at random and the values are independent instances of a standard Gaussian random variable. For comparison, we have also included the corresponding performances of a generic sampling matrix (signed Bernoulli) of the same size. Numerically, the partially derandomized orthogonal array construction achieves essentially the same performance as its generic counterpart.

IV-E Numerical results for the Alltop design

Figure 1 shows the NMSE achieved for measurement matrices based on subsampling from an Alltop design (5). The data is obtained in the same way as above, but the entries of the sparse vectors on the support are i.i.d. complex normal. For comparison, the results for a (complex) standard Gaussian sampling matrix are included as well. Again, the performance of random Alltop-design measurements essentially matches its generic (Gaussian) counterpart.

V Additional proofs

V-A Proof of Lemma 1

The inclusion $T_{s}\subset 2\mathrm{conv}(\Sigma_{s}^{n})$ remains valid in the complex case. Moreover, every $\mathbf{z}\in\mathrm{conv}(\Sigma_{s}^{n})$ necessarily obeys

[TABLE]

because the maximum value of a convex function over a convex set is achieved at the boundary. Hölder’s inequality therefore implies

[TABLE]

where $\mathbf{h}=\frac{1}{\sqrt{m}}\sum_{i=1}^{m}\epsilon_{i}\mathbf{a}_{i}\in\mathbb{C}^{n}$ . Moreover,

[TABLE]

and we may bound both expressions on the r.h.s. independently. For the first term, fix $\theta>0$ and use Jensen’s inequality (the logarithm is a concave function) to obtain

[TABLE]

Monotonicity and non-negativity of the exponential function then imply

[TABLE]

where we have also used that all $\epsilon_{i}$ ’s and $\mathbf{a}_{i}$ ’s are independent. The remaining moment generating functions can be bounded individually. Fix $1\leq k\leq n$ , $\sigma\in\left\{\pm 1\right\}$ and $1\leq i\leq m$ and exploit the Rademacher randomness to infer

[TABLE]

because $\sigma^{2}=1$ . Incoherence moreover ensures $\mathrm{Re}(\langle\mathbf{e}_{k},\mathbf{a}_{i}\rangle)^{2}\leq|\langle\mathbf{e}_{k},\mathbf{a}_{i}\rangle|^{2}\leq 1$ . This ensures that the remaining expectation value is upper-bounded by $\exp\left(\frac{\theta^{2}}{2m}\right)$ . Inserting these individual bounds into the expression above yields

[TABLE]

for any $0<\theta\leq\sqrt{2m}$ . Choosing $\theta=\sqrt{2\log(2n)}$ is feasible and minimizes this upper bound. A completely analogous bound can be derived for the expected maximum absolute value of the imaginary part. Combining both yields

[TABLE]

and inserting this bound into Eq. (21) ensures

[TABLE]
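The choice of $\theta$ in the proof can be double-checked numerically. The sketch below assumes that the intermediate bound has the form $f(\theta)=\log(2n)/\theta+\theta/2$ , which is what results from combining $2n$ moment generating functions that are each bounded by $\exp(\theta^{2}/2)$ ; it then confirms on a grid that $\theta=\sqrt{2\log(2n)}$ minimizes $f$ and is feasible whenever $m\geq\log(2n)$ . The function names and the values of $n$ and $m$ are illustrative.

```python
import math

def max_bound(theta, n):
    # Assumed form of the bound on E max_k |Re h_k| before optimising over theta.
    return math.log(2 * n) / theta + theta / 2.0

n, m = 1295, 323
theta_star = math.sqrt(2.0 * math.log(2 * n))  # claimed minimiser
feasible = theta_star <= math.sqrt(2.0 * m)    # requires m >= log(2n)

# Grid search over the feasible interval (0, sqrt(2m)].
grid = [k * math.sqrt(2.0 * m) / 10_000 for k in range(1, 10_001)]
theta_grid = min(grid, key=lambda th: max_bound(th, n))
print(feasible, theta_star, theta_grid, max_bound(theta_star, n))
```

At the minimizer the bound equals $\sqrt{2\log(2n)}$ , which is the source of the logarithmic scaling in $n$ rather than $n/s$ .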

V-B Proof of Theorem 7

The proof is based on rather straightforward modifications of Tropp’s proof for Mendelson’s small ball method [37]. Let $\mathbf{a}\in\mathbb{C}^{n}$ be a complex-valued random vector. Suppose that $\mathbf{a}_{1},\ldots,\mathbf{a}_{m}\in\mathbb{C}^{n}$ are independent copies of $\mathbf{a}$ and let $\mathbf{A}$ be the $m\times n$ matrix whose $m$ rows correspond to these vectors. The goal is to obtain a lower bound on $\inf_{\mathbf{z}\in E}\|\mathbf{A}\mathbf{z}\|_{\ell_{2}},$ where $E\subset\mathbb{C}^{n}$ is an arbitrary, but fixed, set. First, note that $\ell_{1}$ and $\ell_{2}$ norms on $\mathbb{R}^{2m}$ are related via $\|\mathbf{v}\|_{\ell_{2}}\geq(2m)^{-1/2}\|\mathbf{v}\|_{\ell_{1}}$ . For fixed $\mathbf{z}\in E$ this ensures

[TABLE]

Next, we fix $\xi>0$ arbitrary and introduce the indicator function $\mathbb{I}\left\{x\geq\xi\right\}$ which obeys $x\geq\xi\mathbb{I}\left\{x\geq\xi\right\}$ for all $x\geq 0$ . Consequently, $\|\mathbf{A}\mathbf{z}\|_{\ell_{2}}$ is lower-bounded by

[TABLE]

Also, note that the expectation value of each summand obeys

[TABLE]

according to the union bound. The last line follows from the following observation. Let $z=a+ib$ be a complex number. Then, $|z|=\sqrt{a^{2}+b^{2}}\geq\sqrt{2}\xi$ necessarily implies either $|a|\geq\xi$ , or $|b|\geq\xi$ (or both). Now, define

[TABLE]

and note that the estimate from above ensures

[TABLE]

Adding and subtracting $\xi(m/2)^{1/2}Q_{2\xi}(\mathbf{z})$ to Eq. (22) and taking the infimum yields

[TABLE]

Here we have applied Eq. (23) to the first term. Since $Q_{2\xi}(\mathbf{z})$ features both a real and an imaginary part, we can split up the remaining supremum accordingly. The suprema over real and imaginary parts individually correspond to

[TABLE]

and we denote them by $R(E,\mathbf{a})$ and $I(E,\mathbf{a})$ , respectively. The vectors $\mathbf{a}_{1},\ldots,\mathbf{a}_{m}$ are independent copies of $\mathbf{a}\in\mathbb{C}^{n}$ . The bounded difference inequality [45, Section 6.1] asserts that both expressions concentrate around their expectation. More precisely, for any $t>0$

[TABLE]

Therefore, the union bound grants a transition from $R(E,\mathbf{a})+I(E,\mathbf{a})$ to $\mathbb{E}R(E,\mathbf{a})+\mathbb{E}I(E,\mathbf{a})+2\sqrt{m}t$ with probability at least $1-2\mathrm{e}^{-t^{2}/2}$ . These expectation values can be further simplified. Define the soft indicator function

[TABLE]

which obeys $\mathbb{I}\left\{|s|\geq 2\xi\right\}\leq\psi_{\xi}(s)\leq\mathbb{I}\left\{|s|\geq\xi\right\}$ for all $s\in\mathbb{R}$ . Moreover, $\xi\psi_{\xi}(s)$ is a contraction, i.e. a real-valued function with Lipschitz constant one that also obeys $\xi\psi_{\xi}(0)=0$ . Rademacher symmetrization [3, Lemma 8.4] and the Rademacher comparison principle [46, Eq. (4.20)] yield

[TABLE]

where $\mathbf{h}=\frac{1}{\sqrt{m}}\sum_{i=1}^{m}\epsilon_{i}\mathbf{a}_{i}\in\mathbb{C}^{n}$ . A completely analogous bound holds true for $\mathbb{E}I(E,\mathbf{a})$ . Inserting both bounds into Eq. (24) establishes

[TABLE]

with probability at least $1-2\mathrm{e}^{-t^{2}/2}$ . Setting $W_{m}(E,\mathbf{a})=\mathbb{E}\sup_{\mathbf{z}\in E}|\langle\mathbf{z},\mathbf{h}\rangle|$ establishes the claim.
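The properties of the soft indicator function used in the proof can be checked on a grid. The sketch below assumes the piecewise-linear form from Tropp's argument, $\psi_{\xi}(s)=0$ for $|s|\leq\xi$ , $\psi_{\xi}(s)=(|s|-\xi)/\xi$ for $\xi<|s|\leq 2\xi$ and $\psi_{\xi}(s)=1$ otherwise; the grid and the value of $\xi$ are arbitrary.

```python
def soft_indicator(s, xi):
    # Piecewise-linear interpolation between the indicators of |s| >= xi and |s| >= 2*xi.
    a = abs(s)
    if a <= xi:
        return 0.0
    if a <= 2.0 * xi:
        return (a - xi) / xi
    return 1.0

xi = 0.3
grid = [i / 1000.0 for i in range(-2000, 2001)]

# Sandwich property: I{|s| >= 2 xi} <= psi_xi(s) <= I{|s| >= xi}.
sandwiched = all(
    (1.0 if abs(s) >= 2 * xi else 0.0)
    <= soft_indicator(s, xi)
    <= (1.0 if abs(s) >= xi else 0.0)
    for s in grid
)
# Contraction property: xi * psi_xi is 1-Lipschitz and vanishes at the origin.
lipschitz = all(
    abs(xi * soft_indicator(s, xi) - xi * soft_indicator(t, xi)) <= abs(s - t) + 1e-12
    for s, t in zip(grid, grid[1:])
)
print(sandwiched, lipschitz, xi * soft_indicator(0.0, xi))
```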

Acknowledgements

This work can be seen as a continuation of the research program that David Gross devised for RK’s doctoral studies. PJ is supported by DFG grant JU 2795/3 and DAAD grant 57417688. RK was in part supported by Joel A. Tropp under ONR Award No. N00014-17-12146 and also acknowledges funding provided by the Institute of Quantum Information and Matter, an NSF Physics Frontiers Center (NSF Grant PHY-1733907). DGM was partially supported by AFOSR FA9550-18-1-0107, NSF DMS 1829955, and the Simons Institute of the Theory of Computing.

Bibliography

[1] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[2] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489–509, 2006.
[3] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, ser. Applied and Numerical Harmonic Analysis. Birkhäuser/Springer, New York, 2013.
[4] Y. C. Eldar and G. Kutyniok, Compressed Sensing: Theory and Applications. Cambridge University Press, 2012.
[5] E. J. Candès, T. Strohmer, and V. Voroninski, “Phaselift: exact and stable signal recovery from magnitude measurements via convex programming,” Commun. Pure Appl. Math., vol. 66, pp. 1241–1274, 2013.
[6] E. Candès and X. Li, “Solving quadratic equations via Phase Lift when there are about as many equations as unknowns,” Found. Comput. Math., pp. 1–10, 2013.
[7] D. Gross, F. Krahmer, and R. Kueng, “A partial derandomization of phaselift using spherical designs,” J. Fourier Anal. Appl., vol. 21, no. 2, pp. 229–266, 2015.
[8] R. Kueng, H. Rauhut, and U. Terstiege, “Low rank matrix recovery from rank one measurements,” Appl. Comput. Harmon. Anal., vol. 42, no. 1, pp. 88–116, 2017.
