Deterministic Sparse Fourier Transform with an ell_infty Guarantee

Yi Li; Vasileios Nakos

arXiv:1903.00995·cs.DS·May 8, 2020

TL;DR

This paper develops deterministic algorithms for sparse Fourier transform recovery with strong infinity-norm guarantees, matching known lower bounds and constructing incoherent matrices via derandomization techniques.

Contribution

It introduces nearly optimal deterministic sampling and recovery algorithms for sparse Fourier transforms with $\ell_{\infty}/\ell_{1}$ guarantees, and provides new derandomized incoherent matrix constructions.

Findings

01

Deterministic $\ell_{\infty}/\ell_{1}$ recovery with $O(k^{2}\log n)$ samples.

02

New derandomized incoherent matrix constructions matching randomized bounds.

03

Algorithms are nearly sample-optimal, approaching theoretical lower bounds.

Abstract

In this paper we revisit the deterministic version of the Sparse Fourier Transform problem, which asks to read only a few entries of $x\in\mathbb{C}^{n}$ and design a recovery algorithm such that the output of the algorithm approximates $\widehat{x}$, the Discrete Fourier Transform (DFT) of $x$. The randomized case has been well-understood, while the main work in the deterministic case is that of Merhi et al. (J Fourier Anal Appl 2018), which obtains $O(k^{2}\log^{-1}k\cdot\log^{5.5}n)$ samples and a similar runtime with the $\ell_{2}/\ell_{1}$ guarantee. We focus on the stronger $\ell_{\infty}/\ell_{1}$ guarantee and the closely related problem of incoherent matrices. We list our contributions as follows. 1. We find a deterministic collection of $O(k^{2}\log n)$ samples for the $\ell_{\infty}/\ell_{1}$ recovery in time $O(nk\log^{2}n)$, and a deterministic collection of $O(k^{2}\log^{2}n)$ samples…

Tables

Table 1: Common guarantees of sparse recovery. Only the $\ell_{2}/\ell_{2}$ case requires a parameter $C>1$. The guarantees are listed in descending order of strength.

Guarantee	Formula	Deterministic Lower Bound
$ℓ_{\infty} / ℓ_{2}$	${‖ \hat{x} - {\hat{x}}^{'} ‖}_{\infty} \leq {‖ {\hat{x}}_{- k} ‖}_{2} / \sqrt{k}$	$Ω (n)$ [CDD09]
$ℓ_{2} / ℓ_{2}$	${‖ \hat{x} - {\hat{x}}^{'} ‖}_{2} \leq C {‖ {\hat{x}}_{- k} ‖}_{2}$	$Ω (n)$ [CDD09]
$ℓ_{\infty} / ℓ_{1}$	${‖ \hat{x} - {\hat{x}}^{'} ‖}_{\infty} \leq {‖ {\hat{x}}_{- k} ‖}_{1} / k$	$Ω (k^{2} + k \log n)$ [Gan08, FPRU10]
$ℓ_{2} / ℓ_{1}$	${‖ \hat{x} - {\hat{x}}^{'} ‖}_{2} \leq {‖ {\hat{x}}_{- k} ‖}_{1} / \sqrt{k}$	$Ω (k \log (n / k))$ [Gan08, FPRU10]

Table 2: Comparison of our results and the previous results. All $O$- and $\Omega$-notations are suppressed. The result in the first row follows from Lemma 2.4 and the RIP matrix in [HR16]. Our algorithms adopt the common assumption in the sparse FT literature that the signal-to-noise ratio is bounded by $n^{c}$ for some absolute constant $c>0$.

Result	Samples	Run-time	Guarantee	Explicit Construction	Lower Bound
[HR16]	$k \log^{2} k \log n$	$poly (n)$	$ℓ_{2} / ℓ_{1}$	No	$k \log (n / k)$
[MZIC18]	$k^{2} \log^{5.5} n / \log k$	$k^{2} \log^{5.5} n / \log k$	$ℓ_{2} / ℓ_{1}$	Yes	$k \log (n / k)$
Theorem 2.7	$k^{2} \log n$	$n k \log^{2} n$	$ℓ_{\infty} / ℓ_{1}$	Yes	$k^{2} + k \log n$ [NNW14]
Theorem 2.8	$k^{2} \log^{2} n$	$k^{2} \log^{3} n$	$ℓ_{\infty} / ℓ_{1}$	Yes	$k^{2} + k \log n$ [NNW14]

Table 3: Notation and semantics for variables in this subsection.

Notation	Semantics
$C$	Absolute Constant
$B$	Number of “Buckets”, power of $2$
$d$	Number of “repetitions”
$β$	equals $C B / d$
$γ$	Rate of SNR decrease
$μ$	Given Approximation to SNR
$ν^{(t)}$	Approximation of SNR at the $t$ -th step
$r^{(t)}$	Residual at the $t$ -th step

Equations

$$\|\widehat{x}-\widehat{x}^{\prime}\|_{2}\leq\frac{1}{\sqrt{k}}\|\widehat{x}_{-k}\|_{1},$$

$$\|\widehat{x}-\widehat{x}^{\prime}\|_{\infty}\leq\frac{1}{k}\|\widehat{x}_{-k}\|_{1}.$$

$$\|x_{C}\|_{1}=\frac{n-k}{n-\gamma k}\|x_{B\cup C}\|_{1}.$$

$$\begin{aligned}
\|0-x\|_{2}^{2}
&\leq\gamma k\cdot\frac{4}{k^{2}}\|x_{-k}\|_{1}^{2}+\frac{1}{n-\gamma k}\|x_{B\cup C}\|_{1}^{2}\\
&\leq\frac{4\gamma}{k}\|x_{-k}\|_{1}^{2}+\frac{n-\gamma k}{(n-k)^{2}}\|x_{C}\|_{1}^{2}\\
&\leq\left(\frac{4\gamma}{k}+\frac{1+\gamma}{n-k}\right)\|x_{-k}\|_{1}^{2}\\
&\leq\frac{5\gamma}{k}\|x_{-k}\|_{1}^{2},
\end{aligned}$$

$$\theta_{j}=\left(\frac{2\pi}{n}j\right)\bmod 2\pi,$$

$$x_{\beta}=x_{i^{*}}e^{\sqrt{-1}\beta\theta_{i^{*}}}+\sum_{j\neq i^{*}}x_{j}e^{\sqrt{-1}\beta\theta_{j}},$$

$$(m_{H})_{s}=\sum_{f\in[n]}G_{\pi(f)-(n/B)\cdot s}\,\omega^{a\sigma f}\cdot x_{f}\in\mathbb{C}$$

$$u_{h(f)}=\Delta_{h(f)}+\sum_{f^{\prime}\in[n]}G_{o_{f}(f^{\prime})}(x-z)_{f^{\prime}}\,\omega^{a\sigma f^{\prime}},$$

$$G_{o_{f}(f)}^{-1}(m_{H})_{h(f)}\,\omega^{-a\sigma f}=x_{f}+\underbrace{G_{o_{f}(f)}^{-1}\sum_{f^{\prime}\in[n]\setminus\{f\}}G_{o_{f}(f^{\prime})}\,x_{f^{\prime}}\,\omega^{a\sigma(f^{\prime}-f)}}_{\text{noise term}}.$$

$$\sum_{r\in[d]}G_{o_{f,r}(f)}^{-1}G_{o_{f,r}(f^{\prime})}\leq\frac{2d}{B},$$

$$\left|x_{f}-G_{o_{f,r}(f)}^{-1}(m_{H_{r}})_{h_{r}(f)}\right|\leq\frac{10}{B}\|x_{[n]\setminus\{f\}}\|_{1}.$$

$$\begin{aligned}
\sum_{r\in[d]}\left|x_{f}-G_{o_{f,r}(f)}^{-1}(m_{H_{r}})_{h_{r}(f)}\right|
&\leq\sum_{r\in[d]}G_{o_{f,r}(f)}^{-1}\sum_{f^{\prime}\in[n]\setminus\{f\}}G_{o_{f,r}(f^{\prime})}|x_{f^{\prime}}|\\
&=\sum_{f^{\prime}\in[n]\setminus\{f\}}|x_{f^{\prime}}|\sum_{r\in[d]}G_{o_{f,r}(f)}^{-1}G_{o_{f,r}(f^{\prime})}\\
&\leq\sum_{f^{\prime}\in[n]\setminus\{f\}}|x_{f^{\prime}}|\cdot\frac{2d}{B}.
\end{aligned}$$

$$\nu^{(t)}=C\mu\gamma^{T-t},$$

$$\|r^{(T)}\|_{\infty}\leq\max\{\|r_{I}^{(T)}\|_{\infty},\|r_{I^{c}}^{(T)}\|_{\infty}\}\leq\max\{\nu^{(T)},\|x_{I^{c}}\|_{\infty}\}\leq\max\{2\mu,(1/\rho)\mu\}=2\mu.$$

$$\sum_{r\in[d]}G_{o_{f,r}(f^{\prime})}\leq\frac{2}{1+\epsilon}\cdot\frac{d}{B}.$$

$$\Pr(A_{f,f^{\prime}}\mid\sigma_{1},b_{1},\dots,\sigma_{r},b_{r})\leq h_{r}(f,f^{\prime};\sigma_{1},b_{1},\dots,\sigma_{r},b_{r})$$

$$\sum_{f\neq f^{\prime}}h_{0}(f,f^{\prime})<1$$

$$h_{r}(f,f^{\prime};\sigma_{1},b_{1},\dots,\sigma_{r},b_{r})\geq\mathbb{E}_{\sigma_{r+1},b_{r+1}}h_{r+1}(f,f^{\prime};\sigma_{1},b_{1},\dots,\sigma_{r},b_{r},\sigma_{r+1},b_{r+1})$$

$$\sum_{f\neq f^{\prime}}h_{r+1}(f,f^{\prime};\sigma_{1},b_{1},\dots,\sigma_{r},b_{r},\sigma_{r+1},b_{r+1}).$$

$$h_{r}(f,f^{\prime};\sigma_{1},b_{1},\dots,\sigma_{r},b_{r})=e^{-\lambda\beta}\exp\left(\lambda\sum_{\ell=1}^{r}G_{o_{f,\ell}(f^{\prime})}\right)(M(\lambda))^{d-r},$$

$$M(\lambda)=e^{\lambda\epsilon}\left[\left(\frac{2}{B}+\frac{1}{n}\right)\left(e^{\lambda(1-\epsilon)}-1\right)+1\right].$$

$$o_{f,\sigma,b}(f^{\prime})\equiv\sigma(f^{\prime}-f)+\sigma(f-b)-\frac{n}{B}\operatorname{round}\left(\frac{B}{n}\sigma(f-b)\right)\pmod{n}.$$

$$Z_{\sigma}=\sigma(f-b)-\frac{n}{B}\operatorname{round}\left(\frac{B}{n}\sigma(f-b)\right).$$

$$Z_{\sigma}=\frac{n}{B}\left(\frac{B}{n}\sigma(f-b)-\operatorname{round}\left(\frac{B}{n}\sigma(f-b)\right)\right),$$

$$\bigcup_{\text{odd }\ell}\left[2^{s}\ell-\frac{n}{2B},\;2^{s}\ell+\frac{n}{2B}\right)$$

$$\mathbb{E}\,e^{\lambda G_{o_{f,\sigma,b}(f^{\prime})}}\leq\left(\frac{2}{B}+\frac{1}{n}\right)e^{\lambda}+\left(1-\frac{2}{B}-\frac{1}{n}\right)e^{\lambda\epsilon}=e^{\lambda\epsilon}\left[\left(\frac{2}{B}+\frac{1}{n}\right)\left(e^{\lambda(1-\epsilon)}-1\right)+1\right],$$

Full text

Yi Li

Nanyang Technological University

Vasileios Nakos

Saarland University and Max-Planck Institute for Informatics

This work is part of the project TIPEA that has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 850979). Part of the work was completed when the author was a Ph.D. student at Harvard University and supported in part by NSF CAREER award CCF-1350670.

In this paper we revisit the deterministic version of the Sparse Fourier Transform problem, which asks to read only a few entries of $x\in\mathbb{C}^{n}$ and design a recovery algorithm such that the output of the algorithm approximates $\widehat{x}$ , the Discrete Fourier Transform (DFT) of $x$ . The randomized case has been well-understood, while the main work in the deterministic case is that of Merhi et al. (J Fourier Anal Appl 2018), which obtains $O(k^{2}\log^{-1}k\cdot\log^{5.5}n)$ samples and a similar runtime with the $\ell_{2}/\ell_{1}$ guarantee. We focus on the stronger $\ell_{\infty}/\ell_{1}$ guarantee and the closely related problem of incoherent matrices. We list our contributions as follows.

1. We find a deterministic collection of $O(k^{2}\log n)$ samples for the $\ell_{\infty}/\ell_{1}$ recovery in time $O(nk\log^{2}n)$, and a deterministic collection of $O(k^{2}\log^{2}n)$ samples for the $\ell_{\infty}/\ell_{1}$ sparse recovery in time $O(k^{2}\log^{3}n)$.

2. We give new deterministic constructions of incoherent matrices that are row-sampled submatrices of the DFT matrix, via a derandomization of Bernstein's inequality and bounds on exponential sums considered in analytic number theory. Our first construction matches a previous randomized construction of Nelson, Nguyen and Woodruff (RANDOM 12), where there was no constraint on the form of the incoherent matrix.

Our algorithms are nearly sample-optimal, since a lower bound of $\Omega(k^{2}+k\log n)$ is known, even for the case where the sensing matrix can be arbitrarily designed. A similar lower bound of $\Omega(k^{2}\log n/\log k)$ is known for incoherent matrices.

1 Introduction

Compressed sensing is a subfield of discrete signal processing, based on the principle that a high-dimensional signal can be approximately reconstructed, by exploiting its sparsity, from fewer samples than those demanded by the Shannon-Nyquist theorem. An important subtopic is the Sparse Fourier Transform, where we desire to detect and approximate the largest coordinates of a high-dimensional signal, given a few samples from its Fourier spectrum. Using fewer samples plays a crucial role, for example, in medical imaging, where reconstructing an image corresponds exactly to reconstructing a signal from its Fourier representation. Thus, the number of Fourier coefficients needed for (approximate) reconstruction is proportional to the radiation dose a patient receives as well as the time the patient needs to remain in the scanner. Furthermore, exploiting the sparsity of the signal has given researchers the hope of defeating the FFT algorithm of Cooley and Tukey, in the special (but of high practical value) case where the signal is approximately sparse. Since the FFT serves as an important computational primitive, and has been recognized as one of the 10 most important algorithms of the 20th century [Cip00], every application of it could potentially benefit from a faster algorithm. The main intuition and hope is that signals arising in practice often exhibit certain structure, such as concentration of energy in a small number of Fourier coefficients.

Since vectors in practice are never exactly sparse, and it is impossible to reconstruct a generic vector $\widehat{x}\in\mathbb{C}^{n}$ from $o(n)$ samples, researchers resort to approximation. More formally, a sparse recovery scheme consists of a sample set $S\subseteq\{1,\dots,n\}$ and a recovery algorithm $\mathcal{R}$ such that for any given $x\in\mathbb{C}^{n}$, the scheme approximates $\widehat{x}$ by $\widehat{x}^{\prime}=\mathcal{R}(x_{S})$, where $x_{S}$ denotes the vector $x$ restricted to the coordinates in $S$. The quality of the approximation is measured with respect to the best $k$-sparse approximation to $\widehat{x}$. The breakthrough work of Candès, Tao and Donoho [CT06, Don06] first showed that $k\log^{O(1)}n$ samples of $x\in\mathbb{C}^{n}$ suffice to reconstruct an $O(k)$-sparse vector $\widehat{x}^{\prime}$ which is “close” to the best $k$-sparse approximation of $\widehat{x}$. More formally, the reconstruction $\widehat{x}^{\prime}$ satisfies the so-called $\ell_{2}/\ell_{1}$ guarantee, i.e.,

$$\|\widehat{x}-\widehat{x}^{\prime}\|_{2}\leq\frac{1}{\sqrt{k}}\|\widehat{x}_{-k}\|_{1},$$

where $\widehat{x}_{-k}$ is the tail vector, obtained by restricting $\widehat{x}$ to its smallest $n-k$ coordinates in magnitude. The strength of their algorithms lies in the uniformity, in the sense that the samples at the same coordinates can be used to approximate every $x\in\mathbb{C}^{n}$. However, the running time is polynomial in the vector length $n$, thus giving only sample-efficient, but not necessarily time-efficient, algorithms. Furthermore, the samples are not obtained via a deterministic procedure, but are chosen at random. Regarding non-uniform randomized algorithms that run in sublinear time, numerous researchers have worked on the problem and obtained a series of algorithms with different recovery guarantees [GL89, Man92, KM93, GGI+02, AGS03, GMS05, HIKP12a, HIKP12b, LWC13, Iwe13, PR14, IKP14, IK14, Kap16, Kap17, KVZ19, NSW19]. See Table 1 for a list of common recovery guarantees. The state of the art is the seminal algorithm of Kapralov [Kap17], which shows that $O(k\log n)$ samples and $O(k\log^{O(1)}n)$ time are simultaneously possible for the $\ell_{2}/\ell_{2}$ guarantee (which is strictly stronger than the $\ell_{2}/\ell_{1}$ guarantee; see Footnote 1). The fastest algorithm is due to [HIKP12a], needing $O(k\log n\cdot\log(n/k))$ time and samples. We note also the algorithm of Indyk and Kapralov [IK14] that runs in $O(n\log^{2}n)$ time and uses $O(k\log n)$ samples, but gives the $\ell_{\infty}/\ell_{2}$ guarantee, which is stronger than the $\ell_{2}/\ell_{2}$ guarantee in the previous two papers. We refer the reader to the next section for a comparison of the different guarantees appearing in the literature.

Recently there has also been considerable work on recovering $k$-sparse signals from their continuous Fourier Transform, see [BCG+14, PS15, CKPS16, AKM+18].

Footnote 1. Here we mean that given an algorithm giving the $\ell_{2}/\ell_{2}$ guarantee, one can create an algorithm, using the $\ell_{2}/\ell_{2}$ algorithm as a black box, with sparsity parameter $k^{\prime}=O(k)$, achieving the $\ell_{2}/\ell_{1}$ guarantee with the same order of number of samples.

Although our understanding of randomized algorithms is almost complete, there are still important gaps in our knowledge regarding deterministic schemes. The following natural open-ended question is of theoretical and practical interest and remains largely unexplored, touching a variety of fields including (sublinear-time) algorithms, pseudorandomness and computational complexity, additive combinatorics [BDF+11] and analytic number theory.

Question 1.1.

What are the best bounds we can obtain for the different versions of the deterministic Sparse Fourier Transform problem?

With sublinear runtime, the earliest work of Iwen [Iwe08, Iwe10] gives $O(k^{2}\log^{4}n)$ samples and time, albeit in a significantly easier (although similar) model, where one wants to learn a band-limited function $f:[0,2\pi)\to\mathbb{C}$ and can evaluate $f$ at any point. In the discrete case which we are interested in, the state of the art is the work of Merhi et al. [MZIC18], which obtains $O(k^{2}\log^{11/2}n/\log k)$ samples and the same runtime. A recent work of Bittens et al. [BZI17] showed that the quadratic dependence can be dropped if the signals are sufficiently structured, namely, if the Fourier coefficients are generated by an unknown but small degree polynomial. On the related problem of the Walsh-Hadamard Transform, Indyk and Cheraghchi [CI17] showed that roughly $O(k^{1+\alpha}\log^{O(1)+6/\alpha}n)$ samples and similar run-time are possible, if one resorts to a slightly weaker guarantee. Interestingly, their approach rests on a novel connection between the Walsh-Hadamard matrix and linear lossless condensers. However, this connection does not extend to the Fourier Transform over $\mathbb{Z}_{n}$, which is our focus and the most interesting case. Interesting ideas appear also in the work of Akavia [Aka10, Aka14], where it is shown how to approximate the Fourier Transform of an arithmetic progression in poly-logarithmic time in the length of the progression; due to the worse dependence on the quality of approximation, however, that work obtained an algorithm with sample complexity $(k\cdot\text{(signal-to-noise ratio)})^{4}$.

The papers above showed how to achieve the $\ell_{2}/\ell_{1}$ guarantee in a number of samples that is quadratic in the signal sparsity. It is already known that a nearly linear dependence is possible [CT06]; however, we do not have efficient deterministic algorithms for finding these samples. The work of [CT06], as well as subsequent works, proceeds by sampling with repetition rows of the DFT matrix, and showing that the RIP condition (see Definition 2.3) holds, which in turn implies the desired result, but via a super-linear algorithm. The state-of-the-art analysis of such row subsampling is due to Haviv and Regev [HR16], who showed that $O(k\log^{2}k\log n)$ samples suffice. A lower bound of $\Omega(k\log n)$ rows for this subsampling process has been shown in [BLM17]. In this paper, we follow a different avenue and give a new set of schemes for the Sparse Fourier Transform which allow uniform reconstruction. Although our dependence is still quadratic in $k$ , it is necessary, in contrast to the previous works: our results satisfy the strictly stronger $\ell_{\infty}/\ell_{1}$ guarantee, for which a quadratic lower bound is known [Gan08], and hence one cannot hope for a sub-quadratic dependence. We also note the deterministic algorithm of [KVZ19], which needs a cubic dependence on $k$ but solves a somewhat different problem of finding the multidimensional sparse Fourier transform of a signal with at most $k$ non-zeros in the frequency domain, and thus is not robust to noise.

The focus of our work is the $\ell_{\infty}/\ell_{1}$ guarantee, defined formally as follows.

Definition 1.2 ( $\ell_{\infty}/\ell_{1}$ guarantee).

A sparse recovery scheme is said to satisfy the $\ell_{\infty}/\ell_{1}$ guarantee with parameter $k$ , if given access to vector $x$ , it outputs a vector $\widehat{x}^{\prime}$ such that

$$\|\widehat{x}-\widehat{x}^{\prime}\|_{\infty}\leq\frac{1}{k}\|\widehat{x}_{-k}\|_{1}. \tag{1}$$

$\ell_{\infty}/\ell_{1}$ versus $\ell_{2}/\ell_{1}$ : A matter of “find all” versus “miss all”.

As we have discussed, previous works satisfied the $\ell_{2}/\ell_{1}$ guarantee, while our target is the $\ell_{\infty}/\ell_{1}$ guarantee. Any algorithm for the latter guarantee also satisfies the former one. But, as we shall demonstrate in Section 2.3, the $\ell_{\infty}/\ell_{1}$ guarantee is much stronger: there exists an infinite family of vectors for which an $\ell_{2}/\ell_{1}$ algorithm might detect none of the heavy frequencies, while an $\ell_{\infty}/\ell_{1}$ algorithm must detect all of them. This happens because the $\ell_{\infty}/\ell_{1}$ is a worst-case guarantee, in the sense that it requires detection of every frequency just above the noise level, in contrast to the $\ell_{2}/\ell_{1}$ , which should be regarded as an average-case guarantee in the sense that it allows missing a subset of the heavy frequencies if they carry the energy proportional to the noise level.
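The "find all" versus "miss all" contrast can be illustrated with a small numerical check. The sketch below uses the formulas from Table 1 and a toy spectrum chosen here for illustration (it is not the family constructed in Section 2.3): four heavy entries sit 1.5 times above the $\ell_{\infty}/\ell_{1}$ noise level $\|\widehat{x}_{-k}\|_{1}/k$, and the all-zero output, which misses every one of them, is tested against both guarantees. The function names are hypothetical.

```python
import math

def tail_l1(xhat, k):
    """l1 norm of the tail: everything except the k largest-magnitude entries."""
    mags = sorted((abs(v) for v in xhat), reverse=True)
    return sum(mags[k:])

def satisfies_linf_l1(xhat, out, k):
    """Check max_i |xhat_i - out_i| <= ||xhat_{-k}||_1 / k."""
    err = max(abs(a - b) for a, b in zip(xhat, out))
    return err <= tail_l1(xhat, k) / k

def satisfies_l2_l1(xhat, out, k):
    """Check ||xhat - out||_2 <= ||xhat_{-k}||_1 / sqrt(k)."""
    err = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(xhat, out)))
    return err <= tail_l1(xhat, k) / math.sqrt(k)

# Toy spectrum (an assumption for illustration): 4 heavy entries and 1020 unit entries.
n, k, heavy_count = 1024, 16, 4
noise_level = (n - k) * 1.0 / k            # equals ||xhat_{-k}||_1 / k for this vector
xhat = [1.5 * noise_level] * heavy_count + [1.0] * (n - heavy_count)
zero = [0.0] * n                           # an output that detects nothing

miss_all_ok_l2 = satisfies_l2_l1(xhat, zero, k)      # the l2/l1 check passes
miss_all_ok_linf = satisfies_linf_l1(xhat, zero, k)  # the l_inf/l1 check fails
```

In this instance the zero vector passes the $\ell_{2}/\ell_{1}$ check but fails the $\ell_{\infty}/\ell_{1}$ check, because each heavy entry alone exceeds the per-coordinate error budget.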

Previous Work on $\ell_{\infty}/\ell_{1}$ with arbitrary linear measurements.

All approaches described above concerned Fourier measurements, but compressed sensing has a long history using arbitrary linear measurements, for example [DBIPW10, PW11, IPW11, GLPS10, GNP+13, NSWZ18, GLPS17, LNW18, LN18, NS19]. Regarding $\ell_{\infty}/\ell_{1}$, the work of [NNW14] indicated a connection between the aforementioned guarantee and incoherent matrices. More specifically, it was shown that given a $(1/k)$-incoherent matrix one can design an algorithm satisfying the $\ell_{\infty}/\ell_{1}$ guarantee. The existence of a matrix with $O(k^{2}\min\{\log n,(\log n/\log k)^{2}\})$ rows was also proved. Reconstruction needed $\Omega(nk)$ time, something which was partially remedied by Li and Nakos [LN18] with a scheme of $O(k^{2}\log n\cdot\log^{\ast}k)$ measurements and $\operatorname{poly}(k,\log n)$ decoding time. Incoherent matrices are interesting objects on their own, and have been studied before, as they can be used to obtain RIP matrices. Deterministic constructions of $O(k^{2}(\log n/\log k)^{2})$ rows were obtained by DeVore [DeV07] using deep results from the theory of Gelfand widths and by Amini and Marvasti [AM11] via binary BCH code vectors, where the zeros are replaced by $-1$s. We note that incoherent matrices matching this bound also follow immediately from the famous Nisan-Wigderson combinatorial designs [NW94], and serve as a cornerstone for constructions of pseudorandom generators and extractors [Tre01]. Incoherent matrices are also connected with $\epsilon$-biased codes, and thus an almost optimal strongly explicit construction can be obtained by the recent breakthrough work of [TS17]. On the lower bound side, Alon has shown that $\Omega(k^{2}\log n/\log k)$ rows are necessary for a $(1/k)$-incoherent matrix [Alo09].

Our Contribution.

In this work we offer several new results for the Sparse Fourier Transform problem across different axes, some of which are nearly optimal. We show how to find in polynomial time a deterministic collection of samples from the time domain, such that we can solve the Sparse Fourier Transform problem in linear and sublinear time and achieve nearly optimal sample complexity. For the closely related problem of incoherent matrices from DFT rows, which is of independent interest, we obtain a nearly optimal derandomized construction via Bernstein's inequality. We also demonstrate strongly explicit constructions, by invoking heavy number-theoretical machinery.

We note that the bounds of our constructions have been known for more than a decade if the sensing/incoherent matrix is allowed to be arbitrary. However, the previous arguments did not facilitate the frequent and relevant scenario where we have access to rows only from the Fourier ensemble. Part of our work is to show that some of these results carry over to the significantly more constrained case. We also note that any progress on deterministic $\ell_{2}/\ell_{1}$ schemes with subquadratic sample complexity is connected to the very challenging problem of obtaining deterministic DFT row-subsampled RIP matrices with a subquadratic number of rows (see Footnote 2), which is possibly out of reach at the moment.

Footnote 2. Note that [BDF+11] breaks the quadratic barrier for RIP matrices but does not use the Fourier ensemble; the rows are picked from the discrete chirp-Fourier ensemble, where the linear functions are substituted by quadratic polynomials.

2 Technical Results

2.1 Preliminaries

For a positive integer $n$, we define $[n]=\{0,1,\ldots,n-1\}$ and we shall index the coordinates of an $n$-dimensional vector or the rows/columns of an $n\times n$ matrix from $0$ to $n-1$. We define the Discrete Fourier Transform (DFT) matrix $F\in\mathbb{C}^{n\times n}$ to be the unitary matrix such that $F_{ij}=\frac{1}{\sqrt{n}}e^{2\pi\sqrt{-1}\cdot ij/n}$, and the Discrete Fourier Transform of a vector $x\in\mathbb{C}^{n}$ to be $\widehat{x}=Fx$.
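As a concrete check of this convention, the following minimal sketch builds $F$ entry by entry with the positive exponent and the $1/\sqrt{n}$ scaling stated above, then verifies unitarity numerically. Many FFT libraries use the opposite sign and no normalization, so their output may differ from $\widehat{x}=Fx$ by a conjugation-like index flip and a scale factor. The helper names are hypothetical.

```python
import cmath
import math

def dft_matrix(n):
    """Unitary DFT matrix with F[i][j] = exp(2*pi*sqrt(-1)*i*j/n) / sqrt(n)."""
    scale = 1.0 / math.sqrt(n)
    return [[scale * cmath.exp(2j * math.pi * ((i * j) % n) / n) for j in range(n)]
            for i in range(n)]

def apply_matrix(F, x):
    """Matrix-vector product F x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in F]

def l2_norm(v):
    return math.sqrt(sum(abs(t) ** 2 for t in v))

def max_gram_error(F):
    """Largest entrywise deviation of F F^* from the identity."""
    n = len(F)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            g = sum(F[i][t] * F[j][t].conjugate() for t in range(n))
            worst = max(worst, abs(g - (1.0 if i == j else 0.0)))
    return worst

n = 8
F = dft_matrix(n)
x = [complex(t, 1 - t) for t in range(n)]   # arbitrary test vector
xhat = apply_matrix(F, x)                   # xhat = F x, so ||xhat||_2 = ||x||_2
```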

For a set $S\subseteq[n]$ we define $x_{S}$ to be the vector obtained from $x$ after zeroing out the coordinates not in $S$. We also define $H(x,k)$ to be the set of the indices of the largest $k$ coordinates (in magnitude) of $x$, and $x_{-k}=x_{[n]\setminus H(x,k)}$. We say $x$ is $k$-sparse if $x_{-k}=0$. We also define $\|x\|_{p}=\big(\sum_{i=0}^{n-1}|x_{i}|^{p}\big)^{1/p}$ for $p\geq 1$ and $\|x\|_{0}$ to be the number of nonzero coordinates of $x$.
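These definitions translate directly into code. The sketch below is one possible rendering; the tie-breaking rule in `head_indices` (smaller index first) is an assumption, since the text does not specify how ties in magnitude are resolved, and all function names are hypothetical.

```python
def head_indices(x, k):
    """H(x, k): indices of the k largest-magnitude coordinates (ties: smaller index first)."""
    order = sorted(range(len(x)), key=lambda i: (-abs(x[i]), i))
    return set(order[:k])

def restrict(x, S):
    """x_S: keep coordinates in S, zero out the rest."""
    return [v if i in S else 0 for i, v in enumerate(x)]

def tail(x, k):
    """x_{-k}: x restricted to the complement of H(x, k)."""
    return restrict(x, set(range(len(x))) - head_indices(x, k))

def lp_norm(x, p):
    """||x||_p for p >= 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def l0_norm(x):
    """||x||_0: number of nonzero coordinates."""
    return sum(1 for v in x if v != 0)

def is_k_sparse(x, k):
    """x is k-sparse exactly when its tail x_{-k} vanishes."""
    return all(v == 0 for v in tail(x, k))

x = [3, -7, 0.5, 0, 2j, -1]
```

For this `x`, the two largest coordinates in magnitude are at indices 1 and 0, so `tail(x, 2)` keeps only the entries `0.5`, `2j` and `-1`.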

For a matrix $F\in\operatorname{\mathbb{C}}^{n\times n}$ and subsets $S,T\subseteq[n]$ , we define $F_{S,T}$ to be the submatrix of $F$ indexed by rows in $S$ and columns in $T$ .

The median of a collection of complex numbers $\{z_{i}\}$ is defined to be $\operatorname*{median}_{i}z_{i}=\operatorname*{median}_{i}\operatorname{Re}(z_{i})+\sqrt{-1}\operatorname*{median}_{i}\operatorname{Im}(z_{i})$ , i.e., taking the median of the real and the imaginary component separately.
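A minimal sketch of this coordinate-wise median follows. It relies on Python's `statistics.median`, which averages the two middle values for an even number of inputs; that even-length behaviour is an assumption of this sketch, as the text does not specify it.

```python
import statistics

def complex_median(zs):
    """Median of complex numbers: median of real parts plus sqrt(-1) times median of imaginary parts."""
    zs = list(zs)
    re = statistics.median(z.real for z in zs)
    im = statistics.median(z.imag for z in zs)
    return complex(re, im)

# Five estimates of the same value, one of them badly corrupted.
estimates = [1 + 1j, 1.1 + 0.9j, 0.9 + 1.05j, 50 - 40j, 1 + 1j]
robust = complex_median(estimates)
```

The single outlier `50 - 40j` does not move the result, which is the reason medians over repetitions are a natural aggregation step.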

For two points $x$ and $y$ on the unit circle, we use $|x-y|_{\circ}$ to denote the circular distance (in radians, i.e. modulo $2\pi$ ) between $x$ and $y$ .
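One way to compute this quantity, assuming "circular distance" means the length of the shorter arc between the two points (so the result lies in $[0,\pi]$), is sketched below. The function name is hypothetical.

```python
import cmath
import math

def circular_distance(x, y):
    """Shorter arc length, in radians, between two unit-modulus complex numbers."""
    diff = (cmath.phase(x) - cmath.phase(y)) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)

# Two points just on either side of angle 0: the distance wraps around.
near = circular_distance(cmath.exp(0.1j), cmath.exp(-0.1j))
far = circular_distance(1 + 0j, -1 + 0j)
```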

2.1.1 $\ell_{\infty}/\ell_{1}$ Guarantee and incoherent matrices

The quality of the approximation is usually measured in different error metrics, and the main recovery guarantee we are interested in is called the $\ell_{\infty}/\ell_{1}$ guarantee, as defined in Definition 1.2. Other types of recovery guarantee, such as the $\ell_{\infty}/\ell_{2}$ , the $\ell_{2}/\ell_{2}$ and the $\ell_{2}/\ell_{1}$ , are defined similarly, where (1) is replaced with the respective expression in Table 1. Note that these are definitions of the error guarantee per se and do not have algorithmic requirements on the scheme.

Closely related to the $\ell_{\infty}/\ell_{1}$ guarantee is a matrix condition which we call incoherence.

Definition 2.1 (Incoherent Matrix).

A matrix $A\in\mathbb{C}^{m\times n}$ is called $\epsilon$-incoherent if $\|A_{i}\|_{2}=1$ for all $i$ (where $A_{i}$ denotes the $i$-th column of $A$) and $|\langle A_{i},A_{j}\rangle|\leq\epsilon$ for all $i\neq j$.

Lemma 2.2 ([NNW14]).

There exists an absolute constant $c>0$ such that for any $(c/k)$-incoherent matrix $A$, there exists an $\ell_{\infty}/\ell_{1}$-scheme which uses $A$ as the measurement matrix and whose recovery algorithm runs in polynomial time.
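To see why incoherence helps with per-coordinate error, consider the simple estimator $\langle A_{i},Ax\rangle=x_{i}+\sum_{j\neq i}x_{j}\langle A_{i},A_{j}\rangle$, whose error is at most $\epsilon\|x\|_{1}$ for an $\epsilon$-incoherent $A$. The sketch below checks this on a small row-subsampled DFT matrix. It is only an illustration: this estimator is not the scheme of [NNW14], and its bound involves $\|x\|_{1}$ rather than the tail $\|x_{-k}\|_{1}$ of the $\ell_{\infty}/\ell_{1}$ guarantee. The row set `S` and the test vector are arbitrary choices.

```python
import cmath
import math

def inner(u, v):
    """<u, v> = sum_i conj(u_i) * v_i."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def coherence(cols):
    """Largest |<A_i, A_j>| over distinct columns (columns assumed unit-norm)."""
    n = len(cols)
    return max(abs(inner(cols[i], cols[j])) for i in range(n) for j in range(i + 1, n))

def point_query(cols, y):
    """Estimate each x_i by <A_i, y>, where y = A x."""
    return [inner(c, y) for c in cols]

# Columns of a 5 x 16 matrix built from DFT rows S, scaled to unit norm.
n, S = 16, [0, 1, 3, 7, 12]
m = len(S)
cols = [[cmath.exp(2j * math.pi * ((s * j) % n) / n) / math.sqrt(m) for s in S]
        for j in range(n)]

x = [0j] * n
x[2], x[9], x[4] = 5, -3j, 0.2
y = [sum(cols[j][r] * x[j] for j in range(n)) for r in range(m)]   # y = A x

eps = coherence(cols)
worst_error = max(abs(e - v) for e, v in zip(point_query(cols, y), x))
l1 = sum(abs(v) for v in x)
```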

2.1.2 The Restricted Isometry Property and its connection with incoherence

Another highly relevant condition is the renowned restricted isometry property (RIP), introduced by Candès et al. in [CRT06]. We show how incoherent matrices are connected to it.

Definition 2.3 (Restricted Isometry Property).

A matrix $A\in\mathbb{C}^{m\times n}$ is said to satisfy the $(k,\epsilon)$ Restricted Isometry Property (RIP), if for all $x\in\mathbb{C}^{n}$ with $\|x\|_{0}\leq k$ , it holds that $(1-\epsilon)\|x\|_{2}\leq\|Ax\|_{2}\leq(1+\epsilon)\|x\|_{2}$ .

Candès et al. proved in their breakthrough paper [CRT06] that any RIP matrix can be used for sparse recovery with the $\ell_{2}/\ell_{1}$ error guarantee. The following formulation comes from [FR13, Theorem 6.12].

Lemma 2.4.

Given a $(2k,\epsilon)$-RIP matrix $A$ with $\epsilon<4/\sqrt{41}$, we can design an $\ell_{2}/\ell_{1}$-scheme that uses $A$ as the measurement matrix and has a recovery algorithm that runs in polynomial time.

Although randomly subsampling the DFT matrix gives an RIP matrix with $O(k\log^{2}k\log n)$ rows [HR16], no algorithm for finding these rows in polynomial time is known; actually, even for $o(k^{2})\cdot\operatorname{poly}(\log n)$ rows the problem remains wide open (see Footnote 3). It is a very important and challenging problem whether one can have an explicit construction of RIP matrices from Fourier measurements that breaks the quadratic barrier on $k$.

Footnote 3. In fact, one of the results of our paper gives the state-of-the-art result even for this problem, with $O(k^{2}\log n)$ rows; see Theorem 2.10.

We state the following two folklore results, connecting the two different guarantees, and their associated combinatorial objects. This indicates the importance of incoherent matrices for the field of compressed sensing.

Proposition 2.5 (folklore).

An $\ell_{\infty}/\ell_{1}$ scheme with a measurement matrix of $m$ rows and recovery time $T$ induces an $\ell_{2}/\ell_{1}$ scheme of a measurement matrix of $O(m)$ rows and recovery time $O(T+\|\widehat{x}^{\prime}\|_{0})$ , where $\widehat{x}^{\prime}$ is the output of the $\ell_{\infty}/\ell_{1}$ scheme.

Proposition 2.6 (folklore).

A $(c/k)$ -incoherent matrix is also a $(k,c)$ -RIP matrix.
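One standard way to see Proposition 2.6, sketched here under the assumptions $0\leq c\leq 1$ and the convention $\langle A_{i},A_{j}\rangle=A_{i}^{*}A_{j}$ (the source states the result without proof), is to expand $\|Ax\|_{2}^{2}$ for a vector $x$ with at most $k$ nonzero coordinates and bound the cross terms by the incoherence:

```latex
\begin{align*}
\|Ax\|_2^2
  &= \sum_{i} |x_i|^2 \|A_i\|_2^2
     + \sum_{i \neq j} \overline{x_i}\, x_j \,\langle A_i, A_j\rangle
   = \|x\|_2^2 + \sum_{i \neq j} \overline{x_i}\, x_j \,\langle A_i, A_j\rangle, \\
\bigl|\, \|Ax\|_2^2 - \|x\|_2^2 \,\bigr|
  &\leq \frac{c}{k} \sum_{i \neq j} |x_i|\,|x_j|
   = \frac{c}{k}\bigl(\|x\|_1^2 - \|x\|_2^2\bigr)
   \leq \frac{c}{k}\,(k-1)\,\|x\|_2^2
   \leq c\,\|x\|_2^2 .
\end{align*}
```

The second-to-last step uses $\|x\|_{1}^{2}\leq k\|x\|_{2}^{2}$ for $k$-sparse $x$ (Cauchy-Schwarz). Hence $(1-c)\|x\|_{2}^{2}\leq\|Ax\|_{2}^{2}\leq(1+c)\|x\|_{2}^{2}$, and since $1-c\leq\sqrt{1-c}$ and $\sqrt{1+c}\leq 1+c$ for $0\leq c\leq 1$, the inequalities of Definition 2.3 follow with $\epsilon=c$.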

2.2 Our results

2.2.1 Sparse Fourier Transform Algorithms

Theorem 2.7 (Deterministic SFT with super-linear time, Section 5).

Let $n$ be a power of $2$ . There exist a set $S\subseteq[n]$ with $|S|=O(k^{2}\log n)$ and an absolute constant $c>0$ such that the following holds. For any vector $x\in\mathbb{C}^{n}$ with $\|\widehat{x}\|_{\infty}\leq n^{c}\|\widehat{x}_{-k}\|_{1}/k$ , one can find an $O(k)$ -sparse vector $\widehat{x}^{\prime}\in\operatorname{\mathbb{C}}^{n}$ such that

$$\|\widehat{x}-\widehat{x}^{\prime}\|_{\infty}\leq\frac{1}{k}\|\widehat{x}_{-k}\|_{1},$$

in time $O(nk\log^{2}n)$ by accessing $\{x_{i}\}_{i\in S}$ only. Moreover, the set $S$ can be found in $\operatorname{poly}(n)$ time.

Theorem 2.8 (Deterministic SFT with sublinear time, Section 6).

Let $n$ be a power of $2$ . There exist a set $S\subseteq[n]$ with $|S|=O(k^{2}\log^{2}n)$ and an absolute constant $c>0$ such that the following holds. For any vector $x\in\mathbb{C}^{n}$ with $\|\widehat{x}\|_{\infty}\leq n^{c}\|\widehat{x}_{-k}\|_{1}/k$ , one can find an $O(k)$ -sparse vector $\widehat{x}^{\prime}\in\operatorname{\mathbb{C}}^{n}$ such that

$$\|\widehat{x}-\widehat{x}^{\prime}\|_{\infty}\leq\frac{1}{k}\|\widehat{x}_{-k}\|_{1},$$

in time $O(k^{2}\log^{3}n)$ by accessing $\{x_{i}\}_{i\in S}$ only. Moreover, the set $S$ can be found in $\operatorname{poly}(n)$ time.

Remark 2.9.

The condition $\|\widehat{x}\|_{\infty}\leq n^{c}\|\widehat{x}_{-k}\|_{1}/k$ upper bounds the “signal-to-noise ratio”, a common measure in engineering that compares the level of a desired signal to the level of the background noise. This is a common assumption in most algorithms in the Sparse Fourier Transform literature, see, e.g. [HIKP12a, IK14, Kap16, CKSZ17, Kap17], where the $\ell_{2}$ -norm variant $\|\widehat{x}\|_{\infty}\leq n^{c}\|\widehat{x}_{-k}\|_{2}/\sqrt{k}$ was assumed.

2.2.2 From DFT to incoherent matrices

This section contains deterministic constructions of incoherent matrices.

An Explicit Construction: Derandomization in $\mathrm{poly}(n)$ time.

Theorem 2.10 (Incoherent matrices by derandomized subsampling of DFT, Section 7).

There exists a set $S\subseteq[n]$ of cardinality $m=O(k^{2}\log n)$ such that the matrix $\sqrt{\frac{n}{m}}F_{S,[n]}$ is $(1/k)$-incoherent. Moreover, $S$ can be found in $\operatorname{poly}(n)$ time.

The above theorem immediately yields a different algorithm for the $\ell_{\infty}/\ell_{1}$ Sparse Fourier Transform with $O(k^{2}\log n)$ samples, via the reduction in [NNW14].
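For a row-subsampled DFT matrix the coherence has a simple closed form: with $m=|S|$, the columns of $\sqrt{n/m}\,F_{S,[n]}$ have unit norm and $\langle A_{i},A_{j}\rangle=\frac{1}{m}\sum_{s\in S}e^{2\pi\sqrt{-1}\,s(j-i)/n}$, which depends only on $t=j-i \bmod n$. The sketch below evaluates this quantity for a given $S$. The set `S` in the example is drawn at random with a fixed seed purely for illustration; it is not the derandomized choice of Theorem 2.10, and the function name is hypothetical.

```python
import cmath
import math
import random

def dft_subsample_coherence(n, S):
    """Coherence of sqrt(n/m) * F_{S,[n]}: max over t != 0 of |sum_{s in S} w^(s*t)| / m."""
    m = len(S)
    best = 0.0
    for t in range(1, n):
        total = sum(cmath.exp(2j * math.pi * ((s * t) % n) / n) for s in S)
        best = max(best, abs(total) / m)
    return best

n, m = 64, 24
S = random.Random(0).sample(range(n), m)   # illustrative random row set
mu = dft_subsample_coherence(n, S)         # the matrix is mu-incoherent
```

Two sanity checks follow from the formula: taking all rows, $S=[n]$, gives coherence $0$ (up to rounding), and a single row gives coherence $1$.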

Strongly explicit constructions: Derandomization in sub-linear time

Theorem 2.11 (Incoherent matrices from DFT via low-degree polynomials, Section 8).

Let $\epsilon>0$ be a constant small enough, $p$ be a prime and $d\geq 2$ be an integer. There exists a strongly explicit construction of an $O(m^{\epsilon}(\frac{1}{m}+\frac{p}{m^{d}})^{2^{1-d}})$ -incoherent matrix $M\in\operatorname{\mathbb{C}}^{m\times p}$ such that the rows of $\sqrt{m}M$ are rows of the DFT matrix (a row may appear more than once). The hidden constant in the $O$ -notation depends on $d$ and $\epsilon$ . Finding the indices of the rows takes $\widetilde{O}(m)$ time.

To get an idea of the above result, one could for example set $d=3$ and observe that the result translates to the following: for every $k\geq p^{1/8}$ one can get a $(1/k)$ -incoherent matrix with $O(k^{4+\epsilon})$ rows. One needs the condition on $k$ (or equivalently the condition on $m$ ) to bound the term $p/m^{d}$ . The larger the degree $d$ , the looser this condition, but also the worse the dependence of $m$ on $k$ . For example, when $d=4$ , we can expand the regime of $k$ to approximately $k\geq p^{1/24}$ , but obtain approximately $m=O(k^{8+\epsilon})$ .
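
The exponents quoted above can be reproduced with a short calculation. The following is a minimal sketch, assuming the regime $m^{d-1}\geq p$ (so that $p/m^{d}\leq 1/m$ ) and ignoring the $m^{\epsilon}$ factor; the function name `tradeoff` is hypothetical and only serves this illustration.

```python
from fractions import Fraction

def tradeoff(d):
    """Exponents implied by Theorem 2.11 for degree d, ignoring the m^eps factor.

    Assumes m^(d-1) >= p, so that p/m^d <= 1/m and the incoherence is
    roughly m^(-2^(1-d)). Returns (a, e), meaning k >= p^a and m ~ k^e.
    """
    # 1/k ~ m^(-2^(1-d))  =>  m ~ k^(2^(d-1))
    rows_exponent = 2 ** (d - 1)
    # m >= p^(1/(d-1)) and k ~ m^(1/rows_exponent)
    #   =>  k >= p^(1/((d-1) * rows_exponent))
    k_threshold = Fraction(1, (d - 1) * rows_exponent)
    return k_threshold, rows_exponent

for d in (3, 4):
    print(d, tradeoff(d))
```

Under these assumptions the two cases $d=3$ and $d=4$ give the thresholds $p^{1/8}$ and $p^{1/24}$ with $m\approx k^{4}$ and $m\approx k^{8}$ respectively.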

The following is a different construction, incomparable with Theorem 2.11 in multiple ways. First, the construction runs in sublinear time in $p$ but it is not strongly explicit. Second, it gives different trade-offs between the sparsity parameter and the number of rows. Last but not least, the construction depends on the factorization of $p-1$ .

Theorem 2.12 (Incoherent matrices from DFT via multiplicative subgroups, Section 8).

Let $p$ be a prime number. For every divisor $d$ of $p-1$ such that $d>\sqrt{p}$ we can find in time $O(d\log p)$ a matrix $M\in\operatorname{\mathbb{C}}^{d\times p}$ with rows being the rows of the DFT matrix such that $\frac{1}{d}M$ is $(\sqrt{p}/d)$ -incoherent.

This result could give (depending on the factorization of $p-1$ ) a better polynomial dependence of $m$ on $k$ in the high-sparsity regime. If $p-1$ has a large divisor about $p^{1-\gamma}$ , this would yield a matrix with sparsity parameter $k\approx p^{\gamma}$ and $m\approx k^{1/\gamma-1}$ rows. For example, when $\gamma=1/4$ , we obtain $k\approx p^{1/4}$ and $m\approx k^{3}$ , which cannot be obtained from Theorem 2.11. In general, Theorem 2.12 will yield useful matrices as long as $p-1$ has divisors in the range $[\sqrt{p},p-1]$ , ideally as many as possible. An extreme case is Fermat primes, which have $(\log p)/2$ divisors in the aforesaid interval.
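
As a small numerical illustration of Theorem 2.12, the sketch below takes the Fermat prime $p=17$ and the divisor $d=8$ of $p-1$ , builds the multiplicative subgroup of order $d$ , and measures the coherence of the corresponding rows of the DFT matrix. The helper names are hypothetical, and the columns are scaled to unit $\ell_{2}$ norm, which is an assumption about the intended normalisation.

```python
import cmath
import math

def subgroup(p, d):
    """Return the order-d subgroup of (Z/pZ)^*, assuming p is prime and d | p-1."""
    assert (p - 1) % d == 0
    for g in range(2, p):
        h = pow(g, (p - 1) // d, p)      # the order of h divides d
        elems, y = set(), 1
        for _ in range(d):
            elems.add(y)
            y = y * h % p
        if len(elems) == d:              # h has order exactly d
            return sorted(elems)
    raise ValueError("no element of order d found")

def coherence(p, rows):
    """Largest normalised inner product between two distinct columns of the
    DFT submatrix restricted to `rows` (columns scaled to unit norm)."""
    w = 2j * math.pi / p
    # The inner product of columns j and j' depends only on a = j - j' (mod p).
    return max(abs(sum(cmath.exp(w * h * a) for h in rows)) / len(rows)
               for a in range(1, p))

p, d = 17, 8
H = subgroup(p, d)
mu = coherence(p, H)
print(H, mu, math.sqrt(p) / d)
```

For this choice the subgroup is the set of quadratic residues modulo $17$ , and the measured coherence stays below the bound $\sqrt{p}/d$ .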

The reader might ask the question if the polynomial dependence of $k$ on $p$ is necessary; ideally one would like a logarithmic dependence, since the polynomial dependence is interesting only in the high-sparsity regime. Regarding strongly explicit constructions, we provide some evidence why this might be a very hard problem in the remark below.

Remark 2.13.

The inferiority of our bounds in the low-sparsity regime is justifiable to some extent: it is because of a common obstacle that has persisted more than a century in the theory of exponential sums, due to the lack of techniques to account for sparse character sums (either additive or multiplicative). In general, the fewer summands the sum has, the harder it is to prove a tight cancellation bound. Thus, owing to the use of heavy machinery from analytic number theory and more specifically the theory of exponential sums over finite fields, our bounds for strongly explicit constructions are quite suboptimal.

2.3 Comparing $\ell_{2}/\ell_{1}$ with $\ell_{\infty}/\ell_{1}$

In this subsection we elaborate why $\ell_{\infty}/\ell_{1}$ is much stronger than $\ell_{2}/\ell_{1}$ , and not just a guarantee that implies $\ell_{2}/\ell_{1}$ . Let $\gamma<1$ be a constant and consider the following scenario. There are three sets $A,B,C$ of size $\gamma k,(1-\gamma)k$ , $n-k$ respectively, and for every $i\in A$ we have $|\widehat{x}_{i}|=\frac{2}{k}\|\widehat{x}_{C}\|_{1}=\frac{2}{k}\|\widehat{x}_{-k}\|_{1}$ , while all coordinates in $B$ and $C$ have equal magnitude. It follows immediately that

[TABLE]

Now assume that $k\leq\gamma n$ , then $(n-\gamma k)/(n-k)\leq 1+\gamma$ . We claim that the zero vector is a valid solution for the $\ell_{2}/\ell_{1}$ guarantee, since

[TABLE]

where the last inequality follows provided it further holds that $k\leq\gamma n/(2\gamma+1)$ . Hence when $\gamma\leq 1/5$ , we see that the zero vector satisfies the $\ell_{2}/\ell_{1}$ guarantee.

Since $\vec{0}$ is a possible output, we may not recover any of the coordinates in $A$ , which is the set of “interesting” coordinates. On the other hand, the $\ell_{\infty}/\ell_{1}$ guarantee does allow the recovery of every coordinate in $A$ . This is a difference of recovering all $\gamma k$ versus $0$ coordinates. We conclude from the discussion above that in the case of too much noise, the $\ell_{2}/\ell_{1}$ guarantee becomes much weaker than the $\ell_{\infty}/\ell_{1}$ guarantee, possibly giving meaningless results in some cases.
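
The scenario can be checked numerically on a concrete instance. The sketch below assumes the standard forms of the two guarantees, namely $\|\widehat{x}-\widehat{x}^{\prime}\|_{2}\leq\|\widehat{x}_{-k}\|_{1}/\sqrt{k}$ and $\|\widehat{x}-\widehat{x}^{\prime}\|_{\infty}\leq\|\widehat{x}_{-k}\|_{1}/k$ , and normalises every coordinate outside the heavy set to magnitude $1$ ; the function name and the parameters $n=1000$ , $k=50$ , $\gamma=1/5$ are hypothetical choices.

```python
import math

def zero_vector_check(n, k, gamma_k):
    """Test whether the all-zero output meets each guarantee in the scenario.

    Assumed forms of the guarantees:
      l2/l1:   ||xhat - out||_2   <= ||xhat_{-k}||_1 / sqrt(k)
      linf/l1: ||xhat - out||_inf <= ||xhat_{-k}||_1 / k
    The gamma_k heavy coordinates have magnitude (2/k) * ||xhat_{-k}||_1;
    all remaining coordinates have magnitude 1 (hypothetical normalisation).
    """
    tail_l1 = float(n - k)                   # l1 mass outside the top k
    heavy = 2.0 * tail_l1 / k                # magnitude of each heavy coordinate
    mags = [heavy] * gamma_k + [1.0] * (n - gamma_k)
    l2_err = math.sqrt(sum(m * m for m in mags))   # error of the zero vector
    linf_err = max(mags)
    return l2_err <= tail_l1 / math.sqrt(k), linf_err <= tail_l1 / k

ok_l2, ok_linf = zero_vector_check(n=1000, k=50, gamma_k=10)   # gamma = 1/5
print(ok_l2, ok_linf)
```

On this instance the zero vector passes the $\ell_{2}/\ell_{1}$ test and fails the $\ell_{\infty}/\ell_{1}$ test, in line with the discussion above.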

3 Overview

Sparse Fourier Transform Algorithms (Subsection 2.2.1).

We first show how to achieve the for-all schemes, i.e., schemes that allow universal reconstruction of all vectors, and then derandomize them. Similarly to the previous works [HIKP12b, IK14, Kap17], our algorithm hashes, with the filter in [Kap17], the spectrum of $x$ to $O(k)$ buckets using pseudorandom permutations, and repeats $O(k\log n)$ times with fresh randomness. The main part of the analysis is to show that for any vector $\widehat{x}\in\mathbb{C}^{n}$ and any set $S\subseteq[n]$ with $|S|\leq k$ , each $i\in S$ , in a constant fraction of the repetitions, receives “low noise” from all other elements, under the pseudorandom permutations. This will boil down to a set of $\Theta(n^{2})$ inequalities involving the filter and the pseudorandom permutations. We prove these inequalities with full randomness (Lemma 5.9), and then derandomize the pseudorandom permutations using the method of conditional expectations (Lemma 5.10). This will give us Theorem 2.7. To do so, we choose the pseudorandom permutations one at a time, repetition by repetition, and keep an (intricate) pessimistic estimator (Lemma 5.8), which we update accordingly. Our argument extends the arguments in [NNW14] and [PR08], and could be of independent interest.

To compare with [NNW14] we have the following observation. The construction in [NNW14] consists of $O(k\log n)$ matrices, joined vertically, each having $O(k)$ rows and exactly one $1$ per column. This ensures a small incoherence of the concatenated matrix and gives the $\ell_{\infty}/\ell_{1}$ guarantee. In the Fourier case, the convolution with the filter functions behaves analogously: instead of having exactly one non-zero element, each column in the $\ell$ -th matrix has a contiguous segment of $1$ s of size $\approx n/k$ (where the center of that segment depends on the choice of the $\ell$ -th pseudorandom permutation) and polynomially decaying entries away from this segment. Moreover, the positions of the segments across the columns are not fully independent and are defined via the pseudorandom permutations in Definition 4.2. We show that even in this more restricted setting, derandomization is possible in polynomial time. Several details are omitted in the preceding high-level discussion and we suggest the reader look at the corresponding sections for the complete argument.

The sublinear-time algorithm (Theorem 2.8) is obtained by bootstrapping the derandomized scheme above with an identification procedure in each bucket, as most previous algorithms have done (e.g. [HIKP12a]). The major difference is that our identification procedure needs to be deterministic. We show an explicit set of samples that allow the implementation of the desired routine. To illustrate our idea, let us focus on the following $1$ -sparse case: $\widehat{x}\in\mathbb{C}^{n}$ and $|\widehat{x}_{i^{*}}|\geq 3\|\widehat{x}_{[n]\setminus i^{\ast}}\|_{1}$ for some $i^{*}$ , which we want to locate. Let

$$\theta_{j}=\left(\frac{2\pi}{n}j\right)\bmod{2\pi}\qquad\text{for }j\in[n],$$

and consider the $\log n$ samples $x_{0},x_{1},x_{2},x_{4},\ldots,x_{2^{r-1}},\dots$ .

Observe that (ignoring $1/\sqrt{n}$ factors)

[TABLE]

we can find $\beta\theta_{i^{*}}+\arg\widehat{x}_{i^{\ast}}$ up to $\pi/8$ , just by estimating the phase of $x_{\beta}$ and Proposition 4.10. Thus we can estimate $\beta\theta_{i^{\ast}}$ up to $\pi/4$ from the phase of $x_{\beta}/x_{0}$ . If $i^{\ast}\neq j$ , then there exists a $\beta\in\{1,2,2^{2},\dots,2^{r-1},\ldots\}$ such that $|\beta\theta_{i^{*}}-\beta\theta_{j}|_{\circ}>\pi/2$ , and so $\beta\theta_{j}$ will be more than $\pi/4$ away from the phase of the measurement. Thus, by iterating over all $j\in[n]$ , we keep the index $j$ for which $\beta\theta_{j}$ is within $\pi/4$ from $\arg(x_{\beta}/x_{0})$ , for every $\beta$ that is a power of $2$ in $\mathbb{Z}_{n}$ .

Unfortunately, although this is a deterministic collection of $O(\log n)$ samples, the above argument gives only $O(n\log n)$ time. For sublinear-time decoding we use $x_{1}/x_{0}$ to find a sector $S_{0}$ of the unit circle of length $\pi/4$ that contains $\theta_{i^{*}}$ . Then, from $x_{2}/x_{0}$ we find two sectors of length $\pi/8$ each, the union of which contains $\theta_{i^{*}}$ . Because these sectors are antipodal on the unit circle, the sector $S_{0}$ intersects exactly one of those, let the intersection be $S_{1}$ . The intersection is a sector of length at most $\pi/8$ . Proceeding iteratively, we halve the size of the sector at each step, till we find $\theta_{i^{*}}$ , and infer $i^{*}$ . Plugging this idea in the whole $k$ -sparse recovery scheme yields the desired result. Our argument crucially depends on the fact that in the $\ell_{1}$ norm the phase of $\theta_{i^{\ast}}$ will always dominate the phase of all samples we take.

Incoherent Matrices from the Fourier ensemble (Subsection 2.2.2).

Our first result for incoherent matrices (Theorem 2.10) is more general and works for any matrix that has orthonormal columns with entries bounded by $O(1/\sqrt{n})$ . We subsample the matrix, invoke a Chernoff bound and Bernstein’s inequality to show the small incoherence of the subsampled matrix. We follow a derandomization procedure which essentially mimics the proof of Bernstein’s inequality, keeping a pessimistic estimator which corresponds to the sum of the generating functions of the probabilities of all events we want to hold, evaluated at specific points. We obtain an explicit construction, i.e. a derandomization in $\mathrm{poly}(n)$ time. This argument could be of independent interest for its generality. As there are many technical obstacles to overcome, we suggest the reader take a careful look at the proof to gain a clearer picture of the argument.

Our next results (Theorem 2.11 and Theorem 2.12) construct strongly explicit incoherent matrices by making use of technology from the fruitful theory of exponential sums in analytic number theory and additive combinatorics. Roughly speaking, to bound a complex exponential sum over a set $S$ , one would expect that specific choices of the set $S$ lead to non-trivial bounds, i.e. $o(|S|)$ , since cancellation takes place in the summation. Ideally, one would desire that the exponentials behave like a random walk and give the optimal cancellation of $O(\sqrt{|S|})$ . This intuition is clearly not true, but the results by Weyl and others show that certain sets $S$ can exhibit a nicer behaviour. We exploit their results to build incoherent matrices by taking the rows of the DFT matrix indexed by the “nice” sets. This connection also yields an immediate improvement on the lower bound of an exponential sum obtained by Winterhof [Win01].

4 Technical Toolkit

4.1 Hash Functions

Definition 4.1 (Frequency domain hashings $\pi,h,o$ ).

Given $\sigma,b\in[n]$ , we define a function $\pi_{\sigma,b}:[n]\rightarrow[n]$ to be $\pi_{\sigma,b}(f)=\sigma(f-b)\pmod{n}$ for all $f\in[n]$ . Define a hash function $h_{\sigma,b}:[n]\rightarrow[B]$ as $h_{\sigma,b}(f)=\operatorname{round}((B/n)\pi_{\sigma,b}(f))$ and the off-set functions $o_{f,\sigma,b}:[n]\rightarrow[n/B]$ as $o_{f,\sigma,b}(f^{\prime})=\pi_{\sigma,b}(f^{\prime})-(n/B)h_{\sigma,b}(f)$ . When it is clear from context, we will omit the subscripts $\sigma,b$ from the above functions.
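
The three maps of Definition 4.1 can be written down directly. The following is a minimal sketch in which the names `pi_map`, `bucket` and `offset` are hypothetical, ties in the rounding are broken upwards, bucket indices are reduced modulo $B$ , and offsets are reported as representatives in $[-n/2,n/2)$ ; these conventions are assumptions, since the definition does not fix them.

```python
def pi_map(f, sigma, b, n):
    """pi_{sigma,b}(f) = sigma * (f - b) mod n."""
    return sigma * (f - b) % n

def bucket(f, sigma, b, n, B):
    """h_{sigma,b}(f) = round((B/n) * pi(f)), computed in integer arithmetic.

    Ties are rounded up and the result is reduced mod B (assumed conventions).
    """
    return ((2 * B * pi_map(f, sigma, b, n) + n) // (2 * n)) % B

def offset(f, f2, sigma, b, n, B):
    """o_{f,sigma,b}(f2) = pi(f2) - (n/B) * h(f), as a representative in [-n/2, n/2)."""
    o = (pi_map(f2, sigma, b, n) - (n // B) * bucket(f, sigma, b, n, B)) % n
    return o - n if o >= n // 2 else o

n, B = 32, 4
print([bucket(f, 5, 3, n, B) for f in range(8)])
print([offset(0, f2, 5, 3, n, B) for f2 in range(8)])
```

With these conventions, $\pi_{\sigma,b}$ is a permutation of $[n]$ whenever $\sigma$ is odd and $n$ is a power of $2$ , and the offset of a frequency from the centre of its own bucket never exceeds $n/(2B)$ in absolute value.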

In what follows, we might use the notation $H=(\sigma,a,b)$ to denote a tuple of values along with the associated hash function from Definition 4.1. Below we define a pseudorandom permutation in the frequency domain.

Definition 4.2 ( $P_{\sigma,a,b}$ ).

Suppose that $\sigma^{-1}\mod n$ exists. For $a,b\in[n]$ , we define the pseudorandom permutation $P_{\sigma,a,b}$ by $(P_{\sigma,a,b}x)_{t}=x_{\sigma(t-a)}\omega^{t\sigma b}$ .

Proposition 4.3 ([HIKP12a, Claim 2.2]).

$(\widehat{P_{\sigma,a,b}x})_{\pi_{\sigma,b}(f)}=\widehat{x}_{f}\omega^{a\sigma f}$ .
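
Proposition 4.3 can be verified numerically on a small instance. The sketch below assumes the convention $\widehat{x}_{f}=\frac{1}{\sqrt{n}}\sum_{t}x_{t}\omega^{ft}$ with the same primitive root $\omega$ that appears in Definition 4.2; the test signal and the parameters $n=16$ , $\sigma=5$ , $a=3$ , $b=7$ are hypothetical.

```python
import cmath
import math

def dft(x, omega):
    """Unitary DFT with kernel omega^(f*t) (assumed sign convention)."""
    n = len(x)
    return [sum(x[t] * omega ** (f * t % n) for t in range(n)) / math.sqrt(n)
            for f in range(n)]

def permute(x, sigma, a, b, omega):
    """(P_{sigma,a,b} x)_t = x_{sigma (t - a)} * omega^(t * sigma * b)."""
    n = len(x)
    return [x[sigma * (t - a) % n] * omega ** (t * sigma * b % n) for t in range(n)]

n, sigma, a, b = 16, 5, 3, 7
omega = cmath.exp(2j * math.pi / n)
x = [complex(math.sin(t * t + 1.0), math.cos(3.0 * t)) for t in range(n)]

xhat = dft(x, omega)
yhat = dft(permute(x, sigma, a, b, omega), omega)

# Compare the coefficient at pi_{sigma,b}(f) with xhat_f * omega^(a*sigma*f).
max_err = max(abs(yhat[sigma * (f - b) % n] - xhat[f] * omega ** (a * sigma * f % n))
              for f in range(n))
print(max_err)
```

The discrepancy is at the level of floating-point round-off, which is what the proposition predicts under the stated convention.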

Definition 4.4 (Sequence of Hashings).

A sequence of $d$ hashings is specified by $d$ tuples $\{(\sigma_{r},a_{r},b_{r})\}_{r\in[d]}$ . For a fixed $r\in[d]$ , we will also set $\pi_{r},h_{r},o_{r}$ to be the functions defined in Definition 4.1, and $P_{r}$ to be the pseudorandom permutation defined in Definition 4.2, by setting $a=a_{r},b=b_{r},\sigma=\sigma_{r}$ .

4.2 Filter Functions

Definition 4.5 (Flat filter with $B$ buckets and sharpness $F$ [Kap17]).

*A sequence $\widehat{G}\in\operatorname{\mathbb{R}}^{n}$ symmetric about zero with Fourier transform $G\in\operatorname{\mathbb{R}}^{n}$ is called a flat filter with $B$ buckets and sharpness $F$ if

(1) $\widehat{G}_{f}\in[0,1]$ for all $f\in[n]$ ;

(2) $\widehat{G}_{f}\geq 1-(1/4)^{F-1}$ for all $f\in[n]$ such that $|f|\leq\frac{n}{2B}$ ;

(3) $\widehat{G}_{f}\leq(1/4)^{F-1}(\frac{n}{B|f|})^{F-1}$ for all $f\in[n]$ such that $|f|\geq\frac{n}{B}$ .*

Lemma 4.6 (Compactly supported flat filter with $B$ buckets and sharpness $F$ [Kap17]).

Fix the integers $(n,B,F)$ with $n$ a power of two, integers $B<n$ , and $F\geq 2$ an even integer. There exists an $(n,B,F)$ -flat filter $\widehat{G}\in\operatorname{\mathbb{R}}^{n}$ , whose inverse Fourier transform $G$ is supported on a length- $O(FB)$ window centered at zero in time domain.

Lemma 4.7 ([HIKP12b, Lemma 3.6], [HIKP12a, Lemma 2.4], [IK14, Lemma 3.2]).

Let $f,f^{\prime}\in[n]$ with $f\neq f^{\prime}$ . Let $\sigma$ be a uniformly random odd number between $1$ and $n-1$ . Then for all $d\geq 0$ we have $\Pr[|\sigma(f-f^{\prime})|_{\circ}\leq d]\leq 4d/n$ .
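
For small $n$ the bound of Lemma 4.7 can be confirmed by exhaustive enumeration. The sketch below does so for $n=64$ , reading $|\cdot|_{\circ}$ as the circular distance from $0$ modulo $n$ and restricting to nonzero differences $f-f^{\prime}$ ; both readings are assumptions, and the function names are hypothetical.

```python
def circ_abs(v, n):
    """Circular distance of v from 0 modulo n."""
    v %= n
    return min(v, n - v)

def lemma_holds(n):
    """Check Pr[|sigma * delta|_o <= d] <= 4d/n for every nonzero delta and every d,
    with sigma uniform over the odd residues modulo n (n a power of two)."""
    odds = range(1, n, 2)
    for delta in range(1, n):
        dists = [circ_abs(s * delta, n) for s in odds]
        for d in range(0, n // 2 + 1):
            count = sum(1 for v in dists if v <= d)
            # count / len(dists) <= 4d / n, compared in integer arithmetic
            if count * n > 4 * d * len(dists):
                return False
    return True

print(lemma_holds(64))
```

The comparison is carried out in integers so that cases where the bound holds with equality, such as an odd difference with $d=1$ , are not affected by rounding.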

4.3 Formulas for Estimation

Definition 4.8 (Measurement).

For a signal $\widehat{x}\in\operatorname{\mathbb{C}}^{n}$ , a hashing $H=(\sigma,a,b)$ , integers $B$ and $F$ , a measurement vector $m_{H}\in\operatorname{\mathbb{C}}^{B}$ is the $B$ -dimensional complex-valued vector such that

[TABLE]

for $s\in[B]$ . Here $\widehat{G}$ is a filter with $B$ buckets and sharpness $F$ constructed in Definition 4.5.

The following lemma provides a HashToBins procedure, which computes the bucket values of the residual $\widehat{x}-\widehat{z}$ , where $\widehat{z}$ is also provided as input.

Lemma 4.9 (HashToBins [Kap17, Lemma 2.8]).

Let $H=(\sigma,a,b)$ and parameters $B,F$ such that $B$ is a power of $2$ , and $F$ is an even integer. There exists a deterministic procedure HashToBins $(x,\widehat{z},H)$ which computes $u\in\operatorname{\mathbb{C}}^{B}$ such that for any $f\in[n]$ ,

[TABLE]

where $\widehat{G}$ is the filter defined in Definition 4.5, and $\Delta_{h(f)}$ is a negligible error term satisfying $|\Delta_{h(f)}|\leq\|z\|_{2}\cdot n^{-c}$ for $c>0$ an arbitrarily large absolute constant. It takes $O(BF)$ samples, and $O(F\cdot B\log B+\|\widehat{z}\|_{0}\cdot\log n)$ time.

We shall ignore the $\Delta_{h(f)}$ term in the proof of correctness of our algorithms, since it is negligible and does not affect the analysis. For a hashing $H=(\sigma,a,b)$ , values $B,F$ , and the associated measurement $m_{H}$ , one has

[TABLE]

The following is a basic fact of complex numbers, which will be crucially used in our sublinear-time algorithm, for estimating the phase of a heavy coordinate.

Proposition 4.10.

Let $x,y\in\operatorname{\mathbb{C}}$ with $|y|\leq|x|/3$ , then $|\arg(x+y)-\arg x|\leq\pi/8$ .

Proof.

The worst case occurs when $y$ is orthogonal to $x+y$ , in which case $|\arg(x+y)-\arg x|=\arcsin(|y|/|x|)\leq\arcsin(1/3)<\pi/8$ . ∎
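
The constant in Proposition 4.10 can be sanity-checked numerically. The sketch below fixes $x=1$ , sweeps the direction of $y$ over a grid with $|y|=1/3$ , and records the largest phase perturbation; the grid resolution and the function name are hypothetical choices. By elementary geometry the extremal perturbation is $\arcsin(1/3)$ , which is below $\pi/8$ .

```python
import cmath
import math

def worst_phase_shift(ratio, steps=100000):
    """Largest |arg(x + y) - arg(x)| over a grid of directions of y,
    with x = 1 and |y| = ratio."""
    return max(abs(cmath.phase(1 + ratio * cmath.exp(2j * math.pi * s / steps)))
               for s in range(steps))

shift = worst_phase_shift(1 / 3)
print(shift, math.asin(1 / 3), math.pi / 8)
```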

5 Linear-Time Algorithm

Our first step is to obtain a condition that allows us to approximate every coordinate of $\widehat{x}\in\mathbb{C}^{n}$ . This condition corresponds to a set of $n(n-1)$ inequalities. In this section we shall consider a sequence of hashings $\{H_{r}\}_{r\in[d]}=\{(\sigma_{r},a_{r},b_{r})\}_{r\in[d]}$ and for notational simplicity we shall abbreviate $o_{f,\sigma_{r},b_{r}}(f^{\prime})$ as $o_{f,r}(f^{\prime})$ .

We first present a lemma, which states that each $\widehat{x}_{f}$ can be finely estimated in most hashing repetitions.

Lemma 5.1.

Fix $B$ and $F$ . Let $\{H_{r}\}_{r\in[d]}=\{(\sigma_{r},a_{r},b_{r})\}_{r\in[d]}$ be a sequence of hashings. If for all $f,f^{\prime}\in[n]$ with $f\neq f^{\prime}$ it holds that

[TABLE]

then for every vector $x\in\mathbb{C}^{n}$ and every $f\in[n]$ , for at least $8d/10$ indices $r\in[d]$ we have that

[TABLE]

Proof.

We have that

[TABLE]

Hence there can be at most $2d/10$ indices $r\in[d]$ for which the estimate $|\widehat{x}_{f}-\widehat{G}_{o_{f,r}}(f)\cdot m_{r}(h_{r}(f))|$ is more than $(10/B)\|\widehat{x}_{[n]\setminus\{f\}}\|_{1}$ , otherwise the leftmost-hand side would be at least $(2d/10+1)\cdot(10/B)\|\widehat{x}_{[n]\setminus\{f\}}\|_{1}>2(d/B)\|\widehat{x}_{[n]\setminus\{f\}}\|_{1}$ . ∎

The lemma above implies that for every $f\in[n]$ we can find an estimate of $\widehat{x}_{f}$ up to $\frac{10}{B}\|\widehat{x}_{[n]\setminus\{f\}}\|_{1}$ in time $O(d)$ , by taking the median of all values $m_{r}(h_{r}(f))$ for $r\in[d]$ . The existence of pseudorandom permutations such that the conditions of Lemma 5.1 hold, namely inequalities (3), is proved in Lemma 5.9; see the next subsections for notation and definitions.

5.1 Proof of correctness assuming Inequalities (3) hold

We prove the first part of Theorem 2.7 (existence of $S$ ) assuming that the inequalities (3) hold, and thus the conditions of Lemma 5.1 hold.

For notational simplicity, let $\epsilon=(1/4)^{F-1}$ so the filter $\widehat{G}$ satisfies that $\widehat{G}_{f^{\prime}}\geq 1-\epsilon$ for all $f^{\prime}\in[-\frac{n}{2B},\frac{n}{2B}]$ and $\widehat{G}_{f^{\prime}}\leq\epsilon$ for all $f^{\prime}\in[n]\setminus(-\frac{n}{B},\frac{n}{B})$ . In the rest of the section, we choose $B=10(1-\epsilon)^{-1}\beta k$ rounded to the closest power of $2$ from above; $\beta$ is some constant to be determined.

As in previous Fourier sparse recovery papers [HIKP12a, IK14, Kap16, Kap17], we assume that we have the knowledge of $\mu=\|\widehat{x}_{-k}\|_{1}/k$ (or a constant factor upper bound) and that the signal-to-noise ratio $R^{\ast}=\|\widehat{x}\|_{1}/\mu\leq n^{\alpha}$ . Our estimation algorithm is similar to that in [IK14]. The main algorithm is Algorithm 1. It recovers the heavy coordinates of $\widehat{x}$ in increasing magnitude by repeatedly calling the subroutine Algorithm 2, which recovers the heavy coordinates of the residual spectrum above a certain threshold.

The following lemmata are analogous to Lemmata 6.1 and 6.2 in [IK14], and their proofs are postponed to Section A. The first lemma states that Algorithm 2 will recover all the coordinates in the residual spectrum that are at least $\nu$ and it will not mistake a small coordinate for a large one.

Lemma 5.2 (guarantee of SubRecovery, Section A).

Consider the call $\textsc{SubRecovery}(x,\widehat{z},\nu)$ (Algorithm 2). Let $\widehat{w}=\widehat{x}-\widehat{z}$ . When $\nu\geq\frac{16}{\beta k}\|\widehat{w}\|_{1}$ , the output $\widehat{w}^{\prime}$ of Algorithm 2 satisfies

(i) $|\widehat{w}_{f}|\geq(7/16)\nu$ for all $f\in\operatorname{supp}(\widehat{w}^{\prime})$ ;

(ii) $|\widehat{w}_{f}-\widehat{w}^{\prime}_{f}|\leq|\widehat{w}_{f}|/7$ for all $f\in\operatorname{supp}(\widehat{w}^{\prime})$ ;

(iii) $\operatorname{supp}(\widehat{w}^{\prime})$ contains all $f$ such that $|\widehat{w}_{f}|\geq\nu$ .

Next we turn to the analysis of Algorithm 1. Let $H=H(\widehat{x},k)$ and $I=\{f:|\widehat{x}_{f}|\geq\frac{1}{\rho k}\|\widehat{x}_{-k}\|_{1}\}$ for some constant $\rho$ to be determined. By the SNR assumption of $\widehat{x}$ , we have that $\|\widehat{x}_{H}\|_{1}\leq k\|\widehat{x}\|_{\infty}\leq R^{\ast}\|\widehat{x}_{-k}\|_{1}$ and thus $\|\widehat{x}\|_{1}\leq(R^{\ast}+1)\|\widehat{x}_{-k}\|_{1}$ . In Algorithm 1, the threshold in the $t$ -th step is

[TABLE]

where $C\geq 1,\gamma>1$ are constants to be determined. Let $r^{(t)}$ be the residual vector at the beginning of the $t$ -th step in the iteration. We can show that the coordinates we shall ever identify are all heavy (contained in $I$ ) and we always have good estimates of them.

Lemma 5.3 ( $\ell_{\infty}$ norm reduction, Section A).

There exist $C,\beta,\rho,\gamma$ such that it holds for all $0\leq t\leq T$ that

(a) $\widehat{x}_{f}=r^{(t)}_{f}$ for all $f\notin I$ ;

(b) $|r^{(t)}_{f}|\leq|\widehat{x}_{f}|$ for all $f$ ;

(c) $\|r^{(t)}_{I}\|_{\infty}\leq\nu^{(t)}$ .

Now we are ready to show the first part of Theorem 2.7, which is one of our main results. We shall choose $d=O(k\log n)$ such that the conditions of Lemma 5.1 hold. The hashings $\{H_{r}\}_{r\in[d]}$ can be chosen deterministically, which we shall prove in the rest of the section after this proof; this will complete the full proof.

Proof of Theorem 2.7.

The recovery guarantee follows immediately from Lemma 5.3, as

[TABLE]

This implies that $\|\widehat{x}-\widehat{x}^{\prime}\|_{\infty}\leq(2/k)\|\widehat{x}_{-k}\|_{1}$ . To obtain the $\ell_{\infty}/\ell_{1}$ error guarantee, that is, to achieve a right-hand side of $(1/k)\|\widehat{x}_{-k}\|_{1}$ , we can just replace $k$ with $2k$ throughout our construction and analysis.

Number of Measurements.

Computing the measurements in SubRecovery requires $O(k)$ measurements (Lemma 4.9). These measurements are reused throughout the iteration in the overall algorithm, hence there are $O(kd)=O(k\cdot k\log n)=O(k^{2}\log n)$ measurements in total.

Running Time.

Each call to SubRecovery runs in time $O(d(B\log B+\|\widehat{z}\|_{0}\log n)+nd)=O(k^{2}\log k\log n+k\|\widehat{z}\|_{0}\log^{2}n+nk\log n)$ . By Lemma 5.3(a), we know that $\|\widehat{z}\|_{0}\leq|I|=O(k)$ . The overall runtime is therefore $O(k^{2}\log k\log n+nk\log^{2}n+k^{2}\log^{2}n)=O(k^{2}\log^{2}n+nk\log^{2}n)=O(nk\log^{2}n)$ .

∎

5.2 Choosing the hash functions

In this and the next subsection, we shall find $\{(\sigma_{r},a_{r},b_{r})\}_{r\in[d]}$ such that (3) holds for all pairs $f\neq f^{\prime}$ . It will be crucial for the next section that we can choose $a_{r}$ freely; that means the inequalities depend solely on $\sigma_{r},b_{r}$ . Note that $o_{f,r}(f)\in[-\frac{n}{2B},\frac{n}{2B}]$ and thus $\widehat{G}_{o_{f,r}(f)}\in[1-\epsilon,1]$ , it suffices to find $\{(\sigma_{r},b_{r})\}_{r\in[d]}$ such that it holds for all $f\neq f^{\prime}$ that

[TABLE]

We shall show how to do so in polynomial time in $n$ .

Definition 5.4 (Bad Events).

Let $C=2/(1+\epsilon)$ and $\beta=Cd/B$ . Let $A_{f,f^{\prime}}$ denote the event $\sum_{r=1}^{d}\widehat{G}_{o_{f,r}(f^{\prime})}\geq\beta$ .

Pessimistic Estimator

The derandomization proceeds as follows: find a pessimistic estimator $h_{r}(f,f^{\prime};\sigma_{1},b_{1},\dots,\sigma_{r},b_{r})$ for each $r$ with the first $r$ hash functions fixed by $(\sigma_{1},b_{1}),\dots,(\sigma_{r},b_{r})$ such that the following holds:

[TABLE]

Note that inequality (7) implies that there exist choices of the pseudorandom permutations such that the conditions of Lemma 5.1 hold. The algorithm will start with $r=0$ . At the $r$ -th step, it chooses $\sigma_{r+1},b_{r+1}$ to minimize

[TABLE]

By (8), this sum does not increase as $r$ increases. At the end of step $d-1$ , all hash functions are fixed, and by (6) and (7), we have $\sum_{f\neq f^{\prime}}\Pr(A_{f,f^{\prime}}|\sigma_{1},b_{1},\dots,\sigma_{d},b_{d})<1$ . Since $A_{f,f^{\prime}}$ is a deterministic event conditioned on all $d$ hash functions, the conditional probability is either $0$ or $1$ . The inequality above implies that all conditional probabilities are $0$ , i.e., none of the bad events $A_{f,f^{\prime}}$ happens, as desired.
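
The greedy procedure can be illustrated on a toy model. The sketch below is not the estimator of Definition 5.5: it replaces the filter by the indicator of $|\sigma\Delta|_{\circ}\leq w$ for a hypothetical width $w$ , ignores the shifts $b_{r}$ , uses the threshold $d/2$ , and bounds the collision probability by $4w/n$ via Lemma 4.7. All names and parameters are assumptions made for illustration only.

```python
import math

def circ_abs(v, n):
    """Circular distance of v from 0 modulo n."""
    v %= n
    return min(v, n - v)

def derandomize(n, width, d):
    """Choose odd multipliers sigma_1..sigma_d greedily (conditional expectations).

    Toy pessimistic estimator after r steps:
      sum_delta exp(lam * load_r(delta)) * M^(d - r) * exp(-lam * beta).
    """
    p = 4 * width / n                  # collision bound from Lemma 4.7
    beta = d / 2                       # hypothetical bad-event threshold
    lam = math.log((1 - p) / p)        # Chernoff-optimal lambda for threshold d/2
    M = 1 + (math.exp(lam) - 1) * p    # upper bound on E exp(lam * indicator)
    deltas = range(1, n)
    load = {dl: 0 for dl in deltas}

    def potential(extra_sigma=None):
        total = 0.0
        for dl in deltas:
            hit = extra_sigma is not None and circ_abs(extra_sigma * dl, n) <= width
            total += math.exp(lam * (load[dl] + hit))
        return total

    scale = math.exp(-lam * beta)
    estimator = [potential() * M ** d * scale]
    sigmas = []
    for r in range(d):
        best = min(range(1, n, 2), key=potential)   # minimise the updated sum
        sigmas.append(best)
        for dl in deltas:
            load[dl] += circ_abs(best * dl, n) <= width
        estimator.append(potential() * M ** (d - r - 1) * scale)
    return sigmas, load, estimator

sigmas, load, estimator = derandomize(n=64, width=3, d=20)
print(sigmas, max(load.values()), estimator[0], estimator[-1])
```

In this toy setting the estimator starts below $1$ and never increases, so after the last step every difference $\Delta$ collides in fewer than $d/2$ repetitions; this mirrors the role of (6)–(8) in the actual argument.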

We first define our pessimistic estimator. In what follows, we shall be dealing with numbers that might have up to $O(n)$ digits. Manipulating numbers of that length can be done in polynomial time. We will not bother with determining the exact exponent in the polynomial or optimizing it, which we leave to future work.

Definition 5.5 (Pessimistic Estimator).

Let $\lambda>0$ be a parameter to be determined. Define

[TABLE]

where

[TABLE]

This function can be evaluated in $\widetilde{O}(r)\cdot\mathrm{poly}(n)$ time for each pair $f\neq f^{\prime}$ and thus the algorithm runs in polynomial time in $n$ .

To complete the proof, we shall verify (6)–(8) in Subsection 5.4.

5.3 Distribution of Offset Function

This subsection prepares auxiliary lemmata which will be used to verify the derandomization inequalities. In this subsection we focus on the distribution of the offset $o_{f,\sigma,b}(f^{\prime})$ for $f^{\prime}\neq f$ and appropriately random $\sigma$ and $b$ .

Lemma 5.6.

Suppose that $n,B$ are powers of $2$ , $\sigma$ is uniformly random on the odd integers in $[n]$ and $b$ is uniformly random in $[n]$ . For any fixed pair $f\neq f^{\prime}$ it holds that

(i) When $(n/B)\nmid(f-f^{\prime})$ , $o_{f,\sigma,b}(f^{\prime})$ is uniformly distributed on $[n]$ ;

(ii) When $(f-f^{\prime})/(n/B)$ is even, $\Pr\{o_{f,\sigma,b}(f^{\prime})=\ell\}=0$ for all $\ell\in[-\frac{n}{B},\frac{n}{B}]$ ;

(iii) When $(f-f^{\prime})/(n/B)$ is odd, $\Pr\{o_{f,\sigma,b}(f^{\prime})=\ell\}=0$ for $\ell\in[-\frac{n}{2B},\frac{n}{2B})$ and $\Pr\{o_{f,\sigma,b}(f^{\prime})=\ell\}=\frac{2}{n}$ for $\ell\in[-\frac{n}{B},-\frac{n}{2B})\cup[\frac{n}{2B},\frac{n}{B}]$ .

Proof.

First observe that

[TABLE]

For a fixed $\sigma$ , let

[TABLE]

Note that $\sigma(f-b)\bmod n$ as a function of $b$ is uniform on $[n]$ . Note also that

[TABLE]

which gives that $Z_{\sigma}$ is uniform on its support, which is $[-\frac{n}{2B},\frac{n}{2B})$ .

Suppose that $f^{\prime}-f\equiv 2^{s}K\pmod{n}$ , where $K\geq 1$ is an odd integer. It is clear that $\sigma(f^{\prime}-f)$ is uniform on its support $T=\{2^{s}\ell\bmod n:\ell\text{ is odd}\}$ , which consists of equidistant points. Since $Z_{\sigma}$ is always uniform (regardless of $\sigma$ ), the distribution of $o_{f}(f^{\prime})=\sigma(f-f^{\prime})+Z_{\sigma}$ is the convolution of two uniform distributions.

Suppose now that $n=2^{r}$ and $B=2^{b}$ .

When $(n/B)\nmid(f^{\prime}-f)$ , it holds that $r-b\geq s+1$ , and thus $n/B$ is an integer multiple of the distance between two consecutive points in $T$ . In this case it is easy to see that $o_{f,\sigma,b}(f^{\prime})$ is uniform on $[n]$ .

When $(f^{\prime}-f)/(n/B)$ is even, it must hold that $r-b\leq s-1$ and thus $n/B\leq 2^{s-1}$ . The support of $o_{f,\sigma,b}(f^{\prime})$ is

[TABLE]

which leaves a gap of width at least $2n/B$ in the middle between two consecutive points in $T$ .

When $(f^{\prime}-f)/(n/B)$ is odd, it must hold that $r-b=s$ and thus $n/B=2^{s}$ . The support of $o_{f,\sigma,b}(f^{\prime})$ therefore leaves a gap of width at least $n/B$ in the middle between two consecutive points in $T$ . It is easy to see that $o_{f,\sigma,b}(f^{\prime})$ is uniform on its support. ∎
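
For small parameters the three cases of Lemma 5.6 can be checked by enumerating all pairs $(\sigma,b)$ . The sketch below uses $n=32$ and $B=4$ , rounds ties upwards, reduces bucket indices modulo $B$ , and represents offsets in $[-n/2,n/2)$ ; these conventions and all function names are assumptions made for illustration.

```python
from collections import Counter

def pi_map(f, sigma, b, n):
    return sigma * (f - b) % n

def bucket(f, sigma, b, n, B):
    # round((B/n) * pi(f)) with ties rounded up, reduced mod B (assumed convention)
    return ((2 * B * pi_map(f, sigma, b, n) + n) // (2 * n)) % B

def offset(f, f2, sigma, b, n, B):
    o = (pi_map(f2, sigma, b, n) - (n // B) * bucket(f, sigma, b, n, B)) % n
    return o - n if o >= n // 2 else o          # representative in [-n/2, n/2)

def offset_distribution(f, f2, n, B):
    """Exact distribution of o_f(f2) over odd sigma and all b in [n]."""
    counts = Counter(offset(f, f2, sigma, b, n, B)
                     for sigma in range(1, n, 2) for b in range(n))
    total = (n // 2) * n
    return {o: c / total for o, c in counts.items()}

n, B = 32, 4                                     # n/B = 8
case_i = offset_distribution(0, 3, n, B)         # 8 does not divide 3
case_ii = offset_distribution(0, 16, n, B)       # 16 / 8 is even
case_iii = offset_distribution(0, 8, n, B)       # 8 / 8 is odd
print(len(case_i), min(abs(o) for o in case_ii), sorted(case_iii))
```

In case (i) every offset appears with probability $1/n$ ; in case (ii) no offset falls in $[-n/B,n/B]$ ; in case (iii) no offset falls in $[-\frac{n}{2B},\frac{n}{2B})$ and each offset in the two side intervals has probability $2/n$ .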

The next lemma, which bounds the moment generating function of $\widehat{G}_{o_{f}(f^{\prime})}$ , is a straightforward corollary of Lemma 5.6.

Lemma 5.7.

Let $n$ , $\sigma$ and $b$ be as in Lemma 5.6. When $f\neq f^{\prime}$ , $\operatorname*{{\bf{E}}}\exp(\lambda\widehat{G}_{o_{f,\sigma,b}(f^{\prime})})\leq M(\lambda)$ .

Proof.

When $(n/B)\nmid(f-f^{\prime})$ ,

[TABLE]

where the inequality follows from the fact that $\widehat{G}$ is at most $1$ on $[-n/B,n/B]$ and at most $\epsilon$ elsewhere (recall Definition 4.5), and the equality from rearranging the terms.

When $f^{\prime}-f\equiv k(n/B)\pmod{n}$ for even $k$ ,

[TABLE]

since the filter $\widehat{G}$ is at most $\epsilon$ outside of $[-n/B,n/B]$ and the distribution $o_{f,\sigma,b}(f^{\prime})$ is not supported on that interval by Lemma 5.6.

When $f^{\prime}-f\equiv k(n/B)\pmod{n}$ for odd $k$ ,

[TABLE]

where the inequality follows again by combining Lemma 5.6(iii) and the bounds on $\widehat{G}$ from Definition 4.5, and the equality is just a rearrangement of terms. ∎

5.4 Putting the Pieces Together

We are now ready to verify (6)–(8).

Lemma 5.8 (Pessimistic Estimation).

It holds that

[TABLE]

Proof.

Let $z=\sum_{\ell=1}^{r}\widehat{G}_{o_{f,\ell}(f^{\prime})}$ . Then

[TABLE]

where the last inequality follows from Lemma 5.7. ∎

Lemma 5.9 (Initial constraint).

It holds that

[TABLE]

Proof.

It follows from Lemma 5.7 that

[TABLE]

Recall that we choose $B=\Theta(k)$ and $d=O(k\log n)$ . It follows that

[TABLE]

Lemma 5.10 (Derandomization step).

It holds that

[TABLE]

Proof.

Let $z=\sum_{\ell=1}^{r}\widehat{G}_{o_{f,\ell}(f^{\prime})}$ . The proposition is equivalent to

[TABLE]

This clearly holds by Lemma 5.7. ∎

6 Sublinear-Time Algorithm

In this section, we take the pseudorandom hashings $\{H_{r}\}_{r\in[d]}$ to be as in Lemma 5.1 and assume that (3) holds.

The first lemma concerns $1$ -sparse recovery, because, as in earlier works, we shall create $k$ subsignals using hashing, most of which are $1$ -sparse.

Lemma 6.1.

Suppose that $n$ is a power of $2$ . Let $Q=\{0,1,2,4,\dots,n/2\}\subseteq[n]$ . Then the following holds: Let $x\in\mathbb{C}^{n}$ and suppose that $|\widehat{x}_{f}|\geq 3\|\widehat{x}_{[n]\setminus\{f\}}\|_{1}$ for some $f\in[n]$ . Then one can recover the frequency $f$ from the samples $x_{Q}$ in $O(\log n)$ time.

Proof.

Define $\theta_{f^{\prime}}=\left(\frac{2\pi}{n}f^{\prime}\right)\bmod{2\pi}$ . Observe that

[TABLE]

It follows from Proposition 4.10 that $|\arg x_{q}-(\arg\widehat{x}_{f}+q\theta_{f})|\leq\pi/8$ . When $q=0$ , one has $|\arg x_{0}-\arg\widehat{x}_{f}|\leq\pi/8$ , and thus $|\arg(x_{q}/x_{0})-q\theta_{f}|\leq\pi/4$ .

Hence,

[TABLE]

Note that $I_{q}$ is the union of $q$ disjoint intervals of length $\pi/(2q)$ . We may view these intervals as arcs on the unit circle, each arc being of length $\pi/(2q)$ , and the left endpoints of every two consecutive arcs having distance $2\pi/q$ .

Define a series of intervals $\{S_{r}\}$ for $r=0,1,\dots,\log n-1$ recursively as

[TABLE]

It is easy to see, via an inductive argument, that $\theta_{f}\in S_{r}$ for all $0\leq r\leq\log n-1$ , and $|S_{r}|\leq\frac{\pi}{2^{r+1}}$ . In the end, $S_{\log n-1}$ is an interval of length at most $\pi/n$ , which can contain only one $\theta_{f^{\prime}}$ since consecutive values of $\theta_{f^{\prime}}$ are $2\pi/n$ apart, and thus we can recover $f$ .

Each $S_{r}$ can be computed in $O(1)$ time from $S_{r-1}$ and thus the overall runtime is $O(\log n)$ . ∎
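
The decoding in the proof can be sketched in a few lines. The code below assumes the convention $x_{q}=\frac{1}{\sqrt{n}}\sum_{f^{\prime}}\widehat{x}_{f^{\prime}}e^{2\pi\sqrt{-1}qf^{\prime}/n}$ and tracks only the centre of the current arc: at step $q=2^{r}$ it selects, among the $q$ arcs compatible with $\arg(x_{q}/x_{0})$ , the one closest to the previous centre. This is one possible way to realise the recursion on $S_{r}$ ; the function names and the test signal are hypothetical.

```python
import cmath
import math

def time_samples(xhat, qs):
    """x_q = n^{-1/2} * sum_f xhat_f * exp(2*pi*i*q*f/n) (assumed convention)."""
    n = len(xhat)
    return {q: sum(c * cmath.exp(2j * math.pi * q * f / n)
                   for f, c in enumerate(xhat)) / math.sqrt(n)
            for q in qs}

def locate(samples, n):
    """Recover the dominant frequency from samples at Q = {0, 1, 2, 4, ..., n/2}."""
    x0 = samples[0]
    center = cmath.phase(samples[1] / x0)      # centre of the first arc S_0
    q = 2
    while q < n:
        phi = cmath.phase(samples[q] / x0)     # within pi/4 of q * theta_f (mod 2*pi)
        # Candidate centres are (phi + 2*pi*j) / q; keep the one nearest to `center`.
        j = round((center * q - phi) / (2 * math.pi))
        center = (phi + 2 * math.pi * j) / q
        q *= 2
    return round(center * n / (2 * math.pi)) % n

n = 64
Q = [0] + [1 << r for r in range(n.bit_length() - 1)]

def demo(f):
    xhat = [0j] * n
    xhat[f] = 3 * cmath.exp(0.7j)              # heavy coefficient
    xhat[(f + 1) % n] = 0.4                    # remaining l1 mass is 1.0 <= 3 / 3
    xhat[(f + 17) % n] = -0.3j
    xhat[(f + 40) % n] = 0.3 * cmath.exp(2j)
    return locate(time_samples(xhat, Q), n)

recovered = [demo(f) for f in range(n)]
print(recovered == list(range(n)))
```

Each step costs $O(1)$ arithmetic operations, so the decoding uses $O(\log n)$ time once the $|Q|=\log n+1$ samples are available.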

Now we move to develop our sublinear-time algorithm. The following is an immediate corollary of Lemma 5.1.

Lemma 6.2.

For each $f$ , it holds for at least $8d/10$ indices $r\in[d]$ that

[TABLE]

Proof.

It follows from Lemma 5.1, Eq. (2) and the observation that $\widehat{G}_{o_{f,r}(f)}\in[1-\epsilon,1]$ . ∎

As before, we choose $B=10(1-\epsilon)^{-1}\beta k$ rounded to the closest power of $2$ ; $\beta$ is some constant to be determined. The following is a lemma for Algorithm 3, which gives the same guarantees as Lemma 5.2.

Lemma 6.3.

Suppose that $x,\widehat{z},\nu$ are the input to Algorithm 3. Let $\widehat{w}=\widehat{x}-\widehat{z}$ . When $\nu\geq\frac{16}{\beta k}\|\widehat{w}\|_{1}$ , the output $\widehat{w}^{\prime}$ of Algorithm 3 satisfies

(i) $|\widehat{w}_{f}|\geq(7/16)\nu$ for all $f\in\operatorname{supp}(\widehat{w}^{\prime})$ ;

(ii) $|\widehat{w}_{f}-\widehat{w}^{\prime}_{f}|\leq|\widehat{w}_{f}|/7$ for all $f\in\operatorname{supp}(\widehat{w}^{\prime})$ ;

(iii) $\operatorname{supp}(\widehat{w}^{\prime})$ contains all $f$ such that $|\widehat{w}_{f}|\geq\nu$ .

Proof.

The proofs of (i) and (ii) are the same as in the proof of Lemma 5.2. Next we prove (iii). When $|\widehat{w}_{f}|\geq\nu$, we have

[TABLE]

Hence for the signal $y_{r}\in\mathbb{C}^{n}$ defined via its Fourier coefficients as

[TABLE]

By Lemma 6.2, since $16(1-\epsilon)\geq 3$ , we see that $y_{r}$ with frequency $f$ satisfies the condition of Lemma 6.1 and thus it will be recovered in at least $8d/10$ repetitions $r\in[d]$ . The measurements are exactly $(m_{H})_{h(f)}$ with $q\in Q$ . The thresholding argument is the same as in the proof of Lemma 5.2. ∎

Observe that Lemma 5.3 continues to hold if we replace Algorithm 2 with Algorithm 3 and Lemma 5.2 with Lemma 6.3. Now we are ready to prove our main theorem, Theorem 2.8, on the sublinear-time algorithm.

Proof of Theorem 2.8.

The recovery guarantee follows identically as in the proof of Theorem 2.7.

The measurements are $u_{q}$ for $q\in Q$ in each of the $d$ repetitions, and calculating each $u_{q}$ requires $O(k)$ measurements (Lemma 4.9). These measurements are reused throughout the iterations of the overall algorithm, hence there are $O(kd|Q|)=O(k\cdot k\log n\cdot\log n)=O(k^{2}\log^{2}n)$ measurements in total.

Each call to SubRecovery runs in time $O(d(B\log B+\|\widehat{z}\|_{0}\log n+B\log n)+kd)=O(k^{2}\log^{2}n+k\|\widehat{z}\|_{0}\log^{2}n)=O(k^{2}\log^{2}n)$, where we use the fact that $\|\widehat{z}\|_{0}=O(k)$ from Lemma 5.3(a). The overall runtime is therefore $O(k^{2}\log^{3}n)$. ∎

7 Incoherent Matrices via Subsampling DFT Matrix

Consider an $n\times n$ unitary matrix $A$ and assume that $|A_{i,j}|\leq C/\sqrt{n}$ for all $i,j$. Our goal in this section is to show how to deterministically sample $m=C_{m}k^{2}\log n$ rows of $A$, obtaining a matrix $B$ (whose $i$-th column we denote by $B_{i}$), such that $|\langle B_{i},B_{j}\rangle|\leq m/(kn)$ for all pairs $i\neq j$. Once we have such a $B$, the rescaled matrix $B^{\prime}=\sqrt{\frac{n}{m}}B$ is a $(1/k)$-incoherent matrix, that is, $|\langle B^{\prime}_{i},B^{\prime}_{j}\rangle|\leq 1/k$ for all pairs $i\neq j$.

Let $\delta_{1},\dots,\delta_{n}$ be i.i.d. Bernoulli variables with $\Pr(\delta_{\ell}=1)=p$, where $p=m/n$. Let $i,j\in[n]$ be such that $i\neq j$. Then

[TABLE]

Let $z_{\ell}=A_{\ell,i}\overline{A}_{\ell,j}$ , then $|z_{\ell}|\leq\eta$ , where $\eta=C^{2}/n$ . We consider the real and the imaginary parts separately, since for a complex random variable $Z$ ,

[TABLE]

Hence it suffices to consider the real variable problem as follows. Suppose that $a_{1},\dots,a_{n}\in\operatorname{\mathbb{R}}$ satisfy $|a_{i}|\leq\eta$ , and consider the centred sum $S=\sum_{i}(\delta_{i}-p)a_{i}$ . We wish to find $\delta_{1},\dots,\delta_{n}$ deterministically such that $|S|\leq m/(kn)$ .

Define the pessimistic estimator to be

[TABLE]

The moment generating function of $(\delta_{i}-p)a_{i}$ is

[TABLE]

Pessimistic Estimation

Let $w=\sum_{i=1}^{r}(\delta_{i}-p)a_{i}$ , where $\delta_{1},\dots,\delta_{r}$ have been fixed.

[TABLE]

Derandomization step

One can show first that

[TABLE]

which is equivalent to

[TABLE]

where

[TABLE]

It is now clear that the left-hand side of (10) is $pM^{\prime}+(1-p)M^{\prime\prime}$ , and therefore (9) holds. This implies that

[TABLE]

Initial condition

This is a standard argument for Bernstein’s inequality. For notational convenience, let $\phi(x)=(e^{\lambda x}-\lambda x-1)/x^{2}$ . Note that $\phi(x)$ is increasing on $(0,\infty)$ . Using Taylor’s expansion, one can bound that (see [BLM13, p35])

[TABLE]

and (see [Tro15, p98])

[TABLE]

It then follows (see [Tro15, p98]) that

[TABLE]

provided that $\lambda=t/(n\eta^{2}p(1-p)+t\eta/3)\in(0,3/\eta)$ .

When $t=m/(kn)$ , $p=m/n$ and $\eta=C^{2}/n$ , $\lambda\simeq\log n<3/\eta$ and the above probability is at most

[TABLE]

provided that $C_{m}$ is large enough.

Therefore at step $r$, the algorithm chooses $\delta_{r+1}$ so as to minimize $f_{r+1}(\delta_{1},\dots,\delta_{r+1})$, and at the end of the last step all of $\delta_{1},\dots,\delta_{n}$ have been fixed such that $|\sum_{i}(\delta_{i}-p)a_{i}|\leq t$.
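The real-variable derandomization can be illustrated with a short Python sketch of the method of conditional expectations. Since the displayed formulas are not reproduced here, the estimator below is an assumption: it is the standard two-sided exponential-moment quantity $\mathbb{E}[e^{\lambda S}+e^{-\lambda S}]$ conditioned on the variables fixed so far, with the constant factor $e^{-\lambda t}$ omitted because it does not affect the minimizer. The names `log_mgf` and `derandomize_sum` are hypothetical.

```python
import math

def log_mgf(a, p, lam):
    """log E[exp(lam * (delta - p) * a)] for delta ~ Bernoulli(p)."""
    return math.log(p * math.exp(lam * (1 - p) * a)
                    + (1 - p) * math.exp(-lam * p * a))

def derandomize_sum(a, p, lam):
    """Fix delta_1, ..., delta_n greedily so that the estimator never grows."""
    n = len(a)
    # tail_pos[i] / tail_neg[i]: log-MGF of the still-random suffix i..n-1.
    tail_pos = [0.0] * (n + 1)
    tail_neg = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        tail_pos[i] = tail_pos[i + 1] + log_mgf(a[i], p, lam)
        tail_neg[i] = tail_neg[i + 1] + log_mgf(a[i], p, -lam)

    def estimator(w, i):
        # E[exp(lam*S) + exp(-lam*S)] given prefix sum w, with the
        # coordinates i, ..., n-1 still independent Bernoulli(p).
        return (math.exp(lam * w + tail_pos[i])
                + math.exp(-lam * w + tail_neg[i]))

    initial = estimator(0.0, 0)
    w, delta = 0.0, []
    for i in range(n):
        # Try both values of delta_i and keep the smaller estimator; the
        # minimum is at most the p-weighted average, i.e. the old value.
        options = [(estimator(w + (d - p) * a[i], i + 1), d) for d in (0, 1)]
        _, d = min(options)
        delta.append(d)
        w += (d - p) * a[i]
    return delta, w, initial
```

Because the final estimator equals $e^{\lambda w}+e^{-\lambda w}\geq e^{\lambda|w|}$ and never exceeds its initial value, the returned sum satisfies $|w|\leq\log(\texttt{initial})/\lambda$.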

Now we return to the original incoherence problem in the complex case. We can define $2n(n-1)$ events, $E_{i,j}$ and $F_{i,j}$ , for every pair $i\neq j$ as

[TABLE]

For each pair of $i\neq j$, using the preceding argument, we have pessimistic estimators $f^{(1)}_{r}(i,j;\delta_{1},\dots,\delta_{r})$ by setting $a_{\ell}=\operatorname{Re}(A_{\ell,i}\overline{A_{\ell,j}})$ and $f^{(2)}_{r}(i,j;\delta_{1},\dots,\delta_{r})$ by setting $a_{\ell}=\operatorname{Im}(A_{\ell,i}\overline{A_{\ell,j}})$ such that

• (pessimistic estimation)

[TABLE]

• (derandomization step)

[TABLE]

• (initial condition)

[TABLE]

Note that (11) implies

[TABLE]

In addition, we also need to control the number of $\delta_{i}$ ’s which take value $1$ ; we want this number to be $O(m)$ . This can be achieved by combining another derandomization procedure on $\sum_{i}\delta_{i}$ using one-sided Chernoff bounds. Define the event $G=\{\sum_{i}\delta_{i}>2m\}$ . Then for $\kappa>0$ ,

[TABLE]

where

[TABLE]

is the moment generating function of $\delta_{i}$ . Define our pessimistic estimator to be

[TABLE]

then, similar to the proof in Section 5, we have

• (pessimistic estimation)

[TABLE]

• (derandomization step)

[TABLE]

• (initial condition) When $\kappa$ is small enough and $C_{m}$ large enough,

[TABLE]

Overall, our standard derandomization procedure, which at step $r$ chooses $\delta_{r+1}\in\{0,1\}$ that minimizes

[TABLE]

will find $\delta_{1},\dots,\delta_{n}$ such that none of $E_{i,j}$, $F_{i,j}$ and $G$ holds, which implies that $|\langle B_{i},B_{j}\rangle|\leq t=m/(kn)$ for all $i\neq j$ and $\sum_{i}\delta_{i}\leq 2m$. That is, we have chosen at most $2m$ rows of $A$, obtaining a matrix $B$ of incoherence at most $m/(kn)$.

8 Incoherent Matrices and Analytic Number Theory

In this section we give new results via the connection between incoherent matrices and exponential sums of characters, a classical quantity of interest in analytic number theory. Such a connection has previously been exploited, for instance, by Xu [Xu11] and Bourgain et al. [BDF+11] for explicit constructions of RIP matrices. We utilize the connection bidirectionally: we shall give explicit constructions of incoherent matrices using exponential sums, and improve the lower bound of an exponential sum using a lower bound of incoherent matrices.

8.1 A simple construction via Gauss sums

We give a rather simple construction of a $(1/\sqrt{n})$-incoherent matrix $M\in\mathbb{C}^{\frac{n+1}{2}\times n}$. It is expected that Gauss sums will behave nicely for incoherent matrices, since they have the optimal rate of cancellation: summing $p/2$ elements gives cancellation $\sqrt{p}$. Let $p$ be a prime number and let $Q=\{x\in\operatorname{\mathbb{Z}}_{p}:\exists y\in\operatorname{\mathbb{Z}}_{p},y^{2}=x\}$, i.e. the set of quadratic residues in $\operatorname{\mathbb{Z}}_{p}$, including $0$. It is a standard fact that $|Q|=(p+1)/2$. We shall show that the rows of the DFT matrix indexed by the elements of $Q$ give an incoherent matrix with an appropriate scaling. Let $\omega=e^{2\pi\sqrt{-1}/p}$. Observe that

[TABLE]

where the last inequality follows from the triangle inequality and the standard property of Gauss sums (see, e.g., [IR90, p91]).

Now, let $M\in\operatorname{\mathbb{C}}^{|Q|\times p}$ be defined as $M_{x,t}=\omega^{tx}$ for $x\in Q$ and $t\in\operatorname{\mathbb{Z}}_{p}$ . For every pair $(t_{1},t_{2})\in\operatorname{\mathbb{Z}}_{p}\times\operatorname{\mathbb{Z}}_{p}$ with $t_{1}\neq t_{2}$ , we have that the inner product of the $t_{1}$ -th and the $t_{2}$ -th column of $M$ is exactly $\sum_{x\in Q}\omega^{(t_{1}-t_{2})x}$ . Normalising $M$ gives the desired result.
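The construction is easy to check numerically for a small prime. The sketch below builds $Q$ and computes the largest inner product between two distinct columns of the normalised matrix. The function names are hypothetical, and the comparison value $(\sqrt{p}+1)/(p+1)$ is this sketch's own reading of the Gauss-sum bound: since $\sum_{y\in\mathbb{Z}_{p}}\omega^{ty^{2}}=2\sum_{x\in Q}\omega^{tx}-1$ has modulus $\sqrt{p}$ for $t\neq 0$, the sum over $Q$ is at most $(\sqrt{p}+1)/2$ in modulus.

```python
import cmath
import math

def quadratic_residues(p):
    """Squares in Z_p, including 0; there are (p + 1) / 2 of them for odd prime p."""
    return sorted({(y * y) % p for y in range(p)})

def gauss_coherence(p):
    """Largest |<M_{t1}, M_{t2}>| over t1 != t2 after scaling M by 1/sqrt(|Q|)."""
    Q = quadratic_residues(p)
    best = 0.0
    # The inner product of columns t1 and t2 depends only on t = t1 - t2.
    for t in range(1, p):
        s = sum(cmath.exp(2j * math.pi * ((t * x) % p) / p) for x in Q)
        best = max(best, abs(s) / len(Q))
    return best
```

For example, `gauss_coherence(31)` can be compared against `(math.sqrt(31) + 1) / 32`, which is close to $1/\sqrt{31}$.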

8.2 Proof of Theorem 2.11

In the previous subsection we obtained an incoherent matrix by picking the rows of DFT indexed by quadratic residues, i.e. quadratic polynomials. Motivated by this, we show that taking polynomials of a higher degree can give an improved result that works in a larger range of parameters. We shall need the following deep theorem of Weyl.

Theorem 8.1 ([Nat96, Theorem 4.3]).

Let $M,N,q$ be positive integers and $\alpha$ an integer such that $(\alpha,q)=1$ . If $g$ is a real polynomial of degree $d\geq 2$ with leading coefficient $a$ such that $|a-\frac{\alpha}{q}|\leq q^{-2}$ , then for any $\epsilon>0$ we have

[TABLE]

where the hidden constant in the $O$ -notation depends on $d$ and $\epsilon$ .

We are now ready to prove Theorem 2.11.

Proof.

Pick any polynomial $g$ of degree $d$ such that every coefficient of $g$ is an integer multiple of $1/p$. Pick also any $m$ consecutive points in $\operatorname{\mathbb{Z}}_{p}$; we can just take $0$ to $m-1$. Take the rows of DFT indexed by $g$ evaluated on these $m$ consecutive points. We shall show that after appropriate normalization this corresponds to an incoherent matrix of the desired form. The inner product between two columns indexed by $t_{1},t_{2}$ of the formed matrix is

[TABLE]

Observe that $g(t_{1})-g(t_{2})$ is a $d$ -degree polynomial where every coefficient is an integer multiple of $1/p$ . Applying Theorem 8.1 with $N=m$ , $a=\frac{t}{p}$ , $\alpha=t$ , $q=p$ and noticing that $q\geq N$ , we see that the above sum is at most

[TABLE]

After rescaling the formed matrix by $1/\sqrt{m}$, the incoherence of the matrix is rescaled by $1/m$ and thus becomes

[TABLE]

yielding the desired result. ∎
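The row selection in this proof can be explored empirically with a small sketch. It is an illustration only: the polynomial is given by hypothetical integer coefficients `coeffs` reduced modulo $p$ (an assumed stand-in for the paper's normalisation by $1/p$), and the function simply reports the largest normalised column inner product rather than verifying the bound of Theorem 8.1.

```python
import cmath
import math

def poly_eval(coeffs, x, p):
    """Evaluate c_0 + c_1 x + ... + c_d x^d modulo p using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def polynomial_coherence(coeffs, m, p):
    """Coherence of the DFT rows g(0), ..., g(m-1) after scaling by 1/sqrt(m)."""
    rows = [poly_eval(coeffs, x, p) for x in range(m)]
    best = 0.0
    # Columns t1 != t2 have inner product sum_x omega^{(t1 - t2) g(x)}.
    for t in range(1, p):
        s = sum(cmath.exp(2j * math.pi * ((t * r) % p) / p) for r in rows)
        best = max(best, abs(s) / m)
    return best
```

As a sanity check, taking $g(x)=x^{2}$ and $m=p$ turns every inner product into a full quadratic Gauss sum, so the reported coherence is $1/\sqrt{p}$ up to floating-point error.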

8.3 Proof of Theorem 2.12

Proof.

Suppose that $g$ is a generator of the multiplicative cyclic group $\operatorname{\mathbb{Z}}_{p}^{\ast}$ (we shall show how to find such $g$ later). For every $d$ that divides $p-1$ we shall take the rows of DFT indexed by the multiplicative subgroup $G$ that is generated by $g^{(p-1)/d}$ . Since $g$ is a generator of $\operatorname{\mathbb{Z}}_{p}^{\ast}$ it must hold that $|G|=d$ . The incoherence bound follows by a classical fact that (see, e.g. [Kur07]) for any $t_{1},t_{2}\in\mathbb{Z}_{p}$ with $t_{1}\neq t_{2}$ ,

[TABLE]

Rescaling gives the desired incoherence bound.

Finding a generator $g$ of $\operatorname{\mathbb{Z}}_{p}^{\ast}$ is a classic problem with a rich research history. We include a simple, standard algorithm below for completeness.

The first step is to factor $p-1$ in $\widetilde{O}(\sqrt{p})$ time. We can find all primes smaller than $\sqrt{p-1}$ in $O(\sqrt{p}\log\log p)$ time using the sieve of Eratosthenes. For each such prime $q$ we find the highest power $q^{\ell}$ which divides $p-1$. Let $t$ be the number obtained after dividing $p-1$ by $q^{\ell}$ for all such $q,\ell$. If $t\neq 1$, it must be a prime, since otherwise for $t=ab$ one of $a,b$ would be at most $\sqrt{t}\leq\sqrt{p-1}$.

Now we are ready to find a generator $g$. It is known that the smallest generator of $\operatorname{\mathbb{Z}}_{p}^{\ast}$ is $O(p^{1/4+\epsilon})$ [Bur62, Theorem 3], and thus we iterate over the first $O(p^{1/4+\epsilon})$ elements of $\operatorname{\mathbb{Z}}_{p}^{\ast}$ and check whether each such element $z$ is a generator by testing whether $z^{(p-1)/d}\neq 1$ in $\mathbb{Z}_{p}^{*}$ for all prime divisors $d$ of $p-1$. To see that such a $z$ is a generator, observe first that the checking condition guarantees that $z$ is of order $p-1$, and checking only prime $d$ suffices (since if $d$ is composite and $z^{(p-1)/d}=1$, then $z^{(p-1)/{d^{\prime}}}=1$ for all divisors $d^{\prime}$ of $d$); moreover, it is a basic fact in group theory that the order of any subgroup divides the order of the group and hence we need only look at divisors of $p-1$. The runtime of this part is $\widetilde{O}(p^{1/4+\epsilon})$. ∎
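The generator search and the resulting subgroup can be sketched as follows. This is a minimal illustration under assumptions: it factors $p-1$ by plain trial division instead of the sieve described above, it scans candidates $z=2,3,\dots$ until one passes the test rather than stopping at an explicit $O(p^{1/4+\epsilon})$ cutoff, and the function names are hypothetical.

```python
def prime_factors(m):
    """Distinct prime factors of m by trial division up to sqrt(m)."""
    factors, q = [], 2
    while q * q <= m:
        if m % q == 0:
            factors.append(q)
            while m % q == 0:
                m //= q
        q += 1
    if m > 1:
        # The leftover cofactor has no factor below its square root,
        # so it is itself prime.
        factors.append(m)
    return factors

def find_generator(p):
    """Smallest generator of Z_p^* for an odd prime p."""
    primes = prime_factors(p - 1)
    for z in range(2, p):
        # z generates Z_p^* iff z^((p-1)/d) != 1 for every prime d | p - 1.
        if all(pow(z, (p - 1) // d, p) != 1 for d in primes):
            return z
    raise ValueError("no generator found; is p an odd prime?")

def subgroup(p, d):
    """Multiplicative subgroup of order d generated by g^((p-1)/d); d | p - 1."""
    h = pow(find_generator(p), (p - 1) // d, p)
    return sorted({pow(h, e, p) for e in range(d)})
```

For instance, `subgroup(31, 5)` lists the five row indices that the construction in this proof would select for $p=31$ and $d=5$.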

8.4 Strengthening the lower bound in [Win01]

The lower bound in [Win01] states that for any $n,d\geq 2$ with $\mathrm{gcd}(d,n-1)=1$ and any subset $S\subseteq\operatorname{\mathbb{Z}}_{n}$, there exist $b\in\operatorname{\mathbb{Z}}_{n}^{\ast}$ and an irreducible $d$-degree polynomial $g$ with coefficients in $\operatorname{\mathbb{Z}}_{n}$, such that

[TABLE]

With the connection to incoherent matrices and the lower bound of Alon, we obtain a much stronger result. In fact we have for any $d\geq 1$ and any polynomial $g$ with coefficients in $\operatorname{\mathbb{Z}}_{n}$ that

[TABLE]

for some $b\in\operatorname{\mathbb{Z}}_{n}^{\ast}$ , provided that $|S|=\Omega(\log n/\log\log n)$ . In the case that $|S|=O(\log n/\log\log n)$ we still have a lower bound of $\Omega(\sqrt{|S|})$ .

Note that the condition $d\geq 2$ has been relaxed to $d\geq 1$ , the assumption that $\gcd(d,n-1)=1$ has been removed, the conclusion “there exists an irreducible polynomial” has been replaced with the condition “for any polynomial”, and the right-hand side has been amplified by a multiplicative factor of $\sqrt{\log_{|S|}n}$ for $|S|=\Omega(\log n/\log\log n)$ .

Our new lower bound follows immediately from Alon’s lower bound on incoherent matrices [Alo09]. Indeed, assume that there exists a polynomial $g$ such that for all $b\in\operatorname{\mathbb{Z}}_{n}^{\ast}$ the left-hand side of (12) is at most $c\sqrt{|S|\log_{|S|}n}$ for some absolute constant $c$ . Consider the matrix with the rows of the DFT matrix indexed by numbers $\{g(x):x\in S\}$ (some rows of the DFT matrix may appear more than once). Observe that after normalizing the matrix by $\frac{1}{\sqrt{|S|}}$ , the incoherence is

[TABLE]

This would violate the lower bound in [Alo09], which states that an $m\times n$ $(1/k)$ -incoherent matrix must satisfy $m\geq\alpha\cdot k^{2}\log_{k}n$ for some absolute constant $\alpha$ , since

[TABLE]

for $c$ small enough, when $|S|=\Omega(\log n/\log\log n)$ . In the case of $|S|=O(\log n/\log\log n)$ we can still use the quadratic bound ( $m=\Omega(k^{2})$ ) on incoherent matrices to obtain a bound of $\Omega(\sqrt{|S|})$ .

9 Open Problems and Future Direction

A direction of research is to design deterministic schemes that break the quadratic barrier for signals with structured Fourier support. For example, subsampling the rows of the DFT matrix to obtain RIP matrices depends highly on the structure of the vectors we would like to preserve. The more additive structure the support of a $k$ -sparse vector $x$ has, the worse is the concentration of a random Fourier coefficient of $x$ . Equivalently, the less additive structure the support of $x$ has, the flatter its Fourier transform is, and hence, the better concentration bounds we obtain. The concentration in the extreme case, when the support of $x$ is “dissociated”, is captured by the renowned Rudin’s inequality in additive combinatorics (see, e.g. [TV06, Lemma 4.33]). We thus believe that it is an interesting direction to use machinery from the field of additive combinatorics and the relevant fields in order to obtain new constructions and algorithms, at least for interesting subclasses of structured signals.

10 Acknowledgements

We would like to thank anonymous reviewers for their valuable feedback.

Appendix A Reduction of the $\ell_{\infty}$ norm

Lemma 5.2.

Suppose that $x,\widehat{z},\nu$ are the input to Algorithm 3. Let $\widehat{w}=\widehat{x}-\widehat{z}$. When $\nu\geq\frac{16}{\beta k}\|\widehat{w}\|_{1}$, the output $\widehat{w}^{\prime}$ of Algorithm 3 satisfies

(i) $|\widehat{w}_{f}|\geq(7/16)\nu$ for all $f\in\operatorname{supp}(\widehat{w}^{\prime})$;

(ii) $|\widehat{w}_{f}-\widehat{w}^{\prime}_{f}|\leq|\widehat{w}_{f}|/7$ for all $f\in\operatorname{supp}(\widehat{w}^{\prime})$;

(iii) $\operatorname{supp}(\widehat{w}^{\prime})$ contains all $f$ such that $|\widehat{w}_{f}|\geq\nu$.

Proof.

By the recovery guarantee we know that

[TABLE]

By thresholding, it must hold for $f\in\operatorname{supp}(\widehat{w}^{\prime})$ that $|\widehat{w}^{\prime}_{f}|\geq\nu/2$ and thus

[TABLE]

which proves (i). Thus

[TABLE]

which proves (ii). Next we prove (iii). When $|\widehat{w}_{f}|\geq\nu$ , we have

[TABLE]

Hence for the signal $y_{r}\in\mathbb{C}^{n}$ defined via its Fourier coefficients as

[TABLE]

By Lemma 6.2, since $16(1-\epsilon)\geq 3$, we see that $y_{r}$ with frequency $f$ satisfies the condition of Lemma 6.1 and thus it will be recovered in at least $8d/10$ indices $r\in[d]$. The measurements are exactly $(m_{H})_{h(f)}$ with $q\in Q$. The recovered estimate is at least $\nu-\nu/16>\nu/2$ and thus the median estimate will pass the thresholding, and $f\in\operatorname{supp}(\widehat{w}^{\prime})$. ∎

Let $H=H(x,k)$ and $I=\{f:|\widehat{x}_{f}|\geq\frac{1}{\rho k}\|\widehat{x}_{-k}\|_{1}\}$. By the SNR assumption of $\widehat{x}$, we have that $\|\widehat{x}_{H}\|_{1}\leq k\|\widehat{x}\|_{\infty}\leq R^{\ast}\|\widehat{x}_{-k}\|_{1}$ and thus $\|\widehat{x}\|_{1}\leq(R^{\ast}+1)\|\widehat{x}_{-k}\|_{1}$. Let $r^{(t)}$ be the residual vector at the beginning of the $t$-th step in the iteration. The threshold in the $t$-th step is

[TABLE]

where $C\geq 1,\gamma>1$ are constants to be determined.

Lemma 5.3.

There exist $C,\beta,\rho,\gamma$ such that it holds for all $0\leq t\leq T$ that

(a) $\widehat{x}_{f}=r^{(t)}_{f}$ for all $f\notin I$;

(b) $|r^{(t)}_{f}|\leq|\widehat{x}_{f}|$ for all $f$;

(c) $\|r^{(t)}_{f}\|_{\infty}\leq\nu^{(t)}$.

Proof.

We prove the three properties inductively. The base case is $t=0$ , where all properties clearly hold, noticing that $\mu\gamma^{T}=\|x\|_{\infty}$ .

Next we prove the inductive step from $t$ to $t+1$ . Note that

[TABLE]

When

[TABLE]

it holds that

[TABLE]

and thus Lemma 5.2 applies.

From Lemma 5.2(i), we know that when

[TABLE]

no coordinates in $I^{c}$ will be modified. This proves (a).

Lemma 5.2(ii) implies (b).

To prove (c), let $J=\{f\in I:|r^{(t)}_{f}|\geq\nu^{(t+1)}\}$ . By Lemma 5.2(iii), all coordinates in $J$ will be recovered. Hence for $f\in J$ ,

[TABLE]

provided that

[TABLE]

For $f\in I\setminus J$ , the definition of $J$ implies that $|r^{(t+1)}_{f}|\leq\nu^{(t+1)}$ . This proves (c).

We can take $C=2$ , $\rho=32$ , $\beta=32$ , $\gamma=2$ , which satisfy all the constraints (13), (14) and (15). ∎

Bibliography


[AGS03] Adi Akavia, Shafi Goldwasser, and Shmuel Safra. Proving hard-core predicates using list decoding. In FOCS, volume 44, pages 146–159, 2003.
[Aka10] Adi Akavia. Deterministic sparse Fourier approximation via fooling arithmetic progressions. In COLT, pages 381–393, 2010.
[Aka14] Adi Akavia. Deterministic sparse Fourier approximation via approximating arithmetic progressions. IEEE Transactions on Information Theory, 60(3):1733–1741, 2014.
[AKM+18] Haim Avron, Michael Kapralov, Cameron Musco, Christopher Musco, Ameya Velingker, and Amir Zandieh. A universal sampling method for reconstructing signals with simple Fourier transforms. arXiv preprint arXiv:1812.08723, 2018.
[Alo09] Noga Alon. Perturbed identity matrices have high rank: Proof and applications. Combinatorics, Probability and Computing, 18(1-2):3–15, 2009.
[AM11] Arash Amini and Farokh Marvasti. Deterministic construction of binary, bipolar, and ternary compressed sensing matrices. IEEE Transactions on Information Theory, 57(4):2360–2370, 2011.
[BCG+14] Petros Boufounos, Volkan Cevher, Anna C Gilbert, Yi Li, and Martin J Strauss. What’s the frequency, Kenneth?: Sublinear Fourier sampling off the grid. In Algorithmica (a preliminary version of this paper appeared in the Proceedings of RANDOM/APPROX 2012, LNCS 7408, pp. 61–72), pages 1–28. Springer, 2014.
[BDF+11] Jean Bourgain, Stephen J Dilworth, Kevin Ford, Sergei V Konyagin, and Denka Kutzarova. Breaking the $k^{2}$ barrier for explicit RIP matrices. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 637–644. ACM, 2011.
