Deterministic Sparse Fourier Transform with an $\ell_\infty$ Guarantee
Yi Li, Vasileios Nakos

TL;DR
This paper develops deterministic algorithms for sparse Fourier transform recovery with strong infinity-norm guarantees, matching known lower bounds and constructing incoherent matrices via derandomization techniques.
Contribution
It introduces nearly optimal deterministic sampling and recovery algorithms for sparse Fourier transforms with $\ell_\infty/\ell_1$ guarantees, and provides new derandomized incoherent matrix constructions.
Findings
Deterministic $\ell_\infty/\ell_1$ recovery with $O(k^2 \log n)$ samples.
New derandomized incoherent matrix constructions matching randomized bounds.
Algorithms are nearly sample-optimal, approaching theoretical lower bounds.
| Notation | Semantics |
|---|---|
| | Absolute constant |
| | Number of “buckets”, power of 2 |
| | Number of “repetitions” |
| | equals |
| | Rate of SNR decrease |
| | Given approximation to SNR |
| | Approximation of SNR at the -th step |
| | Residual at the -th step |
Deterministic Sparse Fourier Transform with an $\ell_\infty$ Guarantee
Yi Li
Nanyang Technological University
Vasileios Nakos
Saarland University and Max-Planck Institute for Informatics
This work is part of the project TIPEA that has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 850979). Part of the work was completed when the author was a Ph.D. student at Harvard University and supported in part by NSF CAREER award CCF-1350670.
In this paper we revisit the deterministic version of the Sparse Fourier Transform problem, which asks to read only a few entries of $x \in \mathbb{C}^n$ and design a recovery algorithm such that the output of the algorithm approximates $\hat{x}$, the Discrete Fourier Transform (DFT) of $x$. The randomized case has been well-understood, while the main work in the deterministic case is that of Merhi et al. (J Fourier Anal Appl 2018), which obtains $O(k^2 \log^{-1} k \cdot \log^{5.5} n)$ samples and a similar runtime with the $\ell_2/\ell_1$ guarantee. We focus on the stronger $\ell_\infty/\ell_1$ guarantee and the closely related problem of incoherent matrices. We list our contributions as follows.
1. We find a deterministic collection of $O(k^2 \log n)$ samples for the $\ell_\infty/\ell_1$ recovery in time $O(nk \log^2 n)$, and a deterministic collection of $O(k^2 \log^2 n)$ samples for the $\ell_\infty/\ell_1$ sparse recovery in time $O(k^2 \log^3 n)$.
2. We give new deterministic constructions of incoherent matrices that are row-sampled submatrices of the DFT matrix, via a derandomization of Bernstein’s inequality and bounds on exponential sums considered in analytic number theory. Our first construction matches a previous randomized construction of Nelson, Nguyen and Woodruff (RANDOM'12), where there was no constraint on the form of the incoherent matrix.
Our algorithms are nearly sample-optimal, since a lower bound of $\Omega(k^2 + k \log n)$ is known, even for the case where the sensing matrix can be arbitrarily designed. A similar lower bound of $\Omega(k^2 \log n / \log k)$ is known for incoherent matrices.
1 Introduction
Compressed sensing is a subfield of discrete signal processing, based on the principle that a high-dimensional signal can be approximately reconstructed, by exploiting its sparsity, in fewer samples than those demanded by the Shannon-Nyquist theorem. An important subtopic is the Sparse Fourier Transform, where we desire to detect and approximate the largest coordinates of a high-dimensional signal, given a few samples from its Fourier spectrum. Fewer samples play a crucial role, for example, in medical imaging, where reconstructing an image corresponds exactly to reconstructing a signal from its Fourier representation. Thus, the number of Fourier coefficients needed for (approximate) reconstruction is proportional to the radiation dose a patient receives as well as the time the patient needs to remain in the scanner. Furthermore, exploiting the sparsity of the signal has given researchers the hope of defeating the FFT algorithm of Cooley and Tukey, in the special (but of high practical value) case where the signal is approximately sparse. Thus, since FFT serves as an important computational primitive, and has been recognized as one of the 10 most important algorithms of the 20th century [Cip00], every place where it has found application can possibly be benefited from a faster algorithm. The main intuition and hope is that signals arising in practice often exhibit certain structure, such as concentration of energy in a small number of Fourier coefficients.
Since vectors in practice are never exactly sparse, and it is impossible to reconstruct a generic vector from samples, researchers resort to approximation. More formally, a sparse recovery scheme consists of a sample set and a recovery algorithm such that for any given , the scheme approximates by , where denotes the vector of restricted to the coordinates in . The fineness of approximation is measured with respect to the best -sparse approximation to . The breakthrough work of Candès, Tao and Donoho [CT06, Don06] first showed that samples of suffices to reconstruct a -sparse vector which is “close” to the best -approximation of . More formally, the reconstruction satisfies the so-called guarantee, i.e.,
[TABLE]
where $\hat{x}_{-k}$ is the tail vector, obtained from restricting $\hat{x}$ to its $n-k$ smallest coordinates in magnitude. The strength of their algorithms lies in the uniformity, in the sense that the samples at the same coordinates can be used to approximate every vector $x$. However, the running time is polynomial in the vector length $n$, thus giving only sample-efficient, but not necessarily time-efficient, algorithms. Furthermore, the samples are not obtained via a deterministic procedure, but are chosen at random. Regarding non-uniform randomized algorithms that run in sublinear time, numerous researchers have worked on the problem and obtained a series of algorithms with different recovery guarantees [GL89, Man92, KM93, GGI+02, AGS03, GMS05, HIKP12a, HIKP12b, LWC13, Iwe13, PR14, IKP14, IK14, Kap16, Kap17, KVZ19, NSW19]. See Table 1 for a list of common recovery guarantees. The state of the art is the seminal algorithm of Kapralov [Kap17], which shows that samples and time are simultaneously possible for the $\ell_2/\ell_2$ guarantee (which is strictly stronger than the $\ell_2/\ell_1$). [Footnote 1: Here we mean that given an algorithm giving the $\ell_2/\ell_2$ guarantee, one can create an algorithm, using the algorithm as a black box, with sparsity parameter , achieving the $\ell_2/\ell_1$ guarantee with the same order of number of samples.] The fastest algorithm is due to [HIKP12a], needing time and samples. We note also the algorithm of Indyk and Kapralov [IK14] that runs in time, uses samples but gives a stronger $\ell_\infty/\ell_2$ guarantee than the $\ell_2/\ell_2$ guarantee in the previous two papers. We refer the reader to the next section for a comparison of the different guarantees appearing in the literature. Recently there has also been considerable work on recovering $k$-sparse signals from their continuous Fourier Transform, see [BCG+14, PS15, CKPS16, AKM+18].
Although our understanding of randomized algorithms is almost complete, there are still important gaps in our knowledge regarding deterministic schemes. The following natural open-ended question is of both theoretical and practical interest and remains largely unexplored, touching a variety of fields including (sublinear-time) algorithms, pseudorandomness and computational complexity, additive combinatorics [BDF+11] and analytic number theory.
Question 1.1.
What are the best bounds we can obtain for the different versions of the deterministic Sparse Fourier Transform problem?
With sublinear runtime, the earliest work of Iwen [Iwe08, Iwe10] gives samples and time, albeit in a significantly easier (although similar) model: where one wants to learn a band-limited function and can evaluate at any point. In the discrete case which we are interested in, the state of the art is the work of Merhi et al. [MZIC18], which obtains samples and the same runtime. A recent work of Bittens et al. [BZI17] showed that the quadratic dependence can be dropped if the signals are sufficiently structured, namely, if the Fourier coefficients are generated by an unknown but small degree polynomial. On the related problem of the Walsh-Hadamard Transform, Indyk and Cheraghchi [CI17] showed that roughly samples and similar run-time are possible, if one resorts to a slightly weaker guarantee. Interestingly, their approach rests on a novel connection between the Walsh-Hadamard matrix and linear lossless condensers. However, this connection does not extend to the Fourier Transform over , which is our focus and the most interesting case. Interesting ideas appear also in the work of Akavia [Aka10, Aka14], where it is shown how to approximate the Fourier Transform of an arithmetic progression in poly-logarithmic time in the length of the progression; due to the worse dependence on the quality of approximation, however, that work obtained an algorithm with sample complexity .
The papers above showed how to achieve the guarantee in a number of samples that is quadratic in the signal sparsity. It is already known that a nearly linear dependence is possible [CT06]; however, we do not have efficient deterministic algorithms for finding these samples. The work of [CT06], as well as subsequent works, proceeds by sampling with repetition rows of the DFT matrix, and showing that the RIP condition (see Definition 2.3) holds, which in turn implies the desired result, but via a super-linear algorithm. The state-of-the-art analysis of such row subsampling is due to Haviv and Regev [HR16], who showed that samples suffice. A lower bound of rows for this subsampling process has been shown in [BLM17]. In this paper, we follow a different avenue and give a new set of schemes for the Sparse Fourier Transform which allow uniform reconstruction. Although our dependence is still quadratic in , it is necessary, in contrast to the previous works: our results satisfy the strictly stronger guarantee, for which a quadratic lower bound is known [Gan08], and hence one cannot hope for a sub-quadratic dependence. We also note the deterministic algorithm of [KVZ19], which needs a cubic dependence on but solves a somewhat different problem of finding the multidimensional sparse Fourier transform of a signal with at most non-zeros in the frequency domain, and thus is not robust to noise.
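The focus of our work is the $\ell_\infty/\ell_1$ guarantee, defined formally as follows.
Definition 1.2 ($\ell_\infty/\ell_1$ guarantee).
A sparse recovery scheme is said to satisfy the $\ell_\infty/\ell_1$ guarantee with parameter $k$, if given access to vector $x$, it outputs a vector $\hat{x}'$ such that
$$\|\hat{x} - \hat{x}'\|_\infty \le \frac{1}{k}\,\|\hat{x}_{-k}\|_1. \tag{1}$$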
$\ell_\infty/\ell_1$ versus $\ell_2/\ell_1$: A matter of “find all” versus “miss all”.
As we have discussed, previous works satisfied the $\ell_2/\ell_1$ guarantee, while our target is the $\ell_\infty/\ell_1$ guarantee. Any algorithm for the latter guarantee also satisfies the former one. But, as we shall demonstrate in Section 2.3, the $\ell_\infty/\ell_1$ guarantee is much stronger: there exists an infinite family of vectors for which an $\ell_2/\ell_1$ algorithm might detect none of the heavy frequencies, while an $\ell_\infty/\ell_1$ algorithm must detect all of them. This happens because the $\ell_\infty/\ell_1$ is a worst-case guarantee, in the sense that it requires detection of every frequency just above the noise level, in contrast to the $\ell_2/\ell_1$, which should be regarded as an average-case guarantee in the sense that it allows missing a subset of the heavy frequencies if they carry energy proportional to the noise level.
Previous Work on $\ell_\infty/\ell_1$ with arbitrary linear measurements.
All approaches described above concerned Fourier measurements, but compressed sensing has a long history using arbitrary linear measurements, for example [DBIPW10, PW11, IPW11, GLPS10, GNP+13, NSWZ18, GLPS17, LNW18, LN18, NS19]. Regarding $\ell_\infty/\ell_1$, the work of [NNW14] indicated a connection between the aforementioned guarantee and incoherent matrices. More specifically, it was shown that given a -incoherent matrix one can design an algorithm satisfying the $\ell_\infty/\ell_1$ guarantee. The existence of a matrix with $O(k^2 \log n)$ rows was also proved. Reconstruction needed time, something which was partially remedied by Li and Nakos [LN18] with a scheme of measurements and decoding time. Incoherent matrices are interesting objects in their own right, and have been studied before, as they can be used to obtain RIP matrices. Deterministic constructions of rows were obtained by DeVore [DeV07] using deep results from the theory of Gelfand widths and by Amini and Marvasti [AM11] via binary BCH code vectors, where the zeros are replaced by $-1$s. We note that incoherent matrices matching this bound also follow immediately from the famous Nisan-Wigderson combinatorial designs [NW94], and serve as a cornerstone for constructions of pseudorandom generators and extractors [Tre01]. Incoherent matrices are also connected with -biased codes, and thus an almost optimal strongly explicit construction can be obtained by the recent breakthrough work of [TS17]. On the lower bound side, Alon has shown that $\Omega(k^2 \log n / \log k)$ rows are necessary for a -incoherent matrix [Alo09].
Our Contribution.
In this work we offer several new results for the Sparse Fourier Transform problem across different axes, some of which are nearly optimal. We show how to find in polynomial time a deterministic collection of samples from the time domain, such that we can solve the Sparse Fourier Transform problem in linear and sublinear time and achieve nearly optimal sample complexity. For the closely related problem of incoherent matrices from DFT rows, which is of independent interest, we obtain a nearly optimal derandomized construction via Bernstein’s inequality. We also demonstrate strongly explicit constructions, by invoking heavy number-theoretical machinery.
We note that the bounds of our constructions have been known for more than a decade if the sensing/incoherent matrix is allowed to be arbitrary. However, the previous arguments did not cover the frequent and relevant scenario where we have access to rows only from the Fourier ensemble. Part of our work is to show that some of these results carry over to the significantly more constrained case. We also note that any progress towards deterministic schemes with subquadratic sample complexity is connected to the very challenging problem of obtaining deterministic DFT row-subsampled RIP matrices with a subquadratic number of rows, which is possibly out of reach at the moment. [Footnote 2: Note that [BDF+11] breaks the quadratic barrier for RIP matrices but does not use the Fourier ensemble; the rows are picked from the discrete chirp-Fourier ensemble, where the linear functions are substituted by quadratic polynomials.]
2 Technical Results
2.1 Preliminaries
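For a positive integer $n$, we define $[n] = \{0, 1, \dots, n-1\}$ and we shall index the coordinates of an $n$-dimensional vector or the rows/columns of an $n \times n$ matrix from $0$ to $n-1$. We define the Discrete Fourier Transform (DFT) matrix $F \in \mathbb{C}^{n \times n}$ to be the unitary matrix such that $F_{ij} = \frac{1}{\sqrt{n}}\,\omega^{ij}$, where $\omega$ is a primitive $n$-th root of unity, and the Discrete Fourier Transform of a vector $x \in \mathbb{C}^n$ to be $\hat{x} = Fx$.
For a set $S \subseteq [n]$ we define $x_S$ to be the vector obtained from $x$ after zeroing out the coordinates not in $S$. We also define $H(x,k)$ to be the set of the indices of the $k$ largest coordinates (in magnitude) of $x$, and $x_{-k} = x_{[n] \setminus H(x,k)}$. We say $x$ is $k$-sparse if $x_{-k} = 0$. We also define $\|x\|_p = \big(\sum_{i=0}^{n-1} |x_i|^p\big)^{1/p}$ for $p \ge 1$ and $\|x\|_0$ to be the number of nonzero coordinates of $x$.
For a matrix $F \in \mathbb{C}^{n \times n}$ and subsets $S, T \subseteq [n]$, we define $F_{S,T}$ to be the submatrix of $F$ indexed by rows in $S$ and columns in $T$.
The median of a collection of complex numbers $\{z_i\}$ is defined to be $\operatorname{median}_i z_i = \operatorname{median}_i \operatorname{Re}(z_i) + \sqrt{-1}\,\operatorname{median}_i \operatorname{Im}(z_i)$, i.e., taking the median of the real and the imaginary component separately.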
For two points and on the unit circle, we use to denote the circular distance (in radians, i.e. modulo ) between and .
2.1.1 $\ell_\infty/\ell_1$ Guarantee and incoherent matrices
The quality of the approximation is usually measured in different error metrics, and the main recovery guarantee we are interested in is called the guarantee, as defined in Definition 1.2. Other types of recovery guarantee, such as the , the and the , are defined similarly, where (1) is replaced with the respective expression in Table 1. Note that these are definitions of the error guarantee per se and do not have algorithmic requirements on the scheme.
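Closely related to the $\ell_\infty/\ell_1$ guarantee is a matrix condition which we call incoherence.
Definition 2.1 (Incoherent Matrix).
A matrix $A \in \mathbb{C}^{m \times n}$ is called $\epsilon$-incoherent if $\|A_i\|_2 = 1$ for all $i$ (where $A_i$ denotes the $i$-th column of $A$) and $|\langle A_i, A_j \rangle| \le \epsilon$ for all $i \ne j$.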
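To make the definition concrete, the following is a minimal sketch that measures the coherence of a row-subsampled DFT matrix. It assumes the usual reading of incoherence (unit-norm columns and pairwise inner products of small magnitude); the helper names `dft_rows` and `coherence`, the $1/\sqrt{m}$ column normalization, and the toy parameters $n = 7$, rows $\{1, 2, 4\}$ are illustrative choices, not taken from the paper.

```python
import cmath
import math

def dft_rows(rows, n):
    """Rows of the n x n DFT matrix indexed by `rows`, rescaled by 1/sqrt(m)
    (m = number of kept rows) so that every column has unit l2 norm."""
    scale = 1.0 / math.sqrt(len(rows))
    return [[scale * cmath.exp(2j * math.pi * r * c / n) for c in range(n)]
            for r in rows]

def coherence(matrix):
    """Largest |<A_i, A_j>| over all pairs of distinct columns."""
    m, n = len(matrix), len(matrix[0])
    worst = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(matrix[r][i].conjugate() * matrix[r][j] for r in range(m))
            worst = max(worst, abs(inner))
    return worst

# Toy example: keep 3 of the 7 DFT rows.
A = dft_rows([1, 2, 4], 7)
mu = coherence(A)  # equals sqrt(2)/3 for this particular row set
```

Here the matrix is $\epsilon$-incoherent for every $\epsilon \ge \sqrt{2}/3$; the constructions in this paper aim for many more columns with coherence about $1/k$ using roughly $k^2$ rows.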
Lemma 2.2 ([NNW14]).
There exist an absolute constant such that for any -incoherent matrix , there exists a -scheme which uses as the measurement matrix and whose recovery algorithm runs in polynomial time.
2.1.2 The Restricted Isometry Property and its connection with incoherence
Another highly relevant condition is the renowned restricted isometry property, introduced by Candès et al. in [CRT06]. We show how incoherent matrices are connected to it.
Definition 2.3 (Restricted Isometry Property).
A matrix is said to satisfy the Restricted Isometry Property (RIP), if for all with , it holds that .
Candès et al. proved in their breakthrough paper [CRT06] that any RIP matrix can be used for sparse recovery with the error guarantee. The following formulation comes from [FR13, Theorem 6.12].
Lemma 2.4.
Given a -RIP matrix with , we can design a -scheme that uses as the measurement matrix and has a recovery algorithm that runs in polynomial time.
Although randomly subsampling the DFT matrix gives an RIP matrix with rows [HR16], no algorithm for finding these rows in polynomial time is known; actually, even for rows the problem remains wide open. [Footnote 3: In fact, one of the results of our paper gives the state-of-the-art result even for this problem, with rows, see Theorem 2.10.] It is a very important and challenging problem whether one can have an explicit construction of RIP matrices from Fourier measurements that breaks the quadratic barrier on $k$.
We state the following two folklore results, connecting the two different guarantees, and their associated combinatorial objects. This indicates the importance of incoherent matrices for the field of compressed sensing.
Proposition 2.5 (folklore).
An scheme with a measurement matrix of rows and recovery time induces an scheme of a measurement matrix of rows and recovery time , where is the output of the scheme.
Proposition 2.6 (folklore).
A -incoherent matrix is also a -RIP matrix.
2.2 Our results
2.2.1 Sparse Fourier Transform Algorithms
Theorem 2.7 (Deterministic SFT with super-linear time, Section 5).
Let be a power of . There exist a set with and an absolute constant such that the following holds. For any vector with , one can find an -sparse vector such that
[TABLE]
in time by accessing only. Moreover, the set can be found in time.
Theorem 2.8 (Deterministic SFT with sublinear time, Section 6).
Let be a power of . There exist a set with and an absolute constant such that the following holds. For any vector with , one can find an -sparse vector such that
[TABLE]
in time by accessing only. Moreover, the set can be found in time.
Remark 2.9.
The condition upper bounds the “signal-to-noise ratio”, a common measure in engineering that compares the level of a desired signal to the level of the background noise. This is a common assumption in most algorithms in the Sparse Fourier Transform literature, see, e.g. [HIKP12a, IK14, Kap16, CKSZ17, Kap17], where the -norm variant was assumed.
2.2.2 From DFT to incoherent matrices
This section contains deterministic constructions of incoherent matrices.
An Explicit Construction: Derandomization in time.
Theorem 2.10 (Incoherent matrices by derandomized subsampling of DFT, Section 7).
There exists a set with of cardinality such that the matrix is -incoherent. Moreover, can be found in time.
The above theorem immediately yields a different algorithm for the Sparse Fourier Transform with samples, via the reduction in [NNW14].
Strongly explicit constructions: Derandomization in sub-linear time
Theorem 2.11 (Incoherent matrices from DFT via low-degree polynomials, Section 8).
Let be a constant small enough, be a prime and be an integer. There exists a strongly explicit construction of an -incoherent matrix such that the rows of are rows of the DFT matrix (a row may appear more than once). The hidden constant in the -notation depends on and . Finding the indices of the rows takes time.
To get an idea of the above result, one could, for example, set and observe that the result translates to the following: for every one can get a -incoherent matrix with rows. One needs the condition on (or equivalently the condition on ) to bound the term . The larger the degree , the looser this condition, but also the worse the dependence of on . For example, when , we can expand the regime of to approximately , but obtain approximately .
The following is a different construction, incomparable with Theorem 2.11 in multiple ways. First, the construction runs in sublinear time in but it is not strongly explicit. Second, it gives different trade-offs between the sparsity parameter and the number of rows. Last but not least, the construction depends on the factorization of .
Theorem 2.12 (Incoherent matrices from DFT via multiplicative subgroups, Section 8).
Let be a prime number. For every divisor of such that we can find in time a matrix with rows being the rows of the DFT matrix such that is -incoherent.
This result could give (depending on the factorization of ) a better polynomial dependence of on in the high-sparsity regime. If has a large divisor about , this would yield a matrix with sparsity parameter and rows. For example, when , we obtain and , which cannot be obtained from Theorem 2.11. In general, Theorem 2.12 will yield useful matrices as long as has divisors in the range , ideally as many as possible. An extreme case is Fermat primes, which have divisors in the aforesaid interval.
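The idea of sampling DFT rows along a multiplicative subgroup can be sketched as follows. This is only an illustration under stated assumptions: the rows kept are the elements of the order-$d$ subgroup of $(\mathbb{Z}/p\mathbb{Z})^*$, columns are normalized by $1/\sqrt{d}$, and the coherence is compared against the classical Gauss-sum bound $\sqrt{p}/d$ rather than the (sharper) bounds the theorem relies on. The function names and the choice $p = 31$ are hypothetical.

```python
import cmath
import math

def subgroup(p, d):
    """Order-d subgroup of the multiplicative group mod a prime p (d divides p-1),
    obtained as the image of x -> x^((p-1)/d)."""
    assert (p - 1) % d == 0
    e = (p - 1) // d
    return sorted({pow(x, e, p) for x in range(1, p)})

def subgroup_coherence(p, d):
    """Coherence of the d x p matrix made of the DFT rows indexed by the subgroup,
    with columns scaled to unit norm. The inner product of columns i and j depends
    only on a = j - i mod p, so it suffices to scan a = 1, ..., p-1."""
    rows = subgroup(p, d)
    worst = 0.0
    for a in range(1, p):
        s = sum(cmath.exp(2j * math.pi * a * h / p) for h in rows)
        worst = max(worst, abs(s))
    return worst / d

rows_5 = subgroup(31, 5)            # the powers of 2 modulo 31
mu_5 = subgroup_coherence(31, 5)
mu_15 = subgroup_coherence(31, 15)  # quadratic residues modulo 31
```

For the quadratic residues modulo 31 the exponential sums are Gauss sums of modulus $\sqrt{8}$, so `mu_15` is $\sqrt{8}/15$; smaller subgroups give fewer rows at the price of weaker cancellation.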
The reader might ask the question if the polynomial dependence of on is necessary; ideally one would like a logarithmic dependence, since the polynomial dependence is interesting only in the high-sparsity regime. Regarding strongly explicit constructions, we provide some evidence why this might be a very hard problem in the remark below.
Remark 2.13.
The inferiority of our bounds in the low-sparsity regime is justifiable to some extent: it is because of a common obstacle that has persisted more than a century in the theory of exponential sums, due to the lack of techniques to account for sparse character sums (either additive or multiplicative). In general, the fewer summands the sum has, the harder it is to prove a tight cancellation bound. Thus, owing to the use of heavy machinery from analytic number theory and more specifically the theory of exponential sums over finite fields, our bounds for strongly explicit constructions are quite suboptimal.
2.3 Comparing $\ell_\infty/\ell_1$ with $\ell_2/\ell_1$
In this subsection we elaborate why $\ell_\infty/\ell_1$ is much stronger than $\ell_2/\ell_1$, and not just a guarantee that implies $\ell_2/\ell_1$. Let be a constant and consider the following scenario. There are three sets of size , respectively, and for every we have , while every coordinate in and has equal magnitude. It follows immediately that
[TABLE]
Now assume that , then . We claim that the zero vector is a valid solution for the guarantee, since
[TABLE]
where the last inequality follows provided it further holds that . Hence when , we see that the zero vector satisfies the guarantee.
Since the zero vector is a possible output, we may not recover any of the coordinates in , which is the set of “interesting” coordinates. On the other hand, the $\ell_\infty/\ell_1$ guarantee does allow the recovery of every coordinate in . This is a difference of recovering all versus none of the coordinates. We conclude from the discussion above that in the case of too much noise, the $\ell_2/\ell_1$ guarantee becomes much weaker than the $\ell_\infty/\ell_1$, possibly giving meaningless results in some cases.
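A small numerical instance of this phenomenon can be sketched as follows. The sketch assumes the standard normalizations $\|\hat{x} - \hat{x}'\|_2 \le \|\hat{x}_{-k}\|_1/\sqrt{k}$ for $\ell_2/\ell_1$ and $\|\hat{x} - \hat{x}'\|_\infty \le \|\hat{x}_{-k}\|_1/k$ for $\ell_\infty/\ell_1$; the specific vector (two heavy coordinates plus many tiny ones, $k = 8$) is an illustrative choice and not the family used in the text.

```python
import math

def tail_l1(v, k):
    """l1 norm of v after removing its k largest-magnitude coordinates."""
    mags = sorted((abs(c) for c in v), reverse=True)
    return sum(mags[k:])

def satisfies_l2_l1(v, out, k):
    """Assumed l2/l1 test: ||v - out||_2 <= ||v_{-k}||_1 / sqrt(k)."""
    err = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(v, out)))
    return err <= tail_l1(v, k) / math.sqrt(k)

def satisfies_linf_l1(v, out, k):
    """Assumed l_inf/l1 test: ||v - out||_inf <= ||v_{-k}||_1 / k."""
    err = max(abs(a - b) for a, b in zip(v, out))
    return err <= tail_l1(v, k) / k

k = 8
# Two heavy coordinates sitting above the noise level, plus 10000 tiny ones.
spectrum = [0.1875, 0.1875] + [1e-4] * 10000
zero = [0.0] * len(spectrum)

l2_ok = satisfies_l2_l1(spectrum, zero, k)      # the zero vector is accepted
linf_ok = satisfies_linf_l1(spectrum, zero, k)  # the zero vector is rejected
```

Under these assumptions the all-zero output passes the $\ell_2/\ell_1$ test while failing the $\ell_\infty/\ell_1$ test, so only the latter forces both heavy coordinates to be found.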
3 Overview
Sparse Fourier Transform Algorithms (Subsection 2.2.1).
We first show how to achieve the for-all schemes, i.e., schemes that allow universal reconstruction of all vectors, and then derandomize them. Similarly to the previous works [HIKP12b, IK14, Kap17], our algorithm hashes, with the filter in [Kap17], the spectrum of to buckets using pseudorandom permutations, and repeats times with fresh randomness. The main part of the analysis is to show that for any vector and any set with , each , in a constant fraction of the repetitions, receives “low noise” from all other elements, under the pseudorandom permutations. This will boil down to a set of inequalities involving the filter and the pseudorandom permutations. We prove these inequalities with full randomness (Lemma 5.9), and then derandomize the pseudorandom permutations using the method of conditional expectations (Lemma 5.10). This will give us Theorem 2.7. To do so, we choose the pseudorandom permutations one at a time, repetition by repetition, and keep an (intricate) pessimistic estimator (Lemma 5.8), which we update accordingly. Our argument extends the arguments in [NNW14] and [PR08], and could be of independent interest.
To compare with [NNW14] we have the following observation. The construction in [NNW14] consists of matrices, joined vertically, each having rows and exactly one $1$ per column. This ensures a small incoherence of the concatenated matrix and gives the $\ell_\infty/\ell_1$ guarantee. In the Fourier case, the convolution with the filter functions behaves analogously: instead of having exactly one non-zero element, each column in the -th matrix has a contiguous segment of $1$s of size (where the center of that segment depends on the choice of the -th pseudorandom permutation) and polynomially decaying entries away from this segment. Moreover, the positions of the segments across the columns are not fully independent and are defined via the pseudorandom permutations in Definition 4.2. We show that even in this more restricted setting, derandomization is possible in polynomial time. Several details are omitted in the preceding high-level discussion and we suggest the reader look at the corresponding sections for the complete argument.
The sublinear-time algorithm (Theorem 2.8) is obtained by bootstrapping the derandomized scheme above with an identification procedure in each bucket, as most previous algorithms have done (e.g. [HIKP12a]). The major difference is that our identification procedure needs to be deterministic. We show an explicit set of samples that allow the implementation of the desired routine. To illustrate our idea, let us focus on the following $1$-sparse case: and for some , which we want to locate. Let
[TABLE]
and consider the samples .
Observe that (ignoring factors)
[TABLE]
we can find up to , just by estimating the phase of and Proposition 4.10. Thus we can estimate up to from the phase of . If , then there exists a such that , and so will be more than away from the phase of the measurement. Thus, by iterating over all , we keep the index for which is within from , for every that is a power of in .
Unfortunately, although this is a deterministic collection of samples, the above argument gives only time. For sublinear-time decoding we use to find a sector of the unit circle of length that contains . Then, from we find two sectors of length each, the union of which contains . Because these sectors are antipodal on the unit circle, the sector intersects exactly one of those, let the intersection be . The intersection is a sector of length at most . Proceeding iteratively, we halve the size of the sector at each step, till we find , and infer . Plugging this idea in the whole -sparse recovery scheme yields the desired result. Our argument crucially depends on the fact that in the norm the phase of will always dominate the phase of all samples we take.
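The sector-halving idea can be sketched for a single noisy tone as follows. This is a hedged illustration, not the paper's exact routine: it assumes $n$ is a power of two, a signal of the form $x_t = a\,\omega^{ft} + (\text{noise})$ with noise of magnitude at most $0.2|a|$, and the hypothetical sample positions $0, 1, 2, 4, \dots, n/2$. The names `locate_frequency` and `noisy_tone` are made up for the example.

```python
import cmath
import math

def locate_frequency(sample, n):
    """Recover f from samples of x_t ~ a * exp(2*pi*i*f*t/n), n a power of two.
    The phase of x_1/x_0 gives a coarse sector for theta = 2*pi*f/n. The phase of
    x_step/x_0 pins down step*theta modulo 2*pi, i.e. theta up to `step` equally
    spaced candidates; keeping the candidate nearest to the current estimate
    halves the uncertainty each time step is doubled."""
    x0 = sample(0)
    theta = cmath.phase(sample(1) / x0)
    step = 2
    while step < n:
        phi = cmath.phase(sample(step) / x0)               # ~ step*theta mod 2*pi
        j = round((step * theta - phi) / (2 * math.pi))    # nearest candidate
        theta = (phi + 2 * math.pi * j) / step
        step *= 2
    return round(theta * n / (2 * math.pi)) % n

def noisy_tone(f, n, noise=0.2):
    """Unit-amplitude tone at frequency f plus a deterministic perturbation
    of magnitude `noise` (an arbitrary fixed pattern, for illustration only)."""
    return lambda t: cmath.exp(2j * math.pi * f * t / n) + noise * cmath.exp(1.7j * t)

n = 1024
recovered = [locate_frequency(noisy_tone(f, n), n) for f in (0, 1, 5, 511, 700, 1023)]
```

The loop reads $\log_2 n + 1$ samples and does constant work per sample. Correctness relies on every phase estimate being accurate to within $\pi/3$, which holds here because each sample is dominated by the tone.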
Incoherent Matrices from the Fourier ensemble (Subsection 2.2.2).
Our first result for incoherent matrices (Theorem 2.10) is more general and works for any matrix that has orthonormal columns with entries bounded by . We subsample the matrix, invoke a Chernoff bound and Bernstein’s inequality to show the small incoherence of the subsampled matrix. We follow a derandomization procedure which essentially mimics the proof of Bernstein’s inequality, keeping a pessimistic estimator which corresponds to the sum of the generating functions of the probabilities of all events we want to hold, evaluated at specific points. We obtain an explicit construction, i.e. a derandomization in time. This argument could be of independent interest for its generality. As there are many technical obstacles to overcome, we suggest the reader take a careful look at the proof to gain a clearer picture of the argument.
Our next results (Theorem 2.11 and Theorem 2.12) construct strongly explicit incoherent matrices by making use of technology from the fruitful theory of exponential sums in analytic number theory and additive combinatorics. Roughly speaking, to bound a complex exponential sum over a set , one would expect that specific choices of the set lead to non-trivial bounds, i.e. , since cancellation takes place in the summation. Ideally, one would desire that the exponentials behave like a random walk and give the optimal cancellation of . This intuition is clearly not true, but the results by Weyl and others show that certain sets can exhibit a nicer behaviour. We exploit their results to build incoherent matrices by taking the rows of the DFT matrix indexed by the “nice” sets. This connection also yields an immediate improvement on the lower bound of an exponential sum obtained by Winterhof [Win01].
4 Technical Toolkit
4.1 Hash Functions
Definition 4.1 (Frequency domain hashings).
Given , we define a function to be for all . Define a hash function as and the off-set functions as . When it is clear from context, we will omit the subscripts from the above functions.
In what follows, we might use the notation to denote a tuple of values along with the associated hash function from Definition 4.1. Below we define a pseudorandom permutation in the frequency domain.
Definition 4.2.
Suppose that exists. For , we define the pseudorandom permutation by .
Proposition 4.3 ([HIKP12a, Claim 2.2]).
.
Definition 4.4 (Sequence of Hashings).
A sequence of hashings is specified by tuples . For a fixed , we will also set to be the functions defined in Definition 4.1, and to be the pseudorandom permutation defined in Definition 4.2, by setting .
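As a rough illustration of how such hashings operate, the sketch below uses the forms common in the Sparse Fourier Transform literature (e.g. [HIKP12a]): a permutation $f \mapsto \sigma(f - b) \bmod n$ with $\sigma$ invertible modulo $n$, a bucket index obtained by rounding the permuted position to a multiple of $n/B$, and an offset measuring the distance to the bucket center. These exact formulas, the function names, and the parameters are assumptions made for the example, since the definitions above may differ in details.

```python
def permute(f, sigma, b, n):
    """Assumed permutation of the frequency domain: f -> sigma * (f - b) mod n."""
    return (sigma * (f - b)) % n

def bucket(f, sigma, b, n, B):
    """Assumed hash into B buckets: round(permute(f) * B / n) mod B,
    computed in exact integer arithmetic."""
    return ((permute(f, sigma, b, n) * B + n // 2) // n) % B

def offset(f, sigma, b, n, B):
    """Signed distance from the permuted position to the center of its bucket."""
    d = (permute(f, sigma, b, n) - bucket(f, sigma, b, n, B) * (n // B)) % n
    return d - n if d >= n // 2 else d

n, B, sigma, b = 64, 8, 13, 5  # sigma odd, hence invertible modulo the power of two n
positions = sorted(permute(f, sigma, b, n) for f in range(n))
loads = [0] * B
for f in range(n):
    loads[bucket(f, sigma, b, n, B)] += 1
offsets = [offset(f, sigma, b, n, B) for f in range(n)]
```

With these choices every bucket receives exactly $n/B$ frequencies and every offset lies within half a bucket width of zero; which frequencies collide depends on $(\sigma, b)$, which is what the derandomization has to control.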
4.2 Filter Functions
Definition 4.5 (Flat filter with buckets and sharpness [Kap17]).
A sequence symmetric about zero with Fourier transform is called a flat filter with buckets and sharpness if
(1) for all ;
(2) for all such that ;
(3) for all such that .
Lemma 4.6 (Compactly supported flat filter with buckets and sharpness [Kap17]).
Fix the integers with a power of two, integers , and an even integer. There exists an -flat filter , whose inverse Fourier transform is supported on a length- window centered at zero in time domain.
Lemma 4.7 ([HIKP12b, Lemma 3.6], [HIKP12a, Lemma 2.4], [IK14, Lemma 3.2]).
Let . Let be a uniformly random odd number between and . Then for all we have .
4.3 Formulas for Estimation
Definition 4.8 (Measurement).
For a signal , a hashing , integers and , a measurement vector is the -dimensional complex-valued vector such that
[TABLE]
for . Here is a filter with buckets and sharpness constructed in Definition 4.5.
The following lemma provides a HashToBins procedure, which computes the bucket values of the residual , where is also provided as input.
Lemma 4.9 (HashToBins [Kap17, Lemma 2.8]).
Let and parameters such that is a power of , and is an even integer. There exists a deterministic procedure HashToBins which computes such that for any ,
[TABLE]
where is the filter defined in Definition 4.5, and is a negligible error term satisfying for an arbitrarily large absolute constant. It takes samples, and time.
We shall ignore the term in the proof of correctness of our algorithms, since it will be negligible and won’t affect the analysis. For a hashing , values , and the associated measurement , one has
[TABLE]
The following is a basic fact of complex numbers, which will be crucially used in our sublinear-time algorithm, for estimating the phase of a heavy coordinate.
Proposition 4.10.
Let with , then .
Proof.
The worst case occurs when is orthogonal to , and thus . ∎
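The geometric fact behind Proposition 4.10 can be illustrated numerically. The sketch below checks the standard bound that, for complex $x, y$ with $|y| < |x|$, the phase of $x + y$ differs from the phase of $x$ by at most $\arcsin(|y|/|x|)$, with the extreme case attained when $y$ is orthogonal to $x + y$. The precise hypothesis and constant of the proposition are not reproduced here; the function name is illustrative.

```python
import cmath
import math


def max_phase_shift(x, r, steps=3600):
    """Largest |arg((x + y) / x)| over y on the circle |y| = r, by brute force."""
    worst = 0.0
    for k in range(steps):
        y = cmath.rect(r, 2 * math.pi * k / steps)
        worst = max(worst, abs(cmath.phase((x + y) / x)))
    return worst
```

With $|y| = |x|/2$ the maximal shift is $\arcsin(1/2) = \pi/6$, which the brute-force search matches up to the grid resolution.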
5 Linear-Time Algorithm
Our first step is to obtain a condition that allows us to approximate every coordinate of . This condition corresponds to a set of inequalities. In this section we shall consider a sequence of hashings and for notational simplicity we shall abbreviate as .
We first present a lemma, which states that each can be finely estimated in most hashing repetitions.
Lemma 5.1.
Fix and . Let a sequence of hashings and . If for all with it holds that
[TABLE]
then for every vector and every , for at least indices we have that
[TABLE]
Proof.
We have that
[TABLE]
Hence there can be at most indices for which the estimate is more than , otherwise the leftmost-hand side would be at least . ∎
The lemma above implies that for every we can find an estimate of up to in time , by taking the median of all values for . The existence of pseudorandom permutations such that the conditions of Lemma 5.1 hold, namely inequalities (3), is proved in Lemma 5.9; see the next subsections for notation and definitions.
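The median step can be sketched as follows, under the assumption (common in the sparse Fourier transform literature, and not confirmed by the text above) that the median is taken separately over real and imaginary parts. If strictly more than half of the per-repetition estimates are within $\epsilon$ of the true value, then each coordinate-wise median is within $\epsilon$, so the combined estimate is within $\sqrt{2}\,\epsilon$. The function name is hypothetical.

```python
import statistics


def median_estimate(estimates):
    """Coordinate-wise median of a list of complex estimates."""
    re = statistics.median(z.real for z in estimates)
    im = statistics.median(z.imag for z in estimates)
    return complex(re, im)
```

Two wildly wrong repetitions out of five do not affect the result, because the three accurate estimates determine both medians.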
5.1 Proof of correctness assuming Inequalities (3) hold
We prove the first part of Theorem 2.7 (existence of ) assuming that the inequalities (3) hold, and thus the conditions of Lemma 5.1 hold.
For notational simplicity, let so the filter satisfies that for all and for all . In the rest of the section, we choose rounded to the closest power of from above; is some constant to be determined.
As in previous Fourier sparse recovery papers [HIKP12a, IK14, Kap16, Kap17], we assume that we have the knowledge of (or a constant factor upper bound) and that the signal-to-noise ratio . Our estimation algorithm is similar to that in [IK14]. The main algorithm is Algorithm 1. It recovers the heavy coordinates of in increasing magnitude by repeatedly calling the subroutine Algorithm 2, which recovers the heavy coordinates of the residual spectrum above a certain threshold.
The following lemmata are analogous to Lemmata 6.1 and 6.2 in [IK14], and their proofs are postponed to Section A. The first lemma states that Algorithm 2 will recover all the coordinates in the residual spectrum that are at least and it will not mistake a small coordinate for a large one.
Lemma 5.2 (guarantee of SubRecovery, Section A).
Consider the call (Algorithm 2). Let . When , the output of Algorithm 2 satisfies
- (i) for all ;
- (ii) for all ;
- (iii) contains all such that .
Next we turn to the analysis of Algorithm 1. Let and for some constant to be determined. By the SNR assumption of , we have that and thus . In Algorithm 1, the threshold in the -th step is
[TABLE]
where are constants to be determined. Let be the residual vector at the beginning of the -th step in the iteration. We can show that the coordinates we shall ever identify are all heavy (contained in ) and we always have good estimates of them.
Lemma 5.3 (norm reduction, Section A).
There exist such that it holds for all that
- (a) for all ;
- (b) for all ;
- (c) .
Now we are ready to show the first part of Theorem 2.7, which is one of our main results. We shall choose such that the conditions of Lemma 5.1 hold. The hashings can be chosen deterministically, which we shall prove in the rest of the section after this proof; this will complete the full proof.
Proof of Theorem 2.7.
The recovery guarantee follows immediately from Lemma 5.3, as
[TABLE]
This implies that . To obtain the error guarantee, that is, to achieve a right-hand side of , we can just replace with throughout our construction and analysis.
Number of Measurements.
Computing the measurements in SubRecovery requires measurements (Lemma 4.9). These measurements are reused throughout the iteration in the overall algorithm, hence there are measurements in total.
Running Time.
Each call to SubRecovery runs in time . By Lemma 5.3(a), we know that . The overall runtime is therefore .
∎
5.2 Choosing the hash functions
In this and the next subsection, we shall find such that (3) holds for all pairs . It will be crucial for the next section that we can choose freely; that means the inequalities depend solely on . Note that and thus , it suffices to find such that it holds for all that
[TABLE]
We shall show how to do so in polynomial time in .
Definition 5.4 (Bad Events).
Let and . Let denote the event .
Pessimistic Estimator
The derandomization proceeds as follows: find a pessimistic estimator for each with the first hash functions fixed by such that the following holds:
[TABLE]
Note that inequality (7) implies that there exist choices of the pseudorandom permutations such that the conditions of Lemma 5.1 hold. The algorithm will start with . At the -th step, it chooses to minimize
[TABLE]
By (8), this sum keeps decreasing as increases. At the end of step , all hash functions are fixed, and by (6) and (7), we have . Since is a deterministic event conditioned on all hash functions, the conditional probability is either 0 or 1. The inequality above implies that all conditional probabilities are 0, i.e., none of the bad events happens, as desired.
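The greedy scheme above follows the general method of pessimistic estimators. As a self-contained illustration of that method, and not of the estimator of Definition 5.5, the sketch below derandomizes a toy problem: given $m$ rows $a_j \in [-1,1]^n$, choose signs $\epsilon_i \in \{\pm 1\}$ such that $|\sum_i \epsilon_i a_{j,i}| < t = \sqrt{2n\ln(2m)}$ for every $j$. The estimator is the sum of moment-generating-function bounds $e^{-\lambda t}\sum_j 2\cosh(\lambda S_j)\prod_{i\ \text{unfixed}}\cosh(\lambda a_{j,i})$ with $\lambda = t/n$. It dominates the sum of conditional failure probabilities, is at most 1 initially (strictly below 1 whenever some entry has absolute value below 1), and its average over the two choices of the next sign equals its current value, so the minimizing choice never increases it. All names are hypothetical.

```python
import math


def derandomized_signs(rows):
    """Greedy sign selection driven by a cosh-based pessimistic estimator.

    rows: list of m lists, each with n reals in [-1, 1].
    Returns (signs, t) where t = sqrt(2 n ln(2m)).
    """
    m, n = len(rows), len(rows[0])
    t = math.sqrt(2 * n * math.log(2 * m))
    lam = t / n
    partial = [0.0] * m  # partial sums over coordinates already fixed
    # tail[j] = product of cosh(lam * a_ji) over coordinates not yet fixed
    tail = [math.prod(math.cosh(lam * a) for a in row) for row in rows]
    signs = []
    for i in range(n):
        for j in range(m):
            tail[j] /= math.cosh(lam * rows[j][i])  # coordinate i gets fixed now

        def estimator(sign):
            total = sum(
                2 * math.cosh(lam * (partial[j] + sign * rows[j][i])) * tail[j]
                for j in range(m)
            )
            return total * math.exp(-lam * t)

        best = min((1, -1), key=estimator)
        signs.append(best)
        for j in range(m):
            partial[j] += best * rows[j][i]
    return signs, t
```

Once all signs are fixed the estimator is still below 1, while a single violated row would already contribute at least 1; hence no row violates the bound.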
We first define our pessimistic estimator. In what follows, we shall be dealing with numbers that might have up to digits. Manipulating numbers of that length can be done in polynomial time. We will not bother with determining the exact exponent in the polynomial or optimizing it, which we leave to future work.
Definition 5.5 (Pessimistic Estimator).
Let to be determined. Define
[TABLE]
where
[TABLE]
This function can be evaluated in time for each pair and thus the algorithm runs in polynomial time in .
To complete the proof, we shall verify (6)–(8) in Subsection 5.4.
5.3 Distribution of Offset Function
This subsection prepares auxiliary lemmata which will be used to verify the derandomization inequalities. In this subsection we focus on the distribution of the offset for and appropriately random and .
Lemma 5.6.
Suppose that are powers of , is uniformly random on the odd integers in and is uniformly random in . For any fixed pair it holds that
- (i) When , is uniformly distributed on ;
- (ii) When is even, for all ;
- (iii) When is odd, for and for .
Proof.
First observe that
[TABLE]
For a fixed , let
[TABLE]
Note that as a function of is uniform on . Note also that
[TABLE]
which gives that is uniform on its support, which is .
Suppose that , where is an odd integer. It is clear that is uniform on its support , which consists of equidistant points. Since is always uniform (regardless of ), and the distribution of is the convolution of two distributions.
Suppose now that and .
When , it holds that , and thus is an integer multiple of the distance between two consecutive points in . In this case it is easy to see that is uniform on .
When is even, it must hold that and thus . The support of is
[TABLE]
which leaves a gap of width at least in the middle between two consecutive points in .
When is odd, it must hold that and thus . The support of therefore leaves a gap of width at least in the middle between two consecutive points in . It is easy to see that is uniform on its support. ∎
The next lemma, which bounds the moment generating function of , is a straightforward corollary of Lemma 5.6.
Lemma 5.7.
Let , and be as in Lemma 5.6. When , .
Proof.
When ,
[TABLE]
where the inequality follows from the fact that is at most on and at most elsewhere (recall Definition 4.5), and the equality from rearranging the terms.
When for even ,
[TABLE]
since the filter is at most outside of and the distribution is not supported on that interval by Lemma 5.6.
When for odd ,
[TABLE]
where the inequality follows again by combining Lemma 5.6(iii) and the bounds on from Definition 4.5, and the equality is just a rearrangement of terms. ∎
5.4 Putting the Pieces Together
We are now ready to verify (6)–(8).
Lemma 5.8 (Pessimistic Estimation).
It holds that
[TABLE]
Proof.
Let . Then
[TABLE]
where the last inequality follows from Lemma 5.7. ∎
Lemma 5.9 (Initial constraint).
It holds that
[TABLE]
Proof.
It follows from Lemma 5.7 that
[TABLE]
Recall that we choose and . It follows that
[TABLE]
Lemma 5.10 (Derandomization step).
It holds that
[TABLE]
Proof.
Let . The proposition is equivalent to
[TABLE]
This clearly holds by Lemma 5.7. ∎
6 Sublinear-Time Algorithm
In this section, we take the pseudorandom hashings to be as in Lemma 5.1 and assume that (3) holds.
The first lemma concerns -sparse recovery, because, as in earlier works, we shall create subsignals using hashing, most of which are -sparse.
Lemma 6.1.
Suppose that is a power of . Let . Then the following holds: Let and suppose that for some . Then one can recover the frequency from the samples in time.
Proof.
Define . Observe that
[TABLE]
It follows from Proposition 4.10 that . When , one has , and thus .
Hence,
[TABLE]
Note that is the union of disjoint intervals of length . We may view these intervals as arcs on the unit circle, each arc being of length , and the left endpoints of every two consecutive arcs having distance .
Define a series of intervals for recursively as
[TABLE]
It is easy to see, via an inductive argument, that for all , and . In the end, is an interval of length , which can contain only one , and thus we can recover .
Each can be computed in time from and thus the overall runtime is . ∎
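The interval-narrowing argument can be sketched in code. The following is a minimal illustration in the spirit of the proof, not a transcription of it. It assumes the signal model $x_t = a\,e^{2\pi\mathrm{i} f t/n} + \text{noise}$, samples at times $0, 1, 2, 4, \dots, n/2$, and noise small enough that each measured phase of $x_{2^r}/x_0$ is within $1/6$ of a full turn of its true value. The sample set, constants, and function names are assumptions; those of Lemma 6.1 may differ.

```python
import cmath
import math
import random


def recover_frequency(samples, n):
    """samples[0] = x_0 and samples[r + 1] = x_{2^r} for r = 0, ..., log2(n) - 1.

    Maintains an estimate of f/n (as a fraction of a turn) whose error
    halves with every additional sample; assumes n >= 2 is a power of two.
    """
    logn = n.bit_length() - 1
    x0 = samples[0]
    est = None
    for r in range(logn):
        # phi approximates the fractional part of 2^r * f / n
        phi = (cmath.phase(samples[r + 1] / x0) / (2 * math.pi)) % 1.0
        if est is None:
            est = phi
        else:
            # among the 2^r candidates (phi + k) / 2^r, keep the one
            # closest to the previous, coarser estimate
            k = round((2 ** r) * est - phi)
            est = (phi + k) / 2 ** r
    return round(est * n) % n


def noisy_samples(n, f, a, noise, seed=0):
    """Samples of a * exp(2*pi*i*f*t/n) at t = 0, 1, 2, 4, ..., n/2,
    each perturbed by an error of magnitude at most noise * |a|."""
    rng = random.Random(seed)
    times = [0] + [2 ** r for r in range(n.bit_length() - 1)]
    out = []
    for t in times:
        clean = a * cmath.exp(2j * cmath.pi * ((f * t) % n) / n)
        err = cmath.rect(noise * abs(a) * rng.random(), 2 * math.pi * rng.random())
        out.append(clean + err)
    return out
```

The procedure reads $\log_2 n + 1$ samples and performs a constant amount of work per sample, in line with the logarithmic runtime claimed in the lemma.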
Now we move to develop our sublinear-time algorithm. The following is an immediate corollary of Lemma 5.1.
Lemma 6.2.
For each , it holds for at least indices that
[TABLE]
Proof.
It follows from Lemma 5.1, Eq. (2) and the observation that . ∎
As before, we choose rounded to the closest power of ; is some constant to be determined. The following is a lemma for Algorithm 3, which gives the same guarantees as Lemma 5.2.
Lemma 6.3.
Suppose that is the input to Algorithm 3. Let . When , the output of Algorithm 3 satisfies
- (i) for all ;
- (ii) for all ;
- (iii) contains all such that .
Proof.
The proofs of (i) and (ii) are the same as in the proof of Lemma 5.2. Next we prove (iii). When , we have
[TABLE]
Hence for the signal defined via its Fourier coefficients as
[TABLE]
By Lemma 6.2, since , we see that with frequency satisfies the condition of Lemma 6.1 and thus it will be recovered in at least repetitions . The measurements are exactly with . The thresholding argument is the same as in the proof of Lemma 5.2. ∎
Observe that Lemma 5.3 continues to hold if we replace Algorithm 2 with Algorithm 3 and Lemma 5.2 with Lemma 6.3. Now we are ready to prove our main theorem, Theorem 2.8, on the sublinear-time algorithm.
Proof of Theorem 2.8.
The recovery guarantee follows identically as in the proof of Theorem 2.7.
The measurements are for in each of the repetitions, and calculating each requires measurements (Lemma 4.9). These measurements are reused throughout the iteration in the overall algorithm, hence there are measurements in total.
Each call to SubRecovery runs in time , where we use the fact that from Lemma 5.3(a). The overall runtime is therefore . ∎
7 Incoherent Matrices via Subsampling DFT Matrix
Consider an unitary matrix and assume that for all . Our goal in this section is to show how to sample deterministically rows of , obtaining a matrix , such that for all pairs . Once we have such , the rescaled matrix is a -incoherent matrix, that is, for all pairs .
Let be i.i.d. Bernoulli variables with for some . Let such that , then
[TABLE]
Let , then , where . We consider the real and the imaginary parts separately, since for a complex random variable ,
[TABLE]
Hence it suffices to consider the real variable problem as follows. Suppose that satisfy , and consider the centred sum . We wish to find deterministically such that .
Define the pessimistic estimator to be
[TABLE]
The moment generating function of is
[TABLE]
Pessimistic Estimation
Let , where have been fixed.
[TABLE]
Derandomization step
One can show first that
[TABLE]
which is equivalent to
[TABLE]
where
[TABLE]
It is now clear that the left-hand side of (10) is , and therefore (9) holds. This implies that
[TABLE]
Initial condition
This is a standard argument for Bernstein’s inequality. For notational convenience, let . Note that is increasing on . Using Taylor’s expansion, one can bound that (see [BLM13, p35])
[TABLE]
and (see [Tro15, p98])
[TABLE]
It then follows (see [Tro15, p98]) that
[TABLE]
provided that .
When , and , and the above probability is at most
[TABLE]
provided that is large enough.
Therefore at step , the algorithm minimizes by choosing , and at the end of step , all have been fixed and such that .
Now we return to the original incoherence problem in the complex case. We can define events, and , for every pair as
[TABLE]
For each pair of , using the preceding argument, we have pessimistic estimators by setting and by setting such that
- •
(pessimistic estimation)
[TABLE]
- •
(derandomization step)
[TABLE]
- •
(initial condition)
[TABLE]
Note that (11) implies
[TABLE]
In addition, we also need to control the number of ’s which take value ; we want this number to be . This can be achieved by combining another derandomization procedure on using one-sided Chernoff bounds. Define the event . Then for ,
[TABLE]
where
[TABLE]
is the moment generating function of . Define our pessimistic estimator to be
[TABLE]
then, similar to the proof in Section 5, we have
- •
(pessimistic estimation)
[TABLE]
- •
(derandomization step)
[TABLE]
- •
(initial condition) When is small enough and large enough,
[TABLE]
Overall, our standard derandomization procedure, which at step chooses that minimizes
[TABLE]
will find such that none of and and holds, which implies that for all and . That is, we have chosen rows of , obtaining a matrix of incoherence at most .
8 Incoherent Matrices and Analytic Number Theory
In this section we give new results via the connection between the incoherent matrices and the exponential sum of characters, a classical quantity of interest in analytic number theory. Such connection has been formerly exploited, for instance, by Xu [Xu11] and Bourgain et al. [BDF+11] for explicit constructions of RIP matrices. We utilize the connection bidirectionally: we shall give explicit constructions of incoherent matrices using exponential sums, and improve the lower bound of an exponential sum using a lower bound of incoherent matrices.
8.1 A simple construction via Gauss sums
We give a rather simple construction of an -incoherent matrix . It is expected that Gauss sums will behave nicely for incoherent matrices, since they have the optimal rate of cancellation: summing elements gives cancellation . Let be a prime number and let , i.e. the set of quadratic residues in , including 0. It is a standard fact that . We shall show that the rows of the DFT matrix indexed by the elements of give an incoherent matrix with an appropriate scaling. Let . Observe that
[TABLE]
where the last inequality follows from the triangle inequality and the standard property of Gauss sums (see, e.g., [IR90, p91]).
Now, let be defined as for and . For every pair with , we have that the inner product of the -th and the -th column of is exactly . Normalising gives the desired result.
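The construction is easy to check numerically. The sketch below builds the set $S$ of quadratic residues modulo a prime $p$ (including 0) and computes the largest inner product between two distinct columns of the row-subsampled, unnormalized DFT matrix, which depends only on the difference $d$ of the column indices. Since each nonzero residue is hit twice by $x \mapsto x^2$, one has $\sum_{s \in S} \omega^{ds} = (G_d + 1)/2$ with $G_d$ a quadratic Gauss sum of modulus $\sqrt{p}$, so the value is at most $(\sqrt{p}+1)/2$. The function names are illustrative.

```python
import cmath
import math


def quadratic_residue_rows(p):
    """Quadratic residues modulo p, including 0."""
    return sorted({(x * x) % p for x in range(p)})


def max_column_inner_product(p):
    """Returns (|S|, max over d != 0 of |sum_{s in S} exp(2*pi*i*s*d/p)|)."""
    rows = quadratic_residue_rows(p)
    worst = 0.0
    for d in range(1, p):  # d = difference of two distinct column indices
        s = sum(cmath.exp(2j * cmath.pi * ((r * d) % p) / p) for r in rows)
        worst = max(worst, abs(s))
    return len(rows), worst
```

After dividing by $|S| = (p+1)/2$, the incoherence is at most $(\sqrt{p}+1)/(p+1)$, that is, roughly $1/\sqrt{p}$.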
8.2 Proof of Theorem 2.11
In the previous subsection we obtained an incoherent matrix by picking the rows of DFT indexed by quadratic residues, i.e. quadratic polynomials. Motivated by this, we show that taking polynomials of a higher degree can give an improved result that works in a larger range of parameters. We shall need the following deep theorem of Weyl.
Theorem 8.1 ([Nat96, Theorem 4.3]).
Let be positive integers and an integer such that . If is a real polynomial of degree with leading coefficient such that , then for any we have
[TABLE]
where the hidden constant in the -notation depends on and .
We are now ready to prove Theorem 2.11.
Proof.
Pick any polynomial of degree such that every coefficient of is an integer multiple of . Pick also any consecutive points in ; we can just take 0 to . Take the rows of DFT indexed by evaluated on these consecutive points. We shall show that after appropriate normalization this corresponds to an incoherent matrix of the desired form. The inner product between two columns indexed by of the formed matrix is
[TABLE]
Observe that is a -degree polynomial where every coefficient is an integer multiple of . Applying Theorem 8.1 with , , , and noticing that , we see that the above sum is at most
[TABLE]
After rescaling the formed matrix by , the incoherence of the matrix is rescaled by and thus becomes
[TABLE]
yielding the desired result. ∎
8.3 Proof of Theorem 2.12
Proof.
Suppose that is a generator of the multiplicative cyclic group (we shall show how to find such later). For every that divides we shall take the rows of DFT indexed by the multiplicative subgroup that is generated by . Since is a generator of it must hold that . The incoherence bound follows by a classical fact that (see, e.g. [Kur07]) for any with ,
[TABLE]
Rescaling gives the desired incoherence bound.
Finding a generator of is a classic problem with a rich research history. We include a simple, standard algorithm below for completeness.
The first step is to factor in time. We can find all primes smaller than in time using the sieve of Eratosthenes. For each such prime we shall find the highest power which divides . Let be the number that is obtained after dividing by for all such . If , it must be a prime, otherwise for one of would be at most .
Now we are ready to find a generator . It is known that the smallest generator of is [Bur62, Theorem 3] and thus we shall iterate over the first elements of and check if every such element is a generator by checking whether in for all prime divisors of . To ensure that such a is a generator, observe first that the checking condition guarantees that is of order , and checking only prime suffices (since if is composite and this implies for all divisors of ); moreover, it is a basic fact in group theory that the order of any subgroup divides the order of the group and hence we need only look at divisors of . The runtime of this part is . ∎
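The generator search can be sketched as follows. This is a minimal illustration assuming $p$ is an odd prime; it factors $p-1$ by plain trial division rather than with a sieve, and it scans candidates $2, 3, \dots$ until one passes the test $g^{(p-1)/q} \not\equiv 1 \pmod p$ for every prime $q$ dividing $p-1$. The last helper computes the largest exponential sum over the subgroup of order $d$, which is always below $\sqrt{p}$ by the classical bound $|\sum_{h \in H} e^{2\pi\mathrm{i}ah/p}|^2 \le p - |H|$ for $a \not\equiv 0$. Function names are hypothetical.

```python
import cmath
import math


def prime_factors(m):
    """Distinct prime factors of m, by trial division up to sqrt(m)."""
    factors, q = [], 2
    while q * q <= m:
        if m % q == 0:
            factors.append(q)
            while m % q == 0:
                m //= q
        q += 1
    if m > 1:  # the remaining cofactor is prime
        factors.append(m)
    return factors


def find_generator(p):
    """Smallest generator of the multiplicative group modulo an odd prime p."""
    qs = prime_factors(p - 1)
    for g in range(2, p):
        if all(pow(g, (p - 1) // q, p) != 1 for q in qs):
            return g
    raise ValueError("no generator found; p must be an odd prime")


def subgroup(p, d):
    """The multiplicative subgroup of order d, for d dividing p - 1."""
    h = pow(find_generator(p), (p - 1) // d, p)
    return sorted({pow(h, k, p) for k in range(d)})


def max_subgroup_sum(p, d):
    """max over a != 0 of |sum_{h in H} exp(2*pi*i*a*h/p)|, H of order d."""
    H = subgroup(p, d)
    return max(
        abs(sum(cmath.exp(2j * cmath.pi * ((a * h) % p) / p) for h in H))
        for a in range(1, p)
    )
```

For example, the smallest generators of the multiplicative groups modulo 7, 23 and 41 are 3, 5 and 6 respectively.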
8.4 Strengthening the lower bound in [Win01]
The lower bound in [Win01] states that for any and with , any subset , there exists and an irreducible -degree polynomial with coefficients in , such that
[TABLE]
With the connection to incoherent matrices and the lower bound of Alon, we obtain a much stronger result. In fact we have for any and any polynomial with coefficients in that
[TABLE]
for some , provided that . In the case that we still have a lower bound of .
Note that the condition has been relaxed to , the assumption that has been removed, the conclusion “there exists an irreducible polynomial” has been replaced with the condition “for any polynomial”, and the right-hand side has been amplified by a multiplicative factor of for .
Our new lower bound follows immediately from Alon’s lower bound on incoherent matrices [Alo09]. Indeed, assume that there exists a polynomial such that for all the left-hand side of (12) is at most for some absolute constant . Consider the matrix with the rows of the DFT matrix indexed by numbers (some rows of the DFT matrix may appear more than once). Observe that after normalizing the matrix by , the incoherence is
[TABLE]
This would violate the lower bound in [Alo09], which states that an -incoherent matrix must satisfy for some absolute constant , since
[TABLE]
for small enough, when . In the case of we can still use the quadratic bound () on incoherent matrices to obtain a bound of .
9 Open Problems and Future Direction
A direction of research is to design deterministic schemes that break the quadratic barrier for signals with structured Fourier support. For example, subsampling the rows of the DFT matrix to obtain RIP matrices depends highly on the structure of the vectors we would like to preserve. The more additive structure the support of a -sparse vector has, the worse is the concentration of a random Fourier coefficient of . Equivalently, the less additive structure the support of has, the flatter its Fourier transform is, and hence, the better concentration bounds we obtain. The concentration in the extreme case, when the support of is “dissociated”, is captured by the renowned Rudin’s inequality in additive combinatorics (see, e.g. [TV06, Lemma 4.33]). We thus believe that it is an interesting direction to use machinery from the field of additive combinatorics and the relevant fields in order to obtain new constructions and algorithms, at least for interesting subclasses of structured signals.
10 Acknowledgements
We would like to thank anonymous reviewers for their valuable feedback.
Appendix A Reduction of the norm
Lemma 5.2.
Suppose that is the input to Algorithm 3. Let . When , the output of Algorithm 3 satisfies
- (i) for all ;
- (ii) for all ;
- (iii) contains all such that .
Proof.
By the recovery guarantee we know that
[TABLE]
By thresholding, it must hold for that and thus
[TABLE]
which proves (i). Thus
[TABLE]
which proves (ii). Next we prove (iii). When , we have
[TABLE]
Hence for the signal defined via its Fourier coefficients as
[TABLE]
By Lemma 6.2, since , we see that with index satisfies the condition of Lemma 6.1 and thus it will be recovered in at least indices . The measurements are exactly with . The recovered estimate is at least and thus the median estimate will pass the thresholding, and . ∎
Let and . By the SNR assumption of , we have that and thus . Let be the residual vector at the beginning of the -th step in the iteration. The threshold in the -th step is
[TABLE]
where are constants to be determined.
Lemma 5.3.
There exist such that it holds for all that
- (a)
* for all ;* 2. (b)
* for all .* 3. (c)
;
Proof.
We prove the three properties inductively. The base case is , where all properties clearly hold, noticing that .
Next we prove the inductive step from to . Note that
[TABLE]
When
[TABLE]
it holds that
[TABLE]
and thus Lemma 5.2 applies.
From Lemma 5.2(i), we know that when
[TABLE]
no coordinates in will be modified. This proves (a).
Lemma 5.2(ii) implies (b).
To prove (c), let . By Lemma 5.2(iii), all coordinates in will be recovered. Hence for ,
[TABLE]
provided that
[TABLE]
For , the definition of implies that . This proves (c).
We can take , , , , which satisfy all the constraints (13), (14) and (15). ∎
References
- [AGS03] Adi Akavia, Shafi Goldwasser, and Shmuel Safra. Proving hard-core predicates using list decoding. In FOCS, volume 44, pages 146–159, 2003.
- [Aka10] Adi Akavia. Deterministic sparse Fourier approximation via fooling arithmetic progressions. In COLT, pages 381–393, 2010.
- [Aka14] Adi Akavia. Deterministic sparse Fourier approximation via approximating arithmetic progressions. IEEE Transactions on Information Theory, 60(3):1733–1741, 2014.
- [AKM+18] Haim Avron, Michael Kapralov, Cameron Musco, Christopher Musco, Ameya Velingker, and Amir Zandieh. A universal sampling method for reconstructing signals with simple Fourier transforms. arXiv preprint arXiv:1812.08723, 2018.
- [Alo09] Noga Alon. Perturbed identity matrices have high rank: Proof and applications. Combinatorics, Probability and Computing, 18(1-2):3–15, 2009.
- [AM11] Arash Amini and Farokh Marvasti. Deterministic construction of binary, bipolar, and ternary compressed sensing matrices. IEEE Transactions on Information Theory, 57(4):2360–2370, 2011.
- [BCG+14] Petros Boufounos, Volkan Cevher, Anna C Gilbert, Yi Li, and Martin J Strauss. What’s the frequency, Kenneth?: Sublinear Fourier sampling off the grid. Algorithmica, pages 1–28. Springer, 2014. A preliminary version of this paper appeared in the Proceedings of RANDOM/APPROX 2012, LNCS 7408, pp. 61–72.
- [BDF+11] Jean Bourgain, Stephen J Dilworth, Kevin Ford, Sergei V Konyagin, and Denka Kutzarova. Breaking the $k^2$ barrier for explicit RIP matrices. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 637–644. ACM, 2011.
