Dimension-independent Sparse Fourier Transform
Michael Kapralov, Ameya Velingker, Amir Zandieh

TL;DR
This paper introduces a novel algorithm for computing the sparse Fourier transform in any dimension with runtime polynomial in sparsity and logarithmic in signal size, overcoming previous dimensionality limitations.
Contribution
It presents the first dimension-independent Sparse Fourier Transform algorithm with polynomial runtime in sparsity and logarithmic in size, using adaptive aliasing filters.
Findings
Achieves dimension-independent Sparse FFT in polynomial time.
Introduces adaptive aliasing filters for frequency isolation.
Provides efficient algorithms for average case models.
Abstract
The Discrete Fourier Transform (DFT) is a fundamental computational primitive, and the fastest known algorithm for computing the DFT is the FFT (Fast Fourier Transform) algorithm. One remarkable feature of FFT is the fact that its runtime depends only on the size of the input vector, but not on the dimensionality of the input domain: FFT runs in time irrespective of whether the DFT in question is on or for some , where . The state of the art for Sparse FFT, i.e. the problem of computing the DFT of a signal that has at most nonzeros in Fourier domain, is very different: all current techniques for sublinear time computation of Sparse FFT incur an exponential dependence on the dimension in the runtime. In this paper we give the first algorithm that computes the DFT of a -sparse signal in time …
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMathematical Approximation and Integration · Sparse and Compressive Sensing Techniques · Digital Filter Design and Implementation
Dimension-independent Sparse Fourier Transform
Michael Kapralov
EPFL
Ameya Velingker
Google Research
[email protected] This work was completed while the author was a research scientist in the School of Computer and Communication Sciences, EPFL.
Amir Zandieh
EPFL
Abstract
The Discrete Fourier Transform (DFT) is a fundamental computational primitive, and the fastest known algorithm for computing the DFT is the FFT (Fast Fourier Transform) algorithm. One remarkable feature of FFT is the fact that its runtime depends only on the size of the input vector, but not on the dimensionality of the input domain: FFT runs in time irrespective of whether the DFT in question is on or for some , where .
The state of the art for Sparse FFT, i.e. the problem of computing the DFT of a signal that has at most nonzeros in Fourier domain, is very different: all current techniques for sublinear time computation of Sparse FFT incur an exponential dependence on the dimension in the runtime. In this paper we give the first algorithm that computes the DFT of a -sparse signal in time in any dimension , avoiding the curse of dimensionality inherent in all previously known techniques. Our main tool is a new class of filters that we refer to as adaptive aliasing filters: these filters allow isolating frequencies of a -Fourier sparse signal using samples in time domain and runtime per frequency, in any dimension .
We also investigate natural average case models of the input signal: (1) worst case support in Fourier domain with randomized coefficients and (2) random locations in Fourier domain with worst case coefficients. Our techniques lead to an time algorithm for the former and an time algorithm for the latter.
1 Introduction
The Discrete Fourier Transform (DFT) is one of the most widely used computational primitives in modern computing, with numerous applications in data analysis, signal processing, and machine learning. The fastest algorithm for computing the DFT is the Fast Fourier Transform (FFT) algorithm of Cooley and Tukey, which has been recognized as one of the 10 most important algorithms of the 20th century [Cip00]. The FFT algorithm is very efficient: it computes the Discrete Fourier Transform of a length complex-valued signal in time . This applies to vectors in any dimension: FFT works in time irrespective of whether the DFT is on the line, on a grid, or is in fact the Hadamard transform on , with .
In any applications of the Discrete Fourier Transform, the input signal often satisfies sparsity or approximate sparsity constraints: the Fourier transform of has a small number of coefficients or is close to a signal with a small number of coefficients (e.g., this phenomenon is the motivation for compression schemes such as JPEG and MPEG). This has motivated a rich line of work on the Sparse FFT problem: given access to a signal in time domain that is sparse in Fourier domain, compute the nonzero coefficients in sublinear (i.e., ) time.
Very efficient algorithms for the Sparse FFT problem have been developed in the literature [GL89, KM91, Man92, GGI*+*02, AGS03, GMS05, Iwe10a, Aka10, HIKP12b, HIKP12a, LWC12, BCG*+*12, HAKI12, PR13, HKPV13, IKP14, IK14, Kap16, PS15, CKPS16, Kap17]. The state-of-the-art approach, due to [HIKP12a], yields an runtime algorithm for the following exact -sparse Fourier transform problem: given access to an input signal of length whose Fourier transform has at most nonzeros, output the nonzero coefficients and their values. This highly efficient algorithm comes with a caveat, however: the runtime of only holds for the Fourier transform on the line, namely, . The algorithm naturally extends to higher dimensions, namely, , where , but with an exponential loss in runtime; the runtime becomes as opposed to . Interestingly, the other extreme of , i.e., the Hadamard transform, has been known to admit an algorithm since the seminal work of Goldreich and Levin [GL89]. However, all intermediate values of exhibit a curse of dimensionality. This is in sharp contrast with FFT itself, which runs in time , where is the length of the input signal, in any dimension . The focus of our work is to design sublinear time algorithms for Sparse FFT that avoid this curse of dimensionality. Our main point of attention is the Sparse FFT problem:
[TABLE]
Our main result is the first sublinear algorithm for exact Sparse FFT (1), as stated in the following theorem.
Theorem 1** (Main result, informal version of Theorem 2.1 in Section 2.1).**
For any integer that is a power of two and any positive integer , there exists a deterministic algorithm that, given access to a signal with , recovers in time .
We note that this is the first sublinear time Sparse FFT algorithm that avoids an exponential dependence on the dimension . One should note that the runtime still depends on , since is lower bounded by , but this dependence is polynomial as opposed to exponential.
1.1 Significance of our results and related work
Significance of our results.
The state of the art in high dimensional Sparse Fourier Transforms presents an interesting conundrum: algorithms with runtime are known for (Discrete Fourier Transform on the line, see [HIKP12a]) and (the Hadamard transform, see [GL89]), but for all intermediate values of the runtime scales exponentially in . Given that FFT itself is dimension-insensitive, this strongly suggests that exciting new algorithmic techniques can be developed for the high-dimensional version of the problem. Our paper designs the first approach to high dimensional Sparse FFT that does not suffer from the curse of dimensionality, and naturally leads to several exciting open problems that we hope will spur further progress in this area.
In addition, we note that rather high-dimensional versions of the Fourier transform arise in applications (e.g., 2D, 3D and 4D-NMR in medical imaging), and designing practical Sparse FFT algorithms for this regime is an important problem. We hope that new techniques for dimension-independent Sparse FFT will lead to progress in this direction as well.
Sample complexity of high-dimensional Sparse FFT.
We note that, besides runtime, another very important parameter of a Sparse FFT algorithm is sample complexity, i.e., the number of samples that an algorithm needs to access in time domain in order to compute the top few coefficients of the Fourier transform. The sample complexity of Sparse FFT, unlike runtime, does not suffer from a curse of dimensionality. Indeed, there exist several algorithms with runtime that can recover the top coefficients of using only accesses in time domain, irrespective of the dimensionality of the problem. This can be achieved, for example, using either results on the restricted isometry property (RIP) [CT06, RV08, Bou14, CGV12, HR17], or using the filtering approach developed in the Sparse FFT literature, with decoding time. Thus, the challenge is to achieve sublinear runtime without an exponential dependence on the dimension.
We now outline existing approaches to Sparse FFT and explain why they fail to scale well in high dimensions:
State-of-the-art approaches to Sparse FFT and their lack of scalability in high dimensions.
The main idea behind many recently developed algorithms for the Sparse FFT problem is the “hashing” approach inherited from sparse recovery with arbitrary linear measurements. Given access to a signal , one designs linear measurements of that allow one to “hash” the nonzero positions of into a number of “buckets.” The number of buckets is chosen to be a constant factor larger than the sparsity to ensure that a large constant fraction of the nonzero positions of are isolated in their buckets. Every isolated element can be recovered and subtracted from for future iterations of the same hashing scheme, thereby ensuring convergence. The idea of hashing is implemented via filtering: one designs a filter such that approximates a “bucket,” i.e., is close to on an ball of side length in dimension . The content of the -th ‘bucket’, for , is then
[TABLE]
Since is essentially on the ball around the center of the ‘bucket’ and essentially zero outside, (2) gives the algorithm time domain access to the restriction of to the “bucket,” i.e., the essential support of , where is the location in time domain at which the signal is being accessed. A pseudorandom permutation of the frequency space ensures that such a bucket is likely to contain just a single element of the support, which enables the algorithm to recover at least a constant fraction of elements in a single round and perform iterative recovery. Furthermore, if the (essential) support of in time domain is small, one obtains an efficient algorithm.
The difficulty that arises in using (2) in high dimensions is the fact that it is not known how to ensure that is close to in an appropriately defined “bucket” while simultaneously ensuring that is small. For example, the filters constructed in [HIKP12a] ensure that is polynomially close to in Fourier domain, but this comes at the expense of being larger than (the ideal support size) by a factor of , and this effect is even more pronounced in higher dimensions, resulting in a loss in runtime. The other extreme would be to choose to be equal to on an ball with points around the origin, but in that case, its Fourier transform is the sinc function, which is only a constant factor approximation to the indicator of the corresponding box in Fourier domain (i.e., the ideal “bucket”). In dimension , the approximation degrades to for some constant , leading to exponential loss in runtime. Indeed, suppose that all elements of have roughly the same value. Then for a given element , the expected contribution of other elements to the noise in the “bucket” that is hashed to is , but the contribution of to its own bucket is (most of the time) only of its value, and, hence, only an exponentially small fraction of coefficients can be recovered in a given round of hashing. 111In addition, the discussion above assumes the presence of an approximate pairwise hashing lemma for high dimensions that does not lose an exponential factor in the dimension (it is known that such a lemma holds with at most about a factor of loss [IK14], but no dimension-independent version is available in the literature).
Related work.
In [CI17], the authors presented a deterministic Sparse Fourier transform algorithm for the Hadamard transform, i.e., , that runs in nearly linear time in the sparsity parameter , but it is not known how this extends to lower dimensions. In [Iwe10b, Iwe12] the author gives a time deterministic algorithm for the Sparse Fourier Transform, but the algorithm only applies to a related but distinctly easier problem. Specifically, the problem considers a continuous function on whose Fourier transform is bandlimited and sparse. The presented algorithm requires sampling the signal at arbitrary locations in . A natural approach is to emulate sampling off-grid (i.e., at arbitrary points in ) given discrete samples that we have access to, which is achieved in [MZIC17] giving an time deterministic algorithm for one dimensional sparse FFT. But this is a challenging task in multi-dimensional setting for several reasons. First, we are operating under the sparsity assumption alone, and no powerful general interpolation techniques that work under the sparsity assumption alone are available, to the best of our knowledge. Furthermore, even if the function were bandlimited, a natural approach to interpolation would involve some form of Taylor expansion or semi-equispaced Fourier Transform, however, both approaches incur a loss in dimension . Indeed, similar exponential dependence on the dimensionality of the problem manifests itself in Fast Multipole Methods [GR87b, BG97] and the Sparse FFT algorithms mentioned above. Finally, one should also note that whereas the problem of computing the Fourier transform on a grid with mutually prime with is equivalent to a one-dimensional Fourier transform on , the standard case of side lengths that are powers of two (for which we have the most efficient FFT algorithms) does not admit such a reduction. Furthermore, such a reduction appears to be quite challenging in high dimensions for reasons outlined above, and even more so for highly oscillatory functions that Sparse FFT algorithms need to handle.
2 Overview of our results and techniques
Prior works on Sparse FFT have primarily focused on efficiently implementing hashing-based ideas developed in the extensive literature on sparse recovery using general linear measurements (e.g., [GHI*+*13]), which meets with several difficulties. In particular, the presence of multiplicative subgroups in has been a hurdle in analyzing Sparse FFT algorithms: while aliasing filters have optimal performance from the point of view of the uncertainty principle, their applications have been limited due to the fact that frequencies that belong to the same subgroup get hashed together if such filters are used, making it impossible to reason about isolation of individual frequencies. At the same time, FFT itself owes much of its efficiency to the very same multiplicative subgroups of , and a natural question is whether one can design a Sparse FFT algorithm that operates on similar principles. This is precisely the approach that we take.
Adaptive aliasing filters.
The main technical innovation that allows us to avoid exponential dependence on the dimension and obtain Theorem 1 is a new family of filters for isolating a subset of frequencies in Fourier domain in a sparse signal using few samples in time domain. We refer to the family of filters as adaptive aliasing filters.
Definition 1** (-isolating filter, informal version of Definition 11, see Section 4).**
Suppose is a power of two integer and for a positive integer . Then, for any frequency , a filter is called -isolating if and for every .
We explain the intuition behind the construction of the filter in Section 2.1 below and provide the details later in Section 4.
The reason why an -isolating filter is useful lies in the fact that for every signal with we have, for all
[TABLE]
Thus, the filter enables access to the time domain representation of the restriction of to in time proportional to , at any point . Of course, this is only useful if the support of is small. The main technical lemma of our paper shows that for every support set , there exists an that can be isolated efficiently:
Lemma 1** (Informal version of Corollary 2 in Section 4).**
For every power of two , positive integer , and set , there exists an and an -isolating filter such that .
The proof of the lemma is given in Section 4.
Accessing the residual signal.
Lemma 1 suggests a natural approach to the estimation problem with Fourier measurements in high dimensions: iteratively construct an -isolating filter , estimate , remove from , and proceed. The hope is that we can essentially assume that we are given access to once we have estimated . In general, if we have been able to estimate the values of for all with some , then we would like to obtain access to
[TABLE]
Note that we would need for in the support of at the next iteration, and this support is generally a rather complicated set of size , from which we need to subtract the inverse Fourier transform of the signal estimated so far. This problem is the non-uniform Fourier transform problem, and no subquadratic methods for subtraction are known even in dimension when the set in time domain that we want to compute the inverse Fourier transform on is arbitrary. Even if the target set is an -box, the best known algorithms for this problem run in time , where is the precision parameter of the computation—this reduces to quadratic time even when and inverse polynomial in precision is desired. Thus, subtracting from time domain would result in at least cubic runtime in . Instead, we subtract the influence of the residual in frequency domain, which requires evaluations of (as we show, can be evaluated at a cost of just ). Note that it is crucial here that we peel off one coefficient at at time. Any improvements to this process, if they were to achieve runtime overall, would likely also imply improvements in the computation of approximate non-uniform Fourier transform: given a -sparse signal and a set with , output such that . However, it seems plausible that quadratic runtime in is essentially optimal for the non-uniform Fourier transform problem: specifically, that under natural complexity theoretic assumptions there exists no algorithm for the -approximate non-uniform Fourier transform problem with runtime when and for sufficiently large constant . We note that current techniques do not provide a subquadratic algorithm even for simple sets such as the box with points in dimension (due to the dependence mentioned above; a similar exponential dependence on the dimension is present in Fast Multipole Methods [GR87a, BG]). For an arbitrary set no subquadratic algorithm is known even when .
Putting it together: estimation with Fourier measurements
Combining the aforementioned ideas, we are able to develop a deterministic algorithm for the estimation problem with Fourier measurements in high dimensions:
[TABLE]
For the estimation problem (3) we obtain the following result.
Theorem 2** (Estimation guarantee, informal version of Theorem 5 in Section 5).**
Suppose is a power of two integer, is a positive integer, and . Then, for any signal with , the procedure Estimate (see Algorithm 2) recovers . Moreover, the sample complexity of this procedure is and its runtime is . Furthermore, the procedure Estimate is deterministic.
In the rest of this section, we give an overview of our techniques. Throughout the section, we present our results for the one-dimensional setting, as this makes notation simpler. All our results translate to the high-dimensional setting without any loss—see Section 4.2 for details.
2.1 Recovery via adaptive aliasing filters
Our main theorem is the following, which presents an algorithm for problem (1) for worst-case signals.
{restatable}
[Sparse FFT for worst-case signals]theoremsfftworstcase For any power of two integer and any positive integer and any signal with , the procedure SparseFFT in Algorithm 4 recovers . Moreover, the sample complexity of this procedure is and its runtime is .
The major difference between estimation and recovery (i.e., problem (3) vs. (1)) is the fact in the latter problem, the set of frequencies is unknown to us: the algorithm is only given access to and an upper bound on the sparsity of . Since our -separating filter is adaptive, i.e., depends on , this appears to present a challenge. However, we circumvent this challenge by constructing a sequence of successive approximations to the set . In dimension , these approximations amount to reducing modulo for all , and adaptively probing to learn which of the residue classes are nonzero. As before, our approach extends seamlessly to high dimensions by simply concatenating the coordinates into a single vector. Note that this is in sharp contrast to all previously known approaches, which are more efficient in low dimensions, but incur an exponential loss overall. We would like to note that at a high level one can view our filtering approach as a way to prune the FFT computation graph in a way that suffices for recovery of a -Fourier sparse vector.
We outline the main ideas in one-dimensional setting here to simplify the presentation (see Section 4.2 for the high-dimensional version of the argument). Let be the length of the signal and be the dimension for a power of two. We define to be a full binary tree of height and define a labelling scheme on the vertices as follows. {restatable}definitiondeftfull
Suppose is a power of two. Let be a full binary tree of height , where for every , the nodes at level (i.e., at distance from the root) are labeled with integers in . For a node , we let be its label. The label of the root is . The labelling of satisfies the condition that for every and every at level , the right and left children of have labels and , respectively. Note that the root of is at level 0, while the leaves are at level .
The tree captures the computation graph of FFT algorithm, where leaves correspond to frequencies in (given by the label), and for any , the nodes at level (i.e., at distance from the root) correspond to congruence classes of frequencies modulo , as specified by the labelling (see Figure 1(a)).
Note that the full FFT algorithm starts from the root of and computes the congruence classes of the Fourier transform of signal at each level of this tree iteratively. Because it can reuse the computations from each level for computing the next levels, the total runtime of FFT is .
In order to speed up the computation for sparse signals, we introduce the notion of a splitting tree, which is nothing but the subtree of that contains the nonzero locations of together with paths connecting them to the root. Given a set (the support of in Fourier domain), we define the splitting tree of the set as follows:
{restatable}
[Splitting tree]definitiondefsplit Let be a power of two. For every , the splitting tree of a set is a binary tree that is the subtree of that contains, for every , all nodes at level such that .
An illustration of such a tree is given in Figure 1(b). In order to recover the identities of the elements in , our algorithm performs an exploration of this tree. At every point, the algorithm constructs a filter that isolates frequencies in a given subtree and tests whether that subtree contains a nonzero signal. In order to make this work, we need a construction of filters that isolates the entire subtree as opposed to only one element, as Definition 1 does. Fortunately, the actual -isolating filters constructed in Lemma 1 satisfy precisely this property. The stronger isolation properties are captured by the following definition:
{restatable}
[Frequency cone of a leaf of ]definitiondeffreqcone For every power of two , subtree of , and vertex which is at level from the root, define the frequency cone of with respect to as
[TABLE]
Intuitively, the frequency cone of a node in captures all potential nonzeros of that belong to the subtree of in (see Figure 2). Our adaptive filter construction lets us obtain time domain access to the corresponding part of the frequency space:
{restatable}
[-isolating filter]definitiondefvtisol For every integer , subtree of , and leaf of , a filter is called -isolating if the following conditions hold:
- •
For all , we have .
- •
For every , we have .
Note that for all signals with and ,
[TABLE]
Iterative tree exploration process leading to an algorithm with runtime.
Now that we have defined the framework for our algorithm, we need to specify the order in which the algorithm will be accessing the leaves of the tree in order to minimize runtime. This is governed by the cost of constructing and using a -isolating filter for various nodes in . To quantify cost, we introduce the notion of a weight of a leaf in the tree. {restatable}[Weight of a leaf]definitiondefwt Suppose is a power of two. Let be a subtree of . Then for any leaf , we define its weight with respect to to be the number of ancestors of in tree with two children.
It turns out that the techniques from Lemma 1 also yield the following.
Lemma 2** (Informal version of Lemma 3 in Section 4).**
Suppose is a power of two. Let be a subtree of . Then for any leaf , there exists a -isolating filter with such that and can be evaluated at cost per point.
Before describing the algorithm we give an example illustrating filter support in time domain. Consider a complete binary tree of height . Suppose that is some vertex at level of this tree. Now we take the subtree rooted at and add an appendage of length to . The appendage is a path of nodes each of which has a single child. This doesn’t change the weight of any of the leafs of the original tree because every node on the appendage has exactly one child. One can see an example of such tree in Fig. 3. Suppose that the leaf is a leaf of the subtree rooted at , which is moved far from the root by the appendage. In order to isolate from the elements that are not in the subtree of we need a filter which is -periodic in time domain and in order to isolate from the rest of the elements in subtree of the filter needs to sample the signal at a fine grid of length . Note that the support of a -isolating filter is . In Fig. 3 we exhibit a -isolating filter which is constructed based on Lemma 3, where and correspond to this instance of splitting tree.
Given Lemma 2, our algorithm is natural. We find the vertex , which, by Kraft’s inequality, satisfies . We then define an auxiliary tree by appending a left and a right child to . Then for each of the children , we, in turn, construct a filter that isolates them from the rest of (i.e., from the frequency cones of other nodes in ) and check whether the corresponding restricted signals are nonzero. The latter is unfortunately a nontrivial task, since the sparsity of these signals can be as high as , and detecting whether a -sparse signal is nonzero requires samples. However, a fixed set of locations that satisfies the restricted isometry property (RIP) can be selected, and accessing the signal on those values suffices to test whether it is nonzero. The overall runtime becomes : the isolating filter has support at most , while the number of samples needed to test whether the two subtrees of are nonempty is , so peeling off elements takes time overall. This results in Theorem 2.1 (the algorithm is presented as Algorithm 4).
runtime under random phase assumption.
We note that the runtime can be easily reduced to if assumptions are made on the signal that ensure that its energy is evenly spread across time domain, making samples sufficient to detect whether the signal is zero or not. This occurs, for instance, if a signal’s frequencies satisfy distributional assumptions (e.g., the values have random phases). We present such a result in Section 7. It seems that even under this assumption on the values of the signal, since the support of the signal in Fourier domain is worst case, reducing the runtime below likely requires a major advance in techniques for non-uniform Fourier transform computation.
More formally, we introduce the notion of a worst-case signal with random phase as follows: {restatable}[Worst-case signal with random phase]definitionrandsign For any positive integer and power of two , we define to be a worst-case signal with random phase having values if
[TABLE]
independently for every . Furthermore, if of the values are nonzero, then is said to be a worst-case -sparse signal with random phase and is guaranteed to have sparsity .
Note that “worst-case” in the above definition signifies the fact that the support of the signal is arbitrary (having no distributional assumptions), subject to a potential sparsity constraint. We then present the following theorem: {restatable}[Sparse FFT for worst-case signals with random support]theoremsfftrandphase For any power of two integer , positive integer , and worst-case -sparse signal with random phase , the procedure SparseFFT-RandomPhase in Algorithm 5 recovers with probability . Moreover, the sample complexity of this procedure is and its runtime is .
Impossibility of reducing the number of iterations (rounds of adaptivity): signals with low Hamming weight support.
We note that our algorithm differs from all prior works in that it uses many rounds of adaptivity. Indeed, the samples that our algorithm takes are guided by values of the signal that have been read in previously queried locations, which is in contrast to most prior Sparse Fourier Transform algorithms. Two notable exceptions in recent literature include the adaptive block Sparse FFT algorithms of [CKSZ17] and [CKPS16].
Our algorithm uses rounds of adaptivity, peeling off one element at a time. It would be desirable to reduce the number of rounds of adaptivity by perhaps peeling off many elements in one batch as opposed to one at a time. For example, if the locations of the nonzeros of are uniformly random in , then the splitting tree of is likely to be rather balanced (see, e.g. Fig. 5 for an illustration), so perhaps one can find a filter that has small support and can be efficiently used to isolate many coefficients at once? Indeed, this intuition turns out to be correct for signals with uniformly random supports—we show in Section 2.2 below (with details presented in Section 8) that this idea yields a time algorithm. However, rather surprisingly, adversarial instances exist that force the peeling process to use rounds of adaptivity in the worst case, making our analysis essentially tight. We now present this adversarial instance.
{restatable}
[Hamming ball]definitionlowhamming For any power of two integer any integer , we define to be the closed Hamming ball of radius centered at 0:
[TABLE]
where is the Hamming weight of the binary representation of , i.e., is the number of ones in the binary representation of .
Note that .
{restatable}
[Class of signals with low Hamming support]definitionlowhammingsupport For any power of two integer and any integer , Let denote the class of signals in with support as in Definition 2.1,
[TABLE]
Note that for any we have that , so for any , the signals that are contained in class are \Theta\Big{(}\binom{\log_{2}n}{c}\Big{)}-sparse.
{restatable}
[Low Hamming weight binary trees]definitionlowhammingweighttrees Suppose is a power of two integer. Then, we define a low Hamming weight binary tree inductively for :
is defined to be the unique tree of depth that has a single leaf node and satisfies the property that each non-leaf node has a single right child only. Thus, has nodes. 2. 2.
For any , is constructed as follows: Take and label the nodes in order from the root to the leaf as . Then, for each node , take a copy of and let its root be the left child of node . The resulting tree defines .
Note that all the leaves of are at level .
It is not hard to see that is in fact the splitting tree for the set and, hence, the number of its leaves is . An illustration of the tree for and is shown in Figure 4.
We prove the following theorem in Section 6 (see Theorem 6):
Theorem 3** (Informal version of Theorem 6).**
A peeling process with threshold (i.e. any threshold that allows isolation of an element at cost bounded by ) must take iterations to terminate.
To add to the result above, we note that the lower bound on the number of rounds of adaptivity is not the only cause for quadratic runtime in our algorithm. The other cause is the necessity to update the residual signal as more and more elements are recovered, i.e. perform non-uniform Fourier transform computations. Since no subquadratic approach to this problem are known in high dimensions, it seems plausible that a runtime algorithm for high-dimensional FFT would also shed light on the complexity of this intriguing problem.
2.2 Runtime for random supports through a batched peeling process
To complement our lower bound of rounds of adaptive pruning for worst-case signals using our adaptive aliasing filters, we show that if the support of the signal is uniformly random, adaptive aliasing filters can be used to achieve an algorithm with runtime. A beautiful runtime and optimal sample complexity algorithm for this model was given in [GHI*+*13]. The algorithm was stated for but readily extends to high dimensions. Unfortunately, it comes with a major restriction, namely, the sparsity must be . Our approach is different and extends to all .
We now introduce the notion of a Fourier sparse random support signal: {restatable}[Random support signal]definitiondefrandsupp For any positive integer , power of two , and arbitrary , we define to be a random support signal of Fourier sparsity (with values given by ) if is the signal defined by
[TABLE]
where the are independently chosen for the various .
In other words, we assume a Bernoulli model for , while the values at the frequencies that are chosen to be in the support are arbitrary.
Our algorithmic result for such signals is stated below. {restatable}[Sparse FFT algorithm for random support signals]theoremsfftalg Suppose is a positive integer and and are powers of two. For any signal such that is a random support signal of Fourier sparsity , the procedure SparseFFT (see Algorithm 8) returns with probability . Moreover, the runtime and sample complexity of this procedure are .
The algorithm is motivated by the idea of speeding up our algorithm for worst-case signals (Algorithm 4, also see Theorem 2.1) by reducing the number of iterations of the process from down to . Such a reduction (which we show to be impossible for worst-case signals in Section 6) requires the ability to peel off many elements of the residual in a single phase of the algorithm, which turns out to be possible if the support of is chosen uniformly at random as in Definition 2.2. Indeed, if one considers the splitting tree of a signal with uniformly random support (see Fig. 5 for an illustration), one sees that
(a)
a large constant fraction of nodes satisfy ;
(b)
the adaptive aliasing filters constructed for such nodes will have significantly overlapping support in time domain.
We provide the intuition for this for the one-dimensional setting () to simplify notation (changes required in higher dimensions are minor). In this setting, property (b) above is simply a manifestation of the fact that since the support is uniformly random, any given congruence classes modulo for a large enough constant is likely to contain only a single element of the support of . Our adaptive aliasing filters provide a way to only partition frequency space along a carefully selected subset of bits in , but due to the randomness assumption, one can isolate most of the elements by simply partitioning using the bottom bits. This essentially corresponds to hashing into buckets at computational cost . While this scheme is efficient, it unfortunately only recovers a constant fraction of coefficients. One solution would be to hash into buckets (i.e., consider congruence classes modulo ), which would result in perfect hashing with good constant probability, allowing us to recover the entire signal in a single round. However, this hashing scheme would result in a runtime of and is, hence, not satisfactory. On the other hand, hashing into buckets is clearly wasteful, as most buckets would be empty. Our main algorithmic contribution is a way of “implicitly” hashing into buckets, i.e., getting access to the nonempty buckets, at an improved cost of .
Our algorithm uses an iterative approach, and the main underlying observation is very simple. Suppose that we are given the ability to “implicitly” hash into buckets for some , namely, get access to the nonempty buckets. If is at least , we know that there are no collisions with high probability and we are done. If not, then we show that, given access to nonempty buckets in the -hashing (i.e. a hashing into buckets), we can get access to the nonempty buckets of a -hashing for some appropriately chosen constant at a polylogarithmic cost in the size of each nonempty bucket of the -bucketing by essentially computing the Fourier transform of the signal restricted to nonempty buckets in the -bucketing. We then proceed iteratively in this manner, starting with , for which we can perform the hashing explicitly. Since the number of nonzero frequencies remaining in the residual after iterations of this process decays geometrically in , we can also afford to use a smaller number of buckets in the hashing that we construct explicitly, ensuring that the runtime is dominated by the first iteration.
Ultimately, the algorithm takes the following form. At every iteration, we explicitly compute a hashing into buckets explicitly. Then, using a list of nonempty buckets in a -hashing from the previous iteration, we extend this list to a list of nonempty buckets in a -bucketing at polylogarithmic cost per bucket (by solving a well-conditioned linear system, see Algorithm 6), where for some large enough constant . Meanwhile, we reduce by a factor of , thus maintaining the invariant at all times (note that this is satisfied at the start, when , and remains invariant at each iteration). Therefore, after a logarithmic number of iterations, we have effectively emulated hashing into but at a total cost of roughly one hashing computation into buckets (see Figure 6 for an illustration).
Organization.
In Section 3, we introduce basic definitions and notation that will be used throughout the paper. Section 4 introduces our main technical tool of adaptive aliasing filter, which are used in the various algorithms found in this paper. Section 5 shows how to use the adaptive aliasing filters to solve the problem of estimation for Fourier measurements for worst-case signals, i.e., problem (3), thereby proving Theorem 2. Section 6 then shows that the inherent tree pruning process used to subtract off recovered frequencies and access residual signals in the estimation algorithm is essentially optimal.
Section 7 proves our main theorem, Theorem 2.1, for problem (1) on worst-case signals. Additionally, it shows how to improve on the runtime under the assumption that the signal is a worst-case signal with random phase, thereby proving Theorem 2.1.
Finally, Section 8 discusses how to obtain an algorithm for problem (1) on random support signals and proves Theorem 2.2.
3 Preliminaries and notation
In this section, we introduce some notation and basic definitions that we will use in the paper.
For any positive integer , we use the notation to denote the set of integer numbers . We are interested in computing the Fourier transform of discrete signals of size in dimension , where for some . Such a signal will be a function . However, we will often identify with for convenience (and often use the two interchangably depending on the context). This correspondence is formally defined later in Definition 9. We first need the notion of an inner product.
Definition 2** (Inner product).**
Let and be two vectors in dimension . We denote the inner product of and by .
Let us define the Fourier transform of a signal.
Definition 3** (Fourier transform).**
For any positive integers and , the Fourier transform of a signal is denoted by , where for any , we define .
Note that in the case of , the Fourier transform reduces to the Hadamard transform of size .
Claim 1** (Parseval’s theorem).**
For any positive integers and , any signal satisfies .
Definition 4** (Unit impulse).**
For any positive integers and , the unit impulse function is defined as the function given by for and for .
Claim 2**.**
For any positive integers , , and any , the inverse Fourier transform of given by is .
Claim 3** (Convolution theorem).**
Suppose and are positive integers. Then, for any signals , , where is the convolution of and which itself is a signal in defined as, for all .
We will require the notion of a tensor product of signals. Given signals , the tensor product constructs a signal in that is defined as follows.
Definition 5** (Tensor multiplication).**
Suppose and are positive integers. Given functions , we define the tensor product as for all .
Note that the tensor product is essentially a generalization of the usual outer product on two vectors to vectors.
Claim 4** (Fourier transform of a tensor product).**
For any integers and and , let denote the tensor product . Then, the -dimensional Fourier transform of is the tensor product of , i.e., .
Definition 6**.**
For any positive , , and , a signal is called Fourier -sparse if .
Definition 7** (The Restricted isometry property).**
We say that a matrix satisfies the restricted isometry property (RIP) of order if for every -sparse vector , i.e., , it holds that .
We will use the following theorem from [HR17].
Theorem 4**.**
(The Restricted Isometry Property [HR17, Theorem 3.7])* For sufficiently large and , and a unitary matrix satisfying , the following holds. For some let be a matrix whose rows are chosen uniformly and independently from the rows of , multiplied by . Then, with probability , the matrix A satisfies the restricted isometry property of order , as per Definition 7.*
4 Adaptive aliasing filters
In this section, we introduce a new class of filters that forms the basis of our algorithm for estimation of worst case Fourier sparse signals. For simplicity, we begin by introducing the filters in the one-dimensional setting and then show how they naturally extend to the multidimensional setting (using tensoring). Throughout the section, we assume that the input is a signal with for some .
4.1 One-dimensional Fourier transform
We restate the following definition for and corresponding labels of vertices: \deftfull*
Next, we recall the definition of the splitting tree of a set. \defsplit*
The splitting tree can be constructed easily in time, given . We provide simple pseudocode in Algorithm 9.
For every node , the level of , denoted by , is the distance from to the root. The following basic claim will be useful and follows immediately from the definition of :
Claim 5**.**
For every integer power of two , if is a subtree of , then for every node , the labels of nodes that belong to the subtree of rooted at are congruent to modulo . Furthermore, every node at level or higher which satisfies belongs to .
\defwt
Definition 8** (-isolating filter).**
For every power of two , set , and , a filter is called -isolating if , and for all .
In particular, if is -isolating, then for every signal with , we have
[TABLE]
for all , by convolution theorem, see Claim 3.
While the definitions above suffice to state our estimation primitive, our Sparse FFT algorithm requires a filter that satisfies a more refined property due to the fact that throughout the execution of the algorithm, the identity of is only partially known. We encode this knowledge as a subtree of whose leaves are not necessarily at level . Hence, every leaf corresponds to a set of frequencies in the support of whose full identities have not been discovered yet. This is captured by the following definition:
\deffreqcone
Note that under this definition, the frequency cone of a vertex of corresponds to the subtree rooted at when is embedded inside (see Figure 2). \defvtisol* Note that in particular, for all signals with and ,
[TABLE]
Lemma 3** (Filter properties).**
For every power of two , subtree of , and leaf , the procedure FilterPreProcess outputs a static data structure in time such that, given , the following conditions hold:
The primitive FilterTime outputs a filter such that and is a -isolating filter. Moreover, the procedure runs in time . 2. 2.
For every , the primitive FilterFrequency computes the Fourier transform of at frequency , namely, , in time .
Before we prove Lemma 3, we establish the following corollary, assuming the statement of Lemma 3 holds.
Corollary 1**.**
Suppose is a power of two, , and . Then, let be the splitting tree of . If is the leaf of with label , while is the output of FilterPreProcess, and is the filter computed by FilterTime, then the following conditions hold:
(1)
* is an -isolating filter.*
(2)
.
Proof.
Indeed, given a subset and , if , then all the leaves of are at level and the set of labels of the leaves is exactly . Hence, for every leaf of , one has . By Lemma 3, is a -isolating filter. Therefore, by Definition 1,
[TABLE]
and for all . This implies (1), see definition of -isolating filters in 8. Property (2) follows directly from Lemma 3. ∎
Now, we prove Lemma 3.
Proof of Lemma 3: Let be a leaf of , denote the level of (i.e., distance from the root), denote the root of , and denote the path from root to in , where and .
We first show how to efficiently construct a -isolating filter in the Fourier domain, i.e., how to efficiently construct . Then we derive the time domain representation of . We iteratively define a sequence of functions (with Fourier transforms , respectively) by traversing the path from the root to in , after which we let be the final filter constructed on this path, i.e., (and ). We start with for all . Then, we iteratively define in terms of according to the following update rule for all :
[TABLE]
for every .
We now show that is a -isolating filter. It is enough to show that satisfies
[TABLE]
and
[TABLE]
We now prove (5). Consider a leaf of distinct from . Recall that denotes the root to path in . Let be the largest integer such that is a common ancestor of and .
By definition of tree (Definition 2.1) and because is at level , one has that the label of the right child of is , and the label of the left child is . Furthermore, using this together with Claim 5, we get that the labels of nodes in subtree of subtended at the right child of are congruent to modulo , and labels in the subtree rooted at the left child of are all congruent to modulo .
Suppose that belongs to the right subtree of , and belongs to the left subtree (the other case is symmetric). We thus get that , and . It now suffices to note that by construction of (see (4)), we have that for all ,
[TABLE]
By Claim 5, for all one has that and hence, because . Therefore, by substituting in the above, we get
[TABLE]
implying that and, hence, , as required.
It remains to prove (6). Consider any , and note that by Claim 5, . Using this in (4), we get
[TABLE]
since for every .
Next, note that the primitive FilterPreProcess() preprocesses the tree by traversing the path from root to leaf in time . Given , the primitive FilterFrequency implements (4) for successive values of , and the runtime of this algorithm is because of the for loop passing through vector .
Finally, it remains to show that the filter in time domain can be computed efficiently and has a small support. First note that by Claim 2, the inverse Fourier transform of is .
By the duality of convolution in the time domain and multiplication in Fourier domain (see Claim 3), we can equivalently define (see (4)) by letting and setting
[TABLE]
for every . Thus, is the time domain representation of the filter defined in (4). We now note that convolving any function with a function supported on two points, e.g., , at most doubles the support. Since the number of times the convolution is performed in obtaining from (as per (7)) is , the support size of is at most . Given , the primitive FilterTime implements the above algorithm for construction of and, therefore, runs in time . ∎
4.2 -dimensional Fourier transform
In this section, we show that our construction of adaptive aliasing filters from the previous section naturally extends to higher dimensions without any loss by tensoring.
Definition 9** (Flattening of to . Unflattening of to ).**
For every power of two , positive integer , and we define the flattening of as
[TABLE]
Similarly, for a subset we let denote the flattening of .
For , we define the unflattening of as , where
[TABLE]
for every . Similarly, for a subset , we let denote the unflattening of .
Definition 10** (Multidimensional splitting tree).**
Suppose is a positive integer and is a power of two. For every , the flattened splitting tree of is defined as where is flattening of .
The unflattened splitting tree of is denoted by and is obtained from the flattened splitting tree by unflattening the labels of all nodes .
Definition 11** (Multidimensional -isolating filter).**
Suppose is a power of two integer and for a positive integer . Then, for any frequency , a filter is called -isolating if and for every .
Definition 12** (Frequency cone of a leaf of in high dimensions).**
Suppose is a positive integer, is a power of two, and . For every unflattened subtree of and , we define the frequency cone of as
[TABLE]
where denotes the level of in (i.e., the distance from the root).
Claim 6**.**
For every positive integer , power of two , and every subtree of and every leaf of height , let T^{\prime}=T\cup\{\text{left child uv}\}\cup\{\text{right child wv}\}. Then the following holds,
[TABLE]
Definition 13** (Multidimensional -isolating filter).**
Suppose is a positive integer, is a power of two, and . For every subtree of and vertex , a filter is called -isolating if for all and for every one has .
In particular, for every signal with and for all ,
[TABLE]
Lemma 4** (Construction of a multidimensional isolating filter).**
Suppose is a power of two integer and is a positive integer. Let . For every subtree of and every leaf , there exists a -isolating filter such that . Such a filter can be constructed in time . Moreover, for any frequency , the Fourier transform of at frequency , i.e., , can be computed in time .
The proof of Lemma 4 appears in Appendix A. The key idea is to choose to be the smallest positive integer such that . One then defines successive filters by letting and
[TABLE]
for , where is an isolating filter corresponding to the projection of the leaves of tree into coordinate . The final filter turns out to be -isolating.
4.3 Putting it together
Claim 7**.**
For any binary tree let be the set of leaves of . There exists a leaf such that .
Proof.
Let be the tree obtained by “collapsing” , i.e., removing all nodes (and incident edges) of that have exactly one child. Then, observe that the leaves of are still preserved in , except that they are at possibly varying levels. In particular, a leaf in will be at level . Thus, by applying Kraft’s inequality to (which is an equality because every node in is either a leaf or has two children), we see that
[TABLE]
Therefore, there exists a such that and, therefore, , as desired. ∎
This gives us the main result of this section, and the main technical lemma of the paper:
Corollary 2**.**
For every integer a power of two and every positive integer , every , there exists an and an -isolating filter (as defined in Definition 11) such that .
Proof.
Follows by combining Lemma 4 with Claim 7. ∎
5 Estimation of sparse high-dimensional signals in quadratic time
In this section, we use the filters that we have constructed in Section 4 in order to show the first result of the paper, a deterministic algorithm for estimation of Fourier-sparse signals in time which is quadratic in the sparsity.
Theorem 5** (Estimation guarantee).**
Suppose is a power of two integer and is a positive integer and . Then, for any signal with , the procedure Estimate (see Algorithm 2) returns . Moreover, the sample complexity of this procedure is and its runtime is .
Proof.
The proof is by induction on the iteration number of the while loop in Algorithm 2. One can see that since at each iteration the tree looses one of its leaves, the algorithm terminates after iterations, since initially the number of leaves of is . Let denote the signal after iteration , and let denote the tree after iteration and let denote the set of frequencies corresponding to leaves of , i.e., . In particular, and is the unflattened spltting tree of and .
We claim that for each , we have
[TABLE]
Base case of induction:
We have and , which immediately implies (8) for .
Inductive step:
For the inductive hypothesis, let and assume that (8) holds for . The main loop of the algorithm finds . By Claim 7 along with inductive hypothesis, . Note that the main loop of the algorithm constructs a -isolating filter , along with . In order to do so, the algorithm constructs trees for all which in total takes time . Given ’s, the algorithm constructs filter and in time , by Lemma 4. Moreover, the filter has support size by Lemma 4.
By Lemma 4 computing the quantity in line 15 of Algorithm 2 can be done in time . By convolution theorem 3, the quantity satisfies , and thus
[TABLE]
where the last transition is due to the fact that is -isolating along with the inductive hypothesis of .
We thus get that . Moreover, it updates the tree . Also note that the set gets updated to accordingly. This establishes (8) for , thereby completing the inductive step.
The number of steps is exactly , as follows from the inductive claim. Thus, the total runtime is . ∎
6 A lower bound of rounds of tree pruning
One apparent disadvantage of our algorithm presented in the previous section is the fact that it only estimates elements of the Fourier spectrum one at a time, thereby taking rounds to estimate all elements in the spectrum. Since the isolation of one element takes up to time due to the support size of , the resulting bound on the runtime is quadratic in . A natural conjecture is that our analysis is not tight, and one can achieve better runtime by removing several nodes of weight at most at once. If one could argue that the filters that isolate the nodes removed in one round have nontrivial overlap, runtime improvements could be achieved. In this section we present a class of signals on which rounds of pruning the tree are required, showing that our analysis is essentially optimal.
Tree pruning process
Suppose is a power of two integer and is a positive integer. Let be a subtree of . The tree pruning process, , is an iterative algorithm that performs the following operations on successively until is empty:
Find \tilde{S}_{\tau}=\{\text{leaves v of }T:w_{T}(v)\leq\tau\}, i.e., set of vertices of weight no more than . 2. 2.
For each (in an arbitrary order) remove from together with the path from to its closest ancestor that has two children (i.e., run ; see Algorithm 9).
We show that for every and sufficiently large integer there exists a tree with leaves such that with requires rounds to terminate. This in particular shows that our runtime analysis from section 5 cannot be improved by reusing work done in a single iteration, and hence our analysis is essentially optimal. Our construction is one-dimensional, although higher dimensional extensions can be readily obtained.
Theorem 6**.**
For any integer constant , sufficiently large power of two integer there exists such that if , the following condition holds. There exists a subtree of with leaves such that the tree pruning process requires iterations to terminate.
The following simple lemma is crucial to our analysis
Lemma 5** (Monotonicity of tree pruning process).**
Suppose is a power of two integer a subtree of and a subtree of . Then for every integer the number of rounds that it takes to collapse is at most the number of rounds that it takes to collapse .
Proof.
For , let (respectively ) denote the tree obtained by performing rounds of the tree pruning process (with threshold ) to (respectively ). Note that and .
We claim that is a subtree of for all , which will obviously imply the desired conclusion. We use induction on . Note that the base of induction is trivial for . Now, we prove the inductive step. Suppose . By the inductive hypothesis, we have that is a subtree of . Thus, for any leaf that appears in both and , we have (this is because any node in along the path from the root to that has exactly one child will also have exactly one child in ). Hence, if is removed from in the -th iteration of the process, then it is also removed from during the -th iteration. Hence, is a subtree of , which completes the inductive step and, therefore, proves the claim. ∎
We recall a few definitions. \lowhamming* Note that .
\lowhammingsupport
Note that for any we have that , so for any , the signals that are contained in class are \Theta\Big{(}\binom{\log_{2}n}{c}\Big{)}-sparse.
\lowhammingweighttrees
It is not hard to see that is in fact the splitting tree for the set and, hence, the number of its leaves is .
Now, we are ready to prove Theorem 6.
Proof of Theorem 6: Let us choose the tree to be for some positive integer . We will set parameter at the end of the proof. Let denote the number of iterations required to collapse with threshold . We prove that
[TABLE]
for any power of two integer , any integer , and any positive integer . We use induction on .
Base: Note that for , the tree has one leaf, which gets removed in the first iteration of the tree pruning process. Thus, for any power of two and , and so, (9) holds for .
Inductive step: Suppose . For any , we label the nodes along the path from the root to the rightmost leaf (i.e., the path formed by starting at the root and repeatedly following the right child) in order as .
Note that if , then
[TABLE]
Thus, (9) does indeed hold for .
Now, suppose . Recall that a copy of is rooted at the left child of node of for all . We divide the pruning process on into two phases. The first phase consists of the process up until the point at which the left subtree of node in completely collapses for some , while the second phases consists of the process thereafter. Thus, the number of rounds in the first phase is just the number of rounds till the top left subtrees collapses.
Note that during the first phase, the behavior of the collapsing process on the left subtree of node corresponds to running a collapsing process with threshold on . Thus, the number of rounds in the first phase is,
[TABLE]
By the inductive hypothesis (on ), we have that for
[TABLE]
which implies that since we assumed .
Now, let be the tree obtained after performing rounds of the collapsing process on . Moreover, let be the tree obtained by further removing any left subtrees of nodes . By Lemma 5, we have that the number of rounds needed to collapse is at least the number of rounds needed to collapse . Moreover, observe that the number of rounds needed to collapse is precisely , thus, the number of rounds in the second phase is at least , and so,
[TABLE]
Note that a similar argument gives us
[TABLE]
for all (this condition ensures that , as required by our argument above). Hence, it follows that
[TABLE]
which establishes (9) for . This completes the inductive step.
Recall that k=\Theta\Big{(}\binom{\log_{2}n}{c}\Big{)}, so for any constant one has . Setting , we get
[TABLE]
as required.
∎
7 Sparse FFT for worst-case sparse signals and worst case signals with random phase
In this section we prove the main result of the paper, namely \sfftworstcase*
We also study Fourier sparse signals whose nonzero frequencies are distributed arbitrarily (worst-case) and whose values at the nonzero frequencies are independently chosen to have a uniformly random phase. Recall Definition 2.1:
\randsign
For this model we prove the stronger result: \sfftrandphase*
The main property that allows us to obtain the stronger result is the fact that a small number of time domain samples from such a signal suffice to approximate its energy with high confidence (whereas samples are required in general for a worst-case -sparse signal). This is reflected by the following
Lemma 6**.**
For any positive integer , power of two , and worst-case signal with random phase , we have
[TABLE]
where for some absolute constant and are i.i.d. random variables. The probability is over the randomness in choosing the various as well the randomness in the choice of phase for each frequency of .
For completeness we present a proof for this lemma in Appendix A.
7.1 Proofs of Theorems 2.1 and 2.1
Given the construction of our adaptive aliasing filter from the previous section, our sparse recovery algorithms follow by a reduction to the estimation problem. We find the vertex , which, by Kraft’s inequality, satisfies . We then define an auxiliary tree by appending a left and a right child to . Then for each of the children , we, in turn, construct a filter that isolates them from the rest of (i.e., from the frequency cones of other nodes in ) and check whether the corresponding restricted signals are nonzero. The latter is unfortunately a nontrivial task, since the sparsity of these signals can be as high as , and detecting whether a -sparse signal is nonzero requires samples. However, a fixed set of locations that satisfies the restricted isometry property (RIP) can be selected, and accessing the signal on those values suffices to test whether it is nonzero. If the signal is further assumed to be a worst case random phase signal, then a polylogarithmic number of samples suffices. The following lemma (Lemma 7) makes the latter claim formal. The algorithm is presented as Algorithm 4.
Lemma 7** (ZeroTest guarantee).**
Suppose is a positive integer and is a power of two. Assume is a subtree of . Suppose that signals satisfy . Suppose that is a multiset of sample from which satisfies the following for every leaf of :
[TABLE]
where is the signal obtained by restricting to frequencies and zeroing it out on all other frequencies.
Then the following conditions hold:
- •
ZeroTest* outputs true if ; otherwise, it outputs false.*
- •
The sample complexity of this procedure is , where is the weight of leaf in (see Definition 2.1).
- •
The runtime of the ZeroTest procedure is
[TABLE]
where denotes the number of leaves of .
Proof.
Consider lines 14-15 in Algorithm 3. By Claim 3, we have that
[TABLE]
Thus,
[TABLE]
Note that, by Lemma 4, the filter used in Algorithm 3 is a -isolating filter. Therefore, by the assumption and the definition of a -isolating filter (see Definition 13), we have
[TABLE]
Note that is essentially the inverse Fourier transform of , where denotes the signal obtained by restricting to frequencies and zeroing out the signal on all other frequencies. By the assumption of the lemma we have the following:
[TABLE]
Therefore the first claim of the lemma holds.
Note that in order to construct a -isolating filter , along with , the algorithm constructs trees for all , which has total time complexity . Given ’s, the algorithm constructs filter and in time , by Lemma 4. Moreover, the filter has support size , by Lemma 4.
By Lemma 4, computing the quantities for all in line 14 of Algorithm 3 can be done in time . Given the values of for various , computing all \big{\{}|H^{\Delta}_{f^{*}}|^{2}\big{\}}_{\Delta\in\mathbf{\Delta}} in line 15 takes time . Therefore the total runtime of this procedure is
[TABLE]
as desired.
Because support size of is , computing all \big{\{}|H^{\Delta}_{f^{*}}|^{2}\big{\}}_{\Delta\in\mathbf{\Delta}} in line 15 of the algorithm requires samples from which proves the second claim of the lemma. ∎
We now prove our main result:
Proof of Theorems 2.1 and 2.1: Note that Algorithms 4 and 5 are identical except in line 2. We first analyze the common code of the algorithms (after line 2) under the assumption that the set in all calls to ZeroTest are replaced with a more powerful set which satisfies the precondition of Lemma 7 hence ZeroTest correctly tests the zero hypothesis on its input signal with probability . We then establish a coupling between this idealized execution and the actual execution for both Algorithms 4 and 5, leading to our result.
Let denote the size of the set . We prove that the following properties are maintained throughout the execution of SparseFFT (Algorithm 4) and SparseFFT-RandomPhase (Algorithm 5):
(1)
;
(2)
For every leaf of tree one has ;
(3)
If is a worst-case signal with random phase, then is a worst-case signal with random phase;
(4)
The quantity always decreases by at least 1 on every iteration of Algorithm 4 or 5;
(5)
Always ;
The base of the induction is provided by the first iteration, at which point is a single vertex where is the root with and . The conditions (1) and (2) and (3) and (5) are satisfied since and and is a worst-case signal with random phase if is a worst-case signal with random phase.
We now prove the inductive step. We assume that conditions (1) and (2) and (3) and (5) of the inductive hypothesis are satisfied at the beginning of a certain iteration and argue that conditions (1) and (2) and (3) and (5) are maintained at the end of the iteration. We also show that the value of the quantity defined in (4), at the end of the loop is smaller than its value at the start of the loop by at least one. Let be the smallest weight leaf chosen by the algorithm in line 4. We now consider two cases.
Case 1: . Since is a -isolating filter, we have by Definition 1 that for every signal with Fourier support and for all ,
[TABLE]
By condition (1) of the inductive hypothesis one has and thus we can apply (10) with and , obtaining
[TABLE]
Note that by Claim 3,
[TABLE]
where is the quantity computed in line 15. We thus get that
[TABLE]
because due to the assumption that . Thus we get that therefore at the end of the loop we have which means that will no longer be in . And also gets removed from tree implying that will be excluded from . Note that this also implies that will remain a worst-case signal with random phase. Therefore, condition (1) and (2) and (3) hold.
Now, note that will decrease by 1 exactly because is no longer in and the rest of the support is unchanged. This shows that condition (5) holds. Also, decreases by exactly because the level of was and gets removed from . So will decrease by exactly one as required in condition (4).
Case 2 Suppose that . We first check that the invocation of ZeroTest satisfies preconditions of Lemma 7. We need to ensure that for the residual signal one has
[TABLE]
where is the tree obtained from by adding two children of (line 19). This follows, since by the inductive hypothesis we have
[TABLE]
and because by claim 6 we have,
[TABLE]
We thus get that the preconditions of Lemma 7 are satisfied, and the output of ZeroTest is true if and false otherwise. A similar analysis shows that the algorithm correctly tests the zero hypothesis on . We thus get, letting denote the tree at the end of the while loop, that
[TABLE]
and for every one has . Hence, because remains unchanged, conditions (1) and (2) and (3) hold at the end of the loop.
Now, we show is decreased at least by one. By inductive hypothesis and at least one of or will be added to because . Note that hence . Because remains unchanged, the value of will decrease by at least one hence conditions (4) and (d) hold.
Because for every leaf , it follows from condition (2) that the quantity is non-negative. At the first iteration, and where is the root with . Hence, at first iteration. Because is decreasing by at least 1 at each iteration, the algorithm terminates after iterations. By Lemma 7 along with Claim 7, the runtime of each iteration of algorithm is and also sample complexity of each iteration is therefore the total runtime and sample complexity both will be .
Finally, observe that throughout this analysis we have assumed that the set satisfies the precondition of Lemma 7 for all the invocations of ZeroTest by our algorithm.
In reality, there are two cases. The first case is for worst-case signals (Algorithm 4, Theorem 2.1). In this case, the algorithm chooses to be a multiset which corresponds to the Fourier measurements with RIP of order . Let be the dimensional inverse Fourier transform’s matrix with points. The matrix is a unitary matrix. If you let denote the submatrix of whose rows are sampled from according to set defined in line 5 of Algorithm 4 then by Theorem 4 there exists a multiset of size such that satisfies the restricted isometry property of order . Therefore, for every signal :
[TABLE]
As we have shown in condition (5) of the induction, . Hence for every leaf of the tree therefore the precondition of lemma 7 is satisfied.
The second case is for worst-case signals with random phase (Algorithm 5, Theorem 2.1). We have shown in condition (3) of the induction that is a worst-case signal with random phase in every iteration of the algorithm. Therefore for every leaf of the tree it is true that is a worst-case signal with random phase. In this case, the multiset is defined in line 5 of Algorithm 5 therefore by Lemma 6 for a fixed leaf of tree with probability at least the following holds:
[TABLE]
where .
This shows that in the second case which corresponds to theorem 2.1, the failure probability of procedure ZeroTest is at most . Moreover, the above analysis shows that SFFT makes at most calls to ZeroTest. Therefore, by a union bound, the overall failure probability of the calls to ZeroTest is . Hence, we obtain the desired result.
∎
8 Signals with random support in high dimension
In this section, we consider Fourier sparse signals whose support in the frequency domain is chosen randomly, while the values at the nonzero frequencies are chosen arbitrarily (worst-case). In other words, we assume a Bernoulli model for , while the values at the frequencies that are chosen to be in the support are arbitrary. We will present an algorithm that runs in time . The model for random support signals can be found in Definition 2.2 (Section 2), which we restate here for convenience of the reader:
\defrandsupp
8.1 Outline of our approach
The algorithm is motivated by the idea of speeding up our algorithm for worst-case signals (Algorithm 4, also see Theorem 2.1) by reducing the number of iterations of the process from down to . Such a reduction (which we show to be impossible for worst-case signals in Section 6) requires the ability to peel off many elements of the residual in a single phase of the algorithm, which turns out to be possible if the support of is chosen uniformly at random as in Definition 2.2. Indeed, if one considers the splitting tree of a signal with uniformly random support, one sees that
(a)
a large constant fraction of nodes satisfy ;
(b)
the adaptive aliasing filters constructed for such nodes will have significantly overlapping support in time domain.
We provide the intuition for this for the one-dimensional setting () to simplify notation (changes required in higher dimensions are minor). In this setting, property (b) above is simply a manifestation of the fact that since the support is uniformly random, any given congruence classes modulo for a large enough constant is likely to contain only a single element of the support of . Our adaptive aliasing filters provide a way to only partition frequency space along a carefully selected subset of bits in , but due to the randomness assumption, one can isolate most of the elements by simply partitioning using the bottom bits. This essentially corresponds to hashing into buckets at computational cost . While this scheme is efficient, it unfortunately only recovers a constant fraction of coefficients. One solution would be to hash into buckets (i.e., consider congruence classes modulo ), which would result in perfect hashing with good constant probability, allowing us to recover the entire signal in a single round. However, this hashing scheme would result in a runtime of and is, hence, not satisfactory. On the other hand, hashing into buckets is clearly wasteful, as most buckets would be empty. Our main algorithmic contribution is a way of “implicitly” hashing into buckets, i.e., getting access to the nonempty buckets, at an improved cost of .
Our algorithm uses an iterative approach, and the main underlying observation is very simple. Suppose that we are given the ability to “implicitly” hash into buckets for some , namely, get access to the nonempty buckets. If is at least , we know that there are no collisions with high probability and we are done. If not, then we show that, given access to nonempty buckets in the -hashing (i.e. a hashing into buckets), we can get access to the nonempty buckets of a -hashing for some appropriately chosen constant at a polylogarithmic cost in the size of each nonempty bucket of the -bucketing by essentially computing the Fourier transform of the signal restricted to nonempty buckets in the -bucketing. We then proceed iteratively in this manner, starting with , for which we can perform the hashing explicitly. Since the number of nonzero frequencies remaining in the residual after iterations of this process decays geometrically in , we can also afford to use a smaller number of buckets in the hashing that we construct explicitly, ensuring that the runtime is dominated by the first iteration.
Ultimately, the algorithm takes the following form. At every iteration, we explicitly compute a hashing into buckets explicitly. Then, using a list of nonempty buckets in a -hashing from the previous iteration, we extend this list to a list of nonempty buckets in a -bucketing at polylogarithmic cost per bucket (by solving a well-conditioned linear system, see Algorithm 6), where for some large enough constant . Meanwhile, we reduce by a factor of , thus maintaining the invariant at all times (note that this is satisfied at the start, when , and remains invariant at each iteration). Therefore, after a logarithmic number of iterations, we have effectively emulated hashing into but at a total cost of roughly one hashing computation into buckets (see Figure 6 for an illustration).
Bucketing in high dimensions (MakeBucket function).
We note that our vectorial notation for buckets in high dimensions (see section 8.2) allows us to continue talking about bucketings with buckets, even though now the number of buckets is in fact a vector of length . In fact in dimension the only property of the bucketing that matters for our analysis is the number of buckets and the shape of each bucket is not important (this is due to the fact that the support is sampled from a permutation invariant distribution). In order to avoid unnecessary notation overload, in Algorithm 8 we introduce procedure MakeBucket that constructs a bucketing of size of the following simple form. The vector is defined by
[TABLE]
The pseudocode for MakeBucket, which implements the formula above, is given in Algorithm 8.
8.2 Notation
We will need notations for vectorial operations, e.g., entrywise multiplication and/or division of vectors, which is defined in the following definition.
Definition 14** (Entrywise vectorial arithmetic).**
Suppose that , and are -dimensional vectors and is a scalar value. Then we define the following operations,
[TABLE]
8.3 Outline of our approach
The algorithm is motivated by the idea of speeding up our algorithm for worst-case signals (Algorithm 4, also see Theorem 2.1) by reducing the number of iterations of the process from down to . Such a reduction (which is shown to be impossible for worst-case signals in Section 6) requires the ability to peel off many elements of the residual in a single phase of the algorithm, which turns out to be possible if the support of is chosen uniformly at random as in Definition 2.2. Indeed, if one considers the splitting tree of a signal with uniformly random support (see Figure 5 for an illustration), one sees that
(a)
a large constant fraction of nodes satisfy ;
(b)
the adaptive aliasing filters constructed for such nodes will have significantly overlapping support in time domain.
We provide the intuition for this for the one-dimensional setting () to simplify notation (changes required in higher dimensions are minor). In this setting, property (b) above is simply a manifestation of the fact that since the support is uniformly random, any given non-empty congruence class modulo for a large enough constant is likely to contain only a single element of the support of . Our adaptive aliasing filters provide a way to only partition frequency space along a carefully selected subset of bits in , but due to the randomness assumption, one can isolate most of the elements by simply partitioning using the bottom bits. This essentially corresponds to hashing into buckets at computational cost . While this scheme is efficient, it unfortunately only recovers a constant fraction of coefficients. One solution would be to hash into buckets (i.e., consider congruence classes modulo ), which would result in perfect hashing with good constant probability, allowing us to recover the entire signal in a single round. However, this hashing scheme would result in a runtime of and is, hence, not satisfactory. On the other hand, hashing into buckets is clearly wasteful, as most buckets would be empty. Our main algorithmic contribution is a way of “implicitly” hashing into buckets, i.e., getting access to the nonempty buckets, at an improved cost of .
Our algorithm uses an iterative approach, and the main underlying observation is very simple. Suppose that we are given the ability to “implicitly” hash into buckets for some , namely, get access to the nonempty buckets. If is at least , we know that there are no collisions with high probability and we are done. If not, then we show that, given access to nonempty buckets in the -hashing (i.e. a hashing into buckets), we can get access to the nonempty buckets of a -hashing for some appropriately chosen constant at a polylogarithmic cost in the size of each nonempty bucket of the -bucketing by essentially computing the Fourier transform of the signal restricted to nonempty buckets in the -bucketing. We then proceed iteratively in this manner, starting with , for which we can perform the hashing explicitly. Since the number of nonzero frequencies remaining in the residual after iterations of this process decays geometrically in , we can also afford to use a smaller number of buckets in the hashing that we construct explicitly, ensuring that the runtime is dominated by the first iteration.
Ultimately, the algorithm takes the following form. At every iteration, we explicitly compute a hashing into buckets explicitly. Then, using a list of nonempty buckets in a -hashing from the previous iteration, we extend this list to a list of nonempty buckets in a -bucketing at polylogarithmic cost per bucket (by solving a well-conditioned linear system, see Algorithm 6), where for some large enough constant . Meanwhile, we reduce by a factor of , thus maintaining the invariant at all times (note that this is satisfied at the start, when , and remains invariant at each iteration). Therefore, after a logarithmic number of iterations, we have effectively emulated hashing into but at a total cost of roughly one hashing computation into buckets (see Figure 6 for an illustration).
Bucketing in high dimensions (MakeBucket function).
We note that our vectorial notation for buckets in high dimensions (see section 8.2) allows us to continue talking about bucketings with buckets, even though now the number of buckets is in fact a vector of length . In fact in dimension the only property of the bucketing that matters for our analysis is the number of buckets and the shape of each bucket is not important (this is due to the fact that the support is sampled from a permutation invariant distribution). In order to avoid unnecessary notation overload, in Algorithm 8 we introduce procedure MakeBucket that constructs a bucketing of size of the following simple form. The vector is defined by
[TABLE]
The pseudocode for MakeBucket, which implements the formula above, is given in Algorithm 8.
8.4 Filtering, hashing, and bucketing in high dimensions
We introduce the main definitions here. Our techniques in this section use a version of our adaptive aliasing filters that is taylored to the assumption that the support of is chosen uniformly at random. Since the signal is assumed to be sampled from a distribution, we are able to design a fast algorithm by adapting to a distribution as opposed to a given realization of the support of . The next definition is essentially a simplified version of the definition of a frequency cone from Section 2 (see Definition 1):
Definition 15** (Congruence classes of support).**
Suppose and are positive integers such that is a power of two. Let be a vector of powers of two such that for . For every , and signal , we define the -congruence class of to be the set , given by
[TABLE]
We access the signal using a bucketing operation, defined below.
Definition 16** (Bucketing in high dimensions).**
Suppose and are positive integers such that is a power of two. Let be a vector of powers of two such that for all . For every , , and signal , we define the -bucketing of with shift to be , given by
[TABLE]
The following definition of Bernoulli set provides a compact way of referring to the distribution of :
Definition 17** (Bernoulli set).**
For every power of two and positive integer , let and set be a random set such that each is independently chosen to be in with probability ,
[TABLE]
Moreover, for any such that , we define
[TABLE]
The next lemma is crucial for our analysis. The lemma considers two bucketings and , where is a refinement of . The object of interest is the number of buckets in the bucketing that contain at least two elements of a Bernoulli set , i.e. non-singleton buckets. The lemma shows that as long as the product of the number of buckets in and is at least , the elements (i.e. frequencies) of a Bernoulli set that belong to non-singleton buckets in must be rather uniformly spread over the coarser bucketing . Specifically, no bucket in contains more than such frequencies with high probability. We will use this lemma with and and (see proof of Theorem 2.2 below).
Lemma 8** (Refinement lemma).**
For any power of two integers and , suppose and satisfy for all as well as . Then, with probability at least , we have that for all ,
[TABLE]
The proof is a simple probabilistic argument and is given in Appendix B.
8.5 Hashing in high dimensions by using downsampling
The main primitive we need for developing an algorithm with sample complexity is a hashing function based on downsampling, presented below as Algorithm 6. The algorithm takes as input the list of buckets in the hashing into (that will later be guaranteed to be superset of the nonempty buckets in a -hashing of the residual signal ) and outputs a list of potentially nonempty buckets in the hashing into , together with evaluations of the corresponding hashed signals at a point that is given as a parameter.
Lemma 9**.**
(Hashing in high dimensions)* Suppose and are positive integers such that is a power of two. Suppose , and are vectors of powers of two such that and for all . Moreover, let be a shift vector. For any signals suppose that*
[TABLE]
where is -congruence class of as per Definition 15. Then the procedure Hashing outputs a set that with probability satisfies
[TABLE]
Moreover, the sample complexity of this procedure is
[TABLE]
while the time complexity of this procedure is
[TABLE]
where .
Proof.
Recall that the algorithm uses a coarse -bucketing to refine a -bucketing, for which only the non-empty buckets are computed, to a -bucketing. Let and , where . Suppose (12) holds.
Note that for any ,
[TABLE]
where denotes for ease of notation. Hence, -bucketing and -bucketing are related via a linear system which is defined in (14) for any collection of values of . We also show how to choose a relatively small number of values of such that the above linear system will be well-conditioned. Note that is a vector of length , as per our vectorial notations in section 8.2, which denotes by how much we want to further refine each bucket of -bucketing.
Choosing ’s that make the linear system in (14) well-conditioned:
For every , let denote the matrix whose rows are indexed by and columns are indexed by , with entries defined by
[TABLE]
Moreover, let denote the column vector of length with entries indexed by and given by
[TABLE]
while we let denote the column vector of length with entries indexed by and given by
[TABLE]
Then, (14) implies the following linear system of equations,
[TABLE]
Next, for any , let
[TABLE]
By the assumption of the lemma on the set , one has that the vectors in (15) satisfy for every . This shows that the linear system in (17) can be solved very efficiently by randomly sampling its rows. More formally, suppose that is a multiset such that where and let denote the submatrix of whose rows are selected with respect to set . Then by Theorem 4, the matrix satisfies RIP of order . We will use this property to solve the system (17) efficiently.
Let be a submatrix of with size . Suppose that its rows are selected with respect to set and its columns are selected with respect to . More specifically, is a matrix whose rows are indexed by and columns are indexed by , with entries defined by
[TABLE]
Moreover, let denote the column vector of length with entries indexed by elements of and given by
[TABLE]
while we let denote the column vector of length with entries indexed by elements of and given by
[TABLE]
Then, (17) implies that
[TABLE]
Now, because the matrix
[TABLE]
satisfies RIP of order for all , one has that the condition number of is at most . Therefore, a linear least squares solver can compute efficiently and in a numerically stable way using the reduced linear system in (19). Note that lines 11-14 of Algorithm 6 carry out this procedure and compute for each .
Computing :
Now we show how to compute for any and . By standard downsampling properties, we have that if is the signal defined by
[TABLE]
then its Fourier transform is given by
[TABLE]
Hence,
[TABLE]
which demonstrates how to compute . Note that lines 6-8 of Algorithm 6 simply compute , for some with for some .
Sample complexity and Runtime:
Lines 6-8 of Algorithm 6 compute , for some with for some , in time and with sample complexity , according to the rule (20). This shows that the vector in (16) can be constructed efficiently.
Note that lines 11-14 of Algorithm 6 carry out a least squares linear system procedure and compute for each in time , as the time complexity of LeastSquaresSolver procedure is . Moreover, by (18), it follows that for a fixed , line 16 simply adds all pairs with satisfying . Also, any for which there exists with must satisfy and, hence, . This shows that (by (12)). Thus, it follows that after looping over all , the final satisfies (13), as desired.
Note that the sample complexity of Algorithm 6 is determined by the total number of samples required to construct the various . For any fixed , constructing requires samples from . Since there are values of that are relevant, it follows that the total sample complexity is
[TABLE]
The time complexity of this procedure is due to two computations. First, constructing each for a fixed takes time . Second, computing the vector for each fixed requires time . Therefore, the total time complexity is
[TABLE]
∎
8.6 Resolving buckets in the hashed signal
The other major building block we need for developing a sparse FFT algorithm is a function for testing bucketings of signals with various shifts for emptyness and one-sparsity. Such a primitive takes in a list of buckets of a hashed signal and determines whether each bucket is empty or not. If a bucket is not empty, then we determine whether the bucket consists of exactly one frequency using a one-sparse test. If so, we can determine this frequency and the value of the signal at this frequency from the bucketed signals. If not, then we retain the bucket for the next iteration, in which we will hash to more buckets.
Lemma 10**.**
(TestBuckets in high dimensions)* Suppose and are positive integers such that is a power of two. Suppose is a vector of powers of two such that for all . Suppose is a signal such that is a -bucketing of with shift for all and where is a multiset of i.i.d samples from for some*
[TABLE]
and for all are Congruence classes of . Also suppose that Algorithm 7 takes in the quantities for all , standard basis vectors in . Then, TestBuckets returns and that with probability satisfy the following:
- •
For any such that is a singleton set, , we have .
- •
We have .
Moreover, the runtime of this procedure is .
Proof.
Let be the dimensional inverse Fourier transform’s matrix with points. The matrix is a unitary matrix and all of its elements have absolute value . If you let denote the submatrix of whose rows are sampled from according to set then by Theorem 4, satisfies the restricted isometry property of order with probability . In the rest we condition on the event corresponding to matrix satisfying RIP of order .
Now, note that by definition of and (definitions 16 and 15 respectively) we have,
[TABLE]
therefore, for every the following holds true,
[TABLE]
thus the zero test in line 5 of Algorithm 7 works correctly for all buckets.
Now note that if is a singleton set then,
[TABLE]
Therefore, for every , the following holds,
[TABLE]
thus, . Also note that, . This is precisely implemented in line 6-7 of Algorithm 7. On the other hand if the hypothesis that is a singleton set is incorrect our algorithm will find it. Using the notation as in line 7 of Algorithm 7,
[TABLE]
where, . Because, is at most sparse and matrix satisfies RIP of order we have that,
[TABLE]
thus, the one sparse test in line 8 of Algorithm 7 works correctly for all buckets.
It is straightforward to see that the runtime of this procedure is . ∎
8.7 Sparse FFT for signals with random support in nearly linear time
Now, we are ready to present the main theorem of this section.
\sfftalg
The theorem is a consequence of the following lemma.
Lemma 11**.**
Let , , , , and denote the values of , , , , and , respectively, at the start of iteration of the main for loop in Algorithm 8. Then, for all , we define the event to be the occurrence of the following statements:
. 2. 2.
* and .* 3. 3.
If then for every .
Then, holds with probability 1, while for .
Proof.
Note that for , we have . Thus, condition (1) trivially holds. Condition (2) also trivially holds, since . Furthermore, (3) does not apply for event . Thus, holds with probability 1.
Now, assume that hold for some . We consider the probability of occurring, conditioned on the aforementioned events. Note that it follows from the values that Algorithm 8 assigns to , and along with condition (1) of the inductive hypothesis that Lemma 9 can be applied to invocations of Hashing in line 13 of Algorithm 8, and hence the output of Hashing procedure satisfies the following,
[TABLE]
Moreover, it is clear that for every and every , one has that is -bucketing of .
Now, note that by condition (2) of the inductive hypothesis along with definition of Congruence class presented in Definition 15, it follows that for all . Hence by Lemma 12, with probability ,
[TABLE]
This show that set defined in line 12 of Algorithm 8 satisfies the precondition of Lemma 10. Therefore by Lemma 10, a call to TestBuckets procedure in line 14 of Algorithm 8, with probability , outputs such that the following hold:
- (a)
For any such that is a singleton set, , one has . 2. (b)
.
In order to complete the inductive step, it suffices to show that (a) and (b) imply conditions (1), (2), and (3) in the definition of for .
Observe that condition (a) imply that therefore since (by line 15), it follows that . Hence, by inductive hypothesis we have . Also note that for every we have that and hence by condition (2) of the inductive hypothesis , one has . Condition (a) implies that for every and hence for every such every . This establishes condition (2) of for .
Next note that (by line 16). Conditions (a) along with condition (1) of the inductive hypothesis for and (21) imply that there exists no such that . Also note that condition (b) implies that for every , therefore satisfies condition (1) of the induction for . This also establishes condition (3) of for .
By a union bound we have that with probability , event holds true as desired. ∎
Now we are ready to prove Theorem 2.2.
Proof of Theorem 2.2.
Note that by Lemma 11, there exist events such that and for . Observe that
[TABLE]
Note that condition (3) of implies that the existence of a such that requires . Now, recall that after the main for loop in Algorithm 8 finishes execution,we have
[TABLE]
Thus, by Lemma 14, we have that , by our choice of and . Thus, by Markov’s inequality, with probability over the randomness in the choice of , we have that . Hence, by a union bound, we have that . Thus, by condition (3) in Lemma 11, we see that with probability , the output of Algorithm 8 satisfies for all , which proves the correctness of Algorithm 8.
Now, let us compute the sample complexity of Algorithm 8. Note that for each iteration of the main for loop in SparseFFT, we have . Also, By condition (1) of in Lemma 11,
[TABLE]
Therefore, since , by Lemma 8, we have that
[TABLE]
with probability .
Moreover, . Therefore, by Lemma 9, each call to Hashing in the -th iteration has sample complexity
[TABLE]
Hence, because in each iteration Hashing is invoked times, the total sample complexity for the algorithm is
[TABLE]
since .
Finally, we compute the time complexity of Algorithm 8. By Lemma 9, we have that the time complexity for each call to Hashing in the -th iteration of the main for loop is
[TABLE]
Thus, the total time complexity due to calls to Hashing is
[TABLE]
which can be simplified as
[TABLE]
since . Moreover, the call to TestBucket in the -th iteration of the main for loop has time complexity
[TABLE]
Hence, the total time complexity due to calls to TestBucket is
[TABLE]
Therefore, by (22) and (23), the total time complexity of Algorithm 8 is
[TABLE]
as desired. ∎
Acknowledgements
Michael Kapralov is supported in part by ERC Starting Grant 759471.
Appendix A Proofs and pseudocode omitted from section 4
Proof of Lemma 4: Let be a leaf of , let denote the level of , let denote the root of , and let denote the path from root to in , where and . Let denote the smallest positive integer such that . Note that .
For let be a subtree of which denotes the result of truncating to contain only the nodes that are at distance at most from the root.
We construct the -isolating filter iteratively by starting with and refining to over steps. The filters will be -isolating for and will be -isolating. Since and , the filter will be -isolating, as required.
For every let be the subtree of which is rooted at and is restricted to contain only the nodes that are at distance at most from . For every node the label of is defined to be , i.e., the th coordinate of , where is the label of node in tree .
We now define for . We start by letting and letting for every
[TABLE]
where is a -isolating filter for all and is a -isolating filter. By lemma 3, for every there exists such with and can be constructed in time . Such a filter can be computed in Fourier domain at any desired frequency in time . Note that is a tensor product of filters in dimension one. We now show by induction on that is a -isolating filter.
The base of the induction is provided by : since is the root of , we have that and as required.
We now prove the inductive step: . We first show that for every . Let be a leaf of distinct from . Let denote the leaf of which is the ancestor of . We consider two cases.
Case 1:
Suppose that . Note that , and also note that
[TABLE]
Thus for every it is true that . By the inductive hypothesis we have that is -isolating, and hence by the assumption of , one has for every such , and thus as required.
Case 2:
Suppose that is ancestor of . Therefore, by definition of , one can see that is a leaf in . Hence, by definition of , for every , it is true that . Recall that is a -isolating filter and therefore, , and thus as required.
Now we show that for all . Note that is a leaf in . Hence, for every , it is true that . Since is a -isolating filter, . Now, note that
[TABLE]
Thus for every it is true that . By the inductive hypothesis we have that is -isolating, and hence , and thus as required.
It remains to note that . By Lemma 3, for every one has , so , as required (note that the support size of the convolution of two filters is at most the product of support sizes of each filter).
The total runtime for constructing this filter has two parts; First part is the computation time of ’s for all which takes by Lemma 3. Second part is the time needed for computing the tensor product of all ’s which is . Therefore the total runtime is . Moreover, the total time for computing is the sum of the times needed for computing all ’s for , which is by Lemma 3.
∎
Proof of Lemma 6: Let . Recall that for every ,
[TABLE]
Because all ’s are zero mean independent random variables, for every fixed one has that for every the random variables are zero mean and independent. Observe that for all , we have and also . Therefore by Bernstein’s inequality we have that for every fixed ,
[TABLE]
If we choose for some absolute constant ,
[TABLE]
for large enough constant . By a union bound over all we get that, for all with probability .
Now note that by Parseval’s theorem, Claim 1,
[TABLE]
Conditioning on , by Chernoff-Hoeffding Bound we have,
[TABLE]
where the probability is over the i.i.d. random variables and is some positive constant. Therefore, by the choice of for some large enough constant we have that,
[TABLE]
By a union bound over these two events we have that with probability at least .
∎
Appendix B Proof of Lemma 8
Lemma 12**.**
For every power of two integer and positive integer , if is a random support signal as per Definition 2.2, the following conditions hold. If , is a vector of powers of two such that for all and , then, with probability at least over ,
[TABLE]
for all (where is the set from Definition 15).
Proof.
Note that for every , we have that for each with , . Then, since there are such for every fixed , it follows that,
[TABLE]
Hence, by the Chernoff bound, it follows that with probability for any fixed . Finally, by a union bound over all , we have the desired result. ∎
We now prove a lemma about the size of the sets :
Lemma 13**.**
For any power of two integers and , positive integer and such that and , we have that , where is defined as in Definition 17.
Proof.
Suppose . Then, observe that there are elements such that . Note that if at least two of these elements lies in . Thus, for every we have,
[TABLE]
since . ∎
As a consequence, we have a bound on the expected size of .
Lemma 14**.**
For any power of two integers and , any such that , we have .
Proof.
Simply note that
[TABLE]
by Lemma 13. ∎
We are now ready to proof Lemma 8.
Proof of Lemma 8: Consider a fixed . Note that there are values of such that . Moreover, by Lemma 13, each such lies in with identical probability
[TABLE]
and these events are all independent. Thus,
[TABLE]
Thus, by the Chernoff bound, we have that
[TABLE]
with probability at least , as desired. Finally, taking a union bound over all values of gives the desired result. ∎
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[AGS 03] A. Akavia, S. Goldwasser, and S. Safra. Proving hard-core predicates using list decoding. FOCS , 44:146–159, 2003.
- 2[Aka 10] A. Akavia. Deterministic sparse Fourier approximation via fooling arithmetic progressions. COLT , pages 381–393, 2010.
- 3[BCG + 12] P. Boufounos, V. Cevher, A. C. Gilbert, Y. Li, and M. J. Strauss. What’s the frequency, kenneth?: Sublinear fourier sampling off the grid. RANDOM/APPROX , 2012.
- 4[BG] R. Beatson and Greengard. A short course on fast multipole methods. https://web.njit.edu/~jiang/math 614/beatson-greengard.pdf .
- 5[BG 97] Rick Beatson and Leslie Greengard. A short course on fast multipole methods. Wavelets, multilevel methods and elliptic PD Es , 1:1–37, 1997.
- 6[Bou 14] J. Bourgain. An improved estimate in the restricted isometry problem. GAFA , 2014.
- 7[CGV 12] Mahdi Cheraghchi, Venkatesan Guruswami, and Ameya Velingker. Restricted isometry of fourier matrices and list decodability of random linear codes. SODA , 2012.
- 8[CI 17] Mahdi Cheraghchi and Piotr Indyk. Nearly optimal deterministic algorithm for sparse walsh-hadamard transform. ACM Trans. Algorithms , 13(3):34:1–34:36, 2017.
