Estimating the Frequency of a Clustered Signal

Xue Chen; Eric Price

arXiv:1904.13043·cs.DS·May 1, 2019

TL;DR

This paper develops new methods for accurately estimating the central frequency of clustered signals with Fourier spectrum in a narrow band, improving bounds on recovery accuracy and establishing fundamental limits.

Contribution

It introduces generic conditions for frequency estimation, improves bounds for $k$-Fourier-sparse signals, and provides a new ratio bound with independent applications.

Findings

01

Recovers $f_0$ to within $\Delta+\tilde{O}(k^3)$ from samples on $[-1,1]$.

02

Improves the previous bound of $O(\Delta+\tilde{O}(k^5))^{1.5}$ to $\Delta+\tilde{O}(k^3)$.

03

Shows that no algorithm can achieve accuracy better than $\Delta+\tilde{O}(k^2)$.

Abstract

We consider the problem of locating a signal whose frequencies are "off grid" and clustered in a narrow band. Given noisy sample access to a function $g(t)$ with Fourier spectrum in a narrow range $[f_{0}-\Delta,f_{0}+\Delta]$, how accurately is it possible to identify $f_{0}$? We present generic conditions on $g$ that allow for efficient, accurate estimates of the frequency. We then show bounds on these conditions for $k$-Fourier-sparse signals that imply recovery of $f_{0}$ to within $\Delta+\widetilde{O}(k^{3})$ from samples on $[-1,1]$. This improves upon the best previous bound of $O\big(\Delta+\widetilde{O}(k^{5})\big)^{1.5}$. We also show that no algorithm can do better than $\Delta+\widetilde{O}(k^{2})$. In the process we provide a new $\widetilde{O}(k^{3})$ bound on the ratio between the maximum and average value of continuous $k$-Fourier-sparse signals, which has independent application.

Equations

\operatorname*{\mathbb{E}}_{t\in[-T,T]}\big{[}|\eta(t)|^{2}\big{]}\leq\varepsilon\operatorname*{\mathbb{E}}_{t\in[-T,T]}\big{[}|g(t)|^{2}\big{]}

\mathrm{span}\{e^{2\pi\boldsymbol{i}(j\varepsilon)t}\mid j\in[k]\}.

R:=\frac{\sup_{t\in[-T,T]}|g(t)|^{2}}{\operatorname*{\mathbb{E}}_{t\in[-T,T]}|g(t)|^{2}}.

|g(t)|^{2}\leq\mathsf{poly}(R)\cdot\operatorname*{\mathbb{E}}_{t\in[-T,T]}\big{[}|g(t)|^{2}\big{]}\cdot|\frac{t}{T}|^{S}

\mathcal{F}:=\left\{g(t)=\sum_{j=1}^{k}v_{j}e^{2\pi\boldsymbol{i}f_{j}t}\bigg{|}f_{j}\in[-F,F]\right\}.

R:=\sup_{g\in\mathcal{F}}\frac{\sup_{x\in[-T,T]}|g(x)|^{2}}{\operatorname*{\mathbb{E}}_{x\in[-T,T]}\big[|g(x)|^{2}\big]}=O(k^{3}\log^{2}k).

\|g\|_{2}=\big{(}\underset{x\sim[-1,1]}{\operatorname*{\mathbb{E}}}|g(x)|^{2}\big{)}^{1/2},

\widehat{g}(f)=\int_{-\infty}^{+\infty}g(t)e^{-2\pi\boldsymbol{i}ft}\,\mathrm{d}t\quad\text{for any real }f.

\Pr\big[|X-1|\geq\varepsilon\big]\leq 2\exp\Big(-\frac{\varepsilon^{2}}{3}\cdot\frac{n}{R}\Big).

\int_{-1}^{1}|H(t)\cdot g(t)|^{2}\,\mathrm{d}t\geq 0.99^{2}\int_{-1+\frac{1}{C\cdot R}}^{1-\frac{1}{C\cdot R}}|g(t)|^{2}\,\mathrm{d}t.

\int_{1}^{\infty}|H(t)\cdot g(t)|^{2}\,\mathrm{d}t\leq 0.01\int_{-1}^{1}|g(t)|^{2}\,\mathrm{d}t,

\operatorname{sinc}(CR\cdot t)^{O(\log R)}=\Big(\frac{\sin(\pi CR\cdot t)}{\pi CR\cdot t}\Big)^{O(\log R)}

\Big(1-\frac{1}{CR},1\Big],\ \Big(1,1+\frac{1}{S}\Big],\ \Big(1+\frac{1}{S},1+\frac{2}{S}\Big],\ \ldots,\ \Big(1+\frac{2^{j}}{S},1+\frac{2^{j+1}}{S}\Big],\ \ldots,\ \Big(1+\frac{S/2}{S},2\Big],\ (2,+\infty).

y_{H}(t+\beta)\approx e^{2\pi\boldsymbol{i}f_{0}\beta}\cdot y_{H}(t)

\int_{-1}^{1}\big|y_{H}(t+\beta)-e^{2\pi\boldsymbol{i}f_{0}\beta}\cdot y_{H}(t)\big|^{2}\,\mathrm{d}t\lesssim\varepsilon\cdot\int_{-1}^{1}|y_{H}(t)|^{2}\,\mathrm{d}t.

\big|y_{H}(\alpha+\beta)-e^{2\pi\boldsymbol{i}f_{0}\beta}\,y_{H}(\alpha)\big|\leq 0.3\cdot|y_{H}(\alpha)|\quad\text{with probability more than }1/2.

\frac{\sup_{t\in[-1,1]}|g(t)|^{2}}{\|g\|_{2}^{2}}\leq R\quad\text{and}\quad|g(t)|^{2}\leq\mathsf{poly}(R)\cdot\|g\|_{2}^{2}\cdot|t|^{S}.

g(1)=\sum_{j\in[d]}c_{j}\cdot g(1-j\cdot\theta)\quad\text{implies}\quad|g(1)|^{2}\leq\Big(\sum_{j\in[d]}|c_{j}|^{2}\Big)\cdot\Big(\sum_{j\in[d]}|g(1-j\cdot\theta)|^{2}\Big).

H(t)=s_{0}\cdot\left(\operatorname{sinc}(CR\cdot t)^{C\log R}\cdot\operatorname{sinc}\big{(}C\cdot S\cdot t\big{)}^{C}\cdot\operatorname{sinc}\big{(}\frac{C\cdot S}{2}\cdot t\big{)}^{2C}\cdots\operatorname{sinc}\big{(}C\cdot t\big{)}^{C\cdot S}\right)*\operatorname{rect}_{2}(t)

\widehat{H}(f)=s_{0}\cdot\Big(\operatorname{rect}_{CR}(f)^{*C\log R}*\operatorname{rect}_{C\cdot S}(f)^{*C}*\operatorname{rect}_{\frac{C\cdot S}{2}}(f)^{*2C}*\cdots*\operatorname{rect}_{C}(f)^{*C\cdot S}\Big)\cdot\operatorname{sinc}(2f),

\frac{1}{m}\sum_{i=1}^{m}|g_{H}(x_{i})|^{2}\in[1-\varepsilon,1+\varepsilon]\cdot\operatorname*{\mathbb{E}}_{x\sim[-1,1]}\big[|g_{H}(x)|^{2}\big].

\frac{1}{m}\sum_{i=1}^{m}|y_{H}(x_{i})|^{2}\geq\frac{1}{m}\sum_{i=1}^{m}\bigg{(}|g_{H}(x_{i})|^{2}-2|g_{H}(x_{i})|\cdot|\eta_{H}(x_{i})|+|\eta_{H}(x_{i})|^{2}\bigg{)}.

\frac{1}{m}\sum_{i=1}^{m}|y_{H}(x_{i})|^{2}\geq\big{(}0.93-2\sqrt{0.93\cdot 15\epsilon}\big{)}\|g\|_{2}^{2}.

\int_{f\notin[f_{0}-\Delta',f_{0}+\Delta']}|\widehat{y_{H}}(f)|^{2}\,\mathrm{d}f\leq\int_{-\infty}^{\infty}|\widehat{\eta_{H}}(f)|^{2}\,\mathrm{d}f=\int_{-\infty}^{\infty}|\eta_{H}(t)|^{2}\,\mathrm{d}t\leq 1.02^{2}\,\epsilon\int_{-1}^{1}|g(t)|^{2}\,\mathrm{d}t.

\int_{-1}^{1}|z(t)|^{2}\,\mathrm{d}t\leq\int_{-\infty}^{\infty}|z(t)|^{2}\,\mathrm{d}t=\int_{-\infty}^{\infty}|\widehat{z}(f)|^{2}\,\mathrm{d}f=\int_{f_{0}-\Delta'}^{f_{0}+\Delta'}|\widehat{z}(f)|^{2}\,\mathrm{d}f+\int_{f\notin[f_{0}-\Delta',f_{0}+\Delta']}|\widehat{z}(f)|^{2}\,\mathrm{d}f.

\int_{f_{0}-\Delta'}^{f_{0}+\Delta'}|\widehat{z}(f)|^{2}\,\mathrm{d}f=\int_{f_{0}-\Delta'}^{f_{0}+\Delta'}\big|\widehat{y_{H}}(f)\cdot e^{2\pi\boldsymbol{i}f_{0}\beta}-\widehat{y_{H}}(f)\cdot e^{2\pi\boldsymbol{i}f\beta}\big|^{2}\,\mathrm{d}f\leq\int_{f_{0}-\Delta'}^{f_{0}+\Delta'}|\widehat{y_{H}}(f)|^{2}\cdot\big|e^{2\pi\boldsymbol{i}f_{0}\beta}-e^{2\pi\boldsymbol{i}f\beta}\big|^{2}\,\mathrm{d}f.

\int_{f_{0}-\Delta'}^{f_{0}+\Delta'}|\widehat{z}(f)|^{2}\,\mathrm{d}f\lesssim\gamma^{2}\int_{-\infty}^{+\infty}|\widehat{y_{H}}(f)|^{2}\,\mathrm{d}f=\gamma^{2}\int_{-\infty}^{+\infty}|y_{H}(t)|^{2}\,\mathrm{d}t\lesssim\gamma^{2}(1+2\epsilon)\int_{-1}^{1}|g(t)|^{2}\,\mathrm{d}t.

\int_{f\notin[f_{0}-\Delta',f_{0}+\Delta']}|\widehat{z}(f)|^{2}\,\mathrm{d}f\leq 4\int_{f\notin[f_{0}-\Delta',f_{0}+\Delta']}|\widehat{y_{H}}(f)|^{2}\,\mathrm{d}f\leq 4\int_{-\infty}^{+\infty}|\widehat{\eta_{H}}(f)|^{2}\,\mathrm{d}f=4\int_{-\infty}^{+\infty}|\eta_{H}(t)|^{2}\,\mathrm{d}t

\frac{1}{m}\sum_{i=1}^{m}|y_{H}(x_{i})|^{2}\geq 0.8\|g\|_{2}^{2}\quad\text{and}\quad\frac{1}{m}\sum_{i=1}^{m}|z(x_{i})|^{2}\leq 0.01\|g\|_{2}^{2}.

Full text

Xue Chen

Northwestern University. Supported by research funding from Northwestern University. Part of this work was done while the author was at the University of Texas at Austin, supported by NSF Grant CCF-1526952 and a Simons Investigator Award (#409864, David Zuckerman).

Eric Price

The University of Texas at Austin. Supported in part by NSF Award CCF-1751040 (CAREER).

1 Introduction

A natural question, dating at least to the work of Prony in 1795, is to estimate a signal from samples, assuming the signal has a $k$ -sparse Fourier representation, i.e., that the signal is a sum of $k$ complex exponentials: $g(t)=\sum_{j=1}^{k}v_{j}e^{2\pi\mathbf{i}f_{j}t}$ for some set of frequencies $f_{j}$ and coefficients $v_{j}$ .

If the frequencies lie on a discrete grid (giving a sparse discrete Fourier transform), then a long line of work has studied efficient algorithms for recovering the signal (e.g., [11, 7, 1, 8, 9, 10]). If the frequencies are not on a grid, then Prony's method from 1795 [13] or the matrix pencil method [3] can still identify them in the absence of noise. With noise, however, one cannot robustly recover frequencies that are too close together: if one listens to a signal on the interval $[-T,T]$, then the signals $e^{2\pi\mathbf{i}\theta t}$ and $e^{2\pi\mathbf{i}(\theta+\varepsilon/T)t}$ are $O(\varepsilon)$-close to each other on that interval, and so cannot be distinguished under noise. As shown in [12], this nonrobustness grows exponentially in $k$. On the other hand, [12] also showed that recovery with polynomially small noise is possible if all the frequencies have separation $1/2T$, and [14] showed that a constant fraction of noise is tolerable with separation $\log^{O(1)}(FT)/T$, where $F$ is the bandlimit of the signal.

So what is possible for arbitrary Fourier-sparse signals, without any assumption of frequency separation? One cannot hope to identify the frequencies exactly, but one can still estimate the signal itself. If two frequencies are similar enough to be indistinguishable over the sampled interval, we do not need to distinguish them. In [5], this led to an algorithm for an arbitrary $k$-Fourier-sparse signal that used $\text{poly}(k,\log(FT))$ samples to estimate it with only a constant-factor increase in the noise. However, the polynomial dependence on $k$ is fairly poor.

Since prior work could handle the case of well-separated frequencies, a key challenge in [5] is the setting with all the frequencies in a narrow cluster. Formally, consider the following subproblem: if all the frequencies $f_{i}$ of the signal lie in a narrow band $[f_{0}-\Delta,f_{0}+\Delta]$ , how accurately can we estimate $f_{0}$ ? Note that while we would like an efficient algorithm that takes a small number of samples, the key question is information theoretic. And we can ask this question more generally: if the signal is not $k$ -sparse, but still has all its frequencies in a narrow band, can we locate that band?

Question 1.1.

Let $g(t)$ be a signal with Fourier transform supported on $[f_{0}-\Delta,f_{0}+\Delta]$ , for some $f_{0}\in[-F,F]$ . Suppose that we can sample from $y(t)=g(t)+\eta(t)$ at points in $[-T,T]$ , where $\eta(t)$ could be any $\ell_{2}$ bounded noise on $[-T,T]$ with

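$$\operatorname*{\mathbb{E}}_{t\in[-T,T]}\big[|\eta(t)|^{2}\big]\leq\varepsilon\operatorname*{\mathbb{E}}_{t\in[-T,T]}\big[|g(t)|^{2}\big]$$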

for a small constant $\varepsilon$ . Under what conditions on $g$ can we estimate $f_{0}$ , and how accurately?

One might expect to be able to estimate $f_{0}$ to $\pm(\Delta+O(\frac{1}{T}))$ for all functions $g$ ; after all, $g$ is just a combination of individual frequencies, each of which points to some frequency in the right range, and each individual frequency in isolation can be estimated to within $\pm O(\frac{1}{T})$ in the presence of noise. Unfortunately, this intuition is false.

To see this, consider the family of $k$ -sparse Fourier functions with $f_{j}=\varepsilon j$ , i.e.,

$$\mathrm{span}\big\{e^{2\pi\mathbf{i}(j\varepsilon)t}\mid j\in[k]\big\}.$$

By sending $\varepsilon\to 0$ and taking a Taylor expansion, this family can get arbitrarily close to any degree $k-1$ polynomial, on any interval $[-T^{\prime},T^{\prime}]$ . Thus, to solve the question, one would also need to solve it when $g(t)$ is a polynomial even for arbitrarily small $\Delta$ .
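The Taylor-expansion claim can be checked numerically. The sketch below is our own illustration; the helper name `monomial_from_exponentials` and the test values are assumptions. It combines the $m+1$ frequencies $0,\varepsilon,\ldots,m\varepsilon$ with binomial coefficients scaled by $(2\pi\mathbf{i}\varepsilon)^{-m}$. The result is exactly $\big((e^{2\pi\mathbf{i}\varepsilon t}-1)/(2\pi\mathbf{i}\varepsilon)\big)^{m}$, which tends to $t^{m}$ on $[-1,1]$ as $\varepsilon\to 0$. The coefficients grow like $\varepsilon^{-m}$, so these sparse signals behave like polynomials only because their coefficients are unbounded.

```python
import numpy as np
from math import comb


def monomial_from_exponentials(t, m, eps):
    """(m+1)-Fourier-sparse signal with frequencies 0, eps, ..., m*eps that equals
    ((e^{a t} - 1) / a)^m for a = 2*pi*i*eps, hence approximates t**m for small eps."""
    a = 2j * np.pi * eps
    total = np.zeros(t.shape, dtype=complex)
    for j in range(m + 1):
        # m-th finite difference across the frequencies, divided by a^m
        total += (-1) ** (m - j) * comb(m, j) / a**m * np.exp(a * j * t)
    return total


t = np.linspace(-1.0, 1.0, 2001)
eps_values = [1e-2, 1e-3, 1e-4]
errors = {}
for m in range(1, 4):
    errors[m] = [np.max(np.abs(monomial_from_exponentials(t, m, eps) - t**m))
                 for eps in eps_values]
    print(f"degree {m}: max error on [-1,1] =", ["%.1e" % e for e in errors[m]])
```

The error shrinks roughly in proportion to $\varepsilon$, consistent with the claim that the family gets arbitrarily close to any degree-$(k-1)$ polynomial.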

There are two ways in which $g(t)$ being a degree $d$ polynomial can lead to trouble. The first is that $g(t)$ could itself be a Taylor expansion of $e^{\pi\mathbf{i}ft}$ . If $d\gtrsim fT$ , this Taylor approximation will be quite accurate on $[-T,T]$ ; with the noise $\eta$ , the observed signal can equal $e^{\pi\mathbf{i}ft}$ . Thus the algorithm has to output $f$ , which can be $\Theta(d/T)$ far from the “true” answer $f_{0}=0$ .

The second way in which $g(t)$ can lead to trouble is by removing most of the signal energy. If $g(t)$ is the (slightly shifted) Chebyshev polynomial $g(t)=T_{d}\big{(}t/T+O(\frac{\log^{2}d}{d^{2}})\big{)}$ , then $|g(t)|\leq 1$ for $t\leq\big{(}1-O(\frac{\log^{2}d}{d^{2}})\big{)}T$ , while $g(t)\geq d$ for $t\geq\big{(}1-O(\frac{\log^{2}d}{d^{2}})\big{)}T$ . That is to say, the majority of the $\ell_{2}$ energy of $g$ can lie in the final $O(\frac{\log^{2}d}{d^{2}})$ fraction of the interval. In such a case, a small constant noise level $\eta$ can make samples outside that $T\cdot\widetilde{O}(1/d^{2})$ size region equal to zero, and hence completely uninformative; and samples in that region still have to tolerate noise. This leads to an “effective” interval size of $T^{\prime}=T\cdot\widetilde{O}(\frac{1}{d^{2}})$ , leading to accuracy $O(1/T^{\prime})=\widetilde{O}(d^{2})/T$ .
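The following is a quick numerical check of the Chebyshev example. It assumes $T=1$ and $d=50$, and takes the hidden constant in the $O(\frac{\log^{2}d}{d^{2}})$ shift to be $1$; these choices are ours. The sketch measures how much of the energy of $g(t)=T_{d}(t+\text{shift})$ falls in the final strip of $[-1,1]$ of width equal to the shift.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

d = 50
shift = np.log(d) ** 2 / d**2                 # the O(log^2 d / d^2) shift, constant taken as 1
coef = np.zeros(d + 1)
coef[d] = 1.0                                 # Chebyshev-series coefficients of T_d

t = np.linspace(-1.0, 1.0, 400_001)          # T = 1, uniform grid
g = cheb.chebval(t + shift, coef)             # g(t) = T_d(t + shift)
energy = np.abs(g) ** 2

tail = t > 1.0 - shift                        # the final strip of width `shift`
tail_fraction = energy[tail].sum() / energy.sum()
print(f"strip width {shift:.4f}, max |g| before strip {np.max(np.abs(g[~tail])):.3f}, "
      f"g(1) = {g[-1]:.1f}, energy fraction in strip {tail_fraction:.2f}")
```

The strip covers well under one percent of the interval, yet it carries most of the energy. This is the situation in which noise can wipe out every sample outside the strip.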

Our main result is that, in a sense, these two types of difficulties are the only ones that arise. We can measure the second type of difficulty by looking at how much larger the maximum value of $g$ is than its average:

$$R:=\frac{\sup_{t\in[-T,T]}|g(t)|^{2}}{\operatorname*{\mathbb{E}}_{t\in[-T,T]}|g(t)|^{2}}.$$

We can measure the former by observing that while a polynomial may approximate a complex exponential on a bounded region, as $t\to\infty$ the polynomial will blow up. In particular, we take the $S$ such that

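$$|g(t)|^{2}\leq\mathsf{poly}(R)\cdot\operatorname*{\mathbb{E}}_{t\in[-T,T]}\big[|g(t)|^{2}\big]\cdot\Big|\frac{t}{T}\Big|^{S}$$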

for all $|t|\geq T$ . We show that if $R$ and $S$ are bounded, one can estimate $f_{0}$ to within $\Delta+\widetilde{O}(R+S)/T$ , which is almost tight from the above discussion of polynomials. Moreover, the time and number of samples required are fairly efficient:

Theorem 1.2.

Given any $T>0,F>0,\Delta>0,R$ , and $S>0$ , let $g(t)$ be a signal with the following properties:

1. $\mathsf{supp}(\widehat{g})\subseteq[f_{0}-\Delta,f_{0}+\Delta]$, where $f_{0}\in[-F,F]$.
2. $\sup_{t\in[-T,T]}\big[|g(t)|^{2}\big]\leq R\cdot\operatorname*{\mathbb{E}}_{t\in[-T,T]}\big[|g(t)|^{2}\big]$.
3. $|g(t)|^{2}$ grows at most as $\mathsf{poly}(R)\cdot\operatorname*{\mathbb{E}}_{t\in[-T,T]}\big[|g(t)|^{2}\big]\cdot|\frac{t}{T}|^{S}$ for $t\notin[-T,T]$.

Let $y(t)=g(t)+\eta(t)$ be the observable signal on $[-T,T]$ , where $\underset{t\in[-T,T]}{\operatorname*{\mathbb{E}}}\big{[}|\eta(t)|^{2}\big{]}\leq\epsilon\cdot\underset{t\in[-T,T]}{\operatorname*{\mathbb{E}}}\big{[}|g(t)|^{2}\big{]}$ for a sufficiently small constant $\epsilon$ . For $\Delta^{\prime}=\Delta+\frac{\widetilde{O}(R+S)}{T}$ and any $\delta>0$ , there exists an efficient algorithm that takes $O(R\log\frac{F}{\Delta^{\prime}\cdot\delta})$ samples from $y(t)$ and outputs $\widetilde{f}$ satisfying $|f_{0}-\widetilde{f}|\leq O(\Delta^{\prime})$ with probability at least $1-\delta$ .

Application to sparse Fourier transforms

Specializing to $k$-Fourier-sparse signals, we give bounds on $R$ and $S$ for this family. Since (as described above) this family can approximate degree-$(k-1)$ polynomials, we know that $R\gtrsim k^{2}$ and $S\gtrsim k$; we show that $R\lesssim k^{3}\log^{2}k$ and $S\lesssim k^{2}\log k$. Thus, since $R$ lies between $k^{2}$ and $\widetilde{O}(k^{3})$, we can locate the frequency band of $k$-Fourier-sparse signals to within $\Delta+\widetilde{O}(R)/T$. This improves on the results in [5] in several ways.
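Here is one way to see the lower bound $R\gtrsim k^{2}$. This is our illustration, not necessarily the argument used in the paper. Take degree-$d$ polynomials on $[-1,1]$ with the uniform measure. The peak-to-average ratio of $|p|^{2}$ is at most $(d+1)^{2}$, and the Christoffel-kernel polynomial $p(t)=\sum_{j\leq d}(2j+1)P_{j}(t)$, where $P_{j}$ are the Legendre polynomials, attains this bound. Since $k$-sparse signals approximate degree-$(k-1)$ polynomials arbitrarily well, $R$ can be pushed arbitrarily close to $k^{2}$. The sketch checks the ratio exactly with Gauss-Legendre quadrature; the helper name is ours.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def peak_to_average(d):
    """sup|p|^2 / E|p|^2 over t ~ Uniform[-1, 1] for p(t) = sum_{j<=d} (2j+1) P_j(t)."""
    p = leg.Legendre(2.0 * np.arange(d + 1) + 1.0)
    nodes, weights = leg.leggauss(d + 1)             # exact for polynomials of degree <= 2d+1
    mean_sq = np.sum(weights * p(nodes) ** 2) / 2.0  # weights sum to 2, the length of [-1, 1]
    peak_sq = p(1.0) ** 2                            # |P_j| <= 1 = P_j(1), so the sup is at t = 1
    return peak_sq / mean_sq


ratios = {d: peak_to_average(d) for d in range(1, 9)}
for d, r in ratios.items():
    print(f"degree {d}: ratio {r:.3f}, (d+1)^2 = {(d + 1) ** 2}")
```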

Formally, for a given sparsity level $k$ , we consider signals in

$$\mathcal{F}:=\Big\{g(t)=\sum_{j=1}^{k}v_{j}e^{2\pi\boldsymbol{i}f_{j}t}\ \Big|\ f_{j}\in[-F,F]\Big\}.$$

Theorem 1.3.

For any $k$ and $T$ ,

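$$R:=\sup_{g\in\mathcal{F}}\frac{\sup_{x\in[-T,T]}|g(x)|^{2}}{\operatorname*{\mathbb{E}}_{x\in[-T,T]}\big[|g(x)|^{2}\big]}=O(k^{3}\log^{2}k).\tag{1}$$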

It was previously known that $R\lesssim k^{4}\log^{3}k$ [5], and this fact was used in [2]. (Thus, our improved bound on $R$ immediately implies an improvement in Theorem 8 of [2], from $s_{\mu,\varepsilon}^{5}\log^{3}s_{\mu,\varepsilon}$ to $s_{\mu,\varepsilon}^{4}\log^{2}s_{\mu,\varepsilon}$ .)

Next we bound the growth $S=\widetilde{O}(k^{2})$ for any $|t|\geq T$ .

Theorem 1.4.

There exists $S=O(k^{2}\log k)$ such that for any $|t|>T$ and $g(t)=\sum_{j=1}^{k}v_{j}\cdot e^{2\pi\boldsymbol{i}f_{j}t}$, $|g(t)|^{2}\leq\mathsf{poly}(k)\cdot\underset{x\in[-T,T]}{\operatorname*{\mathbb{E}}}[|g(x)|^{2}]\cdot|\frac{t}{T}|^{S}$.

This is analogous to Theorem 5.5 of [5], which proves a bound of $(kt)^{k}$ rather than $t^{\widetilde{O}(k^{2})}$ . These bounds are incomparable, but the $t^{\widetilde{O}(k^{2})}$ bound is actually more useful for this problem: what really matters is showing that $g(t)$ is not too large just outside the interval. Theorem 1.4 gives the “correct” polynomial dependence at $t=(1+1/k^{2})T$ .
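As a quick check of the last claim, here is our own calculation. It takes $T=1$ and $S=Ck^{2}\log k$ for a constant $C$, and uses $1+x\leq e^{x}$. At $t=1+\frac{1}{k^{2}}$,

$$\Big(1+\frac{1}{k^{2}}\Big)^{S}=\Big(1+\frac{1}{k^{2}}\Big)^{Ck^{2}\log k}\leq\exp\Big(\frac{Ck^{2}\log k}{k^{2}}\Big)=k^{C},$$

whereas at the same point $(kt)^{k}\geq k^{k}$. So just outside the interval, the $t^{\widetilde{O}(k^{2})}$ bound is polynomial in $k$, while the $(kt)^{k}$ bound is already superpolynomial.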

We can now apply Theorem 1.2 to get an efficient algorithm that recovers the center of a cluster of $k$ frequencies to within $\Delta+\tilde{O}(R)/T$.

Theorem 1.5.

Given $F,T,$ and $k$, let $R$ be the ratio between the maximum and average value of continuous $k$-Fourier-sparse signals defined in (1). Given $\Delta$, let $g(t)$ be a $k$-Fourier-sparse signal centered around $f_{0}$: $g(t)=\sum_{i\in[k]}v_{i}\cdot e^{2\pi\mathbf{i}f_{i}t}$ where each $f_{i}\in[f_{0}-\Delta,f_{0}+\Delta]$, and let $y(t)=g(t)+\eta(t)$ be the observable signal on $[-T,T]$, where $\underset{t\in[-T,T]}{\operatorname*{\mathbb{E}}}\big{[}|\eta(t)|^{2}\big{]}\leq\epsilon\cdot\underset{t\in[-T,T]}{\operatorname*{\mathbb{E}}}\big{[}|g(t)|^{2}\big{]}$ for a sufficiently small constant $\epsilon$.

For any $\delta>0$ , there exist $\Delta^{\prime}=\Delta+\frac{\tilde{O}(R)}{T}$ and an efficient algorithm that takes $O(k\log^{2}k\log\frac{F}{\Delta^{\prime}\cdot\delta})$ samples from $y(t)$ and outputs $\widetilde{f}$ satisfying $|f_{0}-\widetilde{f}|\leq O(\Delta^{\prime})$ with probability at least $1-\delta$ .

Note that the sample complexity here is $\widetilde{O}(k)$ not $\widetilde{O}(R)$ . This is because, based on the structure of the problem, we can use a nonuniform sampling procedure that performs better. Otherwise this theorem is just Theorem 1.2 applied to the $R$ and $S$ from Theorems 1.3 and 1.4.

Theorem 1.5 is a direct improvement on Theorem 7.5 of [5], which for $T=1$ could estimate to within $O\left(\Delta+\widetilde{O}(k^{5})\right)^{1.5}$ accuracy and used $\text{poly}(k)$ samples. In particular, in addition to improving the additive $\text{poly}(k)$ term, our result avoids a multiplicative increase in the bandwidth $\Delta$ of $g$ .

The main technical lemma in proving Theorems 1.2 and 1.5 is a filter function $H$ with a compactly supported Fourier transform $\widehat{H}$ that simulates a box function on $[-T,T]$ for any $g$ satisfying the conditions in Theorem 1.2.

Lemma 1.6.

Given any $T$ , $S$ , and $R$ , there exists a filter function $H$ with $\big{|}\mathsf{supp}(\widehat{H})\big{|}\leq\frac{\tilde{O}(R+S)}{T}$ such that for any $g(t)$ satisfying the second and third conditions in Theorem 1.2,

1. $H$ is close to a box function on $[-T,T]$: $\int_{-T}^{T}|g(t)\cdot H(t)|^{2}\mathrm{d}t\geq 0.9\int_{-T}^{T}|g(t)|^{2}\mathrm{d}t$.
2. The tail of $H(t)\cdot g(t)$ is small: $\int_{-T}^{T}|g(t)\cdot H(t)|^{2}\mathrm{d}t\geq 0.95\int_{-\infty}^{\infty}|g(t)\cdot H(t)|^{2}\mathrm{d}t$.

Organization

We introduce some notation and tools in Section 2 and give a technical overview in Section 3. We present our filter function and prove Lemma 1.6 in Section 4. Next, we present the frequency estimation algorithm of Theorem 1.2 in Section 5. Finally, we prove the results on sparse Fourier transforms, Theorems 1.3 and 1.4, in Section 6.

2 Preliminaries

In the rest of this work, we fix the observation interval to be $[-1,1]$ and define

$$\|g\|_{2}=\Big(\operatorname*{\mathbb{E}}_{x\sim[-1,1]}|g(x)|^{2}\Big)^{1/2},\tag{2}$$

because we could rescale $[-T,T]$ to $[-1,1]$ and $[-F,F]$ to $[-FT,FT]$ .

We first review several facts about the Fourier transform. The Fourier transform $\widehat{g}(f)$ of an integrable function $g:\mathbb{R}\rightarrow\mathbb{C}$ is

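$$\widehat{g}(f)=\int_{-\infty}^{+\infty}g(t)e^{-2\pi\mathbf{i}ft}\,\mathrm{d}t\quad\text{for any real }f.$$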

We use $g\cdot h$ to denote the pointwise product $g(t)\cdot h(t)$ and $g^{k}$ to denote $\underbrace{g(t)\cdots g(t)}_{k}$. Similarly, we use $g*h$ to denote the convolution of $g$ and $h$: $\int_{-\infty}^{+\infty}g(x)\cdot h(t-x)\mathrm{d}x$. In this work, we always use $g^{*k}$ for the $k$-fold convolution $\underbrace{g(t)*\cdots*g(t)}_{k}$. Notice that $\mathsf{supp}(g\cdot h)\subseteq\mathsf{supp}(g)\cap\mathsf{supp}(h)$ and $\mathsf{supp}(g*h)\subseteq\mathsf{supp}(g)+\mathsf{supp}(h)$.

We define the box function and its Fourier transform, the $\operatorname{sinc}$ function, as follows. Given a width $s>0$, the box function is $\operatorname{rect}_{s}(t)=1/s$ if $|t|\leq s/2$ and $0$ otherwise; its Fourier transform is $\operatorname{sinc}(sf)=\frac{\sin(\pi fs)}{\pi fs}$ for any $f$.
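The normalization of $\operatorname{rect}_{s}$ can be checked numerically. With $\operatorname{rect}_{s}=1/s$ on $|t|\leq s/2$, its Fourier transform matches `numpy.sinc(s * f)`, because NumPy's `sinc` is the normalized $\sin(\pi x)/(\pi x)$. This is a minimal sketch; the helper name `rect_fourier_numeric` and the test frequencies are our choices.

```python
import numpy as np


def rect_fourier_numeric(s, f, num=200_001):
    """Trapezoid-rule value of the integral of rect_s(t) e^{-2 pi i f t} dt, rect_s = 1/s on |t| <= s/2."""
    t = np.linspace(-s / 2, s / 2, num)
    integrand = np.exp(-2j * np.pi * f * t) / s
    dt = t[1] - t[0]
    return np.sum((integrand[1:] + integrand[:-1]) / 2) * dt


s = 2.0
freqs = np.array([0.0, 0.1, 0.37, 1.0, 2.5])
numeric = np.array([rect_fourier_numeric(s, f) for f in freqs])
closed_form = np.sinc(s * freqs)    # np.sinc(x) = sin(pi x) / (pi x), i.e. sinc(s f) above
print(np.max(np.abs(numeric - closed_form)))
```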

We state the Chernoff bound for random sampling [4].

Lemma 2.1.

Let $X_{1},X_{2},\cdots,X_{n}$ be independent random variables in $[0,R]$ with expectation $1$. For any $\varepsilon<1/2$ and $n\gtrsim\frac{R}{\varepsilon^{2}}$, the average $X=\frac{\sum_{i=1}^{n}X_{i}}{n}$ satisfies

$$\Pr\big[|X-1|\geq\varepsilon\big]\leq 2\exp\Big(-\frac{\varepsilon^{2}}{3}\cdot\frac{n}{R}\Big).$$
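Below is a small simulation of Lemma 2.1. The distribution, $R=50$, $\varepsilon=0.25$, and the constant $12$ in $n$ are our choices. Each $X_{i}$ equals $R$ with probability $1/R$ and $0$ otherwise. This is the highest-variance way to have values in $[0,R]$ with mean $1$. Concentration of this kind is what makes a sample count proportional to $R$ natural in Theorem 1.2.

```python
import numpy as np

rng = np.random.default_rng(0)
R, eps = 50, 0.25
n = int(np.ceil(12 * R / eps**2))       # n >~ R / eps^2, with the constant 12 chosen for illustration
trials = 20_000

# X_i = R with probability 1/R and 0 otherwise: values in [0, R] with expectation 1.
hits = rng.binomial(n, 1.0 / R, size=trials)
X = R * hits / n                          # the average of X_1, ..., X_n in each trial
empirical = np.mean(np.abs(X - 1.0) >= eps)
bound = 2 * np.exp(-(eps**2) / 3 * n / R)
print(f"n = {n}, empirical failure rate {empirical:.4f}, Chernoff bound {bound:.4f}")
```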

3 Proof Overview

We first outline the proofs of Lemma 1.6 and Theorem 1.2. Then we sketch the proofs of $R=\tilde{O}(k^{3})$ and $S=\tilde{O}(k^{2})$ for $k$-Fourier-sparse signals.

The filter functions $(H,\widehat{H})$ in Lemma 1.6.

Ideally, to satisfy the two claims in Lemma 1.6, we could set $H(t)$ to be the box function $2\operatorname{rect}_{2}(t)$ on $[-1,1]$ . However, by the uncertainty principle, it is impossible to make its Fourier transform $\widehat{H}$ compact using such an $H(t)$ . Hence our construction of $(H,\widehat{H})$ is in the inverse direction: we build $\widehat{H}(f)$ by box functions and $H(t)$ by the Fourier transform of box functions — the sinc function. In the rest of this discussion, we focus on using the sinc function to prove Lemma 1.6 given the properties of $g$ in Theorem 1.2.

We first notice that any $H$ with the following two properties is effective in Lemma 1.6 for $g$ satisfying $|g(t)|^{2}\leq R\cdot\|g\|_{2}^{2}$ for any $|t|\leq 1$ and $|g(t)|^{2}\leq\mathsf{poly}(R)\|g\|^{2}_{2}\cdot|t|^{S}$ for $|t|>1$ :

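1. $H(t)=1\pm 0.01$ for any $t\in[-1+\frac{1}{C\cdot R},1-\frac{1}{C\cdot R}]$, for a large constant $C$. This gives
$$\int_{-1}^{1}|H(t)\cdot g(t)|^{2}\,\mathrm{d}t\geq 0.99^{2}\int_{-1+\frac{1}{C\cdot R}}^{1-\frac{1}{C\cdot R}}|g(t)|^{2}\,\mathrm{d}t.$$
Because $|g(t)|^{2}\leq R\cdot\|g\|_{2}^{2}$ on the two end strips $[-1,1]\setminus[-1+\frac{1}{C\cdot R},1-\frac{1}{C\cdot R}]$, whose total length is $\frac{2}{C\cdot R}$, these strips carry at most a $\frac{1}{C}$ fraction of $\int_{-1}^{1}|g(t)|^{2}\mathrm{d}t$. Hence the R.H.S. is at least $0.99^{2}\cdot(1-\frac{1}{C})\int_{-1}^{1}|g(t)|^{2}\mathrm{d}t\geq 0.9\int_{-1}^{1}|g(t)|^{2}\mathrm{d}t$, which implies the first claim of Lemma 1.6.
2. $H(t)$ declines to $\frac{1}{\mathsf{poly}(R)\cdot t^{2S}}$ for any $|t|>1$. This gives
$$\int_{1}^{\infty}|H(t)\cdot g(t)|^{2}\,\mathrm{d}t\leq 0.01\int_{-1}^{1}|g(t)|^{2}\,\mathrm{d}t,$$
which implies the second claim.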

For ease of exposition, we start with $S=0$. Using the sinc function, we plan to design a filter $H_{0}(t)$ with compactly supported $\widehat{H_{0}}$ whose value drops from $0.99$ at $t=1-\frac{1}{C\cdot R}$ to $\frac{1}{\mathsf{poly}(R)}$ at $t=1$, i.e., within a window of width $\frac{1}{CR}$. To apply the sinc function, we notice that

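$$\operatorname{sinc}(CR\cdot t)^{O(\log R)}=\Big(\frac{\sin(\pi CR\cdot t)}{\pi CR\cdot t}\Big)^{O(\log R)}$$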

decays from $1$ at $t=0$ to at most $1/\mathsf{poly}(R)$ for $|t|\geq\frac{1}{C\cdot R}$ (since $|\operatorname{sinc}(x)|\leq\frac{1}{\pi|x|}$). This matches the drop of $H_{0}(t)$ from $t=1-\frac{1}{C\cdot R}$ to $t=1$.

Then, to make $H_{0}(t)\approx 1$ for any $|t|\leq 1-\frac{1}{C\cdot R}$, consider the convolution of $\operatorname{rect}_{1}(t)$ with $\operatorname{sinc}(CR\cdot t)^{O(\log R)}$. Because most of the mass of the latter lies in $[-\frac{1}{CR},\frac{1}{CR}]$, this convolution stays almost constant on $[-\frac{1}{2}+\frac{1}{CR},\frac{1}{2}-\frac{1}{CR}]$ and drops to $1/\mathsf{poly}(R)$ at $t=\frac{1}{2}+\frac{1}{CR}$. At the same time, it preserves the compact support of $\widehat{H_{0}}$, since convolution in time corresponds to a pointwise product in the Fourier domain. Normalizing and rescaling gives the desired $(H_{0},\widehat{H_{0}})$ for $S=0$.
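The $S=0$ construction can be reproduced numerically. The sketch below is our illustration: $C\cdot R=100$ and the even power $p=8$ are arbitrary stand-ins for $C\cdot R$ and $O(\log R)$. It normalizes $\operatorname{sinc}(CR\cdot t)^{p}$ to unit mass, convolves it with $\operatorname{rect}_{1}$, and evaluates $H_{0}$ at the center, just inside the edge, at the edge, and just outside. Its Fourier transform is $\operatorname{sinc}(f)$ times $\operatorname{rect}_{CR}^{*p}$, so it is supported on $[-pCR/2,pCR/2]$.

```python
import numpy as np

CR = 100.0    # the product C*R (illustrative value)
p = 8         # even power standing in for O(log R), so the kernel is nonnegative

# Work in the rescaled variable x = CR * t, where the kernel is sinc(x)^p.
x = np.linspace(-150.0, 150.0, 300_001)
kernel = np.sinc(x) ** p                       # np.sinc(x) = sin(pi x)/(pi x)
dx = x[1] - x[0]
cdf = np.concatenate(([0.0], np.cumsum((kernel[1:] + kernel[:-1]) * dx / 2)))
total = cdf[-1]


def H0(t):
    """(rect_1 * normalized sinc(CR t)^p)(t): kernel mass on [CR(t - 1/2), CR(t + 1/2)]."""
    lo, hi = CR * (t - 0.5), CR * (t + 0.5)
    return (np.interp(hi, x, cdf) - np.interp(lo, x, cdf)) / total


for t in [0.0, 0.5 - 1 / CR, 0.5, 0.5 + 1 / CR]:
    print(f"H0({t:.2f}) = {H0(t):.6f}")
```

Between $t=\frac{1}{2}-\frac{1}{CR}$ and $t=\frac{1}{2}+\frac{1}{CR}$ the value falls from nearly $1$ to nearly $0$. Raising $p$ makes both ends sharper and widens the Fourier support proportionally.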

Next we describe the construction for $S>0$. The high-level idea is to split the decay of $H(t)$ into $\log_{2}S+O(1)$ segments rather than the single segment used for $S=0$:

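$$\Big(1-\frac{1}{CR},1\Big],\ \Big(1,1+\frac{1}{S}\Big],\ \Big(1+\frac{1}{S},1+\frac{2}{S}\Big],\ \ldots,\ \Big(1+\frac{2^{j}}{S},1+\frac{2^{j+1}}{S}\Big],\ \ldots,\ \Big(1+\frac{S/2}{S},2\Big],\ (2,+\infty).$$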

For each segment, we provide a power of a sinc function matching the required decay of $H(t)$ on it, as in the construction of $H_{0}$ on $(1-\frac{1}{CR},1]$. The final construction is the convolution of the pointwise product of all these sinc powers with a box function; it appears in Section 4.
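For $S$ a power of two, the segment list can be generated as follows. This is a sketch; the function name `decay_segments` and the sample values are ours. It produces $\log_{2}S+3$ contiguous segments, consistent with the $\log_{2}S+O(1)$ count.

```python
import math


def decay_segments(CR, S):
    """Segments on which H(t) must decay, for t > 1 - 1/(CR); assumes S is a power of two."""
    segs = [(1 - 1 / CR, 1.0), (1.0, 1 + 1 / S)]
    j = 0
    while 2 ** (j + 1) <= S:                   # (1 + 2^j/S, 1 + 2^{j+1}/S], the last one ends at 2
        segs.append((1 + 2**j / S, 1 + 2 ** (j + 1) / S))
        j += 1
    segs.append((2.0, math.inf))
    return segs


segments = decay_segments(CR=100.0, S=16)
for a, b in segments:
    print(f"({a:.4f}, {b}]")
```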

The Algorithm of Theorem 1.2.

Now we show how to estimate $f_{0}$ given the observable signal $y=g+\eta$, where $\mathsf{supp}(\widehat{g})\subseteq[f_{0}-\Delta,f_{0}+\Delta]$ and $\|\eta\|_{2}^{2}\leq\varepsilon\|g\|_{2}^{2}$ (with the $\ell_{2}$ norm taken over $[-T,T]$ as defined in (2)). We instead consider $y_{H}(t)=y(t)\cdot H(t)$ with the filter function $(H,\widehat{H})$ from Lemma 1.6 and the corresponding pointwise products $g_{H}=g\cdot H$ and $\eta_{H}=\eta\cdot H$. The starting point is that, for a sufficiently small $\beta$, we expect

$$y_{H}(t+\beta)\approx e^{2\pi\boldsymbol{i}f_{0}\beta}\cdot y_{H}(t)$$

because $y_{H}$ has Fourier spectrum concentrated around $f_{0}$ . This does not hold for all $t$ , but it does hold on average:

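$$\int_{-1}^{1}\big|y_{H}(t+\beta)-e^{2\pi\boldsymbol{i}f_{0}\beta}\cdot y_{H}(t)\big|^{2}\,\mathrm{d}t\lesssim\varepsilon\cdot\int_{-1}^{1}|y_{H}(t)|^{2}\,\mathrm{d}t.\tag{3}$$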

This is because we can use Parseval's identity to replace these integrals by integrals over the Fourier domain. Parseval's identity would apply directly if the integrals ran from $-\infty$ to $\infty$, and because of the filter function $H$, relatively little mass of $y_{H}$ lies outside $[-1,1]$. The Fourier transform of the term inside the square on the left is $e^{2\pi\boldsymbol{i}f\beta}\cdot\widehat{y_{H}}(f)-e^{2\pi\boldsymbol{i}f_{0}\beta}\cdot\widehat{y_{H}}(f)$. Note that $\widehat{y_{H}}=\widehat{g_{H}}+\widehat{\eta_{H}}$ has most of its $\ell_{2}$ mass in $\mathsf{supp}(\widehat{g_{H}})\subseteq[f_{0}-\Delta^{\prime},f_{0}+\Delta^{\prime}]$ for $\Delta^{\prime}=\Delta+|\mathsf{supp}(\widehat{H})|$, and each such frequency is scaled on the left by a factor $|e^{2\pi\boldsymbol{i}f\beta}-e^{2\pi\boldsymbol{i}f_{0}\beta}|=O(\beta\Delta^{\prime})$. Thus, for $\beta\ll 1/\Delta^{\prime}$, (3) holds.
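The Parseval step can be seen exactly in a discrete, periodic setting, where Parseval's identity holds without any filter. This is a simplifying assumption of ours: the periodic signal stands in for $y_{H}$, and the grid size and band are arbitrary. The energy of $y(\cdot+\beta)-e^{2\pi\boldsymbol{i}f_{0}\beta}y$ relative to that of $y$ is a weighted average of $|e^{2\pi\boldsymbol{i}(f-f_{0})\beta}-1|^{2}\leq(2\pi\beta\Delta)^{2}$ over the band.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4096                                   # periodic grid t_n = n / N on [0, 1)
f0, delta = 300, 5                         # spectrum on integer frequencies in [f0 - delta, f0 + delta]
freqs = np.arange(f0 - delta, f0 + delta + 1)
coef = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
n = np.arange(N)
y = (coef[:, None] * np.exp(2j * np.pi * freqs[:, None] * n / N)).sum(axis=0)


def shift_error(b):
    """Energy ratio ||y(. + beta) - e^{2 pi i f0 beta} y||^2 / ||y||^2 for beta = b / N,
    together with the bound (2 pi beta delta)^2 from the per-frequency factor."""
    beta = b / N
    diff = np.roll(y, -b) - np.exp(2j * np.pi * f0 * beta) * y   # np.roll(y, -b)[n] = y[n + b]
    ratio = np.sum(np.abs(diff) ** 2) / np.sum(np.abs(y) ** 2)
    return ratio, (2 * np.pi * beta * delta) ** 2


results = {b: shift_error(b) for b in [1, 4, 16, 64]}
for b, (ratio, bound) in results.items():
    print(f"beta = {b}/{N}: ratio {ratio:.2e} <= bound {bound:.2e}")
```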

To learn $f_{0}$ through $e^{2\pi\boldsymbol{i}f_{0}\beta}$ , we design a sampling procedure to output $\alpha$ satisfying

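$$\big|y_{H}(\alpha+\beta)-e^{2\pi\boldsymbol{i}f_{0}\beta}\,y_{H}(\alpha)\big|\leq 0.3\cdot|y_{H}(\alpha)|\quad\text{with probability more than }1/2.$$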

Even though the above discussion shows that the left-hand side is smaller than the right-hand side on average, a uniformly random $\alpha\sim[-1,1]$ may not satisfy the inequality with good probability. The reason is that $|y_{H}(\alpha)|\geq\|y_{H}\|_{2}$ may hold only for a $1/R$ fraction of $\alpha\in[-1,1]$, while the adversarial noise $\eta$ may have $\|\eta\|_{2}^{2}\gtrsim\varepsilon\|y_{H}\|_{2}^{2}$ for a constant $\varepsilon\gg 1/R$. At the same time, even if some of many points $\alpha_{1},\ldots,\alpha_{m}$ satisfy the inequality, we cannot verify which ones do, since $f_{0}$ is unknown. We solve this with importance sampling: for $m=O(R)$ random samples $\alpha_{1},\ldots,\alpha_{m}\in[-1,1]$, we output $\alpha=\alpha_{i}$ with probability proportional to the weight $|y_{H}(\alpha_{i})|^{2}$.
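A minimal sketch of this importance-sampling step follows. The callable `y_H` is a hypothetical stand-in for the filtered observations $y(\alpha)\cdot H(\alpha)$, and the choices of $m$, the test signal, and the seed are ours. The example puts most of the energy on the short stretch $[0.9,1]$. Uniform sampling would land there only about $5\%$ of the time, while the weighted choice lands there most of the time.

```python
import numpy as np


def importance_sample_alpha(y_H, m, rng):
    """Draw m uniform points in [-1, 1] and return one of them with probability
    proportional to |y_H(alpha_i)|^2. `y_H` is any vectorized callable (hypothetical)."""
    alphas = rng.uniform(-1.0, 1.0, size=m)
    weights = np.abs(y_H(alphas)) ** 2
    if weights.sum() == 0:                 # degenerate case: fall back to a uniform choice
        return rng.choice(alphas)
    return rng.choice(alphas, p=weights / weights.sum())


# Example: most of the energy sits on the short stretch [0.9, 1].
peaked = lambda t: 1.0 + 20.0 * (t >= 0.9)
rng = np.random.default_rng(3)
picks = np.array([importance_sample_alpha(peaked, m=200, rng=rng) for _ in range(500)])
print(f"fraction of picks in [0.9, 1]: {np.mean(picks >= 0.9):.2f}")
```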

We prove the correctness of this sampling procedure in Lemma 5.2 in Section 5.

Finally, learning $e^{2\pi\mathbf{i}f_{0}\beta}$ is not enough to learn $f_{0}$ : because of the noise, we only learn $e^{2\pi\mathbf{i}f_{0}\beta}$ to within a constant $\varepsilon$ , which gives $f_{0}$ to within $\pm O(\varepsilon/\beta)$ ; and because of the different branches of the complex logarithm, this is only up to integer multiples of $1/\beta$ . Therefore to fully learn $f_{0}$ , we repeat the sampling procedure at logarithmically many different scales of $\beta$ , from $\beta=1/2F$ to $\beta=\frac{\Theta(1)}{\Delta^{\prime}}$ .
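Here is a sketch of the multi-scale step under simplifying assumptions of ours. A hypothetical `phase_oracle(beta)` returns $f_{0}\beta\bmod 1$ up to an additive error $\varepsilon<1/6$. In the paper this comes from the sampling procedure and holds only with probability more than $1/2$, so each scale would be repeated. We also start at $\beta=1/(4F)$ rather than $1/(2F)$ to leave slack for the noise. Doubling $\beta$ at each scale keeps the correct branch identifiable, and the final error is at most $\varepsilon/\beta_{\max}$ after $O(\log(F\beta_{\max}))$ scales.

```python
import numpy as np


def estimate_f0(phase_oracle, F, beta_max):
    """Recover f0 in [-F, F] from noisy phases; phase_oracle(beta) is assumed to return
    f0*beta mod 1 up to an additive error eps < 1/6 (hypothetical interface)."""
    f_hat, beta = 0.0, 1.0 / (4.0 * F)     # coarse enough that the first branch is unambiguous
    while True:
        theta = phase_oracle(beta)
        # f0 = (theta + n) / beta for some integer n; take the branch nearest f_hat.
        n = np.round(f_hat * beta - theta)
        f_hat = (theta + n) / beta          # now |f_hat - f0| <= eps / beta
        if beta >= beta_max:
            return f_hat
        beta *= 2.0                         # doubling keeps the next branch choice unambiguous


rng = np.random.default_rng(4)
f0, F, eps = 1234.567, 1e4, 0.1
oracle = lambda beta: (f0 * beta + rng.uniform(-eps, eps)) % 1.0
f_est = estimate_f0(oracle, F, beta_max=1.0)   # beta_max plays the role of Theta(1)/Delta'
print(f"estimate {f_est:.4f}, error {abs(f_est - f0):.4f}")
```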

$k$ -Fourier-sparse signals.

Finally, we show $R=\widetilde{O}(k^{3})$ and $S=\widetilde{O}(k^{2})$ such that for any $g(t)=\sum_{j=1}^{k}v_{j}\cdot e^{2\pi\boldsymbol{i}f_{j}t}$ — not necessarily one with the $f_{j}$ clustered together—

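$$\frac{\sup_{t\in[-1,1]}|g(t)|^{2}}{\|g\|_{2}^{2}}\leq R\quad\text{and}\quad|g(t)|^{2}\leq\mathsf{poly}(R)\cdot\|g\|_{2}^{2}\cdot|t|^{S}\ \text{ for }|t|>1.$$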

We first review the previous argument for $R=\widetilde{O}(k^{4})$ [5]. The key point is to show, for some $d=\widetilde{O}(k^{2})$, that $g(1)$ is a linear combination of $g(1-\theta),\ldots,g(1-d\cdot\theta)$ with bounded integer coefficients $c_{1},\ldots,c_{d}=O(1)$ for any $\theta\leq\frac{2}{d}$. Then

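$$g(1)=\sum_{j\in[d]}c_{j}\cdot g(1-j\cdot\theta)\quad\text{implies}\quad|g(1)|^{2}\leq\Big(\sum_{j\in[d]}|c_{j}|^{2}\Big)\cdot\Big(\sum_{j\in[d]}|g(1-j\cdot\theta)|^{2}\Big).\tag{4}$$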

If we think of $g(1)$ as the supremum and $g(1-j\cdot\theta)$ as the average $\|g\|_{2}$ —which we can formally do up to logarithmic factors by averaging over $\theta$ —this shows $|g(1)|^{2}\leq\widetilde{O}(d^{2})\|g\|_{2}^{2}$ . One natural idea to improve it is to use a smaller value $d$ and a shorter linear combination [6]. However, $d=\tilde{\Omega}(k^{2})$ for such a combination when $g$ is approximately the degree $k-1$ Chebyshev polynomial. In this work, we use a geometric sequence to control $c_{j}$ such that $\sum_{j}|c_{j}|^{2}=O(d/k)$ instead of $O(d)$ , which provides an improvement of a factor $\widetilde{O}(k)$ on $R$ .
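The identity-plus-Cauchy-Schwarz pattern in (4) can be illustrated with the shortest possible recurrence. This uses Prony's length-$k$ recurrence, whose coefficients can be large, and not the paper's length-$d$ combination with bounded or geometrically controlled coefficients. The frequencies, $\theta$, and the seed are our choices. Stepping back by $\theta$ multiplies the $i$-th term by $z_{i}=e^{-2\pi\boldsymbol{i}f_{i}\theta}$, so the polynomial $\prod_{i}(x-z_{i})$ yields coefficients $c_{j}$ with $g(1)=\sum_{j\leq k}c_{j}\,g(1-j\theta)$.

```python
import numpy as np

rng = np.random.default_rng(2)
k, theta = 4, 0.05
f = rng.uniform(-3.0, 3.0, size=k)                       # arbitrary (not clustered) frequencies
v = rng.standard_normal(k) + 1j * rng.standard_normal(k)
g = lambda t: np.sum(v * np.exp(2j * np.pi * f * t))     # a k-Fourier-sparse signal

# a_j = g(1 - j*theta) = sum_i w_i z_i^j, so prod_i (x - z_i) annihilates the sequence.
z = np.exp(-2j * np.pi * f * theta)
p = np.poly(z)                                           # p[0] x^k + ... + p[k], with p[0] = 1
c = np.array([-p[k - j] / p[k] for j in range(1, k + 1)])  # g(1) = sum_j c_j g(1 - j*theta)

samples = np.array([g(1 - j * theta) for j in range(1, k + 1)])
lhs = g(1.0)
rhs = np.sum(c * samples)
cs_bound = np.sum(np.abs(c) ** 2) * np.sum(np.abs(samples) ** 2)   # Cauchy-Schwarz step of (4)
print(abs(lhs - rhs), abs(lhs) ** 2 <= cs_bound)
```

When the frequencies cluster, the $z_{i}$ crowd together and these Prony coefficients blow up. That is why the argument above trades a longer combination ($d=\widetilde{O}(k^{2})$) for small coefficients.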

Then we bound $S=\widetilde{O}(k^{2})$, which controls $g(t)$ for $|t|>1$. The intuition is as follows. Since (4) expresses any $g(t)$ in terms of $g(t-\theta),\ldots,g(t-d\cdot\theta)$ with $\theta=\frac{2}{d}$, it implies $|g(t)|^{2}\leq\mathsf{poly}(k)\cdot\|g\|_{2}^{2}\cdot e^{(t-1)\cdot O(d)}$ for $t>1$. Combining this with the alternative bound $|g(t)|^{2}\leq\mathsf{poly}(k)\cdot\|g\|_{2}^{2}\cdot(k\cdot t)^{O(k)}$ for $t>1+1/k$ completes the proof of Theorem 1.4 on $S$.

In addition, the sample complexity in Theorem 1.5 can be improved to $\widetilde{O}(k)\log\frac{F}{\Delta^{\prime}}$ by drawing $\alpha$ from a biased distribution [6]. These results on $k$-Fourier-sparse signals appear in Section 6.

4 Our Filter Function

The main result of this section is an explicit filter function $H$ whose Fourier transform $\widehat{H}$ has compact support. For any $g$ satisfying the conditions in Theorem 1.2, $H$ is close to the box function on $[-1,1]$.

We define the filter function as follows.

Definition 4.1.

Given $R$ , the growth rate $S$ and an even constant $C$ , we define the filter function

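$$H(t)=s_{0}\cdot\Big(\operatorname{sinc}(CR\cdot t)^{C\log R}\cdot\prod_{j=0}^{\log_{2}S}\operatorname{sinc}\big(2^{-j}\cdot C\cdot S\cdot t\big)^{2^{j}\cdot C}\Big)*\operatorname{rect}_{2}(t),$$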

where $s_{0}\in\mathbb{R}^{+}$ is a normalization parameter chosen so that $H(0)=1$. Its Fourier transform is

[TABLE]

whose support size is $O(CR\cdot C\log R+CS\cdot C+\cdots+C\cdot C\cdot S)=O(R\log R+S\log S)$ .
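To make Definition 4.1 concrete, the following Python sketch evaluates $H$ numerically. It assumes the product form used in Section 7.1: $H(t)=s_{0}\cdot\int_{t-1/2}^{t+1/2}g(x)\mathrm{d}x$ with $g(t)=\operatorname{sinc}(CR\cdot t)^{C\log R}\cdot\prod_{j=0}^{\log_{2}S}\operatorname{sinc}(2^{-j}\cdot C\cdot S\cdot t)^{2^{j}\cdot C}$. It reads $\log$ as $\log_{2}$ so the exponents are integers. It uses the small illustrative values $R=16$, $S=4$, $C=4$, although the analysis needs a large constant $C$. The integral is approximated by the trapezoid rule, and the function names are hypothetical.

```python
import numpy as np


def sinc_product(t, R, S, C):
    """g(t) = sinc(C R t)^(C log2 R) * prod_{j=0}^{log2 S} sinc(2^-j C S t)^(2^j C).

    np.sinc is the normalized sinc sin(pi x)/(pi x); R, S, C are assumed to be
    powers of two with C even, so every exponent is an even integer.
    """
    val = np.sinc(C * R * t) ** int(C * np.log2(R))
    for j in range(int(np.log2(S)) + 1):
        val = val * np.sinc(2.0 ** (-j) * C * S * t) ** (2 ** j * C)
    return val


def window_integral(t, R, S, C, n=200_001):
    """Trapezoid-rule approximation of the integral of g over [t - 1/2, t + 1/2]."""
    x = np.linspace(t - 0.5, t + 0.5, n)
    y = sinc_product(x, R, S, C)
    return (x[1] - x[0]) * (y.sum() - 0.5 * (y[0] + y[-1]))


def filter_H(t, R=16, S=4, C=4):
    """H(t) = s0 * (g convolved with the width-1 box)(t), with s0 set so that H(0) = 1."""
    s0 = 1.0 / window_integral(0.0, R, S, C)
    return s0 * window_integral(t, R, S, C)


for t in (0.0, 0.3, 0.5, 1.0):
    print(t, filter_H(t))
```

Because $C$ is even, every exponent is even and $g\geq 0$, so $H$ is a smoothed box. In this example, $H(0)=1$ and $H$ stays near 1 well inside $[-\frac{1}{2},\frac{1}{2}]$. It drops to about $\frac{1}{2}$ at $t=\pm\frac{1}{2}$ and is below $10^{-6}$ at $t=1$.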

We prove Lemma 1.6 using $H(\alpha x)$ with a large constant $C$ and a scale parameter $\alpha=\frac{1}{2}+\frac{1.2}{\pi CR}$ . For convenience, we state the full version of Lemma 1.6 for $T=1$ as follows.

Theorem 4.2.

Let $R,S>0$, let $C$ be a large even constant, and define $\alpha=\frac{1}{2}+\frac{1.2}{\pi CR}$. Consider any function $g$ satisfying the following two conditions:

1. $\sup_{t\in[-1,1]}|g(t)|^{2}\leq R\cdot\|g\|_{2}^{2}$.
2. $|g(t)|^{2}\leq\mathsf{poly}(R)\cdot\|g\|^{2}_{2}\cdot|t|^{S}$ for $t\notin[-1,1]$.

Then the filter function $H(\alpha x)$ is such that $H(\alpha x)\cdot g(x)$ satisfies:

1. $\int_{-1}^{1}|g(x)\cdot H(\alpha x)|^{2}\mathrm{d}x\geq 0.9\int_{-1}^{1}|g(x)|^{2}\mathrm{d}x$.
2. $\int_{-1}^{1}|g(x)\cdot H(\alpha x)|^{2}\mathrm{d}x\geq 0.95\int_{-\infty}^{\infty}|g(x)\cdot H(\alpha x)|^{2}\mathrm{d}x$.
3. $|H(x)|\leq 1.01$ for any $x$.

For completeness, we show a few properties of $H$ and finish the proof of Theorem 4.2 in Appendix 7.

5 Frequency Estimation

We show the algorithm for frequency estimation and prove Theorem 1.2 in this section. We fix $T=1$ and use the definition $\|h\|_{2}^{2}=\underset{x\sim[-1,1]}{\operatorname*{\mathbb{E}}}[|h(x)|^{2}]$ to restate the theorem.

Theorem 5.1.

Given any $F>0,\Delta>0,R$, and $S>0$, let $g(t)$ be a signal with the following properties:

1. $\mathsf{supp}(\widehat{g})\subseteq[f_{0}-\Delta,f_{0}+\Delta]$, where $f_{0}\in[-F,F]$.
2. $\sup_{t\in[-1,1]}|g(t)|^{2}\leq R\cdot\|g\|_{2}^{2}$.
3. $|g(t)|^{2}$ grows at most as $\mathsf{poly}(R)\cdot\|g\|^{2}_{2}\cdot|t|^{S}$ for $t\notin[-1,1]$.

Let $y(t)=g(t)+\eta(t)$ be the observable signal on $[-1,1]$ , where $\|\eta\|_{2}^{2}\leq\epsilon\cdot\|g\|_{2}^{2}$ for a sufficiently small constant $\epsilon$ . For $\Delta^{\prime}=\Delta+\widetilde{O}(R+S)$ and any $\delta$ , there exists an efficient algorithm that takes $O(R\log\frac{F}{\Delta^{\prime}\cdot\delta})$ samples from $y(t)$ and outputs $\widetilde{f}$ satisfying $|f_{0}-\widetilde{f}|\leq O(\Delta^{\prime})$ with probability at least $1-\delta$ .

For convenience, for any signal $h(t)$ we write $h_{H}(t)=h(t)\cdot H(\alpha t)$, where $H$ is the filter function and $\alpha$ the scale parameter in Theorem 4.2. In particular, $y_{H}(t)=y(t)\cdot H(\alpha t)$.

The observation $y(t)$ has most of its Fourier mass concentrated around $f_{0}$. The main technical result of this section estimates $e^{2\pi\mathbf{i}\beta f_{0}}$ via the approximation $y_{H}(\alpha)e^{2\pi if_{0}\beta}\approx y_{H}(\alpha+\beta)$.

Lemma 5.2.

Given parameters $F,R,S$ , and $\Delta$ , let $g$ be a signal satisfying the three conditions in Theorem 1.2 for some $f_{0}\in[-F,F]$ and $\Delta^{\prime}=\Delta+O(R\log R+S\log S)$ .

Let $y(t)=g(t)+\eta(t)$ be the observable signal on $[-1,1]$ where the noise $\|\eta\|^{2}_{2}\leq\epsilon\|g\|^{2}_{2}$ for a sufficiently small constant $\epsilon$ . There exist a constant $\gamma$ and an algorithm such that for any $\beta\leq\frac{\gamma}{\Delta^{\prime}}$ , it takes $O(R)$ samples to output $\alpha$ satisfying $|y_{H}(\alpha)e^{2\pi if_{0}\beta}-y_{H}(\alpha+\beta)|\leq 0.3|y_{H}(\alpha)|$ with probability at least 0.6.

We present our algorithm in Algorithm 1. We finish the proof of Theorem 5.1 here and defer the proof of Lemma 5.2 to Section 5.1.

Proof of Theorem 5.1. By Lemma 5.2, $\frac{y_{H}(\alpha+\beta)}{y_{H}(\alpha)}$ gives a good estimate of $e^{2\pi if_{0}\beta}$ with probability 0.6 for any $\beta\leq\frac{\gamma}{\Delta^{\prime}}$. We use the frequency search algorithm of Lemma 7.3 in [5] with the sampling procedure of Lemma 5.2. The algorithm in [5] invokes the sampling procedure $O(\log\frac{F}{\Delta^{\prime}\cdot\delta})$ times to return a frequency $\widetilde{f}$ satisfying $|\widetilde{f}-f_{0}|\leq\Delta^{\prime}$ with probability at least $1-\delta$. Hence the sample complexity is $O(R\cdot\log\frac{F}{\Delta^{\prime}\cdot\delta})$. ∎

5.1 Proof of Lemma 5.2

Writing $y_{H}(x)=g_{H}(x)+\eta_{H}(x)$, we first prove the following concentration bound for estimating $\|g_{H}\|_{2}^{2}$.

Claim 5.3.

Given any $g$ satisfying the three conditions in Theorem 1.2 and any $\varepsilon$ and $\delta$ , there exists $m=O(R\log\frac{1}{\delta}/\varepsilon^{2})$ such that for $m$ random samples $x_{1},\ldots,x_{m}\sim[-1,1]$ , with probability $1-\delta$ ,

$$\frac{1}{m}\sum_{i=1}^{m}|g_{H}(x_{i})|^{2}\in[1-\varepsilon,1+\varepsilon]\cdot\|g_{H}\|_{2}^{2}.$$

Proof.

The inputs are $\sup_{x\in[-1,1]}|g(x)|^{2}\leq R\|g\|_{2}^{2}$, $|H|\leq 1.01$, and $\|g_{H}\|_{2}^{2}\geq 0.9\|g\|_{2}^{2}$ (Theorem 4.2). Together they give $\frac{\sup_{x\in[-1,1]}|g_{H}(x)|^{2}}{\operatorname*{\mathbb{E}}_{x\sim[-1,1]}[|g_{H}(x)|^{2}]}\leq 2R$. By the Chernoff bound in Lemma 2.1, $m=O(R\log\frac{1}{\delta}/\varepsilon^{2})$ samples suffice to estimate $\|g_{H}\|_{2}^{2}$. ∎

Next we consider the effect of the noise $\eta_{H}(x_{i})$ on $y_{H}(x_{i})$.

Claim 5.4.

With probability $0.9$ over $m$ random samples in $[-1,1]$ , $\sum_{i=1}^{m}|y_{H}(x_{i})|^{2}/m\geq 0.8\|g\|_{2}^{2}$ .

Proof.

By Theorem 4.2, $\|g_{H}\|_{2}^{2}\geq 0.9\|g\|_{2}^{2}$. Thus Claim 5.3 implies $\sum_{i=1}^{m}|g_{H}(x_{i})|^{2}/m\geq 0.98\cdot 0.9\|g\|_{2}^{2}$ for $m=O(R)$ with probability $0.99$.

At the same time, $\operatorname*{\mathbb{E}}[\sum_{i=1}^{m}|\eta_{H}(x_{i})|^{2}/m]=\|\eta_{H}\|_{2}^{2}$. By Markov's inequality, $\sum_{i=1}^{m}|\eta_{H}(x_{i})|^{2}/m\leq 14\|\eta_{H}\|_{2}^{2}$ with probability at least $1-\frac{1}{14}$. Since $|H|\leq 1.01$, this is at most $14\cdot 1.02^{2}\|\eta\|_{2}^{2}\leq 15\epsilon\|g\|_{2}^{2}$.

We have

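$$\sum_{i=1}^{m}|y_{H}(x_{i})|^{2}=\sum_{i=1}^{m}|g_{H}(x_{i})+\eta_{H}(x_{i})|^{2}\geq\sum_{i=1}^{m}|g_{H}(x_{i})|^{2}-2\sum_{i=1}^{m}|g_{H}(x_{i})|\cdot|\eta_{H}(x_{i})|.$$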

By the Cauchy-Schwarz inequality, the cross term satisfies $\sum_{i=1}^{m}|g_{H}(x_{i})|\cdot|\eta_{H}(x_{i})|\leq(\sum_{i=1}^{m}|g_{H}(x_{i})|^{2})^{1/2}\cdot(\sum_{i=1}^{m}|\eta_{H}(x_{i})|^{2})^{1/2}$. Combining the bounds above,

[TABLE]

When $\epsilon$ is a sufficiently small constant, this is at least $0.8\cdot\|g\|_{2}^{2}$. ∎

We set $z(t)=y_{H}(t)\cdot e^{2\pi\boldsymbol{i}f_{0}\beta}-y_{H}(t+\beta)$ for convenience and bound it as follows.

Claim 5.5.

Given any small constant $\gamma$, let $\Delta^{\prime}=\Delta+|\mathsf{supp}(\widehat{H})|$ and $z(t)=y_{H}(t)\cdot e^{2\pi\boldsymbol{i}f_{0}\beta}-y_{H}(t+\beta)$ for $\beta\leq\frac{\gamma}{\Delta^{\prime}}$. Then $\|z\|_{2}^{2}\lesssim(\gamma^{2}+\epsilon)\|g\|_{2}^{2}$.

Proof.

Notice that $y_{H}=g_{H}+\eta_{H}$, where $\mathsf{supp}(\widehat{g_{H}})\subseteq[f_{0}-\Delta^{\prime},f_{0}+\Delta^{\prime}]$, such that

[TABLE]

We bound $\|z\|_{2}^{2}$ through

[TABLE]

Therefore we write

[TABLE]

Because $f\in[f_{0}-\Delta^{\prime},f_{0}+\Delta^{\prime}]$ and $\beta\leq\frac{\gamma}{\Delta^{\prime}}$ , $|e^{2\pi\boldsymbol{i}f_{0}\beta}-e^{2\pi\boldsymbol{i}f\beta}|\leq 4\pi\gamma$ . So

[TABLE]

On the other hand,

[TABLE]

which is less than $5\epsilon\int_{-1}^{1}|g(t)|^{2}\mathrm{d}t.$

From all discussion above, $\int_{-1}^{1}|z(t)|^{2}\mathrm{d}t\lesssim(\gamma^{2}+\epsilon)\int_{-1}^{1}|g(t)|^{2}\mathrm{d}t$ . ∎

By Markov's inequality, we obtain the following corollary for sufficiently small $\gamma$ and $\epsilon$.

Corollary 5.6.

For sufficiently small constants $\gamma$ and $\epsilon$, with probability $0.9$ over $m$ random samples in $[-1,1]$, $\sum_{i=1}^{m}|z(x_{i})|^{2}/m\leq 0.01\|g\|_{2}^{2}$.

Finally we finish the proof of Lemma 5.2.

Proof of Lemma 5.2. Throughout this proof we assume that the events of Claim 5.4 and Corollary 5.6 both hold, i.e.,

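$$\frac{1}{m}\sum_{i=1}^{m}|y_{H}(x_{i})|^{2}\geq 0.8\|g\|_{2}^{2}\qquad\text{and}\qquad\frac{1}{m}\sum_{i=1}^{m}|z(x_{i})|^{2}\leq 0.01\|g\|_{2}^{2}.$$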

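Let $D_{m}$ denote the distribution that outputs $x_{i}$ with probability $\frac{|y_{H}(x_{i})|^{2}}{\sum_{j=1}^{m}|y_{H}(x_{j})|^{2}}$ for each $i\in[m]$; this is the importance sampling step of the algorithm. For a random sample $\alpha\sim D_{m}$, we bound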

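$$\operatorname*{\mathbb{E}}_{\alpha\sim D_{m}}\Big[\frac{|z(\alpha)|^{2}}{|y_{H}(\alpha)|^{2}}\Big]=\sum_{i=1}^{m}\frac{|y_{H}(x_{i})|^{2}}{\sum_{j=1}^{m}|y_{H}(x_{j})|^{2}}\cdot\frac{|z(x_{i})|^{2}}{|y_{H}(x_{i})|^{2}}.$$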

This equals $\frac{\sum_{i=1}^{m}|z(x_{i})|^{2}}{\sum_{j=1}^{m}|y_{H}(x_{j})|^{2}}\leq\frac{0.01}{0.8}$. By Markov's inequality, with probability at least $0.8$ over $\alpha\sim D_{m}$, the ratio $\frac{|y_{H}(\alpha)e^{2\pi if_{0}\beta}-y_{H}(\alpha+\beta)|^{2}}{|y_{H}(\alpha)|^{2}}$ is at most $0.05/0.8\leq 0.09$. Take a union bound over the failure probabilities of Claim 5.4 ($0.1$), Corollary 5.6 ($0.1$), and this Markov step ($0.2$). We conclude that $\frac{|y_{H}(\alpha)e^{2\pi if_{0}\beta}-y_{H}(\alpha+\beta)|}{|y_{H}(\alpha)|}\leq 0.3$ with probability at least $0.6$. ∎

6 Bounds on Fourier-sparse Signals

In this section we consider $g(t)=\sum_{j=1}^{k}v_{j}e^{2\pi\boldsymbol{i}f_{j}t}$, where each $f_{j}\in[f_{0}-\Delta,f_{0}+\Delta]$. The main result is a proof that $R=\tilde{O}(k^{3})$ and $S=\tilde{O}(k^{2})$ hold for $k$ arbitrary real frequencies. We restate Theorem 1.5 after fixing $T=1$.

Theorem 6.1.

Given $F$, $\Delta$, and $k$, let $g(t)$ be a $k$-Fourier-sparse signal centered around $f_{0}\in[-F,F]$: $g(t)=\sum_{i\in[k]}v_{i}\cdot e^{2\pi\mathbf{i}f_{i}t}$ where $f_{i}\in[f_{0}-\Delta,f_{0}+\Delta]$. Let $y(t)=g(t)+\eta(t)$ be the observable signal on $[-1,1]$, where $\|\eta\|_{2}^{2}\leq\epsilon\cdot\|g\|_{2}^{2}$ for a sufficiently small constant $\epsilon$.

For any $\delta>0$ , there exist $\Delta^{\prime}=\Delta+\tilde{O}(R)$ and an efficient algorithm that takes $O(k\log^{2}k\log\frac{F}{\Delta^{\prime}\cdot\delta})$ samples from $y(t)$ and outputs $\widetilde{f}$ satisfying $|f_{0}-\widetilde{f}|\leq O(\Delta^{\prime})$ with probability at least $1-\delta$ .

The main improvement is a biased distribution that reduces the sample complexity from $O(R)\cdot\log\frac{F}{\Delta^{\prime}\cdot\delta}$ to $\widetilde{O}(k)\cdot\log\frac{F}{\Delta^{\prime}\cdot\delta}$.

We provide the main technical lemma here and defer the proofs of Theorems 1.3, 1.4, and 6.1 to Appendix 8.

Theorem 6.2.

Given $z_{1},\ldots,z_{k}$ with $|z_{1}|=|z_{2}|=\cdots=|z_{k}|=1$, there exists a degree $d=O(k^{2}\log k)$ polynomial $P(z)=\sum_{j=0}^{d}c(j)\cdot z^{j}$ satisfying

1. $P(z_{i})=0$ for each $i\in[k]$.
2. Its coefficients satisfy $c(0)=\Omega(1)$, $c(j)=O(1)$, and $\sum_{j=1}^{d}|c(j)|^{2}=O(k)\cdot|c(0)|^{2}$.

Corollary 6.3.

Given any $g(t)=\sum_{j=1}^{k}v_{j}e^{2\pi\boldsymbol{i}f_{j}t}$ and $\theta>0$, there exist $d=O(k^{2}\log k)$ and a sequence of coefficients $(\alpha_{1},\ldots,\alpha_{d})$ such that

1. $\alpha_{j}=O(1)$ for any $j=1,\ldots,d$.
2. For any $x$ (not necessarily in $[-1,1]$), $g(x)=\sum_{j=1}^{d}\alpha_{j}\cdot g(x-j\theta)$.

Proof.

Given $\theta$, we set $z_{i}=e^{-2\pi\boldsymbol{i}f_{i}\theta}$ for each $i\in[k]$ and apply Theorem 6.2 to obtain coefficients $c(0),\ldots,c(d)$. Then we set $\alpha_{j}=-c(j)/c(0)$ for $j=1,\ldots,d$. The first property follows from $c(0)=\Omega(1)$ and $c(j)=O(1)$. The second follows from

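$$\sum_{j=0}^{d}c(j)\cdot g(x-j\theta)=\sum_{i=1}^{k}v_{i}e^{2\pi\boldsymbol{i}f_{i}x}\sum_{j=0}^{d}c(j)\cdot z_{i}^{j}=\sum_{i=1}^{k}v_{i}e^{2\pi\boldsymbol{i}f_{i}x}\cdot P(z_{i})=0.$$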

∎
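The identity behind Corollary 6.3 can be checked numerically. The sketch below uses the simplest annihilating polynomial, $P(z)=\prod_{i=1}^{k}(z-z_{i})$ with $z_{i}=e^{-2\pi\boldsymbol{i}f_{i}\theta}$. It has degree $k$ but no control on its coefficients. It is therefore not the bounded-coefficient polynomial of Theorem 6.2, whose existence proof is non-constructive. The helper names are hypothetical.

```python
import numpy as np


def annihilator_coeffs(freqs, theta):
    """Coefficients c(0), ..., c(k) (constant term first) of
    P(z) = prod_i (z - z_i) with z_i = exp(-2*pi*i*f_i*theta)."""
    z = np.exp(-2j * np.pi * np.asarray(freqs) * theta)
    return np.poly(z)[::-1]  # np.poly lists the leading coefficient first


def predict_from_shifts(g, x, freqs, theta):
    """Return sum_{j=1}^{k} alpha_j * g(x - j*theta) with alpha_j = -c(j)/c(0)."""
    c = annihilator_coeffs(freqs, theta)
    alpha = -c[1:] / c[0]
    return sum(a * g(x - (j + 1) * theta) for j, a in enumerate(alpha))


rng = np.random.default_rng(0)
freqs = rng.uniform(-3.0, 3.0, size=4)
amps = rng.normal(size=4) + 1j * rng.normal(size=4)


def g(x):
    """A 4-Fourier-sparse test signal sum_j amps_j * exp(2*pi*i*freqs_j*x)."""
    return np.exp(2j * np.pi * np.multiply.outer(x, freqs)) @ amps


xs = np.array([-0.7, 0.2, 1.9, 5.0])  # the identity also holds outside [-1, 1]
print(np.max(np.abs(g(xs) - predict_from_shifts(g, xs, freqs, 0.05))))
```

When the frequencies are clustered so that all $z_{i}$ are close to 1, this $P$ is close to $(z-1)^{k}$. Its coefficients are then the binomial coefficients $\binom{k}{j}$, which can be exponentially large in $k$. Theorem 6.2 trades a longer recurrence, $d=O(k^{2}\log k)$, for coefficients of size $O(1)$.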

The proof of Theorem 6.2 requires the following bound on the coefficients of residual polynomials, which is stated as Lemma 5.3 in [5].

Lemma 6.4.

Given $z_{1},\ldots,z_{k}$ , for any integer $n$ , let $r_{n,k}(z)=\sum_{i=0}^{k-1}r_{n,k}^{(i)}\cdot z^{i}$ denote the residual polynomial of $r_{n,k}\equiv z^{n}\mod\prod_{j=1}^{k}(z-z_{j})$ . Then each coefficient in $r_{n,k}$ is bounded: $|r_{n,k}^{(i)}|\leq{k-1\choose i}\cdot{n\choose k-1}$ for $n\geq k$ and $|r_{n,k}^{(i)}|\leq{k-1\choose i}\cdot{|n|+k-1\choose k-1}$ for $n<0$ .
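Lemma 6.4 is easy to probe numerically. The following Python sketch computes $r_{n,k}(z)=z^{n}\bmod\prod_{j=1}^{k}(z-z_{j})$ by repeatedly multiplying by $z$ and reducing. It then compares each coefficient with the bound ${k-1\choose i}{n\choose k-1}$ for $n\geq k$. This is a sanity check on sample inputs, not a proof. The function names and the relative tolerance `tol` for floating-point rounding are our own choices.

```python
import numpy as np
from math import comb


def residual_coeffs(n, zs):
    """Coefficients r^(0), ..., r^(k-1) of z^n mod prod_j (z - z_j), lowest degree first."""
    k = len(zs)
    q = np.poly(zs)[::-1]  # Q(z) = q[0] + ... + q[k] z^k with q[k] = 1
    r = np.zeros(k, dtype=complex)
    r[0] = 1.0  # z^0 mod Q = 1
    for _ in range(n):
        top = r[-1]  # this coefficient becomes z^k after multiplying by z
        r = np.concatenate(([0.0], r[:-1]))  # multiply by z, dropping the z^k term
        r = r - top * q[:-1]  # z^k = -(q[0] + ... + q[k-1] z^(k-1)) mod Q
    return r


def lemma_bound_holds(n, zs, tol=1e-9):
    """Check |r^(i)| <= C(k-1, i) * C(n, k-1) for every i (Lemma 6.4, n >= k)."""
    k = len(zs)
    r = residual_coeffs(n, zs)
    return all(abs(r[i]) <= comb(k - 1, i) * comb(n, k - 1) * (1 + tol) for i in range(k))


rng = np.random.default_rng(0)
zs = np.exp(2j * np.pi * rng.uniform(size=5))  # five random points on the unit circle
print(all(lemma_bound_holds(n, zs) for n in range(5, 60)))
```

When all $z_{j}=1$, the leading coefficient equals ${n\choose k-1}$ exactly, so the bound is tight in that case. Random points on the unit circle typically give much smaller coefficients.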

We finish the proof of Theorem 6.2 here.

Proof.

Let $C_{0}$ be a large constant and $d=5\cdot k^{2}\log k$ . We use $\mathcal{P}$ to denote the following subset of polynomials with bounded coefficients:

[TABLE]

For each polynomial $P(z)\in\mathcal{P}$ , we rewrite $P(z)\mod\prod_{j=1}^{k}(z-z_{j})$ as

[TABLE]

The coefficient $\sum_{j=0}^{d}\alpha_{j}\cdot 2^{-j/k}\cdot r_{j,k}^{(i)}$ of $z^{i}$ is bounded by

[TABLE]

Then we apply the pigeonhole principle to the $(2C_{0}+1)^{d}$ polynomials in $\mathcal{P}$ modulo $\prod_{j=1}^{k}(z-z_{j})$. There exist $m>(2C_{0}+1)^{0.9d}$ polynomials $P_{1},\ldots,P_{m}$ such that each coefficient of $(P_{i}-P_{j})\mod\prod_{j=1}^{k}(z-z_{j})$ has magnitude at most $d^{-2k}$, by the counting

[TABLE]

At most $(2C_{0}+1)^{0.9d}$ polynomials in $\mathcal{P}$ share the same coefficients on $z^{0},\ldots,z^{\lfloor 0.1d\rfloor}$. Because $m>(2C_{0}+1)^{0.9d}$, there exist $j_{1}\in[m]$ and $j_{2}\in[m]\setminus\{j_{1}\}$ whose lowest monomial $z^{l}$ with different coefficients satisfies $l\leq 0.1d$. Finally, we set

[TABLE]

to satisfy the first property $P(z_{1})=P(z_{2})=\cdots=P(z_{k})=0$ . We prove the second property in the rest of this proof.

We bound every coefficient in $\big{(}z^{-l}\mod\prod_{j=1}^{k}(z-z_{j})\big{)}\cdot\big{(}P_{j_{1}}(z)-P_{j_{2}}(z)\mod\prod_{j=1}^{k}(z-z_{j})\big{)}$ by

[TABLE]

which is less than $k\cdot 2^{k}(l+k)^{k-1}\cdot d^{-2k}\leq d\cdot 2^{k}d^{k-1}\cdot d^{-2k}\leq d^{-0.5k}$ from Lemma 6.4 and the above discussion.

On the other hand, consider the constant coefficient of $z^{-l}\cdot\big(P_{j_{1}}(z)-P_{j_{2}}(z)\big)$. The coefficients $\alpha_{j}$ are integers and $z^{l}$ is the lowest monomial with different coefficients in $P_{j_{1}}$ and $P_{j_{2}}$. Hence this coefficient has magnitude at least $2^{-l/k}\geq 2^{-0.1d/k}=k^{-0.5k}$. This is much larger than the bound $d^{-0.5k}$ on the coefficients of the remainder term above. Therefore the constant coefficient $|C(0)|^{2}$ of $P(z)$ is at least $0.5\cdot 2^{-2l/k}$.

Next we upper bound the sum of the rest of the coefficients $\sum_{j=1}^{d}|C(j)|^{2}$ by

[TABLE]

which demonstrates the second property after normalizing $C(0)$ to 1. ∎

Acknowledgement

We thank Daniel Kane and Zhao Song for many helpful discussions. We also thank the anonymous referee for the detailed feedback and comments.

7 Properties of the Filter function

We show basic properties of our filter function in Appendix 7.1 and prove Theorem 4.2 in Appendix 7.2.

7.1 Properties of $H$

We use two bounds on the $\operatorname{sinc}$ function $\operatorname{sinc}(x)=\frac{\sin(\pi x)}{\pi x}$:

1. For any $|x|\geq\frac{1.2}{\pi}$, $\operatorname{sinc}(x)\leq\frac{1}{\pi|x|}$.
2. For any $|x|\leq\frac{1.2}{\pi}$, $\operatorname{sinc}(x)\in[1-\frac{\pi^{2}|x|^{2}}{6},1-\frac{\pi^{2}|x|^{2}}{10}]$.
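Both bounds can be checked on a grid with NumPy, whose `np.sinc` is the normalized sinc $\sin(\pi x)/(\pi x)$ assumed here. A grid check illustrates the inequalities but does not prove them. The grid size and tolerance below are arbitrary choices.

```python
import numpy as np


def check_sinc_bounds(n=100_001, x_max=50.0, tol=1e-12):
    """Numerically check the two sinc bounds on a grid.

    Both bounds are even in x, so checking x >= 0 suffices.
    """
    x0 = 1.2 / np.pi
    small = np.linspace(0.0, x0, n)
    s = np.sinc(small)
    lower = 1 - np.pi ** 2 * small ** 2 / 6
    upper = 1 - np.pi ** 2 * small ** 2 / 10
    in_range = (lower <= s + tol) & (s <= upper + tol)  # second bound
    large = np.linspace(x0, x_max, n)
    decay = np.sinc(large) <= 1 / (np.pi * large) + tol  # first bound
    return bool(in_range.all()), bool(decay.all())


print(check_sinc_bounds())
```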

Without loss of generality, we assume both $R$ and $C$ are powers of 2 and $R\geq S$ (otherwise set $R=S$ ). Recall that $C$ is even in this section.

We use $g(t)$ to denote the product of sinc functions in $H(t)$ for convenience:

[TABLE]

We fix $l=\log_{2}(S)$ in this section and rewrite $g(t)$ as

[TABLE]

Before we show the properties of $H$ , we consider the tail of $g(t)$ .

Claim 7.1.

1. $g(t)=\Theta(1)$ for $|t|\leq\frac{1.2}{\pi CR\cdot\sqrt{C\log R}}$.
2. $g(t)=e^{-\Theta(|CR\cdot t|^{2}\log R)}$ for $|t|\in[\frac{1.2}{\pi CR\cdot\sqrt{C\log R}},\frac{1.2}{\pi CR}]$.
3. $g(t)\leq(\frac{1}{\pi\cdot CR\cdot|t|})^{C\log R}$ for $|t|\in[\frac{1.2}{\pi CR},\frac{1.2}{\pi C\cdot S}]$.
4. For any $i\in[l]$, $g(t)\leq(\frac{1}{\pi\cdot CR\cdot|t|})^{C\log R}\cdot 1.2^{-(2^{i+1}-1)C}$ for any $|t|\in[\frac{1.2\cdot 2^{i-1}}{\pi C\cdot S},\frac{1.2\cdot 2^{i}}{\pi C\cdot S}]$.
5. $g(t)\leq(\frac{1}{\pi CR\cdot|t|})^{C\log R}\cdot\prod_{j=0}^{l}(\frac{1}{\pi 2^{-j}\cdot C\cdot S\cdot|t|})^{2^{j}\cdot C}$ for $|t|\geq\frac{1.2\cdot 2^{l}}{\pi C\cdot S}=\frac{1.2}{C\pi}$.

Proof.

We first bound $\operatorname{sinc}(CR\cdot t)^{C\log R}$ then bound $\prod_{j=0}^{l}\operatorname{sinc}\big{(}2^{-j}\cdot C\cdot S\cdot t\big{)}^{2^{j}\cdot C}$ .

For $|t|\leq\frac{1.2}{\pi CR}$ , from the second property of $\operatorname{sinc}$ functions,

[TABLE]

and

[TABLE]

For $|t|\geq\frac{1.2}{\pi CR}$ , from the first property of $\operatorname{sinc}$ functions,

[TABLE]

Then we bound the tail of the product of sinc functions.

For $|t|\leq\frac{1.2}{\pi C\cdot S}$ ,

[TABLE]

Notice that $\pi^{2}\cdot|2^{-j}\cdot C\cdot S\cdot t|^{2}$ is less than $1.2^{2}\cdot 2^{-2j}$. Thus $\operatorname{sinc}\big(2^{-j}\cdot C\cdot S\cdot t\big)^{2^{j}\cdot C}=\big(1-\Theta(2^{-j})\big)^{C}$, and their product over $j$ is

[TABLE]

Let us fix $i\leq l$ and consider $\operatorname{sinc}\big{(}2^{-j}\cdot C\cdot S\cdot t\big{)}^{2^{j}\cdot C}$ for $|t|\in[\frac{1.2\cdot 2^{i-1}}{\pi C\cdot S},\frac{1.2\cdot 2^{i}}{\pi C\cdot S}]$ . By the first property of sinc function, for $j\leq i$ ,

[TABLE]

For $j>i$ , we use the same analysis with the second property of the sinc function:

[TABLE]

where $\pi^{2}\cdot|2^{-j}\cdot C\cdot S\cdot t|^{2}$ is at least $1.2^{2}\cdot 2^{-2(j-i)}$ . Hence the product is

[TABLE]

We get the tail bounds by combining the above discussion of $\operatorname{sinc}(CR\cdot t)^{C\log R}$ and $\prod_{j=0}^{l}\operatorname{sinc}\big{(}2^{-j}\cdot C\cdot S\cdot t\big{)}^{2^{j}\cdot C}$ together. ∎

Since $H(t)=s_{0}\cdot g(t)*\operatorname{rect}_{2}(t)=s_{0}\cdot\int_{t-1/2}^{t+1/2}g(x)\mathrm{d}x$ , we have the following bounds on $H(t)$ based on Claim 7.1.

Lemma 7.2.

For any constant $C\geq 4$,

1. $s_{0}=\Theta(\pi CR\cdot\sqrt{C\log R})$.
2. $H(t)=1\pm 0.01$ for $|t|\leq\frac{1}{2}-\frac{1.2}{\pi CR}$.
3. $H(t)\lesssim\frac{s_{0}}{S}\cdot R^{-C/4}$ for $|t|\in[\frac{1}{2}+\frac{1.2}{\pi CR},\frac{1}{2}+\frac{1.2}{\pi C\cdot S}]$.
4. $H(t)\lesssim s_{0}\cdot R^{-C/4}\cdot 1.2^{-2^{i}C}$ for $|t|\in[\frac{1}{2}+\frac{1.2\cdot 2^{i-1}}{\pi C\cdot S},\frac{1}{2}+\frac{1.2\cdot 2^{i}}{\pi C\cdot S}]$ and any $i\in[l]$.
5. $H(t)\leq s_{0}\cdot(\frac{1}{1.2\pi CR\cdot(|t|-\frac{1}{2})})^{C\log R}\cdot\big(\frac{1}{C\pi\cdot(|t|-\frac{1}{2})}\big)^{CS}$ for $|t|\geq\frac{1}{2}+\frac{1.2}{C\pi}$.

Proof.

We bound the integral of $g$ over different intervals as follows:

1. $\int_{\frac{-1.2}{\pi CR}}^{\frac{1.2}{\pi CR}}g(x)\mathrm{d}x=\int_{\frac{-1.2}{\pi CR\cdot\sqrt{C\log R}}}^{\frac{1.2}{\pi CR\cdot\sqrt{C\log R}}}g(x)\mathrm{d}x+2\int_{\frac{1.2}{\pi CR\cdot\sqrt{C\log R}}}^{\frac{1.2}{\pi CR}}e^{-\Theta(|CR\cdot x|^{2}\log R)}\mathrm{d}x=\Theta(\frac{1}{\pi CR\cdot\sqrt{C\log R}})$.
2. $\int_{\frac{1.2}{\pi CR}}^{\frac{1.2}{\pi C\cdot S}}g(x)\mathrm{d}x\leq\int_{\frac{1.2}{\pi CR}}^{\frac{1.2}{\pi C\cdot S}}(\frac{1}{\pi\cdot CR\cdot x})^{C\log R}\mathrm{d}x\leq\frac{1.2}{\pi C\cdot S}\cdot 1.2^{-C\log R}$.
3. For any $i\in[l]$, where $l=\log_{2}S$,

[TABLE]

4. For $|t|\geq\frac{1.2}{C\pi}$,

[TABLE]

Next we prove each claim of the lemma.

1. For $s_{0}$, notice that

[TABLE]

which also indicates $s_{0}\in[1,1+10^{-3}]\cdot 1/\left(\int_{\frac{-1.2}{\pi CR}}^{\frac{1.2}{\pi CR}}g(x)\mathrm{d}x\right)$.

2. When $|t|<\frac{1}{2}-\frac{1.2}{\pi CR}$, $H(t)=s_{0}\cdot\left(\int_{\frac{-1.2}{\pi CR}}^{\frac{1.2}{\pi CR}}g(x)\mathrm{d}x+\int_{[t-1/2,t+1/2]\setminus[\frac{-1.2}{\pi CR},\frac{1.2}{\pi CR}]}g(x)\mathrm{d}x\right)$, which is in $s_{0}\cdot[1,1+10^{-3}]\cdot\int_{\frac{-1.2}{\pi CR}}^{\frac{1.2}{\pi CR}}g(x)\mathrm{d}x\subseteq[1-0.01,1+0.01]$.

3. When $|t|\in[\frac{1}{2}-\frac{1.2}{\pi CR},\frac{1}{2}+\frac{1.2}{\pi CR}]$, $H(t)\in[0,1]$.

4. When $|t|\in[\frac{1}{2}+\frac{1.2}{\pi CR},\frac{1}{2}+\frac{1.2}{\pi C\cdot S}]$,

[TABLE]

5. When $|t|\in[\frac{1}{2}+\frac{1.2\cdot 2^{i-1}}{\pi C\cdot S},\frac{1}{2}+\frac{1.2\cdot 2^{i}}{\pi C\cdot S}]$ for a positive integer $i<l$,

[TABLE]

6. When $|t|>\frac{1}{2}+\frac{1.2}{C\pi}$, we use the bound in the last item of the above discussion.

∎

7.2 Proof of Theorem 4.2

We finish the proof of Theorem 4.2 using Lemma 7.2 for $\alpha=\frac{1}{2}+\frac{1.2}{\pi CR}$ . Without loss of generality, we assume $R\geq S$ in this proof (otherwise set $R=S$ ).

We first show

[TABLE]

By the second property of $H$ in Lemma 7.2, $H\big(\alpha x\big)\geq 1-0.01$ for any $|x|\leq\frac{\frac{1}{2}-\frac{1.2}{\pi CR}}{\alpha}=\frac{\pi CR-2.4}{\pi CR+2.4}=1-\frac{4.8}{\pi CR+2.4}$, so that

[TABLE]

At the same time, $|g(t)|^{2}\leq R\cdot\underset{x\sim[-1,1]}{\operatorname*{\mathbb{E}}}[|g(x)|^{2}]=R/2\cdot\int_{-1}^{1}|g(x)|^{2}\mathrm{d}x$ for any $t\in[-1,1]$ . This indicates

[TABLE]

The first property follows from these two inequalities.

In the rest of this proof, we apply Lemma 7.2 to prove:

[TABLE]

We split $\int_{1}^{\infty}|g(x)\cdot H\big{(}\alpha x\big{)}|^{2}\mathrm{d}x$ into several intervals:

[TABLE]

In the first two terms, we use $\ln t\leq t-1$ to rewrite $|g(t)|\leq\mathsf{poly}(R)\cdot\|g\|_{2}\cdot t^{S}$ as $|g(t)|\leq\mathsf{poly}(R)\cdot\|g\|_{2}\cdot e^{(t-1)S}$. By the third and fourth properties of $H(t)$ in Lemma 7.2, their sum is less than $0.01\|g\|_{2}^{2}$. For the last term, given the last property of $H(t)$ in Lemma 7.2 and a large constant $C$, we have

[TABLE]

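It is straightforward to verify that $\int_{1}^{\infty}|g(x)\cdot H\big(\alpha x\big)|^{2}\mathrm{d}x\leq 0.02\cdot\|g\|^{2}_{2}$. The same argument bounds $\int_{-\infty}^{-1}$ by $0.02\cdot\|g\|^{2}_{2}$. Meanwhile, the first property gives $\int_{-1}^{1}|g(x)\cdot H(\alpha x)|^{2}\mathrm{d}x\geq 0.9\cdot 2\|g\|_{2}^{2}$. Hence $[-1,1]$ carries at least a $\frac{1.8}{1.8+0.04}\geq 0.95$ fraction of the total, which is the second property.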

The last property follows from the upper bounds in Lemma 7.2.

8 Omitted Proofs in Section 6

We first prove Theorem 1.5 in Appendix 8.1, and then prove Theorems 1.3 and 1.4 in Appendices 8.2 and 8.3, respectively.

8.1 Proof of Theorem 1.5

We finish the proof of Theorem 1.5 in this section. The only difference from Theorem 1.2 is that we draw samples from a biased distribution $D$, which improves the sample complexity to $\widetilde{O}(k\log\frac{F}{\Delta^{\prime}\delta})$.

How to Generate Samples.

We draw the random samples from a non-uniform distribution $D$ on $[-1,1]$. For $m$ samples $x_{1},\cdots,x_{m}\sim D$, we assign the weight $w_{i}=\frac{1}{2m\cdot D(x_{i})}$ to each sample $x_{i}$ so that for any function $h$,

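$$\operatorname*{\mathbb{E}}_{x_{1},\ldots,x_{m}\sim D}\Big[\sum_{i=1}^{m}w_{i}\cdot|h(x_{i})|^{2}\Big]=\frac{1}{2}\int_{-1}^{1}|h(x)|^{2}\mathrm{d}x=\|h\|_{2}^{2}.$$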

[6] presented an explicit distribution $D$ such that $\tilde{O}(k)$ samples could guarantee $\sum_{i=1}^{m}w_{i}|g(x_{i})|^{2}$ is close to $\|g\|_{2}^{2}$ with high probability. For completeness, we show it with our improved bound $R$ .

Lemma 8.1.

Given the sparsity $k$ , there exists a constant $c$ such that the distribution

[TABLE]

guarantees for any $k$ -Fourier-sparse signal $g$ , $\underset{x\in[-1,1]}{\sup}\frac{1}{2D(x)}\cdot\frac{|g(x)|^{2}}{\|g\|_{2}^{2}}=O(k\log^{2}k)$ .

Moreover, $m=O(\frac{k\log^{2}k\log\frac{1}{\delta}}{\epsilon^{2}})$ samples $x_{1},\cdots,x_{m}$ from $D$ with weights $w_{i}=\frac{1}{2m\cdot D(x_{i})}$ for $i\in[m]$ guarantee that, with probability at least $1-\delta$ ,

$$\sum_{i=1}^{m}w_{i}\cdot|g(x_{i})|^{2}\in[1-\epsilon,1+\epsilon]\cdot\|g\|_{2}^{2}.$$

Proof.

Given $D$ and the $k$ -Fourier-sparse signal $g$ , let $z(x)$ denote $\frac{|g(x)|^{2}}{2D(x)}$ for $x\in[-1,1]$ . We have $\operatorname*{\mathbb{E}}_{x\sim D}\big{[}z(x)\big{]}=\operatorname*{\mathbb{E}}_{x\sim[-1,1]}\big{[}|g(x)|^{2}\big{]}=\|g\|_{2}^{2}$ and $\underset{x\in\mathsf{supp}(D)}{\sup}\frac{z(x)}{\operatorname*{\mathbb{E}}_{x^{\prime}\sim D}[z(x^{\prime})]}=O(k\log^{2}k)$ . We apply the Chernoff bound in Lemma 2.1 on the random variables $z(x_{1}),\cdots,z(x_{m})$ to obtain the statement. ∎
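The weighting $w_{i}=\frac{1}{2m\cdot D(x_{i})}$ can be sketched in Python as below. The exact distribution of Lemma 8.1 comes from [6]; as a stand-in, the sketch uses the arcsine density $D(x)=\frac{1}{\pi\sqrt{1-x^{2}}}$, sampled as $x=\cos(\pi U)$. This choice is ours and only illustrates unbiasedness, not the $O(k\log^{2}k)$ bound. The exact value of $\|h\|_{2}^{2}$ for a Fourier-sparse $h$ uses $\frac{1}{2}\int_{-1}^{1}e^{2\pi\mathbf{i}ax}\mathrm{d}x=\operatorname{sinc}(2a)$. Function names are hypothetical.

```python
import numpy as np


def arcsine_density(x):
    """Density of x = cos(pi * U) for U ~ Uniform[0, 1): 1 / (pi * sqrt(1 - x^2))."""
    return 1.0 / (np.pi * np.sqrt(1.0 - x ** 2))


def weighted_energy(h, m, rng):
    """Unbiased estimate of ||h||_2^2 = E_{x~[-1,1]} |h(x)|^2 from m samples of D,
    using the weights w_i = 1 / (2 m D(x_i))."""
    x = np.cos(np.pi * rng.uniform(size=m))
    w = 1.0 / (2 * m * arcsine_density(x))
    return float(np.sum(w * np.abs(h(x)) ** 2))


def exact_energy(freqs, amps):
    """E_{x~[-1,1]} |sum_j amps_j e^{2 pi i freqs_j x}|^2 in closed form."""
    gram = np.sinc(2 * np.subtract.outer(freqs, freqs))  # normalized sinc
    return float(np.real(np.conj(amps) @ gram @ amps))


freqs = np.array([0.3, 1.1, 2.5])
amps = np.array([1.0, -0.5j, 0.8])
h = lambda x: np.exp(2j * np.pi * np.multiply.outer(x, freqs)) @ amps
rng = np.random.default_rng(0)
print(weighted_energy(h, 200_000, rng), exact_energy(freqs, amps))
```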

Similar to Lemma 5.2, we state the following version for Fourier-sparse signals.

Lemma 8.2.

Given the sparsity $k$, $f_{0}$, and $\Delta$, let $g$ be a $k$-Fourier-sparse signal $g(t)=\sum_{i\in[k]}v_{i}\cdot e^{2\pi\mathbf{i}f_{i}t}$ with each $f_{i}\in[f_{0}-\Delta,f_{0}+\Delta]$ and $\Delta^{\prime}=\Delta+O(\frac{R\log k+k^{2}\log^{2}k}{T})$.

Let $y(t)=g(t)+\eta(t)$ be the observable signal on $[-1,1]$ where the noise $\|\eta\|^{2}_{2}\leq\epsilon\|g\|^{2}_{2}$ for a sufficiently small constant $\epsilon$ . There exist a constant $\gamma$ and an algorithm such that for any $\beta\leq\frac{\gamma}{\Delta^{\prime}}$ , it takes $O(k\log^{2}k)$ samples to output $\alpha$ satisfying $|y_{H}(\alpha)e^{2\pi if_{0}\beta}-y_{H}(\alpha+\beta)|\leq 0.3|y_{H}(\alpha)|$ with probability at least 0.6.

We present our algorithm in Algorithm 2 and finish the proof of Theorem 1.5, restated as Theorem 6.1.

Proof of Theorem 6.1. By Lemma 8.2, $\frac{y_{H}(\alpha+\beta)}{y_{H}(\alpha)}$ gives a good estimate of $e^{2\pi if_{0}\beta}$ with probability 0.6 for any $\beta\leq\frac{\gamma}{\Delta^{\prime}}$. We use the frequency search algorithm of Lemma 7.3 in [5] with the sampling procedure of Lemma 8.2. The algorithm in [5] invokes the sampling procedure $O(\log\frac{F}{\Delta^{\prime}\cdot\delta})$ times to return a frequency $\widetilde{f}$ satisfying $|\widetilde{f}-f_{0}|\leq\Delta^{\prime}$ with probability at least $1-\delta$. Hence the sample complexity is $O(k\log^{2}k\cdot\log\frac{F}{\Delta^{\prime}\cdot\delta})$. ∎

8.2 Proof of Theorem 1.3

We bound $R$ for $k$-Fourier-sparse signals in this section. We first state the technical result used to prove the upper bound on $R$.

Theorem 8.3.

Given any $k>0$ , there exists $d=O(k^{2}\log k)$ such that for any $g(x)=\sum_{j=1}^{k}v_{j}\cdot e^{2\pi\boldsymbol{i}f_{j}\cdot x}$ , any $t\in\mathbb{R}$ , and any $\theta>0$ ,

[TABLE]

Proof of Theorem 8.3. Given $k$ frequencies $f_{1},\cdots,f_{k}$ and $\theta$ , we set $z_{1}=e^{2\pi\boldsymbol{i}f_{1}\cdot\theta},\cdots,z_{k}=e^{2\pi\boldsymbol{i}f_{k}\cdot\theta}$ . Let $C(0),\cdots,C(d)$ be the coefficients of the degree $d$ polynomial $P(z)$ in Theorem 6.2. We have

[TABLE]

Hence for every $i\in[k]$ ,

[TABLE]

By the Cauchy-Schwarz inequality, we have

[TABLE]

From the second property of $C(0),\cdots,C(d)$ in Theorem 6.2, $|g(t)|^{2}\leq O(k)\cdot\bigg(\sum_{j=1}^{d}|g(t+j\cdot\theta)|^{2}\bigg)$. ∎

We now finish the proof of Theorem 1.3 by using the above relation to bound $R$. For convenience, we restate the theorem for $T=1$.

Theorem 8.4.

For any $g(t)=\sum_{j=1}^{k}v_{j}e^{2\pi\boldsymbol{i}f_{j}t}$ ,

[TABLE]

Proof.

We prove

[TABLE]

which indicates $|g(t)|^{2}=O(k^{3}\log^{2}k)\cdot\underset{x\sim[-1,1]}{\operatorname*{\mathbb{E}}}\big[|g(x)|^{2}\big]$ for any $t\in[-1,0]$. By symmetry, the same bound holds for any $t\in[0,1]$.

We use Theorem 8.3 on $g(t)$ :

[TABLE]

From all discussion above, we have $|g(t)|^{2}\lesssim dk\log k\cdot\underset{x\in[-1,1]}{\operatorname*{\mathbb{E}}}[|g(x)|^{2}]$ . ∎

8.3 Growth outside of the observation

In this section we prove Theorem 1.4, which bounds $S=\tilde{O}(k^{2})$. After fixing $T=1$, we divide the proof into two cases: $|x|\leq 1+1/k$ and $|x|>1+1/k$.

Lemma 8.5.

For any $g(t)=\sum_{j=1}^{k}v_{j}\cdot e^{2\pi\boldsymbol{i}f_{j}t}$ , there exists a constant $C_{1}$ such that for any $x\geq 1$ , $|g(x)|\lesssim\mathsf{poly}(k)\cdot\|g\|_{2}\cdot C_{1}^{(x-1)\cdot k^{2}\log k}$ .

Remark 8.6.

The Chebyshev polynomial of degree $k$ grows as $e^{\Theta(k\sqrt{x-1})}$ for $x=1+O(1/k^{2})$.
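The remark can be illustrated with the identity $T_{k}(x)=\cosh(k\operatorname{arccosh}x)$ for $x\geq 1$, together with $\sqrt{u}\leq\operatorname{arccosh}(1+u)\leq\sqrt{2u}$ for $u\in(0,1]$. These give $\frac{1}{2}e^{k\sqrt{x-1}}\leq T_{k}(x)\leq e^{k\sqrt{2(x-1)}}$ for $x\in(1,2]$. The Python sketch below checks these bounds numerically with NumPy's Chebyshev module. The choice $k=50$ and the test points are arbitrary.

```python
import numpy as np
from numpy.polynomial import chebyshev


def chebyshev_T(k, x):
    """Evaluate the degree-k Chebyshev polynomial T_k at x."""
    return chebyshev.chebval(x, [0] * k + [1])


k = 50
u = np.array([1e-4, 1e-3, 1e-2, 1e-1])  # evaluation points x = 1 + u
vals = chebyshev_T(k, 1 + u)
print(np.log(vals) / (k * np.sqrt(u)))  # ratio log T_k(x) / (k * sqrt(x - 1))
```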

Proof.

Let $d=O(k^{2}\log k)$ denote the length of the linear combination in Corollary 6.3 and $\theta=\frac{2}{d}$ . Given $g(t)$ and $\theta$ , we use $\alpha_{1},\cdots,\alpha_{d}$ to denote the coefficients of the linear combination of $g(t),g(t-\theta),\ldots,g(t-d\theta)$ in Corollary 6.3. For convenience, we use $C_{0}$ to denote the upper bound on the coefficients $\alpha_{j}$ .

We use induction to prove that for some $C=O(1)$ , for any $l$ ,

[TABLE]

For the base case $l=1$, Corollary 6.3 gives $g(x)=\sum_{j=1}^{d}\alpha_{j}\cdot g(x-j\theta)$, where $x-j\theta\in[-1,1]$. Because each $\big|g(x-j\theta)\big|\leq C\cdot k^{1.5}\log k\cdot\|g\|_{2}$ by Theorem 1.3, we have

[TABLE]

Suppose (7) is true for any $x\in(1,1+\frac{2l}{d}]$ . Let us consider $x\in(1+\frac{2l}{d},1+\frac{2(l+1)}{d}]$ . We still have $g(x)=\sum_{j=1}^{d}\alpha_{j}\cdot g(x-j\theta)$ where each $x-j\theta\in(1+\frac{2(l-j)}{d},1+\frac{2(l+1-j)}{d}]$ . This indicates

[TABLE]

∎

For completeness, we bound the growth for $|t|>1+1/k$ here. This is a reformulation of Lemma 5.5 in [5].

Lemma 8.7.

For any $g(t)=\sum_{j=1}^{k}v_{j}e^{2\pi\boldsymbol{i}f_{j}t}$ and any $|t|>1$ ,

[TABLE]

Proof.

We fix $t>1$ in this proof. Let $\theta=1/k$ and $n=\lfloor(t+1/2)/\theta\rfloor$, so that $t-n\theta\in[-1/2,-1/2+\theta)$ and $t-(n-k)\theta\in[1/2,1/2+\theta)$. We first show that the coefficients $C_{0},\cdots,C_{k-1}$ in

[TABLE]

satisfy $g(t)=\sum_{l=0}^{k-1}C_{l}\cdot g\big(t-(n-l)\theta\big)$. Let $z_{j}=e^{2\pi\boldsymbol{i}f_{j}\theta}$, so that $z_{j}^{n}=\sum_{l=0}^{k-1}C_{l}\cdot z_{j}^{l}$ for each $j\in[k]$. For $g(t)=\sum_{j=1}^{k}v_{j}e^{2\pi\boldsymbol{i}f_{j}t}$, we rewrite it as

[TABLE]

Thus $|g(t)|^{2}\leq(\sum_{j=0}^{k-1}|C_{j}|^{2})\cdot(\sum_{l=0}^{k-1}|g(t-n\theta+l\theta)|^{2})$ .

For $0\leq l\leq k-1$ we have $t-n\theta+l\theta\in[-1/2,1/2)\subseteq[-2/3,2/3]$. Hence $|g(t-n\theta+l\theta)|^{2}\lesssim k\underset{x\in[-1,1]}{\operatorname*{\mathbb{E}}}[|g(x)|^{2}]$ by [6]. On the other hand, $|C_{j}|\leq{k-1\choose j}{n\choose k-1}\leq(2n)^{k-1}$ by Lemma 6.4.

From all discussion above,

[TABLE]

∎

Proof of Theorem 1.4. We combine Lemma 8.5 and 8.7: For $x\leq 1+1/k$ , $C_{1}^{(x-1)k^{2}\log k}=e^{(x-1)k^{2}\log k\log C_{1}}=x^{O(k^{2}\log k)}$ . For $x>1+1/k$ , $(3kx)^{k}$ is still less than $x^{O(k^{2}\log k)}$ . ∎
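The two inequalities used in this proof can be spelled out as follows, assuming $k\geq 2$ and writing $\ln$ for the natural logarithm. For $1<x\leq 1+1/k$, we have $\ln x\geq(x-1)/2$, hence

$$C_{1}^{(x-1)k^{2}\log k}=e^{(x-1)k^{2}\log k\cdot\ln C_{1}}\leq e^{2\ln C_{1}\cdot k^{2}\log k\cdot\ln x}=x^{O(k^{2}\log k)}.$$

For $x>1+1/k$, we have $\ln x\geq\ln(1+1/k)\geq\frac{1}{2k}$. So $x^{c\,k^{2}\log k}\geq e^{\frac{c}{2}k\log k}\geq(3k)^{k}$ for a large enough constant $c$, and therefore

$$(3kx)^{k}=(3k)^{k}\cdot x^{k}\leq x^{c\,k^{2}\log k}\cdot x^{k}=x^{O(k^{2}\log k)}.$$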

Bibliography

[1] A. Akavia, S. Goldwasser, and S. Safra. Proving hard-core predicates using list decoding. FOCS, 44:146–159, 2003.
[2] Haim Avron, Michael Kapralov, Cameron Musco, Christopher Musco, Ameya Velingker, and Amir Zandieh. A universal sampling method for reconstructing signals with simple Fourier transforms. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC 2019), 2019.
[3] Y. Bresler and A. Macovski. Exact maximum likelihood parameter estimation of superimposed exponential signals in noise. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(5):1081–1089, Oct 1986.
[4] Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23:493–507, 1952.
[5] Xue Chen, Daniel M. Kane, Eric Price, and Zhao Song. Fourier-sparse interpolation without a frequency gap. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), 2016.
[6] Xue Chen and Eric Price. Active regression via linear-sample sparsification. arXiv preprint arXiv:1711.10051, 2018.
[7] Anna C. Gilbert, Sudipto Guha, Piotr Indyk, S. Muthukrishnan, and Martin Strauss. Near-optimal sparse Fourier representations via sampling. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, pages 152–161. ACM, 2002.
[8] Anna C. Gilbert, S. Muthukrishnan, and Martin Strauss. Improved time bounds for near-optimal sparse Fourier representations. In Optics & Photonics 2005, pages 59141A–59141A. International Society for Optics and Photonics, 2005.
