Atomic Norm Denoising for Complex Exponentials with Unknown Waveform Modulations

Shuang Li; Michael B. Wakin; and Gongguo Tang

arXiv:1902.05238·cs.IT·October 9, 2019

TL;DR

This paper introduces an atomic norm regularized approach for denoising signals composed of complex exponentials with unknown modulations in non-stationary blind super-resolution, providing theoretical error bounds and numerical validation.

Contribution

It extends atomic norm methods to denoise complex exponential signals with unknown waveforms, offering new theoretical insights and practical algorithms.

Findings

1. Mean square error depends on noise variance and signal parameters.

2. The proposed method achieves robust denoising in non-stationary blind super-resolution.

3. Numerical experiments validate the theoretical error bounds.

Equations

$$\bm{x}^{\star}(m)=\sum_{j=1}^{J}c_{j}e^{-i2\pi m\tau_{j}}\bm{g}_{j}(m),\quad m=-2M,\ldots,0,\ldots,2M$$

$$\bm{y}(m)=\bm{x}^{\star}(m)+\bm{z}(m),\quad m=-2M,\ldots,0,\ldots,2M$$

$$\mathbf{B}=\left[\bm{b}_{-2M}\ \cdots\ \bm{b}_{0}\ \cdots\ \bm{b}_{2M}\right]^{H}$$

$$\bm{a}(\tau)=\left[e^{i2\pi\tau(-2M)}\ \cdots\ 1\ \cdots\ e^{i2\pi\tau(2M)}\right]^{\top}$$

$$\begin{aligned}\bm{x}^{\star}(m)&=\sum_{j=1}^{J}c_{j}\operatorname{tr}\left(\bm{e}_{m}\bm{b}_{m}^{H}\bm{h}_{j}\bm{a}(\tau_{j})^{H}\right)\\&=\operatorname{tr}\left(\bm{e}_{m}\bm{b}_{m}^{H}\sum_{j=1}^{J}c_{j}\bm{h}_{j}\bm{a}(\tau_{j})^{H}\right)\\&=\left\langle\sum_{j=1}^{J}c_{j}\bm{h}_{j}\bm{a}(\tau_{j})^{H},\ \bm{b}_{m}\bm{e}_{m}^{H}\right\rangle\end{aligned}$$

$$\langle\mathbf{X}_{1},\mathbf{X}_{2}\rangle=\operatorname{tr}\left(\mathbf{X}_{2}^{H}\mathbf{X}_{1}\right)$$

$$\left[\mathcal{B}(\mathbf{X})\right]_{m}=\left\langle\mathbf{X},\ \bm{b}_{m}\bm{e}_{m}^{H}\right\rangle,\qquad\mathcal{B}^{*}(\bm{x})=\sum_{m=-2M}^{2M}\bm{x}(m)\bm{b}_{m}\bm{e}_{m}^{H}$$

$$\mathbf{X}^{\star}\triangleq\sum_{j=1}^{J}c_{j}\bm{h}_{j}\bm{a}(\tau_{j})^{H}$$

$$\bm{x}^{\star}=\mathcal{B}(\mathbf{X}^{\star})$$

$$\bm{y}=\bm{x}^{\star}+\bm{z}=\mathcal{B}(\mathbf{X}^{\star})+\bm{z}$$

$$\mathcal{A}\triangleq\left\{\bm{h}\bm{a}(\tau)^{H}:\ \tau\in[0,1),\ \|\bm{h}\|_{2}=1,\ \bm{h}\in\mathbb{C}^{K}\right\}$$

$$\|\mathbf{X}\|_{\mathcal{A}}=\inf_{c_{j},\,\tau_{j},\,\|\bm{h}_{j}\|_{2}=1}\left\{\sum_{j=1}^{J}c_{j}:\ \mathbf{X}=\sum_{j=1}^{J}c_{j}\bm{h}_{j}\bm{a}(\tau_{j})^{H},\ c_{j}>0\right\}$$

$$\widehat{\mathbf{X}}=\arg\min_{\mathbf{X}}\ \frac{1}{2}\|\bm{y}-\mathcal{B}(\mathbf{X})\|_{2}^{2}+\lambda\|\mathbf{X}\|_{\mathcal{A}}$$

$$\{\widehat{\mathbf{X}},\widehat{\bm{u}},\widehat{\mathbf{T}}\}=\arg\min_{\mathbf{X},\bm{u},\mathbf{T}}\ \ldots\quad\text{s.t.}\ \ldots$$

$$\mathbb{E}\,\bm{b}\bm{b}^{H}=\mathbf{I}_{K},\quad\text{and}\quad\mathbb{E}\left(\frac{\bm{b}}{\|\bm{b}\|_{2}}\frac{\bm{b}^{H}}{\|\bm{b}\|_{2}}\right)=\frac{1}{K}\mathbf{I}_{K},\quad\bm{b}\in\mathcal{F}$$

$$\max_{1\leq k\leq K}|\bm{b}(k)|^{2}\leq\mu,\quad\bm{b}\in\mathcal{F}$$

$$\min_{j\neq k}d(\tau_{j},\tau_{k})\geq\frac{1}{N}$$

$$\frac{1}{N}\|\widehat{\bm{x}}-\bm{x}^{\star}\|_{2}^{2}\leq C\eta^{2}\sigma^{2}\mu^{2}\frac{J^{2}K}{N}\log(N)\log(NJK)$$

$$\lambda\approx\eta\,\mathbb{E}_{\bm{z}}\|\mathcal{B}^{*}(\bm{z})\|_{\mathcal{A}}^{*}$$

$$\begin{aligned}\|\mathbf{Q}\|_{\mathcal{A}}^{*}&=\sup_{\tau\in[0,1),\,\|\bm{h}\|_{2}=1}\left\langle\mathbf{Q},\bm{h}\bm{a}(\tau)^{H}\right\rangle_{\mathbb{R}}\\&=\sup_{\tau\in[0,1),\,\|\bm{h}\|_{2}=1}\left\langle\mathbf{Q}\bm{a}(\tau),\bm{h}\right\rangle_{\mathbb{R}}\\&=\sup_{\tau\in[0,1)}\|\mathbf{Q}\bm{a}(\tau)\|_{2}\end{aligned}$$

$$\mathbb{E}_{\bm{z}}\|\mathcal{B}^{*}(\bm{z})\|_{\mathcal{A}}^{*}\leq C\sigma\|\mathbf{B}\|_{F}\sqrt{\log(N)}$$

$$\mathbb{P}\left(\|\mathcal{B}^{*}(\bm{z})\|_{\mathcal{A}}^{*}\leq\frac{\lambda}{\eta}\right)\geq1-\frac{c}{N}$$

$$\begin{aligned}\left(\|\mathcal{B}^{*}(\bm{z})\|_{\mathcal{A}}^{*}\right)^{2}&=\sup_{\tau\in[0,1)}\sum_{k=1}^{K}\left|\sum_{m=-2M}^{2M}\bm{z}(m)\bm{b}_{m}(k)e^{i2\pi\tau m}\right|^{2}\\&=\sup_{\tau\in[0,1)}\sum_{k=1}^{K}\sum_{m,n=-2M}^{2M}\bm{z}(m)\bm{z}(n)^{H}\bm{b}_{m}(k)\bm{b}_{n}(k)^{H}e^{i2\pi\tau(m-n)}\\&=\sup_{\tau\in[0,1)}\mathcal{Z}_{N}(e^{i2\pi\tau})\end{aligned}$$

Taxonomy

Topics: Photoacoustic and Ultrasonic Imaging · Sparse and Compressive Sensing Techniques · Advanced Fluorescence Microscopy Techniques

Full text

Atomic Norm Denoising for Complex Exponentials

with Unknown Waveform Modulations

Shuang Li, Michael B. Wakin, and Gongguo Tang

Department of Electrical Engineering, Colorado School of Mines. Email: {shuangli,mwakin,gtang}@mines.edu.

(February 15, 2019; revised September 4, 2019)

Abstract

Non-stationary blind super-resolution is an extension of the traditional super-resolution problem, which deals with the problem of recovering fine details from coarse measurements. The non-stationary blind super-resolution problem appears in many applications including radar imaging, 3D single-molecule microscopy, computational photography, etc. There is a growing interest in solving the non-stationary blind super-resolution task with convex methods due to their robustness to noise and strong theoretical guarantees. Motivated by the recent work on atomic norm minimization in blind inverse problems, we focus here on the signal denoising problem in non-stationary blind super-resolution. In particular, we use an atomic norm regularized least-squares problem to denoise a sum of complex exponentials with unknown waveform modulations. We quantify how the mean square error depends on the noise variance and the true signal parameters. Numerical experiments are also conducted to illustrate the theoretical result.

Index terms— Atomic norm denoising, non-stationary blind deconvolution, line spectrum estimation, demodulation, mean square error.

1 Introduction

Super-resolution is the process of recovering high-resolution information of a signal from its coarse-scale measurements [1]. Super-resolution problems arise in a wide variety of applications, including microscopy [2], imaging spectroscopy [3], radar signal demixing [4], astronomical imaging [5], and medical imaging [6]. One specific super-resolution problem has received considerable theoretical study in recent years: recovering positions on the interval $[0,1)$ of unknown point sources given only low-frequency samples [1, 7]. Exchanging the roles of time and frequency, this problem is equivalent to that of line spectral estimation: recovering frequencies on the interval $[0,1)$ of unknown complex exponentials given limited or possibly compressed time-domain samples. The total variation norm has been used to regularize such super-resolution problems, and this is equivalent to the atomic norm that has been used to regularize such line spectral estimation problems.

In this work, we are interested in the non-stationary blind super-resolution scenario, which extends the above super-resolution problem to the setting where each point source is convolved with a unique and unknown point spread function. The term “non-stationary” indicates that the point spread functions are potentially different and comes from the field of non-stationary deconvolution [8]; the term “blind” indicates that the point spread functions are unknown. Non-stationary blind super-resolution problems also appear in applications involving radar imaging [9], astronomy [10], photography [11], 3D single-molecule microscopy [12], seismology [8] and nuclear magnetic resonance (NMR) spectroscopy [13, 14, 15]. Exchanging the roles of time and frequency, the conventional line spectral estimation problem is modified as follows: each complex exponential is modulated (pointwise multiplied) by an unknown waveform, and this waveform can vary from one complex exponential to the next. Though both problem formulations are equivalent, it is this modified line spectrum estimation problem that we detail in Section 2 and refer to throughout this paper.

In recent years, convex methods have been widely used in super-resolution due to their robustness to noise and strong theoretical guarantees. Among them, atomic norm minimization based methods are extremely popular. In [16], the authors propose an atomic norm minimization based scheme to super-resolve unknown frequencies of a signal from its random time samples. Yang et al. [17] extend the super-resolution of unknown frequencies in [16] to the case where multiple measurement vectors are available. Our earlier work [18] also brings atomic norm minimization into the application of modal analysis for super-resolution of unknown modal parameters of a vibration system from its random and compressed measurements. Meanwhile, the robustness of atomic norm minimization when given noisy data [7] has also been widely studied in the past few years. The authors in [19] apply an atomic norm denoising based technique to line spectral estimation, which is one of the fundamental problems in statistical signal processing. The same authors [20] also establish a nearly optimal algorithm, which is called atomic norm soft thresholding, to denoise a mixture of complex sinusoids. In [21], the authors use atomic norm denoising to investigate the performance of super-resolution line spectral estimation with white noise and provide theoretical guarantees for support recovery. In addition, atomic norm denoising is also studied in [22] for the multiple measurement vector case.

Our work is most closely related to papers [20] and [23]. The authors in [20] focus on a mixture of complex exponentials, while we work on a superposition of complex exponentials with unknown waveform modulations. It can be seen from Section 2 that our problem reduces to the problem studied in [20] by setting the subspace dimension $K=1$ and selecting the subspace matrix $\mathbf{B}$ as an $N\times 1$ vector with all ones. Both [20] and our work establish theoretical guarantees for the mean square error (MSE) with respect to the noise level and true signal parameters. However, since we deal with a more sophisticated scenario in this work, our theory also depends on properties of the subspace matrix $\mathbf{B}$ . In addition, we provide an explicit success probability (unlike [20]) that increases with the number of signal samples. In [23], the authors study the problem of non-stationary blind super-resolution in a noiseless setting, namely, recovering parameters of a sum of unknown complex exponentials from modulations with unknown waveforms. In contrast, we consider a more practical scenario in which the observed data are contaminated with noise. Therefore, we use different algorithms in this work. In [23], one can exactly recover the unknown parameters with high probability when provided with enough samples. However, it is no longer possible to achieve exact recovery in this work since we only have access to the noisy observations. Thus, the goal of this work is to characterize the MSE as a function of the noise variance and the true signal parameters.

The main contribution of this work is to quantify how the MSE depends on the noise level and the true signal parameters. Namely, we provide a theoretical result that bounds the MSE in terms of the noise variance, the total number of uniform samples, the number of true frequencies, and the dimension of the subspace in which the unknown waveform modulations live. To be more precise, we show both theoretically and numerically that 1) the MSE scales linearly with the noise variance and the subspace dimension, and 2) the MSE is inversely proportional to the total number of uniform samples. We also prove that the MSE scales at worst with the square of the number of true frequencies, whereas numerical experiments suggest that it scales linearly with the number of frequencies. We leave the problem of improving our theoretical bound to match the numerical experiments for future work.

The remainder of this paper is organized as follows. In Section 2, we set up our problem, propose an atomic norm denoising program and introduce its semidefinite program (SDP). In Section 3, we present the main theorem that provides the theoretical guarantee for the atomic norm denoising program. In Section 4, we illustrate the theoretical guarantee with several numerical simulations. The proof of the main theorem is presented in Section 5. Finally, we conclude this work and discuss future directions in Section 6. The Appendix provides some supplementary theoretical results.

2 Problem Formulation

In this work, we consider the following signal

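$$\bm{x}^{\star}(m)=\sum_{j=1}^{J}c_{j}e^{-i2\pi m\tau_{j}}\bm{g}_{j}(m),\quad m=-2M,\ldots,0,\ldots,2M,\tag{2.1}$$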

which can be interpreted as uniform samples of a continuous-time superposition of complex exponentials with unknown amplitudes and frequencies, each modulated by a different waveform. The requirement for the above sampling indices to be centered around zero is just for technical convenience. Denote $N=4M+1$ as the length of our samples. All the conclusions in this work remain true with appropriate modifications for any $N$ consecutive samples. Without loss of generality, we assume that the unknown coefficients $c_{j}>0$ and the unknown frequencies $\tau_{j}$ are normalized, i.e., $\tau_{j}\in[0,1)$ . $\bm{g}_{j}\in\mathbb{C}^{N}$ are unknown waveforms and $J$ is the number of active frequencies in the signal $\bm{x}^{\star}$ .

The signal model introduced in (2.1) appears in a wide range of applications. For example, the authors in [9] consider a radar imaging problem of identifying the relative distances and velocities of targets from a received signal $x(t)=\sum_{j=1}^{J}c_{j}e^{i2\pi\tau_{j}t}g(t-\nu_{j})$ , which can be viewed as a sum of finitely many delays (by $\nu_{j}$ ) and Doppler shifts (by $\tau_{j}$ ) of a given transmitted signal $g(t)$ . The signal model (2.1) can be obtained by sampling the received signal $x(t)$ and defining $\bm{g}_{j}$ to be samples of the $j$ th delayed copy of $g(t)$ . In addition, the signal in NMR spectroscopy is modeled as $x(t)=\sum_{j=1}^{J}c_{j}e^{i2\pi\tau_{j}t}e^{-\nu_{j}t}$ in [14]. One can again obtain the signal model (2.1) by sampling the NMR spectroscopy signal $x(t)$ . Finally, the received signal in multi-user communication systems [24] can be modeled as $x(t)=\sum_{j=1}^{J}c_{j}g_{j}(t-\tau_{j})$ , with each $c_{j}$ being an unknown coefficient and each $\tau_{j}$ being an unknown delay of an unknown transmitted signal $g_{j}(t)$ . In this case, the signal model (2.1) can be obtained by sampling the Fourier transform of $x(t)$ , namely, $\widehat{\bm{x}}(f)=\sum_{j=1}^{J}c_{j}e^{-i2\pi f\tau_{j}}\widehat{g}_{j}(f)$ .

As noise is ubiquitous in practice, we may only have access to the noisy observations of $\bm{x}^{\star}$ , namely,

$$\bm{y}(m)=\bm{x}^{\star}(m)+\bm{z}(m),\quad m=-2M,\ldots,0,\ldots,2M,$$

where $\bm{z}$ is the observation noise with i.i.d. complex Gaussian entries from the distribution $\mathcal{CN}(0,\sigma^{2})$ . To recover the unknown frequencies $\{\tau_{j}\}$ , coefficients $\{c_{j}\}$ and modulation waveforms $\{\bm{g}_{j}\in\mathbb{C}^{N}\}$ from the noisy observation $\bm{y}$ , we observe that the number of degrees of freedom ( $\mathcal{O}(JN)$ ) is much larger than the number of observations ( $\mathcal{O}(N)$ ), which implies that we need some other assumptions to make the inverse problem well-posed. Therefore, we assume that all the unknown waveforms $\{\bm{g}_{j}\}$ belong to a common and known low-dimensional subspace, which is spanned by the columns of a matrix $\mathbf{B}\in\mathbb{C}^{N\times K}$ with $K\leq N$ . Let $\bm{b}_{m}\in\mathbb{C}^{K}$ denote the $m$ -th column of $\mathbf{B}^{H}$ , i.e.,

$$\mathbf{B}=\left[\bm{b}_{-2M}\ \cdots\ \bm{b}_{0}\ \cdots\ \bm{b}_{2M}\right]^{H}.$$

Then, we have $\bm{g}_{j}=\mathbf{B}\bm{h}_{j}$ for some unknown coefficients $\bm{h}_{j}\in\mathbb{C}^{K}$ . Without loss of generality, we also assume that $\bm{h}_{j}$ has unit norm, i.e., $\|\bm{h}_{j}\|_{2}=1$ . This is because the coefficients $c_{j}$ can be scaled as needed. Note that $\bm{g}_{j}$ can be estimated once $\bm{h}_{j}$ is recovered. Therefore, the number of degrees of freedom becomes $\mathcal{O}(JK)$ , which can be smaller than $\mathcal{O}(N)$ when we have enough measurements, that is, when $N$ is large enough.
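The signal model and subspace assumption above can be sketched numerically as follows. This is a minimal illustration, not the authors' code: the parameter values are borrowed from the experiments in Section 4, and the variable names (`H`, `G`, `E`) and the fixed random seed are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, for reproducibility only

M, J, K = 20, 3, 4                       # sizes taken from Section 4
N = 4 * M + 1
m = np.arange(-2 * M, 2 * M + 1)         # sample indices -2M, ..., 2M

tau = np.array([0.1, 0.15, 0.5])         # normalized frequencies in [0, 1)
c = np.array([1.0, 2.0, 3.0])            # positive amplitudes
sigma = 0.1                              # noise standard deviation

# Known subspace matrix B (N x K); Rademacher entries are one admissible choice.
B = rng.choice([-1.0, 1.0], size=(N, K))

# Unknown subspace coefficients h_j (columns of H), normalized to unit norm.
H = rng.standard_normal((K, J))
H /= np.linalg.norm(H, axis=0, keepdims=True)

G = B @ H                                # column j is the waveform g_j = B h_j

# x*(m) = sum_j c_j exp(-i 2 pi m tau_j) g_j(m)
E = np.exp(-2j * np.pi * np.outer(m, tau))   # N x J matrix of complex exponentials
x_star = (E * G) @ c

# y = x* + z with i.i.d. entries z(m) ~ CN(0, sigma^2)
z = sigma * (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
y = x_star + z
```

Only the $J$ vectors $\bm{h}_{j}\in\mathbb{C}^{K}$, the amplitudes and the frequencies are unknown here, which is the $\mathcal{O}(JK)$ count of degrees of freedom mentioned above.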

As is illustrated in [23], the assumption that the unknown waveforms $\{\bm{g}_{j}\}$ belong to a common and known low-dimensional subspace appears in many real applications such as super-resolution imaging and multi-user communication systems. For example, the point spread functions in super-resolution imaging can be modeled as Gaussian kernels with unknown widths [12, 25]. One can construct a dictionary of Gaussian functions having different widths and apply principal component analysis to this dictionary to discover a low-dimensional subspace that accurately represents the unknown point spread functions. This is also demonstrated in the simulations of [23].

For any $\tau\in[0,1)$ , define a vector $\bm{a}(\tau)\in\mathbb{C}^{N}$ as

$$\bm{a}(\tau)=\left[e^{i2\pi\tau(-2M)}\ \cdots\ 1\ \cdots\ e^{i2\pi\tau(2M)}\right]^{\top}.\tag{2.2}$$

Then, with the assumption that $\bm{g}_{j}=\mathbf{B}\bm{h}_{j}$ , we can rewrite each sample of the signal $\bm{x}^{\star}$ in (2.1) as

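$$\begin{aligned}\bm{x}^{\star}(m)&=\sum_{j=1}^{J}c_{j}\operatorname{tr}\left(\bm{e}_{m}\bm{b}_{m}^{H}\bm{h}_{j}\bm{a}(\tau_{j})^{H}\right)\\&=\operatorname{tr}\left(\bm{e}_{m}\bm{b}_{m}^{H}\sum_{j=1}^{J}c_{j}\bm{h}_{j}\bm{a}(\tau_{j})^{H}\right)\\&=\left\langle\sum_{j=1}^{J}c_{j}\bm{h}_{j}\bm{a}(\tau_{j})^{H},\ \bm{b}_{m}\bm{e}_{m}^{H}\right\rangle,\end{aligned}\tag{2.3}$$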

where $\bm{e}_{m}$ is the $(m+2M+1)$ -th column of the $N\times N$ identity matrix $\mathbf{I}_{N}$ (we use $\mathbf{I}$ with a subscript to denote an identity matrix of the appropriate size throughout this work) and $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. The inner product between two matrices $\mathbf{X}_{1}$ and $\mathbf{X}_{2}$ is defined as

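$$\langle\mathbf{X}_{1},\mathbf{X}_{2}\rangle=\operatorname{tr}\left(\mathbf{X}_{2}^{H}\mathbf{X}_{1}\right).$$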

Define a linear operator $\mathcal{B}:\mathbb{C}^{K\times N}\rightarrow\mathbb{C}^{N}$ and its corresponding adjoint operator $\mathcal{B}^{*}:\mathbb{C}^{N}\rightarrow\mathbb{C}^{K\times N}$ as

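$$\left[\mathcal{B}(\mathbf{X})\right]_{m}=\left\langle\mathbf{X},\ \bm{b}_{m}\bm{e}_{m}^{H}\right\rangle,\qquad\mathcal{B}^{*}(\bm{x})=\sum_{m=-2M}^{2M}\bm{x}(m)\bm{b}_{m}\bm{e}_{m}^{H},\tag{2.4}$$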

where $\left[\mathcal{B}(\mathbf{X})\right]_{m}$ denotes the $m$ -th entry of $\mathcal{B}(\mathbf{X})$ . Define

$$\mathbf{X}^{\star}\triangleq\sum_{j=1}^{J}c_{j}\bm{h}_{j}\bm{a}(\tau_{j})^{H}.\tag{2.5}$$

Then, we have

$$\bm{x}^{\star}=\mathcal{B}(\mathbf{X}^{\star})$$

and the noisy observation vector $\bm{y}$ can be rewritten as

$$\bm{y}=\bm{x}^{\star}+\bm{z}=\mathcal{B}(\mathbf{X}^{\star})+\bm{z}.$$
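As a sanity check on the lifting described above, the following sketch implements the operator $\mathcal{B}$ and its adjoint under the convention that row $m$ of $\mathbf{B}$ is $\bm{b}_{m}^{H}$, and verifies numerically that $\bm{x}^{\star}=\mathcal{B}(\mathbf{X}^{\star})$ and that the adjoint identity $\langle\mathcal{B}(\mathbf{X}),\bm{x}\rangle=\langle\mathbf{X},\mathcal{B}^{*}(\bm{x})\rangle$ holds. The function names `B_op`, `B_adj` and `steering`, and the small problem sizes, are hypothetical choices for illustration.

```python
import numpy as np

def B_op(B, X):
    """[B(X)]_m = b_m^H X e_m, where row m of the N x K array B is b_m^H."""
    return np.einsum("mk,km->m", B, X)

def B_adj(B, x):
    """B^*(x) = sum_m x(m) b_m e_m^H, returned as a K x N array."""
    return (B.conj() * x[:, None]).T

def steering(tau, M):
    """a(tau) with entries exp(i 2 pi tau m), m = -2M, ..., 2M."""
    return np.exp(2j * np.pi * tau * np.arange(-2 * M, 2 * M + 1))

rng = np.random.default_rng(0)
M, K = 5, 3
N = 4 * M + 1
m = np.arange(-2 * M, 2 * M + 1)

B = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
taus, cs = [0.1, 0.4], [1.0, 2.0]
hs = [v / np.linalg.norm(v) for v in rng.standard_normal((2, K))]

# Lifted matrix X* = sum_j c_j h_j a(tau_j)^H  (K x N).
X_star = sum(c * np.outer(h, steering(t, M).conj()) for c, h, t in zip(cs, hs, taus))

# Direct evaluation of x*(m) = sum_j c_j exp(-i 2 pi m tau_j) (B h_j)(m).
x_direct = sum(c * np.exp(-2j * np.pi * m * t) * (B @ h) for c, h, t in zip(cs, hs, taus))

# Random test pair for the adjoint identity.
X = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
lhs = np.vdot(x, B_op(B, X))         # <B(X), x> = x^H B(X)
rhs = np.vdot(B_adj(B, x), X)        # <X, B^*(x)> = tr(B^*(x)^H X)
```

Because $\mathcal{B}$ only reads the entry pattern $\bm{b}_{m}^{H}\mathbf{X}\bm{e}_{m}$, it can be evaluated in $\mathcal{O}(NK)$ operations without forming any $N\times N$ matrix.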

When $J$ is small, the noiseless data matrix $\mathbf{X}^{\star}$ , defined in (2.5), can be viewed as a sparse combination of elements from the following atomic set

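$$\mathcal{A}\triangleq\left\{\bm{h}\bm{a}(\tau)^{H}:\ \tau\in[0,1),\ \|\bm{h}\|_{2}=1,\ \bm{h}\in\mathbb{C}^{K}\right\}$$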

with $\bm{a}(\tau)$ defined in (2.2). The associated atomic norm is then defined as

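$$\begin{aligned}\|\mathbf{X}\|_{\mathcal{A}}&=\inf\left\{t>0:\ \mathbf{X}\in t\operatorname{conv}(\mathcal{A})\right\}\\&=\inf_{c_{j},\,\tau_{j},\,\|\bm{h}_{j}\|_{2}=1}\left\{\sum_{j=1}^{J}c_{j}:\ \mathbf{X}=\sum_{j=1}^{J}c_{j}\bm{h}_{j}\bm{a}(\tau_{j})^{H},\ c_{j}>0\right\},\end{aligned}\tag{2.6}$$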

where $\operatorname{conv}(\mathcal{A})$ is the convex hull of the atomic set $\mathcal{A}$ .

Let $\lambda$ denote a suitably chosen regularization parameter (see Section 5.1 for guidelines on choosing $\lambda$ ). Inspired by the sparse representation of $\mathbf{X}^{\star}$ with respect to the atomic set $\mathcal{A}$ , we perform denoising by proposing an atomic norm regularized least-squares problem

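$$\widehat{\mathbf{X}}=\arg\min_{\mathbf{X}}\ \frac{1}{2}\|\bm{y}-\mathcal{B}(\mathbf{X})\|_{2}^{2}+\lambda\|\mathbf{X}\|_{\mathcal{A}},\tag{2.7}$$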

which is equivalent to the following semidefinite program (SDP) [17, 22], which in practice can be solved with the CVX software package [26]:

[TABLE]

Here, $\operatorname{Toep}(\bm{u})\in\mathbb{C}^{N\times N}$ denotes the Hermitian Toeplitz matrix with $\bm{u}$ being its first column.

In this work, our goal is to analyze the performance of the above denoising scheme by bounding the mean-squared error (MSE) $\frac{1}{N}\|\widehat{\bm{x}}-\bm{x}^{\star}\|_{2}^{2}$ between the solution $\widehat{\bm{x}}=\mathcal{B}(\widehat{\mathbf{X}})$ and the true signal $\bm{x}^{\star}=\mathcal{B}(\mathbf{X}^{\star})$ .

3 Theoretical Guarantee for Atomic Norm Denoising

Motivated by [27, 28, 23], we assume that the columns of $\mathbf{B}^{H}$ , i.e., $\bm{b}_{m}\in\mathbb{C}^{K}$ , are independent and identically distributed (i.i.d.) samples from a distribution that satisfies the isotropy and incoherence properties with coherence parameter $\mu$ .

•

Isotropy property: A distribution $\mathcal{F}$ satisfies the isotropy property if the condition below holds. (Note that this definition of the isotropy property is slightly different from the one used in [27, 28, 23]. As an example of $\bm{b}$ that obeys the isotropy and incoherence properties (3.1) and (3.2) with $\mu=1$ , we can choose the entries of $\bm{b}$ from the Rademacher distribution.)

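$$\mathbb{E}\,\bm{b}\bm{b}^{H}=\mathbf{I}_{K},\quad\text{and}\quad\mathbb{E}\left(\frac{\bm{b}}{\|\bm{b}\|_{2}}\frac{\bm{b}^{H}}{\|\bm{b}\|_{2}}\right)=\frac{1}{K}\mathbf{I}_{K},\quad\bm{b}\in\mathcal{F}.\tag{3.1}$$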

•

Incoherence property: A distribution $\mathcal{F}$ satisfies the incoherence property with coherence $\mu$ if

$$\max_{1\leq k\leq K}|\bm{b}(k)|^{2}\leq\mu,\quad\bm{b}\in\mathcal{F}\tag{3.2}$$

holds almost surely. Here, $\bm{b}(k)$ denotes the $k$ -th entry of $\bm{b}$ .
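The Rademacher example mentioned above can be checked empirically. The sketch below draws many i.i.d. vectors $\bm{b}$ with $\pm1$ entries and estimates the two moments in the isotropy property and the coherence parameter; the sample size, seed and variable names are assumptions chosen for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
K, n_samples = 4, 200_000

# Each row is one i.i.d. draw of b with Rademacher (+1/-1) entries.
b = rng.choice([-1.0, 1.0], size=(n_samples, K))

# Empirical E[b b^H]; isotropy asks for the identity I_K.
second_moment = b.T @ b / n_samples

# Empirical E[(b/||b||)(b/||b||)^H]; isotropy asks for I_K / K.
b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
second_moment_unit = b_unit.T @ b_unit / n_samples

# Incoherence: the largest squared entry magnitude over all draws.
mu_hat = np.max(np.abs(b) ** 2)
```

For Rademacher entries every draw has $\|\bm{b}\|_{2}=\sqrt{K}$ and $|\bm{b}(k)|^{2}=1$, so the normalized moment is exactly the unnormalized one divided by $K$, and the coherence is $\mu=1$.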

Next, we present the main result that characterizes the MSE $\frac{1}{N}\|\widehat{\bm{x}}-\bm{x}^{\star}\|_{2}^{2}$ in the following theorem.

Theorem 3.1.

Suppose the noiseless signal $\bm{x}^{\star}$ is given as in (2.1) with the true frequencies satisfying a minimum separation condition

$$\min_{j\neq k}d(\tau_{j},\tau_{k})\geq\frac{1}{N},\tag{3.3}$$

where $d(\tau_{j},\tau_{k})\triangleq|\tau_{j}-\tau_{k}|$ denotes the wrap-around distance on the unit circle. Assume that we have $N=4M+1$ noisy measurements $\bm{y}(m)=\bm{x}^{\star}(m)+\bm{z}(m),\leavevmode\nobreak\ m=-2M,\ldots,0,\ldots,2M$ , where $\bm{z}(m)$ is i.i.d. complex Gaussian noise with mean 0 and variance $\sigma^{2}$ . We also assume that $\bm{b}_{m}$ are i.i.d. samples from a distribution that satisfies the isotropy (3.1) and incoherence (3.2) properties with coherence parameter $\mu$ . Then, the estimate $\widehat{\bm{x}}$ obtained by solving the atomic norm regularized least-squares problem (2.7) with $\lambda=2\eta\sigma\|\mathbf{B}\|_{F}\sqrt{\log(N)}$ for some $\eta>1$ satisfies

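$$\frac{1}{N}\|\widehat{\bm{x}}-\bm{x}^{\star}\|_{2}^{2}\leq C\eta^{2}\sigma^{2}\mu^{2}\frac{J^{2}K}{N}\log(N)\log(NJK)\tag{3.4}$$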

with probability at least $1-cN^{-1}$ if $\eta\!\in\!(1,\infty)$ is chosen sufficiently large and $N\geq C\mu J^{2}K\log\left(NJK\right)$ . Here, $C$ and $c$ are some numerical constants (which can vary from line to line in this paper).

Our use of the isotropy (3.1) and incoherence (3.2) properties parallels the assumptions made for subspace models in several related works. These properties were first defined in [27] for the development of a probabilistic and RIPless theory of compressed sensing, and then used in [28] for the problem of blind spikes deconvolution. Other random subspace assumptions are also used in [29, 30] for random channel coding and blind deconvolution. The transmitted signal in multi-user communication systems [24] can also be represented in a known low-dimensional random subspace in the case when each of the transmitters sends out a random signal for the sake of security, privacy, or spread spectrum communications. (Random signals also appear in applications such as noise radar [31]. The transmitted random signals can be either directly generated from a noise-generating microwave source or obtained by modulating a sine wave with random noise.) Note that the matrix $\mathbf{B}$ with randomness assumptions can alternatively be viewed as a sensing measure to obtain random measurements of the data matrix $\mathbf{X}^{\star}$ via (2.3). As stated in [27] and many other compressive sensing works, random measurements are crucial in the development of theoretical results, and can result in better empirical results. As evidenced by the numerical experiments in [23], the randomness assumption on $\mathbf{B}$ does not appear to be critical in practice. Finally, observe that we have used not only the same isotropy and incoherence properties on the random subspace model as in papers [28, 23], but also some other randomness assumptions on the subspace, namely, $\mathbb{E}\left(\frac{\bm{b}}{\|\bm{b}\|_{2}}\frac{\bm{b}^{H}}{\|\bm{b}\|_{2}}\right)=\frac{1}{K}\mathbf{I}_{K}$ , where $\bm{b}$ denotes an arbitrary column of $\mathbf{B}^{H}$ . This part of the isotropy property is only an artifact of our proof technique.

Note that an oracle MSE rate of $\mathcal{O}(\sigma^{2}KJ/N)$ could be achieved if one had enough measurements (i.e., $N\geq KJ$ ) and perfect knowledge of the well-separated frequencies $\tau_{j}$ . To be more precise, the noiseless signal $\bm{x}^{\star}$ in (2.1) can be written as $\bm{x}^{\star}=\mathbf{M}_{ab}\bm{v}_{ch}^{\star}$ , where $\mathbf{M}_{ab}\in\mathbb{C}^{N\times KJ}$ is a matrix related to the subspace matrix $\mathbf{B}$ and vectors $\bm{a}(\tau_{j})$ , and $\bm{v}_{ch}^{\star}\in\mathbb{C}^{KJ}$ is a vector determined by the coefficients $c_{j}$ and $\bm{h}_{j}$ . If $N\geq KJ$ and the well-separated frequencies $\tau_{j}$ are known, the matrix $\mathbf{M}_{ab}$ is then known and is a tall matrix with full column rank due to the randomness of $\mathbf{B}$ . Then, recovering $\bm{x}^{\star}$ from its noisy observation $\bm{y}=\bm{x}^{\star}+\bm{z}$ is equivalent to solving a least-squares problem. Elementary calculations then show that $\frac{1}{N}\mathbb{E}_{\bm{z}}\|\widehat{\bm{x}}-\bm{x}^{\star}\|_{2}^{2}=\sigma^{2}\frac{KJ}{N}$ . Therefore, our proposed MSE bound (3.4) is optimal except for an extra $J$ factor and some logarithmic terms. This extra $J$ factor may be removed by using some other proof strategies instead of the dual analysis of atomic norm. We leave the problem of improving our theoretical bound for future work.
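The oracle rate discussed above can be illustrated with a small Monte Carlo sketch. One way to build $\mathbf{M}_{ab}$, assumed here for illustration, is to stack the blocks $\operatorname{diag}(e^{-i2\pi m\tau_{j}})\mathbf{B}$ side by side, so that $\bm{v}_{ch}^{\star}$ stacks the vectors $c_{j}\bm{h}_{j}$. The parameter values, trial count and variable names are likewise assumptions, and the least-squares fit is carried out as an orthogonal projection onto the range of $\mathbf{M}_{ab}$.

```python
import numpy as np

rng = np.random.default_rng(2)
M, K = 20, 4
N = 4 * M + 1
m = np.arange(-2 * M, 2 * M + 1)
tau = np.array([0.1, 0.15, 0.5])         # frequencies assumed known to the oracle
c = np.array([1.0, 2.0, 3.0])
J = len(tau)
sigma = 0.1

B = rng.choice([-1.0, 1.0], size=(N, K))
H = rng.standard_normal((K, J))
H /= np.linalg.norm(H, axis=0, keepdims=True)

# M_ab = [D_1 B, ..., D_J B] with D_j = diag(exp(-i 2 pi m tau_j)); shape N x KJ.
M_ab = np.hstack([np.exp(-2j * np.pi * m * t)[:, None] * B for t in tau])
v_ch = np.concatenate([c[j] * H[:, j] for j in range(J)])
x_star = M_ab @ v_ch

# Orthonormal basis for range(M_ab); least squares is projection onto this range.
Q, _ = np.linalg.qr(M_ab)

n_trials = 2000
Z = sigma * (rng.standard_normal((N, n_trials))
             + 1j * rng.standard_normal((N, n_trials))) / np.sqrt(2)
Y = x_star[:, None] + Z
X_hat = Q @ (Q.conj().T @ Y)             # oracle least-squares estimates, one per column

mse_ls = np.mean(np.sum(np.abs(X_hat - x_star[:, None]) ** 2, axis=0)) / N
oracle = sigma ** 2 * K * J / N          # predicted value sigma^2 K J / N
```

The estimation error is the projection of the noise onto a $KJ$-dimensional subspace, which is why the expected squared error equals $\sigma^{2}KJ$ before dividing by $N$.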

Finally, note that we have removed the randomness assumption on $\bm{h}$ as is required in [23]. However, we bound the sample complexity $N$ with $\mathcal{O}(J^{2}K)$ instead of $\mathcal{O}(JK)$ . We also notice that the author in [28] provides a sample complexity bound $\mathcal{O}(J^{2}K^{2})$ without using any randomness assumptions on $\bm{h}$ . It is worth noting that those papers are not focused on signal denoising as we are in this work. The extra $J$ factor in our sample complexity bound is a result of removing the randomness assumption on $\bm{h}$ that is used in Lemmas 11 and 13 of reference [23].

4 Numerical Simulations

In this section, we conduct four numerical experiments to support our theoretical bound in Theorem 3.1. In all of these experiments, we perform denoising by solving the SDP of the atomic norm regularized least-squares problem (2.7) with CVX. As is suggested in Theorem 3.1, we set the regularization parameter as $\lambda=2\eta\sigma\|\mathbf{B}\|_{F}\sqrt{\log(N)}$ with $\eta=0.5$ . (Note that we require $\eta>1$ in Theorem 3.1. However, we find that $\eta=0.5$ can achieve much lower MSE in practice. Thus, we set $\eta=0.5$ for all of the following experiments.) We generate the entries of $\mathbf{B}$ as Rademacher random variables and the entries of $\bm{h}_{j},\leavevmode\nobreak\ j=1,\ldots,J$ as Gaussian random variables satisfying $\mathcal{N}(0,1)$ and then normalize all the $\bm{h}_{j}$ to make sure $\|\bm{h}_{j}\|_{2}=1$ . We generate the observation noise $\bm{z}$ with i.i.d. complex Gaussian entries satisfying $\mathcal{CN}(0,\sigma^{2})$ . 50 trials are performed for each of these experiments.
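The regularization parameter used in these experiments is simple to compute once $\mathbf{B}$ is drawn. A minimal sketch, with the sizes $M=20$, $K=4$ and $\sigma=0.1$ taken from the experiments below and the seed chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(3)
M, K = 20, 4
N = 4 * M + 1
sigma, eta = 0.1, 0.5                    # eta = 0.5 as in the experiments

# Rademacher subspace matrix B (N x K).
B = rng.choice([-1.0, 1.0], size=(N, K))

# lambda = 2 * eta * sigma * ||B||_F * sqrt(log N)
lam = 2 * eta * sigma * np.linalg.norm(B, "fro") * np.sqrt(np.log(N))
```

With Rademacher entries, $\|\mathbf{B}\|_{F}=\sqrt{NK}$ exactly, so in this setting $\lambda=2\eta\sigma\sqrt{NK\log(N)}$.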

In the first experiment, we examine the relationship between the MSE $\frac{1}{N}\|\widehat{\bm{x}}-\bm{x}^{\star}\|_{2}^{2}$ and the total number of uniform samples $N$ with $J=3,\leavevmode\nobreak\ K=4,$ and $\sigma=0.1$ . The true frequencies and corresponding amplitudes are set as $\tau_{1}=0.1,\leavevmode\nobreak\ \tau_{2}=0.15,\leavevmode\nobreak\ \tau_{3}=0.5$ and $c_{1}=1,\leavevmode\nobreak\ c_{2}=2,\leavevmode\nobreak\ c_{3}=3$ . We change $M$ from 10 to 100, namely, $N=4M+1$ changes from 41 to 401. Figure 1(a) shows the denoising performance of the atomic norm regularized least-squares problem (2.7), while Figure 1(b) indicates that the MSE does scale with $\mathcal{O}(\frac{1}{N}\log(N))$ , which implies that the MSE decays roughly as $1/N$ (if we ignore the log term) as we increase the number of uniform samples $N$ .

In the second experiment, we characterize the impact of the noise variance $\sigma^{2}$ on the MSE. We fix $M=20$ and set $\sigma=0.025:0.025:0.25$ . Other parameters are the same as in the first experiment. It can be seen from Figure 2(a) that the MSE does scale linearly with $\sigma^{2}$ , as predicted by Theorem 3.1.

To see the influence of the number of true frequencies $J$ on the MSE, we repeat the first experiment by fixing $M=20$ and changing $J$ from 1 to 10. We randomly select $J$ true frequencies from a set $\{0:\operatorname{MinSep}:1-\operatorname{MinSep}\}$ with $\operatorname{MinSep}=\frac{1}{M}$ denoting the minimum separation. The corresponding amplitudes are then set as $c_{j}=j,\leavevmode\nobreak\ j=1,\ldots,J$ . Other parameters are the same as those used in the first experiment. Figure 2(b) implies that the MSE scales with $J\log(J)$ , which is better than the one indicated in Theorem 3.1. We leave the improvement of Theorem 3.1 for future work.
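The frequency selection in this experiment can be sketched as follows: draw $J$ distinct grid points from $\{0:\operatorname{MinSep}:1-\operatorname{MinSep}\}$ and confirm that every pair is at least $\operatorname{MinSep}$ apart in wrap-around distance. The helper name `wrap_dist` and the seed are assumptions for this illustration.

```python
import numpy as np

def wrap_dist(t1, t2):
    """Wrap-around distance between frequencies on the unit circle [0, 1)."""
    d = np.abs(t1 - t2) % 1.0
    return np.minimum(d, 1.0 - d)

rng = np.random.default_rng(4)
M, J = 20, 10
min_sep = 1.0 / M

# Grid {0, MinSep, ..., 1 - MinSep}; any two distinct points are >= MinSep apart.
grid = np.arange(M) * min_sep
tau = np.sort(rng.choice(grid, size=J, replace=False))
c = np.arange(1, J + 1, dtype=float)     # amplitudes c_j = j

# Smallest pairwise wrap-around distance among the selected frequencies.
D = wrap_dist(tau[:, None], tau[None, :])
min_pair = np.min(D[~np.eye(J, dtype=bool)])
```

Sampling without replacement from the grid guarantees the separation by construction, including across the wrap-around point between $1-\operatorname{MinSep}$ and $0$.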

Finally, we illustrate the relationship between the MSE and the subspace dimension $K$ in the last experiment. We set $M=20$ and change $K$ from 1 to 10. Other parameters are the same as those used in the first experiment. Figure 2(c) roughly shows a linear relationship between the MSE and $K\log(K)$ , which is consistent with the bound in Theorem 3.1.

5 Proof of Theorem 3.1

The proof of Theorem 3.1 is presented in two separate subsections. In Subsection 5.1, we discuss how to choose the regularization parameter $\lambda$ . It is well known that a good choice of the regularization parameter $\lambda$ can achieve accelerated convergence rates. (The conditions under which an accelerated convergence rate can be obtained for an atomic norm denoising problem are discussed in [19].) In Subsection 5.2, with the well-chosen regularization parameter $\lambda$ , we bound the MSE with respect to the noise level and true signal parameters. We prove Theorem 3.1 by extending the proof of [20, Theorem 1] and [18, Theorem III.6] to our framework. Note that [18, Theorem III.6] is a multiple measurement vector (MMV) extension of [20, Theorem 1]. Due to the random linear operator $\mathcal{B}$ that appears in our atomic norm denoising problem (2.7), we will develop a random extension of the two previous results, [20, Theorem 1] and [18, Theorem III.6], with respect to $\mathcal{B}$ . The proof idea is borrowed from these prior works. However, the proof there does not extend directly to our framework with the linear operator $\mathcal{B}$ . For example, this linear operator $\mathcal{B}$ can first affect our choice of the regularization parameter $\lambda$ .

Inspired by the regularization parameter used in prior works [19, 20, 18], we set

$$\lambda\approx\eta\,\mathbb{E}_{\bm{z}}\|\mathcal{B}^{*}(\bm{z})\|_{\mathcal{A}}^{*}$$

in the atomic norm regularized least-squares problem (2.7). Here, $\bm{z}$ is a complex Gaussian vector and $\eta\in(1,\infty)$ is a constant that must be large enough to enable the proof of Lemma 5.5. $\|\cdot\|_{\mathcal{A}}^{*}$ is defined as the dual norm of the atomic norm defined in (2.6), which is given as

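$$\begin{aligned}\|\mathbf{Q}\|_{\mathcal{A}}^{*}&=\sup_{\tau\in[0,1),\,\|\bm{h}\|_{2}=1}\left\langle\mathbf{Q},\bm{h}\bm{a}(\tau)^{H}\right\rangle_{\mathbb{R}}\\&=\sup_{\tau\in[0,1),\,\|\bm{h}\|_{2}=1}\left\langle\mathbf{Q}\bm{a}(\tau),\bm{h}\right\rangle_{\mathbb{R}}\\&=\sup_{\tau\in[0,1)}\|\mathbf{Q}\bm{a}(\tau)\|_{2}.\end{aligned}\tag{5.1}$$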

In order to set $\lambda$ , we first compute an upper bound for $\mathbb{E}_{\bm{z}}\|\mathcal{B}^{*}(\bm{z})\|_{\mathcal{A}}^{*}$ .
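Since the dual norm reduces to $\sup_{\tau\in[0,1)}\|\mathbf{Q}\bm{a}(\tau)\|_{2}$, it can be approximated by evaluating $\|\mathbf{Q}\bm{a}(\tau)\|_{2}$ on a fine uniform grid of $\tau$. The sketch below does this; the function names and the grid size are assumptions for illustration, and a grid maximum is only a lower bound on the true supremum.

```python
import numpy as np

def steering(tau, M):
    """a(tau) with entries exp(i 2 pi tau m), m = -2M, ..., 2M."""
    return np.exp(2j * np.pi * tau * np.arange(-2 * M, 2 * M + 1))

def dual_atomic_norm(Q, M, n_grid=4096):
    """Grid approximation of sup_tau ||Q a(tau)||_2 for a K x N matrix Q."""
    m = np.arange(-2 * M, 2 * M + 1)
    taus = np.arange(n_grid) / n_grid
    A = np.exp(2j * np.pi * np.outer(m, taus))   # column l is a(tau_l)
    return np.max(np.linalg.norm(Q @ A, axis=0))

rng = np.random.default_rng(5)
M, K = 5, 3
N = 4 * M + 1

# Test case: a single atom Q = h a(tau0)^H with ||h||_2 = 1 and tau0 on the grid.
h = rng.standard_normal(K)
h /= np.linalg.norm(h)
tau0 = 0.25
Q = np.outer(h, steering(tau0, M).conj())

val = dual_atomic_norm(Q, M)
```

For this single-atom test case, $\|\mathbf{Q}\bm{a}(\tau)\|_{2}=|\bm{a}(\tau_{0})^{H}\bm{a}(\tau)|$, which peaks at $\tau=\tau_{0}$ with value $N$, so `val` should equal $N$ up to rounding.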

5.1 Bounding $\mathbb{E}_{\bm{z}}\|\mathcal{B}^{*}(\bm{z})\|_{\mathcal{A}}^{*}$ and $\mathbb{P}\left(\|\mathcal{B}^{*}(\bm{z})\|_{\mathcal{A}}^{*}\leq\frac{\lambda}{\eta}\right)$

Lemma 5.1.

Let $\bm{z}\in\mathbb{C}^{N}$ be a random vector with i.i.d. complex Gaussian entries from the distribution $\mathcal{CN}(0,\sigma^{2})$ . Define a linear operator $\mathcal{B}^{*}:\mathbb{C}^{N}\rightarrow\mathbb{C}^{K\times N}$ as in (2.4), i.e.,

[TABLE]

where $\bm{b}_{m}$ is the $m$-th column of the $K\times N$ matrix $\mathbf{B}^{H}$, the conjugate transpose of $\mathbf{B}$, $\bm{x}(m)$ is the $m$-th entry of $\bm{x}$, and $\bm{e}_{m}$ is the $(m+2M+1)$-th column of the $N\times N$ identity matrix $\mathbf{I}_{N}$. Then, there exists a numerical constant $C\in(1,2)$ such that

[TABLE]

By setting the regularization parameter as $\lambda=2\eta\sigma\|\mathbf{B}\|_{F}\sqrt{\log(N)}$ , we have

[TABLE]

with $c$ being some constant.
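As a rough numerical illustration of Lemma 5.1, the following sketch evaluates $\lambda=2\eta\sigma\|\mathbf{B}\|_{F}\sqrt{\log(N)}$ and approximates the dual norm by maximizing $\|\sum_{m}\bm{z}(m)\bm{b}_{m}e^{i2\pi m\tau}\|_{2}$ over a uniform grid of $\tau$. The function names, the grid size, and the Gaussian choice of $\mathbf{B}$ are illustrative assumptions and are not part of the analysis; the grid maximum only lower-bounds the true supremum.

```python
import numpy as np

def reg_param(B, sigma, eta):
    """lambda = 2 * eta * sigma * ||B||_F * sqrt(log N), with N the number of rows of B."""
    N = B.shape[0]
    return 2.0 * eta * sigma * np.linalg.norm(B, "fro") * np.sqrt(np.log(N))

def dual_norm_on_grid(B, z, L):
    """Grid approximation of sup_tau || sum_m z(m) b_m e^{i 2 pi m tau} ||_2, tau = l / L."""
    N, K = B.shape
    half = (N - 1) // 2                      # indices m run over -2M, ..., 2M
    m = np.arange(-half, half + 1)
    taus = np.arange(L) / L
    bm = B.conj()                            # row m holds b_m (m-th column of B^H)
    E = np.exp(2j * np.pi * np.outer(taus, m))   # shape (L, N)
    W = E @ (z[:, None] * bm)                # shape (L, K); row l is B^*(z) a(l / L)
    return np.max(np.linalg.norm(W, axis=1))

# Illustrative experiment (all sizes are assumptions).
rng = np.random.default_rng(0)
N, K, sigma, eta = 65, 3, 0.1, 1.5
B = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
z = sigma * (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
lam = reg_param(B, sigma, eta)
print("grid dual norm:", dual_norm_on_grid(B, z, 8 * N), "lambda / eta:", lam / eta)
```

Repeating the last lines over many noise draws gives an empirical estimate of how often the event $\|\mathcal{B}^{*}(\bm{z})\|_{\mathcal{A}}^{*}\leq\lambda/\eta$ occurs on the grid.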

Proof.

The dual norm defined in (5.1) implies that

[TABLE]

where $\bm{b}_{m}(k)$ is the $k$ -th entry of $\bm{b}_{m}$ . We have defined a polynomial $\mathcal{Z}_{N}(e^{i2\pi\tau})$ as

[TABLE]

Note that we have

[TABLE]

for any $\tau_{1},\tau_{2}\in[0,1)$ . The first inequality follows from the mean value theorem while the last inequality follows from Bernstein’s inequality for polynomials [32].

Let $\tau_{2}$ take any of the values $0,\frac{1}{L},\ldots,\frac{L-1}{L}$ , which gives us

[TABLE]

Then, we upper bound $\left(\|\mathcal{B}^{*}(\bm{z})\|_{\mathcal{A}}^{*}\right)^{2}$ with

[TABLE]

if $L\geq 4\pi N$ . It follows that

[TABLE]

and

[TABLE]
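The discretization idea used above can be checked numerically in its scalar form: combining the mean value theorem with Bernstein's inequality, the supremum of a trigonometric polynomial with $N$ coefficients is at most $(1-2\pi N/L)^{-1}$ times its maximum over the $L$-point grid. The sketch below is a minimal illustration under that reading; the helper names are hypothetical and the constants in the displayed bounds are not reproduced.

```python
import numpy as np

def trig_poly(coeffs, taus):
    """Evaluate Z(tau) = sum_{m=-2M}^{2M} coeffs[m + 2M] * e^{i 2 pi m tau}."""
    half = (len(coeffs) - 1) // 2
    m = np.arange(-half, half + 1)
    return np.exp(2j * np.pi * np.outer(taus, m)) @ coeffs

def grid_bound(coeffs, L):
    """Grid maximum of |Z| scaled by (1 - 2 pi N / L)^{-1}, an upper bound on sup |Z|."""
    N = len(coeffs)
    if L <= 2 * np.pi * N:
        raise ValueError("need L > 2 * pi * N for the bound to be meaningful")
    grid_max = np.max(np.abs(trig_poly(coeffs, np.arange(L) / L)))
    return grid_max / (1.0 - 2.0 * np.pi * N / L)

# Compare a dense-grid estimate of the supremum with the bound for L >= 4 * pi * N.
rng = np.random.default_rng(0)
coeffs = rng.standard_normal(33) + 1j * rng.standard_normal(33)
L = int(np.ceil(4 * np.pi * len(coeffs)))
dense = np.max(np.abs(trig_poly(coeffs, np.linspace(0.0, 1.0, 20000, endpoint=False))))
print(dense, "<=", grid_bound(coeffs, L))
```

With $L\geq 4\pi N$ the scaling factor is at most 2, which is why a grid of size proportional to $N$ suffices.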

Observe that, conditioned on $\{\bm{b}_{m}\}$ ,

[TABLE]

where $u_{k,l}$ is a complex Gaussian random variable and defined as

[TABLE]

for $k=1,\ldots,K,\leavevmode\nobreak\ l=0,\ldots,L-1$ . Note that the expectation and variance of $u_{k,l}$ are given as

[TABLE]

since $\bm{z}(m)\sim\mathcal{CN}(0,\sigma^{2})$. Therefore, conditioned on $\{\bm{b}_{m}\}$, the complex Gaussian random variable $u_{k,l}$ defined in (5.7) is distributed as $\mathcal{CN}(0,\widetilde{\sigma}_{k}^{2})$. Let $u_{k,l}^{r}$ and $u_{k,l}^{i}$ denote the real and imaginary parts of $u_{k,l}$, i.e., $u_{k,l}=u_{k,l}^{r}+iu_{k,l}^{i}$. Then, we have

[TABLE]

which implies that

[TABLE]

is a chi-squared random variable with two degrees of freedom since both $\frac{\sqrt{2}u_{k,l}^{r}}{\widetilde{\sigma}_{k}}$ and $\frac{\sqrt{2}u_{k,l}^{i}}{\widetilde{\sigma}_{k}}$ follow the standard normal distribution. Using the properties of the chi-squared distribution, we have

[TABLE]
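The chi-squared property can be sanity-checked by simulation. For $u\sim\mathcal{CN}(0,\widetilde{\sigma}^{2})$, the quantity $2|u|^{2}/\widetilde{\sigma}^{2}$ has two degrees of freedom, so $|u|^{2}$ is exponentially distributed with mean $\widetilde{\sigma}^{2}$ and $\mathbb{P}(|u|^{2}\geq\delta)=e^{-\delta/\widetilde{\sigma}^{2}}$. The sketch below compares this closed form with a Monte Carlo estimate; the function names and sample sizes are illustrative assumptions.

```python
import numpy as np

def tail_prob_exact(delta, var):
    """P(|u|^2 >= delta) for u ~ CN(0, var): |u|^2 is exponential with mean var."""
    return np.exp(-delta / var)

def tail_prob_mc(delta, var, n, rng):
    """Monte Carlo estimate of the same tail probability from n complex Gaussian draws."""
    u = np.sqrt(var / 2.0) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return np.mean(np.abs(u) ** 2 >= delta)

rng = np.random.default_rng(0)
var, delta = 2.0, 3.0
print(tail_prob_exact(delta, var), tail_prob_mc(delta, var, 200_000, rng))
```

In particular, a threshold of the form $\delta=2\widetilde{\sigma}^{2}\log(L)$ gives a tail probability of $L^{-2}$, small enough to survive a union bound over the $L$ grid points.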

Choosing $\widetilde{\delta}_{k}=2\widetilde{\sigma}_{k}^{2}\log(L)$ and $L=4\pi N\log(N)$ , together with inequalities (5.5), (5.6), and (5.8), we finally obtain

[TABLE]

where $C$ is a numerical constant that belongs to the interval $(1,2)$ when $N$ is large. Note that the last equality follows from the fact that

[TABLE]

This completes the proof for inequality (5.2).

Next, we can set the regularization parameter $\lambda$ as

[TABLE]

for some constant $\eta\in(1,\infty)$ and continue to prove inequality (5.3). It follows from (5.4) that

[TABLE]

where $\mathcal{W}_{l,k}\triangleq\sum_{m=-2M}^{2M}\bm{z}(m)\bm{b}_{m}(k)e^{i2\pi(l/L)m}$ is a set of complex Gaussian variables with mean 0 and variance $N\widetilde{\sigma}_{k}^{2}\sigma^{2}$ since $\bm{z}(m)\sim\mathcal{CN}(0,\sigma^{2})$ and $\mathbf{B}$ is fixed. Then, we have

[TABLE]

for any $\beta>1/\sqrt{2\pi}$ [33]. As a consequence, we have

[TABLE]

where the first inequality follows by plugging in (5.9), $\lambda=2\eta\sigma\|\mathbf{B}\|_{F}\sqrt{\log(N)}$ , and $\|\mathbf{B}\|_{F}^{2}=N\sum_{k=1}^{K}\widetilde{\sigma}_{k}^{2}$ . The second inequality comes from the union bound. By letting $\beta=2\sqrt{\left(1-\frac{2\pi N}{L}\right)\log(N)}$ and $L=8\pi N$ , we finally obtain

[TABLE]

with some numerical constant $c$ . Here, the first inequality follows from (5.10).

∎

5.2 Bounding $\frac{1}{N}\|\widehat{\bm{x}}-\bm{x}^{\star}\|_{2}^{2}$

Now, we can set the regularization parameter as $\lambda=2\eta\sigma\|\mathbf{B}\|_{F}\sqrt{\log(N)}$ for some constant $\eta\in(1,\infty)$ such that

[TABLE]

holds with probability at least $1-c\frac{1}{N}$ , as is shown in Lemma 5.1.

With some fundamental computations based on convex analysis, we have the following lemma that provides optimality conditions for $\widehat{\mathbf{X}}$ to be the solution of the atomic norm regularized least-squares problem (2.7).

Lemma 5.2.

(Optimality Conditions): $\widehat{\mathbf{X}}$ is the solution of the atomic norm regularized least-squares problem (2.7) if and only if

1. $\|\mathcal{B}^{*}(\bm{y}-\widehat{\bm{x}})\|_{\mathcal{A}}^{*}\leq\lambda$,

2. $\langle\widehat{\mathbf{X}},\mathcal{B}^{*}(\bm{y}-\widehat{\bm{x}})\rangle_{\mathbb{R}}=\lambda\|\widehat{\mathbf{X}}\|_{\mathcal{A}}$.

Define a vector-valued representing measure for the true data matrix $\mathbf{X}^{\star}$ as

[TABLE]

with $\tau\in[0,1),\leavevmode\nobreak\ \|\bm{h}_{j}\|_{2}=1$ , that is, we have

[TABLE]

Similarly, we can also define a representation measure $\widehat{\boldsymbol{\mu}}$ for the recovered data matrix $\widehat{\mathbf{X}}$ and represent it as

[TABLE]

Then, a difference measure can be defined as

[TABLE]

which implies that we can represent the recovery error as

[TABLE]

Define the $j$ -th near region corresponding to $\tau_{j}$ and the far region as

[TABLE]

where $d(\tau,\tau_{j})\triangleq|\tau-\tau_{j}|$ denotes the wrap-around distance on the unit circle. Define

[TABLE]
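For concreteness, the wrap-around distance and the partition of $[0,1)$ into near regions and the far region can be sketched as follows. The radius $0.16/N$ is the one used for $N_{j}$ in Theorem B.1; the function names and the label convention ($-1$ for the far region) are illustrative assumptions.

```python
import numpy as np

def wrap_dist(tau, tau_j):
    """Wrap-around distance on the unit circle, i.e. |tau - tau_j| modulo 1 folded to [0, 1/2]."""
    d = np.abs(np.asarray(tau, dtype=float) - tau_j) % 1.0
    return np.minimum(d, 1.0 - d)

def region_labels(taus, true_freqs, N, radius=0.16):
    """Label each tau with the index j of the near region N_j containing it, or -1 for F."""
    taus = np.asarray(taus, dtype=float)
    labels = -np.ones(taus.shape, dtype=int)
    for j, tau_j in enumerate(true_freqs):
        labels[wrap_dist(taus, tau_j) < radius / N] = j
    return labels

print(region_labels([0.001, 0.5, 0.999], [0.0, 0.3], N=100))
```

Note that a frequency just below 1 is close to a true frequency at 0, which is why the plain absolute difference must be folded around the circle.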

It follows that $\bm{e}=\mathcal{B}(\mathbf{E})$ and we can then bound $\|\bm{e}\|_{2}^{2}$ as

[TABLE]

Here, we have defined a vector-valued error function $\boldsymbol{\xi}(\tau)\triangleq\mathcal{B}^{*}\mathcal{B}(\mathbf{E})\bm{a}(\tau)=\mathcal{B}^{*}(\bm{e})\bm{a}(\tau)$ .

With a slight abuse of notation, we define

[TABLE]

By using the optimality conditions in Lemma 5.2 and the assumption that the bound condition in (5.11) holds, we have

[TABLE]

To bound the MSE $\frac{1}{N}\|\bm{e}\|_{2}^{2}$ , we need the following three key lemmas.

Lemma 5.3.

Observe that each entry of $\boldsymbol{\xi}(\tau)=\mathcal{B}^{*}\mathcal{B}(\mathbf{E})\bm{a}(\tau)$ is an order- $N$ trigonometric polynomial. We have

[TABLE]

with

[TABLE]

The proof of Lemma 5.3 is given in Appendix A.

Lemma 5.4.

For some numerical constants $C_{0}$ and $C_{1}$ , we have that

[TABLE]

hold with probability at least $1-2\delta$ provided that $N\geq C\mu J^{2}K\log\left(\frac{NJK}{\delta}\right)$. Here, $\|\cdot\|_{2,2}\triangleq\left(\int_{0}^{1}\|\cdot\|_{2}^{2}d\tau\right)^{\frac{1}{2}}$ is defined as the $2,2$ norm.

The proof of Lemma 5.4 is given in Appendix B.

Lemma 5.5.

There exists a numerical constant $C$ such that

[TABLE]

holds with probability at least $1-2\delta$ for some sufficiently large $\eta>1$ .

The proof of Lemma 5.5 is given in Appendix C.

As a consequence of the above three lemmas, we have

[TABLE]

Note that

[TABLE]

where the fourth equality follows from Parseval’s theorem and $\bm{e}(m)$ is the $m$-th entry of $\bm{e}$. It follows that

[TABLE]
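The Parseval step can be illustrated numerically. Writing $\boldsymbol{\xi}(\tau)=\mathcal{B}^{*}(\bm{e})\bm{a}(\tau)=\sum_{m}\bm{e}(m)\bm{b}_{m}e^{i2\pi m\tau}$, orthogonality of the complex exponentials gives $\int_{0}^{1}\|\boldsymbol{\xi}(\tau)\|_{2}^{2}d\tau=\sum_{m}|\bm{e}(m)|^{2}\|\bm{b}_{m}\|_{2}^{2}$. The sketch below checks this identity with a Riemann sum, which is exact for a uniform grid of at least $N$ points; the helper names and the random test data are assumptions made for illustration.

```python
import numpy as np

def xi_on_grid(B, e, L):
    """Rows are xi(l / L) = sum_m e(m) b_m exp(i 2 pi m l / L) for l = 0, ..., L - 1."""
    N = B.shape[0]
    m = np.arange(N) - (N - 1) // 2          # m = -2M, ..., 2M
    E = np.exp(2j * np.pi * np.outer(np.arange(L) / L, m))
    return E @ (e[:, None] * B.conj())       # row m of B.conj() is b_m

def sq_22_norm(B, e, L):
    """Riemann-sum value of int_0^1 ||xi(tau)||_2^2 d tau (exact when L >= N)."""
    return np.mean(np.sum(np.abs(xi_on_grid(B, e, L)) ** 2, axis=1))

def parseval_rhs(B, e):
    """sum_m |e(m)|^2 ||b_m||_2^2."""
    return np.sum(np.abs(e) ** 2 * np.sum(np.abs(B) ** 2, axis=1))

rng = np.random.default_rng(1)
B = rng.standard_normal((9, 2)) + 1j * rng.standard_normal((9, 2))
e = rng.standard_normal(9) + 1j * rng.standard_normal(9)
print(sq_22_norm(B, e, 18), parseval_rhs(B, e))
```

Bounding each $\|\bm{b}_{m}\|_{2}^{2}$ uniformly then relates $\|\boldsymbol{\xi}(\tau)\|_{2,2}^{2}$ to $\|\bm{e}\|_{2}^{2}$.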

Finally, plugging (5.17) into (5.16), we have that

[TABLE]

holds with probability at least $1-cN^{-1}$ provided that $N\geq C\mu J^{2}K\log\left(\frac{NJK}{\delta}\right)$. Here, the last two inequalities follow from the incoherence property (3.2) and by setting $\delta=N^{-1}$.
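Because $N$ appears on both sides of the condition $N\geq C\mu J^{2}K\log\left(\frac{NJK}{\delta}\right)$, it can be helpful to see how a sufficient $N$ scales with $J$ and $K$. The sketch below finds one such $N$ by fixed-point iteration with $\delta=N^{-1}$. The constant `C=1.0` is a placeholder, since only the existence of a numerical constant is established, so the returned values indicate scaling only.

```python
import math

def sufficient_samples(mu, J, K, C=1.0):
    """Return an integer N with N >= C * mu * J^2 * K * log(N * J * K / delta), delta = 1 / N.

    C is a hypothetical placeholder for the unspecified numerical constant.
    """
    a = C * mu * J ** 2 * K

    def rhs(n):
        # log(N J K / delta) with delta = 1 / N equals log(N^2 J K)
        return a * math.log(n * n * J * K)

    N = max(2, math.ceil(2 * a))   # beyond 2a the right-hand side grows slower than N
    while N < rhs(N):
        N = math.ceil(rhs(N))
    return N

for J in (1, 2, 4, 8):
    print(J, sufficient_samples(mu=1.0, J=J, K=3))
```

The iteration stops because the right-hand side grows only logarithmically in $N$; it returns a sufficient value, not necessarily the smallest one.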

Next, we explain the reason why we use $N\geq C\mu J^{2}K\log\left(\frac{NJK}{\delta}\right)$ instead of the lower bound provided in paper [23], which considers the noiseless counterpart of this work. Particularly, [23] requires $N$ to satisfy

[TABLE]

if all the $\bm{h}_{j}$ are i.i.d. symmetric random samples from the complex unit sphere, namely, $\mathbb{E}\bm{h}_{j}\bm{h}_{j}^{H}=\frac{1}{K}\mathbf{I}_{K}$ . In order to drop this randomness assumption on $\bm{h}_{j}$ since we never use it in our proof, we make a slight modification of the proof in paper [23]. Note that the authors in [23] only use the randomness assumption on $\bm{h}_{j}$ in Lemmas 11 and 13. Therefore, we only need to bound $\|\mathbf{I}_{1}^{l}(\tau_{d})\|_{2}$ and $\|\mathbf{I}_{2}^{l}(\tau_{d})\|_{2}$ in Lemmas 11 and 13 without the randomness assumption on $\bm{h}_{j}$ .

In this part, we use the same notation as paper [23]. Readers can refer to paper [23] for detailed definitions of all variables. Inspired by the proof of [28, Lemma 5], we have

[TABLE]

which is conditioned on $\mathcal{E}_{3}\bigcap\mathcal{E}_{1,\varepsilon_{1}}$. Here, $\mathcal{E}_{3}$ and $\mathcal{E}_{1,\varepsilon_{1}}$ are two events defined in [23]. The last inequality follows from $\sup_{\tau_{d}\in\Omega_{\operatorname{Grid}}}\|(\mathbf{V}_{l}(\tau_{d})-\mathbb{E}\mathbf{V}_{l}(\tau_{d}))^{H}\mathbf{L}\|\leq 4\varepsilon_{2}$ on the event $\mathcal{E}_{3}$ and $\|\bm{h}\|_{2}=\sqrt{J}$ with $\bm{h}=[\bm{h}_{1}^{H}\leavevmode\nobreak\ \cdots\leavevmode\nobreak\ \bm{h}_{J}^{H}]^{H}$. Then, we can obtain $\sup_{\tau_{d}\in\Omega_{\operatorname{Grid}}}\|\mathbf{I}_{1}^{l}(\tau_{d})\|_{2}\leq\varepsilon_{4}$ by setting $\varepsilon_{2}\leq\frac{\varepsilon_{4}}{4\sqrt{J}}$.

Getting rid of the conditional probability, we have

[TABLE]

It is shown in paper [23] that the first term $4|\Omega_{\operatorname{Grid}}|\delta_{2}\leq\delta$ and the second term $\mathbb{P}(\mathcal{E}^{c}_{1,\varepsilon_{1}})\leq\delta$ when provided

[TABLE]

and

[TABLE]

respectively. Thus, for some constant $C$ , we have

[TABLE]

provided

[TABLE]

Note that we set $\varepsilon_{1}=\frac{1}{4}$ and absorb all of the constants into one here.

Similarly, conditioned on $\mathcal{E}_{1,\varepsilon_{1}}$ , we have

[TABLE]

for some numerical constant $C$ . Then, we can obtain $\sup_{\tau_{d}\in\Omega_{\operatorname{Grid}}}\|\mathbf{I}_{2}^{l}(\tau_{d})\|_{2}\leq\varepsilon_{5}$ by setting $\varepsilon_{1}\leq\frac{\varepsilon_{5}}{C\sqrt{J}}$ .

Getting rid of the conditional probability, we have

[TABLE]

when provided

[TABLE]

Now, we have dropped the randomness assumption on $\bm{h}_{j}$ that is used in Lemmas 11 and 13 of paper [23]. We can follow the remaining proof of [23] and finally get

[TABLE]

Define $\mathcal{T}\triangleq\{\tau_{1},\tau_{2},\cdots,\tau_{J}\}$ as the true frequency set. The above bound on $N$ can guarantee that the $\ell_{2}$ norm of the dual polynomial $\mathcal{Q}(\tau)$ constructed in [23] is strictly less than 1 when $\tau\notin\mathcal{T}$ , which is used in the proof of Lemma 5.4.

6 Conclusion

In this work, we recover a signal that consists of a superposition of complex exponentials with unknown waveform modulations from its noisy measurements by solving an atomic norm regularized least-squares problem. We analyze the mean square error (MSE) and provide a theoretical result that bounds the MSE in terms of the noise variance, the total number of uniform samples, the number of true frequencies, and the dimension of the subspace in which the unknown waveform modulations live. We also conduct several numerical experiments to support the theory. One of the experiments indicates that there is room to improve the MSE bound and make it scale linearly with the number of true frequencies. We leave this for future work.

Acknowledgement

The authors would like to thank Jonathan Helland at the Colorado School of Mines for some helpful discussions on atomic norm denoising. The authors would also like to thank the anonymous reviewers for their constructive comments and suggestions, which greatly improved the quality of this paper. This work was supported by NSF grant CCF-1409258, NSF grant CCF-1464205, and NSF grant CCF-1704204.

Appendix A Proof of Lemma 5.3

Let $\bm{u}\in\mathbb{C}^{K}$ be any vector with $\|\bm{u}\|_{2}=1$ . Define a trigonometric polynomial

[TABLE]

with degree $N$ . Then, we have the following two inequalities

[TABLE]

which follow from Bernstein’s inequality for polynomials [32]. As a consequence, we have

[TABLE]

Therefore, we obtain an upper bound on $\|\boldsymbol{\xi}^{\prime}(\tau)\|_{2,\infty}$ :

[TABLE]

With a similar argument, we also have

[TABLE]

The Taylor expansion of $\gamma(\tau)$ at $\tau_{j}$ is

[TABLE]

with some $\widetilde{\tau}_{j}\in N_{j}$ . Now, by using the inequality (A.1), we obtain

[TABLE]

Defining a function $\bm{r}(\tau)$ as

[TABLE]

we note that

[TABLE]

Then, we have

[TABLE]

Now, we can bound the second term in (5.12) as follows

[TABLE]

where $I_{l},l=0,1,2$ are defined in Lemma 5.3. Here, we have plugged in $\boldsymbol{\xi}(\tau)=\boldsymbol{\xi}(\tau_{j})+(\tau-\tau_{j})\boldsymbol{\xi}^{\prime}(\tau_{j})+\bm{r}(\tau)$ to get the first equality. On the other hand, the first term in (5.12) can be bounded as

[TABLE]

by using the Cauchy-Schwarz inequality.

Finally, the square error $\|\bm{e}\|_{2}^{2}$ can be upper bounded as

[TABLE]

and we finish the proof of Lemma 5.3.

Appendix B Proof of Lemma 5.4

To prove Lemma 5.4, we need the following two theorems, which are multiple measurement vector (MMV) random extensions of [20, Theorems 4, 5] and are proved in Appendices D and E.

Theorem B.1.

Define the $K$-dimensional unit sphere $\mathcal{H}=\{\bm{h}\in\mathbb{C}^{K}:\|\bm{h}\|_{2}=1\}$. For any $\tau_{1},\tau_{2},\ldots,\tau_{J}$ satisfying the minimum separation condition (3.3), there exists a dual certificate $\bm{q}\in\mathbb{C}^{N}$ such that the corresponding vector-valued trigonometric polynomial $\mathcal{Q}(\tau)=\mathcal{B}^{*}(\bm{q})\bm{a}(\tau)$ satisfies the following properties, provided that $N\geq C\mu J^{2}K\log\left(\frac{NJK}{\delta}\right)$.

1. For each $j=1,\ldots,J$, $\mathcal{Q}(\tau_{j})=\bm{h}_{j}$ with $\bm{h}_{j}\in\mathcal{H}$.

2. In each near region $N_{j}=\{\tau:d(\tau,\tau_{j})<0.16/N\}$, there exist constants $C_{a}$ and $C^{\prime}_{a}$ such that

[TABLE]

3. In the far region $\tau\in F=[0,1)\setminus\cup_{j=1}^{J}N_{j}$, there exists a constant $C_{b}>0$ such that

[TABLE]

Theorem B.2.

Define the $K$-dimensional unit sphere $\mathcal{H}=\{\bm{h}\in\mathbb{C}^{K}:\|\bm{h}\|_{2}=1\}$. For any $\tau_{1},\tau_{2},\ldots,\tau_{J}$ satisfying the minimum separation condition (3.3), there exists a vector-valued trigonometric polynomial $\mathcal{Q}_{1}(\tau)=\mathcal{B}^{*}(\bm{q}_{1})\bm{a}(\tau)$ that satisfies the following properties for some $\bm{q}_{1}\in\mathbb{C}^{N}$, provided that $N\geq C\mu J^{2}K\log\left(\frac{NJK}{\delta}\right)$.

1. In each near region $N_{j}=\{\tau:d(\tau,\tau_{j})<0.16/N\}$, there exists a constant $C_{a}^{1}$ such that

[TABLE]

2. In the far region $\tau\in F=[0,1)\setminus\cup_{j=1}^{J}N_{j}$, there exists a constant $C_{b}^{1}>0$ such that

[TABLE]

Next, we define a dual certificate as follows:

Definition B.1.

(Dual Certificate): Define a vector $\bm{q}\in\mathbb{C}^{N}$ as a dual certificate for $\bm{x}^{\star}$ if $\bm{q}$ makes the corresponding trigonometric polynomial

[TABLE]

satisfy

[TABLE]

where $\mathcal{T}\triangleq\{\tau_{1},\tau_{2},\cdots,\tau_{J}\}$ is defined as a set containing all the true frequencies.

Note that

[TABLE]

where the last inequality follows from $\|\mathcal{Q}(\tau)\|_{2}\leq 1$ and

[TABLE]

by using inequality (B.2) and the fact that $\frac{\int_{N_{j}}\bm{v}(\widehat{\tau})d\widehat{\tau}}{\|\int_{N_{j}}\bm{v}(\widehat{\tau})d\widehat{\tau}\|_{2}}$ belongs to $\mathcal{H}$ .

Recall that the linear operator $\mathcal{B}:\mathbb{C}^{K\times N}\rightarrow\mathbb{C}^{N}$ and its adjoint operator $\mathcal{B}^{*}:\mathbb{C}^{N}\rightarrow\mathbb{C}^{K\times N}$ are defined as in (2.3) and (2.4). Then, we have $\mathcal{B}\mathcal{B}^{*}:\mathbb{C}^{N}\rightarrow\mathbb{C}^{N}$ and $(\mathcal{B}\mathcal{B}^{*})^{-1}:\mathbb{C}^{N}\rightarrow\mathbb{C}^{N}$ given as

[TABLE]
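The operators recalled above can be written out in a few lines. The sketch assumes the forms $\mathcal{B}(\mathbf{X})(m)=\bm{b}_{m}^{H}\mathbf{X}\bm{e}_{m}$ and $\mathcal{B}^{*}(\bm{z})=\sum_{m}\bm{z}(m)\bm{b}_{m}\bm{e}_{m}^{H}$, under which $\mathcal{B}\mathcal{B}^{*}$ acts diagonally by multiplying the $m$-th entry by $\|\bm{b}_{m}\|_{2}^{2}$. The function names are hypothetical, and $\mathbf{B}$ is stored as an $N\times K$ array whose $m$-th row is $\bm{b}_{m}^{H}$.

```python
import numpy as np

def op_B(X, B):
    """B(X)(m) = b_m^H X e_m for a K x N matrix X; row m of B equals b_m^H."""
    return np.einsum("mk,km->m", B, X)

def op_B_adj(z, B):
    """B^*(z) = sum_m z(m) b_m e_m^H: a K x N matrix whose m-th column is z(m) b_m."""
    return (B.conj() * z[:, None]).T

def bb_adj_inverse(z, B):
    """(B B^*)^{-1}(z): entrywise division by ||b_m||_2^2."""
    return z / np.sum(np.abs(B) ** 2, axis=1)

rng = np.random.default_rng(0)
N, K = 7, 3
B = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
X = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
z = rng.standard_normal(N) + 1j * rng.standard_normal(N)
# Adjoint identity: <B(X), z> = <X, B^*(z)>.
print(np.vdot(z, op_B(X, B)), np.vdot(op_B_adj(z, B), X))
```

The diagonal structure is what makes $(\mathcal{B}\mathcal{B}^{*})^{-1}$ explicit and cheap to apply.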

To get (5.13), we still need to bound the first term in (B.8). In particular, we have

[TABLE]

Define a new polynomial $\widetilde{\mathcal{Q}}(\tau)\triangleq\mathcal{B}^{*}(\mathcal{B}\mathcal{B}^{*})^{-1}(\bm{q})\bm{a}(\tau)$ and recall that $\boldsymbol{\xi}(\tau)=\mathcal{B}^{*}\mathcal{B}(\mathbf{E})\bm{a}(\tau)$ . With Parseval’s theorem, we obtain

[TABLE]

where the last inequality follows from the Cauchy-Schwarz inequality. Here, we define the $2,2$ -norm of $\widetilde{\mathcal{Q}}(\tau)$ and $\boldsymbol{\xi}(\tau)$ as

[TABLE]

It follows that

[TABLE]

The following lemma gives an upper bound for $\|\widetilde{\mathcal{Q}}(\tau)\|_{2,2}$ and is proved in Appendix F.

Lemma B.1.

Define two events

[TABLE]

with $\mathbb{P}(\mathcal{E}_{\mathbf{K}})\geq 1-\delta$ and $\mathbb{P}(\mathcal{E}_{\mathbf{K}^{\prime}})\geq 1-\delta$ . $\mathbf{K}$ and $\mathbf{K}^{\prime}$ are two block matrices defined in Appendix F. Conditioned on the above two events $\mathcal{E}_{\mathbf{K}}$ and $\mathcal{E}_{\mathbf{K}^{\prime}}$ , the $2,2$ norm of $\widetilde{\mathcal{Q}}(\tau)$ can be bounded as

[TABLE]

for some numerical constant $C$ provided that $N\geq CK\log\left(\frac{K(J+1)}{\delta}\right)$ .

By plugging (B.9) and (B.10) into (B.8), one can bound $I_{0}$ as

[TABLE]

conditioned on the two events $\mathcal{E}_{\mathbf{K}}$ and $\mathcal{E}_{\mathbf{K}^{\prime}}$ and provided that $N\geq CK\log\left(\frac{K(J+1)}{\delta}\right)$ .

Similar to (B.8), we can divide $I_{1}$ into the following three parts

[TABLE]

where the last inequality follows from (B.4) and (B.5). Then, we are left with bounding the first term in (B.12).

With a trick similar to that used for $I_{0}$, we have

[TABLE]

The vector-valued polynomial $\widetilde{\mathcal{Q}}_{1}(\tau)$ shares the same form as in (F.11), namely,

[TABLE]

with coefficient vectors $\boldsymbol{\alpha}^{1}=[{\boldsymbol{\alpha}_{1}^{1}}^{H}\leavevmode\nobreak\ \cdots\leavevmode\nobreak\ {\boldsymbol{\alpha}_{J}^{1}}^{H}]^{H}$ and $\boldsymbol{\beta}^{1}=[{\boldsymbol{\beta}_{1}^{1}}^{H}\leavevmode\nobreak\ \cdots\leavevmode\nobreak\ {\boldsymbol{\beta}_{J}^{1}}^{H}]^{H}$ satisfying

[TABLE]

which can be verified with a similar trick used in (F.15).

Similar to Lemma B.1, we can bound $\|\widetilde{\mathcal{Q}}_{1}(\tau)\|_{2,2}$ as

[TABLE]

with probability at least $1-\delta$. Finally, one can bound $I_{1}$ as

[TABLE]

conditioned on the two events $\mathcal{E}_{\mathbf{K}}$ and $\mathcal{E}_{\mathbf{K}^{\prime}}$ and provided that $N\geq CK\log\left(\frac{K(J+1)}{\delta}\right)$ . Then, we finish the proof of Lemma 5.4.

Appendix C Proof of Lemma 5.5

Define $\mathcal{P}_{\mathcal{T}}(\boldsymbol{\nu})$ as the projection of $\boldsymbol{\nu}(\tau)$ on the true frequency set $\mathcal{T}\triangleq\{\tau_{1},\tau_{2},\cdots,\tau_{J}\}$. Set $\mathcal{Q}(\tau)$ as the dual polynomial in Theorem B.1. Denote $\|\cdot\|_{2,\operatorname{TV}}$ as an extension of the traditional TV norm, i.e.,

[TABLE]

where $\mathcal{T}^{c}$ is defined as the complement set of $\mathcal{T}$ on $[0,1)$ . Note that the integration over the far region $F$ can be bounded with

[TABLE]

by using (B.3). On the other hand, we can bound the integration over $N_{j}\setminus\{\tau_{j}\}$ with

[TABLE]

where the second inequality follows from (B.1). Hence, $\|\mathcal{P}_{\mathcal{T}}(\boldsymbol{\nu})\|_{2,\operatorname{TV}}$ can be bounded with

[TABLE]

by plugging (C.2) and (C.3) into (C.1). It follows that

[TABLE]

As in Lemma 5.2, denote $\widehat{\mathbf{X}}$ as the solution of the atomic norm regularized least-squares problem (2.7). Then, we have

[TABLE]

By some elementary calculations, we can obtain

[TABLE]

where the first equality follows from $\bm{y}=\mathcal{B}(\mathbf{X}^{\star})+\bm{z}$ . Then, we have

[TABLE]

which immediately results in

[TABLE]

due to $\|\widehat{\mathbf{X}}\|_{\mathcal{A}}=\|\widehat{\boldsymbol{\mu}}\|_{2,\operatorname{TV}}$ and $\|\mathbf{X}^{\star}\|_{\mathcal{A}}=\|\boldsymbol{\mu}\|_{2,\operatorname{TV}}$ .

Recall that in Lemma 5.3, we have shown

[TABLE]

With a similar technique, we can bound the inner product $|\langle\bm{e},\bm{z}\rangle|$ by

[TABLE]

if $\|\mathcal{B}^{*}(\bm{z})\|_{\mathcal{A}}^{*}\leq\frac{\lambda}{\eta}$ . Here, we also use Lemma 5.4 in the last inequality. Substituting (C.6) into (C.5) leads to

[TABLE]

which further implies

[TABLE]

Combining (C.4) and (C.7), we get

[TABLE]

Finally, we can obtain (5.15) with large enough $\eta$ and finish the proof of Lemma 5.5.

Appendix D Proof of Theorem B.1

We use the dual polynomial constructed in [23], namely,

[TABLE]

with

[TABLE]

being the random matrix kernel and $\boldsymbol{\alpha}=[\boldsymbol{\alpha}_{1}^{H}\leavevmode\nobreak\ \cdots\leavevmode\nobreak\ \boldsymbol{\alpha}_{J}^{H}]^{H}$ , $\boldsymbol{\beta}=[\boldsymbol{\beta}_{1}^{H}\leavevmode\nobreak\ \cdots\leavevmode\nobreak\ \boldsymbol{\beta}_{J}^{H}]^{H}$ being the coefficients that are selected such that

[TABLE]

Then, we can ensure that the first and third statements are satisfied due to the construction of this dual polynomial, when $N$ satisfies the lower bound given in (5.18). Then, we are left with proving the second statement.

For all $\tau_{j}\in\mathcal{T}$ , we have $\|\mathcal{Q}(\tau_{j})\|_{2}=1$ and

[TABLE]

due to $\mathcal{Q}^{\prime}(\tau_{j})=\mathbf{0}$ . Furthermore, for all $\tau\in N_{j}$ , we have

[TABLE]

where the last inequality follows from $\|\mathcal{Q}^{\prime}(\tau)\|_{2}^{2}+\operatorname{Re}\{\mathcal{Q}^{\prime\prime}(\tau)^{H}\mathcal{Q}(\tau)\}\leq-CN^{2}$ and $\|\mathcal{Q}(\tau)\|_{2}\leq 1$ for some numerical constant $C$ [23]. The Taylor expansion of $\mathcal{Q}(\tau)$ at $\tau_{j}$ gives

[TABLE]

with some $\widetilde{\tau}\in N_{j}$ . Then, setting $C=\frac{1}{2}C_{\alpha}$ , we can obtain (B.1).

Next, we continue to prove (B.2). Recall the dual polynomial constructed in [17], namely,

[TABLE]

where

[TABLE]

is the squared Fejér kernel and $\overline{\boldsymbol{\alpha}}=[\overline{\boldsymbol{\alpha}}_{1}^{H}\leavevmode\nobreak\ \cdots\leavevmode\nobreak\ \overline{\boldsymbol{\alpha}}_{J}^{H}]^{H}$ , $\overline{\boldsymbol{\beta}}=[\overline{\boldsymbol{\beta}}_{1}^{H}\leavevmode\nobreak\ \cdots\leavevmode\nobreak\ \overline{\boldsymbol{\beta}}_{J}^{H}]^{H}$ are the coefficients that are selected such that

[TABLE]

With the help of the above dual polynomial (D.3), we have

[TABLE]

since $\|\mathcal{Q}(\tau)-\overline{\mathcal{Q}}(\tau)\|_{2}$ can be upper bounded with a very small number as is shown in Lemma 15 of paper [23].

To obtain (B.2), we next bound $\|\bm{h}_{j}-\overline{\mathcal{Q}}(\tau)\|_{2}$ by following the proof strategies of Lemma 2.5 in [7]. Without loss of generality, we consider $\tau_{j}=0$ and bound $\|\bm{h}_{j}-\overline{\mathcal{Q}}(\tau)\|_{2}$ in the interval $[0,0.16/N]$ . Define

[TABLE]

where $\bm{w}_{R}(\tau)$ and $\bm{w}_{I}(\tau)$ denote the real and imaginary part of $\bm{w}(\tau)$ , respectively. Then, we have

[TABLE]

where we have used

[TABLE]

for the third line [17, 23]. The last line follows from equation (2.25) and Lemma 2.7 of paper [1].

Then, in the interval $[0,0.16/N]$ , due to $\bm{w}_{R}(0)=\bm{w}_{R}^{\prime}(0)=\bm{w}_{I}(0)=\bm{w}_{I}^{\prime}(0)=0$ , we have

[TABLE]

with some $\widetilde{\tau}\in[0,0.16/N]$ . Similarly, we can get

[TABLE]

It follows that

[TABLE]

which implies that

[TABLE]

and we finish the proof.

Appendix E Proof of Theorem B.2

In this section, we extend the proof of Lemma 2.7 in [7] to prove our Theorem B.2. Define a vector-valued polynomial $\mathcal{Q}_{1}(\tau)$ that shares the same form of $\mathcal{Q}(\tau)$ as in (D.1), namely,

[TABLE]

where the random matrix kernel $\mathbf{K}_{M}(\tau)$ is defined in (D.2) and the coefficient vectors $\boldsymbol{\alpha}^{1}=[{\boldsymbol{\alpha}_{1}^{1}}^{H}\leavevmode\nobreak\ \cdots\leavevmode\nobreak\ {\boldsymbol{\alpha}_{J}^{1}}^{H}]^{H}$ , $\boldsymbol{\beta}^{1}=[{\boldsymbol{\beta}_{1}^{1}}^{H}\leavevmode\nobreak\ \cdots\leavevmode\nobreak\ {\boldsymbol{\beta}_{J}^{1}}^{H}]^{H}$ are selected to satisfy

[TABLE]

Similar to Appendix D, we define another polynomial $\overline{\mathcal{Q}}_{1}(\tau)$ with the squared Fejér kernel $\mathcal{K}_{M}(\tau)$ , namely,

[TABLE]

where $\overline{\boldsymbol{\alpha}}^{1}=[{\overline{\boldsymbol{\alpha}}_{1}^{1}}^{H}\leavevmode\nobreak\ \cdots\leavevmode\nobreak\ {\overline{\boldsymbol{\alpha}}_{J}^{1}}^{H}]^{H}$ , $\overline{\boldsymbol{\beta}}^{1}=[{\overline{\boldsymbol{\beta}}_{1}^{1}}^{H}\leavevmode\nobreak\ \cdots\leavevmode\nobreak\ {\overline{\boldsymbol{\beta}}_{J}^{1}}^{H}]^{H}$ are the coefficients that are selected such that

[TABLE]

It can be seen that the polynomial $\overline{\mathcal{Q}}_{1}(\tau)$ (E.2) is to $\mathcal{Q}_{1}(\tau)$ (E.1) what $\overline{\mathcal{Q}}(\tau)$ (D.3) is to $\mathcal{Q}(\tau)$ (D.1). Therefore, we can show that $\|\mathcal{Q}_{1}(\tau)-\overline{\mathcal{Q}}_{1}(\tau)\|_{2}$ is upper bounded with a very small number when provided with (5.18) by using a similar strategy to that in paper [23]. This further implies that

[TABLE]

Then, we only need to bound $\|\overline{\mathcal{Q}}_{1}(\tau)\|_{2}$ and $\|\bm{h}_{j}(\tau-\tau_{j})-\overline{\mathcal{Q}}_{1}(\tau)\|_{2}$ .

Note that the constraints in (E.3) can be expressed in the following matrix form

[TABLE]

with $(\overline{\mathbf{D}}_{l})_{sj}=\mathcal{K}_{M}^{l}(t_{s}-t_{j})$ and $\bm{h}=[\bm{h}_{1}^{H}\leavevmode\nobreak\ \cdots\leavevmode\nobreak\ \bm{h}_{J}^{H}]^{H}$ . Define

[TABLE]

It is shown in [1] that $\overline{\mathbf{D}}$ is invertible, which implies that $\overline{\mathbf{D}}\otimes\mathbf{I}_{K}$ is also invertible and these coefficient vectors can be expressed as

[TABLE]

with

[TABLE]

Then, the coefficient vectors can be rewritten as

[TABLE]
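The two linear-algebra facts used around this step, namely that $\overline{\mathbf{D}}\otimes\mathbf{I}_{K}$ inherits invertibility from $\overline{\mathbf{D}}$ with $(\overline{\mathbf{D}}\otimes\mathbf{I}_{K})^{-1}=\overline{\mathbf{D}}^{-1}\otimes\mathbf{I}_{K}$, and the definition of the matrix infinity norm as the maximum absolute row sum, can be checked with a small example. The matrix below is an arbitrary invertible stand-in, not the actual system matrix, and the function names are hypothetical.

```python
import numpy as np

def kron_inverse(D, K):
    """Inverse of kron(D, I_K), computed as kron(D^{-1}, I_K)."""
    return np.kron(np.linalg.inv(D), np.eye(K))

def matrix_inf_norm(A):
    """||A||_inf = max_i sum_j |a_ij| (maximum absolute row sum)."""
    return np.max(np.sum(np.abs(A), axis=1))

D = np.array([[2.0, 1.0], [1.0, 3.0]])   # arbitrary invertible example
K = 3
big = np.kron(D, np.eye(K))
print(np.allclose(kron_inverse(D, K) @ big, np.eye(2 * K)))
print(matrix_inf_norm(big), matrix_inf_norm(D))
```

Taking the Kronecker product with the identity leaves the infinity norm unchanged, so bounds on the scalar system carry over to the block system.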

Define $\|\mathbf{A}\|_{\infty}\triangleq\max_{i}\sum_{j}|a_{ij}|$ as the infinity norm of a matrix $\mathbf{A}$ . Using a similar method to that in the proof of Lemma 5.3 in the technical report [34], we can bound the $l_{2,\infty}$ norm of $\overline{\boldsymbol{\alpha}}^{1}$ and $\overline{\boldsymbol{\beta}}^{1}$ as

[TABLE]

where we have used the bounds (B.7) and (B.8) in paper [7]. It follows that

[TABLE]

where the last inequality follows from (E.4) and (B.9) in paper [7].

With the same method used to obtain (B.2), we can show that

[TABLE]

when we consider $\tau_{j}=0$ without loss of generality. Therefore, we obtain (B.4) and finish the proof.

Appendix F Proof of Lemma B.1

Recall that the dual certificate $\bm{q}\in\mathbb{C}^{N}$ constructed in [23] satisfies the two conditions (B.6) and (B.7) that are required in Definition B.1 when $N$ satisfies the lower bound given in (5.18). Therefore, we use the optimal $\bm{q}\in\mathbb{C}^{N}$ constructed in [23], that is

[TABLE]

Here, $\mathbf{W}=\operatorname{diag}([w_{-2M},\cdots,w_{0},\cdots,w_{2M}])$ denotes a weighting matrix and

[TABLE]

are some coefficients that satisfy

[TABLE]

where $\mathbf{A}\in\mathbb{C}^{2KJ\times N}$ and $\bm{c}_{h}\in\mathbb{C}^{2KJ}$ are given as

[TABLE]

Plugging in the optimal dual certificate $\bm{q}$ in (F.7), the polynomial $\widetilde{\mathcal{Q}}(\tau)$ can be represented as

[TABLE]

with

[TABLE]

As in [23], by setting $w_{m}=\sqrt{\frac{M}{g_{M}(m)}}$ , we have

[TABLE]

Recall that we assume $\frac{\bm{b}_{m}}{\|\bm{b}_{m}\|_{2}}$ satisfies the isotropy property (3.1), namely,

[TABLE]

which implies that

[TABLE]

where $\mathcal{K}_{M}(\tau)\triangleq\frac{1}{M}\sum_{m=-2M}^{2M}g_{M}(m)e^{i2\pi\tau m}$ is the squared Fejér kernel. It follows that

[TABLE]

where $\mathbf{K}\triangleq[\widetilde{\mathbf{K}}_{M}(\tau-\tau_{1})\leavevmode\nobreak\ \cdots\leavevmode\nobreak\ \widetilde{\mathbf{K}}_{M}(\tau-\tau_{J})]$ and $\mathbf{K}^{\prime}\triangleq[\widetilde{\mathbf{K}}_{M}^{\prime}(\tau-\tau_{1})\leavevmode\nobreak\ \cdots\leavevmode\nobreak\ \widetilde{\mathbf{K}}_{M}^{\prime}(\tau-\tau_{J})]$ are two block matrices with size $K\times KJ$ . $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are two coefficient vectors defined in (F.10). It follows from [23] that

[TABLE]

where $\mathbf{D}$ denotes a system matrix and is defined as

[TABLE]

with $[\mathbf{D}_{l}]_{sj}=\widetilde{\mathbf{K}}_{M}^{(l)}(\tau_{s}-\tau_{j})$, $l=0,1,2$. (Footnote 9: We use $\widetilde{\mathbf{K}}_{M}^{(l)}(\tau)$ to denote the $l$-th order derivative of $\widetilde{\mathbf{K}}_{M}(\tau)$, namely, $\widetilde{\mathbf{K}}_{M}^{(0)}(\tau)=\widetilde{\mathbf{K}}_{M}(\tau)$, $\widetilde{\mathbf{K}}_{M}^{(1)}(\tau)=\widetilde{\mathbf{K}}_{M}^{\prime}(\tau)$, and $\widetilde{\mathbf{K}}_{M}^{(2)}(\tau)=\widetilde{\mathbf{K}}_{M}^{\prime\prime}(\tau)$.) To obtain (F.14), we have used $\|\mathbf{D}^{-1}\|\leq 2\|(\mathbb{E}\mathbf{D})^{-1}\|$ [23, Lemma 7], $\|(\mathbb{E}\mathbf{D})^{-1}\|\leq 1.568$ [23, Lemma 4], and $\|\bm{c}_{h}\|_{2}=\sqrt{J}$. The inequality in (F.14) also implies that

[TABLE]

since $|\mathcal{K}_{M}^{\prime\prime}(0)|=\frac{4\pi^{2}(M^{2}-1)}{3}\leq CN^{2}$ . Then, we have

[TABLE]

where the last inequality follows from (F.15) and the Cauchy-Schwarz inequality.

To bound $\|\widetilde{\mathcal{Q}}(\tau)\|_{2,2}^{2}$ , we are left with bounding $\int_{0}^{1}\|\mathbf{K}\|^{2}d\tau$ and $\int_{0}^{1}\|\mathbf{K}^{\prime}\|^{2}d\tau$ . Note that

[TABLE]

where the last inequality follows from the Cauchy-Schwarz inequality.

Denote $\lambda_{\max}(\mathbf{X})$ as the maximum eigenvalue of a matrix $\mathbf{X}$ . The second term in (F.17) can be bounded with

[TABLE]

for some numerical constant $C$ . Here we use Parseval’s theorem and $\|g_{M}\|_{\infty}=\sup_{m}|g_{M}(m)|\leq 1$ [16] to get the last equality and inequality, respectively.
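The properties of the squared Fejér kernel used in this appendix can be verified numerically. The sketch below assumes the standard definition of the coefficients, $g_{M}(m)=\frac{1}{M}\sum_{k}\left(1-\frac{|k|}{M}\right)\left(1-\frac{|m-k|}{M}\right)$ with $|k|\leq M$ and $|m-k|\leq M$, which is taken from the super-resolution literature and not restated in this section. It then checks $\|g_{M}\|_{\infty}\leq 1$, the closed form $\mathcal{K}_{M}(\tau)=\left[\frac{\sin(\pi M\tau)}{M\sin(\pi\tau)}\right]^{4}$, and $|\mathcal{K}_{M}^{\prime\prime}(0)|=\frac{4\pi^{2}(M^{2}-1)}{3}$. Function names are hypothetical.

```python
import numpy as np

def fejer_sq_coeffs(M):
    """g_M(m) for m = -2M, ..., 2M: (1/M) times the self-convolution of the weights 1 - |k|/M."""
    k = np.arange(-M, M + 1)
    tri = 1.0 - np.abs(k) / M
    return np.convolve(tri, tri) / M         # length 4M + 1

def fejer_sq_kernel(M, taus):
    """K_M(tau) = (1/M) sum_m g_M(m) e^{i 2 pi tau m}; real-valued because g_M is symmetric."""
    m = np.arange(-2 * M, 2 * M + 1)
    g = fejer_sq_coeffs(M)
    return (np.exp(2j * np.pi * np.outer(taus, m)) @ g).real / M

def fejer_sq_second_derivative_at_zero(M):
    """K_M''(0) = -(4 pi^2 / M) sum_m m^2 g_M(m), from differentiating the series twice."""
    m = np.arange(-2 * M, 2 * M + 1)
    return -(4.0 * np.pi ** 2 / M) * np.sum(m ** 2 * fejer_sq_coeffs(M))

M = 8
taus = np.array([0.013, 0.2, 0.37])
closed_form = (np.sin(np.pi * M * taus) / (M * np.sin(np.pi * taus))) ** 4
print(np.max(fejer_sq_coeffs(M)), fejer_sq_kernel(M, taus) - closed_form)
print(fejer_sq_second_derivative_at_zero(M), -4 * np.pi ** 2 * (M ** 2 - 1) / 3)
```

Since $N=4M+1$, the second-derivative value grows like $N^{2}$, consistent with the $CN^{2}$ bound used in (F.15).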

Next, we bound $\left\|\mathbf{K}-\mathbb{E}\mathbf{K}\right\|$ with the matrix Bernstein inequality [35]. Define a set of independent zero mean random matrices $\mathbf{S}_{m}\in\mathbb{C}^{K\times KJ},m=-2M,\ldots,2M$ with

[TABLE]

where “ $\otimes$ ” denotes the Kronecker product. Then, we have

[TABLE]

To apply the matrix Bernstein inequality, we need to bound the spectral norm $\|\mathbf{S}_{m}\|$ and the matrix variance statistic of the sum:

[TABLE]

which we tackle separately in the sequel.

Note that we can bound the spectral norm as

[TABLE]

with some numerical constant $C$ . Here, the last inequality follows from

[TABLE]

On the other hand, we have

[TABLE]

and

[TABLE]

where $\mathbf{E}_{m\tau}$ is a $J\times J$ matrix with the $(k,l)$-th entry being $e^{i2\pi(\tau_{k}-\tau_{l})m}$. Therefore, the matrix variance statistic of the sum can be bounded with $C\frac{J}{MK}$. Then, applying the matrix Bernstein inequality [35] yields that

[TABLE]

for any $t\in[0,C\frac{\sqrt{J}}{K}]$ . Set $t=C\sqrt{\frac{J}{MK}\log\left(\frac{K(J+1)}{\delta}\right)}$ , which belongs to the interval $[0,C\frac{\sqrt{J}}{K}]$ if $M\geq CK\log\left(\frac{K(J+1)}{\delta}\right)$ . Then, we have

[TABLE]

which immediately suggests that the following event

[TABLE]

holds with probability at least $1-\delta$ provided that $N\geq CK\log\left(\frac{K(J+1)}{\delta}\right)$ . Conditioned on event $\mathcal{E}_{\mathbf{K}}$ , one can show that

[TABLE]

holds by using (F.18) provided that $N\geq CK\log\left(\frac{K(J+1)}{\delta}\right)$ .

Similar to (F.17), we note that

[TABLE]

Using Parseval’s identity, we can bound the second term in (F.21) as

[TABLE]

with some numerical constant $C$ .

Now, we bound $\left\|\mathbf{K}^{\prime}-\mathbb{E}\mathbf{K}^{\prime}\right\|$ with the matrix Bernstein inequality [35]. Define a set of independent zero mean random matrices $\mathbf{S}_{m}^{\prime}\in\mathbb{C}^{K\times KJ},m=-2M,\ldots,2M$ with

[TABLE]

Then, we have

[TABLE]

It can be seen that $\mathbf{S}_{m}^{\prime}$ is also the first order derivative of $\mathbf{S}_{m}$ with respect to $\tau$ . We can then bound its spectral norm as

[TABLE]

with some numerical constant $C$ . Here, the last inequality follows from (F.19).

Further, we have

[TABLE]

and

[TABLE]

where $\mathbf{E}_{m\tau}$ is a $J\times J$ matrix with the $(k,l)$-th entry being $e^{i2\pi(\tau_{k}-\tau_{l})m}$. Therefore, the matrix variance statistic of the sum can be bounded with $C\frac{JM}{K}$. Then, we combine the above bounds and apply the matrix Bernstein inequality to obtain

[TABLE]

for any $t\in[0,C\frac{M\sqrt{J}}{K}]$ . Set $t=C\sqrt{\frac{JM}{K}\log\left(\frac{K(J+1)}{\delta}\right)}$ , which belongs to the interval $[0,C\frac{M\sqrt{J}}{K}]$ if $M\geq CK\log\left(\frac{K(J+1)}{\delta}\right)$ . Then, we have

[TABLE]

and the following event

[TABLE]

holds with probability at least $1-\delta$ provided that $N\geq CK\log\left(\frac{K(J+1)}{\delta}\right)$. Thus, one can show that

[TABLE]

holds on the event $\mathcal{E}_{\mathbf{K}^{\prime}}$ provided that $N\geq CK\log\left(\frac{K(J+1)}{\delta}\right)$.
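The concentration argument used twice above can be sanity-checked numerically. The sketch below is not the paper's construction: it uses a hypothetical Rademacher series $\mathbf{Z}=\sum_k \varepsilon_k \mathbf{A}_k$ with fixed real matrices $\mathbf{A}_k$ (so each summand is zero-mean and bounded), computes the norm bound `L` and variance statistic `v`, solves the standard matrix Bernstein tail bound for the threshold `t` at failure probability `delta`, and compares it against simulated spectral norms. All names and sizes are illustrative assumptions.

```python
import numpy as np

def bernstein_threshold(v, L, dim_sum, delta):
    """Smallest t with dim_sum * exp(-(t^2/2) / (v + L*t/3)) <= delta."""
    a = np.log(dim_sum / delta)
    # Solve t^2/2 = a * (v + L*t/3), i.e. t^2 - (2aL/3) t - 2av = 0.
    b = a * L / 3.0
    return b + np.sqrt(b * b + 2.0 * a * v)

def simulate(d1=4, d2=12, n=200, trials=500, delta=0.05, seed=0):
    rng = np.random.default_rng(seed)
    # Fixed coefficient matrices A_k (hypothetical), scaled like 1/sqrt(n).
    A = rng.standard_normal((n, d1, d2)) / np.sqrt(n)
    # Uniform bound on the spectral norm of each summand eps_k * A_k.
    L = max(np.linalg.norm(A_k, 2) for A_k in A)
    # Matrix variance statistic: since eps_k^2 = 1, E[S_k S_k^T] = A_k A_k^T.
    row = sum(A_k @ A_k.T for A_k in A)
    col = sum(A_k.T @ A_k for A_k in A)
    v = max(np.linalg.norm(row, 2), np.linalg.norm(col, 2))
    t = bernstein_threshold(v, L, d1 + d2, delta)
    norms = np.empty(trials)
    for i in range(trials):
        eps = rng.choice([-1.0, 1.0], size=n)   # independent Rademacher signs
        Z = np.tensordot(eps, A, axes=1)        # Z = sum_k eps_k * A_k
        norms[i] = np.linalg.norm(Z, 2)         # spectral norm of the sum
    return t, norms

t, norms = simulate()
print("Bernstein threshold:", t)
print("largest simulated norm:", norms.max())
print("empirical exceedance rate:", np.mean(norms >= t))
```

In such experiments the empirical exceedance rate is typically far below `delta`, which reflects the conservatism of the dimensional factor $d_1+d_2$ in the bound rather than a flaw in the simulation.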

Plugging (F.20) and (F.22) into (F.16), we can bound $\|\widetilde{\mathcal{Q}}(\tau)\|_{2,2}^{2}$ with

[TABLE]

and finish the proof of Lemma B.1 by taking the square root of both sides of (F.23).

Bibliography

[1] E. J. Candès and C. Fernandez-Granda, “Towards a mathematical theory of super-resolution,” Communications on Pure and Applied Mathematics, vol. 67, no. 6, pp. 906–956, 2014.
[2] C. W. McCutchen, “Superresolution in microscopy and the Abbe resolution limit,” JOSA, vol. 57, no. 10, pp. 1190–1192, 1967.
[3] T. Harris, R. Grober, J. Trautman, and E. Betzig, “Super-resolution imaging spectroscopy,” Applied Spectroscopy, vol. 48, no. 1, pp. 14A–21A, 1994.
[4] Y. Xie, S. Li, G. Tang, and M. B. Wakin, “Radar signal demixing via convex optimization,” in 2017 22nd International Conference on Digital Signal Processing (DSP), pp. 1–5, IEEE, 2017.
[5] K. G. Puschmann and F. Kneer, “On super-resolution in astronomical imaging,” Astronomy & Astrophysics, vol. 436, no. 1, pp. 373–378, 2005.
[6] H. Greenspan, “Super-resolution in medical imaging,” The Computer Journal, vol. 52, no. 1, pp. 43–63, 2008.
[7] E. J. Candès and C. Fernandez-Granda, “Super-resolution from noisy data,” Journal of Fourier Analysis and Applications, vol. 19, no. 6, pp. 1229–1254, 2013.
[8] G. F. Margrave, M. P. Lamoureux, and D. C. Henley, “Gabor deconvolution: Estimating reflectivity by nonstationary deconvolution of seismic data,” Geophysics, vol. 76, no. 3, pp. W15–W30, 2011.
