Atomic Norm Denoising for Complex Exponentials with Unknown Waveform Modulations
Shuang Li, Michael B. Wakin, and Gongguo Tang

Department of Electrical Engineering, Colorado School of Mines. Email: {shuangli,mwakin,gtang}@mines.edu.
(February 15, 2019; revised September 4, 2019)
Abstract
Non-stationary blind super-resolution is an extension of the traditional super-resolution problem, which deals with recovering fine details from coarse measurements. The non-stationary blind super-resolution problem appears in many applications, including radar imaging, 3D single-molecule microscopy, and computational photography. There is growing interest in solving the non-stationary blind super-resolution task with convex methods because of their robustness to noise and strong theoretical guarantees. Motivated by recent work on atomic norm minimization in blind inverse problems, we focus here on the signal denoising problem in non-stationary blind super-resolution. In particular, we use an atomic norm regularized least-squares problem to denoise a sum of complex exponentials with unknown waveform modulations. We quantify how the mean square error depends on the noise variance and the true signal parameters. Numerical experiments illustrate the theoretical result.
Index terms— Atomic norm denoising, non-stationary blind deconvolution, line spectrum estimation, demodulation, mean square error.
1 Introduction
Super-resolution is the process of recovering high-resolution information about a signal from its coarse-scale measurements [1]. Super-resolution problems arise in a wide variety of applications, including microscopy [2], imaging spectroscopy [3], radar signal demixing [4], astronomical imaging [5], and medical imaging [6]. One specific super-resolution problem has received considerable theoretical study in recent years: recovering the positions of unknown point sources on a continuous interval given only low-frequency samples [1, 7]. Exchanging the roles of time and frequency, this problem is equivalent to line spectral estimation: recovering the continuous-valued frequencies of unknown complex exponentials given limited or possibly compressed time-domain samples. The total variation norm has been used to regularize such super-resolution problems, and it is equivalent to the atomic norm that has been used to regularize line spectral estimation problems.
In this work, we are interested in the non-stationary blind super-resolution scenario, which extends the above super-resolution problem to the setting where each point source is convolved with its own unknown point spread function. The term “non-stationary”, which comes from the field of non-stationary deconvolution [8], indicates that the point spread functions may differ from one source to another; the term “blind” indicates that the point spread functions are unknown. Non-stationary blind super-resolution problems also appear in applications involving radar imaging [9], astronomy [10], photography [11], 3D single-molecule microscopy [12], seismology [8], and nuclear magnetic resonance (NMR) spectroscopy [13, 14, 15]. Exchanging the roles of time and frequency, the conventional line spectral estimation problem is modified as follows: each complex exponential is modulated (pointwise multiplied) by an unknown waveform, and this waveform can vary from one complex exponential to the next. Although the two formulations are equivalent, it is this modified line spectral estimation problem that we detail in Section 2 and refer to throughout this paper.
In recent years, convex methods have been widely used in super-resolution because of their robustness to noise and strong theoretical guarantees. Among them, methods based on atomic norm minimization are particularly popular. In [16], the authors propose an atomic norm minimization scheme to super-resolve the unknown frequencies of a signal from its random time samples. Yang et al. [17] extend this frequency super-resolution to the case where multiple measurement vectors are available. Our earlier work [18] applies atomic norm minimization to modal analysis, super-resolving the unknown modal parameters of a vibrating system from its random and compressed measurements. Meanwhile, the robustness of atomic norm minimization to noisy data [7] has also been widely studied in the past few years. The authors of [19] apply atomic norm denoising to line spectral estimation, one of the fundamental problems in statistical signal processing. The same authors [20] also propose a nearly optimal algorithm, called atomic norm soft thresholding, to denoise a mixture of complex sinusoids. In [21], the authors use atomic norm denoising to analyze super-resolution line spectral estimation in white noise and provide theoretical guarantees for support recovery. In addition, atomic norm denoising is studied in [22] for the multiple measurement vector case.
Our work is most closely related to [20] and [23]. The authors of [20] focus on a mixture of complex exponentials, while we work on a superposition of complex exponentials with unknown waveform modulations. As Section 2 shows, our problem reduces to the one studied in [20] when the subspace dimension is one and the subspace matrix is chosen as the all-ones vector. Both [20] and our work establish theoretical guarantees for the mean square error (MSE) with respect to the noise level and the true signal parameters. However, since we deal with a more general scenario, our theory also depends on properties of the subspace matrix. In addition, unlike [20], we provide an explicit success probability that increases with the number of signal samples. In [23], the authors study non-stationary blind super-resolution in a noiseless setting, namely, recovering the parameters of a sum of complex exponentials modulated by unknown waveforms. In contrast, we consider the more practical scenario in which the observed data are contaminated with noise, which calls for a different algorithm. In [23], the unknown parameters can be recovered exactly with high probability when enough samples are available. Exact recovery is no longer possible here, since we only have access to noisy observations. Thus, the goal of this work is to characterize the MSE as a function of the noise variance and the true signal parameters.
The main contribution of this work is to quantify how the MSE depends on the noise level and the true signal parameters. Namely, we bound the MSE in terms of the noise variance, the total number of uniform samples, the number of true frequencies, and the dimension of the subspace in which the unknown waveform modulations live. More precisely, we show both theoretically and numerically that 1) the MSE scales linearly with the noise variance and the subspace dimension, and 2) the MSE is inversely proportional to the total number of uniform samples (up to a logarithmic factor). Theoretically, the MSE scales at worst with the square of the number of true frequencies, whereas numerical experiments show that it scales linearly with the number of frequencies. We leave improving our theoretical bound to match the numerical experiments for future work.
The remainder of this paper is organized as follows. In Section 2, we set up our problem, propose an atomic norm denoising program, and present its semidefinite program (SDP) formulation. In Section 3, we present the main theorem, which provides the theoretical guarantee for the atomic norm denoising program. In Section 4, we illustrate the theoretical guarantee with several numerical simulations. The proof of the main theorem is presented in Section 5. Finally, we conclude this work and discuss future directions in Section 6. The Appendix provides supplementary theoretical results.
2 Problem Formulation
In this work, we consider the following signal
[TABLE]
which can be interpreted as uniform samples of a continuous-time superposition of complex exponentials with unknown amplitudes and frequencies, each modulated by a different waveform. Requiring the sampling indices to be centered around zero is just for technical convenience. Denote by $N = 4M + 1$ the number of samples, so that the sampling index runs from $-2M$ to $2M$. All the conclusions in this work remain true, with appropriate modifications, for any $N$ consecutive samples. Without loss of generality, we assume that the unknown coefficients and the unknown frequencies are normalized; in particular, the frequencies lie in $[0, 1)$. The modulating sequences are the unknown waveforms, and the number of terms in the sum is the number of active frequencies in the signal.
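To make the indexing concrete, the following minimal sketch (Python, standard library only) evaluates the noiseless samples for the centered indices $j = -2M, \ldots, 2M$, so that $N = 4M + 1$. It assumes the sign convention $x(j) = \sum_k c_k e^{-i2\pi j\tau_k} g_k(j)$ of [23]; the function name and the example frequencies, coefficients, and waveforms are hypothetical.

```python
import cmath
import math


def sample_signal(M, taus, coeffs, waveforms):
    """Noiseless samples x(j), j = -2M, ..., 2M, returned in that order.

    Assumed model (sign convention as in [23]):
        x(j) = sum_k c_k * exp(-i 2 pi j tau_k) * g_k(j),
    where waveforms[k] lists g_k(-2M), ..., g_k(2M).
    """
    N = 4 * M + 1
    if any(len(g) != N for g in waveforms):
        raise ValueError("each waveform must have N = 4M + 1 samples")
    x = []
    for idx in range(N):
        j = idx - 2 * M  # centered sampling index
        x.append(sum(c * cmath.exp(complex(0.0, -2 * math.pi * j * tau)) * g[idx]
                     for c, tau, g in zip(coeffs, taus, waveforms)))
    return x


# Hypothetical example: two frequencies, each with its own modulating waveform.
M = 2
N = 4 * M + 1
x = sample_signal(M, taus=[0.1, 0.6], coeffs=[1.0, 0.5j],
                  waveforms=[[1.0] * N, [(-1.0) ** n for n in range(N)]])
print(len(x))  # 9
```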
The signal model (2.1) appears in a wide range of applications. For example, the authors of [9] consider a radar imaging problem of identifying the relative distances and velocities of targets from a received signal that can be viewed as a sum of finitely many delayed and Doppler-shifted copies of a given transmitted signal. The signal model (2.1) is obtained by sampling the received signal and defining each waveform to be the samples of the corresponding delayed copy of the transmitted signal. In addition, the NMR spectroscopy signal modeled in [14] has a similar structure, and sampling it again yields the signal model (2.1). Finally, the received signal in multi-user communication systems [24] can be modeled as a sum of delayed copies of unknown transmitted signals, each scaled by an unknown coefficient. In this case, the signal model (2.1) is obtained by sampling the Fourier transform of the received signal: each time delay becomes a modulation by a complex exponential in the frequency domain, and the Fourier transforms of the transmitted signals play the role of the unknown waveforms.
Since noise is ubiquitous in practice, we may only have access to noisy observations of the signal, namely,
[TABLE]
where the observation noise has i.i.d. complex Gaussian entries with zero mean and a common variance. To recover the unknown frequencies, coefficients, and modulation waveforms from the noisy observations, note that the number of degrees of freedom (on the order of the number of frequencies times $N$, since every waveform has $N$ samples) is much larger than the number of observations ($N$). Additional assumptions are therefore needed to make the inverse problem well posed. We assume that all the unknown waveforms belong to a common and known low-dimensional subspace, spanned by the columns of a matrix $B \in \mathbb{C}^{N \times L}$ with $L \ll N$. Let $b_j$ denote the $j$-th column of $B^H$, i.e.,
[TABLE]
Then, each waveform can be written as $g_k = B h_k$ for some unknown coefficient vector $h_k \in \mathbb{C}^L$. Without loss of generality, we also assume that each $h_k$ has unit norm, i.e., $\|h_k\|_2 = 1$, because the corresponding coefficient can be rescaled as needed. Note that $g_k$ can be estimated once $h_k$ is recovered. Therefore, the number of degrees of freedom drops to the order of $KL$ (with $K$ the number of active frequencies), which can be smaller than $N$ when we have enough measurements, that is, when $N$ is large enough.
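The subspace assumption can be sketched as follows. Mirroring the setup of Section 4, this assumes a Rademacher subspace matrix $B \in \mathbb{C}^{N \times L}$ and complex Gaussian coefficient vectors normalized to unit $\ell_2$ norm (using complex rather than real Gaussians is our assumption); the helper names are hypothetical. It also makes the parameter count explicit: $K$ unconstrained waveforms would need $KN$ samples, whereas the subspace model needs only $KL$ coefficients.

```python
import math
import random


def rademacher_matrix(N, L, rng):
    """N x L subspace matrix B with i.i.d. +/-1 entries; row j of B is b_j^H."""
    return [[rng.choice((-1.0, 1.0)) for _ in range(L)] for _ in range(N)]


def unit_gaussian_vector(L, rng):
    """Complex Gaussian vector of length L, rescaled to unit l2 norm."""
    h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(L)]
    norm = math.sqrt(sum(abs(v) ** 2 for v in h))
    return [v / norm for v in h]


def waveform(B, h):
    """g = B h, so every waveform lies in the column span of B."""
    return [sum(row[l] * h[l] for l in range(len(h))) for row in B]


rng = random.Random(0)
N, L, K = 41, 3, 2
B = rademacher_matrix(N, L, rng)
hs = [unit_gaussian_vector(L, rng) for _ in range(K)]
gs = [waveform(B, h) for h in hs]
# K*N free waveform samples are replaced by K*L subspace coefficients.
print(len(gs), len(gs[0]), K * N, K * L)  # 2 41 82 6
```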
As is illustrated in [23], the assumption that the unknown waveforms belong to a common and known low-dimensional subspace appears in many real applications such as super-resolution imaging and multi-user communication systems. For example, the point spread functions in super-resolution imaging can be modeled as Gaussian kernels with unknown widths [12, 25]. One can construct a dictionary of Gaussian functions having different widths and apply principal component analysis to this dictionary to discover a low-dimensional subspace that accurately represents the unknown point spread functions. This is also demonstrated in the simulations of [23].
For any frequency $\tau \in [0, 1)$, define the vector $a(\tau) \in \mathbb{C}^N$ as
[TABLE]
Then, under the subspace assumption $g_k = B h_k$, we can rewrite each sample of the signal in (2.1) as
[TABLE]
where $e_j$ is the $j$-th column of the identity matrix $I_N$ and $\operatorname{trace}(\cdot)$ denotes the trace of a matrix. (Footnote 1: Throughout this work, $I$ with a subscript denotes an identity matrix of the indicated size.) The inner product between two matrices $X$ and $Y$ is defined as
[TABLE]
Define a linear operator and its corresponding adjoint operator as
[TABLE]
where denotes the -th entry of . Define
[TABLE]
Then, we have
[TABLE]
and the noisy observation vector can be rewritten as
[TABLE]
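The lifting can be checked numerically. This sketch assumes the forms used in [23]: $a(\tau)$ has entries $e^{i2\pi j\tau}$, the data matrix is $X_o = \sum_k c_k h_k a(\tau_k)^H \in \mathbb{C}^{L \times N}$, the operator acts as $[\mathcal{B}(X)]_j = b_j^H X e_j$, and its adjoint is $\mathcal{B}^*(y) = \sum_j y(j) b_j e_j^H$, with $\langle X, Y\rangle = \operatorname{trace}(Y^H X)$. Under these assumptions, $\mathcal{B}(X_o)$ reproduces the directly computed samples, and $\langle \mathcal{B}(X), y\rangle = \langle X, \mathcal{B}^*(y)\rangle$ holds for arbitrary $X$ and $y$. All function names are hypothetical.

```python
import cmath
import math
import random


def a_vec(tau, M):
    """a(tau) with entries exp(i 2 pi j tau), j = -2M, ..., 2M (assumed convention)."""
    return [cmath.exp(complex(0.0, 2 * math.pi * j * tau)) for j in range(-2 * M, 2 * M + 1)]


def data_matrix(cs, hs, taus, M):
    """X_o = sum_k c_k h_k a(tau_k)^H as an L x N list of rows."""
    L, N = len(hs[0]), 4 * M + 1
    X = [[0j] * N for _ in range(L)]
    for c, h, tau in zip(cs, hs, taus):
        a = a_vec(tau, M)
        for l in range(L):
            for n in range(N):
                X[l][n] += c * h[l] * a[n].conjugate()
    return X


def op(X, bs):
    """[B(X)]_j = b_j^H X e_j: inner product of b_j with column j of X."""
    return [sum(b[l].conjugate() * X[l][n] for l in range(len(b))) for n, b in enumerate(bs)]


def adjoint(y, bs):
    """B^*(y) = sum_j y(j) b_j e_j^H: column j of the L x N result is y(j) b_j."""
    L, N = len(bs[0]), len(bs)
    return [[y[n] * bs[n][l] for n in range(N)] for l in range(L)]


def inner(X, Y):
    """<X, Y> = trace(Y^H X) = sum over entries of X * conj(Y)."""
    return sum(u * v.conjugate() for rx, ry in zip(X, Y) for u, v in zip(rx, ry))


rng = random.Random(1)


def cgauss():
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))


M, L = 2, 3
N = 4 * M + 1
bs = [[cgauss() for _ in range(L)] for _ in range(N)]  # bs[n] is b_j with j = n - 2M
hs = [[cgauss() for _ in range(L)] for _ in range(2)]
cs, taus = [1.0, 0.7j], [0.15, 0.55]

# Direct samples: x(j) = sum_k c_k exp(-i 2 pi j tau_k) g_k(j) with g_k(j) = b_j^H h_k.
direct = [sum(c * cmath.exp(complex(0.0, -2 * math.pi * (n - 2 * M) * t))
              * sum(bs[n][l].conjugate() * h[l] for l in range(L))
              for c, h, t in zip(cs, hs, taus)) for n in range(N)]
lifted = op(data_matrix(cs, hs, taus, M), bs)
print(max(abs(u - v) for u, v in zip(direct, lifted)) < 1e-10)  # True

# Adjoint identity <B(X), y> = <X, B^*(y)> for random X and y.
X = [[cgauss() for _ in range(N)] for _ in range(L)]
y = [cgauss() for _ in range(N)]
lhs = sum(u * v.conjugate() for u, v in zip(op(X, bs), y))
print(abs(lhs - inner(X, adjoint(y, bs))) < 1e-10)  # True
```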
When the number of active frequencies is small, the noiseless data matrix defined in (2.5) can be viewed as a sparse combination of elements from the following atomic set
[TABLE]
with defined in (2.2). The associated atomic norm is then defined as
[TABLE]
where $\operatorname{conv}(\mathcal{A})$ denotes the convex hull of the atomic set $\mathcal{A}$.
Let $\lambda > 0$ denote a suitably chosen regularization parameter. (Footnote 2: See Section 5.1 for guidelines on choosing $\lambda$.) Inspired by the sparse representation of the data matrix with respect to the atomic set, we perform denoising by solving the atomic norm regularized least-squares problem
[TABLE]
which is equivalent to the following semidefinite program (SDP) [17, 22] (Footnote 3: In practice, one can use the CVX software package [26] to solve this SDP.):
[TABLE]
Here, $\operatorname{Toep}(u)$ denotes the Hermitian Toeplitz matrix whose first column is $u$.
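For readers assembling the SDP by hand, the Toeplitz map is easy to state. A minimal sketch, assuming $u$ supplies the first column and the first row is its complex conjugate (the choice that makes the matrix Hermitian); the function name is hypothetical.

```python
def hermitian_toeplitz(u):
    """Hermitian Toeplitz matrix with first column u and first row conj(u).

    Entry (r, s) is u[r - s] on and below the diagonal and conj(u[s - r]) above it;
    u[0] must be real for the result to be exactly Hermitian.
    """
    n = len(u)
    return [[u[r - s] if r >= s else u[s - r].conjugate() for s in range(n)]
            for r in range(n)]


u = [2.0, 1 + 1j, 0.5j]
T = hermitian_toeplitz(u)
for row in T:
    print(row)
```

Libraries differ on whether a single-vector Toeplitz constructor treats its argument as the first row or the first column, so it is worth checking that convention before substituting a library routine for this helper.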
In this work, our goal is to analyze the performance of the above denoising scheme by bounding the mean-squared error (MSE) between the solution and the true signal .
3 Theoretical Guarantee for Atomic Norm Denoising
Motivated by [27, 28, 23], we assume that the columns of $B^H$, i.e., the vectors $b_j$, are independent and identically distributed (i.i.d.) samples from a distribution $F$ that satisfies the isotropy and incoherence properties with coherence parameter $\mu$.
- Isotropy property: A distribution $F$ satisfies the isotropy property if
  [TABLE]
- Incoherence property: A distribution $F$ satisfies the incoherence property with coherence $\mu$ if
  [TABLE]
  holds almost surely. Here, $b(l)$ denotes the $l$-th entry of $b$.

(Footnote 4: This definition of the isotropy property is slightly different from the one used in [27, 28, 23]. As an example of a distribution that obeys the isotropy and incoherence properties (3.1) and (3.2) with $\mu = 1$, one can draw the entries of $b$ from the Rademacher distribution.)
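The Rademacher example in the footnote can be checked by simulation. This sketch assumes the standard forms of the two properties from [27], namely $\mathbb{E}[bb^H] = I_L$ for isotropy and $\max_l |b(l)|^2 \le \mu$ for incoherence; the additional condition that our isotropy property imposes (see the discussion after Theorem 3.1) is not checked. The sample size and seed are arbitrary.

```python
import random


def empirical_second_moment(samples):
    """Average of b b^T over real sample vectors b (an estimate of E[b b^H])."""
    L = len(samples[0])
    S = [[0.0] * L for _ in range(L)]
    for b in samples:
        for r in range(L):
            for s in range(L):
                S[r][s] += b[r] * b[s]
    return [[v / len(samples) for v in row] for row in S]


def coherence(samples):
    """Smallest mu with max_l |b(l)|^2 <= mu for every observed sample."""
    return max(abs(v) ** 2 for b in samples for v in b)


rng = random.Random(0)
L, n_samples = 4, 20000
samples = [[rng.choice((-1.0, 1.0)) for _ in range(L)] for _ in range(n_samples)]
S = empirical_second_moment(samples)
max_dev = max(abs(S[r][s] - (1.0 if r == s else 0.0)) for r in range(L) for s in range(L))
print(coherence(samples))  # 1.0
print(max_dev < 0.05)      # True: the sample second moment is close to I_L
```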
Next, we present the main result that characterizes the MSE in the following theorem.
Theorem 3.1.
Suppose the noiseless signal is given as in (2.1) with the true frequencies satisfying a minimum separation condition
[TABLE]
where $d(\cdot, \cdot)$ denotes the wrap-around distance on the unit circle. Assume that we observe noisy measurements of the signal corrupted by i.i.d. complex Gaussian noise with mean 0 and variance $\sigma^2$. We also assume that the vectors $b_j$ are i.i.d. samples from a distribution that satisfies the isotropy (3.1) and incoherence (3.2) properties with coherence parameter $\mu$. Then, the estimate obtained by solving the atomic norm regularized least-squares problem (2.7), with the regularization parameter set as in Section 5 for some constant $\eta$, satisfies
[TABLE]
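with high probability (the explicit probability increases with the number of samples; see Section 5) if the constant $\eta$ is chosen sufficiently large and the number of samples satisfies the sample complexity condition discussed below. The constants appearing in (3.4) and in the probability bound are numerical constants. (Footnote 5: The numerical constants used in this paper may vary from line to line.)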
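The separation condition of Theorem 3.1 only needs the wrap-around distance. In the sketch below, the threshold is written as a constant divided by $M$; since the constant in (3.3) is not restated here, it is a parameter whose default of 1 is an assumption. Function names are hypothetical.

```python
def wrap_distance(a, b):
    """Wrap-around distance between two frequencies on the unit circle [0, 1)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def min_separation(taus):
    """Smallest pairwise wrap-around distance (infinite for a single frequency)."""
    gaps = [wrap_distance(taus[k], taus[l])
            for k in range(len(taus)) for l in range(k + 1, len(taus))]
    return min(gaps) if gaps else float("inf")


def is_well_separated(taus, M, factor=1.0):
    """Check min separation >= factor / M; factor = 1.0 is an assumed value."""
    return min_separation(taus) >= factor / M


print(is_well_separated([0.05, 0.95, 0.5], M=20))  # True (closest pair is 0.1 apart)
print(is_well_separated([0.10, 0.12], M=20))       # False (0.02 < 1/20)
```

Note that 0.05 and 0.95 are only 0.1 apart once the circle wraps around, which is why the wrap-around distance rather than the plain absolute difference is used.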
Our use of the isotropy (3.1) and incoherence (3.2) properties parallels the assumptions made for subspace models in several related works. These properties were first defined in [27] for the development of a probabilistic and RIPless theory of compressed sensing, and then used in [28] for the problem of blind spike deconvolution. Other random subspace assumptions are also used in [29, 30] for random channel coding and blind deconvolution. The transmitted signal in multi-user communication systems [24] can also be represented in a known low-dimensional random subspace when each transmitter sends out a random signal for the sake of security, privacy, or spread spectrum communications. (Footnote 6: Random signals also appear in applications such as noise radar [31]. The transmitted random signals can be either directly generated from a noise-generating microwave source or obtained by modulating a sine wave with random noise.) Note that the random subspace matrix can alternatively be viewed as a sensing mechanism that produces random measurements of the data matrix via (2.3). As stated in [27] and many other compressive sensing works, random measurements are crucial in the development of theoretical results and can lead to better empirical results. Nevertheless, as evidenced by the numerical experiments in [23], the randomness assumption on the subspace matrix does not appear to be critical in practice. Finally, observe that we use not only the same isotropy and incoherence properties of the random subspace model as [28, 23], but also an additional condition on an arbitrary column of $B^H$, which is included in (3.1). This part of the isotropy property is only an artifact of our proof technique.
Note that an oracle MSE rate of order $\sigma^2 KL/N$ (with $\sigma^2$ the noise variance, $K$ the number of frequencies, and $L$ the subspace dimension) could be achieved if one had enough measurements (i.e., $N \ge KL$) and perfect knowledge of the well-separated frequencies. To be more precise, the noiseless signal in (2.1) can be written as $x = \Phi\theta$, where $\Phi \in \mathbb{C}^{N \times KL}$ is determined by the subspace matrix and the vectors $a(\tau_k)$, and $\theta \in \mathbb{C}^{KL}$ is determined by the coefficients and the vectors $h_k$. If $N \ge KL$ and the well-separated frequencies are known, then $\Phi$ is known and, owing to the randomness of the subspace matrix, is a tall matrix with full column rank. Recovering $x$ from its noisy observation is then a least-squares problem, and elementary calculations show that the resulting MSE equals $\sigma^2 KL/N$. Therefore, our MSE bound (3.4) is optimal up to an extra factor of $K$ and some logarithmic terms. This extra factor might be removed by a proof strategy other than the dual analysis of the atomic norm; we leave this improvement for future work.
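The oracle rate can be reproduced with a small Monte Carlo experiment. This sketch assumes $\mathrm{MSE} = \frac{1}{N}\mathbb{E}\|\hat{x} - x\|_2^2$, circularly symmetric complex Gaussian noise with $\mathbb{E}|w(j)|^2 = \sigma^2$, and a Rademacher subspace matrix. It builds $\Phi$ with one column per pair $(k, l)$, namely $j \mapsto e^{-i2\pi j\tau_k}\,\overline{b_j(l)}$ (which equals $b_j(l)$ for real Rademacher entries), and uses the fact that the least-squares error is the projection of the noise onto the range of $\Phi$, whose expected energy is $\sigma^2 KL$. All names and parameter values are illustrative.

```python
import cmath
import math
import random


def solve(A, rhs):
    """Solve A z = rhs (complex) by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0j] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z


def oracle_mse(M, L, taus, sigma, trials, rng):
    """Monte Carlo MSE (1/N) E||x_hat - x||^2 of least squares with known frequencies."""
    N = 4 * M + 1
    bs = [[rng.choice((-1.0, 1.0)) for _ in range(L)] for _ in range(N)]  # Rademacher b_j
    # Column (k, l) of Phi is j -> exp(-i 2 pi j tau_k) * b_j(l), so x = Phi theta.
    Phi = [[cmath.exp(complex(0.0, -2 * math.pi * (n - 2 * M) * t)) * bs[n][l]
            for t in taus for l in range(L)] for n in range(N)]
    p = len(taus) * L
    gram = [[sum(Phi[n][r].conjugate() * Phi[n][c] for n in range(N)) for c in range(p)]
            for r in range(p)]
    std = sigma / math.sqrt(2)  # per real/imaginary part, so E|w(j)|^2 = sigma^2
    total = 0.0
    for _ in range(trials):
        # x_hat - x = P w, with P the projection onto range(Phi), so x itself is not needed.
        w = [complex(rng.gauss(0, std), rng.gauss(0, std)) for _ in range(N)]
        z = solve(gram, [sum(Phi[n][r].conjugate() * w[n] for n in range(N)) for r in range(p)])
        err = [sum(Phi[n][r] * z[r] for r in range(p)) for n in range(N)]
        total += sum(abs(e) ** 2 for e in err) / N
    return total / trials


rng = random.Random(0)
M, L, taus, sigma = 10, 2, [0.1, 0.5], 1.0
est = oracle_mse(M, L, taus, sigma, trials=400, rng=rng)
target = sigma ** 2 * len(taus) * L / (4 * M + 1)  # sigma^2 K L / N = 4/41
print(round(est, 3), round(target, 3))  # the empirical value should be close to the target
```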
Finally, note that we have removed the randomness assumption on the coefficient vectors $h_k$ that is required in [23]. The price is a sample complexity bound with an extra factor compared with [23]. We also note that the author of [28] provides a sample complexity bound without any randomness assumptions on these vectors. It is worth noting that those papers do not focus on signal denoising as we do in this work. The extra factor in our sample complexity bound results from removing the randomness assumption that is used in Lemmas 11 and 13 of [23].
4 Numerical Simulations
In this section, we conduct four numerical experiments to support the theoretical bound in Theorem 3.1. In all of these experiments, we perform denoising by solving the SDP formulation of the atomic norm regularized least-squares problem (2.7) with CVX. As suggested by Theorem 3.1, we set the regularization parameter as in Lemma 5.1, with a specific choice of the multiplicative constant. (Footnote 7: Theorem 3.1 requires this constant to be sufficiently large. However, we find that a smaller constant achieves much lower MSE in practice, so we use the smaller value in all of the following experiments.) We generate the entries of the subspace matrix as Rademacher random variables and the entries of each coefficient vector $h_k$ as Gaussian random variables, and then normalize every $h_k$ to unit $\ell_2$ norm. The observation noise has i.i.d. complex Gaussian entries. We perform 50 trials for each experiment.
In the first experiment, we examine the relationship between the MSE and the total number of uniform samples, with the remaining parameters held fixed. The true frequencies and their amplitudes are also fixed. We vary $M$ from 10 to 100, so that $N = 4M + 1$ varies from 41 to 401. Figure 1(a) shows the denoising performance of the atomic norm regularized least-squares problem (2.7), while Figure 1(b) indicates that the MSE scales as predicted by Theorem 3.1, i.e., ignoring the logarithmic term, the MSE decreases inversely with the number of uniform samples $N$.
In the second experiment, we characterize the impact of the noise variance on the MSE. We fix the number of samples and vary the noise variance; the other parameters are set as in the first experiment. Figure 2(a) shows that the MSE scales linearly with the noise variance, as predicted by Theorem 3.1.
To see the influence of the number of true frequencies on the MSE, we repeat the first experiment with a fixed number of samples and vary the number of frequencies $K$ from 1 to 10. We randomly select the true frequencies from a set determined by the minimum separation and then set the corresponding amplitudes. The other parameters are the same as in the first experiment. Figure 2(b) suggests that the MSE scales linearly with $K$, which is better than the quadratic scaling indicated by Theorem 3.1. We leave the improvement of Theorem 3.1 for future work.
Finally, in the last experiment we illustrate the relationship between the MSE and the subspace dimension $L$, varying $L$ from 1 to 10 while keeping the other parameters the same as in the first experiment. Figure 2(c) shows a roughly linear relationship between the MSE and $L$, which is consistent with the bound in Theorem 3.1.
5 Proof of Theorem 3.1
The proof of Theorem 3.1 is presented in two subsections. In Subsection 5.1, we discuss how to choose the regularization parameter $\lambda$; a good choice of the regularization parameter is known to yield accelerated convergence rates. (Footnote 8: The conditions under which an accelerated convergence rate can be obtained for an atomic norm denoising problem are discussed in [19].) In Subsection 5.2, with this choice of $\lambda$, we bound the MSE with respect to the noise level and the true signal parameters. We prove Theorem 3.1 by extending the proofs of [20, Theorem 1] and [18, Theorem III.6] to our framework; note that [18, Theorem III.6] is a multiple measurement vector (MMV) extension of [20, Theorem 1]. Because of the random linear operator $\mathcal{B}$ that appears in our atomic norm denoising problem (2.7), we develop a random extension of these two results with respect to $\mathcal{B}$. Although the proof idea is borrowed from these prior works, their proofs do not extend directly to our framework with the linear operator $\mathcal{B}$. For example, the operator already affects the choice of the regularization parameter.
Inspired by the regularization parameter used in prior works [19, 20, 18], we set
[TABLE]
in the atomic norm regularized least-squares problem (2.7). Here, $w$ is a complex Gaussian vector, and $\eta$ is a constant that must be large enough for the proof of Lemma 5.5 to go through. The norm $\|\cdot\|_{\mathcal{A}}^*$ is the dual norm of the atomic norm defined in (2.6), which is given as
[TABLE]
To set $\lambda$, we first compute an upper bound on the expected dual norm $\mathbb{E}\|\mathcal{B}^*(w)\|_{\mathcal{A}}^*$.
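In practice, the expectation defining $\lambda$ can also be estimated numerically. Because $\langle Y, h a(\tau)^H\rangle = h^H Y a(\tau)$ and $h$ ranges over unit vectors, the dual norm should reduce to $\|Y\|_{\mathcal{A}}^* = \sup_\tau \|Y a(\tau)\|_2$, so that $\|\mathcal{B}^*(w)\|_{\mathcal{A}}^* = \sup_\tau \|\sum_j w(j) e^{i2\pi j\tau} b_j\|_2$; this derivation assumes the atom and operator conventions of [23]. The sketch approximates the supremum on a uniform grid and the expectation by Monte Carlo with the subspace matrix held fixed. The grid size, trial count, noise level, and $\eta = 1.5$ are arbitrary illustration values, not the paper's settings.

```python
import cmath
import math
import random


def dual_norm_grid(w, bs, M, grid_size=512):
    """Grid approximation of sup_tau || sum_j w(j) exp(i 2 pi j tau) b_j ||_2.

    The grid maximum never exceeds the true supremum; Bernstein's inequality
    bounds the gap once grid_size is large relative to the degree 2M.
    """
    L = len(bs[0])
    best = 0.0
    for g in range(grid_size):
        tau = g / grid_size
        vec = [0j] * L
        for n, (wj, b) in enumerate(zip(w, bs)):
            phase = cmath.exp(complex(0.0, 2 * math.pi * (n - 2 * M) * tau))
            for l in range(L):
                vec[l] += wj * phase * b[l]
        best = max(best, math.sqrt(sum(abs(v) ** 2 for v in vec)))
    return best


def choose_lambda(M, L, sigma, eta, trials, rng, grid_size=512):
    """lambda = eta * Monte Carlo estimate of E||B^*(w)||_A^* for a fixed Rademacher B."""
    N = 4 * M + 1
    bs = [[rng.choice((-1.0, 1.0)) for _ in range(L)] for _ in range(N)]
    std = sigma / math.sqrt(2)  # per real/imaginary part, so E|w(j)|^2 = sigma^2
    total = 0.0
    for _ in range(trials):
        w = [complex(rng.gauss(0, std), rng.gauss(0, std)) for _ in range(N)]
        total += dual_norm_grid(w, bs, M, grid_size)
    return eta * total / trials


rng = random.Random(0)
lam = choose_lambda(M=5, L=2, sigma=0.1, eta=1.5, trials=20, rng=rng, grid_size=256)
print(lam)
```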
5.1 Bounding and
Lemma 5.1.
Let be a random vector with i.i.d. complex Gaussian entries from the distribution . Define a linear operator as in (2.4), i.e.,
[TABLE]
where $b_j$ is the $j$-th column of the matrix $B^H$, the Hermitian transpose of $B$; $w(j)$ is the $j$-th entry of $w$; and $e_j$ is the $j$-th column of the identity matrix $I_N$. Then, there exists a numerical constant such that
[TABLE]
By setting the regularization parameter as , we have
[TABLE]
with being some constant.
Proof.
The dual norm defined in (5.1) implies that
[TABLE]
where $w(j)$ is the $j$-th entry of $w$, and we have defined the (vector-valued) polynomial
[TABLE]
Note that we have
[TABLE]
for any . The first inequality follows from the mean value theorem while the last inequality follows from Bernstein’s inequality for polynomials [32].
Let take any of the values , which gives us
[TABLE]
Then, we upper bound with
[TABLE]
if . It follows that
[TABLE]
and
[TABLE]
Observe that, conditioned on ,
[TABLE]
where is a complex Gaussian random variable and defined as
[TABLE]
for . Note that the expectation and variance of are given as
[TABLE]
since . Therefore, conditioned on , the complex Gaussian random variable defined in (5.7) satisfies . Let and denote the real part and imaginary part of , i.e., . Then, we have
[TABLE]
which implies that
[TABLE]
is a chi-squared random variable with two degrees of freedom, since both the real part and the imaginary part follow the standard normal distribution. Using the properties of the chi-squared distribution, we have
[TABLE]
Choosing and , together with inequalities (5.5), (5.6), and (5.8), we finally obtain
[TABLE]
where is a numerical constant that belongs to the interval when is large. Note that the last equality follows from the fact that
[TABLE]
This completes the proof for inequality (5.2).
Next, we can set the regularization parameter as
[TABLE]
for some constant and continue to prove inequality (5.3). It follows from (5.4) that
[TABLE]
where is a set of complex Gaussian variables with mean 0 and variance since and is fixed. Then, we have
[TABLE]
for any [33]. As a consequence, we have
[TABLE]
where the first inequality follows by plugging in (5.9), , and . The second inequality comes from the union bound. By letting and , we finally obtain
[TABLE]
with some numerical constant . Here, the first inequality follows from (5.10).
∎
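The chi-squared step in the proof above admits a quick numerical check: with two degrees of freedom, $\mathbb{P}(\chi_2^2 \ge t) = e^{-t/2}$ exactly. The sketch simulates $X^2 + Y^2$ for independent standard normal $X$ and $Y$ (for instance, $X$ and $Y$ could be $\sqrt{2}$ times the real and imaginary parts of a standard complex Gaussian with $\mathbb{E}|g|^2 = 1$; this rescaling is an assumption about the normalization used above). It also shows the usual union-bound step: over $G$ grid points, $G e^{-t/2} \le \delta$ once $t \ge 2\log(G/\delta)$. Names and parameters are illustrative.

```python
import math
import random


def chi2_two_dof_tail(t, trials, rng):
    """Monte Carlo estimate of P(X^2 + Y^2 >= t) for independent X, Y ~ N(0, 1)."""
    hits = 0
    for _ in range(trials):
        x, y = rng.gauss(0, 1), rng.gauss(0, 1)
        hits += (x * x + y * y) >= t
    return hits / trials


def union_threshold(num_points, delta):
    """Smallest t with num_points * exp(-t / 2) <= delta (union bound over grid points)."""
    return 2 * math.log(num_points / delta)


rng = random.Random(0)
for t in (1.0, 2.0, 4.0):
    # The exact tail of a chi-squared variable with two degrees of freedom is exp(-t / 2).
    print(t, round(chi2_two_dof_tail(t, 50000, rng), 4), round(math.exp(-t / 2), 4))

t_star = union_threshold(num_points=1024, delta=0.01)
print(round(t_star, 2), 1024 * math.exp(-t_star / 2))  # second value equals delta up to rounding
```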
5.2 Bounding the MSE
Now, we can set the regularization parameter as for some constant such that
[TABLE]
holds with probability at least , as is shown in Lemma 5.1.
Standard arguments from convex analysis yield the following lemma, which provides optimality conditions for $\hat{X}$ to be the solution of the atomic norm regularized least-squares problem (2.7).
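Lemma 5.2.
(Optimality Conditions): $\hat{X}$ is the solution of the atomic norm regularized least-squares problem (2.7) if and only if
1. $\|\mathcal{B}^*(y - \mathcal{B}(\hat{X}))\|_{\mathcal{A}}^* \le \lambda$, and
2. $\operatorname{Re}\langle \mathcal{B}^*(y - \mathcal{B}(\hat{X})), \hat{X} \rangle = \lambda \|\hat{X}\|_{\mathcal{A}}$.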
Define a vector-valued representing measure for the true data matrix as
[TABLE]
with , that is, we have
[TABLE]
Similarly, we can define a representing measure for the recovered data matrix and write it as
[TABLE]
Then, a difference measure can be defined as
[TABLE]
which implies that we can represent the recovery error as
[TABLE]
Define the -th near region corresponding to and the far region as
[TABLE]
where denotes the wrap-around distance on the unit circle. Define
[TABLE]
It follows that and we can then bound as
[TABLE]
Here, we have defined a vector-valued error function .
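The near/far partition used above can be sketched on a grid. The radius below is written as $0.16/M$ purely for illustration, since the radius used in this proof is not restated here; it only needs to be less than half the minimum separation for the near regions to be disjoint. Function names are hypothetical.

```python
def wrap_distance(a, b):
    """Wrap-around distance on the unit circle [0, 1)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def partition(points, taus, radius):
    """Label each point with k if it lies in the near region of tau_k, else None (far).

    Near regions are disjoint whenever the minimum separation exceeds 2 * radius.
    """
    labels = {}
    for t in points:
        near = [k for k, tau in enumerate(taus) if wrap_distance(t, tau) <= radius]
        labels[t] = near[0] if near else None
    return labels


M = 10
taus = [0.2, 0.7]
grid = [g / 100 for g in range(100)]
labels = partition(grid, taus, radius=0.16 / M)  # radius 0.16/M is an illustrative choice
near_0 = sorted(t for t, k in labels.items() if k == 0)
far = [t for t, k in labels.items() if k is None]
print(near_0, len(far))  # [0.19, 0.2, 0.21] 94
```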
With a slight abuse of notation, we define
[TABLE]
By using the optimality conditions in Lemma 5.2 and the assumption that the bound condition in (5.11) holds, we have
[TABLE]
To bound the MSE , we need the following three key lemmas.
Lemma 5.3.
Observe that each entry of is an order- trigonometric polynomial. We have
[TABLE]
with
[TABLE]
The proof of Lemma 5.3 is given in Appendix A.
Lemma 5.4.
For some numerical constants and , we have that
[TABLE]
hold with probability at least when provided with . Here, is defined as the norm.
The proof of Lemma 5.4 is given in Appendix B.
Lemma 5.5.
There exists a numerical constant such that
[TABLE]
holds with probability at least for some sufficiently large .
The proof of Lemma 5.5 is given in Appendix C.
As a consequence of the above three lemmas, we have
[TABLE]
Note that
[TABLE]
where the fourth equality follows from Parseval’s theorem and is the th entry of . It follows that
[TABLE]
Finally, plugging (5.17) into (5.16), we have that
[TABLE]
holds with probability at least when provided with . Here, the last two inequalities follow from the incoherence property (3.2) and by setting .
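Parseval's theorem for trigonometric polynomials, $\int_0^1 |\sum_j c_j e^{i2\pi j\tau}|^2\, d\tau = \sum_j |c_j|^2$, also holds exactly for the average over any uniform grid with more than $4M$ points when the indices run over $-2M, \ldots, 2M$, because every cross term $e^{i2\pi(j - j')\tau}$ with $0 < |j - j'| \le 4M$ then sums to zero over the grid. The sketch below illustrates this discrete counterpart, which is convenient for numerical checks; it is not part of the proof, and the names are hypothetical.

```python
import cmath
import math
import random


def grid_energy(coeffs, M, grid_size):
    """Mean of |p(tau)|^2 over tau = g / grid_size for p with indices -2M, ..., 2M."""
    total = 0.0
    for g in range(grid_size):
        tau = g / grid_size
        p = sum(c * cmath.exp(complex(0.0, 2 * math.pi * (idx - 2 * M) * tau))
                for idx, c in enumerate(coeffs))
        total += abs(p) ** 2
    return total / grid_size


rng = random.Random(0)
M = 3
coeffs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4 * M + 1)]
rhs = sum(abs(c) ** 2 for c in coeffs)                # sum of |c_j|^2
lhs = grid_energy(coeffs, M, grid_size=4 * M + 1)     # smallest grid that is exact
print(abs(lhs - rhs) < 1e-9)  # True
```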
Next, we explain why our sample complexity condition differs from the lower bound provided in [23], which considers the noiseless counterpart of this work. In particular, [23] requires the number of samples to satisfy
[TABLE]
if all the vectors $h_k$ are i.i.d. symmetric random samples from the complex unit sphere. Since our proof never uses this randomness assumption, we drop it by slightly modifying the proof in [23]. The authors of [23] use the randomness assumption only in Lemmas 11 and 13. Therefore, we only need to bound the corresponding quantities in Lemmas 11 and 13 without the randomness assumption.
In this part, we use the same notation as [23]; readers can refer to [23] for detailed definitions of all variables. Inspired by the proof of [28, Lemma 5], we have
[TABLE]
which is conditioned on . Here, and are two events defined in [23]. The last inequality follows from on the event and with . Then, we can obtain by setting .
Getting rid of the conditional probability, we have
[TABLE]
It is shown in paper [23] that the first term and the second term when provided
[TABLE]
and
[TABLE]
respectively. Thus, for some constant , we have
[TABLE]
provided
[TABLE]
Note that we set and absorb all of the constants into one here.
Similarly, conditioned on , we have
[TABLE]
for some numerical constant . Then, we can obtain by setting .
Getting rid of the conditional probability, we have
[TABLE]
when provided
[TABLE]
Now, we have dropped the randomness assumption on that is used in Lemmas 11 and 13 of paper [23]. We can follow the remaining proof of [23] and finally get
[TABLE]
Define as the true frequency set. The above bound on can guarantee that the norm of the dual polynomial constructed in [23] is strictly less than 1 when , which is used in the proof of Lemma 5.4.
6 Conclusion
In this work, we recover a signal that consists of a superposition of complex exponentials with unknown waveform modulations from its noisy measurements by solving an atomic norm regularized least-squares problem. We analyze the mean square error (MSE) and bound it in terms of the noise variance, the total number of uniform samples, the number of true frequencies, and the dimension of the subspace in which the unknown waveform modulations live. We also conduct several numerical experiments to support the theory. One of the experiments indicates that there is room to improve the MSE bound so that it scales linearly with the number of true frequencies. We leave this for future work.
Acknowledgement
The authors would like to thank Jonathan Helland at the Colorado School of Mines for helpful discussions on atomic norm denoising. The authors would also like to thank the anonymous reviewers for their constructive comments and suggestions, which greatly improved the quality of this paper. This work was supported by NSF grants CCF-1409258, CCF-1464205, and CCF-1704204.
Appendix A Proof of Lemma 5.3
Let be any vector with . Define a trigonometric polynomial
[TABLE]
with degree . Then, we have the following two inequalities
[TABLE]
which follow from Bernstein's inequality for polynomials [32]. As a consequence, we have
[TABLE]
Therefore, we obtain an upper bound on :
[TABLE]
With a similar argument, we also have
[TABLE]
The Taylor expansion of at is
[TABLE]
with some . Now, by using the inequality (A.1), we obtain
[TABLE]
Defining a function as
[TABLE]
we note that
[TABLE]
Then, we have
[TABLE]
Now, we can bound the second term in (5.12) as follows
[TABLE]
where are defined in Lemma 5.3. Here, we have plugged in to get the first equality. On the other hand, the first term in (5.12) can be bounded as
[TABLE]
by using the Cauchy-Schwarz inequality.
Finally, the square error can be upper bounded as
[TABLE]
and we finish the proof of Lemma 5.3.
Appendix B Proof of Lemma 5.4
To prove Lemma 5.4, we need the following two theorems, which are multiple measurement vector (MMV) random extensions of [20, Theorems 4, 5] and are proved in Appendices D and E.
Theorem B.1.
Define a dimensional unit ball . For any satisfying the minimum separation condition (3.3), there exists a dual certificate such that the corresponding vector-valued trigonometric polynomial satisfies the following properties for some provided that .
1. For each , with .
2. In each near region , there exist constants and such that
[TABLE]
3. In the far region , there exists a constant such that
[TABLE]
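Both theorems hinge on the minimum separation condition (3.3). Because the frequencies live on the unit torus [0, 1), separation is measured with wrap-around distance. The helper below computes Delta = min_{j != k} |tau_j - tau_k| with that wrap-around. Comparing Delta against a threshold of the form c / M is an assumption about the shape of (3.3), whose exact constant is not reproduced here; `is_separated` and its parameter `c` are hypothetical.

```python
def min_separation(freqs):
    """Minimum pairwise wrap-around distance on [0, 1); inf for fewer than two frequencies."""
    if len(freqs) < 2:
        return float("inf")
    s = sorted(f % 1.0 for f in freqs)
    gaps = [b - a for a, b in zip(s, s[1:])]  # neighbours in sorted order
    gaps.append(1.0 - s[-1] + s[0])  # gap across the wrap-around point
    return min(gaps)


def is_separated(freqs, M, c=1.0):
    """Hypothetical check of a separation condition of the form Delta >= c / M."""
    return min_separation(freqs) >= c / M


print(min_separation([0.1, 0.9, 0.5]))  # ~0.2: 0.1 and 0.9 are close across the wrap
```

Sorting works because, on a circle, the closest pair of points is always a pair of neighbours once the wrap-around gap is included.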
Theorem B.2.
Define a dimensional unit ball . For any satisfying the minimum separation condition (3.3), there exists a vector-valued trigonometric polynomial that satisfies the following properties for some provided that .
1. In each near region , there exists a constant such that
[TABLE]
2. In the far region , there exists a constant such that
[TABLE]
Next, we define a dual certificate as follows:
Definition B.1.
(Dual Certificate): Define a vector as a dual certificate for if makes the corresponding trigonometric polynomial
[TABLE]
satisfy
[TABLE]
where is defined as a set containing all the true frequencies.
Note that
[TABLE]
where the last inequality follows from and
[TABLE]
by using inequality (B.2) and the fact that belongs to .
Recall that the linear operator and its adjoint operator are defined as in (2.3) and (2.4). Then, we have and given as
[TABLE]
To get (5.13), we still need to bound the first term in (B.8). In particular, we have
[TABLE]
Define a new polynomial and recall that . With Parseval’s theorem, we obtain
[TABLE]
where the last inequality follows from the Cauchy-Schwarz inequality. Here, we define the -norm of and as
[TABLE]
It follows that
[TABLE]
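Parseval's theorem converts the integral of |p(f)|^2 over [0, 1) into a sum of squared coefficients. For a trigonometric polynomial of degree n, the integral is reproduced exactly by an average over any uniform grid with L > 2n points, because the cross terms exp(i 2 pi (k - k') f) with 0 < |k - k'| <= 2n average to zero. The sketch below checks this identity for a generic polynomial with random coefficients; it is an illustration of the identity only, not of the specific polynomial defined above, and the function names are hypothetical.

```python
import cmath
import math
import random


def energy_from_samples(coeffs, n, L):
    """(1/L) * sum_l |p(l/L)|^2 for p(f) = sum_{k=-n}^{n} a_k exp(i 2 pi k f)."""
    assert L > 2 * n, "need L > 2n grid points for exactness"
    total = 0.0
    for l in range(L):
        f = l / L
        p = sum(a * cmath.exp(2j * math.pi * k * f)
                for k, a in zip(range(-n, n + 1), coeffs))
        total += abs(p) ** 2
    return total / L


def energy_from_coeffs(coeffs):
    """Right-hand side of Parseval's identity: sum_k |a_k|^2."""
    return sum(abs(a) ** 2 for a in coeffs)


rng = random.Random(2)
n = 5
coeffs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2 * n + 1)]
print(energy_from_samples(coeffs, n, L=2 * n + 1), energy_from_coeffs(coeffs))  # equal
```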
The following lemma gives an upper bound for and is proved in Appendix F.
Lemma B.1.
Define two events
[TABLE]
with and . and are two block matrices defined in Appendix F. Conditioned on the above two events and , the norm of can be bounded as
[TABLE]
for some numerical constant provided that .
By plugging (B.9) and (B.10) into (B.8), one can bound as
[TABLE]
conditioned on the two events and and provided that .
Similar to (B.8), we can divide into the following three parts
[TABLE]
where the last inequality follows from (B.4) and (B.5). Then, we are left with bounding the first term in (B.12).
Using a trick similar to that used for , we have
[TABLE]
The vector-valued polynomial shares the same form as in (F.11), namely,
[TABLE]
with coefficient vectors and satisfying
[TABLE]
which can be verified with a trick similar to that used in (F.15).
Similar to Lemma B.1, we can bound as
[TABLE]
with probability at least . Finally, one can bound as
[TABLE]
conditioned on the two events and and provided that . Then, we finish the proof of Lemma 5.4.
Appendix C Proof of Lemma 5.5
Define as the projection of on the true frequency set . Set as the dual polynomial in Theorem B.1. Denote as an extension of the traditional TV norm, i.e.,
[TABLE]
where is defined as the complement set of on . Note that the integration over the far region can be bounded with
[TABLE]
by using (B.3). On the other hand, we can bound the integration over with
[TABLE]
where the second inequality follows from (B.1). Hence, can be bounded with
[TABLE]
by plugging (C.2) and (C.3) into (C.1). It follows that
[TABLE]
As in Lemma 5.2, denote as the solution of the atomic norm regularized least-squares problem (2.7). Then, we have
[TABLE]
By some elementary calculations, we can obtain
[TABLE]
where the first equality follows from . Then, we have
[TABLE]
which immediately results in
[TABLE]
due to and .
Recall that in Lemma 5.3, we have shown
[TABLE]
With a similar technique, we can bound the inner product by
[TABLE]
if . Here, we also use Lemma 5.4 in the last inequality. Substituting (C.6) into (C.5) leads to
[TABLE]
which further implies
[TABLE]
Combining (C.4) and (C.7), we get
[TABLE]
Finally, we obtain (5.15) for large enough, which finishes the proof of Lemma 5.5.
Appendix D Proof of Theorem B.1
We use the dual polynomial constructed in [23], namely,
[TABLE]
with
[TABLE]
being the random matrix kernel and , being the coefficients that are selected such that
[TABLE]
By construction, this dual polynomial satisfies the first and third statements when satisfies the lower bound given in (5.18). It remains to prove the second statement.
For all , we have and
[TABLE]
due to . Furthermore, for all , we have
[TABLE]
where the last inequality follows from and for some numerical constant [23]. The Taylor expansion of at gives
[TABLE]
with some . Then, setting , we can obtain (B.1).
Next, we continue to prove (B.2). Recall the dual polynomial constructed in [17], namely,
[TABLE]
where
[TABLE]
is the squared Fejér kernel and , are the coefficients that are selected such that
[TABLE]
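The interpolation conditions above form a linear system in the coefficients. The sketch below builds a scalar, real-valued construction modeled on the interpolation approach of Candès and Fernandez-Granda [1]: q(f) = sum_j alpha_j K(f - f_j) + beta_j K'(f - f_j) with q(f_j) = 1 and q'(f_j) = 0, where K(f) = (F_m(f) / m)^2 = [sin(m pi f) / (m sin(pi f))]^4 is the squared Fejér kernel normalized so that K(0) = 1. The real unit signs, the normalization, the order m = 21, the example frequencies, and all helper names are simplifying assumptions; the sketch does not claim to match (D.3) term by term, and the vector-valued random-kernel version of [23] is not reproduced.

```python
import math


def fejer(m, f, order=0):
    """Fejer kernel F_m(f) = sum_{|k|<m} (1 - |k|/m) e^{i 2 pi k f} (real form) and derivatives."""
    ks = range(1, m)
    if order == 0:
        return 1.0 + 2 * sum((1 - k / m) * math.cos(2 * math.pi * k * f) for k in ks)
    if order == 1:
        return -2 * sum((1 - k / m) * (2 * math.pi * k) * math.sin(2 * math.pi * k * f) for k in ks)
    return -2 * sum((1 - k / m) * (2 * math.pi * k) ** 2 * math.cos(2 * math.pi * k * f) for k in ks)


def sq_fejer(m, f, order=0):
    """Squared Fejer kernel K(f) = (F_m(f) / m)^2 and its first two derivatives."""
    F, dF = fejer(m, f), fejer(m, f, 1)
    if order == 0:
        return F * F / m**2
    if order == 1:
        return 2 * F * dF / m**2
    return 2 * (dF * dF + F * fejer(m, f, 2)) / m**2


def solve(A, b):
    """Gaussian elimination with partial pivoting (dense, small systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def dual_poly(freqs, m):
    """Return a callable q(f, order) with q(f_j) = 1 and q'(f_j) = 0 (real unit signs assumed)."""
    s = len(freqs)
    A = [[0.0] * (2 * s) for _ in range(2 * s)]
    for j, fj in enumerate(freqs):
        for k, fk in enumerate(freqs):
            d = fj - fk
            A[j][k], A[j][s + k] = sq_fejer(m, d), sq_fejer(m, d, 1)          # q(f_j) rows
            A[s + j][k], A[s + j][s + k] = sq_fejer(m, d, 1), sq_fejer(m, d, 2)  # q'(f_j) rows
    coef = solve(A, [1.0] * s + [0.0] * s)
    alpha, beta = coef[:s], coef[s:]

    def q(f, order=0):
        return sum(a * sq_fejer(m, f - fk, order) + b * sq_fejer(m, f - fk, order + 1)
                   for a, b, fk in zip(alpha, beta, freqs))

    return q


freqs = [0.1, 0.4, 0.75]
q = dual_poly(freqs, m=21)
print([round(q(f), 6) for f in freqs])  # each entry should be 1.0
grid = [t / 2000 for t in range(2000)]
far = [f for f in grid if min(abs((f - fk + 0.5) % 1 - 0.5) for fk in freqs) > 0.01]
print(max(abs(q(f)) for f in far))  # strictly below 1 away from the support
```

The derivative terms K'(f - f_j) are what let the construction pin q'(f_j) = 0, so that each f_j is a local maximum of |q|; this is the mechanism behind the quadratic decay in the near region.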
With the help of the above dual polynomial (D.3), we have
[TABLE]
since can be upper bounded by a very small number, as shown in Lemma 15 of [23].
To obtain (B.2), we next bound by following the proof strategy of Lemma 2.5 in [7]. Without loss of generality, we consider and bound in the interval . Define
[TABLE]
where and denote the real and imaginary part of , respectively. Then, we have
[TABLE]
where we have used
[TABLE]
for the third line [17, 23]. The last line follows from equation (2.25) and Lemma 2.7 of [1].
Then, in the interval , due to , we have
[TABLE]
with some . Similarly, we can get
[TABLE]
It follows that
[TABLE]
which implies that
[TABLE]
and we finish the proof.
Appendix E Proof of Theorem B.2
In this section, we extend the proof of Lemma 2.7 in [7] to prove Theorem B.2. Define a vector-valued polynomial that has the same form as in (D.1), namely,
[TABLE]
where the random matrix kernel is defined in (D.2) and the coefficient vectors , are selected to satisfy
[TABLE]
Similar to Appendix D, we define another polynomial with the squared Fejér kernel , namely,
[TABLE]
where , are the coefficients that are selected such that
[TABLE]
Note that the polynomial (E.2) is to (E.1) what (D.3) is to (D.1). Therefore, using a strategy similar to that in [23], we can show that is upper bounded by a very small number provided that (5.18) holds. This further implies that
[TABLE]
Then, we only need to bound and .
Note that the constraints in (E.3) can be expressed in the following matrix form
[TABLE]
with and . Define
[TABLE]
It is shown in [1] that is invertible, which implies that is also invertible and these coefficient vectors can be expressed as
[TABLE]
with
[TABLE]
Then, the coefficient vectors can be rewritten as
[TABLE]
Define as the infinity norm of a matrix . Using a similar method to that in the proof of Lemma 5.3 in the technical report [34], we can bound the norm of and as
[TABLE]
where we have used the bounds (B.7) and (B.8) in paper [7]. It follows that
[TABLE]
where the last inequality follows from (E.4) and (B.9) in paper [7].
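Reading the matrix infinity norm as the operator norm induced by the vector l-infinity norm (the maximum absolute row sum) follows the convention of [7]; this is an assumption about notation lost in extraction. The sketch computes this norm and checks the bound ||Ax||_inf <= ||A||_inf ||x||_inf, which is the kind of step used to bound coefficient vectors written as matrix-vector products. The helper names are hypothetical.

```python
import random


def inf_norm_vec(x):
    """Vector l-infinity norm: largest absolute entry."""
    return max(abs(v) for v in x)


def inf_norm_mat(A):
    """Induced infinity norm (assumed convention): maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


A = [[1.0, -2.0], [0.5, 0.25]]
print(inf_norm_mat(A))  # 3.0 (row 1: |1| + |-2|)

# Randomized check of ||Bx||_inf <= ||B||_inf * ||x||_inf.
rng = random.Random(3)
for _ in range(1000):
    B = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
    x = [rng.uniform(-1, 1) for _ in range(4)]
    assert inf_norm_vec(matvec(B, x)) <= inf_norm_mat(B) * inf_norm_vec(x) + 1e-12
```

The bound is tight: choosing x as the sign pattern of the row with the largest absolute sum, here x = [1, -1], gives ||Ax||_inf = 3.0.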
With the same method used to obtain (B.2), we can show that
[TABLE]
when we consider without loss of generality. Therefore, we obtain (B.4) and finish the proof.
Appendix F Proof of Lemma B.1
Recall that the dual certificate constructed in [23] satisfies the two conditions (B.6) and (B.7) that are required in Definition B.1 when satisfies the lower bound given in (5.18). Therefore, we use the optimal constructed in [23], that is
[TABLE]
Here, denotes a weighting matrix and
[TABLE]
are some coefficients that satisfy
[TABLE]
where and are given as
[TABLE]
Plugging in the optimal dual certificate in (F.7), the polynomial can be represented as
[TABLE]
with
[TABLE]
As in [23], by setting , we have
[TABLE]
Recall that we assume satisfies the isotropy property (3.1), namely,
[TABLE]
which implies that
[TABLE]
where is the squared Fejér kernel. It follows that
[TABLE]
where and are two block matrices with size . and are two coefficient vectors defined in (F.10). It follows from [23] that
[TABLE]
where denotes a system matrix and is defined as
[TABLE]
with , . (Footnote 9: Note that we use to denote the -th order derivative of , namely, , , and .) To obtain (F.14), we have used [23, Lemma 7], [23, Lemma 4], and . The inequality in (F.14) also implies that
[TABLE]
since . Then, we have
[TABLE]
where the last inequality follows from (F.15) and the Cauchy-Schwarz inequality.
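The isotropy property (3.1) recalled above is read here as the condition E[b b^H] = I_K on the subspace vectors; this reading is an assumption, since the displayed formula was lost. One simple distribution satisfying it draws the K entries of b independently and uniformly from the complex unit circle. The Monte Carlo sketch below estimates E[b b^H] for that distribution; the distribution, sample size, and helper names are illustrative assumptions, not the paper's setting.

```python
import cmath
import math
import random


def random_unit_phase_vector(K, rng):
    """b in C^K with i.i.d. entries exp(i*theta), theta ~ Uniform[0, 2*pi)."""
    return [cmath.exp(2j * math.pi * rng.random()) for _ in range(K)]


def empirical_covariance(K, n, rng):
    """Monte Carlo estimate of E[b b^H] from n independent draws."""
    C = [[0j] * K for _ in range(K)]
    for _ in range(n):
        b = random_unit_phase_vector(K, rng)
        for i in range(K):
            for k in range(K):
                C[i][k] += b[i] * b[k].conjugate() / n
    return C


C = empirical_covariance(K=3, n=20000, rng=random.Random(4))
max_dev = max(abs(C[i][k] - (1 if i == k else 0)) for i in range(3) for k in range(3))
print(max_dev)  # small, shrinking like 1/sqrt(n)
```

The diagonal entries are exactly 1 for every draw because |exp(i theta)|^2 = 1; only the off-diagonal entries fluctuate, and they average to zero because independent uniform phases are uncorrelated.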
To bound , we are left with bounding and . Note that
[TABLE]
where the last inequality follows from the Cauchy-Schwarz inequality.
Denote as the maximum eigenvalue of a matrix . The second term in (F.17) can be bounded with
[TABLE]
for some numerical constant . Here we use Parseval’s theorem and [16] to get the last equality and inequality, respectively.
Next, we bound with the matrix Bernstein inequality [35]. Define a set of independent zero mean random matrices with
[TABLE]
where “” denotes the Kronecker product. Then, we have
[TABLE]
To apply the matrix Bernstein inequality, we need to bound the spectral norm and the matrix variance statistic of the sum:
[TABLE]
which we tackle separately in the sequel.
Note that we can bound the spectral norm as
[TABLE]
with some numerical constant . Here, the last inequality follows from
[TABLE]
On the other hand, we have
[TABLE]
and
[TABLE]
where is a matrix with the th entry being . Therefore, the matrix variance statistic of the sum can be bounded with . Then, applying the matrix Bernstein inequality [35] yields that
[TABLE]
for any . Set , which belongs to the interval if . Then, we have
[TABLE]
which immediately suggests that the following event
[TABLE]
holds with probability at least provided that . Conditioned on event , one can show that
[TABLE]
holds by using (F.18) provided that .
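For reference, one standard form of the matrix Bernstein inequality (as stated in Tropp's user-friendly tail bounds) for a sum Z = sum_k Z_k of independent, zero-mean d1 x d2 random matrices with ||Z_k|| <= L almost surely and matrix variance statistic v is P(||Z|| >= t) <= (d1 + d2) exp(-(t^2 / 2) / (v + L t / 3)). Setting the right-hand side equal to a target failure probability delta and solving the resulting quadratic in t gives the deviation level that defines events such as the one above. The sketch assumes this form; the paper's exact constants and its specific choice of t are not reproduced, and the function names are hypothetical.

```python
import math


def bernstein_tail(t, v, L, d1, d2):
    """Upper bound on P(||Z|| >= t) from the rectangular matrix Bernstein inequality."""
    return (d1 + d2) * math.exp(-(t * t / 2) / (v + L * t / 3))


def deviation_for(delta, v, L, d1, d2):
    """Smallest t with bernstein_tail(t) <= delta (positive root of a quadratic)."""
    ell = math.log((d1 + d2) / delta)
    # t^2 / 2 = ell * (v + L t / 3)  <=>  t^2 - (2 L ell / 3) t - 2 v ell = 0
    return L * ell / 3 + math.sqrt((L * ell / 3) ** 2 + 2 * v * ell)


t = deviation_for(delta=1e-3, v=4.0, L=1.0, d1=10, d2=10)
print(t, bernstein_tail(t, 4.0, 1.0, 10, 10))  # tail bound equals 1e-3 at this t
```

The two terms in the root reflect the two regimes of the inequality: for small deviations the bound behaves like a Gaussian tail governed by v, and for large deviations like an exponential tail governed by L.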
Similar to (F.17), we note that
[TABLE]
Using Parseval’s identity, we can bound the second term in (F.21) as
[TABLE]
with some numerical constant .
Now, we bound with the matrix Bernstein inequality [35]. Define a set of independent zero mean random matrices with
[TABLE]
Then, we have
[TABLE]
It can be seen that is also the first-order derivative of with respect to . We can then bound its spectral norm as
[TABLE]
with some numerical constant . Here, the last inequality follows from (F.19).
Further, we have
[TABLE]
and
[TABLE]
where is a matrix with the th entry being . Therefore, the matrix variance statistic of the sum can be bounded with . Then, we combine the above bounds and apply the matrix Bernstein inequality to obtain
[TABLE]
for any . Set , which belongs to the interval if . Then, we have
[TABLE]
and the following event
[TABLE]
holds with probability at least provided that . Thus, one can show that
[TABLE]
holds on the event provided that .
Plugging (F.20) and (F.22) into (F.16), we can bound with
[TABLE]
and finish the proof of Lemma B.1 by taking the square root of both sides of (F.23).
References
- [1] E. J. Candès and C. Fernandez-Granda, “Towards a mathematical theory of super-resolution,” Communications on Pure and Applied Mathematics, vol. 67, no. 6, pp. 906–956, 2014.
- [2] C. W. McCutchen, “Superresolution in microscopy and the Abbe resolution limit,” JOSA, vol. 57, no. 10, pp. 1190–1192, 1967.
- [3] T. Harris, R. Grober, J. Trautman, and E. Betzig, “Super-resolution imaging spectroscopy,” Applied Spectroscopy, vol. 48, no. 1, pp. 14A–21A, 1994.
- [4] Y. Xie, S. Li, G. Tang, and M. B. Wakin, “Radar signal demixing via convex optimization,” in 2017 22nd International Conference on Digital Signal Processing (DSP), pp. 1–5, IEEE, 2017.
- [5] K. G. Puschmann and F. Kneer, “On super-resolution in astronomical imaging,” Astronomy & Astrophysics, vol. 436, no. 1, pp. 373–378, 2005.
- [6] H. Greenspan, “Super-resolution in medical imaging,” The Computer Journal, vol. 52, no. 1, pp. 43–63, 2008.
- [7] E. J. Candès and C. Fernandez-Granda, “Super-resolution from noisy data,” Journal of Fourier Analysis and Applications, vol. 19, no. 6, pp. 1229–1254, 2013.
- [8] G. F. Margrave, M. P. Lamoureux, and D. C. Henley, “Gabor deconvolution: Estimating reflectivity by nonstationary deconvolution of seismic data,” Geophysics, vol. 76, no. 3, pp. W15–W30, 2011.
