The Noise Collector for sparse recovery in high dimensions
Miguel Moscoso, Alexei Novikov, George Papanicolaou, and Chrysoula Tsogka

TL;DR
This paper introduces the Noise Collector method, an efficient approach for detecting sparse signals in high-dimensional noisy data without parameter estimation, ensuring zero false discoveries and exact support recovery under certain conditions.
Contribution
The paper proposes the Noise Collector matrix and algorithm, enabling robust sparse recovery in noisy settings without noise level estimation, with theoretical guarantees and practical efficiency.
Findings
Zero false discovery rate for any noise level
Exact support recovery when noise is moderate
Computational cost comparable to standard methods
Miguel Moscoso (Department of Mathematics, Universidad Carlos III de Madrid, Leganes, Madrid 28911, Spain), Alexei Novikov (Department of Mathematics, Pennsylvania State University, University Park, PA 16802), George Papanicolaou (Department of Mathematics, Stanford University, Stanford, CA 94305), Chrysoula Tsogka (Department of Applied Mathematics, University of California, Merced, CA 95343)
Abstract
The ability to detect sparse signals from noisy high-dimensional data is a top priority in modern science and engineering. A sparse solution of the linear system {\cal A}\mbox{\boldmath{\rho}}=\mbox{\boldmath{b}}_{0} can be found efficiently with an \ell_{1}-norm minimization approach if the data is noiseless. Detection of the signal's support from data corrupted by noise is still a challenging problem, especially if the level of noise must be estimated. We propose a new efficient approach that does not require any parameter estimation. We introduce the Noise Collector (NC) matrix {\cal C} and solve an augmented system {\cal A}\mbox{\boldmath{\rho}}+{\cal C}\mbox{\boldmath{\eta}}=\mbox{\boldmath{b}}_{0}+\mbox{\boldmath{e}}, where \mbox{\boldmath{e}} is the noise. We show that the \ell_{1}-norm minimal solution of the augmented system has zero false discovery rate for any level of noise and with probability that tends to one as the dimension of \mbox{\boldmath{b}}_{0} increases to infinity. We also obtain exact support recovery if the noise is not too large, and develop a Fast Noise Collector Algorithm which makes the computational cost of solving the augmented system comparable to that of the original one. Finally, we demonstrate the effectiveness of the method in applications to passive array imaging.
We want to find sparse solutions \mbox{\boldmath{\rho}}\in\mathbb{R}^{K} for
[TABLE]
from highly incomplete measurement data \mbox{\boldmath{b}}=\mbox{\boldmath{b}}_{0}+\mbox{\boldmath{e}}\in\mathbb{R}^{N}, corrupted by noise \mbox{\boldmath{e}}, where N<K. In the noiseless case, \mbox{\boldmath{\rho}} can be found exactly by solving the optimization problem [9]
[TABLE]
provided the measurement matrix {\cal A} satisfies additional conditions, e.g., decoherence or restricted isometry properties [11, 4], and the solution vector \mbox{\boldmath{\rho}} has a small number of nonzero components or degrees of freedom. When measurements are noisy, exact recovery is no longer possible. However, the exact support of \mbox{\boldmath{\rho}} can be determined if the noise is not too strong. The most commonly used approach is to solve the \ell_{2}-relaxed form of (2)
[TABLE]
known as Lasso in the statistics literature [26]. There are sufficient conditions for the support of \mbox{\boldmath{\rho}}_{\lambda} to be contained within the true support, see e.g. Fuchs [14], Tropp [27] and Wainwright [31]. These conditions depend on the signal-to-noise ratio (SNR), which is not known and must be estimated, and on the regularization parameter \lambda, which must be carefully chosen and/or adaptively changed [32]. Although such an adaptive procedure improves the outcome, the resulting solutions tend to include a large number of "false positives" in practice [23]. Our contribution is a method for exact support recovery in the presence of additive noise. A key element of this method is that it has no tuning parameters. In particular, it does not require any prior knowledge of the level of noise, which is often difficult to estimate.
Main Results. Suppose \mbox{\boldmath{\rho}} is an M-sparse solution of the noiseless system in (1), where the columns of {\cal A} have unit length. Our main result ensures that we can recover the support of \mbox{\boldmath{\rho}} by looking at the support of \mbox{\boldmath{\rho}}_{\tau} found as
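(\mbox{\boldmath{\rho}}_{\tau},\mbox{\boldmath{\eta}}_{\tau})=\arg\min_{\mbox{\boldmath{\rho}},\mbox{\boldmath{\eta}}}\left(\tau\|\mbox{\boldmath{\rho}}\|_{l_{1}}+\|\mbox{\boldmath{\eta}}\|_{l_{1}}\right),\quad\mbox{subject to}\quad{\cal A}\mbox{\boldmath{\rho}}+{\cal C}\mbox{\boldmath{\eta}}=\mbox{\boldmath{b}},\qquad(4)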
with an O(1) weight \tau, and an appropriately chosen Noise Collector matrix {\cal C}, . The minimization problem (4) can be understood as a relaxation of (2). It works by absorbing all the noise, and possibly some signal, in {\cal C}\mbox{\boldmath{\eta}}_{\tau}.
The following theorem shows that if the signal is pure noise, and the columns of the Noise Collector are chosen uniformly and independently at random on the unit sphere \mathbb{S}^{N-1}, then {\cal C}\mbox{\boldmath{\eta}}_{\tau}=\mbox{\boldmath{e}} for any level of noise, with high probability.
Theorem 1 (No phantom signal): Suppose \mbox{\boldmath{b}}_{0}=0 and \mbox{\boldmath{e}}/\|\mbox{\boldmath{e}}\|_{l_{2}} is uniformly distributed on the unit sphere \mathbb{S}^{N-1}. Fix and draw columns for {\cal C} independently from the uniform distribution on \mathbb{S}^{N-1}. For any there are constants and such that, for and all , \mbox{\boldmath{\rho}}_{\tau}, the solution of (4), is zero with probability .
Theorem 1 guarantees a zero false discovery rate in the absence of signals with meaningful information, with high probability. In the next theorem we generalize this result to the case in which the recorded signals carry useful information, and show that the support of \mbox{\boldmath{\rho}}_{\tau} is inside the support of \mbox{\boldmath{\rho}}.
Theorem 2 (Zero false discoveries): Let \mbox{\boldmath{\rho}} be an M-sparse solution of the noiseless system {\cal A}\mbox{\boldmath{\rho}}=\mbox{\boldmath{b}}_{0}. Assume , , the Noise Collector, the noise, and \mbox{\boldmath{\rho}}_{\tau} are the same as in Theorem 1. In addition, assume that the columns of {\cal A} are incoherent, in the sense that |\langle\mbox{\boldmath{a}}_{i},\mbox{\boldmath{a}}_{j}\rangle|\leqslant\frac{1}{3M}. Then, there are constants and such that, for and all , \mbox{supp}(\mbox{\boldmath{\rho}}_{\tau})\subseteq\mbox{supp}(\mbox{\boldmath{\rho}}) with probability .
The incoherence conditions in Theorem 2 are needed to guarantee that the true signal does not create false positives elsewhere. The next theorem shows that if the noise is not too large, then \mbox{\boldmath{\rho}}_{\tau} and \mbox{\boldmath{\rho}} have exactly the same support.
Theorem 3 (Exact support recovery): Keep the same assumptions as in Theorem 2. Suppose the magnitudes of the non-zero entries of \mbox{\boldmath{\rho}} are bounded by . If \|\mbox{\boldmath{e}}\|_{l_{2}}/\|\mbox{\boldmath{b}}_{0}\|_{l_{2}}\leqslant c_{2}/\sqrt{\ln N}, , then \mbox{\boldmath{\rho}}_{\tau} and \mbox{\boldmath{\rho}} have the same support with probability .
Motivation. We are interested in accurately imaging sparse scenes using limited and noisy data. Such imaging problems arise in many areas such as medical imaging [29], structural biology [1], radar [2], and geophysics [24]. In imaging, the \ell_{1}-norm minimization method in (2) is often used, e.g., in [19, 22, 16, 28, 12, 6]. This method has the desirable property of super-resolution, that is, the enhancement of the fine-scale details of the images using, in this case, prior information about their low-dimensional structure (sparsity). This has been analyzed in different settings by Donoho and Elad [10], Candès and Fernandez-Granda [5], Fannjiang and Liao [13], and Borcea and Kocyigit [3], among others. We want to retain this property in our method when the data is corrupted by additive noise.
However, noise fundamentally limits the quality of the images formed with almost all computational imaging techniques. Specifically, \ell_{1}-norm minimization produces images that are unstable for low SNR due to the ill-conditioning of super-resolution reconstruction schemes. The instability emerges as clutter noise in the images, or grass, that degrades the resolution. Our initial motivation to introduce the Noise Collector matrix {\cal C} was to regularize the matrix {\cal A} and, thus, to suppress the clutter in the images. We proposed in [20] to seek the minimal \ell_{1}-norm solution of the augmented linear system {\cal A}\mbox{\boldmath{\rho}}+{\cal C}\mbox{\boldmath{\eta}}=\mbox{\boldmath{b}}. The idea was to choose the columns of {\cal C} almost orthogonal to those of {\cal A}. Indeed, the condition number of [{\cal A}\,|\,{\cal C}] becomes when columns of {\cal C} are taken at random. This essentially follows from the bounds on the largest and the smallest nonzero singular values of random matrices, see e.g. Theorem 4.6.1 in [30].
The idea to create a dictionary for noise is not new. For example, the work by Laska et al. [17] considers a specific version of the measurement noise model so \mbox{\boldmath{b}}={\cal A}\mbox{\boldmath{\rho}}+{\cal C}\mbox{\boldmath{e}}, where {\cal C} is a matrix with fewer (orthonormal) columns than rows, and the noise vector \mbox{\boldmath{e}} is sparse. The matrix {\cal C} represents the basis in which the noise is sparse and it is assumed to be known. Then, they show that it is possible to recover sparse signals and sparse noise exactly using \ell_{1}-norm minimization algorithms. We stress that we do not assume here that the noise is sparse. In our work, the noise is large (SNR can be small) and is evenly distributed across the data, so it cannot be sparsely accommodated.
To suppress the clutter, our theory in [20] required exponentially many columns, so . This seemed to make the noise collector impractical, but the numerical experiments suggested that columns were enough to obtain excellent results. We address this issue here and explain why the noise collector matrix {\cal C} only needs algebraically many columns. Moreover, to make the absorption of noise less expensive, and thus improve the algorithm in [20], we introduce the weight \tau in (4). Indeed, by weighting the columns of the noise collector matrix {\cal C} with respect to those in the model matrix {\cal A}, the algorithm now produces images with no clutter at all, no matter how much noise is added to the data.
Finally, we want the Noise Collector to be efficient, with almost no extra computational cost with respect to the Lasso problem in (3). To this end, it is constructed using circulant matrices that allow for efficient matrix-vector multiplications using FFTs.
The proofs of Theorems 1, 2, and 3 are given in Section Proofs. We now explain how the Noise Collector works.
The Noise Collector
The construction of the Noise Collector matrix {\cal C} starts with the following three key properties. First, its columns should be sufficiently orthogonal to the columns of {\cal A}, so it does not absorb signals with "meaningful" information. Second, the columns of {\cal C} should be uniformly distributed on the unit sphere \mathbb{S}^{N-1} so that we can approximate well a typical noise vector. Third, the number of columns in {\cal C} should grow slower than exponentially with N, otherwise the method is impractical. One way to guarantee all three properties is to impose
[TABLE]
with , and fill out {\cal C} by drawing \mbox{\boldmath{c}}_{i} at random with rejections until the rejection rate becomes too high. Then, by construction, the columns of {\cal C} are almost orthogonal to the columns of {\cal A}, and when the rejection rate becomes too high this implies that we cannot pack more N-dimensional unit vectors into {\cal C} and, thus, we can approximate well a typical noise vector. Finally, the Kabatjanskii-Levenstein inequality (see discussion in [25]) implies that the number of columns in {\cal C} grows at most polynomially: .
It is, however, more convenient for the proofs to use a probabilistic version of (5). Suppose that the columns of {\cal C} are drawn at random independently. Then, the dot product of any two random unit vectors is still typically of order 1/\sqrt{N}, see e.g. [30]. If the number of columns grows polynomially, we only have to sacrifice an asymptotically negligible event where our Noise Collector does not satisfy the three key properties, and the decoherence constraints in (5) are weakened by a logarithmic factor. The next Lemma is proved in Section Proofs.
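The claim that independently drawn unit vectors are nearly orthogonal can be checked numerically. The following is a minimal sketch, assuming columns are drawn by normalizing i.i.d. Gaussian vectors (which gives the uniform distribution on the sphere); the sizes `N` and `num_cols`, the seed, and the helper names are illustrative choices, not values from the paper.

```python
import math
import random

def random_unit_vector(n, rng):
    """Draw a vector uniformly on the unit sphere by normalizing i.i.d. Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def max_coherence(columns):
    """Largest |<c_i, c_j>| over all distinct pairs of columns."""
    best = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            dot = sum(a * b for a, b in zip(columns[i], columns[j]))
            best = max(best, abs(dot))
    return best

rng = random.Random(0)      # fixed seed, chosen only for reproducibility
N, num_cols = 64, 200       # illustrative sizes, not taken from the paper
cols = [random_unit_vector(N, rng) for _ in range(num_cols)]
mu = max_coherence(cols)
# Compare the worst pair with the 1/sqrt(N) scale up to a logarithmic factor.
print(f"max coherence = {mu:.3f}, sqrt(ln N / N) = {math.sqrt(math.log(N) / N):.3f}")
```

The maximum over all pairs is expected to exceed 1/\sqrt{N} by a factor that grows only logarithmically, which is the sense in which the constraints are "weakened by a logarithmic factor".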
Lemma 1: Suppose , , vectors \mbox{\boldmath{c}}_{i}\in\mathbb{S}^{N-1} are drawn at random and independently. Then, (i) for any there are constants and , such that
[TABLE]
and (ii) for any \mbox{\boldmath{e}}\in\mathbb{S}^{N-1} there exists at least one \mbox{\boldmath{c}}_{j}, so that
[TABLE]
with probability .
The estimate in (6) implies that any solution of {\cal C}\mbox{\boldmath{\eta}}=\mbox{\boldmath{a}}_{i} satisfies, for any ,
[TABLE]
with probability . This estimate measures how expensive it is to approximate columns of {\cal A} with the Noise Collector. In turn, the weight \tau should be chosen so that it is expensive to approximate noise using columns of {\cal A}. It cannot be taken too large, though, because we may lose the signal. In fact, one can prove that if , then \mbox{\boldmath{\rho}}_{\tau}\equiv 0 for any \mbox{\boldmath{\rho}} and any level of noise. Intuitively, the weight \tau characterizes the rate at which the signal is lost as the noise increases.
To explain the theoretical lower bound we turn to the geometric interpretation of duality in linear programming. Suppose \tau=\infty and there is no signal, \mbox{\boldmath{b}}_{0}=0. Then, the solution of (4) satisfies (\mbox{\boldmath{\rho}}_{\infty},\mbox{\boldmath{\eta}}_{\infty})=(\mbox{\boldmath{0}},\mbox{\boldmath{\eta}}), and there is a dual certificate \mbox{\boldmath{z}} of optimality of (\mbox{\boldmath{0}},\mbox{\boldmath{\eta}}) for \tau=\infty that satisfies
[TABLE]
Define a nonlinear map
[TABLE]
where \mbox{\boldmath{e}}\in\mathbb{R}^{N} is the noise vector in (4), and \mbox{\boldmath{z}} is the dual certificate of optimality of (\mbox{\boldmath{0}},\mbox{\boldmath{\eta}}) for \tau=\infty. For example, if {\cal C} is the identity matrix, then \Phi_{\cal C}(\mbox{\boldmath{e}})=(\hbox{sign}(e_{1}),\dots,\hbox{sign}(e_{N})); see Figure 1-left. If \mbox{\boldmath{z}}=\Phi_{\cal C}(\mbox{\boldmath{e}}) remains a dual certificate of optimality of (\mbox{\boldmath{0}},\mbox{\boldmath{\eta}}) for a finite \tau, then support(\mbox{\boldmath{\rho}}_{\tau})\subset support(\mbox{\boldmath{\rho}}) for such \tau. Thus, Theorem 1 follows once we check that
[TABLE]
holds with large probability. Thus, we need to understand the statistics of \mbox{\boldmath{z}}=\Phi_{\cal C}(\mbox{\boldmath{e}}), given that \mbox{\boldmath{e}}/\|\mbox{\boldmath{e}}\| is uniformly distributed on \mathbb{S}^{N-1}. The columns of the Noise Collector {\cal C} were also uniformly distributed on \mathbb{S}^{N-1}, thus the vector \mbox{\boldmath{n}}=\mbox{\boldmath{z}}/\|\mbox{\boldmath{z}}\|_{l_{2}} has to be uniformly distributed on \mathbb{S}^{N-1} as well. The chance that (10) does not hold can be estimated by the area of the intersection of the unit sphere and the ball of radius (see Figure 1-right), which can be shown to be small by standard estimates from high-dimensional probability.
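The identity-matrix example can be made concrete. The sketch below assumes that the condition to verify has the form \max_{i}|\langle\mbox{\boldmath{a}}_{i},\mbox{\boldmath{z}}\rangle|<\tau, which is how the proof of Theorem 2 uses the certificate; the function names and the toy vectors are hypothetical. For {\cal C} equal to the identity, the minimal \ell_{1} solution of {\cal C}\mbox{\boldmath{\eta}}=\mbox{\boldmath{e}} is \mbox{\boldmath{\eta}}=\mbox{\boldmath{e}}, and \mbox{\boldmath{z}}=\hbox{sign}(\mbox{\boldmath{e}}) satisfies \langle\mbox{\boldmath{z}},\mbox{\boldmath{e}}\rangle=\|\mbox{\boldmath{e}}\|_{l_{1}} with all entries bounded by one.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def certificate_identity(e):
    """Dual certificate when the Noise Collector is the identity: z = sign(e)."""
    return [float(sign(x)) for x in e]

def smallest_safe_weight(A_cols, z):
    """max_i |<a_i, z>|; a weight tau above this value passes the assumed check."""
    return max(abs(sum(a * zi for a, zi in zip(col, z))) for col in A_cols)

e = [0.3, -1.2, 0.5, -0.1]                               # toy noise vector
z = certificate_identity(e)
# Two unit-length toy columns of the model matrix (hypothetical values).
A_cols = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
# <z, e> - ||e||_1 should vanish for a valid certificate.
duality_gap = sum(zi * ei for zi, ei in zip(z, e)) - sum(abs(x) for x in e)
print(z, smallest_safe_weight(A_cols, z), duality_gap)
```

With a random Noise Collector the certificate is no longer a sign vector, but the same quantity, the largest inner product between \mbox{\boldmath{z}} and a column of {\cal A}, is what has to stay below the weight.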
By construction, the columns of the combined matrix [{\cal A}\,|\,{\cal C}] are incoherent. This is the key observation that allows us to prove Theorems 2 and 3 using standard techniques, see e.g. [20]. In particular, we automatically have exact recovery by the standard arguments [11] applied to [{\cal A}\,|\,{\cal C}] if the data is noiseless.
Lemma 2 (Exact Recovery): Suppose \mbox{\boldmath{\rho}} is an M-sparse solution of {\cal A}\mbox{\boldmath{\rho}}=\mbox{\boldmath{b}}, and there is no noise, \mbox{\boldmath{e}}=0. In addition, assume that the columns of {\cal A} are incoherent: |\langle\mbox{\boldmath{a}}_{i},\mbox{\boldmath{a}}_{j}\rangle|\leqslant\frac{1}{3M}. Then, the solution to (4) satisfies \mbox{\boldmath{\rho}}_{\tau}=\mbox{\boldmath{\rho}} for all
[TABLE]
with probability .
Fast Noise Collector Algorithm
To find the minimizer in (4), we consider a variational approach. We define the function
[TABLE]
for a weight \tau, and determine the solution as
[TABLE]
The key observation is that this variational principle finds the minimum in (4) exactly for all values of the regularization parameter \lambda. Hence, the proposed method is fully automated, meaning that it has no tuning parameters. To determine the exact extremum in (13), we use the iterative soft thresholding algorithm GeLMA [21] that works as follows.
For \tau we use in our numerical experiments. For optimal results, one can calibrate \tau to be the smallest constant such that Theorem 1 holds, that is, we see no phantom signals when the algorithm is fed with pure noise.
Pick a value for the regularization parameter \lambda, e.g. . Choose step sizes and (choosing two step sizes instead of the smaller one improves the convergence speed). Set \mbox{\boldmath{\rho}}_{0}=\mbox{\boldmath{0}}, \mbox{\boldmath{\eta}}_{0}=\mbox{\boldmath{0}}, \mbox{\boldmath{z}}_{0}=\mbox{\boldmath{0}}, and iterate for :
[TABLE]
where .
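The structure of such an iteration can be sketched in a few lines. This is not a transcription of the update (14): it assumes the usual shape of an iterative soft-thresholding scheme with a multiplier vector \mbox{\boldmath{z}}, in which \mbox{\boldmath{\rho}} and \mbox{\boldmath{\eta}} are shrunk along {\cal A}^{*}(\mbox{\boldmath{z}}+\mbox{\boldmath{r}}) and {\cal C}^{*}(\mbox{\boldmath{z}}+\mbox{\boldmath{r}}) and \mbox{\boldmath{z}} accumulates the residual \mbox{\boldmath{r}}. The function names, the list-of-columns storage, and the parameters `dt1`, `dt2`, `lam` are assumptions made for illustration.

```python
import math

def soft_threshold(v, r):
    """Componentwise shrinkage: sign(v_i) * max(|v_i| - r, 0)."""
    return [math.copysign(max(abs(x) - r, 0.0), x) for x in v]

def matvec(cols, x):
    """Multiply a real matrix stored as a list of columns by the vector x."""
    n = len(cols[0])
    return [sum(col[i] * xj for col, xj in zip(cols, x)) for i in range(n)]

def rmatvec(cols, y):
    """Multiply the transpose of a column-stored real matrix by the vector y."""
    return [sum(ci * yi for ci, yi in zip(col, y)) for col in cols]

def nc_iterations(A, C, b, tau, lam, dt1, dt2, steps):
    """Assumed update rule: shrink rho and eta, then let z accumulate the residual."""
    rho, eta, z = [0.0] * len(A), [0.0] * len(C), [0.0] * len(b)
    for _ in range(steps):
        Ar, Ce = matvec(A, rho), matvec(C, eta)
        r = [bi - x - y for bi, x, y in zip(b, Ar, Ce)]       # residual b - A rho - C eta
        zr = [zi + ri for zi, ri in zip(z, r)]
        g_rho, g_eta = rmatvec(A, zr), rmatvec(C, zr)
        # The threshold on rho is tau times larger than the one on eta.
        rho = soft_threshold([p + dt1 * g for p, g in zip(rho, g_rho)], tau * lam * dt1)
        eta = soft_threshold([p + dt2 * g for p, g in zip(eta, g_eta)], lam * dt2)
        z = [zi + dt1 * ri for zi, ri in zip(z, r)]
    return rho, eta
```

The only place where the weight \tau enters is the larger threshold applied to \mbox{\boldmath{\rho}}, which makes it more expensive to explain the data with columns of {\cal A} than with columns of {\cal C}.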
The Noise Collector matrix {\cal C} is computed by drawing normally distributed N-dimensional vectors, normalized to unit length. These are the generating vectors of the Noise Collector. From each of them a circulant matrix , , is constructed. The Noise Collector matrix is obtained by concatenation, so . Exploiting the circulant structure of the matrices , we perform the matrix-vector multiplications {\cal C}\mbox{\boldmath{\eta}}_{k} and {\cal C}^{*}(\mbox{\boldmath{z}}_{k}+\mbox{\boldmath{r}}) in (14) using the FFT [15]. This makes the complexity associated with the Noise Collector . Note that only the generating vectors are stored, and not the entire Noise Collector matrix. In practice, we use which makes the cost of using the Noise Collector negligible, as typically .
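To illustrate why the circulant structure helps, the sketch below multiplies a single real circulant block, and its transpose, by a vector using only the generating vector. It assumes the convention C[i][j] = c[(i - j) mod n] and a length that is a power of two; the small recursive FFT is a stand-in for a library FFT, and all names are illustrative.

```python
import cmath

def _fft(x, sign):
    """Recursive radix-2 DFT with kernel exp(sign*2*pi*i*j*k/n); n must be a power of two."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = _fft(x[0::2], sign), _fft(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + w, even[k] - w
    return out

def circulant_matvec(c, v):
    """Compute C v with C[i][j] = c[(i - j) % n], i.e. a circular convolution."""
    n = len(c)
    prod = [a * b for a, b in zip(_fft(c, -1), _fft(v, -1))]
    return [y.real / n for y in _fft(prod, +1)]

def circulant_rmatvec(c, z):
    """Compute C^T z for the same real circulant matrix (a circular correlation)."""
    n = len(c)
    prod = [a.conjugate() * b for a, b in zip(_fft(c, -1), _fft(z, -1))]
    return [y.real / n for y in _fft(prod, +1)]
```

For a Noise Collector built by concatenating several circulant blocks, the product with \mbox{\boldmath{\eta}} would be the sum of one such call per block, and only the generating vectors need to be kept in memory.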
Application to imaging
We consider passive array imaging of point sources. The problem consists in determining the positions \vec{\mbox{\boldmath{z}}}_{j} and the complex amplitudes , , of a few point sources from measurements of polychromatic signals on an array of receivers; see Figure 2. (We chose to work with real numbers in the previous sections for ease of presentation, but the results also hold with complex numbers.) The imaging system is characterized by the array aperture , the distance to the sources, the bandwidth and the central wavelength .
The sources are located inside an image window IW, which is discretized with a uniform grid of points \vec{\mbox{\boldmath{y}}}_{k}, k=1,\ldots,K. The unknown is the source vector \mbox{\boldmath{\rho}}=[\rho_{1},\ldots,\rho_{K}]^{\intercal}\in\mathbb{C}^{K}, whose components correspond to the complex amplitudes of the sources at the grid points \vec{\mbox{\boldmath{y}}}_{k}, k=1,\ldots,K, with . For the true source vector we have if \vec{\mbox{\boldmath{y}}}_{k}=\vec{\mbox{\boldmath{z}}}_{j} for some j, while \rho_{k}=0 otherwise.
Denoting by G(\vec{\mbox{\boldmath{x}}},\vec{\mbox{\boldmath{y}}};\omega) the Green's function for the propagation of a signal of angular frequency \omega from point \vec{\mbox{\boldmath{y}}} to point \vec{\mbox{\boldmath{x}}}, we define the single-frequency Green's function vector that connects a point \vec{\mbox{\boldmath{y}}} in the IW with all points \vec{\mbox{\boldmath{x}}}_{r}, r=1,\ldots,N, on the array as
[TABLE]
In a homogeneous medium in three dimensions, G(\vec{\mbox{\boldmath{x}}},\vec{\mbox{\boldmath{y}}};\omega)=\frac{\exp\{\mathrm{i}\omega|\vec{\mbox{\boldmath{x}}}-\vec{\mbox{\boldmath{y}}}|/c_{0}\}}{4\pi|\vec{\mbox{\boldmath{x}}}-\vec{\mbox{\boldmath{y}}}|}.
The data for the imaging problem are the signals
[TABLE]
recorded at receiver locations \vec{\mbox{\boldmath{x}}}_{r}, r=1,\ldots,N, at frequencies \omega_{l}, l=1,\ldots,S. These data are stacked in a column vector
[TABLE]
with \mbox{\boldmath{b}}(\omega_{l})=[b(\vec{\mbox{\boldmath{x}}}_{1},\omega_{l}),b(\vec{\mbox{\boldmath{x}}}_{2},\omega_{l}),\dots,b(\vec{\mbox{\boldmath{x}}}_{N},\omega_{l})]^{\intercal}\in\mathbb{C}^{N}. Then, {\cal A}\,\mbox{\boldmath{\rho}}=\mbox{\boldmath{b}}, with the measurement matrix {\cal A} whose columns \mbox{\boldmath{a}}_{k} are the multiple-frequency Green's function vectors
[TABLE]
normalized to have length one. The system {\cal A}\,\mbox{\boldmath{\rho}}=\mbox{\boldmath{b}} relates the unknown vector \mbox{\boldmath{\rho}}\in\mathbb{C}^{K} to the data vector \mbox{\boldmath{b}}\in\mathbb{C}^{(N\cdot S)}.
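The construction of one column of the measurement matrix can be sketched as follows, assuming the homogeneous-medium Green's function given above and the stacking order of the data vector (frequencies in the outer loop, receivers in the inner loop). The wave speed `C0`, the function names, and the toy geometry are assumptions made for illustration.

```python
import cmath
import math

C0 = 3.0e8  # wave speed in m/s (assumed: free-space speed of light for microwaves)

def green(x, y, omega, c0=C0):
    """Homogeneous 3D Green's function exp(i*omega*|x-y|/c0) / (4*pi*|x-y|)."""
    dist = math.dist(x, y)
    return cmath.exp(1j * omega * dist / c0) / (4.0 * math.pi * dist)

def model_column(receivers, y, omegas):
    """Stack the Green's function vectors over all frequencies and normalize to unit length."""
    col = [green(x, y, w) for w in omegas for x in receivers]
    norm = math.sqrt(sum(abs(g) ** 2 for g in col))
    return [g / norm for g in col]

# Toy geometry (hypothetical): three receivers on a line, one grid point 1 m away.
receivers = [(-0.1, 0.0, 0.0), (0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
omegas = [2.0 * math.pi * f for f in (30.0e9, 31.0e9)]
a_k = model_column(receivers, (0.0, 0.0, 1.0), omegas)
print(len(a_k), sum(abs(g) ** 2 for g in a_k))
```

Repeating this for every grid point \vec{\mbox{\boldmath{y}}}_{k} gives the K columns of {\cal A}, each of length N\cdot S.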
Next, we illustrate the performance of the Noise Collector in this imaging setup. The most important features are that (i) no calibration is necessary with respect to the level of noise, (ii) exact support recovery is obtained for relatively large levels of noise (i.e., \|\mbox{\boldmath{e}}\|_{l_{2}}/\|\mbox{\boldmath{b}}_{0}\|_{l_{2}}\leqslant c_{2}/\sqrt{\ln N}), and (iii) the false discovery rate is zero for all levels of noise, with high probability.
We consider a high frequency microwave imaging regime with central frequency GHz corresponding to mm. We make measurements for equally spaced frequencies spanning a bandwidth GHz. The array has receivers and an aperture cm. The distance from the array to the center of the imaging window is cm. Then, the resolution is mm in the cross-range (direction parallel to the array) and mm in range (direction of propagation). These parameters are typical in microwave scanning technology [18].
We seek to image a source vector with sparsity ; see the left plot in Fig. 3. The size of the imaging window is 20 cm \times 60 cm and the pixel spacing is 5 mm \times 15 mm. The number of unknowns is, therefore, and the number of data is . The size of the noise collector is taken to be , so . When the data is noiseless, we obtain exact recovery as expected; see the right plot in Fig. 3.
In Fig. 4, we display the imaging results, with and without a Noise Collector, when the data is corrupted by additive noise. The SNR = 1, so the \ell_{2}-norms of the signal and the noise are equal. In the left plot, we show the recovered image using \ell_{1}-norm minimization without a Noise Collector. There is a lot of grass in this image, with many non-zero values outside the true support. When a Noise Collector is used, the level of the grass is reduced and the image improves; see the second plot from the left. Still, there are several false discoveries because we use in (14).
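Noise with a prescribed level can be generated as in the following sketch, which assumes real-valued data and measures the level by the ratio \|\mbox{\boldmath{e}}\|_{l_{2}}/\|\mbox{\boldmath{b}}_{0}\|_{l_{2}} used in Theorem 3. Normalizing a Gaussian vector gives a direction that is uniformly distributed on the sphere, as the theorems assume; the function names and the toy signal are illustrative.

```python
import math
import random

def l2_norm(v):
    """Euclidean norm of a real or complex vector."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def add_noise(b0, noise_to_signal, rng):
    """Return (b, e) with b = b0 + e and ||e|| / ||b0|| equal to noise_to_signal."""
    g = [rng.gauss(0.0, 1.0) for _ in b0]           # isotropic random direction
    scale = noise_to_signal * l2_norm(b0) / l2_norm(g)
    e = [scale * x for x in g]
    return [x + y for x, y in zip(b0, e)], e

rng = random.Random(1)                               # fixed seed for reproducibility
b0 = [1.0, -2.0, 0.5, 0.0, 3.0]                      # toy noiseless data
b, e = add_noise(b0, 1.0, rng)                       # noise and signal norms are equal
print(l2_norm(e) / l2_norm(b0))
```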
In the third column from the left of Fig. 4 we show the image obtained with a weight in (14). With this weight, there are no false discoveries and the recovered support is exact. This simplifies the imaging problem dramatically, as we can now restrict the inverse problem to the true support just obtained, and then solve an overdetermined linear system using a classical approach. The results are shown in the right column of Fig. 4. Note that this second step largely compensates for the signal that was lost in the first step due to the high level of noise.
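The second step can be sketched as a least-squares fit restricted to the recovered support. The text only says "a classical approach", so the choice of least squares via the normal equations is an assumption here, as are the function names and the real-valued toy data.

```python
def dot(u, v):
    """Inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))

def solve_linear(M, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x

def refit_on_support(A_cols, b, support):
    """Least-squares amplitudes on the given support; zero elsewhere."""
    S = [A_cols[k] for k in support]
    gram = [[dot(u, v) for v in S] for u in S]       # normal equations on the support
    coef = solve_linear(gram, [dot(u, b) for u in S])
    rho = [0.0] * len(A_cols)
    for k, c in zip(support, coef):
        rho[k] = c
    return rho
```

Because the restricted system has far fewer unknowns than data, the fit averages out the noise instead of shrinking the amplitudes, which is consistent with the remark that this step compensates for the signal lost in the first step.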
In Figure 5 we illustrate the performance of the Noise Collector for different sparsity levels and SNR values. Success in recovering the true support of the unknown corresponds to the value 1 (yellow) and failure to 0 (blue). The small phase transition zone (green) contains intermediate values. These results are obtained by averaging over 5 realizations of noise.
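The quantity plotted in such a phase diagram can be computed as in the sketch below, which scores each noise realization by whether the recovered support equals the true one and then averages. The tolerance `tol` for deciding that an entry is nonzero and the function names are assumptions; the text does not state the threshold used.

```python
def support(rho, tol=1e-8):
    """Indices of entries whose magnitude exceeds the tolerance."""
    return {k for k, v in enumerate(rho) if abs(v) > tol}

def false_discoveries(rho_hat, rho_true, tol=1e-8):
    """Number of recovered indices that are not in the true support."""
    return len(support(rho_hat, tol) - support(rho_true, tol))

def success_rate(recoveries, rho_true, tol=1e-8):
    """Fraction of realizations whose recovered support equals the true support."""
    true_supp = support(rho_true, tol)
    hits = sum(1 for r in recoveries if support(r, tol) == true_supp)
    return hits / len(recoveries)
```

A value of 1 means exact support recovery in every realization, 0 means failure in every realization, and averaging over a handful of realizations produces the intermediate values in the transition zone.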
Remark 1: We considered passive array imaging for ease of presentation. The same results hold for active array imaging with or without multiple scattering; see [7] for the detailed analytical setup.
Remark 2: We have considered a microwave imaging regime. Similar results can be obtained in other regimes.
Proofs
Proof of Lemma 1: Denote the event
[TABLE]
By independence, \mathbb{P}\left(|\langle\mbox{\boldmath{a}}_{i},\mbox{\boldmath{c}}_{j}\rangle|\geqslant t/\sqrt{N}\right)\leqslant 2\exp(-t^{2}/2) for any i and j. Thus, . Choosing for sufficiently large , we get
[TABLE]
where and . Hence, (6) holds with large probability .
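The arithmetic behind this step is a union bound over all pairs of columns. A sketch, writing K for the number of columns of {\cal A} and \Sigma for the number of columns of {\cal C} (the symbol \Sigma and the form t=c_{0}\sqrt{\ln N} are labels assumed here):

```latex
\mathbb{P}\Big(\max_{i\leqslant K,\;j\leqslant\Sigma}
   |\langle\mbox{\boldmath{a}}_{i},\mbox{\boldmath{c}}_{j}\rangle|\geqslant t/\sqrt{N}\Big)
\;\leqslant\;\sum_{i=1}^{K}\sum_{j=1}^{\Sigma}
   \mathbb{P}\big(|\langle\mbox{\boldmath{a}}_{i},\mbox{\boldmath{c}}_{j}\rangle|\geqslant t/\sqrt{N}\big)
\;\leqslant\;2K\Sigma\,e^{-t^{2}/2},
\qquad
t=c_{0}\sqrt{\ln N}\;\Longrightarrow\;2K\Sigma\,e^{-t^{2}/2}=2K\Sigma\,N^{-c_{0}^{2}/2}.
```

If K and \Sigma grow at most polynomially in N, the right-hand side can be made smaller than any prescribed negative power of N by taking c_{0} large enough, at the price of the logarithmic factor \sqrt{\ln N} in the coherence bound.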
Next, we consider the chances that (7) does not hold. Suppose there is a direction \mbox{\boldmath{b}}\in\mathbb{S}^{N-1} such that
[TABLE]
holds for all . Let V_{k}(\mbox{\boldmath{c}}_{i_{1}},\dots,\mbox{\boldmath{c}}_{i_{k}}) be the k-dimensional volume of the parallelepiped spanned by \mbox{\boldmath{c}}_{i_{1}}, \dots, \mbox{\boldmath{c}}_{i_{k}}. Note that is equal to times its height. Then, if (18) holds,
[TABLE]
for any choice of columns \mbox{\boldmath{c}}_{i_{j}} from the noise collector {\cal C}. If we fix the indices , , then, due to rotational invariance, the probability of the event (19) equals the probability of the event |\langle\mbox{\boldmath{c}}_{1},\mbox{\boldmath{e}}_{1}\rangle|\leqslant 2\alpha/\sqrt{N}. Using
[TABLE]
and that we can find sets of distinct indices , , , we conclude that
[TABLE]
Choosing sufficiently small, i.e. , and sufficiently large, we obtain the result.
Proof of Theorem 1: In order to check (10), we assume that both \mbox{\boldmath{c}}_{i} and -\mbox{\boldmath{c}}_{i} are in {\cal C}, because it is more geometrically intuitive to work with the convex hull
[TABLE]
It implies we may also assume \mbox{\boldmath{\eta}} in (4) has non-negative coefficients, and \|\mbox{\boldmath{\eta}}\|_{l_{1}}=\min_{\lambda>0}\{\mbox{\boldmath{e}}\in\lambda H\}. Thus, \|\mbox{\boldmath{\eta}}\|_{l_{1}} is a norm of \mbox{\boldmath{e}} with respect to H, and we can set \|\mbox{\boldmath{e}}\|_{{\cal C}}:=\|\mbox{\boldmath{\eta}}\|_{l_{1}}. This norm is called atomic in [8]. Suppose is the support of \mbox{\boldmath{\eta}}. Its typical size . Then, the simplex
[TABLE]
has the unique normal vector , which is collinear to our dual certificate \mbox{\boldmath{z}}, because
[TABLE]
The estimate (7) implies that the convex hull H contains an \ell_{2}-ball of radius . Therefore, \|\mbox{\boldmath{z}}\|_{l_{2}}\leqslant\sqrt{N}/\alpha with large probability.
By construction, the distribution of \Phi_{\cal C}(\mbox{\boldmath{e}}) is rotationally invariant with respect to the probability measure induced by all \mbox{\boldmath{c}}_{i} and \mbox{\boldmath{e}}. Thus \mbox{\boldmath{n}}=\mbox{\boldmath{z}}/\|\mbox{\boldmath{z}}\|_{l_{2}} is also uniformly distributed on \mathbb{S}^{N-1}, and
[TABLE]
for all , see e.g. [30]. Therefore, we can bound the probability that (10) does not hold:
[TABLE]
[TABLE]
for large and appropriately chosen . Hence, (10) holds with large probability .
Proof of Theorem 2: If the columns of {\cal A} are orthogonal, our previous arguments could be modified to verify Theorem 2. Indeed, suppose V is the span of the column vectors \mbox{\boldmath{a}}_{j}, with j in the support of \mbox{\boldmath{\rho}}. Say, V is spanned by \mbox{\boldmath{a}}_{1}, \dots, \mbox{\boldmath{a}}_{M}. Let W be the orthogonal complement to V. Then, the orthogonal projection of the signal \mbox{\boldmath{\rho}}^{w}=0. By the concentration of measure, see e.g. [30], the projection of the noise \mbox{\boldmath{e}}^{w} is uniformly distributed on the unit sphere with large probability. Applying the previous arguments to \mbox{\boldmath{z}}^{w}, the projection of \mbox{\boldmath{z}} on W, we conclude that the projection \mbox{\boldmath{\rho}}^{w}_{\tau}=0. Therefore, \mbox{supp}(\mbox{\boldmath{\rho}}_{\tau})\subseteq\mbox{supp}(\mbox{\boldmath{\rho}}) with large probability.
For general {\cal A} consider the orthogonal decomposition \mbox{\boldmath{a}}_{i}=\mbox{\boldmath{a}}_{i}^{v}+\mbox{\boldmath{a}}_{i}^{w} for all . As before, we can choose \tau so that |\langle\mbox{\boldmath{a}}_{i}^{w},\mbox{\boldmath{z}}\rangle|<\tau/2 with large probability. It remains to demonstrate that |\langle\mbox{\boldmath{a}}_{i}^{v},\mbox{\boldmath{z}}\rangle|\leqslant\tau/2. Fix any . Suppose \mbox{\boldmath{a}}_{i}^{v}=\sum_{k=1}^{M}\alpha_{k}\mbox{\boldmath{a}}_{k}, and |\alpha_{j}|=\max_{k\leqslant M}|\alpha_{k}|=\|\mbox{\boldmath{\alpha}}\|_{l_{\infty}}. Thus,
[TABLE]
Then, \|\mbox{\boldmath{\alpha}}\|_{l_{\infty}}\leqslant 1/2M, so \|\mbox{\boldmath{\alpha}}\|_{l_{1}}\leqslant M\|\mbox{\boldmath{\alpha}}\|_{l_{\infty}}\leqslant 1/2. Hence,
[TABLE]
Proof of Theorem 3: It suffices to prove the result for 1-sparse \mbox{\boldmath{\rho}}, say, \mbox{\boldmath{\rho}}=(1,0,\dots,0). We will demonstrate that the solution to the minimization problem
[TABLE]
with , satisfies if , . This implies \mbox{supp}(\mbox{\boldmath{\rho}}_{\tau})=\mbox{supp}(\mbox{\boldmath{\rho}}).
Suppose \mbox{\boldmath{\eta}}_{\mbox{\boldmath{e}}}, \mbox{\boldmath{\eta}}_{\mbox{\boldmath{a}}_{1}} and \mbox{\boldmath{\eta}}_{t} are the optimal solutions of
[TABLE]
with right-hand sides \mbox{\boldmath{b}}=\mbox{\boldmath{e}}, \mbox{\boldmath{b}}=\mbox{\boldmath{a}}_{1}, and \mbox{\boldmath{b}}=\mbox{\boldmath{e}}+t\mbox{\boldmath{a}}_{1}, respectively. Since {\cal C}\left(\mbox{\boldmath{\eta}}_{t}-\mbox{\boldmath{\eta}}_{\mbox{\boldmath{e}}}\right)=t\mbox{\boldmath{a}}_{1}, we have
[TABLE]
and, therefore,
[TABLE]
[TABLE]
respectively. Suppose , . Then, for any , and for large enough,
[TABLE]
Using (25) with , we conclude that
[TABLE]
for all . It implies (23).
Acknowledgements. The work of M. Moscoso was partially supported by Spanish grant MICINN FIS2016-77892-R. The work of A. Novikov was partially supported by NSF grants DMS-1515187, DMS-1813943. The work of G. Papanicolaou was partially supported by AFOSR FA9550-18-1-0519. The work of C. Tsogka was partially supported by AFOSR FA9550-17-1-0238 and FA9550-18-1-0519. We thank Marguerite Novikov for drawing Figure 1.
- [1] M. AlQuraishi and H. H. McAdams, Direct inference of protein-DNA interactions using compressed sensing methods, Proc. Natl. Acad. Sci. U.S.A. 108, 14819–14824 (2001).
- [2] R. Baraniuk and P. Steeghs, Compressive radar imaging, in 2007 IEEE Radar Conference, Apr. 2007, 128–133.
- [3] L. Borcea and I. Kocyigit, Resolution analysis of imaging with \ell_{1} optimization, SIAM J. Imaging Sci. 8, 3015–3050 (2015).
- [4] E. J. Candès and T. Tao, Decoding by linear programming, IEEE Trans. Inf. Theory 51, 4203–4215 (2005).
- [5] E. J. Candès and C. Fernandez-Granda, Towards a mathematical theory of super-resolution, Comm. Pure Appl. Math. 67, 906–956 (2014).
- [6] A. Chai, M. Moscoso and G. Papanicolaou, Robust imaging of localized scatterers using the singular value decomposition and \ell_{1} optimization, Inverse Problems 29, 025016 (2013).
- [7] A. Chai, M. Moscoso and G. Papanicolaou, Imaging strong localized scatterers with sparsity promoting optimization, SIAM J. Imaging Sci. 7, 1358–1387 (2014).
- [8] V. Chandrasekaran, B. Recht, P. A. Parrilo and A. S. Willsky, The convex geometry of linear inverse problems, Found. Comput. Math. 12, 805–849 (2012).
