The Noise Collector for sparse recovery in high dimensions

Miguel Moscoso, Alexei Novikov, George Papanicolaou, and Chrysoula Tsogka

arXiv:1908.04412 · eess.SP · June 8, 2022


TL;DR

This paper introduces the Noise Collector method, an efficient approach for detecting sparse signals in high-dimensional noisy data without parameter estimation, ensuring zero false discoveries and exact support recovery under certain conditions.

Contribution

The paper proposes the Noise Collector matrix and algorithm, enabling robust sparse recovery in noisy settings without noise level estimation, with theoretical guarantees and practical efficiency.

Findings

01 Zero false discovery rate for any noise level

02 Exact support recovery when noise is moderate

03 Computational cost comparable to standard methods


Equations

$${\cal A}\,\boldsymbol{\rho}=\boldsymbol{b},$$

$$\boldsymbol{\rho}_{*}=\arg\min_{\boldsymbol{\rho}}\|\boldsymbol{\rho}\|_{\ell_{1}},\ \text{ subject to }\ {\cal A}\,\boldsymbol{\rho}=\boldsymbol{b},$$

$$\boldsymbol{\rho}_{\lambda}=\arg\min_{\boldsymbol{\rho}}\left(\lambda\|\boldsymbol{\rho}\|_{\ell_{1}}+\|{\cal A}\boldsymbol{\rho}-\boldsymbol{b}\|^{2}_{\ell_{2}}\right),$$

$$\left(\boldsymbol{\rho}_{\tau},\boldsymbol{\eta}_{\tau}\right)=\arg\min_{\boldsymbol{\rho},\boldsymbol{\eta}}\left(\tau\|\boldsymbol{\rho}\|_{\ell_{1}}+\|\boldsymbol{\eta}\|_{\ell_{1}}\right),\ \text{ subject to }\ {\cal A}\boldsymbol{\rho}+{\cal C}\boldsymbol{\eta}=\boldsymbol{b}_{0}+\boldsymbol{e},$$

$$|\langle\boldsymbol{a}_{i},\boldsymbol{c}_{j}\rangle|<\frac{\alpha}{\sqrt{N}}\ \ \forall i,j,\ \text{ and }\ |\langle\boldsymbol{c}_{i},\boldsymbol{c}_{j}\rangle|<\frac{\alpha}{\sqrt{N}}\ \ \forall i\neq j,$$

$$\forall i,j\quad|\langle\boldsymbol{a}_{i},\boldsymbol{c}_{j}\rangle|<c_{0}\sqrt{\ln N}/\sqrt{N},$$

$$|\langle\boldsymbol{e},\boldsymbol{c}_{j}\rangle|\geqslant\alpha/\sqrt{N}$$

$$\|\boldsymbol{\eta}\|_{\ell_{1}}\geqslant\frac{\sqrt{N}}{c_{0}\sqrt{\ln N}},$$

$$\langle\boldsymbol{c}_{j},\boldsymbol{z}\rangle=\text{sign}(\eta_{j})\ \text{ if }\eta_{j}\neq 0,\ \text{ and }\ |\langle\boldsymbol{c}_{j},\boldsymbol{z}\rangle|\leqslant 1\ \text{ if }\eta_{j}=0.$$

$$\Phi_{\cal C}:\boldsymbol{e}\to\boldsymbol{z},$$

$$|\langle\boldsymbol{a}_{j},\boldsymbol{z}\rangle|<\tau,\ \text{ for all }j\leqslant K,$$

$$M<\frac{N}{c_{0}\ln N\,\tau},$$

$$F(\boldsymbol{\rho},\boldsymbol{\eta},\boldsymbol{z})=\lambda\,(\tau\|\boldsymbol{\rho}\|_{\ell_{1}}+\|\boldsymbol{\eta}\|_{\ell_{1}})+\frac{1}{2}\|{\cal A}\boldsymbol{\rho}+{\cal C}\boldsymbol{\eta}-\boldsymbol{b}\|^{2}_{\ell_{2}}+\langle\boldsymbol{z},\boldsymbol{b}-{\cal A}\boldsymbol{\rho}-{\cal C}\boldsymbol{\eta}\rangle$$

$$\max_{\boldsymbol{z}}\min_{\boldsymbol{\rho},\boldsymbol{\eta}}F(\boldsymbol{\rho},\boldsymbol{\eta},\boldsymbol{z}).$$

$$\begin{aligned}
\boldsymbol{r}&=\boldsymbol{b}-{\cal A}\,\boldsymbol{\rho}_{k}-{\cal C}\,\boldsymbol{\eta}_{k}\,,\\
\boldsymbol{\rho}_{k+1}&=\mathcal{S}_{\,\tau\,\lambda\Delta t_{1}}\left(\boldsymbol{\rho}_{k}+\Delta t_{1}\,{\cal A}^{*}(\boldsymbol{z}_{k}+\boldsymbol{r})\right)\,,\\
\boldsymbol{\eta}_{k+1}&=\mathcal{S}_{\lambda\Delta t_{1}}\left(\boldsymbol{\eta}_{k}+\Delta t_{1}\,{\cal C}^{*}(\boldsymbol{z}_{k}+\boldsymbol{r})\right)\,,\\
\boldsymbol{z}_{k+1}&=\boldsymbol{z}_{k}+\Delta t_{2}\,\boldsymbol{r}\,,
\end{aligned}$$

$$\boldsymbol{g}(\vec{\boldsymbol{y}};\omega)=[G(\vec{\boldsymbol{x}}_{1},\vec{\boldsymbol{y}};\omega),G(\vec{\boldsymbol{x}}_{2},\vec{\boldsymbol{y}};\omega),\ldots,G(\vec{\boldsymbol{x}}_{N},\vec{\boldsymbol{y}};\omega)]^{\intercal}\in\mathbb{C}^{N}\,.$$

$$b(\vec{\boldsymbol{x}}_{r},\omega_{l})=\sum_{j=1}^{M}\alpha_{j}G(\vec{\boldsymbol{x}}_{r},\vec{\boldsymbol{z}}_{j};\omega_{l})$$

$$\boldsymbol{b}=[\boldsymbol{b}(\omega_{1})^{\intercal},\boldsymbol{b}(\omega_{2})^{\intercal},\dots,\boldsymbol{b}(\omega_{S})^{\intercal}]^{\intercal}\in\mathbb{C}^{(N\cdot S)}\,,$$

$$\boldsymbol{a}_{k}=[\boldsymbol{g}(\vec{\boldsymbol{y}}_{k};\omega_{1})^{\intercal},\boldsymbol{g}(\vec{\boldsymbol{y}}_{k};\omega_{2})^{\intercal},\dots,\boldsymbol{g}(\vec{\boldsymbol{y}}_{k};\omega_{S})^{\intercal}]^{\intercal}\in\mathbb{C}^{(N\cdot S)}\,,$$

$$\Omega_{t}=\left\{\max_{i,j}|\langle\boldsymbol{a}_{i},\boldsymbol{c}_{j}\rangle|\geqslant t/\sqrt{N}\right\}.$$

$$\mathbb{P}(\Omega_{t})\leqslant CN^{\beta+1}N^{-c_{0}^{2}/2}\leqslant N^{-\kappa},$$

$$|\langle\boldsymbol{b},\boldsymbol{c}_{j}\rangle|\leqslant\alpha/\sqrt{N}$$

$$\frac{V_{N}(\boldsymbol{c}_{i_{1}},\dots,\boldsymbol{c}_{i_{N-1}},\boldsymbol{c}_{i_{N}})}{V_{N-1}(\boldsymbol{c}_{i_{1}},\dots,\boldsymbol{c}_{i_{N-1}})}\leqslant\frac{2\alpha}{\sqrt{N}}$$

$$\mathbb{P}\left(|\langle\boldsymbol{c}_{1},\boldsymbol{e}_{1}\rangle|\leqslant\frac{2\alpha}{\sqrt{N}}\right)=\sqrt{\frac{N}{2\pi}}\int_{-2\alpha/\sqrt{N}}^{2\alpha/\sqrt{N}}e^{-x^{2}N/2}dx\leqslant\frac{4\alpha}{\sqrt{2\pi}},$$

$$\mathbb{P}\left(\exists\,\boldsymbol{b}\in\mathbb{S}^{N-1}\text{ such that \eqref{anti_deco_1} holds}\right)\leqslant\left(4\alpha/\sqrt{2\pi}\right)^{N^{\beta-1}}.$$

$$H=\left\{x\in\mathbb{R}^{N}\left|\,x=\sum_{i=1}^{\Sigma}\xi_{i}\boldsymbol{c}_{i},\ \xi_{i}\geqslant 0,\ \sum_{i=1}^{\Sigma}\xi_{i}\leqslant 1\right.\right\}.$$

$$\left\{\boldsymbol{x}\in\mathbb{R}^{N}\left|\,\boldsymbol{x}=\sum_{i\in\Lambda}\alpha_{i}\boldsymbol{c}_{i},\ \sum_{i\in\Lambda}\alpha_{i}=1,\ \alpha_{i}\geqslant 0\right.\right\}$$

$$\langle\boldsymbol{z},\boldsymbol{c}_{i}\rangle=\langle\boldsymbol{z},\boldsymbol{e}\rangle/\|e\|_{{\cal C}}=1,\ \forall i\in\Lambda,\quad\langle\boldsymbol{z},\boldsymbol{c}_{j}\rangle<1,\ \forall j\not\in\Lambda.$$


Full text


Miguel Moscoso (Department of Mathematics, Universidad Carlos III de Madrid, Leganes, Madrid 28911, Spain), Alexei Novikov (Department of Mathematics, Pennsylvania State University, University Park, PA 16802), George Papanicolaou (Department of Mathematics, Stanford University, Stanford, CA 94305), Chrysoula Tsogka (Department of Applied Mathematics, University of California, Merced, CA 95343)

Abstract

The ability to detect sparse signals from noisy high-dimensional data is a top priority in modern science and engineering. A sparse solution of the linear system ${\cal A}\mbox{\boldmath{$ \rho $}}=\mbox{\boldmath{$ b $}}_{0}$ can be found efficiently with an $\ell_{1}$-norm minimization approach if the data is noiseless. Detection of the signal’s support from data corrupted by noise is still a challenging problem, especially if the level of noise must be estimated. We propose a new efficient approach that does not require any parameter estimation. We introduce the Noise Collector (NC) matrix ${\cal C}$ and solve an augmented system ${\cal A}\mbox{\boldmath{$ \rho $}}+{\cal C}\mbox{\boldmath{$ \eta $}}=\mbox{\boldmath{$ b $}}_{0}+\mbox{\boldmath{$ e $}}$, where $\mbox{\boldmath{$ e $}}$ is the noise. We show that the $\ell_{1}$-norm minimal solution of the augmented system has zero false discovery rate for any level of noise and with probability that tends to one as the dimension of $\mbox{\boldmath{$ b $}}_{0}$ increases to infinity. We also obtain exact support recovery if the noise is not too large, and develop a Fast Noise Collector Algorithm which makes the computational cost of solving the augmented system comparable to that of the original one. Finally, we demonstrate the effectiveness of the method in applications to passive array imaging.

We want to find sparse solutions $\mbox{\boldmath{$ \rho $}}\in\mathbb{R}^{K}$ for

$${\cal A}\,\boldsymbol{\rho}=\boldsymbol{b},\tag{1}$$

from highly incomplete measurement data $\mbox{\boldmath{$ b $}}=\mbox{\boldmath{$ b $}}_{0}+\mbox{\boldmath{$ e $}}\in\mathbb{R}^{N}$, corrupted by noise $\mbox{\boldmath{$ e $}}$, where $1\ll N<K$. In the noiseless case, $\mbox{\boldmath{$ \rho $}}$ can be found exactly by solving the optimization problem [9]

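$$\boldsymbol{\rho}_{*}=\arg\min_{\boldsymbol{\rho}}\|\boldsymbol{\rho}\|_{\ell_{1}},\ \text{ subject to }\ {\cal A}\,\boldsymbol{\rho}=\boldsymbol{b},\tag{2}$$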

provided the measurement matrix ${\cal A}\in\mathbb{R}^{N\times K}$ satisfies additional conditions, e.g., decoherence or restricted isometry properties [11, 4], and the solution vector $\rho$ has a small number $M$ of nonzero components or degrees of freedom. When measurements are noisy, exact recovery is no longer possible. However, the exact support of $\rho$ can be determined if the noise is not too strong. The most commonly used approach is to solve the $\ell_{2}$-relaxed form of (2),

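$$\boldsymbol{\rho}_{\lambda}=\arg\min_{\boldsymbol{\rho}}\left(\lambda\|\boldsymbol{\rho}\|_{\ell_{1}}+\|{\cal A}\boldsymbol{\rho}-\boldsymbol{b}\|^{2}_{\ell_{2}}\right),\tag{3}$$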

known as Lasso in the statistics literature [26]. There are sufficient conditions for the support of $\mbox{\boldmath{$ \rho $}}_{\lambda}$ to be contained within the true support, see e.g. Fuchs [14], Tropp [27] and Wainwright [31]. These conditions depend on the signal-to-noise ratio (SNR), which is not known and must be estimated, and on the regularization parameter $\lambda$ , which must be carefully chosen and/or adaptively changed [32]. Although such an adaptive procedure improves the outcome, the resulting solutions tend to include a large number of “false positives” in practice [23]. Our contribution is a method for exact support recovery in the presence of additive noise. A key element of this method is that it has no tuning parameters. In particular, it does not require any prior knowledge of the level of noise which is often difficult to estimate.

Main Results. Suppose $\rho$ is an $M$ -sparse solution of the noiseless system in (1), where the columns of ${\cal A}$ have unit length. Our main result ensures that we can recover the support of $\rho$ by looking at the support of $\mbox{\boldmath{$ \rho $}}_{\tau}$ found as

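$$\begin{aligned}
\left(\boldsymbol{\rho}_{\tau},\boldsymbol{\eta}_{\tau}\right)=\arg\min_{\boldsymbol{\rho},\boldsymbol{\eta}}&\left(\tau\|\boldsymbol{\rho}\|_{\ell_{1}}+\|\boldsymbol{\eta}\|_{\ell_{1}}\right),\\
&\text{subject to }{\cal A}\boldsymbol{\rho}+{\cal C}\boldsymbol{\eta}=\boldsymbol{b}_{0}+\boldsymbol{e},
\end{aligned}\tag{4}$$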

with an $O(\sqrt{\ln N})$ weight $\tau$, and an appropriately chosen Noise Collector matrix ${\cal C}\in\mathbb{R}^{N\times\Sigma}$, $\Sigma\gg K$. The minimization problem (4) can be understood as a relaxation of (2). It works by absorbing all the noise, and possibly some signal, in ${\cal C}\mbox{\boldmath{$ \eta $}}_{\tau}$.
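As a brief sketch of why (4) is a linear program, one can use the standard splitting into positive and negative parts. The auxiliary variables $\boldsymbol{\rho}^{\pm}$, $\boldsymbol{\eta}^{\pm}$ and the all-ones vector $\mathbf{1}$ are notation introduced here for illustration and are not part of the original formulation. Writing $\boldsymbol{\rho}=\boldsymbol{\rho}^{+}-\boldsymbol{\rho}^{-}$ and $\boldsymbol{\eta}=\boldsymbol{\eta}^{+}-\boldsymbol{\eta}^{-}$ with nonnegative parts, problem (4) has the same optimal value as

$$\begin{aligned}
\min_{\boldsymbol{\rho}^{\pm},\,\boldsymbol{\eta}^{\pm}}\quad&\tau\,\mathbf{1}^{\intercal}(\boldsymbol{\rho}^{+}+\boldsymbol{\rho}^{-})+\mathbf{1}^{\intercal}(\boldsymbol{\eta}^{+}+\boldsymbol{\eta}^{-})\\
\text{subject to}\quad&{\cal A}(\boldsymbol{\rho}^{+}-\boldsymbol{\rho}^{-})+{\cal C}(\boldsymbol{\eta}^{+}-\boldsymbol{\eta}^{-})=\boldsymbol{b}_{0}+\boldsymbol{e},\qquad\boldsymbol{\rho}^{\pm}\geqslant 0,\ \boldsymbol{\eta}^{\pm}\geqslant 0.
\end{aligned}$$

At an optimum the positive and negative parts of each component are not both nonzero, so the objective equals $\tau\|\boldsymbol{\rho}\|_{\ell_{1}}+\|\boldsymbol{\eta}\|_{\ell_{1}}$. The weight $\tau$ enters only as a relative cost per unit of $\boldsymbol{\rho}$ versus $\boldsymbol{\eta}$.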

The following theorem shows that if the signal is pure noise, and the columns of the Noise Collector are chosen uniformly and independently at random on the unit sphere $\mathbb{S}^{N-1}=\left\{x\in\mathbb{R}^{N},\|x\|_{\ell_{2}}=1\right\}$ , then ${\cal C}\mbox{\boldmath{$ \eta $}}_{\tau}=\mbox{\boldmath{$ e $}}$ for any level of noise, with high probability.

Theorem 1 (No phantom signal): Suppose $\mbox{\boldmath{$ b $}}_{0}=0$ and $\mbox{\boldmath{$ e $}}/\|\mbox{\boldmath{$ e $}}\|_{l_{2}}$ is uniformly distributed on the unit sphere $\mathbb{S}^{N-1}$ . Fix $\beta>1$ and draw $\Sigma=N^{\beta}$ columns for ${\cal C}$ independently from the uniform distribution on $\mathbb{S}^{N-1}$ . For any $\kappa>0$ there are constants $c_{0}=c_{0}(\kappa,\beta)$ and $N_{0}=N_{0}(\kappa,\beta)$ such that, for $\tau=c_{0}\sqrt{\ln N}$ and all $N>N_{0}$ , $\mbox{\boldmath{$ \rho $}}_{\tau}$ , the solution of (4), is zero with probability $1-1/N^{\kappa}$ .

Theorem 1 guarantees a zero false discovery rate in the absence of signals with meaningful information, with high probability. We generalize this result for the case in which the recorded signals carry useful information in the next Theorem, where we show that the support of $\mbox{\boldmath{$ \rho $}}_{\tau}$ is inside the support of $\rho$ .

Theorem 2 (Zero false discoveries): Let $\rho$ be an $M$ -sparse solution of the noiseless system ${\cal A}\mbox{\boldmath{$ \rho $}}=\mbox{\boldmath{$ b $}}_{0}$ . Assume $\kappa$ , $\beta$ , the Noise Collector, the noise, and $\mbox{\boldmath{$ \rho $}}_{\tau}$ are the same as in Theorem 1. In addition, assume that the columns of ${\cal A}$ are incoherent, in the sense that $|\langle\mbox{\boldmath{$ a $}}_{i},\mbox{\boldmath{$ a $}}_{j}\rangle|\leqslant\frac{1}{3M}$ . Then, there are constants $c_{0}=c_{0}(\kappa,\beta)$ and $N_{0}=N_{0}(\kappa,\beta)$ such that, for $\tau=c_{0}\sqrt{\ln N}$ and all $N>N_{0}$ , $\mbox{supp}(\mbox{\boldmath{$ \rho $}}_{\tau})\subseteq\mbox{supp}(\mbox{\boldmath{$ \rho $}})$ with probability $1-1/N^{\kappa}$ .

The incoherence conditions in Theorem 2 are needed to guarantee that the true signal does not create false positives elsewhere. The next Theorem shows that if the noise is not too large, then $\mbox{\boldmath{$ \rho $}}_{\tau}$ and $\rho$ have exactly the same support.

Theorem 3 (Exact support recovery): Keep the same assumptions as in Theorem 2. Suppose the magnitudes of the non-zero entries of $\rho$ are bounded by $\gamma$ . If $\|\mbox{\boldmath{$ e $}}\|_{l_{2}}/\|\mbox{\boldmath{$ b $}}_{0}\|_{l_{2}}\leqslant c_{2}/\sqrt{\ln N}$ , $c_{2}=c_{2}(\kappa,\beta,\gamma,M)$ , then $\mbox{\boldmath{$ \rho $}}_{\tau}$ and $\rho$ have the same support with probability $1-1/N^{\kappa}$ .

Motivation. We are interested in accurately imaging sparse scenes using limited and noisy data. Such imaging problems arise in many areas such as medical imaging [29], structural biology [1], radar [2], and geophysics [24]. In imaging, the $\ell_{1}$-norm minimization method in (2) is often used, e.g., in [19, 22, 16, 28, 12, 6]. This method has the desirable property of super-resolution, that is, the enhancement of fine-scale details of the images using, in this case, prior information about their low-dimensional structure (sparsity). This has been analyzed in different settings by Donoho and Elad [10], Candès and Fernandez-Granda [5], Fannjiang and Liao [13], and Borcea and Kocyigit [3], among others. We want to retain this property in our method when the data is corrupted by additive noise.

However, noise fundamentally limits the quality of the images formed with almost all computational imaging techniques. Specifically, $\ell_{1}$ -norm minimization produces images that are unstable for low SNR due to the ill-conditioning of super-resolution reconstruction schemes. The instability emerges as clutter noise in the images, or grass, that degrades the resolution. Our initial motivation to introduce the Noise Collector matrix ${\cal C}$ was to regularize the matrix ${\cal A}$ and, thus, to suppress the clutter in the images. We proposed in [20] to seek the minimal $\ell_{1}$ -norm solution of the augmented linear system ${\cal A}\mbox{\boldmath$ \rho $}+{\cal C}\mbox{\boldmath{$ \eta $}}=\mbox{\boldmath$ b $}$ . The idea was to choose the columns of ${\cal C}$ almost orthogonal to those of ${\cal A}$ . Indeed, the condition number of $[{\cal A}\,|\,{\cal C}]$ becomes $O(1)$ when $O(N)$ columns of ${\cal C}$ are taken at random. This essentially follows from the bounds on the largest and the smallest nonzero singular values of random matrices, see e.g. Theorem 4.6.1 in [30].

The idea to create a dictionary for noise is not new. For example, the work by Laska et al. [17] considers a specific version of the measurement noise model so $\mbox{\boldmath{$ b $}}={\cal A}\mbox{\boldmath{$ \rho $}}+{\cal C}\mbox{\boldmath{$ e $}}$ , where ${\cal C}$ is a matrix with fewer (orthonormal) columns than rows, and the noise vector $e$ is sparse. ${\cal C}$ represents the basis in which the noise is sparse and it is assumed to be known. Then, they show that it is possible to recover sparse signals and sparse noise exactly using $\ell_{1}$ -norm minimization algorithms. We stress that we do not assume here that the noise is sparse. In our work, the noise is large (SNR can be small) and is evenly distributed across the data, so it cannot be sparsely accommodated.

To suppress the clutter, our theory in [20] required exponentially many columns, so $\Sigma\lesssim e^{N}$ . This seemed to make the noise collector impractical, but the numerical experiments suggested that $O(N)$ columns were enough to obtain excellent results. We address this issue here and explain why the noise collector matrix ${\cal C}$ only needs algebraically many columns. Moreover, to make the absorption of noise less expensive, and thus improve the algorithm in [20], we introduce the weight $\tau$ in (4). Indeed, by weighting the columns of the noise collector matrix ${\cal C}$ with respect to those in the model matrix ${\cal A}$ , the algorithm now produces images with no clutter at all, no matter how much noise is added to the data.

Finally, we want the Noise Collector to be efficient, with almost no extra computational cost with respect to the Lasso problem in (3). To this end, it is constructed using circulant matrices that allow for efficient matrix-vector multiplications using FFTs.

The proofs of Theorems 1, 2, and 3 are given in Section Proofs. We now explain how the Noise Collector works.

The Noise Collector

The construction of the Noise Collector matrix ${\cal C}$ starts with the following three key properties. Firstly, its columns should be sufficiently orthogonal to the columns of ${\cal A}$, so that it does not absorb signals with “meaningful” information. Secondly, the columns of ${\cal C}$ should be uniformly distributed on the unit sphere $\mathbb{S}^{N-1}$, so that a typical noise vector can be approximated well. Thirdly, the number of columns in ${\cal C}$ should grow slower than exponentially with $N$; otherwise the method is impractical. One way to guarantee all three properties is to impose

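$$|\langle\boldsymbol{a}_{i},\boldsymbol{c}_{j}\rangle|<\frac{\alpha}{\sqrt{N}}\ \ \forall i,j,\ \text{ and }\ |\langle\boldsymbol{c}_{i},\boldsymbol{c}_{j}\rangle|<\frac{\alpha}{\sqrt{N}}\ \ \forall i\neq j,\tag{5}$$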

with $\alpha>1$, and fill out ${\cal C}$ drawing $\mbox{\boldmath{$ c $}}_{i}$ at random with rejections until the rejection rate becomes too high. Then, by construction, the columns of ${\cal C}$ are almost orthogonal to the columns of ${\cal A}$, and when the rejection rate becomes too high this implies that we cannot pack more $N$-dimensional unit vectors into ${\cal C}$ and, thus, we can approximate well a typical noise vector. Finally, the Kabatjanskii-Levenstein inequality (see discussion in [25]) implies that the number $\Sigma$ of columns in ${\cal C}$ grows at most polynomially: $\Sigma\leqslant N^{\alpha^{2}}$.
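The rejection procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_noise_collector`, the stopping rule (a fixed number of consecutive rejections), the random stand-in for ${\cal A}$, and the values of `alpha`, `N` and `K` are all hypothetical choices.

```python
import numpy as np


def build_noise_collector(A, alpha=2.5, max_rejections=100, seed=0):
    """Draw unit vectors at random and keep those satisfying the coherence bounds.

    A candidate c is accepted if |<a_i, c>| < alpha/sqrt(N) for every column of A
    and |<c_i, c>| < alpha/sqrt(N) for every previously accepted column.
    Stopping after `max_rejections` consecutive rejections is an assumption that
    stands in for "the rejection rate becomes too high".
    """
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    bound = alpha / np.sqrt(N)
    cols = []
    rejected_in_a_row = 0
    while rejected_in_a_row < max_rejections:
        c = rng.standard_normal(N)
        c /= np.linalg.norm(c)                      # uniform direction on the sphere
        ok = np.max(np.abs(A.T @ c)) < bound        # almost orthogonal to A
        if ok and cols:
            ok = np.max(np.abs(np.array(cols) @ c)) < bound  # and to accepted columns
        if ok:
            cols.append(c)
            rejected_in_a_row = 0
        else:
            rejected_in_a_row += 1
    return np.column_stack(cols) if cols else np.empty((N, 0))


# Hypothetical model matrix with random unit-norm columns.
rng = np.random.default_rng(0)
N, K, alpha = 32, 40, 2.5
A = rng.standard_normal((N, K))
A /= np.linalg.norm(A, axis=0)
C = build_noise_collector(A, alpha=alpha, seed=1)
print("accepted columns:", C.shape[1])
```

Every accepted column satisfies the two bounds by construction, so the returned matrix can be checked directly against them. The number of accepted columns depends on `alpha` and on the stopping rule.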

It is, however, more convenient for the proofs to use a probabilistic version of (5). Suppose that the columns of ${\cal C}$ are drawn at random independently. Then, the dot product of any two random unit vectors is still typically of order $1/\sqrt{N}$, see e.g. [30]. If the number of columns grows polynomially, we only have to sacrifice an asymptotically negligible event where our Noise Collector does not satisfy the three key properties, and the decoherence constraints in (5) are weakened by a logarithmic factor. The next Lemma is proved in Section Proofs.

Lemma 1: Suppose $\Sigma=N^{\beta}$ , $\beta>1$ , vectors $\mbox{\boldmath{$ c $}}_{i}\in\mathbb{S}^{N-1}$ are drawn at random and independently. Then, (i) for any $\kappa>0$ there are constants $c_{0}(\kappa,\beta)$ and $\alpha>1/2$ , such that

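$$\forall i,j\quad|\langle\boldsymbol{a}_{i},\boldsymbol{c}_{j}\rangle|<c_{0}\sqrt{\ln N}/\sqrt{N},\tag{6}$$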

and (ii) for any $\mbox{\boldmath{$ e $}}\in\mathbb{S}^{N-1}$ there exists at least one $\mbox{\boldmath{$ c $}}_{j}$ , so that

$$|\langle\boldsymbol{e},\boldsymbol{c}_{j}\rangle|\geqslant\alpha/\sqrt{N}\tag{7}$$

with probability $1-1/N^{\kappa}$ .
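The two scalings in Lemma 1 can be examined empirically. The sketch below is illustrative only: the random stand-in for ${\cal A}$, the sizes `N` and `K`, and the names `c0_hat` and `alpha_hat` are assumptions, and the printed values are empirical proxies rather than the constants of the Lemma.

```python
import numpy as np


def random_unit_columns(n_rows, n_cols, rng):
    """Columns drawn independently and uniformly on the unit sphere."""
    X = rng.standard_normal((n_rows, n_cols))
    return X / np.linalg.norm(X, axis=0)


rng = np.random.default_rng(1)
N, K, beta = 128, 200, 1.5
Sigma = int(N ** beta)                       # number of Noise Collector columns
A = random_unit_columns(N, K, rng)           # stand-in for the model matrix (assumption)
C = random_unit_columns(N, Sigma, rng)       # Noise Collector columns
e = random_unit_columns(N, 1, rng)[:, 0]     # a unit noise direction

# (i): largest |<a_i, c_j>| measured in units of sqrt(ln N)/sqrt(N)
c0_hat = np.max(np.abs(A.T @ C)) / np.sqrt(np.log(N) / N)
# (ii): largest |<e, c_j>| measured in units of 1/sqrt(N)
alpha_hat = np.max(np.abs(C.T @ e)) * np.sqrt(N)
print(f"c0_hat = {c0_hat:.2f}, alpha_hat = {alpha_hat:.2f}")
```

Repeating the experiment for increasing $N$ gives a rough way to see whether `c0_hat` stays bounded, which is what part (i) asserts with high probability.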

The estimate in (6) implies that any solution of ${\cal C}\mbox{\boldmath{$ \eta $}}=\mbox{\boldmath{$ a $}}_{i}$ satisfies, for any $i\leqslant N$,

$$\|\boldsymbol{\eta}\|_{\ell_{1}}\geqslant\frac{\sqrt{N}}{c_{0}\sqrt{\ln N}},\tag{8}$$

with probability $1-1/N^{\kappa}$. This estimate measures how expensive it is to approximate columns of ${\cal A}$ with the Noise Collector. In turn, the weight $\tau$ should be chosen so that it is expensive to approximate noise using columns of ${\cal A}$. It cannot be taken too large, though, because we may lose the signal. In fact, one can prove that if $\tau\geqslant\sqrt{N}/\alpha$, then $\mbox{\boldmath{$ \rho $}}_{\tau}\equiv 0$ for any $\rho$ and any level of noise. Intuitively, the weight $\tau$ characterizes the rate at which the signal is lost as the noise increases.

To explain the theoretical lower bound $\tau\geqslant c_{0}\sqrt{\ln N}$ we turn to the geometric interpretation of duality in linear programming. Suppose $\tau=\infty$ and there is no signal, $\mbox{\boldmath{$ b $}}_{0}=0$. Then, the solution of (4) satisfies $(\mbox{\boldmath{$ \rho $}}_{\infty},\mbox{\boldmath{$ \eta $}}_{\infty})=(\mbox{\boldmath{$ 0 $}},\mbox{\boldmath{$ \eta $}})$, and there is a dual certificate $\mbox{\boldmath{$ z $}}$ of optimality of $(\mbox{\boldmath{$ 0 $}},\mbox{\boldmath{$ \eta $}})$ for $\tau=\infty$ that satisfies

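$$\langle\boldsymbol{c}_{j},\boldsymbol{z}\rangle=\text{sign}(\eta_{j})\ \text{ if }\eta_{j}\neq 0,\ \text{ and }\ |\langle\boldsymbol{c}_{j},\boldsymbol{z}\rangle|\leqslant 1\ \text{ if }\eta_{j}=0.\tag{9}$$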

Define a nonlinear map

$$\Phi_{\cal C}:\boldsymbol{e}\to\boldsymbol{z},$$

where $\mbox{\boldmath{$ e $}}\in\mathbb{R}^{N}$ is the noise vector in (4), and $\mbox{\boldmath{$ z $}}$ is the dual certificate of optimality of $(\mbox{\boldmath{$ 0 $}},\mbox{\boldmath{$ \eta $}})$ for $\tau=\infty$. For example, if ${\cal C}$ is the identity matrix, then $\Phi_{\cal C}(\mbox{\boldmath{$ e $}})=(\hbox{sign}(e_{1}),\dots,\hbox{sign}(e_{N}))$; see Figure 1-left. If $\mbox{\boldmath{$ z $}}=\Phi_{\cal C}(\mbox{\boldmath{$ e $}})$ remains a dual certificate of optimality of $(\mbox{\boldmath{$ 0 $}},\mbox{\boldmath{$ \eta $}})$ for $\tau=c_{0}\sqrt{\ln N}$, then it implies that support $(\mbox{\boldmath{$ \rho $}}_{\tau})\subset$ support $(\mbox{\boldmath{$ \rho $}})$ for such $\tau$. Thus, Theorem 1 follows once we check that

$$|\langle\boldsymbol{a}_{j},\boldsymbol{z}\rangle|<\tau,\ \text{ for all }j\leqslant K,\tag{10}$$

holds with large probability. Thus, we need to understand the statistics of $\mbox{\boldmath{$ z $}}=\Phi_{\cal C}(\mbox{\boldmath{$ e $}})$, given that $\mbox{\boldmath{$ e $}}/\|\mbox{\boldmath{$ e $}}\|$ is uniformly distributed on $\mathbb{S}^{N-1}$. The columns of the Noise Collector are also uniformly distributed on $\mathbb{S}^{N-1}$, so the vector $\mbox{\boldmath{$ n $}}=\mbox{\boldmath{$ z $}}/\|\mbox{\boldmath{$ z $}}\|_{\ell_{2}}$ has to be uniformly distributed on $\mathbb{S}^{N-1}$ as well. The probability that (10) does not hold can be estimated by the area of the intersection of the unit sphere $\mathbb{S}^{N-1}$ and the $\ell_{1}$ ball of radius $O(\sqrt{N})$ (see Figure 1-right), which can be shown to be small by standard estimates from high-dimensional probability.

By construction, the columns of the combined matrix $[{\cal A}\,|\,{\cal C}]$ are incoherent. This is the key observation that allows us to prove Theorems 2 and 3 using standard techniques, see e.g. [20]. In particular, we automatically have exact recovery by the standard arguments [11] applied to $[{\cal A}\,|\,{\cal C}]$ if the data is noiseless.

Lemma 2 (Exact Recovery): Suppose $\rho$ is an $M$-sparse solution of ${\cal A}\mbox{\boldmath{$ \rho $}}=\mbox{\boldmath{$ b $}}$, and there is no noise, $\mbox{\boldmath{$ e $}}=0$. In addition, assume that the columns of ${\cal A}$ are incoherent: $|\langle\mbox{\boldmath{$ a $}}_{i},\mbox{\boldmath{$ a $}}_{j}\rangle|\leqslant\frac{1}{3M}$. Then, the solution to (4) satisfies $\mbox{\boldmath{$ \rho $}}_{\tau}=\mbox{\boldmath{$ \rho $}}$ for all

$$M<\frac{N}{c_{0}\ln N\,\tau},\tag{11}$$

with probability $1-1/N^{\kappa}$ .

Fast Noise Collector Algorithm

To find the minimizer in (4), we consider a variational approach. We define the function

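$$\begin{aligned}
F(\boldsymbol{\rho},\boldsymbol{\eta},\boldsymbol{z})={}&\lambda\,(\tau\|\boldsymbol{\rho}\|_{\ell_{1}}+\|\boldsymbol{\eta}\|_{\ell_{1}})\\
&+\frac{1}{2}\|{\cal A}\boldsymbol{\rho}+{\cal C}\boldsymbol{\eta}-\boldsymbol{b}\|^{2}_{\ell_{2}}+\langle\boldsymbol{z},\boldsymbol{b}-{\cal A}\boldsymbol{\rho}-{\cal C}\boldsymbol{\eta}\rangle
\end{aligned}\tag{12}$$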

for a weight $\tau=c_{0}\sqrt{\ln N}$ , and determine the solution as

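$$\max_{\boldsymbol{z}}\min_{\boldsymbol{\rho},\boldsymbol{\eta}}F(\boldsymbol{\rho},\boldsymbol{\eta},\boldsymbol{z}).\tag{13}$$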

The key observation is that this variational principle finds the minimum in (4) exactly for all values of the regularization parameter $\lambda$. Hence, the proposed method is fully automated, meaning that it has no tuning parameters. To determine the exact extremum in (13), we use the iterative soft thresholding algorithm GeLMA [21] that works as follows.

For $\beta=1.5$ we use $\tau=0.8\sqrt{\ln N}$ in our numerical experiments. For optimal results, one can calibrate $c_{0}$ to be the smallest constant such that Theorem 1 holds, that is, we see no phantom signals when the algorithm is fed with pure noise.

Pick a value for the regularization parameter $\lambda$, e.g. $\lambda=1$. Choose step sizes $\Delta t_{1}<2/\|[{\cal A}\,|\,{\cal C}]\|^{2}$ and $\Delta t_{2}<\lambda/\|{\cal A}\|$ (choosing two step sizes instead of the smaller one $\Delta t_{1}$ improves the convergence speed). Set $\mbox{\boldmath{$ \rho $}}_{0}=\mbox{\boldmath{$ 0 $}}$, $\mbox{\boldmath{$ \eta $}}_{0}=\mbox{\boldmath{$ 0 $}}$, $\mbox{\boldmath{$ z $}}_{0}=\mbox{\boldmath{$ 0 $}}$, and iterate for $k\geqslant 0$:

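$$\begin{aligned}
\boldsymbol{r}&=\boldsymbol{b}-{\cal A}\,\boldsymbol{\rho}_{k}-{\cal C}\,\boldsymbol{\eta}_{k}\,,\\
\boldsymbol{\rho}_{k+1}&=\mathcal{S}_{\,\tau\,\lambda\Delta t_{1}}\left(\boldsymbol{\rho}_{k}+\Delta t_{1}\,{\cal A}^{*}(\boldsymbol{z}_{k}+\boldsymbol{r})\right)\,,\\
\boldsymbol{\eta}_{k+1}&=\mathcal{S}_{\lambda\Delta t_{1}}\left(\boldsymbol{\eta}_{k}+\Delta t_{1}\,{\cal C}^{*}(\boldsymbol{z}_{k}+\boldsymbol{r})\right)\,,\\
\boldsymbol{z}_{k+1}&=\boldsymbol{z}_{k}+\Delta t_{2}\,\boldsymbol{r}\,,
\end{aligned}\tag{14}$$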

where $\mathcal{S}_{\lambda}(y_{i})=\text{sign}(y_{i})\max\{0,|y_{i}|-\lambda\}$ .
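The iteration (14) can be sketched in a few lines of NumPy for real-valued data. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the dense random test matrices, the fixed iteration count, and the safety factor 0.9 applied to the step-size bounds are all hypothetical choices, and dense matrix products are used for simplicity.

```python
import numpy as np


def soft_threshold(y, thresh):
    """Componentwise S_thresh(y) = sign(y) * max(0, |y| - thresh)."""
    return np.sign(y) * np.maximum(0.0, np.abs(y) - thresh)


def noise_collector_gelma(A, C, b, c0=0.8, lam=1.0, n_iter=2000):
    """Run the iteration (14) on real-valued data and return (rho, eta)."""
    N, K = A.shape
    tau = c0 * np.sqrt(np.log(N))                   # weight tau = c0 * sqrt(ln N)
    # Step sizes just inside the stated bounds; the 0.9 factor is an assumption.
    dt1 = 0.9 * 2.0 / np.linalg.norm(np.hstack([A, C]), 2) ** 2
    dt2 = 0.9 * lam / np.linalg.norm(A, 2)
    rho, eta, z = np.zeros(K), np.zeros(C.shape[1]), np.zeros(N)
    for _ in range(n_iter):
        r = b - A @ rho - C @ eta                   # residual at step k
        rho = soft_threshold(rho + dt1 * (A.T @ (z + r)), tau * lam * dt1)
        eta = soft_threshold(eta + dt1 * (C.T @ (z + r)), lam * dt1)
        z = z + dt2 * r                             # multiplier update uses the same r
    return rho, eta


# Hypothetical test problem: random unit-norm columns, 3-sparse signal, mild noise.
rng = np.random.default_rng(0)
N, K, M = 64, 100, 3
Sigma = int(N ** 1.5)                               # beta = 1.5


def unit_columns(n_rows, n_cols):
    X = rng.standard_normal((n_rows, n_cols))
    return X / np.linalg.norm(X, axis=0)


A, C = unit_columns(N, K), unit_columns(N, Sigma)
rho_true = np.zeros(K)
rho_true[[5, 40, 77]] = [1.0, -1.5, 2.0]
b = A @ rho_true + 0.05 * rng.standard_normal(N)
rho, eta = noise_collector_gelma(A, C, b)
print("relative residual:", np.linalg.norm(b - A @ rho - C @ eta) / np.linalg.norm(b))
print("indices of the largest |rho| entries:", np.argsort(-np.abs(rho))[:M])
```

Both updates within one pass use the residual computed from $(\boldsymbol{\rho}_{k},\boldsymbol{\eta}_{k})$ and the multiplier $\boldsymbol{z}_{k}$, matching the order of the four lines in (14). For complex data, `A.T` and `C.T` would be replaced by conjugate transposes and the sign in the thresholding by the complex phase.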

The Noise Collector matrix ${\cal C}$ is computed by drawing $N^{\beta-1}$ normally distributed $N$ -dimensional vectors, normalized to unit length. These are the generating vectors of the Noise Collector. From each of them a circulant $N\times N$ matrix ${\cal C}_{i}$ , $i=1,\ldots,N^{\beta-1}$ , is constructed. The Noise Collector matrix is obtained by concatenation, so ${\cal C}=\left[{\cal C}_{1}\left|{\cal C}_{2}\left|\ldots\left|{\cal C}_{N^{\beta-1}}\right.\right.\right.\right]$ . Exploiting the circulant structure of the matrices ${\cal C}_{i}$ , we perform the matrix vector multiplications ${\cal C}\mbox{\boldmath{$ \eta $}}_{k}$ and ${\cal C}^{*}(\mbox{\boldmath{$ z $}}_{k}+\mbox{\boldmath{$ r $}})$ in (14) using the FFT [15]. This makes the complexity associated to the Noise Collector $O(N^{\beta}\log(N))$ . Note that only the $N^{\beta-1}$ generating vectors are stored, and not the entire $N\times N^{\beta}$ Noise Collector matrix. In practice, we use $\beta\approx 1.5$ which makes the cost of using the Noise Collector negligible, as typically $K\gg N^{\beta-1}$ .
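The circulant construction can be sketched as follows, assuming real-valued generating vectors and the convention that column $k$ of each block is the generating vector cyclically shifted by $k$ positions. The function names are hypothetical, and the dense matrix is built only to check the FFT-based products on a small example.

```python
import numpy as np


def make_generators(N, beta, rng):
    """Draw N^(beta-1) Gaussian vectors and normalize them to unit length."""
    m = max(1, int(round(N ** (beta - 1))))
    G = rng.standard_normal((m, N))
    return G / np.linalg.norm(G, axis=1, keepdims=True)


def nc_matvec(G, eta):
    """Compute C @ eta for C = [C_1 | ... | C_m] without forming C."""
    m, N = G.shape
    G_hat = np.fft.fft(G, axis=1)
    E_hat = np.fft.fft(eta.reshape(m, N), axis=1)
    # Each block acts by circular convolution; the block outputs are summed.
    return np.fft.ifft((G_hat * E_hat).sum(axis=0)).real


def nc_rmatvec(G, v):
    """Compute C.T @ v block by block (circular correlation with each generator)."""
    G_hat = np.fft.fft(G, axis=1)
    V_hat = np.fft.fft(v)
    return np.fft.ifft(np.conj(G_hat) * V_hat[None, :], axis=1).real.ravel()


def nc_dense(G):
    """Explicit N x (m*N) matrix, for verification on small sizes only."""
    N = G.shape[1]
    blocks = [np.column_stack([np.roll(g, k) for k in range(N)]) for g in G]
    return np.hstack(blocks)


rng = np.random.default_rng(0)
N, beta = 16, 1.5
G = make_generators(N, beta, rng)            # only these m vectors need to be stored
C_dense = nc_dense(G)
eta = rng.standard_normal(G.shape[0] * N)
v = rng.standard_normal(N)
print(np.allclose(nc_matvec(G, eta), C_dense @ eta))
print(np.allclose(nc_rmatvec(G, v), C_dense.T @ v))
```

Each product costs one FFT per generating vector, which is where the $O(N^{\beta}\log(N))$ count comes from. In an iterative solver the transforms of the generators could be precomputed once rather than recomputed in every call.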

Application to imaging

We consider passive array imaging of point sources. The problem consists in determining the positions $\vec{\mbox{\boldmath{$ z $}}}_{j}$ and the complex amplitudes $\alpha_{j}$, $j=1,\dots,M$, of a few point sources from measurements of polychromatic signals on an array of receivers; see Figure 2. (We chose to work with real numbers in the previous sections for ease of presentation, but the results also hold with complex numbers.) The imaging system is characterized by the array aperture $a$, the distance $L$ to the sources, the bandwidth $B$ and the central wavelength $\lambda_{0}$.

The sources are located inside an image window IW, which is discretized with a uniform grid of points $\vec{\mbox{\boldmath{$ y $}}}_{k}$ , $k=1,\ldots,K$ . The unknown is the source vector $\mbox{\boldmath{$ \rho $}}=[\rho_{1},\ldots,\rho_{K}]^{\intercal}\in\mathbb{C}^{K}$ , whose components $\rho_{k}$ correspond to the complex amplitudes of the $M$ sources at the grid points $\vec{\mbox{\boldmath{$ y $}}}_{k}$ , $k=1,\ldots,K$ , with $K\gg M$ . For the true source vector we have $\rho_{k}=\alpha_{j}$ if $\vec{\mbox{\boldmath{$ y $}}}_{k}=\vec{\mbox{\boldmath{$ z $}}}_{j}$ for some $j=1,\ldots,M$ , while $\rho_{k}=0$ otherwise.

Denoting by $G(\vec{\mbox{\boldmath{$ x $}}},\vec{\mbox{\boldmath{$ y $}}};\omega)$ the Green’s function for the propagation of a signal of angular frequency $\omega$ from point $\vec{\mbox{\boldmath{$ y $}}}$ to point $\vec{\mbox{\boldmath{$ x $}}}$ , we define the single-frequency Green’s function vector that connects a point $\vec{\mbox{\boldmath{$ y $}}}$ in the IW with all points $\vec{\mbox{\boldmath{$ x $}}}_{r}$ , $r=1,\ldots,N$ , on the array as

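$\mbox{\boldmath{$ g $}}(\vec{\mbox{\boldmath{$ y $}}};\omega)=[G(\vec{\mbox{\boldmath{$ x $}}}_{1},\vec{\mbox{\boldmath{$ y $}}};\omega),G(\vec{\mbox{\boldmath{$ x $}}}_{2},\vec{\mbox{\boldmath{$ y $}}};\omega),\dots,G(\vec{\mbox{\boldmath{$ x $}}}_{N},\vec{\mbox{\boldmath{$ y $}}};\omega)]^{\intercal}\in\mathbb{C}^{N}.$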

In a homogeneous medium in three dimensions, $G(\vec{\mbox{\boldmath{$ x $}}},\vec{\mbox{\boldmath{$ y $}}};\omega)=\frac{\exp\{\mathrm{i}\omega|\vec{\mbox{\boldmath{$ x $}}}-\vec{\mbox{\boldmath{$ y $}}}|/c_{0}\}}{4\pi|\vec{\mbox{\boldmath{$ x $}}}-\vec{\mbox{\boldmath{$ y $}}}|}$ .

The data for the imaging problem are the signals

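$b(\vec{\mbox{\boldmath{$ x $}}}_{r},\omega_{l})=\sum_{j=1}^{M}\alpha_{j}\,G(\vec{\mbox{\boldmath{$ x $}}}_{r},\vec{\mbox{\boldmath{$ z $}}}_{j};\omega_{l})$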

recorded at receiver locations $\vec{\mbox{\boldmath{$ x $}}}_{r}$ , $r=1,\ldots,N$ , at frequencies $\omega_{l}$ , $l=1,\dots,S$ . These data are stacked in a column vector

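$\mbox{\boldmath{$ b $}}=[\mbox{\boldmath{$ b $}}(\omega_{1})^{\intercal},\mbox{\boldmath{$ b $}}(\omega_{2})^{\intercal},\dots,\mbox{\boldmath{$ b $}}(\omega_{S})^{\intercal}]^{\intercal}\in\mathbb{C}^{(N\cdot S)}$ ,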

with $\mbox{\boldmath{$ b $}}(\omega_{l})=[b(\vec{\mbox{\boldmath{$ x $}}}_{1},\omega_{l}),b(\vec{\mbox{\boldmath{$ x $}}}_{2},\omega_{l}),\dots,b(\vec{\mbox{\boldmath{$ x $}}}_{N},\omega_{l})]^{\intercal}\in\mathbb{C}^{N}$ . Then, ${\cal A}\,\mbox{\boldmath$ \rho $}=\mbox{\boldmath$ b $}$ , with ${\cal A}$ the $(N\cdot S)\times K$ measurement matrix whose columns $\mbox{\boldmath{$ a $}}_{k}$ are the multiple-frequency Green’s function vectors

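$\mbox{\boldmath{$ a $}}_{k}=[G(\vec{\mbox{\boldmath{$ x $}}}_{1},\vec{\mbox{\boldmath{$ y $}}}_{k};\omega_{1}),\dots,G(\vec{\mbox{\boldmath{$ x $}}}_{N},\vec{\mbox{\boldmath{$ y $}}}_{k};\omega_{1}),\dots,G(\vec{\mbox{\boldmath{$ x $}}}_{1},\vec{\mbox{\boldmath{$ y $}}}_{k};\omega_{S}),\dots,G(\vec{\mbox{\boldmath{$ x $}}}_{N},\vec{\mbox{\boldmath{$ y $}}}_{k};\omega_{S})]^{\intercal}\in\mathbb{C}^{(N\cdot S)}$ ,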

normalized to have length one. The system ${\cal A}\,\mbox{\boldmath$ \rho $}=\mbox{\boldmath$ b $}$ relates the unknown vector $\mbox{\boldmath$ \rho $}\in\mathbb{C}^{K}$ to the data vector $\mbox{\boldmath$ b $}\in\mathbb{C}^{(N\cdot S)}$ .
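As an illustration of how the measurement matrix can be assembled, the sketch below evaluates the homogeneous three-dimensional Green's function between every receiver and every grid point, stacks the single-frequency blocks over the frequencies, and normalizes each column. The propagation speed `C0`, the array shapes, and the function names are assumptions made for the example.

```python
import numpy as np

C0 = 3e8  # propagation speed in m/s (assumed value)

def green(x, y, omega):
    """Homogeneous 3D Green's function exp(i*omega*|x-y|/c0) / (4*pi*|x-y|)."""
    d = np.linalg.norm(x - y, axis=-1)
    return np.exp(1j * omega * d / C0) / (4 * np.pi * d)

def measurement_matrix(receivers, grid, omegas):
    """Build the (N*S) x K matrix with unit-norm multiple-frequency columns.

    receivers: (N, 3) positions, grid: (K, 3) positions, omegas: (S,) angular frequencies.
    """
    # One N x K block per frequency; broadcasting pairs every receiver with every grid point.
    blocks = [green(receivers[:, None, :], grid[None, :, :], w) for w in omegas]
    A = np.vstack(blocks)  # rows ordered by frequency, then by receiver
    return A / np.linalg.norm(A, axis=0, keepdims=True)
```

The row ordering (frequency first, receiver second) matches the way the data vector $\mbox{\boldmath{$ b $}}$ is stacked from the single-frequency vectors $\mbox{\boldmath{$ b $}}(\omega_{l})$.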

Next, we illustrate the performance of the Noise Collector in this imaging setup. Its most important features are that (i) no calibration with respect to the level of noise is necessary, (ii) the support is recovered exactly for relatively large levels of noise (i.e., $\|\mbox{\boldmath{$ e $}}\|_{l_{2}}/\|\mbox{\boldmath{$ b $}}_{0}\|_{l_{2}}\leqslant c_{2}/\sqrt{\ln N}$ ), and (iii) the false discovery rate is zero for all levels of noise, with high probability.

We consider a high frequency microwave imaging regime with central frequency $f_{0}=60$ GHz corresponding to $\lambda_{0}=5$ mm. We make measurements for $S=25$ equally spaced frequencies spanning a bandwidth $B=20$ GHz. The array has $N=25$ receivers and an aperture $a=50$ cm. The distance from the array to the center of the imaging window is $L=50$ cm. Then, the resolution is $\lambda_{0}L/a=5$ mm in the cross-range (direction parallel to the array) and $c_{0}/B=15$ mm in range (direction of propagation). These parameters are typical in microwave scanning technology [18].
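The quoted wavelength and resolution values follow from the stated parameters by direct arithmetic. The short check below assumes a propagation speed of $c_{0}=3\times 10^{8}$ m/s; the variable names are illustrative.

```python
c0 = 3e8    # propagation speed in m/s (assumed)
f0 = 60e9   # central frequency in Hz
B = 20e9    # bandwidth in Hz
a = 0.5     # array aperture in m
L = 0.5     # distance to the imaging window in m

lambda0 = c0 / f0              # central wavelength in m
cross_range = lambda0 * L / a  # cross-range resolution in m
range_res = c0 / B             # range resolution in m

print(lambda0 * 1e3, cross_range * 1e3, range_res * 1e3)  # values in mm
```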

We seek to image a source vector with sparsity $M=12$ ; see the left plot in Fig. 3. The size of the imaging window is 20 cm $\times$ 60 cm and the pixel spacing is 5 mm $\times$ 15 mm. The number of unknowns is, therefore, $K=1681$ and the number of data is $NS=625$ . The size of the noise collector is taken to be $\Sigma=10^{4}$ , so $\beta\approx 1.5$ . When the data is noiseless, we obtain exact recovery as expected; see the right plot in Fig. 3.
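The counts in this paragraph can be checked by hand. The worked arithmetic below assumes that the grid includes both edges of the imaging window in each direction, and that $\beta$ is read off from $\Sigma=(NS)^{\beta}$ , with the data dimension $NS$ playing the role of $N$ ; both are assumptions made for illustration.

```latex
K = \left(\frac{200\ \mathrm{mm}}{5\ \mathrm{mm}}+1\right)\left(\frac{600\ \mathrm{mm}}{15\ \mathrm{mm}}+1\right) = 41\cdot 41 = 1681,
\qquad
NS = 25\cdot 25 = 625,
\qquad
\beta = \frac{\ln\Sigma}{\ln(NS)} = \frac{\ln 10^{4}}{\ln 625} \approx \frac{9.21}{6.44} \approx 1.43 .
```

Under these assumptions the exponent comes out slightly below the nominal value $\beta\approx 1.5$ .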

In Fig. 4, we display the imaging results, with and without a Noise Collector, when the data is corrupted by additive noise. The SNR $=1$ , so the $\ell_{2}$ -norms of the signals and the noise are equal. In the left plot, we show the recovered image using $\ell_{1}$ -norm minimization without a Noise Collector. There is a lot of grass in this image, with many non-zero values outside the true support. When a Noise Collector is used, the level of the grass is reduced and the image improves; see the second plot from the left. Still, there are several false discoveries because we use $\tau=1$ in (14).
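A noisy data vector at a prescribed SNR can be generated as in the sketch below. It assumes that the SNR is the ratio $\|\mbox{\boldmath{$ b $}}_{0}\|_{\ell_{2}}/\|\mbox{\boldmath{$ e $}}\|_{\ell_{2}}$ , consistent with the statement that SNR $=1$ means equal norms, and that the noise is complex Gaussian; the function name `add_noise` is illustrative.

```python
import numpy as np

def add_noise(b0, snr, rng):
    """Return (b0 + e, e) with e rescaled so that ||b0||_2 / ||e||_2 equals snr."""
    # Complex Gaussian noise is an assumption; any noise direction could be used instead.
    e = rng.standard_normal(b0.shape) + 1j * rng.standard_normal(b0.shape)
    e *= np.linalg.norm(b0) / (snr * np.linalg.norm(e))
    return b0 + e, e
```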

In the third column from the left of Fig. 4 we show the image obtained with a weight $\tau=0.8\sqrt{\ln 625}\approx 2$ in (14). With this weight, there are no false discoveries and the recovered support is exact. This simplifies the imaging problem dramatically, as we can now restrict the inverse problem to the true support just obtained, and then solve an overdetermined linear system using a classical $\ell_{2}$ approach. The results are shown in the right column of Fig. 4. Note that this second step largely compensates for the signal that was lost in the first step due to the high level of noise.
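The second step can be sketched as an ordinary least-squares solve restricted to the recovered support. The tolerance `tol` used to decide which entries are non-zero, and the function name `debias_on_support`, are assumptions made for the example.

```python
import numpy as np

def debias_on_support(A, b, rho_first, tol=1e-8):
    """Least-squares refit of b on the columns of A selected by the first-step support."""
    support = np.flatnonzero(np.abs(rho_first) > tol)
    rho = np.zeros(A.shape[1], dtype=np.result_type(A, b))
    if support.size:
        # Overdetermined system: many more data than unknowns on the support.
        coef, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        rho[support] = coef
    return rho, support

tau = 0.8 * np.sqrt(np.log(625))  # weight used for the third column, about 2.03
```

Because the first step shrinks the amplitudes through soft thresholding, the refit on the support is what restores their magnitude.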

In Figure 5 we illustrate the performance of the Noise Collector for different sparsity levels $M$ and SNR values. Success in recovering the true support of the unknown corresponds to the value $1$ (yellow) and failure to the value $0$ (blue). The small phase transition zone (green) contains intermediate values. These results are obtained by averaging over 5 realizations of noise.

Remark 1: We considered passive array imaging for ease of presentation. The same results hold for active array imaging with or without multiple scattering; see [7] for the detailed analytical setup.

Remark 2: We have considered a microwave imaging regime. Similar results can be obtained in other regimes.

Proofs

Proof of Lemma 1: Denote the event

[TABLE]

By independence, $\mathbb{P}\left(|\langle\mbox{\boldmath{$ a $}}_{i},\mbox{\boldmath{$ c $}}_{j}\rangle|\geqslant t/\sqrt{N}\right)\leqslant 2\exp(-t^{2}/2)$ for any $i$ and $j$ . Thus, $\mathbb{P}\left(\Omega_{t}\right)\leqslant 2N\Sigma\exp(-t^{2}/2)$ . Choosing $t=c_{0}\sqrt{\ln N}$ for sufficiently large $c_{0}$ , we get

[TABLE]

where $c_{0}^{2}>2(\beta+\kappa+1)$ and $N\geqslant N_{0}$ . Hence, (6) holds with large probability $1-N^{-\kappa}$ .
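For the reader's convenience, the role of the condition on $c_{0}$ can be made explicit. The chain below is a sketch that assumes the Noise Collector has $\Sigma=N^{\beta}$ columns and substitutes $t=c_{0}\sqrt{\ln N}$ into the union bound $\mathbb{P}(\Omega_{t})\leqslant 2N\Sigma\exp(-t^{2}/2)$ :

```latex
\mathbb{P}(\Omega_{t})
  \leqslant 2N\Sigma\, e^{-t^{2}/2}
  = 2N^{1+\beta}\, e^{-\frac{c_{0}^{2}}{2}\ln N}
  = 2N^{1+\beta-c_{0}^{2}/2}
  \leqslant N^{-\kappa}
  \quad\text{as soon as}\quad N^{\,c_{0}^{2}/2-(\beta+\kappa+1)}\geqslant 2 .
```

Since $c_{0}^{2}>2(\beta+\kappa+1)$ makes the last exponent positive, the final inequality holds for all $N$ beyond some threshold $N_{0}$ .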

Next, we estimate the probability that (7) does not hold. Suppose there is a direction $\mbox{\boldmath{$ b $}}\in\mathbb{S}^{N-1}$ such that

[TABLE]

holds for all $j$ . Let $V_{k}(\mbox{\boldmath{$ c $}}_{i_{1}},\dots,\mbox{\boldmath{$ c $}}_{i_{k}})$ be the $k$ -dimensional volume of the parallelepiped spanned by $\mbox{\boldmath{$ c $}}_{i_{1}}$ , $\dots$ , $\mbox{\boldmath{$ c $}}_{i_{k}}$ . Note that $V_{k}$ is equal to $V_{k-1}$ times its height. Then, if (18) holds,

[TABLE]

for any choice of $N$ columns $\mbox{\boldmath{$ c $}}_{i_{j}}$ from the noise collector ${\cal C}$ . If we fix the indices ${i_{1}}$ , $\dots$ , ${i_{N}}$ then, due to rotational invariance, the probability of the event (19) equals the probability of event $|\langle\mbox{\boldmath{$ c $}}_{1},\mbox{\boldmath{$ e $}}_{1}\rangle|\leqslant 2\alpha/\sqrt{N}$ . Using

[TABLE]

and that we can find $N^{\beta-1}$ sets of distinct indices ${i_{1}}$ , $\dots$ , ${i_{N}}$ , we conclude that

[TABLE]

Choosing $\alpha$ sufficiently small, i.e. $\alpha<\sqrt{2\pi}/4\approx 0.63$ , and $N$ sufficiently large, we obtain the result. $\Box$

Proof of Theorem 1: In order to check (10), we assume that both $\mbox{\boldmath{$ c $}}_{i}$ and $-\mbox{\boldmath{$ c $}}_{i}$ are in ${\cal C}$ , because it is more geometrically intuitive to work with the convex hull

[TABLE]

This implies that we may also assume $\eta$ in (4) has non-negative coefficients, and $\|\mbox{\boldmath{$ \eta $}}\|_{l_{1}}=\min\{\lambda>0:\ \mbox{\boldmath{$ e $}}\in\lambda H\}$ . Thus, $\|\mbox{\boldmath{$ \eta $}}\|_{l_{1}}$ is a norm of $e$ with respect to ${\cal C}$ , and we can set $\|\mbox{\boldmath{$ e $}}\|_{{\cal C}}:=\|\mbox{\boldmath{$ \eta $}}\|_{l_{1}}$ . This norm is called atomic in [8]. Suppose $\Lambda$ is the support of $\eta$ . Its typical size is $|\Lambda|=N$ . Then, the simplex

[TABLE]

has the unique normal vector $n$ , which is collinear to our dual certificate $z$ , because

[TABLE]

The estimate (7) implies that the convex hull $H$ contains an $l_{2}$ -ball of radius $\alpha/\sqrt{N}$ . Therefore, $\|\mbox{\boldmath{$ z $}}\|_{l_{2}}\leqslant\sqrt{N}/\alpha$ with large probability.

By construction, the distribution of $\Phi_{\cal C}(\mbox{\boldmath{$ e $}})$ is rotationally invariant with respect to the probability measure induced by all $\mbox{\boldmath{$ c $}}_{i}$ and $e$ . Thus $\mbox{\boldmath{$ n $}}=\mbox{\boldmath{$ z $}}/\|\mbox{\boldmath{$ z $}}\|_{l_{2}}$ is also uniformly distributed on $\mathbb{S}^{N-1}$ , and

[TABLE]

for all $i=1,\dots,K$ ; see e.g. [30]. Therefore, we can bound the probability that (10) does not hold:

[TABLE]

for large $N$ and appropriately chosen $c_{0}=\sqrt{2(\kappa+\beta)}/\alpha$ . Hence, (10) holds with large probability $1-N^{-\kappa}$ . $\Box$

Proof of Theorem 2: If the columns of ${\cal A}$ are orthogonal, our previous arguments can be modified to verify Theorem 2. Indeed, suppose $V$ is the span of the column vectors $\mbox{\boldmath{$ a $}}_{j}$ , with $j$ in the support of $\rho$ . Say, $V$ is spanned by $\mbox{\boldmath{$ a $}}_{1}$ , $\dots$ , $\mbox{\boldmath{$ a $}}_{M}$ . Let $W=V^{\perp}$ be the orthogonal complement to $V$ . Then, the orthogonal projection of the signal onto $W$ vanishes, $\mbox{\boldmath{$ \rho $}}^{w}=0$ . By the concentration of measure (see e.g. [30]), the projection of the noise $\mbox{\boldmath{$ e $}}^{w}$ is uniformly distributed on the unit sphere $\mathbb{S}^{N-1-M}$ with large probability. Applying the previous arguments to $\mbox{\boldmath{$ z $}}^{w}$ , the projection of $z$ on $W$ , we conclude that the projection $\mbox{\boldmath{$ \rho $}}^{w}_{\tau}=0$ . Therefore, $\mbox{supp}(\mbox{\boldmath{$ \rho $}}_{\tau})\subseteq\mbox{supp}(\mbox{\boldmath{$ \rho $}})$ with large probability.

For general ${\cal A}$ consider the orthogonal decomposition $\mbox{\boldmath{$ a $}}_{i}=\mbox{\boldmath{$ a $}}_{i}^{v}+\mbox{\boldmath{$ a $}}_{i}^{w}$ for all $i\geqslant M+1$ . As before, we can choose $\tau=c_{0}\sqrt{\ln N}$ so that $|\langle\mbox{\boldmath{$ a $}}_{i}^{w},\mbox{\boldmath{$ z $}}\rangle|<\tau/2$ with large probability. It remains to demonstrate that $|\langle\mbox{\boldmath{$ a $}}_{i}^{v},\mbox{\boldmath{$ z $}}\rangle|\leqslant\tau/2$ . Fix any $i\geqslant M+1$ . Suppose $\mbox{\boldmath{$ a $}}_{i}^{v}=\sum_{k=1}^{M}\alpha_{k}\mbox{\boldmath{$ a $}}_{k}$ , and $|\alpha_{j}|=\max_{k\leqslant M}|\alpha_{k}|=\|\mbox{\boldmath{$ \alpha $}}\|_{l_{\infty}}$ . Thus,

[TABLE]

Then, $\|\mbox{\boldmath{$ \alpha $}}\|_{l_{\infty}}\leqslant 1/(2M)$ , so $\|\mbox{\boldmath{$ \alpha $}}\|_{l_{1}}\leqslant M\|\mbox{\boldmath{$ \alpha $}}\|_{l_{\infty}}\leqslant 1/2$ . Hence,

[TABLE]

Proof of Theorem 3: It suffices to prove the result for 1-sparse $\rho$ , say, $\mbox{\boldmath{$ \rho $}}=(1,0,\dots,0)$ . We will demonstrate that the solution to the minimization problem

[TABLE]

with $\tau=c_{0}\sqrt{\ln N}$ , satisfies $\rho_{1}>1/2$ if $\delta<c_{2}/\sqrt{\ln N}$ , $c_{2}=\alpha/(5c_{0})$ . This implies $\mbox{supp}(\mbox{\boldmath$ \rho $}_{\tau})=\mbox{supp}(\mbox{\boldmath$ \rho $})$ .

Suppose $\mbox{\boldmath{$ \eta $}}_{\mbox{\boldmath{$ e $}}}$ , $\mbox{\boldmath{$ \eta $}}_{\mbox{\boldmath{$ a $}}_{1}}$ and $\mbox{\boldmath{$ \eta $}}_{t}$ are the optimal solutions of

[TABLE]

with right-hand sides $\mbox{\boldmath{$ b $}}=\mbox{\boldmath{$ e $}}$ , $\mbox{\boldmath{$ b $}}=\mbox{\boldmath{$ a $}}_{1}$ , and $\mbox{\boldmath{$ b $}}=\mbox{\boldmath{$ e $}}+t\mbox{\boldmath{$ a $}}_{1}$ , respectively. Since ${\cal C}\left(\mbox{\boldmath{$ \eta $}}_{t}-\mbox{\boldmath{$ \eta $}}_{\mbox{\boldmath{$ e $}}}\right)=t\mbox{\boldmath{$ a $}}_{1}$ , we have

[TABLE]

and, therefore,

[TABLE]

From (7) and (8), we have

[TABLE]

respectively. Suppose $\delta\leqslant c_{2}/\sqrt{\ln N}$ , $c_{2}=\alpha/(5c_{0})$ . Then, for any $t\geqslant 1/2$ , and for $N$ large enough,

[TABLE]

Using (25) with $t=1-\rho_{1}$ , we conclude that

[TABLE]

for all $\rho_{1}\leqslant 1/2$ . This implies (23). $\Box$

Acknowledgements The work of M. Moscoso was partially supported by Spanish grant MICINN FIS2016-77892-R. The work of A. Novikov was partially supported by NSF grants DMS-1515187, DMS-1813943. The work of G. Papanicolaou was partially supported by AFOSR FA9550-18-1-0519. The work of C. Tsogka was partially supported by AFOSR FA9550-17-1-0238 and FA9550-18-1-0519. We thank Marguerite Novikov for drawing Figure 1.

Bibliography

[1] M. AlQuraishi and H. H. McAdams, Direct inference of protein-DNA interactions using compressed sensing methods, Proc. Natl. Acad. Sci. U.S.A. 108, 14819–14824 (2011).
[2] R. Baraniuk and P. Steeghs, Compressive Radar Imaging, in 2007 IEEE Radar Conference, Apr. 2007, 128–133.
[3] L. Borcea and I. Kocyigit, Resolution analysis of imaging with $\ell_{1}$ optimization, SIAM J. Imaging Sci. 8, 3015–3050 (2015).
[4] E. J. Candès and T. Tao, Decoding by linear programming, IEEE Trans. Inf. Theory 51, 4203–4215 (2005).
[5] E. J. Candès and C. Fernandez-Granda, Towards a mathematical theory of super-resolution, Comm. Pure Appl. Math. 67, 906–956 (2014).
[6] A. Chai, M. Moscoso and G. Papanicolaou, Robust imaging of localized scatterers using the singular value decomposition and $\ell_{1}$ optimization, Inverse Problems 29, 025016 (2013).
[7] A. Chai, M. Moscoso and G. Papanicolaou, Imaging Strong Localized Scatterers with Sparsity Promoting Optimization, SIAM J. Imaging Sci. 7, 1358–1387 (2014).
[8] V. Chandrasekaran, B. Recht, P. A. Parrilo and A. S. Willsky, The convex geometry of linear inverse problems, Found. Comput. Math. 12, 805–849 (2012).