Recovery of Structured Signals From Corrupted Non-Linear Measurements

Zhongxing Sun; Wei Cui; and Yulong Liu

arXiv:1901.08349·cs.IT·January 25, 2019

TL;DR

This paper proposes an extended Lasso method to recover structured signals from a limited number of corrupted non-linear measurements, providing theoretical conditions for successful reconstruction of both signal and corruption.

Contribution

Introduction of an extended Lasso approach for disentangling signals and corruption in non-linear measurement models with theoretical recovery guarantees.

Findings

01 Successful recovery conditions established

02 Extended Lasso effectively separates signal and corruption

03 Applicable to various structured signal models

Abstract

This paper studies the problem of recovering a structured signal from a relatively small number of corrupted non-linear measurements. Assuming that signal and corruption are contained in some structure-promoted set, we suggest an extended Lasso to disentangle signal and corruption. We also provide conditions under which this recovery procedure can successfully reconstruct both signal and corruption.

Equations

$$\bm{y}=\bm{\Phi}\bm{x}^{\star}+\bm{n},$$

$$\min_{\bm{x}}\ \|\bm{y}-\bm{\Phi}\bm{x}\|_{2},\quad\text{s.t.}\quad\bm{x}\in\SS.$$

$$y_{i}=f_{i}(\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle),\quad i=1,\dots,m,$$

$$y_{i}=f_{i}(\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle)+\sqrt{m}\,v_{i}^{\star},\quad i=1,\dots,m.$$

$$\min_{\bm{x},\bm{v}}\ \|\bm{y}-\bm{\Phi}\bm{x}-\sqrt{m}\bm{v}\|_{2},\quad\text{s.t.}\quad(\bm{x},\bm{v})\in\mathcal{T}.$$

$$\mathcal{D}(\SS,\bm{x})=\{t\bm{u}:\ t\geq 0,\ \bm{u}\in\SS-\bm{x}\}.$$

$$\omega(\SS):=\operatorname{\mathbb{E}}\sup_{\bm{x}\in\SS}\left\langle\bm{g},\bm{x}\right\rangle,\quad\text{where }\bm{g}\sim\mathcal{N}(0,\bm{I}_{n}),$$

$$\gamma(\SS):=\operatorname{\mathbb{E}}\sup_{\bm{x}\in\SS}\left|\left\langle\bm{g},\bm{x}\right\rangle\right|,\quad\text{where }\bm{g}\sim\mathcal{N}(0,\bm{I}_{n}).$$

$$\big(\omega(\SS)+\|\bm{y}\|_{2}\big)/3\leq\gamma(\SS)\leq 2\big(\omega(\SS)+\|\bm{y}\|_{2}\big)\quad\forall\,\bm{y}\in\SS.$$

$$\omega_{t}(\SS):=\operatorname{\mathbb{E}}\sup_{\bm{x}\in\SS\cap t\mathbb{B}_{2}^{n}}\left\langle\bm{g},\bm{x}\right\rangle,\quad\text{where }\bm{g}\sim\mathcal{N}(0,\bm{I}_{n}).$$

$$\|X\|_{\psi_{2}}=\inf\{t>0:\ \operatorname{\mathbb{E}}\exp(X^{2}/t^{2})\leq 2\}$$

$$\|\bm{x}\|_{\psi_{2}}:=\sup_{\bm{y}\in\mathbb{S}^{n-1}}\big\|\left\langle\bm{x},\bm{y}\right\rangle\big\|_{\psi_{2}}.$$

$$\sup_{(\bm{a},\bm{b})\in\mathcal{T}}\left|\,\|\bm{A}\bm{a}+\sqrt{m}\bm{b}\|_{2}-\sqrt{m}\sqrt{\|\bm{a}\|_{2}^{2}+\|\bm{b}\|_{2}^{2}}\,\right|\leq CK^{2}\big[\gamma(\mathcal{T})+s\cdot\operatorname{rad}(\mathcal{T})\big]$$

$$\inf_{(\bm{a},\bm{b})\in\mathcal{T}\cap\mathbb{S}^{n+m-1}}\|\bm{A}\bm{a}+\sqrt{m}\bm{b}\|_{2}\geq\sqrt{m}-CK^{2}\gamma(\mathcal{T}\cap\mathbb{S}^{n+m-1})$$

$$\inf_{(\bm{a},\bm{b})\in\mathcal{T}\cap t\mathbb{S}^{n+m-1}}\|\bm{A}\bm{a}+\sqrt{m}\bm{b}\|_{2}\geq t\sqrt{m}-CK^{2}\gamma(\mathcal{T}\cap t\mathbb{S}^{n+m-1})$$

Mean term: $\mu:=\operatorname{\mathbb{E}}[f(g)g]$

Variance term: $\sigma^{2}:=\operatorname{\mathbb{E}}[(f(g)-\mu g)^{2}]$

$$m\geq C\cdot\omega_{1}(\mathcal{D})^{2},$$

$$\sqrt{\|\hat{\bm{x}}-\mu\bm{x}^{\star}\|_{2}^{2}+\|\hat{\bm{v}}-\bm{v}^{\star}\|_{2}^{2}}\leq\frac{C}{\sqrt{m}}\big(\omega_{1}(\mathcal{D})(\sigma+\psi+\mu)+s\sigma\big)$$

$$m\geq C\cdot\omega_{t}(\mathcal{K})^{2}/t^{2},$$

$$\sqrt{\|\hat{\bm{x}}-\mu\bm{x}^{\star}\|_{2}^{2}+\|\hat{\bm{v}}-\bm{v}^{\star}\|_{2}^{2}}\leq t+\frac{C}{\sqrt{m}}\left(\frac{\omega_{t}(\mathcal{K})(\sigma+\psi+\mu)}{t}+s\sigma\right)$$

$$\sup_{(\bm{a},\bm{b})\in\mathcal{K}^{t}}\left\langle\bm{\Phi}\bm{a}+\sqrt{m}\bm{b},\bm{z}\right\rangle\leq C\sqrt{m}\big[\omega(\mathcal{K}^{t})(\sigma+\psi+\mu)+st\sigma\big]$$

$$\|\bm{\Phi}\bm{h}+\sqrt{m}\bm{e}\|_{2}\geq\frac{\sqrt{m}}{2}\sqrt{\|\bm{h}\|_{2}^{2}+\|\bm{e}\|_{2}^{2}}$$

$$\begin{aligned}\inf_{(\bm{h},\bm{e})\in\mathcal{K},\ \|(\bm{h},\bm{e})\|_{2}\geq t}\frac{\|\bm{\Phi}\bm{h}+\sqrt{m}\bm{e}\|_{2}}{\sqrt{\|\bm{h}\|_{2}^{2}+\|\bm{e}\|_{2}^{2}}}&=\inf_{(\bm{u},\bm{v})\in\lambda\mathcal{K}\cap t\mathbb{S}^{n+m-1}}\frac{\|\bm{\Phi}\bm{u}+\sqrt{m}\bm{v}\|_{2}}{t}\\&\geq\inf_{(\bm{u},\bm{v})\in\mathcal{K}\cap t\mathbb{S}^{n+m-1}}\frac{\|\bm{\Phi}\bm{u}+\sqrt{m}\bm{v}\|_{2}}{t}\\&\geq\sqrt{m}-C^{\prime}\gamma(\mathcal{K}\cap t\mathbb{S}^{n+m-1})/t\\&\geq\sqrt{m}-C^{\prime\prime}\omega_{t}(\mathcal{K})/t\\&\geq\frac{\sqrt{m}}{2}\end{aligned}$$

$$\gamma(\mathcal{K}\cap t\mathbb{S}^{n+m-1})\leq\gamma(\mathcal{K}\cap t\mathbb{B}_{2}^{n+m})\leq 2\omega(\mathcal{K}\cap t\mathbb{B}_{2}^{n+m}).$$

$$\|\bm{y}-\bm{\Phi}\hat{\bm{x}}-\sqrt{m}\hat{\bm{v}}\|_{2}\leq\|\bm{y}-\bm{\Phi}\mu\bm{x}^{\star}-\sqrt{m}\bm{v}^{\star}\|_{2}.$$

$$\|\bm{\Phi}\bm{h}+\sqrt{m}\bm{e}-\bm{z}\|_{2}\leq\|\bm{z}\|_{2}.$$

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Blind Source Separation Techniques · Electrical and Bioimpedance Tomography

Full text

Recovery of Structured Signals From Corrupted Non-Linear Measurements

Zhongxing Sun and Wei Cui

School of Information and Electronics

Beijing Institute of Technology

Beijing 100081, China

Email: {zhongxingsun, cuiwei}@bit.edu.cn

Yulong Liu

School of Physics

Beijing Institute of Technology

Beijing 100081, China

Email: [email protected]

I Introduction

Throughout science and engineering, one is often faced with the challenge of recovering a structured signal from a relatively small number of linear observations

$$\bm{y}=\bm{\Phi}\bm{x}^{\star}+\bm{n},$$

where $\bm{\Phi}\in\mathbb{R}^{m\times n}$ is the sensing matrix, $\bm{x}^{\star}\in\mathbb{R}^{n}$ is the desired structured signal, and $\bm{n}\in\mathbb{R}^{m}$ is the random noise. The objective is to estimate $\bm{x}^{\star}$ from knowledge of $\bm{y}$ and $\bm{\Phi}$. Since this problem is generally ill-posed, tractable recovery is possible only when the signal is suitably structured. A general way to encode signal structure is to assume that $\bm{x}^{\star}$ belongs to some set $\SS\subset\mathbb{R}^{n}$. For example, to promote sparsity (or low-rankness) of the solution, one can choose $\SS$ to be a scaled $\ell_{1}$ (or nuclear norm) ball. Then the signal can be recovered by solving the following $\SS$-Lasso problem:

$$\min_{\bm{x}}\ \|\bm{y}-\bm{\Phi}\bm{x}\|_{2},\quad\text{s.t.}\quad\bm{x}\in\SS.$$

The performance of $\SS$ -Lasso (and its variants) under linear measurements has been extensively studied in the literature, see e.g., [1, 2, 3, 4] and references therein.

However, in many applications of interest the linear model may not be plausible. Important examples include $1$ -bit compressed sensing [5] and generalized linear models [6]. In these scenarios, measurements can be approached with the semiparametric single index model [7, 8]

$$y_{i}=f_{i}(\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle),\quad i=1,\dots,m,\tag{2}$$

where $f_{i}:\mathbb{R}\rightarrow\mathbb{R}$ are independent copies of an unknown non-linear map $f$ (or it may be deterministic) and $\bm{\Phi}_{i}^{T}$ denote rows of $\bm{\Phi}$ . In a seminal paper [9], Plan and Vershynin present a theoretical analysis for $\SS$ -Lasso under the non-linear observation model (2). Their results show that non-linear observations behave as scaled and noisy linear observations, and under suitable conditions, a scaled original signal can be recovered by $\SS$ -Lasso.

This work extends that of [9] to a more challenging setting, in which the non-linear measurements are corrupted by an unknown but structured vector $\bm{v}^{\star}$ , i.e.,

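$$y_{i}=f_{i}(\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle)+\sqrt{m}\,v_{i}^{\star},\quad i=1,\dots,m.\tag{3}$$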

This model is motivated by some practical applications:

•

Clipping or saturation noise: signal clipping or saturation frequently appears in power amplifiers and analog-to-digital converters (ADC) because of the limited range of the devices [10, 11]. In those cases, one always measures $f(\bm{\Phi}\bm{x})$ rather than $\bm{\Phi}\bm{x}$, where $f$ is typically a non-linear map. Saturation occurs when the input exceeds the maximum or minimum device output. Unlike white noise or quantization error, the saturation error can be unbounded. However, it will be sparse provided the clipping level is high enough, which makes the model (3) appropriate. Eliminating the saturation effect may be difficult in a broad class of radar and sonar systems [12].

•

State estimation for electrical power networks: non-linear measurements $f(\bm{x})$ caused by device constraints are sent to the central control center in power networks. These measurements may contain gross errors or outliers caused by system malfunctions, which can be modeled as structured corruptions of arbitrary amplitude. State estimation in power networks therefore needs to detect and eliminate these large measurement errors [13, 14, 15, 16].

In particular, if $f$ is the identity function, the model (3) reduces to the standard corrupted sensing problem [17, 18, 19, 20, 21, 22].
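To make the corrupted non-linear model concrete, the following is a minimal simulation sketch. It assumes a hard-clipping non-linearity and sparse signal and corruption vectors; the function name `simulate_measurements`, the sparsity levels and the clipping range are illustrative choices, not values taken from the paper.

```python
import numpy as np

def simulate_measurements(n=200, m=120, k_signal=5, k_corrupt=4, seed=0):
    """Draw one instance of y_i = f(<Phi_i, x*>) + sqrt(m) * v_i*."""
    rng = np.random.default_rng(seed)

    # Sparse signal, normalised to unit l2 norm as the paper assumes.
    x_star = np.zeros(n)
    x_star[rng.choice(n, size=k_signal, replace=False)] = rng.standard_normal(k_signal)
    x_star /= np.linalg.norm(x_star)

    # Rows of Phi are i.i.d. N(0, I_n).
    Phi = rng.standard_normal((m, n))

    # Sparse corruption with a few nonzero entries (hypothetical choice).
    v_star = np.zeros(m)
    v_star[rng.choice(m, size=k_corrupt, replace=False)] = rng.standard_normal(k_corrupt)

    # Illustrative non-linearity f: hard clipping to [-1, 1].
    clean = np.clip(Phi @ x_star, -1.0, 1.0)
    y = clean + np.sqrt(m) * v_star
    return Phi, x_star, v_star, y
```

Because clipping is an odd, bounded function, the uncorrupted measurements are centered and sub-Gaussian, which matches the assumptions stated for the model.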

Assume that $(\bm{x}^{\star},\bm{v}^{\star})$ belongs to some set $\mathcal{T}\subset\mathbb{R}^{n}\times\mathbb{R}^{m}$ which is meant to capture structures of signal and corruption. A natural method to disentangle signal and corruption is to minimize the $\ell_{2}$ loss subject to a geometric constraint:

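$$\min_{\bm{x},\bm{v}}\ \|\bm{y}-\bm{\Phi}\bm{x}-\sqrt{m}\bm{v}\|_{2},\quad\text{s.t.}\quad(\bm{x},\bm{v})\in\mathcal{T}.\tag{4}$$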

This procedure might be regarded as an extension of $\SS$ -Lasso [9].
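One way to solve this constrained least-squares problem numerically is projected gradient descent. The sketch below is not the authors' algorithm (the paper does not prescribe a solver); it assumes that $\mathcal{T}$ is a product of two $\ell_{1}$ balls, uses $f=\operatorname{sign}$ for the demo, and sets the ball radii from the true vectors, which is an oracle choice made only for illustration. The names `project_l1_ball`, `t_lasso` and `demo` are hypothetical.

```python
import numpy as np

def project_l1_ball(w, radius):
    """Euclidean projection of w onto {u : ||u||_1 <= radius} (sort-based)."""
    if np.abs(w).sum() <= radius:
        return w.copy()
    u = np.sort(np.abs(w))[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, w.size + 1)
    rho = np.nonzero(u * ks > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)

def t_lasso(Phi, y, radius_x, radius_v, n_iter=500):
    """Projected gradient on 0.5 * ||y - Phi x - sqrt(m) v||_2^2 over a product of l1 balls."""
    m, n = Phi.shape
    sqrt_m = np.sqrt(m)
    # Step size 1/L, with L the squared spectral norm of [Phi, sqrt(m) I].
    B = np.hstack([Phi, sqrt_m * np.eye(m)])
    step = 1.0 / np.linalg.norm(B, 2) ** 2
    x, v = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        r = y - Phi @ x - sqrt_m * v              # current residual
        x = project_l1_ball(x + step * (Phi.T @ r), radius_x)
        v = project_l1_ball(v + step * sqrt_m * r, radius_v)
    return x, v

def demo(seed=0, n=100, m=80):
    """Small 1-bit example; returns everything needed to inspect the fit."""
    rng = np.random.default_rng(seed)
    x_star = np.zeros(n)
    x_star[:3] = 1.0 / np.sqrt(3.0)
    v_star = np.zeros(m)
    v_star[:2] = [1.0, -1.0]
    Phi = rng.standard_normal((m, n))
    y = np.sign(Phi @ x_star) + np.sqrt(m) * v_star
    mu = np.sqrt(2.0 / np.pi)                     # E[sign(g) g] for g ~ N(0, 1)
    x_hat, v_hat = t_lasso(Phi, y, mu * np.abs(x_star).sum(), np.abs(v_star).sum())
    return Phi, y, x_star, v_star, x_hat, v_hat
```

Minimizing the squared residual has the same minimizers as minimizing the residual norm, and the squared form gives a smooth objective with a simple gradient. In practice the radii would have to be tuned, since $\mu$ and the norms of the true vectors are unknown.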

The goal of this paper is to investigate the performance of $\mathcal{T}$ -Lasso (4) under the model (3). To this end, we require some model assumptions:

•

Gaussian measurements: we assume that the rows $\bm{\Phi}_{i}^{T}$ of $\bm{\Phi}$ are i.i.d. Gaussian vectors, i.e., $\bm{\Phi}_{i}\sim\mathcal{N}(0,\bm{I}_{n})$. Note that the factor $\sqrt{m}$ in the model (3) gives the columns of $\bm{\Phi}$ and of $\sqrt{m}\bm{I}_{m}$ the same scale, which makes our theoretical results easier to interpret.

•

Unit norm of the signal: without loss of generality, we assume that $\|\bm{x}^{\star}\|_{2}=1$ because the norm of $\bm{x}^{\star}$ may be absorbed into the non-linear function $f$ .

•

Sub-Gaussian distribution of $\bar{\bm{y}}_{i}=f_{i}(\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle)$: we assume that $\bar{\bm{y}}_{i}=f_{i}(\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle)$ are sub-Gaussian variables as in [23]. To understand this assumption, note that since $\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle$ is Gaussian, $\bar{\bm{y}}_{i}$ will be sub-Gaussian provided that $f$ does not grow faster than linearly, namely, $|f(x)|\leq a+b|x|$ for some scalars $a$ and $b$.

Under the above assumptions, we establish theoretical guarantees for $\mathcal{T}$ -Lasso (4) under corrupted non-linear measurements (3). Our results demonstrate that under proper conditions, it is possible to disentangle signal and corruption in this quite challenging scenario.

II Preliminaries

In this section, we review some preliminaries which underlie our analysis. Hereafter, $\mathbb{S}^{n-1}$ and $\mathbb{B}_{2}^{n}$ denote the unit sphere and ball in $\mathbb{R}^{n}$ under the $\ell_{2}$ norm respectively. We use the notation $C,C^{\prime},c_{1},c_{2},\textrm{etc.},$ to refer to absolute constants whose value may change from line to line.

II-A Convex Geometry

The tangent cone of a set $\SS\subset\mathbb{R}^{n}$ at $\bm{x}$ is defined as

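$$\mathcal{D}(\SS,\bm{x})=\{t\bm{u}:\ t\geq 0,\ \bm{u}\in\SS-\bm{x}\}.$$

The tangent cone may also be called the descent cone.

The Gaussian width and the Gaussian complexity of a set $\SS\subset\mathbb{R}^{n}$ are, respectively, defined as

$$\omega(\SS):=\operatorname{\mathbb{E}}\sup_{\bm{x}\in\SS}\left\langle\bm{g},\bm{x}\right\rangle,\quad\text{where }\bm{g}\sim\mathcal{N}(0,\bm{I}_{n}),$$

and

$$\gamma(\SS):=\operatorname{\mathbb{E}}\sup_{\bm{x}\in\SS}\left|\left\langle\bm{g},\bm{x}\right\rangle\right|,\quad\text{where }\bm{g}\sim\mathcal{N}(0,\bm{I}_{n}).$$

These two geometric quantities are closely related to each other [24]:

$$\big(\omega(\SS)+\|\bm{y}\|_{2}\big)/3\leq\gamma(\SS)\leq 2\big(\omega(\SS)+\|\bm{y}\|_{2}\big)\quad\forall\,\bm{y}\in\SS.\tag{5}$$

The local Gaussian width of a set $\SS\subset\mathbb{R}^{n}$ is a function of parameter $t\geq 0$ defined as

$$\omega_{t}(\SS):=\operatorname{\mathbb{E}}\sup_{\bm{x}\in\SS\cap t\mathbb{B}_{2}^{n}}\left\langle\bm{g},\bm{x}\right\rangle,\quad\text{where }\bm{g}\sim\mathcal{N}(0,\bm{I}_{n}).$$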
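The Gaussian width $\omega(\SS)=\operatorname{\mathbb{E}}\sup_{\bm{x}\in\SS}\left\langle\bm{g},\bm{x}\right\rangle$ can be estimated by Monte Carlo whenever the supremum has a closed form. The sketch below does this for two standard sets: the unit $\ell_{1}$ ball, where the supremum equals $\|\bm{g}\|_{\infty}$, and the unit $\ell_{2}$ ball, where it equals $\|\bm{g}\|_{2}$. The helper names are illustrative and not from the paper.

```python
import numpy as np

def gaussian_width(support_fn, n, n_draws=2000, seed=0):
    """Monte Carlo estimate of E sup_{x in S} <g, x>, given the support function of S."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_draws, n))
    return float(np.mean([support_fn(row) for row in g]))

def l1_ball_support(g):
    # sup over the unit l1 ball of <g, x> is the largest |g_i|.
    return np.max(np.abs(g))

def l2_ball_support(g):
    # sup over the unit l2 ball of <g, x> is the Euclidean norm of g.
    return np.linalg.norm(g)

n = 50
width_l1 = gaussian_width(l1_ball_support, n)
width_l2 = gaussian_width(l2_ball_support, n)
print(f"l1 ball: {width_l1:.3f}, l2 ball: {width_l2:.3f}, sqrt(n): {np.sqrt(n):.3f}")
```

The $\ell_{1}$ ball has width of order $\sqrt{\log n}$ while the $\ell_{2}$ ball has width close to $\sqrt{n}$, which is why sparsity-promoting constraint sets need far fewer measurements. Both sets are symmetric, so their Gaussian width and Gaussian complexity coincide.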

II-B High-Dimensional Probability

A random variable $X$ is called a sub-Gaussian random variable if the sub-Gaussian norm

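$$\|X\|_{\psi_{2}}=\inf\{t>0:\ \operatorname{\mathbb{E}}\exp(X^{2}/t^{2})\leq 2\}$$

is finite. A random vector $\bm{x}$ in $\mathbb{R}^{n}$ is a sub-Gaussian random vector if all of its one-dimensional marginals are sub-Gaussian random variables. The sub-Gaussian norm of $\bm{x}$ is defined as

$$\|\bm{x}\|_{\psi_{2}}:=\sup_{\bm{y}\in\mathbb{S}^{n-1}}\big\|\left\langle\bm{x},\bm{y}\right\rangle\big\|_{\psi_{2}}.$$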

A random vector $\bm{x}$ in $\mathbb{R}^{n}$ is isotropic if $\operatorname{\mathbb{E}}(\bm{x}\bm{x}^{T})=\bm{I}_{n}$ .
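As a worked example of the sub-Gaussian norm $\|X\|_{\psi_{2}}=\inf\{t>0:\operatorname{\mathbb{E}}\exp(X^{2}/t^{2})\leq 2\}$, the computation below evaluates it for a standard normal variable $g$. It is a side calculation, not part of the paper.

```latex
% Worked example: psi_2 norm of g ~ N(0,1).
\begin{align*}
\mathbb{E}\exp\!\left(g^{2}/t^{2}\right)
  &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}
     \exp\!\left(-x^{2}\Big(\tfrac{1}{2}-\tfrac{1}{t^{2}}\Big)\right)dx
   = \left(1-\frac{2}{t^{2}}\right)^{-1/2},
   \qquad t^{2}>2,\\
\left(1-\frac{2}{t^{2}}\right)^{-1/2}\le 2
  &\iff t^{2}\ge \frac{8}{3}
  \quad\Longrightarrow\quad
  \|g\|_{\psi_{2}}=\sqrt{8/3}\approx 1.633 .
\end{align*}
```

Since every one-dimensional marginal of a vector $\bm{\Phi}_{i}\sim\mathcal{N}(0,\bm{I}_{n})$ along a unit direction is again standard normal, the same value is the sub-Gaussian norm of a standard Gaussian vector, and such a vector is isotropic.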

II-C A Useful Tool

In the proofs of our main results, we make heavy use of the following matrix deviation inequality, which implies a tight lower bound for the restricted singular value of the extended sensing matrix $[\bm{A},\sqrt{m}\bm{I}_{m}]$ .

Fact 1 (Extended Matrix Deviation Inequality, [22]).

Let $\bm{A}$ be an $m\times n$ matrix whose rows $\bm{A}_{i}^{T}$ are independent centered isotropic sub-Gaussian vectors with $K=\max_{i}\|\bm{A}_{i}\|_{\psi_{2}}$ , and $\mathcal{T}$ be a bounded subset of $\mathbb{R}^{n}\times\mathbb{R}^{m}$ . Then for any $s\geq 0$ , the event

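$$\sup_{(\bm{a},\bm{b})\in\mathcal{T}}\left|\,\|\bm{A}\bm{a}+\sqrt{m}\bm{b}\|_{2}-\sqrt{m}\sqrt{\|\bm{a}\|_{2}^{2}+\|\bm{b}\|_{2}^{2}}\,\right|\leq CK^{2}\big[\gamma(\mathcal{T})+s\cdot\operatorname{rad}(\mathcal{T})\big]$$

holds with probability at least $1-\exp(-s^{2})$, where $\operatorname{rad}(\mathcal{T}):=\sup_{\bm{x}\in\mathcal{T}}\|\bm{x}\|_{2}$ denotes the radius of $\mathcal{T}$.

In particular, when $\mathcal{T}$ is a subset of $\mathbb{S}^{n+m-1}$ or $t\mathbb{S}^{n+m-1}$, Fact 1 implies that the event

$$\inf_{(\bm{a},\bm{b})\in\mathcal{T}\cap\mathbb{S}^{n+m-1}}\|\bm{A}\bm{a}+\sqrt{m}\bm{b}\|_{2}\geq\sqrt{m}-CK^{2}\gamma(\mathcal{T}\cap\mathbb{S}^{n+m-1})\tag{6}$$

holds with probability at least $1-\exp\{-\gamma(\mathcal{T}\cap\mathbb{S}^{n+m-1})^{2}\}$, or the event

$$\inf_{(\bm{a},\bm{b})\in\mathcal{T}\cap t\mathbb{S}^{n+m-1}}\|\bm{A}\bm{a}+\sqrt{m}\bm{b}\|_{2}\geq t\sqrt{m}-CK^{2}\gamma(\mathcal{T}\cap t\mathbb{S}^{n+m-1})\tag{7}$$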

holds with probability at least $1-\exp\{-\gamma(\mathcal{T}\cap t\mathbb{S}^{n+m-1})^{2}/t^{2}\}$ .
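A quick numerical sanity check of the scaling behind these bounds: for a fixed unit vector $(\bm{a},\bm{b})$ and a matrix $\bm{A}$ with i.i.d. standard Gaussian entries, $\bm{A}\bm{a}\sim\mathcal{N}(0,\|\bm{a}\|_{2}^{2}\bm{I}_{m})$, so $\operatorname{\mathbb{E}}\|\bm{A}\bm{a}+\sqrt{m}\bm{b}\|_{2}^{2}=m$. The sketch below checks this by simulation for one fixed vector only; it does not verify the uniform statement over a set. The function name and dimensions are illustrative.

```python
import numpy as np

def mean_squared_ratio(n=30, m=40, n_draws=500, seed=0):
    """Average of ||A a + sqrt(m) b||_2^2 / m over fresh Gaussian draws of A."""
    rng = np.random.default_rng(seed)
    # Fixed unit vector (a, b) in R^n x R^m.
    ab = rng.standard_normal(n + m)
    ab /= np.linalg.norm(ab)
    a, b = ab[:n], ab[n:]
    vals = np.empty(n_draws)
    for i in range(n_draws):
        A = rng.standard_normal((m, n))
        vals[i] = np.linalg.norm(A @ a + np.sqrt(m) * b) ** 2
    return vals.mean() / m

ratio = mean_squared_ratio()
print(f"empirical E||A a + sqrt(m) b||^2 / m = {ratio:.3f}")
```

A ratio close to 1 illustrates why the identity block is scaled by $\sqrt{m}$: both blocks of the extended matrix then contribute on the same scale.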

III Main Results

Before stating our result, we need to introduce two nonlinearity parameters, which are essentially the intrinsic mean and variance associated with the nonlinear map $f$ . Let $g$ be a standard normal random variable, the two parameters are defined as [9]:

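$$\text{Mean term:}\quad\mu:=\operatorname{\mathbb{E}}[f(g)g],\qquad\text{Variance term:}\quad\sigma^{2}:=\operatorname{\mathbb{E}}[(f(g)-\mu g)^{2}].$$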
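To get a feel for the two parameters, read them as $\mu=\operatorname{\mathbb{E}}[f(g)g]$ and $\sigma^{2}=\operatorname{\mathbb{E}}[(f(g)-\mu g)^{2}]$, which is how they are used in the proofs. The worked examples below evaluate them for two deterministic maps: the identity and the sign function (the 1-bit case). These are side calculations rather than results stated in the paper.

```latex
% Worked examples for the non-linearity parameters, with g ~ N(0,1).
\begin{align*}
f(x)=x:\qquad
  \mu &= \mathbb{E}[g^{2}] = 1,
  &\sigma^{2} &= \mathbb{E}[(g-g)^{2}] = 0,\\[4pt]
f(x)=\operatorname{sign}(x):\qquad
  \mu &= \mathbb{E}|g| = \sqrt{2/\pi}\approx 0.798,
  &\sigma^{2} &= \mathbb{E}\big[(\operatorname{sign}(g)-\mu g)^{2}\big]
              = 1-2\mu\,\mathbb{E}|g|+\mu^{2}
              = 1-\frac{2}{\pi}\approx 0.363 .
\end{align*}
```

In the 1-bit case the estimator therefore targets $\sqrt{2/\pi}\,\bm{x}^{\star}$ rather than $\bm{x}^{\star}$ itself, and $\sigma$ acts as the effective noise level created by the non-linearity.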

We then present two main results: the first considers the case when the signal $(\mu\bm{x}^{\star},\bm{v}^{\star})$ lies at an extreme point of $\mathcal{T}$, and the second assumes that $(\mu\bm{x}^{\star},\bm{v}^{\star})$ lies in the interior of $\mathcal{T}$.

Theorem 1.

Let $(\hat{\bm{x}},\hat{\bm{v}})$ be the solution to $\mathcal{T}$-Lasso (4). Suppose that $\bm{\Phi}_{i}\sim\mathcal{N}(0,\bm{I}_{n})$, $\bm{x}^{\star}\in\mathbb{S}^{n-1}$, and that $\bar{\bm{y}}_{i}=f_{i}(\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle)$ are centered sub-Gaussian random variables with sub-Gaussian norm $\psi$. Assume that $(\mu\bm{x}^{\star},\bm{v}^{\star})\in\mathcal{T}$, and let $\mathcal{D}:=\mathcal{D}(\mathcal{T},(\mu\bm{x}^{\star},\bm{v}^{\star}))$. If

$$m\geq C\cdot\omega_{1}(\mathcal{D})^{2},\tag{10}$$

then, for any $0<s\leq\sqrt{m}$, the event

$$\sqrt{\|\hat{\bm{x}}-\mu\bm{x}^{\star}\|_{2}^{2}+\|\hat{\bm{v}}-\bm{v}^{\star}\|_{2}^{2}}\leq\frac{C}{\sqrt{m}}\big(\omega_{1}(\mathcal{D})(\sigma+\psi+\mu)+s\sigma\big)$$

holds with probability at least $1-2\exp(-cs^{2}\sigma^{4}/(\psi+\mu)^{4})-\exp(-\gamma(\mathcal{D}\cap\mathbb{S}^{n+m-1})^{2})$ .

Remark 1 (Relation to corrupted sensing).

If $f$ is the identity function, then we have $\mu=1,~{}\sigma=0$ , and $\psi=c$ . Thus Theorem 1 implies that if $m\geq C\cdot\omega(\mathcal{D}\cap\mathbb{B}_{2}^{n+m})^{2}$ , $\mathcal{T}$ -Lasso (4) succeeds with high probability, which is consistent with the constrained recovery results in [17, Theorem 1] and [22, Theorem 2].

Note that $\omega_{1}(\mathcal{D})^{2}$ is the effective dimension of the descent cone $\mathcal{D}$. When $(\mu\bm{x}^{\star},\bm{v}^{\star})$ lies on the boundary of $\mathcal{T}$, the descent cone may be narrow and its effective dimension small. In that case Theorem 1 is informative: a good estimate is guaranteed once the number of observations exceeds the effective dimension of $\mathcal{D}$, which may be much smaller than the ambient dimension $n+m$. However, when $(\mu\bm{x}^{\star},\bm{v}^{\star})$ is an interior point of $\mathcal{T}$, the descent cone is the entire space and the effective dimension $\omega_{1}(\mathcal{D})^{2}$ is of the order of the ambient dimension $n+m$. In this case, Theorem 1 becomes uninformative. The following theorem deals with this situation. It turns out that the local Gaussian width serves as a measure of the low-dimensional structure of the set $\mathcal{T}$, which need not be a cone.

Theorem 2.

Let $(\hat{\bm{x}},\hat{\bm{v}})$ be the solution to $\mathcal{T}$-Lasso (4). Suppose that $\bm{\Phi}_{i}\sim\mathcal{N}(0,\bm{I}_{n})$, $\bm{x}^{\star}\in\mathbb{S}^{n-1}$, and that $\bar{\bm{y}}_{i}=f_{i}(\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle)$ are centered sub-Gaussian random variables with sub-Gaussian norm $\psi$. Assume that $(\mu\bm{x}^{\star},\bm{v}^{\star})\in\mathcal{T}$ and that $\mathcal{K}:=\mathcal{T}-(\mu\bm{x}^{\star},\bm{v}^{\star})$ is a star shaped set. (A set $\mathcal{K}$ is star shaped if $\lambda\mathcal{K}\subset\mathcal{K}$ for any $0\leq\lambda\leq 1$; in particular, any convex set containing the origin is star shaped.) If

$$m\geq C\cdot\omega_{t}(\mathcal{K})^{2}/t^{2},$$

then, for any $t>0$ and $0<s\leq\sqrt{m}$, the event

$$\sqrt{\|\hat{\bm{x}}-\mu\bm{x}^{\star}\|_{2}^{2}+\|\hat{\bm{v}}-\bm{v}^{\star}\|_{2}^{2}}\leq t+\frac{C}{\sqrt{m}}\left(\frac{\omega_{t}(\mathcal{K})(\sigma+\psi+\mu)}{t}+s\sigma\right)$$

holds with probability at least $1-2\exp(-cs^{2}\sigma^{4}/(\psi+\mu)^{4})-\exp(-\gamma(\mathcal{K}\cap t\mathbb{S}^{n+m-1})^{2}/t^{2})$ .

Remark 2 (Local Gaussian width).

Note that if we let $t\to 0$, then ${\omega_{t}(\mathcal{K})}/{t}$ goes to $\omega(\mathbb{B}_{2}^{n+m})$, which is of the order of $\sqrt{n+m}$. The results in Theorem 2 are then exactly those of Theorem 1 when $(\mu\bm{x}^{\star},\bm{v}^{\star})$ is an interior point of $\mathcal{T}$. This suggests that Theorem 1 can be regarded as an extreme case of Theorem 2, and that the local Gaussian width characterizes the low-dimensional structure of sets better than the Gaussian width.

Remark 3 (Relation to results in [9]).

Theorems 1 and 2 show that the recovery error can be made arbitrarily small provided that the number of measurements is large enough. In particular, in the corruption-free case (i.e., without the $\psi+\mu$ term in the high-probability bounds), our results also agree with Theorem $1.4$ and Theorem $1.9$ in [9].

IV Proofs of Main Results

Before proving Theorems 1 and 2, we require two useful lemmas.

Lemma 1.

Suppose that $\bm{\Phi}_{i}\sim\mathcal{N}(0,\bm{I}_{n})$ and $\bar{\bm{y}}_{i}=f_{i}(\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle)$ are centered sub-Gaussian random variables with sub-Gaussian norm $\psi$ . Assume $\mathcal{K}^{t}=\mathcal{K}_{\bm{a}}^{t}\times\mathcal{K}_{\bm{b}}^{t}\subset t\mathbb{B}_{2}^{n+m}$ is a star shaped set and let $\bm{z}:=f(\bm{\Phi}\bm{x}^{\star})-\bm{\Phi}\mu\bm{x}^{\star}$ . Then, for any $0<s\leq\sqrt{m}$ , the event

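$$\sup_{(\bm{a},\bm{b})\in\mathcal{K}^{t}}\left\langle\bm{\Phi}\bm{a}+\sqrt{m}\bm{b},\bm{z}\right\rangle\leq C\sqrt{m}\big[\omega(\mathcal{K}^{t})(\sigma+\psi+\mu)+st\sigma\big]$$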

holds with probability at least $1-2\exp(-c{s^{2}\sigma^{4}}/{(\psi+\mu)^{4}})$ .

Proof.

See Appendix A. ∎

Lemma 2.

Let $\mathcal{K}=\mathcal{T}-(\mu\bm{x}^{\star},\bm{v}^{\star})$ be a star shaped set and $t>0$ . Suppose that $m\geq C\cdot\omega_{t}(\mathcal{K})^{2}/t^{2}$ . Then, the following lower bound

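$$\|\bm{\Phi}\bm{h}+\sqrt{m}\bm{e}\|_{2}\geq\frac{\sqrt{m}}{2}\sqrt{\|\bm{h}\|_{2}^{2}+\|\bm{e}\|_{2}^{2}}$$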

holds for all $(\bm{h},\bm{e})\in\mathcal{K}$ satisfying $\sqrt{\|\bm{h}\|_{2}^{2}+\|\bm{e}\|_{2}^{2}}\geq t$ with probability at least $1-\exp\big{(}-\gamma(\mathcal{K}\cap t\mathbb{S}^{n+m-1})^{2}/t^{2}\big{)}$ .

Proof.

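Let $\lambda=\frac{t}{\sqrt{\|\bm{h}\|_{2}^{2}+\|\bm{e}\|_{2}^{2}}}\leq 1$ and $(\bm{u},\bm{v})=\lambda\cdot(\bm{h},\bm{e})$. Then $(\bm{u},\bm{v})\in\lambda\mathcal{K}\cap t\mathbb{S}^{n+m-1}$. Thus we have

$$\begin{aligned}\inf_{(\bm{h},\bm{e})\in\mathcal{K},\ \|(\bm{h},\bm{e})\|_{2}\geq t}\frac{\|\bm{\Phi}\bm{h}+\sqrt{m}\bm{e}\|_{2}}{\sqrt{\|\bm{h}\|_{2}^{2}+\|\bm{e}\|_{2}^{2}}}&=\inf_{(\bm{u},\bm{v})\in\lambda\mathcal{K}\cap t\mathbb{S}^{n+m-1}}\frac{\|\bm{\Phi}\bm{u}+\sqrt{m}\bm{v}\|_{2}}{t}\\&\geq\inf_{(\bm{u},\bm{v})\in\mathcal{K}\cap t\mathbb{S}^{n+m-1}}\frac{\|\bm{\Phi}\bm{u}+\sqrt{m}\bm{v}\|_{2}}{t}\\&\geq\sqrt{m}-C^{\prime}\gamma(\mathcal{K}\cap t\mathbb{S}^{n+m-1})/t\\&\geq\sqrt{m}-C^{\prime\prime}\omega_{t}(\mathcal{K})/t\\&\geq\frac{\sqrt{m}}{2},\end{aligned}$$

which holds with probability at least $1-\exp\big(-\gamma(\mathcal{K}\cap t\mathbb{S}^{n+m-1})^{2}/t^{2}\big)$. The first inequality holds because $\mathcal{K}$ is star shaped, so $\lambda\mathcal{K}\subset\mathcal{K}$. The second inequality follows from (7). The third inequality holds because of (5) and $\bm{0}\in\mathcal{K}$, i.e.,

$$\gamma(\mathcal{K}\cap t\mathbb{S}^{n+m-1})\leq\gamma(\mathcal{K}\cap t\mathbb{B}_{2}^{n+m})\leq 2\omega(\mathcal{K}\cap t\mathbb{B}_{2}^{n+m}).$$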

The last inequality follows from the assumption on the number of measurements $m\geq C\cdot\omega_{t}(\mathcal{K})^{2}/t^{2}$ . ∎

IV-A Proof of Theorem 1

Proof.

For clarity, the proof is divided into three steps.

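Step 1: Problem reduction. Since $(\hat{\bm{x}},\hat{\bm{v}})$ is the solution to the $\mathcal{T}$-Lasso problem (4) and $(\mu\bm{x}^{\star},\bm{v}^{\star})\in\mathcal{T}$, we have

$$\|\bm{y}-\bm{\Phi}\hat{\bm{x}}-\sqrt{m}\hat{\bm{v}}\|_{2}\leq\|\bm{y}-\bm{\Phi}\mu\bm{x}^{\star}-\sqrt{m}\bm{v}^{\star}\|_{2}.\tag{12}$$

Recall that $\bm{z}=f(\bm{\Phi}\bm{x}^{\star})-\bm{\Phi}\mu\bm{x}^{\star}$, so $\bm{y}=\bm{\Phi}\mu\bm{x}^{\star}+\sqrt{m}\bm{v}^{\star}+\bm{z}$. Let $\bm{h}=\hat{\bm{x}}-\mu\bm{x}^{\star}$ and $\bm{e}=\hat{\bm{v}}-\bm{v}^{\star}$. Then (12) can be reformulated as

$$\|\bm{\Phi}\bm{h}+\sqrt{m}\bm{e}-\bm{z}\|_{2}\leq\|\bm{z}\|_{2}.\tag{13}$$

Squaring both sides of (13) and cancelling $\|\bm{z}\|_{2}^{2}$ yields

$$\|\bm{\Phi}\bm{h}+\sqrt{m}\bm{e}\|_{2}^{2}\leq 2\left\langle\bm{\Phi}\bm{h}+\sqrt{m}\bm{e},\bm{z}\right\rangle.$$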

Step 2: Lower Bound on $\|{\bm{\Phi}\bm{h}+\sqrt{m}\bm{e}}\|_{2}$ . Define the error set

[TABLE]

in which the error vector $(\hat{\bm{x}}-\mu\bm{x}^{\star},\hat{\bm{v}}-\bm{v}^{\star})$ lives. Clearly, $\mathcal{E}(\mu\bm{x}^{\star},\bm{v}^{\star})$ belongs to the tangent cone $\mathcal{D}(\mathcal{T},(\mu\bm{x}^{\star},\bm{v}^{\star}))$ . It then follows from (6) that the event

[TABLE]

holds with probability at least $1-\exp\{-\gamma(\mathcal{D}\cap\mathbb{S}^{n+m-1})^{2}\}$. The second inequality holds because of (5) and $\bm{0}\in\mathcal{D}$, namely

[TABLE]

The last inequality is due to (10).

Step 3: Upper Bound on $\left\langle\bm{\Phi}\bm{h}+\sqrt{m}\bm{e},\bm{z}\right\rangle$. It follows from Lemma 1 (by setting $t=1$) that the event

[TABLE]

holds with probability at least $1-2\exp(-cs^{2}\sigma^{4}/{(\psi+\mu)^{4}})$ .

Putting everything together and taking union bound, we have that, with probability at least $1-2\exp(-cs^{2}\sigma^{4}/{(\psi+\mu)^{4}})-\exp\big{(}-\gamma(\mathcal{D}\cap\mathbb{S}^{n+m-1})^{2}\big{)}$ ,

[TABLE]

Rearranging completes the proof of Theorem 1. ∎
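The final rearrangement can be sketched as follows. This is an illustrative reconstruction rather than the authors' displayed computation: it assumes that Step 2 yields a lower bound of the same form as Lemma 2, with constant $1/2$, and that Step 3 yields the Lemma 1 bound with $t=1$ applied to the normalized error and scaled back by its norm. The symbols $\delta$ and $B$ are shorthand introduced here, and the exact constants are absorbed into $C$.

```latex
% Sketch. Shorthand (not from the paper):
%   \delta := \sqrt{\|h\|_2^2 + \|e\|_2^2},
%   B := \omega_1(\mathcal{D})(\sigma+\psi+\mu) + s\sigma.
\begin{align*}
\frac{m}{4}\,\delta^{2}
  &\le \|\bm{\Phi}\bm{h}+\sqrt{m}\bm{e}\|_{2}^{2}
     && \text{(assumed form of the Step 2 bound, squared)}\\
  &\le 2\left\langle \bm{\Phi}\bm{h}+\sqrt{m}\bm{e},\bm{z}\right\rangle
     && \text{(squaring (13))}\\
  &\le 2C\sqrt{m}\,B\,\delta
     && \text{(Lemma 1 with } t=1\text{, scaled by } \delta\text{)}\\
\Longrightarrow\quad
\delta &\le \frac{8C}{\sqrt{m}}\,B .
\end{align*}
```

Dividing by $\delta$ is harmless because the claimed bound is trivial when $\delta=0$.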

IV-B Proof of Theorem 2

Proof.

First note that if $\sqrt{\|\bm{h}\|_{2}^{2}+\|\bm{e}\|_{2}^{2}}\leq t$ , then Theorem 2 holds trivially. So it is sufficient to prove Theorem 2 under assumption $\sqrt{\|\bm{h}\|_{2}^{2}+\|\bm{e}\|_{2}^{2}}\geq t$ .

Similar to Step 1 of the proof of Theorem 1, we have

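$$\|\bm{\Phi}\bm{h}+\sqrt{m}\bm{e}\|_{2}^{2}\leq 2\left\langle\bm{\Phi}\bm{h}+\sqrt{m}\bm{e},\bm{z}\right\rangle.\tag{15}$$

Observe that the error vector $(\bm{h},\bm{e})$ belongs to a star shaped set, namely $\mathcal{K}=\mathcal{T}-(\mu\bm{x}^{\star},\bm{v}^{\star})$. It then follows from Lemma 2 that the following event

$$\|\bm{\Phi}\bm{h}+\sqrt{m}\bm{e}\|_{2}\geq\frac{\sqrt{m}}{2}\sqrt{\|\bm{h}\|_{2}^{2}+\|\bm{e}\|_{2}^{2}}\tag{16}$$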

holds with probability at least $1-\exp\big{(}-\gamma(\mathcal{K}\cap t\mathbb{S}^{n+m-1})^{2}/t^{2}\big{)}$ .

Combining (15) and (16) yields

[TABLE]

Since $\sqrt{\|\bm{h}\|_{2}^{2}+\|\bm{e}\|_{2}^{2}}\geq t$, we cannot use the upper bound in Lemma 1 directly. Dividing both sides of (17) by $m\delta=m\sqrt{\|\bm{h}\|_{2}^{2}+\|\bm{e}\|_{2}^{2}}$, we obtain

[TABLE]

which holds with probability at least $1-2\exp(-cs^{2}\sigma^{4}/{(\psi+\mu)^{4}})-\exp\big(-\gamma(\mathcal{K}\cap t\mathbb{S}^{n+m-1})^{2}/t^{2}\big)$. In the second inequality we set $(\bm{u},\bm{v})=\delta^{-1}(\bm{h},\bm{e})$. The third inequality holds because $\mathcal{K}$ is star shaped, namely $t\delta^{-1}\mathcal{K}\subset\mathcal{K}$ and hence $\delta^{-1}\mathcal{K}\subset t^{-1}\mathcal{K}$. In the fourth line we let $(\bm{a},\bm{b})=t(\bm{u},\bm{v})$. The last inequality follows from Lemma 1. This completes the proof. ∎

V Conclusion

In this paper, we have analyzed performance guarantees for $\mathcal{T}$ -Lasso which is used to recover a structured signal from corrupted non-linear Gaussian measurements. The theoretical results may be of help in some practical applications such as dealing with saturation error in quantization which has been a challenge in the area of signal processing. As for future work, it is worthwhile to deduce the explicit expressions of the main results for different specific problems, and to consider penalized recovery procedures rather than a constrained one for computational purposes.

Appendix A Proof of Lemma 1

A-A Auxiliary Definitions and Facts

To prove Lemma 1, we require some additional definitions and facts.

Definition 1 (Sub-exponential random variable and vector).

A random variable $X$ is called a sub-exponential random variable if the sub-exponential norm

[TABLE]

is finite. A random vector $\bm{x}$ in $\mathbb{R}^{n}$ is called a sub-exponential random vector if all of its one-dimensional marginals are sub-exponential random variables. The sub-exponential norm of $\bm{x}$ is defined as

[TABLE]

Fact 2 (Sub-Gaussian distributions with independent coordinates).

[25, Lemma 3.4.2]

Let $X=(X_{1},\ldots,X_{n})^{T}\in\mathbb{R}^{n}$ be a random vector with independent, mean zero, sub-Gaussian coordinates $X_{i}$. Then $X$ is a sub-Gaussian random vector, and

[TABLE]

Fact 3 (Product of sub-Gaussian is sub-exponential).

[25, Lemma 2.7.7] Let $X$ and $Y$ be sub-Gaussian random variables (not necessarily independent). Then $XY$ is sub-exponential. Moreover,

[TABLE]

Fact 4 (Centering).

[25, Lemma 2.6.8 and Exercise 2.7.10]

If $X$ is sub-Gaussian (or sub-exponential), then so is $X-\operatorname{\mathbb{E}}X$. Moreover,

[TABLE]

Fact 5 (Bernstein-type inequality).

[25, Theorem 2.8.2]

Let $X_{1},X_{2},\ldots,X_{m}$ be independent, mean-zero, sub-exponential random variables, and $\bm{a}=(a_{1},a_{2},\ldots,a_{m})^{T}\in\mathbb{R}^{m}$. Then, for any $t\geq 0$, we have

[TABLE]

where $K=\max_{i}\|X_{i}\|_{\psi_{1}}$ .

Fact 6 (Gaussian concentration).

[25, Theorem 5.2.2]

Consider a random vector $X\sim\mathcal{N}(0,\bm{I}_{n})$ and a Lipschitz function $f:\mathbb{R}^{n}\to\mathbb{R}$ with Lipschitz norm $\|f\|_{\textrm{Lip}}$ (with respect to the Euclidean metric). Then for any $t\geq 0$, we have

[TABLE]

Fact 7 (Talagrand’s Majorizing Measure Theorem).

[26, Theorem 2.2.27] or [24, Theorem 8]

Let $(X_{\bm{u}})_{\bm{u}\in\SS}$ be a random process indexed by points in a bounded set $\SS\subset\mathbb{R}^{n}$. Assume that the process has sub-Gaussian increments, that is, there exists $M\geq 0$ such that

[TABLE]

Then, for any $s\geq 0$ , the event

[TABLE]

holds with probability at least $1-\exp(-s^{2})$ , where $\operatorname{diam}(\SS):=\sup_{\bm{x},\bm{y}\in\SS}\|\bm{x}-\bm{y}\|_{2}$ denotes the diameter of $\SS$ .

A-B Proof of Lemma 1

We are now in position to prove Lemma 1. Observe that

[TABLE]

So it suffices to bound the two terms on the right side. To this end, we have the following two lemmas.

Lemma 3.

Under the settings of Lemma 1, for any $0<s\leq\sqrt{m}$, the event

[TABLE]

holds with probability at least $1-2\exp(-{cs^{2}\sigma^{4}}/{(\psi+\mu)^{4}})$ .

Proof.

See Appendix B. ∎

Lemma 4.

Under the settings of Lemma 1, the event

[TABLE]

holds with probability at least $1-\exp(-\frac{s^{2}\sigma^{2}}{(\psi+\mu)^{2}})$ .

Proof.

Note that $\bm{z}_{i}$ are i.i.d. centered sub-Gaussian variables with $\psi_{2}$ -norm

[TABLE]

Then by Fact 2, $\bm{z}$ is a sub-Gaussian random vector with

[TABLE]

Define the random process $X_{\bm{b}}:=\left\langle\bm{b},\bm{z}\right\rangle$ , which has sub-Gaussian increments:

[TABLE]

Since $\bm{0}\in\mathcal{K}_{\bm{b}}^{t}$, it follows from Talagrand’s Majorizing Measure Theorem (Fact 7) that the event

[TABLE]

holds with probability at least $1-\exp(-u^{2})$ . The last inequality holds because $\operatorname{diam}(\mathcal{K}_{\bm{b}}^{t})=\sup_{\bm{x},\bm{y}\in\mathcal{K}_{\bm{b}}^{t}}\|\bm{x}-\bm{y}\|_{2}\leq 2t$ . Setting $u=\frac{s\sigma}{\psi+\mu}$ yields the desired results. ∎

Thus, combining Lemma 3 and Lemma 4 yields the proof of Lemma 1, namely, for any $0<s\leq\sqrt{m}$, the event

[TABLE]

holds with probability at least

[TABLE]

In the last inequality we have used the facts that $\omega(\mathcal{K}_{\bm{a}}^{t})\leq\omega(\mathcal{K}^{t})$ and $\omega(\mathcal{K}_{\bm{b}}^{t})\leq\omega(\mathcal{K}^{t})$ .

Appendix B Proof of Lemma 3

The proof of Lemma 3 is inspired by [23]. For clarity, the proof is divided into the following three steps.

Step 1: Problem Reduction. Since $\bm{z}_{i}$ are not independent of $\bm{\Phi}_{i}$ , to facilitate the analysis, we need to “decouple” them as much as possible. To this end, we consider the orthogonal decomposition of the vectors $\bm{\Phi}_{i}$ along the direction of $\bm{x}^{\star}$ and its orthogonal complementary space. More precisely, we express

[TABLE]

where $\bm{P}:=\bm{x}^{\star}{\bm{x}^{\star}}^{T}$ and $\bm{P}^{\perp}:=\bm{I}_{n}-\bm{P}$. Thus we have

[TABLE]
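The decomposition used in this step can be checked numerically. The sketch below takes $\bm{P}$ to be the orthogonal projector onto the span of the unit vector $\bm{x}^{\star}$, that is $\bm{x}^{\star}{\bm{x}^{\star}}^{T}$, which is an assumption about the intended definition. The function name `split_along` is illustrative.

```python
import numpy as np

def split_along(phi, x_star):
    """Split phi into its component along x_star and its orthogonal remainder."""
    P = np.outer(x_star, x_star)          # projector onto span{x_star}, ||x_star|| = 1
    P_perp = np.eye(x_star.size) - P      # projector onto the orthogonal complement
    return P @ phi, P_perp @ phi

rng = np.random.default_rng(0)
x_star = rng.standard_normal(6)
x_star /= np.linalg.norm(x_star)
phi = rng.standard_normal(6)

along, across = split_along(phi, x_star)
print(np.allclose(along + across, phi))              # the two parts add back to phi
print(np.isclose(across @ x_star, 0.0))              # the remainder is orthogonal to x_star
print(np.allclose(along, (phi @ x_star) * x_star))   # the parallel part is <phi, x*> x*
```

For Gaussian $\bm{\Phi}_{i}$ the two components are jointly Gaussian and uncorrelated, hence independent. This is what lets the proof treat $\bm{z}_{i}$, which depends only on $\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle$, separately from $\bm{P}^{\perp}\bm{\Phi}_{i}$.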

Step 2: Bound $E_{1}$. Define $\xi_{i}:=\bm{z}_{i}\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle=\big[f(\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle)-\mu\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle\big]\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle$. By the definition of $\mu$, it is not hard to check that $\operatorname{\mathbb{E}}\xi_{i}=0$: writing $g=\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle\sim\mathcal{N}(0,1)$, we have $\operatorname{\mathbb{E}}\xi_{i}=\operatorname{\mathbb{E}}[f(g)g]-\mu\operatorname{\mathbb{E}}g^{2}=\mu-\mu=0$. Note that $\bm{z}_{i}$ have sub-Gaussian norm $K\leq C_{2}(\psi+\mu)$ (see (18)) and $\left\langle\bm{\Phi}_{i},\bm{x}^{\star}\right\rangle\sim\mathcal{N}(0,1)$. It then follows from Fact 3 that $\xi_{i}$ are i.i.d. centered sub-exponential variables with $\|\xi_{i}\|_{\psi_{1}}\leq C^{\prime}K$. Let $\epsilon=s/\sqrt{m}\leq 1$. A Bernstein-type inequality (Fact 5) implies that

[TABLE]

holds with probability at least

[TABLE]

In the last inequality we have used the facts that $\sigma^{2}=\operatorname{\mathbb{E}}\bm{z}_{i}^{2}\leq CK^{2}$ and $\epsilon\leq 1$ .

Step 3: Bound $E_{2}$ . Let $\bm{w}=\sum_{i=1}^{m}\bm{z}_{i}\bm{P}^{\perp}\bm{\Phi}_{i}$ . By the orthogonal decomposition (19), $\bm{P}^{\perp}\bm{\Phi}_{i}$ and $\bm{z}_{i}$ are independent [23, Lemma 8.1]. Fixing $\bm{z}_{i}$ , a direct calculation shows that

[TABLE]

where $k=\sqrt{{\sum_{i=1}^{m}\bm{z}_{i}^{2}}}$ . Thus, conditioning on $\bm{z}_{i}$ , $E_{2}=\sup_{\bm{a}\in\mathcal{K}_{\bm{a}}^{t}}\left\langle\bm{w},\bm{a}\right\rangle=k\cdot\sup_{\bm{a}\in\mathcal{K}_{\bm{a}}^{t}}\left\langle\bm{P}^{\perp}\bm{g},\bm{a}\right\rangle$ .

Note that $\bm{z}_{i}^{2}$ are sub-exponential variables with mean $\sigma^{2}$ and $\psi_{1}$ -norm $CK^{2}$ . By Fact 4, $\bm{z}_{i}^{2}-\sigma^{2}$ are centered sub-exponential variables with $\psi_{1}$ -norm $C^{\prime}K^{2}$ . A similar application of Bernstein-type inequality (Fact 5) yields that

[TABLE]

holds with probability at least

[TABLE]

Here we have used the fact that $\sigma^{2}=\operatorname{\mathbb{E}}\bm{z}_{i}^{2}\leq CK^{2}$ again. Therefore, with probability at least $1-2\exp(-{cm\sigma^{4}}/{K^{4}})$ ,

[TABLE]

We next bound $\sup_{\bm{a}\in\mathcal{K}_{\bm{a}}^{t}}\left\langle\bm{P}^{\perp}\bm{g},\bm{a}\right\rangle$ using Gaussian concentration. Since $\mathcal{K}_{\bm{a}}^{t}\subset t\mathbb{B}_{2}^{n}$ , the function $\bm{x}\mapsto\sup_{\bm{a}\in\mathcal{K}_{\bm{a}}^{t}}\left\langle\bm{P}^{\perp}\bm{x},\bm{a}\right\rangle$ has Lipschitz norm at most $t$ . Indeed,

[TABLE]

where we choose $\tilde{\bm{a}}$ such that $\sup_{\bm{a}\in\mathcal{K}_{\bm{a}}^{t}}\left\langle\bm{P}^{\perp}\bm{x},\bm{a}\right\rangle=\left\langle\bm{P}^{\perp}\bm{x},\tilde{\bm{a}}\right\rangle$ .

Therefore, Gaussian concentration inequality (Fact 6) implies that

[TABLE]

holds with probability at least $1-\exp(-c\epsilon^{2}m)$ . The second inequality holds because

[TABLE]

where the inequality follows from the independence of $\bm{P}{\bm{g}}$ and $\bm{P}^{\perp}{\bm{g}}$ and Jensen’s inequality.

Taking union bound yields, with probability at least $1-2\exp\Big{(}-\frac{cm\sigma^{4}}{K^{4}}\Big{)}-\exp(-c\epsilon^{2}m)$ ,

[TABLE]

Putting everything together, we conclude that, for any $0<s\leq\sqrt{m}$ (noting that $\epsilon=s/\sqrt{m}$ ),

[TABLE]

holds with probability at least

[TABLE]

Here we have used again that $\sigma^{2}\leq CK^{2}$ and $\epsilon\leq 1$ . Thus we complete the proof.

Bibliography

[1] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, “The convex geometry of linear inverse problems,” Found. Comput. Math., vol. 12, no. 6, pp. 805–849, 2012.
[2] J. A. Tropp, “Convex recovery of a structured signal from independent random linear measurements,” in Sampling Theory, a Renaissance. Springer, 2015, pp. 67–101.
[3] R. Vershynin, “Estimation in high dimensions: a geometric perspective,” in Sampling Theory, a Renaissance. Springer, 2015, pp. 3–66.
[4] C. Thrampoulidis, S. Oymak, and B. Hassibi, “Recovering structured signals in noise: least-squares meets compressed sensing,” in Compressed Sensing and its Applications. Springer, 2015, pp. 97–141.
[5] P. T. Boufounos and R. G. Baraniuk, “1-bit compressive sensing,” in Information Sciences and Systems, 2008. CISS 2008. 42nd Annual Conference on. IEEE, 2008, pp. 16–21.
[6] P. McCullagh, “Generalized linear models,” European Journal of Operational Research, vol. 16, no. 3, pp. 285–292, 1984.
[7] H. Ichimura, “Semiparametric least squares (SLS) and weighted SLS estimation of single-index models,” Journal of Econometrics, vol. 58, no. 1-2, pp. 71–120, 1993.
[8] J. L. Horowitz and W. Härdle, “Direct semiparametric estimation of single-index models with discrete covariates,” Journal of the American Statistical Association, vol. 91, no. 436, pp. 1632–1640, 1996.
