An Analysis of State Evolution for Approximate Message Passing with Side Information

Hangjin Liu; Cynthia Rush; Dror Baron

arXiv:1902.00150·cs.IT·May 7, 2019


TL;DR

This paper extends the theoretical understanding of approximate message passing algorithms by providing performance guarantees for AMP with side information, supported by numerical evidence showing accurate state evolution predictions.

Contribution

It offers the first rigorous analysis of AMP with side information, establishing conditions under which its performance can be accurately predicted by state evolution.

Findings

01. Performance guarantees for AMP-SI under joint distribution assumptions

02. State evolution accurately predicts mean square error in AMP-SI

03. Numerical results support theoretical predictions



Equations

$$\mathbf{y}=\mathbf{A}\mathbf{x}+\mathbf{w},$$

$$g_{t}(\mathbf{a},\mathbf{b})=\mathbb{E}\left[\mathbf{X}\mid\mathbf{X}+\lambda_{t}\mathcal{N}(0,\mathbb{I}_{n})=\mathbf{a},\ \widetilde{\mathbf{X}}=\mathbf{b}\right].$$

$$\mathbf{r}^{t}$$

$$\mathbf{x}^{t+1}$$

$$\lambda_{t}^{2}=\sigma_{w}^{2}+\frac{1}{\delta n}\mathbb{E}\left[||g_{t-1}(\mathbf{X}+\lambda_{t-1}\mathbf{Z},\widetilde{\mathbf{X}})-\mathbf{X}||^{2}\right],$$

$$\left|\phi(\mathbf{x})-\phi(\mathbf{y})\right|\leq L\Big(1+\Big(\frac{||\mathbf{x}||}{\sqrt{n}}\Big)^{k-1}+\Big(\frac{||\mathbf{y}||}{\sqrt{n}}\Big)^{k-1}\Big)\frac{||\mathbf{x}-\mathbf{y}||}{\sqrt{n}}.$$

$$\eta_{t}(a,b)=\mathbb{E}\left[X\mid X+\lambda_{t}\mathcal{N}(0,1)=a,\ \widetilde{X}=b\right],$$

$$\mathbf{r}^{t}=\mathbf{y}-\mathbf{A}\mathbf{x}^{t}+\frac{\mathbf{r}^{t-1}}{\delta n}\sum_{i=1}^{n}\eta_{t-1}^{\prime}\big([\mathbf{x}^{t-1}+\mathbf{A}^{T}\mathbf{r}^{t-1}]_{i},\widetilde{x}_{i}\big),$$

$$x_{i}^{t+1}=\eta_{t}\big([\mathbf{x}^{t}+\mathbf{A}^{T}\mathbf{r}^{t}]_{i},\widetilde{x}_{i}\big),\quad\text{for }i=1,2,\dots,n,$$

$$\lambda_{t}^{2}=\sigma_{w}^{2}+\frac{1}{\delta}\mathbb{E}\left[(\eta_{t-1}(X+\lambda_{t-1}Z,\widetilde{X})-X)^{2}\right],$$

$$\phi_{m}(\mathbf{a},\mathbf{b}):=\frac{1}{m}\sum_{i=1}^{m}\phi(a_{i},b_{i}),\qquad\psi_{n}(\mathbf{x},\mathbf{y},\widetilde{\mathbf{x}}):=\frac{1}{n}\sum_{i=1}^{n}\psi(x_{i},y_{i},\widetilde{x}_{i}).$$

$$\lim_{m}\phi_{m}(\mathbf{r}^{t},\mathbf{w})\overset{p}{=}\lim_{m}\mathbb{E}\Big[\phi_{m}\Big(\mathbf{W}+\sqrt{\lambda_{t}^{2}-\sigma_{w}^{2}}\,\mathbf{Z}_{1},\mathbf{W}\Big)\Big],$$

$$\lim_{n}\psi_{n}(\mathbf{x}^{t}+\mathbf{A}^{T}\mathbf{r}^{t},\mathbf{x},\widetilde{\mathbf{x}})\overset{p}{=}\lim_{n}\mathbb{E}\big[\psi_{n}(\mathbf{X}+\lambda_{t}\mathbf{Z}_{2},\mathbf{X},\widetilde{\mathbf{X}})\big],$$

$$\lim_{n\to\infty}\frac{1}{n}||\mathbf{x}^{t}+\mathbf{A}^{T}\mathbf{r}^{t}-\mathbf{x}||^{2}\overset{p}{=}\lambda_{t}^{2},$$

$$\lim_{n\to\infty}\frac{1}{n}||\mathbf{x}^{t+1}-\mathbf{x}||^{2}\overset{p}{=}\delta(\lambda_{t+1}^{2}-\sigma_{w}^{2}).$$

$$\Big\lvert\frac{\partial}{\partial x}\phi(x,y)\Big\rvert\leq\mathsf{D}_{1}\qquad\text{and}\qquad\Big\lvert\frac{\partial}{\partial y}\phi(x,y)\Big\rvert\leq\mathsf{D}_{2}$$

$$\begin{aligned}|\phi(x_{1},y_{1})-\phi(x_{2},y_{2})|&=|\phi(x_{1},y_{1})-\phi(x_{1},y_{2})+\phi(x_{1},y_{2})-\phi(x_{2},y_{2})|\\&\leq|\phi(x_{1},y_{1})-\phi(x_{1},y_{2})|+|\phi(x_{1},y_{2})-\phi(x_{2},y_{2})|\\&\leq\mathsf{D}_{2}|y_{1}-y_{2}|+\mathsf{D}_{1}|x_{1}-x_{2}|\\&\leq\sqrt{\mathsf{D}_{2}^{2}+\mathsf{D}_{1}^{2}}\sqrt{(y_{1}-y_{2})^{2}+(x_{1}-x_{2})^{2}}\\&=\sqrt{\mathsf{D}_{2}^{2}+\mathsf{D}_{1}^{2}}\,||(x_{1},y_{1})-(x_{2},y_{2})||.\end{aligned}$$

$$\widetilde{\mathbf{X}}=\mathbf{X}+\mathcal{N}(0,\sigma^{2}\mathbb{I}).$$

$$\eta_{t}(a,b)=\frac{\sigma_{x}^{2}\sigma^{2}a+\sigma_{x}^{2}\lambda_{t}^{2}b}{\sigma_{x}^{2}(\sigma^{2}+\lambda_{t}^{2})+\sigma^{2}\lambda_{t}^{2}}.$$

$$\lambda_{t}^{2}=\sigma_{w}^{2}+\frac{1}{\delta}\left[\frac{\sigma_{x}^{2}\sigma^{2}\lambda_{t-1}^{2}}{\sigma_{x}^{2}(\sigma^{2}+\lambda_{t-1}^{2})+\sigma^{2}\lambda_{t-1}^{2}}\right].$$

$$\Big\lvert\frac{\partial}{\partial a}\eta_{t}(a,b)\Big\rvert=\Big\lvert\frac{\sigma_{x}^{2}\sigma^{2}}{\sigma_{x}^{2}(\sigma^{2}+\lambda_{t}^{2})+\sigma^{2}\lambda_{t}^{2}}\Big\rvert\leq 1,$$

$$\Big\lvert\frac{\partial}{\partial b}\eta_{t}(a,b)\Big\rvert=\Big\lvert\frac{\sigma_{x}^{2}\lambda_{t}^{2}}{\sigma_{x}^{2}(\sigma^{2}+\lambda_{t}^{2})+\sigma^{2}\lambda_{t}^{2}}\Big\rvert\leq 1,$$

$$\begin{aligned}\eta_{t}(a,b)&=\Pr(X\neq 0\mid a,b)\,\mathbb{E}[X\mid a,b,X\neq 0]\\&=\Pr(X\neq 0\mid a,b)\,\frac{\sigma^{2}a+\lambda_{t}^{2}b}{\sigma^{2}+\lambda_{t}^{2}+\sigma^{2}\lambda_{t}^{2}},\end{aligned}$$

$$\Pr(X\neq 0\mid a,b)=(1+T_{a,b})^{-1},$$

$$\begin{aligned}T_{a,b}&:=\frac{(1-\epsilon)\rho_{\lambda_{t}^{2}}(a)\rho_{\sigma^{2}}(b)}{\epsilon\rho_{1+\sigma^{2}}(b)\rho_{\frac{\sigma^{2}}{1+\sigma^{2}}+\lambda_{t}^{2}}\Big(\frac{b}{1+\sigma^{2}}-a\Big)}\\&=\Big(\frac{1-\epsilon}{\epsilon}\Big)\sqrt{\frac{\sigma^{2}+\lambda_{t}^{2}+\sigma^{2}\lambda_{t}^{2}}{\lambda_{t}^{2}\sigma^{2}}}\exp\Big\{\frac{-(\sigma^{2}a+\lambda_{t}^{2}b)^{2}}{2\sigma^{2}\lambda_{t}^{2}(\sigma^{2}+\lambda_{t}^{2}+\sigma^{2}\lambda_{t}^{2})}\Big\}\\&=\Big(\frac{1-\epsilon}{\epsilon}\Big)\frac{\nu_{t}\sqrt{2\pi}}{\lambda_{t}^{2}\sigma^{2}}\rho_{\nu_{t}}(\sigma^{2}a+\lambda_{t}^{2}b),\end{aligned}$$

$$\lambda_{t}^{2}=\sigma_{w}^{2}+\frac{1}{\delta}\Big(\frac{T_{a,b}}{1+T_{a,b}}\Big)^{2}\left[\frac{(\sigma^{2}+\lambda_{t-1}^{2})+\sigma^{2}\lambda_{t-1}^{2}}{\sigma^{2}+\lambda_{t-1}^{2}+\sigma^{2}\lambda_{t-1}^{2}}\right].$$

$$f_{a,b}:=\frac{\sigma^{2}a+\lambda_{t}^{2}b}{\sigma^{2}+\lambda_{t}^{2}+\sigma^{2}\lambda_{t}^{2}}.$$

$$\eta_{t}(a,b)$$

$$\begin{aligned}\Big\lvert\frac{\partial\eta_{t}(a,b)}{\partial a}\Big\rvert&=\Big\lvert\frac{1}{1+T_{a,b}}\Big[\frac{\partial f_{a,b}}{\partial a}\Big]-\frac{1}{(1+T_{a,b})^{2}}\Big[\frac{\partial T_{a,b}}{\partial a}\Big]f_{a,b}\Big\rvert\\&\leq\frac{(1+2T_{a,b})}{(1+T_{a,b})^{2}}\Big\lvert\frac{\partial f_{a,b}}{\partial a}\Big\rvert+\frac{1}{(1+T_{a,b})^{2}}\Big\lvert\frac{\partial(T_{a,b}f_{a,b})}{\partial a}\Big\rvert.\end{aligned}$$

$$\frac{(1+2T_{a,b})}{(1+T_{a,b})^{2}}\Big\lvert\frac{\partial f_{a,b}}{\partial a}\Big\rvert\leq 1.$$


Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Distributed Sensor Networks and Detection Algorithms · Blind Source Separation Techniques

Full text

An Analysis of State Evolution for Approximate Message Passing with Side Information

Hangjin Liu

NC State University

Email: [email protected]

Cynthia Rush

Columbia University

Email: [email protected]

Dror Baron

NC State University

Email: [email protected]

Abstract

A common goal in many research areas is to reconstruct an unknown signal $\mathbf{x}$ from noisy linear measurements. Approximate message passing (AMP) is a class of low-complexity algorithms for efficiently solving such high-dimensional regression tasks. Often, it is the case that side information (SI) is available during reconstruction. For this reason a novel algorithmic framework that incorporates SI into AMP, referred to as approximate message passing with side information (AMP-SI), has been recently introduced. An attractive feature of AMP is that when the elements of the signal are exchangeable, the entries of the measurement matrix are independent and identically distributed (i.i.d.) Gaussian, and the denoiser applies the same non-linearity at each entry, the performance of AMP can be predicted accurately by a scalar iteration referred to as state evolution (SE). However, the AMP-SI framework uses different entry-wise scalar denoisers, based on the entry-wise level of the SI, and therefore is not supported by the standard AMP theory. In this work, we provide rigorous performance guarantees for AMP-SI when the input signal and SI are drawn i.i.d. according to some joint distribution subject to finite moment constraints. Moreover, we provide numerical examples to support the theory which demonstrate empirically that the SE can predict the AMP-SI mean square error accurately.

I Introduction

High-dimensional linear regression is a well-studied model used in many applications, including compressed sensing [1], imaging [2], and machine learning and statistics [3]. The unknown signal $\mathbf{x}\in\mathbb{R}^{n}$ is viewed through the linear model

$$\mathbf{y}=\mathbf{A}\mathbf{x}+\mathbf{w},\tag{1}$$

where $\mathbf{y}\in\mathbb{R}^{m}$ are the measurements, $\mathbf{A}\in\mathbb{R}^{m\times n}$ is a known measurement matrix, and $\mathbf{w}\in\mathbb{R}^{m}$ is measurement noise. The goal is to estimate the unknown signal $\mathbf{x}$ knowing only the noisy measurements $\mathbf{y}$ and the measurement matrix $\mathbf{A}$. When the problem is under-determined (i.e., $m<n$), successful reconstruction requires exploiting structural or probabilistic characteristics of the input signal $\mathbf{x}$. Often a prior distribution on the input signal $\mathbf{x}$ is assumed, and in this case approximate message passing (AMP) algorithms [1] can be used for the reconstruction task.

AMP [1, 4] is a class of low-complexity algorithms for efficiently solving high-dimensional regression tasks (1). AMP works by iteratively generating estimates of the unknown input vector, $\mathbf{x}$, using a possibly non-linear denoiser function tailored to any prior knowledge about $\mathbf{x}$. One favorable feature of AMP is that under some technical conditions on the measurement matrix $\mathbf{A}$ and $\mathbf{x}$, the observations at each iteration of the algorithm are almost surely equal in distribution to $\mathbf{x}$ plus independent and identically distributed (i.i.d.) Gaussian noise in the large system limit.

AMP with Side Information (AMP-SI): In information theory [5], when different communication systems share side information (SI), overall communication can become more efficient. Recently [6, 7], a novel algorithmic framework, referred to as AMP-SI, has been introduced for incorporating SI into AMP for high-dimensional regression tasks (1). AMP-SI has been empirically demonstrated to have good reconstruction quality and is easy to use. For example, we have proposed to use AMP-SI for channel estimation in emerging millimeter wave communication systems [8], where the time dynamics of the channel structure allow previous channel estimates to be used as SI when estimating the current channel structure [7].

We model the observed SI, denoted by $\widetilde{\mathbf{x}}\in\mathbb{R}^{n}$, as depending statistically on the unknown signal $\mathbf{x}$ through some joint probability density function (pdf), $f(\mathbf{X},\widetilde{\mathbf{X}})$. AMP-SI uses a conditional denoiser, $g_{t}:\mathbb{R}^{2n}\rightarrow\mathbb{R}^{n}$, to incorporate SI,

$$g_{t}(\mathbf{a},\mathbf{b})=\mathbb{E}\left[\mathbf{X}\mid\mathbf{X}+\lambda_{t}\mathcal{N}(0,\mathbb{I}_{n})=\mathbf{a},\ \widetilde{\mathbf{X}}=\mathbf{b}\right].\tag{2}$$

The AMP-SI algorithm iteratively updates estimates of the input signal $\mathbf{x}$ : let $\mathbf{x}^{0}=\mathbf{0}$ , the all-zeros vector, then

[TABLE]

where $\mathbf{x}^{t}\in\mathbb{R}^{n}$ is the estimate of $\mathbf{x}$ at iteration $t$ and $\delta=\frac{m}{n}$ is the measurement rate. For a differentiable function $g:\mathbb{R}^{2n}\rightarrow\mathbb{R}^{n}$ we write $\text{div}\,g(\mathbf{a},\mathbf{b})=\sum_{i=1}^{n}\frac{\partial g_{i}}{\partial a_{i}}(\mathbf{a},\mathbf{b})$. Using the denoiser in (2), the AMP-SI algorithm (3)-(4) provides the minimum mean squared error (MMSE) estimate of the signal when SI $\widetilde{\mathbf{x}}$ is available [6].

State Evolution (SE): It has been proven that the performance of AMP, as measured, for example, by the normalized squared $\ell_{2}$-error $\frac{1}{n}||\mathbf{x}^{t}-\mathbf{x}||_{2}^{2}$ between the estimate $\mathbf{x}^{t}$ and the true signal $\mathbf{x}$, can be accurately predicted by a scalar recursion referred to as SE [9, 10] when the measurement matrix $\mathbf{A}$ is i.i.d. Gaussian, under various assumptions on the elements of the signal. The SE equation for AMP-SI is as follows. Assume the entries of the noise $\mathbf{w}$ are i.i.d. $\sim f(W)$ with $\sigma_{w}^{2}=\mathbb{E}[W^{2}]$, and let $\lambda_{0}^{2}=\sigma_{w}^{2}+\mathbb{E}[||\mathbf{X}||^{2}]/(n\delta)$. Then for $t\geq 1$,

$$\lambda_{t}^{2}=\sigma_{w}^{2}+\frac{1}{\delta n}\mathbb{E}\left[||g_{t-1}(\mathbf{X}+\lambda_{t-1}\mathbf{Z},\widetilde{\mathbf{X}})-\mathbf{X}||^{2}\right],\tag{5}$$

where $(\mathbf{X},\widetilde{\mathbf{X}})\sim f(\mathbf{X},\widetilde{\mathbf{X}})$ are independent of $\mathbf{Z}\sim\mathcal{N}(0,\mathbb{I}_{n})$, and where we use $\mathcal{N}(\mu,\sigma^{2})$ to denote a Gaussian distribution with mean $\mu$ and variance $\sigma^{2}$.

Considering AMP-SI (3)-(4), however, we cannot directly apply the existing AMP theoretical results [9, 10], as the conditional denoiser (2) depends on the index $i$ through the SI, meaning that different scalar denoisers will be used at different indices within the AMP-SI iterations. Recent results [11], however, extend the asymptotic SE analysis to a larger class of possible denoisers, allowing, for example, each element of the input to use a different non-linear denoiser as is the case in AMP-SI. We employ these results to rigorously relate the SE presented in (5) to the AMP-SI algorithm in (3)-(4).

Related Work: While integrating SI into reconstruction algorithms is not new, AMP-SI introduces a unified framework within AMP supporting arbitrary signal and SI dependencies. Prior work using SI has been either heuristic, limited to specific applications, or outside the AMP framework.

For example, Wang and Liang [12] integrate SI into AMP for a specific signal prior density, but the method is difficult to apply to other signal models. Ziniel and Schniter [13] develop an AMP-based reconstruction algorithm for a time-varying signal model based on Markov processes for the support and amplitude. This signal model is easily incorporated into the AMP-SI framework as discussed in the analysis of the birth-death-drift model of [6, 7]. Manoel et al. implement an AMP-based algorithm in which the input signal is repeatedly reconstructed in a streaming fashion, and information from past reconstruction attempts is aggregated into a prior, thus improving ongoing reconstruction results [14]. This reconstruction scheme resembles that of AMP-SI, in particular when the Bernoulli-Gaussian model is used (see Section II-B).

Contribution and Outline: Ma et al. use numerical experiments to show that SE (5) accurately tracks the performance of AMP-SI (3)-(4) [7], as was shown rigorously for standard AMP. Ma et al. conjecture that rigorous theoretical guarantees can be given for AMP-SI as well [7]. In this work, we analyze AMP-SI performance when the input signal and SI are drawn i.i.d. according to a general pdf $f(\mathbf{X},\widetilde{\mathbf{X}})$ obeying some finite moment conditions, the AMP-SI denoiser (2) is Lipschitz, and the measurement matrix $\mathbf{A}$ is i.i.d. Gaussian.

In Section II, we give the main results, examples for various signal and SI models, and numerical experiments comparing the empirical performance of AMP-SI and the SE predictions. The proof of our main theorem is provided in Section III.

II Main Results

II-A Main Theorem

Our main result provides AMP-SI performance guarantees when considering pseudo-Lipschitz loss functions, which we define in the following.

Definition II.1.

Pseudo-Lipschitz functions [11]: For $k\in\mathbb{N}_{>0}$ and any $n\in\mathbb{N}_{>0}$, a function $\phi:\mathbb{R}^{n}\to\mathbb{R}$ is pseudo-Lipschitz of order $k$, or PL(k), if there exists a constant $L$, referred to as the pseudo-Lipschitz constant of $\phi$, such that for $\mathbf{x},\mathbf{y}\in\mathbb{R}^{n}$

$$\left|\phi(\mathbf{x})-\phi(\mathbf{y})\right|\leq L\Big(1+\Big(\frac{||\mathbf{x}||}{\sqrt{n}}\Big)^{k-1}+\Big(\frac{||\mathbf{y}||}{\sqrt{n}}\Big)^{k-1}\Big)\frac{||\mathbf{x}-\mathbf{y}||}{\sqrt{n}}.$$

For $k=1$, this definition coincides with the standard definition of a Lipschitz function.

A sequence (in $n$) of PL(k) functions $\{\phi_{n}\}_{n\in\mathbb{N}_{>0}}$ is called uniformly pseudo-Lipschitz of order $k$, or uniformly PL(k), if, denoting by $L_{n}$ the pseudo-Lipschitz constant of $\phi_{n}$, we have $L_{n}<\infty$ for each $n$ and $\lim\sup_{n\to\infty}L_{n}<\infty$.

Throughout the work, $||\cdot||$ denotes the Euclidean norm, and $\overset{p}{=}$ denotes convergence in probability. In the case of $(\mathbf{X},\widetilde{\mathbf{X}})$ sampled i.i.d. from $f(X,\widetilde{X})$, the AMP-SI denoiser (originally defined in (2)) is separable: define $\eta_{t}:\mathbb{R}^{2}\rightarrow\mathbb{R}$ as

$$\eta_{t}(a,b)=\mathbb{E}\left[X\mid X+\lambda_{t}\mathcal{N}(0,1)=a,\ \widetilde{X}=b\right],\tag{6}$$

and the AMP-SI algorithm in (3)-(4) simplifies to

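$$\mathbf{r}^{t}=\mathbf{y}-\mathbf{A}\mathbf{x}^{t}+\frac{\mathbf{r}^{t-1}}{\delta n}\sum_{i=1}^{n}\eta_{t-1}^{\prime}\big([\mathbf{x}^{t-1}+\mathbf{A}^{T}\mathbf{r}^{t-1}]_{i},\widetilde{x}_{i}\big),\tag{7}$$

$$x_{i}^{t+1}=\eta_{t}\big([\mathbf{x}^{t}+\mathbf{A}^{T}\mathbf{r}^{t}]_{i},\widetilde{x}_{i}\big),\quad\text{for }i=1,2,\dots,n,\tag{8}$$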

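where the derivative $\eta_{t}^{\prime}(s,\cdot)=\frac{\partial}{\partial s}\eta_{t}(s,\cdot)$. For the denoiser in (6), the SE is as follows: let $\lambda_{0}^{2}=\sigma_{w}^{2}+\mathbb{E}[X^{2}]/\delta$ and for $t\geq 1$,

$$\lambda_{t}^{2}=\sigma_{w}^{2}+\frac{1}{\delta}\mathbb{E}\left[(\eta_{t-1}(X+\lambda_{t-1}Z,\widetilde{X})-X)^{2}\right],\tag{9}$$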

where $(X,\widetilde{X})\sim f(X,\widetilde{X})$ are independent of $Z\sim\mathcal{N}(0,1)$ .
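When the expectation in the scalar recursion (9) has no closed form, one option is to estimate it by Monte Carlo sampling. The Python sketch below illustrates this under stated assumptions: the function names, the `eta(a, b, lam2)` calling convention, the sample size, and the toy denoiser that simply returns the SI are all hypothetical choices for this example and are not taken from the paper. The initial value is interpreted as $\lambda_{0}^{2}$.

```python
import numpy as np

def state_evolution_mc(eta, sample_pair, sigma_w2, delta, num_iters,
                       n_mc=200_000, seed=0):
    """Monte Carlo estimate of the scalar SE recursion.

    eta(a, b, lam2): denoiser applied elementwise, lam2 is the current lambda^2.
    sample_pair(rng, size): returns (x, x_tilde) drawn i.i.d. from f(X, X~).
    Returns the array [lambda_0^2, lambda_1^2, ..., lambda_T^2].
    """
    rng = np.random.default_rng(seed)
    x, x_tilde = sample_pair(rng, n_mc)
    lam2 = sigma_w2 + np.mean(x ** 2) / delta        # initial value lambda_0^2
    history = [lam2]
    for _ in range(num_iters):
        z = rng.standard_normal(n_mc)                # Z ~ N(0, 1), independent of (X, X~)
        estimate = eta(x + np.sqrt(lam2) * z, x_tilde, lam2)
        lam2 = sigma_w2 + np.mean((estimate - x) ** 2) / delta
        history.append(lam2)
    return np.array(history)

def sample_gaussian_pair(rng, size, sigma2=0.04):
    """Hypothetical model: X ~ N(0, 1), SI = X + N(0, sigma2)."""
    x = rng.standard_normal(size)
    return x, x + np.sqrt(sigma2) * rng.standard_normal(size)

def si_only_denoiser(a, b, lam2):
    """Toy (not Bayes-optimal) denoiser that ignores the pseudo-data and returns the SI."""
    return b

lam2_path = state_evolution_mc(si_only_denoiser, sample_gaussian_pair,
                               sigma_w2=0.01, delta=0.3, num_iters=3)
```

For this toy denoiser the squared error is just the SI noise variance, so every iterate after the first should sit near $\sigma_{w}^{2}+\sigma^{2}/\delta$, which gives a quick sanity check of the sampling code before plugging in a conditional-expectation denoiser.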

Theorem II.1.

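For any PL(2) functions $\phi:\mathbb{R}^{2}\rightarrow\mathbb{R}$ and $\psi:\mathbb{R}^{3}\rightarrow\mathbb{R}$, define sequences of functions $\phi_{m}:\mathbb{R}^{2m}\rightarrow\mathbb{R}$ and $\psi_{n}:\mathbb{R}^{3n}\rightarrow\mathbb{R}$ as follows: for vectors $\mathbf{a},\mathbf{b}\in\mathbb{R}^{m}$ and $\mathbf{x},\mathbf{y},\widetilde{\mathbf{x}}\in\mathbb{R}^{n}$,

$$\phi_{m}(\mathbf{a},\mathbf{b}):=\frac{1}{m}\sum_{i=1}^{m}\phi(a_{i},b_{i}),\qquad\psi_{n}(\mathbf{x},\mathbf{y},\widetilde{\mathbf{x}}):=\frac{1}{n}\sum_{i=1}^{n}\psi(x_{i},y_{i},\widetilde{x}_{i}).\tag{10}$$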

Then the functions in (10) are uniformly PL(2). Next, assume the following:

(A1)

The measurement matrix $\mathbf{A}$ has i.i.d. Gaussian entries with mean $0$ and variance $1/m$.

(A2)

The noise $\mathbf{w}$ is i.i.d. $\sim f(W)$ with finite $\mathbb{E}[|W|^{\max\{k,2\}}]$ .

(A3)

The signal and SI $(\mathbf{x},\widetilde{\mathbf{x}})$ are sampled i.i.d. from $f(X,\widetilde{X})$ with finite $\mathbb{E}[|X|^{2}]$ , finite $\mathbb{E}[|\widetilde{X}|^{2}]$ , and finite $\mathbb{E}[|X\widetilde{X}|]$ .

(A4)

For $t\geq 0$ , the denoisers $\eta_{t}(\cdot,\cdot)$ defined in (6) are Lipschitz continuous: for scalars $a_{1},a_{2},b_{1},b_{2}$ , and constant $L>0$ , $|\eta_{t}(a_{1},b_{1})-\eta_{t}(a_{2},b_{2})|\leq L||(a_{1},b_{1})-(a_{2},b_{2})||$ .

Then, we have the following asymptotic results for the functions defined in (10),

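$$\begin{aligned}\lim_{m}\phi_{m}(\mathbf{r}^{t},\mathbf{w})&\overset{p}{=}\lim_{m}\mathbb{E}\Big[\phi_{m}\Big(\mathbf{W}+\sqrt{\lambda_{t}^{2}-\sigma_{w}^{2}}\,\mathbf{Z}_{1},\mathbf{W}\Big)\Big],\\\lim_{n}\psi_{n}(\mathbf{x}^{t}+\mathbf{A}^{T}\mathbf{r}^{t},\mathbf{x},\widetilde{\mathbf{x}})&\overset{p}{=}\lim_{n}\mathbb{E}\big[\psi_{n}(\mathbf{X}+\lambda_{t}\mathbf{Z}_{2},\mathbf{X},\widetilde{\mathbf{X}})\big],\end{aligned}\tag{11}$$

where $\mathbf{Z}_{1}\sim\mathcal{N}(0,\mathbb{I}_{m})$ and $\mathbf{Z}_{2}\sim\mathcal{N}(0,\mathbb{I}_{n})$ are independent of $\mathbf{W}\sim i.i.d.\ f(W)$ and $(\mathbf{X},\widetilde{\mathbf{X}})\sim i.i.d.\ f(X,\widetilde{X})$. Here $\mathbf{x}^{t}$ and $\mathbf{r}^{t}$ are defined in the AMP-SI recursion (7)-(8), and $\lambda_{t}$ in the SE (9).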

Section III contains the proof of Theorem II.1. The proof follows from Berthier et al. [11, Theorem 14] and the strong law of large numbers. The main details involve showing that assumptions $\textbf{(A1)}-\textbf{(A4)}$ allow us to apply [11, Theorem 14].

As a concrete example of how Theorem II.1 provides performance guarantees for AMP-SI, let us consider a few interesting pseudo-Lipschitz loss functions.

Corollary II.1.1.

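Under assumptions $\textbf{(A1)}-\textbf{(A4)}$, letting $\psi^{1}:\mathbb{R}^{3}\rightarrow\mathbb{R}$ be $\psi^{1}(x,y,z)=(x-y)^{2}$, then by Theorem II.1,

$$\lim_{n\to\infty}\frac{1}{n}||\mathbf{x}^{t}+\mathbf{A}^{T}\mathbf{r}^{t}-\mathbf{x}||^{2}\overset{p}{=}\lambda_{t}^{2},$$

where $\lambda_{t}^{2}$ is defined in (5). Similarly, if $\psi^{2}:\mathbb{R}^{3}\rightarrow\mathbb{R}$ is defined as $\psi^{2}(x,y,z)=(\eta_{t}(x,z)-y)^{2}$, then by Theorem II.1

$$\lim_{n\to\infty}\frac{1}{n}||\mathbf{x}^{t+1}-\mathbf{x}||^{2}\overset{p}{=}\delta(\lambda_{t+1}^{2}-\sigma_{w}^{2}).$$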

When $\eta_{t}$ is Lipschitz, it is straightforward to show that $\psi^{1}$ and $\psi^{2}$ are both PL(2), and thus Theorem II.1 can be applied.
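As an illustration of the claim for $\psi^{1}$, the short derivation below checks the PL(2) condition of Definition II.1 directly with $n=3$. The points $\mathbf{u},\mathbf{v}$, the differences $d_{1},d_{2}$, and the resulting constant $6$ are introduced here for the example only; the constant is one valid choice, not necessarily the smallest.

```latex
% Let u = (x_1, y_1, z_1), v = (x_2, y_2, z_2) and d_j = x_j - y_j for j = 1, 2.
\begin{align*}
|\psi^{1}(\mathbf{u})-\psi^{1}(\mathbf{v})|
  &= |d_{1}^{2}-d_{2}^{2}| = |d_{1}-d_{2}|\,|d_{1}+d_{2}| \\
  &\leq \sqrt{2}\,\|\mathbf{u}-\mathbf{v}\| \cdot \sqrt{2}\,(\|\mathbf{u}\|+\|\mathbf{v}\|) \\
  &= 6\Big(\frac{\|\mathbf{u}\|}{\sqrt{3}}+\frac{\|\mathbf{v}\|}{\sqrt{3}}\Big)\frac{\|\mathbf{u}-\mathbf{v}\|}{\sqrt{3}} \\
  &\leq 6\Big(1+\frac{\|\mathbf{u}\|}{\sqrt{3}}+\frac{\|\mathbf{v}\|}{\sqrt{3}}\Big)\frac{\|\mathbf{u}-\mathbf{v}\|}{\sqrt{3}}.
\end{align*}
% The second line uses |s - t| <= sqrt(2) * sqrt(s^2 + t^2) for the pairs
% (x_1 - x_2, y_1 - y_2), (x_1, y_1), and (x_2, y_2).
```

The argument for $\psi^{2}$ has the same shape, with the Lipschitz property of $\eta_{t}$ used to bound the difference of the denoiser outputs.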

II-B Examples

Next, we consider a few signal and SI models to show how one can derive the denoiser in (2), use this to construct the AMP-SI algorithm and the SE, and apply Theorem II.1. Before we get to the examples, we state a lemma showing that functions with bounded partial derivatives are Lipschitz.

Lemma II.1.1.

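A function $\phi:\mathbb{R}^{2}\rightarrow\mathbb{R}$ having bounded derivatives, $0<\mathsf{D}_{1},\mathsf{D}_{2}<\infty$,

$$\Big\lvert\frac{\partial}{\partial x}\phi(x,y)\Big\rvert\leq\mathsf{D}_{1}\qquad\text{and}\qquad\Big\lvert\frac{\partial}{\partial y}\phi(x,y)\Big\rvert\leq\mathsf{D}_{2},$$

is Lipschitz continuous with Lipschitz constant $\sqrt{\mathsf{D}_{1}^{2}+\mathsf{D}_{2}^{2}}$.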

Proof.

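The result follows from the triangle inequality and the Cauchy-Schwarz inequality,

$$\begin{aligned}|\phi(x_{1},y_{1})-\phi(x_{2},y_{2})|&=|\phi(x_{1},y_{1})-\phi(x_{1},y_{2})+\phi(x_{1},y_{2})-\phi(x_{2},y_{2})|\\&\leq|\phi(x_{1},y_{1})-\phi(x_{1},y_{2})|+|\phi(x_{1},y_{2})-\phi(x_{2},y_{2})|\\&\leq\mathsf{D}_{2}|y_{1}-y_{2}|+\mathsf{D}_{1}|x_{1}-x_{2}|\\&\leq\sqrt{\mathsf{D}_{2}^{2}+\mathsf{D}_{1}^{2}}\sqrt{(y_{1}-y_{2})^{2}+(x_{1}-x_{2})^{2}}\\&=\sqrt{\mathsf{D}_{2}^{2}+\mathsf{D}_{1}^{2}}\,||(x_{1},y_{1})-(x_{2},y_{2})||.\end{aligned}$$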

∎
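A quick numerical spot check of Lemma II.1.1 can be sketched as follows. The test function `phi`, its derivative bounds, and the sampling box are assumptions chosen for illustration; the check only confirms that random pairs of points never exceed the Lipschitz constant given by the lemma and does not replace the proof.

```python
import numpy as np

def phi(x, y):
    """Hypothetical test function with |d/dx phi| <= 2 and |d/dy phi| <= 3."""
    return 2.0 * np.sin(x) + np.tanh(3.0 * y)

D1, D2 = 2.0, 3.0
lipschitz_bound = np.hypot(D1, D2)          # sqrt(D1^2 + D2^2) from the lemma

rng = np.random.default_rng(1)
p = rng.uniform(-5.0, 5.0, size=(100_000, 2))
q = rng.uniform(-5.0, 5.0, size=(100_000, 2))

# Ratio |phi(p) - phi(q)| / ||p - q|| for each random pair of points.
numerator = np.abs(phi(p[:, 0], p[:, 1]) - phi(q[:, 0], q[:, 1]))
denominator = np.linalg.norm(p - q, axis=1)
max_ratio = np.max(numerator / denominator)
```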

II-B1 Gaussian-Gaussian Signal and SI

In this model, referred to as the GG model henceforth, the signal has i.i.d. Gaussian entries with zero mean and finite variance and we have access to SI in the form of the signal with additive white Gaussian noise (AWGN). The signal, $X$ , and SI, $\widetilde{X}$ , are related by

$$\widetilde{\mathbf{X}}=\mathbf{X}+\mathcal{N}(0,\sigma^{2}\mathbb{I}).\tag{13}$$

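In this case, the AMP-SI denoiser (2) equals [7]

$$\eta_{t}(a,b)=\frac{\sigma_{x}^{2}\sigma^{2}a+\sigma_{x}^{2}\lambda_{t}^{2}b}{\sigma_{x}^{2}(\sigma^{2}+\lambda_{t}^{2})+\sigma^{2}\lambda_{t}^{2}}.$$

Then the SE (5) can be computed as

$$\lambda_{t}^{2}=\sigma_{w}^{2}+\frac{1}{\delta}\left[\frac{\sigma_{x}^{2}\sigma^{2}\lambda_{t-1}^{2}}{\sigma_{x}^{2}(\sigma^{2}+\lambda_{t-1}^{2})+\sigma^{2}\lambda_{t-1}^{2}}\right].$$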

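By Lemma II.1.1, this denoiser is Lipschitz continuous because

$$\Big\lvert\frac{\partial}{\partial a}\eta_{t}(a,b)\Big\rvert=\Big\lvert\frac{\sigma_{x}^{2}\sigma^{2}}{\sigma_{x}^{2}(\sigma^{2}+\lambda_{t}^{2})+\sigma^{2}\lambda_{t}^{2}}\Big\rvert\leq 1$$

and

$$\Big\lvert\frac{\partial}{\partial b}\eta_{t}(a,b)\Big\rvert=\Big\lvert\frac{\sigma_{x}^{2}\lambda_{t}^{2}}{\sigma_{x}^{2}(\sigma^{2}+\lambda_{t}^{2})+\sigma^{2}\lambda_{t}^{2}}\Big\rvert\leq 1.$$

Therefore the assumptions $\textbf{(A1)}-\textbf{(A4)}$ are satisfied in the GG case and we can apply Theorem II.1.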
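The GG denoiser and its closed-form SE recursion are simple enough to evaluate directly. The following Python sketch is one way to do so, assuming a signal variance `sigma_x2`, SI noise variance `sigma2`, and an initial value $\lambda_{0}^{2}=\sigma_{w}^{2}+\sigma_{x}^{2}/\delta$; the function names and the measurement rate used in the example call are hypothetical.

```python
import numpy as np

def gg_denoiser(a, b, lam2, sigma_x2, sigma2):
    """Linear conditional-mean denoiser for the GG model (pseudo-data a, SI b)."""
    den = sigma_x2 * (sigma2 + lam2) + sigma2 * lam2
    return (sigma_x2 * sigma2 * a + sigma_x2 * lam2 * b) / den

def gg_mmse(lam2, sigma_x2, sigma2):
    """Bracketed term of the GG SE recursion: the MSE of gg_denoiser at noise level lam2."""
    return sigma_x2 * sigma2 * lam2 / (sigma_x2 * (sigma2 + lam2) + sigma2 * lam2)

def gg_state_evolution(sigma_x2, sigma2, sigma_w2, delta, num_iters):
    """Return [lambda_0^2, ..., lambda_T^2] for the GG model."""
    lam2 = sigma_w2 + sigma_x2 / delta       # initial value lambda_0^2
    path = [lam2]
    for _ in range(num_iters):
        lam2 = sigma_w2 + gg_mmse(lam2, sigma_x2, sigma2) / delta
        path.append(lam2)
    return np.array(path)

# Example call; delta = 0.3 is an assumed measurement rate.
gg_path = gg_state_evolution(sigma_x2=1.0, sigma2=0.04, sigma_w2=0.01,
                             delta=0.3, num_iters=10)
```

Because the update map is increasing in $\lambda_{t-1}^{2}$ and the first step already lowers the value for these parameters, the sequence decreases monotonically toward its fixed point.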

II-B2 Bernoulli-Gaussian Signal and SI

The Bernoulli-Gaussian (BG) model reflects a scenario in which one wishes to recover a sparse signal and has access to SI in the form of the signal with AWGN as in (13). In this model, each entry of the signal is independently generated according to $x_{i}\sim\epsilon\mathcal{N}(0,1)+(1-\epsilon)\delta_{0}$, where $\delta_{0}$ is the Dirac delta function at $0$. In words, the entries of the signal independently take the value $0$ with probability $1-\epsilon$ and are $\mathcal{N}(0,1)$ with probability $\epsilon$. In this case, the AMP-SI denoiser (2) equals [7]

$$\begin{aligned}\eta_{t}(a,b)&=\Pr(X\neq 0\mid a,b)\,\mathbb{E}[X\mid a,b,X\neq 0]\\&=\Pr(X\neq 0\mid a,b)\,\frac{\sigma^{2}a+\lambda_{t}^{2}b}{\sigma^{2}+\lambda_{t}^{2}+\sigma^{2}\lambda_{t}^{2}},\end{aligned}\tag{16}$$

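where, letting $\rho_{\tau^{2}}(x)$ be the zero-mean Gaussian density with variance $\tau^{2}$ evaluated at $x$, and defining $\nu_{t}:=\sigma^{2}\lambda_{t}^{2}(\sigma^{2}+\lambda_{t}^{2}+\sigma^{2}\lambda_{t}^{2})$,

$$\Pr(X\neq 0\mid a,b)=(1+T_{a,b})^{-1},\tag{17}$$

where we denote

$$\begin{aligned}T_{a,b}&:=\frac{(1-\epsilon)\rho_{\lambda_{t}^{2}}(a)\rho_{\sigma^{2}}(b)}{\epsilon\rho_{1+\sigma^{2}}(b)\rho_{\frac{\sigma^{2}}{1+\sigma^{2}}+\lambda_{t}^{2}}\Big(\frac{b}{1+\sigma^{2}}-a\Big)}\\&=\Big(\frac{1-\epsilon}{\epsilon}\Big)\sqrt{\frac{\sigma^{2}+\lambda_{t}^{2}+\sigma^{2}\lambda_{t}^{2}}{\lambda_{t}^{2}\sigma^{2}}}\exp\Big\{\frac{-(\sigma^{2}a+\lambda_{t}^{2}b)^{2}}{2\sigma^{2}\lambda_{t}^{2}(\sigma^{2}+\lambda_{t}^{2}+\sigma^{2}\lambda_{t}^{2})}\Big\}\\&=\Big(\frac{1-\epsilon}{\epsilon}\Big)\frac{\nu_{t}\sqrt{2\pi}}{\lambda_{t}^{2}\sigma^{2}}\rho_{\nu_{t}}(\sigma^{2}a+\lambda_{t}^{2}b).\end{aligned}\tag{18}$$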

Then the SE (5) can be computed as

[TABLE]
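The BG denoiser described above, a posterior probability of a nonzero entry multiplied by a linear estimate, can be sketched in Python as follows. The function names and the test parameters are assumptions for illustration. The helper `bg_T_from_densities` evaluates the posterior odds of a zero entry as a ratio of Gaussian densities, which gives an independent check on the closed-form expression used in `bg_T`.

```python
import numpy as np

def gaussian_pdf(x, var):
    """Zero-mean Gaussian density with variance var, evaluated at x."""
    return np.exp(-x ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def bg_T(a, b, lam2, sigma2, eps):
    """Closed-form posterior odds Pr(X = 0 | a, b) / Pr(X != 0 | a, b)."""
    s = sigma2 + lam2 + sigma2 * lam2
    prefactor = (1.0 - eps) / eps * np.sqrt(s / (lam2 * sigma2))
    return prefactor * np.exp(-(sigma2 * a + lam2 * b) ** 2
                              / (2.0 * sigma2 * lam2 * s))

def bg_T_from_densities(a, b, lam2, sigma2, eps):
    """Same odds written as a ratio of Gaussian densities."""
    num = (1.0 - eps) * gaussian_pdf(a, lam2) * gaussian_pdf(b, sigma2)
    v = sigma2 / (1.0 + sigma2) + lam2
    den = eps * gaussian_pdf(b, 1.0 + sigma2) * gaussian_pdf(b / (1.0 + sigma2) - a, v)
    return num / den

def bg_denoiser(a, b, lam2, sigma2, eps):
    """Conditional mean of X given pseudo-data a and SI b in the BG model."""
    s = sigma2 + lam2 + sigma2 * lam2
    f_ab = (sigma2 * a + lam2 * b) / s       # E[X | a, b, X != 0]
    return f_ab / (1.0 + bg_T(a, b, lam2, sigma2, eps))

# Evaluate on a small grid with hypothetical parameter values.
a_grid = np.linspace(-2.0, 2.0, 9)[:, None]
b_grid = np.linspace(-2.0, 2.0, 9)[None, :]
eta_grid = bg_denoiser(a_grid, b_grid, lam2=0.3, sigma2=0.25, eps=0.2)
```

Using the closed-form exponent avoids dividing two very small densities when $a$ or $b$ is large, which is the main reason to prefer it in an implementation.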

We again use Lemma II.1.1 to show that the denoiser defined in (16) and (17) is Lipschitz continuous, so that the assumptions $\textbf{(A1)}-\textbf{(A4)}$ are satisfied in the BG case and we can apply Theorem II.1. We study the partial derivatives. Denote

$$f_{a,b}:=\frac{\sigma^{2}a+\lambda_{t}^{2}b}{\sigma^{2}+\lambda_{t}^{2}+\sigma^{2}\lambda_{t}^{2}}.\tag{20}$$

Combining (17) and (18) and (20),

[TABLE]

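Then,

$$\begin{aligned}\Big\lvert\frac{\partial\eta_{t}(a,b)}{\partial a}\Big\rvert&=\Big\lvert\frac{1}{1+T_{a,b}}\Big[\frac{\partial f_{a,b}}{\partial a}\Big]-\frac{1}{(1+T_{a,b})^{2}}\Big[\frac{\partial T_{a,b}}{\partial a}\Big]f_{a,b}\Big\rvert\\&\leq\frac{(1+2T_{a,b})}{(1+T_{a,b})^{2}}\Big\lvert\frac{\partial f_{a,b}}{\partial a}\Big\rvert+\frac{1}{(1+T_{a,b})^{2}}\Big\lvert\frac{\partial(T_{a,b}f_{a,b})}{\partial a}\Big\rvert.\end{aligned}$$

Now we show upper bounds for the two terms of this bound on $\lvert\partial\eta_{t}(a,b)/\partial a\rvert$ separately. For the first term, we see that $\frac{\partial f_{a,b}}{\partial a}\leq 1$, so

$$\frac{(1+2T_{a,b})}{(1+T_{a,b})^{2}}\Big\lvert\frac{\partial f_{a,b}}{\partial a}\Big\rvert\leq 1.$$

Now consider the second term of the bound on $\lvert\partial\eta_{t}(a,b)/\partial a\rvert$. First we note that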

[TABLE]

Then from (18) and (20),

[TABLE]

then using that $\frac{\partial}{\partial x}\rho_{\tau^{2}}(x)=-\frac{x}{\tau^{2}}\rho_{\tau^{2}}(x)$ , we have

[TABLE]

To upper bound the above, we use $\exp\{-x\}\leq\frac{1}{1+x}$ when $x\geq 0$ , and so

[TABLE]

Using this in (22), we find

[TABLE]

where in the final inequality we use $\lambda_{t}\geq\sigma_{w}$ by (19), and

[TABLE]

Using the above in the bound on $\lvert\partial\eta_{t}(a,b)/\partial a\rvert$, we have

[TABLE]

As in the bound on $\lvert\partial\eta_{t}(a,b)/\partial a\rvert$, we can show

[TABLE]

Then,

[TABLE]

and a bound as in (22) - (23) gives

[TABLE]

II-C Numerical Examples

Finally, we provide numerical results comparing the empirical mean square error (MSE) performance of AMP-SI with the performance predicted by SE. Fig. 1 shows the MSE achieved by AMP-SI in the GG scenario and the SE prediction of its performance. In this example, the signal variance is $\sigma_{x}^{2}=1$, the measurement noise variance is $\sigma_{w}^{2}=0.01$, and the variance of the AWGN in the SI is $\sigma^{2}=0.04$. The empirical AMP-SI results are averaged over 10 trials of a GG recovery problem. Fig. 1(a), Fig. 1(b), and Fig. 1(c) show the comparison for three different signal lengths. For smaller $n$ there is some gap between the empirical MSE and the SE prediction, as shown in Fig. 1 for $n=100$, but the gap shrinks as $n$ increases. The results show that the empirical MSE closely tracks the SE prediction.
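A comparison of this kind can be reproduced in outline with the sketch below, which runs AMP-SI on a single GG problem and records the SE prediction $\delta(\lambda_{t+1}^{2}-\sigma_{w}^{2})$ alongside the empirical MSE. Several choices are assumptions made for this example rather than the paper's setup: the problem size ($n=2000$, $m=600$), a single trial instead of an average over 10, the use of the SE value of $\lambda_{t}^{2}$ inside the denoiser, and the Onsager term written as $\mathbf{r}^{t-1}/\delta$ times the average derivative, which is the standard normalization for a matrix with $\mathcal{N}(0,1/m)$ entries.

```python
import numpy as np

def run_amp_si_gg(n=2000, m=600, sigma_x2=1.0, sigma2=0.04, sigma_w2=0.01,
                  num_iters=10, seed=0):
    """Run AMP-SI for one GG problem; return (empirical MSE, SE-predicted MSE)."""
    rng = np.random.default_rng(seed)
    delta = m / n
    x = np.sqrt(sigma_x2) * rng.standard_normal(n)            # signal
    x_tilde = x + np.sqrt(sigma2) * rng.standard_normal(n)    # side information
    A = rng.standard_normal((m, n)) / np.sqrt(m)              # i.i.d. N(0, 1/m) entries
    y = A @ x + np.sqrt(sigma_w2) * rng.standard_normal(m)    # noisy measurements

    x_hat = np.zeros(n)                                       # x^0 = 0
    r = y.copy()                                              # r^0 = y - A x^0
    lam2 = sigma_w2 + sigma_x2 / delta                        # lambda_0^2
    empirical, predicted = [], []
    for _ in range(num_iters):
        den = sigma_x2 * (sigma2 + lam2) + sigma2 * lam2
        pseudo = x_hat + A.T @ r                              # approx. x + lambda_t * Z
        x_hat = (sigma_x2 * sigma2 * pseudo + sigma_x2 * lam2 * x_tilde) / den
        deriv = sigma_x2 * sigma2 / den                       # d eta_t / d a (constant here)
        r = y - A @ x_hat + (r / delta) * deriv               # residual with Onsager term
        lam2 = sigma_w2 + (sigma_x2 * sigma2 * lam2 / den) / delta   # SE update
        empirical.append(np.mean((x_hat - x) ** 2))
        predicted.append(delta * (lam2 - sigma_w2))
    return np.array(empirical), np.array(predicted)

empirical_mse, se_mse = run_amp_si_gg()
```

Increasing `n` (with `m` scaled to keep the same ratio) should narrow the gap between the two returned curves, in line with the trend described for Fig. 1.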

Fig. 2 shows the MSE achieved by AMP-SI in the BG scenario and the SE prediction of its performance. The empirical AMP-SI results are again averaged over 10 trials of a BG recovery problem. The signal length is $n=10000$, the number of measurements is $m=3000$, the measurement noise variance is $\sigma_{w}^{2}=0.01$, and $\epsilon=0.2$, so that on average $20\%$ of the entries in the signal are nonzero. We vary the variance of the AWGN in the SI over $\sigma^{2}=0.04$, $\sigma^{2}=0.25$, and $\sigma^{2}=1$. The results show that SE predicts the MSE achieved by AMP-SI at every iteration.

III Proof of Theorem II.1

III-A Step 1

First we show that the functions defined in (10) are uniformly PL(2) when $\phi$ and $\psi$ are PL(2). This is a straightforward application of Cauchy-Schwarz. We show the result for $\phi$ and the result for $\psi$ follows similarly.

First, by the fact that $\phi$ is PL(2) ,

[TABLE]

Then applying Cauchy-Schwarz in the following way: for any $r\geq 1$ and scalars $a_{1},a_{2},\ldots,a_{m}$, $(|a_{1}|+|a_{2}|+\ldots+|a_{m}|)^{r}\leq m^{r-1}(|a_{1}|^{r}+|a_{2}|^{r}+\ldots+|a_{m}|^{r})$, we have

[TABLE]

In the final inequality in the above we have used that

[TABLE]

Finally, we note that this implies

[TABLE]

III-B Step 2

Next we show the asymptotic results given in (11). First we use Berthier et al. [11, Theorem 14] and then we appeal to the strong law of large numbers (SLLN), which we restate here for the reader:

Definition III.1.

Strong Law of Large Numbers [15]: Let $X_{1},X_{2},...$ be a sequence of i.i.d. random variables with finite mean $\mu$ . Then

[TABLE]

In words, the partial averages $\frac{1}{n}(X_{1}+X_{2}+...+X_{n})$ converge almost surely to $\mu<\infty$ .

We will make use of Berthier et al. [11, Theorem 14], restated here for convenience. To apply the result in Berthier et al. [11, Theorem 14], one needs to justify the following assumptions:

(C1)

The measurement matrix $\mathbf{A}$ has i.i.d. Gaussian entries with mean $0$ and variance $1/m$.

(C2)

Define a sequence of denoisers $\widetilde{\eta}_{n}^{t}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$ to be those that apply the denoiser $\eta_{t}$ defined in (6) elementwise as follows: $\widetilde{\eta}_{n}^{t}({\mathbf{x}}):=\eta_{t}({\mathbf{x}},\widetilde{{\mathbf{x}}})$ . For each $t$ , $\widetilde{\eta}_{n}^{t}(\cdot)$ are uniformly Lipschitz. A function is uniformly Lipschitz in $n$ if the Lipschitz constant does not depend on $n$ .

(C3)

$||\mathbf{x}||_{2}^{2}/n$ converges to a constant as $n\to\infty$ .

(C4)

The limit $\sigma_{w}=\lim_{m\to\infty}{||\mathbf{w}||_{2}}/{\sqrt{m}}$ is finite.

(C5)

For any iterations $s,t\in\mathbb{N}$ and for any $2\times 2$ covariance matrix $\boldsymbol{\Sigma}$ , the following limits exist.

[TABLE]

where $(\mathbf{Z},\mathbf{Z}^{\prime})\sim N(0,\boldsymbol{\Sigma}\otimes\mathbb{I}_{n})$ , with $\otimes$ denoting the tensor product and $\mathbb{I}_{n}$ the identity matrix.

Theorem III.1.

Under the assumptions $\textbf{(C1)}-\textbf{(C5)}$ , for any sequences of uniformly pseudo-Lipschitz functions $\rho_{m}:\mathbb{R}^{m}\times\mathbb{R}^{m}\rightarrow\mathbb{R}$ and $\gamma_{n}:\mathbb{R}^{n}\times\mathbb{R}^{n}\rightarrow\mathbb{R}$ ,

[TABLE]

where $\mathbf{Z}_{1}\sim\mathcal{N}(0,\mathbb{I}_{m})$ , $\mathbf{Z}_{2}\sim\mathcal{N}(0,\mathbb{I}_{n})$ , $\mathbf{x}^{t}$ and $\mathbf{r}^{t}$ are defined in the AMP-SI recursion (7)-(8), and $\lambda_{t}$ in the SE (5).

Now we demonstrate that our assumptions $\textbf{(A1)}-\textbf{(A4)}$ stated in Section II are enough to satisfy the assumptions $\textbf{(C1)}-\textbf{(C5)}$ needed to apply Theorem III.1.

Assumptions (A1) and (C1) are identical. We will show that (C2) follows from (A4), (C4) follows from (A2), and (C3) follows from (A3). Finally, we show that (C5) follows from (A3) and (A4).

First consider assumption (C2). The non-separable denoiser $\widetilde{\eta}_{n}^{t}(X)=\eta_{t}({X},\widetilde{X})$ applies the AMP-SI denoiser defined in (2) entrywise to its vector inputs. From (A4), $\{\eta_{t}(\cdot,\cdot)\}_{t\geq 0}$ are Lipschitz continuous. Thus, for length- $n$ vectors $x_{1},x_{2}$ , and fixed SI $\widetilde{x}$ ,

[TABLE]

and so

[TABLE]

The Lipschitz constant does not depend on $n$ , so $\widetilde{\eta}_{n}^{t}({\cdot})$ is uniformly Lipschitz.

Now consider assumption (C4). From (A2), the measurement noise $\mathbf{w}$ in (1) has i.i.d. entries with zero-mean and finite $\mathbb{E}[|W|^{2}]$ . Then applying Definition III.1,

[TABLE]

where we have used that $\sigma_{w}^{2}<\infty$ follows from $\mathbb{E}[|W|^{2}]<\infty$ . The proof of (C3) similarly follows using the SLLN and the finiteness of $\mathbb{E}[|X|^{2}]$ given in assumption (A3).
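The convergence in (C4) can be illustrated numerically. The sketch below assumes Gaussian noise purely for convenience (assumption (A2) only requires zero mean and a finite second moment), and the helper name `noise_norm_ratio` and the chosen sample sizes are hypothetical.

```python
import math
import random


def noise_norm_ratio(m, sigma_w, rng):
    """Return ||w||_2 / sqrt(m) for w with m i.i.d. N(0, sigma_w^2) entries."""
    total = sum(rng.gauss(0.0, sigma_w) ** 2 for _ in range(m))
    return math.sqrt(total / m)


rng = random.Random(0)  # fixed seed so the run is reproducible
sigma_w = 0.5           # illustrative noise standard deviation
ratios = {m: noise_norm_ratio(m, sigma_w, rng) for m in (100, 10_000, 200_000)}
for m, ratio in ratios.items():
    print(m, ratio)
```

As $m$ grows, the printed ratio settles near $\sigma_{w}$, which is the SLLN applied to the squared entries of $\mathbf{w}$.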

We now show that (C5) is met. Recall $Z\sim\mathcal{N}(0,\sigma_{z}^{2}\mathbb{I}_{n})$ . Define $y_{i}:=x_{i}\mathbb{E}_{Z}\left[\eta_{t}(x_{i}+Z_{i},\widetilde{x}_{i})\right]$ for $i=1,2,\ldots,n$ . By assumption (A3), the signal and side information $(\mathbf{X},\widetilde{\mathbf{X}})$ are sampled i.i.d. from the joint density $f(X,\widetilde{X})$ . It follows that $y_{1},y_{2},\ldots,y_{n}$ are also i.i.d., so by Definition III.1 if $\mathbb{E}[X\eta_{t}(X+Z,\widetilde{X})]<\infty$ where $Z\sim\mathcal{N}(0,\sigma_{z}^{2})$ independent of $(X,\widetilde{X})\sim f(X,\widetilde{X})$ , then

[TABLE]

We now show that $\mathbb{E}[X\eta_{t}(X+Z,\widetilde{X})]<\infty$ .

First note that (A4) assumes $\eta_{t}(\cdot,\cdot)$ is Lipschitz, meaning for scalars $a_{1},a_{2},b_{1},b_{2}$ and some constant $L>0$ ,

[TABLE]

Therefore letting $a_{2}=b_{2}=0$ we have

[TABLE]

giving the following upper bound for some constant $L^{\prime}>0$,

[TABLE]

Now using (26) and the triangle inequality,

[TABLE]

Finally, by assumption (A3), $\mathbb{E}[|X|^{2}]$, $\mathbb{E}[|\widetilde{X}|^{2}]$, and $\mathbb{E}|X\widetilde{X}|$ are all finite. Noting that for any random variable $Y$ we have $|Y|^{r}\leq 1+|Y|^{k}$ for $1\leq r\leq k$, and hence $\mathbb{E}[|Y|^{r}]\leq 1+\mathbb{E}[|Y|^{k}]$, the boundedness of $\mathbb{E}[X\eta_{t}(X+Z,\widetilde{X})]$ follows from (27) together with assumption (A3).
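A sketch of the chain of bounds behind this conclusion, assuming the upper bound takes the linear-growth form that the Lipschitz property yields, with $L^{\prime}:=\max\{|\eta_{t}(0,0)|,L\}$ (a constant introduced here for illustration):

$$
|\eta_{t}(a,b)|\leq|\eta_{t}(0,0)|+L\,\|(a,b)\|\leq L^{\prime}\,(1+|a|+|b|),
$$

$$
\mathbb{E}\big|X\,\eta_{t}(X+Z,\widetilde{X})\big|
\leq L^{\prime}\,\mathbb{E}\big[|X|\,(1+|X+Z|+|\widetilde{X}|)\big]
\leq L^{\prime}\,\big(\mathbb{E}|X|+\mathbb{E}[X^{2}]+\mathbb{E}|X|\,\mathbb{E}|Z|+\mathbb{E}|X\widetilde{X}|\big).
$$

The second line uses $|X+Z|\leq|X|+|Z|$ and the independence of $Z$ from $(X,\widetilde{X})$. Each term on the right is finite when $\mathbb{E}[X^{2}]$ and $\mathbb{E}|X\widetilde{X}|$ are finite, since $\mathbb{E}|X|\leq 1+\mathbb{E}[X^{2}]$ and $Z$ is Gaussian.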

The second limit in (C5) follows by the same argument as the first. Recall $(Z,Z^{\prime})\sim\mathcal{N}(0,\boldsymbol{\Sigma}\otimes\mathbb{I}_{n})$. Define $y_{i}:=\mathbb{E}_{Z,Z^{\prime}}[\eta_{t}(x_{i}+Z_{i},\widetilde{x}_{i})\eta_{s}(x_{i}+Z^{\prime}_{i},\widetilde{x}_{i})]$ for $i=1,2,\ldots,n$. By assumption (A3), the signal and side information $(\mathbf{X},\widetilde{\mathbf{X}})$ are sampled i.i.d. from the joint density $f(X,\widetilde{X})$. It follows that $y_{1},y_{2},\ldots,y_{n}$ are also i.i.d., so by Definition III.1, if $\mathbb{E}[\eta_{t}(X+Z,\widetilde{X})\eta_{s}(X+Z^{\prime},\widetilde{X})]<\infty$, where $Z\sim\mathcal{N}(0,\sigma_{z}^{2})$ and $Z^{\prime}\sim\mathcal{N}(0,\sigma_{z^{\prime}}^{2})$ are independent of $(X,\widetilde{X})\sim f(X,\widetilde{X})$, then

[TABLE]

We will now show that $\mathbb{E}[\eta_{t}(X+Z,\widetilde{X})\eta_{s}(X+Z^{\prime},\widetilde{X})]<\infty$ . Using the bound (26),

[TABLE]

Then using the triangle inequality,

[TABLE]

III-C Step 3

Having verified $\textbf{(C1)}-\textbf{(C5)}$, we appeal to Theorem III.1 and the SLLN to prove (11). The first result in (11), namely the asymptotic result for $\phi_{m}$ uniformly PL(2), follows almost immediately by applying Theorem III.1 with $\rho_{m}=\phi_{m}$. Namely, by Theorem III.1,

[TABLE]

since $\phi_{m}$ is assumed to be uniformly PL(2). To complete the proof, we will finally prove that

[TABLE]

where $W\sim f(W)$ independent of $Z_{1}$ standard Gaussian. Then the desired result follows since

[TABLE]

The result follows by the SLLN (Definition III.1) so long as $\mathbb{E}[\phi(W+\sqrt{\lambda_{t}^{2}-\sigma_{w}^{2}}\,{Z_{1}},W)]$ is finite. By Definition II.1 it is easy to see that if $\phi:\mathbb{R}^{2}\rightarrow\mathbb{R}$ is PL(2), then there is a constant $L^{\prime}>0$ such that for all $\mathbf{x}\in\mathbb{R}^{2}$ : $|\phi(\mathbf{x})|\leq L^{\prime}(1+||\mathbf{x}||^{2}).$ Using this,

[TABLE]

where we have used that, for any $r\geq 1$ and any scalars $a_{1},a_{2}$, $\lVert(a_{1},a_{2})\rVert^{2}=a_{1}^{2}+a_{2}^{2}$ and $(|a_{1}|+|a_{2}|)^{r}\leq 2^{r-1}(|a_{1}|^{r}+|a_{2}|^{r})$. Thus,

[TABLE]

Similarly, we have the upper bound,

[TABLE]

Therefore, using (31), and the boundedness of $\mathbb{E}[|X|^{2}]$ and $\mathbb{E}[|\widetilde{X}|^{2}]$ assumed in (A3),

[TABLE]
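The elementary inequality $(|a_{1}|+|a_{2}|)^{r}\leq 2^{r-1}(|a_{1}|^{r}+|a_{2}|^{r})$ used in these bounds is a consequence of the convexity of $u\mapsto u^{r}$, so it requires $r\geq 1$. The following sketch checks it on random inputs and shows that it can fail for $r<1$; the helper name `power_sum_gap` and the sampling ranges are illustrative choices.

```python
import random


def power_sum_gap(a1, a2, r):
    """Return 2^(r-1) * (|a1|^r + |a2|^r) - (|a1| + |a2|)^r.

    A non-negative value means the inequality holds for this input.
    """
    lhs = (abs(a1) + abs(a2)) ** r
    rhs = 2 ** (r - 1) * (abs(a1) ** r + abs(a2) ** r)
    return rhs - lhs


rng = random.Random(1)  # fixed seed for reproducibility
samples = [
    (rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(1, 4))
    for _ in range(10_000)
]
# Smallest gap over all sampled (a1, a2, r) with r >= 1; should not be negative
# beyond floating-point rounding.
min_gap = min(power_sum_gap(a1, a2, r) for a1, a2, r in samples)

# For r < 1 the map u -> u^r is concave and the bound can fail.
gap_below_one = power_sum_gap(1.0, 0.0, 0.5)
```

The proof only needs the case $r=2$, where the inequality reduces to $(|a_{1}|+|a_{2}|)^{2}\leq 2(a_{1}^{2}+a_{2}^{2})$.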

The second result of (11) requires a bit more care, as it is not immediate that the function $\gamma_{n}:\mathbb{R}^{2n}\rightarrow\mathbb{R}$ defined as $\gamma_{n}(\mathbf{a},\mathbf{b}):=\psi_{n}(\mathbf{a},\mathbf{b},\widetilde{\mathbf{x}})$ for a sequence of side information vectors $\{\widetilde{\mathbf{x}}\}_{n}$ is uniformly PL(2), as needed to apply Theorem III.1. The remainder of the proof handles this issue. We note that once we have shown that

[TABLE]

then the last step showing that

[TABLE]

follows by the SLLN, as in the argument beginning at (29). However, the function $\gamma_{n}$ is not obviously uniformly PL(2), since an upper bound on $|\psi_{n}(\mathbf{a},\widetilde{\mathbf{a}},\widetilde{\mathbf{x}})-\psi_{n}(\mathbf{b},\widetilde{\mathbf{b}},\widetilde{\mathbf{x}})|$ necessarily has an $||\widetilde{\mathbf{x}}||/\sqrt{n}$ factor. This is mainly a technicality, as $||\widetilde{\mathbf{x}}||/\sqrt{n}$ is bounded by a constant (independent of $n$) with high probability.

To show (32) we would like to show that for any $\epsilon>0$ ,

[TABLE]

as $n\rightarrow\infty$ . Define a pair of events $\mathcal{T}_{n}(\epsilon)$ and $\mathcal{B}_{n}(C)$ as

[TABLE]

and for constant $C>0$ independent of $n$ , $\mathcal{B}_{n}(C):=\{\widetilde{\mathbf{x}}\in\mathbb{R}^{n}:||\widetilde{\mathbf{x}}||/\sqrt{n}<C\}.$ Then demonstrating (33) means showing, for any $\epsilon>0$ , that $\lim_{n}P(\mathcal{T}_{n}(\epsilon))=0$ . Note that,

[TABLE]

Considering the above, the first term approaches $0$ as $n$ gets large due to Theorem III.1, since one can argue $P(\mathcal{T}_{n}(\epsilon)\mid\mathcal{B}_{n}(C))=P(\mathcal{T}_{n}(\epsilon)\mid\mathcal{B}_{p}(C)\text{ for all }p>p_{0})$ and, conditional on the event $\mathcal{B}_{p}(C)$ holding for all integers $p>p_{0}$ (for a constant $p_{0}>0$), the function $\gamma_{n}$ defined above is uniformly PL(2) in $n$. This uses that $\widetilde{\mathbf{x}}(n)$ is independent of the other random elements, namely $\mathbf{A}(n)$ and $\mathbf{w}(n)$. Next, by choosing $C$ large enough, the second probability $P(\text{not }\mathcal{B}_{n}(C))$ goes to zero, since by the SLLN $||\widetilde{\mathbf{x}}||/\sqrt{n}$ converges almost surely to $\sqrt{\mathbb{E}[\widetilde{X}^{2}]}$, which is finite by (A3).
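The behavior of $P(\text{not }\mathcal{B}_{n}(C))$ can be illustrated by simulation. The sketch below assumes standard Gaussian side information, so that $\sqrt{\mathbb{E}[\widetilde{X}^{2}]}=1$, and takes $C=1.2$; the distribution, the constant, the trial count, and the helper name `exceed_fraction` are all illustrative assumptions.

```python
import math
import random


def exceed_fraction(n, C, trials, rng):
    """Estimate P(||x_tilde|| / sqrt(n) >= C) for x_tilde with n i.i.d. N(0, 1) entries."""
    hits = 0
    for _ in range(trials):
        sq_norm = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
        if math.sqrt(sq_norm / n) >= C:
            hits += 1
    return hits / trials


rng = random.Random(2)  # fixed seed for reproducibility
C = 1.2                 # any constant above sqrt(E[X_tilde^2]) = 1 would do
fractions = {n: exceed_fraction(n, C, 400, rng) for n in (10, 100, 1000)}
for n, frac in fractions.items():
    print(n, frac)
```

For small $n$ the normalized norm exceeds $C$ in a noticeable fraction of trials, while for large $n$ the event essentially never occurs, which is the concentration the argument relies on.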

Acknowledgment

We thank You (Joe) Zhou for insightful conversations and valuable advice. Liu and Baron acknowledge support from NSF EECS $\#1611112$ and Rush from NSF $\#1217023$ .

