An Analysis of State Evolution for Approximate Message Passing with Side Information
Hangjin Liu, Cynthia Rush, Dror Baron

TL;DR
This paper extends the theoretical understanding of approximate message passing algorithms by providing performance guarantees for AMP with side information, supported by numerical evidence showing accurate state evolution predictions.
Contribution
It offers the first rigorous analysis of AMP with side information, establishing conditions under which its performance can be accurately predicted by state evolution.
Findings
Performance guarantees for AMP-SI under joint distribution assumptions
State evolution accurately predicts mean square error in AMP-SI
Numerical results support theoretical predictions
Hangjin Liu
NC State University
Cynthia Rush
Columbia University
Dror Baron
NC State University
Abstract
A common goal in many research areas is to reconstruct an unknown signal from noisy linear measurements. Approximate message passing (AMP) is a class of low-complexity algorithms for efficiently solving such high-dimensional regression tasks. Often, it is the case that side information (SI) is available during reconstruction. For this reason a novel algorithmic framework that incorporates SI into AMP, referred to as approximate message passing with side information (AMP-SI), has been recently introduced. An attractive feature of AMP is that when the elements of the signal are exchangeable, the entries of the measurement matrix are independent and identically distributed (i.i.d.) Gaussian, and the denoiser applies the same non-linearity at each entry, the performance of AMP can be predicted accurately by a scalar iteration referred to as state evolution (SE). However, the AMP-SI framework uses different entry-wise scalar denoisers, based on the entry-wise level of the SI, and therefore is not supported by the standard AMP theory. In this work, we provide rigorous performance guarantees for AMP-SI when the input signal and SI are drawn i.i.d. according to some joint distribution subject to finite moment constraints. Moreover, we provide numerical examples to support the theory which demonstrate empirically that the SE can predict the AMP-SI mean square error accurately.
I Introduction
High-dimensional linear regression is a well-studied model used in many applications, including compressed sensing [1], imaging [2], and machine learning and statistics [3]. The unknown signal $x$ is viewed through the linear model
$$y = Ax + w, \qquad (1)$$
where $y$ are the measurements, $A$ is a known measurement matrix, and $w$ is measurement noise. The goal is to estimate the unknown signal $x$ having knowledge only of the noisy measurements $y$ and the measurement matrix $A$. When the problem is under-determined (i.e., there are fewer measurements than unknowns), successful reconstruction requires exploiting structural or probabilistic characteristics of the input signal $x$. Often a prior distribution on the input signal is assumed, and in this case approximate message passing (AMP) algorithms [1] can be used for the reconstruction task.
AMP [1, 4] is a class of low-complexity algorithms for efficiently solving high-dimensional regression tasks (1). AMP works by iteratively generating estimates of the unknown input vector, $x$, using a possibly non-linear denoiser function tailored to any prior knowledge about $x$. One favorable feature of AMP is that, under some technical conditions on the measurement matrix $A$ and the signal $x$, the observations at each iteration of the algorithm are almost surely equal in distribution to $x$ plus independent and identically distributed (i.i.d.) Gaussian noise in the large system limit.
AMP with Side Information (AMP-SI): In information theory [5], when different communication systems share side information (SI), overall communication can become more efficient. Recently [6, 7], a novel algorithmic framework, referred to as AMP-SI, has been introduced for incorporating SI into AMP for high-dimensional regression tasks (1). AMP-SI has been empirically demonstrated to have good reconstruction quality and is easy to use. For example, we have proposed to use AMP-SI for channel estimation in emerging millimeter wave communication systems [8], where the time dynamics of the channel structure allow previous channel estimates to be used as SI when estimating the current channel structure [7].
We model the observed SI, denoted by , as depending statistically on the unknown signal through some joint probability density function (pdf), . AMP-SI uses a conditional denoiser, , to incorporate SI,
[TABLE]
The AMP-SI algorithm iteratively updates estimates of the input signal : let , the all-zeros vector, then
[TABLE]
where $x^t$ is the estimate of $x$ at iteration $t$ and $\delta$ is the measurement rate. For a differentiable function we use . Using the denoiser in (2), the AMP-SI algorithm (3)-(4) provides the minimum mean squared error (MMSE) estimate of the signal when SI is available [6].
State Evolution (SE): It has been proven that the performance of AMP, as measured, for example, by the normalized squared $\ell_2$-error between the estimate $x^t$ and true signal $x$, can be accurately predicted by a scalar recursion referred to as SE [9, 10] when the measurement matrix is i.i.d. Gaussian under various assumptions on the elements of the signal. The SE equation for AMP-SI is as follows. Assume the entries of the noise $w$ are i.i.d. with , and let . Then for ,
[TABLE]
where are independent of , where we use to denote a Gaussian distribution with mean and variance .
Considering AMP-SI (3)-(4), however, we cannot directly apply the existing AMP theoretical results [9, 10], as the conditional denoiser (2) depends on the index through the SI, meaning that different scalar denoisers will be used at different indices within the AMP-SI iterations. Recent results [11], however, extend the asymptotic SE analysis to a larger class of possible denoisers, allowing, for example, each element of the input to use a different non-linear denoiser as is the case in AMP-SI. We employ these results to rigorously relate the SE presented in (5) to the AMP-SI algorithm in (3)-(4).
Related Work: While integrating SI into reconstruction algorithms is not new, AMP-SI introduces a unified framework within AMP supporting arbitrary signal and SI dependencies. Prior work using SI has been either heuristic, limited to specific applications, or outside the AMP framework.
For example, Wang and Liang [12] integrate SI into AMP for a specific signal prior density, but the method is difficult to apply to other signal models. Ziniel and Schniter [13] develop an AMP-based reconstruction algorithm for a time-varying signal model based on Markov processes for the support and amplitude. This signal model is easily incorporated into the AMP-SI framework as discussed in the analysis of the birth-death-drift model of [6, 7]. Manoel et al. implement an AMP-based algorithm in which the input signal is repeatedly reconstructed in a streaming fashion, and information from past reconstruction attempts is aggregated into a prior, thus improving ongoing reconstruction results [14]. This reconstruction scheme resembles that of AMP-SI, in particular when the Bernoulli-Gaussian model is used (see Section II-B).
Contribution and Outline: Ma et al. use numerical experiments to show that SE (5) accurately tracks the performance of AMP-SI (3)-(4) [7], as was shown rigorously for standard AMP. Ma et al. conjecture that rigorous theoretical guarantees can be given for AMP-SI as well [7]. In this work, we analyze AMP-SI performance when the input signal and SI are drawn i.i.d. according to a general pdf obeying some finite moment conditions, the AMP-SI denoiser (2) is Lipschitz, and the measurement matrix is i.i.d. Gaussian.
In Section II, we give the main results, examples for various signal and SI models, and numerical experiments comparing the empirical performance of AMP-SI and the SE predictions. The proof of our main theorem is provided in Section III.
II Main Results
II-A Main Theorem
Our main result provides AMP-SI performance guarantees when considering pseudo-Lipschitz loss functions, which we define in the following.
Definition II.1.
Pseudo-Lipschitz functions [11]: For and any , a function is pseudo-Lipschitz of order , or PL(k), if there exists a constant , referred to as the pseudo-Lipschitz constant of , such that for
[TABLE]
For , this definition coincides with the standard definition of a Lipschitz function.
A sequence (in ) of PL(k) functions is called uniformly pseudo-Lipschitz of order , or uniformly PL(k), if, denoting by the pseudo-Lipschitz constant of , we have for each and .
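As a small worked example (ours, not taken from the paper), consider the scalar function $\phi(x) = x^2$. Assuming the pseudo-Lipschitz condition takes the usual scalar form $|\phi(x) - \phi(y)| \le L\,(1 + |x|^{k-1} + |y|^{k-1})\,|x - y|$, the computation below suggests that $\phi$ is PL(2) with constant $L = 1$, even though it is not Lipschitz. This is the reason squared-error losses fit the PL(2) setting.

```latex
\begin{align*}
|\phi(x) - \phi(y)| &= |x^2 - y^2| = |x + y|\,|x - y| \\
  &\le (|x| + |y|)\,|x - y| \\
  &\le (1 + |x| + |y|)\,|x - y|.
\end{align*}
```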
Throughout the work, denotes the Euclidean norm, and denotes convergence in probability. In the case of sampled i.i.d. the AMP-SI denoiser (originally defined in (2)) is separable: define , as
[TABLE]
and the AMP-SI algorithm in (3)-(4) simplifies to
[TABLE]
where the derivative . For the denoiser in (6), the SE is as follows: let and for ,
[TABLE]
where are independent of .
Theorem II.1.
For any PL(2) functions and , define sequences of functions and as follows: for vectors and ,
[TABLE]
Then the functions in (10) are uniformly PL(2). Next, assume the following:
- (A1)
The measurement matrix has i.i.d. Gaussian entries with mean $0$ and variance .
- (A2)
The noise is i.i.d. with finite .
- (A3)
The signal and SI are sampled i.i.d. from with finite , finite , and finite .
- (A4)
For , the denoisers defined in (6) are Lipschitz continuous: for scalars , and constant , .
Then, we have the following asymptotic results for the functions defined in (10),
[TABLE]
where , , independent of and . and are defined in the AMP-SI recursion (7)-(8), and in the SE (9).
Section III contains the proof of Theorem II.1. The proof follows from Berthier et al. [11, Theorem 14] and the strong law of large numbers. The main details involve showing that assumptions (A1)-(A4) allow us to apply [11, Theorem 14].
As a concrete example of how Theorem II.1 provides performance guarantees for AMP-SI, let us consider a few interesting pseudo-Lipschitz loss functions.
Corollary II.1.1.
Under assumptions (A1)-(A4), letting be , then by Theorem II.1,
[TABLE]
where is defined in (5). Similarly if is defined as , then by Theorem II.1
[TABLE]
When is Lipschitz, it is straightforward to show that and are both PL(2), and thus Theorem II.1 can be applied.
II-B Examples
Next, we consider a few signal and SI models to show how one can derive the denoiser in (2), use this to construct the AMP-SI algorithm and the SE, and apply Theorem II.1. Before we get to the examples, we state a lemma showing that functions with bounded derivatives are Lipschitz.
Lemma II.1.1.
A function having bounded derivatives,
[TABLE]
is Lipschitz continuous with Lipschitz constant .
Proof.
The result follows using the Triangle Inequality and Cauchy-Schwarz,
[TABLE]
∎
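The argument can be sketched as follows for a function of two scalar arguments. The symbols $B_1$ and $B_2$ (bounds on the two partial derivatives) and the resulting constant $\sqrt{B_1^2 + B_2^2}$ are illustrative assumptions on our part; the lemma's own constant may be stated differently.

```latex
% Assume |\partial\phi/\partial x| \le B_1 and |\partial\phi/\partial y| \le B_2 everywhere.
\begin{align*}
|\phi(x_1,y_1) - \phi(x_2,y_2)|
  &\le |\phi(x_1,y_1) - \phi(x_2,y_1)| + |\phi(x_2,y_1) - \phi(x_2,y_2)|
     && \text{(triangle inequality)} \\
  &\le B_1\,|x_1 - x_2| + B_2\,|y_1 - y_2|
     && \text{(mean value theorem)} \\
  &\le \sqrt{B_1^2 + B_2^2}\;\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}
     && \text{(Cauchy-Schwarz)}.
\end{align*}
```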
II-B1 Gaussian-Gaussian Signal and SI
In this model, referred to as the GG model henceforth, the signal has i.i.d. Gaussian entries with zero mean and finite variance and we have access to SI in the form of the signal with additive white Gaussian noise (AWGN). The signal, , and SI, , are related by
[TABLE]
In this case, the AMP-SI denoiser (2) equals [7]
[TABLE]
Then the SE (5) can be computed as
[TABLE]
We note that the denoiser in (14) is Lipschitz continuous as a result of Lemma II.1.1 because
[TABLE]
and
[TABLE]
and therefore assumptions (A1)-(A4) are satisfied in the GG case and we can apply Theorem II.1.
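For reference, the GG quantities can be worked out from standard Gaussian conditioning. The derivation below is our own sketch, and the symbol names are assumptions: $\sigma_x^2$ for the signal variance, $\tilde\sigma^2$ for the variance of the AWGN in the SI, $\lambda_t^2$ for the SE noise level, $\sigma_w^2$ for the measurement noise variance, and $\delta$ for the measurement rate. It treats $a = X + \lambda_t Z$ and $b = X + \tilde\sigma \tilde Z$ as two independent noisy views of $X \sim \mathcal{N}(0, \sigma_x^2)$.

```latex
\begin{align*}
D_t &= \sigma_x^2 \lambda_t^2 + \sigma_x^2 \tilde\sigma^2 + \lambda_t^2 \tilde\sigma^2, \\
\mathbb{E}[X \mid a, b] &= \frac{\sigma_x^2 \tilde\sigma^2\, a + \sigma_x^2 \lambda_t^2\, b}{D_t}, \qquad
\operatorname{Var}(X \mid a, b) = \frac{\sigma_x^2 \tilde\sigma^2 \lambda_t^2}{D_t}, \\
\lambda_{t+1}^2 &= \sigma_w^2 + \frac{1}{\delta}\,\frac{\sigma_x^2 \tilde\sigma^2 \lambda_t^2}{D_t}, \\
\frac{\partial}{\partial a}\,\mathbb{E}[X \mid a, b] &= \frac{\sigma_x^2 \tilde\sigma^2}{D_t} \le 1, \qquad
\frac{\partial}{\partial b}\,\mathbb{E}[X \mid a, b] = \frac{\sigma_x^2 \lambda_t^2}{D_t} \le 1.
\end{align*}
```

Under these assumptions the conditional mean is linear in $(a, b)$ with both partial derivatives bounded by 1, which is the kind of bound Lemma II.1.1 needs.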
II-B2 Bernoulli-Gaussian Signal and SI
The Bernoulli-Gaussian (BG) model reflects a scenario in which one wishes to recover a sparse signal and has access to SI in the form of the signal with AWGN as in (13). In this model, each entry of the signal is independently generated according to , where is the Dirac delta function at $0$. In words, the entries of the signal independently take the value $0$ with probability and are with probability . In this case, the AMP-SI denoiser (2) equals [7]
[TABLE]
where, letting be the zero-mean Gaussian density with variance evaluated at , and defining ,
[TABLE]
where we denote
[TABLE]
Then the SE (5) can be computed as
[TABLE]
We again use Lemma II.1.1 to show that the denoiser defined in (16) and (17) is Lipschitz continuous so that assumptions (A1)-(A4) are satisfied in the BG case and we can apply Theorem II.1. We study the partial derivatives. Denote
[TABLE]
Combining (17) and (18) and (20),
[TABLE]
Then,
[TABLE]
Now we show upper bounds for the two terms of the partial derivative above separately. For the first term, we see that , so
[TABLE]
Now consider the second term of the partial derivative. First we note that
[TABLE]
[TABLE]
then using that , we have
[TABLE]
To upper bound the above, we use when , and so
[TABLE]
Using this in (22), we find
[TABLE]
where in the final inequality we use by (19), and
[TABLE]
Using the above in the expression for the partial derivative, we have
[TABLE]
As in the expression for the first partial derivative, we can show
[TABLE]
Then,
[TABLE]
and a bound as in (22) - (23) gives
[TABLE]
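To make the BG conditional denoiser concrete, here is a minimal Python sketch of the conditional mean $E[X \mid X + \lambda Z = a,\ X + \tilde\sigma \tilde Z = b]$ for a Bernoulli-Gaussian signal. It is our own derivation via Bayes' rule, not a transcription of (16)-(17). The function name `bg_denoiser` and the parameter names (`eps` for the probability of a nonzero entry, `sigma_x2`, `lam2`, `sigma_si2`) are hypothetical. The estimate is the posterior probability that the entry is nonzero times the Gaussian-Gaussian posterior mean.

```python
import numpy as np

def bg_denoiser(a, b, lam2, sigma_si2, eps, sigma_x2):
    """Conditional mean of a Bernoulli-Gaussian X given two noisy views.

    a = X + sqrt(lam2) * Z        (AMP pseudo-data, assumed form)
    b = X + sqrt(sigma_si2) * Z'  (side information with AWGN)
    X is N(0, sigma_x2) with probability eps and 0 otherwise.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d = sigma_x2 * sigma_si2 + sigma_x2 * lam2 + lam2 * sigma_si2

    # Posterior mean of X given (a, b) when X is known to be nonzero.
    gg_mean = (sigma_x2 * sigma_si2 * a + sigma_x2 * lam2 * b) / d

    # Quadratic forms of (a, b) under the "zero" and "nonzero" hypotheses.
    q0 = a**2 / lam2 + b**2 / sigma_si2
    q1 = ((sigma_x2 + sigma_si2) * a**2 - 2.0 * sigma_x2 * a * b
          + (sigma_x2 + lam2) * b**2) / d

    # log of p(a, b | nonzero) / p(a, b | zero); q0 - q1 is nonnegative,
    # so exp(-log_ratio) below is bounded and cannot overflow.
    log_ratio = 0.5 * np.log(lam2 * sigma_si2 / d) + 0.5 * (q0 - q1)

    prob_nonzero = 1.0 / (1.0 + (1.0 - eps) / eps * np.exp(-log_ratio))
    return prob_nonzero * gg_mean
```

Because this is a conditional mean, its error should be uncorrelated with any function of $(a, b)$, which gives a simple Monte Carlo sanity check on an implementation.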
II-C Numerical Examples
Finally, we provide numerical results to compare the empirical mean square error (MSE) performance of AMP-SI and the performance predicted by SE. Fig. 1 shows the MSE achieved by AMP-SI in the GG scenario and the SE prediction of its performance. In this example, the signal variance , the measurement noise variance , the variance of AWGN in SI . We averaged over 10 trials of a GG recovery problem for empirical results of AMP-SI. Fig. 1(a), Fig. 1(b), and Fig. 1(c) show the comparison for three different signal lengths. For smaller there is some gap between the empirical MSE and the SE prediction, as shown in Fig. 1 for , but the gap shrinks as is increased. Overall, the empirical MSE tracks the SE prediction closely.
Fig. 2 shows the MSE achieved by AMP-SI in the BG scenario, and the SE prediction of its performance. We again averaged over 10 trials of a BG recovery problem for empirical results of AMP-SI. The signal length , , the measurement noise variance , and , where of the entries in the signal are nonzero. We vary the variance of AWGN in SI from , , and . The results show that SE can predict the MSE achieved by AMP-SI at every iteration.
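A comparison of this kind can be sketched in a few lines of Python for the GG model. This is a minimal illustration under stated assumptions, not the authors' experiment code: the recursion is written in the standard AMP form (residual with an Onsager correction term, then a denoising step that uses the SE noise level), the matrix entries are taken to be i.i.d. Gaussian with variance $1/M$, and every parameter value below (`n`, `delta`, `sigma_w2`, `sigma_x2`, `sigma_si2`) is a placeholder rather than a setting from Fig. 1. The function names are hypothetical.

```python
import numpy as np

def gg_posterior(lam2, sigma_x2, sigma_si2):
    """Weights on pseudo-data and SI, and the posterior variance, for the GG model."""
    d = sigma_x2 * lam2 + sigma_x2 * sigma_si2 + lam2 * sigma_si2
    return sigma_x2 * sigma_si2 / d, sigma_x2 * lam2 / d, sigma_x2 * sigma_si2 * lam2 / d

def amp_si_gg(n=2000, delta=0.5, sigma_w2=0.01, sigma_x2=1.0,
              sigma_si2=0.1, num_iter=10, seed=0):
    """Run AMP-SI on one GG problem; return (empirical MSE, SE prediction) per iteration."""
    rng = np.random.default_rng(seed)
    m = int(delta * n)
    x = rng.normal(0.0, np.sqrt(sigma_x2), n)            # signal
    x_si = x + rng.normal(0.0, np.sqrt(sigma_si2), n)    # side information
    A = rng.normal(0.0, 1.0 / np.sqrt(m), (m, n))        # i.i.d. Gaussian matrix
    y = A @ x + rng.normal(0.0, np.sqrt(sigma_w2), m)    # noisy measurements

    x_hat = np.zeros(n)   # initial estimate: all zeros
    r = np.zeros(m)       # previous residual
    onsager = 0.0         # (1/delta) * average derivative of the previous denoiser
    lam2 = sigma_w2 + sigma_x2 / delta                   # initial SE noise level
    mse_emp, mse_se = [], []
    for _ in range(num_iter):
        r = y - A @ x_hat + onsager * r                  # residual with Onsager term
        w_a, w_b, post_var = gg_posterior(lam2, sigma_x2, sigma_si2)
        x_hat = w_a * (x_hat + A.T @ r) + w_b * x_si     # conditional denoiser
        onsager = w_a / delta                            # derivative w.r.t. first argument
        mse_emp.append(float(np.mean((x_hat - x) ** 2)))
        mse_se.append(post_var)                          # SE-predicted MSE
        lam2 = sigma_w2 + post_var / delta               # SE update
    return mse_emp, mse_se
```

Calling `amp_si_gg()` returns two lists that can be plotted against the iteration index; averaging `mse_emp` over several seeds and increasing `n` should, under these assumptions, bring the empirical curve closer to the SE curve.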
III Proof of Theorem II.1
III-A Step 1
First we show that the functions defined in (10) are uniformly PL(2) when and are PL(2). This is a straightforward application of Cauchy-Schwarz. We show the result for and the result for follows similarly.
First, by the fact that is PL(2) ,
[TABLE]
Then applying Cauchy-Schwarz in the following way: for any and scalars, , we have
[TABLE]
In the final inequality in the above we have used that
[TABLE]
Finally, we note that this implies
[TABLE]
III-B Step 2
Next we show the asymptotic results given in (11). First we use Berthier et al. [11, Theorem 14] and then we make an appeal to the strong law of large numbers (SLLN). We first remind the reader of the strong law.
Definition III.1.
Strong Law of Large Numbers [15]: Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with finite mean $\mu$. Then
$$\frac{1}{n}\sum_{i=1}^{n} X_i \to \mu \quad \text{almost surely as } n \to \infty.$$
In words, the partial averages converge almost surely to $\mu$.
We will make use of Berthier et al. [11, Theorem 14], restated here for convenience. To apply the result in Berthier et al. [11, Theorem 14], one needs to justify the following assumptions:
- (C1)
The measurement matrix has i.i.d. Gaussian entries with mean $0$ and variance .
- (C2)
Define a sequence of denoisers to be those that apply the denoiser defined in (6) elementwise as follows: . For each , are uniformly Lipschitz. A function is uniformly Lipschitz in if the Lipschitz constant does not depend on .
- (C3)
converges to a constant as .
- (C4)
The limit is finite.
- (C5)
For any iterations and for any covariance matrix , the following limits exist.
[TABLE]
where , with denoting the tensor product and the identity matrix.
Theorem III.1.
Under the assumptions (C1)-(C5), for any sequences of uniformly pseudo-Lipschitz functions and ,
[TABLE]
where , , and are defined in the AMP-SI recursion (7)-(8), and in the SE (5).
Now we demonstrate that our assumptions stated in Section II are enough to satisfy the assumptions needed to apply Theorem III.1.
Assumptions (A1) and (C1) are identical. We will show that (C2) follows from (A4), (C4) follows from (A2), and (C3) follows from (A3). Finally, we show that (C5) follows from (A3) and (A4).
First consider assumption (C2). The non-separable denoiser applies the AMP-SI denoiser defined in (2) entrywise to its vector inputs. From (A4), are Lipschitz continuous. Thus, for length- vectors , and fixed SI ,
[TABLE]
and so
[TABLE]
The Lipschitz constant does not depend on , so is uniformly Lipschitz.
Now consider assumption (C4). From (A2), the measurement noise in (1) has i.i.d. entries with zero-mean and finite . Then applying Definition III.1,
[TABLE]
where we have used that follows from . The proof of (C3) similarly follows using the SLLN and the finiteness of given in assumption (A3).
We now show that (C5) is met. Recall . Define for . By assumption (A3), the signal and side information are sampled i.i.d. from the joint density . It follows that are also i.i.d., so by Definition III.1 if where independent of , then
[TABLE]
We now show that .
First note that (A4) assumes is Lipschitz, meaning for scalars and some constant ,
[TABLE]
Therefore letting we have
[TABLE]
giving the following upper bound for constant ,
[TABLE]
Now using (26) and the triangle inequality,
[TABLE]
Finally, by assumption (A3) we have that and are all finite. Then noting that for any random variable, , we have for , meaning the boundedness of follows from (27) with assumption (A3).
The proof of the second equation in (C5) follows similarly to the proof of the first equation in (C5). Recall . Define for . By assumption (A3), the signal and side information are sampled i.i.d. from the joint density . It follows that are also i.i.d., so by Definition III.1 if where and , independent of , then
[TABLE]
We will now show that . Using the bound (26),
[TABLE]
Then using the triangle inequality,
[TABLE]
III-C Step 3
Now that we have justified (C1)-(C5), we make an appeal to Theorem III.1 and the SLLN in order to finally prove (11). The first result in (11), namely the asymptotic result for uniformly PL(2), follows almost immediately by applying Theorem III.1 using . Namely, by Theorem III.1,
[TABLE]
since is assumed to be uniformly PL(2). To complete the proof, we will finally prove that
[TABLE]
where independent of standard Gaussian. Then the desired result follows since
[TABLE]
The result follows by the SLLN (Definition III.1) so long as is finite. By Definition II.1 it is easy to see that if is PL(2), then there is a constant such that for all : Using this,
[TABLE]
where we have used: for any and any scalars, and . Thus,
[TABLE]
Similarly, we have the upper bound,
[TABLE]
Therefore, using (31), and the boundedness of and assumed in (A3),
[TABLE]
The second result of (11) requires a bit more care, as it is not immediate that the function defined as for a sequence of SI vectors is uniformly PL(2) as needed to apply Theorem III.1. The next step of the proof handles this issue. We note that once we have shown that
[TABLE]
then the last step showing that
[TABLE]
follows by the SLLN as in (29) and the bounds that follow it. However, the function is not obviously uniformly PL(2) since an upper bound on necessarily has an factor. This is mainly a technicality as is bounded by a constant (independent of ) with high probability.
To show (32) we would like to show that for any ,
[TABLE]
as . Define a pair of events and as
[TABLE]
and for constant independent of . Then demonstrating (33) means showing, for any , that . Note that,
[TABLE]
Considering the above, the first term approaches $0$ as gets large due to Theorem III.1, since one can argue and conditional on the event being true for all integers (constant ), the function defined in (III-C) is uniformly PL(2) in . This uses that is independent of the other random elements, namely and . Next, by choosing large enough, the second probability goes to zero almost surely by the SLLN as concentrates to the elementwise expectation of .
Acknowledgment
We thank You (Joe) Zhou for insightful conversations and valuable advice. Liu and Baron acknowledge support from NSF EECS and Rush from NSF .
References
- [1] D. L. Donoho, A. Maleki, and A. Montanari, “Message passing algorithms for compressed sensing,” Proc. Nat. Academy Sci., vol. 106, no. 45, pp. 18914–18919, Nov. 2009.
- [2] H. Arguello and G. Arce, “Code aperture optimization for spectrally agile compressive imaging,” J. Opt. Soc. Am., vol. 28, no. 11, pp. 2400–2413, Nov. 2011.
- [3] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning. Springer, Aug. 2001.
- [4] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” arXiv preprint arXiv:1010.5141, Oct. 2010.
- [5] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York, NY, USA: Wiley-Interscience, 2006.
- [6] D. Baron, A. Ma, D. Needell, C. Rush, and T. Woolf, “Conditional approximate message passing with side information,” in Proc. IEEE Asilomar Conf. Signals, Syst. Comput., 2017.
- [7] A. Ma, Y. Zhou, C. Rush, D. Baron, and D. Needell, “An approximate message passing framework for side information,” arXiv:1807.04839, July 2018.
- [8] A. Saleh and R. Valenzuela, “A statistical model for indoor multipath propagation,” IEEE J. Select. Areas Commun., vol. 5, no. 2, pp. 128–137, Feb. 1987.
