I-MMSE relations in random linear estimation and a sub-extensive interpolation method
Jean Barbier, Nicolas Macris

Laboratoire de Théorie des Communications, Faculté Informatique et Communications,
Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Suisse.
{jean.barbier, nicolas.macris}@epfl.ch
Abstract
Consider random linear estimation with Gaussian measurement matrices and noise. One can compute infinitesimal variations of the mutual information under infinitesimal variations of the signal-to-noise ratio or of the measurement rate. We discuss how each variation is related to the minimum mean-square error and deduce that the two variations are directly connected through a very simple identity. The main technical ingredient is a new interpolation method called “sub-extensive interpolation method”. We use it to provide a new proof of an I-MMSE relation recently found by Reeves and Pfister [1] when the measurement rate is varied. Our proof makes it clear that this relation is intimately related to another I-MMSE relation also recently proved in [2].
One can directly verify that the identity relating the two types of variation of mutual information is indeed consistent with the one letter replica symmetric formula for the mutual information, first derived by Tanaka [3] for binary signals, and recently proved in more generality in [4, 2, 1, 5] (by independent methods). However, our proof is independent of any knowledge of Tanaka’s formula.
I Introduction
Random linear estimation (RLE) is a fundamental research field which has been revived by a number of recent theoretical and practical developments such as compressed sensing [6], error correction via sparse superposition codes [7], Boolean group testing [8] or code division multiple access in communication [9]. Important steps towards a complete rigorous theory have recently been obtained. In particular, the replica symmetric formula, a single-letter formula for the asymptotic mutual information (MI), is now proven for Gaussian RLE [4, 2, 1, 5]. In [2] the limits of optimality of the low-complexity approximate message-passing denoising algorithm are explicitly established.
An important ingredient in the proofs of the replica formula is a set of relations for the rate of variation (the derivative) of the MI when the signal-to-noise ratio varies [2] and when the measurement rate varies [5]. These formulas give the rate of variation directly in terms of the MMSE, and therefore belong to a “family” of I-MMSE relations, the simplest member of the family being the well-known relation of Guo, Verdú and Shamai [10]. Of course, once the replica symmetric formula for the MI is available, one can check all these relations a posteriori; however, their proofs do not involve any knowledge of the replica formula.
In this note we give a new derivation of the I-MMSE relation proved and used in [5]. The derivation given here explicitly shows that all I-MMSE relations are intimately connected. The main new technical ingredient is an interpolation method, here called the sub-extensive interpolation method. It mixes ideas originating in interpolation methods developed in recent years for dense and sparse graphical systems; we believe it is of independent interest and expect it to find applications in other problems.
We end this introduction with a few (non-exhaustive) pointers to the literature that has led to the present work. The replica formula for Gaussian RLE was first proposed, on the basis of (non-rigorous) calculations using the replica method, by Tanaka [3] for the CDMA problem with binary input signals, and was later generalized in [11] (see also [12] for recent developments in the context of compressed sensing). Montanari and Tse [13] sketched a rigorous proof of Tanaka’s formula in a regime where there is no phase transition (by which we mean here no jump discontinuity in the MMSE), and Korada and Macris [14, 15] used a Guerra-Toninelli [16] interpolation method to establish that the replica formula is always an upper bound to the MI. In [2, 4] the converse bound (and hence equality) is proven using spatial coupling as a proof technique [17, 18]; this technique was developed in the realm of spatially coupled graphical systems, namely spatial coupling for RLE [19, 20, 21], threshold saturation [22, 23, 24, 25, 26] and invariance of the MI under spatial coupling [27, 17].
II Random linear estimation: Setting and results
II-A Gaussian random linear estimation
We consider Gaussian RLE, where one is interested in reconstructing a signal from measurements obtained from the projection of s by a random i.i.d Gaussian measurement matrix . We consider i.i.d additive white Gaussian noise (AWGN) of variance . Call the (standardized) noise components , . The RLE model is
[TABLE]
Consider a structured setting where the signal is made of i.i.d -dimensional sections distributed according to a discrete prior with a finite (but as large as desired) number of terms and all ’s bounded (with maximum componentwise amplitude ). Thus the total number of signal components is . We denote if for all its i.i.d sections. The matrix has entries . We place ourselves in the high dimensional setting where the measurement rate is fixed when letting ( is always finite).
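To make the setting concrete, here is a minimal sketch that samples one instance of the structured Gaussian RLE model. It is illustrative only: the helper name `sample_rle_instance`, the variance 1/L of the matrix entries, and the form y = Φs + √Δ z assumed for model (1) are our own assumptions, not taken from the paper. Each of the L sections is drawn i.i.d. from a discrete prior over B-dimensional atoms, so the signal has L·B components.

```python
import math
import random


def sample_rle_instance(L, B, alpha, delta, atoms, probs, seed=0):
    """Sample (s, phi, y) for a structured Gaussian RLE instance.

    Hypothetical helper. Assumptions (ours): each of the L sections is a
    B-dimensional atom drawn i.i.d. from a discrete prior, the matrix has
    i.i.d. N(0, 1/L) entries, and y = phi s + sqrt(delta) z.
    """
    rng = random.Random(seed)
    M = round(alpha * L)   # number of measurements at fixed rate alpha
    N = L * B              # total number of signal components
    # i.i.d. sections from the discrete prior, concatenated into one vector
    sections = rng.choices(atoms, weights=probs, k=L)
    s = [c for section in sections for c in section]
    sd = 1.0 / math.sqrt(L)
    phi = [[rng.gauss(0.0, sd) for _ in range(N)] for _ in range(M)]
    # standardized Gaussian noise scaled by sqrt(delta): AWGN of variance delta
    y = [sum(p * si for p, si in zip(row, s)) + math.sqrt(delta) * rng.gauss(0.0, 1.0)
         for row in phi]
    return s, phi, y


# B = 2 sections with a single non-zero entry each (a toy, superposition-code-like prior)
s, phi, y = sample_rle_instance(L=40, B=2, alpha=0.75, delta=0.1,
                                atoms=[(1.0, 0.0), (0.0, 1.0)], probs=[0.5, 0.5])
```

Keeping the prior as an explicit list of atoms mirrors the finite discrete prior assumed in the text.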
Define , . The likelihood of y is . From the Bayes formula, the posterior of the RLE model is
[TABLE]
where by a slight abuse of notation y here denotes the set of the independent quenched random variables . The normalization is the integral w.r.t x of the numerator. The Gibbs averages w.r.t this posterior (2) are denoted by . For example the MMSE estimator is .
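For very small instances the posterior (2) and its Gibbs averages can be computed exactly by enumeration, which is a useful sanity check on the definitions. The sketch below assumes B = 1, a discrete prior given as atoms and probabilities, and a likelihood proportional to exp(−‖y − Φx‖²/(2Δ)); the helper name `posterior_mean` is hypothetical. Log-weights are shifted by their maximum before exponentiating so that the normalization does not underflow.

```python
import itertools
import math


def posterior_mean(y, phi, delta, atoms, probs):
    """Exact Gibbs average <x> under the RLE posterior, by enumeration (B = 1).

    Weight of x: prod_i probs[x_i] * exp(-||y - phi x||^2 / (2 * delta)).
    Only feasible for a handful of components (len(atoms) ** n terms).
    """
    n = len(phi[0])
    log_weights, configs = [], []
    for idx in itertools.product(range(len(atoms)), repeat=n):
        x = [atoms[k] for k in idx]
        log_prior = sum(math.log(probs[k]) for k in idx)
        resid = sum((yi - sum(p * xj for p, xj in zip(row, x))) ** 2
                    for yi, row in zip(y, phi))
        log_weights.append(log_prior - resid / (2.0 * delta))
        configs.append(x)
    top = max(log_weights)                          # shift to avoid underflow
    weights = [math.exp(lw - top) for lw in log_weights]
    Z = sum(weights)                                # normalization (up to the shift)
    return [sum(w * x[i] for w, x in zip(weights, configs)) / Z for i in range(n)]


phi = [[1.0, 0.5], [0.2, -1.0], [0.3, 0.3]]
y = [0.5, 1.2, 0.0]        # noiseless measurements of s = (1, -1)
print(posterior_mean(y, phi, 0.01, atoms=[-1.0, 1.0], probs=[0.5, 0.5]))
```

The cost grows exponentially with the number of components, so this is only a check of the definitions, not a practical estimator.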
II-B I-MMSE relations
The mutual information per section is
[TABLE]
and the MMSE per section is
[TABLE]
Moreover we define a measurement MMSE as
[TABLE]
linked to the MI by a canonical I-MMSE relation [10, 2]:
[TABLE]
One can show thanks to the Guerra-Toninelli interpolation method that exists (see [15] for binary signals and [2] for general signals). It can also be shown from (5) that [28] (indeed the measurement MMSE cannot increase by increasing the snr) so is a sequence of concave functions. We will repeatedly use that for a sequence of concave differentiable functions whose pointwise limit exists on , a standard result of real analysis states that: the limit is concave and continuous on all compact subsets; the limit is differentiable at almost every (a.e.) point; we can exchange the limit and derivative for a.e. points. Therefore for a.e. we have .
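The simplest member of the I-MMSE family is the scalar Gaussian channel of Guo, Verdú and Shamai [10]. As an illustration (our choice of example, not part of the paper), the sketch below takes Y = √snr X + Z with X uniform on {−1, +1}, for which I(snr) = snr − E[ln cosh(snr + √snr Z)] nats and the posterior mean is tanh(√snr Y). It checks numerically that dI/dsnr = mmse/2 and that I is concave in snr, a scalar analogue of the concavity used above. Expectations over Z use a plain trapezoid rule on [−10, 10]; this is an assumption that is accurate here because the integrands are smooth and the Gaussian tails are negligible.

```python
import math


def gauss_expect(f, n=4001, zmax=10.0):
    """E[f(Z)], Z ~ N(0, 1), by the trapezoid rule on [-zmax, zmax]."""
    h = 2.0 * zmax / (n - 1)
    total = 0.0
    for i in range(n):
        z = -zmax + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def mi_bpsk(snr):
    """I(X;Y) in nats for Y = sqrt(snr) X + Z, X uniform on {-1, +1}."""
    r = math.sqrt(snr)
    return snr - gauss_expect(lambda z: math.log(math.cosh(snr + r * z)))


def mmse_bpsk(snr):
    """mmse = E[X^2] - E[<X>^2] = 1 - E[tanh^2(sqrt(snr) Y)], conditioned on X = +1 by symmetry."""
    r = math.sqrt(snr)
    return 1.0 - gauss_expect(lambda z: math.tanh(snr + r * z) ** 2)


snr, eps = 0.8, 1e-3
dI = (mi_bpsk(snr + eps) - mi_bpsk(snr - eps)) / (2.0 * eps)  # central difference
print(dI, mmse_bpsk(snr) / 2.0)  # the two numbers should agree closely
```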
We now present the new I-MMSE relations specific to Gaussian RLE.
Theorem II.1 (snr I-MMSE relation)
For Gaussian RLE with a discrete prior, exists and for a.e. ,
[TABLE]
Proof:
As remarked above for a.e. . Thus from (5) exists for a.e. and equals . On the other hand, Theorem 3.4 in [2] states that for a.e. ,
[TABLE]
where $\lim_{L\to\infty} o_L(1) = 0$. Therefore also exists for a.e. and (6) holds. ∎
Remark II.2
The proof of (7) in [2] (Theorem 3.4) requires concentration properties that are currently proven only for a discrete prior. An extension to the case of a general prior (e.g., a mixture of discrete and absolutely continuous parts) could perhaps be obtained by quantizing the signal and showing that the $o_L(1)$ term is uniform in the quantization, but this is not immediate. A similar relation already appears in [13], but to the best of our knowledge the details of the proof are not given.
The main goal of this note is to give a new proof of the following I-MMSE relation first obtained in [1] (for ).
Theorem II.3 ( I-MMSE relation)
For Gaussian RLE and for a.e. and ,
[TABLE]
Remark II.4
We conjecture that the relation is true for all and a.e. . The proof of [1] works for general priors that can have discrete and absolutely continuous parts. Our approach relies on the concentration theorems underpinning (7) which, as explained in Remark II.2, do not quite cover this general case.
We end this section by noting that eliminating from (6) and (8) we obtain the interesting and simple formula
[TABLE]
One may check (9) directly on the replica formula for the MI.
III A sub-extensive interpolation method
We introduce the sub-extensive interpolation method, which allows us to first prove a slightly weaker form of Theorem II.3. All the stated lemmas are proved in the next section.
III-A The interpolated perturbed model
The interpolation is done between the RLE model (1) where a measurement matrix with lines is used and one where the measurement matrix has lines, . The additional lines have the same statistical properties and are indexed by a set of extra indices. This is a sub-extensive set because . We still denote the overall measurement and measurement matrix, that include the additional measurements and lines associated with . Define the following interpolated perturbed Hamiltonian
[TABLE]
Here the interpolation parameter is , and going from to continuously adds the new measurements. The first perturbation term corresponds to extra measurements obtained from scalar AWGN “side channels”, , , , where the snr is “small” and will eventually tend to zero. This term allows us to use a concentration result proved in [2] (see Lemma IV.1 in Sec. IV, where the second term is also needed for technical reasons).
Denote the MI associated with the perturbed interpolated model , expressed similarly to (3) but with . This leads to where . The MI of this model with and without the additional measurements is, respectively, and . The Gibbs average is associated with the posterior of the interpolated perturbed model . Finally, we define the MMSE similarly to (4) but with replacing .
III-B The sub-extensive interpolation
We first show a weaker version of Theorem II.3 for the perturbed interpolated model which is valid for all but a.e. . In sec. III-C we show how to take the limit for a.e. and thus recover Theorem II.3.
Theorem III.1 ( I-MMSE relation for a.e. )
The following limits exist for all and a.e. , and satisfy
[TABLE]
The proof is based on two lemmas proved in sec. IV.
Define a measurement MMSE associated to the subset :
[TABLE]
where denotes the expectation w.r.t all quenched variables.
Lemma III.2 (MMSE relation)
For a.e. we have
[TABLE]
Lemma III.3 (MMSE variation)
Fix in the interpolated perturbed model (10). Then for any , we have $E_{t,h} = E_{0,h} + o_L(1)$ for a.e. .
We now sketch the proof of Theorem III.1.
Proof:
By the fundamental theorem of calculus, one may write . Direct differentiation gives
[TABLE]
Integrating by parts over , (14) becomes
[TABLE]
for a.e. . For the second equality we used Lemma III.2. Thus
[TABLE]
This integral over cannot be calculated immediately as the MMSE depends on . We overcome this difficulty using Lemma III.3 and . Then (15) becomes
[TABLE]
Note that is the MI of a (perturbed) RLE model with measurement rate while corresponds to a measurement rate . It is then not difficult to show, with concavity inequalities w.r.t. (alternatively, one can directly use Alexandrov’s theorem, which states that a concave function has a second derivative almost everywhere), that for a.e.
[TABLE]
Finally equations (16) and (17) imply (11). ∎
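The concavity step can be illustrated with a stand-in function. For a concave differentiable f and ε > 0, f′(a + ε) ≤ (f(a + ε) − f(a))/ε ≤ f′(a), so a difference quotient with a vanishing step ε_L = |S|/L converges to f′(a) at every point where f is differentiable. In the sketch below, f(a) = √a and the sub-extensive size |S| = ⌈L^u⌉ with u = 1/2 are hypothetical choices made only for the illustration; f is not the mutual information of the paper.

```python
import math


def f(a):
    """Stand-in concave function of the rate (not the paper's MI)."""
    return math.sqrt(a)


def f_prime(a):
    return 0.5 / math.sqrt(a)


def difference_quotient(a, L, u=0.5):
    """(f(a + eps) - f(a)) / eps with eps = |S| / L and |S| = ceil(L ** u).

    The exponent u in (0, 1) makes |S| sub-extensive; it is a hypothetical choice.
    """
    eps = math.ceil(L ** u) / L
    return (f(a + eps) - f(a)) / eps, eps


a = 1.0
for L in (10**2, 10**4, 10**6):
    q, eps = difference_quotient(a, L)
    # Concavity sandwich: f'(a + eps) <= q <= f'(a), and eps -> 0 as L grows.
    assert f_prime(a + eps) <= q <= f_prime(a)
    print(L, eps, q)
```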
III-C Proof of Theorem II.3: taking the limit
We consider the limit of (11). Again, a concavity argument allows to permute this limit and the derivative for a.e. . Also it is not very difficult to argue that all finite size quantities are continuous in . Therefore and . So Theorem II.3 follows if we can show that and for a.e. . We will show that the first limit exchange is valid for all and the second one for a.e. .
For the first limit exchange the argument is standard. The first derivative of w.r.t. is an MMSE, namely , so its second derivative is non-positive because the MMSE cannot increase with increasing snr of the side channel (this can also be seen by explicit calculation [2]). Thus is concave in , and since also exists, the limit is attained uniformly in . This allows us to exchange the limits for all .
The second limit exchange is less immediate because we cannot use a convexity argument directly on the sequence . However by a mild generalisation of Theorem II.1 (that follows from Lemmas 4.5 and 4.6 in [2]) we have for a.e. ,
[TABLE]
Then, since is a concave function of and its limit exists we can take the limit of this equation and permute it with the derivative for a.e. . Thus must exist for a.e. and satisfies
[TABLE]
But since we have (6), we conclude for a.e. . Thus the limits are exchangeable for a.e. .
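The real-analysis fact behind these limit exchanges (stated after (5)) can be seen on a toy sequence chosen for illustration. The concave functions f_n(x) = −√(x² + 1/n) converge pointwise to −|x|; away from x = 0 their derivatives converge to the derivative of the limit, while at x = 0 every f_n has derivative 0 but the limit is not differentiable. This is why, in general, the exchange of limit and derivative only holds for a.e. parameter value.

```python
import math


def f_n(x, n):
    """Concave and smooth; converges pointwise to -|x| as n -> infinity."""
    return -math.sqrt(x * x + 1.0 / n)


def df_n(x, n):
    """Derivative of f_n."""
    return -x / math.sqrt(x * x + 1.0 / n)


def d_limit(x):
    """Derivative of the limit -|x|; it does not exist at x = 0."""
    if x == 0.0:
        raise ValueError("-|x| is not differentiable at x = 0")
    return -1.0 if x > 0 else 1.0


for x in (-0.5, 0.1, 2.0):
    # |f_n'(x) - (limit)'(x)| shrinks as n grows: limit and derivative commute here.
    print(x, [abs(df_n(x, n) - d_limit(x)) for n in (10, 10**3, 10**6)])

# At the kink every f_n has derivative 0, but the limit has no derivative there.
print(df_n(0.0, 10**6))
```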
IV Proofs of Lemmas III.2 and III.3
IV-A Preliminaries
Let X, two i.i.d replicas drawn according to the product distribution . Then for any function ,
[TABLE]
This identity, which has been called a Nishimori identity in the statistical mechanical literature, follows from a simple application of Bayes formula. It has a certain number of useful consequences that we list here (all the derivations can be found in appendix B of [2]).
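The Nishimori identity (18) can be checked numerically on a scalar Gaussian channel y = √snr s + z with a discrete prior, a toy analogue (our choice) of the RLE posterior. The sketch computes both sides, E⟨g(X, S)⟩ and E⟨g(X, X′)⟩, by summing over the prior atoms and integrating over z with a trapezoid rule; the function names and the prior used in the example are hypothetical.

```python
import math


def gauss_expect(f, n=4001, zmax=10.0):
    """E[f(Z)], Z ~ N(0, 1), by the trapezoid rule on [-zmax, zmax]."""
    h = 2.0 * zmax / (n - 1)
    total = sum((0.5 if i in (0, n - 1) else 1.0) * f(-zmax + i * h)
                * math.exp(-0.5 * (-zmax + i * h) ** 2) for i in range(n))
    return total * h / math.sqrt(2.0 * math.pi)


def posterior(y, snr, atoms, probs):
    """Posterior over the atoms for y = sqrt(snr) s + z (log-sum-exp stabilized)."""
    logs = [math.log(p) + math.sqrt(snr) * a * y - 0.5 * snr * a * a
            for a, p in zip(atoms, probs)]
    top = max(logs)
    w = [math.exp(l - top) for l in logs]
    Z = sum(w)
    return [wi / Z for wi in w]


def nishimori_sides(g, snr, atoms, probs):
    """Return (E<g(X, S)>, E<g(X, X')>) with X, X' i.i.d. posterior replicas."""
    lhs = rhs = 0.0
    for s, ps in zip(atoms, probs):
        def post(z):
            return posterior(math.sqrt(snr) * s + z, snr, atoms, probs)

        def one_replica(z):
            return sum(q * g(x, s) for q, x in zip(post(z), atoms))

        def two_replicas(z):
            q = post(z)
            return sum(q1 * q2 * g(x1, x2)
                       for q1, x1 in zip(q, atoms) for q2, x2 in zip(q, atoms))

        lhs += ps * gauss_expect(one_replica)
        rhs += ps * gauss_expect(two_replicas)
    return lhs, rhs


atoms, probs = [0.0, 1.0, 3.0], [0.6, 0.3, 0.1]   # hypothetical discrete prior
print(nishimori_sides(lambda x, s: x * s, 1.3, atoms, probs))
```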
IV-A1 Identity 1
First we have
[TABLE]
To derive this recall , expand the squares and systematically apply (18).
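The recipe “expand the squares and apply (18)” can be seen on a simple identity, which is our own example and not necessarily (19). Expanding E⟨(X − S)²⟩ and mmse = E[(S − ⟨X⟩)²], and using E⟨X²⟩ = E[S²] and E[S⟨X⟩] = E[⟨X⟩²] (both consequences of (18)), gives E⟨(X − S)²⟩ = 2 mmse. The Monte Carlo sketch below estimates the four moments separately on a scalar channel with a ternary prior (an assumption), so the identity shows up up to sampling error.

```python
import math
import random


def expand_squares_check(snr, atoms, probs, n_samples=50_000, seed=1):
    """Monte Carlo estimates of E<(X - S)^2> and mmse = E[(S - <X>)^2].

    Both are assembled from the four moments E[S^2], E[S<X>], E[<X>^2] and
    E<X^2>, i.e. after expanding the squares, for y = sqrt(snr) s + z.
    """
    rng = random.Random(seed)
    r = math.sqrt(snr)
    m_ss = m_sx = m_xx = m_x2 = 0.0
    for _ in range(n_samples):
        s = rng.choices(atoms, weights=probs)[0]
        y = r * s + rng.gauss(0.0, 1.0)
        logs = [math.log(p) + r * a * y - 0.5 * snr * a * a for a, p in zip(atoms, probs)]
        top = max(logs)
        w = [math.exp(l - top) for l in logs]
        Z = sum(w)
        mean = sum(wi * a for wi, a in zip(w, atoms)) / Z        # <X>
        second = sum(wi * a * a for wi, a in zip(w, atoms)) / Z  # <X^2>
        m_ss += s * s
        m_sx += s * mean
        m_xx += mean * mean
        m_x2 += second
    m_ss, m_sx, m_xx, m_x2 = (v / n_samples for v in (m_ss, m_sx, m_xx, m_x2))
    gibbs_error = m_x2 - 2.0 * m_sx + m_ss   # E<(X - S)^2>
    mmse = m_ss - 2.0 * m_sx + m_xx          # E[(S - <X>)^2]
    return gibbs_error, mmse


# Nishimori predicts gibbs_error = 2 * mmse, up to Monte Carlo error.
print(expand_squares_check(1.0, [-1.0, 0.0, 1.0], [0.25, 0.5, 0.25]))
```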
IV-A2 Identity 2
Set . Then (18) implies
[TABLE]
IV-A3 Identity 3
This one is more complicated. Define . From Gaussian integration by parts over and (18) one can show
[TABLE]
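Identity 3 relies on Gaussian integration by parts, E[Z f(Z)] = E[f′(Z)] for Z ~ N(0, 1) and smooth f of moderate growth. The sketch below checks this numerically for f(z) = tanh(a + bz), a function of the kind that appears in posterior means of Gaussian channels; the parameter values are arbitrary and the trapezoid quadrature is our own choice.

```python
import math


def gauss_expect(f, n=4001, zmax=10.0):
    """E[f(Z)], Z ~ N(0, 1), by the trapezoid rule on [-zmax, zmax]."""
    h = 2.0 * zmax / (n - 1)
    total = 0.0
    for i in range(n):
        z = -zmax + i * h
        total += (0.5 if i in (0, n - 1) else 1.0) * f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def stein_sides(f, f_prime):
    """Both sides of Gaussian integration by parts: E[Z f(Z)] and E[f'(Z)]."""
    return gauss_expect(lambda z: z * f(z)), gauss_expect(f_prime)


a, b = 0.7, 1.3   # arbitrary parameters
lhs, rhs = stein_sides(lambda z: math.tanh(a + b * z),
                       lambda z: b * (1.0 - math.tanh(a + b * z) ** 2))
print(lhs, rhs)
```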
We also need the following concentration result.
Lemma IV.1 (Concentration of )
Let (recall (20)). For any we have
[TABLE]
The proof of this lemma is the same as that of Proposition 8.1 in [2]. This type of result is also found in [15] for binary signals. Lebesgue’s dominated convergence theorem applied to (22) implies $\mathbb{E}[\langle\delta\mathcal{E}^{2}\rangle_{t,h}] = o_L(1)$ for a.e. .
IV-B Proof of Lemma III.2
Using (12) and an integration by parts w.r.t gives
[TABLE]
which combined with (19) leads to
[TABLE]
Integrating (23) by parts again, but this time w.r.t. , one finds
[TABLE]
(where and are i.i.d replicas). Then using (21) for the first term in the bracket and the definition of for the second one, simple algebra leads to where
[TABLE]
The noise z has i.i.d standardized Gaussian components, so the central limit theorem implies
[TABLE]
Below we show that Lemma IV.1 implies for a.e. ,
[TABLE]
Then from (23) and (24) we get $Y_2 = (t/\Delta)\,Y_{t,h}^{(\mathcal{S})}E_{t,h} + o_L(1)$. Putting all pieces together we get
[TABLE]
which is equivalent to (13) in Lemma III.2.
It remains to justify (24). We have . Thus it suffices to show that the second term is $o_L(1)$ for a.e. . By Cauchy-Schwarz,
[TABLE]
As remarked after its statement, Lemma IV.1 implies $\mathbb{E}[\langle\delta\mathcal{E}^{2}\rangle_{t,h}] = o_L(1)$ for a.e. ; thus we only have to argue that is bounded uniformly in . By Cauchy-Schwarz again, the square of this quantity is smaller than
[TABLE]
After expanding, only terms of the form remain (with ). By Cauchy-Schwarz once more, their square is less than
[TABLE]
where the equality comes from the Nishimori identity (18). These moments are clearly all bounded uniformly in . Indeed, is independent of s, so conditional on s the linear combination is a Gaussian variable with a variance less than .
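The last step only uses that a fixed linear combination of independent centered Gaussians is itself Gaussian, so all its moments are controlled by its variance. The sketch below assumes, as a hypothetical normalization, matrix entries of variance 1/L, and takes a fixed vector v whose components are bounded by 2 s_max in absolute value (calling s_max the maximal componentwise amplitude of the prior). The variance ‖v‖²/L is then at most 4 B s_max², independently of L, and the empirical second and fourth moments are compared with σ² and 3σ⁴.

```python
import math
import random


def linear_combination_moments(v, L, n_samples=10_000, seed=2):
    """Empirical E[G^2] and E[G^4] for G = sum_i Phi_i v_i with Phi_i i.i.d. N(0, 1/L).

    The 1/L entry variance is an assumed normalization. For fixed v, G is exactly
    N(0, sigma2) with sigma2 = ||v||^2 / L, so E[G^4] = 3 * sigma2 ** 2.
    """
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(L)
    m2 = m4 = 0.0
    for _ in range(n_samples):
        g = sum(rng.gauss(0.0, sd) * vi for vi in v)
        m2 += g * g
        m4 += g ** 4
    return m2 / n_samples, m4 / n_samples


L, B, s_max = 50, 1, 1.0
rng = random.Random(0)
v = [rng.choice([-2.0, 0.0, 2.0]) * s_max for _ in range(L * B)]  # |v_i| <= 2 s_max
sigma2 = sum(vi * vi for vi in v) / L
assert sigma2 <= 4.0 * B * s_max ** 2     # bound that does not depend on L
print(linear_combination_moments(v, L), (sigma2, 3.0 * sigma2 ** 2))
```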
The proof of Lemma III.2 is now complete.
IV-C Proof of Lemma III.3
Recall (20). Then the MMSE difference can be written as
[TABLE]
where . Note that in (28) can be replaced by . Also, all ’s are statistically equivalent and we can replace them by the first term in the set , say . Thus
[TABLE]
Integrating over , applying Fubini and Cauchy-Schwarz, one gets
[TABLE]
Proceeding similarly as in the steps (26)–(27) one shows that (w.r.t ). Lemma IV.1 allows to conclude
[TABLE]
which implies Lemma III.3 by Lebesgue’s dominated convergence theorem as the integrand is bounded and .
V Conclusion
Let us end by pointing out another application of the sub-extensive interpolation method. In [2] it is used to prove the invariance of the MI under spatial coupling in RLE. There, one interpolates between a homogeneous measurement matrix and a spatially coupled one. This is done by iteratively removing sub-extensive blocks of lines from the homogeneous matrix and replacing them by “spatially coupled” lines. Along this process the MI varies monotonically, which leads to useful inequalities. This partly discrete, partly continuous interpolation defines a “family” of interpolation methods parametrized by (the sub-extensive block size parameter). Roughly speaking, our sub-extensive interpolation method “interpolates” between the purely global and continuous method of Guerra and Toninelli for dense graphical models [16] (at ) and the combinatorial approach developed for sparse graphs by Gamarnik, Bayati and Tetali in [29] (at ), where a discrete and local interpolation is done “one constraint at a time” (here one measurement at a time).
Acknowledgments
J.B. acknowledges SNSF grant no. 200021-156672.
References
[1] G. Reeves and H. D. Pfister, “The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact,” in 2016 IEEE International Symposium on Information Theory (ISIT), July 2016.
[2] J. Barbier, N. Macris, M. Dia, and F. Krzakala, “Mutual Information and Optimality of Approximate Message-Passing in Random Linear Estimation.” [Online]. Available: https://arxiv.org/pdf/1701.05823v1.pdf
[3] T. Tanaka, “A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors,” IEEE Trans. on Information Theory, 2002.
[4] J. Barbier, M. Dia, N. Macris, and F. Krzakala, “The Mutual Information in Random Linear Estimation,” in the 54th Annual Allerton Conference on Communication, Control, and Computing, September 2016.
[5] G. Reeves and H. D. Pfister, “The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact,” 2016. [Online]. Available: http://arxiv.org/abs/1607.02524
[6] E. J. Candès and T. Tao, “Near-optimal signal recovery from random projections: Universal encoding strategies?” IEEE Trans. on Information Theory, 2006.
[7] A. Barron and A. Joseph, “Toward fast reliable communication at rates near capacity with Gaussian noise,” in Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, 2010.
[8] G. K. Atia and V. Saligrama, “Boolean compressed sensing and noisy group testing,” IEEE Trans. on Information Theory, 2012.
