Mutual information for low-rank even-order symmetric tensor estimation
Clément Luneau, Jean Barbier, Nicolas Macris

TL;DR
This paper derives a variational formula for the asymptotic mutual information in finite-rank symmetric tensor factorization of even order, extending adaptive interpolation methods to more complex tensor models.
Contribution
It introduces a novel extension of the adaptive interpolation method for finite-rank, even-order symmetric tensors, advancing theoretical understanding of tensor estimation.
Findings
Derived a single-letter variational expression for mutual information
Extended adaptive interpolation to finite-rank, even-order tensors
Identified limitations for odd-order tensor cases
Abstract
We consider a statistical model for finite-rank symmetric tensor factorization and prove a single-letter variational expression for its asymptotic mutual information when the tensor is of even order. The proof applies the adaptive interpolation method originally invented for rank-one factorization. Here we show how to extend the adaptive interpolation to finite-rank and even-order tensors. This requires new nontrivial ideas with respect to the current analysis in the literature. We also underline where the proof falls short when dealing with odd-order tensors.
Clément Luneau
Communication Theory Laboratory, École Polytechnique Fédérale de Lausanne, Switzerland
Jean Barbier
The Abdus Salam International Center for Theoretical Physics, Trieste, Italy.
Nicolas Macris
Communication Theory Laboratory, École Polytechnique Fédérale de Lausanne, Switzerland
1 Introduction
There exist well-known unsupervised algorithms to discover structure in a 2D dataset, e.g., singular value decomposition (SVD), principal component analysis (PCA) and other spectral methods [14]. Tensors naturally handle multidimensional data and their use becomes more and more beneficial with the emergence of big data, a strong incentive to go beyond the flat matrix world. Tensor decompositions come with some advantages with respect to matrices, and have numerous applications in signal processing and machine learning, e.g., data compression, data visualization, learning probabilistic latent variables models, etc. [9, 21]. The canonical polyadic decomposition (CPD), also known as tensor rank decomposition or tensor factorization, is the most familiar one and represents a tensor as a minimum-length linear combination of rank-one tensors. This minimum length defines the tensor rank. If instead the number $K$ of rank-one tensors forming the linear combination is not minimal, we talk of a $K$-term decomposition. Decompositions of tensors are also called tensor factorizations and this is the terminology we adopt in the rest of the paper.
One approach to explore computational and/or statistical limits of tensor factorization is to consider a statistical model, as done in [22]. The model is the following: draw $K$ column vectors, evaluate for each of them their $p$th tensor power and sum those $K$ symmetric order-$p$ tensors (this sum is exactly a $K$-term polyadic decomposition). Tensor factorization can then be studied as an inference problem, namely, to estimate the initial vectors from noisy observations of the tensor and to determine information theoretic limits for this task. To do so, we focus on proving formulas for the asymptotic mutual information between the noisy observed tensor and the original vectors. Such formulas were first rigorously derived for $p = 2$ and $K = 1$, i.e., rank-one matrix factorization: see [15] for the case with a binary input vector, [10] for the restricted case in which no discontinuous phase transition occurs, [16] for a single-sided bound and, finally, [3] for the fully general case. The proof in [3] combines interpolation techniques with spatial coupling and an analysis of the Approximate Message-Passing (AMP) algorithm. Later, and still for $p = 2$, [17] went beyond rank-one by using a rigorous version of the cavity method. Reference [18] applied the heuristic replica method to conjecture a formula for any $p$ and finite $K$, which is then proved for $p \geq 2$ and $K = 1$. Reference [18] also details the AMP algorithm for tensor factorization and shows how the single-letter variational expression for the mutual information allows one to give guarantees on AMP’s performance. Afterwards, [5, 6] introduced the adaptive interpolation proof technique which they applied to the case $p \geq 2$, $K = 1$. Other proofs based on interpolations recently appeared, see [1] (, ) and [20] (, ).
In this work, we prove the conjectured replica formula for any finite rank and any even order using the adaptive interpolation method. We also underline what is missing to extend the proof to odd orders. The adaptive interpolation method was introduced in [5, 6] as a powerful extension to the Guerra-Toninelli interpolation scheme [11]. Since then, it has been applied to many other inference problems in order to prove formulas for the mutual information, e.g., [7, 4]. While our proof outline is similar to [6], there are two important new ingredients. First, to establish a tight lower bound on the asymptotic mutual information, we have to prove the regularity of a change of variable given by the solutions to an ordinary differential equation. This is nontrivial when the rank becomes greater than one. Second, the same bound requires one to prove the concentration of the overlap (a quantity that fully characterizes the system in the high-dimensional limit). When the rank is greater than one, this overlap is a matrix and a recent result [2] on the concentration of overlap matrices can be adapted to obtain the required concentration in our interpolation scheme.
The paper is organized as follows. In Section 2 we set up our precise statistical model and state the main theorems giving the single-letter variational expression for the asymptotic mutual information. The adaptive interpolation method is formulated in Section 3 and the basic upper and lower bounds on the asymptotic mutual information are proved in Section 4. Sections 5 and 6 contain the new and essential results which allow us to go from rank-one to finite-rank tensors. Finally, the difficulties encountered for odd-order tensors are discussed in the last section, that is, Section 7. The reader will find in Appendix B a technical calculation which is new and crucial to our proof, while Appendices A and C present more classical material.
2 Low-rank symmetric tensor factorization
We study the following statistical model. Let $n$ be a positive integer. $X_1, \dots, X_n$ are random column vectors in $\mathbb{R}^K$, independent and identically distributed (i.i.d.) with distribution $P_X$. These vectors are not directly observed. Instead, for each $p$-tuple $(i_1, \dots, i_p) \in \{1, \dots, n\}^p$ with $i_1 \leq i_2 \leq \dots \leq i_p$, we observe
[TABLE]
where is a known signal-to-noise ratio (SNR) and the noise is i.i.d. with respect to the standard normal distribution . Let be the matrix whose th row is given by . All the observations (1) are combined into the following symmetric order- tensor ( denotes the th column of ):
[TABLE]
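The construction described above (draw the columns, take their tensor powers, sum them, add symmetric Gaussian noise) can be sketched numerically as follows. This is only an illustration under stated assumptions: the prefactor `sqrt(snr * (p-1)! / n**(p-1))` and the standard Gaussian prior on the rows are assumptions made for the sketch, not taken from the displays above, and the function name `sample_symmetric_tensor` is hypothetical.

```python
import itertools
import math

import numpy as np


def sample_symmetric_tensor(n, K, p, snr, rng):
    """Sample a noisy sum of K p-th tensor powers (a K-term polyadic decomposition)."""
    # Assumption: rows of X are i.i.d. standard Gaussian vectors in R^K.
    X = rng.standard_normal((n, K))

    # Signal: sum over the K columns of the p-th tensor power of each column.
    signal = np.zeros((n,) * p)
    for k in range(K):
        power = X[:, k]
        for _ in range(p - 1):
            power = np.multiply.outer(power, X[:, k])
        signal += power

    # Assumption: this normalization keeps signal and noise on comparable scales.
    scale = math.sqrt(snr * math.factorial(p - 1) / n ** (p - 1))

    # Symmetric noise: one N(0, 1) draw per sorted index tuple i_1 <= ... <= i_p,
    # copied to every permutation of that tuple.
    noise = np.zeros((n,) * p)
    for idx in itertools.combinations_with_replacement(range(n), p):
        z = rng.standard_normal()
        for perm in set(itertools.permutations(idx)):
            noise[perm] = z

    return X, scale * signal + noise


rng = np.random.default_rng(0)
X, Y = sample_symmetric_tensor(n=4, K=2, p=4, snr=1.0, rng=rng)
```

The returned tensor is invariant under any permutation of its axes, which is why observing only the entries with sorted indices loses no information.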
Our main result is the proof of a formula for the mutual information in the limit while the rank is kept fixed. This formula is given as the optimization of a potential over the cone of symmetric positive semidefinite matrices . Let and . Define the convex function (see Lemma 4 in Appendix A)
[TABLE]
as well as the potential
[TABLE]
where is the th Hadamard power of (for two matrices $A$ and $B$ of the same dimension, the Hadamard product $A \circ B$ is the matrix of same dimension with elements given by $(A \circ B)_{ij} = A_{ij} B_{ij}$). Note that, by the Schur Product Theorem [23], the Hadamard product of two matrices in $\mathcal{S}_K^+$ is also in $\mathcal{S}_K^+$. Let be the second moment matrix of a random vector . Our main result is the proof of the replica formula conjectured in [18], that is,
Theorem 1.
(Mutual information in the high-dimensional limit) Assume is even and is such that its first moments are finite. Then
[TABLE]
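The potential in Theorem 1 is built from Hadamard (entrywise) powers of positive semidefinite matrices, and the Schur Product Theorem guarantees that these powers stay positive semidefinite. The following minimal numerical check illustrates that fact on a random matrix; the helper name `hadamard_power` is hypothetical and the matrix size is arbitrary.

```python
import numpy as np


def hadamard_power(S, k):
    """Entrywise k-th power of S (not the matrix power)."""
    return S ** k


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
S = A @ A.T  # symmetric positive semidefinite by construction

# Smallest eigenvalue of each Hadamard power; all should be nonnegative
# up to floating-point error.
min_eigs = [np.linalg.eigvalsh(hadamard_power(S, k)).min() for k in range(1, 6)]
```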
Important remark: We can reduce the proof of (4) to the case by rescaling properly . From now on, we set and define .
Before proving Theorem 1, we introduce important information theoretic quantities, adopting the statistical mechanics terminology. Let . Given the observations , define the Hamiltonian for all :
[TABLE]
Using Bayes’ rule, the posterior probability density function is
[TABLE]
with the normalization factor. Finally, the free entropy is the quantity
[TABLE]
which is linked to the mutual information through the identity
[TABLE]
In (7), is a quantity such that is bounded uniformly in . Thanks to (7), Theorem 1 will follow directly from the next two bounds on the asymptotic free entropy.
Theorem 2.
(Lower bound on the asymptotic free entropy) Assume is even and is such that its first moments are finite. Then
[TABLE]
Theorem 3.
(Upper bound on the asymptotic free entropy) Assume is even and has bounded support. Then
[TABLE]
Important remark: Note that the assumption on in Theorem 3 is stricter than the one in Theorem 1. Therefore, combining Theorem 2 and Theorem 3 only proves the limit (4) for a distribution which has bounded support. The generalization to a distribution whose first moments are finite is done by approaching with distributions having bounded support, much as it is done in [17, Section 6.2.2].
3 Adaptive path interpolation
We introduce a time parameter . The adaptive path interpolation interpolates from the original channel (1) at to decoupled channels at . In between, we follow an interpolation path , which is a continuously differentiable function parametrized by a small perturbation and such that . More precisely, for , we observe:
[TABLE]
The noise $\widetilde{Z}_{j} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, I_{K})$ is independent of both and . Let be the matrix whose th row is given by and . The associated interpolating Hamiltonian reads:
[TABLE]
The interpolating free entropy is defined similarly to the original free entropy (6), that is,
[TABLE]
with . Evaluating (12) at both extremes gives:
[TABLE]
denotes the Frobenius norm and is a quantity such that . In order to deal with future computations, it is useful to introduce the Gibbs brackets that denote an expectation with respect to the posterior distribution, i.e.,
[TABLE]
Combining (13) with the fundamental theorem of calculus , we obtain the sum-rule of the adaptive path interpolation.
Proposition 1 (Sum-rule).
Assume has finite th-order moments. Denote the derivative of the interpolation path . Let be the entries of the overlap matrix . Then
[TABLE]
where and are independent of and , respectively.
Proof.
See Section 5 for the computation of , that is, the -derivative of . ∎
4 Matching bounds
In this section we prove both Theorems 2 and 3 by plugging two different choices for in the sum-rule (15).
4.1 Lower bound: proof of Theorem 2
A lower bound on is obtained by choosing the interpolation function with a symmetric positive semidefinite matrix, i.e., and . Then the sum-rule (15) reads
[TABLE]
where . If is even then is nonnegative on and (16) directly implies . Taking the inferior limit on both sides of this inequality, and bearing in mind that the inequality is valid for all , ends the proof of Theorem 2.
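The role of the even-order assumption in this argument can be illustrated numerically. The sketch below assumes that the remainder function has the tangent-line form $h_p(r, q) = r^p - p\,r\,q^{p-1} + (p-1)\,q^p$, which is the gap between $x \mapsto x^p$ and its tangent at $q$; the exact definition is in the elided display, so this form is an assumption. For even $p$ the map $x \mapsto x^p$ is convex on the whole real line, so the gap is nonnegative everywhere, while for odd $p$ it takes negative values.

```python
import numpy as np


def h(p, r, q):
    """Assumed remainder: gap between r**p and the tangent of x**p at q."""
    return r ** p - p * r * q ** (p - 1) + (p - 1) * q ** p


grid = np.linspace(-3.0, 3.0, 121)
R, Q = np.meshgrid(grid, grid)

# Minimum of h_p over the grid: about zero for even p, negative for odd p.
min_even = {p: h(p, R, Q).min() for p in (2, 4, 6)}
min_odd = {p: h(p, R, Q).min() for p in (3, 5)}
```

For instance, $h_3(-3, 0) = -27$, which shows why the sign argument cannot be applied entrywise when the order is odd.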
We have at our disposal a wealth of interpolation paths when considering any continuously differentiable . However, to establish the lower bound (8), we have used a simple linear interpolation, i.e., . Such an interpolation dates back to Guerra [11] and was already used by [18, 17] to derive the lower bound (8) for both cases $K = 1$, any order $p$, and $p = 2$, any finite rank $K$. Now that we turn to the proof of the upper bound (9), we will see how the flexibility in the choice of constitutes an improvement on the classical interpolation.
4.2 Upper bound: proof of Theorem 3
4.2.1 Interpolation determined by an ordinary differential equation (ODE)
The sum-rule (15) suggests picking an interpolation path satisfying
[TABLE]
The integral in (15) can then be split in two terms: one similar to the second summand in (3), and one that will vanish in the high-dimensional limit if the overlap concentrates. The next proposition states that (17) indeed admits a solution, a fact which is not obvious because the Gibbs brackets themselves depend on . Nontrivial properties required to show the upper bound (9) are also proved.
Proposition 2.
For all , there exists a unique global solution to the first-order ODE
[TABLE]
This solution is continuously differentiable and bounded. If is even then , is a -diffeomorphism from (the open cone of symmetric positive definite matrices) into whose Jacobian determinant is greater than one, i.e.,
[TABLE]
Here denotes the Jacobian matrix of .
Proof.
We now rewrite (17) explicitly as an ODE. Let be a matrix in . Consider the problem of inferring from the following observations:
[TABLE]
It is reminiscent of the interpolating problem (10). We can form a Hamiltonian similar to (11), where is simply replaced by , and are the Gibbs brackets associated to the posterior of this model. We define the function
[TABLE]
Note that is a symmetric positive semidefinite matrix. Indeed, from the Nishimori identity [Footnote: The Nishimori identity is a direct consequence of the Bayes formula. In our setting, it states $\mathbb{E}\langle g(x, X)\rangle = \mathbb{E}\langle g(x, x')\rangle$ where $x, x'$ are two samples drawn independently from the posterior distribution given the observations. Here $g$ can also explicitly depend on the observations.], , . By the Schur Product Theorem [23], the Hadamard power also belongs to , justifying that takes values in the cone of symmetric positive semidefinite matrices. is continuously differentiable on . By the Cauchy-Lipschitz theorem, there exists a unique global solution to the -dimensional ODE:
[TABLE]
Each initial condition is tied to a unique solution . This implies that the function is injective. Its Jacobian determinant is given by Liouville’s formula [12]:
[TABLE]
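Liouville's formula states that the Jacobian determinant of the flow of an ODE equals the exponential of the divergence of the vector field integrated along the trajectory. The sketch below checks this numerically on a toy two-dimensional vector field that is unrelated to the one in the proof; the field, the RK4 integrator and all function names are assumptions chosen for illustration. The toy divergence is positive everywhere, so the determinant exceeds one, mirroring the mechanism used in Proposition 2.

```python
import numpy as np


def field(x):
    """Toy globally Lipschitz vector field on R^2 (not the one from the paper)."""
    return np.array([0.5 * x[0] + np.sin(x[1]), np.tanh(x[0]) + 0.3 * np.sin(x[1])])


def divergence(x):
    """Divergence of `field`: d/dx0 of the first entry plus d/dx1 of the second."""
    return 0.5 + 0.3 * np.cos(x[1])


def flow(x0, t_end=1.0, steps=400):
    """RK4 on the augmented state (x, integral of the divergence along the path)."""
    def rhs(s):
        return np.append(field(s[:2]), divergence(s[:2]))

    s = np.append(np.asarray(x0, dtype=float), 0.0)
    dt = t_end / steps
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * dt * k1)
        k3 = rhs(s + 0.5 * dt * k2)
        k4 = rhs(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return s[:2], s[2]


def jacobian_det(x0, eps=1e-5):
    """Determinant of the flow map's Jacobian, by central finite differences."""
    x0 = np.asarray(x0, dtype=float)
    J = np.zeros((2, 2))
    for j in range(2):
        e = np.zeros(2)
        e[j] = eps
        J[:, j] = (flow(x0 + e)[0] - flow(x0 - e)[0]) / (2 * eps)
    return np.linalg.det(J)


x0 = [0.4, -0.7]
det_numeric = jacobian_det(x0)
det_liouville = np.exp(flow(x0)[1])
```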
Thanks to the identity (22), we can show that the Jacobian determinant is greater than (or equal to) one by proving that the divergence
[TABLE]
is nonnegative for all . By Lemma 5 in Appendix B, the divergence reads (we omit the subscripts of the Gibbs brackets ):
[TABLE]
where
[TABLE]
If is even then is nonnegative. We show next that the ’s are nonnegative, thus ending the proof of (18). The second expectation on the right-hand side (r.h.s.) of (24) satisfies:
[TABLE]
The inequality is a simple application of Jensen’s inequality, while the equality that follows is an application of the Nishimori identity. The final upper bound is nothing but the first expectation on the r.h.s. of (24). Therefore, . ∎
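The Nishimori identity used in this proof holds for any Bayes-optimal inference problem, so it can be checked by exact enumeration on a toy discrete model. The sketch below is such a check, with a three-valued prior and a random discrete channel that are assumptions for illustration only (the paper's channels are Gaussian). It verifies that $\mathbb{E}[X \langle x \rangle] = \mathbb{E}[\langle x \rangle^2]$, i.e., that the ground truth can be swapped for an independent posterior sample.

```python
import numpy as np

# Toy model (assumption): X takes three values, Y takes four values.
values = np.array([-1.0, 0.0, 2.0])
prior = np.array([0.2, 0.5, 0.3])

rng = np.random.default_rng(1)
channel = rng.random((3, 4))
channel /= channel.sum(axis=1, keepdims=True)  # rows are P(y | x)

joint = prior[:, None] * channel   # P(x, y)
evidence = joint.sum(axis=0)       # P(y)
posterior = joint / evidence       # P(x | y), one column per value of y
post_mean = values @ posterior     # <x>_y, the posterior mean for each y

# Left side: expectation of X times the posterior mean, over the joint law.
lhs = np.sum(joint * values[:, None] * post_mean[None, :])
# Right side: expectation of the squared posterior mean, i.e. E<x x'> for two
# independent posterior samples x, x'.
rhs = np.sum(evidence * post_mean ** 2)
```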
4.2.2 Proof of Theorem 3
Let be a symmetric positive definite matrix, i.e., . We interpolate with the unique solution to (17). The sum-rule (15) then reads:
[TABLE]
Using first the Lipschitz continuity of and then its convexity (see Lemma 4, Appendix A), we obtain:
[TABLE]
with . Combining (25) with the previous inequality directly gives:
[TABLE]
In order to end the proof of (9), we must show that the last integral term in the upper bound (27) vanishes when goes to infinity. This will be the case if the overlap matrix concentrates around its expectation . Indeed, provided that the th-order moments of are finite, there exists a constant depending only on such that
[TABLE]
However, proving that the r.h.s. of (28) vanishes is only possible after integrating on a well-chosen set of perturbations (that play the role of initial conditions in the ODE (21)). In essence, the integration over smoothens the phase transitions that might appear for particular choices of when goes to infinity. We now describe the set of perturbations on which to integrate.
Let be a decreasing sequence of real numbers in and define the sequence of subsets:
[TABLE]
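Those are subsets of symmetric strictly diagonally dominant matrices with positive diagonal entries, hence they are included in the cone of symmetric positive definite matrices (see [13, Corollary 7.2.3]). As $\mathcal{E}_n$ is a $K(K+1)/2$-dimensional hypercube whose side has length $s_n$, its volume is $V_{\mathcal{E}_n} = s_n^{K(K+1)/2}$.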
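The two facts used here (strict diagonal dominance with a positive diagonal implies positive definiteness, and the volume of the perturbation hypercube) can be checked on random samples. The intervals in the sketch are an assumption: off-diagonal entries are drawn in $[s, 2s]$ and diagonal entries in $[2Ks, (2K+1)s]$, one choice that makes every sampled matrix strictly diagonally dominant. The exact intervals are in the elided display (29); the function names are hypothetical.

```python
import numpy as np


def sample_perturbation(K, s, rng):
    """Draw a symmetric K x K matrix from the assumed hypercube of side s."""
    eps = np.zeros((K, K))
    upper = np.triu_indices(K, k=1)
    eps[upper] = rng.uniform(s, 2 * s, size=len(upper[0]))
    eps = eps + eps.T
    eps[np.diag_indices(K)] = rng.uniform(2 * K * s, (2 * K + 1) * s, size=K)
    return eps


def is_strictly_diagonally_dominant(M):
    """Each diagonal entry exceeds the sum of absolute off-diagonal entries in its row."""
    off_diagonal = np.abs(M).sum(axis=1) - np.abs(np.diag(M))
    return bool(np.all(np.diag(M) > off_diagonal))


rng = np.random.default_rng(0)
K, s = 3, 0.01
samples = [sample_perturbation(K, s, rng) for _ in range(200)]

all_dominant = all(is_strictly_diagonally_dominant(e) for e in samples)
all_positive_definite = all(np.linalg.eigvalsh(e).min() > 0 for e in samples)

# A symmetric K x K matrix has K(K+1)/2 free entries, each ranging over a side s.
volume = s ** (K * (K + 1) // 2)
```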
Remember that, as per Proposition 2, for every the interpolation path is chosen as the unique solution to . Then, for a fixed , using the Cauchy-Schwarz inequality and the change of variable (which is justified because is a -diffeomorphism, see Proposition 2), we obtain:
[TABLE]
We introduced the notation while are still the Gibbs brackets associated to the posterior distribution of the inference problem (19). The last inequality follows from (18). It will be easier to work with the convex hulls of , denoted . These convex hulls are uniformly bounded compact sets of . Indeed, every is compact and included in the convex set
[TABLE]
which does not depend on and (see Section 6, property (i) of Lemma 1). Note that the upper bound (30), the inclusion and the nonnegativity of the integrand directly imply:
[TABLE]
By Theorem 4 in Section 6, there exists a positive constant which depends only on , and such that:
[TABLE]
Combining (28), (32) and (33), we finally get:
[TABLE]
To conclude the proof, we have to further constrain to satisfy both and when . E.g., with is a valid choice. Under this constraint, the upper bound (34) vanishes in the high-dimensional limit. Integrating the inequality (27) over and, then, making use of the vanishing upper bound (34) as well as
[TABLE]
give the inequality $f_{n} = V_{\mathcal{E}_{n}}^{-1}\int_{\mathcal{E}_{n}} d\epsilon\, f_{n} \leq \sup_{S\in\mathcal{S}_{K}^{+}}\phi_{p}(S) + o_{n}(1)$. The upper bound (9) follows simply, thus ending the proof of Theorem 3.
5 Time-derivative of the average interpolating free entropy
In order to prove the sum-rule in Proposition 1, we need to compute the derivative of the averaged interpolating free entropy (12) with respect to . We recall that denotes the derivative of and that the overlap matrix is , that is,
[TABLE]
Proposition 3 (Derivative of the average interpolating free entropy).
Assume that has finite th-order moments. Consider the average free entropy (12). Its derivative with respect to satisfies:
[TABLE]
Here is a quantity such that is bounded uniformly in , and .
Proof.
Note that the conditional probability density function of given reads:
[TABLE]
Therefore, the average interpolating free entropy satisfies:
[TABLE]
Taking the time-derivative of (38), we get:
[TABLE]
where , are given by the two expectations and
[TABLE]
Equation (40) comes from differentiating the interpolating Hamiltonian (11). Before going further, we recall two useful identities:
[TABLE]
The identities (41) and (42) can further be combined to obtain
[TABLE]
Evaluating (40) at and then making use of (43), we obtain:
[TABLE]
Thanks to the Nishimori identity,
[TABLE]
It follows that
[TABLE]
where we used to get the last equality. Therefore, . Plugging (5) in the expression for , we obtain:
[TABLE]
The two kinds of expectations appearing on the r.h.s. of (45) are simplified in paragraphs a) and b).
a) Integrating by parts with respect to the Gaussian random variable , we get:
[TABLE]
Summing the latter identity over and , we obtain:
[TABLE]
This last equality can be further simplified by replacing the sum over tuples such that $i_1 \leq \dots \leq i_p$ by a sum over any $p$-tuple whose elements are distinct divided by $p!$ (the cardinality of the symmetric group of degree $p$). This is possible because the summand is symmetric with respect to any permutation of the indices $i_1, \dots, i_p$. We also need to account for the terms corresponding to $p$-tuples having common elements (that is, $i_a = i_b$ for some $a \neq b$). There are $O(n^{p-1})$ such terms and each summand is bounded under the assumption that has finite th order moments. Hence the term appearing in the final equalities:
[TABLE]
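The counting argument behind this step can be verified by brute force. The sketch below enumerates sorted $p$-tuples for a small $n$, counts those with a repeated index, and checks that the remaining ones are exactly the ordered tuples with distinct entries divided by $p!$. The function name and the values of `n` and `p` are arbitrary choices for illustration.

```python
import itertools
import math


def count_tuples(n, p):
    """Count sorted p-tuples, those with a repeated index, and ordered distinct tuples."""
    sorted_tuples = list(itertools.combinations_with_replacement(range(n), p))
    with_repeats = sum(1 for t in sorted_tuples if len(set(t)) < p)
    distinct_ordered = math.perm(n, p)  # n (n-1) ... (n-p+1)
    return len(sorted_tuples), with_repeats, distinct_ordered


n, p = 8, 4
total, repeats, distinct_ordered = count_tuples(n, p)

# Sorted tuples without repeats correspond to distinct ordered tuples modulo
# the p! permutations of their entries.
matches = (total - repeats) == distinct_ordered // math.factorial(p)

# The number of tuples with a repeated index grows like n**(p-1), not n**p.
ratios = [count_tuples(m, p)[1] / m ** (p - 1) for m in (8, 16, 32)]
```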
b) Now we look at the second expectation and integrate by parts with respect to the Gaussian random vector :
[TABLE]
Equation (47) can be further simplified thanks to the Nishimori identity (for the first and last equalities) and the identity (43) (for the second equality):
[TABLE]
Summing the latter over , we obtain:
[TABLE]
Summing the final expressions in (46) and (49) ends the proof of Proposition 3. ∎
6 Concentration of the overlap matrix
The proof of Theorem 3 requires that, up to an integral over a small volume of perturbations , the overlap matrix concentrates around its expectation . We chose to integrate the perturbation over the hypercube which is defined by (29) and depends on a sequence of decreasing numbers in . Remember that, for all , is the unique solution to and, for all , is the convex hull of the image . We also recall that, in Proposition 2, we introduced the inference problem (19) whose associated posterior distribution reads
[TABLE]
where :
[TABLE]
Let be the Gibbs brackets associated to the posterior distribution (50). Thanks to a change of variables (see the upper bound (32)), we showed that the following theorem is enough to prove Theorem 3.
Theorem 4 (Concentration of the overlap matrix around its expectation).
Assume has bounded support. There exists a positive constant depending only on , and such that
[TABLE]
The proof of Theorem 4 relies on the one of [2, Theorem 3]. In the latter reference, the concentration result is given for an integral over a hypercube . In our case, the integral on the left-hand side of (52) is over the convex hull of ’s image by the function . It is likely not a hypercube, even less one whose form is similar to . Therefore, we first show that the convex hulls have properties allowing us to carry out a proof similar to [2].
6.1 Properties of ’s convex hull
For , we will denote the symmetric matrix whose entries are:
[TABLE]
Lemma 1 (Properties of ’s convex hull).
For every :
(i) ;
(ii) there exists such that ;
(iii) for every pair and real number , is a symmetric positive definite matrix;
(iv) the first-order Fréchet derivative and the second-order Fréchet derivative satisfy
[TABLE]
Remark: Note that (i) does not depend on and , while (ii-iv) do not depend on .
Proof.
We start by proving (i). If then there exists such that , i.e.,
[TABLE]
Thus, . We have:
[TABLE]
The second inequality follows from Cauchy-Schwarz inequality and the first equality from the Nishimori identity. Hence the upper bound for all , which directly extends to by definition of a convex hull.
Now to prove (ii). If , note that (56) directly implies as, by the Nishimori identity and the Schur Product Theorem, is symmetric positive semidefinite for all . More generally, if , there exist , and such that and . It follows directly that where . As is convex, this concludes the proof of (ii).
We now show (ii) $\Rightarrow$ (iii). Let and pick such that . For all and , is a symmetric strictly diagonally dominant matrix with positive diagonal entries. Therefore, belongs to and .
Finally, we prove (iv). Let and denote its minimum eigenvalue. Applying [19, Theorem 1.1] (the first upper bound in (6) to be more precise), we obtain:
[TABLE]
Using (ii), pick such that . By [24, Corollary 2], the minimum eigenvalue of is greater than where
[TABLE]
Hence . Combining this lower bound with (58) ends the proof of (iv). ∎
6.2 Concentration of around its expectation
As in [2], the concentration of the overlap matrix around its expectation will follow from the concentration of the symmetric matrix whose entries are:
[TABLE]
This is well-defined as long as . To prove concentration results on , it will be useful to work with the free entropy where is the normalization factor of the posterior distribution (128). In Appendix C, we prove that this free entropy concentrates around its expectation when . In order to shorten notations, we define:
[TABLE]
Proposition 4 (Thermal fluctuations of ).
Assume has finite fourth-order moments. There exists a positive constant , depending only on , and , such that for all :
[TABLE]
Proof.
Fix . Note that , :
[TABLE]
Further differentiating, we obtain:
[TABLE]
Combining (64) and (75) for (see Lemma 2 following this proof), we obtain:
[TABLE]
We start with upper bounding the integral over of the second summand on the right-hand side of (65). Thanks to the Nishimori identity, we can see that . Indeed:
[TABLE]
Therefore, is symmetric positive semidefinite and the second term on the right-hand side of (65) satisfies:
[TABLE]
The last inequality follows from the upper bound (54) in Lemma 1. Therefore, keeping in mind that is included in the ball , there exists a positive constant depending only on , and such that:
[TABLE]
Now we turn to upper bounding $\int_{C(\mathcal{R}_{n,t})}\frac{dR}{n}\frac{\partial^{2}f_{n}}{\partial R_{\ell\ell'}^{2}}\Big|_{t,R}$. Define the closed convex set
[TABLE]
For every pair , we denote the symmetric matrix whose entries are given by:
[TABLE]
Because is a closed convex set, there exist two functions such that :
(i) ;
(ii) ;
(iii) .
Therefore,
[TABLE]
Note that :
[TABLE]
where the second and third inequalities follow from the identity (74) (see Lemma 2 following this proof) and the inequality (57), respectively. Combining both (71) and (72), we finally get
[TABLE]
where is a positive constant that depends only on , and . Integrating (65) over , making use of the upper bounds (68) and (73) and, finally, summing over end the proof. ∎
We relied on the following lemma for the proof of Proposition 4.
Lemma 2.
Assume has finite second-order moments. Let if and otherwise. Then, , :
[TABLE]
Proof.
Fix . By the definition (59) of , we have :
[TABLE]
Integrating by parts with respect to the Gaussian random vectors , , the last expectation on the right-hand side of (76) reads:
[TABLE]
The second and third equalities follow from (135) and the Nishimori identity, respectively. Plugging (77) in (76) and, then, making use of the identity end the proof of (74):
[TABLE]
We now turn to the proof of (75). We have:
[TABLE]
The second equality follows once again from a Gaussian integration by parts with respect to , . Note that for all :
[TABLE]
because of the identity
[TABLE]
Plugging (79) in (6.2), further simplifies:
[TABLE]
The second equality follows from the Nishimori identity. ∎
Proposition 5 (Quenched fluctuations of ).
Assume has bounded support. There exists a positive constant , depending only on , and , such that for all :
[TABLE]
Proof.
Fix . For all and , we have:
[TABLE]
By assumption there exists a nonnegative real number such that almost surely. Using the upper bound (55) in Lemma 1, the second term on the right-hand side of (83) can be upper bounded:
[TABLE]
From now on, we also fix as well as . The closed convex set is as defined by (69). Remember that, for every real number , is the matrix defined by (70), and that there exist two functions such that :
(i) ;
(ii) ;
(iii) .
Besides, by property (iii) in Lemma 1, for every the matrix is in . Thus, we can define for all :
[TABLE]
is convex on as it is twice differentiable with a nonnegative second derivative by (83) and (86). The same holds for . We will apply the following standard lemma to these two convex functions (see [5] for a proof):
Lemma 3 (An upper bound for differentiable convex functions).
Let and be two differentiable convex functions defined on an interval . Let and such that . Then
[TABLE]
where .
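A numerical sanity check of this kind of bound is sketched below. It assumes the lemma takes the form commonly used in interpolation proofs, namely $|G'(x) - g'(x)| \leq \delta^{-1} \sum_{u \in \{x-\delta, x, x+\delta\}} |G(u) - g(u)| + C_\delta^+(x) + C_\delta^-(x)$ with $C_\delta^-(x) = g'(x) - g'(x-\delta)$ and $C_\delta^+(x) = g'(x+\delta) - g'(x)$. The exact statement is in the elided display, so this form is an assumption, and the two convex test functions are arbitrary.

```python
import numpy as np


def G(x):
    """Convex test function: G'' = cosh(x) - 0.45 sin(3x) >= 0.55 > 0."""
    return np.cosh(x) + 0.05 * np.sin(3 * x)


def dG(x):
    return np.sinh(x) + 0.15 * np.cos(3 * x)


def g(x):
    """Convex reference function."""
    return np.cosh(x)


def dg(x):
    return np.sinh(x)


def bound(x, delta):
    """Right-hand side of the assumed form of the lemma."""
    gap = sum(abs(G(u) - g(u)) for u in (x - delta, x, x + delta)) / delta
    c_minus = dg(x) - dg(x - delta)  # nonnegative because g is convex
    c_plus = dg(x + delta) - dg(x)   # nonnegative because g is convex
    return gap + c_plus + c_minus


xs = np.linspace(-1.0, 1.0, 41)
deltas = (0.05, 0.2, 0.5)
holds = all(abs(dG(x) - dg(x)) <= bound(x, d) + 1e-12 for x in xs for d in deltas)
```

The bound trades a small $\delta$ (which shrinks the $C_\delta^\pm$ terms) against the factor $\delta^{-1}$ in front of the function gap, which is why $\delta$ is tuned as a function of $n$ in the proof.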
For all , we have:
[TABLE]
Let , which is nonnegative by convexity of . It follows from Lemma 3 and the two identities (90) and (91) that :
[TABLE]
Thanks to the inequality , this directly implies :
[TABLE]
The next step is to bound the integral of the three summands on the right-hand side of (92). Remember that . By property (i) in Lemma 1, we have:
[TABLE]
Besides, by independence of the Gaussian random vectors , $\mathbb{V}\mathrm{ar}\big(\sum_{j=1}^{n}\|\widetilde{Z}_{j}\|\big) = n\,\mathbb{V}\mathrm{ar}\,\|\widetilde{Z}_{1}\| \leq nK$. We conclude that there exists a positive constant depending only on , and such that :
[TABLE]
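The variance bound on the norm of a standard Gaussian vector can be made explicit. For $\widetilde{Z} \sim \mathcal{N}(0, I_K)$ the norm follows a chi distribution with $K$ degrees of freedom, so $\mathbb{E}\|\widetilde{Z}\|^2 = K$ and $\mathbb{E}\|\widetilde{Z}\| = \sqrt{2}\,\Gamma((K+1)/2)/\Gamma(K/2)$. The sketch below evaluates the resulting variance exactly and confirms it never exceeds $K$; the function name is hypothetical.

```python
import math


def var_norm_gaussian(K):
    """Exact variance of the Euclidean norm of a standard Gaussian vector in R^K."""
    mean_norm = math.sqrt(2.0) * math.gamma((K + 1) / 2) / math.gamma(K / 2)
    return K - mean_norm ** 2  # E||Z||^2 = K


variances = {K: var_norm_gaussian(K) for K in range(1, 11)}
```

The variance is in fact much smaller than $K$ (it stays below $1/2$ for every $K$), so the bound $nK$ is loose but sufficient for the argument.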
Note that . For all , we have:
[TABLE]
where is a positive constant depending only on , and . The second inequality in (95) follows from the upper bounds (see (72)), (93) and . Thus, for the second summand, we obtain :
[TABLE]
The last inequality is a simple application of the mean value theorem. We finally turn to the third summand. For every and pair , we have:
[TABLE]
This upper bound is uniform in and . Hence, by Theorem 5 of Appendix C, there exists a positive constant depending only on , and such that , :
[TABLE]
Using first (97) and then (93), we see that the third summand satisfies :
[TABLE]
We now choose . As , this choice satisfies . The combination of (92) with the three upper bounds (94), (96) and (98) shows the existence of a positive constant depending only on , and such that:
[TABLE]
One important fact following from our analysis is that can be chosen independently of both and . Therefore, for all , we have
[TABLE]
where denotes the volume of . As each of the sets is uniformly bounded in and , the theorem follows from summing (100) over . ∎
6.3 Concentration of around its expectation
We forthwith use the concentration results for , that is, Propositions 4 and 5, to prove Theorem 4. First an intermediary result on the thermal fluctuations of :
Proposition 6 (Concentration of the overlap matrix around its expectation).
Assume has finite fourth-order moments. There exists a positive constant depending only on , and such that
[TABLE]
Proof.
Fix . Note that :
[TABLE]
where is finite thanks to the assumption. Differentiating with respect to the identity (see Lemma 2), we obtain (see [2] for the detailed computation):
[TABLE]
By Proposition 4 and the inequality (68) combined with (75), there exists a positive constant depending only on , and such that:
[TABLE]
Combining both inequalities (103) and (105) with the Cauchy-Schwarz inequality, we obtain:
[TABLE]
with the volume of which is bounded uniformly in and by (i) of Lemma 1. This ends the proof of (101). The inequality (102) is proved in a similar way (see [2]). ∎
Finally, we conclude this section with the proof of Theorem 4.
Proof of Theorem 4.
To lighten notations we drop the subscripts of the Gibbs brackets . The concentration of can be linked to the concentration of by rewriting properly. Thanks to the identity (74), we have:
[TABLE]
Plugging ’s definition (59) in and integrating by parts with respect to the Gaussian random vectors , , we find:
[TABLE]
Note that :
[TABLE]
The second equality follows from (135), for the first expectation, and the Nishimori identity, for the second expectation. Plugging (109) in (108), we obtain:
[TABLE]
Subtracting (107) from (110), we obtain:
[TABLE]
Remember the matrices defined by (53). As , we have:
[TABLE]
Subtracting (112) from (113) yields:
[TABLE]
Plugging (114) in (111) gives the equality:
[TABLE]
On one hand,
[TABLE]
On the other hand, using exclusively Cauchy-Schwarz inequality, we have:
[TABLE]
Note that where the last inequality follows from (i) in Lemma 1. Therefore, $\big\|\frac{\partial\sqrt{R}}{\partial R_{\ell\ell'}}\sqrt{R}\big\| \leq \big\|\frac{\partial\sqrt{R}}{\partial R_{\ell\ell'}}\big\|\,\|\sqrt{R}\| \leq B/\sqrt{2s_{n}}$ (remember (54)). By the Cauchy-Schwarz inequality:
[TABLE]
Further upper bounding, we obtain
[TABLE]
as, by Jensen’s inequality and the Nishimori identity, we have:
[TABLE]
Putting together the equality (115), the lower bound (116) and the upper bounds (117), (118), (119), there exists a positive constant depending only on , and such that:
[TABLE]
To end the proof of Theorem 4, it remains to integrate both sides of (120) over and apply Propositions 4, 5, 6. ∎
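The Nishimori identity used repeatedly in the proof above can be checked numerically in the simplest possible setting. The following minimal sketch assumes a scalar Gaussian channel $Y=\sqrt{r}\,X+Z$ with a Rademacher prior, which is an illustrative special case and not the tensor model of this paper; the helper names `gauss_expect` and `nishimori_pair` are hypothetical. In that case the posterior mean is $\langle x\rangle=\tanh(\sqrt{r}\,Y)$ and the identity reads $\mathbb{E}[X\langle x\rangle]=\mathbb{E}[\langle x\rangle^{2}]$.

```python
import math


def gauss_expect(f, half_width=12.0, n=4801):
    """Approximate E[f(Z)] for Z ~ N(0,1) by a Riemann sum on a uniform grid."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        z = -half_width + k * h
        total += f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def nishimori_pair(r):
    """Return (E[X <x>], E[<x>^2]) for X uniform on {-1,+1}, Y = sqrt(r) X + Z.

    Assumption: by the X -> -X symmetry we may condition on X = +1,
    so that sqrt(r) * Y = r + sqrt(r) * Z.
    """
    s = math.sqrt(r)
    lhs = gauss_expect(lambda z: math.tanh(r + s * z))       # E[X <x>]
    rhs = gauss_expect(lambda z: math.tanh(r + s * z) ** 2)  # E[<x>^2]
    return lhs, rhs


pairs = {r: nishimori_pair(r) for r in (0.5, 1.0, 2.0)}
for r, (lhs, rhs) in pairs.items():
    print(f"r={r}: E[X<x>]={lhs:.10f}  E[<x>^2]={rhs:.10f}")
```

The two columns should agree up to quadrature error for every signal-to-noise ratio `r`.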
7 Conclusion and discussion for odd-order tensors
In this work, we have proved the conjectured replica formula for even-order symmetric tensors. It would be desirable to extend both Theorem 2 and Theorem 3 to the odd-order case. For the case we refer to [18]. For , this is still an open problem and we now briefly discuss where our proofs fall short in this case.
Ideally, to extend Theorem 2 to an odd order , we would show that the integral on the r.h.s. of (16), i.e., with , is nonnegative. However, when is odd, is not nonnegative on its whole domain of definition. To be able to say something about the integral, we have to take a Gibbs average of before applying . This requires rewriting the integral as follows:
[TABLE]
When , both and are nonnegative real numbers. The nonnegativity of for then ensures that the second integral on the r.h.s. of (121) is nonnegative and, by introducing a small perturbation on which we integrate, we can cancel the first integral as was done in the proof of Theorem 3. This is how the lower bound is proved in [18]. When , we only know that and are symmetric positive semidefinite matrices: a priori nothing can be said on the sign of their individual entries. The problem remains if we further rewrite:
[TABLE]
While and are positive semidefinite, nothing can be said about the sign of their individual entries. Most probably, one should consider the full sum over to determine the sign of the second integral on the r.h.s. of (122). Indeed, using , we can show that is nonnegative if or . As far as we can tell, it is not clear why such a partial ordering between and (which itself depends on ) would hold.
Regarding Theorem 3, the whole proof would directly apply to odd if we could show that the divergence (23) is nonnegative. However, this is more difficult than for even. Indeed, while the ’s are still , this is not necessarily the case for when is odd.
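The sign obstruction discussed in this section can be seen on a toy example. The sketch below is only an illustration under stated assumptions: the $2\times 2$ matrix `Q` is hypothetical (it is not an overlap matrix from the paper), and entrywise (Hadamard) powers are used as a stand-in for the order-dependent quantities above. It shows a positive semidefinite matrix with a negative entry: its even entrywise power is entrywise nonnegative, whereas its odd entrywise power keeps the negative entry.

```python
import math


def eig_sym2(m):
    """Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]] in closed form."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean = 0.5 * (a + d)
    rad = math.hypot(0.5 * (a - d), b)
    return mean - rad, mean + rad


def hadamard_power(m, p):
    """Entrywise p-th power of a matrix given as a list of rows."""
    return [[x ** p for x in row] for row in m]


# Hypothetical positive semidefinite matrix with a negative off-diagonal entry.
Q = [[1.0, -0.5], [-0.5, 1.0]]
Q_even = hadamard_power(Q, 2)  # entrywise square
Q_odd = hadamard_power(Q, 3)   # entrywise cube

print("eigenvalues of Q:", eig_sym2(Q))
print("entrywise square:", Q_even)
print("entrywise cube:  ", Q_odd)
```

Both entrywise powers remain positive semidefinite (Schur product theorem), but only the even one has nonnegative entries, which is the kind of entrywise information that is missing for odd orders.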
Funding
This work was supported by the Swiss National Science Foundation [200021E-175541 to C. L.].
Appendix A Properties of the function
Lemma 4.
Let and . The function , defined as
[TABLE]
is Lipschitz continuous with Lipschitz constant and convex.
Proof.
Consider the inference problem in which one observes the -dimensional vector , where is known, and one wants to recover . The posterior of X given Y is
[TABLE]
where $\mathcal{Z}_{R}(Y)=\int dP_{X}(x)\exp\big(Y^{T}\sqrt{R}\,x-\frac{1}{2}x^{T}Rx\big)$. We denote the Gibbs brackets associated to the latter posterior distribution. Clearly, .
Now fix . We will prove that the function is convex, thus proving that is convex on . The convexity on the whole cone will then follow from the continuity of (which is clear from its definition). is twice differentiable. Its derivative reads:
[TABLE]
To get the second equality, first integrate by parts with respect to the Gaussian random variables , . Then make use of the identity
[TABLE]
which follows from . Differentiating (125) further, we find (the subscript of is omitted):
[TABLE]
To get the second equality, we once again used Gaussian integration by parts and the identity (126). The second-to-last equality follows from the Nishimori identity:
[TABLE]
The convexity of now follows directly from the non-negativity of on .
To prove the Lipschitz continuity of , note that the derivative of satisfies:
[TABLE]
The mean value theorem then directly implies . The last inequality in (127) follows from the Cauchy-Schwarz inequality, Jensen’s inequality and the Nishimori identity:
[TABLE]
∎
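The two properties of Lemma 4 can be illustrated numerically in the scalar case. The following is a minimal sketch under explicit assumptions: dimension one, a Rademacher prior, and the free entropy written as $\psi(r)=-\frac{r}{2}+\mathbb{E}\ln\cosh(r+\sqrt{r}\,Z)$, which is what $\mathbb{E}\ln\mathcal{Z}_{R}(Y)$ reduces to for this prior. The names `gauss_expect`, `log_cosh` and `psi` are hypothetical. In this special case the derivative equals $\frac{1}{2}\mathbb{E}\langle x\rangle^{2}\in[0,\frac{1}{2}]$, so finite-difference slopes should lie in that interval and be nondecreasing.

```python
import math


def gauss_expect(f, half_width=12.0, n=4801):
    """Approximate E[f(Z)] for Z ~ N(0,1) by a Riemann sum on a uniform grid."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        z = -half_width + k * h
        total += f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def log_cosh(a):
    """Numerically stable ln(cosh(a))."""
    a = abs(a)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def psi(r):
    """Scalar free entropy for a Rademacher prior (illustrative assumption)."""
    s = math.sqrt(r)
    return -0.5 * r + gauss_expect(lambda z: log_cosh(r + s * z))


grid = [0.1 * k for k in range(31)]  # r in [0, 3]
values = [psi(r) for r in grid]
slopes = [(values[k + 1] - values[k]) / (grid[k + 1] - grid[k]) for k in range(30)]
second_diffs = [slopes[k + 1] - slopes[k] for k in range(29)]

print("min slope:", min(slopes), "max slope:", max(slopes))
print("min second difference:", min(second_diffs))
```

Slopes bounded by $\frac{1}{2}\mathbb{E}X^{2}=\frac{1}{2}$ reflect the Lipschitz property for this prior, and nonnegative second differences reflect convexity.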
Appendix B Divergence of the function
In Proposition 2 we introduced the inference problem (19). The associated posterior distribution is
[TABLE]
where and :
[TABLE]
Let be the Gibbs brackets associated to the posterior distribution (128). In this appendix we prove a formula for the divergence of the function
[TABLE]
Lemma 5 (Divergence of ).
Let if and otherwise. :
[TABLE]
Then, the divergence of is
[TABLE]
with
[TABLE]
Proof.
To lighten notations, we omit the subscripts of the Gibbs brackets . Let . The partial derivative of $R\mapsto\big(G_{n}(t,R)\big)_{\ell\ell'}$ with respect to reads:
[TABLE]
with
[TABLE]
Once the identity (134) has been plugged into the right-hand side of (133), two expectations involving the Gaussian random vectors , , appear. An integration by parts with respect to the Gaussian random variables , , gives:
[TABLE]
[TABLE]
In both chains of equalities, the last one follows from an identity similar to (43), i.e.,
[TABLE]
Making use of the two identities yielded by the integration by parts, as well as (135), we find:
[TABLE]
Thanks to the Nishimori identity, we have
[TABLE]
and (136) further simplifies (the last equality uses the cyclic property of the trace):
[TABLE]
Now consider the case . All the entries of are zero except for the entries and , which are both equal to one. Equation (137) then reads:
[TABLE]
Combining (133) and (138) gives the identity (131) when . The case is obtained in a similar way, except that now the entries of are zero except for the entry , which is equal to one.
We can now prove the identity for the divergence of . This divergence, denoted , satisfies:
[TABLE]
In the last equality of (139), replacing the summands by their formula (131) yields:
[TABLE]
Remember that , and therefore , is symmetric. Using that the trace is invariant by transposition and cyclic permutation, the two traces in (140) read:
[TABLE]
Clearly, $\mathbb{E}\,\big\langle(\mathbf{Q}+\mathbf{Q}^{T})\circ\big(\mathbf{Q}+\mathbf{Q}^{T}-\langle\mathbf{Q}+\mathbf{Q}^{T}\rangle\big)\big\rangle=\mathbb{E}\,\big\langle\big(\mathbf{Q}+\mathbf{Q}^{T}-\langle\mathbf{Q}+\mathbf{Q}^{T}\rangle\big)^{\circ 2}\big\rangle$. Similarly, we have
[TABLE]
For this last equality, we could complete the square thanks to the following term being zero:
[TABLE]
Plugging these identities back in (140), we finally obtain:
[TABLE]
where we recognize that the second expectation is equal to (see definition (132)). ∎
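The Gaussian integration by parts used several times in this proof is the identity $\mathbb{E}[Z f(Z)]=\mathbb{E}[f'(Z)]$ for $Z\sim\mathcal{N}(0,1)$ and a smooth $f$ with moderate growth. The sketch below checks it numerically for one test function; the choice $f(z)=\tanh(a+bz)$, the constants `a`, `b` and the helper `gauss_expect` are assumptions made for illustration only and do not come from the paper.

```python
import math


def gauss_expect(f, half_width=12.0, n=4801):
    """Approximate E[f(Z)] for Z ~ N(0,1) by a Riemann sum on a uniform grid."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        z = -half_width + k * h
        total += f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


a, b = 0.3, 0.7  # arbitrary illustrative constants


def f(z):
    return math.tanh(a + b * z)


def f_prime(z):
    # d/dz tanh(a + b z) = b * (1 - tanh(a + b z)^2)
    return b * (1.0 - math.tanh(a + b * z) ** 2)


ibp_lhs = gauss_expect(lambda z: z * f(z))  # E[Z f(Z)]
ibp_rhs = gauss_expect(f_prime)             # E[f'(Z)]
print("E[Z f(Z)] =", ibp_lhs)
print("E[f'(Z)]  =", ibp_rhs)
```

In the proofs above the same identity is applied coordinate by coordinate to the Gaussian noise, with $f$ given by a Gibbs average.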
Appendix C Concentration of the free entropy
Consider the inference problem (19). The associated Hamiltonian reads
[TABLE]
In this section we show that the free entropy
[TABLE]
concentrates around its expectation. We will sometimes write , omitting the arguments, to shorten notations.
Theorem 5 (Concentration of the free entropy).
Assume has finite th order moments. There exists a positive constant depending only on , , and such that
[TABLE]
Proof.
To lighten notations we drop the subscripts of the Gibbs brackets . First, we show that the free entropy concentrates on its conditional expectation given the Gaussian noise , . Thus, is seen as a function of only and we work conditionally on . Let be i.i.d. samples from , independent of . For all , we define
[TABLE]
where , are obtained from , by replacing by . We can consider an inference problem similar to (19) for which the observations are , . Then the Gibbs brackets associated to the posterior distribution are
[TABLE]
By the Efron-Stein inequality (see [8, Theorem 3.1]), we have:
[TABLE]
Fix . By Jensen’s inequality, note that
[TABLE]
Define and $\forall i\in\mathcal{I}_{j}:c(i)=\big|\big\{a\in\{1,\dots,p\}:i_{a}=j\big\}\big|$. The quantity between the Gibbs brackets in (145) reads:
[TABLE]
Using Jensen’s inequality, we further obtain:
[TABLE]
We now bound each summand on the right-hand side of (147) separately. For all and :
[TABLE]
The first inequality follows from the Cauchy-Schwarz inequality, the second one from Jensen’s inequality, and the first equality from the Nishimori identity. The final bound is finite given that has finite th order moments. Hence, there exists a positive constant depending only on , and such that the first term on the right-hand side of (147) is bounded by (as ). Regarding the second term on the right-hand side of (147), we easily get:
[TABLE]
We conclude that there exists a positive constant depending only on , , and such that
[TABLE]
A similar bound holds when the Gibbs brackets are replaced by . Finally, combining (144), (145) and (148), we obtain the desired upper bound:
[TABLE]
where the positive constant is not necessarily the same as before but still only depends on , , and .
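The Efron-Stein inequality used in this first step states that, for independent $X_{1},\dots,X_{n}$ and any square-integrable $f$, $\mathrm{Var}\,f(X)\leq\frac{1}{2}\sum_{i}\mathbb{E}\big[(f(X)-f(X^{(i)}))^{2}\big]$, where $X^{(i)}$ is obtained by resampling the $i$-th coordinate. The following minimal sketch verifies it by exact enumeration on a toy example; the function `toy_log_sum_exp` and the choice of uniform $\pm 1$ coordinates are hypothetical and unrelated to the free entropy of the paper.

```python
import itertools
import math


def efron_stein_check(f, n):
    """Exact variance and Efron-Stein bound for i.i.d. uniform +-1 coordinates."""
    points = list(itertools.product((-1, 1), repeat=n))
    prob = 1.0 / len(points)
    mean = sum(f(x) for x in points) * prob
    var = sum((f(x) - mean) ** 2 for x in points) * prob

    bound = 0.0
    for i in range(n):
        for x in points:
            for new in (-1, 1):  # independent copy of the i-th coordinate
                y = x[:i] + (new,) + x[i + 1:]
                # factor 0.5 from the inequality, prob * 0.5 = P(x) * P(new)
                bound += 0.5 * prob * 0.5 * (f(x) - f(y)) ** 2
    return var, bound


def toy_log_sum_exp(x):
    """Hypothetical nonlinear test function of the coordinates."""
    return math.log(sum(math.exp(0.5 * (k + 1) * xk) for k, xk in enumerate(x)))


var, bound = efron_stein_check(toy_log_sum_exp, n=5)
print("variance:", var)
print("Efron-Stein bound:", bound)
```

In the proof above, the role of `f` is played by the free entropy viewed as a function of the signal entries, and each squared difference is controlled by the bound (148).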
The second – and final – step is to show that the conditional expectation of the free entropy given concentrates on its expectation. Let . By the Gaussian-Poincaré inequality (see [8, Theorem 3.20]), we have:
[TABLE]
The squared norm of the gradient of reads . Each of these partial derivatives takes the form $\partial g=-n^{-1}\big\langle\partial\mathcal{H}_{t,R}\big\rangle$. More precisely:
[TABLE]
On one hand, we have
[TABLE]
where the first two inequalities follow from Jensen’s inequality and the equality from the Nishimori identity. On the other hand, we have
[TABLE]
where the first inequality follows from Jensen’s inequality and the equality from the Nishimori identity. Both upper bounds in (151) and (152) take the form with a positive constant depending only on , , and (remember that ). Plugging (151) and (152) in (150), we conclude that
[TABLE]
where depends only on , , and . Combining (149) and (153) ends the proof of (143). ∎
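The Gaussian-Poincaré inequality used in the second step states that $\mathrm{Var}\,g(Z)\leq\mathbb{E}\|\nabla g(Z)\|^{2}$ for a standard Gaussian vector $Z$ and a smooth $g$. The sketch below checks it in one dimension for a test function; the choice $g(z)=\ln\cosh(a+bz)$, the constants `a`, `b` and the helpers `gauss_expect` and `log_cosh` are assumptions made for illustration and are not taken from the paper.

```python
import math


def gauss_expect(f, half_width=12.0, n=4801):
    """Approximate E[f(Z)] for Z ~ N(0,1) by a Riemann sum on a uniform grid."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        z = -half_width + k * h
        total += f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def log_cosh(u):
    """Numerically stable ln(cosh(u))."""
    u = abs(u)
    return u + math.log1p(math.exp(-2.0 * u)) - math.log(2.0)


a, b = 0.3, 0.7  # arbitrary illustrative constants


def g(z):
    return log_cosh(a + b * z)


def g_prime(z):
    # d/dz ln cosh(a + b z) = b * tanh(a + b z)
    return b * math.tanh(a + b * z)


mean_g = gauss_expect(g)
var_g = gauss_expect(lambda z: (g(z) - mean_g) ** 2)
grad_bound = gauss_expect(lambda z: g_prime(z) ** 2)
print("Var g(Z)     =", var_g)
print("E[g'(Z)^2]   =", grad_bound)
```

In the proof above the gradient is taken with respect to all Gaussian noise variables, and the bounds (151) and (152) control each squared partial derivative.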
References
- [1] Alaoui, A. E. & Krzakala, F. (2018) Estimation in the Spiked Wigner Model: A Short Proof of the Replica Formula. In 2018 IEEE International Symposium on Information Theory (ISIT), pp. 1874–1878.
- [2] Barbier, J. (2020) Overlap matrix concentration in optimal Bayesian inference. Inf. Inference, https://doi.org/10.1093/imaiai/iaaa008.
- [3] Barbier, J., Dia, M., Macris, N., Krzakala, F., Lesieur, T. & Zdeborová, L. (2016) Mutual information for symmetric rank-one matrix estimation: a proof of the replica formula. In Advances in Neural Information Processing Systems 29, NIPS 2016, pp. 424–432, Red Hook, NY, USA. Curran Associates.
- [4] Barbier, J., Krzakala, F., Macris, N., Miolane, L. & Zdeborová, L. (2019) Optimal errors and phase transitions in high-dimensional generalized linear models. Proc. Natl. Acad. Sci. USA, 116(12), 5451–5460.
- [5] Barbier, J. & Macris, N. (2019a) The adaptive interpolation method: a simple scheme to prove replica formulas in Bayesian inference. Probab. Theory Related Fields, 174(3), 1133–1185.
- [6] Barbier, J. & Macris, N. (2019b) The adaptive interpolation method for proving replica formulas. Applications to the Curie–Weiss and Wigner spike models. J. Phys. A, 52(29), 294002.
- [7] Barbier, J., Macris, N. & Miolane, L. (2017) The Layered Structure of Tensor Estimation and its Mutual Information. arXiv:1709.10368 [cs.IT].
- [8] Boucheron, S., Lugosi, G. & Massart, P. (2013) Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford Univ. Press, London, U.K.
