Mutual information for low-rank even-order symmetric tensor estimation

Clément Luneau; Jean Barbier; Nicolas Macris

arXiv:1904.04565·cs.IT·September 24, 2020

TL;DR

This paper derives a variational formula for the asymptotic mutual information in finite-rank symmetric tensor factorization of even order, extending adaptive interpolation methods to more complex tensor models.

Contribution

It introduces a novel extension of the adaptive interpolation method for finite-rank, even-order symmetric tensors, advancing theoretical understanding of tensor estimation.

Findings

01 Derived a single-letter variational expression for mutual information

02 Extended adaptive interpolation to finite-rank, even-order tensors

03 Identified limitations for odd-order tensor cases

Abstract

We consider a statistical model for finite-rank symmetric tensor factorization and prove a single-letter variational expression for its asymptotic mutual information when the tensor is of even order. The proof applies the adaptive interpolation method originally invented for rank-one factorization. Here we show how to extend the adaptive interpolation to finite-rank and even-order tensors. This requires new nontrivial ideas with respect to the current analysis in the literature. We also underline where the proof falls short when dealing with odd-order tensors.

Equations

Y_{i}=\sqrt{\frac{\lambda(p-1)!}{n^{p-1}}}\,\sum_{k=1}^{K}X_{i_{1}k}X_{i_{2}k}\cdots X_{i_{p}k}+Z_{i}\,,

{\mathbf{Y}}\coloneqq\sqrt{\frac{\lambda(p-1)!}{n^{p-1}}}\,\sum_{k=1}^{K}X_{\cdot,k}^{\otimes p}+{\mathbf{Z}}\,.

\psi:S\in\mathcal{S}_{K}^{+}\mapsto\mathbb{E}\ln\int dP_{X}(x)\,\exp\Big(X^{T}Sx+\widetilde{Z}^{T}\sqrt{S}x-\frac{1}{2}x^{T}Sx\Big)\;;

\phi_{p,\lambda}:S\in\mathcal{S}_{K}^{+}\mapsto\psi\big(\lambda S^{\circ(p-1)}\big)-\frac{\lambda(p-1)}{2p}\sum_{\ell,\ell^{\prime}=1}^{K}\big(S^{\circ p}\big)_{\ell\ell^{\prime}}\;,

\lim_{n\to+\infty}\frac{1}{n}I({\mathbf{X}};{\mathbf{Y}})=\frac{\lambda}{2p}\sum_{\ell,\ell^{\prime}=1}^{K}\big(\Sigma_{X}^{\circ p}\big)_{\ell\ell^{\prime}}-\sup_{S\in\mathcal{S}_{K}^{+}}\phi_{p,\lambda}(S)\,.

\mathcal{H}_{n}({\mathbf{x}};{\mathbf{Y}})\coloneqq\sum_{i\in\mathcal{I}}\frac{(p-1)!}{2n^{p-1}}\bigg(\sum_{\ell=1}^{K}\prod_{a=1}^{p}x_{i_{a}\ell}\bigg)^{2}-\sum_{i\in\mathcal{I}}\sqrt{\frac{(p-1)!}{n^{p-1}}}\,Y_{i_{1}\dots i_{p}}\sum_{\ell=1}^{K}\prod_{a=1}^{p}x_{i_{a}\ell}\,.

dP({\mathbf{x}}\mid{\mathbf{Y}})\coloneqq\frac{1}{\mathcal{Z}_{n}({\mathbf{Y}})}\,e^{-\mathcal{H}_{n}({\mathbf{x}};{\mathbf{Y}})}\prod_{j=1}^{n}dP_{X}(x_{j})\,,

f_{n}\coloneqq\frac{1}{n}\mathbb{E}\ln\mathcal{Z}_{n}({\mathbf{Y}})\,,

\frac{1}{n}I({\mathbf{X}};{\mathbf{Y}})=\frac{1}{2p}\sum_{\ell,\ell^{\prime}=1}^{K}\big(\Sigma_{X}^{\circ p}\big)_{\ell\ell^{\prime}}-f_{n}+\mathcal{O}(n^{-1})\,.

\liminf_{n\to+\infty}f_{n}\geq\sup_{S\in\mathcal{S}_{K}^{+}}\phi_{p}(S)\,.

\limsup_{n\to+\infty}f_{n}\leq\sup_{S\in\mathcal{S}_{K}^{+}}\phi_{p}(S)\,.

\begin{cases}Y_{i}^{(t)}=\sqrt{\frac{(1-t)(p-1)!}{n^{p-1}}}\,\sum_{k=1}^{K}\prod_{a=1}^{p}X_{i_{a}k}+Z_{i}\,,&i\in\mathcal{I}\,;\\ \widetilde{Y}_{j}^{(t,\epsilon)}=\sqrt{R(t,\epsilon)}\,X_{j}+\widetilde{Z}_{j}\,,&j\in[n]\,.\end{cases}

\mathcal{H}_{t,\epsilon}({\mathbf{x}};{\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,\epsilon)})\coloneqq\sum_{i\in\mathcal{I}}\frac{(1-t)(p-1)!}{2n^{p-1}}\bigg(\sum_{k=1}^{K}\prod_{a=1}^{p}x_{i_{a}k}\bigg)^{2}-\sum_{i\in\mathcal{I}}\sqrt{\frac{(1-t)(p-1)!}{n^{p-1}}}\,Y_{i}^{(t)}\sum_{k=1}^{K}\prod_{a=1}^{p}x_{i_{a}k}+\sum_{j=1}^{n}\Big(\frac{1}{2}x_{j}^{T}R(t,\epsilon)x_{j}-\big(\widetilde{Y}_{j}^{(t,\epsilon)}\big)^{T}\sqrt{R(t,\epsilon)}\,x_{j}\Big)\,.

f_{n}(t,\epsilon)\coloneqq\frac{1}{n}\mathbb{E}\ln\mathcal{Z}_{t,\epsilon}\big({\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,\epsilon)}\big)

\begin{cases}f_{n}(0,\epsilon)=f_{n}+\mathcal{O}(\|\epsilon\|)\,;\\ f_{n}(1,\epsilon)=\psi(R(1,\epsilon))\,.\end{cases}

\langle g({\mathbf{x}})\rangle_{t,\epsilon}=\int g({\mathbf{x}})\,\frac{e^{-\mathcal{H}_{t,\epsilon}({\mathbf{x}};{\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,\epsilon)})}}{\mathcal{Z}_{t,\epsilon}({\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,\epsilon)})}\prod_{j=1}^{n}dP_{X}(x_{j})\,.

f_{n}=\mathcal{O}(\|\epsilon\|)+\mathcal{O}(n^{-1})+\psi(R(1,\epsilon))+\frac{1}{2p}\int_{0}^{1}dt\sum_{\ell,\ell^{\prime}=1}^{K}\Big(\mathbb{E}\big\langle(Q_{\ell\ell^{\prime}})^{p}\big\rangle_{t,\epsilon}-p\,(R^{\prime}(t,\epsilon))_{\ell\ell^{\prime}}\,\mathbb{E}\langle Q_{\ell\ell^{\prime}}\rangle_{t,\epsilon}\Big)\,,

f_{n}=\mathcal{O}(n^{-1})+\phi_{p}(S)+\frac{1}{2p}\int_{0}^{1}dt\sum_{\ell,\ell^{\prime}=1}^{K}\mathbb{E}\big\langle h_{p}(S_{\ell\ell^{\prime}},Q_{\ell\ell^{\prime}})\big\rangle_{t,0}\;,

\forall(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}:\;(R^{\prime}(t,\epsilon))_{\ell\ell^{\prime}}=\mathbb{E}[\langle Q_{\ell\ell^{\prime}}\rangle_{t,\epsilon}]^{p-1}\,.

\forall t\in[0,1]:\;\frac{dR(t)}{dt}=\mathbb{E}[\langle{\mathbf{Q}}\rangle_{t,\epsilon}]^{\circ(p-1)}\,,\quad R(0)=\epsilon\,.

\forall\epsilon\in\mathcal{S}_{K}^{++}:\;\det J_{R(t,\cdot)}(\epsilon)\geq 1\,.

\begin{cases}Y_{i}^{(t)}=\sqrt{\frac{(1-t)(p-1)!}{n^{p-1}}}\,\sum_{k=1}^{K}\prod_{a=1}^{p}X_{i_{a}k}+Z_{i}\,,&i\in\mathcal{I}\,;\\ \widetilde{Y}_{j}^{(t,R)}=\sqrt{R}\,X_{j}+\widetilde{Z}_{j}\,,&j\in[n]\,.\end{cases}

G_{n}:\begin{array}{ccl}[0,1]\times\mathcal{S}_{K}^{+}&\to&\mathcal{S}_{K}^{+}\\ (t,R)&\mapsto&\mathbb{E}[\langle{\mathbf{Q}}\rangle_{t,R}]^{\circ(p-1)}\end{array}\,.

\forall t\in[0,1]:\;\frac{dR(t)}{dt}=G_{n}(t,R(t))\,,\quad R(0)=\epsilon\in\mathcal{S}_{K}^{+}\,.

\det J_{R(t,\cdot)}(\epsilon)=\exp\int_{0}^{t}ds\sum_{1\leq\ell\leq\ell^{\prime}\leq K}\frac{\partial(G_{n})_{\ell\ell^{\prime}}}{\partial R_{\ell\ell^{\prime}}}\bigg|_{s,R(s,\epsilon)}\,.

\sum_{1\leq\ell\leq\ell^{\prime}\leq K}\frac{\partial(G_{n})_{\ell\ell^{\prime}}}{\partial R_{\ell\ell^{\prime}}}\bigg|_{s,R(s,\epsilon)}

\sum_{\ell\leq\ell^{\prime}}\frac{\partial(G_{n})_{\ell\ell^{\prime}}}{\partial R_{\ell\ell^{\prime}}}\bigg|_{t,R}=n(p-1)\sum_{\ell,\ell^{\prime}}\mathbb{E}[\langle Q_{\ell\ell^{\prime}}\rangle]^{p-2}\,\Delta_{\ell\ell^{\prime}}\;,

\Delta_{\ell\ell^{\prime}}\coloneqq\mathbb{E}\bigg[\bigg\langle\bigg(\frac{Q_{\ell\ell^{\prime}}+Q_{\ell^{\prime}\ell}}{2}-\bigg\langle\frac{Q_{\ell\ell^{\prime}}+Q_{\ell^{\prime}\ell}}{2}\bigg\rangle\bigg)^{2}\,\bigg\rangle\bigg]-\mathbb{E}\bigg[\bigg(\bigg\langle\frac{Q_{\ell\ell^{\prime}}+Q_{\ell^{\prime}\ell}}{2}\bigg\rangle-\frac{(\langle{\mathbf{x}}\rangle^{T}\langle{\mathbf{x}}\rangle)_{\ell\ell^{\prime}}}{n}\bigg)^{2}\,\bigg].

\mathbb{E}\,\bigg(\bigg\langle\frac{Q_{\ell\ell^{\prime}}+Q_{\ell^{\prime}\ell}}{2}\bigg\rangle-\frac{(\langle{\mathbf{x}}\rangle^{\intercal}\langle{\mathbf{x}}\rangle)_{\ell\ell^{\prime}}}{n}\bigg)^{2}

\leq\mathbb{E}\,\bigg\langle\bigg(\frac{({\mathbf{x}}^{\intercal}{\mathbf{X}}+{\mathbf{X}}^{\intercal}{\mathbf{x}})_{\ell\ell^{\prime}}}{2n}-\frac{(\langle{\mathbf{x}}\rangle^{\intercal}{\mathbf{x}}+{\mathbf{x}}^{\intercal}\langle{\mathbf{x}}\rangle)_{\ell\ell^{\prime}}}{2n}\bigg)^{2}\,\bigg\rangle

=\mathbb{E}\,\bigg\langle\bigg(\frac{({\mathbf{X}}^{\intercal}{\mathbf{x}}+{\mathbf{x}}^{\intercal}{\mathbf{X}})_{\ell\ell^{\prime}}}{2n}-\frac{(\langle{\mathbf{x}}\rangle^{\intercal}{\mathbf{X}}+{\mathbf{X}}^{\intercal}\langle{\mathbf{x}}\rangle)_{\ell\ell^{\prime}}}{2n}\bigg)^{2}\,\bigg\rangle

Full text

Mutual information for low-rank even-order symmetric tensor estimation

Clément Luneau

Communication Theory Laboratory, École Polytechnique Fédérale de Lausanne, Switzerland

Jean Barbier

The Abdus Salam International Center for Theoretical Physics, Trieste, Italy.

Nicolas Macris

Communication Theory Laboratory, École Polytechnique Fédérale de Lausanne, Switzerland

1 Introduction

There exist well-known unsupervised algorithms to discover structure in a 2D dataset, e.g., singular value decomposition (SVD), principal component analysis (PCA) and other spectral methods [14]. Tensors naturally handle multidimensional data and their use becomes more and more beneficial with the emergence of big data, a strong incentive to go beyond the flat matrix world. Tensor decompositions come with some advantages with respect to matrices, and have numerous applications in signal processing and machine learning, e.g., data compression, data visualization, learning probabilistic latent variables models, etc. [9, 21]. The canonical polyadic decomposition (CPD), also known as tensor rank decomposition or tensor factorization, is the most familiar one and represents a tensor as a minimum-length linear combination of rank-one tensors. This minimum-length defines the tensor rank. If instead the number $K$ of rank-one tensors forming the linear combination is not minimal, we talk of a $K$ -term decomposition. Decompositions of tensors are also called tensor factorizations and this is the terminology we adopt in the rest of the paper.

One approach to explore computational and/or statistical limits of tensor factorization is to consider a statistical model, as done in [22]. The model is the following: draw $K$ column vectors, evaluate for each of them their $p$ th tensor power and sum those $K$ symmetric order- $p$ tensors (this sum is exactly a $K$ -term polyadic decomposition). Tensor factorization can then be studied as an inference problem, namely, to estimate the initial $K$ vectors from noisy observations of the tensor and to determine information theoretic limits for this task. To do so, we focus on proving formulas for the asymptotic mutual information between the noisy observed tensor and the original $K$ vectors. Such formulas were first rigorously derived for $p=2$ and $K=1$ , i.e., rank-one matrix factorization: see [15] for the case with a binary input vector, [10] for the restricted case in which no discontinuous phase transition occurs, [16] for a single-sided bound and, finally, [3] for the fully general case. The proof in [3] combines interpolation techniques with spatial coupling and an analysis of the Approximate Message-Passing (AMP) algorithm. Later, and still for $p=2$ , [17] went beyond rank-one by using a rigorous version of the cavity method. Reference [18] applied the heuristic replica method to conjecture a formula for any $p$ and finite $K$ , which is then proved for $p\geq 2$ and $K=1$ . Reference [18] also details the AMP algorithm for tensor factorization and shows how the single-letter variational expression for the mutual information allows one to give guarantees on AMP’s performance. Afterwards, [5, 6] introduced the adaptive interpolation proof technique which they applied to the case $p\geq 2$ , $K=1$ . Other proofs based on interpolations recently appeared, see [1] ( $p=2$ , $K=1$ ) and [20] ( $p\geq 2$ , $K=1$ ).

In this work, we prove the conjectured replica formula for any finite rank $K$ and any even order $p$ using the adaptive interpolation method. We also underline what is missing to extend the proof to odd orders. The adaptive interpolation method was introduced in [5, 6] as a powerful extension to the Guerra-Toninelli interpolation scheme [11]. Since then, it has been applied to many other inference problems in order to prove formulas for the mutual information, e.g., [7, 4]. While our proof outline is similar to [6], there are two important new ingredients. First, to establish a tight lower bound on the asymptotic mutual information, we have to prove the regularity of a change of variable given by the solutions to an ordinary differential equation. This is nontrivial when the rank becomes greater than one. Second, the same bound requires one to prove the concentration of the overlap (a quantity that fully characterizes the system in the high-dimensional limit). When the rank is greater than one, this overlap is a matrix and a recent result [2] on the concentration of overlap matrices can be adapted to obtain the required concentration in our interpolation scheme.

The paper is organized as follows. In Section 2 we set up our precise statistical model and state the main theorems giving the single-letter variational expression for the asymptotic mutual information. The adaptive interpolation method is formulated in Section 3 and the basic upper and lower bounds on the asymptotic mutual information are proved in Section 4. Sections 5 and 6 contain the new and essential results which allow one to go from rank-one to finite-rank tensors. Finally, the difficulties encountered for odd-order tensors are discussed in the last section, Section 7. The reader will find in Appendix B a technical calculation which is new and crucial to our proof, while Appendices A and C present more classical material.

2 Low-rank symmetric tensor factorization

We study the following statistical model. Let $n$ be a positive integer. $X_{1},\dots,X_{n}$ are random column vectors in $\mathbb{R}^{K}$ , independent and identically distributed (i.i.d.) with distribution $P_{X}$ . These vectors are not directly observed. Instead, for each $p$ -tuple $i=(i_{1},\dots,i_{p})\in[n]^{p}$ with $i_{1}\leq i_{2}\leq\dots\leq i_{p}$ , we observe

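$$Y_{i}=\sqrt{\frac{\lambda(p-1)!}{n^{p-1}}}\,\sum_{k=1}^{K}X_{i_{1}k}X_{i_{2}k}\cdots X_{i_{p}k}+Z_{i}\,,\tag{1}$$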

where $\lambda$ is a known signal-to-noise ratio (SNR) and the noise variables $Z_{i}$ are i.i.d. standard normal, $Z_{i}\sim\mathcal{N}(0,1)$. Let ${\mathbf{X}}$ be the $n\times K$ matrix whose $i$th row is given by $X_{i}$. All the observations (1) are combined into the following symmetric order-$p$ tensor ($X_{\cdot,k}\in\mathbb{R}^{n}$ denotes the $k$th column of ${\mathbf{X}}$):

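$${\mathbf{Y}}\coloneqq\sqrt{\frac{\lambda(p-1)!}{n^{p-1}}}\,\sum_{k=1}^{K}X_{\cdot,k}^{\otimes p}+{\mathbf{Z}}\,.$$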
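The observation model can be simulated for tiny sizes. The following is a minimal sketch, not the authors' code: the function names `signal_tensor` and `observe` are hypothetical, the Gaussian prior is an arbitrary choice made only for illustration, and the normalization $\sqrt{\lambda(p-1)!/n^{p-1}}$ is the one consistent with the Hamiltonian defined later in this section. One noise variable is drawn per sorted index tuple, so the resulting tensor is symmetric.

```python
import itertools
import math
import numpy as np

def signal_tensor(X, p):
    """Return sum_k X[:, k]^{(tensor power p)} as a dense array of shape (n,)*p."""
    n, K = X.shape
    T = np.zeros((n,) * p)
    for k in range(K):
        col = X[:, k]
        outer = col
        for _ in range(p - 1):
            outer = np.multiply.outer(outer, col)  # adds one tensor factor
        T += outer
    return T

def observe(X, p, lam, rng):
    """Noisy symmetric tensor: one N(0,1) draw per sorted index tuple."""
    n, _ = X.shape
    scale = math.sqrt(lam * math.factorial(p - 1) / n ** (p - 1))
    Z = rng.standard_normal((n,) * p)
    Y = scale * signal_tensor(X, p)
    # Dense loop over all n^p entries: only sensible for very small n.
    for idx in itertools.product(range(n), repeat=p):
        Y[idx] += Z[tuple(sorted(idx))]
    return Y

rng = np.random.default_rng(0)
n, K, p, lam = 5, 2, 4, 1.5
X = rng.standard_normal((n, K))  # assumed prior P_X = N(0, I_K), illustration only
Y = observe(X, p, lam, rng)
```

Storing the full tensor costs $n^{p}$ entries, which is why the sketch is restricted to toy dimensions.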

Our main result is the proof of a formula for the mutual information in the limit $n\to+\infty$ while the rank $K$ is kept fixed. This formula is given as the optimization of a potential over the cone of $K\times K$ symmetric positive semidefinite matrices $\mathcal{S}_{K}^{+}$ . Let $\widetilde{Z}\sim\mathcal{N}(0,I_{K})$ and $X\sim P_{X}$ . Define the convex function (see Lemma 4 in Appendix A)

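$$\psi:S\in\mathcal{S}_{K}^{+}\mapsto\mathbb{E}\ln\int dP_{X}(x)\,\exp\Big(X^{T}Sx+\widetilde{Z}^{T}\sqrt{S}x-\frac{1}{2}x^{T}Sx\Big)\;;$$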

as well as the potential

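$$\phi_{p,\lambda}:S\in\mathcal{S}_{K}^{+}\mapsto\psi\big(\lambda S^{\circ(p-1)}\big)-\frac{\lambda(p-1)}{2p}\sum_{\ell,\ell^{\prime}=1}^{K}\big(S^{\circ p}\big)_{\ell\ell^{\prime}}\;,\tag{3}$$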

where $S^{\circ k}$ is the $k$th Hadamard power of $S$ (for two matrices $A$ and $B$ of the same dimension, the Hadamard product $A\circ B$ is the matrix of the same dimension with entries $(A\circ B)_{ij}=A_{ij}B_{ij}$). Note that, by the Schur Product Theorem [23], the Hadamard product of two matrices in $\mathcal{S}_{K}^{+}$ is also in $\mathcal{S}_{K}^{+}$. Let $\Sigma_{X}\coloneqq\mathbb{E}[XX^{\intercal}]\in\mathcal{S}_{K}^{+}$ be the second moment matrix of a random vector $X\sim P_{X}$. Our main result is the proof of the replica formula conjectured in [18], that is,

Theorem 1.

(Mutual information in the high-dimensional limit) Assume $p$ is even and $P_{X}$ is such that its first $2p$ moments are finite. Then

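$$\lim_{n\to+\infty}\frac{1}{n}I({\mathbf{X}};{\mathbf{Y}})=\frac{\lambda}{2p}\sum_{\ell,\ell^{\prime}=1}^{K}\big(\Sigma_{X}^{\circ p}\big)_{\ell\ell^{\prime}}-\sup_{S\in\mathcal{S}_{K}^{+}}\phi_{p,\lambda}(S)\,.\tag{4}$$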

Important remark: We can reduce the proof of (4) to the case $\lambda=1$ by properly rescaling $P_{X}$. From now on, we set $\lambda=1$ and define $\phi_{p}\coloneqq\phi_{p,1}$.
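To make the potential and the variational formula (4) concrete, here is a hedged numerical sketch under an assumption that is not part of the paper: the prior is taken to be $P_{X}=\mathcal{N}(0,I_{K})$, for which $\psi(S)=\frac{1}{2}\mathrm{Tr}\,S-\frac{1}{2}\ln\det(I_{K}+S)$ (this follows from $\psi(S)=\frac{1}{2}\mathrm{Tr}(S\Sigma_{X})-I(X;\sqrt{S}X+\widetilde{Z})$ and the Gaussian channel capacity formula). The names `hadamard_power`, `psi_gaussian`, `phi` and `mi_limit_scalar` are hypothetical. The supremum is only approximated on a grid in the scalar case $K=1$; under this prior $\phi_{p,\lambda}$ is decreasing for $s\geq 1$, so a grid on $[0,2]$ is wide enough.

```python
import numpy as np

def hadamard_power(S, k):
    """Entrywise k-th power (Hadamard power), not the matrix power."""
    return np.asarray(S, dtype=float) ** k

def psi_gaussian(S):
    """psi(S) under the ASSUMED prior N(0, I_K): Tr(S)/2 - ln det(I + S)/2."""
    S = np.atleast_2d(S)
    _, logdet = np.linalg.slogdet(np.eye(S.shape[0]) + S)
    return 0.5 * np.trace(S) - 0.5 * logdet

def phi(S, p, lam):
    """Potential phi_{p,lambda}(S) built from Hadamard powers of S."""
    S = np.atleast_2d(np.asarray(S, dtype=float))
    penalty = lam * (p - 1) / (2 * p) * hadamard_power(S, p).sum()
    return psi_gaussian(lam * hadamard_power(S, p - 1)) - penalty

def mi_limit_scalar(p, lam, num=20001):
    """Grid approximation of the right-hand side of (4) for K = 1, Sigma_X = 1."""
    grid = np.linspace(0.0, 2.0, num)
    sup_phi = max(phi(s, p, lam) for s in grid)
    return lam / (2 * p) - sup_phi

# Schur Product Theorem check: Hadamard powers of a PSD matrix stay PSD.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
S = A @ A.T
min_eigs = [np.linalg.eigvalsh(hadamard_power(S, k)).min() for k in (1, 2, 3, 4)]

mi_low_snr = mi_limit_scalar(p=4, lam=0.5)    # sup attained at s = 0 here
mi_high_snr = mi_limit_scalar(p=4, lam=20.0)  # sup attained at some s > 0
```

At low SNR the supremum sits at $S=0$, so the limit equals $\lambda/(2p)$; at high SNR a nonzero maximizer appears and the mutual information falls strictly below that value.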

Before proving Theorem 1, we introduce important information theoretic quantities, adopting the statistical mechanics terminology. Let $\mathcal{I}\coloneqq\{i\in[n]^{p}:i_{a}\leq i_{a+1}\}$ . Given the observations ${\mathbf{Y}}$ , define the Hamiltonian for all ${\mathbf{x}}\in\mathbb{R}^{n\times K}$ :

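$$\mathcal{H}_{n}({\mathbf{x}};{\mathbf{Y}})\coloneqq\sum_{i\in\mathcal{I}}\frac{(p-1)!}{2n^{p-1}}\bigg(\sum_{\ell=1}^{K}\prod_{a=1}^{p}x_{i_{a}\ell}\bigg)^{2}-\sum_{i\in\mathcal{I}}\sqrt{\frac{(p-1)!}{n^{p-1}}}\,Y_{i_{1}\dots i_{p}}\sum_{\ell=1}^{K}\prod_{a=1}^{p}x_{i_{a}\ell}\,.$$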

Using Bayes’ rule, the posterior probability density function is

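$$dP({\mathbf{x}}\mid{\mathbf{Y}})\coloneqq\frac{1}{\mathcal{Z}_{n}({\mathbf{Y}})}\,e^{-\mathcal{H}_{n}({\mathbf{x}};{\mathbf{Y}})}\prod_{j=1}^{n}dP_{X}(x_{j})\,,$$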

with $\mathcal{Z}_{n}({\mathbf{Y}})\coloneqq\int e^{-\mathcal{H}_{n}({\mathbf{x}};{\mathbf{Y}})}\prod_{j}dP_{X}(x_{j})$ the normalization factor. Finally, the free entropy is the quantity

$$f_{n}\coloneqq\frac{1}{n}\mathbb{E}\ln\mathcal{Z}_{n}({\mathbf{Y}})\,,\tag{6}$$

which is linked to the mutual information through the identity

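$$\frac{1}{n}I({\mathbf{X}};{\mathbf{Y}})=\frac{1}{2p}\sum_{\ell,\ell^{\prime}=1}^{K}\big(\Sigma_{X}^{\circ p}\big)_{\ell\ell^{\prime}}-f_{n}+\mathcal{O}(n^{-1})\,.\tag{7}$$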

In (7), $\mathcal{O}(n^{-1})$ is a quantity such that $n\mathcal{O}(n^{-1})$ is bounded uniformly in $n$ . Thanks to (7), Theorem 1 will follow directly from the next two bounds on the asymptotic free entropy.

Theorem 2.

(Lower bound on the asymptotic free entropy) Assume $p$ is even and $P_{X}$ is such that its first $2p$ moments are finite. Then

$$\liminf_{n\to+\infty}f_{n}\geq\sup_{S\in\mathcal{S}_{K}^{+}}\phi_{p}(S)\,.\tag{8}$$

Theorem 3.

(Upper bound on the asymptotic free entropy) Assume $p$ is even and $P_{X}$ has bounded support. Then

$$\limsup_{n\to+\infty}f_{n}\leq\sup_{S\in\mathcal{S}_{K}^{+}}\phi_{p}(S)\,.\tag{9}$$

Important remark: Note that the assumption on $P_{X}$ in Theorem 3 is stricter than the one in Theorem 1. Therefore, combining Theorem 2 and Theorem 3 only proves the limit (4) for a distribution $P_{X}$ which has bounded support. The generalization to a distribution $P_{X}$ whose first $2p$ moments are finite is obtained by approximating $P_{X}$ with distributions having bounded support, much as is done in [17, Section 6.2.2].

3 Adaptive path interpolation

We introduce a time parameter $t\in[0,1]$ . The adaptive path interpolation interpolates from the original channel (1) at $t=0$ to decoupled channels at $t=1$ . In between, we follow an interpolation path $R(\cdot,\epsilon):[0,1]\to\mathcal{S}_{K}^{+}$ , which is a continuously differentiable function parametrized by a small perturbation $\epsilon\in\mathcal{S}_{K}^{+}$ and such that $R(0,\epsilon)=\epsilon$ . More precisely, for $t\in[0,1]$ , we observe:

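$$\begin{cases}Y_{i}^{(t)}=\sqrt{\frac{(1-t)(p-1)!}{n^{p-1}}}\,\sum_{k=1}^{K}\prod_{a=1}^{p}X_{i_{a}k}+Z_{i}\,,&i\in\mathcal{I}\,;\\ \widetilde{Y}_{j}^{(t,\epsilon)}=\sqrt{R(t,\epsilon)}\,X_{j}+\widetilde{Z}_{j}\,,&j\in[n]\,.\end{cases}\tag{10}$$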

The noise $\widetilde{Z}_{j}\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,I_{K})$ is independent of both ${\mathbf{X}}$ and ${\mathbf{Z}}$. Let $\widetilde{{\mathbf{Z}}}$ be the $n\times K$ matrix whose $j$th row is given by $\widetilde{Z}_{j}$ and $\widetilde{{\mathbf{Y}}}^{(t,\epsilon)}\coloneqq\sqrt{R(t,\epsilon)}\,{\mathbf{X}}^{T}+\widetilde{{\mathbf{Z}}}^{T}$. The associated interpolating Hamiltonian reads:

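$$\begin{aligned}\mathcal{H}_{t,\epsilon}({\mathbf{x}};{\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,\epsilon)})\coloneqq{}&\sum_{i\in\mathcal{I}}\frac{(1-t)(p-1)!}{2n^{p-1}}\bigg(\sum_{k=1}^{K}\prod_{a=1}^{p}x_{i_{a}k}\bigg)^{2}-\sum_{i\in\mathcal{I}}\sqrt{\frac{(1-t)(p-1)!}{n^{p-1}}}\,Y_{i}^{(t)}\sum_{k=1}^{K}\prod_{a=1}^{p}x_{i_{a}k}\\ &+\sum_{j=1}^{n}\Big(\frac{1}{2}x_{j}^{T}R(t,\epsilon)x_{j}-\big(\widetilde{Y}_{j}^{(t,\epsilon)}\big)^{T}\sqrt{R(t,\epsilon)}\,x_{j}\Big)\,.\end{aligned}\tag{11}$$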

The interpolating free entropy is defined similarly to the original free entropy (6), that is,

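$$f_{n}(t,\epsilon)\coloneqq\frac{1}{n}\mathbb{E}\ln\mathcal{Z}_{t,\epsilon}\big({\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,\epsilon)}\big)\,,\tag{12}$$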

with $\mathcal{Z}_{t,\epsilon}({\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,\epsilon)})\coloneqq\int e^{-\mathcal{H}_{t,\epsilon}({\mathbf{x}};{\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,\epsilon)})}\prod_{j=1}^{n}dP_{X}(x_{j})$ . Evaluating (12) at both extremes gives:

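$$\begin{cases}f_{n}(0,\epsilon)=f_{n}+\mathcal{O}(\|\epsilon\|)\,;\\ f_{n}(1,\epsilon)=\psi(R(1,\epsilon))\,.\end{cases}\tag{13}$$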

$\|\cdot\|$ denotes the Frobenius norm and $\mathcal{O}(\|\epsilon\|)$ is a quantity such that $|\mathcal{O}(\|\epsilon\|)|\leq\nicefrac{{\mathrm{Tr}(\Sigma_{X})\|\epsilon\|}}{{2}}$ . In order to deal with future computations, it is useful to introduce the Gibbs brackets $\langle-\rangle_{t,\epsilon}$ that denote an expectation with respect to the posterior distribution, i.e.,

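$$\langle g({\mathbf{x}})\rangle_{t,\epsilon}=\int g({\mathbf{x}})\,\frac{e^{-\mathcal{H}_{t,\epsilon}({\mathbf{x}};{\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,\epsilon)})}}{\mathcal{Z}_{t,\epsilon}({\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,\epsilon)})}\prod_{j=1}^{n}dP_{X}(x_{j})\,.\tag{14}$$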

Combining (13) with the fundamental theorem of calculus $f_{n}(0,\epsilon)=f_{n}(1,\epsilon)-\int_{0}^{1}f_{n}^{\prime}(t,\epsilon)dt$ , we obtain the sum-rule of the adaptive path interpolation.

Proposition 1 (Sum-rule).

Assume $P_{X}$ has finite $(2p)$ th-order moments. Denote $R^{\prime}(\cdot,\epsilon)$ the derivative of the interpolation path $R(\cdot,\epsilon)$ . Let $Q_{\ell\ell^{\prime}}\coloneqq\frac{1}{n}\sum_{j=1}^{n}x_{j\ell}X_{j\ell^{\prime}}$ be the entries of the $K\times K$ overlap matrix ${\mathbf{Q}}\coloneqq\frac{1}{n}{\mathbf{x}}^{\intercal}{\mathbf{X}}$ . Then

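$$f_{n}=\mathcal{O}(\|\epsilon\|)+\mathcal{O}(n^{-1})+\psi(R(1,\epsilon))+\frac{1}{2p}\int_{0}^{1}dt\sum_{\ell,\ell^{\prime}=1}^{K}\Big(\mathbb{E}\big\langle(Q_{\ell\ell^{\prime}})^{p}\big\rangle_{t,\epsilon}-p\,(R^{\prime}(t,\epsilon))_{\ell\ell^{\prime}}\,\mathbb{E}\langle Q_{\ell\ell^{\prime}}\rangle_{t,\epsilon}\Big)\,,\tag{15}$$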

where $\mathcal{O}(n^{-1})$ and $\mathcal{O}(\|\epsilon\|)$ are independent of $\epsilon$ and $n$ , respectively.

Proof.

See Section 5 for the computation of $f_{n}^{\prime}(t,\epsilon)$ , that is, the $t$ -derivative of $f_{n}(\cdot,\epsilon)$ . ∎
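The overlap matrix can be made tangible at the decoupled end of the interpolation, where each row is observed through $\widetilde{Y}_{j}=\sqrt{R}\,X_{j}+\widetilde{Z}_{j}$. The sketch below rests on an assumption not made in the paper: a Gaussian prior $P_{X}=\mathcal{N}(0,I_{K})$, under which the posterior mean is linear, $\langle x_{j}\rangle=(I_{K}+R)^{-1}\sqrt{R}\,\widetilde{Y}_{j}$, and $\mathbb{E}\langle{\mathbf{Q}}\rangle=(I_{K}+R)^{-1}R$. It checks by Monte Carlo that $n^{-1}\langle{\mathbf{x}}\rangle^{\intercal}{\mathbf{X}}$ and $n^{-1}\langle{\mathbf{x}}\rangle^{\intercal}\langle{\mathbf{x}}\rangle$ agree, which is the Nishimori-type identity used later to show that $\mathbb{E}\langle{\mathbf{Q}}\rangle$ is positive semidefinite. The matrix `R` is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 200_000, 2
R = np.array([[1.0, 0.3], [0.3, 0.5]])        # arbitrary positive definite example

# Symmetric square root of R via its eigendecomposition.
w, V = np.linalg.eigh(R)
sqrt_R = (V * np.sqrt(w)) @ V.T

X = rng.standard_normal((n, K))               # rows X_j ~ N(0, I_K) (assumed prior)
Y = X @ sqrt_R + rng.standard_normal((n, K))  # rows sqrt(R) X_j + noise

# Linear posterior mean under the Gaussian prior, written row-wise.
post_mean = Y @ sqrt_R @ np.linalg.inv(np.eye(K) + R)

Q_mean = post_mean.T @ X / n                  # <Q> = <x>^T X / n
Q_replica = post_mean.T @ post_mean / n       # <x>^T <x> / n
Q_exact = np.linalg.inv(np.eye(K) + R) @ R    # closed form of E<Q> for this prior
```

Both empirical matrices fluctuate around `Q_exact` with deviations of order $n^{-1/2}$, and `Q_replica` is positive semidefinite by construction.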

4 Matching bounds

In this section we prove both Theorems 2 and 3 by plugging two different choices for $R(\cdot,\epsilon)$ in the sum-rule (15).

4.1 Lower bound: proof of Theorem 2

A lower bound on $f_{n}$ is obtained by choosing the interpolation function $R(t,0)=tS^{\circ(p-1)}$ with $S$ a $K\times K$ symmetric positive semidefinite matrix, i.e., $\epsilon=0$ and $R^{\prime}(t,\epsilon)=S^{\circ(p-1)}$ . Then the sum-rule (15) reads

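$$f_{n}=\mathcal{O}(n^{-1})+\phi_{p}(S)+\frac{1}{2p}\int_{0}^{1}dt\sum_{\ell,\ell^{\prime}=1}^{K}\mathbb{E}\big\langle h_{p}(S_{\ell\ell^{\prime}},Q_{\ell\ell^{\prime}})\big\rangle_{t,0}\;,\tag{16}$$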

where $h_{p}(r,q)\coloneqq q^{p}-pqr^{p-1}+(p-1)r^{p}$ . If $p$ is even then $h_{p}$ is nonnegative on $\mathbb{R}^{2}$ and (16) directly implies $f_{n}\geq\phi_{p}(S)+\mathcal{O}(n^{-1})$ . Taking the inferior limit on both sides of this inequality, and bearing in mind that the inequality is valid for all $S\in\mathcal{S}_{K}^{+}$ , ends the proof of Theorem 2. $\square$
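The sign of $h_{p}$ is the point where the parity of $p$ enters. For even $p$, $x\mapsto x^{p}$ is convex on $\mathbb{R}$, so the tangent inequality $q^{p}\geq r^{p}+pr^{p-1}(q-r)$ rearranges to $h_{p}(r,q)\geq 0$; for odd $p$ convexity fails on the negative half-line. The following small check (the function name `h` and the grid are illustrative choices) evaluates $h_{p}$ on a grid for a few orders.

```python
import numpy as np

def h(p, r, q):
    """h_p(r, q) = q^p - p q r^(p-1) + (p-1) r^p."""
    return q ** p - p * q * r ** (p - 1) + (p - 1) * r ** p

grid = np.linspace(-3.0, 3.0, 121)
r, q = np.meshgrid(grid, grid)

# Minimum of h_p over the grid: nonnegative (up to rounding) for even p,
# strictly negative for odd p.
min_even = {p: h(p, r, q).min() for p in (2, 4, 6)}
min_odd = {p: h(p, r, q).min() for p in (3, 5)}

counterexample = h(3, 1.0, -3.0)  # -27 + 9 + 2
```

The odd-order counterexample shows why the argument for the lower bound does not carry over verbatim when $p$ is odd.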

We have at our disposal a wealth of interpolation paths when considering any continuously differentiable $R(\cdot,\epsilon)$ . However, to establish the lower bound (8), we have used a simple linear interpolation, i.e., $R^{\prime}(t,\epsilon)=S^{\circ(p-1)}$ . Such an interpolation dates back to Guerra [11] and was already used by [18, 17] to derive the lower bound (8) for both cases $K=1$ , any order $p$ , and $p=2$ , any finite rank $K$ . Now that we turn to the proof of the upper bound (9), we will see how the flexibility in the choice of $R(\cdot,\epsilon)$ constitutes an improvement on the classical interpolation.

4.2 Upper bound: proof of Theorem 3

4.2.1 Interpolation determined by an ordinary differential equation (ODE)

The sum-rule (15) suggests picking an interpolation path satisfying

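$$\forall(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}:\;(R^{\prime}(t,\epsilon))_{\ell\ell^{\prime}}=\mathbb{E}[\langle Q_{\ell\ell^{\prime}}\rangle_{t,\epsilon}]^{p-1}\,.\tag{17}$$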

The integral in (15) can then be split in two terms: one similar to the second summand in (3), and one that will vanish in the high-dimensional limit if the overlap concentrates. The next proposition states that (17) indeed admits a solution, a fact which is not obvious because the Gibbs brackets $\langle-\rangle_{t,\epsilon}$ themselves depend on $R(\cdot,\epsilon)$ . Nontrivial properties required to show the upper bound (9) are also proved.

Proposition 2.

For all $\epsilon\in\mathcal{S}_{K}^{+}$ , there exists a unique global solution $R(\cdot,\epsilon):[0,1]\to\mathcal{S}_{K}^{+}$ to the first-order ODE

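$$\forall t\in[0,1]:\;\frac{dR(t)}{dt}=\mathbb{E}[\langle{\mathbf{Q}}\rangle_{t,\epsilon}]^{\circ(p-1)}\,,\quad R(0)=\epsilon\,.$$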

This solution is continuously differentiable and bounded. If $p$ is even then, for all $t\in[0,1]$, $R(t,\cdot)$ is a $\mathcal{C}^{1}$-diffeomorphism from $\mathcal{S}_{K}^{++}$ (the open cone of $K\times K$ symmetric positive definite matrices) into $R(t,\mathcal{S}_{K}^{++})$ whose Jacobian determinant is greater than or equal to one, i.e.,

$$\forall\epsilon\in\mathcal{S}_{K}^{++}:\;\det J_{R(t,\cdot)}(\epsilon)\geq 1\,.\tag{18}$$

Here $J_{R(t,\cdot)}$ denotes the Jacobian matrix of $R(t,\cdot)$ .

Proof.

We now rewrite (17) explicitly as an ODE. Let $R$ be a matrix in $\mathcal{S}_{K}^{+}$ . Consider the problem of inferring ${\mathbf{X}}$ from the following observations:

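$$\begin{cases}Y_{i}^{(t)}=\sqrt{\frac{(1-t)(p-1)!}{n^{p-1}}}\,\sum_{k=1}^{K}\prod_{a=1}^{p}X_{i_{a}k}+Z_{i}\,,&i\in\mathcal{I}\,;\\ \widetilde{Y}_{j}^{(t,R)}=\sqrt{R}\,X_{j}+\widetilde{Z}_{j}\,,&j\in[n]\,.\end{cases}\tag{19}$$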

It is reminiscent of the interpolating problem (10). We can form a Hamiltonian similar to (11), where $R(t,\epsilon)$ is simply replaced by $R$ , and $\langle-\rangle_{t,R}$ are the Gibbs brackets associated to the posterior of this model. We define the function

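$$G_{n}:\begin{array}{ccl}[0,1]\times\mathcal{S}_{K}^{+}&\to&\mathcal{S}_{K}^{+}\\ (t,R)&\mapsto&\mathbb{E}[\langle{\mathbf{Q}}\rangle_{t,R}]^{\circ(p-1)}\end{array}\,.\tag{20}$$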

Note that $\mathbb{E}\langle{\mathbf{Q}}\rangle_{t,R}$ is a symmetric positive semidefinite matrix. Indeed, from the Nishimori identity, $\mathbb{E}\langle{\mathbf{Q}}\rangle_{t,R}=n^{-1}\mathbb{E}[\langle{\mathbf{x}}\rangle_{t,R}^{\intercal}{\mathbf{X}}]=n^{-1}\mathbb{E}[\langle{\mathbf{x}}\rangle_{t,R}^{\intercal}\langle{\mathbf{x}}\rangle_{t,R}]$. (The Nishimori identity is a direct consequence of the Bayes formula. In our setting, it states $\mathbb{E}\langle g({\mathbf{x}},{\mathbf{X}})\rangle_{t,R}=\mathbb{E}\langle g({\mathbf{x}},{\mathbf{x}}^{\prime})\rangle_{t,R}=\mathbb{E}\langle g({\mathbf{X}},{\mathbf{x}})\rangle_{t,R}$ where ${\mathbf{x}},{\mathbf{x}}^{\prime}$ are two samples drawn independently from the posterior distribution given ${\mathbf{Y}}^{(t)}$, $\widetilde{{\mathbf{Y}}}^{(t,R)}$. Here $g$ can also explicitly depend on ${\mathbf{Y}}^{(t)}$, $\widetilde{{\mathbf{Y}}}^{(t,R)}$.) By the Schur Product Theorem [23], the Hadamard power $\mathbb{E}[\langle{\mathbf{Q}}\rangle_{t,R}]^{\circ(p-1)}$ also belongs to $\mathcal{S}_{K}^{+}$, justifying that $G_{n}$ takes values in the cone of symmetric positive semidefinite matrices. $G_{n}$ is continuously differentiable on $[0,1]\times\mathcal{S}_{K}^{+}$. By the Cauchy-Lipschitz theorem, there exists a unique global solution $R(\cdot,\epsilon)$ to the $K(K+1)/2$-dimensional ODE:

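$$\forall t\in[0,1]:\;\frac{dR(t)}{dt}=G_{n}(t,R(t))\,,\quad R(0)=\epsilon\in\mathcal{S}_{K}^{+}\,.\tag{21}$$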

Each initial condition $\epsilon\in\mathcal{S}_{K}^{+}$ is tied to a unique solution $R(\cdot,\epsilon)$. This implies that the function $\epsilon\mapsto R(t,\epsilon)$ is injective. Its Jacobian determinant is given by Liouville’s formula [12]:

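$$\det J_{R(t,\cdot)}(\epsilon)=\exp\int_{0}^{t}ds\sum_{1\leq\ell\leq\ell^{\prime}\leq K}\frac{\partial(G_{n})_{\ell\ell^{\prime}}}{\partial R_{\ell\ell^{\prime}}}\bigg|_{s,R(s,\epsilon)}\,.\tag{22}$$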

Thanks to the identity (22), we can show that the Jacobian determinant is greater than (or equal to) one by proving that the divergence

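$$\sum_{1\leq\ell\leq\ell^{\prime}\leq K}\frac{\partial(G_{n})_{\ell\ell^{\prime}}}{\partial R_{\ell\ell^{\prime}}}\bigg|_{s,R}$$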

is nonnegative for all $(s,R)\in[0,1]\times\mathcal{S}_{K}^{+}$ . By Lemma 5 in Appendix B, the divergence reads (we omit the subscripts of the Gibbs brackets $\langle-\rangle_{t,R}$ ):

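$$\sum_{\ell\leq\ell^{\prime}}\frac{\partial(G_{n})_{\ell\ell^{\prime}}}{\partial R_{\ell\ell^{\prime}}}\bigg|_{t,R}=n(p-1)\sum_{\ell,\ell^{\prime}}\mathbb{E}[\langle Q_{\ell\ell^{\prime}}\rangle]^{p-2}\,\Delta_{\ell\ell^{\prime}}\;,$$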

where

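$$\Delta_{\ell\ell^{\prime}}\coloneqq\mathbb{E}\bigg[\bigg\langle\bigg(\frac{Q_{\ell\ell^{\prime}}+Q_{\ell^{\prime}\ell}}{2}-\bigg\langle\frac{Q_{\ell\ell^{\prime}}+Q_{\ell^{\prime}\ell}}{2}\bigg\rangle\bigg)^{2}\,\bigg\rangle\bigg]-\mathbb{E}\bigg[\bigg(\bigg\langle\frac{Q_{\ell\ell^{\prime}}+Q_{\ell^{\prime}\ell}}{2}\bigg\rangle-\frac{(\langle{\mathbf{x}}\rangle^{T}\langle{\mathbf{x}}\rangle)_{\ell\ell^{\prime}}}{n}\bigg)^{2}\,\bigg].\tag{24}$$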

If $p$ is even then $\mathbb{E}[\langle Q_{\ell\ell^{\prime}}\rangle_{t,R}]^{p-2}$ is nonnegative. We show next that the $\Delta_{\ell\ell^{\prime}}$ ’s are nonnegative, thus ending the proof of (18). The second expectation on the right-hand side (r.h.s.) of (24) satisfies:

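$$\begin{aligned}\mathbb{E}\,\bigg(\bigg\langle\frac{Q_{\ell\ell^{\prime}}+Q_{\ell^{\prime}\ell}}{2}\bigg\rangle-\frac{(\langle{\mathbf{x}}\rangle^{\intercal}\langle{\mathbf{x}}\rangle)_{\ell\ell^{\prime}}}{n}\bigg)^{2}&\leq\mathbb{E}\,\bigg\langle\bigg(\frac{({\mathbf{x}}^{\intercal}{\mathbf{X}}+{\mathbf{X}}^{\intercal}{\mathbf{x}})_{\ell\ell^{\prime}}}{2n}-\frac{(\langle{\mathbf{x}}\rangle^{\intercal}{\mathbf{x}}+{\mathbf{x}}^{\intercal}\langle{\mathbf{x}}\rangle)_{\ell\ell^{\prime}}}{2n}\bigg)^{2}\,\bigg\rangle\\ &=\mathbb{E}\,\bigg\langle\bigg(\frac{({\mathbf{X}}^{\intercal}{\mathbf{x}}+{\mathbf{x}}^{\intercal}{\mathbf{X}})_{\ell\ell^{\prime}}}{2n}-\frac{(\langle{\mathbf{x}}\rangle^{\intercal}{\mathbf{X}}+{\mathbf{X}}^{\intercal}\langle{\mathbf{x}}\rangle)_{\ell\ell^{\prime}}}{2n}\bigg)^{2}\,\bigg\rangle\,.\end{aligned}$$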

The inequality is a simple application of Jensen’s inequality, while the equality that follows is an application of the Nishimori identity. The final upper bound is nothing but the first expectation on the r.h.s. of (24). Therefore, $\forall(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}:\Delta_{\ell\ell^{\prime}}\geq 0$ . ∎
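The role of Liouville's formula in this proof can be illustrated on a toy one-dimensional flow. The sketch below does not use the paper's $G_{n}$ (which requires posterior averages); instead it takes a hypothetical nondecreasing drift $G(r)=r/(1+r)$ as a stand-in for the case $K=1$. It integrates $R^{\prime}=G(R)$ together with $L^{\prime}=G^{\prime}(R)$ by a classical Runge-Kutta scheme and compares $\exp L(1)$ with a finite-difference estimate of $\partial R(1,\epsilon)/\partial\epsilon$. Because the divergence $G^{\prime}$ is nonnegative, the Jacobian is at least one, mirroring (18).

```python
import math

def G(r):
    """Toy nondecreasing drift (hypothetical stand-in for G_n when K = 1)."""
    return r / (1.0 + r)

def dG(r):
    """Derivative of the toy drift; it is the 'divergence' in dimension one."""
    return 1.0 / (1.0 + r) ** 2

def flow(eps, t_end=1.0, steps=1000):
    """RK4 for R' = G(R), L' = G'(R) with R(0) = eps, L(0) = 0."""
    dt = t_end / steps
    R, L = eps, 0.0
    for _ in range(steps):
        r1 = R
        r2 = R + 0.5 * dt * G(r1)
        r3 = R + 0.5 * dt * G(r2)
        r4 = R + dt * G(r3)
        R += dt * (G(r1) + 2 * G(r2) + 2 * G(r3) + G(r4)) / 6
        L += dt * (dG(r1) + 2 * dG(r2) + 2 * dG(r3) + dG(r4)) / 6
    return R, L

eps, delta = 0.3, 1e-5
# Jacobian of eps -> R(1, eps) by central finite differences.
jac_fd = (flow(eps + delta)[0] - flow(eps - delta)[0]) / (2 * delta)
# Same Jacobian from Liouville's formula: exp of the integrated divergence.
jac_liouville = math.exp(flow(eps)[1])
```

In the matrix-valued setting of the paper the same mechanism applies with the divergence summed over the $K(K+1)/2$ independent entries.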

4.2.2 Proof of Theorem 3

Let $\epsilon$ be a symmetric positive definite matrix, i.e., $\epsilon\in\mathcal{S}_{K}^{++}$ . We interpolate with the unique solution $R(\cdot,\epsilon):[0,1]\mapsto\mathcal{S}_{K}^{++}$ to (17). The sum-rule (15) then reads:

[TABLE]

Using first the Lipschitz continuity of $\psi$ and then its convexity (see Lemma 4, Appendix A), we obtain:

[TABLE]

with $|\mathcal{O}(\|\epsilon\|)|\leq\frac{\mathrm{Tr}\,\Sigma_{X}}{2}\|\epsilon\|$. Combining (25) and (26) directly gives:

[TABLE]

In order to end the proof of (9), we must show that the last integral term in the upper bound (27) vanishes when $n$ goes to infinity. This will be the case if the overlap matrix ${\mathbf{Q}}$ concentrates around its expectation $\mathbb{E}\langle{\mathbf{Q}}\rangle_{t,\epsilon}$ . Indeed, provided that the $(4p-4)$ th-order moments of $P_{X}$ are finite, there exists a constant $C_{X}$ depending only on $P_{X}$ such that

[TABLE]

However, proving that the r.h.s. of (28) vanishes is only possible after integrating on a well-chosen set of perturbations $\epsilon$ (that play the role of initial conditions in the ODE (21)). In essence, the integration over $\epsilon$ smoothens the phase transitions that might appear for particular choices of $\epsilon$ when $n$ goes to infinity. We now describe the set of perturbations on which to integrate.

Let $(s_{n})_{n\in\mathbb{N}^{*}}$ be a decreasing sequence of real numbers in $(0,1)$ and define the sequence of subsets:

[TABLE]

Those are subsets of symmetric strictly diagonally dominant matrices with positive diagonal entries, hence they are included in $\mathcal{S}_{K}^{++}$ (see [13, Corollary 7.2.3]). As $\mathcal{E}_{n}$ is a $K(K+1)/2$-dimensional hypercube whose side has length $s_{n}$, its volume is $V_{\mathcal{E}_{n}}=s_{n}^{\nicefrac{{K(K+1)}}{{2}}}$.
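A set with the properties just described can be sketched as follows. The intervals used here are a hypothetical parametrization chosen only to satisfy the stated properties (a hypercube of side $s_{n}$ in the $K(K+1)/2$ free entries, made of strictly diagonally dominant symmetric matrices); they are an assumption of this example and should not be read as the paper's definition of $\mathcal{E}_{n}$. Off-diagonal entries are drawn in $[s,2s]$ and diagonal entries in $[2Ks,(2K+1)s]$, so each row's off-diagonal sum is at most $2(K-1)s<2Ks$.

```python
import numpy as np

def sample_perturbation(K, s, rng):
    """Draw a symmetric strictly diagonally dominant matrix (hypothetical set)."""
    eps = np.zeros((K, K))
    iu = np.triu_indices(K, k=1)
    eps[iu] = rng.uniform(s, 2 * s, size=len(iu[0]))   # off-diagonal entries
    eps = eps + eps.T                                   # symmetrize
    eps[np.diag_indices(K)] = rng.uniform(2 * K * s, (2 * K + 1) * s, size=K)
    return eps

K, s = 3, 0.1
eps = sample_perturbation(K, s, np.random.default_rng(3))

off_diag_sums = np.abs(eps).sum(axis=1) - np.abs(np.diag(eps))
is_dominant = bool(np.all(np.diag(eps) > off_diag_sums))
min_eig = np.linalg.eigvalsh(eps).min()      # positive, so eps is positive definite
volume = s ** (K * (K + 1) / 2)              # side length s in each free coordinate
```

Strict diagonal dominance with a positive diagonal is what guarantees positive definiteness, so every sampled perturbation is a valid initial condition in $\mathcal{S}_{K}^{++}$.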

Remember that, as per Proposition 2, for every $\epsilon\in\mathcal{E}_{n}$ the interpolation path is chosen as the unique solution $R(\cdot,\epsilon):[0,1]\to\mathcal{S}_{K}^{++}$ to $R^{\prime}(t,\epsilon)=\mathbb{E}[\langle{\mathbf{Q}}\rangle_{t,\epsilon}]^{\circ(p-1)}$. Then, for a fixed $t\in[0,1]$, using the Cauchy-Schwarz inequality and the change of variable $\epsilon\to R\equiv R(t,\epsilon)$ (which is justified because $\epsilon\mapsto R(t,\epsilon)$ is a $\mathcal{C}^{1}$-diffeomorphism by Proposition 2), we obtain:

[TABLE]

We introduced the notation $\mathcal{R}_{n,t}\coloneqq R(t,\mathcal{E}_{n})$ while $\langle-\rangle_{t,R}$ are still the Gibbs brackets associated to the posterior distribution of the inference problem (19). The last inequality follows from (18). It will be easier to work with the convex hulls of $\mathcal{R}_{n,t}$ , denoted $\mathrm{C}(\mathcal{R}_{n,t})$ . These convex hulls are uniformly bounded compact sets of $\mathcal{S}_{K}^{++}$ . Indeed, every $\mathcal{R}_{n,t}$ is compact and included in the convex set

[TABLE]

which does not depend on $n$ and $t$ (see Section 6, property (i) of Lemma 1). Note that the upper bound (30), the inclusion $\mathcal{R}_{n,t}\subseteq\mathrm{C}(\mathcal{R}_{n,t})$ and the nonnegativity of the integrand directly imply:

[TABLE]

By Theorem 4 in Section 6, there exists a positive constant $C$ which depends only on $P_{X}$ , $K$ and $p$ such that:

[TABLE]

Combining (28), (32) and (33), we finally get:

[TABLE]

To conclude the proof, we have to further constrain $s_{n}$ to satisfy both $s_{n}\to 0$ and $s_{n}^{9+3K(K+1)}n\to+\infty$ when $n\to+\infty$ . E.g., $s_{n}=(0.99/n)^{\alpha}$ with $0<\alpha<(9+3K(K+1))^{-1}$ is a valid choice. Under this constraint, the upper bound (34) vanishes in the high-dimensional limit. Integrating the inequality (27) over $\epsilon\in\mathcal{E}_{n}$ and, then, making use of the vanishing upper bound (34) as well as

[TABLE]

give the inequality $f_{n}=V_{\mathcal{E}_{n}}^{-1}\int_{\mathcal{E}_{n}}d\epsilon\,f_{n}\leq\sup_{S\in\mathcal{S}_{K}^{+}}\phi_{p}(S)+{\scriptstyle\mathcal{O}}_{n}(1)$ . The upper bound (9) follows simply, thus ending the proof of Theorem 3. $\square$
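As a quick sanity check of the two constraints placed on $s_{n}$ in this proof, the sketch below evaluates the choice $s_{n}=(0.99/n)^{\alpha}$ for one admissible exponent. The values of $K$ and $\alpha$ and the function names are illustrative assumptions.

```python
def s_n(n, alpha):
    """Candidate sequence s_n = (0.99 / n)^alpha."""
    return (0.99 / n) ** alpha

def growth_term(n, K, alpha):
    """The quantity s_n^{9 + 3K(K+1)} * n, which must diverge with n."""
    exponent = 9 + 3 * K * (K + 1)
    return s_n(n, alpha) ** exponent * n

K = 2
alpha = 0.5 / (9 + 3 * K * (K + 1))  # any value in (0, 1/(9 + 3K(K+1))) works
ns = [10 ** k for k in range(1, 8)]
s_values = [s_n(n, alpha) for n in ns]
growth = [growth_term(n, K, alpha) for n in ns]
```

With this $\alpha$ the growth term equals $\sqrt{0.99\,n}$, which diverges, while $s_{n}$ decreases to zero, albeit slowly.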

5 Time-derivative of the average interpolating free entropy

In order to prove the sum-rule in Proposition 1, we need to compute the derivative of the averaged interpolating free entropy (12) with respect to $t$ . We recall that $R^{\prime}(\cdot,\epsilon)$ denotes the derivative of $R(\cdot,\epsilon)$ and that the overlap matrix is ${\mathbf{Q}}=\frac{1}{n}{\mathbf{x}}^{T}{\mathbf{X}}\in\mathbb{R}^{K\times K}$ , that is,

[TABLE]

Proposition 3 (Derivative of the average interpolating free entropy).

Assume that $P_{X}$ has finite $(2p)$ th-order moments. Consider the average free entropy (12). Its derivative with respect to $t$ satisfies:

[TABLE]

Here $\mathcal{O}_{n}(n^{-1})$ is a quantity such that $n\mathcal{O}_{n}(n^{-1})$ is bounded uniformly in $n$ , $t$ and $\epsilon$ .

Proof.

Note that the conditional probability density function of $({\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,\epsilon)})$ given ${\mathbf{X}}={\mathbf{x}}^{\star}$ reads:

[TABLE]

Therefore, the average interpolating free entropy satisfies:

[TABLE]

Taking the time-derivative of (38), we get:

[TABLE]

where $T_{1}$ , $T_{2}$ are given by the two expectations $\mathbb{E}[-]$ and

[TABLE]

Equation (40) comes from differentiating the interpolating Hamiltonian (11). Before going further, we recall two useful identities:

[TABLE]

The identities (41) and (42) can further be combined to obtain

[TABLE]

Evaluating (40) at $({\mathbf{x}},{\mathbf{Y}},\widetilde{{\mathbf{Y}}})=({\mathbf{X}},{\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,\epsilon)})$ and then making use of (43), we obtain:

[TABLE]

Thanks to the Nishimori identity,

[TABLE]

It follows that

[TABLE]

where we used $\mathbb{E}[Z_{i}]=\mathbb{E}[\widetilde{Z}_{j}]=0$ to get the last equality. Therefore, $f^{\prime}_{n}(t,\epsilon)=-\nicefrac{{T_{1}}}{{n}}$ . Plugging (5) in the expression for $T_{1}$ , we obtain:

[TABLE]

The two kinds of expectations appearing on the r.h.s. of (45) are simplified in paragraphs a) and b).

a) Integrating by parts with respect to the Gaussian random variable $Z_{i}$ , we get:

[TABLE]

Summing the latter identity over $\ell\in\{1,\dots,K\}$ and $i\in\mathcal{I}=\{i\in[n]^{p}:i_{a}\leq i_{a+1}\}$ , we obtain:

[TABLE]

This last equality can be further simplified by replacing the sum over tuples $i\in[n]^{p}$ such that $i_{1}<\dots<i_{p}$ by a sum over any $p$ -tuple whose elements are distinct divided by $p!$ (the cardinality of the symmetric group of degree $p$ ). This is possible because the summand is symmetric with respect to any permutation of the indices $(i_{1},\dots,i_{p})$ . We also need to account for the terms corresponding to $p$ -tuples having common elements (that is, $i_{a}=i_{a^{\prime}}$ for some $a\neq a^{\prime}$ ). There are $\mathcal{O}_{n}(n^{p-1})$ such terms and each summand is bounded under the assumption that $P_{X}$ has finite $(2p)$ th order moments. Hence the term $\mathcal{O}_{n}(n^{-1})$ appearing in the final equalities:

[TABLE]
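The counting argument behind the $\mathcal{O}_{n}(n^{-1})$ remainder can be checked directly. The sketch below (function names are hypothetical) counts the $p$-tuples in $[n]^{p}$ with at least one repeated index and verifies that their fraction, multiplied by $n$, stays bounded by $p(p-1)/2$, which is what a union bound over pairs of positions gives.

```python
from itertools import product
from math import perm

def count_tuples_with_repeats(n, p):
    """Number of p-tuples in [n]^p with at least two equal entries."""
    return n ** p - perm(n, p)  # all tuples minus tuples with distinct entries

def scaled_repeat_fraction(n, p):
    """n times the fraction of p-tuples that contain a repeated index."""
    return n * count_tuples_with_repeats(n, p) / n ** p

def brute_force_count(n, p):
    """Direct enumeration, only feasible for small n and p."""
    return sum(1 for t in product(range(n), repeat=p) if len(set(t)) < p)

scaled = {(n, p): scaled_repeat_fraction(n, p)
          for p in range(2, 7) for n in (10, 100, 1000)}
```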

b) Now we look at the second expectation and integrate by parts with respect to the Gaussian random vector $\widetilde{Z}_{j}$ :

[TABLE]

Equation (47) can be further simplified thanks to the Nishimori identity (for the first and last equalities) and the identity (43) (for the second equality):

[TABLE]

Summing the latter over $j\in\{1,\dots,n\}$ , we obtain:

[TABLE]

Summing the final expressions in (46) and (49) ends the proof of Proposition 3. ∎
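Both paragraphs a) and b) rest on Gaussian integration by parts, $\mathbb{E}[Zg(Z)]=\mathbb{E}[g^{\prime}(Z)]$ for $Z\sim\mathcal{N}(0,1)$. The sketch below checks this identity numerically for one test function; the function $g$ and the grid-based integrator are illustrative assumptions, not objects from the proof.

```python
import numpy as np

def gaussian_expectation(f, half_width=12.0, num_points=24001):
    """Approximate E[f(Z)], Z ~ N(0,1), by a Riemann sum on a fine grid."""
    z = np.linspace(-half_width, half_width, num_points)
    density = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    dz = z[1] - z[0]
    return float(np.sum(f(z) * density) * dz)

def g(z):
    return z ** 3 + np.sin(z)

def g_prime(z):
    return 3 * z ** 2 + np.cos(z)

lhs = gaussian_expectation(lambda z: z * g(z))  # E[Z g(Z)]
rhs = gaussian_expectation(g_prime)             # E[g'(Z)]
```

For this $g$ the common value is $3+e^{-1/2}$, since $\mathbb{E}[Z^{2}]=1$ and $\mathbb{E}[\cos Z]=e^{-1/2}$.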

6 Concentration of the overlap matrix

The proof of Theorem 3 requires that, up to an integral over a small volume of perturbations $\epsilon\in\mathcal{S}_{K}^{++}$ , the overlap matrix ${\mathbf{Q}}$ concentrates around its expectation $\mathbb{E}\langle{\mathbf{Q}}\rangle_{t,\epsilon}$ . We chose to integrate the perturbation over the hypercube $\mathcal{E}_{n}\subseteq\mathcal{S}_{K}^{++}$ which is defined by (29) and depends on a decreasing sequence $(s_{n})_{n\in\mathbb{N}^{*}}$ of numbers in $(0,1)$ . Remember that, for all $\epsilon\in\mathcal{E}_{n}$ , $R(\cdot,\epsilon):[0,1]\mapsto\mathcal{S}_{K}^{++}$ is the unique solution to $R^{\prime}(t,\epsilon)=\mathbb{E}[\langle{\mathbf{Q}}\rangle_{t,\epsilon}]^{\circ(p-1)}$ and, for all $t\in[0,1]$ , $C(\mathcal{R}_{n,t})$ is the convex hull of the image $\mathcal{R}_{n,t}\coloneqq R(t,\mathcal{E}_{n})$ . We also recall that, in Proposition 2, we introduced the inference problem (19) whose associated posterior distribution reads

[TABLE]

where $\forall{\mathbf{x}}\in\mathbb{R}^{n\times K}$ :

[TABLE]

Let $\langle-\rangle_{t,R}=\int-\,dP({\mathbf{x}}\,|\,{\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,R)})$ be the Gibbs brackets associated to the posterior distribution (50). Thanks to a change of variables (see the upper bound (32)), we showed that the following theorem is enough to prove Theorem 3.

Theorem 4 (Concentration of the overlap matrix around its expectation).

Assume $P_{X}$ has bounded support. There exists a positive constant $C$ depending only on $P_{X}$ , $K$ and $p$ such that

[TABLE]

The proof of Theorem 4 relies on that of [2, Theorem 3]. In the latter reference, the concentration result is given for an integral over a hypercube $\mathcal{E}_{n}$ . In our case, the integral on the left-hand side of (52) is over the convex hull of the image of $\mathcal{E}_{n}$ under the function $R(t,\cdot)$ . This convex hull is likely not a hypercube, let alone one whose form is similar to (29). Therefore, we first show that the convex hulls $\mathrm{C}(\mathcal{R}_{n,t})$ have properties allowing us to carry out a proof similar to [2].

6.1 Properties of $\mathcal{R}_{n,t}$ ’s convex hull

For $(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}$ , we will denote $E^{(\ell,\ell^{\prime})}$ the $K\times K$ symmetric matrix whose entries are:

[TABLE]

Lemma 1 (Properties of $\mathcal{R}_{n,t}$ ’s convex hull).

For every $R\in\mathrm{C}(\mathcal{R}_{n,t})$ :

(i) $\|R\|\leq 4K^{\nicefrac{{3}}{{2}}}+\mathrm{Tr}(\Sigma_{X})^{p-1}$ ;

(ii) there exists $\epsilon\in\mathcal{E}_{n}$ such that $R\succcurlyeq\epsilon$ ;

(iii) for every pair $(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}$ and real number $\delta\in(-s_{n},s_{n})$ , $R+\delta E^{(\ell,\ell^{\prime})}$ is a symmetric positive definite matrix;

(iv) the $1$ st-order Fréchet derivative $\frac{\partial\sqrt{R}}{\partial R_{\ell\ell^{\prime}}}$ and the $2$ nd-order Fréchet derivative $\frac{\partial^{2}\sqrt{R}}{\partial R_{\ell\ell^{\prime}}^{2}}$ satisfy

[TABLE]

Remark: Note that (i) does not depend on $n$ and $t$ , while (ii-iv) do not depend on $t$ .

Proof.

We start by proving (i). If $R\in\mathcal{R}_{n,t}$ then there exists $\epsilon\in\mathcal{E}_{n}$ such that $R=R(t,\epsilon)$ , i.e.,

[TABLE]

Thus, $\|R\|\leq\|\epsilon\|+\int_{0}^{t}\|\mathbb{E}[\langle{\mathbf{Q}}\rangle_{s,\epsilon}]^{\circ(p-1)}\|\,ds\leq 4K^{\nicefrac{{3}}{{2}}}+\int_{0}^{t}\|\mathbb{E}[\langle{\mathbf{Q}}\rangle_{s,\epsilon}]\|^{p-1}\,ds$ . We have:

[TABLE]

The second inequality follows from Cauchy-Schwarz inequality and the first equality from the Nishimori identity. Hence the upper bound $\|R\|\leq 4K^{\nicefrac{{3}}{{2}}}+\mathrm{Tr}(\Sigma_{X})^{p-1}$ for all $R\in\mathcal{R}_{n,t}$ , which directly extends to $C(\mathcal{R}_{n,t})$ by definition of a convex hull.

We now prove (ii). If $R\in\mathcal{R}_{n,t}$ , note that (56) directly implies $R-\epsilon\succcurlyeq 0$ as – by the Nishimori identity and the Schur Product theorem – $\mathbb{E}[\langle{\mathbf{Q}}\rangle_{s,\epsilon}]^{\circ(p-1)}$ is symmetric positive semidefinite for all $s\in[0,1]$ . More generally, if $R\in C(\mathcal{R}_{n,t})$ , there exist $m\in\mathbb{N}^{*}$ , $(\alpha_{1},\alpha_{2},\dots,\alpha_{m})\in[0,1]^{m}$ and $(R_{1},\dots,R_{m})\in(\mathcal{R}_{n,t})^{m}$ such that $\sum_{j=1}^{m}\alpha_{j}=1$ and $R=\sum_{j=1}^{m}\alpha_{j}R_{j}$ . It follows directly that $R\succcurlyeq\sum_{j=1}^{m}\alpha_{j}\epsilon_{j}$ where $\forall j\in\{1,\dots,m\}:\mathcal{E}_{n}\ni\epsilon_{j}\preccurlyeq R_{j}$ . As $\mathcal{E}_{n}$ is convex, this concludes the proof of (ii).
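The Schur product theorem used in this step states that entrywise (Hadamard) products of positive semidefinite matrices are positive semidefinite, so Hadamard powers preserve positive semidefiniteness. A minimal numerical illustration, with a randomly generated matrix standing in for $\mathbb{E}[\langle{\mathbf{Q}}\rangle_{s,\epsilon}]$ (an assumption made purely for the example):

```python
import numpy as np

def random_psd(K, rng):
    """A random symmetric positive semidefinite K x K matrix."""
    a = rng.standard_normal((K, K))
    return a @ a.T

def hadamard_power(a, k):
    """Entrywise k-th power of the matrix a."""
    return a ** k

rng = np.random.default_rng(1)
q = random_psd(5, rng)
powers = [hadamard_power(q, k) for k in range(1, 6)]
# Smallest eigenvalue of each Hadamard power, relative to its Frobenius norm.
relative_min_eigs = [np.linalg.eigvalsh(m).min() / np.linalg.norm(m) for m in powers]
```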

We now show (ii) $\Rightarrow$ (iii). Let $R\in\mathrm{C}(\mathcal{R}_{n,t})$ and pick $\epsilon\in\mathcal{E}_{n}$ such that $R\succcurlyeq\epsilon$ . For all $(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}$ and $\delta\in(-s_{n},s_{n})$ , $\epsilon+\delta E^{(\ell,\ell^{\prime})}$ is a symmetric strictly diagonally dominant matrix with positive diagonal entries. Therefore, $\epsilon+\delta E^{(\ell,\ell^{\prime})}$ belongs to $\mathcal{S}_{K}^{++}$ and $R+\delta E^{(\ell,\ell^{\prime})}\succcurlyeq\epsilon+\delta E^{(\ell,\ell^{\prime})}\succ 0$ .

Finally, we prove (iv). Let $R\in\mathrm{C}(\mathcal{R}_{n,t})$ and denote $\lambda_{\min}(R)$ its minimum eigenvalue. Applying [19, Theorem 1.1] (the first upper bound in (6) to be more precise), we obtain:

[TABLE]

Using (ii), pick $\epsilon\in\mathcal{E}_{n}$ such that $R\succcurlyeq\epsilon$ . By [24, Corollary 2], the minimum eigenvalue of $\epsilon$ is greater than $\sqrt{\alpha\beta}$ where

[TABLE]

Hence $\lambda_{\min}(R)\geq\sqrt{\alpha\beta}\geq s_{n}$ . Combining this lower bound with (58) ends the proof of (iv). ∎
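To make property (iv) concrete, the sketch below computes the first-order derivative of the matrix square root along $E^{(\ell,\ell^{\prime})}$ by solving $\sqrt{R}\,D+D\sqrt{R}=E^{(\ell,\ell^{\prime})}$ in the eigenbasis of $R$, and compares it with a finite difference. It also checks the elementary bound $\|D\|\leq\|E\|/(2\sqrt{\lambda_{\min}(R)})$ in Frobenius norm. This elementary bound is used here as a stand-in for (58), which is not reproduced in this excerpt; the test matrix and all function names are assumptions.

```python
import numpy as np

def matrix_sqrt(r):
    """Symmetric square root of a symmetric positive definite matrix."""
    lam, u = np.linalg.eigh(r)
    return (u * np.sqrt(lam)) @ u.T

def sqrt_derivative(r, e):
    """Directional derivative D of the square root at r along e,
    i.e. the solution of sqrt(r) D + D sqrt(r) = e."""
    lam, u = np.linalg.eigh(r)
    root = np.sqrt(lam)
    e_rot = u.T @ e @ u  # express e in the eigenbasis of r
    return u @ (e_rot / (root[:, None] + root[None, :])) @ u.T

def basis_matrix(K, l, lp):
    """Symmetric matrix with ones at (l, lp) and (lp, l), zeros elsewhere."""
    e = np.zeros((K, K))
    e[l, lp] = 1.0
    e[lp, l] = 1.0
    return e

rng = np.random.default_rng(2)
a = rng.standard_normal((3, 3))
r = a @ a.T + 0.5 * np.eye(3)  # positive definite test matrix
e = basis_matrix(3, 0, 1)
d = sqrt_derivative(r, e)

h = 1e-6  # step of the central finite difference
d_fd = (matrix_sqrt(r + h * e) - matrix_sqrt(r - h * e)) / (2 * h)
lam_min = np.linalg.eigvalsh(r).min()
bound = np.linalg.norm(e) / (2 * np.sqrt(lam_min))
```

For $\ell\neq\ell^{\prime}$ one has $\|E^{(\ell,\ell^{\prime})}\|=\sqrt{2}$, so the elementary bound reads $1/\sqrt{2\lambda_{\min}(R)}$, which is at most $1/\sqrt{2s_{n}}$ once $\lambda_{\min}(R)\geq s_{n}$.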

6.2 Concentration of $\boldsymbol{\mathcal{L}}$ around its expectation

As in [2], the concentration of the overlap matrix around its expectation will follow from the concentration of the $K\times K$ symmetric matrix $\boldsymbol{\mathcal{L}}\equiv\boldsymbol{\mathcal{L}}(R)$ whose entries are:

[TABLE]

This is well-defined as long as $R\in\mathcal{S}_{K}^{++}$ . To prove concentration results on $\boldsymbol{\mathcal{L}}$ , it will be useful to work with the free entropy $\frac{1}{n}\ln\mathcal{Z}_{t,R}({\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,R)})$ where $\mathcal{Z}_{t,R}({\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,R)})$ is the normalization factor of the posterior distribution (128). In Appendix C, we prove that this free entropy concentrates around its expectation when $n\to+\infty$ . In order to shorten notations, we define:

[TABLE]

Proposition 4 (Thermal fluctuations of $\boldsymbol{\mathcal{L}}$ ).

Assume $P_{X}$ has finite fourth-order moments. There exists a positive constant $C$ , depending only on $\Sigma_{X}$ , $K$ and $p$ , such that for all $(n,t)\in\mathbb{N}^{*}\times[0,1]$ :

[TABLE]

Proof.

Fix $(n,t)\in\mathbb{N}^{*}\times[0,1]$ . Note that $\forall R\in\mathcal{S}_{K}^{++}$ , $\forall(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}$ :

[TABLE]

Further differentiating, we obtain:

[TABLE]

Combining (64) and (75) for $\mathbb{E}\,\langle\nicefrac{{\partial\mathcal{L}_{\ell\ell^{\prime}}}}{{\partial R_{\ell\ell^{\prime}}}}\rangle_{t,R}$ (see Lemma 2 following this proof), we obtain:

[TABLE]

We start by upper bounding the integral over $C(\mathcal{R}_{n,t})$ of the second summand on the right-hand side of (65). Thanks to the Nishimori identity, we can see that $\Sigma_{X}\succcurlyeq\mathbb{E}\,\langle{\mathbf{Q}}\rangle_{t,R}$ . Indeed:

[TABLE]

Therefore, $(\nicefrac{{\partial\sqrt{R}}}{{\partial R_{\ell\ell^{\prime}}}})(\Sigma_{X}-\mathbb{E}\,\langle{\mathbf{Q}}\rangle_{t,R})(\nicefrac{{\partial\sqrt{R}}}{{\partial R_{\ell\ell^{\prime}}}})$ is symmetric positive semidefinite and the second term on the right-hand side of (65) satisfies:

[TABLE]

The last inequality follows from the upper bound (54) in Lemma 1. Therefore, keeping in mind that $C(\mathcal{R}_{n,t})$ is included in the ball $\mathcal{B}(\Sigma_{X},K,p)$ , there exists a positive constant $C_{1}$ depending only on $\Sigma_{X}$ , $K$ and $p$ such that:

[TABLE]

Now we turn to upper bounding $\int_{C(\mathcal{R}_{n,t})}\frac{dR}{n}\frac{\partial^{2}f_{n}}{\partial R_{\ell\ell^{\prime}}^{2}}\Big{|}_{t,R}$ . Define the closed convex set

[TABLE]

For every pair $(\widetilde{R},r)\in C^{(\ell,\ell^{\prime})}\times\mathbb{R}$ , we denote $\widetilde{R}\cup\{r\}$ the symmetric matrix whose entries are given by:

[TABLE]

Because $C(\mathcal{R}_{n,t})$ is a closed convex set, there exist two functions $a,b:C^{(\ell,\ell^{\prime})}\to\mathbb{R}$ such that $\forall\widetilde{R}\in C^{(\ell,\ell^{\prime})}$ :

(i) $a(\widetilde{R})\leq b(\widetilde{R})$ ;

(ii) $\forall r\in[a(\widetilde{R}),b(\widetilde{R})]:\widetilde{R}\cup\{r\}\in C(\mathcal{R}_{n,t})$ ;

(iii) $\forall r\in\mathbb{R}\setminus[a(\widetilde{R}),b(\widetilde{R})]:\widetilde{R}\cup\{r\}\notin C(\mathcal{R}_{n,t})$ .

Therefore,

[TABLE]

Note that $\forall R\in\mathcal{S}_{K}^{++}$ :

[TABLE]

where the second and third inequalities follow from the identity (74) (see Lemma 2 following this proof) and the inequality (57), respectively. Combining both (71) and (72), we finally get

[TABLE]

where $C_{2}$ is a positive constant that depends only on $\Sigma_{X}$ , $K$ and $p$ . Integrating (65) over $C(\mathcal{R}_{n,t})$ , making use of the upper bounds (68) and (73) and, finally, summing over $(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}$ end the proof. ∎

We relied on the following lemma for the proof of Proposition 4.

Lemma 2.

Assume $P_{X}$ has finite second-order moments. Let $\delta_{\ell\ell^{\prime}}=0$ if $\ell\neq\ell^{\prime}$ and $\delta_{\ell\ell^{\prime}}=1$ otherwise. Then, $\forall(t,R)\in[0,1]\times\mathcal{S}_{K}^{++}$ , $\forall(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}$ :

[TABLE]

Proof.

Fix $(t,R)\in[0,1]\times\mathcal{S}_{K}^{++}$ . By the definition (59) of $\boldsymbol{\mathcal{L}}$ , we have $\forall(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}$ :

[TABLE]

Integrating by parts with respect to the Gaussian random vectors $\widetilde{Z}_{j}$ , $j\in[n]$ , the last expectation on the right-hand side of (76) reads:

[TABLE]

The second and third equalities follow from (135) and the Nishimori identity, respectively. Plugging (77) in (76) and, then, making use of the identity $\nicefrac{{\partial R}}{{\partial R_{\ell\ell^{\prime}}}}=E^{(\ell,\ell^{\prime})}$ end the proof of (74):

[TABLE]

We now turn to the proof of (75). We have:

[TABLE]

The second equality follows once again from a Gaussian integration by parts with respect to $\widetilde{Z}_{j}$ , $j\in[n]$ . Note that for all $v\in\mathbb{R}^{K}$ :

[TABLE]

because of the identity

[TABLE]

Plugging (79) in (6.2), $\mathbb{E}\,\langle\nicefrac{{\partial\mathcal{L}_{\ell\ell^{\prime}}}}{{\partial R_{\ell\ell^{\prime}}}}\rangle_{t,R}$ further simplifies:

[TABLE]

The second equality follows from the Nishimori identity. ∎
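The Nishimori identity invoked throughout can be checked on a toy example. The sketch below assumes a scalar channel $Y=\sqrt{S}X+Z$ with a Rademacher prior on $X$ (an illustrative choice, not the model of this section), for which the posterior mean is $\langle x\rangle=\tanh(\sqrt{S}\,Y)$, and verifies that $\mathbb{E}[X\langle x\rangle]=\mathbb{E}[\langle x\rangle^{2}]$.

```python
import numpy as np

def gaussian_expectation(f, half_width=12.0, num_points=24001):
    """Approximate E[f(Z)], Z ~ N(0,1), by a Riemann sum on a fine grid."""
    z = np.linspace(-half_width, half_width, num_points)
    density = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    dz = z[1] - z[0]
    return float(np.sum(f(z) * density) * dz)

def overlap_moments(snr):
    """Return (E[X <x>], E[<x>^2]) for the assumed Rademacher scalar channel."""
    first, second = 0.0, 0.0
    for x in (-1.0, 1.0):  # average over the two equiprobable values of X
        def post_mean(z, x=x):
            y = np.sqrt(snr) * x + z  # observation given X = x and noise z
            return np.tanh(np.sqrt(snr) * y)
        first += 0.5 * gaussian_expectation(lambda z: x * post_mean(z))
        second += 0.5 * gaussian_expectation(lambda z: post_mean(z) ** 2)
    return first, second

moments = {snr: overlap_moments(snr) for snr in (0.25, 1.0, 1.5)}
```

The same mechanism is behind the ordering $\Sigma_{X}\succcurlyeq\mathbb{E}\,\langle{\mathbf{Q}}\rangle_{t,R}$: in this toy case both moments lie between $0$ and $\mathbb{E}[X^{2}]=1$.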

Proposition 5 (Quenched fluctuations of $\boldsymbol{\mathcal{L}}$ ).

Assume $P_{X}$ has bounded support. There exists a positive constant $C$ , depending only on $P_{X}$ , $K$ and $p$ , such that for all $(n,t)\in\mathbb{N}^{*}\times[0,1]$ :

[TABLE]

Proof.

Fix $(n,t)\in\mathbb{N}^{*}\times[0,1]$ . For all $R\in\mathcal{S}_{K}^{++}$ and $(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}$ , we have:

[TABLE]

By assumption there exists a nonnegative real number $B_{X}$ such that $X\sim P_{X}\Rightarrow\|X\|\leq B_{X}$ almost surely. Using the upper bound (55) in Lemma 1, the second term on the right-hand side of (83) can be upper bounded:

[TABLE]

From now on, we also fix $(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}$ as well as $\widetilde{R}\in C^{(\ell,\ell^{\prime})}$ . The closed convex set $C^{(\ell,\ell^{\prime})}$ is as defined by (69). Remember that, for every real number $r$ , $\widetilde{R}\cup\{r\}$ is the matrix defined by (70), and that there exist two functions $a,b:C^{(\ell,\ell^{\prime})}\to\mathbb{R}$ such that $\forall\widetilde{R}\in C^{(\ell,\ell^{\prime})}$ :

(i) $a(\widetilde{R})\leq b(\widetilde{R})$ ;

(ii) $\forall r\in[a(\widetilde{R}),b(\widetilde{R})]:\widetilde{R}\cup\{r\}\in C(\mathcal{R}_{n,t})$ ;

(iii) $\forall r\in\mathbb{R}\setminus[a(\widetilde{R}),b(\widetilde{R})]:\widetilde{R}\cup\{r\}\notin C(\mathcal{R}_{n,t})$ .

Besides, by property (iii) in Lemma 1, for every $r\in(a(\widetilde{R})-s_{n},b(\widetilde{R})+s_{n})$ the matrix $\widetilde{R}\cup\{r\}$ is in $\mathcal{S}_{K}^{++}$ . Thus, we can define for all $r\in(a(\widetilde{R})-s_{n},b(\widetilde{R})+s_{n})$ :

[TABLE]

$F$ is convex on $(a(\widetilde{R})-s_{n},b(\widetilde{R})+s_{n})$ as it is twice differentiable with a nonnegative second derivative by (83) and (86). The same holds for $f$ . We will apply the following standard lemma to these two convex functions (see [5] for a proof):

Lemma 3 (An upper bound for differentiable convex functions).

Let $g$ and $G$ be two differentiable convex functions defined on an interval $I\subseteq\mathbb{R}$ . Let $r\in I$ and $\delta>0$ such that $r\pm\delta\in I$ . Then

[TABLE]

where $C_{\delta}(r)=g^{\prime}(r+\delta)-g^{\prime}(r-\delta)\geq 0$ .
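The displayed inequality of Lemma 3 is not reproduced in this excerpt. The sketch below assumes the form commonly used in this line of work, $|G^{\prime}(r)-g^{\prime}(r)|\leq\delta^{-1}\sum_{u\in\{r-\delta,r,r+\delta\}}|G(u)-g(u)|+C_{\delta}(r)$, and checks it numerically on a pair of convex test functions. Both the assumed form and the test functions are illustrative.

```python
import numpy as np

def lemma3_sides(G, dG, g, dg, r, delta):
    """Return (lhs, rhs) of the ASSUMED bound
    |G'(r) - g'(r)| <= (1/delta) * sum_u |G(u) - g(u)| + C_delta(r)."""
    lhs = abs(dG(r) - dg(r))
    c_delta = dg(r + delta) - dg(r - delta)  # nonnegative since g is convex
    gap = sum(abs(G(u) - g(u)) for u in (r - delta, r, r + delta))
    return lhs, gap / delta + c_delta

# Convex test functions: g(x) = x^2 and a small perturbation G of it.
g = lambda x: x ** 2
dg = lambda x: 2 * x
G = lambda x: x ** 2 + 0.05 * np.sin(3 * x)   # G'' = 2 - 0.45 sin(3x) > 0
dG = lambda x: 2 * x + 0.15 * np.cos(3 * x)

checks = [lemma3_sides(G, dG, g, dg, r, delta)
          for r in np.linspace(-2.0, 2.0, 41)
          for delta in (0.01, 0.1, 0.5)]
```

The trade-off visible here is the one exploited in the proof: a small $\delta$ shrinks $C_{\delta}(r)$ but amplifies the gap between $G$ and $g$ through the factor $\delta^{-1}$.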

For all $r\in(a(\widetilde{R})-s_{n},b(\widetilde{R})+s_{n})$ , we have:

[TABLE]

Let $C_{\delta}(r)=f^{\prime}(r+\delta)-f^{\prime}(r-\delta)$ , which is nonnegative by convexity of $f$ . It follows from Lemma 3 and the two identities (90) and (91) that $\forall(r,\delta)\in[a(\widetilde{R}),b(\widetilde{R})]\times(0,s_{n})$ :

[TABLE]

Thanks to the inequality $(\sum_{i=1}^{m}v_{i})^{2}\leq m\sum_{i=1}^{m}v_{i}^{2}$ , this directly implies $\forall(r,\delta)\in[a(\widetilde{R}),b(\widetilde{R})]\times(0,s_{n})$ :

[TABLE]

The next step is to bound the integral of the three summands on the right-hand side of (92). Remember that $\forall r\in[a(\widetilde{R}),b(\widetilde{R})]:\widetilde{R}\cup\{r\}\in C(\mathcal{R}_{n,t})$ . By property (i) in Lemma 1, we have:

[TABLE]

Besides, by independence of the Gaussian random vectors $\widetilde{Z}_{j}$ , ${\mathbb{V}\mathrm{ar}}\big{(}\sum_{j=1}^{n}\|\widetilde{Z}_{j}\|\big{)}=n{\mathbb{V}\mathrm{ar}}\,\|\widetilde{Z}_{1}\|\leq nK$ . We conclude that there exists a positive constant $C_{1}$ depending only on $P_{X}$ , $K$ and $p$ such that $\forall\delta\in(0,s_{n})$ :

[TABLE]

Note that $C_{\delta}(r)=|C_{\delta}(r)|\leq|f^{\prime}(r+\delta)|+|f^{\prime}(r-\delta)|$ . For all $q\in(a(\widetilde{R})-s_{n},b(\widetilde{R})+s_{n})$ , we have:

[TABLE]

where $\widetilde{C}_{2}$ is a positive constant depending only on $P_{X}$ , $K$ and $p$ . The second inequality in (95) follows from the upper bounds $|\mathbb{E}\,\langle\mathcal{L}_{\ell\ell^{\prime}}\rangle_{t,\widetilde{R}\cup\{q\}}|\leq\mathrm{Tr}(\Sigma_{X})$ (see (72)), (93) and $\mathbb{E}\|\widetilde{Z}_{j}\|\leq\mathbb{E}[\|\widetilde{Z}_{j}\|^{2}]^{\nicefrac{{1}}{{2}}}=\sqrt{K}$ . Thus, for the second summand, we obtain $\forall\delta\in(0,s_{n})$ :

[TABLE]

The last inequality is a simple application of the mean value theorem. We finally turn to the third summand. For every $\widetilde{R}\in C^{(\ell,\ell^{\prime})}$ and pair $(r,\delta)\in[a(\widetilde{R}),b(\widetilde{R})]\times(-s_{n},s_{n})$ , we have:

[TABLE]

This upper bound is uniform in $n$ and $t$ . Hence, by Theorem 5 of Appendix C, there exists a positive constant $C_{3}$ depending only on $P_{X}$ , $K$ and $p$ such that $\forall\widetilde{R}\in C^{(\ell,\ell^{\prime})}$ , $\forall(r,\delta)\in[a(\widetilde{R}),b(\widetilde{R})]\times(-s_{n},s_{n})$ :

[TABLE]

Using first (97) and then (93), we see that the third summand satisfies $\forall\delta\in(0,s_{n})$ :

[TABLE]

We now choose $\delta=\nicefrac{{s_{n}^{\nicefrac{{3}}{{2}}}}}{{n^{\nicefrac{{1}}{{3}}}}}$ . As $s_{n}\in(0,1)$ , this choice satisfies $\delta\in(0,s_{n})$ . The combination of (92) with the three upper bounds (94), (96) and (98) shows the existence of a positive constant $C$ depending only on $P_{X}$ , $K$ and $p$ such that:

[TABLE]

One important fact following from our analysis is that $C$ can be chosen independently of both $(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}$ and $\widetilde{R}\in C^{(\ell,\ell^{\prime})}$ . Therefore, for all $(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}$ , we have

[TABLE]

where $V_{C^{(\ell,\ell^{\prime})}}$ denotes the volume of $C^{(\ell,\ell^{\prime})}$ . As each of the $K(K+1)/2$ sets $C^{(\ell,\ell^{\prime})}$ is uniformly bounded in $n$ and $t$ , the proposition follows from summing (100) over $(\ell,\ell^{\prime})$ . ∎

6.3 Concentration of ${\mathbf{Q}}$ around its expectation

We now use the concentration results for $\boldsymbol{\mathcal{L}}$ , that is, Propositions 4 and 5, to prove Theorem 4. We first state an intermediate result on the thermal fluctuations of ${\mathbf{Q}}$ :

Proposition 6 (Concentration of the overlap matrix around its expectation).

Assume $P_{X}$ has finite fourth-order moments. There exists a positive constant $C$ depending only on $P_{X}$ , $K$ and $p$ such that

[TABLE]

Proof.

Fix $(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}$ . Note that $\forall(t,R)\in[0,1]\times\mathcal{S}_{K}^{++}$ :

[TABLE]

where $M_{X}\coloneqq\frac{1}{n^{2}}\sum_{i,j=1}^{n}\mathbb{E}[X_{i\ell^{\prime}}^{2}X_{j\ell^{\prime}}^{2}]$ is finite thanks to the assumption. Differentiating with respect to $R_{\ell\ell}$ the identity $\mathbb{E}\,\langle\mathcal{L}_{\ell\ell}\rangle_{t,R}=-\frac{1}{2}\mathbb{E}\,\langle Q_{\ell\ell}\rangle_{t,R}$ (see Lemma 2), we obtain (see [2] for the detailed computation):

[TABLE]

By Proposition 4 and the inequality (68) combined with (75), there exists a positive constant $C$ depending only on $P_{X}$ , $K$ and $p$ such that:

[TABLE]

Combining both inequalities (103) and (105) with Cauchy-Schwarz inequality, we obtain:

[TABLE]

with $V_{C(\mathcal{R}_{n,t})}$ the volume of $C(\mathcal{R}_{n,t})$ which is bounded uniformly in $n$ and $t$ by (i) of Lemma 1. This ends the proof of (101). The inequality (102) is proved in a similar way (see [2]). ∎

Finally, we conclude this section with the proof of Theorem 4.

Proof of Theorem 4.

To lighten notations we drop the subscripts of the Gibbs brackets $\langle-\rangle_{t,R}$ . The concentration of ${\mathbf{Q}}$ can be linked to the concentration of $\boldsymbol{\mathcal{L}}$ by rewriting $\mathrm{Tr}\,\mathbb{E}\,\langle{\mathbf{Q}}(\boldsymbol{\mathcal{L}}-\mathbb{E}\langle\boldsymbol{\mathcal{L}}\rangle)\rangle$ properly. Thanks to the identity (74), we have:

[TABLE]

Plugging $\boldsymbol{\mathcal{L}}$ ’s definition (59) in $\mathrm{Tr}\,\mathbb{E}\,\langle{\mathbf{Q}}\boldsymbol{\mathcal{L}}\rangle$ and integrating by parts with respect to the Gaussian random vectors $\widetilde{Z}_{j}$ , $j\in[n]$ , we find:

[TABLE]

Note that $\forall(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2},\forall j\in\{1,\dots,n\}$ :

[TABLE]

The second equality follows from (135), for the first expectation, and the Nishimori identity, for the second expectation. Plugging (109) in (108), we obtain:

[TABLE]

Subtracting (107) from (110), we obtain:

[TABLE]

Remember the matrices $E^{(\ell,\ell^{\prime})}$ defined by (53). As $\nicefrac{{\partial R}}{{\partial R_{\ell\ell^{\prime}}}}=E^{(\ell,\ell^{\prime})}$ , we have:

[TABLE]

Subtracting (112) from (113) yields:

[TABLE]

Plugging (114) in (111) gives the equality:

[TABLE]

On one hand,

[TABLE]

On the other hand, using exclusively Cauchy-Schwarz inequality, we have:

[TABLE]

Note that $\|\sqrt{R}\|=\sqrt{\mathrm{Tr}\,R}\leq\sqrt{\|R\|\|I_{K}\|}\leq\sqrt{4K^{2}+K^{\nicefrac{{1}}{{2}}}\mathrm{Tr}(\Sigma_{X})^{p-1}}\coloneqq B$ where the last inequality follows from (i) in Lemma 1. Therefore, $\big{\|}\frac{\partial\sqrt{R}}{\partial R_{\ell\ell^{\prime}}}\sqrt{R}\big{\|}\leq\big{\|}\frac{\partial\sqrt{R}}{\partial R_{\ell\ell^{\prime}}}\big{\|}\|\sqrt{R}\|\leq\nicefrac{{B}}{{\sqrt{2s_{n}}}}$ (remember (54)). By Cauchy-Schwarz inequality:

[TABLE]

Further upper bounding, we obtain

[TABLE]

as, by Jensen’s inequality and Nishimori identity, we have:

[TABLE]

Putting together the equality (115), the lower bound (116) and the upper bounds (117), (118), (119), there exists a positive constant $C$ depending only on $P_{X}$ , $K$ and $p$ such that:

[TABLE]

To end the proof of Theorem 4, it remains to integrate both sides of (120) over $C(\mathcal{R}_{n,t})$ and apply Propositions 4, 5, 6. ∎

7 Conclusion and discussion for odd-order tensors

In this work, we have proved the conjectured replica formula for even-order symmetric tensors. It would be desirable to extend both Theorem 2 and Theorem 3 to the odd-order case. For the case $K=1$ we refer to [18]. For $K>1$ , this is still an open problem and we now briefly discuss where our proofs fall short in this case.

Ideally, to extend Theorem 2 to an odd order $p$ , we would show that the integral on the r.h.s. of (16), i.e., $\int_{0}^{1}dt\,\sum_{\ell,\ell^{\prime}}\mathbb{E}\,\langle h_{p}(S_{\ell\ell^{\prime}},Q_{\ell\ell^{\prime}})\rangle_{t,0}$ with $h_{p}(r,q)=q^{p}-pqr^{p-1}+(p-1)r^{p}$ , is nonnegative. However, when $p$ is odd, $h_{p}$ is not nonnegative on its whole domain of definition. To be able to say something about the integral, we have to take a Gibbs average of $Q_{\ell\ell^{\prime}}$ before applying $h_{p}$ . This requires rewriting the integral as follows:

[TABLE]
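The sign behaviour of $h_{p}$ described above is easy to inspect numerically. The sketch below evaluates $h_{p}(r,q)=q^{p}-pqr^{p-1}+(p-1)r^{p}$ on a grid (the grid range is an arbitrary choice) and records its minimum for even $p$, for odd $p$, and for odd $p$ restricted to $r,q\geq 0$.

```python
import numpy as np

def h(p, r, q):
    """h_p(r, q) = q^p - p q r^(p-1) + (p-1) r^p."""
    return q ** p - p * q * r ** (p - 1) + (p - 1) * r ** p

grid = np.linspace(-3.0, 3.0, 61)
r_all, q_all = np.meshgrid(grid, grid)
nonneg = grid[grid >= 0]
r_pos, q_pos = np.meshgrid(nonneg, nonneg)

min_even = {p: float(h(p, r_all, q_all).min()) for p in (2, 4, 6)}
min_odd = {p: float(h(p, r_all, q_all).min()) for p in (3, 5)}
min_odd_nonneg = {p: float(h(p, r_pos, q_pos).min()) for p in (3, 5)}
```

For instance $h_{3}(1,-3)=-27+9+2=-16$, whereas for $r,q\geq 0$ the convexity of $q\mapsto q^{p}$ on the half-line keeps $h_{p}$ nonnegative.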

When $K=1$ , both $\langle{\mathbf{x}}\rangle_{t,0}^{T}\langle{\mathbf{x}}\rangle_{t,0}$ and $S$ are nonnegative real numbers. The nonnegativity of $h_{p}(r,q)$ for $r,q\geq 0$ then ensures that the second integral on the r.h.s. of (121) is nonnegative and, by introducing a small perturbation $\epsilon$ on which we integrate, we can cancel the first integral as was done in the proof of Theorem 3. This is how the lower bound is proved in [18]. When $K>1$ , we only know that $\langle{\mathbf{x}}\rangle_{t,0}^{T}\langle{\mathbf{x}}\rangle_{t,0}$ and $S$ are symmetric positive semidefinite matrices: a priori nothing can be said on the sign of their individual entries. The problem remains if we further rewrite:

[TABLE]

While $\mathbb{E}\,\langle{\mathbf{Q}}\rangle_{t,0}$ and $S$ are positive semidefinite, nothing can be said on the sign of their individual entries. Most probably, it should be the full sum over $(\ell,\ell^{\prime})$ that one should consider to conclude on the sign of the second integral on the r.h.s. of (122). Indeed, using $A\succcurlyeq B\succcurlyeq 0\Rightarrow\forall k\in\mathbb{N}:A^{\circ k}\succcurlyeq B^{\circ k}\succcurlyeq 0$ , we can show that $\sum_{\ell,\ell^{\prime}=1}^{K}h_{p}(S_{\ell\ell^{\prime}},\mathbb{E}\,\langle Q_{\ell\ell^{\prime}}\rangle_{t,0})$ is nonnegative if $S\succcurlyeq\mathbb{E}\,\langle{\mathbf{Q}}\rangle_{t,0}$ or $\mathbb{E}\,\langle{\mathbf{Q}}\rangle_{t,0}\succcurlyeq S$ . As far as we can tell, it is not clear why such partial ordering between $S$ and $\mathbb{E}\,\langle{\mathbf{Q}}\rangle_{t,0}$ (which itself depends on $S$ ) holds.
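The claim that the full sum over $(\ell,\ell^{\prime})$ is nonnegative under a partial ordering can be illustrated with random matrices. In the sketch below, the positive semidefinite matrices are randomly generated stand-ins for $S$ and $\mathbb{E}\,\langle{\mathbf{Q}}\rangle_{t,0}$ (an assumption made for the example), ordered by construction, and the sum is evaluated for odd $p$ in both directions of the ordering.

```python
import numpy as np

def random_psd(K, rng):
    """A random symmetric positive semidefinite K x K matrix."""
    a = rng.standard_normal((K, K))
    return a @ a.T

def h_sum(p, s, q):
    """Sum over all entries of h_p(S_ll', Q_ll'), computed entrywise."""
    return float(np.sum(q ** p - p * q * s ** (p - 1) + (p - 1) * s ** p))

rng = np.random.default_rng(3)
results = []
for _ in range(200):
    small = random_psd(4, rng)
    big = small + random_psd(4, rng)  # big - small is PSD by construction
    for p in (3, 5):
        results.append(h_sum(p, small, big))  # case Q >= S in the Loewner order
        results.append(h_sum(p, big, small))  # case S >= Q in the Loewner order
```

This only probes ordered pairs; it says nothing about whether the ordering itself holds for the matrices arising in the proof, which is the open point.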

Regarding Theorem 3, the whole proof would directly apply to $p$ odd if we could show that the divergence (23) is nonnegative. However, this is more difficult than for $p$ even. Indeed, while the $\Delta_{\ell\ell^{\prime}}$ ’s are still $\geq 0$ , this is not necessarily true of $\mathbb{E}[\langle Q_{\ell\ell^{\prime}}\rangle_{t,R}]^{p-2}$ when $p-2$ is odd.

Funding

This work was supported by the Swiss National Science Foundation [200021E-175541 to C. L].

Appendix A Properties of the function $\psi$

Lemma 4.

Let $X\in\mathbb{R}^{K}\sim P_{X}$ and $\widetilde{Z}\in\mathbb{R}^{K}\sim\mathcal{N}(0,I_{K})$ . The function $\psi:\mathcal{S}_{K}^{+}\to\mathbb{R}$ , defined as

[TABLE]

is Lipschitz continuous with Lipschitz constant $\nicefrac{{\mathrm{Tr}(\Sigma_{X})}}{{2}}$ and convex.

Proof.

Consider the inference problem in which one observes the $K$ -dimensional vector $Y=\sqrt{R}X+\widetilde{Z}$ , where $R\in\mathcal{S}_{K}^{+}$ is known, and one wants to recover $X$ . The posterior of $X$ given $Y$ is

[TABLE]

where $\mathcal{Z}_{R}(Y)=\int dP_{X}(x)\exp\big{(}Y^{T}\sqrt{R}x-\frac{1}{2}x^{T}Rx\big{)}$ . We denote $\langle-\rangle_{R}=\int-\,dP(x\,|\,Y,R)$ the Gibbs brackets associated to the latter posterior distribution. Clearly, $\psi(R)=\mathbb{E}_{X,\widetilde{Z}}[\ln\mathcal{Z}_{R}(Y)]$ .

Now fix $R,Q\in\mathcal{S}_{K}^{++}$ . We will prove that the function $h:t\in[0,1]\mapsto\psi(tR+(1-t)Q)$ is convex, thus proving that $\psi$ is convex on $\mathcal{S}_{K}^{++}$ . The convexity on the whole cone $\mathcal{S}_{K}^{+}$ will then follow from the continuity of $\psi$ (which is clear from its definition). $h$ is twice differentiable. Its derivative reads:

[TABLE]

To get the second equality, first integrate by parts with respect to the Gaussian random variables $\widetilde{Z}_{i}$ , $i\in\{1,\dots,K\}$ . Then make use of the identity

[TABLE]

which follows from $\sqrt{tR+(1-t)Q}\frac{d\sqrt{tR+(1-t)Q}}{dt}+\frac{d\sqrt{tR+(1-t)Q}}{dt}\sqrt{tR+(1-t)Q}=\frac{d(tR+(1-t)Q)}{dt}$ . Differentiating (125) further, we find (the subscript of $\langle-\rangle_{tR+(1-t)Q}$ is omitted):

[TABLE]

To get the second equality, we once again used Gaussian integration by parts and the identity (126). The second-to-last equality follows from the Nishimori identity:

[TABLE]

The convexity of $h$ now follows directly from the non-negativity of $h^{\prime\prime}$ on $[0,1]$ .

To prove the Lipschitz continuity of $\psi$ , note that the derivative of $h$ satisfies $\forall t\in[0,1]$ :

[TABLE]

The mean value theorem then directly implies $|\psi(R)-\psi(Q)|=|h(1)-h(0)|\leq\frac{\mathrm{Tr}\,\Sigma_{X}}{2}\|R-Q\|$ . The last inequality in (127) follows from Cauchy-Schwarz inequality, Jensen’s inequality and the Nishimori identity:

[TABLE]

∎
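Lemma 4 can be illustrated in the simplest setting. The sketch below assumes $K=1$ and a Rademacher prior (an illustrative choice), for which $\psi(S)=-S/2+\mathbb{E}\ln\cosh(S+\sqrt{S}\,\widetilde{Z})$ and $\Sigma_{X}=1$. It evaluates $\psi$ on a grid and checks convexity through second differences and the Lipschitz constant $\nicefrac{1}{2}$ through finite-difference slopes.

```python
import numpy as np

def gaussian_expectation(f, half_width=12.0, num_points=24001):
    """Approximate E[f(Z)], Z ~ N(0,1), by a Riemann sum on a fine grid."""
    z = np.linspace(-half_width, half_width, num_points)
    density = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    dz = z[1] - z[0]
    return float(np.sum(f(z) * density) * dz)

def psi(s):
    """psi(S) for K = 1 and an ASSUMED Rademacher prior on X."""
    def log_cosh(z):
        a = s + np.sqrt(s) * z
        return np.logaddexp(a, -a) - np.log(2.0)  # numerically stable ln cosh(a)
    return -0.5 * s + gaussian_expectation(log_cosh)

s_grid = np.linspace(0.0, 3.0, 31)
values = np.array([psi(s) for s in s_grid])
step = s_grid[1] - s_grid[0]
slopes = np.diff(values) / step        # finite-difference slopes of psi
second_diffs = np.diff(values, n=2)    # nonnegative for a convex function
```

In this toy case $\psi^{\prime}(S)=\frac{1}{2}\mathbb{E}\tanh(S+\sqrt{S}\,\widetilde{Z})$, which lies in $[0,\nicefrac{1}{2}]$, in line with the Lipschitz constant of the lemma.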

Appendix B Divergence of the function $G_{n}$

In Proposition 2 we introduced the inference problem (19). The associated posterior distribution is

[TABLE]

where $\mathcal{Z}_{t,R}({\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,R)})\coloneqq\int e^{-\mathcal{H}_{t,R}({\mathbf{x}};{\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,R)})}\,\prod_{j=1}^{n}dP_{X}(x_{j})$ and $\forall{\mathbf{x}}\in\mathbb{R}^{n\times K}$ :

[TABLE]

Let $\langle-\rangle_{t,R}=\int-\,dP({\mathbf{x}}\,|\,{\mathbf{Y}}^{(t)},\widetilde{{\mathbf{Y}}}^{(t,R)})$ be the Gibbs brackets associated to the posterior distribution (128). In this appendix we prove a formula for the divergence of the function

[TABLE]

Lemma 5 (Divergence of $G_{n}$ ).

Let $\delta_{\ell\ell^{\prime}}=0$ if $\ell\neq\ell^{\prime}$ and $\delta_{\ell\ell^{\prime}}=1$ otherwise. $\forall(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}$ :

[TABLE]

Then, the divergence of $G_{n}$ is

[TABLE]

with

[TABLE]

Proof.

To lighten notations, we omit the subscripts of the Gibbs brackets $\langle-\rangle_{t,R}$ . Let $(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}$ . The partial derivative of $R\mapsto\big{(}G_{n}(t,R)\big{)}_{\ell\ell^{\prime}}$ with respect to $R_{\ell\ell^{\prime}}$ reads:

[TABLE]

with

[TABLE]

Once the identity (134) has been plugged into the right-hand side of (133), two expectations involving the Gaussian random vectors $\widetilde{Z}_{j}$, $j=1,\dots,n$, appear. An integration by parts with respect to the Gaussian random variables $\widetilde{Z}_{jk}$, $k\in\{1,\dots,K\}$, gives:

[TABLE]

In both chains of equalities, the last one follows from an identity similar to (43), i.e.,

[TABLE]

Making use of the two identities yielded by the integration by parts, as well as (135), we find:

[TABLE]

Thanks to the Nishimori identity, we have

[TABLE]

and (136) further simplifies (the last equality uses the cyclic property of the trace):

[TABLE]

Now consider the case $\ell\neq\ell^{\prime}$ . All the entries of $\nicefrac{{\partial R}}{{\partial R_{\ell\ell^{\prime}}}}$ are zeros save for the entries $(\ell,\ell^{\prime})$ and $(\ell^{\prime},\ell)$ which are both one. Equation (137) then reads:

[TABLE]

Combining (133) and (138) gives the identity (131) when $\ell\neq\ell^{\prime}$ . The case $\ell=\ell^{\prime}$ is obtained in a similar way except that now the entries of $\nicefrac{{\partial R}}{{\partial R_{\ell\ell}}}$ are zeros save for the entry $(\ell,\ell)$ which is one.
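The structure of $\nicefrac{{\partial R}}{{\partial R_{\ell\ell^{\prime}}}}$ for a symmetric $R$ explains why the off-diagonal and diagonal cases differ. A minimal sketch, assuming a generic linear test function $R\mapsto\mathrm{Tr}(AR)$ that is not taken from the paper: perturbing the symmetric pair of entries $(\ell,\ell^{\prime})$, $(\ell^{\prime},\ell)$ picks up two entries of $A$ when $\ell\neq\ell^{\prime}$ and only one when $\ell=\ell^{\prime}$. The helper `d_sym` is a hypothetical name and indices are 0-based.

```python
import numpy as np

def d_sym(K, l, lp):
    """Matrix dR/dR_{l l'} for a symmetric K x K matrix R (0-based indices)."""
    D = np.zeros((K, K))
    D[l, lp] = 1.0
    D[lp, l] = 1.0  # same entry when l == lp, so the diagonal case has a single one
    return D

rng = np.random.default_rng(2)
K = 4
A = rng.standard_normal((K, K))   # arbitrary test matrix, not necessarily symmetric
B = rng.standard_normal((K, K))
R = B + B.T                       # symmetric base point

def f(M):
    """Linear test function Tr(A M), chosen only for illustration."""
    return np.trace(A @ M)

eps = 1e-6
errors = []
for l in range(K):
    for lp in range(l, K):
        D = d_sym(K, l, lp)
        # Central difference along the symmetric perturbation direction D.
        numeric = (f(R + eps * D) - f(R - eps * D)) / (2 * eps)
        exact = A[l, lp] + A[lp, l] if l != lp else A[l, l]
        errors.append(abs(numeric - exact))

max_error = max(errors)
print(max_error)
```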

We can now prove the identity for the divergence of $G_{n}$ . This divergence, denoted $\mathcal{D}$ , satisfies:

[TABLE]

In the last equality of (139), replacing the summands by their formula (131) yields:

[TABLE]

Remember that $\mathbb{E}[\langle{\mathbf{Q}}\rangle_{t,R}\,]$, and therefore $\mathbb{E}[\langle{\mathbf{Q}}\rangle_{t,R}\,]^{\circ(p-2)}$, is symmetric. Using that the trace is invariant under transposition and cyclic permutation, the two traces in (140) read:

[TABLE]

Clearly, $\mathbb{E}\,\big{\langle}({\mathbf{Q}}+{\mathbf{Q}}^{T})\circ\big{(}{\mathbf{Q}}+{\mathbf{Q}}^{T}-\langle{\mathbf{Q}}+{\mathbf{Q}}^{T}\rangle\big{)}\big{\rangle}=\mathbb{E}\,\big{\langle}\big{(}{\mathbf{Q}}+{\mathbf{Q}}^{T}-\langle{\mathbf{Q}}+{\mathbf{Q}}^{T}\rangle\big{)}^{\circ 2}\big{\rangle}$ . Similarly, we have

[TABLE]

For this last equality, we could complete the square thanks to the following term being zero:

[TABLE]

Plugging these identities back in (140), we finally obtain:

[TABLE]

where we recognize that the second expectation $\mathbb{E}[-]$ is equal to $4\boldsymbol{\Delta}$ (see definition (132)). ∎

Appendix C Concentration of the free entropy

Consider the inference problem (19). The associated Hamiltonian reads

[TABLE]

In this section we show that the free entropy

[TABLE]

concentrates around its expectation. We will sometimes write $\frac{1}{n}\ln\mathcal{Z}_{t,R}$ , omitting the arguments, to shorten notations.

Theorem 5 (Concentration of the free entropy).

Assume $P_{X}$ has finite $(4p-4)$ th order moments. There exists a positive constant $C$ depending only on $P_{X}$ , $K$ , $p$ and $\|R\|$ such that

[TABLE]

Proof.

To lighten the notation, we drop the subscripts of the Gibbs brackets $\langle-\rangle_{t,R}$. First, we show that the free entropy concentrates on its conditional expectation given the Gaussian noise ${\mathbf{Z}}$, $\widetilde{{\mathbf{Z}}}$. Thus, $\nicefrac{{\ln\mathcal{Z}_{t,R}}}{{n}}$ is seen as a function of $X_{1},\dots,X_{n}$ only and we work conditionally on ${\mathbf{Z}},\widetilde{{\mathbf{Z}}}$. Let $X^{\prime}_{1},\dots,X^{\prime}_{n}$ be i.i.d. samples from $P_{X}$, independent of ${\mathbf{X}}$. For all $j\in\{1,\dots,n\}$, we define

[TABLE]

where ${\mathbf{Y}}^{(j,t)}$ , $\widetilde{{\mathbf{Y}}}^{(j,t,R)}$ are obtained from ${\mathbf{Y}}^{(t)}$ , $\widetilde{{\mathbf{Y}}}^{(t,R)}$ by replacing $X_{j}$ by $X^{\prime}_{j}$ . We can consider an inference problem similar to (19) for which the observations are ${\mathbf{Y}}^{(j,t)}$ , $\widetilde{{\mathbf{Y}}}^{(j,t,R)}$ . Then the Gibbs brackets associated to the posterior distribution are

[TABLE]

By the Efron-Stein inequality (see [8, Theorem 3.1]), we have:

[TABLE]

Fix $j\in\{1,\dots,n\}$ . By Jensen’s inequality, note that

[TABLE]

Define $\mathcal{I}_{j}=\{i\in\mathcal{I}:\exists b\in\{1,\dots,p\}:i_{b}=j\}$ and $\forall i\in\mathcal{I}_{j}:c(i)=\big{|}\big{\{}a\in\{1,\dots,p\}:i_{a}=j\big{\}}\big{|}$. The quantity between the Gibbs brackets in (145) reads:

[TABLE]

Using Jensen’s inequality, we further obtain:

[TABLE]

We now bound each summand on the right-hand side of (147) separately. For all $i\in\mathcal{I}_{j}$ and $(\ell,\ell^{\prime})\in\{1,\dots,K\}^{2}$ :

[TABLE]

The first inequality follows from the Cauchy-Schwarz inequality, the second one from Jensen’s inequality, and the first equality from the Nishimori identity. The final bound is finite given that $P_{X}$ has finite $(4p-4)$ th order moments. Hence, there exists a positive constant $C$ depending only on $P_{X}$ , $K$ and $p$ such that the first term on the right-hand side of (147) is bounded by $\nicefrac{{C|\mathcal{I}_{j}|^{2}}}{{n^{2p-2}}}\leq C$ (as $|\mathcal{I}_{j}|\leq n^{p-1}$ ). Regarding the second term on the right-hand side of (147), we easily get:

[TABLE]

We conclude that there exists a positive constant $C$ depending only on $P_{X}$ , $K$ , $p$ and $\|R\|$ such that

[TABLE]

A similar bound holds when the Gibbs brackets $\langle-\rangle$ are replaced by $\langle-\rangle_{(j)}$ . Finally, combining (144), (145) and (148), we obtain the desired upper bound:

[TABLE]

where the positive constant $C$ is not necessarily the same as before but still depends only on $P_{X}$, $K$, $p$ and $\|R\|$.
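The Efron-Stein inequality used in this first step bounds the variance of a function of independent variables by the summed effect of resampling one variable at a time, $\mathrm{Var}\,f(X)\leq\frac{1}{2}\sum_{j}\mathbb{E}\big[(f(X)-f(X^{(j)}))^{2}\big]$. The sketch below illustrates it on a toy problem that is an assumption of this example and not the model of the paper: $n=8$ independent uniform $\pm 1$ variables and a log-sum-exp function standing in for a free entropy. Both sides are computed exactly by enumerating all $2^{n}$ configurations.

```python
import itertools
import numpy as np

n = 8
w = np.arange(1, n + 1) / n  # fixed illustrative weights
# All 2^n configurations of n independent uniform +-1 variables.
configs = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))

def f(X):
    """Toy stand-in for (1/n) ln Z: a smooth function of the n inputs."""
    return np.log(np.exp(X * w).sum(axis=-1)) / n

values = f(configs)
variance = values.var()  # exact variance under the uniform distribution

efron_stein = 0.0
for j in range(n):
    flipped = configs.copy()
    flipped[:, j] *= -1.0
    # The resampled X'_j equals X_j with probability 1/2 (no change) and
    # -X_j with probability 1/2, hence the extra factor 1/2 on top of the
    # 1/2 appearing in the Efron-Stein inequality.
    efron_stein += 0.5 * 0.5 * np.mean((values - f(flipped)) ** 2)

print(variance, efron_stein)
```

In the proof, the analogous per-coordinate differences are controlled through (145) and (148) rather than computed exactly.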

The second – and final – step is to show that the conditional expectation of the free entropy given ${\mathbf{Z}},\widetilde{{\mathbf{Z}}}$ concentrates on its expectation. Let $g({\mathbf{Z}},\widetilde{{\mathbf{Z}}})=\mathbb{E}[\nicefrac{{\ln\mathcal{Z}_{t,R}}}{{n}}|{\mathbf{Z}},\widetilde{{\mathbf{Z}}}]$ . By the Gaussian-Poincaré inequality (see [8, Theorem 3.20]), we have:

[TABLE]

The squared norm of the gradient of $g$ reads $\|\nabla g\|^{2}=\sum_{i\in\mathcal{I}}|\nicefrac{{\partial g}}{{\partial Z_{i}}}|^{2}+\sum_{j=1}^{n}\sum_{\ell=1}^{K}|\nicefrac{{\partial g}}{{\partial\widetilde{Z}_{j\ell}}}|^{2}$ . Each of these partial derivatives takes the form $\partial g=-n^{-1}\big{\langle}\partial\mathcal{H}_{t,R}\big{\rangle}$ . More precisely:

[TABLE]

On the one hand, we have

[TABLE]

where the first two inequalities follow from Jensen’s inequality and the equality from the Nishimori identity. On the other hand, we have

[TABLE]

where the first inequality follows from Jensen’s inequality and the equality from the Nishimori identity. Both upper bounds in (151) and (152) take the form $\nicefrac{{C}}{{n}}$ with $C$ a positive constant depending only on $P_{X}$, $K$, $p$ and $\|R\|$ (remember that $|\mathcal{I}|\leq n^{p}$). Plugging (151) and (152) in (150), we conclude that

[TABLE]

where $C$ depends only on $P_{X}$ , $K$ , $p$ and $\|R\|$ . Combining (149) and (153) ends the proof of (143). ∎
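The Gaussian-Poincaré inequality invoked in the second step states that $\mathrm{Var}\,g(Z)\leq\mathbb{E}\|\nabla g(Z)\|^{2}$ for a standard Gaussian vector $Z$ and a smooth $g$. A minimal numerical sketch, assuming a two-dimensional toy function $g(z_{1},z_{2})=\frac{1}{2}\ln(e^{z_{1}}+e^{z_{2}})$ chosen here for illustration (it is not the function $g$ of the proof), with Gaussian expectations computed by tensor-product Gauss-Hermite quadrature:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature nodes and weights for the weight function exp(-x^2 / 2).
nodes, weights = hermegauss(40)
weights = weights / weights.sum()  # normalize to the N(0, 1) probability measure

Z1, Z2 = np.meshgrid(nodes, nodes, indexing="ij")
W = np.outer(weights, weights)     # weights of the product measure N(0, I_2)

def g(z1, z2):
    """Toy free-entropy-like function: half of a numerically stable log-sum-exp."""
    m = np.maximum(z1, z2)
    return 0.5 * (m + np.log(np.exp(z1 - m) + np.exp(z2 - m)))

def grad_sq_norm(z1, z2):
    """Squared gradient norm of g; the gradient is half the softmax weights."""
    p1 = 1.0 / (1.0 + np.exp(z2 - z1))
    return 0.25 * (p1 ** 2 + (1.0 - p1) ** 2)

G = g(Z1, Z2)
mean_g = np.sum(W * G)
variance = np.sum(W * (G - mean_g) ** 2)
poincare_bound = np.sum(W * grad_sq_norm(Z1, Z2))

print(variance, poincare_bound)
```

As in the proof, the gradient of a log-partition-type function is a Gibbs average, which is why its squared norm is easy to bound.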

Bibliography


[1] Alaoui, A. E. & Krzakala, F. (2018) Estimation in the Spiked Wigner Model: A Short Proof of the Replica Formula. in 2018 IEEE International Symposium on Information Theory (ISIT), pp. 1874–1878.
[2] Barbier, J. (2020) Overlap matrix concentration in optimal Bayesian inference. Inf. Inference, https://doi.org/10.1093/imaiai/iaaa008.
[3] Barbier, J., Dia, M., Macris, N., Krzakala, F., Lesieur, T. & Zdeborová, L. (2016) Mutual information for symmetric rank-one matrix estimation: a proof of the replica formula. in Advances in Neural Information Processing Systems 29, NIPS 2016, pp. 424–432, Red Hook, NY, USA. Curran Associates.
[4] Barbier, J., Krzakala, F., Macris, N., Miolane, L. & Zdeborová, L. (2019) Optimal errors and phase transitions in high-dimensional generalized linear models. Proc. Natl. Acad. Sci. USA, 116(12), 5451–5460.
[5] Barbier, J. & Macris, N. (2019a) The adaptive interpolation method: a simple scheme to prove replica formulas in Bayesian inference. Probab. Theory Related Fields, 174(3), 1133–1185.
[6] Barbier, J. & Macris, N. (2019b) The adaptive interpolation method for proving replica formulas. Applications to the Curie–Weiss and Wigner spike models. J. Phys. A, 52(29), 294002.
[7] Barbier, J., Macris, N. & Miolane, L. (2017) The Layered Structure of Tensor Estimation and its Mutual Information. arXiv:1709.10368 [cs.IT].
[8] Boucheron, S., Lugosi, G. & Massart, P. (2013) Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford Univ. Press, London, U.K.
