High-temperature Expansions and Message Passing Algorithms

Antoine Maillard; Laura Foini; Alejandro Lage Castellanos; Florent Krzakala; Marc Mézard; Lenka Zdeborová

arXiv:1906.08479·cond-mat.dis-nn·June 11, 2020

TL;DR

This paper revisits high-temperature expansions in statistical physics to unify and analyze message passing algorithms for large, correlated systems, proposing their equivalence and potential exactness in certain limits.

Contribution

It derives a unified framework showing many message passing algorithms are based on the same assumptions, and conjectures their exactness in the thermodynamic limit.

Findings

01 The approximation schemes considered (adaptive TAP, Expectation-Consistency, VAMP) are equivalent in the high-dimensional limit.

02 These schemes are conjectured to be exact in the replica symmetric phase.

03 Diagrammatic results connected to free probability and random matrix theory are uncovered along the way.

Equations (506)

$H_{J}({\textbf{x}})$

$P_{\beta,J}(\mathrm{d}{\textbf{x}})$

$$\begin{aligned} Z_{\beta,J}&\simeq\int\mathrm{d}\gamma\prod_{i=1}^{N}\int_{\mathbb{R}}\mathrm{d}x_{i}\,e^{\frac{\beta}{2}\sum_{i,j}J_{ij}x_{i}x_{j}+\frac{\gamma}{2}\left(N\sigma^{2}-\sum_{i}x_{i}^{2}\right)},\\ &\simeq\exp\left[\inf_{\gamma}\left\{\log\left[\int\prod_{i=1}^{N}\mathrm{d}x_{i}\,e^{\frac{\beta}{2}\sum_{i,j}J_{ij}x_{i}x_{j}+\frac{\gamma}{2}\left(N\sigma^{2}-\sum_{i}x_{i}^{2}\right)}\right]\right\}\right].\end{aligned}$$

$P_{\beta,J}(\mathrm{d}{\textbf{x}})$

$$Z_{\beta,J}\simeq\exp\left[\inf_{\gamma}\left\{\frac{N}{2}\left(\log 2\pi+\gamma\sigma^{2}-\frac{1}{N}\sum_{\lambda}\log(\gamma-\beta\lambda)\right)\right\}\right],$$

$$\lim_{N\to\infty}\frac{1}{N}\sum_{\lambda}\frac{1}{\gamma-\beta\lambda}=\sigma^{2},$$

$$\int\frac{\rho_{D}(\mathrm{d}\lambda)}{\gamma-\beta\lambda}=\sigma^{2}.$$

$$\gamma=\beta\,\mathcal{R}_{\rho_{D}}(\beta\sigma^{2})+\frac{1}{\sigma^{2}}=\beta\,\mathcal{S}_{\rho_{D}}^{-1}(-\beta\sigma^{2}),$$

$$\Phi_{J}(\beta)\equiv\lim_{N\to\infty}\frac{1}{N}\log Z_{\beta,J}.$$

$\Phi_{J}(\beta)$

$$\Phi_{J}(\beta)=\frac{1}{2}\left(1+\log 2\pi\sigma^{2}\right)+\frac{1}{2}\int_{0}^{\beta\sigma^{2}}\mathcal{R}_{\rho_{D}}(x)\,\mathrm{d}x.$$

$$\Phi_{J}(\beta)=\frac{1}{2}\left(\log 2\pi+\lambda_{\rm max}\beta\sigma^{2}-\log\beta-\int\rho_{D}(\mathrm{d}\lambda)\log(\lambda_{\rm max}-\lambda)\right).$$

$$\Phi_{J}(\beta)=\frac{1}{2}\log 2\pi+\frac{1}{2}\inf_{\gamma}\left[\gamma\sigma^{2}-\int\rho_{D}(\mathrm{d}\lambda)\log(\gamma-\beta\lambda)\right],$$

$$\sigma^{2}=\frac{1}{N}\sum_{i=1}^{N}\left[v_{i}+m_{i}^{2}\right].$$

$$U(\beta,J)\equiv H_{J}-\langle H_{J}\rangle_{\beta}+\sum_{i=1}^{N}\partial_{\beta}\lambda_{i}(\beta)(x_{i}-m_{i})+\frac{1}{2}\sum_{i=1}^{N}\partial_{\beta}\gamma_{i}(\beta)\left[x_{i}^{2}-v_{i}-m_{i}^{2}\right],$$

$$\Phi_{J}(\beta=0)=\frac{1}{2}\log 2\pi+\frac{1}{N}\sum_{i=1}^{N}\left[\frac{\gamma_{i}}{2}(v_{i}+m_{i}^{2})-\frac{1}{2}\log\gamma_{i}+\lambda_{i}m_{i}+\frac{\lambda_{i}^{2}}{2\gamma_{i}}\right].$$

$$\Phi_{J}(\beta=0)=\frac{1}{2}\left[1+\log 2\pi\right]+\frac{1}{2N}\sum_{i=1}^{N}\log v_{i}.$$

$$\left(\frac{\partial\Phi_{J}}{\partial\beta}\right)_{\beta=0}=-\frac{1}{N}\langle H_{J}\rangle_{\beta=0}=\frac{1}{2N}\sum_{i,j}J_{ij}m_{i}m_{j}+\frac{1}{2N}\sum_{i=1}^{N}J_{ii}v_{i}.$$

$\gamma_{i}(\beta)$

$m_{i}\gamma_{i}(\beta)+\lambda_{i}(\beta)$

$U(\beta=0,J)$

$\frac{1}{2}\left(\frac{\partial^{2}\Phi_{J}}{\partial\beta^{2}}\right)_{\beta=0}$

$\frac{1}{3!}\left(\frac{\partial^{3}\Phi_{J}}{\partial\beta^{3}}\right)_{\beta=0}$

$\frac{1}{4!}\left(\frac{\partial^{4}\Phi_{J}}{\partial\beta^{4}}\right)_{\beta=0}$

$$\Phi_{J}(\beta)\;\cdots\;+\frac{1}{N}\sum_{p=1}^{\infty}\frac{\beta^{p}}{2p}\sum_{\substack{i_{1},\cdots,i_{p}\\ \text{pairwise distinct}}}J_{i_{1}i_{2}}J_{i_{2}i_{3}}\cdots J_{i_{p-1}i_{p}}J_{i_{p}i_{1}}\prod_{\alpha=1}^{p}v_{i_{\alpha}}+o_{N}(1).$$

$\mathbb{E}\,\frac{1}{N}\sum_{p=1}^{\infty}\frac{\beta^{p}\sigma^{2p}}{2p}\sum_{\substack{i_{1},\cdots,i_{p}\\ \text{pairwise distinct}}}J_{i_{1}i_{2}}J_{i_{2}i_{3}}\cdots J_{i_{p-1}i_{p}}J_{i_{p}i_{1}}$

$\forall p\in\mathbb{N}^{\star},\ c_{p}(\rho_{D})$

$\lim_{N\to\infty}\mathbb{E}\left|\frac{1}{N}\sum_{\substack{i_{1},\cdots,i_{p}\\ \text{pairwise distinct}}}J_{i_{1}i_{2}}J_{i_{2}i_{3}}\cdots J_{i_{p-1}i_{p}}J_{i_{p}i_{1}}-c_{p}(\rho_{D})\right|^{2}$

$\lim_{N\to\infty}\frac{1}{N}\sum_{p=1}^{\infty}\frac{\beta^{p}\sigma^{2p}}{2p}\sum_{\substack{i_{1},\cdots,i_{p}\\ \text{pairwise distinct}}}J_{i_{1}i_{2}}J_{i_{2}i_{3}}\cdots J_{i_{p-1}i_{p}}J_{i_{p}i_{1}}$

Full text

High-temperature Expansions and Message Passing Algorithms

Antoine Maillard⋆,⊗, Laura Foini†, Alejandro Lage Castellanos⋄, Florent Krzakala⋆, Marc Mézard⋆, Lenka Zdeborová†

Abstract

Improved mean-field techniques are a central theme of statistical physics methods applied to inference and learning. We revisit here some of these methods using high-temperature expansions for disordered systems initiated by Plefka, Georges and Yedidia. We derive the Gibbs free entropy and the subsequent self-consistent equations for a generic class of statistical models with correlated matrices and show in particular that many classical approximation schemes, such as adaptive TAP, Expectation-Consistency, or the approximations behind the Vector Approximate Message Passing algorithm all rely on the same assumptions, which are also at the heart of high-temperature expansions. We focus on the case of rotationally invariant random coupling matrices in the ‘high-dimensional’ limit in which the number of samples and the dimension are both large, but with a fixed ratio. This encapsulates many widely studied models, such as Restricted Boltzmann Machines or Generalized Linear Models with correlated data matrices. In this general setting, we show that all the approximation schemes described before are equivalent, and we conjecture that they are exact in the thermodynamic limit in the replica symmetric phases. We achieve this conclusion by resummation of the infinite perturbation series, which generalises a seminal result of Parisi and Potters. A rigorous derivation of this conjecture is an interesting mathematical challenge. On the way to these conclusions, we uncover several diagrammatical results in connection with free probability and random matrix theory, which are interesting independently of the rest of our work.

$\star$ Laboratoire de Physique de l’ENS, PSL University, CNRS, Sorbonne Universités, Paris, France.

$\dagger$ Institut de Physique Théorique, CNRS, CEA, Université Paris-Saclay, Saclay, France.

$\diamond$ University of Havana - Departamento de Física Teórica, Havana, Cuba.

$\otimes$ To whom correspondence shall be sent: [email protected]

1 Introduction
1.1 Background and overview of related works
1.2 Structure of the paper, and summary of our contributions
2 Symmetric and bipartite spherical models with rotationally-invariant couplings
2.1 Symmetric spherical model
2.1.1 Direct free entropy computation
2.1.2 Plefka expansion and the Georges-Yedidia formalism
2.1.3 Stability of the paramagnetic phase
2.2 Bipartite spherical model
2.2.1 Direct free entropy computation
2.2.2 Plefka expansion
3 Plefka expansion and Expectation Consistency approximations
3.1 Expectation Consistency, adaptive TAP, and Vector Approximate Message Passing approximations
3.1.1 Expectation Consistency approximation
3.1.2 Adaptive TAP approximation
3.1.3 Vector Approximate Message Passing approximation
3.2 Plefka expansion for models of symmetric pairwise interactions
3.2.1 A symmetric model with generic priors
3.2.2 Connection of the Plefka expansion to EC approximations
3.2.3 Application to the Hopfield model
3.3 The Replica approach
3.3.1 Expectation-Consistency approximation
3.3.2 The adaTAP approximation
3.3.3 The VAMP approach
3.3.4 Plefka expansion
3.4 Plefka expansion for models of bipartite pairwise interactions
3.4.1 A bipartite model with generic priors
3.4.2 Generalized Linear Models with correlated matrices
4 Consequences for iterative algorithms
4.1 Generalized Approximate Message Passing for a GLM with i.i.d. matrices
4.2 Vector Approximate Message Passing (VAMP) in Compressed Sensing
4.2.1 The TAP equations in Compressed sensing
4.2.2 TAP equations and the fixed point of the VAMP algorithm
4.3 Generalized Vector Approximate Message Passing (G-VAMP) for Generalized Linear Models
4.3.1 The TAP equations from the Plefka expansion
4.3.2 The G-VAMP algorithm for Generalized Linear Models
4.3.3 TAP equations and fixed points of G-VAMP
5 The diagrammatics of the Plefka expansion
5.1 A weaker version of Theorem 1
5.2 The expectation of generic diagrams
5.2.1 Eulerian diagrams, strongly irreducible diagrams and simple cycles
5.2.2 Cactus diagrams
5.3 Concentration of the diagrams: a second moment analysis
5.4 The higher-order moments and their influence on the diagrammatics in the symmetric model
5.5 Extension to bipartite models
5.5.1 Generalization of the previous results to rectangular matrices
5.5.2 The higher order moments and their influence on the diagrammatics
5.6 A note on i.i.d. matrices
A The Georges-Yedidia formalism
B Order $4$ of the Plefka expansion for Sec. 2.1.
C Some definitions and reminders of random matrix theory
D Technical derivations and generalizations of the diagrammatics
D.1 Hermitian matrix model
D.2 A note on the expectation of diagrams of diverging size

1 Introduction

1.1 Background and overview of related works

Many inference and learning tasks can be formulated as statistical physics problems, where one needs to compute or approximate the marginal distributions of single variables in an interacting model. This is, for instance, the basis behind the popular variational mean-field approach [WJ08]. Going beyond the naive mean-field theory has been a constant goal in both physics and machine learning. One approach, for instance, has been very effective on tree-like structures: the Bethe approximation, or Belief-Propagation. Its development in the statistical physics of disordered systems can be traced back to Thouless-Anderson-Palmer (TAP) [TAP77] and has seen many developments since then [MPV87, YFW03, MM09, ZK16]. Over the last decades, in particular, there have been many works on densely connected models, leading to a myriad of different approximation schemes. In many disordered problems with i.i.d. couplings, a classical approach has been to write the TAP equations as an iterative scheme. Iterative algorithms based on this scheme are often called Approximate Message Passing (AMP) [DMM09, KMS+12] in this context.

AMP, or TAP, is an especially powerful approach when the coupling constants in the underlying statistical model are distributed as i.i.d. variables. This is, of course, a strong limitation and many inference schemes have been designed to improve on it: the adaptive TAP (adaTAP) method [OW01a, OW01b], approximation schemes such as Expectation-Consistency (EC) [Min01, OW05a] and the recent improvements of AMP such as Vector Approximate Message Passing (VAMP) and its variants [MP17, RSF17, SRF16, OCW16, ÇOFW16]. Given all these approaches, one may wonder how different they are, and when they actually lead to asymptotically exact inference. In this paper, we wish to address this question using two main tools: high-temperature expansions and random matrix theory.

High-temperature expansions at fixed order parameters (denoted in this paper as “Plefka expansions”) are an important tool for the study of disordered systems. In the context of spin glass models, they were introduced by Plefka [Ple82] for the Sherrington-Kirkpatrick (SK) model, and were subsequently generalized, in particular by Georges-Yedidia [GY91]. This latter paper provides a systematic way to compute high-temperature (or high-dimension) expansions of the Gibbs free entropy for a fixed value of the order parameters (that is, Plefka expansions).

One aim of the present paper is to apply this method to a general class of inference problems with pairwise interactions, in which the coupling constants are not i.i.d., but they can have strong correlations, while keeping a rotational invariance that will be made explicit below. In particular, we generalize earlier and inspirational work by Parisi and Potters [PP95], who computed the self-consistent equations for the marginals in Ising models with orthogonal couplings via a resummation of the infinite series given by the high-temperature expansion. We shall show that a similar resummation yields the EC, adaTAP and VAMP formalisms.

1.2 Structure of the paper, and summary of our contributions

In this paper, we perform Plefka expansions for a generic class of models of pairwise interactions with correlated matrices. We provide a detailed derivation of the method, inspired by the work of Georges-Yedidia [GY91] for Ising models, and we include new results on the diagrammatics of the expansions, leveraging rigorous results of random matrix theory. This yields a general framework that encapsulates many known properties of systems sharing this pairwise structure. The main message of this work is that the three successful approximation schemes that have been developed in the last two decades, Expectation-Consistency, adaTAP and Vector Approximate Message Passing, are equivalent and rely on the same hidden hypothesis. A careful analysis of the Plefka expansion reveals this hypothesis, as it identifies the class of high-temperature expansion diagrams that are effectively kept in these three schemes. A diagrammatic analysis leads us to conjecture that all these methods are asymptotically exact for rotationally-invariant models, in the high-temperature phase. It is also worth noting that although all four methods (Expectation-Consistency, adaTAP, Vector Approximate Message Passing, Plefka expansion) lead to the same mean-field equations, the (most recent) VAMP approach has the advantage of generating a “natural” way to iterate these equations, which turns them into efficient algorithms.

We now turn to a more precise description of the content of the paper. Throughout the paper, we will use two random matrix ensembles, both of which we will refer to as rotationally invariant. The first one is defined as a measure over the set $\mathcal{S}_{N}$ of symmetric matrices:

Model S (Symmetric rotationally invariant matrix).

Let $N\geq 1$ . $J\in{\cal S}_{N}$ is generated as $J=ODO^{\intercal}$ , in which $O\in\mathcal{O}(N)$ is drawn uniformly from the (compact) orthogonal group $\mathcal{O}(N)$ , and $D=\mathrm{Diag}(\{d_{i}\}_{i=1}^{N})$ is a random diagonal matrix, such that its empirical spectral distribution $\rho^{(N)}_{D}\equiv\frac{1}{N}\sum_{i=1}^{N}\delta_{d_{i}}$ converges (almost surely) as $N\to\infty$ to a probability distribution $\rho_{D}$ with compact support. The smallest and largest eigenvalues of $D$ are assumed to converge almost surely to the infimum and supremum of the support of $\rho_{D}$ .

In a similar way, we define an ensemble of rectangular rotationally invariant matrices:

Model R (Rectangular rotationally invariant matrix).

Let $N\geq 1$ , and $M=M(N)\geq 1$ such that $M/N\to\alpha>0$ as $N\to\infty$ . $L\in\mathbb{R}^{M\times N}$ is generated via its singular value decomposition $L=U\Sigma V^{\intercal}$ , in which $U\in\mathcal{O}(M)$ and $V\in\mathcal{O}(N)$ are drawn uniformly from their respective orthogonal groups. $D\equiv\Sigma^{\intercal}\Sigma=\mathrm{Diag}(\{d_{i}\}_{i=1}^{N})$ is a diagonal matrix, such that its empirical spectral distribution $\rho^{(N)}_{D}\equiv\frac{1}{N}\sum_{i=1}^{N}\delta_{d_{i}}$ converges (almost surely) as $N\to\infty$ to a probability distribution $\rho_{D}$ , which has compact support. The smallest and largest eigenvalues of $D$ are assumed to converge almost surely to the infimum and supremum of the support of $\rho_{D}$ .

Examples

Examples of such random matrix ensembles include matrices generated via a potential $V(x)$ : one can generate $J\in\mathcal{S}_{N}$ with a probability density proportional to $e^{-\frac{N}{2}\,\mathrm{Tr}\,V(J)}$ , and this kind of matrix satisfies the hypotheses of Model S. These ensembles also include the following well-known examples:

$\bullet$ The Gaussian Orthogonal Ensemble (GOE), in the case of Model S with a potential $V(x)=x^{2}/2$ .

$\bullet$ The Wishart ensemble with a ratio $\psi\geq 1$ . This corresponds to a random matrix $W=XX^{\intercal}/m$ , with $X\in\mathbb{R}^{n\times m}$ an i.i.d. standard Gaussian matrix, and $n,m\to\infty$ with $m/n\to\psi$ . This ensemble satisfies Model S, with a potential $V(x)=x-(\psi-1)\log x$ .

$\bullet$ Standard Gaussian i.i.d. rectangular matrices, for Model R. One can also think of them as generated via a potential, as the probability density of such a matrix is $\mathbb{P}(L)\propto e^{-\frac{1}{2}\mathrm{Tr}\,L^{\intercal}L}$ .

$\bullet$ Generically, consider a random matrix $L$ from Model R. Then, both $J_{1}\equiv L^{\intercal}L$ and $J_{2}\equiv LL^{\intercal}$ satisfy the hypotheses of Model S.
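The two ensembles defined above can be sampled numerically for a prescribed spectrum. The following is a minimal sketch, not taken from the paper: it assumes NumPy, draws Haar-distributed orthogonal matrices through a sign-corrected QR decomposition of a Gaussian matrix, and uses hypothetical helper names (`haar_orthogonal`, `sample_model_s`, `sample_model_r`) and an arbitrary illustrative spectrum.

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Draw an n x n orthogonal matrix uniformly (Haar measure) on O(n)."""
    g = rng.standard_normal((n, n))
    q, r = np.linalg.qr(g)
    # QR is unique only up to column signs; fixing them makes q Haar-distributed.
    return q * np.sign(np.diag(r))

def sample_model_s(d, rng):
    """Model S: J = O D O^T, with the eigenvalues d prescribed by the caller."""
    o = haar_orthogonal(len(d), rng)
    return (o * d) @ o.T

def sample_model_r(sing, m, rng):
    """Model R: L = U Sigma V^T of shape (m, n), with prescribed singular values."""
    n = len(sing)
    u = haar_orthogonal(m, rng)
    v = haar_orthogonal(n, rng)
    sigma = np.zeros((m, n))
    k = min(m, n)
    sigma[np.arange(k), np.arange(k)] = sing[:k]
    return u @ sigma @ v.T

rng = np.random.default_rng(0)
d = rng.uniform(-1.0, 1.0, size=50)          # illustrative compactly supported spectrum
J = sample_model_s(d, rng)
s = np.sqrt(rng.uniform(0.5, 2.0, size=50))  # illustrative singular values
L = sample_model_r(s, 80, rng)               # M = 80, N = 50
```

By construction the eigenvalues of `J` are exactly `d`, and those of `L.T @ L` are `s**2`, so the spectral law is controlled independently of the (uniformly random) eigenvectors.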

The structure of our work is as follows:

$\bullet$ Spherical models with rotationally invariant couplings. In Sec. 2, we focus on spherical models and we generalize the seminal works of [MPR94a, MPR94b, PP95]. While they studied Ising models with orthogonal couplings, we consider spherical models, assuming only that the coupling matrix is rotationally invariant. We consider two types of models: “symmetric” models with an interaction of the type ${\textbf{x}}^{\intercal}J{\textbf{x}}$ , in which $J$ follows Model S, and “bipartite” models with interactions of the type ${\textbf{h}}^{\intercal}F{\textbf{x}}$ , in which $F$ follows Model R. This encapsulates orthogonal couplings, but can also be applied to other random matrix ensembles such as the Gaussian Orthogonal Ensemble (GOE), the Wishart ensemble, and many others. Using diagrammatic results that we derive with random matrix theory, we conjecture a resummation of the Plefka expansion giving the Gibbs free entropy in these models. Our results are in particular consistent with the findings of classical works for Gaussian couplings [Ple82] and orthogonal couplings [PP95].

$\bullet$ Plefka expansion for statistical models with correlated couplings. Sec. 3 is devoted to the description of the Plefka expansion for different statistical models and inference problems which possess a coupling or data matrix that has rotation invariance properties. We consider models similar to the spherical models of Sec. 2, but with generic prior distributions on the underlying variables. In Sec. 3.1, we recall the Expectation-Consistency (EC), adaTAP and VAMP approximations and comment briefly on their respective history, before showing that they are equivalent. As a consequence, we will generically refer to these approximations as the Expectation-Consistency approximations (EC). We hope that our paper will help provide a unifying presentation of these works, generalizing them by leveraging random matrix theory. Our main conjecture for this part can be stated as follows:

Conjecture 1.

[Informal] For statistical models of symmetric or bipartite interactions with coupling matrices that satisfy respectively Model S or Model R, the three equivalent approximations, Expectation-Consistency, adaTAP and VAMP (generically denoted EC approximations), are exact in the large size limit in the high temperature phase.

We believe that the validity of the above conjecture extends beyond the high temperature phase. In particular, we expect it to hold for inference problems in the Bayes-optimal setting, and more generally whenever the system is in a replica symmetric phase as defined in [MPV87].

The approximation behind EC approximations can be checked order by order using our high-temperature Plefka expansion technique and its resummation. We then derive Plefka expansions for these generic models, and we apply them to different situations, namely:

– In Sec. 3.2.1 we perform a Plefka expansion for a generic symmetric rotationally invariant model with pairwise interactions. Using this method and our diagrammatic results, we then show in Sec. 3.2.2 that the EC approximations are exact for these models in the large size limit.

– In Sec. 3.2.3 we apply our general result to the TAP free energy of the Hopfield model [Hop82], an Ising spin model with a correlated matrix of the Wishart ensemble, used as a basic model of neural network. In particular, we straightforwardly recover the results of [NT97] and [Méz17].

– In Sec. 3.3 we extend our Plefka expansion and the corresponding diagrammatic techniques to the study of a replicated system, in which we constrain the overlap between different replicas. The interest in such systems stems from the celebrated replica method of theoretical physics [MPV87].

– Finally, we show in Sec. 3.4 how we can use these results to derive the Plefka-expanded free entropy for a very broad class of bipartite models, which includes the Generalized Linear Models (GLMs) with correlated data matrices, and the Compressed Sensing problem.

We emphasize that we were able to derive the free entropy of all these models using very generic arguments relying only on the rotational invariance of the problem.

$\bullet$ The TAP equations and message passing algorithms. Finally, we show in Sec. 4 that the TAP (or EC) equations that we derived by maximizing the Gibbs free entropy of rotationally invariant models can strikingly be understood as the fixed point equations of message passing algorithms. Conversely, many message-passing algorithms can be seen as an iteration scheme of the TAP equations. This was known in many models in which the underlying data matrix was assumed to be i.i.d. For instance, the Generalized Approximate Message Passing (GAMP) algorithm [Ran11] was shown in [KMS+12] to be equivalent to the TAP equations, a result that we recover in Sec. 4.1, while TAP equations were already iterated for Restricted Boltzmann Machines, see [TGM+18]. In the Plefka expansion language, these results relied on the early stopping of the expansion at order $2$ (in powers of the couplings) as a consequence of the i.i.d. hypothesis. Using our resummation results to deal with the series at infinite orders, we were able to generalize these correspondences to correlated models. We argue that the stationary limit of the Vector Approximate Message Passing (VAMP) algorithm [RSF17] (that is, its fixed point equations) for compressed sensing with correlated matrices gives back our TAP equations derived via Plefka expansion, see Sec. 4.2. Even more generally, the Generalized Vector Approximate Message Passing (G-VAMP) algorithm [SRF16], defined for the very broad class of Generalized Linear Models with correlated matrices, yields fixed point equations that are equivalent to our Plefka-expanded TAP equations, see Sec. 4.3. Combined with the results of Sec. 3, this indicates that the VAMP algorithm is an example of an approximation scheme that follows Conjecture 1.

$\bullet$ Diagrammatics of the expansion and random matrix theory. Our results are largely based on a better control of the diagrammatics of the Plefka expansions for rotationally invariant random matrices, presented in Sec. 5. We leverage mathematically rigorous results on Harish-Chandra-Itzykson-Zuber (HCIZ) integrals [HC57, IZ80, GM05, CŚ07], involving transforms of the asymptotic spectrum of the coupling matrix, to argue that only a very specific class of diagrams contributes to the high-temperature expansion of a system with rotationally invariant couplings. These results are used throughout our study. Some generalizations are postponed to Appendix D.

2 Symmetric and bipartite spherical models with rotationally-invariant couplings

In this section we consider two spherical models that will serve both as guidelines and building blocks for our subsequent analysis. We show in detail how to perform the Plefka-Georges-Yedidia high-temperature expansion in this context, and we present the precise diagrammatic results that allow us to resum the Plefka series for rotationally invariant couplings. These results will be useful to clarify our subsequent derivation of the TAP equations in more involved models, and are also interesting by themselves from a random matrix theory point of view.

2.1 Symmetric spherical model

In this section $N\geq 1$ , $\sigma>0$ , and we define the following pairwise interaction Hamiltonian on $\mathbb{S}^{N-1}(\sigma\sqrt{N})$ , the sphere of radius $\sigma\sqrt{N}$ in $\mathbb{R}^{N}$ :

$$H_{J}({\textbf{x}})\equiv-\frac{1}{2}\sum_{i,j}J_{ij}x_{i}x_{j}. \tag{1}$$

The coupling matrix $J$ is a $N\times N$ symmetric random matrix drawn from Model S.

2.1.1 Direct free entropy computation

The Gibbs measure for our model at inverse temperature $\beta$ is defined as:

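$$P_{\beta,J}(\mathrm{d}{\textbf{x}})\equiv\frac{1}{Z_{\beta,J}}\,e^{-\beta H_{J}({\textbf{x}})}\,\mathrm{d}{\textbf{x}}=\frac{1}{Z_{\beta,J}}\,e^{\frac{\beta}{2}\sum_{i,j}J_{ij}x_{i}x_{j}}\,\mathrm{d}{\textbf{x}}, \tag{2}$$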

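in which ${\rm d}{\textbf{x}}$ is the usual surface measure on the sphere $\mathbb{S}^{N-1}(\sigma\sqrt{N})$ . We write the partition function of the model introducing a Lagrange multiplier $\gamma$ to enforce the condition $\left\lVert{\textbf{x}}\right\rVert^{2}=N\sigma^{2}$ . We will write $A_{N}\simeq B_{N}$ to denote that $\frac{1}{N}\log A_{N}=\frac{1}{N}\log B_{N}+o_{N}(1)$ . At leading exponential order, one has:

\begin{align}
Z_{\beta,J}&\simeq\int\mathrm{d}\gamma\prod_{i=1}^{N}\int_{\mathbb{R}}\mathrm{d}x_{i}\,e^{\frac{\beta}{2}\sum_{i,j}J_{ij}x_{i}x_{j}+\frac{\gamma}{2}\left(N\sigma^{2}-\sum_{i}x_{i}^{2}\right)}, \tag{3}\\
&\simeq\exp\left[\inf_{\gamma}\left\{\log\left[\int\prod_{i=1}^{N}\mathrm{d}x_{i}\,e^{\frac{\beta}{2}\sum_{i,j}J_{ij}x_{i}x_{j}+\frac{\gamma}{2}\left(N\sigma^{2}-\sum_{i}x_{i}^{2}\right)}\right]\right\}\right]. \tag{4}
\end{align}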

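Denoting by $\gamma(\beta)$ the solution to the saddle-point equation in eq. (4), we have effectively defined a new Gibbs measure:

$$P_{\beta,J}(\mathrm{d}{\textbf{x}})\equiv\frac{1}{Z_{\beta,J}}\,e^{\frac{\beta}{2}\sum_{i,j}J_{ij}x_{i}x_{j}+\frac{\gamma(\beta)}{2}\left(N\sigma^{2}-\sum_{i}x_{i}^{2}\right)}\,\mathrm{d}{\textbf{x}}, \tag{5}$$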

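where now $\mathrm{d}{\textbf{x}}$ is the usual Euclidean measure on $\mathbb{R}^{N}$ . Following [KTJ76] we diagonalize the Hamiltonian and we integrate over the spins in this new basis, which yields:

$$Z_{\beta,J}\simeq\exp\left[\inf_{\gamma}\left\{\frac{N}{2}\left(\log 2\pi+\gamma\sigma^{2}-\frac{1}{N}\sum_{\lambda}\log(\gamma-\beta\lambda)\right)\right\}\right], \tag{6}$$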

in which the sum over $\lambda$ runs over the set of eigenvalues of $J$ . Taking the $N\to\infty$ limit, the saddle point equation reads:

$$\lim_{N\to\infty}\frac{1}{N}\sum_{\lambda}\frac{1}{\gamma-\beta\lambda}=\sigma^{2}, \tag{7}$$

which we can write as a function of the limiting spectral law $\rho_{D}$ of the matrix $J$ (defined in Model S):

$$\int\frac{\rho_{D}(\mathrm{d}\lambda)}{\gamma-\beta\lambda}=\sigma^{2}. \tag{8}$$

We assumed (see Model S) that the support of $\rho_{D}$ is compact so that we can define its maximum $\lambda_{\rm max}\in\mathbb{R}$ . Under these assumptions, eq. (8) has the solution:

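$$\gamma=\beta\,{\cal R}_{\rho_{D}}(\beta\sigma^{2})+\frac{1}{\sigma^{2}}=\beta\,{\cal S}_{\rho_{D}}^{-1}(-\beta\sigma^{2}), \tag{9}$$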

as long as $-{\cal S}_{\rho_{D}}(\lambda_{\rm max})\geq\beta\sigma^{2}$ , where ${\cal R}_{\rho_{D}}$ is the ${\cal R}$ -transform of $\rho_{D}$ and ${\cal S}_{\rho_{D}}$ its Stieltjes transform (see Appendix C for their definitions). In the opposite case (if $\beta\sigma^{2}>-{\cal S}_{\rho_{D}}(\lambda_{\rm max})$ ), $\gamma$ ‘sticks’ to the solution $\gamma=\lambda_{\rm max}\beta$ . The intensive free entropy $\Phi_{J}(\beta)$ is defined as:

$$\Phi_{J}(\beta)\equiv\lim_{N\to\infty}\frac{1}{N}\log Z_{\beta,J}. \tag{10}$$

In the end, we can compute the free entropy in the high-temperature phase $\beta\leq\beta_{c}\equiv-\sigma^{-2}{\cal S}_{\rho_{D}}(\lambda_{\rm max})$ :

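$$\Phi_{J}(\beta)=\frac{1}{2}\left[\log 2\pi+\gamma\sigma^{2}-\int\rho_{D}(\mathrm{d}\lambda)\log(\gamma-\beta\lambda)\right],\qquad\gamma=\beta\,{\cal R}_{\rho_{D}}(\beta\sigma^{2})+\frac{1}{\sigma^{2}}. \tag{11}$$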

By taking the derivative of this expression with respect to $\beta$ it is easy to show that this simplifies to:

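$$\Phi_{J}(\beta)=\frac{1}{2}\left(1+\log 2\pi\sigma^{2}\right)+\frac{1}{2}\int_{0}^{\beta\sigma^{2}}{\cal R}_{\rho_{D}}(x)\,\mathrm{d}x. \tag{12}$$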
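The steps between the saddle-point condition $\int\rho_{D}(\mathrm{d}\lambda)/(\gamma-\beta\lambda)=\sigma^{2}$ and the ${\cal R}$ -transform formula can be spelled out as follows. This is a sketch that assumes the conventions ${\cal S}_{\rho_{D}}(z)=\int\rho_{D}(\mathrm{d}\lambda)/(\lambda-z)$ and ${\cal R}_{\rho_{D}}(x)={\cal S}_{\rho_{D}}^{-1}(-x)-1/x$ , which are the ones that make the two expressions for $\gamma$ quoted in the text agree; Appendix C remains the reference for the definitions.

```latex
\begin{align*}
\int\frac{\rho_D(\mathrm{d}\lambda)}{\gamma-\beta\lambda}=\sigma^2
 &\iff \mathcal{S}_{\rho_D}(\gamma/\beta)=-\beta\sigma^2
 \iff \gamma=\beta\,\mathcal{S}_{\rho_D}^{-1}(-\beta\sigma^2)
      =\beta\,\mathcal{R}_{\rho_D}(\beta\sigma^2)+\frac{1}{\sigma^2},\\
\frac{\mathrm{d}\Phi_J}{\mathrm{d}\beta}
 &=\frac{1}{2}\int\rho_D(\mathrm{d}\lambda)\,\frac{\lambda}{\gamma-\beta\lambda}
  =\frac{1}{2\beta}\left[\gamma\int\frac{\rho_D(\mathrm{d}\lambda)}{\gamma-\beta\lambda}-1\right]
  =\frac{\gamma\sigma^2-1}{2\beta}
  =\frac{\sigma^2}{2}\,\mathcal{R}_{\rho_D}(\beta\sigma^2),\\
\Phi_J(0)&=\frac{1}{2}\left(1+\log 2\pi\sigma^2\right)
 \quad\text{since }\gamma(0)=1/\sigma^2,\\
\Phi_J(\beta)&=\Phi_J(0)+\frac{\sigma^2}{2}\int_0^{\beta}\mathcal{R}_{\rho_D}(\beta'\sigma^2)\,\mathrm{d}\beta'
 =\frac{1}{2}\left(1+\log 2\pi\sigma^2\right)+\frac{1}{2}\int_0^{\beta\sigma^2}\mathcal{R}_{\rho_D}(x)\,\mathrm{d}x .
\end{align*}
```

In the second line only the explicit $\beta$ -dependence is differentiated, because the expression is stationary with respect to $\gamma$ at the saddle point.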

In the low temperature phase (for $\beta\geq\beta_{c}=-\sigma^{-2}{\cal S}_{\rho_{D}}(\lambda_{\rm max})$ ) one has

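$$\Phi_{J}(\beta)=\frac{1}{2}\left(\log 2\pi+\lambda_{\rm max}\beta\sigma^{2}-\log\beta-\int\rho_{D}(\mathrm{d}\lambda)\log(\lambda_{\rm max}-\lambda)\right). \tag{13}$$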

Note that both in the high and low temperature phases the free entropy can formally be expressed as:

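$$\Phi_{J}(\beta)=\frac{1}{2}\log 2\pi+\frac{1}{2}\inf_{\gamma}\left[\gamma\sigma^{2}-\int\rho_{D}(\mathrm{d}\lambda)\log(\gamma-\beta\lambda)\right], \tag{14}$$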

a formulation which is both more compact and easier to implement algorithmically for generic matrices $J$ .
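As an illustration of how this variational form can be evaluated for a given spectrum, here is a minimal sketch using only the Python standard library. It works at finite $N$ (the integral over $\rho_{D}$ is replaced by an average over a list of eigenvalues) and finds the minimizing $\gamma$ by bisection on the derivative of the bracket; the function name `free_entropy` and the tolerance are our own choices, not the paper's.

```python
import math

def free_entropy(eigs, beta, sigma2, tol=1e-12):
    """Evaluate 0.5*log(2*pi) + 0.5*inf_gamma[gamma*sigma2 - mean(log(gamma - beta*lam))].

    Returns (Phi, gamma). Assumes beta >= 0 and sigma2 > 0.
    """
    eigs = list(eigs)
    n = len(eigs)
    lam_max = max(eigs)

    def slope(gamma):
        # Derivative of the bracket with respect to gamma; it increases with gamma.
        return sigma2 - sum(1.0 / (gamma - beta * lam) for lam in eigs) / n

    lo = beta * lam_max                  # the slope tends to -infinity here
    hi = beta * lam_max + 1.0 / sigma2   # the slope is >= 0 here
    while hi - lo > tol * max(1.0, abs(hi)):
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    gamma = hi
    bracket = gamma * sigma2 - sum(math.log(gamma - beta * lam) for lam in eigs) / n
    return 0.5 * math.log(2.0 * math.pi) + 0.5 * bracket, gamma

# Illustrative two-point spectrum (eigenvalues +1 and -1 in equal proportion).
phi, gamma = free_entropy([1.0] * 50 + [-1.0] * 50, beta=0.5, sigma2=1.0)
```

For this two-point spectrum the stationarity condition reduces to $\gamma^{2}-\gamma-\beta^{2}=0$ , which gives a closed form to compare against. At finite $N$ the minimizer always lies strictly above $\beta\lambda_{\rm max}$ ; in the low-temperature phase it approaches $\beta\lambda_{\rm max}$ as $N$ grows, which is the ‘sticking’ described above.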

Remark

The free entropy is usually defined as an average over the quenched disorder $J$ , but here it is clear that the free entropy is self-averaging as a function of $J$ , so that taking this average is trivial. Moreover, $\Phi_{J}(\beta)$ only depends on $J$ via $\rho_{D}$ , its asymptotic eigenvalue distribution.

Remark

The derivation of the free entropy both in the high and low temperature phase has been made rigorous in [GM05], and the method of proof also essentially consists in fixing a Lagrange multiplier to enforce the condition $\sum_{i}x_{i}^{2}=\sigma^{2}N$ .

2.1.2 Plefka expansion and the Georges-Yedidia formalism

A more generic way to compute the free entropy is to follow the formalism of [GY91] to perform a high-temperature Plefka expansion [Ple82]. The goal is to expand the free entropy at low $\beta$ , in the high-temperature phase. In order to do so, we introduce the very useful $U$ operator defined in Appendix A of [GY91]. We will compute the free entropy given the constraints on the means $\braket{x_{i}}_{\beta}=m_{i}$ and on the variances $\braket{x_{i}^{2}}_{\beta}=v_{i}+m_{i}^{2}$ . The notation $\braket{\cdot}_{\beta}$ indicates an average over the Gibbs measure of our system at inverse temperature $\beta$ , see eq. (5). A set of parameters $\{m_{i},v_{i}\}$ will thus determine a free entropy value, and the comparison with the direct calculation of Sec. 2.1.1 will be made by maximizing the free entropy with respect to $\{m_{i},v_{i}\}$ . We can enforce the spherical constraint $\left\lVert{\textbf{x}}\right\rVert_{2}^{2}=\sigma^{2}N$ by constraining our choice of parameters $\{m_{i},v_{i}\}$ to satisfy the identity:

$$\sigma^{2}=\frac{1}{N}\sum_{i=1}^{N}\left[v_{i}+m_{i}^{2}\right]. \tag{15}$$

The Lagrange parameters introduced to fix the magnetizations are denoted $\{\lambda_{i}\}$ , and the ones used to fix the variances are denoted $\{\gamma_{i}\}$ . For clarity we will keep their dependency on $\beta$ explicit only when needed. For a given $\beta$ and a given $J$ one defines the operator $U$ of Georges-Yedidia:

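$$U(\beta,J)\equiv H_{J}-\braket{H_{J}}_{\beta}+\sum_{i=1}^{N}\partial_{\beta}\lambda_{i}(\beta)(x_{i}-m_{i})+\frac{1}{2}\sum_{i=1}^{N}\partial_{\beta}\gamma_{i}(\beta)\left[x_{i}^{2}-v_{i}-m_{i}^{2}\right]. \tag{16}$$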

The derivation of $U$ as well as its (many) useful properties are briefly recalled in Appendix A. We are now ready to compute the first orders of the expansion of the free entropy $\Phi_{J}(\beta)$ in terms of $\beta$ . In this expansion the Lagrange parameters $\{\lambda_{i}(\beta),\gamma_{i}(\beta)\}$ are always considered at $\beta=0$ , so we drop their $\beta$ -dependency. We detail the first orders of the expansion, following Appendix A (cf. Appendix A of [GY91]).

Order 0

First of all, taking $\beta=0$ one has easily:

[TABLE]

This yields after extremization over $\{\lambda_{i},\gamma_{i}\}$ :

[TABLE]

Order 1

At order $1$ , one easily derives:

[TABLE]

We can now make use of the Maxwell-type relations which are valid at any $\beta$ :

[TABLE]

These relations plugged in eq. (18) lead to $\partial_{\beta}\gamma_{i}(\beta=0)=J_{ii}$ and $\partial_{\beta}\lambda_{i}(\beta=0)=\sum_{j(\neq i)}J_{ij}m_{j}$ . We then obtain the $U$ operator at $\beta=0$ from eq. (16):

[TABLE]

Order 2

Following eq. (190) in Appendix A, we have the relation:

[TABLE]

Order 3 and 4

For the order $3$ , we obtain:

[TABLE]

in which the sum is made over pairwise distinct $i,j,k$ indices. Applying eq. (A) we reach:

[TABLE]

where again, $i,j,k,l$ are pairwise distinct indices. For pedagogical purposes (and since it will be useful for the following sections), we detail this calculation in Appendix B.

Larger orders

By its very nature, the perturbative expansion of Georges-Yedidia [GY91] cannot (somewhat disappointingly) give an analytic result for an arbitrary perturbation order $n$ . However, the results up to order $4$ of eqs. (17), (18), (22), (23), (24) lead to the following natural conjecture for the free entropy at a given realization of the disorder:

[TABLE]

Note that in order to obtain this formula, we took the $N\to\infty$ limit at every perturbation order in $\beta$ , which is part of the implicit assumptions of the Plefka expansion. The terms of this perturbative expansion can be represented diagrammatically as simple cycles of order $p$ , see Fig. 1(a).

In general, at any order in the expansion one can construct a diagrammatic representation of the contributing terms, and one expects that only strongly irreducible diagrams contribute to the free entropy. Strongly irreducible diagrams are those that cannot be split into two pieces by removing a vertex [GY91] (examples are given in Fig. 1(a) and 1(b)). However, we retain only simple cycles, such as the one depicted in Fig. 1(a), because other diagrams, such as the one in Fig. 1(b), are negligible when $N\to\infty$ for rotationally invariant models, as we argue in Section 5.2. For the case of orthogonal couplings, this dominance of simple cycles was already noted in [PP95]. On the other hand, generic cactus diagrams like the one pictured in Fig. 1(c) are not negligible, but they cancel out and do not appear in the final form of the expansion (at order $4$ , this is shown in Appendix B).

We shall now prove the dominance of simple cycles, and the correctness of eq. (2.1.2), in the high-temperature phase. In this phase, the solution to the maximization of eq. (2.1.2) over $\{m_{i}\}$ is the paramagnetic solution $m_{i}=0$ . Furthermore, we expect that the $\{v_{i}\}$ that maximize the free entropy of eq. (2.1.2) are homogeneous, that is $\forall i,\,v_{i}=v$ . The constraint of eq. (15) thus gives $v=\sigma^{2}$ .

We can compare the result of the resummation of simple cycles, eq. (2.1.2) with the exact results of eq. (12) in the paramagnetic phase. For these two results to agree, we need the generating function for simple cycles to be related to the ${\cal R}$ -transform of $\rho_{D}$ by:

[TABLE]

in which the outer expectation is with respect to the distribution of $J$ . In particular, an order-by-order comparison yields that the free cumulants $\{c_{p}(\rho_{D})\}_{p\in\mathbb{N}^{\star}}$ (see Appendix C for their definition) must satisfy:

[TABLE]

Using rigorous results of [GM05], we were able to prove a stronger version of eq. (27), namely convergence in $L^{2}$ norm, so we state it as a theorem:

Theorem 1.

For a matrix $J\in{\cal S}_{N}$ generated by Model S, one has for every $p\in\mathbb{N}^{\star}$ :

[TABLE]

We postpone the proof to Sec. 5. We assume that we can invert the summation over $p$ and the $N\to\infty$ limit in eq. (2.1.2), so Theorem 1 implies that eq. (26) is true not only in expectation but that we can write:

[TABLE]

in which the limit here means convergence in $L^{2}$ norm as $N\to\infty$ . This is important, as it allows us to “resum” the free entropy of eq. (2.1.2), which is valid for a given instance of $J$ . As a final note, we can use the results of Sec. 2.1.1 to write our result in an alternative form (dropping $o_{N}(1)$ terms):

[TABLE]
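The concentration of simple cycles on free cumulants can be illustrated numerically at low order. The sketch below assumes that Theorem 1 states that the normalized sum of $J_{i_{1}i_{2}}\cdots J_{i_{p}i_{1}}$ over pairwise distinct indices converges to $c_{p}(\rho_{D})$ , and it only checks $p=2$ and $p=3$ , for which the free cumulants coincide with the variance and the third central moment of the spectrum. The two-point spectrum, the matrix size and the helper names are hypothetical choices made for illustration.

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix from the Haar measure via QR."""
    g = rng.standard_normal((n, n))
    q, r = np.linalg.qr(g)
    # Fix the sign ambiguity of QR so that the resulting law is Haar.
    return q * np.sign(np.diag(r))

def simple_cycles(j):
    """Normalized simple-cycle sums of order 2 and 3 over pairwise distinct indices."""
    n = j.shape[0]
    a = j - np.diag(np.diag(j))       # off-diagonal part of J
    cycle2 = np.trace(a @ a) / n      # (1/N) sum_{i != j} J_ij J_ji
    cycle3 = np.trace(a @ a @ a) / n  # (1/N) sum_{i,j,k distinct} J_ij J_jk J_ki
    return cycle2, cycle3

rng = np.random.default_rng(0)
n = 400
# Hypothetical spectrum: 1/4 of the eigenvalues at 4, the rest at 0.
d = np.where(np.arange(n) < n // 4, 4.0, 0.0)
o = haar_orthogonal(n, rng)
j = (o * d) @ o.T                     # J = O D O^T

m1, m2, m3 = d.mean(), (d**2).mean(), (d**3).mean()
c2 = m2 - m1**2                       # second free cumulant
c3 = m3 - 3 * m1 * m2 + 2 * m1**3     # third free cumulant
cycle2, cycle3 = simple_cycles(j)
print("order 2:", cycle2, "vs c_2 =", c2)
print("order 3:", cycle3, "vs c_3 =", c3)
```

Removing the diagonal of $J$ before taking traces is enough to enforce pairwise distinct indices for $p\leq 3$ ; at higher orders additional coincidences between non-adjacent indices would have to be subtracted as well.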

2.1.3 Stability of the paramagnetic phase

We can check whether the paramagnetic solution is stable exactly up to the inverse temperature $\beta=\beta_{c}$ . Recall that in this model we do not optimize the free entropy simultaneously over $v$ and the $\{m_{i}\}$ , because the norm $||{\textbf{x}}||_{2}^{2}=\sigma^{2}N$ is fixed, yielding the constraint $v=\sigma^{2}-\frac{1}{N}\sum_{i}m_{i}^{2}$ . Solely as a function of the $\{m_{i}\}$ , the free entropy therefore reads, up to $o_{N}(1)$ terms:

[TABLE]

in which $G_{\rho_{D}}$ is the integrated $\mathcal{R}$ -transform of $\rho_{D}$ , see Appendix C for its definition. The Hessian of the extensive free entropy $N\Phi_{J}$ at the paramagnetic solution $\bf{m}=0$ is:

[TABLE]

The paramagnetic solution is stable as long as the Hessian $N\partial^{2}_{m}\Phi_{J}(\beta,m=0)$ is a negative definite matrix. This is true as long as $\beta<\beta_{c}=-\sigma^{-2}\mathcal{S}_{\rho_{D}}(\lambda_{\rm max})$ , because at $\beta_{c}$ the spectrum of $N\left(\frac{\partial^{2}\Phi_{J}}{\partial m_{i}\partial m_{j}}\right)_{m=0}$ touches zero. For $\beta>\beta_{c}$ the Hessian is again negative, which gives the impression that the paramagnetic phase is stable; however, $\mathcal{S}_{\rho_{D}}^{-1}(-\beta\sigma^{2})$ is then evaluated at the non-physical solution, so this solution has to be discarded. Our Plefka expansion thus allows us to compute the free entropy in the whole paramagnetic phase, consistently with the results of Sec. 2.1.1. More generically, as shown by Plefka [Ple82] in the closely related SK model, the Hessian of the free entropy with respect to $\{m_{i}\}$ is related to the inverse susceptibility matrix of the system, and thus the non-invertibility of the Hessian implies a non-analyticity point of the free entropy.

Validity of the Plefka expansion and stability of the replica symmetric solution

We believe it is an open question to relate the range of validity $\beta_{c}$ of the Plefka expansion and the de Almeida-Thouless condition that characterizes the local stability of the replica symmetric solution (see [DAT78] for its original derivation, and [Kab08a, SK08] for examples of its applications in inference problems). The equivalence of these two conditions was shown in the seminal paper of Plefka [Ple82] for the Sherrington-Kirkpatrick model. It is tedious but straightforward to generalize this conclusion to a model with Ising spins $x_{i}=\pm 1$ and a Hamiltonian given by eq. (1) with a rotationally invariant coupling matrix $J$ drawn from Model S. However, investigating the relation between these two conditions in a general model appears to be an open problem, and is beyond the scope of our paper.

The region of validity of the expansion and the free cumulant series

In this remark, we clarify a possible confusion between the radius of convergence of the free cumulant series and the domain of validity of the Plefka expansion. We performed an expansion of $\Phi_{J}(\beta)$ close to $\beta=0$ , so this expansion is valid in the region $(0,\beta_{c})$ , in which $\beta_{c}=-\sigma^{-2}{\cal S}(\lambda_{\rm max})$ is the first non-analyticity of $\Phi_{J}(\beta)$ , see eqs. (12), (13) and the discussion above. Note that there exist spectra for which the function ${\cal R}_{\rho_{D}}(x)$ can be analytically extended beyond $x_{c}\equiv-{\cal S}_{\rho_{D}}(\lambda_{\rm max})$ , as for instance Wigner’s semi-circle law, for which ${\cal R}_{\rm s.c.}(x)=x$ . Yet, one has to be careful that this does not imply that the free entropy $\Phi_{J}(\beta)$ is analytic beyond $\beta_{c}$ , and thus even in this case our Plefka expansion is a priori only valid up to $\beta=\beta_{c}$ .
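For the semi-circle example, the threshold $\beta_{c}=-\sigma^{-2}{\cal S}(\lambda_{\rm max})$ can be evaluated by direct quadrature. The sketch below assumes the sign convention ${\cal S}(z)=\int\rho(\mathrm{d}\lambda)/(\lambda-z)$ (the definition of Appendix C is not reproduced here, so this convention is an assumption) and the unit-variance semi-circle on $[-2,2]$ , which is the normalization compatible with ${\cal R}_{\rm s.c.}(x)=x$ . The function name is hypothetical.

```python
import numpy as np

def beta_c_semicircle(sigma, num_points=200_000):
    """Estimate beta_c = -S(lambda_max) / sigma^2 for the semicircle law on [-2, 2].

    Assumed convention (not taken from the source): S(z) = int rho(dl) / (l - z).
    """
    # Midpoint rule in theta, with lambda = 2 cos(theta); the midpoints avoid
    # the removable 0/0 at the spectral edge theta = 0.
    theta = (np.arange(num_points) + 0.5) * np.pi / num_points
    lam = 2.0 * np.cos(theta)
    # rho(lambda) * |d lambda / d theta| = (2 / pi) * sin(theta)^2
    weight = (2.0 / np.pi) * np.sin(theta) ** 2
    stieltjes_at_edge = np.sum(weight / (lam - 2.0)) * np.pi / num_points
    return -stieltjes_at_edge / sigma**2

print(beta_c_semicircle(1.0))
print(beta_c_semicircle(2.0))
```

Under these assumptions the integrand reduces to $-(1+\cos\theta)/\pi$ , so that ${\cal S}(\lambda_{\rm max})=-1$ and $\beta_{c}=\sigma^{-2}$ , even though ${\cal R}_{\rm s.c.}(x)=x$ itself is entire.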

2.2 Bipartite spherical model

In this section we consider $N,M\geq 1$ . We let $\alpha>0$ , and we will take the limit (sometimes referred to as the thermodynamic limit) in which $N,M\to\infty$ with a fixed ratio $M/N\to\alpha$ . We let $\sigma_{x},\sigma_{h}>0$ . Let us consider the following Hamiltonian, which is a function of two fields ${\textbf{h}}\in\mathbb{R}^{M}$ and ${\textbf{x}}\in\mathbb{R}^{N}$ :

[TABLE]

The coupling matrix $L\in\mathbb{R}^{M\times N}$ is assumed to be drawn from Model R.

2.2.1 Direct free entropy computation

The calculation for this bipartite case is very similar to the calculation performed in Sec. 2.1.1, although one cannot always express the result as a well-known transform of the measure $\rho_{D}$ of Model S. For all values of $\beta$ , the result can be expressed as:

[TABLE]

where $\rho_{D}$ is the asymptotic eigenvalue distribution of $L^{T}L$ (see the definition of Model R).

2.2.2 Plefka expansion

The Plefka expansion for this model is a straightforward generalization of Sec. 2.1.2. We will fix the averages to be $\braket{h_{\mu}}=m^{h}_{\mu}$ and $\braket{x_{i}}=m^{x}_{i}$ , and the second moments $\braket{h_{\mu}^{2}}=v^{h}_{\mu}+(m^{h}_{\mu})^{2}$ and $\braket{x_{i}^{2}}=v^{x}_{i}+(m^{x}_{i})^{2}$ , again with the constraints $\sigma_{h}^{2}=\frac{1}{M}\sum_{\mu=1}^{M}\left[v^{h}_{\mu}+(m^{h}_{\mu})^{2}\right]$ and $\sigma_{x}^{2}=\frac{1}{N}\sum_{i=1}^{N}\left[v^{x}_{i}+(m^{x}_{i})^{2}\right]$ . In this problem, the $U$ operator of [GY91] at $\beta=0$ is given by:

[TABLE]

Once again, as in Sec. 2.1.2, one can study all the diagrams that appear in the Plefka expansion. We show again the $L^{2}$ concentration of the simple cycles, and the negligibility of other strongly irreducible diagrams that can be constructed from the rectangular $L$ matrix. We state these results for the bipartite case in more detail in Sec. 5.5.1. We obtain the following result, a counterpart to eq. (2.1.2) for this bipartite model:

[TABLE]

in which indices $\{\mu_{l}\}$ run from $1$ to $M$ and indices $\{i_{l}\}$ run from $1$ to $N$ . We make again an assumption of uniform variances at the maximum: $v^{h}_{\mu}=v^{h},v^{x}_{i}=v^{x}$ . Comparing to eq. (33) in the paramagnetic $m^{h}_{\mu},m^{x}_{i}=0$ phase, we obtain the correspondence, similar to eq. (28) and valid a priori for any given realization of $L$ , in the high temperature phase:

[TABLE]

3 Plefka expansion and Expectation Consistency approximations

In this section we perform Plefka expansions for generic models of pairwise interactions, both symmetric and bipartite. In Sec. 3.1 we recall some known facts on the Expectation Consistency (also called Expectation Propagation [Min01]), adaTAP and VAMP approximations to compute the free entropy of such models. In Sec. 3.2 and Sec. 3.4 we generalize the results of the Plefka expansions of Sec. 2 to these models and highlight the main differences and assumptions of our method. This yields a very precise and systematic justification of the TAP equations for rotationally invariant models. We apply these results to retrieve the TAP free entropy of the Hopfield model, Compressed Sensing, as well as different variations of high-dimensional inference models called Generalized Linear Models (GLMs). Sec. 3.3 is devoted to the study of a generic replicated system using these approximations, and the Plefka expansion. We show that they can be used in the celebrated replica method [MPV87] of theoretical physics, to compute the Gibbs free entropy of a generic pairwise inference model.

3.1 Expectation Consistency, adaptive TAP, and Vector Approximate Message Passing approximations

Expectation Consistency (EC) [OW05a, OW05b] is an approximation scheme for a generic class of disordered systems that can also be applied to many inference problems. In this section we show how this scheme is derived and how it is closely related to the adaTAP approximation [OW01a, OW01b] and the VAMP approximation [RSF17]. Let us briefly comment on the history of these methods. The adaTAP scheme was developed and presented in $2001$ in [OW01b, OW01a], and was discussed in detail in the review [OS01] for systems close to the SK model. The same year, Thomas Minka’s Expectation Propagation (EP) approach was presented [Min01]. Opper and Winther used an alternative view of local-consistency approximations of the EP–type, which they call Expectation Consistent (EC) approximations, in [OW05a, OW05b], effectively rederiving their adaTAP scheme from this new point of view. The VAMP approach is more recent [SRF16], and is yet another EP approach for a different problem (compressed sensing), but it has the advantage that, compared with other EP-like approaches [ÇOFW16], it leads to a practical converging algorithm and a rigorous treatment of its time evolution. The connection between these approaches and the Parisi-Potters formulation for inference problems [JR16] was hinted at several times for SK-like problems, see e.g. [OCW16, ÇO19]. We hope that our paper will help provide a unifying presentation of these works, generalizing them well beyond the SK model alone by leveraging random matrix theory. We briefly recall the main arguments of these papers which are useful for our discussion.

3.1.1 Expectation Consistency approximation

Consider a model in which the density of a vector ${\textbf{x}}\in\mathbb{R}^{N}$ is given by a probability distribution of the form:

[TABLE]

Such distributions typically appear in Bayesian approaches to inference problems. We will use the Bayesian language and refer to $P_{0}$ as a prior distribution on x, which will typically be factorized (all the components of x are assumed to be independent under $P_{0}$ ); the distribution $P_{J}$ is responsible for the interactions between the $\{x_{i}\}$ . In this paper we are interested in pairwise interactions, which means that the $\log$ of $P_{J}$ is a quadratic form in the $\{x_{i}\}$ variables. An example of such a model is the infinite-range Ising model of statistical physics at inverse temperature $\beta\geq 0$ , with a binary prior and a quadratic interaction governed by a coupling matrix $J$ . In this specific model, we have:

[TABLE]

for some $\{h_{i}\}\in\mathbb{R}^{N}$ . Our goal is to compute the large $N$ limit of the free entropy $\log Z$ in the model of eq. (37). Each of the two distributions $P_{0}$ and $P_{J}$ allows for tractable computations of physical quantities (like averages), but the difficulty arises when considering their product. The idea behind EC is to simultaneously approximate $P_{0}$ and $P_{J}$ by a tractable family of distributions. For the sake of the presentation we will consider the family of Gaussian probability distributions, although this can be generalized to different families, see the general framework of [OW05a]. We define the first approximation as:

[TABLE]

Here, the parameter $\Gamma_{0}$ is a symmetric positive matrix and $\bm{\lambda}_{0}$ is a vector. We will denote $\braket{\cdot}_{0}$ the averages with respect to $\mu_{0}$ . We can write the trivial identity:

[TABLE]

The idea of EC is to replace, when one computes the average $\braket{P_{J}({\textbf{x}})\,e^{\frac{1}{2}{\textbf{x}}^{\intercal}\Gamma_{0}{\textbf{x}}-\bm{\lambda}_{0}^{\intercal}{\textbf{x}}}}_{0}$ , the distribution $\mu_{0}$ by an approximate Gaussian distribution, that we can write as:

[TABLE]

Performing this replacement yields the expectation-consistency approximation to the free entropy:

[TABLE]

Note that all three parts of this free entropy are tractable. In order to symmetrize the result we can define a third measure:

[TABLE]

The final free entropy should not depend on the values of the parameters, so we expect that the best values for $\Gamma_{0},\Gamma_{J},\bm{\lambda}_{0},\bm{\lambda}_{J}$ make $Z^{\rm EC}$ stationary. This is a strong hypothesis, and the reader can refer to [OW05a] for more details and justifications. This yields the Expectation Consistency conditions, giving their name to the procedure:

[TABLE]

3.1.2 Adaptive TAP approximation

The adaptive TAP approximation (or adaTAP) [OW01a, OW01b] provides an equivalent way to derive the free entropy of eq. (42) for models with pairwise interactions. Let us briefly sketch its derivation and the main arguments behind it. We follow the formulation of [HK13] and we consider again the infinite-range Ising model of eq. (38). The extensive Gibbs free entropy $N\Phi=\log Z$ at fixed values of the magnetizations $m_{i}=\braket{x_{i}}$ and $v_{ij}=\braket{x_{i}x_{j}}_{c}$ can be written using Lagrange parameters: a vector $\bm{\lambda}$ and a symmetric matrix $\Gamma$ .

[TABLE]

The adaTAP approximation consists in writing:

[TABLE]

In this expression, $\Phi_{g}(\beta,\bm{m},{\textbf{v}})$ denotes the free entropy of the same system, but where the spins have Gaussian statistics. The idea behind the adaTAP approximation is as follows. The derivative $\partial_{l}\Phi(l,\bm{m},{\textbf{v}})=\frac{1}{2N}\sum_{ij}J_{ij}\braket{x_{i}x_{j}}$ is an expectation of a sum over a large number of terms; therefore it is reasonable to assume that this expectation is the same as if the underlying variables were Gaussian. This assumption of adaTAP, although reasonable, is a priori hard to justify more rigorously and systematically. It is important to notice that the free entropy (46) of adaTAP is equivalent to the one derived using Expectation Consistency in eq. (42). Indeed, using Lagrange parameters we can write the three terms of eq. (46) as:

[TABLE]

Once written in this form, the extremization over $\bm{m}$ and v of the free entropy implies that $\Gamma_{S}=\Gamma_{0}+\Gamma_{J}$ and $\bm{\lambda}_{S}=\bm{\lambda}_{0}+\bm{\lambda}_{J}$ . It is then clear that we recover $\log Z^{\rm EC}$ of eq. (42).

3.1.3 Vector Approximate Message Passing approximation

The Vector Approximate Message Passing (VAMP) algorithm [RSF17] extends previous message-passing approaches like the GAMP algorithm [Ran11] (that we will describe in more detail in Sec. 4.1) to a class of correlated interaction matrices, namely matrices that satisfy a right-rotation invariance property, similarly to Model S and Model R. The algorithm itself can be derived in several ways (see [RSF17]). Here we briefly recall the use of belief-propagation equations on a “duplicated” factor graph and their Gaussian projection. As we shall see, the Bethe free entropy, given as a function of the BP messages, is then equivalent to the expectation-consistency free entropy. For simplicity, we consider again the problem of eq. (38) with a pairwise interaction involving a matrix $J$ following Model S.

The idea behind VAMP is to consider two vector spin variables ${\bf x_{1}}$ , with measure $P_{0}$ and ${\bf x_{2}}$ with measure $P_{J}$ , and to impose that they are equal. The partition function can be written using a trivial decomposition:

[TABLE]

This partition function can be represented as a “duplicated” factor graph involving two vector nodes, see Fig. 2.

One then writes the BP equations for this problem using this factor graph representation (the reader who is not familiar with BP equations and factor graph representations can consult [MM09]). The BP equations can be written in terms of two “messages” $m_{0}({\bf x_{2}})$ and $m_{J}({\bf x_{1}})$ . They can be obtained by looking at the stationarity conditions of the following “Bethe free entropy”:

[TABLE]

As the factor graph has a tree structure, these BP equations are an exact representation of the original problem, but they are in general intractable. In order to make the computation tractable one can make a Gaussian approximation, which is at the core of the VAMP algorithm: the messages $m_{0}$ and $m_{J}$ on the factor graph of Fig. 2 are assumed to be Gaussian, and thus are only characterized by their mean and their covariance. We can thus write:

[TABLE]

Writing the BP update rule with this assumption yields the VAMP algorithm of [RSF17]. In the present case it reads:

[TABLE]

where the measures $\mu_{0}$ and $\mu_{J}$ are respectively

[TABLE]

Plugging the ansatz of eq. (50) into eq. (49) immediately gives back the EC free entropy of eq. (42). Moreover, the BP equations of eq. (51d) are identical to the Expectation Consistency conditions of eq. (44).

In the end, the three approximation schemes, Expectation Consistency, adaptive TAP and VAMP give the same expression for the free entropy. However, an important advantage of the VAMP approach is that it “naturally” gives an iterative scheme to solve the fixed point equations because it was derived via the belief propagation equations. This iterative scheme turns the fixed point equations into an efficient algorithm, as noticed by [RSF17] and as we show later in Sec. 4.2.

3.2 Plefka expansion for models of symmetric pairwise interactions

3.2.1 A symmetric model with generic priors

In this subsection we consider a generic model of $N$ “spin” variables ${\textbf{x}}=\{x_{1},\cdots,x_{N}\}\in\mathbb{R}^{N}$ . They interact via a pairwise interaction and are subject to a possible external field, which is modeled by the following Hamiltonian:

[TABLE]

We will consider random symmetric coupling matrices $\{J_{ij}\}$ , generated from a rotationally invariant ensemble satisfying Model S. We assume that the variables $\{x_{i}\}$ have a prior distribution under which they are independent variables, each with a distribution $P_{i}$ . For instance, this includes Ising (binary) spins by choosing $P_{i}=\frac{1}{2}(\delta_{1}+\delta_{-1})$ . At a given inverse temperature $\beta\geq 0$ and a fixed realization of the coupling matrix $J$ we define the Gibbs-Boltzmann distribution of the spins, and the partition function, as:

[TABLE]

We will compute the large $N$ limit of the free entropy $\Phi_{J}(\beta)\equiv\frac{1}{N}\log Z_{J}(\beta)$ at fixed values of the magnetizations $m_{i}=\braket{x_{i}}$ and variances $v_{i}=\braket{(x_{i}-m_{i})^{2}}$ using the Plefka expansion. We will fix these variables using Lagrange multipliers $\{\lambda_{i}\}$ for the $\{m_{i}\}$ variables, and $\{\gamma_{i}\}$ for the $\{v_{i}\}$ . The goal of this section is to show how, and under which assumptions, the calculation of Sec. 2.1.2 can be generalized in this context. Clearly the zeroth order term is different from the spherical case and it is given by:

[TABLE]

As we underlined in Sec. 2.1.2, in the Georges-Yedidia method the Lagrange parameters are always considered at $\beta=0$ . At order $1$ in $\beta$ we obtain at leading order:

[TABLE]

The operator $U$ of eq. (16), defined in [GY91] and taken at $\beta=0$ , is thus exactly the same as the one of eq. (21). Using this remark we can see that many of the results obtained in Sec. 2.1.2 will apply to the present case. For instance the second order term is identical and given in eq. (22). We then conjecture, backed by our diagrammatic results in Sec. 5, that the higher order terms are different from the spherical model only in terms which are sub-leading in $N$ . For instance at third order we obtain:

[TABLE]

In this equation, we denoted $\kappa^{(p)}_{i}$ the cumulant of order $p$ of the distribution of $x_{i}$ at $\beta=0$ . Note that the rotation invariance of Model S implies that if $i\neq j$ , typically $J_{ij}\sim\frac{1}{\sqrt{N}}$ . Therefore a term like $\sum_{i\neq j}J_{ij}^{3}$ gives a negligible contribution to the free entropy. We shall therefore assume that the second part of the RHS of eq. (58) is negligible as $N\to\infty$ . This is correct provided that the possible correlations of the third order cumulants $\kappa^{(3)}_{i}$ with the matrix $J$ do not change the scaling of this term sufficiently to make it thermodynamically relevant (see Sec. 5.4 for more details on this particular point). The first term corresponds to a simple cycle of order $3$ and is the same term that appeared in Sec. 2.1.2.

Higher orders

We can carry on the computation of the derivatives $\frac{\partial^{p}}{\partial\beta^{p}}\Phi_{J}(\beta=0)$ . We explain in more detail in Sec. 5.4 under which precise results and assumptions we can leverage the results of Sec. 2.1.2, summarized in eq. (2.1.2), to conjecture the following value of the free entropy at all orders of perturbation and at leading order in $N$ :

[TABLE]

Homogeneous variances

For the remainder of this section we assume that the maximum of the free entropy of eq. (59) is attained for variables $\{v_{i}\}$ such that $v_{i}=v$ . This hypothesis can be argued to be reasonable for many models, but we postpone this argument to the applications of eq. (59) to specific models. We obtain a resummation of the Plefka free entropy using the correspondence of eq. (28):

[TABLE]

Recall finally that $\Phi_{J}(0)$ is given by eq. (56). We were able to perform this expansion and its resummation almost only by applying our results on the spherical models of Sec. 2. The study of the large- $N$ behavior of diagrams made out of matrix elements of $J$ , performed in Sec. 5, proves to be of crucial importance both in the expansion and its resummation. As discussed in Sec. 2.1.2, we expect this Plefka expansion of the free entropy to hold for $\beta<\beta_{c}$ , in which $\beta_{c}\equiv-v^{-1}\ {\cal S}_{\rho_{D}}(\lambda_{\rm max})$ .

3.2.2 Connection of the Plefka expansion to EC approximations

Although not obvious at first, the result of the Plefka expansion in eq. (59) provides a systematic and precise analysis of asymptotic exactness for rotationally invariant models of the Expectation Consistency approximations. The most straightforward way to see how the Plefka expansion relates to these approximations is to start from the adaTAP approximation, see eq. (46). In the language of the Plefka expansion, adaTAP amounts to assuming that at every order $p\geq 1$ of perturbation in $\beta$ , one can perform the calculation as if the statistics of the variables were Gaussian. An equivalent formulation is that all the terms of order $p\geq 1$ in the low- $\beta$ expansion of the free entropy should be the same for the model with a generic prior of Sec. 3.2.1 and for the spherical model of Sec. 2.1. This statement, which generalizes the Parisi-Potters result of [PP95], is exactly what we argued in the calculation of Sec. 3.2.1, using the diagrammatic analysis of Sec. 5. Therefore, the diagrammatic analysis ‘à la Plefka’ provides a clear meaning to the EC approximations: the class of diagrams that are neglected in these approximations is made explicit in Sec. 5. We believe that these diagrams are actually negligible in the large $N$ limit, so that the EC approximations are actually exact asymptotically for rotationally invariant models, in the high temperature phase as captured by the resummation of the Plefka expansion, which we summarized in Conjecture 1. We believe that this asymptotic exactness extends beyond the high temperature phase to any model in the replica symmetric phase. The diagrammatic analysis provides a route to proving this statement rigorously.

3.2.3 Application to the Hopfield model

In the Hopfield model [Hop82] we consider binary spins ${\textbf{x}}\in\{\pm 1\}^{N}$ and the coupling matrix $J$ is constructed out of $P$ patterns, which are spin configurations $\xi^{l}\in\{\pm 1\}^{N}$ , for $l\in\{1,\cdots,P\}$ . The coupling constants are defined as:

[TABLE]

and we assume that the $\{\xi^{l}_{i}\}$ are i.i.d. variables with equal probability in $\{\pm 1\}$ , so that $\mathbb{E}J_{ij}=0$ and $\mathbb{E}J_{ij}^{2}=P/N^{2}$ . We study this system in the limit in which both $P,N\to\infty$ with a fixed ratio $P/N\to\alpha$ . The derivation of the TAP free energy for these models has been performed in [NT97, Méz17] via the Plefka expansion, and via the cavity method in [MPV87]. Although the random matrix ensemble of eq. (61) is a priori not rotationally invariant, one can show that since the variables $\{\xi^{l}_{i}\}$ are i.i.d., only the first and second moments of their distribution contribute to the thermodynamic limit of the free entropy, so that we can assume that they are actually standard centered Gaussian variables without changing the free entropy. This is strengthened by the classical results of [MP67], who only need to consider i.i.d. variables $\{\xi^{l}_{\mu}\}$ to obtain that the spectral law of the covariance matrix written in eq. (61) converges weakly to the celebrated Marchenko-Pastur distribution.
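The statistics of the Hopfield couplings quoted above can be checked on a sampled instance. The sketch below assumes that eq. (61) is the Hebbian rule $J_{ij}=\frac{1}{N}\sum_{l}\xi^{l}_{i}\xi^{l}_{j}$ for $i\neq j$ with $J_{ii}=0$ (the equation itself is not reproduced here, so this form is an assumption). It also reports the normalized traces of $J^{2}$ and $J^{3}$ , which a direct counting argument under that assumption predicts to be close to $\alpha$ . The sizes and the function name are hypothetical.

```python
import numpy as np

def hopfield_couplings(n, p, rng):
    """Hebbian couplings J_ij = (1/N) sum_l xi_i^l xi_j^l with zero diagonal."""
    xi = rng.choice([-1.0, 1.0], size=(p, n))  # P patterns of N binary spins
    j = xi.T @ xi / n
    np.fill_diagonal(j, 0.0)                   # remove the diagonal
    return j

rng = np.random.default_rng(1)
n, p = 400, 200
alpha = p / n
j = hopfield_couplings(n, p, rng)

off = j[~np.eye(n, dtype=bool)]                # off-diagonal entries only
mean_offdiag = off.mean()                      # expected to be close to 0
second_moment_ratio = (off**2).mean() / (p / n**2)  # expected to be close to 1
m2 = np.trace(j @ j) / n                       # expected to be close to alpha
m3 = np.trace(j @ j @ j) / n                   # expected to be close to alpha
print(mean_offdiag, second_moment_ratio, m2, m3, alpha)
```

Replacing the binary patterns with standard Gaussian ones (`rng.standard_normal((p, n))`) should leave these low-order statistics essentially unchanged at large $N$ , in line with the universality argument of the paragraph above.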

The ensemble of eq. (61) is thus for our purposes essentially a Wishart matrix model in which the diagonal has been removed. It is then a known result of random matrix theory (see for instance [TV04]) that its asymptotic $\mathcal{R}$ -transform reads:

[TABLE]

The term $-\alpha$ in the first equality accounts for the “removal” of the diagonal of the Wishart matrix. Let us apply eq. (59) and eq. (60) for this model. At $\beta=0$ , the free entropy is given by eq. (56). For an Ising prior on the spins, it reads:

[TABLE]
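As a sanity check of the order-zero term for Ising spins, one can compare a Legendre-type extremization over the Lagrange parameter with the binary entropy. The sketch below assumes that the single-site contribution takes the form $\min_{\lambda}\left[-\lambda m+\log\cosh\lambda\right]$ for the normalized prior $\frac{1}{2}(\delta_{1}+\delta_{-1})$ ; this form, the sign convention and the function names are assumptions, since eq. (56) is not reproduced here. Because $x_{i}^{2}=1$ , the multiplier conjugate to the variance only shifts the result by a constant and is omitted.

```python
import numpy as np

def ising_entropy_closed_form(m):
    """Binary entropy of a +-1 spin with mean m, minus log 2 (normalized prior)."""
    p_up, p_down = (1 + m) / 2, (1 - m) / 2
    return -(p_up * np.log(p_up) + p_down * np.log(p_down)) - np.log(2)

def ising_entropy_legendre(m, lam_max=10.0, num_points=200_001):
    """Grid evaluation of min over lambda of [-lambda * m + log cosh(lambda)]."""
    lam = np.linspace(-lam_max, lam_max, num_points)
    return np.min(-lam * m + np.log(np.cosh(lam)))

for m in (0.0, 0.3, 0.9):
    print(m, ising_entropy_closed_form(m), ising_entropy_legendre(m))
```

The minimizer is $\lambda=\mathrm{arctanh}(m)$ , which is why the grid must extend beyond $\mathrm{arctanh}(m)$ for the magnetizations of interest; the closed form is only evaluated for $|m|<1$ .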

Note that because $x_{i}^{2}=1$ the variances $v_{i}$ of the variables $x_{i}$ are fixed by the magnetizations $m_{i}$ by the relation $v_{i}=1-m_{i}^{2}$ . Eq. (59) becomes:

[TABLE]

Substituting the squared means $m_{i}^{2}$ by the spin glass order parameter $q\equiv(1/N)\sum_{i}m_{i}^{2}$ in eq. (64) and using the resummed form of eq. (60), we obtain:

[TABLE]

Starting from eq. (65) and eq. (62), we reach the final form for the free entropy:

[TABLE]

Maximizing it over the continuous set of magnetizations $\{m_{i}\}$ yields the TAP equations:

[TABLE]

This is in agreement with the findings of [MPV87, NT97, Méz17]. However, our framework and results allowed us to treat this kind of model in a very generic way.

3.3 The Replica approach

Interestingly, the same approaches can be used to study the partition function and derive the free entropy of rotationally invariant models using the replica approach. The essence of the replica method is well known [MPV87]: one studies the $n$ -th moment of the partition function $Z$ by introducing $n$ “replicas” of each original variable, and one then studies the average of $Z^{n}$ in the limit $n\to 0$ , which allows one to reconstruct the average of $\log Z$ , and therefore the free entropy. We shall illustrate in this section how the three approximation schemes (EC, adaTAP and VAMP) can be used to derive the replica free entropy, and how the high-temperature Plefka expansion justifies these approximations.

Let us consider again the generic model of eq. (53), with $h_{i}=0$ for simplicity. The $n$ -th moment of the partition function reads ( $a,b$ indices always denote replica indices running from $1$ to $n$ ):

[TABLE]

This “replicated partition function” should then be averaged over the disorder. Here, we deal with a $J$ matrix generated from Model S, and the disorder average means an average over the orthogonal matrix $O$ in the decomposition $J=ODO^{T}$ (keeping the eigenvalues $D$ fixed). Let us see how it can be analyzed using the four (equivalent) approximation schemes. Interestingly, the average over $O$ is in fact not needed in these approaches, as they show that the replicated partition function is “self-averaging”, meaning that it gives the same free entropy density for almost all realizations of $J$ . The order parameter we will fix here is the $n\times n$ symmetric matrix $Q$ with elements $Q_{ab}=\frac{1}{N}\sum_{i}\braket{x^{a}_{i}x^{b}_{i}}_{c}$ , called the overlap matrix in the statistical physics language.

3.3.1 Expectation-Consistency approximation

As we have seen in Sec. 3.1.1, in the Expectation-Consistency (EC) approximation, we decompose the replicated free entropy $\Phi=\frac{1}{N}\log Z_{J}^{n}(\beta)$ as a function of three auxiliary free entropies:

[TABLE]

The $\Gamma_{0},\Gamma_{J}$ matrices are symmetric $n\times n$ matrices, and ${\textbf{x}}_{i}=(x^{a}_{i})_{a=1}^{n}$ . Indeed, as we will only fix the $Q_{ab}$ , we expect the first moments to vanish and the second moments to be uniform in space (but not in replica space). The extremization over $\Gamma_{0},\Gamma_{J}$ yields the Expectation-Consistency equations, as described in eq. (44). In particular, the average $\frac{1}{N}\sum_{i}\braket{x^{a}_{i}x^{b}_{i}}$ is constrained to be the same under the three different measures appearing in eq. (3.3.1). As we want to impose this average to be $Q_{ab}$ , we can introduce a Lagrange parameter $\Gamma_{S}$ to fix this average for instance in the third part of the r.h.s. of eq. (3.3.1). This third term becomes

[TABLE]

Changing $\Gamma_{S}\to\Gamma_{S}+\Gamma_{0}+\Gamma_{J}$ gives:

[TABLE]

In the end, performing explicitly the Gaussian integration in the basis of eigenvectors of $J$ in eq. (3.3.1), we obtain the free entropy at fixed overlap $Q$ (recall that $\rho_{D}$ is the asymptotic eigenvalue density of $J$):

[TABLE]
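The Gaussian integration in the eigenbasis of $J$ rests on a factorization: a quadratic form that couples replicas through an $n\times n$ matrix and sites through $J$ has a determinant that splits into one $n\times n$ determinant per eigenvalue of $J$. The sketch below checks this numerically. The matrices `Gamma` and `J` and the value of `beta` are arbitrary illustrative choices, not the matrices of the equation above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, beta = 3, 30, 0.5

# Symmetric coupling matrix (illustrative: a GOE-like matrix with O(1) spectrum).
G = rng.normal(size=(N, N))
J = (G + G.T) / np.sqrt(2 * N)

# Symmetric positive definite replica-space matrix (illustrative).
A = rng.normal(size=(n, n))
Gamma = A @ A.T / n + 3.0 * np.eye(n)

# Full (nN x nN) quadratic form: Gamma acts on replicas, J on sites.
big = np.kron(Gamma, np.eye(N)) - beta * np.kron(np.eye(n), J)
sign, logdet_full = np.linalg.slogdet(big)

# Same quantity, one n x n determinant per eigenvalue of J.
lam = np.linalg.eigvalsh(J)
logdet_modes = sum(
    np.linalg.slogdet(Gamma - beta * l * np.eye(n))[1] for l in lam
)

print(logdet_full / N, logdet_modes / N)
```

Dividing by $N$ turns the sum over eigenvalues into an empirical average, which becomes an integral against $\rho_{D}$ in the large $N$ limit.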

The total free entropy is obtained simply as $\underset{Q}{\mathrm{extr}}\,\Phi(Q)$. Assuming an ultrametric structure in replica space [MPV87] (for instance in the replica symmetric case the matrix elements of $Q$ take only two values, one on the diagonal and one off the diagonal, and the same holds for $\Lambda$ and $R$), one can perform an explicit computation. For random orthogonal models, this expression was first derived using the Plefka expansion in [MPR94a, MPR94b], and written in exactly the same form as ours in [CDL03].
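To make the replica symmetric case concrete: a matrix with one value on the diagonal and one value off the diagonal has only two distinct eigenvalues, so any spectral function of it has a closed form in $n$ that can then be continued to $n\to 0$. A minimal sketch, with hypothetical helper names `rs_matrix` and `rs_trace` and arbitrary numerical values:

```python
import numpy as np

def rs_matrix(n, q_diag, q_off):
    """Replica symmetric n x n matrix: q_diag on the diagonal, q_off elsewhere."""
    return (q_diag - q_off) * np.eye(n) + q_off * np.ones((n, n))

def rs_trace(f, n, q_diag, q_off):
    """Tr f(Q) for a replica symmetric Q, from its two distinct eigenvalues:
    q_diag - q_off (multiplicity n - 1) and q_diag + (n - 1) q_off (multiplicity 1)."""
    return (n - 1) * f(q_diag - q_off) + f(q_diag + (n - 1) * q_off)

n, q_diag, q_off = 5, 1.0, 0.3
Q = rs_matrix(n, q_diag, q_off)

trace_log_direct = np.sum(np.log(np.linalg.eigvalsh(Q)))  # Tr log Q numerically
trace_log_rs = rs_trace(np.log, n, q_diag, q_off)         # closed form in n

print(trace_log_direct, trace_log_rs)
```

The closed form depends on $n$ only through explicit factors, which is what makes the analytic continuation in $n$ tractable.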

3.3.2 The adaTAP approximation

Let us now describe the adaTAP approach to this problem. We introduce Lagrange parameters $\Lambda_{ab}$ which act as external fields, fixing the values of the order parameter $Q_{ab}$. The matrix $\Lambda$ is an $n\times n$ symmetric matrix. The free entropy at fixed overlap matrix $\Phi(Q)$ is expressed as:

[TABLE]

The total free entropy will thus be obtained as $\underset{Q}{\mathrm{extr}}\,\Phi(Q)$ . We use the adaTAP approximation, which gives:

[TABLE]

where $\Phi_{G}$ is the Gibbs free energy for a model with Gaussian spins, at fixed overlap between spins. The three pieces of $\Phi$ can be written as:

[TABLE]

In $\Phi_{G}(Q,\beta)$ we carry out the integral over $s_{i}^{a}$ in an (orthonormal) basis of eigenvectors of $\{J_{ij}\}$. $\Phi_{G}(Q,\beta=0)$ is easily evaluated: the extremization gives $\Lambda=Q^{-1}$, and therefore

[TABLE]

We change notation and denote by $R$ the matrix $\Lambda$ that appears in $\Phi_{G}(Q,\beta)$ . This gives finally:

[TABLE]

with

[TABLE]

We thus recover exactly the EC free entropy of eq. (3.3.1).

3.3.3 The VAMP approach

Following eq. (48), the replicated partition function can be written as a “duplicated” integral:

[TABLE]

where

[TABLE]

We now write the Bethe free entropy $\Phi_{\rm Bethe}$ as in eq. (49), and project it onto the space of Gaussian messages which are assumed to be proportional to identity in space, but with an arbitrary replica structure:

[TABLE]

We did not include first moments because we expect all the first moments to vanish in the replicated system. We can now write the stationarity equations of $\Phi_{\rm Bethe}$ with respect to $A^{0}$ and $A^{J}$ :

[TABLE]

It is easy to derive the replicated free entropy $\Phi$ . One finds

[TABLE]

where

[TABLE]

This is again exactly the same result as the replicated free entropy found with EC or adaTAP in the previous sections.

3.3.4 Plefka expansion

As before, we fix only the overlap $Q_{ab}$ , via Lagrange parameters $\gamma^{ab}$ . Let us denote $\Phi(Q,\beta)$ the corresponding free entropy. At $\beta=0$ the replicas are not coupled so that we obtain

[TABLE]

One can then compute the order $1$ perturbation and the $U$ operator of Georges-Yedidia:

[TABLE]

Note that at $\beta=0$ we have $\braket{x^{a}_{i}}_{0}=0$ . We obtain the order $2$ correction in the same way as before:

[TABLE]

Here the trace is taken in the replica space. One can continue the Plefka expansion at any order, and one obtains results very similar to the non-replicated free entropy, the only difference being that products of variances are replaced by traces of the overlap matrix $Q$. Indeed, the diagrams constructed from the matrix indices $\{J_{ij}\}$ that appear in the replicated free entropy are exactly the same as in the non-replicated case, so that all the results of Sec. 5 stay valid for this replicated calculation. In the end, the resummation yields the single-graph replicated free entropy as a function of the overlap matrix $Q$:

[TABLE]

The series in the second part of this equation is nothing but $\mathrm{Tr}\,\left[G_{\rho_{D}}(Q)\right]$ (see Appendix C for the definition of this function). Recalling the expression (88) for the first, $\beta=0$ , piece, we get finally:

[TABLE]

and recall that the $\beta=0$ term is given by eq. (88) and that the $G_{\rho_{D}}$ function is the integrated $\mathcal{R}$ -transform defined in Appendix C, which verifies:

[TABLE]

Thus we recognize once again in eq. (93) the expression obtained via EC and adaTAP, see eq. (3.3.1).

3.4 Plefka expansion for models of bipartite pairwise interactions

3.4.1 A bipartite model with generic priors

We consider here another generic model, which is an extension of the bipartite model studied in Sec. 2.2, the difference being that we assume a generic prior on the variables rather than a spherical constraint. As we will see, this model is closely related to the symmetric model of Sec. 3.2.1. Let $M,N\geq 1$ . We consider two types of variables: a vector ${\textbf{x}}\in\mathbb{R}^{N}$ and a vector ${\textbf{h}}\in\mathbb{R}^{M}$ . These two fields are assumed to follow prior distributions under which they are independent and the distribution of all their components decouples. For instance, the prior $P_{X}$ on x can be written as $P_{X}(\mathrm{d}{\textbf{x}})=\prod_{i}P_{i}(\mathrm{d}x_{i})$ . The two fields ${\textbf{h}},{\textbf{x}}$ interact via the following Hamiltonian:

[TABLE]

As in Sec. 2.2, greek indices $\mu,\nu$ will always run from $1$ to $M$ while latin indices $i,j$ run from $1$ to $N$ . We assume that the coupling matrix $F\in\mathbb{R}^{M\times N}$ satisfies the rotation invariance property described in Model R. For a fixed $\beta\geq 0$ and a realization of $F$ we define the Gibbs-Boltzmann distribution and the partition function:

[TABLE]

As in the previous section we will compute the large $N$ limit of the free entropy $\Phi_{F}(\beta)\equiv\frac{1}{N}\log Z_{\beta,F}$. We look at this problem in the thermodynamic limit, with $N,M\to\infty$ and a fixed ratio $M/N\to\alpha>0$. We constrain the first and second moments of $\{x_{i}\}$ and $\{h_{\mu}\}$ under the Gibbs measure to be $\braket{x_{i}}=m^{x}_{i}$, $\braket{h_{\mu}}=m^{h}_{\mu}$, $\braket{(x_{i}-m_{i}^{x})^{2}}=v^{x}_{i}$, $\braket{(h_{\mu}-m_{\mu}^{h})^{2}}=v^{h}_{\mu}$. The Lagrange multipliers introduced to enforce these conditions will be denoted $\lambda^{x}_{i}$, $\lambda^{h}_{\mu}$ for the first moments and $\gamma^{x}_{i}$, $\gamma^{h}_{\mu}$ for the second moments. At order $0$ in $\beta$, we obtain:

[TABLE]

The calculations at order $1$ and $2$ are very similar to the ones of the spherical model of Sec. 2.2.2. One can also refer to the symmetric case of Sec. 3.2.1. We obtain the first four orders:

[TABLE]

Recall that $\kappa^{(p,x)}_{i}$ is the $p$ -th order cumulant of $x_{i}$ at $\beta=0$ , and we define $\kappa^{(p,h)}_{\mu}$ in the same way for $h_{\mu}$ .

On higher orders

The discussion on the higher orders in perturbation is very similar to the one we made in Sec. 3.2.1. In the end, we only retain the simple cycles made of matrix elements $\{F_{\mu i}\}$ in our Plefka expansion. As we explain in more detail in Sec. 5, and in particular in Sec. 5.5 for the bipartite case, we make two crucial statements to obtain this result:

$\bullet$

Let us forget for the moment about the factors of variances or higher-order moments of the distributions of $x_{i},h_{\mu}$ at $\beta=0$. We can then study all the possible terms appearing in the Plefka expansion at every order as diagrams made of matrix elements $\{F_{\mu i}\}$, and show that the only non-vanishing diagrams are the simple cycles. This is shown in more detail in Sec. 5.5.

$\bullet$

We assume that the factors arising from the variances $v^{x}_{i},v^{h}_{\mu}$ or higher-order cumulants of the variables $x_{i}$ and $h_{\mu}$ do not significantly change the scaling of the diagrams, that is, we can still retain only the simple cycles in the thermodynamic limit. More details are given in Sec. 5.5.2.

For instance, at order $3$ these statements yield $\partial^{3}_{\beta}\Phi_{N,F}=o_{N}(1)$ since one cannot construct a simple cycle for bipartite models at order $3$. This also explains the $o_{N}(1)$ term at order $4$, see eq. (102). In the end, we obtain the following value of the free entropy at leading order in $N$:

[TABLE]

In the summation all indices $\mu_{1},\cdots,\mu_{p}$ are pairwise distinct, and so are $i_{1},\cdots,i_{p}$ .

Homogeneous variances

Let us assume that the maximum of the free entropy of eq. (103) is attained for variables $\{v^{h}_{\mu},v^{x}_{i}\}$ such that $v^{h}_{\mu}=v^{h}$ and $v^{x}_{i}=v^{x}$ . As in the symmetric case this hypothesis can be justified for many models that we will analyze later on. Using eq. (36) the free entropy of eq. (103) can be resummed:

[TABLE]

where $\rho_{D}$ is the spectral distribution of $F^{T}F$. At $\beta=0$, $\Phi_{F}(0)$ is given by eq. (3.4.1). As in the symmetric model of Sec. 3.2 we were able to perform this expansion and its resummation by applying our results on the spherical models of Sec. 2. The diagrammatic study performed in Sec. 5.5, which is a generalization of the diagrammatics for symmetric matrices, plays a decisive role in this analysis.

A remark on Restricted Boltzmann machines

In the case of an i.i.d. matrix $F$ (see Sec. 5.6 for a remark on how to apply our results to this class of matrices), we recognize in particular in eq. (103) the result obtained in eq. (36) of [TGM+18] for Restricted Boltzmann Machines (RBMs). For generic rotationally invariant $F$, the fixed point equations corresponding to the extremization of $\Phi$ in eq. (3.4.1) also correspond to the fixed point of Algorithm 3 of [TGM+18] (which was described there as “adaTAP inference algorithm”). This generic property of the fixed points of the Plefka-expanded free entropy will be investigated in Sec. 4.

3.4.2 Generalized Linear Models with correlated matrices

Generalized Linear Models (GLMs) [NW72, McC18] arise as a generalization of the Compressed Sensing problem (see below). They are of primary importance in a very wide variety of scientific and engineering fields, such as phase retrieval in optics [Fie82] and classification problems in statistics. GLMs can also be thought of as the building blocks of fully connected neural networks [LBH15]. Let us now define more precisely the model we will study. Consider $M,N\geq 1$ both going to infinity with a fixed ratio $M/N\to\alpha>0$ . We are given a (random) measurement matrix $F\in\mathbb{R}^{M\times N}$ which comes from the ensemble of Model R. Given $F$ , data samples $\{Y_{\mu}\}$ are generated as:

[TABLE]

in which ${\textbf{X}}\in\mathbb{R}^{N}$ is the vector we try to recover from the observation of $\{Y_{\mu}\}_{\mu=1}^{M}$ , and $P_{\rm out}$ is a fixed probabilistic channel. Recall that greek indices $\mu,\nu$ will always run between $1$ and $M$ and latin indices $i,j$ between $1$ and $N$ . The vector ${\textbf{X}}\in\mathbb{R}^{N}$ is assumed to be drawn with i.i.d. coordinates $\{X_{i}\}_{i=1}^{N}$ according to a prior $P_{X}$ with zero mean and variance $\rho>0$ .

Compressed Sensing, the Gaussian channel case

Compressed Sensing [Don06] arises as a particular case of channel distribution in eq. (105), in which the channel is taken to be a Gaussian distribution with zero mean and variance $\Delta$ . Equivalently, it can be formulated as:

[TABLE]

Our aim is to infer the vector X from these observations. In this equation we modeled the noise by a standard Gaussian variable $z_{\mu}$, the strength of the noise being $\Delta>0$. In [KMS+12] the authors have considered a subclass of matrices $F$, namely i.i.d. matrices. We follow here the same probabilistic inference approach, as we aim to study this problem by sampling from the following distribution:

[TABLE]

As the parameters $(P_{X},\Delta)$ of the signal are known, we could use them in our model, a setting which is known as the Bayes-optimal setting. Using the matrix $J=-F^{\intercal}F$ (which follows Model S), we can rewrite the posterior distribution, up to a normalization, as:

[TABLE]

Defining $\beta\equiv\Delta^{-1}$ it becomes clear that this can be written as a Gibbs-Boltzmann distribution of the model we studied in Sec. 3.2.1, with $J=-F^{\intercal}F$ and $h_{i}=-\sum_{\mu}F_{\mu i}Y_{\mu}$ . Assuming that the variance variables at the maximum of the free entropy are homogeneous ( $v_{i}=v$ ) we can use eq. (60) to directly obtain the free entropy of this problem as a function of $\mathcal{R}_{J}$ , the $\mathcal{R}$ -transform of the asymptotic spectrum of the $J$ matrix:

[TABLE]

We postpone the analysis of the corresponding fixed point equations to our algorithmic discussion in Sec. 4.1 and Sec. 4.2.

Generic channel distributions

We now turn to generic $P_{\rm out}$ distributions. We assume that both $P_{\rm out}$ and $P_{X}$ are known (this is the Bayes-optimal setting, known in statistical physics as the Nishimori line), so that we can use them in the posterior distribution:

[TABLE]

from which we will sample to obtain an estimate of X. While in the compressed sensing setting $\beta=\Delta^{-1}$ naturally played the role of an inverse temperature, in the general setting of eq. (105) there is a priori no natural parameter in which to perform a Plefka expansion. As it turns out, one can introduce an auxiliary parameter in terms of which we will perform the expansion, similarly to what is done in [AFP16, Alt18]. Introducing the usual Lagrange parameters to fix the mean and variance of $\{x_{i}\}$, we obtain the free entropy:

[TABLE]

in which we introduced an action $S[{\textbf{x}}]$ :

[TABLE]

As before $\{\lambda_{i},\gamma_{i}\}$ are Lagrange parameters used to enforce the conditions $\braket{x_{i}}=m_{i}$ and $\braket{x_{i}^{2}}=v_{i}+m_{i}^{2}$. Introducing an auxiliary field ${\textbf{h}}\equiv{\textbf{F}}{\textbf{x}}\in\mathbb{R}^{M}$, and using the Fourier representation of the Dirac distribution, we reach:

[TABLE]

with a new effective action $S_{\rm eff}$ :

[TABLE]

The key idea is to treat x and $i\tilde{{\textbf{h}}}$ as two independent non-Gaussian fields that interact via the last (quadratic) term of eq. (114) and to perform a Plefka expansion in terms of this effective Hamiltonian, which is exactly the bipartite Hamiltonian of the general model of Sec. 3.4.1. This mapping of a generalized linear model to a bipartite model using Fourier transformation has already been successfully applied in the context of the replica method, see [Kab08a, Kab08b]. We will call $\eta$ the “inverse temperature”, that is in eq. (114) we substitute:

[TABLE]

and at the end of the expansion we will set $\eta=1$ . Similarly as for the field x we will fix the first and second moments of the field $i\tilde{{\textbf{h}}}$ as $\braket{i\tilde{h}_{\mu}}_{\eta}=f_{\mu}$ and $\braket{(i\tilde{h}_{\mu})^{2}}_{\eta}=-r_{\mu}+f_{\mu}^{2}$ , conditions that will be enforced by new Lagrange parameters $\{\omega_{\mu},b_{\mu}\}$ . Although a bit tedious, this is straightforward, and we obtain a free entropy in which we will perform a low- $\eta$ expansion:

[TABLE]

The effective action and Hamiltonian are expressed as follows:

[TABLE]

From these equations it is clear that:

$(i)$

The priors on the variables $\{x_{i}\}$ and $\{(i\tilde{h}_{\mu})\}$ decouple. The prior on $x_{i}$ is $P_{X}(x_{i})$ , while the prior distribution on $(i\tilde{h})_{\mu}$ is related to the Fourier transform of the channel distribution:

[TABLE]

$(ii)$

The interaction Hamiltonian of eq. (118) is a bipartite Hamiltonian of the type of the model we studied in Sec. 3.4.1, in terms of the variables $x_{i}$ and $(i\tilde{h}_{\mu})$ .

Points $(i)$ and $(ii)$ allow us to use the general results of Sec. 3.4.1. We can therefore directly conjecture the final form of the free entropy we were seeking:

[TABLE]

Homogeneous variances

Once again we can assume that the maximum of the free entropy written in eq. (120) will be attained for variance variables $\{v_{i},r_{\mu}\}$ that are homogeneous: they satisfy $v_{i}=v$ and $r_{\mu}=r$ . Using the resummation of eq. (3.4.1) this leads to a simplified expression for eq. (120):

[TABLE]

We will study the fixed point equations corresponding to the free entropies of eq. (120) and eq. (121) in Sec. 4.1 and Sec. 4.3.

4 Consequences for iterative algorithms

In the previous section we showed how to use the Plefka expansion to derive the single-graph free entropy of a large class of systems with pairwise interactions. One then needs to maximize this free entropy, which yields fixed point equations. Iterating these fixed point equations is in itself a challenge since different choices for the iteration scheme can lead to drastically different convergence properties. In the context of the Plefka expansion, or equivalently in the adaTAP or EC approximation, several iteration schemes for the TAP equations have been studied, see for instance [OCW16, ÇOFW16].

In parallel, message-passing algorithms have been extensively studied in the statistical physics literature. In particular the belief-propagation equations [MM09] can be shown in the large $N$ limit to reduce to a simpler algorithm called Approximate Message Passing (AMP) [DMM09], derived initially for i.i.d. coupling matrices. It is well understood that for these matrices the stationary limit of the AMP equations is directly related to the fixed point equations of the Plefka free entropy (stopping here at order $2$ in the couplings). Extending these algorithms to correlated matrices has been the subject of extensive studies [CWF14, MP17]. The generic (and appealing by its simplicity) iteration scheme called Vector Approximate Message Passing (VAMP) [RSF17] has proven to be very successful both numerically and in its theoretical justification in the case of rotationally-invariant sensing matrices. It has then been generalized to the broader class of generalized linear models with generic output channels [SRF16].

In Sec. 4.1 we describe the connection between AMP equations and the Plefka expansion in the context of generalized linear models with i.i.d. matrices, retrieving the GAMP algorithm [Ran11] and the analysis of [KMS+12]. In Sec. 4.2 we relate the VAMP algorithm to the Expectation-Consistency equations by showing that the stationary limit of the algorithm yields the TAP equations. We note that in the naïve Plefka expansion performed in terms of $\Delta^{-1}$ for the compressed sensing problem there is no clear way to iterate the TAP equations to recover an asymptotically exact algorithm. Finally, in Sec. 4.3 we extend this analysis to Generalized Linear Models with correlated matrices and the G-VAMP algorithm. In contrast with the naïve $\Delta^{-1}$-expansion in compressed sensing, the mapping to a bipartite model that we performed in Sec. 3.4.2 allows us to retrieve the stationary limit of the more general G-VAMP algorithm as our TAP equations.

4.1 Generalized Approximate Message Passing for a GLM with i.i.d. matrices

We consider in this subsection the Generalized Linear Model (GLM) of Sec. 3.4.2. We will concentrate on a particular subclass of sensing matrices, namely matrices $F\in\mathbb{R}^{M\times N}$ that are generated with i.i.d. centered standard Gaussian matrix elements $\{F_{\mu i}\}$. Note that the remarks of Sec. 5.6 (coherently with the analysis of [Ran11, BKM+19]) show that the particular distribution of the elements does not need to be Gaussian, as long as the $\{F_{\mu i}\}$ are distributed independently and identically. Generalized linear estimation with such i.i.d. matrices $F$ and generic prior and channel distributions has received a lot of attention recently. In particular Generalized Approximate Message Passing (GAMP), an algorithm first developed in [Ran11], has been shown to be optimal among all polynomial-time algorithms for this problem, see [BKM+19]. A description of the algorithm in our case can be found in eqs. (171)-(177) of [ZK16]. We state here the iterative GAMP equations with our notations:

[TABLE]

In these equations $P_{X}(\lambda_{i},\gamma_{i})$ is the probability measure with density:

[TABLE]

and $g_{\rm out}$ is defined from the channel distribution $P_{\rm out}$ as:

[TABLE]

We now turn to the TAP equations that we can derive from the extremization of the Plefka-expanded free entropy of eq. (120). Note that as $F$ is an i.i.d. matrix, all the terms with $p\geq 2$ in eq. (120) will be negligible. More discussion on this can be found in Sec. 5.6. We obtain:

[TABLE]

Recall that $\Phi_{F}(\eta=0)$ is given by eq. (3.4.1). Extremizing $\Phi_{F}(\eta=0)$ over the Lagrange parameters yields the moments conditions that we wish to enforce:

[TABLE]

On the other hand, the maximization of eq. (125) with respect to the physical parameters $\{m_{i},v_{i},f_{\mu},r_{\mu}\}$ leads to four additional equations:

[TABLE]

Combining eq. (126) and eq. (127) we see easily that the equations that one has to solve are exactly the ones of the GAMP algorithm of eq. (122) (without time indices).

The Gaussian channel case

In the case of an additive Gaussian channel with variance $\Delta$ the problem reduces to the compressed sensing problem studied in Sec. 3.4.2 and Sec. 4.2, with a Gaussian i.i.d. sensing matrix. In this case, the function $g_{\rm out}$ of eq. (124) is computable explicitly, and leads to $f_{\mu}=(\omega_{\mu}-y_{\mu})/(\Delta+b_{\mu})$ and $r_{\mu}=(\Delta+b_{\mu})^{-1}$. In this particular case one recovers the AMP algorithm of [KMS+12], which is also compatible with the VAMP algorithm of Sec. 4.2.

4.2 Vector Approximate Message Passing (VAMP) in Compressed Sensing

4.2.1 The TAP equations in Compressed sensing

We analyze here the fixed point equations for the Compressed Sensing (CS) problem, see Sec. 3.4.2 for the corresponding free entropy derivation. Our starting point is the Plefka-expanded free entropy written in eq. (109). The extremization over the Lagrange parameters $\{\lambda_{i},\gamma_{i}\}$ (which are considered at $\beta=0$ ) yields:

[TABLE]

where $(a)$ and $(b)$ respectively define the functions $F_{m}$ and $F_{v}$ . Maximizing the free entropy with respect to the physical parameters $\{m_{i},v\}$ results in the following equations (recall that $\beta=\Delta^{-1}$ ):

[TABLE]

Eq. (128b) and eq. (129b) define a set of fixed point equations that one has to solve in order to retrieve the maximum of the free entropy of eq. (109).

4.2.2 TAP equations and the fixed point of the VAMP algorithm

A remark on i.i.d. matrices

We start with a remark on the case of an i.i.d. matrix $F$. Remarkably, eqs. (128b) and (129b) are compatible with the fixed points of AMP, see eqs. (22) and (23) in [KMTZ14] with $R_{i}=-\lambda_{i}/\gamma$ and $\Sigma^{-2}=\gamma$, since in this case $\mathcal{R}_{F^{\intercal}F/\Delta}(-v)=\alpha/(\Delta+v)$, see for instance [TV04]. We now turn to the VAMP algorithm for a general rotationally invariant matrix $F$. Applying the VAMP derivation of Sec. 3.1.3 to the Compressed Sensing problem of Sec. 3.4.2, the VAMP algorithm reads (note that instead of fixing all the correlations $\braket{x_{i}x_{j}}$, we only fix the ‘diagonal’ second moments $\braket{x_{i}^{2}}$):

[TABLE]

where $F_{m}$ and $F_{v}$ were defined in eq. (128b). Note that in Compressed Sensing the matrix $\gamma_{0}^{t}+F^{\intercal}F/\Delta$ has only strictly positive eigenvalues since $\gamma_{0}^{t}\geq 0$ , so the previous iterative equations are always well defined. At the fixed point, we expect ${\textbf{m}}_{1}={\textbf{m}}_{2}={\textbf{m}}$ and $v_{1}=v_{2}=v$ . In the stationary limit eq. (132) yields:

[TABLE]

From eq. (133), one has

[TABLE]

And from eq. (132), we obtain

[TABLE]

and

[TABLE]

which gives

[TABLE]

One now recognizes easily the fixed points obtained with the Plefka expansion in Sec. 4.2.1, namely eq. (128b) and eq. (129b), with ${\bm{\lambda}}_{J}={\bm{\lambda}}$ and $\gamma_{J}=\gamma$ .
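The fixed point property $\textbf{m}_{1}=\textbf{m}_{2}$, $v_{1}=v_{2}$ can be observed on a small example. The sketch below implements the VAMP recursion in the parametrization commonly used in [RSF17] (means `r1`, `r2` and precisions `gamma1`, `gamma2`), which is not the notation of eq. (132). It assumes a Gaussian prior of variance `rho`, so that the denoiser is linear and the exact posterior mean is available for comparison. The function name, the $1/\sqrt{N}$ scaling of $F$ and all numerical values are illustrative assumptions.

```python
import numpy as np

def vamp_gaussian_prior(F, y, delta, rho, n_iter=5):
    """VAMP sketch for y = F x + sqrt(delta) z with prior x_i ~ N(0, rho)."""
    N = F.shape[1]
    gamma_w = 1.0 / delta
    FtF, Fty = F.T @ F, F.T @ y
    r1, gamma1 = np.zeros(N), 1.0
    for _ in range(n_iter):
        # Denoising step: posterior mean under the prior, given r1 = x + noise
        # of precision gamma1. For a Gaussian prior this is a linear shrinkage.
        alpha1 = gamma1 / (gamma1 + 1.0 / rho)
        m1 = alpha1 * r1
        eta1 = gamma1 / alpha1
        gamma2 = eta1 - gamma1
        r2 = (eta1 * m1 - gamma1 * r1) / gamma2
        # Linear (LMMSE) step: Gaussian likelihood with pseudo-prior N(r2, 1/gamma2).
        C = np.linalg.inv(gamma_w * FtF + gamma2 * np.eye(N))
        m2 = C @ (gamma_w * Fty + gamma2 * r2)
        alpha2 = gamma2 * np.trace(C) / N
        eta2 = gamma2 / alpha2
        gamma1 = eta2 - gamma2
        r1 = (eta2 * m2 - gamma2 * r2) / gamma1
    return m1, m2, 1.0 / eta1, 1.0 / eta2

rng = np.random.default_rng(2)
N, M, delta, rho = 50, 100, 0.5, 1.0
F = rng.normal(size=(M, N)) / np.sqrt(N)
x_true = rng.normal(scale=np.sqrt(rho), size=N)
y = F @ x_true + np.sqrt(delta) * rng.normal(size=M)

m1, m2, v1, v2 = vamp_gaussian_prior(F, y, delta, rho)

# Exact Gaussian posterior, available only because the prior is Gaussian.
precision = F.T @ F / delta + np.eye(N) / rho
m_exact = np.linalg.solve(precision, F.T @ y / delta)
v_exact = np.trace(np.linalg.inv(precision)) / N

print(np.max(np.abs(m1 - m2)), abs(v1 - v2))
```

With a Gaussian prior the recursion reaches its fixed point after two passes and the two estimates coincide with the exact posterior mean. For a non-Gaussian prior the denoising step would be replaced by the corresponding posterior mean and its average derivative.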

A remark on iterating the TAP equations in the i.i.d. case

Note that in the i.i.d. case (Sec. 4.1), doing the Plefka expansion in terms of the $\eta$ parameter after having mapped the GLM to a bipartite problem allows us not only to retrieve the fixed point of the GAMP algorithm (and even of G-VAMP for non-i.i.d. matrices, as we will see in Sec. 4.3), but also to find a simple iteration scheme of the TAP equations that exactly yields the GAMP algorithm. We insist that this is not true when making the correspondence of the VAMP algorithm with the Plefka expansion in $\Delta^{-1}$ for compressed sensing with an i.i.d. matrix. This underlines one of the possible limitations of the EC, adaTAP and Plefka methods for these problems, as iterating the TAP equations with an algorithmic scheme that guarantees convergence is a very involved task, while the VAMP derivation provides an iteration scheme of the equations.

4.3 Generalized Vector Approximate Message Passing (G-VAMP) for Generalized Linear Models

We focus in this section on Generalized Linear Models with a correlated matrix $F$ that satisfies rotation invariance (Model R). We first derive the TAP equations from the Plefka expansion we performed in Sec. 3.4.2, before stating the G-VAMP algorithm for this problem following [SRF16]. We then analyze how the stationary limit of G-VAMP is equivalent to these TAP equations.

4.3.1 The TAP equations from the Plefka expansion

Recall that the Plefka-expanded free entropy was computed in Sec. 3.4.2. Following the assumptions of the VAMP and G-VAMP algorithms [SRF16, RSF17] we assume that the variances $\{v_{i},r_{\mu}\}$ are homogeneous, that is $r_{\mu}=r$ and $v_{i}=v$ . We can then use the resummed expression of the Plefka free entropy expressed in eq. (121). We first extremize this expression with respect to the Lagrange parameters $\{\lambda_{i},\gamma_{i},\omega_{\mu},b_{\mu}\}$ and we obtain an equivalent expression to eq. (128b). We reach more precisely:

[TABLE]

Recall the definitions of $P_{X}(\lambda,\gamma)$ and $g_{\rm out}(y,\omega,b)$ from eq. (123) and eq. (124). The remaining equations are obtained by maximizing eq. (121) with respect to the physical parameters. We make use of the Jacobi formula for a symmetric positive definite matrix $J\in{\cal S}_{N}^{++}$ : $\frac{\partial}{\partial J_{ij}}\log\det J=(J^{-1})_{ij}$ . We reach:

[TABLE]
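The Jacobi formula used above can be checked by finite differences. In this sketch each matrix entry is perturbed independently, and the symmetric positive definite test matrix is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(3)
N, eps = 6, 1e-6

# Arbitrary symmetric positive definite matrix (illustrative).
A = rng.normal(size=(N, N))
J = A @ A.T + N * np.eye(N)

def logdet(mat):
    # slogdet returns (sign, log|det|); the sign is +1 for the matrices used here
    return np.linalg.slogdet(mat)[1]

# Central finite differences, perturbing one entry J_ij at a time.
grad_fd = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        E = np.zeros((N, N))
        E[i, j] = eps
        grad_fd[i, j] = (logdet(J + E) - logdet(J - E)) / (2 * eps)

grad_exact = np.linalg.inv(J)  # Jacobi formula: d log det J / d J_ij = (J^{-1})_ij
print(np.max(np.abs(grad_fd - grad_exact)))
```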

Remark: Additive Gaussian channel

In the case of an additive Gaussian channel with variance $\Delta$ we find $r=(\Delta+b)^{-1}$ , which gives $\zeta=\Delta$ and $\gamma=\mathcal{R}_{F^{T}F/\Delta}(-v)$ . We thus coherently recover the TAP equations for the compressed sensing problem (see Sec. 4.2.1) even though these equations were derived with a “naïve” Plefka expansion in powers of $\beta\equiv\Delta^{-1}$ .
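For an i.i.d. matrix the quantity $\mathcal{R}_{F^{T}F/\Delta}(-v)$ has the closed form $\alpha/(\Delta+v)$ quoted in Sec. 4.2.2, which can be checked on a finite-size sample. The sketch below assumes entries of variance $1/N$ and the convention $\mathcal{R}(g(z))=z-1/g(z)$, with $g(z)=\int\rho(\mathrm{d}\lambda)/(z-\lambda)$ the Stieltjes transform. Both are assumptions made here for the check and should be compared with the conventions of Appendix C. The helper names and numerical values are illustrative.

```python
import numpy as np

def stieltjes(z, eigs):
    """Empirical Stieltjes transform g(z) = (1/N) sum_i 1 / (z - lambda_i)."""
    return np.mean(1.0 / (z - eigs))

def r_transform_at(x, eigs, z_lo=-1e6, z_hi=-1e-12, n_steps=100):
    """R(x) for x < 0 and a positive semi-definite spectrum: solve g(z) = x
    for z < 0 by bisection, then use R(x) = z - 1/x."""
    for _ in range(n_steps):
        z_mid = 0.5 * (z_lo + z_hi)
        # g is decreasing in z below the spectrum
        if stieltjes(z_mid, eigs) > x:
            z_lo = z_mid
        else:
            z_hi = z_mid
    return 0.5 * (z_lo + z_hi) - 1.0 / x

rng = np.random.default_rng(4)
N, M = 500, 1000
alpha, delta, v = M / N, 0.5, 0.2   # requires v < delta / (alpha - 1) here

F = rng.normal(size=(M, N)) / np.sqrt(N)   # i.i.d. entries of variance 1/N (assumption)
eigs = np.linalg.eigvalsh(F.T @ F / delta)

r_numeric = r_transform_at(-v, eigs)
r_closed_form = alpha / (delta + v)
print(r_numeric, r_closed_form)
```

The two values agree up to finite-size corrections, which shrink as $N$ grows at fixed $\alpha$.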

4.3.2 The G-VAMP algorithm for Generalized Linear Models

With reasoning similar to that used to derive the VAMP algorithm for a symmetric pairwise model, we can write a VAMP algorithm for a bipartite model. We do not describe its full derivation here, and we simply report the G-VAMP algorithm for the GLM as stated in [SRF16]. We define a set of functions:

[TABLE]

The full algorithm then amounts to iterating the following equations:

[TABLE]

4.3.3 TAP equations and fixed points of G-VAMP

We want to see if the stationary limit of G-VAMP, that is the G-VAMP equations without time indices, is related to the TAP equations. At the fixed points of the G-VAMP algorithm written in eq. (142), one expects the following equalities to take place: ${\textbf{m}}_{1}={\textbf{m}}_{2}={\textbf{m}}$ , ${\textbf{z}}_{1}={\textbf{z}}_{2}={\textbf{z}}$ , $v_{1}=v_{2}=v$ and $\kappa_{1}=\kappa_{2}=\kappa$ . We start from the TAP equations, eq. (139) and eq. (140f), and we will try to recover every equation in eq. (142).

$\bullet$

From eq. (140f) and eq. (141d) we can write

[TABLE]

which can be identified with eq. (142h), with $b=\tau_{J}^{-1}$ and $\zeta=\tau_{0}^{-1}$ .

$\bullet$

Using eq. (141d) we write eq. (140d) as

[TABLE]

Finally from eq. (143) we obtain

[TABLE]

which is compatible with the second part of eq. (142g), with $\zeta=\tau_{0}^{-1}$ and $\zeta^{\prime}=\gamma_{0}$ .

$\bullet$

Eq. (140c) and eq. (140e) are equivalent to the second parts of eq. (142e) and eq. (142f), with $\zeta^{\prime}=\gamma_{0}$ , $\zeta=\tau_{0}^{-1}$ and $\gamma=\gamma_{J}$ .

$\bullet$

We write eq. (142e) as

[TABLE]

and using that $F{\textbf{m}}={\textbf{z}}$ , as well as eq. (142b) and eq. (142d), we arrive at

[TABLE]

which is exactly eq. (140a) with $\omega=\omega_{J}$ , ${\bm{\lambda}}=-{\textbf{r}}_{J}\gamma_{J}$ and $\tau_{J}=b^{-1}$ .

$\bullet$

Finally we note that eq. (140b) at the fixed point is nothing but ${\textbf{z}}=F{\textbf{m}}$ , which gives eq. (142g).

All these relations show the equivalence between the stationary limit of the G-VAMP algorithm of [SRF16] and the (TAP) maximization equations of the free entropy that we derived with our Plefka expansion in Sec. 3.4.2.

5 The diagrammatics of the Plefka expansion

The goal of this section is to make precise how the different diagrams arising in our Plefka expansions in Sec. 3 can be computed. Recall that for symmetric random matrices $J$ we construct diagrams as described in Fig. 3.

For instance the diagram depicted in Fig. 3(a) is equal to:

[TABLE]

The perturbation order of any diagram is equal to its number of edges, since each of them represents a factor $J_{ij}$ . In this whole section we will only consider connected diagrams (unless stated otherwise). The structure of the section is the following:

$\bullet$

In Sec. 5.1 we prove a first rigorous result on the ‘simple cycles’ arising in the Plefka expansion of Sec. 2, namely we study these diagrams in expectation over $J$ and show a weaker version of Theorem 1.

$\bullet$

In Sec. 5.2 we extend this study to all possible diagrams, in expectation over $J$ .

$\bullet$

In Sec. 5.3 we show how the results of Sec. 5.1 and Sec. 5.2 can be extended to study the second moments of these diagrams, and use it to show concentration results. This will in particular imply the full statement of Theorem 1.

$\bullet$

In Sec. 5.4 we explain how to handle the higher-order moments that can appear as additional factors in these diagrams for the statistical models studied in Sec. 3.

$\bullet$

In Sec. 5.5 we explain how to generalize all these techniques and results to diagrams made of rectangular matrices, that arise in the Plefka expansion for bipartite models.

$\bullet$

Finally, in Sec. 5.6 we show that if one considers an i.i.d. coupling matrix, all the diagrams of order greater than $3$ will not contribute in the thermodynamic limit and that one can effectively consider the distribution of the matrix elements to be Gaussian.

Some technicalities, as well as side results and generalizations of these diagrammatics for Hermitian matrices and diverging-size diagrams, which are not directly useful for our expansions, are detailed in Appendix D. We finally note that some of our results are similar to the independent work of [BBJ19] that was recently brought to our attention.

5.1 A weaker version of Theorem 1

We will consider the random matrix ensemble defined by Model S. In the following, $J\in{\cal S}_{N}$ is a random matrix from this ensemble. Recall as well the random matrix tools defined in Appendix C, in particular the free cumulants $\{c_{p}(\rho_{D})\}$ . We first show a weaker version of Theorem 1:

Theorem 2 (Expectation of simple cycles and free cumulants).

For $J$ following Model S, for any $p\geq 1$ , and any set of pairwise distinct indices $i_{1},\cdots,i_{p}\in\{1,\cdots,N\}$ , one has:

[TABLE]

A stronger result actually holds: we only need to average over $O$ to obtain the result:

[TABLE]

This last equality is true a.s. with respect to the law of $D$ .

Note that in the Plefka expansions we perform in Sec. 2.1.2 and Sec. 3.2 we consider sums of eq. (149) over all pairwise distinct indices. The expectation of these sums over $O$ is an immediate consequence of Theorem 2:

[TABLE]
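As a numerical sanity check of this statement (a minimal sketch, not part of the proof), one can sample a single matrix $J=ODO^{\intercal}$ with $O$ Haar-distributed and compare the sums over pairwise distinct indices for $p=2$ and $p=3$ with the free cumulants $c_{2}=m_{2}-m_{1}^{2}$ and $c_{3}=m_{3}-3m_{1}m_{2}+2m_{1}^{3}$ , where $m_{k}$ are the empirical moments of the spectrum. The spectrum $d_{a}=3(a/N)^{2}$ , the size $N=400$ , the $1/N$ normalization of the sums and all function names are illustrative assumptions.

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix from the Haar measure (QR with sign fix)."""
    g = rng.standard_normal((n, n))
    q, r = np.linalg.qr(g)
    # Rescaling each column by the sign of the diagonal of R removes the
    # sign ambiguity of the QR decomposition and yields a Haar sample.
    return q * np.sign(np.diag(r))

def simple_cycle_sums(J):
    """(1/N) * sum over pairwise distinct indices of the 2- and 3-cycles of J."""
    n = J.shape[0]
    diag = np.diag(J)
    J2 = J @ J
    # p = 2: all pairs (i, j) minus the coincidence i = j.
    cyc2 = (np.trace(J2) - np.sum(diag**2)) / n
    # p = 3: inclusion-exclusion over the coincidences i=j, j=k, k=i.
    cyc3 = (np.trace(J2 @ J) - 3.0 * np.sum(diag * np.diag(J2))
            + 2.0 * np.sum(diag**3)) / n
    return cyc2, cyc3

rng = np.random.default_rng(0)
N = 400
d = 3.0 * np.linspace(0.0, 1.0, N) ** 2      # assumed (skewed) spectrum of D
O = haar_orthogonal(N, rng)
J = (O * d) @ O.T                            # J = O D O^T

m1, m2, m3 = (np.mean(d**k) for k in (1, 2, 3))
c2_free = m2 - m1**2                         # second free cumulant
c3_free = m3 - 3 * m1 * m2 + 2 * m1**3       # third free cumulant
cyc2, cyc3 = simple_cycle_sums(J)
print(f"p=2: cycle sum {cyc2:.4f}  vs  c_2 = {c2_free:.4f}")
print(f"p=3: cycle sum {cyc3:.4f}  vs  c_3 = {c3_free:.4f}")
```

The two columns should agree up to corrections that vanish as $N$ grows. Only the orthogonal matrix is random here, in line with the remark that averaging over $O$ suffices.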

We now turn to the proof of Theorem 2.

Proof of Theorem 2.

A first pedestrian way to show eq. (150) for small values of $p$ is to use explicit integration of polynomials over the Haar measure of the orthogonal or unitary group, see for instance [CŚ06]. Since we aim at a generic proof we will choose a different path, leveraging HCIZ-type integrals [HC57] [IZ80], in the particular case in which one matrix has finite rank. In our setting, the computation of these integrals has been made rigorous in [GM05]. Let us denote:

[TABLE]
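Before turning to the general argument, the explicit Haar integration mentioned above can be carried out by hand for $p=2$ . The worked computation below is a sketch: it relies on the standard fourth-order moments of a Haar-distributed orthogonal matrix, and the notation $m_{k}\equiv\frac{1}{N}\sum_{a}d_{a}^{k}$ for the empirical moments of $D$ is introduced here only for illustration.

```latex
% Fourth-order Haar moments on O(N), for two distinct rows (here rows 1 and 2):
%   E[O_{1a}^2 O_{2a}^2]            =  1 / (N(N+2)),
%   E[O_{1a} O_{2a} O_{1b} O_{2b}]  = -1 / ((N-1)N(N+2))     for a \neq b.
\begin{align*}
\mathbb{E}_O\big[J_{12}^2\big]
 &= \sum_{a,b} d_a d_b\, \mathbb{E}\big[O_{1a}O_{2a}O_{1b}O_{2b}\big] \\
 &= \frac{N m_2}{N(N+2)} - \frac{N^2 m_1^2 - N m_2}{(N-1)N(N+2)} \\
 &= \frac{N}{(N-1)(N+2)}\,\big(m_2 - m_1^2\big).
\end{align*}
% Hence N E_O[J_{12}^2] -> m_2 - m_1^2 = c_2(\rho_D) as N -> \infty.
```

The limit only requires the empirical moments of $D$ to converge, which is consistent with the fact that the average over $O$ alone is sufficient.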

In order to simplify the following calculation, we assume that $(i_{1},\cdots,i_{p})=(1,\cdots,p)$ . Since the sought result does not depend on the particular choice of indices (as is clear by rotational invariance), this entails no loss of generality. We first note that the cases $p=1$ and $p=2$ are trivial to show by an explicit computation, so we will assume $p\geq 3$ in the following. One can rewrite eq. (151) as:

[TABLE]

in which we denoted ${\textbf{b}}\equiv(b_{1},\cdots,b_{p})$ and $M({\textbf{b}})\in{\cal S}_{N}$ the following symmetric block matrix of rank $p$ :

[TABLE]

in which $M_{1}({\textbf{b}})\in{\cal S}_{p}$ with:

[TABLE]

Now we can apply Theorem 2 of [GM05]. Recall that $G_{\rho_{D}}$ is (up to a factor) the integrated $\mathcal{R}$ -transform of $\rho_{D}$ . We obtain:

[TABLE]

Let us denote $Z({\textbf{b}})\equiv\exp\left\{\frac{N}{2}\sum_{n=1}^{\infty}\frac{c_{n}(\rho_{D})}{n}\mathrm{Tr}\,[M({\textbf{b}})^{n}]\right\}$ . Note that differentiating $Z({\textbf{b}})$ with respect to $b_{1}$ yields (by cyclicity of the trace):

[TABLE]

with elementary symmetric matrices $(E_{ab})_{ll^{\prime}}\equiv\delta_{l,a}\delta_{l^{\prime},b}+\delta_{l^{\prime},a}\delta_{l,b}$ . These matrices are such that for each $a<b$ and $c<d$ , $E_{ab}E_{cd}=0$ if $\{c,d\}\cap\{a,b\}=\emptyset$ . The only way to obtain a matrix of non-zero trace with a product of matrices $\{E_{ab}\}$ is to have a cycle structure in the indices of the matrices. Recall that the indices are symmetric, that is $E_{ba}=E_{ab}$ . For instance:

[TABLE]
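The trace rules for the matrices $\{E_{ab}\}$ are easy to verify numerically. The snippet below is a small illustration in which the helper name `E` and the matrix size $4$ are arbitrary choices: it checks that a product with disjoint index pairs vanishes, that an open chain is traceless, and that a closed cycle has a non-zero trace.

```python
import numpy as np

def E(a, b, n=4):
    """Symmetric elementary matrix with ones at (a, b) and (b, a); 1-based, a != b."""
    m = np.zeros((n, n))
    m[a - 1, b - 1] = 1.0
    m[b - 1, a - 1] = 1.0
    return m

# Disjoint index pairs {1,2} and {3,4}: the product itself vanishes.
disjoint_product = E(1, 2) @ E(3, 4)
# Open chain 1-2-3: the product is non-zero but traceless.
open_chain_trace = np.trace(E(1, 2) @ E(2, 3))
# Closed cycle 1-2-3-1: the indices form a cycle and the trace is non-zero.
closed_cycle_trace = np.trace(E(1, 2) @ E(2, 3) @ E(3, 1))
# Chain 1-2-3-4 that never returns to vertex 1: traceless again.
broken_cycle_trace = np.trace(E(1, 2) @ E(2, 3) @ E(3, 4))

print(disjoint_product.any(), open_chain_trace, closed_cycle_trace, broken_cycle_trace)
```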

Because of this and the fact that $M({\textbf{b}}=0)=0$ , it is easy to see that the only terms that survive after taking all the successive derivatives and setting ${\textbf{b}}=0$ are the derivatives of the right-hand side of eq. (157), and not other derivatives of $Z({\textbf{b}})$ . Let us analyze what differentiating this term yields. As we saw, taking the derivative with respect to $b_{1}$ yields a matrix $E_{12}$ . Differentiating with respect to $b_{2}$ then yields a matrix $E_{23}$ . Note that a priori, one would have:

[TABLE]

However, the following differentiations with respect to $b_{3},\cdots,b_{p}$ will never yield a matrix $E_{ab}$ with one of the indices being equal to $2$ . So in eq. (158) it is clear that only two terms of the sum, the term $k=0$ and $k=n-2$ , will yield a non-zero contribution. In the end, after taking all the $p$ successive derivatives, only two terms will remain, which correspond to the two possible orientations of the simple cycle:

[TABLE]

using that $M(0)=0$ , which finishes the proof. ∎

5.2 The expectation of generic diagrams

Following the remarks of [GY91] and [PP95], we can separate some of the diagrams constructed as in Fig. 3 into three disjoint categories or types:

T.1

Non-Eulerian diagrams. By definition, a diagram is Eulerian if one can construct a cyclic path in the graph that goes through each edge exactly once. It is a classic result of graph theory (the Euler–Hierholzer theorem) that these graphs are exactly the connected graphs with even degree at each vertex. For instance, the graph depicted in Fig. 4(a) is not Eulerian, whereas the one of Fig. 4(b) is Eulerian.

T.2

Eulerian diagrams that are strongly irreducible but not simple cycles. By strongly irreducible, we mean [GY91] that one cannot make the diagram disconnected by removing any single vertex. For instance, the diagram of Fig. 4(b) is strongly irreducible, whereas the diagram of Fig. 4(c) is not.

T.3

Cactus diagrams. These diagrams, like the one of Fig. 4(c), are trees made of simple cycles joining at their vertices. Among them are of course the simple cycles.
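The three definitions above translate directly into elementary graph tests. The following sketch classifies a connected diagram given as a list of edges (a repeated pair encodes a multiple edge). The function names and the four example graphs are hypothetical illustrations and are not claimed to coincide with the diagrams of Fig. 4. For brevity the sketch does not check whether an Eulerian diagram that fails strong irreducibility is actually a cactus.

```python
from collections import Counter, defaultdict

def _connected(vertices, edges, removed=None):
    """Check connectivity of the multigraph after deleting vertex `removed`."""
    verts = [v for v in vertices if v != removed]
    if not verts:
        return True
    adj = defaultdict(set)
    for a, b in edges:
        if removed not in (a, b):
            adj[a].add(b)
            adj[b].add(a)
    seen, stack = {verts[0]}, [verts[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(verts)

def classify(edges):
    """Return the type of a connected diagram (no self-loops assumed)."""
    vertices = sorted({v for e in edges for v in e})
    degree = Counter(v for e in edges for v in e)
    if not _connected(vertices, edges):
        raise ValueError("only connected diagrams are considered")
    # Euler-Hierholzer: a connected graph is Eulerian iff all degrees are even.
    if any(deg % 2 for deg in degree.values()):
        return "T.1 (non-Eulerian)"
    # A connected graph in which every vertex has degree 2 is a simple cycle.
    if all(deg == 2 for deg in degree.values()):
        return "simple cycle (T.3)"
    # Strong irreducibility: no single vertex removal disconnects the diagram.
    if all(_connected(vertices, edges, removed=v) for v in vertices):
        return "T.2 (Eulerian, strongly irreducible, not a simple cycle)"
    return "Eulerian but not strongly irreducible (T.3 if it is a cactus)"

examples = {
    "open path": [(1, 2), (2, 3)],
    "triangle": [(1, 2), (2, 3), (3, 1)],
    "doubled triangle": [(1, 2), (1, 2), (2, 3), (2, 3), (3, 1), (3, 1)],
    "two triangles sharing a vertex": [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 3)],
}
for name, edges in examples.items():
    print(f"{name}: {classify(edges)}")
```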

We are not interested in Eulerian diagrams that are not strongly irreducible. Indeed, as argued in [GY91], only strongly irreducible diagrams will appear in the Plefka expansions. This is an important hypothesis of the Plefka expansion, somehow a bit hidden by the formalism. We give precise descriptions of the large $N$ limit of the expectation of all these diagrams in the following. When we write “expectation” we will always mean expectation over the orthogonal matrix of Model S. More precisely, we will show:

$(i)$

All non-Eulerian graphs of type T.1 have a vanishing expectation in the $N\to\infty$ limit.

$(ii)$

All strongly irreducible diagrams of type T.2 also have a vanishing expectation in the $N\to\infty$ limit.

$(iii)$

We already showed that the expectation of a simple cycle of size $p$ converges to the $p$ -th free cumulant of $\rho_{D}$ in Sec. 5.1. We show that the expectation of a cactus diagram converges to the product of the expectations of all its constituent simple cycles. For instance, for the diagram $\mathcal{C}$ of Fig. 4(c) we obtain that its expectation converges to:

[TABLE]

Results $(i)$ and $(ii)$ are justified in Sec. 5.2.1, and are directly useful for our diagrammatic expansions. Result $(iii)$ on the other hand is a side result that is not used in our expansions, as we argued that only strongly irreducible diagrams come up in our expansions [GY91]. It is justified in Sec. 5.2.2.

5.2.1 Eulerian diagrams, strongly irreducible diagrams and simple cycles

Let us consider a connected diagram $G$ with $V$ vertices and $E$ edges. We will show that:

$\bullet$

If $G$ is not Eulerian, its expectation goes to $0$ as $N\to\infty$ .

$\bullet$

If $G$ is Eulerian and strongly irreducible, but is not a simple cycle, its expectation also goes to $0$ as $N\to\infty$ .

Once averaged over the orthogonal matrices, the permutation invariance of the indices allows us to write

[TABLE]

in which the $\epsilon_{ll^{\prime}}$ are positive integers such that $\sum_{l<l^{\prime}}\epsilon_{ll^{\prime}}=E$ . We can now use the results of [GM05], as we did in Sec. 5.1, to write this diagram as (in the $N\to\infty$ limit):

[TABLE]

In this expression, $M({\textbf{b}})_{ll^{\prime}}\equiv b_{ll^{\prime}}=M({\textbf{b}})_{l^{\prime}l}$ for $l<l^{\prime}$ , and the diagonal is zero: $M({\textbf{b}})_{ll}=0$ . Exactly as in Sec. 5.1, the elementary matrices $\{E_{ll^{\prime}}\}$ will appear in eq. (162) by successive derivatives of the exponential, using the fact that $\frac{\partial}{\partial b_{ll^{\prime}}}M({\textbf{b}})=E_{ll^{\prime}}$ and then using $M({\textbf{b}}=0)=0$ . As we explained in Sec. 5.1, the trace of a product of the $\{E_{ll^{\prime}}\}$ matrices is non-zero if and only if the indices in the product form a cycle. Moreover, as is clear in eq. (162), the terms corresponding to the decomposition of $\mathbb{E}\,G$ into the maximum number of such cycles will dominate in the large $N$ limit, as each differentiation of the exponential term adds a multiplicative factor $N$ (to avoid any confusion, we emphasize that this “decomposition” of $\mathbb{E}\,G$ is a decomposition of the graph representing $\mathbb{E}\,G$ ). These two facts together imply that:

$\bullet$

If $G$ is not Eulerian, as in Fig. 4(a), its expectation will be $0$ in the limit $N\to\infty$ since it is not possible to decompose it into disjoint cycles by definition.

$\bullet$

If $G$ is Eulerian, strongly irreducible, but not a simple cycle, the dominant contribution to $\mathbb{E}\,G$ in eq. (162) will arise from decomposing the graph $G$ into simple cycles, as this decomposition maximizes the number of cycles, and we already showed that each simple cycle has an $\mathcal{O}_{N}(1)$ contribution. For the graph of Fig. 4(b), we show two such possible decompositions in Fig. 5.

Given the remarks above we assume now that $G$ is Eulerian and strongly irreducible. Let us denote $P$ the maximal number of simple cycles in such a decomposition of the graph $G$ . Then one can see that the scaling of eq. (162) will be:

[TABLE]

One can easily be convinced that for a strongly irreducible diagram $G$ we have $V+P-E-1\leq 0$ , and we have equality only if $G$ is a simple cycle. This implies that all the strongly irreducible diagrams that are not simple cycles and that appear in our Plefka expansions in Sec. 2 and Sec. 3 will not contribute in the $N\to\infty$ limit.
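To make the counting concrete, here is a worked example on two small diagrams, reading the exponent of the scaling above as $V+P-E-1$ . The “doubled triangle” (three vertices, each pair joined by two edges) is a hypothetical example chosen for illustration and is not claimed to be one of the diagrams of Fig. 4.

```latex
% Simple cycle of size p:  V = p,  E = p,  P = 1.
\[
V + P - E - 1 = p + 1 - p - 1 = 0
\quad\Longrightarrow\quad \mathbb{E}\,G = \mathcal{O}_N(1).
\]
% Doubled triangle:  V = 3,  E = 6.  Every simple cycle uses at least two
% edges, so P <= 3, and the three 2-cycles formed by the double edges
% realize P = 3.
\[
V + P - E - 1 = 3 + 3 - 6 - 1 = -1
\quad\Longrightarrow\quad \mathbb{E}\,G = \mathcal{O}_N(N^{-1}).
\]
```

The doubled triangle is Eulerian and strongly irreducible but not a simple cycle, so under this counting it is suppressed by a factor $1/N$ , as the general inequality predicts.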

5.2.2 Cactus diagrams

As a side result, although it’s not directly useful for our Plefka expansions, we show that we can compute the large $N$ limit of any “cactus” [PP95] diagram (like the one of Fig. 4(c)) as a function of the free cumulants of $\rho_{D}$ . The argument is straightforward and uses the same technique as in Sec. 5.2.1. Consider a cactus diagram $G$ with $V$ vertices and $E$ edges. One can write the same equation as eq. (162):

[TABLE]

Again, the dominant contribution is obtained by decomposing $G$ into as many simple cycles as possible. For a cactus diagram it is easy to see that there is only one such decomposition, which corresponds to its natural decomposition into its constituent simple cycles, and that the number of such cycles is $P=E-V+1$ (each simple cycle has as many edges as vertices, and in the tree structure each cycle after the first shares exactly one vertex with the cycles already placed, so that $V=E-(P-1)$ ). Let us denote $\{r_{1},\cdots,r_{P}\}$ the numbers of vertices of these $P$ simple cycles. The dominant contribution corresponds to differentiating $P$ times inside the exponential of eq. (163). Using exactly the argument of Sec. 5.1 for each of the $P$ simple cycles we finally obtain:

[TABLE]

This justifies the point $(iii)$ that we gave in the introductory part of the section: the expectation of a cactus diagram decouples into the product of the expectations of its constituent simple cycles.

5.3 Concentration of the diagrams: a second moment analysis

Using our first moment results of Sec. 5.1 and Sec. 5.2, we will show the following results:

$(i)$

If ${\cal C}_{p}$ is the simple cycle of order $p$ , then we have that $\lim_{N\to\infty}{\cal C}_{p}\overset{L^{2}}{=}c_{p}(\rho_{D})$ , which implies directly Theorem 1 and thus ends its proof. Moreover, if $G$ is a cactus diagram then it converges in $L^{2}$ to the products of the free cumulants corresponding to its constituent simple cycles.

$(ii)$

If $G$ is of the type T.1 or T.2, we have:

[TABLE]

This implies that the diagram $G$ will be negligible in the $N\to\infty$ limit.

Note that following the arguments of [GY91], one can convince oneself that only strongly irreducible diagrams will contribute in general to the expansion in our models. Together with point $(ii)$ this shows in more detail why only the simple cycles contribute in our Plefka expansions, like in eq. (2.1.2) for the spherical model of Sec. 2.1. In order to show $(i)$ and $(ii)$ we will establish the following fact. Consider a diagram $G$ with $V$ vertices and $E$ edges, of any of the types T.1, T.2, or T.3. Then one has:

[TABLE]

In this formula, the sum $\sum_{\alpha}\mathcal{C}_{\alpha}$ represents all the possible diagrams that one can obtain by ‘gluing’ together two replicas of the diagram $G$ . Indeed, one can write the generic form of a diagram $G$ as:

[TABLE]

in which the integers $\epsilon_{ll^{\prime}}$ verify $\sum_{l<l^{\prime}}\epsilon_{ll^{\prime}}=E$ . Thus one has:

[TABLE]

In this expression, one can see that two types of terms have to be taken into account:

$\bullet$

A term for which all indices $\{i_{1},\cdots,i_{V},j_{1},\cdots,j_{V}\}$ are pairwise distinct. Diagrammatically, this corresponds to a graph with two disconnected components that are identical and equal to $G$ . Therefore, one can repeat the arguments of Sec. 5.1 and Sec. 5.2 straightforwardly. Indeed, as all the indices are distinct, the decomposition of this diagram into the maximum number of simple cycles will be two copies of the maximal decomposition of $G$ . This yields that this term is equal in the $N\to\infty$ limit to $(\mathbb{E}\,G)^{2}$ .

$\bullet$

Terms for which there is at least one equality of the type $i_{l}=j_{l^{\prime}}$ for $1\leq l,l^{\prime}\leq V$ . Such a term thus corresponds to a diagram with a single connected component and constructed by ‘gluing’ some of the vertices of two identical copies of $G$ . Since these diagrams have a single connected component, they carry a single $\frac{1}{N}$ factor, which explains the term $\frac{1}{N}\sum_{\alpha}\mathbb{E}\,\mathcal{C}_{\alpha}$ in eq. (166), if we denote $\mathcal{C}_{\alpha}$ each of these possible terms.

We give a schematic representation of eq. (166) for a simple cycle in Fig. 6.

It is now possible to see why it implies our results $(i)$ and $(ii)$ . Indeed, all the diagrams $\mathcal{C}_{\alpha}$ have an expectation that is $\mathcal{O}_{N}(1)$ by the first moment analysis we performed in Sec. 5.1 and Sec. 5.2. So very generically, for every kind of diagram we described we have:

[TABLE]

Given our previous computations of the first moments this implies results $(i)$ and $(ii)$ .
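The concentration can also be observed numerically. The sketch below estimates the variance of the simple cycle of order $2$ over independent draws of the Haar matrix $O$ for two sizes. The spectrum $d_{a}=3(a/N)^{2}$ , the sizes $N=50$ and $N=200$ , the number of samples and the function name are illustrative assumptions. The sketch only checks that the variance decreases with $N$ , which is what the bound above requires.

```python
import numpy as np

def two_cycle(n, rng):
    """(1/N) * sum_{i != j} J_ij^2 for one sample J = O D O^T with Haar O."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    O = q * np.sign(np.diag(r))                 # Haar-distributed orthogonal matrix
    d = 3.0 * np.linspace(0.0, 1.0, n) ** 2     # assumed spectrum of D
    J = (O * d) @ O.T
    # Sum over all pairs minus the diagonal terms i = j.
    return (np.sum(J**2) - np.sum(np.diag(J) ** 2)) / n

rng = np.random.default_rng(1)
variances = {}
for n in (50, 200):
    samples = np.array([two_cycle(n, rng) for _ in range(30)])
    variances[n] = samples.var()
    print(f"N = {n:4d}: mean = {samples.mean():.4f}, variance = {variances[n]:.2e}")
```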

5.4 The higher-order moments and their influence on the diagrammatics in the symmetric model

All the results of Sec. 5.1, Sec. 5.2 and Sec. 5.3 that we derived for the diagrammatics of the Plefka expansion in this context were valid for diagrams solely made out of the matrix elements $\{J_{ij}\}$ , without any additional factors. However, the diagrams of the Plefka expansions generically carry additional factors, namely the cumulants (or the moments) of the variables $x_{i}$ at $\beta=0$ , see Sec. 3.2. Recall that we denote $\kappa^{(p)}_{i}$ the cumulant of order $p$ of $x_{i}$ at $\beta=0$ . As an example, consider the diagram of Fig. 4(b). Two possible contributions to the free entropy in our Plefka expansion at order $6$ would be:

[TABLE]

Note that both these contributions are represented by the diagram of Fig. 4(b). One can now clearly see that in order to apply the diagrammatic results of Sec. 5.1, Sec. 5.2 and Sec. 5.3 to our Plefka expansion, and justify eq. (59), we need to make some additional assumptions that we detail here:

A.1

From the construction of the diagrams, odd cumulants of order greater than or equal to $3$ only appear in non-Eulerian graphs. By the results of Sec. 5.2 and Sec. 5.3 we know that these diagrams, without the moments or cumulants as factors, are negligible. We assume that the possible correlations of the higher order moments of $x_{i}$ with the matrix elements $\{J_{ij}\}$ are not strong enough to yield thermodynamically relevant corrections to the free entropy.

A.2

Eulerian strongly irreducible diagrams that are not simple cycles are negligible by our previous result. We assume that the higher order (even) moments that appear as additional factors do not change their scaling, so that they remain negligible in the thermodynamic limit.

For instance, A.2 implies that the contributions of both eq. (168) and eq. (169) are negligible in the $N\to\infty$ limit, as the diagram of Fig. 4(b) is strongly irreducible but is not a simple cycle. Concerning the simple cycles, we already know that they are not thermodynamically negligible. So we do not need to assume anything additional regarding them. Note however that in order to “resum” the free entropy of the Plefka expansion, as we did in Sec. 3.2, we will need to assume that all the variance factors appearing in these simple cycles will be the same, that is $v_{i}=v$ (at the maximum of the free entropy).

5.5 Extension to bipartite models

We detail here how we can treat the diagrams that arise in the Plefka expansion of bipartite models with pairwise interactions (such as generalized linear models) that we perform in Sec. 3.4. The structure of this section is the following:

$\bullet$

We show in Sec. 5.5.1 how we can generalize all the techniques and results already seen in the rest of Sec. 5 to diagrams constructed from a random rectangular matrix $F$ drawn from the rotationally invariant ensemble given by Model R.

$\bullet$

In Sec. 5.5.2 we transpose the assumptions of Sec. 5.4 to this bipartite case, to deal with the higher-order moments of the fields that can arise in the high-temperature Plefka expansions.

5.5.1 Generalization of the previous results to rectangular matrices

Consider a random matrix $F\in\mathbb{R}^{M\times N}$ drawn from a rotationally invariant ensemble satisfying Model R. We are interested in the limit $M,N\to\infty$ with a finite ratio $M/N\to\alpha>0$ . In the Plefka expansions performed for bipartite models in Sec. 2.2.2 and Sec. 3.4 there appear quantities that we can represent as diagrams. In this subsection, we construct diagrams as explained in Fig. 7. For instance, the diagram depicted in this figure represents the quantity:

[TABLE]

An analog of Theorem 2 of [GM05] can be stated for this setting. Let $\Sigma\in\mathbb{R}^{M\times N}$ be a matrix such that the empirical spectral distribution of $D\equiv\Sigma^{\intercal}\Sigma$ converges (almost surely) as $N\to\infty$ to a probability measure $\rho_{D}$ . Denote $\mathcal{G}_{\alpha,\rho_{D}}$ the following function:

[TABLE]

Note that this is obviously an even function of $x$ , and that $\mathcal{G}_{\alpha,\rho_{D}}(0)=0$ . The function $\mathcal{G}_{\alpha,\rho_{D}}$ stands as an analog to the integrated $\mathcal{R}$ -transform $G_{\rho_{D}}$ for this problem. Since one could expand the function $G_{\rho_{D}}$ using the free cumulants of $\rho_{D}$ , we analogously expand formally $\mathcal{G}_{\alpha,\rho_{D}}(x)$ around $x=0$ , and define the coefficients $\Gamma_{p}(\alpha,\rho_{D})$ by:

[TABLE]

Recall that for any function $f(x)$ , and any symmetric matrix $J=ODO^{\intercal}\in{\cal S}_{N}$ , one can define $f(J)\equiv Of(D)O^{\intercal}$ , with $f(D)=\mathrm{Diag}\,(\{f(d_{i})\}_{1\leq i\leq N})$ . If one can expand $f(x)=\sum_{k\geq 0}c_{k}x^{k}$ , this definition is consistent with $f(J)=\sum_{k\geq 0}c_{k}J^{k}$ . Our generalization of Theorem 2 of [GM05] is the following: consider a rectangular matrix $\Lambda\in\mathbb{R}^{M\times N}$ of finite rank $p$ . In other terms, one can write its singular value decomposition as:

[TABLE]

with $\Lambda_{p}\in\mathbb{R}^{p\times p}$ a square diagonal matrix, and $U_{0},V_{0}$ orthogonal matrices. We can now state:

[TABLE]

Note first that the right hand side of this equation can also be written as:

[TABLE]

since $\Lambda$ is of finite rank $p$ . Equality $(a)$ is true since $\mathcal{G}_{\alpha,\rho_{D}}(0)=0$ , and the spectra of $\Lambda^{\intercal}\Lambda$ and $\Lambda\Lambda^{\intercal}$ only differ by eigenvalues which are all equal to $0$ . Note also that we already derived and used this relation, for $p=1$ , when computing the free entropy of the model of Sec. 2.2, as stated in eq. (33). Equipped with the definitions of $\mathcal{G}_{\alpha,\rho_{D}}$ , $\Gamma_{p}(\alpha,\rho_{D})$ , and eq. (173), we can state the counterpart of all our previous results in this rectangular setting:

R.1

Consider a simple cycle of size $2p$ . Then it converges (in $L^{2}$ ) to $\Gamma_{p}(\alpha,\rho_{D})$ as $N\to\infty$ . More precisely we have:

[TABLE]

R.2

Any diagram $G$ that is not Eulerian will have a vanishing first and second moment as $N\to\infty$ :

[TABLE]

R.3

Any diagram $G$ that is strongly irreducible (that is, it cannot be disconnected by removing a single vertex) but not a simple cycle, as in Fig. 7, will also have a vanishing first and second moment.

R.4

If $G$ is a cactus (a tree made of simple cycles joining at vertices [PP95]) made of $r$ simple cycles of size $(2p_{1},\cdots,2p_{r})$ , we have:

[TABLE]

Since every argument needed to show points R.1 to R.4 follows from a slight modification of what we already did in Sec. 5, the rest of Sec. 5.5.1 is devoted to showing point R.1, and we leave the remaining points to the reader.

Justifying R.1

In order to show eq. (175), we proceed as in Sec. 5.1 and begin by showing:

[TABLE]

By rotation invariance of the indices we can replace the left-hand side of eq. (178) by a term without summation on the indices, and as in Sec. 5.1 we obtain at leading order in $N$ :

[TABLE]

with $M({\textbf{b}},{\textbf{c}})$ a block matrix of rank $p$ defined as:

[TABLE]

Using eq. (173), we obtain:

[TABLE]

We define the elementary matrices $(T_{ab})_{ll^{\prime}}=\delta_{al}\delta_{bl^{\prime}}$ and the symmetric elementary matrices $E_{ab}=T_{ab}+T_{ba}$ . One easily derives that $\frac{\partial^{2}}{\partial b_{1}\partial c_{1}}M({\textbf{b}},{\textbf{c}})^{\intercal}M({\textbf{b}},{\textbf{c}})=E_{12}$ . In a very similar way to what was done in Sec. 5.1, the dominant terms in eq. (179) will be given by the maximum number of differentiations of the exponential term. However, one can see that the exponential can only be differentiated once: since $M(0,0)=0$ , one would need to create cycles with the matrices $E_{ab}$ , and such a cycle can only appear if one differentiates the exponential term a single time. As in Sec. 5.1, there are two cycles that are created by the successive derivatives: $E_{12}E_{23}\cdots E_{p1}$ and $E_{21}E_{1p}\cdots E_{32}$ . These two cycles yield the dominant contribution:

[TABLE]

This shows eq. (178). The exact same arguments as the ones used in Sec. 5.3 show that we have $L^{2}$ concentration, which means:

[TABLE]

which is the point R.1 we wanted to show.

5.5.2 The higher order moments and their influence on the diagrammatics

In Sec. 3.4, we deal with diagrams which have additional factors coming from the higher order moments of the fields $x_{i}$ and $h_{\mu}$ at $\beta=0$ , while all the results R.1 to R.4 that we derived for the diagrammatics of the Plefka expansion in this context were made solely out of the matrix elements $\{F_{\mu i}\}$ , without any additional factors. We adopt the notation of Sec. 5.4 for the higher order cumulants. Exactly as in Sec. 5.4, when considering the diagram of Fig. 7, two possible contributions to the free entropy at order $8$ would be:

[TABLE]

The assumptions we need to make in order to deal with these diagrams are very similar to A.1 and A.2, and we state them here for completeness:

B.1

From the construction of the diagrams, odd moments of order greater than or equal to $3$ , like $\kappa^{(3,x)}$ , only appear in non-Eulerian graphs. By R.2 we know that these diagrams (without the moments as factors) are negligible. We assume that the possible correlations of the higher order moments with the matrix elements $\{F_{\mu i}\}$ are not strong enough to yield thermodynamically relevant corrections to the free entropy.

B.2

Eulerian strongly irreducible diagrams that are not simple cycles are negligible by R.3. We assume that the higher order (even) moments that appear as additional factors do not change their scaling, so that they remain negligible in the thermodynamic limit.

5.6 A note on i.i.d. matrices

We make here a side comment on i.i.d. rectangular matrices. We consider a random matrix $F\in\mathbb{R}^{M\times N}$ whose elements $\{F_{\mu i}\}$ are taken i.i.d., such that $\sqrt{N}F_{\mu i}$ is drawn from a given probability measure $\rho$ . We assume that $\rho$ has zero mean and finite moments of all orders. These matrices appear in our study of the GAMP algorithm in Sec. 4.1. Unless $\rho$ is a Gaussian probability measure, the matrix $F$ is not rotationally invariant, in the sense that it does not satisfy Model R. However, one can still derive strong results on the diagrammatics of $F$ . We still assume B.1 and B.2, that is we assume that the additional factors in the diagrams do not change the scaling of a negligible diagram enough to make it thermodynamically relevant. It is then easy to see that because the $\{F_{\mu i}\}$ are uncorrelated, all diagrams with order $p\geq 3$ are negligible in the $N\to\infty$ limit. The only diagram that remains in the $N\to\infty$ limit is:

[TABLE]

In particular, we can retain only this diagram and apply the results of our Plefka expansions in this case as well, despite the fact that $F$ is not rotationally invariant.
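A quick numerical illustration of this statement, under assumptions that are ours and not the text's: we take $\rho$ uniform on $[-\sqrt{3},\sqrt{3}]$ (zero mean, unit variance, non-Gaussian), $\alpha=2$ , $N=400$ , and we normalize both diagrams by $1/N$ . The sketch compares the order-$2$ quantity $\frac{1}{N}\sum_{\mu,i}F_{\mu i}^{2}$ with the simple cycle of order $4$ summed over $\mu\neq\nu$ and $i\neq j$ .

```python
import numpy as np

rng = np.random.default_rng(2)
N, alpha = 400, 2.0
M = int(alpha * N)
# sqrt(N) * F_{mu i} is uniform on [-sqrt(3), sqrt(3)]: zero mean, unit
# variance, non-Gaussian (an assumed choice of rho for this illustration).
F = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(M, N)) / np.sqrt(N)

# Order-2 quantity: (1/N) * sum_{mu, i} F_{mu i}^2, expected close to alpha.
order2 = np.sum(F**2) / N

# Order-4 simple cycle: (1/N) * sum over mu != nu and i != j of
# F_{mu i} F_{nu i} F_{nu j} F_{mu j}, computed by inclusion-exclusion.
A = F @ F.T                       # A_{mu nu} = sum_i F_{mu i} F_{nu i}
row = np.sum(F**2, axis=1)        # diagonal of F F^T   (terms with mu = nu)
col = np.sum(F**2, axis=0)        # diagonal of F^T F   (terms with i = j)
order4 = (np.sum(A**2) - np.sum(row**2) - np.sum(col**2) + np.sum(F**4)) / N

print(f"order 2: {order2:.4f} (alpha = {alpha})")
print(f"order 4 simple cycle: {order4:.4f}")
```

With independent zero-mean entries every term of the order-$4$ sum has zero expectation, so the printed value should be small and shrink further as $N$ grows, while the order-$2$ quantity stays of order one.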

Acknowledgments

The authors would like to thank Yoshiyuki Kabashima, Marylou Gabrié, Bertrand Eynard and Jorge Kurchan for many insightful discussions. This work is supported by “Investissements d’Avenir” LabEx PALM (ANR-10-LABX-0039-PALM) (EquiDystant project, L. Foini), as well as by the French Agence Nationale de la Recherche under grant ANR-17-CE23-0023-01 PAIL, the European Union’s Horizon 2020 Research and Innovation Program 714608-SMiLe, and the ERC 307087 SPARCS. Additional funding is acknowledged by AM from ‘Chaire de recherche sur les modèles et sciences des données’, Fondation CFM pour la Recherche-ENS.

Appendix A The Georges-Yedidia formalism

In this section we recall the formalism of [GY91] which allows one to systematically expand the free entropy around $\beta=0$ , at fixed values of the first and second moments of the variables. We consider here a generic Hamiltonian $H_{J}({\textbf{x}})$ , with variables $(x_{1},\cdots,x_{N})$ . We fix the first and second moments $\braket{x_{i}}_{\beta}=m_{i}$ and $\braket{(x_{i}-m_{i})^{2}}_{\beta}=v_{i}$ , using Lagrange parameters respectively denoted $\lambda_{i}(\beta)$ and $\gamma_{i}(\beta)$ . Recall that $\braket{\cdot}_{\beta}$ stands for the expectation over the Gibbs measure at inverse temperature $\beta$ , constrained by the Lagrange multipliers $\lambda_{i}$ and $\gamma_{i}$ . From now on, we will drop the $\beta$ subscript.

We introduce the operator $U$ from Appendix A of [GY91]:

[TABLE]

Then the derivative of the thermal average of any observable $O$ is given by

[TABLE]

As the Lagrange multipliers $\lambda_{i}$ and $\gamma_{i}$ have been introduced to fix the average of $x_{i}$ and its variance one has the following easy identity, valid at any $\beta$ :

[TABLE]

Moreover, given that the magnetizations $\{m_{i}\}$ and the variances $\{v_{i}\}$ do not depend on $\beta$ one has:

[TABLE]

Considering the previous results one can compute the derivative of $U$ :

[TABLE]

Equipped with these relations one can compute the derivatives of the free entropy up to fourth order. Recall that $\Phi_{J}$ is the intensive free entropy of the system. We obtain its derivatives:

[TABLE]

These relations are valid at any inverse temperature $\beta$ ! In the main sections we derive the explicit expression of the operator $U$ for our particular choice of Hamiltonian, and we will use these relations (and show how to conjecture their higher order counterparts) to compute the expansion of the free entropy around $\beta=0$ .

Appendix B Order $4$ of the Plefka expansion for Sec. 2.1.

We start from eq. (A) in Appendix A, that we consider at $\beta=0$ :

[TABLE]

For simplicity we will denote $\tilde{x}_{i}\equiv(x_{i}-m_{i})$ , so that at $\beta=0$ the $\{\tilde{x_{i}}\}$ variables are Gaussian variables with mean $\braket{\tilde{x}_{i}}=0$ and covariance $\braket{\tilde{x_{i}}\tilde{x_{j}}}=\delta_{ij}v_{i}$ . In particular eq. (21) becomes:

[TABLE]

From the calculation at order $2$ we obtain the following relation that we can represent diagrammatically:

[TABLE]

We now turn to the next term:

[TABLE]

In $(a)$ we used the Maxwell equation eq. (20), while in $(b)$ we made use of the fact that the order $2$ of the free entropy does not depend on the $m_{i}$ variables. We obtain:

[TABLE]

in which we used the Maxwell relation eq. (19) to compute $\partial^{2}_{\beta}\gamma_{i}$ . To compute $\braket{U^{2}\tilde{x}_{i}^{2}}_{0}$ , we expand:

[TABLE]

We can then use Wick’s theorem to simplify the average. There are two types of contractions (or pairings):

$\bullet$

Contractions that do not mix indices $i_{1},j_{1},i_{2},j_{2}$ with $i$ . There are $2$ such possible pairings and in $\frac{\partial^{4}\Phi_{J}}{\partial\beta^{4}}$ they give rise to the diagram $\left[N\,(\bullet\!=\!\bullet)\right]^{2}$ , where $(\bullet\!=\!\bullet)$ denotes the diagram made of two vertices joined by a double edge.

$\bullet$

Contractions that mix these indices with $i$ . They are all equivalent and there are $8$ of them, which give rise to the diagram $N\,(\bullet\!=\!\bullet\!=\!\bullet)$ , that is a chain of three vertices in which consecutive vertices are joined by a double edge.

In the end, we reach:

[TABLE]

We can finally compute the term we were seeking:

[TABLE]

Note that in this last equation we could add the hypothesis that $j\neq k$. Indeed, the term $j=k$ would give rise to the diagram $N\,\times$ [diagram: two vertices joined by four edges], which is negligible since for every $i\neq j$ one has $J_{ij}=\mathcal{O}(\frac{1}{\sqrt{N}})$ as a consequence of rotational invariance (Model S). We finally turn to the computation of $\braket{U^{4}}_{0}$:

[TABLE]

The possible contractions arising from Wick’s theorem yield several contributions, which we can represent by diagrams. Note that these diagrams are very different from the diagrams that we described for instance in Fig. 1, and are merely a way to visualize the contractions in Wick’s theorem. The first column contains the $i_{\alpha}$ indices and the second contains the $j_{\alpha}$. Note that we always have $i_{\alpha}\neq j_{\alpha}$. The two different types of contractions are represented in Fig. 8(a) and Fig. 8(b). There are $12$ possible contractions of the type of Fig. 8(a) and $48$ of the type of Fig. 8(b). We also take into account that in the pairings of Fig. 8(b) the indices are not necessarily all pairwise distinct. Discarding terms that are $o_{N}(N)$, we finally reach:

[TABLE]
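The counts of $12$ and $48$ contractions can be checked by brute force. The sketch below is only an illustration: it assumes that $U^{4}$ is a product of four factors, each carrying one index pair $(i_{\alpha},j_{\alpha})$, that Wick's theorem pairs the eight index slots, and that a pairing inside a single factor vanishes because $i_{\alpha}\neq j_{\alpha}$. The slot encoding and the names `perfect_matchings` and `classify` are hypothetical, and the identification of the two classes with Fig. 8(a) and Fig. 8(b) is inferred from the counts quoted above.

```python
from collections import Counter


def perfect_matchings(items):
    """Yield every way of grouping `items` (even length) into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail


# One slot per index: (factor alpha, column), column 0 for i_alpha, 1 for j_alpha.
slots = [(alpha, column) for alpha in range(4) for column in (0, 1)]


def classify(matching):
    """Return None if a pair stays inside one factor, else the pairing type."""
    links = set()
    for (alpha, _), (beta, _) in matching:
        if alpha == beta:
            return None  # vanishing contraction, since i_alpha != j_alpha
        links.add(frozenset((alpha, beta)))
    # Each factor has two slots, so the factors form either two doubly-linked
    # pairs (2 distinct links) or a single loop through all four (4 links).
    return "two_pairs" if len(links) == 2 else "four_cycle"


counts = Counter(classify(m) for m in perfect_matchings(slots))
print(counts["two_pairs"], counts["four_cycle"], counts[None])
```

Out of the $105$ pairings of eight slots, $45$ contain a contraction inside a factor, $12$ split the factors into two pairs, and $48$ link all four factors in a single loop.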

Finally, combining eq. (195), eq. (197), and eq. (198) to plug them into eq. (193), we reach:

[TABLE]

which is what we wanted to show.

Appendix C Some definitions and reminders of random matrix theory

For a complete mathematical introduction to random matrix theory, the reader can refer to [Meh04, AGZ10], while a more practical approach is carried out in [TV04]. Let us consider a compactly supported probability measure $\mu$ on $\mathbb{R}$ . We denote $\lambda_{\rm max}\equiv\max\mathrm{supp}(\mu)$ and $\lambda_{\rm min}\equiv\min\mathrm{supp}(\mu)$ . One can introduce the Stieltjes transform of $\mu$ as:

$$\mathcal{S}_{\mu}(z)\equiv\mathbb{E}_{\mu}\left[\frac{1}{X-z}\right].$$

On $(\lambda_{\rm max},+\infty)$ , $\mathcal{S}_{\mu}$ induces a strictly increasing $\mathcal{C}^{\infty}$ diffeomorphism ${\cal S}_{\mu}:(\lambda_{\rm max},\infty)\hookrightarrow(-\infty,0)$ , and we denote its inverse $\mathcal{S}^{-1}_{\mu}$ . One can then introduce the ${\cal R}$ -transform of $\mu$ as:

$$\mathcal{R}_{\mu}(z)\equiv\mathcal{S}^{-1}_{\mu}(-z)-\frac{1}{z}.$$

$\mathcal{R}_{\mu}(z)$ is a priori defined for $-z\in\mathcal{S}_{\mu}\left[(\lambda_{\rm min},\lambda_{\rm max})^{c}\right]$ and admits an analytic expansion around $z=0$. We can write this expansion as:

$$\mathcal{R}_{\mu}(z)=\sum_{k=1}^{\infty}c_{k}(\mu)\,z^{k-1}.$$

The elements of the sequence $\{c_{k}(\mu)\}_{k\in\mathbb{N}^{\star}}$ are called the free cumulants of $\mu$ . In particular, one can show that $c_{1}(\mu)=\mathbb{E}_{\mu}(X)$ and $c_{2}(\mu)=\mathbb{E}_{\mu}(X^{2})-(\mathbb{E}_{\mu}X)^{2}$ . The free cumulants can be recursively computed from the moments of the measure using the so-called free cumulant equation:

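$$\forall k\in\mathbb{N}^{\star},\quad\mathbb{E}_{\mu}(X^{k})=\sum_{m=1}^{k}c_{m}(\mu)\sum_{\substack{k_{1},\cdots,k_{m}\geq 1\\ k_{1}+\cdots+k_{m}=k}}\ \prod_{i=1}^{m}\mathbb{E}_{\mu}(X^{k_{i}-1}).$$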
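As an illustration of this recursion, the following sketch computes free cumulants from moments. It assumes the standard form of the moment-cumulant relation, $m_{k}=\sum_{s=1}^{k}c_{s}\,[z^{k-s}]\,M(z)^{s}$ with $M(z)=\sum_{n\geq 0}m_{n}z^{n}$ and $m_{n}=\mathbb{E}_{\mu}(X^{n})$, where $[z^{n}]$ extracts the coefficient of $z^{n}$. The function names are hypothetical.

```python
from math import comb


def truncated_product(a, b, order):
    """Coefficients of a(z) * b(z) up to and including z**order."""
    out = [0] * (order + 1)
    for i, ai in enumerate(a[: order + 1]):
        for j, bj in enumerate(b[: order + 1 - i]):
            out[i + j] += ai * bj
    return out


def free_cumulants(moments):
    """Map [m_1, ..., m_K] to [c_1, ..., c_K] via the free cumulant equation."""
    K = len(moments)
    M = [1] + list(moments)          # M[n] = m_n, with m_0 = 1
    cumulants = []
    for k in range(1, K + 1):
        c_k = M[k]
        power = [1] + [0] * K        # coefficients of M(z)**0
        for s in range(1, k):
            power = truncated_product(power, M, K)   # now M(z)**s
            c_k -= cumulants[s - 1] * power[k - s]
        cumulants.append(c_k)
    return cumulants


def catalan(n):
    return comb(2 * n, n) // (n + 1)


# Semicircle law of unit variance: m_{2n} = Catalan(n), odd moments vanish.
semicircle = [catalan(k // 2) if k % 2 == 0 else 0 for k in range(1, 9)]
# Marchenko-Pastur law of ratio 1 (free Poisson of rate 1): m_k = Catalan(k).
marchenko_pastur = [catalan(k) for k in range(1, 9)]

print(free_cumulants(semicircle))
print(free_cumulants(marchenko_pastur))
```

For the semicircle law only $c_{2}=1$ is non-zero, while for the Marchenko-Pastur law of ratio $1$ every free cumulant equals $1$.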

For practical purposes, for all $x\in(-\mathcal{S}_{\mu}(\lambda_{\rm min}),-\mathcal{S}_{\mu}(\lambda_{\rm max}))$ we can define:

[TABLE]
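The definitions of this appendix can be tried numerically on a discrete measure. The sketch below assumes the conventions $\mathcal{S}_{\mu}(z)=\mathbb{E}_{\mu}[1/(X-z)]$ and $\mathcal{R}_{\mu}(x)=\mathcal{S}_{\mu}^{-1}(-x)-1/x$, which are consistent with $\mathcal{S}_{\mu}$ being increasing and negative on $(\lambda_{\rm max},\infty)$. The bisection solver and all function names are illustrative choices. For $\mu=\frac{1}{2}(\delta_{-1}+\delta_{1})$ one finds by hand $\mathcal{S}_{\mu}(z)=-z/(z^{2}-1)$, hence $\mathcal{R}_{\mu}(x)=(\sqrt{1+4x^{2}}-1)/(2x)$, which the code compares against.

```python
from math import sqrt


def stieltjes(z, atoms, weights):
    """S_mu(z) = E[1 / (X - z)] for a discrete measure, z outside the support."""
    return sum(w / (a - z) for a, w in zip(atoms, weights))


def inverse_stieltjes(s, atoms, weights, iterations=200):
    """Solve S_mu(z) = s on (lambda_max, +inf) by bisection, for s < 0."""
    lam_max = max(atoms)
    lo, hi = lam_max, lam_max + 1.0
    # S_mu increases towards 0 at infinity, so enlarging hi ends with S_mu(hi) >= s.
    while stieltjes(hi, atoms, weights) < s:
        hi = lam_max + 2.0 * (hi - lam_max)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if stieltjes(mid, atoms, weights) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def r_transform(x, atoms, weights):
    """R_mu(x) = S_mu^{-1}(-x) - 1/x, for x > 0 with -x in the range of S_mu."""
    return inverse_stieltjes(-x, atoms, weights) - 1.0 / x


atoms, weights = [-1.0, 1.0], [0.5, 0.5]
x = 0.3
numeric = r_transform(x, atoms, weights)
closed_form = (sqrt(1.0 + 4.0 * x * x) - 1.0) / (2.0 * x)
print(numeric, closed_form)
```

Bisection is used here only because $\mathcal{S}_{\mu}$ is monotone on $(\lambda_{\rm max},\infty)$; any root finder on that interval would do.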

Appendix D Technical derivations and generalizations of the diagrammatics

We detail here some extensions of the results of Sec. 5. In Sec. D.1, we explain how to transpose these results to Hermitian matrix models, and in Sec. D.2 we show how to extend some of them to diagrams of diverging size (as $N\to\infty$ ).

D.1 Hermitian matrix model

One can generalize the results of Sec. 5 to the following Hermitian matrix model, similar to Model S:

Model 3.

Let $N\geq 1$ and $\mathcal{U}(N)$ be the unitary group. Let $J\in\mathbb{C}^{N\times N}$ be a random matrix generated as $J=UDU^{\dagger}$ with $U\in\mathcal{U}(N)$ drawn uniformly and independently from $D$. $D$ is a real diagonal matrix such that its empirical spectral distribution $\rho^{(N)}_{D}\equiv\frac{1}{N}\sum_{i=1}^{N}\delta_{d_{i}}$ converges almost surely as $N\to\infty$ to a probability distribution $\rho_{D}$ with compact support. The smallest and largest eigenvalues of $D$ are assumed to converge almost surely to the infimum and supremum of the support of $\rho_{D}$.

Note that the diagrams are now directed, as $J_{ij}=\overline{J_{ji}}$ . We describe such diagrams in Fig. 9.

For instance, the diagram of Fig. 9(a) is equal to:

[TABLE]

while the diagram of Fig. 9(b) represents the quantity:

[TABLE]

In the complex case, an Eulerian graph is similarly defined as a graph in which one can construct a cyclic path (following the directions of the edges) that visits each edge exactly once. Note that a simple cycle is defined such that the arrows on its edges themselves form a cycle, like the constituent cycles of Fig. 9(c). We describe the main results we get, using the same kind of techniques as used in Sec. 5.1:

$(i)$ Only Eulerian diagrams contribute in the $N\to\infty$ limit.

$(ii)$ Consider a simple cycle ${\cal C}_{p}$ with $p$ vertices. Then this diagram converges in the $N\to\infty$ limit to the free cumulant $c_{p}(\rho_{D})$ in $L^{2}$ norm, as in the real case. More precisely:

[TABLE]

$(iii)$ Any Eulerian strongly irreducible diagram that is not a simple cycle will be negligible in the $N\to\infty$ limit (in $L^{2}$ norm).

$(iv)$ Any Eulerian cactus diagram (like in Fig. 9(c)) will converge in $L^{2}$ to the product of the free cumulants of $\rho_{D}$ corresponding to each one of its constituent simple cycles.
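Result $(ii)$ can be probed numerically for $p=2$ and $p=3$. The sketch below is a rough check under stated assumptions, not the paper's procedure: it assumes the normalization in which a cycle with $p$ pairwise distinct indices is rescaled by $N^{p-1}$, which in expectation is equivalent to averaging $\frac{1}{N}\sum J_{i_{1}i_{2}}\cdots J_{i_{p}i_{1}}$ over pairwise distinct indices, and it picks $\rho_{D}=\frac{3}{4}\delta_{0}+\frac{1}{4}\delta_{1}$, for which $c_{2}=3/16$ and $c_{3}=3/32$. All names are hypothetical, and $N=64$ is small, so only rough agreement is expected.

```python
import random


def haar_columns(n, r, rng):
    """First r columns of a Haar unitary: Gram-Schmidt on complex Gaussian vectors."""
    columns = []
    for _ in range(r):
        v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
        for u in columns:
            overlap = sum(ui.conjugate() * vi for ui, vi in zip(u, v))
            v = [vi - overlap * ui for ui, vi in zip(u, v)]
        norm = sum(abs(vi) ** 2 for vi in v) ** 0.5
        columns.append([vi / norm for vi in v])
    return columns


def cycle_average(J, p):
    """(1/N) * sum over pairwise distinct indices of J_{i1 i2} ... J_{ip i1}."""
    n = len(J)
    # Zeroing the diagonal enforces distinct neighbours, which for p <= 3
    # is the same as pairwise distinct indices.
    K = [[0j if i == j else J[i][j] for j in range(n)] for i in range(n)]
    idx = range(n)
    if p == 2:
        total = sum(K[i][j] * K[j][i] for i in idx for j in idx)
    elif p == 3:
        total = sum(K[i][j] * K[j][k] * K[k][i] for i in idx for j in idx for k in idx)
    else:
        raise ValueError("only p = 2 and p = 3 are handled in this sketch")
    return total.real / n


rng = random.Random(0)
N, rank = 64, 16                      # D has 16 eigenvalues equal to 1 and 48 equal to 0
cols = haar_columns(N, rank, rng)
# J = U D U^dagger reduces to the projector onto the first `rank` columns of U.
J = [[sum(u[i] * u[j].conjugate() for u in cols) for j in range(N)] for i in range(N)]

c2, c3 = 3 / 16, 3 / 32               # free cumulants of (3/4) delta_0 + (1/4) delta_1
print(cycle_average(J, 2), c2)
print(cycle_average(J, 3), c3)
```

The directed product $J_{i_{1}i_{2}}J_{i_{2}i_{3}}J_{i_{3}i_{1}}$ follows the orientation of the cycle, in line with the directed diagrams of this section.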

These results are straightforward generalizations of the ones obtained for real matrices in Sec. 5. For completeness, we describe how to show a weaker version of $(ii)$, and leave the other statements as easy generalizations of Sec. 5. Let us now show that the limit of the expectation of the term in $(ii)$ is the free cumulant, as in Sec. 5.1. As before, by unitary invariance we can assume that $(i_{1},\cdots,i_{p})=(1,\cdots,p)$, and we can apply the results of [GM05] to obtain an equation similar to eq. (155):

[TABLE]

with now the matrix $M({\textbf{b}},{\textbf{c}})$ defined as:

[TABLE]

Now, we have $\left[\frac{\partial}{\partial b_{i}}+i\frac{\partial}{\partial c_{i}}\right]M({\textbf{b}},{\textbf{c}})=F_{i+1,i}$ , in which $(F_{a,b})_{ll^{\prime}}\equiv\delta_{al}\delta_{bl^{\prime}}$ are elementary non-symmetric matrices. In the exact same way as in Sec. 5.1, the dominant contribution in eq. (D.1) will be given by differentiating a single time the exponential term, and creating a cycle with the matrices $F_{i+1,i}$ . Note that contrary to the symmetric case of Sec. 5.1, here only the directed cycle will contribute, whereas both possible directions of the cycle contributed in eq. (159). Indeed, the cycles in terms of the matrices $\{F_{a,b}\}$ have to be directed in order to yield a non-zero contribution:

[TABLE]

Thus we have:

[TABLE]

In order to get $L^{2}$ concentration of the simple cycle on the free cumulant, one can exactly repeat the arguments of Sec. 5.3.
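The remark that only the directed cycle of matrices $F_{i+1,i}$ contributes, whereas both directions contribute in the symmetric case, can be seen on a toy example. The sketch below is an illustration with hypothetical names: it takes $p=4$, labels indices from $0$ modulo $p$, and counts the orderings of the $p$ matrices whose product has a non-zero trace, holding the first factor fixed to remove cyclic rotations. The symmetric analogue $F_{i+1,i}+F_{i,i+1}$ is an assumption made for the comparison.

```python
from itertools import permutations


def elementary(a, b, n):
    """Matrix F_{a,b} with a single 1 in row a, column b."""
    return [[1 if (row == a and col == b) else 0 for col in range(n)] for row in range(n)]


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trace_of_product(matrices):
    product = matrices[0]
    for M in matrices[1:]:
        product = matmul(product, M)
    return sum(product[i][i] for i in range(len(product)))


def contributing_orderings(matrices):
    """Count orderings with non-zero trace, the first factor being held fixed."""
    first, rest = matrices[0], matrices[1:]
    return sum(1 for order in permutations(rest)
               if trace_of_product([first, *order]) != 0)


p = 4
# Hermitian case: F_{i+1,i}, with indices taken modulo p (0-based labels).
directed = [elementary((i + 1) % p, i, p) for i in range(p)]
# Real symmetric analogue: F_{i+1,i} + F_{i,i+1}.
undirected = [[[F[r][c] + F[c][r] for c in range(p)] for r in range(p)] for F in directed]

print(contributing_orderings(directed), contributing_orderings(undirected))
```

A single ordering survives in the directed case and two in the symmetric case, one for each orientation of the cycle.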

D.2 A note on the expectation of diagrams of diverging size

Although it is not directly useful in our Plefka expansions, a side question one can ask about these diagrams is: how do diagrams whose number of edges diverges with $N$ behave in the $N\to\infty$ limit? In all of Sec. 5 we only considered diagrams of finite size. The behavior of HCIZ-type integrals with a matrix of diverging rank (as opposed to the finite-rank case) has been rigorously treated in [GM05], and then generalized in [CŚ07] to any rank that diverges sublinearly in $N$. We recall the main result of [CŚ07]:

Theorem 3 (Collins-Śniady).

Let $A_{N},B_{N}$ be diagonal real matrices of size $N$. Assume that the rank $M(N)$ of $A_{N}$ is such that $M(N)=o(N)$, and denote $a_{1,N}\geq\cdots\geq a_{M,N}$ the eigenvalues of $A_{N}$. Assume that the spectral measure of $B_{N}$ converges a.s. and in the weak sense to a probability measure $\rho_{B}$, and that all elements of $A_{N}$ are bounded by a constant independent of $N$. Then one has:

[TABLE]

A similar result holds for real orthogonal matrices:

[TABLE]

The techniques of Sec. 5 thus generalize to this case. We consider real symmetric matrices under Model S (in the Hermitian case, the results also generalize following the line of Appendix D.1). We say that a sequence $\{p(N)\}$ satisfies the bounded free cumulant property if it satisfies the following:

Property 1.

There exists $C>0$ such that for all $N\in\mathbb{N}$ , $|c_{p(N)}(\rho_{D})|<C$ .
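As an illustration based on standard free cumulant computations (these examples are not taken from the results above), two simple spectra behave very differently with respect to Property 1. For a centered semicircle law of variance $\sigma^{2}$, only the second free cumulant is non-zero:

$$c_{2}(\rho_{D})=\sigma^{2},\qquad c_{p}(\rho_{D})=0\quad(p\geq 3),$$

so every sequence $\{p(N)\}$ satisfies the property. For the symmetric two-point law $\rho_{D}=\frac{1}{2}(\delta_{-1}+\delta_{1})$, assuming the convention $\mathcal{R}_{\mu}(z)=\mathcal{S}_{\mu}^{-1}(-z)-1/z$, one finds

$$\mathcal{R}_{\rho_{D}}(z)=\frac{\sqrt{1+4z^{2}}-1}{2z}\quad\Longrightarrow\quad c_{2n+2}(\rho_{D})=(-1)^{n}\,\frac{1}{n+1}\binom{2n}{n},\qquad c_{2n+1}(\rho_{D})=0.$$

The even free cumulants grow exponentially with $n$, so a sequence $p(N)$ that takes arbitrarily large even values violates the property for this law.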

We state two of the results of Sec. 5 that can be easily generalized to the diverging size case without changing any of the arguments:

$(a)$ Consider a sequence $p(N)=o_{N}(N)$ that satisfies the bounded free cumulant property. Then one obtains the generalization of eq. (150):

[TABLE]

$(c)$ Consider a cactus diagram $G$ composed of $P(N)$ simple cycles of sizes $(r_{1}(N),\cdots,r_{P}(N))$, joining at vertices. Assume that $\sum_{i=1}^{P(N)}r_{i}(N)=o_{N}(N)$ and that all the sequences $r_{i}(N)$ satisfy the bounded free cumulant property. Then one has:

[TABLE]

Other results obtained in Sec. 5 for finite-size diagrams might also be applicable to the diverging size case, and we leave them for future work.

Bibliography

[AFP16] Ada Altieri, Silvio Franz, and Giorgio Parisi. The jamming transition in high dimension: an analytical study of the TAP equations and the effective thermodynamic potential. Journal of Statistical Mechanics: Theory and Experiment, 2016(9):093301, 2016.
[AGZ10] Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. An introduction to random matrices, volume 118. Cambridge University Press, 2010.
[Alt18] Ada Altieri. Higher-order corrections to the effective potential close to the jamming transition in the perceptron model. Physical Review E, 97(1):012103, 2018.
[BBJ19] M. Bauer, D. Bernard, and T. Jin. Equilibrium Fluctuations in Maximally Noisy Extended Quantum Systems. SciPost Phys., 6:45, 2019.
[BKM+19] Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, and Lenka Zdeborová. Optimal errors and phase transitions in high-dimensional generalized linear models. Proceedings of the National Academy of Sciences, page 201802705, 2019.
[CDL03] Raphaël Cherrier, David S. Dean, and Alexandre Lefèvre. Role of the interaction matrix in mean-field spin glass models. Physical Review E, 67(4):046112, 2003.
[ÇO19] Burak Çakmak and Manfred Opper. Convergent dynamics for solving the TAP equations of Ising models with arbitrary rotation invariant coupling matrices. arXiv preprint arXiv:1901.08583, 2019.
[ÇOFW16] Burak Çakmak, Manfred Opper, Bernard H. Fleury, and Ole Winther. Self-averaging expectation propagation. arXiv preprint arXiv:1608.06602, 2016.
