Neural Networks retrieving Boolean patterns in a sea of Gaussian ones

Elena Agliari; Adriano Barra; Chiara Longo; Daniele Tantari

arXiv:1703.05210·math-ph·August 2, 2017


TL;DR

This paper investigates the retrieval capabilities of Hebbian neural networks storing both real (Gaussian) and Boolean patterns, showing that Boolean patterns can still be retrieved under a high load of real patterns, provided the overall load stays below the classical Amit-Gutfreund-Sompolinsky threshold.

Contribution

It introduces a theoretical analysis of mixed Hebbian networks with both Gaussian and Boolean patterns, revealing conditions for Boolean pattern retrieval in high real pattern load regimes.

Findings

01

Boolean patterns can be retrieved despite high real pattern load

02

The critical load threshold matches the classical Amit-Gutfreund-Sompolinsky theory

03

The analysis extends existing models to mixed pattern types


Equations

h_{i}=\sum_{j\neq i}^{N}J_{ij}\sigma_{j},

J_{ij}=\frac{1}{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}.

H_{N}^{AHN}(\sigma,\xi)=-\frac{1}{2N}\sum_{i,j}^{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}\sigma_{i}\sigma_{j}.

Z_{N,p}^{AHN}(\beta)=\sum_{\sigma}\exp\left\{\frac{\beta}{2N}\sum_{\mu=1}^{p}\sum_{i,j}^{N}\xi_{i}^{\mu}\xi_{j}^{\mu}\sigma_{i}\sigma_{j}\right\}.

H_{N}^{RBM}(\sigma,\xi)=-\frac{1}{\sqrt{N}}\sum_{i,\mu}^{N,p}\xi_{i}^{\mu}\sigma_{i}z_{\mu}.

Z_{N,p}^{RBM}(\beta)=\sum_{\sigma}\int_{\mathbb{R}^{p}}d\mathcal{M}(z)\exp\left\{\sqrt{\frac{\beta}{N}}\sum_{\mu=1}^{p}\sum_{i=1}^{N}\xi_{i}^{\mu}\sigma_{i}z_{\mu}\right\},

Z_{N,p}^{AHN}(\beta)\equiv Z_{N,p}^{RBM}(\beta),

m_{\mu,N}(\sigma)=\frac{1}{N}\sum_{i=1}^{N}\xi_{i}^{\mu}\sigma_{i}.

H_{N}^{AHN}(\sigma,\xi)=-\frac{N}{2}\sum_{\mu=1}^{p}m_{\mu}^{2},

H_{N}^{RBM}(\sigma,\xi)=-\sqrt{N}\sum_{\mu=1}^{p}m_{\mu}z_{\mu},

\begin{cases}\mathbb{P}\{\tilde{\xi}_{i}^{\nu}=+1\}=\mathbb{P}\{\tilde{\xi}_{i}^{\nu}=-1\}=\frac{1}{2}&\forall\,i=1,\dots,N\ \text{and}\ \nu=1,\dots,k,\\ \mathbb{P}(\xi_{i}^{\mu})\sim\mathcal{N}(0,1)&\forall\,i=1,\dots,N\ \text{and}\ \mu=1,\dots,p.\end{cases}

H_{N}^{MHN}(\sigma,\xi,\tilde{\xi})=-\frac{1}{N}\sum_{1\leqslant i<j\leqslant N}\left(\sum_{\nu=1}^{k}\tilde{\xi}_{i}^{\nu}\tilde{\xi}_{j}^{\nu}+\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}\right)\sigma_{i}\sigma_{j}.

H_{N}(\sigma,\xi,\tilde{\xi})=-\frac{1}{2N}\sum_{i,j=1}^{N}\left(\sum_{\nu=1}^{k}\tilde{\xi}_{i}^{\nu}\tilde{\xi}_{j}^{\nu}+\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}\right)\sigma_{i}\sigma_{j}+\frac{1}{2N}\sum_{i=1}^{N}\sum_{\mu=1}^{p}(\xi_{i}^{\mu})^{2}+\frac{k}{2},

\lim_{N\to\infty}\frac{1}{N}\left[\frac{1}{2N}\sum_{i=1}^{N}\sum_{\mu=1}^{p}(\xi_{i}^{\mu})^{2}\right]=\frac{\alpha}{2}.

\omega_{N}(F)=\frac{\sum_{\sigma}F(\sigma)e^{-\beta H_{N}(\sigma,\xi,\tilde{\xi})}}{Z_{N}(\beta)}.

\displaystyle\begin{split}\Omega\bigl{(}F(\sigma^{(1)},\ldots,\sigma^{(s)})\bigr{)}=\frac{1}{Z_{N}^{s}}\sum_{\sigma^{(1)}}\cdots\sum_{\sigma^{(s)}}F(\sigma^{(1)},\ldots,\sigma^{(s)})\exp\left\{-\beta\sum_{a=1}^{s}H_{N}(\sigma^{(a)},\xi,\tilde{\xi})\right\}\;.\end{split}


\mathbb{E}[F(\xi,\tilde{\xi})]=\int\prod_{\mu=1}^{p}\prod_{i=1}^{N}\frac{d\xi_{i}^{\mu}}{\sqrt{2\pi}}e^{-\frac{(\xi_{i}^{\mu})^{2}}{2}}\times\prod_{\nu=1}^{k}\prod_{j=1}^{N}\sum_{\tilde{\xi}_{j}^{\nu}=\pm 1}\frac{1}{2}F(\xi,\tilde{\xi}).

q_{ab}(\sigma)=\frac{1}{N}\sum_{i=1}^{N}\sigma_{i}^{(a)}\sigma_{i}^{(b)}\in[-1,1],

p_{ab}(z)=\frac{1}{p}\sum_{\mu=1}^{p}z_{\mu}^{(a)}z_{\mu}^{(b)}\in(-\infty,+\infty).

A(\alpha,\beta)=\lim_{N\to\infty}A_{N,k,p}(\beta),\qquad A_{N,k,p}(\beta)=\frac{1}{N}\mathbb{E}\ln Z_{N,k,p}(\beta),

Z_{N,k,p}(\beta)

\displaystyle\begin{split}A_{N,k,p}&(\beta)=\frac{1}{N}\mathbb{E}\log Z_{N,k,p}(\beta)=\\ =&\frac{1}{N}\mathbb{E}\Biggl{[}-\frac{\beta k}{2}-\frac{\beta}{2N}\sum_{i=1}^{N}\sum_{\mu=1}^{p}(\xi_{i}^{\mu})^{2}\Biggr{]}+\\ &+\frac{1}{N}\mathbb{E}\log\Biggl{(}\sum_{\sigma}\exp\Biggl{\{}\frac{\beta}{2N}\sum_{i,j=1}^{N}\sum_{\nu=1}^{k}\tilde{\xi}_{i}^{\nu}\tilde{\xi}_{j}^{\nu}\sigma_{i}\sigma_{j}+\frac{\beta}{2N}\sum_{i,j=1}^{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}\sigma_{i}\sigma_{j}\Biggr{\}}\Biggr{)}\\ =&-O\left(\frac{\ln N}{N}\right)-\frac{\alpha_{N}\beta}{2}+\\ &+\frac{1}{N}\mathbb{E}\ln\Biggl{(}\sum_{\sigma}\exp\Biggl{\{}\frac{\beta}{2N}\sum_{i,j=1}^{N}\sum_{\nu=1}^{k}\tilde{\xi}_{i}^{\nu}\tilde{\xi}_{j}^{\nu}\sigma_{i}\sigma_{j}+\frac{\beta}{2N}\sum_{i,j=1}^{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}\sigma_{i}\sigma_{j}\Biggr{\}}\Biggr{)}\;,\end{split}

\displaystyle\begin{split}Z_{N,k,p}(\beta)=&\exp\Biggl{\{}-\frac{\beta k}{2}-\frac{\beta}{2N}\sum_{i=1}^{N}\sum_{\mu=1}^{p}(\xi_{i}^{\mu})^{2}\Biggr{\}}\times\\ &\times\sum_{\sigma}\exp\left\{\frac{\beta}{2N}\sum_{i,j=1}^{N}\sum_{\nu=1}^{k}\tilde{\xi}_{i}^{\nu}\tilde{\xi}_{j}^{\nu}\sigma_{i}\sigma_{j}+\frac{\beta}{2N}\sum_{i,j=1}^{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}\sigma_{i}\sigma_{j}\right\}=\\ =&\exp\Biggl{\{}-\frac{\beta k}{2}-\frac{\beta}{2N}\sum_{i=1}^{N}\sum_{\mu=1}^{p}(\xi_{i}^{\mu})^{2}\Biggr{\}}\sum_{\sigma}\exp\left\{\frac{\beta}{2N}\sum_{i,j=1}^{N}\sum_{\nu=1}^{k}\tilde{\xi}_{i}^{\nu}\tilde{\xi}_{j}^{\nu}\sigma_{i}\sigma_{j}\right\}\times\\ &\times\int_{\mathbb{R}^{p}}d\mathcal{M}(z)\exp\left\{\sqrt{\frac{\beta}{N}}\sum_{\mu=1}^{p}\sum_{i=1}^{N}\xi_{i}^{\mu}\sigma_{i}z_{\mu}\right\}\;,\end{split}

\displaystyle\begin{split}Z_{N}&(t,x,\psi)=\exp\Biggl{\{}-\frac{\beta k}{2}-\frac{\beta}{2N}\sum_{i=1}^{N}\sum_{\mu=1}^{p}(\xi_{i}^{\mu})^{2}\Biggr{\}}\times\\ &\times\sum_{\sigma}\int_{\mathbb{R}^{p}}d\mathcal{M}(z)\ \exp\Biggl{\{}\frac{t}{2N}\sum_{i,j=1}^{N}\sum_{\nu=1}^{k}\tilde{\xi}_{i}^{\nu}\tilde{\xi}_{j}^{\nu}\sigma_{i}\sigma_{j}+\sum_{\nu=1}^{k}x_{\nu}\sum_{i=1}^{N}\tilde{\xi}_{i}^{\nu}\sigma_{i}\Biggr{\}}\times\\ &\times\exp\Biggl{\{}\sqrt{\psi}\sqrt{\frac{\beta}{N}}\sum_{\mu=1}^{p}\sum_{i=1}^{N}\xi_{i}^{\mu}\sigma_{i}z_{\mu}\Biggr{\}}\times\exp\Biggl{\{}\mathcal{A}\sqrt{1-\psi}\sum_{i=1}^{N}\eta_{i}\sigma_{i}\Biggr{\}}\times\\ &\times\exp\Biggl{\{}\mathcal{B}\sqrt{1-\psi}\sum_{\mu=1}^{p}\theta_{\mu}z_{\mu}\Biggr{\}}\times\exp\Biggl{\{}\mathcal{C}\frac{1-\psi}{2}\sum_{\mu=1}^{p}(z_{\mu})^{2}\Biggr{\}}\;,\end{split}

A_{N,k,p}(t,x)=A_{N,k,p}(t,x,\psi=1)=A_{N,k,p}(t,x,\psi=0)+\int_{0}^{1}\bigl(\partial_{\psi^{\prime}}A_{N,k,p}(t,x,\psi^{\prime})\bigr)_{\psi^{\prime}=\psi}\,d\psi.

\displaystyle\begin{split}A_{N,k,p}&(t,x,\psi=0)=-O\left(\frac{\ln N}{N}\right)-\frac{\alpha_{N}\beta}{2}+\\ &+\frac{1}{N}\mathbb{E}\Biggl{[}\log\sum_{\sigma}\exp\Biggl{\{}\frac{t}{2N}\sum_{i,j=1}^{N}\sum_{\nu=1}^{k}\tilde{\xi}_{i}^{\nu}\tilde{\xi}_{j}^{\nu}\sigma_{i}\sigma_{j}+\sum_{\nu=1}^{k}x_{\nu}\sum_{i=1}^{N}\tilde{\xi}_{i}^{\nu}\sigma_{i}+\mathcal{A}\sum_{i=1}^{N}\eta_{i}\sigma_{i}\Biggr{\}}\times\\ &\times\int_{\mathbb{R}^{p}}\frac{dz_{1}\cdots dz_{p}}{(2\pi)^{p/2}}\exp\Biggl{\{}\sum_{\mu=1}^{p}\biggl{(}\mathcal{B}\theta_{\mu}z_{\mu}+\frac{\mathcal{C}-1}{2}z_{\mu}^{2}\biggr{)}\Biggr{\}}\Biggr{]}=\\ =&-O\left(\frac{\ln N}{N}\right)-\frac{\alpha_{N}\beta}{2}+\frac{1}{N}\mathbb{E}\ln\Biggl{(}\frac{1}{(1-\mathcal{C})^{p/2}}e^{\frac{\mathcal{B}^{2}\theta^{2}}{2(1-\mathcal{C})}p}\Biggr{)}+\\ &+\frac{1}{N}\mathbb{E}\ln\sum_{\sigma}\exp\Biggl{\{}\frac{t}{2N}\sum_{i,j=1}^{N}\sum_{\nu=1}^{k}\tilde{\xi}_{i}^{\nu}\tilde{\xi}_{j}^{\nu}\sigma_{i}\sigma_{j}+\sum_{\nu=1}^{k}x_{\nu}\sum_{i=1}^{N}\tilde{\xi}_{i}^{\nu}\sigma_{i}+\mathcal{A}\sum_{i=1}^{N}\eta_{i}\sigma_{i}\Biggr{\}}\;.\end{split}


Z_{N,k}(t,x)=\sum_{\sigma}\exp\Biggl{\{}\frac{tN}{2}\sum_{\nu=1}^{k}m_{\nu}^{2}+N\sum_{\nu=1}^{k}x_{\nu}m_{\nu}+\mathcal{A}\sum_{i=1}^{N}\eta_{i}\sigma_{i}\Biggr{\}}\;,


\tilde{G}_{N,k}(t,x)=-\tilde{A}_{N,k}(t,x)=-\frac{1}{N}\ln\tilde{Z}_{N}(t,x).

\partial_{t}\tilde{A}_{N,k}(t,x)

\partial_{t}\bigl{(}\tilde{G}_{N,k}(t,x)\bigr{)}+\frac{1}{2}\bigl{(}\partial_{x}\tilde{G}_{N,k}(t,x)\bigr{)}^{2}+V_{N,k}(t,x)=0\;,



Full text


Elena Agliari

Dipartimento di Matematica, Sapienza Università di Roma, Italy

Istituto Nazionale di Alta Matematica (GNFM-INdAM), Roma, Italy

Adriano Barra

Dipartimento di Matematica e Fisica “Ennio De Giorgi”, Università del Salento, Italy

Istituto Nazionale di Alta Matematica (GNFM-INdAM), Roma, Italy

Chiara Longo

Dipartimento di Matematica, Sapienza Università di Roma, Italy

Daniele Tantari

Scuola Normale Superiore, Centro Ennio De Giorgi, Italy.

Istituto Nazionale di Alta Matematica (GNFM-INdAM), Roma, Italy

Abstract

Restricted Boltzmann Machines are key tools in Machine Learning and are described by the energy function of bipartite spin-glasses. From a statistical mechanical perspective, they share the same Gibbs measure as Hopfield networks for associative memory. In this equivalence, the weights of the former play the role of the patterns of the latter. Since Boltzmann machines usually require real weights in order to be trained with gradient-descent-like methods, while Hopfield networks typically store binary patterns in order to retrieve them, the investigation of a mixed Hebbian network, equipped with both real (e.g., Gaussian) and discrete (e.g., Boolean) patterns, naturally arises.

We prove that, in the challenging regime of a high storage of real patterns, where retrieval is forbidden, an extra load of Boolean patterns can still be retrieved, as long as the ratio between the overall load and the network size does not exceed a critical threshold, which turns out to be the same as in the standard Amit-Gutfreund-Sompolinsky theory. Assuming replica symmetry, we study the case of a low load of Boolean patterns by combining the stochastic stability and Hamilton-Jacobi interpolating techniques. The result can be extended to the high-load case by a non-rigorous but standard replica computation.

I Introduction

In recent years we have witnessed a remarkably fast development of research in Artificial Intelligence. Neural networks play an important role in this trend, mainly due to the ability of so-called deep networks to solve difficult problems after proper training. Such problems range broadly across the sciences (from Particle Physics [1] to Computational Biology [2]), not to mention the applied world of technology, where their usage has become pervasive. Nevertheless, as admitted in [3], despite these remarkable successes nobody yet fully understands how the whole scaffold works, and there is wide agreement that achieving a full understanding of Deep Learning is an urgent priority.

The pivotal constituent of Deep Learning machinery is the Restricted Boltzmann Machine (RBM) [hinton1; hinton2; RBM1; RBM2]. This is a network of units with a bipartite structure, the two parties being referred to as the visible layer and the hidden layer; units belonging to different layers are connected by links endowed with weights, while units belonging to the same layer are not connected (see Fig. 1, left panel). In the jargon of statistical physics, RBMs have the same energy as a bipartite spin-glass [bg; Bip; zigg; pizzo; auffinger].

By marginalization over the hidden layer, RBMs have also been shown to share the same phase diagram as a Hopfield network [BBCS; prlnoi1; mezard; multi; remi], whose units, corresponding to those of the visible layer (see Fig. 1, right panel), are connected via a Hebbian coupling [Hebb], with the number of patterns corresponding to the number of hidden units. The Hopfield network is able to spontaneously retrieve such patterns, and therefore to work as an associative memory [ags1], as long as the ratio between the number of patterns to handle and the number of available neurons is not too large [ags2] or, in the dual perspective of RBMs, as long as the hidden layer is not too large compared to the visible one.

Crucially, the weight vectors learnt by the RBM during training play the role of patterns in Hopfield retrieval. Since standard Hopfield networks are built with Boolean patterns, while trained weights are real, studies on possible generalizations are needed and are beginning to appear in the literature [lettera; remi].

In recent years, an increasing number of semi-heuristic routes toward a rationale for Deep Learning have been introduced, while rigorous answers to specific questions (e.g., answers that avoid the so-called replica trick [MPV; Ton; seung]) remain scarce (see e.g. [BGG-JSP2010; BGGT-JSM2012; S; ST; BG5; BGP3; Tala1; Tala2; lecun]). However, beyond the replica trick, other techniques for handling spin-glasses have recently appeared in the literature, from cavity or message-passing methods [mezard; huang; multi] to those based on interpolating structures [BGGT-JSM2012; bg; Bip; prlnoi3]; hence an attempt should be made to use them to infer properties of Restricted Boltzmann Machines from a rigorous perspective as well.

Here we prove, at the replica symmetric level, that Hopfield networks endowed with mixed patterns, namely in part binary and in part real, are robustly capable of retrieving the digital information (i.e., the binary patterns) even though it is “immersed” in the continuous (slow) noise generated by the real patterns (i.e., the sea). In particular, by combining two mathematical approaches, namely stochastic stability [AC-SS; BGG-JSP2010; BGGT-JSM2012; BGGTgauss] and Hamilton-Jacobi interpolation [HJ-Jstat; ABDG; BDT2013; guerra-HJ; GTmult], we are able to describe the model’s free energy and its phase diagram for pure-state retrieval.

Let us consider a system made of $N$ Ising neurons dealing with a certain number of patterns, referred to as $p$ or $k$ according to whether the number scales linearly with $N$ (i.e., $p=\alpha N$) or logarithmically with $N$ (i.e., $k=\gamma\ln N$). These two cases correspond to the so-called high-storage and low-storage regimes, respectively [AMIT]. As is well known, in the low-storage regime the Hopfield model is able to retrieve patterns (i.e., to work as a distributed associative memory) for binary as well as real patterns [lettera; lungo], while in the high-storage regime only binary patterns can be retrieved, because a linearly extensive (in $N$) number of real patterns contains too much information for the $O(N^{2})$ synapses to perform pattern recognition or similar tasks [bov-stoc; lungo]. Indeed, in general, the high-storage case is much trickier due to its intrinsic glassiness, so tools from the statistical mechanics of disordered systems are needed to infer its properties [AMIT; MPV]. On the contrary, standard statistical mechanical machinery is usually effective for the low-storage case [Ton].

Now, given the equivalence between RBMs and Hopfield networks, a natural interest arises in mixed Hebbian networks, where patterns are in part analog and in part digital. The first scenario we want to clarify is their retrieval capability when they are constrained to store an extensive number $p$ of real patterns (hence the worst case for retrieval) and are additionally fed a low load of $k$ binary patterns.

Exploiting Guerra’s interpolating schemes, we prove that there exists a region in the parameter space (corresponding to not-too-high values of both fast and slow noise) where the mixed Hebbian network works as a distributed associative memory, and that the boundary of such a region is marked by a first-order phase transition.

Further, a fairly standard replica calculation, although not rigorous, suggests that this picture can be extended even to the case of an extensive load for both binary and real patterns, that is, there exists a retrieval region where pattern recognition for high-load digital information in a real sea seems possible.

Remarkably, in all these cases, the boundary of the retrieval region always turns out to be the one identified by Amit, Gutfreund and Sompolinsky in the 1980s [ags1; ags2].

I.1 Associative Hopfield Networks and Restricted Boltzmann Machines

Let us elaborate on the ideas outlined so far by introducing the standard definitions and concepts for Hopfield neural networks. Following classical notation [Ton], we consider $N$ binary neurons (i.e., Ising spins [AMIT]) and to each neuron $i$ we assign a dichotomic variable $\sigma_{i}$ that describes its activity: if $\sigma_{i}=+1$ the $i$-th neuron is spiking, while if $\sigma_{i}=-1$ it is quiescent.

Neurons are embedded in a fully connected network, in such a way that mean-field approaches are suitable for the investigation. The synaptic potential $h_{i}$ that the $i$ -th neuron receives from the other $N-1$ is defined as

$$h_{i}=\sum_{j\neq i}^{N}J_{ij}\sigma_{j},$$

where $J_{ij}=J_{ji}$ is the synaptic coupling between neuron $j$ and neuron $i$, defined according to Hebb’s learning rule [Hebb] as

$$J_{ij}=\frac{1}{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}.$$

Indeed, associative memory models are built to recognize a certain group of words, pixels, or generically patterns $\xi$: a pattern is defined as a sequence of random variables $\xi=(\xi_{1},\ldots,\xi_{N})$. If we want the network to memorize and retrieve a number $p$ of patterns, we have to introduce another index to distinguish them, $\{\xi^{1},\ldots,\xi^{p}\}$, and we shall assume that the set $\left\{\xi_{i}^{\mu}\right\}_{i,\mu}$ is made of $p\times N$ i.i.d. variables. Notice that, by a Shannon information-compression argument, if the network is able to cope with this kind of pattern, then it certainly retains at least the same capacity in the case of correlated patterns [correlated; ABDG].

Boolean binary patterns have entries such that $\mathbb{P}(\xi_{i}=+1)=\mathbb{P}(\xi_{i}=-1)=1/2$ , while Gaussian real patterns have entries drawn from $\mathbb{P}(\xi_{i})\sim\mathcal{N}(0,1)$ .

Definition 1.

The Hamiltonian $H_{N}^{AHN}(\sigma,\xi)$ of the Associative Hopfield Network (AHN) equipped with $N$ Ising neurons $\sigma$ and $p$ patterns is defined as

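$$H_{N}^{AHN}(\sigma,\xi)=-\frac{1}{2N}\sum_{i,j}^{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}\sigma_{i}\sigma_{j}.$$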

Once the (fast) noise level $\beta=1/T\in\mathbb{R}^{+}$ is introduced, where $T$ plays the role of the temperature in standard Statistical Mechanics, the partition function $Z_{N,p}^{AHN}(\beta)$ for the AHN is defined as

$$Z_{N,p}^{AHN}(\beta)=\sum_{\sigma}\exp\left\{\frac{\beta}{2N}\sum_{\mu=1}^{p}\sum_{i,j}^{N}\xi_{i}^{\mu}\xi_{j}^{\mu}\sigma_{i}\sigma_{j}\right\},$$

and the free energy as $\frac{1}{N}\mathbb{E}_{\xi}\log Z_{N,p}^{AHN}(\beta)$, whose analysis allows one to infer the model’s phase diagram in the thermodynamic limit $(N\to\infty)$ [AMIT]. Note that, for simplicity, the previous definitions also include self-interactions; as we will see, their presence does not affect the thermodynamic state of the network, because they contribute at most a constant term to the free energy.

Definition 2.

The Hamiltonian $H_{N}^{RBM}(\sigma,\xi)$ of the Restricted Boltzmann Machine (RBM), equipped with a visible layer of $N$ binary (i.e. Boolean) units $\sigma_{i}$ , $i\in(1,...,N)$ and a hidden layer of $p$ real (i.e. Gaussian) units $z_{\mu}$ , $\mu\in(1,...,p)$ , connected by the $N\times p$ weight matrix $\xi_{i}^{\mu}$ , is defined as

$$H_{N}^{RBM}(\sigma,\xi)=-\frac{1}{\sqrt{N}}\sum_{i,\mu}^{N,p}\xi_{i}^{\mu}\sigma_{i}z_{\mu}.$$

Again considering $\beta$ the fast noise of the network, the partition function $Z_{N,p}^{RBM}(\beta)$ for the RBM is introduced as

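$$Z_{N,p}^{RBM}(\beta)=\sum_{\sigma}\int_{\mathbb{R}^{p}}d\mathcal{M}(z)\exp\left\{\sqrt{\frac{\beta}{N}}\sum_{\mu=1}^{p}\sum_{i=1}^{N}\xi_{i}^{\mu}\sigma_{i}z_{\mu}\right\},$$

where $d\mathcal{M}(z)=\prod_{\mu=1}^{p}\frac{dz_{\mu}}{\sqrt{2\pi}}e^{-z_{\mu}^{2}/2}$ is the $p$-dimensional centered Gaussian measure. The model free energy is defined as before. It is now a simple exercise to show (e.g., via standard Gaussian integration) the following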

Proposition 1.

The partition functions of the Associative Hopfield Network and of the Restricted Boltzmann Machines are the same, i.e.

$$Z_{N,p}^{AHN}(\beta)\equiv Z_{N,p}^{RBM}(\beta),$$

and thus the same equivalence holds for the two free energies.

Note that, while the identity $Z_{N}^{AHN}(\beta)\equiv Z_{N}^{RBM}(\beta)$ holds only if we choose Gaussian hidden units $z_{\mu}$, an analogous equivalence can be proved by introducing a class of generalised AHN and RBM models with arbitrary unit priors [lettera; lungo].
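Proposition 1 can be checked numerically on a very small system. The sketch below is only an illustration under stated assumptions: it takes the RBM exponent to carry the factor $\sqrt{\beta/N}$ (the normalisation for which the identity $\int \frac{dz}{\sqrt{2\pi}}e^{-z^{2}/2+az}=e^{a^{2}/2}$ reproduces the Hopfield weight), it enumerates all $2^{N}$ spin configurations by brute force, and it integrates each hidden unit with Gauss-Hermite quadrature. The function names `z_ahn` and `z_rbm` and all parameter values are hypothetical choices, not taken from the paper.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def z_ahn(xi, beta):
    """Hopfield partition function by brute-force sum over all spin configurations."""
    p, n = xi.shape
    total = 0.0
    for sigma in itertools.product((-1.0, 1.0), repeat=n):
        fields = xi @ np.array(sigma)            # sum_i xi_i^mu sigma_i, one entry per pattern
        total += np.exp(beta / (2 * n) * np.sum(fields ** 2))
    return total

def z_rbm(xi, beta, nodes=60):
    """RBM partition function; the Gaussian hidden units are integrated by quadrature."""
    p, n = xi.shape
    x, w = hermegauss(nodes)                     # nodes/weights for the weight exp(-x^2 / 2)
    w = w / np.sqrt(2 * np.pi)                   # normalise to the standard Gaussian measure
    total = 0.0
    for sigma in itertools.product((-1.0, 1.0), repeat=n):
        fields = xi @ np.array(sigma)
        # The integral over z factorises over the hidden units mu.
        per_unit = np.exp(np.sqrt(beta / n) * np.outer(fields, x)) @ w
        total += np.prod(per_unit)
    return total

rng = np.random.default_rng(0)
xi = rng.standard_normal((3, 6))                 # p = 3 Gaussian patterns, N = 6 neurons
beta = 0.8
z_hopfield = z_ahn(xi, beta)
z_machine = z_rbm(xi, beta)
print(z_hopfield, z_machine)
```

The two printed values should agree up to quadrature error; the brute-force sum restricts this check to a handful of neurons.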

In order to investigate the capabilities of these networks to retrieve patterns, it is useful to introduce the concept of Mattis magnetization as follows.

Definition 3.

For any $\mu\in(1,...,p)$, we define the Mattis magnetization, i.e. the overlap between the $\mu$-th pattern and the neuron states, as

$$m_{\mu,N}(\sigma)=\frac{1}{N}\sum_{i=1}^{N}\xi_{i}^{\mu}\sigma_{i}.$$

In the following we will often drop the $N$ or $\sigma$ dependence to lighten the notation.

The magnitude of the Mattis magnetization encodes whether pattern $\mu$ has been retrieved or not. Moreover, we can rewrite the Hamiltonian (2) as a function of the order parameters $m_{\mu}$ as

$$H_{N}^{AHN}(\sigma,\xi)=-\frac{N}{2}\sum_{\mu=1}^{p}m_{\mu}^{2},$$

hence it becomes clear that its energy minima are located at large $|m_{\mu}|$. This means that the energy is minimized when the spins are aligned with some of the $p$ patterns, which signals a retrieval state (i.e., the network as a whole works as a distributed associative memory).
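As a small illustration of the rewriting in terms of Mattis magnetizations, the following sketch evaluates the Hopfield energy both through the Hebbian couplings (self-interactions included, as in Definition 1) and through $-\frac{N}{2}\sum_{\mu}m_{\mu}^{2}$. The sizes, the seed and the helper names `mattis`, `energy_couplings` and `energy_mattis` are hypothetical choices made for the example.

```python
import numpy as np

def mattis(xi, sigma):
    """Mattis magnetizations m_mu = (1/N) sum_i xi_i^mu sigma_i, one per pattern."""
    return xi @ sigma / sigma.size

def energy_couplings(xi, sigma):
    """Hopfield energy from the Hebbian couplings, self-interactions included."""
    n = sigma.size
    J = xi.T @ xi / n
    return -0.5 * sigma @ J @ sigma

def energy_mattis(xi, sigma):
    """The same energy written as -(N/2) * sum_mu m_mu^2."""
    return -0.5 * sigma.size * np.sum(mattis(xi, sigma) ** 2)

rng = np.random.default_rng(1)
n, p = 200, 5
xi = rng.choice([-1.0, 1.0], size=(p, n))        # Boolean patterns
sigma = xi[0].copy()
flipped = rng.choice(n, size=20, replace=False)
sigma[flipped] *= -1                             # corrupt 10% of pattern 0

print(mattis(xi, sigma))
print(energy_couplings(xi, sigma), energy_mattis(xi, sigma))
```

The magnetization of the corrupted pattern stays close to one while the others are of order $1/\sqrt{N}$, which is the signature of a retrieval state described above.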

Let us now turn our attention to the RBM case. Its energy function (3) can be rewritten as

$$H_{N}^{RBM}(\sigma,\xi)=-\sqrt{N}\sum_{\mu=1}^{p}m_{\mu}z_{\mu},$$

thus, if the system is in the retrieval region, i.e., there is some pattern $\mu$ (say $\mu^{*}$) that is retrieved by the Hopfield network, the related Mattis magnetization rises from zero and acts as a staggered magnetic field on the related hidden variable $z_{\mu^{*}}$. In the Machine Learning perspective, this condition corresponds to selecting one feature among the $p$ possible ones, and allows a statistically significant classification of the data.

II Mixed Hebbian networks

In our “hybrid” Hopfield model, we consider the case in which the network has stored a low load of Boolean patterns and a high load of Gaussian ones. We will assign the variables $\tilde{\xi}^{\nu}$ , $\nu=1,\ldots,k=\gamma\ln N$ to the binary memories and $\xi^{\mu}$ , $\mu=1,\ldots,p=\alpha N$ to the real ones (with $\gamma,\alpha\;\geqslant\;0$ ). We have

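$$\begin{cases}\mathbb{P}\{\tilde{\xi}_{i}^{\nu}=+1\}=\mathbb{P}\{\tilde{\xi}_{i}^{\nu}=-1\}=\frac{1}{2}&\forall\,i=1,\dots,N\ \text{and}\ \nu=1,\dots,k,\\ \mathbb{P}(\xi_{i}^{\mu})\sim\mathcal{N}(0,1)&\forall\,i=1,\dots,N\ \text{and}\ \mu=1,\dots,p.\end{cases}$$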

Following the description of the standard Hopfield neural network given in Section I.1, we give the following

Definition 4.

The Hamiltonian $H_{N}^{MHN}(\sigma,\xi,\tilde{\xi})$ of the mixed Hebbian network (MHN), equipped with $N$ Ising neurons, a low load of $k$ binary patterns and a high load of $p$ real patterns, reads as

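$$H_{N}^{MHN}(\sigma,\xi,\tilde{\xi})=-\frac{1}{N}\sum_{1\leqslant i<j\leqslant N}\left(\sum_{\nu=1}^{k}\tilde{\xi}_{i}^{\nu}\tilde{\xi}_{j}^{\nu}+\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}\right)\sigma_{i}\sigma_{j}.$$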

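Notice that, extending the above summation to all pairs $(i,j)$ and subtracting the diagonal terms, the Hamiltonian of the mixed Hebbian network can be written as

$$H_{N}(\sigma,\xi,\tilde{\xi})=-\frac{1}{2N}\sum_{i,j=1}^{N}\left(\sum_{\nu=1}^{k}\tilde{\xi}_{i}^{\nu}\tilde{\xi}_{j}^{\nu}+\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}\right)\sigma_{i}\sigma_{j}+\frac{1}{2N}\sum_{i=1}^{N}\sum_{\mu=1}^{p}(\xi_{i}^{\mu})^{2}+\frac{k}{2},$$

hence the last term on the r.h.s. of the previous equation, being of order $\ln N$, does not contribute to the energy per neuron in the thermodynamic limit, while the second-last term, once divided by $N$, converges to

$$\lim_{N\to\infty}\frac{1}{N}\left[\frac{1}{2N}\sum_{i=1}^{N}\sum_{\mu=1}^{p}(\xi_{i}^{\mu})^{2}\right]=\frac{\alpha}{2}.$$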
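Both statements can be probed numerically. The sketch below compares, for one random instance, the pair-sum form of the mixed Hamiltonian with its rewriting over all $(i,j)$ plus diagonal corrections, and then evaluates the Gaussian diagonal term per neuron. It reads the limit as a statement on the energy density (the term is divided by $N$), which is an interpretation adopted here; the sizes, the seed and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, alpha, k_bool = 400, 0.5, 3
p = int(round(alpha * n))
xi_bool = rng.choice([-1.0, 1.0], size=(k_bool, n))   # low load of Boolean patterns
xi_real = rng.standard_normal((p, n))                 # high load of Gaussian patterns
sigma = rng.choice([-1.0, 1.0], size=n)

# Unnormalised Hebbian kernel: sum_nu xi~_i xi~_j + sum_mu xi_i xi_j
W = xi_bool.T @ xi_bool + xi_real.T @ xi_real

# Pair-sum form: only pairs with i < j contribute.
h_pairs = -np.sum(np.triu(W * np.outer(sigma, sigma), k=1)) / n

# Rewritten form: all (i, j), plus the two diagonal corrections.
diag_real = np.sum(xi_real ** 2) / (2 * n)
h_full = -(sigma @ W @ sigma) / (2 * n) + diag_real + k_bool / 2

print(h_pairs, h_full)
print(diag_real / n, alpha / 2)     # Gaussian diagonal term per neuron vs alpha / 2
print((k_bool / 2) / n)             # Boolean diagonal term per neuron, vanishing with N
```

For growing `n` the second printed pair gets closer by the law of large numbers, while the Boolean term decays like $\ln N/N$ when $k=\gamma\ln N$.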

Definition 5.

The Gibbs measure for a generic function of the neurons $F(\sigma)$ at a given level of noise $\beta$ is

$$\omega_{N}(F)=\frac{\sum_{\sigma}F(\sigma)e^{-\beta H_{N}(\sigma,\xi,\tilde{\xi})}}{Z_{N}(\beta)}.$$

Note that for $\beta\to 0$ the measure becomes flat, while for $\beta\to\infty$ , i.e. at zero temperature, only the global minima of the energy contribute to the measure.
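The two limits can be made concrete on a system small enough to enumerate. The sketch below is a toy illustration only: for simplicity it uses the plain Hopfield energy $-\frac{N}{2}\sum_{\mu}m_{\mu}^{2}$ with Boolean patterns rather than the full mixed Hamiltonian, and the helper `gibbs_weights`, the sizes and the two values of $\beta$ are hypothetical choices.

```python
import itertools
import numpy as np

def gibbs_weights(energies, beta):
    """Boltzmann weights exp(-beta * H) / Z over an explicit list of energies."""
    logw = -beta * (energies - energies.min())   # shift by the minimum for numerical stability
    w = np.exp(logw)
    return w / w.sum()

rng = np.random.default_rng(3)
n, p = 8, 2
xi = rng.choice([-1.0, 1.0], size=(p, n))
configs = np.array(list(itertools.product((-1.0, 1.0), repeat=n)))   # all 2^N states
energies = -0.5 * n * np.sum((configs @ xi.T / n) ** 2, axis=1)

flat = gibbs_weights(energies, beta=0.0)      # infinite temperature
cold = gibbs_weights(energies, beta=200.0)    # close to zero temperature

ground = np.isclose(energies, energies.min())
print(flat.min(), flat.max())                 # every configuration is equally likely
print(cold[ground].sum())                     # weight carried by the global minima
```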

Definition 6.

Given $s$ independent realizations (i.e., replicas) of the system, at the same noise level $1/\beta$ and quenched patterns $\xi$ and $\tilde{\xi}$ , we define the $s$ -replicated Gibbs measure as $\Omega=\omega^{1}\times\omega^{2}\times\ldots\times\omega^{s}$ , i.e. for any function of the $s$ neuron replicas $F(\sigma^{(1)},\ldots,\sigma^{(s)})$ ,

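$$\Omega\bigl(F(\sigma^{(1)},\ldots,\sigma^{(s)})\bigr)=\frac{1}{Z_{N}^{s}}\sum_{\sigma^{(1)}}\cdots\sum_{\sigma^{(s)}}F(\sigma^{(1)},\ldots,\sigma^{(s)})\exp\left\{-\beta\sum_{a=1}^{s}H_{N}(\sigma^{(a)},\xi,\tilde{\xi})\right\}.$$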

Definition 7.

The average over the quenched memories $\{\tilde{\xi}_{i}^{\nu}\}_{i,\nu}$ and $\{\xi_{i}^{\mu}\}_{i,\mu}$ for a generic function $F(\xi,\tilde{\xi})$ is introduced as

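$$\mathbb{E}[F(\xi,\tilde{\xi})]=\int\prod_{\mu=1}^{p}\prod_{i=1}^{N}\frac{d\xi_{i}^{\mu}}{\sqrt{2\pi}}e^{-\frac{(\xi_{i}^{\mu})^{2}}{2}}\times\prod_{\nu=1}^{k}\prod_{j=1}^{N}\sum_{\tilde{\xi}_{j}^{\nu}=\pm 1}\frac{1}{2}F(\xi,\tilde{\xi}).$$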

Moreover we define the average $\langle\cdotp\rangle=\mathbb{E}\Omega(\cdotp)$ .

We continue by introducing the order parameters necessary to carry out the analysis of the mixed model. For any pattern, we define the Mattis magnetization as before, to describe thermodynamic states in the retrieval phase, while we introduce overlaps among replicas, as in [BGG-JSP2010; BGGT-JSM2012], to describe ordered states that are not correlated with the patterns.

Definition 8.

Given two configurations $(a,b)$ of the network, the overlap $q_{ab}$ between visible units is defined as

$$q_{ab}(\sigma)=\frac{1}{N}\sum_{i=1}^{N}\sigma_{i}^{(a)}\sigma_{i}^{(b)}\in[-1,1],$$

and the overlap $p_{ab}$ between hidden units as

$$p_{ab}(z)=\frac{1}{p}\sum_{\mu=1}^{p}z_{\mu}^{(a)}z_{\mu}^{(b)}\in(-\infty,+\infty).$$
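A minimal sketch of the two overlaps follows, with hypothetical helper names `overlap_q` and `overlap_p` and arbitrary sample sizes. It only illustrates the different ranges: the overlap between Ising replicas is bounded by one in absolute value, while the overlap between real-valued hidden replicas is not.

```python
import numpy as np

def overlap_q(sigma_a, sigma_b):
    """Overlap between two replicas of the visible (Ising) units."""
    return float(np.mean(sigma_a * sigma_b))

def overlap_p(z_a, z_b):
    """Overlap between two replicas of the hidden (real-valued) units."""
    return float(np.mean(z_a * z_b))

rng = np.random.default_rng(4)
sigma_a = rng.choice([-1.0, 1.0], size=1000)
sigma_b = sigma_a.copy()
sigma_b[:250] *= -1                       # the second replica differs on a quarter of the sites

z_a = 3.0 * rng.standard_normal(500)      # hidden units are real, so overlaps are unbounded
z_b = 3.0 * rng.standard_normal(500)

print(overlap_q(sigma_a, sigma_a), overlap_q(sigma_a, sigma_b))
print(overlap_p(z_a, z_a), overlap_p(z_a, z_b))
```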

Finally, we introduce the free energy density as

Definition 9.

We define the free-energy density $A(\alpha,\beta)$ of the mixed Hebbian network as

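$$A(\alpha,\beta)=\lim_{N\to\infty}A_{N,k,p}(\beta),\qquad A_{N,k,p}(\beta)=\frac{1}{N}\mathbb{E}\ln Z_{N,k,p}(\beta),$$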

where the partition function $Z_{N,k,p}(\beta)$ reads as

[TABLE]

Therefore, the free-energy density at finite volume reads as

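$$\begin{split}A_{N,k,p}&(\beta)=\frac{1}{N}\mathbb{E}\log Z_{N,k,p}(\beta)=\\ =&\frac{1}{N}\mathbb{E}\Biggl[-\frac{\beta k}{2}-\frac{\beta}{2N}\sum_{i=1}^{N}\sum_{\mu=1}^{p}(\xi_{i}^{\mu})^{2}\Biggr]+\\ &+\frac{1}{N}\mathbb{E}\log\Biggl(\sum_{\sigma}\exp\Biggl\{\frac{\beta}{2N}\sum_{i,j=1}^{N}\sum_{\nu=1}^{k}\tilde{\xi}_{i}^{\nu}\tilde{\xi}_{j}^{\nu}\sigma_{i}\sigma_{j}+\frac{\beta}{2N}\sum_{i,j=1}^{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}\sigma_{i}\sigma_{j}\Biggr\}\Biggr)\\ =&-O\left(\frac{\ln N}{N}\right)-\frac{\alpha_{N}\beta}{2}+\\ &+\frac{1}{N}\mathbb{E}\ln\Biggl(\sum_{\sigma}\exp\Biggl\{\frac{\beta}{2N}\sum_{i,j=1}^{N}\sum_{\nu=1}^{k}\tilde{\xi}_{i}^{\nu}\tilde{\xi}_{j}^{\nu}\sigma_{i}\sigma_{j}+\frac{\beta}{2N}\sum_{i,j=1}^{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}\sigma_{i}\sigma_{j}\Biggr\}\Biggr),\end{split}$$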

in which the parameter $\alpha_{N}$ is such that $\alpha_{N}=\frac{p}{N}\to\alpha$ for $N\to\infty$ .

We recall that, in the statistical mechanical treatment, finding an explicit expression for the free-energy density $A(\alpha,\beta)$ is the first step for understanding the properties of the network’s thermodynamic states. This is because the solution of $A(\alpha,\beta)$ usually comes with a variational large deviation principle over the order parameters $\{m_{\mu},\ q_{ab},\ p_{ab}\}$ .

III Sum rules for the mixed Hebbian network’s free energy

In this Section we present the interpolating structure that we set up to obtain an expression for the MHN free-energy density, at the replica symmetric level, as a variational principle over the order parameters. The solution of this optimization problem is encoded in a set of self-consistent equations that the order parameters have to satisfy, which yield the phase diagram of the model as the external parameters are varied.

In particular, the question we are addressing in the present work is about the existence of a retrieval phase in such a phase diagram: we will prove that there is actually a region in the $(\alpha,\beta)$ plane where the mixed Hebbian network is able to retrieve, in particular where the signal conveyed by the binary patterns is detectable over the real noisy sea.

Summarizing the strategy, we will first generalize the partition function (12) by letting it depend on three interpolating parameters, namely $t\in\mathbb{R}^{+}$ , $x\in\mathbb{R}^{k}$ , $\psi\in[0,1]$ , chosen so that setting them to the proper values (i.e., $t=\beta$ , $x=0$ and $\psi=1$ ) recovers the original partition function of the mixed Hebbian network (see eq. 14). This interpolation will allow us to split the problem into two (related) sub-problems: one involving the Gaussian patterns, tackled by the stochastic stability technique in $\psi$ , and the other involving the Boolean patterns, treated via the Hamilton-Jacobi technique in the $1+k$ dimensional space $(t,\ x)$ .

Once a sum rule for the free energy has been formulated (see eq. 15), to set up the stochastic stability approach we will introduce three external fields $\mathcal{A},\ \mathcal{B},\ \mathcal{C}$ , where $\mathcal{A}$ acts on the $\{\sigma\}$ party, while $\mathcal{B}$ and $\mathcal{C}$ act on the $\{z\}$ party. Although explicit expressions for these fields will be set a posteriori, their meaning can be discussed immediately: they are required to ensure that the interpolative procedure in $\psi$ always reproduces the correct statistics on the neurons, but in a mean field picture where units are no longer coupled. The Hamilton-Jacobi formalism is naturally introduced when dealing with the explicit calculation of $A_{N,k,p}(t,x,\psi=0)$ , which represents the free energy density of a Hopfield network with binary patterns and an external random field supplied by the Gaussian sea (which comes into play as a quenched noise, hence acting against retrieval). In fact, $A_{N,k,p}(t,x,\psi=0)$ can be interpreted as the Guerra Action for a unitary-mass point-particle evolving in the $1+k$ dimensional $(t,\ x)$ space and can consequently be approached via standard techniques of Analytical Mechanics guerra-HJ ; HJ-Jstat .

We stress that the order in which these two methods are applied is interchangeable: in Appendix A we show that proceeding the other way around (that is, using first the Hamilton-Jacobi streaming and, later, the stochastic stability) yields the same results.

As a preliminary step, it is useful to apply the Gaussian integration to the partition function (12) to linearize the Gaussian section of the free energy density function $A_{N,k,p}(\beta)$ with respect to the bilinear quenched memories carried by $\xi_{i}^{\mu}\xi_{j}^{\mu}$ . Namely:

[TABLE]

where $d\mathcal{M}(z)=\prod_{\mu=1}^{p}\frac{dz_{\mu}}{\sqrt{2\pi}}e^{-z_{\mu}^{2}/2}$ is the $p$ -dimensional Gaussian measure.
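The Gaussian linearization rests on the elementary identity $e^{a^{2}/2}=\int\frac{dz}{\sqrt{2\pi}}e^{-z^{2}/2+az}$ , applied once per real pattern. The following is a minimal numerical check of that one-dimensional identity only; the value of `a` is an arbitrary stand-in for the pattern-dependent quantity it would replace in the model, and the helper name `gaussian_expectation` is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_expectation(f, n_nodes=60):
    """E[f(z)] for z ~ N(0,1), by Gauss-Hermite quadrature with weight exp(-z^2/2)."""
    nodes, weights = hermegauss(n_nodes)
    # The weights sum to sqrt(2*pi), so divide to obtain a probability measure.
    return float(np.sum(weights * f(nodes)) / np.sqrt(2.0 * np.pi))

a = 0.7                                   # arbitrary illustrative value
lhs = float(np.exp(a ** 2 / 2.0))         # quadratic term in the exponent
rhs = gaussian_expectation(lambda z: np.exp(a * z))  # linear term, averaged over z

print("exp(a^2/2)    =", lhs)
print("E_z[exp(a z)] =", rhs)
```

The two printed numbers should agree to numerical precision: the price paid for removing the quadratic term is one auxiliary Gaussian variable $z_{\mu}$ per real pattern, which plays the role of a hidden unit.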

As anticipated earlier, to achieve our goal we shall now analyse a generalized problem, for which we give hereafter the definition:

Definition 10.

Once introduced $k+2$ scalar parameters $t\in\mathbb{R}^{+},\ x\in\mathbb{R}^{k},\psi\in[0,1]$ , and three scalar fields $\mathcal{A},\ \mathcal{B},\ \mathcal{C}$ , the generalized partition function $Z_{N}(t,x,\psi)$ for the mixed Hebbian network is defined as

[TABLE]

with $\theta_{\mu},\eta_{i}\sim\mathcal{N}(0,1)$ $\forall\mu=1,\ldots,p$ , $i=1,\ldots,N$ .

Note that, for now, the scalar fields are given in full generality: they will be chosen later on, in order to ensure that the replica symmetric framework is preserved at the end of the interpolation.

Note further that, in perfect analogy, we can also extend the free energy density function to $A_{N,k,p}(t,x,\psi)$ , the Gibbs measures to $\omega_{t,x,\psi}$ and $\Omega_{t,x,\psi}$ and the overall average to $\langle\cdotp\rangle_{t,x,\psi}$ . Of course, these quantities also recover the standard statistical mechanical scenario once evaluated at $t=\beta$ , $x=0$ and $\psi=1$ .

We begin the study of the free energy density function through the stochastic stability. First, applying the Fundamental Theorem of Calculus to $A_{N,k,p}(t,x,\psi)$ in the $\psi$ variable, we obtain the following

Proposition 2.

The following sum rule for the generalised free energy $A_{N,k,p}(t,x,\psi)$ of the mixed Hebbian network holds

[TABLE]

The original problem is therefore recast in the evaluation of the two terms at the r.h.s. of eq. (15).

To compute the first term we start with a standard Gaussian integration, which gives

[TABLE]

It is now crucial to notice that the fourth term of Eq. (16) can be interpreted as the free energy density $\tilde{A}_{N,k}(t,x)$ of a Hopfield network with $k$ binary patterns $\{\tilde{\xi}^{\nu}\}$ and $N$ external random fields $\mathcal{A}\eta_{i}$ : note that the latter account for the slow noise supplied by the underlying sea of Gaussian patterns that can not be retrieved.

It is convenient to rename this free energy density $\tilde{A}_{N,k}(t,x)$ by the following definition:

Definition 11.

Once introduced a generalized partition function $Z_{N,k}(t,x)$ , identified by the following expression

[TABLE]

we define the Guerra Action $\tilde{G}_{N,k}(t,x)$ , for a unitary-mass point-particle moving in the $(1+k)$ dimensional $(t,x)$ space, as the negative free energy density $\tilde{A}_{N,k}(t,x)$ :

[TABLE]

With this definition, the application of the Hamilton-Jacobi formalism for handling $\tilde{A}_{N,k}(t,x)$ is straightforward. In fact, it is immediate to check that, as $\tilde{A}_{N,k}(t,x)$ has the following properties

[TABLE]

we can proceed according to the Hamilton-Jacobi prescription for $\tilde{G}_{N,k}(t,x)$ . In fact, thanks to the properties (18), it is immediate to verify the following

Proposition 3.

The Guerra Action obeys the following Hamilton-Jacobi streaming

[TABLE]

where the potential $V_{N,k}(t,x)$ is given by the sum over all the binary patterns of their related Mattis magnetization’s variances, namely

[TABLE]

Remark 1.

As we are in the low-storage regime for binary patterns (i.e., $k\propto\ln N$ ), in the thermodynamic limit the Guerra Action paints a Galilean trajectory for the point-like particle: its evolution is simply a free motion as $\lim_{N\to\infty}V_{N,k}(t,x)=0$ .

Proposition 4.

If we define a $k$ -dimensional vector $\Gamma_{N}(t,x)$ , whose components are $\Gamma_{N}^{\nu}(t,x)=\partial_{x_{\nu}}\tilde{G}_{N,k}(t,x)$ , by differentiating Eq. (19) with respect to $x_{\nu}$ we obtain the following set of $k$ Burgers equations for the canonical momenta

[TABLE]

At present, the goal is thus to solve the Burgers equations and integrate the solutions back, so as to solve the original problem for $\tilde{G}_{N,k}(t,x)$ (and therefore for $\tilde{A}_{N,k}(t,x)$ ) as well. As standard, performing the Cole-Hopf transform $\Phi_{N,k}(t,x):=e^{N\tilde{A}_{N,k}(t,x)}$ , we can assert that

Proposition 5.

Solving expression (20) is equivalent to solving the following Cauchy problem for the heat equation

[TABLE]

We can now deal with the problem above through standard techniques. Namely we write

[TABLE]

where $G$ is the Green propagator $G(t,x)=\left(\frac{N}{2\pi t}\right)^{k/2}e^{-\frac{\sum_{\nu}x_{\nu}^{2}N}{2t}}$ .
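The propagator quoted above is a Gaussian in $x$ with variance $t/N$ per component. As a sanity check, the sketch below verifies numerically, for $k=1$ , that it is normalized and that it satisfies a heat equation of the form $\partial_{t}G=\frac{1}{2N}\partial_{x}^{2}G$ . The diffusion coefficient $1/(2N)$ is an assumption inferred from the propagator itself, and the values of `N` and `t` are arbitrary.

```python
import numpy as np

def green_propagator(t, x, N):
    """Heat kernel for k = 1: a Gaussian in x with zero mean and variance t/N."""
    return np.sqrt(N / (2.0 * np.pi * t)) * np.exp(-N * x ** 2 / (2.0 * t))

N, t = 50, 0.8  # arbitrary illustrative values

# 1) Normalization: the kernel should integrate to one over x.
x = np.linspace(-3.0, 3.0, 20001)
dx = x[1] - x[0]
mass = float(np.sum(green_propagator(t, x, N)) * dx)

# 2) Heat equation residual d_t G - (1/(2N)) d_xx G, by central finite differences.
h = 1e-4
x0 = np.array([-0.2, 0.05, 0.3])
dG_dt = (green_propagator(t + h, x0, N) - green_propagator(t - h, x0, N)) / (2.0 * h)
d2G_dx2 = (green_propagator(t, x0 + h, N)
           - 2.0 * green_propagator(t, x0, N)
           + green_propagator(t, x0 - h, N)) / h ** 2
residual = float(np.max(np.abs(dG_dt - d2G_dx2 / (2.0 * N))))

print("integral of G over x:", mass)
print("max heat-equation residual:", residual)
```

Since the variance $t/N$ shrinks as $N$ grows, the convolution with the initial datum concentrates on a single point in the thermodynamic limit, which is what turns the integral representation into a saddle point problem.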

The computations for the initial condition $\Phi_{N,k}(0,x)$ return

[TABLE]

Therefore, we can state that

Theorem 1.

The solution to the problem in (21) is given by the following saddle point equation:

[TABLE]

Corollary 1.

Recalling that $\tilde{A}_{N,k}(t,x)=\frac{1}{N}\ln\Phi_{N,k}(t,x)$ , in the thermodynamic limit we have that

[TABLE]

To get the full expression of the Guerra Action in the thermodynamic limit, we must finally set $t=\beta$ , $x=0$ and perform the minimization of the function $g$ given in (24): with these values for $t$ and $x$ , we have to fix $x_{\nu}^{\prime}=\beta\langle m_{\nu}\rangle$ $\forall\nu=1,\ldots,k$ .
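The role of the extremal point $x^{\prime}=\beta\langle m\rangle$ can be illustrated numerically in the simplest case $k=1$ . The sketch below assumes a trial function of the form $\mathbb{E}_{\eta}\ln 2\cosh(x^{\prime}+\mathcal{A}\eta)-\frac{x^{\prime 2}}{2\beta}$ , which is the structure expected from a one-body initial condition propagated with the Gaussian kernel at $t=\beta$ , $x=0$ ; the overall sign convention of $g$ in (24) may differ, in which case the maximum found here corresponds to the minimum of $g$ . The field strength `field` stands for $\mathcal{A}$ and is given an arbitrary value.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

ETA, W = hermegauss(80)
W = W / np.sqrt(2.0 * np.pi)  # normalize weights to a standard Gaussian measure

def gauss_avg(values):
    """Average over eta ~ N(0,1) of values tabulated on the quadrature nodes."""
    return float(np.sum(W * values))

def trial(x_prime, beta, field):
    """Assumed k = 1 trial function: E ln 2cosh(x' + A*eta) - x'^2 / (2*beta)."""
    return gauss_avg(np.log(2.0 * np.cosh(x_prime + field * ETA))) - x_prime ** 2 / (2.0 * beta)

def trial_slope(x_prime, beta, field):
    """Derivative of the trial function with respect to x'."""
    return gauss_avg(np.tanh(x_prime + field * ETA)) - x_prime / beta

def stationary_point(beta, field, lo=0.1, iters=200):
    """Bisection on the slope; assumes a sign change between lo and 2*beta."""
    hi = 2.0 * beta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if trial_slope(mid, beta, field) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta, field = 2.0, 0.5  # arbitrary illustrative values
x_star = stationary_point(beta, field)
m_star = x_star / beta  # candidate Mattis magnetization

# Independent check: brute-force maximization of the trial function on a grid.
grid = np.linspace(0.0, 2.0 * beta, 4001)
x_grid = float(grid[np.argmax([trial(x, beta, field) for x in grid])])

print("x* =", x_star, " grid maximizer =", x_grid, " m =", m_star)
```

At the stationary point the condition $x^{\prime}/\beta=\mathbb{E}_{\eta}\tanh(x^{\prime}+\mathcal{A}\eta)$ holds, so identifying $x^{\prime}=\beta m$ turns the extremality condition into a self-consistency equation for the magnetization.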

At this point equation (15) is almost fully explicit. We still need to calculate the integral term on the right-hand side of equation (15), for which it is sufficient to evaluate the $\psi$ -derivative of the free-energy density $A_{N,k,p}(t,x,\psi)$ and write it in a form from which its replica symmetric approximation can be easily extracted.

Here we just provide the final result, while the step-by-step calculations for the $\psi$ -derivative are reported in Appendix B. Briefly,

[TABLE]

Fixing the free parameters $\mathcal{A}$ , $\mathcal{B}$ and $\mathcal{C}$ as

[TABLE]

and adding and subtracting the term $(\alpha_{N}\beta\cdot\bar{q}\bar{p})/2$ in Eq. (26) we have

[TABLE]

In the replica symmetric regime, the order parameters $m$ , $q_{12}$ , $p_{12}$ do not fluctuate with respect to their quenched averages in the thermodynamic limit, i.e. using a bar to denote their averages, $\langle m\rangle_{t,x}\to\bar{m}$ , $\langle q_{12}\rangle_{t,x}\to\bar{q}_{12}$ , $\langle p_{12}\rangle_{t,x}\to\bar{p}_{12}$ as $N\to\infty$ . By choosing $\bar{p}=\bar{p}_{12}$ and $\bar{q}=\bar{q}_{12}$ the last term at the r.h.s. of the above expression goes to zero in the thermodynamic limit and the $\psi$ -derivative, being constant in $\psi$ , can be integrated trivially. It is known bg ; Bip ; zigg ; pizzo that the optimal values of $\bar{p}$ and $\bar{q}$ can simply be obtained by computing the two overlaps at $\psi=0$ , and this turns out to be equivalent to taking the extremum of the trial free energy (29) w.r.t. $\bar{p}$ and $\bar{q}$ , as stated in the following main theorem.

Theorem 2.

The replica-symmetric free-energy density of the mixed Hebbian network defined by the Hamiltonian (6), in the thermodynamic limit, is determined by extremizing $A(\boldsymbol{m},\bar{q},\bar{p};\alpha,\beta)$ , where

[TABLE]

with $\eta\sim\mathcal{N}(0,1)$ and where the values of its order parameters are set via their following self-consistencies

[TABLE]
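To give a feeling for how such self-consistencies are handled in practice, the following sketch iterates a set of replica-symmetric equations of the Amit-Gutfreund-Sompolinsky type for a single retrieved pattern. The specific form used here, $m=\mathbb{E}_{\eta}\tanh(\beta m+\sqrt{\alpha\beta\bar{p}}\,\eta)$ , $\bar{q}=\mathbb{E}_{\eta}\tanh^{2}(\beta m+\sqrt{\alpha\beta\bar{p}}\,\eta)$ , $\bar{p}=\beta\bar{q}/(1-\beta(1-\bar{q}))^{2}$ , is an assumption chosen to be consistent with $\mathcal{A}=\sqrt{\alpha\beta\bar{p}}$ and with the classical theory, and it is not a transcription of the theorem's equations. The solver name `solve_rs`, the damping and the starting point are likewise illustrative choices.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

ETA, W = hermegauss(80)
W = W / np.sqrt(2.0 * np.pi)  # normalize weights to a standard Gaussian measure

def solve_rs(alpha, beta, m0=1.0, damping=0.5, iters=2000):
    """Damped fixed-point iteration of the ASSUMED AGS-type equations.

    Returns (m, q, p_bar). Starting from m0 = 1 selects the retrieval
    solution whenever it exists; no safeguard is included for parameter
    values where 1 - beta*(1 - q) vanishes.
    """
    m, q = m0, 1.0
    p_bar = beta
    for _ in range(iters):
        p_bar = beta * q / (1.0 - beta * (1.0 - q)) ** 2
        # Signal beta*m plus quenched noise generated by the real patterns.
        local_field = beta * m + np.sqrt(alpha * beta * p_bar) * ETA
        th = np.tanh(local_field)
        m_new = float(np.sum(W * th))
        q_new = float(np.sum(W * th ** 2))
        m = (1.0 - damping) * m + damping * m_new
        q = (1.0 - damping) * q + damping * q_new
    return m, q, p_bar

m_cw, q_cw, _ = solve_rs(alpha=0.0, beta=2.0)      # no real patterns: Curie-Weiss
m_hot, _, _ = solve_rs(alpha=0.0, beta=0.5)        # above the critical temperature
m_ret, q_ret, _ = solve_rs(alpha=0.05, beta=5.0)   # low load, low temperature

print("alpha=0,    beta=2:   m =", m_cw)
print("alpha=0,    beta=0.5: m =", m_hot)
print("alpha=0.05, beta=5:   m =", m_ret)
```

With $\alpha=0$ the noise term disappears and the iteration reduces to the Curie-Weiss equation $m=\tanh(\beta m)$ , while for small $\alpha$ and low temperature the magnetization stays close to one, i.e. the signal survives the noise produced by the real patterns.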

Remark 2.

We highlight that for $\alpha=0$ and $k=1$ (a single binary pattern) we recover the Curie-Weiss free energy density barra0 , while, if $\alpha>0$ and $k=0$ (no binary patterns), we recover the free energy density of the analog Hopfield model at high storage BGG-JSP2010 and, finally, keeping $k=0$ , with $\alpha\to\infty$ (such that $\alpha\beta^{2}=\beta^{\prime}$ , with $\beta^{\prime}$ finite), we recover the expression of the Sherrington-Kirkpatrick free energy density at noise level $\beta^{\prime}$ BGGT-JSM2012 ; Bip .

Remark 3.

In order to gain insight into the critical behavior exhibited by the system, we can expand the expression (31) for small $q$ , as is standard when dealing with second-order phase transitions:

[TABLE]

This procedure returns a (second order) transition line for ergodicity breaking at

[TABLE]

that is the same as the one for the (standard, i.e. digital) Hopfield network ags1 ; ags2 as well as for its analog counterpart BGG-JSP2010 ; BGGT-JSM2012 : this is not particularly surprising as we are checking here the pure ergodic/spin-glass transition where Universality is expected to hold carmona ; genovese .
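For reference, the ergodicity-breaking line of the classical Hopfield network is $\beta_{c}=1/(1+\sqrt{\alpha})$ , and the sketch below assumes that this is the line meant above. It also checks the line against a small-$q$ linearization, $\bar{q}\simeq\alpha\beta\bar{p}$ with $\bar{p}\simeq\beta\bar{q}/(1-\beta)^{2}$ , whose form is an assumption taken from the standard replica-symmetric theory and is not copied from the expansion in the text. The function names are hypothetical.

```python
import math

def beta_critical(alpha):
    """Classical ergodic / spin-glass line: beta_c = 1 / (1 + sqrt(alpha))."""
    return 1.0 / (1.0 + math.sqrt(alpha))

def linear_gain(alpha, beta):
    """Coefficient multiplying q in the assumed small-q expansion (needs beta != 1)."""
    return alpha * beta ** 2 / (1.0 - beta) ** 2

for alpha in (0.01, 0.1, 1.0):
    beta_c = beta_critical(alpha)
    # On the critical line the linearized map has unit gain: q = 0 becomes unstable.
    print(f"alpha={alpha}: beta_c={beta_c:.6f}, gain={linear_gain(alpha, beta_c):.6f}")

print("alpha -> 0 limit:", beta_critical(0.0))
```

Below the line (larger $\beta$ ) the gain exceeds one and a solution with $\bar{q}>0$ bifurcates continuously from zero, which is the signature of a second-order transition.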

A different intuition is needed when searching for the boundary (i.e. the transition line) separating the spin-glass phase (whose existence has never been in question) from a (possible) region of retrieval (whose existence is not straightforward).

To find this first-order transition line we must compare the values of the two free energies (the one under the pure state ansatz holding for retrieval and the other for no net magnetization accounting for the spin glass phase), and check that there is a region in the $(\alpha,\beta)$ plane where one prevails over the other and a complementary region where the opposite is true. The transition line is just given by the set of points in the parameter space where the two free energies balance. Our results return the same transition (hence the same retrieval region) of the standard (i.e. digital) Hopfield network. Its analog counterpart does not retrieve at all, hence in that case there is no line to compare with.

The whole can be restated in the following

Proposition 6.

The mixed Hebbian network, equipped with an extensive load of real patterns and with a low load of binary patterns, is able to handle the binary patterns as long as the system stays confined within the standard retrieval region Ton .

Remark 4.

Once we have fixed the parameters $\mathcal{A}$ , $\mathcal{B}$ and $\mathcal{C}$ (noting, in particular, that $\mathcal{A}=\sqrt{\alpha\beta\bar{p}}$ ) and we have an explicit expression for the mixed Hebbian network’s free energy density (see eq. 29), via its $\langle m_{\nu}\rangle$ self-consistency we can appreciate how the high load of real patterns acts as a disturbing noise against the signal carried by the Boolean patterns.

Remark 5.

Note that, in the $\alpha\to 0$ limit (hence neglecting the real sea), the critical point becomes $\beta_{c}=1$ . This is perfectly consistent with the emergence of a ferromagnetic phase (i.e., the point $(\beta=1,\alpha=0)$ is the Curie-Weiss or Mattis critical point).

Note that a fairly standard use of the replica trick allows one to extend the previous result to the case of a high load of Boolean patterns too. Since it is not a rigorous argument, we state the following as a

Conjecture 1.

Assuming a high storage of both real patterns (hence $p=\alpha N$ ) and binary patterns (hence $k=\gamma N$ ), Theorem 2 keeps holding as long as we replace $\alpha\to\alpha+\gamma$ .

IV Conclusions

The Hopfield neural network and the restricted Boltzmann machine are amongst the best known and most intensively studied models in Artificial Intelligence. The former is meant to mimic retrieval, namely the capacity of (the neurons of) a machine to recall a pattern of information previously stored. The latter is meant to mimic learning, namely the capacity of (the synapses of) a machine to be trained to encode selected patterns of information. Remarkably, Hopfield networks and Boltzmann machines share the same thermodynamics. This equivalence has several implications and, in particular, it implies that the conditions under which the former is able to retrieve are the same conditions under which the latter is able to identify features in the input. In fact, in this equivalence, the patterns of information retrieved by the Hopfield model correspond to the optimized weights of the trained Boltzmann machine.

However, in the wide Literature concerning these models, the patterns handled by the Hopfield model are typically binary, while the weights the Boltzmann Machine usually ends up with are real: this gap looks structural since the retrieval of real patterns (at least in the high-load regime) is beyond the Hopfield model capabilities. While numerical understanding in the field increases at an impressive rate, analytical improvements proceed more slowly. In order to get further insights into this point through the analytic perspective, in this work we considered a mixed Hopfield network, where patterns are partly real and partly binary and we studied its statistical mechanical properties (i.e., we focused on the behavior of averaged systems and in the thermodynamic limit, which is not the typical benchmark in Computer Science).

In particular, we rigorously gave a positive answer to the question of whether such a hybrid network with a high load of analog patterns and a low load of binary patterns is able to retrieve the latter (on the other hand, the retrieval of a high load of analog patterns is already known to be unfeasible bov-stoc ; lungo ). We proved that the hybrid model shares the same phase diagram of the classic Hopfield network with a high storage of Boolean patterns only: in the parameter space, where parameters are given by the fast noise (i.e., the temperature) and by the slow noise (i.e., the “sea” of analog patterns), there exists a retrieval region bounded by a first-order transition line.

This result has been achieved by developing a novel interpolating technique entirely stemming from the Guerra scheme (see BGG-JSP2010 ; BGGT-JSM2012 and HJ-Jstat ; Bip ). In a nutshell, exploiting the above mentioned equivalence, we recast the hybrid Hopfield model in terms of its related Boltzmann machine and then we ask for stochastic stability of the bulk of patterns (hence the real ones). We interpolate between the free energy of the mixed Hopfield model and two one-body random systems (whose factorized treatment becomes straightforward). This approach allows us to recognize, within the free energy contribution due to real patterns, another nestling free-energy density due to the Boolean contribution of the binary patterns. The latter can then be extracted via the Hamilton-Jacobi route in terms of its natural order parameters. This approach allows detecting when the signal carried by a logarithmic load of Booleans is strong enough to shine over the noisy sea generated by the extensive storage of Gaussian patterns.

Finally, we stress that this machinery does not apply in the case of a high load of both real and binary patterns. This challenging case can however be addressed via a fairly standard replica-trick calculation, which provides evidence that the outlined scenario is preserved as long as the sum of the two slow noises (stemming from the two contributions of real and binary patterns) does not exceed the usual threshold.

Acknowledgements

E.A. acknowledges financial support from GNFM-INdAM (Progetto Giovani Agliari-2016).

A.B. acknowledges financial support from Salento University and by GNFM-INdAM.

D.T. acknowledges financial support from GNFM-INdAM (Progetto Giovani Tantari-2016).

Appendix A The inverse process

In this appendix we shall illustrate that proceeding first with the HJ formalism and then with the stochastic stability is equivalent to the process we described in Sec. III. Briefly, the method consists of the following steps.

Now, instead of the generalized partition function defined in (14), we have the following:

[TABLE]

where we can notice the Hamilton-Jacobi scaffold in the interpolation of the Boolean section of the system. We recover the proper partition function if we put $t=\beta$ and $x=0$ , while if $t=0$ and $x=1$ we obtain a one-body problem for the boolean memories.

Even though the generalized free energy is now defined through this new partition function, the equations for its derivatives expressed in (18) still hold and therefore we can proceed with the Hamilton-Jacobi formalism adopting the same argument we used in Sec. III. So Eqs. (19), (20) and (21) still hold, but now the initial state function $A_{N,k,p}(0,x)$ is

[TABLE]

This function is now interpretable as the free energy density at a finite volume $N$ of a Hopfield network with $p$ real patterns and an external field (that this time contains patterns of information), so we can now use the stochastic stability technique to write an explicit form of the expression above. To do so, we introduce the variable $\psi\in[0,1]$ and the interpolated free energy density:

[TABLE]

Mirroring the exposition reported in the main text, we can now apply the Fundamental Theorem of Calculus in $\psi$ , perform almost the same calculations and substitute the values of the free parameters according to (27). What we obtain is:

[TABLE]

Now recalling that the solution to (21) is defined by (22), and that $\Phi_{N,k,p}=e^{NA_{N,k,p}}$ we can write the free energy density function at a finite volume $N$ :

[TABLE]

where

[TABLE]

In the thermodynamic limit the free-energy density is consequently obtained by (25) with the help of a saddle point argument. So, fixing the parameters $t$ and $x$ to be $t=\beta$ , $x=0$ and finding that the minimum of the function $g$ is determined by $x_{\nu}^{\prime}=\beta\langle m_{\nu}\rangle$ , we can write the following expression for $A(\alpha,\beta)$ :

[TABLE]

which is exactly the same as Eq. (29) that we found through the calculations of Sec. III, where the two methods were applied in the reverse order.

Appendix B Calculating the $\psi$ -streaming of the interpolating free energy

As anticipated in Sec. III, in this appendix we will illustrate the calculations of the $\psi$ -derivative of the generalized free energy density $A_{N,k,p}(t,x,\psi)$ written in Eq. (26).

When evaluating the streaming $d_{\psi}A_{N,k,p}(t,x,\psi)$ we get the sum of four terms: $I$ , $II$ , $III$ and $IV$ , that we shall analyse shortly. Each one comes from differentiating a corresponding exponential term appearing in the expression of the generalized free energy density, whose generalized partition function $Z_{N,k,p}(t,x,\psi)$ is defined in (14).

We recall that in Sec. III we introduced the generalized average $\langle\cdot\rangle_{t,x,\psi}$ , which naturally extends the Gibbs measure encoded in the interpolating scheme (and reduces to the proper one when setting $t=\beta$ , $x=0$ and $\psi=1$ ). To lighten the expressions, we introduce the function $B_{N,k,p}(t,x,\psi)$ that stands for the generalized Boltzmann factor.

We can now show the calculations of terms $I$ , $II$ , $III$ and $IV$ :

[TABLE]

In these three equations we used integration by parts (Wick’s Theorem), and we manipulated the expressions in order to let the order parameters $q_{12}$ and $p_{12}$ appear (for their general definitions see Eqs. (9) and (10)). Term $IV$ is easily computed through the standard Gaussian integration:

[TABLE]
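The Gaussian integration by parts used for terms $I$ , $II$ and $III$ is the identity $\mathbb{E}[\eta F(\eta)]=\mathbb{E}[F^{\prime}(\eta)]$ for $\eta\sim\mathcal{N}(0,1)$ . A minimal numerical check is sketched below for the test function $F(\eta)=\tanh(a+b\eta)$ ; this choice of $F$ and the values of `a` and `b` are arbitrary and serve only to illustrate the identity, not to reproduce any of the terms above.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gauss_avg(f, n_nodes=100):
    """E[f(eta)] for eta ~ N(0,1), by Gauss-Hermite quadrature."""
    eta, w = hermegauss(n_nodes)
    return float(np.sum(w * f(eta)) / np.sqrt(2.0 * np.pi))

a, b = 0.4, 0.9  # arbitrary illustrative values

# Left-hand side: E[eta * F(eta)] with F(eta) = tanh(a + b*eta).
lhs = gauss_avg(lambda eta: eta * np.tanh(a + b * eta))
# Right-hand side: E[F'(eta)] with F'(eta) = b / cosh(a + b*eta)^2.
rhs = gauss_avg(lambda eta: b / np.cosh(a + b * eta) ** 2)

print("E[eta F(eta)] =", lhs)
print("E[F'(eta)]    =", rhs)
```

Each use of the identity trades an explicit Gaussian variable for a derivative of the Boltzmann average, which is how squared averages, and hence the two-replica overlaps $q_{12}$ and $p_{12}$ , enter the streaming.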

Summing the final expressions of Eqs. (36), (40), (44) and (45) we have:

[TABLE]

which is what we reported in Eq. (26).

Bibliography

(1) E. Agliari, et al., Multitasking associative networks, Phys. Rev. Lett. 109, 268101, (2012).
(2) E. Agliari, et al., Retrieval capabilities of hierarchical networks: From Dyson to Hopfield, Phys. Rev. Lett. 114, 028103, (2015).
(3) E. Agliari, A. Barra, A. De Antoni, A. Galluzzi, Parallel retrieval of correlated patterns: From Hopfield networks to Boltzmann machines, Neural Networks 38, 52-63, (2013).
(4) M. Aizenman, P. Contucci, On the stability of the quenched state in mean-field spin-glass models, J. Stat. Phys. 92(5), 765-783, (1998).
(5) D.J. Amit, Modeling brain function: The world of attractor neural networks, Cambridge University Press, (1992).
(6) D.J. Amit, H. Gutfreund, H. Sompolinsky, Spin Glass model of neural networks, Phys. Rev. A 32, 1007-1018, (1985).
(7) D.J. Amit, H. Gutfreund, H. Sompolinsky, Storing infinite numbers of patterns in a spin glass model of neural networks, Phys. Rev. Lett. 55, 1530-1533, (1985).
(8) A. Auffinger, W.K. Chen, Free energy and complexity of spherical bipartite models, J. Stat. Phys. 157, 40, (2014).
