High storage capacity in the Hopfield model with auto-interactions - stability analysis
Jacopo Rocchi, David Saad, Daniele Tantari

TL;DR
This paper analyzes the stability of new fixed points in the Hopfield model with auto-interactions, revealing their instability and limited usefulness for reliable pattern storage despite high capacity claims.
Contribution
It provides a stability analysis of recently proposed high-capacity fixed points in the Hopfield model, highlighting their instability and limitations for associative memory applications.
Findings
New fixed points are unstable under small perturbations
Errors in stored patterns tend to increase during retrieval
High storage capacity fixed points are of limited practical use
Abstract
Recent studies point to the potential storage of a large number of patterns in the celebrated Hopfield associative memory model, well beyond the limits obtained previously. We investigate the properties of new fixed points to discover that they exhibit instabilities for small perturbations and are therefore of limited value as associative memories. Moreover, a large deviations approach also shows that errors introduced to the original patterns induce additional errors and increased corruption with respect to the stored patterns.
Jacopo Rocchi
Nonlinearity and Complexity Research Group, Aston University, Birmingham B4 7ET, United Kingdom
David Saad
Nonlinearity and Complexity Research Group, Aston University, Birmingham B4 7ET, United Kingdom
Daniele Tantari
Scuola Normale Superiore, Centro Ennio de Giorgi, Piazza dei Cavalieri 3, I-56100, Pisa, Italy
1 Introduction
Hopfield models [1, 2, 3] are recurrent neural networks where connections between units form a fully connected symmetric network. They have been proposed as models of content addressable memories, i.e. systems that are able to retrieve memory items from partial information. Their introduction was inspired by the observation that in large physical systems, interactions between the elementary degrees of freedom generate collective phenomena, such as low-temperature magnetisation in Ising models. Following this idea, the stability of memories in systems of interacting neurons has been successfully described as an emergent property, instigated by the dynamics of neural network models [1, 2].
Any physical system whose dynamics is dominated by a number of locally stable states can act as a content addressable memory as long as these states can be controlled. The Hebbian rule [4] has played an important role in training the couplings between neurons such that a prescribed set of memories (binary configurations of the Hopfield model) become attractors of the dynamics. Hopfield pointed out [1] that the issue of pattern retrieval is non-trivial and that retrieval performance falls rapidly as more patterns are introduced and incorporated in the couplings. This behaviour was first found in numerical simulations and then analysed utilising statistical mechanics methods and exploiting the analogy with spin glass models. With $P$ the number of patterns, $N$ the number of neurons and $\alpha = P/N$, it has been found [5, 6] that the critical value of $\alpha$ below which recovery is possible is approximately $0.14$. Original studies considered the case where diagonal terms are not present, i.e. neural network models without auto-interactions. Subsequent studies [7, 8] also considered the problem with auto-interactions, but only recently has it been pointed out [9] that in this case an interesting regime can be found at $\alpha \gg 1$. The probability that a given pattern is not a fixed point of the dynamics was studied and it has been shown that this probability is very small for very low $\alpha$, as expected, but surprisingly that there is another unexplored region at very large $\alpha$ where this probability is very small as well. While the former and other intermediate regimes are well studied [6], the behaviour at $\alpha \gg 1$ was unexpected, since it implies that in this new regime the patterns are again fixed points of the dynamical equations. Moreover, it has been pointed out [9] that this new regime does not appear in the absence of diagonal interaction terms.
In this paper we study the stability of these fixed points. The relevance of this analysis comes from the fact that associative memories are useful only in the regime where they recover memories on the basis of similarity. In other words, the Hopfield model can be used to retrieve memories if, when starting the dynamics from a configuration similar, but not exactly equal, to a given pattern, we converge to one of the original patterns. Our analysis suggests that although training patterns are fixed points of the dynamics in the newly discovered regime [9], they are unstable, in contrast to the known regime of small $\alpha$ values. In Sec. 2 we introduce the model and its dynamics, while in Sec. 3 we compute the probability of escaping the stored pattern when a small perturbation is introduced. In Sec. 4 we complete the analysis by computing the typical number of errors made after one dynamical step, where errors are measured in terms of the Hamming distance between the initial configuration (one of the training vectors) and the dynamical configuration.
2 Dynamics of a neural network
The neural network model that we will consider in this work is a system of $N$ binary variables (neurons) interconnected by a symmetric network of synapses specified by the real coupling matrix $J$. We will focus on the non-linear dynamical equations
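$$\sigma_i(t+1) = \mathrm{sign}\Big( \sum_{j=1}^{N} J_{ij}\, \sigma_j(t) \Big), \qquad i = 1, \dots, N, \qquad (1)$$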
where $\sigma_i(t)$ represents the state of neuron $i$ at time $t$, which may be active, $\sigma_i(t) = +1$, or inactive, $\sigma_i(t) = -1$. The value $\sigma_i(t+1)$ depends on the state of the neurons at the previous time step, $t$. These equations give rise to a dynamical process in the space of configurations, depending on the properties of the matrix $J$ but, as pointed out by Hopfield [1], a careful choice of $J$ may trap the dynamics in basins of attraction that correspond to a given random set of $P$ patterns (training vectors) $\xi^{\mu}$, where $\mu = 1, \dots, P$ and $\xi_i^{\mu} \in \{-1, +1\}$. These patterns can be considered as memories that the system is able to retrieve and should be fixed points of Eq. (1). In the following we will focus on the case where the matrix $J$ is specified by the Hebbian rule [4],
$$J_{ij} = \frac{1}{N} \sum_{\mu=1}^{P} \xi_i^{\mu} \xi_j^{\mu}, \qquad (2)$$
introduced in order to explain associative learning. In fact, Eq. (2) can be obtained cumulatively from the successive application of the learning rule $\Delta J_{ij} \propto \xi_i^{\mu} \xi_j^{\mu}$, which specifies the change in the coupling between neurons $i$ and $j$ when learning a given pattern $\xi^{\mu}$ and describes the observation that the simultaneous activation of neurons $i$ and $j$ increases the coupling strength between them.
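The construction described above can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' code: it assumes the common normalisation J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, and the function names (`hebbian_couplings`, `parallel_step`, `random_patterns`) and the tie-breaking convention sign(0) = +1 are choices made here for the example.

```python
import random

def random_patterns(p, n, rng):
    """Draw p random patterns of n binary (+1/-1) entries."""
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]

def hebbian_couplings(patterns, keep_diagonal=True):
    """Hebbian matrix J_ij = (1/N) * sum_mu xi_i^mu * xi_j^mu (assumed normalisation).

    keep_diagonal=True keeps the auto-interactions J_ii = P/N;
    keep_diagonal=False zeroes them, as in the early studies.
    """
    n = len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    for xi in patterns:  # cumulative application of the learning rule
        for i in range(n):
            for j in range(n):
                J[i][j] += xi[i] * xi[j] / n
    if not keep_diagonal:
        for i in range(n):
            J[i][i] = 0.0
    return J

def sign(x):
    # Tie-breaking at zero is an assumption of this sketch.
    return 1 if x >= 0 else -1

def parallel_step(J, s):
    """One step of parallel dynamics: s_i <- sign(sum_j J_ij s_j)."""
    return [sign(sum(Jij * sj for Jij, sj in zip(row, s))) for row in J]
```

With a single stored pattern the local field on every site is aligned with the pattern itself, so `parallel_step(hebbian_couplings([xi]), xi)` returns `xi` unchanged, which is a quick sanity check of the update rule.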
Retrieval of patterns is known to be possible only for a number of patterns that is a small fraction of that of the neurons [1, 2, 5, 6]. Diagonal interaction terms were not considered in the early works on the Hopfield model for a physical reason: in the corresponding spin models the field acting on a variable is induced by the state of its neighbours, but not by its own state; thus self-interactions do not exist and $J_{ii} = 0$. Neural networks with diagonal terms have been studied in [9] and a very interesting regime has been found for $\alpha \gg 1$. The probability that a given pattern is not a fixed point of the dynamics has been computed and has been shown to be very small for very low $\alpha$, as expected, but surprisingly another region has been identified in the very large $\alpha$ regime, where this probability is also very small. In the intermediate regime, it is large. The first and the intermediate regimes are well known but the behaviour at $\alpha \gg 1$ was new and unexpected. The probability that a random vector, not in the training set, is not a fixed point of the dynamics has also been studied. As expected, in the low $\alpha$ regime, this probability is close to $1$, but as $\alpha$ increases it vanishes. Thus, in the new, large $\alpha$ regime, it is also very small. In other words, in this regime, both patterns and random configurations are likely to be fixed points of the dynamics. This has a trivial interpretation: for very large $\alpha$, the interaction matrix defined in Eq. (2) tends to the unit matrix. Even if this result seems to invalidate the usefulness of this new regime, it was shown that the ratio of the two probabilities tends to a finite number and that real patterns have a higher probability of being fixed points of the model. In the next section we address the stability question of these patterns, making use of the same strategy used in [9].
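The observation that both stored patterns and random configurations become fixed points at large load can be checked numerically. The sketch below is illustrative only: it assumes Hebbian couplings normalised by 1/N with the diagonal kept, computes the local fields without building the matrix, and uses hypothetical helper names (`local_fields`, `is_fixed_point`, `large_load_demo`) and arbitrary small sizes.

```python
import random

def local_fields(patterns, s):
    """h_i = sum_j J_ij s_j for Hebbian J with auto-interactions (J not built explicitly)."""
    n = len(s)
    h = [0.0] * n
    for xi in patterns:
        overlap = sum(a * b for a, b in zip(xi, s))  # sum_j xi_j^mu s_j
        for i in range(n):
            h[i] += xi[i] * overlap / n
    return h

def is_fixed_point(patterns, s):
    """True if every local field is strictly aligned with the current state."""
    return all(hi * si > 0 for hi, si in zip(local_fields(patterns, s), s))

def large_load_demo(n=20, p=2000, seed=1):
    """Return (stored pattern is fixed, random configuration is fixed)."""
    rng = random.Random(seed)
    patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    stranger = [rng.choice((-1, 1)) for _ in range(n)]  # not a stored pattern
    return is_fixed_point(patterns, patterns[0]), is_fixed_point(patterns, stranger)
```

With the default sizes the load is 100, and the diagonal contribution to each field is roughly ten standard deviations of the crosstalk, so both checks are expected to succeed. Calling `large_load_demo(n=200, p=2)` instead probes the low-load regime, where the stored pattern is still fixed but a random configuration is not.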
3 Stability of the fixed points
In order to study the stability of fixed points of the dynamical equations given in Eq. (1), we consider the case where one of the patterns is randomly perturbed. Since the patterns are configurations of binary variables, a random perturbation is obtained by flipping the value of $k$ sites. We denote by $K$ the set of perturbed sites and by $U$ the set of unperturbed variables. We can consider the equations
[TABLE]
and compute the probability of getting back to the original pattern when the starting configuration is given by
[TABLE]
where is one if , and zero otherwise. As in [9] we focus on the one-step dynamics.
Let us first consider the case of an unperturbed site. After some elementary algebra, the argument of the sign function in the r.h.s. of Eq. (3) becomes
[TABLE]
The second term contains uncorrelated terms of unitary variance. Using the central limit theorem we obtain for large
[TABLE]
where is drawn from a normalised Gaussian distribution. Clearly, if were 0, and we recover the correct sign. The variable , induced by the arbitrary bit flips, is thus a destabilising term, that impacts on the r.h.s. of Eq. (6). It is actually harmless as soon as it doesn’t change the sign of , thus we make a mistake on the value with a probability equal to
[TABLE]
Analogously, for a perturbed site, the r.h.s. of Eq. (3) becomes
[TABLE]
Using a central limit argument one obtains
[TABLE]
where the noise variable is again drawn from a normalised Gaussian distribution. As in Eq. (7) we obtain the probability of making an error on one of the perturbed variables,
[TABLE]
Notice that the error probabilities for unperturbed and perturbed variables differ in the sign in front of $\alpha$, indicating the contribution coming from the diagonal interaction components. This contribution is always aligned with the current value of the variable and consequently decreases the error probability in unperturbed variables. In the limit of large $N$ and $P$ at fixed $\alpha$ and finite $k$ we obtain
[TABLE]
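As a cross-check of this limit, the single-site error probabilities can be rederived under explicit assumptions. The sketch below assumes Hebbian couplings $J_{ij} = \frac{1}{N}\sum_{\mu} \xi_i^{\mu}\xi_j^{\mu}$ with the diagonal kept, a pattern $\xi^{1}$ with $k$ flipped sites as starting configuration $\sigma$, and a standard Gaussian variable $z$ for the crosstalk from the other $P-1$ patterns. The labels $p_U$ (unperturbed site) and $p_K$ (perturbed site) are chosen here for illustration and need not match the notation of the original equations.

```latex
\begin{align}
\xi_i^{1} \sum_{j} J_{ij}\,\sigma_j
  &\simeq \frac{N-2k}{N} \pm \frac{P-1}{N}
   + \frac{\sqrt{(N-1)(P-1)}}{N}\, z ,
   \qquad z \sim \mathcal{N}(0,1), \\
p_U &= \frac{1}{2}\,\mathrm{erfc}\!\left(
   \frac{N-2k+P-1}{\sqrt{2\,(N-1)(P-1)}}\right)
   \;\longrightarrow\;
   \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{1+\alpha}{\sqrt{2\alpha}}\right), \\
p_K &= \frac{1}{2}\,\mathrm{erfc}\!\left(
   \frac{N-2k-P+1}{\sqrt{2\,(N-1)(P-1)}}\right)
   \;\longrightarrow\;
   \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{1-\alpha}{\sqrt{2\alpha}}\right).
\end{align}
```

In the first line the upper sign applies to unperturbed sites and the lower sign to perturbed ones: the diagonal term $(P-1)/N$ follows the current value of the variable, which agrees with the target for an unperturbed site and opposes it for a flipped one. The argument $(1+\alpha)/\sqrt{2\alpha}$ is minimal at $\alpha = 1$, while $(1-\alpha)/\sqrt{2\alpha}$ decreases monotonically from $+\infty$ to $-\infty$.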
While the error probability for unperturbed spins tends to zero as $\alpha \to 0$ and $\alpha \to \infty$, with a maximum at $\alpha = 1$, the error probability for perturbed spins is an increasing function of $\alpha$, going from $0$ to $1$. This difference in stability between perturbed and unperturbed spins affects the overall stability of the original pattern. Since there are $k$ perturbed variables and $N-k$ unperturbed variables, the probability of failing to recover the original pattern after a single step of parallel dynamics is
[TABLE]
that, for , becomes the probability [9]. This probability can be plotted for different values at a given . In Fig. 1(a) we plot for and (blue) and (red), where the second case corresponds to the unperturbed case studied in [9]. We also performed numerical simulations (dots) in systems of variables for a different number of , counting the number of times that taking one of the training vectors and perturbing of its values we did not recover the original training vector, repeating this procedure times. While at large values the probability (red line) decreases to zero, consistently with the observation made in [9], this does not happen for the perturbed case (blue line). In other words, perturbing just one variable in a system of neurons is sufficient to not recover the correct patterns in the regime . Moreover, we notice that for small values both lines are close to zero, meaning that in this regime perturbing one neuron does not make a big difference. This is in agreement with Eq. (11): for , follows the behaviour of , where the stability of unperturbed spins at is affected by perturbed spins that are dominated by the diagonal interactions at large values, which increase the error probability. In Fig. 1(b) we plot the same quantity in the case (red line) and (blue line) for a system of , finding the same qualitative behaviour.
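The comparison between the analytical curve and the simulation described above can be reproduced in outline as follows. This is a hedged sketch rather than the authors' procedure: it assumes Hebbian couplings with the diagonal kept, treats the sites as independent when combining the single-site error probabilities, uses Gaussian estimates valid for p >= 2, and the function names and sizes are chosen here for illustration only.

```python
import math
import random

def site_error_probs(n, p, k):
    """Gaussian estimates of the one-step error probability on an
    unperturbed and on a perturbed site (requires p >= 2)."""
    scale = math.sqrt(2.0 * (n - 1) * (p - 1))
    p_unperturbed = 0.5 * math.erfc((n - 2 * k + p - 1) / scale)
    p_perturbed = 0.5 * math.erfc((n - 2 * k - p + 1) / scale)
    return p_unperturbed, p_perturbed

def analytic_failure(n, p, k):
    """Probability of not recovering the pattern after one parallel step,
    assuming independent errors on the n - k unperturbed and k perturbed sites."""
    p_u, p_k = site_error_probs(n, p, k)
    return 1.0 - (1.0 - p_u) ** (n - k) * (1.0 - p_k) ** k

def simulate_failure(n, p, k, trials, seed=0):
    """Monte Carlo estimate: flip k sites of a stored pattern, apply one
    parallel step, and count how often the pattern is not recovered."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
        target = patterns[0]
        flipped = set(rng.sample(range(n), k))
        s = [-x if i in flipped else x for i, x in enumerate(target)]
        h = [0] * n  # local fields, up to the positive factor 1/N
        for xi in patterns:
            overlap = sum(a * b for a, b in zip(xi, s))
            for i in range(n):
                h[i] += xi[i] * overlap
        # A zero or misaligned field counts as an error (assumption of this sketch).
        if any(hi * ti <= 0 for hi, ti in zip(h, target)):
            failures += 1
    return failures / trials
```

For instance, `simulate_failure(30, 600, 1, trials=20)` and `simulate_failure(30, 600, 0, trials=20)` contrast a single flipped site with the unperturbed case at load 20, and `analytic_failure` gives the corresponding Gaussian estimate.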
4 Large Deviations
In this section we compute the typical number of errors observed in with respect to the original patterns, produced by applying Eq. (3) to the vector of dynamical variables , which is specified by Eq. (4) for a given number of perturbed spins. Let us denote by the number of errors in the set of perturbed spins and by the number of errors in the set of unperturbed spins in . While the probability of is given by
[TABLE]
the probability of is given by
[TABLE]
Let us denote the total number of error by . The probability of , the number of errors at the next time step, is given by
[TABLE]
and we readily obtain
[TABLE]
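The distribution of the total number of errors can also be evaluated numerically. The sketch below assumes, as an illustration, that errors on different sites are independent, so that the counts in the perturbed and unperturbed sets are binomial and their sum is obtained by convolution. The function names and the arguments `p_u` and `p_k` (single-site error probabilities on unperturbed and perturbed sites) are labels introduced here.

```python
import math

def binom_pmf(n, q):
    """Binomial probabilities P(m) for m = 0..n with success probability q."""
    return [math.comb(n, m) * q ** m * (1.0 - q) ** (n - m) for m in range(n + 1)]

def total_error_pmf(n, k, p_u, p_k):
    """Distribution of the total number of errors after one step:
    errors among the k perturbed sites plus errors among the n - k
    unperturbed sites, combined by discrete convolution."""
    pmf_perturbed = binom_pmf(k, p_k)
    pmf_unperturbed = binom_pmf(n - k, p_u)
    total = [0.0] * (n + 1)
    for a, pa in enumerate(pmf_perturbed):
        for b, pb in enumerate(pmf_unperturbed):
            total[a + b] += pa * pb
    return total
```

The mean of the resulting distribution is `k * p_k + (n - k) * p_u`, and for `k = 0` it reduces to a single binomial over all sites, which gives two simple consistency checks.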
Since we are mostly interested in the large $N$ behaviour, we denote by , , and , and use Stirling’s formula for approximating the factorial of a large integer,
[TABLE]
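The role of Stirling's formula in this step can be illustrated numerically. The sketch below uses the standard form ln n! ~ n ln n - n + (1/2) ln(2 pi n), which may differ in its sub-leading terms from the form used in the original equation, and shows how it turns a binomial probability into an exponential rate. The function names are chosen here for the example.

```python
import math

def stirling_log_factorial(n):
    """Stirling approximation: ln n! ~ n ln n - n + 0.5 ln(2 pi n)."""
    return n * math.log(n) - n + 0.5 * math.log(2.0 * math.pi * n)

def log_binom_pmf(n, m, q):
    """Exact log-probability of m successes out of n, via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
            + m * math.log(q) + (n - m) * math.log(1.0 - q))

def binomial_rate(x, q):
    """Leading-order rate obtained from Stirling's formula, for 0 < x < 1:
    -(1/n) ln P(m = x n) -> x ln(x/q) + (1 - x) ln((1 - x)/(1 - q))."""
    return x * math.log(x / q) + (1.0 - x) * math.log((1.0 - x) / (1.0 - q))
```

Comparing `-log_binom_pmf(n, m, q) / n` with `binomial_rate(m / n, q)` for growing `n` shows the difference shrinking like `ln(n) / n`, which is the sense in which sub-leading terms are dropped in a large deviation function.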
Simple algebra leads to the expression
[TABLE]
where is the maximum of over at a given . The large deviation function of the probability is given by and its expression is
[TABLE]
Let us first consider the case , the case where the selected pattern is not perturbed at all. In Fig. 2(a) we plot
[TABLE]
for , as a function of for different choices of . The maximum of corresponds to a minimum of , where the double logarithm is chosen to emphasise the difference between different lines at large values. While at small values we find , i.e. the most likely value for is zero, corresponding to a non-increasing number of errors, as increases the probability of observing decreases and for (which corresponds roughly to ) we find a different minimum at , as can be seen in Fig. 2(b) and Fig. 3(a). We also observe that the probability of is sharply peaked in [math] in the low regime, while it is much broader in the large regime, even if it is clearly visible that the probability of observing is negligible.
The behaviour of at can be seen in Fig. 2(d) leading to a qualitatively similar behaviour of shown in Fig. 2(c), where we observe that small values of are dominated by , while as increases remains greater than zero. In other words, while the low $\alpha$ regime leads to a recovery of the original pattern with probability even in the case when we perturb variables, the large $\alpha$ regime does not. Notice in Fig. 2(d) that the value of is mainly given by with fluctuations of order : errors are mainly induced by the set of perturbed spins that remain blocked because of the dominating diagonal terms.
Finally, to emphasise the sensitivity of fixed points to perturbations we plotted the values for a single pattern error shown in Fig. 3(b). We observe the same qualitative behaviour of Fig. 2(b).
Conclusion
The discovery of fixed points in the Hopfield model in the limit of a large number of patterns raised new hopes for high-capacity properties of the Hopfield model, especially within the context of associative memories in neural networks. We examine the usefulness of the newly discovered fixed points by focussing on their ability to recover stored patterns on the basis of incomplete information, in other words, the ability to converge to the original pattern when starting from a configuration that is similar, but not exactly equal, to it. We study the stability properties of these fixed points and show that they are unstable with respect to small perturbations. We also investigate the typical number of errors made by the one-time-step dynamics given in Eq. (1) and find that while this number is zero in the low storage regime, it is not in the new large storage regime.
Finally, we notice that a simple statistical mechanical argument suggests that it is unlikely that the phase diagram of a Hopfield model with auto-interactions differs from the phase diagram of a Hopfield model without auto-interactions. In fact, the partition functions of the two models differ by sub-leading terms in $N$ and so their thermodynamical properties have to be the same. Thus the presence of a new, unexplored, thermodynamical phase comprising multiple stable fixed points with non-vanishing basins of attraction at $\alpha \gg 1$ has to be ruled out.
Acknowledgement
Support from The Leverhulme Trust grant RPG-2013-48 is acknowledged. We wish to thank Pierfrancesco Urbani for insightful discussions.
References
- [1] J J Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8):2554–2558, 1982.
- [2] J J Hopfield. Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences, 81(10):3088–3092, 1984.
- [3] J J Hopfield, D I Feinstein, and R G Palmer. ’Unlearning’ has a stabilizing effect in collective memories. Nature, 304:158–159, 1983.
- [4] D O Hebb. The organization of behavior: A neuropsychological theory. Psychology Press, 2005.
- [5] D J Amit, H Gutfreund, and H Sompolinsky. Spin-glass models of neural networks. Physical Review A, 32(2):1007, 1985.
- [6] D J Amit, H Gutfreund, and H Sompolinsky. Storing infinite numbers of patterns in a spin-glass model of neural networks. Physical Review Letters, 55(14):1530, 1985.
- [7] Y Kabashima and D Saad. The TAP approach to intensive and extensive connectivity systems. Advanced Mean Field Methods–Theory and Practice, 6:65–84, 2001.
- [8] M Mézard. Mean-field message-passing equations in the Hopfield model and its generalizations. Physical Review E, 95(2):022117, 2017.
- [9] V Folli, M Leonetti, and G Ruocco. On the maximum storage capacity of the Hopfield model. Frontiers in Computational Neuroscience, 10:144, 2017.
