Hopfield Networks is All You Need
Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Thomas Adler, Lukas Gruber, Markus Holzleitner, Milena Pavlović, Geir Kjetil Sandve, Victor Greiff, David Kreil, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter

TL;DR
This paper presents a modern Hopfield network with continuous states that can store exponentially many patterns, has an update rule equivalent to transformer attention, and improves performance across a range of machine learning tasks.
Contribution
It introduces a continuous-state Hopfield network whose update rule is equivalent to transformer attention, and which can store exponentially many patterns (in the dimension of the associative space) and retrieve a stored pattern with a single update.
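A minimal NumPy sketch of the update rule follows. It assumes the form ξ_new = X softmax(β Xᵀ ξ), with the stored patterns as the columns of X and β an inverse temperature. The attention comparison is deliberately simplified: it omits the learned projections (W_Q, W_K, W_V) and the transformer scaling β = 1/√d_k, and the random bipolar patterns and the helper names `hopfield_update` and `attention` are hypothetical, chosen only for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max along the axis for numerical stability.
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def hopfield_update(X, xi, beta):
    """One update step: xi_new = X @ softmax(beta * X.T @ xi).

    X    : (d, N) array, one stored pattern per column
    xi   : (d,) state / query vector
    beta : inverse temperature
    """
    return X @ softmax(beta * X.T @ xi)

def attention(Q, K, V, beta):
    # Rows of Q are queries; rows of K and V are keys and values.
    return softmax(beta * Q @ K.T, axis=-1) @ V

rng = np.random.default_rng(0)
d, N = 64, 1000
X = rng.choice([-1.0, 1.0], size=(d, N))       # hypothetical bipolar patterns
target = X[:, 3]
noisy = target + 0.5 * rng.standard_normal(d)  # corrupted version of pattern 3

# A single update step maps the noisy query back onto the stored pattern.
retrieved = hopfield_update(X, noisy, beta=1.0)
print(np.array_equal(np.sign(retrieved), target))

# Stacking queries as rows gives softmax(beta Q K^T) V with K = V = X^T.
Q = np.stack([noisy, X[:, 7]])
batched = attention(Q, X.T, X.T, beta=1.0)
per_query = np.stack([hopfield_update(X, q, beta=1.0) for q in Q])
print(np.allclose(batched, per_query))
```

Stacking several state vectors as the rows of Q turns the per-query update into softmax(β Q Kᵀ) V with K = V = Xᵀ, which has the same form as a transformer attention head once the projections and β = 1/√d_k are added back.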
Findings
Achieved state-of-the-art results on multiple instance learning benchmarks.
Demonstrated broad applicability of Hopfield layers in deep learning.
Improved performance on immune repertoire and drug design datasets.
Abstract
We introduce a modern Hopfield network with continuous states and a corresponding update rule. The new Hopfield network can store exponentially (with the dimension of the associative space) many patterns, retrieves the pattern with one update, and has exponentially small retrieval errors. It has three types of energy minima (fixed points of the update): (1) global fixed point averaging over all patterns, (2) metastable states averaging over a subset of patterns, and (3) fixed points which store a single pattern. The new update rule is equivalent to the attention mechanism used in transformers. This equivalence enables a characterization of the heads of transformer models. These heads perform in the first layers preferably global averaging and in higher layers partial averaging via metastable states. The new modern Hopfield network can be integrated into deep learning architectures as…
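How β selects between kinds of fixed points can be illustrated with a small sketch. It reuses the same assumed update rule, ξ_new = X softmax(β Xᵀ ξ), on hypothetical Gaussian patterns and shows only the two extremes: a very small β gives an output close to the mean of all stored patterns (global averaging), while a large β returns essentially a single stored pattern. Metastable states, which average over a subset of similar patterns, are not reproduced here, and the values of β are arbitrary choices for this toy example.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hopfield_update(X, xi, beta):
    # One update: xi_new = X softmax(beta X^T xi); columns of X are stored patterns.
    return X @ softmax(beta * X.T @ xi)

rng = np.random.default_rng(1)
d, N = 32, 8
X = rng.standard_normal((d, N))                 # hypothetical stored patterns
query = X[:, 0] + 0.1 * rng.standard_normal(d)  # query close to pattern 0
mean_pattern = X.mean(axis=1)

# Very small beta: softmax weights are nearly uniform, so the output is near the mean.
out_small = hopfield_update(X, query, beta=1e-4)
# Large beta: softmax is nearly one-hot on the most similar pattern.
out_large = hopfield_update(X, query, beta=8.0)

print("small beta, distance to mean:     ", np.linalg.norm(out_small - mean_pattern))
print("large beta, distance to pattern 0:", np.linalg.norm(out_large - X[:, 0]))
```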
Taxonomy
Topics: Cognitive Science and Education Research · Semantic Web and Ontologies
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Hopfield Layer · WordPiece · Linear Warmup With Linear Decay · Layer Normalization · Attention Is All You Need · Adam
