On the Role of Hidden States of Modern Hopfield Network in Transformer
Tsubasa Masumura, Masato Taki

TL;DR
This paper explores the connection between modern Hopfield networks and Transformer self-attention, introducing a new attention mechanism called modern Hopfield attention (MHA) that enhances deep Transformer performance by addressing rank collapse and token uniformity.
Contribution
The paper generalizes the correspondence between Hopfield networks and Transformers by incorporating the MHN hidden state into self-attention. The resulting mechanism, MHA, improves attention quality and model accuracy without adding parameters.
Findings
MHA improves attention weight quality and diversity.
MHA addresses rank collapse and token uniformity issues.
MHA enhances Transformer accuracy without additional trainable parameters.
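To make "token uniformity" and "rank collapse" concrete, here is a minimal NumPy sketch of two diagnostics that are commonly used for these effects. It is an illustration under stated assumptions, not necessarily the metrics used in the paper: the function names `mean_pairwise_cosine` and `relative_rank1_residual` are hypothetical, and the inputs are toy token matrices rather than real Transformer activations.

```python
import numpy as np

def mean_pairwise_cosine(X):
    """Average cosine similarity over all distinct token pairs.

    X has shape (n_tokens, d). Values near 1 mean the tokens are
    nearly identical in direction (token uniformity).
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                                   # (n, n) cosine matrix
    off_diagonal = S[~np.eye(X.shape[0], dtype=bool)]
    return float(off_diagonal.mean())

def relative_rank1_residual(X):
    """Relative distance of X from the matrix whose rows all equal the mean token.

    Values near 0 mean every row is the same vector, i.e. the
    representation has collapsed to rank 1.
    """
    residual = X - X.mean(axis=0, keepdims=True)
    return float(np.linalg.norm(residual) / np.linalg.norm(X))

# Toy inputs (assumptions for illustration only).
identical = np.tile(np.array([[1.0, 2.0, 3.0]]), (4, 1))  # fully collapsed tokens
orthogonal = np.eye(4)                                    # maximally distinct tokens

cos_identical = mean_pairwise_cosine(identical)
cos_orthogonal = mean_pairwise_cosine(orthogonal)
res_identical = relative_rank1_residual(identical)
res_orthogonal = relative_rank1_residual(orthogonal)
```

In this sketch, collapsed tokens give a cosine score near 1 and a residual near 0, while distinct tokens give a low cosine score and a large residual. Tracking such quantities layer by layer is one way to check whether a deep model suffers from the issues listed above.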
Abstract
Associative memory models based on Hopfield networks and self-attention based on key-value mechanisms have been popular approaches in the study of memory mechanisms in deep learning. It has been pointed out that the state update rule of the modern Hopfield network (MHN) in the adiabatic approximation is in agreement with the self-attention layer of Transformer. In this paper, we go beyond this approximation and investigate the relationship between MHN and self-attention. Our results show that the correspondence between Hopfield networks and Transformers can be established in a more generalized form by adding a new variable, the hidden state derived from the MHN, to self-attention. This new attention mechanism, modern Hopfield attention (MHA), allows the inheritance of attention scores from the input layer of the Transformer to the output layer, which greatly improves the nature of…
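The known correspondence that the abstract starts from can be sketched numerically. The snippet below assumes the standard modern Hopfield retrieval rule, in which a query state is replaced by a softmax-weighted sum of stored patterns, and compares it with scaled dot-product attention when keys and values are both set to the stored patterns and the inverse temperature is 1/sqrt(d). It uses identity projections and toy random data, and it does not implement the hidden-state mechanism (MHA) proposed in the paper; the function names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def hopfield_update(queries, patterns, beta):
    """One modern Hopfield retrieval step for a batch of query states.

    queries:  (n_q, d) current states
    patterns: (n_p, d) stored patterns
    beta:     inverse temperature
    """
    weights = softmax(beta * (queries @ patterns.T), axis=-1)
    return weights @ patterns

def attention(Q, K, V):
    """Scaled dot-product attention without learned projections."""
    d = Q.shape[-1]
    weights = softmax((Q @ K.T) / np.sqrt(d), axis=-1)
    return weights @ V

rng = np.random.default_rng(0)
d = 8
patterns = rng.normal(size=(5, d))   # toy stored patterns
queries = rng.normal(size=(3, d))    # toy query states

out_hopfield = hopfield_update(queries, patterns, beta=1.0 / np.sqrt(d))
out_attention = attention(queries, patterns, patterns)
matches = bool(np.allclose(out_hopfield, out_attention))
```

With these choices the two outputs coincide, which is the sense in which a single Hopfield update step agrees with a self-attention layer. In a Transformer the queries, keys, and values are additionally produced by learned linear projections, which this sketch omits.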
Taxonomy
Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Face Recognition and Perception
