Stochastic Attention via Langevin Dynamics on the Modern Hopfield Energy
Abdulrahman Alswaidan, Jeffrey D. Varner

TL;DR
This paper introduces stochastic attention via Langevin dynamics on the modern Hopfield energy, enabling training-free, temperature-controlled retrieval and generation with diverse, high-fidelity samples across five domains.
Contribution
It presents a novel, training-free stochastic attention method based on Langevin sampling, eliminating the need for learned models or score networks.
Findings
Exact retrieval was achieved at low temperature.
Generated samples were more diverse and novel than those from learned baselines.
Protein sequence generation preserved family-level fidelity.
Abstract
Attention heads retrieve: given a query, they return a weighted average of stored values. We showed that this computation is one step of gradient descent on the modern Hopfield energy, and that Langevin sampling from the corresponding Boltzmann distribution yielded stochastic attention, a training-free sampler controlled by a single temperature parameter. Lowering the temperature gave exact retrieval; raising it gave open-ended generation. Because the energy gradient equals the attention map, no score network, training loop, or learned model was required, making the approach particularly suited to the low-data regime where learned generative models are starved of training signal. We derived an entropy inflection condition that identified the retrieval-to-generation transition temperature for any memory geometry and validated the sampler on five domains spanning two orders of magnitude…
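The abstract's central identity (one attention update equals one unit-step gradient descent move on the Hopfield energy) and the resulting sampler can be illustrated with a minimal NumPy sketch. It assumes the standard modern Hopfield energy E(x) = 0.5 * ||x||^2 - (1/beta) * logsumexp(beta * X^T x), which is not written out in the text above. The function names, the step size, and the way `temperature` scales the noise term are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def attention_step(x, X, beta):
    """Softmax attention readout: weighted average of stored patterns.

    X has shape (d, N), one stored pattern per column; x has shape (d,).
    """
    return X @ softmax(beta * (X.T @ x))


def energy_grad(x, X, beta):
    """Gradient of the assumed energy
    E(x) = 0.5 * ||x||^2 - (1 / beta) * logsumexp(beta * X^T x),
    which works out to x - X softmax(beta * X^T x).
    """
    return x - attention_step(x, X, beta)


def stochastic_attention(x0, X, beta, temperature,
                         step=0.05, n_steps=500, rng=None):
    """Unadjusted Langevin dynamics on the assumed energy.

    Each iteration takes a gradient step plus Gaussian noise with
    variance 2 * step * temperature (an assumed parameterization).
    With temperature = 0 this reduces to plain gradient descent.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - step * energy_grad(x, X, beta) \
            + np.sqrt(2.0 * step * temperature) * noise
    return x


if __name__ == "__main__":
    X = np.eye(4)                          # four toy stored patterns
    query = np.array([0.9, 0.1, 0.0, 0.0])
    rng = np.random.default_rng(0)
    print("cold:", stochastic_attention(query, X, 20.0, 0.0, rng=rng))
    print("hot: ", stochastic_attention(query, X, 20.0, 1.0, rng=rng))
```

In this sketch a gradient step of size 1 gives `x - energy_grad(x, X, beta) == attention_step(x, X, beta)`, which is the identity the abstract describes. Setting `temperature` to zero drives the state toward the nearest stored pattern, while larger values let the state wander away from the stored patterns.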
