In-context denoising with one-layer transformers: connections between attention and associative memory retrieval
Matthew Smart, Alberto Bietti, Anirvan M. Sengupta

TL;DR
This paper establishes a theoretical and empirical connection between attention-based transformers and dense associative memory (DAM) networks, showing that a single-layer transformer can solve certain restricted denoising problems optimally by performing a single gradient descent update on a context-aware DAM energy landscape.
Contribution
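The paper introduces in-context denoising as a task and shows that a one-layer transformer can solve certain restricted denoising problems optimally. The trained attention layer acts as one gradient descent step on a dense associative memory energy, with context tokens as the memories and the query token as the initial state, rather than as exact memory retrieval.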
Findings
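A single-layer transformer can solve certain restricted denoising problems optimally within a Bayesian framework.
A trained attention layer processes each denoising prompt by performing a single gradient descent update on a context-aware associative memory energy landscape.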
Associative memory models are relevant for understanding in-context learning.
Abstract
We introduce in-context denoising, a task that refines the connection between attention-based architectures and dense associative memory (DAM) networks, also known as modern Hopfield networks. Using a Bayesian framework, we show theoretically and empirically that certain restricted denoising problems can be solved optimally even by a single-layer transformer. We demonstrate that a trained attention layer processes each denoising prompt by performing a single gradient descent update on a context-aware DAM energy landscape, where context tokens serve as associative memories and the query token acts as an initial state. This one-step update yields better solutions than exact retrieval of either a context token or a spurious local minimum, providing a concrete example of DAM networks extending beyond the standard retrieval paradigm. Overall, this work solidifies the link between associative…
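The one-step picture described in the abstract can be illustrated with a small numerical sketch. The abstract does not give the energy function, so the code below assumes the standard log-sum-exp form used for modern Hopfield networks, E(s) = 0.5*||s||^2 - (1/beta) * log sum_t exp(beta * <x_t, s>), with a step size of 1. All function names, the inverse temperature `beta`, and the toy data are hypothetical choices for illustration, not the authors' setup. Under these assumptions, one gradient descent step from the query lands exactly on the softmax attention readout over the context tokens.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(v - top) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_readout(query, memories, beta):
    """Softmax-weighted average of the context tokens (one attention head)."""
    weights = softmax([beta * dot(m, query) for m in memories])
    return [sum(w * m[i] for w, m in zip(weights, memories))
            for i in range(len(query))]


def dam_energy(state, memories, beta):
    """ASSUMED energy: 0.5*||s||^2 - (1/beta) * logsumexp_t(beta * <x_t, s>)."""
    scores = [beta * dot(m, state) for m in memories]
    top = max(scores)
    lse = top + math.log(sum(math.exp(v - top) for v in scores))
    return 0.5 * dot(state, state) - lse / beta


def dam_gradient(state, memories, beta):
    """Gradient of dam_energy with respect to the state: s - sum_t softmax_t * x_t."""
    readout = attention_readout(state, memories, beta)
    return [s - r for s, r in zip(state, readout)]


def gradient_step(state, memories, beta, step_size=1.0):
    """One gradient descent update on the assumed energy."""
    grad = dam_gradient(state, memories, beta)
    return [s - step_size * g for s, g in zip(state, grad)]


# Toy prompt: context tokens act as memories, a noisy copy of one is the query.
random.seed(0)
dim, n_context, beta = 4, 6, 2.0
memories = [[random.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(n_context)]
query = [m + 0.3 * random.gauss(0.0, 1.0) for m in memories[0]]

one_step = gradient_step(query, memories, beta, step_size=1.0)
attended = attention_readout(query, memories, beta)

# Largest coordinate-wise gap between the two results (zero up to rounding).
max_gap = max(abs(a - b) for a, b in zip(one_step, attended))
print("max |one_step - attended| =", max_gap)
```

With step size 1 the quadratic term's gradient cancels the current state, leaving only the softmax-weighted average of the context tokens. The output is therefore a blend of memories rather than any single stored token, which matches the abstract's point that the update differs from exact retrieval.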
Taxonomy
Topics: Neural Networks and Applications
Methods: Softmax · Attention Is All You Need
