In-context denoising with one-layer transformers: connections between attention and associative memory retrieval

Matthew Smart; Alberto Bietti; Anirvan M. Sengupta

arXiv:2502.05164 · cs.LG · June 9, 2025

Open Access

TL;DR

This paper establishes a theoretical and empirical connection between attention-based transformers and dense associative memory (DAM) networks, showing that a single-layer transformer can optimally solve certain restricted denoising problems by performing one gradient descent update on a context-aware DAM energy landscape.

Contribution

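It introduces in-context denoising and demonstrates that a one-layer transformer can solve certain restricted denoising problems optimally: the trained attention layer performs a single gradient descent update on an associative memory energy landscape rather than retrieving a stored memory exactly.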

Findings

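01 A single-layer transformer can solve certain restricted denoising problems optimally.

02 A trained attention layer performs a single gradient descent update on a context-aware DAM energy landscape, with context tokens serving as memories and the query token as the initial state.

03 Associative memory models are relevant for understanding in-context learning.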

Abstract

We introduce in-context denoising, a task that refines the connection between attention-based architectures and dense associative memory (DAM) networks, also known as modern Hopfield networks. Using a Bayesian framework, we show theoretically and empirically that certain restricted denoising problems can be solved optimally even by a single-layer transformer. We demonstrate that a trained attention layer processes each denoising prompt by performing a single gradient descent update on a context-aware DAM energy landscape, where context tokens serve as associative memories and the query token acts as an initial state. This one-step update yields better solutions than exact retrieval of either a context token or a spurious local minimum, providing a concrete example of DAM networks extending beyond the standard retrieval paradigm. Overall, this work solidifies the link between associative…
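The correspondence between one gradient step and an attention readout can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's exact construction: it assumes the standard log-sum-exp DAM energy E(s) = -(1/beta) log sum_i exp(beta x_i . s) + (1/2)|s|^2, a step size of 1, and attention with keys and values both equal to the context tokens. The names `dam_energy`, `dam_energy_grad`, `attention_readout`, and the values of `beta`, the token count, and the noise level are hypothetical choices made for the example.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def dam_energy(s, X, beta):
    """Assumed DAM energy: -(1/beta) * logsumexp(beta * X s) + 0.5 * |s|^2."""
    z = beta * (X @ s)
    m = np.max(z)
    lse = m + np.log(np.sum(np.exp(z - m)))
    return -lse / beta + 0.5 * (s @ s)


def dam_energy_grad(s, X, beta):
    """Analytic gradient of dam_energy with respect to the state s."""
    return s - X.T @ softmax(beta * (X @ s))


def attention_readout(query, X, beta):
    """Softmax attention with keys = values = context tokens X, scaled by beta."""
    return X.T @ softmax(beta * (X @ query))


# Hypothetical toy prompt: rows of X are context tokens (the memories),
# and the query is a noisy copy of one of them (the initial state).
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 4))
query = X[0] + 0.3 * rng.normal(size=4)
beta = 2.0

# One gradient descent step on the assumed energy, with step size 1.
one_step = query - 1.0 * dam_energy_grad(query, X, beta)
attn_out = attention_readout(query, X, beta)

# Under these assumptions the two quantities coincide up to rounding.
print(np.allclose(one_step, attn_out))
```

For this assumed energy, a step size gamma other than 1 gives (1 - gamma) * query + gamma * attn_out, an interpolation between the noisy query and the attention readout.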

Taxonomy

Topics: Neural Networks and Applications

Methods: Softmax · Attention Is All You Need