Test-time regression: a unifying framework for designing sequence models with associative memory
Ke Alexander Wang, Jiaxin Shi, Emily B. Fox

TL;DR
This paper introduces a unifying framework for sequence models based on associative recall, formalized as test-time regression, which encompasses many existing architectures and guides the design of new models.
Contribution
It formalizes associative recall as regression, unifies various sequence models under this framework, and derives novel higher-order attention mechanisms.
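A sketch of the regression view, in our own notation: the symbols $k_i$, $v_i$, $q_t$ (keys, values, query), $\gamma_i$ (regression weights) and $\mathcal{M}$ (regressor function class) are assumptions chosen for illustration, not necessarily the paper's. Memorization fits a regressor to the key-value pairs seen so far, and retrieval evaluates it at the query. The last two lines show two choices of function class and weights, assuming the key covariance is invertible.

```latex
\begin{align}
\hat m_t &= \arg\min_{m \in \mathcal{M}} \sum_{i=1}^{t} \gamma_i \,\lVert v_i - m(k_i) \rVert_2^2
  && \text{(memorize)} \\
y_t &= \hat m_t(q_t)
  && \text{(retrieve)} \\
\hat M_t &= \Bigl(\textstyle\sum_{i=1}^{t} v_i k_i^\top\Bigr)\Bigl(\textstyle\sum_{i=1}^{t} k_i k_i^\top\Bigr)^{-1}
  && m(k) = Mk,\ \gamma_i = 1 \\
y_t &= \frac{\sum_{i=1}^{t} \exp(q_t^\top k_i)\, v_i}{\sum_{j=1}^{t} \exp(q_t^\top k_j)}
  && m(k) = c,\ \gamma_i = \exp(q_t^\top k_i)
\end{align}
```

Replacing the inverse in the third line by the identity leaves $\sum_i v_i k_i^\top$, the linear-attention state; the fourth line is the weighted mean that minimizes the weighted squared error over constants, i.e. softmax attention with the usual $1/\sqrt{d}$ scaling omitted.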
Findings
Clarifies limitations of linear attention in capturing token correlations.
Provides mathematical justification for query-key normalization in softmax attention.
Derives new higher-order generalizations of softmax attention.
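To make the first finding concrete, here is a minimal sketch (our illustration, not the authors' code) that compares linear attention's memory, $\sum_i v_i k_i^\top$, with the exact least-squares memory, which also multiplies by the inverse key covariance. It assumes unweighted regression, 2-D keys, no feature map and no normalization; all function names are hypothetical. With two correlated keys, linear attention's readout mixes in the wrong value, while least squares recovers each stored value.

```python
# Hypothetical sketch: linear attention vs. least-squares memory, 2-D keys only.
Matrix = list[list[float]]
Vector = list[float]


def outer_sum(left: list[Vector], right: list[Vector]) -> Matrix:
    """Return sum_i left_i right_i^T."""
    rows, cols = len(left[0]), len(right[0])
    out = [[0.0] * cols for _ in range(rows)]
    for u, w in zip(left, right):
        for a in range(rows):
            for b in range(cols):
                out[a][b] += u[a] * w[b]
    return out


def inverse_2x2(m: Matrix) -> Matrix:
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def apply(m: Matrix, q: Vector) -> Vector:
    return [sum(mij * qj for mij, qj in zip(row, q)) for row in m]


def linear_attention_memory(keys: list[Vector], values: list[Vector]) -> Matrix:
    # M = sum_i v_i k_i^T: the key covariance is never inverted.
    return outer_sum(values, keys)


def least_squares_memory(keys: list[Vector], values: list[Vector]) -> Matrix:
    # M = (sum_i v_i k_i^T)(sum_i k_i k_i^T)^{-1}: minimizes sum_i ||v_i - M k_i||^2.
    return matmul(outer_sum(values, keys), inverse_2x2(outer_sum(keys, keys)))


# Two unit-norm but correlated keys (cosine similarity 0.8).
keys = [[1.0, 0.0], [0.8, 0.6]]
values = [[1.0, 0.0], [0.0, 1.0]]

print(apply(linear_attention_memory(keys, values), keys[0]))  # v1 plus 0.8 * v2
print(apply(least_squares_memory(keys, values), keys[0]))     # approximately v1
```

If the keys were orthonormal, $\sum_i k_i k_i^\top$ would be the identity and the two memories would coincide; the gap appears only when keys are correlated.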
Abstract
Sequence models lie at the heart of modern deep learning. However, rapid advancements have produced a diversity of seemingly unrelated architectures, such as Transformers and recurrent alternatives. In this paper, we introduce a unifying framework to understand and derive these sequence models, inspired by the empirical importance of associative recall, the capability to retrieve contextually relevant tokens. We formalize associative recall as a two-step process, memorization and retrieval, casting memorization as a regression problem. Layers that combine these two steps perform associative recall via "test-time regression" over their input tokens. Prominent layers, including linear attention, state-space models, fast-weight programmers, online learners, and softmax attention, arise as special cases defined by three design choices: the regression weights, the regressor function class,…
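One way to read softmax attention as a special case is as a Nadaraya-Watson (local constant) estimator whose weights are $\exp(q^\top k)$. Since $\lVert q - k\rVert^2 = \lVert q\rVert^2 + \lVert k\rVert^2 - 2q^\top k$, this weight equals a Gaussian kernel (up to a constant that cancels) only when query and key norms are fixed; otherwise a factor $\exp(\lVert k\rVert^2/2)$ favors long keys regardless of distance. The sketch below illustrates this as a possible motivation for query-key normalization; it is our illustration, not necessarily the authors' argument. Function names are hypothetical, and the $1/\sqrt{d}$ scaling and any learned temperature are omitted.

```python
import math

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(x: Vector) -> Vector:
    norm = math.sqrt(dot(x, x))
    return [xi / norm for xi in x]


def nadaraya_watson(query, keys, values, kernel) -> Vector:
    """Local constant regression: argmin_c sum_i w_i ||v_i - c||^2 = sum_i w_i v_i / sum_i w_i."""
    weights = [kernel(query, k) for k in keys]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def exp_dot_kernel(q: Vector, k: Vector) -> float:
    # Unnormalized softmax-attention weight (no 1/sqrt(d) scaling in this sketch).
    return math.exp(dot(q, k))


def gaussian_kernel(q: Vector, k: Vector) -> float:
    # exp(-||q - k||^2 / 2); for unit vectors this equals exp(q . k - 1).
    return math.exp(-0.5 * sum((x - y) ** 2 for x, y in zip(q, k)))


keys = [[3.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]

# Without normalization the two estimators disagree: exp(q . k) favors the long key.
raw_softmax = nadaraya_watson(query, keys, values, exp_dot_kernel)
raw_gauss = nadaraya_watson(query, keys, values, gaussian_kernel)

# With unit-norm queries and keys they coincide (the constant e^{-1} cancels).
unit_keys = [normalize(k) for k in keys]
unit_query = normalize(query)
qk_softmax = nadaraya_watson(unit_query, unit_keys, values, exp_dot_kernel)
qk_gauss = nadaraya_watson(unit_query, unit_keys, values, gaussian_kernel)

print(raw_softmax, raw_gauss)
print(qk_softmax, qk_gauss)
```

In the unnormalized case the long key [3, 0] receives most of the softmax weight even though the Gaussian kernel places the query closer to [0, 1]; after normalization both estimators return the same weighted mean.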
Taxonomy
Topics: Neural Networks and Applications
Methods: Attention Is All You Need · Softmax
