Titans: Learning to Memorize at Test Time
Ali Behrouz, Peilin Zhong, Vahab Mirrokni

TL;DR
Titans pair attention with a neural long-term memory module that learns to memorize at test time, letting models store and use information from far back in the context. They outperform Transformers and linear recurrent models on several tasks.
Contribution
The paper proposes a neural long-term memory module and Titans, a new family of architectures that incorporate it, to improve long-term memory and scalability in sequence modeling.
Findings
Outperforms Transformers and linear recurrent models on multiple tasks.
Effectively scales to context windows larger than 2 million tokens.
Improves accuracy on needle-in-a-haystack tasks.
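For a sense of scale behind the 2-million-token figure: full softmax attention computes one score per query-key pair, so the number of scores grows as n^2 in the context length n, while a fixed-size memory does not grow with n. The back-of-the-envelope sketch below only counts scores for a single attention map; it ignores heads, layers, and hardware, and the function names are illustrative.

```python
def attention_pairs(n_tokens: int) -> int:
    """Query-key scores in one full (non-causal) attention map: n * n."""
    return n_tokens * n_tokens


def causal_attention_pairs(n_tokens: int) -> int:
    """Scores when each token attends only to itself and earlier tokens."""
    return n_tokens * (n_tokens + 1) // 2


if __name__ == "__main__":
    for n in (4_096, 131_072, 2_000_000):
        print(f"{n:>9,} tokens: {attention_pairs(n):.3e} pairwise scores")
```

Going from 131,072 to 2,000,000 tokens multiplies the pairwise-score count by roughly 233, which is why a memory whose size does not depend on context length becomes attractive at that scale.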
Abstract
Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. We present a new neural long-term memory module that learns to memorize historical context and helps attention to attend to the current context while utilizing long past information. We show that this neural memory has the advantage of fast parallelizable training while maintaining a fast inference. From a memory perspective, we argue that attention due to its limited context but accurate dependency modeling performs as a short-term…
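The abstract describes a memory module that keeps learning historical context while the model runs. As a rough illustration only, the sketch below treats that memory as a linear map M trained online on an associative loss ||M k - v||^2, with a momentum buffer S (accumulated "surprise") and a decay rate alpha (forgetting). The linear form, the constant rates eta, theta, and alpha, and the class name LinearTestTimeMemory are assumptions made for this example; it is not the authors' implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m (d_out x d_in) by vector x (length d_in)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


class LinearTestTimeMemory:
    """Hypothetical linear associative memory M, updated online at test time.

    Per-token loss: l(M) = ||M k - v||^2, whose gradient is 2 (M k - v) k^T.
    eta, theta, and alpha are fixed scalars here (an illustrative assumption).
    """

    def __init__(self, dim: int, eta: float = 0.9, theta: float = 0.1,
                 alpha: float = 0.01) -> None:
        self.dim = dim
        self.eta = eta      # momentum on past surprise
        self.theta = theta  # step size on the current gradient
        self.alpha = alpha  # forgetting rate applied to the memory
        self.M = zeros(dim, dim)
        self.S = zeros(dim, dim)

    def loss(self, k: Vector, v: Vector) -> float:
        err = [p - t for p, t in zip(matvec(self.M, k), v)]
        return sum(e * e for e in err)

    def write(self, k: Vector, v: Vector) -> None:
        """One online step: S <- eta*S - theta*grad, M <- (1 - alpha)*M + S."""
        # Error is computed once from the current M, so the gradient is taken
        # at the previous memory state.
        err = [p - t for p, t in zip(matvec(self.M, k), v)]
        for i in range(self.dim):
            for j in range(self.dim):
                grad = 2.0 * err[i] * k[j]
                self.S[i][j] = self.eta * self.S[i][j] - self.theta * grad
                self.M[i][j] = (1.0 - self.alpha) * self.M[i][j] + self.S[i][j]

    def read(self, q: Vector) -> Vector:
        """Retrieve without updating the memory: y = M q."""
        return matvec(self.M, q)
```

Writing the same key/value pair repeatedly drives the associative loss toward zero (the decay term leaves a small bias), after which read(k) returns approximately v. A deeper memory, such as a small MLP, would replace the closed-form gradient with automatic differentiation; that variant is not shown here.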
Videos
Titans: Learning to Memorize at Test Time (Paper Analysis) · YouTube
Bubble or No Bubble, AI Keeps Progressing (ft. Relentless Learning + Introspection) · YouTube
Taxonomy
Topics: Artificial Intelligence in Games
Methods: Softmax · Attention Is All You Need
