Continuous Latent Contexts Enable Efficient Online Learning in Transformers
Emile Anand, Abdullah Ateyeh, Xinyuan Cao, Max Dabagia

TL;DR
This paper shows that continuous latent context tokens let transformers carry out online learning efficiently by implementing algorithms such as weighted majority and Q-learning, which improves their online decision-making.
Contribution
The authors show how continuous latent contexts allow constant-depth transformers to explicitly implement online learning algorithms. They also train a small GPT-2 model with latent contexts that outperforms larger models on synthetic online tasks.
Findings
Transformers with latent contexts can implement online algorithms like weighted majority and Q-learning.
A small GPT-2 model with latent contexts outperforms larger models on synthetic online prediction sequences.
Continuous latent contexts serve as effective persistent states for online learning in transformers.
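For readers unfamiliar with the two algorithms named in the findings, the following is a minimal sketch of their classical textbook forms, not the paper's transformer construction. It is meant to show how little state each one carries between rounds: a weight per expert, or a table of Q-values. The function names, the binary-label setting, and the parameters `eta`, `alpha`, and `gamma` are illustrative assumptions and do not come from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def weighted_majority(
    expert_preds: Sequence[Sequence[int]],
    labels: Sequence[int],
    eta: float = 0.5,  # assumed penalty factor, not taken from the paper
) -> Tuple[List[int], List[float]]:
    """Deterministic weighted majority over binary (0/1) expert advice."""
    n_experts = len(expert_preds[0])
    weights = [1.0] * n_experts  # the only state carried across rounds
    predictions: List[int] = []
    for preds, y in zip(expert_preds, labels):
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        predictions.append(1 if vote_one >= vote_zero else 0)
        # Shrink the weight of every expert that was wrong this round.
        weights = [w * (1.0 - eta) if p != y else w
                   for w, p in zip(weights, preds)]
    return predictions, weights


def q_learning_step(
    q: Dict[Tuple[int, int], float],
    state: int,
    action: int,
    reward: float,
    next_state: int,
    n_actions: int,
    alpha: float = 0.1,  # assumed learning rate
    gamma: float = 0.9,  # assumed discount factor
) -> float:
    """One tabular Q-learning update; unseen entries default to 0.0."""
    best_next = max(q.get((next_state, a), 0.0) for a in range(n_actions))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


if __name__ == "__main__":
    preds, weights = weighted_majority(
        [[1, 0, 0], [1, 0, 1], [0, 1, 1]], labels=[1, 1, 0]
    )
    print(preds, weights)
```

In both cases the learner's memory is a short numeric vector that is updated once per round. That is the kind of compact, persistent state the findings attribute to continuous latent contexts.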
Abstract
Large language models (LLMs) exhibit a strong capacity for in-context learning: Given labeled examples, they can generate good predictions without parameter updates. However, many interactive settings go beyond static prediction to online decision-making, in which effective behavior demands adaptation over long multi-turn horizons in response to feedback, and efficient algorithms in these domains must use compact representations of what they have learned. Recently, continuous transformer architectures with latent chain of thought have shown promise for offline iterative tasks such as directed graph-reachability. Motivated by this, we study whether continuous latent context tokens equip transformers to more effectively realize online learning. We give explicit constructions of constant-depth transformers that implement two foundational online decision-making procedures -- the weighted…
