Continuous Latent Contexts Enable Efficient Online Learning in Transformers

Emile Anand; Abdullah Ateyeh; Xinyuan Cao; Max Dabagia

arXiv:2605.09867·cs.LG·May 12, 2026

TL;DR

This paper shows that continuous latent context tokens let transformers perform online learning efficiently: constant-depth transformers can implement algorithms such as weighted majority and Q-learning, and a small model trained with latent contexts outperforms larger models on synthetic online tasks.

Contribution

The authors give explicit constant-depth transformer constructions that use continuous latent contexts to implement online learning algorithms, and they train a small GPT-2 model with latent contexts that outperforms larger models on synthetic online tasks.
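For readers unfamiliar with weighted majority, a minimal sketch of the textbook deterministic version for binary predictions may help. This is not the paper's transformer construction; the function name `weighted_majority`, the penalty factor `beta`, and the tie-breaking rule are assumptions made for illustration. The point it shows is that the learner's entire memory is one weight per expert, which is the kind of compact state an online algorithm carries between rounds.

```python
from typing import List, Sequence, Tuple


def weighted_majority(
    expert_preds: Sequence[Sequence[int]],
    labels: Sequence[int],
    beta: float = 0.5,  # assumed penalty factor in (0, 1)
) -> Tuple[List[float], int]:
    """Textbook deterministic weighted majority over binary predictions.

    expert_preds[t][i] is expert i's 0/1 prediction in round t.
    Returns the final weights and the learner's mistake count.
    """
    n_experts = len(expert_preds[0])
    weights = [1.0] * n_experts  # the only state kept across rounds
    mistakes = 0
    for preds, y in zip(expert_preds, labels):
        # Predict with the weighted vote; ties go to 1 (an arbitrary choice).
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0
        if guess != y:
            mistakes += 1
        # After feedback, shrink the weight of every expert that was wrong.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return weights, mistakes
```

With three experts where only the first is always right, the weights of the other two decay geometrically and the weighted vote eventually follows the reliable expert.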

Findings

01

Transformers with latent contexts can implement online algorithms like weighted majority and Q-learning.

02

A small GPT-2 model with latent contexts outperforms larger models on synthetic online prediction sequences.

03

Continuous latent contexts serve as effective persistent states for online learning in transformers.
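To make the idea of a persistent state concrete for Q-learning, here is a minimal sketch of the standard tabular update. It is the textbook rule, not the paper's construction; `q_update`, the dictionary-based table, and the default `alpha` and `gamma` values are assumptions for illustration. The table `q` is the state that must survive from one interaction to the next.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,  # assumed learning rate
    gamma: float = 0.9,  # assumed discount factor
) -> float:
    """Apply one standard tabular Q-learning update in place.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    Returns the updated value of Q(s, a).
    """
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Usage: unseen state-action pairs start at 0.0.
q: QTable = defaultdict(float)
q_update(q, "s0", "a", reward=1.0, next_state="s1", actions=["a", "b"])
```

Each update reads and writes only the table, so an agent that keeps `q` between turns needs no access to the raw interaction history.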

Abstract

Large language models (LLMs) exhibit a strong capacity for in-context learning: Given labeled examples, they can generate good predictions without parameter updates. However, many interactive settings go beyond static prediction to online decision-making, in which effective behavior demands adaptation over long multi-turn horizons in response to feedback, and efficient algorithms in these domains must use compact representations of what they have learned. Recently, continuous transformer architectures with latent chain of thought have shown promise for offline iterative tasks such as directed graph-reachability. Motivated by this, we study whether continuous latent context tokens equip transformers to more effectively realize online learning. We give explicit constructions of constant-depth transformers that implement two foundational online decision-making procedures -- the weighted…
