Transformers Provably Implement In-Context Reinforcement Learning with Policy Improvement
Haodong Liang, Lifeng Lai

TL;DR
This paper shows that transformers can explicitly implement classical reinforcement learning algorithms in context and proves that training them converges under certain conditions, bridging mechanistic understanding and training dynamics.
Contribution
It provides the first provable implementation of policy improvement methods in transformers and analyzes their training dynamics with convergence guarantees.
Findings
Linear self-attention transformers can provably implement policy-improvement algorithms such as semi-gradient SARSA and actor-critic.
Under suitable richness conditions on the training MDP distribution, gradient flow during training converges locally and exponentially to an optimal parameter manifold.
Trained transformers recover explicit parameter structures and perform well on unseen MDPs.
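For readers less familiar with the target algorithm, here is a minimal sketch of semi-gradient SARSA on a tabular MDP. It uses one-hot features for phi(s, a), and with those features the method reduces to ordinary tabular SARSA. An epsilon-greedy policy provides the policy-improvement step. This is a standard textbook version written for illustration, not the authors' code. The helper names (`one_hot`, `sarsa_step`, `epsilon_greedy`, `run_sarsa`) and all hyperparameters are hypothetical choices.

```python
import numpy as np

def one_hot(s, a, n_states, n_actions):
    """Tabular feature vector phi(s, a): a one-hot vector of length |S|*|A|."""
    phi = np.zeros(n_states * n_actions)
    phi[s * n_actions + a] = 1.0
    return phi

def sarsa_step(w, phi, r, phi_next, alpha, gamma):
    """One semi-gradient SARSA update (the bootstrapped target is not differentiated)."""
    td_error = r + gamma * w @ phi_next - w @ phi
    return w + alpha * td_error * phi

def epsilon_greedy(w, s, n_states, n_actions, eps, rng):
    """Policy improvement: act greedily w.r.t. Q(s, a) = w^T phi(s, a), explore with prob. eps."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    q = [w @ one_hot(s, a, n_states, n_actions) for a in range(n_actions)]
    return int(np.argmax(q))

def run_sarsa(P, R, steps=5000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Run SARSA on a tabular MDP with transitions P[s, a, s'] and rewards R[s, a]."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    w = np.zeros(n_states * n_actions)
    s = 0
    a = epsilon_greedy(w, s, n_states, n_actions, eps, rng)
    for _ in range(steps):
        s_next = int(rng.choice(n_states, p=P[s, a]))
        a_next = epsilon_greedy(w, s_next, n_states, n_actions, eps, rng)
        w = sarsa_step(w, one_hot(s, a, n_states, n_actions), R[s, a],
                       one_hot(s_next, a_next, n_states, n_actions), alpha, gamma)
        s, a = s_next, a_next
    return w.reshape(n_states, n_actions)

# Usage on a hypothetical randomly generated MDP (sizes chosen arbitrarily).
rng = np.random.default_rng(1)
n_states, n_actions = 4, 2
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into transition distributions
R = rng.random((n_states, n_actions))
Q = run_sarsa(P, R)
```

The transformer results concern performing updates of this kind inside a forward pass, reading the transitions from the context rather than updating `w` through parameter changes.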
Abstract
We investigate the ability of transformers to perform in-context reinforcement learning (ICRL), where a model must infer and execute learning algorithms from trajectory data without parameter updates. We show that a linear self-attention transformer block can provably implement policy-improvement methods, including semi-gradient SARSA and actor-critic, via explicit parameter constructions. Beyond existence, we design a teacher-mimicking training procedure, analyze its gradient-flow dynamics, and establish the first convergence guarantee in the ICRL literature: under suitable richness conditions on the training MDP distribution, gradient flow converges locally and exponentially to an optimal parameter manifold corresponding to the desired RL update. Empirically, training transformers on randomly generated tabular MDPs confirms these predictions: the learned models recover the parameter…
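The idea of an "explicit parameter construction" can be illustrated with a much simpler case than the paper treats. Suppose the starting weights are w_0 = 0, so the bootstrap terms vanish and each TD error reduces to the reward. One linear self-attention layer can then reproduce one semi-gradient SARSA step on a query feature. The sketch makes several assumptions of its own: the token layout [phi; gamma * phi_next; r], the masked update Z + (1/n) W_pv Z M (Z^T W_kq Z), and the specific `W_kq`/`W_pv` values. This is a hypothetical construction in the style of earlier in-context TD work, not the construction from this paper, which handles general policy-improvement updates.

```python
import numpy as np

def lsa(Z, W_kq, W_pv):
    """Linear self-attention (no softmax); the query column is masked out of keys/values.

    Z has one column per token; the last column is the query token.
    """
    n = Z.shape[1] - 1
    mask = np.eye(n + 1)
    mask[-1, -1] = 0.0  # the query token does not attend to itself
    return Z + (W_pv @ Z @ mask @ (Z.T @ W_kq @ Z)) / n

d, n, alpha, gamma = 3, 8, 0.5, 0.9
rng = np.random.default_rng(0)
phi = rng.standard_normal((d, n))       # context features phi(s_i, a_i)
phi_next = rng.standard_normal((d, n))  # next-step features phi(s'_i, a'_i)
r = rng.standard_normal(n)              # rewards r_i
phi_q = rng.standard_normal(d)          # query feature phi(s, a)

# Assumed token layout: context tokens [phi; gamma * phi_next; r], query token [phi_q; 0; 0].
Z = np.zeros((2 * d + 1, n + 1))
Z[:d, :n] = phi
Z[d:2 * d, :n] = gamma * phi_next
Z[-1, :n] = r
Z[:d, -1] = phi_q

# Hypothetical weights: key/query compares the phi blocks, value copies alpha * r.
W_kq = np.zeros((2 * d + 1, 2 * d + 1))
W_kq[:d, :d] = np.eye(d)
W_pv = np.zeros((2 * d + 1, 2 * d + 1))
W_pv[-1, -1] = alpha

q_lsa = lsa(Z, W_kq, W_pv)[-1, -1]  # read the prediction from the query's last entry

# Reference: one averaged semi-gradient SARSA step from w_0 = 0.
w0 = np.zeros(d)
delta = r + gamma * w0 @ phi_next - w0 @ phi  # equals r because w0 = 0
w1 = w0 + alpha * (phi @ delta) / n
q_td = w1 @ phi_q
```

Both `q_lsa` and `q_td` equal (alpha / n) * sum_i r_i phi_i^T phi_q. From nonzero weights, or over several update steps, the bootstrap term gamma * phi_next matters, and a richer construction is needed.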
