Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning

Jiuqi Wang; Ethan Blaser; Hadi Daneshmand; Shangtong Zhang

arXiv:2405.13861 · cs.LG · February 26, 2025

TL;DR

This paper shows that transformers trained for policy evaluation can learn to implement temporal difference (TD) learning in their forward pass, which enables in-context reinforcement learning without parameter updates.

Contribution

It provides empirical and theoretical evidence that transformers trained on policy evaluation tasks can discover TD learning and implement it in their forward pass.
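For reference, TD(0) with a linear value estimate v(s) ≈ wᵀφ(s) updates the weights from one transition at a time (first two lines below). The remaining lines are an illustrative batch version over a context of n transitions (φ_i, r_i, φ′_i), assuming linear features; they are a standard rewriting, not the paper's exact transformer parameterization. The last line shows why such an update resembles linear attention: the query features φ_q are compared with the keys φ_i through φ_iᵀφ_q, and the TD errors δ_i play the role of values.

```latex
\begin{aligned}
\delta_t &= r_t + \gamma\, w_t^\top \phi(s_{t+1}) - w_t^\top \phi(s_t), \\
w_{t+1} &= w_t + \alpha\, \delta_t\, \phi(s_t), \\[4pt]
\delta_i^{(l)} &= r_i + \gamma\, w_l^\top \phi'_i - w_l^\top \phi_i, \qquad i = 1, \dots, n, \\
w_{l+1} &= w_l + \frac{\alpha}{n} \sum_{i=1}^{n} \delta_i^{(l)}\, \phi_i, \\
w_{l+1}^\top \phi_q &= w_l^\top \phi_q + \frac{\alpha}{n} \sum_{i=1}^{n} \delta_i^{(l)}\, \phi_i^\top \phi_q .
\end{aligned}
```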

Findings

01

Transformers trained for policy evaluation learn TD learning in their forward pass.

02

The learned TD algorithms enable in-context RL without parameter updates.

03

Theoretical analysis supports the empirical observations.

Abstract

Traditionally, reinforcement learning (RL) agents learn to solve new tasks by updating their neural network parameters through interactions with the task environment. However, recent works demonstrate that some RL agents, after certain pretraining procedures, can learn to solve unseen new tasks without parameter updates, a phenomenon known as in-context reinforcement learning (ICRL). The empirical success of ICRL is widely attributed to the hypothesis that the forward pass of the pretrained agent neural network implements an RL algorithm. In this paper, we support this hypothesis by showing, both empirically and theoretically, that when a transformer is trained for policy evaluation tasks, it can discover and learn to implement temporal difference learning in its forward pass.
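The idea of a forward pass that runs TD learning on in-context data can be illustrated with a minimal NumPy sketch, assuming linear (here one-hot) features and a batch TD(0) update. The function name `td0_forward`, the step size, the iteration count, and the three-state toy task are hypothetical choices for illustration; this is not the authors' transformer or their construction. The value weights `w` are created and discarded inside the call, so evaluating a new task only requires a new context, not new parameters.

```python
import numpy as np


def td0_forward(phi, phi_next, rewards, phi_query, gamma=0.9, alpha=1.0, n_layers=500):
    """Hypothetical in-context policy evaluation via batch TD(0).

    phi, phi_next: (n, d) features of s_i and s'_i for the n context transitions.
    rewards: (n,) rewards r_i. phi_query: (d,) or (m, d) query features.
    """
    n, d = phi.shape
    w = np.zeros(d)  # value weights exist only inside this forward pass
    for _ in range(n_layers):  # one iteration ~ one "layer" (illustrative analogy)
        # TD error of every context transition under the current estimate
        delta = rewards + gamma * (phi_next @ w) - phi @ w
        # Batch semi-gradient step: average of delta_i * phi_i over the context
        w = w + (alpha / n) * (phi.T @ delta)
    return phi_query @ w  # value prediction for the query state(s)


# Hypothetical toy task: 3-state cycle 0 -> 1 -> 2 -> 0, reward 1 when leaving state 0.
phi = np.eye(3)                    # one-hot (tabular) features of s_i
phi_next = np.eye(3)[[1, 2, 0]]    # one-hot features of s'_i
rewards = np.array([1.0, 0.0, 0.0])

v_hat = td0_forward(phi, phi_next, rewards, phi_query=np.eye(3))
# For this deterministic chain, phi_next is also the transition matrix P.
v_true = np.linalg.solve(np.eye(3) - 0.9 * phi_next, rewards)
print(np.round(v_hat, 3), np.round(v_true, 3))
```

Each loop iteration reads every context transition, computes its TD error, and moves the estimate, loosely like one layer of a deep network; with tabular features and enough iterations the output approaches the true values (I − γP)⁻¹r.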

Videos

Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning · SlidesLive