Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining

Licong Lin; Yu Bai; Song Mei

arXiv:2310.08566 · cs.LG · May 28, 2024 · 1 cites

TL;DR

This paper offers a theoretical analysis of how pretrained transformer models can perform in-context reinforcement learning, demonstrating their ability to imitate algorithms and approximate near-optimal policies in various RL settings.

Contribution

It provides the first quantitative theoretical framework explaining how supervised pretraining enables transformers to perform ICRL and approximate RL algorithms.

Findings

01. Transformers can imitate expert algorithms with bounded error.

02. ReLU attention transformers can approximate near-optimal RL algorithms.

03. Generalization error depends on model capacity and data distribution divergence.

Abstract

Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction trajectories from unseen environments. However, when and how transformers can be trained to perform ICRL have not been theoretically well-understood. In particular, it is unclear which reinforcement-learning algorithms transformers can perform in context, and how distribution mismatch in offline training data affects the learned algorithms. This paper provides a theoretical framework that analyzes supervised pretraining for ICRL. This includes two recently proposed training methods -- algorithm distillation and decision-pretrained transformers. First, assuming model realizability, we prove the supervised-pretrained transformer will imitate the conditional…
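As a rough illustration of the supervised pretraining objective the abstract refers to, the sketch below computes the average negative log-likelihood of expert actions given the interaction history so far. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `pretraining_loss` and `model`, and the tuple layout `(state, context_action, reward, expert_action)`, are hypothetical and are not taken from the paper or its repository. Under these assumptions, setting `expert_action` equal to `context_action` loosely corresponds to algorithm distillation, while labeling with an optimal action loosely corresponds to a decision-pretrained transformer.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def pretraining_loss(model, trajectories):
    """Average negative log-likelihood of expert actions (illustrative only).

    model: hypothetical callable (history, state) -> list of action logits.
    trajectories: list of trajectories; each step is a tuple
        (state, context_action, reward, expert_action). This layout is an
        assumption made for this sketch.
    """
    total = 0.0
    for traj in trajectories:
        history = []  # interaction data observed before the current step
        for state, context_action, reward, expert_action in traj:
            logits = model(history, state)
            # Supervised target: the expert's action, not necessarily the
            # action that the offline (context) policy actually took.
            total -= log_softmax(logits)[expert_action]
            history.append((state, context_action, reward))
    return total / len(trajectories)

# Usage: a stand-in "model" that ignores its inputs and is uniform over 3 actions.
uniform_model = lambda history, state: [0.0, 0.0, 0.0]
toy_data = [
    [(0, 1, 0.5, 2), (1, 0, 0.0, 2)],
    [(0, 2, 1.0, 1), (1, 1, 0.0, 0)],
]
loss = pretraining_loss(uniform_model, toy_data)
```

The history is built from the context actions while the labels come from the expert, so a gap between the two is one simple way to picture the distribution mismatch in offline training data that the abstract mentions.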

Code & Models

Repositories

licong-lin/in-context-rl (PyTorch, official)

Videos

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms