Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning

Jaehyeon Son; Soochan Lee; Gunhee Kim

arXiv:2502.19009 · cs.LG · February 27, 2025

TL;DR

This paper introduces DICP, an in-context model-based reinforcement learning framework in which Transformers learn environment dynamics and improve the policy in-context, enabling efficient learning and planning that outperforms existing methods in various environments.

Contribution

The paper proposes DICP, a novel in-context model-based RL approach in which Transformers learn environment dynamics and policy improvement together, reducing the number of environment interactions needed and improving performance.

Findings

01. DICP achieves state-of-the-art results across multiple environments.

02. DICP requires fewer environment interactions than baseline methods.

03. DICP effectively combines environment modeling and policy improvement in-context.

Abstract

Recent studies have shown that Transformers can perform in-context reinforcement learning (RL) by imitating existing RL algorithms, enabling sample-efficient adaptation to unseen tasks without parameter updates. However, these models also inherit the suboptimal behaviors of the RL algorithms they imitate. This issue primarily arises due to the gradual update rule employed by those algorithms. Model-based planning offers a promising solution to this limitation by allowing the models to simulate potential outcomes before taking action, providing an additional mechanism to deviate from the suboptimal behavior. Rather than learning a separate dynamics model, we propose Distillation for In-Context Planning (DICP), an in-context model-based RL framework where Transformers simultaneously learn environment dynamics and improve policy in-context. We evaluate DICP across a range of discrete and…
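The idea of simulating potential outcomes before taking action can be illustrated with a small, generic planning loop. The sketch below is not the DICP implementation: the 1-D chain environment, `model_step`, `base_policy`, and `plan` are all hypothetical names invented for illustration, and both the dynamics model and the policy are plain functions here rather than a single Transformer. It only shows how simulated rollouts let an agent deviate from a suboptimal policy's first action.

```python
import random

# Hypothetical toy setup (an assumption, not from the paper):
# a 1-D chain with positions 0..9 and a goal at position 9.
GOAL = 9
ACTIONS = (-1, +1)

def model_step(state, action):
    """Stand-in for a learned dynamics model: predicts (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = -float(GOAL - next_state)  # dense reward: negative distance to goal
    return next_state, reward

def base_policy(state, rng):
    """Stand-in for an imitated, still-suboptimal policy: steps right 60% of the time."""
    return +1 if rng.random() < 0.6 else -1

def plan(state, rng, horizon=5, rollouts_per_action=8):
    """Return the first action whose simulated rollouts have the highest mean return."""
    best_action, best_value = None, float("-inf")
    for first_action in ACTIONS:
        total = 0.0
        for _ in range(rollouts_per_action):
            # Take the candidate action in the model, then follow the base policy.
            s, r = model_step(state, first_action)
            ret = r
            for _ in range(horizon - 1):
                s, r = model_step(s, base_policy(s, rng))
                ret += r
            total += ret
        value = total / rollouts_per_action
        if value > best_value:
            best_action, best_value = first_action, value
    return best_action

if __name__ == "__main__":
    rng = random.Random(0)
    print("planned first action from state 3:", plan(3, rng))
```

All candidate evaluation here happens inside `model_step`, so the planner spends no real environment interactions when choosing an action. With `horizon=1` the planner reduces to a greedy one-step lookahead.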

Code & Models

Repositories

jaehyeon-son/dicp · PyTorch · Official

Videos

Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · AI-based Problem Solving and Planning · Artificial Intelligence in Games