Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning
Jaehyeon Son, Soochan Lee, Gunhee Kim

TL;DR
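This paper introduces DICP, an in-context model-based reinforcement learning framework in which Transformers, trained by distilling existing RL algorithms, learn environment dynamics and plan in-context. The authors report that it learns more efficiently than existing methods and outperforms them across a range of environments.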
Contribution
The paper proposes DICP, an in-context model-based RL approach in which Transformers learn environment dynamics and improve the policy in-context, without a separate dynamics model. This reduces the number of environment interactions required and improves performance.
Findings
DICP achieves state-of-the-art results across multiple environments.
DICP requires fewer environment interactions than baseline methods.
DICP effectively combines environment modeling and policy improvement in-context.
Abstract
Recent studies have shown that Transformers can perform in-context reinforcement learning (RL) by imitating existing RL algorithms, enabling sample-efficient adaptation to unseen tasks without parameter updates. However, these models also inherit the suboptimal behaviors of the RL algorithms they imitate. This issue primarily arises due to the gradual update rule employed by those algorithms. Model-based planning offers a promising solution to this limitation by allowing the models to simulate potential outcomes before taking action, providing an additional mechanism to deviate from the suboptimal behavior. Rather than learning a separate dynamics model, we propose Distillation for In-Context Planning (DICP), an in-context model-based RL framework where Transformers simultaneously learn environment dynamics and improve policy in-context. We evaluate DICP across a range of discrete and…
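The planning idea described in the abstract, simulating potential outcomes before taking an action, can be illustrated with a minimal sketch. This is not the authors' implementation: `plan_action`, `toy_propose`, and `toy_predict` are hypothetical names, and the two stand-in functions replace what in DICP would be a Transformer conditioned on the interaction history. The sketch only shows the control flow of proposing candidate actions, predicting their outcomes from context, and acting on the best prediction.

```python
import random
from typing import Callable, List, Tuple

# One past interaction: (state, action, reward). Hypothetical toy format.
Transition = Tuple[int, int, float]
History = List[Transition]


def plan_action(
    history: History,
    state: int,
    propose: Callable[[History, int], int],
    predict: Callable[[History, int, int], Tuple[float, int]],
    num_candidates: int = 4,
) -> int:
    """Return the candidate action with the highest predicted reward."""
    best_action, best_reward = None, float("-inf")
    for _ in range(num_candidates):
        # Policy role: suggest an action given the context so far.
        action = propose(history, state)
        # Dynamics role: simulate the outcome instead of acting for real.
        reward, _next_state = predict(history, state, action)
        if reward > best_reward:
            best_action, best_reward = action, reward
    return best_action


def toy_propose(history: History, state: int) -> int:
    """Stand-in policy: a uniformly random action out of three."""
    return random.randrange(3)


def toy_predict(history: History, state: int, action: int) -> Tuple[float, int]:
    """Stand-in dynamics: mean reward seen for this action in the history."""
    rewards = [r for (_, a, r) in history if a == action]
    mean_reward = sum(rewards) / len(rewards) if rewards else 0.0
    return mean_reward, state  # the toy model leaves the state unchanged


if __name__ == "__main__":
    past = [(0, 0, 0.0), (0, 1, 1.0), (0, 2, 0.2)]
    print(plan_action(past, 0, toy_propose, toy_predict))
```

Because the candidates are scored with predictions rather than real environment steps, the agent can deviate from the action its imitated policy would have taken without spending an extra interaction. In this sketch that check is a single-step lookahead on predicted reward only.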
Taxonomy
Topics: Reinforcement Learning in Robotics · AI-based Problem Solving and Planning · Artificial Intelligence in Games
