Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds
Fan Wang, Pengtao Shao, Yiming Zhang, Bo Yu, Shaoshan Liu, Ning Ding, Yang Cao, Yu Kang, Haifeng Wang

TL;DR
This paper introduces AnyMDP, a scalable, procedurally generated set of tabular Markov Decision Process tasks for in-context reinforcement learning (ICRL). Meta-training on it improves generalization to unseen tasks and supports empirical analysis of how task diversity affects performance.
Contribution
The paper proposes AnyMDP, a large-scale, randomized task generation framework. For efficient meta-training at scale, it introduces decoupled policy distillation and incorporates prior information into the ICRL framework, and it analyzes ICRL generalization.
Findings
Large-scale AnyMDP tasks improve ICRL generalization.
Task diversity influences adaptation time and performance.
Scaling ICRL requires extensive, diverse task sets.
Abstract
In-Context Reinforcement Learning (ICRL) enables agents to learn automatically and on-the-fly from their interactive experiences. However, a major challenge in scaling up ICRL is the lack of scalable task collections. To address this, we propose the procedurally generated tabular Markov Decision Processes, named AnyMDP. Through a carefully designed randomization process, AnyMDP is capable of generating high-quality tasks on a large scale while maintaining relatively low structural biases. To facilitate efficient meta-training at scale, we further introduce decoupled policy distillation and induce prior information in the ICRL framework. Our results demonstrate that, with a sufficiently large scale of AnyMDP tasks, the proposed model can generalize to tasks that were not considered in the training set through versatile in-context learning paradigms. The scalable task set provided by…
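To make "procedurally generated tabular MDP" concrete, here is a minimal sketch in pure Python. It is not AnyMDP's randomization process, which the abstract does not specify. The function names `sample_tabular_mdp` and `value_iteration`, the uniform-simplex transition sampling, and the Gaussian rewards are all assumptions chosen for illustration. The sketch samples one random task and solves it with standard value iteration, the kind of reference solution that could be used to check a generated task.

```python
import random


def sample_tabular_mdp(n_states, n_actions, seed=0):
    """Sample a random tabular MDP (hypothetical generator, NOT AnyMDP's procedure)."""
    rng = random.Random(seed)
    transitions = []  # transitions[s][a] is a distribution over next states
    rewards = []      # rewards[s][a] is the expected immediate reward
    for _ in range(n_states):
        t_row, r_row = [], []
        for _ in range(n_actions):
            # Normalized exponential draws are uniform on the probability simplex.
            weights = [rng.expovariate(1.0) for _ in range(n_states)]
            total = sum(weights)
            t_row.append([w / total for w in weights])
            # Assumed reward model: standard normal mean reward per (s, a).
            r_row.append(rng.gauss(0.0, 1.0))
        transitions.append(t_row)
        rewards.append(r_row)
    return transitions, rewards


def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8):
    """Compute optimal state values by repeated Bellman optimality backups."""
    n_states = len(transitions)
    n_actions = len(transitions[0])
    values = [0.0] * n_states
    while True:
        new_values = []
        for s in range(n_states):
            q_values = [
                rewards[s][a]
                + gamma * sum(p * v for p, v in zip(transitions[s][a], values))
                for a in range(n_actions)
            ]
            new_values.append(max(q_values))
        delta = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if delta < tol:
            return values
```

Varying `seed`, `n_states`, and `n_actions` yields an arbitrarily large family of distinct tasks, which is the property the abstract relies on; how AnyMDP controls task quality and structural bias is beyond what this sketch attempts.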
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Data Stream Mining Techniques
