Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds

Fan Wang; Pengtao Shao; Yiming Zhang; Bo Yu; Shaoshan Liu; Ning Ding; Yang Cao; Yu Kang; Haifeng Wang

arXiv:2502.02869 · cs.LG · November 4, 2025

TL;DR

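This paper introduces AnyMDP, a scalable set of procedurally generated tabular Markov Decision Processes for in-context reinforcement learning (ICRL). Training on it at sufficient scale improves generalization to unseen tasks, and the task set supports empirical analysis of how task diversity affects performance.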

Contribution

The paper proposes AnyMDP, a large-scale randomized task generation framework, and introduces decoupled policy distillation and prior information for efficient meta-training, together with an analysis of ICRL generalization.

Findings

01. Large-scale AnyMDP tasks improve ICRL generalization.

02. Task diversity influences adaptation time and performance.

03. Scaling ICRL requires extensive, diverse task sets.

Abstract

In-Context Reinforcement Learning (ICRL) enables agents to learn automatically and on-the-fly from their interactive experiences. However, a major challenge in scaling up ICRL is the lack of scalable task collections. To address this, we propose the procedurally generated tabular Markov Decision Processes, named AnyMDP. Through a carefully designed randomization process, AnyMDP is capable of generating high-quality tasks on a large scale while maintaining relatively low structural biases. To facilitate efficient meta-training at scale, we further introduce decoupled policy distillation and induce prior information in the ICRL framework. Our results demonstrate that, with a sufficiently large scale of AnyMDP tasks, the proposed model can generalize to tasks that were not considered in the training set through versatile in-context learning paradigms. The scalable task set provided by…
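To make the idea of a procedurally generated tabular MDP concrete, the following is a minimal sketch, not the AnyMDP randomization process itself (which the abstract does not specify). It assumes uniformly sampled, normalized transition weights and rewards drawn from [-1, 1]; the names `sample_tabular_mdp` and `value_iteration` are hypothetical and chosen only for illustration. Value iteration is included to show how a reference solution for such a task could be computed.

```python
import random


def sample_tabular_mdp(n_states, n_actions, seed=0):
    """Sample a random tabular MDP (illustrative only, not AnyMDP's procedure).

    Returns P[s][a][s2], the transition probabilities, and R[s][a], the rewards.
    """
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        p_row, r_row = [], []
        for _ in range(n_actions):
            # Assumption: unstructured transitions from normalized uniform weights.
            weights = [rng.random() for _ in range(n_states)]
            total = sum(weights)
            p_row.append([w / total for w in weights])
            # Assumption: rewards drawn uniformly from [-1, 1].
            r_row.append(rng.uniform(-1.0, 1.0))
        P.append(p_row)
        R.append(r_row)
    return P, R


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Compute optimal state values with standard value iteration."""
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iters):
        new_V = [
            max(
                R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    return V


# Generate one small task and solve it to obtain reference state values.
P, R = sample_tabular_mdp(n_states=5, n_actions=3, seed=42)
V = value_iteration(P, R)
```

Varying `seed`, `n_states`, and `n_actions` yields a family of distinct tasks. A real generator would likely add further constraints on task structure and quality, which this sketch omits.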

Videos

Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds · SlidesLive

Taxonomy

Topics: Context-Aware Activity Recognition Systems · Data Stream Mining Techniques