Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL

Xiaofeng Lin; Sirou Zhu; Yilei Chen; Mingyu Chen; Hejian Sang; Ioannis Paschalidis; Zhipeng Wang; Aldo Pacchiano; Xuezhou Zhang

arXiv:2602.04089·cs.AI·February 5, 2026

Open Access

TL;DR

This paper introduces ORBIT, a meta-reinforcement learning framework that trains large language models to learn from interaction in context, so that they make better decisions in online environments without weight updates.

Contribution

The paper presents a novel multi-task, multi-episode meta-RL training method for LLMs, significantly enhancing their in-context online learning ability in unseen environments.
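To make the multi-episode, in-context setting concrete, here is a minimal Python sketch of the interaction protocol only. It does not reproduce ORBIT's training procedure. The Bernoulli bandit task, the function names (`make_task`, `in_context_policy`, `run_trial`), and the simple explore-then-greedy rule standing in for the LLM are all assumptions made for illustration. The point it shows is that the policy has no trainable state: everything it knows about the task comes from a context that persists across episode boundaries.

```python
import random

def make_task(rng, n_arms=3):
    """Sample a hypothetical task: a Bernoulli bandit with hidden arm means."""
    return [rng.random() for _ in range(n_arms)]

def in_context_policy(context, n_arms):
    """Stand-in for the LLM (an assumption, not the paper's model).

    Picks an arm using only the interaction history: it tries every arm
    once, then plays the arm with the best observed mean reward.
    """
    totals = [0.0] * n_arms
    counts = [0] * n_arms
    for arm, reward in context:
        totals[arm] += reward
        counts[arm] += 1
    for arm in range(n_arms):
        if counts[arm] == 0:
            return arm
    return max(range(n_arms), key=lambda a: totals[a] / counts[a])

def run_trial(arm_means, n_episodes, steps_per_episode, rng):
    """Run several episodes on one task, carrying the context across episodes."""
    context = []           # persists across episode boundaries; no weights change
    episode_returns = []
    for _ in range(n_episodes):
        episode_return = 0.0
        for _ in range(steps_per_episode):
            arm = in_context_policy(context, len(arm_means))
            reward = 1.0 if rng.random() < arm_means[arm] else 0.0
            context.append((arm, reward))
            episode_return += reward
        episode_returns.append(episode_return)
    return context, episode_returns

rng = random.Random(0)
task = make_task(rng)
context, returns = run_trial(task, n_episodes=4, steps_per_episode=5, rng=rng)
print(returns)  # one return per episode
```

A meta-RL trainer would presumably score the whole multi-episode trial across many sampled tasks; that training step is omitted from this sketch.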

Findings

01 After ORBIT meta-training, Qwen3-14B matches GPT-5.2 on online learning tasks

02 ORBIT outperforms standard RL fine-tuning by a large margin

03 Scaling experiments show consistent gains with larger models

Abstract

Large language models (LLMs) achieve strong performance when all task-relevant information is available upfront, as in static prediction and instruction-following problems. However, many real-world decision-making tasks are inherently online: crucial information must be acquired through interaction, feedback is delayed, and effective behavior requires balancing information collection and exploitation over time. While in-context learning enables adaptation without weight updates, existing LLMs often struggle to reliably leverage in-context interaction experience in such settings. In this work, we show that this limitation can be addressed through training. We introduce ORBIT, a multi-task, multi-episode meta-reinforcement learning framework that trains LLMs to learn from interaction in context. After meta-training, a relatively small open-source model (Qwen3-14B) demonstrates…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning