Scaling Tasks, Not Samples: Mastering Humanoid Control through Multi-Task Model-Based Reinforcement Learning
Shaohuai Liu, Weirui Ye, Yilun Du, Le Xie

TL;DR
This paper introduces a multi-task model-based reinforcement learning approach that scales the number of tasks rather than the number of samples per task, which makes humanoid control learning more sample-efficient and robust.
Contribution
The paper proposes EZ-M, a multi-task MBRL algorithm, and uses it to show that scaling the number of tasks improves sample efficiency and robustness in robotic learning.
Findings
EZ-M achieves state-of-the-art results on HumanoidBench.
Task diversity enhances dynamics learning and sample efficiency.
Multi-task learning with shared models outperforms single-task approaches.
Abstract
Developing generalist robots capable of mastering diverse skills remains a central challenge in embodied AI. While recent progress emphasizes scaling model parameters and offline datasets, such approaches are limited in robotics, where learning requires active interaction. We argue that effective online learning should scale the number of tasks, rather than the number of samples per task. This regime reveals a structural advantage of model-based reinforcement learning (MBRL). Because physical dynamics are invariant across tasks, a shared world model can aggregate multi-task experience to learn robust, task-agnostic representations. In contrast, model-free methods suffer from gradient interference when tasks demand conflicting actions in similar states. Task diversity therefore acts as a regularizer for MBRL, improving dynamics learning and sample efficiency. We instantiate this…
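The contrast the abstract draws between a shared world model and a shared policy can be illustrated with a toy example. The sketch below is not EZ-M and is not taken from the paper: it assumes hypothetical linear dynamics, two made-up tasks that visit the same states but demand opposite actions, and plain least-squares regression in place of reinforcement learning. The function name `conflict_demo` and all of its parameters are illustrative only.

```python
import numpy as np

def conflict_demo(seed=0, n=200, state_dim=3):
    """Toy contrast: pooled dynamics fitting vs. pooled policy fitting."""
    rng = np.random.default_rng(seed)
    # Assumed shared linear dynamics s' = A s + B a, identical for both tasks.
    A = rng.normal(size=(state_dim, state_dim))
    B = rng.normal(size=(state_dim, 1))
    states = rng.normal(size=(n, state_dim))
    # Two hypothetical tasks: same states, exactly opposite target actions.
    target = states @ rng.normal(size=(state_dim, 1))
    actions = {"task_a": target, "task_b": -target}

    # One shared dynamics model fitted on transitions pooled from both tasks.
    X = np.vstack([np.hstack([states, a]) for a in actions.values()])
    Y = np.vstack([states @ A.T + a @ B.T for a in actions.values()])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    model_mse = float(np.mean((X @ W - Y) ** 2))

    # One shared policy with no task input, regressed onto both tasks' actions.
    S = np.vstack([states, states])
    U = np.vstack(list(actions.values()))
    P, *_ = np.linalg.lstsq(S, U, rcond=None)
    policy_mse = float(np.mean((S @ P - U) ** 2))

    # Per-task gradients of the squared policy loss, evaluated at zero weights.
    g_a = -2 * states.T @ actions["task_a"] / n
    g_b = -2 * states.T @ actions["task_b"] / n
    cos = float((g_a * g_b).sum() / (np.linalg.norm(g_a) * np.linalg.norm(g_b)))
    return model_mse, policy_mse, cos

if __name__ == "__main__":
    model_mse, policy_mse, cos = conflict_demo()
    print(f"shared dynamics MSE: {model_mse:.2e}")
    print(f"shared policy MSE:   {policy_mse:.2e}")
    print(f"policy gradient cosine between tasks: {cos:.2f}")
```

In this toy setting the pooled transitions are all consistent with one dynamics model, so the shared model fits them essentially exactly, while the shared policy cannot satisfy both tasks and the two per-task policy gradients point in opposite directions. Real multi-task agents usually condition the policy on the task, so this only illustrates the direction of the argument, not its magnitude.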
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning
