DreamPolicy: A Unified World-model Policy for Scalable Humanoid Locomotion
Yahao Fan, Tianxiang Gui, Kaiyang Ji, Shutong Ding, Chixuan Zhang, Yifeng Xu, Ke Yang, Jiayuan Gu, Jingyi Yu, Jingya Wang, Ye Shi

TL;DR
DreamPolicy introduces a diffusion-based world model framework that enables a single humanoid locomotion policy to generalize to unseen terrains, surpassing previous methods in scalability and robustness.
Contribution
It presents a novel integration of offline data with a diffusion-based world model for scalable, terrain-agnostic humanoid locomotion, improving generalization over prior distillation approaches.
Findings
Outperforms baselines by up to 27% on unseen terrains.
Achieves a 38% improvement on combined terrains.
Enables zero-shot transfer to complex, unseen environments.
Abstract
Achieving versatile humanoid locomotion with a single policy presents a critical scalability challenge. Prevailing methods often rely on distilling multiple terrain-specific teacher policies into a unified student policy. However, while such distillation captures basic locomotion primitives, it struggles to organically compose these skills to adapt to complex environments, resulting in poor generalization to novel composite terrains unseen during training. To overcome this, we present DreamPolicy, a unified framework that integrates offline data with a diffusion-based world model, enabling a single policy to master both known and unseen terrains. Central to our approach is a terrain-aware, autoregressive diffusion world model trained on aggregated rollouts from specialized policies. This model synthesizes physically plausible future trajectories, which serve as…
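The abstract describes the data pipeline and the world model only in outline. As a rough illustration, the Python sketch below shows the two generic ingredients it names: pooling rollouts from several specialized policies into (history, next state) training pairs, and autoregressive sampling of future states with a diffusion model, where each sampled state is fed back as context for the next. Everything here is an assumption rather than DreamPolicy's actual design: the function names (aggregate_rollouts, make_schedule, toy_eps_model, ddpm_sample, imagine_rollout) are hypothetical, the sampler is standard DDPM ancestral sampling with a linear noise schedule, and the trained, terrain-aware denoising network is replaced by a closed-form noise predictor for a toy Gaussian model whose mean is constant-velocity extrapolation. There is no terrain conditioning and no policy training.

```python
# Hypothetical sketch of an autoregressive diffusion world model; not the DreamPolicy implementation.
import numpy as np


def aggregate_rollouts(rollouts, history_len):
    """Pool trajectories from several (hypothetical) terrain-specific policies
    into (history window, next state) pairs for a single world model."""
    histories, targets = [], []
    for traj in rollouts:
        traj = np.asarray(traj, dtype=float)  # shape (T, state_dim)
        for t in range(history_len, len(traj)):
            histories.append(traj[t - history_len:t])
            targets.append(traj[t])
    return np.stack(histories), np.stack(targets)


def make_schedule(num_steps=100, beta_start=1e-4, beta_end=0.2):
    """Linear DDPM noise schedule (assumed, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_eps_model(x_t, t, history, alpha_bars, data_std):
    """Hypothetical stand-in for a trained noise-prediction network.

    Assumes the next state is N(mu, data_std^2 I) with mu given by
    constant-velocity extrapolation of the history; for that Gaussian the
    optimal prediction E[eps | x_t] has the closed form below."""
    mu = 2.0 * history[-1] - history[-2]
    abar = alpha_bars[t]
    return np.sqrt(1.0 - abar) * (x_t - np.sqrt(abar) * mu) / (
        abar * data_std**2 + 1.0 - abar
    )


def ddpm_sample(history, rng, schedule, data_std=0.01):
    """Ancestral DDPM sampling of one next state conditioned on `history`."""
    betas, alphas, alpha_bars = schedule
    x = rng.standard_normal(history.shape[-1])  # start from pure noise x_T
    for t in reversed(range(len(betas))):
        eps = toy_eps_model(x, t, history, alpha_bars, data_std)
        # Mean of x_{t-1} given x_t and the predicted noise.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Add noise with the posterior variance beta_tilde_t.
            var = (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
            x = x + np.sqrt(var) * rng.standard_normal(x.shape)
    return x


def imagine_rollout(init_history, horizon, rng, schedule):
    """Autoregressive rollout: each sampled state is appended to the context
    window and conditions the next diffusion sample."""
    window = np.asarray(init_history, dtype=float)  # shape (history_len, state_dim)
    states = []
    for _ in range(horizon):
        nxt = ddpm_sample(window, rng, schedule)
        states.append(nxt)
        window = np.vstack([window[1:], nxt])  # slide the context window
    return np.stack(states)


if __name__ == "__main__":
    H, Y = aggregate_rollouts([np.zeros((4, 2)), np.ones((5, 2))], history_len=2)
    print(H.shape, Y.shape)  # (5, 2, 2) (5, 2)
    traj = imagine_rollout([[0.0, 0.0], [0.1, 0.05]], horizon=5,
                           rng=np.random.default_rng(0), schedule=make_schedule())
    print(traj.shape)  # (5, 2)
```

Because every prediction conditions on previously sampled states, small per-step errors compound over the horizon; in this toy the sampled states stay close to the constant-velocity line for the first steps and drift gradually afterwards.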
