WPT: World-to-Policy Transfer via Online World Model Distillation
Guangfeng Jiang, Yueru Luo, Jun Liu, Yi Huang, Yiyao Zhu, Zhan Qu, Dave Zhenyu Chen, Bingbing Liu, Xu Yan

TL;DR
WPT introduces an online world model distillation method that integrates world knowledge into lightweight policies, improving policy learning and planning efficiency and achieving state-of-the-art results on driving benchmarks.
Contribution
The paper presents a novel online distillation framework that transfers world model knowledge into a lightweight policy for real-time decision making.
Findings
Achieves a 0.11 collision rate in open-loop tests.
Attains a 79.23 driving score in closed-loop benchmarks.
Increases inference speed by up to 4.9 times.
Abstract
Recent years have witnessed remarkable progress in world models, which primarily aim to capture the spatio-temporal correlations between an agent's actions and the evolving environment. However, existing approaches often suffer from tight runtime coupling or depend on offline reward signals, resulting in substantial inference overhead or hindering end-to-end optimization. To overcome these limitations, we introduce WPT, a World-to-Policy Transfer training paradigm that enables online distillation under the guidance of an end-to-end world model. Specifically, we develop a trainable reward model that infuses world knowledge into a teacher policy by aligning candidate trajectories with the future dynamics predicted by the world model. Subsequently, we propose policy distillation and world reward distillation to transfer the teacher's reasoning ability into a lightweight student policy,…
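The two distillation terms named in the abstract can be illustrated with a small sketch. The abstract does not give the loss forms, so everything here is an assumption made for illustration: the function names, the use of a softmax and KL divergence over candidate-trajectory scores for policy distillation, a mean squared error for world reward distillation, and the weights `alpha` and `beta`. It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Turn raw candidate-trajectory scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def policy_distillation_loss(teacher_scores, student_scores):
    """KL(teacher || student) over the same set of candidate trajectories.

    Assumption: both policies score a shared candidate set; the paper's
    actual policy distillation objective may differ.
    """
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def reward_distillation_loss(teacher_rewards, student_rewards):
    """Mean squared error between teacher and student reward estimates.

    Assumption: the student regresses the teacher's per-candidate rewards.
    """
    n = len(teacher_rewards)
    return sum((t - s) ** 2 for t, s in zip(teacher_rewards, student_rewards)) / n


def distillation_loss(teacher_scores, student_scores,
                      teacher_rewards, student_rewards,
                      alpha=1.0, beta=1.0):
    """Weighted sum of both terms; alpha and beta are hypothetical weights."""
    return (alpha * policy_distillation_loss(teacher_scores, student_scores)
            + beta * reward_distillation_loss(teacher_rewards, student_rewards))


if __name__ == "__main__":
    # Three hypothetical candidate trajectories scored by teacher and student.
    loss = distillation_loss(
        teacher_scores=[2.0, 0.5, -1.0],
        student_scores=[1.0, 1.0, 0.0],
        teacher_rewards=[0.9, 0.4, 0.1],
        student_rewards=[0.7, 0.5, 0.2],
    )
    print(f"combined distillation loss: {loss:.4f}")
```

In this sketch the teacher would be the policy guided by the world model and reward model during training, and only the student would run at inference time, which is consistent with the abstract's description of a lightweight student policy.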
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
