WPT: World-to-Policy Transfer via Online World Model Distillation

Guangfeng Jiang; Yueru Luo; Jun Liu; Yi Huang; Yiyao Zhu; Zhan Qu; Dave Zhenyu Chen; Bingbing Liu; Xu Yan

arXiv:2511.20095·cs.CV·March 19, 2026



TL;DR

WPT is an online world model distillation method that transfers world knowledge into a lightweight policy, improving planning quality and inference efficiency and achieving state-of-the-art results on driving benchmarks.

Contribution

The paper presents a novel online distillation framework that transfers world model knowledge into a lightweight policy for real-time decision making.

Findings

1. Achieves a 0.11 collision rate in open-loop tests.

2. Attains a 79.23 driving score in closed-loop benchmarks.

3. Increases inference speed by up to 4.9 times.

Abstract

Recent years have witnessed remarkable progress in world models, which primarily aim to capture the spatio-temporal correlations between an agent's actions and the evolving environment. However, existing approaches often suffer from tight runtime coupling or depend on offline reward signals, resulting in substantial inference overhead or hindering end-to-end optimization. To overcome these limitations, we introduce WPT, a World-to-Policy Transfer training paradigm that enables online distillation under the guidance of an end-to-end world model. Specifically, we develop a trainable reward model that infuses world knowledge into a teacher policy by aligning candidate trajectories with the future dynamics predicted by the world model. Subsequently, we propose policy distillation and world reward distillation to transfer the teacher's reasoning ability into a lightweight student policy,…
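
The two distillation terms named in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss definitions, so everything below is an assumption made for illustration: policy distillation is modeled as a KL divergence between teacher and student distributions over a shared set of candidate trajectories, world reward distillation as a mean squared error between teacher and student reward estimates, and the function names, `temperature`, and `reward_weight` are hypothetical rather than taken from the paper.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn per-trajectory scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def policy_distillation_loss(teacher_scores, student_scores, temperature=1.0):
    """Assumed form: KL(teacher || student) over the same candidate trajectories."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def reward_distillation_loss(teacher_rewards, student_rewards):
    """Assumed form: mean squared error between the two reward estimates."""
    n = len(teacher_rewards)
    return sum((t - s) ** 2 for t, s in zip(teacher_rewards, student_rewards)) / n


def total_distillation_loss(teacher_scores, student_scores,
                            teacher_rewards, student_rewards,
                            reward_weight=0.5):
    """Weighted sum of both terms; reward_weight is a hypothetical hyperparameter."""
    policy_term = policy_distillation_loss(teacher_scores, student_scores)
    reward_term = reward_distillation_loss(teacher_rewards, student_rewards)
    return policy_term + reward_weight * reward_term


if __name__ == "__main__":
    # Toy values for three candidate trajectories (illustrative only).
    teacher_scores = [2.0, 0.5, -1.0]
    student_scores = [1.5, 0.7, -0.5]
    teacher_rewards = [0.9, 0.4, 0.1]
    student_rewards = [0.8, 0.5, 0.2]
    print(total_distillation_loss(teacher_scores, student_scores,
                                  teacher_rewards, student_rewards))
```

In this sketch only the student's scores and reward estimates would receive gradients during training; the teacher's outputs act as fixed targets within each step. Whether WPT detaches the teacher in this way is not stated in the abstract.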


Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis