Hi-WM: Human-in-the-World-Model for Scalable Robot Post-Training
Yaxuan Li, Zhongyi Zhou, Yefei Chen, Yanjiang Guo, Jiaming Liu, Shanghang Zhang, Jianyu Chen, Yichen Zhu

TL;DR
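Hi-WM is a post-training framework in which a human corrects a robot policy's failure-prone rollouts inside a learned, action-conditioned world model rather than on the physical robot, improving real-world manipulation success while reducing the robot time, resets, and supervision that corrections normally require.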
Contribution
The paper presents Hi-WM, a method that uses a learned world model as a reusable substrate for human-in-the-loop, failure-targeted post-training of robot policies, with cached intermediate states that support rollback and branching.
Findings
Hi-WM improves real-world success by 37.9 points on average.
World-model evaluation correlates strongly with real-world performance (r = 0.953).
The approach reduces the need for physical robot resets and supervision.
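For readers unfamiliar with the r statistic quoted above, the following is a minimal sketch of how a Pearson correlation between world-model and real-world success rates could be computed. The function name `pearson_r` and the per-task numbers are made up for illustration and are not taken from the paper; the paper's own evaluation protocol may differ.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up success rates (NOT from the paper), one entry per hypothetical task.
world_model_success = [0.1, 0.4, 0.7, 0.9]
real_world_success = [0.2, 0.5, 0.6, 0.95]
r = pearson_r(world_model_success, real_world_success)
```

A value of r close to 1 means that tasks scoring higher inside the world model also tend to score higher on the real robot, which is what makes the world model usable as an evaluation proxy.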
Abstract
Post-training is essential for turning pretrained generalist robot policies into reliable task-specific controllers, but existing human-in-the-loop pipelines remain tied to physical execution: each correction requires robot time, scene setup, resets, and operator supervision in the real world. Meanwhile, action-conditioned world models have been studied mainly for imagination, synthetic data generation, and policy evaluation. We propose Human-in-the-World-Model (Hi-WM), a post-training framework that uses a learned world model as a reusable corrective substrate for failure-targeted policy improvement. A policy is first rolled out in closed loop inside the world model; when the rollout becomes incorrect or failure-prone, a human intervenes directly in the model to provide short corrective actions. Hi-WM caches intermediate states and supports rollback and branching, allowing a…
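The caching, rollback, and branching idea described in the abstract can be pictured as a tree of cached world-model states. The sketch below is only an illustration under strong assumptions: `RolloutTree`, `Node`, `toy_world_model`, and the scalar state and action types are all hypothetical stand-ins, and the paper's actual data structures and interfaces are not given in this summary.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

State = float   # hypothetical stand-in for a world-model state
Action = float  # hypothetical stand-in for a robot action

@dataclass
class Node:
    state: State
    parent: Optional[int]     # index of the parent node, None for the root
    action: Optional[Action]  # action that led here, None for the root
    source: str               # "root", "policy", or "human"

@dataclass
class RolloutTree:
    step_fn: Callable[[State, Action], State]  # assumed world-model step
    nodes: List[Node] = field(default_factory=list)

    def reset(self, state: State) -> int:
        self.nodes.append(Node(state, None, None, "root"))
        return len(self.nodes) - 1

    def step(self, node_id: int, action: Action, source: str) -> int:
        # Every step adds a cached node, so older branches are never lost.
        next_state = self.step_fn(self.nodes[node_id].state, action)
        self.nodes.append(Node(next_state, node_id, action, source))
        return len(self.nodes) - 1

    def rollback(self, node_id: int, k: int) -> int:
        # Walk up to k cached ancestors, stopping at the root.
        for _ in range(k):
            parent = self.nodes[node_id].parent
            if parent is None:
                break
            node_id = parent
        return node_id

    def human_actions(self, node_id: int) -> List[Action]:
        # Collect the human corrections on the path from the root to node_id.
        actions = []
        current: Optional[int] = node_id
        while current is not None:
            node = self.nodes[current]
            if node.source == "human":
                actions.append(node.action)
            current = node.parent
        return actions[::-1]

def toy_world_model(state: State, action: Action) -> State:
    return state + action  # toy dynamics, not a learned model

tree = RolloutTree(step_fn=toy_world_model)
node = tree.reset(0.0)
for _ in range(3):                      # policy drifts away from the target 0.0
    node = tree.step(node, 1.0, "policy")
policy_leaf = node
branch = tree.rollback(policy_leaf, 2)  # return to a cached earlier state
for a in (-0.5, -0.5):                  # short human correction on a new branch
    branch = tree.step(branch, a, "human")
corrections = tree.human_actions(branch)
```

Because nodes are only ever appended, the failed policy branch stays in the cache next to the corrected one, so the same earlier state could be revisited for further branches without re-running anything from the start.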
