WoVR: World Models as Reliable Simulators for Post-Training VLA Policies with RL

Zhennan Jiang; Shangqing Zhou; Yutong Jiang; Zefang Huang; Mingjie Wei; Yuhui Chen; Tianxing Zhou; Zhen Guo; Hao Lin; Quanlu Zhang; Yu Wang; Haoran Li; Chao Yu; and Dongbin Zhao

arXiv:2602.13977·cs.RO·February 17, 2026

Open Access

TL;DR

WoVR introduces a reliable world-model-based RL framework for VLA policies, explicitly managing hallucination and error accumulation to enable stable long-horizon imagined rollouts and improved robotic task success.

Contribution

WoVR regulates how RL interacts with an imperfect world model, improving the stability and effectiveness of RL post-training for vision-language-action policies.

Findings

01. Significant improvement in LIBERO benchmark success rates.
02. Enhanced real-world robotic manipulation performance.
03. Effective control of hallucination in learned world models.

Abstract

Reinforcement learning (RL) promises to unlock capabilities beyond imitation learning for Vision-Language-Action (VLA) models, but its requirement for massive real-world interaction prevents direct deployment on physical robots. Recent work attempts to use learned world models as simulators for policy optimization, yet closed-loop imagined rollouts inevitably suffer from hallucination and long-horizon error accumulation. Such errors do not merely degrade visual fidelity; they corrupt the optimization signal, encouraging policies to exploit model inaccuracies rather than genuine task progress. We propose WoVR, a reliable world-model-based reinforcement learning framework for post-training VLA policies. Instead of assuming a faithful world model, WoVR explicitly regulates how RL interacts with imperfect imagined dynamics. It improves rollout stability through a controllable…

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robot Manipulation and Learning