Reinforcing Action Policies by Prophesying
Jiahui Zhang, Ze Huang, Chun Gu, Zipei Ma, Li Zhang

TL;DR
This paper introduces ProphRL, a method that combines a learned world model with reinforcement learning to improve vision-language-action (VLA) policies for robots, enhancing data efficiency, optimization stability, and adaptability across diverse tasks and environments.
Contribution
It presents Prophet, a large-scale pretrained action-outcome dynamics model, and FA-GRPO with FlowScale, a reinforcement learning framework tailored for VLA policies, enabling few-shot adaptation and improved performance.
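As its name suggests, FA-GRPO appears to build on Group Relative Policy Optimization (GRPO). The sketch below shows only the standard GRPO advantage step. In that step, several rollouts of the same task form a group, and each rollout's reward is normalized by the group's mean and standard deviation, so no learned value function is required. This is not FA-GRPO or FlowScale, whose flow-specific modifications are not detailed in this summary. The function name `group_relative_advantages` and the choice of population standard deviation are assumptions made for illustration.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO-style advantage (not the paper's FA-GRPO).

    Each rollout's reward is centered on the group mean and scaled by the
    group's (population) standard deviation. `eps` avoids division by zero
    when every rollout in the group received the same reward.
    """
    n = len(rewards)
    if n == 0:
        return []
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four rollouts of one task with a binary success reward.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in adv])  # prints [1.0, -1.0, -1.0, 1.0]
```

Successful rollouts receive positive advantages and failed ones negative. If every rollout in a group succeeds or every one fails, all advantages are zero and that group contributes no learning signal.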
Findings
Achieved 5-17% success improvements on public benchmarks.
Real robot experiments showed 24-30% success gains.
ProphRL demonstrated effective adaptation to new robots and environments.
Abstract
Vision-Language-Action (VLA) policies excel at aligning language, perception, and robot control. However, most VLAs are trained purely by imitation, which overfits to demonstrations and is brittle under distribution shift. Reinforcement learning (RL) directly optimizes task reward and thus addresses this misalignment, but real-robot interaction is expensive and conventional simulators are hard to engineer and transfer. We address both data efficiency and optimization stability in VLA post-training via a learned world model and an RL procedure tailored to flow-based action heads. Specifically, we introduce Prophet, a unified action-to-video robot actuation model pretrained across large-scale, heterogeneous robot data to learn reusable action-outcome dynamics. It can few-shot adapt to new robots, objects, and environments, yielding a rollout-ready simulator. Building on Prophet, we reinforce…
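The abstract's core idea is to replace costly real-robot interaction with rollouts inside a learned action-outcome model. Below is a minimal sketch of that loop, assuming a one-dimensional state in place of video frames. Every name here is a hypothetical stand-in: `toy_world_model` plays the role of the learned dynamics, `make_toy_policy` plays the role of the VLA policy with language conditioning omitted, and `success_reward` is a placeholder task reward. Prophet itself predicts video from actions, and the paper's reward design is not described in this summary.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List

Obs = List[float]
Action = List[float]


@dataclass
class Trajectory:
    observations: List[Obs] = field(default_factory=list)
    actions: List[Action] = field(default_factory=list)


def imagined_rollout(
    world_model: Callable[[Obs, Action], Obs],  # learned dynamics: (obs, action) -> next obs
    policy: Callable[[Obs], Action],            # policy under training
    init_obs: Obs,
    horizon: int,
) -> Trajectory:
    """Roll the policy forward inside the learned model instead of a real robot."""
    obs = list(init_obs)
    traj = Trajectory(observations=[obs])
    for _ in range(horizon):
        action = policy(obs)
        obs = world_model(obs, action)  # the model predicts the action's outcome
        traj.actions.append(action)
        traj.observations.append(obs)
    return traj


# --- Hypothetical toy stand-ins (not from the paper) ---
GOAL = 1.0


def toy_world_model(obs: Obs, action: Action) -> Obs:
    # Pretend "learned" dynamics: position shifts by the commanded displacement.
    return [obs[0] + action[0]]


def make_toy_policy(rng: random.Random) -> Callable[[Obs], Action]:
    def policy(obs: Obs) -> Action:
        # Noisy step toward the goal, so rollouts from the same start differ.
        return [0.3 * (GOAL - obs[0]) + rng.gauss(0.0, 0.2)]
    return policy


def success_reward(traj: Trajectory) -> float:
    # Binary task reward: did the final predicted state land near the goal?
    return 1.0 if abs(traj.observations[-1][0] - GOAL) < 0.25 else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    policy = make_toy_policy(rng)
    group = [imagined_rollout(toy_world_model, policy, [0.0], horizon=10) for _ in range(8)]
    print([success_reward(t) for t in group])
```

Sampling several imagined rollouts from the same start state produces a group of rewards. A group-based RL objective can then compare those rewards against each other, with no physical robot in the loop.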
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robot Manipulation and Learning
