Reinforcing VLAs in Task-Agnostic World Models
Yucen Wang, Rui Yu, Fengming Zhang, Junjie Lu, Xinyao Qin, Tianxiang Zhang, Kaixin Wang, Li Zhao

TL;DR
RAW-Dream is a task-agnostic framework for post-training vision-language-action (VLA) models that leverages pre-trained world models and vision-language models, enabling zero-shot adaptation to new tasks with improved scalability and reduced reliance on task-specific data.
Contribution
The paper presents RAW-Dream, a new paradigm that disentangles world model learning from downstream task dependencies, using pre-trained models and dual-noise verification to improve zero-shot VLA adaptation.
Findings
Consistent performance improvements across simulation and real-world tasks.
Effective substitution of task-dependent data with generalized physical priors.
Successful mitigation of hallucinations in world model rollouts.
Abstract
Post-training Vision-Language-Action (VLA) models via reinforcement learning (RL) in learned world models has emerged as an effective strategy to adapt to new tasks without costly real-world interactions. However, while using imagined trajectories reduces the sample complexity of policy training, existing methods still heavily rely on task-specific data to fine-tune both the world and reward models, fundamentally limiting their scalability to unseen tasks. To overcome this, we argue that world and reward models should capture transferable physical priors that enable zero-shot inference. We propose RAW-Dream (Reinforcing VLAs in task-Agnostic World Dreams), a new paradigm that completely disentangles world model learning from downstream task dependencies. RAW-Dream utilizes a world model pre-trained on diverse task-free behaviors for predicting future rollouts, and an off-the-shelf…
