VLAW: Iterative Co-Improvement of Vision-Language-Action Policy and World Model

Yanjiang Guo; Tony Lee; Lucy Xiaoyang Shi; Jianyu Chen; Percy Liang; Chelsea Finn

arXiv:2602.12063 · cs.RO · February 17, 2026

Open Access

TL;DR

This paper introduces an iterative method that enhances vision-language-action models by using a learned world model to generate synthetic data, leading to significant performance improvements on real robot tasks.

Contribution

It proposes a simple iterative algorithm that leverages real-world data to improve a learned world model, which then generates synthetic data to boost VLA model performance.

Findings

1. 39.2% success rate improvement over the base policy
2. 11.6% improvement from synthetic rollouts
3. Effective enhancement of VLA models on real robot tasks

Abstract

The goal of this paper is to improve the performance and reliability of vision-language-action (VLA) models through iterative online interaction. Since collecting policy rollouts in the real world is expensive, we investigate whether a learned simulator (specifically, an action-conditioned video generation model) can be used to generate additional rollout data. Unfortunately, existing world models lack the physical fidelity necessary for policy improvement: they are predominantly trained on demonstration datasets that lack coverage of many different physical interactions (particularly failure cases) and struggle to accurately model small yet critical physical details in contact-rich object manipulation. We propose a simple iterative improvement algorithm that uses real-world rollout data to improve the fidelity of the world model, which can then, in turn, be used to generate supplemental…
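The iterative loop described in the abstract can be illustrated with a deliberately tiny, hypothetical Python sketch. Everything in it is an assumption for illustration, not the authors' implementation: the "world model" is a per-action success-count table rather than an action-conditioned video generator, the task is a one-step choice among four actions with made-up success probabilities (`TRUE_SUCCESS`), and the policy-improvement step is filtered behavior cloning on successful rollouts, a stand-in for whatever update the paper actually uses. The names `CountWorldModel`, `filtered_bc`, and `vlaw_style_loop` are hypothetical. Seeding the model only with successful demonstrations makes it start over-optimistic, mimicking the missing failure coverage the abstract points to; real rollouts then correct its estimates before it generates synthetic data.

```python
import random

# Hypothetical toy task: one step, four discrete actions, made-up success rates.
TRUE_SUCCESS = [0.1, 0.3, 0.8, 0.5]


def sample_action(policy, rng):
    """Draw an action index from a categorical policy."""
    return rng.choices(range(len(policy)), weights=policy)[0]


def real_rollout(policy, rng):
    """One 'real robot' episode: the outcome comes from the true environment."""
    a = sample_action(policy, rng)
    return a, rng.random() < TRUE_SUCCESS[a]


class CountWorldModel:
    """Hypothetical stand-in for a learned world model: per-action success counts.

    Seeded only with successful demonstrations, so it starts out
    over-optimistic (it has never seen a failure).
    """

    def __init__(self, n_actions, demo_successes=5):
        self.succ = [demo_successes] * n_actions
        self.total = [demo_successes] * n_actions

    def update(self, rollouts):
        """Fold real (action, success) pairs into the counts."""
        for a, ok in rollouts:
            self.total[a] += 1
            self.succ[a] += int(ok)

    def predict(self, a):
        """Estimated success probability of action a."""
        return self.succ[a] / self.total[a]

    def imagine(self, policy, n, rng):
        """Synthetic rollouts: outcomes are sampled from the model, not the world."""
        out = []
        for _ in range(n):
            a = sample_action(policy, rng)
            out.append((a, rng.random() < self.predict(a)))
        return out


def filtered_bc(policy, rollouts, mix=0.5):
    """Assumed update rule (filtered behavior cloning): move the policy toward
    the action distribution of the successful rollouts only."""
    counts = [0] * len(policy)
    for a, ok in rollouts:
        if ok:
            counts[a] += 1
    total = sum(counts)
    if total == 0:
        return list(policy)
    return [(1 - mix) * p + mix * c / total for p, c in zip(policy, counts)]


def vlaw_style_loop(iterations=5, n_real=40, n_synth=400, seed=0):
    """Alternate: real rollouts -> world-model update -> synthetic rollouts -> policy update."""
    rng = random.Random(seed)
    policy = [0.25] * len(TRUE_SUCCESS)
    wm = CountWorldModel(len(TRUE_SUCCESS))
    for _ in range(iterations):
        real = [real_rollout(policy, rng) for _ in range(n_real)]
        wm.update(real)                             # real data corrects the world model
        synth = wm.imagine(policy, n_synth, rng)    # world model supplies extra rollouts
        policy = filtered_bc(policy, real + synth)  # policy trains on both
    return policy, wm


def expected_success(policy):
    """True success rate of a policy (computable only because the toy task is known)."""
    return sum(p * s for p, s in zip(policy, TRUE_SUCCESS))
```

Calling `policy, wm = vlaw_style_loop()` and comparing `expected_success(policy)` with that of the uniform starting policy shows probability shifting toward higher-success actions. The ordering of steps matters in this toy: if the world model were never corrected with real failures, every synthetic rollout would succeed, and filtering synthetic data by success would leave the policy's action distribution unchanged in expectation.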

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Reinforcement Learning in Robotics