Self-Correcting VLA: Online Action Refinement via Sparse World Imagination

Chenyv Liu; Wentao Tan; Lei Zhu; Fengling Li; Jingjing Li; Guoli Yang; Heng Tao Shen

arXiv:2602.21633 · cs.RO · February 26, 2026



TL;DR

This paper introduces Self-Correcting VLA, a novel approach that enhances vision-language-action models with sparse world imagination and online action refinement, leading to improved robustness and efficiency in robot manipulation tasks.

Contribution

The paper proposes a self-correcting framework for VLA models that integrates sparse world imagination and online action refinement for better physical understanding and self-improvement.

Findings

01 Achieves state-of-the-art performance on robot manipulation benchmarks.

02 Reduces steps by 16% while increasing success rate by 9%.

03 Gains 14% in real-world experiments.

Abstract

Standard vision-language-action (VLA) models rely on fitting statistical data priors, limiting their robust understanding of underlying physical dynamics. Reinforcement learning enhances physical grounding through exploration yet typically relies on external reward signals that remain isolated from the agent's internal states. World action models have emerged as a promising paradigm that integrates imagination and control to enable predictive planning. However, they rely on implicit context modeling, lacking explicit mechanisms for self-improvement. To solve these problems, we propose Self-Correcting VLA (SC-VLA), which achieves self-improvement by intrinsically guiding action refinement through sparse imagination. We first design sparse world imagination by integrating auxiliary predictive heads to forecast current task progress and future trajectory trends, thereby constraining the…
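
The abstract's idea of auxiliary predictive heads can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ImaginationHeads`, the feature and output dimensions, the linear heads, the sigmoid on progress, and the `aux_weight` loss weighting are all assumptions chosen for illustration. It only shows the general pattern of a shared feature vector feeding an action head plus two small heads that forecast task progress and a coarse future trajectory trend.

```python
import math
import random


def init_linear(out_dim, in_dim, rng):
    """Create a small random weight matrix and a zero bias for one head."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(params, features):
    """Apply a linear head: weights @ features + bias."""
    weights, bias = params
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


class ImaginationHeads:
    """Action head plus two auxiliary heads on a shared feature vector (hypothetical)."""

    def __init__(self, feat_dim, action_dim, trend_dim, seed=0):
        rng = random.Random(seed)
        self.action_head = init_linear(action_dim, feat_dim, rng)
        self.progress_head = init_linear(1, feat_dim, rng)       # scalar task progress
        self.trend_head = init_linear(trend_dim, feat_dim, rng)  # coarse future motion

    def forward(self, features):
        action = linear(self.action_head, features)
        # Squash progress into (0, 1): near 0 = just started, near 1 = nearly done.
        progress = 1.0 / (1.0 + math.exp(-linear(self.progress_head, features)[0]))
        trend = linear(self.trend_head, features)
        return action, progress, trend


def joint_loss(outputs, targets, aux_weight=0.1):
    """Action loss plus weighted auxiliary losses (the weighting is an assumption)."""
    action, progress, trend = outputs
    action_t, progress_t, trend_t = targets
    aux = (progress - progress_t) ** 2 + mse(trend, trend_t)
    return mse(action, action_t) + aux_weight * aux


heads = ImaginationHeads(feat_dim=8, action_dim=7, trend_dim=3)
features = [0.5] * 8                      # stand-in for the backbone's features
outputs = heads.forward(features)
targets = ([0.0] * 7, 0.25, [0.0] * 3)    # made-up supervision targets
loss = joint_loss(outputs, targets)
```

The predictions are "sparse" here in the sense that the auxiliary heads output a handful of numbers rather than full future frames; how SC-VLA actually defines and uses these signals for action refinement is described in the paper itself.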



Datasets

Kisaragi0/arx5_real_world_datasets (dataset · 628 downloads)


Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning