$\Delta$VLA: Prior-Guided Vision-Language-Action Models via World Knowledge Variation

Yijie Zhu; Jie He; Rui Shao; Kaishen Yuan; Tao Tan; Xiaochen Yuan; Zitong Yu

arXiv:2603.08361 · cs.CV · March 10, 2026


TL;DR

The paper introduces $\Delta$VLA, a vision-language-action (VLA) framework for robotic manipulation that models world-knowledge variations relative to a prior, with the aim of improving reasoning and efficiency in action generation.

Contribution

It proposes a prior-guided approach, with new modules for extracting, encoding, and disentangling world-knowledge variations, that improves on existing methods that forecast absolute future states.

Findings

01

Achieves state-of-the-art results on robotic manipulation benchmarks.

02

Demonstrates improved efficiency and reasoning in real-world tasks.

03

Outperforms prior models in accuracy and robustness.

Abstract

Recent vision-language-action (VLA) models have significantly advanced robotic manipulation by unifying perception, reasoning, and control. To achieve such integration, recent studies adopt a predictive paradigm that models future visual states or world knowledge to guide action generation. However, these models emphasize forecasting outcomes rather than reasoning about the underlying process of change, which is essential for determining how to act. To address this, we propose $\Delta$VLA, a prior-guided framework that models world-knowledge variations relative to an explicit current-world knowledge prior for action generation, rather than regressing absolute future world states. Specifically, 1) to construct the current world knowledge prior, we propose the Prior-Guided World Knowledge Extractor (PWKE). It extracts manipulable regions, spatial relations, and semantic cues from the…
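The central distinction in the abstract, predicting a variation relative to the current world-knowledge prior instead of regressing the absolute future state, can be illustrated with a toy sketch. Everything below is a hypothetical illustration, not the authors' method: world knowledge is assumed to be a flat list of floats, and the names `delta_target`, `reconstruct_future`, `variation_loss`, and `absolute_loss`, as well as the mean-squared-error loss, are illustrative choices. The abstract does not specify how $\Delta$VLA represents world knowledge or which loss it optimizes.

```python
from typing import List

# Hypothetical representation: world knowledge as a flat feature vector.
Vector = List[float]


def delta_target(prior: Vector, future: Vector) -> Vector:
    """Variation target: how the future state differs from the current prior."""
    if len(prior) != len(future):
        raise ValueError("prior and future must have the same length")
    return [f - p for p, f in zip(prior, future)]


def reconstruct_future(prior: Vector, predicted_delta: Vector) -> Vector:
    """Recover an absolute future estimate by adding the predicted variation to the prior."""
    return [p + d for p, d in zip(prior, predicted_delta)]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors (illustrative loss choice)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def absolute_loss(predicted_future: Vector, future: Vector) -> float:
    """Forecasting formulation: the model outputs the absolute future state."""
    return mse(predicted_future, future)


def variation_loss(predicted_delta: Vector, prior: Vector, future: Vector) -> float:
    """Variation formulation: the model outputs only the change relative to the prior."""
    return mse(predicted_delta, delta_target(prior, future))


if __name__ == "__main__":
    prior = [1.0, 2.0, 3.0]   # hypothetical current world-knowledge features
    future = [1.0, 2.5, 3.0]  # only the second feature changes
    print(delta_target(prior, future))
```

With this toy setup, the variation target for `prior = [1.0, 2.0, 3.0]` and `future = [1.0, 2.5, 3.0]` is `[0.0, 0.5, 0.0]`: unchanged features contribute zeros, so the output concentrates on what changes. Under mean-squared error, the two losses take the same value when the absolute prediction equals `prior + delta`. In this sketch, the difference lies in what the model is asked to output, and the explicit prior is what makes the change the quantity being predicted.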


Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning