VGAS: Value-Guided Action-Chunk Selection for Few-Shot Vision-Language-Action Adaptation

Changhua Xu; Jie Lu; Junyu Xuan; En Yu

arXiv:2602.07399·cs.AI·February 10, 2026



TL;DR

VGAS is a value-guided, inference-time action-chunk selection framework for few-shot vision-language-action (VLA) adaptation. It targets geometric precision and robustness when demonstrations are scarce, using a geometric critic and explicit regularization.

Contribution

The paper proposes VGAS, an inference-time best-of-N selection method that pairs a fine-tuned VLA proposal generator with a geometric critic and explicit regularization to improve few-shot VLA adaptation.

Findings

01. VGAS improves success rates in limited demonstration settings.

02. VGAS enhances robustness against distribution shifts.

03. The geometric critic and regularization stabilize action ranking.

Abstract

Vision-Language-Action (VLA) models bridge multimodal reasoning with physical control, but adapting them to new tasks with scarce demonstrations remains unreliable. While fine-tuned VLA policies often produce semantically plausible trajectories, failures often arise from unresolved geometric ambiguities, where near-miss action candidates lead to divergent execution outcomes under limited supervision. We study few-shot VLA adaptation from a generation-selection perspective and propose a novel framework VGAS (Value-Guided Action-chunk Selection). It performs inference-time best-of-N selection to identify action chunks that are both semantically faithful and geometrically precise. Specifically, VGAS employs a finetuned VLA as a high-recall proposal generator and introduces the Q-Chunk-Former, a geometrically…
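The best-of-N selection step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `select_best_chunk`, `toy_critic`, and the nested-list chunk layout are hypothetical names chosen here, and the toy critic (negative squared action magnitude) only stands in for a learned value model such as the Q-Chunk-Former, whose design is not given in this excerpt.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical layout: a chunk is T timesteps, each with D action dimensions.
Chunk = List[List[float]]


def select_best_chunk(
    candidates: Sequence[Chunk],
    score_fn: Callable[[Chunk], float],
) -> Tuple[int, List[float]]:
    """Score every candidate chunk and return (index of the best, all scores)."""
    if not candidates:
        raise ValueError("need at least one candidate chunk")
    scores = [score_fn(chunk) for chunk in candidates]
    # Argmax over scores; ties resolve to the earliest candidate.
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    return best_idx, scores


def toy_critic(chunk: Chunk) -> float:
    """Stand-in critic (assumption): prefers chunks closest to the zero action."""
    return -sum(a * a for step in chunk for a in step)


# Three deterministic candidates, each 4 timesteps x 2 action dims,
# differing only by a constant offset from the zero action.
candidates = [[[off, off] for _ in range(4)] for off in (0.5, 0.1, 1.0)]

best_idx, scores = select_best_chunk(candidates, toy_critic)
best_chunk = candidates[best_idx]
```

In a real pipeline the candidates would presumably be sampled from the fine-tuned VLA and scored by the learned critic; the selection logic itself reduces to an argmax over the critic's scores.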


Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis