Boosting Vision-Language-Action Finetuning with Feasible Action Neighborhood Prior

Haochen Niu; Kanyu Zhang; Shuyu Yin; Qinghai Guo; Peilin Liu; Fei Wen

arXiv:2604.01570·cs.RO·April 3, 2026


TL;DR

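This paper introduces a regularizer for finetuning vision-language-action (VLA) models in robotic manipulation. The regularizer is guided by the feasible action neighborhood (FAN) property, the observation that each state admits a neighborhood of near-equivalent actions rather than a single correct one, and it improves both sample efficiency and generalization.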

Contribution

The paper proposes a regularizer that shapes the model's output distribution to match the geometry of feasible action neighborhoods, using a Gaussian prior that favors locally smooth, unimodal predictions during VLA finetuning.

Findings

01

Significant improvement in sample efficiency in both reinforced finetuning (RFT) and supervised finetuning (SFT).

02

Higher success rates in both in-distribution and out-of-distribution (OOD) scenarios.

03

Effective exploitation of the feasible action neighborhood property.

Abstract

In real-world robotic manipulation, states typically admit a neighborhood of near-equivalent actions. That is, for each state, there exists a feasible action neighborhood (FAN) rather than a single correct action, within which motions yield indistinguishable progress. However, prevalent VLA training methodologies are directly inherited from linguistic settings and do not exploit the FAN property, thus leading to poor generalization and low sample efficiency. To address this limitation, we introduce a FAN-guided regularizer that shapes the model's output distribution to align with the geometry of FAN. Concretely, we introduce a Gaussian prior that promotes locally smooth and unimodal predictions around the preferred direction and magnitude. In extensive experiments across both reinforced finetuning (RFT) and supervised finetuning (SFT), our method achieves significant improvement in…
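
To make the idea of a Gaussian prior over actions concrete, here is a minimal sketch of one way such a regularizer could look. It is not the paper's formulation, which the abstract does not spell out. It assumes that one action dimension is discretized into bins, that the model emits logits over those bins, and that the penalty is the KL divergence from a discretized Gaussian centered on a preferred bin. The names `gaussian_prior`, `fan_regularizer`, `center`, and `sigma` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_prior(num_bins, center, sigma):
    """Discretized Gaussian over bin indices, normalized to sum to 1.

    Assumption: neighboring bins correspond to neighboring actions,
    so mass near `center` stands in for a feasible action neighborhood.
    """
    weights = [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(num_bins)]
    total = sum(weights)
    return [w / total for w in weights]


def fan_regularizer(logits, center, sigma, eps=1e-12):
    """KL(prior || model) for a single action dimension.

    Assumption: this KL form is an illustrative choice, not the paper's.
    It is small when the predicted distribution is smooth and unimodal
    around `center`, and large when it is overly sharp or misplaced.
    """
    probs = softmax(logits)
    prior = gaussian_prior(len(logits), center, sigma)
    return sum(q * (math.log(q + eps) - math.log(p + eps)) for q, p in zip(prior, probs))


if __name__ == "__main__":
    prior = gaussian_prior(11, center=5, sigma=1.0)
    matched = [math.log(q) for q in prior]                # logits that reproduce the prior
    sharp = [20.0 if i == 5 else 0.0 for i in range(11)]  # near one-hot at the center
    print(fan_regularizer(matched, 5, 1.0))
    print(fan_regularizer(sharp, 5, 1.0))
```

Under these assumptions, the penalty would be added to the usual finetuning loss with a weighting coefficient. A near one-hot prediction is penalized even when its peak is at the right bin, which is the sense in which the prior discourages treating one action as the only correct one.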
