StableVLA: Towards Robust Vision-Language-Action Models without Extra Data

Yiyang Fu; Chubin Zhang; Shukai Gong; Yufan Deng; Kaiwei Sun; Qiyang Min; Qibin Hou; Yansong Tang; Jianan Wang; Daquan Zhou

arXiv:2605.18287 · cs.CV · May 19, 2026

TL;DR

This paper introduces StableVLA, a robust vision-language-action model enhanced with a lightweight IB-Adapter that filters visual noise, significantly improving performance under unseen disturbances without extra data.

Contribution

The authors propose the IB-Adapter, a novel information-theoretic module that enhances VLA model robustness against unseen visual disturbances without additional data or augmentation.
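For orientation, the classical information bottleneck objective that the term "information-theoretic" points to can be written as below. This is the textbook form, not necessarily the exact loss used by the IB-Adapter; the reading of X as the visual input, Z as the filtered representation, Y as the action target, and the trade-off weight β are assumptions made here for illustration.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y), \qquad \beta > 0
```

Minimizing I(X; Z) compresses away input detail (including nuisance noise), while the β I(Z; Y) term keeps whatever Z needs to predict Y.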

Findings

01. IB-Adapter improves over the baseline by an average of 30% while adding fewer than 10M parameters, without extra data or augmentation.

02. StableVLA with a smaller backbone achieves robustness comparable to larger models.

03. The approach maintains accuracy on long-horizon tasks under visual corruptions.

Abstract

It is infeasible to encompass all possible disturbances within the training dataset. This raises a critical question regarding the robustness of Vision-Language-Action (VLA) models when encountering unseen real-world visual disturbances, particularly under imperfect visual conditions. In this work, we conduct a systematic study based on recent state-of-the-art VLA models and reveal a significant performance drop when visual disturbances absent from the training data are introduced. To mitigate this issue, we propose a lightweight adapter module grounded in information theory, termed the Information Bottleneck Adapter (IB-Adapter), which selectively filters potential noise from visual inputs. Without requiring any extra data or augmentation strategies, IB-Adapter consistently improves over the baseline by an average of 30%, while adding fewer than 10M parameters, demonstrating notable…
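The abstract does not spell out how the IB-Adapter is built, so the following is only a minimal sketch of one common way to realize an information bottleneck on a feature vector: a variational bottleneck with a Gaussian latent and a KL penalty toward a standard normal prior. All names here (`ib_bottleneck`, `total_loss`, the weight matrices, and `beta`) are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math
import random


def linear(weights, bias, x):
    """Apply a dense layer: weights is a list of rows, x is a flat feature list."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def ib_bottleneck(x, w_mu, b_mu, w_logvar, b_logvar, rng):
    """Hypothetical variational bottleneck (an assumption, not the paper's module).

    Maps features x to a Gaussian latent, samples z with the reparameterization
    trick, and returns z together with KL(N(mu, sigma^2) || N(0, I)).
    """
    mu = linear(w_mu, b_mu, x)
    logvar = linear(w_logvar, b_logvar, x)
    # z = mu + sigma * eps, with eps drawn from a standard normal
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    # Closed-form KL divergence between a diagonal Gaussian and N(0, I)
    kl = 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))
    return z, kl


def total_loss(task_loss, kl, beta=1e-3):
    """Task loss plus a beta-weighted compression penalty (beta is an assumed value)."""
    return task_loss + beta * kl


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, 2.0]                      # stand-in for a visual feature vector
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    z, kl = ib_bottleneck(x, identity, [0.0, 0.0], zeros, [0.0, 0.0], rng)
    print("latent:", z)
    print("KL penalty:", kl)
    print("loss:", total_loss(task_loss=1.0, kl=kl, beta=0.1))
```

In a sketch like this, a larger `beta` pushes the latent toward the prior and discards more input detail, which is the mechanism by which a bottleneck can suppress nuisance noise; how the actual IB-Adapter balances this trade-off is not stated in the abstract.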
