VLATest: Testing and Evaluating Vision-Language-Action Models for Robotic Manipulation

Zhijie Wang; Zhehua Zhou; Jiayang Song; Yuheng Huang; Zhan Shu; Lei Ma

arXiv:2409.12894 · cs.SE · May 13, 2025

TL;DR

This paper introduces VLATest, a fuzzing framework for testing vision-language-action models in robotic manipulation, revealing their current lack of robustness across diverse scenarios and conditions.

Contribution

We developed VLATest to generate diverse testing scenes and conducted an empirical study on seven VLA models, exposing their robustness limitations.

Findings

01 VLA models lack robustness in diverse scenarios

02 Performance drops with confounding objects and lighting changes

03 Unseen objects and instruction mutations significantly affect accuracy
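
The factors named in these findings (confounding objects, lighting changes) can be pictured as dimensions of a randomly sampled test scene. The sketch below is only an illustration of that idea and is not VLATest's actual implementation or API: the object pool, the `Scene` fields, the parameter ranges, and the function names are all assumptions made for this example.

```python
import random
from dataclasses import dataclass

# Hypothetical object pool; not taken from VLATest or any benchmark.
OBJECT_POOL = ["apple", "sponge", "coke can", "spoon", "carrot", "orange"]


@dataclass
class Scene:
    """One randomly sampled manipulation scene (illustrative fields only)."""
    target: str            # object the instruction refers to
    confounders: list      # distractor objects placed in the scene
    lighting_scale: float  # multiplier on default light intensity (1.0 = default)
    instruction: str       # natural-language task prompt


def sample_scene(rng, max_confounders=3, lighting_range=(0.5, 1.5)):
    """Sample a single scene; all ranges are assumed, not from the paper."""
    target = rng.choice(OBJECT_POOL)
    others = [o for o in OBJECT_POOL if o != target]
    # Draw between 0 and max_confounders distinct distractors.
    confounders = rng.sample(others, rng.randint(0, max_confounders))
    lighting_scale = rng.uniform(*lighting_range)
    return Scene(target, confounders, lighting_scale, f"pick up the {target}")


def generate_scenes(n, seed=0):
    """Generate n scenes reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]
```

Each sampled `Scene` would then be rendered in a simulator and passed to the model under test; seeding the generator keeps any failing scene reproducible.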

Abstract

The rapid advancement of generative AI and multi-modal foundation models has shown significant potential in advancing robotic manipulation. Vision-language-action (VLA) models, in particular, have emerged as a promising approach for visuomotor control by leveraging large-scale vision-language data and robot demonstrations. However, current VLA models are typically evaluated using a limited set of hand-crafted scenes, leaving their general performance and robustness in diverse scenarios largely unexplored. To address this gap, we present VLATest, a fuzzing framework designed to generate robotic manipulation scenes for testing VLA models. Based on VLATest, we conducted an empirical study to assess the performance of seven representative VLA models. Our study results revealed that current VLA models lack the robustness necessary for practical deployment. Additionally, we investigated the…

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Robotics and Automated Systems