Metamorphic Testing of Vision-Language Action-Enabled Robots
Pablo Valle, Sergio Segura, Shaukat Ali, Aitor Arrieta

TL;DR
This paper applies metamorphic testing (MT) to Vision-Language-Action (VLA) robotic models to automatically detect failures and evaluate performance without relying on complex test oracles, demonstrating broad applicability across models and tasks.
Contribution
It introduces a set of metamorphic relations (MRs) for testing VLA models, addressing the test oracle problem by enabling failure detection without explicit correctness criteria.
Findings
MT effectively detects diverse failures in VLA models
Proposed MRs are generalizable across models and tasks
Approach reduces reliance on complex symbolic oracles
Abstract
Vision-Language-Action (VLA) models are multimodal robotic task controllers that, given an instruction and visual inputs, produce a sequence of low-level control actions (or motor commands) enabling a robot to execute the requested task in the physical environment. These systems face the test oracle problem from multiple perspectives. On the one hand, a test oracle must be defined for each instruction prompt, which is a complex and non-generalizable approach. On the other hand, current state-of-the-art oracles typically capture symbolic representations of the world (e.g., robot and object states), enabling the correctness evaluation of a task, but fail to assess other critical aspects, such as the quality with which VLA-enabled robots perform a task. In this paper, we explore whether Metamorphic Testing (MT) can alleviate the test oracle problem in this context. To do so, we propose two…
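The abstract is truncated before it lists the proposed metamorphic relations, so the following is not the authors' method. It is only a minimal sketch of the general MT idea under stated assumptions. Instead of checking one execution against a task-specific oracle, it runs a source instruction and a follow-up instruction and checks a relation between the two executions. The relation used here (a paraphrased instruction should reach the same final position with a comparable path length) is a hypothetical example, and `toy_policy`, `brittle_policy`, `path_length`, and `check_paraphrase_mr` are hypothetical names for a toy 2-D stand-in, not a real VLA model or API.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]
Policy = Callable[[str, Dict[str, Point]], Trajectory]


def toy_policy(instruction: str, scene: Dict[str, Point], steps: int = 10) -> Trajectory:
    """Hypothetical stand-in for a VLA model: moves a 2-D end effector in a
    straight line from the origin to the first scene object named in the instruction."""
    words = instruction.lower().replace(".", " ").split()
    target = next((scene[w] for w in words if w in scene), (0.0, 0.0))
    return [(target[0] * i / steps, target[1] * i / steps) for i in range(steps + 1)]


def brittle_policy(instruction: str, scene: Dict[str, Point]) -> Trajectory:
    """Hypothetical faulty model: only reacts to instructions phrased as 'pick up ...'."""
    if not instruction.lower().startswith("pick up"):
        return [(0.0, 0.0)]  # stays still: a failure no symbolic oracle was written for
    return toy_policy(instruction, scene)


def path_length(traj: Trajectory) -> float:
    """Simple quality proxy: total distance travelled by the end effector."""
    return sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))


def check_paraphrase_mr(policy: Policy, source: str, follow_up: str,
                        scene: Dict[str, Point], tol: float = 0.05) -> bool:
    """Hypothetical MR: semantically equivalent instructions should yield the same
    outcome (final position) and comparable quality (path length). Only the two
    executions are compared; no ground-truth oracle for the task is needed."""
    src_traj = policy(source, scene)
    fu_traj = policy(follow_up, scene)
    same_outcome = math.dist(src_traj[-1], fu_traj[-1]) <= tol
    l_src, l_fu = path_length(src_traj), path_length(fu_traj)
    similar_quality = abs(l_src - l_fu) <= tol * max(l_src, l_fu, 1.0)
    return same_outcome and similar_quality
```

With `scene = {"cup": (0.4, 0.2), "bowl": (-0.3, 0.5)}`, the relation holds for `toy_policy` on "pick up the cup" versus "grab the cup." but is violated for `brittle_policy`, which flags a failure without ever specifying where the cup should end up. Comparing path lengths alongside final positions hints at how MRs can also probe execution quality, which the abstract notes that symbolic oracles miss.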
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning
