Evaluating Uncertainty and Quality of Visual Language Action-enabled Robots
Pablo Valle, Chengjie Lu, Shaukat Ali, Aitor Arrieta

TL;DR
This paper introduces uncertainty and quality metrics for Visual Language Action (VLA) models in robotics and shows that several of them correlate with human judgments, providing an evaluation signal beyond task success rate.
Contribution
It proposes eight uncertainty metrics and five quality metrics tailored to VLA models for robotic manipulation, and validates their correlation with expert assessments in an empirical study of 908 successful task executions.
Findings
Several metrics strongly correlate with human judgments
Metrics can distinguish execution quality levels
Success rate alone is an insufficient evaluation: it captures neither execution quality nor the model's confidence in its decisions
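As a rough illustration of what "correlating a metric with human judgments" can look like, the sketch below computes a rank correlation between per-execution metric scores and expert ratings. This summary does not say which correlation statistic the authors use, so Spearman's coefficient is an assumption here, and the function names and sample numbers are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of entries tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: one metric score and one expert rating per execution.
metric_scores = [0.10, 0.40, 0.35, 0.80, 0.60]
expert_ratings = [1, 3, 2, 5, 4]
rho = spearman(metric_scores, expert_ratings)
```

A coefficient near +1 or -1 would indicate that the metric orders executions much as the experts do; a value near 0 would suggest it carries little of the information the experts rely on.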
Abstract
Visual Language Action (VLA) models are a multi-modal class of Artificial Intelligence (AI) systems that integrate visual perception, natural language understanding, and action planning to enable agents to interpret their environment, comprehend instructions, and perform embodied tasks autonomously. Recently, significant progress has been made in advancing this field. Such models are typically evaluated through task success rates, which fail to capture the quality of task execution and the model's confidence in its decisions. In this paper, we propose eight uncertainty metrics and five quality metrics specifically designed for VLA models for robotic manipulation tasks. We assess their effectiveness through a large-scale empirical study involving 908 successful task executions from three state-of-the-art VLA models across four representative robotic manipulation tasks. Human…
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robot Manipulation and Learning
