Evaluating Uncertainty and Quality of Visual Language Action-enabled Robots

Pablo Valle; Chengjie Lu; Shaukat Ali; Aitor Arrieta

arXiv:2507.17049·cs.SE·August 4, 2025

Open Access

TL;DR

This paper introduces new uncertainty and quality metrics for Visual Language Action models in robotics, demonstrating their effectiveness in correlating with human judgments and improving evaluation beyond success rates.

Contribution

It proposes eight uncertainty and five quality metrics tailored for VLA models and validates their correlation with expert assessments through extensive empirical analysis.
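As a rough illustration of what checking a metric against expert assessments can look like, the sketch below computes a Spearman rank correlation between one automated metric and a set of human ratings. The choice of Spearman correlation, the function names, and the sample numbers are all assumptions made for illustration; this page does not state which statistic or data the authors used.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


# Hypothetical data: one uncertainty score and one expert rating per execution.
metric_scores = [0.10, 0.40, 0.35, 0.80]
human_ratings = [5, 3, 4, 1]
rho = spearman(metric_scores, human_ratings)
```

With these made-up values, a coefficient near -1 would mean that executions with higher uncertainty scores were consistently rated lower by the experts, while a value near 0 would mean the metric carries little information about perceived quality.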

Findings

01

Several metrics strongly correlate with human judgments

02

Metrics can distinguish execution quality levels

03

Current success rate evaluations are insufficient

Abstract

Visual Language Action (VLA) models are a multi-modal class of Artificial Intelligence (AI) systems that integrate visual perception, natural language understanding, and action planning to enable agents to interpret their environment, comprehend instructions, and perform embodied tasks autonomously. Recently, significant progress has been made to advance this field. These kinds of models are typically evaluated through task success rates, which fail to capture the quality of task execution and the model's confidence in its decisions. In this paper, we propose eight uncertainty metrics and five quality metrics specifically designed for VLA models for robotic manipulation tasks. We assess their effectiveness through a large-scale empirical study involving 908 successful task executions from three state-of-the-art VLA models across four representative robotic manipulation tasks. Human…

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robot Manipulation and Learning