How Users Understand Robot Foundation Model Performance through Task Success Rates and Beyond

Isaac Sheidlower; Jindan Huang; James Staley; Bingyu Wu; Qicong Chen; Reuben Aronson; Elaine Short

arXiv:2602.03920 · cs.RO · February 5, 2026


TL;DR

This paper investigates how non-expert users interpret robot foundation model (RFM) performance data, emphasizing the role of task success rates, and of information beyond them, in helping users understand robot capabilities and risks.

Contribution

The paper provides empirical evidence on how non-experts interpret RFM evaluation data, highlighting the value of success rates and failure case information for informed decision-making.

Findings

1. Non-experts interpret success rates similarly to experts.

2. Failure case descriptions are highly valued by users.

3. Users desire access to both real evaluation data and predictive estimates.

Abstract

Robot Foundation Models (RFMs) represent a promising approach to developing general-purpose home robots. Given the broad capabilities of RFMs, users will inevitably ask an RFM-based robot to perform tasks that the RFM was not trained or evaluated on. In these cases, it is crucial that users understand the risks associated with attempting novel tasks due to the relatively high cost of failure. Furthermore, an informed user who understands an RFM's capabilities will know what situations and tasks the robot can handle. In this paper, we study how non-roboticists interpret performance information from RFM evaluations. These evaluations typically report task success rate (TSR) as the primary performance metric. While TSR is intuitive to experts, it is necessary to validate whether novices also use this information as intended. Toward this end, we conducted a study in which users saw real…


Taxonomy

Topics: Social Robot Interaction and HRI · Robot Manipulation and Learning · Human-Automation Interaction and Safety