When Robots Should Say "I Don't Know": Benchmarking Abstention in Embodied Question Answering

Tao Wu; Chuhao Zhou; Guangyu Zhao; Haozhi Cao; Yewen Pu; Jianfei Yang

arXiv:2512.04597 · cs.CV · December 5, 2025

TL;DR

This paper introduces AbstainEQA, a benchmark for evaluating when embodied question answering agents should abstain from answering, highlighting the importance of abstention for reliable human-agent interaction.

Contribution

The paper presents a new dataset and evaluation framework for abstention in embodied question answering, grounded in an initial study of human queries and in cognitive theories of human communication errors.

Findings

01. Best models only achieve 42.79% abstention recall

02. Humans achieve 91.17% abstention recall

03. Scaling and prompting yield marginal improvements
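
The findings above are reported as abstention recall. The page does not spell out the paper's exact formula, so the following is only a minimal sketch under the standard definition of recall: among the questions that require abstention, the fraction on which the agent actually abstained. The function name `abstention_recall` and the toy labels are hypothetical and are not taken from AbstainEQA.

```python
from typing import Sequence


def abstention_recall(should_abstain: Sequence[bool],
                      did_abstain: Sequence[bool]) -> float:
    """Fraction of abstention-requiring questions on which the agent abstained.

    Assumption: this is the textbook recall definition, not necessarily
    the exact metric used by the benchmark.
    """
    if len(should_abstain) != len(did_abstain):
        raise ValueError("label and prediction sequences must have the same length")
    # Keep only the predictions for questions that truly require abstention.
    relevant = [pred for gold, pred in zip(should_abstain, did_abstain) if gold]
    if not relevant:
        raise ValueError("no abstention-requiring questions in the sample")
    return sum(relevant) / len(relevant)


# Toy data, invented for illustration only (not from the benchmark).
gold = [True, True, True, True, False, False]   # should the agent abstain?
pred = [True, False, True, False, False, True]  # did the agent abstain?
print(f"abstention recall: {abstention_recall(gold, pred):.2%}")
# prints "abstention recall: 50.00%"
```

Under this definition, abstaining on an answerable question (the last toy item) does not lower recall; it would only show up in a precision-style measure, which is one reason recall alone gives a partial picture.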

Abstract

Embodied Question Answering (EQA) requires an agent to interpret language, perceive its environment, and navigate within 3D scenes to produce responses. Existing EQA benchmarks assume that every question must be answered, but embodied agents should know when they do not have sufficient information to answer. In this work, we focus on a minimal requirement for EQA agents, abstention: knowing when to withhold an answer. From an initial study of 500 human queries, we find that 32.4% contain missing or underspecified context. Drawing on this initial study and cognitive theories of human communication errors, we derive five representative categories requiring abstention: actionability limitation, referential underspecification, preference dependence, information unavailability, and false presupposition. We augment OpenEQA by having annotators transform well-posed questions into ambiguous…

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Topic Modeling