TL;DR
This paper introduces 3D Instruction Ambiguity Detection, a new task in which a model decides whether a command is ambiguous within a given 3D scene. The task is supported by the Ambi3D benchmark and by AmbiVer, a two-stage framework for ambiguity detection.
Contribution
The paper defines the new task of 3D Instruction Ambiguity Detection, creates the Ambi3D benchmark, and proposes AmbiVer, a framework that enhances ambiguity judgment using multi-view visual evidence.
Findings
State-of-the-art 3D LLMs struggle with ambiguity detection.
AmbiVer improves ambiguity detection accuracy.
The task reveals limitations of current embodied AI models.
Abstract
In safety-critical domains, linguistic ambiguity can have severe consequences; a vague command like "Pass me the vial" in a surgical setting could lead to catastrophic errors. Yet, most embodied AI research overlooks this, assuming instructions are clear and focusing on execution rather than confirmation. To address this critical safety gap, we are the first to define 3D Instruction Ambiguity Detection, a fundamental new task where a model must determine if a command has a single, unambiguous meaning within a given 3D scene. To support this research, we build Ambi3D, a large-scale benchmark for this task, featuring over 700 diverse 3D scenes and around 22k instructions. Our analysis reveals a surprising limitation: state-of-the-art 3D Large Language Models (LLMs) struggle to reliably determine if an instruction is ambiguous. To address this challenge, we propose AmbiVer, a two-stage…
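To make the task definition concrete, here is a toy sketch of the decision a model has to make: does the instruction's reference resolve to exactly one object in the scene? It is an illustration only, not AmbiVer and not the Ambi3D data format. The `SceneObject` class, the attribute sets, and the `is_ambiguous` helper are all hypothetical, and the sketch assumes the referring expression has already been parsed into a category plus required attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    # Hypothetical scene representation: a category label plus a set of attributes.
    category: str
    attributes: frozenset = frozenset()


def is_ambiguous(scene, category, required_attributes=frozenset()):
    """Return True when the reference does not resolve to exactly one object.

    Assumption of this sketch: a reference with zero matches is also treated
    as lacking a single meaning, so it is reported as ambiguous too.
    """
    matches = [
        obj for obj in scene
        if obj.category == category and required_attributes <= obj.attributes
    ]
    return len(matches) != 1


# A toy surgical tray with two vials and one scalpel.
scene = [
    SceneObject("vial", frozenset({"red"})),
    SceneObject("vial", frozenset({"blue"})),
    SceneObject("scalpel"),
]

# "Pass me the vial": two candidates, so the command is ambiguous.
vague = is_ambiguous(scene, "vial")

# "Pass me the blue vial": one candidate, so the command is clear.
specific = is_ambiguous(scene, "vial", frozenset({"blue"}))
```

The real task is harder than this symbolic check because a model has to recover the candidate objects and their attributes from 3D scene data rather than from a clean object list.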
