Affordance-Aware Interactive Decision-Making and Execution for Ambiguous Instructions

Hengxuan Xu; Fengbo Lan; Zhixin Zhao; Shengjie Wang; Mengqiao Liu; Jieqian Sun; Yu Cheng; Tao Zhang

arXiv:2602.05273·cs.RO·February 6, 2026

TL;DR

This paper introduces AIDE, a dual-stream framework that helps robots interpret and execute ambiguous instructions by combining interactive exploration with vision-language reasoning. It reaches over 80% task planning success and more than 95% accuracy in continuous execution.

Contribution

AIDE integrates interactive exploration with vision-language reasoning, enabling zero-shot affordance analysis and improved real-time decision-making for ambiguous instructions.

Findings

1. Achieves a task planning success rate of over 80%.
2. Attains more than 95% accuracy in continuous execution.
3. Outperforms existing VLM-based methods across diverse scenarios.

Abstract

Enabling robots to explore and act in unfamiliar environments under ambiguous human instructions by interactively identifying task-relevant objects (e.g., identifying cups or beverages for "I'm thirsty") remains challenging for existing vision-language model (VLM)-based methods. This challenge stems from inefficient reasoning and a lack of environmental interaction, which hinder real-time task planning and execution. To address this, we propose Affordance-Aware Interactive Decision-Making and Execution for Ambiguous Instructions (AIDE), a dual-stream framework that integrates interactive exploration with vision-language reasoning, in which Multi-Stage Inference (MSI) serves as the decision-making stream and Accelerated Decision-Making (ADM) as the execution stream, enabling zero-shot affordance analysis and interpretation of ambiguous instructions. Extensive experiments in simulation and…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Reinforcement Learning in Robotics