Talk-to-Resolve: Combining scene understanding and spatial dialogue to resolve granular task ambiguity for a collocated robot
Pradip Pramanick, Chayan Sarkar, Snehasis Banerjee, Brojeshwar Bhowmick

TL;DR
Talk-to-Resolve is a system that enables collocated robots to use scene understanding and spatial dialogue to resolve task ambiguities, improving interaction naturalness and task success rate.
Contribution
The paper introduces a novel dialogue-based approach combining scene understanding and language to resolve task stalemates in robot instruction execution.
Findings
82% accuracy in identifying and resolving stalemates
The system's questions are rated as more natural (4.02/5) than those of the state of the art
The system uses dense scene captions and the instruction for decision-making
Abstract
The utility of collocating robots largely depends on the easy and intuitive interaction mechanism with the human. If a robot accepts task instruction in natural language, first, it has to understand the user's intention by decoding the instruction. However, while executing the task, the robot may face unforeseeable circumstances due to the variations in the observed scene and therefore requires further user intervention. In this article, we present a system called Talk-to-Resolve (TTR) that enables a robot to initiate a coherent dialogue exchange with the instructor by observing the scene visually to resolve the impasse. Through dialogue, it either finds a cue to move forward in the original plan, an acceptable alternative to the original plan, or affirmation to abort the task altogether. To realize the possible stalemate, we utilize the dense captions of the observed scene and the…
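The flow described in the abstract (compare the instruction with captions of the observed scene, ask a question when the two do not line up, then continue, switch to an alternative, or abort) can be illustrated with a small sketch. This is not the TTR implementation: the word-matching rule, the question templates, the stop words, and the names `Outcome`, `find_matches`, `question_for`, and `outcome_for` are all hypothetical and chosen only for illustration.

```python
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    """The three dialogue outcomes named in the abstract."""
    CONTINUE = "continue the original plan"
    ALTERNATIVE = "follow an alternative plan"
    ABORT = "abort the task"


def find_matches(target: str, captions: List[str]) -> List[str]:
    # Toy rule (assumption): a caption matches if it contains the target word.
    return [c for c in captions if target.lower() in c.lower().split()]


def question_for(target: str, captions: List[str]) -> Optional[str]:
    """Return a question for the instructor, or None if there is no stalemate."""
    matches = find_matches(target, captions)
    if len(matches) == 1:
        return None  # exactly one candidate: the plan can proceed
    if not matches:
        return f"I cannot see a {target}. Should I use something else or stop?"
    listed = "; ".join(matches)
    return f"I see {len(matches)} candidates for '{target}': {listed}. Which one do you mean?"


def outcome_for(matches: List[str], reply: str) -> Outcome:
    """Map the instructor's reply to one of the three outcomes (toy rules)."""
    reply = reply.strip().lower()
    if reply in {"stop", "abort", "cancel"}:  # hypothetical stop words
        return Outcome.ABORT
    if any(reply == m.lower() for m in matches):
        return Outcome.CONTINUE  # the reply picks one observed candidate
    return Outcome.ALTERNATIVE  # anything else is treated as a new request


captions = ["red cup on the table", "blue cup near the sink", "laptop on the desk"]
cup_matches = find_matches("cup", captions)
print(question_for("cup", captions))
print(outcome_for(cup_matches, "blue cup near the sink").value)
```

A real system would replace the word match with a learned grounding of the instruction in the dense captions and would generate the question from the scene rather than from a fixed template; the sketch only shows how the three outcomes partition the instructor's possible replies.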
