TL;DR
This paper presents a robot system that follows unconstrained natural language instructions to pick and place everyday objects. It grounds objects and their spatial relations from images and language, and resolves ambiguous instructions through dialogue.
Contribution
Unlike previous approaches, the system grounds language for object placement as well as picking, so complex placement instructions such as "place it behind the middle red bowl" can be executed on a real robot.
Findings
Effective understanding of unconstrained language instructions
Successful grounding of objects and spatial relations
Demonstrated on a real PR2 robot
Abstract
Controlling robots to perform tasks via natural language is one of the most challenging topics in human-robot interaction. In this work, we present a robot system that follows unconstrained language instructions to pick and place arbitrary objects and effectively resolves ambiguities through dialogues. Our approach infers objects and their relationships from input images and language expressions and can place objects in accordance with the spatial relations expressed by the user. Unlike previous approaches, we consider grounding not only for the picking but also for the placement of everyday objects from language. Specifically, by grounding objects and their spatial relations, we allow specification of complex placement instructions, e.g. "place it behind the middle red bowl". Our results obtained using a real-world PR2 robot demonstrate the effectiveness of our method in understanding…
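As a rough illustration of what grounding a placement instruction such as "place it behind the middle red bowl" involves, the toy Python sketch below resolves a referring expression over hand-written detections and turns a spatial relation into a target position. It is not the paper's method, which infers objects and relations from images and language. The `Detection` type, the `resolve_reference` and `placement_target` helpers, the coordinate convention, and the 0.15 m offset are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output; not a structure from the paper."""
    label: str   # object category, e.g. "bowl"
    color: str   # dominant color, e.g. "red"
    x: float     # left-to-right position in metres (assumed robot frame)
    y: float     # distance from the robot in metres (larger = farther)


# Assumed unit directions for each relation, as seen from the robot.
RELATION_DIRECTIONS = {
    "behind": (0.0, 1.0),
    "in front of": (0.0, -1.0),
    "left of": (-1.0, 0.0),
    "right of": (1.0, 0.0),
}


def resolve_reference(
    detections: Sequence[Detection], label: str, color: str, ordinal: str
) -> Optional[Detection]:
    """Return the object matching e.g. "the middle red bowl".

    Returns None when no unique match exists, which is where a
    dialogue system could ask the user a clarifying question.
    """
    matches = sorted(
        (d for d in detections if d.label == label and d.color == color),
        key=lambda d: d.x,
    )
    if not matches:
        return None
    if ordinal == "left":
        return matches[0]
    if ordinal == "right":
        return matches[-1]
    if ordinal == "middle":
        # "middle" is only well defined for an odd number of candidates.
        if len(matches) % 2 == 0:
            return None
        return matches[len(matches) // 2]
    raise ValueError(f"unsupported ordinal: {ordinal!r}")


def placement_target(
    anchor: Detection, relation: str, offset: float = 0.15
) -> Tuple[float, float]:
    """Shift the anchor position by `offset` metres along the relation."""
    dx, dy = RELATION_DIRECTIONS[relation]
    return (anchor.x + dx * offset, anchor.y + dy * offset)


scene = [
    Detection("bowl", "red", -0.3, 0.6),
    Detection("bowl", "red", 0.0, 0.7),
    Detection("bowl", "red", 0.3, 0.6),
    Detection("cup", "blue", 0.1, 0.5),
]
anchor = resolve_reference(scene, "bowl", "red", "middle")
if anchor is not None:
    print(placement_target(anchor, "behind"))
```

Returning `None` for an ambiguous or missing referent keeps the decision to ask the user separate from the geometry, which loosely mirrors the abstract's point that ambiguities are resolved through dialogue.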
