Composing Pick-and-Place Tasks By Grounding Language

Oier Mees; Wolfram Burgard

arXiv:2102.08094·cs.RO·February 17, 2021

TL;DR

This paper introduces a robot system that understands and executes complex pick-and-place tasks from natural language instructions, including instructions that express spatial relations, by grounding objects and their relationships in images and language.

Contribution

It is the first work to ground both object picking and object placement from language, enabling complex, language-guided manipulation tasks on a real robot.

Findings

1. Effective understanding of unconstrained language instructions
2. Successful grounding of objects and spatial relations
3. Demonstrated on a real PR2 robot

Abstract

Controlling robots to perform tasks via natural language is one of the most challenging topics in human-robot interaction. In this work, we present a robot system that follows unconstrained language instructions to pick and place arbitrary objects and effectively resolves ambiguities through dialogues. Our approach infers objects and their relationships from input images and language expressions and can place objects in accordance with the spatial relations expressed by the user. Unlike previous approaches, we consider grounding not only for the picking but also for the placement of everyday objects from language. Specifically, by grounding objects and their spatial relations, we allow specification of complex placement instructions, e.g. "place it behind the middle red bowl". Our results obtained using a real-world PR2 robot demonstrate the effectiveness of our method in understanding…
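The abstract describes three steps: grounding a referring expression such as "the middle red bowl" among the objects in the scene, asking the user for clarification when that expression is ambiguous, and turning a spatial relation such as "behind" into a placement target. The toy sketch below illustrates only that flow. It is a hand-written, rule-based stand-in and not the authors' method, which infers objects and their relationships from input images and language expressions. Every name in it (`DetectedObject`, `ground_referent`, `resolve_placement`, `RELATION_OFFSETS`), the fixed 0.25 m offsets, and the left-to-right ordering convention are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DetectedObject:
    label: str   # object category, e.g. "bowl" (hypothetical detector output)
    color: str   # color attribute, e.g. "red" (hypothetical)
    x: float     # lateral position in the robot frame (m); larger = further right
    y: float     # distance from the robot (m); larger = further away


# Hypothetical fixed offsets (m) per spatial relation, seen from the robot.
RELATION_OFFSETS = {
    "behind": (0.0, 0.25),
    "in front of": (0.0, -0.25),
    "left of": (-0.25, 0.0),
    "right of": (0.25, 0.0),
}


def ground_referent(objects: List[DetectedObject], label: str,
                    color: Optional[str] = None,
                    ordinal: Optional[str] = None) -> List[DetectedObject]:
    """Return every object consistent with the referring expression."""
    matches = [o for o in objects
               if o.label == label and (color is None or o.color == color)]
    if ordinal is None or not matches:
        return matches
    ordered = sorted(matches, key=lambda o: o.x)  # left-to-right order
    if ordinal == "left":
        return ordered[:1]
    if ordinal == "right":
        return ordered[-1:]
    if ordinal == "middle":
        # A unique middle object exists only for an odd number of matches;
        # otherwise all matches are returned so the caller sees the ambiguity.
        if len(ordered) % 2 == 1:
            return [ordered[len(ordered) // 2]]
        return ordered
    raise ValueError(f"unsupported ordinal: {ordinal}")


def resolve_placement(objects: List[DetectedObject], label: str, relation: str,
                      color: Optional[str] = None, ordinal: Optional[str] = None
                      ) -> Tuple[Optional[Tuple[float, float]], Optional[str]]:
    """Return (target_xy, None) on a unique grounding, else (None, question)."""
    candidates = ground_referent(objects, label, color, ordinal)
    if len(candidates) != 1:
        # Ambiguous or empty grounding: ask the user instead of guessing.
        return None, f"I see {len(candidates)} matching objects. Which one do you mean?"
    ref = candidates[0]
    dx, dy = RELATION_OFFSETS[relation]
    return (ref.x + dx, ref.y + dy), None


scene = [
    DetectedObject("bowl", "red", -0.3, 0.5),
    DetectedObject("bowl", "red", 0.0, 0.5),
    DetectedObject("bowl", "red", 0.3, 0.5),
    DetectedObject("bowl", "blue", 0.6, 0.5),
]

# "place it behind the middle red bowl"
target, question = resolve_placement(scene, "bowl", "behind", color="red", ordinal="middle")
print(target, question)  # prints: (0.0, 0.75) None

# "place it behind the red bowl" matches three bowls, so a question is returned.
print(resolve_placement(scene, "bowl", "behind", color="red"))
```

Returning a clarification question rather than picking an arbitrary candidate mirrors, in a very simplified way, the abstract's point that ambiguities are resolved through dialogue.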