Learning Instruction-Guided Manipulation Affordance via Large Models for Embodied Robotic Tasks

Dayou Li, Chenkun Zhao, Shuo Yang, Lin Ma, Yibin Li, and Wei Zhang

arXiv:2408.10658 · cs.RO · August 27, 2024

TL;DR

This paper introduces IGANet, a model that predicts instruction-dependent manipulation regions for robots using vision and language priors, enhanced by large-scale data augmentation and large language models, improving generalization and performance.

Contribution

The paper presents a novel instruction-guided affordance prediction model that incorporates large pre-trained vision and language models, along with a data augmentation pipeline, for improved robotic manipulation.

Findings

1. Enhanced manipulation accuracy with generated data
2. Better generalization to unseen objects and instructions
3. Effective use of large-scale vision-language priors

Abstract

We study the task of language instruction-guided robotic manipulation, in which an embodied robot must manipulate target objects according to language instructions. In previous studies, the predicted manipulation region of a target object typically does not change with the specifics of the language instruction, which means that language perception and manipulation prediction are handled separately. In human behavior, however, the manipulation region of the same object changes with the instruction. In this paper, we propose the Instruction-Guided Affordance Net (IGANet), which predicts affordance maps for instruction-guided robotic manipulation tasks by exploiting powerful priors from vision and language encoders pre-trained on large-scale datasets. We develop a Vision-Language-Model (VLM)-based data augmentation pipeline, which can generate a large…
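
The abstract's central idea is that the predicted manipulation region of one object should move when the instruction changes. The toy sketch below illustrates that idea. It rests on hypothetical assumptions, because the abstract does not describe how IGANet fuses vision and language. It assumes that a vision encoder yields per-pixel features and a text encoder yields an instruction embedding in a shared space. It scores each pixel by cosine similarity and then applies a softmax over all pixels. Every name (`affordance_map`, `best_pixel`, the toy features and instruction vectors) is illustrative. None of it is IGANet's architecture or API.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types: a dense visual feature map (H x W x D) and a D-dim text embedding.
FeatureMap = List[List[List[float]]]
Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def affordance_map(features: FeatureMap, text_emb: Vector,
                   temperature: float = 0.1) -> List[List[float]]:
    """Score every pixel against the instruction and normalise with a softmax.

    Assumed scoring rule (not from the paper): cosine similarity / temperature.
    The result is a probability map over pixels that sums to 1.
    """
    scores = [[cosine(f, text_emb) / temperature for f in row] for row in features]
    m = max(s for row in scores for s in row)  # subtract max for numerical stability
    exps = [[math.exp(s - m) for s in row] for row in scores]
    z = sum(e for row in exps for e in row)
    return [[e / z for e in row] for row in exps]


def best_pixel(amap: List[List[float]]) -> Tuple[int, int]:
    """Return (row, col) of the highest-affordance pixel."""
    return max(((r, c) for r in range(len(amap)) for c in range(len(amap[0]))),
               key=lambda rc: amap[rc[0]][rc[1]])


# Toy 2x2 "mug" feature map with hand-made 3-D features (hypothetical):
# (0,0) handle-like, (0,1) rim-like, bottom row body/background.
mug = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
       [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]]
grasp_instr = [0.9, 0.1, 0.0]  # stand-in embedding for "hand me the mug"
pour_instr = [0.1, 0.9, 0.0]   # stand-in embedding for "pour water into the mug"

print(best_pixel(affordance_map(mug, grasp_instr)))  # (0, 0)
print(best_pixel(affordance_map(mug, pour_instr)))   # (0, 1)
```

Only the instruction embedding differs between the two calls. The peak therefore moves from the handle-like pixel to the rim-like pixel because of the language input alone. That is the instruction dependence the abstract says earlier methods lack. A learned model would replace the fixed cosine-plus-softmax head with trained fusion layers.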

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning