Learning Instruction-Guided Manipulation Affordance via Large Models for Embodied Robotic Tasks
Dayou Li, Chenkun Zhao, Shuo Yang, Lin Ma, Yibin Li, and Wei Zhang

TL;DR
This paper introduces IGANet, a model that predicts instruction-dependent manipulation regions for robots using vision and language priors, enhanced by large-scale data augmentation and large language models, improving generalization and performance.
Contribution
The paper presents a novel instruction-guided affordance prediction model that incorporates large pre-trained vision and language models, along with a data augmentation pipeline, for improved robotic manipulation.
Findings
Enhanced manipulation accuracy with generated data
Better generalization to unseen objects and instructions
Effective use of large-scale vision-language priors
Abstract
We study the task of language instruction-guided robotic manipulation, in which an embodied robot is supposed to manipulate target objects based on language instructions. In previous studies, the predicted manipulation regions of the target object typically do not change with the specification given in the language instructions, which means that language perception and manipulation prediction are separate. However, in human behavioral patterns, the manipulation regions of the same object change for different language instructions. In this paper, we propose Instruction-Guided Affordance Net (IGANet) for predicting affordance maps of instruction-guided robotic manipulation tasks by utilizing powerful priors from vision and language encoders pre-trained on large-scale datasets. We develop a Vision-Language Model (VLM)-based data augmentation pipeline, which can generate a large…
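To make the idea of an instruction-dependent affordance map concrete, the following is a minimal sketch and not IGANet's actual architecture, which the abstract does not specify. It assumes each pixel already has a visual feature vector and the instruction already has an embedding of the same length (both hypothetical, hand-written here), scores each pixel by cosine similarity to the instruction, and normalizes the scores with a softmax. The function names and the `temperature` parameter are illustrative assumptions.

```python
import math


def affordance_map(pixel_features, instruction_embedding, temperature=0.1):
    """Return an H x W grid of probabilities that sums to 1.

    pixel_features: H x W grid of feature vectors (lists of floats).
    instruction_embedding: feature vector of the same length.
    Both inputs are assumed to come from pre-trained encoders; here they
    are toy values written by hand.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm > 0 else 0.0

    # Similarity between every pixel feature and the instruction.
    scores = [[cosine(f, instruction_embedding) / temperature for f in row]
              for row in pixel_features]
    # Softmax over all pixels; subtract the max for numerical stability.
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def argmax_2d(grid):
    """Return the (row, col) index of the largest value in a 2D grid."""
    cells = ((r, c) for r in range(len(grid)) for c in range(len(grid[0])))
    return max(cells, key=lambda rc: grid[rc[0]][rc[1]])


# Toy 1 x 2 "image" of a mug: pixel (0, 0) is handle-like, pixel (0, 1) is rim-like.
features = [[[1.0, 0.0], [0.0, 1.0]]]
grasp = [0.9, 0.1]  # hypothetical embedding of "pick up the mug"
pour = [0.1, 0.9]   # hypothetical embedding of "pour water into the mug"

grasp_region = argmax_2d(affordance_map(features, grasp))
pour_region = argmax_2d(affordance_map(features, pour))
```

In this toy setup the same object yields different peak regions for the two instructions, which is the behavior the abstract contrasts with instruction-independent predictions.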
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning
