GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation

Junghyun Kim, Gi-Cheon Kang, Jaein Kim, Suyeon Shin, Byoung-Tak Zhang

arXiv:2307.05963 · cs.RO · July 13, 2023 · 1 citation

TL;DR

This paper introduces GVCCI, a lifelong learning framework that lets visual grounding models adapt to robotic manipulation environments without human supervision, improving visual grounding accuracy by up to 56.7% and downstream language-guided manipulation performance by up to 29.4% across diverse environments.

Contribution

GVCCI is the first lifelong learning approach for visual grounding in robotic manipulation, generating synthetic instructions to continuously improve model accuracy without human labels.

Findings

1. VG accuracy improved by up to 56.7%

2. LGRM performance increased by up to 29.4%

3. Introduced a large-scale dataset with 252k triplets

Abstract

Language-Guided Robotic Manipulation (LGRM) is a challenging task because it requires a robot to understand human instructions in order to manipulate everyday objects. Recent approaches in LGRM rely on pre-trained Visual Grounding (VG) models to detect objects without adapting them to manipulation environments. This results in a performance drop due to the substantial domain gap between the pre-training data and real-world data. A straightforward solution is to collect additional training data, but the cost of human annotation is prohibitive. In this paper, we propose Grounding Vision to Ceaselessly Created Instructions (GVCCI), a lifelong learning framework for LGRM that continuously learns VG without human supervision. GVCCI iteratively generates synthetic instructions via object detection and trains the VG model with the generated data. We validate our framework in offline and online settings across diverse…
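The abstract describes the GVCCI loop only at a high level: detect objects in each new scene, turn the detections into synthetic instructions, and train the VG model on the data accumulated so far. The Python sketch below illustrates that loop under loudly stated assumptions. The detector and trainer are caller-supplied stubs, the single position-based template (`position_phrase`) and the ambiguity filter in `generate_instructions` are hypothetical stand-ins rather than the paper's instruction generator, and retraining on the whole buffer is just one simple reading of "accumulating" synthetic data. The official JHKim-snu/GVCCI code may work quite differently.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels; hypothetical format


@dataclass(frozen=True)
class Detection:
    label: str  # object category predicted by a (hypothetical) detector
    box: Box


@dataclass(frozen=True)
class Sample:
    image_id: str
    box: Box          # target object the instruction should ground to
    instruction: str  # synthetic instruction, used as a pseudo-label


def position_phrase(box: Box, image_width: float) -> str:
    """Map the box centre to a coarse horizontal phrase (assumed template)."""
    cx = (box[0] + box[2]) / 2
    if cx < image_width / 3:
        return "on the left"
    if cx > 2 * image_width / 3:
        return "on the right"
    return "in the middle"


def generate_instructions(image_id: str, detections: Sequence[Detection],
                          image_width: float) -> List[Sample]:
    """Turn detections into (image, box, instruction) training samples."""
    keys = [(d.label, position_phrase(d.box, image_width)) for d in detections]
    counts = Counter(keys)
    samples = []
    for det, (label, pos) in zip(detections, keys):
        # Assumed filter: skip phrases that would match more than one object.
        if counts[(label, pos)] > 1:
            continue
        samples.append(Sample(image_id, det.box, f"pick up the {label} {pos}"))
    return samples


def lifelong_loop(stream: Iterable[Tuple[str, object]],
                  detect: Callable[[object], Sequence[Detection]],
                  train: Callable[[List[Sample]], None],
                  image_width: float) -> List[Sample]:
    """For each new image: detect, generate instructions, retrain on all data so far."""
    memory: List[Sample] = []
    for image_id, image in stream:
        memory.extend(generate_instructions(image_id, detect(image), image_width))
        train(memory)  # stand-in for fine-tuning the VG model on the buffer
    return memory
```

For example, with a 100-pixel-wide image, a cup detected at the left edge yields "pick up the cup on the left". The design choice worth noting is that no human label enters the loop: the only supervision is the detector output, which is why filtering out ambiguous expressions matters in a sketch like this.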

Code & Models

Repositories

JHKim-snu/GVCCI
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition