VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes

Yuhao Lu; Yixuan Fan; Beixing Deng; Fangfu Liu; Yali Li; Shengjin Wang

arXiv:2308.00640 · cs.RO · August 2, 2023 · 1 citation

TL;DR

VL-Grasp is a 6-DOF interactive grasp policy that lets a robot locate and grasp a target object specified in human language in cluttered indoor scenes, achieving a 72.5% success rate in real-world indoor scenes.

Contribution

The paper introduces a new visual grounding dataset, a 6-DOF interactive grasp policy, and a grasp pose filter, advancing language-based robotic grasping in complex environments.

Findings

01. Achieved a 72.5% success rate in real-world indoor scenes.

02. Extended the universality of interactive grasping with a 6-DOF policy.

03. Demonstrated the effectiveness and extensibility of VL-Grasp.

Abstract

Robotic grasping faces new challenges in human-robot-interaction scenarios. We consider the task in which a robot grasps a target object designated by human language directives. The robot not only needs to locate the target based on vision-and-language information, but also needs to predict reasonable grasp pose candidates from various views and postures. In this work, we propose a novel interactive grasp policy, named Visual-Lingual-Grasp (VL-Grasp), to grasp the target specified by human language. First, we build a new challenging visual grounding dataset to provide functional training data for robotic interactive perception in indoor environments. Second, we propose a 6-Dof interactive grasp policy that combines visual grounding with 6-Dof grasp pose detection to extend the universality of interactive grasping. Third, we design a grasp pose filter module to enhance the performance of…
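The abstract names three stages (visual grounding, 6-DOF grasp pose detection, and a grasp pose filter) but this excerpt does not describe how the filter works. The sketch below is only one plausible illustration of the idea, not the authors' method: it assumes the grounding stage returns a 2D bounding box in pixels, the grasp detector returns scored 3D grasp centers in the camera frame, and a pinhole camera model relates the two. The names `GraspCandidate`, `Intrinsics`, `project`, and `filter_grasps` are hypothetical and do not come from the paper or the luyh20/vl-grasp repository.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GraspCandidate:
    """Hypothetical grasp candidate: 3D center in the camera frame plus a score."""
    center: Tuple[float, float, float]  # (x, y, z) in metres, z pointing forward
    score: float


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics (assumed known from calibration)."""
    fx: float
    fy: float
    cx: float
    cy: float


def project(point: Tuple[float, float, float], cam: Intrinsics) -> Tuple[float, float]:
    """Project a 3D camera-frame point to pixel coordinates (pinhole model)."""
    x, y, z = point
    return cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy


def filter_grasps(
    candidates: List[GraspCandidate],
    box: Tuple[float, float, float, float],
    cam: Intrinsics,
) -> Optional[GraspCandidate]:
    """Keep candidates whose projected center lies inside the grounded box
    (u_min, v_min, u_max, v_max) and return the highest-scoring one."""
    u_min, v_min, u_max, v_max = box
    inside = []
    for grasp in candidates:
        if grasp.center[2] <= 0:  # behind the camera: cannot be projected
            continue
        u, v = project(grasp.center, cam)
        if u_min <= u <= u_max and v_min <= v <= v_max:
            inside.append(grasp)
    return max(inside, key=lambda g: g.score, default=None)


# Illustrative values only (assumed, not taken from the paper).
cam = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
target_box = (300.0, 200.0, 400.0, 300.0)  # box returned by visual grounding
candidates = [
    GraspCandidate(center=(0.00, 0.00, 0.5), score=0.6),  # inside the box
    GraspCandidate(center=(0.30, 0.00, 0.5), score=0.9),  # outside the box
    GraspCandidate(center=(0.05, 0.02, 0.5), score=0.8),  # inside the box
]
best = filter_grasps(candidates, target_box, cam)
print(best)
```

In this toy scene the globally highest-scoring candidate lies on a different object, so restricting the choice to the grounded box changes which grasp is executed. A real system would also need to handle an empty result, for example by re-observing the scene from another view.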

Code & Models

Repositories

luyh20/vl-grasp
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Robot Manipulation and Learning