OVGNet: A Unified Visual-Linguistic Framework for Open-Vocabulary Robotic Grasping
Li Meng, Zhao Qi, Lyu Shuchang, Wang Chunlei, Ma Yujing, Cheng Guangliang, Yang Chenguang

TL;DR
This paper introduces OVGNet, a unified visual-linguistic framework that enables robots to recognize and grasp both known and novel objects by leveraging a new benchmark dataset and alignment modules, significantly improving open-vocabulary robotic grasping performance.
Contribution
The paper presents a novel framework integrating open-vocabulary learning into robotic grasping, along with a new benchmark dataset and alignment modules for enhanced perception.
Findings
Achieved 71.2% accuracy on base objects
Achieved 64.4% accuracy on novel objects
Validated effectiveness through extensive experiments
Abstract
Recognizing and grasping novel-category objects remains a crucial yet challenging problem in real-world robotic applications. Despite its significance, limited research has been conducted in this specific domain. To address this, we propose a novel framework that seamlessly integrates open-vocabulary learning into the domain of robotic grasping, empowering robots with the capability to adeptly handle novel objects. Our contributions are threefold. Firstly, we present a large-scale benchmark dataset specifically tailored for evaluating the performance of open-vocabulary grasping tasks. Secondly, we propose a unified visual-linguistic framework that serves as a guide for robots in successfully grasping both base and novel objects. Thirdly, we introduce two alignment modules designed to enhance visual-linguistic perception in the robotic grasping process. Extensive experiments validate the…
Taxonomy
Topics: Robot Manipulation and Learning · Robotics and Automated Systems · Natural Language Processing Techniques
