TL;DR
Sim-Grasp introduces a synthetic benchmark and a learning system for 6-DOF robotic grasping in cluttered environments, integrating language models for target identification and achieving high success rates.
Contribution
The paper presents a new synthetic dataset, Sim-Grasp-Dataset, and a grasping network, Sim-GraspNet, and combines them with language models for flexible object manipulation in cluttered scenes.
Findings
Achieves a 97.14% grasping success rate on single objects.
Attains 87.43% success in mixed clutter with Levels 1-2 objects and 83.33% with Levels 3-4 objects.
Enables object-agnostic and target-specific grasping using language prompts.
Abstract
In this paper, we present Sim-Grasp, a robust 6-DOF two-finger grasping system that integrates advanced language models for enhanced object manipulation in cluttered environments. We introduce the Sim-Grasp-Dataset, which includes 1,550 objects across 500 scenarios with 7.9 million annotated labels, and develop Sim-GraspNet to generate grasp poses from point clouds. The Sim-Grasp-Policies achieve grasping success rates of 97.14% for single objects, and 87.43% and 83.33% for mixed clutter scenarios with Levels 1-2 and Levels 3-4 objects, respectively. By incorporating language models for target identification through text and box prompts, Sim-Grasp enables both object-agnostic and target picking, pushing the boundaries of intelligent robotic systems.
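The distinction between object-agnostic and target picking can be illustrated with a minimal sketch. This is not the paper's implementation: the `GraspPose` fields, the `score` value, the `pick_grasp` helper, and the use of a 2D workspace box as a stand-in for a box prompt are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class GraspPose:
    """Hypothetical 6-DOF grasp: 3 translational + 3 rotational degrees of freedom."""
    x: float      # position, metres
    y: float
    z: float
    roll: float   # orientation, radians
    pitch: float
    yaw: float
    score: float  # assumed quality score in [0, 1]; higher is better


# Assumed box format: (x_min, y_min, x_max, y_max) in the same frame as the poses.
Box = Tuple[float, float, float, float]


def pick_grasp(candidates: Sequence[GraspPose],
               target_box: Optional[Box] = None) -> Optional[GraspPose]:
    """Return the highest-scoring grasp.

    With no box, every candidate is eligible (object-agnostic picking).
    With a box, only grasps whose position lies inside it are eligible
    (target picking). Returns None when no candidate qualifies.
    """
    if target_box is not None:
        x_min, y_min, x_max, y_max = target_box
        candidates = [g for g in candidates
                      if x_min <= g.x <= x_max and y_min <= g.y <= y_max]
    return max(candidates, key=lambda g: g.score, default=None)


candidates = [
    GraspPose(0.1, 0.1, 0.0, 0.0, 0.0, 0.0, score=0.9),
    GraspPose(0.5, 0.5, 0.0, 0.0, 0.0, 0.0, score=0.6),
]
best_any = pick_grasp(candidates)                           # object-agnostic
best_target = pick_grasp(candidates, (0.4, 0.4, 0.6, 0.6))  # restricted to a box
```

In a real pipeline the box would more plausibly be given in image space and mapped onto the point cloud before filtering. The sketch skips that step and assumes the box and the grasp positions share one frame.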
