TL;DR
This paper presents an end-to-end trainable CNN architecture for robotic grasp detection and semantic segmentation, achieving state-of-the-art accuracy and enabling object-specific grasping in complex scenes.
Contribution
Introduces a novel CNN architecture with a refinement module for improved grasp detection and extends the OCID dataset for challenging scene evaluation.
Findings
State-of-the-art accuracy on Cornell and Jacquard datasets
Effective use of semantic segmentation for object-specific grasping
OCID dataset extension that enables grasp detection evaluation in highly challenging scenes
Abstract
In this work, we introduce a novel, end-to-end trainable CNN-based architecture to deliver high-quality results for grasp detection suitable for a parallel-plate gripper, and semantic segmentation. Utilizing this, we propose a novel refinement module that takes advantage of previously calculated grasp detection and semantic segmentation and further increases grasp detection accuracy. Our proposed network delivers state-of-the-art accuracy on two popular grasp datasets, namely Cornell and Jacquard. As an additional contribution, we provide a novel dataset extension for the OCID dataset, making it possible to evaluate grasp detection in highly challenging scenes. Using this dataset, we show that semantic segmentation can additionally be used to assign grasp candidates to object classes, which can be used to pick specific objects in the scene.
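The idea of assigning grasp candidates to object classes can be illustrated with a minimal sketch. This is not the paper's implementation: the `Grasp` structure, the `assign_grasps_to_classes` helper, and the rule of reading the class label under each grasp center are all assumptions made for illustration, and the mask is a toy example rather than network output.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Grasp:
    """Hypothetical parallel-plate grasp candidate in image coordinates."""
    cx: float      # center column (pixels)
    cy: float      # center row (pixels)
    angle: float   # gripper orientation in radians
    width: float   # gripper opening in pixels
    score: float   # detection confidence


def assign_grasps_to_classes(
    grasps: Sequence[Grasp],
    seg_mask: Sequence[Sequence[int]],
    background: int = 0,
) -> Dict[int, List[Grasp]]:
    """Group grasps by the segmentation label under each grasp center.

    Assumption: the label at the grasp center decides the class. A real
    system might instead vote over the whole grasp rectangle.
    """
    height = len(seg_mask)
    width = len(seg_mask[0]) if height else 0
    by_class: Dict[int, List[Grasp]] = {}
    for g in grasps:
        row, col = int(round(g.cy)), int(round(g.cx))
        if not (0 <= row < height and 0 <= col < width):
            continue  # center falls outside the image
        label = seg_mask[row][col]
        if label == background:
            continue  # grasp does not lie on any object
        by_class.setdefault(label, []).append(g)
    # Within each class, put the highest-scoring grasp first.
    for candidates in by_class.values():
        candidates.sort(key=lambda g: g.score, reverse=True)
    return by_class


# Toy 4x4 mask: class 1 in the top-left, class 2 in the bottom-right.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]
grasps = [
    Grasp(cx=0.6, cy=0.4, angle=0.0, width=2.0, score=0.7),
    Grasp(cx=1.0, cy=1.0, angle=1.2, width=2.0, score=0.9),
    Grasp(cx=3.0, cy=2.0, angle=0.5, width=1.5, score=0.8),
    Grasp(cx=2.0, cy=0.0, angle=0.0, width=1.0, score=0.95),  # on background
]
result = assign_grasps_to_classes(grasps, mask)
best_for_class_1 = result[1][0]  # grasp to execute when asked for class 1
```

To pick a specific object, a caller would look up the requested class in the returned dictionary and execute its first entry. Grasps whose center lands on the background are dropped, however high their score.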
