Open-Vocabulary Part-Based Grasping
Tjeard van Oort, Dimity Miller, Will N. Browne, Nicolas Marticorena, Jesse Haviland, Niko Suenderhauf

TL;DR
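This paper introduces AnyPart, a modular framework that combines open-vocabulary object detection, part segmentation, and 6-DoF grasp prediction so that robots can grasp user-specified object parts named in natural language prompts. The best of 16 evaluated model combinations reaches 60.8% grasp success in cluttered real-world scenes, with inference 60 times faster than existing approaches.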
Contribution
The paper presents a novel modular approach that unifies detection, segmentation, and grasping for open-vocabulary part-based grasping without extra training.
Findings
Achieves 60.8% grasp success in cluttered real-world scenes with the best of 16 model combinations
Runs inference 60 times faster than existing approaches
Introduces a new dataset for part-based grasping
Abstract
Many robotic tasks require grasping objects at specific object parts instead of arbitrarily, a crucial capability for interactions beyond simple pick-and-place, such as human-robot interaction, handovers, or tool use. Prior work has focused either on generic grasp prediction or task-conditioned grasping, but not on directly targeting object parts in an open-vocabulary way. We propose AnyPart, a modular framework that unifies open-vocabulary object detection, part segmentation, and 6-DoF grasp prediction to enable robots to grasp user-specified parts of arbitrary objects based on natural language prompts. We evaluate 16 model combinations, and demonstrate that the best-performing combination achieves 60.8% grasp success in cluttered real-world scenes at 60 times faster inference than existing approaches. To support this study, we introduce a new dataset for part-based grasping and…
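The three stages named in the abstract (object detection, part segmentation, grasp prediction) can be pictured as swappable components behind a small interface. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `Grasp`, `select_part_grasp`, and the toy stand-in functions are hypothetical, grasps are reduced to a 2D image location plus a score rather than full 6-DoF poses, and filtering grasp candidates by the part mask is one plausible way to combine the stages that the abstract does not spell out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

Pixel = Tuple[int, int]           # (row, col) in the image
Box = Tuple[int, int, int, int]   # (row_min, col_min, row_max, col_max), inclusive


@dataclass(frozen=True)
class Grasp:
    """Hypothetical, simplified grasp: where it projects in the image and its score."""
    pixel: Pixel
    score: float


# Each stage is a plain callable, so any model can be swapped in (all hypothetical).
Detector = Callable[[str], Optional[Box]]        # object name -> bounding box
PartSegmenter = Callable[[Box, str], Set[Pixel]]  # box + part name -> part mask
GraspPredictor = Callable[[], List[Grasp]]       # scene-wide grasp candidates


def select_part_grasp(
    object_name: str,
    part_name: str,
    detect: Detector,
    segment: PartSegmenter,
    predict_grasps: GraspPredictor,
) -> Optional[Grasp]:
    """Return the best-scoring grasp that lands on the requested part, if any."""
    box = detect(object_name)
    if box is None:
        return None  # object not found in the scene
    part_mask = segment(box, part_name)
    # Assumption: keep only candidates whose image location lies on the part.
    on_part = [g for g in predict_grasps() if g.pixel in part_mask]
    return max(on_part, key=lambda g: g.score, default=None)


# Toy stand-ins for real models, used only to exercise the pipeline.
def toy_detect(name: str) -> Optional[Box]:
    return (0, 0, 3, 3) if name == "mug" else None


def toy_segment(box: Box, part: str) -> Set[Pixel]:
    r0, _c0, r1, c1 = box
    if part != "handle":
        return set()
    return {(r, c1) for r in range(r0, r1 + 1)}  # right-most column plays the handle


def toy_grasps() -> List[Grasp]:
    return [Grasp((1, 1), 0.9), Grasp((2, 3), 0.7), Grasp((0, 3), 0.4)]


best = select_part_grasp("mug", "handle", toy_detect, toy_segment, toy_grasps)
```

In this toy scene the highest-scoring candidate overall sits on the mug body, so it is discarded and the best candidate on the handle is returned instead. Because each stage is only a callable, trying a different detector, segmenter, or grasp predictor means passing a different function, which is the kind of modularity that makes comparing many model combinations practical.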
Taxonomy
Topics: Natural Language Processing Techniques
