OVAL-Grasp: Open-Vocabulary Affordance Localization for Task Oriented Grasping
Edmond Tong, Advaith Balaji, Anthony Opipari, Stanley Lewis, Zhen Zeng, Odest Chadwicke Jenkins

TL;DR
OVAL-Grasp is a zero-shot, open-vocabulary method that uses large language models and vision-language models to let robots perform task-oriented, affordance-based grasping on novel objects by identifying and segmenting the target object parts.
Contribution
It introduces a novel modular approach combining LLMs and VLMs for task-oriented grasping, outperforming existing methods in unstructured environments.
Findings
Achieved 95% accuracy in identifying correct object parts.
Successfully grasped correct actionable areas 78.3% of the time in real-world tests.
Maintained 80% success rate in cluttered scenes with occlusions.
Abstract
To manipulate objects in novel, unstructured environments, robots need task-oriented grasps that target object parts based on the given task. Geometry-based methods often struggle with visually defined parts, occlusions, and unseen objects. We introduce OVAL-Grasp, a zero-shot, open-vocabulary approach to task-oriented, affordance-based grasping that uses large language models and vision-language models to allow a robot to grasp objects at the correct part according to a given task. Given an RGB image and a task, OVAL-Grasp identifies parts to grasp or avoid with an LLM, segments them with a VLM, and generates a 2D heatmap of actionable regions on the object. In our evaluations, our method outperformed two task-oriented grasping baselines in experiments with 20 household objects and 3 unique tasks for each. OVAL-Grasp successfully identifies and segments the correct…
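The last step of the pipeline described in the abstract, turning part masks into a 2D heatmap, can be illustrated with a minimal sketch. The abstract does not say how the heatmap is computed, so everything here is an assumption made for illustration: the function names `build_heatmap` and `best_pixel`, the binary scoring (1.0 for parts to grasp, 0.0 elsewhere), and the rule that a part to avoid overrides a part to grasp where the two overlap. The masks stand in for the VLM's segmentation output and are plain nested lists.

```python
def build_heatmap(grasp_masks, avoid_masks, height, width):
    """Combine binary part masks into a 2D score map (hypothetical scoring).

    Pixels covered by any grasp mask score 1.0; pixels covered by any
    avoid mask are forced back to 0.0, so "avoid" wins on overlap.
    """
    heat = [[0.0] * width for _ in range(height)]
    for mask in grasp_masks:
        for r in range(height):
            for c in range(width):
                if mask[r][c]:
                    heat[r][c] = 1.0
    for mask in avoid_masks:
        for r in range(height):
            for c in range(width):
                if mask[r][c]:
                    heat[r][c] = 0.0
    return heat


def best_pixel(heat):
    """Return (row, col) of the first highest-scoring pixel, or None
    if no pixel has a positive score."""
    best_rc, best_val = None, 0.0
    for r, row in enumerate(heat):
        for c, value in enumerate(row):
            if value > best_val:
                best_rc, best_val = (r, c), value
    return best_rc
```

For a knife and the task "cut", for example, the grasp mask would cover the handle and the avoid mask the blade; the selected pixel then falls on the handle region that does not overlap the blade. A real system would still need to map that 2D pixel to a grasp pose, which this sketch does not attempt.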
Taxonomy
Topics: Robot Manipulation and Learning · Motor Control and Adaptation · Social Robot Interaction and HRI
