ORACLE-Grasp: Zero-Shot Affordance-Aligned Robotic Grasping using Large Multimodal Models
Avihai Giuili, Rotem Atari, Avishai Sintov

TL;DR
ORACLE-Grasp is a zero-shot robotic grasping framework built on large multimodal models. It combines semantic understanding with spatial reasoning to generalize across diverse objects without task-specific training.
Contribution
The paper presents ORACLE-Grasp, a zero-shot approach that uses large multimodal models to select affordance-aligned grasps without task-specific training or manual input.
Findings
Achieves high success rates in real-world grasping tasks.
Produces human-like, context-sensitive grasp suggestions.
Operates effectively on diverse RGB and RGB-D images.
Abstract
Grasping unknown objects in unstructured environments is a critical challenge for service robots, which must operate in dynamic, real-world settings such as homes, hospitals, and warehouses. Success in these environments requires both semantic understanding and spatial reasoning. Traditional methods often rely on dense training datasets or detailed geometric modeling, which demand extensive data collection and do not generalize well to novel objects or affordances. We present ORACLE-Grasp, a zero-shot framework that leverages Large Multimodal Models (LMMs) as semantic oracles to guide affordance-aligned grasp selection, without requiring task-specific training or manual input. The system reformulates grasp prediction as a structured, iterative decision process, using a dual-prompt tool-calling strategy: the first prompt extracts high-level object semantics, while the second identifies…
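The dual-prompt idea in the abstract can be illustrated with a small two-stage pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract is cut off before it says what the second prompt identifies, so the second stage here (choosing one cell of an image grid) is purely an assumption for illustration. The prompt wording, the JSON reply format, and the names `describe_object`, `select_cell`, `suggest_grasp`, and `fake_lmm` are all hypothetical. The iterative refinement mentioned in the abstract is omitted, and the LMM is replaced by a stub so the example runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Assumption: an LMM is modeled as a function (prompt, image_bytes) -> JSON string.
LMM = Callable[[str, bytes], str]


@dataclass
class GraspSuggestion:
    object_name: str
    grasp_part: str
    cell: int  # index of the chosen grid cell (assumed representation)


def describe_object(lmm: LMM, image: bytes) -> dict:
    """First prompt: ask for high-level object semantics."""
    prompt = (
        "Name the main object in the image and the part a person would hold. "
        "Reply as JSON with keys 'object' and 'grasp_part'."
    )
    return json.loads(lmm(prompt, image))


def select_cell(lmm: LMM, image: bytes, semantics: dict, n_cells: int) -> int:
    """Second prompt (assumed purpose): pick a grid cell, conditioned on stage one."""
    prompt = (
        f"The image shows a {semantics['object']}. The image is divided into "
        f"{n_cells} cells numbered from 0. Which cell contains the "
        f"{semantics['grasp_part']}? Reply as JSON with a single integer key 'cell'."
    )
    cell = int(json.loads(lmm(prompt, image))["cell"])
    if not 0 <= cell < n_cells:
        raise ValueError(f"model returned out-of-range cell {cell}")
    return cell


def suggest_grasp(lmm: LMM, image: bytes, n_cells: int = 9) -> GraspSuggestion:
    """Chain the two prompts: semantics first, then a spatial choice."""
    semantics = describe_object(lmm, image)
    cell = select_cell(lmm, image, semantics, n_cells)
    return GraspSuggestion(semantics["object"], semantics["grasp_part"], cell)


def fake_lmm(prompt: str, image: bytes) -> str:
    """Stub standing in for a real multimodal model; returns canned replies."""
    if "Name the main object" in prompt:
        return json.dumps({"object": "mug", "grasp_part": "handle"})
    return json.dumps({"cell": 5})


if __name__ == "__main__":
    print(suggest_grasp(fake_lmm, b""))
```

Passing the first reply into the second prompt is the point of the split: the spatial question is asked only after the model has committed to what the object is and which part is meant to be held.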
