ORACLE-Grasp: Zero-Shot Affordance-Aligned Robotic Grasping using Large Multimodal Models

Avihai Giuili; Rotem Atari; Avishai Sintov

arXiv:2505.08417 · cs.RO · February 17, 2026

TL;DR

ORACLE-Grasp is a zero-shot robotic grasping framework that uses large multimodal models to combine semantic understanding with spatial reasoning, generalizing across diverse objects without task-specific training.

Contribution

The paper presents ORACLE-Grasp, a zero-shot approach that uses large multimodal models to select affordance-aligned grasps without task-specific training.

Findings

01 Achieves high success rates in real-world grasping tasks.

02 Produces human-like, context-sensitive grasp suggestions.

03 Operates effectively on diverse RGB and RGB-D images.

Abstract

Grasping unknown objects in unstructured environments is a critical challenge for service robots, which must operate in dynamic, real-world settings such as homes, hospitals, and warehouses. Success in these environments requires both semantic understanding and spatial reasoning. Traditional methods often rely on dense training datasets or detailed geometric modeling, which demand extensive data collection and do not generalize well to novel objects or affordances. We present ORACLE-Grasp, a zero-shot framework that leverages Large Multimodal Models (LMMs) as semantic oracles to guide affordance-aligned grasp selection, without requiring task-specific training or manual input. The system reformulates grasp prediction as a structured, iterative decision process, using a dual-prompt tool-calling strategy: the first prompt extracts high-level object semantics, while the second identifies…
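
The dual-prompt, iterative structure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Oracle` type, `select_grasp`, `stub_oracle`, the prompt wording, and the reply fields (`label`, `point`, `done`) are all hypothetical. Because the abstract is cut off before it says what the second prompt identifies, the sketch assumes, purely for illustration, that it returns a candidate grasp location in the image.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical oracle interface: (prompt, image) -> structured reply.
# A real system would call a large multimodal model here.
Oracle = Callable[[str, Any], Dict[str, Any]]


def select_grasp(
    image: Any, oracle: Oracle, max_rounds: int = 3
) -> Tuple[Dict[str, Any], Optional[Tuple[int, int]]]:
    # Prompt 1: extract high-level object semantics (as stated in the abstract).
    semantics = oracle("Describe the object and its intended use.", image)

    candidate = None
    for _ in range(max_rounds):
        # Prompt 2: ASSUMPTION. The abstract is truncated here; we assume the
        # second prompt asks for a grasp location conditioned on the semantics.
        reply = oracle(
            f"Object: {semantics.get('label')}. Propose a grasp point.", image
        )
        candidate = reply.get("point")
        if reply.get("done"):
            break  # the oracle signals it is satisfied with this candidate
    return semantics, candidate


def stub_oracle(prompt: str, image: Any) -> Dict[str, Any]:
    # Stand-in for an LMM so the sketch runs without any external service.
    if prompt.startswith("Describe"):
        return {"label": "mug"}
    return {"point": (12, 34), "done": True}


semantics, point = select_grasp(image=None, oracle=stub_oracle)
```

Passing the oracle in as a plain callable keeps the two-prompt control flow separate from any particular model API, so the loop can be exercised with a stub before a real model is wired in.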
