VLAD-Grasp: Zero-shot Grasp Detection via Vision-Language Models

Manav Kulshrestha; S. Talha Bukhari; Damon Conover; Aniket Bera

arXiv:2511.05791 · cs.RO · March 17, 2026



TL;DR

VLAD-Grasp leverages vision-language models for zero-shot robotic grasp detection, eliminating the need for curated datasets and enabling generalization to real-world objects with competitive performance.

Contribution

The paper introduces a training-free, zero-shot grasp detection method using vision-language models, advancing robotic manipulation without dataset curation.

Findings

01 Achieves competitive accuracy on the Cornell and Jacquard datasets.

02 Demonstrates successful zero-shot grasping on real-world objects.

03 Eliminates the need for curated grasp datasets.

Abstract

Robotic grasping is a fundamental capability for autonomous manipulation, and a given object usually admits infinitely many feasible grasps. State-of-the-art grasping approaches rely on learning from large-scale datasets of expert-annotated feasible grasps. Curating such datasets is challenging, so learning-based methods are limited by the dataset's solution coverage and require retraining to handle novel objects. To address this, we present VLAD-Grasp, a Vision-Language model Assisted zero-shot approach for Detecting Grasps. Our method (1) prompts a large vision-language model to generate a goal image in which a virtual cylindrical proxy intersects the object's geometry, explicitly encoding an antipodal grasp axis in image space, then (2) predicts depth and segmentation to lift this generated image into 3D, and (3) aligns generated and observed object point clouds via principal…
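Step (2) of the abstract, lifting an image into 3D from depth and segmentation, can be illustrated with a standard pinhole back-projection. The sketch below is not the authors' implementation: it assumes the depth map and object mask are already available as arrays (in the paper they are predicted by models), and the function name `backproject_masked_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical placeholders.

```python
import numpy as np


def backproject_masked_depth(depth, mask, fx, fy, cx, cy):
    """Lift masked depth pixels to 3D camera-frame points (pinhole model).

    depth: (H, W) array of depths along the optical axis (assumed metric).
    mask:  (H, W) array, nonzero where the pixel belongs to the object.
    fx, fy, cx, cy: assumed camera intrinsics, in pixels.
    Returns an (N, 3) array of [x, y, z] points.
    """
    depth = np.asarray(depth, dtype=float)
    # Keep only object pixels that have a valid (positive) depth.
    valid = np.asarray(mask).astype(bool) & (depth > 0)
    v, u = np.nonzero(valid)  # row (v) and column (u) pixel indices
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```

Applying the same function to the observed image and to the generated goal image would yield the two object point clouds that step (3) then aligns; the alignment itself is not sketched here because the abstract is truncated at that point.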


Taxonomy

Topics: Robot Manipulation and Learning · Motor Control and Adaptation · Reinforcement Learning in Robotics