GRIM: Task-Oriented Grasping with Conditioning on Generative Examples

Shailesh; Alok Raj; Nayan Kumar; Priya Shukla; Andrew Melnik; Michael Beetz; Gora Chand Nandi

arXiv:2506.15607 · cs.RO · November 18, 2025

TL;DR

GRIM is a training-free framework that combines video generation models with a retrieval-then-refinement pipeline so that robots can select functionally appropriate grasps for specific tasks. It generalizes well and achieves state-of-the-art results on standard task-oriented grasping benchmarks.

Contribution

GRIM introduces a novel, training-free approach combining video generation models with exemplar retrieval and geometric refinement for task-oriented grasping.

Findings

1. Achieves state-of-the-art performance on standard TOG benchmarks.
2. Demonstrates strong generalization across diverse objects and tasks.
3. Constructs a memory of object-task exemplars from web images, human demonstrations, or generative models.

Abstract

Task-Oriented Grasping (TOG) requires robots to select grasps that are functionally appropriate for a specified task, a challenge that demands an understanding of task semantics, object affordances, and functional constraints. We present GRIM (Grasp Re-alignment via Iterative Matching), a training-free framework that addresses these challenges by leveraging Video Generation Models (VGMs) together with a retrieve-align-transfer pipeline. Beyond leveraging VGMs, GRIM can construct a memory of object-task exemplars sourced from web images, human demonstrations, or generative models. The retrieved task-oriented grasp is then transferred and refined by evaluating it against a set of geometrically stable candidate grasps to ensure both functional suitability and physical feasibility. GRIM demonstrates strong generalization and achieves state-of-the-art performance on standard TOG benchmarks.…
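The retrieve, align/transfer, and refine steps described in the abstract can be illustrated with a toy sketch. Everything in it is a hypothetical stand-in, not GRIM's published implementation: the `Exemplar` and `Grasp` types, the assumption that each exemplar carries a precomputed embedding vector, the assumption that an upstream alignment step has already produced a rigid transform `(R, t)`, and the weighted position-plus-angle cost used to pick among stable candidates are all illustrative choices.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


@dataclass
class Grasp:
    position: Vec3  # gripper centre in the object frame (hypothetical units: metres)
    approach: Vec3  # approach direction, assumed to be unit length


@dataclass
class Exemplar:
    embedding: List[float]  # hypothetical precomputed feature vector of the exemplar object
    task: str               # task label, e.g. "pour"
    grasp: Grasp            # task-oriented grasp observed in the exemplar


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(memory: List[Exemplar], query: Sequence[float], task: str) -> Exemplar:
    """Return the exemplar for `task` whose embedding is most cosine-similar to `query`."""
    matches = [e for e in memory if e.task == task]
    if not matches:
        raise ValueError(f"no exemplar stored for task {task!r}")
    return max(matches, key=lambda e: cosine(e.embedding, query))


def apply_rotation(R: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))


def transfer(g: Grasp, R: Mat3, t: Vec3) -> Grasp:
    """Map an exemplar grasp into the query object's frame with a rigid transform (R, t),
    assumed to come from a separate alignment step."""
    p = apply_rotation(R, g.position)
    return Grasp(tuple(p[i] + t[i] for i in range(3)), apply_rotation(R, g.approach))


def refine(target: Grasp, candidates: List[Grasp], w: float = 0.1) -> int:
    """Index of the stable candidate closest to the transferred grasp.

    Illustrative cost: Euclidean position error + w * (1 - cosine between approach directions).
    """
    def cost(c: Grasp) -> float:
        pos_err = math.dist(c.position, target.position)
        ang_err = 1.0 - sum(a * b for a, b in zip(c.approach, target.approach))
        return pos_err + w * ang_err

    return min(range(len(candidates)), key=lambda i: cost(candidates[i]))


# Toy usage: two "pour" exemplars and one "handover" exemplar in memory.
memory = [
    Exemplar([1.0, 0.0], "pour", Grasp((0.05, 0.0, 0.10), (0.0, 0.0, -1.0))),
    Exemplar([0.0, 1.0], "pour", Grasp((0.00, 0.0, 0.02), (1.0, 0.0, 0.0))),
    Exemplar([1.0, 0.1], "handover", Grasp((0.0, 0.0, 0.15), (0.0, 0.0, -1.0))),
]
ex = retrieve(memory, [0.9, 0.1], "pour")      # most similar "pour" exemplar
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
g = transfer(ex.grasp, identity, (0.01, 0.0, 0.0))  # shift 1 cm along x
candidates = [  # pretend these came from a geometric grasp sampler
    Grasp((0.0, 0.0, 0.02), (1.0, 0.0, 0.0)),
    Grasp((0.055, 0.0, 0.11), (0.0, 0.0, -1.0)),
]
best = refine(g, candidates)
print(best)  # 1
```

The refinement step keeps physical feasibility by only ever returning one of the stable candidates, while the cost pulls the choice toward the functionally meaningful transferred grasp; the weight `w = 0.1` that trades position against orientation error is arbitrary here.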
