Click to Grasp: Zero-Shot Precise Manipulation via Visual Diffusion Descriptors

Nikolaos Tsagkas; Jack Rome; Subramanian Ramamoorthy; Oisin Mac Aodha; Chris Xiaoxuan Lu

arXiv:2403.14526·cs.RO·December 30, 2024·1 cites

Open Access

TL;DR

This paper introduces a zero-shot method for precise robotic manipulation using visual diffusion models to establish dense semantic part correspondence, enabling manipulation based on user clicks without manual demonstrations.

Contribution

It presents a novel zero-shot approach leveraging web-trained diffusion models for fine-grained part correspondence in robotic manipulation, eliminating the need for extensive training data.

Findings

01 Effective zero-shot manipulation in real-world scenarios

02 No manual grasping demonstrations required

03 Demonstrates robustness across different object instances

Abstract

Precise manipulation that is generalizable across scenes and objects remains a persistent challenge in robotics. Current approaches for this task heavily depend on having a significant number of training instances to handle objects with pronounced visual and/or geometric part ambiguities. Our work explores the grounding of fine-grained part descriptors for precise manipulation in a zero-shot setting by utilizing web-trained text-to-image diffusion-based generative models. We tackle the problem by framing it as a dense semantic part correspondence task. Our model returns a gripper pose for manipulating a specific part, using as reference a user-defined click from a source image of a visually different instance of the same object. We require no manual grasping demonstrations as we leverage the intrinsic object geometry and features. Practical experiments in a real-world tabletop scenario…
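
To make the idea of click-based dense correspondence concrete, here is a minimal sketch of the matching step: take the descriptor at the clicked source pixel and find the most similar descriptor in a target image. This is an illustration under stated assumptions, not the authors' implementation. The function names (`cosine_similarity`, `match_click`), the use of cosine similarity with a plain nearest-neighbour search, and the toy feature values are all hypothetical. In the paper the descriptors come from a web-trained text-to-image diffusion model, and the matched part is then turned into a gripper pose; neither step is shown here.

```python
import math
from typing import List, Sequence, Tuple

# H x W grid of D-dimensional descriptors (a stand-in for real image features).
FeatureMap = List[List[Sequence[float]]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two descriptors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_click(
    source: FeatureMap, target: FeatureMap, click: Tuple[int, int]
) -> Tuple[Tuple[int, int], float]:
    """Return the target (row, col) best matching the clicked source pixel."""
    row, col = click
    query = source[row][col]
    best_pos, best_score = (0, 0), -math.inf
    for r, target_row in enumerate(target):
        for c, descriptor in enumerate(target_row):
            score = cosine_similarity(query, descriptor)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


# Toy 2x2 feature maps with 3-dimensional descriptors (made-up values).
source_features = [
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 1.0, 0.0)],
]
target_features = [
    [(0.0, 0.0, 2.0), (0.9, 0.1, 0.0)],
    [(0.0, 3.0, 0.0), (1.0, 1.0, 1.0)],
]

# The user "clicks" source pixel (0, 1); we look up its best match in the target.
position, score = match_click(source_features, target_features, click=(0, 1))
print(position, round(score, 3))  # prints "(1, 0) 1.0"
```

Cosine similarity ignores descriptor magnitude, so the target descriptor `(0.0, 3.0, 0.0)` matches the clicked `(0.0, 1.0, 0.0)` perfectly despite the different scale. A real system would likely vectorise this search and aggregate over a neighbourhood of pixels rather than a single point.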

Taxonomy

Topics: Image Processing Techniques and Applications · Image and Object Detection Techniques · Domain Adaptation and Few-Shot Learning