SuctionPrompt: Visual-assisted Robotic Picking with a Suction Cup Using Vision-Language Models and Facile Hardware Design

Tomohiro Motoda; Takahide Kitamura; Ryo Hanai; Yukiyasu Domae

arXiv:2410.23640 · cs.RO · November 1, 2024

Open Access

TL;DR

SuctionPrompt combines vision-language model (VLM) prompting with 3D detection so that a suction-cup robot can pick products in diverse, dynamic environments. In validation experiments it selected suction points with 75.4% accuracy and picked common items with a 65.0% success rate.

Contribution

This work introduces SuctionPrompt, a robotic system that combines VLM prompting with 3D spatial information for versatile object picking in real-world settings.

Findings

01 · 75.4% accuracy in selecting suction points

02 · 65.0% success rate in picking common items

03 · Effective use of VLMs with simple 3D processing

Abstract

The development of large language models and vision-language models (VLMs) has resulted in the increasing use of robotic systems in various fields. However, the effective integration of these models into real-world robotic tasks is a key challenge. We developed a versatile robotic system called SuctionPrompt that utilizes prompting techniques of VLMs combined with 3D detections to perform product-picking tasks in diverse and dynamic environments. Our method highlights the importance of integrating 3D spatial information with adaptive action planning to enable robots to approach and manipulate objects in novel environments. In the validation experiments, the system selected suction points with 75.4% accuracy and achieved a 65.0% success rate in picking common items. This study highlights the effectiveness of VLMs in robotic manipulation tasks, even with simple 3D processing.
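The abstract does not spell out what the "simple 3D processing" consists of. As one illustration of how 3D spatial information could support suction-point selection, the sketch below fits a local plane around each candidate point in a point cloud and ranks candidates by flatness, since a suction cup generally seals better on flat surfaces. This is an assumption for illustration, not the authors' pipeline: the function names `fit_plane` and `rank_candidates`, the 2 cm patch radius, and the synthetic point cloud are all hypothetical, and the VLM prompting step is not shown.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through an (N, 3) array.

    Returns (centroid, unit normal, RMS distance of the points to the plane).
    """
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    if normal[2] > 0:  # assumed convention: camera looks along +z, normal faces it
        normal = -normal
    residual = np.sqrt(np.mean(((points - centroid) @ normal) ** 2))
    return centroid, normal, residual

def rank_candidates(cloud, candidates, radius=0.02):
    """Rank candidate 3D points by local flatness within `radius` metres."""
    ranked = []
    for idx, c in enumerate(candidates):
        patch = cloud[np.linalg.norm(cloud - c, axis=1) < radius]
        if len(patch) < 3:  # too few points to define a plane
            continue
        _, normal, residual = fit_plane(patch)
        ranked.append((residual, idx, normal))
    ranked.sort(key=lambda t: (t[0], t[1]))  # flattest patch first
    return ranked

# Synthetic scene (hypothetical): one flat patch and one wavy patch.
xs, ys = np.meshgrid(np.linspace(0, 0.1, 21), np.linspace(0, 0.1, 21))
x, y = xs.ravel(), ys.ravel()
flat = np.stack([x, y, np.full(x.size, 0.5)], axis=1)
wavy = np.stack([x + 0.2, y, 0.5 + 0.01 * np.sin(200 * x)], axis=1)
cloud = np.vstack([flat, wavy])

candidates = np.array([[0.05, 0.05, 0.5], [0.25, 0.05, 0.5]])
ranked = rank_candidates(cloud, candidates)
best_residual, best_index, best_normal = ranked[0]
print(best_index, best_normal)
```

In a system like the one described, a ranking of this kind could serve to filter or annotate candidates before a VLM chooses among them, and the fitted normal could set the approach direction of the suction cup. Both uses are suggestions rather than details taken from the paper.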

Taxonomy

Topics: Soft Robotics and Applications · Modular Robots and Swarm Intelligence · Robot Manipulation and Learning