Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

Jiafei Duan; Wentao Yuan; Wilbert Pumacay; Yi Ru Wang; Kiana Ehsani; Dieter Fox; Ranjay Krishna

arXiv:2406.18915·cs.RO·August 30, 2024·5 cites

Open Access

TL;DR

Manipulate-Anything introduces a scalable method for generating real-world robotic manipulation data. It operates without privileged state information or hand-designed skills, outperforms existing approaches, and produces data that trains more robust policies.

Contribution

The paper presents a method that automates real-world robot demonstration data generation without privileged state information or hand-designed skills, and that can manipulate any static object.

Findings

01 Successfully generates trajectories for 7 real-world and 14 simulation tasks.

02 Outperforms existing methods like VoxPoser in data generation.

03 Produces demonstrations that train more robust policies than human demonstrations or other synthetic data.

Abstract

Large-scale endeavors and widespread community efforts such as Open-X-Embodiment have contributed to growing the scale of robot demonstration data. However, there is still an opportunity to improve the quality, quantity, and diversity of robot demonstration data. Although vision-language models have been shown to automatically generate demonstration data, their utility has been limited to environments with privileged state information, they require hand-designed skills, and they are limited to interactions with few object instances. We propose Manipulate-Anything, a scalable automated generation method for real-world robotic manipulation. Unlike prior work, our method can operate in real-world environments without any privileged state information or hand-designed skills, and can manipulate any static object. We evaluate our method using two setups. First, Manipulate-Anything successfully…

Taxonomy

Topics: Robotic Path Planning Algorithms · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques