Manipulate-Anything: Automating Real-World Robots using Vision-Language Models
Jiafei Duan, Wentao Yuan, Wilbert Pumacay, Yi Ru Wang, Kiana Ehsani, Dieter Fox, Ranjay Krishna

TL;DR
Manipulate-Anything introduces a scalable, real-world robotic manipulation data generation method that operates without privileged information or hand-designed skills, outperforming existing approaches and enabling robust policy training.
Contribution
The paper presents a method that automates real-world robot demonstration data generation without privileged state information or hand-designed skills, and that can manipulate any static object.
Findings
Successfully generates trajectories for 7 real-world and 14 simulation tasks.
Outperforms existing methods like VoxPoser in data generation.
Produces demonstrations that train more robust policies than human or other synthetic data.
Abstract
Large-scale endeavors like and widespread community efforts such as Open-X-Embodiment have contributed to growing the scale of robot demonstration data. However, there is still an opportunity to improve the quality, quantity, and diversity of robot demonstration data. Although vision-language models have been shown to automatically generate demonstration data, their utility has been limited to environments with privileged state information, they require hand-designed skills, and are limited to interactions with few object instances. We propose Manipulate-Anything, a scalable automated generation method for real-world robotic manipulation. Unlike prior work, our method can operate in real-world environments without any privileged state information, hand-designed skills, and can manipulate any static object. We evaluate our method using two setups. First, Manipulate-Anything successfully…
Taxonomy
Topics: Robotic Path Planning Algorithms · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
