MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation
Abhay Deshpande, Maya Guru, Rose Hendrix, Snehal Jauhri, Ainaz Eftekhar, Rohun Tripathi, Max Argus, Jordi Salvador, Haoquan Fang, Matthew Wallingford, Wilbert Pumacay, Yejin Kim, Quinn Pfeifer, Ying-Chun Lee, Piper Wolters, Omar Rayyan, Mingtong Zhang, Jiafei Duan, Karen Farley

TL;DR
This paper demonstrates that large-scale, diverse simulated training data can enable zero-shot transfer of manipulation policies to real robots without real-world fine-tuning, challenging the common belief that simulation alone is insufficient.
Contribution
The authors introduce MolmoBot-Engine for procedural data generation and release a large dataset, enabling zero-shot sim-to-real transfer for manipulation tasks with diverse environments.
Findings
Zero-shot transfer achieves a 79.2% success rate on real robots.
Diverse synthetic data improves robustness and generalization.
Open-source pipeline facilitates scalable simulation-based policy training.
Abstract
A prevailing view in robot learning is that simulation alone is not enough; effective sim-to-real transfer is widely believed to require at least some real-world data collection or task-specific fine-tuning to bridge the gap between simulated and physical environments. We challenge that assumption. With sufficiently large-scale and diverse simulated synthetic training data, we show that zero-shot transfer to the real world is not only possible, but effective for both static and mobile manipulation. We introduce MolmoBot-Engine, a fully open-source pipeline for procedural data generation across robots, tasks, and diverse simulated environments in MolmoSpaces. With it, we release MolmoBot-Data, a dataset of 1.8 million expert trajectories for articulated object manipulation and pick-and-place tasks. We train three policy classes: MolmoBot, a Molmo2-based multi-frame vision-language model…
Code & Models
- 🤗 allenai/MolmoBot-Pi0-DROID · model · ♡ 3
- 🤗 allenai/MolmoBot-DROID · model · 32 dl · ♡ 2
- 🤗 allenai/MolmoBot-Img-DROID · model · 15 dl · ♡ 2
- 🤗 allenai/MolmoBot-Ablation-MF3-DROID · model · 10 dl · ♡ 2
- 🤗 allenai/MolmoBot-SPOC-RBY1Rigid · model · ♡ 2
- 🤗 allenai/MolmoBot-RBY1DoorOpening · model · 8 dl · ♡ 2
- 🤗 allenai/MolmoBot-RBY1Multitask · model · 11 dl · ♡ 2
- 🤗 allenai/MolmoBot-SPOC-RBY1Articulated · model · ♡ 1
- 🤗 allenai/MolmoBot-SPOC-DROID · model · ♡ 4
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Reinforcement Learning in Robotics
