MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation

Abhay Deshpande; Maya Guru; Rose Hendrix; Snehal Jauhri; Ainaz Eftekhar; Rohun Tripathi; Max Argus; Jordi Salvador; Haoquan Fang; Matthew Wallingford; Wilbert Pumacay; Yejin Kim; Quinn Pfeifer; Ying-Chun Lee; Piper Wolters; Omar Rayyan; Mingtong Zhang; Jiafei Duan; Karen Farley; Winson Han; Eli Vanderbilt; Dieter Fox; Ali Farhadi; Georgia Chalvatzaki; Dhruv Shah; Ranjay Krishna

arXiv:2603.16861·cs.RO·March 27, 2026

Open Access · 9 Models · 1 Dataset

TL;DR

This paper demonstrates that large-scale, diverse simulated training data can enable zero-shot transfer of manipulation policies to real robots without real-world fine-tuning, challenging the common belief that simulation alone is insufficient.

Contribution

The authors introduce MolmoBot-Engine, an open-source pipeline for procedural data generation, and release MolmoBot-Data, a dataset of 1.8 million expert trajectories. Together these enable zero-shot sim-to-real transfer for manipulation tasks across diverse environments.
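
To make the idea of procedural data generation concrete, here is a minimal sketch of seeded scene randomization. It is not MolmoBot-Engine's API: every name (`SceneConfig`, `sample_scene`, `generate_configs`), every parameter, and every numeric range is a hypothetical chosen for illustration. Only the robot and task categories echo the abstract (static and mobile manipulation, pick-and-place, articulated objects).

```python
import random
from dataclasses import dataclass

# Hypothetical categories; the labels are illustrative, not taken from the
# MolmoBot-Engine codebase.
ROBOTS = ("static_arm", "mobile_manipulator")
TASKS = ("pick_and_place", "articulated_object")


@dataclass(frozen=True)
class SceneConfig:
    """One randomized scene description (all fields are assumptions)."""
    robot: str
    task: str
    num_distractors: int
    light_intensity: float
    camera_jitter_deg: float


def sample_scene(rng: random.Random) -> SceneConfig:
    """Draw one scene by sampling each factor independently."""
    return SceneConfig(
        robot=rng.choice(ROBOTS),
        task=rng.choice(TASKS),
        num_distractors=rng.randint(0, 8),         # inclusive bounds
        light_intensity=rng.uniform(0.3, 1.5),     # arbitrary illustrative range
        camera_jitter_deg=rng.uniform(-5.0, 5.0),  # arbitrary illustrative range
    )


def generate_configs(n: int, seed: int = 0) -> list[SceneConfig]:
    """Generate n scene configs reproducibly from a single seed."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]


if __name__ == "__main__":
    for cfg in generate_configs(3, seed=42):
        print(cfg)
```

A single seeded generator keeps the sampled scenes reproducible, so a given configuration can be regenerated later. A real pipeline would pass each configuration to a simulator and an expert policy to record a trajectory; that step is omitted here.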

Findings

01. Zero-shot transfer achieves a 79.2% success rate on real robots.

02. Diverse synthetic data improves robustness and generalization.

03. The open-source pipeline facilitates scalable simulation-based policy training.

Abstract

A prevailing view in robot learning is that simulation alone is not enough; effective sim-to-real transfer is widely believed to require at least some real-world data collection or task-specific fine-tuning to bridge the gap between simulated and physical environments. We challenge that assumption. With sufficiently large-scale and diverse simulated synthetic training data, we show that zero-shot transfer to the real world is not only possible, but effective for both static and mobile manipulation. We introduce MolmoBot-Engine, a fully open-source pipeline for procedural data generation across robots, tasks, and diverse simulated environments in MolmoSpaces. With it, we release MolmoBot-Data, a dataset of 1.8 million expert trajectories for articulated object manipulation and pick-and-place tasks. We train three policy classes: MolmoBot, a Molmo2-based multi-frame vision-language model…

Code & Models

Models

Datasets

allenai/molmobot-data (dataset · 2.9k downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Reinforcement Learning in Robotics