MolmoAct2: Action Reasoning Models for Real-world Deployment
Haoquan Fang, Jiafei Duan, Donovan Clay, Sam Wang, Shuo Liu, Weikai Huang, Xiang Fan, Wei-Chuan Tsai, Shirui Chen, Yi Ru Wang, Shanli Xing, Jaemin Cho, Jae Sung Park, Ainaz Eftekhar, Peter Sushko, Karen Farley, Angad Wadhwa, Cole Harrison, Winson Han, Ying-Chun Lee

TL;DR
MolmoAct2 is an open, practical vision-language-action model for robots that advances reasoning, datasets, and architecture, outperforming prior models in extensive real-world and simulation benchmarks.
Contribution
It introduces MolmoER (a VLM backbone specialized for spatial and embodied reasoning), three new datasets, an open-weight action tokenizer, a new architecture that uses flow matching to produce continuous actions, and an adaptive reasoning variant.
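For readers unfamiliar with flow matching, the sketch below illustrates the generic idea on a toy action vector. It is not the MolmoAct2 implementation: the linear noise-to-action path, the convention that t = 0 is noise and t = 1 is the action, and every function name here are assumptions chosen for illustration.

```python
# Generic flow-matching sketch in pure Python (illustrative only, not the paper's code).
# Assumed convention: t = 0 is pure noise, t = 1 is the target action.

def interpolate(noise, action, t):
    """Point on the straight-line path from noise (t=0) to action (t=1)."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]

def target_velocity(noise, action):
    """Velocity of the straight-line path; constant in t."""
    return [a - n for n, a in zip(noise, action)]

def flow_matching_loss(predicted_velocity, noise, action):
    """Mean squared error between a predicted velocity and the path velocity."""
    target = target_velocity(noise, action)
    return sum((p - v) ** 2 for p, v in zip(predicted_velocity, target)) / len(target)

def euler_sample(velocity_fn, noise, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with forward Euler."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy check: an oracle velocity field that knows the target action.
# A trained network would replace this lambda in practice.
noise = [0.5, -1.0, 2.0]
action = [0.1, 0.2, -0.3]
oracle = lambda x, t: [(a - xi) / (1.0 - t) for a, xi in zip(action, x)]
sampled = euler_sample(oracle, noise, steps=10)
```

In this toy setup the oracle field carries the noise sample to the target action, so `sampled` matches `action` up to floating-point error. A learned model would be trained by minimizing `flow_matching_loss` at randomly drawn values of `t`.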
Findings
MolmoAct2 outperforms strong baselines across real-world and simulation benchmarks.
MolmoER surpasses GPT-5 and Gemini ER-1.5 in embodied reasoning.
The model and datasets are openly released for community use.
Abstract
Vision-Language-Action (VLA) models aim to provide a single generalist controller for robots, but today's systems fall short on the criteria that matter for real-world deployment. Frontier models are closed, open-weight alternatives are tied to expensive hardware, reasoning-augmented policies pay prohibitive latency for their grounding, and fine-tuned success rates remain below the threshold for dependable use. We present MolmoAct2, a fully open action reasoning model built for practical deployment, advancing its predecessor along five axes. We introduce MolmoER, a VLM backbone specialized for spatial and embodied reasoning, trained on a 3.3M-sample corpus with a specialize-then-rehearse recipe. We release three new datasets spanning low-to-medium cost platforms, including MolmoAct2-BimanualYAM, 720 hours of teleoperated bimanual trajectories that constitute the largest open bimanual…
Code & Models
- 🤗 allenai/Molmo2-ER · model · 78k dl · ♡ 11
- 🤗 allenai/MolmoAct2 · model · 384 dl · ♡ 12
- 🤗 allenai/MolmoAct2-SO100_101 · model · 1.6k dl · ♡ 10
- 🤗 allenai/MolmoAct2-Think-LIBERO · model · 283 dl · ♡ 4
- 🤗 allenai/MolmoAct2-Think · model · 314 dl · ♡ 2
- 🤗 allenai/MolmoAct2-Pretrain · model · 743 dl · ♡ 4
- 🤗 allenai/MolmoAct2-DROID · model · 502 dl · ♡ 4
- 🤗 allenai/MolmoAct2-BimanualYAM · model · 2.6k dl · ♡ 4
- 🤗 allenai/MolmoAct2-LIBERO · model · 462 dl · ♡ 2
- 🤗 allenai/MolmoAct2-FAST-Tokenizer · model · ♡ 4
- allenai/10122025-box-01 · dataset · 579 dl
- allenai/24122025-foldclo-03 · dataset · 377 dl
- allenai/20012026-charging-03 · dataset · 650 dl
- allenai/21012026-charging-02 · dataset · 456 dl
- allenai/MolmoAct2-LIBERO-Dataset · dataset · 964 dl
- allenai/24112025-yam-01 · dataset · 2.1k dl
- allenai/25112025-yam-02 · dataset · 479 dl
- allenai/25112025-yam-03 · dataset · 286 dl
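The repositories listed above can presumably be fetched with the `huggingface_hub` client. The sketch below assumes that they are hosted on the Hugging Face Hub under these exact IDs and that the `huggingface_hub` package is installed. The helper names `hub_url` and `download` are hypothetical, and how to load a checkpoint once downloaded is not covered here.

```python
# Hypothetical helpers for locating and fetching the listed repositories.

def hub_url(repo_id, repo_type="model"):
    """Build the Hub page URL; dataset repositories live under /datasets/."""
    prefix = "datasets/" if repo_type == "dataset" else ""
    return f"https://huggingface.co/{prefix}{repo_id}"

def download(repo_id, repo_type="model"):
    """Download a full repository snapshot and return its local folder path."""
    # Imported lazily so that hub_url works without the package installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, repo_type=repo_type)

model_page = hub_url("allenai/MolmoAct2")
dataset_page = hub_url("allenai/MolmoAct2-LIBERO-Dataset", repo_type="dataset")
```

Calling `download("allenai/MolmoAct2")` would pull the model snapshot into the local cache. Passing `repo_type="dataset"` does the same for the dataset entries.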
