MolmoWeb: Open Visual Web Agent and Open Data for the Open Web
Tanmay Gupta, Piper Wolters, Zixian Ma, Peter Sushko, Rock Yuren Pang, Diego Llanes, Yue Yang, Taira Anderson, Boyuan Zheng, Zhongzheng Ren, Harsh Trivedi, Taylor Blanton, Caleb Ouellette, Winson Han, Ali Farhadi, and Ranjay Krishna

TL;DR
MolmoWeb introduces fully open multimodal web agents trained on diverse data, achieving state-of-the-art performance on web navigation benchmarks without proprietary models.
Contribution
The paper presents MolmoWeb, a family of open, multimodal web agents trained on a large, diverse dataset, outperforming proprietary models on key benchmarks.
Findings
MolmoWeb agents outperform similar-scale open models on web benchmarks.
Test-time scaling improves success rates significantly.
Open data and models will be released for reproducibility.
Abstract
Web agents--autonomous systems that navigate and execute tasks on the web on behalf of users--have the potential to transform how people interact with the digital world. However, the most capable web agents today rely on proprietary models with undisclosed training data and recipes, limiting scientific understanding, reproducibility, and community-driven progress. We believe agents for the open web should be built in the open. To this end, we introduce (1) MolmoWebMix, a large and diverse mixture of browser task demonstrations and web-GUI perception data and (2) MolmoWeb, a family of fully open multimodal web agents. Specifically, MolmoWebMix combines over 100K synthetic task trajectories from multiple complementary generation pipelines with 30K+ human demonstrations, atomic web-skill trajectories, and GUI perception data, including referring expression grounding and screenshot…
Code & Models
- allenai/MolmoWeb-SyntheticTrajs (dataset, 1.7k downloads)
- allenai/MolmoWeb-SyntheticGround (dataset, 632 downloads)
- allenai/MolmoWeb-HumanSkills (dataset, 2.4k downloads)
- allenai/MolmoWeb-SyntheticSkills (dataset, 382 downloads)
- allenai/MolmoWeb-HumanTrajs (dataset, 2.6k downloads)
- allenai/MolmoWeb-SyntheticQA (dataset, 1.1k downloads)
