OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Kaichen Zhang, Keming Wu, Zuhao Yang, Bo Li, Kairui Hu, Bin Wang, Ziwei Liu, Xingxuan Li, Lidong Bing

TL;DR
OpenMMReasoner introduces a transparent, two-stage training recipe for multimodal reasoning that improves performance across nine benchmarks by emphasizing data quality and training design.
Contribution
It presents a fully transparent, reproducible two-stage training approach combining supervised fine-tuning and reinforcement learning for multimodal reasoning.
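The ordering of the two stages can be sketched as a small, hypothetical recipe description. This is an illustration under stated assumptions, not the authors' code. The class `Stage`, the constant `RECIPE`, and the function `run_recipe` are invented names. The dataset sizes are the rounded figures from the abstract. The RL objective label ("verifiable rewards") is an assumption, since this summary does not name the RL algorithm. Training is stubbed out so that only the chaining is shown: RL starts from the SFT checkpoint, not from the base model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str         # short stage label, e.g. "SFT" or "RL"
    objective: str    # training objective used in this stage (assumed wording)
    num_samples: int  # approximate dataset size, rounded as in the abstract


# Hypothetical description of the two-stage recipe (874K cold-start SFT
# samples, 74K RL samples); the RL objective label is an assumption.
RECIPE = [
    Stage("SFT", "next-token cross-entropy on validated reasoning traces", 874_000),
    Stage("RL", "policy optimization with verifiable rewards (assumed)", 74_000),
]


def run_recipe(base_checkpoint: str, stages: List[Stage]) -> List[str]:
    """Return the chain of checkpoints; each stage starts from the previous one.

    Training is stubbed out: this only names each stage's output checkpoint.
    """
    checkpoints = [base_checkpoint]
    for stage in stages:
        # A real pipeline would train on stage.num_samples examples here.
        checkpoints.append(f"{checkpoints[-1]}+{stage.name}")
    return checkpoints
```

For example, `run_recipe("base-vlm", RECIPE)` yields `["base-vlm", "base-vlm+SFT", "base-vlm+SFT+RL"]`, which makes explicit that the RL stage refines the cold-start model.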
Findings
Achieves an 11.6% improvement over the baseline across nine benchmarks.
Constructs an 874K-sample cold-start dataset with step-by-step validation for the supervised fine-tuning stage.
Demonstrates the importance of data quality and training design.
Abstract
Recent advancements in large reasoning models have fueled growing interest in extending such capabilities to multimodal domains. However, despite notable progress in visual reasoning, the lack of transparent and reproducible data curation and training strategies remains a major barrier to scalable research. In this work, we introduce OpenMMReasoner, a fully transparent two-stage recipe for multimodal reasoning spanning supervised fine-tuning (SFT) and reinforcement learning (RL). In the SFT stage, we construct an 874K-sample cold-start dataset with rigorous step-by-step validation, providing a strong foundation for reasoning capabilities. The subsequent RL stage leverages a 74K-sample dataset across diverse domains to further sharpen and stabilize these abilities, resulting in a more robust and efficient learning process. Extensive evaluations demonstrate that our training recipe not…
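The abstract does not say how the "step-by-step validation" of the cold-start data is performed. One common pattern is rejection sampling with answer verification: keep a sampled reasoning trace only if its final answer matches the reference. The sketch below shows that pattern as a hypothetical stand-in, not OpenMMReasoner's actual procedure. The `\boxed{...}` answer format and the helper names `extract_final_answer`, `answer_reward` and `filter_traces` are all assumptions. The same binary check could also serve as a verifiable reward in an RL stage, but whether the paper's reward takes this form is not stated here.

```python
import re
from typing import Iterable, List, Optional, Tuple


def extract_final_answer(trace: str) -> Optional[str]:
    """Pull the answer out of a \\boxed{...} span (assumed output format)."""
    match = re.search(r"\\boxed\{([^{}]*)\}", trace)
    return match.group(1).strip() if match else None


def normalize(answer: str) -> str:
    # Light normalization so that "  42 " and "42" compare equal.
    return answer.strip().lower().replace(" ", "")


def answer_reward(trace: str, reference: str) -> float:
    """Binary check: 1.0 if the trace's final answer matches the reference."""
    predicted = extract_final_answer(trace)
    if predicted is None:
        return 0.0  # no parseable answer counts as a failure
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0


def filter_traces(
    samples: Iterable[Tuple[str, str, List[str]]],
) -> List[Tuple[str, str]]:
    """Keep (question, trace) pairs whose trace passes answer verification.

    Each sample is (question, reference_answer, candidate_traces), e.g.
    several traces sampled from a teacher model for the same question.
    """
    kept = []
    for question, reference, traces in samples:
        for trace in traces:
            if answer_reward(trace, reference) == 1.0:
                kept.append((question, trace))
    return kept
```

For instance, given one question with reference answer `"42"` and three candidate traces ending in `\boxed{42}`, `\boxed{41}`, and no boxed answer, `filter_traces` keeps only the first trace. A real pipeline would likely add checks beyond final-answer matching, such as format or length filters.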
Code & Models
- 🤗 OpenMMReasoner/OpenMMReasoner-RL · model · 170 dl · ♡ 16
- 🤗 OpenMMReasoner/OpenMMReasoner-ColdStart · model · 7.2k dl · ♡ 3
- 🤗 AIcell/Frankenstein-IN · model · 1 dl
- 🤗 AIcell/Frankenstein-RL · model · 1 dl
- 🤗 AIcell/Frankenstein-RL-Frozen-Late · model · 2 dl
- 🤗 AIcell/Frankenstein-RL-Frozen-Early · model · 1 dl
- 🤗 AIcell/Frankenstein-RL-Frozen-Mid · model
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Topic Modeling
