OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Kaichen Zhang; Keming Wu; Zuhao Yang; Bo Li; Kairui Hu; Bin Wang; Ziwei Liu; Xingxuan Li; Lidong Bing

arXiv:2511.16334 · cs.AI · December 8, 2025

Open Access · 7 Models · 3 Datasets

TL;DR

OpenMMReasoner introduces a transparent, two-stage training recipe for multimodal reasoning that significantly improves performance across multiple benchmarks by emphasizing data quality and training design.

Contribution

It presents a fully transparent, reproducible two-stage training approach combining supervised fine-tuning and reinforcement learning for multimodal reasoning.

Findings

01. Achieves an 11.6% improvement over the baseline across nine multimodal reasoning benchmarks.

02. Constructs an 874K-sample cold-start dataset, validated step by step, for the SFT stage.

03. Demonstrates the importance of data quality and training design.

Abstract

Recent advancements in large reasoning models have fueled growing interest in extending such capabilities to multimodal domains. However, despite notable progress in visual reasoning, the lack of transparent and reproducible data curation and training strategies remains a major barrier to scalable research. In this work, we introduce OpenMMReasoner, a fully transparent two-stage recipe for multimodal reasoning spanning supervised fine-tuning (SFT) and reinforcement learning (RL). In the SFT stage, we construct an 874K-sample cold-start dataset with rigorous step-by-step validation, providing a strong foundation for reasoning capabilities. The subsequent RL stage leverages a 74K-sample dataset across diverse domains to further sharpen and stabilize these abilities, resulting in a more robust and efficient learning process. Extensive evaluations demonstrate that our training recipe not…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Topic Modeling