MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
Honglin Lin, Zheng Liu, Yun Zhu, Chonghan Qin, Juekai Lin, Xiaoran Shang, Conghui He, Wentao Zhang, Lijun Wu

TL;DR
This paper introduces MMFineReason, a large-scale multimodal reasoning dataset with high-quality annotations, enabling models to achieve state-of-the-art performance across diverse challenging visual reasoning tasks.
Contribution
The creation of MMFineReason, a comprehensive dataset with systematic quality filtering and reasoning annotations, and the development of fine-tuned models that outperform larger proprietary systems.
Findings
Models trained on MMFineReason outperform existing open-source models.
A small, high-quality subset of data achieves comparable performance to the full dataset.
Reasoning-oriented data enhances general multimodal capabilities.
Abstract
Recent advances in Vision Language Models (VLMs) have driven significant progress in visual reasoning. However, open-source VLMs still lag behind proprietary systems, largely due to the lack of high-quality reasoning data. Existing datasets offer limited coverage of challenging domains such as STEM diagrams and visual puzzles, and lack consistent, long-form Chain-of-Thought (CoT) annotations essential for eliciting strong reasoning capabilities. To bridge this gap, we introduce MMFineReason, a large-scale multimodal reasoning dataset comprising 1.8M samples and 5.1B solution tokens, featuring high-quality reasoning annotations distilled from Qwen3-VL-235B-A22B-Thinking. The dataset is established via a systematic three-stage pipeline: (1) large-scale data collection and standardization, (2) CoT rationale generation, and (3) comprehensive selection based on reasoning quality and…
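The three-stage pipeline named in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the `Sample` schema, the raw field names, the `fake_teacher` stand-in for the teacher model, and the toy quality check in `keep` are all hypothetical. The abstract's description of stage 3 is truncated in this listing, so only a reasoning-quality check is sketched and the cut-off second criterion is left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Sample:
    # Hypothetical unified schema; the real dataset's fields may differ.
    image_path: str
    question: str
    answer: str
    rationale: Optional[str] = None


def standardize(raw: dict) -> Sample:
    """Stage 1: map a raw record onto one shared schema."""
    return Sample(
        image_path=raw["image"],
        question=raw["question"].strip(),
        answer=str(raw["answer"]).strip(),
    )


def annotate(sample: Sample, teacher: Callable[[str, str], str]) -> Sample:
    """Stage 2: ask a teacher model for a chain-of-thought rationale."""
    sample.rationale = teacher(sample.image_path, sample.question)
    return sample


def keep(sample: Sample, min_words: int = 5) -> bool:
    """Stage 3 (assumed toy criterion): the rationale is long enough
    and ends with the reference answer."""
    if not sample.rationale:
        return False
    long_enough = len(sample.rationale.split()) >= min_words
    consistent = sample.rationale.rstrip(". ").endswith(sample.answer)
    return long_enough and consistent


def build_dataset(
    raw_records: Iterable[dict], teacher: Callable[[str, str], str]
) -> List[Sample]:
    """Run all three stages and return only the samples that pass the filter."""
    samples = (annotate(standardize(r), teacher) for r in raw_records)
    return [s for s in samples if keep(s)]


def fake_teacher(image_path: str, question: str) -> str:
    # Placeholder for a real teacher model call; always returns the same text.
    return "The triangle has sides 3 and 4, so the hypotenuse is 5"


raw = [
    {"image": "tri.png", "question": " Hypotenuse? ", "answer": 5},
    {"image": "tri.png", "question": "Area?", "answer": 6},
]
dataset = build_dataset(raw, fake_teacher)
```

Here the second record is dropped because the placeholder rationale does not end with its reference answer. A real pipeline would replace `fake_teacher` with a call to the teacher model and `keep` with the paper's actual selection criteria.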
Code & Models
- OpenDataArena/MMFineReason-1.8M-Qwen3-VL-235B-Thinking (dataset · 1.5k downloads)
- OpenDataArena/MMFineReason-SFT-123K-Qwen3-VL-235B-Thinking (dataset · 680 downloads)
- OpenDataArena/MMFineReason-Full-2.3M-Qwen3-VL-235B-Thinking (dataset · 3.8k downloads)
- OpenDataArena/MMFineReason-SFT-586K-Qwen3-VL-235B-Thinking (dataset · 649 downloads)
- OcasAI/FineReason-1.8M-Qwen3-VL-235B-Thinking (dataset · 78 downloads)
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Advanced Graph Neural Networks
