TL;DR
M2-Reasoning-7B is a multimodal model that pairs a new data-generation pipeline with dynamic multi-task training to strengthen both general and spatial reasoning. It achieves state-of-the-art results on 8 benchmarks.
Contribution
The paper introduces a novel data pipeline and a dynamic multi-task training strategy that together strengthen the reasoning capabilities of multimodal large language models (MLLMs).
Findings
Achieved SOTA performance on 8 reasoning benchmarks.
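Generated 294.2K high-quality reasoning samples: 168K for cold-start fine-tuning and 126.2K for reinforcement learning with verifiable rewards (RLVR).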
Effectively integrated spatial and general reasoning in a single model.
Abstract
Recent advancements in Multimodal Large Language Models (MLLMs), particularly through Reinforcement Learning with Verifiable Rewards (RLVR), have significantly enhanced their reasoning abilities. However, a critical gap persists: these models struggle with dynamic spatial interactions, a capability essential for real-world applications. To bridge this gap, we introduce M2-Reasoning-7B, a model designed to excel in both general and spatial reasoning. Our approach integrates two key innovations: (1) a novel data pipeline that generates 294.2K high-quality data samples (168K for cold-start fine-tuning and 126.2K for RLVR), which feature logically coherent reasoning trajectories and have undergone comprehensive assessment; and (2) a dynamic multi-task training strategy with step-wise optimization to mitigate conflicts between data, and task-specific rewards for delivering tailored incentive…
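The abstract mentions task-specific rewards for RLVR but does not say what they are. The sketch below shows one way per-task verifiable rewards could be dispatched. It assumes the following: the model wraps its final answer in hypothetical `<answer>` tags; general questions use a binary exact-match reward; and numeric spatial answers (for example, distances) get partial credit that decays linearly to zero at a 10% relative error. The function names, tag format, and tolerance are illustrative assumptions, not the paper's reward design.

```python
import re
from typing import Callable, Dict


def extract_answer(completion: str) -> str:
    """Return the text inside the last <answer>...</answer> tag (hypothetical format), or ''."""
    matches = re.findall(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return matches[-1].strip() if matches else ""


def exact_match_reward(completion: str, reference: str) -> float:
    """Binary reward for general reasoning: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if extract_answer(completion).lower() == reference.strip().lower() else 0.0


def numeric_tolerance_reward(completion: str, reference: str, rel_tol: float = 0.1) -> float:
    """Graded reward for numeric spatial answers (assumed scheme).

    Returns 1.0 at the exact value and decays linearly to 0.0 at `rel_tol` relative error.
    """
    try:
        pred = float(extract_answer(completion))
        ref = float(reference)
    except ValueError:
        return 0.0  # missing or non-numeric answer earns nothing
    if ref == 0:
        return 1.0 if pred == 0 else 0.0
    rel_err = abs(pred - ref) / abs(ref)
    return max(0.0, 1.0 - rel_err / rel_tol)


# Hypothetical task registry: each task type maps to its own verifiable reward.
REWARDS: Dict[str, Callable[[str, str], float]] = {
    "general": exact_match_reward,
    "spatial_numeric": numeric_tolerance_reward,
}


def task_reward(task: str, completion: str, reference: str) -> float:
    """Dispatch to the reward function registered for this task type."""
    return REWARDS[task](completion, reference)


if __name__ == "__main__":
    print(task_reward("general", "<think>...</think><answer>B</answer>", "b"))
    print(task_reward("spatial_numeric", "<answer>2.1</answer>", "2.0"))
```

Keeping a separate function per task lets a strict binary check coexist with a graded score for continuous spatial quantities, where an answer that is close but not exact still carries a useful learning signal.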
