MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization

Xingxuan Li; Yao Xiao; Dianwen Ng; Hai Ye; Yue Deng; Xiang Lin; Bin Wang; Zhanfeng Mo; Chong Zhang; Yueyi Zhang; Zonglin Yang; Ruilin Li; Lei Lei; Shihao Xu; Han Zhao; Weiling Chen; Feng Ji; Lidong Bing

arXiv:2507.14683·cs.CL·July 22, 2025

TL;DR

MiroMind-M1 introduces an open-source, multi-stage trained mathematical reasoning language model with a novel context-aware policy optimization, achieving state-of-the-art results and enhancing reproducibility in reasoning tasks.

Contribution

The paper presents the first fully open-source reasoning language model (RLM) trained on curated math datasets, using a new context-aware multi-stage policy optimization algorithm.

Findings

1. Achieves state-of-the-art performance on math benchmarks.
2. Demonstrates superior token efficiency compared to similar models.
3. Provides comprehensive open-source resources for reproducibility.

Abstract

Large language models have recently evolved from fluent text generation to advanced reasoning across diverse domains, giving rise to reasoning language models. Among these domains, mathematical reasoning serves as a representative benchmark as it requires precise multi-step logic and abstract reasoning, which can be generalized to other tasks. While closed-source RLMs such as GPT-o3 demonstrate impressive reasoning capabilities, their proprietary nature limits transparency and reproducibility. Although many open-source projects aim to close this gap, most of them lack sufficient openness by omitting critical resources such as datasets and detailed training configurations, which hinders reproducibility. To contribute toward greater transparency in RLM development, we introduce the MiroMind-M1 series, a set of fully open-source RLMs built on the Qwen-2.5 backbone that match or exceed the…
