WarriorMath: Enhancing the Mathematical Ability of Large Language Models with a Defect-aware Framework

Yue Chen; Minghua He; Fangkai Yang; Pu Zhao; Lu Wang; Yu Kang; Yifei Dong; Yuefeng Zhan; Hao Sun; Qingwei Lin; Saravan Rajmohan; Dongmei Zhang

arXiv:2508.01245·cs.CL·August 5, 2025

Open Access

TL;DR

WarriorMath is a framework that improves the mathematical problem solving of large language models in two stages: multiple expert LLMs collaborate to synthesize defect-aware training data from problems that base models fail to solve, and the model is then fine-tuned progressively on that data. The authors report state-of-the-art performance.

Contribution

The paper introduces a defect-aware data synthesis process and a progressive training strategy that target the specific failure modes of LLMs, improving their mathematical problem-solving capabilities.

Findings

01 Outperforms baselines by 12.57% on average across six benchmarks.

02 Produces high-quality, defect-aware training data through collaboration among multiple expert LLMs.

03 Sets a new state of the art in mathematical problem solving for LLMs.

Abstract

Large Language Models (LLMs) excel in solving mathematical problems, yet their performance is often limited by the availability of high-quality, diverse training data. Existing methods focus on augmenting datasets through rephrasing or difficulty progression but overlook the specific failure modes of LLMs. This results in synthetic questions that the model can already solve, providing minimal performance gains. To address this, we propose WarriorMath, a defect-aware framework for mathematical problem solving that integrates both targeted data synthesis and progressive training. In the synthesis stage, we employ multiple expert LLMs in a collaborative process to generate, critique, and refine problems. Questions that base LLMs fail to solve are identified and iteratively improved through expert-level feedback, producing high-quality, defect-aware training data. In the training stage, we…
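
The synthesis stage described in the abstract (find the questions a base model fails on, then improve them with expert feedback) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `Problem`, `select_defect_aware`, `toy_base_solver`, and `toy_refine` are hypothetical names, the "models" are plain functions standing in for LLM calls, and exact string matching of answers is an assumed simplification.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Problem:
    question: str
    answer: str  # reference answer (assumed to come from the expert models)


def select_defect_aware(
    problems: List[Problem],
    base_solver: Callable[[str], str],
    refine: Callable[[Problem], Problem],
    max_rounds: int = 2,
) -> List[Problem]:
    """Keep only problems the base solver gets wrong, refining each kept one."""
    kept = []
    for problem in problems:
        # A problem the base model already solves is skipped.
        if base_solver(problem.question).strip() == problem.answer.strip():
            continue
        # Assumed stand-in for iterative expert critique and refinement.
        for _ in range(max_rounds):
            problem = refine(problem)
        kept.append(problem)
    return kept


def toy_base_solver(question: str) -> str:
    # Hypothetical base model: handles "a + b" only, otherwise "unknown".
    parts = question.split("+")
    if len(parts) == 2:
        try:
            return str(int(parts[0]) + int(parts[1]))
        except ValueError:
            pass
    return "unknown"


def toy_refine(problem: Problem) -> Problem:
    # Hypothetical expert feedback: here it only normalizes whitespace.
    return Problem(" ".join(problem.question.split()), problem.answer)


candidates = [Problem("2 + 3", "5"), Problem("7  *  6", "42")]
selected = select_defect_aware(candidates, toy_base_solver, toy_refine)
print([p.question for p in selected])  # → ['7 * 6']
```

The addition problem is dropped because the toy solver answers it correctly, while the multiplication problem exposes a failure and is retained after refinement. In a real pipeline the two callables would wrap LLM calls, and answer checking would need something more robust than string equality.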

Taxonomy

Topics: Machine Learning in Materials Science · Mathematics, Computing, and Information Processing · Topic Modeling