Dual-Phase LLM Reasoning: Self-Evolved Mathematical Frameworks
ShaoZhen Liu, Xinting Huang, Houwen Peng, Xin Chen, Xinyang Song, Qi Li, Zhenan Sun

TL;DR
This paper introduces a dual-phase training framework for large language models that enhances their reasoning abilities through self-generated data and difficulty-aware sampling, leading to improved performance on mathematical benchmarks.
Contribution
It proposes a novel two-stage training method combining self-generated chain-of-thought data and dynamic data filtering to boost LLM reasoning capabilities.
Findings
Reasoning chains become more than 4 times longer.
Improved performance on GSM8K and MATH500 benchmarks.
Enhanced handling of complex mathematical problems.
Abstract
In recent years, large language models (LLMs) have demonstrated significant potential in complex reasoning tasks like mathematical problem-solving. However, existing research predominantly relies on reinforcement learning (RL) frameworks while overlooking supervised fine-tuning (SFT) methods. This paper proposes a new two-stage training framework that enhances models' self-correction capabilities through self-generated long chain-of-thought (CoT) data. During the first stage, a multi-turn dialogue strategy guides the model to generate CoT data incorporating verification, backtracking, subgoal decomposition, and backward reasoning, with predefined rules filtering high-quality samples for supervised fine-tuning. The second stage employs a difficulty-aware rejection sampling mechanism to dynamically optimize data distribution, strengthening the model's ability to handle complex problems.…
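The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not give the filtering rules or the sampling schedule, so everything below is an assumption made for illustration: the marker phrases in BEHAVIOR_MARKERS, the min_behaviors threshold, the use of pass rate as a difficulty proxy, and the quota formula are all hypothetical, and generate stands in for whatever model call produces a response. It should not be read as the authors' implementation.

```python
import random
from typing import Callable, Dict, List

# Hypothetical marker phrases for the four reasoning behaviors named in the
# abstract. The paper's actual predefined rules are not given in this summary.
BEHAVIOR_MARKERS = {
    "verification": ("let me check", "verify"),
    "backtracking": ("wait", "go back"),
    "subgoal_decomposition": ("first,", "step 1"),
    "backward_reasoning": ("working backward", "start from the answer"),
}


def passes_rule_filter(cot: str, answer: str, reference: str,
                       min_behaviors: int = 2) -> bool:
    """Stage 1 sketch: keep a self-generated sample only if its final answer
    matches the reference and its text shows enough distinct behaviors."""
    if answer.strip() != reference.strip():
        return False
    text = cot.lower()
    found = sum(
        any(marker in text for marker in markers)
        for markers in BEHAVIOR_MARKERS.values()
    )
    return found >= min_behaviors


def difficulty_aware_rejection_sample(
    problems: List[Dict[str, str]],
    generate: Callable[[str], Dict[str, str]],
    n_samples: int = 8,
    max_keep: int = 4,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Stage 2 sketch: sample several responses per problem, reject incorrect
    ones, and let harder problems (lower pass rate) contribute more samples."""
    rng = random.Random(seed)
    kept: List[Dict[str, str]] = []
    for problem in problems:
        responses = [generate(problem["question"]) for _ in range(n_samples)]
        correct = [r for r in responses
                   if r["answer"].strip() == problem["answer"].strip()]
        if not correct:
            continue  # nothing usable for this problem in this round
        # Assumed difficulty proxy: the fraction of sampled responses that fail.
        difficulty = 1.0 - len(correct) / n_samples
        quota = max(1, round(difficulty * max_keep))
        chosen = rng.sample(correct, min(quota, len(correct)))
        kept.extend({"question": problem["question"], **r} for r in chosen)
    return kept
```

In this sketch, a problem the model always solves contributes a single sample, while a problem it solves only occasionally contributes up to max_keep, which shifts the fine-tuning data toward harder problems. Other schedules, such as dropping fully solved problems, would fit the abstract's description equally well.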
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Multimodal Machine Learning Applications
