Data Diversification Methods In Alignment Enhance Math Performance In LLMs
Berkan Dokmeci, Qingyang Wu, Ben Athiwaratkun, Ce Zhang, Shuaiwen Leon Song, James Zou

TL;DR
This paper demonstrates that data diversification strategies, especially the novel Diversified-ThinkSolve method, significantly improve the mathematical reasoning abilities of large language models with minimal additional computational cost.
Contribution
The paper introduces Diversified-ThinkSolve (DTS), a new structured approach to data diversification that improves mathematical reasoning in LLMs more than the data generation methods it is compared against (temperature sampling, Chain-of-Thought prompting, and MCTS).
Findings
DTS improves accuracy over the base model by 7.1% on GSM8K and 4.2% on MATH.
DTS incurs only 1.03x computational overhead.
MCTS is more computationally costly than DTS and yields smaller performance gains.
Abstract
While recent advances in preference learning have improved alignment with human feedback, mathematical reasoning remains a persistent challenge. We investigate how data diversification strategies in preference optimization can improve the mathematical reasoning abilities of large language models (LLMs). We evaluate three common data generation methods: temperature sampling, Chain-of-Thought prompting, and Monte Carlo Tree Search (MCTS), and introduce Diversified-ThinkSolve (DTS), a novel structured approach that systematically decomposes problems into diverse reasoning paths. Our results show that with strategically diversified preference data, models can substantially improve mathematical reasoning performance, with the best approach yielding gains of 7.1% on GSM8K and 4.2% on MATH over the base model. Despite its strong performance, DTS incurs only a marginal computational overhead…
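The first data generation method named in the abstract, temperature sampling, produces diverse candidate solutions by dividing the model's logits by a temperature before sampling: higher temperatures flatten the distribution and yield more varied outputs. The sketch below is a minimal illustration, assuming a toy list of logits in place of a real LLM. The helper `make_preference_pairs` is hypothetical: it shows one common way (not necessarily the paper's) to turn diverse samples into chosen/rejected pairs for preference optimization, by comparing each sample's final answer against a reference answer.

```python
import math
import random
from typing import List, Tuple


def temperature_softmax(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: List[float], temperature: float, rng: random.Random) -> int:
    """Draw one index from the temperature-scaled distribution."""
    probs = temperature_softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def make_preference_pairs(
    samples: List[Tuple[str, str]], gold_answer: str
) -> List[Tuple[str, str]]:
    """Hypothetical pairing rule (an assumption, not the paper's method).

    Each sample is (solution_text, final_answer). Every solution whose final
    answer matches the reference is paired as 'chosen' with every solution
    whose answer does not match, which is paired as 'rejected'.
    """
    chosen = [text for text, ans in samples if ans == gold_answer]
    rejected = [text for text, ans in samples if ans != gold_answer]
    return [(c, r) for c in chosen for r in rejected]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]
    for t in (0.5, 1.0, 1.5):
        probs = temperature_softmax(logits, t)
        print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))

    rng = random.Random(0)
    print("sampled indices:", [sample_token(logits, 1.5, rng) for _ in range(10)])

    samples = [("sol A", "4"), ("sol B", "5"), ("sol C", "4")]
    print(make_preference_pairs(samples, "4"))
```

Running the script prints the probabilities becoming more uniform as the temperature rises. The pairing rule is deliberately simple; a real pipeline would also have to extract final answers from free-form solutions, which is omitted here.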
