TL;DR
This paper introduces FLAMES, a framework for comparing synthetic data strategies for LLM math reasoning under a common setup, and identifies factors that improve performance and generalization.
Contribution
The paper systematically evaluates 10 data synthesis strategies and develops new methods, providing insights and a dataset that enhance out-of-domain math reasoning.
Findings
Data agents designed to increase problem complexity yield the best improvements on most math metrics.
Under a fixed data generation budget, higher problem coverage matters more than solution reliability.
Synthetic data derived from GSM8K and MATH improves benchmark performance.
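The fixed-budget finding can be made concrete with a toy sketch. This is not the paper's pipeline: the function name `select_under_budget`, the agreement-based reliability score, and the four-problem pool are all hypothetical, chosen only to show how strict solution filtering reduces the number of distinct problems that fit in a budget.

```python
from collections import Counter


def select_under_budget(problems, budget, min_agreement=None):
    """Pick up to `budget` problems from a synthetic pool (illustrative only).

    problems: list of dicts with "id" and "answers" (sampled final answers).
    min_agreement: if set, drop problems whose majority-answer share is below it.
    """
    selected = []
    for p in problems:
        counts = Counter(p["answers"])
        top_answer, top_count = counts.most_common(1)[0]
        agreement = top_count / len(p["answers"])
        if min_agreement is not None and agreement < min_agreement:
            continue  # treated as "unreliable" and filtered out
        selected.append({"id": p["id"], "answer": top_answer, "agreement": agreement})
        if len(selected) == budget:
            break
    return selected


# Hypothetical pool: each problem has three sampled answers.
pool = [
    {"id": "p1", "answers": ["4", "4", "4"]},
    {"id": "p2", "answers": ["7", "9", "7"]},
    {"id": "p3", "answers": ["1", "2", "3"]},
    {"id": "p4", "answers": ["5", "5", "5"]},
]

# Coverage-first: no filtering, so the budget is filled with distinct problems.
coverage_first = select_under_budget(pool, budget=3)

# Reliability-first: keep only problems where every sampled answer agrees.
reliability_first = select_under_budget(pool, budget=3, min_agreement=1.0)

print(len(coverage_first), len(reliability_first))
```

In this toy pool the unfiltered selection fills the budget with three distinct problems, while the strict filter leaves only two. Whether the extra coverage helps a trained model is an empirical question that the paper studies; the sketch only illustrates the trade-off.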
Abstract
Recent works improving LLM math reasoning with synthetic data have used unique setups, making comparison of data synthesis strategies impractical. This leaves many unanswered questions about the roles of different factors in the synthetic data pipeline, such as the impact of filtering low-quality problems. To address this gap, we introduce FLAMES, a Framework for LLM Assessment of Math rEasoning Data Synthesis, and perform a systematic study of 10 existing data synthesis strategies and multiple other factors impacting the performance of synthetic math reasoning data. Our FLAMES experiments provide several valuable insights about the optimal balance of difficulty and diversity of synthetic data. First, data agents designed to increase problem complexity lead to the best improvements on most math metrics. Second, with a fixed data generation budget, keeping higher problem coverage is more…