FLAMES: Improving LLM Math Reasoning via a Fine-Grained Analysis of the Data Synthesis Pipeline

Parker Seegmiller; Kartik Mehta; Soumya Saha; Chenyang Tao; Shereen Oraby; Arpit Gupta; Tagyoung Chung; Mohit Bansal; Nanyun Peng

arXiv:2508.16514 · cs.LG · August 25, 2025


TL;DR

This paper introduces FLAMES, a comprehensive framework for analyzing synthetic data strategies in LLM math reasoning, revealing key factors that improve performance and generalization.

Contribution

The paper systematically evaluates 10 data synthesis strategies and develops new methods, providing insights and a dataset that enhance out-of-domain math reasoning.

Findings

1. Data agents designed to increase problem complexity give the best improvements on most math metrics.
2. With a fixed data generation budget, higher problem coverage outweighs solution reliability.
3. Synthetic data from GSM8K and MATH enhances benchmark performance.

Abstract

Recent works improving LLM math reasoning with synthetic data have used unique setups, making comparison of data synthesis strategies impractical. This leaves many unanswered questions about the roles of different factors in the synthetic data pipeline, such as the impact of filtering low-quality problems. To address this gap, we introduce FLAMES, a Framework for LLM Assessment of Math rEasoning Data Synthesis, and perform a systematic study of 10 existing data synthesis strategies and multiple other factors impacting the performance of synthetic math reasoning data. Our FLAMES experiments provide several valuable insights about the optimal balance of difficulty and diversity of synthetic data. First, data agents designed to increase problem complexity lead to the best improvements on most math metrics. Second, with a fixed data generation budget, keeping higher problem coverage is more…
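The coverage-versus-reliability trade-off in the second finding can be made concrete with a small accounting sketch. This is not the paper's pipeline: it assumes, hypothetically, that the generation budget is counted in sampled solutions, and that "reliable" means a problem's sampled final answers agree by majority vote (a self-consistency style filter swapped in for illustration). The names `Sample`, `problems_covered`, and `majority_filter` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical record: one problem and the final answers of its sampled solutions."""
    problem_id: int
    answers: list[str]


def problems_covered(budget: int, samples_per_problem: int) -> int:
    """Distinct problems a fixed budget of sampled solutions can cover.

    Assumption: each problem costs `samples_per_problem` solutions, so spending
    more samples per problem (to check reliability) leaves fewer problems.
    """
    return budget // samples_per_problem


def majority_filter(samples: list[Sample], min_agreement: float) -> list[tuple[int, str]]:
    """Keep problems whose most common answer reaches `min_agreement` of the samples.

    Assumption: majority agreement is used here as a stand-in for solution
    reliability; returns (problem_id, majority_answer) pairs for kept problems.
    """
    kept = []
    for s in samples:
        answer, count = Counter(s.answers).most_common(1)[0]
        if count / len(s.answers) >= min_agreement:
            kept.append((s.problem_id, answer))
    return kept
```

For example, with a budget of 12 solutions, `problems_covered(12, 1)` covers 12 problems with no reliability check, while `problems_covered(12, 4)` covers only 3 problems before `majority_filter` removes any whose answers disagree. Under these assumptions, the reliability check costs coverage twice: once in the budget split and again in the filter.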
