ControlMath: Controllable Data Generation Promotes Math Generalist Models
Nuo Chen, Ning Wu, Jianhui Chang, Jia Li

TL;DR
ControlMath is an iterative data generation framework that produces diverse math word problems and filters them for quality, improving the generalization of math reasoning models beyond specific domains.
Contribution
The paper presents an iterative method that pairs an equation-generator module with two LLM-based agents to generate and filter diverse math problems, aiming for better generalization with fewer, higher-quality data points.
Findings
Collected ControlMathQA, a dataset of 190k diverse math word problems generated with ControlMath
Combining ControlMathQA with existing datasets improves reasoning performance
Enhanced model generalization beyond specific domains
Abstract
Utilizing large language models (LLMs) for data augmentation has yielded encouraging results in mathematical reasoning. However, these approaches face constraints in problem diversity, potentially restricting them to in-domain/distribution data generation. To this end, we propose ControlMath, an iterative method involving an equation-generator module and two LLM-based agents. The module creates diverse equations, which the Problem-Crafter agent then transforms into math word problems. The Reverse-Agent filters and selects high-quality data, adhering to the "less is more" principle, achieving better results with fewer data points. This approach enables the generation of diverse math problems, not limited to specific domains or distributions. As a result, we collect ControlMathQA, which involves 190k math word problems. Extensive results prove that combining our dataset with in-domain…
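The generate, craft, and filter loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (`Equation`, `generate_equation`, `craft_problem`, `reverse_check`, `build_dataset`) is hypothetical, the Problem-Crafter is replaced by a fixed text template where the paper uses an LLM, and the acceptance rule (recovering the operands from the problem text) is an assumed stand-in, since the abstract does not say how the Reverse-Agent judges quality.

```python
import random
import re
from dataclasses import dataclass

# Supported operations for the toy equation generator (an assumption).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


@dataclass
class Equation:
    a: int
    op: str
    b: int

    @property
    def answer(self) -> int:
        return OPS[self.op](self.a, self.b)


def generate_equation(rng: random.Random, max_operand: int = 50) -> Equation:
    """Equation-generator stand-in: sample a random two-operand equation."""
    return Equation(
        rng.randint(1, max_operand),
        rng.choice(sorted(OPS)),
        rng.randint(1, max_operand),
    )


def craft_problem(eq: Equation) -> str:
    """Problem-Crafter stand-in: a fixed template in place of an LLM call."""
    verbs = {"+": "gains", "-": "loses", "*": "multiplies her stock by"}
    return (
        f"Ana has {eq.a} marbles and then {verbs[eq.op]} {eq.b}. "
        "How many marbles does she have now?"
    )


def reverse_check(problem: str, eq: Equation) -> bool:
    """Reverse-Agent stand-in: keep the problem only if the operands
    recovered from its text match the source equation."""
    numbers = [int(n) for n in re.findall(r"\d+", problem)]
    return numbers == [eq.a, eq.b]


def build_dataset(n_rounds: int, seed: int = 0) -> list[dict]:
    """Run the generate -> craft -> filter loop for n_rounds iterations."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_rounds):
        eq = generate_equation(rng)
        problem = craft_problem(eq)
        if reverse_check(problem, eq):  # discard problems that fail the check
            kept.append({"question": problem, "answer": eq.answer})
    return kept
```

Calling `build_dataset(5)` returns a list of question and answer records. With an LLM in place of the template, the filter step is where noisy or inconsistent problems would be dropped, which is the part of the loop that corresponds to the "less is more" selection.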
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Modeling and Simulation Systems · Model Reduction and Neural Networks
