InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning

Bo-Wen Zhang; Yan Yan; Lin Li; Guang Liu

arXiv:2408.07089·cs.LG·August 15, 2024

TL;DR

InfinityMATH is a scalable, number-independent instruction dataset for mathematical reasoning, significantly improving language models' performance and robustness across diverse math benchmarks.

Contribution

The paper introduces InfinityMATH, a novel scalable dataset construction method that decouples numbers from problems, enabling efficient synthesis and fine-tuning of models for mathematical reasoning.

Findings

1. Models fine-tuned on InfinityMATH show relative improvements of 184.7% to 514.3% on benchmarks.

2. The fine-tuned models remain highly robust on number-variant test sets.

3. InfinityMATH enables scalable and flexible dataset creation for mathematical reasoning.

Abstract

Recent advancements in Chain-of-Thoughts (CoT) and Program-of-Thoughts (PoT) methods have greatly enhanced language models' mathematical reasoning capabilities, facilitating their integration into instruction tuning datasets with LLMs. However, existing methods for large-scale dataset creation require substantial seed data and high computational costs for data synthesis, posing significant challenges for scalability. We introduce InfinityMATH, a scalable instruction tuning dataset for programmatic mathematical reasoning. The construction pipeline emphasizes decoupling numbers from mathematical problems to synthesize number-independent programs, enabling efficient and flexible scaling while minimizing dependency on specific numerical values. Fine-tuning experiments with open-source language and code models, such as Llama2 and CodeLlama, demonstrate the practical benefits of InfinityMATH.…
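
The idea of decoupling numbers from a problem can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the problem template, the function names (`solve`, `instantiate`, `sample_values`, `generate_variants`), and the sampling ranges are all assumptions made for illustration. The sketch shows one number-independent program paired with a templated question, so that new question-answer pairs can be produced by sampling values rather than by synthesizing a new solution each time.

```python
import random

# Hypothetical templated problem: concrete numbers are replaced by named placeholders.
TEMPLATE = (
    "A shop sells pencils at {price} dollars each. Tom buys {count} pencils "
    "and pays with {paid} dollars. How much change does he get?"
)


def solve(price, count, paid):
    """Number-independent program: valid for any values of the placeholders."""
    total_cost = price * count
    return paid - total_cost


def instantiate(template, program, values):
    """Fill the placeholders and run the program on the same values."""
    return {
        "question": template.format(**values),
        "answer": program(**values),
    }


def sample_values(rng):
    """Sample placeholder values (ranges are arbitrary, chosen for illustration)."""
    price = rng.randint(1, 5)
    count = rng.randint(1, 10)
    # Keep the amount paid at least as large as the cost so the change is non-negative.
    paid = rng.randint(price * count, price * count + 50)
    return {"price": price, "count": count, "paid": paid}


def generate_variants(n, seed=0):
    """Produce n number-variant instances of the same underlying problem."""
    rng = random.Random(seed)
    return [instantiate(TEMPLATE, solve, sample_values(rng)) for _ in range(n)]


if __name__ == "__main__":
    for item in generate_variants(3):
        print(item["question"], "->", item["answer"])
```

Because the program takes its numbers as arguments, the same template can also be used to build number-variant test sets of the kind mentioned in the findings, simply by sampling from a different range or seed.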
