MathScale: Scaling Instruction Tuning for Mathematical Reasoning
Zhengyang Tang, Xingxing Zhang, Benyou Wang, Furu Wei

TL;DR
MathScale introduces a scalable method to generate extensive mathematical reasoning datasets using frontier LLMs, significantly enhancing the mathematical problem-solving abilities of open-source models like LLaMA-2 and Mistral.
Contribution
We propose MathScale, a novel approach to create large-scale math reasoning data and demonstrate its effectiveness in improving LLMs' mathematical capabilities.
Findings
Created MathScaleQA with 2 million math QA pairs.
Achieved state-of-the-art performance on MwpBench with MathScale-7B.
Significant accuracy improvements over peers of similar size.
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving. However, their proficiency in solving mathematical problems remains inadequate. We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data using frontier LLMs (e.g., GPT-3.5). Inspired by the cognitive mechanism in human mathematical learning, it first extracts topics and knowledge points from seed math questions and then builds a concept graph, which is subsequently used to generate new math questions. MathScale exhibits effective scalability along the size axis of the math dataset that we generate. As a result, we create a mathematical reasoning dataset (MathScaleQA) containing two million math question-answer pairs. To evaluate the mathematical reasoning abilities of LLMs comprehensively, we construct MwpBench, a benchmark of Math Word Problems,…
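The extract-then-graph-then-generate pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the co-occurrence edge weights, the weighted random walk, the example concepts, and the names `build_concept_graph` and `sample_concepts` are all assumptions made for illustration. In the described method a frontier LLM performs the extraction and writes the new question; here the extraction output is hard-coded and the final step only assembles a prompt string.

```python
import random
from collections import defaultdict
from itertools import combinations


def build_concept_graph(extractions):
    """Count how often two concepts are extracted from the same seed question."""
    graph = defaultdict(lambda: defaultdict(int))
    for concepts in extractions:
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def sample_concepts(graph, start, steps, rng):
    """Weighted random walk from `start`; returns the distinct concepts visited."""
    visited = [start]
    current = start
    for _ in range(steps):
        neighbors = graph.get(current)
        if not neighbors:
            break
        names = list(neighbors)
        weights = [neighbors[n] for n in names]
        current = rng.choices(names, weights=weights, k=1)[0]
        if current not in visited:
            visited.append(current)
    return visited


# Hypothetical extraction output: one list of concepts per seed question.
extractions = [
    ["ratios", "percentages", "unit conversion"],
    ["percentages", "simple interest"],
    ["ratios", "similar triangles"],
]

graph = build_concept_graph(extractions)
concepts = sample_concepts(graph, "ratios", steps=3, rng=random.Random(0))

# The sampled concepts would be passed to an LLM; here we only build the prompt.
prompt = "Write a new math word problem that combines: " + ", ".join(concepts)
```

Sampling along graph edges, rather than drawing concepts independently, keeps each generated prompt anchored to concepts that actually appeared together in some seed question, which is one plausible reason to build a graph at all.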
Code & Models
- 🤗 hkust-nlp/dart-math-mistral-7b-prop2diff · model · 68 dl · ♡ 1
- 🤗 hkust-nlp/dart-math-mistral-7b-uniform · model · 10 dl
- 🤗 hkust-nlp/dart-math-llama3-8b-prop2diff · model · 17 dl · ♡ 1
- 🤗 hkust-nlp/dart-math-llama3-8b-uniform · model · 7 dl · ♡ 2
- 🤗 hkust-nlp/dart-math-dsmath-7b-prop2diff · model · 13 dl · ♡ 3
- 🤗 hkust-nlp/dart-math-llama3-70b-prop2diff · model · 10 dl
- 🤗 hkust-nlp/dart-math-dsmath-7b-uniform · model · 3 dl · ♡ 1
- 🤗 hkust-nlp/dart-math-llama3-70b-uniform · model · 3 dl · ♡ 1
- 🤗 fdqerq22ds/MathScale-Mistral · model · 3 dl · ♡ 3
- 🤗 RichardErkhov/hkust-nlp_-_dart-math-llama3-8b-prop2diff-gguf · model · 232 dl
Taxonomy
Topics: Mathematics Education and Teaching Techniques · Intelligent Tutoring Systems and Adaptive Learning
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Layer Normalization · Byte Pair Encoding · Dropout · Multi-Head Attention · Linear Warmup With Cosine Annealing
