DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving
Yuxuan Tong, Xiwen Zhang, Rui Wang, Ruidong Wu, Junxian He

TL;DR
DART-Math introduces a difficulty-aware rejection tuning method that enhances mathematical problem-solving models by focusing on challenging queries, resulting in superior performance using smaller, publicly available datasets and models.
Contribution
The paper presents DART, a novel method that prioritizes difficult queries during data synthesis, creating smaller, more effective datasets for training mathematical reasoning models without proprietary data.
Findings
DART-Math outperforms previous methods on 6 benchmarks.
Models trained with DART datasets outperform those trained with larger, less focused datasets.
The synthetic datasets created with DART are presented as the most effective and cost-efficient publicly available resources for mathematical problem-solving.
Abstract
Solving mathematical problems requires advanced reasoning abilities and presents notable challenges for large language models. Previous works usually synthesize data from proprietary models to augment existing datasets, followed by instruction tuning to achieve top-tier results. However, our analysis of these datasets reveals severe biases towards easy queries, with frequent failures to generate any correct response for the most challenging queries. Hypothesizing that difficult queries are crucial to learn complex reasoning, we propose Difficulty-Aware Rejection Tuning (DART), a method that allocates difficult queries more trials during the synthesis phase, enabling more extensive training on difficult samples. Utilizing DART, we have created new datasets for mathematical problem-solving that focus more on difficult queries and are substantially smaller than previous ones. Remarkably,…
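The core idea in the abstract, giving difficult queries more sampling trials so that they end up with more correct responses in the training set, can be illustrated with a small sketch. This is not the authors' implementation. It assumes difficulty is approximated by a query's fail rate over a few probe samples, and that the target number of correct responses is made proportional to that fail rate. The names `estimate_fail_rates`, `allocate_targets`, `rejection_sample`, and `make_toy_sampler` are hypothetical, and the toy sampler stands in for a real model plus an answer checker.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sampler type: takes a query, returns (response_text, is_correct).
Sampler = Callable[[str], Tuple[str, bool]]


def estimate_fail_rates(queries: List[str], sample: Sampler, n_probe: int) -> Dict[str, float]:
    """Estimate difficulty as the fraction of wrong answers in n_probe samples."""
    rates = {}
    for q in queries:
        wrong = sum(1 for _ in range(n_probe) if not sample(q)[1])
        rates[q] = wrong / n_probe
    return rates


def allocate_targets(fail_rates: Dict[str, float], budget: int, floor: int = 1) -> Dict[str, int]:
    """Give each query a target count of correct responses proportional to its fail rate."""
    total = sum(fail_rates.values())
    if total == 0:  # every query looks easy: fall back to an even split
        return {q: max(floor, budget // len(fail_rates)) for q in fail_rates}
    return {q: max(floor, round(budget * r / total)) for q, r in fail_rates.items()}


def rejection_sample(targets: Dict[str, int], sample: Sampler, max_trials: int) -> Dict[str, List[str]]:
    """Keep sampling each query until its target is met or max_trials is spent."""
    kept: Dict[str, List[str]] = {q: [] for q in targets}
    for q, target in targets.items():
        trials = 0
        while len(kept[q]) < target and trials < max_trials:
            text, ok = sample(q)
            trials += 1
            if ok:  # rejection step: discard responses with a wrong final answer
                kept[q].append(text)
    return kept


def make_toy_sampler(period: Dict[str, int]) -> Sampler:
    """Deterministic stand-in for a model: query q is right on every period[q]-th try."""
    counts = {q: 0 for q in period}

    def sample(q: str) -> Tuple[str, bool]:
        counts[q] += 1
        return f"{q}-attempt-{counts[q]}", counts[q] % period[q] == 0

    return sample


# Toy run: "easy" is always solved, "hard" only on every 5th attempt.
sampler = make_toy_sampler({"easy": 1, "hard": 5})
rates = estimate_fail_rates(["easy", "hard"], sampler, n_probe=10)
targets = allocate_targets(rates, budget=8)
dataset = rejection_sample(targets, sampler, max_trials=100)
print(rates, targets, {q: len(v) for q, v in dataset.items()})
```

In this sketch the `floor` argument keeps easy queries from vanishing from the dataset, and `max_trials` caps the cost of queries the sampler almost never solves. Both are illustrative choices rather than settings taken from the paper.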
Code & Models
- 🤗 hkust-nlp/dart-math-mistral-7b-prop2diff · model · 68 dl · ♡ 1
- 🤗 hkust-nlp/dart-math-mistral-7b-uniform · model · 10 dl
- 🤗 hkust-nlp/dart-math-llama3-8b-prop2diff · model · 17 dl · ♡ 1
- 🤗 hkust-nlp/dart-math-llama3-8b-uniform · model · 7 dl · ♡ 2
- 🤗 hkust-nlp/dart-math-dsmath-7b-prop2diff · model · 13 dl · ♡ 3
- 🤗 hkust-nlp/dart-math-llama3-70b-prop2diff · model · 10 dl
- 🤗 hkust-nlp/dart-math-dsmath-7b-uniform · model · 3 dl · ♡ 1
- 🤗 hkust-nlp/dart-math-llama3-70b-uniform · model · 3 dl · ♡ 1
- 🤗 RichardErkhov/hkust-nlp_-_dart-math-llama3-8b-prop2diff-gguf · model · 232 dl
- 🤗 RichardErkhov/hkust-nlp_-_dart-math-dsmath-7b-prop2diff-gguf · model · 91 dl
Taxonomy
Topics: Parallel Computing and Optimization Techniques
Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention · Dense Connections
