Augmenting Math Word Problems via Iterative Question Composing
Haoxiong Liu, Yifan Zhang, Yifan Luo, Andrew Chi-Chih Yao

TL;DR
This paper introduces the MMIQC dataset and a novel augmentation method, IQC, to improve mathematical reasoning in open-source LLMs, achieving an open-source state-of-the-art result on the MATH benchmark.
Contribution
The paper presents the MMIQC dataset and the IQC augmentation technique, significantly enhancing open-source LLM performance on math reasoning tasks.
Findings
Qwen-72B-MMIQC achieves 45.0% accuracy on the MATH benchmark, exceeding the previous open-source state of the art by 8.2%.
IQC augmentation accounts for most performance improvements.
Evaluation on Hungarian high school finals suggests that the improvement from training on MMIQC generalizes to unseen data.
Abstract
Despite the advancements in large language models (LLMs) for mathematical reasoning, solving competition-level math problems remains a significant challenge, especially for open-source LLMs without external tools. We introduce the MMIQC dataset, comprising a mixture of processed web data and synthetic question-response pairs, aimed at enhancing the mathematical reasoning capabilities of base language models. Models fine-tuned on MMIQC consistently surpass their counterparts in performance on the MATH benchmark across various model sizes. Notably, Qwen-72B-MMIQC achieves a 45.0% accuracy, exceeding the previous open-source state-of-the-art by 8.2% and outperforming the initial version of GPT-4 released in 2023. Extensive evaluation results on Hungarian high school finals suggest that such improvement can generalize to unseen data. Our ablation study on MMIQC reveals that a large part of the…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Educational Assessment and Pedagogy
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Residual Connection · Dropout · Layer Normalization · Multi-Head Attention · Adam · Softmax · Dense Connections
