MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

Xiang Yue; Xingwei Qu; Ge Zhang; Yao Fu; Wenhao Huang; Huan Sun; Yu Su; Wenhu Chen

arXiv:2309.05653·cs.CL·October 4, 2023·20 cites

Open Access 1 Repo 10 Models 5 Datasets

TL;DR

MAmmoTH introduces a series of open-source large language models tailored for general math problem-solving, leveraging a hybrid of chain-of-thought and program-of-thought rationales to significantly outperform existing models.

Contribution

The paper presents MAmmoTH, a new open-source LLM series trained on a curated math instruction dataset with hybrid rationales, achieving state-of-the-art performance on multiple math reasoning benchmarks.

Findings

01. MAmmoTH models outperform existing open-source models on nine mathematical reasoning datasets, with an average accuracy gain of 16% to 32% across all model scales.

02. MAmmoTH-7B reaches 33% on MATH, surpassing WizardMath by 23%.

03. MAmmoTH-34B achieves 44% on MATH, exceeding GPT-4's CoT result.

Abstract

We introduce MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset. MathInstruct is compiled from 13 math datasets with intermediate rationales, six of which have rationales newly curated by us. It presents a unique hybrid of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and also ensures extensive coverage of diverse fields in math. The hybrid of CoT and PoT not only unleashes the potential of tool use but also allows different thought processes for different math problems. As a result, the MAmmoTH series substantially outperform existing open-source models on nine mathematical reasoning datasets across all scales with an average accuracy gain between 16% and 32%. Remarkably, our MAmmoTH-7B model reaches 33%…
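To make the CoT/PoT distinction concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than material from the paper or from MathInstruct: the sample problem, both rationale strings, the helper names (`answer_from_cot`, `answer_from_pot`, `solve`), the convention that a PoT program stores its result in a variable named `answer`, and the policy of trying the program first and falling back to the text rationale.

```python
import re

# Hypothetical problem, written for illustration only (not taken from MathInstruct).
QUESTION = "A shop sells pens at 3 dollars each. How much do 14 pens cost?"

# Chain-of-thought (CoT): the reasoning and the final answer are plain text.
COT_RATIONALE = (
    "Each pen costs 3 dollars, so 14 pens cost 14 * 3 = 42 dollars. "
    "The answer is 42."
)

# Program-of-thought (PoT): the reasoning is a program, and the interpreter
# (the "tool") performs the arithmetic.
POT_RATIONALE = """
price_per_pen = 3
num_pens = 14
answer = price_per_pen * num_pens
"""


def answer_from_cot(rationale):
    """Extract the number that follows 'The answer is', or None if absent."""
    match = re.search(r"The answer is\s*(-?\d+(?:\.\d+)?)", rationale)
    return float(match.group(1)) if match else None


def answer_from_pot(program):
    """Run the program and read its `answer` variable (assumed convention).

    Emptying the builtins is NOT a real sandbox; never run untrusted
    model output this way outside a toy example.
    """
    namespace = {}
    try:
        exec(program, {"__builtins__": {}}, namespace)
    except Exception:
        return None  # the generated program failed to run
    value = namespace.get("answer")
    return float(value) if isinstance(value, (int, float)) else None


def solve(pot_rationale, cot_rationale):
    """Assumed policy: prefer the executed program, else fall back to text."""
    result = answer_from_pot(pot_rationale)
    return result if result is not None else answer_from_cot(cot_rationale)


if __name__ == "__main__":
    print(solve(POT_RATIONALE, COT_RATIONALE))
```

The sketch shows why a hybrid can help: the program hands exact computation to the interpreter, while the text rationale remains available for problems where a generated program fails or does not fit the question.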

Code & Models

Repositories

tiger-ai-lab/mammoth

Taxonomy

Topics: Topic Modeling · Online Learning and Analytics · Intelligent Tutoring Systems and Adaptive Learning