Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks
Avinash Anand, Mohit Gupta, Kritarth Prasad, Navya Singla, Sanjana, Sanjeev, Jatin Kumar, Adarsh Raj Shivam, Rajiv Ratn Shah

TL;DR
This paper introduces MathQuest, a comprehensive dataset for evaluating large language models on mathematical problem solving, and benchmarks three models, finding MAmmoTH-13B to be the most effective.
Contribution
The paper presents a new, extensive mathematics dataset, MathQuest, and uses it to evaluate LLMs, establishing MAmmoTH-13B as a strong benchmark for solving NCERT mathematics problems.
Findings
MAmmoTH-13B outperforms the other two evaluated models in solving mathematical problems.
MathQuest covers a wide range of mathematical concepts and complexities.
Fine-tuned models serve as effective benchmarks for educational math tasks.
Abstract
The rapid progress in the field of natural language processing (NLP) systems and the expansion of large language models (LLMs) have opened up numerous opportunities in the field of education and instructional methods. These advancements offer the potential for tailored learning experiences and immediate feedback, all delivered through accessible and cost-effective services. One notable application area for this technological advancement is in the realm of solving mathematical problems. Mathematical problem-solving not only requires the ability to decipher complex problem statements but also the skill to perform precise arithmetic calculations at each step of the problem-solving process. However, the evaluation of the arithmetic capabilities of large language models remains an area that has received relatively little attention. In response, we introduce an extensive mathematics dataset…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Mathematics Education and Teaching Techniques · Mathematics Education and Pedagogy
