PARAMANU-GANITA: Can Small Math Language Models Rival with Large Language Models on Mathematical Reasoning?

Mitodru Niyogi; Arnab Bhattacharya

arXiv:2404.14395 · cs.CL · March 6, 2025

TL;DR

Paramanu-Ganita, a small, math-focused language model with 208M parameters, achieves competitive mathematical reasoning performance and surpasses larger generalist models through domain-specific training and instruction fine-tuning.

Contribution

This paper introduces Paramanu-Ganita, a small, cost-effective, domain-specific math language model trained from scratch with a specialized tokenizer and Chain-of-Thought instruction fine-tuning. The model outperforms larger models.

Findings

01. Outperforms larger generalist LLMs by 30% in GSM8K accuracy

02. Achieves 6-8% higher scores on the MATH benchmark

03. Surpasses models on various math and reasoning benchmarks

Abstract

In this paper, we study whether domain-specific pretraining of small generative language models (SLMs) from scratch, with a domain-specialized tokenizer and Chain-of-Thought (CoT) instruction fine-tuning, results in competitive performance on mathematical reasoning compared to LLMs. Secondly, we ask whether this approach is environmentally sustainable and highly cost-efficient. To address these research questions, we present Paramanu-Ganita, a novel 208 million-parameter decoder-only autoregressive SLM for mathematics. We pretrained it from scratch on 31.5 billion tokens for 170 A100 hours, using a context size of 4096, on a mixed mathematical corpus consisting of web pages, source code, textbooks, CoT-templatised StackOverflow QA pairs, and mathematical lecture notes in LaTeX curated by us. We also trained a math- and code-specialised BPE tokenizer. We proposed and performed CoT instruction…
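
To put the pretraining figures quoted in the abstract in proportion, the following is a minimal back-of-the-envelope sketch. It uses only the four numbers stated above (208 million parameters, 31.5 billion tokens, 170 A100 hours, context size 4096). Two points are assumptions rather than facts from the paper: that "170 A100 hours" means total GPU-hours summed over devices, and that 31.5 billion is the total number of tokens processed during pretraining. The function name `training_budget` and its dictionary keys are illustrative only.

```python
# Figures quoted in the abstract.
PARAMETERS = 208e6          # 208 million parameters
PRETRAIN_TOKENS = 31.5e9    # 31.5 billion pretraining tokens
A100_HOURS = 170            # ASSUMPTION: total GPU-hours, not wall-clock time
CONTEXT_SIZE = 4096         # tokens per training sequence


def training_budget(params, tokens, gpu_hours, context):
    """Derive simple ratios from the reported pretraining figures.

    Hypothetical helper for illustration; not taken from the paper.
    """
    return {
        # How many training tokens the model saw per parameter.
        "tokens_per_parameter": tokens / params,
        # Average throughput implied by the reported compute budget.
        "tokens_per_gpu_hour": tokens / gpu_hours,
        "tokens_per_gpu_second": tokens / (gpu_hours * 3600),
        # Number of full-length sequences if every sequence filled the context.
        "full_context_sequences": tokens / context,
    }


budget = training_budget(PARAMETERS, PRETRAIN_TOKENS, A100_HOURS, CONTEXT_SIZE)
for name, value in budget.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions the model sees roughly 151 tokens per parameter, and the implied average throughput is about 185 million tokens per A100 hour (a little over 51,000 tokens per second). These are derived ratios, not results reported by the authors.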

Code & Models

Models

gyanai/paramanu-ganita-208M-hf

Datasets

gyanai/ganita

Taxonomy

Topics: Language, Linguistics, Cultural Analysis

Methods: Pathways Language Model