PARAMANU-GANITA: Can Small Math Language Models Rival with Large Language Models on Mathematical Reasoning?
Mitodru Niyogi, Arnab Bhattacharya

TL;DR
Paramanu-Ganita, a small 208M-parameter math-focused language model, achieves competitive mathematical reasoning performance, surpassing larger generalist models through domain-specific pretraining and instruction fine-tuning.
Contribution
This paper introduces Paramanu-Ganita, a small, cost-effective, domain-specific math language model trained from scratch with specialized tokenization and Chain-of-Thought fine-tuning, outperforming larger models.
Findings
Outperforms larger generalist LLMs by 30% in GSM8K accuracy
Achieves 6-8% higher scores on the MATH benchmark
Surpasses larger models on various math and reasoning benchmarks
Abstract
In this paper, we study whether domain-specific pretraining of small generative language models (SLMs) from scratch, combined with a domain-specialized tokenizer and Chain-of-Thought (CoT) instruction fine-tuning, yields competitive mathematical reasoning performance compared to LLMs. We also ask whether this approach is environmentally sustainable and highly cost-efficient. To address these research questions, we present Paramanu-Ganita, a novel 208-million-parameter decoder-only autoregressive SLM for mathematics. We pretrained it from scratch on 31.5 billion tokens for 170 A100 hours, using a context size of 4096, on a mixed mathematical corpus consisting of web pages, source code, textbooks, CoT-templatised StackOverflow QA pairs, and mathematical lecture notes in LaTeX curated by us. We also trained a math- and code-specialised BPE tokenizer. We proposed and performed CoT instruction…
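The figures quoted in the abstract can be turned into a few back-of-the-envelope ratios. The sketch below is illustrative only: the function name `pretraining_ratios` and the dictionary keys are hypothetical, and it assumes that the 170 A100 hours cover the full 31.5 billion tokens and that every training sequence fills the 4096-token context, neither of which the abstract states explicitly.

```python
# Figures quoted in the abstract.
PARAMS = 208e6        # model parameters
TOKENS = 31.5e9       # pretraining tokens
A100_HOURS = 170      # reported A100 hours
CONTEXT = 4096        # context size in tokens


def pretraining_ratios(params, tokens, gpu_hours, context):
    """Return rough ratios derived from the reported pretraining budget.

    Assumptions (not stated in the abstract): the GPU hours cover all
    tokens, and every sequence is packed to the full context length.
    """
    return {
        "tokens_per_parameter": tokens / params,
        "tokens_per_gpu_hour": tokens / gpu_hours,
        "full_context_sequences": tokens / context,
    }


ratios = pretraining_ratios(PARAMS, TOKENS, A100_HOURS, CONTEXT)
for name, value in ratios.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions the model sees roughly 150 tokens per parameter, which gives a sense of how heavily a model of this size is trained relative to its parameter count.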
Taxonomy
Topics: Language, Linguistics, Cultural Analysis
Methods: Pathways Language Model
