GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO
Shubhashis Roy Dipta, Khairul Mahbub, Nadia Najjar

TL;DR
GanitLLM is a Bengali mathematical reasoning model trained with a difficulty-aware curriculum pipeline. It improves accuracy on math benchmarks and raises the share of reasoning written in Bengali.
Contribution
The paper introduces three things: the GanitLLM model, Ganit (a new Bengali math dataset with difficulty tags), and Curriculum-GRPO, a curriculum-based training pipeline for reasoning in low-resource languages.
Findings
GanitLLM-4B improves accuracy by 8 and 6 points on two benchmarks.
Increases the share of Bengali reasoning tokens from 14% to over 88%.
Reduces solution length from 943 to 193 words.
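The Bengali-share finding above can be made concrete with a small measurement sketch. The summary does not say how the paper counts "Bengali reasoning tokens", so the function below is an assumption: it splits on whitespace and treats a word as Bengali if it contains any character from the Unicode Bengali block (U+0980 to U+09FF). The name `bengali_word_share` is hypothetical.

```python
import re

# Unicode Bengali block (letters, signs, and Bengali digits).
BENGALI_CHAR = re.compile(r"[\u0980-\u09FF]")
# Basic Latin letters, used here as the "non-Bengali" script (an assumption).
LATIN_CHAR = re.compile(r"[A-Za-z]")


def bengali_word_share(text: str) -> float:
    """Fraction of script-bearing words that contain Bengali characters.

    Words with neither Bengali nor Latin characters (for example ASCII
    numbers or operators) are ignored. Returns 0.0 if no word qualifies.
    """
    bengali = 0
    other = 0
    for word in text.split():
        if BENGALI_CHAR.search(word):
            bengali += 1
        elif LATIN_CHAR.search(word):
            other += 1
    total = bengali + other
    return bengali / total if total else 0.0
```

Applied to a model's reasoning trace, a value near 0.14 would correspond to mostly English reasoning and a value above 0.88 to mostly Bengali reasoning, under this word-level approximation.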
Abstract
We present a Bengali mathematical reasoning model called GanitLLM (named after the Bangla word for mathematics, Ganit), together with a new difficulty-aware Bengali math corpus and a curriculum-based GRPO pipeline. Bengali is one of the world's most widely spoken languages, yet existing LLMs either reason in English and then translate, or simply fail on multi-step Bengali math, in part because reinforcement learning recipes are tuned for high-resource languages and collapse under reward sparsity in low-resource settings. To address this, we construct Ganit, a rigorously filtered and decontaminated Bengali math dataset with automatic difficulty tags derived from the pass@k of a strong evaluator model. Building on this dataset, we propose Curriculum-GRPO, which combines multi-stage training (SFT + GRPO) with difficulty-aware sampling and verifiable rewards for format, numerical…
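As a rough illustration of the difficulty tagging and difficulty-aware ordering the abstract describes, here is a minimal sketch. It is not the authors' implementation: the fraction of correct attempts stands in for the pass@k-based signal, and the tag names, thresholds, and easy-to-hard ordering are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    question: str
    # Correctness of k sampled solutions from an evaluator model (hypothetical data).
    attempts: list[bool]


def pass_rate(problem: Problem) -> float:
    """Fraction of sampled attempts that were correct."""
    if not problem.attempts:
        raise ValueError("problem has no sampled attempts")
    return sum(problem.attempts) / len(problem.attempts)


def difficulty_tag(rate: float, easy_min: float = 0.75, hard_max: float = 0.25) -> str:
    """Map a pass rate to a coarse tag; thresholds are illustrative assumptions."""
    if rate >= easy_min:
        return "easy"
    if rate <= hard_max:
        return "hard"
    return "medium"


def curriculum_order(problems: list[Problem]) -> list[Problem]:
    """Order problems from easiest (highest pass rate) to hardest."""
    return sorted(problems, key=pass_rate, reverse=True)
```

Presenting easier problems first is one common way to keep rewards from being uniformly zero early in reinforcement learning, which is the reward-sparsity concern the abstract raises; the paper's actual sampling schedule may differ.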
Code & Models
- 🤗 dipta007/GanitLLM-1.7B-SFT (model, 8 downloads)
- 🤗 dipta007/GanitLLM-4B-SFT (model, 12 downloads)
- 🤗 dipta007/GanitLLM-0.6B-SFT (model, 8 downloads)
- 🤗 dipta007/GanitLLM-4B_SFT_CGRPO (model, 97 downloads)
- 🤗 dipta007/GanitLLM-4B_SFT_GRPO (model, 14 downloads, 1 like)
- 🤗 dipta007/GanitLLM-4B_CGRPO (model, 11 downloads)
- 🤗 dipta007/GanitLLM-1.7B_CGRPO (model, 12 downloads)
- 🤗 dipta007/GanitLLM-1.7B_SFT_GRPO (model, 13 downloads)
- 🤗 dipta007/GanitLLM-1.7B_SFT_CGRPO (model, 122 downloads)
- 🤗 dipta007/GanitLLM-0.6B_SFT_CGRPO (model, 15 downloads)
