Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning

Bidyarthi Paul; Jalisha Jashim Era; Mirazur Rahman Zim; Tahmid Sattar Aothoi; Faisal Muhammad Shah

arXiv:2505.21354·cs.CL·July 31, 2025

TL;DR

This paper introduces SOMADHAN, a new dataset of Bengali math word problems (MWPs), and shows that Chain of Thought prompting improves the reasoning accuracy of large language models on this low-resource language task.

Contribution

The paper contributes the SOMADHAN dataset and an evaluation of LLMs with Chain of Thought prompting on Bengali math word problems.

Findings

01. Chain of Thought prompting improves model accuracy.

02. LLaMA-3.3 70B achieves 88% accuracy with few-shot CoT.

03. Fine-tuning with LoRA enhances Bengali MWP solving capabilities.
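
As a rough illustration of why LoRA makes fine-tuning large models affordable, the sketch below counts trainable parameters for one weight matrix. LoRA freezes the original weight W (d_out x d_in) and trains two small matrices B (d_out x r) and A (r x d_in). The function names, the 4096 x 4096 layer size, and the rank of 8 are illustrative assumptions, not values reported by the paper.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters updated when fine-tuning the full weight matrix W."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters updated by LoRA: B is (d_out x rank), A is (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8
full = full_params(d_out, d_in)
lora = lora_trainable_params(d_out, d_in, rank)
print(f"full: {full}, LoRA: {lora}, reduction: {full // lora}x")
```

For these assumed sizes, the low-rank adapter trains 256 times fewer parameters than full fine-tuning of the same matrix.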

Abstract

Solving Bengali Math Word Problems (MWPs) remains a major challenge in natural language processing (NLP) due to the language's low-resource status and the multi-step reasoning required. Existing models struggle with complex Bengali MWPs, largely because no human-annotated Bengali dataset has previously addressed this task. This gap has limited progress in Bengali mathematical reasoning. To address this, we created SOMADHAN, a dataset of 8792 complex Bengali MWPs with manually written, step-by-step solutions. We designed this dataset to support reasoning-focused evaluation and model development in a linguistically underrepresented context. Using SOMADHAN, we evaluated a range of large language models (LLMs), including GPT-4o, GPT-3.5 Turbo, LLaMA series models, DeepSeek, and Qwen, through both zero-shot and few-shot prompting, with and without Chain of Thought (CoT) reasoning. CoT…
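
The four prompting conditions named in the abstract (zero-shot or few-shot, each with or without CoT) can be sketched as a single prompt builder. This is a minimal sketch under stated assumptions: the `Example` class, the `build_prompt` function, the field labels, and the English cue phrase are all hypothetical, and the paper's actual prompt templates are not shown in this summary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """One solved problem used as a few-shot demonstration (hypothetical schema)."""
    question: str
    steps: str   # step-by-step worked solution
    answer: str  # final answer only


# Assumed cue phrase; the paper's wording may differ.
COT_CUE = "Let's think step by step."


def build_prompt(question: str,
                 shots: Optional[List[Example]] = None,
                 use_cot: bool = False) -> str:
    """Build a zero-shot (no shots) or few-shot prompt, with or without CoT."""
    parts: List[str] = []
    for ex in shots or []:
        parts.append(f"Question: {ex.question}")
        if use_cot:
            # CoT demonstrations include the reasoning, not just the answer.
            parts.append(f"Solution: {ex.steps}")
        parts.append(f"Answer: {ex.answer}")
        parts.append("")  # blank line between demonstrations
    parts.append(f"Question: {question}")
    # With CoT the model is cued to reason first; otherwise it answers directly.
    parts.append(f"Solution: {COT_CUE}" if use_cot else "Answer:")
    return "\n".join(parts)


shots = [Example("2 + 3 = ?", "2 plus 3 is 5.", "5")]
print(build_prompt("4 + 4 = ?", shots, use_cot=True))
```

Calling `build_prompt(question)` with no demonstrations gives the zero-shot, non-CoT baseline. Passing `shots` and `use_cot=True` gives the few-shot CoT condition that the findings report as strongest.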

Code & Models

Datasets

dipta007/Ganit
dataset · 16 downloads