Bengali Text Classification: An Evaluation of Large Language Model Approaches

Md Mahmudul Hoque; Md Mehedi Hassain; Md Hojaifa Tanvir; Rahul Nandy

arXiv:2601.12132·cs.CL·January 21, 2026

Open Access

TL;DR

This paper evaluates the performance of large language models in classifying Bengali newspaper articles, demonstrating that LLMs can be effective despite limited resources for Bengali NLP.

Contribution

It provides an empirical comparison of instruction-tuned LLMs for Bengali text classification, highlighting the potential of LLMs in resource-scarce languages.

Findings

01 Qwen 2.5 7B Instruct achieved the highest accuracy, 72%, outperforming both LLaMA models.

02 LLaMA 3.1 8B Instruct and LLaMA 3.2 3B Instruct achieved 53% and 56% accuracy, respectively.

03 LLMs show promise for Bengali NLP tasks despite resource limitations.

Abstract

Bengali text classification is a significant task in natural language processing (NLP), in which text is assigned to predefined labels. Unlike English, Bengali lacks extensive annotated datasets and pre-trained language models. This study explores the effectiveness of large language models (LLMs) in classifying Bengali newspaper articles. The dataset, obtained from Kaggle, consists of articles from Prothom Alo, a major Bangladeshi newspaper. Three instruction-tuned LLMs (LLaMA 3.1 8B Instruct, LLaMA 3.2 3B Instruct, and Qwen 2.5 7B Instruct) were evaluated for this task under the same classification framework. Among the evaluated models, Qwen 2.5 achieved the highest classification accuracy of 72%, showing particular strength in the "Sports" category. In comparison, LLaMA 3.1 and LLaMA 3.2 attained accuracies of 53% and 56%, respectively. The findings…
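To make the evaluation setup concrete, the following is a minimal sketch of how a shared classification framework of this kind might be scored. It is not the authors' code. The label set, the prompt wording, and the helper names `build_prompt`, `parse_label`, and `accuracy_report` are all assumptions for illustration; the abstract names only the "Sports" category and does not describe the prompt.

```python
from collections import defaultdict

# Hypothetical label set: the abstract names only "Sports" explicitly.
LABELS = ["Sports", "Politics", "Entertainment"]


def build_prompt(article, labels=LABELS):
    """Build a zero-shot instruction prompt asking for exactly one label.

    The wording is an assumption, not the prompt used in the paper.
    """
    options = ", ".join(labels)
    return (
        "Classify the following Bengali news article into one of these "
        f"categories: {options}.\n"
        "Answer with the category name only.\n\n"
        f"Article: {article}\nCategory:"
    )


def parse_label(response, labels=LABELS):
    """Return the known label mentioned earliest in the response, or None."""
    lowered = response.lower()
    hits = [
        (lowered.find(label.lower()), label)
        for label in labels
        if label.lower() in lowered
    ]
    return min(hits)[1] if hits else None


def accuracy_report(gold, predicted):
    """Return overall accuracy and per-category accuracy.

    Per-category accuracy here is the share of articles with a given gold
    label that the model labeled correctly.
    """
    total = defaultdict(int)
    correct = defaultdict(int)
    for g, p in zip(gold, predicted):
        total[g] += 1
        if g == p:
            correct[g] += 1
    overall = sum(correct.values()) / len(gold)
    per_label = {label: correct[label] / total[label] for label in total}
    return overall, per_label
```

Applying the same prompt builder and parser to every model's output keeps the comparison consistent, and the per-category breakdown is what would surface a strength such as the one reported for "Sports".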

Taxonomy

Topics: Text and Document Classification Technologies · Topic Modeling · Imbalanced Data Classification Techniques