Evaluating LLMs' Multilingual Capabilities for Bengali: Benchmark Creation and Performance Analysis
Shimanto Bhowmik, Tawsif Tashwar Dipto, Md Sazzad Islam, Sheryl Hsu, Tahsin Reasat

TL;DR
This paper creates a benchmark to evaluate Bengali language capabilities in large language models, revealing performance gaps, error patterns, and the impact of tokenization, thus guiding future improvements for underrepresented languages.
Contribution
It introduces a standardized benchmark for Bengali NLP, evaluates 10 recent open-source LLMs on 8 translated datasets, and analyzes their strengths and failure modes in Bengali language processing.
Findings
Performance on Bengali consistently lags behind English, particularly for smaller models and specific model families such as Mistral.
Robustness varies across models, with certain architectures such as DeepSeek maintaining more stable performance across languages.
Tokenization efficiency and model accuracy are inversely related, which affects Bengali NLP performance.
Abstract
Bengali is an underrepresented language in NLP research, and it remains challenging due to its unique linguistic structure and computational constraints. In this work, we systematically investigate the challenges that hinder Bengali NLP performance, focusing on the absence of standardized evaluation benchmarks. We then evaluate 10 recent open-source Large Language Models (LLMs) on 8 of the translated datasets and perform a comprehensive error analysis to pinpoint their primary failure modes. Our findings reveal consistent performance gaps for Bengali compared to English, particularly for smaller models and specific model families like Mistral. We also identify promising robustness in certain architectures, such as DeepSeek, that maintain more stable performance across languages. Our analysis reveals an inverse relationship between tokenization efficiency and LLM accuracy…
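The abstract does not say here how tokenization efficiency is measured. One common proxy is the average number of tokens per word, and the sketch below illustrates that idea only. The function names are hypothetical, and the UTF-8 byte tokenizer is a stand-in chosen for illustration, not the tokenizer of any evaluated model.

```python
def tokens_per_word(text, tokenize):
    """Average number of tokens per whitespace-separated word.

    `tokenize` is any callable mapping a string to a sequence of tokens.
    This metric is an assumed proxy, not necessarily the paper's definition.
    """
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return len(tokenize(text)) / len(words)


def utf8_byte_tokenize(text):
    """Stand-in tokenizer: one token per UTF-8 byte, whitespace ignored."""
    return [b for word in text.split() for b in word.encode("utf-8")]


english = "I eat rice"
bengali = "আমি ভাত খাই"  # Bengali for "I eat rice"

en_rate = tokens_per_word(english, utf8_byte_tokenize)
bn_rate = tokens_per_word(bengali, utf8_byte_tokenize)

print(f"English tokens per word: {en_rate:.2f}")
print(f"Bengali tokens per word: {bn_rate:.2f}")
```

Each Bengali character occupies three bytes in UTF-8, so under this stand-in tokenizer the Bengali sentence costs more tokens per word than its English counterpart. To compare real models, replace `utf8_byte_tokenize` with each model's own tokenizer and set the resulting rates against per-model accuracy.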
Taxonomy
Topics: Banking Sector Performance and Management · Second Language Learning and Teaching · Open Education and E-Learning
