BURMESE-SAN: Burmese NLP Benchmark for Evaluating Large Language Models

Thura Aung; Jann Railey Montalan; Jian Gang Ngui; Peerat Limkonchotiwat

arXiv:2602.18788 · cs.CL · February 26, 2026

TL;DR

BURMESE-SAN is a comprehensive benchmark for evaluating large language models on Burmese language tasks, covering understanding, reasoning, and generation, and highlighting the importance of model design and fine-tuning.

Contribution

This work introduces the first holistic Burmese NLP benchmark with diverse subtasks, constructed through native speaker input, and provides a large-scale evaluation of LLMs on Burmese.

Findings

01. Model performance improves with regional fine-tuning and newer architectures.

02. Language complexity impacts model performance more than size alone.

03. The benchmark is publicly available for ongoing evaluation.

Abstract

We introduce BURMESE-SAN, the first holistic benchmark that systematically evaluates large language models (LLMs) for Burmese across three core NLP competencies: understanding (NLU), reasoning (NLR), and generation (NLG). BURMESE-SAN consolidates seven subtasks spanning these competencies, including Question Answering, Sentiment Analysis, Toxicity Detection, Causal Reasoning, Natural Language Inference, Abstractive Summarization, and Machine Translation, several of which were previously unavailable for Burmese. The benchmark is constructed through a rigorous native-speaker-driven process to ensure linguistic naturalness, fluency, and cultural authenticity while minimizing translation-induced artifacts. We conduct a large-scale evaluation of both open-weight and commercial LLMs to examine challenges in Burmese modeling arising from limited pretraining coverage, rich morphology, and…

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques