LaoBench: A Large-Scale Multidimensional Lao Benchmark for Large Language Models
Jian Gao, Richeng Xuan, Zhaolu Kang, Dingshi Liao, Wenxin Huang, Zongmou Huang, Yangdi Xu, Bowen Qin, Zheqi He, Xi Yang, Changjin Li, Yonghua Lin

TL;DR
LaoBench is a comprehensive benchmark designed to evaluate large language models' understanding and reasoning in Lao, addressing the lack of evaluation resources for low-resource Southeast Asian languages.
Contribution
The paper introduces the first large-scale, multidimensional Lao language benchmark, built from expert-curated samples with a hybrid construction pipeline for high-quality evaluation.
Findings
State-of-the-art LLMs lag behind human experts in Lao understanding.
Models perform poorly in culturally grounded reasoning and translation fidelity.
LaoBench enables fair black-box evaluation with secure, held-out subsets.
Abstract
The rapid advancement of large language models (LLMs) has not been matched by their evaluation in low-resource languages, especially Southeast Asian languages like Lao. To fill this gap, we introduce LaoBench, the first large-scale, high-quality, and multidimensional benchmark for assessing LLM language understanding and reasoning in Lao. LaoBench contains 17,000+ expert-curated samples across three dimensions: culturally grounded knowledge application, curriculum-aligned K-12 education, and bilingual translation among Lao, Chinese, and English. It includes open-source and held-out subsets, where the held-out portion enables secure black-box evaluation via a controlled service to improve fairness and data security. We construct LaoBench with a hybrid pipeline that combines expert authoring with agent-assisted verification, ensuring linguistic accuracy, cultural…
