LoraxBench: A Multitask, Multilingual Benchmark Suite for 20 Indonesian Languages
Alham Fikri Aji, Trevor Cohn

TL;DR
LoraxBench is a comprehensive multilingual benchmark suite for 20 Indonesian languages, assessing diverse NLP tasks and revealing challenges in low-resource language performance and register variations.
Contribution
This paper introduces LoraxBench, the first multilingual benchmark for Indonesian languages covering six tasks and including register variations, highlighting performance gaps and challenges.
Findings
The benchmark is challenging for current models.
Performance varies significantly across languages.
Register changes impact model accuracy.
Abstract
As one of the world's most populous countries, with 700 languages spoken, Indonesia is behind in terms of NLP progress. We introduce LoraxBench, a benchmark that focuses on low-resource languages of Indonesia and covers 6 diverse tasks: reading comprehension, open-domain QA, language inference, causal reasoning, translation, and cultural QA. Our dataset covers 20 languages, with the addition of two formality registers for three languages. We evaluate a diverse set of multilingual and region-focused LLMs and find that this benchmark is challenging. We note a visible discrepancy between performance in Indonesian and in other languages, especially the low-resource ones. Region-specific models show no clear lead over general multilingual models. Lastly, we show that a change in register affects model performance, especially with registers not commonly found in…
Taxonomy
Topics: Educational Technology Systems
