LoraxBench: A Multitask, Multilingual Benchmark Suite for 20 Indonesian Languages

Alham Fikri Aji; Trevor Cohn

arXiv:2508.12459·cs.CL·August 19, 2025

TL;DR

LoraxBench is a comprehensive multilingual benchmark suite for 20 Indonesian languages, assessing diverse NLP tasks and revealing challenges in low-resource language performance and register variations.

Contribution

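This paper introduces LoraxBench, the first multilingual benchmark for Indonesian languages that covers six tasks (reading comprehension, open-domain QA, language inference, causal reasoning, translation, and cultural QA) across 20 languages, with two formality registers for three of them. Its evaluation highlights the performance gap between Indonesian and the lower-resource languages.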

Findings

01. The benchmark is challenging for current models.
02. Performance varies significantly across languages.
03. Register changes affect model accuracy.

Abstract

As one of the world's most populous countries, with 700 languages spoken, Indonesia is behind in terms of NLP progress. We introduce LoraxBench, a benchmark that focuses on low-resource languages of Indonesia and covers 6 diverse tasks: reading comprehension, open-domain QA, language inference, causal reasoning, translation, and cultural QA. Our dataset covers 20 languages, with the addition of two formality registers for three languages. We evaluate a diverse set of multilingual and region-focused LLMs and find that this benchmark is challenging. We note a visible discrepancy between performance in Indonesian and other languages, especially the low-resource ones. There is no clear lead when using a region-specific model as opposed to a general multilingual model. Lastly, we show that a change in register affects model performance, especially with registers not commonly found in…

Taxonomy

Topics: Educational Technology Systems