FormosanBench: Benchmarking Low-Resource Austronesian Languages in the Era of Large Language Models
Kaiying Kevin Lin, Hsiyu Chen, Haopeng Zhang

TL;DR
This paper introduces FORMOSANBENCH, a benchmark for evaluating large language models on low-resource Formosan languages across multiple NLP tasks, revealing significant performance gaps and highlighting the need for more inclusive NLP technologies.
Contribution
It is the first benchmark to evaluate LLMs on endangered Formosan languages, covering translation, speech recognition, and summarization, and provides insights into model limitations and future research directions.
Findings
LLMs underperform on Formosan languages across tasks
10-shot learning and fine-tuning yield limited improvements
Significant performance gap between high-resource and low-resource languages
Abstract
While large language models (LLMs) have demonstrated impressive performance across a wide range of natural language processing (NLP) tasks in high-resource languages, their capabilities in low-resource and minority languages remain significantly underexplored. Formosan languages -- a subgroup of Austronesian languages spoken in Taiwan -- are both linguistically rich and endangered, largely due to the sociolinguistic dominance of Mandarin. In this work, we introduce FORMOSANBENCH, the first benchmark for evaluating LLMs on low-resource Austronesian languages. It covers three endangered Formosan languages: Atayal, Amis, and Paiwan, across three core NLP tasks: machine translation, automatic speech recognition (ASR), and text summarization. We assess model performance in zero-shot, 10-shot, and fine-tuned settings using FORMOSANBENCH. Our results reveal a substantial performance gap…
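The zero-shot and 10-shot settings mentioned in the abstract differ only in how many worked examples precede the test input. The following is a minimal sketch of that idea for the translation task, not the authors' actual prompt: the function name `build_prompt`, the instruction wording, the Amis-to-English direction, and the placeholder sentences are all assumptions made for illustration.

```python
from typing import List, Tuple


def build_prompt(demos: List[Tuple[str, str]], query: str, k: int = 10,
                 src: str = "Amis", tgt: str = "English") -> str:
    """Assemble a k-shot translation prompt; k=0 gives the zero-shot form.

    The template here is hypothetical and not taken from the paper.
    """
    lines = [f"Translate the following {src} sentence into {tgt}."]
    # Prepend at most k demonstration pairs before the test input.
    for source_text, target_text in demos[:k]:
        lines.append(f"{src}: {source_text}")
        lines.append(f"{tgt}: {target_text}")
    # The test input goes last, with the target side left open for the model.
    lines.append(f"{src}: {query}")
    lines.append(f"{tgt}:")
    return "\n".join(lines)


# Placeholder pairs only; no real benchmark data is reproduced here.
demos = [(f"<source sentence {i}>", f"<reference translation {i}>")
         for i in range(1, 13)]

zero_shot = build_prompt(demos, "<test sentence>", k=0)
ten_shot = build_prompt(demos, "<test sentence>", k=10)
```

With `k=0` the prompt contains only the instruction and the test sentence, while `k=10` adds ten demonstration pairs drawn from the front of `demos`. The fine-tuned setting changes the model's weights rather than the prompt, so it is not covered by this sketch.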
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Computational and Text Analysis Methods
