Batayan: A Filipino NLP benchmark for evaluating Large Language Models

Jann Railey Montalan; Jimson Paulo Layacan; David Demitri Africa; Richell Isaiah Flores; Michael T. Lopez II; Theresa Denise Magsajo; Anjanette Cayabyab; William Chandra Tjhi

arXiv:2502.14911 · cs.CL · June 23, 2025

TL;DR

Batayan is a comprehensive Filipino NLP benchmark that evaluates large language models across understanding, reasoning, and generation tasks, addressing resource gaps and linguistic complexities of Filipino.

Contribution

Batayan consolidates eight Filipino NLP tasks, three of which previously had no Filipino corpus, applies a native-speaker-driven adaptation and validation process, and provides a public evaluation suite for community progress.

Findings

01 Significant performance gaps in LLMs on Filipino tasks.

02 Under-representation of Filipino in pre-training data.

03 Challenges due to Filipino's morphology and syntax.

Abstract

Recent advances in large language models (LLMs) have demonstrated remarkable capabilities on widely benchmarked high-resource languages. However, the linguistic nuances of under-resourced languages remain unexplored. We introduce Batayan, a holistic Filipino benchmark that systematically evaluates LLMs across three key natural language processing (NLP) competencies: understanding, reasoning, and generation. Batayan consolidates eight tasks, three of which did not previously exist for Filipino, covering both Tagalog and code-switched Taglish utterances. Our rigorous, native-speaker-driven adaptation and validation processes ensure fluency and authenticity to the complex morphological and syntactic structures of Filipino, alleviating the pervasive translationese bias in existing Filipino corpora. We report empirical results on a variety of open-source and commercial LLMs, highlighting…
