Bias Attribution in Filipino Language Models: Extending a Bias Interpretability Metric for Application on Agglutinative Languages

Lance Calvin Lim Gamboa, Yue Feng, and Mark Lee

arXiv:2506.07249·cs.CL·August 29, 2025

TL;DR

This paper adapts an information-theoretic bias attribution metric for agglutinative languages like Filipino, revealing distinct bias patterns in Filipino and multilingual models compared to English models.

Contribution

It extends bias interpretability methods to agglutinative languages and demonstrates their effectiveness on Filipino and multilingual models.

Findings

01 Filipino models are driven towards bias by words related to people, objects, and relationships.

02 Bias-contributing themes in Filipino differ from those in English, centering on entities rather than actions.

03 Filipino and multilingual models process sociodemographic bias differently from English models.

Abstract

Emerging research on bias attribution and interpretability has revealed how tokens contribute to biased behavior in language models processing English texts. We build on this line of inquiry by adapting the information-theoretic bias attribution score metric for implementation on models handling agglutinative languages, particularly Filipino. We then demonstrate the effectiveness of our adapted method by using it on a purely Filipino model and on three multilingual models: one trained on languages worldwide and two on Southeast Asian data. Our results show that Filipino models are driven towards bias by words pertaining to people, objects, and relationships, entity-based themes that stand in contrast to the action-heavy nature of bias-contributing themes in English (i.e., criminal, sexual, and prosocial behaviors). These findings point to differences in how English and non-English…
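The abstract does not give the formula for the bias attribution score or the details of the adaptation, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes (1) that an information-theoretic token score can be built from a divergence such as Jensen-Shannon divergence between two probability distributions, and (2) that handling an agglutinative language involves merging subword-level scores back into whole words, here by summing. The function names, the `##` continuation marker (WordPiece-style), and the segmentation of the Filipino word "magkaibigan" are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (in bits) between two discrete
    distributions defined over the same support."""
    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


def merge_subword_scores(
    pieces: Sequence[str], scores: Sequence[float]
) -> List[Tuple[str, float]]:
    """Assumed aggregation step: sum the scores of subword pieces that
    belong to one word. A leading '##' marks a continuation piece."""
    words: List[Tuple[str, float]] = []
    for piece, score in zip(pieces, scores):
        if piece.startswith("##") and words:
            prev_word, prev_score = words[-1]
            words[-1] = (prev_word + piece[2:], prev_score + score)
        else:
            words.append((piece, score))
    return words


# Hypothetical subword split and made-up scores, for illustration only.
pieces = ["mag", "##ka", "##ibigan", "bata"]
scores = [0.25, 0.25, 0.5, 0.125]
word_scores = merge_subword_scores(pieces, scores)
```

With word-level scores, heavily affixed words are compared as single units rather than as fragments, which is the kind of concern an adaptation for agglutinative languages would plausibly need to address. Whether the paper sums, averages, or otherwise combines subword scores is not stated in the text above.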
