Bias Attribution in Filipino Language Models: Extending a Bias Interpretability Metric for Application on Agglutinative Languages
Lance Calvin Lim Gamboa, Yue Feng, and Mark Lee

TL;DR
This paper adapts an information-theoretic bias attribution metric for agglutinative languages like Filipino, revealing distinct bias patterns in Filipino and multilingual models compared to English models.
Contribution
It extends bias interpretability methods to agglutinative languages and demonstrates their effectiveness on Filipino and multilingual models.
Findings
Filipino models are driven towards bias by words pertaining to people, objects, and relationships.
Bias-contributing themes in Filipino are entity-based, whereas those in English are action-heavy (criminal, sexual, and prosocial behaviors).
Filipino and multilingual models process sociodemographic bias differently from English models.
Abstract
Emerging research on bias attribution and interpretability has revealed how tokens contribute to biased behavior in language models processing English texts. We build on this line of inquiry by adapting the information-theoretic bias attribution score metric for implementation on models handling agglutinative languages, particularly Filipino. We then demonstrate the effectiveness of our adapted method by using it on a purely Filipino model and on three multilingual models: one trained on languages worldwide and two on Southeast Asian data. Our results show that Filipino models are driven towards bias by words pertaining to people, objects, and relationships, entity-based themes that stand in contrast to the action-heavy nature of bias-contributing themes in English (i.e., criminal, sexual, and prosocial behaviors). These findings point to differences in how English and non-English…
