Gender Bias in Masked Language Models for Multiple Languages
Masahiro Kaneko, Aizhan Imankulova, Danushka Bollegala, Naoaki Okazaki

TL;DR
This paper introduces a multilingual bias evaluation method for masked language models that does not require manual annotation, revealing gender biases across eight languages and validating the approach with Japanese and Russian datasets.
Contribution
The paper proposes the Multilingual Bias Evaluation (MBE) score, enabling bias assessment in multiple languages using only English attribute lists and parallel corpora, without manual annotation.
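A minimal sketch of the idea, under stated assumptions: the English side of a parallel corpus is matched against English attribute word lists, and the aligned target-language sentences are collected into female and male sets without any manual annotation. The word lists, the function names (split_by_gender, male_preference_rate), and the pairwise aggregation are hypothetical illustrations, not the paper's exact lists or scoring formula. The score argument stands in for any sentence-level likelihood obtained from an MLM.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Hypothetical toy attribute lists (assumption); the paper's real lists are not reproduced here.
FEMALE_WORDS = {"she", "her", "woman", "mother"}
MALE_WORDS = {"he", "his", "man", "father"}


def split_by_gender(
    parallel_pairs: Iterable[Tuple[str, str]],
) -> Tuple[List[str], List[str]]:
    """Group target-language sentences by the gender words found on the English side.

    Each pair is (english_sentence, target_sentence). Pairs whose English side
    mentions both genders, or neither, are skipped.
    """
    female, male = [], []
    for english, target in parallel_pairs:
        tokens = set(re.findall(r"[a-z']+", english.lower()))
        has_female = bool(tokens & FEMALE_WORDS)
        has_male = bool(tokens & MALE_WORDS)
        if has_female and not has_male:
            female.append(target)
        elif has_male and not has_female:
            male.append(target)
    return female, male


def male_preference_rate(
    female: List[str], male: List[str], score: Callable[[str], float]
) -> float:
    """Percentage of (male, female) sentence pairs where the male sentence scores higher.

    Assumption: this pairwise aggregation is one plausible way to summarise the
    comparison; `score` is a placeholder for an MLM-based sentence likelihood.
    """
    if not female or not male:
        raise ValueError("need at least one female and one male sentence")
    wins = sum(score(m) > score(f) for m in male for f in female)
    return 100.0 * wins / (len(male) * len(female))
```

Under this sketch, a rate near 50 would indicate no systematic preference between the two sets, while values far from 50 would suggest the scorer favours one gender. Passing the scorer in as a callable keeps the corpus-splitting step independent of any particular model.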
Findings
Gender biases are present in MLMs across eight languages.
MBE scores correlate well with bias scores computed on manually created Japanese and Russian datasets.
Bias exists in multilingual MLMs regardless of language.
Abstract
Masked Language Models (MLMs) pre-trained by predicting masked tokens on large corpora have been used successfully in natural language processing tasks for a variety of languages. Unfortunately, it was reported that MLMs also learn discriminative biases regarding attributes such as gender and race. Because most studies have focused on MLMs in English, the bias of MLMs in other languages has rarely been investigated. Manual annotation of evaluation data for languages other than English has been challenging due to the cost and difficulty in recruiting annotators. Moreover, the existing bias evaluation methods require the stereotypical sentence pairs consisting of the same context with attribute words (e.g. He/She is a nurse). We propose Multilingual Bias Evaluation (MBE) score, to evaluate bias in various languages using only English attribute word lists and parallel corpora between the…
Taxonomy
Topics: Text Readability and Simplification
