Gender Bias in Masked Language Models for Multiple Languages

Masahiro Kaneko; Aizhan Imankulova; Danushka Bollegala; Naoaki Okazaki

arXiv:2205.00551 · cs.CL · May 5, 2022

TL;DR

This paper introduces a multilingual bias evaluation method for masked language models that does not require manual annotation, revealing gender biases across eight languages and validating the approach with Japanese and Russian datasets.

Contribution

The paper proposes the Multilingual Bias Evaluation (MBE) score, enabling bias assessment in multiple languages using only English attribute lists and parallel corpora, without manual annotation.
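As a rough illustration of how English attribute word lists and a parallel corpus could stand in for manual annotation, the sketch below groups target-language sentences by the gender words found on their English side. It is a minimal sketch under stated assumptions, not the authors' implementation: the word lists, the `split_by_gender` helper, and the toy sentence pairs are all hypothetical, and the step that scores the resulting groups with an MLM is not shown.

```python
import re

# Hypothetical, tiny attribute lists for illustration only;
# they are not the lists used in the paper.
FEMALE_WORDS = {"she", "her", "woman", "mother"}
MALE_WORDS = {"he", "his", "man", "father"}


def split_by_gender(parallel_pairs, female_words=FEMALE_WORDS, male_words=MALE_WORDS):
    """Group target-language sentences by the gender words on their English side."""
    female_targets, male_targets = [], []
    for english, target in parallel_pairs:
        # Lowercase and tokenize the English side into simple word tokens.
        tokens = set(re.findall(r"[a-z']+", english.lower()))
        has_female = bool(tokens & female_words)
        has_male = bool(tokens & male_words)
        # Keep only sentences that mention exactly one gender;
        # mixed and gender-neutral sentences are skipped.
        if has_female and not has_male:
            female_targets.append(target)
        elif has_male and not has_female:
            male_targets.append(target)
    return female_targets, male_targets


# Toy English-Japanese pairs (assumed data, not taken from any real corpus).
pairs = [
    ("She is a nurse.", "彼女は看護師です。"),
    ("He is a doctor.", "彼は医者です。"),
    ("He gave her a book.", "彼は彼女に本をあげた。"),
    ("The weather is nice.", "天気がいいです。"),
]
female_side, male_side = split_by_gender(pairs)
```

Only the English side is inspected, so no gender lexicon is needed for the target language. The two resulting groups of target sentences are what a bias score would then compare.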

Findings

01. Gender biases are present in MLMs across eight languages.

02. The MBE score correlates well with bias scores from manually created Japanese and Russian datasets.

03. Bias exists in multilingual MLMs regardless of language.

Abstract

Masked Language Models (MLMs) pre-trained by predicting masked tokens on large corpora have been used successfully in natural language processing tasks for a variety of languages. Unfortunately, it has been reported that MLMs also learn discriminative biases regarding attributes such as gender and race. Because most studies have focused on MLMs in English, the bias of MLMs in other languages has rarely been investigated. Manual annotation of evaluation data for languages other than English has been challenging due to the cost and difficulty of recruiting annotators. Moreover, existing bias evaluation methods require stereotypical sentence pairs consisting of the same context with different attribute words (e.g. He/She is a nurse). We propose the Multilingual Bias Evaluation (MBE) score to evaluate bias in various languages using only English attribute word lists and parallel corpora between the…

Code & Models

Repositories

kanekomasahiro/bias_eval_in_multiple_mlm
PyTorch · Official

Taxonomy

Topics: Text Readability and Simplification