TL;DR
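This paper investigates ethnic bias in monolingual BERT models for six languages (English, German, Spanish, Korean, Turkish, and Chinese), introduces a new bias metric called the Categorical Bias score, and proposes two mitigation strategies. Which strategy reduces bias more depends on the amount of NLP resources available for the language.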
Contribution
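It develops a novel ethnic bias metric, the Categorical Bias score, and compares two mitigation methods (a multilingual model, and contextual word alignment of two monolingual models) against monolingual BERT across languages with differing amounts of NLP resources.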
Findings
Both mitigation methods reduce ethnic bias relative to monolingual BERT.
Which method works better, the multilingual model or contextual word alignment, depends on the amount of NLP resources available for the language.
Proposed methods generalize to languages like Arabic and Greek.
Abstract
BERT and other large-scale language models (LMs) contain gender and racial bias. They also exhibit other dimensions of social bias, most of which have not been studied in depth, and some of which vary depending on the language. In this paper, we study ethnic bias and how it varies across languages by analyzing and mitigating ethnic bias in monolingual BERT for English, German, Spanish, Korean, Turkish, and Chinese. To observe and quantify ethnic bias, we develop a novel metric called the Categorical Bias score. We then propose two methods for mitigation: first, using a multilingual model, and second, using contextual word alignment of two monolingual models. We compare our proposed methods with monolingual BERT and show that these methods effectively alleviate the ethnic bias. Which of the two methods works better depends on the amount of NLP resources available for that language. We…
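The abstract names the Categorical Bias score without defining it. As a rough illustration only, the sketch below assumes the score is the variance, across ethnicities, of the log normalized probability that the model assigns to each ethnicity in a masked template, averaged over attributes and then over templates. The function name, the nested-list input layout, and the use of population variance are assumptions made for this example, not details confirmed by the text above.

```python
import math
from statistics import pvariance


def categorical_bias_score(log_probs):
    """Average variance of log normalized probabilities across ethnicities.

    log_probs[t][a] is a list with one log normalized probability per
    ethnicity, for template t and attribute a (hypothetical layout).
    """
    per_template = []
    for per_attribute in log_probs:
        # Variance across ethnicities for each attribute (population variance
        # is an assumption), then the mean over attributes.
        variances = [pvariance(per_ethnicity) for per_ethnicity in per_attribute]
        per_template.append(sum(variances) / len(variances))
    # Mean over templates.
    return sum(per_template) / len(per_template)


# One template, one attribute, four ethnicities.
uniform = [[[math.log(0.25)] * 4]]
skewed = [[[math.log(0.7), math.log(0.1), math.log(0.1), math.log(0.1)]]]

print(categorical_bias_score(uniform))  # no spread across ethnicities
print(categorical_bias_score(skewed))   # larger when one ethnicity dominates
```

Under this reading, a model that assigns the same probability to every ethnicity scores zero, and the score grows as the distribution over ethnicities becomes more skewed.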
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Softmax · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Dropout · Dense Connections · Residual Connection
