An Analysis of Social Biases Present in BERT Variants Across Multiple Languages

Aristides Milios (1, 2), Parishad BehnamGhader (1, 2) ((1) McGill University, (2) Mila)

arXiv:2211.14402 · cs.CL · November 29, 2022 · 5 cites

TL;DR

This paper investigates social biases in monolingual BERT models for English, Greek, and Persian. It analyzes gender, religious, and ethnic biases with a template-based method built on sentence pseudo-likelihood, designed to handle morphologically complex languages with gender-based adjective declensions.

Contribution

It proposes a template-based, pseudo-likelihood method for measuring any kind of bias and applies it to each monolingual BERT model, visualizing cultural similarities and differences across the bias dimensions studied.

Findings

01. Bias measurement varies significantly across languages.
02. Cultural and linguistic factors influence bias expression.
03. Higher biases in non-English models may relate to training data content.

Abstract

Although large pre-trained language models have achieved great success in many NLP tasks, it has been shown that they reflect human biases from their pre-training corpora. This bias may lead to undesirable outcomes when these models are applied in real-world settings. In this paper, we investigate the bias present in monolingual BERT models across a diverse set of languages (English, Greek, and Persian). While recent research has mostly focused on gender-related biases, we analyze religious and ethnic biases as well and propose a template-based method to measure any kind of bias, based on sentence pseudo-likelihood, that can handle morphologically complex languages with gender-based adjective declensions. We analyze each monolingual model via this method and visualize cultural similarities and differences across different dimensions of bias. Ultimately, we conclude that current methods…
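The template-and-pseudo-likelihood idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes one common definition of sentence pseudo-log-likelihood (mask each token in turn and sum the log-probabilities of the original tokens); the paper's exact scoring may differ. The names `pseudo_log_likelihood`, `fill`, `bias_score`, and `toy_masked_log_prob` are hypothetical, and the toy scorer is a hand-written stand-in for a real BERT masked-LM head, not a trained model.

```python
import math
from typing import Callable, List, Sequence

# A scorer returns log P(tokens[i] | all other tokens), as a masked LM would.
MaskedLogProb = Callable[[Sequence[str], int], float]


def pseudo_log_likelihood(tokens: Sequence[str], masked_log_prob: MaskedLogProb) -> float:
    """Sum of log P(token_i | rest), masking one position at a time."""
    return sum(masked_log_prob(tokens, i) for i in range(len(tokens)))


def fill(template: str, group: str, attribute: str) -> List[str]:
    """Fill the template slots and split on whitespace (stand-in for a real tokenizer)."""
    return template.format(group=group, attribute=attribute).split()


def bias_score(template: str, attribute: str, group_a: str, group_b: str,
               masked_log_prob: MaskedLogProb) -> float:
    """Difference in pseudo-log-likelihood between two group terms.

    A positive value means the scorer prefers the sentence with group_a.
    """
    pll_a = pseudo_log_likelihood(fill(template, group_a, attribute), masked_log_prob)
    pll_b = pseudo_log_likelihood(fill(template, group_b, attribute), masked_log_prob)
    return pll_a - pll_b


def toy_masked_log_prob(tokens: Sequence[str], i: int) -> float:
    """Hypothetical scorer: boosts a token when a 'stereotyped' partner is in context."""
    boosted_pairs = {("she", "nurse"), ("nurse", "she")}  # invented for illustration
    context = list(tokens[:i]) + list(tokens[i + 1:])
    is_boosted = any((tokens[i], other) in boosted_pairs for other in context)
    return math.log(0.5 if is_boosted else 0.1)


score = bias_score("{group} is a {attribute}", "nurse", "she", "he", toy_masked_log_prob)
print(f"bias score (she vs. he, 'nurse'): {score:.4f}")
```

With a real model, `toy_masked_log_prob` would be replaced by a function that masks position `i`, runs the masked LM, and reads off the log-probability of the original token. For languages with gendered adjectives, the attribute passed in would presumably be the form that agrees with the group term, so each filled template stays grammatical.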

Code & Models

Repositories

parishadbehnam/social-biases-in-bert-variants-across-multiple-languages
mxnet · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Residual Connection · Dense Connections · Layer Normalization · WordPiece · Linear Warmup With Linear Decay · Softmax