Are Models Biased on Text without Gender-related Language?

Catarina G Belém; Preethi Seshadri; Yasaman Razeghi; Sameer Singh

arXiv:2405.00588 · cs.CL · May 2, 2024 · 2 cites

TL;DR

This paper introduces UnStereoEval, a framework for assessing gender bias in language models in non-stereotypical contexts, revealing widespread bias even without gendered language.

Contribution

The paper presents a novel evaluation framework, UnStereoEval, for measuring gender bias in stereotype-free sentences, and benchmarks 28 models, uncovering pervasive bias beyond stereotypical language.

Findings

01. Models show low fairness in stereotype-free sentences (9%-41%)

02. Bias exists even without gender-related words

03. Highlights need for comprehensive bias evaluation methods

Abstract

Gender bias research has been pivotal in revealing undesirable behaviors in large language models, exposing serious gender stereotypes associated with occupations and emotions. A key observation in prior work is that models reinforce stereotypes as a consequence of the gendered correlations that are present in the training data. In this paper, we focus on bias where the effect from training data is unclear, and instead address the question: Do language models still exhibit gender bias in non-stereotypical settings? To do so, we introduce UnStereoEval (USE), a novel framework tailored for investigating gender bias in stereotype-free scenarios. USE defines a sentence-level score based on pretraining data statistics to determine if the sentence contains minimal word-gender associations. To systematically benchmark the fairness of popular language models in stereotype-free scenarios, we…
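The sentence-level score mentioned in the abstract can be illustrated with a minimal sketch. The abstract as shown here does not spell out the formula, so the sketch makes several assumptions: word-gender association is measured as a difference of pointwise mutual information (PMI) with "she" and "he", a sentence is scored by its most strongly gendered word, and a sentence counts as stereotype-free when that score falls under a threshold. All function names, the threshold, and the toy counts are hypothetical and chosen only for illustration.

```python
import math

# Toy corpus statistics (invented for illustration, not real pretraining counts).
TOTAL = 100_000  # hypothetical number of co-occurrence windows
COUNTS = {"she": 1000, "he": 1000, "nurse": 100, "table": 200}
COOC = {
    ("nurse", "she"): 40, ("nurse", "he"): 10,
    ("table", "she"): 20, ("table", "he"): 20,
}

def pmi(joint, count_a, count_b, total):
    """PMI = log( p(a, b) / (p(a) * p(b)) ), computed from raw counts."""
    return math.log((joint * total) / (count_a * count_b))

def word_gender_score(word, counts, cooc, total, female="she", male="he"):
    """Assumed word-level score: PMI(word, female) - PMI(word, male).

    Positive values skew female, negative values skew male.
    Returns None when the statistics needed for the word are missing.
    """
    joint_f = cooc.get((word, female), 0)
    joint_m = cooc.get((word, male), 0)
    if word not in counts or joint_f == 0 or joint_m == 0:
        return None
    return (pmi(joint_f, counts[word], counts[female], total)
            - pmi(joint_m, counts[word], counts[male], total))

def sentence_score(sentence, counts, cooc, total):
    """Assumed aggregation: the word score with the largest absolute value."""
    scores = [word_gender_score(w, counts, cooc, total)
              for w in sentence.lower().split()]
    scores = [s for s in scores if s is not None]
    return max(scores, key=abs) if scores else 0.0

def is_stereotype_free(sentence, counts, cooc, total, threshold=0.5):
    """A sentence passes when no word exceeds the (hypothetical) threshold."""
    return abs(sentence_score(sentence, counts, cooc, total)) <= threshold

print(sentence_score("the nurse sat at the table", COUNTS, COOC, TOTAL))
print(is_stereotype_free("the table", COUNTS, COOC, TOTAL))
```

Scoring a sentence by its most extreme word is a conservative choice: a single strongly gendered word is enough to exclude the sentence, whereas averaging could let it slip through when the remaining words are neutral.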

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

* The paper addresses an important area in the field of language technology.
* The analysis in Section 4 is thorough considering various factors that can be impacting the results.
* The presentation is clear with visualizations and tables used appropriately making it easier for the reader to understand the work.

Weaknesses

* I am not very convinced with the correctness of generated sentences. As authors themselves mention in the limitations, the generation doesn't involve a human in the loop. Additionally, the generation is limited to a single model (ChatGPT) and having diversity in the models for a model-based benchmark construction would be a better and more fair way to go about it.
* The definition of bias is also not very clear. Since it is a non-stereotypical setting, we see the models favour male gender ove

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

The idea is straightforward, and the authors make the flow easy to follow. The authors did the analysis with different model families and demonstrated models show clear gender preference.

Weaknesses

Since the observation that LLMs have gender bias is not new, I'm curious what new aspects this paper adds. It would be great if the authors provided more details about why we need to care about this neutral setting. A deeper understanding of the cases in which models prefer a certain gender would also make the paper stronger.

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

* An interesting analysis of gender preference in neutral contexts
* A comprehensive analysis for many models

Weaknesses

My main concern with the analysis relates to the little correlation that exists between intrinsic measures of bias and bias in downstream tasks. As such, I am not sure whether or how this property would influence the fairness or the potential harms in a downstream task. See more details in the Questions section. While I find the analysis interesting, I am not sure whether analyzing intrinsic measures of bias is useful or impactful. There have been several papers that show issues with intrinsic

Code & Models

Repositories

ucinlp/unstereo-eval
pytorch · Official

Datasets

ucinlp/unstereo-eval
dataset · 54 dl

Videos

Are Models Biased on Text without Gender-related Language?· slideslive

Taxonomy

Topics: Gender Studies in Language

Methods: Multilingual Universal Sentence Encoder · Focus