Are Models Biased on Text without Gender-related Language?
Catarina G Belém, Preethi Seshadri, Yasaman Razeghi, Sameer Singh

TL;DR
This paper introduces UnStereoEval, a framework for assessing gender bias in language models in non-stereotypical contexts, revealing widespread bias even without gendered language.
Contribution
The paper presents a novel evaluation framework, UnStereoEval, for measuring gender bias in stereotype-free sentences, and benchmarks 28 models, uncovering pervasive bias beyond stereotypical language.
Findings
Models show low fairness in stereotype-free sentences (9%-41%)
Bias exists even without gender-related words
Highlights need for comprehensive bias evaluation methods
Abstract
Gender bias research has been pivotal in revealing undesirable behaviors in large language models, exposing serious gender stereotypes associated with occupations and emotions. A key observation in prior work is that models reinforce stereotypes as a consequence of the gendered correlations that are present in the training data. In this paper, we focus on bias where the effect from training data is unclear, and instead address the question: Do language models still exhibit gender bias in non-stereotypical settings? To do so, we introduce UnStereoEval (USE), a novel framework tailored for investigating gender bias in stereotype-free scenarios. USE defines a sentence-level score based on pretraining data statistics to determine if the sentence contains minimal word-gender associations. To systematically benchmark the fairness of popular language models in stereotype-free scenarios, we…
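The sentence-level score mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so everything below is an assumption made for illustration: the word-level score is taken to be the log-ratio of a word's co-occurrence rate with "she" versus "he", the sentence-level score is the largest absolute word-level score, and the counts, the threshold, and all function names are hypothetical.

```python
import math

# Hypothetical toy co-occurrence counts (NOT from the paper): how often each
# word appears near "she" or "he" in some pretraining corpus.
COOC = {
    "nurse": {"she": 90, "he": 10},
    "engineer": {"she": 15, "he": 85},
    "table": {"she": 50, "he": 52},
    "walked": {"she": 40, "he": 41},
}
PRONOUN_TOTALS = {"she": 1000, "he": 1000}


def word_gender_score(word, smoothing=1.0):
    """Assumed word-level score: log p(word | she) - log p(word | he).

    Positive values lean toward "she", negative toward "he", and 0 means
    no measured skew. Add-one smoothing keeps unseen words at exactly 0.
    """
    counts = COOC.get(word, {})
    p_she = (counts.get("she", 0) + smoothing) / (PRONOUN_TOTALS["she"] + smoothing)
    p_he = (counts.get("he", 0) + smoothing) / (PRONOUN_TOTALS["he"] + smoothing)
    return math.log(p_she / p_he)


def sentence_gender_score(sentence):
    """Assumed sentence-level score: largest absolute word-level score."""
    words = sentence.lower().split()
    return max((abs(word_gender_score(w)) for w in words), default=0.0)


def is_minimally_gendered(sentence, threshold=0.5):
    """True if no word in the sentence exceeds the (hypothetical) threshold."""
    return sentence_gender_score(sentence) <= threshold


if __name__ == "__main__":
    for s in ["the table walked", "the nurse walked", "the engineer walked"]:
        print(s, round(sentence_gender_score(s), 3), is_minimally_gendered(s))
```

Taking the maximum over words, rather than an average, means a single strongly gendered word is enough to exclude a sentence, which matches the stated goal of keeping only sentences with minimal word-gender associations.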
Peer Reviews
Decision: ICLR 2024 poster
* The paper addresses an important area in the field of language technology.
* The analysis in Section 4 is thorough, considering various factors that can impact the results.
* The presentation is clear, with visualizations and tables used appropriately, making it easier for the reader to understand the work.
* I am not very convinced of the correctness of the generated sentences. As the authors themselves mention in the limitations, the generation doesn't involve a human in the loop. Additionally, the generation is limited to a single model (ChatGPT); having diversity in the models used for a model-based benchmark construction would be a better and fairer way to go about it.
* The definition of bias is also not very clear. Since it is a non-stereotypical setting, we see the models favour male gender ove
The idea is straightforward, and the authors make the flow easy to follow. The authors did the analysis with different model families and demonstrated models show clear gender preference.
Since the finding that LLMs have gender bias is not new, I'm curious about what new aspects this paper adds. It would be great if the authors provided more details about why we need to care about this neutral setting. A deeper understanding of the cases in which models prefer a certain gender would also make the paper stronger.
* An interesting analysis of gender preference in neutral contexts.
* A comprehensive analysis for many models.
My main concern with the analysis relates to the little correlation that exists between intrinsic measures of bias and bias in downstream tasks. As such, I am not sure whether or how this property would influence the fairness or the potential harms in a downstream task. See more details in the Questions section. While I find the analysis interesting, I am not sure whether analyzing intrinsic measures of bias is useful or impactful. There have been several papers that show issues with intrinsic
Taxonomy
Topics: Gender Studies in Language
Methods: Multilingual Universal Sentence Encoder · Focus
