Biased or Flawed? Mitigating Stereotypes in Generative Language Models by Addressing Task-Specific Flaws

Akshita Jha; Sanchit Kabra; Chandan K. Reddy

arXiv:2412.11414 · cs.CL · December 17, 2024

TL;DR

This paper presents a method to reduce stereotypes in generative language models by addressing comprehension failures through instruction-tuning on general-purpose datasets, achieving a reduction of over 60% in stereotypical outputs without explicit debiasing.

Contribution

It introduces an evaluation that disentangles bias from comprehension errors, together with a targeted stereotype mitigation framework, and demonstrates the framework's effectiveness across multiple models and bias dimensions.
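To make the idea of separating bias from comprehension errors concrete, here is a minimal sketch that assumes each reading-comprehension item is also evaluated in a hypothetical neutral "control" variant with the identity cue removed. This pairing, the `PairedResult` record, and the `categorize`/`tally` helpers are illustrative assumptions, not the paper's protocol or code. The rule is: if the model also fails the control, the error is attributed to comprehension. If it passes the control but errs toward the stereotype on the original, the error is counted as bias.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PairedResult:
    """Hypothetical per-item record; field names are illustrative, not from the paper's repo."""
    correct_original: bool   # answered the identity-bearing item correctly
    correct_control: bool    # answered a neutral control variant (identity cue removed) correctly
    chose_stereotype: bool   # on the original item, the chosen answer matches the stereotype


def categorize(r: PairedResult) -> str:
    """Assign one outcome label to an item under the paired-control assumption."""
    if r.correct_original:
        return "correct"
    if not r.correct_control:
        # The model also fails without the identity cue, so the error
        # is attributed to comprehension rather than to bias.
        return "comprehension_failure"
    if r.chose_stereotype:
        # The model understands the neutral version but errs toward the stereotype.
        return "stereotype"
    return "other_error"


def tally(results: list[PairedResult]) -> Counter:
    """Count how many items fall into each outcome category."""
    return Counter(categorize(r) for r in results)


if __name__ == "__main__":
    sample = [
        PairedResult(True, True, False),
        PairedResult(False, False, True),
        PairedResult(False, True, True),
        PairedResult(False, True, False),
    ]
    print(tally(sample))
```

Under this hypothetical scheme, only the `stereotype` bucket would be read as evidence of bias. A stereotyped answer that coincides with a failed control is counted as a comprehension flaw instead.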

Findings

1. Over 60% reduction in stereotypical outputs
2. Effective across multiple bias categories
3. Maintains model utility while reducing bias

Abstract

Recent studies have shown that generative language models often reflect and amplify societal biases in their outputs. However, these studies frequently conflate observed biases with other task-specific shortcomings, such as comprehension failure. For example, when a model misinterprets a text and produces a response that reinforces a stereotype, it becomes difficult to determine whether the issue arises from inherent bias or from a misunderstanding of the given content. In this paper, we conduct a multi-faceted evaluation that distinctly disentangles bias from flaws within the reading comprehension task. We propose a targeted stereotype mitigation framework that implicitly mitigates observed stereotypes in generative models through instruction-tuning on general-purpose datasets. We reduce stereotypical outputs by over 60% across multiple dimensions -- including nationality, age, gender,…
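The headline figure of "over 60%" can be read in more than one way. The sketch below assumes it is a relative reduction in stereotypical outputs, measured on the same evaluation set before and after instruction-tuning. That reading is an assumption, and `relative_reduction` is a hypothetical helper, not part of the authors' repository.

```python
def relative_reduction(before: int, after: int) -> float:
    """Relative drop in stereotypical outputs, in percent.

    Assumes both counts come from the same evaluation set, so
    100 * (before - after) / before is a relative (not percentage-point) change.
    """
    if before <= 0:
        raise ValueError("baseline count must be positive")
    return 100.0 * (before - after) / before


# Illustrative counts only; these are not numbers reported in the paper.
print(relative_reduction(before=200, after=70))  # 65.0
```

The distinction matters. Under this definition, a drop from 30% to 12% of outputs being stereotypical is a 60% relative reduction, but only an 18 percentage-point absolute drop.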

Code & Models

Repositories

akshitajha/biased_or_flawed
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques