Generalizing Fairness to Generative Language Models via Reformulation of Non-discrimination Criteria

Sara Sterlie; Nina Weng; Aasa Feragen

arXiv:2403.08564 · cs.CL · September 4, 2024 · 3 cites

TL;DR

This paper develops methods to identify and measure gender bias in large language models by adapting fairness criteria from classification, focusing on occupational stereotypes in medical contexts.

Contribution

It introduces generative AI analogues of non-discrimination fairness criteria and applies them to detect occupational gender bias in language models.

Findings

01. Identified gender bias in medical occupational stereotypes.
02. Demonstrated applicability of fairness criteria to generative models.
03. Provided prompts to measure bias in conversational AI.

Abstract

Generative AI, such as large language models, has undergone rapid development within recent years. As these models become increasingly available to the public, concerns arise about perpetuating and amplifying harmful biases in applications. Gender stereotypes can be harmful and limiting for the individuals they target, whether they consist of misrepresentation or discrimination. Recognizing gender bias as a pervasive societal construct, this paper studies how to uncover and quantify the presence of gender biases in generative language models. In particular, we derive generative AI analogues of three well-known non-discrimination criteria from classification, namely independence, separation and sufficiency. To demonstrate these criteria in action, we design prompts for each of the criteria with a focus on occupational gender stereotype, specifically utilizing the medical test to…
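
For reference, the three criteria named in the abstract have standard definitions in the classification setting, sketched below. The notation is an assumption chosen for illustration and is not taken from the paper: A denotes the sensitive attribute (here, gender), Y the true label, and \hat{Y} the classifier's prediction. The paper's generative analogues are not reproduced here, since the abstract as shown does not state them.

```latex
% Classical non-discrimination criteria for a classifier (requires amsmath).
% Assumed notation: A = sensitive attribute, Y = true label,
% \hat{Y} = prediction, \perp = statistical independence.
\begin{align*}
\text{Independence:} \quad & \hat{Y} \perp A \\
\text{Separation:}   \quad & \hat{Y} \perp A \mid Y \\
\text{Sufficiency:}  \quad & Y \perp A \mid \hat{Y}
\end{align*}
```

Read informally: independence asks that predictions not depend on the sensitive attribute at all, separation asks that they not depend on it once the true label is known, and sufficiency asks that the true label not depend on it once the prediction is known.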

Code & Models

Repositories

sterlie/fairness-criteria-llm (official)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Focus