Bias and Fairness in Large Language Models: A Survey

Isabel O. Gallegos; Ryan A. Rossi; Joe Barrow; Md Mehrab Tanjim; Sungchul Kim; Franck Dernoncourt; Tong Yu; Ruiyi Zhang; Nesreen K. Ahmed

arXiv:2309.00770 · cs.CL · July 16, 2024 · 59 cites

TL;DR

This survey comprehensively reviews bias evaluation and mitigation techniques for large language models, formalizing notions of social bias and fairness, and proposing taxonomies to organize existing methods and datasets.

Contribution

It introduces unified taxonomies for bias metrics, datasets, and mitigation techniques, and formalizes social bias and fairness notions in NLP.

Findings

01 Provides a consolidated, accessible collection of bias evaluation datasets

02 Classifies bias mitigation methods by intervention stage

03 Identifies open challenges and future directions

Abstract

Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes…

Code & Models

Repositories

i-gallegos/fair-llm-benchmark (Official)

Datasets

BAAI/SurveyScope
dataset · 6 dl

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection