A Survey on Fairness in Large Language Models
Yingji Li, Mengnan Du, Rui Song, Xin Wang, Ying Wang

TL;DR
This paper provides a comprehensive review of fairness issues in Large Language Models, covering evaluation metrics, debiasing methods, and future challenges across different model sizes and training paradigms.
Contribution
It offers a structured overview of fairness research in LLMs, distinguishing approaches for medium-sized and large-sized models under various training paradigms.
Findings
Evaluation metrics for bias assessment in LLMs
Debiasing techniques for reducing social biases
Identification of challenges and future research directions
Abstract
Large Language Models (LLMs) have shown powerful performance and development prospects and are widely deployed in the real world. However, LLMs can capture social biases from unprocessed training data and propagate these biases to downstream tasks. Unfair LLM systems have undesirable social impacts and potential harms. In this paper, we provide a comprehensive review of related research on fairness in LLMs. Considering the influence of parameter magnitude and training paradigm on research strategy, we divide existing fairness research into work oriented to medium-sized LLMs under the pre-training and fine-tuning paradigms and work oriented to large-sized LLMs under the prompting paradigm. First, for medium-sized LLMs, we introduce evaluation metrics and debiasing methods from the perspectives of intrinsic bias and extrinsic bias, respectively. Then, for large-sized LLMs, we introduce recent fairness…
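The abstract separates intrinsic bias, which is measured on a model's internal representations, from extrinsic bias, which is measured on downstream task behavior. As an illustration of the intrinsic side, the sketch below computes the effect size of the Word Embedding Association Test (WEAT), one widely used association-based metric. This excerpt does not show which metrics the survey details or how it formulates them, so treat this as a generic, hedged example. The function names and the toy 2-D vectors are hypothetical stand-ins for real embeddings, and the choice of the sample standard deviation is an assumption (implementations differ on this point).

```python
import math
from statistics import mean, stdev
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w: Vector, A: Sequence[Vector], B: Sequence[Vector]) -> float:
    """s(w, A, B): mean similarity of w to attribute set A minus its mean similarity to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B) -> float:
    """WEAT effect size: gap in mean association between target sets X and Y,
    normalised by the sample standard deviation of s(w, A, B) over X ∪ Y
    (sample std is an assumption; some implementations use the population std)."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy "embeddings": X leans toward attribute A, Y toward attribute B.
X = [[1.0, 0.1], [0.9, 0.2]]
Y = [[0.1, 1.0], [0.2, 0.9]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]

if __name__ == "__main__":
    print(round(weat_effect_size(X, Y, A, B), 2))
```

A positive effect size means the X targets sit closer to attribute set A than the Y targets do. A value near zero indicates no measured association gap. Swapping X and Y flips the sign.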
Taxonomy
Topics: Ethics and Social Impacts of AI
