TL;DR
This paper systematically surveys fairness definitions for language models, clarifies their distinctions and practical implications, and categorizes them by transformer architecture to guide future research and application.
Contribution
It provides a comprehensive overview of fairness notions in LMs, introduces a novel taxonomy based on transformer types, and illustrates each definition through experiments.
Findings
Different fairness notions have distinct practical implications.
A taxonomy categorizes fairness definitions by transformer architecture.
Experiments demonstrate the practical outcomes of various fairness notions.
Abstract
Language Models (LMs) have demonstrated exceptional performance across various Natural Language Processing (NLP) tasks. Despite these advancements, LMs can inherit and amplify societal biases related to sensitive attributes such as gender and race, limiting their adoption in real-world applications. Therefore, fairness has been extensively explored in LMs, leading to the proposal of various fairness notions. However, the lack of clear agreement on which fairness definition to apply in specific contexts and the complexity of understanding the distinctions between these definitions can create confusion and impede further progress. To this end, this paper proposes a systematic survey that clarifies the definitions of fairness as they apply to LMs. Specifically, we begin with a brief introduction to LMs and fairness in LMs, followed by a comprehensive, up-to-date overview of existing…
