TL;DR
This survey critically examines fairness datasets for language models, analyzing their limitations and proposing a unified evaluation framework to improve fairness assessment and guide future benchmark development.
Contribution
The survey characterizes the most widely used fairness datasets by provenance, demographic scope, annotation design, and intended use, introduces a unified evaluation framework, and offers guidance for fairness benchmarking of language models.
Findings
Identified overlooked biases and limitations across sixteen popular fairness datasets
Proposed a unified framework that surfaces consistent patterns of demographic disparities across benchmarks and scoring metrics
Highlighted the need for broader social context in future benchmarks
Abstract
Despite the growing reliance on fairness benchmarks to evaluate language models, the datasets that underpin these benchmarks remain critically underexamined. This survey addresses that overlooked foundation by offering a comprehensive analysis of the most widely used fairness datasets in language model research. To ground this analysis, we characterize each dataset across key dimensions, including provenance, demographic scope, annotation design, and intended use, revealing the assumptions and limitations baked into current evaluation practices. Building on this foundation, we propose a unified evaluation framework that surfaces consistent patterns of demographic disparities across benchmarks and scoring metrics. Applying this framework to sixteen popular datasets, we uncover overlooked biases that may distort conclusions about model fairness and offer guidance on selecting, combining,…
