BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation
Jwala Dhamala, Tony Sun, Varun Kumar, Satyapriya Krishna, Yada Pruksachatkun, Kai-Wei Chang, Rahul Gupta

TL;DR
This paper introduces BOLD, a large-scale dataset of 23,679 English prompts, together with new automated metrics for systematically measuring social biases in open-ended language generation. The language models examined show larger biases than human-written text.
Contribution
The paper presents BOLD, a comprehensive dataset and automated metrics for benchmarking biases in open-ended language generation across multiple social domains.
Findings
Language models show larger biases than human text across domains.
New metrics effectively measure toxicity, psycholinguistic norms, and gender polarity.
Bias benchmarking highlights the need for bias mitigation in language models.
Abstract
Recent advances in deep learning techniques have enabled machines to generate cohesive open-ended text when prompted with a sequence of words as context. While these models now empower many downstream applications from conversation bots to automatic storytelling, they have been shown to generate texts that exhibit social biases. To systematically study and benchmark social biases in open-ended language generation, we introduce the Bias in Open-Ended Language Generation Dataset (BOLD), a large-scale dataset that consists of 23,679 English text generation prompts for bias benchmarking across five domains: profession, gender, race, religion, and political ideology. We also propose new automated metrics for toxicity, psycholinguistic norms, and text gender polarity to measure social biases in open-ended text generation from multiple angles. An examination of text generated from three…
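To make the idea of a text gender polarity metric concrete, here is a minimal sketch that labels a generation by counting gendered tokens and then reports the share of each label across a set of generations. This is an illustration under stated assumptions, not the paper's implementation: the word lists, the tie-breaking rule, and the function names `gender_polarity` and `polarity_share` are all hypothetical.

```python
import re

# Illustrative word lists (assumption): not the lexicon used in the paper.
MALE_TOKENS = {"he", "him", "his", "himself", "man", "men"}
FEMALE_TOKENS = {"she", "her", "hers", "herself", "woman", "women"}


def gender_polarity(text: str) -> str:
    """Label a text 'male', 'female', or 'neutral' by gendered-token counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    male = sum(token in MALE_TOKENS for token in tokens)
    female = sum(token in FEMALE_TOKENS for token in tokens)
    if male > female:
        return "male"
    if female > male:
        return "female"
    # Ties, including texts with no gendered tokens, count as neutral (assumption).
    return "neutral"


def polarity_share(generations: list[str]) -> dict[str, float]:
    """Return the fraction of generations assigned to each polarity label."""
    counts = {"male": 0, "female": 0, "neutral": 0}
    for generation in generations:
        counts[gender_polarity(generation)] += 1
    total = len(generations)
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: count / total for label, count in counts.items()}
```

Comparing `polarity_share` over generations for prompts from different groups (for example, different professions) gives one rough view of skew. A lexicon-based count like this ignores context, so it should be read as a coarse signal only.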
Code & Models
- 🤗 google/gemma-7b (model): 30k downloads, 3293 likes
- 🤗 google/gemma-2-2b-it (model): 368k downloads, 1314 likes
- 🤗 google/gemma-2-2b (model): 489k downloads, 636 likes
- 🤗 google/gemma-2b (model): 174k downloads, 1152 likes
- 🤗 google/gemma-2-27b-it (model): 309k downloads, 561 likes
- 🤗 google/gemma-2-9b-it (model): 254k downloads, 781 likes
- 🤗 ataeff/recurrentgemma-2b-it (model): 1 like
- 🤗 google/gemma-2b-it (model): 57k downloads, 862 likes
- 🤗 google/gemma-7b-it (model): 67k downloads, 1241 likes
- 🤗 alpindale/gemma-7b (model): 66 downloads, 7 likes
