What's Not Said Still Hurts: A Description-Based Evaluation Framework for Measuring Social Bias in LLMs
Jinhao Pan, Chahat Raj, Ziyu Yao, Ziwei Zhu

TL;DR
This paper presents a new evaluation framework for detecting subtle social biases in large language models by probing for bias in naturalistic, contextually framed scenarios rather than only through direct term associations.
Contribution
The authors introduce the Description-based Bias Benchmark (DBB), a novel dataset that assesses bias at the semantic level in realistic contexts, revealing persistent biases in state-of-the-art LLMs.
Findings
Models reduce bias at the term level but not in nuanced contexts.
Bias persists in subtle, contextually hidden forms.
The benchmark uncovers biases that traditional methods miss.
Abstract
Large Language Models (LLMs) often exhibit social biases inherited from their training data. Existing benchmarks evaluate bias in a term-based mode, through direct associations between demographic terms and bias terms. However, LLMs have become increasingly adept at avoiding biased responses under such tests, leading to seemingly low levels of bias. Biases still persist in subtler, contextually hidden forms that these traditional benchmarks fail to capture. We introduce the Description-based Bias Benchmark (DBB), a novel dataset designed to assess bias at the semantic level, where bias concepts are hidden within naturalistic, subtly framed contexts in real-world scenarios rather than expressed as superficial terms. We analyze six state-of-the-art LLMs and find that while models show reduced bias in their responses at the term level, they continue to reinforce biases in nuanced settings. Data, code, and results are available at…
