What's Not Said Still Hurts: A Description-Based Evaluation Framework for Measuring Social Bias in LLMs

Jinhao Pan; Chahat Raj; Ziyu Yao; Ziwei Zhu

arXiv:2502.19749 · cs.CL · September 18, 2025

TL;DR

This paper presents a new evaluation framework for detecting subtle social biases in large language models by analyzing bias within naturalistic, contextually framed scenarios rather than just term associations.

Contribution

The authors introduce the Description-based Bias Benchmark (DBB), a novel dataset that assesses bias at the semantic level in realistic contexts, revealing persistent biases in state-of-the-art LLMs.

Findings

1. Models reduce bias at the term level but not in nuanced contexts.
2. Bias persists in subtle, contextually hidden forms.
3. The benchmark uncovers biases that traditional methods miss.

Abstract

Large Language Models (LLMs) often exhibit social biases inherited from their training data. Existing benchmarks evaluate bias in a term-based mode, through direct associations between demographic terms and bias terms. Because LLMs have become increasingly adept at avoiding biased responses in that setting, they appear to show low levels of bias. However, biases persist in subtler, contextually hidden forms that traditional benchmarks fail to capture. We introduce the Description-based Bias Benchmark (DBB), a novel dataset designed to assess bias at the semantic level, where bias concepts are hidden within naturalistic, subtly framed contexts in real-world scenarios rather than expressed through superficial terms. We analyze six state-of-the-art LLMs, revealing that while models reduce bias in their responses at the term level, they continue to reinforce biases in nuanced settings. Data, code, and results are available at…

Code & Models

Repositories

jp-25/hidden-bias-benchmark
PyTorch · Official

Videos

What's Not Said Still Hurts: A Description-Based Evaluation Framework for Measuring Social Bias in LLMs · Underline

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Computational and Text Analysis Methods