Biases in the Blind Spot: Detecting What LLMs Fail to Mention

Iván Arcuschin; David Chanin; Adrià Garriga-Alonso; Oana-Maria Camburu

arXiv:2602.10117 · cs.LG · March 2, 2026

Open Access

TL;DR

This paper presents an automated, black-box method for detecting hidden, task-specific biases in large language models by analyzing their reasoning traces without relying on predefined bias categories.

Contribution

The authors introduce a fully automated pipeline that uncovers unverbalized biases in LLMs without the predefined bias categories and hand-crafted datasets that existing bias evaluations typically require.

Findings

01. Automatically discovers previously unknown biases such as language fluency and formality.

02. Validates known biases like gender, race, and religion.

03. Works across multiple models and decision tasks.

Abstract

Large Language Models (LLMs) often provide chain-of-thought (CoT) reasoning traces that appear plausible, but may hide internal biases. We call these *unverbalized biases*. Monitoring models via their stated reasoning is therefore unreliable, and existing bias evaluations typically require predefined categories and hand-crafted datasets. In this work, we introduce a fully automated, black-box pipeline for detecting task-specific unverbalized biases. Given a task dataset, the pipeline uses LLM autoraters to generate candidate bias concepts. It then tests each concept on progressively larger input samples by generating positive and negative variations, and applies statistical techniques for multiple testing and early stopping. A concept is flagged as an unverbalized bias if it yields statistically significant performance differences while not being cited as justification in the model's…
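
The staged testing loop described in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistical tests the authors use, so everything concrete here is an assumption made for illustration: an exact two-sided sign test on paired decisions, a Bonferroni correction across concepts and stages, and a fixed cap on how often the reasoning may cite the concept. The names `screen_concepts`, `decide`, `mentions`, `stages`, and `max_cited` are hypothetical and are not taken from the paper.

```python
import math


def sign_test_p(pos_wins: int, neg_wins: int) -> float:
    """Exact two-sided sign test on discordant pairs (ties are ignored)."""
    n = pos_wins + neg_wins
    if n == 0:
        return 1.0
    k = min(pos_wins, neg_wins)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def screen_concepts(concepts, inputs, decide, mentions,
                    stages=(20, 40, 80), alpha=0.05, max_cited=0.2):
    """Flag concepts that shift decisions but are rarely cited in the reasoning.

    decide(x, concept, present) -> bool: hypothetical call returning the model's
        decision on input x rewritten so the concept is present or absent.
    mentions(x, concept) -> bool: hypothetical autorater saying whether the
        model's reasoning on x cites the concept as justification.
    """
    if not concepts:
        return []
    # Assumption: Bonferroni over every (concept, stage) look, simple but conservative.
    threshold = alpha / (len(concepts) * len(stages))
    flagged = []
    for concept in concepts:
        pos_wins = neg_wins = cited = seen = 0
        for n in stages:
            # Only evaluate the inputs that this larger stage adds.
            for x in inputs[seen:n]:
                with_c = decide(x, concept, True)
                without_c = decide(x, concept, False)
                pos_wins += int(with_c and not without_c)
                neg_wins += int(without_c and not with_c)
                cited += int(mentions(x, concept))
            seen = min(n, len(inputs))
            if sign_test_p(pos_wins, neg_wins) < threshold:
                # Significant effect: unverbalized only if rarely cited.
                if seen and cited / seen <= max_cited:
                    flagged.append(concept)
                break  # early stop: no larger sample needed for this concept
    return flagged
```

In this sketch a concept stops consuming model calls as soon as its effect is significant, which is one simple reading of "early stopping"; stopping early for futility is not modeled.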

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Artificial Intelligence in Healthcare and Education