Contextual StereoSet: Stress-Testing Bias Alignment Robustness in Large Language Models
Abhinaba Basu, Pavan Chakraborty

TL;DR
This paper introduces Contextual StereoSet, a benchmark for stress-testing bias robustness in large language models by varying contextual framing, revealing significant bias shifts and proposing a diagnostic profile for evaluation consistency.
Contribution
The paper presents a benchmark and methodology for evaluating bias robustness in language models under varying contextual framing, showing that measured bias depends on the context in which a prompt is posed.
Findings
Bias shifts significantly with contextual framing.
Certain frames, such as anchoring to 1990 rather than 2030 or gossip framing, raise stereotype selection in all or nearly all models tested on them.
Evaluation scores may not generalize across different contexts.
Abstract
A model that avoids stereotypes in a lab benchmark may not avoid them in deployment. We show that measured bias shifts dramatically when prompts mention different places, times, or audiences -- no adversarial prompting required. We introduce Contextual StereoSet, a benchmark that holds stereotype content fixed while systematically varying contextual framing. Testing 13 models across two protocols, we find striking patterns: anchoring to 1990 (vs. 2030) raises stereotype selection in all models tested on this contrast (p<0.05); gossip framing raises it in 5 of 6 full-grid models; out-group observer framing shifts it by up to 13 percentage points. These effects replicate in hiring, lending, and help-seeking vignettes. We propose Context Sensitivity Fingerprints (CSF): a compact profile of per-dimension dispersion and paired contrasts with bootstrap CIs and FDR correction. Two…
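The statistical ingredients the abstract names for a Context Sensitivity Fingerprint (per-dimension dispersion, paired contrasts with bootstrap CIs, and FDR correction) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, dispersion is assumed here to be the simple range of stereotype-selection rates across the levels of one context dimension, the CI is a plain percentile bootstrap, and FDR control is assumed to be Benjamini-Hochberg. The paper may define any of these differently.

```python
import random
from statistics import mean


def dispersion(rates):
    """Assumed dispersion measure: range of stereotype-selection rates
    across the levels of one context dimension (e.g. different years)."""
    return max(rates) - min(rates)


def paired_bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference a[i] - b[i].

    a and b hold per-item outcomes (e.g. 1 if the stereotype was selected)
    for the same items under two context frames.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Resample the paired differences with replacement and sort the means.
    boots = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return mean(diffs), lo, hi


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected
    while controlling the false discovery rate at level q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            max_rank = rank  # largest rank meeting the BH threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[i] = True
    return rejected
```

In this sketch, one would compute `dispersion` once per context dimension, run `paired_bootstrap_ci` for each contrast of interest (for example, 1990 vs. 2030 framing on the same items), and pass the resulting per-contrast p-values through `benjamini_hochberg` before reporting which contrasts are significant.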
Taxonomy
Topics: Computational and Text Analysis Methods · Misinformation and Its Impacts · Ethics and Social Impacts of AI
