Dialect vs Demographics: Quantifying LLM Bias from Implicit Linguistic Signals vs. Explicit User Profiles
Irti Haq, Belén Saldías

TL;DR
This study investigates how implicit dialect signals versus explicit user profiles influence bias and safety in large language models, revealing complex trade-offs between safety, diversity, and model behavior.
Contribution
It provides a detailed analysis of how explicit and implicit identity cues affect LLM safety and bias differently, showing that safety mechanisms over-rely on explicit keywords.
Findings
Implicit dialect cues reduce refusal rates and increase semantic similarity.
Explicit identity prompts activate safety filters, increasing refusals.
Current safety techniques are brittle and over-rely on explicit keywords.
Abstract
As state-of-the-art Large Language Models (LLMs) have become ubiquitous, ensuring equitable performance across diverse demographics is critical. However, it remains unclear whether these disparities arise from the explicitly stated identity itself or from the way identity is signaled. In real-world interactions, users' identity is often conveyed implicitly through a complex combination of various socio-linguistic factors. This study disentangles these signals by employing a factorial design with over 24,000 responses from two open-weight LLMs (Gemma-3-12B and Qwen-3-VL-8B), comparing prompts with explicitly announced user profiles against implicit dialect signals (e.g., AAVE, Singlish) across various sensitive domains. Our results uncover a unique paradox in LLM safety where users achieve "better" performance by sounding like a demographic than by stating they belong to it. Explicit…
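The factorial comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: only the two model names and the explicit/implicit split come from the abstract, while the domain names, record fields, helper functions, and toy records below are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from itertools import product

# Factor levels. The model names and the explicit/implicit split come from the
# abstract; the domain names are hypothetical placeholders.
MODELS = ["Gemma-3-12B", "Qwen-3-VL-8B"]
CUE_TYPES = ["explicit_profile", "implicit_dialect"]
DOMAINS = ["domain_a", "domain_b"]


def factorial_conditions():
    """Return every (model, cue type, domain) cell of the full factorial grid."""
    return list(product(MODELS, CUE_TYPES, DOMAINS))


def refusal_rate_by_cue(records):
    """Compute the fraction of refused responses for each cue type.

    Each record is a dict with a "cue" label and a boolean "refused" flag
    (a hypothetical schema, assumed here for illustration).
    """
    counts = defaultdict(lambda: [0, 0])  # cue -> [refused count, total count]
    for rec in records:
        tally = counts[rec["cue"]]
        tally[0] += int(rec["refused"])
        tally[1] += 1
    return {cue: refused / total for cue, (refused, total) in counts.items()}


# Made-up toy records, NOT results from the paper.
toy_records = [
    {"cue": "explicit_profile", "refused": True},
    {"cue": "explicit_profile", "refused": True},
    {"cue": "explicit_profile", "refused": False},
    {"cue": "explicit_profile", "refused": False},
    {"cue": "implicit_dialect", "refused": True},
    {"cue": "implicit_dialect", "refused": False},
    {"cue": "implicit_dialect", "refused": False},
    {"cue": "implicit_dialect", "refused": False},
]

rates = refusal_rate_by_cue(toy_records)
```

In a real analysis, the records would be the collected model responses, each labeled by a refusal classifier, and the rates would be compared per model and per domain rather than pooled.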
