Invisible Influences: Investigating Implicit Intersectional Biases through Persona Engineering in Large Language Models
Nandini Arimanda, Achyuth Mukund, Sakthi Balan Muthiah, Rajesh Sharma

TL;DR
This paper introduces BADx, a scalable metric to measure how persona contexts influence intersectional biases in large language models, revealing dynamic bias shifts that static tests miss.
Contribution
The study develops BADx, which combines differential bias scores, persona sensitivity, and volatility to detect persona-induced bias amplification in LLMs and to explain it through local explainability analysis, advancing bias evaluation methods.
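To make the three components concrete, here is a minimal Python sketch for a single bias test. This summary does not give the paper's exact formulas, so the definitions are assumptions. BAD is taken as the persona-conditioned bias score minus a no-persona baseline. PSI is the mean absolute BAD across personas. Volatility is the population standard deviation of the persona-conditioned scores. The names `badx_components` and `BadxSummary` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class BadxSummary:
    """Hypothetical container for BADx-style components of one bias test."""
    bad: dict[str, float]  # per-persona differential bias score
    psi: float             # assumed persona sensitivity: mean |BAD|
    volatility: float      # assumed volatility: population std of persona scores


def badx_components(baseline: float, persona_scores: dict[str, float]) -> BadxSummary:
    """Summarize how personas shift one bias score (e.g. a CEAT effect size).

    All three definitions below are illustrative assumptions, not the paper's formulas.
    """
    if not persona_scores:
        raise ValueError("need at least one persona-conditioned score")
    # Differential bias: shift of each persona's score away from the no-persona baseline.
    bad = {persona: score - baseline for persona, score in persona_scores.items()}
    # Sensitivity: average size of the shift, ignoring its direction.
    psi = mean(abs(delta) for delta in bad.values())
    # Volatility: spread of the persona-conditioned scores.
    volatility = pstdev(list(persona_scores.values()))
    return BadxSummary(bad=bad, psi=psi, volatility=volatility)


summary = badx_components(0.20, {"persona_a": 0.50, "persona_b": 0.10, "persona_c": 0.30})
print(summary.psi, summary.volatility)
```

Under these assumptions, a baseline of 0.20 with persona scores of 0.50, 0.10, and 0.30 gives BAD values of roughly +0.30, -0.10, and +0.10. The resulting PSI is about 0.167 and the volatility is about 0.163. Keeping BAD signed preserves the direction of each shift. PSI and volatility summarize the size of the shifts and how much they spread.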
Findings
Bias context significantly modulates model biases.
GPT-4o shows high bias sensitivity and volatility.
LLaMA-4 maintains low volatility and a stable bias profile.
Abstract
Large Language Models (LLMs) excel at human-like language generation but often embed and amplify implicit, intersectional biases, especially under persona-driven contexts. Existing bias audits rely on static, embedding-based tests (CEAT, I-WEAT, I-SEAT) that quantify absolute association strengths. We show that these tests are limited in capturing dynamic shifts when models adopt social roles. We address this gap by introducing the Bias Amplification Differential and Explainability Score (BADx), a novel, scalable metric that measures persona-induced bias amplification and integrates local explainability insights. BADx comprises three components: differential bias scores (BAD, based on CEAT, I-WEAT, and I-SEAT), the Persona Sensitivity Index (PSI), and Volatility (standard deviation). These components are augmented by LIME-based analysis for explainability. This study is divided and performed as two…
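The static tests named above measure association strength between target and attribute embeddings. CEAT builds on the Word Embedding Association Test (WEAT) effect size. Judging by their names, I-WEAT and I-SEAT appear to be intersectional extensions of WEAT and its sentence-level counterpart SEAT. The sketch below computes the WEAT effect size in plain Python. For each target embedding w, s(w, A, B) is its mean cosine similarity to attribute set A minus its mean cosine similarity to attribute set B. The effect size is the difference in mean s between target sets X and Y, divided by the standard deviation of s over X ∪ Y. It is illustrative only. The toy 2-D vectors stand in for real contextual embeddings, and CEAT's additional step of pooling many such effect sizes with a random-effects meta-analysis is not shown.

```python
import math
from statistics import mean, stdev
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(a * b for a, b in zip(u, v)) / norm


def association(w: Vector, A: Sequence[Vector], B: Sequence[Vector]) -> float:
    """s(w, A, B): mean similarity of w to attribute set A minus that to set B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X: Sequence[Vector], Y: Sequence[Vector],
                     A: Sequence[Vector], B: Sequence[Vector]) -> float:
    """Standardized difference in association between target sets X and Y."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    # Sample standard deviation over X ∪ Y; implementations differ on this choice.
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


# Toy 2-D "embeddings": X leans toward attribute A, Y toward attribute B.
A, B = [[1.0, 0.0]], [[0.0, 1.0]]
X, Y = [[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]
print(weat_effect_size(X, Y, A, B))
```

A positive effect size means X is more associated with A (and Y with B) than the reverse. Swapping X and Y flips the sign. Because this score describes a fixed set of embeddings, it captures absolute association strength rather than how that strength changes when a model adopts a persona.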
