Intra-Fairness Dynamics: The Bias Spillover Effect in Targeted LLM Alignment
Eva Paraschou, Line Harder Clemmensen, Sneha Das

TL;DR
This paper explores how targeted fairness interventions in large language models can unintentionally cause bias spillover across multiple sensitive attributes, emphasizing the importance of context-aware fairness evaluation.
Contribution
It introduces the concept of bias spillover in LLM fairness, demonstrating its occurrence across nine attributes and highlighting the need for multi-attribute, context-sensitive fairness assessments.
Findings
Bias spillover occurs when improving fairness on one attribute worsens fairness on others.
Context-aware analysis reveals that bias worsens significantly in ambiguous contexts.
Targeted fairness interventions can inadvertently increase disparities across multiple attributes.
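The findings above can be illustrated with a minimal sketch of a spillover check. It assumes a per-attribute bias score where 0 means unbiased and a larger magnitude means more bias; the function name `find_spillover`, the tolerance `tol`, and all numbers are hypothetical and are not taken from the paper.

```python
def find_spillover(before, after, target, tol=0.0):
    """Return untargeted attributes whose |bias score| grew by more than tol.

    before/after map attribute name -> bias score (assumed: 0 is unbiased).
    """
    worsened = {}
    for attr, old in before.items():
        if attr == target:
            continue  # the targeted attribute is expected to change
        delta = abs(after[attr]) - abs(old)
        if delta > tol:
            worsened[attr] = delta
    return worsened


def mean_abs(scores):
    """Aggregate bias as the mean absolute score over all attributes."""
    return sum(abs(v) for v in scores.values()) / len(scores)


# Made-up numbers for illustration only; not results from the paper.
before = {"gender": 0.20, "age": 0.10, "religion": 0.05}
after = {"gender": 0.05, "age": 0.18, "religion": 0.04}

spill = find_spillover(before, after, target="gender")
aggregate_improved = mean_abs(after) < mean_abs(before)
```

In this toy case the aggregate score improves while `age` gets worse, which is the pattern an aggregate-only evaluation would hide.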
Abstract
Conventional large language model (LLM) fairness alignment largely focuses on mitigating bias along single sensitive attributes, overlooking fairness as an inherently multidimensional and context-specific value. This approach risks creating systems that achieve narrow fairness metrics while exacerbating disparities along untargeted attributes, a phenomenon known as bias spillover. While extensively studied in machine learning, bias spillover remains critically underexplored in LLM alignment. In this work, we investigate how targeted gender alignment affects fairness across nine sensitive attributes in three state-of-the-art LLMs (Mistral 7B, Llama 3.1 8B, Qwen 2.5 7B). Using Direct Preference Optimization and the BBQ benchmark, we evaluate fairness under ambiguous and disambiguated contexts. Our findings reveal noticeable bias spillover: while aggregate results show improvements,…
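For readers unfamiliar with Direct Preference Optimization, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not the authors' training code: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions for illustration, and the paper's actual hyperparameters are not given in this summary.

```python
import math


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Arguments are summed log-probabilities of each response under the
    policy being trained (pi_*) and the frozen reference model (ref_*).
    beta=0.1 is an assumed default, not a value from the paper.
    """
    # How much more the policy favors the chosen response than the reference does.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) computed as a numerically stable softplus(-x).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: loss equals log(2).
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)

# Policy shifts probability toward the chosen response: loss drops below log(2).
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In a gender-targeted setup like the one described, the chosen and rejected responses would come from gender-related preference pairs only, which is why effects on the other attributes have to be measured separately.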
Taxonomy
Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Mobile Crowdsensing and Crowdsourcing
