Intra-Fairness Dynamics: The Bias Spillover Effect in Targeted LLM Alignment

Eva Paraschou; Line Harder Clemmensen; Sneha Das

arXiv:2602.16438·cs.LG·February 19, 2026

Open Access

TL;DR

This paper explores how targeted fairness interventions in large language models can unintentionally cause bias spillover across multiple sensitive attributes, emphasizing the importance of context-aware fairness evaluation.

Contribution

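It examines bias spillover in LLM alignment, a phenomenon well studied in machine learning but underexplored for LLMs, by measuring how targeted gender alignment affects fairness across nine sensitive attributes in three models, and it argues for multi-attribute, context-sensitive fairness assessment.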

Findings

01

Bias spillover occurs when fairness improvements in one attribute worsen others.

02

Context-aware analysis reveals significant fairness degradation in ambiguous contexts.

03

Targeted fairness interventions can inadvertently increase disparities across multiple attributes.
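
The spillover idea in the findings above can be illustrated with a minimal sketch. It assumes each attribute has a signed bias score where 0 means unbiased, measured before and after alignment; the function name `find_spillover`, the `tol` parameter, and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def find_spillover(
    before: Dict[str, float],
    after: Dict[str, float],
    target: str,
    tol: float = 0.0,
) -> List[str]:
    """Return untargeted attributes whose absolute bias score grew by more than tol.

    Assumption: scores are signed and 0 means unbiased, so a larger
    absolute value means more bias in either direction.
    """
    worsened = []
    for attr, score_before in before.items():
        if attr == target:
            continue  # the targeted attribute is expected to change
        if abs(after[attr]) - abs(score_before) > tol:
            worsened.append(attr)
    return sorted(worsened)


# Made-up scores for illustration only; these are not results from the paper.
before = {"gender": 0.20, "age": 0.10, "religion": 0.05}
after = {"gender": 0.05, "age": 0.15, "religion": 0.04}

print(find_spillover(before, after, target="gender"))  # ['age']
```

Running the same check separately on ambiguous and disambiguated contexts, rather than on a pooled score, is what would let an aggregate improvement and a context-specific degradation show up side by side.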

Abstract

Conventional large language model (LLM) fairness alignment largely focuses on mitigating bias along single sensitive attributes, overlooking fairness as an inherently multidimensional and context-specific value. This approach risks creating systems that achieve narrow fairness metrics while exacerbating disparities along untargeted attributes, a phenomenon known as bias spillover. While extensively studied in machine learning, bias spillover remains critically underexplored in LLM alignment. In this work, we investigate how targeted gender alignment affects fairness across nine sensitive attributes in three state-of-the-art LLMs (Mistral 7B, Llama 3.1 8B, Qwen 2.5 7B). Using Direct Preference Optimization and the BBQ benchmark, we evaluate fairness under ambiguous and disambiguous contexts. Our findings reveal noticeable bias spillover: while aggregate results show improvements,…
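
For reference, the standard Direct Preference Optimization objective is sketched below. The abstract does not state the authors' exact setup, so the notation here is an assumption: $\pi_\theta$ is the policy being aligned, $\pi_{\mathrm{ref}}$ a frozen reference model, $\beta$ a scaling hyperparameter, and $(x, y_w, y_l)$ a prompt with its preferred and dispreferred responses.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

In a gender-targeted setting, the preference pairs would presumably favor the less gender-biased response, which is why effects on the other attributes are not directly constrained by the loss.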

Taxonomy

Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Mobile Crowdsensing and Crowdsourcing