A Representation-Level Assessment of Bias Mitigation in Foundation Models

Svetoslav Nizhnichenkov; Rahul Nair; Elizabeth Daly; Brian Mac Namee

arXiv:2604.08561·cs.CL·April 13, 2026

TL;DR

This paper analyzes how bias mitigation techniques alter the internal representations of foundation models, using BERT and Llama2 as representative architectures, and shows that mitigation reduces gender-occupation disparities in the embedding space.

Contribution

It provides an internal representational analysis of bias mitigation effects and introduces WinoDec, a new dataset for assessing decoder-only models.

Findings

1. Bias mitigation reduces gender-occupation disparities in embeddings.
2. Representational shifts are consistent across different model architectures.
3. Embedding analysis can validate debiasing effectiveness.

Abstract

We investigate how successful bias mitigation reshapes the embedding space of encoder-only and decoder-only foundation models, offering an internal audit of model behaviour through representational analysis. Using BERT and Llama2 as representative architectures, we assess the shifts in associations between gender and occupation terms by comparing baseline and bias-mitigated variants of the models. Our findings show that bias mitigation reduces gender-occupation disparities in the embedding space, leading to more neutral and balanced internal representations. These representational shifts are consistent across both model types, suggesting that fairness improvements can manifest as interpretable and geometric transformations. These results position embedding analysis as a valuable tool for understanding and validating the effectiveness of debiasing methods in foundation models. To further…
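The comparison the abstract describes, measuring gender-occupation associations in a baseline model and in its bias-mitigated variant, can be illustrated with a minimal sketch. The paper's actual metric is not given in this summary, so the cosine-similarity gap below is an assumption, and the function names (`association_gap`, `mean_abs_gap`) and the 3-dimensional vectors are hypothetical, invented only to show the shape of the calculation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association_gap(occupation, male, female):
    """Positive: occupation sits closer to the male term; negative: closer to the female term."""
    return cosine(occupation, male) - cosine(occupation, female)


def mean_abs_gap(embeddings, occupations, male="he", female="she"):
    """Average magnitude of the gender gap over a list of occupation terms."""
    gaps = [
        association_gap(embeddings[o], embeddings[male], embeddings[female])
        for o in occupations
    ]
    return sum(abs(g) for g in gaps) / len(gaps)


# Toy vectors, NOT taken from BERT, Llama2, or the paper (assumption for illustration).
baseline = {
    "he": [1.0, 0.0, 0.2],
    "she": [0.0, 1.0, 0.2],
    "engineer": [0.9, 0.1, 0.5],
    "nurse": [0.1, 0.9, 0.5],
}
mitigated = {
    "he": [1.0, 0.0, 0.2],
    "she": [0.0, 1.0, 0.2],
    "engineer": [0.5, 0.5, 0.5],
    "nurse": [0.5, 0.5, 0.5],
}

occupations = ["engineer", "nurse"]
baseline_gap = mean_abs_gap(baseline, occupations)
mitigated_gap = mean_abs_gap(mitigated, occupations)
print(f"baseline gap:  {baseline_gap:.3f}")
print(f"mitigated gap: {mitigated_gap:.3f}")
```

In a real audit the dictionaries would hold embeddings extracted from the two model variants for the same terms or contexts. A smaller mean absolute gap for the mitigated variant would match the direction of the effect reported in the abstract.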

Code & Models

Repositories

winodec/wino-dec (GitHub)
