Unveiling the "Fairness Seesaw": Discovering and Mitigating Gender and Race Bias in Vision-Language Models
Jian Lan, Udo Schlegel, Tanveer Hannan, Gengyuan Zhang, Haokun Chen, Thomas Seidl

TL;DR
This paper systematically uncovers gender and race biases in vision-language models, revealing internal bias dynamics and proposing a post-hoc method to improve fairness and calibration without harming reasoning.
Contribution
It introduces a novel analysis of bias mechanisms in VLMs and proposes RES-FAIR, a framework for bias mitigation through hidden state adjustment.
Findings
Models often produce fair labels while assigning skewed confidence scores to specific social groups.
Fairness knowledge varies across model layers, peaking in the intermediate layers.
Within layers, residual streams may carry conflicting social biases.
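The first finding, fair labels paired with skewed confidence, can be illustrated with a small sketch. This is not the paper's evaluation code. It assumes each model response has already been reduced to a hypothetical (group, label, confidence) tuple, and the names group_stats and parity_gaps and the toy records are invented for illustration only.

```python
from collections import defaultdict


def group_stats(records):
    """Summarise predictions per social group.

    records: iterable of (group, label, confidence) tuples, where label is
    0 or 1 and confidence is the probability the model gave to its label.
    Returns {group: (positive_label_rate, mean_confidence)}.
    """
    by_group = defaultdict(list)
    for group, label, confidence in records:
        by_group[group].append((label, confidence))

    stats = {}
    for group, rows in by_group.items():
        positive_rate = sum(1 for label, _ in rows if label == 1) / len(rows)
        mean_conf = sum(conf for _, conf in rows) / len(rows)
        stats[group] = (positive_rate, mean_conf)
    return stats


def parity_gaps(stats):
    """Max-minus-min gap across groups, for label rate and for confidence."""
    rates = [rate for rate, _ in stats.values()]
    confs = [conf for _, conf in stats.values()]
    return max(rates) - min(rates), max(confs) - min(confs)


# Hypothetical toy data: both groups receive the same mix of labels,
# but the model is far more confident for group_a than for group_b.
records = [
    ("group_a", 1, 0.95), ("group_a", 0, 0.90),
    ("group_b", 1, 0.60), ("group_b", 0, 0.55),
]

stats = group_stats(records)
label_gap, confidence_gap = parity_gaps(stats)
print(f"label gap: {label_gap:.2f}, confidence gap: {confidence_gap:.2f}")
```

In this toy case the label gap is zero while the confidence gap is large, which is the pattern the finding describes: a check that looks only at output labels would report the model as fair and miss the skew in its probabilities.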
Abstract
Although Vision-Language Models (VLMs) have achieved remarkable success, the knowledge mechanisms underlying their social biases remain a black box, and the resulting fairness- and ethics-related problems harm certain groups of people in society. It is unknown to what extent VLMs exhibit gender and race bias in generative responses. In this paper, we conduct a systematic discovery of gender and race bias in state-of-the-art VLMs, focusing not only on surface-level responses but also on the internal probability distributions and hidden state dynamics. Our empirical analysis reveals three critical findings: 1) The Fairness Paradox: Models often generate fair text labels while maintaining highly skewed confidence scores (miscalibration) toward specific social groups. 2) Layer-wise Fluctuation: Fairness knowledge is not uniformly distributed; it peaks in intermediate layers and undergoes substantial…
Taxonomy
Topics: Topic Modeling · AI in Service Interactions · Speech and dialogue systems
Methods: Focus
