Unveiling the "Fairness Seesaw": Discovering and Mitigating Gender and Race Bias in Vision-Language Models

Jian Lan; Udo Schlegel; Tanveer Hannan; Gengyuan Zhang; Haokun Chen; Thomas Seidl

arXiv:2505.23798 · cs.CL · February 12, 2026

TL;DR

This paper systematically uncovers gender and race biases in vision-language models, revealing internal bias dynamics and proposing a post-hoc method to improve fairness and calibration without harming reasoning.

Contribution

It introduces a novel analysis of bias mechanisms in VLMs and proposes RES-FAIR, a framework for bias mitigation through hidden state adjustment.
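The summary above does not spell out how RES-FAIR adjusts hidden states, so the following is only a generic illustration of what a post-hoc hidden state adjustment can look like, not the paper's method. It assumes a single hypothetical "bias direction" vector is already known and removes the component of a hidden state along it. The function name `remove_direction` and the `strength` parameter are invented for this sketch.

```python
def remove_direction(hidden, direction, strength=1.0):
    """Subtract the projection of `hidden` onto `direction`.

    Assumption: `direction` is a hypothetical bias direction supplied by
    the caller; how such a direction would be found is out of scope here.
    strength=1.0 removes the component fully, 0.0 leaves `hidden` unchanged.
    """
    norm_sq = sum(d * d for d in direction)
    if norm_sq == 0:
        raise ValueError("direction must be a non-zero vector")
    # Scalar projection coefficient of hidden onto direction.
    coeff = sum(h * d for h, d in zip(hidden, direction)) / norm_sq
    return [h - strength * coeff * d for h, d in zip(hidden, direction)]


# Toy 2-dimensional example with made-up numbers.
adjusted = remove_direction([3.0, 4.0], [1.0, 0.0])
print(adjusted)  # [0.0, 4.0]
```

After the adjustment the toy vector has no component along the assumed direction, while the orthogonal component is untouched. A real intervention would act on model activations and would need to check that downstream reasoning is preserved.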

Findings

1) Models often produce fair labels but with skewed confidence scores.

2) Fairness knowledge varies across model layers, peaking mid-way.

3) Within layers, residual streams may carry conflicting social biases.
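The first finding, fair labels with skewed confidence, can be made concrete with a small sketch. It assumes each model response has been reduced to a hypothetical `(group, label, confidence)` record; the record layout, the group names, and the numbers are all made up for illustration and are not the paper's evaluation protocol.

```python
from statistics import mean


def label_and_confidence_gap(records, positive_label="yes"):
    """Return (label_gap, confidence_gap) across groups.

    records: iterable of (group, predicted_label, confidence) tuples.
    label_gap: spread in the rate of `positive_label` between groups.
    confidence_gap: spread in mean confidence between groups.
    """
    by_group = {}
    for group, label, conf in records:
        by_group.setdefault(group, []).append((label, conf))

    label_rate = {
        g: mean([1.0 if lab == positive_label else 0.0 for lab, _ in rows])
        for g, rows in by_group.items()
    }
    mean_conf = {g: mean([c for _, c in rows]) for g, rows in by_group.items()}

    label_gap = max(label_rate.values()) - min(label_rate.values())
    conf_gap = max(mean_conf.values()) - min(mean_conf.values())
    return label_gap, conf_gap


# Made-up toy data: both groups receive the same label every time,
# but the model is far more confident for group_a than for group_b.
toy_records = [
    ("group_a", "yes", 0.9),
    ("group_a", "yes", 0.8),
    ("group_b", "yes", 0.6),
    ("group_b", "yes", 0.5),
]
label_gap, conf_gap = label_and_confidence_gap(toy_records)
print(f"label gap: {label_gap:.2f}, confidence gap: {conf_gap:.2f}")
```

On the toy data the label gap is zero while the confidence gap is about 0.30, which is the pattern the finding describes: an audit that looks only at output labels would report no disparity.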

Abstract

Although Vision-Language Models (VLMs) have achieved remarkable success, the knowledge mechanisms underlying their social biases remain a black box, and the resulting fairness- and ethics-related problems harm certain groups of people in society. It is unknown to what extent VLMs exhibit gender and race bias in generative responses. In this paper, we conduct a systematic discovery of gender and race bias in state-of-the-art VLMs, focusing not only on surface-level responses but also on the internal probability distributions and hidden state dynamics. Our empirical analysis reveals three critical findings: 1) The Fairness Paradox: Models often generate fair text labels while maintaining highly skewed confidence scores (miscalibration) toward specific social groups. 2) Layer-wise Fluctuation: Fairness knowledge is not uniformly distributed; it peaks in intermediate layers and undergoes substantial…

Taxonomy

Topics: Topic Modeling · AI in Service Interactions · Speech and dialogue systems
