Measuring Social Bias in Vision-Language Models with Face-Only Counterfactuals from Real Photos

Haodong Chen; Qiang Huang; Jiaqi Zhao; Qiuping Jiang; Xiaojun Chang; Jun Yu

arXiv:2601.06931·cs.CV·April 21, 2026

TL;DR

This paper introduces a face-only counterfactual evaluation method and dataset to measure social bias in vision-language models, revealing persistent demographic disparities across tasks.

Contribution

It proposes a novel face-only counterfactual paradigm, creates the FOCUS dataset, and benchmarks bias in VLMs across multiple decision tasks.

Findings

1. Demographic disparities persist under strict visual control.

2. Bias varies substantially across different task formulations.

3. Counterfactual evaluation reveals biases not apparent in original images.

Abstract

Vision-Language Models (VLMs) are increasingly deployed in socially consequential settings, raising concerns about social bias driven by demographic cues. A central challenge in measuring such social bias is attribution under visual confounding: real-world images entangle race and gender with correlated factors such as background and clothing, obscuring attribution. We propose a face-only counterfactual evaluation paradigm that isolates demographic effects while preserving real-image realism. Starting from real photographs, we generate counterfactual variants by editing only facial attributes related to race and gender, keeping all other visual factors fixed. Based on this paradigm, we construct FOCUS, a dataset of 480 scene-matched counterfactual images across six occupations and ten demographic groups, and propose REFLECT, a benchmark comprising three…
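To make the scene-matched comparison concrete, the following is a minimal sketch of how per-group disparities could be computed once a model has produced a numeric output for every counterfactual image. It is not the paper's REFLECT metric, which the truncated abstract does not specify. The function names `group_gaps` and `max_disparity`, the `(scene_id, group)` keying, and the choice of a scene-mean baseline are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def group_gaps(scores):
    """Average within-scene deviation per demographic group.

    `scores` maps (scene_id, group) -> numeric model output, e.g. a
    rating or a probability. This layout is hypothetical, not the
    FOCUS file format.
    """
    # Collect the outputs of all counterfactual variants of each scene.
    by_scene = defaultdict(dict)
    for (scene, group), value in scores.items():
        by_scene[scene][group] = value

    # Within a scene only the face differs, so subtracting the scene
    # mean removes scene-level effects (background, clothing, pose).
    deviations = defaultdict(list)
    for per_group in by_scene.values():
        scene_mean = mean(per_group.values())
        for group, value in per_group.items():
            deviations[group].append(value - scene_mean)

    return {group: mean(vals) for group, vals in deviations.items()}


def max_disparity(gaps):
    """Largest gap between any two groups' average deviations."""
    return max(gaps.values()) - min(gaps.values())
```

Because every variant of a scene shares the same background and clothing, a nonzero gap under this sketch can only come from the edited facial attributes or from noise in the model's output, which is the attribution argument the abstract makes.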
