Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models
Zahraa Al Sahili, Ioannis Patras, Matthew Purver

TL;DR
This study systematically audits four multilingual vision-language models and finds that multilinguality often amplifies bias, especially in low-resource and grammatically gendered languages. The results highlight the need for nuanced bias evaluation.
Contribution
The paper presents the first systematic bias audit of four public multilingual CLIP variants (M-CLIP, NLLB-CLIP, CAPIVARA-CLIP, and SigLIP-2) across ten languages, showing how multilinguality influences stereotype propagation and bias amplification.
Findings
Every model exhibits stronger gender skew than its English-only baseline.
Biases are amplified in low-resource and highly gendered languages.
The shared encoder of NLLB-CLIP and SigLIP-2 transfers English gender stereotypes into gender-neutral languages.
Abstract
Multilingual vision-language models (VLMs) promise universal image-text retrieval, yet their social biases remain underexplored. We perform the first systematic audit of four public multilingual CLIP variants: M-CLIP, NLLB-CLIP, CAPIVARA-CLIP, and the debiased SigLIP-2, covering ten languages that differ in resource availability and morphological gender marking. Using balanced subsets of FairFace and the PATA stereotype suite in a zero-shot setting, we quantify race and gender bias and measure stereotype amplification. Contrary to the intuition that multilinguality mitigates bias, every model exhibits stronger gender skew than its English-only baseline. CAPIVARA-CLIP shows its largest biases precisely in the low-resource languages it targets, while the shared encoder of NLLB-CLIP and SigLIP-2 transfers English gender stereotypes into gender-neutral languages; loosely coupled encoders…
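The zero-shot measurement the abstract describes can be illustrated with a minimal sketch. It assumes image and prompt embeddings are already computed; the function names, the toy 2-D vectors, the "doctor"/"nurse" prompts, and the max-minus-min skew score are all hypothetical choices for illustration, not the paper's actual metric or data.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


def zero_shot_label(image_emb: Sequence[float],
                    prompt_embs: Dict[str, Sequence[float]]) -> str:
    """Return the prompt whose embedding is most similar to the image."""
    return max(prompt_embs, key=lambda name: cosine(image_emb, prompt_embs[name]))


def group_rates(samples: List[Tuple[str, Sequence[float]]],
                prompt_embs: Dict[str, Sequence[float]],
                target_label: str) -> Dict[str, float]:
    """Fraction of images in each demographic group assigned target_label."""
    counts: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for group, emb in samples:
        counts[group] += 1
        if zero_shot_label(emb, prompt_embs) == target_label:
            hits[group] += 1
    return {group: hits[group] / counts[group] for group in counts}


def max_skew(rates: Dict[str, float]) -> float:
    """Illustrative skew score: gap between the highest and lowest group rate."""
    return max(rates.values()) - min(rates.values())


# Toy data (hypothetical): 2-D stand-ins for text and image embeddings.
prompts = {"doctor": (1.0, 0.0), "nurse": (0.0, 1.0)}
samples = [
    ("male", (0.9, 0.1)),
    ("male", (0.8, 0.3)),
    ("female", (0.2, 0.9)),
    ("female", (0.7, 0.4)),
]

rates = group_rates(samples, prompts, target_label="doctor")
skew = max_skew(rates)
```

Under these assumptions, the same images would be scored once per language by swapping in prompt embeddings for that language's translated captions, which would allow a per-language skew to be compared against the English value.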
Taxonomy
Topics: Categorization, perception, and language · Language, Metaphor, and Cognition
Methods: Contrastive Language-Image Pre-training
