Data Matters Most: Auditing Social Bias in Contrastive Vision Language Models

Zahraa Al Sahili; Ioannis Patras; Matthew Purver

arXiv:2501.13223 · cs.LG · January 26, 2026

Open Access

TL;DR

This paper investigates how model size, training data scale, and data source influence social biases in vision-language models, revealing data source as the primary bias driver and evaluating debiasing methods.

Contribution

It systematically compares models with identical objectives but different data sources and sizes, highlighting data source as the key factor in bias and debiasing effectiveness.

Findings

01

Increasing encoder size reduces gender bias in CLIP but amplifies both gender and racial bias in OpenCLIP.

02

Expanding the LAION corpus from 400M to 2B pairs further increases bias in OpenCLIP.

03

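At matched model and data budgets, the choice of data source shapes both bias patterns and debiasing success: substituting proprietary data with LAION improves gender fairness but increases racial skew.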

Abstract

Vision-language models (VLMs) deliver strong zero-shot recognition but frequently inherit social biases from their training data. We systematically disentangle three design factors -- model size, training-data scale, and training-data source -- by comparing CLIP and OpenCLIP, two models that share an identical contrastive objective yet differ in encoder width and in the image-text corpora on which they are pre-trained (400M proprietary pairs vs. 400M/2B LAION). Across balanced face-analysis benchmarks, enlarging the encoder reduces gender skew in CLIP but amplifies both gender and racial skew in OpenCLIP; increasing the LAION corpus from 400M to 2B further increases OpenCLIP bias. At matched model and data budgets, substituting proprietary data with LAION improves gender fairness while increasing racial skew, underscoring data source as the primary driver of bias patterns. We also…
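The abstract does not spell out how skew is computed, so the following is only an illustrative sketch of one common way to audit a contrastive model on a balanced benchmark: assign each image the text prompt with the highest cosine similarity, then compare how often a given label is predicted for each demographic group. The function names (`zero_shot_predict`, `group_rates`, `max_skew`), the toy embeddings, and the max-minus-min gap are assumptions made for illustration, not the authors' metric; in practice the embeddings would come from a CLIP or OpenCLIP encoder.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(image_embs, text_embs):
    """For each image, return the index of the most similar text prompt."""
    return [
        max(range(len(text_embs)), key=lambda j: cosine(img, text_embs[j]))
        for img in image_embs
    ]


def group_rates(preds, groups, target_label):
    """Fraction of images in each group that receive target_label."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(preds, groups):
        totals[group] += 1
        hits[group] += int(pred == target_label)
    return {g: hits[g] / totals[g] for g in totals}


def max_skew(rates):
    """Gap between the highest and lowest per-group rate (0 means parity)."""
    return max(rates.values()) - min(rates.values())


# Toy data (hypothetical): four image embeddings, two prompt embeddings.
image_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
text_embs = [[1.0, 0.0], [0.0, 1.0]]
groups = ["a", "a", "b", "b"]

preds = zero_shot_predict(image_embs, text_embs)
rates = group_rates(preds, groups, target_label=0)
print(preds, rates, max_skew(rates))
```

On a benchmark that is balanced across groups, a gap near zero indicates the label is assigned at similar rates to every group, while a larger gap indicates skew. Repeating the same measurement across encoder sizes and pre-training corpora is the kind of comparison the abstract describes.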

Taxonomy

Topics: linguistics and terminology studies · Text Readability and Simplification

Methods: Diffusion · Contrastive Language-Image Pre-training