No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models

Angéline Pouget; Lucas Beyer; Emanuele Bugliarello; Xiao Wang; Andreas Peter Steiner; Xiaohua Zhai; Ibrahim Alabdulmohsin

arXiv:2405.13777 · cs.CV · October 25, 2024

TL;DR

This paper investigates how filtering training data to English image-text pairs in contrastive vision-language models can limit cultural and socioeconomic diversity, proposing new evaluation metrics and training strategies to promote inclusivity.

Contribution

It reveals the biases introduced by data filtering, introduces geo-localization as a new diversity metric, and demonstrates that unfiltered global pretraining enhances cultural understanding.

Findings

01

Filtering training data to English image-text pairs disadvantages communities of lower socioeconomic status and hurts cultural understanding.

02

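Pretraining on global, unfiltered data before fine-tuning on English content improves cultural understanding without sacrificing performance on popular benchmarks.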

03

Geo-localization is introduced as an evaluation task for assessing cultural diversity in VLMs.

Abstract

We study cultural and socioeconomic diversity in contrastive vision-language models (VLMs). Using a broad range of benchmark datasets and evaluation metrics, we bring to attention several important findings. First, the common filtering of training data to English image-text pairs disadvantages communities of lower socioeconomic status and negatively impacts cultural understanding. Notably, this performance gap is not captured by, and is even at odds with, the currently popular evaluation metrics derived from the Western-centric ImageNet and COCO datasets. Second, pretraining with global, unfiltered data before fine-tuning on English content can improve cultural understanding without sacrificing performance on said popular benchmarks. Third, we introduce the task of geo-localization as a novel evaluation metric to assess cultural diversity in VLMs. Our work underscores the value of using…
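
One way a geo-localization evaluation of a contrastive model could be set up is as zero-shot classification: embed the image, embed one text prompt per country, and predict the country whose text embedding has the highest cosine similarity to the image embedding. The sketch below illustrates only that general technique. The abstract does not describe the paper's exact protocol, so the function names, the toy three-dimensional embeddings, and the country labels are all hypothetical stand-ins for the outputs of a real image and text encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_scores(image_emb, text_embs):
    """Cosine similarity between one image embedding and each text embedding."""
    img = l2_normalize(image_emb)
    return [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_embs]


def zero_shot_geolocalize(image_emb, text_embs, labels):
    """Return the label whose text embedding is most similar to the image."""
    scores = cosine_scores(image_emb, text_embs)
    best = max(range(len(labels)), key=lambda i: scores[i])
    return labels[best]


def accuracy(image_embs, true_labels, text_embs, labels):
    """Fraction of images whose predicted label matches the true label."""
    preds = [zero_shot_geolocalize(e, text_embs, labels) for e in image_embs]
    return sum(p == t for p, t in zip(preds, true_labels)) / len(true_labels)


# Hypothetical toy data: in practice these would come from the model's
# text encoder (one prompt per country) and image encoder.
LABELS = ["Kenya", "Japan", "Brazil"]
TEXT_EMBS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
IMAGE_EMBS = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.6, 0.1, 0.3]]
TRUE_LABELS = ["Kenya", "Japan", "Brazil"]

if __name__ == "__main__":
    print(accuracy(IMAGE_EMBS, TRUE_LABELS, TEXT_EMBS, LABELS))
```

In this toy setup the third image is closest to the "Kenya" prompt despite being labeled "Brazil", so two of three predictions are correct. Comparing such an accuracy between a model trained on English-filtered data and one trained on unfiltered data is the kind of contrast the abstract describes.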

Code & Models

Repositories

google-research/big_vision (JAX, official)

Videos

No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models · SlidesLive

Taxonomy

Topics: Religious Education and Schools