Dataset Scale and Societal Consistency Mediate Facial Impression Bias in Vision-Language AI

Robert Wolfe; Aayushi Dangol; Alexis Hiniker; Bill Howe

arXiv:2408.01959 · cs.CV · August 29, 2024

TL;DR

This study examines how dataset scale and societal consistency shape facial impression biases in 43 CLIP vision-language models and how those biases carry over to generative AI. It finds that models trained on larger datasets reproduce more human-like and more subtle biases.

Contribution

The paper demonstrates that dataset scale and societal consensus shape the emergence of facial impression biases in CLIP models and their transfer to generative models like Stable Diffusion.

Findings

01. Biases reflect societal consensus across models.

02. Larger datasets produce more human-like biases.

03. Biases intersect with racial biases in generative models.

Abstract

Multimodal AI models capable of associating images and text hold promise for numerous domains, ranging from automated image captioning to accessibility applications for blind and low-vision users. However, uncertainty about bias has in some cases limited their adoption and availability. In the present work, we study 43 CLIP vision-language models to determine whether they learn human-like facial impression biases, and we find evidence that such biases are reflected across three distinct CLIP model families. We show for the first time that the degree to which a bias is shared across a society predicts the degree to which it is reflected in a CLIP model. Human-like impressions of visually unobservable attributes, like trustworthiness and sexuality, emerge only in models trained on the largest dataset, indicating that a better fit to uncurated cultural data results in the reproduction…
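
The abstract's central claim, that the degree to which a bias is shared across a society predicts how strongly a CLIP model reflects it, can be read as a two-level correlation. The sketch below is only an illustration of that reading, not the authors' pipeline: the function names (`cosine`, `pearson`, `trait_score`), the three-dimensional vectors standing in for CLIP image and text embeddings, and every numeric value are made-up assumptions chosen to keep the example self-contained.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def trait_score(image_emb, trait_emb, opposite_emb):
    """Hypothetical association score: similarity of a face image to a trait
    prompt minus its similarity to the opposite prompt."""
    return cosine(image_emb, trait_emb) - cosine(image_emb, opposite_emb)

# Level 1: for one trait, compare model scores with human ratings.
# All vectors and ratings are toy values (assumptions), not real data.
image_embs = [[0.9, 0.1, 0.2], [0.2, 0.8, 0.1], [0.6, 0.5, 0.3], [0.1, 0.3, 0.9]]
trait_emb = [1.0, 0.0, 0.0]       # stand-in for a trait prompt embedding
opposite_emb = [0.0, 1.0, 0.0]    # stand-in for the opposite prompt embedding
human_ratings = [6.1, 2.4, 4.0, 3.2]  # toy mean human rating per face

model_scores = [trait_score(e, trait_emb, opposite_emb) for e in image_embs]
model_human_r = pearson(model_scores, human_ratings)

# Level 2: across traits, relate how widely a bias is shared among human
# raters to how strongly the model reflects it. Toy values again.
rater_agreement = [0.75, 0.50, 0.30]       # hypothetical per-trait agreement
model_human_by_trait = [0.62, 0.35, 0.10]  # hypothetical per-trait level-1 results
consistency_r = pearson(rater_agreement, model_human_by_trait)

print(f"model-human correlation for one trait: {model_human_r:.3f}")
print(f"agreement vs. model-human correlation: {consistency_r:.3f}")
```

In practice the toy vectors would be replaced by embeddings from an actual CLIP model, and the choice of similarity measure, prompt wording, and correlation statistic would follow the paper's own method, which this summary does not spell out.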

Taxonomy

Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education · Face Recognition and Perception

Methods: Contrastive Language-Image Pre-training · Diffusion