Markedness in Visual Semantic AI
Robert Wolfe, Aylin Caliskan

TL;DR
This paper evaluates the CLIP multimodal model for biases related to age, gender, and race or ethnicity. It finds significant disparities in how the model labels different social groups, and these disparities reflect societal biases.
Contribution
It provides a detailed analysis of bias patterns in CLIP, highlighting disparities in markedness and self-similarity across social groups, and connects these biases to societal and linguistic influences.
Findings
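CLIP labels White individuals as 'person' far more often than individuals of other races or ethnicities (47.9% of the time, vs. 5.0% or less).
Gender and age affect whether CLIP marks an individual: the model ranks the unmarked 'person' label above gender labels more often for Male individuals (26.7%) than for Female individuals (15.2%), and age shifts whether individuals are marked by gender or by age.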
Biases in CLIP mirror societal stereotypes and linguistic biases present in training data.
Abstract
We evaluate the state-of-the-art multimodal "visual semantic" model CLIP ("Contrastive Language Image Pretraining") for biases related to the marking of age, gender, and race or ethnicity. Given the option to label an image as "a photo of a person" or to select a label denoting race or ethnicity, CLIP chooses the "person" label 47.9% of the time for White individuals, compared with 5.0% or less for individuals who are Black, East Asian, Southeast Asian, Indian, or Latino or Hispanic. The model is more likely to rank the unmarked "person" label higher than labels denoting gender for Male individuals (26.7% of the time) vs. Female individuals (15.2% of the time). Age affects whether an individual is marked by the model: Female individuals under the age of 20 are more likely than Male individuals to be marked with a gender label, but less likely to be marked with an age label, while Female…
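The "person vs. marked label" rates in the abstract can be read as a zero-shot ranking: CLIP compares an image embedding to the text embedding of each candidate label by cosine similarity and picks the closest one. The sketch below computes, per group, how often the unmarked label "a photo of a person" ranks first. It is a minimal illustration under stated assumptions, not the authors' code: the function names (`top_label`, `unmarked_rate`) are hypothetical, the 2-D vectors are toy stand-ins for real CLIP image and text embeddings, and the paper's exact prompts and dataset are not reproduced.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_label(image_emb, label_embs):
    """Return the label whose text embedding is most similar to the image embedding."""
    return max(label_embs, key=lambda name: cosine(image_emb, label_embs[name]))


def unmarked_rate(images, label_embs, unmarked="a photo of a person"):
    """Per group, the fraction of images for which the unmarked label ranks first.

    `images` is a list of (group, image_embedding) pairs (hypothetical format).
    """
    counts = {}
    for group, emb in images:
        hits, total = counts.get(group, (0, 0))
        chosen = top_label(emb, label_embs)
        counts[group] = (hits + (chosen == unmarked), total + 1)
    return {g: hits / total for g, (hits, total) in counts.items()}


# Toy embeddings (assumed values, not real CLIP outputs).
label_embs = {
    "a photo of a person": [1.0, 0.0],
    "a photo of a White person": [0.6, 0.8],
    "a photo of a Black person": [0.0, 1.0],
}
images = [
    ("White", [0.9, 0.1]),
    ("White", [0.5, 0.9]),
    ("Black", [0.1, 1.0]),
    ("Black", [0.2, 0.9]),
]

print(unmarked_rate(images, label_embs))  # → {'White': 0.5, 'Black': 0.0}
```

Taking the argmax of cosine similarity gives the same top label as CLIP's softmax over scaled similarities, since the scaling and softmax preserve the ranking. The same helper covers the gender comparison by swapping the race labels for gender labels.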
Taxonomy
Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education
Methods: Contrastive Language-Image Pre-training
