TL;DR
This study investigates how CLIP, a popular vision-language model, perceives human faces socially. It reveals biases related to protected attributes and underscores the importance of controlled experiments for accurate bias assessment.
Contribution
The paper introduces a systematic, experimental approach to studying social perception biases in CLIP, using synthetic faces with controlled attribute variations.
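The following is a minimal sketch of what "controlled attribute variations" can mean in practice: a full factorial grid in which every combination of attribute levels appears exactly once, so no attribute is correlated with any other. It assumes the six dimensions named in the abstract (age, gender, race, expression, lighting, pose); the level labels and counts are hypothetical placeholders, not the paper's actual design.

```python
from collections import Counter
from itertools import product

# Hypothetical levels for the six dimensions; placeholders, not the paper's values.
ATTRIBUTES = {
    "age": ["age_1", "age_2", "age_3"],
    "gender": ["gender_A", "gender_B"],
    "race": ["race_A", "race_B", "race_C"],
    "expression": ["neutral", "smiling"],
    "lighting": ["front", "side"],
    "pose": ["frontal", "profile"],
}


def factorial_design(attributes):
    """Return every combination of attribute levels (a full factorial grid)."""
    names = list(attributes)
    return [dict(zip(names, levels)) for levels in product(*attributes.values())]


conditions = factorial_design(ATTRIBUTES)
print(len(conditions))  # 3 * 2 * 3 * 2 * 2 * 2 = 144 face conditions

# Each (age, gender) pair occurs equally often, so age and gender are
# uncorrelated by construction; the same holds for any pair of attributes.
pair_counts = Counter((c["age"], c["gender"]) for c in conditions)
print(set(pair_counts.values()))  # {24}
```

Because the grid is balanced, any difference in a downstream score across levels of one attribute cannot be explained by systematic co-variation with another attribute, which is the confound the abstract attributes to wild-collected data.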
Findings
CLIP can make human-like social judgments on faces.
Biases related to age, gender, and race are present in CLIP.
Facial expression influences social perception more than age or lighting.
Abstract
We explore social perception of human faces in CLIP, a widely used open-source vision-language model. To this end, we compare the similarity in CLIP embeddings between different textual prompts and a set of face images. Our textual prompts are constructed from well-validated social psychology terms denoting social perception. The face images are synthetic and are systematically and independently varied along six dimensions: the legally protected attributes of age, gender, and race, as well as facial expression, lighting, and pose. Independently and systematically manipulating face attributes allows us to study the effect of each on social perception and avoids confounds that can occur in wild-collected data due to uncontrolled systematic correlations between attributes. Thus, our findings are experimental rather than observational. We report three main findings. First, while CLIP is…
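The comparison described above can be sketched as a cosine similarity between a prompt's text embedding and each face's image embedding, averaged per level of one attribute. This is an illustrative sketch, not the authors' pipeline: it assumes embeddings have already been extracted from CLIP's text and image encoders, and the 3-dimensional vectors, the prompt, and the attribute labels below are hypothetical toy values, not data from the paper.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def marginal_means(images, prompt_embedding, attribute):
    """Average prompt-image similarity for each level of one attribute."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for img in images:
        level = img["attrs"][attribute]
        totals[level] += cosine_similarity(img["embedding"], prompt_embedding)
        counts[level] += 1
    return {level: totals[level] / counts[level] for level in totals}


def effect_range(means):
    """Spread between the highest and lowest level means (a crude effect size)."""
    return max(means.values()) - min(means.values())


# Toy vectors standing in for CLIP embeddings (real ones are far larger).
prompt = [1.0, 0.0, 0.0]  # hypothetical embedding of a social-perception prompt
images = [
    {"attrs": {"expression": "smiling", "lighting": "front"}, "embedding": [1.0, 0.0, 0.0]},
    {"attrs": {"expression": "smiling", "lighting": "side"}, "embedding": [1.0, 1.0, 0.0]},
    {"attrs": {"expression": "neutral", "lighting": "front"}, "embedding": [0.0, 1.0, 0.0]},
    {"attrs": {"expression": "neutral", "lighting": "side"}, "embedding": [1.0, 1.0, 0.0]},
]

for attr in ("expression", "lighting"):
    means = marginal_means(images, prompt, attr)
    print(attr, means, round(effect_range(means), 4))
```

With these toy inputs the expression means are about 0.854 (smiling) and 0.354 (neutral), a range of 0.5, while the lighting range is about 0.207. Averaging over a balanced grid is what lets a per-attribute difference like this be read as the effect of that attribute alone; the numbers themselves are illustrative only.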
Taxonomy
Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training
