From Bias to Balance: Detecting Facial Expression Recognition Biases in Large Multimodal Foundation Models

Kaylee Chhua; Zhoujinyi Wen; Vedant Hathalia; Kevin Zhu; Sean O'Brien

arXiv:2408.14842·cs.CV·August 28, 2024

Open Access

TL;DR

This paper evaluates racial biases in large multimodal foundation models for facial expression recognition, revealing significant disparities and emphasizing the need for fairer FER systems.

Contribution

It benchmarks leading LMFMs for racial bias in FER and provides insights into their performance disparities across demographics.

Findings

01

LMFMs show higher error rates for darker skin tones.

02

Anger is misclassified as Disgust 2.1 times more often in Black Females than in White Females.

03

A linear classifier trained on CLIP embeddings reaches accuracies of 95.9% on RADIATE, 90.3% on Tarr, and 99.5% on Chicago Face.
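As a rough illustration of what a linear classifier on frozen embeddings involves, here is a minimal sketch of a softmax linear probe. It is not the authors' implementation: the synthetic 4-dimensional vectors stand in for CLIP embeddings, and every name and hyperparameter (`train_linear_probe`, `make_toy_embeddings`, the learning rate, the epoch count) is an assumption made for this example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, biases, x):
    """Linear scores w_c . x + b_c for every class c."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def train_linear_probe(features, labels, num_classes, epochs=200, lr=0.1):
    """Fit a softmax linear classifier with full-batch gradient descent.

    The feature vectors are treated as fixed (frozen) embeddings; only the
    linear layer's weights and biases are learned.
    """
    dim = len(features[0])
    n = len(features)
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax(class_scores(weights, biases, x))
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. the score of class c.
                err = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for j in range(dim):
                    grad_w[c][j] += err * x[j]
        for c in range(num_classes):
            biases[c] -= lr * grad_b[c] / n
            for j in range(dim):
                weights[c][j] -= lr * grad_w[c][j] / n
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the highest-scoring class."""
    s = class_scores(weights, biases, x)
    return s.index(max(s))


def make_toy_embeddings(num_classes, dim, per_class, rng):
    """Synthetic stand-in for embeddings: one noisy cluster per class."""
    features, labels = [], []
    for c in range(num_classes):
        for _ in range(per_class):
            features.append([(3.0 if j == c else 0.0) + rng.gauss(0.0, 0.3)
                             for j in range(dim)])
            labels.append(c)
    return features, labels


rng = random.Random(0)
train_x, train_y = make_toy_embeddings(3, 4, 30, rng)
test_x, test_y = make_toy_embeddings(3, 4, 30, rng)
weights, biases = train_linear_probe(train_x, train_y, num_classes=3)
accuracy = sum(predict(weights, biases, x) == y
               for x, y in zip(test_x, test_y)) / len(test_y)
print(f"held-out accuracy on toy data: {accuracy:.3f}")
```

In practice the toy vectors would be replaced by image embeddings from a pretrained encoder, with one class per emotion label. The accuracy printed here reflects only the synthetic clusters and says nothing about the datasets named above.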

Abstract

This study addresses racial biases in facial expression recognition (FER) systems within Large Multimodal Foundation Models (LMFMs). Despite advances in deep learning and the availability of diverse datasets, FER systems often exhibit higher error rates for individuals with darker skin tones. Existing research predominantly focuses on traditional FER models (CNNs, RNNs, ViTs), leaving a gap in understanding racial biases in LMFMs. We benchmark four leading LMFMs (GPT-4o, PaliGemma, Gemini, and CLIP) to assess their performance in facial emotion detection across different racial demographics. A linear classifier trained on CLIP embeddings obtains accuracies of 95.9% for RADIATE, 90.3% for Tarr, and 99.5% for Chicago Face. Furthermore, we identify that Anger is misclassified as Disgust 2.1 times more often in Black Females than in White Females. This study highlights the need for…
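A disparity figure of the kind quoted above ("2.1 times more often") can be read as a ratio of two per-group confusion rates. The sketch below shows one way such a ratio might be computed. It is an assumption about the calculation, not the paper's evaluation code, and the records are invented toy data chosen so the arithmetic is easy to follow. The function names `confusion_rate` and `disparity_ratio` are hypothetical.

```python
def confusion_rate(records, group, true_label, predicted_label):
    """Fraction of `group` samples labeled `true_label` that the model
    predicted as `predicted_label`.

    Each record is a (group, true_label, predicted_label) tuple.
    """
    total = 0
    confused = 0
    for rec_group, rec_true, rec_pred in records:
        if rec_group == group and rec_true == true_label:
            total += 1
            if rec_pred == predicted_label:
                confused += 1
    if total == 0:
        raise ValueError(f"no {true_label!r} samples for group {group!r}")
    return confused / total


def disparity_ratio(records, group_a, group_b, true_label, predicted_label):
    """How many times more often group_a suffers the confusion than group_b."""
    rate_a = confusion_rate(records, group_a, true_label, predicted_label)
    rate_b = confusion_rate(records, group_b, true_label, predicted_label)
    if rate_b == 0:
        raise ZeroDivisionError("reference group has a zero confusion rate")
    return rate_a / rate_b


# Invented toy predictions (NOT results from the paper):
# 4 of 10 Anger images confused with Disgust in one group, 2 of 10 in the other.
records = (
    [("Black Female", "Anger", "Disgust")] * 4
    + [("Black Female", "Anger", "Anger")] * 6
    + [("White Female", "Anger", "Disgust")] * 2
    + [("White Female", "Anger", "Anger")] * 8
)

ratio = disparity_ratio(records, "Black Female", "White Female",
                        "Anger", "Disgust")
print(f"Anger->Disgust confusion ratio on toy data: {ratio:.1f}")
```

Using rates rather than raw counts keeps the comparison meaningful when the groups contribute different numbers of images.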

Taxonomy

Topics: Emotion and Mood Recognition

Methods: Contrastive Language-Image Pre-training