EmoCLIP: A Vision-Language Method for Zero-Shot Video Facial Expression Recognition

Niki Maria Foteinopoulou; Ioannis Patras

arXiv:2310.16640 · cs.CV · March 19, 2024 · 1 citation

TL;DR

EmoCLIP is a vision-language model trained with sample-level text descriptions for zero-shot video facial expression recognition. It outperforms CLIP by over 10% in weighted average recall, and its learned representations transfer to mental health symptom estimation.

Contribution

The paper presents a novel zero-shot FER approach using sample-level text supervision, enhancing latent representations and extending applications to mental health symptom estimation.

Findings

01 Outperforms CLIP by over 10% in weighted average recall

02 Achieves Pearson's r up to 0.85 in schizophrenia symptom estimation

03 Demonstrates strong agreement with human experts
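
The two metrics named in the findings can be sketched as follows. This is a minimal illustration using the standard definitions (weighted average recall as per-class recall weighted by class support, and Pearson's correlation coefficient), not the paper's evaluation code. The function names and the toy labels and scores are hypothetical.

```python
import math
from collections import Counter


def weighted_average_recall(y_true, y_pred):
    """Per-class recall, weighted by each class's share of the samples."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    return sum(
        (support[c] / total) * (correct[c] / support[c]) for c in support
    )


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Toy data (hypothetical): three "happy" clips and one "sad" clip.
y_true = ["happy", "happy", "happy", "sad"]
y_pred = ["happy", "happy", "sad", "sad"]
war = weighted_average_recall(y_true, y_pred)

# Toy symptom scores (hypothetical): predictions vs. expert ratings.
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Because each class's recall is weighted by its support, weighted average recall reduces to overall accuracy, so the majority class dominates the score on imbalanced datasets.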

Abstract

Facial Expression Recognition (FER) is a crucial task in affective computing, but its conventional focus on the seven basic emotions limits its applicability to the complex and expanding emotional spectrum. To address the issue of new and unseen emotions present in dynamic in-the-wild FER, we propose a novel vision-language model that utilises sample-level text descriptions (i.e. captions of the context, expressions or emotional cues) as natural language supervision, aiming to enhance the learning of rich latent representations, for zero-shot classification. To test this, we evaluate using zero-shot classification of the model trained on sample-level descriptions on four popular dynamic FER datasets. Our findings show that this approach yields significant improvements when compared to baseline methods. Specifically, for zero-shot video FER, we outperform CLIP by over 10% in terms of…
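
As a rough illustration of zero-shot classification in a CLIP-style joint embedding space, the sketch below picks the class whose text embedding is most similar to a video embedding. It assumes the embeddings have already been produced by some video and text encoders, which are not shown. The vectors, class names, and helper functions are hypothetical and are not taken from the EmoCLIP repository.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def zero_shot_classify(video_embedding, class_embeddings):
    """Return the best-matching class name and all similarity scores."""
    scores = {
        name: cosine_similarity(video_embedding, embedding)
        for name, embedding in class_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical text embeddings, one per class description.
class_embeddings = {
    "happiness": [0.9, 0.1, 0.0],
    "sadness": [0.0, 0.2, 0.9],
    "surprise": [0.1, 0.9, 0.1],
}

# Hypothetical embedding of a single video clip.
video_embedding = [0.8, 0.3, 0.1]

predicted, scores = zero_shot_classify(video_embedding, class_embeddings)
print(predicted)  # prints "happiness"
```

No classifier head is trained on the target classes here: adding an unseen emotion only requires embedding a new text description and adding it to `class_embeddings`.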

Code & Models

Repositories

nickyfot/emoclip (PyTorch, official)

Taxonomy

Topics: Emotion and Mood Recognition · Mental Health via Writing · Sentiment Analysis and Opinion Mining

Methods: Contrastive Language-Image Pre-training · Focus