Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge Transfer

Zengqun Zhao; Yu Cao; Shaogang Gong; Ioannis Patras

arXiv:2405.19100·cs.CV·November 27, 2024·1 cites

TL;DR

This paper introduces Exp-CLIP, a novel method that enhances zero-shot facial expression recognition by transferring task-specific knowledge from large language models to improve generalization on unseen data.

Contribution

The work proposes a new approach to improve zero-shot FER by integrating LLM-derived semantic knowledge into vision-language models through a specialized projection head.

Findings

01. Exp-CLIP outperforms CLIP and other LVLMs on seven FER datasets.

02. The method effectively transfers LLM knowledge to improve zero-shot recognition.

03. Results demonstrate superior generalization to unseen facial expression images.

Abstract

Current facial expression recognition (FER) models are often designed in a supervised learning manner and thus are constrained by the lack of large-scale facial expression images with high-quality annotations. Consequently, these models often fail to generalize well, performing poorly on unseen images in inference. Vision-language-based zero-shot models demonstrate a promising potential for addressing such challenges. However, these models lack task-specific knowledge and therefore are not optimized for the nuances of recognizing facial expressions. To bridge this gap, this work proposes a novel method, Exp-CLIP, to enhance zero-shot FER by transferring the task knowledge from large language models (LLMs). Specifically, based on the pre-trained vision-language encoders, we incorporate a projection head designed to map the initial joint vision-language space into a space that captures…
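The zero-shot setup the abstract describes can be sketched as follows: image and text embeddings from pre-trained encoders pass through a shared projection head, and the predicted expression is the class whose projected text embedding is most similar to the projected image embedding. This is a minimal illustration, not the authors' implementation. The embeddings are hand-written toy vectors, the projection is an identity matrix standing in for a learned head, and the names `project`, `l2_normalize`, and `zero_shot_predict` are hypothetical. How the head is actually trained with LLM knowledge is not covered by the truncated abstract and is not shown here.

```python
import math


def project(vec, weight):
    """Apply a linear projection head; `weight` is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]


def zero_shot_predict(image_emb, text_embs, weight):
    """Return (best_label, scores) from cosine similarity in the projected space."""
    img = l2_normalize(project(image_emb, weight))
    scores = {}
    for label, emb in text_embs.items():
        txt = l2_normalize(project(emb, weight))
        scores[label] = sum(a * b for a, b in zip(img, txt))
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Assumption: an identity matrix stands in for the learned projection head.
PROJECTION = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Assumption: toy 3-d vectors stand in for text-encoder outputs of class prompts.
text_embs = {
    "happy": [0.9, 0.1, 0.0],
    "sad": [0.1, 0.9, 0.0],
    "surprise": [0.0, 0.2, 0.9],
}

# Assumption: a toy vector stands in for the image-encoder output of one face.
image_emb = [0.8, 0.3, 0.1]

label, scores = zero_shot_predict(image_emb, text_embs, PROJECTION)
print(label)  # prints "happy"
```

Because the same projection is applied to both modalities, no labeled expression images are needed at inference: adding a class only requires a new text prompt and its embedding.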

Code & Models

Repositories

zengqunzhao/exp-clip (PyTorch, official)

Taxonomy

Topics: Medical Imaging and Analysis · Brain Tumor Detection and Classification · Face recognition and analysis

Methods: ALIGN · Contrastive Language-Image Pre-training