MER-CLIP: AU-Guided Vision-Language Alignment for Micro-Expression Recognition

Shifeng Liu; Xinglong Mao; Sirui Zhao; Peiming Li; Tong Xu; Enhong Chen

arXiv:2505.05937 · cs.HC · May 12, 2025

TL;DR

This paper introduces MER-CLIP, a novel micro-expression recognition approach that leverages CLIP's cross-modal alignment, AU-based textual descriptions, and data augmentation to improve accuracy and generalization in recognizing subtle facial movements.

Contribution

We propose MER-CLIP, integrating CLIP with AU-based textual descriptions and an emotion inference module, along with a new data augmentation strategy to enhance micro-expression recognition performance.
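The cross-modal alignment that CLIP contributes is usually trained with a symmetric contrastive (InfoNCE) objective. The pure-Python sketch below shows that generic objective under stated assumptions. It is not MER-CLIP's actual training loss. It assumes each micro-expression clip has already been encoded into a vector `video_embs[i]` by a visual encoder, and that its paired AU-based description has been encoded into `text_embs[i]` by a text encoder. The function names and the temperature of 0.07 (a common default) are illustrative choices.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits: Vector, target: int) -> float:
    """Negative log-softmax probability of `target`, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(video_embs: List[Vector], text_embs: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss (illustrative sketch, not MER-CLIP's exact objective).

    Pair i (clip i, description i) is the positive; every other pairing
    in the batch acts as a negative.
    """
    v = [l2_normalize(e) for e in video_embs]
    t = [l2_normalize(e) for e in text_embs]
    n = len(v)
    # logits[i][j]: cosine similarity of clip i and description j, divided by temperature
    logits = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
              for i in range(n)]
    # Video-to-text direction: each row should pick out its own description
    v2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-video direction: each column should pick out its own clip
    t2v = sum(cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (v2t + t2v) / 2


# Usage: matched pairs give a near-zero loss, swapped pairs a large one
aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(f"aligned={aligned:.4f}, swapped={swapped:.4f}")
```

Averaging both directions is what makes the objective symmetric. Each clip must retrieve its own description, and each description must retrieve its own clip. In a real model the toy vectors would be replaced by encoder outputs, and the loss would be minimized by gradient descent.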

Findings

1. Achieved UF1 scores of 0.7832, 0.6544, and 0.4997 on CAS(ME)³ for the 3-, 4-, and 7-class tasks, respectively.

2. Significantly outperformed previous methods on four benchmark datasets.

3. Demonstrated the effectiveness of AU-guided semantic alignment and data augmentation in improving MER accuracy.
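The findings report UF1 (unweighted F1), which gives every emotion class equal weight regardless of how many samples it has. This matters because micro-expression datasets are heavily imbalanced. The sketch below computes UF1 as the unweighted mean over classes of 2TP / (2TP + FP + FN). In the MER literature the counts are commonly pooled over all leave-one-subject-out folds before scoring, so this sketch assumes the pooled label lists are passed in. The function name is hypothetical, and scoring a class that is absent from both lists as 0.0 is one convention among several.

```python
from collections import Counter
from typing import Sequence


def unweighted_f1(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """UF1: per-class F1 = 2TP / (2TP + FP + FN), averaged with equal weight per class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # predicted class p wrongly
            fn[t] += 1      # missed the true class t
    scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Convention assumed here: a class absent from both lists scores 0.0
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / num_classes


# Usage on a toy 2-class example with one error
print(unweighted_f1([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```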

Abstract

As a critical psychological stress response, micro-expressions (MEs) are fleeting, subtle facial movements that reveal genuine emotions. Automatic ME recognition (MER) has valuable applications in fields such as criminal investigation and psychological diagnosis. The Facial Action Coding System (FACS) encodes expressions by identifying activations of specific facial action units (AUs), and serves as a key reference for ME analysis. However, current MER methods typically use AUs only to define regions of interest (ROIs) or rely on specific prior knowledge, which often results in limited performance and poor generalization. To address this, we integrate the CLIP model's powerful cross-modal semantic alignment capability into MER and propose a novel approach, MER-CLIP. Specifically, we convert AU labels into detailed textual descriptions of facial muscle movements, guiding…
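The first step the abstract describes is converting AU labels into textual descriptions of facial muscle movements. The sketch below shows roughly what that could look like. Several parts are hypothetical: the sentence template, the `AU_DESCRIPTIONS` table, and the label format such as `"AU4+L10"` (with an optional `L`/`R` laterality prefix). The table covers a small subset of standard FACS action units with one principal muscle each. The paper's actual descriptions are not shown in this excerpt.

```python
import re
from typing import Dict, List, Tuple

# Subset of standard FACS action units: (movement, principal muscle).
# The phrasing is a hypothetical template, not the paper's exact wording.
AU_DESCRIPTIONS: Dict[int, Tuple[str, str]] = {
    1: ("the inner brows are raised", "frontalis (pars medialis)"),
    2: ("the outer brows are raised", "frontalis (pars lateralis)"),
    4: ("the brows are lowered and drawn together", "corrugator supercilii"),
    6: ("the cheeks are raised", "orbicularis oculi (pars orbitalis)"),
    7: ("the eyelids are tightened", "orbicularis oculi (pars palpebralis)"),
    10: ("the upper lip is raised", "levator labii superioris"),
    12: ("the lip corners are pulled up", "zygomaticus major"),
    15: ("the lip corners are pulled down", "depressor anguli oris"),
}
SIDES = {"L": "left", "R": "right"}
# Assumed token format: optional "AU" prefix, optional L/R laterality, AU number
TOKEN = re.compile(r"\s*(?:AU)?([LR]?)(\d+)\s*")


def au_label_to_text(label: str) -> str:
    """Convert an AU label such as 'AU4+L10' into one descriptive sentence."""
    phrases: List[str] = []
    for token in label.split("+"):
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognised AU token: {token!r}")
        side, number = match.group(1), int(match.group(2))
        if number not in AU_DESCRIPTIONS:
            raise KeyError(f"no description for AU{number}")
        movement, muscle = AU_DESCRIPTIONS[number]
        phrase = f"{movement} by the {muscle}"
        if side:
            phrase += f" on the {SIDES[side]} side"
        phrases.append(phrase)
    # Join as "A", "A and B", or "A, B and C"
    body = phrases[0] if len(phrases) == 1 else ", ".join(phrases[:-1]) + " and " + phrases[-1]
    return f"In this micro-expression, {body}."


print(au_label_to_text("AU4+L10"))
```

Sentences like these would then be passed to a text encoder. Describing the muscle movements themselves gives the text side richer semantics than a bare label such as "AU4" could.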

Taxonomy

Topics: Emotion and Mood Recognition · Face Recognition and Perception · Face recognition and analysis