Micro-AU CLIP: Fine-Grained Contrastive Learning from Local Independence to Global Dependency for Micro-Expression Action Unit Detection

Jinsheng Wei; Fengzhou Guo; Yante Li; Haoyu Chen; Guanming Lu; Guoying Zhao

arXiv:2603.16302 · cs.CV · March 18, 2026

Open Access

TL;DR

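Micro-AU CLIP is a micro-expression action unit (Micro-AU) detection framework that models the local independence of each facial action unit and then the global dependency among them. It is reported to achieve state-of-the-art Micro-AU detection results and to improve micro-expression recognition.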

Contribution

The paper proposes a micro-AU detection framework that combines local semantic independence (LSI) modeling with global semantic dependency (GSD) modeling, together with a contrastive loss tailored to fine-grained feature learning.

Findings

01 Achieves state-of-the-art micro-AU detection performance

02 Effectively models local independence and global dependency of AUs

03 Enhances micro-expression recognition accuracy

Abstract

Micro-expression (ME) action units (Micro-AUs) provide objective clues for fine-grained analysis of genuine emotion. Most existing Micro-AU detection methods learn AU features from the whole facial image or video, which conflicts with the inherent locality of AUs and results in insufficient perception of AU regions. In fact, each AU independently corresponds to specific localized facial muscle movements (local independence), while some AUs are inherently dependent on one another under specific emotional states (global dependency). This paper therefore explores the effectiveness of the independence-to-dependency pattern and proposes a novel micro-AU detection framework, micro-AU CLIP, that decomposes the AU detection process into local semantic independence (LSI) modeling and global semantic dependency (GSD) modeling. In LSI, Patch Token Attention (PTA) is designed, mapping several local…
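The abstract is truncated before it describes the training objective, so the paper's own loss is not shown here. As background for the "CLIP" and "contrastive" parts of the title, the following is a minimal sketch of the generic CLIP-style symmetric contrastive loss (InfoNCE) over matched visual and text feature pairs. It is not the paper's fine-grained loss: the function names, the temperature value of 0.07, and the toy two-dimensional features are all assumptions chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length (assumes vec is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    peak = max(row)
    lse = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - lse for x in row]


def clip_style_loss(visual, text, temperature=0.07):
    """Generic symmetric InfoNCE: visual[i] and text[i] form the positive pair.

    Hypothetical helper for illustration only; not the loss from the paper.
    """
    v = [l2_normalize(x) for x in visual]
    t = [l2_normalize(x) for x in text]
    n = len(v)
    # Cosine similarity between every visual/text pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Visual-to-text direction: each row should peak on its diagonal entry.
    loss_v2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-visual direction: same computation on the transposed matrix.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2v = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_v2t + loss_t2v)


# Toy features: two orthogonal "AU" directions (made-up values).
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = clip_style_loss(aligned, aligned)    # matched pairs agree
loss_shuffled = clip_style_loss(aligned, shuffled)  # matched pairs disagree
print(loss_aligned < loss_shuffled)
```

In this toy setup the loss is close to zero when each visual feature matches its paired text feature and large when the pairs are swapped, which is the behaviour a contrastive objective is meant to reward.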

Taxonomy

Topics: Emotion and Mood Recognition · Face and Expression Recognition · Human Pose and Action Recognition