AU-LLM: Micro-Expression Action Unit Detection via Enhanced LLM-Based Feature Fusion
Zhishu Liu, Kaishen Yuan, Bo Zhao, Yong Xu, Zitong Yu

TL;DR
This paper introduces AU-LLM, a novel framework that leverages Large Language Models with an enhanced feature fusion technique to detect subtle micro-expression Action Units, achieving state-of-the-art results on benchmark datasets.
Contribution
It pioneers the use of LLMs for micro-expression AU detection and proposes the Enhanced Fusion Projector for effective vision-language feature integration.
Findings
Achieves state-of-the-art performance on CASME II and SAMM datasets.
Demonstrates robustness across leave-one-subject-out (LOSO) and cross-domain protocols.
Validates the effectiveness of LLM-based reasoning in subtle facial expression analysis.
Abstract
The detection of micro-expression Action Units (AUs) is a formidable challenge in affective computing, pivotal for decoding subtle, involuntary human emotions. While Large Language Models (LLMs) demonstrate profound reasoning abilities, their application to the fine-grained, low-intensity domain of micro-expression AU detection remains unexplored. This paper pioneers this direction by introducing AU-LLM, a novel framework that for the first time uses an LLM to detect AUs in micro-expression datasets, where intensities are subtle and data is scarce. We specifically address the critical vision-language semantic gap with the Enhanced Fusion Projector (EFP). The EFP employs a Multi-Layer Perceptron (MLP) to intelligently fuse mid-level (local texture) and high-level (global semantics) visual features from a specialized 3D-CNN backbone into a single, information-dense token. This…
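The fusion step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract states only that an MLP fuses mid-level and high-level 3D-CNN features into a single token. The feature-map shapes, the global average pooling, the concatenation, the two-layer ReLU MLP, the random weights, and every function name below are assumptions chosen for illustration.

```python
import numpy as np


def global_avg_pool(feat):
    """Collapse a (C, T, H, W) feature map to a (C,) vector (assumed pooling)."""
    return feat.mean(axis=(1, 2, 3))


def init_mlp(in_dim, hidden_dim, out_dim, rng):
    """Randomly initialised two-layer MLP; a trained model would learn these."""
    return {
        "w1": rng.standard_normal((in_dim, hidden_dim)) * 0.02,
        "b1": np.zeros(hidden_dim),
        "w2": rng.standard_normal((hidden_dim, out_dim)) * 0.02,
        "b2": np.zeros(out_dim),
    }


def fuse_to_token(mid_feat, high_feat, params):
    """Fuse mid- and high-level features into one token vector."""
    # Pool each feature map to a vector, then concatenate (assumed fusion input).
    x = np.concatenate([global_avg_pool(mid_feat), global_avg_pool(high_feat)])
    # Hidden layer with ReLU, then a linear projection to the token width.
    h = np.maximum(x @ params["w1"] + params["b1"], 0.0)
    return h @ params["w2"] + params["b2"]


rng = np.random.default_rng(0)
# Hypothetical shapes: (channels, frames, height, width).
mid = rng.standard_normal((64, 4, 14, 14))    # mid-level: local texture
high = rng.standard_normal((256, 2, 7, 7))    # high-level: global semantics
params = init_mlp(in_dim=64 + 256, hidden_dim=512, out_dim=1024, rng=rng)

# One vector whose width (1024 here, hypothetical) would match the LLM's
# embedding size so it can be placed in the input sequence as a single token.
token = fuse_to_token(mid, high, params)
print(token.shape)
```

In this sketch the LLM would receive the fused vector as one extra input embedding alongside its text prompt; how AU-LLM actually injects the token is not stated in the truncated abstract.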
