Hierarchical Space-Time Attention for Micro-Expression Recognition
Haihong Hao, Shuo Wang, Huixia Ben, Yanbin Hao, Yansong Wang, Weiwei Wang

TL;DR
This paper introduces Hierarchical Space-Time Attention (HSTA), a novel model for micro-expression recognition that captures subtle facial movements by modeling space-time relationships and fusing crossmodal data, improving recognition accuracy over prior methods.
Contribution
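The paper proposes a hierarchical attention framework that combines Unimodal and Crossmodal Space-Time Attention to better model facial cues in micro-expression videos. It addresses a limitation of previous methods, which mostly use only special frames (or optical flow extracted from them) and neglect the relationship between facial movements and space-time.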
Findings
Achieves an improvement of about 3% on the CASME3 dataset for seven-category classification.
Effectively models temporal and spatial facial cues.
Outperforms recent methods on four benchmark datasets.
Abstract
Micro-expression recognition (MER) aims to recognize the brief, subtle facial movements in micro-expression (ME) video clips, which reveal real emotions. Most recent MER methods use only special frames from ME video clips, or optical flow extracted from these special frames. However, they neglect the relationship between facial movements and space-time, even though facial cues are hidden within this relationship. To address this issue, we propose Hierarchical Space-Time Attention (HSTA). Specifically, we first process ME video frames and the special frames or data in parallel with our cascaded Unimodal Space-Time Attention (USTA), which establishes connections between subtle facial movements and specific facial areas. Then, we design Crossmodal Space-Time Attention (CSTA) to achieve higher-quality fusion of crossmodal data. Finally, we hierarchically integrate USTA and CSTA to grasp the deeper…
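The abstract describes the architecture only at a high level. As a rough illustration, the NumPy sketch below shows one way "space-time attention" over a grid of frame-by-patch tokens and a crossmodal fusion step could look. Everything in it is a hypothetical assumption rather than the authors' design: the class `SpaceTimeAttention`, the function `hsta_sketch`, single-head joint attention over all T×N tokens, residual connections, the per-level "USTA on both streams, then CSTA" ordering used to stand in for the hierarchy, mean pooling, and the random weights.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class SpaceTimeAttention:
    """Hypothetical single-head stand-in for USTA (self) or CSTA (cross).

    Inputs are token grids of shape (T, N, d): T frames, N spatial patches
    per frame, d channels. All T*N tokens attend jointly, so a token from one
    facial area in one frame can relate to any area in any frame.
    """

    def __init__(self, d, rng):
        s = 1.0 / np.sqrt(d)
        self.wq, self.wk, self.wv = (rng.normal(0.0, s, (d, d)) for _ in range(3))

    def __call__(self, x, context=None):
        T, N, d = x.shape
        tokens = x.reshape(T * N, d)
        # Unimodal (self) attention when no context is given; otherwise queries
        # come from x and keys/values from the other stream (crossmodal).
        src = tokens if context is None else context.reshape(-1, d)
        q, k, v = tokens @ self.wq, src @ self.wk, src @ self.wv
        weights = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (T*N, len(src))
        return x + (weights @ v).reshape(T, N, d)  # residual connection


def hsta_sketch(frames, special, levels, head):
    """Hypothetical hierarchy: at each level, apply USTA to both streams in
    parallel, then fuse the special-frame stream into the video stream (CSTA)."""
    for usta_video, usta_special, csta in levels:
        frames = usta_video(frames)
        special = usta_special(special)
        frames = csta(frames, context=special)
    pooled = frames.mean(axis=(0, 1))  # (d,) clip-level feature
    return pooled @ head  # class logits


rng = np.random.default_rng(0)
d, num_classes = 32, 7  # 7 classes, as in the CASME3 seven-category setting
levels = [tuple(SpaceTimeAttention(d, rng) for _ in range(3)) for _ in range(2)]
head = rng.normal(0.0, 0.1, (d, num_classes))
frames = rng.normal(size=(8, 16, d))   # 8 video frames, 16 patches each
special = rng.normal(size=(2, 16, d))  # e.g. 2 special frames or flow maps
logits = hsta_sketch(frames, special, levels, head)
print(logits.shape)  # (7,)
```

Joint attention over all T×N tokens costs O((TN)²) in time and memory; factorized schemes that attend over time and space separately are a common cheaper alternative, but this sketch does not claim that HSTA uses either one.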
Taxonomy
Topics: Human Pose and Action Recognition · Neural Networks and Applications · Speech Recognition and Synthesis
