TL;DR
SpotFormer is a multi-scale spatio-temporal Transformer framework that combines a compact optical flow feature with contrastive learning to improve facial expression spotting in videos, particularly for micro-expressions.
Contribution
The paper presents a new multi-scale Transformer architecture with a compact optical flow feature and contrastive learning for enhanced micro-expression detection.
Findings
Outperforms state-of-the-art models on SAMM-LV, CAS(ME)^2, and CAS(ME)^3 datasets.
Detects subtle micro-expressions effectively using tailored optical flow features.
Demonstrates the effectiveness of multi-scale spatio-temporal encoding and contrastive learning.
Abstract
Facial expression spotting, identifying periods where facial expressions occur in a video, is a significant yet challenging task in facial expression analysis. The issues of irrelevant facial movements and the challenge of detecting subtle motions in micro-expressions remain unresolved, hindering accurate expression spotting. In this paper, we propose an efficient framework for facial expression spotting. First, we propose a Compact Sliding-Window-based Multi-temporal-Resolution Optical flow (CSW-MRO) feature, which calculates multi-temporal-resolution optical flow of the input image sequence within compact sliding windows. The window length is tailored to perceive complete micro-expressions and distinguish between general macro- and micro-expressions. CSW-MRO can effectively reveal subtle motions while avoiding the optical flow being dominated by head movements. Second, we propose…
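To make the idea of computing optical flow at several temporal resolutions inside compact sliding windows more concrete, here is a minimal sketch that only enumerates which frame pairs each window would compare. It is an illustration under stated assumptions, not the paper's implementation: the function name `window_flow_pairs`, the choice of the window's first frame as the reference, the frame gaps `(1, 2, 4)`, and the window stride are all hypothetical, since the abstract does not specify them.

```python
from typing import List, Tuple


def window_flow_pairs(
    num_frames: int,
    window_len: int,
    resolutions: Tuple[int, ...] = (1, 2, 4),  # assumed frame gaps, not from the paper
    stride: int = 1,                           # assumed window stride
) -> List[List[Tuple[int, int]]]:
    """List, per sliding window, the (reference, target) frame index pairs
    for which optical flow would be computed.

    Assumption: the reference is the first frame of the window, and each
    temporal resolution is a frame gap that must fit inside the window.
    """
    if window_len < 2 or window_len > num_frames:
        raise ValueError("window_len must be in [2, num_frames]")

    windows = []
    for start in range(0, num_frames - window_len + 1, stride):
        # Keep only the gaps whose target frame still lies inside the window.
        pairs = [(start, start + gap) for gap in resolutions if gap < window_len]
        windows.append(pairs)
    return windows


if __name__ == "__main__":
    for pairs in window_flow_pairs(num_frames=6, window_len=4):
        print(pairs)
```

Each index pair would then be handed to a dense optical flow estimator of your choice. Keeping the window short is what limits how much head movement can accumulate between the two frames of a pair, which is the motivation the abstract gives for compact windows.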
