PESFormer: Boosting Macro- and Micro-expression Spotting with Direct Timestamp Encoding
Wang-Wang Yu, Kai-Fu Yang, Xiangrui Hu, Jingwen Jiang, Hong-Mei Yan, Yong-Jie Li

TL;DR
PESFormer introduces a vision transformer-based approach with direct timestamp encoding for more accurate macro- and micro-expression spotting, effectively utilizing training data without sliding window slicing.
Contribution
The paper proposes PESFormer, a novel model that replaces anchor-based encoding with direct timestamp encoding, improving expression localization in untrimmed videos.
Findings
Outperforms existing methods on three benchmark datasets.
Preserves training intervals by zero-padding videos instead of slicing them into sliding windows.
Achieves state-of-the-art performance in expression spotting.
Abstract
The task of macro- and micro-expression spotting aims to precisely localize and categorize temporal expression instances within untrimmed videos. Given the sparse distribution and varying durations of expressions, existing anchor-based methods often represent instances by encoding their deviations from predefined anchors. Additionally, these methods typically slice the untrimmed videos into fixed-length sliding windows. However, anchor-based encoding often fails to capture all training intervals, and slicing the original video as sliding windows can result in valuable training intervals being discarded. To overcome these limitations, we introduce PESFormer, a simple yet effective model based on the vision transformer architecture to achieve point-to-interval expression spotting. PESFormer employs a direct timestamp encoding (DTE) approach to replace anchors, enabling binary…
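The contrast drawn in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated here, so reading direct timestamp encoding as "one binary label per timestamp" is an assumption, as are the interval format `(onset, offset)`, the names `timestamp_labels` and `intervals_inside_windows`, and the `padded_len`, `window`, and `stride` parameters. The sliding-window rule (keep an interval only if it fits entirely inside some window) is also a simplification chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical interval format: (onset_frame, offset_frame), both inclusive.
Interval = Tuple[int, int]


def timestamp_labels(num_frames: int, intervals: List[Interval],
                     padded_len: int) -> List[int]:
    """Assumed reading of direct timestamp encoding: one binary label per
    timestamp of a video zero-padded to padded_len frames."""
    if padded_len < num_frames:
        raise ValueError("padded_len must be >= num_frames")
    labels = [0] * padded_len  # padded positions stay 0 (background)
    for onset, offset in intervals:
        # Clamp each interval to the real (unpadded) frames.
        for t in range(max(onset, 0), min(offset, num_frames - 1) + 1):
            labels[t] = 1
    return labels


def intervals_inside_windows(num_frames: int, intervals: List[Interval],
                             window: int, stride: int) -> List[Interval]:
    """Simplified sliding-window slicing: keep an interval only if it lies
    entirely inside at least one fixed-length window."""
    starts = range(0, max(num_frames - window, 0) + 1, stride)
    return [
        (onset, offset)
        for onset, offset in intervals
        if any(s <= onset and offset < s + window for s in starts)
    ]


if __name__ == "__main__":
    video_len = 10
    gt = [(1, 2), (6, 9)]
    # Zero-padding keeps every ground-truth frame labeled.
    print(timestamp_labels(video_len, gt, padded_len=12))
    # Slicing into windows of 4 frames drops the interval that straddles a boundary.
    print(intervals_inside_windows(video_len, gt, window=4, stride=4))
```

In this toy case the interval `(6, 9)` crosses the boundary of the last window and is lost under slicing, while the zero-padded labeling still marks all of its frames.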
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Neural Networks and Reservoir Computing
Methods: Linear Layer · Layer Normalization · Residual Connection · Attention Is All You Need · Dense Connections · Softmax · Multi-Head Attention · Vision Transformer
