PESFormer: Boosting Macro- and Micro-expression Spotting with Direct Timestamp Encoding

Wang-Wang Yu; Kai-Fu Yang; Xiangrui Hu; Jingwen Jiang; Hong-Mei Yan; Yong-Jie Li

arXiv:2410.18695 · cs.CV · October 25, 2024

TL;DR

PESFormer introduces a vision transformer-based approach with direct timestamp encoding for more accurate macro- and micro-expression spotting, effectively utilizing training data without sliding window slicing.

Contribution

The paper proposes PESFormer, a novel model that replaces anchor-based encoding with direct timestamp encoding, improving expression localization in untrimmed videos.

Findings

1. Outperforms existing methods on three benchmark datasets.

2. Preserves training intervals by zero-padding untrimmed videos rather than slicing them into sliding windows.

3. Achieves state-of-the-art performance in expression spotting.

Abstract

The task of macro- and micro-expression spotting aims to precisely localize and categorize temporal expression instances within untrimmed videos. Given the sparse distribution and varying durations of expressions, existing anchor-based methods often represent instances by encoding their deviations from predefined anchors. Additionally, these methods typically slice the untrimmed videos into fixed-length sliding windows. However, anchor-based encoding often fails to capture all training intervals, and slicing the original video as sliding windows can result in valuable training intervals being discarded. To overcome these limitations, we introduce PESFormer, a simple yet effective model based on the vision transformer architecture to achieve point-to-interval expression spotting. PESFormer employs a direct timestamp encoding (DTE) approach to replace anchors, enabling binary…
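
The contrast the abstract draws between sliding-window slicing and whole-video processing can be illustrated with a small sketch. It is not the authors' implementation: the interval format, the function names, and the rule that a window keeps only intervals lying fully inside it are all assumptions made for illustration, and because the abstract is truncated, the per-timestamp 0/1 labels are only one plausible reading of direct timestamp encoding.

```python
from typing import List, Tuple

# Hypothetical interval format: (onset_frame, offset_frame), both inclusive.
Interval = Tuple[int, int]


def timestamp_labels(intervals: List[Interval], num_frames: int,
                     padded_len: int) -> List[int]:
    """Assumed encoding: one 0/1 label per timestamp of a zero-padded video.

    Frames beyond num_frames are padding and always get label 0.
    """
    if padded_len < num_frames:
        raise ValueError("padded_len must be at least num_frames")
    labels = [0] * padded_len
    for onset, offset in intervals:
        # Clamp to the real (unpadded) part of the video.
        for t in range(max(onset, 0), min(offset, num_frames - 1) + 1):
            labels[t] = 1
    return labels


def intervals_kept_by_windows(intervals: List[Interval], num_frames: int,
                              window: int, stride: int) -> List[Interval]:
    """Assumed slicing policy: an interval survives only if it lies
    entirely inside at least one fixed-length sliding window."""
    starts = range(0, max(num_frames - window, 0) + 1, stride)
    return [
        (onset, offset)
        for onset, offset in intervals
        if any(s <= onset and offset < s + window for s in starts)
    ]


# Toy example: a 20-frame video with three annotated expressions.
intervals = [(2, 4), (7, 12), (15, 18)]
labels = timestamp_labels(intervals, num_frames=20, padded_len=24)
kept = intervals_kept_by_windows(intervals, num_frames=20, window=8, stride=8)
```

In this toy setup the padded sequence carries labels for all three intervals, while non-overlapping 8-frame windows retain only the first one, since the other two cross or fall past a window boundary. Real pipelines may truncate such intervals or use overlapping windows instead of dropping them, so the numbers here only illustrate the failure mode the abstract describes.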

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Neural Networks and Reservoir Computing

Methods: Linear Layer · Layer Normalization · Residual Connection · Attention Is All You Need · Dense Connections · Softmax · Multi-Head Attention · Vision Transformer