SpotFormer: Multi-Scale Spatio-Temporal Transformer for Facial Expression Spotting

Yicheng Deng; Hideaki Hayashi; Hajime Nagahara

arXiv:2407.20799·cs.CV·April 14, 2026

TL;DR

SpotFormer is a multi-scale spatio-temporal Transformer framework that combines a novel optical flow feature with contrastive learning to improve facial expression spotting in videos, particularly for micro-expressions.

Contribution

The paper presents a new multi-scale Transformer architecture with a compact optical flow feature and contrastive learning for enhanced micro-expression detection.

Findings

01

Outperforms state-of-the-art models on SAMM-LV, CAS(ME)^2, and CAS(ME)^3 datasets.

02

Detects subtle micro-expressions using tailored optical flow features.

03

Demonstrates the effectiveness of multi-scale spatio-temporal encoding and contrastive learning.

Abstract

Facial expression spotting, identifying periods where facial expressions occur in a video, is a significant yet challenging task in facial expression analysis. The issues of irrelevant facial movements and the challenge of detecting subtle motions in micro-expressions remain unresolved, hindering accurate expression spotting. In this paper, we propose an efficient framework for facial expression spotting. First, we propose a Compact Sliding-Window-based Multi-temporal-Resolution Optical flow (CSW-MRO) feature, which calculates multi-temporal-resolution optical flow of the input image sequence within compact sliding windows. The window length is tailored to perceive complete micro-expressions and distinguish between general macro- and micro-expressions. CSW-MRO can effectively reveal subtle motions while avoiding the optical flow being dominated by head movements. Second, we propose…
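To make the task itself concrete, the sketch below shows one generic way to turn per-frame expression scores into spotted periods and to separate micro- from macro-expressions by duration. It is not the SpotFormer method: the function names, the 0.5 score threshold, and the 0.5-second micro-expression cutoff are assumptions chosen for illustration, and the scores would come from whatever model is being evaluated.

```python
from typing import List, Sequence, Tuple


def spot_intervals(scores: Sequence[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Group consecutive frames whose score reaches the threshold.

    Returns inclusive (onset_frame, offset_frame) pairs.
    The threshold value is a hypothetical choice, not taken from the paper.
    """
    intervals = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # an expression period begins
        elif score < threshold and start is not None:
            intervals.append((start, i - 1))  # the period ended on the previous frame
            start = None
    if start is not None:
        # the sequence ended while an expression was still active
        intervals.append((start, len(scores) - 1))
    return intervals


def label_interval(interval: Tuple[int, int], fps: float, micro_max_seconds: float = 0.5) -> str:
    """Label an interval as 'micro' or 'macro' by its duration.

    The 0.5 s cutoff is an assumed convention; adjust it per dataset protocol.
    """
    onset, offset = interval
    duration = (offset - onset + 1) / fps
    return "micro" if duration <= micro_max_seconds else "macro"


if __name__ == "__main__":
    frame_scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.7, 0.7, 0.1]
    for interval in spot_intervals(frame_scores):
        print(interval, label_interval(interval, fps=30))
```

In practice the grouping step is usually followed by smoothing or merging of nearby intervals, which this sketch leaves out for brevity.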

Code & Models

Repositories

KinopioIsAllIn/SpotFormer (GitHub)
