Static for Dynamic: Towards a Deeper Understanding of Dynamic Facial Expressions Using Static Expression Data

Yin Chen; Jia Li; Yu Zhang; Zhenzhen Hu; Shiguang Shan; Meng Wang; Richang Hong

arXiv:2409.06154·cs.CV·October 31, 2025

TL;DR

This paper introduces S4D, a dual-modal framework that uses static facial expression data to improve dynamic facial expression recognition. It reports state-of-the-art results and offers a deeper understanding of the correlation between static and dynamic expressions.

Contribution

The paper proposes the S4D framework, which combines a shared ViT encoder with a MoAE module to integrate static and dynamic data for facial expression recognition.

Findings

01. S4D outperforms existing methods on multiple benchmarks.

02. The MoAE module improves task-specific knowledge sharing.

03. Systematic analysis reveals a strong correlation between static and dynamic expressions.

Abstract

Dynamic facial expression recognition (DFER) infers emotions from the temporal evolution of expressions, unlike static facial expression recognition (SFER), which relies on a single snapshot. This temporal analysis provides richer information and promises greater recognition capability. However, current DFER methods often perform unsatisfactorily, largely because far fewer training samples are available for DFER than for SFER. Given the inherent correlation between static and dynamic expressions, we hypothesize that leveraging the abundant SFER data can enhance DFER. To this end, we propose Static-for-Dynamic (S4D), a unified dual-modal learning framework that integrates SFER data as a complementary resource for DFER. Specifically, S4D employs dual-modal self-supervised pre-training on facial images and videos using a shared Vision Transformer (ViT) encoder-decoder architecture, yielding improved…
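
One way to see why a single ViT encoder can serve both modalities is that images and videos differ only in how they are cut into tokens. The sketch below is illustrative and is not taken from the S4D code: the patch size, tubelet size, and masking ratios are assumptions borrowed from common masked-autoencoder conventions, and `PatchConfig`, `num_tokens`, and `visible_indices` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchConfig:
    patch_size: int = 16   # assumed spatial patch edge in pixels
    tubelet_size: int = 2  # assumed number of frames folded into one video token


def num_tokens(frames: int, height: int, width: int, cfg: PatchConfig) -> int:
    """Token count for one sample; a still image is passed with frames=1."""
    if height % cfg.patch_size or width % cfg.patch_size:
        raise ValueError("height and width must be multiples of patch_size")
    spatial = (height // cfg.patch_size) * (width // cfg.patch_size)
    if frames == 1:
        temporal = 1  # an image occupies a single temporal slot
    else:
        if frames % cfg.tubelet_size:
            raise ValueError("frames must be a multiple of tubelet_size")
        temporal = frames // cfg.tubelet_size
    return temporal * spatial


def visible_indices(n_tokens: int, mask_ratio: float, seed: int) -> list[int]:
    """Choose the tokens left unmasked for masked-autoencoder-style pre-training."""
    n_keep = n_tokens - int(n_tokens * mask_ratio)
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_tokens), n_keep))


cfg = PatchConfig()
image_tokens = num_tokens(1, 224, 224, cfg)    # static expression image
video_tokens = num_tokens(16, 224, 224, cfg)   # 16-frame expression clip

# Both token sets could be fed to the same encoder; only the sequence length differs.
image_visible = visible_indices(image_tokens, mask_ratio=0.75, seed=0)  # assumed ratio
video_visible = visible_indices(video_tokens, mask_ratio=0.90, seed=0)  # assumed ratio
print(image_tokens, len(image_visible))
print(video_tokens, len(video_visible))
```

Under these assumed settings the encoder would see sequences of different lengths from the two modalities while sharing all of its weights, which is the property a shared encoder-decoder design relies on.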

Code & Models

Repositories

msa-lmc/s4d (Official)

Models

cyinen/S4D (Hugging Face model)

Taxonomy

Topics: Emotion and Mood Recognition · Face recognition and analysis

Methods: Attention Is All You Need · Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · Vision Transformer · Multi-Head Attention