Soften the Mask: Adaptive Temporal Soft Mask for Efficient Dynamic Facial Expression Recognition
Mengzhu Li, Quanxing Zha, Hongjun Wu

TL;DR
This paper introduces AdaTosk, a novel adaptive temporal soft masking approach for dynamic facial expression recognition (DFER) that reduces computational cost while maintaining high accuracy.
Contribution
The paper proposes a new supervised temporal soft masked autoencoder network, AdaTosk, which filters out irrelevant information and enhances critical expression moments in DFER.
Findings
Reduces computational costs compared to state-of-the-art methods.
Maintains competitive performance on benchmark datasets.
Enhances critical expression moments through adaptive masking.
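The idea behind adaptive soft masking can be illustrated with a minimal sketch. It assumes that each frame token already has a scalar temporal-significance score. In AdaTosk these scores come from learned components that this summary does not detail, so here they are supplied by hand. A sigmoid turns each score into a weight in (0, 1), and each frame's features are scaled by that weight. Low-significance frames are attenuated rather than discarded outright. The function names, the sigmoid choice, and the `temperature` parameter are hypothetical illustrations, not the paper's formulation.

```python
import math
from typing import List


def temporal_soft_mask(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Map per-frame significance scores to soft mask weights in (0, 1).

    Hypothetical sketch: a sigmoid turns each frame's score into a continuous
    weight, so low-significance frames are attenuated rather than dropped.
    """
    return [1.0 / (1.0 + math.exp(-s / temperature)) for s in scores]


def apply_soft_mask(tokens: List[List[float]], weights: List[float]) -> List[List[float]]:
    """Scale every feature of frame t by its soft weight w_t."""
    if len(tokens) != len(weights):
        raise ValueError("need one weight per frame")
    return [[w * x for x in frame] for frame, w in zip(tokens, weights)]


if __name__ == "__main__":
    # Four frames with 2-dim features; frame 2 is assumed to be the expression peak.
    frames = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
    scores = [-2.0, 0.0, 3.0, -1.0]  # hand-picked, hypothetical significance scores
    weights = temporal_soft_mask(scores)
    masked = apply_soft_mask(frames, weights)
    for t, (w, f) in enumerate(zip(weights, masked)):
        print(t, round(w, 3), [round(x, 3) for x in f])
```

Because the weights are continuous rather than 0/1, a frame judged unimportant still contributes a little signal. This is the practical difference between a soft mask and a hard one.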
Abstract
Dynamic Facial Expression Recognition (DFER) facilitates the understanding of psychological intentions through non-verbal communication. Existing methods struggle to handle irrelevant information, such as background noise and redundant semantics, which hurts both efficiency and effectiveness. In this work, we propose AdaTosk, a novel supervised temporal soft masked autoencoder network for DFER that integrates a parallel supervised classification branch with a self-supervised reconstruction branch. The self-supervised reconstruction branch applies a random binary hard mask to generate diverse training samples, encouraging meaningful feature representations in the visible tokens. Meanwhile, the classification branch employs an adaptive temporal soft mask to flexibly mask visible tokens according to their temporal significance. Its two key components, class-agnostic and…
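The random binary hard mask used by the reconstruction branch can be sketched in the usual masked-autoencoder style. A random permutation of the token indices is split into a masked set, which is to be reconstructed, and a visible set. Each call draws a new permutation, so the same clip yields different visible subsets across iterations. The 75% mask ratio and the helper names below are hypothetical. The abstract does not state the ratio AdaTosk uses.

```python
import random
from typing import List, Optional, Tuple


def random_hard_mask(num_tokens: int, mask_ratio: float,
                     seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Split token indices into (visible, masked) with a random binary mask.

    Each unseeded call draws a fresh permutation, so repeated calls on the same
    clip yield different visible subsets (diverse training samples).
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    order = list(range(num_tokens))
    rng.shuffle(order)
    num_masked = int(num_tokens * mask_ratio)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def binary_mask(num_tokens: int, masked: List[int]) -> List[int]:
    """Return the 0/1 mask vector: 1 = masked (to reconstruct), 0 = visible."""
    m = [0] * num_tokens
    for i in masked:
        m[i] = 1
    return m


if __name__ == "__main__":
    # 16 spatiotemporal tokens with a hypothetical 75% mask ratio.
    visible, masked = random_hard_mask(16, 0.75, seed=0)
    print("visible:", visible)
    print("mask:   ", binary_mask(16, masked))
```

Unlike the soft mask in the classification branch, this mask is strictly 0/1. A masked token is removed from the input entirely and must be predicted from the visible ones.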
