SVFAP: Self-supervised Video Facial Affect Perceiver

Licai Sun, Zheng Lian, Kexin Wang, Yu He, Mingyu Xu, Haiyang Sun, Bin Liu, and Jianhua Tao

arXiv:2401.00416 · cs.CV · October 2, 2024 · 2 cites


TL;DR

SVFAP introduces a self-supervised learning framework for video facial affect analysis that combines masked autoencoding with a novel Transformer encoder. Because pre-training runs on unlabeled facial videos, it removes the dependence on scarce labeled data at that stage and substantially improves performance across multiple affect recognition tasks.
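The masked autoencoding step can be pictured with a small sketch. This is a minimal illustration assuming a VideoMAE-style tube masking scheme, in which the same spatial patch positions are hidden in every temporal slice so the model cannot simply copy a masked patch from a neighbouring frame. The function name `tube_mask`, the 8 x 100 token grid, and the 0.9 mask ratio are hypothetical choices, not values taken from the paper.

```python
import random


def tube_mask(num_time_slices, num_spatial_patches, mask_ratio, seed=None):
    """Return a boolean mask of shape [num_time_slices][num_spatial_patches].

    True marks a hidden (masked) token. Hypothetical helper: the same
    spatial positions are masked in every time slice, forming "tubes"
    through time (a VideoMAE-style assumption, not confirmed for SVFAP).
    """
    rng = random.Random(seed)
    num_masked = int(round(mask_ratio * num_spatial_patches))
    masked_positions = set(rng.sample(range(num_spatial_patches), num_masked))
    row = [p in masked_positions for p in range(num_spatial_patches)]
    # Copy the same spatial pattern into every temporal slice.
    return [list(row) for _ in range(num_time_slices)]


mask = tube_mask(8, 100, 0.9, seed=0)
print(sum(mask[0]), all(row == mask[0] for row in mask))  # 90 True
```

With a high mask ratio only a small fraction of tokens reaches the encoder, which is what makes pre-training on large unlabeled video collections affordable.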

Contribution

The paper proposes SVFAP, a self-supervised approach with a novel Transformer encoder, enabling effective large-scale pre-training on unlabeled videos for facial affect analysis.

Findings

01. Outperforms state-of-the-art methods on nine datasets

02. Effective across multiple affect recognition tasks

03. Reduces computational cost through its novel Transformer encoder design
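The computational-cost finding rests on a general property of Transformers: full self-attention compares every token with every other token, so its cost grows quadratically with the number of tokens. The sketch below is a generic illustration of that arithmetic, not the SVFAP encoder itself; the 16-frame 160 x 160 clip, the tubelet and patch sizes, and the helper names `token_count` and `attention_pair_count` are hypothetical.

```python
def token_count(frames, height, width, tubelet=2, patch=16):
    """Tokens from non-overlapping tubelet x patch x patch embedding (hypothetical sizes)."""
    return (frames // tubelet) * (height // patch) * (width // patch)


def attention_pair_count(num_tokens):
    """Query-key pairs in one full self-attention layer: grows as N^2."""
    return num_tokens * num_tokens


full = token_count(16, 160, 160)                  # 8 * 10 * 10 tokens
coarser = token_count(16, 160, 160, tubelet=4)    # halve the temporal tokens
print(full, coarser)  # 800 400
print(attention_pair_count(full) / attention_pair_count(coarser))  # 4.0
```

Halving the token count cuts the attention term by a factor of four, which is why architectures that exploit spatiotemporal redundancy to shrink the token sequence can save substantial compute.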

Abstract

Video-based facial affect analysis has recently attracted increasing attention owing to its critical role in human-computer interaction. Previous studies mainly focus on developing various deep learning architectures and training them in a fully supervised manner. Although significant progress has been achieved by these supervised methods, the longstanding lack of large-scale high-quality labeled data severely hinders their further improvements. Motivated by the recent success of self-supervised learning in computer vision, this paper introduces a self-supervised approach, termed Self-supervised Video Facial Affect Perceiver (SVFAP), to address the dilemma faced by supervised methods. Specifically, SVFAP leverages masked facial video autoencoding to perform self-supervised pre-training on massive unlabeled facial videos. Considering that large spatiotemporal redundancy exists in facial…
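In masked autoencoding, the pre-training signal usually comes from reconstructing only the hidden patches. A minimal sketch of such an objective, assuming a plain mean squared error over masked tokens as in MAE and VideoMAE (any target normalization SVFAP may apply is not covered here), with a hypothetical helper `masked_mse`:

```python
def masked_mse(pred, target, mask):
    """Mean squared error over masked tokens only.

    pred, target: lists of token vectors (lists of floats), one per token.
    mask: list of bools; True means the token was hidden from the encoder
    and must be reconstructed. Hypothetical helper for illustration.
    """
    total, count = 0.0, 0
    for p, t, m in zip(pred, target, mask):
        if not m:
            continue  # visible tokens do not contribute to the loss
        for pv, tv in zip(p, t):
            total += (pv - tv) ** 2
            count += 1
    if count == 0:
        raise ValueError("mask selects no tokens")
    return total / count


pred = [[1.0, 2.0], [0.0, 0.0]]
target = [[1.0, 2.0], [1.0, 1.0]]
print(masked_mse(pred, target, [False, True]))  # 1.0
```

Restricting the loss to masked tokens keeps the task non-trivial: the decoder cannot lower the loss by echoing patches the encoder already saw.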


Code & Models

Repositories

sunlicai/svfap (PyTorch, official)


Taxonomy

Topics: Emotion and Mood Recognition · Face Recognition and Analysis · Human Pose and Action Recognition

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Pointwise Convolution · Softmax · Label Smoothing · Multi-Head Attention · Adam · Dropout · Absolute Position Encodings