Spatiotemporal Contrastive Learning of Facial Expressions in Videos

Shuvendu Roy; Ali Etemad

arXiv:2108.03064·cs.CV·August 9, 2021

TL;DR

This paper introduces a self-supervised contrastive learning method for facial expression recognition (FER) in videos. It adds a temporal sampling-based augmentation scheme to the standard spatial augmentations and reports 89.4% accuracy on the Oulu-CASIA dataset.

Contribution

The paper presents a temporal sampling-based augmentation scheme for contrastive learning in facial expression recognition (FER), used in addition to standard spatial augmentations when training a video model in a self-supervised fashion.

Findings

01

Achieved 89.4% accuracy on the Oulu-CASIA dataset.

02

The proposed approach outperformed the other FER methods it was compared against.

03

Temporal augmentation significantly improves recognition performance.

Abstract

We propose a self-supervised contrastive learning approach for facial expression recognition (FER) in videos. We propose a novel temporal sampling-based augmentation scheme to be utilized in addition to standard spatial augmentations used for contrastive learning. Our proposed temporal augmentation scheme randomly picks from one of three temporal sampling techniques: (1) pure random sampling, (2) uniform sampling, and (3) sequential sampling. This is followed by a combination of up to three standard spatial augmentations. We then use a deep R(2+1)D network for FER, which we train in a self-supervised fashion based on the augmentations and subsequently fine-tune. Experiments are performed on the Oulu-CASIA dataset and the performance is compared to other works in FER. The results indicate that our method achieves an accuracy of 89.4%, setting a new state-of-the-art by outperforming other…
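
The three temporal sampling techniques named in the abstract can be sketched as index selectors over a video's frames. This is a minimal illustration, not the authors' implementation: the function names, the `clip_len` parameter, the choice to keep randomly sampled frames in temporal order, and the equal-probability pick between the three techniques are all assumptions made for the example.

```python
import random


def random_sampling(num_frames, clip_len, rng):
    """Pure random sampling: distinct frames from anywhere in the video.

    Keeping the picked indices in temporal order is an assumption here.
    """
    return sorted(rng.sample(range(num_frames), clip_len))


def uniform_sampling(num_frames, clip_len, rng):
    """Uniform sampling: evenly spaced frames spanning the whole video."""
    if clip_len == 1:
        return [0]
    # Integer arithmetic keeps indices distinct when num_frames >= clip_len.
    return [(i * (num_frames - 1)) // (clip_len - 1) for i in range(clip_len)]


def sequential_sampling(num_frames, clip_len, rng):
    """Sequential sampling: consecutive frames from a random start point."""
    start = rng.randint(0, num_frames - clip_len)  # randint is inclusive
    return list(range(start, start + clip_len))


# Hypothetical registry; the abstract only says one of the three is picked at random.
SAMPLERS = (random_sampling, uniform_sampling, sequential_sampling)


def temporal_augment(num_frames, clip_len, rng=None):
    """Pick one sampler at random (assumed equal probability) and apply it."""
    if not 1 <= clip_len <= num_frames:
        raise ValueError("clip_len must be between 1 and num_frames")
    rng = rng or random.Random()
    sampler = rng.choice(SAMPLERS)
    return sampler(num_frames, clip_len, rng)


if __name__ == "__main__":
    # Two temporally augmented views of the same 30-frame video.
    rng = random.Random(0)
    print(temporal_augment(30, 16, rng))
    print(temporal_augment(30, 16, rng))
```

In a contrastive setup, two such index lists drawn from the same video would each select a clip, and the spatial augmentations would then be applied to each clip before encoding.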

Taxonomy

Methods: Contrastive Learning · Residual Connection · Average Pooling · Dense Connections · Global Average Pooling · (2+1)D Convolution · Batch Normalization · R(2+1)D