Convolutional neural networks pretrained on large face recognition datasets for emotion classification from video
Boris Knyazev, Roman Shvetsov, Natalia Efremova, Artem Kuharenko

TL;DR
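This paper presents an ensemble of models for emotion classification from video, with spatial features captured by CNNs pretrained on large face recognition datasets. It reaches 60.03% accuracy on the EmotiW 2017 test set, about 1% above the previous best result, without using visual temporal information.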
Contribution
It uses strong, industry-level face recognition CNNs as pretrained spatial feature extractors within an ensemble for emotion recognition, improving on the previous best test-set result by about 1%.
Findings
Achieved 60.03% classification accuracy on the EmotiW 2017 test set.
Ensemble of spatial and audio features enhances performance.
Pretraining on face recognition datasets boosts emotion classification accuracy.
Abstract
In this paper we describe a solution to our entry for the emotion recognition challenge EmotiW 2017. We propose an ensemble of several models, which capture spatial and audio features from videos. Spatial features are captured by convolutional neural networks, pretrained on large face recognition datasets. We show that usage of strong industry-level face recognition networks increases the accuracy of emotion recognition. Using our ensemble we improve on the previous best result on the test set by about 1 %, achieving a 60.03 % classification accuracy without any use of visual temporal information.
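The abstract does not say how the ensemble combines its models, so the following is only a minimal sketch of one common approach: average per-frame class scores over a video (which ignores frame order, and so uses no temporal information), then take a weighted average of the spatial and audio model scores. The function names, the weights, and the seven-label emotion set are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List, Sequence

# Illustrative label set (an assumption; the summary above does not list the classes).
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprise"]


def mean_pool(frame_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame class scores into one video-level score vector.

    The mean is invariant to frame order, so no temporal information is used.
    """
    n = len(frame_scores)
    if n == 0:
        raise ValueError("need at least one frame")
    return [sum(column) / n for column in zip(*frame_scores)]


def fuse(model_scores: Dict[str, Sequence[float]],
         weights: Dict[str, float]) -> List[float]:
    """Weighted average of video-level class scores from several models."""
    total = sum(weights[name] for name in model_scores)
    num_classes = len(next(iter(model_scores.values())))
    fused = [0.0] * num_classes
    for name, scores in model_scores.items():
        for i, score in enumerate(scores):
            fused[i] += weights[name] * score / total
    return fused


def predict(fused: Sequence[float]) -> str:
    """Return the label with the highest fused score."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return EMOTIONS[best]


# Hypothetical scores for a two-frame video and one audio model.
frames = [
    [0.1, 0.0, 0.0, 0.7, 0.2, 0.0, 0.0],
    [0.3, 0.0, 0.0, 0.5, 0.2, 0.0, 0.0],
]
audio = [0.5, 0.0, 0.0, 0.3, 0.2, 0.0, 0.0]

scores = {"spatial": mean_pool(frames), "audio": audio}
fused = fuse(scores, weights={"spatial": 0.75, "audio": 0.25})
label = predict(fused)
```

In practice the fusion weights would be tuned on a validation set. The fixed values here only show the mechanics.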
Taxonomy
Topics: Face recognition and analysis · Human Pose and Action Recognition · Video Surveillance and Tracking Methods
