An Occam's Razor View on Learning Audiovisual Emotion Recognition with Small Training Sets

Valentin Vielzeuf; Corentin Kervadec; Stéphane Pateux; Alexis Lechervy; Frédéric Jurie

arXiv:1808.02668 · cs.AI · August 9, 2018

TL;DR

This paper introduces a simple, lightweight deep neural model for audiovisual emotion recognition that achieves competitive accuracy on small datasets by emphasizing minimalism and effective transfer learning.

Contribution

The authors propose a novel, minimalistic neural architecture for audiovisual emotion recognition that relies on transfer learning, simple temporal scoring, and late fusion, suitable for small datasets.
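The temporal scoring and late fusion named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the per-frame and audio scores are made up, `temporal_average`, `confidence_weights` and `late_fusion` are hypothetical helper names, and the confidence-based frame weighting is a stand-in for the paper's frame selection mechanism, whose exact form is not described on this page.

```python
from typing import List, Optional, Sequence

# One score vector per frame (or per modality), one entry per emotion class.
Scores = Sequence[float]


def temporal_average(frame_scores: Sequence[Scores],
                     frame_weights: Optional[Sequence[float]] = None) -> List[float]:
    """Collapse per-frame class scores into one video-level vector by an
    (optionally weighted) average over time."""
    if not frame_scores:
        raise ValueError("need at least one frame")
    if frame_weights is None:
        frame_weights = [1.0] * len(frame_scores)  # plain temporal average
    total = sum(frame_weights)
    n_classes = len(frame_scores[0])
    return [sum(w * f[c] for w, f in zip(frame_weights, frame_scores)) / total
            for c in range(n_classes)]


def confidence_weights(frame_scores: Sequence[Scores]) -> List[float]:
    """HYPOTHETICAL frame weighting: each frame counts in proportion to its
    top class score. The paper's own frame selection mechanism may differ."""
    return [max(f) for f in frame_scores]


def late_fusion(modality_scores: Sequence[Scores],
                modality_weights: Sequence[float]) -> List[float]:
    """Late fusion: weighted average of per-modality class scores."""
    total = sum(modality_weights)
    n_classes = len(modality_scores[0])
    return [sum(w * s[c] for w, s in zip(modality_weights, modality_scores)) / total
            for c in range(n_classes)]


def argmax(xs: Sequence[float]) -> int:
    """Index of the highest score, i.e. the predicted class."""
    return max(range(len(xs)), key=lambda i: xs[i])


# Toy example: 3 frames, 3 classes (AFEW itself has 7 emotion classes).
frames = [[0.1, 0.7, 0.2], [0.2, 0.6, 0.2], [0.6, 0.2, 0.2]]
visual = temporal_average(frames)                          # uniform weights
visual_weighted = temporal_average(frames, confidence_weights(frames))
audio = [0.5, 0.3, 0.2]                                    # made-up audio scores
fused = late_fusion([visual, audio], [0.6, 0.4])           # made-up modality weights
print(argmax(fused))  # prints 1
```

Fusing at the score level rather than concatenating features means the fusion step adds almost nothing to learn from the small target set: in this sketch only the modality weights, fixed here by hand, would need tuning.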

Findings

1. Achieved 60.64% accuracy on the AFEW dataset
2. Ranked 4th in the Emotion in the Wild (EmotiW) 2018 challenge
3. Demonstrated the effectiveness of simple methods on small datasets

Abstract

This paper presents a lightweight and accurate deep neural model for audiovisual emotion recognition. To design this model, the authors followed a philosophy of simplicity, drastically limiting the number of parameters to learn from the target datasets and always choosing the simplest learning methods: i) transfer learning and low-dimensional space embedding reduce the dimensionality of the representations; ii) the visual temporal information is handled by a simple score-per-frame selection process, averaged across time; iii) a simple frame selection mechanism is also proposed to weight the images of a sequence; iv) the fusion of the different modalities is performed at prediction level (late fusion). The authors also highlight the inherent challenges of the AFEW dataset and the difficulty of model selection with as few as 383 validation sequences. The proposed real-time emotion classifier…
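To make the model-selection difficulty concrete: if validation accuracy is treated as a binomial estimate over independent sequences (a simplifying assumption), its standard error is sqrt(p(1 - p) / n). The sketch below evaluates this for the 383 validation sequences mentioned in the abstract; the accuracy level `p = 0.5` and the helper name `accuracy_std_error` are illustrative assumptions, not results or code from the paper.

```python
import math


def accuracy_std_error(p: float, n: int) -> float:
    """Binomial standard error of an accuracy p measured on n independent samples."""
    return math.sqrt(p * (1.0 - p) / n)


n_val = 383   # number of validation sequences, from the abstract
p = 0.5       # illustrative accuracy level (assumption, not a reported result)

se = accuracy_std_error(p, n_val)
print(f"one sequence changes accuracy by {100 / n_val:.2f} points")
print(f"standard error: {100 * se:.2f} points")
print(f"approx. 95% interval half-width: {100 * 1.96 * se:.2f} points")
```

With these numbers the standard error is roughly 2.5 accuracy points, so two models whose validation accuracies differ by a couple of points cannot be reliably ranked on this split alone. Comparing two models on the same sequences is really a paired problem, so this independent-sample figure is only a rough guide.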
