Exploiting multi-CNN features in CNN-RNN based Dimensional Emotion Recognition on the OMG in-the-wild Dataset
Dimitrios Kollias, Stefanos Zafeiriou

TL;DR
This paper introduces a CNN-RNN framework that leverages multi-level CNN features for dimensional emotion recognition in-the-wild, achieving state-of-the-art results using only visual data on the OMG-Emotion dataset.
Contribution
It proposes a novel multi-level feature extraction and fusion approach within CNN-RNN models, enhancing emotion recognition performance in challenging real-world scenarios.
Findings
Using only the visual modality, outperformed state-of-the-art methods that used both audio and visual modalities.
Achieved second place in the OMG-Emotion Challenge for valence estimation.
Combining low- and high-level features significantly improves arousal estimation.
Abstract
This paper presents a novel CNN-RNN based approach, which exploits multiple CNN features for dimensional emotion recognition in-the-wild, utilizing the One-Minute Gradual-Emotion (OMG-Emotion) dataset. Our approach first pre-trains on the large and relevant Aff-Wild and Aff-Wild2 emotion databases. Low-, mid- and high-level features are extracted from the trained CNN component and are exploited by RNN subnets in a multi-task framework. Their outputs constitute an intermediate-level prediction; final estimates are obtained as the mean or median values of these predictions. Fusion of the networks, at Decision- or Model-level, is also examined for boosting the obtained performance; in the latter case an RNN was used for the fusion. Our approach, although using only the visual modality, outperformed state-of-the-art methods that utilized audio and visual modalities.…
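The aggregation step described in the abstract, where the RNN subnets' intermediate predictions are reduced to a final estimate by their mean or median, can be illustrated with a minimal sketch. The function name `aggregate_predictions`, the (valence, arousal) tuple layout, and the numeric values are assumptions made for illustration only; they are not taken from the paper or its code.

```python
from statistics import mean, median


def aggregate_predictions(subnet_outputs, how="mean"):
    """Reduce per-subnet (valence, arousal) predictions to one final pair.

    `subnet_outputs` is a hypothetical list with one (valence, arousal)
    tuple per RNN subnet (e.g. low-, mid- and high-level features).
    """
    if how not in ("mean", "median"):
        raise ValueError("how must be 'mean' or 'median'")
    reduce = mean if how == "mean" else median
    valences = [v for v, _ in subnet_outputs]
    arousals = [a for _, a in subnet_outputs]
    return reduce(valences), reduce(arousals)


# Made-up intermediate predictions for a single frame (illustrative values only).
intermediate = [(0.2, 0.5), (0.4, 0.1), (0.9, 0.3)]
final_mean = aggregate_predictions(intermediate, "mean")
final_median = aggregate_predictions(intermediate, "median")
```

The median is less sensitive than the mean to a single subnet producing an outlying prediction, which is one plausible reason to consider both reductions.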
