# Emotion Recognition Using Fusion of Audio and Video Features

**Authors:** Juan D. S. Ortega, Patrick Cardinal, Alessandro L. Koerich

arXiv: 1906.10623 · 2019-06-26

## TL;DR

This paper presents a method that fuses visual and auditory features for continuous emotion recognition. On the RECOLA dataset it predicts arousal and valence with concordance correlation coefficients (CCC) of 0.749 and 0.565, respectively.

## Contribution

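It introduces a multimodal approach that fuses audio and video either at the feature level (a single support vector regressor, SVR, trained on the combined features) or at the prediction level (one SVR per modality, with their outputs combined). Visual features come from a pre-trained convolutional neural network adapted through transfer learning.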
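The two fusion points can be illustrated with a small sketch. It assumes both modalities have already been aligned to a common frame rate, and the function names `feature_level_fusion` and `prediction_level_fusion` are hypothetical. The weighted average used for prediction-level fusion is only one simple combination rule chosen for illustration; this summary does not state the exact rule the authors use.

```python
from typing import List, Sequence


def feature_level_fusion(
    audio: Sequence[Sequence[float]], video: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Early fusion: concatenate per-frame audio and video feature vectors.

    Assumes both sequences are already aligned frame by frame.
    """
    if len(audio) != len(video):
        raise ValueError("modalities must have the same number of frames")
    return [list(a) + list(v) for a, v in zip(audio, video)]


def prediction_level_fusion(
    pred_audio: Sequence[float], pred_video: Sequence[float], w_audio: float = 0.5
) -> List[float]:
    """Late fusion: combine per-modality regressor outputs frame by frame.

    The weighted average and the `w_audio` parameter are illustrative
    assumptions, not the paper's stated combination rule.
    """
    if len(pred_audio) != len(pred_video):
        raise ValueError("prediction sequences must have the same length")
    if not 0.0 <= w_audio <= 1.0:
        raise ValueError("w_audio must lie in [0, 1]")
    return [w_audio * a + (1.0 - w_audio) * v for a, v in zip(pred_audio, pred_video)]
```

In the feature-level variant, a single regressor is trained on the concatenated vectors. In the prediction-level variant, one regressor per modality produces `pred_audio` and `pred_video`, and only their outputs are merged, so each model can be tuned to its own modality.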

## Key findings

- Achieved a CCC of 0.749 for arousal on RECOLA
- Achieved a CCC of 0.565 for valence on RECOLA
- Used a pre-trained CNN with transfer learning to extract visual features
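Both results are reported as Lin's concordance correlation coefficient, $\rho_c = \frac{2\,\sigma_{xy}}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}$, where $\sigma_{xy}$ is the covariance between the gold-standard annotation and the prediction. Unlike Pearson correlation, it also penalizes differences in mean and scale. The following sketch computes it with population (1/N) moments. That normalization is an assumption about the convention used, and the function name `ccc` is hypothetical.

```python
from math import fsum
from typing import Sequence


def ccc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient using population (1/N) moments."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = fsum(y_true) / n
    mean_p = fsum(y_pred) / n
    var_t = fsum((t - mean_t) ** 2 for t in y_true) / n
    var_p = fsum((p - mean_p) ** 2 for p in y_pred) / n
    cov = fsum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    # Denominator: both variances plus the squared difference of the means (bias term).
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0.0:
        raise ValueError("CCC is undefined when both series are the same constant")
    return 2.0 * cov / denom
```

For example, a prediction that tracks the annotation perfectly but is offset by a constant of +1 still has a Pearson correlation of 1. Its CCC, however, drops below 1 because of the bias term. This is why post-processing that corrects the mean and scale of the predictions can raise the CCC.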

## Abstract

In this paper we propose a fusion approach to continuous emotion recognition that combines visual and auditory modalities in their representation spaces to predict the arousal and valence levels. The proposed approach employs a pre-trained convolutional neural network and transfer learning to extract features from video frames that capture the emotional content. For the auditory content, a minimalistic set of parameters such as prosodic, excitation, vocal tract, and spectral descriptors are used as features. The fusion of these two modalities is carried out at a feature level, before training a single support vector regressor (SVR) or at a prediction level, after training one SVR for each modality. The proposed approach also includes preprocessing and post-processing techniques which contribute favorably to improving the concordance correlation coefficient (CCC). Experimental results for predicting spontaneous and natural emotions on the RECOLA dataset have shown that the proposed approach takes advantage of the complementary information of visual and auditory modalities and provides CCCs of 0.749 and 0.565 for arousal and valence, respectively.
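The abstract does not say which post-processing steps are applied. The sketch below shows two operations that are common in continuous emotion prediction and are included here only as assumptions, not as the authors' confirmed pipeline. The first is median filtering, which suppresses isolated spikes in frame-level predictions. The second is a time shift that compensates for the reaction delay of human annotators. The names `median_smooth` and `delay_predictions` and the window and delay values are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def median_smooth(preds: Sequence[float], window: int) -> List[float]:
    """Centered median filter; windows are truncated at the sequence edges."""
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    out = []
    for i in range(len(preds)):
        lo = max(0, i - half)
        hi = min(len(preds), i + half + 1)
        out.append(median(preds[lo:hi]))
    return out


def delay_predictions(preds: Sequence[float], delay: int) -> List[float]:
    """Shift predictions later in time by `delay` frames (output[t] = preds[t - delay]).

    The first `delay` frames are padded with the first prediction. This
    compensates for annotations lagging behind the audio-visual signal.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    preds = list(preds)
    if delay == 0 or not preds:
        return preds
    delay = min(delay, len(preds))
    return [preds[0]] * delay + preds[: len(preds) - delay]
```

In practice the window size and delay would be tuned on a development partition by maximizing the CCC, rather than fixed in advance.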

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.10623/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1906.10623/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/1906.10623/full.md

---
Source: https://tomesphere.com/paper/1906.10623