Investigations on Audiovisual Emotion Recognition in Noisy Conditions

Michael Neumann; Ngoc Thang Vu

arXiv:2103.01894·cs.SD·March 3, 2021

TL;DR

This study investigates how audiovisual emotion recognition performs under noisy conditions, demonstrating that visual features can mitigate performance loss caused by acoustic noise in speech-based emotion detection.

Contribution

It compares three types of acoustic features on speech with superimposed noise at several signal-to-noise ratios, and adds visual features through a hybrid fusion neural network to make emotion recognition more robust to acoustic noise.
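The hybrid fusion layout described in the abstract (separate modality-specific layers, followed by at least one shared layer before the final prediction) can be sketched as a forward pass in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `HybridFusionNet`, the layer sizes, the ReLU activations, the omission of bias terms, the four output classes, and the 40-dimensional audio / 136-dimensional visual inputs are all hypothetical, and training is left out.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class HybridFusionNet:
    """Hypothetical hybrid fusion classifier (bias terms omitted for brevity).

    audio  -> modality-specific layer --\
                                         concat -> shared layer -> output layer
    visual -> modality-specific layer --/
    """

    def __init__(self, audio_dim, visual_dim, hidden=64, shared=32,
                 n_classes=4, seed=0):
        rng = np.random.default_rng(seed)

        def init(n_in, n_out):
            # He-style random initialisation of a weight matrix.
            return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

        # Modality-specific layers: weights are NOT shared between modalities.
        self.W_audio = init(audio_dim, hidden)
        self.W_visual = init(visual_dim, hidden)
        # Shared layer operating on the concatenated modality representations.
        self.W_shared = init(2 * hidden, shared)
        # Final prediction layer over the (assumed) emotion classes.
        self.W_out = init(shared, n_classes)

    def forward(self, audio, visual):
        h_audio = relu(audio @ self.W_audio)
        h_visual = relu(visual @ self.W_visual)
        h = relu(np.concatenate([h_audio, h_visual], axis=-1) @ self.W_shared)
        return softmax(h @ self.W_out)


# Usage with random stand-in features for a batch of 8 utterances.
net = HybridFusionNet(audio_dim=40, visual_dim=136)
rng = np.random.default_rng(1)
probs = net.forward(rng.normal(size=(8, 40)), rng.normal(size=(8, 136)))
print(probs.shape)  # (8, 4)
```

Because each modality first passes through its own layers, the shared layer sees learned audio and visual representations rather than raw features. This is what separates hybrid fusion from early fusion (concatenating raw features) and late fusion (combining the outputs of independent per-modality models).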

Findings

01

Performance drops significantly when a model trained on clean speech is applied to noisy data

02

Adding visual features improves accuracy under noise

03

Hybrid fusion neural network enhances robustness

Abstract

In this paper we explore audiovisual emotion recognition under noisy acoustic conditions with a focus on speech features. We attempt to answer the following research questions: (i) How does speech emotion recognition perform on noisy data? and (ii) To what extent does a multimodal approach improve the accuracy and compensate for potential performance degradation at different noise levels? We present an analytical investigation on two emotion datasets with superimposed noise at different signal-to-noise ratios, comparing three types of acoustic features. Visual features are incorporated with a hybrid fusion approach: The first neural network layers are separate modality-specific ones, followed by at least one shared layer before the final prediction. The results show a significant performance decrease when a model trained on clean audio is applied to noisy data and that the addition of…
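Superimposing noise at a chosen signal-to-noise ratio means scaling the noise so that $\mathrm{SNR_{dB}} = 10 \log_{10}(P_\text{speech} / P_\text{noise})$ hits the target value. The sketch below uses that common power-based definition; the noise types and the exact mixing procedure used in the paper are not given in this summary, so the function names `mix_at_snr` and `snr_db`, the tiling of short noise clips, and the synthetic test signals are all assumptions for illustration.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Hypothetical helper: add `noise` to `speech` at a target SNR in dB.

    Assumes 1-D float arrays sampled at the same rate.
    """
    # Tile a short noise clip, then truncate it to the speech length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0.0:
        raise ValueError("noise signal is silent; SNR is undefined")

    # Choose `scale` so that p_speech / (scale**2 * p_noise) == 10**(snr_db / 10).
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise


def snr_db(speech, noisy):
    """Measure the SNR (dB) of a mixture, given the clean reference signal."""
    residual = noisy - speech
    return 10.0 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2))


# Usage: a 1 s, 220 Hz tone at 16 kHz mixed with white noise at several SNRs.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = 0.5 * np.sin(2 * np.pi * 220.0 * t)
noise = rng.normal(size=4000)
for target in (20.0, 10.0, 0.0, -5.0):
    noisy = mix_at_snr(speech, noise, target)
    print(target, round(snr_db(speech, noisy), 3))
```

Lower target values yield louder noise relative to the speech, so sweeping the SNR produces the graded noise levels at which clean-trained and multimodal models can be compared.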