On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks
Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, and Stefan Wermter

TL;DR
This paper evaluates the robustness of neural speech emotion recognition models in human-robot interaction, highlighting the impact of environmental noise and proposing data augmentation techniques to enhance real-world performance.
Contribution
The study assesses neural SER model robustness in noisy robot environments and introduces data augmentation methods to improve their real-world applicability.
Findings
Neural SER models are significantly affected by environmental noise.
Data augmentation techniques improve model robustness in real-world conditions.
Proposed methods reduce the performance gap between training and testing environments.
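As a rough illustration of the kind of noise-based data augmentation referred to above, the sketch below mixes a noise recording into a clean utterance at a chosen signal-to-noise ratio. This summary does not describe the paper's exact augmentation procedure, so this is a generic additive-noise example, not the authors' method. The function names (mix_at_snr, measured_snr_db), the 16 kHz sample rate, and the synthetic sine and Gaussian signals are all assumptions made for illustration.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB.

    Hypothetical helper: a generic additive-noise augmentation,
    not the procedure used in the paper.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise so that it covers the whole utterance.
    looped = [noise[i % len(noise)] for i in range(len(speech))]
    speech_power = mean_power(speech)
    noise_power = mean_power(looped)
    if speech_power == 0.0 or noise_power == 0.0:
        return list(speech)  # nothing to scale against
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(speech, looped)]


def measured_snr_db(speech, noisy):
    """SNR in dB of a noisy signal, given the clean reference."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10.0 * math.log10(mean_power(speech) / mean_power(residual))


# Synthetic stand-ins: one second of a 220 Hz tone at an assumed 16 kHz
# sample rate, and a shorter burst of Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2.0 * math.pi * 220.0 * t / 16000.0) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

In a training pipeline, a function like this would typically be applied on the fly with a randomly drawn noise clip and SNR per utterance, so the model sees many acoustic conditions for the same labelled speech.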
Abstract
Speech emotion recognition (SER) is an important aspect of effective human-robot collaboration and has received a lot of attention from the research community. For example, many neural network-based architectures have been proposed recently and have pushed performance to a new level. However, the applicability of such neural SER models, trained only on in-domain data, to noisy conditions is currently under-researched. In this work, we evaluate the robustness of state-of-the-art neural acoustic emotion recognition models in human-robot interaction scenarios. We hypothesize that a robot's ego noise, room conditions, and various acoustic events that can occur in a home environment can significantly affect the performance of a model. We conduct several experiments on the iCub robot platform and propose several novel ways to reduce the gap between the model's performance during training and testing in…
