Charting 15 years of progress in deep learning for speech emotion recognition: A replication study
Andreas Triantafyllopoulos, Anton Batliner, Björn W. Schuller

TL;DR
This replication study quantifies 15 years of deep learning progress in speech emotion recognition, finding diminishing returns from newer models and showing that perceived progress depends on which models are compared.
Contribution
It provides a comprehensive quantification of progress in SER over 15 years and analyzes the impact of different model architectures, highlighting the plateau after transformer models.
Findings
Diminishing returns observed with recent deep learning models
Plateau in progress after transformer architectures
Perceptions of progress depend on model comparison choices
Abstract
Speech emotion recognition (SER) has long benefited from the adoption of deep learning methodologies. Deeper models, with more layers and more trainable parameters, are generally perceived as being 'better' by the SER community. This raises the question: how much better are modern-era deep neural networks compared to their earlier iterations? Beyond that, the more important question of how to move forward remains as poignant as ever. SER is far from a solved problem; therefore, identifying the most prominent avenues of future research is of paramount importance. In the present contribution, we attempt a quantification of progress in the 15 years of research beginning with the introduction of the landmark 2009 INTERSPEECH Emotion Challenge. We conduct a large-scale investigation of model architectures, spanning both audio-based models that rely on speech inputs and text-based…
Taxonomy
Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Sentiment Analysis and Opinion Mining
