Enhancing Speech Emotion Recognition through Segmental Average Pooling of Self-Supervised Learning Features
Jonghwan Hyeon, Yung-Hwan Oh, Ho-Jin Choi

TL;DR
This paper introduces Segmental Average Pooling (SAP) to improve Speech Emotion Recognition (SER) by pooling self-supervised learning (SSL) features over speech segments rather than over the whole signal, outperforming traditional global pooling methods.
Contribution
The paper proposes a novel Segmental Average Pooling method that enhances SSL-based SER by selectively emphasizing speech segments over non-speech segments.
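The contrast between the two pooling operations can be written compactly. This is a sketch under assumptions, not the paper's exact formulation: let $H = (h_1, \dots, h_T)$ be the frame-level SSL features of an utterance with $h_t \in \mathbb{R}^D$, and let $S \subseteq \{1, \dots, T\}$ be the indices of frames inside speech segments (assumed here to come from a voice activity detector; this summary does not say how segments are found or whether they are weighted).

```latex
\mathrm{GAP}(H) = \frac{1}{T} \sum_{t=1}^{T} h_t ,
\qquad
\mathrm{SAP}(H) = \frac{1}{|S|} \sum_{t \in S} h_t
```

When every frame is speech ($S = \{1, \dots, T\}$) the two coincide; the more non-speech frames an utterance contains, the further GAP's average is pulled toward non-speech content, while SAP is unaffected by them.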
Findings
SAP improves SER accuracy on the IEMOCAP and KEMDy19 datasets.
Combining global average pooling (GAP) with SAP yields state-of-the-art results.
SAP outperforms traditional global average pooling in speech emotion tasks.
Abstract
Speech Emotion Recognition (SER) analyzes human emotions expressed through speech. Self-supervised learning (SSL) offers a promising approach to SER by learning meaningful representations from large amounts of unlabeled audio data. However, existing SSL-based methods rely on Global Average Pooling (GAP) to represent audio signals, treating speech and non-speech segments equally. This can dilute informative speech features with irrelevant non-speech information. To address this, the paper proposes Segmental Average Pooling (SAP), which selectively focuses on informative speech segments while ignoring non-speech segments. By applying both GAP and SAP to SSL features, the approach draws on overall speech signal information from GAP and segment-specific information from SAP, leading to improved SER performance. Experiments show state-of-the-art results on the IEMOCAP for English and…
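The GAP + SAP pipeline described in the abstract can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the speech mask is assumed to come from an external voice activity detector (not shown), SAP is assumed to be a plain mean over speech frames, falling back to GAP when no speech is detected is a choice made here, and concatenating the two vectors is one assumed way of "applying both". All function names are hypothetical.

```python
from typing import List, Sequence

Frame = List[float]  # one D-dimensional SSL feature vector per frame


def global_average_pool(features: Sequence[Frame]) -> Frame:
    """GAP: average every frame, speech and non-speech alike."""
    if not features:
        raise ValueError("need at least one frame")
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]


def segmental_average_pool(features: Sequence[Frame],
                           speech_mask: Sequence[bool]) -> Frame:
    """SAP sketch: average only the frames flagged as speech.

    speech_mask is assumed to come from an external voice activity
    detector (hypothetical). If no frame is flagged as speech, this
    sketch falls back to GAP over all frames.
    """
    if len(features) != len(speech_mask):
        raise ValueError("features and speech_mask must have the same length")
    speech = [f for f, is_speech in zip(features, speech_mask) if is_speech]
    return global_average_pool(speech if speech else features)


def gap_sap_embedding(features: Sequence[Frame],
                      speech_mask: Sequence[bool]) -> Frame:
    """Concatenate GAP and SAP into one 2*D utterance embedding (assumed fusion)."""
    return global_average_pool(features) + segmental_average_pool(features, speech_mask)


if __name__ == "__main__":
    # Two speech frames followed by two silent (all-zero) frames.
    feats = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
    mask = [True, True, False, False]
    print(gap_sap_embedding(feats, mask))
```

In this toy input, GAP gives [1.0, 1.5] while SAP gives [2.0, 3.0]: the two silent frames halve the GAP values, which is the dilution effect the abstract describes. The classifier would then receive the concatenated vector [1.0, 1.5, 2.0, 3.0].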
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Emotion and Mood Recognition
Methods: Average Pooling · Global Average Pooling
