Multi-Window Data Augmentation Approach for Speech Emotion Recognition
Sarala Padi, Dinesh Manocha, Ram D. Sriram

TL;DR
This paper introduces a multi-window data augmentation method for speech emotion recognition that enhances model performance by using multiple window sizes during audio feature extraction, validated on three benchmark datasets.
Contribution
The paper proposes a novel multi-window augmentation technique combined with a deep learning model, improving speech emotion recognition accuracy over single-window approaches.
Findings
Multi-window augmentation improves SER performance.
Choosing the window size is an essential step in audio feature extraction.
The multi-window model outperforms a single-window model on the IEMOCAP, SAVEE, and RAVDESS benchmark datasets.
Abstract
We present a Multi-Window Data Augmentation (MWA-SER) approach for speech emotion recognition. MWA-SER is a unimodal approach that focuses on two key concepts: designing the speech augmentation method and building the deep learning model to recognize the underlying emotion of an audio signal. Our proposed multi-window augmentation approach generates additional data samples from the speech signal by employing multiple window sizes in the audio feature extraction process. We show that our augmentation method, combined with a deep learning model, improves speech emotion recognition performance. We evaluate the performance of our approach on three benchmark datasets: IEMOCAP, SAVEE, and RAVDESS. We show that the multi-window model improves the SER performance and outperforms a single-window model. Finding the best window size is an essential step in audio feature extraction.…
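The idea of generating extra samples by varying the analysis window can be illustrated with a minimal sketch. This is not the authors' implementation: the window sizes (25, 50, and 100 ms), the 50% hop, the log-magnitude spectrum feature, and the function names `frame_signal`, `log_spectrum_features`, and `multi_window_augment` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def frame_signal(signal, window_size, hop_size):
    """Split a 1-D signal into overlapping frames of length window_size."""
    if len(signal) < window_size:
        # Zero-pad short signals so that at least one full frame exists.
        signal = np.pad(signal, (0, window_size - len(signal)))
    num_frames = 1 + (len(signal) - window_size) // hop_size
    starts = hop_size * np.arange(num_frames)
    return np.stack([signal[s:s + window_size] for s in starts])

def log_spectrum_features(signal, window_size, hop_size, n_fft=2048):
    """Log-magnitude spectrum per frame (an assumed stand-in feature)."""
    frames = frame_signal(signal, window_size, hop_size)
    windowed = frames * np.hanning(window_size)
    # A fixed n_fft keeps the feature dimension equal across window sizes.
    spectrum = np.abs(np.fft.rfft(windowed, n=n_fft, axis=1))
    return np.log(spectrum + 1e-8)

def multi_window_augment(signal, label, sample_rate, window_ms=(25, 50, 100)):
    """Return one (features, label) sample per window size.

    The window sizes are hypothetical; every sample keeps the label
    of the original utterance.
    """
    samples = []
    for ms in window_ms:
        window_size = int(sample_rate * ms / 1000)
        hop_size = window_size // 2  # assumed 50% overlap
        features = log_spectrum_features(signal, window_size, hop_size)
        samples.append((features, label))
    return samples

# Usage: one second of a synthetic 220 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 220 * t)
augmented = multi_window_augment(signal, "neutral", sample_rate)
for features, label in augmented:
    print(features.shape, label)
```

Each window size yields a feature matrix with a different number of frames for the same utterance, so a single recording contributes several training samples. A downstream model would need to handle the varying frame counts, for example by padding or pooling over time.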
Taxonomy
Topics: Emotion and Mood Recognition · Speech and Audio Processing · Speech Recognition and Synthesis
