Multi-Window Data Augmentation Approach for Speech Emotion Recognition

Sarala Padi; Dinesh Manocha; Ram D. Sriram

arXiv:2010.09895 · cs.SD · February 17, 2022 · 5 cites

Open Access

TL;DR

This paper introduces a multi-window data augmentation method for speech emotion recognition that enhances model performance by using multiple window sizes during audio feature extraction, validated on three benchmark datasets.

Contribution

The paper proposes a novel multi-window augmentation technique combined with a deep learning model, improving speech emotion recognition accuracy over single-window approaches.

Findings

01. Multi-window augmentation improves speech emotion recognition (SER) performance.

02. The choice of window size significantly affects audio feature extraction.

03. The multi-window model outperforms single-window models on the IEMOCAP, SAVEE, and RAVDESS benchmark datasets.

Abstract

We present a Multi-Window Data Augmentation (MWA-SER) approach for speech emotion recognition. MWA-SER is a unimodal approach that focuses on two key concepts: designing the speech augmentation method and building the deep learning model to recognize the underlying emotion of an audio signal. Our proposed multi-window augmentation approach generates additional data samples from the speech signal by employing multiple window sizes in the audio feature extraction process. We show that our augmentation method, combined with a deep learning model, improves speech emotion recognition performance. We evaluate the performance of our approach on three benchmark datasets: IEMOCAP, SAVEE, and RAVDESS. We show that the multi-window model improves the SER performance and outperforms a single-window model. Finding the best window size is an essential step in audio feature extraction.…
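The core idea in the abstract, extracting features from the same utterance with several window sizes so that each window size yields an additional training sample, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the window sizes (25, 50, 100 ms), the 50% overlap, and the two toy features (log energy and zero-crossing rate) are all assumptions made for the example, since the abstract does not specify them.

```python
import numpy as np


def frame_signal(signal, win_len, hop_len):
    """Split a 1-D signal into overlapping frames of win_len samples."""
    if len(signal) < win_len:
        # Zero-pad short signals so that at least one frame exists.
        signal = np.pad(signal, (0, win_len - len(signal)))
    n_frames = 1 + (len(signal) - win_len) // hop_len
    idx = np.arange(win_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]


def frame_features(frames):
    """Toy per-frame features (assumed here): log energy and zero-crossing rate."""
    windowed = frames * np.hanning(frames.shape[1])
    log_energy = np.log(np.sum(windowed ** 2, axis=1) + 1e-10)
    signs = np.signbit(frames).astype(np.int8)
    zcr = np.mean(np.abs(np.diff(signs, axis=1)), axis=1)
    return np.stack([log_energy, zcr], axis=1)


def multi_window_augment(signal, label, sample_rate, window_ms=(25, 50, 100)):
    """Return one (features, label) sample per window size.

    window_ms holds hypothetical window sizes; the paper's values may differ.
    """
    samples = []
    for ms in window_ms:
        win_len = int(sample_rate * ms / 1000)
        hop_len = win_len // 2  # 50% overlap, an assumption for this sketch
        feats = frame_features(frame_signal(signal, win_len, hop_len))
        samples.append((feats, label))
    return samples


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    utterance = rng.standard_normal(16000)  # one second of noise at 16 kHz
    for feats, label in multi_window_augment(utterance, "happy", 16000):
        print(label, feats.shape)
```

Under these assumptions, a single labeled utterance produces three feature matrices with different numbers of frames, all sharing the original emotion label, so the training set grows by a factor equal to the number of window sizes.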

Taxonomy

Topics: Emotion and Mood Recognition · Speech and Audio Processing · Speech Recognition and Synthesis