Speech Emotion Recognition Using Quaternion Convolutional Neural Networks

Aneesh Muppidi; Martin Radfar

arXiv:2111.00404·cs.SD·November 2, 2021

TL;DR

This paper introduces a quaternion CNN model for speech emotion recognition that encodes Mel-spectrogram features in an RGB quaternion domain. It reports state-of-the-art accuracy on RAVDESS and results comparable to the state of the art on IEMOCAP and EMO-DB.

Contribution

The paper presents a quaternion CNN (QCNN) approach to SER that encodes Mel-spectrogram features in an RGB quaternion domain, reducing model size while improving accuracy over real-valued methods on RAVDESS.

Findings

01

Outperforms real-valued methods on RAVDESS dataset

02

Achieves state-of-the-art accuracy on RAVDESS (77.87%)

03

Comparable results on IEMOCAP and EMO-DB datasets

Abstract

Although speech recognition has become a widespread technology, inferring emotion from speech signals remains a challenge. To address this problem, this paper proposes a quaternion convolutional neural network (QCNN) based speech emotion recognition (SER) model in which Mel-spectrogram features of speech signals are encoded in an RGB quaternion domain. We show that our QCNN-based SER model outperforms other real-valued methods on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS, 8 classes) dataset, achieving, to the best of our knowledge, state-of-the-art results. The QCNN also achieves results comparable to the state-of-the-art methods on the Interactive Emotional Dyadic Motion Capture (IEMOCAP, 4 classes) and Berlin EMO-DB (7 classes) datasets. Specifically, the model achieves an accuracy of 77.87%, 70.46%, and 88.78% for the RAVDESS, IEMOCAP, and…
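The abstract does not spell out how an RGB image is placed in the quaternion domain. A minimal sketch, assuming the convention common in quaternion CNN work where each RGB pixel becomes a pure quaternion 0 + R·i + G·j + B·k, is shown below together with the Hamilton product that quaternion layers use in place of real multiplication. The function names `rgb_to_quaternion` and `hamilton_product` are hypothetical and not taken from the paper, and the input is assumed to be a Mel-spectrogram already rendered as an RGB array.

```python
import numpy as np

def rgb_to_quaternion(rgb):
    """Encode an (H, W, 3) RGB array as (H, W, 4) pure quaternions.

    Assumed convention (not confirmed by the paper): real part is 0,
    and R, G, B fill the i, j, k components.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    real = np.zeros(rgb.shape[:-1] + (1,))
    return np.concatenate([real, rgb], axis=-1)

def hamilton_product(p, q):
    """Hamilton product of quaternions stored as (..., 4) arrays in (r, i, j, k) order."""
    a1, b1, c1, d1 = np.moveaxis(np.asarray(p, dtype=np.float64), -1, 0)
    a2, b2, c2, d2 = np.moveaxis(np.asarray(q, dtype=np.float64), -1, 0)
    return np.stack([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    ], axis=-1)

# Toy 2x2 "spectrogram image" with made-up RGB values in [0, 1].
image = np.array([[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
                  [[0.7, 0.8, 0.9], [1.0, 0.0, 0.5]]])
quat_image = rgb_to_quaternion(image)

# A single hypothetical quaternion weight applied to every pixel.
weight = np.array([0.5, 0.5, 0.5, 0.5])
mixed = hamilton_product(weight, quat_image)
```

Because each output component of the Hamilton product mixes all four input components with one shared weight quaternion, the three color channels are processed jointly rather than independently, which is the usual argument for why quaternion layers need fewer parameters than real-valued layers of the same width.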
