Representation Learning with Graph Neural Networks for Speech Emotion Recognition
Junghun Kim, Jihie Kim

TL;DR
This paper introduces CoGCN, a graph convolutional network built on a cosine similarity-based graph for speech emotion recognition. It is robust to noise and matches or outperforms state-of-the-art methods with far fewer parameters.
Contribution
The paper proposes a novel cosine similarity-based graph structure and CoGCN model for SER, improving noise robustness and reducing model size compared to existing methods.
Findings
Outperforms or is competitive with state-of-the-art SER methods
Uses only 1/30 of the parameters, a significant model size reduction
Demonstrates robustness to speech noise
Abstract
Learning expressive representations is crucial in deep learning. In speech emotion recognition (SER), vacuum regions or noise in the speech interfere with expressive representation learning, and traditional RNN-based models are susceptible to such noise. Recently, Graph Neural Networks (GNNs) have demonstrated their effectiveness for representation learning, and we adopt this framework for SER. In particular, we propose a cosine similarity-based graph as an ideal graph structure for representation learning in SER. We present a Cosine similarity-based Graph Convolutional Network (CoGCN) that is robust to perturbation and noise. Experimental results show that our method outperforms state-of-the-art methods, or provides competitive results with a significant model size reduction, using only 1/30 of the parameters.
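To make the idea of a cosine similarity-based graph concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how edges are selected, so the thresholding rule, the `threshold` value, the function names, and the mean-pooling readout are all assumptions made for illustration. The propagation step uses the standard symmetric-normalized graph convolution.

```python
import numpy as np

def cosine_similarity_graph(features, threshold=0.5):
    """Build a binary adjacency matrix from pairwise cosine similarity.

    features: (num_nodes, feat_dim) array, e.g. one row per speech frame.
    threshold: hypothetical cut-off for creating an edge (an assumption,
               not a value taken from the paper).
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero rows
    sim = unit @ unit.T                            # pairwise cosine similarity
    adj = (sim >= threshold).astype(float)
    np.fill_diagonal(adj, 1.0)                     # self-loops on every node
    return adj

def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(D^-1/2 A D^-1/2 X W)."""
    deg = adj.sum(axis=1)                          # >= 1 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)

# Toy usage: 6 "frames" with 4-dimensional features (random placeholders).
rng = np.random.default_rng(0)
frames = rng.normal(size=(6, 4))
adj = cosine_similarity_graph(frames, threshold=0.3)
weight = rng.normal(size=(4, 3))
hidden = gcn_layer(adj, frames, weight)
utterance_embedding = hidden.mean(axis=0)          # assumed mean-pool readout
```

In this sketch, frames whose features point in similar directions become neighbors, so each node's representation is averaged with acoustically similar frames rather than only with its temporal neighbors. That is one plausible reading of why such a graph could dampen the effect of isolated noisy or silent frames.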
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Advanced Graph Neural Networks
Methods: Graph Neural Network
