A Graph Isomorphism Network with Weighted Multiple Aggregators for Speech Emotion Recognition
Ying Hu, Yuwu Tang, Hao Huang, Liang He

TL;DR
This paper introduces WMA-GIN, a graph neural network architecture for speech emotion recognition (SER). It targets two problems: information confusion when neighbour features are aggregated, and over-squashing. It reports higher accuracy than other GNN-based methods on the IEMOCAP dataset.
Contribution
The paper proposes WMA-GIN, which improves GNN performance in speech emotion recognition by combining four components: weighted multiple aggregators, a Full-Adjacent (FA) layer, a multi-phase attention mechanism, and a multi-loss training strategy.
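This summary does not give the layer equation, so the following is only a minimal sketch under stated assumptions. It starts from the standard GIN update, h_v' = MLP((1 + eps) * h_v + sum of h_u over neighbours u of v). It then assumes that "weighted multiple aggregators" replaces that single sum with a softmax-weighted mix of sum, mean, and max aggregators whose mixing logits would be learned. The function names, the `agg_logits` parameter, and the choice of these three aggregators are hypothetical. The MLP is passed in as an optional callable.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(neigh_feats, kind):
    """Element-wise aggregation of a non-empty list of neighbour feature vectors."""
    dim = len(neigh_feats[0])
    if kind == "sum":
        return [sum(f[i] for f in neigh_feats) for i in range(dim)]
    if kind == "mean":
        return [sum(f[i] for f in neigh_feats) / len(neigh_feats) for i in range(dim)]
    if kind == "max":
        return [max(f[i] for f in neigh_feats) for i in range(dim)]
    raise ValueError(f"unknown aggregator: {kind}")

def wma_gin_layer(features, neighbors, agg_logits, eps=0.0, mlp=None,
                  aggregators=("sum", "mean", "max")):
    """Hypothetical WMA-GIN update: (1 + eps) * h_v + softmax-weighted mix of aggregators."""
    weights = softmax(agg_logits)
    out = []
    for v, h_v in enumerate(features):
        neigh = [features[u] for u in neighbors[v]]
        if neigh:
            parts = [aggregate(neigh, kind) for kind in aggregators]
            mixed = [sum(w * p[i] for w, p in zip(weights, parts)) for i in range(len(h_v))]
        else:
            mixed = [0.0] * len(h_v)  # isolated node: nothing to aggregate
        pre = [(1.0 + eps) * h + m for h, m in zip(h_v, mixed)]
        out.append(mlp(pre) if mlp is not None else pre)
    return out

# Usage: path graph 0 - 1 - 2 with 1-dim features and equal aggregator weights.
feats = [[1.0], [2.0], [4.0]]
nbrs = [[1], [0, 2], [1]]
out = wma_gin_layer(feats, nbrs, agg_logits=[0.0, 0.0, 0.0])
# Node 1 mixes sum=5, mean=2.5, max=4 equally, so out[1][0] == 2.0 + 11.5 / 3 (about 5.83).
```

Softmax-normalising the mixing logits keeps the combined message on the same scale as a single aggregator. It also lets training shift weight toward whichever aggregator best separates neighbours. How the paper actually parameterises these weights is not stated here.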
Findings
WMA-GIN outperforms other GNN-based methods on IEMOCAP.
Achieves 72.48% weighted accuracy (WA) and 67.72% unweighted accuracy (UA).
Addresses information confusion (through weighted multiple aggregators) and over-squashing (through the FA layer) in GNNs.
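In IEMOCAP work, WA is commonly computed as overall accuracy and UA as the mean of per-class recalls. UA therefore gives every emotion class equal weight regardless of how many utterances it has. This summary does not state the paper's exact definitions, so the sketch below assumes that common convention. The labels are made-up toy data, not IEMOCAP results.

```python
from collections import Counter

def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: every utterance counts equally, so frequent classes dominate."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall: every emotion class counts equally, regardless of its size."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Toy labels: "neu" is the majority class and is always predicted correctly.
y_true = ["neu"] * 6 + ["ang"] * 2 + ["sad"] * 2
y_pred = ["neu"] * 6 + ["neu", "ang"] + ["neu", "sad"]
wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.2%}, UA = {ua:.2%}")  # prints "WA = 80.00%, UA = 66.67%"
```

When errors concentrate in minority classes, WA exceeds UA. This is consistent with the gap between the two reported figures on a class-imbalanced dataset.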
Abstract
Speech emotion recognition (SER) is an essential part of human-computer interaction. In this paper, we propose an SER network based on a Graph Isomorphism Network with Weighted Multiple Aggregators (WMA-GIN), which can effectively handle the problem of information confusion that arises when neighbour nodes' features are aggregated together in the GIN structure. Moreover, a Full-Adjacent (FA) layer is adopted to alleviate the over-squashing problem, which exists in all Graph Neural Network (GNN) structures, including GIN. Furthermore, a multi-phase attention mechanism and a multi-loss training strategy are employed to avoid losing useful emotional information across the stacked WMA-GIN layers. We evaluated the performance of our proposed WMA-GIN on the popular IEMOCAP dataset. The experimental results show that WMA-GIN outperforms other GNN-based methods and is comparable to some advanced…
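The abstract names the FA layer, multi-phase attention, and multi-loss training without defining them. The sketch below is purely illustrative and rests on three assumptions. First, the FA layer is a GIN-style step over a complete graph, where every node is adjacent to every other node, so distant nodes exchange information directly instead of through long paths. Second, "multi-phase attention" is a softmax attention readout applied to each layer's (phase's) output. Third, the multi-loss strategy sums one cross-entropy loss per phase. All function names, the query vector, the classifier weights, and the toy features are hypothetical, and the per-layer MLPs are omitted for brevity.

```python
import math

def fully_adjacent_neighbors(num_nodes):
    """FA layer graph: every node is adjacent to every other node (no self-loops)."""
    return [[u for u in range(num_nodes) if u != v] for v in range(num_nodes)]

def gin_sum_step(features, neighbors, eps=0.0):
    """GIN-style update without the MLP: (1 + eps) * h_v + sum of neighbour features."""
    out = []
    for v, h_v in enumerate(features):
        agg = [0.0] * len(h_v)
        for u in neighbors[v]:
            agg = [a + x for a, x in zip(agg, features[u])]
        out.append([(1.0 + eps) * h + a for h, a in zip(h_v, agg)])
    return out

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, query):
    """Graph readout: softmax over node scores (dot product with a query), then a weighted sum."""
    alphas = softmax([sum(q * x for q, x in zip(query, h)) for h in features])
    dim = len(features[0])
    return [sum(a * h[i] for a, h in zip(alphas, features)) for i in range(dim)]

def cross_entropy(logits, target):
    """Negative log of the softmax probability assigned to the target class."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]

def linear(weights, x):
    """Matrix-vector product: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

# Toy 4-node chain graph (e.g. 4 speech segments) with 2-dim features; values are made up.
features = [[0.1, 0.0], [0.0, 0.2], [0.3, 0.1], [0.0, 0.4]]
chain = [[1], [0, 2], [1, 3], [2]]

h1 = gin_sum_step(features, chain)                          # phase 1: local neighbourhood
h2 = gin_sum_step(h1, fully_adjacent_neighbors(len(h1)))   # phase 2: FA layer (complete graph)

query = [1.0, -1.0]                                         # hypothetical attention query
classifier = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]          # hypothetical 3-class head
target = 1

# Multi-loss (assumed form): one attention readout and one loss per phase, summed.
phase_losses = [cross_entropy(linear(classifier, attention_pool(h, query)), target)
                for h in (h1, h2)]
total_loss = sum(phase_losses)
```

With eps = 0 and no MLP, every node's FA-layer output equals the sum of all node features from the previous layer. This makes the global mixing of an FA layer easy to see. A trained model would apply learned MLPs so that nodes remain distinguishable.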
Taxonomy
Methods: Graph Neural Network · Graph Isomorphism Network
