Emotional Speaker Identification using a Novel Capsule Nets Model
Ali Bou Nassif, Ismail Shahin, Ashraf Elnagar, Divya Velayudhan, Adi Alhudhaif, Kemal Polat

TL;DR
This paper introduces a novel CapsNet-based model for emotional speaker identification, demonstrating faster training and improved accuracy over existing methods across multiple speech databases.
Contribution
The study proposes a new CapsNet architecture tailored for emotional speaker recognition, addressing CNN limitations in capturing spatial feature relationships.
Findings
The CapsNet model trains faster than the baseline models
The CapsNet model achieves higher accuracy in emotional speaker identification
The number of routing-algorithm iterations significantly affects performance
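To make the role of routing iterations concrete, here is a minimal sketch of generic routing-by-agreement as commonly described for capsule networks. It is not taken from the paper and may differ from the authors' variant; the function names (`squash`, `softmax`, `dynamic_routing`), the `num_iters` parameter, and the nested-list layout of `u_hat` are assumptions made for illustration only.

```python
import math

def squash(s, eps=1e-8):
    """Shrink vector s so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in s]

def softmax(row):
    """Numerically stable softmax over a list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(u_hat, num_iters=3):
    """Generic routing-by-agreement (illustrative, not the paper's exact method).

    u_hat[i][j] is the prediction vector that input capsule i makes for
    output capsule j. Returns the output capsule vectors v and the
    coupling coefficients c used to compute them.
    """
    if num_iters < 1:
        raise ValueError("num_iters must be at least 1")
    num_in, num_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    # Routing logits start at zero, so the first pass couples uniformly.
    b = [[0.0] * num_out for _ in range(num_in)]
    for _ in range(num_iters):
        # Each input capsule distributes its vote across the output capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(num_out):
            # Weighted sum of the predictions for output capsule j, then squash.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(num_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the output (dot product).
        for i in range(num_in):
            for j in range(num_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c

# Two input capsules, two output capsules, 2-D vectors (made-up numbers).
u_hat = [[[1.0, 0.0], [0.0, 0.1]],
         [[1.0, 0.0], [0.0, 0.1]]]
v, c = dynamic_routing(u_hat, num_iters=3)
```

With a single iteration the coupling coefficients stay uniform; additional iterations shift weight toward output capsules whose vectors agree with the predictions, which is one plausible reason the iteration count can matter.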
Abstract
Speaker recognition systems are widely used in various applications to identify a person by their voice; however, the high degree of variability in speech signals makes this a challenging task. Dealing with emotional variations is very difficult because emotions alter the voice characteristics of a person; thus, the acoustic features differ from those used to train models in a neutral environment. Therefore, speaker recognition models trained on neutral speech fail to correctly identify speakers under emotional stress. Although considerable advancements in speaker identification have been made using convolutional neural networks (CNNs), CNNs cannot exploit the spatial association between low-level features. Inspired by the recent introduction of capsule networks (CapsNets), which are based on deep learning to overcome the inadequacy of CNNs in preserving the pose relationship between…
