Novel Dual-Channel Long Short-Term Memory Compressed Capsule Networks for Emotion Recognition
Ismail Shahin, Noor Hindawi, Ali Bou Nassif, Adi Alhudhaif, Kemal Polat

TL;DR
This paper introduces a novel dual-channel LSTM compressed capsule network architecture for speech emotion recognition, demonstrating improved accuracy and efficiency across multiple datasets compared to existing methods.
Contribution
The study presents a new dual-channel LSTM compressed CapsNet model that improves speech emotion recognition accuracy and reduces training time, with MFCC delta-delta features giving the best feature extraction results.
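As an illustration of the MFCC delta-delta features mentioned above, the following is a minimal sketch of the standard regression-based delta computation, applied twice to obtain delta-delta coefficients. It is not the authors' pipeline: the function names `delta` and `stack_with_deltas`, the window half-width of 2, and the edge padding are assumptions, and the input is assumed to be an already-computed MFCC matrix of shape (frames, coefficients).

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based delta along the time axis (axis 0).

    d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2)
    The window half-width is an assumed value, not taken from the paper.
    """
    num_frames = features.shape[0]
    # Repeat the first and last frames so every frame has a full window.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_with_deltas(mfcc: np.ndarray, width: int = 2) -> np.ndarray:
    """Concatenate static, delta, and delta-delta coefficients per frame."""
    d = delta(mfcc, width)
    dd = delta(d, width)  # delta of the delta = delta-delta
    return np.concatenate([mfcc, d, dd], axis=1)
```

For an MFCC matrix with T frames and C coefficients, `stack_with_deltas` returns an array of shape (T, 3C). Whether the paper feeds the stacked features or only the delta-delta coefficients to the network is not stated in this summary.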
Findings
Achieved 89.3% accuracy on the Arabic Emirati-accented corpus.
Outperformed state-of-the-art systems, a CNN, and the original CapsNet.
Reduced training and testing time significantly.
Abstract
Recent research on speech emotion recognition (SER) has made considerable advances with the use of MFCC spectrogram features and neural network approaches such as convolutional neural networks (CNNs). Capsule networks (CapsNet) have gained recognition as alternatives to CNNs because of their larger capacity for hierarchical representation. To address these issues, this research introduces a novel text-independent and speaker-independent SER architecture, in which a dual-channel long short-term memory compressed-CapsNet (DC-LSTM COMP-CapsNet) algorithm is proposed based on the structural features of CapsNet. The proposed classifier provides energy efficiency and an adequate compression method for speech emotion recognition, neither of which is delivered by the original CapsNet structure. Moreover, the grid search approach is used to attain optimal…
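For readers unfamiliar with capsule networks, the sketch below shows the squash nonlinearity used in the original CapsNet formulation, which rescales each capsule's output vector to a length between 0 and 1 while preserving its direction. It illustrates the baseline CapsNet only; the compressed variant proposed in this paper is not described in enough detail here to reproduce, and the function name `squash` and the `eps` stabiliser are assumptions of this sketch.

```python
import numpy as np


def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Original CapsNet squash: v = (|s|^2 / (1 + |s|^2)) * (s / |s|).

    eps is an assumed numerical safeguard against division by zero.
    """
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)
```

Short input vectors are shrunk towards zero length and long ones approach unit length, so the output length can be read as the probability that the entity represented by the capsule is present.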
Taxonomy
Methods: Capsule Network
