Persian Speech Emotion Recognition by Fine-Tuning Transformers
Minoo Shayaninasab, Bagher Babaali

TL;DR
This paper fine-tunes pretrained transformer models for Persian speech emotion recognition, raising accuracy on the shEMO dataset from 65% to 80%, and to 82% when fine-tuning on both English and Persian data.
Contribution
It presents two fine-tuned transformer models for Persian speech emotion recognition, one taking spectrograms as input and the other the raw audio, and shows that both improve accuracy over previous systems.
Findings
Accuracy increased from 65% to 80% on the shEMO dataset.
Multilingual fine-tuning with English and Persian data further improved accuracy to 82%.
Models based on spectrograms and raw audio both showed significant performance gains.
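To make the distinction between the two input representations concrete, the following is a minimal sketch of what a raw-audio model and a spectrogram model would each receive for the same clip. It is not the authors' preprocessing, which this summary does not describe: the function names, the frame length, the hop size, and the toy 1 kHz tone are all illustrative assumptions, and a naive DFT stands in for whatever feature extractor the paper actually uses.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a waveform into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrogram(samples, frame_len=64, hop=32):
    """Hann-windowed DFT magnitudes: one row per frame, frame_len // 2 + 1 bins."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    spec = []
    for frame in frame_signal(samples, frame_len, hop):
        windowed = [s * w for s, w in zip(frame, window)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of one frequency bin (fine for a toy example, slow for real audio).
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(windowed))
            row.append(abs(acc))
        spec.append(row)
    return spec


# Toy clip: a 1 kHz tone at 16 kHz (hypothetical values, not taken from the paper).
sample_rate = 16000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

raw_input = tone                          # 1-D sequence fed to a raw-audio model
spec_input = magnitude_spectrogram(tone)  # 2-D time-frequency grid fed to a spectrogram model

# Strongest frequency bin in the first frame; each bin spans sample_rate / 64 = 250 Hz.
peak_bin = max(range(len(spec_input[0])), key=lambda k: spec_input[0][k])
print(len(raw_input), len(spec_input), len(spec_input[0]), peak_bin)
```

The raw-audio path leaves feature learning entirely to the model, while the spectrogram path hands it a time-frequency image. In practice a library routine (for example a mel spectrogram) would replace the naive DFT used here.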
Abstract
Given the significance of speech emotion recognition, numerous methods have been developed in recent years to create effective and efficient systems in this domain. One of these methods involves the use of pretrained transformers, fine-tuned to address this specific problem, resulting in high accuracy. Despite extensive discussions and global-scale efforts to enhance these systems, the application of this innovative and effective approach has received less attention in the context of Persian speech emotion recognition. In this article, we review the field of speech emotion recognition and its background, with an emphasis on the importance of employing transformers in this context. We present two models, one based on spectrograms and the other on the audio itself, fine-tuned using the shEMO dataset. These models significantly enhance the accuracy of previous systems, increasing it from…
Taxonomy
Topics: Speech and Audio Processing · Emotion and Mood Recognition · Speech Recognition and Synthesis
