Transfer Learning based Speech Affect Recognition in Urdu
Sara Durrani, Muhammad Umair Arshad

TL;DR
This paper presents a transfer learning approach using deep residual networks for speech affect recognition in Urdu, addressing data scarcity and reporting strong unweighted average recall (UAR) across several datasets.
Contribution
It introduces a transfer learning method that uses pre-trained models for affect recognition in low-resource languages, demonstrating significant improvements over existing algorithms.
Findings
Achieved 74.7% UAR with RAVDESS as the source dataset and the Urdu dataset as the target.
Effective transfer learning reduces data requirements for Urdu affect recognition.
Pre-trained models significantly enhance feature extraction and accuracy.
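UAR (unweighted average recall) is the mean of the per-class recalls, UAR = (1/K) Σ_k TP_k / N_k over the K classes, so every emotion class counts equally no matter how many utterances it has. This matters for emotion corpora whose classes are imbalanced, where plain accuracy can be dominated by the majority class. Below is a minimal Python sketch of that standard definition; the function name and the toy labels are hypothetical and are not taken from the paper's evaluation code.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; each true class is weighted equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = defaultdict(int)  # number of samples per true class
    hits = defaultdict(int)    # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: "angry" is over-represented.
labels_true = ["angry", "angry", "angry", "happy", "sad", "sad"]
labels_pred = ["angry", "angry", "happy", "happy", "sad", "angry"]
uar = unweighted_average_recall(labels_true, labels_pred)
accuracy = sum(t == p for t, p in zip(labels_true, labels_pred)) / len(labels_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")  # prints: UAR = 0.7222, accuracy = 0.6667
```

Here the per-class recalls are 2/3 (angry), 1/1 (happy), and 1/2 (sad), so UAR is 13/18 while accuracy is 4/6: the single correct "happy" prediction weighs as much as the whole "angry" class.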
Abstract
It has been established that Speech Affect Recognition for low-resource languages is a difficult task. Here we present a transfer-learning-based Speech Affect Recognition approach in which we pre-train a Deep Residual Network on an affect recognition task in a high-resource language and fine-tune its parameters for a low-resource language. We use four standard data sets to demonstrate that transfer learning can solve the problem of data scarcity in the affect recognition task. We demonstrate that our approach is effective by achieving 74.7 percent UAR with RAVDESS as the source and the Urdu data set as the target. Through an ablation study, we identified that the pre-trained model contributes most of the feature information, accounts for most of the improvement in results, and compensates for the limited target data. Using this knowledge, we also experimented on the SAVEE and EMO-DB data sets with Urdu as the target language, where only…
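The abstract describes pre-training a Deep Residual Network on a high-resource source language and fine-tuning it on the low-resource target. The pure-Python sketch below illustrates only the mechanics of that idea under strong simplifying assumptions: a single residual block y = x + ReLU(Wx + b) with hypothetical fixed weights stands in for the source-pre-trained backbone, the backbone is kept frozen, and only a new softmax classification head sized for the target label set is trained by full-batch gradient descent. Freezing the backbone is one common transfer strategy and is an assumption here; the paper's actual architecture, input features, and which layers it fine-tunes are not given in this summary. All function names, weights, and data are hypothetical.

```python
import math
import random


def residual_block(x, W, b):
    """y = x + ReLU(W x + b): identity shortcut plus a learned residual (W is square)."""
    out = []
    for i, row in enumerate(W):
        z = sum(w * xj for w, xj in zip(row, x)) + b[i]
        out.append(x[i] + max(0.0, z))
    return out


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def head_logits(h, V, c):
    return [sum(v * hj for v, hj in zip(row, h)) + c[k] for k, row in enumerate(V)]


def cross_entropy(feats, labels, V, c):
    return -sum(math.log(softmax(head_logits(h, V, c))[y])
                for h, y in zip(feats, labels)) / len(feats)


def fine_tune_head(backbone, target_x, target_y, n_classes, lr=0.1, epochs=50, seed=0):
    """Hypothetical transfer step: frozen backbone, new head trained on target data only."""
    W, b = backbone
    # Assumption: the backbone is frozen, so target features are computed once.
    feats = [residual_block(x, W, b) for x in target_x]
    dim, n = len(feats[0]), len(feats)
    rng = random.Random(seed)
    # New head sized for the target label set, small random init.
    V = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(n_classes)]
    c = [0.0] * n_classes
    losses = [cross_entropy(feats, target_y, V, c)]
    for _ in range(epochs):
        gV = [[0.0] * dim for _ in range(n_classes)]
        gc = [0.0] * n_classes
        for h, y in zip(feats, target_y):
            p = softmax(head_logits(h, V, c))
            for k in range(n_classes):
                d = p[k] - (1.0 if k == y else 0.0)  # dLoss/dlogit_k
                gc[k] += d
                for j in range(dim):
                    gV[k][j] += d * h[j]
        # Only head parameters are updated; W and b are never touched.
        for k in range(n_classes):
            c[k] -= lr * gc[k] / n
            for j in range(dim):
                V[k][j] -= lr * gV[k][j] / n
        losses.append(cross_entropy(feats, target_y, V, c))
    return V, c, losses


# Hypothetical stand-in for backbone weights pre-trained on the source corpus.
W_src = [[0.5, -0.2], [0.1, 0.3]]
b_src = [0.0, 0.1]
# Synthetic 2-D "target-language" features with two emotion classes.
target_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
target_y = [0, 0, 1, 1]
V, c, losses = fine_tune_head((W_src, b_src), target_x, target_y, n_classes=2)
print(f"loss before: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```

Because only the head is trained, the target data needs to fit far fewer parameters than a full network, which is the intuition behind using a source-pre-trained model when target data is scarce. A real setup would typically use a deep residual network in a framework such as PyTorch and might unfreeze some or all backbone layers during fine-tuning.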
Taxonomy
Topics: Speech and Audio Processing · Emotion and Mood Recognition · Speech Recognition and Synthesis
