Transfer Learning based Speech Affect Recognition in Urdu

Sara Durrani; Muhammad Umair Arshad

arXiv:2103.03580 · cs.CL · March 8, 2021 · 6 cites

TL;DR

This paper presents a transfer learning approach using deep residual networks for speech affect recognition in Urdu. It addresses data scarcity and reaches 74.7% UAR with RAVDESS as the source data set and Urdu as the target.

Contribution

It introduces a transfer learning method with pre-trained models for low-resource language affect recognition, demonstrating significant improvements over existing algorithms.

Findings

01

Achieved 74.7% UAR with RAVDESS as the source data set and the Urdu data set as the target.

02

Effective transfer learning reduces data requirements for Urdu affect recognition.

03

Pre-trained models significantly enhance feature extraction and accuracy.

Abstract

Speech Affect Recognition for low-resource languages is known to be a difficult task. We present a transfer-learning-based approach to Speech Affect Recognition in which we pre-train a model on an affect recognition task in a high-resource language and fine-tune its parameters for a low-resource language, using a Deep Residual Network. We use four standard data sets to demonstrate that transfer learning can address data scarcity in the affect recognition task. Our approach achieves 74.7 percent UAR (unweighted average recall) with RAVDESS as the source and the Urdu data set as the target. An ablation study shows that the pre-trained model contributes most of the feature information, accounts for the improvement in results, and mitigates the shortage of data. Using this knowledge, we have also experimented on the SAVEE and EMO-DB data sets with Urdu as the target language, where only…
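The headline result is reported as UAR (unweighted average recall), which averages recall over classes so that each emotion counts equally regardless of how many utterances it has. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name and the emotion labels in the usage example are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled example is required")

    total = defaultdict(int)    # number of true examples per class
    correct = defaultdict(int)  # number of correctly predicted examples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall is computed only for classes that occur in y_true.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Illustrative, made-up labels: 6 "angry" utterances and 2 "happy" ones.
y_true = ["angry"] * 6 + ["happy"] * 2
y_pred = ["angry"] * 6 + ["happy", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR={uar:.3f} accuracy={accuracy:.3f}")
```

In this toy example the per-class recalls are 1.0 and 0.5, so UAR is 0.75 while plain accuracy is 0.875. The gap illustrates why UAR is commonly preferred when emotion classes are imbalanced.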

Taxonomy

Topics: Speech and Audio Processing · Emotion and Mood Recognition · Speech Recognition and Synthesis