SPARTA: Speaker Profiling for ARabic TAlk
Wael Farhan, Muhy Eddin Za'ter, Qusai Abu Obaidah, Hisham al Bataineh, Zyad Sober, Hussein T. Al-Natsheh

TL;DR
This paper introduces SPARTA, a multi-task learning framework for Arabic speech analysis that estimates gender, emotion, and dialect, demonstrating improved accuracy over single-task models across multiple datasets.
Contribution
It presents a novel multi-task learning approach for Arabic speaker trait classification, utilizing various neural networks and features, with publicly available datasets and models.
Findings
Multi-task learning (MTL) outperforms single-task learning (STL) in accuracy
Raw features (MFCC and MEL) are effective with LSTM and CNN networks
Pre-trained vectors enhance classification performance
Abstract
This paper proposes a novel approach to the automatic estimation of three speaker traits from Arabic speech: gender, emotion, and dialect. Multi-task learning (MTL), which has shown promising results on different text classification tasks, is applied in this paper to Arabic speech classification tasks. The dataset was assembled from six publicly available datasets. First, the datasets were edited and thoroughly divided into train, development, and test sets (open to the public), and a benchmark was set for each task and dataset throughout the paper. Then, three different networks were explored: Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Fully-Connected Neural Network (FCNN), on five different types of features: two raw features (MFCC and MEL) and three pre-trained vectors (i-vectors, d-vectors, and x-vectors). LSTM and CNN networks were implemented…
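The multi-task idea in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: it assumes the simplest form of MTL (one shared layer feeding three task-specific softmax heads, trained on the sum of the per-task cross-entropy losses). The class counts, layer sizes, and every function name below are hypothetical, and a fixed 13-dimensional vector stands in for pooled MFCC features.

```python
import math
import random

# Hypothetical label-set sizes; the excerpt above does not state them.
TASK_CLASSES = {"gender": 2, "emotion": 4, "dialect": 5}


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_model(feature_dim, hidden_dim, seed=0):
    """One shared layer plus one output head per task."""
    rng = random.Random(seed)
    return {
        "shared": make_matrix(hidden_dim, feature_dim, rng),
        "heads": {
            task: make_matrix(n_classes, hidden_dim, rng)
            for task, n_classes in TASK_CLASSES.items()
        },
    }


def forward(model, features):
    """Compute the shared representation once, then apply every head to it."""
    hidden = [math.tanh(h) for h in matvec(model["shared"], features)]
    return {
        task: softmax(matvec(weights, hidden))
        for task, weights in model["heads"].items()
    }


def joint_loss(probabilities, labels):
    """Unweighted sum of per-task cross-entropy losses (an assumption)."""
    return sum(-math.log(probabilities[task][labels[task]]) for task in labels)


model = init_model(feature_dim=13, hidden_dim=8)
utterance = [0.5] * 13  # stand-in for a pooled 13-dimensional MFCC vector
probs = forward(model, utterance)
loss = joint_loss(probs, {"gender": 1, "emotion": 2, "dialect": 0})
```

Because all three heads read the same hidden vector, the gradient of the joint loss would update the shared layer with signal from every task, which is the usual motivation for comparing MTL against single-task baselines. A single-task model in this sketch would simply keep one entry in `TASK_CLASSES`.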
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
