Multitask Learning from Augmented Auxiliary Data for Improving Speech Emotion Recognition
Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, and Björn W. Schuller

TL;DR
This paper introduces MTL-AUG, a multitask learning framework that leverages augmented data and auxiliary tasks without meta labels to improve speech emotion recognition across various challenging conditions.
Contribution
The proposed MTL-AUG framework enables training SER models with augmented data and auxiliary tasks without meta labels, enhancing generalisation and semi-supervised learning capabilities.
Findings
Improved SER performance across multiple datasets.
Effective in noisy and adversarial conditions.
Outperforms existing state-of-the-art methods.
Abstract
Despite the recent progress in speech emotion recognition (SER), state-of-the-art systems lack generalisation across different conditions. A key underlying reason for poor generalisation is the scarcity of emotion datasets, which is a significant roadblock to designing robust machine learning (ML) models. Recent works in SER focus on utilising multitask learning (MTL) methods to improve generalisation by learning shared representations. However, most of these studies propose MTL solutions with the requirement of meta labels for auxiliary tasks, which limits the training of SER systems. This paper proposes an MTL framework (MTL-AUG) that learns generalised representations from augmented data. We utilise augmentation-type classification and unsupervised reconstruction as auxiliary tasks, which allow training SER systems on augmented data without requiring any meta labels for auxiliary…
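As a rough illustration of the idea in the abstract, the sketch below shows why the two auxiliary tasks need no meta labels: the augmentation-type label is simply the index of the augmentation that was applied, and the reconstruction target is the input itself. This is a minimal sketch in plain Python, not the authors' implementation. The augmentation names, the loss weights `alpha` and `beta`, the weighted-sum form of the loss, and all function names are assumptions made for illustration; the model that would produce the logits and the reconstruction is left out.

```python
import math
import random

# Hypothetical augmentation set; the source does not list the augmentations used.
AUGMENTATIONS = ["none", "add_noise", "scale_gain"]


def augment(signal, kind, rng):
    """Apply a toy augmentation to a list of samples and return a new list."""
    if kind == "add_noise":
        return [x + rng.gauss(0.0, 0.01) for x in signal]
    if kind == "scale_gain":
        return [0.5 * x for x in signal]
    return list(signal)


def make_example(signal, rng):
    """Augment a signal; the augmentation index is an auxiliary label obtained for free."""
    aug_id = rng.randrange(len(AUGMENTATIONS))
    return augment(signal, AUGMENTATIONS[aug_id], rng), aug_id


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, using the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def mse(pred, target):
    """Mean squared error between a reconstruction and its target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def multitask_loss(emotion_logits, emotion_label, aug_logits, aug_label,
                   recon, inputs, alpha=0.5, beta=0.5):
    """Emotion loss plus weighted auxiliary losses (assumed weighted-sum form).

    emotion_label may be None for unlabelled audio, in which case only the
    auxiliary terms contribute. alpha and beta are assumed weights.
    """
    aux = alpha * cross_entropy(aug_logits, aug_label) + beta * mse(recon, inputs)
    if emotion_label is None:
        return aux
    return cross_entropy(emotion_logits, emotion_label) + aux
```

In this sketch an unlabelled utterance still yields a training signal through the two auxiliary terms, which is one plausible reading of how such a framework could support semi-supervised training.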
Taxonomy
Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining
