Multitask Learning from Augmented Auxiliary Data for Improving Speech Emotion Recognition

Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, and Björn W. Schuller

arXiv:2207.05298·cs.SD·July 13, 2022

Open Access

TL;DR

This paper introduces MTL-AUG, a multitask learning framework that leverages augmented data and auxiliary tasks without meta labels to improve speech emotion recognition across various challenging conditions.

Contribution

The proposed MTL-AUG framework enables training SER models with augmented data and auxiliary tasks without meta labels, enhancing generalisation and semi-supervised learning capabilities.

Findings

01 Improved SER performance across multiple datasets.

02 Effective in noisy and adversarial conditions.

03 Outperforms existing state-of-the-art methods.

Abstract

Despite the recent progress in speech emotion recognition (SER), state-of-the-art systems lack generalisation across different conditions. A key underlying reason for poor generalisation is the scarcity of emotion datasets, which is a significant roadblock to designing robust machine learning (ML) models. Recent works in SER focus on utilising multitask learning (MTL) methods to improve generalisation by learning shared representations. However, most of these studies propose MTL solutions with the requirement of meta labels for auxiliary tasks, which limits the training of SER systems. This paper proposes an MTL framework (MTL-AUG) that learns generalised representations from augmented data. We utilise augmentation-type classification and unsupervised reconstruction as auxiliary tasks, which allow training SER systems on augmented data without requiring any meta labels for auxiliary…
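The central idea in the abstract, that both auxiliary tasks obtain their targets from the data itself, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the augmentation set (`identity`, `add_noise`, `change_gain`), the weights `alpha` and `beta`, and the choice to reconstruct the model's own input are not taken from the paper, whose actual augmentations, architecture, and loss weighting are not given in this excerpt.

```python
import random

# Hypothetical augmentations; the paper's actual set is not stated in this excerpt.
def identity(wave, rng):
    return list(wave)

def add_noise(wave, rng, scale=0.05):
    # Add small Gaussian noise to every sample.
    return [x + rng.gauss(0.0, scale) for x in wave]

def change_gain(wave, rng, low=0.5, high=1.5):
    # Scale the whole waveform by one random gain factor.
    g = rng.uniform(low, high)
    return [g * x for x in wave]

AUGMENTATIONS = [("none", identity), ("noise", add_noise), ("gain", change_gain)]

def augment_with_labels(waves, rng):
    """Return (augmented_wave, aug_type_id, recon_target) triples.

    The augmentation-type label is simply the index of the transform that was
    applied, and the reconstruction target is the model input itself (an
    assumption), so neither auxiliary task needs human annotation.
    """
    examples = []
    for wave in waves:
        for aug_id, (_name, fn) in enumerate(AUGMENTATIONS):
            augmented = fn(wave, rng)
            examples.append((augmented, aug_id, list(augmented)))
    return examples

def multitask_loss(emotion_loss, aug_type_loss, recon_loss, alpha=0.5, beta=0.5):
    """Weighted sum of the primary and auxiliary losses.

    emotion_loss may be None for an utterance without an emotion label, in
    which case only the auxiliary terms contribute (a simple semi-supervised
    setup; alpha and beta are illustrative values).
    """
    primary = 0.0 if emotion_loss is None else emotion_loss
    return primary + alpha * aug_type_loss + beta * recon_loss
```

In a setup like this, unlabelled speech can still drive the shared representation through the two auxiliary terms, while labelled utterances add the emotion term on top.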

Taxonomy

Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining