Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition

Dongyuan Li; Ying Zhang; Yusong Wang; Funakoshi Kataro; Manabu Okumura

arXiv:2405.00307 · cs.SD · May 2, 2024

Open Access

TL;DR

This paper introduces AFTER, an active learning framework combined with task adaptation pre-training to improve speech emotion recognition accuracy and efficiency, especially with limited labeled data.

Contribution

The paper integrates task adaptation pre-training with active learning for SER, reducing labeled-data and fine-tuning time requirements while improving performance.

Findings

01. Improves accuracy by 8.45% while using only 20% of the data

02. Reduces fine-tuning time by 79%

03. Remains effective across various real-world scenarios

Abstract

Speech emotion recognition (SER) has garnered increasing attention due to its wide range of applications in various fields, including human-machine interaction, virtual assistants, and mental health assistance. However, existing SER methods often overlook the information gap between the pre-training speech recognition task and the downstream SER task, resulting in sub-optimal performance. Moreover, current methods require much time for fine-tuning on each specific speech dataset, such as IEMOCAP, which limits their effectiveness in real-world scenarios with large-scale noisy data. To address these issues, we propose an active learning (AL)-based fine-tuning framework for SER, called AFTER, that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the…
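The abstract is cut off before it describes which AL strategy the framework uses, so the following is only a generic illustration of the pool-based active learning idea it mentions: score unlabeled utterances by how uncertain the current model is, then send a small budget of them for labeling. Entropy-based uncertainty sampling is a common textbook choice and is assumed here; it is not taken from the paper. The utterance ids, the class probabilities, and the function names `entropy` and `select_most_uncertain` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(
    pool_probs: Dict[str, Sequence[float]], budget: int
) -> List[str]:
    """Return ids of the `budget` unlabeled samples with the highest entropy."""
    ranked = sorted(pool_probs, key=lambda k: entropy(pool_probs[k]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs over four emotion classes for an unlabeled pool.
pool = {
    "utt_a": [0.97, 0.01, 0.01, 0.01],
    "utt_b": [0.25, 0.25, 0.25, 0.25],
    "utt_c": [0.60, 0.30, 0.05, 0.05],
    "utt_d": [0.40, 0.40, 0.10, 0.10],
    "utt_e": [0.90, 0.05, 0.03, 0.02],
}

# Assumed labeling budget of 20% of the pool (at least one sample).
budget = max(1, len(pool) // 5)
to_label = select_most_uncertain(pool, budget)
```

In a full loop, the selected utterances would be labeled, added to the training set, and the model fine-tuned again before the next selection round.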

Taxonomy

Topics: Context-Aware Activity Recognition Systems · Speech and dialogue systems · Fuzzy Logic and Control Systems