Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition
Dongyuan Li, Ying Zhang, Yusong Wang, Funakoshi Kataro, and Manabu Okumura

TL;DR
This paper introduces AFTER, an active learning framework combined with task adaptation pre-training that improves speech emotion recognition accuracy and efficiency, especially when labeled data is limited.
Contribution
It proposes combining task adaptation pre-training (TAPT) with active learning (AL) for SER, reducing the amount of training data and fine-tuning time required while improving recognition accuracy.
Findings
Achieves an 8.45% accuracy improvement using only 20% of the training samples
Reduces fine-tuning time by 79%
Effective across various real-world scenarios
Abstract
Speech emotion recognition (SER) has garnered increasing attention due to its wide range of applications in various fields, including human-machine interaction, virtual assistants, and mental health assistance. However, existing SER methods often overlook the information gap between the pre-training speech recognition task and the downstream SER task, resulting in sub-optimal performance. Moreover, current methods require considerable time for fine-tuning on each specific speech dataset, such as IEMOCAP, which limits their effectiveness in real-world scenarios with large-scale noisy data. To address these issues, we propose an active learning (AL)-based fine-tuning framework for SER, called AFTER, that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the…
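The AL half of the framework decides which samples are worth labeling and fine-tuning on. As a rough illustration only, the sketch below shows a generic pool-based active learning loop that uses entropy-based uncertainty sampling under a 20% labeling budget (mirroring the 20% figure reported in the findings). The entropy criterion, the toy nearest-centroid "model" standing in for a fine-tuned SER network, and every function name here (`active_learning`, `select_most_uncertain`, `make_centroid_trainer`) are hypothetical assumptions; this is not AFTER's actual selection strategy, which the truncated abstract does not specify, and the TAPT stage is omitted entirely.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical alias: a trained "model" maps one input to class probabilities
# (in real SER these would be emotion classes; here it is a toy 1-D problem).
Predictor = Callable[[float], List[float]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(unlabeled: List[int], inputs: Sequence[float],
                          predict: Predictor, k: int) -> List[int]:
    """Assumed criterion: pick the k unlabeled indices with the highest entropy."""
    ranked = sorted(unlabeled, key=lambda i: entropy(predict(inputs[i])), reverse=True)
    return ranked[:k]


def active_learning(inputs: Sequence[float], oracle: Callable[[int], int],
                    train: Callable[[Dict[int, int]], Predictor],
                    seed_indices: List[int], budget_fraction: float = 0.2,
                    rounds: int = 4) -> Dict[int, int]:
    """Generic pool-based loop: train, query uncertain samples, repeat until budget is spent."""
    budget = int(budget_fraction * len(inputs))
    labeled = {i: oracle(i) for i in seed_indices}
    per_round = max(1, (budget - len(labeled)) // rounds)
    while len(labeled) < budget:
        model = train(labeled)  # stand-in for fine-tuning on the labeled subset
        unlabeled = [i for i in range(len(inputs)) if i not in labeled]
        k = min(per_round, budget - len(labeled), len(unlabeled))
        if k == 0:
            break
        for i in select_most_uncertain(unlabeled, inputs, model, k):
            labeled[i] = oracle(i)  # simulated annotator provides the label
    return labeled


def make_centroid_trainer(inputs: Sequence[float]) -> Callable[[Dict[int, int]], Predictor]:
    """Toy model: one centroid per class, softmax over negative distance."""
    def train(labeled: Dict[int, int]) -> Predictor:
        sums: Dict[int, List[float]] = {}
        for i, y in labeled.items():
            acc = sums.setdefault(y, [0.0, 0.0])
            acc[0] += inputs[i]
            acc[1] += 1.0
        centroids = [sums[c][0] / sums[c][1] for c in sorted(sums)]

        def predict(x: float) -> List[float]:
            scores = [math.exp(-abs(x - c)) for c in centroids]
            total = sum(scores)
            return [s / total for s in scores]
        return predict
    return train


# Usage on synthetic data: 50 points, true boundary at x = 2.5.
X = [i / 10 for i in range(50)]


def oracle(i: int) -> int:
    return 0 if X[i] < 2.5 else 1


labeled = active_learning(X, oracle, make_centroid_trainer(X), seed_indices=[0, 49])
print(sorted(labeled))
```

Starting from one seed per class, the loop spends its 10-sample budget on points near the toy decision boundary at x = 2.5 rather than on easy, far-away points. That is the intuition behind using AL to cut fine-tuning cost; a real SER pipeline would replace the toy model with a pre-trained speech encoder and might use a different or combined informativeness/diversity criterion.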
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Speech and dialogue systems · Fuzzy Logic and Control Systems
