Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition

Dongyuan Li; Yusong Wang; Kotaro Funakoshi; Manabu Okumura

arXiv:2310.00283 · cs.SD · October 3, 2023

TL;DR

This paper introduces an active learning-based fine-tuning framework for speech emotion recognition that improves accuracy and reduces fine-tuning time: task adaptation pre-training narrows the information gap between pre-training and the downstream task, and active learning restricts fine-tuning to a selected subset of informative samples.

Contribution

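It combines task adaptation pre-training with active learning to improve SER performance and efficiency, addressing two limitations of existing methods: the information gap between the pre-training speech recognition task and the downstream SER task, and the long fine-tuning time required for each speech dataset.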
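The two-stage design described above can be outlined as plain control flow. What follows is a minimal sketch under stated assumptions, not the authors' implementation: TAPT, sample selection, and fine-tuning are passed in as callables; the function name `tapt_then_active_finetune`, its parameters, and the per-round schedule are hypothetical; and TAPT is read in its usual sense of continued self-supervised pre-training on unlabeled in-domain audio.

```python
from typing import Callable, List, Sequence, Set


def tapt_then_active_finetune(
    pool: Sequence[str],
    budget_fraction: float,
    per_round: int,
    tapt: Callable[[Sequence[str]], None],
    select: Callable[[Sequence[str], Set[int], int], List[int]],
    fine_tune: Callable[[List[str]], None],
) -> List[int]:
    """Hypothetical TAPT-then-AL pipeline; returns indices in selection order."""
    # Stage 1: adapt the pre-trained encoder to the target audio (no labels used).
    tapt(pool)

    # Stage 2: active learning until the labeling budget is spent.
    budget = round(budget_fraction * len(pool))
    labeled: Set[int] = set()
    order: List[int] = []
    while len(labeled) < budget:
        k = min(per_round, budget - len(labeled))
        # Drop duplicates and already-labeled indices the selector may return.
        proposed = dict.fromkeys(select(pool, labeled, k))
        chosen = [i for i in proposed if i not in labeled][:k]
        if not chosen:  # selector has nothing new to offer; stop early
            break
        labeled.update(chosen)
        order.extend(chosen)
        # Fine-tune on every labeled sample gathered so far (one design choice;
        # continuing from the previous round's weights is another).
        fine_tune([pool[i] for i in sorted(labeled)])
    return order
```

With a pool of 100 utterances, `budget_fraction=0.2`, and `per_round=8`, the loop runs three rounds (8, 8, then 4 new samples) and fine-tunes on 8, 16, and 20 samples respectively, so only 20 of the 100 utterances are ever labeled, matching the 20% sample budget reported for the framework.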

Findings

01 Using only 20% of the samples for fine-tuning improves accuracy by 8.45%.

02 Reduces fine-tuning time by 79%.

03 Remains effective in scenarios with large-scale noisy data.

Abstract

Speech emotion recognition (SER) has drawn increasing attention for its applications in human-machine interaction. However, existing SER methods ignore the information gap between the pre-training speech recognition task and the downstream SER task, leading to sub-optimal performance. Moreover, they require substantial time to fine-tune on each specific speech dataset, which limits their effectiveness in real-world scenarios with large-scale noisy data. To address these issues, we propose an active learning (AL) based fine-tuning framework for SER that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the pre-training and downstream tasks. Then, AL methods are used to iteratively select a subset of the most informative and diverse samples for fine-tuning, reducing time…
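The abstract does not spell out, at this point, which criteria define "informative" and "diverse". Below is a minimal sketch of one common hybrid query strategy, assuming predictive entropy as the informativeness measure and greedy k-center (farthest-point) selection over utterance embeddings as the diversity measure. The function names, the `shortlist` parameter, and the two-step ordering are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted emotion distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_uncertain_diverse(
    probs: Sequence[Sequence[float]],
    embeddings: Sequence[Sequence[float]],
    labeled: Sequence[int],
    k: int,
    shortlist: int,
) -> List[int]:
    """Hypothetical hybrid AL query: entropy shortlist, then greedy k-center.

    probs[i]      -- model's class probabilities for utterance i
    embeddings[i] -- a fixed-length feature vector for utterance i
    labeled       -- indices already selected in earlier rounds
    """
    labeled_set = set(labeled)
    unlabeled = [i for i in range(len(probs)) if i not in labeled_set]

    # Step 1 (informativeness): keep the `shortlist` most uncertain utterances.
    # sorted() is stable, so ties keep their original index order.
    candidates = sorted(unlabeled, key=lambda i: entropy(probs[i]), reverse=True)
    candidates = candidates[:shortlist]

    # Step 2 (diversity): greedily add the candidate farthest from every
    # utterance already chosen (labeled or picked earlier in this round).
    centers = list(labeled)
    chosen: List[int] = []

    def distance_to_centers(i: int) -> float:
        if not centers:
            return math.inf  # nothing chosen yet: every candidate ties
        return min(math.dist(embeddings[i], embeddings[c]) for c in centers)

    for _ in range(min(k, len(candidates))):
        remaining = [i for i in candidates if i not in chosen]
        best = max(remaining, key=distance_to_centers)  # first max wins ties
        chosen.append(best)
        centers.append(best)
    return chosen
```

For example, with four utterances where 0 and 1 are equally uncertain (`[0.5, 0.5]`) but nearly identical in embedding space, 2 is slightly less uncertain but far away, and 3 is confident, calling the function with `k=2` and `shortlist=3` returns `[0, 2]` rather than the pure-uncertainty choice `[0, 1]`. Filtering by uncertainty first keeps the diversity step from spending the budget on confident outliers.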

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques