A Cross-Lingual Meta-Learning Method Based on Domain Adaptation for Speech Emotion Recognition

David-Gabriel Ion, Răzvan-Alexandru Smădu, Dumitru-Clementin Cercel, Florin Pop, Mihaela-Claudia Cercel

arXiv:2410.04633 · cs.CL · October 8, 2024

TL;DR

This paper introduces a practical cross-lingual meta-learning approach with domain adaptation for speech emotion recognition. With limited training data, it reports 83.78% accuracy on a Greek dataset and 56.30% on a Romanian dataset, both in unseen languages.

Contribution

The paper proposes an improved meta-learning framework that combines a large pre-trained backbone with prototypical networks, along with a novel fine-tuning method aimed at better out-of-distribution performance.
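
To make the prototypical-network component concrete, here is a minimal sketch of the general technique, not the authors' implementation. It assumes embeddings have already been produced by some backbone. The function names `class_prototypes` and `classify_queries`, the squared Euclidean distance, and the NumPy setting are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def class_prototypes(support_emb, support_labels, n_classes):
    """Return one prototype per class: the mean of that class's support embeddings."""
    support_emb = np.asarray(support_emb, dtype=float)
    support_labels = np.asarray(support_labels)
    return np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in range(n_classes)]
    )

def classify_queries(query_emb, protos):
    """Assign each query to its nearest prototype.

    Uses squared Euclidean distance (an assumption for this sketch) and
    returns predicted class indices plus softmax probabilities over classes.
    """
    query_emb = np.asarray(query_emb, dtype=float)
    # Pairwise squared distances, shape (n_queries, n_classes).
    dists = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits = -dists
    # Subtract the row maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs
```

In a few-shot episode, the support set would hold a handful of labeled utterance embeddings per emotion, and the query set the utterances to classify. Because classification depends only on distances to class means, a new language or label set needs no new output layer.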

Findings

1. Achieved 83.78% accuracy on Greek speech emotion recognition.

2. Achieved 56.30% accuracy on Romanian speech emotion recognition.

3. Demonstrated effectiveness of the method on low-resource, cross-lingual datasets.

Abstract

Best-performing speech models are trained on large amounts of data in the language they are meant to work for. However, most languages have sparse data, making it challenging to train models. This shortage of data is even more pronounced in speech emotion recognition. Our work explores model performance under limited data, specifically for speech emotion recognition. Meta-learning specializes in improving few-shot learning. As a result, we employ meta-learning techniques on speech emotion recognition, accent recognition, and person identification tasks. To this end, we propose a series of improvements over the multistage meta-learning method. Unlike other works that focus on smaller models due to the high computational cost of meta-learning algorithms, we take a more practical approach. We incorporate a large pre-trained backbone and a prototypical network, making our methods more…

Taxonomy

Topics: Speech Recognition and Synthesis