A Cross-Lingual Meta-Learning Method Based on Domain Adaptation for Speech Emotion Recognition
David-Gabriel Ion, Răzvan-Alexandru Smădu, Dumitru-Clementin Cercel, Florin Pop, Mihaela-Claudia Cercel

TL;DR
This paper introduces a practical cross-lingual meta-learning approach with domain adaptation for speech emotion recognition, reporting 83.78% accuracy on Greek and 56.30% on Romanian datasets in languages unseen during training, using limited training data.
Contribution
It proposes an improved meta-learning framework using a large pre-trained backbone and prototypical networks, with a novel fine-tuning method for better out-of-distribution performance.
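As a rough illustration of the prototypical-network component named above, the following sketch shows the standard classification rule: each class prototype is the mean of its support embeddings, and a query is assigned by a softmax over negative squared Euclidean distances to the prototypes. It is not the authors' implementation. The function names, the 2-D toy embeddings, and the emotion labels are hypothetical, and the pre-trained backbone that would produce real embeddings is omitted.

```python
import math
from collections import defaultdict


def class_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype."""
    grouped = defaultdict(list)
    for vec, label in zip(support_embeddings, support_labels):
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return (predicted label, class probabilities) for one query embedding."""
    # Logits are negative distances: closer prototypes get higher scores.
    logits = {
        label: -squared_distance(query, proto)
        for label, proto in prototypes.items()
    }
    # Subtract the maximum logit for a numerically stable softmax.
    max_logit = max(logits.values())
    exps = {label: math.exp(v - max_logit) for label, v in logits.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    return max(probs, key=probs.get), probs


# Toy few-shot episode (assumed values): two support examples per class.
support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
labels = ["neutral", "neutral", "angry", "angry"]

prototypes = class_prototypes(support, labels)
prediction, probabilities = classify([0.9, 1.0], prototypes)
print(prediction, probabilities)
```

In a few-shot setting, only the handful of labeled support examples from the new language is needed to form the prototypes, which is why this kind of classifier suits low-resource targets.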
Findings
Achieved 83.78% accuracy on Greek speech emotion recognition.
Achieved 56.30% accuracy on Romanian speech emotion recognition.
Demonstrated effectiveness of the method on low-resource, cross-lingual datasets.
Abstract
Best-performing speech models are trained on large amounts of data in the language they are meant to work for. However, most languages have sparse data, making training models challenging. This shortage of data is even more prevalent in speech emotion recognition. Our work explores model performance with limited data, specifically for speech emotion recognition. Meta-learning specializes in improving few-shot learning. As a result, we employ meta-learning techniques on speech emotion recognition tasks, accent recognition, and person identification. To this end, we propose a series of improvements over the multistage meta-learning method. Unlike other works focusing on smaller models due to the high computational cost of meta-learning algorithms, we take a more practical approach. We incorporate a large pre-trained backbone and a prototypical network, making our methods more…
Taxonomy
Topics
Speech Recognition and Synthesis
