Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning

Liang-Yeh Shen; Shi-Xin Fang; Yi-Cheng Lin; Huang-Cheng Chou; Hung-yi Lee

arXiv:2505.16220 · eess.AS · May 23, 2025

TL;DR

Meta-PerSER is a meta-learning framework that personalizes speech emotion recognition (SER) by rapidly adapting to each listener's annotation style from only a few labeled examples, building on pre-trained self-supervised speech models.

Contribution

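It introduces a novel meta-learning approach for personalized SER based on Model-Agnostic Meta-Learning (MAML), extended with Combined-Set Meta-Training, Derivative Annealing, and per-layer per-step learning rates, and built on self-supervised speech representations.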
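"Per-layer per-step learning rates" is the term MAML++ uses for giving every parameter group its own inner-loop learning rate at every adaptation step. Assuming Meta-PerSER uses it in the same sense, the inner loop can be sketched as below. This is a hypothetical, minimal illustration rather than the authors' code: a softmax-regression head over fixed feature vectors stands in for an emotion classifier on self-supervised representations, and the names `adapt`, `loss_and_grads`, and the `lrs` table are invented for the example. In meta-training these rates would themselves be meta-learned in the outer loop, which is omitted here.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy model: a softmax-regression head over fixed feature vectors,
# standing in for an emotion classifier on top of frozen SSL representations.
# Each "layer" (parameter group) is stored as a list of rows.
Params = Dict[str, List[List[float]]]  # "W": C x D weights, "b": 1 x C bias row
Example = Tuple[List[float], int]      # (feature vector, class index)


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(p: Params, x: List[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(p["W"], p["b"][0])]


def loss_and_grads(p: Params, data: List[Example]) -> Tuple[float, Params]:
    """Mean cross-entropy over `data` and its analytic gradients w.r.t. W and b."""
    n, C, D = len(data), len(p["W"]), len(p["W"][0])
    gW = [[0.0] * D for _ in range(C)]
    gb = [[0.0] * C]
    total = 0.0
    for x, y in data:
        probs = softmax(logits(p, x))
        total -= math.log(probs[y])
        for c in range(C):
            d = probs[c] - (1.0 if c == y else 0.0)  # dLoss/dlogit_c
            gb[0][c] += d / n
            for j in range(D):
                gW[c][j] += d * x[j] / n
    return total / n, {"W": gW, "b": gb}


def adapt(p: Params, support: List[Example], lrs: Dict[str, List[float]]) -> Params:
    """Inner loop: at step k, parameter group `name` moves with its own rate lrs[name][k].

    In MAML++-style training these rates are meta-learned in the outer loop;
    here they are fixed inputs and the outer loop is omitted.
    """
    fast = {name: [row[:] for row in mat] for name, mat in p.items()}  # keep meta-init intact
    n_steps = len(lrs["W"])  # assumes every group lists the same number of steps
    for k in range(n_steps):
        _, grads = loss_and_grads(fast, support)
        for name in list(fast):
            a = lrs[name][k]
            fast[name] = [[w - a * gw for w, gw in zip(row, grow)]
                          for row, grow in zip(fast[name], grads[name])]
    return fast


init = {"W": [[0.0, 0.0], [0.0, 0.0]], "b": [[0.0, 0.0]]}
support = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
lrs = {"W": [0.5, 0.3, 0.2], "b": [0.1, 0.1, 0.1]}  # one rate per (layer, step)
adapted = adapt(init, support, lrs)
print(loss_and_grads(adapted, support)[0] < loss_and_grads(init, support)[0])  # True
```

Keeping a separate rate per group and per step lets the meta-learner decide, for example, that the classifier weights should move quickly in the first step while the bias barely changes, instead of sharing one global inner-loop rate.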

Findings

1. Significantly outperforms baselines on IEMOCAP
2. Effective in both seen and unseen data scenarios
3. Enables rapid personalization with few examples

Abstract

This paper introduces Meta-PerSER, a novel meta-learning framework that personalizes Speech Emotion Recognition (SER) by adapting to each listener's unique way of interpreting emotion. Conventional SER systems rely on aggregated annotations, which often overlook individual subtleties and lead to inconsistent predictions. In contrast, Meta-PerSER leverages a Model-Agnostic Meta-Learning (MAML) approach enhanced with Combined-Set Meta-Training, Derivative Annealing, and per-layer per-step learning rates, enabling rapid adaptation with only a few labeled examples. By integrating robust representations from pre-trained self-supervised models, our framework first captures general emotional cues and then fine-tunes itself to personal annotation styles. Experiments on the IEMOCAP corpus demonstrate that Meta-PerSER significantly outperforms baseline methods in both seen and unseen data…
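Adapting "with only a few labeled examples" per listener suggests an episodic setup: each listener contributes a small labeled support set for adaptation and a held-out query set for evaluation. The sketch below shows one plausible way to build such an episode. The record format, the function name `build_listener_episode`, and plain K-shot sampling (rather than, say, K examples per emotion class) are assumptions for illustration, not details taken from the paper.

```python
import random
from typing import List, Tuple

# Hypothetical record format: (utterance_id, listener_id, emotion_label).
Annotation = Tuple[str, str, str]
Example = Tuple[str, str]  # (utterance_id, emotion_label) as labeled by one listener


def build_listener_episode(
    annotations: List[Annotation], listener: str, k_shot: int, rng: random.Random
) -> Tuple[List[Example], List[Example]]:
    """Split one listener's labels into a K-shot support set and a query set.

    The support set is what the model adapts on; the query set measures how well
    the adapted model reproduces this listener's labels on utterances it did not see.
    """
    own = [(utt, label) for utt, who, label in annotations if who == listener]
    if len(own) <= k_shot:
        raise ValueError(f"listener {listener!r} has only {len(own)} labels; need > {k_shot}")
    rng.shuffle(own)  # random K-shot draw; `own` is a new list, so the input is untouched
    return own[:k_shot], own[k_shot:]


data = [
    ("utt1", "A", "happy"), ("utt2", "A", "sad"), ("utt3", "A", "angry"),
    ("utt4", "A", "neutral"), ("utt5", "A", "happy"),
    ("utt1", "B", "neutral"), ("utt2", "B", "sad"),
]
support, query = build_listener_episode(data, "A", k_shot=2, rng=random.Random(0))
print(len(support), len(query))  # 2 3
```

In this toy data, listeners A and B label `utt1` differently ("happy" vs. "neutral"), which is exactly the per-listener variation that aggregated labels would average away and a personalized model is meant to capture.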

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Sentiment Analysis and Opinion Mining