emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen

TL;DR
emotion2vec is a self-supervised pre-trained speech emotion representation model. It outperforms existing pre-trained universal models and emotion specialist models on IEMOCAP, and it shows consistent gains across speech emotion recognition datasets in 10 languages and on other emotion-related tasks.
Contribution
It introduces the first universal speech emotion representation model trained via self-supervised online distillation, outperforming state-of-the-art models on diverse datasets.
Findings
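Outperforms state-of-the-art models on the IEMOCAP dataset while training only linear layers
Shows consistent improvements on speech emotion recognition datasets in 10 languages
Performs well on other emotion tasks, such as song emotion recognition, emotion prediction in conversation, and sentiment analysis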
Abstract
We propose emotion2vec, a universal speech emotion representation model. emotion2vec is pre-trained on open-source unlabeled emotion data through self-supervised online distillation, combining utterance-level loss and frame-level loss during pre-training. emotion2vec outperforms state-of-the-art pre-trained universal models and emotion specialist models by only training linear layers for the speech emotion recognition task on the mainstream IEMOCAP dataset. In addition, emotion2vec shows consistent improvements among 10 different languages of speech emotion recognition datasets. emotion2vec also shows excellent results on other emotion tasks, such as song emotion recognition, emotion prediction in conversation, and sentiment analysis. Comparison experiments, ablation experiments, and visualization comprehensively demonstrate the universal capability of the proposed emotion2vec. To the…
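The abstract describes pre-training as online distillation that combines a frame-level loss with an utterance-level loss. The following is a minimal sketch of how such a combined objective could look, not the authors' implementation. Several details are assumptions made for illustration only: the teacher is updated as an exponential moving average of the student, both losses are mean squared errors, the utterance target is the mean-pooled teacher output, and the names `ema_update`, `frame_level_loss`, `utterance_level_loss`, `combined_loss`, `decay`, and `alpha` are hypothetical.

```python
import numpy as np

def ema_update(teacher, student, decay=0.999):
    """Assumed online-distillation step: teacher weights track the student via EMA."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}

def frame_level_loss(student_frames, teacher_frames, mask):
    """MSE between student and teacher frame features, on masked frames only.

    student_frames, teacher_frames: arrays of shape (T, D)
    mask: boolean array of shape (T,), True where the student input was masked
          (at least one entry must be True)
    """
    diff = (student_frames - teacher_frames) ** 2
    return diff[mask].mean()

def utterance_level_loss(student_utt, teacher_frames):
    """MSE between the student's utterance embedding (D,) and the
    mean-pooled teacher frames (assumed pooling choice)."""
    target = teacher_frames.mean(axis=0)
    return ((student_utt - target) ** 2).mean()

def combined_loss(student_frames, student_utt, teacher_frames, mask, alpha=1.0):
    """Frame-level loss plus alpha times utterance-level loss (alpha is hypothetical)."""
    frame = frame_level_loss(student_frames, teacher_frames, mask)
    utt = utterance_level_loss(student_utt, teacher_frames)
    return frame + alpha * utt
```

In a training loop under these assumptions, the student would be optimized on `combined_loss` for each batch, and `ema_update` would then refresh the teacher, so the distillation targets evolve together with the student rather than coming from a fixed pre-trained model.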
Code & Models
- 🤗 emotion2vec/emotion2vec_plus_base (model) · 420 dl · ♡ 7
- 🤗 thegenerativegeneration/emotion2vec_base_finetuned (model) · 4 dl · ♡ 1
- 🤗 emotion2vec/emotion2vec_base (model) · 39 dl · ♡ 4
- 🤗 emotion2vec/emotion2vec_plus_seed (model) · 29 dl
- 🤗 emotion2vec/emotion2vec_plus_large (model) · 828 dl · ♡ 70
- 🤗 skygmt/emotion2vec_plus_base (model) · 4 dl
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Speech Recognition and Synthesis
