emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Ziyang Ma; Zhisheng Zheng; Jiaxin Ye; Jinchao Li; Zhifu Gao; Shiliang Zhang; Xie Chen

arXiv:2312.15185·cs.CL·December 27, 2023·6 cites

TL;DR

emotion2vec is a self-supervised pre-trained speech emotion representation model. It outperforms existing pre-trained models on speech emotion recognition across 10 languages and on related tasks such as song emotion recognition, emotion prediction in conversation, and sentiment analysis.

Contribution

The paper introduces emotion2vec, presented as the first universal speech emotion representation model. It is pre-trained with self-supervised online distillation and outperforms state-of-the-art models on diverse datasets.

Findings

01  Outperforms state-of-the-art models on the IEMOCAP dataset

02  Shows consistent improvements across speech emotion recognition datasets in 10 languages

03  Excels at other emotion-related tasks, such as song emotion recognition

Abstract

We propose emotion2vec, a universal speech emotion representation model. emotion2vec is pre-trained on open-source unlabeled emotion data through self-supervised online distillation, combining utterance-level loss and frame-level loss during pre-training. emotion2vec outperforms state-of-the-art pre-trained universal models and emotion specialist models by only training linear layers for the speech emotion recognition task on the mainstream IEMOCAP dataset. In addition, emotion2vec shows consistent improvements among 10 different languages of speech emotion recognition datasets. emotion2vec also shows excellent results on other emotion tasks, such as song emotion recognition, emotion prediction in conversation, and sentiment analysis. Comparison experiments, ablation experiments, and visualization comprehensively demonstrate the universal capability of the proposed emotion2vec. To the…
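The abstract says pre-training combines an utterance-level loss and a frame-level loss under self-supervised online distillation, but this page gives no formulas. The following is a minimal sketch of how such an objective might look, not the authors' implementation. The EMA teacher update, the masked-frame MSE, the mean-pooled utterance embedding, the weight `alpha`, and all function names are assumptions made for illustration.

```python
import numpy as np


def ema_update(teacher, student, decay=0.999):
    """Assumed online-distillation step: the teacher tracks the student
    through an exponential moving average of its parameters."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}


def frame_level_loss(student_frames, teacher_frames, mask):
    """Assumed frame-level loss: MSE on masked frames only.

    student_frames, teacher_frames: arrays of shape (T, D)
    mask: boolean array of shape (T,), with at least one True entry
    """
    diff = student_frames[mask] - teacher_frames[mask]
    return float(np.mean(diff ** 2))


def utterance_level_loss(student_frames, teacher_frames):
    """Assumed utterance-level loss: MSE between mean-pooled embeddings."""
    diff = student_frames.mean(axis=0) - teacher_frames.mean(axis=0)
    return float(np.mean(diff ** 2))


def combined_loss(student_frames, teacher_frames, mask, alpha=1.0):
    """Frame-level loss plus alpha times utterance-level loss (alpha is hypothetical)."""
    return (frame_level_loss(student_frames, teacher_frames, mask)
            + alpha * utterance_level_loss(student_frames, teacher_frames))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    student_out = rng.normal(size=(50, 8))   # 50 frames, 8-dim features
    teacher_out = rng.normal(size=(50, 8))
    frame_mask = rng.random(50) < 0.5         # mask roughly half of the frames
    frame_mask[0] = True                      # guarantee at least one masked frame
    print(combined_loss(student_out, teacher_out, frame_mask, alpha=1.0))
```

In this sketch the teacher's outputs act as regression targets for the student, and `ema_update` would be called after each student optimization step. The real model may pool, mask, and weight the two terms differently.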

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Speech Recognition and Synthesis