Speech Emotion Diarization: Which Emotion Appears When?

Yingzhi Wang; Mirco Ravanelli; Alya Yacoubi

arXiv:2306.12991·cs.CL·October 23, 2023

TL;DR

This paper introduces Speech Emotion Diarization (SED), a new task that identifies when specific emotions occur in speech, supported by a new dataset, ZED, and baseline models for evaluation.

Contribution

The paper proposes SED as a novel fine-grained approach to speech emotion analysis, along with the ZED dataset and baseline solutions.

Findings

01. Introduction of the SED task and the ZED dataset
02. Baseline models for emotion segmentation provided
03. Open-source code and models available

Abstract

Speech Emotion Recognition (SER) typically relies on utterance-level solutions. However, emotions conveyed through speech should be considered as discrete speech events with definite temporal boundaries, rather than attributes of the entire utterance. To reflect the fine-grained nature of speech emotions, we propose a new task: Speech Emotion Diarization (SED). Just as Speaker Diarization answers the question of "Who speaks when?", Speech Emotion Diarization answers the question of "Which emotion appears when?". To facilitate the evaluation of the performance and establish a common benchmark for researchers, we introduce the Zaion Emotion Dataset (ZED), an openly accessible speech emotion dataset that includes non-acted emotions recorded in real-life conditions, along with manually-annotated boundaries of emotion segments within the utterance. We provide competitive baselines and…
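
To make the "Which emotion appears when?" framing concrete, the sketch below shows one way a diarization result could be represented: a list of (start, end, emotion) segments built by merging runs of identical per-frame labels. It is an illustration only, not the authors' baseline. The function name `frames_to_segments`, the per-frame label input, and the 0.5 s frame duration are all assumptions made for this example.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical output format: (start_sec, end_sec, emotion)
Segment = Tuple[float, float, str]


def frames_to_segments(frame_labels: List[str], frame_sec: float) -> List[Segment]:
    """Merge consecutive frames that share a label into timed segments.

    frame_labels: one emotion label per fixed-length frame (assumed input).
    frame_sec: duration of each frame in seconds (assumed constant).
    """
    segments: List[Segment] = []
    index = 0  # index of the first frame in the current run
    for label, run in groupby(frame_labels):
        length = len(list(run))
        start = index * frame_sec
        end = (index + length) * frame_sec
        segments.append((start, end, label))
        index += length
    return segments


if __name__ == "__main__":
    # Six frames of 0.5 s each: neutral, then happy, then neutral again.
    labels = ["neutral"] * 3 + ["happy"] * 2 + ["neutral"]
    for start, end, emotion in frames_to_segments(labels, 0.5):
        print(f"{start:.1f}s to {end:.1f}s: {emotion}")
```

An utterance-level SER system would assign a single label to the whole clip; the segment list instead keeps the boundaries where the emotion changes, which is the information the task asks for.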

Code & Models

Models

speechbrain/emotion-diarization-wavlm-large (model, 170 downloads, 56 likes)

Datasets

AdeoyeLadele/emouerj-sed (dataset, 6 downloads)

Taxonomy

Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining