MSP-Conversation: A Corpus for Naturalistic, Time-Continuous Emotion Recognition

Luz Martinez-Lucas; Pravin Mote; Abinay Reddy Naini; Mohammed Abdelwahab; Carlos Busso

arXiv:2603.22536·eess.AS·March 25, 2026

Open Access

TL;DR

This paper introduces MSP-Conversation, a large, naturalistic speech emotion dataset with time-continuous annotations, enabling more realistic emotion recognition research in conversational settings.

Contribution

It provides a new, extensive corpus with detailed annotations and baseline experiments, addressing the need for naturalistic, dynamic emotion datasets in speech emotion recognition.

Findings

01. The corpus contains over 70 hours of conversational audio with detailed emotional annotations.

02. Baseline SER experiments demonstrate the utility of the dataset for dynamic emotion recognition.

03. Annotations include fine-grained valence, arousal, and dominance traces.

Abstract

Affective computing aims to understand and model human emotions for computational systems. Within this field, speech emotion recognition (SER) focuses on predicting emotions conveyed through speech. While early SER systems relied on limited datasets and traditional machine learning models, recent deep learning approaches demand large-scale, naturalistic emotional corpora. To address this need, we introduce the MSP-Conversation corpus: a dataset of more than 70 hours of conversational audio with time-continuous emotional annotations and detailed speaker diarizations. The time-continuous annotations capture the dynamic and context-dependent nature of emotional expression. The annotations in the corpus include fine-grained temporal traces of valence, arousal, and dominance. The audio data is sourced from publicly available podcasts and overlaps with a subset of the isolated speaking turns in…

Taxonomy

Topics: Emotion and Mood Recognition · Music and Audio Processing · Speech and dialogue systems