MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations
Soujanya Poria, Devamanyu Hazarika, Navonil Majumder, Gautam Naik, Erik Cambria, Rada Mihalcea

TL;DR
MELD is a large-scale multimodal dataset of about 13,000 utterances from 1,433 dialogues in the TV series Friends. Each utterance carries emotion and sentiment labels together with audio, visual, and textual data, and the dataset is designed to advance emotion recognition in multi-party conversations.
Contribution
The paper introduces MELD, the first extensive multimodal multi-party conversational dataset with emotion and sentiment labels, filling a critical gap in emotion recognition research.
Findings
Contextual and multimodal information both prove important for emotion recognition in conversations.
Strong baseline models demonstrate the dataset's utility for developing emotion recognition methods.
The dataset enables research on emotion dynamics in multi-party conversations.
Abstract
Emotion recognition in conversations is a challenging task that has recently gained popularity due to its potential applications. Until now, however, a large-scale multimodal multi-party emotional conversational database containing more than two speakers per dialogue was missing. Thus, we propose the Multimodal EmotionLines Dataset (MELD), an extension and enhancement of EmotionLines. MELD contains about 13,000 utterances from 1,433 dialogues from the TV-series Friends. Each utterance is annotated with emotion and sentiment labels, and encompasses audio, visual and textual modalities. We propose several strong multimodal baselines and show the importance of contextual and multimodal information for emotion recognition in conversations. The full dataset is available for use at http://affective-meld.github.io.
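To make the structure described in the abstract concrete, here is a minimal Python sketch of how one might represent utterances that carry an emotion label, a sentiment label, and a pointer to an audio-visual clip, and how to group them into dialogues. Everything in it is an assumption made for illustration: the `Utterance` class, its field names, the `clip_path` field, the helper functions, and the toy rows (including their label strings) are hypothetical and are not taken from the released dataset or its file format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Utterance:
    """One utterance in a dialogue (hypothetical schema, not the official one)."""
    dialogue_id: int
    utterance_id: int                 # position of the utterance inside its dialogue
    speaker: str
    text: str                         # textual modality
    emotion: str                      # emotion label
    sentiment: str                    # sentiment label
    clip_path: Optional[str] = None   # assumed pointer to the audio-visual clip


def group_by_dialogue(utterances: List[Utterance]) -> Dict[int, List[Utterance]]:
    """Group utterances by dialogue and order each dialogue by utterance_id."""
    dialogues: Dict[int, List[Utterance]] = defaultdict(list)
    for utt in utterances:
        dialogues[utt.dialogue_id].append(utt)
    for turns in dialogues.values():
        turns.sort(key=lambda u: u.utterance_id)
    return dict(dialogues)


def is_multi_party(turns: List[Utterance]) -> bool:
    """True when a dialogue has more than two distinct speakers."""
    return len({u.speaker for u in turns}) > 2


# Made-up toy rows, used only to exercise the helpers above.
sample = [
    Utterance(0, 1, "B", "I did not expect that.", "surprise", "negative"),
    Utterance(0, 0, "A", "We got the tickets!", "joy", "positive"),
    Utterance(0, 2, "C", "Okay.", "neutral", "neutral"),
    Utterance(1, 0, "A", "Are you coming?", "neutral", "neutral"),
    Utterance(1, 1, "B", "Yes.", "neutral", "neutral"),
]

dialogues = group_by_dialogue(sample)
multi_party_ids = [d for d, turns in dialogues.items() if is_multi_party(turns)]
```

Keeping utterances ordered within each dialogue matters because the contextual baselines mentioned in the abstract depend on the preceding turns, and the speaker count is what separates multi-party dialogues from two-speaker ones.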
