DeepDialogue: A Multi-Turn Emotionally-Rich Spoken Dialogue Dataset
Alkis Koudounas, Moreno La Quatra, Elena Baralis

TL;DR
DeepDialogue is a large, multimodal dataset of multi-turn dialogues with rich emotional content across diverse domains, enabling advances in emotionally-aware conversational AI and revealing insights into model coherence and domain effects.
Contribution
It introduces the first large-scale open-source multimodal dialogue dataset with emotional consistency, generated using multiple language models and incorporating speech synthesis.
Findings
Smaller models struggle beyond 6 turns
Concrete domains produce more meaningful dialogues
Cross-model interactions enhance coherence
Abstract
Recent advances in conversational AI have demonstrated impressive capabilities in single-turn responses, yet multi-turn dialogues remain challenging for even the most sophisticated language models. Current dialogue datasets are limited in emotional range, domain diversity, and turn depth, and are predominantly text-only, hindering progress in developing more human-like conversational systems across modalities. To address these limitations, we present DeepDialogue, a large-scale multimodal dataset containing 40,150 high-quality multi-turn dialogues spanning 41 domains and incorporating 20 distinct emotions with coherent emotional progressions. Our approach pairs 9 different language models (4B-72B parameters) to generate 65,600 initial conversations, which we then evaluate through a combination of human annotation and LLM-based quality filtering. The resulting dataset reveals…
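The abstract's headline figures imply a few derived quantities. The short Python sketch below computes them: the share of the 65,600 generated conversations that survive filtering into the 40,150-dialogue release, and the number of initial conversations per domain if they were split evenly across the 41 domains (an assumption; the abstract does not state the split). The abstract also does not say how the 9 models were paired, so `count_pairings` is a hypothetical helper that only counts candidate two-model pairings under three assumed schemes. It is not the authors' generation procedure.

```python
from itertools import combinations, combinations_with_replacement, product

# Figures stated in the abstract.
INITIAL_CONVERSATIONS = 65_600
RETAINED_DIALOGUES = 40_150
NUM_DOMAINS = 41
NUM_MODELS = 9


def retention_rate(initial: int, retained: int) -> float:
    """Fraction of generated conversations kept after quality filtering."""
    return retained / initial


def count_pairings(num_models: int) -> dict:
    """Hypothetical helper: count two-model pairings under assumed schemes.

    The paper's actual pairing scheme is not given in the abstract.
    """
    models = range(num_models)
    return {
        # (A, B) and (B, A) are distinct; a model may talk to itself.
        "ordered_with_self": sum(1 for _ in product(models, repeat=2)),
        # {A, B} == {B, A}; a model may talk to itself.
        "unordered_with_self": sum(1 for _ in combinations_with_replacement(models, 2)),
        # {A, B} == {B, A}; only cross-model pairs.
        "unordered_distinct": sum(1 for _ in combinations(models, 2)),
    }


if __name__ == "__main__":
    rate = retention_rate(INITIAL_CONVERSATIONS, RETAINED_DIALOGUES)
    print(f"Retention after filtering: {rate:.1%}")  # Retention after filtering: 61.2%
    # Assumes an even split of initial conversations across domains.
    print(f"Initial conversations per domain: {INITIAL_CONVERSATIONS / NUM_DOMAINS:.0f}")  # 1600
    print(count_pairings(NUM_MODELS))
```

Under these assumptions, roughly 61% of the generated conversations are retained, and 65,600 divides evenly into 1,600 conversations per domain. The pairing counts (81, 45, or 36) only show how the number of model combinations depends on the scheme chosen.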
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling
