SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations
Satwik Kottur, Seungwhan Moon, Alborz Geramifard, Babak Damavandi

TL;DR
SIMMC 2.0 introduces a large, immersive multimodal dialog dataset in the shopping domain, enabling advanced research in context-aware task-oriented systems within realistic environments.
Contribution
The paper presents SIMMC 2.0, a novel multimodal dialog dataset with a two-phase collection process and detailed benchmarks for immersive, context-aware conversations.
Findings
Baseline models achieve promising results.
The dataset reveals new challenges for multimodal dialog understanding.
The dataset provides a foundation for future research on immersive dialog systems.
Abstract
Next-generation task-oriented dialog systems need to understand conversational contexts together with their perceived surroundings in order to help users effectively in real-world multimodal environments. Existing task-oriented dialog datasets aimed at virtual assistance fall short because they do not situate the dialog in the user's multimodal context. To overcome this limitation, we present a new dataset for Situated and Interactive Multimodal Conversations, SIMMC 2.0, which includes 11K task-oriented user<->assistant dialogs (117K utterances) in the shopping domain, grounded in immersive and photo-realistic scenes. The dialogs are collected using a two-phase pipeline: (1) a novel multimodal dialog simulator generates simulated dialog flows, with an emphasis on diversity and richness of interactions, and (2) the generated utterances are manually paraphrased to collect diverse referring expressions. We provide an in-depth…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Speech and dialogue systems
