Situated and Interactive Multimodal Conversations

Seungwhan Moon; Satwik Kottur; Paul A. Crook; Ankita De; Shivani Poddar; Theodore Levin; David Whitney; Daniel Difranco; Ahmad Beirami; Eunjoon Cho; Rajen Subba; Alborz Geramifard

arXiv:2006.01460·cs.CL·November 12, 2020

TL;DR

This paper introduces Situated Interactive MultiModal Conversations (SIMMC), a new direction for training virtual assistants that take multimodal actions grounded in vision, memories of previous interactions, and the dialog history, supported by two new datasets and evaluation tasks.

Contribution

The paper presents two large multimodal dialogue datasets, a unified annotation framework, and benchmark tasks for training and evaluating multimodal conversational agents.

Findings

01. Existing models show strong baseline performance on SIMMC tasks.

02. Rich multimodal interactions can be effectively modeled and evaluated.

03. Datasets and tools are publicly available for further research.

Abstract

Next generation virtual assistants are envisioned to handle multimodal inputs (e.g., vision, memories of previous interactions, in addition to the user's utterances), and perform multimodal actions (e.g., displaying a route in addition to generating the system's utterance). We introduce Situated Interactive MultiModal Conversations (SIMMC) as a new direction aimed at training agents that take multimodal actions grounded in a co-evolving multimodal input context in addition to the dialog history. We provide two SIMMC datasets totalling ~13K human-human dialogs (~169K utterances) using a multimodal Wizard-of-Oz (WoZ) setup, on two shopping domains: (a) furniture (grounded in a shared virtual environment) and, (b) fashion (grounded in an evolving set of images). We also provide logs of the items appearing in each scene, and contextual NLU and coreference annotations, using a novel and…
