OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts

Shuhe Wang; Yuxian Meng; Xiaoya Li; Xiaofei Sun; Rongbin Ouyang; Jiwei Li

arXiv:2109.12761 · cs.CL · September 29, 2021 · 5 cites

TL;DR

OpenViDial 2.0 is a significantly larger multi-modal dialogue dataset with 5.6 million turns, incorporating visual contexts from movies and TV series to advance open-domain dialogue generation research.

Contribution

The paper introduces OpenViDial 2.0, a large-scale dataset with visual contexts, addressing the data scarcity in multi-modal dialogue learning.

Findings

01

Contains 5.6 million dialogue turns with visual contexts

02

Facilitates research on multi-modal pretraining for dialogue

03

Expands dataset scale significantly over previous versions

Abstract

To better simulate real human conversation, models need to generate dialogue utterances based not only on preceding textual contexts but also on visual contexts. However, as multi-modal dialogue learning has developed, dataset scale has gradually become a bottleneck. In this report, we release OpenViDial 2.0, an open-domain multi-modal dialogue dataset that is larger in scale than the previous version, OpenViDial 1.0. OpenViDial 2.0 contains a total of 5.6 million dialogue turns extracted from movies and TV series from different resources, and each dialogue turn is paired with its corresponding visual context. We hope this large-scale dataset can help facilitate future research on open-domain multi-modal dialogue generation, e.g., multi-modal pretraining for dialogue generation.

Code & Models

Repositories

ShannonAI/OpenViDial
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Human Pose and Action Recognition