Interview: A Large-Scale Open-Source Corpus of Media Dialog

Bodhisattwa Prasad Majumder; Shuyang Li; Jianmo Ni; Julian McAuley

arXiv:2004.03090 · cs.CL · April 8, 2020 · 5 cites

Open Access

TL;DR

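This paper introduces 'Interview', a large-scale media dialog dataset (105K conversations) collected from news interview transcripts. Language models trained on it show better zero-shot out-of-domain performance on existing spoken dialog datasets, and its speaker role annotations help models generate more specific responses.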

Contribution

The paper presents a new extensive media dialog dataset with speaker role annotations, enhancing the training and performance of dialog systems in real-world scenarios.

Findings

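01. Language models trained on 'Interview' show better zero-shot out-of-domain performance on existing spoken dialog datasets than models trained on existing large-scale proxies for conversational data.

02. On two dialog tasks, leveraging speaker role annotations improves performance over strong speaker-agnostic baselines.

03. Models that use the role labels generate more specific and inquisitive responses in interview-style conversations.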

Abstract

Existing conversational datasets consist either of written proxies for dialog or small-scale transcriptions of natural speech. We introduce 'Interview': a large-scale (105K conversations) media dialog dataset collected from news interview transcripts. Compared to existing large-scale proxies for conversational data, language models trained on our dataset exhibit better zero-shot out-of-domain performance on existing spoken dialog datasets, demonstrating its usefulness in modeling real-world conversations. 'Interview' contains speaker role annotations for each turn, facilitating the development of engaging, responsive dialog systems. In fact, experiments on two dialog tasks show that leveraging such labels improves performance over strong speaker-agnostic baselines, enabling models to generate more specific and inquisitive responses in interview-style conversations.

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems