MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents

Song Feng; Siva Sankalp Patel; Hui Wan; Sachindra Joshi

arXiv:2109.12595·cs.CL·May 4, 2022

TL;DR

This paper introduces MultiDoc2Dial, a new dataset and task for modeling goal-oriented dialogues grounded in multiple documents across various domains, addressing more realistic multi-topic information-seeking scenarios.

Contribution

It presents a novel dataset and task for multi-document grounded dialogues, along with baseline models and experimental results to foster future research.

Findings

01. Baseline models demonstrate the feasibility of multi-document dialogue modeling.

02. Experimental results highlight challenges and potential directions for improvement.

03. The dataset covers four diverse domains for comprehensive evaluation.

Abstract

We propose MultiDoc2Dial, a new task and dataset on modeling goal-oriented dialogues grounded in multiple documents. Most previous works treat document-grounded dialogue modeling as a machine reading comprehension task based on a single given document or passage. In this work, we aim to address more realistic scenarios where a goal-oriented information-seeking conversation involves multiple topics, and hence is grounded on different documents. To facilitate such a task, we introduce a new dataset that contains dialogues grounded in multiple documents from four different domains. We also explore modeling the dialogue-based and document-based context in the dataset. We present strong baseline approaches and various experimental results, aiming to support further research efforts on such a task.

Code & Models

Repositories

IBM/multidoc2dial (PyTorch, official)

Datasets

IBM/multidoc2dial
dataset · 292 downloads

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems