DialFact: A Benchmark for Fact-Checking in Dialogue
Prakhar Gupta, Chien-Sheng Wu, Wenhao Liu, Caiming Xiong

TL;DR
This paper introduces DialFact, a benchmark dataset for fact-checking in dialogue. It addresses the challenges specific to conversational claims and evaluates models on three sub-tasks: detecting verifiable claims, retrieving evidence, and verifying claims.
Contribution
The paper presents DialFact, the first benchmark dataset for dialogue fact-checking, along with a simple data-efficient method to improve model performance in this domain.
Findings
Existing models trained on non-dialogue data perform poorly on DialFact.
A simple data-efficient approach improves fact-checking accuracy in dialogue.
Challenges include handling colloquialisms, coreferences, and retrieval ambiguities.
Abstract
Fact-checking is an essential tool to mitigate the spread of misinformation and disinformation. We introduce the task of fact-checking in dialogue, which is a relatively unexplored area. We construct DialFact, a testing benchmark dataset of 22,245 annotated conversational claims, paired with pieces of evidence from Wikipedia. There are three sub-tasks in DialFact: 1) Verifiable claim detection task distinguishes whether a response carries verifiable factual information; 2) Evidence retrieval task retrieves the most relevant Wikipedia snippets as evidence; 3) Claim verification task predicts a dialogue response to be supported, refuted, or not enough information. We found that existing fact-checking models trained on non-dialogue data like FEVER fail to perform well on our task, and thus, we propose a simple yet data-efficient solution to effectively improve fact-checking performance in…
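The three sub-tasks described in the abstract can be read as a pipeline: detect whether a response is verifiable, retrieve evidence, then assign a label. The following is a minimal sketch of that composition, not the authors' implementation. All names here (`DialogueClaim`, `FactCheckResult`, `fact_check`, the `toy_*` functions, and `TOY_CORPUS`) are hypothetical, and the three stages are crude lexical stand-ins for what would be trained models in practice.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Label(Enum):
    SUPPORTED = "supported"
    REFUTED = "refuted"
    NOT_ENOUGH_INFO = "not enough information"


@dataclass
class DialogueClaim:
    context: List[str]  # earlier turns; unused by the toy stages below
    response: str       # the response to be fact-checked


@dataclass
class FactCheckResult:
    verifiable: bool
    evidence: List[str] = field(default_factory=list)
    label: Optional[Label] = None  # None when the response is not verifiable


def fact_check(
    claim: DialogueClaim,
    detect: Callable[[DialogueClaim], bool],
    retrieve: Callable[[DialogueClaim], List[str]],
    verify: Callable[[DialogueClaim, List[str]], Label],
) -> FactCheckResult:
    """Chain the three sub-tasks: detection, retrieval, verification."""
    if not detect(claim):
        return FactCheckResult(verifiable=False)
    evidence = retrieve(claim)
    return FactCheckResult(True, evidence, verify(claim, evidence))


# Hypothetical two-snippet corpus standing in for Wikipedia.
TOY_CORPUS = [
    "The Eiffel Tower is located in Paris.",
    "Mount Everest is the highest mountain on Earth.",
]


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_detect(claim: DialogueClaim) -> bool:
    # Assumption: a capitalized word after the first word hints at an entity.
    words = claim.response.split()
    return any(w[0].isupper() and w != "I" for w in words[1:])


def toy_retrieve(claim: DialogueClaim, k: int = 1) -> List[str]:
    # Rank snippets by token overlap with the response and keep the top k.
    query = _tokens(claim.response)
    scored = [(len(query & _tokens(doc)), doc) for doc in TOY_CORPUS]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def toy_verify(claim: DialogueClaim, evidence: List[str]) -> Label:
    # Placeholder only: it cannot detect refutation, which needs a real model.
    return Label.SUPPORTED if evidence else Label.NOT_ENOUGH_INFO


result = fact_check(
    DialogueClaim(["Have you been to France?"], "Yeah, the Eiffel Tower is in Paris."),
    toy_detect, toy_retrieve, toy_verify,
)
chitchat = fact_check(
    DialogueClaim(["I love hiking."], "I love that too!"),
    toy_detect, toy_retrieve, toy_verify,
)
```

Passing the stages in as callables keeps the pipeline independent of any particular detector, retriever, or verifier, so each stand-in could be swapped for a trained model without changing `fact_check`.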
Taxonomy
Topics: Topic Modeling · Misinformation and Its Impacts · Text Readability and Simplification
