FineDialFact: A benchmark for Fine-grained Dialogue Fact Verification

Xiangyan Chen, Yufeng Li, Yujian Gan, Arkaitz Zubiaga, and Matthew Purver

arXiv:2508.05782·cs.CL·August 11, 2025

Open Access

TL;DR

FineDialFact introduces a new benchmark for fine-grained fact verification in dialogue responses, highlighting the challenges and potential of Chain-of-Thought reasoning methods in improving factual accuracy detection.

Contribution

The paper presents a novel benchmark and dataset for atomic fact verification in dialogue, emphasizing the need for fine-grained approaches beyond coarse factual labels.

Findings

01. Chain-of-Thought reasoning improves verification performance.

02. The best F1-score on HybriDialogue is 0.75, leaving room for improvement.

03. The benchmark remains challenging for current methods.

Abstract

Large Language Models (LLMs) are known to produce hallucinations - factually incorrect or fabricated information - which poses significant challenges for many Natural Language Processing (NLP) applications, such as dialogue systems. As a result, detecting hallucinations has become a critical area of research. Current approaches to hallucination detection in dialogue systems primarily focus on verifying the factual consistency of generated responses. However, these responses often contain a mix of accurate, inaccurate or unverifiable facts, making one factual label overly simplistic and coarse-grained. In this paper, we introduce a benchmark, FineDialFact, for fine-grained dialogue fact verification, which involves verifying atomic facts extracted from dialogue responses. To support this, we construct a dataset based on publicly available dialogue datasets and evaluate it using various…
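To make the contrast between one coarse label and fine-grained labels concrete, here is a minimal sketch in Python. It is not the paper's pipeline: the atomic facts are split by hand, the label names (`accurate`, `inaccurate`, `unverifiable`) are taken from the abstract's wording rather than from the dataset's actual schema, `coarse_label` is a hypothetical collapsing rule, and macro-averaged F1 is an assumed choice of metric, since the excerpt does not say how the reported F1-score is averaged.

```python
# Label names borrowed from the abstract's wording; the dataset's real
# label set may differ (assumption).
LABELS = ("accurate", "inaccurate", "unverifiable")

# A hypothetical dialogue response, split by hand into atomic facts,
# each with its own gold label.
response_facts = [
    ("Paris is the capital of France.", "accurate"),
    ("Paris has a population of 20 million.", "inaccurate"),
    ("The speaker's cousin lives in Paris.", "unverifiable"),
]


def coarse_label(fact_labels):
    """Hypothetical rule that collapses per-fact labels into one label."""
    if all(label == "accurate" for label in fact_labels):
        return "accurate"
    if any(label == "inaccurate" for label in fact_labels):
        return "inaccurate"
    return "unverifiable"


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-label F1 scores (assumed averaging scheme)."""
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # A label with no gold and no predicted instances scores 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


gold = [label for _, label in response_facts]
# Hypothetical verifier output that misses the inaccurate fact.
pred = ["accurate", "accurate", "unverifiable"]

print(coarse_label(gold))              # one label hides the two other facts
print(round(macro_f1(gold, pred), 3))  # per-fact scoring exposes the miss
```

In this toy case the single response-level label reports only that something is wrong, while the per-fact labels show which statement is wrong and which cannot be checked at all.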

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems