A Claim Decomposition Benchmark for Long-form Answer Verification
Zhihao Zhang, Yixing Fan, Ruqing Zhang, Jiafeng Guo

TL;DR
This paper introduces a new benchmark dataset for identifying atomic, checkworthy claims in long-form responses from LLMs, aiming to improve factuality and verifiability.
Contribution
It presents the Chinese Atomic Claim Decomposition Dataset (CACDD), a high-quality, expert-annotated benchmark for claim decomposition in LLM responses.
Findings
Claim decomposition is highly challenging for current LLMs.
Zero-shot, few-shot, and fine-tuned models show varying performance on the task.
The dataset and baseline results highlight the need for further research in claim identification.
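To make the task concrete, the sketch below shows one way a decomposition example could be represented and how a system's predicted claims might be compared with expert-annotated ones. Everything in it is an assumption made for illustration: the `DecompositionExample` fields, the `normalize` helper, and the exact-match precision/recall/F1 in `claim_prf` are hypothetical and are not the CACDD schema or the paper's evaluation protocol. The sample sentences are toy data, not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class DecompositionExample:
    """Hypothetical record layout; the real CACDD schema is not described here."""
    question: str
    response: str
    claims: list[str]  # atomic, checkworthy claims annotated for the response


def normalize(claim: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(claim.lower().split()).rstrip(".")


def claim_prf(predicted: list[str], reference: list[str]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over normalized claim sets.

    This is an assumed, deliberately simple metric; a real evaluation would
    likely need softer matching between paraphrased claims.
    """
    pred = {normalize(c) for c in predicted}
    ref = {normalize(c) for c in reference}
    if not pred or not ref:
        return 0.0, 0.0, 0.0
    overlap = len(pred & ref)
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (invented for illustration).
example = DecompositionExample(
    question="Where is Paris?",
    response="Paris is the capital of France, which is in Europe.",
    claims=["Paris is the capital of France", "France is in Europe"],
)
system_output = ["Paris is the capital of France.", "France is in Asia"]
precision, recall, f1 = claim_prf(system_output, example.claims)
```

Exact matching penalizes any paraphrase, so a score like this is at best a lower bound on how well a system recovers the annotated claims.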
Abstract
The advancement of LLMs has significantly boosted performance on complex long-form question answering tasks. However, one prominent issue with LLMs is that they generate "hallucinated" responses that are not factual. Consequently, attributing each claim in a response has become a common way to improve factuality and verifiability. Existing research mainly focuses on how to provide accurate citations for the response and largely overlooks the importance of identifying the claims or statements in each response. To bridge this gap, we introduce a new claim decomposition benchmark, which requires building systems that can identify atomic and checkworthy claims in LLM responses. Specifically, we present the Chinese Atomic Claim Decomposition Dataset (CACDD), which builds on the WebCPM dataset with additional expert annotations to ensure high data quality. The CACDD encompasses a…
Taxonomy
Topics: Topic Modeling · Access Control and Trust · Natural Language Processing Techniques
