A Claim Decomposition Benchmark for Long-form Answer Verification

Zhihao Zhang; Yixing Fan; Ruqing Zhang; Jiafeng Guo

arXiv:2410.12558 · cs.CL · October 17, 2024

TL;DR

This paper introduces a new benchmark dataset for identifying atomic, checkworthy claims in long-form responses from LLMs, aiming to improve factuality and verifiability.

Contribution

It presents the Chinese Atomic Claim Decomposition Dataset (CACDD), a high-quality, expert-annotated benchmark for claim decomposition in LLM responses.

Findings

1. Claim decomposition is highly challenging for current LLMs.
2. Zero-shot, few-shot, and fine-tuned models show varying performance on the task.
3. The dataset and baseline results highlight the need for further research in claim identification.

Abstract

The advancement of LLMs has significantly boosted performance on complex long-form question answering tasks. However, one prominent issue with LLMs is that they generate "hallucinated" responses that are not factual. Consequently, attributing each claim in a response has become a common way to improve factuality and verifiability. Existing research mainly focuses on how to provide accurate citations for the response and largely overlooks the importance of identifying the claims or statements in each response. To bridge this gap, we introduce a new claim decomposition benchmark, which requires building a system that can identify atomic and checkworthy claims in LLM responses. Specifically, we present the Chinese Atomic Claim Decomposition Dataset (CACDD), which builds on the WebCPM dataset with additional expert annotations to ensure high data quality. The CACDD encompasses a…
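To make the task concrete, the following is a minimal sketch of how a claim decomposition item and a simple scoring function might look. It is not the CACDD format or the paper's evaluation protocol, neither of which is given in this excerpt: the `DecompositionExample` type, the `normalize` helper, and exact-match scoring in `claim_f1` are all illustrative assumptions, and the sample text is invented (in English, whereas CACDD is Chinese).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecompositionExample:
    """Hypothetical benchmark item: a long-form response and its reference claims."""
    response: str
    gold_claims: tuple[str, ...]


def normalize(claim: str) -> str:
    # Lowercase, collapse whitespace, and drop trailing punctuation so that
    # trivial surface differences are not counted as mismatches.
    return " ".join(claim.lower().split()).rstrip(" .!?。")


def claim_f1(predicted: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over normalized claim sets.

    Assumption: exact matching is a stand-in here; a real evaluation would
    likely need a softer notion of claim equivalence.
    """
    pred_set = {normalize(c) for c in predicted}
    gold_set = {normalize(c) for c in gold}
    overlap = len(pred_set & gold_set)
    precision = overlap / len(pred_set) if pred_set else 0.0
    recall = overlap / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example: one response, two atomic reference claims.
example = DecompositionExample(
    response="Paris is the capital of France and hosts the Louvre.",
    gold_claims=(
        "Paris is the capital of France.",
        "Paris hosts the Louvre.",
    ),
)

# Output of some hypothetical decomposition system.
predicted = ["Paris is the capital of France", "The Louvre is in Berlin."]
precision, recall, f1 = claim_f1(predicted, list(example.gold_claims))
```

Each reference claim here states a single checkable fact, which is the sense of "atomic" the abstract appears to intend. Exact matching penalizes valid paraphrases, so it serves only as a lower bound on agreement.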

Code & Models

Repositories

FBzzh/CACDD (Official)

Taxonomy

Topics: Topic Modeling · Access Control and Trust · Natural Language Processing Techniques

Methods: Focus