Core: Robust Factual Precision with Informative Sub-Claim Identification

Zhengping Jiang; Jingyu Zhang; Nathaniel Weir; Seth Ebner; Miriam Wanner; Kate Sanders; Daniel Khashabi; Anqi Liu; Benjamin Van Durme

arXiv:2407.03572 · cs.CL · October 17, 2024

TL;DR

This paper introduces Core, a subclaim filtering method that makes factual precision metrics for large language models more robust to score inflation from obvious or repetitive claims.

Contribution

We propose Core, a customizable subclaim selection component that improves the robustness of factual precision metrics like FActScore across diverse knowledge domains.

Findings

01. Core significantly increases metric robustness against manipulation.
02. Augmented metrics with Core outperform existing methods.
03. Framework and dataset released for community use.

Abstract

Hallucinations pose a challenge to the application of large language models (LLMs), thereby motivating the development of metrics to evaluate factual precision. We observe that popular metrics using the Decompose-Then-Verify framework, such as FActScore, can be manipulated by adding obvious or repetitive subclaims to artificially inflate scores. This observation motivates our new customizable plug-and-play subclaim selection component called Core, which filters down individual subclaims according to their uniqueness and informativeness. We show that many popular factual precision metrics augmented by Core are substantially more robust on a wide range of knowledge domains. We release an evaluation framework supporting easy and modular use of Core and various decomposition strategies, which we recommend the community adopt. We also release an expansion of the FActScore biography…
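The inflation problem described in the abstract can be illustrated with a toy sketch. This is not the Core algorithm or the released framework's API: the `SubClaim` type, the `informativeness` weights, the `select` function, and the example claims are all hypothetical, and exact-match deduplication with a fixed threshold stands in for whatever uniqueness and informativeness criteria the paper actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubClaim:
    text: str
    supported: bool         # verdict from a hypothetical verifier
    informativeness: float  # hypothetical weight in [0, 1]; low means "obvious"


def precision(claims):
    """Fraction of subclaims judged supported (Decompose-Then-Verify style)."""
    return sum(c.supported for c in claims) / len(claims) if claims else 0.0


def normalize(text):
    """Crude normalization so trivially reworded duplicates compare equal."""
    return " ".join(text.lower().replace(".", "").split())


def select(claims, min_informativeness=0.5):
    """Toy stand-in for subclaim selection (an assumption, not the paper's
    method): drop low-informativeness claims and normalized duplicates."""
    seen, kept = set(), []
    for c in sorted(claims, key=lambda c: -c.informativeness):
        key = normalize(c.text)
        if c.informativeness < min_informativeness or key in seen:
            continue
        seen.add(key)
        kept.append(c)
    return kept


# Invented example data: one supported and one unsupported subclaim.
honest = [
    SubClaim("Ada Lovelace was born in 1815.", True, 0.9),
    SubClaim("Ada Lovelace invented the telephone.", False, 0.9),
]

# Padding: four copies of an obvious claim plus two repeats of a true one.
padding = (
    [SubClaim("Ada Lovelace was a person.", True, 0.1)] * 4
    + [SubClaim("ada lovelace was born in 1815", True, 0.9)] * 2
)
padded = honest + padding

raw_honest = precision(honest)               # 1 of 2 supported
raw_padded = precision(padded)               # 7 of 8 supported
filtered_padded = precision(select(padded))  # padding removed before scoring

print(raw_honest, raw_padded, filtered_padded)
```

In this toy setting the padded output scores far higher than the honest one under the raw metric, while scoring only the selected subclaims brings it back to the unpadded value.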

Code & Models

Repositories

zipjiang/core (PyTorch, official)

Taxonomy

Topics: Digital and Cyber Forensics · Spam and Phishing Detection · Authorship Attribution and Profiling