Core: Robust Factual Precision with Informative Sub-Claim Identification
Zhengping Jiang, Jingyu Zhang, Nathaniel Weir, Seth Ebner, Miriam Wanner, Kate Sanders, Daniel Khashabi, Anqi Liu, Benjamin Van Durme

TL;DR
This paper introduces Core, a new subclaim filtering method that enhances the robustness of factual precision metrics for large language models by reducing manipulation through repetitive claims.
Contribution
We propose Core, a customizable subclaim selection component that improves the robustness of factual precision metrics like FActScore across diverse knowledge domains.
Findings
Core significantly increases metric robustness against manipulation.
Metrics augmented with Core outperform existing methods.
Framework and dataset released for community use.
Abstract
Hallucinations pose a challenge to the application of large language models (LLMs), thereby motivating the development of metrics to evaluate factual precision. We observe that popular metrics using the Decompose-Then-Verify framework, such as FActScore, can be manipulated by adding obvious or repetitive subclaims to artificially inflate scores. This observation motivates our new customizable plug-and-play subclaim selection component called Core, which filters individual subclaims according to their uniqueness and informativeness. We show that many popular factual precision metrics augmented by Core are substantially more robust on a wide range of knowledge domains. We release an evaluation framework supporting easy and modular use of Core and various decomposition strategies, which we recommend the community adopt. We also release an expansion of the FActScore biography…
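The inflation effect described in the abstract can be illustrated with a toy sketch. This is not Core's actual selection procedure: exact-match deduplication stands in for "uniqueness", a caller-supplied set of trivial claims stands in for "informativeness", and the function names, example claims, and the `supported` set are all hypothetical.

```python
def normalize(claim):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(claim.lower().split())


def precision(subclaims, supported):
    """Naive Decompose-Then-Verify score: fraction of subclaims judged supported."""
    if not subclaims:
        return 0.0
    return sum(normalize(c) in supported for c in subclaims) / len(subclaims)


def filter_subclaims(subclaims, trivial=frozenset()):
    """Toy stand-in for subclaim selection (an assumption, not the paper's method):
    keep the first occurrence of each claim and drop claims listed as trivial."""
    seen = set()
    kept = []
    for claim in subclaims:
        key = normalize(claim)
        if key in seen or key in trivial:
            continue
        seen.add(key)
        kept.append(claim)
    return kept


# Hypothetical verifier output: the set of claims judged supported.
supported = {"marie curie was a physicist", "marie curie was a person"}

# An honest response: one supported claim, one unsupported claim.
claims = ["Marie Curie was a physicist", "Marie Curie was born in Paris"]

# The same response padded with an obvious, repeated subclaim.
padded = claims + ["Marie Curie was a person"] * 2

naive_score = precision(claims, supported)    # 1 of 2 supported
padded_score = precision(padded, supported)   # 3 of 4 supported
filtered_score = precision(
    filter_subclaims(padded, trivial={"marie curie was a person"}), supported
)                                             # back to 1 of 2
```

Padding raises the naive score from 0.5 to 0.75 without adding any information; filtering before scoring restores 0.5. A real selection component would need a softer notion of duplication (for example, entailment between subclaims) than the exact string match assumed here.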
Taxonomy
Topics: Digital and Cyber Forensics · Spam and Phishing Detection · Authorship Attribution and Profiling
