FASTTRACK: Fast and Accurate Fact Tracing for LLMs

Si Chen; Feiyang Kang; Ning Yu; Ruoxi Jia

arXiv:2404.15157 · cs.CL · April 24, 2024



TL;DR

FASTTRACK uses large language models (LLMs) to improve the accuracy and efficiency of fact tracing. It validates whether candidate training examples actually provide supportive evidence and narrows the set of training data that must be examined, significantly outperforming existing methods.

Contribution

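The paper introduces FASTTRACK, a novel LLM-based method that improves fact tracing by clustering the training data and validating candidate evidence. It addresses two limitations of prior methods: their inability to separate merely relevant samples from supportive ones, and their high computational cost.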
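This page does not describe FASTTRACK's exact pipeline, so the following is only a minimal sketch of the general two-stage idea named above, not the authors' implementation. It assumes the training data has already been grouped into clusters offline, and it assumes the LLM-based evidence check is available as a function. The names `cluster_centroids`, `two_stage_trace`, `validate`, and `top_clusters` are hypothetical. Bag-of-words cosine similarity is a stand-in for whatever similarity the method actually uses, and the keyword check in the demo is a placeholder for an LLM call.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Dict, List


def bow(text: str) -> Counter:
    """Bag-of-words vector: lowercase whitespace-token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_centroids(clusters: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Offline step (assumed): one summed bag-of-words vector per cluster."""
    centroids = {}
    for name, samples in clusters.items():
        total = Counter()
        for sample in samples:
            total.update(bow(sample))
        centroids[name] = total
    return centroids


def two_stage_trace(
    query: str,
    clusters: Dict[str, List[str]],
    centroids: Dict[str, Counter],
    validate: Callable[[str, str], bool],  # hypothetical stand-in for an LLM judgment
    top_clusters: int = 1,
) -> List[str]:
    q = bow(query)
    # Stage 1 (coarse): compare the query with one centroid per cluster,
    # so this step costs one comparison per cluster, not per training sample.
    ranked = sorted(centroids, key=lambda name: cosine(q, centroids[name]), reverse=True)
    candidates = [s for name in ranked[:top_clusters] for s in clusters[name]]
    # Stage 2 (fine): only samples in the selected clusters reach the validator,
    # which decides whether each one actually supports the query.
    return [s for s in candidates if validate(query, s)]


if __name__ == "__main__":
    clusters = {
        "geography": ["paris is the capital of france", "the seine flows through paris", "berlin is in germany"],
        "sports": ["the final was played in paris", "football is popular in brazil"],
    }
    centroids = cluster_centroids(clusters)

    # Toy keyword check in place of an LLM call (placeholder only).
    def toy_validate(query: str, sample: str) -> bool:
        words = sample.split()
        return "capital" in words and "france" in words

    print(two_stage_trace("what is the capital of france", clusters, centroids, toy_validate))
    # prints ['paris is the capital of france']
```

In this toy run the validator is called only on the three samples in the selected cluster, never on the two in the other cluster. That is the kind of saving a coarse clustering stage is meant to buy before an expensive per-sample check.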

Findings

1. Over 100% improvement in F1 score over state-of-the-art methods.
2. FASTTRACK is 33 times faster than TracIn.
3. Achieves superior accuracy and efficiency in fact tracing.

Abstract

Fact tracing seeks to identify specific training examples that serve as the knowledge source for a given query. Existing approaches to fact tracing rely on assessing the similarity between each training sample and the query along a certain dimension, such as lexical similarity, gradient, or embedding space. However, these methods fall short of effectively distinguishing between samples that are merely relevant and those that actually provide supportive evidence for the information sought by the query. This limitation often results in suboptimal effectiveness. Moreover, these approaches necessitate the examination of the similarity of individual training points for each query, imposing significant computational demands and creating a substantial barrier for practical applications. This paper introduces FASTTRACK, a novel approach that harnesses the capabilities of Large Language Models…
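To make the limitation described in the abstract concrete, here is a minimal sketch of a similarity-based tracer of the kind it criticizes. Bag-of-words cosine is used as the lexical similarity. This is an assumption: published lexical baselines such as BM25 weight terms differently, and gradient- or embedding-based tracers replace the vectors but keep the same one-score-per-training-sample structure. The function name `trace_by_similarity` and the toy sentences are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import List, Tuple


def bow(text: str) -> Counter:
    """Bag-of-words vector: lowercase whitespace-token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def trace_by_similarity(query: str, train: List[str]) -> List[Tuple[float, str]]:
    """Score every training sample against the query and rank by similarity."""
    q = bow(query)
    # One similarity computation per training sample, repeated for every query:
    # this per-query pass over the whole training set is the cost the abstract highlights.
    scored = [(cosine(q, bow(sample)), sample) for sample in train]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    train = [
        "paris is the capital of france",              # supportive: answers the query
        "students ask what is the capital of france",  # relevant, but not supportive
        "berlin is in germany",                        # unrelated
    ]
    for score, sample in trace_by_similarity("what is the capital of france", train):
        print(f"{score:.3f}  {sample}")
```

With this toy data the sentence that merely restates the question scores about 0.866 and ranks above the sentence that actually answers it (about 0.833). A pure similarity score cannot tell the two apart by role.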


Taxonomy

Topics: Data Quality and Management