FASTTRACK: Fast and Accurate Fact Tracing for LLMs
Si Chen, Feiyang Kang, Ning Yu, Ruoxi Jia

TL;DR
FASTTRACK leverages Large Language Models to improve fact tracing accuracy and efficiency by validating supportive evidence and reducing the training data scope, significantly outperforming existing methods.
Contribution
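The paper introduces FASTTRACK, a novel LLM-based method that enhances fact tracing by validating evidence and clustering data. It addresses two limitations of existing methods: their inability to distinguish merely relevant samples from supportive ones, and their high per-query computational cost.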
Findings
Over 100% improvement in F1 score over state-of-the-art methods.
FASTTRACK is 33 times faster than TracIn.
Achieves superior accuracy and efficiency in fact tracing.
Abstract
Fact tracing seeks to identify specific training examples that serve as the knowledge source for a given query. Existing approaches to fact tracing rely on assessing the similarity between each training sample and the query along a certain dimension, such as lexical similarity, gradient, or embedding space. However, these methods fall short of effectively distinguishing between samples that are merely relevant and those that actually provide supportive evidence for the information sought by the query. This limitation often results in suboptimal effectiveness. Moreover, these approaches necessitate the examination of the similarity of individual training points for each query, imposing significant computational demands and creating a substantial barrier for practical applications. This paper introduces FASTTRACK, a novel approach that harnesses the capabilities of Large Language Models…
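To make the limitation described in the abstract concrete, here is a minimal sketch of a similarity-based tracer along the lexical dimension. It is an illustration only, not the paper's baseline or the FASTTRACK method: the Jaccard token-overlap score, the function names, and the toy training samples are all assumptions introduced for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_lexical_similarity(query, training_samples, top_k=3):
    """Score every training sample against the query and return the top_k
    (index, score) pairs, highest score first.

    Every sample is scored for every query, so the cost grows linearly
    with the size of the training set.
    """
    q = tokenize(query)
    scored = [(jaccard(q, tokenize(s)), i) for i, s in enumerate(training_samples)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k]]


# Hypothetical toy corpus (not from the paper).
training_samples = [
    # Supportive: actually states the fact the query asks about.
    "The Eiffel Tower stands in Paris, the capital city of France.",
    # Merely relevant: repeats the question without answering it.
    "What is the capital of France? Many tourists ask this.",
    # Unrelated.
    "Bananas are rich in potassium.",
]
query = "What is the capital of France?"

ranking = rank_by_lexical_similarity(query, training_samples)
for index, score in ranking:
    print(f"{score:.3f}  {training_samples[index]}")
```

In this toy case the sample that only repeats the question outranks the sample that actually contains the answer, which is the relevant-versus-supportive gap the abstract points to. The loop over all training samples also shows why per-query cost becomes a barrier at scale.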
Taxonomy
Topics: Data Quality and Management
