Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration

Kangxi Wu; Liang Pang; Huawei Shen; Xueqi Cheng

arXiv:2410.01285·cs.CL·November 20, 2024

TL;DR

This paper introduces DDA, a novel training data attribution method for large language models that improves influence function accuracy by addressing fitting errors, leading to better data attribution and model interpretability.

Contribution

The paper proposes DDA, a new TDA method that enhances influence functions by removing bias and smoothing influence scores, improving attribution accuracy for large language models.

Findings

01. DDA achieves an averaged AUC of 91.64%.

02. DDA outperforms existing TDA methods.

03. DDA is effective across various models and data sources.

Abstract

The black-box nature of large language models (LLMs) poses challenges in interpreting results, impacting issues such as data intellectual property protection and hallucination tracing. Training data attribution (TDA) methods are considered effective solutions to address these challenges. Most recent TDA methods rely on influence functions, assuming the model achieves minimized empirical risk. However, achieving this criterion is difficult, and sourcing accuracy can be compromised by fitting errors during model training. In this paper, we introduce a novel TDA method called Debias and Denoise Attribution (DDA), which enhances influence functions by addressing fitting errors. Specifically, the debias strategy seeks to improve the performance of influence functions by eliminating the knowledge bias present in the base model before fine-tuning, while the denoise strategy aims to reduce…
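To make the debias idea concrete, here is a minimal sketch on a toy linear model. It is not the paper's implementation. It makes two loud assumptions: the inverse Hessian in the influence function is replaced by the identity, so the score reduces to a dot product of loss gradients, and "eliminating the knowledge bias present in the base model" is read as subtracting the base-model score from the fine-tuned-model score. The function names (`grad_squared_loss`, `influence`, `debiased_influence`) are hypothetical, and the denoise step is not sketched because the abstract is truncated before describing it.

```python
import numpy as np


def grad_squared_loss(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to the weights w."""
    return (w @ x - y) * x


def influence(w, train_example, test_example):
    """Gradient-similarity score between a training and a test example.

    Assumption: the inverse Hessian of the influence function is
    approximated by the identity, leaving a plain gradient dot product.
    Sign conventions differ between papers; only relative ranking matters here.
    """
    g_train = grad_squared_loss(w, *train_example)
    g_test = grad_squared_loss(w, *test_example)
    return float(g_train @ g_test)


def debiased_influence(w_finetuned, w_base, train_example, test_example):
    """Assumed form of debiasing: fine-tuned score minus base-model score."""
    return (influence(w_finetuned, train_example, test_example)
            - influence(w_base, train_example, test_example))


# Toy data, invented purely for illustration.
w_base = np.zeros(2)                  # stand-in for the model before fine-tuning
w_finetuned = np.array([1.0, 1.0])    # stand-in for the model after fine-tuning
train_example = (np.array([1.0, 0.0]), 2.0)
test_example = (np.array([1.0, 1.0]), 1.0)

score = debiased_influence(w_finetuned, w_base, train_example, test_example)
print(score)
```

In a real LLM setting the gradients would come from automatic differentiation over model parameters rather than a closed-form expression, and the identity-Hessian shortcut would need to be justified or replaced.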

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management

Methods: Balanced Selection