Helpful or Harmful Data? Fine-tuning-free Shapley Attribution for Explaining Language Model Predictions

Jingtan Wang; Xiaoqiang Lin; Rui Qiao; Chuan-Sheng Foo; Bryan Kian Hsiang Low

arXiv:2406.04606 · cs.LG · June 10, 2024

TL;DR

This paper introduces FreeShap, a computationally efficient, fine-tuning-free approximation of the Shapley value for explaining language model predictions, demonstrating improved robustness and applicability to large models.

Contribution

We propose FreeShap, a novel approximation method for Shapley values that is efficient, robust, and applicable to large language models, enhancing instance attribution explanations.

Findings

01  FreeShap outperforms existing methods in instance attribution tasks.

02  FreeShap is effective for data removal, selection, and label correction.

03  The method scales to large language models.

Abstract

The increasing complexity of foundational models underscores the necessity for explainability, particularly for fine-tuning, the most widely used training method for adapting models to downstream tasks. Instance attribution, one type of explanation, attributes the model prediction to each training example by an instance score. However, the robustness of instance scores, specifically towards dataset resampling, has been overlooked. To bridge this gap, we propose a notion of robustness on the sign of the instance score. We theoretically and empirically demonstrate that the popular leave-one-out-based methods lack robustness, while the Shapley value behaves significantly better, but at a higher computational cost. Accordingly, we introduce an efficient fine-tuning-free approximation of the Shapley value (FreeShap) for instance attribution based on the neural tangent kernel. We empirically…
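
To make the contrast between leave-one-out scores and the Shapley value concrete, the following is a minimal sketch and not the authors' implementation. It assumes kernel ridge regression as the predictor, an RBF kernel as a stand-in for the neural tangent kernel, negative squared error on a single test point as the utility, and plain Monte Carlo permutation sampling. All function names and parameters are hypothetical. The point it illustrates is that once the kernel matrix is computed, each subset can be scored by a small linear solve rather than by retraining a model.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Stand-in kernel (assumption); the paper bases FreeShap on a neural tangent kernel."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def utility(subset, K_train, k_test, y_train, y_test, reg=1e-3):
    """Negative squared test error of kernel ridge regression fitted on `subset` only."""
    idx = np.asarray(subset, dtype=int)
    if idx.size == 0:
        return -float(y_test ** 2)  # an empty training set predicts 0
    K = K_train[np.ix_(idx, idx)] + reg * np.eye(idx.size)
    alpha = np.linalg.solve(K, y_train[idx])
    pred = k_test[idx] @ alpha
    return -float((pred - y_test) ** 2)

def monte_carlo_shapley(K_train, k_test, y_train, y_test, n_perms=200, seed=0):
    """Average marginal contribution of each training point over random permutations."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    scores = np.zeros(n)
    for _ in range(n_perms):
        perm = rng.permutation(n)
        prev = utility([], K_train, k_test, y_train, y_test)
        for pos, i in enumerate(perm):
            cur = utility(perm[: pos + 1], K_train, k_test, y_train, y_test)
            scores[i] += cur - prev
            prev = cur
    return scores / n_perms

def leave_one_out(K_train, k_test, y_train, y_test):
    """Utility drop when a single training point is removed from the full set."""
    n = len(y_train)
    everyone = np.arange(n)
    full = utility(everyone, K_train, k_test, y_train, y_test)
    return np.array([
        full - utility(np.delete(everyone, i), K_train, k_test, y_train, y_test)
        for i in range(n)
    ])

# Toy data (hypothetical): 8 training points in 2-D, labels given by the sign of the first feature.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(8, 2))
y_train = np.sign(X_train[:, 0])
x_test = np.array([[1.0, 0.0]])
y_test = 1.0

K_train = rbf_kernel(X_train, X_train)   # computed once, reused for every subset
k_test = rbf_kernel(x_test, X_train)[0]

shap_scores = monte_carlo_shapley(K_train, k_test, y_train, y_test)
loo_scores = leave_one_out(K_train, k_test, y_train, y_test)
```

In this sketch a positive score marks a training example as helpful for the test prediction and a negative score marks it as harmful; the sign is the quantity the robustness notion in the abstract concerns. The leave-one-out score looks at one marginal contribution only (relative to the full dataset), whereas the Shapley estimate averages marginal contributions over subsets of many sizes.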

Code & Models

Repositories

jtwang2000/freeshap (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)