Taming Hyperparameter Sensitivity in Data Attribution: Practical Selection Without Costly Retraining

Weiyi Wang; Junwei Deng; Yuzheng Hu; Shiyuan Zhang; Xirui Jiang; Runting Zhang; Han Zhao; Jiaqi W. Ma

arXiv:2505.24261 · cs.LG · October 24, 2025

TL;DR

This paper studies how sensitive data attribution methods are to their hyperparameters, explains why tuning them is hard (evaluating attribution quality usually requires costly retraining), and proposes a theory-based procedure for selecting hyperparameters without retraining.

Contribution

The paper presents the first large-scale empirical study of hyperparameter sensitivity in data attribution and introduces a theory-based procedure for selecting hyperparameters without retraining.

Findings

01. Most data attribution methods are sensitive to certain key hyperparameters.

02. Evaluating attribution performance is costly because it typically requires retraining models on subsets of the training data.

03. A lightweight, theory-based hyperparameter selection procedure is effective across benchmarks.

Abstract

Data attribution methods, which quantify the influence of individual training data points on a machine learning model, have gained increasing popularity in data-centric applications in modern AI. Despite a recent surge of new methods developed in this space, the impact of hyperparameter tuning in these methods remains under-explored. In this work, we present the first large-scale empirical study to understand the hyperparameter sensitivity of common data attribution methods. Our results show that most methods are indeed sensitive to certain key hyperparameters. However, unlike typical machine learning algorithms -- whose hyperparameters can be tuned using computationally-cheap validation metrics -- evaluating data attribution performance often requires retraining models on subsets of training data, making such metrics prohibitively costly for hyperparameter tuning. This poses a critical…
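To make the cost argument in the abstract concrete, here is a minimal toy sketch of a retraining-based evaluation. Everything in it is an assumption for illustration and is not taken from the paper: the model is a closed-form ridge regression, the attribution method is a simple influence-style score with a `damping` hyperparameter, and the metric is a rank correlation between summed scores and the outputs of models retrained on random subsets. The names `fit_ridge`, `attribution_scores`, and `rank_correlation` are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, reg=1e-2):
    """Closed-form ridge regression; stands in for one 'training run'."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)

def attribution_scores(X, y, x_test, damping):
    """Toy influence-style score of each training point on one test prediction.

    `damping` is the hyperparameter being tuned (an illustrative choice).
    """
    theta = fit_ridge(X, y)
    residuals = X @ theta - y                      # per-example residuals
    H = X.T @ X / len(X) + damping * np.eye(X.shape[1])
    # Per-example gradient is residual_i * x_i; upweighting it shifts the
    # test prediction by roughly -x_test^T H^{-1} grad_i.
    return -(X * residuals[:, None]) @ np.linalg.solve(H, x_test)

def rank_correlation(a, b):
    """Spearman-style correlation via ranks (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

rng = np.random.default_rng(0)
n, d, n_subsets = 200, 5, 50
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
x_test = rng.normal(size=d)

# The expensive part: one full retraining per random subset.
subsets, actual = [], []
for _ in range(n_subsets):
    idx = rng.choice(n, size=n // 2, replace=False)
    subsets.append(idx)
    actual.append(x_test @ fit_ridge(X[idx], y[idx]))
n_retrains = len(actual)

# Scoring each hyperparameter candidate against the retrained models.
results = {}
for damping in (1e-3, 1e-1, 1e1):
    scores = attribution_scores(X, y, x_test, damping)
    predicted = [scores[idx].sum() for idx in subsets]
    results[damping] = rank_correlation(predicted, actual)

print(f"retrainings needed: {n_retrains}")
print(results)
```

In this toy each retraining is a cheap linear solve. When every subset model is instead a full training run of a neural network, the same loop is what makes such metrics expensive to use as a tuning signal.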

Code & Models

Repositories

data-attribution-hp/data-attribution-hp
PyTorch · Official

Videos

Taming Hyperparameter Sensitivity in Data Attribution: Practical Selection Without Costly Retraining · SlidesLive