Daunce: Data Attribution through Uncertainty Estimation

Xingyuan Pan; Chenlu Ye; Joseph Melkonian; Jiaqi W. Ma; Tong Zhang

arXiv:2505.23223·cs.LG·May 30, 2025

Open Access 3 Reviews

TL;DR

Daunce introduces a scalable data attribution method using uncertainty estimation by fine-tuning perturbed models and analyzing loss covariance, outperforming existing methods in accuracy and applicability to large language models.

Contribution

It presents a novel, scalable data attribution technique based on uncertainty estimation, effective for large models and proprietary LLMs, with improved accuracy over prior gradient-based methods.

Findings

01

Daunce achieves higher attribution accuracy than existing methods.

02

It is scalable to large language models, including proprietary GPTs.

03

The method is effective across vision and language tasks.

Abstract

Training data attribution (TDA) methods aim to identify which training examples most influence a model's predictions on specific test data. By quantifying these influences, TDA supports critical applications such as data debugging, curation, and valuation. Gradient-based TDA methods rely on gradients and second-order information, limiting their applicability at scale. While recent random projection-based methods improve scalability, they often suffer from degraded attribution accuracy. Motivated by connections between uncertainty and influence functions, we introduce Daunce, a simple yet effective data attribution approach through uncertainty estimation. Our method operates by fine-tuning a collection of perturbed models and computing the covariance of per-example losses across these models as the attribution score. Daunce is scalable to large language models (LLMs) and achieves more…
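The scoring step described in the abstract (covariance of per-example losses across perturbed models) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `loss_covariance_scores`, the array layout, and the use of the unbiased (K - 1) normaliser are assumptions made for illustration, and the loss matrices here are random placeholders standing in for losses that would come from K fine-tuned perturbed models.

```python
import numpy as np


def loss_covariance_scores(train_losses, test_losses):
    """Covariance between train and test losses across K models.

    train_losses: array of shape (K, N), loss of model k on training example i.
    test_losses:  array of shape (K, M), loss of model k on test example j.
    Returns an (N, M) matrix whose (i, j) entry is the sample covariance,
    taken over the K models, of the two loss values.
    """
    train_losses = np.asarray(train_losses, dtype=float)
    test_losses = np.asarray(test_losses, dtype=float)
    if train_losses.shape[0] != test_losses.shape[0]:
        raise ValueError("both inputs must cover the same K models")
    num_models = train_losses.shape[0]
    if num_models < 2:
        raise ValueError("at least two models are needed for a covariance")

    # Centre each example's losses around its mean over the K models.
    train_centered = train_losses - train_losses.mean(axis=0, keepdims=True)
    test_centered = test_losses - test_losses.mean(axis=0, keepdims=True)

    # (N, K) @ (K, M) -> (N, M); K - 1 is an assumed normalisation choice.
    return train_centered.T @ test_centered / (num_models - 1)


# Placeholder data: in practice each row would hold the per-example losses
# of one perturbed, fine-tuned model (hypothetical sizes below).
rng = np.random.default_rng(0)
K, N, M = 8, 5, 3
train_losses = rng.normal(size=(K, N))
test_losses = rng.normal(size=(K, M))

scores = loss_covariance_scores(train_losses, test_losses)
# Training examples ranked by score for the first test example.
ranking_for_first_test = np.argsort(-scores[:, 0])
```

Under these assumptions, a training example whose loss moves together with a test example's loss across the perturbed models receives a large score for that test example. How the paper perturbs and fine-tunes the models, and any sign or scaling convention it applies to the covariance, is not shown in this excerpt.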

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

- As mentioned in the summary, the paper's primary contribution, DAUNCE, introduces a simple yet novel approach to TDA. Instead of relying on computationally expensive second-order information like the Hessian matrix, which is a major bottleneck for evaluating large models, it uses uncertainty estimation. The method of fine-tuning K perturbed models and calculating the covariance of their losses is an efficient and smart approach to avoid the bottleneck.
- The paper also conducted extensive expe

Weaknesses

- The computational cost would be eye-watering given the number of perturbed models the method uses. From Figure 1(a) and Figure 6 in the paper, DAUNCE appears to outperform other methods only when `K` is at least `100`. Even with a LoRA rank of `64`, this still looks very expensive. TRAK (the baseline) needs only one forward and backward pass, but DAUNCE needs `K` fine-tuning runs.
- The method seems to saturate as `K` approaches 200 (from Figure 6); it would be nice to see

Reviewer 02 · Rating 2 · Confidence 4

Strengths

**1. Well-motivated and clearly presented** The paper is well motivated, with a proper background discussion and an apt summary of the proposed method.

**2. Black-box and white-box applicability** The proposed method is applicable in both black-box and white-box settings, which is of immense practical use in real-life applications of AI.

Weaknesses

**1. Limited novelty compared to TRAK.** The paper criticises TRAK [1] in its motivation for its projection error. However, the paper's primary equation (5) is directly related to Eqn (11) of TRAK: both apply a uniformly distributed random matrix to approximate the computationally expensive Taylor expression. Ensembling over the randomness in the approximation was also originally proposed in TRAK. Therefore, the paper's claimed novelty in bringing scalability to TDA met

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. Introduces a novel data-attribution algorithm leveraging uncertainty estimation rather than relying on gradients or Hessians.
2. Extends to large-scale settings, even black-box LLMs, demonstrating good generalization beyond standard gradient-based attribution setups.

Weaknesses

1. The method incurs additional $O(K \cdot |D^k|)$ training cost and requires storing K models. Computational and memory trade-offs are insufficiently analyzed and not compared against baselines. The scale and choice of $|D^k|$ are not specified or discussed, making it hard to assess practical overhead.
2. Experimental settings and baseline selections are inconsistent. Missing feasible baselines and inconsistent settings weaken the strength of the empirical evidence.

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)

Methods: Cosine Annealing · Linear Layer · Residual Connection · Dense Connections · Linear Warmup With Cosine Annealing · Attention Dropout · Softmax · Discriminative Fine-Tuning · Weight Decay