Daunce: Data Attribution through Uncertainty Estimation
Xingyuan Pan, Chenlu Ye, Joseph Melkonian, Jiaqi W. Ma, Tong Zhang

TL;DR
Daunce is a scalable data attribution method based on uncertainty estimation: it fine-tunes a collection of perturbed models and scores training examples by the covariance of per-example losses across those models. It reports higher attribution accuracy than existing methods and applies to large language models.
Contribution
It presents a novel, scalable data attribution technique based on uncertainty estimation, effective for large models and proprietary LLMs, with improved accuracy over prior gradient-based methods.
Findings
Daunce achieves higher attribution accuracy than existing methods.
It is scalable to large language models, including proprietary GPTs.
The method is effective across vision and language tasks.
Abstract
Training data attribution (TDA) methods aim to identify which training examples most influence a model's predictions on specific test data. By quantifying these influences, TDA supports critical applications such as data debugging, curation, and valuation. Gradient-based TDA methods rely on gradients and second-order information, limiting their applicability at scale. While recent random projection-based methods improve scalability, they often suffer from degraded attribution accuracy. Motivated by connections between uncertainty and influence functions, we introduce Daunce, a simple yet effective data attribution approach through uncertainty estimation. Our method operates by fine-tuning a collection of perturbed models and computing the covariance of per-example losses across these models as the attribution score. Daunce is scalable to large language models (LLMs) and achieves more…
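The scoring step described in the abstract (covariance of per-example losses across a collection of perturbed models) can be illustrated with a minimal sketch. This is one plausible reading of that sentence, not the authors' implementation: the function name `covariance_attribution`, the (K, n) loss-matrix layout, the unbiased K - 1 normalisation, and the synthetic losses are all assumptions made here for illustration. In practice each row would hold the losses of one fine-tuned perturbed model.

```python
import numpy as np


def covariance_attribution(train_losses, test_losses):
    """Sample covariance between test and training losses across K models.

    train_losses: array of shape (K, n_train), loss of model k on training example j.
    test_losses:  array of shape (K, n_test), loss of model k on test example i.
    Returns an (n_test, n_train) matrix whose entry [i, j] is the covariance
    (over the K models) between the loss on test example i and the loss on
    training example j. The layout and normalisation are assumptions.
    """
    train_losses = np.asarray(train_losses, dtype=float)
    test_losses = np.asarray(test_losses, dtype=float)
    k = train_losses.shape[0]
    if test_losses.shape[0] != k:
        raise ValueError("both loss matrices must cover the same K models")
    if k < 2:
        raise ValueError("need at least two models to estimate a covariance")
    # Centre each example's losses over the model axis.
    train_centered = train_losses - train_losses.mean(axis=0, keepdims=True)
    test_centered = test_losses - test_losses.mean(axis=0, keepdims=True)
    # Unbiased sample covariance, matching np.cov's default (ddof=1).
    return test_centered.T @ train_centered / (k - 1)


# Synthetic stand-in for real per-model losses (hypothetical data).
rng = np.random.default_rng(0)
K, n_train = 200, 5
train_losses = rng.normal(size=(K, n_train))
# The single test example's loss is built to co-vary with training example 3.
test_losses = (train_losses[:, 3] + 0.1 * rng.normal(size=K)).reshape(K, 1)

scores = covariance_attribution(train_losses, test_losses)
top = int(np.argmax(scores[0]))
print("highest-scoring training example for test example 0:", top)
```

Because the synthetic test loss is constructed from training example 3, that example receives the largest score. With real models, the cost lies in producing the K rows of losses, not in this matrix product.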
Peer Reviews
Decision·Submitted to ICLR 2026
- As mentioned in the summary, the paper's primary contribution, DAUNCE, introduces a simple yet novel approach to TDA. Instead of relying on computationally expensive second-order information like the Hessian matrix, which is a major bottleneck for evaluating large models, it uses uncertainty estimation. The method of fine-tuning K perturbed models and calculating the covariance of their losses is an efficient and smart approach to avoid the bottleneck. - The paper also conducted extensive expe
- The computation cost would be eye-watering considering all the perturbed models the method uses. From `figure 1 (a)` and `figure 6` in the paper, it seems that DAUNCE only outperforms other methods if `K` is at least `100`. Even with a LoRA rank of `64`, this still looks really expensive. TRAK (the baseline) only needs one forward and backward pass, but DAUNCE needs `K` fine-tuning runs. - It seems that the method saturates as `K` approaches 200 (from `figure 6`); it would be nice to see
**1. Well-motivated and clearly presented** This paper is well-motivated, with proper background discussion and an apt summary of the proposed method. **2. Black-box and white-box applicability** The proposed method is applicable in both black-box and white-box settings, which is of immense practical use in real-life applications of AI.
**1. Limited novelty compared to TRAK.** This paper criticises TRAK [1] in its motivation for its projection error. However, its primary equation (5) is directly related to Eqn (11) of TRAK: both apply a uniformly distributed random matrix to approximate the computationally expensive Taylor expression. Ensembling over the randomness in the approximation was also originally proposed in TRAK. Therefore, the claimed novelty of this paper in bringing scalability to TDA met
1. Introduces a novel data-attribution algorithm leveraging uncertainty estimation rather than relying on gradients or Hessians. 2. Extends to large-scale settings, even black-box LLMs, demonstrating good generalization beyond standard gradient-based attribution setups.
1. The method incurs additional $O(K \cdot |D^k|)$ training cost and requires storing K models. Computational and memory trade-offs are insufficiently analyzed and not compared against baselines. The scale and choice of $|D^k|$ are not specified or discussed, making it hard to assess practical overhead. 2. Experimental settings and baseline selections are inconsistent. Missing feasible baselines and inconsistent settings weaken the strength of the empirical evidence.
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)
Methods: Cosine Annealing · Linear Layer · Residual Connection · Dense Connections · Linear Warmup With Cosine Annealing · Attention Dropout · Softmax · Discriminative Fine-Tuning · Weight Decay
