A Bayesian Information-Theoretic Approach to Data Attribution
Dharmesh Tailor, Nicolò Felicioni, Kamil Ciosek

TL;DR
This paper introduces a Bayesian information-theoretic method for training data attribution in machine learning, aimed at interpretability and designed to scale to modern neural networks.
Contribution
It formulates data attribution as an information-loss problem, approximates that loss with a Gaussian process surrogate built from tangent features, and scales the approach to large models and datasets.
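To make the information-loss criterion concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. The assumptions are: the surrogate is a GP with a linear kernel over per-example feature vectors (equivalently, Bayesian linear regression on those features), random vectors stand in for real tangent features, and `prior_var`, `noise_var`, `posterior_variance` and `information_loss` are hypothetical names and values chosen for illustration. Because Gaussian entropy is 0.5·log(2πe·var), the entropy increase at the query from removing a subset S reduces to 0.5·log(var without S / var with all data).

```python
import numpy as np


def posterior_variance(Phi, phi_q, prior_var=1.0, noise_var=0.1):
    """Latent predictive variance at the query features phi_q for a GP with
    linear kernel k(x, x') = prior_var * phi(x) @ phi(x') (i.e. Bayesian linear
    regression on the features), conditioned on the rows of Phi (n x d)."""
    d = phi_q.shape[0]
    # Posterior precision over weights; it depends on features only, not labels.
    precision = np.eye(d) / prior_var + Phi.T @ Phi / noise_var
    return float(phi_q @ np.linalg.solve(precision, phi_q))


def information_loss(Phi, phi_q, removed, **kw):
    """Entropy increase (nats) at the query when rows `removed` are dropped.
    For Gaussians the entropy difference is 0.5 * log(var_without / var_full)."""
    var_full = posterior_variance(Phi, phi_q, **kw)
    var_without = posterior_variance(np.delete(Phi, list(removed), axis=0), phi_q, **kw)
    return 0.5 * np.log(var_without / var_full)


# Hypothetical stand-ins for tangent features (gradients of the network
# output with respect to its parameters, one row per training example).
rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 5))
phi_q = Phi[3] + 0.1 * rng.normal(size=5)  # a query resembling example 3

# Single-example attribution: score each example by its own information loss.
scores = np.array([information_loss(Phi, phi_q, [i]) for i in range(len(Phi))])
top = np.argsort(-scores)[:3]
print("top-3 single-example attributions:", top)
```

Removing data can only increase the surrogate's posterior variance, so every score is non-negative, and removing a larger subset never loses less information than removing one of its parts.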
Findings
Aligns with classical influence scores for single-example attribution
Promotes diversity in subset selection for better interpretability
Scales to modern architectures and large datasets
Abstract
Training Data Attribution (TDA) seeks to trace model predictions back to influential training examples, enhancing interpretability and safety. We formulate TDA as a Bayesian information-theoretic problem: subsets are scored by the information loss they induce, i.e., the entropy increase at a query when they are removed. This criterion credits examples for resolving predictive uncertainty rather than label noise. To scale to modern networks, we approximate information loss using a Gaussian Process surrogate built from tangent features. We show this aligns with classical influence scores for single-example attribution while promoting diversity for subsets. For even larger-scale retrieval, we relax to an information-gain objective and add a variance correction for scalable attribution in vector databases. Experiments show competitive performance on counterfactual sensitivity, ground-truth retrieval and…
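One standard way to see why an entropy-based subset objective favours diverse examples is the diminishing-returns effect of conditioning on redundant data. The sketch below illustrates this with an information-gain objective (entropy reduction at the query from conditioning on the subset alone) under the same assumed linear-kernel GP surrogate. It is an illustration, not the paper's method: the variance correction mentioned in the abstract is not reproduced, and `information_gain`, `greedy_subset` and the three toy feature vectors are hypothetical.

```python
import numpy as np


def posterior_variance(Phi, phi_q, prior_var=1.0, noise_var=0.1):
    """Latent predictive variance at the query after conditioning on the rows
    of Phi (linear-kernel GP / Bayesian linear regression on features)."""
    d = phi_q.shape[0]
    precision = np.eye(d) / prior_var + Phi.T @ Phi / noise_var
    return float(phi_q @ np.linalg.solve(precision, phi_q))


def information_gain(Phi, phi_q, subset, **kw):
    """Entropy reduction (nats) at the query from conditioning on `subset` only."""
    idx = np.asarray(subset, dtype=int)
    var_prior = posterior_variance(Phi[:0], phi_q, **kw)  # no data: prior variance
    var_subset = posterior_variance(Phi[idx], phi_q, **kw)
    return 0.5 * np.log(var_prior / var_subset)


def greedy_subset(Phi, phi_q, k, **kw):
    """Add, k times, the example whose inclusion gives the largest gain."""
    chosen = []
    remaining = list(range(len(Phi)))
    for _ in range(k):
        best = max(remaining, key=lambda i: information_gain(Phi, phi_q, chosen + [i], **kw))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy features: examples 0 and 1 are exact duplicates; example 2 covers the
# other direction but is slightly weaker on its own (scaled by 0.8).
Phi = np.array([[1.0, 0.0],
                [1.0, 0.0],
                [0.0, 0.8]])
phi_q = np.array([1.0, 1.0])

single = np.array([information_gain(Phi, phi_q, [i]) for i in range(3)])
print("top-2 by individual score:", np.argsort(-single, kind="stable")[:2].tolist())  # [0, 1]
print("greedy subset of size 2:", greedy_subset(Phi, phi_q, 2))  # [0, 2]
```

Ranking examples independently returns the two duplicates, while the greedy subset objective skips the redundant copy because it adds little once its twin is already included.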
