A Bayesian Information-Theoretic Approach to Data Attribution

Dharmesh Tailor; Nicolò Felicioni; Kamil Ciosek

arXiv:2604.03858 · cs.LG · April 10, 2026

TL;DR

This paper introduces a Bayesian information-theoretic method for training data attribution that credits training examples for the predictive uncertainty they resolve at a query, and scales it to modern neural networks.

Contribution

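It formulates training data attribution as an information-loss problem, approximates the information loss with a Gaussian process surrogate built from tangent features, and relaxes it to an information-gain objective with a variance correction so that attribution scales to large models and vector-database retrieval.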
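To make the information-loss criterion concrete, here is a minimal sketch under assumptions of our own, not the paper's implementation. A toy one-hidden-layer network f(x) = v^T tanh(W x) stands in for a trained model, and its parameter Jacobian supplies the tangent features. The surrogate is Bayesian linear regression on those features with a Gaussian likelihood, which is equivalent to a GP with kernel k(x, x') = phi(x)^T phi(x') / prior_prec. The noise variance, prior precision, and all function names are hypothetical. Information loss is taken as the increase in Gaussian predictive entropy of the latent output at the query; since H = 0.5 log(2 pi e v), this is half the log-ratio of predictive variances.

```python
import numpy as np

def tangent_features(X, W, v):
    """Jacobian of the toy network f(x) = v @ tanh(W @ x) w.r.t. (W, v), one row per input.

    Hypothetical stand-in: for a real model these rows would come from autodiff."""
    H = np.tanh(X @ W.T)                                  # (n, h) hidden activations
    dW = (v * (1.0 - H**2))[:, :, None] * X[:, None, :]   # df/dW, shape (n, h, d)
    dv = H                                                # df/dv, shape (n, h)
    return np.concatenate([dW.reshape(len(X), -1), dv], axis=1)

def predictive_variance(Phi_train, phi_query, noise_var=0.1, prior_prec=1.0):
    """Latent posterior variance at the query for Bayesian linear regression on features."""
    p = Phi_train.shape[1]
    A = prior_prec * np.eye(p) + Phi_train.T @ Phi_train / noise_var  # posterior precision
    return float(phi_query @ np.linalg.solve(A, phi_query))

def information_loss(Phi_train, phi_query, subset, **kw):
    """Increase in Gaussian predictive entropy at the query when `subset` is removed."""
    keep = np.setdiff1d(np.arange(len(Phi_train)), subset)
    v_full = predictive_variance(Phi_train, phi_query, **kw)
    v_removed = predictive_variance(Phi_train[keep], phi_query, **kw)
    return 0.5 * np.log(v_removed / v_full)   # entropy difference of two Gaussians

# Example: score 50 random training points by single-example information loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
W, v = rng.normal(size=(8, 3)), rng.normal(size=8)
Phi = tangent_features(X, W, v)
phi_q = tangent_features(rng.normal(size=(1, 3)), W, v)[0]
scores = [information_loss(Phi, phi_q, [i]) for i in range(len(X))]
top5 = np.argsort(scores)[::-1][:5]   # indices of the five highest-scoring examples
```

Removing data can only shrink the posterior precision, so the predictive variance never decreases and every score is non-negative. Labels never enter these scores, which is consistent with the abstract's point that the criterion credits resolving predictive uncertainty rather than label noise.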

Findings

01

Aligns with classical influence scores for single-example attribution

02

Promotes diversity in subset selection for better interpretability

03

Scales to modern architectures and large datasets
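One way to see how single-example information loss can relate to influence scores, sketched under our own assumptions (fixed features, Gaussian likelihood, arbitrary hyperparameters, hypothetical function names): removing one example is a rank-one downdate of the posterior precision H, so by the Sherman-Morrison identity the query variance grows by (phi_q^T H^-1 phi_i)^2 / (noise_var - phi_i^T H^-1 phi_i). The factor phi_q^T H^-1 phi_i is the inverse-Hessian inner product that classical influence functions use, up to the loss-residual factors. This is an illustration, not the paper's derivation; the random matrix below merely stands in for tangent features.

```python
import numpy as np

def posterior_precision(Phi, noise_var=0.1, prior_prec=1.0):
    """Hessian of the negative log posterior for Bayesian linear regression on features."""
    return prior_prec * np.eye(Phi.shape[1]) + Phi.T @ Phi / noise_var

def single_example_information_loss(Phi, phi_q, i, noise_var=0.1, prior_prec=1.0):
    """Closed-form entropy increase at the query when training example i is removed."""
    S = np.linalg.inv(posterior_precision(Phi, noise_var, prior_prec))
    v = phi_q @ S @ phi_q          # predictive variance with all data
    cross = phi_q @ S @ Phi[i]     # phi_q^T H^-1 phi_i, the influence-style cross term
    lev = Phi[i] @ S @ Phi[i]      # posterior variance at x_i; below noise_var thanks to the prior
    v_minus_i = v + cross**2 / (noise_var - lev)   # Sherman-Morrison rank-one downdate
    return 0.5 * np.log(v_minus_i / v), cross

def brute_force_information_loss(Phi, phi_q, i, noise_var=0.1, prior_prec=1.0):
    """Same quantity by refitting without row i, used to check the closed form."""
    def var(P):
        return phi_q @ np.linalg.solve(posterior_precision(P, noise_var, prior_prec), phi_q)
    return 0.5 * np.log(var(np.delete(Phi, i, axis=0)) / var(Phi))

rng = np.random.default_rng(1)
Phi = rng.normal(size=(40, 5))   # stand-in for tangent features
phi_q = rng.normal(size=5)
loss, cross = single_example_information_loss(Phi, phi_q, 3)
```

The closed form needs one inverse of the posterior precision for all examples, whereas the brute-force version refits once per removed example.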

Abstract

Training Data Attribution (TDA) seeks to trace model predictions back to influential training examples, enhancing interpretability and safety. We formulate TDA as a Bayesian information-theoretic problem: subsets are scored by the information loss they induce, namely the entropy increase at a query when they are removed. This criterion credits examples for resolving predictive uncertainty rather than label noise. To scale to modern networks, we approximate information loss using a Gaussian Process surrogate built from tangent features. We show this aligns with classical influence scores for single-example attribution while promoting diversity for subsets. For even larger-scale retrieval, we relax to an information-gain objective and add a variance correction for scalable attribution in vector databases. Experiments show competitive performance on counterfactual sensitivity, ground-truth retrieval and…
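The abstract relaxes information loss to an information-gain objective for large-scale retrieval. As a minimal sketch, assuming the gain is the entropy reduction at the query from conditioning on a subset alone under the same hypothetical linear-Gaussian surrogate, greedy selection shows the redundancy penalty behind subset diversity: once an example is chosen, an exact duplicate adds little. The greedy procedure, the toy two-dimensional features, the hyperparameters, and the function names are our assumptions; the paper's variance correction for vector databases is not modeled here.

```python
import numpy as np

def information_gain(Phi_S, phi_q, noise_var=0.1, prior_prec=1.0):
    """Entropy reduction at the query from conditioning on subset S alone (prior vs posterior)."""
    p = len(phi_q)
    v_prior = phi_q @ phi_q / prior_prec
    A = prior_prec * np.eye(p) + Phi_S.T @ Phi_S / noise_var
    v_post = phi_q @ np.linalg.solve(A, phi_q)
    return 0.5 * np.log(v_prior / v_post)

def greedy_select(Phi, phi_q, k, **kw):
    """Greedily add the example that maximizes IG(S + {i}), i.e. the largest marginal gain."""
    selected = []
    for _ in range(k):
        gains = np.full(len(Phi), -np.inf)
        for i in range(len(Phi)):
            if i not in selected:
                gains[i] = information_gain(Phi[selected + [i]], phi_q, **kw)
        selected.append(int(np.argmax(gains)))   # ties go to the lowest index
    return selected

Phi = np.array([[1.0, 0.0],    # a
                [1.0, 0.0],    # exact duplicate of a
                [0.0, 0.9]])   # b: covers the other feature direction
phi_q = np.array([1.0, 1.0])
single = [information_gain(Phi[[i]], phi_q) for i in range(3)]
top2 = sorted(int(j) for j in np.argsort(single)[::-1][:2])   # [0, 1]: a and its duplicate
greedy = greedy_select(Phi, phi_q, 2)                          # [0, 2]: a, then b
```

Ranking examples independently picks a and its duplicate, while greedy selection on the joint gain picks a and then b, because b reduces uncertainty along a direction a does not cover.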
