Data Attribution in Adaptive Learning

Amit Kiran Rege

arXiv:2604.04892·cs.LG·April 7, 2026

TL;DR

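This paper studies data attribution in adaptive learning, where each training observation both updates the learner and shifts the data the learner collects next. It defines a formal attribution target, shows that replay-side information cannot recover it in general, and identifies a structural class in which the target is identified from logged data.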

Contribution

It formalizes occurrence-level attribution for finite-horizon adaptive learning via a conditional interventional target, proves that replay-side information cannot recover that target in general, and identifies a structural class in which the target is identified from logged data.

Findings

01

Replay-side information cannot generally recover the attribution target.

02

A structural class exists where the target is identifiable from logged data.

03

Occurrence-level attribution for finite-horizon adaptive learning is formalized via a conditional interventional target.

Abstract

Machine learning models increasingly generate their own training data -- online bandits, reinforcement learning, and post-training pipelines for language models are leading examples. In these adaptive settings, a single training observation both updates the learner and shifts the distribution of future data the learner will collect. Standard attribution methods, designed for static datasets, ignore this feedback. We formalize occurrence-level attribution for finite-horizon adaptive learning via a conditional interventional target, prove that replay-side information cannot recover it in general, and identify a structural class in which the target is identified from logged data.
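
The feedback described in the abstract can be illustrated with a toy example. The sketch below is not the paper's formalization or its conditional interventional target; it is a minimal greedy two-armed bandit, and every name in it (`run_learner`, `replay_edit`, `reward_table`, the reward values) is a hypothetical chosen for illustration. It changes one logged reward and compares two ways of measuring the effect: editing the log while holding later actions fixed (a replay-style calculation), and rerunning the learner so that later actions can change.

```python
# Toy illustration only: a greedy two-armed bandit with a fixed reward table.
# All names and numbers are assumptions made for this sketch.

def run_learner(reward_table, horizon, override=None):
    """Run a greedy learner and return its log of (arm, reward) pairs.

    reward_table[t][a] is the reward arm `a` would pay at step `t`.
    override=(t, r) replaces the reward observed at step t with r.
    """
    sums = [0.0, 0.0]
    counts = [0, 0]
    log = []
    for t in range(horizon):
        if counts[0] == 0:
            arm = 0                      # pull each arm once first
        elif counts[1] == 0:
            arm = 1
        else:
            means = [sums[a] / counts[a] for a in (0, 1)]
            arm = 0 if means[0] >= means[1] else 1   # greedy, ties go to arm 0
        reward = reward_table[t][arm]
        if override is not None and override[0] == t:
            reward = override[1]
        sums[arm] += reward
        counts[arm] += 1
        log.append((arm, reward))
    return log


def replay_edit(log, t, new_reward):
    """Swap the reward at step t, keeping every logged action and later reward."""
    edited = list(log)
    arm, _ = edited[t]
    edited[t] = (arm, new_reward)
    return edited


def total_reward(log):
    return sum(reward for _, reward in log)


HORIZON = 6
# Arm 0 always pays 0.5 and arm 1 always pays 1.0 in this toy table.
reward_table = [(0.5, 1.0)] * HORIZON

factual = run_learner(reward_table, HORIZON)
replayed = replay_edit(factual, t=1, new_reward=0.0)
rerun = run_learner(reward_table, HORIZON, override=(1, 0.0))

replay_effect = total_reward(replayed) - total_reward(factual)
rerun_effect = total_reward(rerun) - total_reward(factual)

print("factual actions:", [arm for arm, _ in factual])
print("rerun actions:  ", [arm for arm, _ in rerun])
print("replay effect on total reward:", replay_effect)
print("rerun effect on total reward: ", rerun_effect)
```

In this toy, the replay-style edit changes the total reward by -1.0, while the rerun changes it by -3.0, because the altered observation makes the greedy learner abandon arm 1 for the rest of the horizon. The rerun also reads rewards for arm 0 at steps where the logged learner pulled arm 1, which the log itself does not contain.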
