Scalable Data Attribution via Forward-Only Test-Time Inference

Sibo Ma; Julian Nyarko

arXiv:2511.19803 · cs.LG · November 26, 2025

TL;DR

This paper introduces a scalable data attribution method that shifts computation from inference to a training-time simulation, enabling real-time attribution for large models at much lower per-query cost.

Contribution

The proposed method eliminates per-query backward passes: it simulates each training example's influence on the parameters through short-horizon gradient propagation during training, then reads out attributions for any query using only forward evaluations.
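
To make the idea of a forward-only readout concrete, here is a minimal toy sketch. It is not the authors' algorithm. It assumes a linear model with squared loss, a single gradient step per training example as the "short horizon", and a simulation started from fixed parameters rather than run during training. All function names (`simulate_step`, `forward_only_scores`, and so on) are hypothetical.

```python
# Toy sketch only: linear model, squared loss, pure Python.
# Every name here is hypothetical and chosen for illustration.

def predict(theta, x):
    # Forward pass of a linear model.
    return sum(t * xi for t, xi in zip(theta, x))

def loss(theta, x, y):
    # Squared loss on one example; needs only a forward pass.
    return 0.5 * (predict(theta, x) - y) ** 2

def grad(theta, x, y):
    # Gradient of the squared loss with respect to theta.
    residual = predict(theta, x) - y
    return [residual * xi for xi in x]

def simulate_step(theta, example, lr=0.1, steps=1):
    # Simulation phase (assumed form): apply a few gradient steps on one
    # training example and keep the resulting perturbed parameters.
    x, y = example
    params = list(theta)
    for _ in range(steps):
        g = grad(params, x, y)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params

def forward_only_scores(theta, simulated, query):
    # Readout phase: compare the query loss at the base parameters with the
    # loss at each perturbed copy. No gradient is taken for the query.
    xq, yq = query
    base = loss(theta, xq, yq)
    return [base - loss(params_i, xq, yq) for params_i in simulated]

theta = [0.0, 0.0]
train = [([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], -1.0)]
simulated = [simulate_step(theta, ex) for ex in train]  # done once, offline

query = ([1.0, 0.0], 1.0)
scores = forward_only_scores(theta, simulated, query)
# A positive score means a step on that training example lowers the query loss.
```

In this toy setup the gradient work happens once per training example, and each new query costs one forward evaluation per stored parameter copy. For a small learning rate the score approximates the learning rate times the inner product of the training and query gradients, which is one common first-order notion of influence. Whether the paper's readout takes this form is an assumption here.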

Findings

01. Matches or surpasses state-of-the-art methods on attribution metrics

02. Offers orders-of-magnitude lower inference cost

03. Applicable to large pretrained models for real-time attribution

Abstract

Data attribution seeks to trace model behavior back to the training examples that shaped it, enabling debugging, auditing, and data valuation at scale. Classical influence-function methods offer a principled foundation but remain impractical for modern networks because they require expensive backpropagation or Hessian inversion at inference. We propose a data attribution method that preserves the same first-order counterfactual target while eliminating per-query backward passes. Our approach simulates each training example's parameter influence through short-horizon gradient propagation during training and later reads out attributions for any query using only forward evaluations. This design shifts computation from inference to simulation, reflecting real deployment regimes where a model may serve billions of user queries but originate from a fixed, finite set of data sources (for…

Taxonomy

Topics: Data Quality and Management · Complex Network Analysis Techniques · Information Retrieval and Search Behavior