Efficiently Verifiable Proofs of Data Attribution

Ari Karchmer; Martin Pawelczyk; Seth Neel

arXiv:2508.10866·cs.LG·August 19, 2025

TL;DR

This paper introduces an interactive verification protocol that allows resource-constrained parties to efficiently verify data attributions provided by a powerful Prover, ensuring trustworthiness with formal guarantees and minimal computational effort.

Contribution

The paper proposes a PAC-verifiable interactive proof system for data attribution. The system is efficient and scalable, applies to linear functions, and addresses the trust problem that arises when only computationally rich parties can produce attributions.

Findings

01. Verifies data attributions with high confidence and efficiency.

02. Verification workload scales independently of dataset size.

03. Detects deviations by the Prover with high probability.

Abstract

Data attribution methods aim to answer useful counterfactual questions like "what would an ML model's prediction be if it were trained on a different dataset?" However, estimation of data attribution models through techniques like empirical influence or "datamodeling" remains very computationally expensive. This causes a critical trust issue: if only a few computationally rich parties can obtain data attributions, how can resource-constrained parties trust that the provided attributions are indeed "good," especially when they are used for important downstream applications (e.g., data pricing)? In this paper, we address this trust issue by proposing an interactive verification paradigm for data attribution. An untrusted and computationally powerful Prover learns data attributions, and then engages in an interactive proof with a resource-constrained Verifier. Our main result is a protocol…
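
To make the Prover/Verifier setup concrete, the following is a toy sketch and not the paper's protocol. It assumes the attribution is a linear function of a subset-indicator vector (one weight per training point) and that the Verifier can afford only a small, fixed number of retrainings. The names `spot_check`, `retrain_and_eval`, and `model_output`, the fair-coin subset sampling, the threshold, and the synthetic oracle are all hypothetical choices made for illustration. A plain spot check like this carries none of the formal guarantees the abstract refers to.

```python
import random


def model_output(mask, true_weights, noise, rng):
    # Hypothetical stand-in for "retrain on this subset, then measure the
    # model's prediction". Here the ground truth is linear plus Gaussian noise.
    signal = sum(w for w, keep in zip(true_weights, mask) if keep)
    return signal + rng.gauss(0.0, noise)


def spot_check(claimed_weights, retrain_and_eval, n_points, n_checks, threshold, rng):
    """Accept iff the claimed linear attribution has small empirical
    squared error on a few random training subsets."""
    total = 0.0
    for _ in range(n_checks):
        # Assumption: each training point is kept independently with prob. 1/2.
        mask = [rng.random() < 0.5 for _ in range(n_points)]
        predicted = sum(w for w, keep in zip(claimed_weights, mask) if keep)
        observed = retrain_and_eval(mask)  # one (expensive) retraining
        total += (predicted - observed) ** 2
    mse = total / n_checks
    return mse <= threshold, mse


rng = random.Random(0)
n_points = 50
true_weights = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]


def oracle(mask):
    return model_output(mask, true_weights, 0.05, rng)


# An honest Prover reports the true weights; a lazy one reports all zeros.
honest_ok, honest_mse = spot_check(true_weights, oracle, n_points, 20, 0.1, rng)
cheat_ok, cheat_mse = spot_check([0.0] * n_points, oracle, n_points, 20, 0.1, rng)
print(honest_ok, cheat_ok)
```

In this sketch the number of retrainings (`n_checks`) is fixed by the Verifier and does not depend on `n_points`, which loosely mirrors the page's claim that verification workload is independent of dataset size. How many checks are actually needed, and how the threshold should be set, are exactly the questions the paper's analysis would have to answer.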
