Efficiently Verifiable Proofs of Data Attribution
Ari Karchmer, Martin Pawelczyk, Seth Neel

TL;DR
This paper introduces an interactive verification protocol that allows resource-constrained parties to efficiently verify data attributions provided by a powerful Prover, ensuring trustworthiness with formal guarantees and minimal computational effort.
Contribution
It proposes a PAC-verifiable interactive proof system for data attribution. The system is efficient and scalable, applies to linear functions, and addresses the trust problem that arises when only well-resourced parties can compute attributions.
Findings
Verifies data attributions with high confidence and efficiency.
Verification workload scales independently of dataset size.
Detects deviations by the Prover with high probability.
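The findings above can be illustrated with a toy spot-check. This is a minimal sketch under stated assumptions, not the paper's protocol: the names `retrain_and_predict`, `spot_check`, `num_checks`, and `threshold` are hypothetical, and a noisy linear function stands in for an actual model retraining. It only shows why a Verifier that samples a fixed number of random training subsets can estimate the error of claimed linear attributions without its number of retrainings growing with the dataset size.

```python
import numpy as np


def retrain_and_predict(mask, true_w, rng, noise=0.05):
    # Hypothetical stand-in for one model retraining on the subset `mask`.
    # Here the "model output" is simply a noisy linear function of the mask.
    return float(mask @ true_w + noise * rng.standard_normal())


def spot_check(claimed_w, true_w, num_checks, threshold, rng):
    """Estimate the MSE of claimed linear attributions on random subsets.

    The number of retrainings is `num_checks`, chosen independently of the
    dataset size n = len(claimed_w).
    """
    n = claimed_w.shape[0]
    sq_errs = []
    for _ in range(num_checks):
        # Include each training point independently with probability 1/2.
        mask = (rng.random(n) < 0.5).astype(float)
        observed = retrain_and_predict(mask, true_w, rng)
        predicted = float(mask @ claimed_w)
        sq_errs.append((observed - predicted) ** 2)
    est_mse = float(np.mean(sq_errs))
    return est_mse <= threshold, est_mse


rng = np.random.default_rng(0)
n = 2000  # dataset size; the Verifier's work below does not depend on it
true_w = rng.standard_normal(n) / np.sqrt(n)

honest_w = true_w.copy()   # Prover reports accurate attributions
cheating_w = np.zeros(n)   # Prover reports uninformative attributions

honest_ok, honest_mse = spot_check(honest_w, true_w, 50, 0.05, rng)
cheat_ok, cheat_mse = spot_check(cheating_w, true_w, 50, 0.05, rng)
print(honest_ok, cheat_ok)
```

In this toy setting the honest attributions pass and the uninformative ones are rejected after 50 simulated retrainings, regardless of `n`. The formal completeness and soundness guarantees in the paper require the interactive protocol itself, which this sketch does not reproduce.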
Abstract
Data attribution methods aim to answer useful counterfactual questions like "what would an ML model's prediction be if it were trained on a different dataset?" However, estimation of data attribution models through techniques like empirical influence or "datamodeling" remains very computationally expensive. This causes a critical trust issue: if only a few computationally rich parties can obtain data attributions, how can resource-constrained parties trust that the provided attributions are indeed "good," especially when they are used for important downstream applications (e.g., data pricing)? In this paper, we address this trust issue by proposing an interactive verification paradigm for data attribution. An untrusted and computationally powerful Prover learns data attributions, and then engages in an interactive proof with a resource-constrained Verifier. Our main result is a protocol…
