On the Usage of Gaussian Process for Efficient Data Valuation
Clément Bénesse, Patrick Mesana, Athénaïs Gautier, Sébastien Gambs

TL;DR
This paper introduces a Bayesian approach using Gaussian Processes for efficient data valuation in machine learning, enabling fast and theoretically grounded impact analysis of data points on model training.
Contribution
It presents a novel canonical decomposition of data valuation methods and leverages Gaussian Processes for practical and rapid utility estimation on sub-models.
Findings
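Efficient Gaussian Process update formulae make valuations fast to estimate.
The approach is theoretically grounded in Bayesian theory.
Gaussian Processes give easy access to the utility function on sub-models, i.e. models trained on a subset of the training set.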
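To make the idea of a fast update concrete, here is a minimal sketch of one well-known shortcut in Gaussian Process regression: the closed-form leave-one-out predictive mean, which yields the prediction of every "all points but one" sub-model from a single matrix inverse. Whether the paper relies on this exact identity is not stated in this summary; the kernel, noise level, toy data, and all function names below are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_mean(x_train, y_train, x_test, noise=0.1):
    """Posterior mean of a zero-mean GP regression model (noise is a variance)."""
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    return rbf_kernel(x_test, x_train) @ np.linalg.solve(k, y_train)


def loo_means_fast(x, y, noise=0.1):
    """All n leave-one-out predictive means from a single matrix inverse."""
    k_inv = np.linalg.inv(rbf_kernel(x, x) + noise * np.eye(len(x)))
    return y - (k_inv @ y) / np.diag(k_inv)


def loo_means_naive(x, y, noise=0.1):
    """Same quantities, refitting the GP n times (for comparison only)."""
    keep = ~np.eye(len(x), dtype=bool)  # row i masks out point i
    return np.array([gp_mean(x[m], y[m], x[i:i + 1], noise)[0]
                     for i, m in enumerate(keep)])


# Hypothetical toy data: noisy samples of a sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 8)
y = np.sin(x) + 0.1 * rng.standard_normal(8)
fast, naive = loo_means_fast(x, y), loo_means_naive(x, y)
```

The two results agree up to floating-point error, but the fast version never refits the model, which is the kind of saving that matters when a valuation method needs the utility of many sub-models.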
Abstract
In machine learning, knowing the impact of a given datum on model training is a fundamental task referred to as Data Valuation. Building on previous works from the literature, we have designed a novel canonical decomposition allowing practitioners to analyze any data valuation method as the combination of two parts: a utility function that captures characteristics from a given model and an aggregation procedure that merges such information. We also propose to use Gaussian Processes as a means to easily access the utility function on "sub-models", which are models trained on a subset of the training set. The strength of our approach stems from both its theoretical grounding in Bayesian theory, and its practical reach, by enabling fast estimation of valuations thanks to efficient update formulae.
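The two-part decomposition described in the abstract can be illustrated with a small sketch. It is not the paper's formalism: it simply treats a utility function as a map from a subset of training indices to a score, and shows two familiar aggregation procedures (leave-one-out and the exact Shapley value) that consume the same utility. The toy additive utility and all names below are hypothetical.

```python
from itertools import combinations
from math import comb


def loo_aggregate(utility, n):
    """Leave-one-out: value of i is u(all data) - u(all data without i)."""
    full = frozenset(range(n))
    return [utility(full) - utility(full - {i}) for i in range(n)]


def shapley_aggregate(utility, n):
    """Exact Shapley value: weighted average of marginal contributions."""
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # k = size of the subset that i joins
            weight = 1.0 / (n * comb(n - 1, k))
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (utility(s | {i}) - utility(s))
        values.append(total)
    return values


# Hypothetical toy utility: each datum contributes a fixed amount.
QUALITY = [0.5, 1.0, -0.25]


def toy_utility(subset):
    """Stand-in for 'train a sub-model on subset and score it'."""
    return sum(QUALITY[i] for i in subset)


loo_values = loo_aggregate(toy_utility, len(QUALITY))
shapley_values = shapley_aggregate(toy_utility, len(QUALITY))
```

Because the toy utility is additive, both aggregations recover each datum's own contribution. With a real utility, such as the validation score of a sub-model, they generally differ, and the exact Shapley loop needs a number of utility evaluations that grows exponentially with the dataset size, which is why cheap access to sub-model utilities matters.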
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Bayesian Modeling and Causal Inference · Forecasting Techniques and Applications
