LAVA: Data Valuation without Pre-Specified Learning Algorithms
Hoang Anh Just, Feiyang Kang, Jiachen T. Wang, Yi Zeng, Myeongseob Ko, Ming Jin, Ruoxi Jia

TL;DR
LAVA is a learning-algorithm-agnostic data valuation framework based on Wasserstein distance. It assesses data value quickly and without retraining models, which suits use cases such as prioritizing data sources and pricing data in a marketplace.
Contribution
The paper proposes a novel, learning-agnostic data valuation method based on class-wise Wasserstein distance, bypassing the need for model retraining and dependence on specific algorithms.
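To make the idea of a class-wise Wasserstein distance concrete, the sketch below builds a point-to-point cost between labeled training and validation samples. It assumes a cost of the form "feature distance plus a weighted Wasserstein distance between the class-conditional feature distributions of the two labels", and it uses 1-D features so the Wasserstein distance has a closed form. The function names, the weight `c`, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from bisect import bisect_right
from collections import defaultdict


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two 1-D empirical distributions.

    Computed as the integral of |F(t) - G(t)| over t, where F and G are
    the empirical CDFs of xs and ys.
    """
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        fx = bisect_right(xs, left) / len(xs)  # CDF of xs on [left, right)
        fy = bisect_right(ys, left) / len(ys)  # CDF of ys on [left, right)
        total += abs(fx - fy) * (right - left)
    return total


def group_by_label(data):
    """Map each label to the list of features carrying that label."""
    groups = defaultdict(list)
    for x, y in data:
        groups[y].append(x)
    return groups


def classwise_cost(train, val, c=1.0):
    """cost[i][j] = |x_i - x_j| + c * W(train features of y_i, val features of y_j).

    The additive form and the weight c are assumptions made for this sketch.
    """
    tr, va = group_by_label(train), group_by_label(val)
    label_dist = {(a, b): wasserstein_1d(tr[a], va[b]) for a in tr for b in va}
    return [
        [abs(x - xv) + c * label_dist[(y, yv)] for xv, yv in val]
        for x, y in train
    ]


# Toy data as (feature, label) pairs; the last training point looks like
# class 1 in feature space but carries label 0.
train = [(0.0, 0), (0.2, 0), (1.0, 1), (1.2, 1), (1.1, 0)]
val = [(0.1, 0), (0.9, 1), (1.1, 1), (0.0, 0)]
cost = classwise_cost(train, val)
```

Because the label term depends only on the data, the cost matrix can be computed without training any model. In this toy setup the mislabeled point pays a larger cost to reach a nearby class-1 validation point than a correctly labeled neighbor does.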
Findings
Reports significant performance improvements over state-of-the-art data valuation methods.
Enables fast data valuation directly from optimization solver outputs.
Effectively detects low-quality data in various scenarios.
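The findings above mention reading data values directly from the output of an optimization solver. One way to picture this is sketched below: solve an entropy-regularized optimal transport problem between training and validation points, take the dual potentials on the training side, and center each one against the average of the others. The centering formula (often called a calibrated gradient), the Sinkhorn solver, the regularization strength `eps`, and the toy cost matrix are all assumptions chosen for illustration, so treat this as a rough sketch rather than the paper's exact procedure.

```python
import math


def logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def sinkhorn_duals(cost, eps=0.1, iters=500):
    """Log-domain Sinkhorn with uniform marginals; returns dual potentials (f, g)."""
    n, m = len(cost), len(cost[0])
    log_a, log_b = -math.log(n), -math.log(m)
    f, g = [0.0] * n, [0.0] * m
    for _ in range(iters):
        # Update the training-side potential, then the validation-side one.
        f = [
            -eps * logsumexp([log_b + (g[j] - cost[i][j]) / eps for j in range(m)])
            for i in range(n)
        ]
        g = [
            -eps * logsumexp([log_a + (f[i] - cost[i][j]) / eps for i in range(n)])
            for j in range(m)
        ]
    return f, g


def calibrated_gradients(f):
    """f_i minus the mean of the other potentials (assumed form for this sketch)."""
    n = len(f)
    total = sum(f)
    return [fi - (total - fi) / (n - 1) for fi in f]


# Hypothetical cost matrix: rows are training points, columns are validation
# points. The third training point is far from every validation point.
toy_cost = [
    [0.0, 1.0],
    [1.0, 0.0],
    [5.0, 5.0],
]
f, g = sinkhorn_duals(toy_cost)
scores = calibrated_gradients(f)
worst = max(range(len(scores)), key=scores.__getitem__)
```

Under this convention a larger score means that putting more mass on the point would increase the distance to the validation set, so the highest-scoring points are the candidates for low-quality data. Here `worst` picks out the third training point, and no model was retrained to get that ranking.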
Abstract
Traditionally, data valuation (DV) is posed as a problem of equitably splitting the validation performance of a learning algorithm among the training data. As a result, the calculated data values depend on many design choices of the underlying learning algorithm. However, this dependence is undesirable for many DV use cases, such as setting priorities over different data sources in a data acquisition process and informing pricing mechanisms in a data marketplace. In these scenarios, data needs to be valued before the actual analysis and the choice of the learning algorithm is still undetermined then. Another side-effect of the dependence is that to assess the value of individual points, one needs to re-run the learning algorithm with and without a point, which incurs a large computation burden. This work leapfrogs over the current limits of data valuation methods by introducing a new…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques
