Data Distribution Valuation Using Generalized Bayesian Inference
Cuong N. Nguyen, Cuong V. Nguyen

TL;DR
This paper introduces Generalized Bayes Valuation, a framework that quantifies the value of data distributions from their samples and applies to practical problems such as annotator evaluation and data augmentation.
Contribution
The paper develops a unified approach to data distribution valuation based on generalized Bayesian inference, with a loss constructed from transferability measures, and extends the approach to the continuous data stream setting.
Findings
The framework values data distributions effectively in different real-world scenarios.
An extension to the continuous data stream setting broadens the framework's applicability.
Experimental results confirm the framework's effectiveness and efficiency.
Abstract
We investigate the data distribution valuation problem, which aims to quantify the values of data distributions from their samples. This recently proposed problem is related to, but different from, classical data valuation, and it arises in a variety of applications. For this problem, we develop a novel framework called Generalized Bayes Valuation that utilizes generalized Bayesian inference with a loss constructed from transferability measures. This framework allows us to solve, in a unified way, seemingly unrelated practical problems, such as annotator evaluation and data augmentation. Using Bayesian principles, we further enhance the applicability of our framework by extending it to the continuous data stream setting. Our experimental results confirm the effectiveness and efficiency of our framework in different real-world scenarios.
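To make the idea of generalized Bayesian inference concrete, the following is a minimal sketch, not the paper's implementation. It assumes the standard Gibbs-posterior form, in which the posterior weight of each candidate is proportional to its prior weight times exp(-eta * loss). The three candidate data sources, the loss values, the learning rate `eta`, and the function name `gibbs_posterior` are all hypothetical; the paper constructs its loss from transferability measures, which are not reproduced here. The sketch also illustrates why such an update suits a data stream: applying it batch by batch gives the same result as applying it once to the summed losses.

```python
import numpy as np


def gibbs_posterior(prior, losses, eta=1.0):
    """Generalized Bayes (Gibbs posterior) update over a finite set of candidates.

    posterior[k] is proportional to prior[k] * exp(-eta * losses[k]).
    `eta` is a hypothetical learning-rate parameter, not a value from the paper.
    """
    log_post = np.log(np.asarray(prior, dtype=float)) - eta * np.asarray(losses, dtype=float)
    log_post -= log_post.max()  # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()


# Hypothetical setup: three candidate data sources with a uniform prior.
prior = np.full(3, 1.0 / 3.0)

# Placeholder per-source losses for two incoming batches (rows = batches).
# In the paper the loss comes from transferability measures; these numbers are made up.
batch_losses = np.array([
    [0.2, 1.0, 0.5],
    [0.3, 0.9, 0.6],
])

# One-shot update using the total loss over all batches.
post_batch = gibbs_posterior(prior, batch_losses.sum(axis=0))

# Streaming update: the posterior after each batch becomes the next prior.
post_stream = prior.copy()
for losses in batch_losses:
    post_stream = gibbs_posterior(post_stream, losses)

print(post_batch)
print(post_stream)
```

Under these assumptions, the source with the lowest cumulative loss receives the highest posterior weight, and the streaming and one-shot posteriors agree up to floating-point error.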
