Unifying and Optimizing Data Values for Selection via Sequential-Decision-Making
Hongliang Chi, Qiong Wu, Zhengyi Zhou, Jonathan Light, Emily Dodwell, Yao Ma

TL;DR
This paper reformulates data selection as a sequential decision process, unifies existing valuation methods like Data Shapley, and introduces an efficient approximation scheme with learned utility models, improving selection effectiveness.
Contribution
It introduces a sequential decision-making framework for data valuation, unifies existing methods, and proposes a scalable approximation approach using learned bipartite graphs.
Findings
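The framework unifies existing data valuation methods such as Data Shapley, reinterpreting them as myopic reward function approximations within approximate dynamic programming.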
The approximation scheme with learned utility models is effective.
Extensive experiments validate the approach across datasets.
Abstract
Data selection has emerged as a crucial downstream application of data valuation. While existing data valuation methods have shown promise in selection tasks, the theoretical foundations and full potential of using data values for selection remain largely unexplored. In this work, we first demonstrate that data values applied for selection can be naturally reformulated as a sequential-decision-making problem, where the optimal data value can be derived through dynamic programming. We show this framework unifies and reinterprets existing methods like Data Shapley through the lens of approximate dynamic programming, specifically as myopic reward function approximations to this sequential problem. Furthermore, we analyze how sequential data selection optimality is affected when the ground-truth utility function exhibits monotonic submodularity with curvature. To address the computational…
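To make the sequential view concrete, the sketch below compares a Shapley-based ranking with an exhaustive dynamic program on a four-point toy problem. Everything in it is an illustrative assumption rather than the paper's setup: the coverage utility in `COVER`, the objective (the sum of utilities over all selection prefixes, i.e. the area under the selection curve), and the helper names `shapley_values`, `best_future`, and `curve_area`.

```python
from functools import lru_cache
from itertools import combinations
from math import factorial

# Hypothetical toy data: each point "covers" some items.
# The utility of a subset is the number of distinct items it covers
# (monotone and submodular by construction).
COVER = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"a", "b", "c"}, 3: {"d"}}
POINTS = tuple(sorted(COVER))
N = len(POINTS)


def utility(subset):
    covered = set()
    for i in subset:
        covered |= COVER[i]
    return len(covered)


def shapley_values():
    """Exact Data Shapley values by enumerating every subset (toy sizes only)."""
    values = {}
    for i in POINTS:
        others = [j for j in POINTS if j != i]
        total = 0.0
        for k in range(N):
            weight = factorial(k) * factorial(N - k - 1) / factorial(N)
            for s in combinations(others, k):
                total += weight * (utility(s + (i,)) - utility(s))
        values[i] = total
    return values


@lru_cache(maxsize=None)
def best_future(selected):
    """Bellman recursion over subsets (assumed objective, see lead-in).

    Returns (best achievable sum of utilities of all later prefixes,
    the order of remaining points that achieves it).
    """
    remaining = [i for i in POINTS if i not in selected]
    if not remaining:
        return 0.0, ()
    best = None
    for i in remaining:
        nxt = frozenset(selected | {i})
        future, order = best_future(nxt)
        candidate = (utility(nxt) + future, (i,) + order)
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


def curve_area(order):
    """Sum of utilities over every prefix of a selection order."""
    return sum(utility(order[:k]) for k in range(1, len(order) + 1))


shap = shapley_values()
shap_order = tuple(sorted(POINTS, key=lambda i: -shap[i]))
opt_value, opt_order = best_future(frozenset())
print("Shapley order:", shap_order, "area:", curve_area(shap_order))
print("DP order:     ", opt_order, "area:", opt_value)
```

The dynamic program enumerates every subset, so it is only feasible for a handful of points. By construction its ordering can never score lower than the Shapley ordering under this assumed objective, which is one way to read the abstract's description of Shapley-style values as myopic approximations to the sequential problem.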
Taxonomy
Topics: Advanced Statistical Process Monitoring · Multi-Criteria Decision Making · Big Data and Business Intelligence
