CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification
Stephanie Schoch, Haifeng Xu, Yangfeng Ji

TL;DR
This paper introduces CS-Shapley, a data valuation method for classification that distinguishes training instances that help their own class from those that harm it. It improves performance on noisy label detection and data removal tasks, and the resulting data values transfer across models.
Contribution
The paper proposes CS-Shapley, a Shapley-based data valuation method whose new value function separately evaluates each training instance's in-class and out-of-class contributions in classification tasks.
Findings
CS-Shapley outperforms existing methods in noisy label detection.
The proposed value function is shown theoretically to be unique under certain properties.
Data values are transferable across different classifiers.
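As a rough illustration of how data values are commonly used for noisy label detection, the sketch below flags the lowest-valued training instances as suspected label noise and measures how many known noisy instances were caught. The function names and the toy numbers are hypothetical and are not taken from the paper; the paper's actual evaluation protocol may differ.

```python
def flag_lowest_valued(values, k):
    """Return the indices of the k lowest-valued training instances."""
    return sorted(range(len(values)), key=lambda i: values[i])[:k]


def detection_recall(flagged, noisy_indices):
    """Fraction of the known noisy instances that appear among the flagged ones."""
    noisy = set(noisy_indices)
    return len(noisy & set(flagged)) / len(noisy)


# Hypothetical toy data values (NOT results from the paper).
toy_values = [0.30, -0.05, 0.12, 0.01, 0.22]
toy_noisy = [1, 3]  # assumed ground-truth indices of mislabeled instances

flagged = flag_lowest_valued(toy_values, k=2)
recall = detection_recall(flagged, toy_noisy)
```

A valuation method is better suited to this task the more noisy instances it places at the bottom of the ranking for a given inspection budget `k`.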
Abstract
Data valuation, or the valuation of individual datum contributions, has seen growing interest in machine learning due to its demonstrable efficacy for tasks such as noisy label detection. In particular, due to the desirable axiomatic properties, several Shapley value approximation methods have been proposed. In these methods, the value function is typically defined as the predictive accuracy over the entire development set. However, this limits the ability to differentiate between training instances that are helpful or harmful to their own classes. Intuitively, instances that harm their own classes may be noisy or mislabeled and should receive a lower valuation than helpful instances. In this work, we propose CS-Shapley, a Shapley value with a new value function that discriminates between training instances' in-class and out-of-class contributions. Our theoretical analysis shows the…
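The baseline setup described in the abstract can be sketched as follows: a Monte Carlo permutation estimate of Shapley values in which the value function is predictive accuracy over the whole development set. The 1-D nearest-centroid classifier, the toy data, and all function names are assumptions made for illustration. The helper `class_split_accuracy` only shows how in-class and out-of-class accuracy can be measured separately; it does not reproduce the CS-Shapley value function, which this summary does not specify.

```python
import random
from collections import defaultdict


def nearest_centroid_predict(train, x):
    """Predict a label for scalar x with a 1-D nearest-centroid classifier."""
    sums, counts = defaultdict(float), defaultdict(int)
    for xi, yi in train:
        sums[yi] += xi
        counts[yi] += 1
    if not counts:
        return None  # no training data: never matches a true label
    return min(counts, key=lambda c: abs(sums[c] / counts[c] - x))


def dev_accuracy(train, dev):
    """Baseline value function: accuracy over the entire development set."""
    correct = sum(nearest_centroid_predict(train, x) == y for x, y in dev)
    return correct / len(dev)


def class_split_accuracy(train, dev, c):
    """Accuracy on dev examples of class c and on all other dev examples.

    Illustrative only: each accuracy is normalised by its own subset size,
    which is an assumption, not the paper's definition.
    """
    in_cls = [(x, y) for x, y in dev if y == c]
    out_cls = [(x, y) for x, y in dev if y != c]
    return dev_accuracy(train, in_cls), dev_accuracy(train, out_cls)


def monte_carlo_shapley(train, dev, value_fn, num_permutations=200, seed=0):
    """Average each instance's marginal contribution over random orderings."""
    rng = random.Random(seed)
    values = [0.0] * len(train)
    for _ in range(num_permutations):
        order = list(range(len(train)))
        rng.shuffle(order)
        subset, prev = [], value_fn([], dev)
        for i in order:
            subset.append(train[i])
            cur = value_fn(subset, dev)
            values[i] += cur - prev  # marginal contribution of instance i
            prev = cur
    return [v / num_permutations for v in values]


# Toy data (hypothetical): the last training point is deliberately mislabeled.
train = [(0.0, 0), (0.2, 0), (1.0, 1), (1.2, 1), (1.1, 0)]
dev = [(0.1, 0), (0.3, 0), (0.9, 1), (1.3, 1)]
values = monte_carlo_shapley(train, dev, dev_accuracy)
```

Because every permutation's marginal contributions telescope, the estimated values sum to the accuracy of the full training set minus that of the empty set. A single overall accuracy cannot show whether an instance helped its own class or only the other classes, which is the limitation the class-wise value function is meant to address.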
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Advanced Statistical Methods and Models
