CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification

Stephanie Schoch; Haifeng Xu; Yangfeng Ji

arXiv:2211.06800 · cs.LG · November 15, 2022 · 5 cites

TL;DR

This paper introduces CS-Shapley, a data valuation method for classification that distinguishes training instances that help their own class from those that harm it. It improves noisy label detection and data removal, and the resulting data values transfer across models.

Contribution

The paper proposes CS-Shapley, a Shapley-based data valuation method whose new value function evaluates a training instance's in-class and out-of-class contributions separately in classification tasks.

Findings

01. CS-Shapley outperforms existing methods in noisy label detection.

02. The value function is theoretically unique under certain properties.

03. Data values are transferable across different classifiers.

Abstract

Data valuation, or the valuation of individual datum contributions, has seen growing interest in machine learning due to its demonstrable efficacy for tasks such as noisy label detection. In particular, due to the desirable axiomatic properties, several Shapley value approximation methods have been proposed. In these methods, the value function is typically defined as the predictive accuracy over the entire development set. However, this limits the ability to differentiate between training instances that are helpful or harmful to their own classes. Intuitively, instances that harm their own classes may be noisy or mislabeled and should receive a lower valuation than helpful instances. In this work, we propose CS-Shapley, a Shapley value with a new value function that discriminates between training instances' in-class and out-of-class contributions. Our theoretical analysis shows the…
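The baseline setup the abstract describes, Shapley values with development-set accuracy as the value function, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the 1-nearest-neighbour classifier, the toy data, and the helper names (`predict`, `accuracy`, `class_split_accuracy`, `shapley`) are all assumptions made for the example. The sketch computes exact Shapley values by enumerating permutations, which is only feasible for tiny training sets; the in-class/out-of-class split shown here is one plausible decomposition of accuracy and is not claimed to be the paper's value function.

```python
from itertools import permutations

def predict(train, x):
    """1-nearest-neighbour on scalar features; None if the training set is empty."""
    if not train:
        return None
    return min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(train, dev):
    """Baseline value function v(S): accuracy over the entire development set."""
    return sum(predict(train, x) == y for x, y in dev) / len(dev)

def class_split_accuracy(train, dev, c):
    """Illustrative split of accuracy into in-class and out-of-class parts for
    class c, both normalized by |dev| so that they sum to overall accuracy.
    (Assumed decomposition for illustration only.)"""
    in_c = sum(predict(train, x) == y for x, y in dev if y == c) / len(dev)
    out_c = sum(predict(train, x) == y for x, y in dev if y != c) / len(dev)
    return in_c, out_c

def shapley(train, dev, value=accuracy):
    """Exact Shapley values: average marginal gain over all orderings."""
    n = len(train)
    totals = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        subset, prev = [], value([], dev)
        for i in order:
            subset.append(train[i])
            cur = value(subset, dev)
            totals[i] += cur - prev  # marginal contribution of point i
            prev = cur
    return [t / len(orders) for t in totals]

# Toy data: (feature, label). The last training point is deliberately mislabeled.
train = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1), (0.5, 1)]
dev = [(0.2, 0), (0.8, 0), (4.2, 1), (4.8, 1)]

values = shapley(train, dev)
for point, v in zip(train, values):
    print(point, round(v, 4))
```

In this toy setup the mislabeled point (index 4) receives the lowest value, and the values sum to the accuracy of the full training set, as the efficiency axiom requires. Because plain accuracy collapses everything into one number, it cannot say which class a point helps or hurts; `class_split_accuracy` hints at the kind of per-class signal a class-wise value function can draw on.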

Code & Models

Videos

CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification · SlidesLive

Taxonomy

Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Advanced Statistical Methods and Models