Data Shapley Value for Handling Noisy Labels: An application in Screening COVID-19 Pneumonia from Chest CT Scans
Nastaran Enshaei, Moezedin Javad Rafiee, Arash Mohammadi, Farnoosh Naderkhani

TL;DR
This paper explores how the Data Shapley Value can be used to identify noisy labels in training data for COVID-19 CT scan classification, revealing that the choice of evaluation metric significantly impacts its effectiveness.
Contribution
It provides a comparative analysis of how different evaluation metrics affect the Data Shapley Value's ability to detect noisy labels in medical imaging data.
Findings
Data SV can effectively identify noisy labels in COVID-19 CT data.
The effectiveness of SV in detecting noisy labels varies with the evaluation metric used.
Different evaluation metrics significantly influence the importance scores assigned by SV.
Abstract
A long-standing challenge of deep learning models involves how to handle noisy labels, especially in applications where human lives are at stake. Adoption of the data Shapley Value (SV), a cooperative game theoretical approach, is an intelligent valuation solution to tackle the issue of noisy labels. Data SV can be used together with a learning model and an evaluation metric to validate each training point's contribution to the model's performance. The SV of a data point, however, is not unique and depends on the learning model, the evaluation metric, and other data points collaborating in the training game. Yet the effects of utilizing different evaluation metrics for computing the SV, detecting noisy labels, and measuring the data points' importance have not been thoroughly investigated. In this context, we performed a series of comparative analyses to assess SV's…
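To make the idea in the abstract concrete, here is a minimal sketch of data SV on a toy problem. It is not the authors' implementation: the 1-D nearest-centroid classifier, the toy data, and the names `fit_predict`, `accuracy`, and `data_shapley` are all assumptions chosen for illustration. It computes exact values by enumerating every ordering of the training points, which is only feasible for a handful of points; practical work on image data would rely on sampling-based approximations.

```python
import itertools

def fit_predict(train, xs):
    """Hypothetical stand-in model: nearest-centroid classifier on 1-D features."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(x)
    if not by_label:
        return [0 for _ in xs]  # no training data: default to class 0
    centroids = {y: sum(v) / len(v) for y, v in by_label.items()}
    # sorted() makes ties deterministic (the smaller label wins)
    return [min(sorted(centroids), key=lambda y: abs(x - centroids[y])) for x in xs]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def data_shapley(train, val, metric):
    """Exact SV: average each point's marginal gain over all orderings."""
    val_x = [x for x, _ in val]
    val_y = [y for _, y in val]
    values = [0.0] * len(train)
    perms = list(itertools.permutations(range(len(train))))
    for perm in perms:
        subset = []
        prev = metric(val_y, fit_predict(subset, val_x))
        for i in perm:
            subset.append(train[i])
            cur = metric(val_y, fit_predict(subset, val_x))
            values[i] += cur - prev  # marginal contribution of point i
            prev = cur
    return [v / len(perms) for v in values]

# Toy data (assumed): class 0 near x=0, class 1 near x=10.
# The last training point sits near x=0 but is labeled 1, i.e. a noisy label.
train = [(0, 0), (1, 0), (9, 1), (10, 1), (0.5, 1)]
val = [(0, 0), (1, 0), (2, 0), (8, 1), (9, 1), (10, 1)]

values = data_shapley(train, val, accuracy)
for point, v in zip(train, values):
    print(point, round(v, 4))
```

In this toy set the mislabeled point receives the lowest value of the five. The `metric` argument is the piece the paper varies: passing a different scoring function changes the utility of every subset and therefore the resulting values.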
Taxonomy
Topics: COVID-19 diagnosis using AI · Imbalanced Data Classification Techniques · Machine Learning and Data Classification
