Data Valuation with Gradient Similarity
Nathaniel J. Evans, Gordon B. Mills, Guanming Wu, Xubo Song, Shannon McWeeney

TL;DR
This paper introduces DVGS, a simple and scalable data valuation method based on gradient similarity, which effectively identifies low-quality data across various domains, improving data cleaning and model performance.
Contribution
The paper proposes a novel data valuation method, DVGS, that is easy to apply, scalable, and performs well in identifying noisy or mislabeled data compared to existing methods.
Findings
DVGS performs comparably to, or better than, baseline methods.
It effectively identifies mislabeled and noisy data.
The method is applicable across diverse datasets and domains.
Abstract
High-quality data is crucial for accurate machine learning and actionable analytics; however, mislabeled or noisy data is a common problem in many domains. Distinguishing low- from high-quality data can be challenging, often requiring expert knowledge and considerable manual intervention. Data Valuation algorithms are a class of methods that seek to quantify the value of each sample in a dataset based on its contribution or importance to a given predictive task. These data values have shown an impressive ability to identify mislabeled observations, and filtering low-value data can boost machine learning performance. In this work, we present a simple alternative to existing methods, termed Data Valuation with Gradient Similarity (DVGS). This approach can be easily applied to any gradient descent learning algorithm, scales well to large datasets, and performs comparably or better than…
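The abstract above is truncated and does not spell out the procedure, so the following is only a minimal sketch of the general idea of valuing data by gradient similarity, not the authors' implementation. It assumes that a sample's value is the cosine similarity between its own loss gradient and the mean loss gradient of a small trusted target set, averaged over the steps of a gradient descent run. The logistic regression model, the toy data, and every function name are hypothetical choices made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(w, x, y):
    """Gradient of the logistic loss for one sample (x, y) w.r.t. weights w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]

def mean_grad(w, data):
    """Average per-sample gradient over a dataset."""
    grads = [grad(w, x, y) for x, y in data]
    return [sum(col) / len(grads) for col in zip(*grads)]

def cosine(a, b):
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = math.sqrt(sum(ai * ai for ai in a))
    nb = math.sqrt(sum(bi * bi for bi in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)

def gradient_similarity_values(train, target, steps=50, lr=0.5):
    """Assumed scoring rule: average, over training steps, of the cosine
    similarity between each training sample's gradient and the target gradient."""
    w = [0.0] * len(train[0][0])
    totals = [0.0] * len(train)
    for _ in range(steps):
        g_target = mean_grad(w, target)
        for i, (x, y) in enumerate(train):
            totals[i] += cosine(grad(w, x, y), g_target)
        # Plain full-batch gradient descent step on the (noisy) training set.
        g_train = mean_grad(w, train)
        w = [wi - lr * gi for wi, gi in zip(w, g_train)]
    return [t / steps for t in totals]

# Toy data: each x is (feature, bias term); the true label is 1 when feature > 0.
train = [
    ((2.0, 1.0), 1),
    ((1.5, 1.0), 1),
    ((-2.0, 1.0), 0),
    ((-1.5, 1.0), 0),
    ((1.8, 1.0), 0),  # deliberately mislabeled sample
]
target = [((1.0, 1.0), 1), ((-1.0, 1.0), 0), ((2.5, 1.0), 1), ((-2.5, 1.0), 0)]

values = gradient_similarity_values(train, target)
lowest = min(range(len(values)), key=values.__getitem__)
```

In this toy setup the mislabeled sample's gradient points against the target gradient at every step, so it receives a negative value while the correctly labeled samples receive positive ones. Sorting by `values` and inspecting or dropping the lowest-scoring samples is one way such scores could drive data cleaning.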
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Cell Image Analysis Techniques
