Cross-Validation Is All You Need: A Statistical Approach To Label Noise Estimation
Jianan Chen, Vishwesh Ramanathan, Tony Xu, Anne L. Martel

TL;DR
This paper introduces two novel algorithms, ReCoV and fastReCoV, for detecting label noise in machine learning and demonstrates their effectiveness across multiple models and tasks, including challenging survival analysis settings.
Contribution
The paper presents the first label noise detection methods specifically designed for survival analysis, using fluctuations in cross-validation performance to identify noisy labels.
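These methods rest on the premise that label noise inflates fold-to-fold variation in cross-validation scores. The abstract says the authors tested this with Monte-Carlo experiments. The small simulation below illustrates the same idea, but it is a sketch under our own assumptions and not the authors' experimental setup. It uses a toy one-dimensional two-class problem, a nearest-centroid classifier, accuracy as the fold metric, and labels flipped uniformly at random. The function names `nearest_centroid_accuracy` and `simulate_fold_spread` are hypothetical.

```python
import random
from statistics import mean, pstdev


def nearest_centroid_accuracy(train, test):
    """Accuracy of a 1-D nearest-centroid classifier fit on `train`, evaluated on `test`.
    Examples are (feature, label) pairs with label in {0, 1}."""
    centroids = {}
    for label in (0, 1):
        xs = [x for x, y in train if y == label]
        centroids[label] = mean(xs) if xs else 0.0
    hits = sum(
        min((0, 1), key=lambda c: abs(x - centroids[c])) == y for x, y in test
    )
    return hits / len(test)


def simulate_fold_spread(noise_rate, n_per_class=30, n_folds=5, n_trials=200, seed=0):
    """Monte-Carlo estimate of the average spread (population std. dev.) of per-fold
    CV accuracy. Toy assumption: class 0 ~ N(0, 1), class 1 ~ N(3, 1), and each
    observed label is flipped with probability `noise_rate`."""
    rng = random.Random(seed)
    spreads = []
    for _ in range(n_trials):
        data = []
        for label, centre in ((0, 0.0), (1, 3.0)):
            for _ in range(n_per_class):
                observed = 1 - label if rng.random() < noise_rate else label
                data.append((rng.gauss(centre, 1.0), observed))
        rng.shuffle(data)
        folds = [data[k::n_folds] for k in range(n_folds)]
        accs = []
        for k in range(n_folds):
            # Train on every fold except k, test on fold k (observed, possibly noisy labels).
            train = [ex for j, fold in enumerate(folds) if j != k for ex in fold]
            accs.append(nearest_centroid_accuracy(train, folds[k]))
        spreads.append(pstdev(accs))
    return mean(spreads)


clean = simulate_fold_spread(0.0)
noisy = simulate_fold_spread(0.2)
print(f"mean std of fold accuracy: clean={clean:.3f}, 20% noise={noisy:.3f}")
```

With these settings the noisy run shows a larger average spread across folds than the clean one. The exact values depend on the seed and on the toy distributions chosen here.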
Findings
ReCoV and fastReCoV outperform existing noise detection methods.
Both algorithms are effective across multiple data modalities and models.
ReCoV is robust to model choice, while fastReCoV trades some robustness for greater computational efficiency.
Abstract
Machine learning models experience deteriorated performance when trained in the presence of noisy labels. This is particularly problematic for medical tasks, such as survival prediction, which typically face high label noise complexity with few clear-cut solutions. Inspired by the large fluctuations across folds in the cross-validation performance of survival analyses, we design Monte-Carlo experiments to show that such fluctuation could be caused by label noise. We propose two novel and straightforward label noise detection algorithms that effectively identify noisy examples by pinpointing the samples that more frequently contribute to inferior cross-validation results. We first introduce Repeated Cross-Validation (ReCoV), a parameter-free label noise detection algorithm that is robust to model choice. We further develop fastReCoV, a less robust but more tractable and efficient variant…
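The core ReCoV idea is to repeat cross-validation and flag the samples that most often contribute to inferior fold results. The code below sketches that idea, but it is not the authors' implementation. It assumes a toy one-dimensional nearest-centroid classifier and uses accuracy as the fold metric. It also assumes a simple scoring rule: each sample's score is the fraction of random K-fold partitions in which it lands in the worst-scoring test fold. The paper's exact scoring rule and the fastReCoV variant are not reproduced. The names `recov_style_scores` and `nearest_centroid_accuracy` are hypothetical.

```python
import random
from statistics import mean


def nearest_centroid_accuracy(train, test):
    """Accuracy of a 1-D nearest-centroid classifier fit on `train`, evaluated on `test`.
    Examples are (feature, label) pairs with label in {0, 1}."""
    centroids = {}
    for label in (0, 1):
        xs = [x for x, y in train if y == label]
        centroids[label] = mean(xs) if xs else 0.0
    hits = sum(
        min((0, 1), key=lambda c: abs(x - centroids[c])) == y for x, y in test
    )
    return hits / len(test)


def recov_style_scores(data, n_folds=5, n_repeats=200, seed=0):
    """Hypothetical ReCoV-style score: for each sample, the fraction of repeated
    random K-fold partitions in which it falls in the lowest-accuracy test fold."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    worst_counts = [0] * len(data)
    for _ in range(n_repeats):
        rng.shuffle(indices)  # fresh random partition for every repeat
        folds = [indices[k::n_folds] for k in range(n_folds)]
        fold_acc = []
        for k in range(n_folds):
            train = [data[i] for j, fold in enumerate(folds) if j != k for i in fold]
            test = [data[i] for i in folds[k]]
            fold_acc.append(nearest_centroid_accuracy(train, test))
        worst = min(range(n_folds), key=fold_acc.__getitem__)  # ties go to the first fold
        for i in folds[worst]:
            worst_counts[i] += 1
    return [c / n_repeats for c in worst_counts]


# Toy data: class 0 spread over [0, 1], class 1 over [3, 4];
# the labels of three class-0 samples are flipped to 1 to simulate label noise.
data = [(i / 29, 0) for i in range(30)] + [(3 + i / 29, 1) for i in range(30)]
flipped = [0, 10, 20]
for i in flipped:
    data[i] = (data[i][0], 1)

scores = recov_style_scores(data)
suspects = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)[:3]
print("most suspicious indices:", sorted(suspects))
```

In this toy setup, a fold's accuracy drops only when its test set contains mislabeled samples. As a result, the flipped indices accumulate worst-fold counts much faster than clean ones and end up ranked as the most suspicious. A realistic application would substitute the task's own model and metric, such as a concordance index for survival analysis. It would likely also need many more repeats.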
Taxonomy
Topics: Machine Learning and Data Classification
