Cross-Validation Is All You Need: A Statistical Approach To Label Noise Estimation

Jianan Chen; Vishwesh Ramanathan; Tony Xu; Anne L. Martel

arXiv:2306.13990 · cs.LG · July 22, 2024 · 2 cites

TL;DR

This paper introduces two novel algorithms, ReCoV and fastReCoV, for detecting label noise in machine learning, demonstrating their effectiveness across various models and tasks, including challenging survival analysis scenarios.

Contribution

It presents the first label noise detection methods specifically designed for survival analysis, leveraging cross-validation fluctuations to identify noisy labels.

Findings

01. ReCoV and fastReCoV outperform existing noise detection methods.

02. The algorithms are effective across multiple modalities and models.

03. The proposed methods are robust and computationally efficient.

Abstract

Machine learning models experience deteriorated performance when trained in the presence of noisy labels. This is particularly problematic for medical tasks, such as survival prediction, which typically face high label noise complexity with few clear-cut solutions. Inspired by the large fluctuations across folds in the cross-validation performance of survival analyses, we design Monte-Carlo experiments to show that such fluctuation could be caused by label noise. We propose two novel and straightforward label noise detection algorithms that effectively identify noisy examples by pinpointing the samples that more frequently contribute to inferior cross-validation results. We first introduce Repeated Cross-Validation (ReCoV), a parameter-free label noise detection algorithm that is robust to model choice. We further develop fastReCoV, a less robust but more tractable and efficient variant…
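The core idea in the abstract, counting how often each sample falls in a poorly performing cross-validation fold, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`worst_fold_counts`, `fit_midpoint`, `accuracy`), the toy one-dimensional data, the choice of "worst fold per repeat" as the definition of an inferior result, and ranking by raw count are all assumptions made for the example.

```python
import random


def worst_fold_counts(X, y, fit, score, n_repeats=50, k=5, seed=0):
    """Count how often each sample lands in the lowest-scoring fold.

    Hypothetical helper: `fit(xs, ys)` returns a model and
    `score(model, xs, ys)` returns a number where higher is better.
    """
    rng = random.Random(seed)
    n = len(X)
    counts = [0] * n
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        # Split the shuffled indices into k disjoint folds.
        folds = [idx[j::k] for j in range(k)]
        scores = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            model = fit([X[i] for i in train], [y[i] for i in train])
            scores.append(score(model, [X[i] for i in fold], [y[i] for i in fold]))
        # Assumption: the "inferior" result is the single worst fold of this repeat.
        worst = min(range(k), key=scores.__getitem__)
        for i in folds[worst]:
            counts[i] += 1
    return counts


def fit_midpoint(xs, ys):
    """Toy binary classifier: threshold halfway between the two class means.

    Assumes both classes are present in the training split.
    """
    zeros = [x for x, label in zip(xs, ys) if label == 0]
    ones = [x for x, label in zip(xs, ys) if label == 1]
    m0, m1 = sum(zeros) / len(zeros), sum(ones) / len(ones)
    return (m0 + m1) / 2, m1 >= m0


def accuracy(model, xs, ys):
    """Fraction of held-out samples whose given label matches the prediction."""
    threshold, one_is_above = model
    preds = [1 if (x > threshold) == one_is_above else 0 for x in xs]
    return sum(p == t for p, t in zip(preds, ys)) / len(ys)


# Toy data: two well-separated clusters, with four labels flipped on purpose.
X = [0.1 * (i % 5) for i in range(20)] + [3.0 + 0.1 * (i % 5) for i in range(20)]
y = [0] * 20 + [1] * 20
noisy = {0, 1, 20, 21}
for i in noisy:
    y[i] = 1 - y[i]

counts = worst_fold_counts(X, y, fit_midpoint, accuracy)
# Samples that appear most often in the worst fold are the noise candidates.
ranked = sorted(range(len(X)), key=lambda i: -counts[i])
print(ranked[:4])
```

In this toy setting the flipped samples are always misclassified when held out, so the folds containing them score lowest and their counts rise above those of clean samples. How to turn counts into a decision (for example, a statistical test against the count expected under random fold assignment) is left open here.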

Code & Models

Repositories

gjiananchen/recov (PyTorch, official)

Taxonomy

Topics: Machine Learning and Data Classification