Binary classification with corrupted labels
Yonghoon Lee, Rina Foygel Barber

TL;DR
This paper shows how a small fraction of corrupted labels in binary classification can act as a regularizer that improves robustness when likelihood maximization is poorly behaved, for instance when the classes are perfectly separable, and it derives upper bounds on estimation error in the presence of corruption.
Contribution
The paper establishes that label corruption acts as a form of regularization in settings where likelihood maximization is poorly behaved, and it computes precise upper bounds on estimation error in the presence of corrupted labels.
Findings
Corruption acts as a regularizer in separable binary classification.
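Corrupted data points are beneficial only up to a small fraction of the total sample, scaling with the square root of the sample size.
When likelihood maximization is poorly behaved (e.g., perfectly separable classes), a small fraction of corrupted labels can improve performance by ensuring robustness.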
Abstract
In a binary classification problem where the goal is to fit an accurate predictor, the presence of corrupted labels in the training data set may create an additional challenge. However, in settings where likelihood maximization is poorly behaved (for example, if positive and negative labels are perfectly separable), a small fraction of corrupted labels can improve performance by ensuring robustness. In this work, we establish that in such settings, corruption acts as a form of regularization, and we compute precise upper bounds on estimation error in the presence of corruptions. Our results suggest that the presence of corrupted data points is beneficial only up to a small fraction of the total sample, scaling with the square root of the sample size.
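A standard way to see why likelihood maximization is poorly behaved under perfect separation: for a logistic model fit to separable data, the log-likelihood keeps increasing as the coefficient grows, so no finite maximizer exists, whereas flipping even one label restores a finite one. The sketch below is a toy illustration of that phenomenon, not the authors' method or experiments. It assumes a one-dimensional logistic model without intercept, fit by plain gradient descent; the data, the helper `fit_logistic_1d`, the learning rate, and the step counts are all hypothetical choices made for this example.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(xs, ys, steps, lr=1.0):
    """Hypothetical helper: gradient descent on the mean negative
    log-likelihood of the no-intercept model P(y = 1 | x) = sigmoid(theta * x).
    Returns the final theta."""
    theta = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean negative log-likelihood with respect to theta.
        grad = sum((sigmoid(theta * x) - y) * x for x, y in zip(xs, ys)) / n
        theta -= lr * grad
    return theta


# Hypothetical toy data: y = 1 exactly when x > 0, so the classes are separable.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
clean = [0, 0, 0, 0, 1, 1, 1, 1]
# Same inputs with one corrupted label (the point x = 0.5 relabeled as 0).
corrupted = [0, 0, 0, 0, 0, 1, 1, 1]

for name, ys in [("clean", clean), ("corrupted", corrupted)]:
    t_short = fit_logistic_1d(xs, ys, steps=1_000)
    t_long = fit_logistic_1d(xs, ys, steps=10_000)
    print(f"{name:9s} theta after 1k steps: {t_short:.3f}, after 10k steps: {t_long:.3f}")
```

On the clean data the estimate keeps growing as more iterations are run, because the likelihood has no finite maximizer; on the corrupted data it settles at a finite value. This only illustrates the qualitative regularizing effect described in the abstract and does not reproduce the paper's error bounds.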
