Learning from Binary Labels with Instance-Dependent Corruption
Aditya Krishna Menon, Brendan van Rooyen, Nagarajan Natarajan

TL;DR
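This paper analyzes when one can learn from binary labels corrupted by instance- and label-dependent noise. It shows that consistency for classification and for the area under the ROC curve (AUC) carries over from the noisy distribution to the clean one under stated noise conditions, and that the Isotron learns efficiently from the corrupted sample when the clean class-probability function is a generalized linear model.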
Contribution
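It proves that, under instance-dependent noise, any algorithm that is consistent for classification on the noisy distribution is also consistent on the clean distribution. It establishes a similar result for the AUC under a broad class of instance- and label-dependent noise, and shows that the Isotron can efficiently and provably learn from corrupted labels when the noise-free class-probability function is a generalized linear model.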
Findings
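Classification consistency on the noisy distribution carries over to the clean distribution under instance-dependent noise.
AUC consistency carries over under a broad class of instance- and label-dependent noise.
The Isotron learns efficiently and provably from the corrupted sample when the noise-free class-probability function is a generalized linear model.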
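As a rough intuition for the AUC finding, the sketch below uses a much simpler setting than the noise class the paper analyzes: constant label flip rates (class-conditional noise). In that setting the noisy class-probability function is an increasing affine transform of the clean one, so both induce the same ranking and hence the same AUC. The data model, the flip rates `rho_pos` and `rho_neg`, and the helper `auc` are all assumptions made for illustration, not taken from the paper.

```python
import numpy as np

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)

# Assumed clean class-probability function eta(x) = P(y = 1 | x).
eta = 1.0 / (1.0 + np.exp(-2.0 * x))
y = (rng.random(n) < eta).astype(int)

# Assumed constant flip rates: positives flip w.p. rho_pos, negatives w.p. rho_neg.
rho_pos, rho_neg = 0.3, 0.1
flip = rng.random(n) < np.where(y == 1, rho_pos, rho_neg)
y_noisy = np.where(flip, 1 - y, y)

# Noisy class-probability function: increasing in eta whenever rho_pos + rho_neg < 1.
eta_noisy = (1.0 - rho_pos - rho_neg) * eta + rho_neg

print("clean-label AUC of eta:      ", auc(eta, y))
print("clean-label AUC of eta_noisy:", auc(eta_noisy, y))
print("noisy-label AUC of eta:      ", auc(eta, y_noisy))
```

The first two printed values coincide because the two scorers order instances identically. The third is lower because the evaluation labels themselves are corrupted, not because the ranking has changed.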
Abstract
Suppose we have a sample of instances paired with binary labels corrupted by arbitrary instance- and label-dependent noise. With sufficiently many such samples, can we optimally classify and rank instances with respect to the noise-free distribution? We provide a theoretical analysis of this question, with three main contributions. First, we prove that for instance-dependent noise, any algorithm that is consistent for classification on the noisy distribution is also consistent on the clean distribution. Second, we prove that for a broad class of instance- and label-dependent noise, a similar consistency result holds for the area under the ROC curve. Third, for the latter noise model, when the noise-free class-probability function belongs to the generalised linear model family, we show that the Isotron can efficiently and provably learn from the corrupted sample.
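For readers unfamiliar with the Isotron, the following is a minimal sketch in the spirit of the original algorithm of Kalai and Sastry, not necessarily the exact variant analyzed in this paper. It alternates between fitting a monotone link by isotonic regression (pool-adjacent-violators) and a perceptron-style update of the weight vector. The synthetic data, the sigmoid link, the constant flip rates, and the function names `pav`, `fit_link`, and `isotron` are all assumptions made for illustration.

```python
import numpy as np

def pav(means, weights):
    """Weighted pool-adjacent-violators: nondecreasing least-squares fit."""
    blocks = []  # each block: [weighted sum, total weight, number of points]
    for mu, wt in zip(means, weights):
        blocks.append([mu * wt, wt, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, w, k = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += w
            blocks[-1][2] += k
    return np.concatenate([np.full(k, s / w) for s, w, k in blocks])

def fit_link(z, y):
    """Isotonic fit of y against z, returned at each training point (ties pooled)."""
    _, inverse = np.unique(z, return_inverse=True)
    counts = np.bincount(inverse)
    sums = np.bincount(inverse, weights=y)
    return pav(sums / counts, counts)[inverse]

def isotron(X, y, n_iters=30):
    """Alternate an isotonic link fit with an averaged-residual weight update."""
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        u_vals = fit_link(X @ w, y)          # current monotone link at each point
        w = w + X.T @ (y - u_vals) / m       # perceptron-style update
    return w

# Synthetic example (all choices below are assumptions for illustration).
rng = np.random.default_rng(0)
m, d = 5000, 5
X = rng.normal(size=(m, d))
X /= np.linalg.norm(X, axis=1).max()         # scale so every ||x|| <= 1
w_true = np.ones(d) / np.sqrt(d)
eta = 1.0 / (1.0 + np.exp(-4.0 * (X @ w_true)))   # clean GLM class probabilities
y_clean = (rng.random(m) < eta).astype(float)

# Constant flip rates: a simple special case, not the paper's general noise model.
flip_rate = np.where(y_clean == 1, 0.3, 0.1)
y_noisy = np.where(rng.random(m) < flip_rate, 1.0 - y_clean, y_clean)

w_hat = isotron(X, y_noisy)
cosine = float(w_hat @ w_true / np.linalg.norm(w_hat))
print(f"cosine(w_hat, w_true) = {cosine:.3f}")
```

Only the direction of `w_hat` is compared with `w_true`, since the scale of the weight vector is absorbed into the fitted link. The sketch reuses the same sample for every link fit, which is a simplification made for brevity.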
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Imbalanced Data Classification Techniques
