Harmless label noise and informative soft-labels in supervised classification
Daniel Ahfock, Geoffrey J. McLachlan

TL;DR
This paper explores how supervised classification models, especially logistic regression, can be robust to label noise when the noise is correlated with classification difficulty, and shows that multiple noisy labels can carry more statistical information than a single ground-truth label.
Contribution
It introduces a model-based framework for understanding the value of noisy labels derived from posterior probabilities and analyzes their impact on logistic regression performance.
Findings
Logistic regression is robust to label noise that is correlated with how difficult each example is to classify.
Multiple noisy labels can provide more information than a single ground-truth label.
Noisy labels sampled from posterior probabilities can be as informative as true labels.
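The findings above can be illustrated with a small simulation. This is a minimal sketch, not the authors' code. It assumes a one-feature logistic model with hypothetical coefficients `beta_true`, treats the ground-truth label as a Bernoulli(tau(x)) draw, and has a single annotator draw an independent label from the same posterior tau(x). Logistic regression is fitted by Newton-Raphson in plain NumPy. The two fits should land close to each other and to `beta_true`. Disagreements between annotator and ground truth should cluster near the decision boundary (tau close to 0.5), which is the sense in which the noise is correlated with difficulty. The thresholds used to define "hard" and "easy" cases are illustrative choices.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS).
    X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)                       # IRLS weights
        grad = X.T @ (y - p)                    # score vector
        hess = X.T @ (X * w[:, None])           # observed information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
n = 20000
beta_true = np.array([-0.5, 2.0])               # hypothetical intercept and slope
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
tau = sigmoid(X @ beta_true)                    # posterior P(Z = 1 | x)

z_true = rng.binomial(1, tau)                   # ground-truth labels
z_noisy = rng.binomial(1, tau)                  # annotator samples from the same posterior

beta_hat_true = fit_logistic(X, z_true)
beta_hat_noisy = fit_logistic(X, z_noisy)

disagree = z_true != z_noisy
hard = np.abs(tau - 0.5) < 0.1                  # near the decision boundary
easy = np.abs(tau - 0.5) > 0.4                  # far from the decision boundary

print("fit on true labels :", beta_hat_true.round(2))
print("fit on noisy labels:", beta_hat_noisy.round(2))
print("overall disagreement rate:", round(float(disagree.mean()), 3))
print("disagreement, hard cases :", round(float(disagree[hard].mean()), 3))
print("disagreement, easy cases :", round(float(disagree[easy].mean()), 3))
```

Given x, the probability that the two labels disagree is 2 tau(x)(1 - tau(x)). This quantity peaks at tau = 0.5, which is why the hard subset shows far more noise than the easy one, even though the noisy labels alone still recover the coefficients.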
Abstract
Manual labelling of training examples is common practice in supervised learning. When the labelling task is of non-trivial difficulty, the supplied labels may not be equal to the ground-truth labels, and label noise is introduced into the training dataset. If the manual annotation is carried out by multiple experts, the same training example can be given different class assignments by different experts, which is indicative of label noise. In the framework of model-based classification, a simple, but key observation is that when the manual labels are sampled using the posterior probabilities of class membership, the noisy labels are as valuable as the ground-truth labels in terms of statistical information. A relaxation of this process is a random effects model for imperfect labelling by a group that uses approximate posterior probabilities of class membership. The relative efficiency of…
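The claim that posterior-sampled labels are as valuable as ground-truth labels can be made concrete with expected Fisher information. This is a minimal sketch under a conditional-likelihood (logistic regression) view. It assumes that, given x, the ground-truth label and each of `n_labels` annotator labels are independent Bernoulli(tau(x)) draws. The function name `expected_information`, the coefficients and the design are hypothetical. Because one noisy label has the same conditional distribution as the true label, it contributes the same information, tau(1 - tau) x xᵀ. Each additional independent annotator adds another copy of that term.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def expected_information(X, beta, n_labels=1):
    """Expected Fisher information for logistic-regression coefficients when each
    row of X receives `n_labels` conditionally independent Bernoulli(tau(x)) labels."""
    tau = sigmoid(X @ beta)
    w = tau * (1.0 - tau)                       # variance of one Bernoulli label
    return n_labels * (X.T @ (X * w[:, None]))

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([-0.5, 2.0])                    # hypothetical parameter values

info_true = expected_information(X, beta, n_labels=1)    # one ground-truth label per x
info_noisy3 = expected_information(X, beta, n_labels=3)  # three posterior-sampled labels per x

# Asymptotic standard errors are the square roots of the diagonal of I^{-1}.
se_true = np.sqrt(np.diag(np.linalg.inv(info_true)))
se_noisy3 = np.sqrt(np.diag(np.linalg.inv(info_noisy3)))
print("SE, one true label    :", se_true.round(3))
print("SE, three noisy labels:", se_noisy3.round(3))
print("ratio:", (se_noisy3 / se_true).round(3))  # 1/sqrt(3), about 0.577, for every coefficient
```

Under these assumptions, m annotators multiply the information by m and shrink asymptotic standard errors by a factor of 1/sqrt(m). This is one way in which several noisy labels can outweigh a single ground-truth label. Annotators who only approximate tau(x), as in the random effects relaxation, would not in general achieve this exact factor.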
Taxonomy
Methods: Logistic Regression
