Label Noise Cleaning for Supervised Classification via Bernoulli Random Sampling
Yuxin Liu, Xiong Jin, Yang Han

TL;DR
This paper introduces a classifier-agnostic method using Bernoulli random sampling to effectively identify and clean label noise in supervised classification, improving model accuracy without prior label information.
Contribution
It presents a novel label noise cleaning technique based on Bernoulli sampling and mixture distribution modeling, with theoretical justification and practical effectiveness.
Findings
Method accurately separates clean and noisy labels.
Performs well on both simulated and real datasets.
Theoretically justified with convergence guarantees.
Abstract
Label noise (incorrect labels assigned to observations) can substantially degrade the performance of supervised classifiers. This paper proposes a label noise cleaning method based on Bernoulli random sampling. We show that the mean label noise levels of subsets generated by Bernoulli random sampling containing a given observation are identically distributed for all clean observations, and identically distributed, with a different distribution, for all noisy observations. Although the mean label noise levels are not independent across observations, by introducing an independent coupling we further prove that they converge to a mixture of two well-separated distributions corresponding to clean and noisy observations. By establishing a linear model between cross-validated classification errors and label noise levels, we are able to approximate this mixture distribution and thereby…
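The sampling idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes the true noise indicators are known, which is only possible in a simulation, whereas the paper approximates noise levels through cross-validated classification errors. The function name `mean_subset_noise_levels`, the inclusion probability `p`, and the number of subsets `n_subsets` are hypothetical choices made for illustration.

```python
import numpy as np

def mean_subset_noise_levels(is_noisy, n_subsets=5000, p=0.2, seed=0):
    """For each observation, average the noise level of the Bernoulli
    subsets that contain it. Assumes ground-truth noise flags are known."""
    rng = np.random.default_rng(seed)
    is_noisy = np.asarray(is_noisy, dtype=float)
    n = is_noisy.size
    # membership[b, i] is True when observation i is drawn into subset b;
    # each observation is included independently with probability p.
    membership = rng.random((n_subsets, n)) < p
    sizes = membership.sum(axis=1)
    keep = sizes > 0                      # drop empty subsets
    membership, sizes = membership[keep], sizes[keep]
    # Noise level of a subset = fraction of its members with a wrong label.
    levels = (membership @ is_noisy) / sizes
    # Sum the levels of the subsets containing each observation, then average.
    counts = membership.sum(axis=0)
    totals = membership.T.astype(float) @ levels
    return totals / np.maximum(counts, 1)

# Simulated ground truth: 100 observations, the first 20 carry wrong labels.
is_noisy = np.zeros(100, dtype=int)
is_noisy[:20] = 1
scores = mean_subset_noise_levels(is_noisy)

noisy_scores = scores[is_noisy == 1]
clean_scores = scores[is_noisy == 0]
print("clean range:", clean_scores.min(), clean_scores.max())
print("noisy range:", noisy_scores.min(), noisy_scores.max())
```

Under these assumed settings the per-observation averages fall into two groups, with noisy observations scoring higher because every subset containing a noisy observation includes at least that one wrong label. The gap shrinks as the typical subset size grows, so a real application would need enough subsets for the averages to stabilize.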
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Advanced Statistical Methods and Models
