ScanMix: Learning from Severe Label Noise via Semantic Clustering and Semi-Supervised Learning
Ragav Sachdeva, Filipe R Cordeiro, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

TL;DR
ScanMix is a novel training algorithm that combines semantic clustering and semi-supervised learning, achieving state-of-the-art robustness to severe label noise across multiple benchmark datasets.
Contribution
It introduces a new algorithm based on expectation maximisation that effectively handles severe label noise with proven convergence and superior empirical performance.
Findings
Achieves state-of-the-art results on CIFAR-10/-100 under symmetric, asymmetric and semantic label noise.
Performs competitively on Red Mini-ImageNet, Clothing1M, and WebVision datasets.
Demonstrates robustness to severe label noise in all tested benchmarks.
Abstract
We propose a new training algorithm, ScanMix, that explores semantic clustering and semi-supervised learning (SSL) to allow superior robustness to severe label noise and competitive robustness to non-severe label noise problems, in comparison to the state of the art (SOTA) methods. ScanMix is based on the expectation maximisation framework, where the E-step estimates the latent variable to cluster the training images based on their appearance and classification results, and the M-step optimises the SSL classification and learns effective feature representations via semantic clustering. We present a theoretical result that shows the correctness and convergence of ScanMix, and an empirical result that shows that ScanMix has SOTA results on CIFAR-10/-100 (with symmetric, asymmetric and semantic label noise), Red Mini-ImageNet (from the Controlled Noisy Web Labels), Clothing1M and…
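The symmetric label noise mentioned in the abstract can be illustrated with a small sketch. This is not taken from the paper: the function name `add_symmetric_noise`, its parameters, and the convention that a selected sample is relabelled uniformly over all classes (so it may keep its true label by chance) are assumptions made for illustration, and the benchmark protocol used by the authors may differ.

```python
import random


def add_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Relabel a fraction of samples uniformly at random (hypothetical helper).

    Assumed convention: each sample is selected with probability `noise_rate`,
    and a selected sample receives a label drawn uniformly from all classes,
    including its original one.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must lie in [0, 1]")
    rng = random.Random(seed)
    noisy, selected = [], []
    for y in labels:
        pick = rng.random() < noise_rate
        selected.append(pick)
        # Selected samples get a uniformly drawn label; others stay clean.
        noisy.append(rng.randrange(num_classes) if pick else y)
    return noisy, selected


if __name__ == "__main__":
    clean = [i % 10 for i in range(10000)]
    noisy, selected = add_symmetric_noise(clean, noise_rate=0.8, num_classes=10)
    changed = sum(n != y for n, y in zip(noisy, clean)) / len(clean)
    print(f"selected: {sum(selected) / len(clean):.3f}, actually changed: {changed:.3f}")
```

Under this assumed convention, the fraction of labels that actually change is roughly `noise_rate * (1 - 1 / num_classes)`, which is why a nominal 80% noise rate on 10 classes leaves somewhat more than 20% of the labels correct.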
Taxonomy
Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Face and Expression Recognition
