Almost exact recovery in noisy semi-supervised learning
Konstantin Avrachenkov, Maximilien Dreveton

TL;DR
This paper investigates the impact of noisy labels in graph-based semi-supervised learning, deriving a MAP estimator for DC-SBM, proposing a consistent algorithm, and demonstrating robust performance on synthetic and real datasets.
Contribution
It introduces a MAP-based estimator and a continuous relaxation algorithm for noisy semi-supervised learning on DC-SBM, with proven consistency and strong empirical results.
Findings
The proposed method achieves accurate clustering despite high label noise.
The algorithm is provably consistent under the Degree Corrected Stochastic Block Model (DC-SBM).
Numerical experiments confirm robustness on synthetic and real data.
Abstract
Graph-based semi-supervised learning methods combine the graph structure and labeled data to classify unlabeled data. In this work, we study the effect of a noisy oracle on classification. In particular, we derive the Maximum A Posteriori (MAP) estimator for clustering a Degree Corrected Stochastic Block Model (DC-SBM) when a noisy oracle reveals a fraction of the labels. We then propose an algorithm derived from a continuous relaxation of the MAP, and we establish its consistency. Numerical experiments show that our approach achieves promising performance on synthetic and real data sets, even in the case of very noisy labeled data.
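To make the setting in the abstract concrete, the sketch below simulates a two-community DC-SBM together with a noisy oracle. It illustrates the data model only; it is not the authors' MAP estimator or their relaxed algorithm. The function names (`sample_dcsbm`, `noisy_oracle`), the parameter names (`p_in`, `p_out`, `theta`, `reveal_frac`, `flip_prob`), and the exact parametrization are assumptions made for illustration and may differ from the paper's notation.

```python
import numpy as np


def sample_dcsbm(n, p_in, p_out, theta, rng):
    """Sample a two-community DC-SBM (assumed parametrization).

    Nodes i and j are linked with probability theta[i] * theta[j] * p,
    where p = p_in if they share a community and p_out otherwise.
    Returns a symmetric 0/1 adjacency matrix and the true labels in {-1, +1}.
    """
    z = rng.choice([-1, 1], size=n)                  # true community labels
    same = z[:, None] == z[None, :]                  # same-community indicator
    probs = np.outer(theta, theta) * np.where(same, p_in, p_out)
    probs = np.clip(probs, 0.0, 1.0)                 # keep valid probabilities
    upper = np.triu(rng.random((n, n)) < probs, k=1) # sample each pair once
    A = (upper | upper.T).astype(int)                # symmetrize, no self-loops
    return A, z


def noisy_oracle(z, reveal_frac, flip_prob, rng):
    """Reveal a fraction of the labels, some of them wrongly (assumed model).

    Each node is revealed independently with probability reveal_frac;
    a revealed label is flipped with probability flip_prob.
    Unrevealed nodes are encoded as 0.
    """
    revealed = rng.random(z.size) < reveal_frac
    flipped = rng.random(z.size) < flip_prob
    return np.where(revealed, np.where(flipped, -z, z), 0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    theta = rng.uniform(0.5, 1.5, size=n)            # heterogeneous degree weights
    A, z = sample_dcsbm(n, p_in=0.05, p_out=0.01, theta=theta, rng=rng)
    s = noisy_oracle(z, reveal_frac=0.3, flip_prob=0.2, rng=rng)
    shown = s != 0
    print("revealed nodes:", int(shown.sum()))
    print("oracle accuracy on revealed nodes:", float((s[shown] == z[shown]).mean()))
```

A semi-supervised classifier in this setting receives only `A` and `s`; the true labels `z` are kept for evaluation.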
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications
