Almost exact recovery in noisy semi-supervised learning

Konstantin Avrachenkov; Maximilien Dreveton

arXiv:2007.14717 · cs.LG · February 5, 2025 · 1 citation

Open Access · 1 Repo

TL;DR

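This paper investigates the impact of noisy labels on graph-based semi-supervised learning. It derives the MAP estimator for the Degree Corrected Stochastic Block Model (DC-SBM) when a noisy oracle reveals a fraction of the labels, proposes a consistent algorithm based on a continuous relaxation of that estimator, and reports promising performance on synthetic and real data sets.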

Contribution

The paper derives the MAP estimator for noisy semi-supervised clustering of a DC-SBM and proposes an algorithm obtained from a continuous relaxation of that estimator, with a proof of consistency and promising empirical results.

Findings

01

The proposed method achieves promising classification performance even when the labeled data are very noisy.

02

The algorithm is proven consistent under the Degree Corrected Stochastic Block Model (DC-SBM).

03

Numerical experiments confirm robustness on synthetic and real data.

Abstract

Graph-based semi-supervised learning methods combine the graph structure and labeled data to classify unlabeled data. In this work, we study the effect of a noisy oracle on classification. In particular, we derive the Maximum A Posteriori (MAP) estimator for clustering a Degree Corrected Stochastic Block Model (DC-SBM) when a noisy oracle reveals a fraction of the labels. We then propose an algorithm derived from a continuous relaxation of the MAP, and we establish its consistency. Numerical experiments show that our approach achieves promising performance on synthetic and real data sets, even in the case of very noisy labeled data.
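
The setting described in the abstract can be illustrated with a minimal sketch that samples a two-community DC-SBM graph together with noisy oracle labels. This is only an illustration of the data model, not the authors' estimator or algorithm. The function name `sample_noisy_ssl_dcsbm`, the two-community restriction, the uniform degree weights, the symmetric label-flip oracle, and all parameter names and default values are assumptions made for this example.

```python
import numpy as np


def sample_noisy_ssl_dcsbm(n=200, p_in=0.10, p_out=0.02,
                           reveal_frac=0.2, flip_prob=0.3, seed=0):
    """Sample a two-community DC-SBM graph plus noisy oracle labels.

    All names and defaults here are illustrative assumptions,
    not values taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # True community labels in {-1, +1}.
    z = rng.choice([-1, 1], size=n)

    # Degree-correction weights, rescaled to have mean 1.
    theta = rng.uniform(0.5, 1.5, size=n)
    theta /= theta.mean()

    # Edge probability: theta_i * theta_j * (p_in if same community else p_out).
    same = np.equal.outer(z, z)
    probs = np.outer(theta, theta) * np.where(same, p_in, p_out)
    probs = np.clip(probs, 0.0, 1.0)

    # Sample the strict upper triangle, then symmetrize (no self-loops).
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(int)

    # Noisy oracle: 0 means "not revealed"; a revealed label is
    # flipped with probability flip_prob (assumed noise model).
    revealed = rng.random(n) < reveal_frac
    flips = rng.random(n) < flip_prob
    s = np.where(revealed, np.where(flips, -z, z), 0)

    return adj, z, s
```

Calling `adj, z, s = sample_noisy_ssl_dcsbm()` returns the adjacency matrix, the hidden labels, and the oracle vector. A semi-supervised method would observe only `adj` and `s` and try to recover `z`.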

Code & Models

Repositories

mdreveton/ssl-sbm
Official

Taxonomy

Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications