Analysis of label noise in graph-based semi-supervised learning
Bruno Klaus de Aquino Afonso, Lilian Berton

TL;DR
This paper empirically evaluates the robustness of graph-based semi-supervised learning algorithms against label noise, highlighting their ability to detect noisy labels and comparing their performance in different data scenarios.
Contribution
It provides a comprehensive empirical comparison of existing graph-based SSL algorithms under varying label noise and data conditions, revealing their strengths and limitations.
Findings
Algorithms can detect noisy labels when data aligns with SSL assumptions.
Detection of noisy labels becomes harder with fewer labeled samples.
Laplacian Eigenmaps outperforms label propagation on high-dimensional clustered data.
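To make the first finding concrete, here is a minimal sketch of one plausible way a graph-based method can flag a noisy label. It is not necessarily the paper's protocol. It uses the closed-form Local and Global Consistency solution F = (I - alpha*S)^(-1) Y on two synthetic, well-separated clusters, then flags any labeled point whose propagated class disagrees with its given label. The data, the RBF bandwidth `sigma`, the value `alpha=0.99`, and the helper name `lgc_scores` are all assumptions chosen for illustration.

```python
import numpy as np

def lgc_scores(X, Y, sigma=1.0, alpha=0.99):
    """Closed-form Local and Global Consistency: F = (I - alpha*S)^{-1} Y."""
    # Dense RBF affinity matrix with a zero diagonal (no self-loops).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.linalg.solve(np.eye(len(X)) - alpha * S, Y)

# Hypothetical toy data: two tight clusters, so the cluster assumption holds.
rng = np.random.default_rng(0)
n_per_class = 40
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(n_per_class, 2)),
    rng.normal(loc=(10.0, 0.0), scale=0.5, size=(n_per_class, 2)),
])
true_labels = np.repeat([0, 1], n_per_class)

# Ten labeled points per class; one label in class 0 is deliberately flipped.
labeled_idx = np.concatenate([np.arange(0, 10), np.arange(40, 50)])
given_labels = true_labels.copy()
flipped_idx = 0
given_labels[flipped_idx] = 1

# One-hot label matrix; rows of unlabeled points stay all-zero.
Y = np.zeros((len(X), 2))
Y[labeled_idx, given_labels[labeled_idx]] = 1.0

F = lgc_scores(X, Y)
predicted = F.argmax(axis=1)

# A labeled point is suspicious if propagation disagrees with its given label.
flagged = labeled_idx[predicted[labeled_idx] != given_labels[labeled_idx]]
print("flagged as possibly noisy:", flagged.tolist())
```

In this sketch the flipped point is outvoted by the nine clean labels in its own cluster. With fewer clean labels nearby, that margin shrinks, which is consistent with the second finding.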
Abstract
In machine learning, one must acquire labels to help supervise a model that will be able to generalize to unseen data. However, the labeling process can be tedious, long, costly, and error-prone. It is often the case that most of our data is unlabeled. Semi-supervised learning (SSL) alleviates that by making strong assumptions about the relation between the labels and the input data distribution. This paradigm has been successful in practice, but most SSL algorithms end up fully trusting the few available labels. In real life, both humans and automated systems are prone to mistakes; it is essential that our algorithms are able to work with labels that are both few and also unreliable. Our work aims to perform an extensive empirical evaluation of existing graph-based semi-supervised algorithms, like Gaussian Fields and Harmonic Functions, Local and Global Consistency, Laplacian…
