Fast Randomized Semi-Supervised Clustering

Alaa Saade; Florent Krzakala; Marc Lelarge; Lenka Zdeborová

arXiv:1605.06422·cs.LG·June 28, 2018

TL;DR

This paper presents a fast local algorithm for semi-supervised clustering, based on a power iteration of the non-backtracking operator, that reaches a small classification error from $O(n)$ randomly chosen pairwise comparisons.

Contribution

The paper introduces a local algorithm for semi-supervised clustering based on a power iteration of the non-backtracking operator, gives bounds on its classification error for the case of two clusters, and studies its performance numerically.

Findings

01. Achieves a small classification error from $O(n)$ randomly chosen pairwise measurements (bounds given for two clusters)

02. Efficient in both time and space

03. Evaluated numerically on synthetic and real-world data

Abstract

We consider the problem of clustering partially labeled data from a minimal number of randomly chosen pairwise comparisons between the items. We introduce an efficient local algorithm based on a power iteration of the non-backtracking operator and study its performance on a simple model. For the case of two clusters, we give bounds on the classification error and show that a small error can be achieved from $O(n)$ randomly chosen measurements, where $n$ is the number of items in the dataset. Our algorithm is therefore efficient both in terms of time and space complexities. We also investigate numerically the performance of the algorithm on synthetic and real-world data.
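
To make the idea in the abstract concrete, here is a minimal sketch of a power iteration of a signed non-backtracking operator for two clusters. It is an illustration under stated assumptions, not the authors' implementation: each comparison is assumed to return +1 (same cluster) or -1 (different cluster), possibly flipped by noise; messages on directed edges are seeded with the known labels; and the function names `nb_power_iteration` and `sample_problem`, the per-step rescaling, and all parameter values are hypothetical choices made for this example.

```python
import random


def nb_power_iteration(n, edges, labels, n_iter=10):
    """Estimate +1/-1 cluster labels from signed pairwise comparisons.

    edges:  dict {(i, j): w} with i < j and w in {+1, -1} (assumed format).
    labels: dict {node: +1 or -1} for the labeled subset.
    """
    nbrs = [[] for _ in range(n)]
    for (i, j), w in edges.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))

    # One message per directed edge i -> j, seeded with the label of i
    # when it is known and with 0 otherwise (an assumption of this sketch).
    msg = {(i, j): float(labels.get(i, 0))
           for i in range(n) for j, _ in nbrs[i]}

    for _ in range(n_iter):
        new = {}
        for i in range(n):
            total = sum(w * msg[(k, i)] for k, w in nbrs[i])
            for j, w in nbrs[i]:
                # Non-backtracking: drop what came from j itself.
                new[(i, j)] = total - w * msg[(j, i)]
        # Rescale so the values stay bounded across iterations.
        scale = max((abs(v) for v in new.values()), default=0.0) or 1.0
        msg = {e: v / scale for e, v in new.items()}

    estimates = []
    for i in range(n):
        if i in labels:
            estimates.append(labels[i])  # keep known labels as given
        else:
            score = sum(w * msg[(k, i)] for k, w in nbrs[i])
            estimates.append(1 if score >= 0 else -1)
    return estimates


def sample_problem(n, n_edges, n_labeled, flip_prob, rng):
    """Toy two-cluster instance with randomly chosen comparisons."""
    truth = [rng.choice((-1, 1)) for _ in range(n)]
    edges = {}
    while len(edges) < n_edges:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j:
            w = truth[i] * truth[j]
            if rng.random() < flip_prob:
                w = -w  # noisy comparison
            edges[(min(i, j), max(i, j))] = w
    labels = {i: truth[i] for i in rng.sample(range(n), n_labeled)}
    return truth, edges, labels


if __name__ == "__main__":
    rng = random.Random(0)
    truth, edges, labels = sample_problem(200, 600, 20, 0.1, rng)
    pred = nb_power_iteration(200, edges, labels)
    print(sum(p == t for p, t in zip(pred, truth)) / 200)
```

Each iteration touches every directed edge a constant number of times, so with a number of comparisons proportional to $n$ the sketch uses time and memory linear in $n$ per iteration, which matches the efficiency argument of the abstract in spirit.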
