Near-Optimal Comparison Based Clustering
Michaël Perrot, Pascal Mattia Esser, Debarghya Ghoshdastidar

TL;DR
This paper introduces a near-optimal method for clustering based solely on ordinal comparisons, using a two-step approach with similarity estimation and SDP-based clustering, supported by theoretical guarantees and real data validation.
Contribution
It presents a novel two-step clustering method that recovers planted clusters from passive comparisons with near-optimal sample complexity.
Findings
Exact recovery of planted clusters from a near-optimal number of passive comparisons, under the paper's model assumptions
Method performs well on real datasets
Empirical results validate the theoretical guarantees
Abstract
The goal of clustering is to group similar objects into meaningful partitions. This process is well understood when an explicit similarity measure between the objects is given. However, far less is known when this information is not readily available and, instead, one only observes ordinal comparisons such as "object i is more similar to j than to k." In this paper, we tackle this problem using a two-step procedure: we estimate a pairwise similarity matrix from the comparisons before using a clustering method based on semi-definite programming (SDP). We theoretically show that our approach can exactly recover a planted clustering using a near-optimal number of passive comparisons. We empirically validate our theoretical findings and demonstrate the good behaviour of our method on real data.
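To make the two-step pipeline concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation. The planted-cluster oracle (`collect_triplets`), the uniform query sampler (`random_queries`) and the additive scoring rule (`additive_similarity`) are hypothetical illustrations. The SDP step is replaced by plain average-linkage agglomeration (`average_linkage`), a simpler swapped-in technique, so that the example runs with the standard library alone. A triplet (i, j, k) encodes "object i is more similar to j than to k". Each such triplet adds +1 to the estimated similarity of the pair (i, j) and −1 to that of the pair (i, k).

```python
import random
from itertools import combinations


def random_queries(n, num_queries, rng):
    """Hypothetical passive sampler: draw queries (i, j, k) uniformly at random.

    Each query asks: "is object i more similar to j than to k?"
    """
    return [tuple(rng.sample(range(n), 3)) for _ in range(num_queries)]


def collect_triplets(labels, queries, noise, rng):
    """Answer each query with a hypothetical noisy planted-cluster oracle.

    Returns triplets (i, j, k) meaning "i is more similar to j than to k".
    """
    triplets = []
    for i, j, k in queries:
        same_j = labels[i] == labels[j]
        same_k = labels[i] == labels[k]
        if same_j != same_k:
            # Informative query: the object in i's cluster is preferred,
            # but the answer is flipped with probability `noise`.
            j_closer = same_j
            if rng.random() < noise:
                j_closer = not j_closer
        else:
            # Uninformative query (both or neither in i's cluster): coin flip.
            j_closer = rng.random() < 0.5
        triplets.append((i, j, k) if j_closer else (i, k, j))
    return triplets


def additive_similarity(n, triplets):
    """Symmetric additive similarity scores from triplets (illustrative rule)."""
    S = [[0] * n for _ in range(n)]
    for i, j, k in triplets:
        # "i closer to j than to k": reward pair (i, j), penalise pair (i, k).
        S[i][j] += 1
        S[j][i] += 1
        S[i][k] -= 1
        S[k][i] -= 1
    return S


def average_linkage(S, num_clusters):
    """Stand-in for the SDP step: greedy average-linkage agglomeration."""
    clusters = [[i] for i in range(len(S))]
    while len(clusters) > num_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(S[x][y] for x in clusters[a] for y in clusters[b])
            score = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or score > best[0]:
                best = (score, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the most similar pair of groups
        del clusters[b]                  # b > a, so index a stays valid
    return clusters


def cluster_from_comparisons(labels, queries, num_clusters, noise=0.0, seed=0):
    """Step 1: estimate similarities from comparisons; step 2: cluster them."""
    rng = random.Random(seed)
    triplets = collect_triplets(labels, queries, noise, rng)
    S = additive_similarity(len(labels), triplets)
    return average_linkage(S, num_clusters)


# Example: 3 planted clusters of 8 objects, every (anchor, pair) queried once.
labels = [c for c in range(3) for _ in range(8)]
all_queries = [(i, j, k) for i in range(len(labels))
               for j, k in combinations(range(len(labels)), 2) if i not in (j, k)]
# Prints the three planted groups 0-7, 8-15 and 16-23.
print(sorted(sorted(c) for c in cluster_from_comparisons(labels, all_queries, 3)))
```

In the noiseless example every within-cluster score is at least 20 and every between-cluster score is at most 16, so the stand-in clustering step recovers the planted partition exactly. Passing `random_queries(n, m, rng)` instead of `all_queries` mimics the passive setting with only m comparisons; with fewer comparisons or a higher `noise`, the estimated similarities become less separated and this sketch may no longer recover the clusters exactly. The paper's near-optimal guarantees concern its own estimator and SDP, not this simplified pipeline.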
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research · Data Management and Algorithms
