Near-Optimal Comparison Based Clustering

Michaël Perrot; Pascal Mattia Esser; Debarghya Ghoshdastidar

arXiv:2010.03918 · cs.LG · October 12, 2020

TL;DR

This paper introduces a near-optimal method for clustering based solely on ordinal comparisons, using a two-step approach with similarity estimation and SDP-based clustering, supported by theoretical guarantees and real data validation.

Contribution

The paper presents a novel two-step clustering method that exactly recovers planted clusters from passive comparisons with near-optimal sample complexity.

Findings

01 Exact recovery of a planted clustering from a near-optimal number of passive comparisons

02 The method performs well on real datasets

03 Empirical results are consistent with the theoretical guarantees

Abstract

The goal of clustering is to group similar objects into meaningful partitions. This process is well understood when an explicit similarity measure between the objects is given. However, far less is known when this information is not readily available and, instead, one only observes ordinal comparisons such as "object i is more similar to j than to k." In this paper, we tackle this problem using a two-step procedure: we estimate a pairwise similarity matrix from the comparisons before using a clustering method based on semi-definite programming (SDP). We theoretically show that our approach can exactly recover a planted clustering using a near-optimal number of passive comparisons. We empirically validate our theoretical findings and demonstrate the good behaviour of our method on real data.
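The first step of the procedure described in the abstract, estimating a pairwise similarity matrix from comparisons, can be illustrated with a minimal sketch. The additive scoring rule, the synthetic noise model, and all function names below are assumptions made for illustration and are not taken from the paper. The SDP clustering step is omitted; the sketch only checks that the estimated similarities separate a planted two-cluster structure on average.

```python
import random


def sample_triplets(labels, m, noise=0.1, seed=0):
    """Draw m passive triplets (i, j, k), read as 'i is more similar to j than to k'.

    Hypothetical noise model: informative triplets are answered correctly
    with probability 1 - noise; uninformative ones are answered at random.
    """
    rng = random.Random(seed)
    n = len(labels)
    triplets = []
    for _ in range(m):
        i, j, k = rng.sample(range(n), 3)  # three distinct objects
        j_same = labels[i] == labels[j]
        k_same = labels[i] == labels[k]
        if j_same == k_same:
            # Neither candidate is favoured by the planted clusters.
            flip = rng.random() < 0.5
        else:
            # Put i's cluster-mate first, then corrupt with probability noise.
            flip = k_same != (rng.random() < noise)
        if flip:
            j, k = k, j
        triplets.append((i, j, k))
    return triplets


def estimate_similarity(n, triplets):
    """Assumed additive score: +1 for the preferred pair, -1 for the other pair."""
    S = [[0.0] * n for _ in range(n)]
    for i, j, k in triplets:
        S[i][j] += 1.0
        S[j][i] += 1.0
        S[i][k] -= 1.0
        S[k][i] -= 1.0
    return S


def mean_similarity(S, labels, same_cluster):
    """Average score over pairs that are (or are not) in the same planted cluster."""
    n = len(labels)
    vals = [
        S[a][b]
        for a in range(n)
        for b in range(a + 1, n)
        if (labels[a] == labels[b]) == same_cluster
    ]
    return sum(vals) / len(vals)


labels = [0] * 10 + [1] * 10  # two planted clusters of equal size
S = estimate_similarity(len(labels), sample_triplets(labels, m=5000))
print("within-cluster mean: ", mean_similarity(S, labels, True))
print("between-cluster mean:", mean_similarity(S, labels, False))
```

In this toy setting the within-cluster average should come out clearly above the between-cluster average, which is the kind of signal a downstream clustering step (an SDP in the paper) would exploit.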

Videos

Near-Optimal Comparison Based Clustering · SlidesLive

Taxonomy

Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research · Data Management and Algorithms