Same-Cluster Querying for Overlapping Clusters

Wasim Huleihel, Arya Mazumdar, Muriel Médard, and Soumyabrata Pal

arXiv:1910.12490 · cs.LG · October 29, 2019 · 6 citations

Open Access

TL;DR

This paper studies how to recover overlapping clusters with as few same-cluster queries as possible. It develops algorithms that are order optimal in query complexity, tolerate random noise, and are validated on real-world data.

Contribution

It introduces new algorithms for overlapping cluster recovery with theoretical guarantees and practical efficiency, extending prior work from disjoint to overlapping clusters.

Findings

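01. The algorithms are order optimal in query complexity.

02. The algorithms tolerate random noise and come with guarantees under both arbitrary (worst-case) and statistical modeling assumptions.

03. The algorithms are validated on synthetic and real-world datasets.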

Abstract

Overlapping clusters are common in models of many practical data-segmentation applications. Suppose we are given $n$ elements to be clustered into $k$ possibly overlapping clusters, and an oracle that can interactively answer queries of the form "do elements $u$ and $v$ belong to the same cluster?" The goal is to recover the clusters with the minimum number of such queries. This problem has been of recent interest for the case of disjoint clusters. In this paper, we look at the more practical scenario of overlapping clusters, and provide upper bounds (with algorithms) on the sufficient number of queries. We provide algorithmic results under both arbitrary (worst-case) and statistical modeling assumptions. Our algorithms are parameter free, efficient, and work in the presence of random noise. We also derive information-theoretic lower bounds on the number of queries needed, proving that our…
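
To make the query model in the abstract concrete, here is a minimal Python sketch of a same-cluster oracle. It is an illustration only, not the paper's algorithm. It assumes that, with overlapping clusters, a query on $u$ and $v$ returns "yes" when the two elements share at least one cluster; the names `make_oracle`, `memberships`, `query`, and `queries_used` are hypothetical, and the four-element dataset is made up for the example.

```python
from itertools import combinations


def make_oracle(memberships):
    """Build a noiseless same-cluster oracle (hypothetical helper).

    memberships maps each element to the set of cluster ids it belongs to.
    Assumption: a query answers True when the two elements share at least
    one cluster.
    """
    count = 0

    def query(u, v):
        nonlocal count
        count += 1  # track how many queries have been spent
        return bool(memberships[u] & memberships[v])

    def queries_used():
        return count

    return query, queries_used


# Toy instance with n = 4 elements and k = 3 clusters; "b" lies in two clusters.
memberships = {
    "a": {0},
    "b": {0, 1},
    "c": {1},
    "d": {2},
}
query, queries_used = make_oracle(memberships)

# Naive baseline: ask about every pair, which costs n * (n - 1) / 2 queries.
answers = {
    (u, v): query(u, v)
    for u, v in combinations(sorted(memberships), 2)
}
```

Under this assumption the answers are not transitive: `("a", "b")` and `("b", "c")` are both "yes" while `("a", "c")` is "no". This is what separates the overlapping setting from the disjoint one, where a single "yes" per element is enough to place it. The all-pairs baseline above spends 6 queries on 4 elements; the goal stated in the abstract is to recover the clusters with far fewer.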

Taxonomy

Topics: Data Management and Algorithms · Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models

Methods: Test