Clustering with Queries under Semi-Random Noise

Alberto Del Pia; Mingchen Ma; Christos Tzamos

arXiv:2206.04583 · cs.LG · July 22, 2022

TL;DR

This paper introduces clustering algorithms that tolerate semi-random noise with guarantees qualitatively matching those of the best methods for the fully-random model, and it gives the first parameter-free algorithm for the fully-random model.

Contribution

It develops the first computationally efficient, semi-random noise-tolerant clustering algorithms with guarantees matching fully-random models, and introduces a parameter-free algorithm for the fully-random case.

Findings

01. Information-theoretically, O(nk log n / (1-2p)^2) queries suffice to learn any sufficiently large cluster.

02. Computationally efficient algorithms can identify large clusters under semi-random noise.

03. First parameter-free clustering algorithm for the fully-random noise model.
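
To get a feel for how the query bound in the first finding behaves, the following sketch evaluates nk log n / (1-2p)^2 for a few noise levels. It is only an illustration under stated assumptions: the constant hidden by the O-notation is set to 1, the natural logarithm is used, k is read as the number of clusters (the excerpt does not define it explicitly), and the function name `query_bound` is hypothetical rather than anything from the paper.

```python
import math


def query_bound(n: int, k: int, p: float, c: float = 1.0) -> float:
    """Evaluate c * n * k * log(n) / (1 - 2p)^2.

    Assumptions (not from the paper): c stands in for the constant hidden
    by the O-notation, and log is the natural logarithm.
    """
    if not 0.0 <= p < 0.5:
        raise ValueError("p must lie in [0, 0.5)")
    return c * n * k * math.log(n) / (1.0 - 2.0 * p) ** 2


if __name__ == "__main__":
    # Illustrative values only: n = 1000 points, k = 10 clusters.
    for p in (0.0, 0.25, 0.4, 0.45):
        print(f"p = {p:.2f}: about {query_bound(1000, 10, p):,.0f} queries")
```

The expression grows without bound as p approaches 1/2, which matches the intuition that each answer carries less information the closer the noise rate gets to a coin flip.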

Abstract

The seminal paper by Mazumdar and Saha \cite{MS17a} introduced an extensive line of work on clustering with noisy queries. Yet, despite significant progress on the problem, the proposed methods depend crucially on knowing the exact probabilities of errors of the underlying fully-random oracle. In this work, we develop robust learning methods that tolerate general semi-random noise obtaining qualitatively the same guarantees as the best possible methods in the fully-random model. More specifically, given a set of $n$ points with an unknown underlying partition, we are allowed to query pairs of points $u, v$ to check if they are in the same cluster, but with probability $p$, the answer may be adversarially chosen. We show that information theoretically $O\left(\frac{nk \log n}{(1 - 2p)^{2}}\right)$ queries suffice to learn any cluster of sufficiently large size. Our main result is a…
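
The query model described in the abstract can be sketched as a small simulation. This is a minimal illustration, not the authors' construction: the names `make_semi_random_oracle`, `always_flip`, and `always_same` are hypothetical, the adversary is modeled as a callback that sees the true answer, and each call is treated independently (the sketch does not model whether repeating a query on the same pair returns the same answer).

```python
import random
from typing import Callable, Sequence

# Hypothetical adversary type: given (u, v, true_answer), return any answer.
Adversary = Callable[[int, int, bool], bool]


def make_semi_random_oracle(
    labels: Sequence[int],
    p: float,
    adversary: Adversary,
    rng: random.Random,
) -> Callable[[int, int], bool]:
    """Return query(u, v) answering "are u and v in the same cluster?".

    With probability 1 - p the true answer is returned; with probability p
    the adversary chooses the answer (it may also answer correctly).
    """

    def query(u: int, v: int) -> bool:
        truth = labels[u] == labels[v]
        if rng.random() < p:
            return adversary(u, v, truth)
        return truth

    return query


def always_flip(u: int, v: int, truth: bool) -> bool:
    # An adversary that always lies: every corrupted answer is wrong,
    # so answers are flipped with probability exactly p.
    return not truth


def always_same(u: int, v: int, truth: bool) -> bool:
    # An adversary that pushes every corrupted answer toward "same cluster".
    return True


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 2]  # hidden partition, illustrative only
    query = make_semi_random_oracle(labels, 0.3, always_same, random.Random(0))
    print([query(0, v) for v in range(1, len(labels))])
```

With `always_flip`, the simulation behaves like an oracle whose answers are wrong with probability exactly p. Other adversaries, such as `always_same`, produce answer distributions that an algorithm tuned to an exact error probability cannot rely on, which is the situation the abstract describes.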

Taxonomy

Topics: Machine Learning and Algorithms · Data Management and Algorithms · Privacy-Preserving Technologies in Data