When is Clustering Perturbation Robust?

Margareta Ackerman; Jarrod Moore

arXiv:1601.05900 · cs.LG · January 25, 2016

Open Access

TL;DR

This paper analyzes the limits of clustering algorithms' robustness to data perturbations and identifies structural conditions that enable meaningful clustering despite noisy data.

Contribution

It provides a formal analysis of perturbation robustness in clustering and characterizes structures that support reliable clustering under data noise.

Findings

01. Robustness to perturbations is inherently limited in clustering algorithms.

02. Certain data structures enable meaningful clustering despite noisy dissimilarities.

03. The paper offers theoretical insights into when clustering can withstand data perturbations.

Abstract

Clustering is a fundamental data mining tool that aims to divide data into groups of similar items. Generally, intuition about clustering reflects the ideal case -- exact data sets endowed with flawless dissimilarity between individual instances. In practice, however, these cases are in the minority, and clustering applications are typically characterized by noisy data sets with approximate pairwise dissimilarities. As such, the efficacy of clustering methods in practical applications necessitates robustness to perturbations. In this paper, we perform a formal analysis of perturbation robustness, revealing that the extent to which algorithms can exhibit this desirable characteristic is inherently limited, and identifying the types of structures that allow popular clustering paradigms to discover meaningful clusters in spite of faulty data.
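The notion of perturbation robustness described in the abstract can be illustrated with a small empirical check. The sketch below is not the paper's formal definition or method, which the abstract does not spell out. It assumes a multiplicative noise model (each pairwise dissimilarity scaled by a random factor in [1 - eps, 1 + eps]) and uses single linkage as a stand-in for a "popular clustering paradigm". The names single_linkage, perturb, is_stable, and dist_matrix are hypothetical helpers introduced only for this illustration.

```python
import random
from itertools import combinations


def dist_matrix(points):
    """Pairwise absolute differences for points on a line."""
    return [[abs(a - b) for b in points] for a in points]


def single_linkage(d, k):
    """Merge the two closest clusters (by minimum pairwise dissimilarity) until k remain."""
    clusters = [frozenset([i]) for i in range(len(d))]
    while len(clusters) > k:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: min(d[i][j] for i in pair[0] for j in pair[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


def perturb(d, eps, rng):
    """Assumed noise model: scale each dissimilarity by a factor in [1 - eps, 1 + eps]."""
    n = len(d)
    out = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        out[i][j] = out[j][i] = d[i][j] * rng.uniform(1 - eps, 1 + eps)
    return out


def is_stable(d, k, eps, trials=200, seed=0):
    """True if every sampled perturbation yields the same partition as the clean input."""
    rng = random.Random(seed)
    base = single_linkage(d, k)
    return all(single_linkage(perturb(d, eps, rng), k) == base for _ in range(trials))


# Two well-separated groups: the between-group gap dwarfs the within-group gaps.
separated = dist_matrix([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
# Nearly evenly spaced points: the "right" split is only marginally wider than the others.
ambiguous = dist_matrix([0.0, 1.0, 2.0, 3.05, 4.05, 5.05])

stable_when_separated = is_stable(separated, k=2, eps=0.1)
stable_when_ambiguous = is_stable(ambiguous, k=2, eps=0.1)
```

For the well-separated data, 10% noise cannot make a within-group gap (at most 1.1) exceed the between-group gap (at least 7.2), so the partition is unchanged under every perturbation of this kind. For the nearly evenly spaced data, the same noise level is very likely to move the split, which loosely mirrors the abstract's point that robustness depends on the structure present in the data. A sampled check like this can only refute stability, not prove it.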

Taxonomy

Topics: Data Mining Algorithms and Applications · Data Management and Algorithms · Advanced Clustering Algorithms Research