Clustering with Confidence: Finding Clusters with Statistical Guarantees
Andreas Henelius, Kai Puolamäki, Henrik Boström, Panagiotis Papapetrou

TL;DR
This paper introduces a statistical method to quantify and find robust clusters with guarantees on their stability, applicable to various clustering and classification algorithms, validated on simulated and real data.
Contribution
The paper proposes a novel technique for identifying core clusters with statistical robustness guarantees, linking it to maximal clique detection in graphs.
Findings
Clusters meet robustness guarantees in tests
Method applicable to clustering and classification algorithms
Validated on simulated and real datasets
Abstract
Clustering is a widely used unsupervised learning method for finding structure in the data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the used data sample or re-running a clustering algorithm involving some stochastic component may lead to completely different clusters. There is, hence, a need for techniques that can quantify the instability of the generated clusters. In this study, we propose a technique for quantifying the instability of a clustering solution and for finding robust clusters, termed core clusters, which correspond to clusters where the co-occurrence probability of each data item within a cluster is at least 1 − α. We demonstrate how solving the core clustering problem is linked to finding the largest maximal cliques in a graph. We show that the method can be used with both clustering and…
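The link between co-occurrence probabilities and maximal cliques can be illustrated with a small sketch. It is not the authors' implementation: it assumes the repeated clusterings (for example, from resampled data or reruns of a stochastic algorithm) are already available as label lists, writes the threshold as `1 - alpha`, and peels off the largest clique greedily, which is a simplification chosen for this example. All function names are hypothetical.

```python
def cooccurrence(labelings):
    """Fraction of runs in which each pair of items lands in the same cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(labelings)
    return [[c / runs for c in row] for row in counts]


def maximal_cliques(adj, r, p, x):
    """Bron-Kerbosch enumeration of maximal cliques (basic form, no pivoting)."""
    if not p and not x:
        yield set(r)
        return
    for v in list(p):
        yield from maximal_cliques(adj, r | {v}, p & adj[v], x & adj[v])
        p = p - {v}
        x = x | {v}


def core_clusters(labelings, alpha):
    """Greedy sketch: repeatedly take the largest clique of the graph whose
    edges join items that co-occur with estimated probability >= 1 - alpha."""
    prob = cooccurrence(labelings)
    n = len(prob)
    adj = {i: {j for j in range(n) if j != i and prob[i][j] >= 1 - alpha}
           for i in range(n)}
    remaining = set(range(n))
    cores = []
    while remaining:
        sub = {i: adj[i] & remaining for i in remaining}
        cliques = maximal_cliques(sub, set(), set(remaining), set())
        # Prefer larger cliques; break ties by smallest member index.
        best = max(cliques, key=lambda c: (len(c), -min(c)))
        cores.append(sorted(best))
        remaining -= best
    return cores


# Five hypothetical clustering runs over six items (labels are per-run only).
runs = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],  # item 2 switches cluster
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],  # item 3 switches cluster
]
print(core_clusters(runs, alpha=0.3))
print(core_clusters(runs, alpha=0.1))
```

With the looser setting (`alpha=0.3`) the two unstable items stay attached to their usual groups; with the stricter setting (`alpha=0.1`) they are split off into singleton cores, because they share a cluster with their neighbours in only four of the five runs. Co-occurrence is computed from pairs of labels within a run, so cluster labels do not need to be aligned across runs.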
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Bayesian Methods and Mixture Models
