Determinantal consensus clustering

Serge Vicente; Alejandro Murua

arXiv:2102.03948·stat.ML·February 9, 2021

Open Access

TL;DR

This paper proposes determinantal point processes (DPPs) for the random restarts used in consensus clustering. Because DPPs favor diverse sets of initial center points, the resulting ensembles outperform those built with the usual uniform random sampling.

Contribution

The paper proposes using DPPs for the random restart of center-based clustering algorithms such as k-medoids or k-means, and shows through simulations that DPPs produce more diverse and representative subsets of center points than uniform sampling.

Findings

01. DPP-based sampling yields more diverse center sets.

02. DPP improves clustering performance over uniform sampling.

03. DPP outperforms classical consensus clustering methods.

Abstract

Random restart of a given algorithm produces many partitions, which are combined to yield a consensus clustering. Ensemble methods such as consensus clustering have been recognized as more robust approaches to data clustering than single clustering algorithms. We propose the use of determinantal point processes (DPPs) for the random restart of clustering algorithms based on initial sets of center points, such as k-medoids or k-means. The relation between DPPs and kernel-based methods makes DPPs suitable for describing and quantifying similarity between objects. DPPs favor diversity of the center points within subsets, so subsets with more similar points are less likely to be generated than subsets with very distinct points. The current and most popular sampling technique is to sample center points uniformly at random. We show through extensive simulations that, contrary to DPP, this technique fails both to…
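The diversity property described in the abstract can be illustrated on a toy data set. The following is a minimal sketch, not the authors' implementation: it assumes a Gaussian similarity kernel (the abstract does not say which kernel the paper uses) and draws a fixed-size subset by brute-force enumeration, with each subset's probability proportional to the determinant of the corresponding kernel submatrix. The names `gaussian_kernel` and `kdpp_subset_probs`, the bandwidth, and the data points are all hypothetical, and enumeration is feasible only for very small data sets.

```python
import itertools
import numpy as np


def gaussian_kernel(X, bandwidth=1.0):
    """Similarity matrix L[i, j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2)).

    The Gaussian kernel is an assumption made for this illustration.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def kdpp_subset_probs(L, k):
    """Enumerate all size-k subsets S and return P(S) proportional to det(L_S).

    Brute force: the number of subsets grows combinatorially with n.
    """
    n = L.shape[0]
    subsets = list(itertools.combinations(range(n), k))
    weights = np.array([np.linalg.det(L[np.ix_(s, s)]) for s in subsets])
    weights = np.clip(weights, 0.0, None)  # guard against tiny negative round-off
    return subsets, weights / weights.sum()


# Toy data (hypothetical): three tight pairs of points, far from each other.
X = np.array([
    [0.0, 0.0], [0.1, 0.0],
    [3.0, 0.0], [3.0, 0.1],
    [0.0, 3.0], [0.1, 3.0],
])

L = gaussian_kernel(X)
subsets, probs = kdpp_subset_probs(L, k=3)
prob_of = dict(zip(subsets, probs))
uniform_prob = 1.0 / len(subsets)  # every subset equally likely under uniform sampling

# One random restart: draw three initial centers from the fixed-size DPP.
rng = np.random.default_rng(0)
centers = subsets[rng.choice(len(subsets), p=probs)]

print("spread subset (0, 2, 4):", prob_of[(0, 2, 4)])
print("subset with a near-duplicate pair (0, 1, 2):", prob_of[(0, 1, 2)])
print("any subset under uniform sampling:", uniform_prob)
print("sampled centers:", centers)
```

Here a subset with one point from each pair receives more probability than it would under uniform sampling, while a subset containing two nearly identical points receives far less, because two similar rows of the kernel submatrix push its determinant toward zero.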

Taxonomy

Topics: Advanced Clustering Algorithms Research · Topological and Geometric Data Analysis · Data Management and Algorithms