Clustering by the Probability Distributions from Extreme Value Theory
Sixiao Zheng, Ke Fan, Yanxi Hou, Jianfeng Feng, and Yanwei Fu

TL;DR
This paper introduces a probabilistic clustering algorithm based on Extreme Value Theory. It models each cluster's distribution with the Generalized Pareto Distribution (GPD), with the aim of improving on traditional k-means.
Contribution
The paper generalizes k-means using Extreme Value Theory (EVT): it models the distances to centroids that exceed a threshold with a GPD, and it introduces the concept of centroid margin distance for probabilistic clustering.
Findings
GPD k-means outperforms traditional methods on synthetic and real datasets.
The approach provides a more stable and probabilistic clustering framework.
GPD k-means effectively estimates cluster structure.
Abstract
Clustering is an essential task in unsupervised learning. It tries to automatically separate instances into coherent subsets. As one of the most well-known clustering algorithms, k-means assigns sample points at the boundary to a unique cluster, but it does not utilize information about the sample distribution or density. By comparison, it would potentially be more beneficial to consider the probability of each sample belonging to a possible cluster. To this end, this paper generalizes k-means to model the distribution of clusters. Our novel clustering algorithm thus models the distributions of distances to centroids over a threshold with the Generalized Pareto Distribution (GPD) from Extreme Value Theory (EVT). Notably, we propose the concept of centroid margin distance, use GPD to establish a probability model for each cluster, and perform a clustering algorithm based on the covering probability function…
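To make the peaks-over-threshold idea in the abstract concrete, here is a minimal sketch that fits a GPD to the distances from a cluster's points to its centroid that exceed a threshold. It is an illustration under stated assumptions, not the authors' algorithm: the function names (`fit_gpd_moments`, `gpd_cdf`, `tail_model`), the 0.8 quantile threshold, and the method-of-moments estimator are all choices made here for simplicity. The paper's own estimator, its centroid margin distance, and its covering probability function are not reproduced.

```python
import math
import random
import statistics


def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit (an assumption; valid only when xi < 0.5).

    Uses mean = sigma / (1 - xi) and var = sigma^2 / ((1 - xi)^2 (1 - 2 xi)).
    Returns (xi, sigma): shape and scale.
    """
    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (1.0 + ratio)
    return xi, sigma


def gpd_cdf(y, xi, sigma):
    """GPD distribution function for an excess y over the threshold."""
    if y <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:
        # xi -> 0 limit is the exponential distribution
        return 1.0 - math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:
        # for xi < 0 the support ends at -sigma / xi
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


def tail_model(points, centroid, quantile=0.8):
    """Fit a GPD to the distances from `centroid` that exceed a threshold.

    The threshold u is an empirical quantile of the distances
    (a hypothetical choice for this sketch).
    Returns (u, xi, sigma).
    """
    dists = sorted(math.dist(p, centroid) for p in points)
    u = dists[int(quantile * (len(dists) - 1))]
    excesses = [d - u for d in dists if d > u]
    xi, sigma = fit_gpd_moments(excesses)
    return u, xi, sigma


def demo():
    random.seed(0)
    # one synthetic 2-D Gaussian cluster
    points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2000)]
    centroid = tuple(statistics.fmean(c) for c in zip(*points))
    u, xi, sigma = tail_model(points, centroid)
    # estimated probability that a tail distance exceeds u + 0.5
    p_exceed = 1.0 - gpd_cdf(0.5, xi, sigma)
    print(f"u={u:.3f} xi={xi:.3f} sigma={sigma:.3f} P(excess>0.5)={p_exceed:.3f}")


if __name__ == "__main__":
    demo()
```

A fitted tail model of this kind gives each cluster a probability statement about how far its boundary points lie from the centroid, which is the sort of information plain k-means discards when it makes hard assignments.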
Taxonomy
Topics: Advanced Clustering Algorithms Research · Complex Network Analysis Techniques · Bayesian Methods and Mixture Models
