Clustering by the Probability Distributions from Extreme Value Theory

Sixiao Zheng; Ke Fan; Yanxi Hou; Jianfeng Feng; and Yanwei Fu

arXiv:2202.09784 · cs.LG · February 22, 2022

TL;DR

This paper introduces a probabilistic clustering algorithm based on Extreme Value Theory (EVT). It models the distribution of each cluster with the Generalized Pareto Distribution (GPD) to improve on traditional k-means.

Contribution

The paper generalizes k-means by incorporating Extreme Value Theory (EVT), models distances to centroids with the Generalized Pareto Distribution (GPD), and introduces the concept of centroid margin distance for probabilistic clustering.
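
For reference, the standard two-parameter form of the GPD distribution function is given below. Here x is an excess over a threshold, σ is the scale parameter, and ξ is the shape parameter; when ξ < 0 the support is bounded above by -σ/ξ. These are the conventional textbook symbols and are an assumption here, not necessarily the notation used in the paper.

```latex
G_{\xi,\sigma}(x) =
\begin{cases}
1 - \left(1 + \dfrac{\xi x}{\sigma}\right)^{-1/\xi}, & \xi \neq 0,\\[6pt]
1 - \exp\!\left(-\dfrac{x}{\sigma}\right), & \xi = 0,
\end{cases}
\qquad x \ge 0,\ \sigma > 0 .
```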

Findings

01 GPD k-means outperforms traditional methods on synthetic and real datasets.

02 The approach provides a more stable and probabilistic clustering framework.

03 GPD k-means effectively estimates cluster structure.

Abstract

Clustering is an essential task in unsupervised learning. It tries to automatically separate instances into coherent subsets. As one of the most well-known clustering algorithms, k-means assigns sample points at the boundary to a unique cluster, but it does not utilize information about the sample distribution or density. By comparison, it would potentially be more beneficial to consider the probability of each sample belonging to a possible cluster. To this end, this paper generalizes k-means to model the distribution of clusters. Our novel clustering algorithm thus models the distributions of distances to centroids over a threshold with the Generalized Pareto Distribution (GPD) from Extreme Value Theory (EVT). Notably, we propose the concept of centroid margin distance, use GPD to establish a probability model for each cluster, and perform a clustering algorithm based on the covering probability function…
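
The core idea of fitting a GPD to distances that exceed a threshold can be illustrated with a generic peaks-over-threshold fit. The sketch below is not the paper's algorithm and not the code in the official repository: the helper names `fit_gpd_excess` and `tail_probability`, the 0.9 quantile threshold, and the synthetic Gaussian data are all assumptions made for illustration. It uses `scipy.stats.genpareto`, and `tail_probability` is a plain GPD survival function, not the paper's covering probability function.

```python
import numpy as np
from scipy.stats import genpareto


def fit_gpd_excess(distances, quantile=0.9):
    """Fit a GPD to the excesses of `distances` over an empirical quantile.

    Hypothetical helper: the quantile-based threshold is an illustrative
    choice, not the threshold rule from the paper.
    """
    distances = np.asarray(distances, dtype=float)
    threshold = np.quantile(distances, quantile)
    excess = distances[distances > threshold] - threshold
    # floc=0 pins the location at zero, so only shape and scale are estimated.
    shape, _, scale = genpareto.fit(excess, floc=0)
    return threshold, shape, scale


def tail_probability(d, threshold, shape, scale):
    """Survival probability of the excess (d - threshold) under the fitted GPD.

    Distances at or below the threshold map to an excess of 0, i.e. 1.0.
    """
    excess = np.maximum(np.asarray(d, dtype=float) - threshold, 0.0)
    return genpareto.sf(excess, shape, loc=0, scale=scale)


# Synthetic single cluster (assumed data, for demonstration only).
rng = np.random.default_rng(0)
points = rng.normal(size=(2000, 2))
centroid = points.mean(axis=0)
distances = np.linalg.norm(points - centroid, axis=1)

threshold, shape, scale = fit_gpd_excess(distances)
print(f"threshold={threshold:.3f} shape={shape:.3f} scale={scale:.3f}")
```

Fixing the location at zero matches the usual peaks-over-threshold setup, in which the threshold has already been subtracted from the data before fitting.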

Code & Models

Repositories

sixiaozheng/evt-k-means (PyTorch, official)

Taxonomy

Topics: Advanced Clustering Algorithms Research · Complex Network Analysis Techniques · Bayesian Methods and Mixture Models