A new measure for assessment of clustering based on kernel density estimation

Soumita Modak

arXiv:2201.02030 · stat.ME · February 15, 2022

TL;DR

This paper introduces a novel clustering validity index based on kernel density estimation of interpoint distances, suitable for high-dimensional data, and demonstrates its effectiveness through simulations and real-world applications.

Contribution

It proposes a new clustering accuracy measure, built on interpoint distances and therefore usable in any dimension, that can determine the number of clusters and assess clustering quality across various data types.

Findings

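01 Outperforms the average silhouette width and the Dunn index in a simulation study.

02 Shown to be effective in high-dimensional applications from biostatistics and astrophysics.

03 Compatible with any clustering algorithm and with any data set for which an interpoint distance having a density function can be defined.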

Abstract

A new clustering accuracy measure is proposed to determine the unknown number of clusters and to assess the quality of clustering of a data set given in a space of any dimension. Our validity index applies the classical nonparametric univariate kernel density estimation method to the interpoint distances computed between the members of the data set. Being based on interpoint distances, it is free of the curse of dimensionality and therefore efficiently computable in high-dimensional situations where the number of study variables can be larger than the sample size. The proposed measure is compatible with any clustering algorithm and with every kind of data set for which the interpoint distance measure can be defined to have a density function. A simulation study shows its superiority over widely used cluster validity indices such as the average silhouette width and the Dunn index, whereas its…
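The abstract names two ingredients: interpoint distances and a classical univariate kernel density estimate. The sketch below illustrates only those ingredients and is not the paper's validity index, whose formula is not given on this page. Everything in it is an assumption made for illustration: the Euclidean distance, the Gaussian kernel with Silverman's rule-of-thumb bandwidth, the split into within-cluster and between-cluster distances, the simulated data, and the helper names `pairwise_distances`, `gaussian_kde_1d` and `split_distances`.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean interpoint distances as an (n, n) matrix (assumed metric)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def gaussian_kde_1d(sample, grid):
    """Univariate Gaussian KDE; bandwidth from Silverman's rule of thumb (assumed)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    h = 1.06 * sample.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

def split_distances(D, labels):
    """Illustrative split of the upper-triangle distances by cluster membership."""
    i, j = np.triu_indices(D.shape[0], k=1)
    same = labels[i] == labels[j]
    return D[i, j][same], D[i, j][~same]

# Simulated data with more variables (p) than observations (n): two shifted groups.
rng = np.random.default_rng(0)
n_per, p = 20, 200
X = np.vstack([rng.normal(0.0, 1.0, (n_per, p)),
               rng.normal(3.0, 1.0, (n_per, p))])
labels = np.repeat([0, 1], n_per)  # stand-in for the output of any clustering algorithm

# However large p is, the distances form an n x n matrix of scalars,
# so the density estimation step stays one-dimensional.
D = pairwise_distances(X)
within, between = split_distances(D, labels)

grid = np.linspace(0.0, D.max() * 1.2, 512)
f_within = gaussian_kde_1d(within, grid)
f_between = gaussian_kde_1d(between, grid)
```

The clustering labels and the distance function enter only through `labels` and `D`, which is one way to read the abstract's claim that the measure works with any clustering algorithm and any suitable distance.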
