A new nonparametric interpoint distance-based measure for assessment of clustering

Soumita Modak

arXiv:2210.08972·cs.LG·October 18, 2022

TL;DR

This paper introduces a nonparametric, distance-based measure for determining the optimal number of clusters in a dataset, applicable to various data types and compatible with any clustering algorithm.

Contribution

It proposes a novel cluster validity index that is independent of data distribution and effective for high-dimensional, univariate, and multivariate data.

Findings

01

Demonstrates superiority over existing cluster validity measures

02

Applicable to data with arbitrary scales and high dimensionality

03

Effective in both synthetic and real-world datasets

Abstract

A new interpoint distance-based measure is proposed to identify the optimal number of clusters present in a data set. Designed with a nonparametric approach, it is independent of the distribution of the given data. Because it relies only on interpoint distances between the data members, our cluster validity index is applicable to univariate and multivariate data measured on arbitrary scales, and to observations in a space of any dimension, including cases where the number of study variables is larger than the sample size. Our proposed criterion is compatible with any clustering algorithm, and can be used to determine the unknown number of clusters or to assess the quality of the resulting clusters for a data set. Demonstrations on synthetic and real-life data establish its superiority over well-known clustering accuracy measures from the literature.
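The abstract does not state the proposed index itself, so it cannot be reproduced here. As a rough illustration of the general workflow it describes (score each candidate number of clusters using only interpoint distances, then keep the best-scoring one), the sketch below substitutes the well-known mean silhouette width as a stand-in index. The toy points, the candidate labelings, and the function names `pairwise_distances` and `mean_silhouette` are assumptions made for illustration, not part of the paper.

```python
import math


def pairwise_distances(points):
    """Euclidean interpoint distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def mean_silhouette(dist, labels):
    """Mean silhouette width, computed only from interpoint distances.

    Stand-in index: this is NOT the measure proposed in the paper.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # Mean distance to the other members of the same cluster.
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(labels)


# Toy data (assumed): three well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
dist = pairwise_distances(points)

# Hypothetical labelings, as any clustering algorithm might return for each k.
candidates = {
    2: [0, 0, 0, 0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],
    4: [0, 0, 3, 1, 1, 1, 2, 2, 2],
}
scores = {k: mean_silhouette(dist, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

Because the index sees only the distance matrix, the same scoring step works unchanged for any distance function and any clustering algorithm that returns labels, which is the property the abstract emphasizes.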
