Data ultrametricity and clusterability

Dan Simovici; Kaixun Hua

arXiv:1908.10833·cs.LG·January 8, 2020

TL;DR

This paper introduces an ultrametric-based method for assessing whether a dataset is clusterable, measuring the dataset's ultrametricity with a special type of matrix product.

Contribution

It proposes a technique for determining a dataset's ultrametricity, and from it the dataset's clusterability, so that users can judge whether a massive dataset is worth clustering before paying the cost of running a clustering algorithm.

Findings

01

The method effectively evaluates dataset ultrametricity.

02

Applying the technique yields the sub-dominant ultrametric of dissimilarities.

03

The approach facilitates efficient clustering of large datasets.

Abstract

The increasing need to cluster massive datasets and the high cost of running clustering algorithms pose difficult problems for users. In this context it is important to determine whether a dataset is clusterable, that is, whether it can be partitioned efficiently into well-differentiated groups containing similar objects. We approach data clusterability from an ultrametric-based perspective. A novel approach to determining the ultrametricity of a dataset is proposed via a special type of matrix product, which allows us to evaluate the clusterability of the dataset. Furthermore, we show that applying our technique to a dissimilarity space generates the sub-dominant ultrametric of the dissimilarity.
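The abstract does not spell out the matrix product, so the following is only a minimal sketch under a stated assumption: that the product is the min-max product, (A ⊗ B)[i, j] = min over k of max(A[i, k], B[k, j]). With that product, repeatedly multiplying a symmetric dissimilarity matrix with zero diagonal by itself until nothing changes gives the minimax-path distance, which is a standard characterization of the sub-dominant ultrametric. The function names `minmax_product`, `subdominant_ultrametric`, and `is_ultrametric` are hypothetical and are not taken from the paper.

```python
import numpy as np


def minmax_product(a, b):
    """Assumed product: result[i, j] = min over k of max(a[i, k], b[k, j])."""
    # Broadcast to shape (i, k, j), take the elementwise max, then min over k.
    return np.maximum(a[:, :, None], b[None, :, :]).min(axis=1)


def subdominant_ultrametric(d):
    """Iterate the min-max product until a fixed point is reached.

    Expects a symmetric dissimilarity matrix with a zero diagonal.
    Returns the fixed point and the number of products that changed the matrix.
    """
    d = np.asarray(d, dtype=float)
    current = d
    steps = 0
    while True:
        nxt = minmax_product(current, d)
        if np.array_equal(nxt, current):
            return current, steps
        current = nxt
        steps += 1


def is_ultrametric(d):
    """With a zero diagonal, d is ultrametric exactly when d ⊗ d equals d."""
    d = np.asarray(d, dtype=float)
    return np.array_equal(minmax_product(d, d), d)


# Three points; the direct dissimilarity 5 between points 0 and 2 violates
# the ultrametric inequality because the path through point 1 has max edge 2.
d = np.array([[0.0, 1.0, 5.0],
              [1.0, 0.0, 2.0],
              [5.0, 2.0, 0.0]])
u, steps = subdominant_ultrametric(d)
print(u)      # entry (0, 2) drops from 5 to 2
print(steps)  # 1
```

The broadcasted product uses memory cubic in the number of points, so this form only suits small matrices. How the paper turns the number of products into a clusterability score is not stated in the abstract and is not modeled here.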
