Shape complexity in cluster analysis

Eduardo J. Aguilar; Valmir C. Barbosa

arXiv:2205.08046·cs.LG·May 30, 2023

TL;DR

This paper introduces a novel data-driven method for determining scaling factors in cluster analysis by leveraging shape complexity, a concept borrowed from cosmology, to improve clustering performance.

Contribution

It proposes a new approach using shape complexity to select scaling factors before clustering, which is a departure from traditional statistical scaling methods.

Findings

01. Reports positive results on iconic data sets

02. Highlights strengths of the shape complexity approach

03. Discusses potential weaknesses and data considerations

Abstract

In cluster analysis, a common first step is to scale the data aiming to better partition them into clusters. Even though many different techniques have throughout many years been introduced to this end, it is probably fair to say that the workhorse in this preprocessing phase has been to divide the data by the standard deviation along each dimension. Like division by the standard deviation, the great majority of scaling techniques can be said to have roots in some sort of statistical take on the data. Here we explore the use of multidimensional shapes of data, aiming to obtain scaling factors for use prior to clustering by some method, like k-means, that makes explicit use of distances between samples. We borrow from the field of cosmology and related areas the recently introduced notion of shape complexity, which in the variant we use is a relatively simple, data-dependent nonlinear…
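
For orientation, the conventional preprocessing step the abstract describes (dividing the data by the standard deviation along each dimension before running a distance-based method such as k-means) can be sketched as follows. This is a minimal illustration of that baseline only, not of the paper's shape-complexity method. The function names `scale_by_std` and `kmeans`, the synthetic data, and all parameter values are assumptions made for the example.

```python
import numpy as np


def scale_by_std(X):
    """Divide each column (dimension) of X by its standard deviation."""
    std = X.std(axis=0)
    std = np.where(std > 0, std, 1.0)  # leave constant dimensions unchanged
    return X / std


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with Euclidean distances (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct samples chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic data (an assumption for illustration): two groups that differ
# along the first dimension, while the second has a much larger spread.
rng = np.random.default_rng(42)
a = np.column_stack([rng.normal(0.0, 0.1, 50), rng.normal(0.0, 100.0, 50)])
b = np.column_stack([rng.normal(1.0, 0.1, 50), rng.normal(0.0, 100.0, 50)])
X = np.vstack([a, b])

X_scaled = scale_by_std(X)               # every dimension now has unit spread
labels, centers = kmeans(X_scaled, k=2)  # cluster on the scaled data
```

Because k-means relies on Euclidean distances, the unscaled second dimension in this toy data would dominate every distance computation. Dividing by the standard deviation puts both dimensions on a comparable footing, which is the role any scaling factor plays in this preprocessing phase.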

Taxonomy

Topics: Advanced Clustering Algorithms Research · Computational Drug Discovery Methods