Mass Distribution versus Density Distribution in the Context of Clustering
Kai Ming Ting, Ye Zhu, Hang Zhang, Tianrun Liang

TL;DR
This paper compares density and mass distributions in clustering, highlighting limitations of density-based methods and proposing a mass-maximization clustering algorithm that reduces bias towards dense clusters.
Contribution
It introduces a novel mass-maximization clustering algorithm that overcomes density bias and provides a new perspective on data clustering.
Findings
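Density distribution has a fundamental high-density bias, irrespective of the clustering algorithm used.
Existing density-based algorithms counter this bias with some success but still struggle to discover clusters of arbitrary shapes, sizes and densities.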
Mass-maximization clustering reduces bias and improves cluster discovery.
Abstract
This paper investigates two fundamental descriptors of data, density distribution and mass distribution, in the context of clustering. Density distribution has been the de facto descriptor of data distribution since the introduction of statistics. We show that density distribution has a fundamental limitation, high-density bias, irrespective of the algorithm used to perform clustering. Existing density-based clustering algorithms have employed different algorithmic means to counter the effect of the high-density bias with some success, but the fundamental limitation of using density distribution remains an obstacle to discovering clusters of arbitrary shapes, sizes and densities. Using the mass distribution as a better foundation, we propose a new algorithm which maximizes the total mass of all clusters, called mass-maximization clustering (MMC). The algorithm can be…
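The abstract's point about high-density bias can be illustrated with a toy case. The sketch below is not the paper's MMC algorithm and does not use the paper's definition of the bias; it is a minimal, assumed illustration of how a single global density threshold (a DBSCAN-style core-point test with hypothetical values `EPS` and `MIN_PTS`) treats two made-up 1-D clusters that hold the same number of points but differ in spread.

```python
def eps_neighbor_counts(points, eps):
    """For each 1-D point, count the points (itself included) within distance eps."""
    return [sum(1 for q in points if abs(p - q) <= eps) for p in points]


# Two hypothetical clusters, 50 points each, far apart from one another.
dense = [i * 0.01 for i in range(50)]        # spacing 0.01, spans [0, 0.49]
sparse = [5.0 + i * 0.2 for i in range(50)]  # spacing 0.2, spans [5, 14.8]
points = dense + sparse

# Illustrative global threshold (assumed values, not taken from the paper).
EPS, MIN_PTS = 0.055, 5

counts = eps_neighbor_counts(points, EPS)
is_core = [c >= MIN_PTS for c in counts]

dense_core = sum(is_core[:50])    # core points found in the dense cluster
sparse_core = sum(is_core[50:])   # core points found in the sparse cluster
print(dense_core, sparse_core)
```

With these values every point of the dense cluster passes the core-point test, while no point of the sparse cluster does, so a threshold tuned to the dense cluster would label the equally populated sparse cluster as noise.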
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Statistical Methods and Applications
