Mass Distribution versus Density Distribution in the Context of Clustering

Kai Ming Ting; Ye Zhu; Hang Zhang; Tianrun Liang

arXiv:2601.10759 · stat.ML · January 26, 2026

TL;DR

This paper compares density and mass distributions in clustering, highlighting limitations of density-based methods and proposing a mass-maximization clustering algorithm that reduces bias towards dense clusters.

Contribution

It introduces mass-maximization clustering (MMC), a new algorithm that overcomes the high-density bias of density-based clustering and provides a new perspective on data clustering.

Findings

01. Density distribution has a fundamental high-density bias.

02. Existing density-based algorithms struggle to discover clusters of arbitrary shapes, sizes and densities.

03. Mass-maximization clustering reduces this bias and improves cluster discovery.

Abstract

This paper investigates two fundamental descriptors of data, density distribution and mass distribution, in the context of clustering. Density distribution has been the de facto descriptor of data distribution since the introduction of statistics. We show that density distribution has a fundamental limitation, the high-density bias, irrespective of the algorithm used to perform clustering. Existing density-based clustering algorithms have employed different algorithmic means to counter the effect of the high-density bias, with some success, but the fundamental limitation of using density distribution remains an obstacle to discovering clusters of arbitrary shapes, sizes and densities. Using mass distribution as a better foundation, we propose a new algorithm, called mass-maximization clustering (MMC), which maximizes the total mass of all clusters. The algorithm can be…
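The toy sketch below illustrates, under our own assumptions, what a high-density bias can look like. It is not MMC and not the paper's mass estimator. We assume that the "mass" of a cluster is simply its share of all points, and we use a 1-D k-nearest-neighbour density estimate (the helper `knn_density` and the parameter `k` are hypothetical choices for illustration). Two clusters carry equal mass, yet ranking points by density places every point of the dense cluster ahead of every point of the sparse one.

```python
from collections import Counter


def knn_density(points, k):
    """Illustrative 1-D k-NN density estimate (an assumption, not the paper's
    method): f(x) ~ k / (n * 2 * r_k(x)), where r_k(x) is the distance from x
    to its k-th nearest other point."""
    n = len(points)
    densities = []
    for i, x in enumerate(points):
        # Distances from x to every other point, nearest first.
        dists = sorted(abs(x - y) for j, y in enumerate(points) if j != i)
        r_k = dists[k - 1]
        densities.append(k / (n * 2 * r_k))
    return densities


# Two clusters with equal mass (50 points each) but very different densities.
dense = [i / 49 for i in range(50)]             # 50 points spread over [0, 1]
sparse = [10 + 10 * i / 49 for i in range(50)]  # 50 points spread over [10, 20]
points = dense + sparse
labels = ["dense"] * 50 + ["sparse"] * 50

# Mass view: the fraction of all points that belongs to each cluster.
mass = {c: cnt / len(points) for c, cnt in Counter(labels).items()}

# Density view: rank points by estimated density and keep the top half.
dens = knn_density(points, k=5)
order = sorted(range(len(points)), key=lambda i: dens[i], reverse=True)
top_half = Counter(labels[i] for i in order[:50])

print(mass)      # {'dense': 0.5, 'sparse': 0.5}
print(top_half)  # Counter({'dense': 50})
```

Any single density threshold that keeps a sparse-cluster point must also keep all dense-cluster points, whereas the mass view treats the two clusters as equally important.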

Taxonomy

Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Statistical Methods and Applications