A simulation study of cluster search algorithms in data set generated by Gaussian mixture models
Ryosuke Motegi, Yoichi Seki

TL;DR
This study systematically compares centroid-based and model-based cluster search algorithms on data generated from Gaussian mixture models under a range of conditions, revealing the strengths and limitations of each in different scenarios.
Contribution
It provides a comprehensive evaluation of cluster search algorithms under diverse conditions, highlighting which factors each algorithm is robust or sensitive to.
Findings
Euclidean distance-based criteria can be unreliable with overlapping clusters.
Model-based algorithms are more robust to covariance and overlap when sample size is large.
The study offers practical insights for choosing cluster search methods in large datasets.
Abstract
Determining the number of clusters is a fundamental issue in data clustering. Several algorithms have been proposed, including centroid-based algorithms using the Euclidean distance and model-based algorithms using a mixture of probability distributions. Among these, greedy algorithms that search for the number of clusters by repeatedly splitting or merging clusters have advantages in computation time for problems with large sample sizes. However, systematic evaluation experiments comparing these methods are still lacking. This study examines centroid- and model-based cluster search algorithms in various cases that Gaussian mixture models (GMMs) can generate. The cases are generated by combining five factors: dimensionality, sample size, the number of clusters, cluster overlap, and covariance type. The results show that some cluster-splitting criteria based on…
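As a rough illustration of how cases could be built from the five factors named in the abstract, the sketch below enumerates a full factorial grid and draws one synthetic data set from a GMM. The factor levels, the function names `enumerate_cases` and `sample_gmm`, the equal mixture weights, and the use of mean spacing to control overlap are all assumptions made for this example; the summary does not state the paper's actual levels or generation procedure.

```python
import itertools
import random

# Hypothetical factor levels (assumption): the paper's real levels are not
# given in this summary.
FACTORS = {
    "dim": [2, 10],
    "n_samples": [500, 5000],
    "n_clusters": [3, 10],
    "overlap": ["low", "high"],
    "covariance": ["spherical", "diagonal"],
}


def enumerate_cases(factors):
    """Return one dict per combination of factor levels (full factorial)."""
    names = list(factors)
    return [
        dict(zip(names, levels))
        for levels in itertools.product(*factors.values())
    ]


def sample_gmm(case, rng):
    """Draw points from an equal-weight GMM with axis-aligned covariance.

    Overlap is controlled by the spacing between component means, which is
    an illustrative choice and not the paper's definition of overlap.
    """
    d, n, k = case["dim"], case["n_samples"], case["n_clusters"]
    spacing = 8.0 if case["overlap"] == "low" else 2.0
    # Component j is centred at (j * spacing, ..., j * spacing).
    means = [[j * spacing] * d for j in range(k)]
    if case["covariance"] == "spherical":
        sds = [[1.0] * d for _ in range(k)]
    else:  # "diagonal": a separate standard deviation per dimension
        sds = [[rng.uniform(0.5, 1.5) for _ in range(d)] for _ in range(k)]

    points, labels = [], []
    for _ in range(n):
        j = rng.randrange(k)  # pick a component uniformly at random
        points.append([rng.gauss(m, s) for m, s in zip(means[j], sds[j])])
        labels.append(j)
    return points, labels


cases = enumerate_cases(FACTORS)
points, labels = sample_gmm(cases[0], random.Random(0))
```

Each entry of `cases` can then be passed to whichever cluster search algorithm is under test, with `labels` kept aside as the ground truth for scoring the estimated number of clusters.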
Taxonomy
Topics: Bayesian Methods and Mixture Models
