Flexible parametric bootstrap for testing homogeneity against clustering and assessing the number of clusters
Christian Hennig, Chien-Ju Lin

TL;DR
This paper introduces a flexible parametric bootstrap method to test homogeneity and determine the number of clusters in data, accounting for data structure and various clustering techniques.
Contribution
It proposes a novel bootstrap approach that models data features to improve cluster validation and homogeneity testing, accommodating complex data types and structures.
Findings
Effective in testing homogeneity across diverse data types.
Calibrates validation indexes for accurate cluster number estimation.
Applicable to various clustering methods and data structures.
Abstract
There are two notoriously hard problems in cluster analysis: estimating the number of clusters, and checking whether the population to be clustered is in fact homogeneous. Given a dataset, a clustering method, and a cluster validation index, this paper proposes to set up null models that capture structural features of the data that cannot be interpreted as indicating clustering. Artificial datasets are sampled from the null model with parameters estimated from the original dataset. This can be used for testing the null hypothesis of a homogeneous population against a clustering alternative. It can also be used to calibrate the validation index for estimating the number of clusters, by taking into account the expected distribution of the index under the null model for any given number of clusters. The approach is illustrated by three examples, involving various different clustering…
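The testing procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It rests on several assumptions that do not come from the paper: the data are one-dimensional, the null model is a single Gaussian with mean and standard deviation estimated from the data, the clustering method is an exhaustive best two-cluster split, and the validation index is the ratio of within-cluster to total sum of squares (smaller values suggest stronger clustering). The function names `within_ss`, `two_means_index`, and `bootstrap_homogeneity_test` are hypothetical.

```python
import random
import statistics


def within_ss(values):
    """Sum of squared deviations of `values` from their mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def two_means_index(data):
    """Validation index for the best 2-cluster split of 1-D data.

    Returns within-cluster SS / total SS. Smaller values indicate
    stronger apparent clustering. (Assumed index, for illustration only.)
    """
    xs = sorted(data)
    total = within_ss(xs)
    # In one dimension the optimal 2-means partition is a split of the
    # sorted values, so trying every split point is exhaustive.
    best = min(within_ss(xs[:i]) + within_ss(xs[i:]) for i in range(1, len(xs)))
    return best / total


def bootstrap_homogeneity_test(data, n_boot=200, seed=0):
    """Parametric bootstrap test of homogeneity against clustering.

    Null model (an assumption of this sketch): one Gaussian whose
    parameters are estimated from `data`.
    """
    rng = random.Random(seed)
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    observed = two_means_index(data)

    # Sample artificial datasets from the fitted null model and compute
    # the same index on each of them.
    null_values = []
    for _ in range(n_boot):
        sample = [rng.gauss(mu, sigma) for _ in data]
        null_values.append(two_means_index(sample))

    # One-sided p-value: how often the null model yields an index at
    # least as extreme (as small) as the observed one.
    extreme = sum(v <= observed for v in null_values)
    p_value = (1 + extreme) / (n_boot + 1)
    return observed, p_value


if __name__ == "__main__":
    # Two well-separated groups; the test should reject homogeneity.
    clustered = [0.1 * i for i in range(30)] + [10 + 0.1 * i for i in range(30)]
    index, p = bootstrap_homogeneity_test(clustered)
    print(f"index={index:.3f}, p={p:.3f}")
```

In this sketch the null model is deliberately simple. The paper's point is that the null model should capture structural features of the data that do not indicate clustering, so in practice a richer null model would replace the single Gaussian. The calibration of the index across different numbers of clusters, which the abstract also describes, is not shown here.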
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Data-Driven Disease Surveillance
