Statistical power for cluster analysis

E. S. Dalmaijer, C. L. Nord, and D. E. Astle

arXiv:2003.00381 · stat.ML · May 26, 2021 · 26 citations

TL;DR

This study evaluates statistical power in cluster analysis, demonstrating how factors like effect size, sample size, and clustering method influence the ability to detect true subgroups in biomedical data.

Contribution

It provides the first comprehensive simulation-based assessment of power for various cluster analysis pipelines, guiding researchers on optimal practices.

Findings

01. Large effect sizes enable detection with small samples (N=20 per subgroup).

02. Fuzzy clustering outperforms traditional methods in overlapping distributions.

03. Multidimensional scaling enhances cluster separation and power.

Abstract

Cluster algorithms are increasingly popular in biomedical research due to their compelling ability to identify discrete subgroups in data, and their increasing accessibility in mainstream software. While guidelines exist for algorithm selection and outcome evaluation, there are no firmly established ways of computing a priori statistical power for cluster analysis. Here, we estimated power and accuracy for common analysis pipelines through simulation. We varied subgroup size, number, separation (effect size), and covariance structure. We then subjected generated datasets to dimensionality reduction (none, multidimensional scaling, or UMAP) and cluster algorithms (k-means, agglomerative hierarchical clustering with Ward or average linkage and Euclidean or cosine distance, HDBSCAN). Finally, we compared the statistical power of discrete (k-means), "fuzzy" (c-means), and finite mixture…
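The simulation idea in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline (their code is in the linked repository). Everything here is an assumption made for illustration: the function names are hypothetical, there are only two subgroups with two unit-variance features, clustering is plain Lloyd's k-means with k=2, and a dataset counts as "detected" when its mean silhouette score reaches an assumed threshold of 0.5. Power is then the fraction of simulated datasets that pass that threshold.

```python
import math
import random


def simulate_two_groups(n_per_group, delta, rng, n_features=2):
    """Two spherical unit-variance Gaussian subgroups, centroids `delta` apart."""
    data = []
    for group in range(2):
        for _ in range(n_per_group):
            point = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            point[0] += group * delta  # shift the second subgroup along feature 0
            data.append(point)
    return data


def kmeans2(data, rng, n_iter=50):
    """Plain Lloyd's algorithm with k=2 and random initial centroids."""
    centroids = rng.sample(data, 2)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
            for p in data
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(2):
            members = [p for p, lab in zip(data, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mean_silhouette(data, labels):
    """Mean silhouette coefficient for a two-cluster labelling."""
    scores = []
    for i, p in enumerate(data):
        same = [math.dist(p, q) for j, q in enumerate(data)
                if j != i and labels[j] == labels[i]]
        other = [math.dist(p, q) for j, q in enumerate(data)
                 if labels[j] != labels[i]]
        if not same or not other:  # singleton or empty cluster scores 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)    # mean distance within own cluster
        b = sum(other) / len(other)  # mean distance to the other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def estimate_power(n_per_group, delta, n_sims=100, threshold=0.5, seed=0):
    """Fraction of simulated datasets whose silhouette reaches `threshold`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        data = simulate_two_groups(n_per_group, delta, rng)
        labels = kmeans2(data, rng)
        if mean_silhouette(data, labels) >= threshold:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for delta in (1.0, 2.0, 4.0, 8.0):
        print(f"delta={delta}: power ~ {estimate_power(20, delta):.2f}")
```

Running the script prints one power estimate per separation value. Changing `n_per_group` while holding `delta` fixed gives a rough feel for how sample size and effect size trade off under these simplified assumptions.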

Code & Models

Repositories

esdalmaijer/cluster_power (official)

Taxonomy

Topics: Statistical Methods and Bayesian Inference · Bayesian Methods and Mixture Models