Quantifying Distances Between Clusters with Elliptical or Non-Elliptical Shapes
Meredith L. Wallace, Lisa McTeague, Jessica L. Graves, Nicholas Kissel, Cristina Tortora, Bradley Wheeler, and Satish Iyengar

TL;DR
This paper introduces measures for quantifying distances between clusters of various shapes, including non-elliptical ones, to make complex clustering models of health data easier to interpret and compare.
Contribution
It proposes practical measures and computational tools for assessing multivariate distances between diverse cluster shapes, validated through simulations and real health data.
Findings
Measures effectively distinguish clusters with different means, scales, and rotations.
Simulation results demonstrate robustness of the measures across scenarios.
Application to health data illustrates practical utility in real-world clustering.
Abstract
Finite mixture models that allow for a broad range of potentially non-elliptical cluster distributions represent an emerging methodological field. Such methods allow for the shape of the clusters to match the natural heterogeneity of the data, rather than forcing a series of elliptical clusters. These methods are highly relevant for clustering continuous non-normal data - a common occurrence with objective data that are now routinely captured in health research. However, interpreting and comparing such models - especially with regard to whether they produce meaningful clusters that are reasonably well separated - is non-trivial. We summarize several measures that can succinctly quantify the multivariate distance between two clusters, regardless of the cluster distribution, and suggest practical computational tools. Through a simulation study, we evaluate these measures across three scenarios…
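The abstract does not name the specific measures, so the following is only an illustrative sketch and not the authors' method. It shows one way a distance between two cluster distributions could be estimated without assuming elliptical shapes: a Monte Carlo estimate of the Hellinger distance, which needs only samples from one cluster and the log-densities of both. The function names (gaussian_logpdf, hellinger_mc), the sample size, and the Gaussian test clusters are assumptions made for illustration. Gaussians are used here only because their closed-form answer makes the estimate easy to check.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate normal, evaluated at each row of x."""
    d = mean.shape[0]
    diff = x - mean
    chol = np.linalg.cholesky(cov)
    # Solve L z = diff^T; the column-wise sum of z**2 is the squared
    # Mahalanobis distance of each point from the mean.
    z = np.linalg.solve(chol, diff.T)
    maha = np.sum(z ** 2, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)


def hellinger_mc(sample_p, logpdf_p, logpdf_q, n=200_000, seed=0):
    """Monte Carlo Hellinger distance between densities p and q.

    Uses the identity BC = E_p[sqrt(q(X) / p(X))] for the Bhattacharyya
    coefficient, then H = sqrt(1 - BC). Only samples from p and the two
    log-densities are required, so p and q need not be elliptical.
    """
    rng = np.random.default_rng(seed)
    x = sample_p(rng, n)
    log_ratio = logpdf_q(x) - logpdf_p(x)
    bc = np.mean(np.exp(0.5 * log_ratio))
    bc = min(float(bc), 1.0)  # guard against Monte Carlo overshoot
    return float(np.sqrt(1.0 - bc))


# Hypothetical clusters: identical covariance, means two units apart.
mean_a = np.zeros(2)
mean_b = np.array([2.0, 0.0])
cov = np.eye(2)


def sample_a(rng, n):
    return rng.multivariate_normal(mean_a, cov, size=n)


def logpdf_a(x):
    return gaussian_logpdf(x, mean_a, cov)


def logpdf_b(x):
    return gaussian_logpdf(x, mean_b, cov)


h_same = hellinger_mc(sample_a, logpdf_a, logpdf_a)   # a cluster vs. itself
h_shift = hellinger_mc(sample_a, logpdf_a, logpdf_b)  # shifted cluster

# Closed form for equal-covariance Gaussians: BC = exp(-delta' S^-1 delta / 8).
h_exact = float(np.sqrt(1.0 - np.exp(-0.5)))
print(h_same, h_shift, h_exact)
```

The estimate ranges from 0 (identical clusters) to 1 (no overlap). To apply the same sketch to a non-elliptical mixture component, sample_a and the two log-density functions would be replaced with those of the fitted component distributions.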
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Data-Driven Disease Surveillance
