Clustering inference in multiple groups
Debora Zava Bello, Marcio Valk, Gabriela Bettella Cybis

TL;DR
This paper introduces a non-parametric, U-statistics based clustering method for high-dimensional data that partitions the data into three groups and assesses the significance of the partition. In simulations it shows more statistical power than competing methods.
Contribution
The paper develops a novel U-statistics based approach for clustering into three groups with significance testing, extending existing methods and enhancing power in high-dimensional data analysis.
Findings
More statistical power than competing methods in simulations.
Effective in high-dimensional, real-world datasets.
Versatile across different scientific applications.
Abstract
Inference in clustering is paramount to uncovering inherent group structure in data. Clustering methods that assess statistical significance have recently drawn attention owing to their importance for identifying patterns in high-dimensional data, with applications in many scientific fields. We present here a U-statistics based approach, specially tailored for high-dimensional data, that clusters the data into three groups while assessing the significance of such partitions. Because our approach builds on the U-statistics based clustering framework of the methods in the R package uclust, it inherits their characteristics: it is a non-parametric method relying on very few assumptions about the data, and thus can be applied to a wide range of datasets. Furthermore, our method aims to be a more powerful tool to find the best partitions of the data into three groups when that particular…
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Gene expression and cancer classification
