U-statistical inference for hierarchical clustering

Marcio Valk; Gabriela Bettella Cybis

arXiv:1805.12179 · stat.ME · June 1, 2018 · J. Comput. Graph. Stat. · 1 citation

TL;DR

This paper introduces a U-statistics-based method for assessing the statistical significance of hierarchical clustering. The method is designed for high-dimension, low-sample-size (HDLSS) data, performs well against competing methods in simulations, and applies to a broad range of datasets.

Contribution

The paper develops a U-statistics-based approach to significance testing in hierarchical clustering, tailored to HDLSS data, together with new algorithms and supporting asymptotic theory.

Findings

1. The proposed methods outperform competing alternatives in simulations.
2. The algorithms are effective in genetics and image recognition applications.
3. The approach relies on minimal assumptions about the data.

Abstract

Clustering methods are a valuable tool for identifying patterns in high-dimensional data, with applications in many scientific problems. However, quantifying uncertainty in clustering is a challenging problem, particularly when dealing with High Dimension Low Sample Size (HDLSS) data. Here we develop a U-statistics-based clustering approach that assesses statistical significance in clustering and is specifically tailored to HDLSS scenarios. These non-parametric methods rely on very few assumptions about the data, and thus can be applied to a wide range of datasets for which the Euclidean distance captures relevant features. We propose two significance clustering algorithms: a hierarchical method and a non-nested version. To do so, we first propose an extension of a relevant U-statistic and develop its asymptotic theory. Our methods are tested through extensive…
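The abstract's central idea is to compare distances between groups with distances within groups using U-statistics. The sketch below illustrates that idea only. It is not the authors' statistic or algorithm, and several parts of it are assumptions:

* the squared Euclidean kernel;
* the weighting used in the hypothetical `b_stat` function;
* the permutation null, which stands in for the paper's asymptotic theory.

The sketch tests a single, fixed two-group split. It does not account for the split having been chosen by a clustering algorithm, which is part of what makes significance testing for clustering hard.

```python
import numpy as np


def squared_dists(x, y):
    """Squared Euclidean distances between every row of x and every row of y.
    (Assumed kernel; the paper works with Euclidean-distance-based kernels.)"""
    diff = x[:, None, :] - y[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def within_mean(x):
    """U-statistic: mean kernel value over all pairs i < j within one group."""
    n = x.shape[0]
    d = squared_dists(x, x)
    return d[np.triu_indices(n, k=1)].mean()


def between_mean(x, y):
    """U-statistic: mean kernel value over all cross-group pairs."""
    return squared_dists(x, y).mean()


def b_stat(x, y):
    """Hypothetical between-minus-within contrast; the n1*n2/(n*(n-1))
    weighting is an assumption, not taken from the paper."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    contrast = 2 * between_mean(x, y) - within_mean(x) - within_mean(y)
    return n1 * n2 / (n * (n - 1)) * contrast


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation p-value for a FIXED split (swapped in for asymptotics).
    Large b_stat values indicate separated groups."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n1 = len(x)
    observed = b_stat(x, y)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if b_stat(pooled[idx[:n1]], pooled[idx[n1:]]) >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)
```

For example, consider an HDLSS-style setting with 10 points per group in 1,000 dimensions, where the second group's mean is shifted by 0.5 in every coordinate. Here `permutation_pvalue(x, y)` flags the split as significant. With no shift, the p-value for a split fixed in advance is roughly uniform.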

Taxonomy

Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Complex Network Analysis Techniques