Non-Parametric Cluster Significance Testing with Reference to a Unimodal Null Distribution

Erika S. Helgeson; Eric Bair

arXiv:1610.01424·stat.ME·October 7, 2016

TL;DR

This paper proposes a non-parametric test for the significance of clusters in high-dimensional data. It compares the within-cluster sum of squares of the observed data with that obtained by clustering data from a unimodal null distribution constructed by kernel density estimation.

Contribution

It presents a novel approach for assessing cluster significance in which the null distribution is built by kernel density estimation, so the data are not required to follow any particular distribution.

Findings

01

Accurately tests for the presence of clustering, even when the number of features is high.

02

Does not assume specific data distributions, increasing flexibility.

03

Aimed at high-dimensional settings such as microarray gene expression data.

Abstract

Cluster analysis is an unsupervised learning strategy that can be employed to identify subgroups of observations in data sets of unknown structure. This strategy is particularly useful for analyzing high-dimensional data such as microarray gene expression data. Many clustering methods are available, but it is challenging to determine if the identified clusters represent distinct subgroups. We propose a novel strategy to investigate the significance of identified clusters by comparing the within-cluster sum of squares from the original data to that produced by clustering an appropriate unimodal null distribution. The null distribution we present for this problem uses kernel density estimation and thus does not require that the data follow any particular distribution. We find that our method can accurately test for the presence of clustering even when the number of features is high.
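The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the statistic is the 2-means within-cluster sum of squares divided by the total sum of squares, the null is sampled by a variance-preserving smoothed bootstrap from a Gaussian kernel density estimate, and the bandwidth multiplier `h` is a hypothetical parameter. The abstract does not say how the bandwidth is chosen or how unimodality is enforced. Here the smoothing is applied per feature with a fixed `h`, which does not by itself guarantee a unimodal joint density. All function names are made up for this example.

```python
import numpy as np


def two_means_wcss(X, rng, n_init=5, n_iter=50):
    """Smallest within-cluster sum of squares found by Lloyd's algorithm with k=2."""
    n = X.shape[0]
    best = np.inf
    for _ in range(n_init):
        # Start from two distinct observations chosen at random.
        centers = X[rng.choice(n, size=2, replace=False)]
        for _ in range(n_iter):
            # Squared distance from every point to each center, shape (n, 2).
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            if labels.min() == labels.max():
                break  # one cluster is empty; stop refining this start
            new_centers = np.array([X[labels == j].mean(axis=0) for j in range(2)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d2.min(axis=1).sum())
    return best


def cluster_index(X, rng):
    """Within-cluster sum of squares as a fraction of the total sum of squares."""
    tss = ((X - X.mean(axis=0)) ** 2).sum()
    return two_means_wcss(X, rng) / tss


def sample_null(X, h, rng):
    """Smoothed bootstrap from a Gaussian KDE (assumed bandwidth h * sd per feature).

    Dividing by sqrt(1 + h^2) keeps each feature's variance roughly equal
    to that of the observed data.
    """
    n, p = X.shape
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    idx = rng.integers(0, n, size=n)            # resample observations
    noise = rng.standard_normal((n, p)) * h * sd  # add kernel noise
    return mu + (X[idx] - mu + noise) / np.sqrt(1.0 + h ** 2)


def cluster_pvalue(X, h=1.0, n_sim=100, seed=0):
    """Monte Carlo p-value: share of null data sets that cluster as tightly as X."""
    rng = np.random.default_rng(seed)
    observed = cluster_index(X, rng)
    null = np.array(
        [cluster_index(sample_null(X, h, rng), rng) for _ in range(n_sim)]
    )
    # A small index means tight clusters, so small null values are "extreme".
    return (1 + np.sum(null <= observed)) / (1 + n_sim)
```

Under these assumptions, calling `cluster_pvalue` on two well-separated Gaussian blobs should return a small p-value, because smoothing inflates the within-cluster spread of the null samples relative to the observed data. For data drawn from a single Gaussian the p-value should carry no consistent evidence of clustering.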
