Clustering small datasets in high-dimension by random projection

Alden Bradford, Tarun Yellamraju, and Mireille Boutin

arXiv:2008.09579·stat.ML·August 24, 2020·1 cites

Open Access

TL;DR

This paper introduces a simple, low-computation method for detecting statistically significant clusters in small, high-dimensional datasets by using random projections and feature space extension, enabling effective clustering validation.

Contribution

The paper presents a novel approach combining random projection and feature space extension to identify and validate clusters in small high-dimensional datasets.

Findings

01. Effective clustering with as few as 100-200 points
02. Clusters persist as dataset size increases
03. Method bypasses high-dimensional statistical validation challenges

Abstract

Datasets in high-dimension do not typically form clusters in their original space; the issue is worse when the number of points in the dataset is small. We propose a low-computation method to find statistically significant clustering structures in a small dataset. The method proceeds by projecting the data on a random line and seeking binary clusterings in the resulting one-dimensional data. Non-linear separations are obtained by extending the feature space using monomials of higher degrees in the original features. The statistical validity of the clustering structures obtained is tested in the projected one-dimensional space, thus bypassing the challenge of statistical validation in high-dimension. Projecting on a random line is an extreme dimension reduction technique that has previously been used successfully as part of a hierarchical clustering method for high-dimensional data. Our…
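
The steps named in the abstract (extend the features with monomials, project onto a random line, look for a binary clustering in one dimension) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the choice to try several random lines and keep the best one, and the use of a within-cluster sum-of-squares ratio as the split score are all assumptions made here. The statistical significance test described in the abstract is not specified in this excerpt, so it is not implemented.

```python
import itertools
import random


def extend_features(point, degree):
    """Return all monomials of total degree 1..degree in the coordinates."""
    feats = []
    for d in range(1, degree + 1):
        for combo in itertools.combinations_with_replacement(range(len(point)), d):
            value = 1.0
            for i in combo:
                value *= point[i]
            feats.append(value)
    return feats


def random_direction(dim, rng):
    """Draw a random unit vector by normalizing independent Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def best_binary_split(values):
    """Exact 1D two-group split minimizing within-group sum of squares.

    Returns (threshold, score); score is the within-group sum of squares
    divided by the total sum of squares, so lower means cleaner separation.
    This score is an assumed stand-in, not the paper's validity test.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    total_ss = total_sq - total * total / n
    best_threshold, best_within = None, float("inf")
    left_sum = left_sq = 0.0
    for k in range(1, n):  # k = number of points in the left group
        left_sum += xs[k - 1]
        left_sq += xs[k - 1] ** 2
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        within = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
        if within < best_within:
            best_threshold, best_within = (xs[k - 1] + xs[k]) / 2, within
    score = best_within / total_ss if total_ss > 0 else 1.0
    return best_threshold, score


def cluster_by_random_projection(points, degree=1, trials=50, seed=0):
    """Try several random lines (an assumption) and keep the cleanest split."""
    rng = random.Random(seed)
    extended = [extend_features(p, degree) for p in points]
    dim = len(extended[0])
    best = None
    for _ in range(trials):
        w = random_direction(dim, rng)
        proj = [sum(wi * xi for wi, xi in zip(w, x)) for x in extended]
        threshold, score = best_binary_split(proj)
        if best is None or score < best[0]:
            best = (score, [int(v > threshold) for v in proj])
    return best  # (score, labels)
```

With `degree=1` the split is a hyperplane in the original space. With `degree=2` the extended space contains the squared coordinates, so a linear split there can correspond to a curved boundary (for example a circle) in the original features, which is the sense in which the monomial extension yields non-linear separations.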

Taxonomy

Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Bayesian Methods and Mixture Models