Clustering small datasets in high-dimension by random projection
Alden Bradford, Tarun Yellamraju, and Mireille Boutin

TL;DR
This paper introduces a simple, low-computation method for detecting statistically significant clusters in small, high-dimensional datasets by using random projections and feature space extension, enabling effective clustering validation.
Contribution
The paper presents a novel approach combining random projection and feature space extension to identify and validate clusters in small high-dimensional datasets.
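The abstract describes the feature space extension as adding monomials of higher degrees in the original features, so that a linear split in the extended space becomes a non-linear separation in the original one. The sketch below shows one way to build that extension. The function name `extend_features`, the degree cap, and the ordering of the monomials are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations_with_replacement
from math import prod


def extend_features(x, degree=2):
    """Hypothetical helper: map a point x to all monomials of degree 1..degree.

    Each monomial is a product of coordinates chosen with repetition,
    e.g. for x = (a, b) and degree 2: a, b, a*a, a*b, b*b.
    """
    features = []
    for d in range(1, degree + 1):
        # combinations_with_replacement yields each index multiset exactly once,
        # so every monomial of total degree d appears once.
        for idx in combinations_with_replacement(range(len(x)), d):
            features.append(prod(x[i] for i in idx))
    return features


print(extend_features([2, 3], degree=2))
```

With n original features and maximum degree d, the extended vector has C(n + d, d) - 1 entries, so the degree would likely be kept small for high-dimensional data.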
Findings
Effective clustering with as few as 100-200 points
Clusters persist as dataset size increases
Statistical validation is done in the projected one-dimensional space, bypassing the difficulty of validating clusters in high dimension
Abstract
Datasets in high-dimension do not typically form clusters in their original space; the issue is worse when the number of points in the dataset is small. We propose a low-computation method to find statistically significant clustering structures in a small dataset. The method proceeds by projecting the data on a random line and seeking binary clusterings in the resulting one-dimensional data. Non-linear separations are obtained by extending the feature space using monomials of higher degrees in the original features. The statistical validity of the clustering structures obtained is tested in the projected one-dimensional space, thus bypassing the challenge of statistical validation in high-dimension. Projecting on a random line is an extreme dimension reduction technique that has previously been used successfully as part of a hierarchical clustering method for high-dimensional data. Our…
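The core loop in the abstract (project on a random line, look for a binary clustering of the 1D values, test its significance in 1D) can be sketched as below. The abstract does not specify the clustering criterion or the test, so two choices here are assumptions: the split is the threshold minimizing the normalized within-cluster sum of squares W (within-cluster SS divided by total SS, in [0, 1]), and significance is a Monte Carlo comparison against W values from Gaussian samples of the same size. All function names are hypothetical.

```python
import math
import random


def random_direction(dim, rng):
    """Unit vector uniform on the sphere (normalized Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def project(points, direction):
    """Dot product of each point with the direction: 1D data."""
    return [sum(p * d for p, d in zip(pt, direction)) for pt in points]


def best_binary_split(values):
    """Assumed criterion: threshold minimizing W; returns (W, threshold)."""
    xs = sorted(values)
    n = len(xs)
    s, sq = [0.0], [0.0]  # prefix sums of x and x^2 for O(n) scanning
    for x in xs:
        s.append(s[-1] + x)
        sq.append(sq[-1] + x * x)
    total = sq[n] - s[n] ** 2 / n
    if total <= 0.0:  # constant data: no meaningful split
        return 1.0, xs[0]
    best_w, best_t = 1.0, xs[0]
    for k in range(1, n):  # left cluster = xs[:k], right = xs[k:]
        left = sq[k] - s[k] ** 2 / k
        right = (sq[n] - sq[k]) - (s[n] - s[k]) ** 2 / (n - k)
        w = (left + right) / total
        if w < best_w:
            best_w, best_t = w, (xs[k - 1] + xs[k]) / 2
    return best_w, best_t


def gaussian_null_pvalue(w_obs, n, trials, rng):
    """Assumed test: share of Gaussian samples of size n with W <= w_obs."""
    hits = sum(
        best_binary_split([rng.gauss(0.0, 1.0) for _ in range(n)])[0] <= w_obs
        for _ in range(trials)
    )
    return (hits + 1) / (trials + 1)  # add-one Monte Carlo p-value


def cluster_on_random_line(points, trials=500, seed=0):
    """Project on one random line, split in two, test the split."""
    rng = random.Random(seed)
    values = project(points, random_direction(len(points[0]), rng))
    w, threshold = best_binary_split(values)
    labels = [int(v > threshold) for v in values]
    return labels, w, gaussian_null_pvalue(w, len(values), trials, rng)
```

A small W with a small p-value suggests the projected data is split into two groups more cleanly than Gaussian data of the same size would be. Repeating over several random lines is natural, but choosing the best of many lines before testing would require a null that accounts for that selection. Points could also be mapped to a space of higher-degree monomials before projecting to obtain non-linear separations.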
Taxonomy
Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Bayesian Methods and Mixture Models
