Using the left Gram matrix to cluster high dimensional data
Shahina Rahman, Valen E. Johnson, Suhasini Subba Rao

TL;DR
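This paper introduces a clustering algorithm for high-dimensional data (P >> N) based on the normalized left Gram matrix, G = XX'/P. It requires no dimension-reduction or feature-selection preprocessing and no hyperparameter tuning, and it is reported to give the most accurate clustering more often than any of 14 competing algorithms on 32 benchmark microarray datasets.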
Contribution
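The paper presents a clustering method that works on the NxN normalized left Gram matrix G rather than on the NxP feature matrix X. This gives it a lower computational cost than many methods that cluster X directly, and the paper reports higher accuracy than existing algorithms on benchmark microarray data.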
Findings
Compared against 14 other clustering algorithms on 32 benchmark microarray datasets
Does not require dimension reduction, feature selection, or hyperparameter tuning
Provides the most accurate cluster estimate more than twice as often as the closest of the competing algorithms
Abstract
For high dimensional data, where P features for N objects (P >> N) are represented in an NxP matrix X, we describe a clustering algorithm based on the normalized left Gram matrix, G = XX'/P. Under certain regularity conditions, the rows in G that correspond to objects in the same cluster converge to the same mean vector. By clustering on the row means, the algorithm does not require preprocessing by dimension reduction or feature selection techniques and does not require specification of tuning or hyperparameter values. Because it is based on the NxN matrix G, it has a lower computational cost than many methods based on clustering the feature matrix X. When compared to 14 other clustering algorithms applied to 32 benchmarked microarray datasets, the proposed algorithm provided the most accurate estimate of the underlying cluster configuration more than twice as often as its closest…
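The construction of G described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the simulated Gaussian data, the helper names `left_gram` and `cluster_rows`, and the use of plain k-means with a user-supplied number of clusters `k` are all assumptions made for illustration. The paper's procedure needs no such tuning value. The sketch only shows that rows of G belonging to the same cluster end up close together when P is much larger than N.

```python
import numpy as np

def left_gram(X):
    """Normalized left Gram matrix G = X X' / P for an N x P matrix X."""
    n, p = X.shape
    return X @ X.T / p

def cluster_rows(G, k, n_iter=50):
    """Plain k-means on the rows of G.

    A stand-in for illustration only, NOT the procedure from the paper.
    Uses deterministic farthest-point initialization starting at row 0.
    """
    centers = [G[0]]
    for _ in range(1, k):
        # Squared distance from every row to its nearest chosen center
        d = np.min([np.sum((G - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(G[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((G[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            G[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Simulated data (assumption): N = 20 objects, P = 5000 features, two clusters,
# each with its own feature-wise mean vector plus independent Gaussian noise.
rng = np.random.default_rng(0)
N, P = 20, 5000
truth = np.repeat([0, 1], N // 2)
means = rng.normal(size=(2, P))
X = means[truth] + rng.normal(size=(N, P))

G = left_gram(X)             # N x N, much smaller than X when P >> N
labels = cluster_rows(G, k=2)
print(G.shape)
print(labels)
```

Once G is formed, every later step works with an NxN array regardless of P, which is the source of the computational saving mentioned in the abstract.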
Taxonomy
Topics: Gene expression and cancer classification · Face and Expression Recognition · Neural Networks and Applications
Methods: Feature Selection
