A Scalable Approach to Estimating the Rank of High-Dimensional Data
Wenlan Zang, Jen-hwa Chu, Michael J. Kane

TL;DR
This paper introduces a scalable method for estimating the rank of high-dimensional data matrices by distinguishing significant eigenvalues from noise using the Marchenko-Pastur distribution.
Contribution
It proposes a latent-space-construction procedure that estimates the signal rank by comparing the eigenvalues of the data matrix to the Marchenko-Pastur (MP) distribution, replacing heuristic choices such as locating the "elbow" in a plot of the eigenvalues.
Findings
The method effectively identifies the true rank in high-dimensional settings.
It reduces noise inclusion by statistically testing eigenvalue significance.
The approach is scalable to large datasets.
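The idea of separating signal eigenvalues from noise with the Marchenko-Pastur distribution can be illustrated with a minimal sketch. This is not the authors' procedure: it is a simple thresholding rule that counts sample-covariance eigenvalues above the MP upper edge, assuming i.i.d. noise with known variance `sigma2` and zero-mean columns. The function names, the `tol` safety margin, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def mp_upper_edge(n_samples, n_features, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur bulk for noise variance sigma2."""
    gamma = n_features / n_samples  # aspect ratio p / n
    return sigma2 * (1.0 + np.sqrt(gamma)) ** 2


def estimate_rank_mp(X, sigma2=1.0, tol=0.05):
    """Count eigenvalues of X^T X / n that exceed the MP upper edge.

    Assumes the columns of X are already centered and that the noise
    variance sigma2 is known. `tol` is an illustrative relative margin
    that guards against finite-sample fluctuations at the edge.
    """
    n, p = X.shape
    eigvals = np.linalg.eigvalsh(X.T @ X / n)  # symmetric matrix, real eigenvalues
    threshold = (1.0 + tol) * mp_upper_edge(n, p, sigma2)
    return int(np.sum(eigvals > threshold))


# Simulated example: a rank-5 signal plus unit-variance Gaussian noise.
rng = np.random.default_rng(0)
n, p, true_rank = 2000, 200, 5
signal = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, p))
noise = rng.normal(size=(n, p))
X = signal + noise

estimated_rank = estimate_rank_mp(X)
noise_only_rank = estimate_rank_mp(noise)
print(estimated_rank, noise_only_rank)
```

In practice the noise variance is rarely known and would have to be estimated, and a hard threshold is cruder than a statistical test of each eigenvalue; the sketch only shows why eigenvalues above the MP bulk are candidates for signal.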
Abstract
A key challenge in performing effective analyses of high-dimensional data is finding a signal-rich, low-dimensional representation. For linear subspaces, this is generally done by decomposing a design matrix (via eigenvalue or singular value decomposition) into orthogonal components and then retaining those components with sufficient variation. This is equivalent to estimating the rank of the matrix. Deciding which components to retain is generally carried out using heuristic or ad hoc approaches, such as plotting the decreasing sequence of the eigenvalues and looking for the "elbow" in the plot. While these approaches have been shown to be effective, a poorly calibrated or misjudged elbow location can result in an overabundance of noise or an under-abundance of signal in the low-dimensional representation, making subsequent modeling difficult. In this article, we propose a…
Taxonomy
Topics: Soil Geostatistics and Mapping · Spectroscopy and Chemometric Analyses · Advanced Statistical Methods and Models
