Rank Selection for Non-negative Matrix Factorization
Yun Cai, Hong Gu, Toby Kenney

TL;DR
This paper introduces a novel method, based on hypothesis testing, for selecting the rank in Non-negative Matrix Factorization (NMF), improving accuracy and efficiency in extracting meaningful features from complex data.
Contribution
The paper proposes a new rank selection technique that uses a deconvolved bootstrap distribution, improving accuracy over existing methods, especially when features are difficult to distinguish.
Findings
Accurately estimates true ranks in simulations
Efficient computation compared to existing methods
Extracts interpretable sub-communities in microbiome data
Abstract
Non-Negative Matrix Factorization (NMF) is a widely used dimension reduction method that factorizes a non-negative data matrix into two lower-dimensional non-negative matrices: one is the basis or feature matrix, which consists of the variables, and the other is the coefficient matrix, which gives the projections of the data points onto the new basis. The features can be interpreted as sub-structures of the data. The number of sub-structures in the feature matrix is called the rank, which is the only tuning parameter in NMF. An appropriate rank will extract the key latent features while minimizing the noise from the original data. In this paper, we develop a novel rank selection method based on hypothesis testing, using a deconvolved bootstrap distribution to assess the significance level accurately despite the large amount of optimization error. In the simulation section, we compare our…
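To make the role of the rank concrete, the following is a minimal sketch of a plain NMF fit at several candidate ranks. It is not the paper's rank selection method (no hypothesis test or deconvolved bootstrap is implemented here); it only illustrates the factorization the abstract describes. The multiplicative-update algorithm, the orientation (rows of X as data points, rows of H as features, W as coefficients), the synthetic data, and all function and variable names are assumptions made for illustration.

```python
import numpy as np


def nmf_multiplicative(X, rank, n_iter=500, eps=1e-10, seed=0):
    """Approximate non-negative X (n x p) by W @ H.

    W is (n x rank) and holds the coefficients; H is (rank x p) and holds
    the features. This orientation is an assumption of this sketch.
    Uses multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, p)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative throughout.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def reconstruction_error(X, W, H):
    """Frobenius norm of the residual X - W @ H."""
    return float(np.linalg.norm(X - W @ H, "fro"))


# Synthetic, noise-free data with a known rank (hypothetical example data).
data_rng = np.random.default_rng(1)
true_rank = 3
X = data_rng.random((40, true_rank)) @ data_rng.random((true_rank, 25))

# Fit each candidate rank and record the reconstruction error.
fits = {k: nmf_multiplicative(X, k) for k in range(1, 6)}
errors = {k: reconstruction_error(X, W, H) for k, (W, H) in fits.items()}

for k, err in errors.items():
    print(f"rank={k}  error={err:.4f}")
```

Reconstruction error alone generally keeps shrinking or flattening as the rank grows, so it cannot by itself indicate the appropriate rank on noisy data; that gap is what a dedicated rank selection procedure such as the one proposed in the paper is meant to address.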
Taxonomy
Topics: Gene expression and cancer classification · Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks
