Rank Selection for Non-negative Matrix Factorization

Yun Cai; Hong Gu; Toby Kenney

arXiv:2211.00857 · stat.AP · November 3, 2022

TL;DR

This paper introduces a hypothesis-testing-based method for selecting the rank in Non-Negative Matrix Factorization, aiming for more accurate and efficient rank selection so that the extracted features reflect meaningful structure in the data.

Contribution

The paper proposes a new rank selection technique that uses a deconvolved bootstrap distribution, improving accuracy over existing methods, especially when features are difficult to distinguish.

Findings

01 Accurately estimates true ranks in simulations

02 Efficient computation compared to existing methods

03 Extracts interpretable sub-communities in microbiome data

Abstract

Non-Negative Matrix Factorization (NMF) is a widely used dimension reduction method that factorizes a non-negative data matrix into two lower-dimensional non-negative matrices: one is the basis or feature matrix, which consists of the variables, and the other is the coefficient matrix, which holds the projections of the data points onto the new basis. The features can be interpreted as sub-structures of the data. The number of sub-structures in the feature matrix is also called the rank, which is the only tuning parameter in NMF. An appropriate rank will extract the key latent features while minimizing the noise from the original data. In this paper, we develop a novel rank selection method based on hypothesis testing, using a deconvolved bootstrap distribution to assess the significance level accurately despite the large amount of optimization error. In the simulation section, we compare our…
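
To make the factorization described in the abstract concrete, here is a minimal sketch of NMF at a fixed rank, using the standard multiplicative-update rules for squared (Frobenius) error. This is not the paper's rank selection method: it only shows what "factorizing at rank k" means and how the reconstruction error changes with k. The function names (`nmf_multiplicative`, `reconstruction_error`), the matrix orientation, the synthetic data, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def nmf_multiplicative(X, rank, n_iter=500, seed=0, eps=1e-10):
    """Approximate non-negative X (n x p) as W @ H with W (n x rank), H (rank x p).

    Uses multiplicative updates for the Frobenius loss; this is one common
    NMF solver, chosen here for brevity (assumption, not the paper's solver).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Strictly positive random start so the multiplicative updates can move.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, p)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def reconstruction_error(X, W, H):
    """Frobenius norm of the residual X - W @ H."""
    return float(np.linalg.norm(X - W @ H, "fro"))


# Synthetic non-negative data with three underlying features plus small noise
# (illustrative only; not data from the paper).
rng = np.random.default_rng(1)
true_W = rng.random((60, 3))
true_H = rng.random((3, 20))
X = true_W @ true_H + 0.01 * rng.random((60, 20))

# Fit each candidate rank and record the reconstruction error.
errors = {}
for k in range(1, 6):
    W_k, H_k = nmf_multiplicative(X, k)
    errors[k] = reconstruction_error(X, W_k, H_k)

for k, err in errors.items():
    print(f"rank {k}: reconstruction error {err:.4f}")
```

The reconstruction error generally keeps shrinking as the rank grows, and the solver only reaches a local optimum, so the error by itself does not identify the rank. That is the gap a formal selection procedure, such as the hypothesis test proposed in the paper, is meant to fill.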

Taxonomy

Topics: Gene expression and cancer classification · Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks