Bi-cross-validation of the SVD and the nonnegative matrix factorization
Art B. Owen, Patrick O. Perry

TL;DR
This paper introduces bi-cross-validation (BCV), a method for choosing the rank of outer product models such as the SVD and NMF. It holds out a set of rows and a set of columns together and predicts the held-out entries from low rank operations on the retained data.
Contribution
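It proposes a bi-cross-validation approach that leaves out both rows and columns, proves a self-consistency result expressing the prediction error as a residual from a low rank approximation, and uses random matrix theory and simulations to study how the hold-out size affects rank selection.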
Findings
In simulated examples, leaving out half the rows and half the columns performs well.
Random matrix theory and some empirical results suggest that smaller hold-out sets lead to more over-fitting, while larger ones are more prone to under-fitting.
Abstract
This article presents a form of bi-cross-validation (BCV) for choosing the rank in outer product models, especially the singular value decomposition (SVD) and the nonnegative matrix factorization (NMF). Instead of leaving out a set of rows of the data matrix, we leave out a set of rows and a set of columns, and then predict the left out entries by low rank operations on the retained data. We prove a self-consistency result expressing the prediction error as a residual from a low rank approximation. Random matrix theory and some empirical results suggest that smaller hold-out sets lead to more over-fitting, while larger ones are more prone to under-fitting. In simulated examples we find that a method leaving out half the rows and half the columns performs well.
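The hold-out scheme described in the abstract can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code. It assumes the data matrix is partitioned into blocks A (held-out rows and columns), B (held-out rows, retained columns), C (retained rows, held-out columns) and D (fully retained), and that A is predicted as B times the pseudo-inverse of the rank-k truncated SVD of D times C. The function names bcv_error and bcv_select_rank, the 2 x 2 fold layout, and the tolerance are assumptions made for this example.

```python
import numpy as np


def bcv_error(X, row_mask, col_mask, k, tol=1e-12):
    """Squared error when predicting the held-out block from a rank-k fit.

    row_mask / col_mask are boolean arrays marking held-out rows / columns.
    """
    A = X[np.ix_(row_mask, col_mask)]      # held-out rows, held-out columns
    B = X[np.ix_(row_mask, ~col_mask)]     # held-out rows, retained columns
    C = X[np.ix_(~row_mask, col_mask)]     # retained rows, held-out columns
    D = X[np.ix_(~row_mask, ~col_mask)]    # retained rows, retained columns

    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    # Keep at most k singular values, dropping numerically zero ones
    # (assumed tolerance) so the pseudo-inverse stays finite.
    r = int(np.sum(s[:k] > tol * s[0]))
    # Pseudo-inverse of the rank-r truncation of D: V_r diag(1/s_r) U_r^T.
    # For r = 0 this is a zero matrix, so the prediction is all zeros.
    D_k_pinv = (Vt[:r].T / s[:r]) @ U[:, :r].T
    A_hat = B @ D_k_pinv @ C
    return float(np.sum((A - A_hat) ** 2))


def bcv_select_rank(X, max_rank, seed=0):
    """Pick the rank minimizing total BCV error over a 2 x 2 fold layout.

    Each fold holds out roughly half the rows and half the columns.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    row_fold = rng.permutation(m) % 2      # random split of rows into 2 groups
    col_fold = rng.permutation(n) % 2      # random split of columns into 2 groups
    errors = np.zeros(max_rank + 1)
    for i in range(2):
        for j in range(2):
            for k in range(max_rank + 1):
                errors[k] += bcv_error(X, row_fold == i, col_fold == j, k)
    return int(np.argmin(errors)), errors
```

Calling bcv_select_rank(X, max_rank=5) on a data matrix X returns the selected rank together with the summed hold-out error for each candidate rank from 0 to 5. Holding out half the rows and half the columns follows the setting the abstract reports as performing well; other hold-out fractions would need a different fold assignment.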
