Stacked SVD or SVD stacked? A Random Matrix Theory perspective on data integration
Tavor Z. Baharav, Phillip B. Nicol, Rafael A. Irizarry, Rong Ma

TL;DR
This paper compares two data integration methods, Stack-SVD and SVD-Stack, using Random Matrix Theory to derive their asymptotic performance. The analysis shows when each method should be preferred and how the datasets should be weighted.
Contribution
The paper provides the first rigorous asymptotic analysis of Stack-SVD and SVD-Stack, deriving phase transitions and optimal weighting schemes for improved data integration.
Findings
Optimally weighted Stack-SVD outperforms SVD-Stack in asymptotic regimes.
Derived exact formulas for performance and phase transitions of both methods.
Provided practical algorithms for estimating optimal weights from data.
Abstract
Modern data analysis increasingly requires identifying shared latent structure across multiple high-dimensional datasets. A commonly used model assumes that the data matrices are noisy observations of low-rank matrices with a shared singular subspace. In this case, two primary methods have emerged for estimating this shared structure, which vary in how they integrate information across datasets. The first approach, termed Stack-SVD, concatenates all the datasets, and then performs a singular value decomposition (SVD). The second approach, termed SVD-Stack, first performs an SVD separately for each dataset, then aggregates the top singular vectors across these datasets, and finally computes a consensus amongst them. While these methods are widely used, they have not been rigorously studied in the proportional asymptotic regime, which is of great practical relevance in today's world of…
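The two pipelines described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' code. It assumes each dataset X_k is an n_k x p matrix whose shared structure lives in the right singular subspace (a common p-dimensional feature space), so Stack-SVD concatenates the datasets row-wise. The `weights` argument only shows where per-dataset weights would enter. It does not implement the paper's optimal weighting, which the summary does not specify. The helper names `stack_svd`, `svd_stack` and `alignment` are hypothetical, and the toy noise scale is chosen only for illustration.

```python
import numpy as np


def stack_svd(mats, r, weights=None):
    """Stack-SVD sketch: optionally weight each dataset, concatenate row-wise,
    and return the top-r right singular vectors of the stacked matrix."""
    if weights is None:
        weights = np.ones(len(mats))  # placeholder weights, not the paper's optimal ones
    stacked = np.vstack([w * X for w, X in zip(weights, mats)])
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:r].T  # p x r, orthonormal columns


def svd_stack(mats, r, weights=None):
    """SVD-Stack sketch: take the top-r right singular vectors of each dataset,
    place them side by side, and return the top-r left singular vectors of that
    p x (K*r) matrix as the consensus subspace."""
    if weights is None:
        weights = np.ones(len(mats))
    blocks = []
    for w, X in zip(weights, mats):
        _, _, vt = np.linalg.svd(X, full_matrices=False)
        blocks.append(w * vt[:r].T)  # p x r estimate from a single dataset
    collected = np.hstack(blocks)
    u, _, _ = np.linalg.svd(collected, full_matrices=False)
    return u[:, :r]


def alignment(V_hat, V):
    """||V^T V_hat||_F^2 / r: equals 1 when the subspaces coincide, 0 when orthogonal."""
    return np.linalg.norm(V.T @ V_hat, "fro") ** 2 / V.shape[1]


# Toy data (assumed model): X_k = theta_k * U_k V^T + noise, all sharing the p x r matrix V.
rng = np.random.default_rng(0)
p, r = 200, 2
V, _ = np.linalg.qr(rng.standard_normal((p, r)))
mats = []
for n_k, theta_k in [(100, 3.0), (150, 4.0), (300, 3.0)]:
    U_k, _ = np.linalg.qr(rng.standard_normal((n_k, r)))
    noise = rng.standard_normal((n_k, p)) / np.sqrt(p)
    mats.append(theta_k * U_k @ V.T + noise)

V_stack = stack_svd(mats, r)
V_svdstack = svd_stack(mats, r)
print(alignment(V_stack, V), alignment(V_svdstack, V))
```

Both functions return an orthonormal p x r basis, so the same `alignment` score can compare them on any simulated setting. Which method scores higher depends on the signal strengths, dataset sizes and weights. Characterizing that dependence is what the paper's asymptotic analysis addresses.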
