Stacked SVD or SVD stacked? A Random Matrix Theory perspective on data integration

Tavor Z. Baharav; Phillip B. Nicol; Rafael A. Irizarry; Rong Ma

arXiv:2507.22170 · stat.ML · July 31, 2025

TL;DR

This paper compares two data integration methods, Stack-SVD and SVD-Stack, using Random Matrix Theory to derive their asymptotic performance, revealing when and how to optimally weight and choose between them.

Contribution

The paper provides the first rigorous asymptotic analysis of Stack-SVD and SVD-Stack, deriving phase transitions and optimal weighting schemes for improved data integration.

Findings

1. Optimally weighted Stack-SVD outperforms SVD-Stack in the proportional asymptotic regime.

2. The paper derives exact asymptotic formulas for the performance and phase transitions of both methods.

3. The paper provides practical algorithms for estimating the optimal weights from data.

Abstract

Modern data analysis increasingly requires identifying shared latent structure across multiple high-dimensional datasets. A commonly used model assumes that the data matrices are noisy observations of low-rank matrices with a shared singular subspace. In this case, two primary methods have emerged for estimating this shared structure, which vary in how they integrate information across datasets. The first approach, termed Stack-SVD, concatenates all the datasets, and then performs a singular value decomposition (SVD). The second approach, termed SVD-Stack, first performs an SVD separately for each dataset, then aggregates the top singular vectors across these datasets, and finally computes a consensus amongst them. While these methods are widely used, they have not been rigorously studied in the proportional asymptotic regime, which is of great practical relevance in today's world of…
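To make the two procedures in the abstract concrete, here is a minimal NumPy sketch. It rests on assumptions that are not taken from the paper: each dataset is an n_i x d matrix, the shared structure is the right singular subspace, and the noise is Gaussian. The function names, the `weights` parameter, and the `subspace_alignment` score are hypothetical illustrations. In particular, the weights default to 1 and are not the optimal weights the paper derives.

```python
import numpy as np

def stack_svd(mats, r, weights=None):
    """Stack-SVD sketch: concatenate the (optionally weighted) datasets by rows,
    then take the top-r right singular vectors of the stacked matrix."""
    if weights is None:
        weights = np.ones(len(mats))  # assumption: unweighted by default
    stacked = np.vstack([w * X for w, X in zip(weights, mats)])
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:r].T  # shape (d, r)

def svd_stack(mats, r, weights=None):
    """SVD-Stack sketch: take the top-r right singular vectors of each dataset,
    place them side by side, and use the top-r left singular vectors of that
    d x (M*r) matrix as the consensus."""
    if weights is None:
        weights = np.ones(len(mats))  # assumption: unweighted by default
    blocks = []
    for w, X in zip(weights, mats):
        _, _, vt = np.linalg.svd(X, full_matrices=False)
        blocks.append(w * vt[:r].T)  # shape (d, r)
    u, _, _ = np.linalg.svd(np.hstack(blocks), full_matrices=False)
    return u[:, :r]  # shape (d, r)

def subspace_alignment(V_hat, V):
    """Hypothetical score: ||V_hat^T V||_F^2 / r, equal to 1 for perfect recovery."""
    return np.linalg.norm(V_hat.T @ V) ** 2 / V.shape[1]

# Toy data: three rank-one datasets sharing one right singular vector v.
rng = np.random.default_rng(0)
d, r = 200, 1
v = rng.standard_normal((d, r))
v /= np.linalg.norm(v)

mats = []
for n_i, theta in [(300, 5.0), (300, 4.0), (300, 3.0)]:  # illustrative signal strengths
    u = rng.standard_normal((n_i, r))
    u /= np.linalg.norm(u)
    noise = rng.standard_normal((n_i, d)) / np.sqrt(n_i)
    mats.append(theta * u @ v.T + noise)

V_stack = stack_svd(mats, r)
V_cons = svd_stack(mats, r)
align_stack = subspace_alignment(V_stack, v)
align_cons = subspace_alignment(V_cons, v)
print(f"Stack-SVD alignment: {align_stack:.3f}")
print(f"SVD-Stack alignment: {align_cons:.3f}")
```

Both functions return a d x r matrix with orthonormal columns, so the two estimates can be compared directly with the same score. Passing a non-uniform `weights` list is one way to experiment with weighting, for example by down-weighting noisier datasets.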
