TL;DR
This paper introduces multiple augmented reduced rank regression (maRRR), a flexible matrix regression and factorization method for integrating multiple high-dimensional datasets, such as pan-cancer gene expression data, to improve statistical power and to separate variation shared across cohorts from variation specific to individual cohorts.
Contribution
It proposes a new structured nuclear norm-based framework that unifies and extends existing methods for multi-cohort data analysis, including a novel approach for single dataset regression and factorization.
Findings
Simulations demonstrate significant power gains from analyzing multiple datasets jointly.
Effective prediction and imputation of gene expression data across cancer types.
New biological insights into mutation-driven and shared variation across cancers.
Abstract
Statistical approaches that successfully combine multiple datasets are more powerful, efficient, and scientifically informative than separate analyses. To address variation architectures correctly and comprehensively for high-dimensional data across multiple sample sets (i.e., cohorts), we propose multiple augmented reduced rank regression (maRRR), a flexible matrix regression and factorization method to concurrently learn both covariate-driven and auxiliary structured variation. We consider a structured nuclear norm objective that is motivated by random matrix theory, in which the regression or factorization terms may be shared or specific to any number of cohorts. Our framework subsumes several existing methods, such as reduced rank regression and unsupervised multi-matrix factorization approaches, and includes a promising novel approach to regression and factorization of a single…
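The nuclear norm penalty mentioned in the abstract can be illustrated with its simplest building block, singular value soft-thresholding, which solves min over A of 0.5 * ||Y - A||_F^2 + lam * ||A||_* for a single matrix. The sketch below is only an illustration under stated assumptions, not the maRRR algorithm itself: the function names `svt` and `make_demo`, the simulated data, and the penalty value `lam=2.0` are all hypothetical and do not come from the paper.

```python
import numpy as np


def svt(Y, lam):
    """Singular value soft-thresholding: the proximal operator of lam * nuclear norm.

    Returns the minimizer of 0.5 * ||Y - A||_F^2 + lam * ||A||_* over A.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)  # shrink each singular value, floor at zero
    return (U * s_shrunk) @ Vt           # rebuild the matrix from shrunken values


def make_demo(seed=0):
    """Simulate a rank-2 signal plus small Gaussian noise (hypothetical data)."""
    rng = np.random.default_rng(seed)
    low_rank = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 30))
    noise = 0.1 * rng.normal(size=(50, 30))
    return low_rank, low_rank + noise


low_rank, Y = make_demo()
estimate = svt(Y, lam=2.0)  # lam is an arbitrary illustrative choice
rank_estimate = np.linalg.matrix_rank(estimate)
```

Thresholding removes the singular values attributable to noise while keeping the dominant low-rank structure. A method with several shared and cohort-specific terms, as the abstract describes, would presumably apply penalties of this kind to each term, but how maRRR does so is not specified in this excerpt.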
