Statistical Integration of Heterogeneous Data with PO2PLS
Said el Bouhaddani, Hae-Won Uh, Geurt Jongbloed, Jeanine Houwing-Duistermaat

TL;DR
This paper introduces PO2PLS, a probabilistic framework for integrating heterogeneous multi-omics data that addresses the challenges posed by high dimensionality and correlation. Simulations and real data examples demonstrate improved feature selection and prediction.
Contribution
The paper presents PO2PLS, a novel probabilistic method that unifies existing omics integration techniques and offers efficient estimation, hypothesis testing, and improved performance.
Findings
PO2PLS outperforms existing methods in feature selection.
It provides accurate predictions in simulated and real datasets.
The method identifies both known and novel biological relationships.
Abstract
The availability of multi-omics data has revolutionized the life sciences by creating avenues for integrated system-level approaches. Data integration links the information across datasets to better understand the underlying biological processes. However, high-dimensionality, correlations and heterogeneity pose statistical and computational challenges. We propose a general framework, probabilistic two-way partial least squares (PO2PLS), which addresses these challenges. PO2PLS models the relationship between two datasets using joint and data-specific latent variables. For maximum likelihood estimation of the parameters, we implement a fast EM algorithm and show that the estimator is asymptotically normally distributed. A global test for testing the relationship between two datasets is proposed, and its asymptotic distribution is derived. Notably, several existing omics integration…
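The joint/data-specific decomposition described in the abstract can be illustrated with a small simulation. This is a minimal sketch, assuming a PO2PLS-style generative model of the form x = t Wᵀ + t⊥ P_xᵀ + e and y = u Cᵀ + u⊥ P_yᵀ + f, with the joint latent variables linked by u = t B + h, and with each dataset's joint and specific loadings mutually orthogonal. It is not the paper's method: the loadings are estimated with a plain SVD of the sample cross-covariance (similar in spirit to non-probabilistic (O2)PLS) instead of the EM algorithm, and the global test is replaced by a permutation test instead of the asymptotic test derived in the paper. All function names, dimensions and parameter values are hypothetical.

```python
import numpy as np


def orthonormal_columns(rng, n_rows, n_cols):
    """Random matrix with orthonormal columns (QR of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.standard_normal((n_rows, n_cols)))
    return q


def simulate_po2pls(rng, n=2000, p=20, q=15, b=1.5, noise_sd=0.3):
    """Draw (X, Y) from an assumed PO2PLS-style model with one joint and one
    data-specific component per dataset (hypothetical sizes and parameters)."""
    x_load = orthonormal_columns(rng, p, 2)  # joint and X-specific loadings, orthogonal
    y_load = orthonormal_columns(rng, q, 2)  # joint and Y-specific loadings, orthogonal
    w, p_x = x_load[:, :1], x_load[:, 1:]
    c, p_y = y_load[:, :1], y_load[:, 1:]

    t = rng.standard_normal((n, 1))                # joint latent variable for X
    u = b * t + 0.5 * rng.standard_normal((n, 1))  # u = t B + h
    t_spec = rng.standard_normal((n, 1))           # X-specific latent variable
    u_spec = rng.standard_normal((n, 1))           # Y-specific latent variable

    x = t @ w.T + t_spec @ p_x.T + noise_sd * rng.standard_normal((n, p))
    y = u @ c.T + u_spec @ p_y.T + noise_sd * rng.standard_normal((n, q))
    return x, y, w, c


def joint_loadings(x, y, r=1):
    """Leading r singular vectors of the sample cross-covariance X^T Y / n.

    The specific parts are independent of the joint parts, so in expectation
    X^T Y / n = W Cov(t, u) C^T and its top singular vectors estimate W and C
    (up to sign)."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    left, s, right_t = np.linalg.svd(xc.T @ yc / x.shape[0], full_matrices=False)
    return left[:, :r], right_t[:r].T, s


def permutation_test(x, y, n_perm=200, rng=None):
    """Permutation p-value for the largest singular value of X^T Y / n.

    Shuffling the rows of Y destroys any X-Y relationship, giving a null
    reference distribution for the statistic."""
    rng = np.random.default_rng() if rng is None else rng
    observed = joint_loadings(x, y)[2][0]
    exceed = 0
    for _ in range(n_perm):
        y_perm = y[rng.permutation(y.shape[0])]
        if joint_loadings(x, y_perm)[2][0] >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y, w_true, c_true = simulate_po2pls(rng)
    w_hat, c_hat, _ = joint_loadings(x, y)
    print("alignment with true W:", abs(float(w_hat[:, 0] @ w_true[:, 0])))
    print("alignment with true C:", abs(float(c_hat[:, 0] @ c_true[:, 0])))
    stat, p_value = permutation_test(x, y, rng=rng)
    print("top singular value:", stat, "permutation p-value:", p_value)
```

The cross-covariance SVD ignores the data-specific parts only in expectation; in finite samples they still add some noise to the estimates, which is part of what a likelihood-based approach such as PO2PLS is meant to handle explicitly.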
Taxonomy
Topics: Gene expression and cancer classification · Genetic Mapping and Diversity in Plants and Animals · Genetic and phenotypic traits in livestock
