A whitening approach to probabilistic canonical correlation analysis for omics data integration
Takoua Jendoubi, Korbinian Strimmer

TL;DR
This paper introduces a novel probabilistic CCA model based on statistical whitening, offering interpretability and efficiency for high-dimensional omics data integration, with practical implementation in an R package.
Contribution
It presents a new whitening-based probabilistic CCA model that improves interpretability and computational efficiency for large-scale, high-dimensional data analysis.
Findings
Effective in simulations and real omics data applications
Allows for negative canonical correlations and non-normal generative variables
Provides a computationally efficient estimation method
Abstract
Background: Canonical correlation analysis (CCA) is a classic statistical tool for investigating complex multivariate data. Correspondingly, it has found many diverse applications, ranging from molecular biology and medicine to social science and finance. Intriguingly, despite the importance and pervasiveness of CCA, a probabilistic understanding of CCA has only recently begun to develop, moving from an algorithmic to a model-based perspective and enabling its application to large-scale settings. Results: Here, we revisit CCA from the perspective of statistical whitening of random variables and propose a simple yet flexible probabilistic model for CCA in the form of a two-layer latent variable generative model. The advantages of this variant of probabilistic CCA include non-ambiguity of the latent variables, provisions for negative canonical correlations, possibility of non-normal generative…
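To make the whitening perspective concrete, the following is a minimal sketch of classical (non-probabilistic) CCA computed by whitening each data block and taking a singular value decomposition of the whitened cross-covariance. It is not the paper's probabilistic model, its estimator, or the API of its R package; the function names `inv_sqrt` and `whitening_cca` and the toy data are assumptions made for illustration, and plain empirical covariances are used, which presumes more samples than variables.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def whitening_cca(X, Y):
    """Classical CCA via whitening (hypothetical helper, illustration only)."""
    # Center both blocks.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Empirical covariance and cross-covariance matrices.
    Sxx = X.T @ X / (n - 1)
    Syy = Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)
    # Whitening matrices: each whitened block has identity covariance.
    Wx = inv_sqrt(Sxx)
    Wy = inv_sqrt(Syy)
    # Cross-covariance of the whitened blocks and its SVD.
    K = Wx @ Sxy @ Wy
    U, lam, Vt = np.linalg.svd(K, full_matrices=False)
    # Rotating the whitened variables gives the canonical variates.
    cx = X @ Wx @ U
    cy = Y @ Wy @ Vt.T
    return lam, cx, cy

# Toy data: one shared signal z enters X and Y with opposite signs.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 1))
X = np.hstack([z + 0.5 * rng.normal(size=(n, 1)), rng.normal(size=(n, 2))])
Y = np.hstack([-z + 0.5 * rng.normal(size=(n, 1)), rng.normal(size=(n, 1))])

lam, cx, cy = whitening_cca(X, Y)
print("canonical correlations:", lam)
```

In this sketch the singular values are non-negative by construction, so the sign of an association is absorbed into the rotation matrices rather than reported as a negative canonical correlation.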
