Sparse integrative clustering of multiple omics data sets
Ronglai Shen, Sijian Wang, Qianxing Mo

TL;DR
This paper introduces a penalized latent variable regression approach for integrative clustering of multi-omics data, revealing disease subtypes by identifying key genomic features across data types.
Contribution
It develops a novel joint modeling method using sparsity-inducing penalties and efficient model selection for multi-omics data integration and clustering.
Findings
Successfully identified disease subtypes in breast and lung cancer datasets.
Outperformed sparse singular value decomposition (SVD) and penalized Gaussian mixture models (GMM) in experiments.
Revealed important genomic features contributing to disease heterogeneity.
Abstract
High-resolution microarrays and second-generation sequencing platforms are powerful tools to investigate genome-wide alterations in DNA copy number, methylation and gene expression associated with a disease. An integrated genomic profiling approach measures multiple omics data types simultaneously in the same set of biological samples. Such an approach renders an integrated data resolution that would not be available with any single data type. In this study, we use penalized latent variable regression methods for joint modeling of multiple omics data types to identify common latent variables that can be used to cluster patient samples into biologically and clinically relevant disease subtypes. We consider lasso [J. Roy. Statist. Soc. Ser. B 58 (1996) 267-288], elastic net [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 301-320] and fused lasso [J. R. Stat. Soc. Ser. B Stat. Methodol. 67…
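To make the three penalties named in the abstract concrete, here is a minimal sketch of their standard textbook forms, evaluated on a single vector of feature loadings. The function names, the toy vector `w`, and the tuning values are illustrative assumptions; this is not the authors' code and does not reproduce their estimation procedure.

```python
# Standard forms of the lasso, elastic net and fused lasso penalties,
# evaluated on one loading vector w (one latent variable, one data type).
# All names and numbers here are hypothetical, chosen for illustration.

def lasso_penalty(w, lam):
    """lam * sum_j |w_j|: shrinks loadings and sets some exactly to zero."""
    return lam * sum(abs(x) for x in w)

def elastic_net_penalty(w, lam1, lam2):
    """lam1 * sum_j |w_j| + lam2 * sum_j w_j^2: lasso plus a ridge term."""
    return lam1 * sum(abs(x) for x in w) + lam2 * sum(x * x for x in w)

def fused_lasso_penalty(w, lam1, lam2):
    """lam1 * sum_j |w_j| + lam2 * sum_j |w_j - w_(j-1)|.

    The second term penalizes jumps between neighbouring loadings, which
    suits features with a natural ordering (for example, copy number
    probes sorted by genomic position).
    """
    jumps = sum(abs(b - a) for a, b in zip(w, w[1:]))
    return lam1 * sum(abs(x) for x in w) + lam2 * jumps

def soft_threshold(a, lam):
    """Solution of min_x 0.5 * (x - a)^2 + lam * |x|.

    Values with |a| <= lam map to exactly 0, which is how an L1 penalty
    produces sparse loadings.
    """
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0

# Toy loading vector: a flat block of non-zero loadings and one isolated value.
w = [0.0, 2.0, 2.0, 0.0, -1.0]

lasso_value = lasso_penalty(w, lam=0.5)
enet_value = elastic_net_penalty(w, lam1=0.5, lam2=0.5)
fused_value = fused_lasso_penalty(w, lam1=0.5, lam2=1.0)
shrunk = [soft_threshold(x, 1.0) for x in w]
```

In this toy vector the flat block `2.0, 2.0` adds nothing to the fused term, while each jump into or out of it does, so the fused lasso favours piecewise-constant loadings; soft-thresholding at 1.0 zeroes the isolated `-1.0` and shrinks the block to 1.0.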
