TL;DR
This paper introduces an outcome-guided sparse K-means clustering method that integrates phenotypic data with high-dimensional transcriptomic data to discover disease subtypes more relevant to clinical outcomes.
Contribution
The authors develop a novel unified objective function for outcome-guided clustering that combines sample clustering, gene selection, and phenotypic data incorporation.
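This summary does not state the objective itself. As background only, the standard sparse K-means objective (Witten and Tibshirani's formulation), which the abstract says was extended, is usually written as follows. The authors' outcome-guided term for the phenotypic variable is deliberately left out because its exact form is not given here. In this formulation the bracket is twice the between-cluster sum of squares of gene $j$, and the $\ell_1$ (lasso-type) bound $s$ drives gene selection: genes with $w_j = 0$ drop out of the clustering.

```latex
\max_{C_1,\dots,C_K,\;\mathbf{w}} \;\sum_{j=1}^{p} w_j
\left( \frac{1}{n}\sum_{i=1}^{n}\sum_{i'=1}^{n} d_{ii'j}
      - \sum_{k=1}^{K} \frac{1}{n_k}\sum_{i,i' \in C_k} d_{ii'j} \right)
\quad \text{s.t.} \quad \|\mathbf{w}\|_2^2 \le 1,\;\; \|\mathbf{w}\|_1 \le s,\;\; w_j \ge 0,
\qquad d_{ii'j} = (x_{ij} - x_{i'j})^2
```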
Findings
Outperforms existing clustering methods in simulations
Effective in identifying clinically meaningful subtypes in breast cancer and Alzheimer's data
Implemented as an accessible R package on GitHub
Abstract
The discovery of disease subtypes is an essential step toward precision medicine, and disease subtyping via omics data has become a popular approach. While promising, subtypes obtained from existing approaches are not necessarily associated with clinical outcomes. Given the rich clinical data collected alongside omics data in modern epidemiological cohorts, there is an urgent need for an outcome-guided clustering algorithm that fully integrates the phenotypic data with the high-dimensional omics data. We therefore extended sparse K-means to an outcome-guided sparse K-means (GuidedSparseKmeans) method. A unified objective function was proposed, comprising (i) weighted K-means to perform sample clustering; (ii) lasso regularization to perform gene selection from the high-dimensional omics data; (iii) incorporation of a phenotypic variable from the clinical dataset to…
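Components (i) and (ii) correspond to the alternating procedure of standard sparse K-means (Witten and Tibshirani): cluster the samples with the gene weights held fixed, then update the weights by soft-thresholding each gene's between-cluster sum of squares under an L1 bound. The following is a minimal pure-Python sketch of that base procedure only. It omits the outcome-guided component (iii), whose form is not given in this summary, and it is not the authors' R package. The function names, the deterministic farthest-first initialization, and the toy data are illustrative assumptions.

```python
import math


def soft_threshold(a, delta):
    """Elementwise soft-thresholding: sign(x) * max(|x| - delta, 0)."""
    return [math.copysign(max(abs(x) - delta, 0.0), x) for x in a]


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def update_weights(bcss, s, iters=50):
    """w = S(a, delta) / ||S(a, delta)||_2 with a = per-feature BCSS; delta = 0
    if that already gives ||w||_1 <= s, else delta is found by binary search.
    s is the L1 (lasso-type) bound, assumed to lie in [1, sqrt(p)]."""
    a = [max(x, 0.0) for x in bcss]  # clip tiny negatives from rounding
    w = l2_normalize(a)
    if sum(w) <= s:
        return w
    lo, hi = 0.0, max(a)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(l2_normalize(soft_threshold(a, mid))) > s:
            lo = mid  # still too dense: threshold harder
        else:
            hi = mid  # feasible: try a smaller threshold
    return l2_normalize(soft_threshold(a, hi))


def per_feature_bcss(X, labels, k):
    """Between-cluster sum of squares of each feature: TSS_j - WSS_j."""
    n = len(X)
    bcss = []
    for col in zip(*X):
        mean = sum(col) / n
        tss = sum((x - mean) ** 2 for x in col)
        wss = 0.0
        for c in range(k):
            members = [col[i] for i in range(n) if labels[i] == c]
            if members:
                m = sum(members) / len(members)
                wss += sum((x - m) ** 2 for x in members)
        bcss.append(tss - wss)
    return bcss


def weighted_kmeans(X, w, k, n_iter=100):
    """Lloyd's algorithm with distance sum_j w_j * (x_j - c_j)^2."""
    n = len(X)

    def dist(x, c):
        return sum(wj * (xj - cj) ** 2 for wj, xj, cj in zip(w, x, c))

    # Deterministic farthest-first initialization (a simplification).
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(range(n), key=lambda i: min(dist(X[i], c) for c in centers))
        centers.append(list(X[far]))
    labels = None
    for _ in range(n_iter):
        new = [min(range(k), key=lambda c: dist(X[i], centers[c])) for i in range(n)]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [X[i] for i in range(n) if labels[i] == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels


def sparse_kmeans(X, k, s, n_outer=20):
    """Alternate: cluster with weights fixed, then re-weight the features."""
    p = len(X[0])
    w = [1 / math.sqrt(p)] * p  # equal starting weights
    labels = None
    for _ in range(n_outer):
        new = weighted_kmeans(X, w, k)
        w = update_weights(per_feature_bcss(X, new, k), s)
        if new == labels:
            break
        labels = new
    return labels, w


# Toy data: features 0-1 separate two groups of 10; features 2-4 repeat the
# same pattern in both groups, so they carry no cluster signal.
noise = [[(i * 7) % 5 / 5, (i * 3) % 4 / 4, (i * 5) % 7 / 7] for i in range(10)]
X = [[0.1 * (i % 3), 0.1 * (i % 2)] + noise[i] for i in range(10)]
X += [[10 + 0.1 * (i % 3), 10 + 0.1 * (i % 2)] + noise[i] for i in range(10)]
labels, w = sparse_kmeans(X, k=2, s=1.5)
print(labels)
print([round(x, 3) for x in w])  # → [0.707, 0.707, 0.0, 0.0, 0.0]
```

On this toy data the two informative features share the weight and the noise features receive (numerically) zero weight. When the bound s is tight, the binary search on delta sets some weights exactly to zero, which is how an L1 bound performs gene selection.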
