Outcome-guided Sparse K-means for Disease Subtype Discovery via Integrating Phenotypic Data with High-dimensional Transcriptomic Data

Lingsong Meng; Dorina Avram; George Tseng; Zhiguang Huo

arXiv:2103.09974·q-bio.QM·March 1, 2022

TL;DR

This paper introduces an outcome-guided sparse K-means clustering method that integrates phenotypic data with high-dimensional transcriptomic data to discover disease subtypes more relevant to clinical outcomes.

Contribution

The authors develop a novel unified objective function for outcome-guided clustering that combines sample clustering, gene selection, and phenotypic data incorporation.

Findings

1. Outperforms existing clustering methods in simulations
2. Effective in identifying clinically meaningful subtypes in breast cancer and Alzheimer's data
3. Implemented as an accessible R package on GitHub

Abstract

The discovery of disease subtypes is an essential step for developing precision medicine, and disease subtyping via omics data has become a popular approach. While promising, subtypes obtained from existing approaches are not necessarily associated with clinical outcomes. With rich clinical data now available alongside omics data in modern epidemiology cohorts, there is an urgent need for an outcome-guided clustering algorithm that fully integrates the phenotypic data with the high-dimensional omics data. Hence, we extended a sparse K-means method to an outcome-guided sparse K-means (GuidedSparseKmeans) method. A unified objective function was proposed, comprising (i) weighted K-means to perform sample clustering; (ii) lasso regularization to perform gene selection from the high-dimensional omics data; (iii) incorporation of a phenotypic variable from the clinical dataset to…
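The way the three components of the objective could fit together can be illustrated with a minimal sketch. This is not the authors' implementation (the official package is written in R), and several pieces are assumptions made only for illustration: the guidance term is taken to be the squared Pearson correlation between each gene and a continuous outcome, `lam` is a hypothetical tuning parameter that balances clustering separation against outcome relevance, and the weight update is the soft-thresholding step from standard sparse K-means. The function names `gene_scores` and `update_weights` are hypothetical.

```python
import numpy as np


def soft_threshold(a, delta):
    """Shrink non-negative scores toward zero by delta (lasso-style)."""
    return np.maximum(a - delta, 0.0)


def gene_scores(X, labels, y, lam):
    """Per-gene score: BCSS/TSS for the given clustering plus lam * R^2 with outcome y.

    Assumes every gene (column of X) has non-zero variance and y is continuous.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    y = np.asarray(y, dtype=float)
    centered = X - X.mean(axis=0)
    tss = (centered ** 2).sum(axis=0)            # total sum of squares per gene
    wcss = np.zeros(X.shape[1])                  # within-cluster sum of squares
    for k in np.unique(labels):
        Xk = X[labels == k]
        wcss += ((Xk - Xk.mean(axis=0)) ** 2).sum(axis=0)
    separation = (tss - wcss) / tss              # between-cluster share, in [0, 1]
    yc = y - y.mean()
    r = centered.T @ yc / np.sqrt(tss * (yc ** 2).sum())
    return separation + lam * r ** 2             # assumed guidance term: R^2


def update_weights(scores, s, n_iter=50):
    """Maximise sum_j w_j * scores_j with ||w||_2 <= 1, ||w||_1 <= s, w_j >= 0."""
    a = np.maximum(np.asarray(scores, dtype=float), 0.0)
    if s < 1 or not a.any():
        raise ValueError("need s >= 1 and at least one positive score")
    w = a / np.linalg.norm(a)
    if w.sum() <= s:                             # L1 bound already satisfied
        return w
    lo, hi, best = 0.0, a.max(), w
    for _ in range(n_iter):                      # binary search for the threshold
        delta = 0.5 * (lo + hi)
        cand = soft_threshold(a, delta)
        cand = cand / np.linalg.norm(cand)
        if cand.sum() > s:
            lo = delta
        else:
            hi, best = delta, cand
    return best


# Toy data: 60 samples, 20 genes; only gene 0 separates the two groups
# and tracks the outcome.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 20))
X[:, 0] += 3.0 * labels
y = X[:, 0] + rng.normal(scale=0.5, size=60)

w = update_weights(gene_scores(X, labels, y, lam=1.0), s=1.2)
print(np.round(w, 3))
```

In a full algorithm this weight update would alternate with a weighted K-means step that reassigns samples using the current gene weights; only the weight update is shown here.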

Code & Models

Repositories

LingsongMeng/GuidedSparseKmeans
Official