A Bayesian Semiparametric Factor Analysis Model for Subtype Identification

Jiehuan Sun; Joshua L. Warren; Hongyu Zhao

arXiv:1609.02984·stat.ME·September 27, 2016

TL;DR

This paper introduces BCSub, a Bayesian semiparametric factor analysis model for disease subtype identification from gene expression data, improving clustering accuracy and clinical relevance.

Contribution

The paper presents a Bayesian method that reduces high-dimensional gene expression data to a few factor scores and clusters those scores to identify disease subtypes, outperforming existing clustering methods in the reported simulations.

Findings

01 BCSub outperforms traditional clustering methods in simulations.

02 BCSub identifies more clinically relevant subtypes in real datasets.

03 BCSub remains effective on high-dimensional, correlated gene expression data.

Abstract

Disease subtype identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly utilized to infer disease subtypes, which often lead to biologically meaningful insights into disease. Despite many successes, existing clustering methods may not perform well when genes are highly correlated and many uninformative genes are included for clustering due to the high dimensionality. In this article, we introduce a novel subtype identification method in the Bayesian setting based on gene expression profiles. This method, called BCSub, adopts an innovative semiparametric Bayesian factor analysis model to reduce the dimension of the data to a few factor scores for clustering. Specifically, the factor scores are assumed to follow the Dirichlet process mixture model in order to induce clustering. Through extensive simulation studies, we show that…
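The modeling idea in the abstract (a few factor scores per sample, with the scores following a Dirichlet process mixture) can be illustrated with a small generative simulation. The sketch below is not the BCSub implementation and performs no posterior inference. It only draws synthetic data from a model of the form y_i = Lambda * eta_i + noise, where the cluster labels of the factor scores eta_i come from a Chinese restaurant process, a standard representation of the Dirichlet process. The function names, the Gaussian cluster means, the identity covariance of the scores, and all numeric settings (n, p, q, alpha, the noise scale) are assumptions made for illustration; the abstract does not specify the paper's priors.

```python
import random


def crp_assignments(n, alpha, rng):
    """Draw cluster labels for n samples from a Chinese restaurant process.

    alpha is the concentration parameter: larger values favor more clusters.
    """
    labels, counts = [], []
    for _ in range(n):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def simulate(n=60, p=200, q=3, alpha=1.0, noise_sd=0.5, seed=0):
    """Simulate n samples of p 'genes' driven by q latent factors.

    Hypothetical settings throughout; this mirrors the model structure
    described in the abstract, not the paper's actual specification.
    """
    rng = random.Random(seed)
    labels = crp_assignments(n, alpha, rng)
    n_clusters = max(labels) + 1

    # One mean vector per cluster in the q-dimensional factor space.
    means = [[rng.gauss(0, 3) for _ in range(q)] for _ in range(n_clusters)]

    # Factor scores: Gaussian around the mean of each sample's cluster.
    scores = [[rng.gauss(means[z][j], 1.0) for j in range(q)] for z in labels]

    # Loading matrix (p x q) shared by all samples.
    loadings = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(p)]

    # Observed expression: loadings times scores plus independent noise.
    data = [
        [
            sum(loadings[g][j] * scores[i][j] for j in range(q))
            + rng.gauss(0, noise_sd)
            for g in range(p)
        ]
        for i in range(n)
    ]
    return labels, scores, data
```

Calling `simulate()` returns the latent subtype labels, the low-dimensional factor scores, and the high-dimensional data matrix. All cluster structure lives in the q-dimensional scores, which is why clustering the scores rather than the raw genes is attractive when p is large and genes are correlated through shared loadings.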

Taxonomy

Topics: Bayesian Methods and Mixture Models · Gene expression and cancer classification · Bioinformatics and Genomic Networks