Bayesian Consensus Clustering

Eric F. Lock; David B. Dunson

arXiv:1302.7280 · stat.ML · December 1, 2015

TL;DR

This paper introduces a Bayesian model for clustering objects based on multiple data sources, allowing for source-specific clusterings that loosely follow an overall consensus, improving robustness and power in heterogeneous data analysis.

Contribution

It presents a scalable Bayesian framework for simultaneous estimation of source-specific and consensus clusterings, enhancing robustness over joint clustering and power over separate clustering.

Findings

1. More robust than joint clustering of all data sources
2. More powerful than clustering each data source separately
3. Successfully applied to breast cancer subtype identification

Abstract

The task of clustering a set of objects based on multiple sources of data arises in several modern applications. We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These separate clusterings adhere loosely to an overall consensus clustering, and hence they are not independent. We describe a computationally scalable Bayesian framework for simultaneous estimation of both the consensus clustering and the source-specific clusterings. We demonstrate that this flexible approach is more robust than joint clustering of all data sources, and is more powerful than clustering each data source separately. This work is motivated by the integrated analysis of heterogeneous biomedical data, and we present an application to subtype identification of breast cancer tumor samples using publicly available data from The Cancer Genome Atlas.…
