Biclustering Via Sparse Clustering

Qian Liu, Guanhua Chen, Michael R. Kosorok, and Eric Bair

arXiv:1407.3010·stat.ME·July 14, 2014

Open Access

TL;DR

This paper introduces a flexible biclustering framework based on sparse clustering, capable of identifying subgroups with differences in means or variances across features, demonstrated on simulated and real data.

Contribution

The paper extends the sparse clustering method of Witten and Tibshirani (2010) to identify biclusters whose features differ in mean, in variance, or in more general ways, with reported gains in accuracy and computational efficiency over existing methods.

Findings

01. Outperforms existing methods in accuracy

02. Faster computation times

03. Effective on both simulated and real datasets

Abstract

In many situations it is desirable to identify clusters that differ with respect to only a subset of features. Such clusters may represent homogeneous subgroups of patients with a disease, such as cancer or chronic pain. We define a bicluster to be a submatrix U of a larger data matrix X such that the features and observations in U differ from those not contained in U. For example, the observations in U could have different means or variances with respect to the features in U. We propose a general framework for biclustering based on the sparse clustering method of Witten and Tibshirani (2010). We develop a method for identifying features that belong to biclusters. This framework can be used to identify biclusters that differ with respect to the means of the features, the variance of the features, or more general differences. We apply these methods to several simulated and real-world…
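To make the definitions in the abstract concrete, the sketch below simulates a data matrix X containing a mean-shifted submatrix U and then runs a simplified sparse 2-means loop in the spirit of Witten and Tibshirani (2010). It is an illustration under stated assumptions, not the authors' algorithm: the function names, the simulation settings, the L1 bound `s`, the principal-component initialization, and the choice to read the smaller cluster as the bicluster rows are all hypothetical choices made for this example.

```python
import numpy as np

def simulate_mean_bicluster(n=100, p=50, n_in=30, p_in=10, shift=3.0, seed=0):
    """Gaussian noise matrix X with a mean-shifted submatrix U (top-left block)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    X[:n_in, :p_in] += shift  # observations and features inside U differ in mean
    return X

def two_means(Z, n_iter=100):
    """Lloyd iterations for k=2, initialized by the sign of the first PC score."""
    Zc = Z - Z.mean(axis=0)
    u, _, _ = np.linalg.svd(Zc, full_matrices=False)
    labels = (u[:, 0] > 0).astype(int)
    for _ in range(n_iter):
        if labels.min() == labels.max():
            break  # degenerate split, stop early
        centers = np.stack([Z[labels == k].mean(axis=0) for k in (0, 1)])
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def bcss_per_feature(X, labels):
    """Between-cluster sum of squares for each feature (total minus within)."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for k in np.unique(labels):
        Xk = X[labels == k]
        within += ((Xk - Xk.mean(axis=0)) ** 2).sum(axis=0)
    return total - within

def sparse_weights(a, s, n_iter=60):
    """Maximize w.a subject to ||w||_2 = 1, ||w||_1 <= s, w >= 0 (assumes s > 1).

    Uses soft-thresholding with the threshold found by bisection.
    """
    a = np.maximum(a, 0.0)
    w = a / np.linalg.norm(a)
    if w.sum() <= s:
        return w
    best = np.zeros_like(a)
    best[a.argmax()] = 1.0  # fallback: all weight on the strongest feature
    lo, hi = 0.0, a.max()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        t = np.maximum(a - mid, 0.0)
        t = t / np.linalg.norm(t)
        if t.sum() > s:
            lo = mid
        else:
            hi, best = mid, t
    return best

def sparse_two_means(X, s, n_outer=10):
    """Alternate weighted 2-means and the sparse weight update."""
    p = X.shape[1]
    w = np.full(p, 1.0 / np.sqrt(p))
    for _ in range(n_outer):
        labels = two_means(X * np.sqrt(w))
        w = sparse_weights(bcss_per_feature(X, labels), s)
    return labels, w

X = simulate_mean_bicluster()
labels, w = sparse_two_means(X, s=3.0)
# Assumption for this sketch: the smaller cluster gives the bicluster rows,
# and the features with nonzero weight give the bicluster columns.
rows = np.flatnonzero(labels == np.bincount(labels).argmin())
cols = np.flatnonzero(w)
print("bicluster rows:", rows)
print("bicluster features:", cols)
```

The L1 bound `s` controls how many features receive nonzero weight: smaller values give sparser feature sets. A variance-based bicluster, which the abstract also mentions, would need a different per-feature criterion than the between-cluster sum of squares used here.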

Taxonomy

Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Advanced Clustering Algorithms Research