A Family of Mixture Models for Biclustering

Wangshu Tu; Sanjeena Subedi

arXiv:2009.05098 · stat.ME · September 14, 2020 · Stat. Anal. Data Min.

TL;DR

This paper introduces a flexible family of biclustering models based on mixture models in which the latent variable has a diagonal, rather than identity, covariance matrix. This lets the off-diagonal elements within each covariance block take values other than 1, improving fit on complex data.

Contribution

It proposes a new class of biclustering models that relaxes the restriction of earlier mixture-of-factor-analyzers approaches by giving the latent variable a diagonal covariance matrix, which better captures complex data structures.

Findings

01 Models outperform previous methods on simulated data

02 Effective in bioinformatics and text analytics applications

03 Demonstrates improved data fit with flexible covariance structures

Abstract

Biclustering is used to cluster observations and variables simultaneously when no group structure is known a priori. It is increasingly used in bioinformatics, text analytics, and other fields. Biclustering has previously been introduced in a model-based clustering framework by utilizing a structure similar to a mixture of factor analyzers. In such models, the observed variables $X$ are modelled using a latent variable $U$ that is assumed to follow $N(0, I)$. Clustering of variables is introduced by constraining the entries of the factor loading matrix to be 0 or 1, which results in block-diagonal covariance matrices. However, this approach is overly restrictive, as the off-diagonal elements within the blocks of the covariance matrices can only be 1, which can lead to unsatisfactory model fit on complex data. Here, the latent variable…
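The covariance restriction described in the abstract can be checked numerically. The snippet below is a minimal sketch, assuming the standard factor-analyzer form $\Sigma = \Lambda T \Lambda^\top + \Psi$ for a single mixture component, where each row of the binary loading matrix $\Lambda$ has exactly one 1 marking that variable's cluster and $\Psi$ is a diagonal error covariance. With $T = I$ (that is, $U \sim N(0, I)$) every within-block off-diagonal entry is exactly 1. Letting $T$ be a general diagonal matrix is a hypothetical reading of the relaxation summarized in the TL;DR; under it, those entries become the corresponding diagonal entries of $T$. The helper names `loading_matrix` and `component_covariance`, the cluster labels, and all numeric values are made up for illustration; this is not the authors' implementation.

```python
import numpy as np


def loading_matrix(labels, q):
    """Hypothetical helper: binary p x q loading matrix whose row j has a
    single 1 in column labels[j] (the cluster of variable j)."""
    labels = np.asarray(labels)
    lam = np.zeros((labels.size, q))
    lam[np.arange(labels.size), labels] = 1.0
    return lam


def component_covariance(lam, psi, t=None):
    """Hypothetical helper: Sigma = Lambda T Lambda^T + Psi.

    t=None uses T = I (latent U ~ N(0, I)); otherwise T = diag(t),
    an assumed diagonal latent covariance."""
    q = lam.shape[1]
    T = np.eye(q) if t is None else np.diag(t)
    return lam @ T @ lam.T + np.diag(psi)


# Illustrative setup: 5 variables, the first 2 in variable cluster 0,
# the last 3 in variable cluster 1.
labels = [0, 0, 1, 1, 1]
lam = loading_matrix(labels, q=2)
psi = np.full(5, 0.5)  # illustrative error variances

# T = I: within-block covariances are forced to 1.
sigma_identity = component_covariance(lam, psi)
# Diagonal T: within-block covariances are 2.0 and 0.3.
sigma_diag = component_covariance(lam, psi, t=[2.0, 0.3])

print(sigma_identity)
print(sigma_diag)
```

In both matrices the entries linking variables from different clusters are 0, giving the block-diagonal structure. Only the diagonal-$T$ version lets the strength of within-block dependence differ from one variable cluster to the next.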

Taxonomy

Topics: Bayesian Methods and Mixture Models · Gene expression and cancer classification · Advanced Clustering Algorithms Research