Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions

Tomoki Tokuda; Junichiro Yoshimoto; Yu Shimizu; Shigeru Toki; Go Okada; Masahiro Takamura; Tetsuya Yamamoto; Shinpei Yoshimura; Yasumasa Okamoto; Shigeto Yamawaki; Kenji Doya

arXiv:1510.06138·stat.ML·July 3, 2019


TL;DR

This paper introduces a nonparametric Bayesian method for multiple co-clustering that handles heterogeneous data types and infers the number of views and clusters from the data. On synthetic and real datasets, the authors report that it outperforms other multiple clustering methods.

Contribution

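The method assumes a co-clustering structure (partitions of both rows and columns) in each view, models Gaussian, Poisson, and multinomial cluster blocks simultaneously, and infers the number of views and the numbers of feature and subject clusters within a nonparametric Bayesian framework.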

Findings

01 Outperforms existing methods in recovering true cluster structures

02 Effective on high-dimensional data with mixed variable types

03 Provides useful insights on real biomedical data

Abstract

We propose a novel method for multiple clustering that assumes a co-clustering structure (partitions of both the rows and the columns of the data matrix) in each view. The new method is applicable to high-dimensional data. It is based on a nonparametric Bayesian approach in which the number of views and the numbers of feature and subject clusters are inferred in a data-driven manner. We simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions, in each cluster block. This makes our method applicable to datasets consisting of both numerical and categorical variables, as biomedical datasets typically are. Clustering solutions are based on variational inference with mean field approximation. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster…
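To make the data structure described in the abstract concrete, the following is a minimal sketch that only generates a synthetic matrix with a multiple co-clustering layout; it is not the authors' inference procedure, and it fixes the numbers of views and clusters by hand rather than inferring them. The function names (`sample_block`, `sample_multiple_coclustering`), the parameter priors, and the choice of one distribution family per feature cluster are assumptions made for illustration.

```python
import numpy as np


def sample_block(rng, family, row_labels, n_cols, n_row_clusters, n_categories=3):
    """Sample the columns of one feature cluster (hypothetical helper).

    Each subject cluster gets its own parameter, so every
    (subject cluster, feature cluster) block has its own distribution.
    """
    n_rows = row_labels.size
    if family == "gaussian":
        means = rng.normal(0.0, 3.0, size=n_row_clusters)
        return rng.normal(means[row_labels][:, None], 1.0, size=(n_rows, n_cols))
    if family == "poisson":
        rates = rng.gamma(2.0, 2.0, size=n_row_clusters)
        counts = rng.poisson(rates[row_labels][:, None], size=(n_rows, n_cols))
        return counts.astype(float)
    if family == "categorical":
        probs = rng.dirichlet(np.ones(n_categories), size=n_row_clusters)
        cum = probs.cumsum(axis=1)[row_labels]            # (n_rows, n_categories)
        u = rng.random((n_rows, n_cols, 1))
        codes = (u > cum[:, None, :]).sum(axis=2)         # inverse-CDF sampling
        return np.minimum(codes, n_categories - 1).astype(float)
    raise ValueError(f"unknown family: {family}")


def sample_multiple_coclustering(
    seed=0,
    n_subjects=60,
    # Assumed layout: (number of subject clusters, family of each feature cluster) per view.
    views=((2, ("gaussian", "poisson", "categorical")), (3, ("gaussian", "gaussian"))),
    cols_per_cluster=4,
):
    rng = np.random.default_rng(seed)
    blocks, view_of_col, subject_labels = [], [], []
    for v, (n_row_clusters, families) in enumerate(views):
        # Each view has its own partition of the subjects (rows).
        row_labels = rng.integers(n_row_clusters, size=n_subjects)
        subject_labels.append(row_labels)
        for family in families:
            blocks.append(
                sample_block(rng, family, row_labels, cols_per_cluster, n_row_clusters)
            )
            view_of_col += [v] * cols_per_cluster
    return np.hstack(blocks), np.array(view_of_col), subject_labels


X, view_of_col, subject_labels = sample_multiple_coclustering()
```

Here `X` mixes real-valued, count, and categorical columns in one matrix, `view_of_col` records which view each column belongs to, and `subject_labels` holds one subject partition per view. A multiple co-clustering method would aim to recover these partitions from `X` alone.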
