A Nonparametric Bayesian Method for Clustering of High-Dimensional Mixed Dataset

Chetkar Jha

arXiv:1808.04045·stat.ME·August 15, 2018

TL;DR

This paper introduces Gen-VariScan, a nonparametric Bayesian method for biclustering high-dimensional mixed datasets, effectively revealing biological associations and covariate structures.

Contribution

It proposes a Bayesian biclustering approach for mixed data that combines Generalized Linear Models (GLMs) with a Poisson Dirichlet Process (PDP), addressing the open problem of simultaneously clustering high-dimensional mixed datasets.

Findings

01

Covariate co-cluster detection is a posteriori consistent as the number of subjects and covariates grows.

02

Gen-VariScan outperforms existing methods in simulations.

03

Provides a new beta regression approach via a working value method.

Abstract

This paper is motivated by the problem of clustering high-throughput mixed datasets. Clustering such datasets can provide much insight into biological associations. An open problem in this context is the simultaneous clustering of high-dimensional mixed datasets. This paper fills that gap by proposing a nonparametric Bayesian method, Gen-VariScan, for biclustering high-dimensional mixed datasets. Gen-VariScan uses Generalized Linear Models (GLMs) and latent variable approaches to integrate mixed datasets. We use the Poisson Dirichlet Process (PDP) to identify a lower-dimensional structure of the mixed covariates. We show that covariate co-cluster detection is a posteriori consistent as the number of subjects and covariates grows. The advantage of Gen-VariScan is also demonstrated through numerical simulation and data analysis. As a byproduct, we derive a working value approach to…
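
To give a sense of how a PDP prior can induce a clustering, the sketch below draws a random partition from the two-parameter Chinese restaurant process, the standard sequential representation of the two-parameter Poisson-Dirichlet (Pitman-Yor) process. It is an illustration only, not the Gen-VariScan sampler: the abstract does not state how the PDP is parameterized or fitted, so the function name `sample_pdp_partition` and the parameters `discount` and `concentration` are assumptions made for this example.

```python
import random


def sample_pdp_partition(n_items, discount, concentration, seed=0):
    """Draw cluster labels for n_items from a two-parameter CRP.

    Hypothetical helper for illustration; not taken from the paper.
    discount (d) must lie in [0, 1); concentration (theta) must exceed -d.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    rng = random.Random(seed)
    labels = [0]  # the first item always opens cluster 0
    sizes = [1]   # sizes[k] = number of items currently in cluster k

    for _ in range(1, n_items):
        k_current = len(sizes)
        # Unnormalised weights: an existing cluster k gets (n_k - d),
        # a new cluster gets (theta + d * K), with K the current cluster count.
        weights = [n_k - discount for n_k in sizes]
        weights.append(concentration + discount * k_current)
        choice = rng.choices(range(k_current + 1), weights=weights)[0]
        if choice == k_current:
            sizes.append(1)       # open a new cluster
        else:
            sizes[choice] += 1    # join an existing cluster
        labels.append(choice)

    return labels, sizes


if __name__ == "__main__":
    # Example: partition 50 hypothetical covariates into co-clusters.
    labels, sizes = sample_pdp_partition(50, discount=0.5, concentration=1.0, seed=1)
    print("number of clusters:", len(sizes))
    print("cluster sizes:", sizes)
```

Setting `discount` to 0 recovers the ordinary Dirichlet process partition; a positive discount tends to produce more clusters, many of them small, as the number of items grows.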

Taxonomy

Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Gene expression and cancer classification