A Nonparametric Bayesian Method for Clustering of High-Dimensional Mixed Dataset
Chetkar Jha

TL;DR
This paper introduces Gen-VariScan, a nonparametric Bayesian method for biclustering high-dimensional mixed datasets, effectively revealing biological associations and covariate structures.
Contribution
It proposes a novel Bayesian biclustering approach for mixed data that combines generalized linear models (GLMs) with a Poisson Dirichlet process (PDP), addressing the challenge of clustering high-dimensional datasets.
Findings
Covariate co-cluster detection is a posteriori consistent as the numbers of subjects and covariates grow.
Gen-VariScan outperforms existing methods in simulations.
As a byproduct, the paper derives a working value approach for beta regression.
Abstract
This paper is motivated by the clustering problem in high-throughput mixed datasets. Clustering such datasets can provide considerable insight into biological associations. An open problem in this context is the simultaneous clustering of a high-dimensional mixed dataset. This paper fills that gap and proposes a nonparametric Bayesian method, called Gen-VariScan, for biclustering high-dimensional mixed datasets. Gen-VariScan uses Generalized Linear Models (GLM) and latent variable approaches to integrate mixed data types. We use the Poisson Dirichlet Process (PDP) to identify a lower-dimensional structure of the mixed covariates. We show that covariate co-cluster detection is a posteriori consistent as the numbers of subjects and covariates grow. The advantage of Gen-VariScan is also demonstrated through numerical simulation and data analysis. As a byproduct, we derive a working value approach to…
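To give a feel for how a Poisson Dirichlet process can group many covariates into a smaller number of clusters, here is a minimal sketch of its standard sequential sampling scheme (the two-parameter Chinese restaurant process). This is not Gen-VariScan itself: the function name `sample_pdp_partition`, the parameter names `discount` and `concentration`, and their default values are assumptions chosen for illustration, and the sketch restricts `concentration` to positive values for simplicity.

```python
import random


def sample_pdp_partition(n_items, discount=0.5, concentration=1.0, seed=0):
    """Draw a random partition of n_items from a two-parameter
    (Poisson Dirichlet / Pitman-Yor) Chinese restaurant process.

    Illustrative only: names and defaults are hypothetical, not taken
    from the paper.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= 0.0:
        raise ValueError("this sketch assumes concentration > 0")

    rng = random.Random(seed)
    labels = []  # cluster label assigned to each item, in arrival order
    sizes = []   # current number of items in each cluster

    for _ in range(n_items):
        # An existing cluster k is chosen with weight (size_k - discount);
        # a new cluster is opened with weight
        # (concentration + discount * number_of_clusters).
        weights = [size - discount for size in sizes]
        weights.append(concentration + discount * len(sizes))
        k = rng.choices(range(len(weights)), weights=weights)[0]

        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        labels.append(k)

    return labels, sizes


if __name__ == "__main__":
    labels, sizes = sample_pdp_partition(200, discount=0.5, concentration=1.0)
    print("number of clusters:", len(sizes))
    print("largest cluster sizes:", sorted(sizes, reverse=True)[:5])
```

In this scheme, a larger `discount` tends to produce more small clusters, while `discount=0` reduces to the ordinary Dirichlet process. In a biclustering setting the items would be covariates, so the number of clusters plays the role of the lower-dimensional structure mentioned in the abstract.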
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Gene expression and cancer classification
