A Multivariate Poisson-Log Normal Mixture Model for Clustering Transcriptome Sequencing Data

Anjali Silva; Steven J. Rothstein; Paul D. McNicholas; Sanjeena Subedi

arXiv:1711.11190·stat.ME·December 1, 2017·BMC Bioinform.

TL;DR

This paper introduces a mixture model based on the multivariate Poisson-Log Normal distribution for clustering high-dimensional, discrete, and skewed transcriptome sequencing data, facilitating the discovery of gene co-expression groups.

Contribution

The paper proposes a mixture of multivariate Poisson-Log Normal (MPLN) distributions tailored to RNA sequencing data, together with a Markov chain Monte Carlo expectation-maximization (MCMC-EM) algorithm for parameter estimation and model selection.

Findings

01

Effective modeling of correlation and overdispersion in count data

02

Successful identification of gene clusters in transcriptome data

03

Demonstrated advantages over traditional clustering methods
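
For reference, the correlation and overdispersion mentioned in the first finding follow from the standard definition of the multivariate Poisson-log normal distribution. The notation below is an assumption made for illustration and omits any library-size normalization the paper may apply: Y is a d-dimensional count vector, θ its latent log-rate vector, and σ_jk the entries of Σ.

```latex
\begin{align*}
Y_j \mid \theta_j &\sim \mathrm{Poisson}\!\left(e^{\theta_j}\right), \quad j = 1,\dots,d,
  \qquad \boldsymbol{\theta} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}),\\
\mathbb{E}[Y_j] &= \exp\!\left(\mu_j + \tfrac{1}{2}\sigma_{jj}\right),\\
\operatorname{Var}(Y_j) &= \mathbb{E}[Y_j] + \mathbb{E}[Y_j]^2\left(e^{\sigma_{jj}} - 1\right),\\
\operatorname{Cov}(Y_j, Y_k) &= \mathbb{E}[Y_j]\,\mathbb{E}[Y_k]\left(e^{\sigma_{jk}} - 1\right), \quad j \neq k.
\end{align*}
```

The variance exceeds the mean whenever σ_jj > 0, and the sign of the covariance follows the sign of σ_jk, so both positive and negative correlation between counts can be represented.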

Abstract

High-dimensional data of discrete and skewed nature is commonly encountered in high-throughput sequencing studies. Analyzing the network itself or the interplay between genes in this type of data continues to present many challenges. As data visualization techniques become cumbersome for higher dimensions and unconvincing when there is no clear separation between homogeneous subgroups within the data, cluster analysis provides an intuitive alternative. The aim of applying mixture model-based clustering in this context is to discover groups of co-expressed genes, which can shed light on biological functions and pathways of gene products. A mixture of multivariate Poisson-Log Normal (MPLN) model is proposed for clustering of high-throughput transcriptome sequencing data. The MPLN model is able to fit a wide range of correlation and overdispersion situations, and is ideal for modeling…
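
To make the generative structure concrete, here is a minimal Python sketch that draws counts from a two-component MPLN mixture and checks that the simulated counts are overdispersed and correlated. It is not the authors' implementation: the function name `sample_mpln_mixture`, the parameter values, and the omission of any library-size normalization are all assumptions chosen for illustration, and no parameter estimation (MCMC-EM) is attempted.

```python
import numpy as np


def sample_mpln_mixture(n, weights, means, covs, seed=0):
    """Draw n count vectors from a mixture of MPLN components.

    Hypothetical helper for illustration only. For each observation:
      1. pick a component g with probability weights[g],
      2. draw latent log-rates theta ~ N(means[g], covs[g]),
      3. draw counts Y_j ~ Poisson(exp(theta_j)) independently given theta.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    labels = rng.choice(len(weights), size=n, p=weights)
    d = len(means[0])
    counts = np.empty((n, d), dtype=np.int64)
    for g in range(len(weights)):
        idx = np.flatnonzero(labels == g)
        theta = rng.multivariate_normal(means[g], covs[g], size=idx.size)
        counts[idx] = rng.poisson(np.exp(theta))
    return counts, labels


# Illustrative (made-up) parameters: two components in three dimensions.
weights = [0.6, 0.4]
means = [np.array([1.0, 1.5, 2.0]), np.array([4.0, 3.5, 3.0])]
covs = [
    np.array([[0.30, 0.15, 0.00],
              [0.15, 0.30, 0.00],
              [0.00, 0.00, 0.20]]),
    0.2 * np.eye(3),
]

counts, labels = sample_mpln_mixture(2000, weights, means, covs, seed=1)

# Within one component, a plain Poisson would give variance == mean;
# the log-normal layer inflates the variance and induces correlation.
comp0 = counts[labels == 0]
comp0_mean = comp0.mean(axis=0)
comp0_var = comp0.var(axis=0)
comp0_corr01 = np.corrcoef(comp0[:, 0], comp0[:, 1])[0, 1]
print("mean:", comp0_mean)
print("variance:", comp0_var)
print("corr(dim 0, dim 1):", comp0_corr01)
```

Clustering runs this process in reverse: given only `counts`, a fitted mixture assigns each row to the component most likely to have generated it.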
