Classification and clustering of sequencing data using a Poisson model

Daniela M. Witten

arXiv:1202.6201·stat.AP·February 29, 2012

TL;DR

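This paper introduces classification and clustering methods tailored to sequencing data. They use a Poisson model to account for the discrete, nonnegative counts that sequencing produces, rather than relying on the Gaussian assumptions behind standard tools, and the author reports that this improves performance over Gaussian-based approaches.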

Contribution

It develops a Poisson-based analog of linear discriminant analysis and a new dissimilarity measure for clustering sequencing data. Both are designed to avoid the Gaussian assumptions made by standard linear discriminant analysis and by clustering based on Euclidean distance.
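To make the idea of a Poisson analog of linear discriminant analysis more concrete, here is a minimal Python sketch under assumed simplifications: features are independent given the class, each count is Poisson with a class- and feature-specific rate, and every sample has the same sequencing depth (no size-factor normalization). Each class is scored by its log prior plus the Poisson log-likelihood, and the highest score wins. The function names, the 0.5 pseudocount, and the toy data are hypothetical; this is an illustration of the general approach, not the paper's exact estimator.

```python
import math
from typing import Dict, List, Sequence, Tuple

Counts = Sequence[Sequence[float]]


def fit_poisson_classifier(
    X: Counts, y: Sequence[int], pseudocount: float = 0.5
) -> Tuple[Dict[int, List[float]], Dict[int, float]]:
    """Estimate per-class Poisson rates (mean counts) and class priors.

    Hypothetical helper. Assumes equal sequencing depth for all samples.
    """
    rates: Dict[int, List[float]] = {}
    priors: Dict[int, float] = {}
    for k in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == k]
        n_k, p = len(rows), len(rows[0])
        # Pseudocount keeps every rate positive so log(rate) is defined.
        rates[k] = [(sum(r[j] for r in rows) + pseudocount) / n_k for j in range(p)]
        priors[k] = n_k / len(y)
    return rates, priors


def poisson_score(x: Sequence[float], rate: Sequence[float], prior: float) -> float:
    """Log prior plus Poisson log-likelihood, dropping the log(x_j!) term,
    which is the same for every class and so cannot change the argmax."""
    return math.log(prior) + sum(xj * math.log(lam) - lam for xj, lam in zip(x, rate))


def predict(x: Sequence[float], rates: Dict[int, List[float]], priors: Dict[int, float]) -> int:
    """Assign x to the class with the highest score."""
    return max(rates, key=lambda k: poisson_score(x, rates[k], priors[k]))


# Toy counts: class 0 is high in feature 0, class 1 is high in feature 1.
X = [[10, 0, 3], [12, 1, 2], [0, 9, 4], [1, 11, 5]]
y = [0, 0, 1, 1]
rates, priors = fit_poisson_classifier(X, y)
print(predict([11, 0, 3], rates, priors))  # 0
print(predict([0, 10, 4], rates, priors))  # 1
```

Because only the relative class scores matter, dropping the log(x_j!) term is safe. A fuller treatment would typically scale each rate by a per-sample size factor so that samples sequenced to different depths remain comparable.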

Findings

01

Poisson-based methods outperform Gaussian-based methods on sequencing data

02

The approaches work well on real RNA sequencing datasets

03

The methods are validated through simulation and real data analysis

Abstract

In recent years, advances in high throughput sequencing technology have led to a need for specialized methods for the analysis of digital gene expression data. While gene expression data measured on a microarray take on continuous values and can be modeled using the normal distribution, RNA sequencing data involve nonnegative counts and are more appropriately modeled using a discrete count distribution, such as the Poisson or the negative binomial. Consequently, analytic tools that assume a Gaussian distribution (such as classification methods based on linear discriminant analysis and clustering methods that use Euclidean distance) may not perform as well for sequencing data as methods that are based upon a more appropriate distribution. Here, we propose new approaches for performing classification and clustering of observations on the basis of sequencing data. Using a Poisson log…
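The abstract's point about Euclidean distance can be illustrated with a small Python comparison. The sketch below uses a Poisson deviance-based dissimilarity, a swapped-in illustrative measure rather than the dissimilarity defined in the paper: it sums each profile's Poisson deviance from the two profiles' featurewise mean. The function names and example counts are hypothetical. Two pairs that differ by the same 5 counts are equally far apart in Euclidean distance, but the Poisson measure treats a shift from 0 to 5 as much stronger evidence of a difference than a shift from 1000 to 1005, consistent with a Poisson variance that grows with the mean.

```python
import math
from typing import Sequence


def unit_deviance(x: float, mu: float) -> float:
    """Poisson unit deviance 2*[x*log(x/mu) - (x - mu)], using 0*log(0) = 0."""
    term = x * math.log(x / mu) if x > 0 else 0.0
    return 2.0 * (term - (x - mu))


def poisson_dissimilarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Symmetric dissimilarity: deviance of each profile from their shared mean.

    Illustrative choice only; not the measure defined in the paper.
    """
    total = 0.0
    for aj, bj in zip(a, b):
        mu = (aj + bj) / 2.0
        if mu == 0:
            continue  # both counts are zero: no evidence of a difference
        total += unit_deviance(aj, mu) + unit_deviance(bj, mu)
    return total


low_pair = ([0, 1000], [5, 1000])    # differ by 5 in a low-count feature
high_pair = ([0, 1000], [0, 1005])   # differ by 5 in a high-count feature
print(math.dist(*low_pair), math.dist(*high_pair))  # 5.0 5.0
print(poisson_dissimilarity(*low_pair), poisson_dissimilarity(*high_pair))  # roughly 6.93 and 0.012
```

Any clustering routine that accepts a precomputed dissimilarity, such as hierarchical clustering, could use a measure like this in place of Euclidean distance.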
