Classification and clustering of sequencing data using a Poisson model
Daniela M. Witten

TL;DR
This paper introduces classification and clustering methods tailored to sequencing data. They use a Poisson model to handle the discrete, nonnegative counts that such data consist of, which Gaussian-based methods do not model well.
Contribution
It develops a Poisson-based analog of linear discriminant analysis and a novel clustering dissimilarity measure for sequencing data, addressing limitations of Gaussian-based methods.
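To make the idea of a Poisson-based discriminant classifier concrete, here is a minimal sketch. It assumes that counts are independent Poisson variables given the class, that per-sample size factors come from total counts, and that class-specific rate ratios are smoothed by a constant `beta`. These modeling choices, the function names, and the `beta` parameter are illustrative assumptions. This summary does not confirm that they match the paper's exact formulation.

```python
import numpy as np

def fit_poisson_classifier(X, y, beta=1.0):
    """Fit per-class Poisson rate ratios from a count matrix X (samples x features).

    beta is a hypothetical smoothing constant that keeps every ratio positive.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    totals = X.sum(axis=1)
    s = totals / totals.sum()          # size factors (assumed: total-count scaling)
    g = X.sum(axis=0)                  # pooled per-feature totals
    # d[k, j]: observed count of feature j in class k relative to the count
    # expected if class membership had no effect on feature j.
    d = np.vstack([
        (X[y == k].sum(axis=0) + beta) / (s[y == k].sum() * g + beta)
        for k in classes
    ])
    priors = np.array([(y == k).mean() for k in classes])
    return {"classes": classes, "g": g, "d": d,
            "priors": priors, "total": totals.sum()}

def predict_poisson_classifier(model, X_new):
    """Assign each row of X_new to the class with the highest Poisson log-score."""
    X_new = np.asarray(X_new, dtype=float)
    s_new = X_new.sum(axis=1) / model["total"]      # size factor of each new sample
    log_d = np.log(model["d"])                      # shape (classes, features)
    # Poisson log-likelihood terms that depend on the class:
    #   sum_j x_j * log d_kj  -  s * sum_j g_j * d_kj  +  log prior_k
    expected = s_new[:, None] * (model["d"] @ model["g"])[None, :]
    scores = X_new @ log_d.T - expected + np.log(model["priors"])[None, :]
    return model["classes"][scores.argmax(axis=1)]
```

Because the score is linear in the counts `x_j`, the decision rule has the same form as a linear discriminant rule, but the weights come from a Poisson rather than a Gaussian likelihood.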
Findings
Poisson-based methods outperform Gaussian-based methods on sequencing data
The approaches work well on real RNA sequencing datasets
The methods are validated through simulation and real data analysis
Abstract
In recent years, advances in high-throughput sequencing technology have led to a need for specialized methods for the analysis of digital gene expression data. While gene expression data measured on a microarray take on continuous values and can be modeled using the normal distribution, RNA sequencing data involve nonnegative counts and are more appropriately modeled using a discrete count distribution, such as the Poisson or the negative binomial. Consequently, analytic tools that assume a Gaussian distribution (such as classification methods based on linear discriminant analysis and clustering methods that use Euclidean distance) may not perform as well for sequencing data as methods that are based upon a more appropriate distribution. Here, we propose new approaches for performing classification and clustering of observations on the basis of sequencing data. Using a Poisson log…
