# Poisson PCA: Poisson Measurement Error corrected PCA, with Application to Microbiome Data

**Authors:** Toby Kenney, Tianshu Huang, Hong Gu

arXiv: 1904.11745 · 2021-05-25

## TL;DR

This paper introduces Poisson PCA, a semiparametric method for principal component analysis of data affected by Poisson noise. It corrects the bias of variance estimators and accounts for differing exposure or sequencing depth, with microbiome data as the motivating application.

## Contribution

We develop a novel semiparametric Poisson PCA approach that corrects bias in variance estimation and computes principal scores without relying on parametric models.

## Key findings

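- Our method is better than the Poisson-log-normal (PLN) approach at identifying the main principal components of the latent log-transformed Poisson means.
- It takes far less time to compute than PLN and, on real data, appears to be more robust to outliers than the parametric method.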
- Applications to microbiome data demonstrate practical effectiveness.

## Abstract

In this paper, we study the problem of computing a Principal Component Analysis of data affected by Poisson noise. We assume samples are drawn from independent Poisson distributions. We want to estimate principal components of a fixed transformation of the latent Poisson means. Our motivating example is microbiome data, though the methods apply to many other situations. We develop a semiparametric approach to correct the bias of variance estimators, both for untransformed and transformed (with particular attention to log-transformation) Poisson means. Furthermore, we incorporate methods for correcting different exposure or sequencing depth in the data. In addition to identifying the principal components, we also address the non-trivial problem of computing the principal scores in this semiparametric framework. Most previous approaches tend to take a more parametric line, for example the Poisson-log-normal (PLN) model approach. We compare our method with the PLN approach and find that our method is better at identifying the main principal components of the latent log-transformed Poisson means, and as a further major advantage, takes far less time to compute. Comparing methods on real data, we see that our method also appears to be more robust to outliers than the parametric method.
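
To make the bias correction mentioned in the abstract concrete, here is a minimal sketch of the simplest case only: untransformed Poisson means with equal sequencing depth. It relies on the law of total variance, which for counts $X_j \mid \Lambda_j \sim \mathrm{Poisson}(\Lambda_j)$ gives $\mathrm{Var}(X_j) = \mathrm{Var}(\Lambda_j) + \mathbb{E}[\Lambda_j]$, so subtracting the sample means from the diagonal of the sample covariance removes the Poisson contribution. The function names, the simulated data, and all parameter values are assumptions made for illustration. This is not the authors' implementation, and it does not cover the log-transformed case, the depth correction, or the principal scores.

```python
import numpy as np


def corrected_covariance(counts):
    """Estimate Cov(Lambda) from counts X, where X | Lambda ~ Poisson(Lambda).

    Poisson noise inflates each diagonal entry of Cov(X) by E[Lambda_j],
    and E[X_j] = E[Lambda_j], so the column means are subtracted from the
    diagonal. Off-diagonal entries are left alone because the noise is
    independent across features given Lambda.
    """
    counts = np.asarray(counts, dtype=float)
    sample_cov = np.cov(counts, rowvar=False)
    return sample_cov - np.diag(counts.mean(axis=0))


def top_component(cov):
    """Return the largest eigenvalue and its eigenvector of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvals[-1], eigvecs[:, -1]


# Simulated example (hypothetical values): the latent means vary only along
# `direction`, with variance 25, but features 2 and 3 have large means and
# therefore large Poisson noise.
rng = np.random.default_rng(0)
n_samples = 20000
base_means = np.array([30.0, 30.0, 100.0, 100.0])
direction = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2.0)
latent = base_means + 5.0 * np.outer(rng.normal(size=n_samples), direction)
counts = rng.poisson(latent)

naive_val, naive_vec = top_component(np.cov(counts, rowvar=False))
corrected_val, corrected_vec = top_component(corrected_covariance(counts))

print("naive alignment with true direction:    ", abs(naive_vec @ direction))
print("corrected alignment with true direction:", abs(corrected_vec @ direction))
print("corrected top eigenvalue (true value 25):", corrected_val)
```

In this simulation the uncorrected covariance is dominated by the Poisson noise of the high-count features, so its leading eigenvector points away from the true latent direction, while the corrected estimate should recover it. The corrected matrix is not guaranteed to be positive semidefinite in finite samples, which is one reason a practical method needs more care than this sketch.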

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.11745/full.md

## Figures

45 figures with captions in the complete paper: https://tomesphere.com/paper/1904.11745/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/1904.11745/full.md

---
Source: https://tomesphere.com/paper/1904.11745