Sparse Bayesian Partially Identified Models for Sequence Count Data
Won Gu, Francesca Chiaromonte, Justin D. Silverman

TL;DR
This paper introduces a Bayesian model for sequence count data in genomics that explicitly accounts for uncertainty in sparsity assumptions, leading to more accurate differential analysis and reduced error rates.
Contribution
The paper proposes a novel sparse Bayesian Partially Identified Model that extends the Scale Reliant Inference (SRI) framework to handle scale uncertainty in sequence count data analysis.
Findings
Significant reduction in Type I errors compared to existing methods.
Substantial decrease in Type II errors demonstrated through simulations.
Theoretical proof of estimator consistency.
Abstract
In genomics, differential abundance and expression analyses are complicated by the compositional nature of sequence count data, which reflect only relative (not absolute) abundances or expression levels. Many existing methods attempt to address this limitation through data normalizations, but we have shown that such approaches imply strong, often biologically implausible assumptions about total microbial load or total gene expression. Even modest violations of these assumptions can inflate Type I and Type II error rates to over 70%. Sparse estimators have been proposed as an alternative, leveraging the assumption that only a small subset of taxa (or genes) change between conditions. However, we show that current sparse methods suffer from similar pathologies because they treat sparsity assumptions as fixed and ignore the uncertainty inherent in these assumptions. We introduce a sparse…
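The scale problem described in the abstract can be illustrated with a small numeric sketch. It is not the paper's model: the abundances, variable names, and the median-centering step are hypothetical, chosen only to show that log fold changes computed from proportions differ from the true ones by the change in total load, and that a fixed sparsity assumption (most taxa do not change) is one way to guess that offset.

```python
import math
from statistics import median

def proportions(abundances):
    """Convert absolute abundances to relative abundances (they sum to 1)."""
    total = sum(abundances)
    return [a / total for a in abundances]

def log_fold_changes(before, after):
    """Per-taxon natural-log fold change from `before` to `after`."""
    return [math.log(a / b) for b, a in zip(before, after)]

# Hypothetical absolute abundances for five taxa; only the last taxon changes.
absolute_control = [100.0, 200.0, 300.0, 400.0, 1000.0]
absolute_treated = [100.0, 200.0, 300.0, 400.0, 4000.0]

# What we would like to estimate (unobservable from sequencing alone).
true_lfc = log_fold_changes(absolute_control, absolute_treated)

# What sequence counts can inform: changes in relative abundance.
relative_lfc = log_fold_changes(
    proportions(absolute_control), proportions(absolute_treated)
)

# Change in total load. Treating relative_lfc as the answer amounts to
# assuming this quantity is zero.
scale_lfc = math.log(sum(absolute_treated) / sum(absolute_control))

# Illustrative sparse correction (an assumption of this sketch): if most taxa
# are unchanged, the median relative change estimates minus the scale change.
sparse_lfc = [x - median(relative_lfc) for x in relative_lfc]

for name, values in [("true", true_lfc), ("relative", relative_lfc),
                     ("sparse-corrected", sparse_lfc)]:
    print(name, [round(v, 3) for v in values])
```

In this toy case the median correction recovers the true changes because four of the five taxa really are unchanged. The correction is a single fixed choice, so it gives no indication of how wrong the result would be if the sparsity assumption failed, which is the kind of uncertainty the abstract says current sparse methods ignore.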
Taxonomy
Topics: Statistical Methods and Inference · Gene expression and cancer classification · Bayesian Methods and Mixture Models
