Inference on High-Dimensional Sparse Count Data

Jyotishka Datta; David B. Dunson

arXiv:1510.04320 · stat.ME · April 15, 2016

Open Access

TL;DR

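This paper introduces a Bayesian approach based on continuous local-global shrinkage priors for high-dimensional sparse count data. Unlike zero-inflated Poisson or negative binomial log-linear hierarchical models, the priors are designed to adapt to the level and nature of sparsity in the data.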

Contribution

The authors develop a new class of continuous local-global shrinkage priors tailored to sparse counts, assess its theoretical properties, and report simulation studies showing excellent small-sample properties relative to competitors.

Findings

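01 Strong posterior concentration properties.

02 Stronger control of false discoveries in multiple testing.

03 Robustness of the posterior mean and super-efficiency in estimating the sampling density, established as theoretical properties; simulations illustrate excellent small-sample properties relative to competitors.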

Abstract

In a variety of application areas, there is a growing interest in analyzing high dimensional sparse count data, with sparsity exhibited by an over-abundance of zeros and small non-zero counts. Existing approaches for analyzing multivariate count data via Poisson or negative binomial log-linear hierarchical models with zero-inflation cannot flexibly adapt to the level and nature of sparsity in the data. We develop a new class of continuous local-global shrinkage priors tailored for sparse counts. Theoretical properties are assessed, including posterior concentration, stronger control on false discoveries in multiple testing, robustness in posterior mean and super-efficiency in estimating the sampling density. Simulation studies illustrate excellent small sample properties relative to competitors. We apply the method to detect rare mutational hotspots in exome sequencing data and to…

Taxonomy

Topics: Genetic Associations and Epidemiology · Statistical Methods and Inference · Bayesian Methods and Mixture Models