Poisson Hierarchical Indian Buffet Processes-With Indications for Microbiome Species Sampling Models
Lancelot F. James, Juho Lee, Abhinav Pandey

TL;DR
This paper introduces the Poisson Hierarchical Indian Buffet Process, a Bayesian nonparametric model for complex, sparse count data, enabling flexible species sampling analysis with applications in microbiome and other fields.
Contribution
It develops the Poisson Hierarchical Indian Buffet Process (PHIBP), a hierarchical Bayesian nonparametric model with tractable inference that addresses overdispersion, zero-inflation, and the unseen species problem in microbiome data.
Findings
Provides a flexible multivariate count model for microbiome data
Enables explicit modeling of technical and biological zeros
Offers extensions for incorporating domain knowledge
Abstract
We introduce the Poisson Hierarchical Indian Buffet Process (PHIBP), a new class of species sampling models designed to address the challenges of complex, sparse count data by facilitating information sharing across and within groups. Our theoretical developments enable a tractable Bayesian nonparametric framework with machine learning elements, accommodating a potentially infinite number of species (taxa) whose parameters are learned from data. Focusing on microbiome analysis, we address key gaps by providing a flexible multivariate count model that accounts for overdispersion and robustly handles diverse data types (OTUs, ASVs). We introduce novel parameters reflecting species abundance and diversity. The model borrows strength across groups while explicitly distinguishing between technical and biological zeros to interpret sparse co-occurrence patterns. This results in a framework…
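To make the data setting in the abstract concrete, the following is a minimal sketch of a finite gamma-Poisson hierarchy that produces sparse, overdispersed species counts with rates shared across groups. It is not the PHIBP construction from the paper: the truncation to a fixed number of species, the gamma priors, and every name and parameter here (`simulate_counts`, `total_mass`, `scale`, `group_shape`) are assumptions chosen for illustration only.

```python
import random


def sample_poisson(rng, rate):
    """Draw a Poisson(rate) count by counting unit-rate arrivals in [0, rate)."""
    count, t = 0, rng.expovariate(1.0)
    while t < rate:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_counts(num_groups, samples_per_group, num_species, seed=0,
                    total_mass=2.0, scale=20.0, group_shape=2.0):
    """Simulate counts[group][sample][species] from a toy hierarchy.

    Assumed (hypothetical) generative steps:
      1. Global rate per species ~ Gamma(total_mass / num_species, scale),
         so most species get a rate close to zero.
      2. Each group rescales every global rate by a mean-one gamma factor,
         so groups share the same species but differ in abundance.
      3. Each sample draws an independent Poisson count per species.
    """
    rng = random.Random(seed)
    global_rates = [rng.gammavariate(total_mass / num_species, scale)
                    for _ in range(num_species)]
    counts = []
    for _ in range(num_groups):
        # random.gammavariate takes (shape, scale); this factor has mean 1.
        group_rates = [lam * rng.gammavariate(group_shape, 1.0 / group_shape)
                       for lam in global_rates]
        counts.append([[sample_poisson(rng, mu) for mu in group_rates]
                       for _ in range(samples_per_group)])
    return counts


counts = simulate_counts(num_groups=3, samples_per_group=4, num_species=50, seed=1)
flat = [c for group in counts for sample in group for c in sample]
print("fraction of zero counts:", sum(c == 0 for c in flat) / len(flat))
```

In this toy version, a zero can arise either because a species has a negligible rate everywhere or because a species that is present simply went unobserved in one sample. That is only a loose analogue of the biological versus technical zeros the abstract mentions, and the sketch makes no attempt at the inference the paper develops.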
Taxonomy
Topics: Bayesian Methods and Mixture Models
Methods: ADaptive gradient method with the OPTimal convergence rate
