Consensus Monte Carlo for Random Subsets using Shared Anchors
Yang Ni, Yuan Ji, Peter Mueller

TL;DR
This paper introduces a scalable consensus Monte Carlo algorithm for Bayesian nonparametric clustering and feature allocation models, enabling inference on large datasets under a wide range of sampling models.
Contribution
The authors develop a consensus Monte Carlo method that is valid for any prior on random subsets, such as partitions and latent feature allocations, extending Bayesian nonparametric inference to big data.
Findings
Applied to large datasets, including MNIST images, pancreatic cancer mutations, and electronic health records (EHR).
Performance assessed through simulation studies.
Applicable to diverse models, such as Dirichlet process mixtures and Indian buffet process priors with binomial or categorical sampling models.
Abstract
We present a consensus Monte Carlo algorithm that scales existing Bayesian nonparametric models for clustering and feature allocation to big data. The algorithm is valid for any prior on random subsets such as partitions and latent feature allocation, under essentially any sampling model. Motivated by three case studies, we focus on clustering induced by a Dirichlet process mixture sampling model, inference under an Indian buffet process prior with a binomial sampling model, and with a categorical sampling model. We assess the proposed algorithm with simulation studies and show results for inference with three datasets: an MNIST image dataset, a dataset of pancreatic cancer mutations, and a large set of electronic health records (EHR). Supplementary materials for this article are available online.
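The abstract does not spell out how shard-level results are combined. Below is a minimal Python sketch of one simplified reading of the "shared anchors" idea in the title, for partitions only. It assumes every shard also contains a common set of anchor points, and that clusters in different shards holding the same anchor are treated as one global cluster. This merge rule is an illustrative assumption, not the authors' exact procedure, and the names `merge_shard_partitions` and `UnionFind` are hypothetical.

```python
from collections.abc import Hashable
from typing import Dict, List, Tuple

# A node is one shard-local cluster: (shard index, local cluster label).
Node = Tuple[int, Hashable]


class UnionFind:
    """Minimal disjoint-set structure over shard-local clusters."""

    def __init__(self) -> None:
        self.parent: Dict[Node, Node] = {}

    def find(self, x: Node) -> Node:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Node, b: Node) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_shard_partitions(
    shard_labels: List[Dict[Hashable, Hashable]], anchors: List[Hashable]
) -> Dict[Hashable, int]:
    """Combine shard-level partitions into one global partition (hypothetical).

    shard_labels[s] maps every item in shard s (its own data plus all anchors)
    to a shard-local cluster label. Clusters in different shards that contain
    the same anchor are linked, and links are closed transitively.
    """
    uf = UnionFind()
    # Register each shard-local cluster, so anchor-free clusters stay separate.
    for s, labels in enumerate(shard_labels):
        for lab in labels.values():
            uf.find((s, lab))
    # For each anchor, link the clusters that contain it in every shard.
    for a in anchors:
        nodes = [(s, labels[a]) for s, labels in enumerate(shard_labels)]
        for node in nodes[1:]:
            uf.union(nodes[0], node)
    # Relabel merged clusters as 0, 1, 2, ... in order of first appearance.
    root_ids: Dict[Node, int] = {}
    result: Dict[Hashable, int] = {}
    for s, labels in enumerate(shard_labels):
        for item, lab in labels.items():
            root = uf.find((s, lab))
            result[item] = root_ids.setdefault(root, len(root_ids))
    return result
```

For example, with anchors `["a1", "a2"]`, shard 0 labelled `{"a1": 0, "a2": 1, "x1": 0, "x2": 1, "x3": 2}` and shard 1 labelled `{"a1": "p", "a2": "q", "y1": "q", "y2": "p"}`, the sketch places `x1` and `y2` with `a1`, `x2` and `y1` with `a2`, and keeps `x3` in its own cluster. Because links are closed transitively, a single shard that puts two anchors together merges their clusters everywhere; a practical method would need a more careful rule for such disagreements, and feature allocations would need a different matching step.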
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Statistical Methods and Inference
