Consensus Monte Carlo for Random Subsets using Shared Anchors

Yang Ni; Yuan Ji; Peter Mueller

arXiv:1906.12309 · stat.CO · February 26, 2020 · 1 citation

TL;DR

This paper introduces a consensus Monte Carlo algorithm that scales Bayesian nonparametric models for clustering and feature allocation to large datasets, under essentially any sampling model.

Contribution

The authors develop a versatile consensus Monte Carlo method applicable to any prior on random subsets, extending Bayesian nonparametric inference to big data scenarios.

Findings

01 Demonstrated on three datasets: MNIST images, pancreatic cancer mutations, and a large set of electronic health records (EHR).

02 Accurate inference demonstrated through simulation studies.

03 Applicable to diverse models, such as Dirichlet process mixtures for clustering and Indian buffet process priors for feature allocation.

Abstract

We present a consensus Monte Carlo algorithm that scales existing Bayesian nonparametric models for clustering and feature allocation to big data. The algorithm is valid for any prior on random subsets such as partitions and latent feature allocation, under essentially any sampling model. Motivated by three case studies, we focus on clustering induced by a Dirichlet process mixture sampling model, inference under an Indian buffet process prior with a binomial sampling model, and with a categorical sampling model. We assess the proposed algorithm with simulation studies and show results for inference with three datasets: an MNIST image dataset, a dataset of pancreatic cancer mutations, and a large set of electronic health records (EHR). Supplementary materials for this article are available online.
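The abstract does not spell out the mechanics of the algorithm. As a rough illustration only, the sketch below assumes (from the title) that every data shard contains the same small set of anchor observations, and that random subsets found on different shards are merged when they share an anchor. The function names, the toy gap-based clustering that stands in for per-shard posterior inference, and the union-find merge rule are all assumptions made for this example, not the authors' procedure.

```python
from collections import defaultdict


def split_with_anchors(n_items, anchors, n_shards):
    """Distribute non-anchor items round-robin; copy the anchors into every shard."""
    anchor_set = set(anchors)
    rest = [i for i in range(n_items) if i not in anchor_set]
    return [list(anchors) + rest[s::n_shards] for s in range(n_shards)]


def toy_cluster(values, indices, gap=1.0):
    """Hypothetical stand-in for per-shard inference: cut sorted values at large gaps."""
    order = sorted(indices, key=lambda i: values[i])
    clusters, current = [], [order[0]]
    for prev, nxt in zip(order, order[1:]):
        if values[nxt] - values[prev] > gap:
            clusters.append(current)
            current = []
        current.append(nxt)
    clusters.append(current)
    return clusters


def merge_by_anchors(shard_clusters, anchors):
    """Merge clusters from different shards that contain a common anchor (assumed rule)."""
    anchor_set = set(anchors)
    flat = [set(c) for clusters in shard_clusters for c in clusters]
    parent = list(range(len(flat)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    owner = {}  # anchor index -> id of the first cluster seen containing it
    for cid, members in enumerate(flat):
        for a in members & anchor_set:
            if a in owner:
                parent[find(cid)] = find(owner[a])
            else:
                owner[a] = cid

    merged = defaultdict(set)
    for cid, members in enumerate(flat):
        merged[find(cid)] |= members
    return sorted((sorted(m) for m in merged.values()), key=lambda m: m[0])


# Toy data: items 0-9 lie near 0, items 10-19 lie near 10.
values = [0.01 * i if i < 10 else 10 + 0.01 * i for i in range(20)]
anchors = [0, 10]  # one anchor per group, chosen by hand for the demo
shards = split_with_anchors(len(values), anchors, n_shards=2)
consensus = merge_by_anchors([toy_cluster(values, s) for s in shards], anchors)
print(consensus)
```

In this sketch a cluster that contains no anchor is never merged with anything, so the choice and number of anchors matters; how the paper handles that case is not stated in the abstract.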

Taxonomy

Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Statistical Methods and Inference