Scalable Bayesian Nonparametric Clustering and Classification
Yang Ni, Peter Müller, Maurice Diesendruck, Sinead Williamson, Yitan Zhu, Yuan Ji

TL;DR
This paper introduces a scalable, parallel Monte Carlo algorithm for Bayesian nonparametric clustering and classification, enabling efficient analysis of large datasets with applications to health records and telemarketing data.
Contribution
It presents a general, simple, and parallelizable inference method for a broad class of Bayesian nonparametric models, applicable to large-scale data.
Findings
Identified meaningful clusters in health records and telemarketing data.
Achieved competitive classification accuracy compared to existing methods.
Demonstrated the method's scalability and flexibility for large datasets.
Abstract
We develop a scalable multi-step Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is "embarrassingly parallel" and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach make inference for a wide range of Bayesian nonparametric mixture models applicable to large datasets. Specifically, we apply the approach to inference under a product partition model with regression on covariates. We show results for inference with two motivating data sets: a large set of electronic health records (EHR) and a bank telemarketing dataset. We find interesting clusters and favorable classification performance relative to other widely used competing classifiers.
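To make the multi-step idea concrete, here is a minimal Python sketch of one way such a scheme could look. It is not the authors' implementation. It assumes a toy Dirichlet process mixture of 1-D Gaussians with known variance in place of the product partition model with covariates, keeps only the final Gibbs state rather than a posterior summary, and uses a plain Chinese restaurant prior over units in both steps. All function names and parameters (`log_marg`, `gibbs_cluster`, `two_step_cluster`, `alpha`, `sigma2`, `tau2`) are hypothetical. The point it illustrates is that a sampler written in terms of sufficient statistics (count, sum) can be reused unchanged: first on single observations within each shard, then on the resulting local clusters.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def log_marg(n, s, sigma2, tau2):
    # Log marginal likelihood of n points with sum s under
    # x ~ N(mu, sigma2), mu ~ N(0, tau2), dropping terms that cancel
    # when comparing clusters for the same unit.
    d = sigma2 + n * tau2
    return 0.5 * math.log(sigma2 / d) + tau2 * s * s / (2.0 * sigma2 * d)


def gibbs_cluster(units, alpha=1.0, sigma2=1.0, tau2=100.0, sweeps=30, seed=0):
    # Collapsed Gibbs sampler over "units", each a (count, sum) pair.
    # Assumption: CRP prior over units; returns the last sampled labels.
    rng = random.Random(seed)
    labels = [0] * len(units)
    # cluster id -> [number of units, number of points, sum of points]
    stats = {0: [len(units), sum(n for n, _ in units), sum(s for _, s in units)]}
    next_id = 1
    for _ in range(sweeps):
        for i, (n, s) in enumerate(units):
            st = stats[labels[i]]  # remove unit i from its cluster
            st[0] -= 1
            st[1] -= n
            st[2] -= s
            if st[0] == 0:
                del stats[labels[i]]
            ids = list(stats)
            logw = [math.log(stats[c][0])
                    + log_marg(stats[c][1] + n, stats[c][2] + s, sigma2, tau2)
                    - log_marg(stats[c][1], stats[c][2], sigma2, tau2)
                    for c in ids]
            ids.append(next_id)  # option: open a new cluster
            logw.append(math.log(alpha) + log_marg(n, s, sigma2, tau2))
            top = max(logw)
            k = rng.choices(ids, weights=[math.exp(v - top) for v in logw])[0]
            if k == next_id:
                stats[k] = [0, 0, 0.0]
                next_id += 1
            st = stats[k]  # add unit i to the chosen cluster
            st[0] += 1
            st[1] += n
            st[2] += s
            labels[i] = k
    return labels


def two_step_cluster(data, n_shards=4, seed=0):
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    shards = [idx[j::n_shards] for j in range(n_shards)]

    def local(job):
        j, shard = job
        return gibbs_cluster([(1, data[i]) for i in shard], seed=seed + 1 + j)

    # Step 1: cluster each shard independently. Threads keep the sketch
    # self-contained; separate processes or machines would be used in practice.
    with ThreadPoolExecutor() as pool:
        local_labels = list(pool.map(local, enumerate(shards)))

    # Step 2: summarize each local cluster as (count, sum) and cluster
    # the summaries with the same sampler.
    agg = {}
    for j, (shard, labs) in enumerate(zip(shards, local_labels)):
        for i, k in zip(shard, labs):
            a = agg.setdefault((j, k), [0, 0.0])
            a[0] += 1
            a[1] += data[i]
    owners = sorted(agg)
    merged = gibbs_cluster([tuple(agg[o]) for o in owners], seed=seed + 100)
    lookup = dict(zip(owners, merged))

    final = [None] * len(data)
    for j, (shard, labs) in enumerate(zip(shards, local_labels)):
        for i, k in zip(shard, labs):
            final[i] = lookup[(j, k)]
    return final
```

Calling `two_step_cluster(data)` on a list of floats returns one global cluster label per observation. With more than two levels, the second step could itself be sharded and repeated, which is presumably what "multi-step" allows for; the sketch stops at two for brevity.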
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Algorithms and Data Compression
