Measuring Cluster Stability for Bayesian Nonparametrics Using the Linear Bootstrap
Ryan Giordano, Runjing Liu, Nelle Varoquaux, Michael I. Jordan, Tamara Broderick

TL;DR
This paper introduces a fast, automatic linear bootstrap method for assessing the stability of Bayesian nonparametric clustering, especially for complex models where the traditional bootstrap is computationally expensive.
Contribution
The authors propose a novel linear bootstrap approach for efficiently estimating cluster stability in Bayesian nonparametric models, using automatic differentiation so that the approximation can be computed automatically.
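The following is a rough illustration of the idea. It assumes that the linear bootstrap treats each bootstrap resample as a reweighting of the data. Each re-optimization is then replaced by a first-order Taylor expansion of the optimum in the data weights, with the derivative supplied by the implicit function theorem. The toy objective (a robust location estimate that minimizes a weighted sum of log cosh losses) and all function names are hypothetical stand-ins for a variational BNP objective. The derivatives here are written by hand rather than obtained by automatic differentiation.

```python
import math
import random

# Hypothetical toy problem standing in for a variational objective:
#   theta_hat(w) = argmin_theta sum_n w_n * log(cosh(theta - x_n)),
# where w = (1, ..., 1) gives the original fit and bootstrap resamples are
# represented as integer weights (how often each point was drawn).


def weighted_grad(theta, x, w):
    """d/dtheta of sum_n w_n * log(cosh(theta - x_n))."""
    return sum(wn * math.tanh(theta - xn) for xn, wn in zip(x, w))


def fit(x, w, iters=60):
    """Exact weighted fit. The gradient is increasing in theta, negative at
    min(x) - 1 and positive at max(x) + 1, so bisect for its root."""
    lo, hi = min(x) - 1.0, max(x) + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if weighted_grad(mid, x, w) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def sensitivities(x, theta_hat):
    """d theta_hat / d w_n at w = 1 via the implicit function theorem:
    -(d grad / d w_n) / (d grad / d theta)."""
    hess = sum(1.0 / math.cosh(theta_hat - xn) ** 2 for xn in x)
    return [-math.tanh(theta_hat - xn) / hess for xn in x]


def linear_bootstrap(theta_hat, sens, w):
    """First-order Taylor expansion of theta_hat(w) around w = 1."""
    return theta_hat + sum(s * (wn - 1.0) for s, wn in zip(sens, w))


def bootstrap_weights(n, rng):
    """Nonparametric bootstrap as reweighting: multinomial counts of n draws."""
    counts = [0] * n
    for i in rng.choices(range(n), k=n):
        counts[i] += 1
    return counts


def demo(n=200, n_boot=100, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    theta_hat = fit(x, [1.0] * n)
    sens = sensitivities(x, theta_hat)  # computed once, reused for every draw
    exact, linear = [], []
    for _ in range(n_boot):
        w = bootstrap_weights(n, rng)
        exact.append(fit(x, w))  # one full re-optimization per draw
        linear.append(linear_bootstrap(theta_hat, sens, w))  # one weighted sum per draw
    return exact, linear


exact, linear = demo()
print(max(abs(e - l) for e, l in zip(exact, linear)))
```

The sensitivities are computed once at the original fit, so each additional bootstrap draw costs a weighted sum instead of a full optimization. The printed gap between exact and linearized estimates should be small when each resample perturbs the fit only slightly.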
Findings
The linear bootstrap provides a fast approximation to the traditional bootstrap.
The method is demonstrated on gene expression time course data.
Automatic differentiation lets the stability estimates be computed without hand-derived derivatives.
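A tiny forward-mode implementation based on dual numbers can sketch the role of automatic differentiation. The example uses a hypothetical toy objective, a weighted sum of log cosh losses (an assumption, not the paper's model). Under that objective, a linearized bootstrap estimate needs only two Jacobian-vector products of the objective's gradient: one with a unit tangent on the parameter and one with tangent w - 1 on the weights. The `Dual` class and helper names are illustrative. In practice, a library such as JAX (`jax.jvp`) would supply these products for an entire model.

```python
import math


class Dual:
    """Forward-mode AD number: a value plus one tangent (directional derivative)."""

    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val - other.val, self.tan - other.tan)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val, self.tan * other.val + self.val * other.tan)

    __rmul__ = __mul__


def dtanh(d):
    """tanh with its chain rule: d tanh(u) = (1 - tanh(u)^2) du."""
    t = math.tanh(d.val)
    return Dual(t, (1.0 - t * t) * d.tan)


def grad_objective(theta, x, w):
    """Gradient in theta of sum_n w_n * log(cosh(theta - x_n)), on Dual inputs."""
    total = Dual(0.0)
    for xn, wn in zip(x, w):
        total = total + wn * dtanh(theta - xn)
    return total


def linear_bootstrap_ad(theta_hat, x, w):
    """theta_hat plus the derivative of theta_hat(1 + t * dw) at t = 0, dw = w - 1.
    By the implicit function theorem this is theta_hat - jvp_weights / jvp_theta."""
    ones = [Dual(1.0) for _ in x]
    # Tangent 1 on theta, none on the weights: d grad / d theta.
    h = grad_objective(Dual(theta_hat, 1.0), x, ones).tan
    # Tangent (w_n - 1) on the weights, none on theta: cross derivative times dw.
    dw = [Dual(1.0, wn - 1.0) for wn in w]
    cross = grad_objective(Dual(theta_hat, 0.0), x, dw).tan
    return theta_hat - cross / h


print(linear_bootstrap_ad(0.3, [-1.0, 0.5, 2.0], [2, 0, 1]))
```

The only derivative written by hand is the rule for `tanh`. Everything else follows from the arithmetic rules on `Dual`, which is the sense in which the computation becomes automatic.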
Abstract
Clustering procedures typically estimate which data points are clustered together, a quantity of primary importance in many analyses. Because clustering is often used as a preliminary step for dimensionality reduction or to facilitate interpretation, finding robust and stable clusters is often crucial for appropriate downstream analysis. In the present work, we consider Bayesian nonparametric (BNP) models, a particularly popular set of Bayesian models for clustering due to their flexibility. Because of its complexity, the Bayesian posterior often cannot be computed exactly, and approximations must be employed. Mean-field variational Bayes forms a posterior approximation by solving an optimization problem and is widely used due to its speed. An exact BNP posterior might vary dramatically when presented with different data. As such, the stability and robustness of the clustering should be assessed. A popular…
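A co-clustering matrix is one way to make "which data points are clustered together" concrete under a mean-field approximation. The approximation treats the cluster assignments z_i and z_j as independent. The approximate probability that points i and j share a cluster is therefore the sum over clusters k of r_ik * r_jk, where r_ik = q(z_i = k). The sketch below computes this matrix and its elementwise spread across a set of bootstrap draws. The function names and the toy responsibilities are hypothetical. The spread is an assumed summary of stability, not necessarily the one used in the paper.

```python
import math


def co_clustering(resp):
    """resp[i][k] = q(z_i = k). With z_i and z_j independent under mean field,
    q(z_i = z_j) = sum_k resp[i][k] * resp[j][k] for i != j (and 1 on the diagonal)."""
    n = len(resp)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 1.0
            else:
                m[i][j] = sum(a * b for a, b in zip(resp[i], resp[j]))
    return m


def co_clustering_spread(resp_draws):
    """Elementwise (population) standard deviation of the co-clustering matrix
    across draws; large entries flag pairs whose co-assignment is unstable."""
    mats = [co_clustering(r) for r in resp_draws]
    n, b = len(mats[0]), len(mats)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            vals = [m[i][j] for m in mats]
            mean = sum(vals) / b
            out[i][j] = math.sqrt(sum((v - mean) ** 2 for v in vals) / b)
    return out


# Two hypothetical draws of responsibilities for 3 points and 2 clusters.
draws = [
    [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]],
    [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]],
]
spread = co_clustering_spread(draws)
# Pair (0, 1) flips between "together" and "apart" across draws; pair (0, 2) does not.
print(spread[0][1], spread[0][2])
```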
Taxonomy
Topics: Bayesian Methods and Mixture Models · Gene expression and cancer classification · Statistical Methods and Inference
