Common Failure Modes of Subcluster-based Sampling in Dirichlet Process Gaussian Mixture Models -- and a Deep-learning Solution
Vlad Winter, Or Dinari, Oren Freifeld

TL;DR
This paper analyzes failure modes of a subcluster-based sampling method in Dirichlet Process Gaussian Mixture Models and introduces deep learning-based solutions to improve clustering performance and convergence.
Contribution
It identifies limitations of random subcluster initialization and proposes deep learning-based methods as effective replacements to enhance sampler efficiency.
Findings
Deep learning-based initialization improves split proposals.
Sampler stability improves and convergence is faster.
The improved sampler often outperforms the original, randomly initialized one by a large margin.
Abstract
The Dirichlet Process Gaussian Mixture Model (DPGMM) is often used to cluster data when the number of clusters is unknown. One main DPGMM inference paradigm relies on sampling. Here we consider a known state-of-the-art sampler (proposed by Chang and Fisher III (2013) and improved by Dinari et al. (2019)), analyze its failure modes, and show how to improve it, often drastically. Concretely, in that sampler, whenever a new cluster is formed it is augmented with two subclusters whose labels are initialized at random. Upon their evolution, the subclusters serve to propose a split of the parent cluster. We show that the random initialization is often problematic and hurts the otherwise-effective sampler. Specifically, we demonstrate that this initialization tends to lead to poor split proposals and/or too many iterations before a desired split is accepted. This slows convergence and can damage…
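To make the random subcluster initialization described in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions and is not taken from the paper: the 1-D data, the helper `subcluster_mean_gap`, and the threshold-based "informed" initialization are all hypothetical stand-ins (the latter is not the deep-learning method the authors propose). It shows one plausible intuition for the failure mode: when labels are assigned at random, both subclusters look like the parent, so their means start out nearly identical.

```python
import random
import statistics


def subcluster_mean_gap(points, labels):
    """Absolute distance between the means of the two subclusters."""
    left = [x for x, z in zip(points, labels) if z == 0]
    right = [x for x, z in zip(points, labels) if z == 1]
    return abs(statistics.fmean(left) - statistics.fmean(right))


rng = random.Random(0)

# Hypothetical parent cluster that actually hides two components,
# centered at -5 and +5 (assumed toy data, not from the paper).
points = [rng.gauss(-5.0, 1.0) for _ in range(500)]
points += [rng.gauss(5.0, 1.0) for _ in range(500)]

# Random initialization: each point gets subcluster label 0 or 1
# with equal probability, independently of its location.
random_labels = [rng.randrange(2) for _ in points]

# Stand-in informed initialization (NOT the paper's method):
# split the parent at its own mean.
parent_mean = statistics.fmean(points)
informed_labels = [int(x > parent_mean) for x in points]

random_gap = subcluster_mean_gap(points, random_labels)
informed_gap = subcluster_mean_gap(points, informed_labels)

print(f"random init gap:   {random_gap:.2f}")
print(f"informed init gap: {informed_gap:.2f}")
```

In this toy setting the randomly labeled subclusters begin far from the split one would want, so a sampler would need additional iterations for them to drift apart, whereas any initialization that already reflects the data's structure starts close to a useful split proposal.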
Taxonomy
Topics: Bayesian Methods and Mixture Models · Gaussian Processes and Bayesian Inference · Machine Learning in Healthcare
