Relabelling Algorithms for Large Dataset Mixture Models
Wanchuang Zhu, Yanan Fan

TL;DR
This paper reviews existing label switching solutions that scale to large datasets in mixture models, proposes a new computationally efficient algorithm based on a loss function interpretation, and compares the methods on simulated and real data.
Contribution
It proposes a novel, computationally efficient relabelling algorithm that scales to large datasets, and compares it with existing methods.
Findings
The new algorithm performs well on large datasets.
Many existing loss-function-based methods are too slow in practice and can become infeasible as the size and dimension of the data grow.
The paper provides practical recommendations for large-scale mixture modeling.
Abstract
Mixture models are flexible tools in density estimation and classification problems. Bayesian estimation of such models typically relies on sampling from the posterior distribution using Markov chain Monte Carlo. Label switching arises because the posterior is invariant to permutations of the component parameters. Methods for dealing with label switching have been studied fairly extensively in the literature, with the most popular approaches being those based on loss functions. However, many of these algorithms turn out to be too slow in practice, and can be infeasible as the size and dimension of the data grow. In this article, we review earlier solutions which can scale up well for large data sets, and compare their performances on simulated and real datasets. In addition, we propose a new, and computationally efficient algorithm based on a loss function interpretation, and show that…
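To make the label switching problem concrete, the following is a minimal sketch of loss-based relabelling in general, not the algorithm proposed in the paper. It assumes a one-dimensional mixture, a hypothetical `reference` vector of component means (for example from a pilot run), and a squared-error loss; the function name `relabel_to_reference` and the toy draws are illustrative only.

```python
from itertools import permutations


def relabel_to_reference(draws, reference):
    """Permute each draw's components to minimise squared distance to `reference`.

    draws: list of MCMC draws, each a list of K component means (floats).
    reference: list of K reference means (an assumption of this sketch).
    Returns (relabelled_draws, chosen_permutations).
    """
    k = len(reference)
    relabelled, chosen = [], []
    for draw in draws:
        best_perm, best_loss = None, float("inf")
        # Brute force over all K! permutations; only sensible for small K.
        for perm in permutations(range(k)):
            loss = sum((draw[perm[j]] - reference[j]) ** 2 for j in range(k))
            if loss < best_loss:
                best_perm, best_loss = perm, loss
        # Position j of the relabelled draw holds original component perm[j].
        relabelled.append([draw[i] for i in best_perm])
        chosen.append(best_perm)
    return relabelled, chosen


# Toy draws of two component means; the second draw has its labels switched.
draws = [[-2.1, 3.0], [2.9, -1.8], [-1.9, 3.2]]
reference = [-2.0, 3.0]
relabelled, perms = relabel_to_reference(draws, reference)
```

The exhaustive search over K! permutations illustrates why such methods can become slow as the number of components grows. For larger K, the same per-draw minimisation can be posed as a linear assignment problem and solved with, for instance, SciPy's `scipy.optimize.linear_sum_assignment`.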
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Algorithms and Data Compression
