Informed Asymmetric Dirichlet Priors for Multivariate Bernoulli Mixture Models
Luisa Ferrari, Maria Franco Villoria, Garritt L. Page, and Alex Laini

TL;DR
This paper introduces a Bayesian clustering method for multivariate binary data that combines an asymmetric Dirichlet prior with an efficient MCMC algorithm. The aim is to deliver both computational efficiency and full posterior inference rather than trading one for the other.
Contribution
It proposes a novel Bayesian approach that fixes the number of mixture components to a large value and places an asymmetric Dirichlet prior on the mixture weights. This is intended to improve interpretability and efficiency when clustering binary data.
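An asymmetric Dirichlet prior on a fixed, large number of components K induces a prior on the number of occupied clusters K+. The Monte Carlo sketch below estimates that induced prior. It repeatedly draws weights w ~ Dirichlet(alpha), allocates n units with those weights, and counts the distinct labels.

The sketch rests on several assumptions. It uses n = 100, K = 20, and a hypothetical decreasing vector alpha_k = 1/k, chosen only for illustration. These are not the Penalized Complexity elicited hyperparameters from the paper. The function names are also made up for this example.

```python
import random
from collections import Counter


def sample_dirichlet(alpha, rng):
    """Draw one weight vector from Dirichlet(alpha) by normalising independent Gamma(alpha_k, 1) draws."""
    while True:
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(g)
        if total > 0:  # guard against the (very unlikely) underflow of every draw
            return [x / total for x in g]


def induced_cluster_prior(alpha, n, n_sims=2000, seed=0):
    """Monte Carlo estimate of the prior on the number of occupied clusters K+
    when n units are allocated independently with weights w ~ Dirichlet(alpha)."""
    rng = random.Random(seed)
    K = len(alpha)
    counts = Counter()
    for _ in range(n_sims):
        w = sample_dirichlet(alpha, rng)
        labels = rng.choices(range(K), weights=w, k=n)
        counts[len(set(labels))] += 1  # number of distinct (occupied) components
    return {k: c / n_sims for k, c in sorted(counts.items())}


# Hypothetical decreasing hyperparameters alpha_k = 1/k on K = 20 components
# (illustrative only, NOT the PC-elicited values from the paper).
K = 20
alpha_asym = [1.0 / (k + 1) for k in range(K)]
prior = induced_cluster_prior(alpha_asym, n=100)
for k, p in prior.items():
    print(f"K+ = {k:2d}: {p:.3f}")
```

With these weights the first components carry most of the prior mass. As a result, typically only a subset of the 20 components is occupied. By contrast, a symmetric Dirichlet with large entries would typically fill almost all of them.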
Findings
The method is competitive with existing clustering algorithms.
It can outperform alternatives in specific scenarios.
Demonstrated on ecological presence-absence data.
Abstract
Clustering multivariate binary data is of interest in many scientific fields, including ecology, biomedicine, and social policy. Beyond heuristic clustering algorithms, such data can be modelled using multivariate Bernoulli mixture models. Many Bayesian implementations of these models involve a trade-off between computational efficiency and full posterior inference. We propose instead a Bayesian approach able to provide both aspects. The method fixes the total number of components to a large value and employs an asymmetric Dirichlet prior on the mixture weights. The asymmetric Dirichlet hyperparameters are elicited using the popular Penalized Complexity prior framework, which provides an intuitive way for users to inform the induced distribution of the number of clusters. An efficient MCMC algorithm is then developed to fit the model. Simulations and real-world applications demonstrate…
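The abstract mentions an efficient MCMC algorithm but does not describe it here. For orientation, the following is a generic conjugate Gibbs sampler for a multivariate Bernoulli mixture with a fixed number of components and a Dirichlet prior on the weights. It is a standard textbook scheme, not the authors' algorithm.

It makes three assumptions: a Beta(a, b) prior on each item probability, random initial allocations, and the hypothetical function name `gibbs_bernoulli_mixture`.

```python
import math
import random
from collections import Counter


def gibbs_bernoulli_mixture(X, alpha, a=1.0, b=1.0, n_iter=200, seed=0):
    """Generic conjugate Gibbs sampler for a K-component multivariate Bernoulli mixture.

    X: list of n binary rows of length d; alpha: Dirichlet hyperparameters (length K);
    a, b: assumed Beta(a, b) prior on every item probability theta[k][j].
    Returns one allocation vector z per iteration."""
    rng = random.Random(seed)
    n, d, K = len(X), len(X[0]), len(alpha)
    z = [rng.randrange(K) for _ in range(n)]  # random initial allocations
    trace = []
    for _ in range(n_iter):
        # Sufficient statistics: cluster sizes and per-item counts of ones.
        sizes = [0] * K
        ones = [[0] * d for _ in range(K)]
        for xi, zi in zip(X, z):
            sizes[zi] += 1
            for j, xij in enumerate(xi):
                ones[zi][j] += xij
        # 1. w | z ~ Dirichlet(alpha + sizes), drawn as normalised Gamma variates.
        g = [rng.gammavariate(alpha[k] + sizes[k], 1.0) for k in range(K)]
        total = sum(g)
        log_w = [math.log(gk / total) if gk > 0 else -math.inf for gk in g]
        # 2. theta[k][j] | z, X ~ Beta(a + ones, b + sizes - ones), clamped away from 0 and 1.
        theta = [[min(max(rng.betavariate(a + ones[k][j], b + sizes[k] - ones[k][j]), 1e-12), 1 - 1e-12)
                  for j in range(d)] for k in range(K)]
        log_t = [[math.log(t) for t in row] for row in theta]
        log_1t = [[math.log(1.0 - t) for t in row] for row in theta]
        # 3. z_i | w, theta: categorical with log-probabilities
        #    log w_k + sum_j log Bernoulli(x_ij | theta_kj).
        for i, xi in enumerate(X):
            lp = [log_w[k] + sum(log_t[k][j] if xij else log_1t[k][j] for j, xij in enumerate(xi))
                  for k in range(K)]
            m = max(lp)
            probs = [math.exp(v - m) for v in lp]  # shift by the max for numerical stability
            z[i] = rng.choices(range(K), weights=probs)[0]
        trace.append(list(z))
    return trace


# Toy data: 30 rows of all ones and 30 rows of all zeros over 12 items.
X = [[1] * 12 for _ in range(30)] + [[0] * 12 for _ in range(30)]
alpha = [1.0 / (k + 1) for k in range(10)]  # hypothetical asymmetric hyperparameters, K = 10
trace = gibbs_bernoulli_mixture(X, alpha, n_iter=200)
final = trace[-1]
print("cluster sizes in last sweep:", Counter(final))
```

Because every full conditional is conjugate, each sweep involves only counting and direct sampling. Empty components simply draw their parameters from the prior.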
