A Model-based Semi-Supervised Clustering Methodology
Jordan Yoder, Carey E. Priebe

TL;DR
This paper extends model-based clustering to semi-supervised scenarios with pre-labeled data, using BIC for model selection, and demonstrates its effectiveness through simulations and biological data analysis.
Contribution
It introduces a BIC-based approach for selecting the number of clusters and relevant variables in semi-supervised model-based clustering.
Findings
Effective in simulation studies
Successfully applied to biological data
Improves clustering accuracy with partial labels
Abstract
We consider an extension of model-based clustering to the semi-supervised case, where some of the data are pre-labeled. We provide a derivation of the Bayesian Information Criterion (BIC) approximation to the Bayes factor in this setting. We then use the BIC to select the number of clusters and the variables useful for clustering. We demonstrate the efficacy of this adaptation of the model-based clustering paradigm through two simulation examples and a fly larvae behavioral dataset in which lines of neurons are clustered into behavioral groups.
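To make BIC-based selection of the number of clusters concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. It assumes one-dimensional Gaussian components, a likelihood in which each labeled point contributes its own component's weighted density while each unlabeled point contributes the full mixture density, and the "larger is better" convention BIC = 2 log L - p log n. The function names (`normal_pdf`, `fit_bic`), the initialization scheme, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_bic(x_lab, y_lab, x_unl, k, n_iter=200, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM and return its BIC.

    Labeled points keep fixed one-hot responsibilities. Labels must be
    0..c-1 with c <= k, and every labeled class needs at least one point.
    """
    xs = list(x_lab) + list(x_unl)
    n = len(xs)
    n_classes = max(y_lab) + 1
    grand_mean = sum(xs) / n
    init_var = sum((x - grand_mean) ** 2 for x in xs) / n

    # Initial means: labeled class means first, then (an assumed heuristic)
    # the unlabeled point farthest from all means chosen so far.
    means = []
    for c in range(n_classes):
        members = [x for x, y in zip(x_lab, y_lab) if y == c]
        means.append(sum(members) / len(members))
    while len(means) < k:
        means.append(max(x_unl, key=lambda x: min(abs(x - m) for m in means)))
    variances = [init_var] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: one-hot rows for labeled points, posteriors for unlabeled.
        resp = [[1.0 if j == y else 0.0 for j in range(k)] for y in y_lab]
        for x in x_unl:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j])
                    for j in range(k)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of proportions, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)

    # Semi-supervised log-likelihood (assumed form, see the text above).
    loglik = 0.0
    for x, y in zip(x_lab, y_lab):
        loglik += math.log(max(weights[y] * normal_pdf(x, means[y], variances[y]), 1e-300))
    for x in x_unl:
        mix = sum(weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k))
        loglik += math.log(max(mix, 1e-300))

    n_params = 3 * k - 1  # (k - 1) proportions + k means + k variances
    return 2.0 * loglik - n_params * math.log(n)


# Toy data: three well-separated groups, only two of which have labels.
random.seed(0)
centers = [-6.0, 0.0, 6.0]
x_lab, y_lab = [], []
for c in (0, 1):
    for _ in range(10):
        x_lab.append(random.gauss(centers[c], 1.0))
        y_lab.append(c)
x_unl = [random.gauss(m, 1.0) for m in centers for _ in range(80)]

# Score each candidate number of clusters and keep the highest BIC.
bics = {k: fit_bic(x_lab, y_lab, x_unl, k) for k in range(2, 6)}
best_k = max(bics, key=bics.get)
print(bics, best_k)
```

In this toy setup the labels pin the first two components to the known classes, so candidate models with more components can only use the extras to explain unlabeled structure, such as the third, unlabeled group. Selecting variables with the BIC would follow the same pattern, comparing models fitted on different variable subsets, but that is not sketched here.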
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Bayesian Modeling and Causal Inference
