Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering
Gilles Celeux, Marie-Laure Martin-Magniette, Cathy Maugis-Rabusseau, and Adrian E. Raftery

TL;DR
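This paper compares model selection and regularization methods for variable selection in clustering. The two approaches give comparable classification accuracy when the variables are conditionally independent given cluster membership, but model selection identifies the relevant variables more accurately, especially with correlated variables.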
Contribution
It provides a simulation-based comparison of the two approaches, highlighting the advantages of model selection both when the variables are conditionally independent given cluster membership and when they are correlated.
Findings
Model selection outperforms regularization in variable identification accuracy.
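Both methods improve clustering accuracy over K-means without variable selection, with substantial gains when the clusters are well separated and few gains when they are close together.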
Model selection is particularly better with correlated variables.
Abstract
We compare two major approaches to variable selection in clustering: model selection and regularization. Based on previous results, we select the method of Maugis et al. (2009b), which modified the method of Raftery and Dean (2006), as a current state-of-the-art model selection method. We select the method of Witten and Tibshirani (2010) as a current state-of-the-art regularization method. We compared the methods by simulation in terms of their accuracy in both classification and variable selection. In the first simulation experiment all the variables were conditionally independent given cluster membership. We found that variable selection (of either kind) yielded substantial gains in classification accuracy when the clusters were well separated, but few gains when the clusters were close together. We found that the two variable selection methods had comparable classification accuracy,…
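The first simulation setting described in the abstract can be illustrated with a small sketch. This is not the authors' code and implements neither the Maugis et al. (2009b) nor the Witten and Tibshirani (2010) method: it only generates two Gaussian clusters whose variables are independent given cluster membership, then compares plain K-means on all variables with K-means restricted to the truly informative variables (an "oracle" selection). Every name and parameter here (`simulate`, `two_means`, `compare`, `delta`, the number of noise variables) is a hypothetical choice made for illustration.

```python
import random

def simulate(n_per_cluster, n_informative, n_noise, delta, seed=0):
    """Two Gaussian clusters; variables are independent given the cluster label.

    Informative variables have mean 0 in cluster 0 and mean `delta` in cluster 1.
    Noise variables are N(0, 1) in both clusters. All values are assumptions.
    """
    rng = random.Random(seed)
    data, labels = [], []
    for k in (0, 1):
        for _ in range(n_per_cluster):
            informative = [rng.gauss(k * delta, 1.0) for _ in range(n_informative)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            data.append(informative + noise)
            labels.append(k)
    return data, labels

def sq_dist(a, b):
    """Squared Euclidean distance between two rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(data, n_iter=50):
    """Plain Lloyd's algorithm with K = 2 and a deterministic farthest-point start."""
    centers = [data[0], max(data, key=lambda row: sq_dist(row, data[0]))]
    assign = [0] * len(data)
    for _ in range(n_iter):
        # Assignment step: each row goes to its nearest center.
        assign = [min((0, 1), key=lambda k: sq_dist(row, centers[k])) for row in data]
        # Update step: recompute each center as the mean of its members.
        for k in (0, 1):
            members = [row for row, a in zip(data, assign) if a == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def accuracy(assign, labels):
    """Classification accuracy, taking the better of the two label matchings."""
    agree = sum(a == l for a, l in zip(assign, labels))
    return max(agree, len(labels) - agree) / len(labels)

def compare(delta, n_informative=2, n_noise=20, n_per_cluster=100, seed=0):
    """Return (accuracy using all variables, accuracy using informative variables only)."""
    data, labels = simulate(n_per_cluster, n_informative, n_noise, delta, seed)
    acc_all = accuracy(two_means(data), labels)
    acc_informative = accuracy(two_means([row[:n_informative] for row in data]), labels)
    return acc_all, acc_informative

if __name__ == "__main__":
    for delta in (4.0, 0.5):  # well-separated vs. close clusters
        print(delta, compare(delta))
```

Calling `compare` with a large and a small `delta` gives a rough feel for how the benefit of discarding noise variables depends on cluster separation. Because the informative variables are supplied by hand, the sketch says nothing about how well either method from the paper recovers them.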
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Data Mining Algorithms and Applications
