clustvarsel: A Package Implementing Variable Selection for Model-based Clustering in R
Luca Scrucca (Università degli Studi di Perugia), Adrian E. Raftery (University of Washington)

TL;DR
The paper introduces the R package clustvarsel, which implements variable selection methods for model-based clustering using Gaussian mixture models, improving interpretability and efficiency.
Contribution
It presents an improved methodology and an R package for variable selection in clustering, including algorithms and parallel implementation.
Findings
Effective variable subset selection enhances clustering accuracy.
The package provides computationally efficient algorithms.
Application examples demonstrate practical utility.
Abstract
Finite mixture modelling provides a framework for cluster analysis based on parsimonious Gaussian mixture models. Variable or feature selection is of particular importance in situations where only a subset of the available variables provide clustering information. This enables the selection of a more parsimonious model, yielding more efficient estimates, a clearer interpretation and, often, improved clustering partitions. This paper describes the R package clustvarsel which performs subset selection for model-based clustering. An improved version of the methodology of Raftery and Dean (2006) is implemented in the new version 2 of the package to find the (locally) optimal subset of variables with group/cluster information in a dataset. Search over the solution space is performed using either a stepwise greedy search or a headlong algorithm. Adjustments for speeding up these algorithms…
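The two search strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the package's R implementation. The function names, the `gain` callback, and the toy scores are all hypothetical. In the actual method the criterion is a BIC difference between competing Gaussian mixture models; here a made-up additive score stands in so the control flow can be run without fitting any model. The headlong version is simplified to inclusion steps only.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical criterion: evidence for adding variable v to the set `selected`.
# In the real method this role is played by a BIC difference; here it is a stand-in.
Gain = Callable[[str, FrozenSet[str]], float]


def greedy_stepwise(variables: Sequence[str], gain: Gain,
                    threshold: float = 0.0, max_iter: int = 100) -> List[str]:
    """Alternate inclusion and removal steps, always evaluating every candidate."""
    selected: List[str] = []
    remaining = list(variables)
    for _ in range(max_iter):
        changed = False
        # Inclusion step: propose the candidate with the largest gain.
        if remaining:
            current = frozenset(selected)
            best = max(remaining, key=lambda v: gain(v, current))
            if gain(best, current) > threshold:
                selected.append(best)
                remaining.remove(best)
                changed = True
        # Removal step: propose the selected variable with the weakest gain
        # relative to the other selected variables.
        if len(selected) > 1:
            current = frozenset(selected)
            worst = min(selected, key=lambda v: gain(v, current - {v}))
            if gain(worst, current - {worst}) <= threshold:
                selected.remove(worst)
                remaining.append(worst)
                changed = True
        if not changed:
            break
    return selected


def headlong_forward(variables: Sequence[str], gain: Gain,
                     upper: float = 0.0, lower: float = -10.0) -> List[str]:
    """Accept the first candidate whose gain exceeds `upper`; discard for good
    any candidate whose gain falls below `lower` (inclusion steps only)."""
    selected: List[str] = []
    candidates = list(variables)
    progress = True
    while progress and candidates:
        progress = False
        for v in list(candidates):
            g = gain(v, frozenset(selected))
            if g > upper:
                selected.append(v)
                candidates.remove(v)
                progress = True
                break  # restart the scan with the enlarged set
            if g < lower:
                candidates.remove(v)  # never reconsidered
    return selected


# Toy scores (made up): two informative variables and one noise variable.
SIGNAL = {"x1": 5.0, "x2": 3.0, "noise": -2.0}


def toy_gain(v: str, selected: FrozenSet[str]) -> float:
    # Each variable's score shrinks by 1 for every variable already selected.
    return SIGNAL[v] - 1.0 * len(selected)


print(greedy_stepwise(["noise", "x1", "x2"], toy_gain))
print(headlong_forward(["noise", "x2", "x1"], toy_gain))
```

The greedy search evaluates every candidate before each move. The headlong search stops scanning at the first acceptable candidate, which costs fewer evaluations but makes the result depend on the order in which variables are visited.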
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Statistical Methods and Bayesian Inference
