Variable selection for model-based clustering using the integrated complete-data likelihood
Marbac Matthieu, Sedki Mohammed

TL;DR
This paper introduces a new model selection criterion, based on the integrated complete-data likelihood, for variable selection in model-based clustering. The criterion can be evaluated without estimating any model parameters.
Contribution
It proposes a new, computationally efficient information criterion for variable selection that avoids parameter estimation and improves over classical methods.
Findings
Outperforms classical variable selection methods on simulated data
Efficient model selection without parameter estimation
Applicable to Gaussian mixture models with independence assumptions
Abstract
Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which trade off clustering accuracy against the number of selected variables by using a lasso-type penalty. However, the calibration of the penalty term is open to criticism. Model selection methods are an efficient alternative, yet they require a difficult optimization of an information criterion, which involves combinatorial problems. First, most of these optimization algorithms are based on a suboptimal procedure (e.g. a stepwise method). Second, the algorithms are often computationally expensive because they require multiple calls to the EM algorithm. Here we propose to use a new information criterion based on the integrated complete-data likelihood. It does not require any parameter estimation, and its maximization is simple and computationally efficient. The original contribution of…
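To make the "no parameter estimation" idea concrete, here is a minimal sketch of how a variable could be scored once a partition is held fixed. It is not the authors' implementation. It assumes a Gaussian mixture with within-cluster independence and a conjugate Normal-Inverse-Gamma prior on each mean and variance, so the parameters integrate out in closed form and each variable can be judged on its own. The function names `log_marginal_gaussian` and `select_variables`, and the hyperparameters `m0`, `k0`, `a0`, `b0`, are hypothetical choices for illustration; the paper's exact prior and its search over partitions are not reproduced here.

```python
import math


def log_marginal_gaussian(x, m0=0.0, k0=1.0, a0=1.0, b0=1.0):
    """Log marginal likelihood of an i.i.d. univariate Gaussian sample.

    Assumed prior (illustrative, not taken from the paper):
    mu | sigma^2 ~ N(m0, sigma^2 / k0), sigma^2 ~ Inverse-Gamma(a0, b0).
    The mean and variance are integrated out, so nothing is estimated.
    """
    n = len(x)
    if n == 0:
        return 0.0  # an empty cluster contributes nothing
    xbar = sum(x) / n
    ss = sum((v - xbar) ** 2 for v in x)
    kn = k0 + n
    an = a0 + n / 2.0
    bn = b0 + 0.5 * ss + k0 * n * (xbar - m0) ** 2 / (2.0 * kn)
    return (
        -0.5 * n * math.log(2.0 * math.pi)
        + 0.5 * (math.log(k0) - math.log(kn))
        + a0 * math.log(b0)
        - an * math.log(bn)
        + math.lgamma(an)
        - math.lgamma(a0)
    )


def select_variables(data, labels):
    """Return indices of variables judged relevant for a FIXED partition.

    A variable is kept when integrating it cluster by cluster gives a
    higher log marginal likelihood than integrating it as one sample.
    """
    n_vars = len(data[0])
    clusters = sorted(set(labels))
    selected = []
    for j in range(n_vars):
        column = [row[j] for row in data]
        # Irrelevant role: one Gaussian shared by all observations.
        common = log_marginal_gaussian(column)
        # Relevant role: one Gaussian per cluster of the partition.
        split = sum(
            log_marginal_gaussian([v for v, z in zip(column, labels) if z == k])
            for k in clusters
        )
        if split > common:
            selected.append(j)
    return selected
```

Calling `select_variables(data, labels)` with `data` as a list of rows and `labels` as one cluster label per row returns the indices of the variables kept. The sketch covers only the scoring step for a given partition; choosing the partition itself is a separate optimization that is not shown.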
