Finite mixture regression: A sparse variable selection by model selection for clustering
Emilie Devijver (LM-Orsay)

TL;DR
This paper introduces a method for high-dimensional Gaussian mixture regression: an ℓ1-penalized maximum likelihood estimator selects the relevant variables, a maximum likelihood estimator is then refit on those variables, and an oracle inequality provides the theoretical guarantee.
Contribution
It develops a variable selection procedure for high-dimensional mixture regression models based on ℓ1-penalized maximum likelihood, and proves an oracle inequality for the resulting estimator.
Findings
Oracle inequality for the estimator with Jensen-Kullback-Leibler loss
Derivation of penalty shape based on model complexity
Effective variable selection in high-dimensional settings
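For readers unfamiliar with the loss named in the first finding, the Jensen-Kullback-Leibler divergence is usually defined as below. The summary itself does not spell the definition out, so the symbols ρ, s and t are notation assumed here for illustration; for conditional densities the quantity is typically averaged over the observed covariates.

```latex
% Assumed notation: s is the true density, t a candidate density, \rho \in (0,1) a fixed constant.
\[
  \mathrm{KL}(s,t) = \int s \,\log\frac{s}{t},
  \qquad
  \mathrm{JKL}_{\rho}(s,t) = \frac{1}{\rho}\,\mathrm{KL}\bigl(s,\,(1-\rho)\,s + \rho\,t\bigr).
\]
% Since (1-\rho)s + \rho t \ge (1-\rho)s pointwise, the divergence is bounded:
\[
  0 \;\le\; \mathrm{JKL}_{\rho}(s,t) \;\le\; \frac{1}{\rho}\,\log\frac{1}{1-\rho}.
\]
```

Unlike the Kullback-Leibler divergence, this quantity stays finite even when t vanishes where s does not, which is one reason it is convenient as a loss in oracle inequalities.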
Abstract
We consider a finite mixture of Gaussian regression model for high-dimensional data, where the number of covariates may be much larger than the sample size. We propose to estimate the unknown conditional mixture density by a maximum likelihood estimator restricted to the relevant variables selected by an ℓ1-penalized maximum likelihood estimator. We get an oracle inequality satisfied by this estimator with a Jensen-Kullback-Leibler type loss. Our oracle inequality is deduced from a general model selection theorem for maximum likelihood estimators with a random model collection. We can derive the penalty shape of the criterion, which depends on the complexity of the random model collection.
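The select-then-refit idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes a univariate response, uses a simplified generalized EM scheme with coordinate-descent Lasso updates, and every name in it (`fit_mixture`, `select_then_refit`, `lam`, the simulated data) is hypothetical. The final step of the procedure, choosing among models built from several values of the regularization parameter with a penalized criterion, is left out because the abstract does not give the penalty explicitly.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso(X, y, w, thresh, beta, n_sweeps=5):
    """Coordinate descent on 0.5 * sum_i w_i (y_i - x_i @ beta)^2 + thresh * ||beta||_1."""
    beta = beta.copy()
    resid = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            xj = X[:, j]
            denom = np.sum(w * xj * xj)
            if denom == 0.0:
                continue
            resid += xj * beta[j]      # drop coordinate j from the current fit
            beta[j] = soft_threshold(np.sum(w * xj * resid), thresh) / denom
            resid -= xj * beta[j]      # put the updated coordinate back
    return beta

def fit_mixture(X, y, K, lam, cols=None, n_iter=30, seed=0):
    """Generalized EM for a K-component Gaussian mixture regression.

    The objective is -(1/n) log-likelihood + lam * sum_k ||beta_k||_1,
    restricted to the covariates in `cols` (all covariates if None).
    Assumption: a simplified scheme, without the reparametrization
    often used in the literature to make the M-step convex.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    cols = np.arange(p) if cols is None else np.asarray(cols, dtype=int)
    Xs = X[:, cols]
    resp = rng.dirichlet(np.ones(K), size=n)   # random soft assignments, shape (n, K)
    beta = np.zeros((K, cols.size))
    sigma2 = np.full(K, np.var(y))
    for _ in range(n_iter):
        # M-step: weighted Lasso for each beta_k, then closed-form variance update.
        for k in range(K):
            w = resp[:, k]
            beta[k] = weighted_lasso(Xs, y, w, n * lam * sigma2[k], beta[k])
            res = y - Xs @ beta[k]
            sigma2[k] = max(np.sum(w * res ** 2) / max(np.sum(w), 1e-12), 1e-6)
        pi = np.clip(resp.mean(axis=0), 1e-12, None)
        # E-step: posterior component probabilities, computed in log space.
        log_dens = (np.log(pi) - 0.5 * np.log(2 * np.pi * sigma2)
                    - 0.5 * (y[:, None] - Xs @ beta.T) ** 2 / sigma2)
        log_dens -= log_dens.max(axis=1, keepdims=True)
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)
    full = np.zeros((K, p))
    full[:, cols] = beta
    return pi, full, sigma2, resp

def select_then_refit(X, y, K, lam):
    """Step 1: l1-penalized fit to select variables. Step 2: unpenalized refit on them."""
    _, beta_l1, _, _ = fit_mixture(X, y, K, lam)
    support = np.flatnonzero(np.any(beta_l1 != 0.0, axis=0))
    pi, beta, sigma2, resp = fit_mixture(X, y, K, lam=0.0, cols=support)
    return support, pi, beta, sigma2, resp

# Simulated data (hypothetical): more covariates than samples, two components,
# each driven by a single covariate.
rng = np.random.default_rng(1)
n, p, K = 100, 120, 2
X = rng.standard_normal((n, p))
z = rng.integers(0, K, size=n)
y = np.where(z == 0, 3.0 * X[:, 0], -3.0 * X[:, 1]) + 0.5 * rng.standard_normal(n)

support, pi, beta, sigma2, resp = select_then_refit(X, y, K, lam=0.05)
print("selected variables:", support)
```

The refit in step 2 removes the shrinkage that the ℓ1 penalty imposes on the selected coefficients. The rows of `resp` give the posterior component probabilities, which is what makes the fitted model usable for clustering.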
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Statistical Methods and Bayesian Inference
