Finite mixture regression: A sparse variable selection by model selection for clustering

Emilie Devijver (LM-Orsay)

arXiv:1409.1331 · math.ST · September 5, 2014 · 5 cites

TL;DR

This paper introduces a method for high-dimensional Gaussian mixture regression that uses an ℓ1-penalized maximum likelihood estimator to select variables, refits by maximum likelihood on the selected variables, and proves an oracle inequality for the resulting estimator.

Contribution

It develops a new approach to variable selection in high-dimensional mixture regression models, based on ℓ1-penalized maximum likelihood and supported by an oracle inequality.

Findings

01. Oracle inequality for the estimator with Jensen-Kullback-Leibler loss

02. Derivation of penalty shape based on model complexity

03. Effective variable selection in high-dimensional settings

Abstract

We consider a finite mixture of Gaussian regression model for high-dimensional data, where the number of covariates may be much larger than the sample size. We propose to estimate the unknown conditional mixture density by a maximum likelihood estimator, restricted on relevant variables selected by an ℓ1-penalized maximum likelihood estimator. We get an oracle inequality satisfied by this estimator with a Jensen-Kullback-Leibler type loss. Our oracle inequality is deduced from a general model selection theorem for maximum likelihood estimators with a random model collection. We can derive the penalty shape of the criterion, which depends on the complexity of the random model collection.
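
For orientation, the objects named in the abstract can be written out roughly as follows. This is a sketch under assumed notation, not the paper's own: the number of components K, the weights \pi_k, the regression coefficients \beta_k, the covariances \Sigma_k, the regularization parameter \lambda, the model index m, the collection \mathcal{M}, and the penalty pen(m) are all labels chosen here for illustration. The paper may penalize a reparametrized form of the coefficients, and the Jensen-Kullback-Leibler definition given is the commonly used one, assumed rather than quoted.

```latex
% Assumed notation throughout; see the hedges in the paragraph above.

% Finite mixture of Gaussian regressions: conditional density of y given x.
\[
  s_\xi(y \mid x) \;=\; \sum_{k=1}^{K} \pi_k \,
  \mathcal{N}\!\left(y;\, \beta_k x,\, \Sigma_k\right),
  \qquad \xi = (\pi_k, \beta_k, \Sigma_k)_{1 \le k \le K}.
\]

% Step 1: l1-penalized maximum likelihood, used only to detect relevant variables.
\[
  \hat{\xi}^{\mathrm{Lasso}}(\lambda) \;\in\; \operatorname*{argmin}_{\xi}
  \left\{ -\frac{1}{n} \sum_{i=1}^{n} \log s_\xi(y_i \mid x_i)
          \;+\; \lambda \sum_{k=1}^{K} \lVert \beta_k \rVert_1 \right\}.
\]

% Step 2: unpenalized maximum likelihood restricted to the selected variables.
% S_m is the set of mixture densities whose coefficients vanish outside the
% variables selected in Step 1; the collection of such models is random
% because it depends on the data through Step 1.
\[
  \hat{s}_m \;\in\; \operatorname*{argmin}_{s \in S_m}
  \left\{ -\frac{1}{n} \sum_{i=1}^{n} \log s(y_i \mid x_i) \right\}.
\]

% Step 3: penalized model selection over the random collection.
\[
  \hat{m} \;\in\; \operatorname*{argmin}_{m \in \mathcal{M}}
  \left\{ -\frac{1}{n} \sum_{i=1}^{n} \log \hat{s}_m(y_i \mid x_i)
          \;+\; \mathrm{pen}(m) \right\}.
\]

% Jensen-Kullback-Leibler divergence (common definition, assumed), for 0 < \rho < 1.
\[
  \mathrm{JKL}_\rho(s, t) \;=\; \frac{1}{\rho}\,
  \mathrm{KL}\!\left(s,\; (1-\rho)\, s + \rho\, t\right).
\]
```

Read this way, the ℓ1 penalty serves only to pick variables in Step 1, while the final estimate comes from the unpenalized refit in Step 2 and the penalized choice among models in Step 3.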

Taxonomy

Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Statistical Methods and Bayesian Inference