Statistical analysis for a penalized EM algorithm in high-dimensional mixture linear regression model
Ning Wang, Xin Zhang, Qing Mai

TL;DR
This paper introduces a novel group lasso penalized EM algorithm for high-dimensional mixture linear regression, providing theoretical guarantees without sample-splitting and demonstrating strong numerical performance.
Contribution
The paper develops a new penalized EM algorithm for high-dimensional mixture regression that avoids sample-splitting and extends to multivariate responses, with proven statistical properties.
Findings
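The proposed algorithm shows encouraging performance in numerical studies.
Theoretical analysis establishes the algorithm's statistical properties.
Neither the algorithm nor its analysis requires sample-splitting, unlike many existing results for regularized EM algorithms.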
Abstract
The expectation-maximization (EM) algorithm and its variants are widely used in statistics. In high-dimensional mixture linear regression, the model is assumed to be a finite mixture of linear regressions and the number of predictors is much larger than the sample size. The standard EM algorithm, which attempts to find the maximum likelihood estimator, becomes infeasible for such models. We devise a group lasso penalized EM algorithm and study its statistical properties. Existing theoretical results on regularized EM algorithms often rely on dividing the sample into many independent batches and using a fresh batch of samples in each iteration of the algorithm. Our algorithm and theoretical analysis do not require sample-splitting, and can be extended to multivariate response cases. The proposed methods also show encouraging performance in numerical studies.
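To make the idea in the abstract concrete, here is a minimal sketch of a group lasso penalized EM loop for a K-component mixture of linear regressions. It is an illustration under stated assumptions, not the authors' algorithm. The grouping (each predictor's coefficients across all components form one group), the shared noise variance, the proximal-gradient M-step, and the names penalized_em and group_soft_threshold are all assumptions made for this example. Every iteration reuses the full sample, so there is no sample-splitting.

```python
import numpy as np

def group_soft_threshold(B, t):
    """Shrink each row of B (one group per predictor) toward zero by t in L2 norm."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return scale * B

def penalized_em(X, y, K=2, lam=0.1, n_iter=50, n_prox=20, seed=0):
    """Illustrative penalized EM; column k of B holds component k's coefficients."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    B = 0.1 * rng.standard_normal((p, K))   # small random start (assumption)
    pi = np.full(K, 1.0 / K)                # mixing proportions
    sigma2 = float(np.var(y))               # shared noise variance (assumption)
    # Step size 1/L, where L = ||X||_2^2 / n bounds the curvature of the
    # weighted least-squares loss because every responsibility is at most 1.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        # E-step: posterior probability that observation i came from component k.
        resid = y[:, None] - X @ B
        log_w = np.log(np.maximum(pi, 1e-12))[None, :] - 0.5 * resid**2 / sigma2
        log_w -= log_w.max(axis=1, keepdims=True)   # stabilise the exponentials
        gamma = np.exp(log_w)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step for the mixing proportions (closed form).
        pi = gamma.mean(axis=0)
        # M-step for B: a few proximal-gradient steps on
        # (1/2n) * sum_{i,k} gamma_ik (y_i - x_i' b_k)^2 + lam * sum_j ||B[j, :]||_2.
        # The 1/sigma2 factor is absorbed into lam for simplicity.
        for _ in range(n_prox):
            resid = y[:, None] - X @ B
            grad = -X.T @ (gamma * resid) / n
            B = group_soft_threshold(B - step * grad, step * lam)
        # M-step for the noise variance, floored to avoid degeneracy.
        resid = y[:, None] - X @ B
        sigma2 = max(float(np.sum(gamma * resid**2) / n), 1e-8)
    return B, pi, sigma2
```

Calling penalized_em(X, y, K=2, lam=0.2) on an n-by-p design matrix X and response vector y returns the coefficient matrix, the mixing proportions, and the noise variance. Rows of the coefficient matrix that are shrunk exactly to zero correspond to predictors dropped from every component, which is what makes the group penalty useful when p is much larger than n. The value of lam here is arbitrary and would normally be tuned.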
Taxonomy
Topics: Bayesian Methods and Mixture Models · Survey Sampling and Estimation Techniques · Crystallization and Solubility Studies
