
TL;DR
This paper introduces an EM-based algorithm for subset selection in high-dimensional linear regression. It shows that, under sparsity, a subset containing the true submodel fits better (has a smaller residual sum of squares) than one that does not, and reports that the method outperforms existing screening methods in simulations.
Contribution
It proposes an EM algorithm for best subset regression, called orthogonalizing subset screening, together with an accelerated version. Both screen variables by searching for a better-fitting subset, and the approach is supported by theoretical and simulation results.
Findings
The method improves variable screening accuracy.
It outperforms popular screening methods in simulations.
The algorithms have a monotonicity property ensuring better model fitting.
Abstract
To find efficient screening methods for high dimensional linear regression models, this paper studies the relationship between model fitting and screening performance. Under a sparsity assumption, we show that a subset that includes the true submodel always yields smaller residual sum of squares (i.e., has better model fitting) than all that do not in a general asymptotic setting. This indicates that, for screening important variables, we could follow a "better fitting, better screening" rule, i.e., pick a "better" subset that has better model fitting. To seek such a better subset, we consider the optimization problem associated with best subset regression. An EM algorithm, called orthogonalizing subset screening, and its accelerating version are proposed for searching for the best subset. Although the two algorithms cannot guarantee that a subset they yield is the best, their…
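The "better fitting, better screening" idea can be illustrated with a minimal sketch. The code below is not the paper's orthogonalizing subset screening algorithm, whose details are not given here. It is plain iterative hard thresholding, a related majorize-minimize scheme for the best subset problem (minimize the residual sum of squares subject to at most k nonzero coefficients), swapped in for illustration. All function names, the simulated data, and the choice of step bound are assumptions. The residual sum of squares is non-increasing across iterations because the step constant is an upper bound on the largest eigenvalue of X'X.

```python
import random

def predict(X, beta):
    """Return the fitted values X @ beta for list-of-rows X."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

def rss(X, y, beta):
    """Residual sum of squares ||y - X beta||^2."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, predict(X, beta)))

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    order = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)
    keep = set(order[:k])
    return [vj if j in keep else 0.0 for j, vj in enumerate(v)]

def iht_subset_search(X, y, k, n_iter=100):
    """Iterative hard thresholding for k-sparse least squares (illustrative)."""
    n, p = len(X), len(X[0])
    # Assumption: use the squared Frobenius norm of X (the trace of X'X) as a
    # simple, loose upper bound on the largest eigenvalue of X'X.
    L = sum(x * x for row in X for x in row)
    beta = [0.0] * p
    history = [rss(X, y, beta)]
    for _ in range(n_iter):
        fitted = predict(X, beta)
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        # grad[j] is the j-th entry of X'(y - X beta).
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        beta = hard_threshold([b + g / L for b, g in zip(beta, grad)], k)
        history.append(rss(X, y, beta))
    support = sorted(j for j, b in enumerate(beta) if b != 0.0)
    return support, beta, history

# Hypothetical simulation: 200 observations, 20 predictors, 3 true signals.
random.seed(0)
n, p, k = 200, 20, 5
true_support = [0, 1, 2]
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(3.0 * row[j] for j in true_support) + random.gauss(0.0, 1.0) for row in X]

support, beta, history = iht_subset_search(X, y, k)
print("selected subset:", support)
print("RSS at start and end:", history[0], history[-1])
```

In this sketch the screening size k is chosen larger than the number of true signals, so the selected subset is meant to contain the true submodel rather than equal it. The trace bound is conservative and makes convergence slow; a tighter eigenvalue bound would speed it up without affecting the monotone decrease.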
Taxonomy
Topics: Machine Learning and Algorithms · Statistical Methods and Inference · Control Systems and Identification
