Scalable Subset Selection in Linear Mixed Models
Ryan Thompson, Matt P. Wand, Joanna J. J. Wang

TL;DR
This paper introduces a scalable $\text{L}_0$-regularized method for subset selection in linear mixed models, capable of handling thousands of predictors efficiently, with proven convergence and strong empirical results.
Contribution
It develops a novel $\text{L}_0$-regularized approach with algorithms that scale to large datasets, filling a gap in sparse learning for LMMs.
Findings
Method runs in seconds to minutes on datasets with thousands of predictors.
Provides finite-sample bounds on divergence, ensuring statistical reliability.
Demonstrates excellent performance on synthetic and real data.
Abstract
Linear mixed models (LMMs), which incorporate fixed and random effects, are key tools for analyzing heterogeneous data, such as in personalized medicine. Nowadays, this type of data is increasingly wide, sometimes containing thousands of candidate predictors, necessitating sparsity for prediction and interpretation. However, existing sparse learning methods for LMMs do not scale well beyond tens or hundreds of predictors, leaving a large gap compared with sparse methods for linear models, which ignore random effects. This paper closes the gap with a new regularized method for LMM subset selection that can run on datasets containing thousands of predictors in seconds to minutes. On the computational front, we develop a coordinate descent algorithm as our main workhorse and provide a guarantee of its convergence. We also develop a local search algorithm to help traverse the…
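To give a feel for the kind of coordinate descent update the abstract refers to, here is a minimal sketch for the much simpler case of $\text{L}_0$-regularized least squares in an ordinary linear model with no random effects. It is not the paper's algorithm: the function name `l0_coordinate_descent`, the objective scaling $\frac{1}{2n}\|y - X\beta\|^2 + \lambda\|\beta\|_0$, and all parameters are assumptions made for illustration only.

```python
import numpy as np


def l0_coordinate_descent(X, y, lam, max_iter=100, tol=1e-8):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_0.

    Illustrative sketch for a plain linear model (hypothetical helper,
    not the LMM method described in the paper). Assumes X has no
    all-zero columns.
    """
    n, p = X.shape
    col_scale = (X ** 2).sum(axis=0) / n  # s_j = ||x_j||^2 / n
    beta = np.zeros(p)
    resid = np.asarray(y, dtype=float).copy()  # residual y - X @ beta

    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Unpenalized least squares solution for coordinate j,
            # holding all other coordinates fixed.
            rho = X[:, j] @ resid / n + col_scale[j] * old
            cand = rho / col_scale[j]
            # Keep the coordinate only if the loss reduction it buys,
            # (s_j / 2) * cand^2, exceeds the L0 penalty lam.
            new = cand if 0.5 * col_scale[j] * cand ** 2 > lam else 0.0
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta
```

Each coordinate update is a hard-thresholding step: the coefficient is either set to its least squares value or to exactly zero, which is what produces a sparse subset. Extending this to LMMs would additionally involve the random-effect variance parameters, which this sketch leaves out entirely.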
