Solar: $L_0$ solution path averaging for fast and accurate variable selection in high-dimensional data
Ning Xu, Timothy C.G. Fisher

TL;DR
The paper introduces Solar, a new variable selection method that uses $L_0$ path averaging across subsamples to improve stability, accuracy, and efficiency in high-dimensional data analysis, outperforming lasso and related methods.
Contribution
Solar is a novel $L_0$ norm-based path averaging algorithm that enhances variable selection stability and accuracy while reducing computational load in high-dimensional settings.
Findings
Solar achieves a 64-84% reduction in the number of redundant variables selected, compared with lasso.
Solar improves variable selection accuracy and stability over traditional methods.
Solar is significantly faster, requiring 98% less computation time than parallelized bootstrap lasso.
Abstract
We propose a new variable selection algorithm, subsample-ordered least-angle regression (solar), and its coordinate descent generalization, solar-cd. Solar reconstructs lasso paths using the $L_0$ norm and averages the resulting solution paths across subsamples. Path averaging retains the ranking information of the informative variables while averaging out sensitivity to high dimensionality, improving variable selection stability, efficiency, and accuracy. We prove that: (i) with high probability, path averaging perfectly separates informative variables from redundant variables on the average path; (ii) solar variable selection is consistent and accurate; and (iii) the probability that solar omits weak signals is controllable for finite sample size. We also demonstrate that: (i) solar yields, with less than of the lasso computation load, substantial improvements over…
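The abstract describes path averaging only at a high level. The sketch below is a rough illustration of the idea, not the authors' algorithm. For each subsample it takes the order in which variables enter a LARS/lasso path, and it assumes those paths were computed elsewhere. It turns that order into a per-variable score and averages the scores across subsamples. It then keeps the variables whose averaged score clears a cutoff. Several pieces are hypothetical: the scoring rule (1 - (r - 1)/p for the variable entering at position r, 0 if it never enters), the function names `l0_scores`, `average_path` and `select`, and the fixed `threshold`.

```python
from typing import List, Sequence


def l0_scores(entry_order: Sequence[int], p: int) -> List[float]:
    """Score each of p variables from one subsample's solution path.

    Hypothetical scoring rule (an assumption, not the paper's formula):
    the variable entering at 1-based position r gets 1 - (r - 1) / p;
    variables that never enter the path get 0. Assumes each index
    appears at most once in entry_order (no drop-and-re-entry).
    """
    scores = [0.0] * p
    for r, j in enumerate(entry_order, start=1):
        scores[j] = 1.0 - (r - 1) / p
    return scores


def average_path(entry_orders: Sequence[Sequence[int]], p: int) -> List[float]:
    """Average the per-subsample scores elementwise across all subsamples."""
    k = len(entry_orders)
    totals = [0.0] * p
    for order in entry_orders:
        for j, s in enumerate(l0_scores(order, p)):
            totals[j] += s
    return [t / k for t in totals]


def select(avg: Sequence[float], threshold: float) -> List[int]:
    """Keep variables whose averaged score reaches the (hypothetical) cutoff, best first."""
    kept = [j for j, s in enumerate(avg) if s >= threshold]
    return sorted(kept, key=lambda j: -avg[j])


if __name__ == "__main__":
    # Toy entry orders from three subsamples with p = 5 variables:
    # variables 0 and 1 always enter early, while the third entrant varies.
    orders = [[0, 1, 3], [1, 0, 4], [0, 1, 2]]
    avg = average_path(orders, p=5)
    print([round(s, 3) for s in avg])
    print(select(avg, threshold=0.5))
```

In this toy run, variables 0 and 1 average roughly 0.93 and 0.87. Each noisy variable enters early on only one subsample, so its average falls to about 0.2. Averaging therefore suppresses the variables that only occasionally rank high, which is the intuition the abstract gives for improved stability.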
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Machine Learning and Data Classification
