Sparsification and feature selection by compressive linear regression
Florin Popescu, Daniel Renz

TL;DR
This paper introduces an MDL-based approach to sparsifying linear regression models that is faster than the LASSO and free of hyper-parameters, with comparable or better generalization on benchmark datasets.
Contribution
It develops a principled coding scheme for parameters and residuals enabling gradient-based sparsification without hyper-parameter tuning.
Findings
Faster sparsification than LASSO on multiple datasets
Achieves comparable or better generalization accuracy
Fully automatic process without cross-validation or regularization tuning
Abstract
The Minimum Description Length (MDL) principle states that the optimal model for a given data set is the one that compresses it best. Due to practical limitations, the model can be restricted to a class such as linear regression models, which we address in this study. As in other formulations such as the LASSO and forward step-wise regression, we are interested in sparsifying the feature set while preserving generalization ability. We derive a well-principled set of codes for both parameters and error residuals, along with smooth approximations to the lengths of these codes so as to allow gradient descent optimization of description length. We go on to show that sparsification and feature selection using our approach is faster than the LASSO on several datasets from the UCI and StatLib repositories, with favorable generalization accuracy, while being fully automatic, requiring neither…
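To make the idea of a two-part description length concrete, the following is a minimal sketch in Python with NumPy. It is not the coding scheme derived in the paper: the Gaussian residual code, the `log2(1 + |w|/delta)` parameter cost, the quantization step `delta`, and all function names are assumptions chosen only to illustrate how a total code length can be written as a smooth function of the weights, so that a zero weight costs nothing and a worse fit costs more bits.

```python
import numpy as np

def residual_bits(residuals, delta=0.01):
    """Approximate bits needed to encode residuals quantized to step `delta`.

    Assumption: residuals are coded with a zero-mean Gaussian whose variance
    is the empirical mean square, floored at delta**2 to keep the log finite.
    """
    n = residuals.size
    sigma2 = max(float(np.mean(residuals ** 2)), delta ** 2)
    # Gaussian differential entropy in bits, minus log2 of the quantization step.
    return float(n * (0.5 * np.log2(2 * np.pi * np.e * sigma2) - np.log2(delta)))

def parameter_bits(w, delta=0.01):
    """Smooth surrogate for the cost of the weights (an illustrative choice).

    Each weight costs log2(1 + |w| / delta) bits, so a weight of exactly
    zero is free and the cost grows with the weight's magnitude.
    """
    return float(np.sum(np.log2(1.0 + np.abs(w) / delta)))

def description_length(X, y, w, delta=0.01):
    """Total two-part code length: bits for the model plus bits for the data given the model."""
    return parameter_bits(w, delta) + residual_bits(y - X @ w, delta)

# Toy comparison: the target depends on the first feature only.
X = np.eye(4)
y = np.array([1.0, 0.0, 0.0, 0.0])
w_sparse = np.array([1.0, 0.0, 0.0, 0.0])     # exact fit, one nonzero weight
w_dense = np.array([1.0, 0.05, 0.05, 0.05])   # spurious weights that also hurt the fit

dl_sparse = description_length(X, y, w_sparse)
dl_dense = description_length(X, y, w_dense)
```

In this toy case `dl_sparse` is smaller than `dl_dense`, because the spurious weights add parameter bits and enlarge the residuals. A sparsification procedure in this spirit would lower such a total by adjusting `w`, for example by gradient descent on a smooth code-length approximation as the abstract describes.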
Taxonomy
Topics: Machine Learning and Algorithms · Sparse and Compressive Sensing Techniques · Face and Expression Recognition
