Multiple Equivalent Solutions for the Lasso
Yannis Pantazis, Vincenzo Lagani, Paulos Charonyktakis, Ioannis Tsamardinos

TL;DR
This paper extends the Lasso feature selection algorithm to identify multiple equivalent solutions, revealing the existence of alternative feature subsets with similar predictive power in real datasets, which is important for domain insights.
Contribution
It formalizes the concept of multiple equivalent solutions for Lasso in classification and regression, and develops an algorithm to identify them.
Findings
Multiple equivalent solutions exist in real datasets.
The extended Lasso algorithm can identify a subset of these solutions.
Lasso solutions often outperform those of SES (Statistically Equivalent Signatures) in prediction accuracy.
Abstract
Feature selection is an important problem studied in data analytics, seeking to identify a minimal-size feature subset that is optimally predictive for an outcome of interest. It is also a powerful tool in Knowledge Discovery as a means for gaining domain insight, e.g., identifying which medical quantities carry unique information for the disease status. It is arguably less recognized, however, that the problem may have multiple, equivalent solutions. In that case, it is misleading to domain experts to report only one of them and ignore all other equivalent solutions. In this paper, we extend a well-established single-solution feature selection algorithm (i.e., one reporting a single solution), namely the Lasso algorithm, to the multiple solution problem based on a formalized notion of equivalence for both classification and regression tasks. Empirical results are obtained using a fully automated…
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Modeling and Causal Inference · Machine Learning and Data Classification
