Improving Group Lasso for high-dimensional categorical data
Szymon Nowakowski, Piotr Pokarowski, Wojciech Rejchel, Agnieszka Sołtys

TL;DR
This paper introduces a two-step method to improve the sparsity and interpretability of Group Lasso models for high-dimensional categorical data, combining dimensionality reduction and clustering-based model selection.
Contribution
The paper proposes a novel two-step procedure that enhances the sparsity and interpretability of Group Lasso estimates for categorical data in high-dimensional settings.
Findings
The method improves prediction accuracy over existing algorithms.
It achieves sparser models with better interpretability.
The approach performs well on both synthetic and real datasets.
Abstract
Sparse modelling or model selection with categorical data is challenging even for a moderate number of variables, because roughly one parameter is needed to encode each category or level. The Group Lasso is a well-known, efficient algorithm for selecting continuous or categorical variables, but the estimates for the levels of a selected factor usually all differ. Therefore, a fitted model may not be sparse, which makes it difficult to interpret. To obtain a sparse solution of the Group Lasso we propose the following two-step procedure: first, we reduce data dimensionality using the Group Lasso; then, to choose the final model, we apply an information criterion to a small family of models prepared by clustering the levels of individual factors. We investigate the selection correctness of the algorithm in a sparse high-dimensional scenario. We also test our method on synthetic as well as real datasets…
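The two-step procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian linear model, a plain proximal-gradient Group Lasso, single-linkage merging of adjacent sorted level estimates as the clustering step, and BIC as the information criterion. The function names `group_lasso` and `two_step`, the penalty value `lam`, and the toy data are all hypothetical.

```python
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for (1/2n)||y - Xb||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta + step * (X.T @ (y - X @ beta)) / n   # gradient step
        for g in groups:                               # group soft-thresholding
            norm = np.linalg.norm(z[g])
            thr = step * lam * np.sqrt(len(g))
            z[g] = 0.0 if norm <= thr else (1.0 - thr / norm) * z[g]
        beta = z
    return beta

def two_step(F, y, lam):
    """F: (n, d) integer-coded factors. Returns {factor: cluster id of each level}."""
    n, d = F.shape
    levels = [np.unique(F[:, j]) for j in range(d)]
    sizes = [len(lv) for lv in levels]
    # One-hot encode every factor; each factor forms one group of columns.
    X = np.hstack([(F[:, [j]] == levels[j][None, :]).astype(float) for j in range(d)])
    offsets = np.cumsum([0] + sizes)
    groups = [np.arange(offsets[j], offsets[j + 1]) for j in range(d)]

    # Step 1: screening. Keep factors whose coefficient group is not entirely zero.
    beta = group_lasso(X, y - y.mean(), groups, lam)
    selected = [j for j in range(d) if np.any(beta[groups[j]] != 0)]

    # Step 2: nested family of models. Merge adjacent sorted level estimates,
    # smallest gap first (assumed single linkage), and score each model by BIC.
    order = {j: np.argsort(beta[groups[j]]) for j in selected}
    merges = sorted((gap, j, i) for j in selected
                    for i, gap in enumerate(np.diff(beta[groups[j]][order[j]])))
    keep = {j: np.ones(sizes[j] - 1, dtype=bool) for j in selected}  # True = boundary kept
    best_bic, best_part = np.inf, None
    for k in range(len(merges) + 1):
        if k > 0:
            _, j, i = merges[k - 1]
            keep[j][i] = False                 # merge the two neighbouring clusters
        part, cols = {}, [np.ones(n)]          # design matrix starts with an intercept
        for j in selected:
            cl = np.empty(sizes[j], dtype=int)
            cl[order[j]] = np.concatenate([[0], np.cumsum(keep[j])])
            part[j] = cl
            obs = cl[np.searchsorted(levels[j], F[:, j])]   # cluster id per observation
            cols += [(obs == c).astype(float) for c in range(1, cl.max() + 1)]
        Z = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        rss = float(np.sum((y - Z @ coef) ** 2))
        bic = n * np.log(rss / n) + np.log(n) * Z.shape[1]
        if bic < best_bic:
            best_bic, best_part = bic, part
    return best_part

# Toy data: only factor 0 matters, and its four levels form two true clusters.
rng = np.random.default_rng(0)
n = 400
F = rng.integers(0, 4, size=(n, 3))
y = np.where(F[:, 0] < 2, -1.0, 1.0) + 0.3 * rng.standard_normal(n)
partition = two_step(F, y, lam=0.05)
print(partition)
```

In the returned dictionary, levels of a factor that share a cluster id are fitted with a common coefficient, and a factor whose levels all collapse into one cluster is effectively dropped. The Group Lasso estimates are used only for screening and for ordering the levels; each candidate model is refitted by least squares before scoring.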
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Modeling and Causal Inference · Statistical Methods and Bayesian Inference
Methods: Test
