Group Lasso merger for sparse prediction with high-dimensional categorical data
Szymon Nowakowski, Piotr Pokarowski, Wojciech Rejchel

TL;DR
This paper introduces GLAMER, a novel algorithm that merges levels of categorical variables to achieve sparsity in high-dimensional prediction models, with proven consistency and strong empirical results.
Contribution
The paper presents GLAMER, which merges similar levels of categorical variables and, to the authors' knowledge, is the first algorithm with proven selection consistency for sparse high-dimensional models with categorical data.
Findings
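Under weak conditions, GLAMER provably recovers the true sparse linear or logistic model, even when the number of parameters exceeds the learning sample size.
The algorithm demonstrates strong empirical performance in numerical experiments.
GLAMER achieves model sparsity by merging levels of a selected factor whose estimates differ by less than a predetermined threshold.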
Abstract
Sparse prediction with categorical data is challenging even for a moderate number of variables, because roughly one parameter is needed to encode each category or level. The Group Lasso is a well-known, efficient algorithm for selecting continuous or categorical variables, but all estimates related to a selected factor usually differ, so a fitted model may not be sparse. To make the Group Lasso solution sparse, we propose to merge levels of the selected factor if the difference between their corresponding estimates is less than some predetermined threshold. We prove that under weak conditions our algorithm, called GLAMER for Group LAsso MERger, recovers the true, sparse linear or logistic model even in the high-dimensional scenario, that is, when the number of parameters is greater than the learning sample size. To our knowledge, selection consistency has been proven many times for different…
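
The merging step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `merge_levels`, its arguments, and the rule of comparing neighbouring estimates after sorting are assumptions made for illustration only. It takes the per-level estimates of one selected factor (for example, from a Group Lasso fit) and assigns the same group label to levels whose sorted neighbouring estimates differ by less than the threshold.

```python
def merge_levels(estimates, threshold):
    """Assign a group label to each level of one factor.

    Illustrative assumption: levels are sorted by their estimates, and a
    new group starts whenever the gap to the previous sorted estimate is
    at least `threshold`. Levels sharing a label would be merged.
    """
    # Indices of the levels, ordered by increasing estimate.
    order = sorted(range(len(estimates)), key=lambda i: estimates[i])
    groups = [0] * len(estimates)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        # A gap of at least `threshold` separates two merged groups.
        if estimates[nxt] - estimates[prev] >= threshold:
            current += 1
        groups[nxt] = current
    return groups


# Hypothetical estimates for a factor with five levels.
labels = merge_levels([0.0, 0.05, 1.0, 1.02, 0.01], threshold=0.1)
print(labels)
```

Because only neighbouring sorted estimates are compared, merges can chain: two levels may end up in the same group even if their own difference exceeds the threshold, as long as intermediate levels bridge the gap. A stricter rule would need a different grouping criterion.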
Taxonomy
Topics: Statistical Methods and Inference · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
