Sparse-Group Boosting with Balanced Selection Frequencies: A Simulation-Based Approach and R Implementation
Fabian Obster, Christian Heumann

TL;DR
This paper presents a new sparse-group boosting framework with a simulation-based algorithm to balance selection frequencies, implemented in an R package, improving variable selection fairness and interpretability in high-dimensional grouped data.
Contribution
It introduces a novel simulation-based balancing algorithm for variable selection in boosting, implemented in the sgboost R package, enhancing fairness and interpretability.
Findings
Demonstrates reduced group bias through simulations
Shows improved variable selection accuracy
Provides practical R implementation and examples
Abstract
This paper introduces a novel framework for reducing variable selection bias by balancing the selection frequencies of base-learners in boosting, together with the sgboost package in R, which implements this framework combined with sparse-group boosting. The group bias reduction algorithm employs a simulation-based approach to iteratively adjust the degrees of freedom for both individual and group base-learners, ensuring balanced selection probabilities and mitigating the tendency to over-select more complex groups. The efficacy of the group balancing algorithm is demonstrated through simulations. Sparse-group boosting offers a flexible approach for both group and individual variable selection, reducing overfitting and enhancing model interpretability for modeling high-dimensional data with natural groupings in covariates. The package uses regularization techniques based on the degrees of…
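The balancing idea described in the abstract can be illustrated with a toy sketch in Python. It is not the sgboost implementation and does not use its API. It assumes three hypothetical base-learners with 1, 5, and 20 orthonormal columns, each fitted by ridge regression, where the degrees of freedom are taken as the trace of the hat matrix (so the shrinkage factor is df / p). Under a pure-noise outcome, the squared norm of the projected outcome is chi-square distributed with p degrees of freedom, which lets the sketch simulate the first boosting step without building design matrices. The function names, the multiplicative update rule, and all tuning values are illustrative assumptions.

```python
import random


def selection_frequencies(group_sizes, dfs, n_sim, rng):
    """Estimate how often each base-learner wins the first boosting step
    when the outcome is pure noise (independent of all predictors)."""
    wins = [0] * len(group_sizes)
    for _ in range(n_sim):
        best, best_gain = 0, -1.0
        for g, (p, df) in enumerate(zip(group_sizes, dfs)):
            s = df / p  # ridge shrinkage implied by df (toy assumption)
            # ||X_g^T y||^2 under the null: chi-square with p degrees of freedom
            chi2 = rng.gammavariate(p / 2.0, 2.0)
            gain = chi2 * s * (2.0 - s)  # reduction in residual sum of squares
            if gain > best_gain:
                best, best_gain = g, gain
        wins[best] += 1
    return [w / n_sim for w in wins]


def balance_dfs(group_sizes, start_df=0.2, n_iter=30, n_sim=2000,
                step=0.5, seed=1):
    """Iteratively rescale each learner's df until the simulated
    selection frequencies are roughly equal (hypothetical update rule)."""
    rng = random.Random(seed)
    dfs = [start_df] * len(group_sizes)
    target = 1.0 / len(group_sizes)
    for _ in range(n_iter):
        freqs = selection_frequencies(group_sizes, dfs, n_sim, rng)
        for g, p in enumerate(group_sizes):
            freq = max(freqs[g], 1.0 / n_sim)  # guard against zero counts
            # Raise df for under-selected learners, lower it for over-selected ones
            dfs[g] *= (target / freq) ** step
            # Keep df positive and no larger than the number of columns
            dfs[g] = min(max(dfs[g], 1e-3), float(p))
    return dfs


group_sizes = [1, 5, 20]  # one individual variable and two groups (made up)
eval_rng = random.Random(123)
before = selection_frequencies(group_sizes, [0.2] * 3, 5000, eval_rng)
balanced = balance_dfs(group_sizes)
after = selection_frequencies(group_sizes, balanced, 5000, eval_rng)
print("equal df    :", [round(f, 3) for f in before])
print("balanced df :", [round(d, 3) for d in balanced])
print("after       :", [round(f, 3) for f in after])
```

With equal degrees of freedom the three learners are not selected equally often under the null, and the loop nudges each df until the frequencies approach one third. A real balancing routine would simulate against the actual design matrices and base-learner definitions, which this sketch deliberately sidesteps.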
Taxonomy
Topics: Data Analysis with R
