Deselection of Base-Learners for Statistical Boosting -- with an Application to Distributional Regression

Annika Strömer; Christian Staerk; Nadja Klein; Leonie Weinhold; Stephanie Titze; Andreas Mayr

arXiv:2202.01657 · stat.ME · February 4, 2022

TL;DR

This paper introduces a new deselection procedure for component-wise gradient boosting to improve variable selection, especially in low-dimensional data, demonstrated through a distributional regression application in a health study.

Contribution

The paper proposes a novel method to deselect less important base-learners in boosting, reducing false positives and improving model interpretability.

Findings

1. Enhanced variable selection accuracy in low-dimensional data
2. Reduced inclusion of false positive variables
3. Improved prediction performance compared to existing methods

Abstract

We present a new procedure for enhanced variable selection in component-wise gradient boosting. Statistical boosting is a computational approach that emerged from machine learning and makes it possible to fit regression models in the presence of high-dimensional data. Furthermore, the algorithm can lead to data-driven variable selection. In practice, however, the final models tend to include too many variables in some situations. This occurs particularly for low-dimensional data (p<n), where we observe a slow overfitting behavior of boosting. As a result, more variables get included in the final model without altering the prediction accuracy. Many of these false positives are incorporated with a small coefficient and therefore have a small impact, but lead to a larger model. We try to overcome this issue by giving the algorithm the chance to deselect base-learners with minor…
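
To make the idea in the abstract concrete, the following is a minimal sketch of component-wise L2 boosting followed by a deselection step. It is not the authors' implementation. The abstract is truncated before it states the deselection criterion, so the rule used here (drop a base-learner when its share of the total risk reduction falls below a threshold `tau`, then refit on the remaining columns) is an assumption, as are the function names `componentwise_l2_boost` and `deselect_and_refit` and the defaults `tau=0.01`, `n_iter=200`, `nu=0.1`. Columns of `X` are assumed to be centered, with one linear base-learner per column.

```python
import numpy as np


def componentwise_l2_boost(X, y, n_iter=200, nu=0.1):
    """Component-wise L2 boosting, one linear base-learner per column.

    Returns the intercept, the coefficient vector, and the risk reduction
    (decrease in mean squared error) attributed to each column.
    """
    n, p = X.shape
    intercept = y.mean()
    coef = np.zeros(p)
    risk_reduction = np.zeros(p)
    resid = y - intercept
    col_ss = (X ** 2).sum(axis=0)  # squared norm of each column
    for _ in range(n_iter):
        # Least-squares fit of every single column to the current residuals.
        beta = X.T @ resid / col_ss
        # The column with the largest drop in residual sum of squares wins.
        j = int(np.argmax(beta ** 2 * col_ss))
        risk_before = np.mean(resid ** 2)
        # Weak update: only a fraction nu of the fitted effect is added.
        coef[j] += nu * beta[j]
        resid = resid - nu * beta[j] * X[:, j]
        risk_reduction[j] += risk_before - np.mean(resid ** 2)
    return intercept, coef, risk_reduction


def deselect_and_refit(X, y, tau=0.01, n_iter=200, nu=0.1):
    """Hypothetical deselection rule: keep columns whose share of the total
    risk reduction is at least tau, then boost again on those columns only."""
    _, _, rr = componentwise_l2_boost(X, y, n_iter, nu)
    share = rr / rr.sum()
    keep = np.flatnonzero(share >= tau)
    intercept, coef_kept, _ = componentwise_l2_boost(X[:, keep], y, n_iter, nu)
    coef = np.zeros(X.shape[1])
    coef[keep] = coef_kept
    return intercept, coef, keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    X -= X.mean(axis=0)  # center columns, as assumed above
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)
    _, coef_full, _ = componentwise_l2_boost(X, y)
    _, coef_sparse, keep = deselect_and_refit(X, y)
    print("nonzero before deselection:", np.flatnonzero(coef_full))
    print("kept after deselection:    ", keep)
```

In this low-dimensional setting (p<n), plain boosting typically picks up some noise columns with small coefficients, which is the behavior the abstract describes. The refit step restricts the model to the columns that survive the threshold, so deselected columns end up with a coefficient of exactly zero.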

Code & Models

Repositories

annikastr/deselectboost (Official)

Taxonomy

Topics: Statistical Methods and Inference · Liver Disease Diagnosis and Treatment · Bayesian Methods and Mixture Models