Individualized and Global Feature Attributions for Gradient Boosted Trees in the Presence of $\ell_2$ Regularization

Qingyao Sun (University of Chicago)

arXiv:2211.04409 · stat.ML · November 9, 2022

TL;DR

This paper introduces PreDecomp, a new method for individualized feature attribution in gradient boosted trees with $\ell_2$ regularization.

Contribution

It presents PreDecomp and TreeInner, novel attribution methods that account for $\ell_2$ regularization.

Findings

01. PreDecomp faithfully recovers additive models in the population case when features are independent.

02. TreeInner achieves state-of-the-art feature selection performance.

03. The methods are validated on simulated and genomic datasets.

Abstract

While $\ell_2$ regularization is widely used in training gradient boosted trees, popular individualized feature attribution methods for trees such as Saabas and TreeSHAP overlook the training procedure. We propose Prediction Decomposition Attribution (PreDecomp), a novel individualized feature attribution for gradient boosted trees trained with $\ell_2$ regularization. Theoretical analysis shows that the inner product between PreDecomp and labels on in-sample data is essentially the total gain of a tree, and that PreDecomp can faithfully recover additive models in the population case when features are independent. Inspired by the connection between PreDecomp and total gain, we also propose TreeInner, a family of debiased global feature attributions defined in terms of the inner product between any individualized feature attribution and labels on out-of-sample data for each tree.…
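The inner-product construction described in the abstract can be illustrated with a minimal sketch. It assumes that per-tree individualized attributions (from Saabas, TreeSHAP, PreDecomp, or any other method) have already been computed on held-out data and stored in an array. The function name `inner_product_importance`, the array layout, and the toy numbers are hypothetical and not taken from the paper or its repository; any normalization or sign convention the paper applies may differ.

```python
import numpy as np


def inner_product_importance(attributions, labels):
    """Aggregate individualized attributions into one score per feature.

    attributions: array of shape (n_trees, n_samples, n_features), where
        attributions[t, i, k] is the contribution of feature k to tree t's
        prediction for sample i (hypothetical layout).
    labels: array of shape (n_samples,), the labels of the same samples.

    For each tree and feature, take the inner product between that
    feature's attribution vector and the labels, then sum over trees.
    """
    attributions = np.asarray(attributions, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return np.einsum("tnk,n->k", attributions, labels)


# Toy data: 2 trees, 2 held-out samples, 2 features (made-up numbers).
toy_attributions = np.array([
    [[1.0, 0.0], [-1.0, 0.0]],    # tree 0 splits only on feature 0
    [[0.5, 0.1], [-0.5, -0.1]],   # tree 1 uses both features
])
toy_labels = np.array([2.0, -2.0])

toy_importance = inner_product_importance(toy_attributions, toy_labels)
print(toy_importance)
```

In this toy case, feature 0 receives most of the score because its attributions align with the labels on the held-out samples. Evaluating on out-of-sample rather than in-sample data is the debiasing step the abstract refers to.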

Code & Models

Repositories

nalzok/treeinner (official)

Taxonomy

Topics: Machine Learning and Data Classification · Face and Expression Recognition · Domain Adaptation and Few-Shot Learning

Methods: Feature Selection