Individualized and Global Feature Attributions for Gradient Boosted Trees in the Presence of $\ell_2$ Regularization
Qingyao Sun (University of Chicago)

TL;DR
This paper introduces PreDecomp, a new method for individualized feature attribution in gradient boosted trees with $\ell_2$ regularization…
Contribution
It presents PreDecomp and TreeInner, novel attribution methods that account for $\ell_2$ regularization…
Findings
PreDecomp accurately recovers additive models with independent features.
TreeInner achieves state-of-the-art feature selection performance.
The methods are validated on simulated and genomic datasets.
Abstract
While regularization is widely used in training gradient boosted trees, popular individualized feature attribution methods for trees such as Saabas and TreeSHAP overlook the training procedure. We propose Prediction Decomposition Attribution (PreDecomp), a novel individualized feature attribution for gradient boosted trees when they are trained with regularization. Theoretical analysis shows that the inner product between PreDecomp and labels on in-sample data is essentially the total gain of a tree, and that it can faithfully recover additive models in the population case when features are independent. Inspired by the connection between PreDecomp and total gain, we also propose TreeInner, a family of debiased global feature attributions defined in terms of the inner product between any individualized feature attribution and labels on out-sample data for each tree.…
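The inner-product idea behind TreeInner can be illustrated with a minimal sketch. It assumes the per-tree individualized attributions have already been computed on held-out samples by some attribution method, and it uses the raw labels directly. The function name `inner_product_importance`, the nested-list layout, and the toy numbers are hypothetical choices for illustration, not the paper's implementation, and the exact definition used in the paper is not given in this excerpt.

```python
from typing import List, Sequence


def inner_product_importance(
    per_tree_attributions: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[float],
) -> List[float]:
    """Sum, over trees, the inner product between each feature's
    attribution column and the labels (hypothetical helper).

    per_tree_attributions[t][i][k] is the attribution of feature k
    for held-out sample i in tree t.
    """
    n_features = len(per_tree_attributions[0][0])
    scores = [0.0] * n_features
    for tree_attr in per_tree_attributions:
        for row, y in zip(tree_attr, labels):
            for k, a in enumerate(row):
                scores[k] += a * y
    return scores


# Toy held-out data: two samples, two features, two trees (made-up values).
attr_tree_1 = [[1.0, 0.0], [-1.0, 0.0]]
attr_tree_2 = [[0.5, 0.5], [-0.5, 0.5]]
y_holdout = [2.0, -2.0]

scores = inner_product_importance([attr_tree_1, attr_tree_2], y_holdout)
print(scores)
```

In this toy case the attributions of the first feature move with the labels, so it receives a positive score, while the second feature's attributions are constant across samples and its score cancels to zero on held-out data.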
Taxonomy
Topics: Machine Learning and Data Classification · Face and Expression Recognition · Domain Adaptation and Few-Shot Learning
Methods: Feature Selection
