Finding Influential Training Samples for Gradient Boosted Decision Trees

Boris Sharchilev; Yury Ustinovsky; Pavel Serdyukov; Maarten de Rijke

arXiv:1802.06640 · cs.LG · March 14, 2018 · 19 citations

TL;DR

This paper develops efficient methods to identify influential training samples in gradient boosted decision trees by extending leave-one-out analysis, balancing accuracy and computational cost.

Contribution

It introduces novel approaches for influence estimation in GBDT models under fixed tree structures, improving efficiency over existing methods.

Findings

01

Proposed methods accurately identify influential samples.

02

Methods are computationally efficient compared to baselines.

03

Approaches perform well across various experimental scenarios.

Abstract

We address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way of formalizing this problem is studying how the model's predictions change upon leave-one-out retraining, leaving out each individual training sample. Recent work has shown that, for parametric models, this analysis can be conducted in a computationally efficient way. We propose several ways of extending this framework to non-parametric GBDT ensembles under the assumption that tree structures remain fixed. Furthermore, we introduce a general scheme of obtaining further approximations to our method that balance the trade-off between performance and computational complexity. We evaluate our approaches on various experimental setups and use-case scenarios and demonstrate both the quality of our…

Code & Models

Repositories

bsharchilev/influence_boosting (tf, official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning