PaloBoost: An Overfitting-robust TreeBoost with Out-of-Bag Sample Regularization Techniques

Yubin Park; Joyce C. Ho

arXiv:1807.08383 · stat.ML · July 24, 2018 · 6 cites

TL;DR

PaloBoost introduces novel regularization techniques using out-of-bag samples in Stochastic Gradient TreeBoost to improve robustness against overfitting, reduce parameter sensitivity, and enhance feature importance estimation.

Contribution

The paper presents a method that uses out-of-bag samples for gradient-aware pruning and adaptive learning-rate estimation, which improves resistance to overfitting and robustness to parameter settings.

Findings

01. PaloBoost is robust to overfitting across multiple datasets.

02. It requires less parameter tuning than traditional methods.

03. The new feature importance formula better reflects node coverage and learning rates.

Abstract

Stochastic Gradient TreeBoost is often found in many winning solutions in public data science challenges. Unfortunately, the best performance requires extensive parameter tuning and can be prone to overfitting. We propose PaloBoost, a Stochastic Gradient TreeBoost model that uses novel regularization techniques to guard against overfitting and is robust to parameter settings. PaloBoost uses the under-utilized out-of-bag samples to perform gradient-aware pruning and estimate adaptive learning rates. Unlike other Stochastic Gradient TreeBoost models that use the out-of-bag samples to estimate test errors, PaloBoost treats the samples as a second batch of training samples to prune the trees and adjust the learning rates. As a result, PaloBoost can dynamically adjust tree depths and learning rates to achieve faster learning at the start and slower learning as the algorithm converges. We…
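
The idea of treating out-of-bag samples as a second training batch can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes squared-error loss, a single feature, and depth-1 trees (stumps), and every name in it (`oob_learning_rate`, `fit_stump`, `boost`, `nu_max`) is hypothetical. Each stage fits a stump on the in-bag residuals, then picks a per-leaf learning rate by least squares on the out-of-bag residuals, clipped to `[0, nu_max]`. A leaf whose clipped rate is 0 contributes nothing, which loosely mimics pruning a leaf that does not help on the out-of-bag samples.

```python
import numpy as np


def oob_learning_rate(leaf_value, oob_residuals, nu_max=1.0):
    """Per-leaf step size estimated on out-of-bag residuals (illustrative).

    Minimizes sum((r - nu * leaf_value) ** 2) over nu, which gives
    nu = mean(r) / leaf_value, then clips the result to [0, nu_max].
    Assumption: a leaf with no out-of-bag samples gets a rate of 0.
    """
    if len(oob_residuals) == 0 or leaf_value == 0.0:
        return 0.0
    nu = float(np.mean(oob_residuals)) / leaf_value
    return float(np.clip(nu, 0.0, nu_max))


def fit_stump(x, r):
    """Best single-threshold split of a 1-D feature under squared error."""
    best = None
    # Dropping the largest value keeps both sides of every split non-empty.
    for t in np.unique(x)[:-1]:
        left = x <= t
        sse = ((r[left] - r[left].mean()) ** 2).sum()
        sse += ((r[~left] - r[~left].mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, r[left].mean(), r[~left].mean())
    return best[1:]  # threshold, left leaf value, right leaf value


def boost(x, y, n_stages=50, subsample=0.7, nu_max=1.0, seed=0):
    """Stochastic gradient boosting with out-of-bag learning rates (sketch)."""
    rng = np.random.default_rng(seed)
    pred = np.full_like(y, y.mean(), dtype=float)
    stages = []
    for _ in range(n_stages):
        in_bag = rng.random(len(y)) < subsample
        oob = ~in_bag
        r = y - pred  # negative gradient of squared-error loss
        t, g_left, g_right = fit_stump(x[in_bag], r[in_bag])
        # Out-of-bag residuals in each leaf set that leaf's learning rate.
        nu_left = oob_learning_rate(g_left, r[oob & (x <= t)], nu_max)
        nu_right = oob_learning_rate(g_right, r[oob & (x > t)], nu_max)
        pred += np.where(x <= t, nu_left * g_left, nu_right * g_right)
        stages.append((t, nu_left * g_left, nu_right * g_right))
    return y.mean(), stages, pred
```

In this sketch, leaves whose in-bag estimate agrees with the out-of-bag residuals receive a rate near `nu_max`, while leaves that disagree in sign or overshoot are shrunk toward 0. That is one simple way to get larger steps early and smaller steps as the fit converges, under the assumptions stated above.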

Taxonomy

Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Neural Networks and Applications