Selecting Hyperparameters for Tree-Boosting
Floris Jan Koster, Fabio Sigrist

TL;DR
This paper empirically compares various hyperparameter optimization methods for tree-boosting, finding SMAC to be the most effective, and highlights the importance of extensive tuning and early stopping for accuracy.
Contribution
It provides a comprehensive empirical comparison of hyperparameter optimization methods for tree-boosting and offers practical insights on tuning strategies.
Findings
SMAC outperforms other hyperparameter optimization methods.
More than 100 trials are needed for accurate tuning.
Default hyperparameters lead to inaccurate models.
Abstract
Tree-boosting is a widely used machine learning technique for tabular data. However, its out-of-sample accuracy is critically dependent on multiple hyperparameters. In this article, we empirically compare several popular methods for hyperparameter optimization for tree-boosting, including random grid search, the tree-structured Parzen estimator (TPE), Gaussian-process-based Bayesian optimization (GP-BO), Hyperband, the sequential model-based algorithm configuration (SMAC) method, and deterministic full grid search, using regression and classification data sets. We find that the SMAC method clearly outperforms all the other considered methods. We further observe that (i) a relatively large number of trials, larger than 100, is required for accurate tuning, (ii) using default values for hyperparameters yields very inaccurate models, (iii) all considered hyperparameters can have a…
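To make the simplest of the compared methods concrete, here is a minimal sketch of random grid search with a fixed trial budget. It is an illustration only, not the authors' experimental setup: the grid values, the helper names `toy_validation_loss` and `random_grid_search`, and the loss function are all hypothetical. In practice the objective would train a boosted model (ideally with early stopping on the number of iterations) and return its validation loss.

```python
import math
import random

# Hypothetical search grid. The parameter names mirror common tree-boosting
# settings, but the values are illustrative and not taken from the paper.
GRID = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "max_depth": [1, 2, 3, 5, 10],
    "min_data_in_leaf": [1, 10, 100, 1000],
    "lambda_l2": [0.0, 1.0, 10.0, 100.0],
}


def toy_validation_loss(params):
    """Stand-in objective (assumption): replaces training a boosted model
    and scoring it on held-out data. Lower is better."""
    return (
        (math.log10(params["learning_rate"]) + 1.0) ** 2
        + 0.05 * (params["max_depth"] - 3) ** 2
        + 0.1 * (math.log10(params["min_data_in_leaf"]) - 1.0) ** 2
        + 0.01 * abs(params["lambda_l2"] - 1.0)
    )


def random_grid_search(objective, grid, n_trials, seed=0):
    """Sample n_trials grid points uniformly at random and keep the best."""
    rng = random.Random(seed)
    best_params, best_loss = None, math.inf
    history = []  # best loss found after each trial
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in grid.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
        history.append(best_loss)
    return best_params, best_loss, history


if __name__ == "__main__":
    for budget in (10, 100, 200):
        params, loss, _ = random_grid_search(toy_validation_loss, GRID, budget)
        print(budget, round(loss, 4), params)
```

With a fixed seed, a larger budget extends the same sequence of samples, so the best loss found can only stay the same or improve as trials are added. Model-based methods such as TPE, GP-BO, and SMAC replace the uniform sampling step with a proposal informed by earlier trials.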
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference
