GPU-acceleration for Large-scale Tree Boosting
Huan Zhang, Si Si, Cho-Jui Hsieh

TL;DR
This paper introduces a GPU-accelerated, histogram-based algorithm for decision tree building that significantly improves the scalability and speed of training gradient boosting models without sacrificing accuracy.
Contribution
It develops a novel GPU-efficient histogram construction method for decision trees, enabling faster training in boosting systems compared to existing CPU and GPU algorithms.
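To make the idea of a feature histogram concrete, here is a minimal CPU-side sketch in plain Python. It is not the paper's GPU kernel. It assumes features have already been discretized into integer bin indices, and the function name `build_feature_histograms` and the `[sum_grad, sum_hess, count]` cell layout are hypothetical choices made for illustration.

```python
def build_feature_histograms(binned, grad, hess, num_bins):
    """Accumulate per-bin statistics for every feature.

    binned: list of rows, each a list of integer bin indices in [0, num_bins)
    grad, hess: per-sample gradient and hessian values
    Returns hist[feature][bin] = [sum_grad, sum_hess, count].
    """
    n_features = len(binned[0])
    hist = [[[0.0, 0.0, 0] for _ in range(num_bins)] for _ in range(n_features)]
    for row, g, h in zip(binned, grad, hess):
        for f, b in enumerate(row):
            cell = hist[f][b]
            # On a GPU, many threads may hit the same bin at once, so each of
            # these additions would be an atomic update; here they run serially.
            cell[0] += g
            cell[1] += h
            cell[2] += 1
    return hist


# Three samples, two features, three bins per feature (illustrative data).
binned = [[0, 1], [0, 0], [2, 1]]
grad = [1.0, 2.0, 3.0]
hess = [1.0, 1.0, 1.0]
hist = build_feature_histograms(binned, grad, hess, num_bins=3)
```

The inner additions are the scatter-style updates that collide when several samples fall into the same bin, which is where the atomic update conflicts on a GPU arise.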
Findings
7-8 times faster than CPU-based histogram algorithms in LightGBM
25 times faster than exact-split algorithms in XGBoost
Achieves similar accuracy with significantly reduced training time
Abstract
In this paper, we present a novel massively parallel algorithm for accelerating the decision tree building procedure on GPUs (Graphics Processing Units), which is a crucial step in Gradient Boosted Decision Tree (GBDT) and random forest training. Previous GPU-based tree building algorithms are based on parallel multi-scan or radix sort to find the exact tree split, and thus suffer from scalability and performance issues. We show that using a histogram-based algorithm to approximately find the best split is more efficient and scalable on GPU. By identifying the difference between classical GPU-based image histogram construction and the feature histogram construction in decision tree training, we develop a fast feature histogram building kernel on GPU with carefully designed computational and memory access sequences to reduce atomic update conflicts and maximize GPU utilization. Our…
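As a rough illustration of how a histogram yields an approximate best split, the sketch below scans the bin boundaries of a single feature. The scoring rule is an assumption, not taken from the abstract: it is the commonly used second-order gain with an L2 regularization term `lam`. The function name `best_split_from_histogram` is hypothetical.

```python
def best_split_from_histogram(hist_grad, hist_hess, lam=1.0):
    """Return (best_bin, best_gain) for one feature.

    hist_grad, hist_hess: per-bin sums of gradients and hessians.
    Samples whose bin index is <= best_bin go to the left child.
    The gain formula is an assumed, commonly used second-order criterion.
    """
    G, H = sum(hist_grad), sum(hist_hess)
    parent_score = G * G / (H + lam)
    best_bin, best_gain = None, float("-inf")
    GL = HL = 0.0
    # Only num_bins - 1 thresholds are candidates, regardless of sample count.
    for b in range(len(hist_grad) - 1):
        GL += hist_grad[b]
        HL += hist_hess[b]
        GR, HR = G - GL, H - HL
        gain = GL * GL / (HL + lam) + GR * GR / (HR + lam) - parent_score
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain


# Four bins whose gradients change sign between bin 1 and bin 2.
best_bin, best_gain = best_split_from_histogram(
    [-4.0, -4.0, 4.0, 4.0], [2.0, 2.0, 2.0, 2.0]
)
```

Because the scan visits bins rather than sorted feature values, its cost depends on the number of bins, not on the number of samples. This is the sense in which the split is approximate rather than exact.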
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Neural Networks and Applications
