Optimization of Oblivious Decision Tree Ensembles Evaluation for CPU
Alexey Mironov, Ilnur Khuziev

TL;DR
This paper speeds up evaluation of CatBoost's oblivious decision tree ensembles on a single CPU core using AVX instruction sets: 20-40% faster with AVX2 at no cost in quality, and 50-70% faster with float16 leaf values and AVX-512 at a minimal cost in quality.
Contribution
It introduces AVX-based optimizations for CatBoost evaluation on a single CPU core, along with a new speed-quality trade-off based on storing leaf values as float16.
Findings
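20-40% performance increase with AVX2 instructions, without quality impact
50-70% speed-up using float16 leaf values and AVX-512 instructions
The float16 variant trades a minimal loss in model quality for the additional speed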
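
The speed-quality trade-off comes from storing leaf values in float16, which rounds each value to about three significant decimal digits. The following is a minimal sketch of that rounding only, not the paper's implementation: the helper name `to_float16` and the sample leaf values are hypothetical, and the half-precision conversion relies on the `"e"` format of Python's standard `struct` module.

```python
import struct


def to_float16(value: float) -> float:
    """Round a float to the nearest IEEE 754 half-precision value.

    Hypothetical helper for illustration; packing with the "e" format
    stores the value in 16 bits, unpacking reads the rounded value back.
    """
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Made-up leaf values, not taken from any real CatBoost model.
leaf_values = [0.1, -0.03125, 1.5, 0.3333333]
rounded = [to_float16(v) for v in leaf_values]

# Worst-case absolute rounding error over these sample values.
max_abs_error = max(abs(a - b) for a, b in zip(leaf_values, rounded))

for original, half in zip(leaf_values, rounded):
    print(f"{original!r:>12} -> {half!r}")
print("max abs error:", max_abs_error)
```

Values that are exactly representable in half precision (such as 1.5 or -0.03125) survive unchanged, while others (such as 0.1) pick up a small relative error. How that error accumulates across a whole ensemble depends on the model, which is why the result is a trade-off rather than a free speed-up.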
Abstract
CatBoost is a popular machine learning library. CatBoost models are based on oblivious decision trees, making training and evaluation rapid. CatBoost has many applications, and some require low latency and high throughput evaluation. This paper investigates the possibilities for improving CatBoost's performance in single-core CPU computations. We explore the new features provided by the AVX instruction sets to optimize evaluation. We increase performance by 20-40% using AVX2 instructions without quality impact. We also introduce a new trade-off between speed and quality. Using float16 for leaf values and AVX-512 instructions, we achieve 50-70% speed-up.
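
An oblivious decision tree applies the same split to every node at a given depth, so evaluating it reduces to one comparison per level and a single table lookup, with no data-dependent branching. This regular structure is what makes the evaluation a natural fit for SIMD instructions. The scalar sketch below illustrates the idea under stated assumptions: the class name `ObliviousTree`, the use of raw float thresholds, the `>` comparison, and the bit order of the leaf index are all illustrative choices, not CatBoost's actual data layout or API.

```python
from typing import List, Sequence


class ObliviousTree:
    """Illustrative oblivious tree: one (feature, threshold) pair per level."""

    def __init__(self, feature_ids: Sequence[int],
                 thresholds: Sequence[float],
                 leaf_values: Sequence[float]) -> None:
        if len(feature_ids) != len(thresholds):
            raise ValueError("one threshold is required per level")
        if len(leaf_values) != 1 << len(feature_ids):
            raise ValueError("a tree of depth d needs 2**d leaf values")
        self.feature_ids = list(feature_ids)
        self.thresholds = list(thresholds)
        self.leaf_values = list(leaf_values)

    def leaf_index(self, x: Sequence[float]) -> int:
        """Build the leaf index bit by bit, one comparison per level."""
        index = 0
        for level, (feature, threshold) in enumerate(
                zip(self.feature_ids, self.thresholds)):
            # The comparison result (0 or 1) becomes bit `level` of the index.
            index |= int(x[feature] > threshold) << level
        return index

    def predict(self, x: Sequence[float]) -> float:
        return self.leaf_values[self.leaf_index(x)]


def predict_ensemble(trees: List[ObliviousTree], x: Sequence[float]) -> float:
    """Sum the leaf values selected in every tree of the ensemble."""
    return sum(tree.predict(x) for tree in trees)


# Depth-2 tree with made-up splits: feature 0 > 0.5, then feature 1 > 2.0.
tree = ObliviousTree([0, 1], [0.5, 2.0], [1.0, 2.0, 3.0, 4.0])
print(tree.leaf_index([0.7, 1.0]), tree.predict([0.7, 1.0]))
print(predict_ensemble([tree, tree], [0.7, 3.0]))
```

Because every sample performs the same sequence of comparisons, a vectorized version can process several samples per instruction and fetch the selected leaf values with a gather-style lookup. The sketch above deliberately stays scalar and does not reproduce the paper's AVX2 or AVX-512 kernels.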
