Optimization of Decision Tree Evaluation Using SIMD Instructions
Alexey Mironov, Ilnur Khuziev

TL;DR
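This paper investigates how AVX CPU instructions can speed up decision tree evaluation relative to existing SSE-based scoring. It reports a 35% speedup in the binarization stage and a 20% speedup in the trees apply stage on a ranking model, which matters for model scoring in large-scale machine learning applications.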
Contribution
The paper demonstrates that AVX instructions can make decision tree evaluation more efficient, building upon the prior SSE-based scoring methods used in MatrixNet and CatBoost.
Findings
35% speedup in binarization stage
20% speedup in trees apply stage
Improved CPU-based decision tree scoring performance
Abstract
Decision forests (decision tree ensembles) are among the most popular machine learning algorithms. To use large models on big data, such as document scoring with learning-to-rank models, we need to evaluate these models efficiently. In this paper, we explore MatrixNet, the ancestor of the popular CatBoost library. Both libraries use the SSE instruction set for scoring on CPU. This paper investigates the opportunities offered by the AVX instruction set to evaluate models more efficiently. We achieved a 35% speedup on the binarization stage (node condition comparison) and a 20% speedup on the trees apply stage on a ranking model.
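The two stages named in the abstract can be illustrated with a minimal sketch. This is not the paper's code: it assumes oblivious trees (one condition per tree level, as CatBoost uses), a batch of eight documents, and hypothetical helper names `binarize8` and `apply_tree`. The AVX path is used only when the compiler defines `__AVX__`; otherwise an equivalent scalar loop runs.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Binarization stage (hypothetical helper): compare one feature of
   8 documents against a border. Bit d of the result is set when
   values[d] > border. */
static uint8_t binarize8(const float *values, float border)
{
#if defined(__AVX__)
    __m256 v  = _mm256_loadu_ps(values);          /* 8 floats, unaligned load */
    __m256 b  = _mm256_set1_ps(border);           /* broadcast the border     */
    __m256 gt = _mm256_cmp_ps(v, b, _CMP_GT_OQ);  /* 8 comparisons at once    */
    return (uint8_t)_mm256_movemask_ps(gt);       /* pack sign bits into mask */
#else
    uint8_t mask = 0;
    for (int d = 0; d < 8; ++d)
        if (values[d] > border)
            mask |= (uint8_t)(1u << d);
    return mask;
#endif
}

/* Trees apply stage (hypothetical helper, assumes an oblivious tree):
   cond_masks[level] holds the binarized condition for that level, one
   bit per document. The bits for a document form its leaf index. */
static double apply_tree(const uint8_t *cond_masks, int depth,
                         const double *leaves, int doc)
{
    assert(depth >= 0 && depth <= 16);
    assert(doc >= 0 && doc < 8);
    unsigned leaf = 0;
    for (int level = 0; level < depth; ++level)
        leaf |= ((unsigned)(cond_masks[level] >> doc) & 1u) << level;
    return leaves[leaf];
}
```

With GCC or Clang, compiling with `-mavx` selects the intrinsic path. An SSE version of the same comparison would handle four floats per instruction, so the wider AVX registers halve the number of compare instructions for the same batch; the speedups the paper reports come from its own implementation, not from this sketch.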
Taxonomy
Topics: Neural Networks and Applications · Data Mining Algorithms and Applications · Machine Learning and Data Classification
Methods: Convolution · Stochastic Steady-state Embedding · MatrixNet
