Runtime Optimizations for Prediction with Tree-Based Models
Nima Asadi, Jimmy Lin, and Arjen P. de Vries

TL;DR
This paper presents techniques for speeding up prediction with already-trained tree-based models: cache-conscious memory layouts, branch removal via predication, and vectorization (micro-batching of predictions), all aimed at better exploiting modern superscalar processors.
Contribution
It introduces architecture-aware methods such as cache-conscious data structures, predication, and vectorization to significantly accelerate tree-based model predictions.
Findings
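Significant speedup over hard-coded if-else blocks
Better cache utilization from cache-conscious memory layouts
Fewer branches in the execution flow through predication
Micro-batched predictions through vectorization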
Abstract
Tree-based models have proven to be an effective solution for web ranking as well as other problems in diverse domains. This paper focuses on optimizing the runtime performance of applying such models to make predictions, given an already-trained model. Although exceedingly simple conceptually, most implementations of tree-based models do not efficiently utilize modern superscalar processor architectures. By laying out data structures in memory in a more cache-conscious fashion, removing branches from the execution flow using a technique called predication, and micro-batching predictions using a technique called vectorization, we are able to better exploit modern processor architectures and significantly improve the speed of tree-based models over hard-coded if-else blocks. Our work contributes to the exploration of architecture-conscious runtime implementations of machine learning…
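The three ideas named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the tree, the array names (`FEATURE`, `THRESHOLD`, `LEAF`), and the function names are hypothetical, and Python is used only for readability, so it shows the control-flow pattern rather than any real speedup. The sketch assumes a complete binary tree of fixed depth stored in flat arrays (a compact, cache-friendly layout), replaces the if-else at each node with index arithmetic (predication), and advances a small batch of instances one tree level at a time (the access pattern behind vectorization).

```python
from typing import List, Sequence

# Hypothetical complete binary tree of depth 2, stored in implicit heap order:
# the children of internal node i live at 2*i + 1 (left) and 2*i + 2 (right).
DEPTH = 2
FEATURE = [0, 1, 1]             # feature id tested at each internal node
THRESHOLD = [0.5, 0.3, 0.8]     # threshold compared at each internal node
LEAF = [10.0, 20.0, 30.0, 40.0]  # leaf outputs, left to right
N_INTERNAL = len(FEATURE)


def predict_branchy(x: Sequence[float]) -> float:
    """Baseline: one conditional branch per tree level."""
    i = 0
    for _ in range(DEPTH):
        if x[FEATURE[i]] > THRESHOLD[i]:
            i = 2 * i + 2
        else:
            i = 2 * i + 1
    return LEAF[i - N_INTERNAL]


def predict_predicated(x: Sequence[float]) -> float:
    """Predication: the comparison result (0 or 1) is added to the child
    index, so the traversal itself contains no if-else."""
    i = 0
    for _ in range(DEPTH):
        i = 2 * i + 1 + int(x[FEATURE[i]] > THRESHOLD[i])
    return LEAF[i - N_INTERNAL]


def predict_batch(batch: Sequence[Sequence[float]]) -> List[float]:
    """Micro-batching: every instance in the batch moves down one level
    before any instance moves down the next."""
    idx = [0] * len(batch)
    for _ in range(DEPTH):
        for k, x in enumerate(batch):
            i = idx[k]
            idx[k] = 2 * i + 1 + int(x[FEATURE[i]] > THRESHOLD[i])
    return [LEAF[i - N_INTERNAL] for i in idx]
```

In a compiled language the predicated form turns a hard-to-predict branch into a data dependency, and the batched loop lets the memory accesses of independent instances overlap; whether either helps in practice depends on the tree depth, the batch size, and the processor.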
Taxonomy
Topics: Machine Learning and Data Classification · Data Stream Mining Techniques · Parallel Computing and Optimization Techniques
