Fast Inference of Tree Ensembles on ARM Devices
Simon Koschel, Sebastian Buschjäger, Claudio Lucchese, Katharina Morik

TL;DR
This paper adapts and optimizes tree ensemble inference for ARM CPUs, reporting speed-ups of up to 9.4 times and showing that quantized models run faster than floating-point ones, which matters for deploying ML models on IoT devices.
Contribution
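The paper ports QuickScorer and its siblings from Intel's AVX to ARM's NEON instruction set, extends the implementation from ranking models to classification models such as Random Forests, investigates fixed-point quantization, and analyzes how the underlying architecture affects performance.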
Findings
Speed-up of up to 9.4 times on ARM CPUs.
Quantized models outperform floating-point models in speed.
Implementation effectiveness varies across ARM devices.
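As an illustration of the quantization finding, the sketch below shows how a tree's split test can be moved from floating-point to fixed-point integers. This summary does not state the paper's fixed-point format, so the 16-bit word with 8 fractional bits and the helper names `quantize`, `goes_left_float`, and `goes_left_fixed` are assumptions made for this example only.

```python
import math


def quantize(value: float, frac_bits: int = 8, word_bits: int = 16) -> int:
    """Map a float to a signed fixed-point integer (assumed format, not the paper's)."""
    scaled = math.floor(value * (1 << frac_bits))
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    # Saturate instead of wrapping around on overflow.
    return max(lo, min(hi, scaled))


def goes_left_float(x: float, threshold: float) -> bool:
    """Split test on the original floating-point values."""
    return x <= threshold


def goes_left_fixed(x: float, threshold: float, frac_bits: int = 8) -> bool:
    """Same split test, but on quantized integers."""
    return quantize(x, frac_bits) <= quantize(threshold, frac_bits)
```

With integer comparisons, more values fit into one SIMD register, which is one plausible reason quantized models can be faster. The price is resolution: `goes_left_float(0.3110, 0.3105)` is false, but both values quantize to 79 under the assumed format, so `goes_left_fixed(0.3110, 0.3105)` is true.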
Abstract
With the ongoing integration of Machine Learning models into everyday life, e.g. in the form of the Internet of Things (IoT), the evaluation of learned models becomes an increasingly important issue. Tree ensembles are one of the best black-box classifiers available and routinely outperform more complex classifiers. While the fast application of tree ensembles has already been studied in the literature for Intel CPUs, it has not yet been studied in the context of ARM CPUs, which are more dominant for IoT applications. In this paper, we first convert the popular QuickScorer algorithm and its siblings from Intel's AVX to ARM's NEON instruction set. Second, we extend our implementation from ranking models to classification models such as Random Forests. Third, we investigate the effects of using fixed-point quantization in Random Forests. Our study shows that a careful implementation of tree…
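For readers unfamiliar with QuickScorer, the following is a minimal scalar sketch of its core idea: each tree keeps a bitvector of candidate leaves, every split whose test fails clears the leaves of its left subtree, and the exit leaf is the leftmost bit still set. This is not the authors' NEON implementation; the two-tree ensemble, the `Node` type, and the convention that leaf 0 is the least significant bit are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    threshold: float
    tree: int
    mask: int  # bit i is 0 if leaf i lies in this node's left subtree


# Hypothetical ensemble. Leaves are numbered left to right, leaf 0 = bit 0.
# Tree 0: x[0] <= 0.5 ? leaf0 : (x[1] <= 2.0 ? leaf1 : leaf2)
# Tree 1: x[1] <= 1.0 ? leaf0 : leaf1
NODES_BY_FEATURE = {
    0: [Node(0.5, tree=0, mask=0b110)],
    1: [Node(1.0, tree=1, mask=0b10), Node(2.0, tree=0, mask=0b101)],
}  # each list is sorted by ascending threshold
LEAF_VALUES = [[1.0, 2.0, 3.0], [10.0, 20.0]]


def leftmost_leaf(bits: int) -> int:
    """Index of the lowest set bit, i.e. the leftmost surviving leaf."""
    return (bits & -bits).bit_length() - 1


def score(x: list[float]) -> float:
    # Start with every leaf of every tree marked as reachable.
    bitvectors = [(1 << len(values)) - 1 for values in LEAF_VALUES]
    for feature, nodes in NODES_BY_FEATURE.items():
        for node in nodes:
            if x[feature] <= node.threshold:
                break  # thresholds are sorted, so all later tests pass too
            # Test failed: the left subtree of this node is unreachable.
            bitvectors[node.tree] &= node.mask
    return sum(
        LEAF_VALUES[t][leftmost_leaf(bits)] for t, bits in enumerate(bitvectors)
    )
```

The feature-wise loop touches nodes in sorted order and stops early, which is what makes the scheme amenable to vectorization: a SIMD version would compare one threshold against several instances at once.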
Taxonomy
Topics: Machine Learning and Data Classification · Solar Radiation and Photovoltaics · Neural Networks and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
