High performance and energy efficient inference for deep learning on ARM processors
Adrián Castelló, Sergio Barrachina, Manuel F. Dolz, Enrique S. Quintana-Ortí, Pau San Juan

TL;DR
This paper presents an optimized deep learning inference framework for ARM processors that improves throughput and energy efficiency by leveraging thread-level parallelism, micro-kernels, and cache tuning, demonstrated on ResNet50.
Contribution
It introduces a highly optimized inference engine for ARM processors with novel micro-kernels and cache configurations, outperforming TFLite and approaching ArmNN performance.
Findings
Superior inference throughput compared to TFLite.
Energy-efficient inference on ARM processors.
Competitive performance with ArmNN.
Abstract
We evolve PyDTNN, a framework for distributed parallel training of Deep Neural Networks (DNNs), into an efficient inference tool for convolutional neural networks. Our optimization process on multicore ARM processors involves several high-level transformations of the original framework, such as the development and integration of Cython routines to exploit thread-level parallelism; the design and development of micro-kernels for the matrix multiplication, vectorized with ARM's NEON intrinsics, that can accommodate layer fusion; and the appropriate selection of several cache configuration parameters tailored to the memory hierarchy of the target ARM processors. Our experiments evaluate both inference throughput (measured in processed images/s) and inference latency (i.e., time-to-response) as well as energy consumption per image when varying the level of thread parallelism and the…
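The abstract names three ingredients: a matrix-multiplication micro-kernel, layer fusion inside that kernel, and cache configuration parameters. The sketch below illustrates how these pieces commonly fit together in a blocked GEMM. It is not the PyDTNN code, which the abstract describes as Cython with NEON intrinsics. The function names and the block-size names (`mc`, `nc`, `kc`, `mr`, `nr`, following the usual GotoBLAS/BLIS convention) are assumptions for illustration, as is the choice to fuse a per-column bias and a ReLU. Operand packing and vectorization are omitted.

```python
def micro_kernel(A, B, C, rows, cols, ks, bias, relu, last_panel):
    """Update one small tile of C with a sequence of rank-1 updates.

    Hypothetical helper: a real kernel would keep `acc` in vector
    registers and use SIMD instructions instead of Python loops.
    """
    acc = [[C[i][j] for j in cols] for i in rows]
    for p in ks:
        for a, i in enumerate(rows):
            a_ip = A[i][p]
            for b, j in enumerate(cols):
                acc[a][b] += a_ip * B[p][j]
    for a, i in enumerate(rows):
        for b, j in enumerate(cols):
            v = acc[a][b]
            # Fused epilogue: only once the full sum over k is complete,
            # so partial sums are never clamped by the ReLU.
            if last_panel:
                if bias is not None:
                    v += bias[j]
                if relu:
                    v = max(v, 0.0)
            C[i][j] = v


def blocked_gemm(A, B, bias=None, relu=False, mc=4, nc=4, kc=4, mr=2, nr=2):
    """Compute C = A @ B (optionally followed by bias and ReLU) in blocks.

    mc, nc, kc are the cache block sizes; mr, nr are the micro-tile sizes.
    In a tuned library these would be chosen per cache level of the target.
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for jc in range(0, n, nc):                      # column panel of B
        j_end = min(jc + nc, n)
        for pc in range(0, k, kc):                  # slice of the k dimension
            ks = range(pc, min(pc + kc, k))
            last_panel = pc + kc >= k
            for ic in range(0, m, mc):              # row panel of A
                i_end = min(ic + mc, m)
                for jr in range(jc, j_end, nr):     # micro-tile columns
                    cols = range(jr, min(jr + nr, j_end))
                    for ir in range(ic, i_end, mr): # micro-tile rows
                        rows = range(ir, min(ir + mr, i_end))
                        micro_kernel(A, B, C, rows, cols, ks,
                                     bias, relu, last_panel)
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[1, -1], [0, 2], [-1, 0]]
print(blocked_gemm(A, B, bias=[3, -4], relu=True, kc=2))
# prints [[1.0, 0.0], [1.0, 2.0]]
```

Applying the bias and activation while the tile is still held by the kernel avoids a second pass over the output matrix, which is the usual motivation for fusing layers at this level.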
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Brain Tumor Detection and Classification
