High performance and energy efficient inference for deep learning on ARM processors

Adrián Castelló; Sergio Barrachina; Manuel F. Dolz; Enrique S. Quintana-Ortí; Pau San Juan

arXiv:2105.09187 · cs.DC · May 20, 2021

Open Access

TL;DR

This paper presents an optimized deep learning inference framework for ARM processors that improves throughput and energy efficiency by leveraging thread-level parallelism, micro-kernels, and cache tuning, demonstrated on ResNet50.

Contribution

It introduces a highly optimized inference engine for ARM processors with novel micro-kernels and cache configurations, outperforming TFLite and approaching ArmNN performance.

Findings

01 Superior inference throughput compared to TFLite.

02 Energy-efficient inference on ARM processors.

03 Competitive performance with ArmNN.

Abstract

We evolve PyDTNN, a framework for distributed parallel training of Deep Neural Networks (DNNs), into an efficient inference tool for convolutional neural networks. Our optimization process on multicore ARM processors involves several high-level transformations of the original framework, such as the development and integration of Cython routines to exploit thread-level parallelism; the design and development of micro-kernels for the matrix multiplication, vectorized with ARM's NEON intrinsics, that can accommodate layer fusion; and the appropriate selection of several cache configuration parameters tailored to the memory hierarchy of the target ARM processors. Our experiments evaluate both inference throughput (measured in processed images/s) and inference latency (i.e., time-to-response) as well as energy consumption per image when varying the level of thread parallelism and the…
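
The following is a minimal sketch, not PyDTNN's code, of how a cache-blocked matrix multiplication with a micro-kernel and layer fusion can be organized. It is written in plain Python for readability, whereas the abstract describes Cython routines and NEON-vectorized micro-kernels. The function names and the blocking parameter names (`mc`, `nc`, `kc` for the cache blocks, `mr`, `nr` for the micro-tile) are assumptions made for illustration, as is the choice of a per-column bias followed by ReLU as the fused operation.

```python
def micro_kernel(A, B, C, rows, cols, depth, bias=None, relu=False):
    """Accumulate A[rows, depth] @ B[depth, cols] into the matching tile of C.

    If bias/relu are supplied they are applied to the finished tile
    (a stand-in for layer fusion), so C is not traversed a second time.
    """
    for i in rows:
        for j in cols:
            acc = C[i][j]
            for p in depth:
                acc += A[i][p] * B[p][j]
            if bias is not None:
                acc += bias[j]          # assumed: one bias value per output column
            if relu:
                acc = max(acc, 0.0)
            C[i][j] = acc


def blocked_gemm(A, B, bias=None, relu=False, mc=4, nc=4, kc=4, mr=2, nr=2):
    """Compute C = A @ B (optionally + bias, then ReLU) with loop blocking.

    mc, nc, kc are hypothetical cache-blocking parameters; mr x nr is the
    micro-tile handled by one micro-kernel call.
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for jc in range(0, n, nc):                    # column block of B and C
        j_end = min(jc + nc, n)
        for pc in range(0, k, kc):                # block of the inner dimension
            depth = range(pc, min(pc + kc, k))
            last = pc + kc >= k                   # fuse only on the final update
            for ic in range(0, m, mc):            # row block of A and C
                i_end = min(ic + mc, m)
                for jr in range(jc, j_end, nr):
                    cols = range(jr, min(jr + nr, j_end))
                    for ir in range(ic, i_end, mr):
                        rows = range(ir, min(ir + mr, i_end))
                        micro_kernel(A, B, C, rows, cols, depth,
                                     bias if last else None, relu and last)
    return C
```

In this sketch the fused bias and activation run only on the last pass over the inner dimension, when each output tile holds its final sum. In a tuned implementation the block sizes would be chosen to match the cache sizes of the target processor; the defaults here are arbitrary.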

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Brain Tumor Detection and Classification