High-Throughput CNN Inference on Embedded ARM big.LITTLE Multi-Core Processors
Siqi Wang, Gayathri Ananthanarayanan, Yifan Zeng, Neeraj Goel, Anuj Pathania, Tulika Mitra

TL;DR
This paper introduces Pipe-it, a pipelined framework for CNN inference on ARM big.LITTLE processors that improves throughput by optimizing layer distribution across heterogeneous cores.
Contribution
It proposes a novel pipelined approach with a performance prediction model to efficiently balance CNN layers across heterogeneous cores, surpassing previous methods.
Findings
Achieves 39% higher throughput on average compared to prior approaches.
Effectively models execution time using only convolutional layer descriptors.
Demonstrates improved utilization of heterogeneous core architectures.
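The summary does not say how the layer descriptors enter the prediction model. As a hedged illustration only, the sketch below shows one quantity that can be derived from a convolutional layer's descriptor alone: its multiply-accumulate (MAC) count. The `ConvDescriptor` fields and the `conv_macs` helper are hypothetical names chosen for this example, not the paper's feature set or model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvDescriptor:
    """Hypothetical descriptor of a standard (non-grouped) convolutional layer."""
    in_channels: int
    out_channels: int
    kernel_h: int
    kernel_w: int
    out_h: int  # output feature-map height
    out_w: int  # output feature-map width


def conv_macs(d: ConvDescriptor) -> int:
    """Multiply-accumulate operations for one forward pass of the layer.

    Each output element (out_h * out_w * out_channels of them) needs
    in_channels * kernel_h * kernel_w multiply-accumulates.
    """
    return (d.out_h * d.out_w * d.out_channels
            * d.in_channels * d.kernel_h * d.kernel_w)


# Example: a 3x3 convolution from 3 to 64 channels on a 224x224 output map.
first_layer = ConvDescriptor(in_channels=3, out_channels=64,
                             kernel_h=3, kernel_w=3, out_h=224, out_w=224)
print(conv_macs(first_layer))
```

A quantity like this needs no profiling run on the target device, which is the appeal of a descriptor-only model; how well it tracks measured execution time on a given cluster is an empirical question the sketch does not answer.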
Abstract
IoT Edge intelligence requires Convolutional Neural Network (CNN) inference to take place on the edge devices themselves. The ARM big.LITTLE architecture is at the heart of prevalent commercial edge devices. It comprises single-ISA heterogeneous cores grouped into multiple homogeneous clusters that enable power and performance trade-offs. All cores are expected to be employed simultaneously in inference to attain maximal throughput. However, the high communication overhead involved in parallelizing computations from convolution kernels across clusters is detrimental to throughput. We present an alternative framework called Pipe-it that employs a pipelined design to split convolutional layers across clusters while limiting parallelization of their respective kernels to the assigned cluster. We develop a performance-prediction model that utilizes only the convolutional layer descriptors to…
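The pipelining idea in the abstract can be illustrated with a deliberately simplified sketch. It assumes exactly two pipeline stages (one per cluster), contiguous layer assignment, and per-layer execution times that are already known; the function name and all timing values are hypothetical and are not taken from the paper. In steady state a pipeline's throughput is limited by its slowest stage, so the sketch searches for the split point that minimizes the larger of the two stage times.

```python
from typing import Sequence, Tuple


def best_two_stage_split(big_times: Sequence[float],
                         little_times: Sequence[float]) -> Tuple[int, float]:
    """Return (split, bottleneck) for a two-stage pipeline.

    Layers [0, split) run on the big cluster and layers [split, n) run on
    the LITTLE cluster. The bottleneck is the slower stage's total time,
    which bounds steady-state throughput at 1 / bottleneck.
    """
    if len(big_times) != len(little_times):
        raise ValueError("both clusters need one time per layer")
    n = len(big_times)
    best_split, best_bottleneck = 0, float("inf")
    for split in range(n + 1):
        stage_big = sum(big_times[:split])
        stage_little = sum(little_times[split:])
        bottleneck = max(stage_big, stage_little)
        if bottleneck < best_bottleneck:
            best_split, best_bottleneck = split, bottleneck
    return best_split, best_bottleneck


# Hypothetical per-layer times in milliseconds (illustrative only).
big_ms = [4.0, 6.0, 5.0, 3.0]
little_ms = [9.0, 14.0, 11.0, 6.0]

split, bottleneck_ms = best_two_stage_split(big_ms, little_ms)
print(split, bottleneck_ms)  # 3 15.0
```

With these made-up numbers, running every layer on the big cluster takes 18 ms per image, while placing the last layer on the LITTLE cluster lowers the bottleneck to 15 ms per image. The sketch ignores inter-stage transfer cost and finer-grained core allocation, both of which a real framework would have to account for.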
Taxonomy
Methods: Convolution
