Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs

Miguel de Prado; Andrew Mundy; Rabia Saeed; Maurizio Denna; Nuria Pazos; Luca Benini

arXiv:2006.05181·cs.LG·December 29, 2020

TL;DR

This paper introduces an automated, reinforcement-learning-based framework for optimizing deep neural network deployment on Arm Cortex-A CPUs, improving performance by up to 4x and reducing memory usage by over 2x with negligible accuracy loss.

Contribution

It presents a novel automated exploration framework that combines reinforcement learning with deep learning inference to optimize DNN deployment across software levels on embedded CPUs.
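As a rough illustration of what a reinforcement-learning search over deployment choices can look like, the sketch below runs tabular Q-learning over per-layer implementation choices. It is not the authors' framework: the layer count, the implementation names (`gemm_f32`, `winograd_f32`, `gemm_int8`), the latency table, and the data-type conversion penalty are all hypothetical values invented for this example, and a real setup would measure latency on the target device rather than read it from a table.

```python
import itertools
import random

# Hypothetical per-layer latency in ms for each candidate implementation.
# These numbers are invented for illustration, not measurements.
LATENCY_MS = [
    {"gemm_f32": 4.0, "winograd_f32": 2.5, "gemm_int8": 1.5},
    {"gemm_f32": 6.0, "winograd_f32": 3.0, "gemm_int8": 5.5},
    {"gemm_f32": 2.0, "winograd_f32": 2.2, "gemm_int8": 0.9},
]
# Hypothetical cost in ms of converting data between f32 and int8 layers.
CONVERSION_MS = 0.5


def data_type(choice):
    """Return the data-type suffix of an implementation name."""
    return choice.rsplit("_", 1)[1]


def step_cost(layer, prev, choice):
    """Latency of one layer, plus a penalty if the data type changes."""
    cost = LATENCY_MS[layer][choice]
    if prev is not None and data_type(prev) != data_type(choice):
        cost += CONVERSION_MS
    return cost


def total_latency(choices):
    """End-to-end latency of a full per-layer configuration."""
    prevs = (None,) + tuple(choices[:-1])
    return sum(step_cost(i, p, c) for i, (p, c) in enumerate(zip(prevs, choices)))


def q_learning_search(episodes=3000, epsilon=0.3, alpha=0.5, seed=0):
    """Tabular Q-learning; state = (layer, previous choice), action = choice."""
    rng = random.Random(seed)
    q = {}  # unseen entries default to 0.0, which is optimistic here
    n_layers = len(LATENCY_MS)

    def best_action(layer, prev):
        return max(LATENCY_MS[layer], key=lambda a: q.get((layer, prev, a), 0.0))

    for _ in range(episodes):
        prev = None
        for layer in range(n_layers):
            # Epsilon-greedy exploration over the candidate implementations.
            if rng.random() < epsilon:
                action = rng.choice(list(LATENCY_MS[layer]))
            else:
                action = best_action(layer, prev)
            reward = -step_cost(layer, prev, action)
            # Bootstrapped value of the next layer (0 after the last layer).
            if layer + 1 < n_layers:
                nxt = best_action(layer + 1, action)
                future = q.get((layer + 1, action, nxt), 0.0)
            else:
                future = 0.0
            key = (layer, prev, action)
            old = q.get(key, 0.0)
            q[key] = old + alpha * (reward + future - old)
            prev = action

    # Greedy rollout of the learned policy.
    choices, prev = [], None
    for layer in range(n_layers):
        prev = best_action(layer, prev)
        choices.append(prev)
    return tuple(choices)


def brute_force():
    """Exhaustive baseline, feasible only because this toy space is tiny."""
    return min(itertools.product(*LATENCY_MS), key=total_latency)
```

In this toy table, a mixed configuration beats both the all-int8 and the all-Winograd ones once conversion costs are counted, which is the kind of cross-layer interaction that makes exhaustive testing impractical at realistic scale. Calling `q_learning_search()` and comparing against `brute_force()` is a simple sanity check for the sketch.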

Findings

1. Up to 4x performance improvement on Arm Cortex-A CPUs.

2. Over 2x reduction in memory usage.

3. Negligible accuracy loss compared to standard implementations.

Abstract

The spread of deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNNs). Works have mainly focused on: i) efficient DNN architectures, ii) network optimisation techniques such as pruning and quantisation, iii) optimised algorithms to speed up the execution of the most computationally intensive layers, and iv) dedicated hardware to accelerate the data flow and computation. However, there is a lack of research on cross-level optimisation, as the space of approaches becomes too large to test and obtain a globally optimised solution, thus leading to suboptimal deployment in terms of latency, accuracy, and memory. In this work, we first detail and analyse the methods to improve the deployment of DNNs across the different levels of software optimisation. Building on this knowledge, we present an automated…

Taxonomy

Methods: Pruning