Optimizing Deep Learning Inference on Embedded Systems Through Adaptive Model Selection

Vicent Sanz Marco; Ben Taylor; Zheng Wang; Yehia Elkhatib

arXiv:1911.04946·cs.LG·November 13, 2019

TL;DR

This paper introduces an adaptive model selection method that chooses, for each input, the most suitable DNN for on-device embedded inference. It balances accuracy against latency and avoids the privacy and connectivity drawbacks of offloading to the cloud.

Contribution

The paper presents a low-cost, machine-learning-based predictive model that selects which pre-trained DNN to run for a given input on an embedded device, reducing inference time while maintaining accuracy.

Findings

1. 1.8x faster inference for image classification, with better accuracy

2. 1.34x faster inference for machine translation, with minimal quality loss

3. Effective on the Jetson TX2 platform across diverse DNN models

Abstract

Deep neural networks (DNNs) are becoming a key enabling technology for many application domains. However, on-device inference on battery-powered, resource-constrained embedded systems is often infeasible due to the prohibitively long inference time and resource requirements of many DNNs. Offloading computation to the cloud is often unacceptable due to privacy concerns, high latency, or the lack of connectivity. While compression algorithms often succeed in reducing inference time, they come at the cost of reduced accuracy. This paper presents a new, alternative approach to enable efficient execution of DNNs on embedded devices. Our approach dynamically determines which DNN to use for a given input by considering the desired accuracy and inference time. It employs machine learning to develop a low-cost predictive model to quickly select a pre-trained DNN to use for a given input…
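
The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-neighbour selector, the two-element feature vectors, the model names, the latency figures, and the latency-budget fallback are all hypothetical and stand in for whatever predictive model and features the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Candidate:
    """A pre-trained model plus its (hypothetical) offline-measured latency."""
    name: str
    latency_ms: float
    run: Callable[[Sequence[float]], str]


def nearest_label(features: Sequence[float],
                  training: List[Tuple[Sequence[float], str]]) -> str:
    """1-nearest-neighbour lookup over (feature_vector, model_name) pairs."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training, key=lambda pair: sq_dist(pair[0], features))[1]


def select_model(features: Sequence[float],
                 training: List[Tuple[Sequence[float], str]],
                 candidates: Dict[str, Candidate],
                 latency_budget_ms: float) -> Candidate:
    """Pick the predicted model; fall back to the fastest one if it exceeds the budget."""
    choice = candidates[nearest_label(features, training)]
    if choice.latency_ms <= latency_budget_ms:
        return choice
    return min(candidates.values(), key=lambda c: c.latency_ms)


# Hypothetical candidates: stand-ins for a small and a large pre-trained DNN.
candidates = {
    "small": Candidate("small", 20.0, lambda x: "small-output"),
    "large": Candidate("large", 120.0, lambda x: "large-output"),
}

# Hypothetical training data: cheap input features labelled with the model
# that was judged best for that input during offline profiling.
training = [((0.1, 0.2), "small"), ((0.9, 0.8), "large")]

easy_input = (0.15, 0.25)
hard_input = (0.8, 0.9)
print(select_model(easy_input, training, candidates, 200.0).name)
print(select_model(hard_input, training, candidates, 200.0).name)
print(select_model(hard_input, training, candidates, 50.0).name)
```

The selector itself has to be far cheaper than any candidate DNN, otherwise its cost cancels out the time saved by picking a smaller model. That is why this sketch uses a lookup over a handful of cheap features rather than another network.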
