Optimizing Deep Learning Inference on Embedded Systems Through Adaptive Model Selection
Vicent Sanz Marco, Ben Taylor, Zheng Wang, Yehia Elkhatib

TL;DR
This paper introduces an adaptive model selection method that dynamically chooses the most suitable DNN for each input on an embedded device, balancing accuracy and latency while keeping inference on-device and so avoiding the privacy and connectivity problems of cloud offloading.
Contribution
It presents a machine learning-based predictive model for selecting optimal DNNs on embedded devices, improving inference efficiency while maintaining accuracy.
Findings
1.8x faster inference for image classification than the most capable single DNN, with better accuracy
1.34x faster inference for machine translation than the most capable single model, with minimal quality loss
Effective on the NVIDIA Jetson TX2 platform across a diverse set of DNN models
Abstract
Deep neural networks (DNNs) are becoming a key enabling technology for many application domains. However, on-device inference on battery-powered, resource-constrained embedded systems is often infeasible due to the prohibitively long inference time and resource requirements of many DNNs. Offloading computation to the cloud is often unacceptable due to privacy concerns, high latency, or the lack of connectivity. While compression algorithms often succeed in reducing inference time, they come at the cost of reduced accuracy. This paper presents a new, alternative approach to enable efficient execution of DNNs on embedded devices. Our approach dynamically determines which DNN to use for a given input, by considering the desired accuracy and inference time. It employs machine learning to develop a low-cost predictive model to quickly select a pre-trained DNN to use for a given input…
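To make the selection idea in the abstract concrete, here is a minimal Python sketch of per-input model selection. It is an illustration under stated assumptions, not the authors' implementation: the nearest-neighbour predictor, the two-element feature vectors, the model names "small" and "large", the latency figures, and the helper names cheapest_correct and select_model are all hypothetical. Offline, each training input is labelled with the fastest candidate that handled it correctly; at run time, a cheap lookup on the input's features picks which pre-trained model to run.

```python
from dataclasses import dataclass
from math import dist
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """A pre-trained model with its measured latency (values here are made up)."""
    name: str
    latency_ms: float
    run: Callable[[Sequence[float]], str]


def cheapest_correct(correct: Dict[str, bool],
                     candidates: Dict[str, Candidate],
                     fallback: str) -> str:
    """Offline labelling: name of the fastest model that got this input right.

    If no candidate was correct, return the fallback model name.
    """
    ok = [c for c in candidates.values() if correct.get(c.name, False)]
    return min(ok, key=lambda c: c.latency_ms).name if ok else fallback


def select_model(features: Sequence[float],
                 training_set: List[Tuple[Sequence[float], str]],
                 candidates: Dict[str, Candidate]) -> Candidate:
    """Run-time selection: 1-nearest-neighbour lookup over cheap input features."""
    _, label = min(training_set, key=lambda example: dist(example[0], features))
    return candidates[label]


# Hypothetical candidates; the lambdas stand in for real DNN inference calls.
candidates = {
    "small": Candidate("small", 10.0, lambda x: "small-result"),
    "large": Candidate("large", 50.0, lambda x: "large-result"),
}

# Hypothetical training data: (feature vector, label from cheapest_correct).
training_set = [
    ((0.1, 0.2), cheapest_correct({"small": True, "large": True}, candidates, "large")),
    ((0.2, 0.1), cheapest_correct({"small": True, "large": True}, candidates, "large")),
    ((0.9, 0.8), cheapest_correct({"small": False, "large": True}, candidates, "large")),
]

easy_input = (0.15, 0.15)
hard_input = (0.8, 0.9)
print(select_model(easy_input, training_set, candidates).name)
print(select_model(hard_input, training_set, candidates).name)
```

The selector only pays off if computing the features and running the lookup costs far less than the inference time it saves, which is why the abstract stresses a low-cost predictive model.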
