Characterizing the Deep Neural Networks Inference Performance of Mobile Applications
Samuel S. Ogden, Tian Guo

TL;DR
This paper compares on-device and cloud-based deep neural network inference for mobile applications, measures the performance of each, and proposes CNNSelect, a dynamic model selection algorithm that improves inference accuracy and SLA compliance under variable conditions.
Contribution
It provides a comprehensive measurement study of inference performance on mobile and cloud, and introduces CNNSelect, a novel algorithm for adaptive CNN model selection in mobile applications.
Findings
Newer mobile devices can run optimized CNN models efficiently.
Cloud-based inference can be hindered by network variability.
CNNSelect improves accuracy and SLA adherence in most cases.
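The idea behind SLA-aware model selection can be illustrated with a minimal sketch. This is not the paper's CNNSelect algorithm, whose details are not given in this summary; it is a simple greedy stand-in that assumes each candidate model has a known accuracy and an estimated inference time, and that the network time already spent on a request can be estimated. The names `ModelProfile`, `select_model`, `sla_ms`, and `network_ms`, and all numbers in the usage example, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model profile; the values are illustrative, not measured."""
    name: str
    accuracy: float    # top-1 accuracy in [0, 1]
    latency_ms: float  # estimated inference time in milliseconds


def select_model(
    models: Sequence[ModelProfile],
    sla_ms: float,
    network_ms: float,
) -> Optional[ModelProfile]:
    """Pick the most accurate model that fits the remaining time budget.

    Assumption: the budget left for inference is the SLA target minus the
    estimated network time. If no model fits, fall back to the fastest one.
    Returns None only when no models are supplied.
    """
    if not models:
        return None
    budget_ms = sla_ms - network_ms
    feasible = [m for m in models if m.latency_ms <= budget_ms]
    if feasible:
        return max(feasible, key=lambda m: m.accuracy)
    # Nothing meets the SLA: minimize the violation by choosing the fastest model.
    return min(models, key=lambda m: m.latency_ms)


if __name__ == "__main__":
    candidates = [
        ModelProfile("small", accuracy=0.70, latency_ms=30.0),
        ModelProfile("medium", accuracy=0.75, latency_ms=80.0),
        ModelProfile("large", accuracy=0.80, latency_ms=200.0),
    ]
    # A slow network leaves less time for inference, so a smaller model is chosen.
    for network_ms in (20.0, 100.0, 240.0):
        chosen = select_model(candidates, sla_ms=250.0, network_ms=network_ms)
        print(network_ms, chosen.name)
```

Under these assumptions, the choice shifts toward smaller models as network time consumes more of the budget, which is the trade-off between accuracy and SLA adherence that the findings above describe.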
Abstract
Today's mobile applications are increasingly leveraging deep neural networks to provide novel features, such as image and speech recognition. To use a pre-trained deep neural network, mobile developers can either host it in a cloud server, referred to as cloud-based inference, or ship it with their mobile application, referred to as on-device inference. In this work, we investigate the inference performance of these two common approaches on both mobile devices and public clouds, using popular convolutional neural networks. Our measurement study suggests the need for both on-device and cloud-based inference to support mobile applications. In particular, newer mobile devices are able to run mobile-optimized CNN models in reasonable time. However, for older mobile devices or for more complex CNN models, mobile applications should opt for cloud-based inference. We further…
Taxonomy
Topics: Advanced Neural Network Applications · IoT and Edge/Fog Computing · Advanced Memory and Neural Computing
