Cloud-based or On-device: An Empirical Study of Mobile Deep Inference
Tian Guo

TL;DR
This paper empirically compares cloud-based and on-device deep learning inference on mobile devices, revealing significant performance and energy trade-offs, and identifying key bottlenecks in on-device inference.
Contribution
It provides an empirical evaluation of mobile deep inference performance, highlighting the feasibility challenges of on-device inference compared to cloud-based solutions.
Findings
On-device inference can be up to 100 times slower than cloud-based inference.
Loading the model and computing output probabilities are the two main bottlenecks of on-device inference.
On-device inference consumes significantly more energy than cloud-based inference.
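To make the bottleneck finding concrete, the following is a minimal sketch of how the two phases named above could be timed separately. It is not the paper's benchmark: `load_model`, `compute_probabilities`, and `time_phases` are hypothetical names, the "model" is a single randomly initialized dense layer built in memory rather than read from storage, and the layer sizes are arbitrary assumptions.

```python
import math
import random
import time


def load_model(n_in, n_out, seed=0):
    """Hypothetical stand-in for loading trained weights (assumption: random values)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def compute_probabilities(weights, x):
    """One dense layer followed by a numerically stable softmax."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def time_phases(n_in=256, n_out=10):
    """Return (seconds spent per phase, output probabilities)."""
    timings = {}

    start = time.perf_counter()
    weights = load_model(n_in, n_out)
    timings["load_model"] = time.perf_counter() - start

    x = [0.5] * n_in  # placeholder input; a real app would pass image features
    start = time.perf_counter()
    probs = compute_probabilities(weights, x)
    timings["compute_probabilities"] = time.perf_counter() - start

    return timings, probs


if __name__ == "__main__":
    timings, probs = time_phases()
    for phase, seconds in timings.items():
        print(f"{phase}: {seconds * 1e3:.3f} ms")
```

Timing each phase on its own, rather than the whole request, is what allows a per-phase breakdown like the one reported in the findings. Absolute numbers from this toy layer say nothing about real mobile hardware.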
Abstract
Modern mobile applications are benefiting significantly from advances in deep learning, e.g., implementing real-time image recognition and conversational systems. Given a trained deep learning model, applications usually need to perform a series of matrix operations on the input data in order to infer possible output values. Because of computational complexity and size constraints, these trained models are often hosted in the cloud. To use these cloud-based models, mobile apps have to send input data over the network. While cloud-based deep learning can provide reasonable response time for mobile apps, it restricts the use case scenarios, e.g., mobile apps need to have network access. With mobile-specific deep learning optimizations, it is now possible to employ on-device inference. However, because mobile hardware, such as GPU and memory size, can be very limited…
Taxonomy
Topics: IoT and Edge/Fog Computing · Green IT and Sustainability · Age of Information Optimization
