Mobile-Cloud Inference for Collaborative Intelligence
Mateen Ulhaq

TL;DR
This paper explores shared mobile-cloud inference, in which partial inference runs on the mobile device to reduce latency, energy use, and network bandwidth while improving privacy. Compressing the feature tensor yields additional gains.
Contribution
It introduces a collaborative inference framework that splits processing between the mobile device and the cloud, improving efficiency and privacy over cloud-only inference.
Findings
Reduces inference latency and energy consumption
Decreases network bandwidth usage
Enhances privacy by keeping raw data on device
Abstract
As AI applications for mobile devices become more prevalent, there is an increasing need for faster execution and lower energy consumption for deep learning model inference. Historically, the models run on mobile devices have been smaller and simpler in comparison to large state-of-the-art research models, which can only run on the cloud. However, cloud-only inference has drawbacks such as increased network bandwidth consumption and higher latency. In addition, cloud-only inference requires the input data (images, audio) to be fully transferred to the cloud, creating concerns about potential privacy breaches. There is an alternative approach: shared mobile-cloud inference. Partial inference is performed on the mobile in order to reduce the dimensionality of the input data and arrive at a compact feature tensor, which is a latent space representation of the input signal. The feature…
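The split described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the toy two-layer model, the random weights, the names `mobile_head`, `cloud_tail`, `quantize`, and `dequantize`, and the choice of 8-bit uniform min-max quantization as the compression step are all assumptions made for illustration. The sketch shows only the general idea: the device reduces the input to a small feature tensor, compresses it, and sends it to the cloud, which finishes the inference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model with random weights (illustration only).
W_head = rng.standard_normal((1024, 64)).astype(np.float32)  # layers kept on the device
W_tail = rng.standard_normal((64, 10)).astype(np.float32)    # layers kept in the cloud


def mobile_head(x):
    """Partial inference on the device: 1024-dim input -> 64-dim feature tensor."""
    return np.maximum(x @ W_head, 0.0)  # linear layer followed by ReLU


def quantize(features):
    """Uniform min-max quantization of the feature tensor to 8-bit integers."""
    lo, hi = float(features.min()), float(features.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((features - lo) / scale).astype(np.uint8)
    return q, lo, scale


def dequantize(q, lo, scale):
    """Approximate reconstruction of the feature tensor on the cloud side."""
    return q.astype(np.float32) * scale + lo


def cloud_tail(features):
    """Remaining inference in the cloud: 64-dim features -> 10 output scores."""
    return features @ W_tail


# Device side: compute and compress the feature tensor.
x = rng.standard_normal(1024).astype(np.float32)
features = mobile_head(x)
q, lo, scale = quantize(features)

# Cloud side: decompress and finish the inference.
logits = cloud_tail(dequantize(q, lo, scale))

print("raw input bytes:    ", x.nbytes)
print("transmitted bytes:  ", q.nbytes, "(plus two floats for lo and scale)")
print("output scores shape:", logits.shape)
```

In this toy setup the raw input never leaves the device, and the transmitted payload is the quantized feature tensor rather than the input itself. Where to split a real network and how aggressively to compress the features are design choices that depend on the model and the accuracy that can be traded away.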
Taxonomy
Topics: IoT and Edge/Fog Computing · Age of Information Optimization · Context-Aware Activity Recognition Systems
