Rethinking Inference Placement for Deep Learning across Edge and Cloud Platforms: A Multi-Objective Optimization Perspective and Future Directions
Zongshun Zhang, Ibrahim Matta

TL;DR
This paper surveys methods for optimizing deep learning inference placement across edge and cloud platforms, balancing latency, privacy, and cost through multi-objective optimization techniques.
Contribution
It provides a comprehensive review of current model offloading and adaptation strategies within a multi-objective optimization framework for edge-cloud deep learning applications.
Findings
Identifies key trade-offs in inference placement among latency, privacy, and cost.
Highlights recent techniques like model compression and architecture adaptation.
Discusses future directions for multi-objective optimization in edge-cloud DL deployment.
Abstract
Edge-intelligent applications such as VR/AR and language-model-based chatbots have become widespread with the rapid expansion of IoT and mobile devices. However, resource-constrained edge devices often cannot serve increasingly large and complex deep learning (DL) models. To mitigate this, researchers have proposed optimizing DL models and offloading partitions of them among user devices, edge servers, and the cloud. In this setting, users can combine different services to support their intelligent applications: edge resources offer low response latency, while cloud platforms provide low-cost computation resources for computation-intensive workloads. However, communication between DL model partitions can introduce transmission bottlenecks and pose risks of data leakage. Recent research aims to balance accuracy, computation delay, transmission delay,…
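To make the multi-objective framing concrete, here is a minimal sketch (not taken from the paper) that filters candidate placements down to their Pareto front across latency, monetary cost, and privacy risk. The `Placement` type, the candidate names, and every number are hypothetical values chosen for illustration only; the `leakage` score in particular is an assumed stand-in for whatever privacy metric a real system would use.

```python
from typing import List, NamedTuple


class Placement(NamedTuple):
    name: str
    latency_ms: float  # end-to-end response latency, lower is better
    cost_usd: float    # monetary cost per 1k requests, lower is better
    leakage: float     # hypothetical privacy-risk score in [0, 1], lower is better


def dominates(a: Placement, b: Placement) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    a_obj, b_obj = a[1:], b[1:]  # skip the name field
    no_worse = all(x <= y for x, y in zip(a_obj, b_obj))
    strictly_better = any(x < y for x, y in zip(a_obj, b_obj))
    return no_worse and strictly_better


def pareto_front(candidates: List[Placement]) -> List[Placement]:
    """Keep only the placements that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical candidates; all values are made up for illustration.
candidates = [
    Placement("all-on-device", 120.0, 0.00, 0.0),
    Placement("device+edge", 40.0, 0.50, 0.3),
    Placement("device+cloud", 90.0, 0.20, 0.6),
    Placement("all-in-cloud", 150.0, 0.25, 0.9),
]

front = pareto_front(candidates)
front_names = [c.name for c in front]
print(front_names)
```

With these made-up numbers, "all-in-cloud" drops out because "device+cloud" is at least as good on all three objectives. The remaining placements each win on at least one objective, which is the kind of trade-off a placement optimizer then has to resolve, for example with application-specific weights or constraints.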
