Intra-DP: A High Performance Collaborative Inference System for Mobile Edge Computing
Zekai Sun, Xiuxian Guan, Zheng Lin, Zihan Fang, Xiangming Cai, Zhe Chen, Fangming Liu, Heming Cui, Jie Xiong, Wei Ni, Chau Yuen

TL;DR
Intra-DP is a collaborative inference system for mobile edge computing. It reduces latency and energy consumption by parallelizing the computation of local operators and overlapping that computation with transmission, which makes DNN deployment on resource-limited devices more efficient.
Contribution
The paper introduces Intra-DP, a new parallel computing approach that decomposes local operators in DNNs to mitigate transmission bottlenecks in MEC environments.
Findings
Reduces per-inference latency by up to 50%.
Decreases energy consumption by up to 75%.
Maintains accuracy while improving efficiency.
Abstract
Deploying deep neural networks (DNNs) on resource-constrained mobile devices presents significant challenges, particularly in achieving real-time performance while simultaneously coping with limited computational resources and battery life. While Mobile Edge Computing (MEC) offers collaborative inference with GPU servers as a promising solution, existing approaches primarily rely on layer-wise model partitioning and suffer from significant transmission bottlenecks caused by the sequential execution of DNN operations. To address this challenge, we present Intra-DP, a high-performance collaborative inference system optimized for DNN inference on MEC. Intra-DP employs a novel parallel computing technique based on local operators (i.e., operators whose minimum unit input is not the entire input tensor, such as the convolution kernel). By decomposing their computations (operations) into several…
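The property that makes local operators decomposable can be illustrated with a small sketch. This is not the authors' implementation. It only shows that a stride-1, unpadded 2D convolution can be split by output rows into two pieces that are computed independently and then joined, giving the same result as the full computation. The function names `conv2d_valid` and `split_conv2d` and the parameter `split_row` are hypothetical, and the choice of which piece would run on the device or on the server is left open.

```python
import random

def conv2d_valid(x, w):
    """Naive 2D cross-correlation, stride 1, no padding (pure Python lists)."""
    kh, kw = len(w), len(w[0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    return [
        [
            sum(x[i + a][j + b] * w[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def split_conv2d(x, w, split_row):
    """Hypothetical helper: compute the same output in two independent pieces.

    Output rows [0, split_row) depend only on input rows [0, split_row + kh - 1).
    Output rows [split_row, out_h) depend only on input rows [split_row, H).
    The two input slices overlap by kh - 1 rows, since each output row
    needs kh consecutive input rows.
    """
    kh = len(w)
    top = conv2d_valid(x[:split_row + kh - 1], w)   # first piece
    bottom = conv2d_valid(x[split_row:], w)         # second piece
    return top + bottom                             # join along the row axis

# Integer inputs keep the comparison exact.
random.seed(0)
x = [[random.randint(-5, 5) for _ in range(8)] for _ in range(8)]
w = [[random.randint(-2, 2) for _ in range(3)] for _ in range(3)]

full = conv2d_valid(x, w)
parts = split_conv2d(x, w, split_row=3)
assert full == parts
```

Because neither piece needs the other's result, the two calls inside `split_conv2d` could in principle run on different machines, with only the relevant input slice sent to each. An operator that needs the entire input tensor to produce any output element does not have this property.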
