Dynamic DNN Decomposition for Lossless Synergistic Inference
Beibei Zhang, Tian Xiang, Hongxuan Zhang, Te Li, Shiqiang Zhu, Jianjun Gu

TL;DR
D3 is a dynamic DNN decomposition system that enables lossless, resource-adaptive, and parallelized inference across device, edge, and cloud, significantly improving efficiency and reducing communication overhead.
Contribution
The paper introduces a novel heuristic partitioning algorithm and a parallel feature-map processing strategy that together enable synergistic DNN inference without accuracy loss.
Findings
D3 speeds up inference by up to 3.4x over state-of-the-art methods.
D3 reduces communication overhead by up to 3.68x.
D3 adapts dynamically to changes in computation resources and network conditions.
Abstract
Deep neural networks (DNNs) sustain high performance in today's data processing applications. DNN inference is resource-intensive and thus difficult to fit on a mobile device. An alternative is to offload DNN inference to a cloud server. However, this approach requires heavy raw data transmission between the mobile device and the cloud server, which makes it unsuitable for mission-critical and privacy-sensitive applications such as autopilot. To solve this problem, recent advances deliver DNN services using the edge computing paradigm. Existing approaches split a DNN into two parts and deploy the two partitions to computation nodes at two edge computing tiers. Nonetheless, these methods overlook collaborative device-edge-cloud computation resources. Moreover, previous algorithms require re-partitioning the whole DNN to adapt to changes in computation resources and network dynamics.…
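To make the device-edge-cloud partitioning idea concrete, the following is a minimal sketch of splitting a purely sequential chain of layers across three tiers. It is not the D3 heuristic from the paper: it is a plain exhaustive search over two cut points, and the function name, the per-layer costs, the speed factors, and the link bandwidths are all hypothetical values chosen for illustration. It also assumes the final result is small enough that returning it to the device costs nothing.

```python
from itertools import combinations_with_replacement


def best_three_tier_split(compute_ms, output_kb, input_kb, speedup, link_kb_per_ms):
    """Exhaustively place a chain of layers on device -> edge -> cloud.

    compute_ms[i]   : time of layer i when run on the device (ms), assumed known
    output_kb[i]    : size of layer i's output feature map (KB)
    input_kb        : size of the raw input (KB)
    speedup         : (device, edge, cloud) speed factors relative to the device
    link_kb_per_ms  : (device->edge, edge->cloud) bandwidths in KB per ms
    Returns (latency_ms, a, b): layers [0, a) run on the device,
    [a, b) on the edge, and [b, n) on the cloud.
    """
    n = len(compute_ms)

    def size_at(cut):
        # Data crossing a cut: the raw input at cut 0, else the previous layer's output.
        return input_kb if cut == 0 else output_kb[cut - 1]

    best = None
    for a, b in combinations_with_replacement(range(n + 1), 2):  # guarantees a <= b
        spans = [(0, a), (a, b), (b, n)]
        # Compute time: each tier runs its span at its own speed.
        latency = sum(sum(compute_ms[lo:hi]) / s for (lo, hi), s in zip(spans, speedup))
        if a < n:  # some layer runs past the device, so data crosses device -> edge
            latency += size_at(a) / link_kb_per_ms[0]
        if b < n:  # some layer runs on the cloud, so data crosses edge -> cloud
            latency += size_at(b) / link_kb_per_ms[1]
        if best is None or latency < best[0]:
            best = (latency, a, b)
    return best


# Hypothetical 4-layer model: early layers emit large feature maps, later ones small.
layers_ms = [10, 20, 40, 10]
feature_kb = [400, 100, 20, 1]
latency, a, b = best_three_tier_split(layers_ms, feature_kb, 600, (1, 4, 10), (10, 5))
print(f"device: layers [0, {a}), edge: [{a}, {b}), cloud: [{b}, {len(layers_ms)})")
```

Because every cut pair is re-evaluated from scratch, this sketch has to redo the whole search whenever a bandwidth or speed factor changes, which is the kind of full re-partitioning the abstract identifies as a limitation of earlier approaches.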
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · IoT and Edge/Fog Computing
