Orchestrating Multimodal DNN Workloads in Wireless Neural Processing
Sai Xu, Kai-Kit Wong, Yanan Du, Hyundong Shin

TL;DR
This paper introduces a unified framework for orchestrating multimodal DNN workloads in wireless neural processing (WNP). It reduces end-to-end inference latency by overlapping wireless transmission with DNN execution.
Contribution
The paper proposes O-WiN, an end-to-end orchestration framework, together with the algorithms RTFS and PACS, to improve scheduling and reduce latency in wireless neural processing.
Findings
PACS outperforms RTFS in high-heterogeneity scenarios
Communication-computation pipelining accelerates multimodal DNN execution
Unified model enables effective workload orchestration in WNP
Abstract
In edge inference, wireless resource allocation and accelerator-level deep neural network (DNN) scheduling have yet to be co-optimized in an end-to-end manner. The lack of coordination between wireless transmission and accelerator-level DNN execution prevents efficient overlap, leading to higher end-to-end inference latency. To address this issue, this paper investigates multimodal DNN workload orchestration in wireless neural processing (WNP), a paradigm that integrates wireless transmission and multi-core accelerator execution into a unified end-to-end pipeline. First, we develop a unified communication-computation model for multimodal DNN execution and formulate the corresponding optimization problem. Second, we propose O-WiN, a framework that orchestrates DNN workloads in WNP through two tightly coupled stages: simulation-based optimization and runtime execution. Third, we develop…
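The latency benefit of overlapping transmission with execution can be illustrated with a toy calculation. The sketch below is not the paper's communication-computation model or either of its algorithms; it assumes a hypothetical workload split into chunks, where each chunk must finish its wireless transfer before it can be computed, a single link that sends chunks in order, and a single accelerator that processes them in order. The function names and the example timings are invented for illustration.

```python
def sequential_latency(tx_times, compute_times):
    """Latency when all transmission finishes before any computation starts."""
    return sum(tx_times) + sum(compute_times)


def pipelined_latency(tx_times, compute_times):
    """Latency when chunk i is computed while chunk i+1 is still in flight.

    Assumes (hypothetically) one link and one accelerator, both serving
    chunks in the given order.
    """
    tx_done = 0.0       # time at which the current chunk has fully arrived
    compute_done = 0.0  # time at which the accelerator becomes free
    for tx, comp in zip(tx_times, compute_times):
        tx_done += tx
        # Computation starts once the chunk has arrived AND the accelerator is idle.
        compute_done = max(compute_done, tx_done) + comp
    return compute_done


# Hypothetical per-chunk times in milliseconds.
tx = [2.0, 2.0, 2.0]
comp = [3.0, 1.0, 2.0]

print(sequential_latency(tx, comp))  # 12.0
print(pipelined_latency(tx, comp))   # 8.0
```

In this toy setting the pipelined schedule hides most of the transmission time behind computation. How much is hidden depends on how well the per-chunk transmission and compute times are balanced, which is the kind of trade-off a joint scheduler would have to manage.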
Taxonomy
Topics: Advanced Neural Network Applications · IoT and Edge/Fog Computing · Privacy-Preserving Technologies in Data
