Optimization Framework for Splitting DNN Inference Jobs over Computing Networks
Sehun Jung, Hyang-Won Lee

TL;DR
This paper introduces an optimization framework that uses a layered graph model to assign DNN inference tasks to computing resources across a network, with the goal of reducing inference latency for AI services in 6G systems.
Contribution
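It presents a layered graph model in which conventional routing jointly selects the nodes that perform computation and the paths that carry data between them, so that existing approaches to splitting DNN inference jobs can be equivalently reformulated as a routing problem.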
Findings
Faster solution times compared to existing methods.
Adaptive node and path selection reduces inference latency.
Effective for resource-constrained end devices.
Abstract
Ubiquitous artificial intelligence (AI) is considered one of the key services in 6G systems. AI services typically rely on deep neural networks (DNNs), which require heavy computation. Hence, to support ubiquitous AI, it is crucial to provide a solution for offloading or distributing the computational burden of DNNs, especially at end devices with limited resources. We develop an optimization framework for assigning the computation tasks of DNN inference jobs to computing resources in the network, so as to reduce the inference latency. To this end, we propose a layered graph model with which simple conventional routing jointly solves the problem of selecting nodes for computation and paths for data transfer between nodes. We show that using our model, the existing approaches to splitting DNN inference jobs can be equivalently reformulated as a routing problem that possesses better…
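The layered-graph idea described in the abstract can be illustrated with a minimal sketch. The construction below is an assumption made for illustration, not the paper's exact model: the physical network is copied once per DNN progress level, an edge inside copy k models transferring the output of DNN layer k over a link, and an edge from copy k to copy k+1 at the same node models computing DNN layer k+1 there. A plain shortest-path search (Dijkstra here) then picks compute nodes and transfer paths at the same time. All names (`build_layered_graph`, `shortest_path`), the three-node topology, and the cost numbers are hypothetical.

```python
import heapq


def build_layered_graph(links, speed, compute, data):
    """Build an illustrative layered graph (assumed construction).

    links:   {(u, v): seconds per unit of data} for each directed link
    speed:   {node: compute units per second}
    compute: compute cost of each DNN layer, length K
    data:    data sizes, length K + 1; data[k] is the size after k DNN layers
    A vertex (v, k) means "k DNN layers are done and the data sits at v".
    """
    graph = {}
    num_layers = len(compute)
    for k in range(num_layers + 1):
        # Transfer edges: move the output of DNN layer k across a link.
        for (u, v), delay in links.items():
            graph.setdefault((u, k), []).append(((v, k), delay * data[k]))
        # Compute edges: run DNN layer k + 1 on the node that holds the data.
        if k < num_layers:
            for node, rate in speed.items():
                if rate > 0:
                    cost = compute[k] / rate
                    graph.setdefault((node, k), []).append(((node, k + 1), cost))
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total cost, list of vertices)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, vertex = heapq.heappop(heap)
        if vertex == target:
            break
        if d > dist.get(vertex, float("inf")):
            continue  # stale heap entry
        for nxt, weight in graph.get(vertex, []):
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = vertex
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical topology: device <-> edge <-> cloud, with symmetric links.
undirected = {("device", "edge"): 1.0, ("edge", "cloud"): 3.0}
links = {}
for (u, v), delay in undirected.items():
    links[(u, v)] = delay
    links[(v, u)] = delay

speed = {"device": 1.0, "edge": 4.0, "cloud": 16.0}  # assumed compute rates
compute = [2.0, 8.0, 16.0]                           # assumed per-layer costs
data = [8.0, 2.0, 1.0, 0.5]                          # assumed data sizes

graph = build_layered_graph(links, speed, compute, data)
# The input starts at the device and the result must return to the device.
latency, path = shortest_path(graph, ("device", 0), ("device", 3))
print(latency)  # 10.5
print(path)
```

With these assumed numbers, the shortest path runs the first DNN layer on the device (which shrinks the data before it is sent), moves to the edge node for the remaining two layers, and returns the result to the device. The split point and the route fall out of a single routing computation rather than being chosen separately.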
Taxonomy
Topics: IoT and Edge/Fog Computing · Age of Information Optimization · Stochastic Gradient Optimization Techniques
