Design Insights into Partition Placement and Routing for DNN Inference in Multi-Hop Edge Networks
Jinkun Zhang, Poonam Yadav

TL;DR
This paper explores joint partition placement and routing strategies for DNN inference in multi-hop edge networks, addressing latency, congestion, and load balancing challenges.
Contribution
It formulates a congestion-aware optimization problem and proposes an alternating framework for effective partition placement and routing in heterogeneous edge networks.
Findings
Split flexibility enhances performance in IoT-edge-cloud setups.
Congestion-aware refinement improves efficiency under high load.
Optimal operating points depend on communication-computation tradeoffs.
Abstract
Partitioned DNN inference is a promising approach for latency-sensitive intelligent services in edge networks, since it allows different parts of a model to be executed across end devices, edge servers, and the cloud. However, in a multi-hop edge network, partition placement and inference traffic routing are inherently coupled: raw inputs, intermediate features, and final outputs may have very different sizes, while candidate nodes also differ in computation capability. In addition, both communication and computation delays can become congestion-dependent under load. In this paper, we study joint partition placement and routing for fixed-partition DNN inference over heterogeneous multi-hop edge networks. We consider a small number of DNN partitions, each placed at exactly one node without replication, and formulate a congestion-aware mixed discrete-continuous optimization problem that…
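To make the coupling between placement and routing concrete, the following is a minimal sketch, not the paper's alternating framework. It swaps in an exhaustive search over placements, routes each transfer on a free-flow shortest path, and then scores the result with an M/M/1-style congestion cost. The topology, capacities, compute speeds, workloads, data sizes, request rate, and all function names are hypothetical values chosen for illustration only.

```python
import heapq
import itertools
import math

# All values below are hypothetical and chosen only for illustration.
LINKS = {("dev", "edge"): 10.0, ("edge", "cloud"): 50.0}  # capacity, data units/s
SPEED = {"dev": 1.0, "edge": 15.0, "cloud": 20.0}         # compute, work units/s
WORK = [2.0, 8.0]         # work per DNN partition
SIZES = [4.0, 1.0, 0.1]   # raw input, intermediate feature, final output
RATE = 1.0                # inference requests per second
SRC = DST = "dev"         # input originates at, and result returns to, the device


def queue_delay(load, cap):
    """M/M/1-style delay per unit of load; unbounded as load approaches cap."""
    return math.inf if load >= cap else 1.0 / (cap - load)


def shortest_path(src, dst):
    """Dijkstra over free-flow weights 1/capacity; returns the links used."""
    adj = {}
    for (u, v), cap in LINKS.items():
        adj.setdefault(u, []).append((v, (u, v), 1.0 / cap))
        adj.setdefault(v, []).append((u, (u, v), 1.0 / cap))
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, link, w in adj.get(u, []):
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                prev[v] = (u, link)
                heapq.heappush(heap, (d + w, v))
    path, node = [], dst
    while node != src:  # assumes the topology is connected
        node, link = prev[node]
        path.append(link)
    return path[::-1]


def total_delay(placement):
    """Per-request delay when partition i runs at node placement[i]."""
    stops = [SRC, *placement, DST]
    link_load = {link: 0.0 for link in LINKS}
    routes = []
    # Transfer k carries SIZES[k] from stop k to stop k + 1.
    for size, a, b in zip(SIZES, stops, stops[1:]):
        path = shortest_path(a, b)
        routes.append((size, path))
        for link in path:
            link_load[link] += RATE * size
    node_load = {n: 0.0 for n in SPEED}
    for work, node in zip(WORK, placement):
        node_load[node] += RATE * work
    delay = 0.0
    for size, path in routes:
        for link in path:
            delay += size * queue_delay(link_load[link], LINKS[link])
    for work, node in zip(WORK, placement):
        delay += work * queue_delay(node_load[node], SPEED[node])
    return delay


def best_placement():
    """Exhaustive search; feasible only for a small number of partitions."""
    return min(itertools.product(SPEED, repeat=len(WORK)), key=total_delay)


if __name__ == "__main__":
    for placement in itertools.product(SPEED, repeat=len(WORK)):
        print(placement, total_delay(placement))
    print("best:", best_placement())
```

In this toy setting, running the first partition near the source shrinks the data that must cross the slower link, while co-locating both partitions on one node raises that node's compute load. Changing SIZES, SPEED, or RATE shifts the best placement, which is the kind of communication-computation tradeoff the summary refers to. Exhaustive search is used here only because the abstract assumes a small number of partitions; it does not scale and is not a substitute for the proposed method.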
