Partitioning and Deployment of Deep Neural Networks on Edge Clusters
Arjun Parthasarathy, Bhaskar Krishnamachari

TL;DR
This paper introduces a scalable, fault-tolerant system for partitioning and deploying deep neural networks across edge device clusters to maximize inference throughput, outperforming baseline algorithms.
Contribution
The authors present a novel algorithm and system for partitioning DNNs across edge clusters, with open-source implementation, improving throughput and fault-tolerance.
Findings
Reduced bottleneck latency by up to 10x compared to random algorithms.
Reduced bottleneck latency by 35% relative to a greedy joint partitioning algorithm.
Produced near-optimal results, within 9.2% of the optimal bottleneck latency.
Abstract
Edge inference has become more widespread, as its diverse applications range from retail to wearable technology. Clusters of networked resource-constrained edge devices are becoming common, yet no system exists to split a DNN across these clusters while maximizing the inference throughput of the system. Additionally, no production-ready orchestration system exists for deploying said models over such edge networks which adopts the robustness and scalability of the cloud. We present an algorithm which partitions DNNs and distributes them across a set of edge devices with the goal of minimizing the bottleneck latency and therefore maximizing inference throughput. The system scales well to systems of different node memory capacities and numbers of nodes, while being node fault-tolerant. We find that we can reduce the bottleneck latency by 10x over a random algorithm and 35% over a greedy…
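To make the objective concrete, here is a minimal sketch of bottleneck-minimizing partitioning. It is not the authors' algorithm. It assumes a purely sequential chain of layers, identical nodes with a single memory capacity, and a hypothetical cost model in which a stage's latency is the compute time of its layers plus the time to send its last layer's output to the next node. The function name `partition_chain` and all of its parameters are illustrative.

```python
from functools import lru_cache


def partition_chain(mem, compute, send, capacity, n_nodes):
    """Split a chain of layers into contiguous stages, one stage per node.

    mem[i]     -- memory needed by layer i (assumed known)
    compute[i] -- compute time of layer i (hypothetical cost model)
    send[i]    -- time to ship layer i's output to the next node
    capacity   -- memory capacity shared by every node (assumption)
    n_nodes    -- maximum number of nodes (stages) available

    Returns (bottleneck, ends), where ends holds the exclusive end index
    of each stage. Returns (inf, ()) when no feasible split exists.
    """
    n = len(mem)
    inf = float("inf")

    def stage_cost(i, j):
        # Latency of a stage holding layers i..j-1, or inf if it does not fit.
        if sum(mem[i:j]) > capacity:
            return inf
        cost = sum(compute[i:j])
        if j < n:  # the last stage sends nothing onward
            cost += send[j - 1]
        return cost

    @lru_cache(maxsize=None)
    def best(i, k):
        # Minimum bottleneck for layers i..n-1 using at most k nodes.
        if i == n:
            return 0.0, ()
        if k == 0:
            return inf, ()
        result = (inf, ())
        for j in range(i + 1, n + 1):
            cost = stage_cost(i, j)
            if cost == inf:
                break  # adding more layers can only use more memory
            rest, ends = best(j, k - 1)
            candidate = max(cost, rest)
            if candidate < result[0]:
                result = (candidate, (j,) + ends)
        return result

    return best(0, n_nodes)


# Four layers, two nodes that each hold at most two layers.
bottleneck, ends = partition_chain(
    mem=[2, 2, 2, 2],
    compute=[1, 1, 1, 1],
    send=[5, 1, 5, 1],
    capacity=4,
    n_nodes=2,
)
throughput = 1.0 / bottleneck  # steady-state inferences per time unit
```

In a pipelined deployment the slowest stage sets the rate at which results emerge, which is why minimizing the bottleneck latency maximizes throughput. The sketch cuts the chain where the outgoing transfer is cheap, since a cut after a layer with a large output would make that stage the bottleneck.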
Taxonomy
Topics: IoT and Edge/Fog Computing · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques
