Automated Deep Neural Network Inference Partitioning for Distributed Embedded Systems
Fabian Kreß, El Mahdi El Annabi, Tim Hotfilter, Julian Hoefer, Tanja Harbaum, Juergen Becker

TL;DR
This paper introduces a hardware-aware, graph-based framework for partitioning DNN inference across distributed embedded systems, improving performance and energy efficiency under strict constraints.
Contribution
The paper presents a novel automated layer scheduling method that searches a DNN's graph for beneficial partitioning points and evaluates each one against system constraints and metrics.
Findings
Achieves up to 47.5% throughput increase for EfficientNet-B0
Demonstrates improved energy efficiency across six DNNs
Provides a systematic approach for hardware-aware DNN partitioning
Abstract
Distributed systems can be found in various applications, e.g., in robotics or autonomous driving, to achieve higher flexibility and robustness. In such systems, data-flow-centric applications such as Deep Neural Network (DNN) inference benefit from partitioning the workload over multiple compute nodes in terms of performance and energy efficiency. However, mapping large models onto distributed embedded systems is a complex task, due to low latency and high throughput requirements combined with strict energy and memory constraints. In this paper, we present a novel approach for hardware-aware layer scheduling of DNN inference in distributed embedded systems. Our proposed framework uses a graph-based algorithm to automatically find beneficial partitioning points in a given DNN. Each of these is evaluated based on several essential system metrics such as accuracy and memory utilization,…
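To make the idea of graph-based partitioning concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes a two-node system, a fixed topological layer order, and a simple cost model (per-layer latencies, parameter memory, and a link bandwidth), all of which are hypothetical. The names `Layer`, `candidate_cuts`, `evaluate`, and `best_partition` are illustrative only. A cut is treated as a candidate when exactly one layer's output crosses it, and feasible cuts are ranked by pipelined throughput.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    latency_a: float  # ms on node A (hypothetical profile)
    latency_b: float  # ms on node B (hypothetical profile)
    params_kb: float  # weight memory in KB
    out_kb: float     # output activation size in KB

def candidate_cuts(order, edges):
    """Return (index, producer) pairs where cutting after order[index]
    severs the outputs of exactly one layer."""
    pos = {name: i for i, name in enumerate(order)}
    cuts = []
    for i in range(len(order) - 1):
        crossing = {src for src, dst in edges if pos[src] <= i < pos[dst]}
        if len(crossing) == 1:
            cuts.append((i, next(iter(crossing))))
    return cuts

def evaluate(layers, order, i, producer, link_kb_per_ms, mem_a_kb, mem_b_kb):
    """Return (latency_ms, throughput_per_s), or None if memory is exceeded."""
    head, tail = order[: i + 1], order[i + 1 :]
    if sum(layers[n].params_kb for n in head) > mem_a_kb:
        return None
    if sum(layers[n].params_kb for n in tail) > mem_b_kb:
        return None
    t_a = sum(layers[n].latency_a for n in head)
    t_b = sum(layers[n].latency_b for n in tail)
    t_link = layers[producer].out_kb / link_kb_per_ms
    latency = t_a + t_link + t_b
    # Assumption: stages run as a pipeline, so the slowest stage bounds throughput.
    throughput = 1000.0 / max(t_a, t_link, t_b)
    return latency, throughput

def best_partition(layers, order, edges, link_kb_per_ms, mem_a_kb, mem_b_kb):
    """Pick the feasible cut with the highest pipelined throughput."""
    best = None
    for i, producer in candidate_cuts(order, edges):
        result = evaluate(layers, order, i, producer,
                          link_kb_per_ms, mem_a_kb, mem_b_kb)
        if result is not None and (best is None or result[1] > best[2]):
            best = (i, result[0], result[1])
    return best

# Toy network with one residual connection (c1 -> add); all numbers are made up.
layers = {
    "c1": Layer("c1", 2.0, 1.0, 10.0, 8.0),
    "c2": Layer("c2", 3.0, 1.5, 20.0, 8.0),
    "add": Layer("add", 0.5, 0.5, 0.0, 8.0),
    "fc": Layer("fc", 1.0, 4.0, 50.0, 1.0),
}
order = ["c1", "c2", "add", "fc"]
edges = [("c1", "c2"), ("c1", "add"), ("c2", "add"), ("add", "fc")]

cuts = candidate_cuts(order, edges)
choice = best_partition(layers, order, edges,
                        link_kb_per_ms=4.0, mem_a_kb=100.0, mem_b_kb=60.0)
print(cuts, choice)
```

In this toy example the cut inside the residual block is rejected because two tensors would have to cross the link, and the cut after `c1` is rejected because the remaining layers exceed node B's assumed memory budget. A real framework would also weigh the other metrics the abstract mentions, such as accuracy and energy, which this sketch leaves out.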
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
