Optimizing DNN Inference on Multi-Accelerator SoCs at Training-time
Matteo Risso, Alessio Burrello, Daniele Jahier Pagliari

TL;DR
This paper introduces ODiMO, a hardware-aware training-time tool for optimally mapping DNNs onto multi-accelerator SoCs, significantly improving latency and energy efficiency while maintaining accuracy.
Contribution
ODiMO is the first tool to explore fine-grain, training-time mapping of DNN layers across heterogeneous accelerators, optimizing energy and latency while accounting for accuracy.
Findings
Up to 8x latency reduction at iso-accuracy.
Up to 50.8x improvement in energy efficiency.
Minimal accuracy loss (<0.3%) with optimized mappings.
Abstract
The demand for executing Deep Neural Networks (DNNs) with low latency and minimal power consumption at the edge has led to the development of advanced heterogeneous Systems-on-Chips (SoCs) that incorporate multiple specialized computing units (CUs), such as accelerators. Offloading DNN computations to a specific CU from the available set often exposes accuracy vs efficiency trade-offs, due to differences in their supported operations (e.g., standard vs. depthwise convolution) or data representations (e.g., more/less aggressively quantized). A challenging yet unresolved issue is how to map a DNN onto these multi-CU systems to maximally exploit the parallelization possibilities while taking accuracy into account. To address this problem, we present ODiMO, a hardware-aware tool that efficiently explores fine-grain mapping of DNNs among various on-chip CUs, during the training phase. ODiMO…
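To make the latency vs. energy side of this mapping problem concrete, here is a minimal sketch that assumes a layer's output channels can be split between two CUs running in parallel, each with a hypothetical linear per-channel cost. Under that assumption, layer latency is the maximum of the two partitions and energy is their sum. The names `ComputeUnit`, `split_cost` and `best_split`, and all cost numbers, are illustrative assumptions, not ODiMO's API. ODiMO explores the mapping during training and also weighs accuracy. This sketch instead brute-forces the split for a single layer and ignores accuracy entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeUnit:
    """Hypothetical linear cost model for one on-chip CU (illustrative numbers only)."""
    name: str
    cycles_per_channel: float  # latency to produce one output channel
    energy_per_channel: float  # energy to produce one output channel
    idle_power: float = 0.0    # energy per cycle spent waiting for the other CU


def split_cost(channels_a, channels_b, cu_a, cu_b):
    """Latency and energy of one layer whose output channels are split across two parallel CUs."""
    lat_a = channels_a * cu_a.cycles_per_channel
    lat_b = channels_b * cu_b.cycles_per_channel
    latency = max(lat_a, lat_b)  # parallel execution: the slower CU sets the layer latency
    energy = channels_a * cu_a.energy_per_channel + channels_b * cu_b.energy_per_channel
    # The CU that finishes first idles until the whole layer completes.
    energy += (latency - lat_a) * cu_a.idle_power + (latency - lat_b) * cu_b.idle_power
    return latency, energy


def best_split(total_channels, cu_a, cu_b):
    """Try every split; return (channels_on_a, latency, energy) with the lowest latency."""
    best = None
    for ch_a in range(total_channels + 1):
        lat, en = split_cost(ch_a, total_channels - ch_a, cu_a, cu_b)
        if best is None or lat < best[1]:  # strict '<' keeps the first split among ties
            best = (ch_a, lat, en)
    return best


if __name__ == "__main__":
    # Hypothetical CUs: a fast, aggressively quantized one and a slower 8-bit one.
    cu_lowbit = ComputeUnit("cu_lowbit", cycles_per_channel=1.0, energy_per_channel=0.5)
    cu_8bit = ComputeUnit("cu_8bit", cycles_per_channel=4.0, energy_per_channel=2.0)
    print(split_cost(0, 64, cu_8bit, cu_lowbit))  # everything on the low-bit CU
    print(best_split(64, cu_8bit, cu_lowbit))     # latency-optimal split
```

With these made-up numbers, placing all 64 channels on `cu_lowbit` costs (64.0, 32.0), while the latency-optimal split puts 12 channels on `cu_8bit` for (52.0, 50.0). Latency drops but energy rises. In a real SoC, the choice would also change accuracy, because the two CUs quantize data differently. That third axis is what a training-time tool like ODiMO is meant to capture, and this toy model leaves it out.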
