Hardware-Algorithm Co-Optimization of Early-Exit Neural Networks for Multi-Core Edge Accelerators
Alaa Zniber, Arne Symons, Ouassim Karrakchou, Marian Verhelst, Mounir Ghogho

TL;DR
This paper introduces a co-design framework for early-exit neural networks on multi-core edge accelerators, optimizing hardware and algorithm parameters to improve energy-latency trade-offs.
Contribution
It models the complex interactions between quantization, exit placement, and hardware mapping, enabling better deployment strategies for dynamic neural networks.
Findings
Achieved a reduction of more than 50% in the energy-latency product on CIFAR-10.
Identified that small architectural changes can significantly impact hardware efficiency.
Demonstrated the importance of deployment-aware co-design for heterogeneous edge platforms.
Abstract
Deployment of dynamic neural networks on edge accelerators requires careful consideration of hardware constraints beyond conventional complexity metrics such as Multiply-Accumulate operations. In Early-Exiting Neural Networks (EENN), exit placement, quantization level, and hardware workload mapping interact in non-trivial ways, influencing memory traffic, accelerator utilization, and ultimately energy-latency trade-offs. These interactions remain insufficiently understood in existing Neural Architecture Search (NAS) approaches, which typically rely on proxy metrics or hardware-in-the-loop evaluation. This work presents a hardware-algorithm co-design framework for EENN that explicitly models the interplay between quantization, exit configuration, and multi-core accelerator mapping. Using analytical design space exploration, we characterize how small architectural variations can induce…
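To make the energy-latency trade-off concrete, here is a minimal sketch of how exit placement changes the expected cost of an early-exit network. It is not the paper's analytical model. The `Stage` class, the per-stage energy and latency figures, and the exit probabilities are all hypothetical, and the metric is assumed to be the product of expected energy and expected latency.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    """One backbone segment plus its exit head (all values hypothetical)."""
    energy_mj: float   # energy to execute this segment and its exit
    latency_ms: float  # latency to execute this segment and its exit
    exit_prob: float   # fraction of all inputs that leave at this exit


def expected_cost(stages: List[Stage]) -> Tuple[float, float]:
    """Return (expected energy, expected latency) over the input distribution.

    An input that leaves at exit k has paid for segments 0..k, so each
    exit probability is weighted by the cumulative cost up to that exit.
    """
    total_prob = sum(s.exit_prob for s in stages)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("exit probabilities must sum to 1")

    exp_energy = exp_latency = 0.0
    cum_energy = cum_latency = 0.0
    for s in stages:
        cum_energy += s.energy_mj
        cum_latency += s.latency_ms
        exp_energy += s.exit_prob * cum_energy
        exp_latency += s.exit_prob * cum_latency
    return exp_energy, exp_latency


def energy_latency_product(stages: List[Stage]) -> float:
    """Assumed metric: expected energy multiplied by expected latency."""
    energy, latency = expected_cost(stages)
    return energy * latency


# Hypothetical comparison: a single-exit baseline versus the same
# network with one early exit that catches half of the inputs.
baseline = [Stage(energy_mj=2.0, latency_ms=4.0, exit_prob=1.0)]
early_exit = [
    Stage(energy_mj=1.0, latency_ms=2.0, exit_prob=0.5),
    Stage(energy_mj=1.0, latency_ms=2.0, exit_prob=0.5),
]

print(energy_latency_product(baseline))
print(energy_latency_product(early_exit))
```

This toy model ignores the overhead of the exit head itself, memory traffic, and core utilization. Those are the hardware effects the abstract says complexity metrics such as Multiply-Accumulate counts miss, so real per-stage costs would have to come from a hardware model or from measurement.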
