A Comparative Study of CNN Optimization Methods for Edge AI: Exploring the Role of Early Exits
Nekane Fernandez, Ivan Valdes, Steven Van Vaerenbergh, Idoia de la Iglesia, and Julen Arratibel

TL;DR
This paper compares static compression and dynamic early-exit methods for deploying CNNs on edge devices, highlighting their distinct trade-offs and the benefits of combining both approaches for improved efficiency.
Contribution
It provides a unified, real-hardware evaluation of static and dynamic CNN optimization techniques, demonstrating their complementary strengths and combined effectiveness.
Findings
Static methods consistently reduce the memory footprint.
Early exits enable input-adaptive computational savings.
Combining both methods reduces latency and memory with minimal accuracy loss.
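The memory finding for static methods can be made concrete with symmetric per-tensor int8 quantization, one common static scheme. The excerpt does not say which quantization scheme the authors used. This is a minimal sketch under that assumption: the helper names `quantize_int8` and `dequantize` and the toy weights are hypothetical, and only the standard library is used.

```python
from array import array

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (an assumed, common scheme).

    Maps the largest-magnitude weight to +/-127 and rounds every weight to
    the nearest step of size `scale`. Returns the int8 array and the scale.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp to [-127, 127] so the representation stays symmetric.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale

def dequantize(q, scale):
    # Approximate reconstruction of the original float weights.
    return [v * scale for v in q]

# Hypothetical toy weights: 1000 float32 values ("f" is a 4-byte C float).
weights = array("f", [0.5, -1.27, 0.003, 1.0, -0.25] * 200)
q, scale = quantize_int8(weights)

fp32_bytes = weights.itemsize * len(weights)  # 4 bytes per weight
int8_bytes = q.itemsize * len(q)              # 1 byte per weight
print(fp32_bytes, int8_bytes)  # 4000 1000
```

Storing int8 values instead of float32 shrinks weight storage by about 4x, plus one float for the scale. This saving is fixed at export time, independent of the input. The per-weight rounding error is bounded by half the scale, which is one source of the accuracy loss that static methods trade for memory.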
Abstract
Deploying deep neural networks on edge devices requires balancing accuracy, latency, and resource constraints under realistic execution conditions. To fit models within these constraints, two broad strategies have emerged: static compression techniques such as pruning and quantization, which permanently reduce model size, and dynamic approaches such as early-exit mechanisms, which adapt computational cost at runtime. While both families are widely studied in isolation, they are rarely compared under identical conditions on physical hardware. This paper presents a unified, deployment-oriented comparison of static compression and dynamic early-exit mechanisms, evaluated on real edge devices using ONNX-based inference pipelines. Our results show that static and dynamic techniques offer fundamentally different trade-offs for edge deployment. While pruning and quantization deliver consistent…
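The runtime adaptivity of early exits can be illustrated with a minimal sketch. A backbone is split into stages, each stage has a lightweight exit head, and inference stops at the first exit that is confident enough. The excerpt does not specify the authors' exit criterion. A maximum-softmax-probability threshold is assumed here because it is a common choice. The toy network, `early_exit_predict`, and `threshold` are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_predict(
    x: float,
    stages: Sequence[Callable[[float], float]],
    exit_heads: Sequence[Callable[[float], List[float]]],
    threshold: float,
) -> Tuple[int, int]:
    """Return (predicted_class, exit_index).

    Runs backbone stages in order. After each stage, its exit head emits
    logits, and inference stops at the first exit whose top softmax
    probability reaches `threshold` (assumed criterion). The final exit
    always answers, so every input gets a prediction.
    """
    if not stages or len(stages) != len(exit_heads):
        raise ValueError("need at least one stage and one exit head per stage")
    h = x
    for i, (stage, head) in enumerate(zip(stages, exit_heads)):
        h = stage(h)
        probs = softmax(head(h))
        conf = max(probs)
        if conf >= threshold or i == len(stages) - 1:
            return probs.index(conf), i
    raise AssertionError("unreachable: the last stage always returns")

# Hypothetical toy network: each stage doubles a scalar "evidence" value,
# and each exit head maps it to two-class logits [h, -h].
stages = [lambda h: 2.0 * h] * 3
heads = [lambda h: [h, -h]] * 3

print(early_exit_predict(2.0, stages, heads, threshold=0.9))  # (0, 0): easy input exits after stage 0
print(early_exit_predict(0.1, stages, heads, threshold=0.9))  # (0, 2): hard input runs all 3 stages
```

Unlike the fixed saving of quantization, the cost here depends on the input. Confident inputs skip the later stages entirely, while ambiguous ones pay for the full network plus the extra exit heads. Lowering `threshold` lets more inputs exit early, saving computation at a potential cost in accuracy, and raising it does the reverse.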
