TL;DR
This paper introduces a differentiable neural architecture search (NAS) method that generates models meeting multiple hardware constraints in a single search, reporting an 87.4% reduction in memory and a 54.2% reduction in latency for IoT devices with non-inferior accuracy.
Contribution
It presents a novel single-shot NAS approach that incorporates multiple hardware constraints directly into the optimization process, enabling rapid deployment on low-power IoT devices.
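One common way to fold hardware constraints into a differentiable search objective is to add a penalty term per constraint to the task loss. The sketch below illustrates that general idea only and is not taken from the paper: the hinge-style penalty, the function names (`expected_cost`, `constrained_loss`), the per-operation cost tables, and the penalty strengths are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_cost(alpha, op_costs):
    """Expected cost of one layer: softmax(alpha)-weighted sum of the
    costs of its candidate operations (hypothetical cost model)."""
    probs = softmax(alpha)
    return sum(p * c for p, c in zip(probs, op_costs))

def constrained_loss(task_loss, alphas, cost_tables, budgets, strengths):
    """Task loss plus one hinge penalty per hardware constraint.

    alphas:      per-layer architecture logits
    cost_tables: {constraint: per-layer lists of candidate-op costs}
    budgets:     {constraint: maximum allowed total cost}
    strengths:   {constraint: penalty weight} (assumed hyperparameters)
    """
    total = task_loss
    for name, budget in budgets.items():
        cost = sum(expected_cost(a, costs)
                   for a, costs in zip(alphas, cost_tables[name]))
        # The penalty is zero while the constraint holds and grows
        # linearly once the expected cost exceeds the budget.
        total += strengths[name] * max(0.0, cost - budget)
    return total

# Two layers, two candidate operations each, uniform architecture logits.
alphas = [[0.0, 0.0], [0.0, 0.0]]
cost_tables = {
    "memory_kb": [[10.0, 30.0], [20.0, 40.0]],
    "latency_ms": [[1.0, 3.0], [2.0, 4.0]],
}
budgets = {"memory_kb": 40.0, "latency_ms": 10.0}
strengths = {"memory_kb": 0.1, "latency_ms": 0.1}

loss = constrained_loss(0.5, alphas, cost_tables, budgets, strengths)
```

In this toy setup the expected memory (50 kB) exceeds its budget and is penalized, while the expected latency (5 ms) stays within its budget and contributes nothing. In a real differentiable search the same expression would be written in an autodiff framework so that gradients flow back to the architecture logits.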
Findings
Reduced memory usage by 87.4%.
Reduced latency by 54.2%.
Maintained non-inferior accuracy on benchmarks.
Abstract
The rapid proliferation of computing domains relying on Internet of Things (IoT) devices has created a pressing need for efficient and accurate deep-learning (DL) models that can run on low-power devices. However, traditional DL models tend to be too complex and computationally intensive for typical IoT end-nodes. To address this challenge, Neural Architecture Search (NAS) has emerged as a popular design automation technique for co-optimizing the accuracy and complexity of deep neural networks. Nevertheless, existing NAS techniques require many iterations to produce a network that adheres to specific hardware constraints, such as the maximum memory available on the hardware or the maximum latency allowed by the target application. In this work, we propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods, which allows the…
