TL;DR
This paper proposes a scalable hardware-aware neural architecture search (NAS) method that runs the search on a single proxy device and reuses the resulting architectures on diverse target devices by exploiting latency monotonicity, reducing the need for device-specific latency predictors.
Contribution
The authors exploit latency monotonicity, the tendency of architecture latency rankings to be correlated across devices, to reuse architectures searched on one device for others. They also propose an adaptation technique that strengthens monotonicity when it is weak.
Findings
Using one proxy device yields architectures nearly as good as those found by per-device NAS.
The approach significantly reduces the cost of latency prediction across multiple devices.
Experimental validation on various platforms and search spaces demonstrates effectiveness.
Abstract
Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis. To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial. A key requirement of efficient hardware-aware NAS is the fast evaluation of inference latencies in order to rank different architectures. While building a latency predictor for each target device has been commonly used in state of the art, this is a very time-consuming process, lacking scalability in the presence of extremely diverse devices. In this work, we address the scalability challenge by exploiting latency monotonicity -- the architecture latency rankings on different devices are often correlated. When strong latency monotonicity exists, we can re-use architectures searched for one proxy device on new target devices, without…
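Latency monotonicity, as described above, is a statement about rankings: if architecture A is faster than B on the proxy device, it should usually be faster on the target device too. A minimal sketch of how this could be checked is shown below, using Spearman's rank correlation as the agreement measure. The choice of metric is an assumption (this summary does not name one), and the latency values are made up for illustration, not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical latencies (ms) of the same five architectures on two devices.
proxy_ms = [12.0, 18.5, 25.1, 31.0, 40.2]
target_ms = [30.1, 41.0, 39.5, 66.3, 80.0]

rho = spearman(proxy_ms, target_ms)
print(f"rank correlation between proxy and target: {rho:.2f}")
```

A value close to 1 means the proxy device orders architectures almost exactly as the target does, which is the situation in which reusing proxy-searched architectures is plausible. A low value would indicate weak monotonicity, the case the adaptation technique is meant to address.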
