Efficient Incorporation of Multiple Latency Targets in the Once-For-All Network
Vidhur Kumar, Andrew Szidon

TL;DR
This paper proposes two strategies, Top-down and Bottom-up, to efficiently incorporate multiple latency targets into the Once-For-All network, significantly improving search speed without losing accuracy.
Contribution
Introduces warm starting and randomized pruning strategies to enhance OFA's multi-latency target search efficiency.
Findings
Significant reduction in search time compared to the original OFA implementation.
Maintains high accuracy of subnetworks across multiple latency targets.
Generalizes performance improvements across various design spaces.
Abstract
Neural Architecture Search has proven to be an effective method of automating architecture engineering. Recent work in the field searches for architectures subject to multiple objectives, such as accuracy and latency, so that they can be deployed efficiently on different target hardware. Once-for-All (OFA) is one such method: it decouples training from search and can find high-performance networks for different latency constraints. However, its search phase is inefficient at incorporating multiple latency targets. In this paper, we introduce two strategies (Top-down and Bottom-up) that use warm starting and randomized network pruning for the efficient incorporation of multiple latency targets in the OFA network. We evaluate these strategies against the current OFA implementation and demonstrate that our strategies offer significant running time performance gains while not sacrificing the…
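The warm-starting and randomized-pruning idea named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation and does not use the OFA codebase. Everything in it is a hypothetical stand-in: the encoding of a subnetwork as a list of per-layer widths, the latency and accuracy proxies, and the function names `random_prune`, `evolve`, and `top_down_search`. It assumes that a top-down pass handles the loosest latency target first, then seeds the search for each tighter target with the previous population after randomly shrinking it until it fits.

```python
import random

CHOICES = (1, 2, 3, 4)   # hypothetical per-layer width options
NUM_LAYERS = 8           # hypothetical subnetwork depth


def latency(arch):
    # Toy latency proxy (assumption): wider layers cost more.
    return sum(arch)


def accuracy(arch):
    # Toy accuracy proxy (assumption): diminishing returns per layer.
    return sum(w ** 0.5 for w in arch)


def random_prune(arch, target, rng):
    # Shrink randomly chosen layers one step at a time until the target is met.
    arch = list(arch)
    while latency(arch) > target:
        shrinkable = [i for i, w in enumerate(arch) if w > CHOICES[0]]
        if not shrinkable:
            raise ValueError("target is below the minimum achievable latency")
        arch[rng.choice(shrinkable)] -= 1
    return arch


def mutate(arch, rng):
    # Resample the width of one randomly chosen layer.
    child = list(arch)
    child[rng.randrange(len(child))] = rng.choice(CHOICES)
    return child


def evolve(population, target, rng, generations=20):
    # Simple evolutionary loop: mutate, repair by pruning, keep the best.
    size = len(population)
    for _ in range(generations):
        children = [random_prune(mutate(p, rng), target, rng) for p in population]
        population = sorted(population + children, key=accuracy, reverse=True)[:size]
    return population


def top_down_search(targets, pop_size=16, seed=0):
    rng = random.Random(seed)
    population = [[rng.choice(CHOICES) for _ in range(NUM_LAYERS)]
                  for _ in range(pop_size)]
    best = {}
    for target in sorted(targets, reverse=True):  # loosest target first
        # Warm start: reuse the previous population, pruned to fit the new target.
        population = [random_prune(p, target, rng) for p in population]
        population = evolve(population, target, rng)
        best[target] = population[0]
    return best


if __name__ == "__main__":
    for target, arch in top_down_search([28, 20, 12]).items():
        print(target, arch, latency(arch))
```

Each target after the first starts from a population that is already feasible and already tuned for a nearby constraint, rather than from a fresh random one. That reuse is the sense in which warm starting could save search time. The sketch covers only a top-down ordering.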
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Adversarial Robustness in Machine Learning
Methods: Pruning
