Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search
Houwen Peng, Hao Du, Hongyuan Yu, Qi Li, Jing Liao, Jianlong Fu

TL;DR
This paper introduces a path distillation approach for one-shot neural architecture search that improves subnetwork training and final architecture quality by collaboratively learning and selecting top-performing paths during training.
Contribution
The paper proposes a novel path distillation method using prioritized paths that enhances training convergence and architecture quality without complex search algorithms.
Findings
Improves convergence ratio and performance of hypernetworks.
Achieves superior accuracy compared to MobileNetV3 and EfficientNet.
Demonstrates robustness across object detection and diverse search spaces.
Abstract
One-shot weight sharing methods have recently drawn great attention in neural architecture search due to high efficiency and competitive performance. However, weight sharing across models has an inherent deficiency, i.e., insufficient training of subnetworks in hypernetworks. To alleviate this problem, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, aiming to boost the convergence of individual models. We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training. Distilling knowledge from the prioritized paths is able to boost the training of subnetworks. Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths…
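The mechanism described in the abstract can be illustrated with a small sketch: a fixed-size board holds the best paths seen so far, and a sampled subnetwork is trained against the soft labels of a board member. This is not the authors' implementation. The names `PrioritizedBoard`, `update`, and `distill_loss`, the tuple encoding of a path, the accuracy-then-FLOPs ranking, and the choice of the top path as teacher are all assumptions made for illustration.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# Hypothetical encoding: one operation index per layer of the hypernetwork.
Path = Tuple[int, ...]


@dataclass
class PrioritizedBoard:
    """Keeps the `capacity` best paths seen so far (illustrative only)."""
    capacity: int
    _heap: List[Tuple[float, float, Path]] = field(default_factory=list)

    def update(self, path: Path, accuracy: float, flops: float) -> None:
        # Assumed ranking: higher accuracy first; on ties, fewer FLOPs first.
        entry = (accuracy, -flops, path)
        # Drop any stale record of the same path before re-inserting it.
        if any(p == path for _, _, p in self._heap):
            self._heap = [e for e in self._heap if e[2] != path]
            heapq.heapify(self._heap)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif entry > self._heap[0]:
            # The new path beats the weakest board member, so it replaces it.
            heapq.heapreplace(self._heap, entry)

    def paths(self) -> List[Path]:
        """Board members, best first."""
        return [p for _, _, p in sorted(self._heap, reverse=True)]

    def best(self) -> Path:
        return self.paths()[0]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distill_loss(student_logits: Sequence[float],
                 teacher_logits: Sequence[float]) -> float:
    """Cross-entropy of the student's prediction against the teacher's soft labels."""
    teacher = softmax(teacher_logits)
    student = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


board = PrioritizedBoard(capacity=2)
board.update((0, 1), accuracy=0.70, flops=300.0)
board.update((1, 1), accuracy=0.75, flops=320.0)
board.update((2, 0), accuracy=0.72, flops=280.0)  # evicts (0, 1)
teacher_path = board.best()
```

Because the board is updated every time a path is evaluated, its contents change during training, and whatever remains on it at the end can be read off as the candidate architectures.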
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
Methods: Pointwise Convolution · Depthwise Convolution · ReLU6 · Depthwise Separable Convolution · Batch Normalization · Inverted Residual Block · Hard Swish · Convolution · Dense Connections
