Blockwisely Supervised Neural Architecture Search with Knowledge Distillation
Changlin Li, Jiefeng Peng, Liuchun Yuan, Guangrun Wang, Xiaodan Liang, Liang Lin, Xiaojun Chang

TL;DR
This paper introduces a block-wise neural architecture search method guided by knowledge distillation, which makes architecture evaluation more accurate and efficient and yields a mobile model that surpasses EfficientNet-B0 on ImageNet.
Contribution
The paper proposes a novel block-wise NAS approach combined with architecture knowledge distillation, leading to more accurate and scalable neural network designs.
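As a rough illustration of the block-wise idea, the sketch below rates candidate student blocks one block at a time: each candidate receives the teacher's feature map from the previous block as input and is scored by how closely it reproduces the teacher's output for that block. Everything here is a toy assumption rather than the authors' implementation: the teacher is a stack of random tanh layers, a candidate's "cost" is simply its matrix rank, candidates are fit in closed form by least squares instead of being trained inside a weight-sharing supernet, and the names `teacher_features`, `fit_candidate`, and `blockwise_search` are hypothetical.

```python
from itertools import product

import numpy as np


def teacher_features(x, teacher_blocks):
    """Return [x, f1, f2, ...]: the input followed by each teacher block's output."""
    feats = [x]
    for w in teacher_blocks:
        feats.append(np.tanh(feats[-1] @ w))
    return feats


def fit_candidate(inp, target, rank):
    """Fit a rank-limited linear student block mapping inp -> target; return its MSE."""
    # Least-squares solution, then truncate it to the requested rank via SVD.
    w, *_ = np.linalg.lstsq(inp, target, rcond=None)
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    w_low = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return float(np.mean((inp @ w_low - target) ** 2))


def blockwise_search(x, teacher_blocks, candidate_ranks, budget):
    """Pick one candidate per block, minimizing summed block loss under a cost budget."""
    feats = teacher_features(x, teacher_blocks)
    # losses[i][r]: distillation loss of candidate r for block i.
    # Block i is supervised independently: teacher feature i in, teacher feature i+1 out.
    losses = [
        {r: fit_candidate(feats[i], feats[i + 1], r) for r in candidate_ranks}
        for i in range(len(teacher_blocks))
    ]
    best, best_loss = None, float("inf")
    for combo in product(candidate_ranks, repeat=len(teacher_blocks)):
        if sum(combo) > budget:  # toy cost constraint (assumption): total rank
            continue
        total = sum(losses[i][r] for i, r in enumerate(combo))
        if total < best_loss:
            best, best_loss = combo, total
    return best, best_loss


# Toy usage with a fixed seed: 3 teacher blocks, 3 candidate ranks per block.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))
teacher = [rng.normal(size=(8, 8)) / np.sqrt(8) for _ in range(3)]
arch, loss = blockwise_search(x, teacher, candidate_ranks=(2, 4, 8), budget=14)
print("chosen rank per block:", arch, "summed distillation loss:", loss)
```

Because every block is rated against the teacher separately, each candidate only has to be evaluated within a small per-block search space, and whole architectures are compared by summing per-block losses.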
Findings
Achieved 78.4% top-1 accuracy on ImageNet with a mobile model.
Outperformed EfficientNet-B0 by about 2.1 percentage points in top-1 accuracy.
Demonstrated the effectiveness of architecture knowledge distillation in NAS.
Abstract
Neural Architecture Search (NAS), aiming at automatically designing network architectures by machines, is hoped and expected to bring about a new revolution in machine learning. Despite these high expectations, the effectiveness and efficiency of existing NAS solutions are unclear, with some recent works going so far as to suggest that many existing NAS solutions are no better than random architecture selection. The inefficiency of NAS solutions may be attributed to inaccurate architecture evaluation. Specifically, to speed up NAS, recent works have proposed under-training different candidate architectures in a large search space concurrently by using shared network parameters; however, this has resulted in incorrect architecture ratings and furthered the ineffectiveness of NAS. In this work, we propose to modularize the large search space of NAS into blocks to ensure that the…
Videos
Block-Wisely Supervised Neural Architecture Search With Knowledge Distillation (YouTube)
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
