ADWPNAS: Architecture-Driven Weight Prediction for Neural Architecture Search
Xu Zhang, Chenjun Zhou, Bo Gu

TL;DR
This paper introduces ADWPNAS, a neural architecture search method that predicts model weights with a HyperNetwork. This lets candidate architectures be evaluated quickly without fine-tuning, and the method reports state-of-the-art results at low search cost.
Contribution
The paper presents a novel architecture-driven weight prediction approach for NAS that significantly reduces search time and improves model performance.
Findings
Search procedure completes in 4.0 GPU hours on CIFAR-10.
Discovered model achieves 2.41% test error with 1.52M parameters.
Method outperforms existing models in efficiency and accuracy.
Abstract
Discovering models and evaluating their true strength quickly and accurately is one of the key challenges in Neural Architecture Search (NAS). To address this problem, we propose an Architecture-Driven Weight Prediction (ADWP) approach for NAS. In our approach, we first design an architecture-intensive search space and then train a HyperNetwork by feeding it stochastically encoded architecture parameters. The trained HyperNetwork predicts convolution kernel weights well for neural architectures in the search space. Consequently, target architectures can be evaluated efficiently without any fine-tuning, which enables us to search for the optimal architecture in the space of general networks (macro-search). Through experiments, we evaluate the performance of the models discovered by the proposed ADWPNAS, and the results show that one search…
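The abstract describes the pipeline only at a high level, so the sketch below is a toy illustration under our own assumptions, not the authors' implementation. Everything in it is hypothetical: the search space (two 1-D conv layers, each choosing kernel size 1 or 3), the one-hot architecture encoding, the linear HyperNetwork, the smoothing task, and the names `encode`, `predict_kernels`, `train_hypernet`, and `search`. It shows the three steps the abstract names: training a HyperNetwork on randomly sampled architecture encodings, predicting convolution kernels for a given architecture, and ranking candidates with those predicted weights without fine-tuning. Central finite differences stand in for backpropagation only to keep it dependency-free.

```python
import random
from itertools import product
from typing import List

# Hypothetical toy search space (not the paper's): each of NUM_LAYERS 1-D conv
# layers picks its kernel size from KERNEL_CHOICES.
KERNEL_CHOICES = [1, 3]
NUM_LAYERS = 2
MAX_K = max(KERNEL_CHOICES)
CODE_DIM = NUM_LAYERS * len(KERNEL_CHOICES)


def encode(arch: List[int]) -> List[float]:
    """Concatenate one-hot vectors of the per-layer kernel-size choices."""
    code: List[float] = []
    for choice in arch:
        one_hot = [0.0] * len(KERNEL_CHOICES)
        one_hot[choice] = 1.0
        code.extend(one_hot)
    return code


def predict_kernels(W: List[List[float]], arch: List[int]) -> List[List[float]]:
    """Linear HyperNetwork: map the code to MAX_K weights per layer, then keep
    the centred slice matching the kernel size this architecture chose."""
    code = encode(arch)
    flat = [sum(w * c for w, c in zip(row, code)) for row in W]
    kernels = []
    for layer, choice in enumerate(arch):
        k = KERNEL_CHOICES[choice]
        full = flat[layer * MAX_K:(layer + 1) * MAX_K]
        start = (MAX_K - k) // 2
        kernels.append(full[start:start + k])
    return kernels


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """Zero-padded, length-preserving 1-D cross-correlation (odd kernel sizes)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + x + [0.0] * pad
    return [sum(kv * padded[i + j] for j, kv in enumerate(kernel)) for i in range(len(x))]


def mse_with_predicted_weights(W, arch, x, y) -> float:
    """Run the target network using only HyperNetwork-predicted weights."""
    out = x
    for kernel in predict_kernels(W, arch):
        out = conv1d_same(out, kernel)
    return sum((o - t) ** 2 for o, t in zip(out, y)) / len(y)


def train_hypernet(x, y, steps=200, lr=0.1, eps=1e-4, seed=0):
    """Each step samples a random architecture and updates the HyperNetwork
    weights W; central finite differences stand in for backprop here."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.1) for _ in range(CODE_DIM)] for _ in range(NUM_LAYERS * MAX_K)]
    for _ in range(steps):
        arch = [rng.randrange(len(KERNEL_CHOICES)) for _ in range(NUM_LAYERS)]
        grads = [[0.0] * CODE_DIM for _ in W]
        for r in range(len(W)):
            for c in range(CODE_DIM):
                old = W[r][c]
                W[r][c] = old + eps
                up = mse_with_predicted_weights(W, arch, x, y)
                W[r][c] = old - eps
                down = mse_with_predicted_weights(W, arch, x, y)
                W[r][c] = old
                grads[r][c] = (up - down) / (2 * eps)
        for r in range(len(W)):
            for c in range(CODE_DIM):
                W[r][c] -= lr * grads[r][c]
    return W


def search(W, x, y) -> List[int]:
    """Score every candidate with predicted weights (no fine-tuning); keep the best."""
    candidates = [list(a) for a in product(range(len(KERNEL_CHOICES)), repeat=NUM_LAYERS)]
    return min(candidates, key=lambda a: mse_with_predicted_weights(W, a, x, y))


if __name__ == "__main__":
    x = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, -1.0, 0.0]
    y = conv1d_same(x, [0.25, 0.5, 0.25])  # toy target: a fixed smoothing filter
    W = train_hypernet(x, y)
    best = search(W, x, y)
    print("best kernel sizes:", [KERNEL_CHOICES[c] for c in best])
```

Running the script trains the toy HyperNetwork and prints the kernel sizes of the best-scoring candidate; the numbers come from a toy task and say nothing about the paper's results. Cropping one shared `MAX_K`-wide prediction to the chosen kernel size is just one simple way to let a single HyperNetwork serve every candidate; the paper's actual weight-prediction mechanism may differ.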
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: HyperNetwork · Sigmoid Activation · Tanh Activation · Softmax · Long Short-Term Memory · Convolution
