Searching for A Robust Neural Architecture in Four GPU Hours
Xuanyi Dong, Yi Yang

TL;DR
This paper introduces GDAS, a gradient-based neural architecture search method that finds a high-performing model on CIFAR-10 in four GPU hours, compared with the more than 3000 GPU hours taken by conventional reinforcement-learning and evolutionary approaches.
Contribution
The paper presents a novel differentiable architecture search method using a DAG representation and a learnable sampler, enabling end-to-end training with gradient descent.
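A minimal sketch may help picture a DAG search space of this kind. It assumes, beyond what this summary states, that every edge of the DAG carries a fixed set of candidate operations and that a sub-graph is formed by choosing one operation per edge. The names `CANDIDATE_OPS`, `build_edges`, and `count_subgraphs`, as well as the node and operation counts, are illustrative and are not the paper's settings.

```python
from itertools import combinations

# Hypothetical candidate operations; the paper's actual operation set may differ.
CANDIDATE_OPS = [
    "conv_3x3", "conv_5x5", "sep_conv", "dil_conv",
    "max_pool", "avg_pool", "identity", "none",
]


def build_edges(num_nodes):
    """Return every edge (i, j) with i < j, so the graph is acyclic by construction."""
    return list(combinations(range(num_nodes), 2))


def count_subgraphs(num_nodes, num_ops):
    """Count sub-graphs when each edge independently picks one of num_ops operations."""
    return num_ops ** len(build_edges(num_nodes))


edges = build_edges(5)  # an illustrative 5-node graph
print(len(edges))
print(count_subgraphs(5, len(CANDIDATE_OPS)))
```

Even with these small illustrative numbers the count is 8 to the power of 10, which is over one billion, so enumerating and training every sub-graph is impractical and sampling becomes attractive.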
Findings
Search completed in four GPU hours on CIFAR-10
Discovered model achieves 2.82% test error
Model has only 2.5 million parameters
Abstract
Conventional neural architecture search (NAS) approaches are based on reinforcement learning or evolutionary strategy, which take more than 3000 GPU hours to find a good model on CIFAR-10. We propose an efficient NAS approach learning to search by gradient descent. Our approach represents the search space as a directed acyclic graph (DAG). This DAG contains billions of sub-graphs, each of which indicates a kind of neural architecture. To avoid traversing all the possibilities of the sub-graphs, we develop a differentiable sampler over the DAG. This sampler is learnable and optimized by the validation loss after training the sampled architecture. In this way, our approach can be trained in an end-to-end fashion by gradient descent, named Gradient-based search using Differentiable Architecture Sampler (GDAS). In experiments, we can finish one searching procedure in four GPU hours on…
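The abstract excerpt does not say how the sampler is made differentiable. One standard technique for differentiable discrete sampling is the Gumbel-softmax relaxation with a straight-through estimator, and the sketch below assumes that technique for a single edge of the DAG. The function names, the temperature `tau`, and the example logits are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax_sample(logits, tau=1.0, rng=random):
    """Sample one operation for an edge; return (one-hot choice, relaxed probabilities)."""
    noise = []
    for _ in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noise.append(-math.log(-math.log(u)))  # Gumbel(0, 1) noise
    soft = softmax([(l + g) / tau for l, g in zip(logits, noise)])
    index = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == index else 0.0 for i in range(len(soft))]
    return hard, soft


def grad_soft_wrt_logits(soft, tau):
    """Jacobian d soft[i] / d logits[j] = soft[i] * (delta_ij - soft[j]) / tau."""
    n = len(soft)
    return [
        [soft[i] * ((1.0 if i == j else 0.0) - soft[j]) / tau for j in range(n)]
        for i in range(n)
    ]


rng = random.Random(0)
logits = [0.5, 1.5, -1.0]  # hypothetical learnable scores, one per candidate operation
hard, soft = gumbel_softmax_sample(logits, tau=0.5, rng=rng)
jacobian = grad_soft_wrt_logits(soft, tau=0.5)
print(hard, soft)
```

Under the straight-through assumption, the forward pass would evaluate only the operation selected by `hard`, while the backward pass would use the Jacobian of `soft` to push the validation-loss gradient into the logits. This keeps the per-step cost close to training a single sampled architecture.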
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Test · Sigmoid Activation · Tanh Activation · Softmax · Long Short-Term Memory
