Efficient Neural Architecture Search via Parameter Sharing

Hieu Pham; Melody Y. Guan; Barret Zoph; Quoc V. Le; Jeff Dean

arXiv:1802.03268 · cs.LG · February 13, 2018 · 630 citations

TL;DR

ENAS is a fast, inexpensive neural architecture search method that shares parameters among child models. It reports a test perplexity of 55.8 on Penn Treebank and a test error of 2.89% on CIFAR-10 while being about 1000x less expensive than standard Neural Architecture Search.

Contribution

The paper presents ENAS, a neural architecture search approach in which all child models share parameters within one large computational graph. This sharing drastically reduces search time and cost while matching or outperforming existing automatic model design methods.

Findings

01. Achieves a test perplexity of 55.8 on Penn Treebank.

02. Attains a test error of 2.89% on CIFAR-10.

03. Reduces computational cost by 1000x compared to standard NAS.

Abstract

We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Thanks to parameter sharing between child models, ENAS is fast: it delivers strong empirical performances using much fewer GPU-hours than all existing automatic model design approaches, and notably, 1000x less expensive than standard Neural Architecture Search. On the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a test perplexity of 55.8, establishing a new…
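
The loop the abstract describes (a policy-gradient controller picking a subgraph, with child models that share parameters) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a single architectural decision (one activation chosen from four), a scalar regression task with a squared-error loss in place of cross entropy, negative validation error as the reward, and a moving-average baseline. All names here (`run_toy_search`, `shared_w`, `OPS`) are hypothetical.

```python
import math
import random

# Candidate operations: each one is a branch of the "large graph".
OPS = {
    "tanh": math.tanh,
    "relu": lambda v: max(0.0, v),
    "sigmoid": lambda v: 1.0 / (1.0 + math.exp(-v)),
    "identity": lambda v: v,
}
OP_NAMES = list(OPS)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_op(logits, rng):
    """Controller step: sample one operation index from the policy."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def mse(shared_w, k, data):
    """Mean squared error of the child model that uses operation k."""
    f = OPS[OP_NAMES[k]]
    return sum((shared_w[k] * f(x) - y) ** 2 for x, y in data) / len(data)


def make_data(rng, n):
    """Toy task (an assumption): fit y = 1.5 * tanh(x)."""
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    return [(x, 1.5 * math.tanh(x)) for x in xs]


def run_toy_search(steps=300, seed=0, lr_w=0.05, lr_ctrl=0.1):
    rng = random.Random(seed)
    train, valid = make_data(rng, 64), make_data(rng, 64)

    # One weight per operation, stored once and reused by every child
    # that selects that operation. This is the parameter sharing.
    shared_w = [1.0] * len(OP_NAMES)
    logits = [0.0] * len(OP_NAMES)  # controller parameters
    baseline = 0.0                  # moving-average reward baseline

    for _ in range(steps):
        # Phase 1: sample a child and take a gradient step on the
        # shared weight it uses, minimizing the training loss.
        k = sample_op(logits, rng)
        f = OPS[OP_NAMES[k]]
        grad = sum(2.0 * (shared_w[k] * f(x) - y) * f(x)
                   for x, y in train) / len(train)
        shared_w[k] -= lr_w * grad

        # Phase 2: sample a child, score it on validation data, and
        # update the controller with a REINFORCE-style step.
        k = sample_op(logits, rng)
        reward = -mse(shared_w, k, valid)
        baseline = 0.9 * baseline + 0.1 * reward
        advantage = reward - baseline
        probs = softmax(logits)
        for j in range(len(logits)):
            # d log pi(k) / d logit_j = 1[j == k] - pi(j)
            indicator = 1.0 if j == k else 0.0
            logits[j] += lr_ctrl * advantage * (indicator - probs[j])

    return softmax(logits), shared_w


if __name__ == "__main__":
    final_probs, weights = run_toy_search()
    for name, p, w in zip(OP_NAMES, final_probs, weights):
        print(f"{name:8s} prob={p:.3f} shared_w={w:.3f}")
```

Because a sampled child reuses whatever weights earlier children already trained, no child is trained from scratch. In this sketch, that reuse is the only source of the speedup the abstract attributes to parameter sharing.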


Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning

Methods: Sigmoid Activation · Tanh Activation · Softmax · Long Short-Term Memory