Learning Transferable Architectures for Scalable Image Recognition

Barret Zoph; Vijay Vasudevan; Jonathon Shlens; Quoc V. Le

arXiv:1707.07012·cs.CV·April 12, 2018

TL;DR

This paper introduces a method for learning neural network architectures directly from data: an architectural building block is searched for on a small dataset and then transferred to a larger one, achieving state-of-the-art accuracy at reduced computational cost.

Contribution

The work proposes a new search space for neural architecture search (the "NASNet search space") designed so that a cell found on a small dataset transfers to a larger one, along with a new regularization technique, ScheduledDropPath.

Findings

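01. Achieved a 2.4% error rate on CIFAR-10, state-of-the-art at the time of publication.

02. Attained 82.7% top-1 accuracy on ImageNet, state-of-the-art among published work at the time.

03. Reduced computational demand by 28% (about 9 billion fewer FLOPS) relative to the previous state-of-the-art model.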

Abstract

Developing neural network image classification models often requires significant architecture engineering. In this paper, we study a method to learn the model architectures directly on the dataset of interest. As this approach is expensive when the dataset is large, we propose to search for an architectural building block on a small dataset and then transfer the block to a larger dataset. The key contribution of this work is the design of a new search space (the "NASNet search space") which enables transferability. In our experiments, we search for the best convolutional layer (or "cell") on the CIFAR-10 dataset and then apply this cell to the ImageNet dataset by stacking together more copies of this cell, each with their own parameters to design a convolutional architecture, named "NASNet architecture". We also introduce a new regularization technique called ScheduledDropPath that…
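The transfer step described in the abstract, reusing one searched cell and stacking more copies of it with separate parameters, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Cell` class, the `build_network` helper, the operation names, and the cell counts and widths are all hypothetical placeholders chosen for the example.

```python
import random


class Cell:
    """Hypothetical stand-in for a searched cell (not the paper's code).

    Every copy shares the same structure (`ops`) but owns its parameters.
    """

    def __init__(self, ops, width):
        self.ops = ops      # architecture description, shared by all copies
        self.width = width  # assumed per-cell width, for illustration only
        # Each copy draws its own weights; nothing is shared between copies.
        self.weights = [random.gauss(0.0, 0.1) for _ in range(width * len(ops))]


def build_network(cell_ops, num_cells, width):
    """Stack `num_cells` independent copies of the same cell structure."""
    return [Cell(cell_ops, width) for _ in range(num_cells)]


# Placeholder operation names; the actual searched cell is not reproduced here.
SEARCHED_CELL = ("sep_conv_3x3", "avg_pool_3x3", "identity")

# Assumed sizes: a shallow stack for the small dataset, a deeper and wider
# stack of the same cell for the large dataset.
small_net = build_network(SEARCHED_CELL, num_cells=6, width=32)
large_net = build_network(SEARCHED_CELL, num_cells=18, width=168)
```

The point of the sketch is that only the cell structure is transferred between datasets; depth, width, and all weights are chosen or trained anew for the larger model.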

Taxonomy

Methods: Neural Architecture Search · Sigmoid Activation · Tanh Activation · Entropy Regularization · Proximal Policy Optimization · Exponential Decay · Instance Normalization · Layer Normalization · Dropout · RMSProp