Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search

Arber Zela; Aaron Klein; Stefan Falkner; Frank Hutter

arXiv:1807.06906 · cs.LG · July 19, 2018 · 88 citations

TL;DR

This paper proposes searching over neural architectures and hyperparameters jointly, using a recent combination of Bayesian optimization and Hyperband, to avoid the inefficiency and suboptimality of tuning the two in separate stages.

Contribution

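It shows that architectural choices and hyperparameter settings interact, so tuning them in separate steps can be suboptimal, and that rankings obtained from very short training runs correlate poorly with rankings from much longer runs. It then applies a recent combination of Bayesian optimization and Hyperband to search over both simultaneously.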

Findings

01. Joint search outperforms separate tuning in accuracy.

02. Using Bayesian optimization with Hyperband reduces search time.

03. The method finds better configurations than traditional NAS.
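To make the search-time finding more concrete, here is a minimal sketch of successive halving, the budget-allocation routine at the core of Hyperband, run over a joint space of architecture and training hyperparameters. Everything in it is an assumption made for illustration: the search space `SPACE`, the `toy_loss` function that stands in for training a network, and the uniform random sampling, which takes the place of the Bayesian optimization model. It is not the authors' implementation.

```python
import random

# Hypothetical joint search space: architectural choices and training
# hyperparameters are sampled together rather than in separate stages.
SPACE = {
    "num_layers": [2, 4, 8],              # architecture
    "width": [16, 32, 64],                # architecture
    "learning_rate": [1e-3, 1e-2, 1e-1],  # hyperparameter
    "weight_decay": [0.0, 1e-4, 1e-3],    # hyperparameter
}


def sample_config(rng):
    """Draw one joint configuration uniformly at random (no BO model here)."""
    return {name: rng.choice(choices) for name, choices in SPACE.items()}


def toy_loss(config, budget, rng):
    """Stand-in for 'train for `budget` epochs and return validation loss'.

    The preferred learning rate depends on the depth, which mimics an
    interaction between an architectural choice and a hyperparameter.
    """
    preferred_lr = 0.1 / config["num_layers"]
    mismatch = abs(config["learning_rate"] - preferred_lr)
    noise = rng.gauss(0.0, 1.0) / budget  # larger budgets give less noisy scores
    return mismatch + 1.0 / config["width"] + noise


def successive_halving(n_configs, min_budget, max_budget, eta=3, seed=0):
    """Evaluate many configs cheaply, keep the best 1/eta, raise the budget."""
    rng = random.Random(seed)
    configs = [sample_config(rng) for _ in range(n_configs)]
    budget = min_budget
    history = []  # (budget, number of configs evaluated) per round
    while True:
        losses = [toy_loss(c, budget, rng) for c in configs]
        history.append((budget, len(configs)))
        order = sorted(range(len(configs)), key=lambda i: losses[i])
        if budget >= max_budget or len(configs) == 1:
            return configs[order[0]], history
        keep = max(1, len(configs) // eta)
        configs = [configs[i] for i in order[:keep]]
        budget = min(budget * eta, max_budget)


best, history = successive_halving(n_configs=27, min_budget=1, max_budget=27)
print(history)
print(best)
```

Each round spends the same total budget (27 epochs in this toy setup) while the number of surviving configurations shrinks by a factor of `eta`. In the combination the paper refers to, a model fitted to past evaluations would propose the configurations instead of `sample_config`.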

Abstract

While existing work on neural architecture search (NAS) tunes hyperparameters in a separate post-processing step, we demonstrate that architectural choices and other hyperparameter settings interact in a way that can render this separation suboptimal. Likewise, we demonstrate that the common practice of using very few epochs during the main NAS and much larger numbers of epochs during a post-processing step is inefficient due to little correlation in the relative rankings for these two training regimes. To combat both of these problems, we propose to use a recent combination of Bayesian optimization and Hyperband for efficient joint neural architecture and hyperparameter search.
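The abstract's claim about "little correlation in the relative rankings" can be checked with a rank correlation coefficient. The sketch below computes Spearman's rank correlation in plain Python between the validation errors of the same configurations after a short and a long training run. The numbers are made up for illustration and are not results from the paper; the function names are likewise hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Made-up validation errors for five configurations (illustration only).
short_run_error = [0.30, 0.25, 0.40, 0.35, 0.20]  # after a few epochs
long_run_error = [0.10, 0.12, 0.08, 0.09, 0.11]   # after many epochs

rho = spearman(short_run_error, long_run_error)
print(f"Spearman rank correlation: {rho:.2f}")
```

A value near 1 would mean that short runs rank configurations the same way long runs do. A value near 0 or below, as in this made-up example, would mean that selecting architectures from short runs says little about which ones win after full training.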

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Machine Learning and Algorithms

Methods: Sigmoid Activation · Tanh Activation · Softmax · Long Short-Term Memory