Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space

Vlad Pushkarov, Jonathan Efroni, Mykola Maksymenko, Maciej Koch-Janusz

arXiv:1909.04013 · cs.LG · September 10, 2019 · 1 citation

Open Access

TL;DR

This paper introduces a novel hyperparameter optimization method that uses nonlocal path sampling via parallel tempering, leading to faster training and better generalization in deep neural networks.

Contribution

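The authors couple the usually independent model instances and let them exchange hyperparameters during training via parallel tempering, a technique from statistical physics, so that each run follows a nonlocal path through hyperparameter space instead of sampling a single point.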

Findings

01 Faster training convergence observed.
02 Reduced overfitting and improved validation error.
03 Outperforms benchmark hyperparameter optimization methods.

Abstract

Hyperparameter optimization is both a practical issue and an interesting theoretical problem in the training of deep architectures. Despite many recent advances, the most commonly used methods almost universally involve training multiple decoupled copies of the model, in effect sampling the hyperparameter space. We show that, at a negligible additional computational cost, results can be improved by sampling nonlocal paths instead of points in hyperparameter space. To this end we interpret hyperparameters as controlling the level of correlated noise in training, which can be mapped to an effective temperature. The usually independent instances of the model are coupled and allowed to exchange their hyperparameters throughout the training using the well-established parallel tempering technique of statistical physics. Each simulation then corresponds to a unique path, or history, in the joint…
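
The exchange step mentioned in the abstract can be illustrated with the textbook parallel tempering acceptance rule. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes each replica's hyperparameters have already been mapped to an inverse effective temperature `beta`, and that the training loss plays the role of the energy. The function names `swap_probability` and `exchange_step` are hypothetical.

```python
import math
import random


def swap_probability(beta_i, beta_j, loss_i, loss_j):
    """Textbook parallel tempering acceptance probability.

    Assumption: loss acts as the energy and beta as the inverse
    effective temperature; the paper's exact mapping may differ.
    """
    exponent = (beta_i - beta_j) * (loss_i - loss_j)
    # min(1, exp(exponent)), written to avoid overflow for large exponents
    return 1.0 if exponent >= 0 else math.exp(exponent)


def exchange_step(betas, losses, rng):
    """Attempt hyperparameter swaps between neighbouring replicas.

    betas[k] is the inverse temperature currently held by replica k,
    losses[k] is that replica's current loss. The model weights stay
    with the replica; only the temperatures are exchanged.
    Returns the new assignment of betas to replicas.
    """
    betas = list(betas)
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], betas[k + 1], losses[k], losses[k + 1])
        if rng.random() < p:
            betas[k], betas[k + 1] = betas[k + 1], betas[k]
    return betas


# Usage: three replicas with hypothetical losses and temperatures.
rng = random.Random(0)
new_betas = exchange_step([4.0, 2.0, 1.0], [0.9, 0.7, 0.8], rng)
```

In this sketch a swap is always accepted when the colder replica (larger `beta`) currently has the higher loss, and is otherwise accepted with an exponentially suppressed probability. Recording the sequence of `beta` values held by one replica over many exchange steps gives the kind of path through hyperparameter space that the abstract refers to.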

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Machine Learning and Algorithms

Methods: Dropout