Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space
Vlad Pushkarov, Jonathan Efroni, Mykola Maksymenko, Maciej Koch-Janusz

TL;DR
This paper introduces a novel hyperparameter optimization method that uses nonlocal path sampling via parallel tempering, leading to faster training and better generalization in deep neural networks.
Contribution
The authors propose a new approach that couples model instances to explore hyperparameter space more effectively using statistical physics techniques.
Findings
Faster training convergence observed.
Reduced overfitting and improved validation error.
Outperforms benchmark hyperparameter optimization methods.
Abstract
Hyperparameter optimization is both a practical issue and an interesting theoretical problem in the training of deep architectures. Despite many recent advances, the most commonly used methods almost universally involve training multiple decoupled copies of the model, in effect sampling the hyperparameter space. We show that, at a negligible additional computational cost, results can be improved by sampling nonlocal paths instead of points in hyperparameter space. To this end we interpret hyperparameters as controlling the level of correlated noise in training, which can be mapped to an effective temperature. The usually independent instances of the model are coupled and allowed to exchange their hyperparameters throughout the training using the well-established parallel tempering technique of statistical physics. Each simulation then corresponds to a unique path, or history, in the joint…
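The exchange step described in the abstract can be illustrated with a minimal sketch of a textbook parallel tempering swap. This is not the authors' implementation: the names `swap_probability` and `exchange_step` are hypothetical, the training loss is assumed to play the role of the energy, and each replica is assumed to already carry an effective temperature derived from its hyperparameter (the mapping itself is not given in this excerpt).

```python
import math
import random


def swap_probability(loss_i, loss_j, temp_i, temp_j):
    """Standard replica-exchange (Metropolis) acceptance probability.

    Assumption: the loss acts as the energy and temp_* are effective
    temperatures; both names are illustrative, not from the paper.
    """
    beta_i, beta_j = 1.0 / temp_i, 1.0 / temp_j
    exponent = (beta_i - beta_j) * (loss_i - loss_j)
    # min(1, exp(exponent)) without overflowing for large positive exponents
    return 1.0 if exponent >= 0 else math.exp(exponent)


def exchange_step(losses, temps, rng, parity=0):
    """Propose swaps between replicas that are neighbours in temperature.

    losses[r] and temps[r] belong to replica r. The model weights stay in
    place; only the temperatures (hyperparameters) are exchanged.
    Returns a new list of temperatures, one per replica.
    """
    temps = list(temps)
    # Replica indices ordered from coldest to hottest
    order = sorted(range(len(temps)), key=lambda r: temps[r])
    # Disjoint neighbouring pairs; alternate parity between calls
    for a in range(parity, len(order) - 1, 2):
        i, j = order[a], order[a + 1]
        p = swap_probability(losses[i], losses[j], temps[i], temps[j])
        if rng.random() < p:
            temps[i], temps[j] = temps[j], temps[i]
    return temps


if __name__ == "__main__":
    rng = random.Random(0)
    losses = [0.9, 0.4, 0.7, 0.5]   # made-up per-replica training losses
    temps = [0.1, 0.2, 0.4, 0.8]    # made-up effective temperatures
    print(exchange_step(losses, temps, rng, parity=0))
```

In a training loop, a step like this would presumably run every few epochs between ordinary gradient updates, so that each replica traces a path through hyperparameter space rather than staying at a fixed point. A swap is always accepted when the colder replica currently has the higher loss, and is otherwise accepted with an exponentially suppressed probability.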
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Machine Learning and Algorithms
Methods: Dropout
