Critical Hyper-Parameters: No Random, No Cry
Olivier Bousquet, Sylvain Gelly, Karol Kurach, Olivier Teytaud, Damien Vincent

TL;DR
This paper advocates for the use of Low Discrepancy Sequences in hyperparameter optimization for deep learning, demonstrating they outperform random search in efficiency and effectiveness.
Contribution
It introduces a simple LDS-based method for hyperparameter search that is theoretically sound and practically effective, serving as a drop-in replacement for traditional methods.
Findings
LDS methods outperform random search in hyperparameter tuning efficiency.
LDS methods require fewer runs than random search to find suitable hyperparameters for deep learning models.
The proposed LDS approach can be used either as a one-shot search or as an initialization method.
Abstract
The selection of hyper-parameters is critical in Deep Learning. Because of the long training time of complex models and the availability of compute resources in the cloud, "one-shot" optimization schemes - where the sets of hyper-parameters are selected in advance (e.g. on a grid or in a random manner) and the training is executed in parallel - are commonly used. It is known that grid search is sub-optimal, especially when only a few critical parameters matter, and random search has been suggested as an alternative. Yet, random search can be "unlucky" and produce sets of values that leave some part of the domain unexplored. Quasi-random methods, such as Low Discrepancy Sequences (LDS), avoid these issues. We show that such methods have theoretical properties that make them appealing for performing hyperparameter search, and demonstrate that, when applied to the selection of hyperparameters of complex…
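To make the one-shot idea concrete, here is a minimal sketch of drawing hyperparameter sets from a low discrepancy sequence instead of a random generator. It uses the Halton sequence purely as a well-known example of an LDS; this summary does not say which sequence the authors use. The hyperparameter names and ranges in `to_config`, and the `largest_gap` coverage measure, are hypothetical choices made for illustration.

```python
import random

# First few primes, used as the per-dimension bases of the Halton sequence.
PRIMES = (2, 3, 5, 7, 11, 13)


def radical_inverse(index, base):
    """Van der Corput radical inverse: mirror the base-b digits of index
    around the decimal point, giving a value in [0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton(n_points, n_dims):
    """Return the first n_points of the Halton sequence in [0, 1)^n_dims."""
    if n_dims > len(PRIMES):
        raise ValueError("this sketch only supports up to %d dimensions" % len(PRIMES))
    return [
        [radical_inverse(i, PRIMES[d]) for d in range(n_dims)]
        for i in range(1, n_points + 1)
    ]


def to_config(u):
    """Map a point of the unit cube to a (hypothetical) hyperparameter set."""
    return {
        "learning_rate": 10 ** (-5 + 4 * u[0]),      # log-uniform in [1e-5, 1e-1]
        "dropout": 0.5 * u[1],                       # uniform in [0, 0.5)
        "hidden_units": int(round(2 ** (5 + 5 * u[2]))),  # log-uniform in [32, 1024]
    }


def largest_gap(values):
    """Largest unexplored interval along one axis, edges of [0, 1] included."""
    pts = [0.0] + sorted(values) + [1.0]
    return max(b - a for a, b in zip(pts, pts[1:]))


if __name__ == "__main__":
    n = 16
    lds_points = halton(n, 3)
    rnd_points = [[random.random() for _ in range(3)] for _ in range(n)]

    # All configurations are fixed in advance, so they can be trained in parallel.
    configs = [to_config(u) for u in lds_points]
    print(configs[0])

    # Compare how much of the first axis each scheme leaves unexplored.
    print("Halton gap:", largest_gap([p[0] for p in lds_points]))
    print("Random gap:", largest_gap([p[0] for p in rnd_points]))
```

Because the sampler only produces points in the unit cube, swapping `halton` for a random generator (or the reverse) leaves the rest of a tuning pipeline unchanged, which is the sense in which such a method can act as a drop-in replacement.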
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Probabilistic and Robust Engineering Design · Gaussian Processes and Bayesian Inference
Methods: Random Search · Sigmoid Activation · Tanh Activation · Long Short-Term Memory
