Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions
Matthew MacKay, Paul Vicol, Jon Lorraine, David Duvenaud, Roger Grosse

TL;DR
Self-Tuning Networks (STNs) introduce a scalable bilevel optimization approach that adaptively tunes hyperparameters during training by approximating the best-response function, outperforming existing methods on large-scale deep learning tasks.
Contribution
The paper presents a novel method for online hyperparameter tuning using structured best-response approximations. Because it does not require differentiating the training loss with respect to the hyperparameters, it can tune discrete hyperparameters and adapt hyperparameters during training.
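For orientation, the bilevel problem referred to here is commonly written as below. The notation is chosen for this sketch and is not taken from this page: $\lambda$ denotes the hyperparameters, $w$ the network weights, and $\mathcal{L}_T$ and $\mathcal{L}_V$ the training and validation losses.

```latex
\lambda^{*} = \arg\min_{\lambda} \; \mathcal{L}_V\bigl(\lambda, w^{*}(\lambda)\bigr)
\quad \text{subject to} \quad
w^{*}(\lambda) = \arg\min_{w} \; \mathcal{L}_T(\lambda, w)
```

The map $\lambda \mapsto w^{*}(\lambda)$ is the best-response function. Under this reading, replacing it with a differentiable approximation $\hat{w}_{\phi}(\lambda)$ allows $\lambda$ to be updated through the validation loss, so the training loss never has to be differentiated with respect to $\lambda$.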
Findings
Outperforms existing hyperparameter optimization methods on large-scale problems
Allows tuning of discrete hyperparameters, data augmentation, and dropout probabilities
Discovers hyperparameter schedules that outperform fixed values
Abstract
Hyperparameter optimization can be formulated as a bilevel optimization problem, where the optimal parameters on the training set depend on the hyperparameters. We aim to adapt regularization hyperparameters for neural networks by fitting compact approximations to the best-response function, which maps hyperparameters to optimal weights and biases. We show how to construct scalable best-response approximations for neural networks by modeling the best-response as a single network whose hidden units are gated conditionally on the regularizer. We justify this approximation by showing the exact best-response for a shallow linear network with L2-regularized Jacobian can be represented by a similar gating mechanism. We fit this model using a gradient-based hyperparameter optimization algorithm which alternates between approximating the best-response around the current hyperparameters and…
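The gating idea in the abstract can be illustrated with a minimal NumPy sketch of a single linear layer whose weights and biases are an affine function of the hyperparameters. This is an assumption-laden illustration, not the authors' implementation: the names `init_params`, `best_response_weights`, `gated_forward`, and the parameter keys are hypothetical, and the shapes are chosen only for the example.

```python
import numpy as np


def init_params(d_in, d_out, n_hyper, seed=0):
    """Create illustrative parameters for one gated layer (shapes are assumptions)."""
    rng = np.random.default_rng(seed)
    return {
        "W_elem": rng.normal(size=(d_out, d_in)),   # ordinary weights
        "W_hyper": rng.normal(size=(d_out, d_in)),  # weights rescaled by the gate
        "V": rng.normal(size=(d_out, n_hyper)),     # maps hyperparameters to weight gates
        "b_elem": rng.normal(size=d_out),           # ordinary biases
        "b_hyper": rng.normal(size=d_out),          # biases rescaled by the gate
        "C": rng.normal(size=(d_out, n_hyper)),     # maps hyperparameters to bias gates
    }


def best_response_weights(params, lam):
    """Return (W, b) as an affine function of the hyperparameter vector lam."""
    gate_w = params["V"] @ lam  # one gate value per output unit
    gate_b = params["C"] @ lam
    # Row-wise rescaling: each output unit's row of W_hyper is scaled by its gate.
    W = params["W_elem"] + gate_w[:, None] * params["W_hyper"]
    b = params["b_elem"] + gate_b * params["b_hyper"]
    return W, b


def gated_forward(params, lam, x):
    """Apply the layer to input x using the weights predicted for lam."""
    W, b = best_response_weights(params, lam)
    return W @ x + b


params = init_params(d_in=4, d_out=3, n_hyper=2)
lam = np.array([0.5, -1.0])  # hypothetical hyperparameter values
x = np.ones(4)
y = gated_forward(params, lam, x)
print(y.shape)  # (3,)
```

Because the gate multiplies whole rows, the output equals the usual layer output plus a second set of hidden units scaled by a linear function of `lam`, which is the sense in which hidden units are "gated conditionally on the regularizer". The extra cost is roughly one additional weight matrix per layer, rather than a separate network that outputs all weights.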
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
Methods: HyperNetwork · Dropout
