Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions

Matthew MacKay; Paul Vicol; Jon Lorraine; David Duvenaud; Roger Grosse

arXiv:1903.03088 · cs.LG · March 8, 2019 · 29 cites

TL;DR

Self-Tuning Networks (STNs) introduce a scalable bilevel optimization approach that adaptively tunes hyperparameters during training by approximating the best-response function, outperforming existing methods on large-scale deep learning tasks.

Contribution

The paper presents a method for tuning hyperparameters online, during training, using structured approximations to the best-response function. Because the method does not differentiate the training loss with respect to the hyperparameters, it can also tune discrete hyperparameters.
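The following is a sketch of the bilevel setup in common notation, included to show where the training loss and the validation loss enter. The symbols are assumptions for illustration and do not appear on this page: \lambda for hyperparameters, w for weights, \mathcal{L}_T and \mathcal{L}_V for training and validation losses, \hat{w}_\phi for the best-response approximation with parameters \phi, \epsilon for a random perturbation of the hyperparameters, and \alpha, \beta for step sizes.

```latex
\[
\begin{aligned}
\lambda^{*} &= \operatorname*{arg\,min}_{\lambda}\; \mathcal{L}_{V}\bigl(\lambda,\, w^{*}(\lambda)\bigr)
\quad \text{subject to} \quad
w^{*}(\lambda) = \operatorname*{arg\,min}_{w}\; \mathcal{L}_{T}(\lambda, w), \\
\phi &\leftarrow \phi - \alpha\, \nabla_{\phi}\, \mathcal{L}_{T}\bigl(\lambda + \epsilon,\; \hat{w}_{\phi}(\lambda + \epsilon)\bigr), \\
\lambda &\leftarrow \lambda - \beta\, \nabla_{\lambda}\, \mathcal{L}_{V}\bigl(\lambda,\; \hat{w}_{\phi}(\lambda)\bigr).
\end{aligned}
\]
```

In this sketch the training loss is differentiated only with respect to \phi, and the hyperparameters receive gradients only through the validation loss evaluated at \hat{w}_\phi(\lambda).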

Findings

01. Outperforms existing hyperparameter optimization methods on large-scale problems

02. Allows tuning of discrete hyperparameters, data augmentation, and dropout probabilities

03. Discovers hyperparameter schedules that outperform fixed values

Abstract

Hyperparameter optimization can be formulated as a bilevel optimization problem, where the optimal parameters on the training set depend on the hyperparameters. We aim to adapt regularization hyperparameters for neural networks by fitting compact approximations to the best-response function, which maps hyperparameters to optimal weights and biases. We show how to construct scalable best-response approximations for neural networks by modeling the best-response as a single network whose hidden units are gated conditionally on the regularizer. We justify this approximation by showing the exact best-response for a shallow linear network with L2-regularized Jacobian can be represented by a similar gating mechanism. We fit this model using a gradient-based hyperparameter optimization algorithm which alternates between approximating the best-response around the current hyperparameters and…
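The alternating scheme described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes ridge regression on synthetic data, a single hyperparameter `lam` (the log of the L2 penalty), and a local affine best-response `w0 + (lam' - lam) * w1` in place of the gated network. The second half of the alternation is cut off in the abstract above; the sketch assumes it is a gradient step on the validation loss through the approximate best-response. All names (`self_tune`, `w0`, `w1`, step sizes, clipping bounds) are hypothetical.

```python
import numpy as np


def make_data(rng, n, w_true, noise=0.5):
    """Synthetic linear-regression data (an assumption for this toy example)."""
    X = rng.normal(size=(n, w_true.size))
    y = X @ w_true + noise * rng.normal(size=n)
    return X, y


def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))


def mse_grad(X, y, w):
    return 2.0 * X.T @ (X @ w - y) / len(y)


def self_tune(lam_init=1.0, warmup=1000, steps=2000,
              lr_w=0.01, lr_lam=0.01, seed=0):
    rng = np.random.default_rng(seed)
    w_true = np.array([1.5, -2.0, 0.5, 1.0, -1.0])
    X_t, y_t = make_data(rng, 40, w_true)   # training split
    X_v, y_v = make_data(rng, 40, w_true)   # validation split
    d = w_true.size

    # Reference point: exact ridge solution at the fixed initial penalty.
    A = X_t.T @ X_t / len(y_t) + np.exp(lam_init) * np.eye(d)
    w_fixed = np.linalg.solve(A, X_t.T @ y_t / len(y_t))

    lam = lam_init
    w0 = np.zeros(d)   # approximate best response at the current lam
    w1 = np.zeros(d)   # approximate sensitivity of the best response to lam

    for step in range(warmup + steps):
        # (a) Fit the best-response approximation near the current lam:
        #     perturb lam, then descend the regularized training loss
        #     with respect to (w0, w1) only, never with respect to lam.
        eps = rng.uniform(-0.5, 0.5)
        w = w0 + eps * w1
        g = mse_grad(X_t, y_t, w) + 2.0 * np.exp(lam + eps) * w
        w0 -= lr_w * g
        w1 -= lr_w * eps * g

        # (b) After a warm-up, update lam using the validation loss,
        #     differentiated through the affine approximation.
        if step >= warmup:
            dlam = mse_grad(X_v, y_v, w0) @ w1
            new_lam = float(np.clip(lam - lr_lam * dlam, -3.0, 1.5))
            w0 = w0 + (new_lam - lam) * w1   # re-center on the new lam
            lam = new_lam

    return lam, mse(X_v, y_v, w0), mse(X_v, y_v, w_fixed)


if __name__ == "__main__":
    lam, val_tuned, val_fixed = self_tune()
    print(f"final log-penalty: {lam:.3f}")
    print(f"validation MSE, tuned online: {val_tuned:.3f}")
    print(f"validation MSE, fixed initial penalty: {val_fixed:.3f}")
```

The warm-up phase and the clipping of `lam` are choices made to keep this toy numerically stable, not details taken from the paper. The point of the sketch is that step (a) never needs a gradient of the training loss with respect to `lam`.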

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques

Methods: HyperNetwork · Dropout