Automatic Gradient Descent: Deep Learning without Hyperparameters

Jeremy Bernstein, Chris Mingard, Kevin Huang, Navid Azizan, and Yisong Yue

arXiv:2304.05187 · cs.LG · April 12, 2023 · 6 cites

TL;DR

This paper introduces automatic gradient descent, a hyperparameter-free first-order optimizer derived for deep neural networks. It extends mirror descent theory to non-convex composite objectives by accounting for neural architecture, and it trains out of the box at ImageNet scale.

Contribution

It develops a new optimization framework that explicitly incorporates neural architecture, resulting in an automatic, hyperparameter-free gradient descent method for deep learning.

Findings

01. Successfully trains deep networks without hyperparameters

02. Applies to fully-connected and convolutional networks

03. Operates efficiently at ImageNet scale

Abstract

The architecture of a deep neural network is defined explicitly in terms of the number of layers, the width of each layer and the general network topology. Existing optimisation frameworks neglect this information in favour of implicit architectural information (e.g. second-order methods) or architecture-agnostic distance functions (e.g. mirror descent). Meanwhile, the most popular optimiser in practice, Adam, is based on heuristics. This paper builds a new framework for deriving optimisation algorithms that explicitly leverage neural architecture. The theory extends mirror descent to non-convex composite objective functions: the idea is to transform a Bregman divergence to account for the non-linear structure of neural architecture. Working through the details for deep fully-connected networks yields automatic gradient descent: a first-order optimiser without any hyperparameters.…
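The abstract is truncated before it states the update rule, so the following is only a minimal sketch of what a hyperparameter-free step of this kind could look like. It assumes the rule usually attributed to the paper's algorithm: a gradient summary G averaged over layers, an automatic step size η = log((1 + √(1 + 4G)) / 2), and a per-layer update that uses the Frobenius-normalised gradient scaled by √(d_out / d_in). The function names `frobenius_norm` and `agd_step` are hypothetical, the code uses plain Python lists rather than PyTorch tensors, and it should be checked against the paper and the jxbz/agd repository before being relied on.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def agd_step(weights, grads):
    """One AGD-style update (assumed form, not the official implementation).

    weights[k] and grads[k] are d_out x d_in matrices (lists of rows)
    for layer k. Returns the updated weights and the step size eta.
    """
    num_layers = len(weights)

    # Assumed per-layer scale sqrt(d_out / d_in), read off the matrix shape.
    scales = [math.sqrt(len(w) / len(w[0])) for w in weights]
    norms = [frobenius_norm(g) for g in grads]

    # Gradient summary: scaled gradient norms averaged over layers.
    summary = sum(n * s for n, s in zip(norms, scales)) / num_layers

    # Automatic step size; no learning rate is supplied by the caller.
    eta = math.log((1 + math.sqrt(1 + 4 * summary)) / 2)

    new_weights = []
    for w, g, scale, norm in zip(weights, grads, scales, norms):
        if norm == 0.0:
            # Zero gradient: leave this layer unchanged.
            new_weights.append([row[:] for row in w])
            continue
        factor = (eta / num_layers) * scale / norm
        new_weights.append(
            [[wi - factor * gi for wi, gi in zip(w_row, g_row)]
             for w_row, g_row in zip(w, g)]
        )
    return new_weights, eta
```

Under these assumptions the caller passes only weights and gradients: the step size is computed from the gradient norms and the layer shapes, which is the sense in which the sketch has no hyperparameters. When every gradient is zero, the summary is 0 and η = log(1) = 0, so the weights do not move.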

Code & Models

Repositories

jxbz/agd (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM

Methods: Adam