Learning rate adaptive stochastic gradient descent optimization methods: numerical simulations for deep learning methods for partial differential equations and convergence analyses

Steffen Dereich; Arnulf Jentzen; Adrian Riekert

arXiv:2406.14340·math.OC·June 21, 2024

TL;DR

This paper introduces learning-rate-adaptive variants of the SGD and Adam optimizers that adjust the learning rate based on empirical estimates of the objective function value, leading to faster convergence in deep learning applications for PDEs and providing theoretical convergence guarantees.

Contribution

The paper proposes a novel adaptive learning rate scheme for SGD and Adam, with empirical validation in deep PDE approximation methods and a rigorous convergence proof for a class of problems.

Findings

1. Adaptive Adam converges faster than default Adam in deep learning PDE applications.

2. The proposed method improves objective function reduction in neural network training.

3. A convergence proof is provided for a class of quadratic minimization problems.

Abstract

It is known that the standard stochastic gradient descent (SGD) optimization method, as well as accelerated and adaptive SGD optimization methods such as the Adam optimizer, fail to converge if the learning rates do not converge to zero (as, for example, in the situation of constant learning rates). Numerical simulations often use human-tuned deterministic learning rate schedules or small constant learning rates. The default learning rate schedules for SGD optimization methods in machine learning implementation frameworks such as TensorFlow and PyTorch are constant learning rates. In this work we propose and study a learning-rate-adaptive approach for SGD optimization methods in which the learning rate is adjusted based on empirical estimates for the values of the objective function of the considered optimization problem (the function that one intends to minimize). In particular, we…
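The idea in the abstract can be illustrated with a small toy experiment. The sketch below is not the authors' algorithm: the one-dimensional quadratic problem, the function name `run_sgd`, the window length, and the rule "halve the learning rate whenever the averaged loss estimate fails to decrease" are all assumptions chosen for illustration. It compares SGD with a constant learning rate against a variant that shrinks the learning rate based on windowed empirical estimates of the objective.

```python
import random


def run_sgd(adaptive, steps=20000, lr=0.5, window=200, factor=0.5, tail=1000, seed=0):
    """SGD on f(theta) = E[(theta - X)^2] / 2 with X ~ N(1, 1), minimized at theta = 1.

    Returns the mean squared distance to the minimizer over the last `tail`
    steps and the final learning rate. All parameters are illustrative
    assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    theta = 5.0
    prev_avg = float("inf")  # loss estimate from the previous window
    loss_sum = 0.0
    tail_sq_err = 0.0
    for n in range(1, steps + 1):
        x = rng.gauss(1.0, 1.0)
        loss_sum += 0.5 * (theta - x) ** 2  # one-sample estimate of f(theta)
        theta -= lr * (theta - x)           # SGD step; the gradient sample is theta - x
        if n > steps - tail:
            tail_sq_err += (theta - 1.0) ** 2
        if adaptive and n % window == 0:
            avg = loss_sum / window         # empirical estimate of the objective
            if avg >= prev_avg:             # no improvement: shrink the learning rate
                lr *= factor
            prev_avg = avg
            loss_sum = 0.0
    return tail_sq_err / tail, lr


if __name__ == "__main__":
    for adaptive in (False, True):
        mse, final_lr = run_sgd(adaptive)
        print(f"adaptive={adaptive}: tail MSE={mse:.5f}, final lr={final_lr:.2e}")
```

With a constant learning rate the iterates keep fluctuating around the minimizer, whereas the adaptive variant drives the learning rate toward zero and settles much closer to it. The halving rule is deliberately crude, since once gradient noise dominates it fires roughly every other window. It should be read only as a stand-in for whatever adjustment rule the paper actually analyzes.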

Code & Models

Repositories

deeplearningmethods/adaptive-learning-rate
PyTorch · Official

Taxonomy

Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques

Methods: Adam · Stochastic Gradient Descent