Learning rate adaptive stochastic gradient descent optimization methods: numerical simulations for deep learning methods for partial differential equations and convergence analyses
Steffen Dereich, Arnulf Jentzen, Adrian Riekert

TL;DR
This paper introduces a learning-rate-adaptive approach for SGD-type optimizers such as Adam, in which the learning rate is adjusted based on empirical estimates of the objective function values. The approach leads to faster convergence in deep learning applications for PDEs and comes with a convergence proof for a class of quadratic minimization problems.
Contribution
The paper proposes a novel adaptive learning rate scheme for SGD and Adam, with empirical validation in deep learning methods for PDE approximation and a rigorous convergence proof for a class of quadratic minimization problems.
Findings
Adaptive Adam converges faster than default Adam in deep learning PDE applications.
The proposed method reduces the objective function more effectively during neural network training.
A convergence proof is provided for a class of quadratic minimization problems.
Abstract
It is known that the standard stochastic gradient descent (SGD) optimization method, as well as accelerated and adaptive SGD optimization methods such as the Adam optimizer, fail to converge if the learning rates do not converge to zero (as, for example, in the situation of constant learning rates). Numerical simulations often use human-tuned deterministic learning rate schedules or small constant learning rates. The default learning rate schedules for SGD optimization methods in machine learning implementation frameworks such as TensorFlow and PyTorch are constant learning rates. In this work we propose and study a learning-rate-adaptive approach for SGD optimization methods in which the learning rate is adjusted based on empirical estimates for the values of the objective function of the considered optimization problem (the function that one intends to minimize). In particular, we…
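The idea described in the abstract can be illustrated with a minimal sketch on a toy quadratic problem. This is not the authors' scheme: the halving rule, the check interval, the fixed evaluation sample, and all names (run_sgd, check_every, eval_sample) are assumptions chosen for illustration. The sketch only shows the general pattern of estimating the objective empirically and lowering the learning rate when the estimate stops improving.

```python
import numpy as np

def run_sgd(adaptive, steps=5000, lr0=0.1, check_every=250, batch=32, seed=0):
    """SGD on f(theta) = 0.5 * E[(theta - X)^2] with X ~ N(1, 1).

    The minimizer is theta = 1. If `adaptive` is True, the learning rate
    is halved whenever an empirical estimate of f fails to improve
    (hypothetical rule, not taken from the paper).
    """
    rng = np.random.default_rng(seed)
    # Fixed sample used only to estimate the objective value.
    eval_sample = rng.normal(1.0, 1.0, size=10_000)
    theta, lr = 5.0, lr0
    best = np.inf
    for n in range(1, steps + 1):
        x = rng.normal(1.0, 1.0, size=batch)
        grad = np.mean(theta - x)  # stochastic gradient of f at theta
        theta -= lr * grad
        if adaptive and n % check_every == 0:
            est = 0.5 * np.mean((theta - eval_sample) ** 2)
            if est >= best:
                lr *= 0.5  # no improvement in the estimate: shrink the step
            best = min(best, est)
    return theta, lr

theta_const, lr_const = run_sgd(adaptive=False)
theta_adapt, lr_adapt = run_sgd(adaptive=True)
print("constant lr:", lr_const, "final theta:", theta_const)
print("adaptive lr:", lr_adapt, "final theta:", theta_adapt)
```

With a constant learning rate the iterate keeps fluctuating around the minimizer because the gradient noise never shrinks. Under the assumed halving rule the step size decays once the estimated objective plateaus, which damps those fluctuations.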
Taxonomy
Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques
Methods: Adam · Stochastic Gradient Descent
