Averaged Adam accelerates stochastic optimization in the training of deep neural network approximations for partial differential equation and optimal control problems
Steffen Dereich, Arnulf Jentzen, Adrian Riekert

TL;DR
This paper demonstrates that averaged variants of the Adam optimizer significantly improve training efficiency and accuracy for deep neural networks applied to scientific computing problems like PDEs and optimal control, outperforming standard Adam and SGD.
Contribution
It introduces and evaluates averaged Adam variants inspired by Polyak-Ruppert averaging, showing their effectiveness in scientific computing and machine learning tasks.
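For reference, the block below gives the textbook form of Adam (Kingma and Ba) together with a Polyak-Ruppert average of its iterates. Here g_n is the stochastic gradient at step n, γ is the learning rate, and squares and square roots are taken componentwise. This summary does not say exactly which averaged variants the paper studies, so details such as when averaging starts, or whether an exponential moving average is used instead, are left open. The average \bar\theta_n does not feed back into the Adam recursion. It only changes which parameter vector is reported as the trained network.

```latex
\begin{aligned}
m_n &= \beta_1 m_{n-1} + (1-\beta_1)\, g_n,\\
v_n &= \beta_2 v_{n-1} + (1-\beta_2)\, g_n^{2},\\
\theta_n &= \theta_{n-1} - \gamma\,
  \frac{m_n/(1-\beta_1^{n})}{\sqrt{v_n/(1-\beta_2^{n})} + \varepsilon},\\
\bar\theta_n &= \frac{1}{n}\sum_{k=1}^{n}\theta_k
  = \bar\theta_{n-1} + \frac{1}{n}\bigl(\theta_n - \bar\theta_{n-1}\bigr).
\end{aligned}
```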
Findings
Averaged Adam outperforms standard Adam in scientific machine learning tasks.
Averaged Adam improves training stability and convergence speed.
Source code is publicly available on GitHub.
Abstract
Deep learning methods - usually consisting of a class of deep neural networks (DNNs) trained by a stochastic gradient descent (SGD) optimization method - are nowadays omnipresent in data-driven learning problems as well as in scientific computing tasks such as optimal control (OC) and partial differential equation (PDE) problems. In practically relevant learning tasks, often not the plain-vanilla standard SGD optimization method is employed to train the considered class of DNNs but instead more sophisticated adaptive and accelerated variants of the standard SGD method such as the popular Adam optimizer are used. Inspired by the classical Polyak-Ruppert averaging approach, in this work we apply averaged variants of the Adam optimizer to train DNNs to approximately solve exemplary scientific computing problems in the form of PDEs and OC problems. We test the averaged variants of Adam in a…
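The idea in the abstract can be illustrated with a minimal NumPy sketch. Plain Adam runs unchanged, and two averages of its iterates are tracked alongside it: a running arithmetic (Polyak-Ruppert) mean and an exponential moving average. This is not the authors' implementation, which the summary says is on GitHub. The names `averaged_adam`, `avg_start` (a burn-in before averaging begins) and `ema_decay` are hypothetical. The test problem is a noisy quadratic chosen for illustration, not one of the paper's PDE or optimal control benchmarks.

```python
import numpy as np


def averaged_adam(grad, theta0, n_steps, lr=1e-2, beta1=0.9, beta2=0.999,
                  eps=1e-8, avg_start=0, ema_decay=0.999):
    """Run standard Adam and track two averages of its iterates.

    Returns (last_iterate, arithmetic_mean, ema). Averaging starts after
    `avg_start` steps (a hypothetical burn-in parameter for this sketch).
    """
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)  # first-moment (mean) estimate
    v = np.zeros_like(theta)  # second-moment (uncentered variance) estimate
    mean = None               # Polyak-Ruppert arithmetic average
    ema = None                # exponential moving average of iterates
    count = 0
    for n in range(1, n_steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** n)  # bias corrections
        v_hat = v / (1 - beta2 ** n)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        if n > avg_start:
            count += 1
            if mean is None:
                mean = theta.copy()
                ema = theta.copy()
            else:
                # Incremental form of (1/count) * sum of averaged iterates
                mean += (theta - mean) / count
                ema = ema_decay * ema + (1 - ema_decay) * theta
    return theta, mean, ema


# Illustrative problem: f(theta) = 0.5 * ||theta||^2 (minimizer 0) with
# additive Gaussian gradient noise.
rng = np.random.default_rng(0)
dim = 100


def noisy_grad(th):
    return th + rng.standard_normal(th.shape)


last, mean, ema = averaged_adam(noisy_grad, np.ones(dim), n_steps=6000,
                                avg_start=1000)
print("last:", np.linalg.norm(last),
      "mean:", np.linalg.norm(mean),
      "ema:", np.linalg.norm(ema))
```

With a constant learning rate, the last Adam iterate keeps fluctuating around the minimizer because of gradient noise. Averaging the iterates after the burn-in smooths out these fluctuations without changing the optimizer itself. This is the motivation behind Polyak-Ruppert averaging.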
Taxonomy
Topics: Model Reduction and Neural Networks
Methods: Stochastic Gradient Descent · Adam
