Averaged Adam accelerates stochastic optimization in the training of deep neural network approximations for partial differential equation and optimal control problems

Steffen Dereich; Arnulf Jentzen; Adrian Riekert

arXiv:2501.06081 · math.OC · January 13, 2025

TL;DR

This paper demonstrates that averaged variants of the Adam optimizer significantly improve training efficiency and accuracy for deep neural networks applied to scientific computing problems like PDEs and optimal control, outperforming standard Adam and SGD.

Contribution

The paper introduces and evaluates averaged variants of the Adam optimizer inspired by Polyak-Ruppert averaging, and shows their effectiveness in scientific computing and machine learning tasks.
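As a rough illustration of the idea, the sketch below runs a textbook Adam update on a noisy quadratic and keeps a running arithmetic mean of the late iterates, in the spirit of Polyak-Ruppert averaging. This is an assumption-laden toy, not the authors' implementation: the function `adam_with_averaging`, the `avg_start` parameter, the test problem, and the choice of a plain arithmetic mean are all hypothetical, and the averaging schemes actually studied in the paper may differ.

```python
import numpy as np


def adam_with_averaging(grad_fn, theta0, steps=2000, lr=1e-2,
                        beta1=0.9, beta2=0.999, eps=1e-8, avg_start=1000):
    """Run Adam; return the last iterate and the mean of iterates after avg_start.

    Hypothetical helper for illustration only (not taken from the paper).
    """
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)          # first-moment estimate
    v = np.zeros_like(theta)          # second-moment estimate
    theta_avg = np.zeros_like(theta)  # running mean of late iterates
    n_avg = 0
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        if t > avg_start:
            # Incremental update of the arithmetic mean of the iterates.
            n_avg += 1
            theta_avg += (theta - theta_avg) / n_avg
    return theta, theta_avg


# Toy problem (assumed): minimize 0.5 * ||theta - target||^2 from noisy gradients.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0])


def noisy_grad(theta):
    return (theta - target) + rng.normal(scale=1.0, size=theta.shape)


last, averaged = adam_with_averaging(noisy_grad, np.zeros(2))
print("last iterate error:    ", np.linalg.norm(last - target))
print("averaged iterate error:", np.linalg.norm(averaged - target))
```

On a problem like this, the averaged iterate tends to fluctuate less around the minimizer than the last iterate, because averaging smooths out gradient noise; how much this helps in DNN training for PDE and OC problems is what the paper investigates.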

Findings

01 Averaged Adam outperforms standard Adam in scientific machine learning tasks.

02 Averaged Adam improves training stability and convergence speed.

03 Source code is publicly available on GitHub.

Abstract

Deep learning methods - usually consisting of a class of deep neural networks (DNNs) trained by a stochastic gradient descent (SGD) optimization method - are nowadays omnipresent in data-driven learning problems as well as in scientific computing tasks such as optimal control (OC) and partial differential equation (PDE) problems. In practically relevant learning tasks, often not the plain-vanilla standard SGD optimization method is employed to train the considered class of DNNs but instead more sophisticated adaptive and accelerated variants of the standard SGD method such as the popular Adam optimizer are used. Inspired by the classical Polyak-Ruppert averaging approach, in this work we apply averaged variants of the Adam optimizer to train DNNs to approximately solve exemplary scientific computing problems in the form of PDEs and OC problems. We test the averaged variants of Adam in a…

Code & Models

Repositories

deeplearningmethods/averaged-adam
PyTorch · Official

Taxonomy

Topics: Model Reduction and Neural Networks

Methods: Stochastic Gradient Descent · Adam