PADAM: Parallel averaged Adam reduces the error for stochastic optimization in scientific machine learning

Arnulf Jentzen; Julian Kranz; Adrian Riekert

arXiv:2505.22085·math.OC·May 29, 2025

TL;DR

PADAM is a parallel averaging method for ADAM that computes several averaged variants in parallel and dynamically selects the one with the smallest optimization error during training. It requires no more gradient evaluations than standard ADAM and targets scientific machine learning tasks.

Contribution

The paper introduces PADAM, a parallel averaged ADAM approach that computes different averaged variants of ADAM in parallel and, during training, dynamically selects the variant with the smallest optimization error, without additional gradient computations.

Findings

01. PADAM achieves the lowest optimization error in most tested problems.

02. PADAM is effective in scientific machine learning applications such as PDEs and control problems.

03. PADAM requires no more gradient evaluations than standard ADAM.

Abstract

Averaging techniques such as Ruppert--Polyak averaging and exponential moving averaging (EMA) are powerful approaches to accelerate stochastic gradient descent (SGD) optimization methods such as the popular ADAM optimizer. However, depending on the specific optimization problem under consideration, the type and the parameters of the averaging need to be adjusted to achieve the smallest optimization error. In this work we propose an averaging approach, which we refer to as parallel averaged ADAM (PADAM), in which we compute different averaged variants of ADAM in parallel and, during the training process, dynamically select the variant with the smallest optimization error. A central feature of this approach is that this procedure requires no more gradient evaluations than the usual ADAM optimizer as each of the averaged trajectories relies on the same underlying…
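The idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `padam_sketch`, the set of EMA decay rates, the evaluation interval `eval_every`, and the use of a user-supplied `loss_fn` as the selection criterion are all assumptions made for illustration. The sketch only shows the structural point that every averaged candidate is built from the same ADAM iterates, so each step costs a single gradient evaluation.

```python
import numpy as np

def padam_sketch(grad_fn, loss_fn, theta0, n_steps=2000, lr=1e-2,
                 beta1=0.9, beta2=0.999, eps=1e-8,
                 ema_decays=(0.9, 0.99, 0.999), eval_every=100):
    """Run Adam once and track several averaged variants of its iterates.

    Hypothetical illustration: the candidate set and the selection rule
    (smallest value of loss_fn) are assumptions, not taken from the paper.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)  # first-moment estimate
    v = np.zeros_like(theta)  # second-moment estimate

    # All candidates are derived from the same underlying Adam trajectory.
    averages = {"adam": theta.copy(), "polyak": theta.copy()}
    for d in ema_decays:
        averages[f"ema_{d}"] = theta.copy()
    best_name = "adam"

    for t in range(1, n_steps + 1):
        g = grad_fn(theta)  # the only gradient evaluation in this step
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)

        # Update the candidates without any further gradient calls.
        averages["adam"] = theta.copy()
        # Running mean over theta_0, ..., theta_t (Ruppert-Polyak style).
        averages["polyak"] += (theta - averages["polyak"]) / (t + 1)
        for d in ema_decays:
            key = f"ema_{d}"
            averages[key] = d * averages[key] + (1 - d) * theta

        # Periodically select the candidate with the smallest loss estimate.
        if t % eval_every == 0:
            losses = {k: loss_fn(a) for k, a in averages.items()}
            best_name = min(losses, key=losses.get)

    return best_name, averages[best_name], averages


# Toy usage on a noisy quadratic (all values here are illustrative).
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])
loss_fn = lambda th: 0.5 * float(np.sum((th - target) ** 2))
grad_fn = lambda th: (th - target) + 0.5 * rng.standard_normal(th.shape)

best_name, best_theta, averages = padam_sketch(grad_fn, loss_fn, np.zeros(3))
print(best_name, loss_fn(best_theta))
```

In this sketch the selection step uses extra loss evaluations but no extra gradient evaluations. How the optimization error is estimated in practice, and which averaging variants are run, would follow the paper rather than the choices made here.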

Taxonomy

Topics: Machine Learning and Data Classification

Methods: Stochastic Gradient Descent · Adam