Combining Adam and its Inverse Counterpart to Enhance Generalization of Deep Learning Optimizers

Tao Shi, Liangming Chen, Long Jin, and Mengchu Zhou

arXiv:2603.07122 · cs.LG · March 10, 2026

TL;DR

This paper introduces DualAdam, an optimizer that combines Adam with its inverse counterpart, InvAdam, so that neural network training escapes sharp minima and generalizes better. The claim is backed by theoretical analysis and extensive experiments.

Contribution

The paper proposes DualAdam, a new optimizer that integrates the update mechanisms of Adam and InvAdam to improve both convergence and generalization in deep learning.

Findings

01. DualAdam generalizes better than Adam and its variants.

02. An analysis based on diffusion theory shows that InvAdam escapes sharp minima effectively.

03. Extensive experiments validate the improved performance of DualAdam.

Abstract

In the training of neural networks, adaptive moment estimation (Adam) typically converges fast but exhibits suboptimal generalization performance. A widely accepted explanation for its defect in generalization is that it often tends to converge to sharp minima. To enhance its ability to find flat minima, we propose its new variant named inverse Adam (InvAdam). The key improvement of InvAdam lies in its parameter update mechanism, which is opposite to that of Adam. Specifically, it computes element-wise multiplication of the first-order and second-order moments, while Adam computes the element-wise division of these two moments. This modification aims to increase the step size of the parameter update when the elements in the second-order moments are large and vice versa, which helps the parameter escape sharp minima and stay at flat ones. However, InvAdam's update mechanism may face…
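The contrast between the two update rules described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the excerpt does not give the exact formulas, so the square root applied to the second moment in the InvAdam step, the omission of an epsilon term there, and all function names and default hyperparameters are assumptions. How DualAdam combines the two updates is not stated in the excerpt, so it is not attempted here.

```python
import math


def update_moments(m, v, g, t, beta1=0.9, beta2=0.999):
    """Update Adam-style moment estimates for gradient g at step t (t >= 1).

    Returns the raw moments plus their bias-corrected versions.
    """
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    return m, v, m_hat, v_hat


def adam_step(m_hat, v_hat, lr=1e-3, eps=1e-8):
    """Standard Adam: element-wise DIVISION of the first moment by sqrt(second moment)."""
    return [-lr * mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]


def inv_adam_step(m_hat, v_hat, lr=1e-3):
    """Hypothetical InvAdam-style step: element-wise MULTIPLICATION of the moments.

    ASSUMPTION: sqrt(v_hat) is used as the multiplier to mirror Adam's
    denominator; the paper's exact form may differ.
    """
    return [-lr * mh * math.sqrt(vh) for mh, vh in zip(m_hat, v_hat)]


# One step from zero-initialized moments with a small and a large gradient entry.
grad = [0.1, 10.0]
_, _, m_hat, v_hat = update_moments([0.0, 0.0], [0.0, 0.0], grad, t=1)
print("Adam step:   ", adam_step(m_hat, v_hat))
print("InvAdam step:", inv_adam_step(m_hat, v_hat))
```

In this toy step, the division in Adam normalizes both coordinates to nearly the same step magnitude, while the multiplicative rule takes a much larger step on the coordinate with the large second moment, which matches the behavior the abstract describes.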

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare