HomeAdam: Adam and AdamW Algorithms Sometimes Go Home to Obtain Better Provable Generalization
Feihu Huang, Guanyi Zhang, Songcan Chen

TL;DR
This paper analyzes the generalization properties of Adam and AdamW optimizers, proposing a new variant called HomeAdam(W) that achieves better theoretical generalization bounds and convergence rates, supported by numerical experiments.
Contribution
The paper provides the first theoretical analysis of Adam(W)'s generalization error and introduces HomeAdam(W), a novel optimizer with improved generalization and convergence guarantees.
Findings
HomeAdam(W) achieves $O(1/N)$ generalization error, better than Adam(W)-srf and existing Adam variants.
HomeAdam(W) has a faster convergence rate of $O(1/T^{1/4})$ compared to Adam(W)-srf.
Numerical experiments confirm the efficiency of HomeAdam(W) in practice.
Abstract
Adam and AdamW are default optimizers for training deep learning models. These adaptive algorithms converge faster but generalize worse than SGD. In fact, their proved generalization error bound is also larger than that of SGD in terms of the training sample size $N$. Although some variants of Adam have recently been proposed to improve its generalization, their improved generalization remains unexplored in theory. To fill this gap, in this paper we revisit the generalization of Adam and AdamW via algorithmic stability, and first prove a generalization error bound for Adam and AdamW without square-root (i.e., Adam(W)-srf) that depends on the iteration number $T$ and on the smallest element of the second-order momentum plus a small positive number. To improve generalization, we…
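To make the "without square-root" variant named in the abstract concrete, the following is a minimal sketch of a single Adam step in which the usual denominator, the square root of the second-moment estimate plus a small constant, can be swapped for the second-moment estimate plus a small constant. It is an illustration under stated assumptions, not the paper's algorithm: the function name `adam_step`, the `sqrt_free` flag, the use of bias correction, and the default hyperparameters are all assumptions, and HomeAdam(W) itself is not shown because its update rule is not given in the text above.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, sqrt_free=False):
    """One Adam step on plain lists of floats (hypothetical helper).

    With sqrt_free=False this is the standard Adam update.
    With sqrt_free=True the denominator is v_hat + eps instead of
    sqrt(v_hat) + eps, which is our assumed reading of "Adam-srf".
    Returns the updated (params, m, v) as new lists.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and its square.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction (assumed here; t counts steps starting at 1).
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # The only difference between the two variants is this denominator.
        denom = (v_hat + eps) if sqrt_free else (math.sqrt(v_hat) + eps)
        new_params.append(p - lr * m_hat / denom)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# First step from zero-initialized moments with a single gradient of 2.0.
p_std, _, _ = adam_step([1.0], [2.0], [0.0], [0.0], t=1, lr=0.1)
p_srf, _, _ = adam_step([1.0], [2.0], [0.0], [0.0], t=1, lr=0.1,
                        sqrt_free=True)
```

In this sketch the square-root-free step scales like the gradient divided by its squared magnitude, so large gradients shrink the step more than in standard Adam. The denominator `v_hat + eps` plays the role of the "second-order momentum plus a small positive number" mentioned in the abstract, though whether the paper applies bias correction there is an assumption.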
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Quantum Computing Algorithms and Architecture
