Lipschitzness Effect of a Loss Function on Generalization Performance of Deep Neural Networks Trained by Adam and AdamW Optimizers
Mohammad Lashkari, Amin Gheibi

TL;DR
This paper proves that the Lipschitz constant of the loss function affects the generalization error of deep neural networks trained with Adam or AdamW, and checks the theoretical bound experimentally on human age estimation.
Contribution
It offers a theoretical proof linking the Lipschitz constant of a loss function to generalization error reduction for Adam-based training, and validates this with human age estimation experiments.
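For readers unfamiliar with the term, the standard definition of a Lipschitz loss can be written as follows. This is the textbook form, stated with respect to the prediction argument; the notation ($\ell$, $\hat{y}$, $y$, $L$) is assumed here for illustration and may differ from the paper's own.

```latex
% A loss \ell is L-Lipschitz in its prediction argument if, for every
% target y and every pair of predictions \hat{y}_1, \hat{y}_2,
\[
  \bigl|\ell(\hat{y}_1, y) - \ell(\hat{y}_2, y)\bigr|
  \;\le\; L \,\bigl|\hat{y}_1 - \hat{y}_2\bigr| .
\]
% When \ell is differentiable in \hat{y} on an interval, this is
% equivalent to a bound on the derivative:
\[
  \sup_{\hat{y},\, y}\;
  \Bigl|\frac{\partial \ell}{\partial \hat{y}}(\hat{y}, y)\Bigr|
  \;\le\; L .
\]
```

The smallest such $L$ is the Lipschitz constant of the loss.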
Findings
Loss functions with a lower Lipschitz constant improve generalization for models trained with Adam or AdamW.
Loss functions with a smaller maximum value also improve generalization.
The theoretical bound is consistent with the empirical results on human age estimation, a computer vision task.
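To make the two quantities in the findings concrete, here is a minimal sketch that numerically estimates the Lipschitz constant and the maximum value of two common regression losses over a bounded prediction range. The losses (absolute and squared error), the range [0, 100], the target value 30, and the helper name `empirical_lipschitz` are all assumptions chosen for illustration; the summary does not state which loss functions the paper compares.

```python
import numpy as np


def empirical_lipschitz(loss, y_true, lo, hi, n=10001):
    """Estimate the Lipschitz constant of `loss` in its prediction
    argument on [lo, hi] as the largest finite-difference slope."""
    preds = np.linspace(lo, hi, n)
    values = loss(preds, y_true)
    slopes = np.abs(np.diff(values) / np.diff(preds))
    return slopes.max()


def absolute_error(y_pred, y_true):
    return np.abs(y_pred - y_true)


def squared_error(y_pred, y_true):
    return (y_pred - y_true) ** 2


# Hypothetical setting: predictions in [0, 100], one target value of 30.
LO, HI, Y_TRUE = 0.0, 100.0, 30.0
grid = np.linspace(LO, HI, 10001)

lip_abs = empirical_lipschitz(absolute_error, Y_TRUE, LO, HI)
lip_sq = empirical_lipschitz(squared_error, Y_TRUE, LO, HI)
max_abs = absolute_error(grid, Y_TRUE).max()
max_sq = squared_error(grid, Y_TRUE).max()

print(f"absolute error: Lipschitz ~ {lip_abs:.2f}, max = {max_abs:.1f}")
print(f"squared error:  Lipschitz ~ {lip_sq:.2f}, max = {max_sq:.1f}")
```

In this toy setting the absolute error has a slope of about 1 everywhere, while the squared error's slope grows with the distance from the target and approaches 140 at the far end of the range; its maximum value is likewise much larger (4900 versus 70). This only illustrates what "lower Lipschitz constant and maximum value" means for a loss; it says nothing by itself about generalization.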
Abstract
The generalization performance of deep neural networks with regard to the optimization algorithm is one of the major concerns in machine learning. This performance can be affected by various factors. In this paper, we theoretically prove that the Lipschitz constant of a loss function is an important factor to diminish the generalization error of the output model obtained by Adam or AdamW. The results can be used as a guideline for choosing the loss function when the optimization algorithm is Adam or AdamW. In addition, to evaluate the theoretical bound in a practical setting, we choose the human age estimation problem in computer vision. For assessing the generalization better, the training and test datasets are drawn from different distributions. Our experimental evaluation shows that the loss function with a lower Lipschitz constant and maximum value improves the generalization of the…
Taxonomy
Topics: Machine Learning and Data Classification · Face and Expression Recognition · Advanced Neural Network Applications
Methods: Test · Adam · AdamW
