Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods

Avery Ma; Yangchen Pan; Amir-massoud Farahmand

arXiv:2308.06703 · cs.LG · November 30, 2023 · 2 citations

TL;DR

This paper investigates why stochastic gradient descent (SGD) yields more robust models than adaptive methods like Adam, revealing that SGD-trained models have smaller Lipschitz constants and are less sensitive to irrelevant frequencies in data.
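
One way to probe sensitivity to a frequency band is to remove that band from each input and compare a model's accuracy on original versus band-removed data. The sketch below does this for a single grayscale image with SciPy's orthonormal 2-D DCT. The helper `remove_frequency_band` and the band definition (coefficients whose index sum i + j falls in a range) are hypothetical choices for illustration, not necessarily the authors' protocol.

```python
import numpy as np
from scipy.fft import dctn, idctn

def remove_frequency_band(image, low, high):
    """Zero the 2-D DCT coefficients whose index sum i + j lies in [low, high).

    Hypothetical helper: the i + j band definition is an illustrative assumption.
    Returns the filtered image and the boolean mask of removed coefficients.
    """
    coeffs = dctn(image, type=2, norm="ortho")
    i, j = np.meshgrid(np.arange(image.shape[0]), np.arange(image.shape[1]), indexing="ij")
    band = (i + j >= low) & (i + j < high)
    coeffs[band] = 0.0
    # With norm="ortho", idctn exactly inverts dctn, so only the band changes.
    return idctn(coeffs, type=2, norm="ortho"), band

rng = np.random.default_rng(0)
img = rng.random((8, 8))
filtered, band = remove_frequency_band(img, low=10, high=15)
```

For batched color images, the same idea applies with `axes=(-2, -1)` passed to `dctn` and `idctn` so that only the spatial dimensions are transformed.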

Contribution

The study combines experiments on natural datasets with an analysis of the learning dynamics of gradient descent (GD) and sign gradient descent (signGD) on a synthetic dataset, linking the robustness gap between SGD and adaptive methods to weight norms and Lipschitz constants.
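
The Lipschitz constant bounds how much a model's output can change for a given input perturbation. For a linear model f(x) = w^T x it is exactly ||w||_2; for a ReLU network, the product of per-layer spectral norms gives a standard (often loose) upper bound. The sketch below assumes a bias-free ReLU MLP and uses this crude bound; the paper may estimate Lipschitz constants differently.

```python
import numpy as np

def relu_mlp(weights, x):
    """Bias-free ReLU MLP: ReLU after every layer except the last (linear) one."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)
    return weights[-1] @ h

def lipschitz_upper_bound(weights):
    """Product of per-layer spectral norms (largest singular values).

    A valid l2 Lipschitz bound because ReLU is 1-Lipschitz; exact for a single
    linear layer, usually loose for deeper networks.
    """
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))

rng = np.random.default_rng(0)
weights = [rng.normal(size=(16, 8)), rng.normal(size=(16, 16)), rng.normal(size=(1, 16))]
L = lipschitz_upper_bound(weights)

# Empirical sanity check: the largest observed ratio ||f(x2) - f(x)|| / ||x2 - x||
# over random input pairs can never exceed the bound L.
max_ratio = 0.0
for _ in range(200):
    x, x2 = rng.normal(size=8), rng.normal(size=8)
    ratio = np.linalg.norm(relu_mlp(weights, x2) - relu_mlp(weights, x)) / np.linalg.norm(x2 - x)
    max_ratio = max(max_ratio, ratio)
```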

Findings

1. SGD-trained models exhibit greater robustness to input perturbations.
2. Models trained with adaptive methods are sensitive to irrelevant frequencies.
3. Smaller weight norms correlate with improved robustness in linear models.
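
To see why the weight norm matters for linear models, consider an overparameterized least-squares problem with many interpolating solutions. All of them fit the training data exactly, but the worst-case output change under an l2 perturbation of size eps is eps * ||w||, so the minimum-norm solution is the most robust. Adding a null-space component leaves training predictions unchanged while making the model respond to an input direction that never appears in the data. The data sizes and the scale 3.0 below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20                      # overparameterized: fewer samples than features
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Minimum-norm interpolating solution (lies in the row space of X).
w_min = np.linalg.pinv(X) @ y

# Another interpolating solution: add a component from the null space of X.
_, _, Vt = np.linalg.svd(X)
null_dir = Vt[-1]                 # unit vector with X @ null_dir ~ 0
w_big = w_min + 3.0 * null_dir

def worst_case_change(w, eps):
    """Largest |w.(x + delta) - w.x| over ||delta||_2 <= eps.

    Attained at delta = eps * w / ||w||, so it equals eps * ||w||.
    """
    return eps * np.linalg.norm(w)
```

Both `w_min` and `w_big` reproduce `y` on the training inputs, yet a perturbation along `null_dir` leaves the output of `w_min` unchanged and shifts the output of `w_big` by 3 per unit of perturbation.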

Abstract

Stochastic gradient descent (SGD) and adaptive gradient methods, such as Adam and RMSProp, have been widely used in training deep neural networks. We empirically show that while the difference between the standard generalization performance of models trained using these methods is small, those trained using SGD exhibit far greater robustness under input perturbations. Notably, our investigation demonstrates the presence of irrelevant frequencies in natural datasets, where alterations do not affect models' generalization performance. However, models trained with adaptive methods show sensitivity to these changes, suggesting that their use of irrelevant frequencies can lead to solutions sensitive to perturbations. To better understand this difference, we study the learning dynamics of gradient descent (GD) and sign gradient descent (signGD) on a synthetic dataset that mirrors natural…
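
A minimal illustration of the GD versus signGD contrast, on a much simpler toy problem than the paper's synthetic dataset: a linear model trained from zero on a single 2-D sample. GD updates are multiples of the data vector, so GD stays in the row space of X and reaches the minimum-norm solution. signGD moves along sign vectors, which need not lie in that span, so it picks up a component along a direction absent from the data. The 1/sqrt(t) step-size decay is an assumption added so that signGD settles instead of oscillating at a fixed amplitude.

```python
import numpy as np

def train(X, y, update, steps=2000, lr=0.1):
    """Full-batch training of a linear model on 0.5 * ||Xw - y||^2 from w = 0.

    `update` maps the gradient to a step direction: identity for GD,
    np.sign for signGD. The step size decays as lr / sqrt(t).
    """
    w = np.zeros(X.shape[1])
    for t in range(1, steps + 1):
        grad = X.T @ (X @ w - y)
        w -= lr / np.sqrt(t) * update(grad)
    return w

# Toy data: one sample in 2-D, so the null space of X is a single direction.
X = np.array([[2.0, 1.0]])
y = np.array([1.0])
null_dir = np.array([1.0, -2.0]) / np.sqrt(5.0)   # X @ null_dir == 0

w_gd = train(X, y, update=lambda g: g)   # converges to (0.4, 0.2), the min-norm fit
w_sign = train(X, y, update=np.sign)     # settles near (1/3, 1/3)
```

Both models nearly fit the training point, but only `w_sign` has a nonzero component along `null_dir`, so only its output changes when the input is perturbed in that "irrelevant" direction, and its weight norm is larger.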

Code & Models

Repositories

averyma/opt-robust (PyTorch, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Adversarial Robustness in Machine Learning

Methods: Adam · RMSProp · Stochastic Gradient Descent