How do SGD hyperparameters in natural training affect adversarial robustness?
Sandesh Kamath, Amit Deshpande, K V Subrahmanyam

TL;DR
This paper investigates how the SGD hyperparameters learning rate, batch size, and momentum influence both the accuracy and the adversarial robustness of neural networks. It finds that training with a constant learning rate to batch size ratio keeps robustness stable across batch sizes.
Contribution
The paper provides empirical insight into how SGD hyperparameters affect adversarial robustness, highlighting the roles of momentum and of the learning rate to batch size ratio.
Findings
Constant learning rate to batch size ratio maintains robustness across batch sizes.
Momentum enhances robustness more effectively with varying batch sizes.
Robustness remains stable when training with fixed learning rate to batch size ratio.
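The constant-ratio setting in the findings above can be sketched as a linear scaling rule: when the batch size changes, the learning rate changes by the same factor so that their ratio stays fixed. This is only a minimal illustration. The function name `lr_for_batch_size` and the reference values `BASE_LR` and `BASE_BATCH_SIZE` are hypothetical and are not taken from the paper.

```python
def lr_for_batch_size(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Scale the learning rate linearly so that lr / batch_size stays constant."""
    if base_batch_size <= 0 or batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * batch_size / base_batch_size


# Hypothetical reference setting (an assumption, not a value from the paper).
BASE_LR = 0.1
BASE_BATCH_SIZE = 128

# One learning rate per batch size, all sharing the same lr / batch_size ratio.
configs = {
    bs: lr_for_batch_size(BASE_LR, BASE_BATCH_SIZE, bs)
    for bs in (64, 128, 256, 512, 1024)
}

for bs, lr in configs.items():
    print(f"batch_size={bs:5d}  lr={lr:.4f}  lr/batch_size={lr / bs:.6f}")
```

Each resulting pair could then be passed to an SGD optimizer of choice; the alternative setting mentioned in the findings would instead hold the learning rate fixed while the batch size varies.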
Abstract
Learning rate, batch size and momentum are three important hyperparameters in the SGD algorithm. It is known from the work of Jastrzebski et al. arXiv:1711.04623 that large batch size training of neural networks yields models which do not generalize well. Yao et al. arXiv:1802.08241 observe that large batch training yields models that have poor adversarial robustness. In the same paper, the authors train models with different batch sizes and compute the eigenvalues of the Hessian of the loss function. They observe that as the batch size increases, the dominant eigenvalues of the Hessian become larger. They also show that both adversarial training and small-batch training lead to a drop in the dominant eigenvalues of the Hessian, that is, a lowering of its spectrum. They combine adversarial training and second-order information to come up with a new large-batch training algorithm and obtain robust…
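To make "dominant eigenvalue of the Hessian" concrete, here is a minimal sketch using power iteration, a standard technique for this quantity. It is not claimed to be the procedure used in either cited paper. The matrix `TOY_HESSIAN` is a hypothetical stand-in: it is the Hessian of the toy quadratic loss 0.5 * w^T H w, whereas for a real network one would typically use Hessian-vector products instead of an explicit matrix.

```python
import math


def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def dominant_eigenvalue(matrix, num_iters=200):
    """Estimate the largest-magnitude eigenvalue of a symmetric matrix."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # unit-norm starting vector
    eigenvalue = 0.0
    for _ in range(num_iters):
        w = matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # v lies in the null space; nothing to amplify
        v = [x / norm for x in w]
        # Rayleigh quotient v^T H v gives the signed eigenvalue estimate.
        eigenvalue = sum(x * y for x, y in zip(v, matvec(matrix, v)))
    return eigenvalue


# Hypothetical 2x2 Hessian of a toy quadratic loss (illustration only).
TOY_HESSIAN = [[4.0, 1.0], [1.0, 3.0]]

top = dominant_eigenvalue(TOY_HESSIAN)
print(f"dominant eigenvalue estimate: {top:.6f}")
```

For this toy matrix the exact eigenvalues are (7 ± √5)/2, so the estimate can be checked against the larger of the two.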
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Integrated Circuits and Semiconductor Failure Analysis
Methods: Stochastic Gradient Descent
