Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Zhewei Yao, Amir Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney

TL;DR
This paper investigates how large batch training affects the loss landscape of neural networks using Hessian analysis, revealing that larger Hessian spectra correlate with poorer robustness and that robust training favors flatter minima.
Contribution
The paper provides a Hessian-based analysis of large batch training, showing that saddle points are not the cause of the generalization gap and linking robustness to adversarial perturbations to the flatness of minima.
Findings
Large batch training converges to points with a noticeably larger Hessian spectrum, that is, sharper curvature.
Robust training favors flatter minima, where the Hessian spectrum is smaller.
The inner loop of robust training is a saddle-free optimization problem almost everywhere.
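For readers unfamiliar with the term, the "inner loop" in the last finding is the maximization inside the usual min-max formulation of robust training. The sketch below uses assumed notation that does not appear in this summary: \theta for the network weights, \delta for the input perturbation, \epsilon for the perturbation budget, and L for the loss.

```latex
\min_{\theta} \; \mathbb{E}_{(x, y)} \Big[ \max_{\|\delta\| \le \epsilon} \; L(\theta;\, x + \delta,\, y) \Big]
```

The inner maximization over \delta searches for the worst-case perturbation of each input, and it is this subproblem that the finding describes as saddle-free. The outer minimization over \theta is the ordinary training step.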
Abstract
Large batch size training of Neural Networks has been shown to incur accuracy loss when trained with the current methods. The exact underlying reasons for this are still not completely understood. Here, we study large batch size training through the lens of the Hessian operator and robust optimization. In particular, we perform a Hessian based study to analyze exactly how the landscape of the loss function changes when training with large batch size. We compute the true Hessian spectrum, without approximation, by back-propagating the second derivative. Extensive experiments on multiple networks show that saddle-points are not the cause for generalization gap of large batch size training, and the results consistently show that large batch converges to points with noticeably higher Hessian spectrum. Furthermore, we show that robust training allows one to favor flat areas, as points with…
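The abstract describes measuring the Hessian spectrum by back-propagating second derivatives. One common way to get the top eigenvalue without ever forming the Hessian is power iteration on Hessian-vector products. The following is a minimal sketch of that idea, not the authors' code. The function name `top_hessian_eigenvalue`, the `hvp` callback, and the toy quadratic loss are all assumptions made for illustration. In a real network the callback would come from automatic differentiation.

```python
import math
import random


def top_hessian_eigenvalue(hvp, dim, iters=100, tol=1e-10, seed=0):
    """Estimate the largest-magnitude Hessian eigenvalue by power iteration.

    hvp(v) is a hypothetical callback returning the Hessian-vector
    product H v as a list of floats; the Hessian is never materialized.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    eig = 0.0
    for _ in range(iters):
        hv = hvp(v)
        # Rayleigh quotient v^T H v for the current unit vector v.
        new_eig = sum(a * b for a, b in zip(v, hv))
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            return 0.0  # H v vanished, so no curvature along v
        v = [x / norm for x in hv]
        converged = abs(new_eig - eig) < tol
        eig = new_eig
        if converged:
            break
    return eig


# Toy loss f(w) = 0.5 * w^T A w, whose Hessian is exactly A (assumed example).
A = [[4.0, 1.0], [1.0, 3.0]]


def quadratic_hvp(v):
    # Exact Hessian-vector product for the toy quadratic loss.
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


lam = top_hessian_eigenvalue(quadratic_hvp, dim=2)
print(lam)
```

Power iteration converges to the eigenvalue of largest magnitude, which may be negative if the point is not a minimum. Comparing this value for models trained with small and large batches is one way to quantify the "sharper" versus "flatter" distinction the summary draws.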
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
