Hessian-based Analysis of Large Batch Training and Robustness to Adversaries

Zhewei Yao; Amir Gholami; Qi Lei; Kurt Keutzer; Michael W. Mahoney

arXiv:1802.08241 · cs.CV · April 21, 2021 · 65 cites

TL;DR

This paper uses Hessian analysis to investigate how large batch training changes the loss landscape of neural networks. It finds that points with a larger Hessian spectrum tend to be less robust to adversarial perturbation, and that robust training favors flatter minima.

Contribution

The paper provides a Hessian-based analysis of the effects of large batch training, shows that saddle points are not the cause of the generalization gap in large batch training, and links robustness to the flatness of minima.

Findings

01

Large batch training converges to points with higher Hessian spectrum.

02

Robust training favors flatter minima with lower Hessian spectrum.

03

The inner loop of robust training is nearly saddle-free almost everywhere.
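
For orientation, robust training is commonly written as a min-max problem, and the "inner loop" in the third finding presumably refers to the inner maximization over the perturbation. The formulation below is the standard one from the robust optimization literature, not one quoted from this page; the symbols $\theta$ (model parameters), $\delta$ (input perturbation), $\epsilon$ (perturbation budget), and $L$ (loss) are notation assumed here for illustration.

```latex
\min_{\theta} \; \mathbb{E}_{(x, y)} \Big[ \max_{\|\delta\| \le \epsilon} L(\theta;\, x + \delta,\, y) \Big]
```

Under this reading, "flatness" in the first two findings would be measured by the eigenvalues of the Hessian of $L$ with respect to $\theta$, while the saddle-free claim concerns the maximization over $\delta$.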

Abstract

Large batch size training of Neural Networks has been shown to incur accuracy loss when trained with the current methods. The exact underlying reasons for this are still not completely understood. Here, we study large batch size training through the lens of the Hessian operator and robust optimization. In particular, we perform a Hessian based study to analyze exactly how the landscape of the loss function changes when training with large batch size. We compute the true Hessian spectrum, without approximation, by back-propagating the second derivative. Extensive experiments on multiple networks show that saddle-points are not the cause for generalization gap of large batch size training, and the results consistently show that large batch converges to points with noticeably higher Hessian spectrum. Furthermore, we show that robust training allows one to favor flat areas, as points with…
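
The abstract describes computing the Hessian spectrum by back-propagating the second derivative. As a rough illustration of how a top Hessian eigenvalue can be estimated without ever forming the Hessian, here is a minimal sketch using power iteration on Hessian-vector products. It is not the authors' code: the function names are hypothetical, and a central finite difference of the gradient stands in for the exact second-order back-propagation the abstract mentions, so that the example needs only the Python standard library.

```python
import math
import random


def hessian_vector_product(grad_fn, w, v, eps=1e-4):
    """Approximate H(w) v with a central difference of the gradient.

    Assumption: this finite difference replaces exact second-order
    back-propagation, which would need an autodiff framework.
    """
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus = grad_fn(w_plus)
    g_minus = grad_fn(w_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]


def top_hessian_eigenvalue(grad_fn, w, iters=200, seed=0):
    """Power iteration for the largest-magnitude Hessian eigenvalue at w."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in w]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eigenvalue = 0.0
    for _ in range(iters):
        hv = hessian_vector_product(grad_fn, w, v)
        # Rayleigh quotient v^T H v for the current unit vector v.
        eigenvalue = sum(a * b for a, b in zip(v, hv))
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            break
        v = [x / norm for x in hv]
    return eigenvalue


# Toy loss 0.5 * w^T A w, whose Hessian is exactly A (hypothetical example).
A = [[3.0, 1.0], [1.0, 2.0]]


def toy_grad(w):
    """Gradient of the toy quadratic loss: A w."""
    return [sum(a * x for a, x in zip(row, w)) for row in A]


lam = top_hessian_eigenvalue(toy_grad, [0.5, -0.25])
print(lam)
```

For this toy quadratic the Hessian is the constant matrix `A`, so the estimate can be checked against its largest eigenvalue, (5 + √5) / 2. For a real network, `grad_fn` would be replaced by a gradient computed over the training data, and the Hessian-vector product would normally come from an autodiff framework rather than a finite difference.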

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning