Minibatching Offers Improved Generalization Performance for Second Order Optimizers

Eric Silk; Swarnita Chakraborty; Nairanjana Dasgupta; Anand D. Sarwate; Andrew Lumsdaine; Tony Chiang

arXiv:2307.11684 · cs.LG · July 24, 2023

TL;DR

This paper empirically studies how minibatching affects the generalization performance of second-order optimizers in deep neural network training. Batch size has a statistically significant effect on peak accuracy, full-batch training largely performs worst, and second-order optimizers show significantly lower variance at specific batch sizes.

Contribution

The paper presents an empirical analysis of batch-size effects on second-order optimizers, treating performance as a response variable across repeated training runs of the same model, and highlights their potential for training with less hyperparameter tuning.

Findings

1. Smaller batch sizes lead to higher peak accuracy.
2. Second-order optimizers show lower variance at certain batch sizes.
3. Full-batch training performs the worst in terms of accuracy.

Abstract

Training deep neural networks (DNNs) used in modern machine learning is computationally expensive. Machine learning scientists, therefore, rely on stochastic first-order methods for training, coupled with significant hand-tuning, to obtain good performance. To better understand performance variability of different stochastic algorithms, including second-order methods, we conduct an empirical study that treats performance as a response variable across multiple training sessions of the same model. Using 2-factor Analysis of Variance (ANOVA) with interactions, we show that batch size used during training has a statistically significant effect on the peak accuracy of the methods, and that full batch largely performed the worst. In addition, we found that second-order optimizers (SOOs) generally exhibited significantly lower variance at specific batch sizes, suggesting they may require less…
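The 2-factor ANOVA with interactions mentioned in the abstract can be illustrated with a minimal sketch for a balanced design. This is not the authors' analysis code. The function name `two_way_anova`, the factor labels, and every accuracy value below are hypothetical and invented for illustration; factor A stands in for the optimizer type and factor B for the batch size. The sketch computes sums of squares and F statistics only. Turning the F statistics into p-values would additionally require the F distribution, which is left out here.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """Balanced two-factor ANOVA with interaction (illustrative sketch).

    cells maps (level_a, level_b) -> list of replicate responses. Every cell
    must hold the same number of replicates (at least two), and each factor
    needs at least two levels.
    """
    a_levels = sorted({key[0] for key in cells})
    b_levels = sorted({key[1] for key in cells})
    pairs = list(product(a_levels, b_levels))
    n = len(cells[pairs[0]])
    if n < 2 or any(len(cells[p]) != n for p in pairs):
        raise ValueError("need a balanced design with >= 2 replicates per cell")
    a, b = len(a_levels), len(b_levels)

    # Grand mean, marginal means per factor level, and per-cell means.
    grand = mean(y for p in pairs for y in cells[p])
    mean_a = {i: mean(y for j in b_levels for y in cells[(i, j)]) for i in a_levels}
    mean_b = {j: mean(y for i in a_levels for y in cells[(i, j)]) for j in b_levels}
    mean_ab = {p: mean(cells[p]) for p in pairs}

    # Sums of squares for main effects, interaction, and within-cell error.
    ss = {
        "A": b * n * sum((mean_a[i] - grand) ** 2 for i in a_levels),
        "B": a * n * sum((mean_b[j] - grand) ** 2 for j in b_levels),
        "AxB": n * sum(
            (mean_ab[(i, j)] - mean_a[i] - mean_b[j] + grand) ** 2 for i, j in pairs
        ),
        "error": sum((y - mean_ab[p]) ** 2 for p in pairs for y in cells[p]),
    }
    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1), "error": a * b * (n - 1)}
    ms_error = ss["error"] / df["error"]
    f_stat = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AxB")}
    return {"ss": ss, "df": df, "F": f_stat}


# Hypothetical peak accuracies: (optimizer type, batch size) -> three runs each.
cells = {
    ("first_order", 32): [0.91, 0.92, 0.90],
    ("first_order", 512): [0.85, 0.88, 0.83],
    ("second_order", 32): [0.93, 0.93, 0.92],
    ("second_order", 512): [0.80, 0.81, 0.79],
}
result = two_way_anova(cells)
for effect, f_value in result["F"].items():
    print(effect, "df =", result["df"][effect], "F =", round(f_value, 2))
```

Each F statistic compares the mean square of an effect against the within-cell mean square, so the repeated training runs per (optimizer, batch size) cell are what make the test possible. In practice, a statistics package would typically be used to obtain p-values and to handle unbalanced designs.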

Taxonomy

Topics: Neural Networks and Applications · Advanced Multi-Objective Optimization Algorithms · Gaussian Processes and Bayesian Inference