Large batch size training of neural networks with adversarial training and second-order information

Zhewei Yao; Amir Gholami; Daiyaan Arfeen; Richard Liaw; Joseph Gonzalez; Kurt Keutzer; Michael Mahoney

arXiv:1810.01021·cs.LG·January 6, 2020·35 cites

TL;DR

This paper introduces an efficient adaptive batch size training framework with autoscaling and second-order methods, improving training speed and accuracy for neural networks across multiple datasets.

Contribution

It presents a novel elastic scaling approach with negligible overhead and a new adaptive batch size scheme leveraging second-order and adversarial training methods.

Findings

01. Achieves up to 1% higher accuracy
02. Reduces number of SGD iterations by up to 5x
03. Demonstrates effectiveness across multiple datasets and architectures

Abstract

The most straightforward method to accelerate Stochastic Gradient Descent (SGD) computation is to distribute the randomly selected batch of inputs over multiple processors. To keep the distributed processors fully utilized requires commensurately growing the batch size. However, large batch training often leads to poorer generalization. A recently proposed solution for this problem is to use adaptive batch sizes in SGD. In this case, one starts with a small number of processes and scales the processes as training progresses. Two major challenges with this approach are (i) that dynamically resizing the cluster can add non-trivial overhead, in part since it is currently not supported, and (ii) that the overall speed up is limited by the initial phase with smaller batches. In this work, we address both challenges by developing a new adaptive batch size framework, with autoscaling based on…
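The adaptive batch size idea described in the abstract (start with small batches, then grow them as training progresses) can be illustrated with a minimal sketch. The growth rule below, which doubles the batch size every few epochs up to a cap, is a hypothetical placeholder chosen for illustration; it is not the paper's criterion, which relies on second-order information. The function names `batch_size_schedule` and `total_iterations`, and all numeric values, are assumptions.

```python
import math


def batch_size_schedule(num_epochs, initial_batch, max_batch, grow_every, factor=2):
    """Return the batch size used in each epoch.

    Hypothetical rule (an assumption, not the paper's method): multiply the
    batch size by `factor` every `grow_every` epochs, capped at `max_batch`.
    """
    sizes = []
    batch = initial_batch
    for epoch in range(num_epochs):
        # Grow at epoch boundaries that are multiples of `grow_every`,
        # except at the very start of training.
        if epoch > 0 and epoch % grow_every == 0:
            batch = min(batch * factor, max_batch)
        sizes.append(batch)
    return sizes


def total_iterations(dataset_size, batch_sizes):
    """Count SGD iterations: one per (possibly partial) batch in each epoch."""
    return sum(math.ceil(dataset_size / b) for b in batch_sizes)


# Illustrative values only: a 50,000-sample dataset trained for 12 epochs.
dataset_size = 50_000
epochs = 12
fixed = [128] * epochs
adaptive = batch_size_schedule(epochs, initial_batch=128, max_batch=1024, grow_every=3)

print("adaptive schedule:", adaptive)
print("fixed iterations:", total_iterations(dataset_size, fixed))
print("adaptive iterations:", total_iterations(dataset_size, adaptive))
```

With these illustrative values, the fixed schedule takes 4692 iterations and the adaptive one 2202. The early small-batch epochs account for most of the remaining iterations, which matches the abstract's point that the overall speedup is limited by the initial phase with smaller batches.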

Code & Models

Repositories

amirgholami/hessianflow (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning

Methods: Stochastic Gradient Descent