Parameter Re-Initialization through Cyclical Batch Size Schedules

Norman Mu; Zhewei Yao; Amir Gholami; Kurt Keutzer; Michael Mahoney

arXiv:1812.01216·cs.LG·April 21, 2021·5 cites

Open Access

TL;DR

This paper introduces a cyclical batch size schedule for neural networks that re-initializes weights during training, improving performance, reducing training time, and enabling ensemble and adversarial training techniques.

Contribution

It proposes a novel cyclical batch size schedule based on Bayesian principles for weight re-initialization during training.

Findings

01. Improves language modeling perplexity by up to 7.91
02. Reduces training iterations by up to 61%
03. Enables snapshot ensembling and adversarial training

Abstract

Optimal parameter initialization remains a crucial problem for neural network training. A poor weight initialization may take longer to train and/or converge to sub-optimal solutions. Here, we propose a method of weight re-initialization by repeated annealing and injection of noise in the training process. We implement this through a cyclical batch size schedule motivated by a Bayesian perspective of neural network training. We evaluate our methods through extensive experiments on tasks in language modeling, natural language inference, and image classification. We demonstrate the ability of our method to improve language modeling performance by up to 7.91 perplexity and reduce training iterations by up to 61%, in addition to its flexibility in enabling snapshot ensembling and use with adversarial training.
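
The abstract does not specify the shape of the schedule, so the following is only a minimal sketch under stated assumptions: a hypothetical triangular cycle in which the batch size ramps linearly from a small value to a large one and back. The function name `cyclical_batch_size` and the parameters `min_bs`, `max_bs`, and `cycle_len` are illustrative and not taken from the paper. Smaller batches give noisier gradient estimates, so in this sketch the low points of each cycle play the role of noise injection and the high points the role of annealing.

```python
def cyclical_batch_size(step, min_bs=32, max_bs=256, cycle_len=8):
    """Hypothetical triangular batch size schedule (an assumption, not the paper's).

    Within each cycle of `cycle_len` steps, the batch size rises linearly
    from `min_bs` to `max_bs` over the first half and falls back over the
    second half.
    """
    if cycle_len < 2 or cycle_len % 2 != 0:
        raise ValueError("cycle_len must be an even integer >= 2")
    if min_bs > max_bs:
        raise ValueError("min_bs must not exceed max_bs")

    half = cycle_len // 2
    pos = step % cycle_len  # position inside the current cycle

    # Fraction of the way from min_bs to max_bs, folded at the midpoint.
    frac = pos / half if pos <= half else (cycle_len - pos) / half
    return int(round(min_bs + frac * (max_bs - min_bs)))


if __name__ == "__main__":
    # One full cycle plus the first step of the next one.
    print([cyclical_batch_size(s) for s in range(9)])
    # [32, 88, 144, 200, 256, 200, 144, 88, 32]
```

In a training loop, such a function would be queried once per step (or per epoch) to decide how many examples to draw for the next update; whether the cycle is triangular, sawtooth, or some other shape is a design choice this sketch leaves open.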

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Machine Learning and Algorithms