Training Neural Networks at Any Scale
Thomas Pethick, Kimon Antonakopoulos, Antonio Silveti-Falls, Leena Chennuru Vankadara, Volkan Cevher

TL;DR
This paper reviews modern optimization methods for neural network training with an emphasis on efficiency and scale. It presents state-of-the-art algorithms under a unified algorithmic template and explains how to make them agnostic to problem scale, as an introduction for practitioners and researchers.
Contribution
The paper presents a unified algorithmic template for state-of-the-art optimization methods, highlights the importance of adapting to the structures in the problem, and covers how to make these methods agnostic to problem scale.
Findings
State-of-the-art optimization algorithms presented under a unified algorithmic template
Techniques for making these algorithms agnostic to the scale of the problem
An introductory exposition aimed at both practitioners and researchers
Abstract
This article reviews modern optimization methods for training neural networks with an emphasis on efficiency and scale. We present state-of-the-art optimization algorithms under a unified algorithmic template that highlights the importance of adapting to the structures in the problem. We then cover how to make these algorithms agnostic to the scale of the problem. Our exposition is intended as an introduction for both practitioners and researchers who wish to be involved in these exciting new developments.
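The abstract does not spell out the template itself, so the following is only an illustrative sketch of what a "unified algorithmic template" can look like, not the authors' formulation. It assumes a momentum-averaged gradient that is mapped to an update direction by a pluggable, geometry-dependent rule: normalized gradient (Euclidean norm), sign descent (max norm), or an orthogonalized update (spectral norm). The names `direction`, `step`, `geometry`, `lr`, and `beta` are hypothetical and chosen for this example.

```python
import numpy as np

def direction(g, geometry):
    """Map a gradient-like matrix to an update direction (hypothetical helper).

    Each branch is the steepest-descent direction under a different norm.
    """
    if geometry == "euclidean":
        # Normalized gradient: steepest descent under the Frobenius/Euclidean norm.
        return g / (np.linalg.norm(g) + 1e-12)
    if geometry == "sign":
        # Sign descent: steepest descent under the entrywise max norm.
        return np.sign(g)
    if geometry == "spectral":
        # Orthogonalized update: steepest descent under the spectral norm.
        u, _, vt = np.linalg.svd(g, full_matrices=False)
        return u @ vt
    raise ValueError(f"unknown geometry: {geometry}")

def step(w, m, g, lr=0.1, beta=0.9, geometry="euclidean"):
    """One template step: average the gradient, then move along the chosen direction."""
    m = beta * m + (1.0 - beta) * g       # exponential moving average of gradients
    w = w - lr * direction(m, geometry)   # geometry-dependent update
    return w, m

# Usage: the same step function with a different geometry per call.
w = np.zeros((1, 2))
m = np.zeros((1, 2))
g = np.array([[3.0, 4.0]])
w, m = step(w, m, g, geometry="euclidean")
```

In this sketch only the `geometry` argument changes between methods, which is one way to read "adapting to the structures in the problem": for example, a matrix-shaped layer could use the spectral rule while other parameters use the sign rule.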
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and Data Classification
