Training Neural Networks at Any Scale

Thomas Pethick; Kimon Antonakopoulos; Antonio Silveti-Falls; Leena Chennuru Vankadara; Volkan Cevher

arXiv:2511.11163 · cs.LG · November 17, 2025

TL;DR

This paper reviews modern optimization methods for training neural networks, with an emphasis on efficiency and scale. It presents state-of-the-art algorithms under a unified algorithmic template and explains how to make them agnostic to the scale of the problem, as an introduction for practitioners and researchers.

Contribution

The paper presents state-of-the-art optimization algorithms under a unified algorithmic template that highlights the importance of adapting to problem structure, and covers how to make these algorithms agnostic to the scale of the problem.

Findings

01 State-of-the-art algorithms under a unified framework

02 Techniques for making algorithms scale-agnostic

03 Guidance for practitioners and researchers

Abstract

This article reviews modern optimization methods for training neural networks with an emphasis on efficiency and scale. We present state-of-the-art optimization algorithms under a unified algorithmic template that highlights the importance of adapting to the structures in the problem. We then cover how to make these algorithms agnostic to the scale of the problem. Our exposition is intended as an introduction for both practitioners and researchers who wish to be involved in these exciting new developments.

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and Data Classification