Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale

Ansh Nagwekar

arXiv:2512.18373 · cs.LG · December 23, 2025

TL;DR

This paper explores the evolution of neural network optimization algorithms, emphasizing principled design and advanced techniques like second-order methods to improve training efficiency and understanding.

Contribution

It provides a comprehensive analysis of optimization methods from first-order to higher-order techniques, offering practical strategies for modern deep learning training.

Findings

01 Limitations of SGD in anisotropic data regimes

02 Advantages of second-order approximation techniques

03 Integration strategies for advanced optimizers in training workflows

Abstract

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models, order-of-magnitude reductions in training time, and improved interpretability into how networks learn. While stochastic gradient descent (SGD) and its variants have become the de facto standard for training deep networks, their success in these over-parameterized regimes often appears more empirical than principled. This thesis investigates this apparent paradox by tracing the evolution of optimization algorithms from classical first-order methods to modern higher-order techniques, revealing how principled algorithmic design can demystify the training process. Starting from first principles with SGD and adaptive gradient methods, the analysis progressively uncovers…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Advanced Multi-Objective Optimization Algorithms