A Short Survey of Averaging Techniques in Stochastic Gradient Methods
K. Lakshmanan

TL;DR
This survey reviews averaging techniques in stochastic gradient methods, covering their theoretical foundations, recent developments, and applications in machine learning (especially deep learning), with an emphasis on improved convergence and generalization.
Contribution
It provides a comprehensive overview of averaging methods, their theoretical basis, recent advances, and practical applications in large-scale machine learning.
Findings
Averaging schemes improve convergence stability.
Recent techniques like stochastic weight averaging enhance generalization.
Finite-sample analysis offers new insights into averaging effectiveness.
Abstract
Stochastic gradient methods are among the most widely used algorithms for large-scale optimization and machine learning. A key technique for improving the statistical efficiency and stability of these methods is the use of averaging schemes applied to the sequence of iterates generated during optimization. Starting from the classical work on stochastic approximation, averaging techniques such as Polyak-Ruppert averaging have been shown to achieve optimal asymptotic variance and improved convergence behavior. In recent years, averaging methods have gained renewed attention in machine learning applications, particularly in the training of deep neural networks and large-scale learning systems. Techniques such as tail averaging, exponential moving averages, and stochastic weight averaging have demonstrated strong empirical performance and improved generalization properties. This paper…
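To make the averaging schemes named in the abstract concrete, here is a minimal sketch, not taken from the paper, that runs plain SGD on a one-dimensional noisy quadratic and tracks three averages of the iterates: the uniform (Polyak-Ruppert) average, a tail average over the last fraction of steps, and an exponential moving average. The function name, the toy objective, and all parameter names and defaults are assumptions chosen for illustration. Stochastic weight averaging is not shown.

```python
import random


def sgd_with_averaging(steps=2000, lr=0.1, noise_std=1.0, x_star=3.0,
                       tail_fraction=0.5, ema_decay=0.99, seed=0):
    """Run SGD on f(x) = 0.5 * (x - x_star)**2 with noisy gradients.

    Hypothetical illustration: returns the last iterate together with
    three averaged estimates of the minimizer x_star.
    """
    rng = random.Random(seed)
    x = 0.0                      # initial iterate
    pr_avg = 0.0                 # Polyak-Ruppert: uniform average of all iterates
    tail_sum, tail_count = 0.0, 0
    ema = x                      # exponential moving average, started at x_0
    tail_start = int(steps * (1.0 - tail_fraction))

    for t in range(1, steps + 1):
        # Stochastic gradient: exact gradient plus zero-mean Gaussian noise.
        grad = (x - x_star) + rng.gauss(0.0, noise_std)
        x -= lr * grad

        # Running mean update: avg_t = avg_{t-1} + (x_t - avg_{t-1}) / t.
        pr_avg += (x - pr_avg) / t

        # Tail averaging: only iterates after the first tail_start steps count.
        if t > tail_start:
            tail_sum += x
            tail_count += 1

        # EMA: recent iterates weigh more; older ones decay geometrically.
        ema = ema_decay * ema + (1.0 - ema_decay) * x

    return {
        "last": x,
        "polyak_ruppert": pr_avg,
        "tail": tail_sum / tail_count,
        "ema": ema,
    }


if __name__ == "__main__":
    for name, value in sgd_with_averaging().items():
        print(f"{name:>15}: {value:.4f}")
```

With a constant step size the last iterate keeps fluctuating around the minimizer, while the averaged estimates are typically closer to it. Setting `noise_std=0.0` isolates the effect of the initial transient: the uniform average still carries a small bias from the early iterates, whereas the tail average discards them.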
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Risk and Portfolio Optimization
