Revisiting SGD with Increasingly Weighted Averaging: Optimization and Generalization Perspectives
Zhishuai Guo, Yan Yan, Tianbao Yang

TL;DR
This paper investigates how increasingly weighted averaging schemes in stochastic gradient descent influence both optimization and generalization errors across convex, strongly convex, and non-convex problems, revealing trade-offs and practical benefits.
Contribution
It provides a comprehensive analysis of increasingly weighted averaging in SGD for convex, strongly convex, and non-convex objectives, covering both optimization and generalization errors. Prior work had focused mainly on the optimization error for strongly convex objectives.
Findings
Weighted averaging affects optimization and generalization errors differently.
Polynomially increasing weights can improve both convergence and generalization.
Trade-offs exist between optimization speed and generalization performance.
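One common way to write polynomially increasing weighted averaging is shown below. It assumes that iterate x_t of SGD receives weight w_t = t^α with α ≥ 0. This notation is ours and is not taken from this summary. The second line gives the equivalent running update, which avoids storing past iterates.

```latex
\bar{x}_T = \frac{\sum_{t=1}^{T} w_t \, x_t}{\sum_{t=1}^{T} w_t}, \qquad w_t = t^{\alpha}, \quad \alpha \ge 0,

\bar{x}_t = \Big(1 - \frac{w_t}{W_t}\Big)\bar{x}_{t-1} + \frac{w_t}{W_t}\, x_t, \qquad W_t = \sum_{i=1}^{t} w_i .
```

Setting α = 0 recovers the uniform average, and larger α shifts weight toward later iterates. For example, with T = 3 and α = 1 the weights are 1, 2, 3, so the average is (x_1 + 2x_2 + 3x_3)/6.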
Abstract
Stochastic gradient descent (SGD) has been widely studied in the literature from different angles, and is commonly employed for solving many big data machine learning problems. However, the averaging technique, which combines all iterative solutions into a single solution, is still under-explored. While some increasingly weighted averaging schemes have been considered in the literature, existing works are mostly restricted to strongly convex objective functions and the convergence of optimization error. It remains unclear how these averaging schemes affect the convergence of both optimization error and generalization error (two equally important components of testing error) for non-strongly convex objectives, including non-convex problems. In this paper, we fill the gap by comprehensively analyzing the increasingly weighted averaging on convex, strongly convex and…
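The averaging the abstract describes can be illustrated with a minimal Python sketch. It assumes weights w_t = t^α on the SGD iterates x_1, ..., x_T, where α = 0 gives uniform averaging. The weighted average is updated incrementally, so no past iterates are stored. The function `weighted_average_sgd`, the step-size schedule, and the noisy quadratic toy problem are hypothetical illustrations, not the authors' code or experiments.

```python
import numpy as np


def weighted_average_sgd(grad, x0, step_size, num_iters, alpha, rng):
    """Run SGD and return (last iterate, weighted average of iterates).

    Assumption (hypothetical sketch): iterate x_t, t = 1..num_iters, receives
    weight w_t = t**alpha; alpha = 0 gives the uniform average.
    """
    x = np.asarray(x0, dtype=float).copy()
    x_avg = np.zeros_like(x)
    weight_sum = 0.0
    for t in range(1, num_iters + 1):
        # Plain SGD step with a user-supplied stochastic gradient oracle.
        x = x - step_size(t) * grad(x, rng)
        w = float(t) ** alpha
        weight_sum += w
        # Running update of sum_i w_i x_i / sum_i w_i without storing iterates.
        x_avg += (w / weight_sum) * (x - x_avg)
    return x, x_avg


if __name__ == "__main__":
    # Hypothetical toy problem: f(x) = 0.5 * ||x - target||^2 with noisy gradients.
    target = np.array([1.0, -2.0])

    def noisy_grad(x, rng):
        return (x - target) + 0.1 * rng.standard_normal(x.shape)

    for alpha in (0.0, 1.0, 2.0):
        x_last, x_bar = weighted_average_sgd(
            noisy_grad, np.zeros(2), lambda t: 0.5 / np.sqrt(t), 200, alpha,
            np.random.default_rng(0),
        )
        print(alpha, np.linalg.norm(x_last - target), np.linalg.norm(x_bar - target))
```

You can compare `x_last` and `x_bar` across values of `alpha` on such a toy problem. This shows how the weighting reduces the influence of early iterates that are far from the optimum. On its own, however, it says nothing about generalization error, which the paper also analyzes.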
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques
