Revisiting SGD with Increasingly Weighted Averaging: Optimization and Generalization Perspectives

Zhishuai Guo; Yan Yan; Tianbao Yang

arXiv:2003.04339·cs.LG·May 28, 2020·1 cites

Open Access

TL;DR

This paper investigates how increasingly weighted averaging schemes in stochastic gradient descent influence both optimization and generalization errors across convex, strongly convex, and non-convex problems, revealing trade-offs and practical benefits.

Contribution

It provides a comprehensive analysis of increasingly weighted averaging in SGD for various objective types, addressing both optimization and generalization errors, which was previously under-explored.

Findings

01. Weighted averaging affects optimization and generalization errors differently.

02. Polynomially increased weighting can improve convergence and generalization.

03. Trade-offs exist between optimization speed and generalization performance.

Abstract

Stochastic gradient descent (SGD) has been widely studied in the literature from different angles, and is commonly employed for solving many big data machine learning problems. However, the averaging technique, which combines all iterative solutions into a single solution, is still under-explored. While some increasingly weighted averaging schemes have been considered in the literature, existing works are mostly restricted to strongly convex objective functions and the convergence of optimization error. It remains unclear how these averaging schemes affect the convergence of both optimization error and generalization error (two equally important components of testing error) for non-strongly convex objectives, including non-convex problems. In this paper, we fill the gap by comprehensively analyzing the increasingly weighted averaging on convex, strongly convex and…
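
To make the averaging idea concrete, here is a minimal sketch of SGD that returns a weighted average of its iterates alongside the last iterate. It is an illustration under stated assumptions, not the scheme analyzed in the paper: the weights w_t = t^alpha are one possible polynomially increasing choice, and the names `sgd_weighted_average`, `noisy_quadratic_grad`, `alpha`, and `lr` are hypothetical. Setting alpha = 0 recovers the uniform average; larger alpha puts more weight on later iterates.

```python
import numpy as np


def sgd_weighted_average(grad_fn, x0, steps, lr, alpha=1.0, seed=0):
    """Run SGD and keep a running average with weights w_t = t**alpha.

    Assumption for illustration: polynomial weights and a constant step size.
    Returns (last_iterate, weighted_average).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    x_avg = np.zeros_like(x)
    weight_sum = 0.0
    for t in range(1, steps + 1):
        # Plain SGD step with a stochastic gradient.
        x = x - lr * grad_fn(x, rng)
        # Online update of sum_s w_s x_s / sum_s w_s, without storing iterates.
        w = float(t) ** alpha
        weight_sum += w
        x_avg += (w / weight_sum) * (x - x_avg)
    return x, x_avg


def noisy_quadratic_grad(x, rng):
    """Stochastic gradient of 0.5 * ||x||^2 with additive Gaussian noise."""
    return x + 0.1 * rng.standard_normal(x.shape)


if __name__ == "__main__":
    x_last, x_avg = sgd_weighted_average(
        noisy_quadratic_grad, x0=np.ones(5), steps=1000, lr=0.05, alpha=2.0
    )
    print("last iterate norm:    ", np.linalg.norm(x_last))
    print("weighted average norm:", np.linalg.norm(x_avg))
```

The running update avoids storing all iterates: each step moves the average toward the new iterate by the fraction w_t / (w_1 + ... + w_t), which is algebraically the same as computing the full weighted sum at the end.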

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques