Coupling-based Convergence Diagnostic and Stepsize Scheme for Stochastic Gradient Descent
Xiang Li, Qiaomin Xie

TL;DR
This paper introduces a coupling-based convergence diagnostic for SGD with constant stepsize, enabling a dynamic stepsize scheme that improves convergence detection and performance across various problems.
Contribution
The paper proposes a novel coupling-based diagnostic method for SGD that detects stationarity, which in turn drives a dynamic stepsize scheme that outperforms existing schemes in the reported experiments.
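One way to read "coupled SGD iterates" is two copies of constant-stepsize SGD started from different points but driven by the same stochastic sample at every step, with their distance as the diagnostic statistic. The notation below (stepsize α, stochastic gradient g, sample ξ_t, statistic D_t) is ours, and the shared-sample (synchronous) coupling is an assumption rather than a quote from the paper. For least squares with g(θ; a, b) = a(aᵀθ − b), the sample noise cancels in the difference, so D_t shrinks only as the two chains forget their initializations:

```latex
\begin{aligned}
\theta_{t+1} &= \theta_t - \alpha\, g(\theta_t;\xi_t), \qquad
\theta'_{t+1} = \theta'_t - \alpha\, g(\theta'_t;\xi_t), \qquad
D_t = \lVert \theta_t - \theta'_t \rVert, \\
\theta_{t+1} - \theta'_{t+1} &= \bigl(I - \alpha\, a_t a_t^{\top}\bigr)\,(\theta_t - \theta'_t)
\quad \text{when } g(\theta; a, b) = a\,(a^{\top}\theta - b),\ \xi_t = (a_t, b_t).
\end{aligned}
```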
Findings
The diagnostic accurately detects the transition to stationarity in SGD.
The proposed stepsize scheme outperforms existing methods in diverse problems.
The approach is robust to different hyperparameter settings.
Abstract
The convergence behavior of Stochastic Gradient Descent (SGD) crucially depends on the stepsize configuration. With a constant stepsize, the SGD iterates form a Markov chain and converge quickly during the initial transient phase. However, once the chain reaches stationarity, the iterates oscillate around the optimum without making further progress. In this paper, we study convergence diagnostics for SGD with constant stepsize, with the aim of developing an effective dynamic stepsize scheme. We propose a novel coupling-based convergence diagnostic procedure, which monitors the distance between two coupled SGD iterates to detect stationarity. Our diagnostic statistic is simple, and we show theoretically that it tracks the transition from transience to stationarity. We conduct extensive numerical experiments and compare our method against various existing approaches. Our proposed coupling-based…
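A minimal sketch of how a coupling diagnostic could drive a dynamic stepsize, under several assumptions that are ours and not necessarily the authors': both chains consume the same sample each step (synchronous coupling), the problem is 1-D least squares with optimum `theta_star`, stationarity is declared once the gap has shrunk by a factor `ratio` relative to its starting value, the stepsize is then multiplied by `gamma`, and the twin chain is restarted at a fixed `offset`. The function `coupled_sgd` and all of its parameters and defaults are hypothetical illustrations.

```python
import random


def stoch_grad(theta, a, b):
    # Gradient w.r.t. theta of the per-sample loss 0.5 * (a * theta - b) ** 2.
    return a * (a * theta - b)


def coupled_sgd(theta0, alpha0=0.5, gamma=0.5, ratio=1e-3, offset=1.0,
                total_steps=5000, theta_star=2.0, noise=0.5, seed=0):
    """Hypothetical sketch: constant-stepsize SGD whose stepsize is cut by
    `gamma` each time two coupled chains (same samples, different starts)
    end up `ratio` times closer than they started."""
    rng = random.Random(seed)
    alpha = alpha0
    theta = theta0
    twin = theta0 + offset          # coupled copy, started elsewhere (assumption)
    gap0 = abs(theta - twin)        # reference gap for the stopping rule
    switches = []                   # iterations at which alpha was reduced
    for t in range(total_steps):
        # Draw ONE sample and feed it to both chains (synchronous coupling).
        a = rng.gauss(1.0, 0.3)
        b = a * theta_star + rng.gauss(0.0, noise)
        theta -= alpha * stoch_grad(theta, a, b)
        twin -= alpha * stoch_grad(twin, a, b)
        if abs(theta - twin) <= ratio * gap0:
            # Chains have (nearly) coalesced: treat the chain as stationary
            # and shrink the stepsize (multiplicative cut is an assumption).
            alpha *= gamma
            switches.append(t)
            twin = theta + offset   # re-couple from a fresh offset
            gap0 = abs(theta - twin)
    return theta, alpha, switches


if __name__ == "__main__":
    theta, alpha, switches = coupled_sgd(theta0=0.0)
    print(f"theta={theta:.4f} alpha={alpha:.5f} cuts at {switches}")
```

In this linear model the gap evolves as (1 − α·a²) times the previous gap, so the shared noise cancels and the gap measures only how far each chain is from forgetting its start. Smaller stepsizes contract more slowly, which is why successive cuts in this sketch are spaced further and further apart.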
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
Methods: Sparse Evolutionary Training · Stochastic Gradient Descent
