Accelerating Single-Pass SGD for Generalized Linear Prediction
Qian Chen, Shihong Ding, Cong Fang

TL;DR
This paper introduces a novel momentum-accelerated algorithm for single-pass stochastic gradient descent in generalized linear prediction, achieving improved convergence and resolving an open problem in streaming optimization.
Contribution
It presents the first data-dependent proximal method with dual-momentum acceleration for single-pass streaming generalized linear models, improving optimization and statistical error bounds.
Findings
Momentum accelerates convergence more effectively than variance reduction.
The algorithm achieves minimax optimal statistical error.
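The analysis handles model misspecification through a fine-grained stationary analysis of the inner updates, so misspecification contributes only a higher-order error term.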
Abstract
We study generalized linear prediction under a streaming setting, where each iteration uses only one fresh data point for a gradient-level update. While momentum is well-established in deterministic optimization, a fundamental open question is whether it can accelerate such single-pass non-quadratic stochastic optimization. We propose the first algorithm that successfully incorporates momentum via a novel data-dependent proximal method, achieving dual-momentum acceleration. Our derived excess risk bound decomposes into three components: an improved optimization error, a minimax optimal statistical error, and a higher-order model-misspecification error. The proof handles mis-specification via a fine-grained stationary analysis of inner updates, while localizing statistical error through a two-phase outer-loop analysis. As a result, we resolve the open problem posed by Jain et al. [2018a]…
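To make the streaming setting concrete, the sketch below runs plain single-pass SGD with iterate averaging on a logistic model, one common instance of generalized linear prediction. It is only an unaccelerated baseline of the kind the abstract's question is about, not the paper's data-dependent proximal method or its dual-momentum scheme. The names single_pass_sgd and make_stream, the step size, the synthetic Gaussian data, and the choice of logistic loss are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def single_pass_sgd(stream, dim, step_size=0.1):
    """Baseline single-pass SGD with iterate averaging (hypothetical helper).

    Each (x, y) pair is used for exactly one gradient step and then
    discarded, which is the streaming constraint described in the abstract.
    No momentum is applied here.
    """
    w = [0.0] * dim
    w_avg = [0.0] * dim
    for t, (x, y) in enumerate(stream, start=1):
        # Gradient of the logistic loss at one fresh sample: (sigmoid(<w, x>) - y) * x
        residual = sigmoid(dot(w, x)) - y
        w = [wi - step_size * residual * xi for wi, xi in zip(w, x)]
        # Running average of the iterates.
        w_avg = [ai + (wi - ai) / t for ai, wi in zip(w_avg, w)]
    return w_avg


def make_stream(w_star, n, rng):
    """Yield n synthetic samples from a well-specified logistic model (assumed data)."""
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_star]
        y = 1.0 if rng.random() < sigmoid(dot(w_star, x)) else 0.0
        yield x, y


rng = random.Random(0)
w_star = [1.0, -2.0, 0.5]
w_hat = single_pass_sgd(make_stream(w_star, 20000, rng), dim=len(w_star))
error = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_hat, w_star)))
print("estimate:", w_hat)
print("distance to w_star:", error)
```

Because the generator hands each sample to the optimizer exactly once, the loop never revisits data. Any acceleration has to come from how past gradients are combined rather than from extra passes.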
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Generative Adversarial Networks and Image Synthesis
