Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization
Pan Zhou, Xiaotong Yuan

TL;DR
This paper introduces a hybrid stochastic-deterministic minibatch proximal gradient algorithm that achieves nearly optimal data-size-independent complexity, enabling less-than-single-pass optimization with strong generalization guarantees for large-scale learning.
Contribution
The paper proposes the HSDMPG algorithm with provably improved complexity bounds that are nearly independent of data size, outperforming prior SVRG methods for large-scale problems.
Findings
Achieves $\tilde{O}(n^{0.875})$ stochastic gradient evaluations, fewer than one pass over $n$ samples, to reach generalization-level accuracy for quadratic loss.
Provides complexity bounds that are nearly data-size-independent.
Demonstrates computational advantages over prior algorithms through numerical results.
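
To make the "less than a single pass" claim concrete, the short sketch below compares $n^{0.875}$ gradient evaluations against one full pass of $n$ evaluations. It is only an illustration: it ignores the constants and logarithmic factors hidden by the $\tilde{O}$ notation, and the helper name `pass_fraction` is hypothetical rather than taken from the paper.

```python
def pass_fraction(n: int, exponent: float = 0.875) -> float:
    """Return n**exponent / n, the share of one full data pass
    used by n**exponent gradient evaluations (constants and
    logarithmic factors are deliberately ignored)."""
    return n ** exponent / n


if __name__ == "__main__":
    # Illustrative data sizes only; these are not from the paper.
    for n in (10**4, 10**6, 10**8):
        evals = n ** 0.875
        print(f"n={n:>11,d}  n^0.875={evals:>14,.0f}  "
              f"fraction of one pass={pass_fraction(n):.3f}")
```

Because the ratio equals $n^{-0.125}$, it shrinks slowly as the data grows: at $n = 10^8$ it is $10^{-1}$, i.e. about a tenth of a pass before logarithmic factors are taken into account.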
Abstract
Stochastic variance-reduced gradient (SVRG) algorithms have been shown to work favorably in solving large-scale learning problems. Despite the remarkable success, the stochastic gradient complexity of SVRG-type algorithms usually scales linearly with data size and thus could still be expensive for huge data. To address this deficiency, we propose a hybrid stochastic-deterministic minibatch proximal gradient (HSDMPG) algorithm for strongly-convex problems that enjoys provably improved data-size-independent complexity guarantees. More precisely, for quadratic loss of $n$ components, we prove that HSDMPG can attain an $\epsilon$-optimization-error within $\mathcal{O}\Big(\frac{\kappa^{1.5}\epsilon^{0.75}\log^{1.5}(\frac{1}{\epsilon})+1}{\epsilon}\wedge\Big(\kappa…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
