Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization

Pan Zhou; Xiaotong Yuan

arXiv:2009.09835·cs.LG·September 22, 2020

TL;DR

This paper introduces a hybrid stochastic-deterministic minibatch proximal gradient algorithm that achieves nearly optimal data-size-independent complexity, enabling less-than-single-pass optimization with strong generalization guarantees for large-scale learning.

Contribution

The paper proposes the HSDMPG algorithm with provably improved complexity bounds that are nearly independent of data size, outperforming prior SVRG methods for large-scale problems.

Findings

01

Achieves nearly optimal generalization for quadratic loss in $\tilde{O}(n^{0.875})$ gradient evaluations.

02

Provides complexity bounds that are nearly data-size-independent.

03

Demonstrates computational advantages over prior algorithms through numerical results.
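To make the "less-than-single-pass" reading of the first finding concrete, the short sketch below compares $n^{0.875}$ with one full pass over $n$ samples; the ratio is simply $n^{-0.125}$. It is only a back-of-the-envelope illustration: the function name `sublinear_fraction` is hypothetical, and the logarithmic factors and constants hidden by the $\tilde{O}$ notation are ignored, even though they can dominate at practical data sizes.

```python
def sublinear_fraction(n: int, exponent: float = 0.875) -> float:
    """Return n**exponent / n, i.e. the cost relative to one full data pass.

    Assumption: log factors and constants hidden by the O-tilde are ignored.
    """
    return n ** exponent / n


# Relative cost shrinks slowly as the data size grows (ratio = n**-0.125).
for n in (10**4, 10**6, 10**8):
    print(f"n = {n:>9d}: n^0.875 / n = {sublinear_fraction(n):.3f}")
```

Because the exponent gap is only 0.125, the ratio falls slowly with $n$, so the advantage this suggests is an asymptotic one.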

Abstract

Stochastic variance-reduced gradient (SVRG) algorithms have been shown to work favorably in solving large-scale learning problems. Despite the remarkable success, the stochastic gradient complexity of SVRG-type algorithms usually scales linearly with data size and thus could still be expensive for huge data. To address this deficiency, we propose a hybrid stochastic-deterministic minibatch proximal gradient (HSDMPG) algorithm for strongly-convex problems that enjoys provably improved data-size-independent complexity guarantees. More precisely, for quadratic loss $F(\theta)$ of $n$ components, we prove that HSDMPG can attain an $\epsilon$-optimization-error $\mathbb{E}[F(\theta) - F(\theta^{*})] \leq \epsilon$ within $\mathcal{O}\Big(\frac{\kappa^{1.5}\epsilon^{0.75}\log^{1.5}(\frac{1}{\epsilon})+1}{\epsilon}\wedge\Big(\kappa…
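The abstract's remark that SVRG-type complexity scales linearly with data size can be seen in a minimal sketch of plain SVRG, shown here on a one-dimensional least-squares problem. This is the standard SVRG baseline, not the HSDMPG algorithm proposed in the paper; the function name `svrg_least_squares`, the toy data, and the step size are all assumptions chosen for illustration.

```python
import random


def svrg_least_squares(xs, ys, step=0.5, epochs=30, inner_steps=None, seed=0):
    """Plain SVRG on F(w) = (1/2n) * sum_i (w * x_i - y_i)**2 (illustrative only).

    Returns the final iterate and the number of component-gradient evaluations.
    """
    rng = random.Random(seed)
    n = len(xs)
    if inner_steps is None:
        inner_steps = n

    def grad_i(w, i):
        # Gradient of the i-th component 0.5 * (w * x_i - y_i)**2.
        return (w * xs[i] - ys[i]) * xs[i]

    w_snap, evals = 0.0, 0
    for _ in range(epochs):
        # Full gradient at the snapshot: n component gradients every epoch.
        full_grad = sum(grad_i(w_snap, i) for i in range(n)) / n
        evals += n
        w = w_snap
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Variance-reduced stochastic gradient.
            g = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            evals += 2
            w -= step * g
        w_snap = w
    return w_snap, evals


# Toy noiseless data with true slope 2.0 (hypothetical example values).
xs = [0.8 + 0.4 * i / 19 for i in range(20)]
ys = [2.0 * x for x in xs]
w, evals = svrg_least_squares(xs, ys)
print(f"w = {w:.6f}, component-gradient evaluations = {evals}")
```

Every outer epoch recomputes the full gradient at the snapshot, so the evaluation count contains a term of `epochs * n` regardless of how few inner steps are taken. That per-epoch full pass is the linear dependence on data size the abstract refers to.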

Videos

Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization (SlidesLive)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM