Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems

Zhan Gao; Alec Koppel; Alejandro Ribeiro

arXiv:2007.01219 · eess.SP · July 10, 2020

Open Access

TL;DR

This paper introduces the two scale adaptive (TSA) scheme for stochastic gradient descent, which increases the batch-size in stages and selects the step-size for fastest convergence, and which applies to both convex and non-convex problems.

Contribution

The paper proposes an adaptive algorithm that adjusts the batch-size and step-size as optimization proceeds rather than fixing them at the outset. It is reported to attain the optimal convergence rate at a lower computational cost than fixed-parameter methods.

Findings

01. TSA achieves the same asymptotic convergence as standard SGD.

02. In theory, TSA attains the optimal rate of error decrease.

03. In experiments, TSA outperforms methods with a fixed mini-batch size and step-size.

Abstract

Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, and forms the bedrock of modern machine learning and statistics. In this work, we seek to balance the fact that attenuating step-size is required for exact asymptotic convergence with the fact that constant step-size learns faster in finite time up to an error. To do so, rather than fixing the mini-batch and the step-size at the outset, we propose a strategy to allow parameters to evolve adaptively. Specifically, the batch-size is set to be a piecewise-constant increasing sequence where the increase occurs when a suitable error criterion is satisfied. Moreover, the step-size is selected as that which yields the fastest convergence. The overall algorithm, two scale adaptive (TSA) scheme, is developed for both convex and non-convex stochastic optimization problems. It inherits the exact…
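The batch-size schedule described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the least-squares objective, the gradient-norm test used as the "error criterion", the doubling rule, and every name and default below (`tsa_sketch`, `tol`, `growth`, `max_batch`) are assumptions made for illustration. The step-size is also held constant here, whereas the paper selects it to give the fastest convergence.

```python
import numpy as np


def tsa_sketch(A, b, step=0.05, batch0=1, growth=2, tol=0.1,
               max_batch=256, iters=2000, seed=0):
    """SGD on 0.5 * mean((A x - b)^2) with a piecewise-constant, increasing batch size.

    Hypothetical stand-in for the paper's error criterion: grow the batch
    once a running average of the squared mini-batch gradient norm falls
    below tol / batch, i.e. once progress looks noise-dominated.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    batch = batch0
    ema = None              # running average of ||grad||^2
    batch_history = []
    for _ in range(iters):
        # Sample a mini-batch (with replacement) and form its gradient.
        idx = rng.integers(0, n, size=batch)
        resid = A[idx] @ x - b[idx]
        grad = A[idx].T @ resid / batch
        x = x - step * grad  # constant step-size (an assumption of this sketch)

        sq = float(grad @ grad)
        ema = sq if ema is None else 0.9 * ema + 0.1 * sq
        batch_history.append(batch)

        # Assumed criterion: the batch size only ever increases, up to max_batch.
        if ema < tol / batch and batch < max_batch:
            batch = min(growth * batch, max_batch)
    return x, batch_history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(500, 5))
    x_true = rng.normal(size=5)
    b = A @ x_true + 0.1 * rng.normal(size=500)
    x_hat, history = tsa_sketch(A, b)
    print("final batch size:", history[-1])
    print("distance to x_true:", np.linalg.norm(x_hat - x_true))
```

The sketch keeps the batch small while the iterate is far from the solution, where cheap noisy gradients still make progress, and spends more samples per step only once gradient noise dominates. That is the trade-off between fast constant-step learning and exact asymptotic convergence that the abstract describes.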

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent