Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems
Zhan Gao, Alec Koppel, Alejandro Ribeiro

TL;DR
This paper introduces the two scale adaptive (TSA) scheme for stochastic gradient descent, which adaptively adjusts batch size and step-size to optimize convergence speed and computational efficiency in both convex and non-convex problems.
Contribution
The paper proposes a novel adaptive algorithm that dynamically adjusts batch size and step-size, achieving optimal convergence rates and reducing computational costs compared to fixed-parameter methods.
Findings
TSA achieves the same asymptotic convergence as standard SGD.
In theory, TSA attains the optimal rate of error decrease.
In experiments, TSA outperforms methods that use a fixed mini-batch size and step-size.
Abstract
Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, and forms the bedrock of modern machine learning and statistics. In this work, we seek to balance the fact that attenuating step-size is required for exact asymptotic convergence with the fact that constant step-size learns faster in finite time up to an error. To do so, rather than fixing the mini-batch and the step-size at the outset, we propose a strategy to allow parameters to evolve adaptively. Specifically, the batch-size is set to be a piecewise-constant increasing sequence where the increase occurs when a suitable error criterion is satisfied. Moreover, the step-size is selected as that which yields the fastest convergence. The overall algorithm, two scale adaptive (TSA) scheme, is developed for both convex and non-convex stochastic optimization problems. It inherits the exact…
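The batch-size idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' TSA algorithm. The error criterion here is a hypothetical stand-in (double the batch-size once the squared mini-batch gradient is no larger than its estimated variance), the step-size is simply held constant rather than chosen for fastest convergence, and the synthetic 1-D least-squares problem and all function and parameter names are assumptions made for illustration.

```python
import random


def sample_gradients(x, batch_size, rng, x_true=3.0, noise_std=1.0):
    """Per-sample gradients of 0.5 * (a * x - b)**2 on synthetic 1-D data.

    The data model b = a * x_true + noise is an assumption for this sketch.
    """
    grads = []
    for _ in range(batch_size):
        a = rng.gauss(0.0, 1.0)
        b = a * x_true + rng.gauss(0.0, noise_std)
        grads.append(a * (a * x - b))
    return grads


def adaptive_batch_sgd(x0=0.0, step_size=0.1, batch_size=2, max_batch=512,
                       num_iters=500, seed=0):
    """SGD with a piecewise-constant, non-decreasing batch-size.

    Hypothetical growth rule (not taken from the paper): double the
    batch-size when the squared mini-batch gradient is at most the
    estimated variance of that mini-batch average.
    """
    rng = random.Random(seed)
    x = x0
    batch_history = []
    for _ in range(num_iters):
        grads = sample_gradients(x, batch_size, rng)
        mean_grad = sum(grads) / batch_size
        # Unbiased sample variance of the per-sample gradients.
        var = sum((g - mean_grad) ** 2 for g in grads) / (batch_size - 1)

        # Constant step-size update (an assumption of this sketch).
        x -= step_size * mean_grad
        batch_history.append(batch_size)

        # The variance of the mini-batch average is roughly var / batch_size.
        # When the gradient signal is no larger than that noise level,
        # more samples per step are needed to keep making progress.
        if mean_grad ** 2 <= var / batch_size and batch_size < max_batch:
            batch_size = min(2 * batch_size, max_batch)
    return x, batch_history


if __name__ == "__main__":
    x_final, history = adaptive_batch_sgd()
    print("final iterate:", x_final)
    print("final batch-size:", history[-1])
```

Starting with a small batch keeps early iterations cheap, while the growing batch-size lowers gradient variance later on, which is the trade-off the abstract describes. The doubling factor and the cap `max_batch` are arbitrary choices for this sketch.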
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications
Methods: Stochastic Gradient Descent
