Variance Regularization for Accelerating Stochastic Optimization
Tong Yang, Long Sha, Pengyu Hong

TL;DR
This paper introduces a variance regularization technique that leverages mini-batch gradient statistics to reduce error accumulation, thereby accelerating and stabilizing stochastic optimization processes.
Contribution
It proposes a universal variance regularization principle that adjusts learning rates based on mini-batch variances, improving stochastic gradient methods.
Findings
Speeds up convergence of stochastic optimization.
Stabilizes stochastic gradient descent.
Enhances performance of generic first-order methods.
Abstract
While most gradient-based optimization methods today focus on exploring high-dimensional geometric features, the random error accumulated in the stochastic version of an algorithm has received little attention. In this work, we propose a universal principle that reduces random error accumulation by exploiting statistical information hidden in mini-batch gradients. This is achieved by regularizing the learning rate according to mini-batch variances. Because this perspective is complementary to existing ones, the regularization can provide a further improvement for stochastic implementations of generic first-order approaches. Empirical results demonstrate that variance regularization can speed up convergence as well as stabilize stochastic optimization.
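The abstract does not state the exact update rule, so the following is only a minimal sketch of what "regularizing the learning rate according to mini-batch variances" could look like. It assumes a per-coordinate step size that shrinks as the variance of the mini-batch mean gradient grows relative to the squared mean. The function name `variance_regularized_step`, the parameters `lam` and `eps`, and the scaling formula itself are hypothetical choices made for illustration, not the authors' method.

```python
import random
from statistics import fmean, pvariance


def variance_regularized_step(w, per_sample_grads, lr=0.1, lam=1.0, eps=1e-12):
    """One SGD step whose per-coordinate learning rate shrinks with mini-batch variance.

    per_sample_grads: list of per-example gradient vectors (one list per example).
    The scaling rule below is an illustrative assumption, not the paper's formula.
    """
    n = len(per_sample_grads)
    new_w, scales = [], []
    for j, w_j in enumerate(w):
        column = [g[j] for g in per_sample_grads]
        mean = fmean(column)
        # Variance of the mini-batch mean estimate for coordinate j.
        var_of_mean = pvariance(column, mu=mean) / n
        # Hypothetical rule: scale lies in (0, 1] and is 1 when the batch is noise-free.
        scale = 1.0 / (1.0 + lam * var_of_mean / (mean * mean + eps))
        scales.append(scale)
        new_w.append(w_j - lr * scale * mean)
    return new_w, scales


# Toy noisy least-squares problem (synthetic data, for illustration only).
rng = random.Random(0)
w_true = [1.0, -2.0, 0.5]
data = []
for _ in range(256):
    x = [rng.gauss(0.0, 1.0) for _ in range(3)]
    y = sum(a * b for a, b in zip(x, w_true)) + rng.gauss(0.0, 0.1)
    data.append((x, y))

w = [0.0, 0.0, 0.0]
for _ in range(500):
    batch = rng.sample(data, 32)
    grads = []
    for x, y in batch:
        residual = sum(a * b for a, b in zip(x, w)) - y
        grads.append([residual * a for a in x])  # gradient of 0.5 * (x.w - y)^2
    w, scales = variance_regularized_step(w, grads)
```

Under this assumed rule, coordinates whose mini-batch gradients agree take nearly the full step `lr`, while coordinates dominated by sampling noise take a damped step. This is one way to limit the accumulation of random error that the abstract describes.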
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research
