Second-order Information Promotes Mini-Batch Robustness in Variance-Reduced Gradients
Sachin Garg, Albert S. Berahas, Michał Dereziński

TL;DR
This paper shows that incorporating partial second-order information into variance-reduced stochastic gradient methods significantly enhances their robustness to mini-batch size variations, enabling better scalability and consistent convergence.
Contribution
The paper introduces a novel mini-batch stochastic variance-reduced Newton method that maintains fast convergence across a wide range of mini-batch sizes, with theoretical and empirical validation.
Findings
For sufficiently large data sizes, the linear convergence rate of the proposed method is independent of the gradient mini-batch size.
The phase transition point for mini-batch size aligns with theoretical predictions.
Empirical results confirm robustness of the method across various tasks.
Abstract
We show that, for finite-sum minimization problems, incorporating partial second-order information of the objective function can dramatically improve the robustness to mini-batch size of variance-reduced stochastic gradient methods, making them more scalable while retaining their benefits over traditional Newton-type approaches. We demonstrate this phenomenon on a prototypical stochastic second-order algorithm, called Mini-Batch Stochastic Variance-Reduced Newton (Mb-SVRN), which combines variance-reduced gradient estimates with access to an approximate Hessian oracle. In particular, we show that when the data size $n$ is sufficiently large, i.e., $n \gg \alpha^2 \kappa$, where $\kappa$ is the condition number and $\alpha$ is the Hessian approximation factor, then Mb-SVRN achieves a fast linear convergence rate that is independent of the gradient mini-batch size $b$,…
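The combination described in the abstract, a variance-reduced gradient estimate preconditioned by an approximate Hessian, can be illustrated with a minimal sketch on a ridge-regularized least-squares problem. This is not the authors' implementation: the subsampled Hessian stands in for the approximate Hessian oracle, and the helper names (`solve`, `grad`, `hessian`, `make_problem`, `mb_svrn`), the step size `eta`, the loop lengths, and the synthetic data are all assumptions chosen for illustration.

```python
import random

def solve(M, v):
    """Solve M z = v for a small dense system (Gaussian elimination, partial pivoting)."""
    d = len(v)
    aug = [M[i][:] + [v[i]] for i in range(d)]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, d):
            f = aug[r][c] / aug[c][c]
            for k in range(c, d + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * d
    for r in range(d - 1, -1, -1):
        s = sum(aug[r][k] * z[k] for k in range(r + 1, d))
        z[r] = (aug[r][d] - s) / aug[r][r]
    return z

def grad(A, y, x, idx, lam):
    """Average gradient of 0.5*(a_i.x - y_i)^2 + 0.5*lam*|x|^2 over the rows in idx."""
    d = len(x)
    g = [lam * xj for xj in x]
    for i in idx:
        r = sum(A[i][j] * x[j] for j in range(d)) - y[i]
        for j in range(d):
            g[j] += r * A[i][j] / len(idx)
    return g

def hessian(A, idx, lam):
    """Average Hessian a_i a_i^T + lam*I over the rows in idx."""
    d = len(A[0])
    H = [[lam if j == k else 0.0 for k in range(d)] for j in range(d)]
    for i in idx:
        for j in range(d):
            for k in range(d):
                H[j][k] += A[i][j] * A[i][k] / len(idx)
    return H

def make_problem(n=500, d=3, noise=0.1, seed=1):
    """Synthetic linear-regression data (hypothetical, for illustration only)."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(d)]
    A = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [sum(a[j] * w[j] for j in range(d)) + noise * rng.gauss(0, 1) for a in A]
    return A, y

def mb_svrn(A, y, lam=0.1, b=16, hess_samples=100, outer=10, inner=40,
            eta=0.25, seed=0):
    """Sketch of a mini-batch variance-reduced Newton-type loop (assumed settings)."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    for _ in range(outer):
        snap = x[:]                                   # snapshot point
        g_full = grad(A, y, snap, range(n), lam)      # full gradient at snapshot
        # Assumption: a subsampled Hessian plays the role of the Hessian oracle.
        H = hessian(A, rng.sample(range(n), hess_samples), lam)
        for _ in range(inner):
            B = rng.sample(range(n), b)               # gradient mini-batch
            gx = grad(A, y, x, B, lam)
            gs = grad(A, y, snap, B, lam)
            # Variance-reduced gradient estimate.
            v = [gx[j] - gs[j] + g_full[j] for j in range(d)]
            step = solve(H, v)                        # precondition by approx. Hessian
            x = [x[j] - eta * step[j] for j in range(d)]
    return x
```

Calling `mb_svrn(*make_problem())` returns an approximate minimizer as a list of floats. Varying `b` while holding the other arguments fixed is one simple way to probe the mini-batch sensitivity the abstract discusses, though the defaults here are not tuned to the paper's theory.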
Taxonomy
Topics: Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing · Machine Learning and ELM
