Second-order Information Promotes Mini-Batch Robustness in Variance-Reduced Gradients

Sachin Garg, Albert S. Berahas, Michał Dereziński

arXiv:2404.14758 · math.OC · April 24, 2024



TL;DR

This paper shows that incorporating partial second-order information into variance-reduced stochastic gradient methods makes their convergence rate far less sensitive to the choice of mini-batch size, which improves their scalability while preserving fast convergence.

Contribution

The paper introduces a mini-batch stochastic variance-reduced Newton method (Mb-SVRN) that maintains fast convergence across a wide range of mini-batch sizes, supported by both theoretical analysis and experiments.

Findings

1. When the data set is sufficiently large, the convergence rate of the proposed method is independent of the mini-batch size.

2. The mini-batch size at which the convergence behavior undergoes a phase transition agrees with the theoretical predictions.

3. Experiments across a variety of tasks confirm the robustness of the method.

Abstract

We show that, for finite-sum minimization problems, incorporating partial second-order information of the objective function can dramatically improve the robustness to mini-batch size of variance-reduced stochastic gradient methods, making them more scalable while retaining their benefits over traditional Newton-type approaches. We demonstrate this phenomenon on a prototypical stochastic second-order algorithm, called Mini-Batch Stochastic Variance-Reduced Newton (Mb-SVRN), which combines variance-reduced gradient estimates with access to an approximate Hessian oracle. In particular, we show that when the data size $n$ is sufficiently large, i.e., $n \gg \alpha^2\kappa$, where $\kappa$ is the condition number and $\alpha$ is the Hessian approximation factor, then Mb-SVRN achieves a fast linear convergence rate that is independent of the gradient mini-batch size $b$,…
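The abstract describes Mb-SVRN only at a high level: variance-reduced mini-batch gradients combined with an approximate Hessian oracle. Below is a minimal sketch of that general pattern, assuming a ridge-regularized least-squares finite sum. It is not the authors' implementation. The helper names, the choice of a subsampled Hessian at the snapshot as the "approximate Hessian oracle", the sample size `h`, the fixed step size `eta`, and the number of inner steps are all hypothetical choices made for illustration. Each outer iteration computes the full gradient at a snapshot $\tilde{x}$. Each inner step then forms the SVRG-style estimate $\nabla f_S(x) - \nabla f_S(\tilde{x}) + \nabla f(\tilde{x})$ from a mini-batch $S$ of size $b$ and preconditions it with the approximate Hessian.

```python
import numpy as np

def minibatch_grad(A, y, x, idx, lam):
    """Average gradient of f_i(x) = 0.5*(a_i^T x - y_i)^2 + 0.5*lam*||x||^2 over idx."""
    A_S, y_S = A[idx], y[idx]
    return A_S.T @ (A_S @ x - y_S) / len(idx) + lam * x

def subsampled_hessian(A, idx, lam):
    """Hypothetical approximate Hessian oracle: Hessian of the average over a row sample."""
    A_H = A[idx]
    return A_H.T @ A_H / len(idx) + lam * np.eye(A.shape[1])

def mb_svrn_sketch(A, y, lam=1e-3, b=16, h=None, eta=1.0, outer=20, inner=None, seed=0):
    """Sketch of a mini-batch variance-reduced Newton-type loop (assumed structure)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    h = h or min(n, 20 * d)           # Hessian sample size (hypothetical default)
    inner = inner or max(1, n // b)   # roughly one pass over the data per outer loop
    x_tilde = np.zeros(d)
    all_idx = np.arange(n)
    for _ in range(outer):
        # Snapshot: full gradient plus an approximate Hessian, both at x_tilde.
        full_g = minibatch_grad(A, y, x_tilde, all_idx, lam)
        H_hat = subsampled_hessian(A, rng.choice(n, size=h, replace=False), lam)
        x = x_tilde.copy()
        for _ in range(inner):
            S = rng.choice(n, size=b, replace=False)
            # Variance-reduced gradient estimate for the current mini-batch.
            g = minibatch_grad(A, y, x, S, lam) - minibatch_grad(A, y, x_tilde, S, lam) + full_g
            # Newton-type step preconditioned by the approximate Hessian.
            x = x - eta * np.linalg.solve(H_hat, g)
        x_tilde = x
    return x_tilde
```

To probe mini-batch robustness with this sketch, you can call `mb_svrn_sketch` with several values of `b` on the same data and compare the iterates with the closed-form ridge solution. Any such comparison reflects this simplified setup only, not the paper's experiments.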

