Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Function

Saeed Masiha; Saber Salehkaleybar; Niao He; Negar Kiyavash; Patrick Thiran

arXiv:2205.12856 · cs.LG · January 24, 2023 · 1 citation

TL;DR

This paper shows that Stochastic Cubic Regularized Newton (SCRN) improves on the best-known sample complexity of stochastic gradient descent (SGD) for gradient-dominated functions, including a weak gradient-dominance setting that covers policy-based reinforcement learning.

Contribution

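The paper analyzes SCRN on gradient-dominated functions with $1 \leq \alpha \leq 2$, proves sample complexity bounds that improve on the best-known bounds for SGD, and extends the improvement to policy-based reinforcement learning under a weak gradient dominance property.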
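For readers unfamiliar with cubic regularization, the following is a minimal sketch of one SCRN-style iteration, not the authors' implementation. It assumes the standard cubic-regularized model $g^\top d + \tfrac{1}{2} d^\top H d + \tfrac{M}{6}\|d\|^3$, where $g$ and $H$ are stochastic gradient and Hessian estimates and $M$ is a cubic penalty parameter. The names `cubic_step`, `scrn`, `grad_est`, and `hess_est` are hypothetical, and the subproblem is solved here by an eigendecomposition plus bisection (practical only in small dimensions), which may differ from the subsolver used in the paper. The noisy quadratic at the end is an illustrative toy problem, not one of the paper's experiments.

```python
import numpy as np


def cubic_step(g, H, M, tol=1e-10, max_iter=200):
    """Minimize g@d + 0.5*d@H@d + (M/6)*||d||**3 over d.

    Uses the stationarity condition (H + (M/2)*r*I) d = -g with r = ||d||
    and finds r by bisection. The degenerate "hard case" (g orthogonal to
    the eigenvector of the smallest eigenvalue) is not handled.
    """
    eigvals, eigvecs = np.linalg.eigh(H)  # eigenvalues in ascending order
    g_rot = eigvecs.T @ g                 # gradient in the eigenbasis of H

    def step_norm(r):
        # Norm of the candidate step for a trial value r of ||d||.
        return np.linalg.norm(g_rot / (eigvals + 0.5 * M * r))

    # r must make H + (M/2)*r*I positive definite.
    lo = max(0.0, -2.0 * eigvals[0] / M) + 1e-12
    hi = max(2.0 * lo, 1.0)
    while step_norm(hi) > hi:             # grow hi until it brackets the root
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if step_norm(mid) > mid:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    r = 0.5 * (lo + hi)
    return -eigvecs @ (g_rot / (eigvals + 0.5 * M * r))


def scrn(x0, grad_est, hess_est, M, n_iters, rng):
    """Run n_iters cubic-regularized steps using stochastic oracles.

    grad_est(x, rng) and hess_est(x, rng) are assumed to return
    (mini-batch) estimates of the gradient and Hessian at x.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = grad_est(x, rng)
        H = hess_est(x, rng)
        H = 0.5 * (H + H.T)               # symmetrize the Hessian estimate
        x = x + cubic_step(g, H, M)
    return x


# Toy usage: F(x) = 0.5 * x^T A x with additive gradient noise.
A = np.diag([1.0, 10.0])
rng = np.random.default_rng(0)
noisy_grad = lambda x, rng: A @ x + 0.01 * rng.standard_normal(2)
exact_hess = lambda x, rng: A
x_final = scrn([1.0, 1.0], noisy_grad, exact_hess, M=1.0, n_iters=30, rng=rng)
print(np.linalg.norm(x_final))
```

The bisection solve keeps the example deterministic and easy to check against closed-form one-dimensional cases; in higher dimensions one would typically replace it with an iterative subsolver that only needs Hessian-vector products.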

Findings

01. SCRN achieves better sample complexity than SGD for gradient-dominated functions.

02. SCRN's performance is validated through experiments in reinforcement learning.

03. Variance reduction further improves SCRN's efficiency for certain cases.

Abstract

We study the performance of Stochastic Cubic Regularized Newton (SCRN) on a class of functions satisfying the gradient dominance property with $1 \leq \alpha \leq 2$, which holds in a wide range of applications in machine learning and signal processing. This condition ensures that any first-order stationary point is a global optimum. We prove that the total sample complexity of SCRN in achieving an $\epsilon$-global optimum is $O(\epsilon^{-7/(2\alpha)+1})$ for $1 \leq \alpha < 3/2$ and $\tilde{O}(\epsilon^{-2/\alpha})$ for $3/2 \leq \alpha \leq 2$. SCRN improves the best-known sample complexity of stochastic gradient descent. Even under a weak version of the gradient dominance property, which is applicable to policy-based reinforcement learning (RL), SCRN achieves the same improvement over stochastic policy gradient methods. Additionally, we show that the average sample complexity of SCRN…
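As a quick check on the two regimes stated in the abstract, the sketch below evaluates the exponent of $\epsilon$ in the SCRN bound for a given $\alpha$. The function name `scrn_epsilon_exponent` is hypothetical, and the logarithmic factors hidden by $\tilde{O}$ in the second regime are ignored. It uses exact rational arithmetic so that the boundary case $\alpha = 3/2$ can be compared without rounding.

```python
from fractions import Fraction


def scrn_epsilon_exponent(alpha):
    """Exponent p such that the stated SCRN bound scales as epsilon**p.

    Follows the two regimes quoted in the abstract:
      1 <= alpha < 3/2   ->  -7/(2*alpha) + 1
      3/2 <= alpha <= 2  ->  -2/alpha   (log factors ignored)
    """
    alpha = Fraction(alpha)
    if not (1 <= alpha <= 2):
        raise ValueError("alpha must lie in [1, 2]")
    if alpha < Fraction(3, 2):
        return Fraction(-7, 2) / alpha + 1
    return Fraction(-2) / alpha


print(scrn_epsilon_exponent(1))               # -5/2
print(scrn_epsilon_exponent(Fraction(3, 2)))  # -4/3
print(scrn_epsilon_exponent(2))               # -1
```

The first expression also evaluates to $-7/3 + 1 = -4/3$ at $\alpha = 3/2$, so the two stated rates agree at the boundary between the regimes.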

Taxonomy

Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Machine Learning and ELM