Stochastic Dimension-reduced Second-order Methods for Policy Optimization

Jinsong Liu; Chenghan Xie; Qi Deng; Dongdong Ge; Yinyu Ye

arXiv:2301.12174 · math.OC · January 31, 2023 · 1 citation


Open Access

TL;DR

This paper introduces new stochastic second-order algorithms for policy optimization that are computationally efficient, leveraging dimension reduction and variance reduction techniques to improve convergence rates over existing methods.

Contribution

The paper presents two novel algorithms, DR-SOPO and DVR-SOPO, that achieve improved convergence complexities for policy optimization using second-order information.

Findings

01

DR-SOPO achieves $\tilde{O}\left(\frac{1}{\epsilon^{3.5}}\right)$ complexity.

02

DVR-SOPO improves complexity to $\tilde{O}\left(\frac{1}{\epsilon^{3}}\right)$ with variance reduction.

03

Preliminary experiments show favorable performance compared to existing stochastic and variance-reduced policy gradient methods.

Abstract

In this paper, we propose several new stochastic second-order algorithms for policy optimization that require only gradients and Hessian-vector products in each iteration, making them computationally efficient and comparable to policy gradient methods. Specifically, we propose a dimension-reduced second-order method (DR-SOPO) that repeatedly solves a projected two-dimensional trust-region subproblem. We show that DR-SOPO attains an $O(\epsilon^{-3.5})$ complexity for reaching an approximate first-order stationary condition and a certain subspace second-order stationary condition. In addition, we present an enhanced algorithm (DVR-SOPO) that further improves the complexity to $O(\epsilon^{-3})$ using variance reduction. Preliminary experiments show that our proposed algorithms perform favorably compared with stochastic and variance-reduced policy gradient…
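To make the idea of a projected two-dimensional trust-region subproblem concrete, here is a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions that the abstract does not state: the two directions are taken to be the negative gradient and the previous step, the trust-region radius is fixed, exact gradients of a toy quadratic stand in for stochastic policy-gradient estimates, the toy problem is a minimization, and no variance reduction is applied. All function names (`hvp`, `solve_small_trust_region`, `dr_trust_region_step`, `run_demo`) are hypothetical. The sketch only illustrates that each iteration needs one gradient and two Hessian-vector products, never a full Hessian.

```python
import numpy as np


def hvp(grad_fn, x, v, eps=1e-5):
    """Central-difference Hessian-vector product H(x) v (no Hessian is formed)."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2.0 * eps)


def solve_small_trust_region(Q, c, radius, n_angles=3600):
    """Approximately minimize c@a + 0.5*a@Q@a subject to ||a|| <= radius, dim <= 2.

    Assumption: a dense angular grid on the boundary is accurate enough for a
    sketch; a production solver would handle the boundary case exactly.
    """
    if np.linalg.eigvalsh(Q)[0] > 1e-12:
        a = -np.linalg.solve(Q, c)  # unconstrained minimizer of the model
        if np.linalg.norm(a) <= radius:
            return a
    # Otherwise the minimizer lies on the boundary of the trust region.
    if len(c) == 1:
        cands = np.array([[radius], [-radius]])
    else:
        theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
        cands = radius * np.stack([np.cos(theta), np.sin(theta)], axis=1)
    vals = cands @ c + 0.5 * np.einsum("ij,jk,ik->i", cands, Q, cands)
    return cands[np.argmin(vals)]


def dr_trust_region_step(grad_fn, x, prev_step, radius):
    """One step restricted to span{-gradient, previous step} (assumed directions)."""
    g = grad_fn(x)
    basis = []
    for d in (-g, prev_step):  # Gram-Schmidt, dropping negligible directions
        d = d.astype(float).copy()
        for b in basis:
            d = d - (b @ d) * b
        norm = np.linalg.norm(d)
        if norm > 1e-10:
            basis.append(d / norm)
    if not basis:
        return np.zeros_like(x)
    V = np.stack(basis, axis=1)                      # n x k with k <= 2
    HV = np.stack([hvp(grad_fn, x, V[:, j]) for j in range(V.shape[1])], axis=1)
    Q = V.T @ HV
    Q = 0.5 * (Q + Q.T)                              # symmetrize the projected Hessian
    c = V.T @ g                                      # projected gradient
    return V @ solve_small_trust_region(Q, c, radius)


def run_demo(n_iters=50, radius=0.5):
    """Toy run on f(x) = 0.5 * x^T A x; returns (f at start, f at end, largest step norm)."""
    A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
    f = lambda x: 0.5 * x @ A @ x
    grad = lambda x: A @ x
    x0 = np.ones(5)
    x, step, norms = x0.copy(), np.zeros(5), []
    for _ in range(n_iters):
        step = dr_trust_region_step(grad, x, step, radius)
        x = x + step
        norms.append(np.linalg.norm(step))
    return f(x0), f(x), max(norms)
```

Orthonormalizing the two directions first turns the subspace constraint into an ordinary Euclidean ball in at most two dimensions, which is why the small subproblem is cheap to solve regardless of the parameter dimension.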


Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Machine Learning and ELM