Stochastic Dimension-reduced Second-order Methods for Policy Optimization
Jinsong Liu, Chenghan Xie, Qi Deng, Dongdong Ge, Yinyu Ye

TL;DR
This paper introduces new stochastic second-order algorithms for policy optimization that are computationally efficient, leveraging dimension reduction and variance reduction techniques to improve convergence rates over existing methods.
Contribution
The paper presents two novel algorithms, DR-SOPO and DVR-SOPO, that achieve improved convergence complexities for policy optimization using second-order information.
Findings
DR-SOPO achieves $\tilde{O}(\frac{1}{\epsilon^{3.5}})$ complexity.
DVR-SOPO improves complexity to $\tilde{O}(\frac{1}{\epsilon^{3}})$ with variance reduction.
Preliminary experiments show favorable performance compared to existing stochastic and variance-reduced policy gradient methods.
Abstract
In this paper, we propose several new stochastic second-order algorithms for policy optimization that only require gradient and Hessian-vector products in each iteration, making them computationally efficient and comparable to policy gradient methods. Specifically, we propose a dimension-reduced second-order method (DR-SOPO) which repeatedly solves a projected two-dimensional trust region subproblem. We show that DR-SOPO obtains an $\tilde{O}(\frac{1}{\epsilon^{3.5}})$ complexity for reaching an approximate first-order stationary condition and a certain subspace second-order stationary condition. In addition, we present an enhanced algorithm (DVR-SOPO) which further improves the complexity to $\tilde{O}(\frac{1}{\epsilon^{3}})$ based on the variance reduction technique. Preliminary experiments show that our proposed algorithms perform favorably compared with stochastic and variance-reduced policy gradient…
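To make the "projected two-dimensional trust region subproblem" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a minimization convention, an orthonormal pair of directions `u1`, `u2` spanning the subspace, and a plain Euclidean norm on the coefficients; the paper's own choice of directions, norm, and radius update is not reproduced here. The names `projected_model`, `solve_2d_trust_region`, and `hvp` are hypothetical, and `hvp` stands for any routine returning a Hessian-vector product.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def projected_model(grad, hvp, u1, u2):
    """Restrict a quadratic model to span{u1, u2} (u1, u2 assumed orthonormal).

    Uses only the gradient and two Hessian-vector products, never the full Hessian.
    """
    hu1, hu2 = hvp(u1), hvp(u2)
    c = (dot(grad, u1), dot(grad, u2))
    q12 = 0.5 * (dot(u1, hu2) + dot(u2, hu1))  # symmetrize against estimation noise
    Q = ((dot(u1, hu1), q12), (q12, dot(u2, hu2)))
    return c, Q

def solve_2d_trust_region(c, Q, radius, tol=1e-12):
    """Minimize c.a + 0.5 * a^T Q a over ||a|| <= radius; returns (a, multiplier)."""
    q11, q12, q22 = Q[0][0], Q[0][1], Q[1][1]
    mean, diff = 0.5 * (q11 + q22), 0.5 * (q11 - q22)
    r = math.hypot(diff, q12)
    lam = (mean - r, mean + r)                 # eigenvalues, smallest first
    theta = 0.5 * math.atan2(q12, diff)
    v_hi = (math.cos(theta), math.sin(theta))  # eigenvector for lam[1]
    v_lo = (-v_hi[1], v_hi[0])                 # eigenvector for lam[0]
    b = (dot(c, v_lo), dot(c, v_hi))           # linear term in the eigenbasis

    def step(mult):
        # Coordinates of -(Q + mult * I)^{-1} c in the eigenbasis.
        return [-(bi / (li + mult)) if abs(bi) > tol else 0.0
                for bi, li in zip(b, lam)]

    def to_xy(y):
        return (y[0] * v_lo[0] + y[1] * v_hi[0],
                y[0] * v_lo[1] + y[1] * v_hi[1])

    # Case 1: convex model whose unconstrained minimizer lies inside the region.
    if lam[0] > tol:
        y = step(0.0)
        if math.hypot(*y) <= radius:
            return to_xy(y), 0.0

    lo = max(0.0, -lam[0])
    # Case 2 ("hard case"): no linear term along the lowest-curvature direction,
    # so the step is completed along that direction up to the boundary.
    if abs(b[0]) <= tol and (lam[1] + lo > tol or abs(b[1]) <= tol):
        y = step(lo)
        if math.hypot(*y) <= radius:
            y[0] = math.sqrt(max(radius * radius - y[1] * y[1], 0.0))
            return to_xy(y), lo

    # Case 3: boundary solution; bisect on the multiplier until ||a|| = radius.
    hi = lo + 1.0
    while math.hypot(*step(hi)) > radius:
        hi = lo + 2.0 * (hi - lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break
        if math.hypot(*step(mid)) > radius:
            lo = mid
        else:
            hi = mid
    return to_xy(step(hi)), hi

# Illustrative use on a toy quadratic f(x) = 0.5 * x^T diag(2, -1, 3) x at x = (1, 1, 1).
diag = (2.0, -1.0, 3.0)
grad = [2.0, -1.0, 3.0]
hvp = lambda v: [di * vi for di, vi in zip(diag, v)]
c, Q = projected_model(grad, hvp, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
alpha, multiplier = solve_2d_trust_region(c, Q, radius=1.0)
```

In this sketch the full-space step would be `alpha[0] * u1 + alpha[1] * u2`. Because the subproblem has only two unknowns, its cost is negligible next to the two Hessian-vector products needed to build `Q`, which is the sense in which such a method stays close to the per-iteration cost of a policy gradient step.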
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
