Bregman Gradient Policy Optimization
Feihu Huang, Shangqian Gao, Heng Huang

TL;DR
This paper introduces a novel Bregman gradient policy optimization framework for reinforcement learning, providing algorithms with improved sample complexity and unifying existing methods, supported by theoretical analysis and experiments.
Contribution
It proposes a new Bregman gradient policy optimization framework with accelerated variants and convergence analysis, unifying and improving upon existing policy optimization algorithms.
Findings
BGPO achieves $O(\epsilon^{-4})$ sample complexity.
VR-BGPO achieves $O(\epsilon^{-3})$ sample complexity.
Experimental results demonstrate the efficiency of the proposed algorithms.
Abstract
In this paper, we design a novel Bregman gradient policy optimization framework for reinforcement learning based on Bregman divergences and momentum techniques. Specifically, we propose a Bregman gradient policy optimization (BGPO) algorithm based on the basic momentum technique and mirror descent iteration. We further propose an accelerated Bregman gradient policy optimization (VR-BGPO) algorithm based on a variance-reduced technique. Moreover, we provide a convergence analysis framework for our Bregman gradient policy optimization under the nonconvex setting. We prove that BGPO achieves a sample complexity of $O(\epsilon^{-4})$ for finding an $\epsilon$-stationary policy while requiring only one trajectory at each iteration, and that VR-BGPO reaches the best known sample complexity of $O(\epsilon^{-3})$, also requiring only one trajectory at each iteration. In particular,…
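To make the combination of a momentum gradient estimate and a mirror descent iteration concrete, here is a minimal sketch on a toy problem. It is not the authors' algorithm. It assumes the Euclidean mirror map $\psi(x) = \frac{1}{2}\|x\|^2$, for which the Bregman step reduces to a plain gradient-ascent step, and a quadratic toy objective in place of a policy return. The function names, the step-size schedule `eta`, the momentum weight `beta`, and the parameter `lam` are all hypothetical choices made for illustration.

```python
import random

def mirror_step_euclidean(theta, u, lam):
    # argmax over x of <u, x> - (1/lam) * D_psi(x, theta) with psi(x) = 0.5 * ||x||^2.
    # For this mirror map the solution has the closed form theta + lam * u.
    return [t + lam * g for t, g in zip(theta, u)]

def noisy_grad(theta, target, rng, sigma=0.1):
    # Stochastic gradient of the toy objective J(theta) = -0.5 * ||theta - target||^2.
    return [(c - t) + rng.gauss(0.0, sigma) for t, c in zip(theta, target)]

def momentum_mirror_ascent(theta0, target, steps=2000, lam=0.5, seed=0):
    rng = random.Random(seed)
    theta = list(theta0)
    u = noisy_grad(theta, target, rng)  # initial gradient estimate
    for t in range(1, steps + 1):
        eta = 1.0 / (t + 1) ** 0.5  # hypothetical step-size schedule
        beta = eta                  # hypothetical momentum weight
        # Mirror (Bregman) step from the current point along the estimate u.
        proposal = mirror_step_euclidean(theta, u, lam)
        # Move only part of the way toward the proposal.
        theta = [x + eta * (p - x) for x, p in zip(theta, proposal)]
        # Momentum update: blend one fresh sample with the running estimate.
        g = noisy_grad(theta, target, rng)
        u = [beta * gi + (1.0 - beta) * ui for gi, ui in zip(g, u)]
    return theta

if __name__ == "__main__":
    print(momentum_mirror_ascent([0.0, 0.0], [1.0, -2.0]))
```

Each iteration draws a single stochastic gradient sample, loosely mirroring the one-trajectory-per-iteration property stated above. Swapping in a different mirror map (for example, negative entropy on the simplex) would change only `mirror_step_euclidean`.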
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques
