Bregman Gradient Policy Optimization

Feihu Huang; Shangqian Gao; Heng Huang

arXiv:2106.12112 · cs.LG · March 17, 2022 · 1 citation

TL;DR

This paper introduces a novel Bregman gradient policy optimization framework for reinforcement learning, providing algorithms with improved sample complexity and unifying existing methods, supported by theoretical analysis and experiments.

Contribution

It proposes a new Bregman gradient policy optimization framework with accelerated variants and convergence analysis, unifying and improving upon existing policy optimization algorithms.

Findings

01

BGPO achieves $O(\epsilon^{-4})$ sample complexity.

02

VR-BGPO achieves $O(\epsilon^{-3})$ sample complexity.

03

Experimental results demonstrate the efficiency of the proposed algorithms.

Abstract

In this paper, we design a novel Bregman gradient policy optimization framework for reinforcement learning based on Bregman divergences and momentum techniques. Specifically, we propose a Bregman gradient policy optimization (BGPO) algorithm based on the basic momentum technique and mirror descent iteration. We further propose an accelerated Bregman gradient policy optimization (VR-BGPO) algorithm based on a variance-reduction technique. Moreover, we provide a convergence analysis framework for our Bregman gradient policy optimization under the nonconvex setting. We prove that BGPO achieves a sample complexity of $O(\epsilon^{-4})$ for finding an $\epsilon$-stationary policy while requiring only one trajectory at each iteration, and that VR-BGPO reaches the best known sample complexity of $O(\epsilon^{-3})$, also requiring only one trajectory at each iteration. In particular,…
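The abstract names two ingredients for BGPO: a basic momentum gradient estimator and a mirror descent iteration driven by a Bregman divergence. The following is a minimal NumPy sketch of those two ingredients in isolation, not the authors' algorithm. The function names, the fixed `beta` and `lr` values, the two mirror maps shown, and the toy quadratic objective are all assumptions made for illustration; a real policy-gradient estimate from a sampled trajectory would replace the noisy toy gradient.

```python
import numpy as np


def momentum_estimate(u_prev, grad, beta):
    """Basic momentum (assumed form): blend the fresh stochastic gradient
    with the previous estimate, u = beta * grad + (1 - beta) * u_prev."""
    return beta * grad + (1.0 - beta) * u_prev


def mirror_ascent_step(theta, u, lr, mirror="euclidean"):
    """One mirror step for maximization:
    argmin_x { -<u, x> + (1 / lr) * D_psi(x, theta) }.

    mirror="euclidean": psi(x) = 0.5 * ||x||^2, which gives plain gradient ascent.
    mirror="entropy":   psi(x) = sum x_i log x_i on the probability simplex,
                        which gives the exponentiated-gradient update.
    """
    if mirror == "euclidean":
        return theta + lr * u
    if mirror == "entropy":
        w = theta * np.exp(lr * u)
        return w / w.sum()
    raise ValueError(f"unknown mirror map: {mirror}")


def toy_run(steps=200, lr=0.1, beta=0.5, seed=0):
    """Hypothetical toy problem: maximize -0.5 * ||theta - target||^2
    using one noisy gradient sample per iteration."""
    rng = np.random.default_rng(seed)
    target = np.array([1.0, -2.0])
    theta = np.zeros(2)
    u = np.zeros(2)
    for _ in range(steps):
        # Noisy gradient stands in for a single-trajectory policy gradient.
        grad = (target - theta) + rng.normal(scale=0.1, size=2)
        u = momentum_estimate(u, grad, beta)
        theta = mirror_ascent_step(theta, u, lr, mirror="euclidean")
    return theta, target
```

Choosing the Euclidean mirror map recovers an ordinary momentum gradient step, while the entropy mirror map keeps iterates on the probability simplex; this is one way to read the claim that a Bregman framework unifies several existing policy optimization methods.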

Code & Models

Repositories

gaosh/bgpo
PyTorch · Official

Videos

Bregman Gradient Policy Optimization · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques