Unified Policy Optimization for Continuous-action Reinforcement Learning in Non-stationary Tasks and Games

Rong-Jun Qin; Fan-Ming Luo; Hong Qian; Yang Yu

arXiv:2208.09452·cs.LG·August 22, 2022·1 cites

Open Access

TL;DR

This paper introduces PORL, a no-regret-style reinforcement learning algorithm for continuous-action tasks in non-stationary environments and games. It comes with a proven last-iterate convergence guarantee and outperforms SAC in dynamic and adversarial settings.

Contribution

It proposes PORL, a new algorithm inspired by follow-the-regularized-leader (FTRL) and mirror descent (MD) updates, with a last-iterate convergence guarantee for non-stationary continuous-action tasks.

Findings

01. PORL matches or exceeds SAC in stationary environments such as MuJoCo locomotion control tasks.

02. PORL outperforms SAC in non-stationary and adversarial environments.

03. In non-stationary environments, PORL trains more stably than SAC and reaches better final policies.

Abstract

This paper addresses policy learning in non-stationary environments and games with continuous actions. Instead of the classical reward-maximization mechanism, we draw on the ideas of follow-the-regularized-leader (FTRL) and mirror descent (MD) updates and propose PORL, a no-regret-style reinforcement learning algorithm for continuous-action tasks. We prove that PORL has a last-iterate convergence guarantee, which is important for adversarial and cooperative games. Empirical studies show that in stationary environments, such as MuJoCo locomotion control tasks, PORL performs as well as, if not better than, the soft actor-critic (SAC) algorithm; in non-stationary settings, including dynamical environments, adversarial training, and competitive games, PORL outperforms SAC in both final policy performance and training stability.
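The abstract names FTRL and MD only as inspirations and does not spell out PORL's update rule. As background, the following is a minimal sketch of the two generic updates in their simplest form, not the paper's algorithm. It assumes linear losses, an unconstrained continuous action vector, and a quadratic regularizer; the function names `ftrl_step`, `md_step`, and `run` are hypothetical. Under those assumptions the two updates produce the same iterate, which the last lines check numerically.

```python
# Illustrative sketch only: generic FTRL and Euclidean mirror descent on
# linear losses. This is NOT the PORL update from the paper.

def ftrl_step(grad_sum, eta):
    """FTRL with regularizer ||x||^2 / (2 * eta) on linear losses.

    argmin_x <grad_sum, x> + ||x||^2 / (2 * eta) has the closed form
    x = -eta * grad_sum.
    """
    return [-eta * g for g in grad_sum]


def md_step(x, grad, eta):
    """Mirror descent with the Euclidean mirror map (plain gradient step)."""
    return [xi - eta * gi for xi, gi in zip(x, grad)]


def run(grads, eta=0.1):
    """Feed the same gradient sequence to both updates, starting from x = 0."""
    dim = len(grads[0])
    x_md = [0.0] * dim
    grad_sum = [0.0] * dim
    for g in grads:
        grad_sum = [s + gi for s, gi in zip(grad_sum, g)]
        x_md = md_step(x_md, g, eta)
    x_ftrl = ftrl_step(grad_sum, eta)
    return x_ftrl, x_md


# Hypothetical gradient sequence for a 2-dimensional action.
x_ftrl, x_md = run([[1.0, -2.0], [3.0, 0.5]], eta=0.5)
print(x_ftrl, x_md)
```

With constraints or a non-quadratic regularizer, the two updates generally differ, so this equivalence should not be read as a property of PORL itself.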

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Global Average Pooling · Dilated Convolution · Convolution · 1x1 Convolution · Average Pooling · Switchable Atrous Convolution